Journal article
Towards self-learning based hypotheses generation in biomedical text domain
BIOINFORMATICS, Vol.34(12), pp.2103-2115
06/15/2018
DOI: 10.1093/bioinformatics/btx837
PMID: 29293920
Abstract
Motivation: The overwhelming number of research articles in biomedicine may cause important connections to remain unnoticed. Literature Based Discovery is a sub-field of biomedical text mining that peruses these articles to formulate high-confidence hypotheses on possible connections between medical concepts. Although many alternative methodologies have been proposed over the last decade, they still suffer from scalability issues. The primary reason, apart from the dense inter-connections between biological concepts, is the absence of information on the factors that lead to edge formation. In this work, we formulate this problem as a collaborative filtering task and leverage the relatively new concept of word vectors to learn and mimic the implicit edge-formation process. Together with a single-class classifier, we prune the search space of redundant and irrelevant hypotheses to increase the efficiency of the system while maintaining, and in some cases even boosting, the overall accuracy.
Results: We show that our proposed framework is able to prune up to 90% of the hypotheses while still retaining high recall in top-K results. This level of efficiency enables the discovery algorithm to look for higher-order hypotheses, something that was infeasible until now. Furthermore, the generic formulation makes our approach flexible enough to perform both open and closed discovery. We also experimentally validate that the core data structures upon which the system bases its decisions have a high concordance with the opinion of experts. This, coupled with the ability to understand the edge-formation process, provides us with interpretable results without any manual intervention.
Details
- Title: Subtitle
- Towards self-learning based hypotheses generation in biomedical text domain
- Creators
- Vishrawas Gopalakrishnan - University at Buffalo, State University of New York
- Kishlay Jha - University at Buffalo, State University of New York
- Guangxu Xun - University at Buffalo, State University of New York
- Hung Q. Ngo - University at Buffalo, State University of New York
- Aidong Zhang - University at Buffalo, State University of New York
- Resource Type
- Journal article
- Publication Details
- BIOINFORMATICS, Vol.34(12), pp.2103-2115
- DOI
- 10.1093/bioinformatics/btx837
- PMID
- 29293920
- NLM abbreviation
- Bioinformatics
- ISSN
- 1367-4803
- eISSN
- 1460-2059
- Publisher
- Oxford Univ Press
- Number of pages
- 13
- Grant note
- IIS-1218393; IIS-1514204 / National Science Foundation; National Science Foundation (NSF)
- Language
- English
- Date published
- 06/15/2018
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984294926102771