Conference proceeding
Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge
Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp.317-326
01/01/2016
DOI: 10.1145/2975167.2975200
Abstract
The massive and unprecedented volume of scientific literature readily available in the domain of biomedicine has presented us with challenges and opportunities to accelerate hypothesis generation. Advanced text mining techniques are required to leverage this abundant textual representation in order to provide timely access to explicit facts and aid in elucidating associations among implicit facts. The problem of inferring novel knowledge from these implicit facts by logically connecting independent fragments of literature is known as Literature-Based Discovery (LBD).
In LBD, to discover hidden links, it is important to determine the relevancy between concepts using appropriate information measures. In this paper, to discover interesting and inherent links latent in large corpora, nine distinct methods, comprising variants of statistical information measures and derived semantic knowledge from domain ontology, are designed and compared. For a better understanding of the results, we split the methods into three groups. The first group includes traditional information measures such as mutual information, chi-square, and those used in association rule mining; the second group incorporates popular null-invariant correlation measures: All-Confidence, Kulczynski, and Cosine; the third group consists of null-invariant measures combined with our proposed notion of semantic relatedness. We also propose a new strategy of effective preprocessing, which is capable of removing terms that are spurious, semantically unrelated, or have meager chances of constituting a new discovery. A series of experiments is performed and analyzed for the proposed methods. In addition, we provide an organized list of final concepts deemed worthy of scientific investigation or experimentation. Overall, our research presents a comprehensive analysis and perspective of how different statistical information measures and semantic knowledge affect the knowledge discovery procedure.
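As a rough illustration of the second group of measures named in the abstract, the sketch below computes the three null-invariant measures from document co-occurrence counts, using their standard textbook definitions. It is not the authors' implementation: the function names, the count-based inputs (`n_a`, `n_b`, `n_ab`, `n_total`), and the example numbers are all assumptions made for illustration. Lift is included only as a contrast, since it depends on the total number of documents while the null-invariant measures do not.

```python
from math import sqrt

# Hypothetical inputs (not from the paper):
#   n_a     = number of documents mentioning concept A
#   n_b     = number of documents mentioning concept B
#   n_ab    = number of documents mentioning both A and B
#   n_total = total number of documents in the corpus


def all_confidence(n_a: int, n_b: int, n_ab: int) -> float:
    """min(P(A|B), P(B|A)), i.e. co-occurrence over the larger support."""
    return n_ab / max(n_a, n_b)


def kulczynski(n_a: int, n_b: int, n_ab: int) -> float:
    """Arithmetic mean of the two conditional probabilities."""
    return 0.5 * (n_ab / n_a + n_ab / n_b)


def cosine(n_a: int, n_b: int, n_ab: int) -> float:
    """Geometric mean of the two conditional probabilities."""
    return n_ab / sqrt(n_a * n_b)


def lift(n_a: int, n_b: int, n_ab: int, n_total: int) -> float:
    """P(A,B) / (P(A) P(B)); NOT null-invariant, shown for contrast."""
    return (n_ab * n_total) / (n_a * n_b)


if __name__ == "__main__":
    n_a, n_b, n_ab = 100, 400, 40
    print("all_confidence:", all_confidence(n_a, n_b, n_ab))
    print("kulczynski:    ", kulczynski(n_a, n_b, n_ab))
    print("cosine:        ", cosine(n_a, n_b, n_ab))
    # Adding documents that mention neither concept changes lift only.
    for n_total in (1_000, 100_000):
        print(f"lift (N={n_total}):", lift(n_a, n_b, n_ab, n_total))
```

Because the first three functions never take `n_total`, documents that mention neither concept cannot change their values, which is the property that makes such measures attractive for large, sparse corpora.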
Details
- Title: Subtitle
- Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge
- Creators
- Kishlay Jha - North Dakota State University
- Wei Jin - University of North Texas
- ACM
- Resource Type
- Conference proceeding
- Publication Details
- Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp.317-326
- DOI
- 10.1145/2975167.2975200
- Publisher
- Association for Computing Machinery
- Number of pages
- 10
- Grant note
- IIS-1452898 / National Science Foundation (NSF)
- Language
- English
- Date published
- 01/01/2016
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984294923002771