InterHG: an Interpretable and Accurate Model for Hypothesis Generation
Conference proceeding

Haoyu Wang, Xuan Wang, Yaqing Wang, Guangxu Xun, Kishlay Jha and Jing Gao
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1552-1557
12/09/2021
DOI: 10.1109/BIBM52615.2021.9669740

Abstract

Hypothesis generation, which tries to identify implicit associations between two concepts, has attracted much attention due to its ability to link key concepts scattered across different articles and to suggest plausible new hypotheses. Among existing approaches for hypothesis generation, matrix-factorization-based methods have achieved state-of-the-art performance. However, these methods suffer from the following limitations: 1) bridge concepts are determined only as a post-hoc analysis of matrix factorization results; 2) the concept embeddings produced by matrix factorization cannot be explained, and thus it is hard to understand whether the concepts are linked in a semantically meaningful way. To overcome these limitations, we propose an interpretable and accurate hypothesis generation model (InterHG), which improves both accuracy and interpretability compared with existing methods. First, we propose to explicitly model the relationship between bridge concepts and given concept pairs, and conduct tensor factorization to identify link concepts. This reduces information loss and improves accuracy compared with post-hoc approaches. Second, we leverage the description of categories in the tensor factorization, which can output each concept embedding as a weighted combination of known categories. With this meaningful embedding representation, medical researchers are able to check the correctness of the suggested link concepts for a given concept pair. We conduct experiments based on MeSH terms (a controlled vocabulary of biomedical concepts) extracted from the MEDLINE corpus and category information obtained from UMLS (a comprehensive biomedical concept database). Results demonstrate that the proposed InterHG is highly accurate and produces meaningful embeddings for explanations.
Biological system modeling; Biomedical domain; Bridges; Computational modeling; Databases; Hypothesis generation; Interpretation; Tensors; Unified modeling language; Vocabulary
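
The two ideas in the abstract (concept embeddings expressed as weighted combinations of known categories, and a three-way tensor over concept, bridge concept, concept) can be illustrated with a toy sketch. This is not the InterHG implementation: the function names, the CP-style scoring rule, and all numbers below are assumptions made for illustration, and no factorization is actually fitted here. The sketch only shows how such a model could score candidate bridge concepts once the weights are known.

```python
import numpy as np

def category_weighted_embeddings(weights, category_vectors):
    # Each concept embedding is a weighted combination of category vectors
    # (hypothetical stand-in for the category-based embeddings in the abstract).
    return weights @ category_vectors

def score_triples(emb_a, emb_bridge, emb_c):
    # CP-style reconstruction: score[i, j, k] = sum_r a[i, r] * b[j, r] * c[k, r].
    return np.einsum("ir,jr,kr->ijk", emb_a, emb_bridge, emb_c)

def top_bridges(scores, a, c, k=1):
    # Rank bridge candidates for the pair (a, c), excluding the endpoints.
    row = scores[a, :, c].copy()
    row[[a, c]] = -np.inf
    return np.argsort(-row)[:k].tolist()

# Toy data (assumed): 3 categories described by 2-d vectors, 4 concepts.
category_vectors = np.array([[1.0, 0.0],
                             [0.0, 1.0],
                             [1.0, 1.0]])
weights = np.array([[1.0, 0.0, 0.0],   # concept 0: purely category 0
                    [0.0, 1.0, 0.0],   # concept 1: purely category 1
                    [0.5, 0.5, 0.0],   # concept 2: mix of categories 0 and 1
                    [0.0, 0.0, 0.8]])  # concept 3: mostly category 2

emb = category_weighted_embeddings(weights, category_vectors)
scores = score_triples(emb, emb, emb)
print(top_bridges(scores, a=0, c=2, k=2))
```

Because each row of `weights` is read directly as category memberships, a reader can inspect why a bridge concept scored highly, which is the kind of check the abstract attributes to medical researchers.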
