InterHG: an Interpretable and Accurate Model for Hypothesis Generation

Haoyu Wang; Xuan Wang; Yaqing Wang; Guangxu Xun; Kishlay Jha; Jing Gao

doi:10.1109/BIBM52615.2021.9669740

Conference proceeding

2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp.1552-1557

12/09/2021

Abstract

Hypothesis generation, which tries to identify implicit associations between two concepts, has attracted much attention due to its ability to link key concepts scattered across different articles and to surface plausible new hypotheses. Among existing approaches for hypothesis generation, matrix-factorization-based methods have achieved state-of-the-art performance. However, these methods suffer from the following limitations: 1) bridge concepts are determined only as a post-hoc analysis of the matrix factorization results; 2) the concept embeddings produced by matrix factorization cannot be explained, and thus it is hard to understand whether the concepts are linked in a semantically meaningful way. To overcome these limitations, we propose an interpretable and accurate hypothesis generation model (InterHG), which improves both accuracy and interpretability compared with existing methods. First, we propose to explicitly model the relationship between bridge concepts and given concept pairs, and conduct tensor factorization to identify link concepts. This reduces information loss and improves accuracy compared with post-hoc approaches. Second, we leverage the descriptions of categories in the tensor factorization, so that each concept embedding is output as a weighted combination of known categories. With this meaningful embedding representation, medical researchers are able to check the correctness of the suggested link concepts for a given concept pair. We conduct experiments based on MeSH terms (a controlled vocabulary of biomedical concepts) extracted from the MEDLINE corpus and category information obtained from UMLS (a comprehensive biomedical concept database). Results demonstrate that the proposed InterHG is highly accurate and produces meaningful embeddings for explanations.
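
The two ideas in the abstract (scoring a concept pair together with a candidate bridge concept, and writing each concept embedding as a weighted combination of category embeddings) can be illustrated with a small sketch. This is not the InterHG model itself: the abstract does not give the factorization form, loss, or training procedure, so the CP-style trilinear score, the random placeholder data, and every name below (`category_emb`, `weights`, `triple_score`, `rank_bridges`, `explain`) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration.
n_concepts, n_categories, dim = 6, 4, 3

# Hypothetical category embeddings. In the paper these would be derived
# from category descriptions; here they are random placeholders.
category_emb = rng.normal(size=(n_categories, dim))

# Hypothetical non-negative category weights per concept, with each row
# normalized to sum to 1 so it can be read as a mixture over categories.
raw = rng.random(size=(n_concepts, n_categories))
weights = raw / raw.sum(axis=1, keepdims=True)

# Each concept embedding is a weighted combination of category embeddings.
concept_emb = weights @ category_emb  # shape: (n_concepts, dim)


def triple_score(a, b, c):
    """Assumed CP-style trilinear score for (concept a, bridge b, concept c)."""
    return float(np.sum(concept_emb[a] * concept_emb[b] * concept_emb[c]))


def rank_bridges(a, c):
    """Return concept indices sorted by score as a bridge between a and c."""
    scores = np.einsum("d,kd,d->k", concept_emb[a], concept_emb, concept_emb[c])
    scores[[a, c]] = -np.inf  # the endpoints cannot bridge themselves
    return np.argsort(-scores)


def explain(concept, top=2):
    """Return the top-weighted (category index, weight) pairs for a concept."""
    order = np.argsort(-weights[concept])[:top]
    return [(int(k), float(weights[concept, k])) for k in order]


best = int(rank_bridges(0, 1)[0])
print("best bridge for (0, 1):", best, "categories:", explain(best))
```

Because the bridge concept is part of the score rather than recovered afterwards, ranking candidates is a single pass over the third index, and the category weights of the top candidate give a reader something concrete to inspect when judging whether a suggested link is plausible.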

Biological system modeling

Biomedical domain

Bridges

Computational modeling

Databases

Hypothesis generation

Interpretation

Tensors

Unified modeling language

Vocabulary

Details

Title: InterHG: an Interpretable and Accurate Model for Hypothesis Generation
Creators: Haoyu Wang - Purdue University System
Xuan Wang - University of Illinois Urbana-Champaign
Yaqing Wang - Purdue University System
Guangxu Xun - University of Virginia
Kishlay Jha - University of Virginia
Jing Gao - Purdue University System
Resource Type: Conference proceeding
Publication Details: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp.1552-1557
DOI: 10.1109/BIBM52615.2021.9669740
Publisher: IEEE
Grant note: National Science Foundation (10.13039/100000001)
Language: English
Date published: 12/09/2021
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984295023202771
