Conference proceeding
Interpretable Word Embeddings for Medical Domain
2018 IEEE International Conference on Data Mining (ICDM), Vol.2018-, pp.1061-1066
11/2018
DOI: 10.1109/ICDM.2018.00135
Abstract
Word embeddings are increasingly applied in a variety of biomedical Natural Language Processing (bioNLP) tasks, ranging from drug discovery to automated disease diagnosis. While these word embeddings in their entirety have shown meaningful syntactic and semantic regularities, the meaning of individual dimensions remains elusive. This is problematic in general and particularly in sensitive domains such as biomedicine, where the interpretability of results is crucial to widespread adoption. To address this issue, in this study we aim to improve the interpretability of pre-trained word embeddings generated from a text corpus and, in doing so, provide a systematic approach to formalizing the problem. More specifically, we exploit the rich categorical knowledge present in the biomedical domain and propose to learn a transformation matrix that maps the input embeddings to a new space where they are both interpretable and retain their original expressive features. Experiments conducted on the largest available biomedical corpus suggest that the model achieves interpretability that closely resembles human-level intuition.
Details
- Title: Subtitle
- Interpretable Word Embeddings for Medical Domain
- Creators
- Kishlay Jha - University at Buffalo, State University of New York
- Yaqing Wang - University at Buffalo, State University of New York
- Guangxu Xun - University at Buffalo, State University of New York
- Aidong Zhang - University at Buffalo, State University of New York
- Resource Type
- Conference proceeding
- Publication Details
- 2018 IEEE International Conference on Data Mining (ICDM), Vol.2018-, pp.1061-1066
- Publisher
- IEEE
- DOI
- 10.1109/ICDM.2018.00135
- ISSN
- 1550-4786
- eISSN
- 2374-8486
- Language
- English
- Date published
- 11/2018
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984294927502771