Knowledge-Base Enriched Word Embeddings for Biomedical Domain

Kishlay Jha

doi:10.48550/arXiv.2103.00479

Preprint

Open access

ArXiv.org

Cornell University

02/20/2021

DOI: 10.48550/arXiv.2103.00479

Files and links (1)

url

https://doi.org/10.48550/arXiv.2103.00479

Preprint (Author's original) This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access

Abstract

Word embeddings have been shown to be adept at capturing the semantic and syntactic regularities of natural language text, and as a result these representations have found use in a wide variety of downstream content analysis tasks. Commonly, word embedding techniques derive the distributed representation of a word from its local context. However, such approaches ignore the rich explicit information present in knowledge bases. This is problematic, as it can lead to poor representations for words with insufficient local context, such as domain-specific words. The problem is more pronounced in domains such as biomedicine, where domain-specific words are relatively frequent. To this end, in this project, we propose a new word-embedding-based model for the biomedical domain that jointly leverages information from available corpora and domain knowledge to generate knowledge-base-powered embeddings. Unlike existing approaches, the proposed methodology is simple yet adept at accurately capturing the precise knowledge available in domain resources. Experimental results on biomedical concept similarity and relatedness tasks validate the effectiveness of the proposed approach.

Details

Title: Knowledge-Base Enriched Word Embeddings for Biomedical Domain
Creators: Kishlay Jha
Resource Type: Preprint
Publication Details: ArXiv.org
DOI: 10.48550/arXiv.2103.00479
ISSN: 2331-8422
Publisher: Cornell University
Language: English
Date posted: 02/20/2021
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984296038302771
