Journal article
Boosting Social Determinants of Health Extraction with Semantic Knowledge Augmented Large Language Model
AMIA ... Annual Symposium proceedings, Vol.2024, pp.453-462
2024
PMCID: PMC12099417
PMID: 40417469
Abstract
Social determinants of health (SDoH) significantly impact health outcomes and contribute to perpetuating health disparities across healthcare applications. However, automatic extraction of SDoH information from Electronic Health Records (EHRs) is challenging because the clinical narratives that contain SDoH-related information are unstructured. Recent advances in Large Language Models (LLMs) have shown great promise for automated SDoH extraction, but their performance suffers on imbalanced SDoH categories due to data scarcity. To address this, we propose an approach that augments LLMs with semantic knowledge obtained from the Unified Medical Language System (UMLS). This strategy enriches the feature representations of imbalanced SDoH classes, leading to more accurate SDoH extraction. More specifically, our proposed data augmentation strategy generates semantically enriched clinical narratives at the LLM pre-finetuning stage. This enables the LLM to better adapt to the target data and provides a good initialization for the finetuning stage. In extensive experiments on the publicly available MIMIC-SDoH data, the proposed approach demonstrates significant improvement in SDoH extraction, especially for the imbalanced classes.
Details
- Title: Subtitle
- Boosting Social Determinants of Health Extraction with Semantic Knowledge Augmented Large Language Model
- Creators
- Lei Gong - University of Virginia
- Jaren Bresnick - University of Virginia
- Aidong Zhang - University of Virginia
- Cathy Wu - University of Delaware
- Kishlay Jha - University of Iowa
- Resource Type
- Journal article
- Publication Details
- AMIA ... Annual Symposium proceedings, Vol.2024, pp.453-462
- PMID
- 40417469
- PMCID
- PMC12099417
- NLM abbreviation
- AMIA Annu Symp Proc
- ISSN
- 1942-597X
- eISSN
- 1559-4076
- Language
- English
- Date published
- 2024
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984824164502771