Symptom-BERT: Enhancing Cancer Symptom Detection in EHR Clinical Notes

Nahid Zeinali; Alaa Albashayreh; Weiguo Fan; Stephanie Gilbertson White

doi:10.1016/j.jpainsymman.2024.05.015

Journal article

Peer reviewed

Journal of pain and symptom management, Vol.68(2), pp.190-198.e1

08/2024

PMCID: PMC12433187

PMID: 38789092

Abstract

Extracting cancer symptom documentation allows clinicians to develop highly individualized symptom prediction algorithms to deliver symptom management care. Leveraging advanced language models to detect symptom data in clinical narratives can significantly enhance this process. In this study, we developed a pre-trained language model to detect and extract cancer symptoms in clinical notes, based on a clinical corpus from the Enterprise Data Warehouse for Research at a healthcare system in the Midwestern United States. The study was conducted in four phases: (1) pre-training a Bio-Clinical BERT model on 1 million unlabeled clinical documents; (2) fine-tuning Symptom-BERT to detect 13 cancer symptom groups in 1112 annotated clinical notes; (3) generating 180 synthetic clinical notes with ChatGPT-4 for external validation; and (4) comparing the internal and external performance of Symptom-BERT against a non-pre-trained version and six other BERT implementations. The Symptom-BERT model effectively detected cancer symptoms in clinical notes, achieving a micro-averaged F1-score of 0.933 and an AUC of 0.929 internally, and 0.831 and 0.834, respectively, externally. Our analysis shows that physical symptoms, such as pruritus, are typically identified with higher performance than psychological symptoms, such as anxiety. This study underscores the potential of specialized pre-training on domain-specific data to boost the performance of language models for medical applications. The Symptom-BERT model's strong performance in detecting cancer symptoms marks a step forward for patient-centered AI technologies, offering a promising path to improve symptom management and patient self-care outcomes.
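To make the reported micro-averaged F1-score concrete, the sketch below pools true positives, false positives, and false negatives across every (note, symptom group) pair and then computes a single F1 from those totals, rather than averaging per-group scores. It assumes a multi-label framing in which each note receives one 0/1 indicator per symptom group, and it binarizes model probabilities at a hypothetical threshold of 0.5; the names `binarize`, `micro_f1`, and `DEFAULT_THRESHOLD` are illustrative and are not taken from the Symptom-BERT code.

```python
from typing import List, Sequence

# Hypothetical decision threshold (an assumption): the abstract does not
# state how predicted probabilities were turned into labels.
DEFAULT_THRESHOLD = 0.5


def binarize(probs: Sequence[Sequence[float]],
             threshold: float = DEFAULT_THRESHOLD) -> List[List[int]]:
    """Turn per-note, per-symptom-group probabilities into 0/1 predictions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def micro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all (note, symptom group) cells."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of notes")
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("every note must have one label per symptom group")
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # symptom documented and detected
            elif p:
                fp += 1  # detected but not documented
            elif t:
                fn += 1  # documented but missed
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: 2 notes x 3 symptom groups (illustrative, not data from the study).
    gold = [[1, 0, 1], [0, 1, 0]]
    probs = [[0.9, 0.2, 0.4], [0.1, 0.7, 0.6]]
    preds = binarize(probs)
    print(preds)                            # [[1, 0, 0], [0, 1, 1]]
    print(round(micro_f1(gold, preds), 3))  # 0.667
```

Because micro-averaging pools counts, frequently documented symptom groups contribute more to the overall score than rare ones, so a high micro-averaged F1 can coexist with weaker performance on individual groups such as the psychological symptoms noted above.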

Cancer symptoms

Large language model

Multiclassification

Natural language processing

Details

Title: Symptom-BERT: Enhancing Cancer Symptom Detection in EHR Clinical Notes
Creators: Nahid Zeinali - University of Iowa
Alaa Albashayreh - University of Iowa
Weiguo Fan - University of Iowa
Stephanie Gilbertson White - University of Iowa, Nursing
Resource Type: Journal article
Publication Details: Journal of pain and symptom management, Vol.68(2), pp.190-198.e1
DOI: 10.1016/j.jpainsymman.2024.05.015
PMID: 38789092
PMCID: PMC12433187
NLM abbreviation: J Pain Symptom Manage
ISSN: 0885-3924
eISSN: 1873-6513
Publisher: Elsevier Inc
Grant note: College of Nursing, University of Iowa, Center for Advancing Multimorbidity Science (CAMS) NINR (National Institute for Nursing Research): P20 1P20NR018081 National Cancer Institute (NCI): P30 P30CA086862
Disclosures and Acknowledgments: Declaration of competing interest: None declared. Acknowledgments: All listed authors (N.Z., A.A., W.F., and S.G.W.) meet the four criteria for authorship, as they contributed significantly to the conception and design of this study, the interpretation of results, and the drafting and finalization of the manuscript. N.Z. and A.A. performed further analysis of the model. Special thanks to our research assistants, Andrea Pingol, Anindita Bandyopadhyay, Teagan White, and Min Zhang, for their crucial contributions to this research. Disclosure: This work was supported by the Betty Irene Moore Fellowship for Nurse Leaders and Innovators; College of Nursing, University of Iowa, Center for Advancing Multimorbidity Science (CAMS) NINR (National Institute for Nursing Research) P20 1P20NR018081; Holden Comprehensive Cancer Center, University of Iowa, National Cancer Institute (NCI) P30 P30CA086862; Institute for Clinical and Translational Science, CTSA University of Iowa UL1TR002537; and Iowa Health Data Resource (IHDR) and the University of Iowa https://strategicplan.uiowa.edu/public-private-partnership-p3/p3-program-support-strategic-priorities/p3-proposals-funded-fy-2022. For full transparency, we disclose that this study used ChatGPT, a language model developed by OpenAI, to create the external dataset used in the external validation of our findings.
Language: English
Electronic publication date: 05/22/2024
Date published: 08/2024
Academic Unit: Nursing; Business Analytics; Internal Medicine
Record Identifier: 9984630759202771
