Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives

Alaa Albashayreh; Anindita Bandyopadhyay; Nahid Zeinali; Min Zhang; Weiguo Fan; Stephanie Gilbertson White

doi:10.1200/CCI.23.00235

Journal article

Open access

Peer reviewed

JCO clinical cancer informatics, Vol.8, e2300235

08/01/2024

DOI: 10.1200/CCI.23.00235

PMCID: PMC12493229

PMID: 39116379

Files and links (1)

https://doi.org/10.1200/CCI.23.00235

Published (Version of record) Open Access

Abstract

PURPOSE: Identifying cancer symptoms in electronic health record (EHR) narratives is feasible with natural language processing (NLP). However, more efficient NLP systems are needed to detect various symptoms and distinguish observed symptoms from negated symptoms and medication-related side effects. We evaluated the accuracy of NLP in (1) detecting 14 symptom groups (ie, pain, fatigue, swelling, depressed mood, anxiety, nausea/vomiting, pruritus, headache, shortness of breath, constipation, numbness/tingling, decreased appetite, impaired memory, disturbed sleep) and (2) distinguishing observed symptoms in EHR narratives among patients with cancer.

METHODS: We extracted 902,508 notes for 11,784 unique patients diagnosed with cancer and developed a gold standard corpus of 1,112 notes labeled for presence or absence of 14 symptom groups. We trained an embeddings-augmented NLP system integrating human and machine intelligence and conventional machine learning algorithms. NLP metrics were calculated on a gold standard corpus subset for testing.

RESULTS: The interannotator agreement for labeling the gold standard corpus was excellent at 92%. The embeddings-augmented NLP model achieved the best performance (F1 score = 0.877). The highest NLP accuracy was observed in pruritus (F1 score = 0.937) while the lowest accuracy was in swelling (F1 score = 0.787). After classifying the entire data set with embeddings-augmented NLP, we found that 41% of the notes included symptom documentation. Pain was the most documented symptom (29% of all notes) while impaired memory was the least documented (0.7% of all notes).

CONCLUSION: We illustrated the feasibility of detecting 14 symptom groups in EHR narratives and showed that an embeddings-augmented NLP system outperforms conventional machine learning algorithms in detecting symptom information and differentiating observed symptoms from negated symptoms and medication-related side effects.
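
The abstract reports performance as F1 scores per symptom group. For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for a single symptom label could be computed from note-level gold and predicted labels. It is not the authors' evaluation code; the function name `f1_score` and the toy labels `gold_pain` and `pred_pain` are hypothetical and chosen only for illustration.

```python
def f1_score(gold, predicted):
    """Return F1 for one binary label: the harmonic mean of precision and recall."""
    # Count true positives, false positives, and false negatives note by note.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No correct positive predictions: F1 is 0 by convention here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight notes (1 = symptom observed, 0 = not observed).
gold_pain = [1, 1, 0, 1, 0, 0, 1, 0]
pred_pain = [1, 0, 0, 1, 1, 0, 1, 0]

print(f1_score(gold_pain, pred_pain))  # 3 TP, 1 FP, 1 FN -> 0.75
```

In a multi-label setting such as the 14 symptom groups described here, one plausible approach is to compute this score separately per symptom group and then average; the abstract does not state which averaging scheme was used.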

Details

Title: Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives
Creators: Alaa Albashayreh - University of Iowa, Nursing
Anindita Bandyopadhyay - University of Iowa
Nahid Zeinali - University of Iowa
Min Zhang - Communication University of China
Weiguo Fan - University of Iowa
Stephanie Gilbertson White
Resource Type: Journal article
Publication Details: JCO clinical cancer informatics, Vol.8, e2300235
DOI: 10.1200/CCI.23.00235
PMID: 39116379
PMCID: PMC12493229
NLM abbreviation: JCO Clin Cancer Inform
ISSN: 2473-4276
eISSN: 2473-4276
Publisher: LIPPINCOTT WILLIAMS & WILKINS
Grant note: Supported by the Center for Advancing Multimorbidity Science (NINR, 1P20NR018081) at the University of Iowa College of Nursing, the University of Iowa Institute for Clinical and Translational Science (CTSA, UL1TR002537), and the American Cancer Society, Theory Lab Collaboratory.
Language: English
Date published: 08/01/2024
Academic Unit: Nursing; Business Analytics; Internal Medicine
Record Identifier: 9984696759602771

Metrics

12 Record Views

9 Times Cited - Web of Science