Journal article
Systematic data quality assessment of electronic health record data to evaluate study-specific fitness: Report from the PRESERVE research study
PLOS digital health, Vol.3(6), e0000527
06/27/2024
DOI: 10.1371/journal.pdig.0000527
PMCID: PMC11210795
PMID: 38935590
Abstract
Study-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study's outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. 
Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.
Details
- Title: Subtitle
- Systematic data quality assessment of electronic health record data to evaluate study-specific fitness: Report from the PRESERVE research study
- Creators
- Hanieh Razzaghi - Children's Hospital of Philadelphia; Amy Goodwin Davies; Samuel Boss - Children's Hospital of Philadelphia; H Timothy Bunnell - Dupont Hospital; Yong Chen - University of Pennsylvania; Elizabeth A Chrischilles - University of Iowa; Kimberley Dickinson - Children's Hospital of Philadelphia; David Hanauer - University of Michigan–Ann Arbor; Yungui Huang - Nationwide Children's Hospital; K T Sandra Ilunga - Children's Hospital of Philadelphia; Chryso Katsoufis - University of Miami; Harold Lehmann; Dominick J Lemas - University of Florida Health; Kevin Matthews - Children's Hospital Colorado; Eneida A Mendonca - Cincinnati Children's Hospital Medical Center; Keith Morse - Stanford Medicine; Daksha Ranade - Seattle Children's Hospital; Marc Rosenman - Lurie Children's Hospital; Bradley Taylor - Medical College of Wisconsin; Kellie Walters - University of North Carolina at Chapel Hill; Michelle R Denburg - Children's Hospital of Philadelphia; Christopher B. Forrest - Children's Hospital of Philadelphia; L Charles Bailey - University of Pennsylvania
- Resource Type
- Journal article
- Publication Details
- PLOS digital health, Vol.3(6), e0000527
- DOI
- 10.1371/journal.pdig.0000527
- PMID
- 38935590
- PMCID
- PMC11210795
- ISSN
- 2767-3170
- eISSN
- 2767-3170
- Language
- English
- Date published
- 06/27/2024
- Academic Unit
- Pharmacy; Epidemiology
- Record Identifier
- 9984649154202771