Journal article
Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography-Mass Spectrometry Small Molecule Identification
Metabolites, Vol.13(10), p.1101
10/21/2023
DOI: 10.3390/metabo13101101
PMCID: PMC10608912
PMID: 37887426
Abstract
Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this "molecular snapshot" is only as informative as the number of metabolites confidently identified within it. The spectral similarity (SS) score is traditionally used to identify compounds in mass spectrometry approaches to metabolomics, where spectra are matched to reference libraries of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. The lack of a standard SS score creates analytic uncertainty and can lead to reproducibility issues, especially as these data are integrated across other domains. In this work, we use metabolomic spectral similarity as a case study to showcase the consistency challenges within just one piece of the One Health framework that must be addressed to enable data science approaches for One Health problems. Using a large cohort of both standard and complex datasets with expert-verified truth annotations, we evaluated how effectively 66 similarity metrics distinguish correct matches (true positives) from incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families of scores) tend to perform better than others, with no single similarity metric performing optimally for all queried spectra. This work and its findings provide an empirically based resource for researchers to use in selecting similarity metrics for GC-MS identification, increasing scientific reproducibility by taking steps toward standardized identification workflows.
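As an illustration of the library-matching step the abstract describes, the sketch below scores a query spectrum against candidate reference spectra using cosine similarity, a score commonly grouped with inner-product style metrics. It is a minimal sketch only: the spectra, the candidate names, and the helper functions `cosine_similarity` and `rank_candidates` are hypothetical, and nothing here is taken from the article's own code or data.

```python
import math


def cosine_similarity(query, reference):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Peaks missing from one spectrum are treated as zero intensity.
    """
    mz_values = set(query) | set(reference)
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mz_values)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_q * norm_r)


def rank_candidates(query, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Toy spectra (made-up values, for illustration only).
query = {73: 100.0, 147: 45.0, 217: 20.0}
library = {
    "candidate_A": {73: 95.0, 147: 50.0, 217: 18.0},  # shares all peaks
    "candidate_B": {57: 100.0, 71: 60.0, 85: 30.0},   # shares no peaks
}

ranked = rank_candidates(query, library)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Swapping `cosine_similarity` for another metric changes only the scoring function, which is why the choice of SS score can alter which candidate ranks first.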
Details
- Title: Subtitle
- Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography-Mass Spectrometry Small Molecule Identification
- Creators
- David J. Degnan - Pacific Northwest National Laboratory; Javier E. Flores - Pacific Northwest National Laboratory; Eva R. Brayfindley - Pacific Northwest National Laboratory; Vanessa L. Paurus - Environmental Molecular Sciences Laboratory; Bobbie-Jo M. Webb-Robertson - Pacific Northwest Natl Lab, Biol Sci Div, Richland, WA 99354 USA; Chaevien S. Clendinen - Environmental Molecular Sciences Laboratory; Lisa M. Bramer - Pacific Northwest National Laboratory
- Resource Type
- Journal article
- Publication Details
- Metabolites, Vol.13(10), p.1101
- DOI
- 10.3390/metabo13101101
- PMID
- 37887426
- PMCID
- PMC10608912
- NLM abbreviation
- Metabolites
- ISSN
- 2218-1989
- eISSN
- 2218-1989
- Publisher
- MDPI
- Number of pages
- 12
- Grant note
- m/q Initiative at Pacific Northwest National Laboratory (PNNL); PNNL Laboratory Directed Research and Development program; DE-AC05-76RL01830 / U.S. Department of Energy (DOE) by Battelle Memorial Institute; United States Department of Energy (DOE)
- Language
- English
- Date published
- 10/21/2023
- Academic Unit
- Biostatistics
- Record Identifier
- 9985113181702771