Preprint
Lossy Compression of Scientific Data: Applications Constrains and Requirements
arXiv.org
Cornell University
03/25/2025
DOI: 10.48550/arxiv.2503.20031
Abstract
Increasing data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction. Reduction can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on a specific technique: lossy compression. Lossy compression retains all data points, leveraging correlations and controlled reduced accuracy. Quality constraints, especially for quantities of interest, are crucial for preserving scientific discoveries. User requirements also include compression ratio and speed. While many papers have been published on lossy compression techniques and reference datasets are shared by the community, there is a lack of detailed specifications of application needs that can guide lossy compression researchers and developers. This report fills this gap by reporting on the requirements and constraints of nine scientific applications covering a large spectrum of domains (climate, combustion, cosmology, fusion, light sources, molecular dynamics, quantum circuit simulation, seismology, and system logs). The report also details key lossy compression technologies (SZ, ZFP, MGARD, LC, SPERR, DCTZ, TEZip, LibPressio), discussing their history, principles, error control, hardware support, features, and impact. By presenting both application needs and compression technologies, the report aims to inspire new research to fill existing gaps.
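The abstract describes lossy compression as keeping every data point while exploiting correlations and reducing accuracy in a controlled way. The sketch below illustrates that idea under stated assumptions: a 1D float64 array, a user-chosen absolute error bound `eb`, uniform quantization to bins of width `2*eb`, and delta coding of the bin indices as a simple previous-value predictor. It is not the algorithm of SZ, ZFP, MGARD, or any other compressor named above. The functions `compress` and `decompress` are hypothetical, and `zlib` stands in for the entropy coders that real compressors use.

```python
import zlib

import numpy as np


def compress(data: np.ndarray, eb: float) -> bytes:
    """Hypothetical error-bounded sketch: quantization + delta coding + zlib."""
    # Map each value to an integer bin of width 2*eb. Rounding keeps
    # |x - 2*eb*q| <= eb, so every point is retained within the bound.
    q = np.round(data / (2 * eb)).astype(np.int64)
    # Exploit correlation between neighbors: store differences of bin
    # indices. This step is lossless on q, so errors do not accumulate.
    residuals = np.diff(q, prepend=0)
    # Smooth data yields many small residuals that zlib packs tightly.
    return zlib.compress(residuals.tobytes(), 9)


def decompress(blob: bytes, eb: float) -> np.ndarray:
    """Invert the hypothetical sketch: undo zlib, prefix-sum, dequantize."""
    residuals = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
    q = np.cumsum(residuals)
    return q * (2 * eb)


# Usage: a smooth synthetic signal with a small high-frequency component.
x = np.linspace(0, 10, 100_000)
data = np.sin(x) + 0.01 * np.cos(50 * x)
eb = 1e-3
blob = compress(data, eb)
recon = decompress(blob, eb)
print("max abs error:", np.max(np.abs(recon - data)))
print("compression ratio:", data.nbytes / len(blob))
```

Tightening `eb` improves accuracy but produces larger residuals and a lower compression ratio. This is the quality-versus-ratio trade-off that the application requirements in the report are meant to guide.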
Details
- Title: Subtitle
- Lossy Compression of Scientific Data: Applications Constrains and Requirements
- Creators
- Franck Cappello - Argonne National Laboratory
- Allison Baker - NSF National Center for Atmospheric Research
- Ebru Bozdağ - Colorado School of Mines
- Martin Burtscher - Texas State University
- Kyle Chard - University of Chicago
- Sheng Di - Argonne National Laboratory
- Paul Christopher O'Grady - Stanford University
- Peng Jiang - University of Iowa
- Shaomeng Li - NSF National Center for Atmospheric Research
- Erik Lindahl - Stockholm University
- Peter Lindstrom - Lawrence Livermore National Laboratory
- Magnus Lundborg
- Kai Zhao - Florida State University
- Xin Liang - University of Kentucky
- Masaru Nagaso - Colorado School of Mines
- Kento Sato
- Amarjit Singh
- Seung Woo Son
- Dingwen Tao - Indiana University Bloomington
- Jiannan Tian - Indiana University Bloomington
- Robert Underwood
- Kazutomo Yoshii
- Danylo Lykov
- Yuri Alexeev
- Kyle Gerard Felker
- Resource Type
- Preprint
- Publication Details
- arXiv.org
- DOI
- 10.48550/arxiv.2503.20031
- ISSN
- 2331-8422
- Publisher
- Cornell University; Ithaca, New York
- Language
- English
- Date posted
- 03/25/2025
- Academic Unit
- Computer Science
- Record Identifier
- 9984802409602771