CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2
Abstract
Today's scientific applications running on supercomputers produce large volumes of data, leading to critical data storage and communication challenges. To tackle these challenges, error-bounded lossy compression is commonly adopted, since it can drastically reduce data size within a user-defined error threshold. Previous work has shown that compression techniques can significantly reduce storage and I/O overhead while retaining good data quality. However, existing compressors are mainly designed for CPUs and GPUs. As new AI chips are incorporated into supercomputers and increasingly used to accelerate scientific computing, there is a growing demand for efficient data compression on these new architectures. In this paper, we propose an efficient lossy compressor, CereSZ, for the Cerebras CS-2 system. The compression algorithm is mapped onto Cerebras using both data parallelism and pipeline parallelism. To achieve a balanced workload on each processing unit, we propose an algorithm that evenly distributes the pipeline stages. Our experiments with six scientific datasets demonstrate that CereSZ achieves throughputs of 227.93 GB/s to 773.8 GB/s, 2.43x to 10.98x faster than existing GPU compressors.
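The record does not include the paper's algorithmic details, but error-bounded lossy compressors in the SZ family, which CereSZ builds on, typically predict each value from previously reconstructed data and linearly quantize the residual so that every decompressed value stays within the user-defined error bound. The following is a minimal illustrative sketch of that idea, not the paper's implementation; the function names and the previous-value (1D Lorenzo-style) predictor are assumptions for illustration:

```python
import numpy as np

def compress_1d(data, error_bound):
    """Sketch: predict each value from the previous *reconstructed* value
    and quantize the residual in bins of width 2 * error_bound."""
    quant = np.empty(len(data), dtype=np.int64)   # integer codes (entropy-codable)
    recon = np.empty(len(data), dtype=np.float64) # what the decompressor will see
    prev = 0.0
    for i, x in enumerate(data):
        pred = prev                               # previous-value predictor (assumed)
        code = round((x - pred) / (2.0 * error_bound))
        quant[i] = code
        recon[i] = pred + code * 2.0 * error_bound
        prev = recon[i]                           # predict from reconstructed data
    return quant, recon

def decompress_1d(quant, error_bound):
    """Rebuild the reconstruction from the integer codes alone."""
    recon = np.empty(len(quant), dtype=np.float64)
    prev = 0.0
    for i, code in enumerate(quant):
        recon[i] = prev + code * 2.0 * error_bound
        prev = recon[i]
    return recon
```

Because rounding to the nearest bin keeps each residual within half a bin width, the pointwise error never exceeds the bound, and the decompressor reproduces the exact same reconstruction from the codes.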
Details
- Title
- CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2
- Creators
- Shihui Song - University of Iowa, Iowa City, United States of America
- Yafan Huang - University of Iowa, Iowa City, United States of America
- Peng Jiang - University of Iowa, Iowa City, United States of America
- Xiaodong Yu - Stevens Institute of Technology
- Weijian Zheng - Argonne National Laboratory
- Sheng Di - Argonne National Laboratory
- Qinglei Cao - Saint Louis University
- Yunhe Feng - University of North Texas at Dallas
- Zhen Xie - Binghamton University
- Franck Cappello - Argonne National Laboratory
- Resource Type
- Conference proceeding
- Publication Details
- Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pp.309-321
- Conference
- HPDC '24: 33rd International Symposium on High-Performance Parallel and Distributed Computing
- Series
- ACM Conferences
- DOI
- 10.1145/3625549.3658691
- Publisher
- ACM
- Grant note
U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (ASCR): DE-AC02-06CH11357; National Science Foundation: OAC-2003709, OAC-2104023, OAC-2311875; U.S. DOE Office of Science-Advanced Scientific Computing Research Program: DE-AC02-06CH11357
This research was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357, and by the National Science Foundation under Grants OAC-2003709, OAC-2104023, and OAC-2311875. This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory, and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357.
- Language
- English
- Date published
- 06/03/2024
- Academic Unit
- Computer Science
- Record Identifier
- 9984699518202771