CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2
Conference proceeding

Shihui Song, Yafan Huang, Peng Jiang, Xiaodong Yu, Weijian Zheng, Sheng Di, Qinglei Cao, Yunhe Feng, Zhen Xie and Franck Cappello
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pp.309-321
ACM Conferences
HPDC '24: 33rd International Symposium on High-Performance Parallel and Distributed Computing
06/03/2024
DOI: 10.1145/3625549.3658691

Abstract

Today's scientific applications running on supercomputers produce large volumes of data, leading to critical data storage and communication challenges. To tackle these challenges, error-bounded lossy compression is commonly adopted, since it can reduce data size drastically while keeping the error within a user-defined threshold. Previous work has shown that compression techniques can significantly reduce storage and I/O overhead while retaining good data quality. However, existing compressors are mainly designed for CPUs and GPUs. As new AI chips are incorporated into supercomputers and increasingly used to accelerate scientific computing, there is a growing demand for efficient data compression on these new architectures. In this paper, we propose an efficient lossy compressor, CereSZ, based on the Cerebras CS-2 system. The compression algorithm is mapped onto Cerebras using both data parallelism and pipeline parallelism. To achieve a balanced workload on each processing unit, we propose an algorithm that evenly distributes the pipeline stages. Our experiments on six scientific datasets demonstrate that CereSZ achieves throughput between 227.93 GB/s and 773.8 GB/s, which is 2.43x to 10.98x faster than existing GPU compressors.
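
As a rough illustration of what "error-bounded" means in the abstract, the sketch below quantizes each value to the nearest multiple of twice the error bound, so that every reconstructed value stays within the user-defined threshold. It is a generic linear-quantization example under stated assumptions, not the CereSZ algorithm or its mapping onto the CS-2; the function names `quantize` and `reconstruct` and the sample values are hypothetical.

```python
def quantize(values, error_bound):
    """Map each float to an integer code (hypothetical helper, not from the paper)."""
    step = 2.0 * error_bound  # bin width; rounding error is at most half a bin
    return [round(v / step) for v in values]


def reconstruct(codes, error_bound):
    """Recover approximate floats from the integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


# Illustrative data and threshold (assumed values, not from the paper's datasets).
values = [0.012, 1.507, -3.3301, 2.257, 1000.0004]
error_bound = 1e-2

codes = quantize(values, error_bound)
restored = reconstruct(codes, error_bound)

# Largest pointwise deviation introduced by the lossy step.
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

The integer codes are typically far more compressible than the original floating-point values, which is where the size reduction in such schemes comes from; a real compressor would follow this step with prediction and lossless encoding.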
Computing methodologies -- Parallel computing methodologies -- Parallel algorithms -- Massively parallel algorithms
Theory of computation -- Design and analysis of algorithms -- Data structures design and analysis -- Data compression
