COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression

Sian Jin; Chengming Zhang; Xintong Jiang; Yunhe Feng; Hui Guan; Guanpeng Li; Shuaiwen Leon Song; Dingwen Tao

doi:10.14778/3503585.3503597

Journal article

Peer reviewed

Proceedings of the VLDB Endowment, Vol.15(4), pp.886-899

12/01/2021

Abstract

Deep neural networks (DNNs) are becoming increasingly deep, wide, and non-linear due to growing demands on prediction accuracy and analysis quality. Training wide and deep neural networks requires large amounts of memory, because the intermediate activation data must be saved during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs have very limited memory capacity due to hardware design constraints, which significantly limits the maximum batch size, and hence the achievable speedup, when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or a specific interconnect technology. In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory required for training, allowing larger models to be trained or training to be accelerated. Our framework deliberately adopts error-bounded lossy compression with a strict error-control mechanism. Specifically, we perform a theoretical analysis of how compression error propagates from the altered activation data to the gradients, and empirically investigate the impact of the altered gradients on the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. Experiments demonstrate that our proposed framework can reduce training memory consumption by up to 13.5x compared with baseline training and by up to 1.8x compared with another state-of-the-art compression-based framework, with little or no accuracy loss.
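To make "error-bounded lossy compression" concrete, the following is a minimal sketch of the general prediction-plus-quantization idea used by error-bounded compressors such as SZ: each value is predicted from the previously reconstructed value, the prediction error is quantized into bins of width 2·eb, and the resulting integer codes are losslessly compressed. This guarantees that every reconstructed value lies within eb of the original. It is not COMET's implementation: the function names `compress`/`decompress`, the fixed error bound `eb`, the 1-D previous-value predictor, the use of zlib as the lossless stage, and the synthetic ReLU-like test data are all illustrative assumptions, and COMET's adaptive error-bound control and gradient error analysis are not modeled here.

```python
import zlib

import numpy as np


def compress(data: np.ndarray, eb: float) -> tuple[bytes, tuple[int, ...]]:
    """Illustrative (hypothetical) error-bounded compressor with absolute bound `eb`.

    Predictor: the previously *reconstructed* value (1-D Lorenzo-style), so the
    decoder can reproduce every prediction from the integer codes alone.
    """
    step = 2.0 * eb                       # quantization bin width
    flat = data.ravel().astype(np.float64)
    codes = np.empty(flat.size, dtype=np.int64)
    s = 0                                 # running sum of codes; prediction = s * step
    for i, x in enumerate(flat):
        q = int(np.round((x - s * step) / step))  # quantize the prediction error
        codes[i] = q
        s += q                            # reconstructed value is now s * step
    # Lossless stage: small, repetitive integer codes compress well with zlib.
    return zlib.compress(codes.tobytes(), 9), data.shape


def decompress(blob: bytes, shape: tuple[int, ...], eb: float) -> np.ndarray:
    """Invert `compress`: replay the predictor as a prefix sum of the codes."""
    step = 2.0 * eb
    codes = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
    return (np.cumsum(codes) * step).reshape(shape)


# Usage on synthetic ReLU-like "activation" data (an assumption, not real training data).
acts = np.maximum(np.sin(np.linspace(0.0, 200.0, 50_000)), 0.0).astype(np.float32)
eb = 1e-3
blob, shape = compress(acts, eb)
recon = decompress(blob, shape, eb)
print("max abs error:", float(np.abs(acts - recon).max()), "bound:", eb)
print("compression ratio vs float32:", acts.nbytes / len(blob))
```

The design choice that matters is predicting from reconstructed rather than original values: the decoder only ever sees reconstructed data, so this keeps encoder and decoder predictions identical and stops quantization error from accumulating beyond eb.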

Computer Science

Computer Science, Information Systems

Computer Science, Theory & Methods

Science & Technology

Technology

Details

Title: COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Creators: Sian Jin - Washington State University
Chengming Zhang - Washington State University
Xintong Jiang - McGill University
Yunhe Feng - University of Washington
Hui Guan - University of Massachusetts Amherst
Guanpeng Li - University of Iowa
Shuaiwen Leon Song - University of Sydney
Dingwen Tao - Washington State University
Resource Type: Journal article
Publication Details: Proceedings of the VLDB Endowment, Vol.15(4), pp.886-899
Publisher: Association for Computing Machinery
DOI: 10.14778/3503585.3503597
ISSN: 2150-8097
eISSN: 2150-8097
Number of pages: 14
Grant note: Facebook Faculty Award / Facebook Inc; OAC-2034169, OAC-2042084 / National Science Foundation (NSF); DP210101984 / Australian Research Council
Language: English
Date published: 12/01/2021
Academic Unit: Computer Science
Record Identifier: 9984411090002771

Metrics

3 Times Cited - Web of Science