Journal article
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Proceedings of the VLDB Endowment, Vol.15(4), pp.886-899
12/01/2021
DOI: 10.14778/3503585.3503597
Abstract
Deep neural networks (DNNs) are becoming increasingly deeper, wider, and non-linear due to the growing demands on prediction accuracy and analysis quality. Training wide and deep neural networks requires large amounts of storage resources such as memory because the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs are equipped with only very limited memory capacities due to hardware design constraints, which significantly limits the maximum batch size and hence the performance speedup when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or specific interconnect technology.
In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory requirement for training in order to allow training larger models or to accelerate training. Our framework purposely adopts error-bounded lossy compression with a strict error-controlling mechanism. Specifically, we perform a theoretical analysis on the compression error propagation from the altered activation data to the gradients, and empirically investigate the impact of altered gradients over the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. Experiments demonstrate that our proposed framework can significantly reduce the training memory consumption by up to 13.5x over the baseline training and 1.8x over another state-of-the-art compression-based framework, respectively, with little or no accuracy loss.
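The core idea in the abstract, compressing activation data under a strict error bound, can be illustrated with a minimal sketch. This is not COMET's compressor or its adaptive error-bound scheme; it is a generic uniform quantizer, and the names `compress`, `decompress`, and `error_bound` are hypothetical, chosen only to show what an absolute error bound guarantees.

```python
import random


def compress(values, error_bound):
    """Map each value to an integer bin index (bins are 2 * error_bound wide).

    Illustrative only: a real compressor would also entropy-code the indices.
    """
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def decompress(codes, error_bound):
    """Reconstruct each value as the center of its bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


# Stand-in for activation data saved during forward propagation (assumed values).
random.seed(0)
activations = [random.uniform(-1.0, 1.0) for _ in range(1000)]
error_bound = 0.01

codes = compress(activations, error_bound)
restored = decompress(codes, error_bound)

# Every reconstructed value lies within error_bound of the original
# (up to floating-point rounding).
max_err = max(abs(a - r) for a, r in zip(activations, restored))
```

In this sketch a larger `error_bound` yields fewer distinct bin indices, which is what makes the data more compressible, at the cost of a larger perturbation of the values later used to compute gradients.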
Details
- Title: Subtitle
- COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
- Creators
- Sian Jin - Washington State University
- Chengming Zhang - Washington State University
- Xintong Jiang - McGill University
- Yunhe Feng - University of Washington
- Hui Guan - University of Massachusetts Amherst
- Guanpeng Li - University of Iowa
- Shuaiwen Leon Song - University of Sydney
- Dingwen Tao - Washington State University
- Resource Type
- Journal article
- Publication Details
- Proceedings of the VLDB Endowment, Vol.15(4), pp.886-899
- Publisher
- Assoc Computing Machinery
- DOI
- 10.14778/3503585.3503597
- ISSN
- 2150-8097
- eISSN
- 2150-8097
- Number of pages
- 14
- Grant note
- Facebook Faculty Award; Facebook Inc
- OAC-2034169; OAC-2042084 / National Science Foundation; National Science Foundation (NSF)
- DP210101984 / Australian Research Council
- Language
- English
- Date published
- 12/01/2021
- Academic Unit
- Computer Science
- Record Identifier
- 9984411090002771