Preprint
Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization
ArXiv.org
04/12/2016
DOI: 10.48550/arxiv.1604.03257
Abstract
Recently, {\it stochastic momentum} methods have been widely adopted in
training deep neural networks. However, their convergence analysis is still
underexplored at the moment, in particular for non-convex optimization. This
paper fills the gap between practice and theory by developing a basic
convergence analysis of two stochastic momentum methods, namely stochastic
heavy-ball method and the stochastic variant of Nesterov's accelerated gradient
method. We hope that the basic convergence results developed in this paper can
serve the reference to the convergence of stochastic momentum methods and also
serve the baselines for comparison in future development of stochastic momentum
methods. The novelty of convergence analysis presented in this paper is a
unified framework, revealing more insights about the similarities and
differences between different stochastic momentum methods and stochastic
gradient method. The unified framework exhibits a continuous change from the
gradient method to Nesterov's accelerated gradient method and finally the
heavy-ball method incurred by a free parameter, which can help explain a
similar change observed in the testing error convergence behavior for deep
learning. Furthermore, our empirical results for optimizing deep neural
networks demonstrate that the stochastic variant of Nesterov's accelerated
gradient method achieves a good tradeoff (between speed of convergence in
training error and robustness of convergence in testing error) among the three
stochastic methods.
Details
- Title: Subtitle
- Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization
- Creators
- Tianbao YangQihang LinZhe Li
- Resource Type
- Preprint
- Publication Details
- ArXiv.org
- DOI
- 10.48550/arxiv.1604.03257
- ISSN
- 2331-8422
- Language
- English
- Date posted
- 04/12/2016
- Academic Unit
- Business Analytics; Computer Science
- Record Identifier
- 9984380583302771
Metrics
7 Record Views