Nonconvex min-max optimization in deep learning: algorithms and applications

Mingrui Liu

doi:10.17077/etd.005562

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Summer 2020

Abstract

Nonconvex min-max optimization has received increasing attention in modern machine learning, especially in deep learning. Examples include stochastic AUC maximization with deep neural networks and Generative Adversarial Nets (GANs), which correspond to nonconvex-concave and nonconvex-nonconcave min-max problems, respectively. The classical theory of min-max optimization focuses mainly on the convex-concave setting, which does not apply to deep learning applications with nonconvex min-max formulations. This raises a natural question: how can we design provably efficient algorithms for nonconvex min-max problems in deep learning? To answer this question, this dissertation focuses on the following four concrete aspects.

First, we consider stochastic AUC maximization with deep neural networks as predictive models. Building on a saddle-point reformulation of a surrogate loss of AUC, the problem can be cast as a nonconvex-concave min-max problem. We exploit the Polyak-Lojasiewicz (PL) condition, which has been proved and observed to hold in deep learning, and develop new stochastic algorithms with faster convergence rates and a more practical step-size scheme.

Second, we study first-order convergence theory for weakly-convex-weakly-concave min-max problems. We propose an algorithmic framework motivated by the inexact proximal point method, in which the weakly monotone variational inequality (VI) corresponding to the original min-max problem is solved by approximately solving a sequence of strongly monotone VIs, each constructed by adding a strongly monotone mapping to the original gradient mapping. We prove first-order non-asymptotic convergence to a nearly stationary point of the original min-max problem in polynomial time.

Third, we consider a class of nonconvex-nonconcave min-max problems with a focus on GAN training. Although adaptive gradient methods with alternating updates work well empirically for training GANs, they require expensive tuning, lack theoretical convergence guarantees, and may diverge in practice. To bridge this gap, we design an adaptive gradient algorithm for training GANs with provably faster convergence than its non-adaptive counterpart.

Fourth, we consider large-scale GAN training in a decentralized distributed manner. Decentralized parallel algorithms are more robust to network bandwidth and latency than their centralized counterparts and help protect users' privacy, but decentralized algorithms for nonconvex-nonconcave min-max optimization have not been considered in the existing literature. We propose and analyze a decentralized algorithm for training GANs and prove its convergence to a first-order stationary point in polynomial time.
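
The saddle-point reformulation behind the first aspect can be illustrated with the square-loss AUC surrogate of Ying, Wen, and Lyu (2016), a reformulation widely used for stochastic AUC maximization; the abstract does not name the surrogate, so this choice is an assumption. For a fixed scorer h with positive-class ratio p, the pairwise loss E[(1 - h(x) + h(x'))^2] over positive x and negative x' equals 1 + min_{a,b} max_alpha E[F(a, b, alpha; z)] / (p(1-p)), where F is defined per single example, convex in (a, b) and concave in alpha. When h is a deep network, F is nonconvex in the network weights, which gives the nonconvex-concave structure described above. The minimal sketch below, with hypothetical function names and random scores standing in for network outputs, checks this identity numerically at the closed-form inner solutions.

```python
import random


def auc_saddle_objective(scores, labels, a, b, alpha):
    """Sample average of the square-loss AUC saddle function F(a, b, alpha; z)
    (assumed form, after Ying et al., 2016). scores are h(x); labels are +1/-1."""
    n = len(scores)
    p = sum(1 for y in labels if y == 1) / n  # fraction of positive examples
    total = 0.0
    for s, y in zip(scores, labels):
        if y == 1:
            total += (1 - p) * (s - a) ** 2 - 2 * (1 + alpha) * (1 - p) * s
        else:
            total += p * (s - b) ** 2 + 2 * (1 + alpha) * p * s
        total -= p * (1 - p) * alpha ** 2  # concave term in the dual variable
    return total / n


def pairwise_square_auc_loss(scores, labels):
    """Mean of (1 - h(x) + h(x'))^2 over all (positive x, negative x') pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    return sum((1 - sp + sn) ** 2 for sp in pos for sn in neg) / (len(pos) * len(neg))


random.seed(0)
labels = [1 if random.random() < 0.3 else -1 for _ in range(200)]
# Hypothetical stand-in for network outputs h(x): positives score higher on average.
scores = [random.gauss(0.5 * y, 1.0) for y in labels]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y != 1]
p = len(pos) / len(scores)

# Closed-form inner solutions for a fixed scorer: class means and their gap.
a_star = sum(pos) / len(pos)
b_star = sum(neg) / len(neg)
alpha_star = b_star - a_star

saddle_value = auc_saddle_objective(scores, labels, a_star, b_star, alpha_star)
pairwise_value = p * (1 - p) * (pairwise_square_auc_loss(scores, labels) - 1)
print(abs(saddle_value - pairwise_value) < 1e-9)  # True
```

Because F is defined per example (given p), an unbiased stochastic gradient needs only one example at a time rather than a positive-negative pair, which is what makes this kind of reformulation convenient for stochastic training.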

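The inexact proximal point framework of the second aspect can be sketched on a two-dimensional toy game, f(x, y) = -0.1x^2 + xy + 0.1y^2, chosen here for illustration (it does not come from the dissertation). It is nonconvex in x and nonconcave in y but 0.2-weakly convex-weakly concave, so its gradient mapping F(z) = (df/dx, -df/dy) is weakly monotone. Each outer stage adds the strongly monotone mapping (z - z_k)/gamma to F, which makes the stage's VI strongly monotone whenever 1/gamma > 0.2, and approximately solves it with a fixed number of forward steps. The step sizes, stage counts, and function names are hypothetical, and the inner solver is deterministic rather than stochastic; this is a sketch under those assumptions, not the dissertation's algorithm.

```python
import math


def F(z):
    """Gradient mapping (df/dx, -df/dy) of the toy game
    f(x, y) = -0.1*x**2 + x*y + 0.1*y**2 (0.2-weakly monotone)."""
    x, y = z
    return (-0.2 * x + y, -(x + 0.2 * y))


def inexact_proximal_point(z0, gamma=1.0, eta=0.1, stages=40, inner_steps=50):
    """Outer loop moves the proximal center z_k; the inner loop takes forward steps
    on the strongly monotone mapping F(z) + (z - z_k) / gamma."""
    z_k = z0
    for _ in range(stages):
        z = z_k  # warm start at the current center
        for _ in range(inner_steps):
            g = F(z)
            z = tuple(zi - eta * (gi + (zi - ci) / gamma)
                      for zi, gi, ci in zip(z, g, z_k))
        z_k = z  # approximate solution of this stage's strongly monotone VI
    return z_k


def gradient_descent_ascent(z0, eta=0.1, steps=2000):
    """Plain simultaneous GDA with the same gradient budget (40 * 50 = 2000)."""
    z = z0
    for _ in range(steps):
        g = F(z)
        z = tuple(zi - eta * gi for zi, gi in zip(z, g))
    return z


z0 = (1.0, 1.0)
print(math.hypot(*inexact_proximal_point(z0)))   # well below 1e-2: near the stationary point (0, 0)
print(math.hypot(*gradient_descent_ascent(z0)))  # very large: GDA spirals away from (0, 0)
```

In this toy setting each stage contracts the distance to (0, 0) because the regularized subproblem is strongly monotone even though F itself is not monotone, whereas plain GDA on the same game expands that distance at every step.
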
Algorithms

Operations Research

Optimization

Applications

Deep Learning

Min-max

Nonconvex

Details

Title: Nonconvex min-max optimization in deep learning: algorithms and applications
Creators: Mingrui Liu
Contributors: Tianbao Yang (Advisor)
Qihang Lin (Committee Member)
Kasturi Varadarajan (Committee Member)
Suely Oliveira (Committee Member)
Weiyu Xu (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Computer Science
Date degree season: Summer 2020
DOI: 10.17077/etd.005562
Publisher: University of Iowa
Number of pages: xv, 168 pages
Comment: This thesis has been optimized for improved web viewing. If you require the original version, contact the University Archives at the University of Iowa: https://www.lib.uiowa.edu/sc/contact/
Language: English
Description illustrations: color illustrations
Description bibliographic: Includes bibliographical references (pages 149-168).
Public Abstract (ETD): Nonconvex min-max optimization has received increasing attention in modern machine learning, especially in deep learning. Examples include stochastic AUC maximization with deep neural networks and Generative Adversarial Nets (GANs), which correspond to nonconvex-concave and nonconvex-nonconcave min-max problems, respectively. The classical theory of min-max optimization focuses mainly on the convex-concave setting, which does not apply to deep learning applications with nonconvex min-max formulations. This raises a natural question: how can we design provably efficient algorithms for nonconvex min-max problems in deep learning? To answer this question, this dissertation focuses on two important deep learning applications: stochastic AUC maximization with deep neural networks, and GANs. For stochastic AUC maximization with deep neural networks, we treat the problem as a nonconvex-concave min-max problem, exploit a special property called the Polyak-Lojasiewicz (PL) condition that has been proved and observed in deep learning, and develop provably fast stochastic algorithms with a practical step-size scheme. For GANs, we view the problem as a nonconvex-nonconcave min-max problem and tackle it from three perspectives: a first-order convergence theory for weakly-convex-weakly-concave min-max problems with corresponding algorithms, a provably efficient adaptive gradient algorithm for training GANs, and a decentralized parallel algorithm for training GANs.
Academic Unit: Computer Science
Record Identifier: 9983988296702771
