Dissertation
Using second-order information in machine learning
University of Iowa
Doctor of Philosophy (PhD), University of Iowa
Spring 2025
DOI: 10.25820/etd.007852
Abstract
This thesis proposes second-order optimization algorithms for support vector machines (SVM) and autoencoders. In the first part of this thesis, we propose a smoothing algorithm for solving the soft-margin SVM optimization problem with an $\ell^1$\,penalty. In the second part, we present a stochastic Hessian-free optimization method for training autoencoders by reducing the amount of data used in training.
A smoothing algorithm is presented for solving the soft-margin SVM optimization problem with an $\ell^1$\,penalty. This algorithm is designed to minimize the number of passes over the data, which is a crucial factor for its cost-effectiveness when working with very large datasets. The algorithm employs smoothing for the hinge-loss function and utilizes an active set approach for the $\ell^1$\,penalty. The smoothing parameter $\alpha$ is initially set to a large value and is typically halved when the smoothed problem is solved to a satisfactory level of accuracy. The convergence theory indicates that the algorithm requires $\mathcal{O}(1+\log(1+\log_+(1/\alpha)))$ guarded Newton steps for each value of $\alpha$, except for two asymptotic ranges: $\alpha = \Theta(1)$ and $\alpha = \Theta(1/N)$, where $N$ is the number of data points. In cases where $\alpha \gg 1/N$, only one Newton step is needed. Experimental results demonstrate that our algorithm achieves high test accuracy without compromising training speed.
Hessian-free (HF) optimization has proven to be an effective method for training deep autoencoders. In the second part of this thesis, we aim to accelerate HF training of autoencoders by reducing the amount of data used during the training process. HF utilizes the conjugate gradient algorithm to estimate update directions. Instead, we propose using the LSMR method, which is known for effectively solving large sparse linear systems. We also incorporate Chapelle and Erhan (2011)'s improved preconditioner for HF optimization. In addition, we introduce a new mini-batch selection algorithm to mitigate overfitting. Our algorithm starts with a small subset of the training data. It gradually increases the mini-batch size based on (i) variance estimates obtained during the computation of a mini-batch gradient and (ii) the relative decrease in objective value for the validation data. Our experimental results demonstrate that using the LSMR method and the new sample selection algorithm, our stochastic Hessian-free optimization leads to rapid training of deep autoencoders with improved generalization error.
Details
- Title: Subtitle
- Using second-order information in machine learning
- Creators
- Ibrahim Emirahmetoglu
- Contributors
- David E Stewart (Advisor)Bruce Ayati (Committee Member)Samuel Burer (Committee Member)Laurent Jay (Committee Member)Xueyu Zhu (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Applied Mathematical and Computational Sciences
- Date degree season
- Spring 2025
- DOI
- 10.25820/etd.007852
- Publisher
- University of Iowa
- Number of pages
- x, 86 pages
- Copyright
- Copyright 2025 Ibrahim Emirahmetoglu
- Language
- English
- Date submitted
- 04/16/2025
- Description illustrations
- illustrations, tables, graphs
- Description bibliographic
- Includes bibliographical references (pages 83-86).
- Public Abstract (ETD)
- In recent years, we have been observing how artificial intelligence enhances our lives by making them safer, easier, and healthier in numerous ways. One of the building blocks of artificial intelli- gence is machine learning. At the core of almost all machine learning algorithms is an optimization algorithm. In this thesis, we propose new optimization algorithms for various machine learning problems. In particular, we focus on second-order optimization methods, which use the curvature (Hessian) information along with the slope (gradient) information. Support vector machines are classification algorithms widely used to solve various real-world problems. One challenge in using these algorithms is overfitting, which occurs when a machine learning model performs well on training data but poorly on new, unseen data. An overfit model tends to produce inaccurate predictions, leading to poor generalization. To mitigate overfitting in classification tasks, an `1 penalty term can be added to the objective function. However, this introduces another complication: non-smoothness. We propose a smoothing algorithm to address the soft-margin SVM optimization problem with an `1 penalty. This algorithm is designed to be effective for very large datasets. Experimental results demonstrate that our algorithm achieves strong test accuracy while maintaining efficient training speeds. Autoencoders are an unsupervised learning technique built upon neural networks for feature learning. First-order methods like gradient-descent algorithms are used to train such neural net- works. However, these methods can progress very slowly when applied to deep autoencoders, leading to poor performance on the training set. To address this issue, we turn to second-order methods, specifically the Hessian-free (HF) optimization, to train deep autoencoders. One draw- back of HF optimization is that it requires solving large linear systems, which can be quite costly. To mitigate this expense, we employ the LSMR method, a conjugate-gradient technique designed for solving sparse linear equations. Our experimental results show that combining HF optimization with the LSMR method yields a fast training of deep autoencoders with better generalization error.
- Academic Unit
- Interdisciplinary Graduate Program in Applied Mathematical & Computational Sciences
- Record Identifier
- 9984830728602771
Metrics
1 File views/ downloads
4 Record Views