Using second-order information in machine learning

Ibrahim Emirahmetoglu

doi:10.25820/etd.007852

Back

Using second-order information in machine learning

Dissertation

Open access

Using second-order information in machine learning

Ibrahim Emirahmetoglu

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Spring 2025

DOI: 10.25820/etd.007852

Files and links (2)

pdf

Ibrahim-Emirahmetoglu-Thesis725.20 kBDownload View

Free to read and download, Open Access

zip

latex files for dissertation190.25 kBDownload View

supplementalFree to read and download, Open Access

Abstract

This thesis proposes second-order optimization algorithms for support vector machines (SVM) and autoencoders. In the first part of this thesis, we propose a smoothing algorithm for solving the soft-margin SVM optimization problem with an $\ell^1$\,penalty. In the second part, we present a stochastic Hessian-free optimization method for training autoencoders by reducing the amount of data used in training. A smoothing algorithm is presented for solving the soft-margin SVM optimization problem with an $\ell^1$\,penalty. This algorithm is designed to minimize the number of passes over the data, which is a crucial factor for its cost-effectiveness when working with very large datasets. The algorithm employs smoothing for the hinge-loss function and utilizes an active set approach for the $\ell^1$\,penalty. The smoothing parameter $\alpha$ is initially set to a large value and is typically halved when the smoothed problem is solved to a satisfactory level of accuracy. The convergence theory indicates that the algorithm requires $\mathcal{O}(1+\log(1+\log_+(1/\alpha)))$ guarded Newton steps for each value of $\alpha$, except for two asymptotic ranges: $\alpha = \Theta(1)$ and $\alpha = \Theta(1/N)$, where $N$ is the number of data points. In cases where $\alpha \gg 1/N$, only one Newton step is needed. Experimental results demonstrate that our algorithm achieves high test accuracy without compromising training speed. Hessian-free (HF) optimization has proven to be an effective method for training deep autoencoders. In the second part of this thesis, we aim to accelerate HF training of autoencoders by reducing the amount of data used during the training process. HF utilizes the conjugate gradient algorithm to estimate update directions. Instead, we propose using the LSMR method, which is known for effectively solving large sparse linear systems. We also incorporate Chapelle and Erhan (2011)'s improved preconditioner for HF optimization. In addition, we introduce a new mini-batch selection algorithm to mitigate overfitting. Our algorithm starts with a small subset of the training data. It gradually increases the mini-batch size based on (i) variance estimates obtained during the computation of a mini-batch gradient and (ii) the relative decrease in objective value for the validation data. Our experimental results demonstrate that using the LSMR method and the new sample selection algorithm, our stochastic Hessian-free optimization leads to rapid training of deep autoencoders with improved generalization error.

Mathematics

Autoencoders

Newton's method

Second-order optimization

Support Vector Machines

Details

Title: Subtitle: Using second-order information in machine learning
Creators: Ibrahim Emirahmetoglu
Contributors: David E Stewart (Advisor)
Bruce Ayati (Committee Member)
Samuel Burer (Committee Member)
Laurent Jay (Committee Member)
Xueyu Zhu (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Applied Mathematical and Computational Sciences
Date degree season: Spring 2025
DOI: 10.25820/etd.007852
Publisher: University of Iowa
Number of pages: x, 86 pages
Language: English
Date submitted: 04/16/2025
Description illustrations: illustrations, tables, graphs
Description bibliographic: Includes bibliographical references (pages 83-86).
Public Abstract (ETD): In recent years, we have been observing how artiﬁcial intelligence enhances our lives by making them safer, easier, and healthier in numerous ways. One of the building blocks of artiﬁcial intelli- gence is machine learning. At the core of almost all machine learning algorithms is an optimization algorithm. In this thesis, we propose new optimization algorithms for various machine learning problems. In particular, we focus on second-order optimization methods, which use the curvature (Hessian) information along with the slope (gradient) information. Support vector machines are classiﬁcation algorithms widely used to solve various real-world problems. One challenge in using these algorithms is overﬁtting, which occurs when a machine learning model performs well on training data but poorly on new, unseen data. An overﬁt model tends to produce inaccurate predictions, leading to poor generalization. To mitigate overﬁtting in classiﬁcation tasks, an `1 penalty term can be added to the objective function. However, this introduces another complication: non-smoothness. We propose a smoothing algorithm to address the soft-margin SVM optimization problem with an `1 penalty. This algorithm is designed to be eﬀective for very large datasets. Experimental results demonstrate that our algorithm achieves strong test accuracy while maintaining eﬃcient training speeds. Autoencoders are an unsupervised learning technique built upon neural networks for feature learning. First-order methods like gradient-descent algorithms are used to train such neural net- works. However, these methods can progress very slowly when applied to deep autoencoders, leading to poor performance on the training set. To address this issue, we turn to second-order methods, speciﬁcally the Hessian-free (HF) optimization, to train deep autoencoders. One draw- back of HF optimization is that it requires solving large linear systems, which can be quite costly. To mitigate this expense, we employ the LSMR method, a conjugate-gradient technique designed for solving sparse linear equations. Our experimental results show that combining HF optimization with the LSMR method yields a fast training of deep autoencoders with better generalization error.
Academic Unit: Interdisciplinary Graduate Program in Applied Mathematical & Computational Sciences
Record Identifier: 9984830728602771

Metrics

1 File views/ downloads

4 Record Views