From distributionally robust optimization to broader machine learning applications

Dixian Zhu

doi:10.25820/etd.007289

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Spring 2023

Files and links (1)

PhD-dissertation-Dixian (pdf, 8.91 MB)

Free to read and download, Open Access

Abstract

Distributionally Robust Optimization (DRO) was originally proposed as a technique for training a model that assigns higher weights to more difficult data samples, thereby improving model robustness. However, the philosophy behind DRO extends beyond instance-level Empirical Risk Minimization (ERM), which raises a natural question: can DRO make other machine learning applications more robust? This thesis adapts the philosophy of DRO to four broader applications: Pool-based Active Learning, Partial AUC Optimization, Multi-class Classification, and Multiple Instance Learning. We first give an overview of DRO in the background section. In Chapter 2, we adapt DRO from passive learning to active learning by unifying model training on labeled data and data querying on unlabeled data. In Chapter 3, we propose using DRO as exact and soft estimators for partial AUC optimization. In Chapter 4, we explore the connection between DRO and multi-class classification, investigating and enhancing classification consistency, robustness, and adaptivity. Finally, in Chapter 5, we propose stochastic pooling methods for multiple instance learning based on DRO and attention mechanisms. By leveraging the philosophy of DRO, we aim to make machine learning algorithms more robust and adaptable in a range of real-world scenarios.
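
As a rough illustration of the idea of giving higher weights to more difficult samples, the sketch below uses a KL-regularized DRO objective, a common textbook formulation that is assumed here for illustration and is not necessarily the formulation used in the thesis. The function names `kl_dro_loss` and `kl_dro_weights` and the temperature parameter `lam` are hypothetical. A large `lam` recovers the averaged loss (ERM), while a small `lam` approaches the maximal individual loss.

```python
import numpy as np


def kl_dro_loss(losses, lam):
    """KL-regularized DRO objective: lam * log(mean(exp(losses / lam))).

    Assumed formulation for illustration only; `lam` > 0 is a temperature.
    """
    losses = np.asarray(losses, dtype=float)
    scaled = losses / lam
    shift = scaled.max()  # subtract the max for numerical stability
    return lam * (shift + np.log(np.mean(np.exp(scaled - shift))))


def kl_dro_weights(losses, lam):
    """Per-sample weights implied by the objective above (a softmax of losses / lam)."""
    losses = np.asarray(losses, dtype=float)
    scaled = losses / lam
    w = np.exp(scaled - scaled.max())  # shifted exponentials avoid overflow
    return w / w.sum()


if __name__ == "__main__":
    sample_losses = [0.1, 0.5, 2.0]  # made-up per-sample losses
    for lam in (100.0, 1.0, 0.01):
        print(lam, kl_dro_loss(sample_losses, lam), kl_dro_weights(sample_losses, lam))
```

In this sketch the weights are a softmax over the per-sample losses, so the hardest sample always receives the largest weight, and `lam` controls how sharply the weighting concentrates on it.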

Machine Learning

Artificial Intelligence

Distributionally Robust Optimization

Details

Title: From distributionally robust optimization to broader machine learning applications
Creators: Dixian Zhu
Contributors: Tianbao Yang (Advisor)
Kasturi Varadarajan (Committee Member)
Bijaya Adhikari (Committee Member)
Qihang Lin (Committee Member)
Xun Zhou (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Computer Science
Date degree season: Spring 2023
Publisher: University of Iowa
DOI: 10.25820/etd.007289
Number of pages: xii, 178 pages
Language: English
Date submitted: 04/25/2023
Date approved: 06/30/2023
Description illustrations: illustrations (some color)
Description bibliographic: Includes bibliographical references (pages 86-102).
Public Abstract (ETD): Machine learning models are conventionally trained by defining a differentiable loss function for each data sample and optimizing the averaged empirical loss. However, researchers have questioned whether this is the only way to train a model. In the last decade, an alternative approach based on a natural philosophy has been proposed. It trades off between optimizing the averaged individual loss and the maximal individual loss. By focusing more on the harder data samples, this approach can be more robust and perform better.

Inspired by this philosophy, we apply this high-level idea to various machine learning applications. For instance, it can guide not only model training but also the selection of data to query for labels under the active learning paradigm. Additionally, we find that the hard-attention mechanism adapts naturally to optimizing the partial area under the ROC curve, which is especially important for machine learning on imbalanced datasets such as medical and healthcare data. We also find that this philosophy is related to the multi-class classification problem and its commonly used loss functions. To this end, we propose a unified loss function and investigate its properties to enhance multi-class classification performance. Lastly, we apply this philosophy to multiple instance learning, where the aim is to classify bags of data in which only a limited number of instances exhibit the property of interest.
Academic Unit: Computer Science
Record Identifier: 9984437257402771
