Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

Jirong Yi; Raghu Mudumbai; Weiyu Xu

Preprint

Open access

ArXiv.org

07/28/2020

Files and links

URL: https://arxiv.org/abs/2007.14042

Preprint (Author's original). This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access.

Abstract

We consider the theoretical problem of designing an optimal adversarial attack on a decision system that maximally degrades the achievable performance of the system, as measured by the mutual information between the degraded signal and the label of interest. This problem is motivated by the existence of adversarial examples for machine learning classifiers. By adopting an information-theoretic perspective, we seek to identify conditions under which adversarial vulnerability is unavoidable, i.e., even optimally designed classifiers will be vulnerable to small adversarial perturbations. We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest, i.e., finding the optimal perturbation distributions to minimize the mutual information between the degraded signal and a signal following a continuous or discrete distribution. In addition, we show that it is much harder to mount adversarial attacks that minimize mutual information when multiple redundant copies of the input signal are available. This provides additional support for the recently proposed "feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers. We also report results from computational experiments that illustrate our theoretical results.
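
The role of redundant copies described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' derivation or experiment. It assumes a uniform binary label, and it assumes a simple stand-in for the adversary that flips each copy of the label independently with a fixed probability (a binary symmetric channel). The function names mutual_information and joint_with_copies and the flip probability 0.3 are hypothetical choices made only for this illustration. The script prints the mutual information, in bits, between the label and the set of perturbed copies for 1, 2, and 4 copies.

```python
import math
from itertools import product


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum p(x,y) * log2(p(x,y) / (p(x) p(y))) over outcomes with nonzero mass.
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def joint_with_copies(flip_prob, n_copies):
    """Joint law of a uniform binary label X and n independently flipped copies.

    Assumption for illustration only: the perturbation flips each copy
    independently with probability flip_prob.
    """
    joint = {}
    for x in (0, 1):
        for y in product((0, 1), repeat=n_copies):
            p = 0.5  # uniform prior on the label
            for bit in y:
                p *= flip_prob if bit != x else 1.0 - flip_prob
            joint[(x, y)] = p
    return joint


if __name__ == "__main__":
    # Hypothetical flip probability chosen only for this illustration.
    for n in (1, 2, 4):
        mi = mutual_information(joint_with_copies(0.3, n))
        print(f"copies={n}  I(X;Y)={mi:.4f} bits")
```

Under these assumptions, the same per-copy perturbation leaves more information about the label as the number of copies grows, which is the qualitative effect the abstract refers to. The paper itself concerns optimally chosen perturbation distributions, which this fixed-flip model does not attempt to reproduce.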

Details

Title: Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning
Creators: Jirong Yi; Raghu Mudumbai; Weiyu Xu
Resource Type: Preprint
Publication Details: ArXiv.org
ISSN: 2331-8422
Number of pages: 16
Language: English
Date posted: 07/28/2020
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984198018702771
