Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

Ziqing Lu; Lifeng Lai; Weiyu Xu

doi:10.48550/arxiv.2510.13792

Preprint

Open access

ArXiv.org

Cornell University

10/15/2025

Files and links (1)

https://doi.org/10.48550/arxiv.2510.13792

Preprint (Author's original)

This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles.

Abstract

Reinforcement learning (RL) for Markov decision processes (MDPs) has been adopted in many security-related applications, such as autonomous driving, financial decision-making, and drone and robot control. Improving the robustness of RL systems against adversaries requires studying the adversarial attacks they may face. Most previous work considers deterministic adversarial attack strategies on MDPs, which the recipient (victim) agent can defeat by reversing the deterministic attack. In this paper, we propose a provably "invincible" or "uncounterable" type of adversarial attack on RL. The attacker applies a rate-distortion information-theoretic approach to randomly change the agent's observations of the transition kernel (or other properties), so that the agent gains zero or very limited information about the ground-truth kernel (or other properties) during training. We derive an information-theoretic lower bound on the recipient agent's reward regret and show the impact of rate-distortion attacks on state-of-the-art model-based and model-free algorithms. We also extend this information-theoretic approach to other types of adversarial attack, such as state observation attacks.
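
As a rough illustration of why a randomized attack can be harder to undo than a deterministic one, the toy sketch below computes the mutual information between a hidden ground-truth label and what the victim observes. It is not the paper's construction: the two-hypothesis setup, the channel matrices, and the names `mutual_information`, `deterministic_flip`, `uniform_noise`, and `partial_noise` are all assumptions made for this example, and the distortion constraint of a real rate-distortion formulation is omitted.

```python
import math


def mutual_information(prior, channel):
    """Return I(X;Y) in bits for a prior p(x) and channel[x][y] = p(y|x)."""
    n_out = len(channel[0])
    # Marginal distribution of the observation y.
    p_y = [
        sum(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(n_out)
    ]
    mi = 0.0
    for x, p_x in enumerate(prior):
        for y in range(n_out):
            joint = p_x * channel[x][y]
            if joint > 0:  # zero-probability pairs contribute nothing
                mi += joint * math.log2(channel[x][y] / p_y[y])
    return mi


# Hypothetical setup: two candidate ground-truth kernels, equally likely.
prior = [0.5, 0.5]

# Deterministic attack: always report the other kernel (invertible).
deterministic_flip = [[0.0, 1.0], [1.0, 0.0]]
# Randomized attack: the report is independent of the truth.
uniform_noise = [[0.5, 0.5], [0.5, 0.5]]
# Partially randomized attack: the report is wrong 10% of the time.
partial_noise = [[0.9, 0.1], [0.1, 0.9]]

mi_flip = mutual_information(prior, deterministic_flip)
mi_noise = mutual_information(prior, uniform_noise)
mi_partial = mutual_information(prior, partial_noise)

print(f"deterministic flip: {mi_flip:.3f} bits")
print(f"uniform noise:      {mi_noise:.3f} bits")
print(f"partial noise:      {mi_partial:.3f} bits")
```

In this toy setting the deterministic flip leaves the full one bit of information available to the victim, who can simply invert it, while the fully randomized channel leaves none. Intermediate channels fall in between, which is the kind of trade-off a rate-distortion formulation would make precise.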

Computer Science - Artificial Intelligence

Computer Science - Learning

Details

Title: Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach
Creators: Ziqing Lu - University of Iowa; Lifeng Lai - University of California, Davis; Weiyu Xu - University of Iowa
Resource Type: Preprint
Publication Details: ArXiv.org
DOI: 10.48550/arxiv.2510.13792
ISSN: 2331-8422
Publisher: Cornell University; Ithaca, New York
Language: English
Date posted: 10/15/2025
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9985014803102771
