Tail Distribution of Regret in Optimistic Reinforcement Learning

Sajad Khodadadian; Mehrdad Moharrami

doi:10.48550/arxiv.2511.18247

Back

Tail Distribution of Regret in Optimistic Reinforcement Learning

Preprint

Open access

Tail Distribution of Regret in Optimistic Reinforcement Learning

Sajad Khodadadian and Mehrdad Moharrami

ArXiv.org

Cornell University

11/23/2025

DOI: 10.48550/arxiv.2511.18247

Files and links (1)

url

https://doi.org/10.48550/arxiv.2511.18247View

Preprint (Author's original)This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access

Abstract

We derive instance-dependent tail bounds for the regret of optimism-based reinforcement learning in finite-horizon tabular Markov decision processes with unknown transition dynamics. Focusing on a UCBVI-type algorithm, we characterize the tail distribution of the cumulative regret$R_K$over$K$episodes, rather than only its expectation or a single high-probability quantile. We analyze two natural exploration-bonus schedules: (i) a$K$ -dependent scheme that explicitly incorporates the total number of episodes$K$ , and (ii) a$K$ -independent scheme that depends only on the current episode index. For both settings, we obtain an upper bound on$\Pr(R_K \ge x)$that exhibits a distinctive two-regime structure: a sub-Gaussian tail starting from an instance-dependent scale$m_K$up to a transition threshold, followed by a sub-Weibull tail beyond that point. We further derive corresponding instance-dependent bounds on the expected regret$\mathbb{E}[R_K]$ . The proposed algorithm depends on a tuning parameter$α$ , which balances the expected regret and the range over which the regret exhibits a sub-Gaussian tail. To the best of our knowledge, our results provide one of the first comprehensive tail-regret guarantees for a standard optimistic algorithm in episodic reinforcement learning.

Computer Science - Learning

Mathematics - Optimization and Control

Details

Title: Subtitle: Tail Distribution of Regret in Optimistic Reinforcement Learning
Creators: Sajad Khodadadian
Mehrdad Moharrami
Resource Type: Preprint
Publication Details: ArXiv.org
DOI: 10.48550/arxiv.2511.18247
ISSN: 2331-8422
Publisher: Cornell University; Ithaca, New York
Language: English
Date posted: 11/23/2025
Academic Unit: Computer Science
Record Identifier: 9985034934402771

Metrics

30 Record Views