Preprint
Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
ArXiv.org
Cornell University
02/06/2026
DOI: 10.48550/arxiv.2602.06404
Abstract
We study distributed adversarial bandits, where N agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is Θ̃(√((ρ^{-1/2} + K/N) T)), where T is the horizon, K is the number of actions, and ρ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound Õ(ρ^{-1/3}(KT)^{2/3}) of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost ρ^{-1/4}√T and a bandit cost √(KT/N). We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in R^d, obtaining a regret bound of Õ(√((ρ^{-1/2} + 1/N) d T)), achieved with only O(d) communication cost per agent and per round via a volumetric spanner.
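To illustrate the gossip communication model the abstract refers to, here is a minimal sketch (not the paper's algorithm): agents repeatedly average their local estimates through a doubly stochastic communication matrix P, and the spectral gap of P governs how fast all agents agree on the global average. The ring topology and weights below are illustrative assumptions.

```python
import numpy as np

def gossip_round(P, x):
    """One synchronous gossip step: each agent replaces its value with
    a weighted average of its neighbors' values, weights given by P."""
    return P @ x

# Illustrative example: 4 agents on a ring; each agent weights itself
# and its two neighbors by 1/3, making P doubly stochastic.
P = np.array([
    [1/3, 1/3, 0.0, 1/3],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 1/3, 1/3, 1/3],
    [1/3, 0.0, 1/3, 1/3],
])

x = np.array([4.0, 0.0, 0.0, 0.0])  # local quantities held by the agents
for _ in range(50):
    x = gossip_round(P, x)
# After enough rounds, every agent's value is close to the global
# average (here 1.0); the convergence rate depends on the second-largest
# eigenvalue modulus of P, i.e., on the spectral gap.
```

Larger spectral gaps (well-connected topologies) mean fewer gossip rounds are needed for consensus, which is why ρ enters the regret bounds above.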
Details
- Title
- Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
- Creators
- Hao Qiu; Mengxiao Zhang; Nicolò Cesa-Bianchi
- Resource Type
- Preprint
- Publication Details
- ArXiv.org
- DOI
- 10.48550/arxiv.2602.06404
- ISSN
- 2331-8422
- Publisher
- Cornell University; Ithaca, New York
- Language
- English
- Date posted
- 02/06/2026
- Academic Unit
- Business Analytics
- Record Identifier
- 9985139477402771