Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
Preprint   Open access


Hao Qiu, Mengxiao Zhang and Nicolò Cesa-Bianchi
ArXiv.org
Cornell University
02/06/2026
DOI: 10.48550/arxiv.2602.06404
Preprint (Author's original). This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access.

Abstract

We study distributed adversarial bandits, where N agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is Θ̃(√((ρ^(-1/2) + K/N)T)), where T is the horizon, K is the number of actions, and ρ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound Õ(ρ^(-1/3)(KT)^(2/3)) of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost ρ^(-1/4)√T and a bandit cost √(KT/N). We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in R^d, obtaining a regret bound of Õ(√((ρ^(-1/2) + 1/N)dT)), achieved with only O(d) communication cost per agent and per round via a volumetric spanner.
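The abstract's claim that the minimax rate decomposes into a communication cost and a bandit cost can be checked directly from the stated bound: splitting the square root of the sum gives the two terms, up to a constant factor. A short verification (the algebra here is ours, not from the paper):

```latex
\sqrt{\bigl(\rho^{-1/2} + K/N\bigr)T}
  \;\le\; \sqrt{\rho^{-1/2}\,T} + \sqrt{(K/N)\,T}
  \;=\; \underbrace{\rho^{-1/4}\sqrt{T}}_{\text{communication cost}}
        \;+\; \underbrace{\sqrt{KT/N}}_{\text{bandit cost}},
```

using √(a+b) ≤ √a + √b for a, b ≥ 0; conversely, √a + √b ≤ √2 · √(a+b), so the combined form Θ̃(√((ρ^(-1/2)+K/N)T)) and the two-term decomposition agree up to constant factors absorbed by the Θ̃ notation.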
Computer Science - Learning
