Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

Haipeng Luo; Hanghang Tong; Mengxiao Zhang; Yuheng Zhang

Back

Conference proceeding

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

Haipeng Luo, Hanghang Tong, Mengxiao Zhang and Yuheng Zhang

INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, Vol.201, pp.1074-1100

Proceedings of Machine Learning Research

01/01/2023

View Online

Abstract

We study high-probability regret bounds for adversarial K-armed bandits with time-varying feedback graphs over T rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret (O) over tilde((Sigma T-t=1 alpha(t))(1/2) + max(t is an element of[T]) alpha t) with high probability, where alpha(t) is the independence number of the feedback graph at round t. Compared to the best existing result (Neu, 2015) which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any poly(K) dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of (Alon et al., 2015b) by removing the O(root KT) term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without self-loop in a strongly observable graph.

Computer Science

Technology

Computer Science, Information Systems

Computer Science, Theory & Methods

Science & Technology

Details

Title: Subtitle: Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs
Creators: Haipeng Luo - Univ Southern Calif, Los Angeles, CA 90007 USA
Hanghang Tong - University of Illinois Urbana-Champaign
Mengxiao Zhang - Univ Southern Calif, Los Angeles, CA 90007 USA
Yuheng Zhang - University of Illinois Urbana-Champaign
Contributors: S Agrawal (Editor)
F Orabona (Editor)
Resource Type: Conference proceeding
Publication Details: INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, Vol.201, pp.1074-1100
Publisher: JMLR-JOURNAL MACHINE LEARNING RESEARCH
Series: Proceedings of Machine Learning Research
ISSN: 2640-3498
eISSN: 2640-3498
Number of pages: 27
Language: English
Date published: 01/01/2023
Academic Unit: Business Analytics
Record Identifier: 9984701724402771

Metrics

1 Record Views