Preprint
Contextual Multinomial Logit Bandits with General Value Functions
ArXiv.org
Cornell University
02/18/2024
DOI: 10.48550/arxiv.2402.08126
Abstract
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems, such as online retailing and advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this limitation, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent line of work on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results are not only the first with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.
Details
- Title
- Contextual Multinomial Logit Bandits with General Value Functions
- Creators
- Mengxiao Zhang; Haipeng Luo
- Resource Type
- Preprint
- Publication Details
- ArXiv.org
- DOI
- 10.48550/arxiv.2402.08126
- ISSN
- 2331-8422
- Publisher
- Cornell University; Ithaca, New York
- Language
- English
- Date posted
- 02/18/2024
- Academic Unit
- Business Analytics
- Record Identifier
- 9984701809402771