No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

Mengxiao Zhang; Ramiro Deo-Campo Vuong; Haipeng Luo

doi:10.48550/arxiv.2405.20678

Preprint

Open access

ArXiv.org

Cornell University

05/31/2024

Files and links (1)

url

https://doi.org/10.48550/arxiv.2405.20678

Preprint (Author's original). This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access.

Abstract

We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{O}(K^{\frac{2}{N}} T^{\frac{N-1}{N}})$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the $\sqrt{T}$-regret bounds of Hossain et al. [2021], Jones et al. [2023]. We then consider a more challenging version of the problem with adversarial rewards. Somewhat surprisingly, despite NSW being a concave function, we prove that no algorithm can achieve sublinear regret. To circumvent such negative results, we further consider a setting with full-information feedback and design two algorithms with $\sqrt{T}$-regret: the first one has no dependence on $N$ at all and is applicable to not just NSW but a broad class of welfare functions, while the second one has better dependence on $K$ and is preferable when $N$ is small. Finally, we also show that logarithmic regret is possible whenever there exists one agent who is indifferent about different arms.
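The distinction the abstract draws between the product of agents' rewards and their NSW (the geometric mean) can be illustrated with a minimal Python sketch. The abstract does not give a formal definition, so the sketch assumes each objective is evaluated on the agents' expected rewards under a distribution `p` over arms. The utility matrix `u`, its layout, and all function names are hypothetical and chosen only for illustration.

```python
import math


def expected_rewards(p, u):
    """Expected reward of each agent when an arm is drawn from p.

    Assumed layout (hypothetical): u[i][k] is agent i's mean reward
    for arm k, and p[k] is the probability of pulling arm k.
    """
    return [sum(pk * uik for pk, uik in zip(p, row)) for row in u]


def product_welfare(p, u):
    """Product of all agents' expected rewards."""
    return math.prod(expected_rewards(p, u))


def nash_social_welfare(p, u):
    """Geometric mean of the agents' expected rewards."""
    rewards = expected_rewards(p, u)
    return math.prod(rewards) ** (1.0 / len(rewards))


# Two agents, two arms: each agent only values a different arm.
u = [[1.0, 0.0],
     [0.0, 1.0]]
uniform = [0.5, 0.5]   # mix evenly between the arms
first_arm = [1.0, 0.0]  # always pull the first arm

print(product_welfare(uniform, u), nash_social_welfare(uniform, u))
print(product_welfare(first_arm, u), nash_social_welfare(first_arm, u))
```

Under this assumed definition the product equals the NSW raised to the power N, so both objectives rank distributions identically but live on different scales, which is one way to see why a regret bound stated for one need not carry over to the other.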

Computer Science - Computer Science and Game Theory

Computer Science - Learning

Computer Science - Multiagent Systems

Statistics - Machine Learning

Details

Title: No-Regret Learning for Fair Multi-Agent Social Welfare Optimization
Creators: Mengxiao Zhang; Ramiro Deo-Campo Vuong; Haipeng Luo
Resource Type: Preprint
Publication Details: ArXiv.org
DOI: 10.48550/arxiv.2405.20678
ISSN: 2331-8422
Publisher: Cornell University; Ithaca, New York
Language: English
Date posted: 05/31/2024
Academic Unit: Business Analytics
Record Identifier: 9984702718902771
