Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Sébastien Bubeck; Yuanzhi Li; Yuval Peres; Mark Sellke

COLT 2020 | May 2019

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore, when two (or more) players select the same action, this results in a maximal loss. We prove the first √T-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear regret guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$, where $m$ is the number of players.
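To make the two feedback models concrete, here is a minimal Python sketch of a single round. It illustrates the setting only, not the paper's algorithms. It assumes per-arm losses lie in [0, 1], so the "maximal loss" from a collision is taken to be 1; the names `play_round` and `no_collision_info_rate` are hypothetical and do not come from the paper. The sketch shows what each player observes with and without collision information, and evaluates the $T^{1-\frac{1}{2m}}$ expression from the abstract.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def play_round(
    choices: Sequence[int],
    losses: Sequence[float],
    collision_info: bool,
) -> List[Tuple[float, Optional[bool]]]:
    """Per-player feedback for one round (hypothetical helper).

    choices[j] is the arm chosen by player j; losses[i] is the adversary's
    loss for arm i this round, assumed to lie in [0, 1].
    """
    # Count how many players picked each arm (players cannot communicate,
    # so they may unknowingly pick the same arm).
    counts: Dict[int, int] = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    feedback: List[Tuple[float, Optional[bool]]] = []
    for arm in choices:
        collided = counts[arm] > 1
        # Assumption: a collision yields the maximal loss, taken to be 1.
        loss = 1.0 if collided else losses[arm]
        # With collision information the player also learns whether it
        # collided; without it, it sees only the loss (flag is None).
        feedback.append((loss, collided if collision_info else None))
    return feedback


def no_collision_info_rate(T: float, m: int) -> float:
    """Evaluate T^(1 - 1/(2m)), the regret expression stated in the abstract
    for the setting without collision information."""
    return T ** (1 - 1 / (2 * m))


if __name__ == "__main__":
    print(play_round([0, 0, 2], [0.3, 0.5, 0.1], collision_info=True))
    print(play_round([0, 0, 2], [0.3, 0.5, 0.1], collision_info=False))
    print(no_collision_info_rate(10**8, 2))
```

For example, with three players choosing arms `[0, 0, 2]`, the first two collide and each receives loss 1. Without collision information, those two players cannot distinguish the collision from arm 0 simply having loss 1 this round, which is what makes that feedback model harder. For $m = 1$ the expression reduces to √T, and for $m = 2$ it is $T^{3/4}$.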