Bounded Regret In Stochastic Multi-Armed Bandits

Sébastien Bubeck; Vianney Perchet; Philippe Rigollet

Bounded Regret In Stochastic Multi-Armed Bandits

Sébastien Bubeck ,
Vianney Perchet ,
Philippe Rigollet

JMLR: Workshop and Conference Proceedings | August 2013

Download BibTex

We study the stochastic multi-armed bandit problem when one knows the value µ(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows µ(⋆).

Video (opens in new tab) – (Note: The proof of Theorem 8 is not correct. We do not know if the theorem holds true.)