Bandits With Heavy Tail

Sébastien Bubeck; Nicolò Cesa-Bianchi; Gábor Lugosi

IEEE Transactions On Information Theory | November 2013

Published by IEEE

The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order 1 + ε, for some ε ∈ (0,1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. To achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni’s M-estimator, and the median-of-means estimator. We also derive matching lower bounds, which show that the best achievable regret deteriorates when ε < 1.
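
To illustrate one of the estimators named in the abstract, here is a minimal sketch of a median-of-means estimator. It is an illustration only, not the paper's construction: the function name `median_of_means`, the parameter `num_blocks`, and the choice to discard leftover samples are assumptions made for this example, and the paper's specific block counts and confidence parameters are not reproduced here.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Return the median of the means of disjoint, equal-sized blocks.

    `num_blocks` is a free parameter in this sketch (an assumption);
    how it should depend on the confidence level is not shown here.
    """
    n = len(samples)
    if num_blocks < 1 or num_blocks > n:
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = n // num_blocks  # leftover samples at the end are ignored
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Hypothetical rewards with a single heavy-tailed outlier.
rewards = [1.0, 2.0, 3.0, 1000.0, 2.0, 1.0, 3.0, 2.0, 1.0]
print(median_of_means(rewards, num_blocks=3))
print(statistics.fmean(rewards))
```

In this toy data the outlier 1000.0 falls into a single block, so only one of the three block means is affected and the median of the block means is 2.0, whereas the plain empirical mean is pulled up to about 112.8. This is the sense in which such estimators are more robust than the empirical mean under heavy tails.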