Efficient Online Bandit Multiclass Learning with \tilde{O}(\sqrt{T}) Regret

Alina Beygelzimer; Francesco Orabona; Chicheng Zhang

International Conference on Machine Learning 2017 | August 2017

We present an efficient second-order algorithm with \tilde{O}(\frac{1}{η} \sqrt{T}) regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by η, for a range of η restricted by the norm of the competitor. The family of loss functions ranges from hinge loss (η=0) to squared hinge loss (η=1). This provides a solution to the open problem posed by Abernethy and Rakhlin (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for \sqrt{T}-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
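
To make the loss family concrete, the sketch below evaluates one parameterization that matches the two endpoints named in the abstract: hinge loss max(0, 1 - z) at η=0 and squared hinge loss max(0, 1 - z)^2 at η=1. The abstract does not state the formula for intermediate η, so the quadratic form used here, along with the name `loss_eta` and the margin variable `z`, should be read as an assumption for illustration rather than as the paper's definition.

```python
def loss_eta(z: float, eta: float) -> float:
    """Assumed loss family interpolating hinge (eta=0) and squared hinge (eta=1).

    z   : margin of the prediction (larger is better; no loss once z > 1)
    eta : interpolation parameter in [0, 1]

    The intermediate form is an assumption; only the endpoints eta=0 and
    eta=1 are taken from the abstract.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if z > 1.0:
        return 0.0
    # Quadratic in z whose coefficients are chosen so that the loss is 1 at
    # z = 0 and 0 at z = 1 for every eta.
    return 1.0 - (2.0 / (2.0 - eta)) * z + (eta / (2.0 - eta)) * z * z


if __name__ == "__main__":
    margins = (-1.0, 0.0, 0.5, 1.0, 2.0)
    for eta in (0.0, 0.5, 1.0):
        print(eta, [round(loss_eta(z, eta), 4) for z in margins])
```

With η=0 the quadratic term vanishes and the function reduces to 1 - z on z ≤ 1; with η=1 it becomes 1 - 2z + z^2 = (1 - z)^2, so both stated endpoints are recovered.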