Online Optimization in ê•-Armed Bandits
- Sébastien Bubeck ,
- Rémi Munos ,
- Gilles Stoltz ,
- Csaba Szepesvári
Advances in Neural Information Processing Systems (NIPS) |
We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space and the mean-payoff function is “locally Lipschitz” with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy whose regret improves upon previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by √n, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.