Ofer Dekel – Bandit Convex Optimization: SqrtT Regret in One Dimension
Bandit convex optimization is a sequential decision-making game with partial information. On each round, an adversary chooses a bounded convex loss function and the decision maker chooses a point in the function's domain. The loss incurred by the decision maker on that round is the value of the loss function at his chosen point. The decision maker observes his loss value (a single number) and uses this information to make better decisions going forward, but he does not observe any other information about the loss function. The decision maker's goal is to minimize his regret -- the difference between his total loss and the loss of the best fixed point in hindsight. This game generalizes the classic problem of noisy derivative-free convex optimization.

In 2005, Flaxman et al. showed that the minimax regret of this game is at most O(T^(5/6)), compared to the classic lower bound of Omega(T^(1/2)). Since then, the gap between the upper and lower bounds has been narrowed in special cases, but resolving the general question remains a tantalizing open problem.

In this talk, I will survey the recent progress on this problem. Specifically, in the one-dimensional case, the minimax regret is of order T^(1/2). The proof is non-constructive: it does not specify a concrete algorithm that guarantees T^(1/2) regret. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and we solve the Bayesian version of the problem with a variant of Thompson Sampling. The analysis features a novel use of convexity that may be of independent interest. Joint work with Sebastien Bubeck, Tomer Koren and Yuval Peres.
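Since the T^(1/2) result above is non-constructive, the abstract does not come with an algorithm one could run. For intuition about the game protocol and the kind of explicit method the O(T^(5/6))-style bounds build on, here is a minimal one-dimensional sketch in the spirit of the one-point gradient estimator of Flaxman et al.: the player perturbs its current point, observes only the scalar loss at the perturbed point, and uses it as an unbiased gradient estimate for a projected gradient step. The function name and the particular tunings of the exploration radius and step size are illustrative assumptions, not the talk's algorithm.

```python
import math
import random

def fkm_bandit_1d(loss_oracle, T, radius=1.0):
    """One-point gradient-estimate method (illustrative sketch, in the spirit of
    Flaxman et al. 2005) for 1-D bandit convex optimization over [-radius, radius].

    loss_oracle(t, y) returns only the scalar loss at the played point y --
    this single number per round is the bandit feedback."""
    x = 0.0                        # current base point
    delta = T ** (-0.25)           # exploration radius (illustrative tuning)
    eta = T ** (-0.75)             # gradient step size (illustrative tuning)
    total_loss = 0.0
    for t in range(T):
        u = random.choice([-1.0, 1.0])   # uniform on the 1-D unit sphere {-1, +1}
        y = x + delta * u                # perturbed point actually played
        loss = loss_oracle(t, y)         # the only information we ever observe
        total_loss += loss
        g = (loss / delta) * u           # one-point estimate of the gradient at x
        x = x - eta * g                  # gradient step ...
        bound = radius - delta           # ... projected so plays stay in the domain
        x = min(max(x, -bound), bound)
    return total_loss
```

For example, against the fixed quadratic loss f(x) = (x - 0.5)^2 the iterates drift toward 0.5, so the average per-round loss falls below that of the starting point; the best fixed point in hindsight is x = 0.5, against which regret is measured.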