Abstract

In an online decision problem, one makes a sequence of decisions without knowledge of the future. Each period, one pays a cost based on the decision and observed state. We give a simple approach for doing nearly as well as the best single decision, where the best is chosen with the benefit of hindsight. A natural idea is to follow the leader, i.e.¬†each period choose the decision which has done best so far. We show that by slightly perturbing the totals and then choosing the best decision, the expected performance is nearly as good as the best decision in hindsight. Our approach, which is very much like Hannan’s original game-theoretic approach from the 1950s, yields guarantees competitive with the more modern exponential weighting algorithms like Weighted Majority.
More importantly, these follow-the-leader style algorithms extend naturally to a large class of structured online problems for which the exponential algorithms are inefficient.