Most machine learning algorithms rely on the assumption that the data is generated by a stochastic process. Many online learning algorithms go one step further and allow the data to be generated by an oblivious adversary (a.k.a. a non-adaptive or non-reactive adversary). Much less is known about the setting where the data is generated by a more powerful adversary, such as a switching-cost adversary, a memory-bounded adversary, or a general adaptive adversary. In this talk, I will describe ongoing efforts to understand what can and cannot be done against these powerful rivals. First, I will define policy regret, a meaningful way of measuring the performance of a learning algorithm in the adaptive adversarial setting. Then, I will present the current state-of-the-art upper and lower bounds on policy regret against different adversary types, in both the full-information and the bandit-feedback settings.
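For readers unfamiliar with the notion, a common formalization of policy regret (the notation below is assumed: \(f_t\) are the adversary's loss functions, which may depend on the learner's entire action history, and \(a_1, \dots, a_T\) are the learner's actions) can be sketched as:

```latex
% Policy regret compares the learner's cumulative loss against the
% counterfactual loss of playing one fixed action a on every round:
\[
  R_T^{\mathrm{policy}}
  \;=\; \sum_{t=1}^{T} f_t(a_1, \ldots, a_t)
  \;-\; \min_{a \in \mathcal{A}} \sum_{t=1}^{T} f_t(a, \ldots, a).
\]
```

Unlike standard (external) regret, the comparator term feeds the fixed action \(a\) into the loss on every round, so an adaptive adversary's losses respond to that counterfactual history rather than to the learner's actual plays; against an oblivious adversary the two notions coincide.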
This talk represents joint work with Ambuj Tewari, Raman Arora, Nicolo Cesa-Bianchi, and Ohad Shamir.