Online Learning and Bandits – Part 2


August 18, 2015


Aditya Gopalan




The ability to make continual, accurate decisions based on evolving data is key in many of today’s data-driven intelligent systems. This tutorial-style talk presents an introduction to the modern study of sequential learning and decision making under uncertainty. The broad objective is to cover modeling frameworks for online prediction and learning, explore algorithms for decision making, and gain an understanding of their performance. Specifically, we will look at multi-armed bandits – models of decision making that capture the explore-vs-exploit tradeoff in learning, regret minimization, non-stochastic or adversarial online learning, and online convex optimization. Time permitting, we will discuss new directions and frontiers in the area of sequential decision making.