This is an umbrella project for machine learning with the explore-exploit tradeoff: the tradeoff between acquiring information and using it. This is a mature yet very active research area, studied in Machine Learning, Theoretical Computer Science, Operations Research, and Economics. Much of our activity focuses on “multi-armed bandits” and “contextual bandits”, relatively simple yet very powerful models for the explore-exploit tradeoff.
We are located at, or collaborate heavily with, Microsoft Research New York City. Most of us are involved in Multi-World Testing, an approach and system for contextual bandit learning.