Microsoft Research Blog

Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

New perspectives on contextual bandit

July 13, 2018 | By Sébastien Bubeck, Senior Researcher

What is the common denominator between the following situations: a doctor choosing among different drugs for a sequence of patients; a website selecting ads for its visitors based on the information it has about them; and a travel agency making recommendations to its customers for their next vacation? All can be viewed as instances of an abstract mathematical problem known as contextual multi-armed bandit. In the last decade, this problem has been a focus of the interactive machine learning community. It is rare for abstract problems to have immediate applications, but the contextual bandit is both rich mathematically speaking while it also models fundamental problems out in the world in a wide variety of current applications.

Microsoft Research, has been working extensively on both refining the theory of contextual bandit, as well as making the known algorithms scalable to internet-scale applications. The New York City’s team of bandit experts has made several breakthroughs in the field. Particularly striking was when researchers Alekh Agarwal, John Langford, Rob Schapire, Akshay Krishnamurthy, and Haipeng Luo put forth an open problem about an elementary question on contextual bandit at the 2017 Conference On Learning Theory in Amsterdam, The Netherlands. This open problem immediately attracted the attention of fellow conference attendee Sebastien Bubeck, Senior Researcher at Microsoft Research in Redmond, Washington and an expert in multi-armed bandit theory. Upon his return to Redmond, he started to work on the problem with his intern, Yuanzhi Li, a graduate student at Princeton University, and he also enrolled his new colleague Zeyuan Allen-Zhu who had recently joined the Machine Learning and Optimization team in Redmond.

The open problem from the New York City research team can be phrased as follows. In contextual bandit one tries to identify online the best strategy from some set of fixed policies (for example, defined by some set of neural networks). A basic performance measure is the so-called “regret”, that controls how much one regrets not playing the optimal policy. Seminal work from the 1990s gave an algorithm, known as Exp4, which was proved to be unimprovable without restricting in some ways the set of policies. What the 2017 open problem asks is this: knowing a priori that there is a very good policy, can we hope to get a much smaller regret? Intuitively this should be the case, as such a good policy should stand out more rapidly. In the bandit theory community this is known as a first order regret bound and such bounds were known for many bandit settings. Therefore, it was really surprising that the contextual version of the question seemed to require new tools.

In work to appear this month at ICML 2018 in Stockholm, Sweden, Allen-Zhu, Bubeck and Li succeeded in solving the open problem and gave the first contextual bandit algorithm with an optimal first order regret bound. It turned out that the intuition from the New York City researchers was correct and that new tools were required to solve the problem. A major conceptual contribution of the paper is the introduction of auxiliary experts. This idea, while powerful in theory, also has limitations in practice. In particular the Microsoft Research New York City team’s open problem was also asking for a scalable algorithm. This challenge remains open to date but the Redmond researchers hope to tackle it too. One thing is certain, more discoveries in contextual bandit are awaiting!

Up Next

Algorithms, Artificial intelligence, Human-computer interaction, Social sciences

Keeping an Eye on AI with Dr. Kate Crawford

Episode 14, February 28, 2018 - Today, Dr. Crawford talks about both the promises and the problems of AI; why— when it comes to data – bigger isn’t necessarily better; and how – even in an era of increasingly complex technological advances – we can adopt AI design principles that empower people to shape their technical tools in ways they’d like to use them most.

Microsoft blog editor

Algorithms, Artificial intelligence, Medical, health and genomics, Social sciences

Opportunities abound in the creative and collaborative culture of STEM

The explosion of data available today everywhere from biomedicine to the arts is opening new opportunities for researchers with backgrounds in science, technology, engineering and math to pursue creative and collaborative endeavors that have deep societal impact. My research has always been interdisciplinary, which by nature is collaborative and creative. You need experts from different […]

Jennifer Chayes

Technical Fellow / Managing Director, Microsoft Research New England & New York City; and Microsoft Research Montreal

Algorithms, Artificial intelligence, Data visualization, analytics, and platform

Summer school data science research could trigger real world changes

By John Kaiser, Writer, Microsoft Research Microsoft Research hosted its third annual Data Science Summer School in New York City as a diverse group of undergraduate students deployed some of the latest data crunching techniques on millions of rows of anonymized data in an effort to uncover useful information. “We’re really hoping to give them […]

Microsoft blog editor