### Episode 53, December 5, 2018

Dr. Sébastien Bubeck is a mathematician and a senior researcher in the Machine Learning and Optimization group at Microsoft Research. He’s also a self-proclaimed “bandit” who claims that, despite all the buzz around AI, it’s still a science in its infancy. That’s why he’s devoted his career to advancing the mathematical foundations behind the machine learning algorithms behind AI.

Today, Dr. Bubeck explains the difficulty of the multi-armed bandit problem in the context of a parameter- and data-rich online world. He also discusses a host of topics from randomness and convex optimization to metrical task systems and log n competitiveness to the surprising connection between Gaussian kernels and what he calls some of the most beautiful objects in mathematics.


### Episode Transcript

Sébastien Bubeck: You want science to be based on scientific facts. And that’s what I like about the theoretical computer science community is that they have well-established, long-standing open problems, and if you do make progress on those problems, then people will listen to you, rightfully, because it means you have some new techniques, something that people, before, couldn’t do.

**Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.**

**Host: Dr. Sébastien Bubeck is a mathematician and a senior researcher in the Machine Learning and Optimization group at Microsoft Research. He’s also a self-proclaimed “bandit” who claims that, despite all the buzz around AI, it’s still a science in its infancy. That’s why he’s devoted his career to advancing the mathematical foundations behind the machine learning algorithms behind AI.**

**Today, Dr. Bubeck explains the difficulty of the multi-armed bandit problem in the context of a parameter- and data-rich online world. He also discusses a host of topics from randomness and convex optimization to metrical task systems and log n competitiveness to the surprising connection between Gaussian kernels and what he calls some of the most beautiful objects in mathematics. That and much more on this episode of the Microsoft Research Podcast.**

**Host: Sébastien Bubeck, welcome to the podcast.**

Sébastien Bubeck: Thank you.

**Host: You’re a senior researcher in the newly-minted, or relatively newly-minted, Machine Learning and Optimization group at MSR which used to be called the Theory Group and now there’s a couple of sub-groups under this new umbrella, so, give us a quick picture of the org chart and who does what and then tell us how we can best frame the work you do. What gets you up in the morning?**

Sébastien Bubeck: Sure. So, I will start with the new organization that we have here at Microsoft Research in Redmond. So, I guess, a little bit more than a year ago, we started a new lab which is called Microsoft Research AI, for artificial intelligence. So, this is following, of course, all the hype and the real scientific excitement around machine learning. So, we created Microsoft Research AI and some of the members of the former Theory Group decided to join this new endeavor and that’s when we decided to create the Machine Learning and Optimization group. And then the other half of the group went on to form the Algorithms Group in the core Microsoft Research lab in Redmond.

**Host: So, is that actually a group, the Algorithms Group?**

Sébastien Bubeck: There is a group called the Algorithms Group. So, of course, in Machine Learning and Optimization, we also do a lot of algorithms. In fact, it is the main focus of our group, but specifically around machine learning.

**Host: So, you’re in the Machine Learning and Optimization side of this group.**

Sébastien Bubeck: Yes.

**Host: And you also work on algorithms. But what gets you, particularly, up in the morning? What questions are you asking, what problems are you solving?**

Sébastien Bubeck: So, I guess, in general, I am really passionate about online decision making, or decision making in general. I think, you know, no matter what you think will happen in the future, humans or robots or AI, there will always be a need for making decisions. And the questions in mathematics that surround decision making, I find them really fascinating, and they are still in their infancy. So, there is really a lot to be done there. And somehow, traditionally, pure mathematicians have not looked at those questions so much. And so there is really this gap between beautiful mathematics that has been developed, say, for physics or, slightly more recently, for computer science, but I would argue that decision making, online decision making, is just as important today as those more traditional topics.

**Host: How does your group, doing very theoretical work on the math of machine learning, interact with other groups at Microsoft Research, or even the discipline, writ large?**

Sébastien Bubeck: Right. So, you know, these days in the media, we hear a lot about artificial intelligence. As much as we would want to have it, an artificial intelligence that, you know, makes recommendations, that tells us what we should cook for the next day, you know, helps us sleep better, whatever… it doesn’t exist yet. And what this means is that we are still very much working on the foundations. So, we have a couple of techniques that do work great, including gradient descent, convolutional neural networks, things like that. They really do work very well. And naturally, big companies are investing a lot in those and trying to, you know, engineer around them as much as possible, get the full stack, get the hardware ready, you know, all of those things. But there are many, many other things that we would want to do that are out of reach for the current techniques. So, what we really need to do is develop fundamentally new tools. And that is what our group is doing.

**Host: Well, we couldn’t talk about the mathematics of machine learning and sequential decision making without talking about the multi-armed bandit problem. It’s known, as you say to be simple – I’m making air-quotes here – and yet incredibly rich. So, explain this essential problem and tell us what’s new in the research right now.**

Sébastien Bubeck: Right. So, the multi-armed bandit problem is actually the first real research problem that I looked at, now a little bit more than a decade ago. So, it’s a mathematical question that conceptualizes the problem of exploration versus exploitation. So, you know, in our everyday life, we have to deal with this problem: do we order from the same restaurant that we know we love, or do we try this new restaurant that we don’t really know yet, whether it’s good or not? So, maybe I should explain where the name comes from.

**Host: That’s a good idea.**

Sébastien Bubeck: So, in American slang, a one-armed bandit is just a slot machine at the casino. And you can imagine that in the multi-armed bandit, you are facing, you know, many of those one-armed bandits and for every coin that you have, you have to choose which bandit you are going to put your coin in and, you know, pull the arm. So, we refer to the different actions as different arms of the bandit, OK? So, that’s where the name comes from. And it really got popularized in the 1950s by Herbert Robbins. And at the time, it’s actually quite interesting, the multi-armed bandit problem was viewed as a negative result in the sense that in the 1930s, people developed classical statistics, so people like Fisher or Neyman and so on. So, they developed classical statistics and then they wanted, very naturally, to move to online statistics. So, what happens if the data is not given to you all at once, but it’s coming to you sequentially? And then they realized that the questions were really tricky, really difficult. And then Robbins came along and said, “Look at this multi-armed bandit problem, it’s so simple, yet we have no idea how to solve it.” So, it was more viewed as a negative problem. But then what happened is that in the early 2000s, a huge, very important application appeared on the scene, which is ad placement on the internet. So, ad placement on the internet is exactly a bandit problem. You can imagine each user coming to your website as, you know, one time-step of your problem, and then for that user, you have to select which ad to present. So, each ad is an arm of your bandit and then when you present the ad, you see whether the person clicks on it. You see whether the treatment was effective or not, and then you move on to the next one. So, it’s a new, important application of the multi-armed bandit. So, what has happened since then?
So, what has happened is that now we really care about applications where the number of actions, the number of arms, is huge. It is very, very large. Another application, in fact, is website display, so you can imagine that you have many different ways to display your website and, you know, users are going to prefer certain ways. But this is a combinatorial problem, you know, whether you place the ad at the top or at the bottom of the page, whether you use the color green or the color red. So, you have many, many possible choices, which leads to a combinatorial explosion of possibilities, and in turn, this means that you have to really think very deeply about how to leverage the experience that you had with a certain arm to other arms. You have to understand what the correlations between those different arms are. So, this is what modern multi-armed bandit research is about.
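
To make the exploration-versus-exploitation trade-off concrete, here is a minimal sketch of the classic UCB1 index policy on a small Bernoulli bandit. The arm probabilities, horizon, and random seed are illustrative choices, not anything from the episode.

```python
import math
import random

def ucb1(means, horizon, rng):
    """Play a Bernoulli multi-armed bandit with the classic UCB1 index policy."""
    n_arms = len(means)
    counts = [0] * n_arms      # number of pulls per arm
    totals = [0.0] * n_arms    # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # pull every arm once to initialize
        else:
            # index = empirical mean + optimism bonus that shrinks with pulls
            arm = max(range(n_arms),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

rng = random.Random(0)
pulls = ucb1([0.2, 0.5, 0.8], horizon=5000, rng=rng)
# The best arm (index 2) ends up with the bulk of the pulls.
```

The optimism bonus is what balances trying under-explored arms against exploiting the empirically best one.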

**(music plays)**

**Host: Let’s talk about a project you’re involved in on the foundations of optimization. So, optimization methods are, as you call them, the engine of machine learning algorithms. Give us an overview of the research directions in this arena, particularly convex optimization, and explain why they’re important to the theory behind machine learning algorithms.**

Sébastien Bubeck: So, today’s optimization problems in machine learning are really characterized by two key features. One of them is that they deal with optimization problems in very high dimension. So, by that, what I mean is, the number of parameters of your model, these are the number of variables that you are going to do optimization over. We are not talking about optimizing a single parameter in dimension one, or, you know, two or three parameters in dimension two or three, which are easy for humans to visualize. We’re talking about millions, tens of millions, probably billions in the future. That’s the space we’re dealing with in optimization today. So that’s one of the big obstacles. The other one is just the sheer amount of data. So, the data is so huge that you have to be very careful about the operations that you do with it. You cannot, you know, visit your whole data set a million times. It’s just not possible. You have to do that more parsimoniously. So, this begs the question, in turn, when you go back to the optimization literature, of how to make the most of the calculations that you have already done. And what are the basic calculations that you can do to optimize a function? One of the basic things you can do is to calculate its derivative, its gradient. And now, you are asking the question, given the gradients that you have observed so far, how to best combine them to calculate the next gradient. So, this optimal usage of your information has a very long history in optimization, in particular in the Russian school, and they developed optimal methods in the 70s and the 80s and we are only now starting to really understand the fundamental concepts behind them.

**Host: Really?**

Sébastien Bubeck: Yes. So, it’s actually a quite interesting story. At the end of the 70s, Arkadi Nemirovski, who is the leading figure in the Russian school of optimization, discovered that when the function that you are optimizing is smooth, you can actually do better than the simple gradient descent technique. You can leverage the past information in a better way. So, he designed an accelerated technique, which Yurii Nesterov, who is now a legendary figure in machine learning, later simplified dramatically into a very practical algorithm, which is now called Nesterov’s momentum technique. So, this momentum technique was first designed in the context of convex optimization as a way to obtain an optimal method. And nowadays, it’s one of the most practical techniques for training deep neural networks, which are non-convex functions, and, you know, we don’t really understand why momentum is working there, but that’s the story of that technique.
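
As a small illustration of the momentum idea, here is a sketch comparing plain gradient descent with a constant-momentum form of Nesterov’s method on a toy ill-conditioned quadratic. The step size, momentum constant, and test function are illustrative choices, not anything from the episode.

```python
import math

def gd(grad, x0, step, iters):
    """Plain gradient descent."""
    x = list(x0)
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, grad(x))]
    return x

def nesterov(grad, x0, step, momentum, iters):
    """Constant-momentum Nesterov: the gradient is taken at a look-ahead
    point y that extrapolates along the previous direction of travel."""
    x = list(x0)
    y = list(x0)
    for _ in range(iters):
        x_new = [yi - step * gi for yi, gi in zip(y, grad(y))]
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x

# Ill-conditioned quadratic f(x) = 0.5*(100*x1^2 + x2^2): L = 100, mu = 1.
grad = lambda x: [100.0 * x[0], x[1]]
beta = (math.sqrt(100.0) - 1) / (math.sqrt(100.0) + 1)  # strongly convex tuning
x_plain = gd(grad, [1.0, 1.0], step=0.01, iters=150)
x_acc = nesterov(grad, [1.0, 1.0], step=0.01, momentum=beta, iters=150)
# Momentum approaches the minimizer (0, 0) far faster along the flat direction.
```

The look-ahead evaluation point is the "better combination of past gradients" being discussed: the method remembers the direction it has been traveling instead of restarting from scratch each step.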

**Host: So, there are things you do that you don’t totally understand, but they’re working, so you keep doing them.**

Sébastien Bubeck: That’s right. But what our group is actually doing is trying to understand why they are working. So, in particular, for this momentum technique, we worked a lot on it, and a couple of years ago, with my good collaborator, Yin Tat Lee, who is now an Assistant Professor at the University of Washington, we found a way to understand momentum from a geometric perspective. And this is very useful because when you start to be able to draw a picture, you get better intuition. So, this was, at least for us, a very important step towards better understanding momentum.

**Host: You have two big areas of interest. Well, you have a lot of areas of interest, let’s be honest. But you’ve noted that two big areas of your interest are the interplay between convexity and randomness in optimization, and also inference problems on random graphs. So, I wonder if you could just unpack those a little bit. Why are you interested in them? What do they address? Why are they important enough for someone with a brain like yours to tackle?**

Sébastien Bubeck: So, as I said before, nowadays, we really deal with very high dimensional spaces, and one key technique to explore those high dimensional spaces is to use randomness. And in classical convex optimization, randomness was not really a tool that people were looking at. They were more focused on trying to understand, what can you do with deterministic methods? So, it seems to me that there was, and there still is to some extent, a real opportunity to bring in those randomized algorithm tools into convex optimization and hopefully get a couple of breakthroughs, you know, on the way. So that was for the first area. So, the second area actually, was one of my areas of focus for a couple of years which comes from the observation that, nowadays, many data sets come in the form of a graph. And what you want to do is you want to do basic statistics with it. What are, you know, the basic things you want to compute with those graphs? To my surprise, there were some questions that seemed really fundamental to me that were not looked at. So, there are a couple of models which are very well-known now in random graph theory and in network analysis to explain, say, how the Facebook graph grew, or, you know, how the Twitter graph is growing. And one basic question that I wondered about is, okay, you have those evolution processes, but they start with some seed graphs. So, the seed is not empty. It’s not like, you know, Facebook started with one user, it was actually seeded with hundreds of users, or maybe more. So, what I was curious about is that there was this discrepancy between the reality, where people were designing, very carefully, a good seed graph in order to enhance the growth of the network, and the mathematical analysis, which was thinking about, what if the seed is empty? So, the first question that I asked was, is a theory without a seed relevant to the practice with a seed? And what we proved is that it’s not! 
That, you know, if you have two different seeds, you can get two completely different graph evolution processes.
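
As a toy illustration of seeded network growth, here is a sketch of a preferential attachment process started from two different seed graphs, a star and a path. The sizes and seeds are made up for the example, and this is not the specific model or analysis from the paper being discussed.

```python
import random

def preferential_attachment(seed_edges, n_final, rng):
    """Grow a graph from a seed: each new node links to an existing node
    chosen with probability proportional to its degree (sampled via the
    edge-endpoint list, where each vertex appears once per incident edge)."""
    endpoints = [v for e in seed_edges for v in e]
    nodes = set(endpoints)
    next_id = max(nodes) + 1
    while len(nodes) < n_final:
        target = rng.choice(endpoints)   # degree-proportional choice
        endpoints += [next_id, target]
        nodes.add(next_id)
        next_id += 1
    return {v: endpoints.count(v) for v in nodes}  # vertex -> degree

rng = random.Random(1)
star_seed = [(0, i) for i in range(1, 10)]    # hub-and-spoke seed, 10 nodes
path_seed = [(i, i + 1) for i in range(9)]    # path seed, also 10 nodes
deg_star = preferential_attachment(star_seed, 500, rng)
deg_path = preferential_attachment(path_seed, 500, rng)
# Comparing the two degree sequences (e.g. the largest hubs) illustrates
# how strongly the seed can shape the grown network.
```

Running this repeatedly and comparing, say, the top degrees under each seed gives a hands-on feel for the "seed matters" phenomenon described above.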

**Host: Quelle surprise!**

Sébastien Bubeck: Yeah. It was actually a surprise, yeah! We didn’t expect it. We thought that maybe, you know, in the very large sample regime, at the end of time, once everything has settled down, then maybe the seed doesn’t matter anymore. But that’s not the case.

**Host: Interesting. You have a blog called, “I’m a bandit” and it deals with random topics in optimization, probability and statistics.**

Sébastien Bubeck: That’s right.

**Host: So, first of all, I’m glad you’re not knocking over banks. Uhh… But the big topic here is bandit convex optimization. Talk about this in general, and then talk about your polynomial time algorithm for bandit convex optimization. We’ll go kind of deep here. I’d like you to get as technical as you like, but also, we’ll nod to the fact that you’ve got a great video on this, on the website, and three big blog posts on it that kind of go deep into the weeds, yeah?**

Sébastien Bubeck: Yes. So, bandit convex optimization is the following problem. So, as I said before, the new research on the multi-armed bandit problem is really interested in the case where there are many, many actions. So now you can ask, okay, what about the limit case where there is actually an infinite number of actions, a continuum of actions? So, then, of course, if you don’t make any assumptions about how the rewards are shaped over this set of actions, there is no hope. You cannot hope to find, you know, what is the best action. But as we discussed already, one of the very basic assumptions, which has found a lot of application in machine learning, is convexity. So, then the question becomes, in these bandit scenarios, where you have a continuum of actions, and the loss function is shaped as a convex function, what can you do? Can you still recover the optimal guarantee that we had before, with just two or three actions? So, this was an open problem that was published in 2004. And when I joined Microsoft Research five years ago, the Theory Group, and some of the members in the Theory Group, in particular Ofer Dekel, who is now the manager of the Machine Learning and Optimization group, and Yuval Peres, who is a distinguished probabilist, they were talking and thinking about this problem. And when I joined, you know, I was coming with my own expertise on bandits and I was really excited to join the group and start thinking about the questions that those guys cared about. And we were very fortunate that we made good progress on it. So, our first stride into the problem was, in fact, to show something a little bit unique in machine learning which was, we showed that the problem is solvable, but we didn’t actually give a technique to solve it. So, that was both interesting and disappointing.

**Host: “That don’t make no sense!”**

Sébastien Bubeck: Yes. It was strange. We were all surprised by this result, to some extent. But also, very excited because it meant that there is something to be discovered there. The problem is solvable, but it is certainly not solvable with the current techniques that we have.

**Host: I want you to drill in there for a second, because you’re actually saying something that baffles me. You could solve it, but you don’t know how you solved it?**

Sébastien Bubeck: So, it’s more as follows. There was this very concrete question: can you get √T regret for bandit convex optimization? What we showed is that the answer to this question is, yes. Yes, it is, in principle, possible to get √T regret for bandit convex optimization, but we did not give an algorithm that actually achieves this regret guarantee. So, then the question remained, can you actually get a polynomial time algorithm, like an efficient algorithm, something that will actually work on real data, and that gets the √T-type behavior? So, that was the state of this problem when Yin Tat Lee came for an internship with me in the summer of 2016 and we started thinking about this question. And, again, we were lucky to come up with a new approach based on kernels. So, kernel methods have a long and distinguished history in machine learning. In fact, they were the most popular technique before deep learning overtook the field. And what we found is a connection between bandit-type information and kernels. That there really is an interplay between the two. And while we were exploring this, to our own bafflement, we discovered that there is a connection between kernels and what are called Bernoulli convolutions. So, Bernoulli convolutions, they are mathematical objects that were defined in the 1930s that Paul Erdős, the famous mathematician, thought of as one of the most beautiful objects in mathematics. And these Bernoulli convolutions, they are directly related to the Gaussian kernel in machine learning and they are exactly what we needed to solve bandit convex optimization. And it turned out that the world expert on this topic was, you know, my office mate, sitting next door, Yuval Peres. So, it was very convenient that I could go to him and ask him all the questions around this.
And then later, Ronen Eldan joined us on this project; he was also one of the members of the team with which we proved that there is a solution, but we don’t know what the solution is!
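
The kernel-based algorithm itself is involved (the video and blog posts go into it); as a toy illustration of what it means to optimize from bandit feedback, here is the classical one-point gradient estimator in the spirit of Flaxman, Kalai, and McMahan. This predates, and is much simpler than, the method discussed above; the loss function and all parameters are illustrative.

```python
import math
import random

def one_point_bandit_gd(loss, dim, horizon, rng, delta=0.1, step=0.01):
    """Gradient descent when only single (bandit) loss evaluations are
    available: (dim / delta) * loss(x + delta*u) * u is an estimate of the
    gradient of a smoothed version of the loss."""
    x = [0.5] * dim
    for _ in range(horizon):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]                              # random unit direction
        value = loss([xi + delta * ui for xi, ui in zip(x, u)])  # one bandit observation
        x = [xi - step * (dim / delta) * value * ui for xi, ui in zip(x, u)]
    return x

rng = random.Random(0)
# Convex loss with minimum at (0.2, 0.2); the algorithm never sees gradients.
loss = lambda x: (x[0] - 0.2) ** 2 + (x[1] - 0.2) ** 2
x_final = one_point_bandit_gd(loss, dim=2, horizon=20000, rng=rng)
```

The point of the sketch is the information constraint: one noisy function value per round still carries enough signal, on average, to descend a convex loss.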

**Host: I just have the biggest grin on my face. I, I… Every podcast I hear things that delight me, and nobody can see my face!**

**(music plays)**

**Host: Let’s talk about task systems. As a level set, tell us what they are, and what they are for, and then talk about metrical task systems and how they’re different.**

Sébastien Bubeck: So, we have discussed, so far, the multi-armed bandit problem, which is a very fundamental and important problem, but it’s missing a notion of a state. So, in most of the real-life problems that we care about, it’s not like you take an action and it has no consequence on the future. In fact, when you take an action, often the state of the world, or the state of the system that you are acting on, is evolving and you have to take that into account. So, task systems are a very fundamental mathematical model of this problem and it goes like this. So, in task systems, you have a finite state space – I’m going to say that n is the size of the state space – and you are, again, going to play a sequential game, just like in the multi-armed bandit. And at every time step, what you’re presented with is a task. And what is a task? A task is a cost function on the state space. So, a task just tells you, for every state, how long it’s going to take you to complete the task if you’re sitting in that state. So, now, in the game, you are sitting in a certain state, you know, because of what you have done in the past. You see the new task coming at you. And the question is, to which new state do you move to complete the task? Now, of course, stated like this, you would just move to the state that has the minimal completion time for the task. But that’s where the metrical part comes into play, the metrical task system. So, the metrical part is that states can be far away from each other. It’s not like you can easily move, you know, from a state where you know no mathematics to a state where you are a world expert in mathematics. This is going to cost you a lot to make this movement, you know, to change your own state. So, now you can think of the state space as a metric space, meaning that there is a notion of distance between two states.
And when you’re sitting in a certain state and you get your new task, when you move to a new state, you are going to pay the distance to travel there, plus the completion time. So, these metrical task systems were introduced in the 80s in the theoretical computer science community. And, as I defined it, it’s a very discrete problem. And, as expected, people used discrete mathematics techniques to try to solve this problem. So, the big conjecture, the big open problem in that sub-field, is the following. So, you say that an algorithm for the metrical task system problem is competitive if it has the following property: no matter what the task sequence is, the algorithm’s performance, the total cost of the algorithm, is going to be comparable to the best it could have done had it known in hindsight what the sequence of tasks to be presented was. That’s what it means to be competitive. It basically means competing with someone who can see into the future. So, we’re going to say that you are alpha-competitive if, not only are you comparable to the best you could have done, but you are at most paying alpha times what the best could have been. So, you know, maybe you want to be 2-competitive or 3-competitive. And the conjecture in that field is that you can be log n competitive for any metrical task system. So, we are not very far from proving this conjecture. We are basically a polynomial factor off. But the important thing is that people were really looking at this problem from a discrete mathematics perspective. And what we brought in a year ago with my great collaborators, Michael Cohen, Yin Tat Lee and James Lee is that we looked at it from a convex optimization perspective. So, we viewed this very discrete problem in continuous space and continuous time.
And once you make that modification, then it’s very natural to again talk about gradient descent, and in fact, this older strategy which is called mirror descent, and using this new continuous strategy, we were able to refine the guarantees that we know about this problem. So, this was for the discrete metrical task system. Now, there is a convex version of the metrical task system. Because just as, for discrete multi-armed bandits, there is a lot of interest in going to the case where there is a huge set of actions, a huge set of arms, it’s the same thing for metrical task systems. You might be interested in a case where the set of states is, in fact, infinite. So then, if n is infinite, you know, log n is also infinite, and you are not competitive. So that’s the question….
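
To make the competitive-ratio bookkeeping concrete, here is a toy metrical task system in which a simple online greedy rule (not a competitive algorithm in general, and not the mirror descent approach described above) is compared against the offline optimum computed by dynamic programming. The metric, task costs, and sizes are all made up for the example.

```python
import random

def offline_opt(tasks, dist, start):
    """Optimal offline cost (knows all tasks in advance), by dynamic
    programming over the best cost of ending each step in each state."""
    n = len(dist)
    best = [dist[start][s] + tasks[0][s] for s in range(n)]
    for task in tasks[1:]:
        best = [min(best[p] + dist[p][s] for p in range(n)) + task[s]
                for s in range(n)]
    return min(best)

def greedy_online(tasks, dist, start):
    """Online greedy: serve each task in the state minimizing move + service."""
    state, total = start, 0.0
    for task in tasks:
        nxt = min(range(len(dist)), key=lambda s: dist[state][s] + task[s])
        total += dist[state][nxt] + task[nxt]
        state = nxt
    return total

rng = random.Random(0)
n, T = 3, 40
dist = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]  # uniform metric
tasks = [[rng.uniform(0.0, 2.0) for _ in range(n)] for _ in range(T)]
ratio = greedy_online(tasks, dist, 0) / offline_opt(tasks, dist, 0)
# ratio >= 1 by definition; bounding it over worst-case task sequences is
# exactly the competitiveness question.
```

On random inputs like this greedy looks fine; the hard part, and the subject of the conjecture, is bounding the ratio against an adversarial task sequence.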

**Host: Dang it!**

Sébastien Bubeck: Yes, yes indeed. Indeed. This is very disappointing. And, so the question is, just in metrical task systems, when the tasks are convex functions over the state space, can you be competitive at all? So, this was an open problem from the early 90s, from Friedman and Linial, and what they proved is that in dimension two, you can be competitive. Very restrictive, I mean, certainly not talking about the millions of dimensions that we opened with. And what we just proved recently, we just put out the paper a month ago, we proved that, indeed, you can be competitive for convex metrical task systems in any dimension. But the competitive ratio that you get is going to depend on the dimension. And a great open question is whether the competitive ratio can scale polynomially in the dimension. That seems very difficult.

**Host: All the rest of it seems very difficult also. And this then, as you reference, solves a 30-year old problem.**

Sébastien Bubeck: Right. So, in this literature, they don’t talk about convex metrical task systems, instead they talk about these problems that they call chasing convex bodies. So, in chasing convex bodies, you are moving a point in Euclidean space, or let’s say you are moving a point with D parameters. And at every time-step, you are given a convex set of constraints and what you have to do is you have to move your set of parameters inside this convex set. So, you have to chase the convex set, you have to go inside the convex set. And you see this sequence of convex sets that are given to you and you keep moving. And then again, you want to be competitive, so you want to compare yourself to the best you could have done in hindsight had you known the sequence of convex bodies right from the start. And so, indeed, we do show that you can be competitive to chase convex bodies.
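
As a minimal sketch of the chasing setup, here is a greedy strategy that simply projects the current point onto each arriving body, with axis-aligned boxes standing in for general convex sets (projection onto a box is just coordinate-wise clipping). Greedy projection is known not to be competitive in general; this only illustrates the movement-cost accounting, with made-up bodies.

```python
def project_to_box(point, box):
    """Euclidean projection onto an axis-aligned box [(lo, hi), ...] is
    coordinate-wise clipping."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(point, box)]

def greedy_chase(start, boxes):
    """Chase a sequence of bodies by projecting onto each one in turn;
    returns the final point and the total movement cost."""
    x, moved = list(start), 0.0
    for box in boxes:
        y = project_to_box(x, box)
        moved += sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
        x = y
    return x, moved

boxes = [
    [(0.0, 1.0), (0.0, 1.0)],   # already contains the start: no movement
    [(2.0, 3.0), (0.0, 1.0)],   # forces a horizontal move of length 2
    [(2.0, 3.0), (2.0, 3.0)],   # forces a vertical move of length 2
]
x_end, cost = greedy_chase([0.0, 0.0], boxes)
# cost is 4.0: two moves of length 2 each.
```

A competitive algorithm would have to hedge against future bodies instead of always taking the locally shortest move, which is where the difficulty lies.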

**(music plays)**

**Host: So, going philosophical a little bit here. We talked about some of the issues with what we call the current machine learning culture versus the culture of theoretical computer science, which is more traditional. What are some of the things we ought to be paying attention to right now? Is there anything that keeps you up at night? Anything we ought to be thinking about?**

Sébastien Bubeck: Yeah, sure. So, I personally believe that science is a fundamentally slow enterprise. It takes time to digest new concepts. It takes time to really recognize whether something that looks new on the face of it is actually new, or it’s just, you know, a slightly different point of view, a slightly different perspective. Our time is a little bit different from, you know, say, two decades ago, and one reason is just the amount of data that we have. So, we have this influx of data, and people want to do something with it. So, this, I believe, is not going away, and there will be a need for, you know, data scientists and people who really think deeply about how to use and how to leverage information that you can acquire from data. So, that will not change. What I worry a little bit more about is the culture which is centered around claims which have, as far as I can tell, no mathematical basis. You want science to be based on scientific facts. And that’s what I like about the theoretical computer science community is that they have well-established, long-standing open problems, and if you do make progress on those problems, then people will listen to you, rightfully, because it means you have some new techniques, something that people, before, couldn’t do. So, there are pros and cons for these very well-defined open problems. The con is, of course, that those open problems are not solving burning practical questions that we need to resolve right now. So, then, you know, you come up with a solution and it’s not like people are going to implement your algorithm right away. So, these are maybe not the most important questions if you really want to get new algorithms right away. But on the other hand, progress on these questions, this is deep progress. This is progress where you really develop new tools from which, hopefully, a couple of years from now, some more practical methods are going to emerge.

**Host: Tell us a bit about your history, academic and otherwise. It’s always great to hear personal anecdotes about where people came from and how they ended up where they are. What’s your story?**

Sébastien Bubeck: Sure. So, I grew up in a small town around Strasbourg in France next to Germany. Ummm…

**Host: What’s it called?**

Sébastien Bubeck: It’s called Oberhausbergen, which means “the house at the top of the hill.” And then, you know, there’s a town next to it which is Mittelhausbergen, “the house in the middle of the hill.” And then there is Niederhausbergen, “the house at the bottom of the hill.” So, these are the three towns where, you know, I was hanging out until I was 18. So during those first 18 years, yeah, I didn’t know anything about mathematics, science… but, then at 18, I went into this thing which is called Prépas, so that’s famous in France, that’s where you train to go into the Grandes Écoles, the engineering Grandes Écoles, or, you know, the Écoles Normales Supérieures, which is where I went, where you learn how to do mathematics. So that’s where I really fell in love with mathematics. I always remember it, like, the very first lecture, I immediately thought, OK, this is the first time that somebody’s saying something that really resonates with me. So, two years of Prépas, one year at the École Normale Supérieure and then I moved on to do my PhD. What I thought at the time I wanted to do was mathematics of the brain, but then I realized that there was no such thing. But, as I realized that, I also noticed this field of machine learning which seemed very exciting. So, I started to learn more about it and then I decided to do a PhD in machine learning and, specifically, in multi-armed bandit problems. So, that’s what I did in France.

**Host: So how did you end up at Microsoft Research?**

Sébastien Bubeck: After my PhD, I did one year in Barcelona for my postdoc, which was just amazing, scientifically and otherwise, and then, with my wife, we moved to Princeton, New Jersey, where I was an Assistant Professor in the Operations Research (OR) department. I stayed there for three years. And then I did a semester at the Simons Institute in Berkeley, and that’s where I met Yuval Peres, and that’s also where I discovered the field of theoretical computer science which, again, I fell in love with. And that’s really where I could see, you know, my second calling, I’d say. After mathematics, really, theoretical computer science. I thought the people in the field were amazing. I thought the questions they were looking at were fantastic. I thought their work ethic was very refreshing. So, that’s where I met Yuval Peres and after that, the rest is history. That’s why I joined Microsoft Research. I came to visit, they made me an offer, and I decided to join.

**Host: So as of two days ago, you have a pretty cool announcement to make.**

Sébastien Bubeck: Right. Our paper, joint work with researchers at Inria in France and, again, with Yin Tat Lee, just got awarded a Best Paper Award at NIPS. So, it’s pretty exciting.

**Host: Tell us a little bit about the paper and what it’s adding to the literature in the field.**

Sébastien Bubeck: Right, so it’s, in fact, tying back to one of my main areas of interest, which is convexity and randomness. So, the question we were looking at is the problem of distributed convex optimization. So, what happens when the objective function that you want to optimize, which depends on your data, is not all in a single place, but the data is distributed among many servers? Then how do you optimize both the communication between the different servers and the work that each server is doing? So, that’s the question we were looking at. Of course, we were very far from being the first ones to look at it… there are literally hundreds, if not thousands, of papers written on this topic. But, somewhat surprisingly, it seems that nobody had really looked at optimal methods, meaning provably optimal, with both upper and lower bounds. So, in classical serial optimization, this is, again, work done by the Russian school, Arkadi Nemirovski, Yurii Nesterov, and those people. But they didn’t look so much at the distributed scenario. And one reason, I think, is because in this distributed scenario, in fact, randomness is key. Randomization is very, very important. And that’s what our paper proves: if you use randomization, then you can get optimal guarantees.
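To make the setting concrete for readers: below is a minimal, illustrative sketch of distributed convex optimization, in which several servers each hold a shard of the data and a coordinator averages their local gradients once per communication round. This is a deliberately simple baseline (plain distributed gradient descent on a least-squares objective), not the randomized optimal method from the paper; all function and variable names here are our own.

```python
import numpy as np

def local_gradient(w, X, y):
    # Gradient of the average least-squares loss on one server's data shard.
    return X.T @ (X @ w - y) / len(y)

def distributed_gd(shards, dim, rounds=300, lr=0.1):
    """Each iteration is one communication round: every server computes a
    gradient on its local shard, and the coordinator averages them."""
    w = np.zeros(dim)
    for _ in range(rounds):
        grads = [local_gradient(w, X, y) for X, y in shards]
        w -= lr * np.mean(grads, axis=0)  # aggregate, then take one step
    return w

# Synthetic example: four servers, each holding 50 noiseless observations
# of the same linear model.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
shards = [(X, X @ w_true) for X in (rng.normal(size=(50, 3)) for _ in range(4))]
w = distributed_gd(shards, dim=3)
```

With enough rounds, `w` converges to `w_true`; the research question Dr. Bubeck describes is about the best achievable trade-off between such communication rounds and the computation each server performs.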

**Host: That’s awesome. Congratulations.**

Sébastien Bubeck: Thank you.

**Host: Sébastien, as we close, I always ask my guests to offer some parting advice to our listeners, whether it be in the form of inspiration for future research or challenges that remain to be solved. And your recent paper kind of tees that up a little bit. What would you say to researchers who might just be getting interested in this field?**

Sébastien Bubeck: Maybe good advice to any starting researcher is to become an expert in a very narrow field, or even a narrow question, but really become the world expert on that question, and that will yield rewards for a very long time. So that’s what I did with the multi-armed bandit problem, and I think this was very successful and, you know, later on I could see connections with this problem that I wouldn’t have made if I didn’t know all of this about the multi-armed bandit problem. Now, in terms of research directions, there are really many, many questions emerging. So, one of them we already mentioned is the chasing convex bodies problem. We showed that you can be competitive, but our dependency on the dimension is very bad. It’s exponential in the dimension. So, a great open question, but a difficult one, is whether you can be polynomial. Another open question is in metrical task systems: to actually solve the long-standing conjecture of whether you can be log n competitive. This is a fantastic question that relates to how you define a good notion of entropy for general metric spaces. It’s really a very deep and beautiful question. We don’t know how to solve this. Related to this latest Best Paper Award at NIPS, in fact, you can think about the multi-armed bandit version of this problem, which is exactly distributed bandits. This problem is basically entirely open, and I believe it’s a very rich question and, in fact, it ties into a broader question of how you think about multi-agent problems. And this, yeah, is a very rich field yet to emerge.
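Since the multi-armed bandit problem comes up throughout the conversation, a small sketch may help readers picture it: an algorithm repeatedly chooses among several “arms” with unknown reward probabilities, balancing exploration against exploitation. The sketch below implements the classic UCB1 strategy on Bernoulli arms; it is a textbook illustration of the problem, not any specific algorithm from Dr. Bubeck’s work, and the arm probabilities are invented for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """UCB1 on Bernoulli arms: after trying each arm once, always pull the
    arm with the highest empirical mean plus an exploration bonus."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # how many times each arm was pulled
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Three arms with success probabilities 0.2, 0.5, 0.8; the algorithm should
# concentrate most of its pulls on the best arm.
counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms end up being pulled only logarithmically often; the distributed-bandit question he mentions asks what happens when many such learners must coordinate.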

**(music plays)**

**Host: Sébastien Bubeck. Thank you very much for coming on the podcast. It’s been really, really interesting.**

Sébastien Bubeck: My pleasure. I really enjoyed the questions.

**Host: To learn more about Dr. Sébastien Bubeck and how theoretical computer scientists are tackling the multi-armed bandit and other combinatorial conundrums, visit Microsoft.com/research**