Romain Laroche

Principal Researcher

About

I am part of the Microsoft Research Montreal Reinforcement Learning team.

I frame my research agenda as Responsible Reinforcement Learning, because I believe that deep Reinforcement Learning, in its common use, transgresses most responsibility requirements: these include (but are not limited to) reliability, privacy, safety, sustainability, and fairness, five topics on which I currently have ongoing work. Here is a quick overview of my latest and current work:

  • Deep Reinforcement Learning algorithms are unreliable. As such, we might consider training an ensemble of them and selecting the fittest one to control the agent. SSBAS (ICLR18) is an architecture for Reinforcement Learning Algorithm Selection. We show in several domains and many settings that SSBAS improves reliability and even outperforms the best algorithm in the ensemble. It may also be used to enforce secondary constraints such as safety or fairness. A minimal sketch of the underlying meta-bandit idea is given after this list.
  • SPIBB (long oral ICML19, ECML19) is a batch Reinforcement Learning algorithm that provably and reliably improves on the behavioural policy, and that outperforms competing algorithms by a wide margin. It is also the first of its kind that can be applied to large problems requiring function approximation while still safely improving the policy. A sketch of the projection at its core follows the list.
  • For a stochastic bandit problem, we study privacy in a distributed architecture (long oral ICML19): N players, one per user, must synchronize to minimize the regret without jeopardizing privacy.
  • A Budgeted Markov Decision Process (BMDP) is an extension of an MDP for critical applications requiring safety constraints. Previously, it could only be solved for finite state spaces with known dynamics. Our work (NeurIPS19) extends the state of the art to continuous state spaces and unknown dynamics; the constrained objective is written out after this list. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
  • Fairness in RL typically tries to guarantee an expected minimum user-centric return to every critical group of the population. We flip this formulation and instead optimize fairness under a sustainability/profitability constraint.
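
As promised in the first bullet, here is a minimal, illustrative sketch of the meta-bandit idea behind algorithm selection, written in Python. It is not the actual SSBAS implementation: a UCB-style bandit simply decides which learner from the ensemble controls the next meta-epoch, based on the returns each learner has collected so far. All names and the exploration constant c are my own choices.

    import math

    def select_algorithm(counts, mean_returns, t, c=2.0):
        """UCB-style choice of the learner that controls the next epoch.

        counts[k]       -- number of epochs learner k has controlled
        mean_returns[k] -- average return collected during those epochs
        t               -- total number of epochs elapsed so far
        Illustrative notation only, not the paper's.
        """
        # Try every learner at least once before trusting the estimates.
        for k, n in enumerate(counts):
            if n == 0:
                return k
        # Optimism in the face of uncertainty: mean return plus a bonus.
        ucb = [m + math.sqrt(c * math.log(t) / n)
               for m, n in zip(mean_returns, counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

SSBAS itself works over sliding meta-epochs so that the selection keeps tracking learners whose performance changes as they train; the sketch above omits that non-stationarity handling.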
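
The safety mechanism behind SPIBB fits in a few lines as well. The sketch below is my paraphrase of the greedy projection at its core, not the authors' code: on state-action pairs observed fewer than n_wedge times in the batch, the new policy copies the baseline; elsewhere, it may act greedily with respect to the Q-values estimated from the batch.

    import numpy as np

    def spibb_greedy_step(q, pi_b, counts, n_wedge):
        """One greedy projection step in the spirit of SPIBB (a sketch).

        q       -- (S, A) Q-values estimated from the batch
        pi_b    -- (S, A) baseline (behavioural) policy
        counts  -- (S, A) state-action visit counts in the batch
        n_wedge -- threshold below which we bootstrap on the baseline
        """
        S, A = q.shape
        pi = np.zeros((S, A))
        for s in range(S):
            uncertain = counts[s] < n_wedge        # boolean mask over actions
            pi[s, uncertain] = pi_b[s, uncertain]  # keep the baseline's mass
            if (~uncertain).any():
                # Move the remaining mass onto the best well-estimated action.
                free_mass = 1.0 - pi[s, uncertain].sum()
                sure = np.flatnonzero(~uncertain)
                pi[s, sure[np.argmax(q[s, sure])]] += free_mass
            else:
                pi[s] = pi_b[s]                    # nothing is well-estimated
        return pi

In the full algorithm, this projection alternates with policy evaluation until convergence, which is what yields the high-probability guarantee that the returned policy performs at least almost as well as the baseline.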
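
Finally, for concreteness, here is one common way to write the constrained objective that a Budgeted MDP targets (my notation, not necessarily the paper's): with reward R, cost C, discount γ and budget β,

    \max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^t R(s_t, a_t)\right]
    \quad \text{subject to} \quad
    \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^t C(s_t, a_t)\right] \le \beta.

What distinguishes a BMDP from a plain constrained MDP, as I understand it, is that β is given as an input to the policy, so a single solution must hold for every admissible budget level at once.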

You may find a recent video of my keynote at Data Fest by following this link.