I am part of the Microsoft Research Montreal Reinforcement Learning team.
I frame my research agenda as morals in Reinforcement Learning, because I believe that deep Reinforcement Learning, in its common use, violates most moral requirements: these include (but are not limited to) reliability, privacy, safety, sustainability, and fairness, five topics on which I currently have work in progress. Here is a quick overview of my latest work:
- Deep Reinforcement Learning algorithms are unreliable. As such, we might consider training an ensemble of them and selecting the fittest one to control the agent. SSBAS is an architecture for Reinforcement Learning algorithm selection. We show across several domains and many settings that SSBAS improves reliability and even outperforms the best algorithm in the ensemble. It may also be used to enforce secondary constraints such as safety or fairness.
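To give a flavour of the idea, here is a minimal sketch of bandit-style selection over an ensemble of algorithms. This is an illustration only, not the SSBAS architecture itself; the class name, the epsilon-greedy rule, and the use of empirical trajectory returns are all assumptions made for the example.

```python
import random

class EpsilonGreedySelector:
    """Illustrative sketch (not SSBAS): pick which algorithm in the
    ensemble controls the next trajectory, based on past returns."""

    def __init__(self, n_algorithms, epsilon=0.1, seed=0):
        self.n = n_algorithms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_algorithms   # trajectories run per algorithm
        self.sums = [0.0] * n_algorithms   # cumulative return per algorithm

    def select(self):
        # Explore with probability epsilon (or while some algorithm is
        # untried); otherwise exploit the best empirical mean return.
        if self.rng.random() < self.epsilon or 0 in self.counts:
            return self.rng.randrange(self.n)
        means = [s / c for s, c in zip(self.sums, self.counts)]
        return max(range(self.n), key=means.__getitem__)

    def update(self, algo_index, trajectory_return):
        self.counts[algo_index] += 1
        self.sums[algo_index] += trajectory_return
```

With `epsilon=0`, once every algorithm has been tried, the selector deterministically hands control to the one with the best empirical return.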
- SPIBB is a batch Reinforcement Learning algorithm that provably and reliably improves on the behavioural policy, and that outperforms competing algorithms by a wide margin. It is also the first of its kind that can be applied to large problems requiring function approximation while still safely improving the policy. It will be presented as a long oral at ICML 2019 on Tue Jun 11th, 2pm, Room 104, and as poster #101 in the Pacific Ballroom.
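A simplified, single-state sketch of the kind of bootstrapping behind this idea: actions that were rarely observed in the batch keep the baseline's probability, and only well-observed actions are improved greedily. The function name, the inputs, and the one-state restriction are simplifications for illustration.

```python
def spibb_policy_at_state(pi_b, counts, q_values, n_wedge):
    """One-state sketch of a SPIBB-style improvement step.

    pi_b: baseline (behavioural) action probabilities.
    counts: number of times each action was observed in the batch.
    q_values: estimated action values.
    n_wedge: count threshold below which an action is deemed uncertain.
    """
    n_actions = len(pi_b)
    bootstrapped = [a for a in range(n_actions) if counts[a] < n_wedge]
    trusted = [a for a in range(n_actions) if counts[a] >= n_wedge]
    new_pi = [0.0] * n_actions
    for a in bootstrapped:          # uncertain actions: copy the baseline
        new_pi[a] = pi_b[a]
    if trusted:                     # well-observed actions: be greedy with
        best = max(trusted, key=lambda a: q_values[a])
        new_pi[best] = sum(pi_b[a] for a in trusted)  # the remaining mass
    else:                           # nothing trusted: fall back to baseline
        new_pi = list(pi_b)
    return new_pi
```

For instance, with a uniform baseline over four actions, two rarely-seen actions keep probability 0.25 each, and the baseline mass of the two well-observed actions is moved onto the better of them. Because the new policy can only deviate from the baseline where the data supports it, its performance cannot collapse on poorly covered regions.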
- For a stochastic bandit problem, we study privacy in a distributed architecture: we have N players, typically one per user, who must synchronize to minimize the cumulative regret. It will be presented as a long oral at ICML 2019 on Thu Jun 13th, 4pm, Hall B, and as poster #51 in the Pacific Ballroom.
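To make the setting concrete, here is a toy sketch of why players would want to synchronize at all: each player runs UCB1 locally, and pooling everyone's per-arm statistics gives each player's next choice the benefit of all observations. This is background illustration only, not the paper's protocol, and it ignores the privacy mechanism entirely; all names here are assumptions.

```python
import math

def ucb1_choice(counts, sums):
    """UCB1 arm choice from (possibly pooled) per-arm statistics."""
    for a, c in enumerate(counts):
        if c == 0:                  # pull every arm once before anything else
            return a
    t = sum(counts)
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a]
                             + math.sqrt(2 * math.log(t) / counts[a]))

def merge(local_stats):
    """Synchronization step: pool each player's (counts, reward sums),
    so every player decides as if it had seen all N players' pulls."""
    n_arms = len(local_stats[0][0])
    counts = [sum(c[a] for c, _ in local_stats) for a in range(n_arms)]
    sums = [sum(s[a] for _, s in local_stats) for a in range(n_arms)]
    return counts, sums
```

The tension the paper studies is exactly here: the merge step is what keeps the regret low, but naively sharing raw statistics between players leaks information about each user.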
- A Budgeted Markov Decision Process (BMDP) extends an MDP to critical applications requiring safety constraints. So far, it could only be solved for finite state spaces with known dynamics. Our work extends the state of the art to continuous-state environments with unknown dynamics. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
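The core BMDP idea can be sketched on a toy problem: augment the state with the remaining budget, forbid actions whose cost would exceed it, and plan as usual. The tiny deterministic MDP below is made up for illustration (the paper handles stochastic, unknown dynamics over continuous states).

```python
from functools import lru_cache

# Toy deterministic MDP: transitions[state][action] = (next_state, reward, cost)
TRANSITIONS = {
    "start": {"fast": ("goal", 10.0, 3.0),   # high reward, but costly/risky
              "safe": ("mid", 4.0, 0.0)},
    "mid":   {"fast": ("goal", 5.0, 2.0),
              "safe": ("goal", 2.0, 0.0)},
    "goal":  {},                             # terminal state
}

@lru_cache(maxsize=None)
def value(state, budget):
    """Best achievable return from `state` without exceeding `budget` cost."""
    if not TRANSITIONS[state]:
        return 0.0
    best = float("-inf")
    for next_state, reward, cost in TRANSITIONS[state].values():
        if cost <= budget:   # the budget constraint prunes unsafe actions
            best = max(best, reward + value(next_state, budget - cost))
    return best
```

With a budget of 3 the planner can afford the risky shortcut, with a budget of 2 it must take a safer first step, and with no budget at all only zero-cost actions remain: the budget acts as a dial between return and safety.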