Directions in ML: Automating Dataset Comparison and Manipulation with Optimal Transport
Machine learning research has traditionally been model-centric, focusing on architectures, parameter optimization, and model transfer. Much less attention has been given to the datasets on which these models are trained, which are often assumed to be fixed, or subject to extrinsic and inevitable change. However, successful application of ML in practice often requires substantial effort in terms of dataset preprocessing and manipulation, such as augmenting, merging, mixing, or reducing datasets.
In this talk, I will present some of our recent work that seeks to formalize and automate these and other flavors of dataset manipulation under a unified approach. First, I will introduce the Optimal Transport Dataset Distance, which provides a fundamental theoretical building block: a formal notion of similarity between labeled datasets. In the second part of the talk, I will discuss how this notion of distance can be used to formulate a general framework of dataset optimization by means of gradient flows in probability space. I will end by presenting various exciting potential applications of this dataset optimization framework.
Learn more about the 2020-2021 Directions in ML: AutoML and Automating Algorithms virtual speaker series: https://aka.ms/diml
- Speaker: David Alvarez-Melis, Senior Researcher
- Affiliation: Microsoft Research New England