Efficient Algorithms for High Dimensional Robust Learning
We study high-dimensional estimation in a setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even…
Machine Learning Systems for Highly Distributed and Rapidly Growing Data
The usability and practicality of machine learning are largely influenced by two critical factors: low latency and low cost. However, achieving both is very challenging when machine learning depends on real-world…
Get Your Data Together! Algorithms for Managing Data Lakes
Data lakes (e.g., enterprise data catalogs and Open Data portals) are data dumps if users cannot find and utilize the data in them. In this talk, I present two problems in massive, dynamic data lakes:…
Orthogonal Statistical Learning
PAC Battling Bandits in the Plackett-Luce Model
Predicting the ‘holy grail’ of climate forecasting: A new model and a new public dataset
It was crunch time, just as it had been many times before in the preceding weeks. Such is the nature of real-time competition. The yearlong Subseasonal Climate Forecast Rodeo was being sponsored by the Bureau…