MIT-MSR Machine Learning Seminar


A roughly monthly seminar series about recent results in Machine Learning. The MIT-MSR ML Seminar is a seminar series highlighting recent advances in Machine Learning that meets roughly once per month. Local and visiting researchers will speak about their work. The agenda typically consists of a prepared presentation, followed by a period of questions and discussion. The seminar is aimed at an audience of machine learning researchers and graduate students. All are welcome!

Event Details

Monday, May 2nd, 2016 | 3:00 PM - 4:00 PM | MIT Stata 32-G449

The Zen of Passwords (and the Complexity of Human Computation)Santosh Vempala, Georgia Tech


What can a human compute in his/her head with no external aids? As a motivating application: is there a secure and humanly-usable password generation method? How about a humanly-computable one-way function? We propose a rigorous model of human computation and apply it to the problem of generating and remembering passwords. Our main finding is that there exist password schemas that are (a) precisely defined (b) humanly usable, in that they can be learned in minutes and can be used to generate passwords in 10-20 seconds and (c) secure to a well-defined extent, in that individual passwords are hard to guess, and knowing several passwords does not enable an adversary to infer others. Our Human Usability Measure (HUM) allows us to draw sharp boundaries for human computability and has other applications, including user-interface design and coin-flipping over SMS. It also raises a number of intriguing questions about human computation.

We demonstrate a tradeoff between security and ease of use of a password scheme. By a series of experiments on Amazon’s Mechanical Turk, we find that the easiest-to-remember secure password is KaKaChooChoo1243!

This is joint work with Manuel Blum, Jeremiah Blocki, Yongkoo Kang, Lisa Masserova, Elan Rosenfeld and Samira Samadi.


Santosh Vempala is a Distinguished Professor of Computer Science in the College of Computing at the Georgia Institute of Technology. He joined the College of Computing in the fall of 2006 as a professor of Computer Science. He spearheaded the Algorithms and Randomness Center and Think Tank at Georgia Tech, and served as its first director from 2006 until 2011. His research interests include algorithms, randomness, geometry and computing-for-good (C4G). He graduated from CMU in 1997, advised by Avrim Blum, and then taught at at MIT until 2006 except for a year as a Miller fellow at UC Berkeley. Vempala is also a Sloan, Guggenheim, ACM and generally excitable Fellow, especially when a phenomenon that appears complex from one perspective, turns out to be simple from another. In recent years, he has been trying to understand, with little success, how the brain works and how to model its computational abilities.


The Patil/Kiva Seminar room is on the 4th floor of the Gates Tower in the Stata Center (Building 32). Enter Building 32 at the front entrance and proceed straight ahead; there will be elevators to the right. Take the elevators to the 4th floor; exit and then turn right. At the end of the short corridor bear to the left and continue around the R&D Dining Room. The Patil/Kiva Seminar Room will be straight ahead, before the dining room

Wednesday, April 20th, 2016 | 3:00 PM - 4:00 PM | MIT Stata 32-G449

Big Data and Bayesian NonparametricsMatt Taddy, Microsoft Research


Big Data is often characterized by large sample sizes, high dimensions, and strange variable distributions. For example, an e-commerce website has 10-100s million observations weekly on a huge number of variables with density spikes at zero and elsewhere and very fat tails. These properties — big and strange — beg for nonparametric analysis. We revisit a flavor of distribution-free Bayesian nonparametrics that approximates the data generating process (DGP) with a multinomial sampling model. This model then serves as the basis for analysis of statistics — functionals of the DGP — that are useful for decision making regardless of the true DGP. The ideas will be illustrated in the measurement of treatment effect heterogeneity, analysis of heavy tailed distributions, and in fitting decision trees.

The result is a framework for scalable nonparametric Bayesian decision making on massive data.


Matt Taddy joined Microsoft Research in New England from the University of Chicago Booth School of Business, where he is an Associate Professor of Econometrics and Statistics. Taddy’s research is focused on methodology and applications in statistics, econometrics, and machine learning. He’s especially interested in the intersections of these fields, and in designing systems for analysis that work automatically and on a massive scale. He has collaborated with small start-ups, with large research agencies including Sandia, Los Alamos, and Lawrence Livermore National Labs, and was a research fellow at eBay from 2014-2016. He also created and teaches the MBA Big Data course at Booth. Taddy earned his PhD in Applied Math and Statistics in 2008 from the University of California, Santa Cruz, as well as a BA in Philosophy and Mathematics and an MSc in Mathematical Statistics from McGill University.

Wednesday, March 2nd, 2016 | 3:00 PM - 4:00 PM | MIT Stata 32-G463

Statistical Estimation Under Communication ConstraintsJohn Lafferty, University of Chicago


Imagine that I estimate a statistical model from data, and then want to share my model with you. But we are communicating over a resource constrained channel. By sending lots of bits, I can communicate my model accurately, with little loss in statistical risk. Sending a small number of bits will incur some excess risk. What can we say about the tradeoff between statistical risk and the communication constraints? This is a type of rate distortion and constrained minimax problem, for which we provide a sharp analysis in certain nonparametric settings.

Joint work with Yuancheng Zhu (University of Chicago).


John Lafferty is the Louis Block Professor in the Departments of Statistics, Computer Science, and the University of Chicago. He is also an Adjoint Professor at the Toyota Technological Institute at Chicago. His research area is statistical machine learning, with a focus on computational and statistical aspects of nonparametric methods, high-dimensional data, graphical models, and text modeling. Lafferty received his doctoral degree in mathematics from Princeton University, where he was a member of the Program in Applied and Computational Mathematics. Prior to joining the University of Chicago in 2011, he was a faculty member in the Computer Science Department at Carnegie Mellon University for almost twenty years. He is currently a member of the Committee on Applied and Theoretical Statistics (CATS) of the National Research Council, and co-chaired a National Academies Workshop on “Training Students to Extract Value from Big Data” in 2014. He was a Medallion Lecturer, awarded by the Institute for Mathematical Statistics, at the Joint Statistical Meetings in 2015.

Thursday, November 19th, 2015 | 4:00 PM - 5:00 PM | MIT Stata 32-G463

Methods for Quanitifying Conflict Casualties in SyriaRebecca Steorts, Duke University


Information about social entities is often spread across multiple large databases, each degraded by noise, and without unique identifiers shared across databases. Record linkage—reconstructing the actual entities and their attributes—is essential to using big data and is challenging not only for inference but also for computation. In this talk, I motivate record linkage by the current conflict in Syria. It has been tremendously well documented, however, we still do not know how many people have been killed from conflict-related violence. We describe a novel approach towards estimating death counts in Syria and challenges that are unique to this database. We first introduce a novel approach to record linkage by discovering a bipartite graph, which links manifest records to a common set of latent entities. Our model quantifies the uncertainty in the inference and propagates this uncertainty into subsequent analyses. We then introduce computational speed-ups to avoid all-to-all record comparisons based upon locality-sensitive hashing from the computer science literature. Finally, we speak to the success and challenges of solving a problem that is at the forefront of national headlines and news.


Rebecca C. Steorts is currently an Assistant Professor at Duke University in the Department of Statistical Science with affiations in the Social Science Research Institute and the information iniative at Duke. She was a Visiting Assistant Professor in the Statistics Department at Carnegie Mellon University from 2012–2015. She received her B.S. in Mathematics in 2005 from Davidson College, her MS in Mathematical Sciences in 2007 from Clemson University, and her PhD in 2012 from the Department of Statistics at the University of Florida under the supervision of Malay Ghosh. Rebecca is a recipient of the Graduate Alumni Fellowship Award (2007-2010) from the University of Florida and the U.S. Census Bureau Dissertation Fellowship Award (2010-2011). In 2011, she was awarded the UF Innovation through Institutional Integration Program (I-Cubed) and NSF for development of an introductory Bayesian course for undergraduates. She has also been awarded Finalist for the 2012 Leonard J. Savage Thesis Award in Applied Methology. She is interested in scalable computational methods for social science applications. Her current works focuses on recovering high dimensional objects from degraded data and determining how to recover the underlying structure. Methods used for this are entity resolution, small area estimation, locality sensitive hashing, and privacy-preserving record linkage as applied to medical studies, fmri studies, human rights violations, and estimation of poverty rates in hard to reach domains. Her research was on record linkage and sparse clustering was recently funded by the John Templeton Foundation, MetaKnowledge Network Grants Awarded, November 2014. She was recently named to MIT Technology Review’s 35 Innovators Under 35 for 2015 as a humantarian in the field of software. Her work will be profiled in the Septmember/October issue of MIT Technology Review and she will be recognized at a special ceremony along with an invited talk at EmTech in November 2015.

Thursday, October 15th, 2015 | 4:00 PM - 5:15 PM | MIT Stata 32-G449

Contextual Dueling BanditsRobert Schapire, Microsoft Research


We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al., which we extend to incorporate context. Roughly, the learner’s goal is to find the best policy, or way of behaving, in some space of policies, although “best” is not always so clearly defined. Here, we propose a new and natural solution concept, rooted in game theory, called a von Neumann winner, a randomized policy that beats or ties every other policy. This notion overcomes important limitations of existing solutions, particularly the Condorcet winner which has typically been used in the past, but which requires strong and often unrealistic assumptions. We then present three efficient algorithms for online learning in our setting, and for approximating a von Neumann winner from batch-like data. The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space. The other two algorithms require time and space only logarithmic in the size of the policy space when provided access to an oracle for solving classification problems on the space.

This is joint work with Miro Dudík, Katja Hofmann, Alex Slivkins, and Masrour Zoghi.


Robert Schapire is a Principal Researcher at Microsoft Research in New York City. He received his PhD from MIT in 1991. After a short post-doc at Harvard, he joined the technical staff at AT&T Bell Laboratories (later, AT&T Labs) in 1991. In 2002, he became a Professor of Computer Science at Princeton University where he was later named the David M. Siegel ’83 Professor in Computer Science. He joined Microsoft Research in 2014. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanelakkis Theory and Practice Award (both of the last two with Yoav Freund). He is a fellow of the AAAI, and a member of the National Academy of Engineering. His main research interest is in theoretical and applied machine learning, with particular focus on boosting, online learning, game theory, and maximum entropy.

Thursday, September 10th, 2015 | 4:00 PM - 5:15 PM | MIT Stata 32-G449

Inference and Learning with Probabilistic Submodular ModelsAndreas Krause, ETH Zurich


As a discrete analogue of convexity, submodularity has profound implications for optimization. In recent years, submodular optimization has found many new applications, such as in machine learning and network analysis. These include active learning, dictionary learning, data summarization, influence maximization and network structure inference. In this talk, I will present our recent work on quantifying uncertainty in submodular optimization. In particular, we carry out the first systematic investigation of inference and learning in probabilistic submodular models (PSMs).

These are probabilistic models defined through submodular functions — log-sub/supermodular distributions — generalizing regular binary Markov Random Fields and Determinantal Point Processes. They express natural notions such as attractiveness and repulsion and allow to capture long-range, high-order dependencies among the variables. I will present our recently discovered variational approach towards inference in general PSMs based on sub- and supergradients. We obtain both lower and upper bounds on the log-partition function, which enables computing probability intervals for marginals, conditionals and marginal likelihoods. We also obtain fully factorized approximate posteriors, at essentially the same computational cost as ordinary submodular optimization. Our framework results in convex problems for optimizing over differentials of submodular functions, which we show how to optimally solve. Our approximation is exact at the mode (for log-supermodular distributions), and we provide bounds on the approximation quality of the log-partition function with respect to the curvature of the function. We further establish natural relations between our variational approach and the classical mean-field method from statistical physics. Exploiting additive structure in the objective leads to highly scalable, parallelizable message passing algorithms. We empirically demonstrate the accuracy of our inference scheme on several PSMs arising in computer vision and network analysis. Lastly, I will discuss some very recent work on learning PSMs from data.


Andreas Krause is an Associate Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. Before that he was Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University

(2008) and his Diploma in Computer Science and Mathematics from the Technical University of Munich (2004). He is a Microsoft Research Faculty Fellow and received an ERC Starting Investigator grant, the Deutscher Mustererkennungspreis, an NSF CAREER award and several best paper awards.