New England Machine Learning Day 2017


The sixth annual New England Machine Learning Day will be Friday, May 12, 2017, at Microsoft Research New England, One Memorial Drive, Cambridge, MA 02142. The event will bring together local academics and researchers in Machine Learning, Artificial Intelligence, and their applications. There will be a lively poster session during lunch.

Interested in helping improve fairness and reduce bias and discrimination in ML? Attend the New England Machine Learning Hackathon: Hacking Bias in ML, the day before, on Thursday, May 11, at the same location.

For talk abstracts, see the agenda below.


Time Session
9:50–10:00 Opening remarks
10:00–10:30 Leslie Pack Kaelbling, Massachusetts Institute of Technology
Intelligent robots redux
10:35–11:05 Alexander Rush, Harvard University
Structured attention networks
11:10–11:40 Lester Mackey, Microsoft Research
Measuring sample quality with Stein’s method
11:40–1:45 Lunch and posters
1:45–2:15 Thomas Serre, Brown University
What are the visual features underlying human versus machine vision?
2:20–2:50 David Sontag, Massachusetts Institute of Technology
Causal inference via deep learning
2:50–3:20 Coffee break
3:20–3:50 Roni Khardon, Tufts University
Effective variational inference in non-conjugate 2-level latent variable models
3:55–4:25 Tina Eliassi-Rad, Northeastern University
Learning, mining and graphs
4:30–5:00 Erik Learned-Miller, University of Massachusetts Amherst
Bootstrapping intelligence with motion estimation


  • David Cox, Harvard University
  • Adam Tauman Kalai, Microsoft Research (chair)
  • Ankur Moitra, Massachusetts Institute of Technology
  • Kate Saenko, Boston University

Poster chairs

Steering committee

  • Ryan Adams, Harvard University
  • Adam Tauman Kalai, Microsoft Research
  • Joshua Tenenbaum, Massachusetts Institute of Technology


9:50 – 10:00
Opening remarks

10:00 – 10:30, Leslie Pack Kaelbling, Massachusetts Institute of Technology
Intelligent robots redux
The fields of AI and robotics have made great improvements in many individual subfields, including motion planning, symbolic planning, probabilistic reasoning, perception, and learning. Our goal is to develop an integrated approach to solving very large problems that are hopelessly intractable to solve optimally. We make a number of approximations during planning, including serializing subtasks, factoring distributions, and determinizing stochastic dynamics, but regain robustness and effectiveness through a continuous state-estimation and replanning process. I will describe our initial approach to this problem, as well as recent work on improving effectiveness and efficiency through learning, and speculate a bit about the role of learning in generally intelligent robots.

10:35 – 11:05, Alexander Rush, Harvard University
Structured attention networks
Recent deep learning systems for NLP and related fields have relied heavily on the use of neural attention, which allows models to learn to focus on selected regions of their input or memory. The use of neural attention has proven to be a crucial component for advances in machine translation, image captioning, question answering, summarization, end-to-end speech recognition, and more. In this talk, I will give an overview of the current uses of neural attention and memory, describe how the selection paradigm has provided NLP researchers flexibility in designing neural models, and demonstrate some fun applications of this approach from our group.

I will then argue that selection-based attention may be an unnecessarily simplistic approach for NLP, and discuss our recent work on Structured Attention Networks [Kim et al., 2017]. These models integrate structured prediction as a hidden layer within deep neural networks to form a variant of attention that enables soft-selection over combinatorial structures, such as segmentations, labelings, and even parse trees. While this approach is inspired by structured prediction methods in NLP, building structured attention layers within a deep network is quite challenging, and I will describe the interesting dynamic programming approach needed for exact computation. Experiments test the approach on a range of NLP tasks including translation, question answering, and natural language inference, demonstrating improvements upon standard attention in performance and interpretability.

11:10 – 11:40, Lester Mackey, Microsoft Research
Measuring sample quality with Stein’s method
Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD) computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein’s method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and are simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.

11:40 – 1:45
Lunch and posters

1:45 – 2:15, Thomas Serre, Brown University
What are the visual features underlying human versus machine vision?

2:20 – 2:50, David Sontag, Massachusetts Institute of Technology
Causal inference via deep learning

2:50 – 3:20
Coffee break

3:20 – 3:50, Roni Khardon, Tufts University
Effective variational inference in non-conjugate 2-level latent variable models

3:55 – 4:25, Tina Eliassi-Rad, Northeastern University
Learning, mining and graphs

4:30 – 5:00, Erik Learned-Miller, University of Massachusetts Amherst
Bootstrapping intelligence with motion estimation


Poster Title
Presenting Author / Authors
Robust and Efficient Transfer Learning using Hidden Parameter Markov Decision Processes
Sam Daulton, Harvard University / Taylor Killian, Harvard University; Finale Doshi-Velez, Harvard University; George Konidaris, Brown University
Multimodal Sparse Representation Learning for Multimedia Applications
Miriam Cha, Harvard University / Youngjune L. Gwon & H.T. Kung, Harvard University
Learning Optimized Risk Scores on Large-Scale Datasets
Berk Ustun, Massachusetts Institute of Technology / Cynthia Rudin, Duke University
Accurate structure-based drug-protein binding energy prediction with deep convolutional neural networks
Maksym Korablyov, Massachusetts Institute of Technology / Xiao Luo, Nilai Sarda, Mengyuan Sun, Tyson Chen, Lily Zhang, Ellen Shea, Erica Weng, Brian Xie, Yejin You, Ryan Hays, Shuo Gu, Collin Stultz, & Gil Alterovitz, Harvard-MIT division, Boston Children’s Hospital
Kronecker Determinantal Point Processes
Zelda Mariet, Massachusetts Institute of Technology / Suvrit Sra, Massachusetts Institute of Technology
Synthesizing 3D via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks
Amir Arsalan Soltani, Massachusetts Institute of Technology / Haibin Huang, University of Massachusetts, Amherst; Jiajun Wu, Massachusetts Institute of Technology; Tejas D. Kulkarni, Google DeepMind; Joshua B. Tenenbaum, Massachusetts Institute of Technology
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu, Boston University / Abir Das, Boston University; Kate Saenko, Boston University
A Decentralized Cluster Primal Dual Splitting Method for Large-Scale Sparse Support Vector Machines with An Application to Hospitalization Prediction
Theodora S. Brisimi, Boston University / Alex Olshevsky, Ioannis Ch. Paschalidis, & Wei Shi, Boston University
SmartPlayroom: Semi-automated behavioral analysis of children with ASD in naturalistic environment
Pankaj Gupta, Brown University / Elena Tenenbaum, Stephen Sheinkopf, Thomas Serre, & Dima Amso, Brown University
Guided Proofreading of Automatic Segmentations for Connectomics
Daniel Haehn, Harvard University / Verena Kaynig-Fittkau, Harvard University; James Tompkin, Brown University; Jeff W. Lichtman & Hanspeter Pfister, Harvard University
Lie-Access Neural Turing Machines
Greg Yang, Harvard University / Alexander Rush, Harvard University
Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets
Andrea Tacchetti, Massachusetts Institute of Technology / Stephen Voinea & Georgios Evangelopoulos, Massachusetts Institute of Technology
Testing Ising Models
Gautam Kamath, Massachusetts Institute of Technology / Constantinos Daskalakis & Nishanth Dikkala, Massachusetts Institute of Technology
Mutual Information Hashing
Fatih Cakir, Boston University / Kun He, Sarah Adel Bargal, & Stan Sclaroff, Boston University
Dataflow Matrix Machines as a Model of Computations with Linear Streams
Michael Bukatin, HERE North America LLC / Jon Anthony, Boston College
A Bandit Framework for Strategic Regression
Yang Liu, Harvard University / Yiling Chen, Harvard University
Robust Budget Allocation via Continuous Submodular Functions
Matthew Staib, Massachusetts Institute of Technology / Stefanie Jegelka, Massachusetts Institute of Technology
Value Directed Exploration in Multi-Armed Bandits with Structured Priors
Bence Cserna, University of New Hampshire / Marek Petrik, Reazul Hasan Russel, & Wheeler Ruml, University of New Hampshire
Designing Neural Network Architectures Using Reinforcement Learning
Bowen Baker, Massachusetts Institute of Technology / Otkrist Gupta, Nikhil Naik, & Ramesh Raskar, Massachusetts Institute of Technology
What do Neural Machine Translation Models Learn about Morphology?
Yonatan Belinkov, Massachusetts Institute of Technology / Nadir Durrani, Fahim Dalvi, & Hassan Sajjad, Qatar Computing Research Institute; James Glass, Massachusetts Institute of Technology
Message-passing algorithms for synchronization problems
Amelia Perry, Massachusetts Institute of Technology / Alexander S. Wein, Massachusetts Institute of Technology; Afonso S. Bandeira, New York University; Ankur Moitra, Massachusetts Institute of Technology
Non-detection in spiked matrix models
Alex Wein, Massachusetts Institute of Technology / Amelia Perry, Massachusetts Institute of Technology; Afonso Bandeira, New York University Courant; Ankur Moitra, Massachusetts Institute of Technology
Coarse-to-Fine Attention Models for Document Summarization
Jeffrey Ling, Harvard University / Alexander Rush, Harvard University
TensorFlow Debugger: Debugging Dataflow Graphs for Machine Learning
Shanqing Cai, Google / Eric Breck, Eric Nielsen, Michael Salib, & D. Sculley, Google
Computational Prediction of Neoantigens for Personalized Cancer Vaccines
Michael Rooney, Neon Therapeutics (formerly at Broad, MIT) / Jenn Abelin, Neon Therapeutics (formerly at Broad); Derin Keskin, Dana–Farber Cancer Institute; Sisi Sarkizova, Harvard; Nir Hacohen & Steve Carr, Broad Institute; Cathy Wu, Dana–Farber Cancer Institute
On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits
Shahin Shahrampour, Harvard University / Mohammad Noshad & Vahid Tarokh, Harvard University
Bayesian Group Decisions: Algorithms and Complexity
Amin Rahimian, University of Pennsylvania/MIT Institute for Data, Systems, and Society / Ali Jadbabaie & Elchanan Mossel, Massachusetts Institute of Technology
Node Embedding for Network Community Discovery
Christy Lin, Boston University / Prakash Ishwar, Boston University; Weicong Ding, Technicolor
Max-value Entropy Search for Efficient Bayesian Optimization
Zi Wang, Massachusetts Institute of Technology / Stefanie Jegelka, Massachusetts Institute of Technology
Network Analysis Identifies Regions of Chromosome Interactions in the Genome
Anastasiya Belyaeva, Massachusetts Institute of Technology / Caroline Uhler, Massachusetts Institute of Technology; Saradha Venkatachalapathy, GV Shivashankar, & Mallika Nagarajan, National University of Singapore
SoundNet: Learning Sound Representations from Unlabeled Video
Carl Vondrick, Massachusetts Institute of Technology / Yusuf Aytar & Antonio Torralba, Massachusetts Institute of Technology
Recursive Sampling for the Nystrom Method
Christopher Musco, Massachusetts Institute of Technology
Robust Statistics in High Dimensions, Revisited
Jerry Li, Massachusetts Institute of Technology / Ilias Diakonikolas, University of Southern California; Gautam Kamath, Massachusetts Institute of Technology; Daniel M. Kane, University of California, San Diego; Ankur Moitra, Massachusetts Institute of Technology; Alistair Stewart, University of Southern California
From Patches to Images: A Nonparametric Generative Model
Geng Ji, Brown University / Mike Hughes, Harvard University; Erik Sudderth, Brown University/University of California, Irvine
Nucleotide-level Modeling of Genetic Regulation with Large Receptive Fields using Dilated Convolutions
Ankit Gupta, Harvard University / Alexander Rush, Harvard University
Predicting the Quality of Short Narratives from Social Media
Tong Wang, University of Massachusetts Boston / Ping C., University of Massachusetts Boston; Albert L., Disney Research
Generative Adversarial Models for Layered Segmentation
Deniz Oktay, Massachusetts Institute of Technology / Carl Vondrick & Antonio Torralba, Massachusetts Institute of Technology
ST-LDDM: An effective model for urban air quality prediction
Zheyun Xiao, University of Massachusetts Boston / Yang Mu, Facebook; Wei Ding, University of Massachusetts Boston
Data-driven identification and repair of software vulnerabilities
Onur Ozdemir, Draper / Jacob H., Boston University; Louis K., Onur O., Rebecca R., Marc M., Tomo Lazovich, & Jeffrey O., Draper
A Non-Linear Spatio-Temporal Modeling Framework for Heavy Precipitation and Crop Yield Prediction
Yahui Di, University of Massachusetts Boston / Wei Ding, University of Massachusetts Boston
Predicting neural response of olfactory system with structural and vibrational properties of molecules
Benjamin Sanchez, Harvard University / Aniket Zinzuwadia, Harvard University; Semion Saikin, Harvard University; Honggoo Chae & Dinu F. Albeanu, Cold Spring Harbor Laboratory; Venkatesh N. Murthy & Alán Aspuru-Guzik, Harvard University
On Causal Analysis for Heterogeneous Networks
Katerina Marazopoulou, University of Massachusetts Amherst / David Arbour & David Jensen, University of Massachusetts Amherst
The Ombú estimator: topology of samples to compare distributions
Javier Burroni, University of Massachusetts Amherst / David Jensen, University of Massachusetts Amherst
A/B Testing in Networks with Adversarial Members
Kaleigh Clary, University of Massachusetts Amherst / David Jensen & Andrew McGregor, University of Massachusetts Amherst
Scene Grammars, Factor Graphs, and Belief Propagation
Jeroen Chua, Brown University / Pedro Felzenszwalb, Brown University
Locally Interpretable Models to Generate Annotated Active Learning Recommendations
Richard L. Phillips, Haverford College / Kyu Hyun Chang & Sorelle Friedler, Haverford College
Crime Hotspot Forecasting via Deep Neural Networks
Yong Zhuang, University of Massachusetts Boston / Wei Ding, University of Massachusetts Boston; Melissa Morabito, University of Massachusetts Lowell