The first New England Machine Learning Day will be held May 16, 2012, from 9:50 AM to 5:40 PM at Microsoft Research New England, One Memorial Drive, Cambridge, MA 02142. The event will bring together local academics and researchers in machine learning and its applications. There will be a lively poster session during lunch. See the agenda tab for the list of presentations.
10:00-10:40, Tommi Jaakkola (MIT)
Scaling structured prediction
10:45-11:25, Andrew McCallum (UMass Amherst)
Joint inference and probabilistic databases for large-scale knowledge-base construction
Wikipedia’s impact has been revolutionary. The collaboratively edited encyclopedia has transformed the way many people learn, browse new interests, share knowledge and make decisions. Its information is mainly represented in natural language text. However, for many tasks more structured information is useful because it better supports pattern analysis and decision-making. In this talk I will describe multiple research components useful for building large, structured knowledge bases, including information extraction from text, entity resolution, joint inference with conditional random fields, probabilistic databases to manage uncertainty at scale, robust reasoning about human edits, tight integration of probabilistic inference and parallel/distributed processing, and probabilistic programming languages for easy specification of complex graphical models. I will also discuss applications of these methods to scientometrics and a new publishing model for science research. Joint work with Michael Wick, Sameer Singh, Karl Schultz, Sebastian Riedel, Limin Yao, Brian Martin and Gerome Miklau.
11:30-12:10, Leslie Valiant (Harvard)
Reasoning on learned knowledge
Lunch and posters
1:45-2:25, Ohad Shamir (MSR)
New approaches to collaborative filtering
Collaborative filtering (CF) is a common approach to recommender systems that uses information from many users’ prior preferences. In particular, a matrix-based approach, which makes predictions by fitting a low-rank or low-norm matrix to the observed data, has proven highly effective in modern CF tasks, such as the Netflix challenge. In this talk, I will outline two recent contributions in this direction. The first is a simple and principled algorithm to find a low-rank solution to large-scale convex optimization problems, such as those encountered in CF. The second is the development of online learning algorithms for CF, which can potentially provide very strong guarantees under minimal assumptions on the users’ behavior. Moreover, the unique nature of CF hinders the application of standard online learning techniques, and requires some fundamentally new approaches which might be of independent interest. Based on joint works with Nicolò Cesa-Bianchi, Alon Gonen, Sasha Rakhlin, Shai Shalev-Shwartz and Karthik Sridharan.
2:30-3:10, Pedro Felzenszwalb (Brown)
Object detection with grammar models
3:30-4:10, Edo Airoldi (Harvard)
Valid inference in high-throughput biology and social media marketing
4:15-4:55, Regina Barzilay (MIT)
Multilingual learning via selective sharing
5:00-5:40, Ce Liu (MSR)
A dense correspondence framework for visual computing
What has been will be again, what has been done will be done again; there is nothing new under the sun — Ecclesiastes 1:9.
While we walk on a street, we stare at a 3D scene and see the same objects from different views. In our lives, we look at our homes, workplaces, colleagues, family members and friends every day. Even when we travel to different countries, this world consists of similar objects and scenes that we have seen before. Therefore, repetitiveness (also called sparsity) is an important characteristic of images and videos for visual computing.
Repetition of visual patterns across frames is established by means of correspondences, as pixels and objects move from frame to frame. In this talk, we will exploit dense correspondences for visual computing. We start with videos, where dense correspondences can be naturally formulated as motion. We show how accurate motion estimation can be used for video reconstruction (denoising, super-resolution, deblocking), and discuss possible directions for long-range motion representation. We then move to a large set of images, where dense correspondences can be captured by SIFT flow, an algorithm that is able to align images across different scenes. Now that pixels and visual patterns are densely connected across images, we are able to treat a large collection of images just like videos, and perform both low-level image reconstruction and high-level image understanding on a large graph.
Sham Kakade, Microsoft Research, Chair
Ryan Adams, Harvard
Adam Tauman Kalai, Microsoft Research
Cynthia Rudin, MIT
Joshua Tenenbaum, MIT