Unsupervised Discovery of Objects and their Interactions for Common-Sense Physical Reasoning
- Michael Chang | UC Berkeley
Common-sense physical reasoning is facilitated by organizing sensory percepts into discrete objects. By decomposing a complex visual scene into distinct objects, humans can describe relations between the objects and reason about their dynamics as well as the consequences of their interactions. What is the program that performs such reasoning, and how do we reverse-engineer this prior knowledge of physics into machines? One approach has been to use a physics simulator whose parameters are inferred from raw sensory observations. Another has been to learn a direct mapping from raw observations to predictions. However, the advantage of one is the disadvantage of the other: the source code of a physics engine is typically difficult to adapt to real-world scenarios, while traditional unsupervised learning approaches for modeling physical interaction in pixel space neither generalize to different scenes nor respect common-sense intuitions about object permanence. I will discuss a line of recent work that seeks to bridge the gap between the two approaches. With the Neural Physics Engine (paper: https://arxiv.org/abs/1612.00341, webpage: http://mbchang.github.io/npe/), we first assumed access to object state information and proposed a learnable model that generalizes to different numbers of objects and scene configurations the way symbolic physics engines do. Then, with Relational Neural Expectation Maximization (paper: https://arxiv.org/abs/1802.10353, webpage: https://sites.google.com/view/r-nem-gifs/), we removed the state-space assumption with a method for obtaining object representations from pixels. The result is an end-to-end system that learns to discover objects and model their physical interactions from raw visual images in a purely unsupervised fashion, exhibiting simple notions of object permanence, attention, and compositionality that characterize human-level common-sense physical reasoning.
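To make the compositional structure concrete, below is a minimal, hypothetical sketch of the pairwise factorization behind the Neural Physics Engine: a shared encoder summarizes the effect of each context object on a focus object, the effects are summed, and a decoder predicts the focus object's next velocity from its own state plus the aggregated effects. This is not the authors' implementation; the `NeuralPhysicsEngineSketch` class, the state dimensionality, and the network widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralPhysicsEngineSketch(nn.Module):
    """Minimal sketch of an NPE-style pairwise factorization (assumed
    layer sizes, not the paper's exact architecture).

    A shared pairwise encoder summarizes the effect of every context
    object on a focus object; the summed effects, concatenated with the
    focus object's own state, are decoded into its next velocity.
    """

    def __init__(self, state_dim=4, hidden_dim=64):
        super().__init__()
        # Encodes a (focus, context) state pair into an "effect" vector.
        self.pair_encoder = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Decodes the focus state plus aggregated effects into a velocity.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # predicted (dx, dy) velocity
        )

    def forward(self, states):
        # states: (batch, num_objects, state_dim)
        batch, n, d = states.shape
        focus = states.unsqueeze(2).expand(batch, n, n, d)
        context = states.unsqueeze(1).expand(batch, n, n, d)
        effects = self.pair_encoder(torch.cat([focus, context], dim=-1))
        # Zero out self-interactions before summing over context objects.
        mask = 1.0 - torch.eye(n, device=states.device).view(1, n, n, 1)
        summed = (effects * mask).sum(dim=2)  # (batch, n, hidden_dim)
        return self.decoder(torch.cat([states, summed], dim=-1))

# Usage: predict next velocities for a batch of 3-object scenes.
model = NeuralPhysicsEngineSketch()
velocities = model(torch.randn(8, 3, 4))  # -> (8, 3, 2)
```

Because the pair encoder is shared across all (focus, context) pairs and effects are aggregated by summation, the same weights apply to scenes with any number of objects, which is the property the talk highlights as borrowed from symbolic physics engines.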
Speaker Details
Michael Chang is a Ph.D. student at U.C. Berkeley. He is broadly interested in reverse-engineering two aspects of human cognition in machines: flexible generalization from limited experience and the design of algorithms for solving long-horizon problems. He received his B.S. in EECS from MIT, where his undergraduate research was supervised by Joshua Tenenbaum, and he has interned with Honglak Lee and Jürgen Schmidhuber. He is a recipient of the NSF Graduate Fellowship. For additional information, please see: http://mbchang.github.io/.
Series: Microsoft Research Talks
Decoding the Human Brain – A Neurosurgeon’s Experience
- Dr. Pascal O. Zinn
Challenges in Evolving a Successful Database Product (SQL Server) to a Cloud Service (SQL Azure)
- Hanuma Kodavalla, Phil Bernstein
Improving text prediction accuracy using neurophysiology
- Sophia Mehdizadeh
Tongue-Gesture Recognition in Head-Mounted Displays
- Tan Gemicioglu
DIABLo: a Deep Individual-Agnostic Binaural Localizer
- Shoken Kaneko
Audio-based Toxic Language Detection
- Midia Yousefi
From SqueezeNet to SqueezeBERT: Developing Efficient Deep Neural Networks
- Forrest Iandola, Sujeeth Bharadwaj
Hope Speech and Help Speech: Surfacing Positivity Amidst Hate
- Ashique Khudabukhsh
Towards Mainstream Brain-Computer Interfaces (BCIs)
- Brendan Allison
Learning Structured Models for Safe Robot Control
- Subramanian Ramamoorthy