This presentation consists of two short talks.
Localization of 3D Audio-Visual Objects Using Unsupervised Clustering
We address the problem of localizing objects that can be both seen and heard. We exploit the benefits of a human-like configuration of sensors (binaural and binocular) for gathering auditory and visual observations. We show that the localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data into a common audio-visual 3D representation via a pair of mixture models. Inference is performed by a variant of the expectation-maximization (EM) algorithm, which provides cooperative estimates of both the auditory activity and the 3D position of each object. We describe several experiments on single- and multiple-speaker localization in the presence of other audio sources.
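To make the clustering view concrete, here is a minimal sketch of plain EM on a single isotropic Gaussian mixture over 3D points. It is not the model from the talk, which couples a pair of mixture models (audio and visual) through shared 3D positions. The function names `em_gmm` and `farthest_point_init`, the isotropic covariance, and the farthest-point initialisation are all assumptions made for illustration.

```python
import numpy as np

def farthest_point_init(points, n_clusters):
    """Hypothetical helper: pick initial centres greedily, far from each other."""
    centres = [points[0]]
    for _ in range(n_clusters - 1):
        # Squared distance from every point to its closest centre so far.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(points[d2.argmax()])
    return np.array(centres, dtype=float)

def em_gmm(points, n_clusters, n_iters=50):
    """Fit an isotropic Gaussian mixture to `points` (N x D) with plain EM.

    Returns the cluster centres and a hard label per point.
    """
    n, d = points.shape
    means = farthest_point_init(points, n_clusters)
    variances = np.full(n_clusters, points.var())
    weights = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iters):
        # E-step: posterior responsibility of each cluster for each point,
        # computed in the log domain for numerical stability.
        sq_dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_prob = (np.log(weights)
                    - 0.5 * d * np.log(2 * np.pi * variances)
                    - 0.5 * sq_dist / variances)
        log_prob -= log_prob.max(axis=1, keepdims=True)
        resp = np.exp(log_prob)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, centres and variances.
        counts = resp.sum(axis=0) + 1e-12
        weights = counts / n
        means = (resp.T @ points) / counts[:, None]
        sq_dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq_dist).sum(axis=0) / (d * counts) + 1e-9
    return means, resp.argmax(axis=1)
```

Calling `em_gmm(observations, n_clusters=2)` on an N x 3 array returns one estimated 3D centre per cluster and a cluster label per observation. In this simplified picture, each centre plays the role of an object position.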
Robust Shape and Graph Matching using Laplacian Embedding and EM
Shape matching is a central topic in computational vision, medical image analysis, and related fields. One instance of shape matching is finding dense correspondences between point representations. The problem of matching 3D articulated shapes remains very difficult, mainly because it is not clear how to choose a transformation group under whose action the shapes could be studied. One possible approach is to represent shapes by locally connected sets of points, i.e. sparse graphs, and to use a spectral embedding method to map these graphs onto a lower-dimensional space. As a result, a dense match between shapes can be found through rigid point registration of their embeddings. We will describe the matching method in detail and show numerous results with voxel and mesh data.
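The embedding step can be illustrated with a short sketch, under several assumptions that the abstract does not specify: a symmetrised k-nearest-neighbour graph with unit edge weights, the unnormalised graph Laplacian, and a dense eigendecomposition. The function name `laplacian_embedding` and its parameters are hypothetical, and the registration step is omitted.

```python
import numpy as np

def laplacian_embedding(points, n_neighbors=4, dim=3):
    """Embed a point set (N x D) using eigenvectors of its k-NN graph Laplacian.

    Returns the `dim` smallest non-trivial eigenvalues and the matching
    eigenvectors (N x dim). Assumes distinct points and a connected graph.
    """
    n = len(points)
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    # Connect each point to its nearest neighbours; column 0 of the sorted
    # indices is the point itself, so it is skipped.
    nearest = np.argsort(sq_dist, axis=1)[:, 1:n_neighbors + 1]
    adjacency = np.zeros((n, n))
    adjacency[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    adjacency = np.maximum(adjacency, adjacency.T)  # make the graph undirected
    # Unnormalised graph Laplacian L = D - W.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Drop the constant eigenvector associated with eigenvalue 0.
    return eigvals[1:dim + 1], eigvecs[:, 1:dim + 1]
```

Because the graph depends only on pairwise distances, the embedding is unchanged by rigid motion of the input. The eigenvectors are defined only up to sign, so two embeddings of similar shapes still have to be aligned before points can be matched.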