Vision-Steered Audio for Interactive Environments

Sumit Basu; Michael Casey; William Gardner; Ali Azarbayejani; Alex Pentland

Proceedings of IMAGE'COM '96, Bordeaux, France | May 1996

Published by M.I.T. Media Laboratory Perceptual Computing Section

We present novel techniques for capturing and producing audio in an interactive virtual environment using vision information. These techniques are free of mechanisms that would encumber the user, such as clip-on microphones or headphones. Methods are described both for extracting sound from a given position in space and for rendering an "auditory scene," i.e., given a user location, producing sounds that appear to the user to come from an arbitrary point in 3-D space. In both cases, vision information about the user's position guides the algorithms, yielding solutions to problems that are difficult, and often impossible, to solve robustly in the auditory domain alone.
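To make the idea of vision-guided sound extraction concrete, the sketch below steers a microphone array toward a position that a vision tracker is assumed to supply. The abstract does not say which extraction algorithm the authors use, so this uses plain delay-and-sum beamforming as a stand-in, and the function names, array geometry, and speed-of-sound constant are all illustrative assumptions rather than details from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature (assumption)


def steering_delays(mic_positions, source_position):
    """Relative propagation delays (seconds) from the source to each microphone.

    source_position stands in for the user location reported by a vision
    tracker (hypothetical input; any 3-D position estimate would do).
    """
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_position, dtype=float)
    distances = np.linalg.norm(mics - src, axis=1)
    # Delays are measured relative to the closest microphone.
    return (distances - distances.min()) / SPEED_OF_SOUND


def delay_and_sum(signals, delays, sample_rate):
    """Align channels by integer sample shifts and average them.

    signals has shape (num_mics, num_samples). Channels that hear the
    source later are advanced so that all channels line up before averaging.
    """
    signals = np.asarray(signals, dtype=float)
    shifts = np.round(np.asarray(delays) * sample_rate).astype(int)
    n = signals.shape[1] - shifts.max()  # usable length after alignment
    aligned = [ch[s:s + n] for ch, s in zip(signals, shifts)]
    return np.mean(aligned, axis=0)
```

Sound arriving from the tracked position adds coherently after alignment, while sound from other directions is averaged with mismatched delays and is attenuated. Rounding to whole samples keeps the sketch short; a practical system would likely need fractional delays.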