Spatial Audio

Established: December 1, 2015





Spatial audio, also known as 3D stereo sound, creates a three-dimensional audio experience over headphones. Applications of this technology include augmented and virtual reality, listening to music, and watching movies on a tablet or PC.


Head-related transfer functions

Head-related transfer functions (HRTFs) are measurements that describe the directivity patterns of human ears, that is, how sound arriving from a given direction reaches the left ear and the right ear. A person’s HRTF measurements depend on the direction, elevation, distance, and frequency of the sound. Figure 1 shows the right-ear HRTF of a person for the horizontal plane only, at a distance of one meter. Figure 2 shows the sensitivity of the right ear at 1000 hertz as a function of direction and elevation, also at a distance of one meter.
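To make the idea concrete, here is a minimal sketch of how an HRTF pair is typically applied: a mono signal is filtered with the left-ear and right-ear impulse responses (the time-domain counterparts of the HRTFs) for the desired direction. The impulse responses below are hypothetical stand-ins, not measured data. They model only a delay and a level difference between the ears, and the sample rate, the 0.7 ms maximum delay, and all function names are assumptions made for illustration.

```python
import math

SAMPLE_RATE = 48_000  # Hz, assumed for this sketch


def convolve(signal, kernel):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def toy_hrir_pair(azimuth_deg, length=64):
    """Return hypothetical (left, right) impulse responses for a source
    at the given azimuth (0 = front, positive = to the right).

    NOT measured data: only an interaural time difference (a delay)
    and an interaural level difference (a gain) are modeled.
    """
    side = abs(math.sin(math.radians(azimuth_deg)))
    max_itd_s = 0.0007  # assumed maximum interaural delay, about 0.7 ms
    delay = round(side * max_itd_s * SAMPLE_RATE)
    far_gain = 1.0 - 0.5 * side  # assumed: far ear up to 6 dB quieter
    near = [0.0] * length
    far = [0.0] * length
    near[0] = 1.0
    far[delay] = far_gain
    # Source on the right: the right ear is the near ear.
    return (far, near) if azimuth_deg >= 0 else (near, far)


def render_binaural(mono, azimuth_deg):
    """Filter a mono signal with the impulse-response pair for one direction."""
    hrir_left, hrir_right = toy_hrir_pair(azimuth_deg)
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

With measured HRTFs the structure stays the same. Only the impulse responses change, and they would be looked up per direction, elevation, and distance instead of being generated.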


Figure 1


Figure 2


Early applications and their challenges

HRTFs were first used in the 1950s in binaural recordings, created by placing two microphones near the ears of a mannequin to record events such as live concerts. When listened to through headphones, these recordings create the effect of being acoustically present at the event. Early applications faced technical challenges: the acoustical image became smeared if the listener’s head did not match the mannequin’s. In addition, head movements by the listener moved the entire audio picture, whereas real sound sources stay in place when the listener turns.


HRTF personalization

The HRTF of a person can be measured by using a setup similar to the one pictured. A set of loudspeakers is rotated around a person who has small microphones in their left and right ears. These measurements capture the spatial directivity patterns of the ears, that is, that person’s HRTFs.

This process requires specialized equipment and is long and cumbersome. Through machine learning, we can synthesize personalized HRTFs by using anthropometrics: head width, height, and depth; entrance coordinates of the ear canals; and more. These can be measured even from a crude head scan, such as the one pictured. The more parameters are used, the more personalized the HRTFs. Making this technology practical means finding the right balance between the number of anthropometric measurements required and good-enough HRTFs.
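The text does not say which machine learning method is used, so the following is only an illustrative stand-in. It picks the closest match from a small database of subjects whose HRTFs have already been measured, based on normalized anthropometric distance. The database entries, the measurement values, and the function names are all hypothetical.

```python
import math
import statistics

# Hypothetical database: (head width, head height, head depth) in
# centimeters, paired with a label for that subject's measured HRTF set.
# All values are made up for illustration.
DATABASE = [
    ((14.5, 21.0, 18.5), "hrtf_subject_a"),
    ((15.5, 22.5, 19.5), "hrtf_subject_b"),
    ((16.5, 24.0, 21.0), "hrtf_subject_c"),
]


def _column_stats(rows):
    """Per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Fall back to 1.0 so a constant feature does not divide by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def _normalize(features, means, stds):
    """Scale each feature so that no single measurement dominates."""
    return [(f - m) / s for f, m, s in zip(features, means, stds)]


def select_hrtf(anthropometrics, database=DATABASE):
    """Return the HRTF label of the most similar subject (nearest neighbor)."""
    means, stds = _column_stats([features for features, _ in database])
    query = _normalize(anthropometrics, means, stds)
    best_label, best_dist = None, math.inf
    for features, label in database:
        dist = math.dist(query, _normalize(features, means, stds))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Adding more measurements per subject would make the match more specific, at the cost of requiring a more detailed scan. This is the trade-off described above.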

Applications for spatial audio


Gaming

Gaming is an ideal application for HRTFs because the 3D coordinates of each sound source are available, so each sound can be placed where its object appears visually.
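As a rough sketch of what using those 3D coordinates involves, the function below converts a source position into the direction, elevation, and distance relative to the listener, which are the quantities an HRTF lookup needs. The coordinate convention (x to the right, y up, z forward), the yaw-only listener orientation, and the function name are assumptions for illustration, not details from any particular game engine.

```python
import math


def source_direction(listener_pos, listener_yaw_deg, source_pos):
    """Return (azimuth_deg, elevation_deg, distance) of a source as heard
    by the listener.

    Assumed convention: x points right, y up, z forward. A yaw of 0 means
    the listener faces +z, and positive yaw turns the listener to the right.
    Azimuth 0 is straight ahead and positive azimuth is to the right.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dz = source_pos[2] - listener_pos[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance == 0.0:
        return 0.0, 0.0, 0.0  # source at the listener: direction undefined

    # Direction in world coordinates, then rotated into the listener's frame
    # and wrapped to the range [-180, 180).
    world_azimuth = math.degrees(math.atan2(dx, dz))
    azimuth = (world_azimuth - listener_yaw_deg + 180.0) % 360.0 - 180.0

    # Clamp guards against rounding pushing the ratio slightly past +/-1.
    ratio = max(-1.0, min(1.0, dy / distance))
    elevation = math.degrees(math.asin(ratio))
    return azimuth, elevation, distance
```

For example, a source one meter to the listener's right maps to an azimuth of about 90 degrees. If the listener then turns 90 degrees to the right, the same source ends up straight ahead.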

Augmented and virtual reality

Augmented and virtual reality are also ideal applications; in these scenarios, spatial audio is a must-have feature.

Virtual surround sound support

When 5.1 (six channel) or 7.1 (eight channel) surround sound is rendered through spatial audio headphones, it creates the same audio experience as listening to the actual loudspeaker system.

Stereo music rendering

Even rendering a normal stereo recording through a spatial audio sound system provides a better experience. Stereo music is designed to be listened to through two loudspeakers in front of the listener. Listening to it with regular headphones places the audio scene between the two ears, inside the listener’s head. With spatial audio, the two loudspeakers can be rendered in front of the listener, placing the audio scene in front, where it is supposed to be.
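The idea of rendering two virtual loudspeakers can be sketched as follows: each ear receives the left channel filtered by the response from the left loudspeaker to that ear, plus the right channel filtered by the response from the right loudspeaker to that ear. The three-sample impulse responses below are hypothetical placeholders (a direct path to the near ear, and a delayed, quieter path to the far ear), and the function names are assumptions for illustration.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def mix(a, b):
    """Sample-wise sum of two signals, padding the shorter with silence."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


# Hypothetical impulse responses keyed by (virtual speaker, ear).
# Real ones would come from HRTFs measured at the speaker positions.
TOY_HRIRS = {
    ("L", "left"): [1.0, 0.0, 0.0],   # left speaker to near (left) ear
    ("L", "right"): [0.0, 0.0, 0.6],  # left speaker to far (right) ear
    ("R", "left"): [0.0, 0.0, 0.6],   # right speaker to far (left) ear
    ("R", "right"): [1.0, 0.0, 0.0],  # right speaker to near (right) ear
}


def render_virtual_stereo(left_ch, right_ch, hrirs=TOY_HRIRS):
    """Render a stereo recording as two virtual loudspeakers over headphones."""
    ear_left = mix(convolve(left_ch, hrirs[("L", "left")]),
                   convolve(right_ch, hrirs[("R", "left")]))
    ear_right = mix(convolve(left_ch, hrirs[("L", "right")]),
                    convolve(right_ch, hrirs[("R", "right")]))
    return ear_left, ear_right
```

Unlike regular headphone playback, a sound present only in the left channel still reaches the right ear, slightly later and quieter, as it would from a real loudspeaker. The same structure extends to 5.1 or 7.1 layouts by adding one pair of responses per virtual loudspeaker.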


Technology transfers

The Audio and Acoustics Research Group worked closely with our partners in the engineering teams to convert the spatial audio rendering from a research project to shippable code in various Microsoft products:

Virtual surround sound support in Windows 10 for Xbox One.

The 3D audio rendering engine in Microsoft Soundscape.

And, of course, the spatial audio engine in HoloLens – Microsoft’s augmented and virtual reality wearable device.