{"id":212079,"date":"2015-12-01T18:14:03","date_gmt":"2015-12-01T18:14:03","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/spatial-audio\/"},"modified":"2022-01-21T12:44:12","modified_gmt":"2022-01-21T20:44:12","slug":"spatial-audio","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/spatial-audio\/","title":{"rendered":"Spatial Audio"},"content":{"rendered":"<p>Spatial sound is perceived by a listener as emanating from a certain location in space, due to temporal and spectral cues that inform the auditory system about the sound\u2019s direction of arrival and distance. Rendering sound spatially, by encoding these localization cues and delivering them to the listener via headphones, allows placing virtual audio sources arbitrarily in the listener\u2019s environment. Spatial sound is an integral part of mixed reality applications, including telepresence, gaming, and entertainment.<\/p>\n<h2>Head-related transfer functions<\/h2>\n<p>The temporal and spectral cues used by our auditory system to determine the direction of arrival of a sound source can be expressed in head-related transfer functions (HRTFs). HRTFs are measurements that capture the directivity patterns of human ears, that is, the way sound, arriving from a certain direction, reaches the left and right ear. HRTFs are a function of source azimuth and elevation, distance, and frequency. Figure 2 illustrates the right HRTF of a subject for the horizontal plane at a distance of one meter. 
Figure 1 shows the sensitivity of the right ear at 1000 Hz as a function of azimuth and elevation at a distance of one meter.<\/p>\n<table style=\"border-spacing: inherit;border-collapse: collapse;margin-bottom: 40px\">\n<tbody>\n<tr>\n<td style=\"padding: inherit;border: inherit\">\n<p><div id=\"attachment_233321\" style=\"width: 310px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-233321\" class=\"wp-image-233321 size-medium\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/spatialaudio_project2_mediumres-300x236.jpg\" alt=\"spatialaudio_project2_mediumres\" width=\"300\" height=\"236\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/spatialaudio_project2_mediumres-300x236.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/spatialaudio_project2_mediumres-768x603.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/spatialaudio_project2_mediumres.jpg 1000w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-233321\" class=\"wp-caption-text\">Figure 1<\/p><\/div><\/td>\n<td style=\"padding: inherit;border: inherit\">\n<p><div id=\"attachment_212965\" style=\"width: 310px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-212965\" class=\"wp-image-212965 size-medium\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/Project1_lowres-300x225.jpg\" alt=\"Project1_lowres\" width=\"300\" height=\"225\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/Project1_lowres-300x225.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/Project1_lowres-768x576.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/Project1_lowres-1024x768.jpg 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/Project1_lowres-80x60.jpg 80w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-212965\" class=\"wp-caption-text\">Figure 2<\/p><\/div><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Early applications and their challenges<\/h2>\n<p>HRTFs were first used in the 1950s in binaural recordings created by recording sound, e.g., a concert, via two microphones near the ears of a mannequin. Listening to these recordings via headphones creates the illusion of being acoustically present at the recorded event. Technical challenges of these early applications included a smeared acoustical image if the listener\u2019s head did not match the mannequin\u2019s. In addition, head movements by the listener would move the entire audio scene, which does not happen when listening to a real sound scene and thus may break the binaural illusion.<\/p>\n<h2>HRTF personalization<\/h2>\n<table style=\"height: 655px;border-collapse: collapse;border-spacing: inherit;margin-bottom: 20px\" width=\"923\">\n<tbody>\n<tr>\n<td style=\"padding: inherit;border: inherit\"><span class=\"ImageBlock fn\"><span class=\"ImageCaptionCoreCss ImageCaption\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-212969 alignleft\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/img_2671-300x225.jpg\" alt=\"img_2671\" width=\"300\" height=\"225\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/img_2671-300x225.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/img_2671-768x576.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/img_2671-1024x768.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/img_2671-80x60.jpg 80w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/span><\/span>The HRTF of a person can be 
measured by using a setup similar to the one shown on the left. A set of loudspeakers is rotated around a person wearing small microphones in their left and right ears. Test signals are recorded from each loudspeaker location to measure the spatial directivity patterns of the ears, that is, the person\u2019s HRTFs.<\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit;border: inherit\"><span id=\"238f7d8e-6c6c-4659-99e8-dccd8c9950e0\" class=\"ImageBlock fn\"><span id=\"ImageCaption238f7d8e-6c6c-4659-99e8-dccd8c9950e0\" class=\"ImageCaptionCoreCss ImageCaption\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-212970 alignleft\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/project6-300x295.jpg\" alt=\"project6\" width=\"300\" height=\"295\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/project6-300x295.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/project6-768x756.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/project6-1024x1008.jpg 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/>This measurement process requires specialized equipment and is time-consuming and cumbersome. To reduce the measurement time or the amount or type of information needed about a subject, personalized HRTFs can be synthesized via acoustic models or machine learning, e.g., from anthropometric features (head width, height, length; ear entrance locations; etc.) or even a crude head scan, as shown on the left. There exists a trade-off between the accuracy of the personalized HRTF and the amount and quality of information known about the user. 
The challenge in practical applications is to synthesize good-enough HRTFs while not unnecessarily burdening the user with data collection.<br \/>\n<\/span><\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Applications for spatial audio<\/h2>\n<h3>Gaming<\/h3>\n<p style=\"padding-left: 30px\">Gaming is an ideal application for HRTFs since the 3-D coordinates of individual sound sources are typically available, making it possible to collocate visual and auditory sources.<\/p>\n<h3>Virtual surround sound<\/h3>\n<p style=\"padding-left: 30px\">Rendering 5.1 (six channel) or 7.1 (eight channel) surround sound spatially creates an audio experience similar to listening to an actual loudspeaker system. Virtual surround sound can enhance the acoustic experience of games or movies even when using regular headphones.<\/p>\n<h3>Mixed reality<\/h3>\n<p style=\"padding-left: 30px\">Spatial sound is a key feature of many mixed reality applications, as it can enhance the sense of presence and immersion, or create a more realistic experience of virtual content.<\/p>\n<h3>Stereo music rendering<\/h3>\n<p style=\"padding-left: 30px\">Stereo music is intended to be listened to through two loudspeakers in front of the listener. Listening to it with regular headphones places the audio scene between the two ears, inside the listener\u2019s head. 
With spatial audio, the two loudspeakers can be rendered in front of the listener, placing the audio scene in front, where it is supposed to be.<\/p>\n<h2>Technology transfers<\/h2>\n<p>The Audio and Acoustics Research Group worked closely with our partners in the engineering teams to convert spatial audio research projects to shippable code in various Microsoft products:<\/p>\n<ul>\n<li>Virtual surround sound support in Windows 10 and in Xbox One.<\/li>\n<li>The 3D audio rendering engine in <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/product\/soundscape\/\">Microsoft Soundscape<\/a>.<\/li>\n<li>And, of course, the spatial audio engine in <a href=\"http:\/\/www.microsoft.com\/microsoft-hololens\/en-us\" target=\"_new\" rel=\"noopener noreferrer\">HoloLens<\/a> &#8211; Microsoft&#8217;s augmented and virtual reality wearable device.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-233329\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/12\/spatialaudio_hololens_0.gif\" alt=\"spatialaudio_hololens_0\" width=\"663\" height=\"373\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Spatial audio, also known as 3D stereo sound, is about creating a 3D audio experience by using headphones. 
Applications of this technology include augmented and virtual reality, listening to music, and watching a movie on a tablet or PC.<\/p>\n","protected":false},"featured_media":668838,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[243062,13551,13554],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-212079","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-audio-acoustics","msr-research-area-graphics-and-multimedia","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"12\/1\/2015","related-publications":[486468,480927,441264,371261,238131,168888,166725,166792,166734,164059,687735,1160906,885639,885597,697225,166682,678717,664644,656256,612741,582376,582358,578248,573123,509498,507317,466449,466431,168298],"related-downloads":[],"related-videos":[191147,182376,189952,192776,253634,253673,264468,474945,474957,505685,544005,609111,668193,692004,806653],"related-groups":[144923],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"David Johnston","user_id":31562,"people_section":"Project contributors","alias":"davidjo"},{"type":"user_nicename","display_name":"Hannes Gamper","user_id":31943,"people_section":"Project contributors","alias":"hagamper"},{"type":"user_nicename","display_name":"Ivan Tashev","user_id":32127,"people_section":"Project contributors","alias":"ivantash"},{"type":"guest","display_name":"Shoken Kaneko","user_id":814630,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Shoken Kaneko","user_id":814624,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Fabian 
Brinkmann","user_id":663237,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Christoph F. Hold","user_id":663231,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Etienne Thuillier","user_id":663219,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Andrea Genovese","user_id":663225,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Vani Rajendran","user_id":663213,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Archontis Politis","user_id":663207,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Piotr Bilinski","user_id":663201,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Keith Godin","user_id":663195,"people_section":"Past interns","alias":""},{"type":"guest","display_name":"Ville Pulkki","user_id":664344,"people_section":"Consulting researchers","alias":""}],"msr_research_lab":[199565,1161007],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212079","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":48,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212079\/revisions"}],"predecessor-version":[{"id":810688,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212079\/revisions\/810688"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/668838"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=212079"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/rese
arch\/wp-json\/wp\/v2\/research-area?post=212079"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=212079"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=212079"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=212079"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}