{"id":1136279,"date":"2025-04-15T09:39:48","date_gmt":"2025-04-15T16:39:48","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=1136279"},"modified":"2025-05-01T19:21:43","modified_gmt":"2025-05-02T02:21:43","slug":"sonora","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/sonora\/","title":{"rendered":"Sonora: Human-AI Co-Creation of 3D Audio Worlds"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"720\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720.jpg\" class=\"attachment-full size-full\" alt=\"Sonora\" style=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720.jpg 1920w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720-300x113.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720-1024x384.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720-768x288.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720-1536x576.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720-1600x600.jpg 1600w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/sonora_1920x720-240x90.jpg 240w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 
w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"sonora-human-ai-co-creation-of-3d-audio-worlds\"><strong>Sonora:<\/strong> Human-AI Co-Creation of 3D Audio Worlds <\/h1>\n\n\n\n<p><\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p><strong>Sonora<\/strong> is a novel AI-powered system developed by Microsoft Research that enables real-time, voice-driven creation and navigation of immersive 3D audio environments. Designed to promote relaxation and reduce anxiety, Sonora blends cutting-edge AI technologies\u2014including large language models (LLMs), audio diffusion models, and Unity3D game engine integration\u2014to offer deeply personalized and interactive soundscapes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"rethinking-soundscapes-with-ai-and-its-impact-on-anxiety-and-cognitive-load\">Rethinking Soundscapes with AI and its Impact on Anxiety and Cognitive Load<\/h2>\n\n\n\n<p>While traditional soundscapes offer passive relaxation, Sonora introduces a co-creative experience where users can speak naturally to an AI to add, remove, or reposition spatialized audio elements. 
Whether it\u2019s the sound of ocean waves, birds overhead, or footsteps in snow, Sonora allows users to build calming environments in real time, tailoring the auditory world to their needs and preferences without relying on screens or visual interfaces.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Sonora explanation\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/Ilg6LbtHNzg?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"system-architecture\">System Architecture<\/h2>\n\n\n\n<p>Sonora features a modular architecture comprising:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>LLM modules<\/strong> (powered by GPT-4o) that interpret user input and manage sound generation and placement<\/li>\n\n\n\n<li><strong>Audio diffusion models<\/strong> that synthesize realistic, non-pre-recorded sounds<\/li>\n\n\n\n<li><strong>The \u201cAI Conversationalist\u201d<\/strong>, a voice-based interface offering guidance and emotional engagement<\/li>\n\n\n\n<li>A curated library of 482 diffusion-generated sounds for fast, high-quality experiences<\/li>\n<\/ul>\n\n\n\n<p>The experience runs in Unity3D and can be accessed via VR or standard audio setups, allowing flexibility across environments and use cases.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"941\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered.png\" alt=\"diagram\" class=\"wp-image-1136381\" 
srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered.png 1920w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered-300x147.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered-1024x502.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered-768x376.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered-1536x753.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/SonoraExamples_Numbered-240x118.png 240w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"evaluating-impact-a-user-study\">Evaluating Impact: A User Study<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"303\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/user_study.png\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-1136378\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/user_study.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/user_study-300x65.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/user_study-1024x222.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/user_study-768x166.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/user_study-240x52.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>A controlled user study (n=32) compared Sonora to a state-of-the-art passive soundscape (Headspace). 
Key findings include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Participants rated Sonora as significantly more <strong>entertaining<\/strong> and <strong>engaging<\/strong><\/li>\n\n\n\n<li>Users with moderate to high trait anxiety showed a <strong>significant reduction in state anxiety<\/strong> in both conditions, with Sonora offering greater interactivity<\/li>\n\n\n\n<li>No increase in <strong>cognitive load<\/strong>, despite the added complexity of interaction<\/li>\n\n\n\n<li>A positive correlation between <strong>anxiety levels and system engagement<\/strong>, suggesting Sonora is particularly appealing for anxious individuals<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"535\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-1024x535.jpeg\" alt=\"chart, diagram\" class=\"wp-image-1136925\" style=\"width:1185px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-1024x535.jpeg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-300x157.jpeg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-768x401.jpeg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-1536x802.jpeg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-2048x1070.jpeg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/04\/stai-5-240x125.jpeg 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Anxiety was measured using the State-Trait Anxiety Inventory (STAI). Participants were divided into two groups: moderate\/high anxiety (STAI trait score \u2265 38) and low anxiety. 
A trait score of 38 is the cutoff most commonly used to define clinically significant anxiety symptoms. In the Sonora condition, participants with moderate\/high trait anxiety (13 per condition) showed a significant reduction in state anxiety (\ud835\udc5d < 0.001), while those with low anxiety showed no significant change (\ud835\udc5d = 0.570).<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"real-world-implications\">Real-World Implications<\/h2>\n\n\n\n<p>Sonora exemplifies the potential of AI-driven, screenless interaction paradigms to support mental health and well-being. By allowing users to &#8220;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/speaking-the-world-into-existence\/\">speak the world into existence<\/a>&#8221; through voice commands, Sonora creates personalized, immersive 3D audio environments that can be used for stress relief, mindfulness practice, education, and immersive entertainment.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"future-applications\">Future Applications<\/h2>\n\n\n\n<p>Beyond wellness, Sonora&#8217;s architecture opens doors to applications in gaming, accessibility, education, and therapeutic environments. The integration of fuzzy world modeling with spatial audio hints at new frontiers for naturalistic AI-human interaction in virtual environments.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"<p>Sonora is a novel AI-powered system developed by Microsoft Research that enables real-time, voice-driven creation and navigation of immersive 3D audio environments. Designed to promote relaxation and reduce anxiety, Sonora blends cutting-edge AI technologies\u2014including large language models (LLMs), audio diffusion models, and Unity3D game engine integration\u2014to offer deeply personalized and interactive soundscapes. 
While traditional soundscapes [&hellip;]<\/p>\n","protected":false},"featured_media":1136376,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,243062,13554,13553],"msr-locale":[268875],"msr-impact-theme":[266208,261673],"msr-pillar":[],"class_list":["post-1136279","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-human-computer-interaction","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[1135076],"related-downloads":[],"related-videos":[],"related-groups":[1084857,1105932],"related-events":[1134700],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Judith Amores","user_id":42003,"people_section":"Section name 0","alias":"judithamores"},{"type":"user_nicename","display_name":"Javier Hernandez","user_id":38413,"people_section":"Section name 0","alias":"javierh"},{"type":"user_nicename","display_name":"Andy Wilson","user_id":31159,"people_section":"Section name 
0","alias":"awilson"}],"msr_research_lab":[199563,199565],"msr_impact_theme":["Discovery","Health"],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1136279","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":15,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1136279\/revisions"}],"predecessor-version":[{"id":1138458,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1136279\/revisions\/1138458"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1136376"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1136279"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1136279"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1136279"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1136279"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1136279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}