{"id":1018134,"date":"2024-03-28T14:28:52","date_gmt":"2024-03-28T21:28:52","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=1018134"},"modified":"2025-06-03T09:25:04","modified_gmt":"2025-06-03T16:25:04","slug":"covomix","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/covomix\/","title":{"rendered":"CoVoMix"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background bg-gray-200 has-background- card-background--full-bleed\">\n\t\t\t\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"covomix\">CoVoMix<\/h1>\n\n\n\n<p>Advancing Zero-shot Speech Generation for Human-like Multi-talker Conversation<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n<div style=\"height:28px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<p>We introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. In addition, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple speakers engaging in multiple rounds of conversation. These dialogues, generated within a single channel, are characterized by seamless speech transitions, including overlapping speech, and appropriate paralinguistic behaviors such as laughter and coughing.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/arxiv.org\/abs\/2404.06690\">Paper&#8217;s Link<\/a><\/div>\n<\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"286\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-1024x286.png\" alt=\"diagram\" class=\"wp-image-1023120\" style=\"width:1125px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-1024x286.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-300x84.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-768x214.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-1536x429.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-2048x572.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/intro-240x67.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:60px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"CoVoMix Demo\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/OZPkBXhWT78?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<div style=\"height:60px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading\" id=\"dialogue\">Dialogue<\/h2>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-bottom is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading\" id=\"transcription\">Transcription<\/h6>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading\" id=\"ground-truth-1\">Ground truth<\/h6>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-bottom is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading has-text-align-left\" id=\"prompt\">Prompt<\/h6>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-bottom is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading\" id=\"covosingle\">CoVoSingle<\/h6>\n\n\n\n<h6 class=\"wp-block-heading\" id=\"covomix-1\">CoVoMix<\/h6>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>he was in jail for fourteen times and they finally deported him |&nbsp; fourteen times | yeah his family spent over two hundred thousand dollars keeping him here | wow | and then finally they said no that&#8217;s it he&#8217;s out can&#8217;t even come back to visit<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/GT1-fe_03_09727_028.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/PT1-fe_03_09727_028.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CS1-fe_03_09727_028.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CM1-fe_03_09727_028.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>which is uh very strange it&#8217;s not something i ever thought would happen | yeah that&#8217;s not good | no<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/GT2-fe_03_09724_023.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/PT2-fe_03_09724_023.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CS2-fe_03_09724_023.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CM2-fe_03_09724_023.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>so i&#8217;m not i don&#8217;t know what it is i don&#8217;t know what the minimum wage is or | uh it&#8217;s five fifteen now it was like four seventy five something like that | my gosh<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/GT3-fe_03_09720_004.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/PT3-fe_03_09720_004.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CS3-fe_03_09720_004.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/CM3-fe_03_09720_004.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>something around there | yeah that&#8217;s that&#8217;s good | i don&#8217;t think they&#8217;d ever get a divorce | no i know<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/GT4-fe_03_09715_029.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/PT4-fe_03_09715_029.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CS4-fe_03_09715_029.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CM4-fe_03_09715_029.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>my life was like consumed with the television and it was it was just sad i it that was awful [laughter] | and i remember the i think the day after and like gas prices went up to like three bucks a gallon | oh i know | everybody was like filling up with gas in the town i live in panicking and mhm | really wow<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/GT5-fe_03_06386_033.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/PT5-fe_03_06386_033.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/CS5-fe_03_06386_033.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/04\/CM5-fe_03_06386_033.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>hard choice to make especially when you get peer pressure and then once you start doing it hey look now i&#8217;m cool | mhm | but in reality if you actually had to do that to be cool you&#8217;re hanging with the wrong parents anyways [laughter] | right right i think my brother in law and his<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/GT6-fe_03_06333_011-6605d31b42eb5.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/PT6-fe_03_06333_011.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CS6-fe_03_06333_011.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/CM6-fe_03_06333_011.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading\" id=\"dialogue\">Monologue<\/h2>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-bottom is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading\" id=\"transcription\">Transcription<\/h6>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading\" id=\"ground-truth-1\">Ground truth<\/h6>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-bottom is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading has-text-align-left\" id=\"prompt\">Prompt<\/h6>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-bottom is-layout-flow wp-block-column-is-layout-flow\">\n<h6 class=\"wp-block-heading\" id=\"covosingle\">VoiceBox-Style<\/h6>\n\n\n\n<h6 class=\"wp-block-heading\" id=\"covosingle\">CoVoSingle<\/h6>\n\n\n\n<h6 class=\"wp-block-heading\" id=\"covomix-1\">CoVoMix<\/h6>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>totally offended i and like they would be like oh that&#8217;s a yankee for you but like they like they would they would be like um<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09764-B_076-MGT1.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09764-B_076-MPT1.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09764-B_076-MVB1.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09764-B_076-MCS1.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09764-B_076-MCM1.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>um well a way to get more money at your job and that&#8217;s pretty much it you know<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09758-B_139-MGT2.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09758-B_139-MPT2.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09758-B_139-MVB2.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09758-B_139-MCS2.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_09758-B_139-MCM2.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>right i totally agree it gets in your clothes gets in your hair and for a nonsmoker they don&#8217;t realize how sensitive i think th um<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06379-A_039-MGT3.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06379-A_039-MPT3.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06379-A_039-MVB3.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06379-A_039-MCS3.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06379-A_039-MCM3.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>yeah it&#8217;s really not a pleasant odor and it it&#8217;s it&#8217;s horrible my fiance smokes um on a r daily basis he smokes like a pack and a half a week or more i don&#8217;t know he doesn&#8217;t tell me<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06373-B_026-MGT4.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06373-B_026-MPT4.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06373-B_026-MVB4.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06373-B_026-MCS4.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06373-B_026-MCM4.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>and i i was i was saddened too because so many people do have a problem with tobacco and it is very addictive and i&#8217;m i have family members that<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06372-B_090-MGT5.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06372-B_090-MPT5.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06372-B_090-MVB5.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06372-B_090-MCS5.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06372-B_090-MCM5.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>right yeah and i i think that&#8217;s what you have to do personally to get to that point i have my my dad kinda got the same way you know it was too much of a hassle to go outside and if it was raining and you know just too much of an inconvenience<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06346-B_029-MGT7.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06346-B_029-MPT7.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06346-B_029-MVB7.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06346-B_029-MCS7.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06346-B_029-MCM7.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n\n\n\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>and the american culture is so big there that you know because most of the people i saw smoking<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06328-A_110-MGT8.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06328-A_110-MPT8.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06328-A_110-MVB8.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06328-A_110-MCS8.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/fe_03_06328-A_110-MCM8.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n<\/div>\n\n\n\n\n\n<p>We introduce CoVoMix2: a fully non-autoregressive framework for zero-shot multi-talker dialogue generation. It directly predicts mel-spectrograms from multi-stream transcriptions using a flow-matching-based generative model, eliminating the reliance on intermediate token representations. To better capture realistic conversational dynamics, we propose transcription-level speaker disentanglement, sentence-level alignment, and prompt-level random masking strategies. CoVoMix2 operates without requiring transcriptions for the prompt and supports controllable dialogue generation, including overlapping speech and precise timing control, demonstrating strong generalizability to real-world speech generation scenarios.<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2506.00885v1\">Paper&#8217;s Link<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading has-text-align-center\" id=\"system-overview\">System Overview<\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"879\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-1024x879.png\" alt=\"outline2\" class=\"wp-image-1140879\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-1024x879.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-300x258.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-768x659.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-1536x1319.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-2048x1758.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/pipeline-06012025-2-210x180.png 210w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading has-text-align-center\" id=\"samples\">Samples<\/h2>\n\n\n\n<p>All prompt audios are from the demo page of Moonshot&#8217;s MoonCast and Google&#8217;s Soundstorm and NotebookLM.<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"zero-shot-dialogue-generation-samples\">Zero-shot Dialogue Generation Samples<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Sun-set hotel. May I help you? | Yes, I have booked a room for 24th. It&#8217;s a double room. | Hold on, please. Let me check it for you. Yes, you&#8217;re right. You will keep it for 3 days. | Well, now I want to change the date from 24th to 28th. | OK, that shall be arranged.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/dailydialog-010.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Why&#8217;d you pull me over? | Are you aware that you drove through a red light? | I ran a red light? | Yes, you did. | I apologize, but I didn&#8217;t realize that I did. | Weren&#8217;t you taught that yellow means slow down, not speed up? | I did learn that. | So, then why did you speed up? | I don&#8217;t know what to tell you. | I&#8217;m going to have to write you a ticket. | I understand. | Here you go. Don&#8217;t do that again.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1-1.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2-1.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/dailydialog-020.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Hey man, you wanna buy some weed? | Some what? | Weed! You know? Pot, Ganja, Mary Jane some chronic! | Oh, umm, no thanks. | I also have blow if you prefer to do a few lines. | No, I am ok, really. | Come on man! I even got dope and acid! Try some! | Do you really have all of these drugs? Where do you get them from? | I got my connections! Just tell me what you want and I&#8217;ll even give you one ounce for free. | Sounds good! Let&#8217;s see, I want. | Yeah? | I want you to put your hands behind your head! You are under arrest!<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1-2.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2-2.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/dailydialog-000.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"zero-shot-dialogue-generation-samples\">Cross-lingual Samples<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Can you tell me where the pots and pans are? | Pots and pans are right over there. | Oh, thank you. | Could I interest you in our store credit card? | No, thanks. I already have credit cards. | But our credit card saves you 10%. | That&#8217;s a nice discount. | Here. Let me give you an application form. | Thank you, but I&#8217;m just browsing today. | Okay. Enjoy your browsing.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1-3.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2-3.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/dailydialog-101.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Here is the fish counter. Look at the lobsters and crabs. Shall we have some? | I&#8217;m allergic to these things, you know. | Sorry, I forgot. I don&#8217;t like seafood, neither. | Let&#8217;s go over there and get some milk, a couple dozen eggs and some orange juice. | Let&#8217;s get frozen juice. It is really good. We have got enough food. Let&#8217; s go over to the check-out stand. | OK. But just let me pick up a bottle of cooking wine and oil as we go by.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1-4.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2-4.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/dailydialog-102.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"zero-shot-dialogue-generation-samples\">Overlapping Samples<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Seriously, Jamie? You didn\u2019t think to tell me about this? I had to find out from someone else | Oh! come on! It\u2019s not that big of a deal.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1-6.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2-6.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/overlap5.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"transcription-1\"><strong>Transcription<\/strong><\/h5>\n\n\n\n<p>Hey, guys! I think we should focus on the marketing strategy first. If we can get the word out there, we&#8217;ll have a better chance of attracting more customers. | yeah yeah I agree with you<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"prompts\"><strong>Prompts<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt1-7.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/prompt2-7.wav\"><\/audio><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h5 class=\"wp-block-heading has-text-align-center\" id=\"generated-speech\"><strong>Generated Speech<\/strong><\/h5>\n\n\n\n<figure class=\"wp-block-audio aligncenter\"><audio controls src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/overlap10.wav\"><\/audio><\/figure>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Advancing Zero-shot Speech Generation for Human-like Multi-talker Conversation We introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. In addition, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,243062],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1018134","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[1041822],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Yao Qian","user_id":34976,"people_section":"Related people","alias":"yaoqian"}],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1018134","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":34,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1018134\/revisions"}],"predecessor-version":[{"id":1141078,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1018134\/revisions\/1141078"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1018134"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1018134"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1018134"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1018134"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1018134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}