{"id":749392,"date":"2021-05-20T13:46:35","date_gmt":"2021-05-20T20:46:35","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=749392"},"modified":"2021-05-27T13:56:04","modified_gmt":"2021-05-27T20:56:04","slug":"pushing-the-frontier-of-neural-text-to-speech","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/pushing-the-frontier-of-neural-text-to-speech\/","title":{"rendered":"Pushing the frontier of neural text to speech"},"content":{"rendered":"<p>In the popular field of text to speech (TTS), the goal is to transform the written or printed word into speech that is natural and intelligible. Today, the technology is used in a range of products and services: it helps people who are blind or have low vision consume digital content, gives personal digital assistants more realistic-sounding voices, and makes it easier to do two things at once, such as listening to an online article while washing dishes. Although neural network-based end-to-end TTS has improved the quality of synthesized speech, several challenges remain before neural TTS can advance further and be integrated more easily into product development and deployment.<\/p>\n<p>In this webinar, Senior Researcher Xu Tan will discuss these challenges, specifically the high computational cost and slow inference speed of online serving; word skipping and repeating, poor voice quality, and a lack of voice controllability; the large amount of training data needed to improve voice synthesis; and the practical difficulties of TTS voice adaptation. 
He\u2019ll introduce his team\u2019s work addressing these challenges, including fast TTS, end-to-end TTS, low-resource TTS, and adaptive TTS, and discuss other critical questions and opportunities to pursue in the space.<\/p>\n<p>Together, you&#8217;ll explore:<\/p>\n<ul>\n<li>An overview of text to speech, including its evolution<\/li>\n<li>The important challenges in neural text to speech and how to address them with dedicated research<\/li>\n<li>How to factor product development into your research<\/li>\n<\/ul>\n<p><strong>Resource list:<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/text-to-speech\/\">Text to Speech<\/a> (Project page)<\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/publications\/\">Xu Tan<\/a> (Publications page)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Speech Research Repository Master List<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/fastspeech\/\" target=\"_blank\" rel=\"noopener noreferrer\">FastSpeech: Fast, Robust and Controllable Text to Speech<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/fastspeech2\/\" target=\"_blank\" rel=\"noopener noreferrer\">FastSpeech 2: Fast and High-Quality End-to-End Text to Speech<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/adaspeech\/\" target=\"_blank\" rel=\"noopener 
noreferrer\">AdaSpeech: Adaptive Text to Speech for Custom Voice<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/adaspeech2\/\" target=\"_blank\" rel=\"noopener noreferrer\">AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/lightspeech\/\" target=\"_blank\" rel=\"noopener noreferrer\">LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speechresearch.github.io\/lrspeech\/\" target=\"_blank\" rel=\"noopener noreferrer\">LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GitHub)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/techcommunity.microsoft.com\/t5\/azure-ai\/neural-text-to-speech-previews-five-new-languages-with\/ba-p\/1907604\" target=\"_blank\" rel=\"noopener noreferrer\">Neural Text-to-Speech previews five new languages with innovative models in the low-resource setting<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (Blog)<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/text-to-speech\">Microsoft Azure Text to Speech<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append 
glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/speech.microsoft.com\/customvoice\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft Azure Custom Voice<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a> (Researcher profile)<\/li>\n<\/ul>\n<p>This on-demand webinar features a previously recorded Q&A session and open captioning.<\/p>\n<p>This webinar originally aired on May 20, 2021.<\/p>\n<p>Explore more Microsoft Research webinars: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aka.ms\/msrwebinars\">https:\/\/aka.ms\/msrwebinars<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the popular field of text to speech, the goal is to transform the written or printed word into speech that is natural and intelligible. 
Today, the technology is being used in products and services to help people who are blind or have low vision consume digital content, power personal digital assistants that sound more [&hellip;]<\/p>\n","protected":false},"featured_media":749395,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"research-area":[13556,243062,13545],"msr-video-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-session-type":[],"msr-impact-theme":[],"msr-pillar":[],"msr-episode":[],"msr-research-theme":[],"class_list":["post-749392","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/MA8PCvmr8B0","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/749392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/749392\/revisions"}],"predecessor-version":[{"id":749404,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/749392\/revisions\/749404"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/749395"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=749392"}],"wp:term":[
{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=749392"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=749392"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=749392"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=749392"},{"taxonomy":"msr-session-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-session-type?post=749392"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=749392"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=749392"},{"taxonomy":"msr-episode","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-episode?post=749392"},{"taxonomy":"msr-research-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-theme?post=749392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}