{"id":134,"date":"2012-06-14T10:08:00","date_gmt":"2012-06-14T10:08:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/inside_microsoft_research\/2012\/06\/14\/deep-neural-network-speech-recognition-debuts\/"},"modified":"2016-07-20T07:32:47","modified_gmt":"2016-07-20T14:32:47","slug":"deep-neural-network-speech-recognition-debuts","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/deep-neural-network-speech-recognition-debuts\/","title":{"rendered":"Deep-Neural-Network Speech Recognition Debuts"},"content":{"rendered":"<p class=\"posted-by\">Posted by <span class=\"author\">Rob Knies<\/p>\n<p><\/span><span class=\"author\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/msdnshared.blob.core.windows.net\/media\/TNBlogsFS\/prod.evol.blogs.technet.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/00\/90\/35\/3582.MAVIS%20logo.jpg\"><img decoding=\"async\" style=\"margin: 10px; border: 0px currentColor; float: left;\" title=\"MAVIS logo\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/TNBlogsFS\/prod.evol.blogs.technet.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/00\/90\/35\/3582.MAVIS%20logo.jpg\" alt=\"MAVIS logo\" width=\"234\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n<\/span><span class=\"author\">Last August, my colleague <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Janie Chang\" href=\"http:\/\/www.janiechang.com\/home\" target=\"_blank\">Janie Chang<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> wrote a feature story titled <em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"<em>Speech Recognition Leaps Forward<\/em>\" 
href=\"http:\/\/research.microsoft.com\/en-us\/news\/features\/speechrecognition-082911.aspx\" target=\"_blank\">Speech Recognition Leaps Forward<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em> that was published on the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Microsoft Research website\" href=\"http:\/\/research.microsoft.com\/en-us\/\" target=\"_blank\">Microsoft Research website<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. The article outlined how <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Dong Yu\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/dongyu\/\" target=\"_blank\">Dong Yu<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Microsoft Research Redmond\" href=\"http:\/\/research.microsoft.com\/en-us\/labs\/redmond\/default.aspx\" target=\"_blank\">Microsoft Research Redmond<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Frank Seide\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/fseide\/\" target=\"_blank\">Frank Seide<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Microsoft Research Asia\" href=\"http:\/\/research.microsoft.com\/en-us\/labs\/asia\/default.aspx\" target=\"_blank\">Microsoft Research Asia<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, had extended the state of the art in real-time, speaker-independent, automatic speech recognition.<\/p>\n<p>Now, that improvement has been deployed to the world. 
Microsoft is updating the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Microsoft Audio Video Indexing Service\" href=\"http:\/\/research.microsoft.com\/en-us\/projects\/mavis\/\" target=\"_blank\">Microsoft Audio Video Indexing Service<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> with new algorithms that enable customers to take advantage of the improved accuracy detailed in a paper that Yu, Seide, and Gang Li, also of Microsoft Research Asia, delivered in Florence, Italy, during Interspeech 2011, the 12th Annual Conference of the International Speech Communication Association.<br \/>\nThe algorithms represent the first time a company has released a deep-neural-network (DNN)-based speech-recognition algorithm in a commercial product.<\/p>\n<p>It\u2019s a big deal. The benefits, says <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Behrooz Chitsaz\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/behroozc\/\" target=\"_blank\">Behrooz Chitsaz<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, director of Intellectual Property Strategy for Microsoft Research, are improved accuracy and reduced processing time.<\/p>\n<p>He says that tests have demonstrated that the algorithm provides a 10- to 20-percent relative error reduction and uses about 30 percent less processing time than the best-of-breed speech-recognition algorithms based on so-called Gaussian Mixture Models.<\/p>\n<p>Importantly, deep neural networks achieve these gains without the need for \u201cspeaker adaptation.\u201d In comparison, today\u2019s state-of-the-art technology operates in \u201cspeaker-adaptive\u201d mode, in which an audio file is recognized multiple times, and after each time, the recognizer \u201ctunes\u201d itself a little more closely to the specific speaker or speakers in the file, so that the next time, 
it gets better\u2014an expensive process.<\/p>\n<p>The ultimate goal of automatic speech recognition, Chang\u2019s story indicates, is out-of-the-box speaker-independent services that don\u2019t require user training. Such services are critical in mobile scenarios, at call centers, and in web services for speech-to-speech translation. It\u2019s difficult to overstate the impact that this technology will have as it rolls out across the breadth of Microsoft\u2019s other services and applications that employ speech recognition.<\/p>\n<p>Artificial neural networks are mathematical models of low-level circuits in the human brain. They have been in use for speech recognition for more than 20 years, but only a few years ago did computer scientists gain access to enough computing power to make it possible to build models that are fine-grained and complex enough to show promise in automatic speech recognition.<\/p>\n<p>An intern at Microsoft Research Redmond, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"George Dahl\" href=\"http:\/\/www.cs.toronto.edu\/~gdahl\/\" target=\"_blank\">George Dahl<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, now at the University of Toronto, contributed insights into the workings of DNNs and experience in training them. His work helped Yu and teammates produce a paper called <em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" title=\"Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition\" href=\"http:\/\/research.microsoft.com\/pubs\/144412\/DBN4LVCSR-TransASLP.pdf\" target=\"_blank\">Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em>.<\/p>\n<p>In October 2010, Yu presented the paper during a visit to Microsoft Research Asia. 
Seide was intrigued by the research results, and the two joined forces in a collaboration that has scaled up the new, DNN-based algorithms to thousands of hours of training data.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Posted by Rob Knies Last August, my colleague Janie Chang wrote a feature story titled Speech Recognition Leaps Forward that was published on the Microsoft Research website. The article outlined how Dong Yu, of Microsoft Research Redmond, and Frank Seide, of Microsoft Research Asia, had extended the state of the art in real-time, speaker-independent, automatic [&hellip;]<\/p>\n","protected":false},"author":30766,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[186834,200615,201099,201357,201635,201673,201699,202031,202089,202145,202621,202709,196432,196463,186936,203977],"research-area":[13561,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-134","post","type-post","status-publish","format-standard","hentry","category-research-blog","tag-algorithms","tag-behrooz-chitsaz","tag-context-dependent-pre-trained-deep-neural-networks-for-large-vocabulary-speech-recognition","tag-dong-yu","tag-frank-seide","tag-gang-li","tag-george-dahl","tag-intellectual-property-strategy","tag-interspeech-2011","tag-janie-chang","tag-mavis","tag-microsoft-audio-video-indexing-service","tag-microsoft-research-asia","tag-microsoft-research-redmond","tag-natural-language-processing","tag-speech-recognition-leaps-forward","msr-research-area-algorithms","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_
event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"June 14, 2012","formattedExcerpt":"Posted by Rob Knies Last August, my colleague Janie Chang wrote a feature story titled Speech Recognition Leaps Forward that was published on the Microsoft Research website. The article outlined how Dong Yu, of Microsoft Research Redmond, and Frank Seide, of Microsoft Research Asia, had&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/134","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/30766"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=134"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/134\/revisions"}],"predecessor-version":[{"id":235569,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/134\/revisions\/235569"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=134"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=134"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=134"},{"taxonomy":"
msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=134"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=134"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=134"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=134"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=134"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=134"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=134"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}