{"id":185968,"date":"2011-03-02T00:00:00","date_gmt":"2011-03-03T19:32:39","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/recognizing-a-million-voices-low-dimensional-audio-representations-for-speaker-identification\/"},"modified":"2016-08-22T11:32:48","modified_gmt":"2016-08-22T18:32:48","slug":"recognizing-a-million-voices-low-dimensional-audio-representations-for-speaker-identification","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/recognizing-a-million-voices-low-dimensional-audio-representations-for-speaker-identification\/","title":{"rendered":"Recognizing a Million Voices: Low Dimensional Audio Representations for Speaker Identification"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Recent advances in speaker verification technology have resulted in dramatic performance improvements in both speed and accuracy. Over the past few years, error rates have decreased by a factor of 5 or more. At the same time, the new techniques have resulted in massive speed-ups, which have increased the scale of viable speaker-id systems by several orders of magnitude. These improvements stem from a recent shift in the speaker modeling paradigm. Only a few years ago, the model for each individual speaker was trained using data from only that particular speaker. Now, we make use of large speaker-labeled databases to learn distributions describing inter- and intra-speaker variability. This allows us to reveal the speech characteristics that are important for discriminating between speakers.<br \/>\nDuring the 2008 JHU summer workshop, our team found that speech utterances can be encoded into low dimensional fixed-length vectors that preserve information about speaker identity. This concept of so-called &#8220;i-vectors&#8221;, which now forms the basis of state-of-the-art systems, enabled new machine learning approaches to be applied to the speaker identification problem. 
Inter- and intra-speaker variability can now be easily modeled using Bayesian approaches, which leads to superior performance. New training strategies can now benefit from the simpler statistical model form and the inherent speed-up. In our most recent work, we have retrained the hyperparameters of our Bayesian model using a discriminative objective function that directly addresses the task of speaker verification: discrimination between same-speaker and different-speaker trials. This is the first time such discriminative training has been successfully applied to the speaker verification task.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent advances in speaker verification technology have resulted in dramatic performance improvements in both speed and accuracy. Over the past few years, error rates have decreased by a factor of 5 or more. At the same time, the new techniques have resulted in massive speed-ups, which have increased the scale of viable speaker-id systems by 
[&hellip;]<\/p>\n","protected":false},"featured_media":196008,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-video-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-session-type":[],"msr-impact-theme":[],"msr-pillar":[],"msr-episode":[],"msr-research-theme":[],"class_list":["post-185968","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/yWBfBOmekjU","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/185968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":0,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/185968\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/196008"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=185968"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=185968"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=185968"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=185968"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-op
tion?post=185968"},{"taxonomy":"msr-session-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-session-type?post=185968"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=185968"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=185968"},{"taxonomy":"msr-episode","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-episode?post=185968"},{"taxonomy":"msr-research-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-theme?post=185968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}