{"id":578896,"date":"2019-04-13T11:33:35","date_gmt":"2019-04-13T18:33:35","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=578896"},"modified":"2025-08-01T13:52:34","modified_gmt":"2025-08-01T20:52:34","slug":"single-channel-speech-extraction-using-speaker-inventory-and-attention-network","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/single-channel-speech-extraction-using-speaker-inventory-and-attention-network\/","title":{"rendered":"Single-channel Speech Extraction Using Speaker Inventory and Attention Network"},"content":{"rendered":"<p>Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods either are speaker independent or extract a target speaker\u2019s voice by using his or her voice snippet. In applications such as home devices or office meeting transcriptions, a possible speaker list is available, which can be leveraged for speech separation. This paper proposes a novel speech extraction method that utilizes an inventory of voice snippets of possible interfering speakers, or speaker enrollment data, in addition to that of the target speaker. Furthermore, an attention-based network architecture is proposed to form time-varying masks for both the target and other speakers during the separation process. This architecture does not reduce the enrollment audio of each speaker into a single vector, thereby allowing each short time frame of the input mixture signal to be aligned and accurately compared with the enrollment signals. We evaluate the proposed system on a speaker extraction task derived from the Libri corpus and show the effectiveness of the method.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods either are speaker independent or extract a target speaker\u2019s voice by using his or her voice snippet. In applications such as home devices or office meeting transcriptions, a possible speaker list is available, which can be leveraged for speech [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"IEEE","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"IEEE","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"ICASSP 2019","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":null,"msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2019-5-11","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,243062,13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246691],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-578896","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-locale-en_us","msr-field-of-study-computer-science"],"msr_publishername":"IEEE","msr_edition":"","msr_affiliation":"","msr_published_date":"2019-5-11","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"IEEE","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"https:\/\/doi.org\/10.1109\/ICASSP.2019.8682245","label_id":"243106","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/dblp.org\/rec\/conf\/icassp\/XiaoCYELDDG19.html","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Xiong Xiao","user_id":0,"rest_url":false},{"type":"text","value":"Zhuo Chen","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Takuya Yoshioka","user_id":36278,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Takuya Yoshioka"},{"type":"text","value":"Hakan Erdogan","user_id":0,"rest_url":false},{"type":"text","value":"Changliang Liu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Dimitrios Dimitriadis","user_id":37521,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dimitrios Dimitriadis"},{"type":"text","value":"J. Droppo","user_id":0,"rest_url":false},{"type":"text","value":"Y. Gong","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[664548,783091],"msr_project":[585154,171185],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":585154,"post_title":"Project Denmark","post_name":"project-denmark","post_type":"msr-project","post_date":"2019-05-09 13:13:15","post_modified":"2020-11-12 13:43:43","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-denmark\/","post_excerpt":"The goal of Project Denmark is to move beyond the need for traditional microphone arrays, such as those supported by Microsoft\u2019s Speech Devices SDK, to achieve high-quality capture of meeting conversations.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/585154"}]}},{"ID":171185,"post_title":"Meeting Recognition and Understanding","post_name":"meeting-recognition-and-understanding","post_type":"msr-project","post_date":"2013-07-30 14:28:35","post_modified":"2023-08-12 21:11:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/meeting-recognition-and-understanding\/","post_excerpt":"In most organizations, staff spend many hours in meetings. This project addresses all levels of analysis and understanding, from speaker tracking and robust speech transcription to meaning extraction and summarization, with the goal of increasing productivity both during the meeting and after, for both participants and nonparticipants. The Meeting Recognition and Understanding project is a collection of online and offline spoken language understanding tasks. The following functions could be performed both on- and offline, but&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171185"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/578896","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/578896\/revisions"}],"predecessor-version":[{"id":1146371,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/578896\/revisions\/1146371"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=578896"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=578896"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=578896"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=578896"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=578896"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=578896"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=578896"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=578896"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=578896"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=578896"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=578896"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=578896"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=578896"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}