{"id":466377,"date":"2018-02-14T18:19:56","date_gmt":"2018-02-15T02:19:56","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=466377"},"modified":"2020-06-04T17:27:30","modified_gmt":"2020-06-05T00:27:30","slug":"limiting-numerical-precision-of-neural-networks-to-achieve-real-time-voice-activity-detection","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/limiting-numerical-precision-of-neural-networks-to-achieve-real-time-voice-activity-detection\/","title":{"rendered":"Limiting Numerical Precision of Neural Networks to Achieve Real-time Voice Activity Detection"},"content":{"rendered":"<p>Fast and robust voice-activity detection is critical to efficiently process speech. While deep-learning based methods to detect voice have shown competitive accuracy, the best models in the literature incur over a 100 ms latency on commodity processors. Such delays are unacceptable for real-time speech processing. In this paper, we study the impact of lowering the representation precision of the neural-network weights and neurons on both the accuracy and delay of voice-activity detection. Based on a design-space exploration, we not only determine the optimal scaling strategy but also adjust the network structure to accommodate the new quantization levels. Through experiments conducted with real user data, we demonstrate that optimized deep neural networks with lower bit precision outperform the state-of-the-art WebRTC voice-activity detector with 87x lower delay and 6.8% lower error rate.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Fast and robust voice-activity detection is critical to efficiently process speech. While deep-learning based methods to detect voice have shown competitive accuracy, the best models in the literature incur over a 100 ms latency on commodity processors. Such delays are unacceptable for real-time speech processing. In this paper, we study the impact of lowering the [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"\u00a9 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting\/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.","msr_conference_name":"IEEE Int. Conf. Acoustics Speech and Signal Processing (ICASSP)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2018-4-17","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"https:\/\/2018.ieeeicassp.org\/","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13561,13556,243062,13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-466377","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2018-4-17","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"489158","msr_publicationurl":"https:\/\/2018.ieeeicassp.org\/","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/02\/KoPhiliposeTashevZarar_ICASSP_2018.pdf","id":"489158","title":"KoPhiliposeTashevZarar_ICASSP_2018","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/2018.ieeeicassp.org\/","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"https:\/\/2018.ieeeicassp.org\/"}],"msr-author-ordering":[{"type":"text","value":"Jong-Hwan Ko","user_id":0,"rest_url":false},{"type":"text","value":"Josh Fromm","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Matthai Philipose","user_id":32834,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Matthai Philipose"},{"type":"user_nicename","value":"Ivan Tashev","user_id":32127,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ivan Tashev"},{"type":"user_nicename","value":"Shuayb Zarar","user_id":36563,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shuayb Zarar"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144923],"msr_project":[364265],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":364265,"post_title":"Neural Networks-based Speech Enhancement","post_name":"nn-speech-enhancement","post_type":"msr-project","post_date":"2017-02-15 17:08:27","post_modified":"2022-01-21 13:15:43","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/nn-speech-enhancement\/","post_excerpt":"The goal of speech enhancement is to take the audio signal from a microphone,\u00a0clean\u00a0it and forward clean audio to multiple clients such as speech-recognition software, archival databases and speakers.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/364265"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/466377","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/466377\/revisions"}],"predecessor-version":[{"id":466395,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/466377\/revisions\/466395"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=466377"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=466377"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=466377"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=466377"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=466377"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=466377"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=466377"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=466377"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=466377"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=466377"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=466377"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=466377"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=466377"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}