{"id":156385,"date":"2008-09-01T00:00:00","date_gmt":"2008-09-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/parameter-clustering-and-sharing-in-variable-parameter-hmms-for-noise-robust-speech-recognition\/"},"modified":"2018-10-16T20:22:57","modified_gmt":"2018-10-17T03:22:57","slug":"parameter-clustering-and-sharing-in-variable-parameter-hmms-for-noise-robust-speech-recognition","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/parameter-clustering-and-sharing-in-variable-parameter-hmms-for-noise-robust-speech-recognition\/","title":{"rendered":"Parameter Clustering and Sharing in Variable-Parameter HMMs for Noise Robust Speech Recognition"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Recently we proposed a cubic-spline-based variableparameter hidden Markov model (CS-VPHMM) whose mean and variance parameters vary according to some cubic spline functions of additional environment-dependent parameters. We have shown good properties of the CS-VPHMM and demonstrated on the Aurora-3 corpus that MCE-trained CSVPHMM greatly outperforms the MCE-trained conventional HMM at the cost of increased total number of model parameters. In this paper, we propose to share spline functions across different Gaussian mixture components to reduce the total number of model parameters and develop a clustering algorithm to do so. We demonstrate the effectiveness of our parameter clustering and sharing algorithm for the CSVPHMM on Aurora-3 corpus and show that proper parameter sharing can reduce the number of parameters from 4 times of that used in the conventional HMM to 1.13 times and still get 18% relative WER reduction over the MCE trained conventional HMM under the well-matched condition. Effective parameter sharing makes the CS-VPHMM an attractive model for noise robustness.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently we proposed a cubic-spline-based variableparameter hidden Markov model (CS-VPHMM) whose mean and variance parameters vary according to some cubic spline functions of additional environment-dependent parameters. We have shown good properties of the CS-VPHMM and demonstrated on the Aurora-3 corpus that MCE-trained CSVPHMM greatly outperforms the MCE-trained conventional HMM at the cost of increased total [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"dongyu","user_id":"31667"},{"type":"user_nicename","value":"deng","user_id":"31602"},{"type":"user_nicename","value":"ygong","user_id":"34994"},{"type":"user_nicename","value":"alexac","user_id":"30932"}],"msr_publishername":"International Speech Communication Association","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"Proc. of the Interspeech","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"\u00a9 2007 ISCA. Personal use of this material is permitted. However, permission to reprint\/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and\/or the author.","msr_conference_name":"Proc. of the Interspeech","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2008-09-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2008,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13545,13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-156385","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"International Speech Communication Association","msr_edition":"Proc. of the Interspeech","msr_affiliation":"","msr_published_date":"2008-09-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"208083","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"VPHMM-Cluster-Interspeech2008.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/VPHMM-Cluster-Interspeech2008.pdf","id":208083,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":208083,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/VPHMM-Cluster-Interspeech2008.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"dongyu","user_id":31667,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=dongyu"},{"type":"user_nicename","value":"deng","user_id":31602,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=deng"},{"type":"user_nicename","value":"ygong","user_id":34994,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=ygong"},{"type":"user_nicename","value":"alexac","user_id":30932,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=alexac"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[169434],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":169434,"post_title":"Acoustic Modeling","post_name":"acoustic-modeling","post_type":"msr-project","post_date":"2004-01-29 16:42:42","post_modified":"2019-08-14 14:50:04","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/acoustic-modeling\/","post_excerpt":"Acoustic modeling of speech typically refers to the process of\u00a0establishing statistical\u00a0representations for the feature vector sequences\u00a0computed from the speech waveform. Hidden Markov Model (HMM) is one most common type of acoustuc models. Other acosutic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields, etc. Acoustic modeling also encompasses \"pronunciation modeling\", which describes how a sequence or multi-sequences of fundamental speech units\u00a0(such as phones or&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169434"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/156385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/156385\/revisions"}],"predecessor-version":[{"id":527461,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/156385\/revisions\/527461"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=156385"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=156385"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=156385"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=156385"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=156385"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=156385"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=156385"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=156385"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=156385"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=156385"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=156385"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=156385"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=156385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}