{"id":901248,"date":"2022-11-28T05:50:32","date_gmt":"2022-11-28T13:50:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-academic-program&#038;p=901248"},"modified":"2023-10-14T10:56:10","modified_gmt":"2023-10-14T17:56:10","slug":"deep-noise-suppression-challenge-icassp-2023","status":"publish","type":"msr-academic-program","link":"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/deep-noise-suppression-challenge-icassp-2023\/","title":{"rendered":"ICASSP 2023 Deep Noise Suppression Challenge"},"content":{"rendered":"\n\n<p>Includes de-reverberation and suppression of interfering talkers for headset and speakerphone scenarios<\/p>\n\n\n\n\n\n\n<p><strong>Program dates:<\/strong>&nbsp;November 2022\u2013August 2023<\/p>\n\n\n\n<p><em>See Results tab for final results evaluated on Blind Testset. Five papers were invited from 7 teams where three teams from Tencent agreed to submit one paper. These teams will submit 2-page paper in ICASSP 2023. For deadline, please see the timeline tab. These five teams will get an email with instructions on writing the 2-page paper. All teams are invited to submit their current and future work in DNS Challenge Special Issues of IEEE OJ-SP journal. More details on OJ-SP will be emailed to all participants in next few months. <\/em><\/p>\n\n\n\n<p>With decades of research, deep noise suppression (DNS) has advanced to improve subjective overall audio quality. However, we are still quite far from eliminating speech distortions which results from over-suppression of noise, reverberation and neighboring talkers etc. in real-world scenarios. Enhancing speech quality (i.e., eliminating speech distortion) and suppressing noise, reverberation and neighboring talkers are two trade-offs DNS models must handle. IEEE ICASSP 2023 Deep Noise Suppression (DNS) grand challenge is the 5th edition of Microsoft DNS challenges with focus on deep speech enhancement achieved by suppressing background noise, reverberation and neighboring talkers and enhancing the signal quality. This challenge invites researchers to develop real-time deep speech enhancement models for full band speech. Deep speech enhancement models submitted to challenge are supposed to do joint denoising and dereverberation in presence of neighboring (interfering) talkers. We will release the development test set for intermediate evaluation of challenge models and a blind test set for final evaluation for selecting top 5 models. Each test clips will have a corresponding enrollment clip (30s) for primary talker to enable the development of personalized models. Participants are encouraged to develop both personalized and non-personalized models to elucidate the benefits of personalized for deep speech enhancement models.<\/p>\n\n\n\n<p>The challenge has two tracks: (1) Headset (wired\/wireless headphone, earbuds such as airpods etc.) speech enhancement; (2) non-headset (speakerphone, built-in mic in laptop\/desktop\/mobile phone\/other meeting devices etc.) speech enhancement. Past challenges demonstrated that headset scenarios exhibit certain acoustic properties by dint of proximity of microphone to primary talker. Such acoustic properties can be leveraged to improve deep speech enhancement models with and without personalization. On the other hand, speakerphone cases exhibit different set of acoustic properties which motivates a separate track for non-headset scenarios. 
<p>Participants can develop personalized as well as non-personalized (non-enrollment) deep speech enhancement models for both tracks. The blind test set will have paralinguistic test clips covering standard forms of paralanguage, including but not limited to: throat-clearing, \u201chmm\u201d or \u201cmhm\u201d, \u201cHuh?\u201d or \u201cwhat?\u201d, gasps, sighs, moans and groans, deceptive speech, sincere speech, speech with high bass, speech with high pitch, speech with low pitch, confident speech, tired speech, persuasive speech, and voice change mid-clip (e.g., mimicry in the last 50% of the clip). The blind testset will also include emotional speech, including but not limited to happy, sad, angry, yelling, crying, and laughter. The blind testset includes real test clips with high reverberation, high reverberation with noise, and noise in the presence of interfering talkers. Testset noises include, but are not limited to: office scenarios (typing, AC, door shutting, eating\/munching, copy machine, squeaking chair, notification sounds, etc.), home scenarios (baby crying, dogs, TV, radiators, hair dryer, kitchen noise, running water, etc.), appliances (washer\/dryer, dishwasher, coffee maker, vacuum cleaner, etc.), fire alarm, car, inside a parked car on a busy road, in-car neighboring talkers, road traffic, car noise (from machinery, control systems, turn signal, etc.), caf\u00e9, coffee machine, blender, background babble, airport announcements, etc.<\/p>\n\n\n\n<p>In all test clips, there is only one primary talker in the enrollment clip, while the noisy test clip may contain noise, reverberation, and one or more neighboring talkers in addition to the primary talker. The goal of a deep speech enhancement model is to preserve the primary talker\u2019s speech while suppressing everything else. We provide a flexible framework for synthesizing training datasets, which allows participants to choose a subset of the challenge dataset or add their own corpora to augment the challenge training dataset. The two tracks have different test sets collected using the corresponding device types. Test clips are real-world recordings collected through crowdsourcing. The test sets include representative noisy scenarios relevant for video\/audio meetings in hybrid-work settings. The challenge overview paper will discuss the test set data collection in detail, including the list of devices, the specifications sent to crowdsourced workers, and the steps taken for quality assurance (QA) of the test set. Test sets are selected to include speaker variety, device variety, and different acoustic properties such as impulse response, direct-to-reverberation ratio (DRR), and T60, achieved by changing the relative and absolute positions of the primary and interfering talkers, the noise source, and the presence of reflecting surfaces. Along with the training datasets and testsets, we also provide a baseline model (or enhanced clips) for both tracks. Track 1 and Track 2 both may have personalized or non-personalized models, but we only provide one baseline for each track.<\/p>
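<p>As a rough illustration of the dataset-synthesis idea, the sketch below mixes clean speech with a noise clip at a chosen SNR and optionally adds reverberation via a room impulse response. It is a minimal stand-in assuming NumPy arrays of fullband (48 kHz) audio; an interfering talker can be mixed in the same way as the noise, and the synthesizer shipped in the challenge repository is the reference implementation and is far more configurable.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
# Minimal sketch of noisy-clip synthesis: scale a noise clip to a
# target SNR relative to the (optionally reverberated) clean speech.
# The official challenge synthesizer is the reference implementation.
import numpy as np

def rms(x):
    return np.sqrt(np.mean(x ** 2) + 1e-12)

def mix_at_snr(speech, interference, snr_db):
    # Scale interference so the speech-to-interference ratio is snr_db.
    gain = rms(speech) / (rms(interference) * 10 ** (snr_db / 20))
    return speech + gain * interference

def synthesize(clean, noise, rir=None, snr_db=5.0):
    if rir is not None:
        clean = np.convolve(clean, rir)[:len(clean)]  # add reverberation
    noise = np.resize(noise, clean.shape)             # loop or trim noise
    noisy = mix_at_snr(clean, noise, snr_db)
    peak = np.max(np.abs(noisy))
    return noisy / peak if peak > 1.0 else noisy      # avoid clipping

fs = 48000                              # fullband sampling rate
clean = 0.1 * np.random.randn(10 * fs)  # stand-ins for real corpora
noise = 0.1 * np.random.randn(7 * fs)
noisy = synthesize(clean, noise, snr_db=0.0)
<\/code><\/pre>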
<p>We also provide a personalized P.835 subjective evaluation framework, which is used for both tracks, along with a Word Accuracy (WAcc) Azure API. The personalized P.835 framework is an improved version of the framework used in the 4th DNS Challenge. Our subjective framework is a modified version of ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) P.835, which provides three scores for each test clip (and its corresponding enrollment clip): speech quality (SIG), background noise quality (BAK), and overall quality (OVRL). This challenge aims to improve the subjective audio quality as measured by SIG, BAK, and OVRL, and to provide improvements in WAcc. Crowdsourced raters performing the subjective evaluation are instructed to treat interfering talkers as an undesirable signal, so a model that suppresses interfering talkers is rated higher. Similarly, the WAcc ground-truth transcripts will only contain words spoken by the primary talker, thus also treating interfering talkers as an undesirable signal. Enhanced test clips from past DNS Challenges had noticeable WAcc and SIG degradation due to over-suppression, which removed parts of the primary talker\u2019s speech. We open-sourced DNSMOS P.835, a deep neural network (DNN) model for non-intrusive prediction of the speech, background noise, and overall quality of an audio signal. DNSMOS aims to help participants with intermediate model evaluations.<\/p>
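<p>For reference, scoring a clip with the open-sourced DNSMOS P.835 ONNX model might look like the hedged sketch below. The model filename (sig_bak_ovr.onnx), the 16 kHz \/ 9-second input segmenting, and the output order are assumptions to verify against dnsmos_local.py in the DNSMOS folder of the challenge repository; the onnxruntime and soundfile calls themselves are standard.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
# Hedged sketch: scoring an enhanced clip with DNSMOS P.835 via
# onnxruntime. Filename, input layout (16 kHz, ~9 s segment), and
# output order are assumptions; check dnsmos_local.py in the repo.
import numpy as np
import onnxruntime as ort
import soundfile as sf

SEG_SECONDS, FS = 9, 16000

sess = ort.InferenceSession('sig_bak_ovr.onnx')  # from the DNSMOS folder
in_name = sess.get_inputs()[0].name

audio, fs = sf.read('enhanced_clip.wav')
assert fs == FS, 'resample to 16 kHz first'
seg = audio[:SEG_SECONDS * FS].astype(np.float32)[None, :]
scores = sess.run(None, {in_name: seg})[0][0]
sig, bak, ovrl = scores          # assumed order, per the model filename
print(f'SIG={sig:.2f} BAK={bak:.2f} OVRL={ovrl:.2f}')
<\/code><\/pre>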
<p>Participants need to register on the challenge\u2019s CMT site, which will be used for challenge communication: https:\/\/cmt3.research.microsoft.com\/DNSChallenge2023. Questions related to the challenge can be sent to dns_challenge@microsoft.com.<\/p>\n\n\n\n<p>The paper for the previous challenge is <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/2202.13288\">ICASSP 2022 Deep Noise Suppression Challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p><strong>Please NOTE<\/strong> that the intellectual property (IP) is not transferred to the challenge organizers, i.e., if code is shared\/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).<\/p>\n\n\n\n<p><strong>Challenge Tracks<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This challenge has two tracks: Track 1: Headset DNS; Track 2: Speakerphone DNS.<\/li>\n\n\n\n<li>Each testclip in both tracks has an enrollment clip of 30 s duration. The enrollment speech can be noise-free or noisy, with or without reverberation. This facilitates multi-condition enrollment of primary talkers, which serves as a robustness test for personalized models that use enrollment speech as an additional input for denoising the testclips.<\/li>\n\n\n\n<li>Participants can choose to work on models with speaker enrollment or without it, for one or both tracks. Each team can submit 1\u20134 models depending on what experiments they conduct. Each participating team can submit a maximum of one personalized and one non-personalized model for each track; e.g., a team can submit one personalized and one non-personalized model for Track 1, but not two personalized or two non-personalized models for Track 1. Similarly, a team can submit four models: a personalized and a non-personalized model for Track 1, and a personalized and a non-personalized model for Track 2. All models for a track will be evaluated and ranked together, i.e., both personalized and non-personalized models for Track 1 will go through one subjective evaluation; similarly, all Track 2 models go through one subjective evaluation.<\/li>\n\n\n\n<li>Participants are encouraged to conduct experiments with both personalized and non-personalized models to elucidate the benefits of personalization. However, this is NOT a requirement for this challenge.<\/li>\n\n\n\n<li>Each track will have its own dev testset and blind testset, each consisting of 600 testclips. Thus, a total of 2,400 unique testclips is released in this challenge. The main difference between the two tracks is the devices used for collecting the testsets, i.e., headsets for Track 1 and speakerphones for Track 2.<\/li>\n<\/ul>\n\n\n\n<p><strong>Challenge Requirements<\/strong><\/p>\n\n\n\n<p><span style=\"text-decoration: underline\">Failing to adhere to challenge rules will lead to disqualification from the challenge.<\/span><\/p>\n\n\n\n<p><strong><em>Algorithmic latency:<\/em><\/strong> The offset introduced by the whole processing chain including STFT, iSTFT, overlap-add, additional lookahead frames, etc., compared to just passing the signal through without modification. This does not include buffering latency.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ex.1: An STFT-based processing with window length = 20 ms and hop length = 10 ms introduces an algorithmic delay of window length &#8211; hop length = 10 ms. <\/li>\n\n\n\n<li>Ex.2: An STFT-based processing with window length = 32 ms and hop length = 8 ms introduces an algorithmic delay of window length &#8211; hop length = 24 ms. <\/li>\n\n\n\n<li>Ex.3: An overlap-save based processing algorithm introduces no additional algorithmic latency.<\/li>\n\n\n\n<li>Ex.4: A time-domain convolution with a filter kernel size = 16 samples introduces an algorithmic latency of kernel size &#8211; 1 = 15 samples. Using one-sided padding, the operation can be made fully \u201ccausal\u201d, i.e., a left-sided padding with kernel size &#8211; 1 samples would result in no algorithmic latency.<\/li>\n\n\n\n<li>Ex.5: An STFT-based processing with window_length = 20 ms and hop_length = 10 ms using 2 future frames of information introduces an algorithmic latency of (window_length &#8211; hop_length) + 2*hop_length = 30 ms.<\/li>\n<\/ul>\n\n\n\n<p><strong><em>Buffering latency:<\/em><\/strong> The latency introduced by block-wise processing, often referred to as hop size, frame shift, or temporal stride.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ex.1: An STFT-based processing has a buffering latency corresponding to the hop size.<\/li>\n\n\n\n<li>Ex.2: An overlap-save processing has a buffering latency corresponding to the frame size.<\/li>\n\n\n\n<li>Ex.3: A time-domain convolution with stride 1 introduces a buffering latency of 1 sample.<\/li>\n<\/ul>\n\n\n\n<p><strong><em>Real-time factor (RTF):<\/em><\/strong> RTF is defined as the ratio of the compute time for one processing step to the duration of that step. For an STFT-based algorithm, one processing step is the hop size. For a time-domain convolution, one processing step is 1 sample. RTF = compute time\/time step.<\/p>\n\n\n\n<p><strong>All models submitted to this challenge must meet all of the below requirements.<\/strong> (A sketch illustrating the latency and RTF arithmetic follows the list.)<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>To be able to execute an algorithm in real-time, and to accommodate the variance in compute time which occurs in practice, we require RTF <= 0.5 in the challenge on an Intel Core i5 quad-core clocked at 2.4 GHz using a single thread. <\/li>\n\n\n\n<li>Algorithmic latency + buffering latency <= 20 ms.<\/li>\n\n\n\n<li>No future information can be used during model inference.<\/li>\n\n\n\n<li>Participants can only enhance the testclips by a single pass through their model.<\/li>\n\n\n\n<li>None of the testclips from current or previous DNS Challenges can be used for training or fine-tuning the model.<\/li>\n<\/ol>
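<p>The following minimal sketch puts the three definitions above into code and checks them against the challenge limits. The function and variable names are ours, and the 4 ms compute time in the example is made up for illustration; only the formulas come from the definitions above.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
# Sketch of the latency and RTF arithmetic defined above, applied to
# the challenge limits (algorithmic + buffering latency <= 20 ms,
# RTF <= 0.5). The numbers reproduce Ex.1 and Ex.5.
def algorithmic_latency_ms(window_ms, hop_ms, lookahead_frames=0):
    # STFT overlap-add delay plus any future-frame lookahead.
    return (window_ms - hop_ms) + lookahead_frames * hop_ms

def buffering_latency_ms(hop_ms):
    # Block-wise processing waits for one full hop of samples.
    return hop_ms

def rtf(compute_ms_per_step, step_ms):
    return compute_ms_per_step / step_ms

window, hop = 20.0, 10.0
total = algorithmic_latency_ms(window, hop) + buffering_latency_ms(hop)
print(total)                                   # 20.0 ms: meets the limit
print(algorithmic_latency_ms(window, hop, 2))  # Ex.5: 30.0 ms, over the limit
print(rtf(4.0, hop) <= 0.5)                    # True for 4 ms per 10 ms hop
<\/code><\/pre>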
<p><strong>Evaluation Criteria and Methodology<\/strong><\/p>\n\n\n\n<p>This challenge adopts the ITU-T P.835 subjective test framework to measure speech quality (SIG), background noise quality (BAK), and overall audio quality (OVRL). We modified ITU-T P.835 to make it reliable for test clips with interfering (undesired neighboring) talkers. We are also releasing&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2110.01763.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">DNSMOS P.835<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which is a machine learning based model for predicting SIG, BAK, and OVRL. Participants can use DNSMOS P.835 to evaluate their intermediate models. In this challenge, we introduced Word Accuracy (WAcc) as an additional metric to compare the performance of DNS models. Challenge winners will be decided based on OVRL and WAcc as follows:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\( M = {{{(\\text{OVRL} - 1) \\over 4}+\\text{WAcc}} \\over 2} \\)<\/p>\n\n\n\n<p>WAcc will be obtained using the Microsoft Azure Speech Recognition API. This challenge metric gives an equal weighting between subjective quality and speech recognition performance. The dev-test set and DNSMOS P.835 are provided to participants to accelerate model development. A script to evaluate WAcc is also provided. We use neither the dev-test set nor DNSMOS P.835 for deciding final winners. DNSMOS P.835 has a high correlation with human perception and hence can serve as a robust measure of audio quality. The challenge winner will be decided based on&nbsp;<em>M<\/em>&nbsp;computed on enhanced clips from the blind test set.<\/p>\n\n\n\n<p>Participants are also required to report the multiply\u2013accumulate (MAC) or multiply-add (MAD) operations for a single-pass inference of one audio frame for all models submitted to the challenge. In case of a tie, the model with lower MACs, lower RTF, and lower algorithmic latency will be ranked higher.<\/p>
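<p>The sketch below computes this metric locally. The WAcc here is a simple stand-in (1 minus a word error rate computed by Levenshtein alignment); the official WAcc comes from the Azure Speech Recognition API, and the OVRL value in the example is made up.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
# Sketch of the ranking metric M = ((OVRL - 1) / 4 + WAcc) / 2.
# WAcc below is a local stand-in; the official score uses the
# Microsoft Azure Speech Recognition API.
def word_error_rate(ref, hyp):
    r, h = ref.split(), hyp.split()
    # Levenshtein distance over words (sub/ins/del all cost 1).
    d = [[i + j if i * j == 0 else 0 for j in range(len(h) + 1)]
         for i in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))
    return d[len(r)][len(h)] / max(len(r), 1)

def challenge_metric(ovrl, wacc):
    # Equal weight to subjective quality (rescaled to 0..1) and WAcc.
    return ((ovrl - 1) / 4 + wacc) / 2

wacc = 1 - word_error_rate('turn the fan off please', 'turn the fan off')
print(challenge_metric(ovrl=3.2, wacc=wacc))  # 0.675
<\/code><\/pre>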
<p><strong>Registration procedure<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>There are two steps in registering for the challenge. <\/li>\n\n\n\n<li>Step 1: Participants are required to email the Deep Noise Suppression Challenge at dns_challenge@microsoft.com with the list of all participants, the affiliation of each participant (including country), contact information for the team, and the name of your team. <\/li>\n\n\n\n<li>Step 2: Participants need to register on the Challenge CMT site https:\/\/cmt3.research.microsoft.com\/DNSChallenge2023, where they can submit the enhanced clips and receive challenge announcements.<\/li>\n\n\n\n<li>Organizers plan to announce the availability of data, baseline models, evaluation results, etc. via CMT.<\/li>\n\n\n\n<li>The challenge leaderboard will be managed on the Piazza platform (Event name: ICASSP 2023 DNS CHALLENGE). Participants can register using the link below: https:\/\/piazza.com\/microsoft_dns_challenge\/spring2023\/icassp2023dnschallenge<\/li>\n\n\n\n<li>Results, challenge rules, and descriptions of the two tracks will be posted to the challenge website: https:\/\/aka.ms\/dns-challenge<\/li>\n\n\n\n<li>Organizers will make the challenge overview paper available initially on arXiv\/ResearchGate. It will eventually be published in OJ-SP as per IEEE GC guidelines.<\/li>\n<\/ul>\n\n\n\n<p><strong>Contact us:&nbsp;<\/strong>If you have questions about this program, email us at&nbsp;<a href=\"mailto:dns_challenge@microsoft.com\">dns_challenge@microsoft.com<\/a>.<\/p>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"heading-official-rules\">Official rules<\/h2>\n\n\n\n<p>SPONSOR<\/p>\n\n\n\n<p>These Official Rules (\u201cRules\u201d) govern the operation of the&nbsp;IEEE ICASSP 2023 Deep Noise Suppression&nbsp;Challenge&nbsp;Event&nbsp;Contest (\u201cContest\u201d). Microsoft Corporation, One Microsoft Way, Redmond, WA, 98052, USA, is the Contest sponsor (\u201cSponsor\u201d).<\/p>\n\n\n\n<p>DEFINITIONS<\/p>\n\n\n\n<p>In these Rules, \u201cMicrosoft\u201d, \u201cwe\u201d, \u201cour\u201d, and \u201cus\u201d refer to Sponsor, and \u201cyou\u201d and \u201cyourself\u201d refer to a Contest participant or the parent\/legal guardian of any Contest entrant who has not reached the age of majority to contractually obligate themselves in their legal place of residence. \u201cEvent\u201d refers to the&nbsp;ICASSP 2023 Deep Noise Suppression&nbsp;event held in&nbsp;Singapore&nbsp;(the \u201cEvent\u201d). By entering, you (or your parent\/legal guardian if you are not the age of majority in your legal place of residence) agree to be bound by these Rules.<\/p>\n\n\n\n<p>ENTRY PERIOD<\/p>\n\n\n\n<p>The Contest will operate from&nbsp;November 1,&nbsp;2022&nbsp;to August 9, 2023&nbsp;(\u201cEntry Period\u201d). The Entry Period is divided into several periods as described in Section 5, How to Enter.<\/p>\n\n\n\n<p>ELIGIBILITY<\/p>\n\n\n\n<p>Open to any registered Event attendee 18 years of age or older. If you are 18 years of age or older but have not reached the age of majority in your legal place of residence, then you must have the consent of a parent\/legal guardian. Employees and directors of Microsoft Corporation and its subsidiaries, affiliates, advertising agencies, and Contest Parties are not eligible, nor are persons involved in the execution or administration of this promotion, or the family members of each above (parents, children, siblings, spouse\/domestic partners, or individuals residing in the same household). Void in Cuba, Iran, North Korea, Sudan, Syria, Region of Crimea, and where prohibited. For business\/tradeshow events: If you are attending the Event in your capacity as an employee, it is your sole responsibility to comply with your employer\u2019s gift policies. Microsoft will not be a party to any disputes or actions related to this matter. PLEASE NOTE: If you are a public sector employee (government and education), all prize awards will be awarded directly to your public sector organization and subject to receipt of a gift letter signed by your agency\/institution\u2019s ethics officer, attorney, or designated executive\/officer responsible for your organization\u2019s gifts\/ethics policy. 
Microsoft seeks to&nbsp;ensure that by offering items of value at no charge in promotional settings it does not create any violation of the letter or spirit of the entrant\u2019s applicable gifts and ethics rules.<\/p>\n\n\n\n<p>HOW TO ENTER<\/p>\n\n\n\n<p>The Contest Objective is to promote collaborative research in real-time single-channel Speech Enhancement aimed at maximizing the subjective (perceptual) quality of&nbsp;the enhanced&nbsp;speech. Prizes will be awarded based on the speech quality of deep noise suppression models using the online subjective evaluation framework ITU-T P.835. Only methods described in accepted&nbsp;ICASSP 2023&nbsp;Deep Noise Suppression Challenge&nbsp;papers will be eligible for the contest. You may participate as an individual or a team. If forming a team, you must designate a \u201cTeam Captain\u201d who will submit all entry materials on behalf of the team. Once you register as part of a Team, you cannot change Teams or alter your current team (either by adding or removing members) after the submission of your Entry. Limit one Entry per person and per team. You may not compete&nbsp;on&nbsp;multiple&nbsp;teams,&nbsp;and you may not enter individually and on a team. We are not responsible for Entries that we do not receive for any reason, or for Entries that we receive but are not decipherable or not functional for any reason. Each Team is solely responsible for its own cooperation and teamwork. In no event will Microsoft officiate in any dispute regarding the conduct or cooperation of any Team or its members. The Contest will operate as follows:<\/p>\n\n\n\n<p><strong>Registration \/ Development Period:&nbsp;November 1, 2022 \u2013 January 27, 2023<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>There are two steps in registering for the challenge.<\/li>\n\n\n\n<li>Step 1: Participants are required to email the Deep Noise Suppression Challenge at dns_challenge@microsoft.com with the list of all participants, the affiliation of each participant (including country), contact information for the team, and the name of your team.<\/li>\n\n\n\n<li>Step 2: Participants need to register on the Challenge CMT site https:\/\/cmt3.research.microsoft.com\/DNSChallenge2023, where they can submit the enhanced clips and receive challenge announcements.<\/li>\n\n\n\n<li>Organizers plan to announce the availability of data, baseline models, evaluation results, etc. via CMT.<\/li>\n\n\n\n<li>The challenge leaderboard will be managed on the Piazza platform (Event name: ICASSP 2023 DNS CHALLENGE). Participants can register using the link below: https:\/\/piazza.com\/microsoft_dns_challenge\/spring2023\/icassp2023dnschallenge<\/li>\n\n\n\n<li>Results, challenge rules, and descriptions of the two tracks will be posted to the challenge website: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aka.ms\/5th-dns-challenge\">https:\/\/aka.ms\/5th-dns-challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li>Organizers will make the challenge overview paper available initially on arXiv\/ResearchGate. 
It will eventually be published in the IEEE Open Journal of Signal Processing (OJ-SP) as per IEEE GC guidelines.<\/li>\n<\/ul>\n\n\n\n<p>Then,&nbsp;1)&nbsp;develop a speech enhancement model that best meets the Contest Objective as described in the base paper, and&nbsp;2)&nbsp;report the computational complexity of the model in terms of the number of parameters and the time it takes to infer a frame on a particular CPU (preferably an Intel Core i5 quad-core machine clocked at 2.4 GHz). To develop your model, use any publicly available clean speech and noise datasets, including the contest datasets provided for training and developing models. You may augment your datasets with the contest dataset. You may mix clean speech and noise in any way that improves the performance of your model.<\/p>
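<p>As a hedged illustration of the two complexity numbers to report, the sketch below counts parameters and times per-frame inference for a stand-in PyTorch model; substitute your own model, hop size, and hardware. The GRU here is arbitrary and is not a challenge baseline.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
# Sketch: parameter count and average per-frame CPU inference time
# for a stand-in model (replace with your submission). RTF is the
# per-frame time divided by the hop duration.
import time
import torch
import torch.nn as nn

torch.set_num_threads(1)          # match the single-thread requirement

model = nn.GRU(input_size=257, hidden_size=256, batch_first=True)
n_params = sum(p.numel() for p in model.parameters())
print(f'parameters: {n_params}')

frame, n_runs, hop_ms = torch.rand(1, 1, 257), 1000, 10.0
with torch.no_grad():
    model(frame)                  # warm-up
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(frame)
    per_frame_ms = (time.perf_counter() - t0) / n_runs * 1000
print(f'per-frame: {per_frame_ms:.3f} ms, RTF ~ {per_frame_ms / hop_ms:.3f}')
<\/code><\/pre>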
<p>The final evaluation will be conducted on a blind test set that is similar to the open-sourced development-stage test set. You may use the scripts for the baseline noise suppressor that was recently published here. Testing \/ Entry Period: December 5, 2022 \u2013 January 20, 2023. On Feb. 6, 2023, the blind test dataset will be released. You will have until 11:59 PM AoE on Feb. 9, 2023 to test your model against this dataset and create a set of enhanced clips to submit for judging (your \u201cEntry\u201d) via the Conference Management Tool.<\/p>\n\n\n\n<p>You may not use the blind test set to retrain or tweak your model. To submit your entry, submit your processed clips via the conference management tool. Each Entry will fall in one of two tracks&nbsp;based on the testset used. You must satisfy all the requirements of each track as described on the homepage. You must also specify the number of operations per second in your paper submission. IEEE ICASSP 2023&nbsp;Paper Submission and Judging Period:&nbsp;Feb. 9, 2023&nbsp;11:59 PM AoE \u2013&nbsp;March 14, 2023 11:59 PM AoE. Your Entry must be described in a paper accepted by the&nbsp;ICASSP 2023 Deep Noise Suppression Challenge. To submit a paper,&nbsp;use the&nbsp;IEEE ICASSP 2023 Grand Challenge paper submission site. The entry limit is one per person during the Entry Period. Any attempt by you to obtain more than the stated number of entries by using multiple\/different accounts, identities, registrations, logins, or any other methods will void your entries and you may be disqualified. Use of any automated system to participate is prohibited.<\/p>\n\n\n\n<p>We are not responsible for excess, lost, late, or incomplete entries. If disputed, entries will be deemed submitted by the \u201cauthorized account holder\u201d of the email address, social media account, or other method used to enter. The \u201cauthorized account holder\u201d is the natural person assigned to an email address by an internet or online service provider, or other organization responsible for assigning email addresses.<\/p>\n\n\n\n<p>PAPER FORMAT<\/p>\n\n\n\n<p>The challenge papers are published in two phases. The first phase consists of 2-page summary papers (by invitation only), due on Feb. 20, 2023, which will be published in the IEEE ICASSP proceedings. The second phase consists of full-length journal articles to be submitted to the IEEE Open Journal of Signal Processing (by invitation only), due on August 9, 2023. <\/p>\n\n\n\n<p>ELIGIBLE ENTRY<\/p>\n\n\n\n<p>To be eligible, an entry must meet the following content\/technical requirements:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your Entry must be among the top 5 teams as per the evaluation methodology used in this challenge.<\/li>\n\n\n\n<li>Your 2-page summary paper must be accepted by the&nbsp;ICASSP 2023 Grand Challenge review process.<\/li>\n\n\n\n<li>Your entry must be your own original work; and<\/li>\n\n\n\n<li>Your entry cannot have been selected as a winner in any other contest; and<\/li>\n\n\n\n<li>You must have obtained&nbsp;any and all&nbsp;consents, approvals, or licenses required for you to submit your entry; and<\/li>\n\n\n\n<li>To the extent that entry requires the submission of user-generated content such as software, photos, videos, music, artwork, essays, etc., entrants warrant that their entry is their original work, has not been copied from others without permission or apparent rights, and does not violate the privacy, intellectual property rights, or other rights of any other person or entity. You may include Microsoft trademarks, logos, and designs, for which Microsoft grants you a limited license to use for the sole purposes of submitting an entry into this Contest; and<\/li>\n\n\n\n<li>Your entry may NOT contain, as determined by us in our sole and absolute discretion, any content that is obscene or offensive, violent, defamatory, disparaging or illegal, or that promotes alcohol, illegal drugs, tobacco or a particular political&nbsp;agenda, or that communicates messages that may reflect negatively on the goodwill of Microsoft.<\/li>\n\n\n\n<li>Your entry must NOT include enhanced clips using other noise suppression methods that you are not submitting to&nbsp;ICASSP 2023&nbsp;Deep Noise Suppression Challenge.<\/li>\n<\/ul>\n\n\n\n<p>USE OF ENTRIES<\/p>\n\n\n\n<p>We are not claiming ownership rights to your Submission. However, by submitting an entry, you grant us an irrevocable, royalty-free, worldwide right and license to use, review, assess, test, and otherwise analyze your entry and all its content in connection with this Contest and use your entry in any media whatsoever now known or later invented for any non-commercial or commercial purpose, including, but not limited to, the marketing, sale or promotion of Microsoft products or services, without further permission from you. You will not receive any compensation or credit for use of your entry, other than what is described in these Official Rules.<\/p>\n\n\n\n<p>By entering you acknowledge that we may have developed or commissioned materials similar or identical to your entry and you waive any claims resulting from any similarities to your entry. Further, you understand that we will not restrict work assignments of representatives who have had access to your&nbsp;entry&nbsp;and you agree that the use of information in our representatives\u2019 unaided memories in the development or deployment of our products or services does not create liability for us under this agreement or copyright or trade secret law.<\/p>\n\n\n\n<p>Your entry may be posted on a public website. We are not responsible for any unauthorized use of your entry by visitors to this website. We are not obligated to use your entry for any purpose, even if it has been selected as a winning entry.<\/p>\n\n\n\n<p>WINNER SELECTION AND NOTIFICATION<\/p>\n\n\n\n<p>Pending confirmation of eligibility, potential prize winners will be selected by Microsoft or their Agent or a qualified judging panel from among all eligible entries received based on the following judging criteria: 99% \u2013 The subjective speech quality evaluated on the blind test set using the ITU-T P.835 framework. 
We will use the submitted clips with no alteration to conduct the ITU-T P.835 subjective evaluation.&nbsp;We will use&nbsp;Word Accuracy&nbsp;(WAcc) as an additional metric to compare the performance of DNS models.&nbsp;WAcc&nbsp;will be obtained using the Microsoft Azure Speech Recognition API.&nbsp;Challenge winners will be decided based on&nbsp;the Overall&nbsp;MOS&nbsp;(OVRL)&nbsp;rating&nbsp;from the ITU-T P.835 subjective evaluation&nbsp;results&nbsp;and&nbsp;WAcc&nbsp;as follows:&nbsp;M = ((OVRL &#8211; 1)\/4 + WAcc)\/2.<\/p>\n\n\n\n<p>This challenge metric gives an equal weighting between subjective quality and speech recognition performance.&nbsp;Among the submitted proposals, if the&nbsp;difference in the&nbsp;overall&nbsp;evaluation metric M&nbsp;between models is not statistically significant, the model with a lower number of operations per second will be given a higher ranking.&nbsp;1%&nbsp;\u2013 The Entry was described in an accepted&nbsp;ICASSP 2023&nbsp;Deep Noise Suppression Challenge&nbsp;paper.&nbsp;Winners will be selected and notified within 7 days following the Event.<\/p>\n\n\n\n<p>In the event of a tie between any eligible entries, an additional judge will break the tie based on the judging criteria described above. The decisions of the judges are final and binding. If we do not receive enough entries meeting the entry requirements, we may, at our discretion, select fewer winners. If a public vote determines winners, it is prohibited for any person to obtain votes by any fraudulent or inappropriate means, including offering prizes or other inducements in exchange for votes, automated programs, or fraudulent IDs. Microsoft will void any questionable votes.<\/p>\n\n\n\n<p>ODDS<\/p>\n\n\n\n<p>The odds of winning are based on the number and quality of eligible entries received.<\/p>\n\n\n\n<p>GENERAL CONDITIONS AND RELEASE OF LIABILITY<\/p>\n\n\n\n<p>To the extent allowed by law, by entering you agree to release and hold harmless Microsoft and its respective parents, partners, subsidiaries, affiliates, employees, and agents from&nbsp;any and all&nbsp;liability or any injury, loss, or damage of any kind arising in connection with this&nbsp;Contest&nbsp;or any prize won.<\/p>\n\n\n\n<p>All local laws apply. The decisions of Microsoft are final and binding.<\/p>\n\n\n\n<p>We reserve the right to cancel, change, or suspend this Contest for any reason, including cheating, technology failure, catastrophe, war, or any other unforeseen or unexpected event that affects the integrity of this Contest, whether human or mechanical. If the integrity of the Contest cannot be restored, we may select winners from among all eligible entries received before we had to cancel, change or suspend the Contest.<\/p>\n\n\n\n<p>If you attempt or we have strong reason to believe that you have compromised the integrity or the legitimate operation of this Contest by cheating, hacking, creating a&nbsp;bot&nbsp;or other automated program, or by committing fraud in any way, we may seek damages from you to the full extent of the law and you may be banned from participation in future Microsoft promotions.<\/p>\n\n\n\n<p>GOVERNING LAW<\/p>\n\n\n\n<p>This Contest will be governed by the laws of the State of Washington, and you consent to the exclusive jurisdiction and venue of the courts of the State of Washington for any disputes arising out of this Contest.<\/p>\n\n\n\n<p>PRIVACY<\/p>\n\n\n\n<p>At Microsoft, we are committed to protecting your privacy. 
Microsoft uses the information you provide on this form to notify you of important information about our products, upgrades and enhancements, and to send you information about other Microsoft products and services. Microsoft will not share the information you provide with third parties without&nbsp;your permission except where necessary to complete the services or transactions you have requested, or as required by law. Microsoft is committed to protecting the security of your personal information. We use a variety of security technologies and procedures to help protect your personal information from unauthorized access, use, or disclosure. Your personal information is never shared outside the company without your permission, except under&nbsp;conditions&nbsp;explained above.<\/p>\n\n\n\n<p>If you believe that Microsoft has not adhered to this statement, please contact Microsoft by sending an email to\u202f<a href=\"mailto:privrc@microsoft.com\">privrc@microsoft.com<\/a>\u202for postal mail to Microsoft Privacy Response Center, Microsoft Corporation, One Microsoft Way, Redmond, WA.<\/p>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"heading-program-timeline\">Program timeline<\/h2>\n\n\n\n<p><em>Time zone for below dates is Anywhere on Earth<\/em>&nbsp;(AoE)<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Release of Dev Test set, Training dataset:<strong>&nbsp;<\/strong>Dec. 6, 2022<\/li>\n\n\n\n<li>Release of Baseline enhanced clips:<strong>&nbsp;<\/strong>Dec. 12, 2022<\/li>\n\n\n\n<li>Enhanced Dev Test set for complimentary subjective evaluation due: Jan. 24, 2023<\/li>\n\n\n\n<li>Results of complimentary subjective evaluation on enhanced Dev Test set: Jan. 27, 2023<\/li>\n\n\n\n<li>Release of Blind Test set:&nbsp;Feb. 6, 2023<\/li>\n\n\n\n<li>Enhanced Blind Test set for final subjective evaluation due:<strong>&nbsp;<\/strong>Feb. 8, 2023<\/li>\n\n\n\n<li>Results of final subjective evaluation on Enhanced Blind Test set:&nbsp;Feb. 13, 2023<\/li>\n\n\n\n<li>Grand Challenge 2-page Papers Due (by invitation only): Feb. 
20, 2023<\/li>\n\n\n\n<li>Grand Challenge 2-page Paper Acceptance Notification: March 7, 2023<\/li>\n\n\n\n<li>Camera-ready Grand Challenge 2-page Papers Due: March 14, 2023<\/li>\n\n\n\n<li>Grand Challenge OJ-SP Papers Due (by invitation only): August 9, 2023<\/li>\n<\/ul>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"heading-organizers\">Organizers<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Harishchandra Dubey, Microsoft, USA<\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ashkana\/\">Ashkan Aazami<\/a>, Microsoft, USA<\/li>\n\n\n\n<li>Vishak Gopal, Microsoft, USA<\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rcutler\/\">Ross Cutler<\/a>,&nbsp;Microsoft, USA<\/li>\n\n\n\n<li>Sergiy Matusevych, Microsoft, USA<\/li>\n\n\n\n<li>Sebastian Braun, Microsoft Research, Germany<\/li>\n\n\n\n<li>Emre Eskimez, Microsoft Research, USA<\/li>\n\n\n\n<li>Takuya Yoshioka, Microsoft Research, USA<\/li>\n\n\n\n<li>Hannes Gamper, Microsoft Research, USA<\/li>\n\n\n\n<li>Robert Aichner, Microsoft, USA<\/li>\n<\/ul>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"heading-related-links\">Related links<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/DNS-Challenge\/\" target=\"_blank\" rel=\"noopener noreferrer\">Training and test datasets<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;<\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/DNS-Challenge\" target=\"_blank\" rel=\"noopener noreferrer\">Data synthesizer and unit tests scripts<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;<\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/DNS-Challenge\/tree\/master\/DNSMOS\" target=\"_blank\" rel=\"noopener noreferrer\">DNSMOS Azure service<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"heading-other-challenges\">Other challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/audio-deep-packet-loss-concealment-challenge-interspeech-2022\/\">Packet Loss Concealment Challenge \u2013 INTERSPEECH 2022<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/acoustic-echo-cancellation-challenge-icassp-2022\/\">Acoustic Echo Cancellation Challenge \u2013 ICASSP 2022<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/acoustic-echo-cancellation-challenge-icassp-2021\/\">Acoustic Echo Cancellation Challenge \u2013 ICASSP 2021<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/deep-noise-suppression-challenge-icassp-2021\/\">Deep Noise Suppression Challenge \u2013 ICASSP 2021<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/acoustic-echo-cancellation-challenge-interspeech-2021\/\">Acoustic Echo Cancellation Challenge \u2013 INTERSPEECH 2021<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/deep-noise-suppression-challenge-interspeech-2020\/\">Deep Noise Suppression Challenge \u2013 INTERSPEECH 2020<\/a><\/li>\n\n\n\n<li><a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/deep-noise-suppression-challenge-icassp-2022\/\" target=\"_blank\" rel=\"noreferrer noopener\">Deep Noise Suppression Challenge &#8211; ICASSP 2022<\/a><\/li>\n<\/ul>\n\n\n\n\n\n<p><strong>Results: Personalized P.835 subjective evaluation for Track 1 &#8211; Headset.&nbsp;<\/strong>DMOS is difference of MOS between enhanced speech and noisy speech. Verified Real-time &#8216;Yes&#8217; means we verified it with enhanced NRT Testset.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"637\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results-1024x637.png\" alt=\"track1 final results\" class=\"wp-image-927720\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results-1024x637.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results-300x187.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results-768x478.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results-1536x955.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results-240x149.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_headset_final_results.png 1737w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Track 1- Headset results<\/figcaption><\/figure>\n\n\n\n<p><strong>ANOVA results for Track-1:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"231\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-1024x231.png\" alt=\"new anova track 1\" class=\"wp-image-932187\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-1024x231.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-300x68.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-768x174.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-1536x347.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-2048x463.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track1-240x54.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Results: Personalized P.835 subjective evaluation for Track 2 &#8211; Speakerphone.&nbsp;<\/strong>DMOS is difference of MOS between enhanced speech and noisy speech. 
Verified Real-time &#8216;Yes&#8217; means we verified it with the enhanced NRT testset.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"613\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results-1024x613.png\" alt=\"Track2 final results\" class=\"wp-image-927723\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results-1024x613.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results-300x180.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results-768x460.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results-1536x920.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results-240x144.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/website_speakerphone_final_results.png 1758w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Track 2 &#8211; Speakerphone results<\/figcaption><\/figure>\n\n\n\n<p><strong>ANOVA results for Track-2:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"246\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-1024x246.png\" alt=\"new anova Track 2\" class=\"wp-image-932193\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-1024x246.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-300x72.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-768x185.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-1536x370.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-2048x493.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/new_anova_track2-240x58.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"msr-opportunity-type":[187426],"msr-region":[256048],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-901248","msr-academic-program","type-msr-academic-program","status-publish","hentry","msr-opportunity-type-challenges","msr-region-global","msr-locale-en_us"],"msr_description":"","msr_social_media":[],"related-researchers":[{"type":"user_nicename","display_name":"Ashkan Aazami","user_id":41054,"people_section":"Section name 0","alias":"ashkana"},{"type":"guest","display_name":"Robert Aichner","user_id":902031,"people_section":"Section name 0","alias":""},{"type":"guest","display_name":"Sebastian Braun","user_id":902019,"people_section":"Section name 0","alias":""},{"type":"user_nicename","display_name":"Ross Cutler","user_id":40660,"people_section":"Section name 
0","alias":"rcutler"},{"type":"guest","display_name":"Hannes Gamper","user_id":902022,"people_section":"Section name 0","alias":""},{"type":"guest","display_name":"Mehrsa Golestaneh","user_id":902025,"people_section":"Section name 0","alias":""},{"type":"user_nicename","display_name":"Vishak Gopal","user_id":39624,"people_section":"Section name 0","alias":"vigopal"},{"type":"guest","display_name":"Babak Naderi","user_id":902016,"people_section":"Section name 0","alias":""}],"tab-content":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-academic-program\/901248","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-academic-program"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-academic-program"}],"version-history":[{"count":89,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-academic-program\/901248\/revisions"}],"predecessor-version":[{"id":1114602,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-academic-program\/901248\/revisions\/1114602"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=901248"}],"wp:term":[{"taxonomy":"msr-opportunity-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-opportunity-type?post=901248"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=901248"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=901248"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=901248"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=901248"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=901248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}