Interspeech 2018 Special Session: Low Resource Speech Recognition Challenge for Indian Languages

Challenge Overview

In keeping with the Interspeech 2018 theme of ‘Speech Research for Emerging Markets in Multilingual Societies’, we are organizing a special session and challenge on speech recognition for low resource languages. Most languages in the world lack the amount of text, speech and linguistic resources required to build large Deep Neural Network (DNN)-based models. However, there have been many advances in DNN architectures, in cross-lingual and multilingual speech processing techniques, and in approaches that incorporate linguistic knowledge into machine learning-based models, all of which can help in building systems for low resource languages. In this challenge, we will focus on building Automatic Speech Recognition (ASR) systems for Indian languages with constraints on the data available for Acoustic Modeling and Language Modeling.

India has around 1500 languages, of which 22 have been given official language status by the Government of India. According to the 2001 census, 29 Indian languages have more than a million speakers. Most of these languages, with the exception of Hindi, are low resource. Many of them do not have a written script, and speech technology solutions would therefore greatly benefit these communities. To be able to truly support speech and language systems that can be used by everyone in the country, we need to come up with techniques to build systems in these resource-constrained settings, while also exploiting the unique properties of, and similarities between, Indian languages.

We are releasing data in Telugu, Tamil and Gujarati, and participants in this challenge will be required to use only the released data to build ASR systems in these languages. This makes the task fair for all participants and directs the focus of the work to the low resource setting. However, we will not restrict participants to working on any single component of the ASR pipeline – participants are free to innovate in any aspect of the ASR system as long as they use only the data provided. We will release a baseline system that participants can compare their systems against and use as a starting point. During testing, we will release a held-out blind test set on which the systems will be evaluated.

Contact us: interspeech2018@microsoft.com

Challenge Timeline

January 2, 2018 – Registration for the challenge opens, training data released

January 15, 2018 – Release baseline recipe

March 6, 2018 – Test portal opens at 10 am IST (Indian Standard Time, GMT+05:30); test audio released

March 9, 2018 – Test deadline for competition at 5 pm IST (Indian Standard Time, GMT+05:30); up to 3 hypotheses files per language

March 16, 2018 – Abstract submission deadline

March 23, 2018 – Interspeech final paper upload deadline

June 17, 2018 – Camera ready paper deadline

Note: While submitting the paper, please make sure you select the special session “Low Resource Speech Recognition Challenge for Indian Languages”.

Challenge Rules

1. To participate in the Challenge, you must register and consent to the agreement at the “Register” page and download the data. Participants may not share the data with any person or organization without Microsoft’s prior written consent.

2. Participants who register but do not submit a system to the Challenge are considered withdrawn from the Challenge and are required to delete and purge all data copies.

3. To qualify for the Challenge, Participants must submit a system built according to the following guidelines:

  • Participants may only use the audio and transcriptions provided to build their systems.
  • Participants may choose to use only the corresponding language’s data to build each system, or combine the data across languages and use it cross-lingually.
  • Participants may build systems for any number of languages, even if all of the systems use the combined data.
  • The systems submitted are expected to beat the baseline system in terms of WER; however, innovative systems that come close to the baseline may also be considered.
  • Only the audio for the blind test set (5 hours) will be released. Participants are expected to run their systems on the blind test set.
  • Participants must submit the following items to Microsoft for evaluation: (1) the ASR hypotheses; (2) the final ASR model; and (3) the research paper, so that Microsoft can reproduce the hypotheses on the blind test set.

4. Participants who register and submit systems to the Challenge may use the data in the future solely for research purposes. Participants should provide the following attribution when they publish their findings: “Data provided by SpeechOcean.com and Microsoft”. Data may not be used for commercial purposes.

If you have any questions, please write to interspeech2018@microsoft.com

Registration

Registration for the challenge is now closed.

Baselines

Baselines are built using Kaldi. Please see this README file for instructions on how to replicate the baselines and for links to lexicons for all languages.

The word error rates (WER, %) of the baseline systems for all three languages are given below:

Language   GMM-HMM   DNN     TDNN
Tamil      33.55     25.47   19.45
Telugu     40.12     34.97   22.61
Gujarati   23.78     27.79   19.76
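For a quick sanity check of hypothesis files before submission, WER can also be computed directly from a reference transcript and a hypothesis. The sketch below is a minimal, standalone Python illustration of word-level edit distance; it is not the official scoring pipeline (the baselines and evaluation use Kaldi’s standard scripts), and the toy strings in the example are hypothetical.

# Minimal WER computation via word-level Levenshtein distance.
# Illustrative only: the challenge scoring uses Kaldi's standard scripts.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return 100.0 * d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # Toy example (not challenge data): one deleted word out of four -> 25% WER.
    print(f"WER: {wer('this is a test', 'this is test'):.2f}%")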

Leaderboard

Language: Gujarati

Models submitted: 40

Number of teams: 18

Team Name   Word Error Rate
Jilebi      14.06%, 14.70%, 15.04%
Cogknit     17.69%
ISI-Billa   19.31%


Language: Tamil

Models submitted: 36

Number of teams: 14

Team Name    Word Error Rate
Jilebi       13.92%, 14.08%, 14.27%
Cogknit      16.07%
CSALT-LEAP   16.32%


Language: Telugu

Models submitted: 33

Number of teams: 18

Team Name    Word Error Rate
Jilebi       14.71%, 14.86%, 15.07%
Cogknit      17.14%
CSALT-LEAP   17.59%


Note: Final winners will be determined after verification and replication of results.