Multi-rate neural networks for efficient acoustic modeling
- Vijay Peddinti | Johns Hopkins University
In sequence recognition, long-span dependencies in input sequences are typically handled with recurrent neural network architectures, and robustness to sequential distortions is achieved by training on data representative of a variety of these distortions. However, both of these solutions substantially increase training time, so low computational complexity during training is critical for acoustic modeling. This talk proposes the use of multi-rate neural network architectures to meet this requirement for computational efficiency. In these architectures the network is partitioned into groups of units that operate at various sampling rates. Because the network evaluates certain groups only once every few time steps, the computational cost is reduced. This talk will focus on the multi-rate feed-forward convolutional architecture. It will present results on several large vocabulary continuous speech recognition (LVCSR) tasks, with training data ranging from 3 to 1800 hours, to show the effectiveness of this architecture in efficiently learning wider temporal dependencies in both small and large data scenarios. Further, it will discuss the use of this architecture for robust acoustic modeling in far-field environments. This model was shown to provide state-of-the-art results in the ASpIRE far-field recognition challenge. The talk will also discuss some preliminary results on acoustic models based on multi-rate recurrent neural networks.
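As a rough illustration of why evaluating some groups only once every few time steps reduces cost, the sketch below counts unit evaluations over a sequence. It is not the architecture from the talk: the group sizes, the evaluation periods, the sequence length, and the function name `evaluations` are all assumed values chosen for illustration, and the count of unit evaluations is only a coarse proxy for actual computation.

```python
# Hypothetical illustration: count how many unit evaluations a network needs
# when its groups of units run at different rates. All numbers are assumptions.

def evaluations(num_steps, groups):
    """Return the total number of unit evaluations over num_steps time steps.

    groups is a list of (num_units, period) pairs; a group with period p is
    evaluated only at time steps t where t % p == 0.
    """
    total = 0
    for num_units, period in groups:
        # Number of t in [0, num_steps) with t % period == 0.
        evaluated_steps = (num_steps + period - 1) // period
        total += num_units * evaluated_steps
    return total


num_steps = 300                               # assumed sequence length
multi_rate = [(512, 1), (512, 3), (512, 9)]   # assumed (units, period) groups
full_rate = [(units, 1) for units, _ in multi_rate]  # same groups, every step

cost_multi = evaluations(num_steps, multi_rate)
cost_full = evaluations(num_steps, full_rate)
print(f"multi-rate: {cost_multi}, full-rate: {cost_full}, "
      f"ratio: {cost_multi / cost_full:.2f}")
```

With these assumed settings, the slower groups contribute far fewer evaluations than they would at the full rate, which is the source of the savings the abstract describes.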
Speaker Details
Vijayaditya Peddinti is a PhD candidate in the Center for Language and Speech Processing at the Johns Hopkins University. He is working with Dan Povey and Sanjeev Khudanpur on his thesis. His research focuses on acoustic modeling for automatic speech recognition. He is a recipient of the Fred Jelinek Fellowship and the best student paper award at Interspeech 2015. He was part of JHU’s four-member team, which was one of the winners of the ASpIRE far-field recognition challenge held by IARPA. He has done internships at IBM T.J. Watson Research Center and Microsoft Research. Website: vijaypeddinti.com
Jeff Running
Series: Microsoft Research Talks