Speech Recognition with Segmental Conditional Random Fields: A Summary of the JHU CLSP 2010 Summer Workshop

  • Geoffrey Zweig
  • Patrick Nguyen
  • Dirk Van Compernolle
  • Kris Demuynck
  • Les Atlas
  • Pascal Clark
  • Greg Sell
  • Meihong Wang
  • Fei Sha
  • Hynek Hermansky
  • Damianos Karakos
  • Aren Jansen
  • Samuel Thomas
  • Samuel Bowman
  • Justine Kao
  • G.S.V.S. Sivaram

ICASSP 2011

Published by IEEE

This paper summarizes the 2010 CLSP Summer Workshop on speech recognition at Johns Hopkins University. The key theme of the workshop was to improve on state-of-the-art speech recognition systems by using Segmental Conditional Random Fields (SCRFs) to integrate multiple types of information. The approach starts from a state-of-the-art baseline system and adds a suite of novel features, including ones derived from acoustic templates, deep neural network phoneme detections, duration models, modulation features, and whole-word point-process models. The SCRF framework weights these information sources appropriately, producing significant gains on both the Broadcast News and Wall Street Journal tasks.
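
The way a log-linear model such as an SCRF weights several information sources can be illustrated with a minimal Python sketch. It is not the workshop's implementation. The feature names (`baseline`, `template`, `duration`), the weight values, and the helper functions are hypothetical, and the normalization runs over a short explicit list of hypotheses, whereas an SCRF normalizes over all segmentations allowed by its constraints.

```python
import math

def segment_score(features, weights):
    """Weighted sum of the feature values for one hypothesized word segment."""
    return sum(weights[name] * value for name, value in features.items())

def hypothesis_posteriors(hypotheses, weights):
    """Log-linear posterior over a small list of competing segmentations.

    Each hypothesis is a list of segments; each segment is a dict of
    feature values. Simplification: only the listed hypotheses are normalized.
    """
    totals = [sum(segment_score(seg, weights) for seg in hyp) for hyp in hypotheses]
    peak = max(totals)  # subtract the maximum for numerical stability
    exps = [math.exp(total - peak) for total in totals]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical weights; in practice these would be learned from training data.
weights = {"baseline": 1.0, "template": 0.5, "duration": 0.25}

# Two competing segmentations of the same audio, with made-up feature values.
hyp_a = [
    {"baseline": 2.0, "template": 1.0, "duration": 0.0},
    {"baseline": 1.0, "template": 0.0, "duration": 2.0},
]
hyp_b = [
    {"baseline": 1.0, "template": 0.0, "duration": 0.0},
]

posteriors = hypothesis_posteriors([hyp_a, hyp_b], weights)
```

Each information source contributes one feature per segment, so adding a new source amounts to adding a key to the feature dictionaries and a corresponding weight.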