The IBM 2004 Conversational Telephony System for Rich Transcription

  • Hagen Soltau
  • Brian Kingsbury
  • Lidia Mangu
  • Daniel Povey
  • George Saon
  • Geoffrey Zweig

Proceedings of ICASSP

This paper describes the technical advances in IBM’s conversational telephony submission to the DARPA-sponsored 2004 Rich Transcription evaluation (RT-04). These advances include a system architecture based on cross-adaptation; a new form of feature-based MPE training; the use of a full-scale, discriminatively trained, full-covariance Gaussian system; the use of septaphone cross-word acoustic context in static decoding graphs; and the incorporation of 2100 hours of training data in every system component. On the 2003 test set, these advances reduced the error rate by approximately 21% relative to the best-performing system in last year’s evaluation, and they produced the best results on the RT-04 current and progress CTS data.
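
As a reading aid for the "21% relative" figure, the short sketch below shows how a relative error-rate reduction is usually computed. The function name `relative_reduction` and both error rates are hypothetical, chosen only so that the arithmetic comes out near 21%; they are not results reported in the paper.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the error-rate reduction as a fraction of the baseline error rate."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical word error rates in percent (illustrative only, not from the paper).
baseline_wer = 20.0
new_wer = 15.8

print(f"{relative_reduction(baseline_wer, new_wer):.0%} relative reduction")
```

With these made-up numbers the absolute reduction is 4.2 points, while the relative reduction is about 21%. The abstract's figure is of the relative kind.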