A Discriminative Training Framework Using N-Best Speech Recognition Transcriptions and Scores for Spoken Utterance Classification

  • Sibel Yaman
  • Li Deng
  • Dong Yu
  • Ye-Yi Wang
  • Alex Acero

Proc. of the International Conference on Acoustics, Speech and Signal Processing

Published by Institute of Electrical and Electronics Engineers, Inc.

In this paper, we propose a novel discriminative training approach to spoken utterance classification (SUC). The ultimate objective of the SUC task, originally developed to map a spoken utterance into the most appropriate semantic class, is to minimize the classification error rate (CER). Conventionally, a two-phase approach is adopted, in which the first phase is automatic speech recognition (ASR) transcription and the second phase is semantic classification. In the proposed framework, the classification error rate is approximated as a differentiable function of the language model and classifier model parameters. Furthermore, in order to exploit all the available information from the first phase, class-specific discriminant functions are defined based on score functions derived from the N-best lists. Our experimental results on the standard ATIS database show that the proposed framework reduces the CER from 4.92%, the earlier best result on the identical task, to 4.04%, a relative reduction of about 18%.
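
The abstract does not state the formulas, so the following is only a minimal sketch of how a classification error can be smoothed into a differentiable quantity using N-best scores. It assumes a minimum-classification-error-style sigmoid loss and a log-sum-exp combination of ASR and classifier log-scores; these choices, the function names, the `gamma` smoothing parameter, and the toy class labels are all illustrative assumptions, not the authors' exact formulation.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def discriminant(nbest, class_scores, label):
    """Class-specific discriminant built from the whole N-best list.

    nbest: list of (hypothesis_id, asr_log_score) pairs (assumed format).
    class_scores[label][hypothesis_id]: classifier log-score of `label`
    given that hypothesis (assumed format).
    """
    return log_sum_exp([asr + class_scores[label][hyp] for hyp, asr in nbest])


def smoothed_error(nbest, class_scores, true_label, gamma=1.0):
    """Sigmoid-smoothed 0/1 error: near 0 when the true class wins clearly,
    near 1 when a competing class wins clearly."""
    d_true = discriminant(nbest, class_scores, true_label)
    competitors = [
        discriminant(nbest, class_scores, label)
        for label in class_scores
        if label != true_label
    ]
    # Soft maximum over the competing classes keeps the loss differentiable.
    d_comp = log_sum_exp(competitors)
    misclassification = d_comp - d_true
    return 1.0 / (1.0 + math.exp(-gamma * misclassification))


# Toy 2-best list with hypothetical hypotheses and semantic classes.
nbest = [("h1", -1.0), ("h2", -2.0)]
class_scores = {
    "flight": {"h1": -0.1, "h2": -0.5},
    "fare": {"h1": -3.0, "h2": -2.0},
}
loss_correct = smoothed_error(nbest, class_scores, "flight")
loss_wrong = smoothed_error(nbest, class_scores, "fare")
```

Because the smoothed error is a composition of differentiable functions of the scores, its gradient with respect to whatever parameters produce those scores can be followed by a gradient-based optimizer. Averaging the loss over a training set gives a differentiable stand-in for the CER.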