Abstract

We study issues related to designing speech event detectors for
automatic speech recognition. Event detection is a critical
component of a recently proposed automatic speech attribute
transcription (ASAT) paradigm for speech research. As in
keyword spotting with non-keyword rejection, a good detector
needs to detect the speech attributes of interest effectively while
rejecting extraneous events. We compare frame-based and
segment-based detectors, study their properties in detecting manners of
articulation, and propose new performance measures. We test
these detectors on the TIMIT database with several evaluation
criteria. Our results indicate that segment-based detectors
outperform frame-based detectors in several key aspects of
speech detector design. We also show that detector performance
can be significantly enhanced by incorporating discriminative
training into the design of speech event detectors.
