Abstract

Because of speech recognition errors, repetition occurs frequently
in voice-search applications. While a proper treatment of this
phenomenon requires modeling two or more utterances jointly,
currently deployed systems typically treat each utterance
independently. In this paper, we
analyze the structure of repetitions and find that in at least one
commercial directory assistance application, repetitions follow
simple structural transformations more than 70% of the time.
Our preliminary results suggest that significant gains are
possible by explicitly modeling this structure in a joint decoding
process.
