Abstract

This paper explores the error-robustness of phone-to-word transduction across a variety of languages. It is motivated by recent literature advocating the decomposition of the decoding process into a phoneme recognition step followed by a word recovery step. We adopt this strategy and use a phone-to-word transducer for word recovery. Our decoding process requires only a one-best phone string from the first stage and uses an error model on phones to recover from mistakes in the input. We implement a noisy channel model, and by controlling the error level we are able to measure the sensitivity of different languages to degradation in the phonetic input stream. We carry this analysis further to measure the importance of each individual phone in each language. We study Arabic, Chinese, English, German and Spanish, and find that they behave similarly in this paradigm: in each case, a phone error produces about 1.4 word errors, and phones that are frequently incorrect matter slightly less than others. Even in the absence of phone errors, transduced word errors are still present, and we present an information-theoretic measure to explain the observed behavior.
