Abstract

Checking transcription errors in a speech database is an important but tedious task that traditionally requires intensive manual labor. In [9], the Template Constrained Posterior (TCP) was proposed to automate the checking process by screening potentially erroneous sentences with a single context template. However, the single-template method is not robust and requires parameter optimization that still involves some manual work. In this work, we propose using multiple templates, an approach that is more robust and requires no development data for parameter optimization. By using TCP's multiple hypothesis sifting capabilities, which range from a well-defined, full context to a loosely defined context such as a wild card, the confidence for a focus unit can be measured at different expected accuracies. The joint verification by multiple TCPs improves the measured confidence of each unit in the transcription and is robust across different speech databases. Experimental results show that the checking process automatically separates erroneous sentences from correct ones: the sentence error hit rate decreases rapidly along the sorted TCP values, from 59% to 7% for the Mexican Spanish database and from 63% to 11% for the American English database, among the top 10% of sentences in the rank lists.
