Towards Spoken Term Discovery at Scale with Zero Resources

May 7, 2010
Kenneth Church | Johns Hopkins University

The spoken term discovery task takes speech as input and identifies terms of possible interest. The challenge is to perform this task efficiently on large amounts of speech with zero resources (no training data and no dictionaries), where we must fall back to more basic properties of language. We find that long (~1 s) repetitions tend to be contentful phrases (e.g. University of Pennsylvania) and propose an algorithm to search for these long repetitions without first recognizing the speech. To address efficiency concerns, we take advantage of (i) sparse feature representations and (ii) inherent low occurrence frequency of long content terms to achieve orders-of-magnitude speedup relative to the prior art. We frame our evaluation in the context of spoken document information retrieval, and demonstrate our method’s competence at identifying repeated terms in conversational telephone speech.
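
To make the idea of searching for long repetitions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm presented in the talk. It assumes the speech has already been reduced to one discrete label per frame (a hypothetical stand-in for a sparse feature representation), and the names `find_long_repeats`, `frames` and `min_len` are invented for this example. An inverted index finds the frame pairs that match, and runs of consecutive matches at the same time offset are reported as repetitions.

```python
from collections import defaultdict


def find_long_repeats(frames, min_len):
    """Return sorted (start_a, start_b, length) for repeated runs of >= min_len frames.

    `frames` is a sequence of hashable per-frame labels (an assumption of this
    sketch; real acoustic features would need to be quantized first).
    """
    # Inverted index: label -> positions where it occurs. Only frames that
    # share a label are ever compared, which is where sparsity helps.
    positions = defaultdict(list)
    for i, label in enumerate(frames):
        positions[label].append(i)

    # Group each matching pair (i, j) with i < j by its offset j - i.
    # A repetition shows up as consecutive values of i at one offset.
    diagonals = defaultdict(list)
    for occ in positions.values():
        for a in range(len(occ)):
            for b in range(a + 1, len(occ)):
                diagonals[occ[b] - occ[a]].append(occ[a])

    repeats = []
    for offset, starts in diagonals.items():
        starts.sort()
        run_start = prev = starts[0]
        # The trailing None flushes the final run.
        for i in starts[1:] + [None]:
            if i is not None and i == prev + 1:
                prev = i
                continue
            length = prev - run_start + 1
            if length >= min_len:
                repeats.append((run_start, run_start + offset, length))
            if i is not None:
                run_start = prev = i
    return sorted(repeats)


# Toy input: the run A B C D occurs twice, 6 frames apart.
toy = ["q1", "A", "B", "C", "D", "q2", "q3", "A", "B", "C", "D", "q4"]
result = find_long_repeats(toy, min_len=3)
```

The sketch requires exact label matches and does not filter self-overlapping runs. Its cost grows with the number of matching frame pairs, so it stays cheap only when labels are sparse and long repeats are rare, which are the two properties the abstract relies on.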

Speaker Details

I am the head of a data mining department at AT&T Labs-Research. I received my BS, Masters and PhD from MIT in computer science in 1978, 1980 and 1983, and immediately joined AT&T Bell Labs, where I have been ever since (though the name of the organization has changed). I have worked in many areas of computational linguistics, including acoustics, speech recognition, speech synthesis, OCR, phonetics, phonology, morphology, word-sense disambiguation, spelling correction, terminology, translation, lexicography, information retrieval, compression, language modeling and text analysis. I enjoy working with very large corpora such as the Associated Press newswire (1 million words per week). My data mining department is currently applying similar methods to much larger data sets such as telephone call detail (1-10 billion records per month).