Discriminatively Trained Spoken Document Similarity Models and Their Application to Probabilistic Latent Semantic Analysis

Kit Thambiratnam; Frank Seide; Roger (Peng) Yu

Proc. IEEE Workshop on Spoken Language Technology (SLT) | January 2006

Published by IEEE

This paper presents a novel framework for discriminatively training spoken document similarity models. Traditional similarity methods such as Vector Space Modeling and Probabilistic Latent Semantic Analysis suffer from a mismatch between the objective function they optimize during modeling and the one used for evaluation. This work proposes reconciling the mismatch by combining discriminative training with prior knowledge of document relationships to train an ensemble of spoken document similarity models. The reported experiments demonstrate dramatic improvements in mean average precision (mAP) on the tasks of related-document search and query-by-document retrieval, and highlight the ability of the resulting models to generalize better to unseen topics and unseen documents.

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/