Integrating Several Annotation Layers for Statistical Information Distillation

Michael Levit; Dilek Hakkani-Tür; Gokhan Tur; Daniel Gillick

Proc. of ASRU | December 2007

Published by IEEE - Institute of Electrical and Electronics Engineers

We present a sentence extraction algorithm for Information Distillation, a task in which, given a templated query, relevant passages must be extracted from massive audio and textual document sources. For each sentence of the relevant documents (which are assumed to be known from upstream stages), we employ statistical classification methods to estimate the extent of its relevance to the query. Two aspects of relevance are taken into account: the template (type) of the query and its slots (free-text descriptions of names, organizations, topics, events and so on, around which templates are centered). The distinguishing aspect of the presented method is its choice of features for classification. We extract our features from charts, which are compilations of elements from various annotation levels, such as word transcriptions, syntactic and semantic parses, and Information Extraction annotations. In our experiments we show that this integrated approach outperforms a purely lexical baseline by as much as 30% relative in terms of F-measure. We also investigate the algorithm’s behavior under noisy conditions by comparing its performance on ASR output and on the corresponding manual transcriptions.
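As a rough illustration of the idea described in the abstract, the sketch below pools features from several annotation layers of one sentence into a single feature map, and shows how a relative F-measure gain is computed. It is not the authors' implementation: the chart representation (one token-aligned label list per layer), the n-gram feature scheme, the function names, the `GPE` label, and all numbers are assumptions made only for this example.

```python
from typing import Dict, List

# Hypothetical chart: each annotation layer maps to a token-aligned list of labels.
Chart = Dict[str, List[str]]


def chart_features(chart: Chart, max_n: int = 2) -> Dict[str, int]:
    """Count n-grams (n = 1..max_n) inside each layer, prefixed by the layer name."""
    feats: Dict[str, int] = {}
    for layer, labels in chart.items():
        for n in range(1, max_n + 1):
            for i in range(len(labels) - n + 1):
                key = f"{layer}:{' '.join(labels[i:i + n])}"
                feats[key] = feats.get(key, 0) + 1
    return feats


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(baseline: float, system: float) -> float:
    """Relative improvement of `system` over `baseline` (0.3 means 30% relative)."""
    return (system - baseline) / baseline


# Illustrative sentence with a lexical layer and a named-entity layer (made-up data).
chart = {
    "words": ["protests", "in", "Paris"],
    "ne": ["O", "O", "GPE"],
}
features = chart_features(chart)
```

The resulting feature map could be passed to any statistical classifier that scores sentence relevance. With these helpers, a made-up baseline F-measure of 0.50 and a system F-measure of 0.65 correspond to a 30% relative gain; these values only illustrate the arithmetic and are not results from the paper.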

© IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.