Incorporating Discourse and Syntactic Dependencies into Probabilistic Models for Summarization of Multiparty Speech

PhD Thesis, Columbia University

Automatic speech summarization can reduce the overhead of navigating through long streams of speech data by generating concise and readable textual summaries that capture the “aboutness” of speech recordings, such as meetings, talks, and lectures. In this thesis, we address the problem of summarizing meeting recordings, a task that faces many challenges not found with written texts and prepared speech. Informal style, speech errors, presence of many speakers, and apparent lack of coherent organization mean that the typical approaches used for text summarization are generally difficult to apply to speech. On the other hand, meetings provide a rich source of structural and pragmatic information that presents many research opportunities.

In this work, we illustrate how techniques that exploit automatically derived pragmatic and syntactic structures can help create summaries that are more on-target and readable than those produced by current state-of-the-art approaches. We present a conditional Markov random field model for summary sentence selection that exploits long-distance dependencies between pragmatically related utterances, such as between a question and its answer, or an offer and its acceptance. These dependencies are automatically inferred by a probabilistic model of interpersonal interaction, and are instrumental in achieving accuracies superior to those of competing approaches. We also investigate the problem of rendering errorful and disfluent meeting utterances into sentences that are readable, pertinent, and concise. We present a trainable syntax-directed sentence compression system that automatically removes information-poor phrases and clauses from the syntax tree of each input sentence.

A novel aspect of this sentence compression approach lies in its use of Markov synchronous context-free grammars, which provide a probabilistic decomposition of deletion rules into lexicosyntactic sub-structures more amenable to robust estimation. This sentence compression technique was applied effectively to both speech and written texts, and produced sentences that were judged more grammatical than those generated by previous approaches.