Computing Representations of the Structure of Written Discourse

  • Simon Corston-Oliver

MSR-TR-98-15


RASTA (Rhetorical Structure Theory Analyzer), a discourse analysis component within the Microsoft English Grammar, efficiently computes representations of the structure of written discourse using cue phrases and additional information available in syntactic and logical form analyses of a text. RASTA heuristically scores the rhetorical relations that it hypothesizes, using those scores to guide it in producing more plausible discourse representations before less plausible ones. The heuristic scores also provide a genre-independent method for evaluating competing discourse analyses: the best discourse analyses are those constructed from the strongest hypotheses. This dissertation describes in detail a set of linguistic cues that can be identified in a text as evidence of discourse relations, and gives complete and explicit algorithms for identifying the terminal nodes of a discourse analysis and for efficiently combining those terminal nodes to form hierarchical representations of discourse structure.
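The abstract does not spell out RASTA's combination algorithm. What follows is only a minimal Python sketch of the general idea of ranking discourse analyses by the strength of their hypotheses, and it rests on assumptions of our own:

- each hypothesis links two terminal nodes (by index) and carries a relation name and a heuristic score;
- two adjacent spans may be joined only by a hypothesis whose first node lies in the left span and whose second node lies in the right span;
- an analysis scores the sum of the hypotheses it uses;
- nuclearity and relation direction are ignored.

The names `Hypothesis` and `analyses` are hypothetical. RASTA is described as efficient, but this sketch is not: it enumerates every licensed tree exhaustively, which is exponential in the number of terminals, and then sorts the results.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import List, Tuple, Union

# Hypothetical representation: a tree is either a terminal index or
# (relation, left_subtree, right_subtree).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


@dataclass(frozen=True)
class Hypothesis:
    left: int        # index of the earlier terminal node
    right: int       # index of the later terminal node
    relation: str    # hypothesized rhetorical relation
    score: float     # heuristic strength (assumed: higher is better)


def analyses(n: int, hypotheses: List[Hypothesis]) -> List[Tuple[float, Tree]]:
    """Return every binary tree over terminals 0..n-1 that the hypotheses
    license, paired with its summed score, strongest first (exhaustive)."""

    @lru_cache(maxsize=None)
    def build(i: int, j: int) -> List[Tuple[float, Tree]]:
        if i == j:
            return [(0.0, i)]  # a terminal node contributes no score
        results: List[Tuple[float, Tree]] = []
        for k in range(i, j):  # split into spans [i, k] and [k+1, j]
            lefts, rights = build(i, k), build(k + 1, j)
            # A hypothesis may join the spans only if it crosses the split.
            links = [h for h in hypotheses if i <= h.left <= k < h.right <= j]
            for h in links:
                for left_score, left_tree in lefts:
                    for right_score, right_tree in rights:
                        total = left_score + right_score + h.score
                        results.append((total, (h.relation, left_tree, right_tree)))
        return results

    # Sort on score only; the stable sort keeps generation order among ties.
    return sorted(build(0, n - 1), key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    hyps = [
        Hypothesis(0, 1, "Elaboration", 2.0),
        Hypothesis(1, 2, "Contrast", 3.0),
        Hypothesis(0, 2, "Sequence", 1.0),
    ]
    for score, tree in analyses(3, hyps):
        print(score, tree)
```

In this example, the two trees built from Elaboration and Contrast tie at 5.0 and rank ahead of the trees that use the weaker Sequence hypothesis (4.0 and 3.0). Summing scores is only one simple choice. It cannot distinguish trees built from the same set of hypotheses, so a fuller scorer would need an additional criterion.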