Generative Models of Discourse


July 12, 2007


Eugene Charniak


Brown University


Discourse is the study of how the meaning of a document is built out of the meanings of its sentences. As such, it is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem: given a news article, randomly permute the order of its sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can do this with increasing accuracy. Each individual model accounts for some aspect of the document and assigns a probability to the document's contents. In the standard generative way, subsequent models simply multiply the probabilities of the individual models to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization, and document generation.
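As a rough illustration of the setup described above, the following Python sketch combines two toy models by multiplying their probabilities (summing log-probabilities) and picks whichever sentence ordering scores higher. The two models, their smoothing constants, the pronoun list, and the example article are all hypothetical and are not the models presented in the talk; the permutation is fixed by hand rather than drawn at random so that the result is reproducible.

```python
import math

# Hypothetical pronoun list used by the toy opening-sentence model.
PRONOUNS = {"he", "she", "it", "they"}


def words(sentence):
    """Lowercased set of whitespace-separated tokens in a sentence."""
    return set(sentence.lower().split())


def overlap_log_prob(sentences):
    """Toy model 1: adjacent sentences that share words score higher.

    The ratio is smoothed so it always lies strictly between 0 and 1.
    It is an illustrative score, not a normalized distribution.
    """
    total = 0.0
    for prev, cur in zip(sentences, sentences[1:]):
        shared = words(prev) & words(cur)
        total += math.log((len(shared) + 1) / (len(words(cur)) + 2))
    return total


def opening_log_prob(sentences):
    """Toy model 2: a document is unlikely to open with a pronoun."""
    first_word = sentences[0].lower().split()[0]
    return math.log(0.1 if first_word in PRONOUNS else 0.9)


def combined_log_prob(sentences, models):
    """Multiply the individual models' probabilities (sum in log space)."""
    return sum(model(sentences) for model in models)


def pick_original(doc_a, doc_b, models):
    """Guess which of two orderings is the original: the more probable one."""
    score_a = combined_log_prob(doc_a, models)
    score_b = combined_log_prob(doc_b, models)
    return doc_a if score_a >= score_b else doc_b


MODELS = [overlap_log_prob, opening_log_prob]

# Made-up three-sentence "article" and two hand-chosen permutations of it.
original = [
    "The mayor opened the new bridge today",
    "The bridge links the two halves of the city",
    "It cost the city ten million dollars",
]
rotated = [original[1], original[2], original[0]]
reversed_doc = list(reversed(original))

guess = pick_original(original, rotated, MODELS)
```

On this toy article the combined score prefers the original ordering over both permutations. Working in log space avoids numerical underflow when many models, or many sentences, contribute small probabilities.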


Eugene Charniak

Eugene Charniak is University Professor of Computer Science and Cognitive Science at Brown University and past chair of the Department of Computer Science. He received his A.B. degree in Physics from the University of Chicago and a Ph.D. in Computer Science from M.I.T. He has published four books, the most recent being Statistical Language Learning. He is a Fellow of the American Association for Artificial Intelligence and was previously a Councilor of the organization. His research has always been in the area of language understanding or technologies that relate to it. Over the last 15 years he has been interested in statistical techniques for many areas of language processing, including parsing and, most recently, discourse.