Bayesian Methods for Unsupervised Language Learning

Date

July 16, 2007

Speaker

Sharon Goldwater

Affiliation

Stanford

Overview

Unsupervised learning of linguistic structure is a difficult task. Frequently, standard techniques such as maximum-likelihood estimation yield poor results or are simply inappropriate (as when the class of models under consideration includes models of varying complexity). In this talk, I discuss how Bayesian statistical methods can be applied to the problem of unsupervised language learning to develop principled model-based systems and improve results. I first present some work on word segmentation, showing that maximum-likelihood estimation is inappropriate for this task and discussing a nonparametric Bayesian modeling solution. I then argue, using part-of-speech tagging as an example, that a Bayesian approach provides advantages even when maximum-likelihood (or maximum a posteriori) estimation is possible. I conclude by discussing some of the challenges that remain in pursuing a Bayesian approach to language learning.
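
To make the first point concrete, here is a minimal sketch (not from the talk; the toy corpus and function name are illustrative) of why maximum-likelihood estimation degenerates on word segmentation: under a unigram model, memorizing each whole utterance as a single "word" always scores at least as well as any genuine segmentation, because splitting an utterance multiplies in extra word probabilities that are at most 1.

    from collections import Counter
    from math import log

    # Toy corpus of unsegmented utterances (illustrative only).
    corpus = ["thedog", "thecat", "thedogbarks"]

    def log_likelihood(segmented_corpus):
        """Unigram log-likelihood with MLE (relative-frequency) word probabilities."""
        tokens = [w for utt in segmented_corpus for w in utt]
        counts = Counter(tokens)
        total = sum(counts.values())
        return sum(n * log(n / total) for n in counts.values())

    # (a) Degenerate solution: treat every utterance as a single word.
    degenerate = [[utt] for utt in corpus]

    # (b) The intended segmentation.
    intended = [["the", "dog"], ["the", "cat"], ["the", "dog", "barks"]]

    print(f"whole-utterance 'words': {log_likelihood(degenerate):.3f}")  # about -3.296
    print(f"intended segmentation:  {log_likelihood(intended):.3f}")     # about -8.941

Since the unsegmented "lexicon" gets the higher likelihood, MLE prefers it, and no amount of data fixes this. Roughly speaking, the nonparametric Bayesian solution discussed in the talk avoids the degeneracy by placing a prior (such as a Dirichlet process) that favors a small lexicon of frequently reused words, so memorizing every utterance is no longer optimal.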

Speakers

Sharon Goldwater

Sharon Goldwater is a postdoctoral scholar in the linguistics department at Stanford University, where she works with Dan Jurafsky, Chris Manning, and others in the Stanford natural language processing group. Her research focuses on unsupervised learning and computational modeling of language acquisition, particularly phonology and morphology. She completed her master’s degree in computer science in 2005 and her Ph.D. in linguistics in 2006, both from Brown University. Prior to graduate school, she worked as a researcher in the Artificial Intelligence Center at SRI International. In addition to her work on unsupervised language learning, she has published papers on machine translation, statistical parsing, and human-computer dialogue systems.