Resource acquisition via an unsupervised WSD system

December 3, 2004
Mona T Diab | Stanford University

In the field of word sense disambiguation (WSD), it has been established that supervised systems (systems that rely on sense-labeled data) perform significantly better than unsupervised systems. Unfortunately, acquiring manually sense-annotated data has proven to be a tedious, expensive task. In this talk, I will show how I use my unsupervised system, SALAAM, to acquire high-quality sense-annotated data for training supervised WSD systems. The approach could effectively reduce manual annotation by at least 40%. In this portion of the talk, I will characterize some crucial features for identifying which automatically annotated data could be useful and which data still require hand labeling. Another use for SALAAM lies in its ability to bootstrap resources for different languages. I will illustrate how I build an Arabic WordNet with relatively high accuracy, based on human ratings and judgments. The results obtained are very much in agreement with results reported by the builders of EuroWordNet.

Speaker Details

Mona Diab is currently a postdoctoral scholar at Stanford University, working with Dan Jurafsky and Chris Manning on Arabic language processing. She was a research associate at the Center for Spoken Language Research, University of Colorado, Boulder, from July to December 2003. Mona received her PhD in August 2003 from the University of Maryland, College Park, under the supervision of Philip Resnik.