Candidate talk: Domain Adaptation with Structural Correspondence Learning
- John Blitzer | University of Pennsylvania
Statistical language processing tools are being applied to an ever-wider and more varied range of linguistic data. Researchers and engineers are using statistical models to organize and understand financial news, legal documents, biomedical abstracts, and weblog entries, among many other domains. Because language varies so widely, collecting and curating a training set for each domain is prohibitively expensive. At the same time, differences in vocabulary and writing style across domains can dramatically increase the error of state-of-the-art supervised models.
This talk describes structural correspondence learning (SCL), a method for adapting models from resource-rich source domains to resource-poor target domains. SCL uses unlabeled data from both domains to induce a common feature representation for domain adaptation. We demonstrate SCL empirically on the task of sentiment classification, where it decreases error due to adaptation by more than 40%. We also give a uniform convergence bound on the error of a classifier trained in one domain and tested in another. Our bound confirms the intuitive result that a good feature representation for domain adaptation is one that makes the domains appear similar while maintaining discriminative power.
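The abstract does not spell out the algorithm. SCL is usually described as follows. First, pick "pivot" features that are frequent in both domains. Then, on unlabeled data from both domains, train one linear predictor per pivot that guesses whether the pivot occurs using only the non-pivot features. Finally, stack those weight vectors and keep the top singular vectors as a shared low-dimensional projection. The NumPy code below is a minimal sketch under stated assumptions. Pivot selection is assumed to happen beforehand. Ridge-regularized least squares stands in for the loss used in the original work. The names `scl_projection`, `scl_features`, `h`, and `reg` are hypothetical.

```python
import numpy as np

def scl_projection(X_unlabeled, pivot_idx, h=2, reg=1e-2):
    """Hypothetical sketch of SCL's projection step.

    X_unlabeled: (n_docs, n_features) binary matrix pooled from BOTH domains.
    pivot_idx:   indices of pivot features (assumed already selected).
    h:           number of shared dimensions to keep.
    reg:         ridge penalty (assumption: least squares replaces the original loss).
    """
    n, d = X_unlabeled.shape
    non_pivot = np.setdiff1d(np.arange(d), pivot_idx)
    X = X_unlabeled[:, non_pivot].astype(float)

    # One linear predictor per pivot: predict "pivot present" (+1) vs. absent (-1)
    # from the non-pivot features only, so no pivot can trivially predict itself.
    gram = X.T @ X + reg * np.eye(X.shape[1])
    W = np.zeros((len(non_pivot), len(pivot_idx)))
    for j, p in enumerate(pivot_idx):
        y = np.where(X_unlabeled[:, p] > 0, 1.0, -1.0)
        W[:, j] = np.linalg.solve(gram, X.T @ y)

    # The top-h left singular vectors of the stacked weights span the shared subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    if h > U.shape[1]:
        raise ValueError("h exceeds the rank bound of the pivot weight matrix")
    theta = U[:, :h].T  # shape (h, n_non_pivot)
    return non_pivot, theta

def scl_features(X, non_pivot, theta):
    """Augment the original features with their projection onto the shared subspace."""
    return np.hstack([X, X[:, non_pivot] @ theta.T])
```

In this sketch, a classifier is trained on `scl_features` of labeled source data, and the same `non_pivot` and `theta` are reused to featurize target-domain inputs at test time. Because `theta` is learned only from unlabeled data, it can draw on target-domain text even when no target labels are available.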
Speaker Details
John Blitzer graduated from Cornell University with a degree in computer science. He is now completing his PhD under Fernando Pereira at the University of Pennsylvania. John’s research area is machine learning for natural language processing, with a primary focus on unsupervised dimensionality reduction of text. He has published papers demonstrating the effectiveness of dimensionality reduction for language modeling, sequence tagging, and sentiment classification. Recently, he has worked on empirical and theoretical analyses of low-dimensional representations for domain adaptation. To learn more about John and his research interests, please visit his homepage and read his research statement: http://www.cis.upenn.edu/~blitzer/ and http://www.cis.upenn.edu/~blitzer/blitzer_rstatement.pdf
Jeff Running