On Learning Parsimonious Models for Extracting Consumer Opinions

Xue Bai; Rema Padman; Edoardo Airoldi

Proceedings of HICSS-05, the 38th Annual Hawaii International Conference on System Sciences, January 2005

Published by IEEE Computer Society

Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine online opinions from the Internet and learn customers' preferences for economic or marketing research, or to gain a strategic advantage. In this paper, we propose a two-stage Bayesian algorithm that captures the dependencies among words and, at the same time, finds a vocabulary that is efficient for extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set and yields better predictions of sentiment orientation than several state-of-the-art machine learning methods. Our findings suggest that sentiments are captured by conditional dependence relations among words, rather than by keywords or high-frequency words.
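The closing claim, that sentiment is carried by conditional dependence among words rather than by individual keywords, can be made concrete with a toy calculation. The sketch below is an illustration only, not the authors' two-stage Bayesian algorithm. It estimates empirical mutual information and conditional mutual information over binary word-presence features. The four-review corpus, the feature names, and the helper functions are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) = sum over z of P(z) * I(X; Y | Z = z), in bits."""
    n = len(zs)
    cmi = 0.0
    for z, cz in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        cmi += (cz / n) * mutual_information([xs[i] for i in idx],
                                             [ys[i] for i in idx])
    return cmi


# Hypothetical toy corpus: (contains "not", contains "good", label), 1 = positive.
corpus = [
    (0, 1, 1),  # "a good film"        -> positive
    (1, 1, 0),  # "not a good film"    -> negative
    (0, 0, 0),  # "a boring film"      -> negative
    (1, 0, 1),  # "not a boring film"  -> positive
]
has_not = [d[0] for d in corpus]
has_good = [d[1] for d in corpus]
label = [d[2] for d in corpus]

print(mutual_information(has_good, label))                       # 0.0
print(conditional_mutual_information(has_good, label, has_not))  # 1.0
```

On its own, the word "good" carries no information about the label in this toy corpus. Once the presence of "not" is known, however, it fully determines the label. A vocabulary chosen by keyword or frequency alone would miss this kind of dependence.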