Creating Diverse Ensemble Classifiers to Reduce Supervision

  • Prem Melville | University of Texas at Austin

For many predictive modeling tasks, acquiring supervised training data for building accurate classifiers is difficult or expensive. Training data may be limited, or additional data may be obtainable only at a cost. We study the problem of learning with reduced supervision in three settings. First, in the pure supervised learning setting, we try to maximize the utility of small datasets. Second, in the traditional active learning setting, a large pool of unlabeled examples is available and the learner selects which examples to have labeled. Third, in the setting of active feature-value acquisition, the data contain missing feature values that may be acquired at a cost. For all three settings, we present methods that learn more accurate models at lower data-acquisition cost. Our methods are based on a new technique for building a diverse ensemble of classifiers using specially constructed artificial training examples.
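To make the idea concrete, here is a minimal sketch of this kind of ensemble-construction loop in Python with scikit-learn. The function name `build_decorate`, the per-feature Gaussian model for generating artificial examples, and the hyperparameters are illustrative assumptions, not the exact published algorithm; the core idea shown is relabeling artificial examples in opposition to the current ensemble so each new member is pushed toward diversity.

```python
# Sketch of diverse-ensemble construction via artificial training examples.
# Assumes numeric features and non-negative integer class labels.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def ensemble_predict(ensemble, X):
    """Majority vote over the current committee."""
    votes = np.stack([m.predict(X) for m in ensemble]).astype(int)
    # Per-example mode across committee members (ties -> lowest label).
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

def build_decorate(X, y, n_members=10, max_trials=20, r_art=1.0,
                   base=DecisionTreeClassifier(), seed=None):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    ensemble = [clone(base).fit(X, y)]
    err = np.mean(ensemble_predict(ensemble, X) != y)
    trials = 0
    while len(ensemble) < n_members and trials < max_trials:
        trials += 1
        # Artificial examples drawn from a per-feature Gaussian model of X.
        n_art = int(r_art * len(X))
        X_art = rng.normal(X.mean(axis=0), X.std(axis=0) + 1e-9,
                           size=(n_art, X.shape[1]))
        # Oppositional labels: pick labels the ensemble does NOT predict,
        # forcing the new member to disagree with the committee.
        pred_art = ensemble_predict(ensemble, X_art)
        y_art = np.array([rng.choice(classes[classes != p]) for p in pred_art])
        # Train a candidate on the union of real and artificial data.
        candidate = clone(base).fit(np.vstack([X, X_art]),
                                    np.concatenate([y, y_art]))
        ensemble.append(candidate)
        new_err = np.mean(ensemble_predict(ensemble, X) != y)
        if new_err <= err:
            err = new_err       # keep: diversity gained without hurting error
        else:
            ensemble.pop()      # reject: candidate raised ensemble error
    return ensemble
```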

Experiments demonstrate that our method, DECORATE, consistently outperforms bagging, boosting, and Random Forests when training data is limited. We also show that DECORATE can be very effective for the tasks of active learning and active feature-value acquisition.
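The same committee lends itself to active learning by disagreement: examples on which the ensemble members split their votes most evenly are the most informative to label. The margin-based utility below is an illustrative sketch of that idea, not necessarily the exact selection measure used in this work.

```python
# Pool-based selection sketch: query the unlabeled example with the
# smallest vote margin among committee members (most contested).
import numpy as np

def select_query(ensemble, X_pool):
    votes = np.stack([m.predict(X_pool) for m in ensemble]).astype(int)
    margins = []
    for col in votes.T:                 # one column per pool example
        top = np.sort(np.bincount(col))[::-1]
        # Gap between the two most-voted labels (0 if unanimous on one label).
        margins.append(top[0] - (top[1] if len(top) > 1 else 0))
    return int(np.argmin(margins))      # index of the example to label next
```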

Speaker Details

Prem Melville is a Ph.D. candidate at the University of Texas at Austin under the supervision of Dr. Raymond Mooney. His research interests lie in machine learning and data mining; more specifically, he has worked on ensemble methods, active learning, active feature-value acquisition, recommender systems, and class probability estimation.
