Abstract

Machine learning requires an effective combination of data, features, and algorithms. While many tools exist for working with machine learning data and algorithms, support for thinking of new features, or feature ideation, remains poor. In this paper, we investigate two general approaches to support feature ideation: visual summaries and sets of errors. We present FeatureInsight, an interactive visual analytics tool for building new dictionary features (semantically related groups of words) for text classification problems. FeatureInsight supports an error-driven feature ideation process and provides interactive visual summaries of sets of misclassified documents. We conducted a controlled experiment evaluating both visual summaries and sets of errors in FeatureInsight. Our results show that visual summaries significantly improve feature ideation, especially in combination with sets of errors. Users preferred visual summaries over viewing raw data, and only preferred examining sets when visual summaries were provided. We discuss extensions of both approaches to data types other than text, and point to areas for future research.