We study statistical learning problems where the data space is multimodal and heterogeneous, and constructing a single global predictor is difficult. We address such problems by iteratively identifying high-error regions in the data space and learning specialized predictors for those regions. While the idea of composing localized predictors is not new, our approach is unique in that we actively seek out predictors that clump errors together, making it easier to isolate the problematic regions. When errors are clumped together they are also easier to interpret and resolve through appropriate feature engineering and data preprocessing. We present an error-clumping classification algorithm based on a convex optimization problem, and an efficient stochastic optimization algorithm for this problem. We theoretically motivate our approach with a novel sample complexity analysis for piecewise predictors, and empirically demonstrate its behavior on an illustrative classification problem.