Modern machine learning applications require large models, lots of data, and complicated optimization. I will discuss scaling machine learning by decomposing learning problems into simpler sub-problems. These decompositions allow us to trade off accuracy, computational complexity, and potential for parallelization, where a small sacrifice in one can mean a big gain in another. Moreover, we can tailor our decomposition to our model and data in order to optimize these trade-offs.
I will present two examples. First, I will discuss parallel optimization for sparse regression. Our Shotgun algorithm parallelizes coordinate descent, a seemingly sequential method. Shotgun theoretically achieves near-linear speedups and empirically is one of the fastest methods for multicore sparse regression. Second, I will discuss parameter learning for Probabilistic Graphical Models. Using structured composite likelihood, we prove finite sample complexity bounds for general Markov and Conditional Random Fields, and our bounds indicate how to choose estimator structure in practice. In both examples, our analysis provides strong theoretical guarantees which guide our very practical implementations.