Distributed Newton Methods for CTR (Click Through Rate) Prediction


February 20, 2014


Chih-Jen Lin


National Taiwan University


CTR (Click Through Rate) prediction is extremely important for Internet advertising. Users' impression and click logs pose two major challenges. First, the data collected in just a few days contain billions of instances or more. Second, the number of positive instances (i.e., clicks) is relatively small, so the data set is highly unbalanced. We develop a distributed Newton method for training very large-scale logistic regression. We use real data to analyze the scalability of our method, the relationship between test accuracy and data size, the workflow of big-data experiments, and the various tools for implementing big-data machine learning packages.
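
The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of one common way a Newton method for logistic regression can be distributed. It assumes an instance-wise data partition, L2 regularization with a weight C, labels in {-1, +1}, and a conjugate gradient solver for the Newton step; all function names and parameters are hypothetical. The point it illustrates is that the loss, gradient, and Hessian-vector product are sums over partitions, so each node only needs its own data and a sum across nodes.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

# Per-partition pieces (assumption: each would run on one node).
def local_loss(w, X, y):
    return np.sum(np.logaddexp(0.0, -y * (X @ w)))

def local_grad(w, X, y):
    return X.T @ ((sigmoid(y * (X @ w)) - 1.0) * y)

def local_hess_vec(w, X, y, v):
    s = sigmoid(y * (X @ w))
    return X.T @ (s * (1.0 - s) * (X @ v))

def conjugate_gradient(hess_vec, g, tol=1e-10, max_iter=100):
    # Approximately solve H s = -g using only Hessian-vector products.
    s = np.zeros_like(g)
    r = -g.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol:
            break
        Hp = hess_vec(p)
        alpha = rr / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return s

def distributed_newton(partitions, dim, C=1.0, iters=50, eps=1e-8):
    # Minimize 0.5 * w^T w + C * sum_i log(1 + exp(-y_i w^T x_i)).
    # Each sum over partitions stands in for an all-reduce across nodes.
    def f(w):
        return 0.5 * (w @ w) + C * sum(local_loss(w, X, y) for X, y in partitions)

    w = np.zeros(dim)
    for _ in range(iters):
        g = w + C * sum(local_grad(w, X, y) for X, y in partitions)
        if np.linalg.norm(g) < eps:
            break

        def hv(v, w=w):
            return v + C * sum(local_hess_vec(w, X, y, v) for X, y in partitions)

        s = conjugate_gradient(hv, g)
        # Backtracking line search to guarantee sufficient decrease.
        t, fw, gs = 1.0, f(w), g @ s
        while f(w + t * s) > fw + 1e-4 * t * gs and t > 1e-10:
            t *= 0.5
        w = w + t * s
    return w
```

In an actual cluster setting, the Python-level sums over partitions would be replaced by whatever reduction primitive the chosen framework provides; the sketch above only simulates that by looping over in-memory partitions.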