Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising

ACM International WWW Conference

Published by International World Wide Web Conferences Steering Committee

This paper develops the Parabel algorithm for extreme multi-label learning, where the objective is to learn classifiers that can annotate each data point with its most relevant subset of labels drawn from an extremely large label set. The state-of-the-art 1-vs-All based DiSMEC and PPDSparse algorithms are the most accurate, but they can take up to months for training and prediction as they learn and apply an independent linear classifier per label. Consequently, they do not scale to large datasets with millions of labels. Parabel addresses both limitations by learning a balanced label hierarchy such that: (a) the 1-vs-All classifiers in the leaf nodes of the label hierarchy can be trained on a small subset of the training set, reducing the training time to a few hours on a single core of a standard desktop; and (b) novel points can be classified by traversing the learned hierarchy in logarithmic time and applying only the 1-vs-All classifiers present in the visited leaves, reducing the prediction time to a few milliseconds per test point. This allows Parabel to scale to tasks considered infeasible for DiSMEC and PPDSparse, such as predicting the subset of 7 million Bing queries that might lead to a click on a given ad landing page for dynamic search advertising.
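
To make the two-stage scheme described above concrete, the sketch below illustrates the prediction pass: beam-search a learned, balanced label tree using per-node linear routing classifiers, then apply 1-vs-All classifiers only over the small label subsets stored in the reached leaves. This is not the authors' implementation; the node layout, the binary tree shape, the beam width, and the log-probability scoring are simplifying assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Node:
    """A node of the (assumed binary, balanced) label tree."""
    def __init__(self, classifier, children=None, labels=None, label_classifiers=None):
        self.classifier = classifier                      # linear classifier routing a point into this node
        self.children = children or []                    # non-empty for internal nodes
        self.labels = labels or []                        # leaf only: the small subset of labels it owns
        self.label_classifiers = label_classifiers or []  # leaf only: 1-vs-All weight vectors

def predict(root, x, beam_width=2, top_k=3):
    """Beam-search the tree level by level (logarithmic depth), then score labels
    with the 1-vs-All classifiers of only the visited leaves."""
    frontier = [(0.0, root)]                              # (log path score, node)
    leaves = []
    while frontier:
        expanded = []
        for score, node in frontier:
            if not node.children:                         # leaf reached: keep it for label scoring
                leaves.append((score, node))
                continue
            for child in node.children:                   # score each child with its routing classifier
                expanded.append((score + np.log(sigmoid(child.classifier @ x) + 1e-12), child))
        expanded.sort(key=lambda t: -t[0])
        frontier = expanded[:beam_width]                  # keep only the best few paths
    scores = {}
    for path_score, leaf in leaves:                       # 1-vs-All scoring restricted to the leaf's labels
        for label, w in zip(leaf.labels, leaf.label_classifiers):
            s = path_score + np.log(sigmoid(w @ x) + 1e-12)
            scores[label] = max(scores.get(label, -np.inf), s)
    return sorted(scores.items(), key=lambda t: -t[1])[:top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8                                                 # toy feature dimension
    leaf_a = Node(rng.normal(size=d), labels=["l0", "l1"],
                  label_classifiers=[rng.normal(size=d) for _ in range(2)])
    leaf_b = Node(rng.normal(size=d), labels=["l2", "l3"],
                  label_classifiers=[rng.normal(size=d) for _ in range(2)])
    root = Node(np.zeros(d), children=[leaf_a, leaf_b])
    print(predict(root, rng.normal(size=d), top_k=2))
```

Under these assumptions, the number of 1-vs-All evaluations per test point is governed by the leaf size and the (logarithmic) tree depth rather than by the full label set, which is what makes millisecond-scale prediction over millions of labels plausible.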