Abstract

Recommender systems find items that a target user is likely to find interesting. A common approach is collaborative filtering (CF), where several users share information in order to provide each with recommendations. Practical applications require dealing with huge mounts
of information. Previous work has used sketching techniques to handle these massive data sets, but only allowed testing whether two users have a high proportion of items they have both ranked. We show how to determine the correlation between the rankings of two users, by sending and maintaining extremely concise “sketches” of the rankings of items. The sketches allow approximating Kendall’s Tau, a known non-parametric rank correlation grade, with high accuracy and high confidence. The required sketch size is logarithmic in the confidence and polynomial in the accuracy. Our methods are based on random min-wise independent hash functions, and the good tradeoff between the sketch size, the accuracy and the confidence makes the method very well-suited for large-scale collaborative filtering systems.