Less is More: Sampling the Neighborhood Graph Makes SALSA Better and Faster

  • Marc Najork ,
  • Sreenivas Gollapudi ,
  • Rina Panigrahy

2nd ACM International Conference on Web Search and Data Mining (WSDM) |

Published by Association for Computing Machinery, Inc.

In this paper, we attempt to improve the effectiveness and the efficiency of query-dependent link-based ranking algorithms such as HITS, MAX and SALSA. All these ranking algorithms view the results of a query as nodes in the web graph, expand the result set to include neighboring nodes, and compute scores on the induced neighborhood graph. In previous work it was shown that SALSA in particular is substantially more effective than query-independent link-based ranking algorithms such as PageRank. In this work, we show that whittling down the neighborhood graph through consistent sampling of nodes and edges makes SALSA and its cousins both faster (more efficient) and better (more effective). We offer a hypothesis as to why “less is more”, i.e. why using a reduced graph improves performance.