This paper proposes a new method for computing page importance, referred to as BrowseRank. The conventional approach to compute page importance is to exploit the link graph of the web and to build a model based on that graph. For instance, PageRank is such an algorithm, which employs a discrete-time Markov process as the model. Unfortunately, the link graph might be incomplete and inaccurate with respect to data for determining page importance, because links can be easily added and deleted by web content creators. In this paper, we propose computing page importance by using a ’user browsing graph’ created from user behavior data. In this graph, vertices represent pages and directed edges represent transitions between pages in the users’ web browsing history. Furthermore, the lengths of staying time spent on the pages by users are also included. The user browsing graph is more reliable than the link graph for inferring page importance. This paper further proposes using the continuous-time Markov process on the user browsing graph as a model and computing the stationary probability distribution of the process as page importance. An eﬃcient algorithm for this computation has also been devised. In this way,we can leverage hundreds of millions of users’ implicit voting on page importance. Experimental results show that BrowseRank indeed outperforms the baseline methods such as PageRank and TrustRank in several tasks.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or email@example.com. The definitive version of this paper can be found at ACM's Digital Library --http://www.acm.org/dl/.