TopK Conductance Queries for Entity Relationship Graphs

Popularity of the object oriented paradigm has ensured the prevalence of entity-relationship graphs in various fields. Providing proximity search on ER graphs is therefore, of importance. We provide a methodology for graph conductance based proximity search on ER graphs. Proximity is evaluated using entity scores based on steady-state distributions of random walks with restarts, considering the ER graph as a Markovian network. This helps us answer queries like “Entities near keywords biochemistry, Information, Retrieval”. We first introduce a conductancebased algorithm – the Bookmark coloring Algorithm which performs pushes using a special hubset of nodes for which steady-state ditributions have been pre-computed and stored and then propose a topK-based termination of the algorithm. Our search system is both time-wise and space-wise efficient and almost as accurate as the ObjectRank system, which we use as a baseline. We show this, via our experiments on Citeseer graphs of sizes 74K and 320K nodes. We explain the need of approximating some of the Personalized PageRank vectors with destination distributions obtained by running multiple random walks originating from those nodes, known as fingerprints. These ditributions approach the PPVs if the number of random walks are large. So, we present some of the schemes for allocating number of random walks for fingerprint computation viz. Saturate-Fill, Linear, Squared, Cohen-based, Uniform. We then justify the usage of a hybrid hubset which consists of word PPVs and entity FPs. The modified conductance-based algorithm can be used to compute directed proximity from a node `a’ to a node `b’ in the ER graph. This is defined as the probability that a random walk started from node `a’ reaches node `b’ before reaching node `a’. We also present ways to find topK most proximal nodes to a particular node. In many applications, the ER graph keeps changing(nodes get added/deleted, edge weights change). We propose an algorithm to update the above forms of proximity, given perturbations to the graph conductance matrix. We also propose a way to use this update algorithm to compute Personalized PageRank vectors faster.