Graph partitioning is a key problem to enable efficient solving of a wide range of computational tasks and querying over large-scale graph data, such as computing node centralities using iterative computations, and personalized recommendations. In this work, we introduce a unifying framework for graph partitioning which enables a well principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementation. We show that many previously proposed methods are special instances of this framework, we derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant benefits over previous approaches, using a large set of real-world and synthetic graphs.
Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be overall comparable to the de-facto standard offline software METIS, and it even outperforms it on numerous real-world graphs. For instance, for the Twitter graph with more than 1.4 billion of edges, our method partitions the graph in about 40 minutes achieving a balanced partition that cuts as few as 6.8% of edges, whereas it took more than 8.5 hours by METIS to produce a balanced partition that cuts 11.98% of edges. Furthermore, modularity–a popular measure for community detection [Girvan and Newman, 2002; Newman and Girvan, 2004; Newman, 2006]–is also a special instance of our framework. We establish the first rigorous approximation algorithm, achieving a guarantee of O(log(k)/k) for partitioning into k clusters.
Finally, we evaluate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform, and observe significant gains in terms of the communication cost and runtime.