Extracting Knowledge from Networks: Rumors, Superstars, and Communities


January 19, 2011


Tauhid Zaman




As social networks such as Facebook and Twitter become larger and more complex, it becomes more and more difficult to extract useful information from them efficiently. However, there is a great deal of knowledge encoded in the structure of these networks. In this talk I will focus on techniques for extracting knowledge from networks using their structure in three situations: finding sources of rumors in networks, predicting the emergence of superstars in networks, and learning community structure in networks. In all three problems I will use the notion of network centrality to extract the relevant knowledge.
For the rumor source detection problem, I develop a new centrality notion which I call “rumor centrality” which is an exact maximum likelihood estimator for the rumor source in certain networks. Rumor centrality counts the number of rumor spreading sequences from a fixed rumor source. I show how the network structure affects one’s ability to detect rumor sources and prove asymptotic limits for the detection probability on different network topologies. Simulations performed on real network topologies such as the Internet, US powergrid, and Facebook, show that rumor centrality is an effective rumor source estimator.
Superstars are observed in empirical data from Twitter networks. These are nodes with degree on the order of the network size. Twitter networks for specific topics are found to have both a power-law degree distribution and a single superstar node. Common network growth models such as preferential attachment fail to predict the emergence of superstars. To remedy this, I propose a new topological network growth model in which new nodes connect to existing nodes with probability proportional to their “importance”. If node degree is used as the importance, then this model reduces to preferential attachment. However, if rumor centrality is used as the importance, then this model predicts not only the superstar nodes seen in the Twitter data, but also the power-law degree distribution. This model works for Twitter networks on topics ranging from Wimbledon and the World Cup to music award shows, indicating that there may be a common, topic independent mechanism driving the growth of these networks and the emergence of these superstars.
Finally, I propose the “leader-follower” algorithm for learning community structure in networks. The algorithm is based upon the idea that a community should have leaders which connect it to other communities, and followers, which have no neighbors outside of the community. The algorithm uses network centrality in a differential manner to detect leaders and followers, and then assigns followers to leaders to form communities. The algorithm learns the number and size of communities naturally from the network structure and runs in O(|E|) time for a network with edgeset E. Also, the algorithm is guaranteed to find any communities which are cliques and contain at least one loyal follower. Experiments run on Facebook networks and human brain networks show that the algorithm can find relevant communities and resolve a more detailed community structure than other more common methods such as spectral clustering.


Tauhid Zaman

Tauhid is a PhD student at the Massachusetts Institute of Technology advised by Professor Devavrat Shah. His research focuses on algorithm design for large-scale networks. He has developed algorithms for finding rumor sources in networks and for learning the community structure of networks, and also created a topological network growth model that accurately describes Twitter networks. During an internship at Microsoft Research Cambridge in the summer of 2010 he designed a system to predict the spread of information on Twitter. He is currently a Shell-MIT Energy Fellow.