Parallel Bayesian Network Structure Learning for Genome-Scale Gene Networks


March 9, 2015


Sanchit Misra


Intel, Bangalore


Learning Bayesian networks is NP-hard. Even with recent progress in heuristic and parallel algorithms, modeling capabilities still fall short of the scale of the problems encountered. In this work, we present a massively parallel method for Bayesian network structure learning and demonstrate its capability by constructing genome-scale gene networks of the model plant Arabidopsis thaliana from over 168.5 million gene expression values. We report strong scaling efficiency of 75% and demonstrate scaling to 1.57 million cores of the Tianhe-2 supercomputer. Our results represent increases of three and five orders of magnitude over previously published results in the scale of data analyzed and computations performed, respectively. We achieve this through algorithmic innovations, efficient techniques for distributing work across all compute nodes, all available processors and coprocessors on each node, and all available threads on each processor and coprocessor, and vectorization techniques that maximize single-thread performance.


Sanchit Misra

Dr. Sanchit Misra is a Research Scientist at the Parallel Computing Lab, Intel, Bangalore. His research interests include computational biology, machine learning, high-performance computing, and application-driven architecture design. He currently works on parallelizing key machine learning and computational biology algorithms for multi-core and many-core architectures and their clusters. Sanchit earned a PhD in Computer Engineering from Northwestern University in 2011. Prior to that, he received his B.Tech. from IIT Kharagpur in 2005 and worked at Trilogy, Bangalore from 2005 to 2006. While at Northwestern, he held summer internships at Intel (2007) and Google (2010). He has published in noted journals and conferences, including Bioinformatics, IPDPS, and Supercomputing.