Addressing computational challenges has become increasingly important for inferring biologically relevant information from the vast amount of experimental data available to systems biologists.
We pursue several approaches to computational biology. In one, we frame the biological question under consideration in terms of more standard problems in computer science, such as clustering, Steiner trees, and flow problems, and then use approximation algorithms motivated by statistical physics to solve these problems. One of our most successful approaches in this realm involves variants of belief propagation and survey propagation algorithms. In the course of adapting a problem to this setting, we often need to derive alternative representations of the original computer science problem, which might be useful when applying other algorithms as well.
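To make the message-passing idea concrete, the following is a minimal sketch of plain sum-product belief propagation on a toy tree-shaped model. It is not one of the variants used in the papers listed below: the graph, the potentials, and the function names (`belief_propagation`, `brute_force_marginals`) are all hypothetical choices made for illustration. On a tree the messages converge to the exact marginals, which the sketch checks against brute-force enumeration.

```python
from itertools import product
from math import prod

# Hypothetical toy model: a 4-node tree with binary states.
edges = [(0, 1), (1, 2), (1, 3)]
n_nodes = 4
states = (0, 1)

# Unary potentials phi[i][x] (made-up numbers for illustration).
phi = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0], [1.0, 1.5]]


def psi(x, y):
    """Pairwise potential that favours equal states on neighbouring nodes."""
    return 2.0 if x == y else 1.0


neighbors = {i: [] for i in range(n_nodes)}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)


def belief_propagation(n_iters=10):
    """Parallel sum-product updates; returns one normalized belief per node."""
    # msgs[(i, j)][x_j] is the message sent from node i to node j.
    msgs = {(i, j): [1.0, 1.0] for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in states:
                total = 0.0
                for xi in states:
                    # Product of messages into i from all neighbours except j.
                    incoming = prod(
                        msgs[(k, i)][xi] for k in neighbors[i] if k != j
                    )
                    total += phi[i][xi] * psi(xi, xj) * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msgs = new
    beliefs = []
    for i in range(n_nodes):
        b = [
            phi[i][x] * prod(msgs[(k, i)][x] for k in neighbors[i])
            for x in states
        ]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


def brute_force_marginals():
    """Exact marginals obtained by summing over all 2**n_nodes assignments."""
    marg = [[0.0, 0.0] for _ in range(n_nodes)]
    for x in product(states, repeat=n_nodes):
        w = prod(phi[i][x[i]] for i in range(n_nodes))
        w *= prod(psi(x[a], x[b]) for a, b in edges)
        for i in range(n_nodes):
            marg[i][x[i]] += w
    z = sum(marg[0])
    return [[v / z for v in m] for m in marg]


if __name__ == "__main__":
    for i, (bp, exact) in enumerate(
        zip(belief_propagation(), brute_force_marginals())
    ):
        print(i, bp, exact)
```

On graphs with cycles the same updates give only approximate marginals, which is one reason reformulating the underlying problem can matter in practice.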
We also approach many problems from the perspective of applied statistics and machine learning, making use of latent variable models and efficient operations on them to perform inference and learning. In this vein, we have tackled problems in CRISPR gene editing; problems in statistical genetics, such as effective and efficient handling of unknown confounding factors in eQTL association studies, genome-wide association studies, and analysis of methylation data; problems in immunoinformatics, such as HLA imputation and refinement and epitope prediction; and problems in proteomics, such as alignment of vector time series resulting from liquid chromatography-mass spectrometry systems.
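As a rough illustration of why unknown confounding factors matter in association studies, the sketch below simulates a hidden factor that drives both a genotype-like variable and the expression of many genes, with no true genetic effect. It then estimates the factor from the expression data and regresses it out. This is a deliberately crude stand-in (the per-sample mean across genes serves as the latent factor estimate), not the linear mixed models or latent variable models from the papers listed below. All names, sizes, and effect strengths are assumptions chosen for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def residualize(y, f):
    """Residuals of y after least-squares regression on f with an intercept."""
    my, mf = mean(y), mean(f)
    slope = sum((a - mf) * (b - my) for a, b in zip(f, y)) / sum(
        (a - mf) ** 2 for a in f
    )
    return [b - my - slope * (a - mf) for a, b in zip(f, y)]


def simulate(n_samples=1000, n_genes=40, seed=0):
    """Toy data: a hidden factor affects the genotype and every gene."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n_samples)]
    # Genotype-like variable correlated with the hidden factor
    # (for example, through population structure).
    genotype = [h + rng.gauss(0, 1) for h in hidden]
    # Expression depends on the hidden factor only, not on the genotype.
    expression = [
        [2.0 * h + rng.gauss(0, 1) for _ in range(n_genes)] for h in hidden
    ]
    return genotype, expression


genotype, expression = simulate()
gene0 = [row[0] for row in expression]

# Naive test: the genotype looks associated with gene 0 purely via confounding.
naive = correlation(genotype, gene0)

# Crude latent factor estimate: the mean expression across genes per sample.
factor = [mean(row) for row in expression]
adjusted = correlation(
    residualize(genotype, factor), residualize(gene0, factor)
)

print(f"naive correlation:    {naive:.3f}")
print(f"adjusted correlation: {adjusted:.3f}")
```

In this toy setting the naive correlation is large even though the genotype has no effect on expression, while the adjusted correlation is close to zero. Real data call for more careful models, since a factor estimated this way can also absorb genuine genetic signal.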
Optimized sgRNA design to maximize activity and minimize off-target effects for genetic screens with CRISPR-Cas9
JG Doench*, N Fusi*, M Sullender*, M Hegde*, EW Vaimberg*, KF Donovan, I Smith, Z Tothova, C Wilen, R Orchard, HW Virgin, J Listgarten*, DE Root, Nature Biotechnology (2016)
Warped linear mixed models for the genetic analysis of transformed phenotypes
Fusi N, Lippert C, Lawrence N, Stegle O, Nature Communications (2014)
Epigenome-wide association studies without the need for cell-type composition
Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J, Nature Methods, 309–311 (2014)
FaST-LMM-Select for addressing confounding from spatial structure and rare variants
Listgarten* J, Lippert* C, Heckerman* D (*equal contributions) Nature Genetics, 45, 470-471 (2013)
Improved linear mixed models for genome-wide association studies
Listgarten* J, Lippert* C, Kadie C, Davidson R, Eskin E, Heckerman* D (*equal contributions) Nature Methods (2012)
FaST Linear Mixed Models for Genome-Wide Association Studies
Lippert* C, Listgarten* J, Liu Y, Kadie C, Davidson R, Heckerman* D (*equal contributions) Nature Methods, Aug. 2011
Correction for Hidden Confounders in the Genetic Analysis of Gene Expression
Listgarten J, Kadie C, Schadt E, Heckerman D, Proceedings of the National Academy of Sciences, September 1, 2010
Statistical resolution of ambiguous HLA typing data
Listgarten J, Brumme Z, Kadie C, Xiaojiang G, Walker B, Carrington M, Goulder P, Heckerman D, PLoS Computational Biology (2008)
Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry
Listgarten J and Emili A, Molecular and Cellular Proteomics (2005)
Simultaneous reconstruction of multiple signaling pathways via the prize-collecting Steiner forest problem (N. Tuncbag, A. Braunstein, A. Pagnani, S.S. Huang, J. Chayes, C. Borgs, R. Zecchina, and E. Fraenkel) Journal of Computational Biology 20 (2013) 124 – 136.
Finding undetected protein associations in cell signaling by belief propagation (with M. Bailly-Bechet, C. Borgs, A. Braunstein, J. Chayes, A. Dagkessamanskaia, J. Francois, and R. Zecchina). Proceedings of the National Academy of Sciences (PNAS) 108 (2011) 882 – 887.
Statistical mechanics of Steiner trees (M. Bayati, C. Borgs, A. Braunstein, A. Ramezanpour, and R. Zecchina) Physical Review Letters 101, 037208 (2008), reprinted in Virtual Journal of Biological Physics Research 16, August 1 (2008).