A wealth of biological data
Science at Microsoft

Algorithms that Can Handle the "Omics"

Imagine two Neanderthal proto-scientists, standing before an enormous pile of rocks—big rocks, small rocks, smooth rocks, rocks with jagged edges.

“Well,” remarks one of the Stone Age researchers, “this is a lot of data here.”

“Yes,” replies his colleague, “if only we had some way to make sense of it all, I bet we could achieve a breakthrough in rock utilization.”

Today, biological researchers are confronting a similar dilemma: a wealth of data, but an inadequate analytical toolkit. Increasingly, biologists are using genomic methods, such as expression profiling, next-generation sequencing, and RNAi screens, together with proteomic and metabolomic technologies, to discover the molecular basis of changes in living systems. Collectively, these methods are often referred to as “omics.”

Unfortunately, these advances in experimental technologies have, in many cases, outpaced the development of the bioinformatics tools needed to analyze the data. Optimal analysis of large quantities of experimental data often requires solving hard problems that are intractable for conventional computational tools. Happily, collaborative efforts between the Massachusetts Institute of Technology and Microsoft Research have produced new algorithms that perform well on many of these problems.

Our methods, which extend ideas from the statistical physics of disordered systems to problems in computer science, have provided new distributed algorithmic schemes for solving large-scale optimization and inference tasks. Features of these algorithms include computational efficiency, parallelizability, and the flexibility to incorporate heterogeneous prior knowledge and to integrate diverse data sources. Applications range from constraint satisfaction and stochastic optimization problems over networks to graphical games and statistical inference problems.
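The text does not say which algorithms were developed. As an illustration only, here is a minimal sketch of sum-product belief propagation, a standard message-passing algorithm closely related to the cavity method of statistical physics, applied to a chain of discrete variables. Each variable exchanges messages only with its immediate neighbors, which is what makes this family of algorithms local and parallelizable. The chain-structured model, the toy numbers, and the helper names `chain_marginals` and `brute_force_marginals` are assumptions made for this example. On a chain (or any tree) the resulting marginals are exact, which the brute-force comparison confirms.

```python
import itertools
import math


# Illustrative sketch only: hypothetical helper names, not the authors' code.
def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain of discrete variables.

    unary[i][x]       -- nonnegative potential for variable i in state x
    pairwise[i][x][y] -- nonnegative potential coupling variable i (state x)
                         to variable i + 1 (state y)
    Returns one normalized marginal distribution per variable.
    """
    n, k = len(unary), len(unary[0])
    # fwd[i][y]: message arriving at variable i from its left neighbor.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for y in range(k):
            fwd[i][y] = sum(fwd[i - 1][x] * unary[i - 1][x] * pairwise[i - 1][x][y]
                            for x in range(k))
    # bwd[i][x]: message arriving at variable i from its right neighbor.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for x in range(k):
            bwd[i][x] = sum(pairwise[i][x][y] * unary[i + 1][y] * bwd[i + 1][y]
                            for y in range(k))
    # Belief = local potential times both incoming messages, then normalize.
    marginals = []
    for i in range(n):
        belief = [fwd[i][x] * unary[i][x] * bwd[i][x] for x in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Exact marginals by summing over all k**n joint states (tiny models only)."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in itertools.product(range(k), repeat=n):
        weight = math.prod(unary[i][s] for i, s in enumerate(states))
        weight *= math.prod(pairwise[i][states[i]][states[i + 1]] for i in range(n - 1))
        for i, s in enumerate(states):
            totals[i][s] += weight
    return [[t / sum(row) for t in row] for row in totals]


# Toy model: three binary variables; the coupling favors neighbors that agree.
unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
coupling = [[2.0, 0.5], [0.5, 2.0]]
pairwise = [coupling, coupling]
print(chain_marginals(unary, pairwise))
print(brute_force_marginals(unary, pairwise))
```

On graphs with cycles, the same local updates are usually iterated until the messages stop changing ("loopy" belief propagation), and the results are then approximations rather than exact marginals.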

Our collaboration aims to apply these algorithmic techniques to significant problems in biological research. Preliminary results have already led to the discovery of new functional genes, the prediction of protein contacts from sequence data, and the development of a new message-passing algorithm. These developments could find broad application across computer science and help improve future Microsoft products.
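Message passing also covers the optimization side of the work. As a hedged illustration, and not the new algorithm mentioned here (the text does not describe it), the min-sum algorithm works with costs instead of probabilities: it replaces the sums of sum-product belief propagation with minimums and the products with sums, so the messages carry the lowest achievable cost rather than a total weight. The sketch below assumes a chain of discrete variables with made-up costs; `chain_min_sum` and `brute_force_min` are hypothetical names used only for this example.

```python
import itertools


# Illustrative sketch only: hypothetical helper names, not the algorithm the text mentions.
def chain_min_sum(unary_cost, pair_cost):
    """Min-sum message passing on a chain; returns (lowest cost, assignment).

    unary_cost[i][x]   -- cost of variable i taking state x
    pair_cost[i][x][y] -- cost of variable i in state x next to i + 1 in state y
    """
    n, k = len(unary_cost), len(unary_cost[0])
    msg = [[0.0] * k for _ in range(n)]  # msg[i][y]: best cost of the left part given x_i = y
    back = [[0] * k for _ in range(n)]   # back[i][y]: best state of variable i - 1 given x_i = y
    for i in range(1, n):
        for y in range(k):
            costs = [msg[i - 1][x] + unary_cost[i - 1][x] + pair_cost[i - 1][x][y]
                     for x in range(k)]
            back[i][y] = min(range(k), key=costs.__getitem__)
            msg[i][y] = costs[back[i][y]]
    # Choose the best state for the last variable, then follow back-pointers leftward.
    final = [msg[n - 1][y] + unary_cost[n - 1][y] for y in range(k)]
    state = min(range(k), key=final.__getitem__)
    best_cost = final[state]
    assignment = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        assignment.append(state)
    assignment.reverse()
    return best_cost, assignment


def brute_force_min(unary_cost, pair_cost):
    """Exhaustive search over all k**n assignments (tiny models only)."""
    n, k = len(unary_cost), len(unary_cost[0])

    def total(states):
        return (sum(unary_cost[i][s] for i, s in enumerate(states))
                + sum(pair_cost[i][states[i]][states[i + 1]] for i in range(n - 1)))

    best = min(itertools.product(range(k), repeat=n), key=total)
    return total(best), list(best)


# Toy model: four binary variables; disagreeing neighbors pay a cost of 1.
unary_cost = [[0, 2], [1, 0], [0, 3], [2, 0]]
disagree = [[0, 1], [1, 0]]
pair_cost = [disagree] * 3
print(chain_min_sum(unary_cost, pair_cost))    # (2.0, [0, 0, 0, 1])
print(brute_force_min(unary_cost, pair_cost))  # (2, [0, 0, 0, 1])
```

On a chain this is the same forward pass plus back-tracking that dynamic programming (for example, the Viterbi algorithm) performs. On graphs with cycles, min-sum is typically iterated and is not guaranteed to find the optimum.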

Our current focus is on cancer genomics. In collaboration with Memorial Sloan-Kettering Cancer Center, we are integrating different types of molecular data to reveal complex response pathways relevant to cancer development. We hope this work leads not only to advances in general algorithmic techniques for biological research, but also to the identification of drug targets for specific cancers.

Primary Researchers

Jennifer Tour Chayes

Jennifer Tour Chayes is a Distinguished Scientist and managing director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in July 2008. Before that, she was research area manager for Mathematics, Theoretical Computer Science, and Cryptography at Microsoft Research Redmond. Chayes joined Microsoft Research in 1997, when she co-founded the Theory Group. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, and algorithmic game theory.

Christian Borgs

Christian Borgs is deputy managing director of the Microsoft Research New England lab in Cambridge, Massachusetts. He is also an affiliate professor of Mathematics at the University of Washington. Since joining Microsoft in 1997, Borgs has become one of the world leaders in the study of phase transitions in combinatorial optimization and, more generally, in the application of methods from statistical physics and probability theory to problems of interest in computer science and technology. He is one of the top researchers in the modeling and analysis of self-organized networks, such as the Internet, the World Wide Web, and social networks.

Riccardo Zecchina

Riccardo Zecchina is professor of Theoretical Physics at the Politecnico di Torino in Italy. His research interests lie at the interface between statistical physics and computer science. His current research focuses on combinatorial and stochastic optimization, probabilistic and message-passing algorithms, and interdisciplinary applications of statistical physics (in computational biology, graphical games, and statistical inference).