Algorithms that Can Handle the "Omics"
Imagine two Neanderthal proto-scientists, standing before an enormous pile of rocks—big rocks, small rocks, smooth rocks, rocks with jagged edges.
“Well,” remarks one of the Stone Age researchers, “this is a lot of data here.”
“Yes,” replies his colleague, “if only we had some way to make sense of it all, I bet we could achieve a breakthrough in rock utilization.”
Today, biological researchers are confronting a similar dilemma: a wealth of data, but an inadequate analytical toolkit. Increasingly, biologists are using genomic methods, such as expression profiling, next-generation sequencing, and RNAi screens, together with proteomic and metabolomic technologies, to discover the molecular basis of changes in living systems. Collectively, these methods are often referred to as “omics.”
Unfortunately, in many cases these advances in experimental technology have outpaced the development of the bioinformatics tools needed to analyze the data. Making the best use of large quantities of experimental data often requires solving hard problems that are intractable with conventional computational tools. Happily, a collaboration between the Massachusetts Institute of Technology and Microsoft Research has produced new algorithms that often perform quite well on these problems.
Our methods, which extend ideas from the statistical physics of disordered systems to problems in computer science, have yielded novel distributed algorithmic schemes for solving large-scale optimization and inference tasks. Among the features of our new algorithms are computational efficiency, parallelizability, and the flexibility to include heterogeneous prior knowledge and to integrate diverse data sources. Applications range from constraint satisfaction and stochastic optimization problems over networks to graphical games and statistical inference problems.
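The article does not spell out its algorithms. One well-known family of distributed methods with roots in the statistical physics of disordered systems is message passing: belief propagation, which corresponds to what physicists call the cavity method. The following is a minimal illustrative sketch, not the collaboration's code. It assumes an Ising chain of ±1 spins with local fields `h` and couplings `J`; the function names `bp_chain_marginals` and `brute_force_marginals` are hypothetical. Each message depends only on messages from neighbouring variables, so every update in a round can run in parallel, which is the "distributed" property the paragraph mentions.

```python
import itertools
import math

SPINS = (-1, 1)


def bp_chain_marginals(h, J, n_iters=50):
    """Sum-product belief propagation on an Ising chain (illustrative sketch).

    h[i] is the local field on spin i; J[i] couples spins i and i + 1.
    Returns P(s_i = +1) for every spin i.
    """
    n = len(h)
    # msgs[(i, j)][s_j]: message from spin i to neighbouring spin j, initialised uniform.
    msgs = {}
    for i in range(n - 1):
        msgs[(i, i + 1)] = {s: 0.5 for s in SPINS}
        msgs[(i + 1, i)] = {s: 0.5 for s in SPINS}

    def neighbours(i):
        return [k for k in (i - 1, i + 1) if 0 <= k < n]

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = {}
            for sj in SPINS:
                total = 0.0
                for si in SPINS:
                    # Local field and pairwise coupling weight ...
                    w = math.exp(h[i] * si + J[min(i, j)] * si * sj)
                    # ... times messages into i from every neighbour except j.
                    for k in neighbours(i):
                        if k != j:
                            w *= msgs[(k, i)][si]
                    total += w
                out[sj] = total
            z = sum(out.values())
            new_msgs[(i, j)] = {s: v / z for s, v in out.items()}
        # Synchronous update: each message in this round used only the previous round.
        msgs = new_msgs

    marginals = []
    for i in range(n):
        belief = {}
        for si in SPINS:
            w = math.exp(h[i] * si)
            for k in neighbours(i):
                w *= msgs[(k, i)][si]
            belief[si] = w
        marginals.append(belief[1] / (belief[-1] + belief[1]))
    return marginals


def brute_force_marginals(h, J):
    """Exact marginals by enumerating all 2**n spin configurations."""
    n = len(h)
    weights = {}
    for config in itertools.product(SPINS, repeat=n):
        log_w = sum(h[i] * config[i] for i in range(n))
        log_w += sum(J[i] * config[i] * config[i + 1] for i in range(n - 1))
        weights[config] = math.exp(log_w)
    z = sum(weights.values())
    return [sum(w for c, w in weights.items() if c[i] == 1) / z for i in range(n)]


if __name__ == "__main__":
    h = [0.3, -0.2, 0.5, 0.0, -0.4]
    J = [0.8, -0.5, 1.0, 0.3]
    print(bp_chain_marginals(h, J))
    print(brute_force_marginals(h, J))
```

On a tree-structured graph such as this chain, belief propagation is exact, so the two functions agree. Brute-force enumeration grows as 2^n, while each round of message passing is linear in the number of edges. On graphs with loops, belief propagation becomes an approximation.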
Our collaboration aims to bring these new algorithmic techniques to bear on significant problems in biological research. Preliminary results have already led to the discovery of new functional genes and to the prediction of protein contacts from sequence data.
Our current focus is on cancer genomics. In collaboration with the Memorial Sloan-Kettering Cancer Center, we are working on integrating different types of molecular data to reveal complex response pathways relevant to cancer development. We hope that this work will lead not only to advances in general algorithmic techniques for biological research, but also to the identification of drug targets for specific cancers.

