Microsoft Research Blog


Microsoft Researchers Present New Statistical Method for Genetic Analysis

August 23, 2010 | Posted by Microsoft Research Blog

A deeper understanding of a disease’s genetic underpinnings can lead to better biological insight into the disease and, thus, to improvements in screening, treatment, and drug development. This week, Jennifer Listgarten, David Heckerman, and Carl Kadie of Microsoft Research and Eric E. Schadt of Pacific Biosciences made a significant contribution to researchers’ insight into the role genetics plays in human disease. Their article, “Correction for Hidden Confounders in the Genetic Analysis of Gene Expression,” was published today in Proceedings of the National Academy of Sciences, one of the world’s most-cited multidisciplinary scientific serials.

Their research offers a possible solution to the challenges posed by the variety of confounders hidden in genetic data, which, when improperly addressed, lead to both spurious and missed associations. The article presents a new statistical method that better captures the true biological signal of interest by removing interfering signals from the data. Applying the method to real and synthetic data, the paper demonstrates the need for a joint correction of two types of confounders and shows the disadvantages of other possible approaches found in the current literature. In particular, the paper demonstrates that a new class of methods has maximum detection power on synthetic data and the best performance when applied to real data, as judged by a commonly accepted bronze standard. The software used will be available for free download.

While the article recommends future avenues in which the method could be used, the framework can be applied today to existing data sets with SNP and gene-expression data, two of the most common types of biological data sets. In the future, I believe this new method will become even more relevant in the search for new and improved ways to manage disease. The central problem this work addresses, identifying which genetic markers affect the expression of specific genes, feeds directly into improving analyses that aim to identify the biological processes that lead to disease. And those future discoveries of biological processes could have a direct impact on identifying the causes of amyotrophic lateral sclerosis, cancer, heart disease, HIV/AIDS, and many other complex diseases that affect many of us.

Tony Hey, corporate vice president, External Research, a division of Microsoft Research