Understanding the Genetic Causes of Human Disease
Many common diseases, including cardiovascular disease, cancer, and various psychiatric illnesses, arise from complex interactions between a person’s genetics and all the environmental influences he or she encounters over a lifetime. If we could only untangle these factors and determine the underlying causes, we might better prevent, diagnose, and treat the diseases. We might even develop individualized treatments that are based on a patient’s genetic make-up.
But analyzing multiple genetic and environmental factors is complicated, to say the least. We now have enormous genomics data sets, but we need statistical methods that can represent the complexity of human disease. In other words, we need both rich data about the behavior of human cells in various environmental contexts and complex statistical models to analyze the resulting data.
Impossible? Hardly, thanks to the work of the Machine Learning and Perception group at Microsoft Research Cambridge and the Wellcome Trust Sanger Institute. Senior Researcher John Winn and his colleagues have developed Infer.NET, an advanced machine-learning framework for modeling and understanding very complex systems. Infer.NET allows the team to represent the complexity of human diseases in a way previously unachievable. The team at the Wellcome Trust Sanger Institute, led by Joint Head of Human Genetics Richard Durbin, brings world-class expertise in large-scale genomic sequencing and analysis of genomic data.
The project requires analyzing four types of data:
- Genetic data: All or key parts of the DNA sequence of an individual
- Functional genomic data: Measurements, such as gene expression, that indicate the activity of individual genes in various body tissues
- Environmental data: Information about an individual’s environmental exposures, such as smoking or sunbathing
- Disease data: Physiological measurements and information about known diseases or symptoms that an individual has experienced
These data are brought together in a single statistical model to discover associations between the genome, cell function, environmental factors, and disease.
Our analysis is identifying correlations between (1) our genetics and the activity of genes in different tissues and (2) the symptoms or characteristics of the individuals from whom the samples come. This is shedding new light on how variations in our genetic makeup can make us susceptible to different diseases, giving us a deeper understanding than ever before of the genetic causes of human disease.
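To make the idea of a genotype-to-expression association concrete, here is a minimal Python sketch on simulated data. It is not the project's Infer.NET model, which is a far richer Bayesian model; it only computes a plain Pearson correlation between a genetic variant and a gene's expression level. All names (`simulate`, `pearson`), the effect size, and the data are hypothetical and chosen purely for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_individuals=500, effect=0.8, seed=0):
    """Simulate one variant and two genes (all values are made up).

    genotypes:  copies of the minor allele carried by each individual (0, 1, or 2)
    expression: a gene whose activity depends on the genotype, plus noise
    unrelated:  a gene whose activity is pure noise, independent of the genotype
    """
    rng = random.Random(seed)
    genotypes = [rng.choice([0, 1, 2]) for _ in range(n_individuals)]
    expression = [effect * g + rng.gauss(0.0, 1.0) for g in genotypes]
    unrelated = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    return genotypes, expression, unrelated


genotypes, expression, unrelated = simulate()
r_linked = pearson(genotypes, expression)
r_unlinked = pearson(genotypes, unrelated)

print(f"variant vs. dependent gene: r = {r_linked:.2f}")
print(f"variant vs. unrelated gene: r = {r_unlinked:.2f}")
```

In this toy setting the dependent gene shows a clearly stronger correlation with the variant than the unrelated gene does. Real studies must also account for environmental and other non-genetic factors that affect expression, which is one motivation for a joint statistical model of the kind described above.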
The application of Infer.NET to bioinformatics led to the first parallel version of the framework, driving improvements to the design and implementation of this key technology. By pushing the scalability of Infer.NET, this project directly helped make this machine-learning framework ready to use in a number of Microsoft products.
Learn more about this research:
- Joint Genetic Analysis of Gene Expression Data with Inferred Cellular Phenotypes
- A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies