Coevolution of Viruses and the Immune System
Inside our bodies, a battle rages between the immune system and disease-causing pathogens. Striving for an advantage, pathogens constantly evolve to evade detection by the immune system.
In 2010, Microsoft researchers, together with colleagues from Murdoch University in Western Australia, the University of Western Australia, and Fundación Ciencia para la Vida in Chile, explored this evolutionary struggle. Their study focused on human leukocyte antigen (HLA) molecules, which sample cellular proteins and present them on the cell surface for examination by the immune system. (See “Mapping the Landscape of Host-Pathogen Coevolution: HLA Class I Binding and Its Relationship with Evolutionary Conservation in Human and Viral Proteins.”)
When viruses infect a cell, they bring their own genetic material into the cell and use cellular resources to propagate. As a result, HLA molecules present viral proteins on the infected cell’s surface, spurring an immune attack on the "odd" cells. However, viruses often mutate to evade detection, altering the protein segments that HLA molecules are most likely to present.
On the other side, the distribution of the thousands of HLA variants present in human populations can shift over many generations. This sets up an evolutionary game: viruses on one side, our immune system on the other. To analyze this contest, the researchers quantified HLA-binding preferences using targeting efficiency, a novel measure that captures the correlation between HLA-binding affinities and the genetic conservation of the targeted regions. In theory, HLA molecules should draw attention to protein segments that are shared across related viral species, because such regions are likely to be functionally important and therefore costly for the virus to mutate.
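To make the idea of targeting efficiency concrete, here is a minimal sketch in Python. It is not the study's actual method: it assumes that each protein segment already has a predicted binding score (higher meaning stronger HLA binding) and a conservation score (higher meaning more conserved), and it uses a Spearman rank correlation as a stand-in for "the correlation" between the two. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def targeting_efficiency(binding_scores, conservation_scores):
    """Hypothetical stand-in: Spearman correlation between per-segment
    binding scores and conservation scores (assumed score conventions)."""
    if len(binding_scores) != len(conservation_scores):
        raise ValueError("need one binding and one conservation score per segment")
    return pearson(average_ranks(binding_scores),
                   average_ranks(conservation_scores))


# Toy data (invented): five protein segments from one viral protein.
binding = [0.9, 0.7, 0.4, 0.2, 0.1]
conservation = [0.95, 0.80, 0.50, 0.30, 0.10]
print(targeting_efficiency(binding, conservation))
```

Under these assumptions, a value near 1 would mean the HLA variant preferentially binds conserved segments, a value near 0 would mean no preference, and a negative value would mean it tends to bind variable segments.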
Analysis of targeting efficiencies indicated that HLA molecules do indeed prefer to target such conserved regions. The magnitude of this preference varies in a way that shows evidence of target splitting, in which two different HLA loci focus on different viral families. This phenomenon is consistent with theoretical biology predictions for predator-prey models, and it suggests that targeting efficiency, as a measure of the link between HLA and virus, will be useful in analyzing viral evolution. Furthermore, in many cases the host’s total targeting efficiency scores for various viruses correlate with clinical outcomes. These scores therefore offer a potentially useful system of measures for analyzing infection outcomes in individual patients or entire human populations under different conditions, such as after vaccination or following a previous viral infection.
This work was possible only by combining machine learning techniques with large numbers of viral sequences. It illustrates that a computational approach can be just as important to biology as “wet lab” work for both formulating and testing new hypotheses. Several new fields of inquiry have stemmed from this work, including research on “correlation sifting,” a method for feature selection that improves upon standard LASSO approaches for a variety of tasks beyond biology. This method may be used to improve future Microsoft products that currently rely on LASSO or similar feature-selection algorithms.