FaST-LMM

Established: October 1, 2010

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for performing genome-wide association studies (GWAS) on large data sets. FaST-LMM runs on both Windows and Linux, and has been tested on data sets with over one million samples.

Applications include single-SNP testing [1] with improvements described in [4], SNP-set testing [3], tests for epistasis, and heritability estimation [5].

Software versions

FaST-LMM (Python): This version is our most up-to-date release and is available on GitHub. It supports univariate GWAS, set tests, epistatic tests, and heritability estimation. The release includes IPython notebook examples and API documentation. An example of FaST-LMM with cloud computing is here.

FaST-LMM (C++): This version supports univariate GWAS and epistatic tests. The release includes a Windows binary, a Linux binary, and the source code.

FaST-LMM-EWASher: This version supports correction for cellular heterogeneity in methylation and similar data. The release includes a Python version and an R version.

Selected references

  1. C. Lippert, J. Listgarten, Y. Liu, C.M. Kadie, R.I. Davidson, D. Heckerman. FaST linear mixed models for genome-wide association studies. Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681).
  2. J. Zou, C. Lippert, D. Heckerman, M. Aryee, J. Listgarten. Epigenome-wide association studies without the need for cell-type composition. Nature Methods (doi:10.1038/nmeth.2815).
  3. C. Lippert, Jing Xiang, Danilo Horta, Christian Widmer, Carl M. Kadie, D. Heckerman, J. Listgarten. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics, 2014 (doi:10.1093/bioinformatics/btu504).
  4. C. Widmer, C. Lippert, O. Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J. Listgarten, and D. Heckerman. Further Improvements to Linear Mixed Models for Genome-Wide Association Studies. Scientific Reports, 4, 6874, Nov 2014 (doi:10.1038/srep06874).
  5. D. Heckerman, D. Gurdasani, C. Kadie, C. Pomilla, T. Carstensen, H. Martin, K. Ekoru, R.N. Nsubuga, G. Ssenyomo, A. Kamali, P. Kaleebu, C. Widmer, and M.S. Sandhu. Linear mixed model for heritability estimation that explicitly addresses environmental variation. PNAS, 113: 7377–7382, July 2016 (doi:10.1073/pnas.1510497113).

Annotated bibliography

Univariate GWAS

  1. C. Lippert*, J. Listgarten*, Y. Liu, C.M. Kadie, R.I. Davidson, D. Heckerman*. FaST linear mixed models for genome-wide association studies. Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681). (*equal contributions)
    • In this paper, we showed how estimating the genetic similarity matrix (GSM) from fewer SNPs than individuals leads to computations that are linear in the number of individuals in both time and memory, instead of cubic and quadratic, respectively. To thin the SNPs enough to meet this condition, we relied on linkage disequilibrium, taking every Kth SNP, and showed the trade-off between using this reduced set and using all available SNPs.
  2. J. Listgarten*, C. Lippert*, C.M. Kadie, R.I. Davidson, E. Eskin, D. Heckerman*. Improved linear mixed models for genome-wide association studies. Nature Methods, 9: 525-526, June 2012 (doi:10.1038/nmeth.2037). (*equal contributions)
    • In this paper, we described an alternative method for selecting the SNPs, so as to leverage the computational efficiencies in [1], while simultaneously improving the model (i.e., maintaining type 1 error control, and improving power). Subsequent to this paper, we, and others, found that in some settings, this feature selection alone could fail to control the type 1 error. This led to a modified approach developed and demonstrated in [5].
  3. J. Listgarten*, C. Lippert*, D. Heckerman*. FaST-LMM-Select for addressing confounding from spatial structure and rare variants. Nature Genetics (2013) doi:10.1038/ng.2620 (*equal contributions)
    • In this paper, we showed how out-of-the-box application of our approach in [2] solved an open problem in statistical genetics that had been published in Nature Genetics. The problem was that none of the available methods (they did not try [2]) could control the type 1 error when there was a “sharply-peaked, spatial, non-genetic risk” and rare variants in a GWAS.
  4. C. Lippert*, Gerald Quon, Eun Yong Kang, Carl M. Kadie, J. Listgarten*, D. Heckerman*. The benefits of selecting phenotype-specific variants for applications of mixed models in genomics. Scientific Reports (2013) doi:10.1038/srep01815 (*equal contributions)
    • In this paper, we characterized empirically how feature selection of SNPs for the GRM could help improve GWAS and prediction. As stated with respect to [2], some of these ideas didn’t generalize to all settings, as shown and corrected in [5].
  5. C. Widmer*, C. Lippert*, O. Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J. Listgarten, and D. Heckerman*. Further Improvements to Linear Mixed Models for Genome-Wide Association Studies. Scientific Reports, 4, 6874, Nov 2014 (doi:10.1038/srep06874). (*equal contributions)
    • Describes the latest version of FaST-LMM. It shows that selecting SNPs for the linear-mixed-model similarity matrix through pruning via linkage disequilibrium works well to control type I error, and that selecting SNPs that are predictive of the phenotype does not.
  6. C. Lippert and D. Heckerman. Computational and statistical issues in personalized medicine. XRDS 21, 24-27, Summer 2015 (doi:10.1145/2788502).
    • We described statistical issues in GWAS with linear mixed models from a graphical-model perspective.
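
The low-rank idea summarized under [1] above can be illustrated with a small NumPy sketch. It is not the FaST-LMM implementation: the function names, the synthetic genotypes, the fixed variance ratio delta, and the omission of fixed effects are all simplifying assumptions made here. It thins SNPs by taking every Kth column, then evaluates the same Gaussian log-likelihood twice, once through an N x N eigendecomposition of the similarity matrix and once through an economy SVD of the N x s SNP matrix, whose cost grows only linearly with the number of individuals N when s is fixed.

```python
import numpy as np

def thin_snps(snps, k):
    """Keep every k-th SNP column (a simple stand-in for LD-based thinning)."""
    return snps[:, ::k]

def loglik_full(y, W, delta):
    """log N(y | 0, W W^T + delta*I) via an N x N eigendecomposition.

    Costs O(N^3) time and O(N^2) memory in the number of individuals N.
    """
    n = y.shape[0]
    S, U = np.linalg.eigh(W @ W.T)
    S = np.clip(S, 0.0, None)            # remove tiny negative round-off
    y_rot = U.T @ y
    quad = np.sum(y_rot ** 2 / (S + delta))
    logdet = np.sum(np.log(S + delta))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def loglik_low_rank(y, W, delta):
    """Same log-likelihood via an economy SVD of W (N x s, with s < N).

    Costs O(N s^2) time and O(N s) memory.
    """
    n = y.shape[0]
    U1, sv, _ = np.linalg.svd(W, full_matrices=False)
    S1 = sv ** 2
    y_rot = U1.T @ y
    resid = y - U1 @ y_rot               # part of y outside the span of W
    quad = np.sum(y_rot ** 2 / (S1 + delta)) + (resid @ resid) / delta
    logdet = np.sum(np.log(S1 + delta)) + (n - S1.size) * np.log(delta)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Synthetic data (assumption): 200 individuals, 500 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(200, 500)).astype(float)
thinned = thin_snps(snps, k=10)          # 50 SNPs, fewer than individuals
W = (thinned - thinned.mean(axis=0)) / np.sqrt(thinned.shape[1])
y = rng.standard_normal(200)

ll_full = loglik_full(y, W, delta=1.0)
ll_low = loglik_low_rank(y, W, delta=1.0)
print(ll_full, ll_low)                   # the two values should agree
```

The two routines return the same value up to round-off; only the low-rank one avoids ever forming or decomposing the N x N matrix.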

Set Tests for GWAS

  1. J. Listgarten*, C. Lippert*, Eun Yong Kang, Jing Xiang, Carl M. Kadie, D. Heckerman*. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics, 29:1526-1533, April 2013 (doi:10.1093/bioinformatics/btt177). (*equal contributions)
    • This paper demonstrated how to efficiently test sets of SNPs using a LMM, when a low-rank background kernel is needed, as in [2], and that the LRT can be more powerful than a score test. We also introduced a new way to compute p-values for the LRT in this variance component setting.
  2. C. Lippert, Jing Xiang, Danilo Horta, Christian Widmer, Carl M. Kadie, D. Heckerman, J. Listgarten. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics, 2014 (doi:10.1093/bioinformatics/btu504).
    • This paper makes theoretical arguments, and demonstrates empirically, that the LRT is often more powerful in practice than the traditionally used score test (e.g. SKAT), except when the signal is so weak that the power is in any case not useful. It also explains how to perform a number of algebraic computations for set tests efficiently, with either a low-rank or a full-rank background kernel.
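
As a rough illustration of why p-values for a variance-component LRT need special handling, the sketch below uses a textbook null distribution: a mixture of a point mass at zero and a scaled chi-square with one degree of freedom. The mixing weight and scale are hypothetical parameters chosen here for illustration; the papers above describe their own procedure for obtaining the null distribution, which this sketch does not reproduce.

```python
import math

def lrt_statistic(loglik_alt, loglik_null):
    """Twice the log-likelihood gain of the alternative, floored at zero."""
    return max(0.0, 2.0 * (loglik_alt - loglik_null))

def lrt_pvalue(stat, mix_zero=0.5, scale=1.0):
    """P-value under the null mixture mix_zero*chi2_0 + (1 - mix_zero)*scale*chi2_1.

    mix_zero and scale are illustrative assumptions, not values from the papers.
    For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    if stat <= 0.0:
        return 1.0
    return (1.0 - mix_zero) * math.erfc(math.sqrt(stat / (2.0 * scale)))

# Hypothetical log-likelihoods of the null and alternative models.
stat = lrt_statistic(loglik_alt=-1200.0, loglik_null=-1203.0)
p_boundary = lrt_pvalue(stat)               # accounts for the boundary constraint
p_naive = lrt_pvalue(stat, mix_zero=0.0)    # plain chi-square with 1 d.o.f.
print(stat, p_boundary, p_naive)
```

With the 50:50 mixture, the p-value is half of what a plain chi-square reference would give, which is one reason a naive reference distribution can cost power.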

Epigenetic Cellular Heterogeneity Correction

  1. J. Zou, C. Lippert, D. Heckerman, M. Aryee, J. Listgarten. Epigenome-wide association studies without the need for cell-type composition. Nature Methods, doi:10.1038/nmeth.2815.
    • In this paper, we leveraged our work from [1] and [2], combined with adding principal components, to correct for the confounding effects of cellular heterogeneity in methylation association studies. Notably, this is achieved without any knowledge of which cell types are present, and without any auxiliary data of any kind.
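
The "adding principal components" step can be sketched generically as follows. This is an assumption-laden illustration, not the FaST-LMM-EWASher procedure: the synthetic methylation matrix, the function names, and the fixed choice of two components are all hypothetical. It computes the leading principal components of a samples-by-sites matrix with an SVD and appends them to the covariate matrix of an association model.

```python
import numpy as np

def top_principal_components(data, n_components):
    """Return per-sample scores on the leading principal components."""
    centered = data - data.mean(axis=0)
    U, sv, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * sv[:n_components]

def add_pc_covariates(covariates, data, n_components):
    """Append principal-component scores as extra covariate columns."""
    pcs = top_principal_components(data, n_components)
    return np.hstack([covariates, pcs])

# Synthetic data (assumption): 30 samples by 100 methylation sites.
rng = np.random.default_rng(1)
methylation = rng.random((30, 100))
intercept = np.ones((30, 1))
covariates = add_pc_covariates(intercept, methylation, n_components=2)
print(covariates.shape)
```

The resulting matrix holds the intercept plus two component columns, which would then enter the association model as fixed effects.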

Epistatic Genome-Wide Association (EWAS)

  1. C. Lippert*, J. Listgarten*, Robert Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, D. Heckerman*. An Exhaustive Epistatic SNP Association Analysis on Expanded Wellcome Trust Data. Scientific Reports, 2013, doi:10.1038/srep01099 (*equal contributions)
    • In this work, we computed, by brute force, all possible pairwise-epistatic tests for all phenotypes in the WTCCC1 data, by leveraging our fast low-rank computations in [2]. As mentioned, subsequently, these low-rank approaches were shown to not always control type 1 error [5], and so some of the results may have inflated test statistics. The rank order of the hits may be approximately correct, and therefore we have left these results on the Azure marketplace (http://datamarket.azure.com/dataset/microsoftresearch/epistasisgwas).
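
To give a sense of what a brute-force pairwise scan involves, the sketch below enumerates every SNP pair and forms a product interaction term for each. The product encoding and the function names are assumptions made for illustration; the test statistic computed per pair in the paper is not reproduced here.

```python
from itertools import combinations

def pairwise_interaction_terms(genotypes):
    """Yield ((i, j), interaction) for every unordered pair of SNPs.

    genotypes is a list of per-SNP lists, one value per individual; the
    interaction term is the element-wise product (an illustrative encoding).
    """
    for i, j in combinations(range(len(genotypes)), 2):
        yield (i, j), [a * b for a, b in zip(genotypes[i], genotypes[j])]

def n_pairs(n_snps):
    """Number of unordered SNP pairs an exhaustive scan must test."""
    return n_snps * (n_snps - 1) // 2

# Toy data (assumption): 3 SNPs measured on 4 individuals.
genotypes = [[0, 1, 2, 1], [2, 0, 1, 1], [1, 1, 0, 2]]
terms = dict(pairwise_interaction_terms(genotypes))
print(len(terms), n_pairs(1000))
```

The number of tests grows quadratically with the number of SNPs, which is why fast per-test computations matter for an exhaustive scan.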

GWAS for “functional traits” such as longitudinal traits

  1. N. Fusi and J. Listgarten. Leveraging Non-Linear Genetic Effects on Functional Traits for GWAS. Proceedings of RECOMB 2016.
    • In this work, we introduce a new model for performing GWAS for vector-valued traits which vary smoothly in time. The framework is expressive and computationally efficient, but the null model is not nested inside of the alternative model, something we are currently addressing in ongoing work.

Heritability estimation

  1. D. Heckerman, D. Gurdasani, C. Kadie, C. Pomilla, T. Carstensen, H. Martin, K. Ekoru, R.N. Nsubuga, G. Ssenyomo, A. Kamali, P. Kaleebu, C. Widmer, and M.S. Sandhu. Linear mixed model for heritability estimation that explicitly addresses environmental variation. PNAS, 113: 7377–7382 (doi:10.1073/pnas.1510497113).
    • We described a way to generalize linear mixed models to take spatial location into account when jointly modeling the influences of genomics and environment on traits.
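
One way to picture a model that accounts for spatial location is to give the environment its own similarity matrix alongside the genetic one. The sketch below is hypothetical: the Gaussian radial basis function kernel, its length scale, the coordinates, and the three-way variance decomposition are assumptions made for illustration and are not taken from the paper.

```python
import math

def rbf_kernel(locations, length_scale):
    """Similarity between individuals based on (x, y) location (assumed Gaussian RBF)."""
    return [
        [
            math.exp(-((ax - bx) ** 2 + (ay - by) ** 2) / (2.0 * length_scale ** 2))
            for (bx, by) in locations
        ]
        for (ax, ay) in locations
    ]

def heritability(var_genetic, var_environment, var_noise):
    """Fraction of trait variance attributed to genetics under the assumed decomposition."""
    return var_genetic / (var_genetic + var_environment + var_noise)

# Hypothetical coordinates for three individuals and hypothetical variance estimates.
locations = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
K_env = rbf_kernel(locations, length_scale=2.0)
h2 = heritability(var_genetic=2.0, var_environment=1.0, var_noise=1.0)
print(K_env[0][1], h2)
```

Nearby individuals receive a high environmental similarity and distant ones a value near zero, so shared environment can be modeled separately from shared genetics.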