FaST-LMM

Established: January 1, 2011

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for performing efficient genome-wide association studies (GWAS) on large data sets. FaST-LMM runs on both Windows and Linux, and has been tested on data sets with over one million samples.

FaST-LMM applications include single-SNP testing, SNP-set testing, tests for epistasis, and heritability estimation.

Software versions for FaST-LMM

FaST-LMM (Python): This version is our most up-to-date release and is available on GitHub. It supports univariate GWAS, set tests, epistatic tests, and heritability estimation. The release includes IPython notebook examples and API documentation. An example of FaST-LMM with cloud computing is also available.

FaST-LMM (C++): This version supports univariate GWAS and epistatic tests. The release includes a Windows binary, a Linux binary, and source code.

EWASher: This version supports corrections for cellular heterogeneity in methylation and similar data. The release includes a Python version and an R version, although the R version has been reported to be difficult to run, so we advise using the Python version.

Annotated Bibliography

Univariate GWAS

  1. C. Lippert*, J. Listgarten*, Y. Liu, C.M. Kadie, R.I. Davidson, D. Heckerman*. FaST linear mixed models for genome-wide association studies. Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681). (*equal contributions)
    • In this paper, we showed how estimating the genetic similarity matrix (GSM) from fewer SNPs than individuals leads to computations that are linear in the number of individuals in both time and memory, instead of cubic and quadratic, respectively. To thin our SNPs so as to achieve this condition, we relied on linkage disequilibrium, taking every Kth SNP, and showed the trade-off of using this reduced set instead of all available SNPs.
  2. J. Listgarten*, C. Lippert*, C.M. Kadie, R.I. Davidson, E. Eskin, D. Heckerman*. Improved linear mixed models for genome-wide association studies. Nature Methods, 9: 525-526, June 2012 (doi:10.1038/nmeth.2037). (*equal contributions)
    • In this paper, we described an alternative method for selecting the SNPs, so as to leverage the computational efficiencies in [1], while simultaneously improving the model (i.e., maintaining type 1 error control, and improving power). Subsequent to this paper, we, and others, found that in some settings, this feature selection alone could fail to control the type 1 error. This led to a modified approach developed and demonstrated in [5].
  3. J. Listgarten*, C. Lippert*, D. Heckerman*. FaST-LMM-Select for addressing confounding from spatial structure and rare variants. Nature Genetics (2013) doi:10.1038/ng.2620 (*equal contributions)
    • In this paper, we showed how out-of-the-box application of our approach in [2] solved an open problem in statistical genetics that had been published in Nature Genetics. The problem was that none of the available methods (they did not try [2]) could control the type 1 error when there was a “sharply-peaked, spatial, non-genetic risk” and rare variants in a GWAS.
  4. C. Lippert*, Gerald Quon, Eun Yong Kang, Carl M. Kadie, J. Listgarten*, D. Heckerman*. The benefits of selecting phenotype-specific variants for applications of mixed models in genomics. Scientific Reports (2013) doi:10.1038/srep01815 (*equal contributions)
    • In this paper, we characterized empirically how feature selection of SNPs for the GRM could help improve GWAS and prediction. As stated with respect to [2], some of these ideas didn’t generalize to all settings, as shown and corrected in [5].
  5. C. Widmer*, C. Lippert*, O. Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J. Listgarten, and D. Heckerman*. Further Improvements to Linear Mixed Models for Genome-Wide Association Studies. Scientific Reports, 4, 6874, Nov 2014 (doi:10.1038/srep06874). (*equal contributions)
    • Describes the latest version of FaST-LMM. It shows that selecting SNPs for the linear-mixed-model similarity matrix through pruning via linkage disequilibrium works well to control type I error, and that selecting SNPs that are predictive of the phenotype does not.
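
The low-rank idea summarized under [1] can be illustrated with a small NumPy sketch. It is not the FaST-LMM implementation: the function names, the simulated genotypes, and the unnormalized similarity matrix are assumptions made for illustration only. When the similarity matrix is built from fewer SNPs than individuals, its nonzero spectrum can be read off a thin SVD of the SNP matrix, so the full individuals-by-individuals matrix never needs to be decomposed.

```python
import numpy as np

def thin_every_kth(snps, k):
    """Keep every k-th SNP column (a crude stand-in for LD-based thinning)."""
    return snps[:, ::k]

def low_rank_spectrum(snps):
    """Nonzero eigenvalues and eigenvectors of snps @ snps.T via a thin SVD.

    For an (individuals x SNPs) matrix with fewer SNPs than individuals,
    this avoids forming or decomposing the full similarity matrix.
    """
    u, s, _ = np.linalg.svd(snps, full_matrices=False)
    return s ** 2, u

# Simulated genotypes (hypothetical data): 200 individuals, 1000 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(200, 1000)).astype(float)
snps -= snps.mean(axis=0)  # center each SNP

thinned = thin_every_kth(snps, 20)  # 50 SNPs remain, fewer than 200 individuals
eigvals, eigvecs = low_rank_spectrum(thinned)

# Reference computation on the full 200 x 200 similarity matrix, for comparison only.
full_kernel = thinned @ thinned.T
full_eigvals = np.linalg.eigvalsh(full_kernel)[::-1][: thinned.shape[1]]
```

Here `eigvals` should agree with `full_eigvals` up to floating-point error, while the thin SVD only ever handles a 200 x 50 matrix.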

Set Tests for GWAS

  1. J. Listgarten*, C. Lippert*, Eun Yong Kang, Jing Xiang, Carl M. Kadie, D. Heckerman*. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics, 29:1526-1533, April 2013 (doi:10.1093/bioinformatics/btt177). (*equal contributions)
    • This paper demonstrated how to efficiently test sets of SNPs using an LMM when a low-rank background kernel is needed, as in [2], and that the likelihood ratio test (LRT) can be more powerful than a score test. We also introduced a new way to compute p-values for the LRT in this variance component setting.
  2. C. Lippert, Jing Xiang, Danilo Horta, Christian Widmer, Carl M. Kadie, D. Heckerman, J. Listgarten. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics, 2014 (doi: 10.1093/bioinformatics/btu504).
    • This paper makes theoretical arguments, and demonstrates empirically, that the LRT is often more powerful in practice than the traditionally used score test (e.g. SKAT), except when the signal is so weak that the power is in any case not useful. It also explains how to efficiently perform a number of algebraic computations for set tests with either a low-rank or full-rank background kernel.

Epigenetic Cellular Heterogeneity Correction (EWAS)

  1. J. Zou, C. Lippert, D. Heckerman, M. Aryee, Jennifer Listgarten. Epigenome-wide association studies without the need for cell-type composition. Nature Methods, doi:10.1038/NMETH.2815.
    • In this paper, we leveraged our work from [1] and [2], combined with adding principal components, to correct for the confounding effects of cellular heterogeneity in methylation association studies. Notably, this is achieved without any knowledge of which cell types are present, and without any auxiliary data of any kind.
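
As a rough illustration of the "adding principal components" step mentioned above, the sketch below computes the top principal components of a methylation matrix and appends them to the covariates. It is not the EWASher procedure itself: the function name, the simulated data, and the choice of two components are assumptions for illustration, and how the published method selects components is not reproduced here.

```python
import numpy as np

def top_principal_components(data, n_components):
    """Return top principal-component scores of a (samples x features) matrix."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated methylation levels (hypothetical data): 100 samples, 500 sites.
rng = np.random.default_rng(1)
methylation = rng.random((100, 500))

base_covariates = np.ones((100, 1))  # intercept only
pcs = top_principal_components(methylation, n_components=2)  # 2 is an arbitrary choice

# Covariate matrix that an association model could then condition on.
covariates = np.hstack([base_covariates, pcs])
```

The resulting `covariates` matrix has one row per sample, with the intercept followed by one column per principal component.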

Epistatic Genome-Wide Association

  1. C. Lippert*, J. Listgarten*, Robert Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, D. Heckerman*. An Exhaustive Epistatic SNP Association Analysis on Expanded Wellcome Trust Data. Scientific Reports, 2013, doi:10.1038/srep01099 (*equal contributions)
    • In this work, we computed, by brute force, all possible pairwise-epistatic tests for all phenotypes in the WTCCC1 data, by leveraging our fast low-rank computations in [2]. As mentioned, subsequently, these low-rank approaches were shown to not always control type 1 error [5], and so some of the results may have inflated test statistics. The rank order of the hits may be approximately correct, and therefore we have left these results on the Azure marketplace (http://datamarket.azure.com/dataset/microsoftresearch/epistasisgwas).
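
To give a sense of what "all possible pairwise-epistatic tests" involves, the sketch below enumerates every unordered SNP pair and forms an interaction term for each. It is only an illustration under stated assumptions: the function name and the simulated data are hypothetical, the product of two genotype columns is one common way to encode an interaction, and the paper's actual test statistic is not reproduced. For m SNPs there are m(m-1)/2 pairs, which is why an exhaustive scan relies on fast per-test computations.

```python
from itertools import combinations
from math import comb

import numpy as np

def pairwise_interaction_terms(snps):
    """Yield (i, j, product) for every unordered pair of SNP columns."""
    for i, j in combinations(range(snps.shape[1]), 2):
        yield i, j, snps[:, i] * snps[:, j]

# Simulated genotypes (hypothetical data): 50 individuals, 6 SNPs coded 0/1/2.
rng = np.random.default_rng(2)
snps = rng.integers(0, 3, size=(50, 6)).astype(float)

pairs = list(pairwise_interaction_terms(snps))
n_pairs = len(pairs)  # equals comb(6, 2)
```

Each interaction term would then be tested against the phenotype in its own model, so the total cost grows quadratically with the number of SNPs.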