{"id":171427,"date":"2014-12-17T13:47:15","date_gmt":"2014-12-17T13:47:15","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/fast-lmm-factored-spectrally-transformed-linear-mixed-models-2\/"},"modified":"2019-08-15T10:29:06","modified_gmt":"2019-08-15T17:29:06","slug":"fast-lmm-software-papers","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/fast-lmm-software-papers\/","title":{"rendered":"FaST-LMM"},"content":{"rendered":"<p>FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for performing efficient genome-wide association studies (GWAS) on large data sets. FaST-LMM runs on both Windows and Linux, and has been tested on data sets with over one million samples.<\/p>\n<p>FaST-LMM applications include single-SNP testing, SNP-set testing, tests for epistasis, and heritability estimation.<\/p>\n<h2><strong>Software versions for FaST-LMM <\/strong><\/h2>\n<p>FaST-LMM (python): <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/MicrosoftGenomics\/FaST-LMM\"><span style=\"color: #008000;\">This version is our most up-to-date release<\/span> <span style=\"color: #008000;\">and available on GitHub<\/span><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0 It supports univariate GWAS, set tests, epistatic tests, and heritability estimation. 
The release includes <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/nbviewer.ipython.org\/github\/MicrosoftGenomics\/FaST-LMM\/blob\/master\/doc\/ipynb\/FaST-LMM.ipynb\">ipython notebook examples<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/research.microsoft.com\/en-us\/um\/redmond\/projects\/MSCompBio\/Fastlmm\/api\/\">API documentation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0 An example of\u00a0FaST-LMM\u00a0with cloud computing is\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/blogs.technet.microsoft.com\/machinelearning\/2016\/05\/27\/predicting-traits-from-genomic-data-using-the-microsoft-azure-linux-data-science-vm\/\">here<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p>FaST-LMM (C++): This version supports univariate GWAS and epistatic tests. 
The release includes <a href=\"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=52614\">Windows binary<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=52588\">Linux binary<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=52559\">source<\/a>.<\/p>\n<p>EWASher: This version supports corrections for cellular heterogeneity in methylation and similar data.\u00a0 The release includes a <a href=\"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=52345\">python version<\/a> and an <a href=\"http:\/\/www.microsoft.com\/downloads\/details.aspx?displaylang=en&FamilyID=b5775c78-935d-40a4-8151-5c88116b676e\">R version<\/a>, although the R version has been reported to be\u00a0difficult to run, so we advise using the python version.<\/p>\n<h2><strong>Annotated Bibliography<\/strong><\/h2>\n<h3><strong>Univariate GWAS<\/strong><\/h3>\n<ol>\n<li>C. Lippert<strong><sup>*<\/sup><\/strong>, J. Listgarten<strong><sup>*<\/sup><\/strong>, Y. Liu, C.M. Kadie, R.I. Davidson, D. Heckerman<strong><sup>*<\/sup><\/strong>.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.nature.com\/nmeth\/journal\/v8\/n10\/abs\/nmeth.1681.html\">FaST linear mixed models for genome-wide association studies<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0<em>Nature Methods<\/em>, 8: 833-835, Oct 2011 (doi:10.1038\/nmeth.1681). (<sup>*<\/sup>equal contributions)\n<ul>\n<li>In this paper, we showed how estimating the genetic similarity matrix (GSM) from fewer SNPs than individuals leads to computations that are linear in time and memory instead of cubic and quadratic, respectively. To thin our SNPs so as to achieve this condition, we relied on linkage disequilibrium, taking every Kth SNP, and showed the trade-off between using this reduced set and using all available SNPs.<\/li>\n<\/ul>\n<\/li>\n<li>J. 
Listgarten<strong><sup>*<\/sup><\/strong>, C. Lippert<strong><sup>*<\/sup><\/strong>, C.M. Kadie, R.I. Davidson, E. Eskin, D. Heckerman<strong><sup>*<\/sup><\/strong>. Improved linear mixed models for genome-wide association studies.\u00a0<em>Nature Methods<\/em>, 9: 525-526, June 2012 (doi:10.1038\/nmeth.2037). (<sup>*<\/sup>equal contributions)\n<ul>\n<li>In this paper, we described an alternative method for selecting the SNPs, so as to leverage the computational efficiencies in [1] while simultaneously improving the model (i.e., maintaining type 1 error control and improving power). Subsequent to this paper, we and others found that, in some settings, this feature selection alone could fail to control the type 1 error. This led to a modified approach developed and demonstrated in [5].<\/li>\n<\/ul>\n<\/li>\n<li>J. Listgarten<strong><sup>*<\/sup><\/strong>, C. Lippert<strong><sup>*<\/sup><\/strong>, D. Heckerman<strong><sup>*<\/sup><\/strong>. FaST-LMM-Select for addressing confounding from spatial structure and rare variants.\u00a0<em>Nature Genetics<\/em> (2013) doi:10.1038\/ng.2620 (<sup>*<\/sup>equal contributions)\n<ul>\n<li>In this paper, we showed how out-of-the-box application of our approach in [2] solved an open problem in statistical genetics that had been published in Nature Genetics. The problem was that none of the available methods (the authors did not try [2]) could control the type 1 error when there was a &#8220;sharply-peaked, spatial, non-genetic risk&#8221; and rare variants in a GWAS.<\/li>\n<\/ul>\n<\/li>\n<li>C. Lippert<strong><sup>*<\/sup><\/strong>, Gerald Quon, Eun Yong Kang, Carl M. Kadie, J. Listgarten<strong><sup>*<\/sup><\/strong>, D. 
Heckerman<strong><sup>*<\/sup><\/strong>.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.nature.com\/srep\/2013\/130509\/srep01815\/full\/srep01815.html\">The benefits of selecting phenotype-specific variants for applications of mixed models in genomics<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0<em>Scientific Reports<\/em> (2013) doi:10.1038\/srep01815 (<sup>*<\/sup>equal contributions)\n<ul>\n<li>In this paper, we characterized empirically how feature selection of SNPs for the GRM could help improve GWAS and prediction. As stated with respect to [2], some of these ideas didn&#8217;t generalize to all settings, as shown and corrected in [5].<\/li>\n<\/ul>\n<\/li>\n<li>C. Widmer*, C. Lippert*, O. Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J. Listgarten, and D. Heckerman*.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.nature.com\/srep\/2014\/141112\/srep06874\/full\/srep06874.html\">Further Improvements to Linear Mixed Models for Genome-Wide Association Studies<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. <em>Scientific Reports<\/em>, 4, 6874, Nov 2014 (doi:10.1038\/srep06874). (<sup>*<\/sup>equal contributions)\n<ul>\n<li>Describes the latest version of FaST-LMM. It shows that selecting SNPs for the linear-mixed-model similarity matrix through pruning via linkage disequilibrium works well to control type I error, and that selecting SNPs that are predictive of the phenotype does not.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3><strong>Set Tests\u00a0for GWAS<\/strong><\/h3>\n<ol start=\"6\">\n<li>J. Listgarten<strong><sup>*<\/sup><\/strong>, C. Lippert<strong><sup>*<\/sup><\/strong>, Eun Yong Kang, Jing Xiang, Carl M. Kadie, D. 
Heckerman<strong><sup>*<\/sup><\/strong>.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/bioinformatics.oxfordjournals.org\/content\/29\/12\/1526\">A powerful and efficient set test for genetic markers that handles confounders.<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> <em>Bioinformatics<\/em>, 29:1526-1533, April 2013 (doi:10.1093\/bioinformatics\/btt177). (<sup>*<\/sup>equal contributions)\n<ul>\n<li>This paper demonstrated how to efficiently test sets of SNPs using an LMM when a low-rank background kernel is needed, as in [2], and that the LRT can be more powerful than a score test. We also introduced a new way to compute p-values for the LRT in this variance component setting.<\/li>\n<\/ul>\n<\/li>\n<li>C. Lippert, Jing Xiang, Danilo Horta, Christian Widmer, Carl M. Kadie, D. Heckerman, J. Listgarten. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/bioinformatics.oxfordjournals.org\/content\/30\/22\/3206\">Greater power and computational efficiency for kernel-based association testing of sets of genetic variants<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0<em>Bioinformatics<\/em>, 2014 (doi: 10.1093\/bioinformatics\/btu504).\n<ul>\n<li>This paper makes theoretical arguments, and demonstrates empirically, that in practice the LRT is often more powerful than the traditionally used score test (e.g. SKAT), except when the signal is so weak that the power is in any case not useful. It also explains how to efficiently perform a number of algebraic computations for set tests with either a low- or full-rank background kernel.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3><strong>Epigenetic Cellular Heterogeneity Correction (EWAS)<\/strong><\/h3>\n<ol start=\"8\">\n<li>J. Zou, C. Lippert, D. Heckerman, M. 
Aryee, Jennifer Listgarten.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.nature.com\/nmeth\/journal\/v11\/n3\/abs\/nmeth.2815.html\">Epigenome-wide association studies without the need for cell-type composition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0<em>Nature Methods<\/em>, doi:10.1038\/NMETH.2815.\n<ul>\n<li>In this paper, we leveraged our work from [1] and [2], combined with adding principal components, to correct for the confounding effects of cellular heterogeneity in methylation association studies. Notably, this is achieved without any knowledge of which cell types are present, and without any auxiliary data of any kind.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3><strong>Epistatic Genome-Wide Association<\/strong><\/h3>\n<ol start=\"9\">\n<li>C. Lippert<strong><sup>*<\/sup><\/strong>, J. Listgarten<strong><sup>*<\/sup><\/strong>, Robert Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, D. Heckerman<strong><sup>*<\/sup><\/strong>.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.nature.com\/srep\/2013\/130122\/srep01099\/full\/srep01099.html\">An Exhaustive Epistatic SNP Association Analysis on Expanded Wellcome Trust Data<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <em>Scientific Reports<\/em>, 2013, doi:10.1038\/srep01099 (<sup>*<\/sup>equal contributions)\n<ul>\n<li>In this work, we computed, by brute force, all possible pairwise-epistatic tests for all phenotypes in the WTCCC1 data, by leveraging our fast low-rank computations in [2]. As mentioned above, these low-rank approaches were subsequently shown to not always control type 1 error [5], and so some of the results may have inflated test statistics. 
The rank order of the hits may be approximately correct, and therefore we have left these results on the Azure marketplace (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/datamarket.azure.com\/dataset\/microsoftresearch\/epistasisgwas\">http:\/\/datamarket.azure.com\/dataset\/microsoftresearch\/epistasisgwas<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>).<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3><\/h3>\n","protected":false},"excerpt":{"rendered":"<p>FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for performing efficient genome-wide association studies (GWAS) on large data sets. FaST-LMM runs on both Windows and Linux, and has been tested on data sets with over one million samples. FaST-LMM applications include single-SNP testing, SNP-set testing, tests for epistasis, and heritability estimation. [&hellip;]<\/p>\n","protected":false},"featured_media":255012,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13553],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-171427","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2011-01-01","related-publications":[],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[{"id":0,"name":"","content":""}],"related-researchers":[],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171427","targetHints":{"allow":["G
ET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":7,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171427\/revisions"}],"predecessor-version":[{"id":603729,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171427\/revisions\/603729"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/255012"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=171427"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=171427"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=171427"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=171427"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=171427"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}