IMS-Microsoft Research Workshop: Foundations of Data Science – False Discovery Rates – a new deal
Session Chair Intro: Statistical and Computational Challenges in Biology – Rafael Irizarry, Harvard University
Matthew Stephens, University of Chicago
False Discovery Rates – a new deal
False Discovery Rate (FDR) methodology, first put forward by Benjamini and Hochberg, and further developed by many authors – including Storey, Tibshirani, and Efron – is now one of the most widely used statistical methods in large-scale scientific data analysis, particularly in genomics. A typical genomics workflow consists of (i) estimating thousands of effects and their associated p values, and (ii) feeding these p values to software (e.g. the widely used qvalue package) to estimate the FDR for any given significance threshold. In this talk we take a fresh look at this problem, and highlight two deficiencies of this standard pipeline that we believe could be improved. First, current methods, being based directly on p values (or z scores), fail to fully account for the fact that some measurements are more precise than others. Second, current methods assume that the least significant p values (those near 1) are all null – something that initially appears intuitive, but will not necessarily hold in practice. We suggest simple approaches to address both issues, and demonstrate the potential for these methods to increase the number of discoveries at a given FDR threshold. We also discuss the connection between this problem and shrinkage estimation, and problems involving sparsity more generally.
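To make step (ii) of the standard pipeline concrete, here is a minimal sketch of the classical Benjamini-Hochberg step-up procedure applied to a list of p values. It illustrates the existing p-value-based approach that the talk sets out to improve on. It is not the method proposed in the talk, and it is not the qvalue package's algorithm. The function names `benjamini_hochberg` and `bh_adjusted` and the example p values are hypothetical, chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical BH step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected
    while targeting FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p values, from most to least significant.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max most significant hypotheses.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def bh_adjusted(p_values):
    """BH-adjusted p values: the smallest FDR level at which each test is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the least to the most significant p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
example_adjusted = bh_adjusted(example_p)
```

Note that this rule sees only the p values: two effects with the same p value are treated identically even if one was measured far more precisely than the other, which is the first deficiency the abstract highlights.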
- Rafael Irizarry and Matthew Stephens
- Harvard University, University of Chicago