IMS-Microsoft Research Workshop: Foundations of Data Science – Dense and Sparse Signal Detection in Genetic and Genomic Studies
Massive genetic and genomic data present many exciting opportunities, as well as challenges in data analysis and result interpretation; for example, how can we develop effective strategies for signal detection when signals are weak and sparse? Many variable selection methods have been developed in the statistical literature for the analysis of high-dimensional data. However, limited work has been done on statistical inference for massive data. In this talk, I will discuss hypothesis testing for high-dimensional data, motivated by gene-, pathway-, and network-based analysis in genome-wide association studies using array and sequencing data. I will focus on signal detection when signals are weak and sparse, which is the case in genetic and genomic association studies. I will discuss hypothesis testing for signal detection using penalized-likelihood-based methods and a method based on aggregated marginal test statistics, the Generalized Higher Criticism (GHC) test. The results are illustrated using data from genome-wide association studies.
- Xihong Lin
- Harvard University
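
For readers unfamiliar with the idea of aggregating marginal test statistics, the following is a minimal sketch of the classical Higher Criticism statistic, which assumes independent marginal tests. It is not the Generalized Higher Criticism test discussed in the talk, which is designed to account for correlation among the test statistics. The function names and the `alpha0` truncation parameter are illustrative assumptions, not taken from the abstract.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def higher_criticism(p_values, alpha0=0.5):
    """Classical Higher Criticism statistic (independence assumed).

    Computes the maximum, over the smallest alpha0 fraction of the sorted
    p-values p_(1) <= ... <= p_(n), of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    Large values suggest that a few weak signals are present.
    NOTE: this is the classical statistic, not the GHC test from the talk.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("p_values must be non-empty")
    sorted_p = sorted(p_values)
    k = max(1, int(alpha0 * n))  # number of smallest p-values to scan
    best = -math.inf
    for i, p in enumerate(sorted_p[:k], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        stat = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, stat)
    return best


if __name__ == "__main__":
    # Hypothetical marginal z-scores for a handful of variants (made-up values).
    z_scores = [0.3, -1.1, 2.9, 0.7, -0.2, 3.4, 1.0, -0.5]
    p_vals = [two_sided_p_value(z) for z in z_scores]
    print(higher_criticism(p_vals))
```

In practice the observed statistic would be compared against its null distribution (for example, by simulation) to obtain a p-value for the whole gene or pathway; that step is omitted here.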