Statistical Inference in High-Dimensional Models
Abstract
This dissertation consists of two independent studies on statistical inference in high-dimensional models. The first study considers the high-dimensional linear model, where the number of predictors is greater than the sample size. The second study covers high-dimensional association tests in genomics, where the number of features exceeds the sample size.
In the first study, we develop a new method to estimate the projection direction in the debiased Lasso estimator. The basic idea is to decompose the overall bias into two terms corresponding to strong and weak signals, respectively. We propose to estimate the projection direction by balancing the squared biases associated with the strong and weak signals against the variance of the projection-based estimator. A standard quadratic programming solver can efficiently solve the resulting optimization problem. In theory, we show that the unknown set of strong signals can be consistently estimated and that the projection-based estimator is asymptotically normal under suitable assumptions. A slight modification of our procedure leads to an estimator with a potentially smaller order of bias than that of the original debiased Lasso. We further generalize our method to conduct inference for a sparse linear combination of the regression coefficients. Numerical studies demonstrate the advantage of the proposed approach in coverage accuracy over some existing alternatives.
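To make the bias and variance terms mentioned in the abstract concrete, the display below writes out the generic projection-based (debiased) estimator of a single coefficient and the algebraic split of its error. The notation is an assumption introduced here for illustration and is not taken from the dissertation: response $y = X\beta + \varepsilon$ with noise variance $\sigma^2$, initial Lasso estimate $\hat\beta$, projection direction $u$, sample Gram matrix $\hat\Sigma = X^\top X / n$, and a set $S$ of strong signals. The exact objective optimized in the dissertation may differ from this sketch.

```latex
% Generic projection-based estimator of \beta_j (notation assumed, see lead-in)
\hat b_j = \hat\beta_j + \frac{1}{n}\, u^\top X^\top \bigl(y - X\hat\beta\bigr),
\qquad \hat\Sigma = \frac{1}{n} X^\top X .

% Substituting y = X\beta + \varepsilon gives a noise term plus a bias term,
% and the bias term splits over the strong set S and its complement S^c:
\hat b_j - \beta_j
  = \underbrace{\frac{1}{n}\, u^\top X^\top \varepsilon}_{\text{noise}}
  + \underbrace{\bigl(e_j - \hat\Sigma u\bigr)_S^\top \bigl(\hat\beta - \beta\bigr)_S}_{\text{bias from strong signals}}
  + \underbrace{\bigl(e_j - \hat\Sigma u\bigr)_{S^c}^\top \bigl(\hat\beta - \beta\bigr)_{S^c}}_{\text{bias from weak signals}} .

% Conditional on X, the noise term has variance
\operatorname{Var}\!\Bigl(\tfrac{1}{n}\, u^\top X^\top \varepsilon \,\Big|\, X\Bigr)
  = \frac{\sigma^2}{n}\, u^\top \hat\Sigma\, u .
```

The variance is quadratic in $u$, which is consistent with the abstract's remark that the direction can be found with a quadratic programming solver once the bias terms are bounded or penalized appropriately.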
The second study presents a novel two-stage approach for more powerful confounder adjustment in large-scale multiple testing, striking a balance between the Type I error and power. Specifically, we use the unadjusted z-statistics to enrich signals in the first stage and then use the adjusted z-statistics to remove the false signals due to confounders in the second stage. We develop a new way of simultaneously choosing the cutoffs for the two stages, based on estimates of the false rejections obtained with a nonparametric empirical Bayes approach. We show that our proposed method provides asymptotic false discovery rate control and delivers more power than the traditional one-stage approach. Promising finite-sample performance is demonstrated via simulations and a real data illustration in comparison with existing competitors.
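The two-stage filtering logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_stage_reject`, the cutoffs `t1` and `t2`, and the toy z-statistics are hypothetical, and the cutoffs are passed in as fixed numbers. The dissertation instead chooses both cutoffs simultaneously from empirical Bayes estimates of the false rejections, which this sketch does not attempt to reproduce.

```python
def two_stage_reject(z_unadjusted, z_adjusted, t1, t2):
    """Return the indices rejected by a simple two-stage screen.

    Stage 1 keeps hypotheses whose unadjusted |z| is at least t1.
    Stage 2 keeps, among those, hypotheses whose adjusted |z| is at least t2.
    The cutoffs t1 and t2 are hypothetical fixed inputs for illustration.
    """
    if len(z_unadjusted) != len(z_adjusted):
        raise ValueError("both z-statistic lists must have the same length")
    # Stage 1: enrich signals using the unadjusted statistics.
    stage_one = [i for i, z in enumerate(z_unadjusted) if abs(z) >= t1]
    # Stage 2: drop candidates that do not survive confounder adjustment.
    return [i for i in stage_one if abs(z_adjusted[i]) >= t2]


# Toy data (made up for illustration only).
z_u = [3.1, 0.4, 2.8, 2.9, -3.5]
z_a = [2.9, 2.7, 0.3, 2.6, -3.0]

rejected = two_stage_reject(z_u, z_a, t1=2.5, t2=2.5)
print(rejected)
```

Setting `t1=0.0` disables the first stage, so the function reduces to a one-stage screen on the adjusted statistics alone, which gives a simple baseline for comparison.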
Subject
Confidence interval
High-dimensional linear models
Lasso
Quadratic programming
Benjamini-Hochberg procedure
Confounding factor
Empirical Bayes
False discovery rate
Multiple testing
Citation
Yi, Sangyoon (2020). Statistical Inference in High-Dimensional Models. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/192358.