Bayesian Variable Selection in High Dimensional Genomic Studies Using Nonlocal Priors

Nikooienejad, Amir

Date

2017-12-01

Author

Nikooienejad, Amir

Abstract

The advent of new genomic technologies has resulted in the production of massive data sets. The outcomes in such experiments are often binary vectors or survival times, and the covariates are gene expressions obtained from thousands of genes under study. Analysis of these data, especially gene selection for a specific outcome, requires new statistical and computational methods. In this dissertation, I address this problem and propose one such method that is shown to be advantageous in selecting explanatory variables for prediction of binary responses and survival times. I adopt a Bayesian approach that utilizes a mixture of nonlocal prior densities and point masses on the regression coefficient vectors. The proposed method provides improved performance in identifying true models and reducing estimation and prediction error rates in a number of simulation studies for both binary and survival outcomes. I also describe a computational algorithm that can be used to implement the methodology in ultrahigh-dimensional settings (p ≫ n). In particular, for datasets with survival responses, I show that MCMC is not feasible and instead provide a stochastic search algorithm that is scalable and p-invariant. As part of the variable selection methodology, I also propose a novel approach for setting prior hyperparameters by examining the total variation distance between the prior distributions on the regression parameters and the distribution of the maximum likelihood estimator under the null distribution. An R package, BVSNLP, is also introduced in this dissertation as a final product that implements all of the methodology described here. It performs high-dimensional Bayesian variable selection for datasets with binary and survival outcomes and is expected to have a variety of applications, including cancer genomic studies.
Another problem addressed in this dissertation is the methodology for deriving Uniformly Most Powerful Bayesian Tests (UMPBTs) and extending them from exponential family distributions to a larger class of testing contexts. UMPBTs are an objective class of Bayesian hypothesis tests that can be considered the Bayesian counterpart of classical uniformly most powerful tests. However, they have previously been developed only for one-parameter exponential family models. I introduce sufficient conditions for the existence of UMPBTs and propose a unified approach for their derivation. An important application of my methodology is the extension of UMPBTs to testing whether the noncentrality parameter of a χ² distribution is zero.

Citation

Nikooienejad, Amir (2017). Bayesian Variable Selection in High Dimensional Genomic Studies Using Nonlocal Priors. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/173161.