dc.contributor.advisor | Johnson, Valen E. | |
dc.creator | Nikooienejad, Amir | |
dc.date.accessioned | 2019-01-16T20:43:11Z | |
dc.date.available | 2019-12-01T06:34:07Z | |
dc.date.created | 2017-12 | |
dc.date.issued | 2017-12-01 | |
dc.date.submitted | December 2017 | |
dc.identifier.uri | https://hdl.handle.net/1969.1/173161 | |
dc.description.abstract | The advent of new genomic technologies has resulted in the production of massive data
sets. The outcomes in such experiments are often binary vectors or survival times, and the
covariates are gene expressions obtained from thousands of genes under study. Analysis
of these data, especially gene selection for a specific outcome, requires new statistical and
computational methods. In this dissertation, I address this problem and propose one such
method that is shown to be advantageous in selecting explanatory variables for prediction
of binary responses and survival times. I adopt a Bayesian approach that utilizes a mixture
of nonlocal prior densities and point masses on the regression coefficient vectors. The
proposed method provides improved performance in identifying true models and reducing
estimation and prediction error rates in a number of simulation studies for both binary and
survival outcomes.
I also describe a computational algorithm that can be used to implement the methodology
in ultrahigh-dimensional settings (p ≫ n). In particular, for survival response data sets,
I show that MCMC is not feasible and instead provide an algorithm based on
stochastic search that is scalable and invariant to p.
As part of the variable selection methodology, I also propose a novel approach for
setting prior hyperparameters by examining the total variation distance between the prior
distributions on the regression parameters and the distribution of the maximum likelihood
estimator under the null distribution. I also introduce an R package, BVSNLP, that
implements all of the methodology described here. It performs
high-dimensional Bayesian variable selection for binary and survival outcome data sets and
is expected to have a variety of applications, including cancer genomic studies.
Another problem addressed in this dissertation is the methodology for deriving and extending Uniformly Most Powerful Bayesian tests (UMPBTs) from exponential family
distributions to a larger class of testing contexts. UMPBTs are an objective class of
Bayesian hypothesis tests that can be considered the Bayesian counterpart of classical uniformly
most powerful tests. However, they have previously been described only for application
in one-parameter exponential family models. I introduce sufficient conditions for the existence
of UMPBTs and propose a unified approach for their derivation. An important
application of my methodology is the extension of UMPBTs to testing whether the noncentrality
parameter of a χ² distribution is zero. | en |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.subject | Bayesian Variable Selection | en |
dc.subject | Nonlocal Priors | en |
dc.subject | High Dimensional Analysis | en |
dc.subject | Cancer Genomics | en |
dc.subject | Binary Response Data | en |
dc.subject | Survival Data | en |
dc.subject | Uniformly Most Powerful Bayesian Tests (UMPBT) | en |
dc.subject | R package | en |
dc.title | Bayesian Variable Selection in High Dimensional Genomic Studies Using Nonlocal Priors | en |
dc.type | Thesis | en |
thesis.degree.department | Statistics | en |
thesis.degree.discipline | Statistics | en |
thesis.degree.grantor | Texas A & M University | en |
thesis.degree.name | Doctor of Philosophy | en |
thesis.degree.level | Doctoral | en |
dc.contributor.committeeMember | Wang, Wenyi | |
dc.contributor.committeeMember | Bhattacharya, Anirban | |
dc.contributor.committeeMember | Sivakumar, Natarajan | |
dc.type.material | text | en |
dc.date.updated | 2019-01-16T20:43:11Z | |
local.embargo.terms | 2019-12-01 | |
local.etdauthor.orcid | 0000-0003-4696-8300 | |