dc.contributor.advisor: Sinha, Samiran
dc.creator: Xue, Jingnan
dc.date.accessioned: 2020-02-24T21:49:54Z
dc.date.available: 2020-02-24T21:49:54Z
dc.date.created: 2017-08
dc.date.issued: 2017-05-30
dc.date.submitted: August 2017
dc.identifier.uri: https://hdl.handle.net/1969.1/187253
dc.description.abstract: Big data analysis and high-dimensional data analysis are two popular and challenging topics in current statistical research. They bring many opportunities as well as many challenges. For big data, traditional methods are generally not efficient enough in either time or space. For high-dimensional data, most traditional methods cannot be implemented, let alone maintain their desirable properties, such as consistency. In this dissertation, three new strategies are proposed to address these issues. HZSIS is a robust model-free variable screening method that possesses the sure screening property under the ultrahigh-dimensional setting. It is based on the nonparanormal transformation and Henze-Zirkler’s test. The numerical results indicate that, compared to existing methods, the proposed method is more robust to data generated from heavy-tailed distributions and/or complex models with interaction variables. Double Parallel Monte Carlo is a simple, practical, and efficient MCMC algorithm for Bayesian analysis of big data. The proposed algorithm divides the big dataset into smaller subsets and provides a simple method to aggregate the subset posteriors to approximate the full-data posterior. To further speed up computation, it employs the population stochastic approximation Monte Carlo (Pop-SAMC) algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Since the proposed algorithm involves two levels of parallelism, data parallelism and simulation parallelism, it is named “Double Parallel Monte Carlo”. The validity of the proposed algorithm is justified both mathematically and numerically. The Average Bayesian Information Criterion (ABIC) and its high-dimensional variant, the Average Extended Bayesian Information Criterion (AEBIC), provide an innovative way to use posterior samples to conduct model selection. The consistency of this method is established for the high-dimensional generalized linear model under some sparsity and regularity conditions. The numerical results also indicate that, when the sample size is large enough, this method can accurately select the smallest true model with high probability. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Variable selection [en]
dc.subject: variable screening [en]
dc.subject: ultrahigh dimensional data analysis [en]
dc.subject: big data [en]
dc.subject: parallel computing [en]
dc.subject: MCMC [en]
dc.title: Robust Model-free Variable Screening, Double-parallel Monte Carlo and Average Bayesian Information Criterion [en]
dc.type: Thesis [en]
thesis.degree.department: Statistics [en]
thesis.degree.discipline: Statistics [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Mallick, Bani
dc.contributor.committeeMember: Bhattacharya, Anirban
dc.contributor.committeeMember: Zhou, Jianxin
dc.type.material: text [en]
dc.date.updated: 2020-02-24T21:49:54Z
local.etdauthor.orcid: 0000-0001-9679-915X
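
Illustrative note on the abstract: HZSIS is described as being based on the nonparanormal transformation, which maps each variable to approximately Gaussian scores through its empirical distribution. The sketch below is not the dissertation's implementation. It is a minimal rank-based normal-score transform, given only to show why such a step is insensitive to heavy tails: the output depends on the ranks of the observations, not their magnitudes. The function name `normal_scores` and the `rank / (n + 1)` scaling are assumptions made for this example; published nonparanormal estimators often use a truncated (Winsorized) empirical CDF instead, and the Henze-Zirkler test step is not shown.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map a sample to approximate standard-normal scores via its ranks.

    Hypothetical helper for illustration; ties are broken by position,
    which is a simplification.
    """
    n = len(values)
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) lies strictly inside (0, 1), so the inverse
        # normal CDF is always finite.
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores


# A sample with two extreme values; only the ordering affects the result.
sample = [0.1, 250.0, 3.2, -40.0, 7.5]
scores = normal_scores(sample)
```

Replacing 250.0 with any larger number leaves `scores` unchanged, which is the sense in which a rank-based transform is robust to outliers.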

