Show simple item record

dc.contributor.advisorCarroll, Raymond J.
dc.contributor.advisorMa, Yanyuan
dc.creatorRahman, Shahina
dc.date.accessioned2015-10-29T19:59:19Z
dc.date.available2017-08-01T05:37:41Z
dc.date.created2015-08
dc.date.issued2015-08-11
dc.date.submittedAugust 2015
dc.identifier.urihttp://hdl.handle.net/1969.1/155719
dc.description.abstractRegression analysis is one of the most important tools in statistics and is widely used in other scientific fields for projection and for modeling the association between two variables. With modern computing techniques and high-performance devices, regression analysis in multiple dimensions has become an important issue. Our task is to address modeling with no assumption on the mean and variance structure and, further, with no assumption on the error distribution. In other words, we focus on developing robust semiparametric and nonparametric regression methods. In modern genetic epidemiological association studies, it is often important to investigate the relationships among the potential covariates related to disease in case-control data, a study known as "secondary analysis". First, we model the association between the potential covariates nonparametrically in the univariate setting. Then we model the association in the multivariate setting by assuming a convenient and popular multivariate semiparametric model, known as the single-index model. The secondary analysis of case-control studies is particularly challenging for several reasons: (a) the case-control sample is not a random sample, (b) the logistic intercept is practically not identifiable, and (c) misspecification of the error distribution leads to inconsistent results. For rare diseases, controls (individuals free of disease) are typically used for valid estimation. However, numerous publications have sought to use the entire case-control sample (including the diseased individuals) to increase efficiency. Previous work in this context has either specified a fully parametric distribution for the regression errors, specified a homoscedastic distribution for the regression errors, or assumed a parametric form for the regression mean.
In the first chapter, we focus on predicting a univariate covariate Y from another potential univariate covariate X, imposing neither a parametric form on the mean function nor any distributional assumption on the error, and hence addressing potential heteroscedasticity, a problem which has not been studied before. We develop a tilted kernel-based estimator, which is a first attempt to model the mean function nonparametrically in secondary analysis. In the following chapters, we focus on i.i.d. samples to model both the mean and variance functions for predicting Y from multiple covariates X without assuming any form for the regression mean. In particular, we model Y by a single-index model m(X^T ϴ), where ϴ is a single-index vector and m is unspecified. We also model the variance function by another flexible single-index model. We develop a practical and readily applicable Bayesian methodology based on penalized splines and Markov chain Monte Carlo (MCMC), in both the i.i.d. setting and the case-control setting. For efficient estimation, we model the error distribution by a Dirichlet process mixture model of normals (DPMM). In numerical examples, we illustrate the finite-sample performance of the posterior estimates in both the i.i.d. and the case-control settings. For the single-index setting in the i.i.d. case, only one existing work, based on a local linear kernel method, addresses modeling of the variance function. We found that our method based on DPMM vastly outperforms that existing method in terms of mean square efficiency and computational stability. We develop single-index modeling in secondary analysis to introduce flexible mean and variance function modeling in case-control studies, a problem which has not been studied before. We showed that our method is almost twice as efficient as using only controls, which is the typical approach in many cases.
We use real data examples from the NIH-AARP study on breast cancer, from the Colon Cancer Study on red meat consumption, and from the National Morbidity Air Pollution Study to illustrate the computational efficiency and stability of our methods.en
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.subjectBayesian Methodsen
dc.subjectCase-controlen
dc.subjectDirichlet Process of Mixture Modelen
dc.subjectEfficiencyen
dc.subjectHeteroscedasticityen
dc.subjectKernel estimationen
dc.subjectNonparametricen
dc.subjectP-splinesen
dc.subjectRobusten
dc.subjectSecondary Analysisen
dc.subjectSemiparametricen
dc.subjectSingle-Index Modelen
dc.titleEfficient Nonparametric and Semiparametric Regression Methods with application in Case-Control Studiesen
dc.typeThesisen
thesis.degree.departmentStatisticsen
thesis.degree.disciplineStatisticsen
thesis.degree.grantorTexas A & M Universityen
thesis.degree.nameDoctor of Philosophyen
thesis.degree.levelDoctoralen
dc.contributor.committeeMemberSmith, Roger
dc.contributor.committeeMemberHarknett, Urshi Mueller
dc.contributor.committeeMemberMallick, Bani K.
dc.type.materialtexten
dc.date.updated2015-10-29T19:59:19Z
local.embargo.terms2017-08-01
local.etdauthor.orcid0000-0002-8161-9993

