Efficient Nonparametric and Semiparametric Regression Methods with application in Case-Control Studies

Rahman, Shahina

dc.contributor.advisor	Carroll, Raymond J.
dc.contributor.advisor	Ma, Yanyuan
dc.creator	Rahman, Shahina
dc.date.accessioned	2015-10-29T19:59:19Z
dc.date.available	2017-08-01T05:37:41Z
dc.date.created	2015-08
dc.date.issued	2015-08-11
dc.date.submitted	August 2015
dc.identifier.uri	https://hdl.handle.net/1969.1/155719
dc.description.abstract	Regression analysis is one of the most important tools in statistics and is widely used in other scientific fields for projection and for modeling the association between two variables. With modern computing techniques and high-performance devices, regression analysis in multiple dimensions has become an important issue. Our task is to address modeling with no assumption on the mean and variance structure and, further, with no assumption on the error distribution. In other words, we focus on developing robust semiparametric and nonparametric regression methods. In modern genetic epidemiological association studies, it is often important to investigate the relationships among the potential covariates related to disease in case-control data, a study known as "Secondary Analysis". First, we model the association between the potential covariates nonparametrically in the univariate setting. Then we model the association in the multivariate setting by assuming a convenient and popular multivariate semiparametric model, known as the single-index model. The secondary analysis of case-control studies is particularly challenging for several reasons: (a) the case-control sample is not a random sample, (b) the logistic intercept is practically not identifiable, and (c) misspecification of the error distribution leads to inconsistent results. For rare diseases, controls (individuals free of disease) are typically used for valid estimation. However, numerous publications have used the entire case-control sample (including the diseased individuals) to increase efficiency. Previous work in this context has either specified a fully parametric distribution for the regression errors, specified a homoscedastic distribution for the regression errors, or assumed a parametric form for the regression mean. 
In the first chapter, we focus on predicting a univariate covariate Y from another potential univariate covariate X, imposing neither a parametric form on the mean function nor a distributional assumption on the error, and hence addressing potential heteroscedasticity, a problem that has not been studied before. We develop a tilted kernel-based estimator, which is a first attempt to model the mean function nonparametrically in secondary analysis. In the following chapters, we focus on i.i.d. samples and model both the mean and the variance function for predicting Y from multiple covariates X without assuming any form for the regression mean. In particular, we model Y by a single-index model m(X^T ϴ), where ϴ is a single-index vector and m is unspecified. We also model the variance function by another flexible single-index model. We develop a practical and readily applicable Bayesian methodology based on penalized splines and Markov chain Monte Carlo (MCMC), in both the i.i.d. and the case-control settings. For efficient estimation, we model the error distribution by a Dirichlet process mixture model of normals (DPMM). In numerical examples, we illustrate the finite-sample performance of the posterior estimates in both the i.i.d. and the case-control settings. In the i.i.d. single-index setting, only one existing work, based on the local linear kernel method, addresses modeling of the variance function. We found that our DPMM-based method vastly outperforms this existing method in terms of mean square efficiency and computational stability. We develop single-index modeling in secondary analysis to introduce flexible mean and variance function modeling in case-control studies, a problem that has not been studied before. We showed that our method is almost twice as efficient as using only controls, which is the typical approach in many cases. 
We use real data examples from the NIH-AARP study on breast cancer, from the Colon Cancer Study on red meat consumption, and from the National Morbidity Air Pollution Study to illustrate the computational efficiency and stability of our methods.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Bayesian Methods	en
dc.subject	Case-control	en
dc.subject	Dirichlet Process of Mixture Model	en
dc.subject	Efficiency	en
dc.subject	Heteroscedasticity	en
dc.subject	Kernel estimation	en
dc.subject	Nonparametric	en
dc.subject	P-splines	en
dc.subject	Robust	en
dc.subject	Secondary Analysis	en
dc.subject	Semiparametric	en
dc.subject	Single-Index Model	en
dc.title	Efficient Nonparametric and Semiparametric Regression Methods with application in Case-Control Studies	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A & M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Smith, Roger
dc.contributor.committeeMember	Harknett, Urshi Mueller
dc.contributor.committeeMember	Mallick, Bani K.
dc.type.material	text	en
dc.date.updated	2015-10-29T19:59:19Z
local.embargo.terms	2017-08-01
local.etdauthor.orcid	0000-0002-8161-9993

Files in this item

Name: RAHMAN-DISSERTATION-2015.pdf
Size: 804.9Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )