Semiparametric Efficient Estimators in Primary and Secondary Analysis of Case-Control Studies
Abstract
As a cost-efficient alternative to the cohort design, the case-control design is widely used in epidemiological studies. The primary analysis of a case-control study focuses on the relationship between disease status and potential risk factors, while the secondary analysis examines the interrelationships among the risk factors. This dissertation considers three semiparametric models that arise in primary and secondary analysis of case-control studies and develops novel semiparametric estimators with high estimation efficiency.
We first investigate a special primary analysis problem, the gene-environment interaction model under the gene-environment independence assumption. While all existing approaches that exploit this assumption rely on a rare disease assumption and/or a distributional assumption on the genetic variable, we allow the disease rate and the distributions of the genetic and environmental variables in the underlying source population to be unknown. Under this flexible semiparametric model, we derive the semiparametric efficient estimator and show, through various numerical illustrations, that it outperforms prospective logistic regression, the standard approach in primary analysis.
In the secondary conditional mean regression model, we analyze the interrelationship between covariates when only a conditional mean model is specified. Because the error distribution is unknown and the data are collected under case-control sampling, semiparametric efficient estimation requires multivariate nonparametric regression on various quantities, which suffers from the curse of dimensionality as the dimension of the covariates increases. We bypass this problem by devising a dimension reduction approach. The resulting estimator is robust to misspecification of the regression error distribution and shows substantial efficiency gains over several existing methods.
Lastly, we consider a secondary conditional quantile regression problem, a preferable model in epidemiology when high or low values in the population are associated with high risks. Under a semiparametric framework that allows the covariate distribution to be nonparametric, we derive a class of consistent semiparametric estimators and identify the efficient member. The resulting estimator dominates the weighted estimating equation approach, the only published approach to secondary quantile regression, both theoretically and numerically.
Subject
Biased samples
Case-control study
Gene-environment interaction
Primary analysis
Secondary analysis
Semiparametric estimation
Heteroscedastic errors
Quantile regression
Citation
Liang, Liang (2017). Semiparametric Efficient Estimators in Primary and Secondary Analysis of Case-Control Studies. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/161456.