Topics in Measurement Error Analysis and High-Dimensional Binary Classification

Wang, Tianying

dc.contributor.advisor	Carroll, Raymond J
dc.contributor.advisor	Gaynanova, Irina
dc.creator	Wang, Tianying
dc.date.accessioned	2019-01-18T14:18:14Z
dc.date.available	2020-08-01T06:38:45Z
dc.date.created	2018-08
dc.date.issued	2018-07-18
dc.date.submitted	August 2018
dc.identifier.uri	https://hdl.handle.net/1969.1/173923
dc.description.abstract	We propose novel methods to tackle two problems: a misspecified model with measurement error, and high-dimensional binary classification. Both have a crucial impact on applications in public health. The first problem arises in epidemiological practice. Epidemiologists often categorize a continuous risk predictor because categorization is thought to be more robust and interpretable, even when the true risk model is not categorical. Their goal is thus to fit the categorical model and interpret its parameters. We address the question: with measurement error and categorization, how can we do what epidemiologists want, namely estimate the parameters of the categorical model that would have been estimated if the true predictor were observed? We develop a general methodology for such an analysis and illustrate it in linear and logistic regression. Simulation studies are presented, and the methodology is applied to a nutrition data set. Alternative approaches are also discussed. For the second project, we consider the problem of high-dimensional classification between two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, we propose to perform simultaneous variable selection and linear dimension reduction on the original data, and then apply quadratic discriminant analysis on the reduced space. In contrast to quadratic discriminant analysis, the proposed framework does not require estimation of precision matrices and scales linearly with the number of measurements, making it especially attractive for use on high-dimensional datasets. We support the methodology with theoretical guarantees on variable selection consistency and with an empirical comparison with competing approaches. We apply the method to gene expression data from breast cancer patients and confirm the crucial importance of the ESR1 gene in differentiating estrogen receptor status.
Further, we provide software support for the proposed methodology. We develop two R packages, CCP and DAP, and present two vignettes as long-format illustrations of their usage.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	categorization	en
dc.subject	differential misclassification	en
dc.subject	epidemiology practice	en
dc.subject	Inverse problems	en
dc.subject	measurement error	en
dc.subject	convex optimization	en
dc.subject	discriminant analysis	en
dc.subject	high-dimensional statistics	en
dc.subject	variable selection	en
dc.title	Topics in Measurement Error Analysis and High-Dimensional Binary Classification	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A & M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Wang, Suojin
dc.contributor.committeeMember	Zhao, Hongwei
dc.type.material	text	en
dc.date.updated	2019-01-18T14:18:15Z
local.embargo.terms	2020-08-01
local.etdauthor.orcid	0000-0002-2826-5364

Files in this item

Name: WANG-DISSERTATION-2018.pdf
Size: 945.3Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
