Statistical Inference for Multi-view Data

Zhang, Yunfeng

dc.contributor.advisor	Gaynanova, Irina
dc.creator	Zhang, Yunfeng
dc.date.accessioned	2021-02-03T17:11:38Z
dc.date.available	2022-08-01T06:51:42Z
dc.date.created	2020-08
dc.date.issued	2020-07-10
dc.date.submitted	August 2020
dc.identifier.uri	https://hdl.handle.net/1969.1/192350
dc.description.abstract	Multi-view data, that is, matched sets of measurements on the same subjects, have become increasingly common with technological advances in genomics, neuroscience, and wearable technologies. Despite their prevalence, traditional techniques for classification or association analysis cannot be applied to multi-view data because they do not account for the heterogeneity between the views. In this dissertation, we focus on generalizing existing high-dimensional methods to multi-view data. First, we propose a framework for the Joint Association and Classification Analysis of multi-view data (JACA). We support the methodology with theoretical guarantees for estimation consistency in high-dimensional settings and with numerical comparisons against existing methods. In addition, our approach can use partial information when class labels or subsets of views are missing. Second, we investigate Pan-Cancer data with the goal of assessing the strength of association between different cellular composition estimates by exploring the Generalized Association Study framework. We extract the shared and individual signals from each view and evaluate their relationship with survival to identify biomarkers that are predictive of cancer prognosis. Lastly, we propose a low-rank canonical correlation analysis framework to model heterogeneous data (both Gaussian and non-Gaussian) using exponential family distributions. We exploit a decomposition-based strategy to extract shared and individual structures from the underlying natural parameter matrices. In contrast to existing methods, our approach guarantees that no shared information is embedded in the individual structures. An alternating split orthogonal constraints algorithm is developed to estimate the model parameters, and simulation studies show the advantages of the proposed approach over other classical methods.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Multi-view data	en
dc.subject	Canonical correlation analysis	en
dc.subject	Discriminant analysis	en
dc.subject	Sparsity	en
dc.subject	Variable selection	en
dc.title	Statistical Inference for Multi-view Data	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Huang, Jianhua
dc.contributor.committeeMember	Zhang, Xianyang
dc.contributor.committeeMember	Qian, Xiaoning
dc.type.material	text	en
dc.date.updated	2021-02-03T17:11:39Z
local.embargo.terms	2022-08-01
local.etdauthor.orcid	0000-0001-7865-3165

Files in this item

Name: ZHANG-DISSERTATION-2020.pdf
Size: 1.727 MB
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
