dc.contributor.advisor: Gaynanova, Irina
dc.creator: Zhang, Yunfeng
dc.date.accessioned: 2021-02-03T17:11:38Z
dc.date.available: 2022-08-01T06:51:42Z
dc.date.created: 2020-08
dc.date.issued: 2020-07-10
dc.date.submitted: August 2020
dc.identifier.uri: https://hdl.handle.net/1969.1/192350
dc.description.abstract: Multi-view data, that is, matched sets of measurements on the same subjects, have become increasingly common with technological advances in genomics, neuroscience, and wearable technologies. Despite their prevalence, traditional techniques for classification or association analysis cannot be applied to multi-view data, since they do not take into account the heterogeneity between the views. In this dissertation, we focus on generalizing existing high-dimensional methods to multi-view data. First, we propose a framework for the Joint Association and Classification Analysis of multi-view data (JACA). We support the methodology with theoretical guarantees for estimation consistency in high-dimensional settings and with numerical comparisons against existing methods. In addition, our approach can use partial information when class labels or subsets of views are missing. Second, we investigate the Pan-Cancer data with the goal of assessing the strength of association between different estimates of cellular composition, using the Generalized Association Study framework. We extract the shared and individual signals from each view and evaluate their relationship with survival in order to identify biomarkers that are predictive of cancer prognosis. Lastly, we propose a low-rank canonical correlation analysis framework that models heterogeneous data (both Gaussian and non-Gaussian) using exponential family distributions. We exploit a decomposition-based strategy to extract shared and individual structures from the underlying natural parameter matrices. In contrast to existing methods, our approach guarantees that no shared information is embedded in the individual structures. An alternating split orthogonal constraints algorithm is developed to estimate the model parameters, and simulation studies show the advantages of the proposed approach over other classical methods. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Multi-view data [en]
dc.subject: Canonical correlation analysis [en]
dc.subject: Discriminant analysis [en]
dc.subject: Sparsity [en]
dc.subject: Variable selection [en]
dc.title: Statistical Inference for Multi-view Data [en]
dc.type: Thesis [en]
thesis.degree.department: Statistics [en]
thesis.degree.discipline: Statistics [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Huang, Jianhua
dc.contributor.committeeMember: Zhang, Xianyang
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text [en]
dc.date.updated: 2021-02-03T17:11:39Z
local.embargo.terms: 2022-08-01
local.etdauthor.orcid: 0000-0001-7865-3165
