
dc.contributor.advisor: Gaynanova, Irina
dc.creator: Yuan, Dongbang
dc.date.accessioned: 2023-05-26T18:05:46Z
dc.date.created: 2022-08
dc.date.issued: 2022-07-25
dc.date.submitted: August 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/197995
dc.description.abstract: We consider the problem of extracting joint and individual signals from multi-view data, that is, data collected from different sources on matched samples. This dissertation makes two main contributions. The first concerns matrix decomposition of double-matched data, that is, data matched by both samples and source features. The motivating example is miRNA data collected from both primary tumor and normal tissues of the same subjects; the measurements from the two tissues are thus matched both by subjects and by miRNAs. Our proposed double-matched matrix decomposition simultaneously extracts joint and individual signals across subjects, as well as joint and individual signals across miRNAs. Our estimation approach takes advantage of double matching by formulating a new type of optimization problem with explicit row-space and column-space constraints, for which we develop an efficient iterative algorithm. Numerical studies indicate that exploiting double matching leads to superior signal estimation performance compared with existing multi-view data decompositions based on single matching. We apply our method to miRNA data as well as to data from English Premier League soccer matches, and find joint and individual multi-view signals that align with domain-specific knowledge. The second contribution is a new framework for canonical correlation analysis (CCA) based on exponential families, with explicit modeling of both common and source-specific signals. Unlike previous methods based on exponential families, the common signals from our model coincide with the canonical variables in standard CCA, and the unique signals are exactly orthogonal. These modeling differences lead to a non-trivial estimation problem, posed as an optimization with orthogonality constraints, for which we develop an iterative algorithm based on a splitting method. Simulations show on-par or superior performance of the proposed method compared with the available alternatives. We apply the method to analyze associations between gene expression and lipid concentrations in a nutrigenomic study, and associations between two distinct cell-type deconvolution methods in a prostate cancer tumor heterogeneity study.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Binomial family
dc.subject: data integration
dc.subject: dimension reduction
dc.subject: matrix factorization multi-block data
dc.subject: principal component analysis
dc.subject: optimization
dc.subject: proportions data
dc.title: Matrix Decomposition for Multi-view Data
dc.type: Thesis
thesis.degree.department: Statistics
thesis.degree.discipline: Statistics
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Bhattacharya, Anirban
dc.contributor.committeeMember: Wong, Raymond
dc.contributor.committeeMember: Ivanov, Ivan
dc.type.material: text
dc.date.updated: 2023-05-26T18:05:47Z
local.embargo.terms: 2024-08-01
local.embargo.lift: 2024-08-01
local.etdauthor.orcid: 0000-0002-3232-8137
