The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
Matrix Decomposition for Multi-view Data
Abstract
We consider the problem of extracting joint and individual signals from multi-view data, that is, data collected from different sources on matched samples. We present two main contributions in this dissertation.
The first contribution is on matrix decomposition of double-matched data (matched by both samples and source features). The motivating example is the miRNA data collected from both primary tumor and normal tissues of the same subjects; the measurements from the two tissues are thus matched both by subjects and by miRNAs. Our proposed double-matched matrix decomposition allows simultaneous extraction of joint and individual signals across subjects, as well as joint and individual signals across miRNAs. Our estimation approach takes advantage of double-matching by formulating a new type of optimization problem with explicit row space and column space constraints, for which we develop an efficient iterative algorithm. Numerical studies indicate that taking advantage of double-matching leads to superior signal estimation performance compared to existing multi-view data decomposition based on single-matching. We apply our method to miRNA data as well as data from English Premier League soccer matches and find joint and individual multi-view signals that align with domain-specific knowledge.
The second contribution is a new framework for canonical correlation analysis (CCA) based on exponential families, with explicit modeling of both common and source-specific signals. Unlike previous methods based on exponential families, the common signals from our model coincide with canonical variables in standard CCA, and the unique signals are exactly orthogonal. These modeling differences lead to a non-trivial estimation problem, namely optimization with orthogonality constraints, for which we develop an iterative algorithm based on a splitting method. Simulations show that the proposed method performs on par with or better than the available alternatives. We apply the method to analyze associations between gene expression and lipid concentrations in a nutrigenomic study, and to analyze associations between two distinct cell-type deconvolution methods in a prostate cancer tumor heterogeneity study.
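For readers unfamiliar with the canonical variables of standard CCA mentioned above, the following is a minimal sketch of classical (Gaussian) CCA computed with QR and SVD. It is not the exponential-family method proposed in the dissertation; the function name `standard_cca`, the simulated data, and the assumption that both centered views have full column rank are illustrative choices only.

```python
import numpy as np

def standard_cca(X, Y, n_components=1):
    """Classical CCA via QR + SVD (illustrative sketch, not the dissertation's method).

    Assumes rows of X and Y are matched samples and that both centered
    matrices have full column rank.
    """
    # Center each view column-wise
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Thin QR: orthonormal bases for the column spaces of the two views
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Canonical variables (sample scores), each with zero mean and unit norm
    cx = Qx @ U[:, :n_components]
    cy = Qy @ Vt.T[:, :n_components]
    return s[:n_components], cx, cy

# Hypothetical two-view data sharing one latent signal z
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, 3)) + 0.5 * rng.normal(size=(n, 3))
Y = z @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(n, 4))

corrs, cx, cy = standard_cca(X, Y, n_components=2)
```

In this sketch the first pair of canonical variables captures the shared latent signal, so their sample correlation equals the leading canonical correlation in `corrs`.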
Subject
Binomial family
data integration
dimension reduction
matrix factorization
multi-block data
principal component analysis
optimization
proportions data
Citation
Yuan, Dongbang (2022). Matrix Decomposition for Multi-view Data. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/197995.