Bayesian Learning with Heterogeneous Data for Life Sciences

Hajiramezanali, Mohammad Ehsan

Date

2021-01-06

Abstract

We propose a suite of Bayesian learning methods to address challenges arising from task and data heterogeneity in life science applications. First, we develop a novel multi-domain negative binomial (NB) factorization model to analyze next-generation sequencing (NGS) count data, with the goal of improving cancer subtyping in a target domain that has a limited number of NGS samples by leveraging surrogate data from other cancer types (source domains). This Bayesian multi-domain learning (BMDL) method addresses data scarcity due to task heterogeneity by learning domain relevance through common latent factors based on the given samples across domains. It automatically avoids "negative transfer", to which many existing transfer learning methods are susceptible, and performs consistently better than single-domain learning regardless of the level of domain relevance. In addition to studying task heterogeneity, investigating the longitudinal heterogeneity of temporal NGS count data may help to better understand the underlying cellular mechanisms of living systems. We propose the gamma Markov negative binomial (GMNB) model as a fully Bayesian solution for studying temporal RNA-seq data. A notable advantage is its capacity to capture a broad range of gene expression patterns over time by integrating a gamma Markov chain into the NB distribution model. We then adopt the Bayes factor (BF) as a measure that exploits information collectively from all time points to detect genes with significant variations in temporal expression patterns across phenotypes or treatment conditions. Moving to more complicated experimental settings, we propose the variational graph recurrent neural network (VGRNN), which adds structural heterogeneity to the longitudinal data. The use of high-level latent random variables in VGRNN can better capture the potential variability observed in dynamic graphs as well as the uncertainty of node latent representations, with graphs capturing prior knowledge of dependency relationships. We further develop semi-implicit variational inference for this new VGRNN architecture (SI-VGRNN) to allow flexible non-Gaussian latent representations. In the last chapter, we propose a novel Bayesian relation learning framework, BayReL, that infers interactions across heterogeneous input datasets, treated as different views from different types of biomolecules, with the aim of deriving meaningful biological knowledge for integrative multi-omics data analysis. BayReL can flexibly incorporate the available graph dependency structure of each view, exploit non-linear transformations, and provide a probabilistic interpretation simultaneously.
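
As a rough illustration of what "integrating a gamma Markov chain into the NB distribution model" could look like, the sketch below simulates temporal counts for a single gene. It is not the dissertation's GMNB model: the transition form (a gamma draw whose shape is the previous dispersion value), the parameter names `r0`, `c` and `p`, and the NB convention with mean r*p/(1-p) are all assumptions made for this example. Only the Python standard library is used.

```python
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_gamma_markov_nb(num_steps, r0=10.0, c=1.0, p=0.5, seed=0):
    """Simulate counts whose NB dispersion follows a gamma Markov chain.

    Assumed (hypothetical) generative form:
        r_t ~ Gamma(shape=r_{t-1}, scale=1/c)
        n_t ~ NB(r_t, p), with mean r_t * p / (1 - p)
    """
    rng = random.Random(seed)
    r_prev = r0
    dispersions, counts = [], []
    for _ in range(num_steps):
        # Gamma Markov step; the floor keeps the next shape strictly positive.
        r_t = max(rng.gammavariate(r_prev, 1.0 / c), 1e-8)
        # NB(r_t, p) drawn as a gamma-Poisson mixture.
        lam = rng.gammavariate(r_t, p / (1.0 - p))
        counts.append(sample_poisson(lam, rng))
        dispersions.append(r_t)
        r_prev = r_t
    return dispersions, counts


if __name__ == "__main__":
    dispersions, counts = simulate_gamma_markov_nb(num_steps=6, seed=1)
    for t, (r_t, n_t) in enumerate(zip(dispersions, counts), start=1):
        print(f"t={t}: r_t={r_t:.3f}, count={n_t}")
```

Because each dispersion value is drawn conditionally on the previous one, the simulated counts can drift smoothly or change sharply over time, which is the kind of flexibility the abstract attributes to the gamma Markov chain.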

Citation

Hajiramezanali, Mohammad Ehsan (2021). Bayesian Learning with Heterogeneous Data for Life Sciences. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/195545.