Statistical Methods for Integrating Genomics Data
Abstract
This dissertation develops methodology for integrating multiplatform genomic data, with applications to cancer. Such integration facilitates the discovery of biological information crucial to the development of targeted treatments. We present iBAG (integrative Bayesian Analysis of Genomics data), a two-step hierarchical Bayesian model that uses the known biological relationships between genetic platforms to integrate an arbitrary number of platforms in a single model. This method identifies genes important to a clinical outcome, such as survival, and the integration approach also allows us to identify which platforms are modulating the important gene effects. A glioblastoma multiforme (GBM) data set publicly available from The Cancer Genome Atlas (TCGA) is analyzed with iBAG. We flag several genes as important to survival time and discuss these genes in their biological context. We then present a nonlinear formulation of iBAG, which increases the flexibility of the model to accommodate nonlinear relationships among the data platforms. The TCGA GBM data are analyzed again, and we carefully compare the results from the linear and nonlinear formulations. Next we present a pathway iBAG model, piBAG, which includes gene pathway membership information and uses hierarchical shrinkage to simultaneously select important genes and assign pathway scores. The integration of multiple genomic platforms again allows us to determine which platform is regulating each important gene, and it also reveals the platform through which each pathway exerts its effect. We apply this method to a different subset of the TCGA GBM data. Finally, we present integrative heatmaps, a novel visualization tool for illustrating integrated data. We use a TCGA colorectal cancer data set to demonstrate the integrative heatmaps.
Based on the simulation studies and data applications in this dissertation, we conclude that the proposed methods achieve their respective goals and outperform standard methods. We demonstrate that our methods offer several advantages, including increased estimation efficiency, increased power, lower false discovery rates, and deeper biological insight into the genetic mechanisms of cancer development and progression.
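The two-step structure of iBAG summarized above can be illustrated with a minimal, non-Bayesian sketch. It assumes centered data with one gene-level measurement per upstream platform (for example methylation and copy number), ordinary least squares for the first step (splitting each gene's expression into one component per platform plus a residual), and a ridge fit for the second step. The ridge estimate is the posterior mean under a Gaussian prior with known variances; it stands in here for the shrinkage priors and posterior sampling used in the dissertation. The outcome is treated as continuous (e.g. log survival time, ignoring censoring). The function names `mechanistic_step` and `clinical_step`, the penalty `lam`, and all data are hypothetical and simulated.

```python
import numpy as np


def mechanistic_step(expr, platforms):
    """Step 1 (sketch): per gene, regress expression on upstream platforms by OLS
    and split expression into platform-specific parts plus a residual part.

    expr:      (n_samples, n_genes) centered gene expression
    platforms: list of (n_samples, n_genes) centered arrays (assumed inputs)
    returns:   (n_samples, n_genes, n_platforms + 1); the last slot is the residual
    """
    n, p = expr.shape
    k = len(platforms)
    comps = np.empty((n, p, k + 1))
    for g in range(p):
        X = np.column_stack([plat[:, g] for plat in platforms])  # (n, k)
        coef, *_ = np.linalg.lstsq(X, expr[:, g], rcond=None)
        comps[:, g, :k] = X * coef                # part explained by each platform
        comps[:, g, k] = expr[:, g] - X @ coef    # part not explained by the platforms
    return comps


def clinical_step(comps, y, lam=1.0):
    """Step 2 (sketch): regress the outcome on all components jointly.
    Ridge is used as a simple stand-in for Bayesian shrinkage priors (assumption).
    returns: (n_genes, n_platforms + 1) effect of each gene component on y
    """
    n, p, m = comps.shape
    Z = comps.reshape(n, p * m)                   # columns ordered gene-major
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p * m), Z.T @ y)
    return beta.reshape(p, m)


# Simulated example: 5 genes regulated by two hypothetical platforms.
rng = np.random.default_rng(0)
n, p = 200, 5
meth = rng.normal(size=(n, p))
cn = rng.normal(size=(n, p))
expr = 0.8 * meth - 0.5 * cn + rng.normal(scale=0.3, size=(n, p))

comps = mechanistic_step(expr, [meth, cn])
# The outcome depends only on the methylation-driven part of gene 0.
y = 2.0 * comps[:, 0, 0] + rng.normal(scale=0.5, size=n)
effects = clinical_step(comps, y, lam=1.0)
print(np.round(effects, 2))
```

Because the components of each gene sum back to its expression, a large effect in one slot points both to the gene and to the platform through which it acts, which is the kind of attribution the abstract describes.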
Subject
Bayesian modeling
Genomics
Heatmaps
Hierarchical models
Integrative analysis
Shrinkage priors
TCGA
Citation
McGuffey, Elizabeth Jennings (2015). Statistical Methods for Integrating Genomics Data. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/155093.