Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression

Dhavala, Soma Sekhar

dc.contributor.advisor	Mallick, Bani K.
dc.creator	Dhavala, Soma Sekhar
dc.date.accessioned	2012-02-14T22:18:07Z
dc.date.accessioned	2012-02-16T16:12:25Z
dc.date.available	2012-02-14T22:18:07Z
dc.date.available	2012-02-16T16:12:25Z
dc.date.created	2010-12
dc.date.issued	2012-02-14
dc.date.submitted	December 2010
dc.identifier.uri	https://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8659
dc.description.abstract	We are concerned with testing for differential expression and consider three different aspects of such testing procedures. First, we develop an exact ANOVA-type model for discrete gene expression data, produced by technologies such as Massively Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE) or other next-generation sequencing technologies. We adopt two Bayesian hierarchical models: one parametric and the other semiparametric with a Dirichlet process prior that can borrow strength across related signatures, where a signature is a specific arrangement of the nucleotides. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using non-parametric approaches, while controlling the false discovery rate. Next, we consider ways to combine expression data from different studies, possibly produced by different technologies, such as microarrays and MPSS, resulting in mixed-type responses. Depending on the technology, the expression data can be continuous or discrete and can have different technology-dependent noise characteristics. Adding to the difficulty, genes can have an arbitrary correlation structure both within and across studies. Performing several hypothesis tests for differential expression could also lead to false discoveries. We propose to address all the above challenges using a Hierarchical Dirichlet process with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map different technology-dependent manifestations to latent processes upon which inference is based. Finally, we propose an algorithm for controlling different error measures in Bayesian multiple testing under generic loss functions, including the widely used uniform loss function.
We do not make any specific assumptions about the underlying probability model but require that indicator variables for the individual hypotheses are available as a component of the inference. Given this information, we recast multiple hypothesis testing as a combinatorial optimization problem, in particular the 0-1 knapsack problem, which can be solved efficiently using a variety of algorithms, both approximate and exact.	en
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Bayesian Models	en
dc.subject	Generalized linear models	en
dc.subject	Semiparametric models	en
dc.subject	Dirichlet process	en
dc.subject	Meta-analysis	en
dc.subject	Multiple hypothesis testing	en
dc.subject	Bioinformatics	en
dc.title	Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Carroll, Raymond J.
dc.contributor.committeeMember	Hart, Jeffrey D.
dc.contributor.committeeMember	Guikema, Seth D.
dc.type.genre	thesis	en
dc.type.material	text	en
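
Illustrative note on the abstract: the recasting of Bayesian multiple testing as a 0-1 knapsack problem can be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's algorithm. It assumes each hypothesis i comes with a posterior probability of being truly differentially expressed (post_alt[i], for example the posterior mean of its indicator variable), that the goal is to maximize the expected number of true discoveries, and that the constraint is a cap on the expected number of false discoveries. The function names, the cap, and the discretization parameter `scale` are all hypothetical.

```python
import math


def knapsack_01(values, weights, capacity):
    """Exact 0-1 knapsack by dynamic programming over integer weights.

    Returns (best_value, sorted list of chosen item indices).
    """
    n = len(values)
    # best[i][c] = max total value using items 0..i-1 with total weight <= c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]          # skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # take item i-1
    # Backtrack to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


def select_rejections(post_alt, max_expected_false, scale=1000):
    """Choose which hypotheses to reject (assumed formulation, see lead-in).

    value_i  = post_alt[i]      (expected true discovery if rejected)
    weight_i = 1 - post_alt[i]  (expected false discovery if rejected)
    Weights are rounded up and the capacity rounded down on an integer grid,
    so the cap on expected false discoveries still holds after discretization.
    """
    values = list(post_alt)
    weights = [math.ceil(round((1.0 - p) * scale, 6)) for p in post_alt]
    capacity = math.floor(round(max_expected_false * scale, 6))
    return knapsack_01(values, weights, capacity)


if __name__ == "__main__":
    # Made-up posterior probabilities, for illustration only.
    post_alt = [0.95, 0.90, 0.60, 0.20]
    value, rejected = select_rejections(post_alt, max_expected_false=0.2)
    print(rejected, value)
```

The dynamic program is exact on the discretized grid; a larger `scale` gives a finer grid at the cost of a larger table. Other loss functions would change how the values and weights are defined, and an approximate solver could replace the dynamic program when the number of hypotheses is large.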

Files in this item

Name: DHAVALA-DISSERTATION.pdf
Size: 1.170Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )