dc.contributor.advisor: Mallick, Bani K.
dc.creator: Dhavala, Soma Sekhar
dc.date.accessioned: 2012-02-14T22:18:07Z
dc.date.accessioned: 2012-02-16T16:12:25Z
dc.date.available: 2012-02-14T22:18:07Z
dc.date.available: 2012-02-16T16:12:25Z
dc.date.created: 2010-12
dc.date.issued: 2012-02-14
dc.date.submitted: December 2010
dc.identifier.uri: https://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8659
dc.description.abstract: We are concerned with testing for differential expression and consider three aspects of such testing procedures. First, we develop an exact ANOVA-type model for discrete gene expression data produced by technologies such as Massively Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE), and other next-generation sequencing technologies. We adopt two Bayesian hierarchical models: one parametric, and one semiparametric with a Dirichlet process prior that can borrow strength across related signatures, where a signature is a specific arrangement of nucleotides. We exploit the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches while controlling the false discovery rate. Next, we consider ways to combine expression data from different studies, possibly produced by different technologies such as microarrays and MPSS, which yields responses of mixed type. Depending on the technology, the expression data can be continuous or discrete and can have different technology-dependent noise characteristics. Adding to the difficulty, genes can have an arbitrary correlation structure both within and across studies, and performing many hypothesis tests for differential expression can produce false discoveries. We propose to address all of these challenges using a hierarchical Dirichlet process with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map the different technology-dependent manifestations to the latent processes on which inference is based. Finally, we propose an algorithm for controlling different error measures in Bayesian multiple testing under generic loss functions, including the widely used uniform loss function.
We make no specific assumptions about the underlying probability model, but we require that indicator variables for the individual hypotheses be available as a component of the inference. Given this information, we recast multiple hypothesis testing as a combinatorial optimization problem, specifically the 0-1 knapsack problem, which can be solved efficiently by a variety of algorithms, both approximate and exact. [en]
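The knapsack reformulation in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the thesis: each hypothesis's posterior probability of being non-null (for example, the posterior mean of its indicator variable) is already available; the controlled error measure is the posterior expected number of false discoveries, which maps directly onto a knapsack capacity (a ratio measure such as the false discovery rate does not map as directly); and the objective is the expected number of true discoveries. Each hypothesis is then an item whose weight is its posterior null probability and whose value is its posterior non-null probability. The function name `knapsack_multiple_testing` and the `resolution` parameter are hypothetical. The solver is the textbook dynamic program over discretized weights, which is exact for the discretized problem; the thesis's own formulation and solvers may differ.

```python
import math
from typing import List


def knapsack_multiple_testing(post_alt: List[float], budget: float,
                              resolution: int = 1000) -> List[int]:
    """Choose hypotheses to reject by solving a 0-1 knapsack problem.

    post_alt[i]: posterior probability that hypothesis i is non-null
        (assumed to be given, e.g. the posterior mean of its indicator).
    budget: bound on the posterior expected number of false discoveries,
        i.e. the sum of (1 - post_alt[i]) over rejected hypotheses.
    resolution: hypothetical discretization; integer weight units per 1.0.
    Returns the sorted indices of the rejected hypotheses.
    """
    n = len(post_alt)
    # Weight = posterior null probability, rounded UP after trimming float
    # noise, so the chosen set never exceeds the true (unrounded) budget.
    weights = [math.ceil(round((1.0 - p) * resolution, 9)) for p in post_alt]
    capacity = math.floor(round(budget * resolution, 9))

    # best[c]: maximal expected number of true discoveries within capacity c.
    best = [0.0] * (capacity + 1)
    # keep[i][c]: True if item i was added to the optimum for capacity c.
    keep = [[False] * (capacity + 1) for _ in range(n)]

    for i in range(n):
        w, v = weights[i], post_alt[i]
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            if best[c - w] + v > best[c]:
                best[c] = best[c - w] + v
                keep[i][c] = True

    # Backtrack from the full capacity to recover the rejection set.
    chosen = []
    c = capacity
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= weights[i]
    return sorted(chosen)


if __name__ == "__main__":
    probs = [0.99, 0.95, 0.6, 0.9, 0.2]
    print(knapsack_multiple_testing(probs, budget=0.2))
```

With posterior probabilities `[0.99, 0.95, 0.6, 0.9, 0.2]` and a budget of 0.2 expected false discoveries, the sketch rejects hypotheses 0, 1, and 3, whose expected number of false discoveries is 0.01 + 0.05 + 0.10 = 0.16. Rounding weights up keeps every returned set feasible but may exclude a set that sits exactly on the budget boundary. The dynamic program costs O(n · budget · resolution) time; for large problems an approximate solver could be substituted, in the spirit of the approximate algorithms the abstract mentions.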
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: Bayesian Models [en]
dc.subject: Generalized linear models [en]
dc.subject: Semiparametric models [en]
dc.subject: Dirichlet process [en]
dc.subject: Meta-analysis [en]
dc.subject: Multiple hypothesis testing [en]
dc.subject: Bioinformatics [en]
dc.title: Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression [en]
dc.type: Thesis [en]
thesis.degree.department: Statistics [en]
thesis.degree.discipline: Statistics [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Carroll, Raymond J.
dc.contributor.committeeMember: Hart, Jeffrey D.
dc.contributor.committeeMember: Guikema, Seth D.
dc.type.genre: thesis [en]
dc.type.material: text [en]