dc.contributor.advisor: Carroll, Raymond J. [en_US]
dc.contributor.advisor: Mallick, Bani K. [en_US]
dc.creator: Ray, Shubhankar [en_US]
dc.date.accessioned: 2006-10-30T23:27:10Z
dc.date.available: 2006-10-30T23:27:10Z
dc.date.created: 2006-08 [en_US]
dc.date.issued: 2006-10-30
dc.identifier.uri: http://hdl.handle.net/1969.1/4251
dc.description.abstract: Nonparametric Bayesian models have been researched extensively in the past 10 years following the work of Escobar and West (1995) on sampling schemes for Dirichlet processes. The infinite mixture representation of the Dirichlet process makes it useful for clustering problems where the number of clusters is unknown. We develop nonparametric Bayesian models for two different clustering problems, namely functional and graphical clustering. We propose a nonparametric Bayes wavelet model for clustering of functional or longitudinal data. The wavelet modelling is aimed at resolving global and local features during clustering. The model also allows the elicitation of prior belief about the regularity of the functions and can adapt to a wide range of functional regularity. Posterior inference is carried out by Gibbs sampling with conjugate priors for fast computation. We use simulated as well as real datasets to illustrate the suitability of the approach relative to alternatives. The functional clustering model is extended to analyze splice microarray data. New microarray technologies probe consecutive segments along genes to observe alternative splicing (AS) mechanisms that produce multiple proteins from a single gene. Clues regarding the number of splice forms can be obtained by clustering the functional expression profiles from different tissues. The analysis was carried out on the Rosetta dataset (Johnson et al., 2003) to obtain a splice-variant-by-tissue distribution for all 10,000 genes. We were able to identify a number of splice forms that appear to be unique to cancer. We propose a Bayesian model for partitioning graphs depicting dependencies in a collection of objects. After suitable transformations and modelling techniques, the problem of graph cutting can be approached by nonparametric Bayes clustering. We draw motivation from recent work (Dhillon, 2001) showing the equivalence of kernel k-means clustering and certain graph cutting algorithms. It is shown that loss functions similar to that of kernel k-means arise naturally in this model, and that minimizing the associated posterior risk constitutes an effective graph cutting strategy. We present results from the analysis of two microarray datasets, namely the melanoma dataset (Bittner et al., 2000) and the sarcoma dataset (Nykter et al., 2006). [en_US]
dc.format.extent: 857243 bytes
dc.format.medium: electronic [en_US]
dc.format.mimetype: application/pdf
dc.language.iso: en_US [en_US]
dc.publisher: Texas A&M University [en_US]
dc.subject: Nonparametric Bayesian [en_US]
dc.subject: Clustering [en_US]
dc.subject: Dirichlet Processes [en_US]
dc.title: Nonparametric Bayesian analysis of some clustering problems [en_US]
dc.type: Book [en]
dc.type: Thesis [en]
thesis.degree.department: Statistics [en_US]
thesis.degree.discipline: Statistics [en_US]
thesis.degree.grantor: Texas A&M University [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
thesis.degree.level: Doctoral [en_US]
dc.contributor.committeeMember: Cline, Daren B. [en_US]
dc.contributor.committeeMember: Dougherty, Edward R. [en_US]
dc.type.genre: Electronic Dissertation [en_US]
dc.type.material: text [en_US]
dc.format.digitalOrigin: born digital [en_US]

