Nonparametric Bayesian analysis of some clustering problems

Ray, Shubhankar

dc.contributor.advisor	Carroll, Raymond J.
dc.contributor.advisor	Mallick, Bani K.
dc.creator	Ray, Shubhankar
dc.date.accessioned	2006-10-30T23:27:10Z
dc.date.available	2006-10-30T23:27:10Z
dc.date.created	2006-08
dc.date.issued	2006-10-30
dc.identifier.uri	https://hdl.handle.net/1969.1/4251
dc.description.abstract	Nonparametric Bayesian models have been researched extensively in the past 10 years following the work of Escobar and West (1995) on sampling schemes for Dirichlet processes. The infinite mixture representation of the Dirichlet process makes it useful for clustering problems where the number of clusters is unknown. We develop nonparametric Bayesian models for two different clustering problems, namely functional and graphical clustering. We propose a nonparametric Bayes wavelet model for clustering functional or longitudinal data. The wavelet modelling is aimed at resolving global and local features during clustering. The model also allows the elicitation of prior belief about the regularity of the functions and can adapt to a wide range of functional regularity. Posterior inference is carried out by Gibbs sampling with conjugate priors for fast computation. We use simulated as well as real datasets to illustrate the suitability of the approach over other alternatives. The functional clustering model is extended to analyze splice microarray data. New microarray technologies probe consecutive segments along genes to observe alternative splicing (AS) mechanisms that produce multiple proteins from a single gene. Clues regarding the number of splice forms can be obtained by clustering the functional expression profiles from different tissues. The analysis was carried out on the Rosetta dataset (Johnson et al., 2003) to obtain a splice-variant-by-tissue distribution for all 10,000 genes. We were able to identify a number of splice forms that appear to be unique to cancer. We propose a Bayesian model for partitioning graphs depicting dependencies in a collection of objects. After suitable transformations and modelling techniques, the problem of graph cutting can be approached by nonparametric Bayes clustering. We draw motivation from recent work (Dhillon, 2001) showing the equivalence of kernel k-means clustering and certain graph cutting algorithms. It is shown that loss functions similar to that of kernel k-means arise naturally in this model, and that minimizing the associated posterior risk constitutes an effective graph cutting strategy. We present results from the analysis of two microarray datasets, namely the melanoma dataset (Bittner et al., 2000) and the sarcoma dataset (Nykter et al., 2006).	en
dc.format.extent	857243 bytes	en
dc.format.medium	electronic	en
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.publisher	Texas A&M University
dc.subject	Nonparametric Bayesian	en
dc.subject	Clustering	en
dc.subject	Dirichlet Processes	en
dc.title	Nonparametric Bayesian analysis of some clustering problems	en
dc.type	Book	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Cline, Daren B.
dc.contributor.committeeMember	Dougherty, Edward R.
dc.type.genre	Electronic Dissertation	en
dc.type.material	text	en
dc.format.digitalOrigin	born digital	en

Files in this item

Name: etd-tamu-2006B-STAT-Ray.pdf
Size: 837.1Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )