Bayesian variable selection in clustering via Dirichlet process mixture models
Abstract
The increased collection of high-dimensional data in various fields has raised a strong
interest in clustering algorithms and variable selection procedures. In this dissertation,
I propose a model-based method that addresses the two problems simultaneously.
I use Dirichlet process mixture models to define the cluster structure and
introduce into the model a latent binary vector that identifies discriminating variables. I
update the variable selection index using a Metropolis algorithm and obtain inference
on the cluster structure via a split-merge Markov chain Monte Carlo technique. I
evaluate the method on simulated data and illustrate an application with a DNA
microarray study. I also show that the methodology can be adapted to the problem
of clustering functional high-dimensional data. There I employ wavelet thresholding
methods in order to reduce the dimension of the data and to remove noise from the
observed curves. I then apply variable selection and sample clustering methods in the
wavelet domain. Thus my methodology is wavelet-based and aims at clustering the
curves while identifying wavelet coefficients describing discriminating local features.
I exemplify the method on high-dimensional and high-frequency tidal volume traces
measured under an induced panic attack model in normal humans.
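
The Metropolis update of the latent binary vector mentioned above can be illustrated with a minimal sketch. The abstract gives neither the proposal nor the posterior, so everything here is an assumption for illustration only: the single-bit flip proposal, the names `metropolis_flip`, `toy_log_post` and `run_chain`, and the toy target with independent inclusion probabilities. It is not the dissertation's sampler.

```python
import math
import random


def metropolis_flip(gamma, log_post, rng):
    """One Metropolis step on a binary inclusion vector (hypothetical helper).

    Proposes flipping one randomly chosen entry of `gamma`. The proposal is
    symmetric, so the acceptance probability is min(1, post(new) / post(old)).
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return list(gamma)


def toy_log_post(gamma):
    # Assumed toy target: independent Bernoulli inclusion probabilities.
    # In a real model this would be the log posterior of gamma given the
    # data and the current cluster allocation.
    probs = [0.9, 0.9, 0.1, 0.1]
    return sum(math.log(p if g else 1 - p) for g, p in zip(gamma, probs))


def run_chain(n_iter=20000, seed=0):
    """Run the chain and return the fraction of iterations each bit was 1."""
    rng = random.Random(seed)
    gamma = [0, 0, 0, 0]
    counts = [0] * len(gamma)
    for _ in range(n_iter):
        gamma = metropolis_flip(gamma, toy_log_post, rng)
        for k, g in enumerate(gamma):
            counts[k] += g
    return [c / n_iter for c in counts]
```

Calling `run_chain()` returns estimated inclusion frequencies, which for this toy target should be high for the first two variables and low for the last two. In a full sampler this step would alternate with updates of the cluster allocation.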
Subject
Bayesian inference
Clustering
Dirichlet process mixture model
DNA microarray data analysis
variable selection
wavelet shrinkage
Citation
Kim, Sinae (2003). Bayesian variable selection in clustering via Dirichlet process mixture models. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/5888.