Bayesian variable selection in clustering via Dirichlet process mixture models

Kim, Sinae

Date

2007-09-17

Abstract

The increased collection of high-dimensional data in various fields has raised strong interest in clustering algorithms and variable selection procedures. In this dissertation, I propose a model-based method that addresses the two problems simultaneously. I use Dirichlet process mixture models to define the cluster structure and introduce into the model a latent binary vector that identifies discriminating variables. I update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. I evaluate the method on simulated data and illustrate an application with a DNA microarray study. I also show that the methodology can be adapted to the problem of clustering high-dimensional functional data. There I employ wavelet thresholding methods to reduce the dimension of the data and to remove noise from the observed curves. I then apply variable selection and sample clustering methods in the wavelet domain. Thus my methodology is wavelet-based and aims at clustering the curves while identifying the wavelet coefficients that describe discriminating local features. I exemplify the method on high-dimensional, high-frequency tidal volume traces measured under an induced panic attack model in normal humans.
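
The Metropolis update of the latent binary vector mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the single-flip proposal, the names `SCORES`, `toy_log_post`, `metropolis_flip` and `inclusion_frequencies`, and the toy log posterior are all assumptions made for illustration. In the actual model the posterior of the binary vector would depend on the data and on the current cluster allocation under the Dirichlet process mixture, which this toy stand-in ignores.

```python
import math
import random

# Hypothetical per-variable evidence scores (an assumption, not from the
# dissertation): a positive score favors including the variable.
SCORES = [3.0, 3.0, -3.0, -3.0, -3.0]


def toy_log_post(gamma):
    """Toy unnormalized log posterior of the binary inclusion vector."""
    return sum(g * s for g, s in zip(gamma, SCORES))


def metropolis_flip(gamma, log_post, rng):
    """One Metropolis step that flips a single, randomly chosen entry.

    The single-flip proposal is symmetric, so the acceptance probability
    is min(1, posterior(proposal) / posterior(current)).
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return list(gamma)


def inclusion_frequencies(n_iter=5000, seed=0):
    """Run the chain and return how often each variable was included."""
    rng = random.Random(seed)
    gamma = [0] * len(SCORES)
    counts = [0] * len(SCORES)
    for _ in range(n_iter):
        gamma = metropolis_flip(gamma, toy_log_post, rng)
        for j, g in enumerate(gamma):
            counts[j] += g
    return [c / n_iter for c in counts]
```

Calling `inclusion_frequencies()` returns one estimated inclusion frequency per variable. Under the toy scores, the first two variables should be selected in most iterations and the remaining three only rarely, which mirrors how posterior inclusion frequencies are commonly used to flag discriminating variables.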

Citation

Kim, Sinae (2003). Bayesian variable selection in clustering via Dirichlet process mixture models. Doctoral dissertation, Texas A&M University. Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/5888.