Bayesian Estimation of Correlation Matrices of Longitudinal Data and Variable Clustering
Date
2019-07-25
Abstract
Estimation of correlation matrices is a challenging problem due to the notorious positive-definiteness constraint and high dimensionality. Reparameterising the Cholesky factors of correlation matrices in terms of angles, or hyperspherical coordinates, where the angles vary freely in the range [0, π), has become popular over the last two decades. However, this approach has not been used in Bayesian estimation of correlation matrices, perhaps owing to a lack of clear statistical relevance and of suitable priors for the angles. In this dissertation, we show for the first time that for longitudinal data these angles are the inverse cosines of the semi-partial correlations (SPCs). This simple connection makes it possible to introduce physically meaningful selection and shrinkage priors on the angles or correlation matrices, with emphasis on selection (sparsity) and on shrinkage towards special structures. Our method deals effectively with the positive-definiteness constraint in posterior computation. Through simulation, we compare the performance of our angle-based Bayesian estimation with some recent methods based on partial autocorrelations, and we apply the method to data from a clinical trial on smoking. We then exploit this reparametrization in a variable clustering problem: model-based clustering of the components of a k-dimensional random vector, hinging on a block-diagonal correlation structure with equicorrelated blocks. Many data-driven and model-based algorithms are available in the literature for clustering observations.
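As an illustration of the angular reparameterisation described above, the following Python/NumPy sketch builds the Cholesky factor L of a correlation matrix row by row from hyperspherical angles and then inverts the map. The function names `cholesky_from_angles` and `angles_from_corr` are hypothetical, not names from the dissertation. Each row of L is a unit vector, so LLᵀ has a unit diagonal, and when every angle lies strictly inside (0, π) the diagonal entries of L are positive, so LLᵀ is positive definite with no joint constraint on the angles. The sketch does not implement the selection or shrinkage priors, nor the semi-partial-correlation interpretation.

```python
import numpy as np

def cholesky_from_angles(theta):
    """Lower-triangular Cholesky factor L of a p x p correlation matrix.

    theta[i, j] for j < i are the angles of row i; each should lie strictly
    inside (0, pi). Entries of theta on or above the diagonal are ignored.
    """
    p = theta.shape[0]
    L = np.zeros((p, p))
    L[0, 0] = 1.0
    for i in range(1, p):
        sin_prod = 1.0  # product of sin(theta[i, k]) over k < j
        for j in range(i):
            L[i, j] = np.cos(theta[i, j]) * sin_prod
            sin_prod *= np.sin(theta[i, j])
        L[i, i] = sin_prod  # positive whenever all angles are in (0, pi)
    return L

def angles_from_corr(R):
    """Inverse map: recover the angles from a positive-definite correlation matrix."""
    L = np.linalg.cholesky(R)  # unique factor with positive diagonal
    p = R.shape[0]
    theta = np.zeros((p, p))
    for i in range(1, p):
        sin_prod = 1.0
        for j in range(i):
            c = np.clip(L[i, j] / sin_prod, -1.0, 1.0)  # guard against rounding
            theta[i, j] = np.arccos(c)
            sin_prod *= np.sin(theta[i, j])
    return theta

# Draw each angle independently in (0, pi): no joint constraint is needed.
rng = np.random.default_rng(0)
p = 5
theta = np.zeros((p, p))
lower = np.tril_indices(p, k=-1)
theta[lower] = rng.uniform(0.1, np.pi - 0.1, size=len(lower[0]))

L = cholesky_from_angles(theta)
R = L @ L.T
print(np.allclose(np.diag(R), 1.0))             # unit diagonal
print(np.linalg.eigvalsh(R).min() > 0)          # positive definite
print(np.allclose(angles_from_corr(R), theta))  # angles are recovered
```

Because each angle can be changed on its own without leaving the set of valid correlation matrices, a sampler working on this scale never has to check positive definiteness separately.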
However, compared with data clustering, the literature on variable clustering is limited. We adopt a model-based approach to variable clustering, which assumes an underlying probabilistic model that determines the clusters. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling. Since the number of clusters is unknown, we place a truncated Poisson prior on it, which penalizes large numbers of clusters, and use reversible jump Markov chain Monte Carlo (RJMCMC) to estimate the number of clusters in the posterior computation. The algorithm thus recovers the cluster membership of the variables together with an estimate of the number of clusters. Its performance is substantiated with extensive simulation studies and a real-data example from genetics.
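The ingredients of the clustering model can be sketched in the same way, under stated assumptions: the data are centred and standardised so that the correlation matrix is also the covariance matrix, the number of clusters K is supported on {1, ..., k}, and the helper names (`block_equicorr`, `log_truncated_poisson`, `log_target`) and the Poisson rate `lam` are hypothetical. The sketch evaluates the multivariate normal log-likelihood under a block-diagonal, equicorrelated correlation matrix plus the truncated Poisson log prior on K. The reversible jump moves that change K, and any priors on the labels or on the within-cluster correlations, are not shown.

```python
import numpy as np
from math import lgamma

def block_equicorr(labels, rho):
    """Correlation matrix with one equicorrelated block per cluster.

    labels[v] in {0, ..., K-1} is the cluster index of variable v and rho[c]
    is the common correlation inside cluster c. Variables in different
    clusters are uncorrelated, so R is block diagonal after sorting by label.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    R = np.where(same, np.asarray(rho, dtype=float)[labels][:, None], 0.0)
    np.fill_diagonal(R, 1.0)
    return R

def log_truncated_poisson(K, lam, K_max):
    """log P(K) for Poisson(lam) restricted to {1, ..., K_max} (assumed support)."""
    if not 1 <= K <= K_max:
        return -np.inf
    support = np.arange(1, K_max + 1)
    log_w = support * np.log(lam) - np.array([lgamma(j + 1.0) for j in support])
    return K * np.log(lam) - lgamma(K + 1.0) - np.logaddexp.reduce(log_w)

def log_target(Y, labels, rho, lam):
    """Log-likelihood of centred, standardised data Y (n x k) under the block
    model plus the log prior on the number of clusters K. Priors on the labels
    given K and on rho are omitted (treated as flat) in this sketch."""
    n, k = Y.shape
    R = block_equicorr(labels, rho)
    eig = np.linalg.eigvalsh(R)
    if eig.min() <= 0:
        return -np.inf  # some block has rho outside (-1/(m-1), 1)
    logdet = np.sum(np.log(eig))
    quad = np.trace(np.linalg.solve(R, Y.T @ Y))  # sum_i y_i^T R^{-1} y_i
    loglik = -0.5 * (n * k * np.log(2 * np.pi) + n * logdet + quad)
    K = len(np.unique(labels))
    return loglik + log_truncated_poisson(K, lam, K_max=k)

# Simulated example with three clusters among six variables.
rng = np.random.default_rng(1)
true_labels = np.array([0, 0, 0, 1, 1, 2])
true_rho = [0.7, 0.5, 0.0]  # rho of a singleton cluster has no effect
Y = rng.multivariate_normal(np.zeros(6), block_equicorr(true_labels, true_rho), size=200)

print(log_target(Y, true_labels, true_rho, lam=1.0))           # true partition
print(log_target(Y, np.zeros(6, dtype=int), [0.3], lam=1.0))  # one big cluster
```

An m x m equicorrelated block is positive definite exactly when -1/(m-1) < ρ < 1, which is why `log_target` returns -inf for correlations outside that range.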
Keywords
Angular parameterization, Cholesky decomposition, Selection, Shrinkage, Structured correlation matrix, Bayesian Clustering, Protein expression