Browsing by Author "Guhaniyogi, Rajarshi"
Now showing 1 - 14 of 14
A Bayesian Covariance Based Clustering for High-Dimensional Tensors (2021-12-13)
Gutierrez, Rene; Scheffler, Aaron; Guhaniyogi, Rajarshi

Bayesian Covariate-Dependent Clustering of Undirected Networks with Brain-Imaging Data (2022-08-25)
Guha, Sharmistha; Guhaniyogi, Rajarshi
This article focuses on model-based clustering of subjects based on the shared relationships of subject-specific networks and covariates in scenarios where the relationship between networks and covariates differs across groups of subjects. It is also of interest to identify the network nodes significantly associated with each covariate in each cluster of subjects. To address these methodological questions, we propose a novel nonparametric Bayesian mixture modeling framework with an undirected network response and scalar predictors. The symmetric matrix coefficients corresponding to the scalar predictors of interest in each mixture component combine a low-rank structure with group sparsity within that structure. While the low-rank structure in the network coefficients adds parsimony and computational efficiency, the group sparsity within the low-rank structure enables inference on the network nodes and cells significantly associated with each scalar predictor. Our principled Bayesian framework allows precise characterization of uncertainty in identifying significant network nodes in each cluster. Empirical results in various simulation scenarios illustrate substantial inferential gains of the proposed framework in comparison with competitors.
Analysis of a real brain connectome dataset using the proposed method provides insights into the brain regions of interest (ROIs) significantly related to creative achievement in each cluster of subjects.

Bayesian Data Sketching for Varying Coefficient Regression Models (2023-03-24)
Guhaniyogi, Rajarshi; Baracaldo, Laura; Banerjee, Sudipto
Varying coefficient models are popular tools for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily because posterior computation using Markov chain Monte Carlo (MCMC) algorithms is prohibitively slow. We introduce Bayesian data sketching for varying coefficient models to obviate the computational challenges presented by large sample sizes. To address these challenges, we compress the functional response vector and predictor matrix by a random linear transformation to achieve dimension reduction, and conduct inference on the compressed data. Our approach distinguishes itself from several existing methods for analyzing large functional data in that it requires neither the development of new models or algorithms nor any specialized computational hardware, while delivering fully model-based Bayesian inference. Well-established methods and algorithms for varying coefficient regression models can be applied to the compressed data. We establish posterior contraction rates for estimating the varying coefficients and predicting the outcome at new locations under the randomly compressed data model.
We use simulation experiments and a spatially varying coefficient analysis of remotely sensed vegetation data to empirically illustrate the inferential and computational efficiency of our approach.

Bayesian Data Sketching for Varying Coefficient Regression Models (2024-09-25)
Guhaniyogi, Rajarshi; Baracaldo, Laura; Banerjee, Sudipto

Bayesian scalar-on-tensor regression using the Tucker decomposition for sparse spatial modeling finds promising results analyzing neuroimaging data (2024-09-25)
Spencer, Daniel; Guhaniyogi, Rajarshi; Prado, Raquel; Shinohara, Russell
Modeling with multidimensional arrays, or tensors, is often challenging due to their high dimensionality. In addition, these structures typically exhibit inherent sparsity, requiring regularization methods to properly characterize the association between a tensor covariate and a scalar response. We propose a Bayesian method to efficiently model a scalar response with a tensor covariate using the Tucker tensor decomposition, which retains the spatial relationships within the tensor coefficient while reducing the number of free parameters in the model and applying regularization. Simulated data are analyzed to compare the model with recently proposed methods. A neuroimaging analysis using data from the Alzheimer's Disease Neuroimaging Initiative shows improved inferential performance compared with other tensor regression methods.

A Covariance Based Clustering for Tensor Objects (2023-03-03)
Gutierrez, Rene; Scheffler, Aaron; Guhaniyogi, Rajarshi; Dickinson, Abigail; DiStefano, Charlotte; Jeste, Shafali
Clustering of tensors with limited sample size has become prevalent in a variety of application areas. Existing Bayesian model-based clustering of tensors yields less accurate clusters when the tensor dimensions are large, the sample size is low, and the clusters differ mainly in their variability.
This article develops a clustering technique for high-dimensional tensors with limited sample size when the clusters differ in their covariances rather than in their means. The proposed approach constructs several matrices from each tensor, referred to as transformed features, to adequately estimate its variability along different modes, and applies a model-based approximate Bayesian clustering algorithm to these matrices in place of the original tensor data. Although some information in the data is discarded, the approach gains substantial computational efficiency and clustering accuracy. A simulation study assesses the proposed approach and its competitors in terms of estimating the number of clusters, identifying the modal cluster membership, and the probability of misclassification (a measure of clustering uncertainty). The proposed methodology provides novel insights into potential clinical subgroups of children with autism spectrum disorder based on resting-state electroencephalography activity.

Covariate-Dependent Clustering of Undirected Networks with Brain-Imaging Data (2023-05-16)
Guha, Sharmistha; Guhaniyogi, Rajarshi
This article focuses on model-based clustering of subjects based on the shared relationships of subject-specific networks and covariates in scenarios where the relationship between networks and covariates differs across groups of subjects. It is also of interest to identify the network nodes significantly associated with each covariate in each cluster of subjects. To address these methodological questions, we propose a novel nonparametric Bayesian mixture modeling framework with an undirected network response and scalar predictors. The symmetric matrix coefficients corresponding to the scalar predictors of interest in each mixture component combine a low-rank structure with group sparsity within that structure.
While the low-rank structure in the network coefficients adds parsimony and computational efficiency, the group sparsity within the low-rank structure enables inference on the network nodes and cells significantly associated with each scalar predictor. As a principled Bayesian mixture modeling framework, our approach allows model-based identification of the number of clusters, quantifies clustering uncertainty through the co-clustering matrix, and precisely characterizes the uncertainty in identifying network nodes significantly related to a predictor in each cluster. Empirical results in various simulation scenarios illustrate substantial inferential gains of the proposed framework in comparison with competitors. Analysis of a real brain connectome dataset using the proposed method provides insights into the brain regions of interest (ROIs) significantly related to creative achievement in each cluster of subjects.

Data Sketching and Stacking: A Confluence of Two Strategies for Predictive Inference in Gaussian Process Regressions with High-Dimensional Features (2024-05-02)
Gailliot, Samuel; Guhaniyogi, Rajarshi; Peng, Roger
This article focuses on drawing computationally efficient predictive inference from Gaussian process (GP) regressions with a large number of features when the response is conditionally independent of the features given their projection onto a noisy low-dimensional manifold. Bayesian estimation of the regression relationship using Markov chain Monte Carlo (MCMC), followed by predictive inference, is computationally prohibitive and may lead to inferential inaccuracies, since accurate variable selection is essentially impossible in such high-dimensional GP regressions. As an alternative, this article proposes to sketch the high-dimensional feature vector with a carefully constructed sketching matrix before fitting a GP with the scalar outcome and the sketched feature vector to draw predictive inference.
The analysis is performed in parallel with many different sketching matrices and smoothing parameters on different processors, and the predictive inferences are combined using Bayesian predictive stacking. Since the posterior predictive distribution on each processor is analytically tractable, the algorithm bypasses robustness issues due to the convergence and mixing of MCMC chains, leading to fast implementation with a very large number of features. The approach outperforms competitors in point prediction and predictive uncertainty of outdoor air pollution from satellite images.

InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data (2024-09-24)
Lei, Bowen; Guhaniyogi, Rajarshi; Chandra, Krishnendu; Scheffler, Aaron; Mallick, Bani
There is significant interest in exploring non-linear associations among multiple images derived from diverse imaging modalities. While there is a growing literature on image-on-image regression for predicting an image from multiple images, existing approaches are limited in how efficiently they borrow information across multiple imaging modalities when predicting an image. Building on the literature on Variational Autoencoders (VAEs), this article proposes a novel approach, referred to as the Integrative Variational Autoencoder (InVA) method, which borrows information from multiple images obtained from different sources to draw predictive inference on an image. The proposed approach captures complex non-linear associations between the outcome image and the input images while allowing rapid computation. Numerical results demonstrate substantial advantages of InVA over VAEs, which typically do not allow borrowing information between input images.
The proposed framework offers highly accurate predictive inference for costly positron emission tomography (PET) from multiple measures of cortical structure in human brain scans readily available from magnetic resonance imaging (MRI).

Multi-object Data Integration in the Study of Primary Progressive Aphasia (2024-09-25)
Gutierrez, Rene; Scheffler, Aaron; Guhaniyogi, Rajarshi; Gorno-Tempini, Maria; Mandelli, Marilu; Battistella, Giovanni
This article focuses on a multi-modal imaging data application where structural/anatomical information from gray matter (GM) and brain connectivity information, in the form of a brain connectome network from functional magnetic resonance imaging (fMRI), are available for a number of subjects with different degrees of primary progressive aphasia (PPA), a neurodegenerative disorder (ND) measured through a speech rate measure of motor speech loss. The clinical/scientific goal of this study is to identify brain regions of interest significantly related to the speech rate measure in order to gain insight into ND patterns. Viewing the brain connectome network and GM images as objects, we develop an integrated object response regression framework of network and GM images on the speech rate measure. A novel integrated prior formulation is proposed on the network and structural image coefficients in order to exploit network information of the brain connectome while leveraging the interconnections between the two objects. The principled Bayesian framework allows the characterization of uncertainty in ascertaining whether a region is actively related to the speech rate measure.
Our framework yields new insights into the brain regions associated with PPA, offering a deeper understanding of the neurodegenerative patterns of PPA.

Multi-object Data Integration in the Study of Primary Progressive Aphasia (2023-03-02)
Gutierrez, Rene; Scheffler, Aaron; Guhaniyogi, Rajarshi; Gorno-Tempini, Maria; Mandelli, Maria; Battistella, Giovanni
This article focuses on a multi-modal imaging data application where structural/anatomical information from grey matter (GM) and brain connectivity information, in the form of a brain connectome network from functional magnetic resonance imaging (fMRI), are available for a number of subjects with different degrees of primary progressive aphasia (PPA), a neurodegenerative disorder (ND) measured through a speech rate measure of motor speech loss. The clinical/scientific goal of this study is to identify brain regions of interest significantly related to the speech rate measure in order to gain insight into ND pathways. Viewing the brain connectome network and GM images as objects, we develop a flexible joint object response regression framework of network and GM images on the speech rate measure. A novel joint prior formulation is proposed on the network and structural image coefficients in order to exploit network information of the brain connectome, while leveraging the topological linkages between the connectome network and anatomical information from GM to draw inference on brain regions significantly related to the speech rate measure. The principled Bayesian framework allows precise characterization of the uncertainty in ascertaining whether a region is actively related to the speech rate measure.
Our framework yields new insights into the relationship of brain regions with PPA, offering a deeper understanding of neurodegeneration pathways in PPA.

Regression with Structured Features at Multiple Scales to the Study of General Cognition in Children (2024-05-05)
Gutierrez, Rene; Guhaniyogi, Rajarshi; Scheffler, Aaron
This article is motivated by an application in which we aim to understand the neural underpinnings of general cognition, a pivotal indicator of healthy brain development, by examining how structural task-based brain activation maps and resting-state brain connectivity graphs relate to general cognition in children aged 9-10 years. While prior studies have identified certain brain regions linked to general cognition, these findings predominantly rely on analyses of a single imaging modality, such as the resting-state graph alone. Moreover, no structured regression technique currently exists to assess the collective impact of both structural and graph features on general cognition while preserving the linkage between their topologies. To address this gap, this article develops a regression model with a scalar outcome and two sets of imaging features obtained at different scales: (a) a graph-valued feature with "labelled" nodes at a coarse scale, quantifying interconnections between nodes in the form of a brain connectome graph from resting-state functional magnetic resonance imaging (fMRI); and (b) structural features at a finer scale, nested within each graph node, in the form of task-based brain activation maps. We introduce a novel flexible Bayesian regression framework that harnesses the relational information of nodes in the graph-valued feature and the nested architecture between graph and structural features through a novel joint prior structure on the coefficients. We refer to the proposed framework as Bayesian Multi-Object Feature Regression (BMFR).
The framework enables inference on the significant nodes in the graph that are predictive of the outcome, on the coefficients for features at both scales, and on the predicted outcome, each accompanied by precise characterization of uncertainty. The implementation uses an efficient Markov chain Monte Carlo algorithm. Simulation results showcase the framework's excellent performance in influential node inference, regression coefficient estimation, and outcome prediction, outperforming popular competitors such as high-dimensional regression approaches, tree-based models, and deep neural networks. Application of BMFR to the multi-modal imaging data identifies two parieto-frontal resting-state networks, and constituent structural regions activated during a working memory task, that provide new evidence supporting existing theories of neuronal integration.

Robust Distributed Learning of Functional Data From Simulators through Data Sketching (2024-05-01)
Andros, Jacob; Guhaniyogi, Rajarshi; Francom, Devin; Pasqualini, Donatella
Realistic simulations are crucial for understanding complex systems in climate and environmental studies. Yet running sophisticated computational models across a wide range of input settings often overwhelms large computer systems. Statistical surrogate models, or emulators, play a vital role in efficiently exploring the simulator input space. Functional data models involving Gaussian processes (GPs) and their computationally efficient variants have become standard tools for this goal. Conventional centralized processing of such models requires substantial computational and storage resources at the central server. To counter this, emerging distributed Bayesian learning frameworks partition the raw data into shards and distribute computations on these shards across machines.
While this strategy mitigates data storage costs and speeds up computation within each machine, it raises concerns about the sensitivity of distributed inference to shard selection. Motivated by the concept of data sketching in the literature, this article proposes an alternative. Instead of creating data shards, our approach employs multiple random matrices to construct multiple random linear projections, or "random sketches," of the complete dataset. Posterior inference on functional data models is performed using the random sketches on various machines in parallel. These individual inferences are then combined across machines at a central server. By aggregating inference across diverse random matrices, our approach is resilient to the selection of data sketches, yielding a novel robust distributed Bayesian learning approach. An important advantage of our approach is its ability to maintain the privacy of sampling units, as inference is based on random data sketches that do not allow recovery of the raw data. We illustrate the significance of our approach through various simulated data examples in the realm of Bayesian distributed learning techniques. Finally, we demonstrate the performance of our proposed approach as an emulator with surrogates of the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) simulator, a simulator of choice for government agencies.

Sketching in High Dimensional Regression With Big Data Using Gaussian Scale Mixture Priors (2023-05-16)
Guhaniyogi, Rajarshi; Scheffler, Aaron
Bayesian computation for high-dimensional linear regression models with popular Gaussian scale mixture prior distributions, using Markov chain Monte Carlo (MCMC) or its variants, can be extremely slow or completely prohibitive because its computational cost grows cubically in p, the number of features.
Although a few recently developed algorithms achieve computational efficiency with small to moderately large sample sizes, the computational issues are considerably less explored when the sample size n is also large, apart from a few recent articles. In this article, we propose a sketching approach that compresses the n original samples by a random linear transformation to m samples in p dimensions, and computes Bayesian regression with Gaussian scale mixture prior distributions using the randomly compressed response vector and feature matrix. The computational complexity of our approach grows cubically in m. Our detailed empirical investigation with the Horseshoe prior, from the class of Gaussian scale mixture priors, shows closely similar inference and a considerable reduction in per-iteration computation time for the proposed approach compared with regression on the full sample. One notable contribution of this article is the derivation of the posterior contraction rate for high-dimensional feature coefficients with a general class of shrinkage priors on the coefficients under data compression/sketching. In particular, we characterize the dimension m of the compressed response vector as a function of the sample size, the number of features, and the sparsity of the regression, so as to guarantee accurate asymptotic estimation of the feature coefficients even after data sketching.
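The compression step described in the last abstract can be illustrated with a short sketch. It assumes a Gaussian sketching matrix with i.i.d. N(0, 1/m) entries and, for brevity, replaces the Horseshoe and other Gaussian scale mixture priors with a plain normal prior β ~ N(0, τ²I), under which the posterior mean has a closed form. The function names and the values of m, τ² and σ² are hypothetical choices for illustration, not the authors' implementation. What it shows is that, once the data are sketched, the linear system solved is m × m rather than n × n or p × p.

```python
import numpy as np


def sketch_data(y, X, m, rng):
    """Compress (y, X) from n samples to m samples with one random linear map.

    Hypothetical helper. Assumption: the sketching matrix Phi (m x n) has
    i.i.d. N(0, 1/m) entries; the article's sketching distribution may differ.
    """
    n = X.shape[0]
    phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return phi @ y, phi @ X


def compressed_posterior_mean(y_s, X_s, tau2=1.0, sigma2=1.0):
    """Posterior mean of beta in the compressed model y_s ~ N(X_s beta, sigma2 I_m)
    under a normal prior beta ~ N(0, tau2 I_p).

    Hypothetical helper. The normal prior is a stand-in for the Gaussian scale
    mixture priors (e.g. Horseshoe) used in the article, and sigma2 is the noise
    variance of the compressed model. The push-through identity
        (X_s'X_s / sigma2 + I_p / tau2)^{-1} X_s'y_s / sigma2
            = tau2 X_s' (tau2 X_s X_s' + sigma2 I_m)^{-1} y_s
    means only an m x m linear system is solved, an O(m^3) step.
    """
    m = X_s.shape[0]
    gram = tau2 * (X_s @ X_s.T) + sigma2 * np.eye(m)
    return tau2 * X_s.T @ np.linalg.solve(gram, y_s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, m = 2000, 50, 300
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = [3.0, -2.0, 1.5, 2.0, -1.0]  # simulated sparse coefficients
    y = X @ beta + 0.5 * rng.normal(size=n)

    y_s, X_s = sketch_data(y, X, m, rng)  # 2000 samples compressed to 300
    beta_hat = compressed_posterior_mean(y_s, X_s)
    print(np.round(beta_hat[:5], 2))
```

With a Gaussian scale mixture prior, the per-coefficient prior variances would typically be updated inside an MCMC loop, with a solve of this m × m form at each iteration; the sketch above only shows a single solve with a fixed τ².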