The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
Sparse Inverse Cholesky Factorization for Scalable Gaussian-Process Inference
Abstract
Gaussian processes (GPs) are widely used for modeling functions in statistics and machine learning. However, direct GP inference is computationally intractable for large-scale applications and requires approximations, because decomposing the covariance matrix of the data vector scales cubically in the number of data points. A highly promising approach to achieving GP scalability is sparse inverse Cholesky (SIC) approximation, also known as Vecchia approximation, an ordered conditional approximation of the data vector that implies a sparse Cholesky factor of the precision matrix.
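As an illustrative sketch (not code from the dissertation), the ordered conditional construction can be written in a few lines of NumPy: each data point is conditioned on at most m earlier points in the ordering, and the resulting regression coefficients form a sparse upper-triangular factor U with K⁻¹ ≈ U Uᵀ. The exponential kernel, 1-D inputs, and m = 5 are arbitrary demo choices; since the exponential kernel is Markov in one dimension, nearest-predecessor conditioning happens to make the approximation exact in this toy setting.

```python
import numpy as np

n, m = 100, 5                                  # data size, conditioning-set size
x = np.linspace(0.0, 10.0, n)                  # 1-D inputs, already ordered
K = np.exp(-np.abs(x[:, None] - x[None, :]))   # exponential covariance matrix

# Vecchia/SIC: condition each point on its m nearest predecessors in the
# ordering.  Column i of U holds the (scaled) regression coefficients, so U is
# upper triangular with at most m+1 nonzeros per column and K^{-1} ~ U @ U.T.
U = np.zeros((n, n))
for i in range(n):
    prev = np.argsort(np.abs(x[:i] - x[i]))[:m]   # nearest earlier points
    if len(prev) > 0:
        b = np.linalg.solve(K[np.ix_(prev, prev)], K[prev, i])  # kriging weights
        d = K[i, i] - K[prev, i] @ b                            # conditional variance
        U[prev, i] = -b / np.sqrt(d)
    else:
        d = K[i, i]
    U[i, i] = 1.0 / np.sqrt(d)
```

Each column costs O(m³), so the whole factor costs O(n m³) rather than the O(n³) of a dense Cholesky decomposition of K.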
This dissertation proposes two SIC-type approximations that are applicable to complex settings such as GPs with non-Euclidean input domains or non-Gaussian data likelihoods. The first approximation determines the ordering of the data points and the Cholesky sparsity pattern based on a correlation-based distance between the inputs or locations corresponding to the data points, instead of Euclidean distance. This correlation-based approach implicitly applies the SIC approximation in a suitably transformed input space and offers a simple, automatic strategy for GP inference that can be applied to any covariance function, even when Euclidean distance is not applicable. The second approximation is a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky factors. We combine this variational approximation of the posterior with a SIC approximation to the prior. This variational approach is double-Kullback-Leibler (KL) optimal in the sense that the variational approximation is reverse-KL-optimal for a given log normalizer and the prior SIC approximation is forward-KL-optimal for a given sparsity pattern.
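A minimal sketch of the first idea, assuming a precomputed correlation matrix: replace Euclidean distance with the correlation distance √(1 − |ρᵢⱼ|), then greedily order the points so that each next point maximizes its minimum distance to the points already ordered (a maximum-minimum-distance ordering). The O(n²) greedy loop and this particular distance definition are illustrative simplifications, not the dissertation's scalable algorithm.

```python
import numpy as np

def corr_maxmin_order(C, first=0):
    """Greedy maximum-minimum-distance ordering under correlation distance.

    C     : (n, n) correlation matrix (any covariance, rescaled to unit diagonal)
    first : index of the point placed first in the ordering
    The distance between points i and j is sqrt(1 - |C[i, j]|), so highly
    correlated points are "close" regardless of their Euclidean geometry.
    """
    D = np.sqrt(1.0 - np.abs(C))
    n = C.shape[0]
    order = [first]
    mindist = D[first].copy()        # distance of every point to the ordered set
    for _ in range(n - 1):
        mindist[order] = -np.inf     # never re-select an already-ordered point
        nxt = int(np.argmax(mindist))
        order.append(nxt)
        mindist = np.minimum(mindist, D[nxt])
    return np.array(order)

# Demo on a 1-D exponential correlation, where the ordering is easy to sanity-check.
x = np.linspace(0.0, 1.0, 50)
C = np.exp(-np.abs(x[:, None] - x[None, :]))
order = corr_maxmin_order(C)
```

The conditioning sets can then be chosen as the m nearest previously ordered points under the same correlation distance, so both the ordering and the sparsity pattern adapt to the covariance itself rather than to the geometry of the input space.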
We also investigate asymptotic properties of maximum Vecchia-likelihood (MVL) estimation and of GP prediction based on the Vecchia approximation, for GPs with a particular class of covariance functions whose associated spectral-density and boundary-conditioning assumptions allow a theory to be established based on the partial differential equation (PDE) literature, under the fixed-domain asymptotic framework. Our theoretical findings suggest that consistency and asymptotic normality of maximum exact-likelihood (ML) estimators imply those of MVL estimators, and that the exact predictive distribution of an unobserved variable can be inferred accurately using the Vecchia approximation, provided that the computational-complexity parameter grows polylogarithmically with the data size.
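To make the MVL idea concrete, here is a hypothetical toy estimator (not from the dissertation): for a Vecchia approximation with sparse factor U, log det K̂⁻¹ = 2 Σᵢ log Uᵢᵢ and yᵀK̂⁻¹y = ‖Uᵀy‖², so the approximate log-likelihood is cheap to evaluate, and a range parameter can be estimated by maximizing it. The exponential kernel, the grid search, and m = 5 are arbitrary demo choices.

```python
import numpy as np

def vecchia_loglik(y, x, rho, m=5):
    """Gaussian log-likelihood of y under the Vecchia approximation, for the
    covariance exp(-|x_i - x_j| / rho) with m nearest-predecessor neighbors."""
    n = len(y)
    K = np.exp(-np.abs(x[:, None] - x[None, :]) / rho)
    U = np.zeros((n, n))
    for i in range(n):               # sparse inverse Cholesky factor, column by column
        prev = np.argsort(np.abs(x[:i] - x[i]))[:m]
        if len(prev) > 0:
            b = np.linalg.solve(K[np.ix_(prev, prev)], K[prev, i])
            d = K[i, i] - K[prev, i] @ b
            U[prev, i] = -b / np.sqrt(d)
        else:
            d = K[i, i]
        U[i, i] = 1.0 / np.sqrt(d)
    z = U.T @ y                      # y' K^{-1} y is approximated by z' z
    return np.sum(np.log(np.diag(U))) - 0.5 * z @ z - 0.5 * n * np.log(2.0 * np.pi)

# Simulate data with true range 2.0, then maximize over a small grid (MVL).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 150)
K_true = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
y = np.linalg.cholesky(K_true) @ rng.standard_normal(150)
grid = np.linspace(0.5, 5.0, 10)
rho_hat = grid[np.argmax([vecchia_loglik(y, x, r) for r in grid])]
```

Each likelihood evaluation costs O(n m³), which is what makes maximizing the Vecchia likelihood feasible when the exact likelihood is not.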
Subject
Vecchia approximation
Maximum-minimum-distance ordering
Nearest neighbors
Variational inference
Infill asymptotics
Citation
Kang, Myeongjong (2023). Sparse Inverse Cholesky Factorization for Scalable Gaussian-Process Inference. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/200123.