dc.contributor.advisor: Katzfuss, Matthias
dc.creator: Kang, Myeongjong
dc.date.accessioned: 2023-10-12T15:13:17Z
dc.date.created: 2023-08
dc.date.issued: 2023-08-07
dc.date.submitted: August 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/200123
dc.description.abstract: Gaussian processes (GPs) are widely used for modeling functions in statistics and machine learning. However, for large-scale applications, direct GP inference is computationally intractable and requires approximation, because decomposing the covariance matrix of the data vector has computational cost that scales cubically in the number of data points. A highly promising approach to achieving GP scalability is the sparse inverse Cholesky (SIC) approximation, also known as the Vecchia approximation, an ordered conditional approximation of the data vector that implies a sparse Cholesky factor of the precision matrix. This dissertation proposes two SIC-type approximations that are applicable to complex settings such as GPs with non-Euclidean input domains or with non-Gaussian data likelihoods. The first approximation determines the ordering of the data points and the Cholesky sparsity pattern based on a correlation-based distance between the inputs or locations corresponding to the data points, instead of Euclidean distance. This correlation-based approach implicitly applies the SIC approximation in a suitably transformed input space and offers a simple, automatic strategy for GP inference that can be applied to any covariance function, even when Euclidean distance is not applicable. The second approximation is a variational approximation based on a family of Gaussian distributions whose covariance matrices have SIC factors. We combine this variational approximation of the posterior with a SIC approximation of the prior. This variational approach is double-Kullback-Leibler (KL) optimal in the sense that the variational approximation is reverse-KL-optimal for a given log normalizer and the prior SIC approximation is forward-KL-optimal for a given sparsity pattern. We also investigate asymptotic properties of maximum Vecchia-likelihood (MVL) estimation and of GP prediction based on the Vecchia approximation, for GPs with a particular type of covariance function whose associated spectral-density and boundary-conditioning assumptions allow us to establish a theory based on the partial differential equation (PDE) literature under the fixed-domain asymptotic framework. Our theoretical findings suggest that consistency and asymptotic normality of maximum exact-likelihood (ML) estimators imply those of MVL estimators, and that the exact predictive distribution of an unobserved variable can be inferred accurately using the Vecchia approximation, provided that the computational-complexity parameter grows polylogarithmically with the data size. (See the notational sketch following this record.)
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Vecchia approximation
dc.subject: Maximum-minimum-distance ordering
dc.subject: Nearest neighbors
dc.subject: Variational inference
dc.subject: Infill asymptotics
dc.title: Sparse Inverse Cholesky Factorization for Scalable Gaussian-Process Inference
dc.type: Thesis
thesis.degree.department: Statistics
thesis.degree.discipline: Statistics
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Pourahmadi, Mohsen
dc.contributor.committeeMember: Sang, Huiyan
dc.contributor.committeeMember: Tuo, Rui
dc.type.material: text
dc.date.updated: 2023-10-12T15:13:18Z
local.embargo.terms: 2025-08-01
local.embargo.lift: 2025-08-01
local.etdauthor.orcid: 0000-0003-2561-7384
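
The following is a notational sketch of the sparse inverse Cholesky (SIC, or Vecchia) approximation described in the abstract above; the symbols used here (conditioning sets c(i), size bound m, triangular factor U) are generic illustrative notation and not necessarily those of the dissertation. For a zero-mean GP observed at n ordered data points with joint density p(y), the ordered conditional approximation replaces each full conditioning set with a small subset of earlier points:

    p(\mathbf{y}) \;=\; \prod_{i=1}^{n} p\big(y_i \mid y_1, \dots, y_{i-1}\big)
    \;\approx\; \widehat{p}(\mathbf{y}) \;=\; \prod_{i=1}^{n} p\big(y_i \mid \mathbf{y}_{c(i)}\big),
    \qquad c(i) \subseteq \{1, \dots, i-1\}, \quad |c(i)| \le m .

The resulting \widehat{p}(\mathbf{y}) is Gaussian with precision matrix \mathbf{U}\mathbf{U}^{\top}, where \mathbf{U} is upper triangular and column i has nonzero off-diagonal entries only in the rows indexed by c(i); this is the sparse Cholesky factor of the precision matrix referred to in the abstract, and it can be computed in O(n m^3) time. The dissertation's first contribution chooses the ordering and the conditioning sets c(i) according to a correlation-based distance rather than Euclidean distance; the second employs the same sparsity structure within a Gaussian variational family for the posterior.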