Randomized Functional Data Analysis and its Application in Astronomy
Abstract
Functional data analysis (FDA) methods have computational and theoretical appeal for some high-dimensional data, but they do not scale to modern large-sample datasets. Covariance operators are fundamental concepts and modeling tools for many FDA methods, such as functional principal component analysis. However, the empirical (or estimated) covariance operator becomes too costly to compute when the functional dataset grows large. We study a randomized algorithm for covariance operator estimation. The algorithm samples and rescales observations from the large functional data collection to form a much smaller sketch, then performs the computation on the sketch to obtain the subsampled empirical covariance operator. The proposed algorithm is theoretically justified via non-asymptotic bounds on the difference between the subsampled and the full-sample empirical covariance operators in the Hilbert-Schmidt norm and the operator norm. It is shown that the optimal sampling probability, which minimizes the expected squared Hilbert-Schmidt norm of the subsampling error, is determined by the norm of each function. Simulated and real data examples are used to illustrate the effectiveness of the proposed algorithm.
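The sample-and-rescale idea described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation and rests on several assumptions: the functions are observed on a common grid and stored as rows of a matrix, the data are already centered, sampling is with replacement, and the sampling probabilities are proportional to the squared norm of each function (the form familiar from randomized matrix multiplication; the abstract states only that the optimal probability is determined by the norm). All function names are hypothetical, and the Frobenius norm of the discretized matrix stands in for the Hilbert-Schmidt norm up to a grid-dependent constant.

```python
import numpy as np


def sampling_probabilities(X):
    """Norm-based probabilities p_i proportional to ||x_i||^2 (assumed form)."""
    sq_norms = np.sum(X ** 2, axis=1)
    return sq_norms / sq_norms.sum()


def full_covariance(X):
    """Full-sample empirical covariance (1/n) * sum_i x_i x_i^T for centered rows."""
    n = X.shape[0]
    return X.T @ X / n


def subsampled_covariance(X, c, probs, rng):
    """Covariance computed from a sketch of c rows sampled with replacement.

    Each sampled row is rescaled by 1 / sqrt(c * n * p_i), which makes the
    sketch-based estimate unbiased for the full-sample empirical covariance.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=c, replace=True, p=probs)
    scale = 1.0 / np.sqrt(c * n * probs[idx])
    sketch = X[idx] * scale[:, None]  # c x m sketch, much smaller than n x m
    return sketch.T @ sketch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for n = 5000 centered curves on a grid of m = 50 points.
    X = rng.standard_normal((5000, 50)) * np.linspace(2.0, 0.1, 50)
    probs = sampling_probabilities(X)
    C_full = full_covariance(X)
    C_sub = subsampled_covariance(X, c=500, probs=probs, rng=rng)
    # Relative subsampling error in the Frobenius norm.
    print(np.linalg.norm(C_sub - C_full) / np.linalg.norm(C_full))
```

Replacing `probs` with the uniform vector `np.full(n, 1.0 / n)` gives plain uniform subsampling, which makes it easy to compare the two schemes on the same data.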
The idea of randomization is then applied to a Type Ia supernova (SN Ia) spectrophotometric data modeling problem. We develop the Independent Component Estimation (ICE) method for sparse and irregularly spaced spectrophotometric data of Type Ia supernovae (SNe Ia), using functional principal component analysis (FPCA) and independent component analysis (ICA) to explore the separation of SN Ia intrinsic properties from the effect of interstellar dust reddening. This separation makes it possible to construct the intrinsic spectral energy distribution (SED) manifolds of SNe Ia, which facilitates supernova studies and their cosmological applications.
Citation
Yan, Xiaomeng (2022). Randomized Functional Data Analysis and its Application in Astronomy. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/197365.