Adaptive Basis Sampling for Smoothing Splines
Abstract
Smoothing splines provide flexible nonparametric regression estimators. Penalized likelihood method is adopted when responses are from exponential families and multivariate models are constructed with certain analysis of variance decomposition. However, the high computational cost of smoothing splines for large data sets has hindered their wide application. We develop a new method, named adaptive basis sampling, for efficient computation of smoothing splines in super-large samples. Generally, a smoothing spline for a regression problem with sample size n can be expressed as a linear combination of n basis functions and its computational complexity is O(n³). We achieve a more scalable computation in the multivariate case by evaluating the smoothing spline using a smaller set of basis functions, obtained by an adaptive sampling scheme that uses values of the response variable. Our asymptotic analysis shows that smoothing splines computed via adaptive basis sampling converge to the true function at the same rate as full basis smoothing splines. We show that the proposed method outperforms a sampling method that does not use the values of response variable by simulation studies, and apply it to several real data examples.
Citation
Zhang, Nan (2015). Adaptive Basis Sampling for Smoothing Splines. Doctoral dissertation, Texas A & M University. Available electronically from https : / /hdl .handle .net /1969 .1 /155626.