Two-Sample Testing in High Dimension and a Smooth Block Bootstrap for Time Series
This document contains three sections. The first two present new methods for two-sample testing when there are many variables of interest, and the third presents a new methodology for bootstrapping time series. In the first section we develop a test statistic for testing the equality of two population mean vectors in the "large-p-small-n" setting. Such a test must overcome the rank deficiency of the sample covariance matrix. When p exceeds the total sample size minus two, the pooled sample covariance matrix is singular, so the classic Hotelling T^2 test, which requires its inverse, breaks down. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering in which the dependence between two components is related to their distance apart in that ordering. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. It does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. Analyses of mitochondrial calcium concentration in mouse cardiac muscle over time and of copy number variations in a glioblastoma multiforme data set from The Cancer Genome Atlas illustrate the test.

In the second section we present a theorem establishing a power improvement to the Benjamini-Hochberg procedure for controlling the false discovery rate when it is applied to test statistics that have been adjusted for the effects of latent factors. We extend recently published factor-adjustment methodology to serially dependent test statistics by presenting a frequency-domain adaptation of that procedure.
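The standard Benjamini-Hochberg step-up procedure, whose power the second section's factor adjustment is meant to improve, can be sketched as follows. This is only the textbook, unadjusted procedure applied to a list of p-values; the latent-factor and harmonic factor adjustments of the test statistics are not specified in this abstract and are not shown. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Controls the false discovery rate at level alpha for independent
    (or positively dependent) p-values.
    """
    m = len(pvalues)
    # Indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])


print(benjamini_hochberg([0.02, 0.03, 0.04, 0.045]))  # → [0, 1, 2, 3]
```

In the usage line no p-value falls below its threshold at ranks 1 to 3, yet all four hypotheses are rejected because the largest p-value satisfies 0.045 <= 4 × 0.05 / 4. This step-up behavior is why shifting test statistics, and hence p-values, through a factor adjustment can change how many discoveries the procedure makes.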
We show that our harmonic factor adjustment to the test statistics improves the power of the Benjamini-Hochberg procedure without compromising its control of the false discovery rate when the test statistics are affected by latent periodic components. Our methodology is illustrated with an analysis of copy number variations, which are measured along a chromosome and tend to exhibit serial dependence; the analysis demonstrates power gains from the harmonic factor adjustment.

In the third section we present a smoothed bootstrap procedure for time series data. In contrast to the independent-data case, smoothed bootstraps have received little attention for time series. However, as the smoothed bootstrap for iid data shows, additional data-smoothing steps within resampling can improve bootstrap approximations of the distributions of statistics, especially when those sampling distributions depend critically on unknown, smooth (e.g., infinite-dimensional) population quantities such as marginal densities. To broaden the effectiveness of the bootstrap for time series, we propose a smooth bootstrap obtained by modifying a state-of-the-art block resampling approach for dependent data that uses tapering windows. The resulting smooth (extended) tapered block bootstrap (TBB) is shown to provide valid variance and distributional approximations over a broad class of parameters and statistics for stationary time series, formulated in terms of statistical functionals (e.g., smooth function model statistics, L- and M-estimators, rank statistics). Our treatment goes beyond statistics that are smooth functions of sample averages, showing that the smooth TBB is valid in inference problems where validity has not been formally established for other TBB versions. Finite-sample simulations also provide evidence that the smoothing steps enhance the performance of the block bootstrap for various statistical functionals.
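A minimal sketch of one way tapering and smoothing can be combined, assuming a construction in the spirit of the extended TBB rather than reproducing the dissertation's exact definitions. Blocks of length l are drawn with replacement from all n - l + 1 overlapping blocks, each observation in a drawn block receives a weight proportional to a taper w((t - 0.5)/l), and the resulting weighted empirical distribution is smoothed with a Gaussian kernel of bandwidth h before the functional (here, the median) is evaluated. The trapezoidal taper and its parameter c = 0.43, the Gaussian kernel, the bandwidth h, and the helper names `trapezoid_taper`, `smooth_cdf`, `smooth_quantile`, and `smooth_tbb_median_var` are all illustrative assumptions. Tapered weights inflate the bootstrap variance of a weighted mean by roughly l·Σw²/(Σw)², which is at least 1 by Cauchy-Schwarz, so the sketch multiplies by the reciprocal. Applying the same factor to the median assumes the median behaves like a mean through its usual linearization.

```python
import math
import random
from statistics import pvariance


def trapezoid_taper(t, l, c=0.43):
    """Hypothetical trapezoidal taper w((t - 0.5) / l) for t = 1, ..., l.

    Rises linearly over the first fraction c of the block, equals 1 in the
    middle, and falls linearly over the last fraction c (c is an assumed value).
    """
    u = (t - 0.5) / l
    if u < c:
        return u / c
    if u > 1 - c:
        return (1 - u) / c
    return 1.0


def smooth_cdf(x, points, weights, h):
    """Weighted empirical CDF convolved with a N(0, h^2) kernel, evaluated at x."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - p) / (h * math.sqrt(2.0))))
        for p, w in zip(points, weights)
    )


def smooth_quantile(points, weights, h, q=0.5, tol=1e-6):
    """q-quantile of the smoothed distribution, found by bisection.

    The smoothed CDF is continuous and strictly increasing, so bisection on
    [min - 10h, max + 10h] brackets the unique root of F_h(x) = q.
    """
    lo, hi = min(points) - 10 * h, max(points) + 10 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smooth_cdf(mid, points, weights, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def smooth_tbb_median_var(x, block_len, h, n_boot=200, seed=0):
    """Hypothetical smoothed, tapered block bootstrap variance of the median."""
    rng = random.Random(seed)
    n, l = len(x), block_len
    k = n // l  # blocks per resample, so each resample has k * l <= n points
    taper = [trapezoid_taper(t, l) for t in range(1, l + 1)]
    w1 = sum(taper)                   # ||w||_1
    w2sq = sum(w * w for w in taper)  # ||w||_2 squared
    reps = []
    for _ in range(n_boot):
        points, weights = [], []
        for _ in range(k):
            start = rng.randrange(n - l + 1)  # uniform over overlapping blocks
            for t in range(l):
                points.append(x[start + t])
                weights.append(taper[t] / (w1 * k))  # all weights sum to 1
        reps.append(smooth_quantile(points, weights, h))
    # Undo the variance inflation caused by the tapered weights.
    return (w1 ** 2 / (l * w2sq)) * pvariance(reps)


if __name__ == "__main__":
    # Simulated AR(1) series X_t = 0.5 X_{t-1} + e_t with standard normal e_t.
    gen = random.Random(1)
    series = [0.0]
    for _ in range(299):
        series.append(0.5 * series[-1] + gen.gauss(0.0, 1.0))
    print(smooth_tbb_median_var(series, block_len=10, h=0.2))
```

Setting h close to zero makes the smoothed CDF approach the tapered, weighted step function, so comparing small and moderate h on simulated series is one informal way to see what the extra smoothing step changes.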
Gregory, Karl Bruce (2014). Two-Sample Testing in High Dimension and a Smooth Block Bootstrap for Time Series. Doctoral dissertation, Texas A & M University. Available electronically from