Non-linear and Sparse Discriminant Analysis with Data Compression
Abstract
Large-sample data have become prevalent as data acquisition has become cheaper and easier. While a large sample size has theoretical advantages for many statistical methods, it presents computational challenges, whether in the form of a large number of features or a large number of training samples. We consider the two-group classification problem and adapt Linear Discriminant Analysis to the problems above. Linear Discriminant Analysis is a linear classifier and will under-fit when the true decision boundary is non-linear.
To address non-linearity and sparse feature selection, we propose a kernel classifier based on the optimal scoring framework that trains a non-linear classifier. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also allow for feature selection by imposing structured sparsity through weighted kernels. We propose fully automated methods for the selection of all tuning parameters; in particular, we adapt kernel shrinkage ideas to select the ridge parameter and the Gaussian kernel parameter. Numerical studies demonstrate the superior classification performance of the proposed approach compared to existing nonparametric classifiers.
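To illustrate the general shape of a ridge-penalized kernel classifier in the optimal scoring framework, the sketch below fits a Gaussian-kernel discriminant by solving one regularized linear system and classifies by the sign of the fitted discriminant function. The scoring vector, the regularization form, and the parameter names (`sigma`, `lam`) are illustrative assumptions, not the dissertation's exact estimator or its automated tuning procedure.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kernel_os_train(X, y, sigma=1.0, lam=1e-2):
    """Fit a ridge-penalized kernel discriminant via optimal scoring.

    y takes values in {-1, +1}; theta holds centered optimal scores for
    the two groups. Illustrative sketch only, with hypothetical defaults.
    """
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    n1, n2 = (y == 1).sum(), (y == -1).sum()
    # Centered, scaled score vector: +n/n1 for group 1, -n/n2 for group 2.
    theta = np.where(y == 1, n / n1, -n / n2) / np.sqrt(n)
    # One regularized linear solve replaces an explicit feature expansion.
    alpha = np.linalg.solve(K + n * lam * np.eye(n), theta)
    return alpha

def kernel_os_predict(X_train, alpha, X_new, sigma=1.0):
    """Classify new points by the sign of the discriminant function."""
    return np.sign(gaussian_kernel(X_new, X_train, sigma) @ alpha)
```

Because the decision function is a kernel expansion over the training points, the resulting boundary is non-linear in the original features even though the fit reduces to a single linear solve.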
To address the computational challenges of a large sample size, we adapt compression to the classification setting. Sketching, or compression, is a well-studied approach to sample reduction in regression settings, but considerably less is known about its performance in classification settings. Here we consider the computational issues due to large sample size within the discriminant analysis framework. We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis, in contrast to existing compression methods, which focus on reducing the number of features. We support our approach with a theoretical bound on the misclassification error rate compared to the Bayes classifier. Empirical studies confirm the significant computational gains of the proposed method and its superior predictive ability compared to random sub-sampling.
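The basic idea of sample compression for discriminant analysis can be sketched as follows: each class's centered data matrix is left-multiplied by a small random Gaussian matrix, shrinking the sample dimension before the pooled covariance is formed, while the class means are kept from the full data since they are cheap to compute. The sketch dimension `m`, the Gaussian sketch, and the small ridge added for invertibility are illustrative assumptions, not the exact estimator analyzed in the dissertation.

```python
import numpy as np

def compressed_lda(X1, X2, m, rng=None):
    """Fit two-group LDA on sample-compressed data.

    Each n_g x p class matrix is left-multiplied by an m x n_g Gaussian
    sketch with i.i.d. N(0, 1/m) entries, so E[S.T @ S] = I and the
    compressed residuals preserve the covariance in expectation.
    Illustrative sketch of sample compression, not the dissertation's
    exact method.
    """
    rng = np.random.default_rng(rng)
    mu1, mu2 = X1.mean(0), X2.mean(0)
    parts = []
    for Xg, mug in ((X1, mu1), (X2, mu2)):
        S = rng.normal(0.0, 1.0 / np.sqrt(m), (m, Xg.shape[0]))
        parts.append(S @ (Xg - mug))          # m x p compressed residuals
    C = np.vstack(parts)
    # Pooled covariance estimate from the 2m compressed rows.
    Sigma = C.T @ C / (X1.shape[0] + X2.shape[0] - 2)
    # Small ridge (hypothetical) guards against a singular estimate.
    w = np.linalg.solve(Sigma + 1e-8 * np.eye(Sigma.shape[0]), mu1 - mu2)
    b = -0.5 * w @ (mu1 + mu2)
    return w, b  # classify x as group 1 when w @ x + b > 0
```

The covariance solve then runs on 2m sketched rows instead of n1 + n2 original samples, which is where the computational savings come from when m is much smaller than the sample size.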
Subject
Linear Discriminant Analysis
classification
kernels
sparse feature selection
sample reduction
random matrix
compression
sketching
Citation
Lapanowski, Alexander Frank (2020). Non-linear and Sparse Discriminant Analysis with Data Compression. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/192554.