Generalization Bounds for Compressed Learning with Hard Support Vector Machines, and Multiclass Learning with Error Correcting Output Codes
Abstract
This dissertation covers two topics. The first is compressed learning with the hard support vector machine (SVM). We characterize the conditions under which the separability assumption continues to hold after compression. Using these results, we give an upper bound on the compression ratio that maintains separability in the compressed domain. Furthermore, we provide theoretical results showing how the generalization bound changes with the compression ratio. These results give theoretical justification for choosing the best compression matrix given the design parameters at hand.
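As a rough illustration of this setting (not the dissertation's construction), the sketch below compresses linearly separable data with a random Gaussian matrix and trains a linear SVM on the compressed samples. The data generation, the choice of Gaussian compression matrix, the dimensions, and the large-C approximation of the hard margin are all illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n, m, d = 1000, 200, 100  # original dimension, compressed dimension, sample count

    # Illustrative linearly separable data in the original domain.
    w = rng.standard_normal(n)
    X = rng.standard_normal((d, n))
    y = np.sign(X @ w)

    # Compress with a random Gaussian matrix A : R^n -> R^m (one common choice;
    # bounds of the kind studied here govern how small m/n can be made while
    # keeping the compressed data separable).
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    Xc = X @ A.T

    # Hard-margin linear SVM with a bias term, approximated here by a very large C.
    clf = SVC(kernel="linear", C=1e8).fit(Xc, y)
    print("training accuracy after compression:", clf.score(Xc, y))

If the compression ratio m/n is small enough to destroy separability, the training accuracy drops below one, which is the failure mode the separability conditions rule out.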
Additionally, as required for our analysis, we extend the existing hard-SVM bounds to the case in which a bias term is allowed. The second topic addresses the design of error correcting output codes for multiclass classification. Our algorithm optimizes the encoder of a channel-code-based coding matrix so that the minimum distance of the coding matrix is maximized. The optimization procedure exploits the structure of the code to run in $\mathcal{O}(k \log k)$ time. We demonstrate the need for an optimal minimum distance by proving generalization bounds for both hard and soft decoding. These bounds improve on the previously published tight asymptotic growth rate with respect to $k$. Finally, we present empirical results that validate our approach.
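To make the decoding setup concrete, here is a minimal, self-contained sketch of error correcting output code decoding with a small hypothetical coding matrix; the matrix, bit predictions, and margins are invented for illustration and do not reproduce the dissertation's optimized encoder. Hard decoding assigns the class whose codeword is nearest in Hamming distance to the binary predictions; soft decoding assigns the class whose codeword best correlates with the real-valued classifier margins.

    import numpy as np

    # Hypothetical 4-class coding matrix M (rows = class codewords in {-1, +1},
    # columns = binary subproblems). Its minimum pairwise Hamming distance is 4,
    # mirroring the maximized-minimum-distance property the encoder targets.
    M = np.array([[+1, +1, +1, -1, -1, -1],
                  [+1, -1, -1, +1, +1, -1],
                  [-1, +1, -1, +1, -1, +1],
                  [-1, -1, +1, -1, +1, +1]])

    def hard_decode(bits, M):
        """Return the class whose codeword is nearest in Hamming distance."""
        return int(np.argmin((M != bits).sum(axis=1)))

    def soft_decode(margins, M):
        """Return the class whose codeword best correlates with the margins."""
        return int(np.argmax(M @ margins))

    # Example: noisy outputs of the six binary classifiers for one sample
    # (one bit flipped relative to class 1's codeword).
    bits = np.array([+1, -1, -1, +1, -1, -1])
    print(hard_decode(bits, M))  # -> 1

    margins = np.array([0.9, -0.8, -0.7, 0.6, 0.2, -0.5])
    print(soft_decode(margins, M))  # -> 1

A larger minimum distance lets the decoder absorb more binary-classifier errors per sample, which is why the generalization bounds for both decoders depend on it.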
Subject
Machine Learning
Error Correcting Output Codes
Multiclass classification
Compressed Learning
Citation
McVay, Paul Robert (2020). Generalization Bounds for Compressed Learning with Hard Support Vector Machines, and Multiclass Learning with Error Correcting Output Codes. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/192769.