DNN Weight Compression Method with ADMM Framework
Abstract
Deep neural networks (DNNs) are a powerful technique that has evolved into the state of the art for computer vision tasks. "Deep compression" was introduced to overcome the significant problem of memory and computational efficiency. The basic compression pipeline is a three-stage model consisting of pruning/fine-tuning, quantization/clustering, and encoding, and the Alternating Direction Method of Multipliers (ADMM) algorithm is applied in the pruning stage to improve performance. Furthermore, quantization/clustering with a 2-bit representation is used jointly to reduce storage. Lastly, we auto-adjust the hyperparameters, apply classification on the gradients, and filter whole layers to increase the weight reduction ratio and to address the time-consumption problem. The algorithm is trained on the LeNet-5 model using the MNIST dataset and achieves a 222.3x reduction in the number of weights.
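To illustrate the ADMM pruning stage mentioned above: in ADMM-based weight pruning, the sparsity constraint is handled by an auxiliary variable Z whose update is a closed-form Euclidean projection (keep the k largest-magnitude weights), alternated with a dual update and re-training of W. The following is a minimal NumPy sketch of one such iteration for a single layer; all variable names (`W`, `Z`, `U`, `k`, `project_sparse`) are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def project_sparse(m, k):
    """Euclidean projection onto {x : ||x||_0 <= k}:
    keep the k largest-magnitude entries, zero out the rest."""
    flat = m.flatten()
    out = np.zeros_like(flat)
    keep = np.argsort(np.abs(flat))[-k:]   # indices of k largest magnitudes
    out[keep] = flat[keep]
    return out.reshape(m.shape)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))  # stand-in for one layer's trained weights
U = np.zeros_like(W)             # scaled dual variable
k = 4                            # keep only 4 of the 16 weights

# One ADMM iteration:
Z = project_sparse(W + U, k)     # Z-update: closed-form projection
U = U + W - Z                    # dual update on the constraint residual
# In the full algorithm, W would then be re-trained by SGD on
# loss(W) + (rho/2) * ||W - Z + U||^2 before the next iteration,
# and the final W is hard-pruned to the support of Z.
```

The projection step is what makes ADMM attractive here: the combinatorial L0 constraint never enters the gradient-based training loop directly.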
Citation
Yeh, Rachel Hsingtze (2019). DNN Weight Compression Method with ADMM Framework. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/188818.