Model Attack on Convolutional Neural Networks
Abstract
Deep learning is a machine learning technique that enables computers to learn directly from images, text, or sound in much the same way that people do. It is a key technology enabling self-driving cars and speech recognition. In the past few years, deep learning has been successfully applied to a wide range of applications and has demonstrated results beyond what computers were thought to be capable of. This new technology is poised to change the way we live. Despite these successes, the exact workings of deep learning models are not well understood, and they can fail in several unintuitive ways. One such vulnerability is that small modifications to the input, which might not even be noticeable to humans, are enough to fool these models. This vulnerability has received significant attention from the research community and is a well-studied problem. Our focus is the scenario where the parameters of the model, rather than its inputs, are maliciously modified.
Deep learning models contain a large number of parameters that interact with each other in complex ways, so small perturbations to a large number of parameters can produce a cumulative effect, causing the model to misbehave. Further, noise inherent in practical systems can act as a camouflage for such malicious perturbations, making them difficult to detect. Even though deep learning models have produced impressive results, their vulnerabilities present a serious concern that must be overcome before such models can be deployed in practical systems. In this work, we evaluate the threat of attackers maliciously modifying the model parameters to compromise the model. We demonstrate that small perturbations to the parameters are enough to compromise the model without significantly affecting its performance. We also study the characteristics of these malicious perturbations and devise a strategy to detect such an attack.
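To make the threat model concrete, the following is a minimal sketch (not the attack or detection method developed in the thesis) of how small perturbations to a CNN's parameters can be applied and their effect on predictions observed. The `SmallCNN` architecture, the `perturb_parameters` helper, and the `epsilon` value are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# A tiny CNN stand-in for the convolutional models discussed in the thesis
# (hypothetical architecture, chosen only for illustration).
class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def perturb_parameters(model: nn.Module, epsilon: float = 1e-3) -> None:
    """Add small random perturbations to every parameter tensor in place.

    This only illustrates the threat model (an attacker nudging many
    parameters by a small amount); a real attack would choose the
    perturbation direction maliciously rather than at random.
    """
    with torch.no_grad():
        for param in model.parameters():
            param.add_(epsilon * torch.randn_like(param))


if __name__ == "__main__":
    torch.manual_seed(0)
    model = SmallCNN()
    x = torch.randn(4, 1, 28, 28)  # dummy batch of 28x28 grayscale images
    before = model(x).argmax(dim=1)
    perturb_parameters(model, epsilon=1e-3)
    after = model(x).argmax(dim=1)
    # With a tiny random epsilon the predictions usually stay the same;
    # a carefully chosen perturbation of comparable magnitude can flip them
    # while leaving overall accuracy largely intact.
    print("predictions before:", before.tolist())
    print("predictions after: ", after.tolist())
```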
Citation
Vallamkonda, Abhilash Rajendra Babu (2019). Model Attack on Convolutional Neural Networks. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/188808.