Boosting Partial Channel Neural Architecture Search with Gradient Projection
Abstract
Neural Architecture Search has led to the discovery of novel neural network architectures that outperform expertly designed architectures while requiring fewer resources at deployment time. This has led to high-performing neural networks small enough to fit on embedded systems and mobile devices. Recently, methods have been developed that significantly reduce the computational resources and time required to derive custom neural architectures. In particular, gradient-based methods leverage backpropagation to design architectures while a network is being trained, reducing search time from nearly 1400 GPU days to one. However, differentiable neural architecture search suffers from dominating parameterless operations, sharp local minima, and shallow architectures. A recent multitask learning method reduced gradient conflict, dominating gradients, and high curvature in its domain by projecting conflicting gradients from each task onto one another. We use a similar method to project the conflicting gradients of edges in a search cell. In this paper, we test several gradient projection strategies to determine the best way to avoid deriving suboptimal architectures. We show that differentiable neural architecture search can be boosted by combining gradient projection with partial channel connections. In doing so, we show that parameterless operations and sharp local minima can be related to the dominating gradients and high curvature that are overcome in the multitask setting.
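The projection step described above follows the PCGrad-style rule from the multitask setting: when two gradients conflict (negative inner product), one is projected onto the normal plane of the other. A minimal sketch of that rule, with the function name and NumPy formulation as illustrative assumptions rather than the thesis's actual implementation:

```python
import numpy as np

def project_conflicting(g_i: np.ndarray, g_j: np.ndarray) -> np.ndarray:
    """Return g_i after PCGrad-style projection against g_j.

    If g_i conflicts with g_j (their dot product is negative), remove the
    component of g_i along g_j by projecting g_i onto the normal plane of
    g_j. Otherwise g_i is returned unchanged.
    """
    dot = np.dot(g_i, g_j)
    if dot < 0:
        # Subtract the (negative) component of g_i along g_j.
        g_i = g_i - (dot / np.dot(g_j, g_j)) * g_j
    return g_i
```

After projection, the resulting gradient is orthogonal to the conflicting one, so applying it no longer increases the other task's (or, in this work, the other edge's) loss along that direction.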
Subject
Neural Networks
Neural Architecture Search
Convolutional Neural Networks
Automated Machine Learning
Citation
King, Ryan (2021). Boosting Partial Channel Neural Architecture Search with Gradient Projection. Undergraduate Research Scholars Program. Available electronically from https://hdl.handle.net/1969.1/194323.