Neural architecture search

Neural architecture search (NAS)^[1]^[2] is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures.^[3]^[4] Methods for NAS can be categorized according to the search space, search strategy and performance estimation strategy used:^[1]

The search space defines the type(s) of ANN that can be designed and optimized.
The search strategy defines the approach used to explore the search space.
The performance estimation strategy evaluates the performance of a possible ANN from its design (without constructing and training it).
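
The three components listed above can be illustrated with a minimal sketch. It assumes a toy search space, uniform random sampling as the search strategy, and a made-up scoring function in place of a real performance estimate; the names `SEARCH_SPACE`, `sample_architecture`, `estimate_performance` and `random_search` are hypothetical and do not come from any cited paper.

```python
import random

# Search space: each architecture is a choice of depth, width and operation.
# These option lists are illustrative placeholders.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "width": [16, 32, 64],
    "op": ["conv3x3", "conv5x5", "maxpool"],
}


def sample_architecture(rng):
    """Search strategy (here: uniform random sampling from the search space)."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def estimate_performance(arch):
    """Performance estimation stand-in: a made-up score, NOT a trained accuracy."""
    op_bonus = {"conv3x3": 0.03, "conv5x5": 0.02, "maxpool": 0.0}[arch["op"]]
    return 0.5 + 0.02 * arch["num_layers"] + 0.001 * arch["width"] + op_bonus


def random_search(num_trials, seed=0):
    """Sample architectures and keep the one with the best estimated score."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

In a real system the scoring function would be replaced by training (or otherwise estimating the quality of) each candidate, which is usually the dominant cost.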

NAS is closely related to hyperparameter optimization^[5] and meta-learning^[6] and is a subfield of automated machine learning (AutoML).^[7]

Reinforcement learning

Reinforcement learning (RL) can underpin a NAS search strategy. Zoph et al.^[3] applied NAS with RL targeting the CIFAR-10 dataset and achieved a network architecture that rivals the best manually-designed architecture for accuracy, with an error rate of 3.65%, 0.09 percentage points better and 1.05x faster than a related hand-designed model. On the Penn Treebank (PTB) dataset, the method composed a recurrent cell that outperforms LSTM, reaching a test set perplexity of 62.4, or 3.6 perplexity better than the prior leading system. On the PTB character language modeling task it achieved 1.214 bits per character.^[3]

Learning a model architecture directly on a large dataset can be a lengthy process. NASNet^[4]^[8] addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of convolutional cells, each serving one main function when convolving an input feature map: normal cells, which return feature maps of the same extent (height and width), and reduction cells, in which the height and width of the returned feature map are reduced by a factor of two. For the reduction cell, the initial operation applied to the cell’s inputs uses a stride of two (to reduce the height and width).^[4] The learned aspects of the design included which lower layer(s) each higher layer took as input, the transformations applied at that layer, and how multiple outputs were merged at each layer.

In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the ImageNet dataset by stacking copies of this cell, each with its own parameters. The approach yielded accuracy of 82.7% top-1 and 96.2% top-5. This exceeded the best human-invented architectures at a cost of 9 billion fewer FLOPs, a reduction of 28%. The system continued to exceed the manually-designed alternative at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. For example, for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the COCO dataset.^[4]
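
The effect of normal and reduction cells on feature-map size can be traced with a short sketch. It assumes that a reduction cell halves height and width with ceiling division (an assumption about padding) and that reduction cells sit between blocks of normal cells; the function names and the stacking pattern are hypothetical.

```python
def cell_output_shape(height, width, cell_type):
    """Return (height, width) of the feature map after one cell.

    Normal cells keep the spatial extent. Reduction cells are modelled as a
    stride-2 operation with ceiling division (an assumed padding choice).
    """
    if cell_type == "normal":
        return height, width
    if cell_type == "reduction":
        return (height + 1) // 2, (width + 1) // 2
    raise ValueError(f"unknown cell type: {cell_type}")


def stack_shapes(height, width, num_normal, num_blocks):
    """Trace shapes through blocks of normal cells separated by reduction cells."""
    shapes = [(height, width)]
    for block in range(num_blocks):
        for _ in range(num_normal):
            height, width = cell_output_shape(height, width, "normal")
            shapes.append((height, width))
        if block < num_blocks - 1:  # no reduction after the last block
            height, width = cell_output_shape(height, width, "reduction")
            shapes.append((height, width))
    return shapes


# A 32x32 input through 3 blocks of 2 normal cells ends at 8x8.
print(stack_shapes(32, 32, num_normal=2, num_blocks=3)[-1])  # (8, 8)
```

Because the cell definition is independent of input size, the same stacking scheme can be applied to larger images by changing only the number of cells and blocks.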

In Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. The model corresponding to the subgraph is trained to minimize a canonical cross-entropy loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached test perplexity of 55.8.^[9]
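
Training a controller with policy gradient can be sketched on a deliberately tiny problem. The sketch assumes a controller that makes a single categorical choice, fixed stand-in rewards instead of validation accuracy, and a moving-average baseline; it is a generic REINFORCE illustration, not the controller used in the cited papers, and `train_controller` is a hypothetical name.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on one categorical decision with a moving-average baseline.

    rewards[i] is a stand-in for the validation reward of choosing option i.
    Returns the final choice probabilities of the controller.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)
```

With stand-in rewards such as `[0.2, 0.9, 0.5]`, the returned probabilities should concentrate on the second option. In a real NAS setting each reward would come from training and evaluating a child network, which is why sharing parameters between child models saves so much compute.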

Evolution

Several groups employed evolutionary algorithms for NAS.^[10]^[11]^[12] Mutations in the context of evolving ANNs are operations such as adding a layer, removing a layer or changing the type of a layer (e.g., from convolution to pooling). On CIFAR-10, evolution and RL performed comparably, while both outperformed random search.^[11]
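
The mutation operations named above can be sketched on a toy representation. It assumes an architecture is a list of layer-type strings and uses a small tournament loop that discards the oldest individual; that loop is one possible evolutionary scheme chosen for illustration, and the `fitness` function passed in is a hypothetical stand-in for trained accuracy.

```python
import random

LAYER_TYPES = ["conv", "pool"]


def mutate(arch, rng):
    """Return a mutated copy of `arch` (a list of layer-type strings)."""
    child = list(arch)
    ops = ["add", "change"]
    if len(child) > 1:  # never remove the last remaining layer
        ops.append("remove")
    op = rng.choice(ops)
    if op == "add":
        child.insert(rng.randrange(len(child) + 1), rng.choice(LAYER_TYPES))
    elif op == "remove":
        del child[rng.randrange(len(child))]
    else:  # change the type of one layer, e.g. from convolution to pooling
        i = rng.randrange(len(child))
        child[i] = rng.choice([t for t in LAYER_TYPES if t != child[i]])
    return child


def evolve(fitness, population, generations, sample_size, rng):
    """Mutate the fittest member of a random sample; drop the oldest member."""
    population = list(population)
    for _ in range(generations):
        parent = max(rng.sample(population, sample_size), key=fitness)
        population.append(mutate(parent, rng))
        population.pop(0)  # the population size stays constant
    return max(population, key=fitness)
```

A call such as `evolve(lambda a: a.count("conv"), [["pool"] for _ in range(4)], 50, 2, random.Random(0))` runs the loop with a toy fitness that simply counts convolution layers.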

Hill-climbing

Another group used a hill-climbing procedure that applies network morphisms, followed by short cosine-annealing optimization runs. The approach yielded competitive results, requiring resources on the same order of magnitude as training a single network. For example, on CIFAR-10, the method designed and trained a network with an error rate below 5% in 12 hours on a single GPU.^[13]
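
The two ingredients mentioned above can be sketched separately. The sketch assumes the commonly used half-cosine learning-rate schedule and a generic greedy loop; `neighbours` and `score` are hypothetical callables standing in for "apply network morphisms" and "train briefly, then evaluate", neither of which is implemented here.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along half a cosine."""
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


def hill_climb(start, neighbours, score, max_rounds):
    """Greedy hill climbing: move to the best neighbour until none improves."""
    current, current_score = start, score(start)
    for _ in range(max_rounds):
        candidates = [(score(n), n) for n in neighbours(current)]
        if not candidates:
            break
        best_score, best = max(candidates, key=lambda pair: pair[0])
        if best_score <= current_score:
            break  # local optimum reached
        current, current_score = best, best_score
    return current, current_score


# Toy usage: integers stand in for architectures, the optimum is at 7.
best, value = hill_climb(
    start=0,
    neighbours=lambda x: [x - 1, x + 1],
    score=lambda x: -(x - 7) ** 2,
    max_rounds=20,
)
print(best, value)  # 7 0
```

Each candidate is scored only after a short training run, so the total cost stays close to that of training one network.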

Multi-objective search

While most approaches focus solely on finding an architecture with maximal predictive performance, other objectives are relevant for most practical applications, such as memory consumption, model size or inference time (i.e., the time required to obtain a prediction). For this reason, researchers have developed multi-objective search methods.^[14]^[15]

LEMONADE^[14] is an evolutionary algorithm that adopted Lamarckism to efficiently optimize multiple objectives. In every generation, child networks are generated to improve the Pareto frontier with respect to the current population of ANNs.
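
The notion of a Pareto frontier can be made concrete with a short sketch. It assumes every objective is to be minimised and uses made-up (error, parameter count in millions) pairs; it shows only the dominance test, not the LEMONADE algorithm itself.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in one.

    All objectives are assumed to be minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidates: (validation error, parameters in millions).
candidates = [(0.05, 30.0), (0.06, 10.0), (0.07, 12.0), (0.10, 2.0), (0.05, 35.0)]
print(pareto_front(candidates))
# [(0.05, 30.0), (0.06, 10.0), (0.1, 2.0)]
```

Here `(0.07, 12.0)` is dropped because `(0.06, 10.0)` is both more accurate and smaller, and `(0.05, 35.0)` is dropped because `(0.05, 30.0)` matches its error with fewer parameters.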

Neural Architect^[15] is claimed to be a resource-aware multi-objective RL-based NAS with network embedding and performance prediction. Network embedding encodes an existing network to a trainable embedding vector. Based on the embedding, a controller network generates transformations of the target network. A multi-objective reward function considers network accuracy, computational resource and training time. The reward is predicted by multiple performance simulation networks that are pre-trained or co-trained with the controller network. The controller network is trained via policy gradient. Following a modification, the resulting candidate network is evaluated by both an accuracy network and a training time network. The results are combined by a reward engine that passes its output back to the controller network.

One-shot models

RL- or evolution-based NAS requires thousands of GPU-days of searching/training to achieve state-of-the-art computer vision results, as described in the NASNet, MnasNet and MobileNetV3 papers.^[4]^[16]^[17]

To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.^[18]^[19] In this approach a single supernetwork (also known as the one-shot model) is used as the search space, where weights are shared among a large number of different sub-architectures that have edges in common, each of which is considered as a path within the supernet. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the learned parameters, a set of architecture parameters is learned to prefer one module over another. Such methods reduce the required computational resources to only a few GPU-days.
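
Weight sharing between sub-architectures can be illustrated with a toy supernet. The sketch assumes one scalar weight per (layer, operation) edge and a placeholder gradient of 1.0; the `SuperNet` class is hypothetical and far simpler than any real one-shot model.

```python
class SuperNet:
    """Toy supernet: one scalar weight per (layer, op) edge, shared by all paths."""

    def __init__(self, num_layers, ops):
        self.num_layers = num_layers
        self.ops = list(ops)
        self.weights = {
            (layer, op): 1.0 for layer in range(num_layers) for op in self.ops
        }

    def forward(self, x, path):
        """Evaluate the sub-architecture given by `path` (one op name per layer)."""
        for layer, op in enumerate(path):
            x = x * self.weights[(layer, op)]
        return x

    def train_step(self, path, lr, grad=1.0):
        """Update only the weights on `path` (grad is a placeholder value)."""
        for layer, op in enumerate(path):
            self.weights[(layer, op)] -= lr * grad


net = SuperNet(num_layers=2, ops=["a", "b"])
net.train_step(["a", "a"], lr=0.5)
# The path ["a", "b"] was never trained, but shares edge (0, "a") with ["a", "a"].
print(net.forward(1.0, ["a", "b"]))  # 0.5
print(net.forward(1.0, ["b", "b"]))  # 1.0
```

Training one path changes the output of every other path that shares an edge with it, which is what lets a single training run inform many candidate architectures.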

More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,^[20]^[21]^[22]^[23] which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of neural architectures.
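
One common form of continuous relaxation replaces the discrete choice between candidate operations with a softmax-weighted sum. The sketch below assumes scalar inputs and plain Python callables as operations; `mixed_op` and `discretize` are hypothetical names, and no gradient computation is shown.

```python
import math


def softmax(alphas):
    """Numerically stable softmax over architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, ops, alphas):
    """Continuous relaxation: weighted sum of all candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))


def discretize(op_names, alphas):
    """After the search, keep the operation with the largest parameter."""
    best_index = max(range(len(alphas)), key=lambda i: alphas[i])
    return op_names[best_index]


# Three toy candidate operations on a scalar input.
ops = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.0]
print(discretize(["identity", "double", "zero"], [0.1, 2.0, -1.0]))  # double
```

Because the output of `mixed_op` is a smooth function of `alphas`, the architecture parameters can be updated by gradient descent alongside the ordinary weights.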

Differentiable NAS has been shown to produce competitive results using a fraction of the search time required by RL-based search methods. For example, FBNet (short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of MnasNet and MobileNetV2 on the ImageNet image-classification dataset. FBNet accomplishes this using over 400x less search time than was used for MnasNet.^[24]^[25]^[26] Further, SqueezeNAS demonstrated that supernetwork-based NAS produces neural networks that outperform the speed-accuracy tradeoff curve of MobileNetV3 on the Cityscapes semantic segmentation dataset, and SqueezeNAS uses over 100x less search time than was used in the MobileNetV3 authors' RL-based search.^[27]^[28]

References

[1] Elsken, Thomas; Metzen, Jan Hendrik; Hutter, Frank (August 8, 2019). "Neural Architecture Search: A Survey". Journal of Machine Learning Research. 20 (55): 1–21. arXiv:1808.05377. Bibcode:2018arXiv180805377E – via jmlr.org.
[2] Wistuba, Martin; Rawat, Ambrish; Pedapati, Tejaswini (2019-05-04). "A Survey on Neural Architecture Search". arXiv:1905.01392 [cs.LG].
[3] Zoph, Barret; Le, Quoc V. (2016-11-04). "Neural Architecture Search with Reinforcement Learning". arXiv:1611.01578 [cs.LG].
[4] Zoph, Barret; Vasudevan, Vijay; Shlens, Jonathon; Le, Quoc V. (2017-07-21). "Learning Transferable Architectures for Scalable Image Recognition". arXiv:1707.07012 [cs.CV].
[5] Feurer, Matthias; Hutter, Frank. "Hyperparameter optimization". In: AutoML: Methods, Systems, Challenges, pages 3–38.
[6] Vanschoren, J. (2019). "Meta-Learning". In: Hutter, F.; Kotthoff, L.; Vanschoren, J. (eds). Automated Machine Learning, pages 35–61.
[7] He, X.; Zhao, K.; Chu, X. (2021-01-05). "AutoML: A survey of the state-of-the-art". Knowledge-Based Systems. 212: 106622. arXiv:1908.00709. doi:10.1016/j.knosys.2020.106622. ISSN 0950-7051.
[8] Zoph, Barret; Vasudevan, Vijay; Shlens, Jonathon; Le, Quoc V. (November 2, 2017). "AutoML for large scale image classification and object detection". Research Blog. Retrieved 2018-02-20.
[9] Pham, Hieu; Guan, Melody Y.; Zoph, Barret; Le, Quoc V.; Dean, Jeff (2018-02-09). "Efficient Neural Architecture Search via Parameter Sharing". arXiv:1802.03268 [cs.LG].
[10] Real, Esteban; Moore, Sherry; Selle, Andrew; Saxena, Saurabh; Suematsu, Yutaka Leon; Tan, Jie; Le, Quoc; Kurakin, Alex (2017-03-03). "Large-Scale Evolution of Image Classifiers". arXiv:1703.01041 [cs.NE].
[11] Real, Esteban; Aggarwal, Alok; Huang, Yanping; Le, Quoc V. (2018-02-05). "Regularized Evolution for Image Classifier Architecture Search". arXiv:1802.01548 [cs.NE].
[12] Stanley, Kenneth; Miikkulainen, Risto (2002). "Evolving Neural Networks through Augmenting Topologies". Evolutionary Computation.
[13] Elsken, Thomas; Metzen, Jan Hendrik; Hutter, Frank (2017-11-13). "Simple And Efficient Architecture Search for Convolutional Neural Networks". arXiv:1711.04528 [stat.ML].
[14] Elsken, Thomas; Metzen, Jan Hendrik; Hutter, Frank (2018-04-24). "Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution". arXiv:1804.09081 [stat.ML].
[15] Zhou, Yanqi; Diamos, Gregory. "Neural Architect: A Multi-objective Neural Architecture Search with Performance Prediction" (PDF). Baidu. Retrieved 2019-09-27.
[16] Tan, Mingxing; Chen, Bo; Pang, Ruoming; Vasudevan, Vijay; Sandler, Mark; Howard, Andrew; Le, Quoc V. (2018). "MnasNet: Platform-Aware Neural Architecture Search for Mobile". arXiv:1807.11626 [cs.CV].
[17] Howard, Andrew; Sandler, Mark; Chu, Grace; Chen, Liang-Chieh; Chen, Bo; Tan, Mingxing; Wang, Weijun; Zhu, Yukun; Pang, Ruoming; Vasudevan, Vijay; Le, Quoc V.; Adam, Hartwig (2019-05-06). "Searching for MobileNetV3". arXiv:1905.02244 [cs.CV].
[18] Pham, H.; Guan, M. Y.; Zoph, B.; Le, Q. V.; Dean, J. (2018). "Efficient neural architecture search via parameter sharing". In: Proceedings of the 35th International Conference on Machine Learning.
[19] Li, L.; Talwalkar, A. (2019). "Random search and reproducibility for neural architecture search". In: Proceedings of the Conference on Uncertainty in Artificial Intelligence.
[20] Cai, H.; Zhu, L.; Han, S. (2019). "ProxylessNAS: Direct neural architecture search on target task and hardware". ICLR.
[21] Dong, X.; Yang, Y. (2019). "Searching for a robust neural architecture in four GPU hours". In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society.
[22] Liu, H.; Simonyan, K.; Yang, Y. (2019). "DARTS: Differentiable architecture search". ICLR.
[23] Xie, S.; Zheng, H.; Liu, C.; Lin, L. (2019). "SNAS: Stochastic neural architecture search". ICLR.
[24] Wu, Bichen; Dai, Xiaoliang; Zhang, Peizhao; Wang, Yanghan; Sun, Fei; Wu, Yiming; Tian, Yuandong; Vajda, Peter; Jia, Yangqing; Keutzer, Kurt (24 May 2019). "FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search". arXiv:1812.03443 [cs.CV].
[25] Sandler, Mark; Howard, Andrew; Zhu, Menglong; Zhmoginov, Andrey; Chen, Liang-Chieh (2018). "MobileNetV2: Inverted Residuals and Linear Bottlenecks". arXiv:1801.04381 [cs.CV].
[26] Keutzer, Kurt (2019-05-22). "Co-Design of DNNs and NN Accelerators" (PDF). IEEE. Retrieved 2019-09-26.
[27] Shaw, Albert; Hunter, Daniel; Iandola, Forrest; Sidhu, Sammy (2019). "SqueezeNAS: Fast neural architecture search for faster semantic segmentation". arXiv:1908.01748 [cs.CV].
[28] Yoshida, Junko (2019-08-25). "Does Your AI Chip Have Its Own DNN?". EE Times. Retrieved 2019-09-26.
