Core Concepts
The authors present a technique for efficiently constructing a realistic neural architecture search (NAS) benchmark on the large-scale ImageNet2012 dataset, paired with measured performance on a range of hardware accelerators, including GPUs, TPUs, and FPGAs.
Abstract
The paper addresses the key challenges in constructing NAS benchmarks for large-scale datasets and hardware-aware model search:
Dataset Proxies and Sustainability:
- Existing NAS benchmarks use small datasets like CIFAR-10 or downsampled variants of ImageNet, which do not represent the true complexity of large-scale datasets.
- The authors propose a method to search for training proxies that reduce the cost of benchmark construction on ImageNet2012 by a large factor (5.6x in their results).
On-Accelerator Performance:
- Previous NAS benchmarks use simplified analytical models or proxy datasets to estimate hardware performance, leading to unrealistic evaluations.
- The authors construct a benchmark that uses end-to-end throughput and latency measurements on real hardware accelerators, including GPUs, TPUs, and FPGAs.
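The measurement methodology can be sketched as a standard warm-up-then-time loop. This is a minimal, stdlib-only illustration, not the paper's harness: `measure_latency` and the toy model are hypothetical names, and on a real accelerator you would also synchronize the device (e.g. a CUDA sync) before reading the clock.

```python
import time

def measure_latency(model_fn, batch, warmup=10, iters=100):
    """Average per-batch latency and throughput of a callable.

    model_fn is a hypothetical stand-in for a model's forward pass;
    real on-accelerator measurement must also flush/synchronize the
    device so timings reflect completed work.
    """
    for _ in range(warmup):      # warm-up: caches, JIT, clock scaling
        model_fn(batch)
    start = time.perf_counter()
    for _ in range(iters):
        model_fn(batch)
    elapsed = time.perf_counter() - start
    latency = elapsed / iters            # seconds per batch
    throughput = len(batch) / latency    # samples per second
    return latency, throughput

# Toy stand-in for a model: sums the batch.
lat, tput = measure_latency(lambda b: sum(b), list(range(256)))
```

Averaging over many iterations after a warm-up phase is what makes end-to-end numbers stable enough to serve as benchmark labels.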
The key steps are:
- Search for a proxified training scheme that maintains architecture rankings relative to a high-fidelity reference scheme, while reducing training cost by 5.6x.
- Collect datasets of 5.2k randomly sampled architectures on ImageNet2012 using the proxified scheme, along with their performance on various accelerators.
- Train surrogate models to predict accuracy and on-device performance, achieving high correlation with true values.
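The first step's success criterion, "maintains architecture rankings", is typically checked with a rank correlation such as Kendall's tau between proxy and reference accuracies. A minimal sketch (the accuracy numbers are made up for illustration; the paper's actual proxy-search procedure is not shown here):

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall rank correlation between two score lists (assumes no ties)."""
    assert len(a) == len(b)
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1    # pair ordered the same way in both lists
        elif s < 0:
            discordant += 1    # pair ordered oppositely
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs

# Accuracies of the same architectures under the reference scheme and a
# candidate proxy scheme (illustrative numbers only).
reference = [76.2, 74.8, 77.1, 73.5, 75.9]
proxy     = [71.0, 69.9, 71.8, 68.2, 70.5]
tau = kendall_tau(reference, proxy)   # 1.0: the proxy preserves the ranking
```

A tau near 1.0 means the cheap proxy orders architectures the same way the expensive reference scheme does, which is all a NAS benchmark needs from it.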
The authors demonstrate the utility of the constructed benchmark, Accel-NASBench, by performing uni-objective and bi-objective NAS experiments using various optimizers and hardware platforms. The results show that the benchmark can accurately simulate real-world performance and enable the discovery of state-of-the-art models at zero cost.
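Once the surrogates exist, a bi-objective search reduces to querying predictions and keeping the non-dominated trade-offs, with no training or on-device measurement. A hedged sketch, where the prediction tuples and the `pareto_front` helper are illustrative rather than the benchmark's API:

```python
def pareto_front(points):
    """Keep (accuracy, latency) points not dominated by any other point.

    A point is dominated if another point has accuracy at least as high
    AND latency at least as low, with strict improvement in one of them.
    """
    front = []
    for acc, lat in points:
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for a, l in points
        )
        if not dominated:
            front.append((acc, lat))
    return front

# Hypothetical surrogate predictions for candidate architectures:
# (predicted top-1 accuracy %, predicted latency in ms).
preds = [(76.0, 12.0), (75.0, 8.0), (77.0, 20.0), (74.0, 9.0)]
front = pareto_front(preds)   # (74.0, 9.0) is dominated by (75.0, 8.0)
```

This is the sense in which the benchmark enables model discovery "at zero cost": every candidate evaluation is a surrogate lookup rather than a training run or hardware measurement.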
Stats
Training with the proxified scheme is approximately 5.6x faster than with the reference scheme.