Conference proceeding
A Scalable System for Neural Architecture Search
2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp.0053-0060
01/2020
DOI: 10.1109/CCWC47524.2020.9031181
Abstract
Building reliable systems for neural architecture search requires careful design consideration due to the high computational demands coupled with the necessity of fault-tolerance. In this domain, it is not uncommon for applications to crash due to GPU memory exhaustion, which makes fault-tolerance an even more important attribute of a distributed neural architecture search system. We propose an RPC-based system that is robust to node failures and provides elastic compute abilities, allowing the system to add or remove computational resources as needed. The system is demonstrated on the task of neural architecture search for image classification using the CIFAR-10 dataset. Our system achieves near-linear scaling and is robust to multiple GPU node failures, allowing the failed nodes to restart and rejoin.
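The abstract does not specify how failed nodes are detected or how their work is recovered. The sketch below shows one common way a coordinator can provide the two properties it names, tolerance of worker crashes (such as GPU out-of-memory) and elastic join/leave. It assumes lease-based failure detection: each candidate architecture handed to a worker carries a deadline, workers extend it with heartbeats, and expired leases are requeued. Everything here (`Broker`, `Lease`, `submit`, `join`, `leave`, `request_task`, `heartbeat`, `complete`, `reap`, the lease length) is hypothetical and illustrative. It runs in-process with no actual RPC layer and is not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Lease:
    task_id: int
    worker_id: str
    deadline: float


class Broker:
    """Hypothetical coordinator: hands candidate architectures to workers and
    requeues any task whose worker stops heartbeating (assumed lease scheme)."""

    def __init__(self, lease_seconds: float = 30.0):
        self.lease_seconds = lease_seconds
        self.pending = deque()  # task ids waiting for a worker
        self.leases = {}        # task_id -> Lease for in-flight tasks
        self.results = {}       # task_id -> reported result (e.g. accuracy)
        self.workers = set()    # currently registered worker ids

    def submit(self, task_id: int) -> None:
        self.pending.append(task_id)

    def join(self, worker_id: str) -> None:
        # Elastic scaling: a new or restarted worker simply (re)registers.
        self.workers.add(worker_id)

    def leave(self, worker_id: str) -> None:
        # Graceful removal: requeue whatever the departing worker was holding.
        self.workers.discard(worker_id)
        self._requeue(lambda lease: lease.worker_id == worker_id)

    def request_task(self, worker_id: str, now: float):
        # Returns a task id, or None if the worker is unknown or no work remains.
        if worker_id not in self.workers or not self.pending:
            return None
        task_id = self.pending.popleft()
        self.leases[task_id] = Lease(task_id, worker_id, now + self.lease_seconds)
        return task_id

    def heartbeat(self, worker_id: str, now: float) -> None:
        # A live worker extends the deadline of every lease it holds.
        for lease in self.leases.values():
            if lease.worker_id == worker_id:
                lease.deadline = now + self.lease_seconds

    def complete(self, worker_id: str, task_id: int, result: float) -> bool:
        lease = self.leases.get(task_id)
        if lease is None or lease.worker_id != worker_id:
            return False  # stale report: this worker no longer holds the lease
        del self.leases[task_id]
        self.results[task_id] = result
        return True

    def reap(self, now: float) -> None:
        # Crash detection: a worker that died stops heartbeating, so its
        # leases pass their deadline and the tasks return to the queue.
        self._requeue(lambda lease: lease.deadline < now)

    def _requeue(self, predicate) -> None:
        lost = [tid for tid, lease in self.leases.items() if predicate(lease)]
        for tid in lost:
            del self.leases[tid]
            self.pending.appendleft(tid)  # retry lost work first


# Usage: gpu-0 crashes mid-evaluation, its task is requeued, and it rejoins.
demo = Broker(lease_seconds=10.0)
for arch_id in range(3):
    demo.submit(arch_id)
demo.join("gpu-0")
demo.join("gpu-1")
demo.request_task("gpu-0", now=0.0)   # gpu-0 takes architecture 0
demo.request_task("gpu-1", now=0.0)   # gpu-1 takes architecture 1
demo.complete("gpu-1", 1, 0.91)
demo.reap(now=20.0)                   # gpu-0 never heartbeated: task 0 requeued
demo.join("gpu-0")                    # restarted node rejoins
demo.request_task("gpu-0", now=21.0)  # and picks architecture 0 up again
```

Rejecting `complete` calls from a worker that no longer holds the lease keeps a slow-but-alive node from overwriting the result of a reassigned task. A real deployment would put these methods behind an RPC service and drive `reap` from a timer.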
Details
- Title: Subtitle
- A Scalable System for Neural Architecture Search
- Creators
- Jeff Hajewski - University of Iowa
- Suely Oliveira - University of Iowa
- Resource Type
- Conference proceeding
- Publication Details
- 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp.0053-0060
- DOI
- 10.1109/CCWC47524.2020.9031181
- Publisher
- IEEE
- Language
- English
- Date published
- 01/2020
- Academic Unit
- Computer Science; Mathematics
- Record Identifier
- 9984259466702771