Conference proceeding
BinFI: an efficient fault injector for safety-critical machine learning systems
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-23
SC '19
11/17/2019
DOI: 10.1145/3295500.3356177
Abstract
As machine learning (ML) becomes pervasive in high-performance computing, ML has found its way into safety-critical domains (e.g., autonomous vehicles). Thus, the reliability of ML has grown in importance. Failures of ML systems can have catastrophic consequences, and they can be caused by soft errors, which are becoming more frequent as systems scale. Therefore, we need to evaluate ML systems in the presence of soft errors.
In this work, we propose BinFI, an efficient fault injector (FI) for finding the safety-critical bits in ML applications. We find that widely used ML computations are often monotonic. Thus, we can approximate the error propagation behavior of an ML application as a monotonic function. BinFI uses a binary-search-like FI technique to pinpoint the safety-critical bits and to measure the overall resilience of the application. BinFI identifies 99.56% of safety-critical bits (with 99.63% precision) in the evaluated systems, significantly outperforming random FI at much lower cost.
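The binary-search idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' BinFI implementation. It rests on several assumptions that are not part of the source. It corrupts a single float32 value fed into a hypothetical monotonic function `toy_layer` (a ReLU of an affine map with a positive weight). It uses a hypothetical SDC criterion: the output deviation must exceed `threshold`. It treats Inf/NaN corruptions as critical, and it probes the sign bit with one direct trial. Bits are grouped by their current value (0→1 versus 1→0 flips). Within each group, a monotonic computation makes the output deviation grow with bit significance. As a result, the critical bits form a contiguous upper range that a binary search can locate.

```python
import math
import struct
from typing import Callable, List, Set, Tuple


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of x's IEEE-754 float32 encoding and return the faulty value."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def toy_layer(x: float) -> float:
    # Hypothetical monotonic ML computation: ReLU(w * x + b) with w > 0.
    return max(0.0, 2.0 * x + 0.5)


def is_critical(f: Callable[[float], float], x: float, bit: int, threshold: float) -> bool:
    """One fault-injection trial: does flipping `bit` of x cause a (hypothetical) SDC?"""
    faulty_in = flip_bit(x, bit)
    if not math.isfinite(faulty_in):
        return True  # assumption: Inf/NaN corruptions always count as critical
    return abs(f(faulty_in) - f(x)) > threshold


def first_critical(bits: List[int], crit: Callable[[int], bool]) -> int:
    """Binary search for the first critical bit in `bits` (ascending significance).
    Assumes crit is monotone along the list: False ... False True ... True."""
    lo, hi = 0, len(bits)
    while lo < hi:
        mid = (lo + hi) // 2
        if crit(bits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def binfi_sketch(f, x: float, threshold: float) -> Tuple[Set[int], int]:
    """Return (critical bits, number of injection trials) using binary search."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    trials = 0

    def crit(bit: int) -> bool:
        nonlocal trials
        trials += 1
        return is_critical(f, x, bit, threshold)

    critical: Set[int] = set()
    for value in (0, 1):  # 0-bits (0->1 flips) and 1-bits (1->0 flips), sign bit excluded
        group = [b for b in range(31) if (raw >> b) & 1 == value]
        k = first_critical(group, crit)
        critical.update(group[k:])  # monotonicity: every more significant bit is critical too
    if crit(31):  # sign bit: probed with a single direct trial in this sketch
        critical.add(31)
    return critical, trials


def exhaustive_fi(f, x: float, threshold: float) -> Set[int]:
    """Baseline: inject a fault into every one of the 32 bits."""
    return {b for b in range(32) if is_critical(f, x, b, threshold)}


if __name__ == "__main__":
    x, threshold = 1.5, 1.0  # hypothetical activation value and SDC threshold
    crit, trials = binfi_sketch(toy_layer, x, threshold)
    print(sorted(crit), trials)  # → [23, 24, 25, 26, 27, 28, 29, 30, 31] 10
    print(sorted(exhaustive_fi(toy_layer, x, threshold)) == sorted(crit))  # → True
```

For this toy input, the sketch finds the same nine critical bits as exhaustive injection. It needs 10 trials instead of 32. If the computation is not monotonic, the binary search's assumption breaks, and the two approaches can disagree.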
Details
- Title: Subtitle
- BinFI: an efficient fault injector for safety-critical machine learning systems
- Creators
- Zitao Chen - University of British Columbia
- Guanpeng Li - University of British Columbia
- Karthik Pattabiraman - University of British Columbia
- Nathan DeBardeleben - Los Alamos National Laboratory
- Resource Type
- Conference proceeding
- Publication Details
- Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-23
- Publisher
- ACM
- Series
- SC '19
- DOI
- 10.1145/3295500.3356177
- ISSN
- 2167-4329
- eISSN
- 2167-4337
- Grant note
- Natural Sciences and Engineering Research Council of Canada (NSERC)
- Language
- English
- Date published
- 11/17/2019
- Academic Unit
- Computer Science
- Record Identifier
- 9984259414302771