Conference proceeding
PEPPA-X: Finding Program Test Inputs to Bound Silent Data Corruption Vulnerability in HPC Applications
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
International Conference for High Performance Computing Networking Storage and Analysis
SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis (St. Louis, Missouri, 11/14/2021–11/19/2021)
01/01/2021
DOI: 10.1145/3458817.3476147
Appears in UI Libraries Support Open Access
Abstract
Transient hardware faults have become prevalent due to the shrinking size of transistors, leading to silent data corruptions (SDCs). Therefore, HPC applications need to be evaluated (e.g., via fault injections) and protected to meet the reliability target. In the evaluation, the target programs exercise with a set of given inputs which are usually from program benchmark suite. However, these inputs rarely manifest the SDC vulnerabilities, leading to over-optimistic assessment and unexpectedly higher failure rates in production. We propose PEPPA-X, which efficiently identifies the test inputs that estimate the bound of program SDC resiliency. Our key insight is that the SDC sensitivity distribution in a program often remains stationary across input space. Thereby, we can guide the search of SDC-bound inputs by a sampled distribution. Our evaluation shows that PEPPA-X can identify the SDC-bound input of a program that existing methods cannot find even with 5x more search time.
Details
- Title
- PEPPA-X: Finding Program Test Inputs to Bound Silent Data Corruption Vulnerability in HPC Applications
- Creators
- Md Hasanur Rahman - Univ Iowa, Iowa City, IA 52242 USA
- Aabid Shamji - Univ Iowa, Iowa City, IA 52242 USA
- Shengjian Guo - Baidu (China)
- Guanpeng Li - Univ Iowa, Iowa City, IA 52242 USA
- Resource Type
- Conference proceeding
- Publication Details
- SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
- Conference
- SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis (St. Louis, Missouri, 11/14/2021–11/19/2021)
- Series
- International Conference for High Performance Computing Networking Storage and Analysis
- DOI
- 10.1145/3458817.3476147
- ISSN
- 2167-4329
- Publisher
- Association for Computing Machinery (ACM)
- Number of pages
- 14
- Language
- English
- Date published
- 01/01/2021
- Academic Unit
- Computer Science
- Record Identifier
- 9984410844802771