Evaluating Compiler IR-Level Selective Instruction Duplication with Realistic Hardware Errors
Conference proceeding

Chun-Kai Chang, Guanpeng Li and Mattan Erez
2019 IEEE/ACM 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), pp.41-49
11/2019
DOI: 10.1109/FTXS49593.2019.00010

Abstract

Hardware faults (i.e., soft errors) are projected to increase in modern HPC systems. These faults often propagate through programs and result in silent data corruptions (SDCs), seriously compromising system reliability. Selective instruction duplication, a widely used software-based error detector, has been shown to detect SDCs effectively with low performance overhead. Researchers have previously relied on the compiler intermediate representation (IR) both for program reliability analysis and for the code transformation in selective instruction duplication. In doing so, they assumed that IR-based analysis and protection remain representative under realistic fault models (i.e., faults originating at lower hardware layers). This assumption has not been fully validated, which raises questions about the accuracy and efficiency of the protection, since IR is a higher level of abstraction far removed from the hardware. In this paper, we evaluate the assumption by injecting realistic hardware faults into programs that are guided and protected by IR-based selective instruction duplication. We find that the protection yields high SDC coverage with low performance overhead even under realistic fault models, although a small fraction of such faults escape the detector. Our observations confirm that IR-based selective instruction duplication is a cost-effective method for protecting programs from soft errors.
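As a rough illustration of the detection mechanism described in the abstract, the sketch below models selective duplication at the level of program values: a protected operation is computed twice and the two copies are compared before the result is used, while an unprotected operation runs once with no check. This is not the authors' tooling. The two-operation `kernel`, the helper names (`duplicated`, `unprotected`, `flip_bit`, `SDCDetected`), the 32-bit width, and the single-bit flip injected into only the primary copy are all hypothetical assumptions chosen for illustration; the sketch does not reproduce the IR-level transformation or the lower-level hardware fault models evaluated in the paper.

```python
MASK32 = 0xFFFFFFFF  # assumption: values live in 32-bit integer registers


class SDCDetected(Exception):
    """Raised when the primary and shadow copies of a value disagree."""


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit soft error by inverting one bit of a 32-bit value."""
    return (value ^ (1 << bit)) & MASK32


def duplicated(op, *args, fault_bit=None):
    """Run `op` twice (primary + shadow) and compare before the result is used."""
    primary = op(*args) & MASK32
    shadow = op(*args) & MASK32
    if fault_bit is not None:
        # Hypothetical fault: corrupt only the primary copy.
        primary = flip_bit(primary, fault_bit)
    if primary != shadow:
        raise SDCDetected(f"mismatch: {primary:#x} != {shadow:#x}")
    return primary


def unprotected(op, *args, fault_bit=None):
    """Run `op` once with no check; a fault here propagates silently."""
    result = op(*args) & MASK32
    if fault_bit is not None:
        result = flip_bit(result, fault_bit)
    return result


def kernel(x, fault_site=None, bit=3):
    """Hypothetical program: only the multiply is selected for duplication."""
    t = duplicated(lambda v: v * 3, x,
                   fault_bit=bit if fault_site == "mul" else None)
    y = unprotected(lambda v: v + 7, t,
                    fault_bit=bit if fault_site == "add" else None)
    return y


print(kernel(5))                           # fault-free run
print(kernel(5, fault_site="add", bit=3))  # fault in unprotected add: silent corruption
try:
    kernel(5, fault_site="mul", bit=3)     # fault in duplicated multiply
except SDCDetected as err:
    print("detected:", err)
```

In this toy model, a bit flip in the duplicated multiply is caught by the comparison, whereas the same flip in the unprotected add silently changes the output from 22 to 30. That undetected case is what selective duplication trades against runtime overhead when it chooses which instructions to protect.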
Keywords: Circuit faults, Detectors, fault-injection, Fault-tolerance, Hardware, instruction-duplication, Integrated circuit modeling, Logic gates, Reliability, resilience, Software
