Conference proceeding
Demystifying and Mitigating Cross-Layer Deficiencies of Soft Error Protection in Instruction Duplication
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Vol. November 2023, pp. 1-13
SC '23: International Conference for High Performance Computing, Networking, Storage and Analysis (Denver, CO, USA, 11/12/2023–11/17/2023)
11/11/2023
DOI: 10.1145/3581784.3607078
Appears in UI Libraries Support Open Access
Abstract
Soft errors are prevalent in modern High-Performance Computing (HPC) systems, where they cause silent data corruptions (SDCs) that compromise system reliability. Instruction duplication is a widely used software-based protection technique against SDCs. Existing instruction duplication techniques are mostly implemented at the LLVM level and may suffer from low SDC coverage at the assembly level. In this paper, we evaluate instruction duplication at both the LLVM and assembly levels. Our study shows that existing instruction duplication techniques have protection deficiencies at the assembly level and are usually over-optimistic about the protection they provide. We investigate the root causes of the protection deficiency and propose a mitigation technique, Flowery, to address it. Our evaluation shows that Flowery can effectively protect programs from SDCs when evaluated at the assembly level.
Details
- Title: Subtitle
- Demystifying and Mitigating Cross-Layer Deficiencies of Soft Error Protection in Instruction Duplication
- Creators
- Zhengyang He - University of Iowa, Computer Science
- Yafan Huang - University of Iowa, Computer Science
- Hui Xu - Fudan University
- Dingwen Tao - Indiana University
- Guanpeng Li - University of Iowa, Computer Science
- Resource Type
- Conference proceeding
- Publication Details
- SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Vol. November 2023, pp. 1-13
- Conference
- SC '23: International Conference for High Performance Computing, Networking, Storage and Analysis (Denver, CO, USA, 11/12/2023–11/17/2023)
- DOI
- 10.1145/3581784.3607078
- Publisher
- Association for Computing Machinery (ACM)
- Language
- English
- Date published
- 11/11/2023
- Academic Unit
- Computer Science
- Record Identifier
- 9984517158102771