Demystifying and Mitigating Cross-Layer Deficiencies of Soft Error Protection in Instruction Duplication
Conference proceeding   Open access

Zhengyang He, Yafan Huang, Hui Xu, Dingwen Tao and Guanpeng Li
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2023, pp. 1-13
SC '23: International Conference for High Performance Computing, Networking, Storage and Analysis (Denver, CO, USA, 11/12/2023–11/17/2023)
11/11/2023
DOI: 10.1145/3581784.3607078
URL: https://doi.org/10.1145/3581784.3607078
Published (Version of record) Open Access

Abstract

Soft errors are prevalent in modern High-Performance Computing (HPC) systems, where they cause silent data corruptions (SDCs) that compromise system reliability. Instruction duplication is a widely used software-based protection technique against SDCs. Existing instruction duplication techniques are mostly implemented at the LLVM level and may suffer from low SDC coverage at the assembly level. In this paper, we evaluate instruction duplication at both the LLVM and assembly levels. Our study shows that existing instruction duplication techniques have protection deficiencies at the assembly level and are usually over-optimistic about the protection they provide. We investigate the root causes of these deficiencies and propose a mitigation technique, Flowery, to address them. Our evaluation shows that Flowery can effectively protect programs from SDCs when evaluated at the assembly level.
Architecture; compiler transformation; instruction duplication; system reliability; hardware transient faults; fault injection; UIOWA OA Agreement
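
The general idea of instruction duplication mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and not Flowery: real techniques duplicate instructions inside a compiler (for example on LLVM IR), while this sketch duplicates a single addition at the source level. The names `flip_bit`, `add_unprotected`, `add_duplicated`, and `SDCDetected` are hypothetical, and the injected single-bit flip is an assumed fault model used only for illustration.

```python
class SDCDetected(Exception):
    """Raised when the original and the duplicated result disagree."""


def flip_bit(value: int, bit: int) -> int:
    """Model a soft error by flipping one bit of an integer (assumed fault model)."""
    return value ^ (1 << bit)


def add_unprotected(a: int, b: int, fault_bit=None) -> int:
    """Plain addition; an injected fault silently corrupts the output."""
    result = a + b
    if fault_bit is not None:
        result = flip_bit(result, fault_bit)  # corruption goes unnoticed
    return result


def add_duplicated(a: int, b: int, fault_bit=None) -> int:
    """Addition with a duplicated computation and a comparison check."""
    result = a + b   # original computation
    shadow = a + b   # duplicated computation
    if fault_bit is not None:
        result = flip_bit(result, fault_bit)  # fault hits only the original copy
    if result != shadow:  # the check inserted by the duplication scheme
        raise SDCDetected(f"mismatch: {result} != {shadow}")
    return result
```

Calling `add_unprotected(2, 3, fault_bit=1)` returns a wrong value without any signal, which is what an SDC looks like to the caller, whereas `add_duplicated(2, 3, fault_bit=1)` raises `SDCDetected`. The sketch does not model the cross-layer deficiencies the paper studies, where protection present at the LLVM level is not fully preserved at the assembly level.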
