Modeling Soft-Error Propagation in Programs
Conference proceeding

Guanpeng Li, Karthik Pattabiraman, Siva Kumar Sastry Hari, Michael Sullivan and Timothy Tsai
2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp.27-38
06/2018
DOI: 10.1109/DSN.2018.00016

Abstract

As technology scales to smaller feature sizes, devices become more susceptible to soft errors. Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability of a system. Traditional hardware-only techniques to avoid SDCs are energy-hungry and hence unsuitable for commodity systems. Researchers have proposed selective software-based protection techniques to tolerate hardware faults at lower cost. However, these techniques rely on either expensive fault injection or inaccurate analytical models to determine which parts of a program must be protected to prevent SDCs. In this work, we construct a three-level model, TRIDENT, that captures error propagation at the static data-dependency, control-flow, and memory levels, based on empirical observations of error propagation in programs. TRIDENT is implemented as a compiler module, and it can predict both the overall SDC probability of a given program and the SDC probabilities of individual instructions, without fault injection. We find that TRIDENT is nearly as accurate as fault injection while being much faster and more scalable. We also demonstrate the use of TRIDENT to guide selective instruction duplication to efficiently mitigate SDCs under a given performance overhead bound.
Keywords: Probability; Analytical models; Error Propagation; Error Resilience; Hardware; Predictive models; Program Analysis; Program processors; Scalability; Silent Data Corruption; Soft Error
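
The last sentence of the abstract describes choosing which instructions to duplicate under a performance overhead bound. The following is a minimal illustrative sketch of that idea only, not the paper's algorithm: it assumes each instruction already has a predicted SDC contribution and an estimated duplication overhead, and it uses a simple greedy benefit-per-cost heuristic. The names `Instr`, `sdc_contribution`, `overhead`, and `select_for_duplication`, as well as the numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str                # hypothetical instruction label
    sdc_contribution: float  # assumed: predicted share of program SDCs from this instruction
    overhead: float          # assumed: fractional slowdown if duplicated (must be > 0)


def select_for_duplication(instrs, overhead_bound):
    """Greedily pick instructions with the best SDC reduction per unit overhead.

    This is a knapsack-style heuristic, so the result is not guaranteed optimal.
    """
    ranked = sorted(
        instrs,
        key=lambda i: i.sdc_contribution / i.overhead,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for ins in ranked:
        # Take the instruction only if it still fits within the overhead budget.
        if spent + ins.overhead <= overhead_bound:
            chosen.append(ins)
            spent += ins.overhead
    return chosen, spent


# Made-up example values, for illustration only.
candidates = [
    Instr("load_a", 0.10, 0.05),
    Instr("mul_b", 0.06, 0.10),
    Instr("cmp_c", 0.08, 0.02),
    Instr("store_d", 0.02, 0.08),
]
chosen, spent = select_for_duplication(candidates, overhead_bound=0.10)
print([i.name for i in chosen], round(spent, 2))
```

In this sketch the per-instruction predictions are simply given as inputs; in the setting the abstract describes, they would come from the model rather than from fault injection.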
