Conference proceeding
Modeling Soft-Error Propagation in Programs
2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp.27-38
06/2018
DOI: 10.1109/DSN.2018.00016
Abstract
As technology scales to smaller feature sizes, devices become more susceptible to soft errors. Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability of a system. Traditional hardware-only techniques to avoid SDCs are energy-hungry, and hence not suitable for commodity systems. Researchers have proposed selective software-based protection techniques to tolerate hardware faults at lower cost. However, these techniques use either expensive fault injection or inaccurate analytical models to determine which parts of a program must be protected to prevent SDCs. In this work, we construct a three-level model, TRIDENT, that captures error propagation at the static data dependency, control-flow, and memory levels, based on empirical observations of error propagation in programs. TRIDENT is implemented as a compiler module, and it can predict both the overall SDC probability of a given program and the SDC probabilities of individual instructions, without fault injection. We find that TRIDENT is nearly as accurate as fault injection while being much faster and more scalable. We also demonstrate the use of TRIDENT to guide selective instruction duplication to efficiently mitigate SDCs under a given performance overhead bound.
Details
- Title
- Modeling Soft-Error Propagation in Programs
- Creators
- Guanpeng Li - University of British Columbia
- Karthik Pattabiraman - University of British Columbia
- Siva Kumar Sastry Hari - Nvidia (United States)
- Michael Sullivan - Nvidia (United States)
- Timothy Tsai - Nvidia (United States)
- Resource Type
- Conference proceeding
- Publication Details
- 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp.27-38
- DOI
- 10.1109/DSN.2018.00016
- eISSN
- 2158-3927
- Publisher
- IEEE
- Language
- English
- Date published
- 06/2018
- Academic Unit
- Computer Science
- Record Identifier
- 9984259407702771