Journal article
Configurable Detection of SDC-causing Errors in Programs
ACM Transactions on Embedded Computing Systems, Vol.16(3), pp.1-25
07/01/2017
DOI: 10.1145/3014586
Abstract
Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle and do not allow programmers to trade off performance for SDC coverage. Further, many require tens of thousands of fault-injection experiments, which are highly time- and resource-intensive. In this article, we propose two empirical models, SDCTune and SDCAuto, to predict the SDC proneness of a program's data. Both models are based on static and dynamic features of the program alone and do not require fault injections to be performed. The main difference between them is that SDCTune requires manual tuning while SDCAuto is completely automated, using machine-learning algorithms.
We then develop an algorithm using both models to selectively protect the most SDC-prone data in the program subject to a given performance overhead bound. Our results show that both models are accurate at predicting the relative SDC rate of an application compared to fault injection, for a fraction of the time taken. Further, in terms of efficiency of detection (i.e., the ratio of SDC coverage provided to performance overhead), our technique outperforms full duplication by a factor of 0.78x to 1.65x with the SDCTune model and 0.62x to 0.96x with the SDCAuto model.
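The selection step and the efficiency metric described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple greedy heuristic that ranks candidates by predicted SDC score per unit of overhead, and every name in it (`Candidate`, `sdc_score`, `overhead`, `select_for_protection`, `detection_efficiency`) is hypothetical. The scores and overheads are made-up inputs standing in for model predictions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str         # hypothetical identifier for an instruction or data item
    sdc_score: float  # assumed model output: predicted SDC proneness
    overhead: float   # assumed cost of protecting this item (must be > 0)


def select_for_protection(candidates, overhead_bound):
    """Greedy sketch: take items with the best score-per-overhead ratio
    while the running overhead stays within the bound."""
    ranked = sorted(candidates, key=lambda c: c.sdc_score / c.overhead, reverse=True)
    chosen, used = [], 0.0
    for c in ranked:
        if used + c.overhead <= overhead_bound:
            chosen.append(c)
            used += c.overhead
    return chosen, used


def detection_efficiency(sdc_coverage, overhead):
    """Efficiency as defined in the abstract: SDC coverage / performance overhead."""
    return sdc_coverage / overhead


# Made-up example values, for illustration only.
items = [
    Candidate("a", 0.50, 0.10),
    Candidate("b", 0.30, 0.20),
    Candidate("c", 0.15, 0.05),
    Candidate("d", 0.05, 0.15),
]
chosen, used = select_for_protection(items, overhead_bound=0.20)
# Summed scores serve here as a stand-in for coverage (an assumption).
coverage = sum(c.sdc_score for c in chosen)
efficiency = detection_efficiency(coverage, used)
```

A greedy ratio ranking is only one way to respect an overhead bound. It is used here because it is short and easy to follow, not because the article prescribes it.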
Details
- Title: Subtitle
- Configurable Detection of SDC-causing Errors in Programs
- Creators
- Qining Lu - University of British Columbia; Guanpeng Li - University of British Columbia; Karthik Pattabiraman - University of British Columbia; Meeta S. Gupta - Founder Shumee Toys, India; Jude A. Rivers - Technology Consultant, NY, USA
- Resource Type
- Journal article
- Publication Details
- ACM Transactions on Embedded Computing Systems, Vol.16(3), pp.1-25
- DOI
- 10.1145/3014586
- ISSN
- 1539-9087
- eISSN
- 1558-3465
- Publisher
- Association for Computing Machinery
- Number of pages
- 25
- Grant note
- Natural Sciences and Engineering Research Council of Canada (NSERC); HR0011-13-C-0022 / Microsystems Technology Office (MTO), Defense Advanced Research Projects Agency (DARPA); United States Department of Defense
- Language
- English
- Date published
- 07/01/2017
- Academic Unit
- Computer Science
- Record Identifier
- 9984259470002771