Conference proceeding
Distributed fault-tolerance for large multiprocessor systems
Proceedings of the 7th annual symposium on computer architecture, pp.23-30
ISCA '80
05/06/1980
DOI: 10.1145/800053.801905
Abstract
Techniques for dealing with hardware failures in very large networks of distributed processing elements are presented. A concept known as distributed fault-tolerance is introduced. A model of a large multiprocessor system is developed and techniques, based on this model, are given by which each processing element can correctly diagnose failures in all other processing elements in the system. The effect of varying system interconnection structures upon the extent and efficiency of the diagnosis process is discussed, and illustrated with an example of an actual system.
Finally, extensions to the model, which render it more realistic, are given and a modified version of the diagnosis procedure is presented which operates under this model.
Details
- Title: Subtitle
- Distributed fault-tolerance for large multiprocessor systems
- Creators
- J Kuhl
- S Reddy
- Resource Type
- Conference proceeding
- Publication Details
- Proceedings of the 7th annual symposium on computer architecture, pp.23-30
- Publisher
- ACM
- Series
- ISCA '80
- DOI
- 10.1145/800053.801905
- ISSN
- 1063-6897
- eISSN
- 2575-713X
- Language
- English
- Date published
- 05/06/1980
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984197428102771