Distributed Recovery for Enterprise Services

Shane S. Clark; Jacob Beal; Partha Pal

doi:10.1109/SASO.2015.19

Conference proceeding

2015 IEEE 9th International Conference on Self-Adaptive and Self-Organizing Systems, Vol.2015-, pp.111-120

09/01/2015

Abstract

Small- to medium-scale enterprise systems are typically complex and highly specialized, but lack the management resources that can be devoted to large-scale (e.g., Cloud) systems, making them extremely challenging to manage. Here we present an adaptive algorithm for addressing a common management problem in enterprise service networks: safely and rapidly recovering from the failure of one or more services. Due to poorly documented and shifting dependencies, a typical industry practice for this situation is to bring the entire system down, then to restart services one at a time in a predefined order. We improve on this practice with the Dependency-Directed Recovery (DDR) algorithm, which senses dependencies by observing network interactions and recovers near-optimally from failures following a distributed graph algorithm. Our Java-based implementation of this system is suitable for deployment with a wide variety of networked enterprise services, and we validate its correct operation and advantage over fixed-order restart with emulation experiments on networks of up to 20 services.
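
To illustrate the contrast the abstract draws between fixed-order restart and dependency-directed recovery, the following is a minimal sketch and not the DDR implementation from the paper. It is centralized rather than distributed, and it assumes the dependency map is already known instead of being sensed from network traffic. The class name, method name, and the service names (`db`, `auth`, `mail`, `web`, `wiki`) are all hypothetical. Given a failed service, it restarts only that service and its transitive dependents, in waves where every member's dependencies are already running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; not the authors' DDR algorithm. */
class DependencyRestartSketch {

    /**
     * Computes restart waves for the failed services and their dependents.
     * deps maps each service to the services it depends on (assumed acyclic).
     * Services in the same wave could be restarted in parallel.
     */
    static List<Set<String>> restartWaves(Map<String, Set<String>> deps, Set<String> failed) {
        // Step 1: the affected set is the failed services plus every
        // service that depends on an affected service, transitively.
        Set<String> affected = new HashSet<>(failed);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
                boolean dependsOnAffected = !Collections.disjoint(e.getValue(), affected);
                if (dependsOnAffected && affected.add(e.getKey())) {
                    changed = true;
                }
            }
        }

        // Step 2: repeatedly pick the affected services whose dependencies
        // are no longer pending; unaffected services are never restarted.
        List<Set<String>> waves = new ArrayList<>();
        Set<String> pending = new HashSet<>(affected);
        while (!pending.isEmpty()) {
            Set<String> wave = new TreeSet<>();
            for (String service : pending) {
                Set<String> needs = deps.getOrDefault(service, Collections.<String>emptySet());
                if (Collections.disjoint(needs, pending)) {
                    wave.add(service);
                }
            }
            if (wave.isEmpty()) {
                throw new IllegalStateException("Dependency cycle among " + pending);
            }
            pending.removeAll(wave);
            waves.add(wave);
        }
        return waves;
    }

    public static void main(String[] args) {
        // Hypothetical five-service network.
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("db", new HashSet<String>());
        deps.put("wiki", new HashSet<String>());
        deps.put("auth", new HashSet<>(Arrays.asList("db")));
        deps.put("mail", new HashSet<>(Arrays.asList("auth")));
        deps.put("web", new HashSet<>(Arrays.asList("auth", "db")));

        Set<String> failed = new HashSet<>(Arrays.asList("auth"));
        // Prints [[auth], [mail, web]]
        System.out.println(restartWaves(deps, failed));
    }
}
```

In this hypothetical network, a fixed-order restart would cycle all five services one at a time, while the sketch touches three services in two waves and leaves `db` and `wiki` running.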

aggregate programming

Databases

distributed algorithms

Electronic mail

enterprise systems

Logic gates

Monitoring

protelis

Reliability

Servers

Sockets

Details

Title: Distributed Recovery for Enterprise Services
Creators: Shane S. Clark - RTX
Jacob Beal - RTX
Partha Pal - RTX
Resource Type: Conference proceeding
Publication Details: 2015 IEEE 9th International Conference on Self-Adaptive and Self-Organizing Systems, Vol.2015-, pp.111-120
Publisher: IEEE
DOI: 10.1109/SASO.2015.19
ISSN: 1949-3673
eISSN: 1949-3681
Language: English
Date published: 09/01/2015
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984627232602771

Metrics

7 Times Cited - Web of Science