Abstract
As information systems become more distributed and complex, they become more difficult to manage, and automated management of all their aspects becomes a pressing requirement. A crucial aspect of automatic management is automatic fault recovery. This thesis proposes a distributed algorithm that coordinates management entities, called agents, to monitor the managed resources (which have directed failure and recovery dependencies among them) and to perform recovery actions once failures are detected. We carefully define the problem to be solved and its motivation; relate its system model to a general object model for automatic fault recovery of distributed applications; state all the assumptions necessary for the algorithm to work; present a rigorous description of the algorithm in the setting of I/O Automata; prove the correctness of the algorithm; analyze its efficiency; and point out some important unaddressed areas to invite future work.
Yang, Jiantian (1996). An algorithm for recovery of distributed applications with directed dependencies. Master's thesis, Texas A&M University. Available electronically from
https://hdl.handle.net/1969.1/ETD-TAMU-1996-THESIS-Y36.