NOTE: This item is not available outside the Texas A&M University network. Texas A&M affiliated users who are off campus can access the item through NetID and password authentication or by using TAMU VPN. Non-affiliated individuals should request a copy through their local library's interlibrary loan service.
Transparent process rollback recovery: some new techniques and a portable implementation
MetadataShow full item record
Processes in a distributed system can be made transparently recoverable through the use of process checkpointing, which periodically introduces a relatively large but temporary overhead, or message logging, which introduces a smaller overhead for every message sent, and requires process execution to be deterministic. Research prototypes have shown promising performance results [1, 21, but an implementation has not been readily available. To address that deficiency, this thesis describes the design, implementation, and failure-free performance of a new transparent recovery system for standard Unix workstations, which provides a basis for future experimental work. The system incorporates both coordinated checkpointing and family-based message logging with the logging site technique for reducing message logging overhead when some processes share a common memory address space (recently suggested independently by Vaidya [3) and Alvisi and Marzullo ). Performance measurements show the overhead of the consistent checkpointing and family-based message logging implementations to be reasonably small for a representative distributed application. This thesis also presents a new approach for efficient output commit and recovery when some processes are intermittently nondeterrninistic. The approach requires the application program to explicitly mark the beginning and end of each period of nondeterminism. Message logging is used during deterministic execution, but is disabled and replaced with optimistic checkpointing during periods of nondeterminism. The commit algorithm avoids communication with deterministic processes by using causal dependency information that makes a distinction between nondeterministic and deterministic state intervals. Finally, the concept of reactive replication for message logging is introduced. During failure free operation, reactive replication uses a low-overhead protocol that can tolerate only a single simultaneous failure, such as family-based message logging. If a failure occurs, the logged data necessary to recover the failed process is immediately copied to stable storage or to the volatile storage of another processor; a second failure can be tolerated only after the completion of that copy operation. This technique assumes that such a copy operation is faster than the recovery protocol.
DescriptionDue to the character of the original source materials and the nature of batch digitization, quality control issues may be present in this document. Please report any quality issues you encounter to email@example.com, referencing the URI of the item.
Includes bibliographical references.
Issued also on microfiche from Lange Micrographics.
Ellenberger, Ernest Lloyd (1995). Transparent process rollback recovery: some new techniques and a portable implementation. Master's thesis, Texas A&M University. Available electronically from
Request Open Access
This item and its contents are restricted. If this is your thesis or dissertation, you can make it open-access. This will allow all visitors to view the contents of the thesis.