Reconfiguration algorithms for fault tolerance in hypercubes

Al-Tawil, K. M.

NOTE: This item is not available outside the Texas A&M University network. Texas A&M affiliated users who are off campus can access the item through NetID and password authentication or by using TAMU VPN. Non-affiliated individuals should request a copy through their local library's interlibrary loan service.

View/ Open

1556387.pdf (70.43Mb)

Date

1994

Author

Al-Tawil, K. M.

Metadata

Show full item record

Abstract

Recently, a cost-effective and feasible approach to supercomputing by connecting a large number of processors has been offered by hypercube multicomputers. However, a major drawback of such systems is that a single processor failure may destroy the whole network. As the probability of any one or more processors failing in such a complex system is great, building some fault-tolerance feature into them becomes extremely important. In this dissertation, a new effective approach is proposed for achieving fault tolerance in hypercubes. This approach will enable the hypercube to continue working in case of failure in a short time. In particular, three algorithms are presented for reconfiguring faulty hypercubes using spanning trees to achieve communication between the unfaulty processors. The first algorithm uses the completely unbalanced spanning tree (CUST), the second one uses the balanced spanning tree (BST), and the third algorithm is based on using general spanning trees (GST) for reconfiguring the hypercube to avoid faulty nodes. The parallel algorithm developed for reconfiguring faulty hypercubes using GST can be generalized to work for reconfiguring multicomputer networks in general in the presence of faults. These algorithms use, at most, one used link and one unused link for every reconstructed path in the reconfigured hypercube. These reconstruction algorithms can reconfigure around any single fault in 0(l) reconfiguration time with no extra-dilation and will increase the congestion of a link by, at most, one. The algorithms are optimal, in terms of the reconfiguration time, and can be used for reconfiguration of spanning trees, which have been used extensively for communications and broadcasting, in a hypercube in the presence of node failures. Simulation results for the algorithms under multiple faults also are presented. Results show that the algorithms provide close to 100% fault coverage of double and triple faults for hypercubes having a dimension of 10 or more. Fault coverage and congestion results for up to 60 faults having different cube sizes are presented. We also evaluate the improvement of reliability in hypercubes based on these reconfiguration algorithms. With these algorithms, the system's recovery is shown to be faster, improving hypercube reliability. A dependability evaluation study, based on a Markov chain model for a cluster of four hypercube nodes, is presented.

URI

https://hdl.handle.net/1969.1/200901

Description

Vita
"Major Subject: Computer Science"

Subject

Major computer science
1994 Dissertation A465

Collections

Digitized Theses and Dissertations (1922–2004)

Citation

Al-Tawil, K. M. (1994). Reconfiguration algorithms for fault tolerance in hypercubes. Texas A&M University. Texas A&M University. Libraries. Available electronically from https : / /hdl .handle .net /1969 .1 /200901.

This item and its contents are restricted. If this is your thesis or dissertation, you can make it open-access. This will allow all visitors to view the contents of the thesis.

Request Open Access