A Structural Analysis of On-Line Fault Detection Mechanisms in Network-On-Chip Architectures
Abstract
Network-on-Chip (NoC) communication architectures are widely used as on-chip interconnect in multi-core systems. These systems are increasingly used in safety-critical applications, so it is essential to quickly detect faults within the NoC during system operation.
The current approach to detect a fault in an NoC system is to apply periodic test using built-in self-test (BIST) circuitry during system idle periods. This approach has the advantage that it is a structural test, so can quickly achieve high fault coverage. A second advantage is that the BIST infrastructure can be used during manufacturing test. The disadvantage is the need for the idle time to apply the test, and the time to save/restore the functional state that is overwritten during the test. An additional disadvantage is that the system is at risk of an undetected fault between self-tests.
In this research we propose to test the NoC system while it is in functional operation, which is an on-line test. We will use functional invariants to detect errors in functional operation, which can then trigger diagnosis and fault recovery or system reconfiguration. The advantage of this approach is that it minimizes fault detection latency, and avoids the need for a system idle period or for save/restore state operations. The disadvantage is that it is much more difficult to achieve high fault coverage since our approach detects functional errors based on existing functional network traffic, rather than self-test stimulus.
In order to evaluate our functional test approach, we have designed a gate-level NoC implementation, which can be the target for gate-level fault injection and simulation using realistic network traffic. We inject stuck-at faults and single event transients into the gate-level logic during simulation of synthetic NoC traffic. We found that the functional invariants proposed in prior work miss detection of many faults. Most of these escapes are detected by end-to-end cyclical redundancy checks. However, we found it necessary to create additional functional checkers to detect the remaining faults.
Citation
Oh, Han Bee (2020). A Structural Analysis of On-Line Fault Detection Mechanisms in Network-On-Chip Architectures. Master's thesis, Texas A&M University. Available electronically from https : / /hdl .handle .net /1969 .1 /191802.