Differential Threshold Prefetching for Distributed NUCA Multi-Core System
Abstract
As the disparity between computational and interconnect speeds increases, the practical implementation of computers has shifted from a few very wide cores to numerous smaller cores. These chip multiprocessors (CMPs) consist of multiple identical cores, generally connected in a mesh topology. Although each core maintains its own private lower-level caches, the last-level cache (LLC) and main memory are accessed by all cores in parallel. In a two-dimensional topology, a monolithic LLC leaves some cores physically closer to it than others, leading to an unfair disparity in cache access times. In addition, interconnect delays within the cache increase the access time for all cores. This makes it more favorable to split the LLC into identical, independent banks that allow concurrent access, leading to a Non-Uniform Cache Architecture (NUCA). In modern CMPs, each core tends to have a private L1 and L2, as well as a nearby slice of the shared L3. Prefetching has always been an important technique for boosting processor performance by reducing the cache miss rate. However, most popular prefetchers are designed to improve the performance of a single-core processor and are completely oblivious to the structure of the CMP environment in which they operate. This motivates us to study the effect of prefetching more aggressively for addresses mapped to an LLC bank far from the executing core, and less aggressively for addresses mapped to a nearby bank. Building on this, we propose a novel prefetching technique, the Distance Aware Prefetcher, which is expected to boost the performance of all cores within a CMP environment compared to standalone prefetchers such as the Signature Path Prefetcher.
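The idea of varying prefetch aggressiveness with distance can be illustrated with a minimal sketch. It is not the paper's implementation: the line-interleaved address-to-bank mapping, the linear threshold interpolation, the `near` and `far` threshold values, and every name below (`Mesh`, `home_bank`, `threshold`, `should_prefetch`) are assumptions made purely for illustration. The `confidence` argument stands in for whatever confidence estimate an underlying prefetcher provides.

```python
from dataclasses import dataclass

LINE_BITS = 6  # assumption: 64-byte cache lines


@dataclass(frozen=True)
class Mesh:
    """Hypothetical 2D mesh with one core and one LLC bank per tile."""
    width: int
    height: int

    def tile_xy(self, tile: int) -> tuple:
        # Tiles are numbered row by row, starting from the top-left corner.
        return tile % self.width, tile // self.width

    def hops(self, src: int, dst: int) -> int:
        # Manhattan distance between two tiles.
        sx, sy = self.tile_xy(src)
        dx, dy = self.tile_xy(dst)
        return abs(sx - dx) + abs(sy - dy)

    @property
    def max_hops(self) -> int:
        return (self.width - 1) + (self.height - 1)


def home_bank(addr: int, mesh: Mesh) -> int:
    # Assumption: consecutive cache lines are interleaved across banks.
    return (addr >> LINE_BITS) % (mesh.width * mesh.height)


def threshold(hops: int, max_hops: int, near: float = 0.75, far: float = 0.25) -> float:
    # Linearly lower the confidence threshold as the bank gets farther away,
    # so distant banks are prefetched more aggressively.
    if max_hops == 0:
        return near
    return near + (far - near) * hops / max_hops


def should_prefetch(core: int, addr: int, confidence: float, mesh: Mesh) -> bool:
    h = mesh.hops(core, home_bank(addr, mesh))
    return confidence >= threshold(h, mesh.max_hops)


mesh = Mesh(width=4, height=4)
near_addr = 0 << LINE_BITS   # maps to bank 0, the same tile as core 0
far_addr = 15 << LINE_BITS   # maps to bank 15, the opposite corner
print(should_prefetch(0, near_addr, 0.5, mesh))
print(should_prefetch(0, far_addr, 0.5, mesh))
```

With these assumed values, a prefetch candidate of moderate confidence is dropped when its home bank is local and issued when the bank is at the far corner of the mesh, which is the differential behavior described above.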
Keywords
prefetching, chip multiprocessor, non-uniform cache architecture, network-on-chip