Using Write Buffers in Systolic Array Architectures to Mitigate the Number of Memory Access Produced by Row Stationary Dataflows
Author
Metadata
Show full item recordAbstract
New applications of Deep Neural Networks are being designed such as fraud detection, short term weather precipitation forecasts, and cancer prognosis prediction. Nonetheless, their respective models are getting more complex with an increasing number of depth layers. These models require millions of computations that conventional CPU and GPU architectures will take a significant amount of computational time. The data distribution of these models is well known; they mostly consist of dot product operations between inputs and filters. Applications such as self-driving cars required fast response time and accurate predictions. Current research introduces accelerator architectures based on 2D systolic arrays as they provide high efficiency in performing multiplication and accumulation operations. Computational and power cost define performance, memory accesses attribute the highest cost to current architecture models. In order to enhance the performance of DNN accelerators, parallelism is extracted by breaking convolution into partial computations at the expense of segmenting output memory accesses. This thesis explores the implementation of an accumulator microarchitecture component based on column pipelined adder trees with the purpose of collecting and aggregating output computed values based on destination address. The results of this work showed a 3.3x and 2.15x speedup for Tiny-YOLO and AlexNet CNN using a 32x64 Systolic Array. Through the reduction of computed values developers will be able to explore novel data mappings to extract parallelism based on data locality.
Subject
Domain Specific ArchitecturesEdge Computing
Parallel Processing
Near-data Processing
CNN Accelerator
Accumulator
Buffer
Cycle Based Simulator
Systolic Array
Compilers
Concurrency
Citation
Peralta Velazquez, Daniel U (2021). Using Write Buffers in Systolic Array Architectures to Mitigate the Number of Memory Access Produced by Row Stationary Dataflows. Undergraduate Research Scholars Program. Available electronically from https : / /hdl .handle .net /1969 .1 /194328.