
dc.contributor.advisor: Gratz, Paul V
dc.contributor.advisor: Nowka, Kevin
dc.creator: Lopez Carrasco, Cesar Ramon
dc.date.accessioned: 2021-05-11T20:14:59Z
dc.date.available: 2022-12-01T08:18:33Z
dc.date.created: 2020-12
dc.date.issued: 2020-12-02
dc.date.submitted: December 2020
dc.identifier.uri: https://hdl.handle.net/1969.1/192993
dc.description.abstract: Convolutional Neural Networks have become the standard mechanism for machine vision problems due to their high accuracy and their ability to keep improving with new data. Although accurate, these algorithms are mathematically intensive, as a very large number of independent dot products must be performed. The sheer number of operations has slowed the adoption of these algorithms in real-time applications such as autonomous vehicles. Because these algorithms are massively parallel and perform thousands of operations on similar data, acceleration efforts focus on data reuse, since extracting parallelism is trivial. This study introduces the Scalable Systolic Array Architecture (SSAA), a simple, scalable ISA that decouples the microarchitectural implementation of the systolic array, register file, and memory hierarchy from operation scheduling. This decoupling allows independent study of both the hardware implementations and implementation-agnostic compilers that focus solely on operation scheduling. Here we use this framework to develop several compilers that enable the study of channel-wise variants of the row-stationary dataflow, which spatially schedule different channels for the same output pixel instead of different rows of the filter. We find that, on a 32x64 systolic array implementation of SSAA running both the AlexNet and YOLO Tiny CNNs, our system reduces cache utilization by 3-20x when a cache holds an entire partition of the IF map. This yields a 5-10x speedup over the original row-stationary implementation by achieving up to 20x higher systolic array utilization in later layers, a result of the growing number of channels in those layers more easily saturating the systolic array.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: CNN
dc.subject: accelerators
dc.subject: architecture
dc.subject: AI
dc.subject: ML
dc.subject: machine learning
dc.subject: heterogeneous computing
dc.title: Scalable Systolic Array Architecture (SSAA) - A General Convolutional Neural Network ISA and Compiler-Enabled Dataflow to Maximize Parallelism While Reducing Memory Utilization
dc.type: Thesis
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Li, Peng
dc.type.material: text
dc.date.updated: 2021-05-11T20:14:59Z
local.embargo.terms: 2022-12-01
local.etdauthor.orcid: 0000-0003-0777-6701
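
The channel-wise scheduling idea described in the abstract can be illustrated with a toy occupancy calculation. The sketch below is not the SSAA compiler or dataflow from the thesis; it only contrasts how many rows of an assumed 32-row processing-element array stay busy when filter rows versus input channels are spread spatially. The layer shape (3x3 kernel, 256 input channels) and the helper name row_utilization are hypothetical, chosen for illustration only.

# Illustrative sketch only, not the thesis's SSAA implementation.
# It compares PE-row occupancy under (a) classic row-stationary mapping,
# which spreads the filter's rows spatially, and (b) a channel-wise
# variant, which spreads input channels for the same output pixel.
# The 32-row array and the layer shape below are assumptions.

def row_utilization(spatial_work: int, array_rows: int = 32) -> float:
    """Fraction of PE rows kept busy when `spatial_work` units are
    mapped across `array_rows` rows (capped at 100%)."""
    return min(spatial_work, array_rows) / array_rows

# Hypothetical late CNN layer: small filters, many channels.
filter_rows = 3        # a 3x3 kernel offers only 3 rows to spread
input_channels = 256   # later layers typically have many channels

print("row-stationary (filter rows): "
      f"{row_utilization(filter_rows):.0%} of PE rows busy")
print("channel-wise   (channels):    "
      f"{row_utilization(input_channels):.0%} of PE rows busy")

Under these assumed numbers the filter-row mapping occupies only a small fraction of the array in later layers, while the channel dimension saturates it, which is consistent with the utilization trend the abstract reports.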

