
dc.contributor.advisor: Gratz, Paul V
dc.contributor.advisor: Nowka, Kevin
dc.creator: Lopez Carrasco, Cesar Ramon
dc.date.accessioned: 2021-05-11T20:14:59Z
dc.date.available: 2022-12-01T08:18:33Z
dc.date.created: 2020-12
dc.date.issued: 2020-12-02
dc.date.submitted: December 2020
dc.identifier.uri: https://hdl.handle.net/1969.1/192993
dc.description.abstract: Convolutional Neural Networks have become the standard mechanism for machine vision problems due to their high accuracy and their ability to keep improving with new data. Although accurate, these algorithms are mathematically intensive, as a very large number of independent dot products must be performed. The sheer number of operations has slowed the adoption of these algorithms in real-time applications such as autonomous vehicles. Because these algorithms are massively parallel and perform thousands of operations on similar data, acceleration efforts focus on data reuse, since extracting parallelism is trivial. This study introduces the Scalable Systolic Array Architecture (SSAA), a simple, scalable ISA that decouples the microarchitectural implementation of the systolic array, register file, and memory hierarchy from operation scheduling. This decoupling allows independent study of both the hardware implementations and implementation-agnostic compilers that focus solely on operation scheduling. Here we use this framework to develop several compilers that enable the study of channel-wise variants of the row-stationary dataflow, which spatially schedule different channels for the same output pixel instead of different rows of the filter. We find that, on a 32x64 systolic array implementation of SSAA running both the AlexNet and YOLO Tiny CNNs, our system reduces cache utilization by 3-20x when a cache holds an entire partition of the IF map. This yields a 5-10x speedup over the original row-stationary implementation by achieving up to 20x higher systolic array utilization in later layers, a result of the growing number of channels in those layers more easily saturating the systolic array.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: CNN
dc.subject: accelerators
dc.subject: architecture
dc.subject: AI
dc.subject: ML
dc.subject: machine learning
dc.subject: heterogeneous computing
dc.title: Scalable Systolic Array Architecture (SSAA) - A General Convolutional Neural Network ISA and Compiler-Enabled Dataflow to Maximize Parallelism While Reducing Memory Utilization
dc.type: Thesis
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Li, Peng
dc.type.material: text
dc.date.updated: 2021-05-11T20:14:59Z
local.embargo.terms: 2022-12-01
local.etdauthor.orcid: 0000-0003-0777-6701
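
The channel-wise scheduling idea described in the abstract can be illustrated with a toy occupancy calculation. The sketch below is not the SSAA compiler or dataflow from the thesis; it only contrasts how many rows of an assumed 32-row processing-element array stay busy when filter rows versus input channels are spread spatially. The layer shape (3x3 kernel, 256 input channels) and the helper name row_utilization are hypothetical, chosen for illustration only.

# Illustrative sketch only, not the thesis's SSAA implementation.
# It compares PE-row occupancy under (a) classic row-stationary mapping,
# which spreads the filter's rows spatially, and (b) a channel-wise
# variant, which spreads input channels for the same output pixel.
# The 32-row array and the layer shape below are assumptions.

def row_utilization(spatial_work: int, array_rows: int = 32) -> float:
    """Fraction of PE rows kept busy when `spatial_work` units are
    mapped across `array_rows` rows (capped at 100%)."""
    return min(spatial_work, array_rows) / array_rows

# Hypothetical late CNN layer: small filters, many channels.
filter_rows = 3        # a 3x3 kernel offers only 3 rows to spread
input_channels = 256   # later layers typically have many channels

print("row-stationary (filter rows): "
      f"{row_utilization(filter_rows):.0%} of PE rows busy")
print("channel-wise   (channels):    "
      f"{row_utilization(input_channels):.0%} of PE rows busy")

Under these assumed numbers the filter-row mapping occupies only a small fraction of the array in later layers, while the channel dimension saturates it, which is consistent with the utilization trend the abstract reports.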

