
dc.contributor.advisor: Kim, Eun Jung
dc.creator: Kim, Kyung Hoon
dc.date.accessioned: 2021-05-17T15:41:22Z
dc.date.available: 2023-05-01T06:37:33Z
dc.date.created: 2021-05
dc.date.issued: 2021-03-11
dc.date.submitted: May 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/193119
dc.description.abstract: Today, hardware accelerators are widely accepted as a cost-effective solution for emerging applications on computing platforms ranging from servers to mobile devices. Servers often leverage manycore accelerators such as Graphics Processing Units (GPUs), which achieve high performance by exploiting many simple yet energy-efficient compute cores. The tremendous computing power of GPUs shows great potential to keep up with emerging applications that demand heavy computation on large volumes of data. However, scaling up single-chip GPUs is challenging due to strict chip power constraints, and the data-movement overhead over the Network-on-Chip (NoC) becomes a key bottleneck in large-scale GPUs, degrading both overall performance and energy efficiency. Mobile devices are even more tightly energy-constrained than servers, so they often rely on low-power accelerators for particular functionalities, including inference in Deep Neural Networks (DNNs). However, emerging applications that rely on DNNs require considerable computation due to complex algorithmic operations, which becomes a key energy bottleneck. To tackle these performance and energy bottlenecks at their source, we propose three approaches that minimize unnecessary data movement and computation. First, we propose a packet coalescing mechanism that merges redundant packets on the GPU NoC and transfers the coalesced packet as a multicast. Second, we present a packet compression mechanism that directly reduces packet size using a dual-pattern compression technique with data preprocessing. Third, we propose an optimization methodology for convolutional neural networks (CNNs) that uses early prediction to reduce the complexity of compute kernels by guiding them to compute only critical features. In our analysis, the packet coalescing and packet compression approaches improve IPC in a large-scale GPU by 15% and 33%, respectively, on average across various modern applications. In addition, the network optimization methodology reduces the inference energy cost of CNNs by 77% on average with a negligible accuracy drop on a time-series classification problem. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: GPU [en]
dc.subject: Packet Coalescing [en]
dc.subject: Packet Compression [en]
dc.subject: AI-accelerator [en]
dc.subject: Feature Criticality [en]
dc.subject: Genetic Algorithm [en]
dc.title: Energy-Efficient Accelerator Design for Emerging Applications [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Jiménez, Daniel
dc.contributor.committeeMember: Da Silva, Dilma
dc.contributor.committeeMember: Gratz, Paul
dc.type.material: text [en]
dc.date.updated: 2021-05-17T15:41:23Z
local.embargo.terms: 2023-05-01
local.etdauthor.orcid: 0000-0003-4916-7058
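
The abstract above names a dual-pattern packet compression technique but the record gives no details of its design. As a minimal, purely illustrative sketch, the snippet below assumes the two patterns are an all-zero word and a word that repeats its predecessor; the actual patterns and the preprocessing step are defined in the thesis, not here.

```python
# Hypothetical dual-pattern packet compressor (assumed patterns, not the
# thesis's actual design):
#   tag 0 = all-zero word, tag 1 = word equal to the previous word,
#   tag 2 = uncompressed literal word.

def compress(words):
    """Encode a list of words as (tag, payload) pairs."""
    out, prev = [], None
    for w in words:
        if w == 0:
            out.append((0, None))   # zero pattern: tag only, no payload
        elif w == prev:
            out.append((1, None))   # repeat pattern: tag only, no payload
        else:
            out.append((2, w))      # literal: tag plus the full word
        prev = w
    return out

def decompress(pairs):
    """Invert compress(), reconstructing the original word list."""
    words, prev = [], None
    for tag, payload in pairs:
        w = 0 if tag == 0 else (prev if tag == 1 else payload)
        words.append(w)
        prev = w
    return words

packet = [0, 0, 0xDEADBEEF, 0xDEADBEEF, 7]
assert decompress(compress(packet)) == packet
```

In hardware, the per-word tags would be packed into a small packet header, so packets dominated by zeros or repeated values shrink to a fraction of their original size.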

