
dc.contributor.advisor: Kim, Eun Jung
dc.creator: Kim, Kyung Hoon
dc.date.accessioned: 2021-05-17T15:41:22Z
dc.date.available: 2023-05-01T06:37:33Z
dc.date.created: 2021-05
dc.date.issued: 2021-03-11
dc.date.submitted: May 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/193119
dc.description.abstract: Today, hardware accelerators are widely accepted as a cost-effective solution for emerging applications on computing platforms ranging from servers to mobile devices. Servers often leverage manycore accelerators such as Graphics Processing Units (GPUs), which achieve high performance by exploiting many simple yet energy-efficient compute cores. The tremendous computing power of GPUs shows great potential to keep up with emerging applications that demand heavy computation on large volumes of data. However, scaling up single-chip GPUs is challenging due to strict chip power constraints, and the data-movement overhead over the Network-on-Chip (NoC) becomes a key bottleneck in large-scale GPUs, degrading both overall performance and energy efficiency. Mobile devices are even more tightly energy-constrained than servers, so they often rely on low-power accelerators for particular functionalities, including inference in Deep Neural Networks (DNNs). However, emerging applications that rely on DNNs require considerable computation due to complex algorithmic operations, which becomes a key energy bottleneck. To tackle these performance and energy bottlenecks at their source, we propose three approaches that minimize unnecessary data movement and computation. First, we propose a packet coalescing mechanism that merges redundant packets on the GPU NoC and transfers the coalesced packet as a multicast. Second, we present a packet compression mechanism that directly reduces packet size using a dual-pattern compression technique with data preprocessing. Third, we propose an optimization methodology for convolutional neural networks (CNNs) that uses early prediction to reduce the complexity of compute kernels by guiding them to compute only critical features. In our analysis, the packet coalescing and packet compression approaches improve IPC in a large-scale GPU by 15% and 33%, respectively, on average across various modern applications. In addition, the network optimization methodology reduces the inference energy cost of CNNs by 77% on average with a negligible accuracy drop on a time-series classification problem. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: GPU [en]
dc.subject: Packet Coalescing [en]
dc.subject: Packet Compression [en]
dc.subject: AI-accelerator [en]
dc.subject: Feature Criticality [en]
dc.subject: Genetic Algorithm [en]
dc.title: Energy-Efficient Accelerator Design for Emerging Applications [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Jiménez, Daniel
dc.contributor.committeeMember: Da Silva, Dilma
dc.contributor.committeeMember: Gratz, Paul
dc.type.material: text [en]
dc.date.updated: 2021-05-17T15:41:23Z
local.embargo.terms: 2023-05-01
local.etdauthor.orcid: 0000-0003-4916-7058
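
The abstract above names a dual-pattern packet compression technique but the record gives no details of its design. As a minimal, purely illustrative sketch, the snippet below assumes the two patterns are an all-zero word and a word that repeats its predecessor; the actual patterns and the preprocessing step are defined in the thesis, not here.

```python
# Hypothetical dual-pattern packet compressor (assumed patterns, not the
# thesis's actual design):
#   tag 0 = all-zero word, tag 1 = word equal to the previous word,
#   tag 2 = uncompressed literal word.

def compress(words):
    """Encode a list of words as (tag, payload) pairs."""
    out, prev = [], None
    for w in words:
        if w == 0:
            out.append((0, None))   # zero pattern: tag only, no payload
        elif w == prev:
            out.append((1, None))   # repeat pattern: tag only, no payload
        else:
            out.append((2, w))      # literal: tag plus the full word
        prev = w
    return out

def decompress(pairs):
    """Invert compress(), reconstructing the original word list."""
    words, prev = [], None
    for tag, payload in pairs:
        w = 0 if tag == 0 else (prev if tag == 1 else payload)
        words.append(w)
        prev = w
    return words

packet = [0, 0, 0xDEADBEEF, 0xDEADBEEF, 7]
assert decompress(compress(packet)) == packet
```

In hardware, the per-word tags would be packed into a small packet header, so packets dominated by zeros or repeated values shrink to a fraction of their original size.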

