Show simple item record

dc.contributor.advisor: Gratz, Paul
dc.creator: Radaideh, Ahmad Mahmoud Mesleh
dc.date.accessioned: 2021-01-06T21:53:00Z
dc.date.available: 2021-01-06T21:53:00Z
dc.date.created: 2020-05
dc.date.issued: 2020-04-20
dc.date.submitted: May 2020
dc.identifier.uri: https://hdl.handle.net/1969.1/191830
dc.description.abstract: Thread-parallel hardware, such as Graphics Processing Units (GPUs), greatly outperforms CPUs in compute throughput and memory bandwidth, which makes it ideal for accelerating a wide range of data-parallel applications. These designs deliver high-performance computing by supporting a massive thread-level parallelism (TLP) processing model. Our work focuses on making thread-parallel hardware more power- and energy-efficient and higher performing, and on making the simulation of this type of hardware more accurate. It is divided into three main parts: (1) We introduce a coalescing-aware register file organization that takes advantage of the narrow-width data frequently present in general-purpose applications in order to increase performance and reduce energy consumption in GPUs. We present a new design capable of combining read and write accesses originating from the same or different warps into fewer accesses. Our design reduces the number of register file accesses by 30.5%, achieves an IPC speedup of 16.5%, and reduces overall GPU energy by 32.2% on average. (2) We present a low-cost power-saving scheme for GPUs that dynamically exploits frequent zero data within and across registers in order to gate off register file reads and writes and execution units, reducing dynamic power without impacting performance. Our scheme reduces register file reads and writes on average by 50% and 54%, respectively. Register file and execution unit dynamic power are reduced on average by 27% and 19%, respectively, and total GPU dynamic power by about 8% on average. (3) For multi-threaded applications, results from full-system architecture simulation can often be inconsistent, primarily because of a combination of small input sets and the behavior of the Linux thread scheduler. We propose a simple solution in which the scheduler is modified to enforce a mapping of software threads onto distinct available processors; this provides consistent runtimes for short-running, multi-threaded benchmarks and therefore expected, consistent experimental results. (Illustrative code sketches of ideas (2) and (3) appear after the record below.) [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Register file coalescing [en]
dc.subject: zero gating [en]
dc.subject: power optimization [en]
dc.subject: performance optimization [en]
dc.subject: GPGPU [en]
dc.subject: full system simulation [en]
dc.subject: impact of thread scheduler [en]
dc.title: Power and Performance Optimization in GPGPU [en]
dc.type: Thesis [en]
thesis.degree.department: Electrical and Computer Engineering [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Hu, Jiang
dc.contributor.committeeMember: Braga Neto, Ulisses
dc.contributor.committeeMember: Kim, Eun
dc.type.material: text [en]
dc.date.updated: 2021-01-06T21:53:01Z
local.etdauthor.orcid: 0000-0003-2943-5019
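
Illustrative code sketches (not part of the archived record)

The zero-data gating idea in part (2) of the abstract can be pictured with the minimal C sketch below. It is a simplification under stated assumptions: the per-register zero flags, the register array size, and the function names are invented here for illustration, and the thesis itself works at the hardware/warp level rather than on scalar registers.

#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64                     /* illustrative register count */

static uint32_t reg_file[NUM_REGS];     /* modeled register values */
static bool     is_zero[NUM_REGS];      /* per-register "all zero" flag */

/* When the zero flag is set, the physical register-file read (and its
 * dynamic energy) can be gated off and a constant zero forwarded instead. */
static uint32_t read_operand(int reg)
{
    if (is_zero[reg])
        return 0;                       /* gated: no array access needed */
    return reg_file[reg];               /* normal read */
}

/* Writes of zero only update the flag; non-zero values are written as usual. */
static void write_operand(int reg, uint32_t value)
{
    is_zero[reg] = (value == 0);
    if (!is_zero[reg])
        reg_file[reg] = value;
}

The scheduler change in part (3) enforces a mapping of software threads onto distinct processors inside the Linux kernel. A rough user-space analogue, assuming four worker threads and a placeholder benchmark body, pins each thread to its own CPU with the standard pthread_setaffinity_np call:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NUM_THREADS 4                   /* assumed thread count */

static void *worker(void *arg)
{
    long id = (long)arg;
    /* ... benchmark kernel would run here ... */
    printf("thread %ld on CPU %d\n", id, sched_getcpu());
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    for (long i = 0; i < NUM_THREADS; i++) {
        pthread_create(&threads[i], NULL, worker, (void *)i);

        /* Pin thread i to CPU i so each software thread runs on a distinct
         * core, mirroring the enforced mapping that stabilizes runtimes. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)i, &set);
        pthread_setaffinity_np(threads[i], sizeof(set), &set);
    }

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}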

