dc.contributor.advisor | Kim, Eun Jung | |
dc.creator | Huang, Jiayi | |
dc.date.accessioned | 2021-02-02T21:42:11Z | |
dc.date.available | 2022-08-01T06:53:13Z | |
dc.date.created | 2020-08 | |
dc.date.issued | 2020-07-22 | |
dc.date.submitted | August 2020 | |
dc.identifier.uri | https://hdl.handle.net/1969.1/192321 | |
dc.description.abstract | The onset of big data and deep learning applications, mixed with conventional general-purpose programs, has driven computer architecture to embrace heterogeneity with specialization. With ever more interconnected chip components, future architectures must operate under a stricter power budget while processing emerging big data applications efficiently. The interconnection network, as the communication backbone, thus faces the grand challenges of a limited power envelope, data movement, and performance scaling. This dissertation provides interconnect solutions that are specialized to application requirements, enabling power-/energy-efficient and high-performance computing for heterogeneous architectures.
This dissertation first examines the challenges of network-on-chip router power-gating techniques for general-purpose workloads to save static power. An adaptive power-gating policy is proposed in which routers vote to account for both local and global traffic status. In addition, low-latency routing algorithms are designed to guarantee performance in the irregular networks that power-gating creates. This holistic solution not only saves power but also avoids performance overhead.
This research also introduces emerging computation paradigms to interconnects for big data applications to relieve the pressure of data movement. An approximate network-on-chip is proposed to achieve high-throughput communication by means of lossy compression. Then, near-data processing is combined with in-network computing to further improve performance while reducing data movement. Both schemes are general enough to serve as plug-ins for different network topologies and routing algorithms.
To tackle the challenging computational requirements of deep learning workloads, this dissertation investigates the compelling opportunities of communication algorithm-architecture co-design to accelerate distributed deep learning. The MultiTree allreduce algorithm is proposed to bind message scheduling to the network topology, achieving faster, contention-free communication. In addition, the interconnect hardware and flow control are specialized to exploit deep learning communication characteristics and fulfill the algorithm's needs, thereby effectively improving performance and scalability.
By considering application and algorithm characteristics, this research shows that the interconnection network can be tailored to improve power/energy efficiency and performance, satisfying heterogeneous computation and communication requirements. | en |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.subject | Interconnection network | en |
dc.subject | computer architecture | en |
dc.subject | heterogeneous architecture | en |
dc.subject | communication acceleration | en |
dc.subject | interconnect specialization | en |
dc.title | Efficient Interconnection Network Design for Heterogeneous Architectures | en |
dc.type | Thesis | en |
thesis.degree.department | Computer Science and Engineering | en |
thesis.degree.discipline | Computer Engineering | en |
thesis.degree.grantor | Texas A&M University | en |
thesis.degree.name | Doctor of Philosophy | en |
thesis.degree.level | Doctoral | en |
dc.contributor.committeeMember | Jimenez, Daniel | |
dc.contributor.committeeMember | Da Silva, Dilma | |
dc.contributor.committeeMember | Hu, Jiang | |
dc.type.material | text | en |
dc.date.updated | 2021-02-02T21:42:12Z | |
local.embargo.terms | 2022-08-01 | |
local.etdauthor.orcid | 0000-0003-4011-6668 | |