Show simple item record

dc.contributor.advisorGratz, Paul
dc.creatorWang, Ping
dc.date.accessioned2022-01-24T22:16:34Z
dc.date.available2022-01-24T22:16:34Z
dc.date.created2021-08
dc.date.issued2021-06-03
dc.date.submittedAugust 2021
dc.identifier.urihttps://hdl.handle.net/1969.1/195080
dc.description.abstractWith the growth of big data application demands, improving high-performance computing (HPC) becomes an essential industry task. High-performance matching is a critical performance path for HPC communications because it significantly impacts computing performance and profoundly affects networking performance. This dissertation focuses on improving the high-performance matching in HPC networks to keep up with the increasingly heavy demands of evolving applications. This dissertation is tackling the matching problem from both the computational and network aspects. On the one hand, the Message Passing Interface (MPI) is a de facto standard for the communication of parallel processes in an HPC network [1]. MPI has delivered an excellent performance for running large-scale scientific applications in petascale systems. Along with the petascale system, the exascale system is evolving to run even larger applications where the computing job size increases dramatically. This trend enlarges the message queues and degrades the MPI message matching performance. With the increasing requirement of big data applications, MPI message matching is a critical performance path for HPC communications. On the other hand, with the blooming of network techniques and the fast-growing size of network applications, users are seeking more enhanced, secure, and various network services. In an HPC network, the HPC cluster comprises multiple interconnected nodes in a switched network. With the integration of software-defined networking (SDN) technology into the HPC network, both the computational and network resources can be allocated efficiently according to the applications’ requirements. Thus, SDN switches are deployed in HPC networks to support high-performance, differentiated network services and guarantee the diverse users’ needs, such as firewall, load balancing, and quality of service [2]. In an SDN switch, packet classification classifies incoming packets to flows according to the rules generated in the control plane, which is a switch’s core function. Therefore, packet classification becomes a critical performance path for the HPC network. First, this dissertation presents GenMatcher, a generic and software-only arbitrary matching framework for fast and efficient searches on packet classification. The goal is to represent arbitrary rules with efficient prefix-based tries. In order to generate efficient trie groupings and expansions to support all arbitrary rules, we propose a clustering-based grouping algorithm to group rules based upon their bit-level similarities. Our algorithm generates near-optimal trie groupings with low configuration times and provides significantly higher match throughput than prior techniques. Experiments with synthetic traffic show that our method can achieve a 58.9X speedup 1 compared to the baseline on a single-core processor under a given memory constraint [3]. Second, to further improve the GenMatcher performance, this dissertation proposes GenS- Matcher, an efficient Single Instruction Multiple Data (SIMD) and cache-friendly arbitrary match- ing framework. GenSMatcher adopts a trie node with a fixed high-fanout and a varying span for each node depending on the data distribution. The layout of the trie node leverage cache and modern processor features such as SIMD instructions. To support arbitrary matching, we interpret arbitrary rules into three fields: value, mask, and priority, and then propose the GenSMatcher extraction algorithm to process the wildcard bits to support randomly positioning wildcards in arbitrary rules. At last, we add an array of wildcard entries to the leaf entries, which stores the wildcard rules and guarantees matching results. Experiments show that GenSMatcher outperforms GenMatcher under a large scale of the ruleset and key set regarding search time, insert time, and memory cost. Specifically, with 5M rules, our method achieves a 2.7X speedup on search time, and the insertion time takes ∼ 7.3 seconds, gaining a 1.38X speedup; meanwhile, the memory cost reduction is up to 6.17X. Third, to guarantee MPI ordering feature and high-performance matching for big applications on MPI tag matching, this dissertation introduces a new hybrid data structure and match- ing mechanism to address the performance challenges, reducing the matching operation time in the posted receive queue (PRQ) and unexpected message queue (UMQ). The hybrid data structures are composed of tries and hash maps. We evaluate our mechanism on microbenchmarks and existing MPI applications with different numbers of processes. Experiments with synthetic message flow show that our method can achieve a 20X search time speedup compared to the single-core processor’s baseline. For the PICSARlite application, we integrated our Hybrid and Intel mechanism into the MPICH library and evaluated their performance on the Ada cluster of Texas A&M University, which has 793 general compute nodes. The experiment outcome shows that our proposed Hybrid mechanism can achieve up to 1.55X speedup compared to the MPICH library method.en
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.subjectArbitrary matchingen
dc.subjecthigh-performanceen
dc.titleTechniques for High Performance Matchingen
dc.typeThesisen
thesis.degree.departmentElectrical and Computer Engineeringen
thesis.degree.disciplineComputer Engineeringen
thesis.degree.grantorTexas A&M Universityen
thesis.degree.nameDoctor of Philosophyen
thesis.degree.levelDoctoralen
dc.type.materialtexten
dc.date.updated2022-01-24T22:16:34Z
local.etdauthor.orcid0000-0002-7415-7903


Files in this item

Thumbnail
Thumbnail

This item appears in the following Collection(s)

Show simple item record