Techniques for High Performance Matching

Wang, Ping

dc.contributor.advisor	Gratz, Paul
dc.creator	Wang, Ping
dc.date.accessioned	2022-01-24T22:16:34Z
dc.date.available	2022-01-24T22:16:34Z
dc.date.created	2021-08
dc.date.issued	2021-06-03
dc.date.submitted	August 2021
dc.identifier.uri	https://hdl.handle.net/1969.1/195080
dc.description.abstract	With the growth of big data application demands, improving high-performance computing (HPC) becomes an essential industry task. High-performance matching is a critical performance path for HPC communications because it significantly impacts computing performance and profoundly affects networking performance. This dissertation focuses on improving the high-performance matching in HPC networks to keep up with the increasingly heavy demands of evolving applications. This dissertation is tackling the matching problem from both the computational and network aspects. On the one hand, the Message Passing Interface (MPI) is a de facto standard for the communication of parallel processes in an HPC network [1]. MPI has delivered an excellent performance for running large-scale scientific applications in petascale systems. Along with the petascale system, the exascale system is evolving to run even larger applications where the computing job size increases dramatically. This trend enlarges the message queues and degrades the MPI message matching performance. With the increasing requirement of big data applications, MPI message matching is a critical performance path for HPC communications. On the other hand, with the blooming of network techniques and the fast-growing size of network applications, users are seeking more enhanced, secure, and various network services. In an HPC network, the HPC cluster comprises multiple interconnected nodes in a switched network. With the integration of software-defined networking (SDN) technology into the HPC network, both the computational and network resources can be allocated efficiently according to the applications’ requirements. Thus, SDN switches are deployed in HPC networks to support high-performance, differentiated network services and guarantee the diverse users’ needs, such as firewall, load balancing, and quality of service [2]. In an SDN switch, packet classification classifies incoming packets to flows according to the rules generated in the control plane, which is a switch’s core function. Therefore, packet classification becomes a critical performance path for the HPC network. First, this dissertation presents GenMatcher, a generic and software-only arbitrary matching framework for fast and efficient searches on packet classification. The goal is to represent arbitrary rules with efficient prefix-based tries. In order to generate efficient trie groupings and expansions to support all arbitrary rules, we propose a clustering-based grouping algorithm to group rules based upon their bit-level similarities. Our algorithm generates near-optimal trie groupings with low configuration times and provides significantly higher match throughput than prior techniques. Experiments with synthetic traffic show that our method can achieve a 58.9X speedup 1 compared to the baseline on a single-core processor under a given memory constraint [3]. Second, to further improve the GenMatcher performance, this dissertation proposes GenS- Matcher, an efficient Single Instruction Multiple Data (SIMD) and cache-friendly arbitrary match- ing framework. GenSMatcher adopts a trie node with a fixed high-fanout and a varying span for each node depending on the data distribution. The layout of the trie node leverage cache and modern processor features such as SIMD instructions. To support arbitrary matching, we interpret arbitrary rules into three fields: value, mask, and priority, and then propose the GenSMatcher extraction algorithm to process the wildcard bits to support randomly positioning wildcards in arbitrary rules. At last, we add an array of wildcard entries to the leaf entries, which stores the wildcard rules and guarantees matching results. Experiments show that GenSMatcher outperforms GenMatcher under a large scale of the ruleset and key set regarding search time, insert time, and memory cost. Specifically, with 5M rules, our method achieves a 2.7X speedup on search time, and the insertion time takes ∼ 7.3 seconds, gaining a 1.38X speedup; meanwhile, the memory cost reduction is up to 6.17X. Third, to guarantee MPI ordering feature and high-performance matching for big applications on MPI tag matching, this dissertation introduces a new hybrid data structure and match- ing mechanism to address the performance challenges, reducing the matching operation time in the posted receive queue (PRQ) and unexpected message queue (UMQ). The hybrid data structures are composed of tries and hash maps. We evaluate our mechanism on microbenchmarks and existing MPI applications with different numbers of processes. Experiments with synthetic message flow show that our method can achieve a 20X search time speedup compared to the single-core processor’s baseline. For the PICSARlite application, we integrated our Hybrid and Intel mechanism into the MPICH library and evaluated their performance on the Ada cluster of Texas A&M University, which has 793 general compute nodes. The experiment outcome shows that our proposed Hybrid mechanism can achieve up to 1.55X speedup compared to the MPICH library method.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Arbitrary matching	en
dc.subject	high-performance	en
dc.title	Techniques for High Performance Matching	en
dc.type	Thesis	en
thesis.degree.department	Electrical and Computer Engineering	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.type.material	text	en
dc.date.updated	2022-01-24T22:16:34Z
local.etdauthor.orcid	0000-0002-7415-7903

Files in this item

Name:: WANG-DISSERTATION-2021.pdf
Size:: 2.023Mb
Format:: PDF

View/ Open

Name:: Gmail - Request for permission ...
Size:: 260.0Kb
Format:: PDF

View/ Open

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )

Show simple item record