Memory-Efficient Multi-Threading Streaming Partitioning Algorithm
Abstract
Due to the growth of the modern Internet, data analytics, and cluster computing, massive amounts of data are frequently being generated and need to be processed. In many common data processing applications (e.g., sorting), a set of input keys needs to be partitioned into buckets based on their values. Since key partitioning is an application where data can be processed sequentially (i.e., via streaming), one such programming platform we can use to solve this problem is Vortex. Vortex creates the illusion of an infinite buffer by generating controlled memory access violations that are handled transparently. The buffer can be accessed with a single C/C++ pointer, making Vortex both extremely fast and easy to use. Efficient parallelization of a key partitioning algorithm is required to take advantage of multi-core processors, which are now found even in low-end consumer hardware. With this in mind, we propose a high-performance, memory-efficient key partitioning algorithm, which makes use of multiple Vortex streams to allow for concurrent partitioning of keys by multiple threads in a single pass over the input data. The resulting algorithm is able to nearly saturate the memory bandwidth of modern Intel Coffee Lake systems and can be applied to develop high-performance, multi-threaded streaming sorts that are capable of utilizing the multiple processing cores available in modern computers.
Subject
virtual memorymemory
streaming
sorting
partitioning
computing
algorithms
radix sort
vortex
multi-threaded
parallel
Citation
Labbane, Alexander (2022). Memory-Efficient Multi-Threading Streaming Partitioning Algorithm. Undergraduate Research Scholars Program. Available electronically from https : / /hdl .handle .net /1969 .1 /196543.