High-Performance External-Memory Mergesort
Abstract
Many current solutions for sorting very large files today are incredibly slow. Given that virtually every widespread application requires some form of sorted data, these slow solutions total a massive waste of time and computing power. When sorting large datasets, the data can come from different files, parts of files, or streams, which poses a problem when trying to create truly high-performance algorithms since the underlying hardware can be slow, specifically in the I/O speed of magnetic drives. Popular solutions used today do not properly consider the performance impact that these I/O devices have on the overall speed. My thesis implements techniques that can reduce and minimize the number of seeks from these HDDs, therefore maximizing the overall performance of the external-memory merge sort algorithm. It also includes several other optimizations for merge rate, such as utilizing large continuous files at the front of the hard drive’s address space.
Citation
Robert, Nicholas (2023). High-Performance External-Memory Mergesort. Undergraduate Research Scholars Program. Available electronically from https : / /hdl .handle .net /1969 .1 /199673.