dc.description.abstract | Many current solutions for sorting very large files today are incredibly slow. Given that virtually every widespread application requires some form of sorted data, these slow solutions total a massive waste of time and computing power. When sorting large datasets, the data can come from different files, parts of files, or streams, which poses a problem when trying to create truly high-performance algorithms since the underlying hardware can be slow, specifically in the I/O speed of magnetic drives. Popular solutions used today do not properly consider the performance impact that these I/O devices have on the overall speed. My thesis implements techniques that can reduce and minimize the number of seeks from these HDDs, therefore maximizing the overall performance of the external-memory merge sort algorithm. It also includes several other optimizations for merge rate, such as utilizing large continuous files at the front of the hard drive’s address space. | |