Virtual Memory Streaming and Sorting in MapReduce Applications

Yao, Yuan

Abstract

In the age of fast-growing technology, massive storage, and cluster computing, efficient big-data processing algorithms are in high demand. MapReduce is one of the programming models that enables massive-scale cluster technology around the world. Despite significant public efforts, the open-source implementation of MapReduce, Apache Hadoop, is cumbersome, complex, and inefficient. The purpose of this research is to improve the performance of Hadoop, specifically its sorting component, by developing a single-pass, stream-based multithreaded bucket sort. Our new set of algorithms has the potential to influence the future of data-centric computing.

URI

https://hdl.handle.net/1969.1/166469

Subject

storage
cluster computing
MapReduce
algorithms
single-pass
stream-based multithread sort

Collections

Undergraduate Research Scholars Capstone (2006–present)

Citation

Yao, Yuan (2019). Virtual Memory Streaming and Sorting in MapReduce Applications. Undergraduate Research Scholars Program. Available electronically from https://hdl.handle.net/1969.1/166469.