Performance Analysis of Artificial Intelligence Workloads
Abstract
Machine Learning involves analyzing large sets of training data to make predictions and decisions toward a specific objective. This data-intensive computation places enormous demands on the underlying hardware. To improve overall performance and make predictions quickly, extremely powerful specialized hardware has been built to speed up data processing. Although such hardware yields improved results, it is not economical: it often requires significant investment in additional infrastructure and coordination among the various hardware components.
On the other hand, CPUs are cost-effective and easily accessible for a fraction of that cost. Unfortunately, very little work has been done to identify configuration bottlenecks in CPUs and improve their overall performance to meet the demands of Machine Learning. This thesis aims to identify the system parameters that can be tuned to boost CPU performance for Machine Learning algorithms. Leveraging the gem5 system simulator, a series of experiments was conducted in which the hardware configuration was varied and the overall system performance observed. Analysis of the simulation results showed that the CPU and system operating frequency, the L2 cache size, and the indirect branch predictor can significantly affect system performance. We believe these simulation results can further help in optimizing CPU performance for machine learning workloads.
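As a rough illustration of the kind of parameter sweep described above, a gem5 configuration script exposes the clock frequency, cache sizes, and branch predictor as Python objects. The sketch below is a minimal, hypothetical fragment based on gem5's classic-memory Python API; class names and required cache parameters vary across gem5 versions, and the script only runs inside the gem5 binary, not a standalone Python interpreter.

```python
# Hypothetical gem5 configuration sketch: the three parameters the
# thesis found significant (clock frequency, L2 size, indirect branch
# prediction) are the ones varied between simulation runs.
from m5.objects import (System, SrcClockDomain, VoltageDomain,
                        AddrRange, Cache, TimingSimpleCPU)

system = System()

# Parameter 1: CPU/system operating frequency, swept across runs
# (e.g. "1GHz", "2GHz", "3GHz").
system.clk_domain = SrcClockDomain(clock="2GHz",
                                   voltage_domain=VoltageDomain())
system.mem_mode = "timing"
system.mem_ranges = [AddrRange("2GB")]

system.cpu = TimingSimpleCPU()

# Parameter 2: L2 cache size, swept across runs
# (e.g. "256kB", "512kB", "1MB"). Latency/MSHR values are placeholders.
l2 = Cache(size="512kB", assoc=8,
           tag_latency=10, data_latency=10, response_latency=10,
           mshrs=20, tgts_per_mshr=12)

# Parameter 3: branch prediction. Detailed CPU models in gem5 expose a
# branchPred object with an indirect-predictor component that can be
# swapped or resized between runs (not available on TimingSimpleCPU).
```

In practice each run would be launched as `gem5.opt config.py` with these values passed in via command-line options, and statistics such as IPC and cache miss rates compared across runs.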
Subject
Artificial Intelligence
Machine Learning
CPU
Convolutional Neural Network
MNIST
Caffe
TensorFlow
Citation
Gurushanthappa, Poornima Bevakal (2019). Performance Analysis of Artificial Intelligence Workloads. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/189051.