Building a Better Machine Learning Hardware Accelerator with HARP
As machine learning is applied to ever more ambitious tasks, higher performance is required to train and evaluate neural networks in reasonable amounts of time. To this end, many hardware accelerators for machine learning have been built, ranging from ASICs to CUDA code running on conventional GPUs. GPU- and FPGA-based accelerators have seen more success than ASICs because their designs can be easily tweaked or revised, but they still suffer from the latency of the interface between the processor and the accelerator (generally PCIe). The purpose of this project is to build a hardware accelerator on Intel's Heterogeneous Architecture Research Platform (HARP), which places a Xeon processor and an Arria 10 FPGA on the same mainboard with shared access to common memory. This arrangement should significantly reduce latency and increase throughput. The accelerator is expected to at least match the performance of a typical machine learning library implementation on a GPU, and will hopefully significantly exceed it.
Sacco, Jacob (2019). Building a Better Machine Learning Hardware Accelerator with HARP. Undergraduate Research Scholars Program. Available electronically from