Leveraging Automated Machine Learning, Edge Computing for Video Understanding

Bhat, Zaid Pervaiz

Date

2022-10-28

Author

Bhat, Zaid Pervaiz

Abstract

Computer vision has seen unprecedented growth over the past few years, mainly because of the application of deep learning methods to tasks such as classification, action recognition, segmentation, and object detection. Video-based action recognition is an important task for video understanding, with broad applications in security and behavior analysis. However, developing an effective action recognition solution often requires extensive engineering effort to build and test different combinations of modules and to tune their hyperparameters. More generally, although recent advances in computer vision have shown its applicability to many real-world problems, developing an optimal end-to-end machine learning pipeline requires considerable knowledge of computer vision and significant engineering effort from developers. To address these problems, we present AutoVideo, an AutoML framework for automated video action recognition. AutoVideo tackles them by 1) providing a highly modular and extensible infrastructure that follows the standard pipeline language, 2) offering an exhaustive list of primitives for pipeline construction, 3) including data-driven tuners to reduce the effort of pipeline tuning, and 4) integrating an easy-to-use graphical user interface (GUI). Another major problem for computer vision applications is deploying machine learning models to edge devices for real-world use, where applications usually require low latency, low power consumption, or data privacy. Deployment demands significant research and engineering effort because of the computational and memory limitations of edge devices. To tackle this problem, we also present BED, an object detection system for edge devices implemented on the MAX78000 DNN accelerator.
To demonstrate real-world applicability, we integrate on-device DNN inference with a camera for image acquisition and a screen for displaying the output. AutoVideo is released on GitHub (AutoVideo-GitHub) under the MIT license, with a demo video hosted at Demo Video-AutoVideo; BED is released on GitHub (BED_main-GitHub), with a demo video at Demo Video-BED.

Citation

Bhat, Zaid Pervaiz (2022). Leveraging Automated Machine Learning, Edge Computing for Video Understanding. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/198510.