
dc.contributor.advisor: Hu, Xia Ben
dc.contributor.advisor: Caverlee, James
dc.creator: Bhat, Zaid Pervaiz
dc.date.accessioned: 2023-09-18T16:18:26Z
dc.date.available: 2023-09-18T16:18:26Z
dc.date.created: 2022-12
dc.date.issued: 2022-10-28
dc.date.submitted: December 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/198510
dc.description.abstract: Computer vision has witnessed unprecedented growth over the past few years, driven mainly by the application of deep learning methods to tasks such as classification, action recognition, segmentation, and object detection. Video-based action recognition is an important task for video understanding, with broad applications in security and behavior analysis. However, developing an effective action recognition solution often requires extensive engineering effort to build and test different combinations of modules and to optimize their hyperparameters. Recent advances in computer vision have shown vast applicability across many real-world problems; however, developing an optimal end-to-end machine learning pipeline requires considerable computer vision knowledge and significant engineering effort from developers. To address these problems, in this thesis we present AutoVideo, an AutoML framework for automated video action recognition. AutoVideo tackles these problems by 1) providing a highly modular and extendable infrastructure that follows the standard pipeline language, 2) offering an exhaustive list of primitives for pipeline construction, 3) including data-driven tuners to save the effort of pipeline tuning, and 4) integrating an easy-to-use Graphical User Interface (GUI). Another major challenge for computer vision applications is deploying machine learning models to edge devices, which real-world applications often require for low latency, low power consumption, or data privacy. Such deployment demands significant research and engineering effort due to the computational and memory limitations of edge devices. To tackle this problem, we also present BED, an object detection system for edge devices deployed on the MAX78000 DNN accelerator. To demonstrate real-world applicability, we integrate on-device DNN inference with a camera for image acquisition and a screen for output display. AutoVideo is released on GitHub under the MIT license together with a hosted demo video, and BED is released on GitHub (repository BED_main) with its own demo video.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: AutoML
dc.subject: Object Detection
dc.subject: Real-time System
dc.subject: Edge Device
dc.subject: Video Understanding
dc.subject: Video Recognition
dc.title: Leveraging Automated Machine Learning, Edge Computing for Video Understanding
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text
dc.date.updated: 2023-09-18T16:18:27Z
local.etdauthor.orcid: 0000-0002-8331-3154
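
The abstract above describes AutoVideo as a pipeline-based AutoML system: standard primitives are assembled into an action-recognition pipeline, and data-driven tuners search its hyperparameters. As a rough illustration, a fit-and-predict workflow with such a framework might look like the Python sketch below; the entry point fit_produce, the config keys, and the CSV-plus-media dataset layout are assumptions made for illustration and may differ from the API in the released AutoVideo repository.

    import pandas as pd
    from autovideo import fit_produce  # assumed entry point; check the repository's examples

    # Assumed dataset layout: CSV tables listing (video file, label) rows,
    # plus a directory containing the raw video files.
    train_dataset = pd.read_csv("datasets/hmdb6/train.csv")
    test_dataset = pd.read_csv("datasets/hmdb6/test.csv")
    test_labels = test_dataset["label"]          # held out for evaluation
    test_dataset = test_dataset.drop(columns=["label"])

    # Hypothetical pipeline configuration: pick one action-recognition
    # primitive and fix its hyperparameters; a data-driven tuner could
    # search over these choices instead of setting them by hand.
    config = {
        "algorithm": "tsn",     # assumed primitive name (Temporal Segment Networks)
        "epochs": 10,
        "learning_rate": 1e-3,
    }

    # Assemble the pipeline from primitives, train on the training split,
    # and produce predictions for the test split in one call.
    predictions = fit_produce(
        train_dataset=train_dataset,
        train_media_dir="datasets/hmdb6/media",
        test_dataset=test_dataset,
        test_media_dir="datasets/hmdb6/media",
        target_index=2,         # assumed column index of the label
        config=config,
    )

The same structure extends to the tuners the abstract mentions: instead of a fixed config, a tuner would propose candidate configurations, fit a pipeline for each, and keep the best-scoring one.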

