
dc.contributor.advisor: Hu, Xia Ben
dc.contributor.advisor: Caverlee, James
dc.creator: Bhat, Zaid Pervaiz
dc.date.accessioned: 2023-09-18T16:18:26Z
dc.date.available: 2023-09-18T16:18:26Z
dc.date.created: 2022-12
dc.date.issued: 2022-10-28
dc.date.submitted: December 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/198510
dc.description.abstract: Computer vision has witnessed unprecedented growth over the past few years, driven mainly by the application of deep learning methods to tasks such as classification, action recognition, segmentation, and object detection. Video-based action recognition is an important task for video understanding, with broad applications in security and behavior analysis. However, developing an effective action recognition solution often requires extensive engineering effort to build and test different combinations of modules and to optimize their hyperparameters. Recent advances in computer vision have shown vast applicability across many real-world problems; however, developing an optimal end-to-end machine learning pipeline requires considerable computer vision knowledge and significant engineering effort from developers. To address these problems, in this thesis we present AutoVideo, an AutoML framework for automated video action recognition. AutoVideo tackles these problems by 1) providing a highly modular and extendable infrastructure that follows the standard pipeline language, 2) offering an exhaustive list of primitives for pipeline construction, 3) including data-driven tuners to save the effort of pipeline tuning, and 4) integrating an easy-to-use Graphical User Interface (GUI). Another major challenge for computer vision applications is deploying machine learning models to edge devices, which real-world applications often require for low latency, low power consumption, or data privacy. Such deployment demands significant research and engineering effort due to the computational and memory limitations of edge devices. To tackle this problem, we also present BED, an object detection system for edge devices deployed on the MAX78000 DNN accelerator. To demonstrate real-world applicability, we integrate on-device DNN inference with a camera for image acquisition and a screen for output display. AutoVideo is released on GitHub under the MIT license together with a hosted demo video, and BED is released on GitHub (repository BED_main) with its own demo video.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: AutoML
dc.subject: Object Detection
dc.subject: Real-time System
dc.subject: Edge Device
dc.subject: Video Understanding
dc.subject: Video Recognition
dc.title: Leveraging Automated Machine Learning, Edge Computing for Video Understanding
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text
dc.date.updated: 2023-09-18T16:18:27Z
local.etdauthor.orcid: 0000-0002-8331-3154
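
The abstract above describes AutoVideo as a pipeline-based AutoML system: standard primitives are assembled into an action-recognition pipeline, and data-driven tuners search its hyperparameters. As a rough illustration, a fit-and-predict workflow with such a framework might look like the Python sketch below; the entry point fit_produce, the config keys, and the CSV-plus-media dataset layout are assumptions made for illustration and may differ from the API in the released AutoVideo repository.

    import pandas as pd
    from autovideo import fit_produce  # assumed entry point; check the repository's examples

    # Assumed dataset layout: CSV tables listing (video file, label) rows,
    # plus a directory containing the raw video files.
    train_dataset = pd.read_csv("datasets/hmdb6/train.csv")
    test_dataset = pd.read_csv("datasets/hmdb6/test.csv")
    test_labels = test_dataset["label"]          # held out for evaluation
    test_dataset = test_dataset.drop(columns=["label"])

    # Hypothetical pipeline configuration: pick one action-recognition
    # primitive and fix its hyperparameters; a data-driven tuner could
    # search over these choices instead of setting them by hand.
    config = {
        "algorithm": "tsn",     # assumed primitive name (Temporal Segment Networks)
        "epochs": 10,
        "learning_rate": 1e-3,
    }

    # Assemble the pipeline from primitives, train on the training split,
    # and produce predictions for the test split in one call.
    predictions = fit_produce(
        train_dataset=train_dataset,
        train_media_dir="datasets/hmdb6/media",
        test_dataset=test_dataset,
        test_media_dir="datasets/hmdb6/media",
        target_index=2,         # assumed column index of the label
        config=config,
    )

The same structure extends to the tuners the abstract mentions: instead of a fixed config, a tuner would propose candidate configurations, fit a pipeline for each, and keep the best-scoring one.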

