Show simple item record

dc.contributor.advisor: Wang, Zhangyang
dc.creator: Sridhar, Rahul
dc.date.accessioned: 2022-02-24T19:03:13Z
dc.date.available: 2022-02-24T19:03:13Z
dc.date.created: 2021-05
dc.date.issued: 2021-04-28
dc.date.submitted: May 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/195834
dc.description.abstract: Substantial research has been devoted to improving the accuracy and reducing the latency of deep learning algorithms, but comparatively little has addressed their energy consumption. For applications that deploy deep learning models on edge devices with limited compute resources, it is important that these algorithms are energy-efficient. Efficient video text spotting is the field concerned with developing deep learning models, deployable on edge devices, that detect, localize, and recognize text appearing in video frames. Previous methods followed a four-step pipeline: text detection in every frame, text recognition for each localized text region, tracking of text streams, and post-processing. The two main problems with this approach are high computational cost and low performance. This thesis focuses on the text-spotting model design for an efficient video text-spotting system, and the model design experiments are carried out with efficiency in mind. Two real-time text-spotting models, ABCNet and FOTS, were evaluated. For ABCNet, different backbones, normalization schemes, and feature pyramid variations were explored to attain the best accuracy-energy tradeoff. For FOTS, two-step and two-stage text-spotting model designs were compared, and the influence of factors such as the bounding-box-to-character-count ratio, character count, blur level, bounding box count, and bounding box area was examined. The experiments showed that the two-step text-spotting design performed better at all resolutions, and that recognition performance improves with a higher bounding-box-to-character-count ratio and a lower character count. An energy measurement of the two-step FOTS text-spotting model on a Raspberry Pi is also presented.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Text Spotting, Energy Efficiency, UAV, Deep Learning, Computer Vision
dc.title: Energy-Efficient Video Text-Spotting
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Kalantari, Nima
dc.contributor.committeeMember: Tian, Chao
dc.type.material: text
dc.date.updated: 2022-02-24T19:03:15Z
local.etdauthor.orcid: 0000-0003-0930-4679


