
dc.contributor.advisor: Kumar, Panganamala
dc.creator: Balasubramanian, Suprith
dc.date.accessioned: 2023-02-07T16:18:39Z
dc.date.available: 2023-02-07T16:18:39Z
dc.date.created: 2022-05
dc.date.issued: 2022-04-22
dc.date.submitted: May 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/197330
dc.description.abstract: The guiding design principle behind humans building machines has been the repeated execution of a particular task in a precise and efficient manner. While we have systems that can solve tasks ranging from the relatively mundane, like crunching numbers, to the highly complex, like recognizing objects and harvesting wheat, they are highly specialized: they perform only the tasks they are trained for and generalize poorly. For instance, a robotic arm that can weld two parts of a car together may be rendered completely useless in a situation that requires welding components onto a computer motherboard. Developing agents endowed with human-like abilities to generalize across diverse scenarios is a core research topic in artificial intelligence. For an agent to develop general-purpose skills in a completely self-supervised manner, it is useful to learn rich representations of the world it is embodied in and to use these representations to adapt and learn more about the environment. The prediction and anticipation of future events is a key component of such intelligent decision-making systems. Prediction serves as a means of learning meaningful concepts about the world even from a raw stream of sensory observations, such as images from a camera. If the agent can learn to predict raw sensory observations directly, it does not need to assume the availability of low-dimensional state information or an extrinsic reward signal. This is beneficial for learning skills in real-world environments, where external reward feedback is extremely sparse or non-existent and the agent has only indirect access to the state of the world through its senses. Images are high-dimensional, rich sources of information, underscoring the potential of video prediction to extract meaningful representations of the underlying patterns in video data. Video prediction refers to the problem of generating the pixels of future frames given context information in the form of past frames of a video. When combined with planning algorithms, such predictive models enable the agent to take actions toward a desired goal in an unsupervised manner, using data as its own supervision. Motivated by this objective of learning generalizable behavior in the real world, we introduce the Hierarchical Variational Autoencoder (HVAE), a model that leverages a hierarchy of latent sequences to solve the task of video prediction.
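The abstract only names the idea of a hierarchy of latent variables for video prediction. As a rough illustration of that general idea (not the thesis's actual HVAE architecture), the sketch below shows a two-level latent-variable model trained with a VAE-style objective on flattened frames; the module names, dimensions, single-step prediction setup, and use of PyTorch are all assumptions made for illustration.

```python
# Illustrative two-level hierarchical latent-variable model for one-step
# frame prediction. Hypothetical sketch, not the thesis's HVAE: names,
# dimensions, and the flattened-frame setup are assumed.
import torch
import torch.nn as nn


class TwoLevelLatentPredictor(nn.Module):
    def __init__(self, frame_dim=1024, z_hi=32, z_lo=64, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame_dim, hidden), nn.ReLU())
        # High-level latent: slow-changing context inferred from past frames.
        self.hi_head = nn.Linear(hidden, 2 * z_hi)
        # Low-level latent: per-step detail, conditioned on the high-level code.
        self.lo_head = nn.Linear(hidden + z_hi, 2 * z_lo)
        self.decoder = nn.Sequential(
            nn.Linear(z_hi + z_lo, hidden), nn.ReLU(), nn.Linear(hidden, frame_dim)
        )

    @staticmethod
    def sample(stats):
        # Split into mean / log-variance and apply the reparameterization trick.
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

    def forward(self, context_frames, target_frame):
        # context_frames: (batch, T, frame_dim); target_frame: (batch, frame_dim)
        h_ctx = self.encoder(context_frames).mean(dim=1)  # pooled context features
        z_hi, mu_hi, lv_hi = self.sample(self.hi_head(h_ctx))
        h_tgt = self.encoder(target_frame)
        z_lo, mu_lo, lv_lo = self.sample(self.lo_head(torch.cat([h_tgt, z_hi], -1)))
        recon = self.decoder(torch.cat([z_hi, z_lo], dim=-1))
        # ELBO-style loss: pixel reconstruction plus a KL penalty per latent
        # level (standard-normal priors assumed here for simplicity).
        rec = ((recon - target_frame) ** 2).mean()
        kl = lambda mu, lv: -0.5 * torch.mean(1 + lv - mu.pow(2) - lv.exp())
        return rec + kl(mu_hi, lv_hi) + kl(mu_lo, lv_lo)


# Example: predict a flattened 32x32 grayscale frame from 5 context frames.
model = TwoLevelLatentPredictor()
loss = model(torch.randn(8, 5, 1024), torch.randn(8, 1024))
loss.backward()
```

The split into a context-level and a step-level latent is one common way to realize "a hierarchy of latent sequences"; a full video-prediction model would use convolutional encoders/decoders and recurrence over time rather than flattened single frames.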
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Video Prediction
dc.subject: Deep Learning
dc.subject: Self-supervised Learning
dc.title: A Hierarchical Approach to Video Prediction
dc.type: Thesis
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Kalantari, Nima
dc.contributor.committeeMember: Kalathil, Dileep
dc.contributor.committeeMember: Shakkottai, Srinivas
dc.type.material: text
dc.date.updated: 2023-02-07T16:18:39Z
local.etdauthor.orcid: 0000-0001-5807-9987