The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
A Deep Motion Vector Approach to Video Object Segmentation
Video object segmentation is of growing research and commercial importance, with applications ranging from checkout-free Amazon Go stores to autonomous vehicles on public roads. Operating efficiently in such settings requires real-time segmentation inference. Although image segmentation, both semantic and instance, has been studied extensively, there is still much room for improvement in video segmentation. Video segmentation is a direct extension of image segmentation, except that neighboring video frames are temporally related. Exploiting this temporal relation efficiently is one of the central challenges in video segmentation. Consecutive frames are highly redundant, yet many prevalent state-of-the-art techniques do not exploit this redundancy. Optical flow is one approach to exploiting temporal redundancy: intermediate feature maps of previous frames are interpolated to the current frame using the flow, and the rest of the segmentation is then performed on them. However, optical flow estimates motion at pixel resolution, and there is not enough motion between consecutive frames to warrant pixel-level motion estimation. Instead, a frame can be divided into blocks and the movement of each block's centroid estimated across consecutive frames. Based on this idea, we present a motion vector approach to video semantic segmentation. We also propose an adaptive technique for selecting keyframes during inference. We show that the proposed algorithm can reduce computational complexity during inference by as much as 50% with only a 2-3% drop in the accuracy metric. The algorithm runs at up to 136 frames per second, indicating that it can easily handle real-time inference.
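The abstract describes estimating motion per block rather than per pixel and reusing keyframe features on the frames in between. The sketch below illustrates that general idea. It uses classical exhaustive-search block matching with the sum of absolute differences (SAD), which is a swapped-in technique. For a rigidly translated block, the matched displacement equals the displacement of its centroid. The full thesis is embargoed, so everything specific here is an assumption and not the author's method. That includes the block size, the search range, the feature-copying rule, and the keyframe criterion (mean per-pixel matching error above a threshold).

A few other names and conventions are also hypothetical:

- `backbone` stands in for the segmentation network's expensive feature extractor.
- `run_sequence`, `block_motion_vectors`, `propagate_features` and `needs_keyframe` are illustrative names.
- Feature maps are assumed to share the frame's spatial resolution, with sides divisible by the block size.

```python
import numpy as np

def block_motion_vectors(prev, curr, block=8, search=4):
    """Exhaustive-search block matching (SAD), a classical technique.

    For each block x block tile of `curr` at (y, x), find the offset (dy, dx)
    within +/- `search` pixels such that prev[y+dy, x+dx] best matches it.
    Returns offsets of shape (H//block, W//block, 2) and the per-block min SAD.
    """
    h, w = curr.shape
    rows, cols = h // block, w // block
    mv = np.zeros((rows, cols, 2), dtype=int)
    cost = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            tile = curr[y:y + block, x:x + block].astype(float)
            best = (np.inf, 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    # Skip candidate blocks that fall outside the previous frame.
                    if py < 0 or px < 0 or py + block > h or px + block > w:
                        continue
                    sad = np.abs(tile - prev[py:py + block, px:px + block]).sum()
                    if sad < best[0]:
                        best = (sad, dy, dx)
            cost[r, c], mv[r, c] = best[0], best[1:]
    return mv, cost

def propagate_features(feat, mv, block=8):
    """Build the current frame's (C, H, W) features by copying each block
    from its matched source location in the previous feature map."""
    out = np.zeros_like(feat)
    rows, cols, _ = mv.shape
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            dy, dx = mv[r, c]
            out[:, y:y + block, x:x + block] = feat[:, y + dy:y + dy + block,
                                                    x + dx:x + dx + block]
    return out

def needs_keyframe(cost, block, threshold):
    """Hypothetical adaptive rule: request a new keyframe when the mean
    per-pixel matching error exceeds `threshold`."""
    return cost.mean() / (block * block) > threshold

def run_sequence(frames, backbone, block=8, search=4, threshold=10.0):
    """Yield one feature map per frame; `backbone` is a hypothetical
    callable standing in for the full (expensive) feature extractor."""
    feat, prev = None, None
    for frame in frames:
        if feat is None:
            feat = backbone(frame)  # first frame is always a keyframe
        else:
            mv, cost = block_motion_vectors(prev, frame, block, search)
            if needs_keyframe(cost, block, threshold):
                feat = backbone(frame)  # keyframe: full computation
            else:
                feat = propagate_features(feat, mv, block)  # cheap path
        prev = frame
        yield feat
```

The savings in such a scheme would come from the cheap path. Block matching over a coarse grid is typically much less expensive than a full backbone pass. The keyframe test limits how long propagated features can drift before they are recomputed. The default `threshold` assumes 0-255 pixel values, so it would need retuning for frames scaled to [0, 1].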
Garg, Vineet (2019). A Deep Motion Vector Approach to Video Object Segmentation. Master's thesis, Texas A&M University. Available electronically from