Evaluation of Alternative Face Detection Techniques and Video Segment Lengths on Sign Language Detection
Sign language is the primary medium of communication for people who are hearing impaired. Sign language videos are hard to discover on video sharing sites because text-based search relies on metadata rather than on the content of the videos. The sign language community currently shares content through ad hoc mechanisms, as no library meets its requirements. Low-cost or even real-time classification techniques would be valuable for creating a sign language digital library whose content is updated as new videos are uploaded to YouTube and other video sharing sites. Prior research was able to detect sign language videos using face detection and background subtraction, with recall and precision suitable for creating a digital library. This approach analyzed one minute of each video being classified. Polar Motion Profiles achieved better recall on videos containing multiple signers, but at a significant computational cost, as the approach included five face trackers. This thesis explores three optimizations to these techniques that reduce the computation time of feature extraction without greatly reducing precision and recall. First, we compared the individual performance of the five face detectors and determined the best-performing single face detector. Second, we evaluated detection performance using Polar Motion Profiles when face detection was performed on sampled frames rather than on every frame. Our results show that Polar Motion Profiles performed well even when the information between sampled frames is sacrificed. Finally, we examined the effect of using shorter video segments for feature extraction. We found that the drop in precision is minor as video segments are shortened from the initial, empirically chosen length of one minute.
Through our work, we found an empirical configuration that can classify videos with close to two orders of magnitude less computation, with precision and recall only modestly below those of the original voting scheme. Our model shortens the detection time for sign language videos, which in turn would help enrich the digital library with fresh content quickly. Future work could focus on enabling diarization, that is, segmenting a video into sign language and non-sign language content, using effective background subtraction techniques for shorter videos.
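The three optimizations described above compound, which is how a reduction approaching two orders of magnitude becomes plausible. The following is a minimal sketch of that arithmetic, not the thesis's implementation: it assumes that feature-extraction cost is roughly proportional to the number of face-detector invocations. The function name `detector_invocations`, the 30 fps frame rate, the stride of 5 frames, and the 15-second segment are all hypothetical values chosen for illustration; only the five detectors and the one-minute baseline come from the abstract.

```python
def detector_invocations(num_detectors: int, segment_seconds: float,
                         fps: float, frame_stride: int) -> int:
    """Count face-detector calls needed to process one video segment.

    Assumes every detector runs once on each sampled frame, and that
    frames are sampled at indices 0, frame_stride, 2 * frame_stride, ...
    """
    if num_detectors < 1 or frame_stride < 1 or fps <= 0 or segment_seconds < 0:
        raise ValueError("arguments must be positive (segment length may be zero)")
    total_frames = int(segment_seconds * fps)
    sampled_frames = len(range(0, total_frames, frame_stride))
    return num_detectors * sampled_frames


# Baseline from the abstract: five detectors, one minute, every frame.
# The 30 fps frame rate is an assumption.
baseline = detector_invocations(num_detectors=5, segment_seconds=60,
                                fps=30, frame_stride=1)

# Hypothetical reduced configuration: one detector, every 5th frame,
# a 15-second segment. These values are illustrative, not the thesis's.
reduced = detector_invocations(num_detectors=1, segment_seconds=15,
                               fps=30, frame_stride=5)

speedup = baseline / reduced
print(baseline, reduced, speedup)
```

Under these assumed values, the factor of 5 from using a single detector, the factor of 5 from frame sampling, and the factor of 4 from the shorter segment multiply together. The actual configuration and measured savings are reported in the thesis itself.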
Duggina, Satyakiran (2015). Evaluation of Alternative Face Detection Techniques and Video Segment Lengths on Sign Language Detection. Master's thesis, Texas A & M University. Available electronically from