Show simple item record

dc.contributor.advisor  Wang, Zhangyang
dc.contributor.advisor  Sze, Sing-Hoi
dc.creator  Yuan, Ye
dc.date.accessioned  2021-02-03T17:43:14Z
dc.date.available  2022-08-01T06:54:16Z
dc.date.created  2020-08
dc.date.issued  2020-07-24
dc.date.submitted  August 2020
dc.identifier.uri  https://hdl.handle.net/1969.1/192354
dc.description.abstract  Re-identification (ReID) has been one of the most intensively studied problems in computer vision, with extensive applications in multi-camera systems for public safety, indoor/outdoor monitoring, and smart cities/communities. Given a subject of interest (query) captured in one frame, ReID algorithms aim to identify occurrences (matches) of the same subject in other video frames, e.g., at different times of day or from other cameras. A standard ReID system contains three main components: object detection, including bounding box proposal and recognition; representation learning; and evaluation for retrieval. Most existing ReID approaches aim to learn identity-related features or, equivalently, to design similarity metrics that measure identity similarity between image pairs. The main goal of ReID is to correctly match two images of the same object under intensive appearance changes caused by either intrinsic factors, i.e., varying pose and viewpoint, or extrinsic factors, e.g., occlusion, illumination change, and varying environmental background. With the rapidly increasing demand for ReID in multi-camera video surveillance systems, the core technical challenge is no longer just performance in an enclosed or fixed environment, but also the model's robustness and transferability to diverse, large-scale unseen cases. In seeking a highly robust ReID algorithm for large-scale real-world scenarios, we strive to tackle this challenging problem from four interlinked perspectives: image understanding in poor-visibility environments, robust representation learning with noisy labels, domain-invariant learning for better generalizability, and mesh recovery for video-based ReID.
To address the robustness of re-identification under large variations, we first conduct a thorough examination of how environmental variance affects image quality and visual tasks such as recognition and detection, and propose a low-level enhancement pipeline as an image preprocessing module that helps eliminate degradations under complex environmental variations. The proposed image enhancement pipeline won second prize in the CVPR 2018 UG2 competition for automatic object recognition in poor-visibility environments. In addition, we comprehensively discuss the main challenge in ReID, i.e., how to correctly match two images of the same subject under intensive appearance changes caused by intrinsic and environmental factors. More specifically, we introduce an effective yet efficient loss function, the fast-approximated triplet (FAT) loss, to extract informative representations from noisy data. The FAT loss provably converts the point-wise triplet loss into its upper-bound form, consisting of a point-to-set loss term plus a cluster-compactness regularization. It preserves the effectiveness of the triplet loss while reducing complexity to linear in the training-set size. A label distillation strategy is further designed to learn refined soft labels, in place of the potentially noisy labels, from only an identified subset of confident examples, through teacher-student networks. We conduct extensive experiments on the three most popular ReID benchmarks and demonstrate that the FAT loss with distilled labels leads to ReID features with remarkable accuracy, efficiency, robustness, and direct transferability to unseen datasets. Meanwhile, we present an adversarial domain-invariant feature learning framework (ADIN) to eliminate extrinsic misleading information. The ADIN framework explicitly learns to separate identity-related features from challenging variations, utilizing for the first time "free" annotations in ReID data such as video timestamps and camera indices.
Experiments on existing large-scale person/vehicle ReID datasets demonstrate that ADIN learns more robust and generalizable representations, as evidenced by its outstanding direct-transfer performance across datasets, a criterion that better measures the generalizability of large-scale ReID methods. Furthermore, we explore modeling 3D meshes and capturing video motion as an alternative representation for ReID, to completely remove environmental distraction in appearance.  en
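The point-to-set formulation mentioned in the abstract can be sketched as follows. This is only an illustrative toy, not the thesis's actual FAT loss: the function name, the use of class means as set representatives, and the margin/weight values are assumptions. It shows why the point-to-set form scales linearly with the number of samples, whereas exhaustive point-wise triplet mining does not.

```python
import numpy as np

def point_to_set_loss(features, labels, margin=0.3, compact_weight=0.1):
    """Toy point-to-set triplet-style loss (hypothetical sketch).

    Each sample is pulled toward its own class center and pushed away
    from the nearest other class center, plus a cluster-compactness
    regularizer. One center lookup per sample gives cost linear in the
    number of samples, unlike exhaustive point-wise triplet enumeration.
    """
    classes = np.unique(labels)
    # Per-class mean embeddings act as the "set" representatives.
    centers = {c: features[labels == c].mean(axis=0) for c in classes}
    total = 0.0
    for x, y in zip(features, labels):
        d_pos = np.linalg.norm(x - centers[y])       # distance to own center
        d_neg = min(np.linalg.norm(x - centers[c])   # nearest rival center
                    for c in classes if c != y)
        total += max(0.0, d_pos - d_neg + margin)    # point-to-set hinge
        total += compact_weight * d_pos ** 2         # compactness term
    return total / len(features)

# Toy batch: two well-separated identities, so the hinge term vanishes
# and only the small compactness penalty remains.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labs = np.array([0, 0, 1, 1])
print(point_to_set_loss(feats, labs))
```

With well-clustered embeddings the loss is near zero; mixing the two identities' features would activate the hinge term and drive the loss up.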
dc.format.mimetype  application/pdf
dc.language.iso  en
dc.subject  Image Enhancement  en
dc.subject  Image Retrieval  en
dc.subject  Re-Identification  en
dc.subject  Human Mesh Recovery  en
dc.title  Robust Re-Identification with Large Variations  en
dc.type  Thesis  en
thesis.degree.department  Computer Science and Engineering  en
thesis.degree.discipline  Computer Science  en
thesis.degree.grantor  Texas A&M University  en
thesis.degree.name  Doctor of Philosophy  en
thesis.degree.level  Doctoral  en
dc.contributor.committeeMember  Chaspari, Theodora
dc.contributor.committeeMember  Shen, Yang
dc.type.material  text  en
dc.date.updated  2021-02-03T17:43:15Z
local.embargo.terms  2022-08-01
local.etdauthor.orcid  0000-0003-2264-2736

