Show simple item record

dc.contributor.advisor	Ji, Shuiwang
dc.creator	Xie, Yaochen
dc.date.accessioned	2023-10-12T13:56:44Z
dc.date.available	2023-10-12T13:56:44Z
dc.date.created	2023-08
dc.date.issued	2023-07-26
dc.date.submitted	August 2023
dc.identifier.uri	https://hdl.handle.net/1969.1/199866
dc.description.abstract	Deep learning approaches have demonstrated impressive performance on a variety of data and tasks, where deep models take some data as inputs and are trained to output desired predictions. While the expressive capability of advanced deep models has improved greatly, training them requires a huge amount of data. A common way to train a deep model is the supervised mode, in which a sufficient number of input-label pairs are given. However, because a large number of labels is required, supervised training becomes inapplicable in many real-world scenarios where labels are expensive, limited, imbalanced, or even unavailable. In such cases, self-supervised learning (SSL) enables the training of deep models on unlabeled data, removing the need for extensive label annotation. When no labeled data is available, SSL serves as a promising approach to learning representations from, and enabling explainability for, unlabeled data. In this dissertation, we study and develop multiple theoretically grounded approaches that use self-supervision to perform both learning and explanation in multiple scenarios involving image and graph data.

The general goal of learning is to learn representations from unlabeled data that are both informative and robust to noise. Compared with supervised learning, it is more challenging for SSL to learn deep models that are robust to noise in the given data, because the self-supervision derived from the data itself may also contain noise. To achieve this goal with SSL, we begin by investigating the denoising capability of SSL approaches. In particular, we study SSL approaches to image denoising in scenarios where clean images are unavailable. Self-supervised frameworks that learn denoising models from individual noisy images alone have shown strong capability and promising performance in various image denoising tasks. Existing self-supervised denoising frameworks are mostly built upon the same theoretical foundation, inspired by the denoising autoencoder, in which the denoising models are required to be J-invariant. However, our analyses indicate that the current theory and the J-invariance requirement may lead to denoising models with reduced performance. In this dissertation, we first introduce Noise2Same, a novel self-supervised denoising framework. In Noise2Same, a new self-supervised loss is proposed by deriving a self-supervised upper bound of the typical supervised loss. In particular, Noise2Same requires neither J-invariance nor extra information about the noise model and can be used in a wider range of denoising applications. We analyze the proposed Noise2Same both theoretically and experimentally. The experimental results show that Noise2Same remarkably outperforms previous self-supervised denoising methods in terms of denoising performance and training efficiency.

Given this promising denoising capability, we further generalize the above theoretical framework for SSL to even more challenging data and problems. Specifically, we propose self-supervised approaches to learning representations with graph neural networks (GNNs) on graph data. SSL of GNNs is emerging as a promising way of leveraging unlabeled graph data. Currently, most methods are based on contrastive learning adapted from the image domain, which requires view generation and a sufficient number of negative samples. In contrast, existing predictive models do not require negative sampling but lack theoretical guidance on the design of pretext training tasks. In this dissertation, we then propose LaGraph, a predictive SSL framework grounded in the above denoising theory that formulates the SSL task as a latent graph prediction problem. The learning objectives of LaGraph are derived as self-supervised upper bounds on the objectives for predicting unobserved latent graphs. In addition to its improved performance, LaGraph provides explanations for recent successes of predictive models that include invariance-based objectives. We provide theoretical analysis comparing LaGraph to related methods in different domains. Our experimental results demonstrate the superiority of LaGraph in performance and its robustness to decreasing training sample sizes on both graph-level and node-level tasks.

To ensure that reliable deep models are learned under self-supervision, one approach is to enable the explainability of models trained with self-supervision. However, without given downstream tasks and labels, explanation becomes infeasible with existing learning-based explanation pipelines and approaches. Specifically, they are incapable of producing explanations for a multitask prediction model with a single explainer, and they cannot provide explanations when the model is trained in a self-supervised manner and the resulting representations are used in future downstream tasks. In this dissertation, we further demonstrate with graph data that self-supervision can also be used to learn to explain deep models trained with self-supervision. Specifically, we propose the Task-Agnostic GNN Explainer (TAGE), which is independent of downstream models and is trained under self-supervision with no knowledge of downstream tasks. TAGE enables the explanation of GNN embedding models with unseen downstream tasks and allows efficient explanation of multitask models. Our extensive experiments show that TAGE significantly improves explanation efficiency by using the same explainer to explain predictions for multiple downstream tasks, while achieving explanation quality as good as or better than current state-of-the-art GNN explanation approaches.

Finally, given the success on natural images and graph data, we further investigate the capability of self-supervised representation learning to advance scientific discovery in the scenario of genome-wide association studies (GWAS), which are used to identify relationships between genetic variations and specific traits. When applying GWAS to high-dimensional medical imaging data, a key step is to extract lower-dimensional yet informative representations of the data as traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS in comparison to typical visual representation learning. We tackle this problem from the mutual information (MI) perspective by identifying key limitations of existing SSL methods. We introduce a trans-modal SSL framework, Genetic InfoMax (GIM), which includes a regularized MI estimator and a novel genetics-informed transformer to address the specific challenges of GWAS. We evaluate GIM on human brain 3D MRI data and establish standardized evaluation protocols to compare it to existing approaches. Our results demonstrate the effectiveness of GIM and its significantly improved performance on GWAS.
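To make the Noise2Same idea described above concrete, the following is a minimal sketch of a self-supervised denoising objective that combines a reconstruction term with an invariance penalty comparing outputs on masked and unmasked inputs. It is illustrative only, not the dissertation's exact formulation: the masking scheme, the weight lam, and all function and variable names are assumptions.

    # Minimal sketch of a Noise2Same-style self-supervised denoising loss.
    # Illustrative assumptions: masking scheme, the weight `lam`, and all names.
    import torch
    import torch.nn.functional as F

    def self_supervised_denoising_loss(model, noisy, mask_ratio=0.005, lam=2.0):
        # Randomly pick a small subset J of pixels to mask.
        mask = (torch.rand_like(noisy) < mask_ratio).float()
        # One possible masking scheme: replace the selected pixels with noise.
        masked_input = noisy * (1 - mask) + torch.randn_like(noisy) * mask

        out_full = model(noisy)           # f(x), sees every pixel
        out_masked = model(masked_input)  # f(x with the pixels in J masked)

        # Self-supervised reconstruction term over the whole image.
        rec = F.mse_loss(out_full, noisy)
        # Invariance term: output difference restricted to the masked pixels,
        # encouraging (rather than strictly enforcing) J-invariance.
        inv = ((out_full - out_masked) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)

        return rec + lam * torch.sqrt(inv)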
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Machine learning
dc.subject	deep learning
dc.subject	artificial intelligence
dc.subject	self-supervised learning
dc.title	Towards Self-Supervised Learning and Explaining of Deep Models
dc.type	Thesis
thesis.degree.department	Computer Science and Engineering
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Doctor of Philosophy
thesis.degree.level	Doctoral
dc.contributor.committeeMember	Ding, Yu
dc.contributor.committeeMember	Huang, Ruihong
dc.contributor.committeeMember	Mortazavi, Bobak
dc.type.material	text
dc.date.updated	2023-10-12T13:56:44Z
local.etdauthor.orcid	0000-0003-0320-6728

