Towards Explainable Deep Models for Images, Texts, and Graphs
Abstract
Deep neural networks have been widely studied and applied to a broad range of applications in recent years due to their strong performance. Even though deep models have proven powerful and promising, most of them are developed as black boxes: without meaningful explanations of how and why predictions are made, we do not fully understand their inner working mechanisms. Hence, such models cannot be fully trusted, which prevents their use in critical applications pertaining to fairness, privacy, and safety. This raises the need to explain deep learning models and to investigate several questions: What input factors are important to the predictions? How are decisions made through deep networks? And what is the meaning of hidden neurons? In this dissertation, we investigate different explanation techniques for different types of deep models. In particular, we explore both instance-level and model-level explanations for image models, text models, and graph models.
Understanding deep image models is the most straightforward starting point for explaining deep models, since images are naturally well represented and can be easily visualized. Hence, we start by proposing a novel discrete masking method for explaining deep image classifiers. Our method follows the generative adversarial network formalism: the deep model to be explained is regarded as the discriminator, while we train a generator to explain it. The generator is trained to capture discriminative image regions that convey the same or similar semantic meaning as the original image from the model's perspective. It produces a probability map from which a discrete mask can be sampled. The discriminator is then used to measure the quality of the sampled mask and to provide feedback for updating the generator. Because of the sampling operations, the generator cannot be trained directly by back-propagation, so we propose to update it using the policy gradient. Furthermore, we propose to incorporate gradients as auxiliary information to reduce the search space and facilitate training. We conduct both quantitative and qualitative experiments on the ILSVRC dataset to demonstrate the effectiveness of our proposed method. Experimental results indicate that our method provides reasonable explanations for both correct and incorrect predictions and outperforms existing approaches. In addition, our method passes the model randomization test, indicating that its explanations depend on the trained network's parameters and genuinely attribute its predictions.
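To make the training procedure above more concrete, the following is a minimal PyTorch sketch of the policy-gradient (REINFORCE) step for a mask generator trained against a fixed classifier. The generator architecture, the reward (the classifier's confidence on the masked image), and the names `model`, `images`, and `target` are illustrative assumptions; the dissertation's actual design additionally uses gradient information as an auxiliary signal, which is omitted here.

```python
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Produces a per-pixel probability map from which discrete masks are sampled."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, H, W) probabilities

def policy_gradient_step(generator, model, images, target, optimizer):
    """One REINFORCE update; `model` is the frozen classifier, `target` the class indices."""
    probs = generator(images)                           # probability map over pixels
    dist = torch.distributions.Bernoulli(probs)
    mask = dist.sample()                                 # discrete mask (non-differentiable)
    masked_images = images * mask                        # keep only the selected regions
    with torch.no_grad():
        scores = model(masked_images).softmax(dim=1)
        reward = scores.gather(1, target.view(-1, 1))    # classifier confidence as reward
    # REINFORCE: maximize expected reward by weighting the mask's log-probability.
    log_prob = dist.log_prob(mask).flatten(1).sum(dim=1, keepdim=True)
    loss = -(log_prob * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key point is that the sampled mask blocks back-propagation, so the generator's parameters receive gradients only through the log-probability term weighted by the reward.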
Unlike image models, text models are more difficult to explain since texts are represented as discrete variables and cannot be directly visualized. In addition, most explanation methods focus only on the input space of the models and ignore the hidden space. Hence, we propose to explain deep models for text analysis by exploring the meaning of the hidden space. Specifically, we propose an approach to investigate the meaning of hidden neurons in convolutional neural network models for sentence classification tasks. We first employ the saliency map technique to identify important spatial locations in the hidden layers. Then we use optimization techniques to approximate the information that these hidden locations detect from input sentences. Furthermore, we develop regularization terms and explore words in the vocabulary to explain the detected information. Experimental results demonstrate that our approach can identify meaningful and reasonable explanations for hidden spatial locations. Additionally, we show that our approach can describe the decision procedure of deep text models.
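As a rough illustration of the first step above, the sketch below uses a forward hook to capture a hidden convolutional layer of a text CNN and computes a gradient-based saliency score for each spatial position. The names `model`, `conv_layer`, and `tokens`, and the assumed activation shape `(1, channels, seq_len)`, are illustrative assumptions rather than the dissertation's exact setup.

```python
import torch

def hidden_saliency(model, conv_layer, tokens, target_class, top_k=5):
    """Return per-position saliency of a hidden convolutional layer for one sentence."""
    activations = {}

    def hook(module, inputs, output):
        output.retain_grad()          # keep the gradient of this non-leaf tensor
        activations["h"] = output

    handle = conv_layer.register_forward_hook(hook)
    logits = model(tokens.unsqueeze(0))          # (1, num_classes)
    handle.remove()

    logits[0, target_class].backward()           # gradient of the class score
    hidden = activations["h"]                    # assumed shape (1, channels, seq_len)
    saliency = hidden.grad.abs().sum(dim=1).squeeze(0)   # importance per position
    top_positions = saliency.topk(k=min(top_k, saliency.numel())).indices
    return saliency, top_positions
```

The positions with the largest scores are the hidden locations whose detected information would then be approximated and explained in the subsequent optimization step.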
These observations further motivate us to study explanation techniques for graph neural networks (GNNs). Unlike images and texts, graph data are usually represented as continuous feature matrices and discrete adjacency matrices. The structural information in the adjacency matrices is important and should be considered when providing explanations, so methods designed for images and texts cannot be directly applied. Hence, we investigate both instance-level and model-level explanations of GNNs to provide a comprehensive understanding. First, existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. To provide instance-level explanations for GNNs, we propose a novel method, known as SubgraphX, that explains GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, SubgraphX explains the model's predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes for computing Shapley values on graph data. Our work represents the first attempt to explain GNNs by explicitly identifying subgraphs. Experimental results show that SubgraphX achieves significantly improved explanations while keeping computations at a reasonable level. Second, most existing explanation methods provide only instance-level explanations, and none of them can offer a high-level understanding of the model. We therefore propose a novel approach, known as XGNN, to explain GNNs at the model level. Our approach provides high-level insights and a generic understanding of how GNNs work. In particular, we propose to explain GNNs by training a graph generator so that the generated graph patterns maximize a certain prediction of the model. We formulate graph generation as a reinforcement learning task in which, at each step, the graph generator predicts how to add an edge to the current graph. The graph generator is trained via a policy gradient method based on information from the trained GNNs. In addition, we incorporate several graph rules to encourage the generated graphs to be valid. Experimental results on both synthetic and real-world datasets show that our proposed methods help us understand and verify trained GNNs.
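To make the two ideas above more tangible, the sketches below show (i) a Monte Carlo estimate of a subgraph's Shapley value with respect to a trained GNN, and (ii) a REINFORCE-trained edge-adding policy whose generated graphs maximize a GNN's score for one class. Both are minimal illustrations under stated assumptions, not the exact SubgraphX or XGNN algorithms: masking nodes by zeroing their features, the `predict` and `gnn_score` wrappers, the fixed node set, and the simple MLP policy are all assumed for brevity.

```python
import random
import torch

def shapley_subgraph(predict, x, edge_index, subgraph_nodes, num_samples=100):
    """Monte Carlo Shapley estimate for a node subset, treated as a single player.

    `predict(x, edge_index)` is assumed to return the target-class probability.
    Nodes outside the current coalition are masked by zeroing their features.
    """
    others = [n for n in range(x.size(0)) if n not in set(subgraph_nodes)]

    def masked_score(keep_nodes):
        mask = torch.zeros(x.size(0), 1)
        if keep_nodes:
            mask[torch.tensor(keep_nodes, dtype=torch.long)] = 1.0
        return predict(x * mask, edge_index)

    total = 0.0
    for _ in range(num_samples):
        random.shuffle(others)
        k = random.randint(0, len(others))       # random position of the subgraph "player"
        coalition = others[:k]
        with torch.no_grad():
            with_sub = masked_score(coalition + list(subgraph_nodes))
            without_sub = masked_score(coalition)
        total += float(with_sub - without_sub)   # marginal contribution
    return total / num_samples
```

In SubgraphX such an importance score guides the Monte Carlo tree search over candidate subgraphs; here it is shown in isolation. The second sketch illustrates the model-level direction: a small policy adds one edge at a time and is updated with the policy gradient using the trained GNN's output as the reward.

```python
import torch
import torch.nn as nn

class EdgePolicy(nn.Module):
    """Scores every candidate edge of a small graph with a fixed number of nodes."""
    def __init__(self, max_nodes):
        super().__init__()
        self.max_nodes = max_nodes
        self.mlp = nn.Sequential(
            nn.Linear(max_nodes * max_nodes, 64), nn.ReLU(),
            nn.Linear(64, max_nodes * max_nodes),
        )

    def forward(self, adj):
        logits = self.mlp(adj.flatten())
        forbidden = (adj.flatten() > 0) | (torch.eye(self.max_nodes).flatten() > 0)
        logits = logits.masked_fill(forbidden, float("-inf"))   # no duplicates or self-loops
        return torch.distributions.Categorical(logits=logits)

def generate_and_update(policy, gnn_score, feats, optimizer, steps=5):
    """Generate a graph edge by edge and apply one REINFORCE update.

    `gnn_score(adj, feats)` is assumed to return the trained GNN's target-class probability.
    """
    n = policy.max_nodes
    adj = torch.zeros(n, n)
    log_probs = []
    for _ in range(steps):
        dist = policy(adj)
        action = dist.sample()                   # flat index of the new edge
        log_probs.append(dist.log_prob(action))
        i, j = divmod(action.item(), n)
        adj = adj.clone()
        adj[i, j] = adj[j, i] = 1.0              # add an undirected edge
    with torch.no_grad():
        reward = gnn_score(adj, feats)
    loss = -(torch.stack(log_probs).sum() * reward)   # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return adj, float(reward)
```

The repeatedly generated high-reward graphs play the role of model-level explanations: recurring patterns among them indicate what the trained GNN has learned to associate with the chosen class.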
Citation
Yuan, Hao (2021). Towards Explainable Deep Models for Images, Texts, and Graphs. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/195099.