dc.contributor.advisor: Hu, Xia
dc.creator: Du, Mengnan
dc.date.accessioned: 2022-05-25T20:31:31Z
dc.date.available: 2022-05-25T20:31:31Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-07
dc.date.submitted: December 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/196092
dc.description.abstract: Deep neural networks (DNNs) are progressing at an astounding rate and have a wide range of real-world applications, such as Netflix's movie recommendations, Google's neural machine translation, and Amazon Alexa's speech recognition. Despite these successes, DNNs have their own limitations and drawbacks. The most significant is the lack of transparency behind their behavior, which leaves users with little understanding of how these models make particular decisions. Consider, for instance, an advanced self-driving car equipped with various DNN algorithms that does not brake or decelerate when confronting a stopped firetruck. This unexpected behavior may frustrate and confuse users, leaving them wondering why. Even worse, such wrong decisions could have severe consequences if the car is traveling at highway speed and ultimately crashes into the firetruck. Concerns about the black-box nature of complex DNN models have hampered their further adoption in society, especially in critical decision-making domains such as self-driving cars.

In this dissertation, we investigate three research questions: How can we provide explanations for pre-trained DNN models that offer insight into their decision-making process? How can we use explanations to enhance the generalization ability of DNN models? And how can we employ explanations to promote the fairness of DNN models?

To address the first research question, we explore the explainability of two standard DNN architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We propose a guided feature inversion framework that takes advantage of the deep architecture to provide effective interpretation of CNN models. The proposed framework not only determines the contribution of each feature in the input but also provides insight into the decision-making process of CNN models. By further interacting with the neuron of the target category at the output layer of the CNN, we enforce the interpretation result to be class-discriminative. In addition, we propose a novel attribution method, called REAT, to interpret RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of the words in the input text. This additive decomposition further enables REAT to obtain phrase-level attribution scores. REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results on a series of image and text classification benchmarks demonstrate the faithfulness and interpretability of the two proposed explanation methods.

To address the second research question, we use explainability as a debugging tool to examine the vulnerabilities and failure modes of DNNs, which yields insights that can be used to enhance their generalization ability. We propose CREX, which encourages DNN models to focus on the evidence that actually matters for the task at hand and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training of DNNs with rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, so that the models' local explanations conform with those expert rationales. Furthermore, recent studies indicate that BERT-based natural language understanding models are prone to relying on shortcut features for prediction. We employ explainability-based observations to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this measurement, we propose a shortcut mitigation framework, LTGR, which suppresses the model from making overconfident predictions for samples with a large shortcut degree. Experimental analysis on several text benchmark datasets validates that the CREX and LTGR frameworks effectively increase the generalization ability of DNN models.

For the third research question, explainability-based analysis indicates that DNN models trained with the standard cross-entropy loss tend to capture spurious correlations between fairness-sensitive information in encoder representations and specific class labels. We propose a new mitigation technique, RNF, which achieves fairness by debiasing only the task-specific classification head of the DNN model. To this end, we leverage samples that share a ground-truth label but differ in sensitive attributes, and use their neutralized representations to train the classification head. Experimental results on several benchmark datasets demonstrate that the RNF framework effectively reduces discrimination in DNN models with minimal degradation in task-specific performance. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Deep neural networks [en]
dc.subject: Explainability [en]
dc.subject: Generalization [en]
dc.subject: Fairness [en]
dc.title: Deep Neural Networks Explainability: Algorithms and Applications [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Caverlee, James
dc.contributor.committeeMember: Ji, Shuiwang
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text [en]
dc.date.updated: 2022-05-25T20:31:31Z
local.etdauthor.orcid: 0000-0002-1614-6069
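
The abstract describes REAT as decomposing an RNN's final prediction into additive per-word contributions that can be grouped into phrase-level scores. As a rough illustration of that additive idea only, not the dissertation's actual REAT algorithm, the toy PyTorch sketch below scores each word by the change in the target-class logit after a GRU consumes it; the model, vocabulary, and sentence are made up for demonstration.

import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, hidden, num_classes = 100, 32, 2
emb = nn.Embedding(vocab_size, hidden)         # toy, untrained components
rnn = nn.GRU(hidden, hidden, batch_first=True)
clf = nn.Linear(hidden, num_classes)

tokens = torch.tensor([[5, 17, 42, 8]])        # one made-up sentence (word ids)
target = 1                                     # class whose prediction we explain

with torch.no_grad():
    h = torch.zeros(1, 1, hidden)              # initial hidden state
    prev_logit = clf(h[-1])[0, target]
    word_scores = []
    for t in range(tokens.size(1)):
        _, h = rnn(emb(tokens[:, t:t + 1]), h)           # feed one word
        logit = clf(h[-1])[0, target]
        word_scores.append((logit - prev_logit).item())  # additive contribution of word t
        prev_logit = logit

# The per-word scores sum to the final class logit minus the initial-state logit,
# and contiguous words can be summed to obtain phrase-level scores.
print(word_scores, sum(word_scores))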
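
The abstract also describes the RNF idea of debiasing only the task-specific classification head by training it on neutralized representations of samples that share a label but differ in sensitive attributes. The sketch below is a minimal illustration of that idea under stated assumptions; the encoder, dimensions, pairing scheme, and simple averaging are placeholders, not the dissertation's implementation.

import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # stands in for a pre-trained backbone
head = nn.Linear(64, 2)                                # task-specific classification head
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Toy batch: x_a[i] and x_b[i] are assumed to share ground-truth label y[i] but differ
# in the sensitive attribute, so averaging their representations tends to cancel
# attribute-specific information while preserving task-relevant features.
x_a, x_b = torch.randn(16, 32), torch.randn(16, 32)
y = torch.randint(0, 2, (16,))

with torch.no_grad():                                  # only the head is updated
    h_neutral = 0.5 * (encoder(x_a) + encoder(x_b))

optimizer.zero_grad()
loss = criterion(head(h_neutral), y)                   # train the head on neutralized features
loss.backward()
optimizer.step()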

