dc.contributor.advisor: Hu, Xia
dc.creator: Du, Mengnan
dc.date.accessioned: 2022-05-25T20:31:31Z
dc.date.available: 2022-05-25T20:31:31Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-07
dc.date.submitted: December 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/196092
dc.description.abstract: Deep neural networks (DNNs) are progressing at an astounding rate and have a wide range of real-world applications, such as Netflix's movie recommendations, Google's neural machine translation, and Amazon Alexa's speech recognition. Despite these successes, DNNs have their own limitations and drawbacks. The most significant is the lack of transparency behind their behavior, which leaves users with little understanding of how these models make particular decisions. Consider, for instance, an advanced self-driving car equipped with various DNN algorithms that does not brake or decelerate when confronting a stopped firetruck. This unexpected behavior may frustrate and confuse users, leaving them wondering why. Even worse, such wrong decisions could have severe consequences if the car is traveling at highway speed and ultimately crashes into the firetruck. Concerns about the black-box nature of complex DNN models have hampered their further adoption in society, especially in critical decision-making domains such as self-driving cars.

In this dissertation, we investigate three research questions: How can we provide explanations for pre-trained DNN models that offer insight into their decision-making process? How can we use explanations to enhance the generalization ability of DNN models? And how can we employ explanations to promote the fairness of DNN models?

To address the first research question, we explore the explainability of two standard DNN architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We propose a guided feature inversion framework that takes advantage of the deep architecture to provide effective interpretation of CNN models. The proposed framework not only determines the contribution of each feature in the input but also provides insight into the decision-making process of CNN models. By further interacting with the neuron of the target category at the output layer of the CNN, we enforce the interpretation result to be class-discriminative. In addition, we propose a novel attribution method, called REAT, to interpret RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of the words in the input text. This additive decomposition further enables REAT to obtain phrase-level attribution scores. REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results on a series of image and text classification benchmarks demonstrate the faithfulness and interpretability of the two proposed explanation methods.

To address the second research question, we use explainability as a debugging tool to examine the vulnerabilities and failure modes of DNNs, which yields insights that can be used to enhance their generalization ability. We propose CREX, which encourages DNN models to focus on the evidence that actually matters for the task at hand and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training of DNNs with rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, so that the models' local explanations conform with those expert rationales. Furthermore, recent studies indicate that BERT-based natural language understanding models are prone to relying on shortcut features for prediction. We employ explainability-based observations to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this measurement, we propose a shortcut mitigation framework, LTGR, which suppresses the model from making overconfident predictions for samples with a large shortcut degree. Experimental analysis on several text benchmark datasets validates that the CREX and LTGR frameworks effectively increase the generalization ability of DNN models.

For the third research question, explainability-based analysis indicates that DNN models trained with the standard cross-entropy loss tend to capture spurious correlations between fairness-sensitive information in encoder representations and specific class labels. We propose a new mitigation technique, RNF, which achieves fairness by debiasing only the task-specific classification head of the DNN model. To this end, we leverage samples that share a ground-truth label but differ in sensitive attributes, and use their neutralized representations to train the classification head. Experimental results on several benchmark datasets demonstrate that the RNF framework effectively reduces discrimination in DNN models with minimal degradation in task-specific performance. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Deep neural networks [en]
dc.subject: Explainability [en]
dc.subject: Generalization [en]
dc.subject: Fairness [en]
dc.title: Deep Neural Networks Explainability: Algorithms and Applications [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Caverlee, James
dc.contributor.committeeMember: Ji, Shuiwang
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text [en]
dc.date.updated: 2022-05-25T20:31:31Z
local.etdauthor.orcid: 0000-0002-1614-6069
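
The abstract describes REAT as decomposing an RNN's final prediction into additive per-word contributions that can be grouped into phrase-level scores. As a rough illustration of that additive idea only, not the dissertation's actual REAT algorithm, the toy PyTorch sketch below scores each word by the change in the target-class logit after a GRU consumes it; the model, vocabulary, and sentence are made up for demonstration.

import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, hidden, num_classes = 100, 32, 2
emb = nn.Embedding(vocab_size, hidden)         # toy, untrained components
rnn = nn.GRU(hidden, hidden, batch_first=True)
clf = nn.Linear(hidden, num_classes)

tokens = torch.tensor([[5, 17, 42, 8]])        # one made-up sentence (word ids)
target = 1                                     # class whose prediction we explain

with torch.no_grad():
    h = torch.zeros(1, 1, hidden)              # initial hidden state
    prev_logit = clf(h[-1])[0, target]
    word_scores = []
    for t in range(tokens.size(1)):
        _, h = rnn(emb(tokens[:, t:t + 1]), h)           # feed one word
        logit = clf(h[-1])[0, target]
        word_scores.append((logit - prev_logit).item())  # additive contribution of word t
        prev_logit = logit

# The per-word scores sum to the final class logit minus the initial-state logit,
# and contiguous words can be summed to obtain phrase-level scores.
print(word_scores, sum(word_scores))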
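
The abstract also describes the RNF idea of debiasing only the task-specific classification head by training it on neutralized representations of samples that share a label but differ in sensitive attributes. The sketch below is a minimal illustration of that idea under stated assumptions; the encoder, dimensions, pairing scheme, and simple averaging are placeholders, not the dissertation's implementation.

import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # stands in for a pre-trained backbone
head = nn.Linear(64, 2)                                # task-specific classification head
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Toy batch: x_a[i] and x_b[i] are assumed to share ground-truth label y[i] but differ
# in the sensitive attribute, so averaging their representations tends to cancel
# attribute-specific information while preserving task-relevant features.
x_a, x_b = torch.randn(16, 32), torch.randn(16, 32)
y = torch.randint(0, 2, (16,))

with torch.no_grad():                                  # only the head is updated
    h_neutral = 0.5 * (encoder(x_a) + encoder(x_b))

optimizer.zero_grad()
loss = criterion(head(h_neutral), y)                   # train the head on neutralized features
loss.backward()
optimizer.step()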

