Towards Ongoing Detection and Neutralization of Linguistic Bias on Wikipedia

Madanagopal, Karthic

The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.

Date

2023-05-24

Abstract

As the Internet becomes increasingly integrated into people’s daily lives, dealing with information that is presented in a biased way has become a challenging issue. One form of this problem is "subjective bias": the use of biased language to present objective information with an implied proposition or conclusion. Biased language has the potential to manipulate people’s perception of reality and can stir up and intensify social conflicts. Many communication venues, such as encyclopedias and academic writing, strictly require objective writing. Even so, owing to factors such as social-categorical knowledge, writers introduce small linguistic variations into their communication, resulting in subjective writing. To communicate facts effectively, the language used must be simple, objective, and free of stereotypes. Thus, effective methods for detecting and neutralizing biased language are as critical today as the spelling and grammar checkers that have become commonplace over the years. This research investigates three major thrusts centered on detecting and neutralizing subjective bias in text. In the first thrust, we explore automatic approaches to detecting various forms of linguistic bias in Wikipedia text. In particular, we study the potential of a cross-domain pre-training approach that learns evidence of biased statements from multiple sources, which may provide deeper insight into the kinds of subtle bias occurring on Wikipedia. In the second thrust, we study existing bias neutralization methods and develop a novel reinforced sequence training approach that rewrites biased statements to be neutral, fluent, and grammatically correct, using a parallel corpus derived from Wikipedia edit histories. In the final thrust, we propose a method based on cycle-consistent adversarial training to neutralize subjective bias in domains that lack parallel data.
The findings of this study offer important insights into the development of more effective methods for addressing subjective bias in text and contribute to ongoing efforts to promote fairness and equity in communication. We conclude with an analysis of gaps and future directions for research on detecting and neutralizing subjective bias in text.

Citation

Madanagopal, Karthic (2023). Towards Ongoing Detection and Neutralization of Linguistic Bias on Wikipedia. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/199943.