Mitigating Linguistic Bias in BERT-Based Medical Diagnosis Models
Abstract
Large language models (e.g., BERT, GPT) are increasingly being integrated into critical fields like healthcare. Machine learning has already been applied to patient diagnosis, monitoring and predicting trial enrollment, consumer health and question answering, and more. However, these models have yet to be fully trusted. Machine learning algorithms are subject to bias arising from the datasets they are trained on, misclassification, and sample sizes. When this bias appears in clinical tasks, it may exacerbate existing socioeconomic disparities. In this thesis, we propose prompt-based methods for debiasing clinical natural language processing models. The method uses prompt design and a variant of beam search to generate the prompts that elicit the most bias from our models. Once we identify these prompts, we fine-tune the models using Jensen-Shannon divergence to reduce unfairness. In our preliminary experiments, we find that the prompt design approach reduces both gender and racial bias in language models such as BERT, RoBERTa, and ALBERT, as well as in the clinical BERT model SciBERT. Additionally, this improvement in fairness does not come at the expense of the model’s language comprehension, as measured on the GLUE benchmark. In summary, we find that once our debiasing method is applied, models on average exhibit less gender and racial bias while maintaining their accuracy. We hope to further this work by exploring tunable prompts, in which model outputs are back-propagated into a soft prompt vector. The end result would be a prompt prefix that removes bias on its own, rather than a debiased model.
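As an illustration of the Jensen-Shannon divergence mentioned in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes the divergence is computed between two probability distributions that a model predicts for the same prompt when only a demographic term is changed. The abstract does not spell out this setup, and the function names and example probabilities are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    It is symmetric, equals 0 when p == q, and is bounded above by ln(2).
    """
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical predicted distributions over three candidate tokens for the
# same prompt filled with two different demographic terms (made-up numbers).
probs_group_a = [0.70, 0.20, 0.10]
probs_group_b = [0.40, 0.35, 0.25]

bias_score = js_divergence(probs_group_a, probs_group_b)
```

Under this assumption, a larger `bias_score` means the model's predictions depend more strongly on the demographic term, so a fine-tuning objective could penalize it.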
Subject
Prompt Design
Natural Language Processing
Prompts
BERT
Medical
Scientific Models
Clinical Models
Prompting
NLP
ML
Bias
Debiasing
Linguistic Bias
Citation
Mathavan, Shri (2023). Mitigating Linguistic Bias in BERT-Based Medical Diagnosis Models. Undergraduate Research Scholars Program. Available electronically from https://hdl.handle.net/1969.1/200282.