Show simple item record

dc.creator: Mathavan, Shri
dc.date.accessioned: 2023-11-01T14:17:36Z
dc.date.available: 2023-11-01T14:17:36Z
dc.date.created: 2023-05
dc.date.submitted: May 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/200282
dc.description.abstract: Large language models (e.g., BERT, GPT) are increasingly integrated into critical fields such as healthcare, where machine learning applications support patient diagnosis, trial-enrollment monitoring and prediction, consumer health question answering, and more. However, these models have yet to be fully trusted: machine learning algorithms are subject to bias arising from the datasets they are trained on, misclassification, and limited sample sizes, and when this bias surfaces in clinical tasks it may exacerbate existing socioeconomic disparities. In this thesis, we propose prompt-based methods for de-biasing clinical natural language processing models. Our method uses prompt design techniques and a variant of beam search to generate the prompts that invoke the most bias in a model; once these prompts are identified, we fine-tune the model with a Jensen-Shannon divergence objective to lower unfairness. In preliminary experiments, we find that the prompt design approach reduces both gender and racial bias in language models such as BERT, RoBERTa, and ALBERT, as well as in the clinical-domain BERT variant SciBERT. This improvement in fairness does not come at the expense of the model's language comprehension, as shown on the GLUE benchmark: once our debiasing method is applied, models on average exhibit less gender and race bias while maintaining their accuracy. We hope to extend this work by exploring tunable prompts, back-propagating model outputs into a soft prompt vector so that, in the end, instead of a de-biased model we would have a prompt prefix that removes bias on its own.
dc.format.mimetype: application/pdf
dc.subject: Prompt Design
dc.subject: Natural Language Processing
dc.subject: Prompts
dc.subject: BERT
dc.subject: Medical
dc.subject: Scientific Models
dc.subject: Clinical Models
dc.subject: Prompting
dc.subject: NLP
dc.subject: ML
dc.subject: Bias
dc.subject: Debiasing
dc.subject: Linguistic Bias
dc.title: Mitigating Linguistic Bias in BERT-Based Medical Diagnosis Models
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Undergraduate Research Scholars Program
thesis.degree.name: B.S.
thesis.degree.level: Undergraduate
dc.contributor.committeeMember: Caverlee, James
dc.type.material: text
dc.date.updated: 2023-11-01T14:17:37Z
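
The abstract describes measuring bias with the Jensen-Shannon divergence between a model's predictions for counterfactual prompt pairs before fine-tuning. The sketch below is a minimal illustration of that measurement step only, assuming the Hugging Face transformers API; the prompt strings and the bert-base-uncased checkpoint are hypothetical examples, not the thesis's actual probes or models.

    # Illustrative sketch (not the thesis code): Jensen-Shannon divergence
    # between a masked LM's predictions for a counterfactual prompt pair.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def mask_token_probs(prompt: str) -> torch.Tensor:
        """Return the model's probability distribution at the [MASK] position."""
        inputs = tokenizer(prompt, return_tensors="pt")
        mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
        with torch.no_grad():
            logits = model(**inputs).logits
        return logits[0, mask_idx].softmax(dim=-1).squeeze(0)

    def jensen_shannon(p: torch.Tensor, q: torch.Tensor) -> float:
        """JSD(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m), with m = (p + q) / 2."""
        m = 0.5 * (p + q)
        kl = lambda a, b: (a * (a.clamp_min(1e-12) / b.clamp_min(1e-12)).log()).sum()
        return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

    # A counterfactual pair differing only in the demographic term
    # (hypothetical probes for illustration):
    p = mask_token_probs("The man was diagnosed with [MASK].")
    q = mask_token_probs("The woman was diagnosed with [MASK].")
    print(f"JSD between prediction distributions: {jensen_shannon(p, q):.4f}")

In the approach the abstract outlines, prompts that maximize such a divergence would be surfaced by the beam-search variant, and fine-tuning would then minimize the divergence on those prompts to reduce unfairness.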

