Show simple item record

dc.creator: Mathavan, Shri
dc.date.accessioned: 2023-11-01T14:17:36Z
dc.date.available: 2023-11-01T14:17:36Z
dc.date.created: 2023-05
dc.date.submitted: May 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/200282
dc.description.abstract: Large language models (e.g., BERT, GPT) are increasingly integrated into critical fields such as healthcare, where machine learning applications support patient diagnosis, trial-enrollment monitoring and prediction, consumer health question answering, and more. However, these models have yet to be fully trusted: machine learning algorithms are subject to bias arising from the datasets they are trained on, misclassification, and limited sample sizes, and when this bias surfaces in clinical tasks it may exacerbate existing socioeconomic disparities. In this thesis, we propose prompt-based methods for de-biasing clinical natural language processing models. Our method uses prompt design techniques and a variant of beam search to generate the prompts that invoke the most bias in a model; once these prompts are identified, we fine-tune the model with a Jensen-Shannon divergence objective to lower unfairness. In preliminary experiments, we find that the prompt design approach reduces both gender and racial bias in language models such as BERT, RoBERTa, and ALBERT, as well as in the clinical-domain BERT variant SciBERT. This improvement in fairness does not come at the expense of the model's language comprehension, as shown on the GLUE benchmark: once our debiasing method is applied, models on average exhibit less gender and race bias while maintaining their accuracy. We hope to extend this work by exploring tunable prompts, back-propagating model outputs into a soft prompt vector so that, in the end, instead of a de-biased model we would have a prompt prefix that removes bias on its own.
dc.format.mimetype: application/pdf
dc.subject: Prompt Design
dc.subject: Natural Language Processing
dc.subject: Prompts
dc.subject: BERT
dc.subject: Medical
dc.subject: Scientific Models
dc.subject: Clinical Models
dc.subject: Prompting
dc.subject: NLP
dc.subject: ML
dc.subject: Bias
dc.subject: Debiasing
dc.subject: Linguistic Bias
dc.title: Mitigating Linguistic Bias in BERT-Based Medical Diagnosis Models
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Undergraduate Research Scholars Program
thesis.degree.name: B.S.
thesis.degree.level: Undergraduate
dc.contributor.committeeMember: Caverlee, James
dc.type.material: text
dc.date.updated: 2023-11-01T14:17:37Z
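
The abstract describes measuring bias with the Jensen-Shannon divergence between a model's predictions for counterfactual prompt pairs before fine-tuning. The sketch below is a minimal illustration of that measurement step only, assuming the Hugging Face transformers API; the prompt strings and the bert-base-uncased checkpoint are hypothetical examples, not the thesis's actual probes or models.

    # Illustrative sketch (not the thesis code): Jensen-Shannon divergence
    # between a masked LM's predictions for a counterfactual prompt pair.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def mask_token_probs(prompt: str) -> torch.Tensor:
        """Return the model's probability distribution at the [MASK] position."""
        inputs = tokenizer(prompt, return_tensors="pt")
        mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
        with torch.no_grad():
            logits = model(**inputs).logits
        return logits[0, mask_idx].softmax(dim=-1).squeeze(0)

    def jensen_shannon(p: torch.Tensor, q: torch.Tensor) -> float:
        """JSD(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m), with m = (p + q) / 2."""
        m = 0.5 * (p + q)
        kl = lambda a, b: (a * (a.clamp_min(1e-12) / b.clamp_min(1e-12)).log()).sum()
        return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

    # A counterfactual pair differing only in the demographic term
    # (hypothetical probes for illustration):
    p = mask_token_probs("The man was diagnosed with [MASK].")
    q = mask_token_probs("The woman was diagnosed with [MASK].")
    print(f"JSD between prediction distributions: {jensen_shannon(p, q):.4f}")

In the approach the abstract outlines, prompts that maximize such a divergence would be surfaced by the beam-search variant, and fine-tuning would then minimize the divergence on those prompts to reduce unfairness.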

