Show simple item record

dc.creator: Cheng, Nick Sha
dc.date.accessioned: 2023-11-01T14:26:15Z
dc.date.available: 2023-11-01T14:26:15Z
dc.date.created: 2023-05
dc.date.submitted: May 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/200293
dc.description.abstract: Prediction models can be applied to hospital intensive care units (ICUs) to improve the prediction of adverse patient events, such as mortality, over the duration of a stay. The current field of mortality and length-of-stay prediction in the ICU consists mainly of single-modal models, such as late-fusion models like Shukla and Marlin's Interpolation Network, or Gaussian process models such as Futoma et al.'s multitask Gaussian process model. These models predict patient behavior from a single mode of data, such as physiological time-series data or clinical text notes. However, they cannot leverage inter-modal patterns that draw on each mode where it is strongest, which should allow for improved performance compared to single-modal models. This is especially applicable in a hospital setting, where different modes of time-series data, such as clinical notes and machine output, are gathered as patients are admitted. Multimodal fusion models have been proposed for this context and offer a notable performance improvement over their single-modal counterparts. Through my research, I tested whether adding a pre-training step to multimodal fusion models in a hospital setting improved model performance. A pre-training step allows the model to leverage the large amounts of unlabeled data that hospitals accumulate daily. The unsupervised step is also expected to improve performance, relative to standard supervised multimodal models, when a model is transferred to hospitals with different operating conditions or little labeled data. The pre-training technique used is Variance-Invariance-Covariance Regularization (VICReg), which maintains a minimum desired variance in the learned representations during training to prevent the unsupervised branches from collapsing.
While VICReg is a technique mainly used for self-supervised image recognition networks, it can be applied in this setting because the different modes of data can be treated as augmentations of the same patient condition. After multiple experiments with varying model architectures and VICReg hyperparameters, my results show that VICReg failed to produce any noticeable performance benefit over a baseline multimodal model. Despite this outcome, I still believe that a pre-training step, specifically VICReg, can boost multimodal fusion model performance, and I discuss potential extensions to my current experiments that could produce such a boost. Chiefly, a medical multimodal fusion model may see greater benefits from VICReg pre-training if a larger and deeper model is used and a hyperparameter search over the VICReg coefficients is conducted. My work compares and contrasts the use of a pre-training technique from the image recognition field on a multimodal fusion model from the medical field, with the aim of improving patient care through intelligent systems that can aid workers in the ICU.
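To make the abstract's description of VICReg concrete, here is a minimal NumPy sketch of its loss, following the published variance/invariance/covariance formulation. The coefficient values (`lam`, `mu`, `nu`, `gamma`) and the pairing of `z_a`/`z_b` as two modalities of the same patient are illustrative assumptions, not the thesis's actual implementation details.

```python
import numpy as np

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """VICReg loss on two (batch, dim) batches of embeddings.

    In the multimodal setting described above, z_a and z_b would be
    embeddings of two different data modes for the same patients.
    """
    n, d = z_a.shape

    # Invariance term: mean-squared distance between paired embeddings.
    inv = np.mean((z_a - z_b) ** 2)

    # Variance term: hinge loss keeping each dimension's std above gamma.
    # This is the "minimum desired variance" that prevents collapse.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, gamma - std))
    var = var_term(z_a) + var_term(z_b)

    # Covariance term: penalize off-diagonal covariance so that
    # embedding dimensions decorrelate instead of duplicating.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d
    cov = cov_term(z_a) + cov_term(z_b)

    return lam * inv + mu * var + nu * cov
```

Note how a fully collapsed representation (all embeddings constant) incurs a large variance penalty even though its invariance and covariance terms are zero, which is exactly the failure mode the abstract says VICReg guards against.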
dc.format.mimetype: application/pdf
dc.subject: VICReg
dc.subject: Multimodal Models
dc.subject: Fusion Models
dc.subject: MIMIC-III
dc.title: Multimodal Fusion Models Pretrained with VICReg
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Undergraduate Research Scholars Program
thesis.degree.name: B.S.
thesis.degree.level: Undergraduate
dc.contributor.committeeMember: Mortazavi, Bobak
dc.type.material: text
dc.date.updated: 2023-11-01T14:26:15Z

