Flexible Models for Heterogeneous Biomedical Data
Abstract
With advances in biomedical sensing and data storage, machine learning has been widely applied to healthcare, drawing on abundant data resources. However, real-world biomedical data is inherently heterogeneous, and this heterogeneity has not been comprehensively considered or successfully addressed. It includes varying data distributions, irregularly sampled time-series data, variation in the time domain, and other heterogeneous factors such as uncertain labeling. These types of heterogeneity can occur individually or simultaneously, and one type can trigger another: for instance, when a patient’s health condition changes over time, doctors adjust measurements and treatments, which causes irregular feature sampling. Faced with heterogeneous data, a generalized model may perform decently on average but fail in certain cases, which should not be ignored in the clinic. In addition, when individual models are built for each group of homogeneous data, the training data for each model can become limited, even when the total data size is large. For example, there are a great number of medications, but each of them may not have enough data. The limitations of generalized models and the possible shortage of training data make data heterogeneity a very challenging problem to address. Therefore, flexible models are needed for the various types of heterogeneous biomedical data in real-world applications.
This dissertation investigates data heterogeneity and builds flexible models for biomedical data by focusing on different levels of heterogeneity: different types of heterogeneity occurring individually, multi-source simultaneous heterogeneity, multiple data modalities on the same task, and clinical translation of data heterogeneity. We start by building adaptive models for each individual type of heterogeneity in one kind of biomedical data, time series, and then address the more complex situation of simultaneous heterogeneity. Next, the problem setting is extended from time-series data alone to multiple data modalities, and finally, we introduce a clinical translation model that aims to understand the data heterogeneity. Depending on the heterogeneity in focus for each type of data, transfer learning, adversarial training, and meta-learning techniques are proposed and applied to build adaptive models.
Subject
Machine learning
Data heterogeneity
Flexible models
Meta-learning
Transfer learning
Adversarial training
Biomedical data
Medicine
Citation
Zhang, Lida (2023). Flexible Models for Heterogeneous Biomedical Data. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/199912.