Show simple item record

dc.contributor.advisor	Caverlee, James
dc.creator	He, Yun
dc.date.accessioned	2023-05-26T17:32:04Z
dc.date.available	2023-05-26T17:32:04Z
dc.date.created	2022-08
dc.date.issued	2022-05-19
dc.date.submitted	August 2022
dc.identifier.uri	https://hdl.handle.net/1969.1/197762
dc.description.abstract	Machine learning plays a significant role in powering artificial intelligence advances in many areas such as natural language processing and personalized recommendation, aiming to build models that fit labeled training data and then predict on held-out test data. A key challenge for these machine learning models is the imbalance between scarce labeled data and continuously growing model capacity. On the one hand, the labeled data for many tasks is scarce because human annotation is expensive, which is especially true for specialized domains such as biomedicine. On the other hand, model capacity has grown continuously over the last decade, with parameters ranging from millions to billions. Without enough labeled data, such large-scale models may overfit on low-resource tasks, resulting in performance deterioration. Recently, many works have demonstrated that transferring useful knowledge from pre-training stages or from jointly trained related tasks to the target task can alleviate the label scarcity problem and significantly boost the performance of the target task. Despite the progress achieved in recent work, many challenges and open problems remain to be explored for knowledge transfer. First, transferring domain-specific knowledge from pre-training stages to large-scale language models remains under-explored, which limits natural language understanding performance in the corresponding domains. Second, training multiple tasks jointly can hinder performance on individual tasks, which is more serious in transformer-based multi-task co-training because all tasks share a single set of parameters. Third, transferring knowledge from the source may have a negative impact on the target learner, leading to worse results than training the target task alone. To overcome these challenges, this dissertation makes three contributions:
• To transfer disease knowledge and enhance BERT-like language models on health-related tasks, we propose a new pre-training procedure named disease knowledge infusion, which efficiently exploits the self-supervised learning signals of Wikipedia pages.
• The second contribution is a novel method named HyperPrompt that utilizes HyperNetworks to generate task-conditioned prompts for multi-task learning, where task-specific knowledge can be flexibly shared via the HyperNetworks.
• To alleviate the negative transfer problem from the perspective of gradient magnitudes, we propose a novel algorithm named MetaBalance that dynamically and adaptively balances the gradients of auxiliary tasks to better assist the target task.
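A minimal illustrative sketch of the gradient-magnitude balancing idea described in the last contribution, written in PyTorch. The function name, the relax factor, and the simple norm-ratio rescaling are assumptions made here for illustration only; the dissertation itself should be consulted for the exact MetaBalance algorithm.

import torch

def balance_auxiliary_gradients(shared_params, target_loss, aux_losses, relax=0.7):
    # Hypothetical sketch: adapt auxiliary-task gradient magnitudes toward the
    # target task's gradient magnitude on the shared parameters.
    target_grads = torch.autograd.grad(target_loss, shared_params, retain_graph=True)
    combined = [g.clone() for g in target_grads]
    for aux_loss in aux_losses:
        aux_grads = torch.autograd.grad(aux_loss, shared_params, retain_graph=True)
        for i, (g_tgt, g_aux) in enumerate(zip(target_grads, aux_grads)):
            # Rescale the auxiliary gradient toward the target gradient's norm;
            # `relax` (an assumed hyperparameter) controls how strongly it is adapted.
            ratio = g_tgt.norm() / (g_aux.norm() + 1e-12)
            combined[i] = combined[i] + relax * ratio * g_aux + (1.0 - relax) * g_aux
    # Write the balanced gradients back so a standard optimizer step can use them.
    for p, g in zip(shared_params, combined):
        p.grad = g

In a training loop, such a routine would be called in place of a single loss.backward() over the summed losses, followed by the usual optimizer.step() on the shared parameters.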
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Knowledge Transfer
dc.subject	Multi-task Learning
dc.subject	Pre-training
dc.subject	Transformer
dc.title	Intelligent Knowledge Transfer for Multi-Stage and Multi-Task Learning
dc.type	Thesis
thesis.degree.department	Computer Science and Engineering
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Doctor of Philosophy
thesis.degree.level	Doctoral
dc.contributor.committeeMember	Hu, Xia
dc.contributor.committeeMember	Huang, Ruihong
dc.contributor.committeeMember	Mortazavi, Bobak
dc.contributor.committeeMember	Shen, Yang
dc.type.material	text
dc.date.updated	2023-05-26T17:32:05Z
local.etdauthor.orcid	0000-0001-9462-4583

