
dc.contributor.advisor: Mortazavi, Bobak
dc.creator: Kovuri, Pranoy
dc.date.accessioned: 2019-11-25T22:57:39Z
dc.date.available: 2021-08-01T07:33:51Z
dc.date.created: 2019-08
dc.date.issued: 2019-07-12
dc.date.submitted: August 2019
dc.identifier.uri: https://hdl.handle.net/1969.1/186570
dc.description.abstract: Information extraction (IE) extracts meaningful knowledge from data. Two important tasks in IE are named entity recognition and relation extraction. Existing approaches to relation extraction treat entity and relation extraction as two separate tasks: they model them in a pipeline and rely on external linguistic resources to improve performance. In contrast, we design a generalized system for end-to-end relation extraction that does not use any external resources. Our approach identifies entities and relations jointly using a single model, and concurrently identifies all relations between all predicted entities. Through this work, we introduce multi-task fine-tuning on pre-trained models as an approach for related tasks and show that it gives significant performance improvements for each of the individual tasks. Our model performs comparably to the state of the art on the BioCreative V Chemical Disease Relation corpus, as measured by F1-score, in detecting chemicals and diseases and in detecting chemically induced disease relations. We outperform the existing state-of-the-art results on nominal relation classification for SemEval-2010 Task 8 with a test F1 of 86.9 (a 2.2-point absolute improvement), without incorporating any external resources or tools. Better information extraction techniques can help identify patient risks more efficiently and thus help in patient care. Clinical notes are crucial for predicting events during a patient's hospital stay, since they contain valuable information that correlates with event occurrence. Hence, we study identifying intensive care unit (ICU) readmission risk from clinical notes for heart disease patients, considering different subsets of these notes but focusing on echocardiography notes. This work builds a representation of the clinical notes and accounts for additional modalities, including time-series vital-sign data and different patient descriptors.
We outperform previous work on predicting the ICU readmission clinical event, as measured by AUROC (0.634) and F1-score (0.73), compared with a baseline without the textual modality (0.62 AUROC and 0.72 F1). Additionally, we give the clinician a way to visually interpret the text most important to the model's prediction, using attention scores. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Information Extraction [en]
dc.subject: Named Entity Recognition [en]
dc.subject: Natural Language Processing [en]
dc.subject: Relation Extraction [en]
dc.subject: End-to-end relation extraction [en]
dc.subject: Electronic Health Record [en]
dc.subject: Intensive care unit [en]
dc.subject: Byte pair encoding [en]
dc.title: END-TO-END RELATION EXTRACTION USING SEMI-SUPERVISED PRE-TRAINING [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Master of Science [en]
thesis.degree.level: Masters [en]
dc.contributor.committeeMember: Huang, Ruihong
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text [en]
dc.date.updated: 2019-11-25T22:57:39Z
local.embargo.terms: 2021-08-01
local.etdauthor.orcid: 0000-0002-4621-638X

