END-TO-END RELATION EXTRACTION USING SEMI-SUPERVISED PRE-TRAINING

Kovuri, Pranoy

dc.contributor.advisor	Mortazavi, Bobak
dc.creator	Kovuri, Pranoy
dc.date.accessioned	2019-11-25T22:57:39Z
dc.date.available	2021-08-01T07:33:51Z
dc.date.created	2019-08
dc.date.issued	2019-07-12
dc.date.submitted	August 2019
dc.identifier.uri	https://hdl.handle.net/1969.1/186570
dc.description.abstract	Information extraction (IE) extracts meaningful knowledge from data. Two important tasks in IE are named entity recognition and relation extraction. Existing approaches treat entity extraction and relation extraction as two separate tasks, model them as a pipeline, and rely on external linguistic resources to improve performance. In contrast, we design a generalized system for end-to-end relation extraction that uses no external resources. Our approach identifies entities and relations jointly with a single model, concurrently identifying all relations between all predicted entities. Through this work, we introduce multi-task fine-tuning of pre-trained models as an approach for related tasks and show that it gives significant performance improvements on each of the individual tasks. Our model performs comparably to the state of the art on the BioCreative V Chemical Disease Relation corpus, measured by F1-score, in detecting chemicals and diseases and in extracting chemically induced disease relations. We outperform the existing state-of-the-art results on nominal relation classification for SemEval-2010 Task 8 with a test F1 of 86.9 (a 2.2-point absolute improvement), without incorporating any external resources or tools. Better information extraction techniques can help identify patient risks more efficiently and thus support patient care. Clinical notes are crucial for predicting events during a patient's hospital stay because they contain valuable information that correlates with event occurrence. Hence, we study identifying intensive care unit (ICU) readmission risk for heart disease patients using clinical notes, considering different subsets of these notes but focusing on echocardiography notes. This work builds a representation of the clinical notes and accounts for additional modalities, including time-series vital data and different patient descriptors.
We outperform previous work on predicting the ICU readmission clinical event, measured by AUROC (0.634) and F1-score (0.73), relative to a baseline without the textual modality (0.62 AUROC and 0.72 F1). Additionally, we give clinicians a way to visually interpret the text that is important to the model's prediction using attention scores.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Information Extraction	en
dc.subject	Named Entity Recognition	en
dc.subject	Natural Language Processing	en
dc.subject	Relation Extraction	en
dc.subject	End-to-end relation extraction	en
dc.subject	Electronic Health Record	en
dc.subject	Intensive care unit	en
dc.subject	Byte pair encoding	en
dc.title	END-TO-END RELATION EXTRACTION USING SEMI-SUPERVISED PRE-TRAINING	en
dc.type	Thesis	en
thesis.degree.department	Computer Science and Engineering	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Master of Science	en
thesis.degree.level	Masters	en
dc.contributor.committeeMember	Huang, Ruihong
dc.contributor.committeeMember	Qian, Xiaoning
dc.type.material	text	en
dc.date.updated	2019-11-25T22:57:39Z
local.embargo.terms	2021-08-01
local.etdauthor.orcid	0000-0002-4621-638X

Files in this item

Name: KOVURI-THESIS-2019.pdf
Size: 1.398Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
