dc.contributor.advisor | Mortazavi, Bobak | |
dc.creator | Kovuri, Pranoy | |
dc.date.accessioned | 2019-11-25T22:57:39Z | |
dc.date.available | 2021-08-01T07:33:51Z | |
dc.date.created | 2019-08 | |
dc.date.issued | 2019-07-12 | |
dc.date.submitted | August 2019 | |
dc.identifier.uri | https://hdl.handle.net/1969.1/186570 | |
dc.description.abstract | Information extraction (IE) extracts meaningful knowledge from data. Two important tasks in IE are named entity recognition and relation extraction. Existing approaches to relation extraction treat entity extraction and relation extraction as two separate tasks. They model them as a pipeline and rely on external linguistic resources to improve performance. In contrast, we design a generalized system for end-to-end relation extraction that does not use any external resources. Our approach identifies entities and relations jointly using a single model, and concurrently identifies all relations between all predicted entities. Through this work, we introduce multi-task fine-tuning of pre-trained models as an approach for related tasks and show that it gives significant performance improvements on each of the individual tasks. Our model performs comparably to the state of the art on the BioCreative V Chemical Disease Relation corpus, both in detecting chemicals and diseases and in chemical-induced disease relation F1-score. We outperform the existing state-of-the-art results on nominal relation classification for SemEval-2010 Task 8 with a test F1 of 86.9 (a 2.2-point absolute improvement), without incorporating any external resources or tools. Better information extraction techniques can help identify patient risks more efficiently and will thus be helpful in patient care.
Clinical notes are crucial for predicting events during a patient's hospital stay, since they contain valuable information that correlates with event occurrence. Hence, we study the identification of intensive care unit (ICU) readmission risk for heart disease patients using clinical notes, considering different subsets of these notes but focusing on echocardiography notes. This work builds a representation of the clinical notes and accounts for additional modalities, including time-series vital data and different patient descriptors. We outperform the previous work without the textual modality (baseline: 0.62 AUROC and 0.72 F1) on predicting the ICU readmission clinical event, achieving an AUROC of 0.634 and an F1-score of 0.73. Additionally, we give the clinician a way to visually interpret the text that is important to the model's prediction, using attention scores. | en |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.subject | Information Extraction | en |
dc.subject | Named Entity Recognition | en |
dc.subject | Natural Language Processing | en |
dc.subject | Relation Extraction | en |
dc.subject | End-to-end relation extraction | en |
dc.subject | Electronic Health Record | en |
dc.subject | Intensive care unit | en |
dc.subject | Byte pair encoding | en |
dc.title | END-TO-END RELATION EXTRACTION USING SEMI-SUPERVISED PRE-TRAINING | en |
dc.type | Thesis | en |
thesis.degree.department | Computer Science and Engineering | en |
thesis.degree.discipline | Computer Science | en |
thesis.degree.grantor | Texas A&M University | en |
thesis.degree.name | Master of Science | en |
thesis.degree.level | Masters | en |
dc.contributor.committeeMember | Huang, Ruihong | |
dc.contributor.committeeMember | Qian, Xiaoning | |
dc.type.material | text | en |
dc.date.updated | 2019-11-25T22:57:39Z | |
local.embargo.terms | 2021-08-01 | |
local.etdauthor.orcid | 0000-0002-4621-638X | |