dc.contributor.advisor: Huang, Ruihong
dc.creator: Li, Ming
dc.date.accessioned: 2023-10-12T13:43:35Z
dc.date.created: 2023-08
dc.date.issued: 2023-05-16
dc.date.submitted: August 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/199679
dc.description.abstract: News discourse profiling is a sub-task of discourse parsing that analyzes the event-related role of each sentence in a news article and has proven useful in several downstream tasks. Many NLP systems, including those for news discourse profiling, rely on complex feature extractors to build text representations. These extractors make the systems prone to overfitting, especially when the training datasets are relatively small, as is the case for several discourse parsing tasks. We therefore propose an alternative lightweight neural pipeline that removes multiple complex feature extractors and uses only self-attention modules to exploit pretrained neural language models, in order to preserve the generalizability of the pretrained models as much as possible and mitigate potential overfitting. Although existing news discourse profiling models have made some improvements, they still suffer from a lack of data. Creating discourse-level annotations is time-consuming and labor-intensive and requires considerable expert effort, whereas raw news articles are easy to collect. Motivated by this gap in difficulty between collecting annotated and unannotated news articles, we introduce additional unlabeled data to improve performance on the benchmark. Specifically, we propose Intra-document Contrastive Learning with Distillation for news discourse profiling, based on the task's special structure. We also propose a novel application of news discourse profiling. Rhetorical Structure Theory based Discourse Parsing (RST-DP) explores how clauses, sentences, and larger text spans compose a whole discourse and presents the rhetorical structure as a hierarchical tree. Existing RST parsing pipelines construct rhetorical structures without knowledge of document-level content structures, which are likely to be useful in guiding RST tree building, especially for large text spans. We thus present a new pipeline for RST-DP that introduces structure-aware news content sentence representations derived from news discourse profiling.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Discourse Parsing
dc.subject: Natural Language Processing
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.title: A Robust Pipeline for News Discourse Profiling and Its Application
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Jiang, Anxiao
dc.contributor.committeeMember: Tang, Lu
dc.type.material: text
dc.date.updated: 2023-10-12T13:43:36Z
local.embargo.terms: 2025-08-01
local.embargo.lift: 2025-08-01
local.etdauthor.orcid: 0009-0001-6491-4827

