A Robust Pipeline for News Discourse Profiling and Its Application

Li, Ming

The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.


dc.contributor.advisor	Huang, Ruihong
dc.creator	Li, Ming
dc.date.accessioned	2023-10-12T13:43:35Z
dc.date.created	2023-08
dc.date.issued	2023-05-16
dc.date.submitted	August 2023
dc.identifier.uri	https://hdl.handle.net/1969.1/199679
dc.description.abstract	News Discourse Profiling is a sub-task of Discourse Parsing that aims to identify the event-related role of each sentence in a news article, and it has been shown to be useful in several downstream tasks. Complex feature extractors are widely used to build text representations in many NLP systems, including those for news discourse profiling. However, these complex feature extractors make NLP systems prone to overfitting, especially when the training datasets are relatively small, as is the case for several discourse parsing tasks. We therefore propose an alternative lightweight neural pipeline that removes multiple complex feature extractors and uses only self-attention modules on top of pretrained neural language models. This design preserves as much of the pretrained models' generalizability as possible and mitigates potential overfitting. Although existing news discourse profiling models have made some improvements, they still suffer from a lack of data. Creating discourse-level annotations is time-consuming and labor-intensive and requires considerable expert effort, whereas raw news articles are easy to collect. Motivated by this large gap in difficulty between collecting annotated and unannotated news articles, we aim to use additional unlabeled data to improve performance on the benchmark. To this end, we propose Intra-document Contrastive Learning with Distillation for news discourse profiling, which builds on the task's special structure. In addition, we propose a novel application of the news discourse profiling task. Rhetorical Structure Theory based Discourse Parsing (RST-DP) explores how clauses, sentences, and large text spans compose a whole discourse, and represents the rhetorical structure as a hierarchical tree. Existing RST parsing pipelines construct rhetorical structures without knowledge of document-level content structure, even though such knowledge is likely to be useful for guiding RST tree building, especially for large text spans.
We thus present a new pipeline for RST-DP that introduces structure-aware news content sentence representations derived from news discourse profiling.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Discourse Parsing
dc.subject	Natural Language Processing
dc.subject	Deep Learning
dc.subject	Machine Learning
dc.title	A Robust Pipeline for News Discourse Profiling and Its Application
dc.type	Thesis
thesis.degree.department	Computer Science and Engineering
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Master of Science
thesis.degree.level	Masters
dc.contributor.committeeMember	Jiang, Anxiao
dc.contributor.committeeMember	Tang, Lu
dc.type.material	text
dc.date.updated	2023-10-12T13:43:36Z
local.embargo.terms	2025-08-01
local.embargo.lift	2025-08-01
local.etdauthor.orcid	0009-0001-6491-4827

Files in this item

Name: LI-THESIS-2023.pdf
Size: 1.463Mb
Format: PDF


This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
