The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
A Robust Pipeline for News Discourse Profiling and Its Application
Abstract
News Discourse Profiling is a sub-task of Discourse Parsing that aims to analyze each sentence’s event-related role and has proven useful in several downstream tasks. Many NLP systems, including news discourse profiling models, build text representations with complex feature extractors. However, these extractors make the systems prone to overfitting, especially when the training datasets are relatively small, as is the case for several discourse parsing tasks. We therefore propose an alternative lightweight neural pipeline that removes multiple complex feature extractors and uses only self-attention modules to exploit pretrained neural language models, so as to preserve the generalizability of the pretrained language models as much as possible and mitigate the potential overfitting problem.
Although existing news discourse profiling models have made some improvements, they still suffer from a lack of data. Creating discourse-level annotations is time-consuming and labor-intensive and requires substantial effort from experts, whereas raw news articles are easy to collect. Motivated by this large gap in difficulty between collecting annotated and unannotated news articles, we aim to introduce more unlabeled data to improve performance on the benchmark. To this end, we propose Intra-document Contrastive Learning with Distillation for news discourse profiling, which is based on the task's special structure.
We also propose a novel application for the news discourse profiling task. Rhetorical Structure Theory based Discourse Parsing (RST-DP) explores how clauses, sentences, and large text spans compose a whole discourse and presents the rhetorical structure as a hierarchical tree. Existing RST parsing pipelines construct rhetorical structures without knowledge of document-level content structures, even though such knowledge is likely to be useful for guiding RST tree building, especially for large text spans. We thus present a new pipeline for RST-DP that introduces structure-aware news content sentence representations derived from news discourse profiling.
Citation
Li, Ming (2023). A Robust Pipeline for News Discourse Profiling and Its Application. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/199883.