dc.contributor.advisor: Jiang, Anxiao
dc.creator: Yu, Xiaojing
dc.date.accessioned: 2023-05-26T18:06:39Z
dc.date.created: 2022-08
dc.date.issued: 2022-07-12
dc.date.submitted: August 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/198003
dc.description.abstract: Generation is a fundamental sub-area of artificial intelligence. Compared with the remarkable progress in image generation, textual data generation still faces many challenges and is far from perfect. This dissertation addresses some key challenges in textual data generation with constraints. We focus on three topics: text-to-label generation, label-to-text generation, and text-to-text generation. For each topic, we discuss the major issues and propose approaches to address them in a specific application. Firstly, we extend open-domain text-to-SQL parsing to the clinical domain and introduce a new task that automatically translates eligibility criteria into SQL queries. To avoid the domain-shift problem, we create a new dataset, Criteria2SQL, of eligibility criteria paired with SQL annotations, and summarize a set of grammar rules. With the designed grammar rules, our proposed semantic parsing model can parse eligibility criteria into both simple SQL statements and domain-specific statements, which significantly improves parsing accuracy. Secondly, training a generation model on a class-imbalanced dataset can lead to tedious and repetitive generated sentences. To tackle this problem, we apply flexible templates to guide neural generation. We propose a novel framework for diversity-aware SQL-to-question generation, which extracts natural templates from cross-domain datasets and constrains the generator to produce diverse and high-quality questions. Evaluation on two large-scale datasets demonstrates the effectiveness of our model in generating both diverse and high-quality sentences. Thirdly, privacy-preserving text generation approaches usually suffer from semantic inconsistency and quality degradation. Considering this limitation, we introduce a new measurement to first evaluate the privacy-quality trade-off limit of a generator, and then present an efficient authorship obfuscation model that rewrites original text into privacy-preserving text with minimum edit cost. Experimental results show that our model improves the upper bound of privacy-quality trade-offs and is adjustable to meet different privacy protection needs.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Semantic Parsing
dc.subject: SQL-to-Question Generation
dc.subject: Privacy-Preserving Text Generation
dc.title: Deep Learning Approaches for Textual Data Generation
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Huang, Ruihong
dc.contributor.committeeMember: Kalantari, Nima
dc.contributor.committeeMember: Qian, Xiaoning
dc.type.material: text
dc.date.updated: 2023-05-26T18:06:40Z
local.embargo.terms: 2024-08-01
local.embargo.lift: 2024-08-01
local.etdauthor.orcid: 0000-0002-0514-6303

