dc.contributor.advisor: Gutierrez-Osuna, Ricardo
dc.creator: Quamer, Waris
dc.date.accessioned: 2023-10-12T14:51:54Z
dc.date.created: 2023-08
dc.date.issued: 2023-08-02
dc.date.submitted: August 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/200047
dc.description.abstract: Pronunciation learning is a significant aspect of second language (L2) acquisition, particularly for older learners, who often struggle to acquire a native-like accent. While vocabulary, grammar, and writing skills can be developed well into adulthood, achieving native-like pronunciation becomes challenging due to the neuromuscular nature of speech production. Research suggests that L2 learners can improve their pronunciation by imitating a model voice that closely resembles their own, or their own voice transformed to exhibit native-like characteristics. Foreign accent conversion (FAC) techniques aim to transform utterances from non-native speakers so that they sound native-like. Previous FAC approaches have three major limitations. First, most methods required a reference native-speaker (L1) utterance during synthesis, limiting the conversion system to sentences prerecorded by the L1 speaker. Second, early methods were dedicated one-to-one systems, which had to be trained for each pair of L1 and L2 speakers and/or required a considerable amount of data for each new speaker. Finally, none of the previous FAC techniques could disentangle the two main sources of non-native accent: segmental and prosodic characteristics. Being able to manipulate an L2 speaker's segmental and/or prosodic characteristics independently is critical for quantifying how these two channels contribute to speech comprehensibility and social attitudes. To address the first and second problems, I propose a new FAC system that can transform L2 speech directly from previously unseen speakers. The system consists of two independent modules, a translator and a synthesizer, which operate on bottleneck features derived from phonetic posteriorgrams. The translator is trained to map the bottleneck features of L2 utterances onto those of a parallel L1 utterance.
The synthesizer is a many-to-many system that maps input bottleneck features into the corresponding Mel-spectrograms, conditioned on an embedding from the L2 speaker. During inference, both modules operate in sequence to take an unseen L2 utterance and generate a native-accented Mel-spectrogram. The proposed system achieved a large reduction (67%) in non-native accentedness while retaining the voice identity of the L2 speaker. To address the third problem, I propose an FAC system that further decomposes an accent into its segmental and prosodic characteristics and provides independent control of both channels. The system uses conventional modules (acoustic model, speaker/prosody encoders, seq2seq model) to generate accent conversions that combine (1) the segmental characteristics from a source utterance, (2) the voice characteristics from a target utterance, and (3) the prosody of a reference utterance. Both objective and perceptual measures show that the system was able to transfer prosody effectively and to improve the transfer of voice identity. Additionally, I show that the proposed system is suitable for studying the relative contributions of various aspects of non-native speech (i.e., voice quality, segmental characteristics, and prosody) to the perception of speech intelligibility.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: accent-conversion
dc.subject: voice-conversion
dc.subject: zero-shot learning
dc.subject: speech synthesis
dc.title: Improving Foreign Accent Conversion with Zero-Shot Learning and Explicit Prosody Modeling
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Chaspari, Theodora
dc.contributor.committeeMember: Braga-Neto, Ulisses
dc.type.material: text
dc.date.updated: 2023-10-12T14:51:54Z
local.embargo.terms: 2025-08-01
local.embargo.lift: 2025-08-01
local.etdauthor.orcid: 0000-0001-9087-0242
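
The abstract describes two architectures in terms of data flow: a translator followed by a synthesizer for zero-shot conversion, and a seq2seq model fed by three separately encoded inputs for prosody control. The Python below is only a minimal sketch of that data flow under stated assumptions. Every callable (`extract_bnf`, `translator`, `speaker_encoder`, `prosody_encoder`, `synthesizer`, `seq2seq`), both class names, the signatures, and the use of a single utterance-level prosody embedding are hypothetical placeholders for trained networks, not the thesis implementation. The toy lambdas at the end only check that each input travels along the path the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]  # one feature vector per analysis frame


@dataclass
class ZeroShotFAC:
    """Hypothetical translator -> synthesizer pipeline (first system).

    The callables stand in for trained networks; names and signatures are
    illustrative assumptions, not code from the thesis.
    """
    extract_bnf: Callable[[Sequence[Frame]], List[Frame]]     # speech -> PPG-derived bottleneck features
    translator: Callable[[List[Frame]], List[Frame]]          # L2 bottleneck features -> L1-like ones
    speaker_encoder: Callable[[Sequence[Frame]], Frame]       # utterance -> fixed-size speaker embedding
    synthesizer: Callable[[List[Frame], Frame], List[Frame]]  # (features, embedding) -> Mel frames

    def convert(self, l2_utterance: Sequence[Frame]) -> List[Frame]:
        # Only the L2 utterance is used at inference: no L1 reference recording.
        bnf_native_like = self.translator(self.extract_bnf(l2_utterance))
        # The embedding comes from the unseen speaker's own speech, so no
        # per-speaker training is needed (the zero-shot setting).
        embedding = self.speaker_encoder(l2_utterance)
        return self.synthesizer(bnf_native_like, embedding)


@dataclass
class ProsodyControlledFAC:
    """Hypothetical three-input conversion (second system)."""
    extract_bnf: Callable[[Sequence[Frame]], List[Frame]]
    speaker_encoder: Callable[[Sequence[Frame]], Frame]
    prosody_encoder: Callable[[Sequence[Frame]], Frame]  # assumed utterance-level embedding
    seq2seq: Callable[[List[Frame], Frame, Frame], List[Frame]]

    def convert(self, source: Sequence[Frame], target: Sequence[Frame],
                reference: Sequence[Frame]) -> List[Frame]:
        return self.seq2seq(
            self.extract_bnf(source),         # (1) segmental characteristics
            self.speaker_encoder(target),     # (2) voice characteristics
            self.prosody_encoder(reference),  # (3) prosody
        )


def mean_frame(utt: Sequence[Frame]) -> Frame:
    """Toy 'encoder': average each feature dimension over all frames."""
    return [sum(col) / len(utt) for col in zip(*utt)]


# Toy stand-ins that only exercise the data flow (no real models involved).
fac = ZeroShotFAC(
    extract_bnf=lambda utt: [f[:2] for f in utt],
    translator=lambda bnf: [[v + 1.0 for v in f] for f in bnf],
    speaker_encoder=mean_frame,
    synthesizer=lambda bnf, emb: [f + emb for f in bnf],  # concatenate per frame
)
mel = fac.convert([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```

The sketch makes two design points explicit: the first pipeline never receives an L1 utterance at inference time, and the second keeps source, target, and reference on separate paths, so any one of them can be swapped without touching the other two.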