dc.contributor.advisor: Gutierrez-Osuna, Ricardo
dc.creator: Liberatore, Christopher Bryant
dc.date.accessioned: 2022-05-25T20:29:15Z
dc.date.available: 2022-05-25T20:29:15Z
dc.date.created: 2021-12
dc.date.issued: 2021-10-07
dc.date.submitted: December 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/196057
dc.description.abstract [en]: Voice conversion is the task of transforming speech from one speaker so that it sounds as if it were produced by another speaker, changing the speaker identity while retaining the linguistic content. There are many methods for performing voice conversion, but these methods often have onerous training requirements or fail when one speaker has a nonnative accent. To address these issues, this dissertation presents and evaluates a novel “anchor-based” representation of speech that separates linguistic content from speaker identity by modeling how speakers form English phonemes. We call the proposed method Sparse, Anchor-Based Representation of Speech (SABR), and we explore methods for optimizing the parameters of this model in native-to-native and native-to-nonnative voice conversion. We begin the dissertation by demonstrating, in objective and subjective tests, how sparse coding combined with a compact, phoneme-based dictionary can separate speaker identity from content. This formulation raises several research questions. First, we propose a method for improving synthesis quality that uses a frequency warping algorithm to convert the sparse coding residual from the source speaker’s space to the target speaker’s space, then adds it to the target speaker’s estimated spectrum. Experimentally, we find that this transform significantly improves synthesis quality. Second, we propose and evaluate two methods for selecting and optimizing SABR anchors in native-to-native and native-to-nonnative voice conversion. We find that the proposed methods significantly improve synthesis quality over baseline algorithms, especially in native-to-nonnative voice conversion. A detailed analysis of the algorithms shows that they focus on phonemes that are difficult for nonnative speakers of English or that naturally have multiple acoustic states.
Following this, we examine methods for adding temporal constraints to SABR via the Fused Lasso. The proposed method significantly reduces the inter-frame variance in the sparse codes compared with other methods that incorporate temporal features into sparse coding representations. Finally, in a case study, we examine the use of the SABR methods and optimizations in a computer-aided pronunciation training system for building “Golden Speakers”: ideal models that nonnative speakers of a second language can use to learn correct pronunciation. Under the hypothesis that the optimal “Golden Speaker” is the learner’s voice synthesized with a native accent, we used SABR to build voice models for nonnative speakers and evaluated the resulting synthesis in terms of quality, identity, and accentedness. We found that, even when deployed in the field, the SABR method generated synthesis with low accentedness and acoustic identity similar to the target speaker’s, validating its use for building “Golden Speakers”.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject [en]: voice conversion
dc.subject [en]: accent conversion
dc.subject [en]: sparse coding
dc.subject [en]: dictionary learning
dc.subject [en]: residual
dc.title [en]: Developing Sparse Representations for Anchor-Based Voice Conversion
dc.type [en]: Thesis
thesis.degree.department [en]: Computer Science and Engineering
thesis.degree.discipline [en]: Computer Science
thesis.degree.grantor [en]: Texas A&M University
thesis.degree.name [en]: Doctor of Philosophy
thesis.degree.level [en]: Doctoral
dc.contributor.committeeMember: Choe, Yoonsuck
dc.contributor.committeeMember: Shell, Dylan
dc.contributor.committeeMember: Yoon, Byung-Jun
dc.type.material [en]: text
dc.date.updated: 2022-05-25T20:29:16Z
local.etdauthor.orcid: 0000-0002-5871-0596
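
The abstract's core idea is to code a source frame as a sparse combination of phoneme anchors and then resynthesize it with the target speaker's anchors. The following minimal Python sketch illustrates that idea. It is not the dissertation's implementation. The non-negativity constraint, the projected ISTA solver, the function names (`reconstruct`, `sparse_code`, `convert_frame`), and the `warp_residual` hook, which stands in for the frequency-warping step, are all assumptions made for illustration.

```python
# Minimal sketch of anchor-based sparse-coding voice conversion.
# ASSUMPTIONS (not taken from the dissertation): spectral frames and anchors
# are plain lists of floats, codes are constrained to be non-negative, and
# the L1-regularized least-squares problem is solved with projected ISTA.
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[Vector]  # one row per anchor, e.g. one per English phoneme


def reconstruct(anchors: Matrix, w: Vector) -> Vector:
    """Return the weighted sum of anchor rows, sum_k w[k] * anchors[k]."""
    dim = len(anchors[0])
    return [sum(w[k] * anchors[k][d] for k in range(len(anchors)))
            for d in range(dim)]


def sparse_code(x: Vector, anchors: Matrix, lam: float = 0.01,
                iters: int = 500) -> Vector:
    """Solve min_w 0.5*||x - A^T w||^2 + lam*||w||_1 subject to w >= 0."""
    k, dim = len(anchors), len(x)
    # 1 / ||A||_F^2 is at most 1 / L (L = Lipschitz constant of the gradient),
    # so this fixed step size keeps ISTA stable.
    step = 1.0 / sum(v * v for row in anchors for v in row)
    w = [0.0] * k
    for _ in range(iters):
        approx = reconstruct(anchors, w)
        r = [x[d] - approx[d] for d in range(dim)]
        grad = [-sum(anchors[j][d] * r[d] for d in range(dim)) for j in range(k)]
        # Gradient step followed by the proximal step of lam*||w||_1 + (w >= 0):
        # shift down by step*lam and clip at zero, which zeroes weak anchors.
        w = [max(0.0, w[j] - step * grad[j] - step * lam) for j in range(k)]
    return w


def convert_frame(x: Vector, src_anchors: Matrix, tgt_anchors: Matrix,
                  lam: float = 0.01,
                  warp_residual: Optional[Callable[[Vector], Vector]] = None
                  ) -> Vector:
    """Code x against the source anchors, then decode with the target anchors.

    warp_residual is a hypothetical hook standing in for the frequency-warping
    transform; if given, the warped coding residual is added to the output.
    """
    w = sparse_code(x, src_anchors, lam)
    y = reconstruct(tgt_anchors, w)
    if warp_residual is not None:
        approx = reconstruct(src_anchors, w)
        residual = [xi - ai for xi, ai in zip(x, approx)]
        y = [yi + ri for yi, ri in zip(y, warp_residual(residual))]
    return y
```

Consider a toy case with source anchors [[1, 0], [0, 1]], target anchors [[2, 0], [0, 3]], and lam=0.0. The frame [0.5, 0.2] receives the code [0.5, 0.2] (up to floating-point error) and converts to roughly [1.0, 0.6]. A single code is shared across both dictionaries, so in this sketch the code carries the content while the anchors carry the identity. A larger `lam` drives weakly used anchors to exactly zero.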
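
The abstract adds temporal constraints to SABR through the Fused Lasso. The record does not give the objective. One plausible form, written under stated assumptions, applies the fusion penalty between the codes of consecutive frames. Here $\mathbf{x}_t$ is the $t$-th source frame, $\mathbf{A}$ is the anchor matrix with one anchor per row, $\mathbf{w}_t$ is that frame's code, and the non-negativity constraint is an assumption carried over for illustration:

```latex
\min_{\mathbf{w}_1,\ldots,\mathbf{w}_T \ge 0}\;
\sum_{t=1}^{T} \tfrac{1}{2}\,\bigl\lVert \mathbf{x}_t - \mathbf{A}^{\top}\mathbf{w}_t \bigr\rVert_2^2
\;+\; \lambda_1 \sum_{t=1}^{T} \lVert \mathbf{w}_t \rVert_1
\;+\; \lambda_2 \sum_{t=2}^{T} \lVert \mathbf{w}_t - \mathbf{w}_{t-1} \rVert_1
```

With $\lambda_2 = 0$ this reduces to independent frame-by-frame sparse coding. Increasing $\lambda_2$ favors codes that stay piecewise constant over time, which is the mechanism by which a fused penalty lowers inter-frame variance in the codes.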