Developing Sparse Representations for Anchor-Based Voice Conversion

Liberatore, Christopher Bryant

dc.contributor.advisor	Gutierrez-Osuna, Ricardo
dc.creator	Liberatore, Christopher Bryant
dc.date.accessioned	2022-05-25T20:29:15Z
dc.date.available	2022-05-25T20:29:15Z
dc.date.created	2021-12
dc.date.issued	2021-10-07
dc.date.submitted	December 2021
dc.identifier.uri	https://hdl.handle.net/1969.1/196057
dc.description.abstract	Voice conversion is the task of transforming speech from one speaker to sound as if it were produced by another speaker, changing the identity while retaining the linguistic content. There are many methods for performing voice conversion, but these methods often have onerous training requirements or fail in instances where one speaker has a nonnative accent. To address these issues, this dissertation presents and evaluates a novel “anchor-based” representation of speech that separates linguistic content from speaker identity by modeling how speakers form English phonemes. We call the proposed method Sparse, Anchor-Based Representation of Speech (SABR), and explore methods for optimizing the parameters of this model in native-to-native and native-to-nonnative voice conversion contexts. We begin the dissertation by demonstrating, in objective and subjective tests, how sparse coding in combination with a compact, phoneme-based dictionary can be used to separate speaker identity from content. The formulation of the representation then presents several research questions. First, we propose a method for improving the synthesis quality by using the sparse coding residual in combination with a frequency warping algorithm to convert the residual from the source speaker’s space to the target speaker’s space and add it to the target speaker’s estimated spectrum. Experimentally, we find that synthesis quality is significantly improved via this transform. Second, we propose and evaluate two methods for selecting and optimizing SABR anchors in native-to-native and native-to-nonnative voice conversion. We find that the proposed methods significantly improve synthesis quality over baseline algorithms, especially in native-to-nonnative voice conversion. In a detailed analysis of the algorithms, we find they focus on phonemes that are difficult for nonnative speakers of English or that naturally have multiple acoustic states.
Following this, we examine methods for adding temporal constraints to SABR via the Fused Lasso. The proposed method significantly reduces the inter-frame variance in the sparse codes compared with other methods that incorporate temporal features into sparse coding representations. Finally, in a case study, we examine the use of the SABR methods and optimizations in the context of a computer-aided pronunciation training system for building “Golden Speakers”, or ideal models for nonnative speakers of a second language to learn correct pronunciation. Under the hypothesis that the optimal “Golden Speaker” is the learner’s own voice, synthesized with a native accent, we used SABR to build voice models for nonnative speakers and evaluated the resulting synthesis in terms of quality, identity, and accentedness. We found that even when deployed in the field, the SABR method generated synthesis with low accentedness and similar acoustic identity to the target speaker, validating the use of the method for building “Golden Speakers”.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	voice conversion	en
dc.subject	accent conversion	en
dc.subject	sparse coding	en
dc.subject	dictionary learning	en
dc.subject	residual	en
dc.title	Developing Sparse Representations for Anchor-Based Voice Conversion	en
dc.type	Thesis	en
thesis.degree.department	Computer Science and Engineering	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Choe, Yoonsuck
dc.contributor.committeeMember	Shell, Dylan
dc.contributor.committeeMember	Yoon, Byung-Jun
dc.type.material	text	en
dc.date.updated	2022-05-25T20:29:16Z
local.etdauthor.orcid	0000-0002-5871-0596

Files in this item

Name: LIBERATORE-DISSERTATION-2021.pdf
Size: 1.642Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )