Visual prosody in speech-driven facial animation: elicitation, prediction, and perceptual evaluation

Zavala Chmelicka, Marco Enrique

dc.contributor.advisor	Gutierrez-Osuna, Ricardo
dc.creator	Zavala Chmelicka, Marco Enrique
dc.date.accessioned	2005-08-29T14:41:42Z
dc.date.available	2005-08-29T14:41:42Z
dc.date.created	2005-05
dc.date.issued	2005-08-29
dc.identifier.uri	https://hdl.handle.net/1969.1/2436
dc.description.abstract	Facial animations capable of articulating accurate movements in synchrony with a speech track have become a subject of much research during the past decade. Most of these efforts have focused on the articulation of lip and tongue movements, since these are the primary sources of information in speech reading. However, a wealth of paralinguistic information is implicitly conveyed through visual prosody (e.g., head and eyebrow movements). In contrast with lip and tongue movements, for which the articulation rules are fairly well known (e.g., viseme-phoneme mappings, coarticulation), little is known about the generation of visual prosody. The objective of this thesis is to explore the perceptual contributions of visual prosody in speech-driven facial avatars. Our main hypothesis is that visual prosody driven by the acoustics of the speech signal, as opposed to random or no visual prosody, results in more realistic, coherent, and convincing facial animations. To test this hypothesis, we developed an audio-visual system that captures synchronized speech and facial motion from a speaker using infrared illumination and retro-reflective markers. To elicit natural visual prosody, we designed a story-telling experiment in which actors were shown a short cartoon video and subsequently asked to narrate the episode. From these audio-visual data, four facial animations were generated: one with no visual prosody, one with Perlin noise, one with speech-driven movements, and one with ground-truth movements. Speech-driven movements were predicted from acoustic features of the speech signal (e.g., fundamental frequency and energy) using rule-based heuristics and autoregressive models. A pair-wise perceptual evaluation shows that subjects can clearly discriminate among the four visual prosody animations. It also shows that speech-driven movements and Perlin noise, in that order, approach the performance of veridical motion. 
The results are promising and suggest that speech-driven motion could outperform Perlin noise if more powerful motion prediction models were used. In addition, our results show that exaggeration can bias viewers to perceive the motion of a computer-generated character as more realistic.	en
dc.format.extent	4014463 bytes	en
dc.format.medium	electronic	en
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.publisher	Texas A&M University
dc.subject	Facial Animation	en
dc.subject	Visual Prosody	en
dc.subject	Speech-Driven Facial Animation	en
dc.title	Visual prosody in speech-driven facial animation: elicitation, prediction, and perceptual evaluation	en
dc.type	Book	en
dc.type	Thesis	en
thesis.degree.department	Computer Science	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Master of Science	en
thesis.degree.level	Masters	en
dc.contributor.committeeMember	Bortfeld, Heather
dc.contributor.committeeMember	Amato, Nancy
dc.type.genre	Electronic Thesis	en
dc.type.material	text	en
dc.format.digitalOrigin	born digital	en
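The abstract states that speech-driven head and eyebrow motion was predicted from acoustic features such as fundamental frequency (F0) and energy using autoregressive models, but the record does not give the model form, lag orders, or fitting procedure. As a rough illustration only, one common autoregressive variant is an ARX model (autoregressive with exogenous inputs) for a single motion channel y, such as head pitch: y[t] = a_1 y[t-1] + ... + a_p y[t-p] + sum over lags j < q of (b_j F0[t-j] + c_j E[t-j]) + d. The ARX form, the lag orders p and q, the least-squares fit, and the names fit_arx and simulate_arx are all hypothetical assumptions for this sketch, not the thesis's documented method.

```python
# Hypothetical ARX sketch: NOT the thesis's documented model. It maps per-frame
# acoustic features (e.g. F0, energy) to one motion channel via least squares.
from typing import List, Sequence

Series = Sequence[float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: inputs lack variation")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _regressors(y: Series, feats: Sequence[Series], t: int, p: int, q: int) -> List[float]:
    """Regressor row for frame t: p past motion values, q lags of each feature, bias."""
    row = [y[t - i] for i in range(1, p + 1)]
    for f in feats:
        row.extend(f[t - j] for j in range(q))
    row.append(1.0)
    return row


def fit_arx(y: Series, feats: Sequence[Series], p: int = 2, q: int = 2) -> List[float]:
    """Least-squares fit of one motion channel y on lagged acoustic features."""
    start = max(p, q - 1)
    rows = [_regressors(y, feats, t, p, q) for t in range(start, len(y))]
    targets = [y[t] for t in range(start, len(y))]
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    return _solve(xtx, xty)


def simulate_arx(w: Sequence[float], feats: Sequence[Series], y_init: Series,
                 p: int = 2, q: int = 2) -> List[float]:
    """Free-running prediction: past *predicted* motion is fed back, not ground truth."""
    if len(y_init) < max(p, q - 1):
        raise ValueError("need at least max(p, q - 1) initial motion samples")
    n = len(feats[0])
    y = list(y_init) + [0.0] * (n - len(y_init))
    for t in range(len(y_init), n):
        row = _regressors(y, feats, t, p, q)
        y[t] = sum(wi * xi for wi, xi in zip(w, row))
    return y
```

Running the model in free-running mode, with predictions fed back instead of captured motion, reflects how an avatar would be animated from speech alone when no motion capture is available. Checking the fit on a noise-free synthetic signal generated with known coefficients is a quick way to confirm the regression recovers them before applying it to real F0 and energy tracks.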

Files in this item

Name: etd-tamu-2005A-CPSC-Zavala.pdf
Size: 3.828 MiB
Format: PDF


This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
