Post on 04-Jan-2016
Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective
Xiaohan Ma, Binh H. Le, and Zhigang Deng
Department of Computer Science, University of Houston
Motivation
Avatars have been increasingly used in human-computer interfaces
– Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.
Human-like avatar gestures influence human perception significantly
– Facial expressions
– Hand gestures
– Lip movements
– Head movements
• One of the crucial visual cues to facilitate engaging social interaction and communication
How do talking head movements affect perception?
Our Quantitative Perspective
Uncover how talking avatar head movements affect human perception
– User-rated naturalness of head animations
– Joint features extracted from head animations (with audio)
• Acoustic speech features
• Head motion patterns
– Quantitatively analyze the association between the extracted joint features and user ratings
[Diagram: talking avatar head animations feed feature extraction (joint features) and user evaluation (perception ratings), followed by analysis of the association between the two]
Data Acquisition and Processing
Acquisition of the audio-head motion dataset
– Head motion and speech were recorded simultaneously
– Head motion: optical motion capture system (120 Hz)
– Speech: microphone (48 kHz)
Processing of the captured audio-head motion dataset
– Head motion: 3 Euler rotation angles per frame
– Speech: pitch and RMS energy
– Head motion and speech data aligned to the same frame rate (24 FPS)
[Figure: head rotation about the X-, Y-, and Z-axes]
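The alignment step can be illustrated with a minimal sketch. The rates (120 Hz, 48 kHz, 24 FPS) come from the slides; everything else is an assumption made for illustration: the function names are hypothetical, block averaging is only one possible way to resample, and the slides do not say how the authors aligned the streams. Pitch extraction is omitted here.

```python
import numpy as np

MOCAP_HZ = 120      # head motion capture rate (from the slides)
AUDIO_HZ = 48_000   # microphone sampling rate (from the slides)
TARGET_FPS = 24     # common frame rate (from the slides)


def downsample_head_motion(euler_angles: np.ndarray) -> np.ndarray:
    """Average blocks of 120 / 24 = 5 mocap samples into one 24 FPS frame.

    euler_angles has shape (n_samples, 3). Block averaging is an assumption,
    and it is only meaningful if the angles do not wrap around (e.g. at 180).
    """
    step = MOCAP_HZ // TARGET_FPS
    n_frames = euler_angles.shape[0] // step
    trimmed = euler_angles[: n_frames * step]      # drop the incomplete tail
    return trimmed.reshape(n_frames, step, 3).mean(axis=1)


def rms_energy_per_frame(audio: np.ndarray) -> np.ndarray:
    """RMS energy over non-overlapping windows of 48000 / 24 = 2000 samples."""
    window = AUDIO_HZ // TARGET_FPS
    n_frames = audio.shape[0] // window
    frames = audio[: n_frames * window].reshape(n_frames, window)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

With both functions applied to one recording, row i of the head motion array and entry i of the energy array describe the same 1/24 s interval.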
Subjective Evaluation
Using the captured dataset, we generated 60 head animation clips
– Based on 15 recorded speech clips
– 4 different audio-head motion generation techniques
– Mosaic on the mouth region
User study
– 18 participants
– Ages: 23~28
– Gender: female (16.67%), male (83.33%)
– Language: fluent English speakers
– User rating: 1~5
Head motion generation techniques
– Original data: play back the captured head motion
– HMMs [Busso et al. 05]
– Mood-Swings [Chuang et al. 05]
– Random: randomly generated
Speech-Head Motion Features and Perception
Measure the correlation between head motion and speech features
– Canonical Correlation Analysis (CCA)
Pitch-head motion and human perception
– Computed Pearson coefficient between CCA and user ratings: 0.731
Energy-head motion and human perception
– Appears random; clearly not linear
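As a rough illustration of the CCA step, the sketch below computes the largest canonical correlation between a block of speech features and the three head rotation angles. It uses the standard QR-plus-SVD formulation of CCA; the function name is hypothetical, and the slides do not specify which CCA implementation or which canonical coefficient the authors used.

```python
import numpy as np


def first_canonical_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Largest canonical correlation between the columns of x and y.

    x: (n_frames, p) speech features (e.g. pitch), y: (n_frames, q) head
    rotation angles. Assumes n_frames > p, q and that both centered
    matrices have full column rank.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of Qx^T Qy are the canonical correlations.
    singular_values = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(singular_values[0], 0.0, 1.0))
```

Calling it with a pitch column of shape (n_frames, 1) and an angle matrix of shape (n_frames, 3) yields one coupling score per clip, which can then be compared against that clip's user rating.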
Speech-Head Motion Features and Perception
Implications for CHI
– Validates the tight coordination between speech and head motion: precise timing in generation is required
• Delayed head movement generation may significantly degrade human perception
– An approximately linear correlation between user ratings and CCA for pitch-head motion
• Prosody-driven head motion synthesis could be fundamentally sound
– No simple linear correlation between user ratings and CCA for RMS energy-head motion
• RMS energy may vary among sentences
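The linear-correlation check between per-clip CCA values and user ratings can be sketched as a plain Pearson coefficient. The numbers below are hypothetical placeholders for illustration only; they are not data from the study, and the helper name is an assumption.

```python
import numpy as np


def pearson_r(a, b) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))


# Hypothetical per-clip values, for illustration only (not from the study).
cca_per_clip = [0.20, 0.35, 0.50, 0.65, 0.80]
mean_rating_per_clip = [2.1, 2.9, 3.0, 3.8, 4.2]
r = pearson_r(cca_per_clip, mean_rating_per_clip)
```

A value of r near 1 indicates an approximately linear relationship; a value near 0, as the slides suggest for RMS energy, indicates no simple linear trend.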
Frequency-Domain Analysis of Head Motion
Frequency-domain analysis of head motion
– Head motion: rotation angles
– Frequency spectrum: FFT applied to the head rotation angle vector
Association between head motion spectrum and human perception
– With squared magnitude less than 5 degrees
Plot axes
– X-axis: average user rating (2.1 ~ 4.2)
– Y-axis: the squared magnitude of the three Euler angles in the head rotation (0 ~ 5 degrees)
– Z-axis: frequency spectrum (0 ~ 19 Hz)
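A minimal sketch of the spectrum computation follows. The sample rate is left as a parameter because the slides do not state which rate was used for the FFT; the reported range of 0 ~ 19 Hz would need more than the 12 Hz Nyquist limit of 24 FPS data, so passing the 120 Hz capture rate is one plausible assumption. The function name, the removal of the mean, and the 1/n normalization are also assumptions.

```python
import numpy as np


def rotation_power_spectrum(euler_angles: np.ndarray, sample_rate_hz: float):
    """Return (frequencies in Hz, squared FFT magnitude summed over 3 angles).

    euler_angles has shape (n_samples, 3), in degrees.
    """
    n = euler_angles.shape[0]
    centered = euler_angles - euler_angles.mean(axis=0)   # remove DC offset
    spectrum = np.fft.rfft(centered, axis=0) / n          # one-sided FFT
    power = np.sum(np.abs(spectrum) ** 2, axis=1)         # combine X, Y, Z
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    return freqs, power
```

Pairing each clip's spectrum with its average user rating gives the three quantities plotted on the slide: rating, squared magnitude, and frequency.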
Frequency-Domain Analysis of Head Motion
Key observations
– Highly rated: low-frequency
• Natural head motion: less than 10 Hz
– Poorly rated: high-frequency
• Typically larger than 12 Hz
• With a small range of head movements
Implications for HCI
– The comfortable head motion frequency zone: 0~12 Hz
– Smoothing as post-processing for generated talking avatar head motion
• Smooth: post-process the synthesized head motions
• Simply crop the high-frequency part from the synthesized head motions
[Figure: examples of low-frequency and high-frequency head motion patterns]
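The "crop the high-frequency part" suggestion can be sketched as a hard low-pass filter in the FFT domain. The 12 Hz cutoff comes from the slides; the function name and the brick-wall approach are assumptions, and the sample rate must be above 24 Hz for the cutoff to have any effect.

```python
import numpy as np


def crop_high_frequencies(angles: np.ndarray, sample_rate_hz: float,
                          cutoff_hz: float = 12.0) -> np.ndarray:
    """Zero every FFT bin above cutoff_hz and transform back to time domain.

    angles has shape (n_samples, 3): synthesized head rotation angles.
    """
    n = angles.shape[0]
    spectrum = np.fft.rfft(angles, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    spectrum[freqs > cutoff_hz] = 0.0          # discard the high-frequency part
    return np.fft.irfft(spectrum, n=n, axis=0)
```

A brick-wall cutoff like this can introduce ringing near abrupt motion changes, so a smoother filter may be preferable in practice.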
Conclusion and Future Work
Summary of our findings
– The coupling between pitch and head motion has a strong linear correlation with human perception
– Head motions perceived as natural consist mainly of low-frequency components; high-frequency components (>12 Hz) significantly degrade human perception
Future work
– Multi-party conversation scenarios
– Analysis of other fundamental speech features: pauses, repetitions, etc.
Acknowledgments: This work is in part supported by NSF IIS-0914965, Texas Norman Hackerman Advanced Research 003652-0058-2007, and research gifts from Google and Nokia.