Modeling of Nonverbal Characteristics of Persuasive Speech
Yukiko Hirabayashi*1, Yusuke Fujita*1, Tomoaki Yoshinaga*1 and Yoshinori Kitahara*2
Abstract - With the objective of developing a persuasive voice-interaction system for making presentations to
large groups, we analyzed the nonverbal characteristics, especially the prosody and face motion, of 35 Japanese
speakers and used the results to model persuasive prosody and face motion for the system. Regarding prosody,
the maximum and average voice pitches of the persuasive speakers were higher, and the dynamic range of their
voice pitch was wider, than those of the unpersuasive speakers. In addition, the maximum and average lengths of
the persuasive speakers' silent pauses were longer, and the dynamic range of their silent pause lengths was wider.
Regarding face motion, the persuasive speakers mainly moved their faces from side to side and moved their faces
only sparingly during utterances. We reproduced these nonverbal characteristics of prosody and face motion with
synthesized voice and computer-generated (CG) animation and confirmed that they enhanced the speakers'
persuasiveness.
Keywords: Persuasiveness, Nonverbal, Prosody, Gesture, Multimodal
1.
[1][2]
[3]
[3] CG
[4]
[5]
[5,6]
[6]
Mehrabian
55%
38% 7% [4,7]
*1: Research & Development Group, Hitachi, Ltd.
*2: The Graduate School of Engineering, Tokyo University of Agriculture and Technology
CG
3 3.1
3.2 3.3
4 4.1
4.2
CG
2.
Mehrabian
[8]
LaCrosse
[9] Burgoon 60
[10] Pearce
[11]
[12]
Miller
[13]
[14]
[15]
ICT
Park
[16]
Ramanarayanan
[17].
[21]
[18]
1 1
Kitahara
[19]
[19] Pelachaud
CG
[20] Sumi
[21]
Huang
[22]
CG
3.
3.1
35
1
1
Figure 1 Speaker’s Image
1 20-30
WAV
3.2
3.2.1
35 (20 )125
64
61 2 (A B)
19 20
5
4
5 1
3.5
2.5
4 (
) A 2.8 1.0 3.2 0.9
4.3 0.8 2.1 0.9 B 2.8 1.0 3.3 1.0
4.8 0.5 2.3 0.9 3
1 (4.3 4.8) (Cohen's d) 0.68 1%
13 9
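The effect size quoted above is Cohen's d. A minimal sketch of that computation, assuming the common pooled-standard-deviation form for two independent groups (the exact variant used in the study is not stated in the recoverable text); the function name and the ratings are hypothetical, not data from this study:

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n-1 (sample) denominator
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical 5-point persuasiveness ratings
persuasive = [5, 4, 5, 4, 5]
unpersuasive = [4, 4, 3, 4, 4]
print(round(cohens_d(persuasive, unpersuasive), 2))  # 1.6
```

By the usual rule of thumb, values around 0.5 are read as medium effects and 0.8 or more as large.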
3.2.2
Praat [23] Praat
70-400 Hz
Praat
1
1
(
)
(
) 1
( / )
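The pitch features compared later (maximum, mean, and dynamic range of F0) can be summarized from the contour Praat produces. The sketch below assumes that 70-400 Hz is the pitch floor and ceiling used in the analysis and that the contour has been exported as one value per frame, with 0.0 for unvoiced frames. It reports the dynamic range both in Hz and in semitones because the unit used in the paper is not recoverable here. `f0_stats` and the sample contour are hypothetical:

```python
import math


def f0_stats(f0_hz, floor_hz=70.0, ceiling_hz=400.0):
    """Summarize an F0 contour: maximum, mean, and dynamic range.

    Assumes one value per analysis frame with 0.0 for unvoiced frames;
    only voiced values inside [floor_hz, ceiling_hz] are used.
    """
    voiced = [f for f in f0_hz if floor_hz <= f <= ceiling_hz]
    if not voiced:
        raise ValueError("no voiced frames within the analysis range")
    f_max, f_min = max(voiced), min(voiced)
    return {
        "max_hz": f_max,
        "mean_hz": sum(voiced) / len(voiced),
        "range_hz": f_max - f_min,
        # 12 semitones per doubling of frequency
        "range_st": 12 * math.log2(f_max / f_min),
    }


# Hypothetical contour: two unvoiced frames (0.0) and one spurious value below the floor
contour = [0.0, 120.0, 150.0, 240.0, 0.0, 180.0, 30.0]
print(f0_stats(contour))
```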
3.2.3
(Text-To-Speech, TTS) [24,25] Praat
Audacity [26]
Praat Audacity
3.3
3.3.1
35 119 (20
)
64 55 2 (A B) 3
2
19 20
5
4
5 1
3.5 2.5
4
A 3.4 0.9 2.5 0.7 3.7
0.7 2.2 0.8 B 3.5 0.9 2.7 0.9 3.8
0.9 2.0 0.8 4
15 7
3.3.2
[27,28]
5
3
3
4.
4.1
4.1.1
(a)
( 1(b)) t
t 20 1(b)
(Cohen's d)
0.8
p < 0.01
1%
1
Table 1 A Comparison of Fundamental Frequencies of Persuasive and Unpersuasive Groups.
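The group comparisons in this section use t-tests. A minimal sketch of a two-sample Student's t statistic, assuming equal variances and independent groups (the recoverable text does not say which t-test variant was used). The function name and the F0 values are hypothetical, not the data in Table 1:

```python
import math
from statistics import mean, variance


def students_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance from the two sample variances (n-1 denominators)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical maximum-F0 values in Hz
persuasive = [210, 230, 250, 220, 240]
unpersuasive = [190, 200, 210, 180, 220]
t, df = students_t(persuasive, unpersuasive)
print(f"t({df}) = {t:.2f}")  # t(8) = 3.00
```

For a p-value, SciPy's `scipy.stats.ttest_ind(a, b)` returns the same statistic (equal variances are assumed by default) together with its two-sided p-value.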
4.1.2
2
t
t 20 2
0.8
p > 0.01
[14]
[15] [14]
8.43 7.05 5.47 mora/s ( /
1 1 mora )
11.72 10.75 mora/s [14]
5.73 5.99 mora/s
2
Table 2 A Comparison of Speech Rates of Persuasive and Unpersuasive Groups.
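Speech rates here are given in mora/s, so they depend on counting morae in the transcript. A minimal sketch, assuming the utterance is already transcribed in kana (kanji text would first need conversion to its reading, which is not shown) and that the duration is whatever span the rate is computed over (whether pauses were included is not recoverable here). The function names are hypothetical:

```python
def _is_kana(ch: str) -> bool:
    """True for hiragana, katakana, and the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


# Small kana merge with the preceding kana into one mora (e.g. きょ is 1 mora)
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def count_morae(kana: str) -> int:
    """Count morae in a kana transcription; っ/ッ, ん/ン, and ー each count as one."""
    return sum(1 for ch in kana if _is_kana(ch) and ch not in SMALL_KANA)


def speech_rate(kana: str, duration_s: float) -> float:
    """Speech rate in mora/s over an utterance of the given duration."""
    return count_morae(kana) / duration_s


print(count_morae("きょうは"), count_morae("がっこう"), count_morae("コンピューター"))  # 3 4 6
print(speech_rate("がっこうにいきます", 1.0))  # 9.0
```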
4.1.3
3
t
20 3
p > 0.01
4
t
20 4
0.8
p < 0.01
3
Table 3 A Comparison of Sounding Durations of Persuasive and Unpersuasive Groups.
4
Table 4 A Comparison of Silent Durations of Persuasive and Unpersuasive Groups.
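Tables 3 and 4 compare sounding and silent durations. One common way to measure them is to threshold a frame-level intensity track. The sketch below takes that approach. The threshold (`drop_db` below the loudest frame), the minimum pause length, and the function name are hypothetical, since the criterion used in the study is not recoverable here. Praat offers a comparable silence-detection command, but this is not a reimplementation of it:

```python
from itertools import groupby


def sounding_and_pause_durations(intensity_db, frame_s=0.01, drop_db=25.0, min_pause_s=0.2):
    """Split a frame-level intensity track into sounding and silent intervals.

    A frame is silent when it lies more than drop_db below the loudest frame.
    Leading/trailing silence is ignored, and internal silent runs shorter than
    min_pause_s are absorbed into the surrounding speech.
    Returns (sounding_durations_s, pause_durations_s).
    """
    peak = max(intensity_db)
    silent = [v < peak - drop_db for v in intensity_db]
    # Run-length encode into [is_silent, n_frames] pairs
    runs = [[flag, len(list(grp))] for flag, grp in groupby(silent)]
    while runs and runs[0][0]:
        runs.pop(0)
    while runs and runs[-1][0]:
        runs.pop()
    min_frames = round(min_pause_s / frame_s)
    merged = []
    for flag, n in runs:
        flag = flag and n >= min_frames  # too-short gaps count as speech
        if merged and merged[-1][0] == flag:
            merged[-1][1] += n
        else:
            merged.append([flag, n])
    sounding = [n * frame_s for flag, n in merged if not flag]
    pauses = [n * frame_s for flag, n in merged if flag]
    return sounding, pauses


# Hypothetical intensity track in dB, 0.1 s per frame
track = [40, 70, 70, 70, 40, 40, 40, 70, 70, 40, 70, 40, 40]
sounding, pauses = sounding_and_pause_durations(track, frame_s=0.1)
print(max(pauses), sum(pauses) / len(pauses), max(pauses) - min(pauses))
```

The maximum, mean, and range printed at the end correspond to the kinds of pause statistics summarized in the abstract.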
4.1.4
5
(5%)
1
5% 2
Praat
Audacity
( 2(5) ) 1
TTS
TTS
DP
(TTS (Control) 2(1) ) TTS (Control)
TTS (Control)
TTS (Short silence) ( 2(2) ) TTS (Short silence)
2
Figure 2 Speech waveforms and F0 contours of the TTS voices and the model real voice.
(1) TTS (Control), (2) TTS (Short silence), (3) TTS (Narrow pitch),
(4) TTS (Persuasive), (5) Model real voice.
TTS (Control)
TTS (Narrow pitch) ( 2(3) ) TTS (Narrow pitch)
TTS (Persuasive) ( 2(4) ) TTS (Short silence)
TTS (Narrow pitch)
TTS
2 2(1)-(5)
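The TTS (Short silence) and TTS (Narrow pitch) variants were made by editing the synthesized voice with Praat and Audacity. The exact amounts of editing are not recoverable from this text. The sketch below shows the two edits as parameter transformations on an F0 contour and on a list of pause durations. The scaling factors, the pause floor, and the function names are hypothetical. Applying the edited contour to audio (for example through a Praat Manipulation object) is not shown:

```python
import math


def scale_pitch_range(f0_hz, factor):
    """Scale F0 excursions around the geometric mean of the voiced frames.

    factor < 1 narrows the pitch range, factor > 1 widens it.
    Unvoiced frames (0.0) are passed through unchanged.
    """
    voiced_logs = [math.log(f) for f in f0_hz if f > 0]
    center = sum(voiced_logs) / len(voiced_logs)
    return [math.exp(center + factor * (math.log(f) - center)) if f > 0 else 0.0
            for f in f0_hz]


def scale_pauses(pause_s, factor, floor_s=0.05):
    """Scale silent-pause durations, never going below a minimum length."""
    return [max(floor_s, p * factor) for p in pause_s]


print([round(f, 1) for f in scale_pitch_range([100.0, 0.0, 400.0], 0.5)])  # [141.4, 0.0, 282.8]
print(scale_pauses([1.0, 0.06], 0.5))  # [0.5, 0.05]
```

The scaling is done on a log-frequency scale, so equal musical intervals above and below the center are compressed equally.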
7
4
3.2.1 116
(20 ) 63 52 2
4 5
3
(1) TTS (Control), (2) TTS (Short silence), (3) TTS (Narrow pitch), (4) TTS (Persuasive)
F(3, 460)=106.17, p < 0.001
1%
(Tukey-Kramer) (2) TTS (Short silence) (3) TTS (Narrow pitch)
p < 0.01 (**) (2) (3)
1%
3 TTS
Figure 3 Comparison of persuasiveness of TTS voices.
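The reported F(3, 460) is consistent with a one-way ANOVA over the four TTS conditions with 116 ratings each. That gives 4 × 116 = 464 ratings, so df = 4 - 1 = 3 between conditions and 464 - 4 = 460 within. This reading is inferred from the degrees of freedom, not stated in the recoverable text. The sketch computes the F statistic for independent groups. The function name and the synthetic ratings are hypothetical, and the Tukey-Kramer post hoc test is not reimplemented (statsmodels provides `pairwise_tukeyhsd` for that):

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA for independent groups: returns (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic ratings: 4 voice conditions x 116 raters
ratings = [[(i % 5) + 1 + c * 0.2 for i in range(116)] for c in range(4)]
f_stat, df_b, df_w = one_way_anova(ratings)
print(df_b, df_w)  # 3 460
```

`scipy.stats.f_oneway` computes the same F statistic with a p-value.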
4.1.5
[8,11]
[8,11]
[14]
[15]
4.2
4.2.1
3 3
A
4
Figure 4 Time-series variations of face direction of a persuasive speaker. (a) Horizontal direction, (b) Vertical direction, and (c) Voice waveform.
[Plot: (a) face direction from Left through Front to Right and (b) from Down through Front to Up, both on a -60 to 60 scale; (c) voice waveform; horizontal axis Time (s), 0-40 s.]
4(a) (b)
0
0
(c)
1
(a)(b)(c) A
(a) (c)
5 3 3
(A)-(C)
A B C (D)-(F)
D E F
(A) A 4(a)(b) 3
B C
A
30 10 2.6
10 0.8
D E F
30
0 10 5
[5,8]
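The observation that persuasive speakers moved their faces mainly from side to side can be quantified by comparing how far the face direction travels horizontally versus vertically in time series like those in Figure 4. A minimal sketch, assuming per-frame yaw and pitch angles (the units of the Figure 4 axes are not stated). Total path length is an illustrative measure, not necessarily the one used in the paper, and the function name and values are hypothetical:

```python
def path_lengths(yaw_deg, pitch_deg):
    """Total absolute change in horizontal (yaw) and vertical (pitch) face direction."""
    horizontal = sum(abs(b - a) for a, b in zip(yaw_deg, yaw_deg[1:]))
    vertical = sum(abs(b - a) for a, b in zip(pitch_deg, pitch_deg[1:]))
    return horizontal, vertical


yaw = [0, 10, 0, -10]  # hypothetical left/right angles per frame
pitch = [0, 2, 0, 1]   # hypothetical up/down angles per frame
h, v = path_lengths(yaw, pitch)
print(h, v, h / v)  # 30 5 6.0
```

A ratio well above 1 indicates predominantly side-to-side movement.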
4.2.2
4.2.1 3
(a) (b)
2
5
(A-C) (D-F)
Figure 5 Time-series variations of face directions of (A-C) persuasive and (D-F) unpersuasive speakers.
6
F(t)
0
F(t)
M(t)
S(t)
6
Figure 6 Model of Face Direction and Speech of
Persuasive Speaker.
M(t) S(t)
(1) (M(t)=0)
(S(t)=0)
(2) (M(t)=0)
(S(t)≠0)
(3) (M(t)≠0)
(S(t)=0)
(4) (M(t)≠0)
(S(t)≠0)
(5) L
( L )
(6) L
(L L)
6 /
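Conditions (1)-(4) classify each moment by whether the face is moving, M(t), and whether the speaker is talking, S(t). A minimal sketch of that labelling, assuming M(t) ≠ 0 is derived from the face-direction series F(t) by thresholding the frame-to-frame change. The threshold, its value, and the function name are hypothetical, and the paper's exact definition of M(t) is not recoverable here:

```python
def label_frames(face_deg, speaking, motion_threshold_deg=1.0):
    """Assign each frame to condition (1)-(4) from M(t) and S(t).

    M(t) != 0 is approximated as a frame-to-frame change in face direction
    larger than motion_threshold_deg; S(t) != 0 is the speaking flag.
    (1) still & silent, (2) still & speaking,
    (3) moving & silent, (4) moving & speaking.
    """
    labels = []
    for t, talking in enumerate(speaking):
        delta = face_deg[t] - face_deg[t - 1] if t > 0 else 0.0
        moving = abs(delta) > motion_threshold_deg
        labels.append(1 + int(talking) + 2 * int(moving))
    return labels


face = [0, 0, 5, 5, 10, 10]  # hypothetical face direction per frame
speech = [False, True, True, False, False, True]
labels = label_frames(face, speech)
print(labels)  # [1, 2, 4, 1, 3, 2]
moving_labels = [lab for lab in labels if lab >= 3]
# Share of face movement that happens while speaking
print(moving_labels.count(4) / len(moving_labels))  # 0.5
```

Under the model, a persuasive speaker would show a small share of condition (4) among the moving frames.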
7
S0 S1 S2
S3
M(t) S(t) L 0 t=0
S0 t
Figure 7 Finite automaton of Face Direction and Speech Model.
[Diagram: states S0, S1, S2, and S3, with transitions labeled by conditions (1)-(6) and by predicates on M(t), S(t), and the counter L, including "M(t)=0 (L=L+1)", "M(t)≠0 and S(t)=0", "M(t)≠0 and S(t)≠0", "β<L or M(t)≠0 and L<α", and "M(t)≠0 and S(t)=0 and α≤L≤β (L=0)".]
4.2.3
4.2.2
/ CG
8 CG
Figure 8 Processing flow of CG images with voice.
3
3
6 7 A
A
3D
Poser ®
CG
( 9(a))
CG ( 9(b))
9 CG
(a) (b)
Figure 9 CG images of (a) the face motion model and (b) the face front model.
(1) (2) CG
(20 65 )
3.3.1 5
(
) 10 (1)
(2)
1.28 1%
10 CG
Figure 10 Verification of Face Motion Model by CG Movies.
4.2.4
[8,11,16]
5.
CG
( )
CG
CG
[3]
[1] , :
(HCS) 2014 3
, (9) (2014).
[2] :
(HCS)2014
10 , (1) (2014).
[3] , , , , , P. Reisert, :
2015
( 29 ), 3M3-2in (2015).
[4] :
; (2010).
[5] : ;
(2010).
[6] :
; (2006).
[7] A. Mehrabian: Silent Messages; Wadsworth, Belmont, California (1971).
[8] A. Mehrabian and M. Williams: Nonverbal Concomitants of Perceived and Intended Persuasiveness; Journal of Personality and Social Psychology, 13, 1, pp. 37-58 (1969).
[9] M. B. LaCrosse: Nonverbal Behavior and Perceived Counselor Attractiveness and Persuasiveness; Journal of Counseling Psychology, 22, 6, pp. 563-566 (1975).
[10] J. K. Burgoon, T. Birk, and M. Pfau: Nonverbal Behaviors, Persuasion, and Credibility; Human Communication Research, 17, 1, pp. 140-169 (1990).
[11] W. Pearce and B. Brommel: Vocalic Communication in Persuasion; Quarterly Journal of Speech, 58, 3, pp. 298-306 (1972).
[12] :
; D-II J79-D-II, 12, pp. 2154-2162 (1996).
[13] N. Miller, G. Maruyama, R. J. Beaber, and K. Valone: Speed of Speech and Persuasion; Journal of Personality and Social Psychology, 34, pp. 615 (1976).
[14] :
; 57, pp. 200 (1986).
[15] :
; 8, pp. 65-70 (2008).
[16] S. Park, H. S. Shim, M. Chatterjee, K. Sagae, and L. P. Morency: Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach; ICMI'14, Proceedings of the 16th International Conference on Multimodal Interaction, pp. 50-57 (2014).
[17] V. Ramanarayanan, C. W. Leong, L. Chen, G. Feng, and D. Suendermann-Oeft: Evaluating Speech, Face, Emotion and Body Movement Time-series Features for Automated Multimodal Presentation Scoring; ICMI'15, Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 23-30 (2015).
[18] : Sensitive Agent:
;
2004, pp. 11-18 (2004).
[19] Y. Kitahara and Y. Tohkura: Prosodic Control to Express Emotions for Man-Machine Speech Interaction; IEICE Trans. Fundamentals, E75-A, 2 (1992).
[20] C. Pelachaud: Studies on Gesture Expressivity for a Virtual Agent; Speech Communication, 51, 7, pp. 630-639 (2008).
[21] K. Sumi and R. Ebata: Sensitive Agent: Human Agent Interaction for Learning Service-Minded Communication; The 1st International Conference on Human-Agent Interaction, III-3-3 (2013).
[22] S. Huang and F. Lin: The Design and Evaluation of an Intelligent Sales Agent for Online Persuasion and Negotiation; Electronic Commerce Research and Applications, 6, 3, pp. 285-296 (2007).
[23] http://www.fon.hum.uva.nl/praat/.
[24] : ;
88, 6, pp. 60-65 (2006).
[25] :
; 2-P-4 (2004).
[26] http://audacity.sourceforge.net/.
[27] :
; 2005, pp. 419 (2005).
[28] :
; 15 (MIRU)
IS3-30 (2012).
1990 1992
( )
2002
2012
2015
( )
2003 2005
2015
2002
2004
( )
2015
1979 1981
( )
1986
1989 ( )
2014
1987
1989
2001
( )
(C) Human Interface Society (NPO)