KI3
Communicating Virtual Agents
Nicole Krämer
University of Cologne, Germany
Stefan Kopp
University of Bielefeld, Germany
Part 2: Bases of Multimodal Communication
Kopp & Krämer
KI3: Communicating virtual agents
Overview
I. Introduction
  - Motivation, history, recent developments
  - Evaluation
II. Bases of multimodal communication
  - Channels and functions of multimodal communication
  - Synthetic communicative behaviors, e.g., facial & gestural animation, speech synthesis
III. Modeling conversational behavior
  - Underlying models & architecture
  - Top-down vs. bottom-up
  - Outlook & discussion
We need knowledge about communication when implementing virtual agents that communicate in a human-like fashion:
• Conversational behavior is highly complex. Since the agent is supposed to behave „autonomously“, we need to know the rules that govern this behavior.
• In order to build agents that are accepted and efficient, we need to know about the effects of specific behaviors.
→ Communication research has to provide the bases and rules of communication (fundamental research) as well as evaluate the effects of the agents (applied research).
Problem: Most of the relevant bases and rules are not known yet!
Channels of communication behavior (I)
Communication is enormously complex, mainly because of the variety of different channels and their interdependence.
• Verbal and nonverbal communication (Scherer & Wallbott, 1979), vocal and nonvocal channels (Laver & Hutcheson, 1972)
• „Basic triple structure“ of communication: language, paralanguage and kinesics (Poyatos, 1983)
• Studies show that nonverbal behavior in particular is of crucial importance for communication and person perception (Mehrabian & Ferris, 1967; „snap judgements“, Schneider, Hastorf & Ellsworth, 1979).
Channels of communication behavior (II)
Nonverbal behavior channels (according to Wallbott, 1994):
• Vocal
  - Time-dependent aspects
  - Voice-dependent aspects
  - Continuity-dependent aspects
• Nonvocal
  - Motor channels: facial expression, gestures, gaze, posture
  - Physio-chemical channels: olfactory, tactile, thermal
  - Ecological channels: territory, interpersonal distance, appearance
Further important features
• Dimensional complexity – interdependence with respect to the effects (dependence on various contexts: other channels, interaction partners, situational context)
• Sequential complexity – the time structure is very important (turn-taking, gestures, lip synch)
• Importance of movements and activity (cf. Grammer et al., 1999)
• Subliminal reception and judgment, as well as production, of nonverbal behaviors („communication between limbic systems“, Buck, 1994)
→ It remains an open question whether rules can be found that allow reliable production of the „correct“ behavior
Functions of nonverbal behavior (I)
Modeling functions
Discourse functions
Dialogue functions
Relational functions
All these functions are used in face-to-face (FTF) communication and are therefore expected when a humanoid agent appears on the screen. So they have to be modeled!
Mehrabian (1970), Exline et al. (1975), Frey (1999)
Security presentations in airplanes
Bandura (1977)
Bolinger (1983), McNeill (1992), Chovil (1991)
Duncan (1972)
Cassell et al. (1994), Nagao & Takeuchi (1994)
Cassell et al. (1999); Thórisson (1996)
Functions of nonverbal behavior (II)
• Discourse functions
  - Nonverbal behaviors that are closely related to verbal behavior and can work as complements, supplements, or substitutes of speech
  - Especially gestures, but also facial movements such as eyebrow raising (Chovil, 1991), can serve this function
  - Concerning gesture, Ekman & Friesen (1979; see Efron, 1941) differentiate illustrators and emblems (as well as adaptors, which do not seem to have a discourse function)
  - McNeill (1992) distinguishes iconics, metaphorics, deictics, and beats as different types of spontaneous gestures (→ KW1)
Coverbal gesture
• Coverbal gestures are closely related to speech flow (semantic, pragmatic, and temporal synchrony, McNeill, 1992)
• Speech-gesture synchronization on various levels
  - Gestures co-occur with the rheme (Cassell, 2000)
  - The stroke onset precedes or co-occurs with the most contrastively stressed syllable in speech and covaries with it in time (De Ruiter, 1999; McNeill, 1992; Kendon, 1986)
→ Characteristic spatiotemporal features and kinematic properties
Functions of nonverbal behavior (III)
• Dialogue functions
  - Consist of turn-taking and backchannel signals
  - Serve to guarantee the smooth flow of interaction when exchanging speaker and listener roles
  - Sacks, Schegloff & Jefferson (1974) list verbal and paraverbal regulators; Duncan (1972) finds important nonverbal cues
  - Controversy about the importance of nonverbal cues (Rimé, 1983 vs. Rutter et al., 1979)
Functions of nonverbal behavior (IV)
• Turn-taking signals (cf. Duncan, 1972)
  - Turn-yielding signal – extension of the last syllable or the last stressed syllable, terminal clause, termination of gestures, sociocentric sequences, looking at the interaction partner
  - Speaker-state signal – starting gesticulation, audible breath, turning the head away, (over)loudness
  - Backchannel signal (Yngve, 1970) – nods, paraverbal feedback, short questions, repetitions, sentence completion
  - Turn-keeping signal – gesture (negates turn-yielding signals), increased head movement activity (Donaghy & Goldberg, 1991)
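An agent that has to decide when to take the turn can turn cue lists like these into simple rules. The sketch below is a minimal illustration under assumptions of our own. The cue labels, the `SpeakerState` type and the threshold of two co-occurring cues are hypothetical. Only the rule that ongoing gesticulation overrides all yielding cues follows the turn-keeping bullet above.

```python
from dataclasses import dataclass, field

# Hypothetical cue labels, loosely following the turn-yielding cues listed above.
YIELD_CUES = {
    "final_syllable_drawl",
    "clause_termination",
    "gesture_termination",
    "sociocentric_sequence",
    "gaze_at_partner",
}


@dataclass
class SpeakerState:
    """Cues observed from the current speaker at a possible turn boundary."""
    cues: set = field(default_factory=set)
    gesturing: bool = False  # ongoing gesticulation acts as a turn-keeping signal


def turn_is_yielded(state: SpeakerState, threshold: int = 2) -> bool:
    """Return True if the agent may treat the turn as offered to it.

    Assumption: an ongoing gesture cancels all yielding cues; otherwise the
    turn counts as yielded once at least `threshold` known cues co-occur.
    """
    if state.gesturing:
        return False
    return len(state.cues & YIELD_CUES) >= threshold
```

For example, gaze at the partner plus a completed clause counts as a yield, unless the speaker is still gesturing. A real system would weight cues and consider their timing rather than just counting them.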
Functions of nonverbal behavior (V)
• Relational functions
  - Socio-emotional effects, definition of the relationship, regulation of the emotional climate, impression management
  - Mehrabian (1970; cf. Osgood, 1966) differentiates
    • Evaluation (immediacy cues)
    • Dominance (relaxation cues)
    • Activity, responsiveness
  - Mehrabian tried to find cues for all the different dimensions of nonverbal communication...
Functions of nonverbal behavior (VI)
• Relational functions – Findings
  - Evaluation: gaze, smiling, touch, forward lean, head tilt, short distance, activity (e.g., facial expressiveness)
  - Dominance: turning away, more expansive gestures, leaning backwards, nonreciprocal touch, relaxation cues?
  - Activity/responsiveness: synchrony, relation to increased evaluation
Example of multifunctionality: Eye gaze
• Signals search for information
• Helps to regulate flow of conversation (cf. Duncan, 1972; Kendon, 1967)
• Establishes intimacy (cf. Argyle & Dean, 1967)
• Indicates personality characteristics (social status, culture, etc.) (cf. Exline et al., 1975)
• How can communicative behaviors be generated automatically?
  - Verbal behavior, i.e., speech
  - Facial animation for creating facial displays and lip-synchronized speech
  - Skeletal animation for synthetic gestures
Verbal behaviors
• Spoken utterances with a natural intonation contour (crucial for intelligibility and believability)
→ Text-to-speech synthesis
• Lexical stress and sentence stress determined by word class, syntactic constituency, surface position
• Emphatic stress determined by information structure (rheme vs. theme, Halliday, 1967)
• Contrastive stress or focus, e.g., „I like blue tiles more than green tiles.“ (contrast on the color) vs. „I like blue tiles better than blue wallpaper.“ (contrast on the object)
Emphatic & contrastive stress (= primary stress) → main synchronization points for nonverbal behaviors! (de Ruiter, 1999)
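Since primary stress serves as the main synchronization point, an agent needs some way to locate it in an utterance. The following is a minimal sketch built on a hypothetical heuristic. Every contrastive word takes stress. Otherwise the last content word of the rheme carries emphatic stress. The `Word` type and its flags are assumptions for illustration, and real TTS front ends use considerably richer rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    rheme: bool = False        # part of the new information (rheme)
    contrastive: bool = False  # explicitly contrasted with an alternative
    content: bool = True       # content word (noun, verb, adjective, ...)


def stressed_indices(words: List[Word]) -> List[int]:
    """Return positions of words carrying primary stress (candidate sync points).

    Hypothetical heuristic: all contrastive words are stressed; if there are
    none, the last content word of the rheme receives emphatic stress.
    """
    contrastive = [i for i, w in enumerate(words) if w.contrastive]
    if contrastive:
        return contrastive
    rheme_content = [i for i, w in enumerate(words) if w.rheme and w.content]
    return rheme_content[-1:]  # empty list if there is no rheme content word
```

For the first tile example, marking "blue" and "green" as contrastive yields two sync points. These word positions could then be mapped to syllable timings delivered by the TTS system.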
TTS for multimodality
• TXT2PHO (IKP) and MBROLA (TCTS)
• SABLE tags for additional intonation commands
Processing stages: initialization → planning → phonation
  SABLE-tagged text → TXT2PHO and tag parsing → phonetic text (+ tag information) → manipulation (external commands) → MBROLA → speech
Example input:
  „<SABLE> Drehe <EMPH> die Leiste </EMPH> quer zu <EMPH> der Leiste </EMPH>. </SABLE>“
  (German: "Turn the bar crosswise to the bar.")
Phonetic text (IPA/XSAMPA symbols; per line: phoneme, duration in ms, pitch targets as pairs of position in % and pitch in Hz):
  S 105 18 ...
  P 90 8 153
  a: 104 4 ...
  s 71 28 ...
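The manipulation step between TXT2PHO and MBROLA operates on phonetic text in MBROLA's input format. Each line holds a phoneme symbol, a duration in ms, and optional pairs of a position in % and a pitch in Hz. The following is a minimal sketch of how an <EMPH> region could be realized at that level. The stretch and pitch-raise factors and all function names are hypothetical, and the code does not call the actual TXT2PHO or MBROLA tools.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Phone:
    symbol: str                    # phoneme symbol, e.g. "a:"
    duration_ms: int               # segment duration in milliseconds
    pitch: List[Tuple[int, int]] = field(default_factory=list)  # (position %, F0 Hz)


def parse_pho(text: str) -> List[Phone]:
    """Parse MBROLA-style phonetic text: 'symbol duration [pos pitch]*'."""
    phones = []
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()   # ';' starts a comment
        if not line:
            continue
        sym, dur, *rest = line.split()
        values = [int(v) for v in rest]
        pitch = list(zip(values[0::2], values[1::2]))
        phones.append(Phone(sym, int(dur), pitch))
    return phones


def emphasize(phones: List[Phone], start: int, end: int,
              stretch: float = 1.2, raise_f0: float = 1.15) -> None:
    """Hypothetical emphasis rule: lengthen phones[start:end] and raise their F0."""
    for p in phones[start:end]:
        p.duration_ms = round(p.duration_ms * stretch)
        p.pitch = [(pos, round(f0 * raise_f0)) for pos, f0 in p.pitch]


def to_pho(phones: List[Phone]) -> str:
    """Serialize phones back into phonetic text for the synthesizer."""
    lines = []
    for p in phones:
        targets = " ".join(f"{pos} {f0}" for pos, f0 in p.pitch)
        lines.append(f"{p.symbol} {p.duration_ms} {targets}".rstrip())
    return "\n".join(lines)
```

In such a setup, the tag parser would supply the phone range covered by each <EMPH> element. The modified text would then be handed to the synthesizer.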
Nonverbal behaviors
• Generation requires...
  - A high-level way of specifying movements
  - Accuracy w.r.t. both spatial and temporal features
  - Reproduction of naturalness, lifelikeness, and even subtleties of emotive and individual (personal) expression
→ Computer animation:
  Illusion of movement created by displaying slightly altered pictures in fast succession
→ Translation of behaviors into positions and orientations of visual objects for each frame
Computer animation
→ Critical issue due to the high complexity of both object and movement in nonverbal behaviors
→ Motion control on different levels of abstraction...
  - Direct specification of all motion parameters (e.g., human body > 240 DOFs)
  - Abstract description of movement & automatic generation of low-level parameters
→ Control level hierarchies (dimensions: simplicity of motion specification, naturalness of animation)
Computer animation = modeling + motion control + rendering
Representational animations
• The object's representation itself is subject to the animation
• Soft object animation
  - Animated deformations
  - Facial animation, „cloth animation“, etc.
• Skeletal animation
  - Hierarchical structure of rotational joints connected by rigid links
  - Animation by alteration of joint angles
  - Additional control methods (tissue simulation, cloth animation, etc.) based on the underlying kinematic skeleton
Facial Animation
• Requires a control hierarchy for deforming the highly complex facial geometry (from low to high level):
  - Vertex displacements
  - Face muscle simulation
  - Action encoding: high-level specification of actions performable on the human face:
    • FACS (Ekman & Friesen, 1978): visible facial actions (emotional or conversational) described at muscle level in terms of action units
    • MPA (Kalra et al., 1998): visible features of both facial expressions and visemes (65 MPAs)
Face muscles
• Eleven muscles responsible for facial animation; four major groups: Jaw (A), mouth (B-G), eye (H,I), brow/neck (J,K)
• Fixed mapping from muscle contractions to vertex displacements
• Examples: Levator labii superioris (B), Zygomaticus major (C)
(Fleming & Dobbs, 1999)
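A fixed mapping from muscle contractions to vertex displacements can be realized, for instance, as a linear model. Each muscle stores the displacement of every vertex at full contraction. These fields are scaled by the current contraction level and summed. The following is a minimal sketch under that assumption. The source does not state that the mapping is linear, and the muscle name in the usage is only an example label.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def deform(neutral: List[Vec3],
           muscle_fields: Dict[str, List[Vec3]],
           contractions: Dict[str, float]) -> List[Vec3]:
    """Apply an (assumed) linear muscle-to-vertex mapping.

    neutral:       vertex positions of the neutral face
    muscle_fields: per muscle, the displacement of every vertex at full contraction
    contractions:  per muscle, a contraction level, clamped to [0, 1]
    """
    result = [list(v) for v in neutral]
    for muscle, level in contractions.items():
        level = max(0.0, min(1.0, level))          # clamp to the valid range
        for vert, disp in zip(result, muscle_fields[muscle]):
            for axis in range(3):
                vert[axis] += level * disp[axis]
    return [tuple(v) for v in result]
```

Because the mapping is fixed, the displacement fields can be precomputed once per face model. Animation then reduces to updating a handful of contraction values per frame.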
Vertex displacement
• Movement generation by interpolating target positions (morphing)
• Targets given by, e.g., a set of muscle contractions or visual phonemes
• Straight, weighted, or segmented morphing
(Fleming & Dobbs, 1999)
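Straight morphing interpolates linearly between two shapes. Weighted morphing is read here as blending several targets at once by adding their weighted offsets from the neutral shape. The following is a minimal sketch of both under that interpretation. Segmented morphing is not shown, and the target names in the usage are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def straight_morph(base: List[Vec3], target: List[Vec3], t: float) -> List[Vec3]:
    """Linear interpolation from base (t = 0) to a single target (t = 1)."""
    return [tuple(b + t * (g - b) for b, g in zip(bv, gv))
            for bv, gv in zip(base, target)]


def weighted_morph(base: List[Vec3], targets: Dict[str, List[Vec3]],
                   weights: Dict[str, float]) -> List[Vec3]:
    """Blend several targets by adding their weighted offsets from the base."""
    out = [list(v) for v in base]
    for name, w in weights.items():
        for o, bv, gv in zip(out, base, targets[name]):
            for axis in range(3):
                o[axis] += w * (gv[axis] - bv[axis])
    return [tuple(v) for v in out]
```

Animating `t` or the weights over time produces the movement. Weighted blending allows, e.g., a viseme target and an emotional target to be active simultaneously.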
Speech animation
• Visual phonemes (visemes): mouth positions representing the sounds we hear in speech
• 16 visual phonemes, but reduced sets may be adequate for lip synch
• McGurk effect: heard „ba“ & seen „ga“ → perceived „da“ (McGurk & MacDonald, 1976)
Speech animation
• Creating lip-synchronized speech
  - Determine phonemes and assign visemes
  - Animate visemes based on the articulation of phonemes
  - Coarticulation, e.g., drop phonemes to increase smoothness
• Speech animation + TTS = talking heads
  - Baldi (Massaro et al., 2000)
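The lip-synch steps above can be sketched with the (phoneme, duration) pairs a TTS front end delivers. The viseme table is a hypothetical, heavily reduced set. Dropping short phonemes and merging repeated visemes is only the crudest form of coarticulation handling. The output is a list of keyframes that a morphing stage could interpolate.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily reduced phoneme-to-viseme table (SAMPA-like symbols).
VISEMES: Dict[str, str] = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a:": "open", "a": "open",
    "i:": "spread", "e:": "spread",
    "o:": "round", "u:": "round",
    "_": "rest",
}


def viseme_keys(phones: List[Tuple[str, int]], min_ms: int = 40) -> List[Tuple[int, str]]:
    """Turn (phoneme, duration_ms) pairs into (time_ms, viseme) keyframes.

    Crude coarticulation: phones shorter than `min_ms` and phones without a
    table entry get no key of their own, and consecutive identical visemes
    are merged into one key.
    """
    keys: List[Tuple[int, str]] = []
    t = 0
    for symbol, dur in phones:
        viseme = VISEMES.get(symbol)
        if viseme is not None and dur >= min_ms:
            if not keys or keys[-1][1] != viseme:
                keys.append((t + dur // 2, viseme))   # key at the phone's midpoint
        t += dur
    return keys
```

Placing keys at phone midpoints and letting the interpolation fill the gaps is one simple design choice. Dominance-based coarticulation models would instead blend neighboring visemes continuously.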
Skeletal animation
• Hierarchy of rotational joints connected by rigid links
• Anthropometric modeling, joint limits
• Redundancy (→ DOF problem, IK problem)
→ Various motion control variables (Cartesian, joint angles, elbow swivel, etc.)
Forward kinematics (FK): joint angles in Rⁿ → end-effector position in R³; inverse kinematics (IK): R³ → Rⁿ
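The FK/IK relation can be made concrete with a planar two-link arm (shoulder and elbow only), a deliberately simplified stand-in for a skeleton. FK is a closed-form map from joint angles to the hand position. Even this tiny arm has two IK solutions for most targets. A human arm with more joints than a hand position requires has infinitely many, which is the redundancy problem mentioned above. The function names and the `branch` parameter are hypothetical.

```python
import math
from typing import Tuple


def fk(theta1: float, theta2: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics: joint angles (radians) -> hand position in the plane."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def ik(x: float, y: float, l1: float, l2: float, branch: int = 1) -> Tuple[float, float]:
    """Analytic inverse kinematics for the same two-link arm.

    branch = +1 or -1 selects one of the two elbow configurations that reach
    the same point; raises ValueError if the target is out of reach.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    s2 = branch * math.sqrt(1.0 - c2 * c2)
    theta2 = math.atan2(s2, c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2
```

For real skeletons, closed-form solutions rarely exist. Numerical IK (e.g., Jacobian-based) plus an extra control variable such as the elbow swivel angle is then used to pick one solution.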
Keyframing
• Parametric keyframing: automatic generation of intermediate frames for a given set of keyframes by interpolating joint angles
• Quality of movements depends on the number of keyframes
• Still tedious work to define keyframes in low-level control parameters
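Parametric keyframing in its simplest form interpolates each joint angle linearly between the neighboring keyframes. The following is a minimal sketch. Joint names are hypothetical, and practical systems typically use splines or quaternion interpolation rather than per-angle linear blending.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Pose = Dict[str, float]          # joint name -> angle in degrees
Key = Tuple[float, Pose]         # (time in seconds, pose)


def sample(keys: List[Key], t: float) -> Pose:
    """Linearly interpolate joint angles between the surrounding keyframes.

    `keys` must be sorted by time and every pose must define the same joints.
    Times before the first or after the last key are clamped.
    """
    times = [k[0] for k in keys]
    if t <= times[0]:
        return dict(keys[0][1])
    if t >= times[-1]:
        return dict(keys[-1][1])
    i = bisect_right(times, t)            # keys[i-1].time <= t < keys[i].time
    (t0, p0), (t1, p1) = keys[i - 1], keys[i]
    u = (t - t0) / (t1 - t0)
    return {j: p0[j] + u * (p1[j] - p0[j]) for j in p0}
```

Linear interpolation leaves visible kinks at every key, which illustrates why movement quality depends on the number of keyframes.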
Performance animation
• Motion capture: measuring and recording the movements of an actor directly, for immediate or delayed analysis and playback
• Capture data and map it to a digital character
  - Mechanical: joystick, mouse, data gloves, etc.
  - Optical: at least two cameras, reflective markers
  - Electromagnetic: sensors for tracking keypoints
• High degree of naturalness, but lack of generality & flexibility
Procedural animation
• Motion algorithmically described; calculation of control parameters for given point in time
• Physics-based animation
  - Unconstrained (Newton, Lagrange, etc.) vs. constraint-based methods (constraint forces, spacetime constraints)
  - Forward & inverse dynamics
  - Generation of secondary movements
• Model-based animation
  - Detailed knowledge about the targeted movement
  - Frequently applied for locomotion
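As an illustration of forward dynamics producing secondary movement, a damped spring can make a dependent part (e.g., a loose hand or a piece of clothing) lag behind and overshoot the motion that drives it. The following is a minimal sketch using semi-implicit Euler integration. The unit mass, the stiffness and damping values, the 60 fps step and the function name are arbitrary assumptions rather than parameters from any particular system.

```python
from typing import List


def follow_through(driver: List[float], dt: float = 1 / 60,
                   stiffness: float = 120.0, damping: float = 12.0) -> List[float]:
    """Simulate a damped spring (unit mass) whose rest position tracks a driver.

    Returns the follower position for each frame. With these defaults the
    spring is underdamped, so the follower lags and then overshoots.
    """
    x = driver[0]   # follower position
    v = 0.0         # follower velocity
    out = []
    for target in driver:
        a = stiffness * (target - x) - damping * v   # spring force minus damping
        v += a * dt                                   # semi-implicit Euler: velocity first
        x += v * dt                                   # then position with the new velocity
        out.append(x)
    return out
```

Updating velocity before position keeps this integrator stable at typical frame rates for such stiff-but-damped springs, which is why it is a common choice for cheap secondary motion.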
Real-time requirements
Aspect | Frame-by-frame | Real-time
Surface modeling | No limitations on complexity | Limitations on the number of polygons
Deformations | May be calculated using metaballs, FFD, splines | Require fast transformations, e.g., based on cross-sections
Skeletal animation | Any method may be used | Real-time processing may prevent using expensive methods based on inverse dynamics or control theory
Locomotion | Any model/method may be used: motion capture, kinematics, dynamics, biomechanics | Dynamic models may be too CPU-intensive
Facial animation | Complex models may be used, including muscles with finite elements | Simplified models should be used; limitations on the facial deformations
Clothes | Calculated using mechanical models | Texture mapping
Skin | Model with wrinkles | Texture mapping
Hair | Individual hairs possible | Only a polygonal shape with possible texture may be applied
(Magnenat-Thalmann & Thalmann, 1998)
Gesture animation
• Flexibility, accuracy, and naturalness!
• Two approaches to skeleton motion control:
  - Motion drawn from a database of predefined motions
  - Motion dynamically calculated on demand
• Integration of several motion generators is vital for designing complex motions!
  - hand vs. arm movement
  - gesture stroke vs. retraction
  - emblematic vs. iconic gestures
• In terms of Laban Movement Analysis: „Gestures [...] exist because they have some distinctiveness in their Effort and Shape parameter.“ (Costa et al., 2000)
Gesture animation
• Start from high-level, parameterizable gesture representations
  - Script-based animations, e.g., PaT-Nets (Badler et al., 1993)
  - Feature-based descriptions based on some gesture/movement notation system (Calvert et al., 1982; Lebourque & Gibet, 1999; Kopp & Wachsmuth, 2000)
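Combining a parameterizable gesture description with the synchronization rule from the coverbal gesture slide (the stroke onset precedes or co-occurs with the stressed syllable), one can place the gesture phases on the speech timeline. The following is a minimal sketch. The `GestureSpec` features, the phase durations and the default 0.1 s lead are hypothetical, and the spec does not follow any of the cited notation systems.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GestureSpec:
    """Hypothetical feature-based gesture description (not a real notation system)."""
    handshape: str           # e.g. "flat", "index_point"
    hand_location: str       # e.g. "chest_center"
    movement: str            # e.g. "outward_arc"
    preparation_s: float     # duration of the preparation phase
    stroke_s: float          # duration of the meaningful stroke
    retraction_s: float      # duration of the retraction to rest


def schedule(g: GestureSpec, stress_onset_s: float, lead_s: float = 0.1) -> Dict[str, float]:
    """Return start times (seconds) of the gesture phases.

    The stroke starts `lead_s` seconds before the stressed syllable, so its
    onset precedes (lead_s > 0) or co-occurs with (lead_s == 0) the stress.
    Start times are clamped at 0, which shortens the preparation if the
    stress comes very early.
    """
    stroke_start = max(0.0, stress_onset_s - lead_s)
    prep_start = max(0.0, stroke_start - g.preparation_s)
    stroke_end = stroke_start + g.stroke_s
    return {
        "preparation": prep_start,
        "stroke": stroke_start,
        "retraction": stroke_end,
        "rest": stroke_end + g.retraction_s,
    }
```

The stress onset would come from the TTS timing of the primary stress. A fuller system would also adapt the stroke duration to the length of the co-expressive speech segment.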
Trajectory formation...
...and modulation
Tomorrow...
I. Introduction
  - Motivation, history, recent developments
  - Evaluation
II. Bases of multimodal communication
  - Channels and functions of multimodal communication
  - Synthetic communicative behaviors, e.g., facial & gestural animation, speech synthesis
III. Modeling conversational behavior
  - Underlying models & architecture
  - Top-down vs. bottom-up
  - Outlook & discussion
• Questions? Otherwise...