
Towards a Socially Adaptive Virtual Agent

Atef Ben Youssef1, Mathieu Chollet2, Hazaël Jones3, Nicolas Sabouret1, Catherine Pelachaud2, and Magalie Ochs4

1 University Paris-Sud, LIMSI-CNRS UPR 3251
2 CNRS LTCI, Telecom-ParisTech
3 SupAgro, UMR ITAP
4 University Aix-Marseille, LSIS, DiMag

{atef.ben_youssef,nicolas.sabouret}@limsi.fr,

[email protected],

{mathieu.chollet,catherine.pelachaud}@telecom-paristech.fr,

[email protected]

Abstract. This paper presents a socially adaptive virtual agent that can adapt its behaviour according to social constructs (e.g. attitude, relationship) that are updated depending on the behaviour of its interlocutor. We consider the context of job interviews, with the virtual agent playing the role of the recruiter. The evaluation of our approach is based on a comparison of the socially adaptive agent to a simple scripted agent and to an emotionally reactive one. Videos of these three different agents in situation were created and evaluated by 83 participants. This subjective evaluation shows that the simulation and expression of social attitude is perceived by the users and impacts the evaluation of the agent's credibility. We also found that while the emotion expression of the virtual agent has an immediate impact on the user's experience, the impact of the virtual agent's attitude expression is stronger after a few speaking turns.

Keywords: Social Attitudes, Emotions, Affective Computing, Virtual Agent, Non-verbal Behaviour, Adaptation

1 Introduction

Can you imagine a conversation where your interlocutors never change their behaviour, or do not react to your own behaviour? Intelligent virtual agents have the ability to simulate and express affects in dyadic interactions. However, connecting these expressions with the interlocutor's verbal and non-verbal behaviour remains an open challenge. This results in what we can call "scripted" agents, whose behaviour does not change as a function of the human user's reactions. Our research motivation is that virtual agents should adapt their affective behaviour to the users' behaviour in order to be credible and to be able to build a relationship with them [32].

A first step in that direction is to build reactive agents that adapt their emotions according to their perceptions. A second step is to also consider long-term affects such as moods and attitudes. The expression of attitudes by the virtual character allows the human to build a social and affective relationship with the character during the interaction [32], because the character's behaviour does not consist only of emotional reactions, but also changes depending on the previous behaviours of the human and the previous reactions of the character to those behaviours.

Social attitude can be defined as an affective style that spontaneously develops or is strategically employed in the interaction with a person or a group of persons, colouring the interpersonal exchange in that situation (e.g. being polite, distant, cold, warm, supportive, contemptuous). We propose in this paper an architecture for Socially Adaptive Agents, i.e. virtual agents that can adapt their behaviour according to social attitudes that they continuously update as the conversation progresses. More precisely, we propose to build agents that can perceive the non-verbal behaviour of their human interlocutor, build social constructs and adapt their non-verbal behaviour accordingly.

We apply this work to the domain of job interview simulation, and evaluate our model to measure whether it leads to more credible and natural agents.

2 Related work and theoretical background

Our work relates to four different domains, as it involves the computation of an agent's attitudes in real time, with social adaptation, in the context of training.

Attitudes: A common representation for attitudes is Argyle's Interpersonal Circumplex [3], a bi-dimensional representation with a Liking dimension and a Dominance dimension. Most modalities are involved in the expression of attitudes: gaze, head orientation, postures, facial expressions, gestures, head movements [9, 6]. For instance, wider gestures are signs of a dominant attitude [9], while smiles are signs of friendliness [6]. Models of attitude expression for embodied conversational agents have been proposed in recent years. Ballin et al. propose a model that adapts posture, gesture and gaze behaviour depending on the agents' social relations [4]. Cafaro et al. study how users perceive attitudes and personality in the first seconds of their encounter with an agent, depending on the agent's behaviour [8]. Ravenet et al. propose a model for expressing an attitude at the same time as a communicative intention [27].

Social training background: The idea of using virtual characters for training has gained much attention in recent years: an early example is the pedagogical agent Steve [18], which was limited to demonstrating skills to students and answering questions. Since then, virtual characters have also been used for training social skills such as public speaking [12] or job interviews, for instance with the MACH system [15]. However, while the recruiter used in MACH can mimic behaviour and display back-channels, it does not reason on the user's affects and does not adapt to them.

Agents with real-time perception of and reaction to affects: Several agent models have proposed to process users' behaviours to infer affects in real time. Acosta and Ward proposed a spoken dialogue system capable of inferring the user's emotion from vocal features and of adapting its response to this emotion, which leads to a better perceived interaction [1]. Prendinger and Ishizuka presented the Empathic Companion [26], which uses users' physiological signals to interpret their affective state, produce empathic feedback and reduce the stress of participants. The Semaine project [28] introduced Sensitive Artificial Listeners, i.e. virtual characters with different personalities that induce emotions in their interlocutors by producing emotionally coloured feedback. Audio and visual cues are used to infer users' emotions, which are then used to tailor the agent's non-verbal behaviour and next utterance. However, the agent's behaviour is restricted to the face, and the agent mainly produces backchannels (i.e. listening behaviour). The SimSensei virtual human interviewer is designed to handle a conversation with users about psychological distress [13]. The agent can react to the user's behaviours as well as to some higher-level affective cues (e.g. arousal). However, these cues only affect some of the agent's dialogue choices and its reactive behaviour: no further adaptation is performed, such as long-term adaptation of the expressed attitude. We try to overcome these limitations in our own work.

Social adaptation: The relational agent Laura introduced by Bickmore [5] is a fitness coach designed to interact with users over a long period of time. The number of times users interact with Laura is used as a measure of friendliness, and as the friendliness grows, Laura uses more and more friendly behaviours (e.g. smiles, nods, more gestures). Buschmeier and Kopp [7] describe a model for adapting the dialogue model of an embodied conversational agent according to the common ground between a human and the agent. Their focus is on the dialogical aspect, while we consider the non-verbal dimensions in our work. The video game Facade uses unconstrained natural language and emotional gesture expression for all characters, including the player [22]. Its social games have several levels of abstraction separating atomic player interactions from changes in the social "score". While Facade's discourse acts generate immediate reactions from the characters, it may take story-context-specific patterns of discourse acts to influence the social game score. The score is communicated to the player via enriched, theatrically dramatic performance. Yang et al. [36] analysed the adaptation of a participant's body language to the multi-modal cues of the interlocutor, under two dyadic interaction stances: friendly and conflictive. They use statistical mapping to automatically predict body language behaviour from the multi-modal speech and gesture cues of the interlocutor. They show that people with friendly attitudes tend to adapt their body language more to the behaviour of their interlocutors. This work clearly shows a path to social adaptation, but the mechanism is not based on reasoning about the interlocutor's performance.

To conclude, progress has already been achieved in each of these domains, from the computation of an agent's affects in real time to social adaptation in the context of training. To our knowledge, however, there is no system that combines all these aspects in a single virtual agent.


[Figure 1 omitted: block diagram of the Socially Adaptive Virtual Agent, in which the Scenario module and the Social Signal Interpretation module (producing the Performance Index) feed the Affective Adaptation module, whose Emotions & Attitudes output drives the Sequential Behaviour Planner.]

Fig. 1: Overview of our socially adaptive virtual agent architecture

3 Socially Adaptive Agents

Fig. 1 presents the architecture of our socially adaptive agent. In our work, the context of the interaction is a one-to-one job interview in which the virtual agent acts as a recruiter: it leads the discussion by asking questions or proposing conversation topics to the human participant (interviewee), and reacts in real time to his/her non-verbal behaviour. However, such a socially adaptive agent could be used in other contexts.

We follow a three-step architecture. Firstly, a performance index (PI) of the interviewee's non-verbal responses is computed using the Social Signal Interpretation (SSI) software [33]. Secondly, the performance index is passed to the affective module, which computes the user's potentially expressed emotion and adapts the attitude of the virtual agent toward the interviewee. Lastly, the behaviour planner selects the appropriate social signals to express the emotion and the attitude of the agent. The rendering of this behaviour is based on the Greta/Semaine platform1.

3.1 Social Signal Interpretation

The performance interpretation consists in evaluating the interviewee's non-verbal behaviour during his/her answer to the virtual agent's utterance. In this work, we only included cues from the interviewee's speech. We plan to include elements of body language (head movements, crossed hands, etc.) in the computation of the performance index in the future.

We define PI_d ∈ [0, 1], the detected Performance Index computed by the SSI system [33], as

PI_d = \sum_{i=1}^{N} \left( Param^i_{score} \times Param^i_{weight} \right) \qquad (1)

1 http://semaine-project.eu/


where Param^i is one of the (N = 5) following parameters: speech duration, speech delay (hesitating before answering), speech rate, speech volume and pitch variation. The gender-dependency of the last three parameters was taken into account. Param^i_{weight} is the weight of the i-th parameter, fixed to 1/N. Param^i_{score} is defined such that

Param^i_{score} = \frac{Param^i_{detected} - Param^i_{reference}}{Param^i_{detected} + Param^i_{reference}} \qquad (2)

where Param^i_{detected} is the value detected using SSI, while Param^i_{reference} is the average value computed from data recorded in a preliminary study. This work is described in more detail in [2].
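As a concrete illustration, the computation of Eqs. (1)-(2) can be sketched as follows. The parameter names and reference values below are hypothetical, and the final rescaling of the weighted sum from [-1, 1] into [0, 1] is our assumption (the raw sum of Eq. (1) with scores from Eq. (2) is not bounded to [0, 1] by itself):

```python
# Illustrative sketch of the performance index computation (Eqs. 1-2).
# Parameter names, example values and the final rescaling are assumptions.

def param_score(detected: float, reference: float) -> float:
    """Relative deviation of a detected value from its reference (Eq. 2);
    lies in (-1, 1) for positive inputs."""
    return (detected - reference) / (detected + reference)

def performance_index(detected: dict, reference: dict) -> float:
    """Weighted sum of parameter scores (Eq. 1) with uniform weights 1/N,
    then linearly rescaled from [-1, 1] into [0, 1] (our assumption)."""
    n = len(detected)
    raw = sum(param_score(detected[p], reference[p]) for p in detected) / n
    return (raw + 1) / 2

# Hypothetical detected and reference values for the five speech parameters
detected = {"duration": 9.0, "delay": 1.0, "rate": 3.5, "volume": 62.0, "pitch_var": 24.0}
reference = {"duration": 8.0, "delay": 1.5, "rate": 3.0, "volume": 60.0, "pitch_var": 20.0}
pi_d = performance_index(detected, reference)
```

A value of a parameter equal to its reference contributes a score of zero, so an interviewee matching the reference profile on every parameter obtains a mid-scale performance index.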

3.2 Affective adaptation

The detected performance index PI_d is compared to an expected one, PI_e, received from the scenario module, which depends on the domain and the interaction situation. In our context of job interview simulation, experts (practitioners from a job centre) anticipated possible non-verbal reactions from the interviewee to the questions defined in the job interview scenario, and defined values of PI_e accordingly.

Since we want to build different reactions for good and poor performances of the human interlocutor, we separate the affective state space of the virtual agent into two. Let PI^H and PI^L represent the high and low performances, which lead to positive and negative affects for our virtual recruiter, respectively:

PI^H = \begin{cases} 2\,PI - 1 & \text{if } PI \ge 0.5 \\ 0 & \text{otherwise} \end{cases} \qquad PI^L = \begin{cases} 1 - 2\,PI & \text{if } PI < 0.5 \\ 0 & \text{otherwise} \end{cases} \qquad (3)

We compute the virtual agent's emotions by comparing the detected (PI_d) and expected (PI_e) performance indices, following the OCC theory of emotions [24]. In this study, we focus on 8 emotions: joy/distress, hope/fear, disappointment, admiration, relief and anger.

Joy follows from the occurrence of a desirable event: we simply assume that the youngster's detected positive index (PI^H_d) increases the joy of the agent. The intensity of joy felt by the agent is defined as:

E_f(Joy) = PI^H_d \qquad (4)

Similarly, distress is triggered by the occurrence of an undesirable event (PI^L_d). Following the same approach, we define the intensity of hope and fear using the expected performance indices PI^H_e and PI^L_e, respectively.

Disappointment is activated when the expected performance is positive and higher than the detected one: the youngster did not behave as well as the virtual agent expected:

E_f(Disappointment) = \max(0, PI^H_e - PI^H_d) \qquad (5)

Similarly, we compute admiration (positive expectation with higher detected performance), relief (negative expectation with higher detected performance) and anger (negative expectation and even worse performance) as \max(0, PI^H_d - PI^H_e), \max(0, PI^L_e - PI^L_d) and \max(0, PI^L_d - PI^L_e), respectively.
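The emotion computation above can be summarised in a minimal sketch, assuming the eight intensities are derived exactly from the detected and expected indices via Eqs. (3)-(5) and the analogous max terms:

```python
# Sketch of the OCC-based emotion intensities (Eqs. 3-5 and the analogous
# admiration/relief/anger terms). Function names are illustrative only.

def split(pi: float) -> tuple:
    """Split a performance index into high/low components (Eq. 3)."""
    high = 2 * pi - 1 if pi >= 0.5 else 0.0
    low = 1 - 2 * pi if pi < 0.5 else 0.0
    return high, low

def emotions(pi_detected: float, pi_expected: float) -> dict:
    hd, ld = split(pi_detected)
    he, le = split(pi_expected)
    return {
        "joy": hd,                            # Eq. 4: desirable event occurred
        "distress": ld,                       # undesirable event occurred
        "hope": he,                           # positive expectation
        "fear": le,                           # negative expectation
        "disappointment": max(0.0, he - hd),  # Eq. 5: expected better
        "admiration": max(0.0, hd - he),      # performed better than expected
        "relief": max(0.0, le - ld),          # feared worse than occurred
        "anger": max(0.0, ld - le),           # even worse than feared
    }

# Example: high expectation (0.8) but a poor detected performance (0.3)
e = emotions(pi_detected=0.3, pi_expected=0.8)
```

In this example the agent feels disappointment (and some anger), but no admiration or relief, matching the intuitions stated in the text.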

Using these agent emotions, we compute the mood of the agent, which represents its long-term affective state and from which we derive its attitude towards the interviewee. The computation of the mood relies on the ALMA model [14] and Mehrabian's theory [23], and is described in [19]. The outcome of this computation is a set of 7 mood categories: friendly, aggressive, dominant, supportive, inattentive, attentive and gossip, with values in [0, 1].

The agent's moods are combined with its personality to compute the attitude, following the work of [29] and [34]. The personality is defined by values in [0, 1] for 5 traits: openness, conscientiousness, extroversion, agreeableness and neuroticism. Consequently, our virtual agent can have one of the 4 following personalities: provocative, demanding, understanding or helpful. For example, an agent with a non-aggressive personality may still show an aggressive attitude if its mood becomes very hostile. The exact combination, based on a logical OR with fuzzy rules and a transformation of categories into continuous values in Isbister's interpersonal circumplex [16], is given in [19]. In short, we obtain n attitude values val(a_i) positioned in the interpersonal circumplex, which are used to compute values on the friendliness axis (Axis_{Friendly}) and the dominance axis (Axis_{Dominance}):

Friendly = \frac{1}{n} \sum_{i} \left( val(a_i) \times Axis_{Friendly}(a_i) \right) \qquad (6)

Dominance = \frac{1}{n} \sum_{i} \left( val(a_i) \times Axis_{Dominance}(a_i) \right) \qquad (7)
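Eqs. (6)-(7) amount to projecting the mood values onto the two circumplex axes and averaging. The sketch below illustrates this; the circumplex positions assigned to the mood categories are hypothetical placeholders, since the actual mapping is given in [19]:

```python
# Sketch of the attitude projection onto the interpersonal circumplex
# (Eqs. 6-7). The angular positions below are invented for illustration:
# friendliness axis = cosine, dominance axis = sine of the angle.
import math

POSITIONS = {  # hypothetical circumplex angles (radians) per mood category
    "friendly": 0.0,            # high friendliness, neutral dominance
    "aggressive": math.pi,      # hostile side of the circumplex
    "dominant": math.pi / 2,    # high dominance
    "supportive": math.pi / 4,  # friendly and somewhat dominant
}

def attitude(moods: dict) -> tuple:
    """Average the mood values projected on the two circumplex axes."""
    n = len(moods)
    friendly = sum(v * math.cos(POSITIONS[m]) for m, v in moods.items()) / n
    dominance = sum(v * math.sin(POSITIONS[m]) for m, v in moods.items()) / n
    return friendly, dominance

# A mostly friendly mood profile yields a positive friendliness value
f, d = attitude({"friendly": 0.8, "aggressive": 0.1, "dominant": 0.4, "supportive": 0.6})
```

The resulting (Friendly, Dominance) pair is the global attitude whose changes over time drive the behaviour planner.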

The levels of dominance and friendliness represent the global attitude of the agent toward its interlocutor. The attitude values are stored, and their changes serve as inputs, together with the emotions and mood, to the Sequential Behaviour Planner for the selection of non-verbal behaviour.

3.3 Sequential Behaviour Planner

The Sequential Behaviour Planner module is in charge of planning the virtual agent's behaviour. It receives two inputs. The first input is the next utterance to be said by the virtual agent, annotated with communicative intentions (e.g. ask a question, propose a conversation topic) defined in the scenario module. Communicative intentions are expressed by verbal and non-verbal behaviour, and the planning algorithm makes sure that appropriate signals are chosen to express the input intentions. The second input is the set of emotion values and attitude changes computed by the affective adaptation model presented in Section 3.2.

In this paper, we do not present the emotion expression mechanism, as we merely reuse the existing emotion displays implemented in the Greta/Semaine platform [25]: the novelty of our agent is that we extended the Greta/Semaine platform with the capability for attitude expression. As shown in Section 2, attitudes can be displayed through multimodal cues (a smile, a frown, etc.). However, the context the cues appear in (i.e. the signals surrounding the cues) is crucial [20, 35]. A smile may be a sign of friendliness, of dominance or even of submissiveness. Some models for attitude expression have already been proposed [4, 8, 27], but they only look at signals independently from each other, that is, they do not consider the set of surrounding behaviours. This is why we propose, in our model, to consider sequences of signals rather than signals in isolation for expressing attitudes.

To choose an appropriate sequence of signals to express an attitude change, our algorithm relies on a dataset of signal sequences frequently observed before this type of attitude change.

Extraction of sequences of signals characteristic of attitude changes

In order to obtain sequences of non-verbal signals characteristic of attitude changes, job interviews between human resources practitioners and youngsters were recorded in a job centre and annotated at two levels: the non-verbal behaviour of the recruiter and the expressed attitude of the recruiter. More details on the annotation process can be found in [11].

The attitude annotations are analysed to find the timestamps at which the attitudes of the recruiters vary. The non-verbal signal annotation streams are then segmented using the attitude variation timestamps, and the resulting segments are grouped into separate datasets depending on the type of attitude variation (we define 8 attitude variation types: a large or small rise or fall on the dominance or friendliness dimension). For instance, we obtain a dataset of 79 segments leading to a large drop in friendliness, and a dataset of 45 segments leading to a large increase in friendliness.

These datasets of sequences occurring before each type of attitude variation are then mined by giving them as input to the Generalized Sequential Pattern (GSP) frequent sequence mining algorithm described in [30]. The GSP algorithm requires a parameter Sup_min, i.e. the minimum number of times a sequence must occur in the dataset to be considered frequent. The algorithm follows an iterative approach: it begins by retrieving all the individual items (in our case, non-verbal signals) that occur at least Sup_min times in the dataset. These items can be considered as sequences of length 1; the next step of the algorithm consists in trying to extend these sequences by appending another item to them and checking whether the resulting sequence occurs at least Sup_min times in the dataset. This step is repeated until no new sequences are created. We run the GSP algorithm on each dataset and obtain frequent sequences for each kind of attitude variation. We also compute the confidence value of each extracted frequent sequence: confidence is the ratio between the number of occurrences of a sequence before a particular attitude variation and the total number of occurrences of that sequence in the data [31].
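The iterative mining loop described above can be sketched as follows. The toy dataset and signal names are invented for illustration, and this simplified version ignores the time-window and gap constraints of the full GSP algorithm:

```python
# Minimal GSP-style frequent sequence mining: grow frequent sequences one
# signal at a time, keeping only those occurring (as a subsequence) at
# least sup_min times in the dataset. Signal names are illustrative.

def is_subsequence(pattern, sequence):
    """True if pattern's signals occur in order within the sequence."""
    it = iter(sequence)
    return all(signal in it for signal in pattern)

def support(pattern, dataset):
    """Number of dataset sequences containing the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)

def gsp(dataset, sup_min):
    items = {s for seq in dataset for s in seq}
    frequent = []
    current = [(s,) for s in items if support((s,), dataset) >= sup_min]
    while current:
        frequent.extend(current)
        # Extend each frequent sequence by one item and re-check support
        current = [p + (s,) for p in current for s in items
                   if support(p + (s,), dataset) >= sup_min]
    return frequent

# Toy dataset: signal sequences observed before a drop in friendliness
data = [("frown", "gaze_away", "lean_back"),
        ("frown", "lean_back"),
        ("frown", "gaze_away")]
patterns = gsp(data, sup_min=2)
```

With Sup_min = 2, the single signal "frown" and the pair ("frown", "gaze_away") come out as frequent, while pairs occurring only once are pruned.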


Non-verbal signal sequence planning

The Sequential Behaviour Planner takes as input an utterance to be said by the agent, augmented with information on the communicative intentions it wants to convey. The planning algorithm starts by generating a set of minimal non-verbal signal sequences that express the communicative intentions. These minimal sequences are computed to make sure that the communicative intentions are expressed by the adequate signals.

Then, the algorithm creates candidate sequences from the minimal sequences by enriching them with additional non-verbal signals. We designed a Bayesian Network (BN) to model the relationship between pairs of adjacent signals and the different attitude variations, and trained it on our corpus. Taking a minimal sequence as input, the algorithm looks for the available time intervals. It then tries to insert every kind of non-verbal signal considered in the network into each interval and computes the probability of the resulting sequence: if that probability exceeds a threshold, the sequence is considered a viable candidate and carries over to the next step.
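The enrichment step can be sketched as follows, with the trained Bayesian network replaced by a hypothetical table of adjacent-pair probabilities and a simple chain-product score; all names and values are illustrative:

```python
# Hedged sketch of candidate-sequence enrichment. The trained Bayesian
# network is stood in for by a hypothetical table of adjacent-pair
# probabilities; signal names and numbers are invented for illustration.

PAIR_PROB = {  # P(next signal | previous signal), illustrative values
    ("smile", "nod"): 0.6, ("nod", "smile"): 0.5,
    ("smile", "gesture"): 0.3, ("gesture", "nod"): 0.2,
}

def sequence_probability(seq):
    """Product of adjacent-pair probabilities (chain assumption)."""
    p = 1.0
    for prev, nxt in zip(seq, seq[1:]):
        p *= PAIR_PROB.get((prev, nxt), 0.05)  # small default for unseen pairs
    return p

def enrich(minimal_seq, candidates, threshold):
    """Try inserting each candidate signal at each position of the minimal
    sequence; keep only sequences whose probability clears the threshold."""
    viable = []
    for signal in candidates:
        for i in range(len(minimal_seq) + 1):
            seq = minimal_seq[:i] + [signal] + minimal_seq[i:]
            if sequence_probability(seq) >= threshold:
                viable.append(seq)
    return viable

out = enrich(["smile", "nod"], ["gesture"], threshold=0.05)
# keeps ["smile", "gesture", "nod"] under these illustrative numbers
```

Only insertion positions whose surrounding pairs are plausible under the model survive, which is the pruning role the probability threshold plays in the planner.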

Once the set of candidate sequences has been computed, the selected sequence is obtained using a majority-voting method derived from [17]. Full details on the Sequential Behaviour Planner can be found in [10].

4 Evaluation

The context of our study is job interview simulation. The virtual recruiter we developed is capable of perceiving the participant's behaviour; it simulates emotion and attitude dynamics and expresses them. We performed an experiment to evaluate whether our model of social adaptation improves users' perception of the agent.

4.1 Experimental setting

We used two different configurations for the socially adaptive virtual agent (and two different job seekers). Each configuration corresponds to a different set of values for the personality, the recruiter's questions and the corresponding expectations: one agent is considered more demanding, while the other is supposed to have a more understanding personality.

We achieved the subjective evaluation through an online questionnaire. Each evaluator watched four scenes presenting the agent-applicant dialogue. The scenes were presented to the evaluators in their correct temporal order, to evaluate the dynamics of the social model over time. Each scene was presented under three different conditions, followed by a set of questions to answer. What changes from one condition to the other is the non-verbal behaviour of the agent, corresponding to three different versions of the virtual agent. Each condition was presented in a different video clip, so that the evaluators could concentrate on observing the agent's non-verbal behaviour and compare the three conditions. The first condition is a scripted agent with predefined behaviour. During the answer phase, the agent occasionally shows affective behaviours (i.e. predefined emotions and attitude), regardless of the input. Note that the agent's behaviour is not exaggerated in our model. The second condition expresses emotions computed by our affective model during the question phase. The role of this condition is to verify that the computation of affects does not have a negative impact on the perception of the agent's behaviour (our hypothesis is, in fact, that it should improve the quality of the agent). The third condition is the socially adaptive agent, which displays both the emotion and the attitude computed from the applicant's behaviour. It is also the configuration that was used to collect the data.

With the two variants for the personality of the agent (demanding or understanding), we have a total of 24 videos (2 variants × 4 scenes × 3 agent versions). Evaluators could assess either the demanding or the understanding agent, but each evaluator had to evaluate the three animations for each of the four scenes.

Fig. 2: Screenshot of a video showing the agent's non-verbal behaviour with both agent and applicant speech


[Figure 3 omitted: (a) bar chart of the average evaluators' perception scores (Naturalness, Expressivity, Credibility, Coherence, Consideration) for the Scripted, Emotions, and Emotions & Attitude agent versions; (b) average rank of each agent version during scenes s1 to s4.]

Fig. 3: Results of the evaluation of the virtual agent's behaviour over 83 participants. * denotes significant differences (p < 0.05) from +. Similarly, ^ denotes a quasi-significant difference (p < 0.08) from #, using ANOVA.

The subjective tests were performed by 83 evaluators (29 with the understanding variant and 54 with the demanding one), aged between 22 and 79. 33 evaluators were male and 50 were female. 45 were already familiar with subjective evaluation tests, while the remaining 38 were not. 51 of the participants were native speakers.

Our subjective evaluation is based on a questionnaire presented after each animation, for the 3 different agent versions of a given scene. On a 5-level Likert scale, the evaluator was asked to indicate whether the agent's behaviour is Natural, Expressive, Credible, Coherent with the job seeker's speech, and takes into account what the human participant said (Consideration). In addition, the evaluator was asked to rank the 3 animations from best (1) to worst (3). The goal of this ranking was to perform a forced A/B/C test: a comparison of the 3 versions in order to obtain a "global impression". Fig. 2 presents an example frame from a video of a randomly chosen, anonymised agent version, displaying the agent's non-verbal behaviour with both agent and applicant speech.

4.2 Results and discussion

Figure 3a presents the evaluators' perception of the different agent versions. On all five criteria, the emotion-only agent outperforms the scripted agent, and the socially adaptive agent outperforms the other two. An analysis of variance (ANOVA) showed that the effect of emotions and attitude was significant in terms of naturalness (F(2, 993) = 4.93, p < 0.01), expressiveness (F(2, 993) = 11.69, p < 10^-5), credibility (F(2, 993) = 3.15, p < 0.05), coherence (F(2, 993) = 5.27, p < 0.005) and consideration (F(2, 993) = 10.16, p < 10^-4). Post hoc analyses using Tukey's honest significant difference criterion indicated that the differences between the scripted version and the emotion version are significant in terms of naturalness, expressiveness, coherence and consideration (p < 0.05), and are significant for all criteria between the scripted agent and the socially adaptive agent (p < 0.05). The first important result of this evaluation is that an emotional agent is judged more natural and coherent than an agent using predefined emotions (independent of the participant's reactions). This validates our cognitive model for computing affects, presented in Section 3.2. Moreover, it shows that the expression of affects, as we have achieved it (see Section 3.3), has a positive impact on the user's perception of the agent's behaviour. The second important result is that expressing emotions and attitude improves the performance of the agent on all criteria: the participants did notice the difference in non-verbal behaviour when the attitude is implemented. This validates the attitude expression model in the behaviour planner presented in Section 3.3.

Figure 3b presents the evolution of the evaluation over the 4 scenes. For the first scene s1, people rank the scripted agent between 2nd and 3rd on average, while the agent with emotions is ranked 1st. Towards the end (last scenes s3 and s4), the agent expressing and adapting its attitude is ranked better than the agent with only emotions, itself ranked better than the scripted agent, while this is not the case in the first scene. In the ranking results, the difference between the agent versions is not significant at the beginning of the interaction (F(2, 249) = 2.83, p < 0.07). Post hoc analyses using Tukey's honest significant difference criterion show a significant difference between the emotional agent and the scripted agent (p < 0.02) in the first scene s1. The difference with the adaptive agent becomes visible only after a few speaking turns, and is significant at the end of the interview, s4 (F(2, 243) = 5.14, p < 0.01). Note that these results are derived from both the understanding and demanding agents. There is a slight difference between these two variants, but the same tendency holds, with the same conclusion.

This confirms the hypothesis that our social and affective adaptation model makes the agent's behaviour evolve during the interview, which impacts the perception that the user has of the agent over time. This corresponds to the idea that social attitude is a long-term mechanism that may not have immediate effects on the interlocutor. In our affective model, the attitude value changes only slowly (while the emotional reaction is immediate): the adaptation process becomes visible after a few minutes of conversation.
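This slow/fast dynamic can be sketched as follows; the function name and the smoothing rate `alpha` are illustrative assumptions, not values from the paper's model:

```python
def update_affect(attitude, appraisal, alpha=0.15):
    """One speaking turn: emotion reacts immediately to the appraisal,
    while attitude drifts slowly towards it (alpha is a hypothetical
    smoothing rate, not a value from the paper)."""
    emotion = appraisal                           # immediate reaction
    attitude += alpha * (appraisal - attitude)    # slow, cumulative shift
    return emotion, attitude

attitude = 0.0
for turn in range(1, 9):                          # eight speaking turns
    emotion, attitude = update_affect(attitude, appraisal=1.0)
    print(f"turn {turn}: emotion={emotion:.2f} attitude={attitude:.2f}")
```

With these numbers the emotion is at full strength from the first turn, while the attitude only crosses the halfway point around the fifth turn, mirroring the observation that the adaptive agent's advantage appears only after a few speaking turns.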

5 Conclusion and future work

In this paper, we propose an architecture for a socially adaptive agent in the context of dyadic interactions. Our virtual agent is able to analyse the behaviour of the human participant, to update a cognitive model with social constructs (e.g. attitude, relationship) depending on the behaviour of its interlocutor, and to show coherent social attitude expression. This approach has been evaluated in the context of a job-interview simulation, by comparing the behaviour of our agent with a scripted and a reactive approach. Results show that our socially adaptive agent performs significantly better in terms of naturalness, expressiveness, credibility, coherence and the user's feeling of being considered. They also show that, while the emotion expression has an immediate impact on the user's experience, the attitude expression's impact is stronger after a longer period of time: the impact on the evaluation of the agent's credibility increases as the interview progresses. While most evaluations in the literature rely on immediate reactions, we show that adaptation takes time and becomes visible after a few minutes of interaction.

Our model strongly relies on the computation of the performance index. We intend to consider additional signals to compute this performance index, such as body language (head movements, crossing hands, postures, ...) and facial expressions (smile, surprise, ...). Yet, this performance index is currently seen as an exact value, while it is computed from social cues detected with perception software that can produce errors. The next version of our socially adaptive agent should rely on an extended model with uncertainty values. A second extension would be to consider the social cues as possible inputs to the non-verbal behaviour planner, to better control the coherence and credibility of the agent's behaviour. Indeed, several research works show that the agent's reaction should not only be based on its internal mental state, but also on adaptation mechanisms to the interlocutor's expressed emotions [21, 36]. Lastly, we believe that allowing the agent to reason about the actual and potential behaviour of the interlocutor, following a Theory of Mind paradigm, will lead to a more credible decision process, both for the dialogue model and for the appraisal mechanism.
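One way to move from an exact performance index to one carrying uncertainty, as suggested above, is to attach a detection confidence to every social cue and weight by it. The cue names, values and weighting scheme below are illustrative assumptions, not the paper's model:

```python
def performance_index(cues):
    """Confidence-weighted performance index.

    `cues` maps a cue name to (value, confidence): value is the cue's
    contribution in [0, 1], and confidence in [0, 1] reflects how
    reliable the perception software's detection was.
    Returns (index, overall_confidence).
    """
    total_conf = sum(conf for _, conf in cues.values())
    if total_conf == 0.0:
        return 0.0, 0.0                       # nothing reliably detected
    index = sum(value * conf for value, conf in cues.values()) / total_conf
    overall = total_conf / len(cues)          # average detection confidence
    return index, overall

# Illustrative cues: names and numbers are invented for this sketch.
cues = {
    "gaze_at_interviewer": (0.9, 0.8),
    "smile": (0.6, 0.9),
    "crossed_arms": (0.2, 0.4),
}
index, confidence = performance_index(cues)
print(f"index={index:.2f} confidence={confidence:.2f}")
```

Under this scheme, poorly detected cues contribute little to the index, and the overall confidence lets downstream modules (dialogue, appraisal) discount an unreliable estimate rather than treat it as exact.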

References

1. J. C. Acosta and N. G. Ward. Achieving rapport with turn-by-turn, user-responsive emotional coloring. Speech Communication, 53(9-10):1137–1148, Nov. 2011.

2. K. Anderson, E. André, T. Baur, S. Bernardini, M. Chollet, E. Chryssafidou, I. Damian, C. Ennis, A. Egges, P. Gebhard, H. Jones, M. Ochs, C. Pelachaud, K. Porayska-Pomsta, P. Rizzo, and N. Sabouret. The TARDIS framework: intelligent virtual agents for social coaching in job interviews. In D. Reidsma, H. Katayose, and A. Nijholt, editors, Advances in Computer Entertainment, volume 8253 of Lecture Notes in Computer Science, pages 476–491. Springer International Publishing, 2013.

3. M. Argyle. Bodily Communication. University Paperbacks. Methuen, 1988.

4. D. Ballin, M. Gillies, and B. Crabtree. A framework for interpersonal attitude and non-verbal communication in improvisational visual media production. In First European Conference on Visual Media Production, pages 203–210, 2004.

5. T. W. Bickmore and R. W. Picard. Establishing and maintaining long-term human-computer relationships. ACM Transactions on Computer-Human Interaction (TOCHI), 12(2):293–327, 2005.

6. J. K. Burgoon, D. B. Buller, J. L. Hale, and M. A. Turck. Relational messages associated with nonverbal behaviors. Human Communication Research, 10(3):351–378, 1984.

7. H. Buschmeier, T. Baumann, B. Dosch, S. Kopp, and D. Schlangen. Combining incremental language generation and incremental speech synthesis for adaptive information presentation. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 295–303, 2012.


8. A. Cafaro, H. H. Vilhjalmsson, T. Bickmore, D. Heylen, K. R. Johannsdottir, and G. S. Valgarsson. First impressions: users' judgments of virtual agents' personality and interpersonal attitude in first encounters. In Intelligent Virtual Agents, IVA'12, pages 67–80, Berlin, Heidelberg, 2012. Springer-Verlag.

9. D. R. Carney, J. A. Hall, and L. S. LeBeau. Beliefs about the nonverbal expression of social power. Journal of Nonverbal Behavior, 29(2):105–123, 2005.

10. M. Chollet, M. Ochs, and C. Pelachaud. From non-verbal signals sequence mining to Bayesian networks for interpersonal attitudes expression. In Intelligent Virtual Agents, pages 120–133, 2014.

11. M. Chollet, M. Ochs, and C. Pelachaud. Mining a multimodal corpus for non-verbal signals sequences conveying attitudes. In International Conference on Language Resources and Evaluation, Reykjavik, Iceland, May 2014.

12. M. Chollet, G. Stratou, A. Shapiro, L.-P. Morency, and S. Scherer. An interactive virtual audience platform for public speaking training. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '14, pages 1657–1658, Richland, SC, 2014.

13. D. DeVault, R. Artstein, G. Benn, T. Dey, E. Fast, A. Gainer, K. Georgila, J. Gratch, A. Hartholt, M. Lhommet, et al. SimSensei Kiosk: a virtual human interviewer for healthcare decision support. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pages 1061–1068, 2014.

14. P. Gebhard. ALMA – a layered model of affect. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2005.

15. M. E. Hoque, M. Courgeon, J.-C. Martin, B. Mutlu, and R. W. Picard. MACH: my automated conversation coach. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 697–706. ACM, 2013.

16. K. Isbister. Better Game Characters by Design: A Psychological Approach (The Morgan Kaufmann Series in Interactive 3D Technology). Morgan Kaufmann Publishers Inc., 2006.

17. S. Jaillet, A. Laurent, and M. Teisseire. Sequential patterns for text categorization. Intelligent Data Analysis, 10(3):199–214, 2006.

18. W. L. Johnson and J. Rickel. Steve: an animated pedagogical agent for procedural training in virtual environments. ACM SIGART Bulletin, 8(1-4):16–21, 1997.

19. H. Jones and N. Sabouret. TARDIS – a simulation platform with an affective virtual recruiter for job interviews. In IDGEI, 2013.

20. D. Keltner. Signs of appeasement: evidence for the distinct displays of embarrassment, amusement, and shame. Journal of Personality and Social Psychology, 68:441–454, 1995.

21. A. Kendon. Movement coordination in social interaction: some examples described. Acta Psychologica, 32(0):101–125, 1970.

22. M. Mateas and A. Stern. Structuring content in the Facade interactive drama architecture. In Proceedings of Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pages 93–98, 2005.

23. A. Mehrabian. Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4):261, 1996.

24. A. Ortony, G. L. Clore, and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, July 1988.

25. I. Poggi, C. Pelachaud, F. de Rosis, V. Carofiglio, and B. De Carolis. Greta: a believable embodied conversational agent. In Multimodal Intelligent Information Presentation, pages 3–25. Springer, 2005.


26. H. Prendinger and M. Ishizuka. The empathic companion: a character-based interface that addresses users' affective states. Applied Artificial Intelligence, 19(3-4):267–285, 2005.

27. B. Ravenet, M. Ochs, and C. Pelachaud. From a user-created corpus of virtual agent's non-verbal behaviour to a computational model of interpersonal attitudes. In Intelligent Virtual Agents, Berlin, Heidelberg, 2013. Springer-Verlag.

28. M. Schroder, E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. ter Maat, G. McKeown, S. Pammi, M. Pantic, et al. Building autonomous sensitive artificial listeners. IEEE Transactions on Affective Computing, 3(2):165–183, 2012.

29. M. Snyder. The influence of individuals on situations: implications for understanding the links between personality and social behavior. Journal of Personality, 51(3):497–516, 1983.

30. R. Srikant and R. Agrawal. Mining sequential patterns: generalizations and performance improvements. Advances in Database Technology, 1057:1–17, 1996.

31. P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.

32. L. P. Vardoulakis, L. Ring, B. Barry, C. L. Sidner, and T. W. Bickmore. Designing relational agents as long term social companions for older adults. In Intelligent Virtual Agents (IVA), pages 289–302, 2012.

33. J. Wagner, F. Lingenfelser, T. Baur, I. Damian, F. Kistler, and E. André. The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 2013.

34. D. T. Wegener, R. E. Petty, and D. J. Klein. Effects of mood on high elaboration attitude change: the mediating role of likelihood judgments. European Journal of Social Psychology, 24(1):25–43, 1994.

35. S. With and W. S. Kaiser. Sequential patterning of facial actions in the production and perception of emotional expressions. Swiss Journal of Psychology, 70(4):241–252, 2011.

36. Z. Yang, A. Metallinou, and S. Narayanan. Analysis and predictive modeling of body language behavior in dyadic interactions from multimodal interlocutor cues. IEEE Transactions on Multimedia, 16(6):1766–1778, Oct 2014.