
Demo: Toward a Data-Driven Generative Behavior Model for Human-Robot Interaction

Henny Admoni
Department of Computer Science
Yale University
New Haven, CT USA 06520
[email protected]

Brian Scassellati
Department of Computer Science
Yale University
New Haven, CT USA 06520
[email protected]

ABSTRACT

Socially assistive robots are designed to help people in social ways, for example through coaching or teaching. Because they operate in social environments, these robots must be able to understand and communicate with the social cues that people use. Non-verbal behavior, such as eye gaze and gesture, can provide significant communication in social contexts. Our research investigates how people use non-verbal behaviors when teaching. We use data from human-human interactions to computationally model people's non-verbal behaviors, with the goal of constructing a generative robot behavior model based on this computational model of people. In this paper, we describe our work in progress and discuss challenges of real-time human-robot interaction.

Categories and Subject Descriptors

I.2.9 [Artificial Intelligence]: Robotics

Keywords

human-robot interaction; non-verbal communication

1. INTRODUCTION

Socially Assistive Robotics (SAR) is a growing field that aims to design, construct, and evaluate robots to help people through social interactions [3]. Applications of SAR include educational tutoring [5], eldercare [7], and therapy [6].

SAR is different from other areas of robotics because it relies on social interactions between humans and robots. People use various forms of social communication every day, and their experience is generally natural and effortless. For example, people tend to look at shared reference points in conversation [2]. These kinds of behaviors must be explicitly designed for a SAR robot, however. As SAR applications become more common, there is a growing need for robots to be able to leverage all modes of social communication.

In addition to using the full range of communication modalities, it is also important that robot behavior follows human expectations. If robots generate social behavior that is outside of established communication norms, people will be confused or will reject the interaction outright. Therefore, any approach to designing social behaviors for robots must be informed by actual human behavior.


Figure 1: A frame from a human-human teachinginteraction used to train the model.


Our research focuses on building models of human-human non-verbal communication and implementing these models to generate socially appropriate and communicative behavior for socially assistive robots. These data-driven models allow us to design robots that match people's existing non-verbal communication use.

Some researchers have begun to address this need for data-driven robot behavior models. For example, researchers have modeled conversational gaze aversions [1] and gesture during narration [4] based on analysis of human-human pairs. While these studies show promising advances in data-driven models of robot behavior, none of them deals directly with socially assistive applications, which require monitoring of, and feedback from, the interaction partner.

There are three steps in the process of designing a data-driven generative behavior model: 1) collect data on non-verbal behaviors during human-human interactions, 2) train a predictive computational model with the human-human interaction data, and 3) develop a generative model for robot behaviors driven by the computational model from step 2.

In this paper, we describe existing and planned work toward a model of behaviors for tutoring, a major focus of SAR. We conclude the paper by outlining some challenges of real-time SAR.

2. MODELING HUMAN INTERACTIONS

To collect data about human interactions, we analyze a typical teaching interaction between two people, in which one participant (the teacher) explains the rules of a board game to another participant (the student). We video- and audio-record the teaching interaction (Figure 1) as well as two subsequent rounds of gameplay. In total, recordings lasted about 20 minutes per dyad.


Category    | Feature              | Vocabulary
------------|----------------------|--------------------------------------------------------
Verbal      | Transcript           | Actual words said (e.g., "Now take this card...")
Verbal      | Context (C)          | Map reference, confirmation seeking, backchannel, etc.
Non-verbal  | Eye gaze (G)         | At map, at partner's face, at partner's hands, etc.
Non-verbal  | Sweeping gesture (S) | To map, to game box, etc.
Non-verbal  | Pointing gesture (P) | To map, to partner's cards, etc.
Non-verbal  | Head pose (H)        | Nod yes, shake no, questioning, etc.

Table 1: The coding vocabulary used to extract data from the human-human interaction video.


In order to extract data from the video recordings, we hand-annotated certain verbal and non-verbal behavioral features, as shown in Table 1. Each annotation can be described by a tuple (c_t, g_t, s_t, p_t, h_t), where c_t ∈ C is the verbal context, g_t ∈ G is the eye gaze, s_t ∈ S is the sweeping gesture, p_t ∈ P is the pointing gesture, and h_t ∈ H is the head pose occurring at time t.
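To make this representation concrete, an annotation might be encoded as follows. This is a minimal Python sketch of the coding scheme in Table 1; the class name, the lowercase value strings, and the "none" placeholder for absent gestures are our own illustrative choices, not part of the published scheme.

    from typing import NamedTuple

    class Annotation(NamedTuple):
        """One coded tuple (c_t, g_t, s_t, p_t, h_t) at time t (see Table 1)."""
        context: str  # c_t in C, e.g., "map reference", "confirmation seeking"
        gaze: str     # g_t in G, e.g., "at map", "at partner's face"
        sweep: str    # s_t in S, e.g., "to map", "to game box"
        point: str    # p_t in P, e.g., "to map", "to partner's cards"
        head: str     # h_t in H, e.g., "nod yes", "shake no", "questioning"

    # A hypothetical moment where the teacher refers to the map while
    # looking at it and pointing to it.
    sample = Annotation("map reference", "at map", "none", "to map", "nod yes")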

These features are used to train a predictive model with a k-nearest-neighbor algorithm. In the algorithm, non-verbal features are attributes and context is the class label. In other words, for an annotation at time i, there is a feature vector α_i = (g_i, s_i, p_i, h_i) with classification c_i. To classify a new sample, the algorithm finds the k closest training samples and assigns the sample's feature vector a context based on a majority vote of c among those k samples. This model allows our system to classify the context of new observations of non-verbal behaviors.
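As a rough sketch of this classification step, the following example implements a k-nearest-neighbor majority vote over categorical feature vectors. The training tuples are invented, and the use of Hamming distance (the count of mismatched features) with k = 3 is our assumption; the paper does not specify a distance metric for its categorical features.

    from collections import Counter

    # Hypothetical training samples: ((gaze, sweep, point, head), context).
    TRAINING = [
        (("at map", "none", "to map", "neutral"), "map reference"),
        (("at map", "to map", "to map", "neutral"), "map reference"),
        (("at partner's face", "none", "none", "nod yes"), "confirmation seeking"),
        (("at partner's face", "none", "none", "questioning"), "confirmation seeking"),
        (("at partner's hands", "none", "none", "nod yes"), "backchannel"),
    ]

    def hamming(a, b):
        """Distance between two categorical feature vectors: count of mismatches."""
        return sum(x != y for x, y in zip(a, b))

    def classify(features, k=3):
        """Assign a context label by majority vote among the k nearest samples."""
        nearest = sorted(TRAINING, key=lambda s: hamming(features, s[0]))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    print(classify(("at map", "none", "to map", "nod yes")))  # "map reference"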

3. GENERATING ROBOT BEHAVIOR

The next step of this research extends the learned human behavior model into a generative model for robot behavior. This section discusses planned work toward this goal.

To generate new behavior from the learned model, the system must identify appropriate actions based on a known context. For example, a segment of robot speech that refers deictically to the map is labeled with the "map reference" context ahead of time. When the robot reaches the part of its lesson that uses that segment, the behavior selector must generate appropriate non-verbal behaviors for a map reference context to match the robot's speech.
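For instance, the robot's lesson script might carry its context labels directly, along these lines (the speech segments and the "rule explanation" label are invented for illustration):

    # Hypothetical lesson script: each speech segment has a pre-assigned
    # verbal context that drives the non-verbal behavior selector at runtime.
    LESSON = [
        ("This is the board we'll be playing on.", "map reference"),
        ("Each player starts with five cards.", "rule explanation"),
        ("Does that make sense so far?", "confirmation seeking"),
    ]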

One approach is to perform k-means clustering on the observed samples and to identify the context of each resulting cluster by the majority c_i of its constituent elements. Each cluster also has a weight ω representing the proportion of elements in that cluster with matching c_i, that is, how strongly that cluster represents context c_i.

In order to identify non-verbal behaviors for a desired context c_d, the system identifies one cluster with label c_d that has ω above some threshold θ. The system then selects a sample near the center of that cluster and uses that sample's features (g, s, p, and h) to decide on the robot's behavior.

For example, suppose that a map reference context was often accompanied by gaze to the map and a pointing gesture toward the map. Then, when looking for appropriate behavior for a map reference, the system would find a cluster c = "map reference" with a sample containing g = "at map" and p = "to map" near the center of the cluster. Based on this sample, the robot would gaze and point at the map. Some noise in the selection algorithm could ensure that different samples are selected from within the same cluster over time.
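The following sketch illustrates this selection procedure under a few stated assumptions: clusters have already been computed; because the features are categorical, the sample closest to all other cluster members (a medoid) stands in for the k-means center; and the threshold θ = 0.6 and the random draw among the three most central matching samples (the "noise" mentioned above) are illustrative choices.

    import random
    from collections import Counter

    # A sample is (context, (gaze, sweep, point, head));
    # a cluster is a list of samples.

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def label_and_weight(cluster):
        """Majority context label and its proportion (the weight omega)."""
        counts = Counter(context for context, _ in cluster)
        label, n = counts.most_common(1)[0]
        return label, n / len(cluster)

    def select_behaviors(clusters, desired_context, theta=0.6, jitter=3):
        """Return non-verbal features for the desired context, or None."""
        for cluster in clusters:
            label, omega = label_and_weight(cluster)
            if label != desired_context or omega < theta:
                continue
            # Rank matching members by total distance to the whole cluster,
            # so the most central samples come first ...
            members = [s for s in cluster if s[0] == desired_context]
            members.sort(key=lambda s: sum(hamming(s[1], o[1]) for o in cluster))
            # ... then draw randomly among the top few so that repeated
            # queries do not always yield identical robot behavior.
            _, features = random.choice(members[:jitter])
            return features
        return None  # no sufficiently pure cluster for this context

    clusters = [[
        ("map reference", ("at map", "none", "to map", "neutral")),
        ("map reference", ("at map", "to map", "to map", "neutral")),
        ("backchannel", ("at partner's face", "none", "none", "nod yes")),
    ]]
    print(select_behaviors(clusters, "map reference"))  # one of the map tuples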

4. CHALLENGES

Real-time robot behavior control, like real-time mobile systems, faces many challenges in sensing and processing. While there have been significant improvements in body posture recognition and eye gaze tracking, real-time sensing of non-verbal behaviors can be difficult. Quickly and reliably detecting the target of eye gaze in real time is an active area of research. While systems such as the Microsoft Kinect now allow for relatively easy posture recognition, the process of action identification (e.g., pointing) also remains difficult.

Real-time learning and adaptation also remain a challenge for socially assistive robotics. People are not static systems, and their preferences and knowledge change over time. Good SAR systems build models that adapt in real time based on continuously collected training samples. However, classifying these samples in real time can be difficult, both because the ground truth is uncertain and because machine learning can require significant computational processing. Mobile-system strategies like offloading computation to a remote server may help address these challenges.

5. ACKNOWLEDGMENTS

This work is supported by NSF grants 1139078 and 1117801.

6. REFERENCES

[1] S. Andrist, X. Z. Tan, M. Gleicher, and B. Mutlu. Conversational gaze aversion for humanlike robots. In Proceedings of the 9th ACM/IEEE International Conference on Human-Robot Interaction (HRI '14). ACM, 2014.

[2] M. Argyle and M. Cook. Gaze and Mutual Gaze. Cambridge University Press, Cambridge, England, 1976.

[3] D. Feil-Seifer and M. J. Mataric. Defining socially assistive robotics. In Proceedings of the 9th International IEEE Conference on Rehabilitation Robotics, 2005.

[4] C.-M. Huang and B. Mutlu. Modeling and evaluating narrative gestures for humanlike robots. In Proceedings of Robotics: Science and Systems, Berlin, Germany, June 2013.

[5] T. Kanda, T. Hirano, D. Eaton, and H. Ishiguro. Interactive robots as social partners and peer tutors for children: A field trial. Human-Computer Interaction, 19:61–84, 2004.

[6] B. Scassellati, H. Admoni, and M. Mataric. Robots for use in autism research. Annual Review of Biomedical Engineering, 14:275–294, 2012.

[7] K. Wada and T. Shibata. Living with seal robots—its sociopsychological and physiological influences on the elderly at a care house. IEEE Transactions on Robotics, 23(5):972–980, October 2007.