real-time multisensor people tracking for human .real-time multisensor people tracking for...
Post on 30-Jun-2018
Embed Size (px)
Real-Time Multisensor People Trackingfor Human-Robot Spatial Interaction*
Christian Dondrup1, Nicola Bellotto1, Ferdian Jovan2 and Marc Hanheide1
Abstract All currently used mobile robot platforms are ableto navigate safely through their environment, avoiding staticand dynamic obstacles. However, in human populated environ-ments mere obstacle avoidance is not sufficient to make humansfeel comfortable and safe around robots. To this end, a largecommunity is currently producing human-aware navigationapproaches to create a more socially acceptable robot behaviour.A major building block for all Human-Robot Spatial Interactionis the ability of detecting and tracking humans in the vicinityof the robot. We present a fully integrated people perceptionframework, designed to run in real-time on a mobile robot. Thisframework employs detectors based on laser and RGB-D dataand a tracking approach able to fuse multiple detectors usingdifferent versions of data association and Kalman filtering. Theresulting trajectories are transformed into Qualitative SpatialRelations based on a Qualitative Trajectory Calculus, to learnand classify different encounters using a Hidden Markov Modelbased representation.
We present this perception pipeline, which is fully imple-mented into the Robot Operating System (ROS), in a smallproof of concept experiment. All components are readily avail-able for download, and free to use under the MIT license, toresearchers in all fields, especially focussing on social interactionlearning by providing different kinds of output, i.e. QualitativeRelations and trajectories.
Currently used mobile robots are able to navigate safelythrough their environment, avoiding not only static but alsodynamic obstacles. An important aspect of mobile robotshowever is the ability to navigate and manoeuvre safelyaround humans . Treating them as dynamic obstaclesand merely avoid them is not sufficient in those situationsdue to the special needs and requirements of humans tofeel safe and comfortable when interacting with robots.Human-Robot Spatial Interaction (HRSI), is the study ofjoint movement of robots and humans through space and thesocial signals governing these interactions. It is concernedwith the investigation of models of the ways humans androbots manage their motions in vicinity to each other. Ourwork aims to equip a mobile robot with understanding ofHRSI situations and enable it to act accordingly.
Recently, the robotics community started to take the dy-namic aspects of human obstacles into account, e.g.  andcurrently a large body of research is dedicated to answer the
*The research leading to these results has received funding from the Euro-pean Communitys Seventh Framework Programme under grant agreementNo. 600623, STRANDS.
1Christian Dondrup, Nicola Bellotto and Marc Hanheide are with theSchool of Computer Science, University of Lincoln, LN6 7TS Lincoln, UKcdondrup, nbellotto, firstname.lastname@example.org
2Ferdian Jovan is with the School of Computer Science, University ofBirmingham, B15 2TT Birmingham, UK email@example.com
Fig. 1. Example output of the ROS based people perception pipeline,showing the tracker and detector results in rviz.
more fundamental questions of HRSI to produce navigationapproaches which plan to explicitly move on more sociallyacceptable and legible paths . The term legible hererefers to the communicative or interactive aspects ofmovements which previously has widely been ignored inrobotics research. To realise any kind of these HRSI appli-cations, the basic challenge is the detection and tracking ofhumans in the vicinity of the robot considering the robotsmovement, varying ambient conditions, and occlusion .Due to the importance of these applications, there are severalsolutions to the problem of detection and tracking of humansor body parts, as shown for example in these surveys ,. In our work, we are focusing on two specific bodypart detectors, i.e. human upper bodies and legs, combinedin a multisensor Bayesian tracking framework. We presenta fully integrated and freely available processing pipeline,using the widely used Robot Operating System (ROS), todetect and track humans using a mobile robot. Apart from theability to directly feed into reactive human-aware navigationapproaches like , this detection and tracking frameworkis used to create qualitative representations of the interac-tion , to facilitate online interaction learning approachesvia a Hidden Markov Model (HMM) based representation ofthese qualitative states.
Detecting walking or standing pedestrians is a widelystudied field due to the advances in autonomous cars androbots. To this end, many successful full-body or partlyoccluded body detectors have been developed, e.g. , .Most of these detectors suffer from high computational costswhich is why currently used approaches, like the so calledupper body detector , rely on the extraction of a RegionOf Interest (ROI) to speed up detection. Therefore, our
Fig. 2. Conceptual overview of the system architecture. The number ofdetectors is variable due to the modular design of the tracker and its abilityto merge several detections from different sensors.
framework employs this upper body detector for the real-time detection of walking or standing humans. The seconddetector used is a leg detector based on work by  and hasbecome a standard ROS component for people perception.Like detectors, human tracking is an important part of aperception system for human spatial movement. Hence, avariety of tracking systems using multisensor fusion havebeen introduced, e.g. . We use a probabilistic real-timetracking framework which, in its current implementationof the processing pipeline, relies on the fusion of the twomentioned sensors and an Extended or Unscented Kalmanfilter to track and predict the movements of humans .However, the tracker itself does not rely on a specific detectorfor input and is very modular in design.
The presented human perception pipeline (see Fig. 2) isused to facilitate our interaction learning framework whichrelies on previous work where we introduced a qualitative,probabilistic framework to represent HRSI  using a Qual-itative Trajectory Calculus (QTC) . We utilised HMMsto represent QTC state chains and classify several differenttypes of HRSI encounters observed from experiments. Inthis paper we present a fully integrated and automatisedprocessing pipeline for the detection and tracking of peoplein close vicinity to a robot (see Fig. 1), and the onlinecreation of QTC state chains, enabling the classification andlearning of HRSI using our HMM representation. We showthe functionality of our framework in a small experimentusing an autonomous human-sized mobile robot in a realworld office environment, relying on the robots on-boardsensors instead of an external motion capture system likein the mentioned previous work.
Everything presented here is available online. A conciselist of packages and installation instructions can be found athttp://lcas.lincoln.ac.uk/cdondrup.
In this section we present our integrated system, consistingof the perception pipeline including the upper body detectorand the tracker, the qualitative spatial representation module,and the library to create the QTC state chains.
Robots use a range of sensors to perceive the outsideworld, enabling them to reason about its future state and plan
Fig. 3. Our Scitos G5 robot equipped with two RGB-D sensors and alaser scanner in our open office environment.
their actions. Our robot (see Fig. 3) used in the experimenthas three main sensors, i.e. a Asus Xtion RGB-D camera, aSick s300 laser, and the odometry of its non-holonomic base.In the following we are presenting the two detectors based onthe RGB-D and laser scanner, respectively. Example outputcan been seen in Figure 1.
The so-called upper body detector uses a template andthe depth information of a RGB-D sensor to identify upperbodies (shoulders and head), designed to work for closerange human detection using head mounted cameras .This first approach was based on stereo outdoor data; anintegrated tracking system using a Kinect like RGB-D sensorand the mentioned detector was introduced in . To reducethe computational load, this upper body detector employs aground plane estimation or calculation to determine a Regionof Interest (ROI) most suitable to detect upper bodies of astanding or walking person. The actual depth image is thenscaled to various sizes and the template is sliding over theimage trying to find matches. This detector works in realtime, meaning 25fps which corresponds to the frame rateof the Asus Xtion. We implemented the detector and theground plane estimation/calculation into ROS and will useit as one of the inputs for our tracker. The main advantageof this detector, compared to full body detectors, is that thecamera on our robot is mounted in 1.72m hight which onlyallows it to see upper bodies in normal corridor or roomsettings, like offices or flats, due to the restrictions in spaceand the field of view of the Asus camera.
In addition to the RGB-D base detector we also usea laser based leg detector which was initially introducedin . Using laser scanners for people perception is popularin mobile robotics because most currently used platformsprovide such a sensor which also has a wider field of viewthan a camera and is less dependent on ambient conditions.Arras et al. define a set of 14 features for the detection oflegs which uses, e.g. the number of beams, the circularity, theradius, mean curvature, and the mean speed, to only namea few. These features are used for the supervised learning
t1 t2 t1t2, t3t3