Real-Time Multisensor People Tracking for Human ?· Real-Time Multisensor People Tracking for Human-Robot…

Download Real-Time Multisensor People Tracking for Human ?· Real-Time Multisensor People Tracking for Human-Robot…

Post on 30-Jun-2018




0 download

Embed Size (px)


<ul><li><p>Real-Time Multisensor People Trackingfor Human-Robot Spatial Interaction*</p><p>Christian Dondrup1, Nicola Bellotto1, Ferdian Jovan2 and Marc Hanheide1</p><p>Abstract All currently used mobile robot platforms are ableto navigate safely through their environment, avoiding staticand dynamic obstacles. However, in human populated environ-ments mere obstacle avoidance is not sufficient to make humansfeel comfortable and safe around robots. To this end, a largecommunity is currently producing human-aware navigationapproaches to create a more socially acceptable robot behaviour.A major building block for all Human-Robot Spatial Interactionis the ability of detecting and tracking humans in the vicinityof the robot. We present a fully integrated people perceptionframework, designed to run in real-time on a mobile robot. Thisframework employs detectors based on laser and RGB-D dataand a tracking approach able to fuse multiple detectors usingdifferent versions of data association and Kalman filtering. Theresulting trajectories are transformed into Qualitative SpatialRelations based on a Qualitative Trajectory Calculus, to learnand classify different encounters using a Hidden Markov Modelbased representation.</p><p>We present this perception pipeline, which is fully imple-mented into the Robot Operating System (ROS), in a smallproof of concept experiment. All components are readily avail-able for download, and free to use under the MIT license, toresearchers in all fields, especially focussing on social interactionlearning by providing different kinds of output, i.e. QualitativeRelations and trajectories.</p><p>I. INTRODUCTION</p><p>Currently used mobile robots are able to navigate safelythrough their environment, avoiding not only static but alsodynamic obstacles. An important aspect of mobile robotshowever is the ability to navigate and manoeuvre safelyaround humans [1]. Treating them as dynamic obstaclesand merely avoid them is not sufficient in those situationsdue to the special needs and requirements of humans tofeel safe and comfortable when interacting with robots.Human-Robot Spatial Interaction (HRSI), is the study ofjoint movement of robots and humans through space and thesocial signals governing these interactions. It is concernedwith the investigation of models of the ways humans androbots manage their motions in vicinity to each other. Ourwork aims to equip a mobile robot with understanding ofHRSI situations and enable it to act accordingly.</p><p>Recently, the robotics community started to take the dy-namic aspects of human obstacles into account, e.g. [2] andcurrently a large body of research is dedicated to answer the</p><p>*The research leading to these results has received funding from the Euro-pean Communitys Seventh Framework Programme under grant agreementNo. 600623, STRANDS.</p><p>1Christian Dondrup, Nicola Bellotto and Marc Hanheide are with theSchool of Computer Science, University of Lincoln, LN6 7TS Lincoln, UKcdondrup, nbellotto,</p><p>2Ferdian Jovan is with the School of Computer Science, University ofBirmingham, B15 2TT Birmingham, UK</p><p>Fig. 1. Example output of the ROS based people perception pipeline,showing the tracker and detector results in rviz.</p><p>more fundamental questions of HRSI to produce navigationapproaches which plan to explicitly move on more sociallyacceptable and legible paths [3]. The term legible hererefers to the communicative or interactive aspects ofmovements which previously has widely been ignored inrobotics research. To realise any kind of these HRSI appli-cations, the basic challenge is the detection and tracking ofhumans in the vicinity of the robot considering the robotsmovement, varying ambient conditions, and occlusion [4].Due to the importance of these applications, there are severalsolutions to the problem of detection and tracking of humansor body parts, as shown for example in these surveys [5],[6]. In our work, we are focusing on two specific bodypart detectors, i.e. human upper bodies and legs, combinedin a multisensor Bayesian tracking framework. We presenta fully integrated and freely available processing pipeline,using the widely used Robot Operating System (ROS), todetect and track humans using a mobile robot. Apart from theability to directly feed into reactive human-aware navigationapproaches like [7], this detection and tracking frameworkis used to create qualitative representations of the interac-tion [8], to facilitate online interaction learning approachesvia a Hidden Markov Model (HMM) based representation ofthese qualitative states.</p><p>Detecting walking or standing pedestrians is a widelystudied field due to the advances in autonomous cars androbots. To this end, many successful full-body or partlyoccluded body detectors have been developed, e.g. [9], [10].Most of these detectors suffer from high computational costswhich is why currently used approaches, like the so calledupper body detector [11], rely on the extraction of a RegionOf Interest (ROI) to speed up detection. Therefore, our</p></li><li><p>Fig. 2. Conceptual overview of the system architecture. The number ofdetectors is variable due to the modular design of the tracker and its abilityto merge several detections from different sensors.</p><p>framework employs this upper body detector for the real-time detection of walking or standing humans. The seconddetector used is a leg detector based on work by [12] and hasbecome a standard ROS component for people perception.Like detectors, human tracking is an important part of aperception system for human spatial movement. Hence, avariety of tracking systems using multisensor fusion havebeen introduced, e.g. [13]. We use a probabilistic real-timetracking framework which, in its current implementationof the processing pipeline, relies on the fusion of the twomentioned sensors and an Extended or Unscented Kalmanfilter to track and predict the movements of humans [14].However, the tracker itself does not rely on a specific detectorfor input and is very modular in design.</p><p>The presented human perception pipeline (see Fig. 2) isused to facilitate our interaction learning framework whichrelies on previous work where we introduced a qualitative,probabilistic framework to represent HRSI [8] using a Qual-itative Trajectory Calculus (QTC) [15]. We utilised HMMsto represent QTC state chains and classify several differenttypes of HRSI encounters observed from experiments. Inthis paper we present a fully integrated and automatisedprocessing pipeline for the detection and tracking of peoplein close vicinity to a robot (see Fig. 1), and the onlinecreation of QTC state chains, enabling the classification andlearning of HRSI using our HMM representation. We showthe functionality of our framework in a small experimentusing an autonomous human-sized mobile robot in a realworld office environment, relying on the robots on-boardsensors instead of an external motion capture system likein the mentioned previous work.</p><p>Everything presented here is available online. A conciselist of packages and installation instructions can be found at</p><p>II. OVERVIEW</p><p>In this section we present our integrated system, consistingof the perception pipeline including the upper body detectorand the tracker, the qualitative spatial representation module,and the library to create the QTC state chains.</p><p>A. Detectors</p><p>Robots use a range of sensors to perceive the outsideworld, enabling them to reason about its future state and plan</p><p>Fig. 3. Our Scitos G5 robot equipped with two RGB-D sensors and alaser scanner in our open office environment.</p><p>their actions. Our robot (see Fig. 3) used in the experimenthas three main sensors, i.e. a Asus Xtion RGB-D camera, aSick s300 laser, and the odometry of its non-holonomic base.In the following we are presenting the two detectors based onthe RGB-D and laser scanner, respectively. Example outputcan been seen in Figure 1.</p><p>The so-called upper body detector uses a template andthe depth information of a RGB-D sensor to identify upperbodies (shoulders and head), designed to work for closerange human detection using head mounted cameras [11].This first approach was based on stereo outdoor data; anintegrated tracking system using a Kinect like RGB-D sensorand the mentioned detector was introduced in [16]. To reducethe computational load, this upper body detector employs aground plane estimation or calculation to determine a Regionof Interest (ROI) most suitable to detect upper bodies of astanding or walking person. The actual depth image is thenscaled to various sizes and the template is sliding over theimage trying to find matches. This detector works in realtime, meaning 25fps which corresponds to the frame rateof the Asus Xtion. We implemented the detector and theground plane estimation/calculation into ROS and will useit as one of the inputs for our tracker. The main advantageof this detector, compared to full body detectors, is that thecamera on our robot is mounted in 1.72m hight which onlyallows it to see upper bodies in normal corridor or roomsettings, like offices or flats, due to the restrictions in spaceand the field of view of the Asus camera.</p><p>In addition to the RGB-D base detector we also usea laser based leg detector which was initially introducedin [12]. Using laser scanners for people perception is popularin mobile robotics because most currently used platformsprovide such a sensor which also has a wider field of viewthan a camera and is less dependent on ambient conditions.Arras et al. define a set of 14 features for the detection oflegs which uses, e.g. the number of beams, the circularity, theradius, mean curvature, and the mean speed, to only namea few. These features are used for the supervised learning</p><p></p></li><li><p>k l</p><p>t1 t2 t1t2, t3t3</p><p>Fig. 4. Example of HRSI encoded in QTC, with human k and robot l.The respective QTC state chain is ( 0 0)t1 : k moves straight towardsl ( 0 0) and l moves straight towards k ( 0 0), ( 0 0 0)t2 : kcontinues its movement while l is stationary ( 0 0 0), ( + 0)t3 : kapproaches and moves to the right ( 0 + 0) while l stays stationary. For amore comprehensive and detailed description of QTC for HRSI please referto [8].</p><p>of a set of weak classifiers using recorded training data.The AdaBoost [17] algorithm is employed to turn theseweak classifiers into a strong classifier, detecting legs inlaser range data. The approach was evaluate in various officeand corridor settings which makes it ideal for most indoorrobotics environments. The implementation of the detectoris part of the official ROS people stack1 and is also used tofeed into our tracker.</p><p>B. Tracker</p><p>To use the wealth of information provided by a robotequipped with multiple sensors, we refrain from using purelyvision based trackers like the one introduced by Hosseiniet al.2 from which we extracted the upper body detector and employ a solution for Bayesian tracking originallyimplemented in [14]. This tracker is available in its currentimplementation from [18] (see Fig. 5(b)) and allows tonatively combine multiple sensors and is not dependent onthe frame rate of any one detector. Bellotto et al. showedthat their Bayesian tracker, based on an Unscented KalmanFilter, achieves comparable results to a Sampling ImportanceResampling particle filter in several people tracking scenar-ios, although it is computationally more efficient in termsof estimation time. In the current implementation differenttracking configurations can be used by defining the noiseparameters of the constant velocity model to predict humanmotion, to e.g. compensate for loss of detection, as presentedin [19], and the fixed frame observation models (one for eachdetector). A gating procedure is applied using a validationregion relative to the target, based on the chosen noiseparameters, for each new predicted observation in order toreduce the chance of assigning false positives and wrongobservations [20]. New detections are then associated to thecorrect target using a Nearest Neighbour (NN) associationalgorithm, suitable for computationally less powerful robotsystems, or a more sophisticated Nearest Neighbour JointProbabilistic Data Association (NNJPDA), which is morereliable but also less efficient regarding computation time. Ifno suitable target could be found, the detections are stored</p><p>1 tracker bey Hosseini et al. has been ported and split up into several</p><p>ROS modules and is available from our github repository or debian packageserver, see Sec. II-E.</p><p>(a) The two used detectors. Greensphere: leg detector, green cube andred box in image: upper body detec-tor.</p><p>(b) The tracker output. Overlaid redand blue figure: position of trackedhuman, green arrow: orientation.</p><p>(c) A human moving around therobot. Showing his/her current posi-tion and the path since the start ofthe tracking.</p><p>(d) The complete path of the humanwalking around the robot. Humanleft the field of view of the laser andis not tracked any more.</p><p>Fig. 5. The visualisations of the detector and tracker outputs using rviz.The leg detector is a standard ROS component, the upper body detector wasported to ROS and tracker was implemented by us.</p><p>and eventually used to create a new track if they are stableover a predefined time frame. For a detailed description ofthe tracking and association algorithms, please refer to theoriginal paper [14] and other relevant work [21].</p><p>C. Spatial Interaction Learning</p><p>The output of the tracking framework, i.e. the position,velocity, and orientation of the tracked humans, can eitherbe directly used for reactive human-aware navigation like theROS implementation of layered costmaps [7], or for onlinespatial interaction learning. In previous work, we focused ona learning approach representing HRSI utilising QualitativeSpatial Relations (QSR) [8], i.e. a Qualitative TrajectoryCalculus (QTC) [15]. This is used to qualitatively representthe relative spatial movement of a human (k) and a robot (l)in relation to one another (see Fig. 4).3 The 2D movementof these two agents k and l is represented in a 4-tuple(a b c d) where a and b can assume the values {,+, 0}for moving apart, towards the other, or being stable withregards to the last position and where c and d can assumethe values {,+, 0} for moving to the left, right, or alongthe connecting line between the two agents. Hence, a and cdescribe the movement of k and b and d the movement of lin relation to each other. This allows to abstract from metricspace and to focus on the essence of the interaction.</p><p>We presented a Hidden Markov Model (HMM) repre-sentation of these QTC state chains in [8] used to classifyand reason about HRSI situations. This probabilistic modelenables us to represent actual sensor data by allowing for</p><p>3The variant of QTC we are referring to here is QTCC21.</p><p></p></li><li><p>/to_pose_array/leg_detector /people_tracker/positions</p><p>/people_tracker/positions/upper_body_detector/bound...</p></li></ul>


View more >