[Lecture Notes in Computer Science] Computer Vision and Graphics Volume 8671 || Gaze-Driven Object Tracking Based on Optical Flow Estimation


    Gaze-Driven Object Tracking Based on Optical Flow Estimation

    Bartosz Bazyluk and Radoslaw Mantiuk

    West Pomeranian University of Technology, Szczecin
    Faculty of Computer Science and Information Technology
    Zolnierska Str. 49, 71-210 Szczecin, Poland
    {bbazyluk,rmantiuk}@wi.zut.edu.pl

    Abstract. To efficiently deploy eye tracking within gaze-dependent image analysis tasks, we present an optical flow-aided extension of the gaze-driven object tracking technique (GDOT). GDOT assumes that objects in a 3-dimensional space are fixation targets and, with high probability, computes the fixation direction towards the target observed by the user. We investigate whether this technique remains efficient for video footage in 2-dimensional space, in which the targets are tracked by an optical flow technique with the inaccuracies characteristic of this method. In the conducted perceptual experiments, we assess the efficiency of gaze-driven object identification by comparing the results with reference data in which the attended objects are known. The GDOT extension reveals higher errors than in 3D graphics tasks but still outperforms typical fixation detection techniques.

    1 Introduction

    Gaze tracking is a powerful technique that can be applied to visual saliency analysis. It can indicate the regions of an image that are most frequently attended by a human observer. Gaze data captured by eye trackers show how visual space is scanned by the constantly changing gaze to extract information and build the conscious image of a 3-dimensional scene. Gaze tracking could be successfully applied in many computer vision applications, for example in the analysis of advertisement visibility in movie clips or in gaze-driven image segmentation. However, it is disappointing that neither display devices nor image analysis methods make full use of this property of the human visual system (HVS). We argue that the main reason for this is the low accuracy of eye-tracking systems.

    Eye trackers capture the gaze direction indicated by the physical pose of the eye, in particular the location of the pupil centre [1]. The actual fixation direction (the direction consistent with the intention of the observer) is determined with the aid of gaze data that change over time. Typical fixation detection algorithms are prone to accuracy errors of up to two degrees of visual angle [3], which is equivalent to an 80-pixel distance between the reference fixation point watched by the observer and the fixation point captured by the device (value estimated for a typical 22-inch display with a resolution of 1680x1050 pixels, seen from a distance of 65 cm).

    L.J. Chmielewski et al. (Eds.): ICCVG 2014, LNCS 8671, pp. 84-91, 2014.
    © Springer International Publishing Switzerland 2014
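The 80-pixel figure for a two-degree error can be reproduced with a short calculation. The sketch below is ours, not the authors'; it assumes square pixels, a panel whose physical aspect ratio matches its 1680x1050 resolution, and a viewer centred in front of the screen. The function name is hypothetical.

```python
import math

def visual_angle_to_pixels(angle_deg, distance_cm, diagonal_in, res_x, res_y):
    """Convert a visual angle to an on-screen distance in pixels.

    Assumes square pixels and a flat screen viewed head-on, with the
    angle centred on the line of sight.
    """
    # Physical pixel pitch from the diagonal and the resolution's aspect ratio.
    diagonal_cm = diagonal_in * 2.54
    width_cm = diagonal_cm * res_x / math.hypot(res_x, res_y)
    pixel_pitch_cm = width_cm / res_x
    # Extent on the screen subtended by the angle at the given viewing distance.
    extent_cm = 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)
    return extent_cm / pixel_pitch_cm

px = visual_angle_to_pixels(2.0, 65.0, 22.0, 1680, 1050)
print(f"2 deg at 65 cm is about {px:.0f} px")  # roughly 80 pixels, as stated in the text
```

Each pixel of such a display is about 0.28 mm wide, while two degrees at 65 cm span about 2.27 cm, which gives the order of magnitude quoted above.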

    In this work we propose a gaze-tracking system that combines gaze data with information about the image content to improve the accuracy and stability of contemporary eye trackers. General-purpose fixation detection techniques are not suitable for tracking moving objects, which the eyes follow in smooth pursuit movement [9]. To overcome this limitation we use the gaze-driven object tracking (GDOT) technique [2], which clearly outperforms the standard fixation detection techniques in this respect. GDOT treats distinct objects in a scene as potential fixation targets and, with high probability, computes the fixation direction towards the target observed by the viewer. This technique has been successfully applied to depth-of-field effect simulation [10], to tone mapping [6], and as an auxiliary controller in a computer game [2].

    In this paper we present a novel application of GDOT, in which it is used to identify visually attended objects in a TV broadcast or other moving pictures. A set of targets (e.g. players in a football game) is tracked by a sparse optical flow-aided method. We use the output of this system as a set of potential attention targets that allows us to compute attention-related statistics, e.g. an estimate of the most attended player in the game. The basic difference between the original algorithm and our extension is that GDOT assumes perfect knowledge of the target locations, whereas the optical flow approach may introduce inaccuracies in the target positions. We also take into account cases in which targets leave the camera's field of view during the footage and must be temporarily or permanently removed from the list of potential targets. We test whether this deterioration in data quality can be compensated for by the regular GDOT routines. We also compare the achieved results with the efficiency of typical fixation protocols.
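Once every frame has been assigned an attended target, a statistic such as the most attended player reduces to counting. The sketch below assumes a hypothetical per-frame list of target IDs (None where no target was selected) and measures attention as a share of frames; weighting by frame duration or by raw gaze samples would be equally plausible, and the text does not fix this choice.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

def attention_shares(labels: Sequence[Optional[str]]) -> Dict[str, float]:
    """Fraction of frames in which each target was selected as attended.

    `labels` has one entry per frame: the ID of the target chosen by the
    gaze-to-target assignment, or None when no target was chosen.
    """
    counts = Counter(label for label in labels if label is not None)
    total = len(labels)
    return {target: n / total for target, n in counts.items()}

def most_attended(labels: Sequence[Optional[str]]) -> Optional[str]:
    """ID of the target with the largest share (None if no target was ever chosen)."""
    shares = attention_shares(labels)
    return max(shares, key=shares.get) if shares else None

# Hypothetical per-frame output for a six-frame clip.
frames = ["ball", "ball", "player_7", None, "ball", "player_7"]
print(attention_shares(frames))  # {'ball': 0.5, 'player_7': 0.3333333333333333}
print(most_attended(frames))     # ball
```

Frames without a selected target still count in the denominator, so the shares describe the whole clip rather than only the frames in which some target was attended.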

    Sect. 2 introduces the main concept of our work, i.e. an optical flow-aided attention tracking system. In Sect. 3 we describe the performed experiments, and we then discuss their results in Sect. 4. The paper ends with Conclusions.

    2 Gaze-Attended Target Identification

    An overview of the proposed tracking system is shown in Fig. 1. The optical flow module supplies GDOT with the current positions of the potential attention targets, and the eye tracker supplies the current gaze positions. The task of GDOT is to select the most likely target of attention at the current time instance.
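The per-frame selection can be sketched as a hidden Markov model update built from the ingredients listed in the GDOT Algorithm paragraph of this section: a position term, a velocity term compared against low-pass-filtered gaze velocity, their sum as evidence, and a 95% probability of staying on the same target. Everything else is our assumption: the Gaussian fall-offs, the sigma values, exponential smoothing as the low-pass filter, and forward filtering as the decoding strategy (the original GDOT [2] may decode the state sequence differently). The function names are hypothetical.

```python
import numpy as np

def evidence(gaze, gaze_vel, positions, velocities, sigma_p=40.0, sigma_v=100.0):
    """Per-target evidence: position term + velocity term (summed, not multiplied).

    gaze, gaze_vel: shape (2,); positions, velocities: shape (N, 2).
    Both terms use an assumed Gaussian fall-off; sigma_p (pixels) and
    sigma_v (pixels/s) are illustrative, not values from the paper.
    """
    d_pos = np.linalg.norm(positions - gaze, axis=1)
    d_vel = np.linalg.norm(velocities - gaze_vel, axis=1)
    p_pos = np.exp(-d_pos**2 / (2.0 * sigma_p**2))
    p_vel = np.exp(-d_vel**2 / (2.0 * sigma_v**2))
    return p_pos + p_vel + 1e-12  # small floor avoids an all-zero vector

def hmm_step(belief, ev, stay=0.95):
    """One forward (filtering) step of an HMM with one state per target."""
    n = belief.size
    switch = (1.0 - stay) / (n - 1) if n > 1 else 0.0
    trans = np.full((n, n), switch)    # trans[i, j] = P(target j now | target i before)
    np.fill_diagonal(trans, stay)
    posterior = (belief @ trans) * ev  # predict with sticky transitions, then weight
    return posterior / posterior.sum()

def low_pass(prev_vel, raw_vel, alpha=0.1):
    """Exponential smoothing of raw gaze velocity (a stand-in for the low-pass filter)."""
    return (1.0 - alpha) * prev_vel + alpha * raw_vel

positions = np.array([[100.0, 100.0], [400.0, 100.0]])  # target A, target B
velocities = np.array([[200.0, 0.0], [0.0, 0.0]])       # A moves right, B is static
belief = np.array([0.5, 0.5])
for _ in range(5):  # gaze follows A
    belief = hmm_step(belief, evidence(np.array([110.0, 105.0]), np.array([190.0, 0.0]),
                                       positions, velocities))
# One gaze sample lands on B, but the smoothed gaze velocity still matches A.
belief = hmm_step(belief, evidence(np.array([400.0, 100.0]), np.array([190.0, 0.0]),
                                   positions, velocities))
print(int(np.argmax(belief)))  # 0: target A is kept
```

Summing the two terms lets a target stay plausible when only one of them agrees with the gaze, and the sticky transition matrix suppresses the rapid target switching that a per-frame argmax would produce.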

    Proposed Optical Flow-Based Tracking Extension. To use the GDOT technique for effective tracking, a set of potential attention targets must first be provided. In our proposed solution, an expert who has knowledge about the scope of the study is responsible for identifying and marking the targets. Since our goal is to choose targets that are semantically important for the study and can attract the observer's attention, general-purpose automated feature extraction methods cannot be used in their place. In this work we use replayable finite video clips as stimuli. This allows us to introduce a semi-manual, computer vision-aided framework for designating targets and tracking their movement. In a football match, the expert would select game-related visible items such as the ball, the players and the referees, on-screen information such as the score and the TV station logo, as well as other visual attractors such as advertisements if they are of interest in the study. By marking their positions in the frames in which they first appear, an automated tracking procedure can be initiated.

    Fig. 1. The design of our gaze-driven target identification system supplied by the optical flow object tracking
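Because targets are marked only from the frame in which they first appear, and may later leave the camera view, the set of candidates passed to GDOT changes from frame to frame. A minimal sketch of assembling that set follows; the `Target` record, its per-frame position dictionary and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]

@dataclass
class Target:
    """A hypothetical expert-marked target and its tracked positions."""
    name: str
    track: Dict[int, Point] = field(default_factory=dict)  # frame index -> (x, y) in pixels

def candidates_for_frame(targets: List[Target], frame: int,
                         width: int, height: int) -> Dict[str, Point]:
    """Return positions of the targets that can be attended in `frame`.

    A target is skipped when it has no position for this frame (not yet
    marked, or lost and not recovered) or when its position lies outside
    the visible image area.
    """
    candidates = {}
    for target in targets:
        pos = target.track.get(frame)
        if pos is None:
            continue
        x, y = pos
        if 0 <= x < width and 0 <= y < height:
            candidates[target.name] = pos
    return candidates

targets = [
    Target("ball", {10: (640.0, 360.0)}),
    Target("player_7", {10: (-12.0, 300.0)}),  # has left the frame on the left
    Target("referee", {11: (200.0, 150.0)}),   # first marked in frame 11
]
print(candidates_for_frame(targets, 10, 1280, 720))  # {'ball': (640.0, 360.0)}
```

A target that returns into view simply reappears in the candidate set, which covers the temporary removal case; permanent removal corresponds to a track that has no further entries.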

    Sparse optical flow is estimated to follow the movement of the marked targets in frame space. To accomplish this task, a frame-to-frame Lucas-Kanade method is used [8]. This well-known algorithm treats tracked features as single pixels whose displacement between two consecutive frames is assumed to be approximately constant within their local neighbourhoods. Each pixel of the neighbourhood contributes one equation, so an overdetermined system is usually produced. It is then solved with the least-squares method.
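To make the least-squares step concrete, here is a single-level, single-iteration Lucas-Kanade estimate for one feature point in NumPy. It is an illustration under simplifying assumptions (grayscale frames, small sub-pixel motion, a window fully inside the image), not the implementation used in the paper; practical trackers, such as OpenCV's `calcOpticalFlowPyrLK`, add image pyramids and iterative refinement.

```python
import numpy as np

def lucas_kanade_step(prev, curr, x, y, half=7):
    """Estimate the displacement (u, v) of the pixel at (x, y) between two frames.

    Every pixel in the (2*half+1)^2 window contributes one brightness-constancy
    equation Ix*u + Iy*v = -It; the overdetermined system is solved by least squares.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    Iy, Ix = np.gradient(prev)  # central differences: axis 0 is y, axis 1 is x
    It = curr - prev            # temporal derivative between the two frames
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)

# Synthetic test: a Gaussian blob moved by (+0.5, +0.25) pixels.
yy, xx = np.mgrid[0:64, 0:64].astype(np.float64)

def blob(cx, cy, s=6.0):
    return 255.0 * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * s * s))

u, v = lucas_kanade_step(blob(32.0, 32.0), blob(32.5, 32.25), 32, 32)
print(f"u = {u:.3f}, v = {v:.3f}")  # close to the true shift (0.5, 0.25)
```

The linearisation only holds for small displacements, which is one reason why fast motion and motion blur, discussed in the next paragraph, break the tracking.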

    However, tracking visual features with simple optical flow analysis can be problematic in real-world videos. Such stimuli are often prone to artefacts and general quality issues. A low frame rate together with slow shutter speeds can lead to motion blur, which affects sparse flow estimation [7]; moreover, rapid movement and the natural tendency of moving objects to rotate and occlude each other often lead to unrecoverable gaps in the automatic tracking process. Therefore we found it necessary to implement in our software a way to insert key frames manually, which helps tracking at these critical moments (e.g. when a quickly moving ball is occluded by players and its tracking cannot be recovered automatically). The expert can also perform basic linear interpolation of an object's path to cater for transient disappearance periods, during which the user's attention can still be bound to the target despite its temporary lack of visibility [9]. These tasks are performed off-line; however, a full real-time implementation is also feasible as long as a reliable automated tracking method can be provided. We consider this part of our future work.
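The linear interpolation of a target's path across an occlusion can be sketched as follows. The sketch assumes a hypothetical track stored as a dictionary from frame index to (x, y) pixel position, with known positions on both sides of the gap (for example, the last automatically tracked frame and a manually inserted key frame); the function name is ours.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

def interpolate_gap(track: Dict[int, Point], start: int, end: int) -> None:
    """Fill the frames strictly between two key frames by linear interpolation.

    `track` maps frame index -> (x, y); `start` and `end` must already hold
    positions. The dictionary is updated in place.
    """
    (x0, y0), (x1, y1) = track[start], track[end]
    span = end - start
    for f in range(start + 1, end):
        t = (f - start) / span  # fraction of the way from start to end
        track[f] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

track = {10: (100.0, 50.0), 14: (140.0, 90.0)}  # ball hidden in frames 11-13
interpolate_gap(track, 10, 14)
print(track[12])  # (120.0, 70.0)
```

Linear motion is only a rough model of a ball or player during occlusion, but for short gaps it keeps the target available as a candidate while the viewer's attention may still be bound to it.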

    GDOT Algorithm. A small distance between the eye-tracker gaze point and the target is the strongest indicator that an object is attended. The GDOT technique models this indicator as a position probability that decreases with the Euclidean distance between the gaze point and the target. If position consistency becomes unreliable, the object can still be tracked if its velocity is consistent with the smooth pursuit motion of the eye. The velocity computed directly from scan paths (gaze data) is an unreliable measure, as it is dominated by eye-tracker noise, saccades and the tremor of the eye. Fortunately, smooth pursuit motion operates over longer time periods and thus can be extracted from the noisy data with the help of a low-pass filter. The probabilities of position and velocity consistency are summed rather than multiplied, because it is likely that either the position or the velocity is inconsistent even if the target is attended. A naive target tracking algorithm could compute the probability of each target at a given point in time (or at a given frame) and choose the object with the highest probability. This, however, would result in frequent shifts of tracking from one target to another. An elegant way to penalise implausible interpretations of the data without excessive time delay is to model attention shifts between targets as a hidden Markov process. Each state in such a process corresponds to tracking a particular target. Since a fixation cannot be too short, the probability of staying in a particular state is usually much higher than the probability of moving to another state (in our case 95% vs. 5%). This way the eye is more likely to keep tracking the s