
Auton Robot · DOI 10.1007/s10514-007-9057-4

Scanning and tracking with independent cameras—a biologically motivated approach based on model predictive control

Ofir Avni · Francesco Borrelli · Gadi Katzir · Ehud Rivlin · Hector Rotstein

Received: 19 June 2007 / Accepted: 25 September 2007
© Springer Science+Business Media, LLC 2007

Abstract This paper presents a framework for visual scanning and target tracking with a set of independent pan-tilt cameras. The approach is systematic and based on Model Predictive Control (MPC), and was inspired by our understanding of the chameleon visual system.

We make use of the most advanced results in MPC theory in order to design the scanning and tracking controllers. The scanning algorithm combines information about the environment and a model for the motion of the target to perform optimal scanning based on stochastic MPC. The target tracking controller is a switched control combining smooth pursuit and saccades. Min-max and minimum-time MPC theory is used for the design of the tracking control laws.

E. Rivlin · O. Avni (✉)
Department of Computer Science, Technion—Israel Institute of Technology, Haifa, Israel
e-mail: [email protected]

E. Rivlin
e-mail: [email protected]

F. Borrelli
Department of Mechanical Engineering, University of California, Berkeley, CA 94720-1740, USA
e-mail: [email protected]

G. Katzir
Department of Biology, Oranim—University of Haifa, Tivon, Israel

H. Rotstein
Rafael—Advanced Defence Systems Ltd. and Department of Electrical Engineering, Technion—Israel Institute of Technology, Haifa, Israel

We make use of the observed chameleon behavior to guide the scanning and tracking controller design procedures, the way they are combined and their tuning.

Finally, simulation and experimental validation of the approach on a robotic chameleon head composed of two independent pan-tilt cameras is presented.

Keywords MPC controllers · Target tracking · Target search · Biologically motivated

1 Introduction

Environment visual scanning (EVS) is a critical task for living creatures and artificial systems alike. EVS refers to the process of visual exploration of the environment used to identify relevant events and detect threats or targets. In nature, solutions to EVS are diverse and range from: (i) non-moving eyes on top of a movable neck as in the barn owl, through (ii) coupled movements of the eyes as in humans, to (iii) independent eye movements as seen in chameleons. Overall, one can identify three models which represent different levels of eye vs. neck movements and correspond to the three examples mentioned above: the owl-like, the human-like and the chameleon-like models. In the world of robotics, common solutions are based on the first two models. For example, a number of systems have been implemented using two cameras that are either fixed or verged (Bernardino and Santos-Victor 1999; Vijayakumar et al. 2001; Asada et al. 2000). These two models have a number of important advantages in terms of facilitating visual calculations such as estimating depth from stereo. Yet, by constraining the relative motion of the cameras they sacrifice search efficiency due to a large overlap between the views of the two cameras.


Fig. 1 The robotic head and a chameleon head

Chameleons are probably the best-known example of independent eye movements. Eye movements in chameleons are considered unique. The eyes are laterally positioned and can move independently, each over a wide range, while scanning different sections of the environment. Independent eye movements are also seen in fish; a well-studied example is the sandlance, which has an optical system and eye movements similar to the chameleon (Pettigrew et al. 1999). Scanning in chameleons is performed by saccades, i.e. large, independent and rapid movements of the eye from one location to another (Land 1999). The range of movement of each eye is almost 180° in the pan axis and about 90° in the tilt axis. Recent unpublished work (Avni et al. 2007) indicates that the global scanning strategy of the chameleon is based on a "negative correlation" principle. More specifically, if one eye scans forward, then with high probability the other will point backwards.

There are several advantages in using independent eye movements: (i) fast scanning: it is faster to rotate only the eye and not the whole head, with a consequent energy saving; (ii) camouflage: eye movements are less likely to be detected; (iii) efficient scanning: a given area can be covered in less time. However, the chameleon model is not the "best solution" in nature and, like any solution, it presents disadvantages as well. The main disadvantages include the need for a massive set of muscles to drive the eyes, and the larger size of the eyes, which take up space in the skull.

The objective of our research is to develop, model and control a robot head based on the principles that govern chameleon eye movements, without imposing a-priori constraints on the relative motion of the cameras. This permits basic science studies of the chameleon's visually guided behavior and allows us to build a robot which inherits the aforementioned advantages of the chameleon's solution to environment scanning and tracking. Figure 1 depicts our system as compared to a chameleon and Fig. 2 depicts a block diagram of the system architecture.

We make use of standard control theory and tools, and the most advanced results in Model Predictive Control

Fig. 2 System architecture

(MPC) theory in order to design the scanning and tracking controllers. The resulting procedure will be systematic and model based, satisfy the system constraints, be easy to tune and be real-time implementable. One main contribution of this paper resides in the way a large set of existing tools and advanced theoretical results have been combined in order to tackle the scanning and tracking control design problem.

We use a hierarchical approach. At the scanning level, a simple probabilistic target motion model is used and the scanning controller is based on a stochastic MPC design. At the tracking level, linear dynamical models of the camera and target are used to predict the target and camera dynamics and to solve (i) a constrained min-max MPC game during smooth pursuit and (ii) a minimum-time MPC problem during saccades.


To the best of our knowledge, this paper presents the first real-time application of a constrained dynamical min-max MPC game to a robotic system with "fast" sampling times. At both levels we make use of the observed chameleon behavior to guide the control design procedure and the control tuning. Finally, we describe the architecture of the robotic head and present simulations and experimental tests obtained by implementing the proposed control approach on the developed robotic head.

A short introduction to both the scanning and control problems and to related literature follows in Sects. 1.1, 1.2 and 1.3. In Sect. 2 we present in more detail the principles that govern eye movements in chameleons. In Sect. 3 the high-level scanning algorithm is formulated and solved. In Sect. 4 the smooth pursuit controller, the saccade controller and the way they are combined are discussed. Section 5 contains simulation and experimental results of the proposed approach. Finally, some conclusions are presented in Sect. 6.

1.1 Oculomotor tasks

In this paper, it is assumed that the vision system has two main goals:

1. Scanning the environment autonomously while searching for targets or threats.

2. Acquiring and tracking targets or threats once they are detected.

It is worth mentioning that in real-life systems the oculomotor system has additional objectives related to the two mentioned above; an obvious example is range estimation to the prey, which is required for a successful capture. We will focus only on the two aforementioned goals and define three main tasks needed to complete them:

Environment Visual Scanning (EVS). EVS refers to the act of moving optical devices, be they eyes in chameleons or cameras in a robotic system, in order to explore the surrounding environment. EVS requires computing a strategy to guarantee that part or all of the 360 degrees of the environment is covered according to some specified priorities. The strategy is then transformed into commands to the optical device.

Image Processing (IP). IP is a computational task that extracts information and features from the optical images that are relevant for the operation of the system. The IP activities relevant for the work considered in this paper are target detection and visual target tracking. Here, visual target tracking refers to the image processing task of tracking.

Target Tracking (TT). TT refers to the act of moving the optical devices in order to keep a detected target within the field of view or a subset of it.

This paper will not dwell on the IP task and will instead concentrate on EVS and TT, which involve electro-mechanical motion.

1.2 Environment visual scanning

The EVS task includes a decision phase, in which the region of the environment to be explored is selected, and a low-level motion phase, in which the exploration is executed by the system in a controlled manner. EVS can be considered a special case of Search Theory, originally developed during World War II in the context of antisubmarine warfare. Much of the early work on search theory was summarized by L. Stone in his classic book (Stone 1992). The topic has received considerable attention in robotics, given its relevance for planning search and rescue missions. For instance, (Eagle 1984; Eagle and Yee 1990) studied the problem of searching for a moving target when the path of the searcher is constrained, and (Lau et al. 2005) presented a search strategy for targets in an indoor environment assuming stationary targets only. The approach followed in this paper builds on the results of (Wong et al. 2005), where the authors deal with the problem of coordinating the effort of several autonomous unmanned vehicles in the context of search and rescue. The scanning algorithm is designed so as to maximize the probability of discovering any existing target in the environment surrounding the robot. A Markov process is used to model the probability of appearance and the motion of a target in the environment within the detection radius of the robot. The resulting problem is formulated and solved as a Dynamic Stochastic Programming problem (see, e.g., Ross 1983; Bertsekas 2000).

1.3 Target tracking

Active target tracking is one of the fundamental tasks in active vision. In general, good active target tracking can be achieved by combining smooth pursuit and saccade movements (Rivlin and Rotstein 2000; Sutherland et al. 2001). In smooth pursuit, the target is tracked continuously, with the camera motion controlled so that the image of the target remains inside a predefined window. Saccades are triggered either by large tracking errors (including the case where the image of the target exits the tracking window) or by the request of moving the camera to a new region of attention. Usually, the purpose of saccades is to quickly re-orient the cameras in order to allow smooth pursuit. The smooth pursuit/saccades scheme is a natural consequence of the foveated structure of biological eyes found in many creatures in nature, including the chameleon. As shown by (Rivlin and Rotstein 2000; Sutherland et al. 2001), foveated vision, and hence smooth pursuit/saccades, can be explained in terms of optimal control theory. As a continuation of that work, Model Predictive Control is proposed in this paper as a more suited approach to design each control loop and their interaction. Model Predictive Control (MPC) is a control technique that uses the model of the plant to predict the future evolution of the system (Mayne et al. 2000).


Based on this prediction, at each time step t a certain performance index is optimized under operating constraints with respect to a sequence of future input moves. The first of such optimal moves is the control action applied to the plant at time t. At time t + 1, a new optimization is solved over a shifted prediction horizon. This leads to a feedback control policy.
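As a schematic illustration of this receding-horizon principle, the following Python sketch shows the generic loop; solve_finite_horizon and plant_step are hypothetical placeholders (any finite-horizon optimizer and plant simulator), not functions from the paper.

```python
import numpy as np

def mpc_loop(x0, horizon, n_steps, solve_finite_horizon, plant_step):
    """Generic receding-horizon loop: optimize over the horizon, apply the
    first move, then re-optimize from the newly measured state."""
    x = x0
    applied = []
    for t in range(n_steps):
        # Optimize the performance index over the prediction horizon.
        u_sequence = solve_finite_horizon(x, horizon)
        u0 = u_sequence[0]          # only the first optimal move is applied
        x = plant_step(x, u0)       # plant evolves; disturbances enter here
        applied.append(u0)
    return np.array(applied)
```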

In the past, MPC was limited to relatively "slow" processes as found in the chemical industry, since solving an optimization problem in real time is computationally expensive. Recent developments have shown that it is possible to solve the optimization problem off-line as a function of all feasible states. In particular, for linear and piecewise-linear systems, the feedback solution derived from an MPC scheme with a linear or quadratic performance index and linear constraints is shown to be a piecewise linear function of the states (Borrelli et al. 2005; Bemporad et al. 2002). Therefore, the on-line MPC implementation reduces to a simple lookup-table evaluation, which allows the use of MPC in real time also for "fast" systems such as the one considered in this manuscript. The interested reader is referred to the manual of the Multi-Parametric Toolbox (Kvasnica et al. 2004) for a quick introduction to the topic. A detailed description of the latest results on the topic can be found in (Borrelli 2003).

2 The visual system of the chameleon

Chameleons are arboreal lizards that eat mostly insects. They are rather slow and have developed, as compensation, a unique visual system combined with a specialized tongue in order to catch prey. The chameleon's eyes are positioned laterally on the head and are able to move independently over a wide range. This enables the chameleon to quickly scan the environment while improving its camouflage, since only its eyes are moving.

Once prey is detected, the chameleon directs its head and two eyes toward the prey, and shoots its long, sticky tongue at it. The tongue is launched at high speed, produced using a catapult system (de Groot and van Leeuwen 2004), and applies a suction force to pull the prey (Herrel et al. 2000). The tongue has to be shot to the exact distance of the prey. Undershooting will result in the prey not being caught, and overshooting is not desirable for several reasons: the tongue might be damaged if it hits a hard surface positioned behind the prey, the prey may be pushed out of reach by the tongue, and shooting the tongue involves spending considerable energy. In (Harkness 1977), the author found that chameleons estimate the distance of the target by accommodation cues (although both eyes are pointed towards the prey and a stereo method might also be used). In the experiments, negative and positive lenses were placed in front of the eyes of chameleons. The resulting tongue shots fell closer to or farther from the position of the prey according to the diopter of the lenses. Chameleons were also able to catch prey with one eye covered, although with less accuracy.

The chameleon's eye has the basic structure of the simple chambered eye common to all vertebrates. The retina is composed solely of cone photoreceptors (Armengol et al. 1988) (mentioned by Bowmaker et al. 2005) and has a fovea. In (Bowmaker et al. 2005) it was found that chameleons (at least the four species that were examined) can see four different colors: green, light-blue, dark-blue and purple-UV. The authors of (Bowmaker et al. 2005) pointed out that all four species had similar photoreceptors although they live in different areas and conditions. The lens, unlike in other vertebrates, has a negative power (Ott and Schaeffel 1995). This results in a magnification of the image on the retina, which enables a more accurate measurement of image focus and probably supports more accurate distance measurement from accommodation cues. Before shooting the tongue, the eyes alternate between in-focus and out-of-focus states (Ott et al. 1998), which may serve to perform distance estimation by focus/defocus algorithms.

As mentioned in the introduction, in order to scan the environment chameleons perform large, independent saccadic movements of the eyes. Once a target is detected, the head axis is directed towards it and both eyes fixate it. If the target starts to move, the chameleon tracks it using a combination of head movements and eye movements (Ott 2001; Flanders 1985). In (Ott 2001) the author highlights that while the saccadic movements are independent during the search for prey, they are synchronous during the tracking of the prey. The chameleon is the only vertebrate known to switch from independent to synchronous saccades.

In a complementary work (Avni et al. 2007) we are studying eye movements in the chameleon. In that work, head pose¹ and eye movements are analyzed by computer vision algorithms from video movies of chameleons. In each experiment, a chameleon is positioned on a stick above two mirrors and video movies are recorded from two cameras in front-upper and rear-upper positions. See Fig. 3 for an illustration of the experimental setup. Features on the head of the chameleon are tracked and the head pose is calculated using a model-based approach. The eyelids are also tracked, whether they appear in the real image or in the mirrors, and the direction of the eyes is calculated relative to the pose of the head based on the geometric model of the eye.

¹Pose refers to both the position and the orientation, i.e., six degrees of freedom.


In Fig. 4, one image from a movie is presented with the analyzed data plotted on top of it. Preliminary results indicate that chameleons use "negative correlation" between the positions of the eyes while scanning the environment. That is, when one eye is pointing in the forward direction, with high probability the other eye will point in the backward direction, and vice versa.

Fig. 3 Tracking chameleon's eyes setup: a chameleon is placed on a stick above two mirrors, and is videoed using two cameras located above the chameleon in front and rear positions. This setup is equivalent to a six-camera setup, and has two real views and four mirrored views of the chameleon

Of special relevance to the scanning strategy is the field of view (FOV) of the chameleon's eyes. However, to the best of our knowledge no study has so far addressed this aspect of their visual system. Although we cannot state the exact FOV, we can observe that the expected FOV should be smaller than that of other vertebrates, given that chameleons have a negatively powered lens. In addition, the protruding eyes are covered by a protective skin (merged eyelids) within which only a small aperture is opened for vision. This inevitably restricts the FOV.

3 Optimal environment visual scanning

In this section we review the mathematical foundations of standard Bayesian filter design (F.L. 1986) and introduce the model and the problem solved for designing the scanning algorithm. Both the estimation procedure and the control design approach are well known and extensively used in the literature (Stengel 1994; F.L. 1986; Anderson and Moore 1995). Our mathematical formulation and solution procedure follow the footsteps of (Wong et al. 2005). The main differences reside in the model used, its parameters and its assumptions.

The environment is partitioned into M regions, with each region representing a location that the system can explore. An additional region, denoted by region 0, is introduced to represent the set of points outside the range of detection.

Fig. 4 Tracking eye movements of the chameleon: (a) the chameleon in the experiment setup; (b) the chameleon with the eyes' directions marked. The chameleon is placed on a stick above two mirrors. The left figure shows the chameleon in the experiment setup. In the right image, extracted data is plotted on the image. White crosses mark the location of tracked features and large white cones indicate the line of sight of each eye. Note that the direction of the left eye was found by using the rear camera


In the model, a target appears in the system's range of detection when it moves from region 0 to a region different from 0, and disappears when it moves from a region different from 0 to region 0. Time is also discretized into time steps, which are assumed to be long enough to guarantee the completion of a search step. A search step comprises the actions of moving the camera to a new location and inspecting this new location by means of visual processing.

A single-target search problem with multiple cameras is considered. The vector $x_k \in \{0,1,\ldots,M\}$ is the target location at time $k$. The $i$-th camera observation at time $k$ is $y^i_k = (p, b)$, where $p = y^i_k(1) \in \{0,1,\ldots,M\}$ is the camera location at time $k$ and $b = y^i_k(2) \in \{D, \bar{D}\}$ is a binary variable denoting "detection" ($D$) and "no-detection" ($\bar{D}$) events.

The Bayesian filter estimates the target probability density function (PDF) $p(x_k \mid y_{[1:k]})$ given the sequence of all the observations made up to time step $k$ by the $N_c$ cameras, $y_{[1:k]} = \{y^1_{[1:k]}, \ldots, y^{N_c}_{[1:k]}\}$. This is done by using a prediction and an update equation (F.L. 1986; Wong et al. 2005). The prediction model

$$p(x_k \mid y_{[1:k-1]}) = A\, p(x_{k-1} \mid y_{[1:k-1]}), \qquad (1)$$

predicts forward to time $k$ the system state at time $k-1$, where $A \in \mathbb{R}^{(M+1)\times(M+1)}$ is a probabilistic Markovian model with elements $a_{ij} = p(x_k \mid x_{k-1})$. The update equation updates the predicted target PDF based on the new set of observations $y_k$ at time $k$. For each camera $i$ we denote the target state observation probability by $p(y^i_k \mid x_k)$. Given the current state, we assume all the observations independent and use Bayes' rule to update the predicted target PDF as

$$p(x_k \mid y_{[1:k]}) = K\, p(x_k \mid y_{[1:k-1]}) \prod_{i=1}^{N_c} p(y^i_k \mid x_k), \qquad (2)$$

where $K$ is a normalization factor that keeps the PDF summing to 1:

$$K = \sum_{j \in \{0,1,\ldots,M\}} p(j \mid y_{[1:k-1]}) \prod_{i=1}^{N_c} p(y^i_k \mid j). \qquad (3)$$

In this work we use a simple model for the target state observation probability $p(y^i_k \mid x_k)$ based on detection rates. The detection rate denotes the chances of finding the target; it is a characteristic of each region and of each camera (if heterogeneous cameras are used) and depends on several factors. For example, in an area with uniform color the detection rate is expected to be high, while in areas with a large variety of colors and dense texture the detection rate will be lower. The estimation of the detection rate for each region is complex and influenced by several parameters. We assume that it can be computed by means of computer vision algorithms and denote by $d^k_l$ the detection rate of region $l \in \{0,\ldots,M\}$ for camera $k$, $k = 1,\ldots,N_c$. We assume only false-negative errors and no false-positive ones. That is, if no target is located in region $j$, the inspection of that region will always result in a "no-detection" answer. Therefore $p(y^i_k \mid x_k)$ is given as

$$p(y^i_k = (l, D) \mid x_k = j) = \begin{cases} 0 & j \neq l, \\ d^i_l & j = l \end{cases} \qquad (4)$$

and

$$p(y^i_k = (l, \bar{D}) \mid x_k = j) = \begin{cases} 1 & j \neq l, \\ 1 - d^i_l & j = l. \end{cases} \qquad (5)$$

Based on (1), (2), (3) and the model (4)–(5), we compactly write the target PDF equation in the case that all cameras fail to find the target:

$$p(x_k \mid y_{[1:k]}, y^i_k(2) = \bar{D}\ \forall i) = K A\, p(x_{k-1} \mid y_{[1:k-1]}) \prod_{i=1}^{N_c} \Big( \mathrm{ev}(x_k \neq y^i_k(1)) + (1 - d^i_{x_k})\, \mathrm{ev}(x_k = y^i_k(1)) \Big), \qquad (6)$$

where $\mathrm{ev}(\text{statement}) = 1$ if the statement is true and $0$ otherwise. Note that (6) links the PDF at time $k-1$ with the PDF at time $k$ if all the new observations $y_k$ at time $k$ result in a no-detect event. With abuse of notation we denote by $g$ the function in (6) and compactly write (6) as

$$P_f(x_k) = g\big(P_f(x_{k-1}), y^1_k(1), \ldots, y^{N_c}_k(1)\big). \qquad (7)$$
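A minimal numerical sketch of the estimator defined by (1)–(7) is given below (Python). The function names, the uniform prior and the Markov matrix are illustrative choices of ours, not values from the paper.

```python
import numpy as np

def predict(pdf, A):
    """Prediction step (1): propagate the target PDF with the Markov model A."""
    return A @ pdf

def update_no_detection(pdf_pred, camera_regions, detection_rates):
    """Update (6): every camera inspected its region and reported 'no detection'.

    pdf_pred        : predicted PDF over regions 0..M
    camera_regions  : region index inspected by each camera
    detection_rates : detection_rates[i][l] = d_l^i for camera i, region l
    """
    likelihood = np.ones_like(pdf_pred)
    for i, region in enumerate(camera_regions):
        # Only the inspected region is down-weighted by (1 - d); elsewhere a
        # miss is certain, because only false negatives are modeled.
        likelihood[region] *= 1.0 - detection_rates[i][region]
    posterior = pdf_pred * likelihood
    return posterior / posterior.sum()   # renormalize so the PDF sums to 1

# Illustrative example with M = 4 regions plus region 0 (outside detection range).
M = 4
pdf = np.ones(M + 1) / (M + 1)                   # uniform prior (placeholder)
A = np.full((M + 1, M + 1), 1.0 / (M + 1))       # placeholder Markov model
d = [np.array([0.0, 0.9, 0.9, 0.9, 0.9])] * 2    # two cameras, d_l^i = 0.9
pdf = update_no_detection(predict(pdf, A), camera_regions=[1, 3], detection_rates=d)
```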

Once the model equations have been derived, the search problem consists of determining, for each camera, the sequence of regions to explore which will optimize a certain team cost function. The team cost includes two terms, $c_l(i,j)$ and $R$, which are described next. For each pair of regions $(i,j)$, the cost $c_l(i,j)$ reflects the effort required to drive a camera from region $i$ to region $j$ plus the effort to scan region $j$. A cost of $\infty$ is set if the camera cannot move between $i$ and $j$. For example, in the case where each camera is constrained to guard its own hemisphere, the cost of moving the right camera to the left side will be $\infty$. On the other hand, the system gains a reward $R$ if a target is captured. Both $c_l(i,j)$ and $R$ are tuning parameters and their relative value will determine the system behavior. We formulate the following finite-horizon constrained optimal control problem:

$$\min_{v_t} \; \sum_{l \in \{1,\ldots,N_c\}} \sum_{k=t}^{t+N} \big( c_l(v^l_k, v^l_{k+1}) \big) + P_f(x_k, v^1_k, \ldots, v^{N_c}_k)\, R$$
$$\text{subject to} \qquad (8)$$
$$\text{model (7)},$$
$$v^1_t = y^1_t(1), \;\ldots,\; v^{N_c}_t = y^{N_c}_t(1)$$

where $v_t = \{v^1_t, \ldots, v^1_{t+N-1}, \ldots, v^{N_c}_t, \ldots, v^{N_c}_{t+N-1}\}$ is the vector of all camera locations over the control horizon $N$, with $v^j_k$ being the location of camera $j$ at time $k$.


In problem (8) the horizon length should be set equal to the maximum duration of a scanning session. However, solving problem (8) becomes prohibitive with increasing numbers of cameras and large $N$. For this reason we solve (8) by using a receding horizon policy. At a given time $t$, given the current PDF $P_f(x_t)$ and the current camera locations $y^1_t(1), \ldots, y^{N_c}_t(1)$, problem (8) is solved to obtain the sequence of optimal camera locations $v_t^*$. Only the first step of $v_t^*$ is implemented (i.e., the cameras are moved to $v^{1*}_{t+1}, \ldots, v^{N_c*}_{t+1}$) and at the next time step $t+1$ problem (8) is solved over the shifted horizon $t+1, \ldots, t+N+1$, starting from the newly estimated PDF $P_f(x_{t+1})$ and camera locations $y^1_{t+1}(1), \ldots, y^{N_c}_{t+1}(1)$.

The EVS approach described in this section is a systematic and model-based approach which can be extended in several directions. As an example, the target state $x$ could include the attitude and speed of the target. Also, the detection rates $d^l_i$ and/or the Markovian model $A$ could be updated at each time step (to become $d^l_i(t)$ and $A(t)$) based on new data. All this would not change the design and solution procedure. The main limitations of the proposed EVS approach reside in the oversimplified Markov model (1), in the "independent opinion pool" assumption in (2) and in the computational burden required to solve problem (8) for large horizons and numbers of cameras.
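For the small numbers of regions used later in the simulations, problem (8) can be attacked by plain enumeration within the receding-horizon policy just described. The sketch below (Python) does this for a single camera and a short horizon; the way the miss-probability term and the reward $R$ enter the cost is our simplified reading of (8), so the helper names and exact bookkeeping are assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

def plan_scan(pdf, cam_loc, A, d, cost, R, N):
    """Brute-force receding-horizon plan for a single-camera version of (8).

    pdf     : current target PDF over regions 0..M
    cam_loc : current camera region (the first element of the plan is fixed to it)
    A, d    : Markov prediction model and per-region detection rates
    cost    : cost[i][j] = c(i, j), with np.inf for disallowed moves
    R       : weight on the residual probability of not having found the target
    """
    M = len(pdf) - 1
    best_seq, best_val = None, np.inf
    for tail in itertools.product(range(1, M + 1), repeat=N):
        seq = (cam_loc,) + tail
        p, val = pdf.copy(), 0.0
        for k in range(N):
            val += cost[seq[k]][seq[k + 1]]
            if not np.isfinite(val):
                break
            # Propagate the PDF assuming a no-detection outcome at the visited region.
            p = A @ p
            p[seq[k + 1]] *= 1.0 - d[seq[k + 1]]
            val += p.sum() * R          # residual (unnormalized) miss probability
        if val < best_val:
            best_val, best_seq = val, seq
    return best_seq  # only best_seq[1] is executed before the plan is recomputed
```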

The choice of $A$, the detection rates $d^l_i$ for each camera $l$, the horizon length $N$, the costs $c_l(i,j)$ and the reward $R$ are all parameters which affect the overall behavior of the closed-loop system. An extensive study of how all these parameters affect the resulting scanning behavior goes beyond the scope of this paper. For instance, the work in (Bourgault et al. 2004) provides examples of realistic process models of type (2). In the simulations discussed in Sect. 5 we selected the model and all the parameters with minimal effort such that the resulting scanning algorithm would have two main characteristics: qualitatively similar to the one observed in chameleons and computationally attractive. For this reason, we used a coarse gridding of the target space and numerical values dictated by common sense with a minor iteration procedure.

4 Target tracking: smooth pursuit and saccade

In this section we describe the autonomous tracking controller design. In particular, we introduce the smooth pursuit controller, the saccade controller, and the way they are combined. We make use of (A) standard control theory and tools and (B) the most advanced results in MPC theory in order to design the tracking controllers. The resulting procedure is systematic and model based, satisfies the system constraints, is easy to tune and is real-time implementable. Its main contribution resides in the way a large set of existing tools and theoretical results have been combined in order to tackle the tracking control design problem and handle possible MPC infeasibility issues.

The smooth pursuit controller is based on the solution of a constrained min-max game (also called min-max MPC) (A), which is computed without gridding the state space (B) and which is combined with a model-based inversion for the compensation of system nonlinearities (A). It relies on the assumption that the target model is excited by an unknown and bounded disturbance which drives it away from the center of a given tracking window.

The saccade controller is an MPC constrained minimum-time controller (A), which is computed without gridding the state space (B) and by combining explicit solutions with different time horizons (B). It relies on the opposite assumption: it assumes that the target movement is known, without any disturbances, and tries to center the target in the tracking window as fast as it can.

We start by presenting the dynamical model of a single pan-tilt head. However, we remark that the control design procedure can be used for any linear camera and target model.

4.1 Active vision model

Consider a single pan-tilt head as depicted in Fig. 5a. We denote by $\psi$ and $\theta$ the pan angle and the tilt angle, respectively, and by $\dot\psi$ and $\dot\theta$ the pan and tilt speeds, respectively. By collecting the model variables, the head state vector can be denoted by $x_p = [\psi, \dot\psi, \theta, \dot\theta]^T$. We use a discrete-time linear time-invariant model with sampling time equal to the sampling time of the camera. With abuse of notation, the same notation is used for the corresponding discrete-time variables, $x_p = [\psi, \dot\psi, \theta, \dot\theta]^T$. The head dynamics are described by the model (Brogan 1991)

$$x_p(k+1) = A_p x_p(k) + B_p u(k) \qquad (9)$$

where $A_p$ is a $4 \times 4$ matrix, $B_p$ is a $4 \times 2$ matrix and $u$ is the $2 \times 1$ input vector to the local pan-tilt controllers. The matrices $A_p$ and $B_p$ are identified based on step-test data (Ljung 1999).

The target dynamics are modeled as a double integrator in the radial coordinate system. If $A_t$ and $B_t$ define the dynamics of the double integrator, and $x_t = [\psi_t, \dot\psi_t, \theta_t, \dot\theta_t]^T$ is the target state, then the target dynamics can be written as

$$x_t(k+1) = A_t x_t(k) + B_t v(k) \qquad (10)$$

where $v(k)$ is the acceleration of the target, which is unknown to the cameras. We define the global tracking error as $\Delta x = x_t - x_p$, and we separate it into the global tracking error in the pan axis, $\Delta\Psi = \Psi_t - \Psi_p$, and the global tracking error in the tilt axis, $\Delta\Theta = \Theta_t - \Theta_p$, where $\Psi = [\psi, \dot\psi]^T$ and $\Theta = [\theta, \dot\theta]^T$.
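For concreteness, a sampled double integrator of the kind used for the target model (10) (and, per axis, a plausible structure for the identified head model (9)) can be written down directly. The sampling time below is only an illustrative value taken from the camera frame rate mentioned in Sect. 5.

```python
import numpy as np

Ts = 1.0 / 30.0   # illustrative sampling time (30 Hz camera frame rate)

# Discrete double integrator for one axis: state [angle, angular speed],
# input = acceleration (the unknown target acceleration v(k) in (10)).
At_axis = np.array([[1.0, Ts],
                    [0.0, 1.0]])
Bt_axis = np.array([[0.5 * Ts**2],
                    [Ts]])

# Pan and tilt are decoupled, so the 4-state target model is block diagonal.
At = np.kron(np.eye(2), At_axis)   # states [psi, psi_dot, theta, theta_dot]
Bt = np.kron(np.eye(2), Bt_axis)   # inputs [v_pan, v_tilt]
```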

Page 8: Scanning and tracking with independent cameras—a ...Environment visual scanning (EVS) is a critical task for living creatures and artificial systems alike. EVS refers to the process

Auton Robot

Fig. 5 Pan-tilt model and coordinate systems: the axes marked with w define the world coordinate system. The axes marked with c define the coordinate system in the camera framework. The camera is pointing toward its X axis (and not toward the standard Z axis). The axes corresponding to the image height and width are the Z and Y axes, respectively. The pan and tilt axes are shown in (a). The measured tracking errors δψ and δθ are shown in (b)

We denote by $\delta\psi$ and $\delta\theta$ the measured tracking errors. It should be noticed that the measured tracking errors are referred to the camera coordinate system (Fig. 5b), which is different from the global coordinate system in our configuration. It should also be noticed that only the position tracking error is measured and that the tracking error speed is estimated from it. Given the measured tracking errors $\delta\psi$ and $\delta\theta$ and the current position of the camera $\psi$ and $\theta$, the target position in the world coordinate system can be calculated as

$$\psi_t = -\arcsin\!\left( {}^{w}T_z \Big/ \sqrt{{}^{w}T_x^2 + {}^{w}T_y^2} \right), \qquad \theta_t = \arctan\!\left( {}^{w}T_y / {}^{w}T_x \right)$$

where ${}^{w}T = [{}^{w}T_x, {}^{w}T_y, {}^{w}T_z]$ is a normal vector pointing toward the target in the world coordinate system. ${}^{w}T$ can be calculated from the line of sight in the camera coordinate system, ${}^{c}T$, through the equation

$${}^{w}T = R_y(\theta) \cdot R_z(\psi)\, {}^{c}T,$$

where $R_O(\alpha)$ is the rotation matrix around the axis $O$ by $\alpha$ radians. The vector ${}^{c}T$ is calculated in a similar way using $\delta\psi$ and $\delta\theta$:

$${}^{c}T = R_y(\delta\theta) \cdot R_z(\delta\psi) \cdot [1, 0, 0]^T.$$

The global tracking error can then be easily calculated by subtracting the current camera position, $\Delta\psi = \psi_t - \psi$. An identical procedure can be written for the tilt axis.

Clearly, the global tracking errors $\Delta\psi$ and $\Delta\theta$ do not depend linearly on the measured errors $\delta\psi$ and $\delta\theta$. In order to make use of a linear control design, we preprocess the measured errors with a nonlinear static inversion. In particular, after measuring $\delta\psi$ and $\delta\theta$, the global tracking errors $\Delta\psi$ and $\Delta\theta$ are computed by a non-linear function (denoted IK). The MPC controller receives the latter tracking errors as inputs. Since a minimization of $\Delta\theta$ implies the minimization of $\delta\theta$, we obtain a linear MPC with the same final objective. The aforementioned procedure is very common in the field of autonomous guidance and navigation (Enns et al. 1994) for compensating system nonlinearities and thus allowing for linear control design.
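A direct transcription of the IK computation for the pan axis might look as follows (Python). The rotation-matrix conventions follow the equations above, and the pan/tilt formulas are reproduced as printed, so signs and axis assignments should be checked against Fig. 5 before any reuse.

```python
import numpy as np

def Rz(a):
    """Rotation by a radians about the Z axis."""
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def Ry(a):
    """Rotation by a radians about the Y axis."""
    return np.array([[ np.cos(a), 0.0, np.sin(a)],
                     [ 0.0,       1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def global_pan_error(d_psi, d_theta, psi, theta):
    """Map measured (camera-frame) errors to the global pan tracking error."""
    cT = Ry(d_theta) @ Rz(d_psi) @ np.array([1.0, 0.0, 0.0])  # line of sight, camera frame
    wT = Ry(theta) @ Rz(psi) @ cT                             # same vector, world frame
    psi_t = -np.arcsin(wT[2] / np.sqrt(wT[0]**2 + wT[1]**2))  # target pan angle (as printed)
    return psi_t - psi                                        # global tracking error
```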

Figure 6 presents the system model. The large block on the left performs the control; the large block on the right is the plant. In the control block, the IK non-linear function computes the tracking errors $\Delta\psi$ and $\Delta\theta$ from the measured tracking errors $\delta\psi$ and $\delta\theta$ and the current head position $\psi_c$ and $\theta_c$. In the plant, the block marked P represents the local pan-tilt controllers and the pan-tilt engines. It receives the input from the MPC controllers and considers the target movements as a disturbance. The outputs of P are the tracking errors $\Delta\psi$, $\Delta\theta$ and the head position. However, through image processing one can measure only $\delta\psi$ and $\delta\theta$, the outputs of the static non-linear function $K_{im}$.

Besides the linearity of the problem, by using the global tracking errors rather than the measured ones we gain the separability of the control problem: the pan and tilt controllers can now be separated since they are dynamically decoupled.

The choice of second-order linear models for the camera and targets is based on the computational load required for the solution of the MPC design (presented in the next sections). The models capture the main system dynamics (including eventual camera oscillations as a function of system inputs) and are simple enough for real-time constrained min-max control design. The use of more detailed, and possibly nonlinear, models in the control design would improve the tracking performance but is prohibitive for real-time implementation even on the most advanced computational platforms. To the best of the authors' knowledge, at the time of this submission there is no implementation of constrained min-max nonlinear control for active visual tracking.


Fig. 6 System model described in Sect. 4.1—the large block on the left is the control. The large block to the right is the plant

4.2 Smooth pursuit control

We combine the camera states and target states into the following LTI dynamical model:

$$\begin{bmatrix} x_p(k+1) \\ x_t(k+1) \end{bmatrix} = \begin{bmatrix} A_p & 0 \\ 0 & A_t \end{bmatrix} \begin{bmatrix} x_p(k) \\ x_t(k) \end{bmatrix} + \begin{bmatrix} B_p \\ 0 \end{bmatrix} u(k) + \begin{bmatrix} 0 \\ B_t \end{bmatrix} v(k).$$

We apply a linear state transformation by replacing the target states with the tracking error states: $x = [x_p, \dot{x}_p, \Delta x, \Delta\dot{x}]$, where $\Delta x = x_t - x_p$ and $\Delta\dot{x} = \dot{x}_t - \dot{x}_p$. The resulting model is compactly written as

$$x(k+1) = A x(k) + B u(k) + E v(k). \qquad (11)$$

The goal of the smooth pursuit controller is to maintain the target inside a predefined tracking window. The acceleration $v(k)$ that drives the target can be seen as a disturbance that competes against the controller goal. The smooth pursuit controller is designed by solving a constrained min-max game where the disturbance is the opponent and the controller tries to minimize the worst-case performance index while satisfying the constraints for all possible disturbances. State constraints include the presence of the target within a pre-specified window and bounds on the position, variation and acceleration of the head angles. We design a min-max MPC controller by solving the following optimization problem

$$\min_{\substack{u_k \in \mathbb{U} \\ \text{s.t. } x_k \in \mathbb{X}\ \forall v_k \in \mathbb{V}}} \Bigg\{ \|R u_k\|_\infty + \max_{v_k \in \mathbb{V}} \Bigg\{ \|Q x_{k+1|k}\|_\infty + \cdots + \min_{\substack{u_{k+N-1} \in \mathbb{U} \\ \text{s.t. } x_{k+N-1} \in \mathbb{X}\ \forall v_{k+N-1} \in \mathbb{V}}} \Big\{ \|R u_{k+N-1}\|_\infty + \max_{v_{k+N-1} \in \mathbb{V}} \big\{ \|Q x_{k+N|k}\|_\infty \big\} \Big\} \Bigg\} \Bigg\} \qquad (12)$$

where $N$ is the prediction horizon, $Q \in \mathbb{R}^{n \times n}$ defines the weights on the states and $R \in \mathbb{R}^{m \times m}$ defines the weights on the control inputs. The vector $x_{k+j|k}$ denotes the predicted state at time $k+j$ starting from the state $x_{k|k} = x(k)$ and applying the input sequence $u_k, \ldots, u_{k+j-1}$ and the disturbance sequence $v_k, \ldots, v_{k+j-1}$ to the model (11). The sets $\mathbb{X}$, $\mathbb{V}$ and $\mathbb{U}$ are polyhedral sets that define constraints on the states, disturbances and inputs, respectively. In particular, the set $\mathbb{X}$ constrains the target position to be in a certain window and the set $\mathbb{V}$ defines the maximum amplitude of the target acceleration. The performance objective in (12) is standard in robust MPC design (Scokaert and Mayne 1998; Lee and Yu 1997): the weights $Q$ and $R$ are tuning knobs which can be used to select a desired closed-loop behavior.
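The paper computes the solution of (12) off-line in explicit form (next paragraph); purely to illustrate the min-max idea itself, the sketch below evaluates a one-step version of the game by brute force over a finite set of candidate inputs and the vertices of the disturbance set. All arguments are placeholders, and this grid-based evaluation is not the multi-parametric solution used in the paper.

```python
import numpy as np

def one_step_minmax(x, A, B, E, Q, R, u_candidates, v_vertices, state_box):
    """One-step min-max: choose the input whose worst-case cost over the
    disturbance vertices is smallest, rejecting inputs that can violate the
    state constraints for some disturbance."""
    lo, hi = state_box
    best_u, best_cost = None, np.inf
    for u in u_candidates:
        worst, feasible = -np.inf, True
        for v in v_vertices:
            x_next = A @ x + B @ u + E @ v
            if np.any(x_next < lo) or np.any(x_next > hi):
                feasible = False
                break
            worst = max(worst, np.linalg.norm(R @ u, np.inf)
                               + np.linalg.norm(Q @ x_next, np.inf))
        if feasible and worst < best_cost:
            best_cost, best_u = worst, u
    return best_u
```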

Solving the optimization problem (12) in real time might be challenging. An off-line solution to problem (12) can be computed as a function of the current state vector $x$ (Bemporad et al. 2003), i.e., $U^*(k) = f(x(k))$, with $U^* = [u^*_k, \ldots, u^*_{k+N-1}]$. In (Bemporad et al. 2003) it is shown that the state-feedback solution $f$ to problem (12) is a piecewise linear function of the states. That is, the state space is partitioned into polyhedra (Fig. 7) and in each polyhedron the optimal controller is a linear function of the states. Hence, no state-space gridding with subsequent approximation is involved in the computation process. At each time step, the on-line solution of (12) is reduced to the evaluation of $f$. This consists of searching for the polyhedron containing $x(k)$ and computing the corresponding linear function in order to obtain $U^*$. Once $U^*$ is obtained, the first input signal $u^*_k$ is applied, and the procedure is repeated over a shifted horizon. Note that $f$ is time-invariant and is computed off-line only once.
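The on-line evaluation of $f$ therefore reduces to a point-location problem followed by one affine evaluation, as in the following sketch (Python). The per-region data structure (H, K, F, g) is an assumed representation in the spirit of what multi-parametric MPC tools export, not an interface defined in the paper.

```python
import numpy as np

def evaluate_explicit_mpc(x, regions):
    """Evaluate a piecewise-affine explicit MPC law at state x.

    regions: list of dicts, each with
       'H', 'K' : polyhedron {z : H z <= K} defining the region,
       'F', 'g' : affine control law u = F z + g valid inside it.
    Returns the first optimal input, or None if x lies outside every region
    (i.e., the problem is infeasible at x).
    """
    for reg in regions:
        if np.all(reg['H'] @ x <= reg['K'] + 1e-9):   # small tolerance on the facets
            return reg['F'] @ x + reg['g']
    return None
```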


Fig. 7 Smooth-pursuit min-max MPC controller. State-space partition: the horizontal axis is the tracking error and the vertical axis is the tracking error speed. Axes units are steps and steps per second. Each step is roughly 0.05°

4.3 Saccade controller

When the smooth pursuit controller fails, or when one prefers to move the camera and scan another section, the saccade controller is used. In both cases we want to complete the task as fast as possible, hence the saccadic motion. For this reason the saccade controller, in general, requires a more accurate model of the target motion, as opposed to the smooth pursuit controller, which is a robust controller. In this work, the need for a more accurate model translates into the following assumption: the disturbance acting on the target is known and constant. This hypothesis will be better justified and understood later in this section. However, more accurate linear target models, if available, can be immediately included in our framework.

We design a saccade controller by computing the state-feedback closed-loop solution to the following MPC problem (Borrelli 2003):

$$\min_{U = u_k, \ldots, u_{k+N-1}} \; J = \sum_{j=0}^{N-1} \big( \|Q_s x_{k+j|k}\|_2 + \|R_s u_{k+j}\|_2 \big) + \|P_s x_{k+N|k}\|_2$$
$$\text{s.t.} \quad x_{k+1|k} = A x_{k|k} + B u_k + E v, \qquad (13)$$
$$x_{k+j|k} \in \mathbb{X}, \quad j = 1, \ldots, N,$$
$$y_{k+j|k} \in \mathbb{Y}, \quad j = 0, \ldots, N-1,$$
$$u_{k+j|k} \in \mathbb{U}, \quad j = 0, \ldots, N-1,$$
$$x_{k+N|k} \in T_{set}$$

where the disturbance is assumed constant over the horizon and $T_{set}$ is a terminal state constraint set. The performance objective in (13) is standard in MPC design (Mayne et al. 2000): the weights $Q_s$ and $R_s$ are tuning knobs which can be used to affect the closed-loop system.

The choice of $v$ is discussed in Sect. 4.4. The off-line solution is piecewise affine (Bemporad et al. 2002), i.e., the set of feasible states (states that can be brought to $T_{set}$ in $N$ steps) is partitioned into polyhedra and in each polyhedron the optimal control is a linear function of the states. In order to obtain the state-feedback solution to problem (13), we solve (13) off-line for all horizons $N \le N_{max}$ for a given $N_{max}$. This procedure yields $N_{max}$ controllers, one for each horizon:

$$u^*_N(k) = f_N(x(k)), \qquad N = 1, \ldots, N_{max}.$$

These controllers will be referred to as "explicit controllers". The feasible area of the explicit controllers $f_N(x(k))$ increases with $N$, as can be seen in Fig. 8, since as the horizon gets longer, state vectors which are farther from the terminal set can be driven into the terminal set.

We implement a feedback controller that brings $x$ to the terminal set $T_{set}$ in minimum time by using the explicit controllers $f_N(x(k))$ in a combined way. At each time step, for a given current state $x(k)$, we look for a feasible controller $f_N(x(k))$ with the smallest horizon $N$. Once this has been found, we implement the corresponding optimal control policy. It should be noted that checking whether or not the state vector is inside the feasible region of a controller is simple: the feasible region of a linear MPC controller is a polyhedron.

We remark that in the proposed scheme the system state reaches $T_{set}$ in minimum time, under no model mismatch.

There is a trade-off between the real-time computational complexity and the choice of $N_{max}$. The use of a long horizon $N_{max}$ allows a wider tracking capability (due to the increase of the feasible area). On the other hand, a long prediction horizon requires higher real-time computational capabilities, since such a controller has a large number of polyhedral regions. In our approach $N_{max}$ is chosen to be the largest possible given the computational limits of our system. If $x(k)$ is outside the feasible area of all the explicit controllers, we implement a different saccade controller (13), without a terminal set and tuned with a high weight on the position error compared to the other weights. This type of controller tries to minimize the position errors as fast as it can. When the errors become smaller, the explicit controllers $f_N(x(k))$ become feasible and are used until the state vector enters the terminal set.
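The minimum-time selection among the explicit controllers, including the fallback for states outside every feasible set, can be summarized as follows (Python); feasible_sets, controllers and fallback are assumed containers/callables holding the off-line data, not interfaces defined in the paper.

```python
import numpy as np

def saccade_control(x, feasible_sets, controllers, fallback):
    """Pick the explicit controller f_N with the smallest feasible horizon N.

    feasible_sets[N] : (H, K) of the polyhedral feasible set of f_N
    controllers[N]   : callable implementing f_N(x)
    fallback         : controller used when x is outside all feasible sets
    """
    for N in sorted(controllers):            # try the shortest horizons first
        H, K = feasible_sets[N]
        if np.all(H @ x <= K):               # membership test: one polyhedron check
            return controllers[N](x)
    return fallback(x)                       # high position-error weight, no terminal set
```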

As an alternative solution to reduce the complexity of the saccade controller (13), the method in (Grieder et al. 2003) could be applied. In (Grieder et al. 2003) the authors propose a minimum-time low-complexity controller. While this approach provides a wider tracking area with a lower controller complexity, the overall performance degrades compared to our approach.


Fig. 8 Saccade controllers. State-space partition for the controllers $f_N(x(k))$ for different prediction horizons $N$. The horizontal axis is the tracking position error and the vertical axis is the tracking speed error in the pan or tilt axis. The partitions depict a cut of the state space where the head position, head speed and target acceleration are zero. As the prediction horizon increases, the feasible area of the controller increases. The axes units are steps and steps per second. Each step is roughly 0.05°

4.4 Switching between smooth pursuit and saccade

Consider the case where the smooth pursuit controller is active and becomes infeasible (i.e., the target exits the tracking window). Since the smooth pursuit controller is robust to disturbances on the target motion, an acceleration larger than modeled must be acting on the target. In order to apply the saccade controller, we estimate this acceleration using the last and current states. With this estimate as input to the target model, we implement the saccade controller as described in the previous section. During the operation of the saccade controller, no image processing is performed and the control is based on the estimated states of the target. This is due to the fact that it is hard to find the target in the image when the relative speed between the camera and the target is high; any attempt at image processing would fail. This also preserves the ballistic nature of saccades seen in primates. When the saccade is completed, based on the estimated movement of the target, the system starts to look for the target in the center of the image. If the estimates were correct, the target will be found. If the target is not found, the system switches back to EVS mode and starts to scan the expected area of the target.
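The resulting dual-mode supervisor can be written as a small state machine, sketched below in Python. The window test, saccade-completion test, acceleration estimate and controller calls are left as assumed helper functions rather than interfaces defined in the paper.

```python
def tracking_supervisor(state, mode, in_window, saccade_done, target_found,
                        smooth_pursuit, saccade, estimate_accel, start_evs):
    """One step of the smooth-pursuit / saccade / EVS switching logic."""
    if mode == 'pursuit':
        if in_window(state):
            return 'pursuit', smooth_pursuit(state)
        # Target left the window: an acceleration larger than modeled is acting.
        # Estimate it from the recent states and launch a ballistic saccade.
        return 'saccade', saccade(state, estimate_accel(state))
    if mode == 'saccade':
        if not saccade_done(state):
            return 'saccade', saccade(state, None)    # no image processing during the saccade
        if target_found(state):
            return 'pursuit', smooth_pursuit(state)   # target re-acquired at the image center
        return 'evs', start_evs()                     # fall back to environment scanning
    return 'evs', start_evs()
```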

We remark that the proposed "dual-mode" approach does not present instability issues, since the switching is based on an external, measurable and uncontrollable variable (the relative position of the target with respect to the tracking window). Under nominal conditions the target is tracked, while increased switching between the two controllers is caused by a model mismatch in the target system dynamics (10) or in the target constraints $\mathbb{V}$. This phenomenon has been highlighted in the experimental tests presented in Sect. 5.

We conclude this section by highlighting that the proposed tracking control design has the advantages and the limitations of any MPC design. A fair comparison between the performance of the proposed approach and other controller design procedures would be very difficult to assess (since it is a function of many parameters and target motions), especially under environment and target uncertainties. This study goes beyond the scope of this paper. We refer the reader to (Mayne et al. 2000) for a description of the advantages of MPC design over traditional control design methodologies.

5 Simulations and experiments

In this section we first present simulations of the high-level scanning algorithm and then an experimental validation of the smooth pursuit and saccade controllers. The experiments are carried out on a robotic head designed and developed at the Computer Science department of the Technion, which is briefly described next.

5.1 System architecture

Inspired by the arrangement of the chameleon visual system, the robotic head is composed of two cameras, each installed on top of a pan-tilt unit. The two cameras can be moved independently, covering large sections of the environment thanks to the wide range of motion of the pan-tilt units. At the high level, an algorithm that scans the environment runs continuously. This algorithm is a heuristic algorithm based on the principles that govern global scanning in the chameleon.

The whole system runs on a standard PC. The cameras send images to the computer through a FireWire link. On the PC, the image processing module is responsible for finding the target in the images. The smooth pursuit controller and the saccade controller are also hosted on the PC: they receive information from the high-level scanning algorithm during scan operation and directly from the image processing and target location module during target tracking operation.


They send references to the pan-tilt unit controllers, which are located outside the PC and are responsible for performing the movements (see Fig. 2).

The head is composed of two Directed Perception PTU-D46-17.5 pan-tilt units. Two Point Grey Flea cameras are mounted on top of the pan-tilt units and operate at a frame rate of 30 Hz. The cameras are connected to a PC using a FireWire link and the controllers of the pan-tilt units are connected to the computer using a serial RS232 cable.

5.2 EVS algorithm simulations

In this section, simulation results for the algorithm introduced in Sect. 3 are presented. In order to facilitate the understanding of the results, we first present simulations for scanning a 1D circle and then 2D simulations.

5.2.1 1D simulations

The environment considered here is a 2D world where the system is in the middle of a circle and the cameras can be directed at angles of 0–360°. The circle is divided into 10 regions, where each camera can scan its own side and the forward and backward regions, as detailed in Fig. 9. The horizon of the MPC problem in (8) was set to three. A is set to represent an appearance probability of 0.3 and a probability of staying in the same region of 0.63. All regions have an exiting probability of 0.3 and a transition probability that is divided uniformly between the neighbors. The detection rate of all regions (with the exception of the "no-target" region) is set to 1. The cost function was set to c(i, j) = (i − j)² + 1 if the movement is possible, and c(i, j) = ∞ if the movement from i to j is not possible. The cost to move to and from the "no-target" region was also set to infinity, which means our system will always continue to search for targets.
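One possible way to encode the parameters of this 1D experiment is sketched below (Python); the exact split of the transition probabilities between neighbors is our reading of the text, so the matrix should be treated as illustrative rather than as the authors' exact values.

```python
import numpy as np

M = 10                                  # regions 1..10 on the circle, region 0 = "no target"
A = np.zeros((M + 1, M + 1))            # A[i, j] = p(x_k = i | x_{k-1} = j)

# Region 0: a target appears somewhere on the circle with probability 0.3.
A[0, 0] = 0.7
A[1:, 0] = 0.3 / M

# Circle regions: stay with p = 0.63, leave the detection range with p = 0.3,
# and split the remainder uniformly between the two neighbors.
for j in range(1, M + 1):
    left = 1 + (j - 2) % M
    right = 1 + j % M
    A[j, j] = 0.63
    A[0, j] = 0.3
    A[left, j] += 0.035
    A[right, j] += 0.035

assert np.allclose(A.sum(axis=0), 1.0)  # each column is a probability distribution

# Movement/scanning cost c(i, j) = (i - j)^2 + 1 for allowed moves; moves outside
# a camera's sector would be set to np.inf.
idx = np.arange(1, M + 1)
cost = (idx[:, None] - idx[None, :]) ** 2 + 1
```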

The results are shown in Fig. 10(a). The y-axis represents the angle of each camera from the forward direction. The right camera has positive angles and the left camera has negative angles. An angle of 180° in the right camera is of course the same as an angle of −180° in the left camera. We highlight that the positions of the cameras are negatively correlated, that is, when one camera is looking forward the other is usually in a backward position and vice versa. This behavior is the same as the one we observed in chameleons.

In Fig. 10(b) we present simulation results where the front regions have a larger probability of containing a target. In this case the intuitively optimal solution is to keep one camera (for example, the left camera) looking forward while the other scans its side (the right side in our example). Next, the cameras should switch, and the right camera should look forward while the left camera is free to scan the left side.

Fig. 9 1D simulation setup: the world is a circle with 10 regions. Region 0 represents the system. The right camera can be directed to regions [1–6]. The left camera can be directed to regions [1, 6–10]. The wide arrow represents the forward looking direction of the system. The two thinner arrows represent possible locations of the right and left cameras

This is also a behavior observed in chameleons by the authors. As can be seen in Fig. 10(b), the scanning pattern created by our model in this case is similar to the strategy mentioned above.

In the last simulation the control of four cameras is demonstrated. The setup is similar to the previous simulations. The 1D circle is now divided into twelve regions, denoted 1 to 12 as depicted in Fig. 11(a). Each camera inspects a quarter of the circle, with an overlap of one region between neighboring cameras. The cameras are denoted by their main direction: front-left, front-right, rear-left or rear-right. We choose the matrix A in order to model a preferred path for the target. The path is located at the front and passes through regions 2, 1, 12, 11 and 10. The target, with high probability, will enter and exit the environment in regions 2 or 10, and will travel along this path. The target is not able to exit the environment while it is on the path, nor is it able to enter the path in regions other than 2 and 10. In the other regions, the target may enter or exit the environment but with lower probability. The detection rate of all regions (with the exception of the "no-target" region) is set to 0.9. In Fig. 11(b) the simulation results are presented. As seen in the figure, the front-left and front-right cameras guard regions 2 and 10, where the target has a high probability of entering the environment. From time to time one of the front cameras inspects the other regions of the path, in case the target was not detected and is still in one of them.


Fig. 10 Simulation results for the search method in 1D: the right camera has positive angles and the left camera has negative angles. In a, all the regions are identical and target movement is uniformly distributed between them. The cameras start at the same location, directed forward, but quickly develop a negative correlation. In b, the front regions have a larger probability of containing the target, and they are covered by one of the cameras most of the time

Fig. 11 Simulation results for the search method with four cameras. a Setup of the four-camera simulation. Large arrows mark the regions with a high probability of the target entering or exiting the environment. b Plots of each camera. The plots from top to bottom represent, respectively, the rear-left, front-left, front-right, and rear-right cameras. The two front cameras guard the two regions where the target has a high probability of entering the environment, while the rear cameras scan the rear regions

in case the target wasn't detected and is still in one of them. The rear cameras cover the other regions, 3 to 9.
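To illustrate how a preferred path like the one above can be encoded in the transition matrix, the sketch below biases the target motion along regions 2–1–12–11–10 and restricts entry and exit mainly to the path end points. The specific probability values, the one-directional traversal of the path and the helper name are assumptions made for illustration; the paper states only the qualitative structure of A.

```python
import numpy as np

N_REGIONS = 12
NO_TARGET = N_REGIONS            # target outside the environment
PATH = [2, 1, 12, 11, 10]        # preferred front path, labels as in Fig. 11(a)

def build_path_transition(p_enter=0.3, p_stay=0.6, p_exit_other=0.2):
    """Transition matrix that funnels the target along a preferred path."""
    n = N_REGIONS + 1
    A = np.zeros((n, n))
    path0 = [r - 1 for r in PATH]          # 0-based indices of the path
    # On the path: advance towards region 10 (one direction only, for brevity);
    # the target cannot leave the environment except at the last path region.
    for k, i in enumerate(path0):
        A[i, i] = p_stay
        if k + 1 < len(path0):
            A[i, path0[k + 1]] = 1.0 - p_stay
        else:
            A[i, NO_TARGET] = 1.0 - p_stay
    # Off the path: stay, drift to a neighbor, or exit with lower probability.
    for i in range(N_REGIONS):
        if i in path0:
            continue
        left, right = (i - 1) % N_REGIONS, (i + 1) % N_REGIONS
        A[i, i] = p_stay
        A[i, NO_TARGET] = p_exit_other
        A[i, left] += (1.0 - p_stay - p_exit_other) / 2
        A[i, right] += (1.0 - p_stay - p_exit_other) / 2
    # From outside: enter mostly at the path end points (regions 2 and 10),
    # with a small probability of entering any off-path region.
    off_path = [i for i in range(N_REGIONS) if i not in path0]
    A[NO_TARGET, PATH[0] - 1] = 0.4 * p_enter
    A[NO_TARGET, PATH[-1] - 1] = 0.4 * p_enter
    for i in off_path:
        A[NO_TARGET, i] += 0.2 * p_enter / len(off_path)
    A[NO_TARGET, NO_TARGET] = 1.0 - p_enter
    return A

assert np.allclose(build_path_transition().sum(axis=1), 1.0)
```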

5.2.2 2D simulations

The environment considered in this section is three-dimensional. The scanning area is two-dimensional, since it is described by two parameters: the pan and tilt angles of a camera. A system of two cameras scans the upper hemisphere (in a realistic setup, the lower hemisphere is usually below the ground surface). Each camera can scan half of the hemisphere, where the front and back regions can be

scanned by both cameras. Simulation settings were identical to those in the 1D case, with a detection rate of 0.9 for all regions (with the exception of the “no-target” region). The results for the pan axis can be seen in Fig. 12(a) and for the tilt axis in Fig. 12(b). A negative correlation is clear from Fig. 12(c), which depicts the angle between the cameras during the search. This angle can range between 0◦ and 180◦. An angle of 0◦ indicates that the cameras point in the same direction, while an angle of 180◦ indicates that the cameras are directed in opposite directions. As can be seen, most of the time the angle between the cameras is larger than 90◦, which indicates a negative correlation between them.
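For reference, the angle plotted in Fig. 12(c) can be computed by converting each camera's pan and tilt angles into a unit viewing vector and taking the angle between the two vectors. The sketch below shows one such computation; the pan/tilt-to-vector convention is our assumption, since the exact parameterization of the gimbals is not restated here.

```python
import numpy as np

def view_vector(pan_deg, tilt_deg):
    """Unit viewing direction for given pan/tilt angles (degrees).

    Assumed convention: pan rotates about the vertical axis (0 = forward),
    tilt is the elevation above the horizontal plane.
    """
    pan, tilt = np.radians(pan_deg), np.radians(tilt_deg)
    return np.array([np.cos(tilt) * np.cos(pan),
                     np.cos(tilt) * np.sin(pan),
                     np.sin(tilt)])

def angle_between_cameras(pan_r, tilt_r, pan_l, tilt_l):
    """Angle (degrees, in [0, 180]) between the two viewing directions."""
    v1, v2 = view_vector(pan_r, tilt_r), view_vector(pan_l, tilt_l)
    cos_ang = np.clip(np.dot(v1, v2), -1.0, 1.0)
    return np.degrees(np.arccos(cos_ang))

# Example: right camera at pan 90, tilt 20; left camera at pan -90, tilt 10
print(angle_between_cameras(90, 20, -90, 10))   # approximately 150 degrees
```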


Fig. 12 Simulation results for the search method in 2D. a Pan axis of the two cameras. The right camera has positive angles and the left camera has negative angles. b Tilt axis. The upper plot is for the right camera and the lower plot is for the left camera. c Angle between the cameras. The angle is usually larger than 90◦, which indicates a negative correlation pattern

5.3 Smooth pursuit and saccade controllers: experimental validation

The smooth pursuit and saccade controllers were first simulated in Matlab and then experimentally validated on the robotic head. Below we present the results from the experiments on the robotic head.

Delay is one of the main concerns when dealing with visually guided active tracking. In our case the image arrives with a delay of almost one time step. During tracking, the target is located using normalized cross-correlation over the predefined window. Once the target is located, the tracking error and its rate of change are estimated, and the control signal is computed by finding the polyhedral region of the controller corresponding to the current state. This process is fast enough (about 1 ms for the control computation). Therefore the total delay in the system is considered to be one time step.
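Since the explicit MPC law is stored as a piecewise affine function over a polyhedral partition of the state space, the online computation reduces to a point-location search followed by one affine evaluation. The sketch below illustrates this lookup; the data layout and the function names are our own and not those used on the robotic head.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Region:
    """Polyhedral region {x : H x <= k} with its affine control law u = F x + g."""
    H: np.ndarray
    k: np.ndarray
    F: np.ndarray
    g: np.ndarray

def explicit_mpc_control(x, regions, tol=1e-9):
    """Evaluate a piecewise affine MPC law at state x.

    Finds the first region that contains x and applies its affine feedback.
    Returns None if x lies outside the partition (e.g. the tracking error
    has left the smooth-pursuit window and the saccade controller takes over).
    """
    for r in regions:
        if np.all(r.H @ x <= r.k + tol):
            return r.F @ x + r.g
    return None
```

A sequential scan over the regions is already fast; if needed, the point location can be accelerated further with a search tree over the partition.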

We have tested both controllers extensively in experiments. In this paper we report two simplified tracking tasks in which the system successfully tracks a target represented as a black circle and moving with double-integrator dynamics. Movies of more complex target tracking can be found in (Movie1 Online; Movie2 Online).

The unit of measure for the camera is a step, which corresponds approximately to 0.9 milliradians. The camera model (9) was identified based on open-loop input-output data sets at a rate of 150 Hz. The model has been validated up to camera speeds of 500 steps per second. In the smooth pursuit controller the following constraints and tuning were used (this parameter set and the saccade parameter set below are also collected in the configuration sketch that follows the two lists):

– Pan = [−3000, +3000] step
– Tilt = [−500, +500] step
– Pan and tilt speed = [−2800, +2800] step/s
– Pan and tilt acceleration = [−3000, +3000] step/s²
– Pan and tilt tracking window = [−20, +20] step
– Pan and tilt speed tracking window = [−300, +300] step/s
– Target acceleration in both axes = [−1200, +1200] step/s²
– Q: 2 for position error, 0.05 for speed error, zero for all other elements
– R = 0.4 for each axis
– N = 4

For the saccade controller, the following constraints and tuning were used:

– Pan and tilt = [−2000, +2000] step
– Pan and tilt speed = [−3300, +3300] step/s
– Pan and tilt acceleration = [−15000, +15000] step/s²
– Terminal set (pan and tilt) = [−1, +1] step
– Terminal set (pan and tilt speed) = [−10, +10] step/s
– Qs: 1000 for position error, 10 for velocity error and 0.01 for all other elements
– Rs = 0.01 for each axis
– Ps: 10000 for position error, 100 for velocity error, zero for all other elements
– N = 5
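For convenience, the two parameter sets above can be grouped into a single configuration structure, as in the sketch below. The field names are our own and the structure is purely illustrative; the values simply restate the lists.

```python
from dataclasses import dataclass

@dataclass
class SmoothPursuitConfig:
    # Symmetric bounds [-x, +x]; units as in the list above (step, step/s, step/s^2)
    pan_range: float = 3000
    tilt_range: float = 500
    speed: float = 2800
    acceleration: float = 3000
    tracking_window: float = 20
    tracking_window_speed: float = 300
    target_acceleration: float = 1200
    q_position: float = 2.0
    q_speed: float = 0.05
    r_per_axis: float = 0.4
    horizon: int = 4

@dataclass
class SaccadeConfig:
    pan_tilt_range: float = 2000
    speed: float = 3300
    acceleration: float = 15000
    terminal_position: float = 1
    terminal_speed: float = 10
    qs_position: float = 1000.0
    qs_velocity: float = 10.0
    qs_other: float = 0.01
    rs_per_axis: float = 0.01
    ps_position: float = 10000.0
    ps_velocity: float = 100.0
    horizon: int = 5
```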

In Fig. 13 the target moves along a sinusoidal path with constant amplitude and different frequencies. Target speed and acceleration can be obtained by differentiating the sinusoidal path and are not reported in this paper. Our objective is to show the tracking performance under uncertainties on the target acceleration bounds V in problem (12).
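As a point of reference, assume (purely for illustration, since the amplitude is not restated here) a target position of the form p(t) = A sin(2πωt) with amplitude A and frequency ω as in Fig. 13. Differentiating gives

\dot{p}(t) = 2\pi\omega A \cos(2\pi\omega t), \qquad \ddot{p}(t) = -(2\pi\omega)^2 A \sin(2\pi\omega t), \qquad \max_t |\ddot{p}(t)| = (2\pi\omega)^2 A,

so the peak target acceleration grows with the square of the frequency, which is why the higher-frequency runs exceed the chosen bound V and force saccades.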

When the frequency is low, the target acceleration is low and within the chosen V, and the smooth pursuit controller performs good tracking. When the frequency is high, the saccade controller is activated due to the high acceleration during the direction change, and brings the target back into the smooth-pursuit tracking window (Fig. 13).

Figure 14 depicts experimental results when the target is moving with a constant acceleration. Each experiment begins with an initialization phase that brings the target to a base speed. The base speed is maintained for


(a) ω = 0.7 Hz, (b) ω = 0.8 Hz, (c) ω = 0.9 Hz, (d) ω = 1 Hz

Fig. 13 Experimental results for a sinusoidal signal—each plot shows the target and camera position (in degrees) vs. time (in seconds) for a different frequency ω of the signal. The target is represented as a dashed line and the camera as a solid line. Saccades are indicated with a thicker line. The start of a saccade is marked with a triangle directly below or above the line. In d the system uses the saccade controller to bring the target back into the tracking window

0.25 seconds. This is represented by the phase of zero acceleration in the upper plots of Fig. 14. After this phase, the test acceleration is applied until the target exits the experiment area. As the acceleration increases, the rate of saccades needed to track the target increases as well. With accelerations larger than those reported in Fig. 14, the system could no longer find the target after the first saccade.
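For reference, the target motion used in these runs can be reproduced by a simple piecewise profile: accelerate to a base speed, hold it for 0.25 s, then apply the constant test acceleration. The sketch below generates such a profile; the base speed and the initial ramp are our assumptions, since only the hold time and the test accelerations are quoted above.

```python
import numpy as np

def constant_accel_profile(test_accel, base_speed=20.0, ramp_accel=60.0,
                           hold_time=0.25, t_end=3.0, dt=0.01):
    """Target acceleration/velocity/position profile (degrees and seconds).

    Phase 1: ramp up to base_speed; Phase 2: hold the base speed for
    hold_time (the zero-acceleration segment in Fig. 14); Phase 3: apply
    the constant test acceleration until the end of the run.
    """
    t = np.arange(0.0, t_end, dt)
    t_ramp = base_speed / ramp_accel
    accel = np.where(t < t_ramp, ramp_accel,
                     np.where(t < t_ramp + hold_time, 0.0, test_accel))
    velocity = np.cumsum(accel) * dt
    position = np.cumsum(velocity) * dt
    return t, accel, velocity, position

# Example: the 105 deg/s^2 run of Fig. 14
t, a, v, p = constant_accel_profile(test_accel=105.0)
```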

6 Conclusions

In this paper we have presented a method for visual scanning and autonomous target tracking by means of independent pan-tilt cameras which mimic the chameleon visual system. We employed a model-based and optimization-based approach to the problem, both for the high-level scanning algorithm and the low-level tracking control design. Both algorithms use an MPC approach combining a robotic head model and a target model to optimally solve the scanning and tracking problems.

The scheme is flexible: information about roads or traveling paths in the environment can be incorporated into the model, changing the scanning pattern and the tracking algorithm accordingly.

The scanning algorithm has been simulated with a setting of two cameras that scan the hemisphere, a setup similar to that of chameleons. With minimal tuning effort, we were able to observe a scanning pattern with a “negative correlation” between the cameras. This scanning pattern bears a clear resemblance to the scanning patterns seen in chameleons in nature.

For the low-level tracking, we have designed and implemented linear and min-max MPC controllers. The MPC technique has several advantages: (i) it is model based, (ii) it allows the physical constraints of the system and the target to be included in the control design, (iii) it is robust to abrupt and bounded changes in target direction. Thanks to the most recent developments in MPC theory we have computed and implemented the piecewise affine state feedback solution to the MPC problem. The implementation is real-time capable and experimental results have shown good performance.

Acknowledgements The authors are grateful to the anonymous reviewers for their helpful comments on how to improve the original version of the manuscript.


Final accelerations: (a)/(e) v = 60◦/s², (b)/(f) v = 105◦/s², (c)/(g) v = 120◦/s², (d)/(h) v = 150◦/s²

Fig. 14 Experimental results for constant acceleration. a to d show the acceleration profile of the target (in degrees/s²) vs. time (in seconds). e to h show the position of the target and the camera (in degrees) vs. time, for the corresponding acceleration profile in the upper plots. The target is represented as a dashed line and the camera as a solid line. Saccades are marked as in Fig. 13. The constant acceleration is applied after the target is brought to a base speed, which is maintained for 0.25 seconds. The value of the constant acceleration v is provided in the plot labels. As the acceleration increases, more saccades are needed to track the target


References

Anderson, B., & Moore, J. (1995). Optimal filtering. New York: Dover.

Armengol, J. A., Prada, F., Ambrosiani, J., & Genis-Galvez, J. M. (1988). The photoreceptors of the chameleon retina (Chamaeleo chamaeleo). A Golgi study. Journal für Hirnforschung, 29(4), 403–409.

Asada, M., Tanaka, T., & Hosoda, K. (2000). Adaptive binocular visual servoing for independently moving target tracking. In Proceedings of the IEEE international conference on robotics and automation (ICRA’00) (Vol. 3, pp. 2076–2081).

Avni, O., Katzir, G., & Rivlin, E. (2007, in preparation). Eye movements in the chameleon.

Bemporad, A., Borrelli, F., & Morari, M. (2003). Min-max control of constrained uncertain discrete-time linear systems. IEEE Transactions on Automatic Control, 48(9), 1600–1606.

Bemporad, A., Morari, M., Dua, V., & Pistikopoulos, E. (2002). The explicit linear quadratic regulator for constrained systems. Automatica, 38(1), 3–20.

Bernardino, A., & Santos-Victor, J. (1999). Binocular visual tracking: integration of perception and control. IEEE Transactions on Robotics and Automation, 15(6), 1937–1958.

Bertsekas, D. P. (2000). Dynamic programming and optimal control (Vol. I, 2nd ed.). Belmont: Athena Scientific.

Borrelli, F. (2003). Constrained optimal control of linear & hybrid systems (Vol. 290). Berlin: Springer.

Borrelli, F., Baotic, M., Bemporad, A., & Morari, M. (2005). Dynamic programming for constrained optimal control of discrete-time hybrid systems. Automatica, 41, 1709–1721.

Bourgault, F., Furukawa, T., & Durrant-Whyte, H. (2004). Process model, constraints, and the coordinated search strategy. In IEEE/RSJ international conference on robotics and automation (pp. 5256–5261), New Orleans, LA.

Bowmaker, J. K., Loew, E. R., & Ott, M. (2005). The cone photoreceptors and visual pigments of chameleons. Journal of Comparative Physiology A, 191(10), 925–932.

Brogan, W. (1991). Modern control theory (3rd ed.). Upper Saddle River: Prentice Hall.

de Groot, J. H., & van Leeuwen, J. L. (2004). Evidence for an elastic projection mechanism in the chameleon tongue. Proceedings of the Royal Society of London, 271, 761–770.

Eagle, J. N. (1984). An optimal search for a moving target when the search path is constrained. Operations Research, 32(5), 1107–1115.

Eagle, J. N., & Yee, J. R. (1990). An optimal branch-and-bound procedure for the constrained path, moving target search problem. Operations Research, 38(1), 110–114.

Enns, D., Bugajski, D., Hendrick, R., & Stein, G. (1994). Dynamic inversion: an evolving methodology for flight control design. International Journal of Control, 59(1), 71–91.

Lewis, F. L. (1986). Optimal estimation. New York: Wiley.

Flanders, M. (1985). Visually guided head movement in the African chameleon. Vision Research, 25(7), 935–942.

Grieder, P., Parrilo, P., & Morari, M. (2003). Robust receding horizon control—analysis & synthesis. In IEEE conference on decision and control (pp. 941–946), Maui, HI.

Harkness, L. (1977). Chameleons use accommodation cues to judge distance. Nature, 267, 346–349.

Herrel, A., Meyers, J. J., Aerts, P., & Nishikawa, K. C. (2000). The mechanics of prey prehension in chameleons. The Journal of Experimental Biology, 21(203), 3255–3263.

Kvasnica, M., Grieder, P., & Baotic, M. (2004). Multi-parametric toolbox (MPT). http://control.ee.ethz.ch/mpt/.

Land, M. F. (1999). Motion and vision: why animals move their eyes. Journal of Comparative Physiology A, 185(4), 341–352.

Lau, H., Huang, S., & Dissanayake, G. (2005). Optimal search for multiple targets in a built environment. In Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 228–233).

Lee, J. H., & Yu, Z. (1997). Worst-case formulations of model predictive control for systems with bounded parameters. Automatica, 33(5), 763–781.

Ljung, L. (1999). System identification—theory for the user. Upper Saddle River: Prentice Hall.

Mayne, D., Rawlings, J., Rao, C., & Scokaert, P. (2000). Constrained model predictive control: stability and optimality. Automatica, 36(6), 789–814.

Movie1 Online. http://www.cs.technion.ac.il/~avni/mov1.avi.

Movie2 Online. http://www.cs.technion.ac.il/~avni/mov2.avi.

Ott, M. (2001). Chameleons have independent eye movements but synchronise both eyes during saccadic prey tracking. Experimental Brain Research, 139(2), 173–179.

Ott, M., & Schaeffel, F. (1995). A negatively powered lens in the chameleon. Nature, 373, 692–694.

Ott, M., Schaeffel, F., & Kirmse, W. (1998). Binocular vision and accommodation in prey-catching chameleons. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 182(3), 319–330.

Pettigrew, J. D., Collin, S. P., & Ott, M. (1999). Convergence of specialised behaviour, eye movements and visual optics in the sandlance (Teleostei) and the chameleon (Reptilia). Current Biology, 9(8), 421–424.

Rivlin, E., & Rotstein, H. (2000). Control of a camera for active vision: foveal vision, smooth tracking and saccade. International Journal of Computer Vision, 39(2), 81–96.

Ross, S. M. (1983). Introduction to stochastic dynamic programming. Orlando: Academic Press.

Scokaert, P., & Mayne, D. (1998). Min-max feedback model predictive control for constrained linear systems. IEEE Transactions on Automatic Control, 43(8), 1136–1142.

Stengel, R. (1994). Optimal control and estimation. New York: Dover.

Stone, L. D. (1992). Theory of optimal search (2nd ed.). Military Applications Section, Operations Research Society of America.

Sutherland, O., Truong, H., Rougeaux, S., & Zelinsky, A. (2001). Advancing active vision systems by improved design and control. In ISER ’00: experimental robotics VII (pp. 71–80), London. Berlin: Springer.

Vijayakumar, S., Conradt, J., Shibata, T., & Schaal, S. (2001). Overt visual attention for a humanoid robot. In Proceedings of IEEE/RSJ international conference on intelligent robots and systems (Vol. 4, pp. 2332–2337).

Wong, E.-M., Bourgault, F., & Furukawa, T. (2005). Multi-vehicle Bayesian search for multiple lost targets. In Proceedings of IEEE international conference on robotics and automation (ICRA05) (pp. 3169–3174).

Ofir Avni received the B.Sc. degree in computer science in 2001, and the M.Sc. degree in computer science in 2006, both from the Technion—Israel Institute of Technology. His research interests are computer vision, mainly pose estimation and tracking, and biologically motivated robotics. He is currently an algorithm developer in the Image Processing Department in Rafael.


Francesco Borrelli received the ‘Laurea’ degree in computer science engineering in 1998 from the University of Naples ‘Federico II’, Italy. In 2002 he received his Ph.D. from the Automatic Control Laboratory at ETH-Zurich, Switzerland. He has been a Research Assistant at the Automatic Control Laboratory of the ETH-Zurich, a Contract Assistant Professor at the Aerospace and Mechanics Department at the University of Minnesota, USA, and an Assistant Professor at the ‘Università del Sannio’, Benevento, Italy. He is currently an Assistant Professor in the Department of Mechanical Engineering of the University of California at Berkeley, USA. He is author of the book Constrained Optimal Control of Linear and Hybrid Systems published by Springer Verlag and the winner of the ‘Innovation Prize 2004’ from the ElectroSwiss Foundation. His research interests include constrained optimal control, model predictive control, robust control, parametric programming, singularly perturbed systems and automotive applications of automatic control.

Gadi Katzir received his B.Sc. in Biology and M.Sc. in Zoology, both at the Hebrew University of Jerusalem. He received his Ph.D. in Animal Behaviour from the University of Cambridge (UK). Katzir is at the Dept. of Biology at Oranim—University of Haifa. His research interests focus on the sensory capacities of both vertebrates and invertebrates, and their relations with the environment in which the animals live. He has studied vision in coral reef fishes, herons and kingfishers, as well as social behaviour in corvids and facial expressions in humans. Currently his research centres on amphibious vision in cormorants and visually guided prey capture in chamaeleons.

Ehud Rivlin received the B.Sc. and M.Sc. degrees in computer science and the M.B.A. degree from the Hebrew University in Jerusalem, and the Ph.D. from the University of Maryland. Currently he is an Associate Professor in the Computer Science Department at the Technion, Israel Institute of Technology. His current research interests are in machine vision and robot navigation.

Hector Rotstein received the Ingeniero Electricista degree from the Universidad Nacional del Sur, Argentina, and M.Sc. and Ph.D. degrees from the California Institute of Technology. Since 1997 he has been with the Missile Division of Rafael, where he is now a Chief Research Engineer in control and navigation systems. His research interests include robust and vision-based control and the design of integrated navigation systems. Dr. Rotstein is also a cofounder of M&H Engineering Consultants, which has recently been involved in the design of the Scolio-Scan System for detecting infant scoliosis. He is a member of the IEEE.