
Control of a PTZ camera in a hybrid vision system

François Rameau, Cédric Demonceaux, Désiré Sidibé, David Fofi

To cite this version:

François Rameau, Cédric Demonceaux, Désiré Sidibé, David Fofi. Control of a PTZ camera in a hybrid vision system. International Conference on Computer Vision Theory and Applications, Jan 2014, Lisbon, Portugal. <hal-00968425>

HAL Id: hal-00968425

https://hal.archives-ouvertes.fr/hal-00968425

Submitted on 31 Mar 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Control of a PTZ camera in a hybrid vision system

François Rameau, Cédric Demonceaux, Désiré Sidibé and David Fofi
Université de Bourgogne, Le2i UMR 6306 CNRS, 12 rue de la fonderie, 71200 Le Creusot, France.

[email protected]

Keywords: Fisheye camera, PTZ, target detection, hybrid vision system

Abstract: In this paper, we propose a new approach to steer a PTZ camera in the direction of an object detected by another, fixed camera equipped with a fisheye lens. This heterogeneous association of two cameras having different characteristics is called a hybrid stereo-vision system. The presented method employs epipolar geometry in a smart way in order to reduce the search range for the desired region of interest. Furthermore, we propose a target recognition method designed to cope with illumination problems, with the distortion of the omnidirectional image, and with the inherent dissimilarity of resolution and color response between the two cameras. Experimental results with synthetic and real images show the robustness of the proposed method.

1 INTRODUCTION

Stereo-vision is one of the most explored topics in computer vision. The traditional approach is based on the use of two cameras of the same nature, mimicking the binocular human vision system (Marr and Poggio, 1977). Using two similar cameras drastically simplifies the calibration and matching steps. In this paper, however, we deal with a non-standard stereo-vision rig composed of a fisheye camera associated with a Pan-Tilt-Zoom (PTZ) camera. This kind of layout, mixing different types of cameras, is called a hybrid vision system.
Omnidirectional cameras have the great advantage of providing a wide field of view (up to 360°), but they often provide a limited and non-linear resolution. Furthermore, the use of omnidirectional sensors leads to strong geometric distortion of the image, making most of the usual image processing methods inefficient.
On the other hand, a Pan-Tilt-Zoom camera is a zooming perspective camera which can be mechanically oriented in multiple directions. Despite a restricted field of view, the ability to steer the camera in the desired direction makes it possible to cover a large region of the scene (up to 360° in pan and 180° in tilt, depending on the manufacturer). The zoom permits obtaining high-resolution images of a specific region of interest (ROI). The versatility offered by this kind of camera is therefore really valuable for many applications, especially in the field of video surveillance.

The pair formed by these two cameras combines the advantages of both, that is to say a large field of view and an accurate view of a particular ROI with an adjustable level of detail through the zoom. Nevertheless, controlling the mechanical camera with information from the fisheye one is not straightforward. Moreover, the differences between the two cameras have to be taken into consideration: the projection model, as well as the color response and the resolution of the two sensors, are greatly different.
Therefore, in this paper we propose a new approach able to cope with the previously mentioned problems and to find the orientation of the PTZ camera for visualizing a ROI defined on the fisheye image.
The rest of the paper is organized as follows. First, we review previous methods using heterogeneous vision systems. Section 2 is dedicated to an overview of the necessary background, containing a detailed description of the spherical model and of the epipolar geometry. Sections 3 and 4 respectively deal with the model of our system and its calibration. In Section 5, we describe our method of target detection with a hybrid stereo-vision system. Section 6 summarizes the results of our experiments. Section 7 concludes the paper.

1.1 Previous works

A hybrid vision system means that the two cameras of the rig have different characteristics. This combination makes it possible to obtain additional information such as an extension of the field of view, the extraction of 3D information, or the study of a wider range of wavelengths (Cyganek and Gruszczynski, 2013). Recently, we have noticed the emergence of composite stereo-vision sensors using the association of an omnidirectional and a perspective camera. This camera association is often considered a bio-inspired method, since the human retina can be divided into two parts, the foveal and the peripheral (Gould et al., 2007), leading to a coarse vision in the peripheral region, which is more sensitive to motion, and a fine vision in the central part of the retina. In robotics, many articles have taken advantage of this specificity to facilitate navigation using the wide-angle camera, while the perspective camera provides accurate details of the environment. It is for instance the case in (Neves et al., 2008), where a soccer robot navigates and detects the ball using an omnidirectional camera while a perspective camera provides an accurate front view of the game. Similarly, in (Adorni et al., 2002) the authors proposed another approach for obstacle avoidance using merged information acquired from a PT and a catadioptric camera. Some innovative robotic applications using this tandem of cameras also exist, such as the robot photographer presented in (Bazin et al., 2011). Furthermore, the use of such a system is not limited to ground robots; it has also been applied to UAVs: for instance, in (Eynard et al., 2012) the authors estimate the altitude and attitude of a UAV with a heterogeneous sensor.
However, the main application of hybrid vision remains video surveillance, because of the great versatility offered by those systems. (Ding et al., 2012) and (Puwein et al., 2012) are two representative examples of the possibilities offered by PTZ camera networks, respectively for optimizing the surveyed area and for sports broadcasting. Many other papers deal with collaborative tracking and recognition using these types of cameras (Micheloni et al., 2010; Raj et al., 2013; Amine Iraqui et al., 2010). For video surveillance applications, a static location of the rig is often assumed, making it possible to calibrate and use the hybrid stereo-vision system based on various a priori assumptions. For instance, in (Chen et al., 2008; Cui et al., 1998; Scotti et al., 2005), the planarity of the ground, the height of the devices, the size of a person, or the alignment of the vertical axes of the cameras are assumed to be known. Another very common approach is to create a look-up table between the PTZ setpoints and the coordinates of the omnidirectional camera (Badri et al., 2007; Liao and Cho, 2008). This calibration method assumes a fixed environment and requires a cumbersome calibration step.

Figure 1: Unified spherical model

In this paper we propose a more flexible method, able to steer a rotating camera in the direction of a target selected in the wide-angle image, in an unknown environment.

2 Background

2.1 The unified spherical model

The cameras in our system have different projection models; it is possible to homogenize them using the unified spherical model defined by Barreto et al. (Barreto and Araujo, 2001), which remains valid for fisheye and perspective cameras (Ying and Hu, 2004). Theoretically, this model only fits SVP (Single View Point) cameras, which is not the case of fisheye sensors; however, it has been shown that the approximation still holds (Courbon et al., 2012). Furthermore, it is also a suitable model for PT/PTZ cameras, since the translation induced by the mechanical motions of the camera can be neglected (Rameau et al., 2012); consequently the SVP assumption is also satisfied.
For any central camera, the image formation process can be described by a double projection on a Gaussian sphere (as shown in Fig. 1). First, a world point P is projected onto the sphere at the point Ps. This first projection is then followed by a second one, onto the image plane πi, producing the pixel pi. This second projection starts from a point Oc located above the center of the sphere. The distance l between the point Oc and the center of the sphere O models the inherent radial distortion of the camera. This distance is null for a perspective camera without distortion, while l > 1 for a fisheye lens (Ying and Hu, 2004).
In this work we mainly use the inverse projection, to back-project a pixel of the image plane onto its equivalent on the unitary sphere. This back-projection allows us to


take the non-linear resolution as well as the camera distortion into account.
Prior knowledge of the intrinsic parameters of the camera,

$$K = \begin{pmatrix} f_x & s & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix},$$

is necessary in order to back-project a pixel $p_i(x, y)$ onto a point $P_s(X_s, Y_s, Z_s)$ lying on the sphere. Knowing those parameters, the back-projection can be expressed in the following form:

$$Z_s = \frac{-2\,l\,\omega + \sqrt{(2\,l\,\omega)^2 - 4(\omega+1)(l^2\omega - 1)}}{2(\omega+1)}, \qquad X_s = x_t(Z_s + l), \qquad Y_s = y_t(Z_s + l),$$

with $(x_t,\ y_t,\ 1)^T \simeq K^{-1} p_i$ and $\omega = x_t^2 + y_t^2$.
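As an illustration, here is a minimal numpy sketch of this back-projection; the function name and the sample intrinsics are ours, not part of the original paper.

```python
import numpy as np

def back_project_to_sphere(p_i, K, l):
    """Lift a pixel p_i = (x, y) onto the unit sphere of the unified model.

    K is the 3x3 intrinsic matrix; l is the distortion parameter
    (l = 0 for an ideal perspective camera, l > 1 for a fisheye lens).
    """
    # Normalize the pixel with the inverse intrinsics: (xt, yt, 1)^T ~ K^-1 pi
    xt, yt, _ = np.linalg.inv(K) @ np.array([p_i[0], p_i[1], 1.0])
    w = xt**2 + yt**2  # omega
    # Positive root of the quadratic (w+1) Zs^2 + 2 l w Zs + (l^2 w - 1) = 0,
    # which enforces Xs^2 + Ys^2 + Zs^2 = 1
    Zs = (-2*l*w + np.sqrt((2*l*w)**2 - 4*(w + 1)*(l**2*w - 1))) / (2*(w + 1))
    return np.array([xt*(Zs + l), yt*(Zs + l), Zs])

# The principal point maps to the pole of the sphere (hypothetical intrinsics):
K = np.array([[300., 0., 320.], [0., 300., 240.], [0., 0., 1.]])
print(back_project_to_sphere((320., 240.), K, l=1.2))  # -> [0. 0. 1.]
```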

2.2 Epipolar geometry

The epipolar geometry is the mathematical model relating two images of the same scene taken from different viewpoints. This well-known geometry is based on the intersection of the image planes with the epipolar plane $\pi_e$. This plane is formed by the optical centers of the cameras and a 3D point $X$ projected on both images at $x$ and $x'$. The epipolar geometry can be mathematically formalized using the fundamental matrix

$$F = K_2^{-T} [t]_\times R\, K_1^{-1} = K_2^{-T} E\, K_1^{-1},$$

with $K_1$ and $K_2$ the intrinsic parameters of the cameras and $E$ the essential matrix. The fundamental matrix therefore links two corresponding points by the relation $x'^T F x = 0$.
The epipolar geometry remains valid in the context of an omnidirectional or a hybrid stereo-vision system (Fujiki et al., 2007). In fact, since we use the spherical representation of the images, the projective geometry still applies. In this configuration the centers of the spheres, $O_o$ and $O_p$, are respectively the optical center of the omnidirectional camera and that of the perspective camera. The baseline between the cameras thus intersects each sphere at two points, forming four epipoles $e_1$, $e_2$, $e'_1$ and $e'_2$ (see Fig. 2). Because the projective geometry is preserved, the epipolar relation between points on the spheres can be expressed as follows:

$$P'^T E P = 0, \qquad (1)$$

where $E = [t]_\times R$, with $t$ and $R$ the translation and the rotation between the cameras. Hence, any selected point $P$ on $S_o$ defines a great circle $C$ on $S_p$.
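To make equation (1) concrete, the short sketch below builds $E = [t]_\times R$ for a placeholder pose and returns the normal of the great circle $C$ associated with a point $P$ on $S_o$; the numeric pose is an assumption for illustration only.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

def great_circle_normal(E, P):
    """A point P on So maps to the great circle {P' on Sp : P'^T E P = 0},
    i.e. the set of unit vectors orthogonal to the normal E @ P."""
    n = E @ P
    return n / np.linalg.norm(n)

R = np.eye(3)                    # placeholder rotation between the spheres
t = np.array([0.2, 0.0, 0.0])    # placeholder baseline
E = skew(t) @ R                  # essential matrix E = [t]x R
P = np.array([0.0, 0.0, 1.0])    # a point on the fisheye sphere So
n = great_circle_normal(E, P)    # every P' on C satisfies n . P' = 0
print(n)
```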

Figure 2: Model of our hybrid stereo-vision system

$S_o$ : spherical model of the fisheye camera
$S_p$ : spherical model of the PTZ camera
$O_o$ : center of $S_o$, world coordinate system
$O_p$ : center of $S_p$, $O_p = T_p^o$
$P$ : target point $\in S_o$
$P_w$ : 3D location of the target in the world
$E$ : essential matrix, $E = [T_p^o]_\times R_p^o$
$\pi_e$ : epipolar plane defined by $E$ and $P$
$C$ : epipolar great circle $\in S_p$
$P'$ : desired point $\in C$
$\psi$ : angular setpoint of the camera in tilt
$\varphi$ : angular setpoint of the camera in pan
$K_p$ : intrinsic parameters of the PTZ camera
$K_o$ : intrinsic parameters of the fisheye camera
$T_p^o$ : translation between the cameras
$R_p^o$ : rotation between the cameras for $\psi = \varphi = 0$
$R_{ptz}^p(\varphi, \psi)$ : rotation of the PTZ in its own coordinate system
$(\vec{X}_o, \vec{Y}_o, \vec{Z}_o)$ : fisheye coordinate system
$(\vec{X}_{ptz}, \vec{Y}_{ptz}, \vec{Z}_{ptz})$ : PTZ coordinate system
$(\vec{X}_p, \vec{Y}_p, \vec{Z}_p)$ : intermediate coordinate system, for which $R_{ptz}^p = I$

Table 1: Notations

3 Model of the system

Figure 2 summarizes all the geometric relationships between the two cameras, and Table 1 defines the notations used. For convenience, the fisheye coordinate system $(\vec{X}_o, \vec{Y}_o, \vec{Z}_o)$ is taken as the reference of our model. The position and orientation of the PTZ camera in its own coordinate system $(\vec{X}_{ptz}, \vec{Y}_{ptz}, \vec{Z}_{ptz})$ are then expressed with respect to this global coordinate system; $(\vec{X}_p, \vec{Y}_p, \vec{Z}_p)$ is the coordinate system of the PTZ camera at its calibrated position. Note that in practice a translation $T_{ptz}^p$ exists: it is the residual translation induced by the mechanisms used to rotate the camera. Consequently, a point $P_{ptz}$ in the PTZ coordinate system can be expressed in the global coordinate system as follows:

$$P_o = {R_p^o}^T\, {R_{ptz}^p(\varphi,\psi)}^T (P_{ptz} - T_{ptz}^p) - T_p^o, \qquad (2)$$

where $P_o$ is the point $P_{ptz}$ expressed in the omnidirectional coordinate system. In our case we consider the translation $T_{ptz}^p$ negligible, as is often done (Agapito et al., 2001), hence:

$$P_o = {R_p^o}^T\, {R_{ptz}^p(\varphi,\psi)}^T P_{ptz} - T_p^o. \qquad (3)$$
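A small sketch of equation (3) follows, together with one possible construction of $R_{ptz}^p(\varphi, \psi)$ from the pan and tilt angles; the pan/tilt axis convention used here is an assumption and depends on the actual mount.

```python
import numpy as np

def rot_pan_tilt(pan, tilt):
    """R_pptz(pan, tilt): pan about the Y axis, then tilt about the X axis
    (assumed axis convention; adapt to the actual PTZ head)."""
    cp, sp = np.cos(pan), np.sin(pan)
    ct, st = np.cos(tilt), np.sin(tilt)
    R_pan = np.array([[cp, 0., sp], [0., 1., 0.], [-sp, 0., cp]])
    R_tilt = np.array([[1., 0., 0.], [0., ct, -st], [0., st, ct]])
    return R_pan @ R_tilt

def ptz_to_fisheye(P_ptz, R_op, R_pptz, T_op):
    """Eq. (3): Po = R_op^T R_pptz^T P_ptz - T_op
    (the residual translation T_pptz is neglected)."""
    return R_op.T @ R_pptz.T @ P_ptz - T_op
```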

4 Calibration of the hybrid stereo-vision system

4.1 Intrinsic calibration

As mentioned in Section 2, our cameras have to be calibrated individually in order to use the spherical model. For the PTZ camera (at a given zoom level) we used the toolbox developed by Bouguet (Bouguet, 2008) to obtain the calibration matrix $K_p$, while the fisheye camera has been calibrated using (Mei and Rives, 2007), which gives $K_o$ and $l$.

4.2 Extrinsic calibration

To determine the extrinsic parameters of the stereo rig at its initial position ($R_p^o$ and $T_p^o$, with $R_{ptz}^p(0,0) = I$), we propose to compute the homography $H$ induced by a plane projected on our two spheres (Mei et al., 2008):

$$H \sim R_p^o - \frac{T_p^o\, n^T}{d}, \qquad (4)$$

where $\sim$ denotes equality up to scale and $n^T/d$ is the ratio between the normal of the plane and its distance to the optical center of the fisheye camera.
In order to compute this homography we use $m$ points on a plane visible simultaneously by both cameras. Let $P_o^i$ be the $i$-th point lying on the fisheye sphere $S_o$ and $P_p^i$ the $i$-th point on the PTZ sphere $S_p$. The homography of the plane between the two views leads to the following relationship:

$$P_o^i \sim H P_p^i. \qquad (5)$$

After computing $H$, we can extract $R_p^o$ and $T_p^o$ as an initialization for a non-linear refinement, solving the following optimization problem:

$$\{R^*, T^*\} = \underset{R,\,T}{\arg\min} \; \sum_{i=1}^{k} \left[ d^2\!\left(P_p^i,\; [T]_\times R\, P_o^i\right) + d^2\!\left({P_p^i}^T [T]_\times R,\; P_o^i\right) \right] \quad \text{such that } R^T = R^{-1},$$

with $k$ the number of matched points, $P_p^i, P_o^i$ the $i$-th matched points of the scene, and $d$ the geodesic distance between a point and its corresponding epipolar great circle.
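One possible implementation of this refinement is sketched below, assuming scipy (the paper does not specify a solver): the rotation is parametrized by a rotation vector, and the geodesic distance from a unit vector to a great circle of unit normal $n$ is $|\arcsin(n \cdot P)|$. The scale of $T$ is unobservable and would have to be fixed, e.g. to unit norm, in practice.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def skew(t):
    return np.array([[0., -t[2], t[1]], [t[2], 0., -t[0]], [-t[1], t[0], 0.]])

def residuals(params, Po, Pp):
    """Geodesic distances between each sphere point and the epipolar great
    circle predicted by E = [T]x R, in both directions."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    T = params[3:]
    E = skew(T) @ R
    res = []
    for po, pp in zip(Po, Pp):
        n1 = E @ po            # normal of the great circle on Sp
        n2 = E.T @ pp          # normal of the great circle on So
        res.append(np.arcsin(np.clip(pp @ n1 / np.linalg.norm(n1), -1., 1.)))
        res.append(np.arcsin(np.clip(po @ n2 / np.linalg.norm(n2), -1., 1.)))
    return np.array(res)

# Po, Pp: (k, 3) arrays of matched unit vectors on So and Sp;
# x0 = np.r_[rotvec_init, T_init] comes from the homography decomposition:
# sol = least_squares(residuals, x0, args=(Po, Pp))
```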

5 Methodology

We want to find the angular setpoint (ψ, ϕ) of the PTZ camera in order to visualize a ROI selected on the fisheye camera. The proposed method can be decomposed into two main steps: the scanning of the epipolar great circle using the rotations of the PTZ camera, followed by the detection of the region of interest.

5.1 Epipolar circle scanning

With a fixed sensor and without any a priori knowledge of the scene, it is impossible to directly steer the PTZ camera in the desired direction. However, considering the centroid $P_o^c$ of the ROI on $S_o$, we can limit the search space to the set of points $P_p^c \in S_p$ satisfying ${P_p^c}^T [T_p^o]_\times R_p^o\, P_o^c = 0$. This set of points draws a great circle $C$ on $S_p$. We propose to scan this circle by aligning the central axis of the PTZ camera with $C$.
Getting the setpoints of the camera to scan the circle is straightforward: we obtain them by converting all points lying on $C$ into spherical coordinates:

$$\forall P(X,Y,Z) \in S_p, \qquad \varphi = \arccos\!\left(Z/\sqrt{X^2+Y^2+Z^2}\right), \qquad \psi = \arctan(Y/X).$$

Finally, the rotations satisfying the following constraint center the PTZ camera on the circle:

$$N \cdot R_p^o\, R_{ptz}^p(\varphi,\psi)\, [0\ 0\ 1]^T = 0,$$

with $N = [T_p^o]_\times R_p^o\, P_o^c$ the normal of the epipolar plane.
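The sketch below generates such setpoints by sampling the great circle of normal $N$ and applying the spherical-coordinate conversion above; it is a minimal version assuming numpy, and the choice of basis for the circle is arbitrary.

```python
import numpy as np

def circle_setpoints(N, n_steps=20):
    """Sample the great circle of unit normal N on Sp and convert every
    sample P(X, Y, Z) to the (phi, psi) angles defined above."""
    N = N / np.linalg.norm(N)
    # Orthonormal basis (u, v) of the plane orthogonal to N
    u = np.cross(N, [0., 0., 1.])
    if np.linalg.norm(u) < 1e-8:      # N parallel to Z: use another axis
        u = np.cross(N, [1., 0., 0.])
    u /= np.linalg.norm(u)
    v = np.cross(N, u)
    setpoints = []
    for theta in np.linspace(0., 2*np.pi, n_steps, endpoint=False):
        X, Y, Z = np.cos(theta)*u + np.sin(theta)*v   # point on C
        phi = np.arccos(Z / np.sqrt(X**2 + Y**2 + Z**2))
        psi = np.arctan2(Y, X)   # arctan(Y/X) with the correct quadrant
        setpoints.append((phi, psi))
    return setpoints
```

In practice, only the setpoints lying inside the mechanical range of the PTZ head would be kept.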

5.2 Detection of the ROI

This step aims to match a template selected in the fisheye image to its corresponding region in a series of perspective images. Template matching is one of the most challenging tasks in the context of a hybrid vision system, because of the multiple problems already mentioned in the introduction of this article. Thus, in this section we propose an approach able to deal with the high dissimilarity between the images. As we scan the epipolar circle with the PTZ camera, we get a set of images (one image for each position (ϕ, ψ) of the camera). The detection task is thus to localize the target in the images acquired during the scanning step, and to find the corresponding setpoint to steer the camera to this specific ROI. First we select a patch on the fisheye image representing the ROI. Thereafter, we detect feature points on the patch as well as on the set of perspective images. Let $p_o$ and $p_p$ be the Harris points detected in the fisheye and perspective images, respectively.


The detected points are back-projected onto the spheres, giving $P_o$ and $P_p$. Most of these detected points can be rejected using the epipolar constraint $|P_p^T E P_o| < \varepsilon$, where $\varepsilon$ is an arbitrary threshold; only the potential corresponding points are preserved. Note that this approach is purely geometric: no photometric descriptors are used, which makes our matching robust to illumination changes, rotation and scale.
Now, considering that the ROI is locally planar, it is possible to compute the homography fitting the model and to reject outliers simultaneously, using a RANSAC procedure (Fischler and Bolles, 1981). A homography is typically computed from 4 point correspondences; in the present case, however, since we know the rotation and translation between the cameras, we can reduce the number of points needed to compute the homography matrix $H$ by pre-rotating the points $P_p^i$ on the sphere $S_p$. Reducing the number of points needed to compute $H$ exponentially decreases the complexity of the method. Without loss of generality, equation (4) can then be rewritten as follows:

$$H \sim I - \frac{T_p^o\, n^T}{d}. \qquad (6)$$

Knowing $T_p^o$, the number of degrees of freedom of $H$ is reduced to 3, namely the 3 entries of $N_d = n/d$. $H$ is then expressed by:

$$H = \begin{pmatrix} 1 - \frac{n_x}{d} t_x & -\frac{n_y}{d} t_x & -\frac{n_z}{d} t_x \\ -\frac{n_x}{d} t_y & 1 - \frac{n_y}{d} t_y & -\frac{n_z}{d} t_y \\ -\frac{n_x}{d} t_z & -\frac{n_y}{d} t_z & 1 - \frac{n_z}{d} t_z \end{pmatrix}, \qquad (7)$$

where $T_p^o = [t_x\ t_y\ t_z]^T$ and $N_d = [n_x\ n_y\ n_z]^T / d$. To find the 3 entries of $N_d$ we solve $P_p \times H P_o = 0$. Every single point correspondence $P_o(x,y,z)$, $P_p(x',y',z')$ gives 3 equations (of the form $A N_d = b$), all linearly dependent on the following one:

$$[\,t_y x z' - t_z x y' \quad t_y y z' - t_z y y' \quad t_y z z' - t_z z y'\,]\, N_d = y'z - yz', \qquad (8)$$

so 3 point correspondences are enough to solve for $N_d$ using a singular value decomposition. The homography is finally computed using equation (6). The output of the RANSAC process subsequently gives the inlier points and the optimal matrix $H$ fitting our model.
To summarize the RANSAC algorithm applied here: 3 points are randomly selected among the points $P_o$ and $P_p$ to compute a homography matrix $H$; the number of inliers $N_I$ is then computed by

$$k_i = \begin{cases} 1 & \text{if } \|P_o^i - H P_p^i\|^2 + \|P_p^i - H^{-1} P_o^i\|^2 < \tau \\ 0 & \text{otherwise} \end{cases}, \qquad N_I = \sum_{i=1}^{n} k_i,$$

where $\tau$ is an arbitrary threshold and $n$ the total number of potentially matched points. The process is reiterated until we get the homography fitting the largest number of inliers.
Finally, the resulting bounding box on the PTZ image can easily be drawn by transforming the coordinates of the corners of the selected ROI using $H$. The setpoint steering the camera in the desired direction is given by the spherical coordinates of the centroid of the patch, $P_p^c = H P_o^c$.

Figure 3: Hybrid stereo-vision system used for the experiments
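The following sketch summarizes this detection machinery, implementing equations (6)-(8) as printed: the 3-point solve for $N_d$, the homography of equation (6), and the inlier test of the RANSAC loop. It assumes numpy arrays of matched sphere points and does not handle degenerate samples.

```python
import numpy as np

def solve_Nd(Po, Pp, t):
    """Stack one row of Eq. (8) per correspondence and solve A Nd = b
    (np.linalg.lstsq uses an SVD internally)."""
    tx, ty, tz = t
    A, b = [], []
    for (x, y, z), (xp, yp, zp) in zip(Po, Pp):
        s = ty*zp - tz*yp              # common factor of the row of Eq. (8)
        A.append([s*x, s*y, s*z])
        b.append(yp*z - y*zp)
    Nd, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return Nd

def homography(Nd, t):
    """Eq. (6)/(7): H = I - t Nd^T, valid since t = T_op is known."""
    return np.eye(3) - np.outer(t, Nd)

def ransac_homography(Po, Pp, t, tau=0.01, n_iter=500, seed=0):
    """3-point RANSAC keeping the H with the most inliers (Sec. 5.2).
    Po, Pp: (n, 3) arrays of candidate matches on the two spheres."""
    rng = np.random.default_rng(seed)
    best_H, best_ni = None, -1
    for _ in range(n_iter):
        idx = rng.choice(len(Po), size=3, replace=False)
        H = homography(solve_Nd(Po[idx], Pp[idx], t), t)
        # Symmetric transfer error of the paper's inlier test
        err = (np.linalg.norm(Po - Pp @ H.T, axis=1)**2
               + np.linalg.norm(Pp - Po @ np.linalg.inv(H).T, axis=1)**2)
        ni = int((err < tau).sum())
        if ni > best_ni:
            best_H, best_ni = H, ni
    return best_H, best_ni
```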

6 Results

This section is dedicated to the assessment of our method. To do so, we present a quantitative analysis performed in a photo-realistic environment generated with ray-tracing software, together with a series of qualitative experiments in real conditions that demonstrates the robustness of our approach in such complex configurations.
In order to test our algorithm we used PovRay (Persistence of Vision Raytracer) to synthesize fisheye and perspective views in a totally controlled environment: the internal parameters as well as the localization of the cameras are known and tunable. This makes it possible to work in a quasi-real scene where all the 3D information is known, keeping the experiments as close as possible to reality. In this series of tests, both cameras have a 640×480 pixel sensor.
We also conducted multiple experiments with real images, using a fixed 640×480 pixel camera equipped with a fisheye lens providing a field of view of 180°. The PTZ camera used is an AXIS 2130R, which can rotate in pan and tilt respectively up to 338° and 90°, and can perform an optical zoom up to 16×. The rig is hung from the ceiling of a room (as shown in Fig. 3). Figure 4 shows the spherical representation of the system in its initial calibrated position (that is to say ψ = ϕ = 0). In this figure we


can clearly distinguish the great epipolar circle $C$ (in red) on the sphere $S_p$, corresponding to the selected red point on the sphere $S_o$.

Figure 4: Spherical representation of the system after calibration

6.1 Scan of the epipolar line

In our simulations using PovRay the scan of the epipolar line is always very accurate, since the extrinsic and intrinsic parameters are known exactly. Figure 5 illustrates a representative experiment done with our system: it shows a sample of the images acquired after selecting the region (depicted in Figure 5(a)) on the fisheye image. Here, we intentionally selected a distant and hardly distinguishable object in the omnidirectional image. This series of images is the result of the computed angles steering the PTZ camera along the great epipolar circle drawn by the center of the fisheye patch. In this sequence of images we can clearly see the epipolar line crossing the principal point (which is close to the center of the image) in every single image; furthermore, this epipolar line accurately passes through the target object, as expected.

6.2 Object detection

In our experiments we used the Harris corner detector directly on the fisheye image and on the images taken with the PTZ camera. Note that this approach does not need any rectification of the fisheye image, which is a considerable gain in computation time. Furthermore, for all the object detections presented here we kept the thresholds τ and ε fixed (0.01 and 0.1, respectively).

6.2.1 Experiments with synthetic images

In these synthetic experiments we point out the robustness of our algorithm in various scenarios. The metric used to quantify the quality of the object localization is the angular distance between the center of

Figure 5: Scanning of an epipolar circle. (a) Fisheye image with the selected ROI; (b)-(e) epipolar line through multiple images from the PTZ camera

noise variance     0     5.1   10.2  15.3  20.4  25.5  30.6  35.7
angular error (°)  6.89  7.01  3.96  5.13  5.66  5.04  4.62  6.07

Table 2: Angular error against Gaussian noise

the desired ROI and the actual position of the camera.
First of all, we tested the robustness of our algorithm against different levels of pixel noise. In the experiment shown in Figure 6, we added Gaussian noise directly to the pixel intensities of both the PTZ and fisheye images in order to disturb the corner detection. To make the results more readable, we display the resulting ROI on the noise-free image. Table 2 reports the angular accuracy of the method for different levels of noise: there are no large fluctuations in the angle estimates as the noise increases, up to a certain level. Note that even without noise the error is not null, because the selected area is non-planar and cannot be matched perfectly by a homography. Nevertheless, the desired region stays in the sight of the camera whatever the level of noise. The results suggest a good robustness to this type of image noise, because the corners are not matched based on their intensity.

In Figure 7 we test our method on different types of surfaces; the one selected in Figure 7(a) is a perfectly


Figure 6: Results obtained against noise. (a) Master image; (b), (c) results with a noise variance of 0.0 and 38.25, respectively

planar region, (b) is only roughly planar, while the last ROI is far from being a plane. The computed angular error is 0.58° in the first case, 0.6° in the second, and 7.34° in the last, which means that the geometry of the object has a direct influence on the accuracy of the algorithm. However, it also shows that even for a totally non-planar area our algorithm is still able to steer the camera in the right direction, keeping the desired object within the field of view of the camera. For our application we are not looking for a perfect homography, since an approximation is enough to orient the PTZ camera in the desired direction.

Figure 8 contains the results computed for the localization of the same object with different focal lengths. The angular errors obtained are between 2.5° and 4°, showing that our method performs well at various zoom levels.

6.2.2 Experiments with real images

The master fisheye image (Figure 9(a)) used for all these experiments does not change, while the images from the PTZ camera are subject to many changes in illumination and in the environment.
In the first experiment we selected a flat and well-textured surface (see the red bounding box in Fig. 9(a)), which is matched in the images from the perspective camera. The resulting bounding box surrounding the selected area is shown in Figure 9(b); the matching is close to perfect.
Figure 9(e) depicts a test with a planar area but with

Figure 7: Tests with various types of surfaces: (a) planar, (b) slightly planar, (c) non-planar

a partial occlusion in the perspective image. Even in this circumstance the recognition is still very accurate. To further assess our method we also applied it to non-planar regions; those experiments are depicted in Figures 9(c) and 9(d). Although the planar-patch assumption is not satisfied, the detection of the desired areas remains precise.
Figure 9(f) is the result of a detection in very challenging conditions: the selected area is neither flat nor free of occlusions. Even in these conditions our algorithm achieves very good results.
While the other images assume an optical zoom of 1×, Figure 9(g) shows a result with a zoom of 5×. The use of different zoom levels does not significantly affect the performance of the proposed method. Finally, the last experiment (Figure 9(h)) demonstrates the robustness of our method to strong illumination changes.

7 Conclusion

In this paper we have presented a flexible and efficient approach to control a PTZ camera in a heterogeneous vision system. We showed that it is possible to localize a target with a PTZ camera using only information from any omnidirectional camera respecting the SVP assumption. Our method combines many advantages. It performs well even in an unknown environment, by scanning an epipolar circle. Furthermore, the detection of the target object is based only on geometric assumptions, while most of the


Figure 8: Detection of a ROI using different focal lengths. (a) Fisheye image with the selected ROI; results for a horizontal field of view of (b) 40°, (c) 50°, (d) 65°, (e) 80°

hybrid matching methods in the literature use photometric descriptors. Consequently, our approach can deal with strong distortions, illumination changes and large scale differences without any rectification of the omnidirectional image. It is also important to note that this template matching can be used with any calibrated hybrid vision system.
The provided set of qualitative and quantitative results shows the accuracy and flexibility offered by our method, which is capable of detecting any kind of ROI. This work can be useful in many robotics or video-surveillance applications, for instance as an initialization for collaborative object tracking.

ACKNOWLEDGEMENTS

This work was supported by the DGA (Direction Générale de l'Armement) and the regional council of Burgundy.

Figure 9: Detection of various objects. (a) Fisheye image with selected ROIs; (b)-(h) detected targets in PTZ images

REFERENCES

Adorni, G., Bolognini, L., Cagnoni, S., and Mordonini, M. (2002). Stereo obstacle detection method for a hybrid omni-directional/pin-hole vision system. In RoboCup 2001: Robot Soccer World Cup V, pages 244–250, London, UK. Springer-Verlag.

Agapito, L., Hayman, E., and Reid, I. (2001). Self-calibration of rotating and zooming cameras. International Journal of Computer Vision, 45(2):107–127.

Amine Iraqui, H., Dupuis, Y., Boutteau, R., Ertaud, J.-Y., and Savatier, X. (2010). Fusion of omnidirectional and PTZ cameras for face detection and tracking. In Emerging Security Technologies (EST), 2010 International Conference on, pages 18–23. IEEE.

Badri, J., Tilmant, C., Lavest, J., Pham, Q., and Sayd, P. (2007). Camera-to-camera mapping for hybrid pan-tilt-zoom sensors calibration. In SCIA, pages 132–141.

Barreto, J. P. and Araujo, H. (2001). Issues on the geometry of central catadioptric image formation. In CVPR, pages 422–427.

Bazin, J., Kim, S., Choi, D., Lee, J. Y., and Kweon, I. (2011). Mixing collaborative and hybrid vision devices for robotics applications. Journal of Korea Robotics Society.

Bouguet, J. Y. (2008). Camera calibration toolbox for Matlab.

Chen, C.-H., Yao, Y., Page, D. L., Abidi, B. R., Koschan, A., and Abidi, M. A. (2008). Heterogeneous fusion of omnidirectional and PTZ cameras for multiple object tracking. IEEE Trans. Circuits Syst. Video Techn., 18(8):1052–1063.

Courbon, J., Mezouar, Y., and Martinet, P. (2012). Evaluation of the unified model of the sphere for fisheye cameras in robotic applications. Advanced Robotics, 26(8-9):947–967.

Cui, Y., Samarasekera, S., Huang, Q., and Greiffenhagen, M. (1998). Indoor monitoring via the collaboration between a peripheral sensor and a foveal sensor. In Proc. of the IEEE Workshop on Visual Surveillance, pages 2–9. IEEE Computer Society.

Cyganek, B. and Gruszczynski, S. (2013). Hybrid computer vision system for drivers' eye recognition and fatigue monitoring. Neurocomputing.

Ding, C., Song, B., Morye, A., Farrell, J. A., and Roy-Chowdhury, A. K. (2012). Collaborative sensing in a distributed PTZ camera network. Image Processing, IEEE Transactions on, 21(7):3282–3295.

Eynard, D., Vasseur, P., Demonceaux, C., and Fremont, V. (2012). Real time UAV altitude, attitude and motion estimation from hybrid stereovision. Autonomous Robots, 33(1-2):157–172.

Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395.

Fujiki, J., Torii, A., and Akaho, S. (2007). Epipolar geometry via rectification of spherical images. In Proceedings of the 3rd International Conference on Computer Vision/Computer Graphics Collaboration Techniques, MIRAGE'07, pages 461–471, Berlin, Heidelberg. Springer-Verlag.

Gould, S., Arfvidsson, J., Kaehler, A., Messner, M., Bradski, G., Baumstarck, P., Chung, S., and Ng, A. Y. (2007). Peripheral-foveal vision for real-time object recognition and tracking in video. In International Joint Conference on Artificial Intelligence (IJCAI).

Liao, H. C. and Cho, Y. C. (2008). A new calibration method and its application for the cooperation of wide-angle and pan-tilt-zoom cameras. Information Technology Journal, 7(8):1096–1105.

Marr, D. and Poggio, T. (1977). A theory of human stereo vision. Technical report, Cambridge, MA, USA.

Mei, C., Benhimane, S., Malis, E., and Rives, P. (2008). Efficient homography-based tracking and 3-D reconstruction for single-viewpoint sensors. IEEE Transactions on Robotics, 24(6):1352–1364.

Mei, C. and Rives, P. (2007). Single view point omnidirectional camera calibration from planar grids. In IEEE International Conference on Robotics and Automation.

Micheloni, C., Rinner, B., and Foresti, G. L. (2010). Video analysis in pan-tilt-zoom camera networks. Signal Processing Magazine, IEEE, 27(5):78–90.

Neves, A. J., Martins, D. A., and Pinho, A. J. (2008). A hybrid vision system for soccer robots using radial search lines. In Proc. of the 8th Conference on Autonomous Robot Systems and Competitions, Portuguese Robotics Open-ROBOTICA, pages 51–55.

Puwein, J., Ziegler, R., Ballan, L., and Pollefeys, M. (2012). PTZ camera network calibration from moving people in sports broadcasts. In Applications of Computer Vision (WACV), 2012 IEEE Workshop on, pages 25–32. IEEE.

Raj, A., Khemmar, R., Ertaud, J. Y., and Savatier, X. (2013). Face detection and recognition under heterogeneous database based on fusion of catadioptric and PTZ vision sensors. In Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, pages 171–185. Springer.

Rameau, F., Habed, A., Demonceaux, C., Sidibe, D., and Fofi, D. (2012). Self-calibration of a PTZ camera using new LMI constraints. In ACCV.

Scotti, G., Marcenaro, L., Coelho, C., Selvaggi, F., and Regazzoni, C. (2005). Dual camera intelligent sensor for high definition 360 degrees surveillance. IEE Proceedings - Vision, Image and Signal Processing, 152(2):250–257.

Ying, X. and Hu, Z. (2004). Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In ECCV, pages 442–455.