
People Reidentification in a camera network

Icaro Oliveira de Oliveira and Jose Luiz de Souza Pio

Federal University of Amazonas - Computer Science Department
[email protected]/[email protected]

Abstract—This paper presents an approach to the object reidentification problem in a camera network system. The reidentification, or reacquisition, problem consists essentially of matching images acquired from different cameras. This work is applied in an environment monitored by cameras. Such an application is important to modern security systems, in which identifying the presence of targets in the environment expands the capacity of action of security agents in real time and provides important parameters, such as the localization of each target. We used the target's interest points and color as features for reidentification. Satisfactory results were obtained from experiments on public video datasets and on synthetic images with noise.

I. INTRODUCTION

In many video-surveillance applications, it is desirable to determine if a presently visible person has already been observed somewhere else in the camera system. This kind of problem is commonly known as "object reidentification", and a general presentation of this field can be found, for instance, in [1]. Re-identification algorithms have to be robust even in challenging situations caused by differences in camera viewpoints and orientations, varying lighting conditions, pose variability of persons, and rapid changes in clothes appearance.

The proliferation of video cameras in public places such as airports, streets, parking lots and shopping malls creates many opportunities for business and public safety applications, including surveillance for threat detection, monitoring of parking lots and streets, customer tracking in a store, asset movement control, detection of unusual events in a hospital, monitoring of elderly people at home, optimization of employee placement, etc. These applications require the ability to automatically detect, recognize and track people and other objects by analyzing video or image data. Although video surveillance has been in use for decades, the development of systems that can automatically detect and track people is still an active research area.

Many approaches have been proposed in recent years. They differ in various aspects such as the number of cameras used, the type of cameras (grayscale or color, mono or stereo, CCD or webcams, etc.) and their speed and resolution, the type of environment (indoors or outdoors), the area covered (a room or a hall, a hallway, several connected rooms, a parking lot, a highway, etc.), and the location of cameras (with or without overlapping fields of view). A vast number of papers are devoted to this research area, and we overview some of them in the related work section below. However, the performance of most systems is still far from the requirements of real applications, for three major reasons. First, the capabilities of current video and image analysis techniques are dwarfed by the complexity of the problem. For example, the existing face recognition techniques for person identification in an image are susceptible to many factors, such as changes in luminosity, view angle and even very simple disguises. Second, most such systems are developed and tested in controlled "simulated" environments, e.g., with a person deliberately walking back and forth in a room, instead of the real public areas where events take place naturally. Finally, efforts have been mostly focused on analyzing the video data, without taking into consideration the domain knowledge about the environment that could essentially improve the accuracy of both tracking and recognition.

The amount of surveillance video is increasing. Finding useful information in a large collection of surveillance videos is becoming unmanageable for humans. This paper aims at building an object retrieval system for such a large collection. The detection and matching of human appearance is a challenging problem. Viewpoint changes lead to differences in appearance. Lighting conditions are not constant, due to sunlight, street lights, and clouds. Occlusion happens frequently due to the presence of other objects and traffic. Weather conditions such as rain also cause severe changes in the background as well as the foreground.

II. RELATED WORK

Several studies have contributed to this theme, such as the tracking methods presented in [2], [3], [4], [5] and the person detection methods used in [6]. The novelty presented in this paper is the matching of interest points, different from the methods developed in [7], which uses matches among images, in [2], which uses biometric characteristics, and in [5], [4], [6], which use color matches.

The approaches in [3], [5], [8] have a limitation on the number of people to be reidentified, because whenever there is movement of people or cars a new target is inserted into the structure; this also occurs in [3], which does not treat the movements of people as they are treated in [7]. The structure used in [8] for cars has a maintenance method for removing unnecessary nodes, but this additional procedure becomes a bottleneck in the system. In [3], a person re-identification scheme using matching of interest points collected in several images during short video sequences was proposed and evaluated, while in this work a person re-identification scheme using matching of interest points from a single image is proposed and evaluated.

In this work we present a novel algorithm for object reidentification in camera networks. The major contributions are a very efficient description of targets using local features and the subsequent communication and matching algorithm proposed. First, the operator sends an image containing the target of interest. This image is identified and described by robust local features. The object description is subsequently converted into an extremely compact representation, the so-called signature. This signature may further be communicated to the neighboring camera nodes. The objects seen at a camera node are again described, and a specific signature for each object is created. The point correlation algorithm compares the camera signatures with the operator signature and allows for efficient reidentification and tracking of the target through entire camera networks. Our approach requires no topological knowledge of the environment and no trained models of objects.

One nice feature of our approach is that our algorithms minimize the amount of information per target to be transmitted between adjacent camera nodes; another is that the system does not have to be re-trained for every single camera view. Additionally, one major goal of our work was to use uncalibrated setups, since multi-camera calibration is still a tedious task. The power of our approach is shown in the simulation of several traffic scenarios: first, a one-camera setup is considered in several scenarios and then a two-camera network is simulated; second, a scenario using synthetic images with several levels of Gaussian noise is evaluated. The results showed that our approach is useful and motivate further research in the area of local features for object reidentification.

III. OBJECT REIDENTIFICATION

As previously mentioned, the entire object reidentification approach presented in this paper is solely based on local appearance features. Although the usage of local features has some shortcomings (computational cost, minimum requirements on image quality, etc.) and relies heavily on robust matching methods, it has the advantage of being partially robust to changes in image scale, rotation, affine distortion, addition of noise and illumination. With a setup such as the proposed one, it is not necessary to tediously train classifiers for each camera or to add information from additional cues or sensors. In the following, we shortly introduce our choice of local features and describe the point correlation algorithm.

Interest Point Detection

Interest point detection reduces the complexity of visual recognition by determining the salient points in an image. A point actually refers to a region of the image with a regular shape, centered on a specific point. This regular shape may be a circle (or an ellipse, if affine invariance is desired), or it may also be a rectangle (a cruder approach that is more straightforward for calculations). Analyzing a region surrounding a point is necessary because a single pixel does not have enough information to be deemed interesting or uninteresting.

Only the points found by the interest point detector will be passed to the feature descriptor. This does not require, however, that every pixel fall into at least one interest point's area. Certain regions of the image may not be useful in object recognition, and thus should be ignored. Ideally, the portions of the image that are ignored by the interest point detector do not contain useful information for object recognition.

It would seem that if a large percentage of the image is covered by the detected interest points, then little has been accomplished. The argument would be that only a small percentage of the information has been eliminated. This is not the case. A good detector will properly aggregate the pixels of the image, so that they are easily analyzed in the feature description phase.

Lowe [9] approximated the Laplacian-of-Gaussian using a difference-of-Gaussian function. This function performs two consecutive smoothings using a Gaussian, and finds the difference between the resulting images. This is efficient to compute because the smoothing is already necessary for building multiple scales in the image pyramid.

In this paper, we use the Bay, Tuytelaars, and Van Gool [10] detector, a detection scheme that is faster than Lowe's. It relies on integral images [11] and box filters to approximate the determinant of the Hessian matrix. This detector offers good distinctiveness and robustness, yet can be computed and compared much faster, and it has high repeatability.
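To make the box-filter computation concrete, the following C++ sketch (our own illustration, not code from the paper) builds an integral image and evaluates an arbitrary box sum with four lookups, independent of the filter size:

#include <vector>

// Inclusive integral image: I(y, x) = sum of img over rows [0..y] and columns [0..x].
std::vector<long long> integralImage(const std::vector<unsigned char>& img, int w, int h) {
    std::vector<long long> I(w * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            long long up   = (y > 0) ? I[(y - 1) * w + x] : 0;
            long long left = (x > 0) ? I[y * w + (x - 1)] : 0;
            long long diag = (y > 0 && x > 0) ? I[(y - 1) * w + (x - 1)] : 0;
            I[y * w + x] = img[y * w + x] + up + left - diag;
        }
    return I;
}

// Sum over the box [x0..x1] x [y0..y1] with four lookups, whatever the box size.
long long boxSum(const std::vector<long long>& I, int w, int x0, int y0, int x1, int y1) {
    long long A = (x0 > 0 && y0 > 0) ? I[(y0 - 1) * w + (x0 - 1)] : 0;
    long long B = (y0 > 0) ? I[(y0 - 1) * w + x1] : 0;
    long long C = (x0 > 0) ? I[y1 * w + (x0 - 1)] : 0;
    return I[y1 * w + x1] - B - C + A;
}

This constant-time box sum is what allows the Fast-Hessian approximation to evaluate large filters as cheaply as small ones.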

A. Feature Description

In this stage, the detected interest points are described in two ways: by color and by the SURF description.

Color: This section describes how the color-based feature vector is generated for a given interest point. The first component of this feature vector is a histogram of the hue of the point. To overcome the sensitivity of HSV histograms to changes in illumination and shadows in outdoor scenes, we use a definition of hue which is invariant to brightness and gamma [12]. This definition of hue for a given RGB color space is as follows:

H = arccos( (log(R) − log(G)) / (log(R) + log(G) − 2·log(B)) ).   (1)

This color is utilized in the signature generation process.
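As a concrete illustration of Eq. (1), a minimal C++ sketch (ours, not the authors' implementation) for one RGB pixel follows; a small epsilon guards the logarithms, and the ratio is clamped to the arccos domain:

#include <algorithm>
#include <cmath>

// Brightness- and gamma-invariant hue of Eq. (1) for one RGB pixel.
double invariantHue(double R, double G, double B) {
    const double eps = 1e-6;                       // avoids log(0)
    double num = std::log(R + eps) - std::log(G + eps);
    double den = std::log(R + eps) + std::log(G + eps) - 2.0 * std::log(B + eps);
    if (std::fabs(den) < eps) return 0.0;          // hue undefined, e.g. gray pixels
    double r = std::max(-1.0, std::min(1.0, num / den));
    return std::acos(r);                           // hue in radians
}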

SURF Description: A feature descriptor attempts to characterize a region in a robust way that is invariant to natural viewing changes. These may include scale, rotation, lighting, and affine/viewpoint variance. The SURF [10] descriptor acknowledges the high matching performance of SIFT [9] and attempts to replicate this performance while improving speed. The region of interest is divided into a square grid of 4×4 sub-regions, as in SIFT [9]. However, instead of using a histogram of image gradients to characterize each sub-region, SURF measures the response of Haar wavelets summed over the sub-region. Both a "vertical" and a "horizontal" Haar wavelet response are calculated. The actual directions are relative to the interest point's orientation. There are four values in the descriptor for each sub-region: the sum of the response for each wavelet, and the sum of the absolute value of the response for each wavelet. This SURF description is utilized in the signature generation process.

Signature Generation

Recently, encouraging recognition results have been achieved in the area of large-scale image databases using tree-like representations of local descriptors. One approach is the usage of vocabulary trees and inverse voting, as in the work of [8], [3]. Our approach does not use a vocabulary tree; instead, it utilizes a compact object signature built from the query image provided by the operator. This signature is compact because it is formed by the interest points' descriptors, each a vector of 64 or 128 dimensions. First, the operator provides data through an image with the interest target, as shown in Figure 1. This image is obtained by motion detection or tracking in the camera selected by the operator.

Fig. 1. The image provided by the operator.

In this image, we detect the n interest points with the Fast-Hessian detector. We then extract the SURF descriptor vector and/or the color of each interest point detected. This vector and/or this color is stored in a compact signature. This signature is sent to all cameras in the system. In each camera, the point correlation is computed between the interest target signature and the descriptor vectors of the interest points detected in the frame, shown in Figure 2, of each camera in the system.

Fig. 2. The frame provided by the camera.
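The signature itself can be pictured as a plain container of per-point features. The following C++ sketch uses hypothetical names and shows one possible layout consistent with the description above:

#include <vector>

// One interest point of the signature: Laplacian sign, optional hue, and the
// 64- or 128-dimensional SURF descriptor.
struct SignaturePoint {
    int laplacian;                  // -1, 0 or +1
    float hue;                      // channel H, if the color feature is used
    std::vector<float> descriptor;  // 64 or 128 floats
};

// The compact object signature communicated to every camera node.
struct Signature {
    std::vector<SignaturePoint> points;
};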

Points correlation

This method does not utilize a vocabulary tree; instead, it is based on a signature of the interest target and a signature of the frame of camera i, computed every second. This method is efficient because the signatures are relatively short vectors (typically on the order of a few hundred elements) and can be matched very fast. Also, sorting the elements in increasing order further simplifies the algorithm and makes more efficient matching possible. The points correlation process is formed by logical and mathematical operations.

The vector x is formed by the m interest points obtained from the interest target signature, and the vector c_i is formed by the p interest points obtained from the frame of camera i. The points correlation is a matching process between the m interest points x_j, j = 1, ..., m and the p interest points c_i,j, j = 1, ..., p. Each interest point has a feature vector.

We find the Laplacian signal for each interest point in x_j and c_i,j. This signal represents the maximum positive signal and the maximum negative signal in the Hessian matrix. For each pair of point descriptors x_j and c_i,j with the same signal, the sum of quadratic differences (SQD) is applied:

Σ_{y=1}^{d} (k_y − l_y)²,   (2)

where k is the interest point descriptor vector of x_j, l is the interest point descriptor vector of c_i,j, and d is the number of dimensions of the vector. If the vector is formed by the SURF description then d is 64 or 128; if the vector is formed by color then d is 1; and if the vector is formed by color together with the SURF description then d is 65 or 129. A matching pair is detected if its SQD distance is closer than L times the distance of the second neighbor, where L is a threshold. This means that the set of interest points in x matches c_i; in other words, the interest target is in the frame of camera i. Finally, this frame and i are communicated to the operator. Thus, the operator receives all frames of the camera system correlated with the interest target.
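The following C++ sketch reflects our reading of this matching step (it reuses the hypothetical SignaturePoint structure from the signature sketch above): descriptors are compared only when their Laplacian signals agree, Eq. (2) gives the distance, and the nearest-neighbor ratio test applies the threshold L:

#include <cfloat>
#include <cstddef>
#include <vector>

// SQD of Eq. (2) between two descriptors of equal dimension d.
float sqd(const std::vector<float>& k, const std::vector<float>& l) {
    float s = 0.f;
    for (std::size_t y = 0; y < k.size(); ++y) {
        float diff = k[y] - l[y];
        s += diff * diff;
    }
    return s;
}

// Count signature points of x matched among the camera frame points c.
int correlatePoints(const std::vector<SignaturePoint>& x,
                    const std::vector<SignaturePoint>& c, float L) {
    int matches = 0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        float best = FLT_MAX, second = FLT_MAX;
        for (std::size_t q = 0; q < c.size(); ++q) {
            if (x[j].laplacian != c[q].laplacian) continue;  // same signal only
            float d = sqd(x[j].descriptor, c[q].descriptor);
            if (d < best) { second = best; best = d; }
            else if (d < second) { second = d; }
        }
        if (best < L * second) ++matches;  // nearest-neighbor ratio test
    }
    return matches;  // the paper declares the target present with at least 2 matches
}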

IV. SYSTEM EVALUATION USING REAL VIDEOS

This section presents three groups of experiments. The first group uses a synthetic image database to assess the robustness of the recognition method combined with each feature vector, under noise insertion and people scalability. The second group uses public video datasets, with the objective of forming video databases with heterogeneous information such as environments, camera view angles, number of people, cameras with overlapping and non-overlapping fields of view, and different types of cameras and movements. This group receives the most robust feature vector obtained from group 1 and evaluates the accuracy of the feature vectors for person reidentification on real videos. The third group analyzes the complexity of the methods. These evaluation groups and the parameters used in the evaluations are described in the following sections.

A. Specification of the hardware used

Two configurations were used in the experiments. Configuration 1 is a Pentium 4 Core Duo 3.2 GHz with 2 GB RAM and a 250 GB disk, using the C++ language, OpenCV and openSUSE 11.1. Configuration 2 is a notebook amazon pc tablet l97, using the C++ language, OpenCV and openSUSE 11.1.

V. IMPLEMENTATION PARAMETERS

The experiments were performed with some implementation parameters adjusted beforehand; these parameters are discussed in the following sections.

A. Interest Point Detection Parameters

As shown in Section III, the detector obtains the interest points from each image given as input. In this paper, we used the function cvExtractSurf from the Open Computer Vision Library [13]. The input and output parameters of this function are listed below, followed by a usage sketch:

• image: The input 8-bit grayscale image.
• mask: The optional input 8-bit mask. The features are only found in the areas that contain more than 50% non-zero mask pixels.
• keypoints: The output parameter; double pointer to the sequence of keypoints. This will be a sequence of CvSURFPoint structures:
  – pt, position of the feature within the image.
  – laplacian, -1, 0 or +1; sign of the Laplacian at the point. Can be used to speed up feature comparison (normally, features with Laplacians of different signs cannot match).
  – size, size of the feature.
  – dir, orientation of the feature: 0..360 degrees.
  – hessian, value of the Hessian (can be used to approximately estimate the feature strength; see also params.hessianThreshold).
• descriptors: The optional output parameter; double pointer to the sequence of descriptors. Depending on the params.extended value, each element of the sequence will be either a 64-element or a 128-element floating-point (CV_32F) vector. If the parameter is NULL, the descriptors are not computed.
• storage: Memory storage where keypoints and descriptors will be stored.
• params: Various algorithm parameters put into the structure CvSURFParams:
  – extended: 0 means basic descriptors (64 elements each), 1 means extended descriptors (128 elements each).
  – hessianThreshold = 500: only features with keypoint.hessian larger than this are extracted. A good default value is 300-500 (it can depend on the average local contrast and sharpness of the image). The user can further filter out some features based on their Hessian values and other characteristics.
  – nOctaves = 3: the number of octaves to be used for extraction. With each next octave the feature size is doubled (3 by default).
  – nOctaveLayers = 4: the number of layers within each octave (4 by default).
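A hedged usage sketch with the OpenCV 1.x C API follows; the function is spelled cvExtractSURF in the library headers, and the file name and surrounding code are our illustration:

#include <opencv/cv.h>
#include <opencv/highgui.h>

int main() {
    // Load the query in grayscale, as required by the detector (hypothetical file name).
    IplImage* img = cvLoadImage("query.png", CV_LOAD_IMAGE_GRAYSCALE);
    if (!img) return 1;

    CvMemStorage* storage = cvCreateMemStorage(0);
    CvSeq *keypoints = 0, *descriptors = 0;

    // hessianThreshold = 500 and extended = 1 (128-dim descriptors) as in this section;
    // nOctaves = 3 and nOctaveLayers = 4 are the library defaults.
    CvSURFParams params = cvSURFParams(500, 1);
    cvExtractSURF(img, 0, &keypoints, &descriptors, storage, params);

    for (int k = 0; k < keypoints->total; ++k) {
        CvSURFPoint* p = (CvSURFPoint*)cvGetSeqElem(keypoints, k);
        float* d = (float*)cvGetSeqElem(descriptors, k);  // 128 floats per point here
        (void)p; (void)d;  // p->pt, p->laplacian, p->size, p->dir, p->hessian
    }

    cvReleaseMemStorage(&storage);
    cvReleaseImage(&img);
    return 0;
}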

In the case of the method that uses only the color channel H, the descriptor is not computed. Figure 3 shows examples of the detection of interest points in images.

Fig. 3. Examples of interest point detection. The first image is the synthetic query image, the second is base1's query image and the third is a video frame from base2.

B. Features Description

In Section III-A, the color channel H and the descriptor are obtained from the detected points. The feature vector is formed in two different ways:

1) descriptor;
2) color channel H with descriptor.

The descriptor is obtained from the parameter descriptors of the function cvExtractSurf. The color channel H is obtained with the macro CV_IMAGE_ELEM, whose parameters are the image converted to HSV, the type of the returned result (uchar), and the interest point coordinates y and x, where the position x is multiplied by three. The value of x is multiplied by three because the image has three interleaved channels (HSV): the channel H is obtained at position 3x, the channel S at 3x + 1, and the channel V at 3x + 2. This information is obtained both from the image given by the operator and from the image obtained from the video frame. The evaluation groups are described in the following sections.
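In code, this channel indexing can be sketched as follows (CV_IMAGE_ELEM is the OpenCV 1.x pixel-access macro; the function name and conversion call are our illustration):

#include <opencv/cv.h>

// Read the H channel at interest point (x, y) from an 8-bit, 3-channel HSV
// image, obtained beforehand with, e.g., cvCvtColor(bgr, hsv, CV_BGR2HSV).
// Channels are interleaved: H at 3x, S at 3x + 1, V at 3x + 2.
uchar hueAt(IplImage* hsv, int x, int y) {
    return CV_IMAGE_ELEM(hsv, uchar, y, 3 * x);
}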

C. Group 1: System evaluation using synthetic images

This group analyzes the system's robustness with respect to the scalability of 38 interest objects. It disregards a camera system, because the specific objective of this evaluation is to check the robustness of feature recognition of a query on a frame with several interest objects. For each object (a 3D person model), we check the robustness of the feature recognition against Gaussian noise inserted in the query image and in each object in the frame.

In this evaluation group, the matching method is analyzed with three feature vectors and the thresholds 0.7 and 0.8. The three feature vectors are color, interest points, and color with interest points. The interest points were described in 128 dimensions. The size of the noise ranged from 91 to 121, and the scalability was evaluated with models of 26 people and 12 other objects. The objective is to evaluate the robustness of the combinations between the matching method and the three feature vectors. Scales of 1/1.5×, 1/3.0× and 1/4.5× are also inserted in the query images. In the following sections the synthetic image database and the evaluation details are presented. A histogram equalization process is performed to increase the sharpness of the scaled query images; afterwards, the noise is added to the image.
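A sketch of this preprocessing step with the OpenCV 1.x C API follows; the concrete calls are our assumption of how the equalization and noise insertion could be implemented, not the authors' code (sigma stands in for the paper's noise "size" parameter, whose exact mapping is not specified):

#include <opencv/cv.h>

// Equalize the scaled grayscale query, then add zero-mean Gaussian noise.
void preprocessQuery(IplImage* gray /* 8-bit, 1 channel */, double sigma, CvRNG* rng) {
    cvEqualizeHist(gray, gray);  // increase sharpness of the scaled query

    IplImage* noise = cvCreateImage(cvGetSize(gray), IPL_DEPTH_32F, 1);
    IplImage* tmp   = cvCreateImage(cvGetSize(gray), IPL_DEPTH_32F, 1);
    cvRandArr(rng, noise, CV_RAND_NORMAL, cvScalarAll(0), cvScalarAll(sigma));
    cvConvert(gray, tmp);
    cvAdd(tmp, noise, tmp);
    cvConvert(tmp, gray);        // back to 8 bits, saturating

    cvReleaseImage(&noise);
    cvReleaseImage(&tmp);
}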

Synthetic images database

This database is formed by 26 images of 3D person models and 12 images of 3D object models, in a total of 38 different models from the Google SketchUp software. The synthetic images are shown in Figure 4.

Fig. 4. On the left, images of 3D person models; on the right, images of 3D object models.

System evaluation with Gaussian noise inserted in the query image and in each object in the image

In this evaluation, we inserted Gaussian noise with size ranging from 91 to 121 in the 3D person models used as queries and in all 3D models used as objects in the image, and the recognition robustness was checked. These sizes are high because, in the experiments, sizes < 91 gave stable results. The details of the results are presented below. Each graph of the results shows the average number of hits over the results of each scale (1/1.5×, 1/3.0× and 1/4.5×) inserted in the query image.

Feature vector formed by interest points: Figure 5 shows that when the noise in the query image increases, the accuracy decreases. Between the noise levels 95×95 and 97×97, 99×99 and 101×101, 103×103 and 107×107, 109×109 and 111×111, and 119×119 and 121×121, the accuracy is stable, because the features give stable results at these noise levels.

In most cases the threshold 0.8 showed higher accuracy at high noise levels than the 0.7 threshold, because the threshold 0.7 removes some correct images. The matching method with interest points proved robust, with an accuracy between 5% and 19% at high noise levels.

Fig. 5. Matching results with the Laplacian signal and the feature vector formed by the interest points.

Feature vector formed by color: This feature vector is obtained as in Section III-A. Figure 6 shows that when the noise in the query image increases, the accuracy decreases, because image noise is a random variation of the brightness or color information in images; in other words, the image becomes dithered, so the color information varies across different noise levels. However, there are several moments in which the feature is stable, i.e., the color information is invariant to the noise; in these cases the accuracy is constant. It is observed, though, that the accuracy is 4% in the best case. This is unsatisfactory because it is below the worst case of the previous experiment. This happens because color is a feature that varies with light, with noise insertion and principally with scale, whereas the interest point is not formed by color.

Fig. 6. Matching results with the Laplacian signal and the feature vector formed by channel H.

Feature vector formed by color and interest points: In Figure 7, the threshold 0.7 yields no recognition at some noise levels. In this case, the color used in the feature vector led the matching to find false positives in all queries. Therefore, this combination performs poorly.

Fig. 7. Matching results with the Laplacian signal and the feature vector formed by the channel H and the interest points.

D. Group 2: System evaluation using videos

Group 2 of the experiments evaluates the reidentification accuracy on real videos. It evaluates the accuracy of query images provided by the operator on real videos. Each query image contains only one view angle of a person; in other words, from this single query image the approach reidentified multiple view angles of the person with rates above 82% accuracy, which is different from the work developed in [3], which uses a model of each person storing all angles for reidentification.

Video databases description

In [14] there is a large collection of public video datasets about people, cars and medical images. From this collection we obtained the following databases:

• base1: Obtained from [15]. This database has high irregularities in the video sequences. It has videos of 10 people walking in different scenarios with different non-uniform backgrounds and performing several movements. These videos have low sensitivity to partial occlusions and to deformations in non-rigid motion. The database has 7,133 images pre-classified by movement and by person. The image acquisition rate was set to 1 fps, since the amount of data in this database is small. Some frames of the ACTIONS video database are shown in Figure 8.

Fig. 8. Frames obtained from the ACTIONS video database.

• base2: Obtained from [16]. This database provides a collection of videos recorded in different scenarios, with people walking in a mall in a shopping centre in Lisbon. For each sequence there are two time-synchronized videos, one with the view across and the other along the hallway. The resolution is half-resolution PAL standard (384 × 288 pixels, 25 frames per second), compressed using MPEG2. Unlike base1, this database is not pre-classified by person, and it has 72,466 images. The image acquisition rate was set to 25 fps, since the amount of data in this database is large. Some frames of the CAVIAR video database are shown in Figure 9.

Fig. 9. Frames obtained from the CAVIAR video database.

Query database description

The query images were obtained from the public video databases. Each image represents one view angle of each person, which is different from [3], which uses as query a model of each person with different view angles. These images are obtained by motion detection or tracking. Figures 10 and 11 show the queries of each database.

Fig. 10. These query images represent base1 people, extracted from the person's frontal view with little deformation.

Fig. 11. These query images represent base2 people, extracted from the person's frontal and back views with little deformation and with a panoramic view.

Reidentification Evaluation: This evaluation considers the feature vector formed by interest points (64 and 128 dimensions) and the matching method using the Laplacian signal. In this paper, the reidentification requires at least 2 corresponding points between the video frame and the query image. Four thresholds were examined: 0.5, 0.6, 0.7 and 0.8.

This evaluation verified the accuracy of the feature vector formed by interest points on the public video databases, using the query images of each database. The result is shown in Figure 12.

Figure 12 shows that, for the 64-dimensional vector, the accuracy decreases from the threshold 0.5 onwards. This shows that increasing the threshold also increases the error rate, leading to a decrease in accuracy. For the 128-dimensional vector, however, the accuracy decreases only after the threshold 0.6. This shows

Fig. 12. Accuracy of the feature vectors on the video databases.

that the increase in the error rate occurs when the threshold is greater than 0.6. It can also be seen that, as the threshold increases, the two curves (64-dimensional vector and 128-dimensional vector) remain parallel. The system achieves 82.2% in the best case.

For comparison, the best result reported in [3] on a set of videos including 10 different people is 82% correct best match. Ours contains two sets of videos including 15 different people in several scenarios with overlapping and non-overlapping cameras. In this paper, we achieve 82.2%. Difficult examples of reidentification are shown below; they involve changes in clothes and, in some cases, slight occlusion. The query reidentification uses the interest point vector with 128 dimensions and a threshold of 0.6.

Fig. 13. A query reidentification showing people near the person to be reidentified.

Figure 13 shows differences in clothes and in the person's placement. Nevertheless, the system reidentified the interest points of the person's face and shirt.

Figure 14 shows the person in the query image reidentified with occlusion in the camera frame. The operator could have difficulty finding this person in each video, but the system manages to identify this case from the interest points of the person's face in the query image.

In each test, the approach localizes the query object through the query object's interest points on one object or multiple objects obtained in the camera frame.

VI. GROUP 3: EVALUATION OF COMPLEXITY OF ALGORITHMS

In this group, the complexity evaluation of the interest point detection, interest point description and points correlation is presented in the following sections. These methods are used in Group 2.

Fig. 14. A query reidentification showing people near the person to be reidentified, where this person is occluded.

Interest Point Detection

The Interest Point Detection complexity is θ(o · i · r_m · r_n), where o = 3 is the number of octaves, as shown in Section V-A, i = 4 is the number of layers within each octave, also shown in Section V-A, and r_m and r_n are the numbers of responses for each pixel in the image: r_m depends on the filter length and the image height, and r_n on the filter length and the image width. Thus, the Interest Point Detection complexity simplifies to θ(r_m · r_n).

Interest Point Description

The Interest Point Description complexity is θ(l), where l is the number of interest points detected. The SURF description consists of mathematical calculations over four loops, amounting to 625 operations for each interest point detected.

Points correlation

The points correlation complexity is determined by the number q of camera frame points and the number c of query points. Thus, the points correlation complexity is O(q · c). The work developed in [3] uses a kd-tree for recognition, with complexity O(n log n). The number of points c obtained from one query image is smaller than the n points obtained from the 21 images used as queries in [3]. Thus, this points correlation method is more efficient than the recognition algorithm developed in [3].

VII. CONCLUSIONS AND FUTURE WORK

In this paper we presented a novel approach for object reidentification in a camera system. The proposed algorithms are designed to run fully autonomously on a network of cameras and are real-time capable. The communication overhead between adjacent camera nodes is notably reduced, and we show that the application of high-level local features is advantageous for the object signature. While the application of the proposed approach is not limited to a special object category, our algorithms were tested on traffic scenarios and encouraging results were presented. The main limitation of this work is the need for experiments on a real camera network in which the cameras have local processing. Although the proposed method addresses the task of helping people identification based on interest point descriptors, other evidence can be incorporated through the insertion of new attributes, such as people identification through selected points or regions in images. Extensions of the proposed work could include an online search engine for videos of people, the reidentification of 3D models, or of other objects of interest such as cars and lost objects. This system can be integrated with other recognition or tracking systems. The deployment of the system in a real environment is an activity that should be completed in the future.

REFERENCES

[1] P. H. Tu, G. Doretto, N. O. Krahnstoever, J. Rittscher, T. B. Sebastian, T. Yu, and K. G. Harding, “An intelligent video framework for homeland protection,” 2007.

[2] G. Wei, V. A. Petrushin, and A. V. Gershman, “Multiple-camera people localization in a cluttered environment,” in Multimedia Data Mining Workshop, 2004.

[3] O. Hamdoun, F. Moutarde, B. Stanciulescu, and B. Steux, “Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences,” in ICDSC08, 2008, pp. 1–6.

[4] K. Morioka, X. Mao, and H. Hashimoto, “Global color model based object matching in the multi-camera environment,” in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pp. 2644–2649, 2006.

[5] U. Park, A. K. Jain, I. Kitahara, K. Kogure, and N. Hagita, “ViSE: Visual search engine using multiple networked cameras,” in Pattern Recognition, International Conference on, vol. 3, pp. 1204–1207, 2006.

[6] A Multi-Camera Visual Surveillance System for Tracking of Reoccurrences of People, 2007. [Online]. Available: http://dare.uva.nl/record/265108

[7] N. Gheissari, T. B. Sebastian, and R. Hartley, “Person reidentification using spatiotemporal appearance,” in 2006 Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006, pp. 1528–1535.

[8] Object Reacquisition and Tracking in Large-Scale Smart Camera Networks. Vienna: First ACM/IEEE International Conference on Distributed Smart Cameras, 2007.

[9] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. [Online]. Available: citeseer.ist.psu.edu/lowe04distinctive.html

[10] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” in Proceedings of the Ninth European Conference on Computer Vision, May 2006.

[11] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” pp. 511–518, 2001.

[12] M. J. Swain and D. H. Ballard, “Indexing via color histograms,” in Proceedings of the DARPA Image Understanding Workshop, Pittsburgh, PA, USA, 1990.

[13] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.

[14] Cantata, “Datasets for cantata project,” 2000, [Online; accessed 10-August-2008]. [Online]. Available: http://www.hitech-projects.com/euprojects/cantata/datasets cantata/dataset.html

[15] Actions, “Actions as space-time shapes,” 2005, [Online; accessed 10-August-2008]. [Online]. Available: http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html

[16] Caviar, “Caviar test case scenarios,” 2004, [Online; accessed 10-August-2008]. [Online]. Available: http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/