
A Robust Human Head Detection Method for Human Tracking

Hosub Yoon, Dohyung Kim, Suyoung Chi, Youngjo Cho
HRI Team, Intelligent Robot Division, ETRI
Daejeon, Korea
{yoonhs, dhkimO08, chisy, youngjo}@etri.re.kr

Abstract - In this paper, an algorithm for human head detection over a distance exceeding 2.5 m between a camera and an object is described. This algorithm is used for the control of a robot, which has the additional limits of a moving camera, moving objects, various face orientations, and unfixed illuminations. Under these circumstances, only the assumption that human head and body contours have an omega (Ω) shape is made. In order to separate the background from these omega shapes, the three basic features of gray, color, and edge are used and combined. The skin color cue is very useful when the image stream is frontal and has large face regions, and additionally has no background objects similar to the skin color. The gray cue is also important when captured faces have a lower gray level than background objects. The edge cue is helpful when captured background objects have gray levels and colors similar to those of a head but can be discriminated by edges. Since these three methods have roughly orthogonal failure modes, they serve to complement each other. The next step is a splitting method between the head and body regions using the proposed method. The final step is an ellipse fitting and a head region verification algorithm. The results of this algorithm provide robustness against head rotation, illumination changes, and variable head sizes. Furthermore, it is possible to carry out real-time processing.

Index Terms - human head tracking

I. INTRODUCTION

Because of the great progress in computer hardware technology, more effective and friendly human-robot interaction (HRI) methods are being considered by many researchers. Here, a group of researchers, including hardware and software teams, attempts to create a home service robot similar to that shown in Fig. 1. This robot operates under the following constraints:

* Movement
- The robot can move by itself.
- A human can also move simultaneously.
* Directions
- The robot can pan and tilt its head.
- A human at times won't be facing the robot camera.
* Distance
- The robot camera has a restricted resolution (320x240) and an optic view for real-time processing.
- It is assumed that the distance between the robot camera and a human can be greater than 2.5 m.
* Space and illumination
- The robot can move anywhere in a home, and the lighting of a home area can vary considerably.

Fig. 1. Our service robot Waver

There are numerous human head detection algorithms [1-10], but only a few of these [11-14] are useful for this study. Ellipse-based head detection algorithms are suitable when the edge information is very clear, when head regions are larger than a restricted size, and when the input image has stable illumination. Thus, only edge and color information are utilized for head detection. Another issue centers on ellipses being fitted to the pixels of an entire image.


II. PREPROCESSING FOR BINARIZATION

In this paper, one assumption is made concerning human head detection: that human head and body contours make an omega (Ω) shape. In order to separate the background from the omega shapes, the three basic features of gray, color, and edge are used and combined.

A. Gray Module
The first assumption of our head detection algorithm is that the heads and backgrounds are divided into two groups by a gray-level difference. If we know the optimal threshold value T_gray, we can make the binary image B_gray(i, j) from the input gray image gray(i, j) as follows:

B_gray(i, j) = 0    for gray(i, j) < T_gray
B_gray(i, j) = 255  for gray(i, j) >= T_gray

To detect the optimal threshold value T_gray, we tested several thresholding methods, including the OpenCV library function, and decided to use the iterative (optimal) threshold selection method [14]. This algorithm works well even if the image histogram is multi-modal, and it performs well under a large variety of image contrast conditions. The method is iterative, with four to ten iterations usually being sufficient for real-time processing.
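As an illustrative sketch (not the authors' code), the iterative selection method of [14] alternates between thresholding the image and recomputing the threshold as the midpoint of the two class means; the function names below are our own:

```python
import numpy as np

def iterative_threshold(gray, tol=0.5, max_iter=10):
    """Ridler-Calvard iterative (optimal) threshold selection [14].

    Starts from the global mean, then repeatedly sets the threshold to
    the midpoint of the mean gray levels of the two classes it induces.
    Typically converges in four to ten iterations.
    """
    t = gray.mean()
    for _ in range(max_iter):
        lo, hi = gray[gray < t], gray[gray >= t]
        if lo.size == 0 or hi.size == 0:
            break  # degenerate split; keep the current threshold
        t_new = 0.5 * (lo.mean() + hi.mean())
        if abs(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t

def binarize(gray, t):
    # B_gray(i, j) = 0 below the threshold, 255 at or above it
    return np.where(gray < t, 0, 255).astype(np.uint8)
```

On a bimodal image, the threshold settles between the two modes regardless of the initial global mean.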

B. Edge Module
The edge is a reliable feature when the captured image has objects that can be discriminated from the background. There are many kinds of edge detection algorithms; the Sobel edge detector is useful for real-time processing and gives good results for our application. The following Sx and Sy masks are convolved over all pixels, from which we can calculate the edgeness and edge orientation.

     [ -1  0  1 ]        [ -1 -2 -1 ]
Sx = [ -2  0  2 ]   Sy = [  0  0  0 ]
     [ -1  0  1 ]        [  1  2  1 ]

Edgeness (edge strength) = abs(Sx) + abs(Sy)
Edge orientation = arctan(Sy / Sx)

From this edgeness information, we must decide a threshold value T_edge that determines whether a pixel is an edge or not, as follows:

B_edge(i, j) = 0    for Edgeness(i, j) < T_edge
B_edge(i, j) = 255  for Edgeness(i, j) >= T_edge
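As a hedged sketch (not the authors' implementation), the edgeness computation with the Sx and Sy masks can be written as a direct 3x3 convolution; `sobel_edgeness` is a name of our own choosing:

```python
import numpy as np

SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # horizontal gradient mask
SY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])   # vertical gradient mask

def sobel_edgeness(gray):
    """Edgeness = |Sx| + |Sy| at every interior pixel (borders left 0)."""
    h, w = gray.shape
    out = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = gray[i - 1:i + 2, j - 1:j + 2]
            gx = np.sum(SX * patch)   # response of the Sx mask
            gy = np.sum(SY * patch)   # response of the Sy mask
            out[i, j] = abs(gx) + abs(gy)
    return out
```

A vertical step edge produces a strong response only at the columns straddling the step, and zero in flat regions.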

Generally speaking, edgeness values are stronger under bright illumination than under dark illumination. Therefore, a fixed threshold value T_edge is not appropriate under natural lighting conditions. Robinson's algorithm [15] utilizes a temporary local threshold value, computed with a masking method, when a pixel is decided to be an edge or non-edge. The following equation and mask explain this decision:

                             [ 1 2 1 ]
LAT = (α × E) / M₀,   mask = [ 2 4 2 ]
                             [ 1 2 1 ]

The variable E means the edge strength and M₀ is the filter value obtained by applying the mask to the local neighborhood. We set α to 32. If the LAT is greater than 1, the pixel is decided to be an edge; if the LAT is smaller than 1, it is decided to be a non-edge. This edge decision module is useful when illumination varies, as in an office or a living room. In such cases, the algorithm produces uniform edge contours, which isolate human head regions or body objects from the background.
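A minimal sketch of this local decision, assuming LAT = α·E / M₀ with M₀ the mask-weighted local brightness (our reading of the equation; the exact normalization in [15] may differ):

```python
import numpy as np

SMOOTH = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]])  # local brightness mask

def local_edge_decision(gray, edgeness, alpha=32):
    """Decide edge/non-edge per pixel with a local adaptive threshold (LAT).

    Assumes LAT = alpha * E / M0 > 1 marks an edge, where M0 is the
    mask-weighted brightness of the 3x3 neighborhood. This formula is an
    interpretation of the paper's garbled equation, not a verified one.
    """
    h, w = gray.shape
    edges = np.zeros((h, w), dtype=bool)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            m0 = np.sum(SMOOTH * gray[i - 1:i + 2, j - 1:j + 2])
            if m0 > 0 and alpha * edgeness[i, j] / m0 > 1:
                edges[i, j] = True
    return edges
```

With this reading, brighter neighborhoods demand proportionally stronger gradients before a pixel counts as an edge, which is the behavior the text describes.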

C. Color Module
It is known that, although the skin colors of different people vary over a wide range in color space, the variation of hue and saturation for human skin is much smaller than that of brightness. With the input images represented in RGB color space, we consider the following color normalization:

r = R / (R + G + B),   g = G / (R + G + B),   b = B / (R + G + B)

After the image color values are normalized, the distribution of human skin color values follows a 2D Gaussian function, even in the case of different races. Another advantage of color normalization is that it reduces the illumination effect, because the normalization is in effect a brightness elimination process. To reduce the error of missing skin color areas, we create the wide-range skin color detection Rule 1 as follows:

Rule 1 (Skin color detection):
B_color(i, j) = 0    for r(i, j) > g(i, j) and r(i, j) > b(i, j)

B_color(i, j) = 255  otherwise

The resulting image plane B_color(i, j) may contain many noise areas. Therefore, we create the noise deletion Rule 2 as follows:

Rule 2 (Noise deletion):
Too bright or dark:

B_color(i, j) = 255  for r(i, j) > T_color-max or r(i, j) < T_color-min

Too red:

B_color(i, j) = 255  for r(i, j) > g(i, j) × 2 or r(i, j) > b(i, j) × 2

Too green or blue:
B_color(i, j) = 255  for g(i, j) > b(i, j) × 2 or b(i, j) > g(i, j) × 2

where T_color-max is the maximum red color for skin and T_color-min denotes the minimum red color. The above two rules are absolute, meaning they do not change according to lighting or race variance, excepting black people. Another color model, represented by Gaussian models C(m_rg, σ_rg²) and C(m_rb, σ_rb²), is obtained, where m_rg = (r̄, ḡ) and m_rb = (r̄, b̄), with


r̄ = (1/N) Σ_{i=1}^{N} r_i,   ḡ = (1/N) Σ_{i=1}^{N} g_i,   b̄ = (1/N) Σ_{i=1}^{N} b_i

and the variances σ_rg² and σ_rb² estimated as sample covariances over the same training pixels.

These skin color attributes are assigned through analysis of many training skin color samples. However, they cannot be fixed; they must always remain relative values. Namely, they can change with the illumination situation, and this is the reason why the mean shift algorithm of the OpenCV library is generally used.
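A sketch of Rules 1 and 2 on normalized rgb is given below. The numeric bounds `t_max` and `t_min` stand in for T_color-max and T_color-min, whose values the paper does not publish, so they are illustrative guesses only:

```python
import numpy as np

def skin_mask(rgb, t_max=0.8, t_min=0.35):
    """Rules 1 and 2 on normalized rgb. 0 marks skin candidates, 255 non-skin.

    t_max / t_min are hypothetical stand-ins for T_color-max / T_color-min.
    """
    rgb = rgb.astype(float)
    s = rgb.sum(axis=2)
    s[s == 0] = 1.0                          # avoid division by zero
    r, g, b = (rgb[..., k] / s for k in range(3))

    mask = np.full(r.shape, 255, dtype=np.uint8)
    mask[(r > g) & (r > b)] = 0              # Rule 1: wide-range skin detection
    # Rule 2: noise deletion
    mask[(r > t_max) | (r < t_min)] = 255    # too bright or dark
    mask[(r > 2 * g) | (r > 2 * b)] = 255    # too red
    mask[(g > 2 * b) | (b > 2 * g)] = 255    # too green or blue
    return mask
```

A reddish pixel such as (180, 120, 100) survives both rules, while a green-dominated pixel is rejected by Rule 1.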

III. DIVIDING ALGORITHM FOR THE OMEGA SHAPE

The present algorithm starts from the following propositions:
* All human heads show the same "Ω" shape, regardless of the viewing direction.
* If the omega is divided into two shapes, as shown here, it is possible to fit the upper area with the ellipse fitting algorithm.

To divide the omega into two parts, three steps are processed, in the order of centerline detection, a check of the vertical length, and finally a separation into two parts.

Fig. 2. Three steps for breaking point detection

From step 3, we have to decide the break points. In order to decide a break point, we define four condition rules, as shown in Figure 3.

(a) Condition 1: Width change
(b) Condition 2: Limited angle
(c) Condition 3: Multiple branches
(d) Condition 4: Limited vertical length

Fig. 3. The four separating rules

In Fig. 3, point C marks the detected centerline, while point B shows one binary pixel from the preprocessing section; a further point denotes the end position of the head region. Fig. 4 shows the results of these steps.


IV. ELLIPSE FITTING AND VERIFICATION

A. Ellipse fitting
From the previous section, candidates for a head circle region are found. The first step of the head circle detection is a contour detection method for an ellipse fitting algorithm. In the present approach, the OpenCV library (OpenCV 2003) contour extraction algorithm is applied to the binary image [16-20]. This generates a collection of external contours. A general conic has the following form:

F(x, y) = ax² + bxy + cy² + dx + ey + f

Fitzgibbon's method [19] adds a restriction to force the conic to become an ellipse. The final results of the ellipse fitting algorithm are several five-dimensional parameter sets representing the head contours, as follows:

ψ_ellipse(s) = { X_i(x_c, y_c, α, β, θ) : i = 1, …, N },   s ∈ {gray, edge, color}

where ψ_ellipse(s) is the set of ellipses from the three modules, X_i is the ith ellipse parameter set of a candidate head, (x_c, y_c) is the center of the ellipse, α and β are the lengths of the major and minor axes of the ellipse, respectively, and θ is the orientation. Finally, N is the number of ellipses.
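As an illustrative sketch (not the authors' implementation), Fitzgibbon's direct least-squares ellipse fit [19] can be written with NumPy in the numerically stable Halir-Flusser formulation; `fit_ellipse` and `ellipse_center` are helper names of our own:

```python
import numpy as np

def fit_ellipse(x, y):
    """Direct least-squares ellipse fit (Fitzgibbon et al. [19]), in the
    numerically stable form of Halir and Flusser. Returns conic
    coefficients (a, b, c, d, e, f) of a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0.
    """
    D1 = np.column_stack([x * x, x * y, y * y])     # quadratic part
    D2 = np.column_stack([x, y, np.ones_like(x)])   # linear part
    S1, S2, S3 = D1.T @ D1, D1.T @ D2, D2.T @ D2
    T = -np.linalg.solve(S3, S2.T)
    M = S1 + S2 @ T
    M = np.array([M[2] / 2, -M[1], M[0] / 2])       # apply inverse constraint matrix
    eigval, eigvec = np.linalg.eig(M)
    eigvec = np.real(eigvec)
    # The ellipse-specific constraint 4ac - b^2 > 0 selects the solution.
    cond = 4 * eigvec[0] * eigvec[2] - eigvec[1] ** 2
    a1 = eigvec[:, cond > 0][:, 0]
    return np.concatenate([a1, T @ a1])

def ellipse_center(coef):
    # Center of the conic: solve the gradient equations 2ax+by+d=0, bx+2cy+e=0.
    a, b, c, d, e, f = coef
    den = b * b - 4 * a * c
    return (2 * c * d - b * e) / den, (2 * a * e - b * d) / den
```

Fitting points sampled from a known ellipse recovers its center, from which (x_c, y_c, α, β, θ) can then be derived.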

B. Verification
Because the initial results of the ellipse fitting algorithm contain many false head detections, we construct an evaluation function K based on the size, shape, and axis ratio of each candidate ellipse:

K_evaluation(s),   evaluation ∈ {size, shape, rate}

* Size of ellipse: Head sizes depend on the distance between the object and the camera, but since we focus on long-distance head detection, maximum limits on the parameter ranges are needed. These rules help to delete several noise ellipses.

K_size(s) = { ψ_ellipse(s) : α_i < w^p, β_i < h^p, θ_i < θ^p }

where w^p and h^p are the maximum width and height of a head ellipse and θ^p is the maximum angle of a head ellipse.

* Shape of ellipse: To calculate the shape similarity of a head ellipse, we define the compactness and the ellipse fitting error.

K_shape = {compactness, fitting error}

Compactness is a popular shape description characteristic of a border area, given by:

compactness = c_i = (region border length)² / area_i

fitting_error = f_i = Σ_{j=1}^{k} ( |x_j − x_j'| + |y_j − y_j'| )

where area_i is the area of the ith ellipse, k is the number of pixels that make up the border, x_j denotes the jth pixel position on the head border, and x_j' is the corresponding pixel position on the fitted ellipse at the same angle θ_j. Using these two shape measurements, we define the following truncation set:

K_shape(s) = { ψ_ellipse(s) : c_i < c^p, f_i < f^p }

where c^p is the maximum compactness error of the ellipse and f^p denotes the maximum fitting error of the ellipse. Figure 5 illustrates the ellipse fitting error measurement in more detail.

Fig. 5. The ellipse fitting error (the distance between the fitted ellipse and the real border)

* Ratio between the vertical and horizontal radii of the ellipse: For a head ellipse, it is proper that the vertical radius is larger than the horizontal radius.

K_rate(s) = { ψ_ellipse(s) : α_i / β_i < 1 }

where α_i and β_i here denote the horizontal and vertical radii, respectively.
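Taken together, the three verification rules can be sketched as a single filter over the candidate ellipses. The numeric limits below stand in for w^p, h^p, θ^p, c^p, and f^p, whose values the paper does not publish, so they are illustrative guesses:

```python
def verify_ellipses(ellipses, w_p=60, h_p=80, theta_p=45,
                    c_p=25.0, f_p=200.0):
    """Keep only candidate ellipses passing the size, shape, and rate rules.

    Each candidate is a dict with keys: alpha (horizontal radius), beta
    (vertical radius), theta (degrees), area, border (list of (x, y)
    border pixels), and fitted (matching points on the fitted ellipse).
    All limit values are hypothetical stand-ins for the paper's thresholds.
    """
    kept = []
    for e in ellipses:
        # K_size: reject over-large or over-tilted ellipses
        if not (e["alpha"] < w_p and e["beta"] < h_p and abs(e["theta"]) < theta_p):
            continue
        # K_shape: compactness = border_length^2 / area, plus fitting error
        border_len = len(e["border"])
        compactness = border_len ** 2 / e["area"]
        fit_err = sum(abs(x - xf) + abs(y - yf)
                      for (x, y), (xf, yf) in zip(e["border"], e["fitted"]))
        if not (compactness < c_p and fit_err < f_p):
            continue
        # K_rate: the vertical radius must exceed the horizontal radius
        if e["alpha"] / e["beta"] < 1:
            kept.append(e)
    return kept
```

A tall, well-fitted candidate passes all three rules, while a wide one of the same quality is rejected by K_rate.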

V. TEST RESULTS AND CONCLUSION

To evaluate the proposed head detection algorithm, several movie files were made. Each movie file had a duration of between 2 and 4 minutes to ensure the robustness of the test in terms of head scale, orientation, and moving camera variance. Table 1 shows the test results.

Table 1. Test results according to several variances

Movie     Frames   Variance                                    Heads   Detected (rate)
Movie 1   1600     Scale (2.5-6 m)                             1600    1523 (95.1%)
Movie 2   1600     Scale + Face orientation                    1600    1475 (92.1%)
Movie 3   1600     Scale + Face orientation + Moving camera    1600    1441 (90%)
Movie 4   3200     Scale + Orientation + Multiple people       5689    5251 (92.3%)
Movie 5   3200     Multiple people + Moving camera             6545    5942 (90.8%)

The results of this algorithm demonstrate robustness to head scale, head orientation, and moving cameras. Figure 6 shows several head detection results in various environments. Most of the detection errors under scale variance occurred when the head regions were more than 5 m away from the camera. When the faces were turned nearly all the way around (not facing the camera), so that the skin color information could not be used, the detection error rates increased. Neither a moving camera nor the presence of multiple objects affected the detection ratio. Moreover, the proposed algorithm could handle real-time processing with less than a 1 GHz CPU. However, the proposed algorithm still had a high false detection rate. This means that background objects with a circular shape have some probability of being recognized as head candidates. The proposed approach cannot discriminate in this situation, but this may be mitigated by a tracking algorithm, which is a future study the authors are currently undertaking. Preliminary results are showing promise in this area.

References
[1] Ming-Hsuan Yang, D. J. Kriegman, and N. Ahuja, Detecting faces in images: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, Issue 1, Page(s): 34-58, Jan. 2002.
[2] P. Viola and M. Jones, Robust real-time face detection, Int. Journal of Computer Vision, Vol. 57, No. 2, Page(s): 137-154, 2004.
[3] F. Moreno, A. Tarrida, J. Andrade-Cetto, and A. Sanfeliu, 3D real-time head tracking fusing color histograms and stereovision, Proceedings of the 16th International Conference on Pattern Recognition, Volume 1, Page(s): 368-371, Aug. 2002.
[4] Yunqiang Chen, Yong Rui, and T. Huang, Mode-based multi-hypothesis head tracking using parametric contours, Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Page(s): 112-117, May 2002.
[5] Zhihong Zeng and Songde Ma, Head tracking by active particle filtering, Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Page(s): 82-87, May 2002.
[6] Weimin Huang, Ruijiang Luo, Haihong Zhang, Beng Hai Lee, and Rajapakse, Real time head tracking and face and eyes detection, IEEE Region Conference on Computers, Communications, Control and Power Engineering, Volume 1, Page(s): 507-510, Oct. 2002.
[7] S. B. Gokturk and C. Tomasi, 3D head tracking based on recognition and interpolation using a time-of-flight depth sensor, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2, Page(s): 211-217, July 2004.
[8] S. O. Ba and J. M. Odobez, A probabilistic framework for joint head tracking and pose estimation, International Conference on Pattern Recognition, Volume 4, Page(s): 264-267, Aug. 2004.
[9] Ye Zhang and C. Kambhamettu, Robust 3D head tracking under partial occlusion, Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Page(s): 176-182, March 2000.
[10] O. Bernier and D. Collobert, Head and hands 3D tracking in real time by the EM algorithm, IEEE Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, Page(s): 75-81, July 2001.
[11] Harsh Nanda and Kikuo Fujimura, A robust elliptical head tracker, Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Page(s): 469-474, 2004.
[12] S. Birchfield, An elliptical head tracker, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, Volume 2, Page(s): 1710-1714, Nov. 1997.
[13] S. Birchfield, Elliptical head tracking using intensity gradients and color histograms, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Page(s): 232-237, June 1998.
[14] T. W. Ridler and S. Calvard, Picture thresholding using an iterative selection method, IEEE Transactions on Systems, Man and Cybernetics, 8(8): Page(s): 630-632, 1978.
[15] G. S. Robinson, Edge detection by compass gradient masks, Computer Graphics and Image Processing, Vol. 6, Page(s): 492-501, 1977.
[16] H. Freeman and L. S. Davis, A corner-finding algorithm for chain-coded curves, IEEE Transactions on Computers, Vol. C-26, Page(s): 297-303, March 1977.
[17] C. H. Teh and R. H. Chin, On the detection of dominant points on digital curves, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 11, August 1989.
[18] J. A. Horst, Efficient piecewise linear approximation of space curves using chord and arc length, SME Applied Machine Vision '96 Conference, June 3-6, 1996.
[19] Andrew W. Fitzgibbon, Maurizio Pilu, and Robert B. Fisher, Direct least square fitting of ellipses, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5), Page(s): 476-480, 1999.
[20] S. Suzuki and K. Abe, Topological structural analysis of digitized binary images by border following, CVGIP, 30(1): Page(s): 32-46, 1985.


Fig. 6. The results of head detection according to variance of head rotation, skew, scale, and textured background
