

Source: kjs.nagaokaut.ac.jp/yamada/papers/e91-d_10_2493.pdf

IEICE TRANS. INF. & SYST., VOL.E91–D, NO.10 OCTOBER 2008, p. 2493

PAPER

Skin Color Segmentation Using Coarse-to-Fine Region on Normalized RGB Chromaticity Diagram for Face Detection

Aryuanto SOETEDJO†a), Nonmember and Koichi YAMADA††b), Member

SUMMARY This paper describes a new color segmentation based on a normalized RGB chromaticity diagram for face detection. Face skin is extracted from color images using a coarse skin region with fixed boundaries followed by a fine skin region with variable boundaries. Two newly developed histograms that have prominent peaks of skin color and non-skin colors are employed to adjust the boundaries of the skin region. The proposed approach does not need a skin color model, which depends on a specific camera parameter and is usually limited to a particular environment condition, and no sample images are required. The experimental results using color face images of various races under varying lighting conditions and complex backgrounds, obtained from four different resources on the Internet, show a high detection rate of 87%. The results of the detection rate and computation time are comparable to the well known real-time face detection method proposed by Viola-Jones [11], [12].
key words: face detection, normalized RGB, chromaticity diagram, histogram thresholding, coarse skin region, fine skin region

1. Introduction

Many machine vision applications for recognizing people and their activities have been developed, such as surveillance systems, security and control systems, human computer interaction, etc. One important task in those systems is face detection, which is usually used as a front-end process to detect and localize a human face from images.

Many researchers have proposed various techniques to detect human faces from images [1]–[14]. In [1], [2], the YCrCb color space was used to detect faces in the context of video sequences. A two-dimensional Cr-Cb histogram was employed for separating a facial region from the background [1], and a region-growing technique was applied on the remaining area. Finally, faces are detected by ellipse matching and criteria based on the face's characteristics. In [2], it was shown that under a normal illumination condition, the skin color falls into a small region on the CbCr plane, and the luminance (Y) is uncorrelated with respect to CbCr. Thus, a pixel is classified as skin-like if its chrominance value falls into the small region defined in the CbCr plane, and the luminance (Y) falls into an interval defined empirically.

Manuscript received March 24, 2008.
Manuscript revised June 13, 2008.
†The author is with the Faculty of Engineering, Multimedia University, Malaysia.
††The author is with the Department of Management and Information Systems Science, Nagaoka University of Technology, Nagaoka-shi, 940–2188 Japan.
a) E-mail: [email protected]
b) E-mail: [email protected]
DOI: 10.1093/ietisy/e91–d.10.2493

Gaussian models were employed to build the skin color models based on the YPrPb [3] or normalized RGB color spaces [4]. The skin-likelihood image obtained from the model is a grayscale image whose gray values represent the likelihood of each pixel belonging to skin. Then, skin regions and non-skin regions are obtained by thresholding the skin-likelihood image. Faces are detected by computing the ratio of width to height, and the number of holes representing eyes and mouth inside the extracted skin regions [4].

The performance of different skin models and different color spaces (HSV, YCrCb, RGB, normalized RGB, etc.) has been studied in [5], [6]. Those models have shown relative robustness against illumination changes under limited conditions, and when the illumination color is uniform over the face. The skin locus, defined as the range of skin chromaticities under varying illumination in the normalized RGB color space, was proposed to extract skin color [7]–[10]. In [10], a modification of the skin locus used in [7] was proposed to improve the extraction results. The skin locus method classifies pixels as skin color when the chromaticity values fall within the upper and lower bounds.

The drawback of the Gaussian models used in [3], [4] is the high computational time for calculating the probability of each pixel in the image. A very low computational time is achieved by the simple skin-locus method proposed in [7]–[10]. However, the skin locus is camera-dependent: different camera parameters yield different skin loci.

In recent years, a well known real-time face detection approach with a very high detection rate was introduced in [11]. Instead of using color images [1]–[10], the approach works with grayscale images. It employs an Adaboost classifier [15] to select the appropriate Haar-like rectangular features for face detection. One of the major contributions of the paper is the concept of the integral image, which allows the features to be computed very fast. Secondly, they constructed a learning algorithm based on Adaboost to select a small number of important features. Furthermore, a method to combine the Adaboost classifiers in a cascade allows non-face regions of the image to be discarded quickly. Extended rotated Haar-like features were introduced in [12] to improve the detection performance.

The drawback of the methods in [11], [12] is that they require a huge number of sample images in the training process, usually thousands of positive samples (face images) and negative samples (non-face images). In addition, the

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers



training process might take several days to complete.

The face detection methods proposed in [1]–[12] work with static images, which only consider the information contained in one image (the current frame) to detect the face object. The performance can be improved by taking into account the information from several frames up to the current frame using a particle filter approach as proposed in [13], [14]. In the approach, each particle refers to a candidate location of the face object in the image. The particle locations are compared with a target model to measure the likelihood function, which is used to estimate the probability of the presence of a face in the current frame given the history of observations up to the current time. In [13], an observation density based on static face detection using the rectangle features introduced by [11] was adopted. In [14], face detection based on skin color in the HSV color space was adopted to compute the observation density.

In this paper, we propose a new method to overcome the problems encountered in the previous works, which depend on a specific camera and lighting condition, and/or need sample images prepared for a training process. Our proposed method works in more general conditions and does not need a training process.

To avoid the training process, we adopt the approach using the skin locus as proposed in [7]–[10]. However, our skin locus is adjusted automatically by observing the peaks and valleys of newly developed histograms. Therefore, when the camera parameters and the lighting conditions change, our method adjusts the skin locus appropriately.

We propose a new skin color segmentation based on the normalized RGB chromaticity diagram by developing a coarse skin region and a fine skin region to extract skin color. Fixed straight lines are used as boundaries of the coarse region, while variable straight lines are used as boundaries of the fine region. We take advantage of the newly developed histograms, which show prominent peaks of skin and non-skin colors and valleys between them, for determining the boundaries of the skin regions.

The paper deals with detection of a single frontal face, whose shape can be assumed to be an ellipse. An ellipse detection technique is employed to detect the face from the extracted skin pixels. Since our approach is composed of two steps, skin segmentation and face detection, it is possible to develop a system for non-frontal, multiple faces by extending the techniques used in the steps accordingly; that will be addressed in a separate paper. Our main contribution in this paper is the skin segmentation method, which is effective for extracting human skin color from images.

There are many face detection applications which employ single frontal face images, such as a system to monitor a driver's fatigue using a video camera installed in the vehicle's dashboard, an access control system using a video camera installed in front of a door to allow a recognized person to enter the room, or a system to recognize a person's face and compare it with the photograph printed in the passport, assisting the immigration officer at an airport entrance gate.

The rest of this paper is organized as follows. Section 2 presents the proposed face segmentation method. Experimental results are presented in Sect. 3. Conclusions are presented in Sect. 4.

2. Skin Segmentation

2.1 Skin Locus

The color distribution of human skin from various countries has been investigated in [16]; it forms a shell-shaped area in the r-g color space (normalized RGB color space) called the skin locus, where r = R/(R+G+B) and g = G/(R+G+B). They proposed a physics-based model of skin color that can be used to detect skin color when the spectrum of the light source and the camera characteristics are known.
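The r and g definitions above can be sketched as follows. This is only an illustrative implementation (the paper gives no code); the function name and the zero-sum guard for black pixels are our own.

```python
import numpy as np

def rg_chromaticity(rgb):
    """Map an H x W x 3 RGB image onto the normalized r-g plane:
    r = R/(R+G+B), g = G/(R+G+B). The zero-sum guard for pure-black
    pixels is our own addition; the paper does not discuss that case."""
    rgb = np.asarray(rgb, dtype=np.float64)
    total = rgb.sum(axis=2)
    total[total == 0] = 1.0   # avoid 0/0 on pure-black pixels
    return rgb[..., 0] / total, rgb[..., 1] / total
```

A pure-red pixel maps to the red corner of the diagram (r = 1, g = 0), and any gray pixel to the white point (r = g = 1/3).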

In [7]–[10], [17], a skin color distribution obtained empirically was employed to develop the skin locus in the r-g color space. The upper and lower bounds of the skin locus are quadratic functions, whose coefficients are found by least square estimation of sampled points on the boundaries. In [7], [10], the upper bound quadratic function is g = −1.3767r² + 1.0743r + 0.1452, while the lower bound quadratic function is g = −0.776r² + 0.5601r + 0.1766, as illustrated in Fig. 1. In [17], the upper bound quadratic function is g = −5.05r² + 3.71r − 0.32, while the lower bound quadratic function is g = −0.65r² + 0.05r + 0.36, as illustrated in Fig. 2.
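A membership test against the quadratic bounds of [7], [10] quoted above might look like this; `inside_skin_locus` is a hypothetical helper, not code from the cited works.

```python
def inside_skin_locus(r, g):
    """Skin locus test with the quadratic bounds of [7], [10]:
    a chromaticity (r, g) is classified as skin if g lies strictly
    between the lower and upper bound curves."""
    upper = -1.3767 * r**2 + 1.0743 * r + 0.1452
    lower = -0.776 * r**2 + 0.5601 * r + 0.1766
    return lower < g < upper
```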

2.2 Proposed Skin Segmentation

Instead of developing the skin locus empirically, we develop a more general skin locus by utilizing the color distribution in the r-g chromaticity diagram as described in this section.

Figure 3 illustrates the color distribution in the r-g chromaticity diagram, where reddish colors, bluish colors, and greenish colors are separated in the three sides of the triangle. Our previous work [18] has shown that we could utilize line-p in Fig. 3 to extract red color effectively. In addition to line-p, line-n is utilized here to extract skin color by introducing the coarse and fine skin regions. At first, the coarse skin region is defined using fixed boundaries, where

Fig. 1 Skin locus proposed in [7], [10].



Fig. 2 Skin locus proposed in [17].

Fig. 3 Color distribution in the r-g chromaticity diagram.

skin and skin-like colors (colors near to skin color) are extracted. Then, the fine skin region with variable boundaries is used to separate skin color from skin-like colors.

2.2.1 Coarse Skin Region

When images are taken under a normal lighting condition, the skin locus occupies an area around the center of the r-g chromaticity diagram [7]–[10], [17]. However, when the spectrum of the light source changes, e.g. from skylight in the early morning (bluish appearance) to sunlight during sunset (reddish appearance), the skin locus moves to the right (reddish) on the r-g chromaticity diagram, as observed in [16].

The coarse skin region proposed in this paper is developed by considering the results obtained in [16] to cope with different lighting conditions. We define the boundaries of the coarse skin region as illustrated in Fig. 4, where line-G, line-R, line-B, and line-up are expressed with the following equations:

- line-G : g = r (1)

- line-R : g = r − 0.4 (2)

- line-B : g = −r + 0.6 (3)

- line-up : g = 0.4 (4)

The line-G, line-R, and line-B are used to remove the greenish, reddish, and bluish pixels respectively, while the line-up is used to remove yellow-green pixels.

Fig. 4 The coarse skin region.

In addition to those boundaries, to exclude the white pixels, we use a circle (indicated by line-c in Fig. 4) similar to [7], [8], [10], which is expressed as:

- line-c : (g − 0.33)² + (r − 0.33)² = 0.0004. (5)
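Taken together, Eqs. (1)–(5) amount to a five-clause membership test per pixel. The sketch below is our reading of Fig. 4: the inequality directions are inferred from which colors each line is said to remove, and are not stated explicitly in the text.

```python
def in_coarse_skin_region(r, g):
    """Coarse-region membership from Eqs. (1)-(5). The inequality
    directions are inferred from Fig. 4 and from which colors each
    line removes; they are assumptions, not quoted from the paper."""
    below_line_G = g < r                    # Eq. (1): removes greenish pixels
    above_line_R = g > r - 0.4              # Eq. (2): removes strongly reddish pixels
    beyond_line_B = g > -r + 0.6            # Eq. (3): removes bluish pixels
    under_line_up = g < 0.4                 # Eq. (4): removes yellow-green pixels
    outside_line_c = (g - 0.33) ** 2 + (r - 0.33) ** 2 > 0.0004  # Eq. (5): white
    return (below_line_G and above_line_R and beyond_line_B
            and under_line_up and outside_line_c)
```

With this reading, a typical skin chromaticity such as (r, g) = (0.45, 0.32) passes all five clauses, while the white point (0.33, 0.33) is rejected by line-c.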

At the first stage, we extract pixels whose values of r and g fall inside the coarse skin region. This is able to extract skin color from images under different lighting conditions, as illustrated in Fig. 5. The figures in the first row are three images with different lighting conditions. The figures in the second, third, fourth, and fifth rows are extracted images obtained using the skin regions proposed in [7], [8], [17], and our proposed method, respectively. The figures show the superiority of our method compared to the others for extracting skin color from images taken under different lighting conditions. However, in some conditions non-skin objects with colors near to skin are also extracted. Therefore, further processing is needed to separate skin objects from non-skin objects as described below.

2.2.2 Fine Skin Region

After color extraction using the coarse skin region, we have an image that contains both skin pixels and non-skin pixels. To separate skin pixels from non-skin pixels, we develop the fine skin region by moving line-p and line-n shown in Fig. 3 to particular positions according to the developed histograms as described below.

In our previous work [18], a diagonal line with slope 1.0 called line-p could be used for separating red color effectively, in the sense that by utilizing the line, the resulting g pos histogram discussed in the following shows prominent peaks and valleys for easy threshold calculation.

A histogram represents the number of pixels in an image at each different intensity. We develop two histograms in different ways. The first histogram, called g pos, is created by counting pixels by the value obtained by subtracting the r value from the g value (g − r). The second histogram, called g neg, is created by counting pixels by the value obtained by adding the g value and the r value (g + r). Then, by employing histogram peak/valley analysis, we can find the appropriate line-p and line-n for separating skin color, because those lines are represented by g − r = k1 and g + r = k2, respectively, where k1, k2 are constants. To find a peak, Gaussian smoothing is first employed to reduce small fluctuations of the histogram. Then a point in the histogram is defined as a peak if it has a local maximum value and is preceded by a value lower than a threshold (in the experiment, the threshold is set to 5). Since the threshold is introduced here, small ups and downs in the histogram can be neglected. A valley is defined in the opposite manner.

Fig. 5 Skin color extraction under different lighting conditions: (a), (b), (c) Original images; (d), (e), (f) Extracted skin color using the method in [7]; (g), (h), (i) Extracted skin color using the method in [8]; (j), (k), (l) Extracted skin color using the method in [17]; (m), (n), (o) Extracted skin color using our proposed coarse skin region.
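The histogram construction and the peak rule can be sketched as below. The bin count, value ranges, smoothing width, and the exact dip-then-maximum bookkeeping are our assumptions; the paper only fixes the threshold value of 5.

```python
import numpy as np

def build_histograms(r, g, bins=100):
    """Build the paper's two histograms over the coarse-region pixels:
    g_pos counts pixels by (g - r), g_neg by (g + r). The bin count
    and value ranges are assumptions, not given in the paper."""
    g_pos, _ = np.histogram(g - r, bins=bins, range=(-0.5, 0.5))
    g_neg, _ = np.histogram(g + r, bins=bins, range=(0.0, 1.0))
    return g_pos, g_neg

def find_peaks(hist, threshold=5, sigma=2.0):
    """Gaussian-smooth the histogram, then mark a local maximum as a
    peak only if the curve has dipped below `threshold` since the last
    peak, so small ups and downs are neglected (Sect. 2.2.2)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smooth = np.convolve(hist, kernel, mode="same")
    peaks, dipped = [], True
    for i in range(1, len(smooth) - 1):
        if smooth[i] < threshold:
            dipped = True
        elif dipped and smooth[i] >= smooth[i - 1] and smooth[i] > smooth[i + 1]:
            peaks.append(i)
            dipped = False
    return peaks
```

Valleys can be found the same way on the negated smoothed histogram, per the paper's "opposite manner" remark.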

Figure 6 shows typical examples of g pos and g neg histograms obtained from an image after extraction using the coarse skin region. There are two peaks in Figs. 6 (a) and 6 (b), and three peaks in Fig. 6 (c). In Fig. 6 (a), the higher peak represents non-skin pixels, and the lower peak represents skin pixels. In Fig. 6 (b), the higher peak represents skin pixels, and the lower peak represents non-skin pixels. In Fig. 6 (c), skin pixels are represented by the highest peak. Thus we do not have a common rule to extract skin pixels from the histograms.

Fig. 6 (a) g pos histogram; (b) g neg histogram; (c) g pos histogram with three peaks.

To overcome this difficulty and make our method work with almost all cases, we define all possible fine skin regions from the histograms and extract skin color for each region. Then we verify the extracted pixels from each region by geometry analysis and perform ellipse fitting to extract a face.

Let us define a cluster as a "hill curve" with one peak and two valleys on its left and right. Note that if the peak is at the leftmost/rightmost end of the histogram, then the lowest point to the left/right of the peak is considered a valley. We search for the clusters by finding the peaks and valleys in



the histograms. From a few experiments on single face images, we found that non-skin pixels concentrate in a cluster with a high peak when they form the uniform background of the image. When the background is non-uniform, the non-skin pixels scatter into a few clusters with low peaks. Therefore, we assume that the cluster of skin pixels will have the highest or second highest peak. Thus only the two clusters with the highest peaks are selected for further processing, because skin color will fall into one of them.

The assumption is justified because non-skin pixels far from the skin color have been removed in the previous step by the coarse skin region. Therefore, it does not imply that the face should occupy a large area in the image.

The left and right valleys of the two clusters of the g pos histogram are expressed as PTL1, PTR1, PTL2, and PTR2 as illustrated in Figs. 6 (a) and 6 (c), where PTLx denotes the left valley of cluster-x, and PTRx denotes the right valley of cluster-x. In Fig. 6 (a), since there are two clusters, we select both of them as the first and second clusters, and PTL2 = PTR1. In the case where only one cluster exists, PTL1 = PTL2 and PTR1 = PTR2. Figure 6 (c) shows the case where three peaks or clusters exist; since the two highest peaks do not appear successively, PTR1 ≠ PTL2. The same process is applied to the g neg histogram, and NTL1, NTR1, NTL2, and NTR2 are obtained as illustrated in Fig. 6 (b). Referring to Fig. 3, PTL1, PTR1, PTL2, and PTR2 are the parameters used to select the appropriate line-p for extracting skin color, while NTL1, NTR1, NTL2, and NTR2 are the parameters used to select the appropriate line-n.

Using the above parameters, we define ten fine skin regions as follows:

Let us define:

Pos = g − r and Neg = g + r. (6)

Then Region-1 to Region-10 are defined using the following rules:

- Region-1 :

IF (PTL1 ≤ Pos ≤ PTR1) AND (NTL1 ≤ Neg ≤ NTR1),

(7)

- Region-2 :

IF (PTL1 ≤ Pos ≤ PTR1) AND (NTL2 ≤ Neg ≤ NTR2),

(8)

- Region-3 :

IF (PTL2 ≤ Pos ≤ PTR2) AND (NTL1 ≤ Neg ≤ NTR1),

(9)

- Region-4 :

IF (PTL2 ≤ Pos ≤ PTR2) AND (NTL2 ≤ Neg ≤ NTR2),

(10)

- Region-5:

Region-5 = (Region-1) ∪ (Region-2), (11)

- Region-6 :

Region-6 = (Region-3) ∪ (Region-4), (12)

- Region-7 :

Region-7 = (Region-1) ∪ (Region-3), (13)

- Region-8 :

Region-8 = (Region-2) ∪ (Region-4), (14)

- Region-9 :

IF g ≤ 0.34, (15)

- Region-10 :

IF (R +G + B)/3 ≥ 0.33, (16)

where the values of R, G and B are normalized to 1.0 (0 ≤ R, G, B ≤ 1.0), and "∪" in Eqs. (11)–(14) denotes the combination of regions.
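The ten rules of Eqs. (7)–(16) reduce to a handful of interval tests per pixel. A sketch, with PTL/PTR/NTL/NTR passed in as the two-element valley lists read off the histograms (the helper name and argument layout are ours):

```python
def fine_skin_regions(pos, neg, g, gray, PTL, PTR, NTL, NTR):
    """Evaluate the ten region rules of Eqs. (7)-(16) for one pixel.
    pos = g - r and neg = g + r (Eq. (6)); PTL, PTR, NTL, NTR are
    two-element lists holding the cluster valleys (e.g. [PTL1, PTL2]);
    gray is (R + G + B)/3 with R, G, B in [0, 1]. Returns membership
    flags for Region-1 .. Region-10."""
    r1 = PTL[0] <= pos <= PTR[0] and NTL[0] <= neg <= NTR[0]   # Eq. (7)
    r2 = PTL[0] <= pos <= PTR[0] and NTL[1] <= neg <= NTR[1]   # Eq. (8)
    r3 = PTL[1] <= pos <= PTR[1] and NTL[0] <= neg <= NTR[0]   # Eq. (9)
    r4 = PTL[1] <= pos <= PTR[1] and NTL[1] <= neg <= NTR[1]   # Eq. (10)
    return [r1, r2, r3, r4,
            r1 or r2,        # Region-5, Eq. (11)
            r3 or r4,        # Region-6, Eq. (12)
            r1 or r3,        # Region-7, Eq. (13)
            r2 or r4,        # Region-8, Eq. (14)
            g <= 0.34,       # Region-9, Eq. (15)
            gray >= 0.33]    # Region-10, Eq. (16)
```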

Figure 7 illustrates the representation of the fine skin regions in the chromaticity diagram inside the coarse skin region shown in Fig. 4. In Fig. 7, PTL1, PTR2, NTL1, and NTR2 are not always on the border of the coarse skin region. Areas E, G, H appear when PTR1 ≠ PTL2; areas B, J, M appear when NTR1 ≠ NTL2; and area F appears when (PTR1 ≠ PTL2) and (NTR1 ≠ NTL2). They represent non-skin colors as discussed previously (see Fig. 6 (c)). Region-10 does not appear in the figure, because it is used to overcome a limitation of the r-g chromaticity color space as discussed below.

The ten regions are defined to cope with all possible image conditions. We might extract a skin face properly from Region-1 to Region-4 when the skin pixels fall in one cluster of both the g pos and g neg histograms. However, they may separate into two clusters when the light illuminates the face non-uniformly. In that case, none of Region-1 to Region-4 can extract all the skin pixels. Therefore, Region-5 to Region-8 are defined to cope with such cases, that is, at least one of the regions can extract all the skin pixels

Fig. 7 Representation of the fine skin regions in the chromaticity diagram inside the coarse skin region.



properly.

There are a few conditions where the chromaticity values of non-skin objects are the same as or very close to those of the skin object. In those cases, there are no valleys in the g pos and g neg histograms to find PTR1, PTL2, NTR1, and NTL2. Thus PTR1 = PTR2, PTL1 = PTL2, NTR1 = NTR2, NTL1 = NTL2. Hence, Region-1 to Region-8 are all the same, and cannot be used for separating the skin and non-skin objects. To cope with this problem, the additional Region-9 and Region-10 are defined as shown in Eqs. (15) and (16).

Region-9 is used to remove non-skin objects with colors near the white point (r = 0.33, g = 0.33), which are removed neither by line-c in Eq. (5) nor by Regions 1 to 8. As shown in Fig. 7, if the non-skin objects occupy area-L and the skin object occupies area-A+C+D, then there will be a valley in the g pos histogram where PTR1 ≠ PTR2, PTL1 ≠ PTL2, and PTR1 = PTL2; thus Region-5 will extract skin color properly. However, if the skin object occupies area-I or area-I+A, then there will be no valley in the g pos and g neg histograms for separating the skin and non-skin objects, so Regions 1 to 8 cannot be used to extract skin color as discussed previously. Instead, Region-9 is introduced to discard the non-skin objects.

Region-10 is used to remove non-skin objects having the same chromaticity values, as addressed in [10]. Due to the normalization effect in the r-g chromaticity color space, dark pixels and bright pixels might have the same r and g values. Consider two pixels: a dark pixel with RGB values R = 0.3, G = 0.3, B = 0.3, and a bright one with R = 0.6, G = 0.6, B = 0.6, where the values of R, G and B are normalized to 1.0 (0 ≤ R, G, B ≤ 1.0). Both have the same values r = 0.33 and g = 0.33 in the chromaticity diagram. However, the grayscale values obtained using Eq. (16) for the dark one and the bright one are 0.3 and 0.6, respectively. Therefore, Region-10 can resolve the problem.

2.3 Face Detection

The coarse and fine skin regions extract all human skin, including face, hands, neck, etc. In some conditions, especially when images have complex backgrounds including colors similar to skin, non-skin pixels are also extracted. Therefore we employ geometric analysis and ellipse detection to detect the face, since we deal with a frontal face whose shape can be assumed to be an ellipse.

After extracting skin pixels in the ten regions, connected component analysis is used to group the connected pixels into blobs. In this research, the original image is resized to 60 × 80 pixels using the nearest neighbor method, which simply matches a pixel in the original image to the corresponding position in the resized image, so no new colors are created. Therefore the edges are relatively sharp compared with methods such as bilinear and bicubic interpolation, which makes it suitable for color segmentation. To reduce the number of non-face objects, the resulting blobs are pre-processed as follows. First, a median filter is applied to remove spurious pixels or very small blobs. Blobs usually have holes due to the non-skin parts of the face, such as the eyes and mouth; a morphological "filling" operator is used to fill the holes. Then, connected component analysis is used to form the solid blobs and label them. Finally, blobs are removed if the number of pixels is less than 400 or greater than 3,000.
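The blob pre-processing steps can be sketched with SciPy's morphology tools; the 3 × 3 median window is an assumption, as the paper does not specify the filter size.

```python
import numpy as np
from scipy import ndimage

def clean_blobs(skin_mask, min_pixels=400, max_pixels=3000):
    """Blob pre-processing of Sect. 2.3 on a 60 x 80 binary skin mask:
    median-filter spurious pixels, fill the holes left by eyes and
    mouth, label the connected components, and keep only blobs whose
    pixel count lies within [min_pixels, max_pixels]."""
    mask = ndimage.median_filter(skin_mask.astype(np.uint8), size=3) > 0
    mask = ndimage.binary_fill_holes(mask)
    labels, n = ndimage.label(mask)
    keep = np.zeros_like(mask)
    for lbl in range(1, n + 1):
        blob = labels == lbl
        if min_pixels <= blob.sum() <= max_pixels:
            keep |= blob
    return keep
```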

In the next stage, the edge pixels of each labeled blob are extracted and used to detect the ellipse using an ellipse fitting method [24]. This ellipse fitting method is based on the least square criterion as described in the following. The equation of an ellipse can be written as

a₁x² + a₂xy + a₃y² + a₄x + a₅y + f = 0. (17)

The least square estimator estimates the coefficients A = [a₁ a₂ a₃ a₄ a₅]ᵀ from the given data points (x₁, y₁), . . . , (x_N, y_N) by minimizing the sum of squared errors between the data points and the ellipse

e = Σ_{i=1}^{N} (a₁xᵢ² + a₂xᵢyᵢ + a₃yᵢ² + a₄xᵢ + a₅yᵢ + f)². (18)

The values of the estimated coefficients A are then converted to the ellipse parameters: orientation (θ), center coordinates (h, k), semi-major axis (a), and semi-minor axis (b). (Readers may refer to Ref. [24] for the detailed formulas.)
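A simplified version of the least-squares fit can be obtained by fixing f = −1 (the conic is defined only up to scale) and solving a plain linear system; Ref. [24] uses a constrained formulation, so this sketch is a stand-in, not the authors' exact method.

```python
import numpy as np

def fit_ellipse(x, y):
    """Least-squares fit of the conic of Eq. (17). Fixing f = -1 and
    moving it to the right-hand side turns Eq. (18) into an ordinary
    linear least-squares problem for A = [a1 a2 a3 a4 a5]."""
    D = np.column_stack([x**2, x * y, y**2, x, y])
    rhs = np.ones_like(x, dtype=float)   # f = -1 on the right-hand side
    A, *_ = np.linalg.lstsq(D, rhs, rcond=None)
    return A
```

For points sampled from the circle x² + y² = 4 (a special ellipse), the fit recovers a₁ = a₃ = 0.25 with the cross and linear terms zero.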

The final stage is to verify whether the detected ellipses are faces or not. Many methods have been proposed for face verification, such as calculating the ratio of height to width [1], [4], [9], symmetry analysis [9], and facial component analysis, i.e. detecting the eyes, lips, and nose [1], [3], [4], [9], [10]. Facial component analysis might be the best method for face verification, but it contributes to a high computation time; it is reported that facial component analysis takes about 70% of the overall face detection time [10].

Face verification using the height-to-width ratio is the most popular and simplest method, since it is reasonable that the ratio of height to width of a human face should fall within a specific range. In this paper we employ this method, requiring the height-to-width ratio to fall within 1.2 to 2.2, followed by a simple proposed method that calculates the ratio of the blob's mismatch area, as described below.

There are several cases where, even though the height-to-width ratio of the detected ellipse falls within the face's range, the fitted ellipse is not a true face, which causes false detections. To overcome this problem, we propose to utilize the blob's mismatch area for ellipse verification.

Instead of computing all possible pixels along the ellipse's circumference, which is computationally expensive, we calculate only eight points from the ellipse parameters described previously, as illustrated in Fig. 8, using the following formulas:

$P_i(x) = h - a \cos(\alpha_i)\cos(\theta) + b \sin(\alpha_i)\sin(\theta)$ (19)

$P_i(y) = k + a \cos(\alpha_i)\sin(\theta) + b \sin(\alpha_i)\cos(\theta)$ (20)

where $P_i(x)$ is the x-coordinate of point $P_i$; $P_i(y)$ is the y-coordinate of point $P_i$; $i = 1, 2, \ldots, 8$; $\alpha_1 = 0°$, $\alpha_2 = 45°$,



Fig. 8 Illustration of the calculation of the blob's mismatch area.

$\alpha_3 = 90°$, $\alpha_4 = 135°$, $\alpha_5 = 180°$, $\alpha_6 = 225°$, $\alpha_7 = 270°$, and $\alpha_8 = 315°$.
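Eqs. (19) and (20) with the eight angles above can be sketched as (the function name and vectorized form are illustrative):

```python
import numpy as np

def octagon_points(h, k, a, b, theta):
    """Eight points on the fitted ellipse at 45-degree steps (Eqs. 19-20)."""
    alphas = np.deg2rad(np.arange(0, 360, 45))  # 0, 45, ..., 315 degrees
    px = h - a * np.cos(alphas) * np.cos(theta) + b * np.sin(alphas) * np.sin(theta)
    py = k + a * np.cos(alphas) * np.sin(theta) + b * np.sin(alphas) * np.cos(theta)
    return np.column_stack([px, py])
```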

We define a new blob, called blob_oct, as the intersection of the octagon created from the eight points and the original blob. The areas of the original blob and blob_oct are denoted blob_area and blob_oct_area, respectively. The mismatch area is defined as

Mismatch_area = blob_area − blob_oct_area. (21)

The ellipse matching ratio is defined as

Match_ratio = (blob_oct_area − Mismatch_area) / elp_area, (22)

where elp_area is the area of the ellipse, defined as π × semi-major axis × semi-minor axis.

From Eq. (22), it is easily seen that a blob representing a true ellipse will have a high match ratio. Thus, only blobs with a high match ratio are verified as face candidates. Since we deal with single-face images, only the blob with the highest match ratio is considered the face candidate.
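Assuming the blob and the octagon are rasterized as boolean masks on the same image grid, Eqs. (21) and (22) reduce to a few array operations (a sketch; names are illustrative):

```python
import numpy as np

def match_ratio(blob, octagon, semi_major, semi_minor):
    """blob, octagon: boolean masks of equal shape; returns Eq. (22)."""
    blob_area = int(blob.sum())
    blob_oct = blob & octagon                      # intersection blob_oct
    blob_oct_area = int(blob_oct.sum())
    mismatch = blob_area - blob_oct_area           # Eq. (21)
    elp_area = np.pi * semi_major * semi_minor     # area of the fitted ellipse
    return (blob_oct_area - mismatch) / elp_area   # Eq. (22)
```

A blob that coincides with the octagon has zero mismatch, so the ratio approaches blob_oct_area / elp_area.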

Another problem arises when the extracted blob contains the neck. Since the face and neck have the same skin color, they belong to the same object/blob, and in some cases the fitted ellipse does not represent the face. One way to overcome this problem is to remove the lower part of the blob before ellipse fitting, using the following procedure. First, we divide the original blob into three parts of equal height, called the upper, middle, and lower parts. Then we scan the blob from top to bottom within the middle part to find the vertical position where the blob's width is maximum, called blob_width. This position is assumed to be the ear position. From this ear position, we scan the blob toward the bottom to find the vertical position where the blob's width is minimum. This position is assumed to be the turnoff point between the face and neck, as shown in the first example of Fig. 9. Finally, we remove the blob's pixels below this position.
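The scan described above can be sketched as follows (a simplified version that assumes a turnoff point exists; the fallback for blobs without one is omitted):

```python
import numpy as np

def remove_neck(blob):
    """blob: boolean mask (H x W). Returns a copy with pixels below the
    assumed face/neck turnoff removed."""
    out = blob.copy()
    widths = out.sum(axis=1)                 # blob width at each row
    rows = np.nonzero(widths)[0]
    top, bottom = rows[0], rows[-1]
    h3 = (bottom - top + 1) // 3
    # Widest row in the middle third: assumed ear position.
    mid = slice(top + h3, top + 2 * h3)
    ear_row = top + h3 + int(np.argmax(widths[mid]))
    # Narrowest row from the ear position down: assumed turnoff point.
    turnoff = ear_row + int(np.argmin(widths[ear_row:bottom + 1]))
    out[turnoff + 1:, :] = False             # drop pixels below the turnoff
    return out
```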

In some cases, the minimum-width position might not be found, either because the shapes of the face or neck do not form a turnoff point (see the second example of Fig. 9), or because the original blob does not contain a neck (see the third example of Fig. 9), e.g., when the neck is hidden by a muffler, a turtleneck, etc. In this case, the vertical position below the ear position at a distance equal to 0.7 × blob_width is taken as the starting position for removing the lower blob pixels. The method will not remove (or removes only a few of) the lower pixels of a blob that contains only the face, as illustrated in the third example of Fig. 9. Therefore, the method extracts an ellipse that fits the face properly whether the neck is hidden or not.

Fig. 9 Illustration of the results of each step in the face detection method.

Fig. 10 Face detection process.

The overall procedure of the face detection is shown in Fig. 10. The ten fine regions described in the previous section are devised so that at least one of them can extract the face skin color in a given test image. However, there may be no common rule for choosing the most appropriate region for a particular test image. Therefore, we conduct skin color extraction using all ten fine skin regions and then apply the face detection described above to each blob image



obtained from each region, as illustrated in Fig. 10. Let the set of face candidates obtained from the ten regions be defined as $fc = \{f_1, f_2, \ldots, f_{10}\}$, where $f_i$ is the face candidate obtained from region $i$. The corresponding match ratio is expressed as $m(f_i)$. The detected face ($df$) is selected as the face candidate with the highest match ratio, or mathematically

$df = \arg\max_{f \in fc} m(f)$. (23)
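Eq. (23) is a simple argmax over the ten candidates; assuming candidates are stored as hypothetical (face, match ratio) pairs, it can be sketched as:

```python
def select_face(candidates):
    """candidates: list of (face, match_ratio) pairs, one per fine region.
    Returns the face with the highest match ratio (Eq. 23)."""
    return max(candidates, key=lambda fm: fm[1])[0]
```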

3. Experimental Results

We implemented our face segmentation algorithm in MATLAB running on a Pentium-D PC at 2.8 GHz. Four face image datasets obtained from different resources on the Internet [19]–[23] are used to evaluate our algorithm. The four datasets contain frontal faces of various races under different lighting conditions and with different backgrounds.

Dataset-1 [19] consists of 197 single frontal face images with a white background. Dataset-2 [20] consists of 303 single frontal face images with a green background. Dataset-3 [21], [22] consists of 448 single frontal face images with colorful objects in the background. Dataset-4 [23] consists of 425 single frontal face images with colorful objects in the background. All images in dataset-1, dataset-2, and dataset-3 are indoor images, while dataset-4 contains both indoor and outdoor images. Each dataset contains images where the light illuminates the face non-uniformly.

Figure 11 illustrates the face detection results, where Figs. 11 (a)–11 (d) are the original images with the detected faces drawn as ellipses and the corresponding rectangles (bounding boxes) superimposed on the images. Figures 11 (e)–11 (h) show the skin regions on the r-g chromaticity diagram for Figs. 11 (a)–11 (d), respectively.

Observing the figures, the skin regions occupy different locations in the chromaticity diagrams and have different areas for different images; the area of the skin region in Fig. 11 (f) is larger than the others. It is obvious that the fixed skin locus methods proposed in [7]–[10], [17] are not effective at extracting skin from images under such different conditions. However, our algorithm, by adjusting the boundaries of the fine region, can extract skin and detect faces effectively, as illustrated in the figures. Our method works well with both indoor and outdoor images. Moreover, it can overcome the problem of a non-uniformly illuminated face, as illustrated in Fig. 11 (d).

Fig. 11 Face detection results from the datasets: (a), (b), (c), (d) detected faces superimposed on the original images from dataset-1, dataset-2, dataset-3, and dataset-4, respectively; (e), (f), (g), (h) skin regions of images (a), (b), (c), (d) on the r-g chromaticity diagram, respectively.

To verify the effectiveness of our algorithm, we compare it with four existing methods: three methods based on the skin locus proposed in [8], [10], [17], called M1, M2, and M3, and one method based on AdaBoost proposed in [11], [12], called AM. We compute the true detection rate (TD), the false detection rate (FD), and the mis-detection rate (MD) of each method for all images from dataset-1, dataset-2, dataset-3, and dataset-4.

TD is defined as the ratio of the total number of correctly detected faces to the total number of face images, FD as the ratio of the total number of wrongly detected faces to the total number of face images, and MD as the ratio of the total number of mis-detected faces to the total number of face images.
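As a small worked example of these definitions (the counts are illustrative, not the paper's results): with 100 face images, 87 correct detections, 8 wrong detections, and 5 mis-detections,

```python
def detection_rates(n_true, n_false, n_missed, n_images):
    """TD, FD, MD as ratios over the total number of face images."""
    td = n_true / n_images     # true detection rate
    fd = n_false / n_images    # false detection rate
    md = n_missed / n_images   # mis-detection rate
    return td, fd, md

td, fd, md = detection_rates(87, 8, 5, 100)  # 0.87, 0.08, 0.05
```

When every image yields exactly one outcome, as in PM, M1, M2, and M3, the three rates sum to 1.0.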

In the comparisons, we use the bounding box created from the detected ellipse, rather than the detected ellipse itself, as shown in Figs. 9, 11, and 12 to determine true detection, for two reasons: a) in practice, face detection is used to localize the face in the image, and further processing that requires more detailed facial information is carried out by another algorithm according to the specific application, such as face recognition, facial expression recognition, or fatigue recognition; the bounding box is therefore sufficient for face localization; b) to make a fair comparison with the existing face detection technique (AM), which does not use an ellipse for detection but a rectangle/square for face localization.

A detection is judged to be a true detection when all three of the following conditions are satisfied: a) the bounding box encloses both eyes and the mouth; b) the horizontal deviation of the bounding box from the face is less than 10% of the actual face width; c) the vertical deviation of the bounding box from the face is less than 10% of the actual face height. Figure 12 illustrates the detection results, where Fig. 12 (a) represents a true detection, while Figs. 12 (b) and 12 (c) represent false detections.
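The three conditions can be sketched as a predicate over hypothetical (x, y, w, h) boxes and (x, y) landmark points (the representation is an assumption for illustration):

```python
def is_true_detection(box, face_box, eyes, mouth):
    """box: detected bounding box; face_box: ground-truth face box;
    eyes: list of (x, y) eye centers; mouth: (x, y) mouth center."""
    bx, by, bw, bh = box
    fx, fy, fw, fh = face_box
    # a) the box must enclose both eyes and the mouth
    inside = all(bx <= px <= bx + bw and by <= py <= by + bh
                 for px, py in list(eyes) + [mouth])
    # b) horizontal deviation less than 10% of the actual face width
    ok_h = abs(bx - fx) < 0.1 * fw
    # c) vertical deviation less than 10% of the actual face height
    ok_v = abs(by - fy) < 0.1 * fh
    return inside and ok_h and ok_v
```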

Fig. 12 (a) True detection; (b), (c) false detections.

In PM (the proposed method), we apply our proposed skin segmentation and then apply median filtering, a morphological hole-filling operator, and connected component analysis using the MATLAB functions MEDFILT2, IMFILL, and BWLABEL, respectively, from the MATLAB Image Processing Toolbox, followed by ellipse fitting using the least-squares method [24] and our proposed face verification as described before. In M1, M2, and M3, we apply the skin loci proposed in [8], [10], and [17], respectively, for skin color segmentation, and apply the same methods as PM for connected component analysis, ellipse fitting, and face verification.

For AM, we use the method proposed in [12] as implemented in OpenCV [25] (an open-source computer vision library developed by Intel). Two different training sets are used, called AM-1 and AM-2. In AM-1, we trained the AdaBoost classifier using 700 positive images taken randomly from datasets 1–4 (100 faces from dataset-1, 150 from dataset-2, 225 from dataset-3, and 225 from dataset-4) and 255 negative (non-face) images. The training parameters are the same as those used in [12]. The detection process is applied to the remaining images in the datasets that were not used as training samples. In AM-2, we use the pre-trained classifier from [12], which was trained on 5,000 positive images and 3,000 negative images and is available in [25]. The detection process is applied to all images in the datasets, as in PM, M1, M2, and M3.

Since we deal with single-face images in all datasets, only the best ellipse is detected in the face verification step; thus, the sum of the true detection rate, the false detection rate, and the mis-detection rate for each dataset is 1.0 for PM, M1, M2, and M3, but not for AM, due to its different approach. The comparison results are given in Table 1.

From Table 1, it is clear that the existing skin segmentation methods using fixed skin regions or skin loci [8], [10], [17] are not effective at extracting faces when the image conditions differ from those used to determine the skin locus.

Table 1 Comparison of the face detection results.

However, the proposed method shows a high true detection rate for all images from the four datasets despite their different conditions.

The results of AM-1 and AM-2 show that the true detection rate of AdaBoost relies on the data used for training. Achieving a high detection rate requires a proper choice of the number, quality, and diversity of training data, as well as the training parameters. Our approach, on the other hand, needs no training data and therefore avoids these problems. Comparing the true detection rates of PM, AM-1, and AM-2, we can say that our approach is very efficient, especially when one does not want to be burdened with preparing and conducting a training process.

Since our method requires computing ten fine skin regions, its computational time is higher than that of the existing methods (M1, M2, M3). In the experiment, we resize images to 60 × 80 pixels to compute all ten fine regions. The average computation time of our face detection method, using MATLAB running on a Pentium-D PC at 2.8 GHz, is 0.75 seconds. To make a comparison with AM, we run AM in MATLAB using the MATLAB interface (MEX file) provided by [26]; its computation time is 0.1 seconds on the same PC. This result suggests that our method is feasible for real-time implementation in a faster language such as C++.

The weakness of our method, namely the need to compute ten fine skin regions, could be mitigated by working with video images instead of still images. With video, we can track the face from frame to frame, so the ten skin regions need not be computed in every frame, reducing the overall detection time.

4. Conclusions

In this paper, we proposed a new approach for face segmentation based on the normalized RGB chromaticity diagram. It extracts skin color using a coarse region with fixed boundaries and a fine region with variable boundaries. Face verification is then carried out using ellipse fitting and the proposed match ratio based on the blob's mismatch area.

In experiments using single frontal face images of various races under varying lighting conditions, obtained from four different resources on the Internet, the proposed method showed a high detection rate. The method is simple, requires no sample images, and its computational cost is relatively low, making it suitable for implementation in real situations.

In future work, we will extend our method to detect multiple faces under various orientations, incorporate video for face detection and tracking to increase performance, and implement our method in a real hardware system.

Acknowledgments

The authors thank The Psychological Image Collection at the University of Stirling, UK; Dr. Libor Spacek at the Department of Computer Science, University of Essex, UK; the VALID Database, University College Dublin, Ireland; and Markus Weber at the California Institute of Technology, USA, for sharing the face image datasets used in this research.

References

[1] S. Kewei, F. Xitian, C. Anni, and S. Jingao, "Automatic face segmentation in YCrCb images," Proc. Fifth Asia-Pacific Conference on Communication and Fourth Optoelectronics and Communications Conference, pp.916–919, Beijing, China, Oct. 1999.

[2] A. Albiol, L. Torres, C.A. Bouman, and E.J. Delp, "A simple and efficient face detection algorithm for video database applications," Proc. IEEE International Conference on Image Processing, vol.2, pp.239–242, Vancouver, Canada, Sept. 2000.

[3] P. Campadelli, F. Cusmai, and R. Lanzarotti, "A color-based method for face detection," Proc. International Symposium on Telecommunications, pp.186–190, Isfahan, Iran, Aug. 2003.

[4] L.H. Zhao, X.L. Sun, J.H. Liu, and X.H. Xu, "Face detection based on skin color," Proc. International Conference on Machine Learning and Cybernetics, vol.6, pp.3625–3628, Shanghai, China, Aug. 2004.

[5] J.C. Terrilon, M.N. Shirazi, H. Fukumachi, and S. Akamatsu, "Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images," Proc. 4th IEEE International Conference on Automatic Face and Gesture Recognition, pp.54–61, Grenoble, France, March 2000.

[6] M.H. Yang, D.J. Kriegman, and N. Ahuja, "Detecting faces in images: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol.24, no.1, pp.34–58, 2002.

[7] M. Soriano, B. Martinkauppi, S. Huovinen, and M. Laaksonen, "Using the skin locus to cope with changing illumination conditions in color-based face tracking," Proc. IEEE Nordic Signal Processing Symposium, pp.383–386, Kolmarden, Sweden, 2000.

[8] M. Soriano, B. Martinkauppi, S. Huovinen, and M. Laaksonen, "Skin detection in video under changing illumination conditions," Proc. Computer Vision and Pattern Recognition, vol.1, pp.839–842, 2000.

[9] A. Hadid, M. Pietikainen, and B. Martinkauppi, "Color-based face detection using skin locus model and hierarchical filtering," Proc. 16th IEEE International Conference on Pattern Recognition, vol.4, pp.196–200, Quebec, Canada, Aug. 2002.

[10] C.C. Chiang, W.K. Tai, M.T. Yang, Y.T. Huang, and C.J. Huang, "A novel method for detecting lips, eyes and faces in real time," Real-Time Imaging, vol.9, no.4, pp.277–287, Aug. 2003.

[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, 2001.

[12] R. Lienhart, A. Kuranov, and V. Pisarevsky, "Empirical analysis of detection cascades of boosted classifiers for rapid object detection," Proc. 25th Pattern Recognition Symposium, pp.297–304, Magdeburg, Germany, Sept. 2003.

[13] J. Czyz, "Object detection via particle filters," Proc. International Conference on Pattern Recognition (ICPR) 2006, vol.1, pp.820–823, 2006.

[14] L. Stasiak and A. Pacut, "Particle filters for multi-face detection and tracking with automatic clustering," Proc. IEEE International Workshop on Imaging Systems and Techniques, Poland, May 2007.

[15] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol.55, no.1, pp.119–139, Aug. 1997.

[16] M. Storring, H. Andersen, and E. Granum, "Skin colour detection under changing lighting condition," Proc. 7th Symposium on Intelligent Robotics Systems, ed. H. Araujo and J. Dias, pp.187–195, Coimbra, Portugal, July 1999.

[17] J. Fritsch, S. Lang, M. Kleinehagenbrock, G.A. Fink, and G. Sagerer, "Improving adaptive skin color segmentation by incorporating results from face detection," Proc. IEEE International Workshop on Robot and Human Interactive Communication, Berlin, Germany, Sept. 2002.

[18] A. Soetedjo and K. Yamada, "A new approach on red color thresholding for traffic sign recognition system," J. Japan Society for Fuzzy Theory and Intelligent Informatics, vol.19, no.5, pp.457–465, Oct. 2007.

[19] http://pics.psych.stir.ac.uk/
[20] http://cswww.essex.ac.uk/mv/allfaces/index.html
[21] http://ee.ucd.ie/validdb/datasets.html
[22] N.A. Fox, B.A. O'Mullane, and R.B. Reilly, "VALID: A new practical audio-visual database, and comparative results," Proc. 5th International Conference on Audio- and Video-Based Biometric Person Authentication, AVBPA-2005.

[23] http://www.vision.caltech.edu/Image_Datasets/faces/faces.tar
[24] http://www.mathworks.com/matlabcentral/files/3215/fit_ellipse.m
[25] http://www.intel.com/technology/computing/opencv/index.htm
[26] http://staff.science.uva.nl/~zivkovic/Publications/mexHaarDetect.zip

Aryuanto Soetedjo received the B.Eng. and M.Eng. degrees in Electrical Engineering from Bandung Institute of Technology, Indonesia, in 1993 and 2002, respectively, and the Dr. Eng. degree in Information Science and Control Engineering from Nagaoka University of Technology, Japan, in 2006. During 1994–2003, he worked in the Traffic Control System R&D Department, Telnic Industries Co. Ltd. He is currently a lecturer in the Faculty of Engineering, Multimedia University, Malaysia. His main research interests are image processing, machine learning, artificial intelligence, robotics, and control systems.

Koichi Yamada received the B.Eng. degree in Control Engineering in 1978, and the M.Eng. and Dr. Eng. degrees in Systems Science in 1980 and 1996, respectively, from Tokyo Institute of Technology, Japan. He is currently a professor in the Department of Management and Information Systems Science, Nagaoka University of Technology, Japan. His main research interests are reasoning under uncertainty, machine learning, intelligent support systems, and human-computer interaction.