
Quo vadis Face Recognition?

Ralph Gross, Jianbo Shi
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{rgross,jshi}@cs.cmu.edu

Jeffrey F. Cohn
Department of Psychology
University of Pittsburgh
Pittsburgh, PA 15260
[email protected]

    Abstract

Within the past decade, major advances have occurred in face recognition. With few exceptions, however, most research has been limited to training and testing on frontal views. Little is known about the extent to which face pose, illumination, expression, occlusion, and individual differences, such as those associated with gender, influence recognition accuracy. We systematically varied these factors to test the performance of two leading algorithms, one template based and the other feature based. Image data consisted of over 21,000 images from 3 publicly available databases: the CMU PIE, Cohn-Kanade, and AR databases. In general, both algorithms were robust to variation in illumination and expression. Recognition accuracy was highly sensitive to variation in pose. For frontal training images, performance was attenuated beginning at about 15 degrees. Beyond about 30 degrees, performance became unacceptable. For non-frontal training images, fall-off was more severe. Small but consistent differences were found for individual differences in subjects. These findings suggest directions for future research, including design of experiments and data collection.

1. Introduction

Is face recognition a solved problem? Over the last 30 years face recognition has become one of the best studied pattern recognition problems, with a nearly intractable number of publications. Many of the algorithms have demonstrated excellent recognition results, often with error rates of less than 10 percent. These successes have led to the development of a number of commercial face recognition systems. Most of the current face recognition algorithms can be categorized into two classes: image template based or geometry feature-based. The template based methods [1] compute the correlation between a face and one or more model templates to estimate the face identity. Statistical tools such as Support Vector Machines (SVM) [30, 21], Linear Discriminant Analysis (LDA) [2], Principal Component Analysis (PCA) [27, 29, 11], Kernel Methods [25, 17], and Neural Networks [24, 7, 12, 16] have been used to construct a suitable set of face templates. While these templates can be viewed as features, they mostly capture global features of the face images. Facial occlusion is often difficult to handle in these approaches.
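To make the template based family concrete, the sketch below builds an eigenface (PCA) template space from training images and identifies a probe by nearest neighbor in that space. This is a minimal illustration of the general approach described above, not the formulation of any specific cited system; the function names and the choice of 50 components are our own.

```python
import numpy as np

def build_eigenspace(train_images, n_components=50):
    """PCA on vectorized face images: returns the mean face and the top eigenfaces."""
    X = np.asarray([img.ravel() for img in train_images], dtype=np.float64)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions ("eigenfaces") of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(img, mean, eigenfaces):
    """Coefficients of an image in the eigenface subspace."""
    return eigenfaces @ (img.ravel().astype(np.float64) - mean)

def identify(gallery, probe, mean, eigenfaces):
    """Closed-universe identification: return the gallery id closest to the probe.
    `gallery` maps subject id -> one face image."""
    p = project(probe, mean, eigenfaces)
    ids, feats = zip(*[(sid, project(img, mean, eigenfaces)) for sid, img in gallery.items()])
    dists = [np.linalg.norm(p - f) for f in feats]
    return ids[int(np.argmin(dists))]
```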

The geometry feature-based methods analyze explicit local facial features and their geometric relationships. Cootes et al. presented an active shape model in [15], extending the approach by Yuille [34]. Wiskott et al. developed an elastic bunch graph matching algorithm for face recognition in [33]. Penev et al. [22] developed PCA into Local Feature Analysis (LFA). This technique is the basis for one of the most successful commercial face recognition systems, FaceIt.

Most face recognition algorithms focus on frontal facial views. However, pose changes can often lead to large non-linear variation in facial appearance due to self-occlusion and self-shading. To address this issue, Moghaddam and Pentland [20] presented a Bayesian approach using PCA as a probability density estimation tool. Li et al. [17] developed a view-based, piece-wise SVM model for face recognition. In the feature based approach, Cootes et al. [5] proposed a 3D active appearance model to explicitly compute the face pose variation. Vetter et al. [32, 31] learn a 3D geometry-appearance model for face registration and matching. However, today the exact trade-offs and limitations of these algorithms are relatively unknown.

To evaluate the performance of these algorithms, Phillips et al. conducted the FERET face algorithm tests [23], based on the FERET database, which now contains 14,126 images from 1,199 individuals. More recently, the Facial Recognition Vendor Test [3] evaluated commercial systems using the FERET and HumanID databases. The test results have revealed that important progress has been made in face recognition, and many aspects of the face recognition problem are now well understood. However, there still remains a gap between these testing results and the practical user experiences of commercial systems. While this gap can, and will, be narrowed through improvements of practical details such as sensor resolution and view selection, we would like to understand clearly the fundamental capabilities and limitations of current face recognition systems.

In this paper, we conduct a series of tests using two state-of-the-art face recognition systems on three newly constructed face databases to evaluate the effect of face pose, illumination, facial expression, occlusion and subject gender on face recognition performance.

The paper is organized as follows. We describe the three databases used in our evaluation in Section 2. In Section 3 we introduce the two algorithms we used for our evaluations. The experimental procedures and results are presented in Section 4, and we conclude in Section 5.

    2. Description of Databases

    2.1. Overview

Table 1 gives an overview of the databases used in our evaluation.

               CMU PIE   Cohn-Kanade   AR DB
Subjects          68         105        116
Poses             13           1          1
Illuminations     43           3          3
Expressions        3           6          3
Occlusion          0           0          2
Sessions           1           1          2

Table 1. Overview of the databases.

2.2. CMU Pose, Illumination, and Expression (PIE) database

The CMU PIE database contains a total of 41,368 images taken from 68 individuals [26]. The subjects were imaged in the CMU 3D Room [14] using a set of 13 synchronized high-quality color cameras and 21 flashes. The resulting images are 640x480 in size, with 24-bit color resolution. The cameras and flashes are distributed in a hemisphere in front of the subject as shown in Figure 1.

A series of images of a subject across the different poses is shown in Figure 2. Each subject was recorded under 4 conditions:

1. expression: the subjects were asked to display a neutral face, to smile, and to close their eyes in order to simulate a blink. The images of all 13 cameras are available in the database.

2. illumination 1: 21 flashes are individually turned on in a rapid sequence. In the first setting the images were captured with the room lights on. Each camera recorded 24 images: 2 with no flashes, 21 with one flash firing, and then a final image with no flashes. Only the output of three cameras (frontal, three-quarter and profile view) was kept.

3. illumination 2: the procedure for illumination 1 was repeated with the room lights off. The output of all 13 cameras was retained in the database. Combining the two illumination settings, a total of 43 different illumination conditions were recorded.

4. talking: subjects counted, starting at 1. Two seconds (60 frames) of them talking were recorded using 3 cameras as above (again frontal, three-quarter and profile view).

Figure 3 shows examples for illumination conditions 1 and 2.

2.3. Cohn-Kanade AU-Coded Facial Expression Database

This is a publicly available database from Carnegie Mellon University [13]. It contains image sequences of facial expressions from men and women of varying ethnic backgrounds. The camera orientation is frontal. Small head motion is present. Image size is 640 by 480 pixels with 8-bit gray scale resolution. There are three variations in lighting: ambient lighting, single high-intensity lamp, and dual high-intensity lamps with reflective umbrellas. Facial expressions are coded using the Facial Action Coding System [8] and also assigned emotion-specified labels. For the current study, we selected 714 image sequences from 105 subjects. Emotion expressions included happy, surprise, anger, disgust, fear, and sadness. Examples of the different expressions are shown in Figure 4.

2.4. AR Face Database

The publicly available AR database was collected at the Computer Vision Center in Barcelona [19]. It contains images of 116 individuals (63 males and 53 females). The images are 768x576 pixels in size with 24-bit color resolution. The subjects were recorded twice at a 2-week interval.

Figure 1. PIE database camera positions. (a) 13 synchronized video cameras capture face images from multiple angles; 21 controlled flash units are evenly distributed around the cameras. (b) A plot of the azimuth and altitude angles of the cameras, along with the camera ID numbers. 9 of the 13 cameras sample a half circle at roughly head height, ranging from a full left to a full right profile view (+/- 60 degrees); 2 cameras were placed above and below the central camera; and 2 cameras were positioned in the corners of the room.

During each session 13 conditions with varying facial expression, illumination and occlusion were captured. Figure 5 shows an example for each condition.

3. Face Recognition Algorithms

3.1. MIT, Bayesian Eigenface

Moghaddam et al. generalize the Principal Component Analysis (PCA) approach of Sirovich and Kirby [28] and Turk and Pentland [29] by examining the probability distribution of intra-personal variations in appearance of the same individual and extra-personal variations in appearance due to differences in identity. This algorithm performed consistently near the top in the 1996 FERET test [23].

Given two face images $I_1$ and $I_2$, let $\Delta = I_1 - I_2$ be the image intensity difference between them. We would like to estimate the posterior probability $P(\Omega_I \mid \Delta)$, where $\Omega_I$ denotes the intra-personal variations of a subject. According to Bayes rule, we can rewrite it as:

$$P(\Omega_I \mid \Delta) = \frac{P(\Delta \mid \Omega_I)\,P(\Omega_I)}{P(\Delta \mid \Omega_I)\,P(\Omega_I) + P(\Delta \mid \Omega_E)\,P(\Omega_E)} \qquad (1)$$

where $\Omega_E$ denotes the extra-personal variations across all subjects. To estimate the probability density distributions $P(\Delta \mid \Omega_I)$ and $P(\Delta \mid \Omega_E)$, PCA is used to derive a low-dimensional (M-dimensional) approximation of the measured feature space.

Figure 2. Pose variation in the PIE database. 8 of the 13 camera views are shown here. The remaining 5 camera poses are symmetrical to the right side of camera c27.

Figure 3. Illumination variation in the PIE database. The images in the first row show faces recorded with room lights on; the images in the second row show faces captured with only flash illumination.

Figure 4. Cohn-Kanade AU-Coded Facial Expression database. Examples of emotion-specified expressions from image sequences.

Overall, this is a mostly template-based classification algorithm, although some local features are implicitly encoded through the "Eigen" intra-personal and extra-personal images.
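The sketch below is a minimal, simplified reading of this intra-/extra-personal formulation: difference images from matching and non-matching pairs are each modeled by a Gaussian in a PCA subspace, and a pair is scored with the posterior of Eq. (1). It is not the MIT implementation (which also models the residual outside the subspace); the class design and parameter choices here are assumptions for illustration.

```python
import numpy as np

class SubspaceGaussian:
    """Gaussian density over difference images, estimated in a PCA subspace."""
    def __init__(self, deltas, n_components=20):
        D = np.asarray([d.ravel() for d in deltas], dtype=np.float64)
        self.mean = D.mean(axis=0)
        _, s, Vt = np.linalg.svd(D - self.mean, full_matrices=False)
        self.basis = Vt[:n_components]                      # principal directions
        self.var = (s[:n_components] ** 2) / len(D) + 1e-8   # per-component variance

    def log_density(self, delta):
        y = self.basis @ (delta.ravel() - self.mean)         # project the difference image
        return -0.5 * np.sum(y ** 2 / self.var + np.log(2 * np.pi * self.var))

def posterior_intra(delta, p_intra, p_extra, prior_intra=0.5):
    """Eq. (1): P(Omega_I | Delta), evaluated in the log domain for stability."""
    log_li = p_intra.log_density(delta) + np.log(prior_intra)
    log_le = p_extra.log_density(delta) + np.log(1.0 - prior_intra)
    return 1.0 / (1.0 + np.exp(log_le - log_li))

# Usage: fit p_intra on differences of same-subject pairs and p_extra on
# different-subject pairs, then call a probe/gallery pair a match if
# posterior_intra(probe - gallery, p_intra, p_extra) > 0.5.
```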

3.2. Visionics, FaceIt

FaceIt's recognition module is based on Local Feature Analysis (LFA) [22]. This technique addresses two major problems of Principal Component Analysis. The application of PCA to a set of images yields a global representation of the image features that is not robust to variability due to localized changes in the input [10]. Furthermore, the PCA representation is non-topographic, so nearby values in the feature representation do not necessarily correspond to nearby values in the input. LFA overcomes these problems by using localized image features in the form of multi-scale filters. The feature images are then encoded using PCA to obtain a compact description. According to Visionics, FaceIt is robust against variations in lighting, skin tone, eye glasses, facial expression and hair style. They furthermore claim to be able to handle pose variations of up to 35 degrees in all directions. We systematically evaluated these claims.
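LFA itself is proprietary to FaceIt, but the two ingredients named in the text (localized multi-scale filter responses followed by PCA for a compact code) can be sketched generically. The filter bank below (differences of Gaussians at several scales) is a stand-in chosen for illustration, not the actual FaceIt kernels, and the parameter values are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_multiscale_features(img, sigmas=(1, 2, 4, 8)):
    """Band-pass (difference-of-Gaussian) responses at several scales; unlike a global
    PCA code, each value still refers to a specific image location (topographic)."""
    img = img.astype(np.float64)
    bands = [gaussian_filter(img, s) - gaussian_filter(img, 2 * s) for s in sigmas]
    return np.stack(bands).ravel()

def pca_compress(feature_vectors, n_components=100):
    """Encode the large local feature vectors with PCA to obtain a compact description."""
    X = np.asarray(feature_vectors, dtype=np.float64)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]
    return mean, basis, (X - mean) @ basis.T   # codes: one compact row per face
```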

    4. Evaluation

Following Phillips et al. [23] we distinguish between gallery and probe images. The gallery contains the images used during training of the algorithm. The algorithms are tested with the images in the probe sets. All results reported here are based on non-overlapping gallery and probe sets (with the exception of the PIE pose test). We use the closed universe model for evaluating the performance, meaning that every individual in the probe set is also present in the gallery. The algorithms were not given any further information, so we only evaluate the face recognition performance, not the face verification performance.

Figure 5. AR database. The conditions are: (1) neutral, (2) smile, (3) anger, (4) scream, (5) left light on, (6) right light on, (7) both lights on, (8) sun glasses, (9) sun glasses/left light, (10) sun glasses/right light, (11) scarf, (12) scarf/left light, (13) scarf/right light.
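This protocol amounts to a closed-set, rank-1 identification test: every probe identity is present in the gallery, and the reported rate is the fraction of probes whose best-matching gallery image carries the correct identity. A minimal sketch of that scoring loop, with `similarity` standing for whichever matcher the algorithm under test provides (the names here are ours):

```python
import numpy as np

def rank1_accuracy(gallery_feats, gallery_ids, probe_feats, probe_ids, similarity):
    """Closed-universe identification: each probe gets the identity of its most
    similar gallery entry; returns the fraction of correctly identified probes."""
    correct = 0
    for feat, true_id in zip(probe_feats, probe_ids):
        scores = [similarity(feat, g) for g in gallery_feats]
        if gallery_ids[int(np.argmax(scores))] == true_id:
            correct += 1
    return correct / len(probe_ids)
```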

    4.1. Face localization and registration

Face recognition is a two step process consisting of face detection and recognition. First, the face has to be located in the image and registered against an internal model. The result of this stage is a normalized representation of the face, to which the recognition algorithm can be applied. In order to ensure the validity of our findings in terms of face recognition accuracy, we provided both algorithms with correct locations of the left and right eyes. This was done by applying FaceIt's face finding module with a subsequent manual verification of the results. If the initial face position was incorrect, the locations of the left and right eyes were marked manually and the face finding module was rerun on the image. The face detection module became more likely to fail as departure from the frontal view increased.
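A common way to register a face once the two eye centers are known is a similarity transform (rotation, scale, translation) that maps them to fixed canonical positions; a sketch of such a normalization is given below. The canonical eye coordinates and the 112x100 output size are illustrative values of our own, not the settings used by FaceIt or the MIT system.

```python
import numpy as np
from scipy.ndimage import affine_transform

def _similarity_from_vectors(u, v):
    """2x2 rotation+scale matrix A with A @ u = v (no reflection)."""
    perp = lambda w: np.array([-w[1], w[0]])
    U = np.column_stack([u, perp(u)])
    V = np.column_stack([v, perp(v)])
    return V @ np.linalg.inv(U)

def register_face(img, eyes, canon_eyes=((40.0, 30.0), (40.0, 70.0)), out_shape=(112, 100)):
    """Warp img so the detected eye centers land on canonical positions.
    `eyes` and `canon_eyes` are ((row, col) of left eye, (row, col) of right eye)."""
    src = np.asarray(eyes, dtype=np.float64)
    dst = np.asarray(canon_eyes, dtype=np.float64)
    A = _similarity_from_vectors(src[1] - src[0], dst[1] - dst[0])  # dst = A @ src + b
    b = dst[0] - A @ src[0]
    A_inv = np.linalg.inv(A)                                        # maps output pixels back to input
    return affine_transform(img.astype(np.float64), A_inv,
                            offset=-A_inv @ b, output_shape=out_shape, order=1)
```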

    4.2. Pose

Using the CMU PIE database we are in the unique position to evaluate the performance of face recognition algorithms with respect to pose variations in great detail. We exhaustively sampled the pose space by using each view in turn as gallery, with the remaining views as probes. As there is only a single image per subject and camera view in the database, the gallery images are included in the probe set. Table 2 shows the complete pose confusion matrix for FaceIt. Of particular interest is the question of how far the algorithm can generalize from given gallery views.
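This exhaustive test amounts to a double loop over camera views: each view serves once as the gallery, and the recognition rate is recorded on every probe view, giving the 13x13 matrix of Table 2. A minimal sketch of that loop, assuming a hypothetical `images[view][subject]` layout with one registered image per subject and view, and any matcher `identify(gallery, probe)` that returns a predicted subject id:

```python
import numpy as np

def pose_confusion_matrix(images, views, identify):
    """images[view][subject_id] -> registered face image; identify(gallery, probe) ->
    predicted subject id. Entry (g, p) is the recognition rate for gallery view g and
    probe view p (as in the text, gallery images are included in the probe sets)."""
    conf = np.zeros((len(views), len(views)))
    for gi, gview in enumerate(views):
        gallery = images[gview]                    # one image per subject for this view
        for pi, pview in enumerate(views):
            hits = sum(identify(gallery, probe) == subject
                       for subject, probe in images[pview].items())
            conf[gi, pi] = hits / len(images[pview])
    return conf
```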

Two things are worth noting. First, FaceIt has a reasonable generalizability for frontal gallery images: the recognition rate drops to the 70%-80% range for 45 degrees of head rotation (corresponding to camera positions 11 and 37 in Figure 1). Figure 6 shows the recognition accuracies of the different camera views for a mug shot gallery view.

Figure 6. Recognition accuracies of all cameras for the mug shot gallery image (gallery pose 27). The recognition rates are plotted on the pose positions shown in Figure 1(b). The darker color in the lower portion of the graph indicates a higher recognition rate. The square box marks the gallery view.

Second, for most non-frontal views (outside of the 40 degree range), face generalizability goes down drastically, even for very nearby views. This can be seen in Figure 7, where the recognition rates are shown for the two profile views as gallery images. The full set of performance graphs for all 13 gallery views is shown in Appendix A.

We then asked whether we can gain more by including multiple face poses in the gallery set. Intuitively, given multiple face poses, with correspondence between the facial features, one has a better chance of predicting novel face poses. This test is carried out by taking two sets of face poses, {11, 27, 37} and {05, 27, 29}, as galleries and testing on all other poses. The results are presented in Table 3.

Figure 7. Recognition accuracies of all cameras for the two profile poses as gallery images (cameras 34 and 22 in Figure 1(b)).

Azimuth:      -66  -47  -46  -32  -17    0    0    0   16   31   44   44   62
Altitude:       3   13    2    2    2   15    2  1.9    2    2    2   13    3

Probe Pose    c34  c31  c14  c11  c29  c09  c27  c07  c05  c37  c25  c02  c22
Gallery Pose
c34          1.00 0.03 0.01 0.00 0.00 0.03 0.04 0.00 0.01 0.03 0.01 0.00 0.01
c31          0.01 1.00 0.12 0.16 0.15 0.09 0.04 0.06 0.04 0.03 0.06 0.00 0.01
c14          0.04 0.16 1.00 0.28 0.26 0.16 0.19 0.10 0.16 0.04 0.03 0.03 0.01
c11          0.00 0.15 0.29 1.00 0.78 0.63 0.73 0.50 0.57 0.40 0.09 0.01 0.03
c29          0.00 0.13 0.22 0.87 1.00 0.75 0.91 0.73 0.68 0.44 0.03 0.01 0.03
c09          0.03 0.01 0.09 0.68 0.79 1.00 0.95 0.62 0.87 0.57 0.09 0.01 0.01
c27          0.03 0.07 0.13 0.75 0.93 0.94 1.00 0.93 0.93 0.62 0.06 0.03 0.03
c07          0.01 0.07 0.12 0.38 0.70 0.57 0.87 1.00 0.73 0.35 0.03 0.03 0.00
c05          0.01 0.03 0.13 0.54 0.65 0.75 0.91 0.75 1.00 0.66 0.09 0.01 0.03
c37          0.00 0.03 0.04 0.37 0.35 0.43 0.53 0.23 0.60 1.00 0.10 0.04 0.00
c25          0.00 0.01 0.01 0.06 0.04 0.07 0.04 0.03 0.06 0.07 0.98 0.04 0.04
c02          0.00 0.01 0.03 0.03 0.01 0.01 0.01 0.04 0.01 0.01 0.04 1.00 0.03
c22          0.00 0.01 0.01 0.01 0.01 0.03 0.03 0.03 0.03 0.04 0.03 0.00 1.00

Table 2. Confusion table for pose variation. Each row of the confusion table shows the recognition rate on each of the probe poses given a particular gallery pose. The azimuth and altitude angles of the camera poses (first two rows) were shown in Figure 1.

Gallery Poses \ Probe Pose    02   05   07   09   11   14   22   25   27   29   31   34   37
11-27-37                     0.01 0.99 0.91 0.93 1.00 0.35 0.01 0.10 1.00 0.91 0.19 0.00 1.00
05-27-29                     0.01 1.00 0.90 0.91 0.88 0.24 0.01 0.10 1.00 1.00 0.12 0.01 0.66

Table 3. Pose variation. Recognition rates for FaceIt with multiple poses in the gallery set.

An analysis of the results, shown in Figure 8, indicates that with this algorithm no additional gain is achieved through multiple gallery poses. This suggests that 3D face recognition approaches could have an advantage over naive integration of multiple face poses, such as in the proposed 2D statistical SVM or related non-linear kernel methods.
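The comparison behind Figure 8 can be reproduced from the single-view confusion matrix alone: if a multi-pose gallery adds nothing beyond independent matching against its members, its recognition rates should track the element-wise maximum of the corresponding single-gallery rows. A short sketch of that baseline, reusing the hypothetical `pose_confusion_matrix` output from the earlier sketch:

```python
import numpy as np

def max_over_single_galleries(conf, views, gallery_views):
    """Element-wise maximum of the confusion-matrix rows for the given gallery views.
    Figure 8 shows the measured multi-pose gallery rates stay close to this baseline,
    i.e. the joint gallery adds little beyond its best single view per probe pose."""
    rows = [conf[views.index(v)] for v in gallery_views]
    return np.max(np.vstack(rows), axis=0)

# e.g. baseline = max_over_single_galleries(conf, views, ["c11", "c27", "c37"])
```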

We conducted the same set of experiments using the MIT algorithm. The results are much worse than FaceIt's, even with manual identification of the eyes. We suspect that this might be due to the extreme sensitivity of the MIT algorithm to face registration errors.

    4.3. Illumination

For this test, the PIE and AR databases are used. We found that both algorithms performed significantly better on the illumination images than under the various pose conditions. Table 4 shows the recognition accuracies of FaceIt and the MIT algorithm on both databases. As described in Section 2.2, the PIE database contains two illumination sets.

The images in set illumination 1 were taken with the room lights on. The mug shot gallery images did not have flash illumination. For the illumination 2 set of images the room lights were switched off. The illumination for the gallery images was provided by a flash directly opposite the subject's face. In each case the probe set was made up of the remaining flash images (21 and 20 images, respectively). As can be expected, the algorithms perform better in the first test.

          PIE 1   PIE 2   AR 05   AR 06   AR 07
FaceIt     0.97    0.91    0.95    0.93    0.86
MIT        0.94    0.72    0.77    0.74    0.72

Table 4. Illumination results. PIE 1 and PIE 2 refer to the two illumination conditions described in Section 2.2. AR05, AR06 and AR07 are the left, right, and both lights on conditions in the AR database, as shown in Figure 5.

The results on the PIE database are consistent with the outcome of the experiments on the AR database. Here images 5 through 7 deal with different illumination conditions, varying the lighting from the left and right sides to both lights on.

While these results lead one to conclude that face recognition under illumination is a solved problem, we would like to caution that illumination change could still cause a major problem when it is coupled with other changes (expression, pose, etc.).

Figure 8. Multiple pose gallery vs. multiple single-pose galleries. Subplots a1 and a2 show the recognition rates with gallery pose 11. With a single gallery image, the algorithm is able to generalize to nearby poses. Subplots b{1,2} and c{1,2} show the recognition rates for gallery poses 27 and 37. Subplots d{1,2} show the recognition rates with the gallery pose set {11, 27, 37}. One can see that d{1,2} is the same as taking the maximum values in a-c{1,2}. In this case no additional gain is achieved by using the joint set {11, 27, 37} as the gallery poses.

4.4. Expression

Faces undergo large deformations under facial expressions. Humans can easily handle this variation, but we expected the algorithms to have problems with the expression databases. To our surprise, FaceIt and MIT performed very well on the Cohn-Kanade and the AR databases. In each test we used the neutral expression as gallery image and probed the algorithm with the peak expression.

         Cohn-Kanade   AR 02   AR 03   AR 04
FaceIt       0.97       0.96    0.93    0.78
MIT          0.94       0.72    0.67    0.41

Table 5. Expression results. AR 02, AR 03 and AR 04 refer to the expression changes in the AR database as shown in Figure 5. Both algorithms perform reasonably well under facial expression; however, the "scream" expression, AR 04, produces large recognition errors.

Table 5 shows the results of both algorithms on the two databases. The notable exception is the scream (AR04) set of the AR database.

For most facial expressions, the facial deformation is centered around the lower part of the face. This may leave sufficient invariant information in the upper face for recognition, which results in a high recognition rate. The expression "scream" affects both the upper and the lower face appearance, which leads to a significant fall-off in the recognition rate. This indicates that 1) face recognition under extreme facial expression still remains an unsolved problem, and 2) temporal information could provide significant additional information for face recognition under expression.

    4.5. Occlusion

For the occlusion tests we look at images where parts of the face are invisible to the camera. The AR database provides two scenarios: subjects wearing sunglasses and subjects wearing a scarf around the lower portion of the face. The recognition rates for the sunglass images are according to expectations. As Table 6 shows, FaceIt is unable to handle this variation (AR08). The result further deteriorates when the left or right light is switched on (AR09 and AR10). This result is readily replicated on the images of the second session.

This test reveals that FaceIt is more vulnerable to upper face occlusion than the MIT algorithm. Facial occlusion, particularly upper face occlusion, remains a difficult problem yet to be solved. Interesting open questions are 1) what are the fundamental limits of any recognition system under various occlusions, and 2) to what extent can additional facial information, such as motion, provide the necessary help for face recognition under occlusion.

         AR 08   AR 09   AR 10   AR 11   AR 12   AR 13
FaceIt    0.10    0.08    0.06    0.81    0.73    0.71
MIT       0.34    0.35    0.28    0.46    0.43    0.40

Table 6. Occlusion results. AR08, AR09 and AR10 refer to the upper facial occlusions, and AR11, AR12 and AR13 refer to the lower facial occlusions, as shown in Figure 5. Upper facial occlusion causes a major drop in recognition rates.

    4.6. Gender

Male and female faces differ in both local features and in shape [4]. Men's faces on average have thicker eyebrows and greater texture in the beard region. In women's faces, the distance between the eyes and brows is greater, the protuberance of the nose smaller, and the chin narrower than in men [4]. People readily distinguish male from female faces using these and other differences (e.g., hair style), and connectionist modeling has yielded similar results [6, 18]. Little is known, however, about the sensitivity of face identification algorithms to differences between men's and women's faces. The relative proportion of men and women in training samples is seldom reported, and identification results typically fail to report whether algorithms are more or less accurate for one sex or the other. Other factors that may influence identification, such as differences in face shape between individuals of European, Asian, and African ancestry [4, 9], have similarly been ignored in past research.

We evaluated the influence of gender on face recognition algorithms on the AR database due to its balanced ratio of female and male subjects. Figure 9 shows the recognition rates achieved by FaceIt across the 13 variations, including illumination, occlusion, and expression.

The results reveal a surprising trend: better recognition rates are consistently achieved for female subjects. Averaged across the conditions (excluding the tests AR08-10, where FaceIt breaks down), the recognition rate for male subjects is 83.4%, while the recognition rate for female subjects is 91.66%. It is not clear what has caused this effect. To further validate this result, a much larger database is needed.

If this result is further substantiated, it opens up many interesting questions on face recognition. In particular it raises the questions: 1) what makes one face easier to recognize than another, and 2) are there face classes with similar recognizability?

Figure 9. AR database results, male vs. female. The dashed line indicates the recognition rate for male subjects in the AR database across the 13 conditions shown in Figure 5.

    5. Discussion

In natural environments, pose, illumination, expression, occlusion and individual differences among people represent critical challenges to face recognition algorithms. The FERET tests [23] and the Facial Recognition Vendor Test 2000 [3] provided initial results on limited variations of these factors.

FaceIt and the MIT algorithm were overall the best performers in these tests. We evaluated both algorithms on multiple independent databases that systematically vary pose, illumination, expression, occlusion, and gender. We found:

1. Pose: Pose variation still presents a challenge for face recognition. Frontal training images have better generalizability to novel views than do non-frontal training images. For a frontal training view, we can achieve reasonable recognition rates of 70-80% for up to 45 degrees of head rotation. In addition, using multiple training views does not necessarily improve the recognition rate.

2. Illumination: Pure illumination changes on the face are handled well by current face recognition algorithms.

3. Expression: With the exception of extreme expressions such as scream, the algorithms are relatively robust to facial expression. Deformation of the mouth and occlusion of the eyes by eye narrowing and closing present a problem to the algorithms.

4. Occlusion: The performance of the face recognition algorithms under occlusion varies. FaceIt is more sensitive to upper face occlusion than MIT; FaceIt is more robust to lower face occlusion.

5. Gender: We found surprisingly consistent differences in face recognition rates across gender. This result is based on testing on the AR database, which has 70 male and 60 female subjects. On average the recognition rate for females is consistently about 5% higher than for males, across a range of perturbations. While the database used in these tests is too small to draw general conclusions, it points to an interesting direction for future research and database collection.

The current study has several limitations. One, we did not examine the effect of face image size on algorithm performance in the various conditions. Minimum size thresholds may well differ for various permutations, which would be important to determine. Two, the influence of racial or ethnic differences on algorithm performance could not be examined due to the homogeneity of racial and ethnic backgrounds in the databases. While large databases with ethnic variation are available, they lack the parametric variation in lighting, shape, pose and other factors that were the focus of this investigation. Three, faces change dramatically with development, but the influence of developmental change on algorithm performance could not be examined. Four, while we were able to examine the combined effects of some factors, databases are needed that support examination of all ecologically valid combinations, which may be non-additive. The results of the current study suggest that greater attention be paid to the multiple sources of variation that are likely to affect face recognition in natural environments.

    6. Acknowledgement

This research is supported by ONR N00014-00-1-0915 (DARPA HumanID) and NSF IRI-9817496.

    A. Appendix

The following figures show the recognition rates for all camera views in turn as gallery images. They are ordered according to the camera numbers, roughly going from left to right in Figure 1.

Figure 10. Recognition accuracies of all cameras for camera view 34 as gallery image.

Figure 11. Recognition accuracies of all cameras for camera view 31 as gallery image.

Figure 12. Recognition accuracies of all cameras for camera view 14 as gallery image.

Figure 13. Recognition accuracies of all cameras for camera view 11 as gallery image.

Figure 14. Recognition accuracies of all cameras for camera view 29 as gallery image.

Figure 15. Recognition accuracies of all cameras for camera view 07 as gallery image.

Figure 16. Recognition accuracies of all cameras for camera view 27 as gallery image.

Figure 17. Recognition accuracies of all cameras for camera view 09 as gallery image.

Figure 18. Recognition accuracies of all cameras for camera view 05 as gallery image.

Figure 19. Recognition accuracies of all cameras for camera view 37 as gallery image.

Figure 20. Recognition accuracies of all cameras for camera view 02 as gallery image.

Figure 21. Recognition accuracies of all cameras for camera view 25 as gallery image.

Figure 22. Recognition accuracies of all cameras for camera view 22 as gallery image.

    References

[1] Robert J. Baron. Mechanisms of human facial recognition. International Journal of Man-Machine Studies, 15(2):137-178, 1981.

[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, July 1997.

[3] D. M. Blackburn, M. Bone, and P. J. Phillips. Facial recognition vendor test 2000: evaluation report, 2000.

[4] V. Bruce and A. Young. In the Eye of the Beholder: The Science of Face Perception. Oxford University Press, 1998.

[5] T. Cootes, K. Walker, and C. Taylor. View-based active appearance models. In IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.

[6] G. W. Cottrell and J. Metcalfe. EMPATH: Face, emotion, and gender recognition using holons. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Neural Information Processing Systems, volume 3, pages 53-60, San Mateo, CA, 1991. Morgan Kaufmann.

[7] M. N. Dailey and Garrison W. Cottrell. Organization of face and object recognition in modular neural network models. Neural Networks, 12(7/8), 1999.

[8] P. Ekman and W. V. Friesen. Facial Action Coding System. Consulting Psychologist Press, 1978.

[9] L. G. Farkas and I. R. Munro. Anthropometric Facial Proportions in Medicine. C. C. Thomas, Springfield, IL, 1987.

[10] R. Gross, J. Yang, and A. Waibel. Face detection in a meeting room. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000.

[11] Peter J. B. Hancock, A. Mike Burton, and Vicki Bruce. Face processing: Human perception and principal components analysis. Memory and Cognition, 24(1):26-40, 1996.

[12] A. Jonathan Howell and Hilary Buxton. Invariance in radial basis function neural networks in human face classification. Neural Processing Letters, 2(3):26-30, 1995.

[13] T. Kanade, J. F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 46-53, Grenoble, France, 2000.

[14] T. Kanade, H. Saito, and S. Vedula. The 3D room: Digitizing time-varying 3D events by synchronized multiple video streams. Technical Report CMU-RI-TR-98-34, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, December 1998.

[15] Andreas Lanitis, Christopher J. Taylor, and Timothy Francis Cootes. Automatic interpretation and coding of face images using flexible models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):743-756, 1997.

[16] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back. Face recognition: A convolutional neural network approach. IEEE Transactions on Neural Networks, 8(1):98-113, 1998.

[17] Y. Li, S. Gong, and H. Liddell. Support vector regression and classification based multi-view face detection and recognition. In IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.

[18] A. J. Luckman, N. M. Allison, A. Ellis, and B. M. Flude. Familiar face recognition: A comparative study of a connectionist model and human performance. Neurocomputing, 7:3-27, 1995.

[19] A. R. Martinez and R. Benavente. The AR face database. Technical Report 24, Computer Vision Center (CVC), Barcelona, Spain, June 1998.

[20] Baback Moghaddam and Alex Paul Pentland. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.

[21] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection, 1997.

[22] P. Penev and J. Atick. Local feature analysis: A general statistical theory for object representation, 1996.

[23] P. Jonathon Phillips, Harry Wechsler, Jeffrey S. Huang, and Patrick J. Rauss. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16(5):295-306, 1998.

[24] T. Poggio and K.-K. Sung. Example-based learning for view-based human face detection. In 1994 ARPA Image Understanding Workshop, volume II, November 1994.

[25] B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In Artificial Neural Networks - ICANN'97, 1997.

[26] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression (PIE) database of human faces. Technical Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, January 2001.

[27] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America, 4(3):519-524, March 1987.

[28] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces, 1987.

[29] Matthew Turk and Alex Paul Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.

[30] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, Heidelberg, DE, 1995.

[31] Thomas Vetter. Synthesis of novel views from a single face image. International Journal of Computer Vision, 28(2):103-116, 1998.

[32] Thomas Vetter, Anya Hurlbert, and Tomaso Poggio. View-based models of 3D object recognition: Invariance to imaging transformations. Cerebral Cortex, 5(3):261-269, 1995.

[33] Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775-779, July 1997.

[34] Alan L. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3(1):59-70, 1991.