
Quo vadis Face Recognition?

Ralph Gross, Jianbo Shi
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{rgross,jshi}@cs.cmu.edu

Jeffrey F. Cohn
Department of Psychology
University of Pittsburgh
Pittsburgh, PA 15260
[email protected]

    Abstract

Within the past decade, major advances have occurred in face recognition. With few exceptions, however, most research has been limited to training and testing on frontal views. Little is known about the extent to which face pose, illumination, expression, occlusion, and individual differences, such as those associated with gender, influence recognition accuracy. We systematically varied these factors to test the performance of two leading algorithms, one template based and the other feature based. Image data consisted of over 21,000 images from 3 publicly available databases: the CMU PIE, Cohn-Kanade, and AR databases. In general, both algorithms were robust to variation in illumination and expression. Recognition accuracy was highly sensitive to variation in pose. For frontal training images, performance was attenuated beginning at about 15 degrees. Beyond about 30 degrees, performance became unacceptable. For non-frontal training images, fall-off was more severe. Small but consistent differences were found for individual differences in subjects. These findings suggest directions for future research, including design of experiments and data collection.

1. Introduction

Is face recognition a solved problem? Over the last 30 years face recognition has become one of the best studied pattern recognition problems, with a nearly intractable number of publications. Many of the algorithms have demonstrated excellent recognition results, often with error rates of less than 10 percent. These successes have led to the development of a number of commercial face recognition systems. Most of the current face recognition algorithms can be categorized into two classes: image template based or geometry feature-based. The template based methods [1] compute the correlation between a face and one or more model templates to estimate the face identity. Statistical tools such as Support Vector Machines (SVM) [30, 21], Linear Discriminant Analysis (LDA) [2], Principal Component Analysis (PCA) [27, 29, 11], Kernel Methods [25, 17], and Neural Networks [24, 7, 12, 16] have been used to construct a suitable set of face templates. While these templates can be viewed as features, they mostly capture global features of the face images. Facial occlusion is often difficult to handle in these approaches.
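To make the template based family concrete, the sketch below builds an eigenface (PCA) template space from training images and identifies a probe by nearest neighbor in that space. This is a minimal illustration of the general approach described above, not the formulation of any specific cited system; the function names and the choice of 50 components are our own.

```python
import numpy as np

def build_eigenspace(train_images, n_components=50):
    """PCA on vectorized face images: returns the mean face and the top eigenfaces."""
    X = np.asarray([img.ravel() for img in train_images], dtype=np.float64)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions ("eigenfaces") of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(img, mean, eigenfaces):
    """Coefficients of an image in the eigenface subspace."""
    return eigenfaces @ (img.ravel().astype(np.float64) - mean)

def identify(gallery, probe, mean, eigenfaces):
    """Closed-universe identification: return the gallery id closest to the probe.
    `gallery` maps subject id -> one face image."""
    p = project(probe, mean, eigenfaces)
    ids, feats = zip(*[(sid, project(img, mean, eigenfaces)) for sid, img in gallery.items()])
    dists = [np.linalg.norm(p - f) for f in feats]
    return ids[int(np.argmin(dists))]
```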

The geometry feature-based methods analyze explicit local facial features and their geometric relationships. Cootes et al. presented an active shape model in [15], extending the approach by Yuille [34]. Wiskott et al. developed an elastic bunch graph matching algorithm for face recognition in [33]. Penev et al. [22] developed PCA into Local Feature Analysis (LFA). This technique is the basis for one of the most successful commercial face recognition systems, FaceIt.

Most face recognition algorithms focus on frontal facial views. However, pose changes can often lead to large non-linear variation in facial appearance due to self-occlusion and self-shading. To address this issue, Moghaddam and Pentland [20] presented a Bayesian approach using PCA as a probability density estimation tool. Li et al. [17] developed a view-based, piece-wise SVM model for face recognition. In the feature based approach, Cootes et al. [5] proposed a 3D active appearance model to explicitly compute the face pose variation. Vetter et al. [32, 31] learn a 3D geometry-appearance model for face registration and matching. However, today the exact trade-offs and limitations of these algorithms are relatively unknown.

To evaluate the performance of these algorithms, Phillips et al. conducted the FERET face algorithm tests [23], based on the FERET database, which now contains 14,126 images from 1,199 individuals. More recently, the Facial Recognition Vendor Test [3] evaluated commercial systems using the FERET and HumanID databases. The test results have revealed that important progress has been made in face recognition, and many aspects of the face recognition problem are now well understood. However, there still remains a gap between these testing results and the practical user experiences of commercial systems. While this gap can, and will, be narrowed through improvements of practical details such as sensor resolution and view selection, we would like to understand clearly the fundamental capabilities and limitations of current face recognition systems.

In this paper, we conduct a series of tests using two state-of-the-art face recognition systems on three newly constructed face databases to evaluate the effect of face pose, illumination, facial expression, occlusion and subject gender on face recognition performance.

The paper is organized as follows. We describe the three databases used in our evaluation in Section 2. In Section 3 we introduce the two algorithms we used for our evaluations. The experimental procedures and results are presented in Section 4, and we conclude in Section 5.

    2. Description of Databases

    2.1. Overview

Table 1 gives an overview of the databases used in our evaluation.

               CMU PIE   Cohn-Kanade   AR DB
Subjects          68         105        116
Poses             13           1          1
Illuminations     43           3          3
Expressions        3           6          3
Occlusion          0           0          2
Sessions           1           1          2

Table 1. Overview of the databases.

2.2. CMU Pose, Illumination, and Expression (PIE) database

The CMU PIE database contains a total of 41,368 images taken from 68 individuals [26]. The subjects were imaged in the CMU 3D Room [14] using a set of 13 synchronized high-quality color cameras and 21 flashes. The resulting images are 640x480 in size, with 24-bit color resolution. The cameras and flashes are distributed in a hemisphere in front of the subject as shown in Figure 1.

A series of images of a subject across the different poses is shown in Figure 2. Each subject was recorded under 4 conditions:

1. expression: the subjects were asked to display a neutral face, to smile, and to close their eyes in order to simulate a blink. The images of all 13 cameras are available in the database.

2. illumination 1: 21 flashes are individually turned on in a rapid sequence. In the first setting the images were captured with the room lights on. Each camera recorded 24 images: 2 with no flashes, 21 with one flash firing, and then a final image with no flashes. Only the output of three cameras (frontal, three-quarter and profile view) was kept.

3. illumination 2: the procedure for illumination 1 was repeated with the room lights off. The output of all 13 cameras was retained in the database. Combining the two illumination settings, a total of 43 different illumination conditions were recorded.

4. talking: subjects counted, starting at 1. Two seconds (60 frames) of them talking were recorded using 3 cameras as above (again frontal, three-quarter and profile view).

Figure 3 shows examples for illumination conditions 1 and 2.

2.3. Cohn-Kanade AU-Coded Facial Expression Database

This is a publicly available database from Carnegie Mellon University [13]. It contains image sequences of facial expressions from men and women of varying ethnic backgrounds. The camera orientation is frontal. Small head motion is present. Image size is 640 by 480 pixels with 8-bit gray scale resolution. There are three variations in lighting: ambient lighting, single high-intensity lamp, and dual high-intensity lamps with reflective umbrellas. Facial expressions are coded using the Facial Action Coding System [8] and also assigned emotion-specified labels. For the current study, we selected 714 image sequences from 105 subjects. Emotion expressions included happy, surprise, anger, disgust, fear, and sadness. Examples of the different expressions are shown in Figure 4.

2.4. AR Face Database

The publicly available AR database was collected at the Computer Vision Center in Barcelona [19]. It contains images of 116 individuals (63 males and 53 females). The images are 768x576 pixels in size with 24-bit color resolution. The subjects were recorded twice at a 2-week interval.

Figure 1. PIE database camera positions. (a) 13 synchronized video cameras capture face images from multiple angles; 21 controlled flash units are evenly distributed around the cameras. (b) A plot of the azimuth and altitude angles of the cameras, along with the camera ID numbers. 9 of the 13 cameras sample a half circle at roughly head height, ranging from a full left to a full right profile view (+/- 60 degrees); 2 cameras were placed above and below the central camera; and 2 cameras were positioned in the corners of the room.

During each session 13 conditions with varying facial expression, illumination and occlusion were captured. Figure 5 shows an example for each condition.

3. Face Recognition Algorithms

3.1. MIT, Bayesian Eigenface

Moghaddam et al. generalize the Principal Component Analysis (PCA) approach of Sirovich and Kirby [28] and Turk and Pentland [29] by examining the probability distribution of intra-personal variations in appearance of the same individual and extra-personal variations in appearance due to differences in identity. This algorithm performed consistently near the top in the 1996 FERET test [23].

Given two face images $I_1$ and $I_2$, let $\Delta = I_1 - I_2$ be the image intensity difference between them. We would like to estimate the posterior probability $P(\Omega_I \mid \Delta)$, where $\Omega_I$ denotes the intra-personal variations of a subject. According to Bayes rule, we can rewrite it as:

$$P(\Omega_I \mid \Delta) = \frac{P(\Delta \mid \Omega_I)\,P(\Omega_I)}{P(\Delta \mid \Omega_I)\,P(\Omega_I) + P(\Delta \mid \Omega_E)\,P(\Omega_E)} \qquad (1)$$

where $\Omega_E$ denotes the extra-personal variations across all subjects. To estimate the probability density distributions $P(\Delta \mid \Omega_I)$ and $P(\Delta \mid \Omega_E)$, PCA is used to derive a low-dimensional (M-dimensional) approximation of the measured feature space.

Figure 2. Pose variation in the PIE database. 8 of the 13 camera views are shown here. The remaining 5 camera poses are symmetrical to the right side of camera c27.

Figure 3. Illumination variation in the PIE database. The images in the first row show faces recorded with room lights on; the images in the second row show faces captured with only flash illumination.

Figure 4. Cohn-Kanade AU-Coded Facial Expression database. Examples of emotion-specified expressions from image sequences.

Overall, this is a mostly template-based classification algorithm, although some local features are implicitly encoded through the "Eigen" intra-personal and extra-personal images.
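The sketch below is a minimal, simplified reading of this intra-/extra-personal formulation: difference images from matching and non-matching pairs are each modeled by a Gaussian in a PCA subspace, and a pair is scored with the posterior of Eq. (1). It is not the MIT implementation (which also models the residual outside the subspace); the class design and parameter choices here are assumptions for illustration.

```python
import numpy as np

class SubspaceGaussian:
    """Gaussian density over difference images, estimated in a PCA subspace."""
    def __init__(self, deltas, n_components=20):
        D = np.asarray([d.ravel() for d in deltas], dtype=np.float64)
        self.mean = D.mean(axis=0)
        _, s, Vt = np.linalg.svd(D - self.mean, full_matrices=False)
        self.basis = Vt[:n_components]                      # principal directions
        self.var = (s[:n_components] ** 2) / len(D) + 1e-8   # per-component variance

    def log_density(self, delta):
        y = self.basis @ (delta.ravel() - self.mean)         # project the difference image
        return -0.5 * np.sum(y ** 2 / self.var + np.log(2 * np.pi * self.var))

def posterior_intra(delta, p_intra, p_extra, prior_intra=0.5):
    """Eq. (1): P(Omega_I | Delta), evaluated in the log domain for stability."""
    log_li = p_intra.log_density(delta) + np.log(prior_intra)
    log_le = p_extra.log_density(delta) + np.log(1.0 - prior_intra)
    return 1.0 / (1.0 + np.exp(log_le - log_li))

# Usage: fit p_intra on differences of same-subject pairs and p_extra on
# different-subject pairs, then call a probe/gallery pair a match if
# posterior_intra(probe - gallery, p_intra, p_extra) > 0.5.
```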

3.2. Visionics, FaceIt

FaceIt's recognition module is based on Local Feature Analysis (LFA) [22]. This technique addresses two major problems of Principal Component Analysis. The application of PCA to a set of images yields a global representation of the image features that is not robust to variability due to localized changes in the input [10]. Furthermore, the PCA representation is non-topographic, so nearby values in the feature representation do not necessarily correspond to nearby values in the input. LFA overcomes these problems by using localized image features in the form of multi-scale filters. The feature images are then encoded using PCA to obtain a compact description. According to Visionics, FaceIt is robust against variations in lighting, skin tone, eye glasses, facial expression and hair style. They furthermore claim to be able to handle pose variations of up to 35 degrees in all directions. We systematically evaluated these claims.
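LFA itself is proprietary to FaceIt, but the two ingredients named in the text (localized multi-scale filter responses followed by PCA for a compact code) can be sketched generically. The filter bank below (differences of Gaussians at several scales) is a stand-in chosen for illustration, not the actual FaceIt kernels, and the parameter values are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_multiscale_features(img, sigmas=(1, 2, 4, 8)):
    """Band-pass (difference-of-Gaussian) responses at several scales; unlike a global
    PCA code, each value still refers to a specific image location (topographic)."""
    img = img.astype(np.float64)
    bands = [gaussian_filter(img, s) - gaussian_filter(img, 2 * s) for s in sigmas]
    return np.stack(bands).ravel()

def pca_compress(feature_vectors, n_components=100):
    """Encode the large local feature vectors with PCA to obtain a compact description."""
    X = np.asarray(feature_vectors, dtype=np.float64)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]
    return mean, basis, (X - mean) @ basis.T   # codes: one compact row per face
```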

    4. Evaluation

Following Phillips et al. [23] we distinguish between gallery and probe images. The gallery contains the images used during training of the algorithm. The algorithms are tested with the images in the probe sets. All results reported here are based on non-overlapping gallery and probe sets (with the exception of the PIE pose test). We use the closed universe model for evaluating the performance, meaning that every individual in the probe set is also present in the gallery. The algorithms were not given any further information, so we only evaluate the face recognition performance, not the face verification performance.

Figure 5. AR database. The conditions are: (1) neutral, (2) smile, (3) anger, (4) scream, (5) left light on, (6) right light on, (7) both lights on, (8) sun glasses, (9) sun glasses/left light, (10) sun glasses/right light, (11) scarf, (12) scarf/left light, (13) scarf/right light.
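This protocol amounts to a closed-set, rank-1 identification test: every probe identity is present in the gallery, and the reported rate is the fraction of probes whose best-matching gallery image carries the correct identity. A minimal sketch of that scoring loop, with `similarity` standing for whichever matcher the algorithm under test provides (the names here are ours):

```python
import numpy as np

def rank1_accuracy(gallery_feats, gallery_ids, probe_feats, probe_ids, similarity):
    """Closed-universe identification: each probe gets the identity of its most
    similar gallery entry; returns the fraction of correctly identified probes."""
    correct = 0
    for feat, true_id in zip(probe_feats, probe_ids):
        scores = [similarity(feat, g) for g in gallery_feats]
        if gallery_ids[int(np.argmax(scores))] == true_id:
            correct += 1
    return correct / len(probe_ids)
```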

    4.1. Face localization and registration

Face recognition is a two step process consisting of face detection and recognition. First, the face has to be located in the image and registered against an internal model. The result of this stage is a normalized representation of the face, to which the recognition algorithm can be applied. In order to ensure the validity of our findings in terms of face recognition accuracy, we provided both algorithms with correct locations of the left and right eyes. This was done by applying FaceIt's face finding module with a subsequent manual verification of the results. If the initial face position was incorrect, the locations of the left and right eyes were marked manually and the face finding module was rerun on the image. The face detection module became more likely to fail as departure from the frontal view increased.
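A common way to register a face once the two eye centers are known is a similarity transform (rotation, scale, translation) that maps them to fixed canonical positions; a sketch of such a normalization is given below. The canonical eye coordinates and the 112x100 output size are illustrative values of our own, not the settings used by FaceIt or the MIT system.

```python
import numpy as np
from scipy.ndimage import affine_transform

def _similarity_from_vectors(u, v):
    """2x2 rotation+scale matrix A with A @ u = v (no reflection)."""
    perp = lambda w: np.array([-w[1], w[0]])
    U = np.column_stack([u, perp(u)])
    V = np.column_stack([v, perp(v)])
    return V @ np.linalg.inv(U)

def register_face(img, eyes, canon_eyes=((40.0, 30.0), (40.0, 70.0)), out_shape=(112, 100)):
    """Warp img so the detected eye centers land on canonical positions.
    `eyes` and `canon_eyes` are ((row, col) of left eye, (row, col) of right eye)."""
    src = np.asarray(eyes, dtype=np.float64)
    dst = np.asarray(canon_eyes, dtype=np.float64)
    A = _similarity_from_vectors(src[1] - src[0], dst[1] - dst[0])  # dst = A @ src + b
    b = dst[0] - A @ src[0]
    A_inv = np.linalg.inv(A)                                        # maps output pixels back to input
    return affine_transform(img.astype(np.float64), A_inv,
                            offset=-A_inv @ b, output_shape=out_shape, order=1)
```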

    4.2. Pose

Using the CMU PIE database we are in the unique position to evaluate the performance of face recognition algorithms with respect to pose variations in great detail. We exhaustively sampled the pose space by using each view in turn as gallery, with the remaining views as probes. As there is only a single image per subject and camera view in the database, the gallery images are included in the probe set. Table 2 shows the complete pose confusion matrix for FaceIt. Of particular interest is the question of how far the algorithm can generalize from given gallery views.
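This exhaustive test amounts to a double loop over camera views: each view serves once as the gallery, and the recognition rate is recorded on every probe view, giving the 13x13 matrix of Table 2. A minimal sketch of that loop, assuming a hypothetical `images[view][subject]` layout with one registered image per subject and view, and any matcher `identify(gallery, probe)` that returns a predicted subject id:

```python
import numpy as np

def pose_confusion_matrix(images, views, identify):
    """images[view][subject_id] -> registered face image; identify(gallery, probe) ->
    predicted subject id. Entry (g, p) is the recognition rate for gallery view g and
    probe view p (as in the text, gallery images are included in the probe sets)."""
    conf = np.zeros((len(views), len(views)))
    for gi, gview in enumerate(views):
        gallery = images[gview]                    # one image per subject for this view
        for pi, pview in enumerate(views):
            hits = sum(identify(gallery, probe) == subject
                       for subject, probe in images[pview].items())
            conf[gi, pi] = hits / len(images[pview])
    return conf
```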

Two things are worth noting. First, FaceIt has a reasonable generalizability for frontal gallery images: the recognition rate drops to the 70%-80% range for 45 degrees of head rotation (corresponding to camera positions 11 and 37 in Figure 1). Figure 6 shows the recognition accuracies of the different camera views for a mug shot gallery view.

Figure 6. Recognition accuracies of all cameras for the mug shot gallery image (gallery pose 27). The recognition rates are plotted on the pose positions shown in Figure 1(b). The darker color in the lower portion of the graph indicates a higher recognition rate. The square box marks the gallery view.

Second, for most non-frontal views (outside of the 40 degree range), face generalizability goes down drastically, even for very nearby views. This can be seen in Figure 7, where the recognition rates are shown for the two profile views as gallery images. The full set of performance graphs for all 13 gallery views is shown in Appendix A.

We then asked whether we can gain more by including multiple face poses in the gallery set. Intuitively, given multiple face poses, with correspondence between the facial features, one has a better chance of predicting novel face poses. This test is carried out by taking two sets of face poses, {11, 27, 37} and {05, 27, 29}, as galleries and testing on all other poses. The results are presented in Table 3.

Figure 7. Recognition accuracies of all cameras for the two profile poses as gallery images (cameras 34 and 22 in Figure 1(b)).

Azimuth:      -66  -47  -46  -32  -17    0    0    0   16   31   44   44   62
Altitude:       3   13    2    2    2   15    2  1.9    2    2    2   13    3

Probe Pose    c34  c31  c14  c11  c29  c09  c27  c07  c05  c37  c25  c02  c22
Gallery Pose
c34          1.00 0.03 0.01 0.00 0.00 0.03 0.04 0.00 0.01 0.03 0.01 0.00 0.01
c31          0.01 1.00 0.12 0.16 0.15 0.09 0.04 0.06 0.04 0.03 0.06 0.00 0.01
c14          0.04 0.16 1.00 0.28 0.26 0.16 0.19 0.10 0.16 0.04 0.03 0.03 0.01
c11          0.00 0.15 0.29 1.00 0.78 0.63 0.73 0.50 0.57 0.40 0.09 0.01 0.03
c29          0.00 0.13 0.22 0.87 1.00 0.75 0.91 0.73 0.68 0.44 0.03 0.01 0.03
c09          0.03 0.01 0.09 0.68 0.79 1.00 0.95 0.62 0.87 0.57 0.09 0.01 0.01
c27          0.03 0.07 0.13 0.75 0.93 0.94 1.00 0.93 0.93 0.62 0.06 0.03 0.03
c07          0.01 0.07 0.12 0.38 0.70 0.57 0.87 1.00 0.73 0.35 0.03 0.03 0.00
c05          0.01 0.03 0.13 0.54 0.65 0.75 0.91 0.75 1.00 0.66 0.09 0.01 0.03
c37          0.00 0.03 0.04 0.37 0.35 0.43 0.53 0.23 0.60 1.00 0.10 0.04 0.00
c25          0.00 0.01 0.01 0.06 0.04 0.07 0.04 0.03 0.06 0.07 0.98 0.04 0.04
c02          0.00 0.01 0.03 0.03 0.01 0.01 0.01 0.04 0.01 0.01 0.04 1.00 0.03
c22          0.00 0.01 0.01 0.01 0.01 0.03 0.03 0.03 0.03 0.04 0.03 0.00 1.00

Table 2. Confusion table for pose variation. Each row of the confusion table shows the recognition rate on each of the probe poses given a particular gallery pose. The azimuth and altitude angles of the camera poses (first two rows) were shown in Figure 1.

Gallery Poses \ Probe Pose    02   05   07   09   11   14   22   25   27   29   31   34   37
11-27-37                     0.01 0.99 0.91 0.93 1.00 0.35 0.01 0.10 1.00 0.91 0.19 0.00 1.00
05-27-29                     0.01 1.00 0.90 0.91 0.88 0.24 0.01 0.10 1.00 1.00 0.12 0.01 0.66

Table 3. Pose variation. Recognition rates for FaceIt with multiple poses in the gallery set.

An analysis of the results, shown in Figure 8, indicates that with this algorithm no additional gain is achieved through multiple gallery poses. This suggests that 3D face recognition approaches could have an advantage over naive integration of multiple face poses, such as in the proposed 2D statistical SVM or related non-linear kernel methods.
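The comparison behind Figure 8 can be reproduced from the single-view confusion matrix alone: if a multi-pose gallery adds nothing beyond independent matching against its members, its recognition rates should track the element-wise maximum of the corresponding single-gallery rows. A short sketch of that baseline, reusing the hypothetical `pose_confusion_matrix` output from the earlier sketch:

```python
import numpy as np

def max_over_single_galleries(conf, views, gallery_views):
    """Element-wise maximum of the confusion-matrix rows for the given gallery views.
    Figure 8 shows the measured multi-pose gallery rates stay close to this baseline,
    i.e. the joint gallery adds little beyond its best single view per probe pose."""
    rows = [conf[views.index(v)] for v in gallery_views]
    return np.max(np.vstack(rows), axis=0)

# e.g. baseline = max_over_single_galleries(conf, views, ["c11", "c27", "c37"])
```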

We conducted the same set of experiments using the MIT algorithm. The results are much worse than FaceIt's, even with manual identification of the eyes. We suspect that this might be due to the extreme sensitivity of the MIT algorithm to face registration errors.

    4.3. Illumination

For this test, the PIE and AR databases are used. We found that both algorithms performed significantly better on the illumination images than under the various pose conditions. Table 4 shows the recognition accuracies of FaceIt and the MIT algorithm on both databases. As described in Section 2.2, the PIE database contains two illumination sets.

The images in set illumination 1 were taken with the room lights on. The mug shot gallery images did not have flash illumination. For the illumination 2 set of images the room lights were switched off. The illumination for the gallery images was provided by a flash directly opposite the subject's face. In each case the probe set was made up of the remaining flash images (21 and 20 images, respectively). As can be expected, the algorithms perform better in the first test.

          PIE 1   PIE 2   AR 05   AR 06   AR 07
FaceIt     0.97    0.91    0.95    0.93    0.86
MIT        0.94    0.72    0.77    0.74    0.72

Table 4. Illumination results. PIE 1 and PIE 2 refer to the two illumination conditions described in Section 2.2. AR05, AR06 and AR07 are the left, right, and both lights on conditions in the AR database, as shown in Figure 5.

The results on the PIE database are consistent with the outcome of the experiments on the AR database. Here images 5 through 7 deal with different illumination conditions, varying the lighting from the left and right sides to both lights on.

While these results lead one to conclude that face recognition under illumination is a solved problem, we would like to caution that illumination change could still cause a major problem when it is coupled with other changes (expression, pose, etc.).

Figure 8. Multiple pose gallery vs. multiple single-pose galleries. Subplots a1 and a2 show the recognition rates with gallery pose 11. With a single gallery image, the algorithm is able to generalize to nearby poses. Subplots b{1,2} and c{1,2} show the recognition rates for gallery poses 27 and 37. Subplots d{1,2} show the recognition rates with the gallery pose set {11, 27, 37}. One can see that d{1,2} is the same as taking the maximum values in a-c{1,2}. In this case no additional gain is achieved by using the joint set {11, 27, 37} as the gallery poses.

4.4. Expression

Faces undergo large deformations under facial expressions. Humans can easily handle this variation, but we expected the algorithms to have problems with the expression databases. To our surprise, FaceIt and MIT performed very well on the Cohn-Kanade and the AR databases. In each test we used the neutral expression as gallery image and probed the algorithm with the peak expression.

         Cohn-Kanade   AR 02   AR 03   AR 04
FaceIt       0.97       0.96    0.93    0.78
MIT          0.94       0.72    0.67    0.41

Table 5. Expression results. AR 02, AR 03 and AR 04 refer to the expression changes in the AR database as shown in Figure 5. Both algorithms perform reasonably well under facial expression; however, the "scream" expression, AR 04, produces large recognition errors.

Table 5 shows the results of both algorithms on the two databases. The notable exception is the scream (AR04) set of the AR database.

For most facial expressions, the facial deformation is centered around the lower part of the face. This may leave sufficient invariant information in the upper face for recognition, which results in a high recognition rate. The expression "scream" affects both the upper and the lower face appearance, which leads to a significant fall-off in the recognition rate. This indicates that 1) face recognition under extreme facial expression still remains an unsolved problem, and 2) temporal information could provide significant additional information for face recognition under expression.

    4.5. Occlusion

For the occlusion tests we look at images where parts of the face are invisible to the camera. The AR database provides two scenarios: subjects wearing sunglasses and subjects wearing a scarf around the lower portion of the face. The recognition rates for the sunglass images are according to expectations. As Table 6 shows, FaceIt is unable to handle this variation (AR08). The result further deteriorates when the left or right light is switched on (AR09 and AR10). This result is readily replicated on the images of the second session.

This test reveals that FaceIt is more vulnerable to upper face occlusion than the MIT algorithm. Facial occlusion, particularly upper face occlusion, remains a difficult problem yet to be solved. Interesting open questions are 1) what are the fundamental limits of any recognition system under various occlusions, and 2) to what extent can additional facial information, such as motion, provide the necessary help for face recognition under occlusion.

         AR 08   AR 09   AR 10   AR 11   AR 12   AR 13
FaceIt    0.10    0.08    0.06    0.81    0.73    0.71
MIT       0.34    0.35    0.28    0.46    0.43    0.40

Table 6. Occlusion results. AR08, AR09 and AR10 refer to the upper facial occlusions, and AR11, AR12 and AR13 refer to the lower facial occlusions, as shown in Figure 5. Upper facial occlusion causes a major drop in recognition rates.

    4.6. Gender

Male and female faces differ in both local features and in shape [4]. Men's faces on average have thicker eyebrows and greater texture in the beard region. In women's faces, the distance between the eyes and brows is greater, the protuberance of the nose smaller, and the chin narrower than in men [4]. People readily distinguish male from female faces using these and other differences (e.g., hair style), and connectionist modeling has yielded similar results [6, 18]. Little is known, however, about the sensitivity of face identification algorithms to differences between men's and women's faces. The relative proportion of men and women in training samples is seldom reported, and identification results typically fail to report whether algorithms are more or less accurate for one sex or the other. Other factors that may influence identification, such as differences in face shape between individuals of European, Asian, and African ancestry [4, 9], have similarly been ignored in past research.

We evaluated the influence of gender on face recognition algorithms on the AR database due to its balanced ratio of female and male subjects. Figure 9 shows the recognition rates achieved by FaceIt across the 13 variations, including illumination, occlusion, and expression.

The results reveal a surprising trend: better recognition rates are consistently achieved for female subjects. Averaged across the conditions (excluding the tests AR08-10, where FaceIt breaks down), the recognition rate for male subjects is 83.4%, while the recognition rate for female subjects is 91.66%. It is not clear what has caused this effect. To further validate this result, a much larger database is needed.

If this result is further substantiated, it opens up many interesting questions on face recognition. In particular it raises the questions: 1) what makes one face easier to recognize than another, and 2) are there face classes with similar recognizability?

Figure 9. AR database results, male vs. female. The dashed line indicates the recognition rate for male subjects in the AR database across the 13 conditions shown in Figure 5.

    5. Discussion

In natural environments, pose, illumination, expression, occlusion and individual differences among people represent critical challenges to face recognition algorithms. The FERET tests [23] and the Facial Recognition Vendor Test 2000 [3] provided initial results on limited variations of these factors.

FaceIt and the MIT algorithm were overall the best performers in these tests. We evaluated both algorithms on multiple independent databases that systematically vary pose, illumination, expression, occlusion, and gender. We found:

1. Pose: Pose variation still presents a challenge for face recognition. Frontal training images have better generalizability to novel views than do non-frontal training images. For a frontal training view, we can achieve reasonable recognition rates of 70-80% for up to 45 degrees of head rotation. In addition, using multiple training views does not necessarily improve the recognition rate.

2. Illumination: Pure illumination changes on the face are handled well by current face recognition algorithms.

3. Expression: With the exception of extreme expressions such as scream, the algorithms are relatively robust to facial expression. Deformation of the mouth and occlusion of the eyes by eye narrowing and closing present a problem to the algorithms.

4. Occlusion: The performance of the face recognition algorithms under occlusion varies. FaceIt is more sensitive to upper face occlusion than MIT; FaceIt is more robust to lower face occlusion.

5. Gender: We found surprisingly consistent differences in face recognition rates across gender. This result is based on testing on the AR database, which has 70 male and 60 female subjects. On average the recognition rate for females is consistently about 5% higher than for males, across a range of perturbations. While the database used in these tests is too small to draw general conclusions, it points to an interesting direction for future research and database collection.

The current study has several limitations. One, we did not examine the effect of face image size on algorithm performance in the various conditions. Minimum size thresholds may well differ for various permutations, which would be important to determine. Two, the influence of racial or ethnic differences on algorithm performance could not be examined due to the homogeneity of racial and ethnic backgrounds in the databases. While large databases with ethnic variation are available, they lack the parametric variation in lighting, shape, pose and other factors that were the focus of this investigation. Three, faces change dramatically with development, but the influence of developmental change on algorithm performance could not be examined. Four, while we were able to examine the combined effects of some factors, databases are needed that support examination of all ecologically valid combinations, which may be non-additive. The results of the current study suggest that greater attention be paid to the multiple sources of variation that are likely to affect face recognition in natural environments.

    6. Acknowledgement

This research is supported by ONR N00014-00-1-0915 (DARPA HumanID) and NSF IRI-9817496.

    A. Appendix

The following figures show the recognition rates for all camera views in turn as gallery images. They are ordered according to the camera numbers, roughly going from left to right in Figure 1.

Figure 10. Recognition accuracies of all cameras for camera view 34 as gallery image.

Figure 11. Recognition accuracies of all cameras for camera view 31 as gallery image.

Figure 12. Recognition accuracies of all cameras for camera view 14 as gallery image.

Figure 13. Recognition accuracies of all cameras for camera view 11 as gallery image.

Figure 14. Recognition accuracies of all cameras for camera view 29 as gallery image.

Figure 15. Recognition accuracies of all cameras for camera view 07 as gallery image.

Figure 16. Recognition accuracies of all cameras for camera view 27 as gallery image.

Figure 17. Recognition accuracies of all cameras for camera view 09 as gallery image.

Figure 18. Recognition accuracies of all cameras for camera view 05 as gallery image.

Figure 19. Recognition accuracies of all cameras for camera view 37 as gallery image.

Figure 20. Recognition accuracies of all cameras for camera view 02 as gallery image.

Figure 21. Recognition accuracies of all cameras for camera view 25 as gallery image.

Figure 22. Recognition accuracies of all cameras for camera view 22 as gallery image.

    References

[1] Robert J. Baron. Mechanisms of human facial recognition. International Journal of Man-Machine Studies, 15(2):137-178, 1981.

[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, July 1997.

[3] D. M. Blackburn, M. Bone, and P. J. Phillips. Facial recognition vendor test 2000: evaluation report, 2000.

[4] V. Bruce and A. Young. In the Eye of the Beholder: The Science of Face Perception. Oxford University Press, 1998.

[5] T. Cootes, K. Walker, and C. Taylor. View-based active appearance models. In IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.

[6] G. W. Cottrell and J. Metcalfe. EMPATH: Face, emotion, and gender recognition using holons. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Neural Information Processing Systems, volume 3, pages 53-60, San Mateo, CA, 1991. Morgan Kaufmann.

[7] M. N. Dailey and Garrison W. Cottrell. Organization of face and object recognition in modular neural network models. Neural Networks, 12(7/8), 1999.

[8] P. Ekman and W. V. Friesen. Facial Action Coding System. Consulting Psychologist Press, 1978.

[9] L. G. Farkas and I. R. Munro. Anthropometric Facial Proportions in Medicine. C. C. Thomas, Springfield, IL, 1987.

[10] R. Gross, J. Yang, and A. Waibel. Face detection in a meeting room. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000.

[11] Peter J. B. Hancock, A. Mike Burton, and Vicki Bruce. Face processing: Human perception and principal components analysis. Memory and Cognition, 24(1):26-40, 1996.

[12] A. Jonathan Howell and Hilary Buxton. Invariance in radial basis function neural networks in human face classification. Neural Processing Letters, 2(3):26-30, 1995.

[13] T. Kanade, J. F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 46-53, Grenoble, France, 2000.

[14] T. Kanade, H. Saito, and S. Vedula. The 3D room: Digitizing time-varying 3D events by synchronized multiple video streams. Technical Report CMU-RI-TR-98-34, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, December 1998.

[15] Andreas Lanitis, Christopher J. Taylor, and Timothy Francis Cootes. Automatic interpretation and coding of face images using flexible models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):743-756, 1997.

[16] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back. Face recognition: A convolutional neural network approach. IEEE Transactions on Neural Networks, 8(1):98-113, 1998.

[17] Y. Li, S. Gong, and H. Liddell. Support vector regression and classification based multi-view face detection and recognition. In IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.

[18] A. J. Luckman, N. M. Allison, A. Ellis, and B. M. Flude. Familiar face recognition: A comparative study of a connectionist model and human performance. Neurocomputing, 7:3-27, 1995.

[19] A. R. Martinez and R. Benavente. The AR face database. Technical Report 24, Computer Vision Center (CVC), Barcelona, Spain, June 1998.

[20] Baback Moghaddam and Alex Paul Pentland. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.

[21] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection, 1997.

[22] P. Penev and J. Atick. Local feature analysis: A general statistical theory for object representation, 1996.

[23] P. Jonathon Phillips, Harry Wechsler, Jeffrey S. Huang, and Patrick J. Rauss. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16(5):295-306, 1998.

[24] T. Poggio and K.-K. Sung. Example-based learning for view-based human face detection. In 1994 ARPA Image Understanding Workshop, volume II, November 1994.

[25] B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In Artificial Neural Networks - ICANN'97, 1997.

[26] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression (PIE) database of human faces. Technical Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, January 2001.

[27] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America, 4(3):519-524, March 1987.

[28] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces, 1987.

[29] Matthew Turk and Alex Paul Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.

[30] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, Heidelberg, DE, 1995.

[31] Thomas Vetter. Synthesis of novel views from a single face image. International Journal of Computer Vision, 28(2):103-116, 1998.

[32] Thomas Vetter, Anya Hurlbert, and Tomaso Poggio. View-based models of 3D object recognition: Invariance to imaging transformations. Cerebral Cortex, 5(3):261-269, 1995.

[33] Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775-779, July 1997.

[34] Alan L. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3(1):59-70, 1991.