Recognition of Facial Expressions in the Presence of Occlusion

Fabrice Bourel, Claude C. Chibelushi, Adrian A. Low
School of Computing, Staffordshire University
Beaconside, Stafford, UK
Abstract
We present a new approach for the recognition of facial expressions from video sequences in the presence of occlusion. Although promising results have been reported in the literature on automatic recognition of facial expressions, most techniques have been assessed using experiments performed in controlled laboratory conditions which do not reflect real-world conditions. The goal of the work presented herein is to develop recognition techniques that will overcome some limitations of current techniques, such as their sensitivity to partial occlusion of the face. The proposed approach is based on a localised representation of facial features, and on data fusion. The experiments show that the proposed approach is robust to partial occlusion of the face.
1 Introduction
BMVC 2001 doi:10.5244/C.15.23

The automatic recognition of facial expressions has been studied with much interest in the past 10 years [9, 22, 24]. An important potential application of automatic facial expression recognition is human-computer interaction [19, 30]. It may also be used as a tool in human behavioral research [22]. This paper presents a new approach for recognising facial expressions, from image sequences, in the presence of partial occlusion. The approach is based on a localised representation of facial expression features, and on fusion of classifier outputs. Data fusion, or sensor fusion, has been used successfully in many fields such as pattern recognition [6, 7, 25] and distributed computing [4, 5]. The ability to handle occluded facial features is important for achieving robust recognition. Despite its importance, the effect of occlusion on automatic facial expression recognition has not received much attention from the research community. As indicated by Pantic [22], no facial expression recognition technique can currently handle partial occlusion. To the knowledge of the authors, the effect of the latter on recognition accuracy has not been assessed either. Causes of occlusion include facial hair, glasses, hands or other objects.
By the nature of their structure, facial models using local extraction of facial information have an advantage over models that consider the whole face. For instance, when a part of the face is occluded, performing optical flow on the whole face [12] may cause problems in representing the extracted data for later classification. Approaches using a deformable model fitted to the whole face [16, 15] may be adapted to handle missing features, if the structure of the model and the classifier both handle partial data in a generic way. Face models based on feature points and a local geometric representation of the face [27, 28] can be adapted easily to achieve robustness to partial occlusion. Occlusion will cause the loss of some model parameters, but the parameters not affected by occlusion may still be used for classification. The classifier also has to be able to handle partial input data, and may need to be retrained.
Although sensitivity to occlusion is still an unresolved problem, many techniques for facial expression recognition have been reported. Black and Yacoob [2] use a local parameterized model of image motion from optical flow analysis. They utilize a planar model for rigid facial motion and an affine-plus-curvature model for non-rigid motion. Otsuka and Ohya [21] estimate the motion of the right eye and the mouth by optical flow. Then, they perform a 2D wavelet transform to extract low-frequency feature vectors. Essa and Pentland [12] first locate the nose, eyes and mouth. Then, from two consecutive normalised frames, a 2D spatio-temporal motion energy representation of facial motion is used as a dynamic face model. Kimura and Yachida [15] fit what they call a 'potential net' on the face. Then, applying a Gaussian filter to an edge-enhanced image, they determine a 'potential field'. The net is then deformed by the elastic force of the potential field. Information encoded in the nodes of the net is used for classification. Wang et al. [28] use 19 facial feature points. The face is represented as a labeled graph model where facial feature points are treated as interconnected nodes. Graphs are compared to templates for recognition using a minimum-distance classification technique. Cohn et al. [8] use feature points that are automatically tracked using a hierarchical optical flow method [18]. The feature vectors, used for the recognition, are created by calculating the displacement of the facial points. The displacement is obtained by subtracting the normalised position of a point in the first frame from its current normalised position. Following Cohn's approach, Lien et al. [17] apply three methods: dense optical flow, feature point tracking, and high-gradient edge extraction. Then, the vectors from each extraction step are vector quantized and used for classification. Tian et al. [27] recently proposed a feature-based method using geometric facial features. The extracted facial regions (mouth, eyes, brows and cheeks) are represented with geometric parameters. Furrows are also detected, using a Canny edge detector, to measure their orientation and quantify their intensity. The geometric parameters are then fed into a neural network for recognition.
Following the facial feature extraction is the classification step. Six basic expression classes (anger, disgust, fear, joy, sadness, surprise), defined by Ekman [10], are often used. Expression analysers can also classify the encountered expression using the FACS system [11]. The FACS system defines 46 action units (AUs) for describing facial expressions. AUs are anatomically related to the contraction of specific muscles. Expressions can be classified either as single AUs or as combinations of AUs [2, 23, 8, 17, 27]. Most approaches for the classification of expressions from image sequences are based on neural networks [23, 27], rule-based models [2, 29], template-based methods [28, 20, 12, 15] using minimum-distance classification, statistical models using discriminant function analysis [8], or probabilistic models such as hidden Markov models [17, 21].
In the context of robust recognition of occluded expressions, using a modular classification and fusion approach may have several advantages over monolithic classification [25]. In the data fusion approach, failure of one or several classifiers does not necessarily degrade the classification process. New classification modules can also be appended easily when other regions of the face are to be examined. In contrast, monolithic classifiers are sensitive to incomplete input data, as would occur in the case of partial occlusion of the face.
This paper presents and assesses a data fusion approach aiming to achieve robustness to partial occlusion. The paper is organised as follows: Section 2 gives an overview of the system. Section 3 and Section 4 describe, respectively, the spatio-temporal model used to represent the expressions, and the classification and data fusion framework. Section 5 presents experiments on the effect of occlusion and discusses the results. We conclude in Section 6 by summarising the findings and suggesting future research directions.
2 Overview of the approach
[Figure 1 block diagram: an image sequence feeds a feature point tracker; motion information is extracted and normalised; six local feature vectors are classified independently by local classifiers (classification of local feature vectors 1 to 6); a fusion/combiner module outputs the class label.]
Figure 1: Architecture of the recognition system.
The framework for automatic recognition of facial expressions, shown in Figure 1, consists of a feature point tracker, a feature extractor, a group of local classifiers, and a fusion module. The feature extractor processes 12 facial feature points which are tracked over an entire image sequence by the robust tracker described in [3] (Figure 2). The 12 facial feature points are set manually on the first frame of the image sequence.
As described in [3], the tracker, based on the Kanade-Lucas tracker [18], is capable of following and recovering any of the 12 facial points lost due to lighting variations, rigid or non-rigid motion, or (to a certain extent) change of head orientation. Automatic recovery, which uses the nostrils as a reference, is based on heuristics exploiting the configuration and visual properties of faces. It was shown in [3] that the enhanced tracker outperforms the Kanade-Lucas tracker in terms of recovering lost or drifting points.
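The per-frame control flow of the tracking scheme (Figure 2) can be sketched as follows. This is a minimal pure-Python sketch under our own assumptions: `track_sequence`, `recover_point`, the frame representation, and the stored nostril offsets are hypothetical stand-ins for the image-based tracker and recovery heuristics of [3], which are not reproduced here.

```python
# Sketch of the per-frame tracking loop of Figure 2 (hypothetical names;
# the actual tracker of [3] builds on Kanade-Lucas and uses image-based
# heuristics we do not reproduce).

def recover_point(name, reference_offsets, nostrils):
    """Re-initialise a lost point from the mid-nostril position plus a
    stored face-configuration offset (stand-in for the paper's heuristics)."""
    nx = (nostrils[0][0] + nostrils[1][0]) / 2.0
    ny = (nostrils[0][1] + nostrils[1][1]) / 2.0
    dx, dy = reference_offsets[name]
    return (nx + dx, ny + dy)

def track_sequence(frames, points, reference_offsets):
    """Run tracking over a sequence; on point loss, invoke automatic recovery."""
    trajectory = []
    for frame in frames:
        updated = {}
        for name, pos in points.items():
            new_pos = frame.get(name)        # stand-in for the KLT update
            if new_pos is None:              # loss detected
                nostrils = (points["nostril_l"], points["nostril_r"])
                new_pos = recover_point(name, reference_offsets, nostrils)
            updated[name] = new_pos
        points = updated                     # tracker update, then next frame
        trajectory.append(dict(points))
    return trajectory
```

A lost point is thus re-seeded from the nostrils, which the paper reports as the most stable reference features.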
When the tracking is complete, independent local spatio-temporal vectors are created from geometrical relations between the feature points (Figure 3). These vectors contain the motion information relative to the start of each sequence. Dynamic information is used in order to achieve shape independence. Shape independence is sought so that the effect of person identity on expression recognition is minimised. In particular, it has been shown in speaker recognition that lip shape conveys identity information [6].
Classification is applied to local facial regions (mouth and eyebrows) instead of global facial information, and the local regions are classified independently. This way, occlusion of a local region will not affect the classification of the other regions. Decision-level fusion [25] of classifier outputs is used. For the purpose of this paper, 4 classes of expression are used out of the 6 defined in [10]; they are the facial expressions for 'anger', 'joy', 'sadness' and 'surprise'.
[Figure 2 flow diagram: each frame is processed by the tracker; if a point loss is detected, automatic point recovery is invoked; if there is no loss, the tracker is updated and the next frame is processed.]
Figure 2: Automatic feature point tracking scheme.
3 Spatio-temporal representation of expressions
Facial point tracking is followed by a determination of the start and end points of the dynamic part of the sequence for each local region. The neutral end-segments are then discarded. Thereafter, the duration of the sequence is normalised to approximately 2.66 seconds by linear interpolation. At 30 frames per second, this corresponds to 80 video frames. The normalisation is applied independently to each local region. As a result of duration normalisation, two expressions performed over different amounts of time will become similar on the normalised temporal scale. Figure 4 illustrates the removal of the neutral segments at the beginning and end of a sequence.

Figure 3: Facial model used to produce the six local feature vectors.

The mouth model is defined using the four feature points shown in Figure 2. From these four points, we extract four parabolic coefficients, which model the shape of the mouth in each frame (Figure 3). The upper face model also consists of four coefficients. Two coefficients model the shape of the eyebrows by measuring their angle of deformation (Figure 3). The other two coefficients measure the distance from the eyebrows to the nostrils (Figure 3). From the eight coefficients, a set of six local feature vectors is created:
{(c1, c4), (c2, c3), (a1), (a2), (d1), (d2)}, where c1 to c4 are the parabolic mouth coefficients, a1 and a2 the eyebrow angles, and d1 and d2 the eyebrow-to-nostril distances (the grouping follows Figure 5). The components of the feature vectors are created by taking the difference between the coefficients for each frame, and the average of the first set of coefficients representing the segment removed from the beginning of the sequence.
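The parabolic mouth coefficients can be illustrated with a small sketch. The paper does not state exactly which four coefficients are computed from the four mouth points; as one plausible reading (an assumption, not the authors' specification), each lip is modelled by a parabola y = a·x² + b·x + c fitted through the two mouth corners and the top (or bottom) point, keeping two coefficients per lip:

```python
# Assumption: each lip is a parabola through three of the four mouth
# points; the curvature a and offset c of each lip give four coefficients.

def parabola_through(p1, p2, p3):
    """Coefficients (a, b, c) of y = a*x^2 + b*x + c through three points
    (closed-form Lagrange interpolation)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1
         + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    return a, b, c

def mouth_coefficients(left, right, top, bottom):
    """Four mouth-shape coefficients: curvature and offset of each lip."""
    a_up, _, c_up = parabola_through(left, top, right)      # upper lip
    a_lo, _, c_lo = parabola_through(left, bottom, right)   # lower lip
    return (a_up, c_up, a_lo, c_lo)
```

For a symmetric open mouth with corners at (±1, 0), top at (0, 1) and bottom at (0, -1), the two lips yield opposite curvatures, which is the kind of shape information the model encodes.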
[Figure 4 plots: amplitude of a parabolic coefficient against frame number, for (a) a sequence with neutral segments detected at both ends, and (b) the same sequence with the neutral segments removed.]
Figure 4: Removal of the neutral segments at the start and end of a motion sequence. (a) Neutral segments detected. (b) Neutral segments removed.
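The duration normalisation described above can be sketched as follows: after the neutral end-segments are discarded, each coefficient trajectory is resampled onto the 80-frame normalised scale by linear interpolation. This is a pure-Python stand-in; the function and parameter names are ours, not the paper's.

```python
# Resample one coefficient trajectory to a fixed length (80 frames in
# the paper) by linear interpolation.

def resample_linear(values, target_len=80):
    """Linearly interpolate a 1-D trajectory onto target_len samples,
    preserving the first and last values."""
    n = len(values)
    if n == 1:
        return [values[0]] * target_len
    out = []
    for i in range(target_len):
        t = i * (n - 1) / (target_len - 1)   # position on the original time axis
        lo = int(t)
        hi = min(lo + 1, n - 1)
        frac = t - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out
```

Applied independently per local region, this makes two renditions of the same expression, performed at different speeds, comparable sample-by-sample.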
4 Facial expression classification
Data fusion has been used by many researchers to improve the accuracy of pattern classification [6], when several sources of information are independent or complementary. In our system, each local expert classifier outputs a recognition score for each known expression. The scores are then combined to produce a final classification result (Figure 5).
[Figure 5 diagram: the six feature vectors (c1, c4), (c2, c3), a1, a2, d1 and d2 each feed a local classifier; the classifier scores are merged by a combiner, which outputs the class label.]
Figure 5: Expression classification using decision-level fusion.
4.1 Local classifiers
Each local classifier is a rank-weighted k-nearest-neighbour classifier. It produces a weighted cumulative score for each known class of expression. The weighted class score is proportional to the rank of each nearest neighbour belonging to the class. This is an adaptation of the distance-weighted kNN classifier [1]. Therefore, the class of the first nearest neighbour receives the highest score, the class of the second a lower score, and so on until k neighbours are reached. For instance, if k = 4, the class of the first nearest neighbour receives a score of 4, the class of the second neighbour 3, and so on, until the fifth neighbour and all further neighbours, which receive a score of 0.
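As an illustration, the rank-weighted scoring might be implemented as below. This is a sketch under our reading of the paper, with k = 4 inferred from the scoring example; the function name and the use of squared Euclidean distance for ranking are assumptions.

```python
# Rank-weighted k-nearest-neighbour scoring: the class of the r-th
# nearest neighbour (r = 0, 1, ..., k-1) receives k - r points.

def knn_scores(query, train, k=4):
    """Cumulative rank-weighted score per class.

    train: list of (feature_vector, class_label) pairs.
    """
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    ranked = sorted(train, key=lambda sample: dist2(query, sample[0]))
    scores = {}
    for rank, (_, label) in enumerate(ranked[:k]):
        scores[label] = scores.get(label, 0) + (k - rank)
    return scores
```

The score dictionary, rather than a single hard label, is what each local classifier passes on to the combiner.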
4.2 Combining scheme
The combination is implemented as a sum scheme performed at the decision level. It is used here for its simplicity. When all local classifiers are combined, every class of expression gets a final score obtained by summing the class-specific scores from the local classifiers. The unknown pattern is assigned to the class which has the highest score. Other approaches to decision fusion, such as linear combination, Bayesian inference, neural networks and fuzzy logic, are the subject of our ongoing investigations [6, 25, 13].
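The sum scheme itself is only a few lines. The sketch below (with illustrative score dictionaries, not the paper's data) also shows how an occluded region fits in: its local classifier is simply left out of the combination.

```python
# Decision-level sum fusion: per-class scores from the local classifiers
# are summed, and the class with the highest total wins.

def fuse_sum(score_dicts):
    """Sum the per-class scores from several local classifiers and
    return the winning class label together with the totals."""
    totals = {}
    for scores in score_dicts:       # one dict per (non-occluded) local classifier
        for label, s in scores.items():
            totals[label] = totals.get(label, 0) + s
    return max(totals, key=totals.get), totals
```

When a region is occluded, the corresponding score dictionary is omitted from `score_dicts` and the remaining classifiers still yield a decision, which is the modularity argument made above.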
5 Experimental results
[Figure 6 bar chart: percentage of correct recognition for the 'anger', 'joy', 'sadness' and 'surprise' classes, under four conditions: no occlusion, mouth occlusion, upper face occlusion, and left/right half-face occlusion.]
Figure 6: Effect of partial occlusion on facial expression recognition. Error bars show the 99% confidence intervals.
The aim of these experiments is to study the robustness of the proposed recognition approach to the occlusion of face regions. The experiments have been carried out using 100 video sequences, with 25 sequences available for each expression class. Partitioning into test and training sets is based on the leave-one-out test methodology. The confidence limits are estimated under the assumption that recognition rates have a binomial distribution [26]. The 99% confidence levels produce, in the worst case, confidence intervals of ±10.3% for the test sample size and the measured recognition rates. The image sequences are from the Cohn-Kanade Action-Unit-coded Carnegie Mellon University facial expression database [14]. The 30 subjects from the database vary in ethnicity, age and skin colour.
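Such confidence limits can be computed with the usual normal approximation to the binomial, as in standard statistics texts [26]. The sketch below is illustrative only: the exact test sample size behind the quoted ±10.3% interval is not restated here, so the numbers in the usage example are not the paper's.

```python
import math

def binomial_ci_halfwidth(p_hat, n, z=2.576):
    """Approximate half-width of a confidence interval for a recognition
    rate p_hat estimated from n trials (z = 2.576 for the 99% level).
    The interval is widest, i.e. the worst case, at p_hat = 0.5."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)
```

For example, with an illustrative 100 test trials and a rate of 0.5, the 99% half-width is about 0.129; higher rates and larger samples tighten the interval.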
Tests with no occlusion of any facial features serve as the baseline. The effect of occlusion is investigated by simulating missing facial regions. The test set covers occlusion of the mouth, the upper face, or the left/right half of the face. The motivations for these tests are many. For example, when somebody performs an expression such as a smile, he may put his hand or an object in front of his mouth. Facial hair or glasses can also occlude face regions.
The results are shown in Figure 6. Recognition of the 'anger' and 'surprise' expressions is robust to the occlusion of the mouth. This is as expected, because some discriminative information is also embedded in the upper part of the face. Surprisingly, 'joy' is recognised accurately despite the absence of the mouth, which is important in making a smile. This unexpected result could be due to the virtual lack of motion of the eyebrows during a smile, which sets 'joy' apart from the other expression classes under study. However, a problem may arise if a neutral expression is considered as a valid class for recognition: the feature vectors for 'joy' and the neutral expression would be similar. The discriminative features for the 'sadness' expression are mainly conveyed by the mouth, as illustrated by the sensitivity of the recognition of 'sadness' to mouth occlusion.
Both the left and right face sides were found to give the same results on average. They are therefore represented as one single set in Figure 6. The recognition results under occlusion are very close to the baseline results corresponding to no occlusion. This shows that reducing the amount of facial information by half still retains sufficient discriminative information for expression recognition. However, removing the data corresponding to half of the face does not necessarily mean that we remove redundant data. Indeed, expressions can be unilateral as well as bilateral [11].
Overall, the experimental results show that the proposed approach is robust to partial occlusion of the face. However, analysing expressions using geometric features misses the regional information embedded in image intensity, texture, or edges. Hence, ongoing investigations are studying the use of both geometric features and texture. The effect on robustness to partial occlusion, of the granularity of the subdivision of the face into local regions, is also under study.
6 Conclusion
A method for recognising facial expressions in the presence of occlusion has been presented. Facial feature points are automatically tracked over an image sequence, and used to represent a face model made of several regions of interest. Decision-level fusion combines local interpretations of the face model into a global recognition score. It has been experimentally shown that the proposed approach is robust to partial occlusion of the face. It has also been observed that there is an interaction between recognition robustness for some expressions and which region of the face is occluded.
References
[1] T. Bailey and A. K. Jain. A Note on Distance-Weighted k-Nearest Neighbor Rules. IEEE Transactions on Systems, Man, and Cybernetics, 8:311–313, 1978.

[2] M. J. Black and Y. Yacoob. Recognizing Facial Expressions in Image Sequences Using Local Parameterized Models of Image Motion. International Journal of Computer Vision, 25(1):23–48, October 1997.

[3] F. Bourel, C. C. Chibelushi, and A. A. Low. Robust Facial Feature Tracking. Proc. 11th British Machine Vision Conference, Bristol, England, 1:232–241, September 2000.

[4] R. R. Brooks and S. S. Iyengar. Robust Distributed Computing and Sensing Algorithm. IEEE Computer, 29(6):53–60, June 1996.

[5] R. R. Brooks and S. S. Iyengar. Multi-Sensor Fusion: Fundamentals and Applications with Software. Prentice Hall, 1998.

[6] C. C. Chibelushi, F. Deravi, and S. D. Mason. A Review of Speech-Based Bimodal Recognition. IEEE Transactions on Multimedia, accepted for publication.

[7] C. C. Chibelushi, F. Deravi, and S. D. Mason. Adaptive Classifier Integration for Robust Pattern Recognition. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 29(6):902–907, December 1999.
[8] J. Cohn, A. Zlochower, J. Lien, and T. Kanade. Feature-Point Tracking by Optical Flow Discriminates Subtle Differences in Facial Expression. Proc. 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pages 396–401, 1998.

[9] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski. Classifying Facial Actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10):974–989, October 1999.

[10] P. Ekman and W. Friesen. Unmasking the Face. Prentice Hall, New Jersey, 1975.

[11] P. Ekman and W. Friesen. The Facial Action Coding System. Consulting Psychologists Press, San Francisco, CA, 1978.

[12] I. A. Essa and A. P. Pentland. Coding, Analysis, Interpretation and Recognition of Facial Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):757–763, July 1997.

[13] A. K. Jain, R. P. W. Duin, and J. Mao. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, January 2000.

[14] T. Kanade, J. Cohn, and Y. Tian. Comprehensive Database for Facial Expression Analysis. Proc. 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG'2000), France, pages 46–53, March 2000.

[15] S. Kimura and M. Yachida. Facial Expression Recognition and its Degree Estimation. Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pages 295–300, 1997.

[16] A. Lanitis, C. J. Taylor, and T. F. Cootes. Automatic Interpretation and Coding of Face Images Using Flexible Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):743–756, July 1997.

[17] J. J. Lien, T. Kanade, J. F. Cohn, and C. Li. Detection, Tracking, and Classification of Action Units in Facial Expression. Journal of Robotics and Autonomous Systems, 31:131–146, 2000.

[18] B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. Proc. International Joint Conference on Artificial Intelligence, pages 674–679, 1981.

[19] B. Myers, J. Hollan, and I. Cruz. Strategic Directions in Human-Computer Interaction. ACM Computing Surveys, 28(4):795–809, December 1996.

[20] N. Oliver, A. P. Pentland, and F. Berard. LAFTER: Lips and Face Real Time Tracker with Facial Expression Recognition. Proc. Computer Vision and Pattern Recognition Conference, San Juan, Puerto Rico, pages 123–129, June 1997.

[21] T. Otsuka and J. Ohya. Recognition of Facial Expressions Using HMM with Continuous Output Probabilities. 5th IEEE International Workshop on Robot and Human Communication, pages 323–328, 1996.
[22] M. Pantic and L. J. M. Rothkrantz. Automatic Analysis of Facial Expressions: The State of the Art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1424–1445, December 2000.

[23] M. Rosenblum, L. S. Davis, and Y. Yacoob. Human Emotion Recognition from Motion Using a Radial Basis Function Network Architecture. IEEE Workshop on Motion of Non-Rigid and Articulated Objects, Austin, Texas, pages 43–49, November 1994.

[24] A. Samal and P. A. Iyengar. Automatic Recognition and Analysis of Human Faces and Facial Expressions: A Survey. Pattern Recognition, 25(1):65–77, January 1992.

[25] M. Sarkar. Modular Pattern Classifiers: A Brief Survey. Proc. IEEE Conference on Systems, Man, and Cybernetics, 4:2878–2883, 2000.

[26] M. R. Spiegel. Schaum's Outline of Theory and Problems of Statistics, 2nd Edition. McGraw-Hill, 1990.

[27] Y. Tian, T. Kanade, and J. F. Cohn. Recognizing Action Units for Facial Expression Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):97–115, February 2001.

[28] M. Wang, Y. Iwai, and M. Yachida. Expression Recognition from Time-Sequential Facial Images by Use of Expression Change Model. Proc. 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pages 324–329, 1998.

[29] Y. Yacoob and L. S. Davis. Recognizing Human Facial Expressions from Long Image Sequences Using Optical-Flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):636–642, June 1996.

[30] J. Ziegler. Interactive Techniques. ACM Computing Surveys, 28(1):185–187, March 1996.