Facial Expression Recognition using a Dynamic Model and Motion Energy
Irfan Essa, Alex Pentland
(a review by Paul Fitzpatrick for 6.892)
Overview
- Want to categorize facial motion
- Existing coding schemes are not suitable: oriented towards static expressions, designed for human use
- Build a better coding scheme: more detailed, sensitive to dynamics
- Categorize using templates constructed from examples of expression changes:
  - Facial muscle actuation templates
  - Motion energy templates
Facial Action Coding System (Motivation)
- FACS allows psychologists to code expressions from static facial mug-shots
- Facial configuration = combination of action units
Problems with action units (Motivation)
- Spatially localized; real expressions are rarely local
- Poor time coding: either no temporal coding, or heuristic
- Co-articulation effects not represented
Solution: add detail (Motivation)
- Represent the time course of all muscle activations during the expression
- For recognition, match against templates derived from example activation histories
- To estimate muscle activation:
  - Register image of face with canonical mesh
  - Through mesh, locate muscle attachments on face
  - Estimate muscle activation from optic flow
  - Apply muscle activation to face model to generate corrected motion field, also used for recognition
(a high-level sketch of this pipeline follows)
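Since the slides only outline these stages, here is a minimal Python skeleton of the pipeline as described. Every stage function is a hypothetical placeholder (the paper's implementation is not shown); later slides detail each step.

```python
# Hypothetical skeleton of the analysis pipeline described above.
# The stage functions are passed in as parameters because the
# paper's actual implementations are not public.

def analyze_expression(frames, register_face, attach_muscles,
                       optic_flow, estimate_activation, classify,
                       face_mesh, muscle_model, templates):
    # 1. Register the first frame with the canonical face mesh.
    mesh = register_face(frames[0], face_mesh)
    # 2. Locate muscle attachments on the face through the mesh.
    attachments = attach_muscles(mesh, muscle_model)
    # 3-4. For each frame pair: optic flow, then muscle activation.
    activations = [estimate_activation(optic_flow(a, b), attachments)
                   for a, b in zip(frames, frames[1:])]
    # 5. Match the activation history against expression templates.
    return classify(activations, templates)
```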
Registering image with mesh (Modeling)
- Find eyes, nose, mouth
- Warp onto generic face mesh
- Use mesh to pick out further features on face
(a sketch of the warp step follows)
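A minimal sketch of the warp step, assuming a least-squares affine fit from detected landmarks to canonical mesh coordinates; the landmark values below are made up for illustration, and the paper may use a richer warp model.

```python
import numpy as np

def affine_from_landmarks(src_pts, dst_pts):
    """Least-squares affine transform mapping src_pts to dst_pts."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Design matrix [x, y, 1] for each source landmark.
    A = np.hstack([src, np.ones((len(src), 1))])
    # Solve A @ M = dst for the 3x2 affine parameter matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

# Example (invented coordinates): eyes, nose tip and mouth corners
# in image space, mapped to canonical positions on the generic mesh.
image_pts = [(120, 95), (180, 93), (150, 140), (135, 170), (165, 170)]
mesh_pts = [(0.30, 0.35), (0.70, 0.35), (0.50, 0.55),
            (0.40, 0.75), (0.60, 0.75)]
M = affine_from_landmarks(image_pts, mesh_pts)
# Map image points into mesh coordinates with the fitted transform.
warped = np.hstack([np.array(image_pts, float), np.ones((5, 1))]) @ M
```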
Registering mesh with muscles (Modeling)
- Once the face is registered with the mesh, it can be related to muscle attachments
- 36 muscles modeled; 80 face regions
(one possible representation is sketched below)
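One way to represent this registration is an influence matrix from the 80 face regions to the 36 muscles. The sparsity pattern below is invented purely for illustration; the paper derives the attachments anatomically.

```python
import numpy as np

N_REGIONS, N_MUSCLES = 80, 36
influence = np.zeros((N_MUSCLES, N_REGIONS))
# E.g. suppose (hypothetically) muscle 0 attaches to regions 3, 4
# and 5 with decreasing influence:
influence[0, [3, 4, 5]] = [1.0, 0.7, 0.3]

# With such a matrix, per-region motion observations pool into
# per-muscle evidence with one matrix-vector product.
region_motion = np.random.rand(N_REGIONS)   # stand-in observations
muscle_evidence = influence @ region_motion
```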
Parameterize face motion (Modeling)
- Use a continuous-time Kalman filter to estimate:
  - Shape parameters: mesh positions, velocities, etc.
  - Control parameters: time course of muscle activation
(a discrete-time sketch follows)
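The paper uses a continuous-time Kalman filter over the full physical face model; below is a much-reduced discrete-time sketch with a toy [position, velocity, control] state and illustrative noise matrices, just to show the predict/update structure.

```python
import numpy as np

dt = 1.0 / 30.0                       # assumed frame interval
F = np.array([[1, dt, 0.5 * dt**2],   # position <- velocity, control
              [0, 1,  dt],            # velocity <- control drive
              [0, 0,  1]])            # control assumed slowly varying
H = np.array([[1.0, 0.0, 0.0]])       # observe position (from flow)
Q = np.eye(3) * 1e-4                  # process noise (assumed)
R = np.array([[1e-2]])                # measurement noise (assumed)

x = np.zeros(3)                       # state estimate
P = np.eye(3)                         # state covariance

def kalman_step(x, P, z):
    """One predict/update cycle for a scalar measurement z."""
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + (K @ (z - H @ x)).ravel()
    P = (np.eye(3) - K @ H) @ P
    return x, P

for z in [0.0, 0.02, 0.05, 0.09]:     # toy flow-derived positions
    x, P = kalman_step(x, P, np.array([z]))
```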
Driven by optic flow (Modeling)
- Computed using coarse-to-fine methods
- Use flow to estimate muscle actuation
- Then use muscle actuation to generate flow on the model
(see the sketch below)
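A sketch of coarse-to-fine flow using OpenCV's pyramidal Farneback method as a stand-in; this is not the estimator the paper uses, and the frame filenames are placeholders.

```python
import cv2

# Placeholder paths; substitute consecutive frames of a sequence.
prev_gray = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY)

flow = cv2.calcOpticalFlowFarneback(
    prev_gray, next_gray, None,
    pyr_scale=0.5,   # each pyramid level halves the resolution
    levels=4,        # number of coarse-to-fine levels
    winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# flow[..., 0] and flow[..., 1] are per-pixel x/y displacements;
# pooled over mesh regions, these drive the muscle estimates above.
```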
Spatial patterning (Analysis)
- Can capture simultaneous motion across the entire face
- Can represent the detailed time course of muscle activation
- Both are important for typical expressions
Temporal patterning (Analysis)
- Application/release/relax structure, not a ramp
- Co-articulation effects present
Peak muscle actuation templates (Recognition)
- Normalize the time period of the expression
- For each muscle, measure the peak value over application and release
- Use the result as a template for recognition
- Normalizes out the time course; doesn't actually use it for recognition?
(a sketch of the peak-feature computation follows)
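A sketch of the peak-actuation feature, assuming the time course is resampled to a fixed length and split evenly into application and release phases; the exact phase segmentation is an assumption here.

```python
import numpy as np

def peak_features(activations, n_samples=20):
    """activations: array of shape (n_muscles, T) for one expression."""
    n_muscles, T = activations.shape
    # Resample every muscle's time course to a fixed length,
    # normalizing out the expression's duration.
    t_old = np.linspace(0, 1, T)
    t_new = np.linspace(0, 1, n_samples)
    resampled = np.vstack(
        [np.interp(t_new, t_old, row) for row in activations])
    # Assumed 50/50 split into application and release phases.
    half = n_samples // 2
    apply_peak = resampled[:, :half].max(axis=1)
    release_peak = resampled[:, half:].max(axis=1)
    return np.concatenate([apply_peak, release_peak])
```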
Peak muscle actuation templates (Recognition)
- Randomly pick two subjects making the expression, combine to form the template
- Match against the template using normalized dot product
[figure: templates, peak muscle actuations for 5 subjects]
(a sketch of template construction and matching follows)
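A sketch of template construction and matching; averaging the two subjects' feature vectors is an assumption about how they are "combined", while the normalized dot product score is as stated on the slide.

```python
import numpy as np

def make_template(features_a, features_b):
    """Combine two subjects' peak-feature vectors (assumed: mean)."""
    return (np.asarray(features_a) + np.asarray(features_b)) / 2.0

def normalized_dot(template, candidate):
    """Cosine-style similarity; 1.0 indicates complete similarity."""
    t = np.asarray(template, float)
    c = np.asarray(candidate, float)
    return float(t @ c / (np.linalg.norm(t) * np.linalg.norm(c)))

def classify_peaks(candidate, templates):
    """templates: dict of expression name -> template vector."""
    return max(templates,
               key=lambda k: normalized_dot(templates[k], candidate))
```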
Motion energy templates (Recognition)
- Use the motion field on the face model, not on the original image
- Build a template representing how much movement there is at each location on the face
- Again, summarizes over the time course rather than representing it in detail
- But does represent some temporal properties
[figure: motion energy map, high/low scale]
Motion energy templates (Recognition)
- Randomly pick two subjects making the expression, combine to form the template
- Match against the template using Euclidean distance
[figure: motion energy map, high/low scale]
(a sketch follows)
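A sketch of the motion energy template and its Euclidean-distance matching, assuming the per-location energy is the flow magnitude summed over the time-normalized sequence.

```python
import numpy as np

def motion_energy(flows):
    """flows: assumed (T, H, W, 2) array of model motion fields."""
    magnitudes = np.linalg.norm(flows, axis=-1)  # (T, H, W)
    return magnitudes.sum(axis=0)                # energy per location

def classify_energy(energy, templates):
    """templates: dict of expression name -> energy map.
    Lower Euclidean distance means greater similarity."""
    return min(templates,
               key=lambda k: np.linalg.norm(templates[k] - energy))
```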
Data acquisition (Results)
- Video sequences of 20 subjects making 5 expressions: smile, surprise, anger, disgust, raise brow
- Omitted hard-to-evoke expressions of sadness and fear
- Test set: 52 sequences across 8 subjects
Data acquisition (Results)
[figure-only slide]
Using peak muscle actuation (Results)
- Comparison of peak muscle actuation against templates across the entire database
- 1.0 indicates complete similarity
Using peak muscle actuation (Results)
- Actual results for classification
- One misclassification over 51 sequences
Using motion energy templates (Results)
- Comparison of motion energy against templates across the entire database
- Low scores indicate greater similarity
Using motion energy templates (Results)
- Actual results for classification
- One misclassification over 49 sequences
Small test set (Comments)
- The test set is a little small to judge performance
- A simple simulation of the motion energy classifier, using their tables of means and standard deviations, shows:
  - Large variation in results for their sample size
  - Results are worse than the test data would suggest
  - Example: anger classification for a large sample size has an accuracy of 67%, as opposed to 90%
- Simulation based on a false Gaussian, uncorrelated assumption (and means and deviations derived from a small data set!)
(a sketch of such a simulation follows the table below)
Naïve simulated results (Comments)
- Overall success rate: 78% (versus 98%)

                 True expression
Classified as    Smile    Surprise   Anger    Disgust   Raise brow
Smile            90.7%     1.4%       2.0%    19.4%      0.0%
Surprise          0.0%    64.8%       9.0%     0.1%      0.0%
Anger             0.0%    18.2%      67.1%     3.8%      9.9%
Disgust           9.3%    13.1%      21.4%    76.7%      0.0%
Raise brow        0.0%     2.4%       0.5%     0.0%     90.1%

(columns sum to 100%; the diagonal gives each expression's success rate)
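A sketch of the naive simulation described on the previous slide: sample each expression's per-template distance scores from independent Gaussians and classify by minimum distance. The means and deviations below are placeholders, not the paper's values, so this will not reproduce the table above.

```python
import numpy as np

rng = np.random.default_rng(0)
expressions = ["smile", "surprise", "anger", "disgust", "raise brow"]

# mean[i, j], std[i, j]: distance of true expression i to template j.
# Placeholder values: correct template slightly closer on average.
mean = np.full((5, 5), 2.0) - 1.0 * np.eye(5)
std = np.full((5, 5), 0.4)

n_trials = 100_000
confusion = np.zeros((5, 5))
for i in range(5):
    # Independent Gaussian distance scores for true expression i
    # (the false Gaussian, uncorrelated assumption noted above).
    scores = rng.normal(mean[i], std[i], size=(n_trials, 5))
    picks = scores.argmin(axis=1)          # minimum distance wins
    confusion[i] = np.bincount(picks, minlength=5) / n_trials

# Rows here are the true expression; the slide's table is transposed.
print(np.round(confusion * 100, 1))
```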
Motion estimation vs. categorization (Comments)
- The authors' formulation allows detailed prior knowledge of the physics of the face to be brought to bear on motion estimation
- The categorization component of the paper seems a little primitive in comparison
- The template matching the authors use is:
  - Sensitive to irrelevant variation (facial asymmetry, intensity of action)
  - Does not fully use the time course data they have been so careful to collect
Video, gratuitous image of Trevor (Conclusion)
- '95 paper: what came next?
- Real-time version with Trevor