irfan essa, alex pentland

Download Irfan Essa,  Alex Pentland

Post on 31-Dec-2015




0 download

Embed Size (px)


Facial Expression Recognition using a Dynamic Model and Motion Energy. Irfan Essa, Alex Pentland. (a review by Paul Fitzpatrick for 6.892). Overview. Want to categorize facial motion Existing coding schemes not suitable Oriented towards static expressions Designed for human use - PowerPoint PPT Presentation


  • Irfan Essa, Alex PentlandFacial Expression Recognition using a Dynamic Model and Motion Energy(a review by Paul Fitzpatrick for 6.892)

  • OverviewWant to categorize facial motionExisting coding schemes not suitableOriented towards static expressionsDesigned for human useBuild better coding schemeMore detailed, sensitive to dynamicsCategorize using templates constructed from examples of expression changes Facial muscle actuation templatesMotion energy templates

  • Facial Action Coding SystemFACS allows psychologists code expression from static facial mug-shotsFacial configuration = combination of action unitsMotivation

  • Problems with action unitsSpatially localizedReal expressions are rarely localPoor time codingEither no temporal coding, or heuristicCo-articulation effects not representedMotivation

  • Solution: add detailRepresent time course of all muscle activations during expressionFor recognition, match against templates derived from example activation historiesTo estimate muscle activation:Register image of face with canonical meshThrough mesh, locate muscle attachments on faceEstimate muscle activation from optic flowApply muscle activation to face model to generate corrected motion field, also used for recognition


  • Registering image with meshFind eyes, nose, mouthWarp on to generic face meshUse mesh to pick out further features on faceModeling

  • Registering mesh with musclesOnce face is registered with mesh, can relate to muscle attachments36 muscles modeled; 80 face regionsModeling

  • Parameterize face motionUse continuous time Kalman filter to estimate:Shape parameters: mesh positions, velocities, etc.Control parameters: time course of muscle activationModeling

  • Driven by optic flowComputed using coarse to fine methodsUse flow to estimate muscle actuationThen use muscle actuation to generate flow on modelModeling

  • Spatial patterningCan capture simultaneous motion across the entire faceCan represent the detailed time course of muscle activationBoth are important for typical expressionsAnalysis

  • Temporal patterningApplication/release/relax structure not a rampCo-articulation effects presentAnalysis

  • Peak muscle actuation templatesNormalize time period of expressionFor each muscle, measure peak value over application and releaseUse result as template for recognitionNormalizes out time course, doesnt actually use it for recognition?


  • Peak muscle actuation templatesRandomly pick two subjects making expression, combine to form templateMatch against template using normalized dot productRecognitionTemplatesPeak muscle actuations for 5 subjects

  • Motion energy templatesUse motion field on face model, not on original imageBuild template representing how much movement there is at each location on the faceAgain, summarizes over time course, rather than representing it in detailBut does represent some temporal propertiesRecognitionHighLow

  • Motion energy templatesRandomly pick two subjects making expression, combine to form templateMatch against template using Euclidean distance


  • Data acquisitionVideo sequences of 20 subjects making 5 expressions smile, surprise, anger, disgust, raise browOmitted hard-to-evoke expressions of sadness, fearTest set: 52 sequences across 8 subjectsResults

  • Data acquisitionResults

  • Using peak muscle actuationComparison of peak muscle actuation against templates across entire database1.0 indicates complete similarityResults

  • Using peak muscle actuationActual results for classificationOne misclassification over 51 sequencesResults

  • Using motion energy templatesComparison of motion energy against templates across entire databaseLow scores indicate greater similarityResults

  • Using motion energy templatesActual results for classificationOne misclassification over 49 sequencesResults

  • Small test setTest set is a little small to judge performanceSimple simulation of the motion energy classifier using their tables of means and std. deviations shows:Large variation in results for their sample sizeResults are worse than test data would suggestExample: anger classification for large sample size has accuracy of 67%, as opposed to 90%Simulation based on false Gaussian, uncorrelated assumption (and means, deviations derived from small data set!)Comments

  • Nave simulated resultsOverall success rate: 78% (versus 98%)Comments

    Smile90.7%1.4%2.0%19.4%0.0%Surprise0.0%64.8%9.0%0.1%0.0%Anger0.0%18.2%67.1%3.8%9.9%Disgust9.3%13.1%21.4%76.7%0.0%Raise brow0.0%2.4%0.5%0.0%90.1%

  • Motion estimation vs. categorizationThe authors formulation allows detailed prior knowledge of the physics of the face to be brought to bear on motion estimationThe categorization component of the paper seems a little primitive in comparisonThe template-matching the authors use is: Sensitive to irrelevant variation (facial asymmetry, intensity of action)Does not fully use the time course data they have been so careful to collect


  • Video, gratuitous image of TrevorConclusion95 paper what came next?Real-time version with Trevor