1
Face Tracking in Videos
Gaurav Aggarwal, Ashok Veeraraghavan, Rama Chellappa
2
Why video?
Illumination, pose, expression
Video
• Multiple images (better hope!)
• Dynamic information (distinguishability?)
3
3D facial pose tracking
The goal is to recover the 3D configuration of a face in each frame of a given video.
• 3D configuration: 3 translation parameters and 3 orientation parameters.
• Important for applications requiring head normalization, such as face recognition, expression analysis, and lip reading.
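The 6-parameter state can be made concrete with a small sketch. It assumes a yaw-pitch-roll Euler-angle parameterization with rotation order R = Ry(yaw) Rx(pitch) Rz(roll) and a camera frame whose Z axis points away from the camera; the slides fix neither choice, and `Pose` is a hypothetical name.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Pose:
    """Hypothetical 6-DOF head pose: translation (camera frame) + Euler angles in radians."""
    tx: float
    ty: float
    tz: float
    yaw: float    # rotation about the vertical (Y) axis
    pitch: float  # rotation about the horizontal (X) axis
    roll: float   # rotation about the optical (Z) axis

    def rotation_matrix(self) -> np.ndarray:
        """R = Ry(yaw) @ Rx(pitch) @ Rz(roll); this order is an assumption."""
        cy, sy = np.cos(self.yaw), np.sin(self.yaw)
        cp, sp = np.cos(self.pitch), np.sin(self.pitch)
        cr, sr = np.cos(self.roll), np.sin(self.roll)
        ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
        rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
        rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
        return ry @ rx @ rz

    def transform(self, points: np.ndarray) -> np.ndarray:
        """Map Nx3 model-frame points into the camera frame: p -> R p + t."""
        t = np.array([self.tx, self.ty, self.tz])
        return points @ self.rotation_matrix().T + t

    def as_vector(self) -> np.ndarray:
        """The state vector used by the tracker: 3 translations, then 3 angles."""
        return np.array([self.tx, self.ty, self.tz, self.yaw, self.pitch, self.roll])
```

Keeping the state as a flat 6-vector (`as_vector`) makes it easy to perturb and average in a sampling-based tracker, while `transform` is what places the face model in each frame.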
4
Challenges
• Self-occlusions (due to pose changes)
• Expression changes
• Illumination variation
PS: unlike in 2D tracking, pose-based appearance changes are crucial.
5
Earlier approaches
• 2D appearance-based
  • Output: region of interest on the image
  • 3D configuration?
• Active appearance models
• 3D face model-based
• Cylindrical models
  • Inter-frame warping usually assumed to be linear
  • Simple inter-frame pose changes
6
Our Approach
Hybrid: geometric + statistical
• Geometric modeling takes care of pose and self-occlusion.
• Statistical inference handles tracking under occlusions, illumination and expression variations.
7
The Geometric Model
We use a cylindrical model with an elliptic cross-section.
• The ellipticity becomes important when yaw is high.
Why not a simple planar model?
• Tracking becomes difficult, and it does not provide 3D pose.
Why not a complicated face model (based on a few laser scans)?
• Very susceptible to errors in initialization and registration.
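A minimal sketch of sampling the elliptic cylinder, assuming (our conventions, not the slides') that the cylinder axis is the model's Y axis, `a` and `b` are the semi-axes along X (width) and Z (depth), and angle 0 faces a camera located along -Z. Outward normals are returned as well, because a point on this convex surface is visible only if its normal faces the camera; that back-face test is one simple way a geometric model can handle self-occlusion. Function names are hypothetical.

```python
import numpy as np


def elliptic_cylinder(a: float, b: float, height: float, n_theta: int = 36, n_y: int = 10):
    """Sample points and outward unit normals on an elliptic cylinder (model frame).

    Assumed convention: axis along Y, theta = 0 faces the camera (-Z direction).
    """
    theta = np.linspace(-np.pi, np.pi, n_theta, endpoint=False)
    y = np.linspace(-height / 2, height / 2, n_y)
    th, yy = np.meshgrid(theta, y, indexing="ij")
    points = np.stack([a * np.sin(th), yy, -b * np.cos(th)], axis=-1).reshape(-1, 3)
    # The gradient of x^2/a^2 + z^2/b^2 is the outward normal of the elliptic cross-section.
    normals = np.stack([np.sin(th) / a, np.zeros_like(th), -np.cos(th) / b], axis=-1).reshape(-1, 3)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return points, normals


def visible_mask(points_cam: np.ndarray, normals_cam: np.ndarray) -> np.ndarray:
    """Front-facing test: the normal must point toward the camera at the origin."""
    return np.einsum("ij,ij->i", normals_cam, -points_cam) > 0
```

With `a` larger than `b`, the projected width changes with yaw in a way a circular cylinder cannot reproduce, which is where the ellipticity matters.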
8
The Projection Model
Orthographic
• Restrictive
Perspective
• Calibration parameters?
We use a perspective projection model and show that it is robust to errors in the focal length.
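For reference, the two projection models can be compared in a few lines. This is a pinhole-camera sketch assuming the principal point at the image origin, square pixels, and no lens distortion, so the focal length `f` is the only calibration parameter; the function names are hypothetical.

```python
import numpy as np


def project_perspective(points_cam: np.ndarray, f: float) -> np.ndarray:
    """Project Nx3 camera-frame points (Z > 0) to Nx2 image coordinates (f X/Z, f Y/Z)."""
    z = points_cam[:, 2:3]
    if np.any(z <= 0):
        raise ValueError("all points must lie in front of the camera (Z > 0)")
    return f * points_cam[:, :2] / z


def project_orthographic(points_cam: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """(Scaled) orthographic projection for comparison: depth is simply dropped."""
    return scale * points_cam[:, :2]
```

Doubling a point's depth halves its perspective image coordinates but leaves its orthographic ones unchanged, which is one way to see why the orthographic model is restrictive for a face moving toward or away from the camera.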
9
Errors in focal length assignment (1)
Suppose the true focal length is f0.
True projections:
Say the assigned focal length is kf0.
Consider a fictitious cylinder of the same dimensions but placed at (X0, Y0, kZ0).
10
Errors in focal length assignment (2)
The projections under the assumed focal length:
If these coincide with the true projections, we are fine.
11
Errors in focal length assignment (3)
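Now, if the depth offset of every point on the cylinder from its center is small compared with Z0, the two sets of projections approximately coincide.
This assumption means that the depth variations within the object are small relative to its distance from the camera.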
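One way to fill in the argument of the three focal-length slides, assuming a point on the cylinder is written as (X0 + x, Y0 + y, Z0 + z), where (x, y, z) is its offset from the cylinder's center and is the same for the real and the fictitious cylinder since both share dimensions and orientation (this notation is ours, not the slides'):

```latex
\begin{aligned}
u &= \frac{f_0 (X_0 + x)}{Z_0 + z}, \qquad v = \frac{f_0 (Y_0 + y)}{Z_0 + z}
  && \text{(true cylinder, focal length } f_0) \\
u' &= \frac{k f_0 (X_0 + x)}{k Z_0 + z} = \frac{f_0 (X_0 + x)}{Z_0 + z/k}, \qquad
  v' = \frac{f_0 (Y_0 + y)}{Z_0 + z/k}
  && \text{(fictitious cylinder, focal length } k f_0) \\
|z| \ll \min(1, k)\, Z_0 &\;\Rightarrow\;
  u' \approx u \approx \frac{f_0 (X_0 + x)}{Z_0}, \qquad
  v' \approx v \approx \frac{f_0 (Y_0 + y)}{Z_0}
\end{aligned}
```

Under this reading, a tracker run with focal length kf0 would recover the fictitious cylinder: the same orientation and the same X0 and Y0, with only the depth scaled to kZ0.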
12
Choice of features
Desirable properties
• Easy to detect and compute
• Robust to occlusions and to changes in illumination, expression, etc.
We stress-test our approach by using an extremely simple, easily computable feature.
13
Features
We superimpose a rectangular grid all around the cylinder.
The mean intensities of the visible grid cells constitute the feature vector.
Given a configuration, the grid cells can be projected onto the image frame and the feature vector can be computed.
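A minimal sketch of this feature, under several assumptions of ours: a grayscale image as a 2D NumPy array, a pose given as a rotation matrix `R` and translation `t`, the cylinder axis along the model's Y axis with angle 0 facing the camera, the camera at the origin looking down +Z, and the principal point at the image center. Each cell's mean is approximated by averaging nearest-pixel samples at a few points inside the cell rather than by integrating over the exact projected patch. The cell layout, sampling density, and the name `grid_features` are hypothetical.

```python
import numpy as np


def grid_features(image, R, t, f, a, b, height, n_cols=12, n_rows=6, samples=4):
    """Mean intensity of each grid cell on an elliptic cylinder.

    Entries are NaN for cells that face away from the camera or fall partly
    outside the frame. Pixel (row, col) = (v + H/2, u + W/2) is an assumption.
    """
    H, W = image.shape
    feats = np.full(n_cols * n_rows, np.nan)
    d_th, d_y = 2 * np.pi / n_cols, height / n_rows
    # Fractional sample positions inside a cell, e.g. 0.125, 0.375, ... for samples=4.
    s = (np.arange(samples) + 0.5) / samples
    for i in range(n_cols):
        for j in range(n_rows):
            th = -np.pi + (i + s[:, None]) * d_th
            yy = -height / 2 + (j + s[None, :]) * d_y
            th, yy = np.broadcast_arrays(th, yy)
            pts = np.stack([a * np.sin(th), yy, -b * np.cos(th)], -1).reshape(-1, 3)
            # Cell centre and its outward normal, used for the self-occlusion test.
            th_c = -np.pi + (i + 0.5) * d_th
            p_c = np.array([a * np.sin(th_c), -height / 2 + (j + 0.5) * d_y, -b * np.cos(th_c)])
            n_c = np.array([np.sin(th_c) / a, 0.0, -np.cos(th_c) / b])
            if (R @ n_c) @ -(R @ p_c + t) <= 0:
                continue  # cell faces away from the camera
            cam = pts @ R.T + t
            u = f * cam[:, 0] / cam[:, 2]
            v = f * cam[:, 1] / cam[:, 2]
            rows = np.rint(v + H / 2).astype(int)
            cols = np.rint(u + W / 2).astype(int)
            inside = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
            if inside.all():
                feats[i * n_rows + j] = image[rows, cols].mean()
    return feats
```

Marking hidden cells as NaN, rather than dropping them, keeps every cell at a fixed index, so feature vectors from different poses can be compared cell by cell.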
14
Tracking (1)
Dynamic state estimation problem
• State consists of 3D orientation and translation parameters
• We use particle filter-based inference
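The generic inference loop can be sketched as one sequential importance resampling (SIR) step, the textbook form of particle filtering, assuming the state is the 6-vector of translation and orientation parameters. The slides do not say which particle filter variant or resampling scheme is used; systematic resampling, the weighted-mean estimate, and the names `sir_step`, `propagate`, and `likelihood` are our choices.

```python
import numpy as np


def sir_step(particles, weights, propagate, likelihood, rng):
    """One SIR step for a state-space tracker.

    particles: (N, 6) array of pose states; weights: (N,) normalized weights.
    propagate(particles, rng) -> new particles (the motion model).
    likelihood(particles) -> (N,) nonnegative observation likelihoods.
    Returns the new particles, their weights, the weighted-mean state and the best particle.
    """
    n = len(particles)
    # Systematic resampling: one random offset, then n evenly spaced positions.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    idx = np.minimum(idx, n - 1)  # guard against round-off in the cumulative sum
    particles = propagate(particles[idx], rng)
    w = likelihood(particles)
    total = w.sum()
    w = w / total if total > 0 else np.full(n, 1.0 / n)
    estimate = w @ particles
    best = particles[np.argmax(w)]
    return particles, w, estimate, best
```

Returning both the weighted mean and the highest-weight particle leaves the choice of point estimate to the caller.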
15
Tracking (2)
The particle filter (PF) approximates the desired posterior pdf with a set of weighted particles.
Random-walk motion model
• Keeps the tracker generic
The observation model
• Ds is the mapping that transforms an image frame into the feature vector
• N is the feature model
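With a random-walk motion model, each particle is perturbed by zero-mean Gaussian noise, so no assumption about head dynamics is built into the tracker. The per-parameter step sizes in `STEP_STD` below are hypothetical values for illustration, not taken from the slides.

```python
import numpy as np

# Hypothetical step sizes: (tx, ty, tz) in model units, then yaw, pitch, roll in radians.
STEP_STD = np.array([1.0, 1.0, 2.0, np.deg2rad(3), np.deg2rad(3), np.deg2rad(3)])


def random_walk(particles: np.ndarray, rng: np.random.Generator, step_std=STEP_STD) -> np.ndarray:
    """Random-walk motion model: s_t = s_{t-1} + w_t, with w_t ~ N(0, diag(step_std**2))."""
    return particles + rng.normal(0.0, step_std, size=particles.shape)
```

Larger step sizes let the tracker follow faster head motion at the cost of spreading the particles more thinly.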
16
Tracking (3)
The likelihood of each particle is computed from the average SSD between the feature model and the vector of grid-cell means corresponding to the particle.
Choice of feature model
• Ability to handle variations in appearance
• Immunity to drift
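A sketch of this likelihood computation. Averaging the squared differences over grid cells follows the slide; turning the average SSD into a likelihood with a Gaussian-style exp(-SSD / (2 sigma^2)), the value of `sigma`, and skipping hidden cells marked as NaN are our assumptions.

```python
import numpy as np


def average_ssd(features: np.ndarray, model: np.ndarray) -> float:
    """Mean squared difference over grid cells visible in both vectors (NaN = hidden)."""
    valid = ~(np.isnan(features) | np.isnan(model))
    if not valid.any():
        return np.inf
    diff = features[valid] - model[valid]
    return float(np.mean(diff ** 2))


def likelihood(features: np.ndarray, model: np.ndarray, sigma: float = 10.0) -> float:
    """Hypothetical mapping from average SSD to likelihood: exp(-SSD / (2 sigma^2))."""
    return float(np.exp(-average_ssd(features, model) / (2.0 * sigma ** 2)))
```

Averaging instead of summing keeps the score comparable between poses that expose different numbers of visible cells.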
17
Tracking (4)
Two feature models
• Lost model (the feature vector in the 1st frame)
  • Not capable of handling drastic appearance changes
• Wander model (the feature vector corresponding to the best particle at the previous time instant)
  • Can handle appearance changes
  • Susceptible to drift
We use a combination of both, which makes the tracker very resilient.
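The slides do not say how the two feature models are combined. One hypothetical way, sketched below, is to blend the average SSD against each model with a weight `alpha` (0 < alpha < 1) and to refresh the wander model with the best particle's features after every frame. The class name and the blending rule are assumptions.

```python
import numpy as np


class TwoModelScorer:
    """Hypothetical blend of a fixed first-frame ('lost') model and a 'wander' model.

    The additive blend and alpha (0 < alpha < 1) are illustrative assumptions.
    """

    def __init__(self, first_frame_features: np.ndarray, alpha: float = 0.5):
        self.lost = first_frame_features.copy()    # fixed: immune to drift
        self.wander = first_frame_features.copy()  # refreshed every frame
        self.alpha = alpha

    @staticmethod
    def _ssd(x: np.ndarray, y: np.ndarray) -> float:
        """Average SSD over cells visible in both vectors (NaN = hidden)."""
        valid = ~(np.isnan(x) | np.isnan(y))
        return float(np.mean((x[valid] - y[valid]) ** 2)) if valid.any() else np.inf

    def score(self, features: np.ndarray) -> float:
        """Blended average SSD; lower means a better match."""
        return (self.alpha * self._ssd(features, self.lost)
                + (1 - self.alpha) * self._ssd(features, self.wander))

    def update(self, best_particle_features: np.ndarray) -> None:
        """Wander model := features of the best particle at the current instant."""
        self.wander = best_particle_features.copy()
```

The fixed term pulls the tracker back toward the initial appearance, limiting drift, while the wander term tolerates gradual appearance change.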
18
Tracking (5)
Robust Statistics
• Trust only the top half of the means (those that best match the feature model) and treat the rest as outliers.
• This makes the tracker robust to illumination and expression changes, occlusions, etc.
Robustified likelihood computation
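A sketch of the robustified distance, reading "trust only the top half of the means" as keeping the half of the visible grid cells whose means best match the feature model and discarding the rest as outliers. That reading, the `keep` parameter, and the name `robust_average_ssd` are our assumptions.

```python
import numpy as np


def robust_average_ssd(features: np.ndarray, model: np.ndarray, keep: float = 0.5) -> float:
    """Average SSD over the best-matching `keep` fraction of visible cells (NaN = hidden)."""
    valid = ~(np.isnan(features) | np.isnan(model))
    sq = np.sort((features[valid] - model[valid]) ** 2)
    if sq.size == 0:
        return np.inf
    n_keep = max(1, int(np.ceil(keep * sq.size)))
    # Cells with the largest residuals (e.g. occluded or shadowed) are ignored.
    return float(sq[:n_keep].mean())
```

With `keep = 0.5`, an occluder or shadow that corrupts fewer than half of the visible cells barely changes the distance, so the resulting likelihood stays informative.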
19
Experiments and results
3 different datasets
• Ground truth available for one, to evaluate the performance of the tracker
Experiments
• Tracking: extreme poses, occlusion, expression variations
• Comparison to ground truth
• Recognition with non-overlapping poses
20
Tracking results
21
Comparison to Ground Truth
22
Small Recognition Experiment
Gallery of 10 subjects.
• No overlap between the poses present in the gallery and in the probes
• Nearest poses were at least 30 degrees apart
100% recognition rate.
23
24
More results
25
Contributors
Rama Chellappa, Gaurav Aggarwal, Ashok Veeraraghavan