TRANSCRIPT
APPLICATIONS OF DEEP MODELS
Outline
- Facial Attributes Analysis
- Animated Pose Templates (APT) for Modeling and Detecting Human Actions
- Unsupervised Structure Learning of Stochastic And-Or Grammars
A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
Motivation
- An attribute can be estimated from a small region.
- An occluded region can be inferred with respect to the others.
- Attributes may indicate the presence or absence of other attributes.
Algorithm
Use a discriminative binary decision tree (DDT) for each attribute. Each node of the tree contains a detector (which locates the region) and a classifier (which determines the presence or absence of the attribute).
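The node structure described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the `DDTNode` class, the dict-based "image", and the toy detector and classifier are all made up for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DDTNode:
    # detector: locates the attribute's region in the image (None if not found)
    detect: Callable
    # classifier: decides whether the attribute is present in that region
    classify: Callable
    present_child: Optional["DDTNode"] = None
    absent_child: Optional["DDTNode"] = None

def evaluate(node, image):
    """Walk the tree: at each node, detect a region, then classify it."""
    region = node.detect(image)
    present = region is not None and node.classify(region)
    child = node.present_child if present else node.absent_child
    return present if child is None else evaluate(child, image)

# toy usage: an "image" is just a dict of named regions
leaf = DDTNode(detect=lambda img: img.get("mouth"),
               classify=lambda r: r == "smiling")
print(evaluate(leaf, {"mouth": "smiling"}))  # True
```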
DDT
Sum-Product Tree (SPT)
- Models the joint probability: the value at the root equals the joint probability of the variables.
- All children of a product node are sum nodes; all children of a sum node are product nodes or terminals.
- Each edge from a sum node to a child carries a weight.
Sum-Product Tree (SPT)
With an SPT, we can efficiently infer the value of an unobserved variable using MPE (most probable explanation) inference. For example, when one variable is observed to equal 1 and a second variable is unobserved, MPE inference can find that the most probable explanation of the unobserved variable is 0.
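A minimal sketch of this evaluation and MPE query over a toy SPT with two binary variables. The structure and weights are hypothetical (the paper's SPTs are derived from the DDTs); MPE is done here by enumerating the unobserved variable, which is equivalent to the max-sum trace-back on a tree this small.

```python
def spt_value(x1, x2, w=(0.7, 0.3)):
    """Evaluate the toy SPT bottom-up; terminals are 0/1 indicators.
    Root is a sum node over two product nodes:
      P(x1, x2) = w[0] * [x1=1][x2=1] + w[1] * [x1=0][x2=0]
    """
    ind = lambda b: 1.0 if b else 0.0
    prod1 = ind(x1) * ind(x2)            # product node: both variables 1
    prod2 = ind(not x1) * ind(not x2)    # product node: both variables 0
    return w[0] * prod1 + w[1] * prod2   # weighted sum node (root)

def mpe_x2_given_x1(x1, w=(0.7, 0.3)):
    """MPE inference for x2 when x1 is observed and x2 is unobserved."""
    candidates = {x2: spt_value(x1, x2, w) for x2 in (False, True)}
    return max(candidates, key=candidates.get)

print(mpe_x2_given_x1(True))   # most probable x2 given x1 = 1 -> True
```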
Algorithm
Transform the DDT into a sum-product tree (SPT) to exploit interdependencies among regions. This makes the model able to handle occlusions even when the training data contains no occlusions.
[Figure: SPT structure, showing separators, clusters, sum nodes, and product nodes]
Algorithm
Organize all the SPTs into a sum-product network (SPN) to learn correlations among different attributes (learned by EM). The SPN uses three different types of sum weights.
Inference
1. Run the region detector with a sliding window.
2. Locate a region.
3. Apply the region classifier.
Learning
1) Train a DDT for each attribute; 2) transform each DDT into an SPT; 3) build the SPN.
EM: the E-step infers the unobserved data; the M-step renormalizes the parameters; edges with zero weight are pruned.
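The M-step bookkeeping above can be sketched as follows. The dict-based weight representation and node names are illustrative only, not the paper's data structures.

```python
def m_step(sum_weights, eps=0.0):
    """Renormalize each sum node's weights and prune zero-weight edges.

    sum_weights: {sum_node: {child: unnormalized weight}}
    """
    updated = {}
    for node, children in sum_weights.items():
        # prune edges whose weight fell to (or below) eps
        kept = {c: w for c, w in children.items() if w > eps}
        total = sum(kept.values())
        # renormalize the surviving edges so each sum node's weights sum to 1
        updated[node] = {c: w / total for c, w in kept.items()}
    return updated

weights = {"s1": {"p1": 2.0, "p2": 0.0, "p3": 6.0}}
print(m_step(weights))  # {'s1': {'p1': 0.25, 'p3': 0.75}}
```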
Outline
- Facial Attributes Analysis
- Animated Pose Templates for Modeling and Detecting Human Actions
- Unsupervised Structure Learning of Stochastic And-Or Grammars
Formulation
- Short-term: action snippets (2–5 frames), modeled by moving pose templates
- Long-term: transitions between the pose templates (APTs)
- Contextual objects
Short-term action snippets
A moving pose template for each pose =
shape template (HOG) + motion template (HOF), modeling human geometry, appearance, and motion jointly.
Moving Pose Template (MPT)
An MPT consists of appearance (HOG), deformation, and motion (variation of HOF).
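As a rough illustration, an MPT's match to a frame can be thought of as a weighted combination of a shape (HOG) term and a motion (HOF) term. The feature vectors, weights, and the plain dot-product form below are all made-up simplifications, not the paper's scoring function.

```python
def mpt_score(hog, hof, w_shape, w_motion):
    """Toy MPT score: weighted shape (HOG) term + weighted motion (HOF) term."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(w_shape, hog) + dot(w_motion, hof)

# toy 2-D features and weights
print(mpt_score(hog=[0.2, 0.8], hof=[0.5, 0.1],
                w_shape=[1.0, 0.5], w_motion=[0.3, 0.7]))  # ~0.82
```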
Long-term actions
An animated pose template (APT) is a sequence of moving pose templates.
Animated Pose Templates: HMM model
- Transition probabilities for the MPT labels
- Tracking probabilities for the movement of parts between frames
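Under the HMM view above, the best MPT label sequence for a clip can be decoded with Viterbi. The two labels, the transition matrix, and the per-frame scores below are made-up numbers for illustration; only the transition-plus-emission structure reflects the model.

```python
import math

labels = ["stand", "reach"]
log_trans = {("stand", "stand"): math.log(0.8), ("stand", "reach"): math.log(0.2),
             ("reach", "stand"): math.log(0.3), ("reach", "reach"): math.log(0.7)}
# per-frame log-likelihood of each MPT matching the frame (3 toy frames)
log_emit = [{"stand": math.log(0.9), "reach": math.log(0.1)},
            {"stand": math.log(0.4), "reach": math.log(0.6)},
            {"stand": math.log(0.2), "reach": math.log(0.8)}]

def viterbi(log_emit, log_trans, labels):
    """Standard Viterbi decode: best MPT label sequence over the frames."""
    V = [{l: log_emit[0][l] for l in labels}]
    back = []
    for t in range(1, len(log_emit)):
        V.append({})
        back.append({})
        for l in labels:
            prev, score = max(
                ((p, V[t - 1][p] + log_trans[(p, l)]) for p in labels),
                key=lambda x: x[1])
            V[t][l] = score + log_emit[t][l]
            back[t - 1][l] = prev
    # trace back the winning label at each frame
    best = max(V[-1], key=V[-1].get)
    path = [best]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

print(viterbi(log_emit, log_trans, labels))  # ['stand', 'reach', 'reach']
```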
Animated Pose Templates (APT)
Animated Pose Templates with Contextual Object
Contextual objects
- Weak objects (e.g. cigarette, ground): too small or too diverse; localized via body parts.
- Strong objects (e.g. cup): distinguishable; detected using HOG.
These objects are treated in the same way as the body parts.
Inference
Learning
Semi-supervised structural SVM
- Annotated key frames are clustered into pose templates by EM.
- For the unannotated frames and the model parameters:
  - learn the model from the labeled data with latent SVM (LSVM)
  - accept high-scoring frames as newly labeled frames
Outline
- Facial Attributes Analysis
- Animated Pose Templates for Modeling and Detecting Human Actions
- Unsupervised Structure Learning of Stochastic And-Or Grammars
Unsupervised Structure Learning: Problem Definition
Given training data X, find the grammar G that maximizes the posterior probability P(G | X).
Algorithm Framework
Iteratively introduce new intermediate nonterminal nodes that increase the posterior probability of the grammar.
And-Or Fragments
- And-fragments: fail when training data is scarce.
- Or-fragments: decrease the posterior probability.
- And-Or fragments: And-rules and Or-rules are learned in a more unified manner.
Likelihood Gain
Likelihood gain = likelihood changes × context matrix changes
Prior gain = increase in grammar size + reduction in the number of configurations
Posterior gain = likelihood gain × prior gain
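The multiplicative decomposition above suggests a simple acceptance test for a candidate And-Or fragment: add it only when its posterior gain exceeds 1, i.e. when the grammar's posterior probability would increase. This sketch and its gain values are illustrative; in practice the gains are computed from the grammar and the training data.

```python
def accept_fragment(likelihood_gain, prior_gain):
    """Accept a candidate And-Or fragment iff it raises the posterior."""
    posterior_gain = likelihood_gain * prior_gain
    return posterior_gain > 1.0

print(accept_fragment(likelihood_gain=4.0, prior_gain=0.5))  # True  (gain 2.0)
print(accept_fragment(likelihood_gain=1.2, prior_gain=0.5))  # False (gain 0.6)
```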