TRANSCRIPT
APPLICATIONS OF DEEP MODELS
Outline
- Facial Attributes Analysis
- Animated Pose Templates (APT) for Modeling and Detecting Human Actions
- Unsupervised Structure Learning of Stochastic And-Or Grammars
A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
Motivation
- An attribute can be estimated from a small region.
- An occluded region can be inferred with respect to the others.
- Attributes may indicate the presence or absence of other attributes.
Algorithm
Use a discriminative binary decision tree (DDT) for each attribute. Each node of the tree contains a detector (which locates the region) and a classifier (which determines the presence or absence of the attribute).
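The node structure described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the `DDTNode` class, the dict-based "image", and the toy detector and classifier are all made up for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DDTNode:
    # detector: locates the attribute's region in the image (None if not found)
    detect: Callable
    # classifier: decides whether the attribute is present in that region
    classify: Callable
    present_child: Optional["DDTNode"] = None
    absent_child: Optional["DDTNode"] = None

def evaluate(node, image):
    """Walk the tree: at each node, detect a region, then classify it."""
    region = node.detect(image)
    present = region is not None and node.classify(region)
    child = node.present_child if present else node.absent_child
    return present if child is None else evaluate(child, image)

# toy usage: an "image" is just a dict of named regions
leaf = DDTNode(detect=lambda img: img.get("mouth"),
               classify=lambda r: r == "smiling")
print(evaluate(leaf, {"mouth": "smiling"}))  # True
```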
DDT
Sum-Product Tree (SPT)
- Models the joint probability: the value at the root equals the joint probability of the variables.
- All children of a product node are sum nodes; all children of a sum node are product nodes or terminals.
- Each edge from a sum node to a child carries a weight.
Sum-Product Tree (SPT)
With an SPT, we can efficiently infer the value of an unobserved variable using MPE (most probable explanation) inference. For example, when one variable is observed to equal 1 and a second variable is unobserved, MPE inference can find that the most probable explanation of the unobserved variable is 0.
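A minimal sketch of this evaluation and MPE query over a toy SPT with two binary variables. The structure and weights are hypothetical (the paper's SPTs are derived from the DDTs); MPE is done here by enumerating the unobserved variable, which is equivalent to the max-sum trace-back on a tree this small.

```python
def spt_value(x1, x2, w=(0.7, 0.3)):
    """Evaluate the toy SPT bottom-up; terminals are 0/1 indicators.
    Root is a sum node over two product nodes:
      P(x1, x2) = w[0] * [x1=1][x2=1] + w[1] * [x1=0][x2=0]
    """
    ind = lambda b: 1.0 if b else 0.0
    prod1 = ind(x1) * ind(x2)            # product node: both variables 1
    prod2 = ind(not x1) * ind(not x2)    # product node: both variables 0
    return w[0] * prod1 + w[1] * prod2   # weighted sum node (root)

def mpe_x2_given_x1(x1, w=(0.7, 0.3)):
    """MPE inference for x2 when x1 is observed and x2 is unobserved."""
    candidates = {x2: spt_value(x1, x2, w) for x2 in (False, True)}
    return max(candidates, key=candidates.get)

print(mpe_x2_given_x1(True))   # most probable x2 given x1 = 1 -> True
```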
Algorithm
Transform the DDT into a sum-product tree (SPT) to exploit interdependencies among regions. This makes the model able to handle occlusions even when the training data contains no occlusions.
[Figure: SPT structure, showing separators, clusters, sum nodes, and product nodes]
Algorithm
Organize all the SPTs into a sum-product network (SPN) to learn correlations among different attributes (learned by EM). The SPN uses three different types of sum weights.
Inference
1. Run the region detector with a sliding window.
2. Locate a region.
3. Apply the region classifier.
Learning
1) Train a DDT for each attribute; 2) transform each DDT into an SPT; 3) build the SPN.
EM: the E-step infers the unobserved data; the M-step renormalizes the parameters; edges with zero weight are pruned.
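The M-step bookkeeping above can be sketched as follows. The dict-based weight representation and node names are illustrative only, not the paper's data structures.

```python
def m_step(sum_weights, eps=0.0):
    """Renormalize each sum node's weights and prune zero-weight edges.

    sum_weights: {sum_node: {child: unnormalized weight}}
    """
    updated = {}
    for node, children in sum_weights.items():
        # prune edges whose weight fell to (or below) eps
        kept = {c: w for c, w in children.items() if w > eps}
        total = sum(kept.values())
        # renormalize the surviving edges so each sum node's weights sum to 1
        updated[node] = {c: w / total for c, w in kept.items()}
    return updated

weights = {"s1": {"p1": 2.0, "p2": 0.0, "p3": 6.0}}
print(m_step(weights))  # {'s1': {'p1': 0.25, 'p3': 0.75}}
```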
Outline
- Facial Attributes Analysis
- Animated Pose Templates for Modeling and Detecting Human Actions
- Unsupervised Structure Learning of Stochastic And-Or Grammars
Formulation
- Short-term: action snippets (2–5 frames), modeled by moving pose templates
- Long-term: transitions between the pose templates (APTs)
- Contextual objects
Short-term action snippets
A moving pose template for each pose =
shape template (HOG) + motion template (HOF), modeling human geometry, appearance, and motion jointly.
Moving Pose Template (MPT)
An MPT consists of appearance (HOG), deformation, and motion (variation of HOF).
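As a rough illustration, an MPT's match to a frame can be thought of as a weighted combination of a shape (HOG) term and a motion (HOF) term. The feature vectors, weights, and the plain dot-product form below are all made-up simplifications, not the paper's scoring function.

```python
def mpt_score(hog, hof, w_shape, w_motion):
    """Toy MPT score: weighted shape (HOG) term + weighted motion (HOF) term."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(w_shape, hog) + dot(w_motion, hof)

# toy 2-D features and weights
print(mpt_score(hog=[0.2, 0.8], hof=[0.5, 0.1],
                w_shape=[1.0, 0.5], w_motion=[0.3, 0.7]))  # ~0.82
```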
Long-term actions
An animated pose template (APT) is a sequence of moving pose templates.
Animated Pose Templates: HMM model
- Transition probabilities for the MPT labels
- Tracking probabilities for the movement of parts between frames
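Under the HMM view above, the best MPT label sequence for a clip can be decoded with Viterbi. The two labels, the transition matrix, and the per-frame scores below are made-up numbers for illustration; only the transition-plus-emission structure reflects the model.

```python
import math

labels = ["stand", "reach"]
log_trans = {("stand", "stand"): math.log(0.8), ("stand", "reach"): math.log(0.2),
             ("reach", "stand"): math.log(0.3), ("reach", "reach"): math.log(0.7)}
# per-frame log-likelihood of each MPT matching the frame (3 toy frames)
log_emit = [{"stand": math.log(0.9), "reach": math.log(0.1)},
            {"stand": math.log(0.4), "reach": math.log(0.6)},
            {"stand": math.log(0.2), "reach": math.log(0.8)}]

def viterbi(log_emit, log_trans, labels):
    """Standard Viterbi decode: best MPT label sequence over the frames."""
    V = [{l: log_emit[0][l] for l in labels}]
    back = []
    for t in range(1, len(log_emit)):
        V.append({})
        back.append({})
        for l in labels:
            prev, score = max(
                ((p, V[t - 1][p] + log_trans[(p, l)]) for p in labels),
                key=lambda x: x[1])
            V[t][l] = score + log_emit[t][l]
            back[t - 1][l] = prev
    # trace back the winning label at each frame
    best = max(V[-1], key=V[-1].get)
    path = [best]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

print(viterbi(log_emit, log_trans, labels))  # ['stand', 'reach', 'reach']
```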
Animated Pose Templates (APT)
Animated Pose Templates with Contextual Object
Contextual objects
- Weak objects (e.g. cigarette, ground): too small or too diverse; localized via body parts.
- Strong objects (e.g. cup): distinguishable; detected using HOG.
These objects are treated in the same way as the body parts.
Inference
Learning
Semi-supervised structural SVM
- Annotated key frames are clustered into pose templates by EM.
- For the unannotated frames and the model parameters:
  - learn the model from the labeled data with latent SVM (LSVM)
  - accept high-scoring frames as newly labeled frames
Outline
- Facial Attributes Analysis
- Animated Pose Templates for Modeling and Detecting Human Actions
- Unsupervised Structure Learning of Stochastic And-Or Grammars
Unsupervised Structure Learning: Problem Definition
Given training data X, find the grammar G that maximizes the posterior probability P(G | X).
Algorithm Framework
Iteratively introduce new intermediate nonterminal nodes that increase the posterior probability of the grammar.
And-Or Fragments
- And-fragments: fail when training data is scarce.
- Or-fragments: decrease the posterior probability.
- And-Or fragments: And-rules and Or-rules are learned in a more unified manner.
Likelihood Gain
Likelihood gain = likelihood changes × context matrix changes
Prior gain = increase in grammar size + reduction in the number of configurations
Posterior gain = likelihood gain × prior gain
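The multiplicative decomposition above suggests a simple acceptance test for a candidate And-Or fragment: add it only when its posterior gain exceeds 1, i.e. when the grammar's posterior probability would increase. This sketch and its gain values are illustrative; in practice the gains are computed from the grammar and the training data.

```python
def accept_fragment(likelihood_gain, prior_gain):
    """Accept a candidate And-Or fragment iff it raises the posterior."""
    posterior_gain = likelihood_gain * prior_gain
    return posterior_gain > 1.0

print(accept_fragment(likelihood_gain=4.0, prior_gain=0.5))  # True  (gain 2.0)
print(accept_fragment(likelihood_gain=1.2, prior_gain=0.5))  # False (gain 0.6)
```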