Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations

ZUO ZHEN, 27 SEP 2011

Outline

• Introduction
• Related work
• Methods
• Experiments
• Conclusion and future work

Introduction

The proposed poselet classifiers are directly trained to handle the visual variation associated with a common underlying semantics.

Introduction

• What is a poselet?
A poselet describes a particular part of the human pose under a given viewpoint. It is defined by a set of examples that are close in 3D configuration space.

• Two criteria for “good” poselets
1. It should be easy to find the poselet given the input image (examples tightly clustered in appearance space).
2. It should be easy to localize the 3D configuration of the person conditioned on the detection of a poselet (examples tightly clustered in configuration space).

• Contributions
1. A new notion of part, the “poselet”, and an algorithm for selecting good poselets.
2. A novel dataset, H3D (Humans in 3D), annotated with 3D configuration information.

Related work

1. Work in the pictorial structure tradition
Disadvantage: the parts are the most natural ones for constructing kinematic simulations of a moving person, but they may not correspond to the most salient features for visual recognition.

2. Work in the appearance-based window classification tradition
Disadvantage: not suitable for pose extraction or for localizing anatomical body parts or joints.

3. Hybrid approaches, in which stages of one type are followed by a stage of the other type
Disadvantage: the parts themselves are not jointly optimized with respect to combined appearance and configuration space criteria.

Method

The paper uses keypoints to annotate the joints, eyes, nose, etc. of each person so that correspondences can be found at training time.

[Figure: annotated keypoints on a person; visible labels include Left Hip and Left Shoulder]

Method (H3D dataset)

• H3D dataset: 2000 human annotations
Images come from Flickr under a Creative Commons Attribution license.
The dataset provides annotations for 15 types of regions of a person and 19 types of keypoints.

Method (H3D dataset)

• Why 3D and not 2D?

| Criterion | 3D | 2D |
| --- | --- | --- |
| Share of annotations that contribute to the statistics | Every annotation | Only frontal-view annotations |
| Sensitivity to foreshortening | Not strongly affected | Strongly affected |
| Allows decomposing the camera viewpoint | Yes | N/A |
| Allows querying for the appearance of poselets | Yes | N/A |

Method (H3D dataset)

Left: H3D can generate conditional region probability masks. Right: H3D can generate scatter plots of the 2D screen locations of the right elbow and left ankle given the locations of both shoulders.

Method (Finding Candidates)

Define the (asymmetric) distance in configuration space from example s to example r as:

$$d_s(r) = \sum_i w_s(i)\,\lVert x_s(i) - x_r(i) \rVert_2^2\,\bigl(1 + h_{s,r}(i)\bigr)$$

where $x_s(i) = [x, y, z]$ are the normalized 3D coordinates of the i-th keypoint of example s. The weight term $w_s(i)$ is a Gaussian with mean at the center of the patch. The term $h_{s,r}(i)$ is a penalty based on the visibility mismatch of keypoint i in the two examples.
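The distance described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the distance is a weighted sum of squared 3D keypoint offsets, each inflated by a factor (1 + h) when the keypoint's visibility differs between the two examples. The names `gaussian_weight`, `config_distance`, `sigma`, and `mismatch_penalty`, as well as the dictionary layout, are hypothetical.

```python
import math

def gaussian_weight(point, sigma=1.0):
    """Weight w_s(i): Gaussian in the keypoint's distance from the patch center.

    Assumes coordinates are normalized so that the patch center is the origin.
    """
    sq_norm = sum(c * c for c in point)
    return math.exp(-sq_norm / (2.0 * sigma ** 2))

def config_distance(seed, other, sigma=1.0, mismatch_penalty=1.0):
    """Asymmetric distance d_s(r) from seed example s to example r.

    Each example maps a keypoint name to ((x, y, z), visible).
    """
    total = 0.0
    for name, (xs, vis_s) in seed.items():
        if name not in other:
            continue  # assumption: keypoints absent from r are skipped
        xr, vis_r = other[name]
        sq_dist = sum((a - b) ** 2 for a, b in zip(xs, xr))
        # h is zero when visibility agrees, a positive penalty otherwise
        h = 0.0 if vis_s == vis_r else mismatch_penalty
        total += gaussian_weight(xs, sigma) * sq_dist * (1.0 + h)
    return total
```

The distance is asymmetric because the weights are computed from the seed example only: keypoints far from the seed patch center contribute little, regardless of where they lie in the other example.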

Method (Generate Poselet Candidates)

Example query regions (left column) and the corresponding closest matches in configuration space generated by H3D.

Method (Training Poselet classifiers)

1. Start from a given seed patch.
2. Find the closest patches (search by running a scanning window over all positions and scales of all annotations).
3. Sort them by residual error d_s(r).
4. Threshold them.
5. Select a small set of poselets that are individually effective and complementary.
6. Use them as positive training examples to train a linear SVM with HOG features.
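Steps 3 and 4 of the training procedure (sorting candidate patches by residual error and thresholding them) can be sketched as follows. This is an illustrative sketch only: the function name `select_training_examples`, the `max_error` threshold, and the optional `max_count` cap are hypothetical, and the residual errors are assumed to have been computed already by the scanning-window search.

```python
def select_training_examples(residuals, max_error, max_count=None):
    """Rank candidate patches by residual error and keep those under a threshold.

    residuals: dict mapping a candidate patch id to its residual error.
    max_error: hypothetical threshold; candidates above it are discarded.
    max_count: hypothetical optional cap on the number of kept candidates.
    """
    # Sort candidates from best (smallest error) to worst.
    ranked = sorted(residuals.items(), key=lambda item: item[1])
    # Keep only the candidates whose error does not exceed the threshold.
    kept = [patch_id for patch_id, err in ranked if err <= max_error]
    return kept if max_count is None else kept[:max_count]
```

The surviving patches would then serve as the positive examples for the linear SVM; feature extraction and SVM training are left out of the sketch.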

Method (For Detection & Localization)

The probability of detecting the object O at position x is:

$$P(O \mid x) \propto \sum_i w_i\, a_i(x)$$

where $a_i(x)$ is the score that poselet classifier i assigns to location x and $w_i$ is the weight of that poselet. The authors use the Max Margin Hough Transform to learn the weights.
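A small sketch of how the poselet scores could be combined at detection time, assuming the evidence for the object at a location is proportional to the weighted sum of the poselet classifier scores there. The weights are taken as given (learning them with the Max Margin Hough Transform is not shown), and the names `object_score` and `best_location` are hypothetical.

```python
def object_score(activations, weights):
    """Unnormalized evidence for object O at one location: sum_i w_i * a_i(x)."""
    if len(activations) != len(weights):
        raise ValueError("one weight per poselet is required")
    return sum(w * a for w, a in zip(weights, activations))

def best_location(activation_maps, weights):
    """Return the location with the highest weighted poselet score.

    activation_maps: dict mapping a location to the list of scores a_i(x),
    one per poselet, in the same order as the weights.
    """
    return max(activation_maps,
               key=lambda x: object_score(activation_maps[x], weights))
```

Because the score is only proportional to the probability, it is suitable for ranking locations against each other but is not itself a calibrated probability.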

Experiments

(1) Detecting Human Torsos

ROC curves comparing the proposed torso detector with other published detectors on the H3D test set.

Experiments

• Examples of torso detections using poselets

Experiments

(2) Detecting People on PASCAL VOC 2007

The poselet detector outperforms the part-based deformable detector on H3D but achieves only comparable performance on VOC 2007.

Experiments

(3) Detecting Keypoints

Detection rate of some keypoints conditioned on true positive torso detection.

Conclusion & Future Work

• Conclusion
The authors propose a two-layer classification/regression model for detecting people and localizing body components. The 3D annotation guides the search for good parts.

• Future work
Use H3D more widely.

Birdlets: Subordinate Categorization Using Volumetric Primitives and Pose-Normalized Appearance

Outline

• Introduction
• Related work
• Methods
• Experiments
• Conclusion

Introduction

• Application background
Current research focuses on two extremes: recognizing individuals and recognizing basic-level categories.
Little research addresses subordinate categorization.

• What is subordinate categorization?
Distinguishing categories by the differing properties of their parts.

Introduction

Overview of the Proposed approach

Introduction

• Contributions
1. A framework for detecting volumetric part models
2. A pose-normalized appearance model for comparing part appearance
3. A classification model for aggregating information about part properties

Related work

• Image features
Disadvantages: view-dependent, sensitive to pose variation

• Part model
Disadvantages: high intra-class variability, significant articulation

• Hierarchy model
Disadvantages: subordinate categories have both subtle and drastic appearance variation

• Attribute model
Disadvantages: insufficient to model subtle differences between parts

Method

• Why birds?
1. The largest existing subordinate-level dataset (CUB-200) is a bird dataset.
2. Birds conform to the definition of a subordinate level (the categories share a common structure and parts, with many subtle part distinctions).
3. Birds involve highly variable appearances and articulations, which makes the task challenging.

Method (PNAD)

Pose-normalized appearance descriptor (PNAD)
1. Map points on a unit sphere onto the ellipsoid’s surface for patch sampling.
2. Project the patches on the ellipsoid surface onto the original image plane.
3. Extend the projected patches to extract SIFT descriptors.
4. Concatenate the location and appearance information to form the PNAD descriptor.
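The geometric part of the PNAD steps (mapping unit-sphere points onto an ellipsoid and projecting them to the image) can be sketched as below. This is a sketch under stated assumptions, not the paper's implementation: the ellipsoid is parameterized by hypothetical `radii`, a 3x3 `rotation` matrix, and a `center`; the camera is assumed orthographic; and the SIFT extraction of step 3 is left out, with `concat_descriptor` only illustrating the concatenation of step 4.

```python
import math

def sphere_point(theta, phi):
    """Point on the unit sphere from polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def to_ellipsoid(p, radii, rotation, center):
    """Scale a unit-sphere point by the radii, rotate it, then translate it."""
    scaled = [r * c for r, c in zip(radii, p)]
    rotated = [sum(rotation[i][j] * scaled[j] for j in range(3))
               for i in range(3)]
    return tuple(rotated[i] + center[i] for i in range(3))

def project_orthographic(point3d):
    """Drop the depth coordinate (assumption: orthographic camera)."""
    return (point3d[0], point3d[1])

def concat_descriptor(surface_location, appearance):
    """Join the surface location and an appearance vector into one descriptor."""
    return list(surface_location) + list(appearance)
```

Sampling on the sphere and then mapping to the ellipsoid means that the same (theta, phi) pair refers to the same relative surface location on every detected part, which is what makes descriptors comparable across poses.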

Method (PNAD)

Method (Birdlet)

• Volumetric primitive templates
1. Two parts (head and body)
2. Two ellipsoids (parameters: center location, 3D orientation, scale)
3. Alignment (assisted by visible point features: beak tips, eyes, wing tips, feet, and tails)

Method (Training & Testing)

1. Obtain selection windows for detecting objects and parts in the test image (both positive and negative examples for the SVM classifier).

2. Obtain birdlets for integrated classification.

Method (Integrated Classification)

• Stacked Evidence Trees model

The Stacked Evidence Tree takes a test feature, finds a set of training features that are similar in both appearance and surface location, and returns the class label distribution across this similar set.
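The idea of returning a class label distribution over similar training features can be illustrated with a brute-force nearest-neighbor lookup. Note the swap: this stand-in replaces the tree-based search with an exhaustive one and is not the Stacked Evidence Trees model itself. The function name `label_distribution`, the parameters `k` and `location_weight`, and the tuple layout are all hypothetical.

```python
from collections import Counter

def label_distribution(test_feature, training_set, k=3, location_weight=1.0):
    """Return class-label frequencies among the k most similar training features.

    test_feature: (appearance_vector, surface_location)
    training_set: list of (appearance_vector, surface_location, label)
    """
    app_t, loc_t = test_feature

    def distance(item):
        app, loc, _ = item
        # Similarity is judged jointly on appearance and surface location.
        d_app = sum((a - b) ** 2 for a, b in zip(app, app_t))
        d_loc = sum((a - b) ** 2 for a, b in zip(loc, loc_t))
        return d_app + location_weight * d_loc

    nearest = sorted(training_set, key=distance)[:k]
    counts = Counter(label for _, _, label in nearest)
    return {label: n / len(nearest) for label, n in counts.items()}
```

Per-feature distributions like this one would then be aggregated over all features of a detection to produce a class prediction; that aggregation step is not shown.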

Experiments

• Classification Confusion Matrices

(a) the PHOW/SVM Baseline (37.12% MAP), (b) the PNAD-RF performance on the top 20% of detections (40.25% MAP), and (c) the PNAD-RF performance on the ground truth part locations (66.58% MAP).

Experiments

• Example Volumetric Primitive Detections

Top two images: the bird is detected and localized with reasonable accuracy.
Bottom two images: false positive detections.

Experiments

Classification of Volumetric Detections. For the k top ranked detections, this plots the corresponding PNAD-RF classification performance (using mean-average precision)
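For readers unfamiliar with the metric, mean average precision can be computed as in the sketch below. The slides do not state which variant of average precision was used, so this follows one common definition (precision averaged over the ranks of the correct items) as an assumption; the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list of booleans (True = correct)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_class_rankings):
    """Mean of the per-class average precisions."""
    if not per_class_rankings:
        return 0.0
    total = sum(average_precision(r) for r in per_class_rankings)
    return total / len(per_class_rankings)
```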

Conclusion

• Conclusion
This paper presented an approach for subordinate categorization using a pose-normalized appearance representation founded upon a volumetric part model.