Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching

Zhe Lin, Member, IEEE, and Larry S. Davis, Fellow, IEEE

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2010

Overview

• Introduction

• Previous Work

• Proposed Approach

– Hierarchical Part-Template Matching

– Pose-Adaptive Descriptors

– Combining With Calibration and Background Subtraction

• Experiment Result

• Conclusion

Introduction

• Robust human tracking and identification depend heavily on reliable human detection and segmentation.

• Human detection remains challenging due to variations in body posture, illumination, occlusion, and viewpoint.

• Goal: Develop a robust and efficient approach to detect and segment humans.

• Method: Shape-based, part-template matching

Previous Work

• Shape feature extraction schemes

– Model human shapes globally [1],[2],[3]

– Model shapes using sparse local features [9],[10],[11]

• Learning perspective

– Generative approach: tree-based data structure [6],[7],[8]

– Discriminative approach: using SVMs as the test classifiers [3]

• Surveillance scenarios

– Motion blob information [35],[36]

Proposed Approach

• A hierarchical part-template matching approach combined with discriminative learning.

Hierarchical Part-Template Matching

• Generating the part-template tree model

– Synthesizing global shape models

– Generating parts by decomposition

– Constructing an initial tree model using parts

• Learning the part-template tree

• Hierarchical part-template matching

Synthesizing Global Shape Models

• The articulation of the human body is analyzed in terms of six regions

– Head, torso, pair of upper legs, pair of lower legs

– The parameters of these regions are quantized into {3,2,3,3,3,3} levels
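
As a quick check on the size of the synthesized model set, the sketch below enumerates every combination of the quantized parameters. It assumes that each combination yields one global shape model, which the slide does not state explicitly; the names `levels`, `configs`, and `num_models` are illustrative only.

```python
from itertools import product
from math import prod

# Quantization levels for the six regions, as listed on the slide.
levels = [3, 2, 3, 3, 3, 3]

# Assumption: one global shape model per combination of quantized values.
# Each config is a tuple of six parameter indices.
configs = list(product(*(range(n) for n in levels)))

# Number of combinations is the product of the level counts.
num_models = prod(levels)
print(num_models)  # 486
```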

Generating Parts by Decomposition

• Binarize (a) to obtain (b), then extract the boundaries of the silhouettes to get (c).

• Silhouettes are decomposed into three parts (head-torso, upper legs, and lower legs).

• The parameters of the silhouettes are denoted by θj and consist of an index and a location.

Constructing an Initial Tree Model Using Parts

• A part-template tree is constructed by placing the decomposed part regions or fragments into a tree.

• The tree has four layers, L0 to L3, denoting the root, head-torso, upper legs, and lower legs, respectively.

• The tree consists of 186 part-templates (6 head-torso (ht) models, 18 upper-leg (ul) models, and 162 lower-leg (ll) models).

• A much larger set of part-templates improves performance only slightly.

• A fast hierarchical shape matching scheme is applied.
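
The layer sizes above can be reproduced with a small tree-building sketch. It assumes a uniform fan-out of 6, 3, and 9 children per node at successive layers, which is simply the fan-out that yields 6, 18, and 162 nodes; the slides do not describe the actual branching structure, and `Node`, `build_tree`, and `count_by_layer` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    layer: int   # 0 = root, 1 = head-torso, 2 = upper legs, 3 = lower legs
    index: int   # position among its siblings
    children: list = field(default_factory=list)

def build_tree(branching=(6, 3, 9)):
    """Build a tree with branching[k] children under every node of layer k."""
    root = Node(layer=0, index=0)
    frontier = [root]
    for layer, fanout in enumerate(branching, start=1):
        next_frontier = []
        for parent in frontier:
            for i in range(fanout):
                child = Node(layer=layer, index=i)
                parent.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root

def count_by_layer(node, counts=None):
    """Count nodes per layer with a recursive traversal."""
    counts = {} if counts is None else counts
    counts[node.layer] = counts.get(node.layer, 0) + 1
    for child in node.children:
        count_by_layer(child, counts)
    return counts

counts = count_by_layer(build_tree())
# Part-templates are all nodes except the root.
num_templates = sum(n for layer, n in counts.items() if layer > 0)
```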

Learning the Part-Template Tree

• The tree doesn’t contain any prior statistics from real human silhouettes.

• The learning is performed by matching the tree to a set of real human silhouette images.

• The goal is to explicitly estimate branching probability distributions (conditional probability distributions).

Learning the Part-Template Tree

• Learning method:

– The training silhouette is passed through the tree from the root to estimate the matching score and find the optimal path.

– Based on the set of paths, a branching probability distribution is estimated for each node.

– Each node contains a binary image of the part-template, its sample point coordinates, and a branching probability.
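
The learning step can be sketched as a counting procedure over the optimal paths. The slides do not say how the distribution is estimated, so this example assumes a plain relative-frequency estimate; the node names and the function `estimate_branching` are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_branching(paths):
    """Estimate P(child | parent) from root-to-leaf paths.

    paths: one sequence of node ids per training silhouette, i.e. the
    optimal path found when that silhouette is matched against the tree.
    Assumption: probabilities are relative frequencies of observed branches.
    """
    counts = defaultdict(Counter)
    for path in paths:
        for parent, child in zip(path, path[1:]):
            counts[parent][child] += 1
    return {
        parent: {child: n / sum(ctr.values()) for child, n in ctr.items()}
        for parent, ctr in counts.items()
    }

# Made-up optimal paths for four training silhouettes.
paths = [
    ("root", "ht0", "ul1", "ll4"),
    ("root", "ht0", "ul1", "ll5"),
    ("root", "ht2", "ul7", "ll9"),
    ("root", "ht0", "ul0", "ll2"),
]
probs = estimate_branching(paths)
```

A child that never appears on an optimal path receives no entry here; a practical system would likely need some smoothing, which this sketch omits.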

Hierarchical Part-Template Matching

• The matching model is similar to the one used for tree learning.

• The overall matching score for a detection window is modeled simply as the sum of the scores of all nodes along the path.

• The score of a node is the product of its part-template matching score and the probability of the node.

• The matching method is similar to Chamfer matching [6].

– The matching score of a sample point on the contour is measured by edge-orientation matching to find the optimal human pose.
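
The scoring rule above (sum over the path of part-template score times node probability) can be illustrated as follows. This is a sketch with made-up scores and an exhaustive search over all paths; the fast hierarchical matching scheme of the paper is not described on the slides and is not reproduced here. All names are hypothetical.

```python
def path_score(path, template_score, node_prob):
    """Sum of (part-template score x node probability) over the nodes of a path."""
    return sum(template_score[n] * node_prob[n] for n in path)

def best_path(children, root, template_score, node_prob):
    """Exhaustively search all root-to-leaf paths; the root carries no template."""
    best_score, best = float("-inf"), None
    stack = [(root, [])]
    while stack:
        node, prefix = stack.pop()
        path = prefix if node == root else prefix + [node]
        kids = children.get(node, [])
        if not kids:  # leaf reached: evaluate the complete path
            score = path_score(path, template_score, node_prob)
            if score > best_score:
                best_score, best = score, path
        for kid in kids:
            stack.append((kid, path))
    return best_score, best

# Toy tree with made-up matching scores and branching probabilities.
children = {"root": ["ht0", "ht1"], "ht0": ["ul0", "ul1"], "ht1": ["ul2"],
            "ul0": ["ll0"], "ul1": ["ll1"], "ul2": ["ll2"]}
template_score = {"ht0": 0.8, "ht1": 0.9, "ul0": 0.5, "ul1": 0.7,
                  "ul2": 0.6, "ll0": 0.4, "ll1": 0.6, "ll2": 0.9}
node_prob = {"ht0": 0.6, "ht1": 0.4, "ul0": 0.5, "ul1": 0.5,
             "ul2": 1.0, "ll0": 1.0, "ll1": 1.0, "ll2": 1.0}

score, pose = best_path(children, "root", template_score, node_prob)
```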

[6] D.M. Gavrila and V. Philomin, “Real-Time Object Detection for SMART Vehicles,” Proc. IEEE

Pose-Adaptive Descriptors

• Introduce a pose-adaptive feature computation method for detecting humans in images using an SVM.

• Candidate detection windows are obtained by a method similar to that of the HOG descriptor [3].

• Given a candidate detection window, hierarchical part-template matching is performed to estimate the optimal pose.

• After the pose is estimated, block features closest to each pose contour point are collected.
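
The last step, collecting the block features closest to each pose contour point, might look like the sketch below. It assumes that "closest" means Euclidean distance to the block center and that the collected features are concatenated in contour order; neither detail is given on the slide, and every name here is hypothetical.

```python
def pose_adaptive_descriptor(contour_points, block_centers, block_features):
    """Concatenate, for each contour point, the features of the nearest block.

    contour_points: list of (x, y) points on the estimated pose contour.
    block_centers:  list of (x, y) block centers in the detection window.
    block_features: list of feature vectors, one per block.
    """
    descriptor = []
    for px, py in contour_points:
        # Index of the block whose center is nearest to this contour point.
        nearest = min(
            range(len(block_centers)),
            key=lambda i: (block_centers[i][0] - px) ** 2
                          + (block_centers[i][1] - py) ** 2,
        )
        descriptor.extend(block_features[nearest])
    return descriptor

# Toy data: three blocks with two-dimensional features, two contour points.
centers = [(0, 0), (8, 0), (0, 8)]
features = [[1, 2], [3, 4], [5, 6]]
descriptor = pose_adaptive_descriptor([(7, 1), (1, 7)], centers, features)
```

The resulting vector would then be passed to the SVM classifier in place of a fixed-grid descriptor.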

[3] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf.

Low-Level Features

• Similar to [3]

• Given an image, calculate gradient magnitudes |G| and edge orientations O

• Quantize the image into 8x8 nonoverlapping cells, each represented by a histogram of edge orientations.
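
A minimal pure-Python sketch of these low-level features follows. Several details are assumptions rather than statements from the slides: unnormalized central differences for the gradient, the gradient direction used as the orientation, 9 unsigned orientation bins of width π/9 (the bin width that appears in the matching-score definition on the slides), and magnitude-weighted voting as in HOG-style features. The function name `cell_histograms` is hypothetical.

```python
import math

def cell_histograms(image, cell=8, n_bins=9):
    """Per-cell orientation histograms for a 2D list of gray values."""
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * n_bins for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * cell):
        for x in range(cols * cell):
            # Unnormalized central differences, one-sided at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)               # gradient magnitude |G|
            ori = math.atan2(gy, gx) % math.pi     # unsigned orientation O
            # Bin width pi / n_bins; clamp guards against rounding up to n_bins.
            b = min(int(ori / (math.pi / n_bins)), n_bins - 1)
            hist[y // cell][x // cell][b] += mag   # magnitude-weighted vote
    return hist

# Horizontal intensity ramp: every gradient points along +x (orientation 0).
image = [[float(x) for x in range(24)] for _ in range(16)]
hist = cell_histograms(image)
```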

Pose Inference on The Low-Level Features

• An optimal tree path is estimated based on the matching score.

• Within the matching score, the part-template score is measured as an average of gradient magnitudes.

• The matching score is given by (1), where B(t) = [O(t)/(π/9)] and h is the orientation histogram.

• The average score of the part-template is given by (2).

Representation Using Pose-Adaptive Descriptors

• The global shape models are represented as a set of boundary points with corresponding edge orientations.

Scene-to-Camera Calibration

• To obtain a mapping between head points and foot points in the image, estimate the homography between the head plane and the foot plane in the image.

• The head point is then obtained as ph = f(pf), where pf is an arbitrary foot point.
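
The mapping ph = f(pf) can be sketched as a 3x3 homography applied in homogeneous coordinates. The matrix values below are made up for illustration (a real homography would be estimated from calibration data and generally has a non-trivial last row), and `apply_homography` is a hypothetical helper.

```python
def apply_homography(H, point):
    """Map an image point through a 3x3 homography H (nested lists)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (u / w, v / w)

# Hypothetical foot-to-head homography; the values are illustrative only.
H = [[1.0, 0.0, 0.0],
     [0.0, 0.5, 10.0],
     [0.0, 0.0, 1.0]]

p_foot = (120.0, 200.0)
p_head = apply_homography(H, p_foot)
print(p_head)  # (120.0, 110.0)
```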

Combining With Background Subtraction

• Find foot regions Rfoot = {x | ϒx ≥ ξ}

• Through part-template matching, find regions that may be legs.

• Given the estimated human vertical axis vx and an adaptive rectangular window W(x,(w0,h0)), obtain the human detection.

• Obtain the human segmentation.
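
The thresholding rule Rfoot = {x | ϒx ≥ ξ} can be sketched on a binary foreground mask. The example assumes that ϒx is the proportion of foreground pixels in a window centered at x, which the slides do not define, and it uses a fixed window size clipped at the image border instead of the adaptive window W(x,(w0,h0)). The function name `foot_regions` is hypothetical.

```python
def foot_regions(mask, win_w, win_h, xi):
    """Return the set of (row, col) pixels whose window foreground ratio >= xi.

    mask: 2D list of 0/1 foreground labels from background subtraction.
    """
    h, w = len(mask), len(mask[0])
    # Integral image with an extra zero row and column for O(1) window sums.
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            integ[r + 1][c + 1] = (mask[r][c] + integ[r][c + 1]
                                   + integ[r + 1][c] - integ[r][c])
    region = set()
    for r in range(h):
        for c in range(w):
            # Window bounds, clipped to the image.
            r0, r1 = max(r - win_h // 2, 0), min(r + win_h // 2 + 1, h)
            c0, c1 = max(c - win_w // 2, 0), min(c + win_w // 2 + 1, w)
            fg = integ[r1][c1] - integ[r0][c1] - integ[r1][c0] + integ[r0][c0]
            ratio = fg / ((r1 - r0) * (c1 - c0))   # assumed meaning of the ratio
            if ratio >= xi:
                region.add((r, c))
    return region

# 5x5 mask with a 3x3 foreground block in the middle.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
        for r in range(5)]
strict = foot_regions(mask, win_w=3, win_h=3, xi=1.0)
loose = foot_regions(mask, win_w=3, win_h=3, xi=0.6)
```

Lowering ξ admits pixels whose windows are only partly foreground, which is one way such a rule could tolerate imperfect background subtraction.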

Combining With Calibration and Background Subtraction

Experiment Result

• Results of the human detector are presented on two public pedestrian data sets (INRIA and MIT-CBCL).

• Results of the multiple occluded human detector are presented on three crowded image and video data sets.

• The approach is compared with other approaches using DET curves.
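
For readers unfamiliar with DET curves, the sketch below computes the points of one from classifier scores. It assumes the common pedestrian-detection convention of plotting miss rate against false positives per window; the scores and thresholds are made up and do not come from the paper, and `det_points` is a hypothetical name.

```python
def det_points(pos_scores, neg_scores, thresholds):
    """Return (false positives per window, miss rate) for each threshold.

    pos_scores: classifier scores of windows that contain a human.
    neg_scores: classifier scores of windows that do not.
    A window is accepted as a detection when its score >= threshold.
    """
    points = []
    for t in thresholds:
        misses = sum(s < t for s in pos_scores)
        false_pos = sum(s >= t for s in neg_scores)
        points.append((false_pos / len(neg_scores), misses / len(pos_scores)))
    return points

# Made-up scores for four positive and five negative windows.
pos = [0.9, 0.8, 0.4, 0.7]
neg = [0.1, 0.3, 0.6, 0.2, 0.05]
curve = det_points(pos, neg, [0.25, 0.5, 0.75])
```

Raising the threshold trades false positives for misses; a detector whose curve lies lower at the same false-positive rate performs better.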

Experiment of Detection Result


• Better performance than HOG-SVM.

• Not only detects humans but also segments human poses.

• Can be further improved, since the approach can be extended to cover more poses or articulations.

• Successfully detects difficult poses that the HOG-based detector misses.

Experiment of Segmentation Result

• Using the pose model and the probabilistic hierarchical part-template matching algorithm gives very accurate segmentation on the MIT-CBCL and INRIA data sets.

Experiment Without Subtraction

Experiment With Subtraction

• Data sets

– Caviar Benchmark data set

– Munich Airport data set collected by Siemens Corporate Research

• Good results can be obtained even with poor and inaccurate background subtraction.

Conclusion

• A hierarchical part-template matching approach is employed to match human shapes with images, detecting and segmenting humans simultaneously.

• Many misdetections are due to pose estimation failures.

• Future work

– Investigating the addition of color and texture statistics to the local contextual descriptor to improve the detection and segmentation performance.
