LPP-HOG: A New Local Image Descriptor for Fast Human Detection
Andy {[email protected]}
Qing Jun Wang and Ru Bo Zhang
IEEE International Symposium on Knowledge Acquisition and Modeling Workshop, 2008, pp. 640-643, 21-22 Dec. 2008, Wuhan
Problem setting
Goal: design an algorithm for human detection that can run in real time.
Proposed solution:
- Use Histograms of Oriented Gradients (HOG) as the feature vector.
- Decrease the feature-space dimensionality using Locality Preserving Projection (LPP).
- Train the classifier with a Support Vector Machine (SVM) in the reduced feature space.
HOG: general scheme
Typical person detection scheme using SVM
In practice, the effect is very small (about 1%), while some computational time is required.*
* Navneet Dalal and Bill Triggs. "Histograms of Oriented Gradients for Human Detection". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 2005, Vol. II, pp. 886-893.
Computing gradients
Mask type, operator, and miss rate at $10^{-4}$ FPPW:
- 1D centered, $[-1, 0, 1]$: 11%
- 1D uncentered, $[-1, 1]$: 12.5%
- 1D cubic-corrected, $[1, -8, 0, 8, -1]$: 12%
- 2x2 diagonal, $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$: 12.5%
- 3x3 Sobel, $\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}$: 14%

The simple 1D centered mask performs best.
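As an illustration, a minimal sketch (assuming NumPy/SciPy and a grayscale float image) of computing per-pixel gradient magnitude and orientation with the best-performing centered $[-1, 0, 1]$ mask:

```python
import numpy as np
from scipy.ndimage import correlate1d

def gradients(image):
    """Per-pixel gradient magnitude and orientation (degrees, [0, 180))
    using the centered 1-D [-1, 0, 1] mask from the table above."""
    gx = correlate1d(image, [-1.0, 0.0, 1.0], axis=1)  # horizontal derivative
    gy = correlate1d(image, [-1.0, 0.0, 1.0], axis=0)  # vertical derivative
    magnitude = np.hypot(gx, gy)
    # unsigned orientation, folded into [0, 180)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    return magnitude, orientation
```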
Accumulate weighted votes over spatial cells
- How many bins should the histogram have?
- Should we use oriented or non-oriented gradients?
- How should the weights be selected?
- Should we use overlapping blocks or not? If yes, how big should the overlap be?
- What block size should we use?
Contrast normalization
The block descriptor $v$ is normalized with one of the following schemes:
- L1-norm: $v \rightarrow v / (\|v\|_1 + \epsilon)$
- L1-sqrt: $v \rightarrow \sqrt{v / (\|v\|_1 + \epsilon)}$
- L2-norm: $v \rightarrow v / \sqrt{\|v\|_2^2 + \epsilon^2}$
- L2-Hys: L2-norm followed by clipping (limiting the maximum values of $v$ to 0.2) and renormalizing
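A minimal sketch (assuming NumPy; the value of $\epsilon$ is an arbitrary small constant) of two of these schemes, L2-Hys and L1-sqrt:

```python
import numpy as np

EPS = 1e-5  # small constant; the exact value is not critical

def l2_hys(v, clip=0.2):
    """L2-normalize, clip components at `clip`, then renormalize."""
    v = v / np.sqrt(np.sum(v ** 2) + EPS ** 2)
    v = np.minimum(v, clip)
    return v / np.sqrt(np.sum(v ** 2) + EPS ** 2)

def l1_sqrt(v):
    """L1-normalize, then take the elementwise square root."""
    return np.sqrt(v / (np.abs(v).sum() + EPS))
```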
Making feature vector
Variants of HOG descriptors. (a) A rectangular HOG (R-HOG) descriptor with 3 × 3 blocks of cells. (b) A circular HOG (C-HOG) descriptor with the central cell divided into angular sectors, as in shape contexts. (c) A C-HOG descriptor with a single central cell.
HOG feature vector for one block
Example: gradient angles and magnitudes for one 4x4-pixel cell.

Angle:
120 101  97  47
110 101  95  45
 30  25  15  10
 25  25  15   0

Magnitude:
 80  70  70  50
 40  30  30  20
  5  10  10   5
 10  20  20   5

Each pixel votes into one of nine 20-degree orientation bins (0-19, 20-39, ..., 160-179): with binary voting a pixel adds 1 to its bin, with magnitude voting it adds its gradient magnitude.

[Figure: the two resulting 9-bin histograms, binary voting vs. magnitude voting.]

The feature vector for one block concatenates the histograms of its cells:
$f = (h_{11}, \ldots, h_{19}, h_{21}, \ldots, h_{29}, h_{31}, \ldots, h_{39}, h_{41}, \ldots, h_{49})$

The feature vector extends as the window moves.
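A minimal sketch (NumPy; the function name is mine) of magnitude voting for one cell, using the 4x4 example above:

```python
import numpy as np

def cell_histogram(angles, magnitudes, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell.
    `angles` in degrees [0, 180); both arrays have the same shape."""
    bin_width = 180.0 / n_bins                         # 20 degrees per bin
    bins = (angles // bin_width).astype(int)           # bin index per pixel
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitudes.ravel())  # accumulate votes
    return hist

# the example cell from the slide above
angles = np.array([[120, 101, 97, 47],
                   [110, 101, 95, 45],
                   [ 30,  25, 15, 10],
                   [ 25,  25, 15,  0]], dtype=float)
magnitudes = np.array([[80, 70, 70, 50],
                       [40, 30, 30, 20],
                       [ 5, 10, 10,  5],
                       [10, 20, 20,  5]], dtype=float)
print(cell_histogram(angles, magnitudes))
```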
HOG example
In each triplet: (1) the input image; (2) the corresponding R-HOG feature vector (only the dominant orientation of each cell is shown); (3) the dominant orientations selected by the SVM (obtained by multiplying the feature vector by the corresponding weights from the linear SVM).
Support Vector Machine (SVM)
Problem setting for SVM
[Figure: two-class points in the ($x_1$, $x_2$) plane, one class denoted +1 and the other -1, separated by a hyperplane: $w^T x + b = 0$ on the plane, $w^T x + b > 0$ on one side, $w^T x + b < 0$ on the other.]

A hyperplane in the feature space: $w^T x + b = 0$

(Unit-length) normal vector of the hyperplane: $n = \frac{w}{\|w\|}$
Problem setting for SVM

[Figure: the same two-class points with several possible separating lines.]

How would you classify these points using a linear discriminant function in order to minimize the error rate? There is an infinite number of answers. Which one is the best?
Large Margin Linear Classifier
We know that
$w^T x^+ + b = 1$ and $w^T x^- + b = -1$

The margin width is:
$M = (x^+ - x^-) \cdot n = (x^+ - x^-) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|}$

[Figure: the margin between $w^T x + b = 1$ and $w^T x + b = -1$ around the hyperplane $w^T x + b = 0$; the points lying on these boundaries are the support vectors.]
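A tiny numeric check of $M = 2/\|w\|$ (the vectors are mine, chosen so the margin constraints hold with equality):

```python
import numpy as np

w = np.array([3.0, 4.0])        # hypothetical weights, ||w|| = 5
b = -2.0
x_plus = np.array([1.0, 0.0])   # w @ x_plus + b = +1 (on the positive margin)
x_minus = np.array([1/3, 0.0])  # w @ x_minus + b = -1 (on the negative margin)

n = w / np.linalg.norm(w)             # unit normal of the hyperplane
margin = (x_plus - x_minus) @ n       # projected distance between the margins
print(margin, 2 / np.linalg.norm(w))  # both print 0.4
```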
Large Margin Linear Classifier
Formulation:

maximize $\frac{2}{\|w\|}$

such that
for $y_i = +1$: $w^T x_i + b \ge 1$
for $y_i = -1$: $w^T x_i + b \le -1$
Large Margin Linear Classifier
Formulation:

minimize $\frac{1}{2}\|w\|^2$

such that
for $y_i = +1$: $w^T x_i + b \ge 1$
for $y_i = -1$: $w^T x_i + b \le -1$
Large Margin Linear Classifier
Formulation:

minimize $\frac{1}{2}\|w\|^2$

such that
$y_i(w^T x_i + b) \ge 1$
Solving the Optimization Problem
minimize $\frac{1}{2}\|w\|^2$ s.t. $y_i(w^T x_i + b) \ge 1$

This is quadratic programming with linear constraints. Introducing a Lagrange multiplier $\alpha_i$ for each constraint gives the Lagrangian function:

minimize $L_p(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(w^T x_i + b) - 1 \right)$

s.t. $\alpha_i \ge 0$
Solving the Optimization Problem
minimize $L_p(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(w^T x_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$

Setting the derivatives to zero:

$\frac{\partial L_p}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i$

$\frac{\partial L_p}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0$
Solving the Optimization Problem
minimize $L_p(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(w^T x_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$

Substituting these back yields the Lagrangian dual problem:

maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$

s.t. $\alpha_i \ge 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$
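As an illustration, a minimal sketch of solving this dual numerically with a generic SciPy optimizer on toy data (a dedicated QP solver would normally be used; all data and names here are mine):

```python
import numpy as np
from scipy.optimize import minimize

# toy linearly separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j x_i.x_j

def neg_dual(alpha):                        # minimize the negated dual
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method="SLSQP",
               bounds=[(0, None)] * len(y),                      # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])

alpha = res.x
w = (alpha * y) @ X            # w = sum_i alpha_i y_i x_i
sv = np.argmax(alpha)          # index of a support vector
b = y[sv] - w @ X[sv]          # from y_i (w.x_i + b) = 1
print(w, b)
```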
Solving the Optimization Problem
From the KKT condition, we know:
$\alpha_i \left( y_i(w^T x_i + b) - 1 \right) = 0$

Thus, only support vectors have $\alpha_i \ne 0$.

The solution has the form:
$w = \sum_{i=1}^{n} \alpha_i y_i x_i = \sum_{i \in SV} \alpha_i y_i x_i$

Get $b$ from $y_i(w^T x_i + b) - 1 = 0$, where $x_i$ is any support vector.

[Figure: the margin picture with the support vectors highlighted.]
Solving the Optimization Problem
The linear discriminant function is:

$g(x) = w^T x + b = \sum_{i \in SV} \alpha_i y_i x_i^T x + b$

Notice that it relies on a dot product between the test point $x$ and the support vectors $x_i$.

Also keep in mind that solving the optimization problem involved computing the dot products $x_i^T x_j$ between all pairs of training points.
Large Margin Linear Classifier
What if the data is not linearly separable (noisy data, outliers, etc.)?

Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy data points.

[Figure: the margin picture with two points violating the margin by $\xi_1$ and $\xi_2$.]
Large Margin Linear Classifier
Formulation:

minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$

such that
$y_i(w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$

The parameter $C$ can be viewed as a way to control over-fitting.
Large Margin Linear Classifier
Formulation (Lagrangian dual problem):

maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$

such that
$0 \le \alpha_i \le C$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$
Non-linear SVMs

Datasets that are linearly separable with noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space?

[Figure: 1-D points on the $x$ axis that are not linearly separable; after mapping $x \mapsto (x, x^2)$ they become separable.]
Non-linear SVMs: Feature Space

General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:

$\Phi: x \rightarrow \varphi(x)$
Non-linear SVMs: The Kernel Trick

With this mapping, our discriminant function is now:

$g(x) = w^T \varphi(x) + b = \sum_{i \in SV} \alpha_i y_i \varphi(x_i)^T \varphi(x) + b$

There is no need to know this mapping explicitly, because we only use the dot products of feature vectors in both training and testing.

A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:

$K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$
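As a sanity check (the example and explicit feature map are mine): for 2-D inputs and $p = 2$, the polynomial kernel $(1 + x^T z)^2$ equals $\varphi(x)^T \varphi(z)$ with $\varphi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, \sqrt{2}x_1 x_2, x_2^2)$:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for a 2-D input."""
    x1, x2 = x
    return np.array([1, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((1 + x @ z) ** 2)   # kernel evaluated in the input space -> 4.0
print(phi(x) @ phi(z))    # dot product in the feature space   -> 4.0
```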
Non-linear SVMs: The Kernel Trick

Examples of commonly used kernel functions:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$

Polynomial kernel: $K(x_i, x_j) = (1 + x_i^T x_j)^p$

Gaussian (Radial Basis Function, RBF) kernel: $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
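In practice, the dual optimization consumes the full kernel (Gram) matrix; a sketch (the function name is mine) for the Gaussian kernel:

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for all pairs of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # pairwise squared distances
    return np.exp(-d2 / (2 * sigma ** 2))
```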
Nonlinear SVM: Optimization
Formulation (Lagrangian dual problem):

maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

such that $0 \le \alpha_i \le C$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$

The resulting discriminant function is:

$g(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b$

The optimization technique is the same.
Support Vector Machine: Algorithm
1. Choose a kernel function
2. Choose a value for C
3. Solve the quadratic programming problem (many algorithms and software packages available)
4. Construct the discriminant function from the support vectors
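These four steps map directly onto off-the-shelf solvers. A minimal usage sketch with scikit-learn (assuming it is installed; the data here is a random placeholder):

```python
import numpy as np
from sklearn.svm import SVC

# placeholder data: feature vectors X (n_samples x n_features), labels y in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))

clf = SVC(kernel="rbf", C=1.0)  # steps 1-2: choose the kernel and C
clf.fit(X, y)                   # steps 3-4: solve the QP, build g(x) from the SVs
print(clf.support_vectors_.shape, clf.predict(X[:5]))
```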
Summary: Support Vector Machine
1. Large margin classifier
- Better generalization ability and less over-fitting.
2. The kernel trick
- Maps data points to a higher-dimensional space in order to make them linearly separable.
- Since only the dot product is used, we do not need to represent the mapping explicitly.
Back to the proposed paper
Proposed algorithm parameters
- Bins in histogram: 8
- Cell size: 4x4 pixels
- Block size: 2x2 cells (8x8 pixels)
- Image size: 64x128 pixels (8x16 blocks)
- Feature vector size: 2 x 2 x 8 x 8 x 16 = 4096
Gradients and orientations:

$G_x(x, y) = [-1, 0, 1] * I(x, y)$
$G_y(x, y) = [-1, 0, 1]^T * I(x, y)$

$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}$

$\theta(x, y) = \tan^{-1}\left(\frac{G_y(x, y)}{G_x(x, y)}\right)$

Cells $C(x, y, w_c, h_c)$ and blocks $B(x, y, w_b, h_b)$ are defined by their position and size. The histogram value for cell $C_k$ of block $B$ and orientation bin $bin$ sums the gradient magnitudes of the pixels whose orientation falls into that bin:

$f(C_k, B, bin) = \sum_{(x, y) \in C_k} \begin{cases} G(x, y), & \text{if } \theta(x, y) \in bin \\ 0, & \text{otherwise} \end{cases}$
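For experimentation, a roughly equivalent descriptor can be computed with scikit-image (a sketch; assuming the paper's parameters map onto `skimage.feature.hog` as below):

```python
import numpy as np
from skimage.feature import hog

image = np.random.rand(128, 64)   # placeholder 64x128 detection window

features = hog(image,
               orientations=8,           # 8 histogram bins
               pixels_per_cell=(4, 4),   # 4x4-pixel cells
               cells_per_block=(2, 2),   # 2x2-cell blocks
               block_norm="L2-Hys")
print(features.shape)  # larger than 4096: skimage uses overlapping blocks
```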
LPP Algorithm
Main idea: find a matrix that projects the original data into a space of lower dimensionality while preserving similarity between data points (points that are close to each other in the original space should remain close after projection):

$Y = W^T X$

$\min_W \sum_{ij} (y_i - y_j)^2 S_{ij}$

$S_{ij} = \begin{cases} \exp\left(-\frac{\|x_i - x_j\|^2}{t}\right), & \text{if } x_i \text{ is among the } k \text{ nearest neighbors of } x_j \text{, or vice versa} \\ 0, & \text{otherwise} \end{cases}$
LPP Algorithm
$W_{opt} = \arg\min_W \sum_{ij} (y_i - y_j)^2 S_{ij} = \arg\min_W \sum_{ij} (W^T x_i - W^T x_j)^2 S_{ij} = \arg\min_W W^T X L X^T W$

where $D_{ii} = \sum_j S_{ij}$ and $L = D - S$. Is it correct?

Add constraints:

$W_{opt} = \arg\min_W W^T X L X^T W \quad \text{s.t.} \quad W^T X D X^T W = 1$

This can be represented as a generalized eigenvalue problem:

$X L X^T w = \lambda X D X^T w$. Is it correct?

By selecting the $d$ smallest eigenvalues and their corresponding eigenvectors, dimensionality reduction is achieved.
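A minimal sketch of LPP as this generalized eigenproblem (SciPy; the neighborhood parameters are mine, and a small regularizer is added because $X D X^T$ can be singular in practice):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, d=2, k=5, t=1.0, reg=1e-6):
    """X: (n_features, n_samples). Returns the (n_features, d) projection W."""
    d2 = cdist(X.T, X.T, "sqeuclidean")
    S = np.exp(-d2 / t)
    # keep only k-nearest-neighbor affinities, symmetrized ("or vice versa")
    far = np.argsort(d2, axis=1)[:, k + 1:]     # non-neighbors (skip self)
    mask = np.ones_like(S, dtype=bool)
    np.put_along_axis(mask, far, False, axis=1)
    S = np.where(mask | mask.T, S, 0.0)
    D = np.diag(S.sum(axis=1))
    L = D - S
    A = X @ L @ X.T
    B = X @ D @ X.T + reg * np.eye(X.shape[0])
    vals, vecs = eigh(A, B)      # generalized problem, ascending eigenvalues
    return vecs[:, :d]           # d smallest eigenvectors -> projection W
```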
Solving the problem of different scales
Some results
[Figure: detection rate vs. feature dimension $d$, comparing PCA-HOG features (labeled '*') with LPP-HOG features (labeled '˅').]

[Figure: detection example.]
Conclusions
- A fast human detection algorithm based on HOG features is presented
  - no information about computational speed is given
- The proposed method is similar to PCA-HOG: feature-space dimensionality is decreased using LPP
  - why do we need LPP instead of finding eigenvectors in the original feature space?
  - some equations seem to be wrong
- Reference papers are very few
References:
- Navneet Dalal. "Finding People in Images and Videos". PhD thesis, Institut National Polytechnique de Grenoble / INRIA, Grenoble, July 2006.
- Navneet Dalal and Bill Triggs. "Histograms of Oriented Gradients for Human Detection". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 2005, Vol. II, pp. 886-893.
- S. Paisitkriangkrai, C. Shen, and J. Zhang. "Performance evaluation of local features in human classification and detection". IET Computer Vision, vol. 2, no. 4, pp. 236-246, December 2008.