Spatial Coordinate Coding to Reduce Histogram Representations, Dominant Angle and Colour Pyramid Match
P. Koniusz, K. Mikolajczyk
CVSSP, University of Surrey, UK
{P.Koniusz, K.Mikolajczyk}@surrey.ac.uk
September 11, 2011
P. Koniusz, K. Mikolajczyk (CVSSP) Spatial Coordinate Coding September 11, 2011 1 / 13
Introduction
Recognition approach (Bag of Words)
[Figure: three variants of the Bag-of-Words pipeline, each with four stages]
1. Feature extraction: detect key-points, compute descriptors
2. Visual vocabulary: cluster descriptors (first panel) or joint clustering (second and third panels)
3. Mid-level features: build histograms (codewords vs. freq.), average or max pooling
4. Classification: kernel + SVM or KDA
Panel titles: S2-spatial pyramid match (pool|LX0,LY0 to pool|LX2,LY2); S1-spatial pyramid match (pool|LY0 to pool|LY2); Pyramid match removed (single pool at L0)
Spatial Pyramid Match [S. Lazebnik, 2006] is at the heart of modern object category recognition, where it exploits spatial bias in images
Mid-level feature representations result from mapping low-level features (e.g. descriptors) to a given vocabulary space
Increasing the number of quantisation levels results in extremely large histogram vectors of 200K or more elements
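The growth in vector size can be illustrated with a short sketch. It assumes the common pyramid layout in which level l splits the image into a 2^l x 2^l grid and every cell holds one histogram over the vocabulary; the function name and the vocabulary size of 4000 are illustrative assumptions, not values taken from the slides.

```python
def spm_histogram_length(vocab_size: int, levels: int) -> int:
    """Length of a concatenated spatial pyramid histogram.

    Assumes level l splits the image into a 2^l x 2^l grid (4^l cells),
    each cell contributing one vocab_size-bin histogram.
    """
    cells = sum(4 ** level for level in range(levels))
    return vocab_size * cells


if __name__ == "__main__":
    # Illustrative vocabulary size; the vector grows geometrically with depth.
    for levels in (1, 2, 3, 4):
        print(levels, spm_histogram_length(4000, levels))
```

Under these assumptions each extra level multiplies the number of added cells by four, so the pyramid term quickly dominates the vector length.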
Introduction
Aim
To propose a new joint appearance and spatial representation
To reduce the resulting vector sizes and therefore both computational and memory requirements
To investigate which pooling modalities (spatial, dominant angle, scale, colour bias) benefit from multiple levels of quantisation
Bias in images (Spatial Pyramid Match)
[Figure: example scene with regions labelled sky, trees, fence, trunk, ship, grass]
Coordinate set $X_s$ of an object $s$ introduces spatial bias: $p(s \mid \vec{x}) \ge p(s)$ for $\vec{x} \in X_s$
Introduction
Bias in images (Dominant Edge Orientation)
[Figure: example scene with regions labelled sky, trees, fence, trunk, ship, grass]
Trunks $t$ remain largely vertical, i.e. their orientations fall within a set $\Theta_t$: $p(t \mid \theta) \ge p(t)$ if $\theta \in \Theta_t$
Bias in images (Dominant Colours)
[Figure: example scene with regions labelled sky, trees, fence, trunk, ship, grass]
Foliage $f$ is of a limited colour set $C_f$, thus $p(f \mid \vec{c}) \ge p(f)$ if $\vec{c} \in C_f$
Spatial Coordinate Coding for Soft Assignment
Descriptor to mid-level features mapping:
$\vec{h}_n = f(\vec{x}_n), \quad n = 1, \dots, N$
$\vec{x}_n \in X$ - image descriptors
$\vec{h}_n$ - mid-level features
Mid-level features are Component Membership Probabilities of GMM:
$h_{nk} = p(\vec{m}_k \mid \vec{x}_n) = \dfrac{g(\vec{x}_n; \vec{m}_k, \sigma)}{\sum_{k'=1}^{K} g(\vec{x}_n; \vec{m}_{k'}, \sigma)}$
$\vec{m}_k \in M$ - visual words
$\sigma$ - model parameter
Average (or maximum) pooling is performed over the columns of the matrix $H_{N \times K}$
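The soft assignment and pooling steps above can be sketched in NumPy. This is a minimal sketch that assumes g is an isotropic Gaussian with a shared sigma; the function names `soft_assign` and `pool` and the random data are hypothetical and only serve the illustration.

```python
import numpy as np


def soft_assign(X, M, sigma):
    """Membership probabilities h_nk = g(x_n; m_k) / sum_k' g(x_n; m_k').

    X: (N, D) descriptors, M: (K, D) visual words.
    g is assumed to be an isotropic Gaussian (an assumption of this sketch).
    """
    # Squared Euclidean distance between every descriptor and every visual word.
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum for numerical stability; the shift cancels
    # in the normalisation below.
    G = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return G / G.sum(axis=1, keepdims=True)


def pool(H, mode="average"):
    """Pool the N x K matrix H over each column into one K-vector."""
    return H.mean(axis=0) if mode == "average" else H.max(axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))   # 50 toy descriptors of length 8
M = rng.normal(size=(5, 8))    # 5 toy visual words
H = soft_assign(X, M, sigma=1.0)
hist_avg = pool(H, "average")
hist_max = pool(H, "max")
```

Each row of H sums to one, so the average-pooled vector also sums to one, while max pooling does not preserve that normalisation.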
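We assume independence of visual appearance and spatial bias and code both modalities as a joint distribution (key idea):
$g'_{\alpha}(n, k) = \underbrace{g\left[(1-\alpha)\vec{x}_n; (1-\alpha)\vec{m}_k, \sigma'\right]}_{\text{visual term}} \cdot \underbrace{g\left(\alpha\vec{x}'_n; \alpha\vec{m}'_k, \sigma'\right)}_{\text{spatial term}}$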
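A small numerical check can make the joint kernel concrete. The sketch below assumes g is an unnormalised isotropic Gaussian; under that assumption the product of the visual and spatial terms equals one Gaussian evaluated on the concatenated, scaled vectors. The helper names `gauss` and `joint_kernel` and the toy data are hypothetical.

```python
import numpy as np


def gauss(x, m, sigma):
    """Unnormalised isotropic Gaussian g(x; m, sigma) (assumed form of g)."""
    return np.exp(-np.sum((x - m) ** 2) / (2.0 * sigma ** 2))


def joint_kernel(x, m, x_sp, m_sp, alpha, sigma):
    """Product of the visual term and the spatial term, both with the same sigma."""
    visual = gauss((1 - alpha) * x, (1 - alpha) * m, sigma)
    spatial = gauss(alpha * x_sp, alpha * m_sp, sigma)
    return visual * spatial


rng = np.random.default_rng(1)
x, m = rng.normal(size=8), rng.normal(size=8)          # toy descriptor and visual word
x_sp, m_sp = rng.uniform(size=2), rng.uniform(size=2)  # toy (x, y) coordinates
alpha, sigma = 0.3, 1.0

joint = joint_kernel(x, m, x_sp, m_sp, alpha, sigma)

# The same value from a single Gaussian over concatenated, scaled vectors.
x_cat = np.concatenate([(1 - alpha) * x, alpha * x_sp])
m_cat = np.concatenate([(1 - alpha) * m, alpha * m_sp])
single = gauss(x_cat, m_cat, sigma)
```

With a shared sigma the exponents simply add, which is why the two values agree up to floating-point error.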
Spatial Coordinate Coding for Sparse Coding
Mid-level features by optimising:
$\arg\min_{\vec{h}_n} \left\| \vec{x}_n - M\vec{h}_n \right\|^2 + \beta |\vec{h}_n|$
$M_{D \times K}$ - visual vocabulary with $K$ atoms of length $D$
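Spatial descriptor $\vec{x}'_n$ and dictionary $M'$ terms added to the problem (key idea):
$\arg\min_{\vec{h}_n} \underbrace{(1-\alpha) \left\| \vec{x}_n - M\vec{h}_n \right\|^2}_{\text{visual term}} + \underbrace{\alpha \left\| \vec{x}'_n - M'\vec{h}_n \right\|^2}_{\text{spatial term}} + \beta |\vec{h}_n| \qquad (1)$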
Soft Assignment and Sparse Coding can be spatially enhanced by just concatenating image descriptors with the spatial information $\vec{x}'_n$, i.e.:
$\vec{x}^{aug}_n = [\underbrace{\sqrt{1-\alpha}\,\vec{x}_n^T}_{\text{visual term}}, \underbrace{\sqrt{\alpha}\,(\vec{x}'_n)^T}_{\text{spatial term}}]^T$ (key outcome)
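The step from objective (1) to the concatenated descriptor can be checked numerically. The sketch below assumes that $|\vec{h}_n|$ denotes the L1 norm and that the dictionary rows are scaled with the same square-root weights as the descriptor; both are assumptions of this illustration, and the function names and toy data are hypothetical.

```python
import numpy as np


def joint_cost(x, M, x_sp, M_sp, h, alpha, beta):
    """Objective (1): weighted visual and spatial residuals plus an L1 penalty."""
    visual = np.sum((x - M @ h) ** 2)
    spatial = np.sum((x_sp - M_sp @ h) ** 2)
    return (1 - alpha) * visual + alpha * spatial + beta * np.abs(h).sum()


def augmented_cost(x, M, x_sp, M_sp, h, alpha, beta):
    """Plain sparse coding cost on sqrt-weighted, concatenated inputs."""
    x_aug = np.concatenate([np.sqrt(1 - alpha) * x, np.sqrt(alpha) * x_sp])
    # Assumed: the dictionary is stacked with the same weights as the descriptor.
    M_aug = np.vstack([np.sqrt(1 - alpha) * M, np.sqrt(alpha) * M_sp])
    return np.sum((x_aug - M_aug @ h) ** 2) + beta * np.abs(h).sum()


rng = np.random.default_rng(2)
D, K = 8, 5
x, M = rng.normal(size=D), rng.normal(size=(D, K))          # toy descriptor, dictionary
x_sp, M_sp = rng.uniform(size=2), rng.uniform(size=(2, K))  # toy coordinates, spatial atoms
h = rng.normal(size=K)
alpha, beta = 0.25, 0.1

cost_joint = joint_cost(x, M, x_sp, M_sp, h, alpha, beta)
cost_aug = augmented_cost(x, M, x_sp, M_sp, h, alpha, beta)
```

Because the squared norm of a stacked vector is the sum of the squared norms of its parts, the two costs coincide for any code vector, so an unmodified sparse coding solver can be run on the augmented inputs.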
Experiments on Spatial Information (VOC 2010)
Spatial Coordinate Coding
Pascal 2010 [M. Everingham, 2010] Action Classification set
9 classes, 301 training, 307 validation, and 613 testing bounding boxes
Soft Assignment (SA) and Spatial Coordinate Coding (SCC) with RBF χ2 kernels used
Results reported as Mean Average Precision
Method                Evaluation                MAP
SA + SPM (3 levels)   validation, 1 kernel      49.8
SA + SCC              validation, 1 kernel      51.6
SA + SCC              test, multiple kernels    62.15
Spatial Coordinate Coding outperforms Spatial Pyramid Match (51.6 vs. 49.8 MAP on the validation set)
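For reference, the RBF χ2 kernel mentioned on this slide can be sketched as follows. This uses one common definition, K(a, b) = exp(-gamma * sum_k (a_k - b_k)^2 / (a_k + b_k)); the slides do not state which variant or which gamma was used, so the function name, the default gamma and the small eps guard are assumptions.

```python
import numpy as np


def rbf_chi2_kernel(A, B, gamma=1.0, eps=1e-12):
    """RBF chi-squared kernel between rows of A (n, K) and B (m, K).

    Inputs are assumed to be non-negative histograms; eps guards against
    division by zero when a bin is empty in both histograms.
    """
    diff = A[:, None, :] - B[None, :, :]
    total = A[:, None, :] + B[None, :, :]
    chi2 = (diff ** 2 / (total + eps)).sum(axis=2)
    return np.exp(-gamma * chi2)


rng = np.random.default_rng(3)
hists = rng.uniform(size=(6, 10))
hists /= hists.sum(axis=1, keepdims=True)   # L1-normalised toy histograms
K_mat = rbf_chi2_kernel(hists, hists, gamma=0.5)
```

The resulting matrix is symmetric with ones on the diagonal and can be passed to a classifier that accepts precomputed kernels.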
Experiments on Spatial Information (Flower 17)
Spatial Coordinate Coding
Flower 17 [M. E. Nilsback, 2008]: 17 classes, 3 splits of data, each consisting of 680 training, 340 validation, and 340 testing images
Coding            Kernel    SCC      SPM
Soft Assignment   χ2        91.16    89.3 (3 levels)
Sparse Coding     linear    88.43    88.86 (4 levels)
Spatial Coordinate Coding performs slightly worse than SPM when Sparse Coding and a linear classifier are used
Pyramid Match elevates histogram data to a higher-dimensional representation (vital for a linear classifier)
Experiments on Dominant Angle Pyramid Match
Dominant Angle Pooling
Pascal 2007 consists of 20 object categories with high variability in intra-class appearance, rotation, and spatial position
Dominant Angle (DA) on descriptor level (variant, invariant, and descriptor augmentation cases)
DA invariant             46.00
DA variant               50.23
DA coordinate appended   50.24
Dominant Angle is important in classification
Dominant Angle (DA) with multiple quantisation levels (DAPM) and Spatial Pyramid Match (SPM)
SPM (3 levels)    54.3
DAPM (5 levels)   53.40
DAPM + SPM        56.3
Best results achieved when using both Spatial (3 levels) and Dominant Angle Pyramid Match (5 levels)
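Pooling over several dominant-angle quantisation levels can be sketched in the same way as a spatial pyramid. The slides do not give the bin layout, so the sketch below assumes that level l splits [0, 2*pi) into 2^l equal bins and that average pooling is used within each bin; the function name `angle_pyramid_pool` and the toy data are hypothetical.

```python
import numpy as np


def angle_pyramid_pool(H, angles, levels):
    """Pool mid-level features H (N x K) within dominant-angle bins.

    Level l splits [0, 2*pi) into 2**l equal bins (an assumed layout);
    the average-pooled vector of every bin is concatenated.
    """
    n_words = H.shape[1]
    angles = np.mod(angles, 2 * np.pi)
    pooled = []
    for level in range(levels):
        n_bins = 2 ** level
        # Map each angle to a bin index; the clip guards against rounding
        # that would otherwise produce an index equal to n_bins.
        idx = np.minimum((angles / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
        for b in range(n_bins):
            mask = idx == b
            # Empty bins contribute a zero vector.
            pooled.append(H[mask].mean(axis=0) if mask.any() else np.zeros(n_words))
    return np.concatenate(pooled)


rng = np.random.default_rng(4)
H_toy = rng.uniform(size=(40, 6))                 # 40 toy features over 6 codewords
theta = rng.uniform(0.0, 2 * np.pi, size=40)      # toy dominant angles
v = angle_pyramid_pool(H_toy, theta, levels=3)    # 6 * (1 + 2 + 4) elements
```

The first block of the output is the ordinary, angle-agnostic pooled vector; finer levels append angle-specific histograms, and the classifier can then weight the levels.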
Experiments on Colour Pyramid Match
Colour Component Pooling
Flower 17 set used for further evaluation as it greatly benefits from colour information
Soft Assignment (SA) and Spatial Coordinate Coding (SCC) with RBF χ2 kernels used
Results reported as Average Accuracy
SCC                                           86.4%
SCC + Colour Pyramid Match                    87.4%
SCC + Colour Pyramid Match + Opponent SIFT    91.4%
MKL based approach [F. Yan, 2010]             86.7%
Conclusions
Spatial Coordinate Coding outperforms SPM (3 levels) (e.g. by 1.8%on Flower 17)
It reduces histogram sizes from e.g. 56K to 4K by bypassing Spatial Pyramid Match
Spatial bias does not benefit much from multi-level quantisation
Dominant Angle benefits from multi-level quantisation (DAPM)
DAPM+SPM results in 2.0% improvement on VOC 2007
Colour Pyramid Match further improves Spatial Coordinate Coding by 1.0% on Flower 17
Letting the classifier decide the right level of quantisation for multiple modalities leads to performance improvement
Thank You
References
S. Lazebnik et al. (2006)
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR.
J. C. Van Gemert et al. (2010)
Visual Word Ambiguity. PAMI.
J. Yang et al. (2009)
Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. CVPR.
M. E. Nilsback et al. (2008)
Automated Flower Classification over a Large Number of Classes. ICCV.
M. Everingham et al. (2010)
The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. ICCV.
F. Yan et al. (2010)
Lp Norm Multiple Kernel Fisher Discriminant Analysis for Object and Image Categorisation. CVPR.