Multi-class Classification on Riemannian Manifolds for Video Surveillance

D. Tosato, M. Farenzena, M. Cristani, M. Spera and V. Murino

Dipartimento di Informatica, University of Verona, Italy

Istituto Italiano di Tecnologia (IIT), Genova, Italy

The Problem

In video surveillance, classification of visual data can be very hard:

• Small objects (<< 50x50 pixels, < 60x40 pixels)

• Low resolution

• Occlusions

• Bad lighting conditions

What kind of situations do we want to tackle?

The goal of this work is …

• finding a feature able to describe visual objects at prohibitively low resolutions.

• building a robust multi-class learning framework that fits the selected object description.

Related works:

• O. Tuzel, F. Porikli, P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE PAMI, 2008.

• J. Orozco, S. Gong, T. Xiang. Head pose classification in crowded scenes. BMVC 2009.

• N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. CVPR 2005.

Outline

• Problem overview

• Feature layout: ARCO

• Multi-class classification framework on Riemannian Manifolds

• Experiments

• Computational considerations

• Conclusions and future work

Overview

Assume:

• Data are typically at low resolution, often in crowded scenarios.

• Depending on the surveillance task, only a coarse categorization can be achieved.

• Data must be roughly aligned for training purposes.

Overview

The method in a nutshell …

1. Features and their integral representations are calculated.

2. Covariance matrices are built as object descriptors for a set of overlapping patches on a regular grid.

3. Covariance matrices are handled by introducing the sectional curvature analysis (SCA).

4. For each patch a multi-class LogitBoost classifier is instantiated.

5. Majority voting is used to label an image.
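The final labeling step can be sketched as follows. This is a minimal illustration, assuming each patch classifier has already produced one class index; the function name `majority_vote` and the tie-breaking rule (smallest class index wins) are assumptions, not taken from the slides.

```python
import numpy as np

def majority_vote(patch_labels, n_classes):
    """Return the most frequent class index among per-patch predictions.

    Ties are broken in favour of the smallest class index (an assumption).
    """
    counts = np.bincount(np.asarray(patch_labels), minlength=n_classes)
    return int(np.argmax(counts))

# Hypothetical predictions of six patch classifiers for one image.
votes = [2, 0, 2, 1, 2, 0]
image_label = majority_vote(votes, n_classes=4)
```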

ARCO (ARray of COvariances)

• An image is organized into a grid of uniformly spaced and overlapping patches.

• This layout does not need to find regions or points of interest and is efficiently computed.

• Patches have a fixed size in pixels and are placed on a grid with a fixed step in pixels.

• We achieve the best classification performance for both pedestrians and heads using patch and step sizes chosen as a function of the image dimension.
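The grid layout can be illustrated with a short sketch. The patch side (16) and step (8) below are hypothetical values chosen only for the example, and `patch_grid` is an illustrative helper name, not part of the original work.

```python
def patch_grid(height, width, patch, step):
    """Top-left corners (row, col) of square patches of side `patch`,
    placed every `step` pixels; patches overlap whenever step < patch."""
    rows = range(0, height - patch + 1, step)
    cols = range(0, width - patch + 1, step)
    return [(r, c) for r in rows for c in cols]

# Hypothetical setting: a 128x64 image, 16x16 patches, 8-pixel step.
corners = patch_grid(height=128, width=64, patch=16, step=8)
```

Because the grid is fixed in advance, no interest-point detection is needed and every image yields the same number of patches.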

ARCO (ARray of COvariances)

• Each patch is described by a covariance matrix of image features.

• These have been exploited as powerful descriptors of pedestrians [Tuzel et al. PAMI2008].

• Their effectiveness has been explicitly investigated in a comparative study [Paisitkriangkrai et al. IET-CV2008].

• Their versatility has been shown in [Tuzel et al. ECCV2006].

• Feature set is task-dependent:

– Head Pose Classification/Detection

– Pedestrian Detection
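A covariance descriptor for one patch can be sketched as below. The feature set used here (pixel coordinates, intensity, absolute first derivatives) is an assumption for illustration only, since the task-specific feature sets are not listed in the text; the small `eps` regularizer that keeps the matrix positive definite is also an assumption.

```python
import numpy as np

def patch_features(gray):
    """Per-pixel features [x, y, I, |Ix|, |Iy|] for a grayscale patch (H, W)."""
    gray = gray.astype(float)
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    iy, ix = np.gradient(gray)  # derivatives along rows, then columns
    feats = np.stack([xs, ys, gray, np.abs(ix), np.abs(iy)], axis=-1)
    return feats.reshape(-1, feats.shape[-1])  # shape (H*W, d)

def covariance_descriptor(gray, eps=1e-6):
    """d x d covariance of the per-pixel features, slightly regularized."""
    cov = np.cov(patch_features(gray), rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

rng = np.random.default_rng(0)
patch = rng.random((16, 16))      # synthetic patch, for illustration
C = covariance_descriptor(patch)  # 5 x 5 symmetric positive definite matrix
```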

Working with covariances

• An object is described with a set of covariance matrices.

• Covariance matrices live on a Riemannian manifold, so typical machine learning techniques are not directly applicable.

• Covariances have to be projected on local manifold views (vector spaces) for learning purposes.

• In the state of the art, classifiers are learned on the local views and combined with boosting.

Interesting issues in using covariances

• On the Riemannian manifold of covariances, the distance between two points is the geodesic distance.

• On a tangent space, the distance is the usual Euclidean distance.

So, the question is:

- Is the tangent-space distance a good approximation of the geodesic distance?

- If so, which tangent space must be chosen?
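The two distances can be compared numerically. The sketch below assumes the standard affine-invariant metric on symmetric positive definite matrices for the geodesic distance and the tangent space at the identity for the Euclidean one; both choices are assumptions, since the formulas are not reproduced in the text.

```python
import numpy as np

def spd_log(m):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def spd_inv_sqrt(m):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v / np.sqrt(w)) @ v.T

def geodesic_distance(x, y):
    """Affine-invariant distance ||log(x^-1/2 y x^-1/2)||_F."""
    s = spd_inv_sqrt(x)
    m = s @ y @ s
    return np.linalg.norm(spd_log((m + m.T) / 2.0), "fro")

def tangent_distance(x, y):
    """Euclidean distance after mapping both points to the tangent
    space at the identity (assumed projection point)."""
    return np.linalg.norm(spd_log(x) - spd_log(y), "fro")

rng = np.random.default_rng(1)
a, b = rng.random((4, 4)), rng.random((4, 4))
X = a @ a.T + np.eye(4)   # synthetic SPD matrices
Y = b @ b.T + np.eye(4)
d_geo, d_tan = geodesic_distance(X, Y), tangent_distance(X, Y)
```

On a manifold of non-positive curvature the tangent-space distance never exceeds the geodesic one, and the two coincide when the matrices commute.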

Working efficiently with covariances

• Covariances are very powerful descriptors with some characteristics:

– the calculation of covariances is fast thanks to the integral tensor representation [Tuzel et al., ECCV2006]

– the computational burden necessary to use them is a drawback

• The nonlinear manifold of covariances can be turned into a flat one using the Log-Euclidean metric, but the goodness of the approximation has to be estimated.
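The integral tensor idea can be sketched as follows: one tensor accumulates the features, another their pairwise products, and the covariance of any rectangle follows from four look-ups in each. This is a minimal NumPy sketch in the spirit of [Tuzel et al., ECCV2006]; function names and the zero-padded layout are illustrative assumptions.

```python
import numpy as np

def integral_tensors(feats):
    """feats: (H, W, d) feature image.
    Returns P (H+1, W+1, d) and Q (H+1, W+1, d, d), zero-padded on top/left."""
    h, w, d = feats.shape
    P = np.zeros((h + 1, w + 1, d))
    Q = np.zeros((h + 1, w + 1, d, d))
    P[1:, 1:] = feats.cumsum(axis=0).cumsum(axis=1)
    outer = feats[..., :, None] * feats[..., None, :]  # per-pixel f f^T
    Q[1:, 1:] = outer.cumsum(axis=0).cumsum(axis=1)
    return P, Q

def box_sum(T, r0, c0, r1, c1):
    """Sum of the accumulated quantity over rows r0..r1-1, columns c0..c1-1."""
    return T[r1, c1] - T[r0, c1] - T[r1, c0] + T[r0, c0]

def region_covariance(P, Q, r0, c0, r1, c1):
    """Unbiased covariance of the features inside the rectangle."""
    n = (r1 - r0) * (c1 - c0)
    p = box_sum(P, r0, c0, r1, c1)
    q = box_sum(Q, r0, c0, r1, c1)
    return (q - np.outer(p, p) / n) / (n - 1)

rng = np.random.default_rng(2)
F = rng.random((12, 10, 3))  # synthetic feature image
P, Q = integral_tensors(F)
cov_fast = region_covariance(P, Q, 2, 1, 8, 7)
```

After the one-off construction of `P` and `Q`, the cost of each patch covariance no longer depends on the patch area.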

Sectional Curvature Analysis - SCA

• The space of covariance matrices can be equipped with a Riemannian metric.

• SCA is a way to describe the curvature of a Riemannian manifold, which naturally generalizes the classical Gaussian curvature of surfaces.

• The manifold of covariance matrices is a homogeneous symmetric space, therefore its negative sectional curvature can be computed at a single point.

• If the SCA coefficient is close to 0, the Riemannian manifold is almost flat (= Euclidean space).

I. Chavel, Riemannian Geometry - A modern introduction. Cambridge Univ. Press, 2006.

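SCA (2)

Idea: take two covariance matrices and their logarithm mappings at a chosen projection point, together with an approximated (geodesic) distance between them.

The difference between the geodesic distance and its approximation is a non-negative function that depends on the sectional curvature.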

SCA (3)

• Experimentally, the mean value is about -10^-3, which is far from the standard negative curvature -1.

• Under these conditions, one can choose any point at which to map the dataset.
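For intuition, the classical sectional curvature of the plane spanned by two tangent vectors can be evaluated directly. The sketch below works at the identity and assumes the affine-invariant metric with trace inner product, for which the curvature is tr([x, y]^2) / 4 divided by the squared area of the parallelogram spanned by x and y. The normalization constant depends on how the metric is scaled, so this is an assumption and not necessarily the coefficient used by the authors.

```python
import numpy as np

def sectional_curvature_at_identity(x, y):
    """Sectional curvature of span{x, y} for symmetric matrices x, y
    seen as tangent vectors at the identity (assumed normalization 1/4)."""
    comm = x @ y - y @ x                     # antisymmetric commutator
    num = 0.25 * np.trace(comm @ comm)       # always <= 0
    gram = np.trace(x @ x) * np.trace(y @ y) - np.trace(x @ y) ** 2
    return num / gram

# Two orthonormal, non-commuting directions: the most curved case in 2x2.
x = np.array([[1.0, 0.0], [0.0, -1.0]]) / np.sqrt(2.0)
y = np.array([[0.0, 1.0], [1.0, 0.0]]) / np.sqrt(2.0)
k_curved = sectional_curvature_at_identity(x, y)

# Two commuting directions span a flat plane.
k_flat = sectional_curvature_at_identity(np.diag([1.0, 0.0]), np.diag([0.0, 1.0]))
```

Values close to zero indicate that the tangent-space (Euclidean) treatment loses little with respect to the true geometry.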

Multi-class Boosting Framework

• For each patch a multi-class boosting classifier is learned.

• To reach a computationally feasible solution we exploit the result of SCA in a multi-class LogitBoost learning framework:

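• Given a training set of (covariance matrix, label) pairs, where each covariance matrix lies on the Riemannian manifold of covariance matrices and each label belongs to a finite set of labels, the data are mapped to a vector space by the logarithm map.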
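The mapping to a vector space can be sketched as a matrix logarithm followed by a vectorization of the upper triangle. The identity is used here as the projection point, which is an assumption; the sqrt(2) scaling of off-diagonal entries is the usual convention that makes the Euclidean norm of the vector equal to the Frobenius norm of the matrix.

```python
import numpy as np

def log_map_identity(c):
    """Logarithm map at the identity: the matrix logarithm of an SPD matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

def vec_sym(s):
    """Stack the upper triangle of a symmetric matrix into a vector,
    scaling off-diagonal entries by sqrt(2) to preserve the norm."""
    iu = np.triu_indices(s.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return s[iu] * scale

def project(c):
    """Map a d x d covariance matrix to a vector of length d (d + 1) / 2."""
    return vec_sym(log_map_identity(c))

C = np.array([[2.0, 0.5], [0.5, 1.0]])  # toy covariance matrix
v = project(C)
```

The resulting vectors can be fed to any standard regressor or classifier.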

LogitBoost [J. Friedman, T. Hastie, and R. Tibshirani, Ann Statist. 2000]

• LB is a true multi-class boosting framework (not 1-vs-all) which iteratively fits an additive symmetric logistic model to get the posterior over the classes.

• The update step combines the weak classification responses coming from each class.

Figure: multi-class weak classifier and binary weak classifiers.

LogitBoost (2)

• At each iteration, LB combines the binary weak classification responses, each obtained by fitting its own linear or non-linear regressor.

• Each multi-class weak learner focuses on a sub-window of a regular grid of Np overlapping patches.

• We assign a class label to an image with a majority vote over the patch classifiers.
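The multi-class LogitBoost loop of Friedman et al. can be sketched compactly. The version below uses a weighted least-squares linear regressor as the per-class weak learner and clips the working response; the learner choice, the clipping value `z_max`, the number of rounds and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_weighted_linear(x, z, w):
    """Weighted least-squares fit of z ~ [x, 1]; returns a predictor."""
    a = np.hstack([x, np.ones((len(x), 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(a * sw[:, None], z * sw, rcond=None)
    return lambda q: np.hstack([q, np.ones((len(q), 1))]) @ coef

def softmax(f):
    e = np.exp(f - f.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def symmetrize(f, n_classes):
    """Symmetric update: (J-1)/J * (f_j - mean over classes)."""
    return (n_classes - 1) / n_classes * (f - f.mean(axis=1, keepdims=True))

def logitboost_fit(x, y, n_classes, n_rounds=10, z_max=4.0):
    y_star = np.eye(n_classes)[y]                 # one-hot labels
    F = np.zeros((len(x), n_classes))             # additive model
    rounds = []
    for _ in range(n_rounds):
        p = softmax(F)                            # current posteriors
        w = np.clip(p * (1.0 - p), 1e-10, None)   # boosting weights
        z = np.clip((y_star - p) / w, -z_max, z_max)  # working response
        learners = [fit_weighted_linear(x, z[:, j], w[:, j])
                    for j in range(n_classes)]
        f = np.column_stack([h(x) for h in learners])
        F += symmetrize(f, n_classes)
        rounds.append(learners)
    return rounds

def logitboost_predict(rounds, x, n_classes):
    F = np.zeros((len(x), n_classes))
    for learners in rounds:
        f = np.column_stack([h(x) for h in learners])
        F += symmetrize(f, n_classes)
    return softmax(F)                             # class posteriors

# Toy data: three well-separated 2-D blobs (illustration only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y_toy = np.repeat(np.arange(3), 30)
x_toy = centers[y_toy] + 0.5 * rng.standard_normal((90, 2))
model = logitboost_fit(x_toy, y_toy, n_classes=3)
posteriors = logitboost_predict(model, x_toy, n_classes=3)
accuracy = (posteriors.argmax(axis=1) == y_toy).mean()
```

In the framework described here, the inputs would be the vectorized covariance descriptors of one patch rather than raw 2-D points.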

Multi-class Boosting Framework (2)

• We eliminate the need to use boosting as a feature selector by reinforcing the weak learning strategy with Weighted Regression Trees.

• Adding an extra class with negative examples and using the rejection cascade*, we can build a robust multi-class detector.

• We have established an automatic stopping rule for the learning process.

* Viola and Jones, CVPR 2001

Weighted Regression Trees

• WRTs* are binary trees that can be applied to efficiently tackle the weighted nonlinear regression problem.

• The growth of WRTs has been strongly limited in order to use them as weak classifiers.

• Boosting weights are injected into a WRT to refine the regression result.

* L. Breiman et al., Classification and Regression Trees, CRC Press, 1984
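The most strongly limited regression tree is a single split (a stump). The sketch below shows how boosting weights enter both the leaf values and the split criterion; it is an illustrative depth-1 example with hypothetical function names, not the authors' tree-growing procedure, and it assumes strictly positive weights.

```python
import numpy as np

def fit_weighted_stump(x, z, w):
    """Fit a depth-1 regression tree minimizing the weighted squared error.

    x: (N, d) inputs, z: (N,) targets, w: (N,) positive weights.
    Returns (feature index, threshold, left value, right value).
    """
    best = None
    for j in range(x.shape[1]):
        values = np.unique(x[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = x[:, j] <= t
            mu_l = np.average(z[left], weights=w[left])
            mu_r = np.average(z[~left], weights=w[~left])
            err = np.sum(w * (z - np.where(left, mu_l, mu_r)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, mu_l, mu_r)
    if best is None:  # every feature is constant: predict the weighted mean
        mu = np.average(z, weights=w)
        return 0, np.inf, mu, mu
    return best[1:]

def predict_stump(stump, x):
    j, t, mu_l, mu_r = stump
    return np.where(x[:, j] <= t, mu_l, mu_r)

x_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
z_demo = np.array([0.0, 0.0, 1.0, 1.0])
stump = fit_weighted_stump(x_demo, z_demo, np.ones(4))
pred = predict_stump(stump, x_demo)
```

Deeper trees apply the same weighted split recursively, with the number of elements per terminal node fixed in advance.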

Experiments Datasets

Head Pose Classification:

• QMUL 4 Head Pose Dataset:

– 5 classes (back/front/left/right/background)

– 4000 examples/class automatically collected

– Image resolution: 128x64 pixels

• Additional Examples from INRIA Person Dataset

– 2736 head examples / ~ 2000 background examples

– head examples are manually classified in 4 classes

– Image resolution: 32x pixels

Pedestrian Detection:

• INRIA Person Dataset:

– 3580 pedestrians / 1671 person-free images

– Pedestrian ROI resolution: 128x64 pixels

Experiments Head Pose Classification

4 classes are used from the QMUL 4 Head Pose Dataset (no background). Some examples:

Performance using different feature subsets

Experiments Head Pose Classification

Ours: avg = .94, std = .05; Orozco et al.: avg = .82, std = .11

• 5 classes are used from the QMUL 4 Head Pose Dataset.

Ours: avg = .90, std = .09; Orozco et al.: avg = .67, std = .36

* Data provided by QMUL

* Data extracted from the INRIA Person Dataset

• The 5 classes from the previous dataset are joined with extra examples from a more general dataset: ~500 examples/class for the foreground (FG) and ~2000 for the background (BG).

Experiments Head Pose Detection

Experiments Pedestrian Detection

• We use the INRIA Person Dataset, where each person is contained in a ROI of 128x64 pixels and has an actual average size of 50x50 pixels.

• To achieve the best performance in terms of false positives per window (FPPW), we employ a cascade of 5 levels.

miss rate = False Negative / (True Positive + False Negative)

FPPW = False Positive / (True Negative + False Positive)
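The two detection metrics follow directly from window-level counts. The sketch below uses the standard definitions; the counts in the example are made up for illustration and are not results from this work.

```python
def miss_rate(true_pos, false_neg):
    """Fraction of positive windows that the detector rejects."""
    return false_neg / (true_pos + false_neg)

def fppw(false_pos, true_neg):
    """False positives per window: fraction of negative windows accepted."""
    return false_pos / (false_pos + true_neg)

# Hypothetical counts for one operating point of a detector.
mr = miss_rate(true_pos=90, false_neg=10)
fp_rate = fppw(false_pos=1, true_neg=9999)
```

Sweeping the detector threshold and plotting miss rate against FPPW gives the usual detection trade-off curve.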

Qualitative Experiments

Computational considerations

• By fixing the feature layout, ARCO decreases the computational complexity of the learning phase by one order of magnitude with respect to the state-of-the-art boosting framework embedding the feature selection. The complexity depends on the number of weak learners (WL) per class, the number of classes and the number of candidate features.

• Using a unique projection point, we reduce the computational complexity of the projection, which depends on the cost of the SVD and the number of candidate features.

• In time: from 2 weeks to 2 hours.

Computational considerations (2)

• The computation of the image integral tensors takes O(d^2 WH) time, where d is the number of features considered and W and H are the image dimensions.

• The complexity of using regression trees as weak learners (fixing a priori the number of elements per terminal node) is a function of the number of samples.

Conclusions and future work

• We have proposed the novel general-purpose ARCO descriptor.

• We have built an effective and efficient multi-class LogitBoost framework able to work on Riemannian Manifolds.

• SCA is introduced as an effective tool to analyze the curvature of a Riemannian Manifold.

• ARCO is able to describe visual objects at prohibitively low resolutions.

• ARCO will be used with more general classes of objects.

• In the future, we plan to devise a novel, more powerful classification technique to replace the LogitBoost framework.
