
Taxonomic Multi-class Prediction and Person Layout using Efficient Structured Ranking

Arpit Mittal1, Matthew Blaschko2, Andrew Zisserman1 and Philip Torr3

1 Visual Geometry Group, University of Oxford; 2 Center for Visual Computing, École Centrale Paris; 3 Oxford Brookes Vision Group, Oxford Brookes University

1. Objective and Contributions

Objective: structured output ranking for problems where the mis-classification score has a special structure.

Examples:

1. A taxonomy, in which the mis-classification score for object classes that are close by (measured by tree distance within the taxonomy) should be smaller than for classes that are far apart.

2. A parts-based model, in which the mis-classification score should be proportional to the number of parts mis-classified.

[Example taxonomy: root → {Animal, Vehicle}; Animal → {Horse, Cow}; Vehicle → {Bus, Bike}]

Evaluation measure: mean accumulated taxonomic loss over the top-scoring t classes for a given image.

* The taxonomic loss between ‘bike’ and ‘bus’ is 2, whereas between ‘bike’ and ‘cow’ it is 4.
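A small sketch of the tree-distance loss and the accumulated-loss measure on the toy taxonomy above. The dictionary encoding of the tree and the function names are illustrative, and the exact averaging used for the reported numbers (over images, and possibly over t) is an assumption.

```python
# Toy taxonomy from the figure: parent links for each node.
parents = {
    "horse": "animal", "cow": "animal",
    "bus": "vehicle", "bike": "vehicle",
    "animal": "root", "vehicle": "root",
}

def ancestors(node):
    """Path from a node up to the root (inclusive)."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path

def taxonomic_loss(a, b):
    """Tree distance between two classes: edges from a to b via their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    da = next(i for i, n in enumerate(pa) if n in common)   # hops from a to the LCA
    db = next(i for i, n in enumerate(pb) if n in common)   # hops from b to the LCA
    return da + db

def accumulated_taxonomic_loss(ranked_classes, target, t):
    """Accumulated taxonomic loss of the top-scoring t classes for one image;
    the reported measure averages this over the test images."""
    return sum(taxonomic_loss(c, target) for c in ranked_classes[:t])

assert taxonomic_loss("bike", "bus") == 2
assert taxonomic_loss("bike", "cow") == 4
```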

3. Implementation and Results

We perform person layout prediction in two stages:
(i) Part candidates are generated using individual detectors. The candidates are filtered and scored using local position and scale cues (ϕhead and ϕhand for the head and hands respectively).
(ii) Candidates are combined and ranked using structured output ranking. ϕ(xi, yi) is formed by concatenating the feature vectors of the head and all hands, ϕ(xi, yi) = [ϕhead (ϕhand)h] (see the sketch after this list).
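A minimal sketch of the feature assembly in stage (ii) and of a parts-counting loss as in Example 2. The fixed two-hand layout, the zero-padding convention for a missing hand, and the use of an overlap criterion to decide whether a part is correct are assumptions, not details taken from the poster.

```python
import numpy as np

def layout_feature(phi_head, phi_hands, hand_dim):
    """phi(x, y) = [phi_head, phi_hand_1, phi_hand_2]: the head feature followed by
    the features of the (up to two) hand candidates; missing hands are zero-padded."""
    hands = list(phi_hands)[:2] + [np.zeros(hand_dim)] * max(0, 2 - len(phi_hands))
    return np.concatenate([phi_head] + hands)

def layout_loss(part_correct):
    """Mis-classification score of a candidate layout: the number of parts whose
    prediction is judged incorrect (e.g. by an overlap criterion)."""
    return sum(0 if ok else 1 for ok in part_correct)

# Example: a layout whose head is correct but both hands are wrong has loss 2.
# The ranker scores a candidate layout as w @ layout_feature(phi_head, phi_hands, hand_dim).
```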

[Pipeline: feature vector → structured ranking SVM → confidence value]

Multi-class Taxonomic Prediction

• Experiments are performed on the Indoor Scene database [Quattoni et al., 2009] and the PASCAL VOC 2007 classification dataset.
• The Indoor Scene database has 67 classes structured in a two-level taxonomy, whereas the VOC 2007 dataset has a three-level taxonomy over 20 classes.
• The deeper taxonomy of the PASCAL VOC dataset yields better performance.

Person Layout

• The PASCAL VOC 2011 person layout dataset is used for the experiments.
• Our method is also compared with the PASCAL VOC 2010 submissions.

This research is funded by ERC grant VisRec no. 228180, ERC grant no. 259112, ONR MURI N00014-07-1-0182 and PASCAL2 Network of Excellence.

[Figure: a correct ranking orders samples by increasing loss (Loss1 < Loss2 < Loss3); a mis-ordered ranking such as Loss3, Loss1, Loss2 is penalised.]

Contributions:
• We show that the training time can be reduced from quadratic to linear in the number of training samples.
• We improve on the state of the art for both example problems.


2. Method

Structured Output Ranking

• Structured output ranking generalizes ordinal regression to a structured output space.
• It learns a weight vector w such that input-output pairs with lower loss values Δi are assigned a higher compatibility score f(xi, yi) = wᵀϕ(xi, yi).
• The objective function (1) pays a hinge loss proportional to the difference in losses for a mis-ordering (a plausible form of (1) and of its 1-slack variant (2) is sketched after this list).
• Formulation (1) has O(n²) constraints and slack variables; the equivalent 1-slack formulation (2) has a single slack variable ξ.
• cij = |(Δi < Δj) ∧ (wᵀϕ(xi, yi) − wᵀϕ(xj, yj) < 1)|: a constraint is violated when the difference between the compatibility scores of the pair is less than the margin.
• Training of (2) involves optimizing the objective function with the violated constraints.
• If the loss values are discrete and finite, all the violated constraints can be found in linear time.
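The displayed equations appear only as images in the original poster; the following is a plausible reconstruction from the definitions above, assuming a standard pairwise ranking SVM with loss-difference weighting for (1) and the usual 1-slack aggregation for (2). The paper's exact weighting and normalisation may differ.

```latex
% (1) Pairwise formulation: one slack variable per ordered pair (i, j) with
%     Delta_i < Delta_j; the hinge loss is weighted by the loss difference
%     (an assumption based on "proportional to the difference in losses").
\begin{aligned}
\min_{w,\ \xi \ge 0}\quad & \tfrac{1}{2}\lVert w\rVert^{2}
  + C \sum_{(i,j):\,\Delta_i < \Delta_j} (\Delta_j - \Delta_i)\,\xi_{ij} \\
\text{s.t.}\quad & w^{\top}\phi(x_i, y_i) - w^{\top}\phi(x_j, y_j) \ \ge\ 1 - \xi_{ij}
  \qquad \forall (i,j):\ \Delta_i < \Delta_j
\end{aligned}

% (2) 1-slack reformulation: a single slack variable xi shared by all aggregated
%     constraints (the loss-difference weighting is omitted here for brevity).
\begin{aligned}
\min_{w,\ \xi \ge 0}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\,\xi \\
\text{s.t.}\quad & \sum_{(i,j):\,\Delta_i < \Delta_j} c_{ij}
  \big(w^{\top}\phi(x_i, y_i) - w^{\top}\phi(x_j, y_j)\big)
  \ \ge\ \sum_{(i,j):\,\Delta_i < \Delta_j} c_{ij} \ - \ \xi
  \qquad \forall\, c \in \{0,1\}^{n \times n}
\end{aligned}
```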

Linear Time Constraint Generation

[Figure (a), (b): the violated-constraint search over the sorted list S; the black bar marks a sample j with Δj > l, and the grey bars mark the samples with loss value l.]

• Sort the data samples into a list S in order of decreasing compatibility score.
• Iterate through the set of loss values l = l1 … lL. For each l:
  • If a sample j with Δj > l (black bar) is scored such that it violates the margin constraint with a sample i having Δi = l, it also violates the constraint with all subsequent samples i′ having Δi′ = l in the sorted list S.
  • By recording all the samples having loss value l (grey bars), all the violated constraints for pairs (i, j) with Δi = l can be found in one pass through the samples (see the sketch after this list).
• Code: http://www.robots.ox.ac.uk/~vgg/software/struct_rank/
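A minimal Python sketch of this one-pass search, assuming discrete loss values. For the 1-slack cutting plane only aggregate quantities over the violated pairs are needed, so the sketch counts violations per sample rather than enumerating pairs; the variable and function names are illustrative and not taken from the released package.

```python
def count_violations(scores, losses, margin=1.0):
    """For every sample j, count the pairs (i, j) with losses[i] < losses[j] and
    scores[i] - scores[j] < margin. Cost: O(n log n) for the sort plus one pass
    through the sorted list per distinct loss value."""
    n = len(scores)
    S = sorted(range(n), key=lambda k: scores[k], reverse=True)   # list S, decreasing score
    counts = [0] * n

    for l in sorted(set(losses)):                                 # loss values l1 ... lL
        grey = [k for k in S if losses[k] == l]                   # samples with loss l (grey bars)
        grey_scores = [scores[k] for k in grey]                   # non-increasing
        p = 0                                                     # start of the violating suffix of `grey`
        for j in S:                                               # one pass through S
            if losses[j] <= l:
                continue                                          # only higher-loss samples (black bars)
            # A grey sample i violates with j iff scores[i] - scores[j] < margin.
            # grey_scores is non-increasing, so the violating samples form a suffix,
            # and the threshold scores[j] + margin only decreases along S, so the
            # pointer p never has to move backwards.
            while p < len(grey) and grey_scores[p] >= scores[j] + margin:
                p += 1
            counts[j] += len(grey) - p
    return counts

# Example: three samples whose scores disagree with their losses.
# Prints [0, 1, 2]; the loss-2 sample violates the margin against both lower-loss samples.
print(count_violations(scores=[2.0, 1.5, 1.8], losses=[0, 1, 2]))
```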

Results

* The target class is bike.

• For multi-class taxonomic prediction, we learn the class hierarchies in a ranking setting.
• An image xi is represented by quantised SIFT descriptors encoded using locality-constrained linear encoding.
• The joint feature map ϕ is the standard one used in multi-class prediction: ϕ(xi, yi) = λ(yi) ⊗ xi, where the class attribute vector has λj(yi) = 1 if j = yi and zero otherwise (a small sketch follows this list).
• The number of distinct loss values is logarithmic in the number of class labels.
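A minimal sketch of this joint feature map; the shapes and names are illustrative.

```python
import numpy as np

def joint_feature_map(x, y, num_classes):
    """phi(x, y) = lambda(y) ⊗ x, where lambda(y) is the one-hot class attribute
    vector (lambda_j(y) = 1 iff j == y): x is placed in the block of the stacked
    weight vector that corresponds to class y, and all other blocks are zero."""
    lam = np.zeros(num_classes)
    lam[y] = 1.0
    return np.kron(lam, x)

def compatibility(w, x, y, num_classes):
    """f(x, y) = w . phi(x, y); ranking trains w so that lower-loss labels score higher."""
    return float(np.dot(w, joint_feature_map(x, y, num_classes)))
```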

• For person layout, the number of distinct loss values is limited by the number of parts.
