TRANSCRIPT
Fast Edge Detection Using Structured Forests
by Piotr Dollár, C. Lawrence Zitnick (PAMI 2015)
Vojtech Cvrcek
Reading Group Presentation
June 6, 2018
Fast edge detection June 6, 2018 1 / 22
Problem Definition
given a set of training images and corresponding segmentation masks
predict an edge map for a novel input image
Random Forest - Single Tree
a single tree maps inputs to outputs: ft : X → Y
each internal node j routes x left or right via a split function hθj : X → {L, R}
split type 1: h1(x; θ1j) = [x(k) < τ], with θ1j = (k, τ)
split type 2: h2(x; θ2j) = [x(k1) − x(k2) < τ], with θ2j = (k1, k2, τ)
combined: hθj(x) = δ h1(x; θ1j) + (1 − δ) h2(x; θ2j), with δ ∈ {0, 1}
(figure: a binary tree routing x ∈ X through L/R splits down to leaves y1, ..., y7)
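The two split-function families on this slide can be sketched in a few lines of Python (a toy illustration, not the authors' implementation; the feature indices and thresholds are made up):

```python
import numpy as np

def split_single(x, k, tau):
    """h1: route left (True) if feature k is below threshold tau."""
    return x[k] < tau

def split_pair(x, k1, k2, tau):
    """h2: route left if the difference of two features is below tau."""
    return x[k1] - x[k2] < tau

def split(x, theta, delta):
    """Combined split h = delta*h1 + (1 - delta)*h2, with delta in {0, 1}."""
    if delta == 1:
        k, tau = theta
        return split_single(x, k, tau)
    k1, k2, tau = theta
    return split_pair(x, k1, k2, tau)

x = np.array([0.2, 0.9, 0.4])
print(split(x, (0, 0.5), delta=1))     # True: x[0] = 0.2 < 0.5, go left
print(split(x, (1, 2, 0.1), delta=0))  # False: 0.9 - 0.4 >= 0.1, go right
```

In the structured-forest setting x is the long feature vector computed from an image patch, and each node's θj is chosen from randomly sampled candidates during training.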
Training
each tree is trained independently in a recursive manner
for a given node j and training set Sj ⊂ X × Y, randomly sample parameters θj from the parameter space
select θj resulting in a 'good' split of the data
(figure: examples of a good split and a bad split)
Training - Information Gain Criterion
information gain criterion:
Ij = I(Sj, SLj, SRj)   (1)
where SLj = {(x, y) ∈ Sj | h(x, θj) = 0} and SRj = Sj \ SLj are the two splits
θj = argmaxθ Ij(Sj, θ)
for multiclass classification (Y ⊂ Z) the standard definition of information gain is:
Ij = H(Sj) − Σk∈{L,R} (|Skj| / |Sj|) H(Skj)   (2)
where H(S) is either the Shannon entropy (H(S) = −Σy py log(py)) or alternatively the Gini impurity (H(S) = Σy py (1 − py))
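Equation (2) is cheap to compute directly; a toy Python sketch (not from the paper) with both impurity measures:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum_y p_y log(p_y)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity H(S) = sum_y p_y (1 - p_y)."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def info_gain(labels, left_mask):
    """Eq. (2): I_j = H(S_j) - sum_{k in {L,R}} |S_j^k|/|S_j| * H(S_j^k)."""
    left = [y for y, go_left in zip(labels, left_mask) if go_left]
    right = [y for y, go_left in zip(labels, left_mask) if not go_left]
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

labels = [0, 0, 1, 1]
print(info_gain(labels, [True, True, False, False]))  # perfect split: log 2 ≈ 0.693
print(info_gain(labels, [True, False, True, False]))  # uninformative split: 0.0
```

At training time θj is chosen as the sampled candidate maximizing this gain on the node's training set.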
Training
training stops when a maximum depth is reached, or if the information gain or the training set size falls below a fixed threshold
a single output y ∈ Y is assigned to a leaf node based on a problem-specific ensemble model
Testing
combining results from multiple trees depends on a problem-specific ensemble model
classification → majority voting
regression → averaging
Structured Forests
structured output space Y (e.g. segmentation masks)
the use of structured outputs in random forests presents the following challenges:
computing splits for structured output spaces of high dimension/complexity is time consuming
it is unclear how to define information gain
solution: map (somehow) the structured output space Y into a multiclass space C = {1, ..., k}, where the multiclass Ij applies
Structured Edges Clustering
How to efficiently compute splits? Can the task be transformed into a multiclass problem?
Y --?--> C = {1, ..., k}
start with an intermediate mapping:
Π : Y → Z   (3)
z = Π(y) is a long binary vector, which encodes whether every pair of pixels in y belongs to the same or a different segment
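The mapping Π can be illustrated on a tiny mask (a toy sketch; the real masks are 16 × 16, so z is far longer):

```python
import numpy as np
from itertools import combinations

def pi_mapping(y):
    """Pi: for every pair of pixels, encode whether they share a segment.
    y is a segmentation mask; returns a binary vector of length n*(n-1)/2."""
    flat = y.ravel()
    return np.array([int(flat[i] == flat[j])
                     for i, j in combinations(range(flat.size), 2)])

# Toy 2x2 mask with two segments (top row vs. bottom row).
y = np.array([[1, 1],
              [2, 2]])
z = pi_mapping(y)
print(z)       # [1 0 0 0 0 1]: only pairs (0,1) and (2,3) share a segment
print(len(z))  # 6 = C(4, 2)
```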
Structured Edges Clustering
the dimension of vectors z ∈ Z is reduced by PCA to m = 5, and clustering (k-means) splits Z into k = 2 clusters
Y --pairs--> Z --PCA, k-means--> C
the multiclass Ij can now be used
Structured Edges Clustering
since the elements of Y are of size 16 × 16 (256 pixels), the dimension of Z is (256 choose 2) = 32640
too expensive → randomly sample m = 256 dimensions of Z
Y --sampled pairs--> Z --PCA, k-means--> C
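The whole Y → C pipeline can be sketched with numpy alone. This is a toy illustration on 4 × 4 masks with assumed small parameters (m = 8 sampled pairs, 2 PCA dimensions, k = 2); the paper works on 16 × 16 masks with m = 256 and PCA to 5 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_pairs(mask, pairs):
    """Pi restricted to m sampled pixel pairs: 1 if the pair shares a segment."""
    flat = mask.ravel()
    return np.array([float(flat[i] == flat[j]) for i, j in pairs])

def pca(Z, dims):
    """Project rows of Z onto the top `dims` principal components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:dims].T

def kmeans(P, k, iters=20):
    """Plain k-means; returns a cluster label in {0, ..., k-1} per row."""
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((P[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([P[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels

# Toy masks: vertical splits at varying columns vs. horizontal splits at rows.
masks = []
for c in (1, 2, 3):
    m = np.zeros((4, 4), int); m[:, c:] = 1; masks.append(m)
for r in (1, 2, 3):
    m = np.zeros((4, 4), int); m[r:, :] = 1; masks.append(m)

pairs = [tuple(p) for p in rng.choice(16, size=(8, 2), replace=False)]
Z = np.array([sampled_pairs(m, pairs) for m in masks])  # shape (6, 8)
labels = kmeans(pca(Z, dims=2), k=2)
print(labels)  # one discrete class label per mask
```

The resulting discrete labels are exactly what the multiclass information gain of Eq. (2) needs at each node.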
Ensemble model
during training, we need to assign a single prediction to a leaf node
during testing, we need to combine multiple predictions into one
to select a single output from a set y1, ..., yk ∈ Y:
compute zi = Π(yi)
select yk* such that k* = argmink Σi,j (zk,j − zi,j)²   (medoid)
a domain-specific ensemble model (for edge maps): y'k* = E[y'i]
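The medoid selection above can be sketched directly (a toy illustration on short binary codes):

```python
import numpy as np

def medoid_index(zs):
    """Return k* = argmin_k sum_{i,j} (z_{k,j} - z_{i,j})^2 over codes z_i."""
    zs = np.asarray(zs, dtype=float)
    # Total squared distance from each candidate to all the others.
    costs = ((zs[:, None, :] - zs[None, :, :]) ** 2).sum(axis=(1, 2))
    return int(np.argmin(costs))

# Three binary codes; the middle one is closest to the others on average.
zs = [[0, 0, 0, 0],
      [0, 0, 0, 1],
      [0, 1, 1, 1]]
print(medoid_index(zs))  # 1
```

The chosen index k* picks the representative mask y_{k*}; distances are computed in Z because Y itself has no natural metric.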
Structured Forest Training - Overview
input:
image patch 32 × 32, described by 7228 candidate features
corresponding segmentation mask 16 × 16
randomly selected features per split
segmentation masks → clusters → split information gain
medoid → segmentation mask
averaging → soft edge map
Efficiency
Why is the method so fast?
a single decision tree evaluation predicts a whole 16 × 16 edge patch → lots of pixel information
lots of pixel information per evaluation → a small random forest suffices → fast evaluation
Multiscale Detection
the multiscale version runs the detector on the original, double, and half resolution of an input image
the resulting three edge maps are averaged
slower (×5)
improved edge quality
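A minimal sketch of the multiscale scheme, with a stand-in single-scale detector (here plain gradient magnitude instead of the structured forest, and a crude nearest-neighbor resize):

```python
import numpy as np

def resize_nn(img, shape):
    """Nearest-neighbor resize (enough for a sketch)."""
    rows = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[rows][:, cols]

def multiscale_edges(img, detect):
    """Run `detect` (any image -> edge map function) at half, original, and
    double resolution, resize the maps back, and average the three."""
    h, w = img.shape
    maps = []
    for scale in (0.5, 1.0, 2.0):
        scaled = resize_nn(img, (int(h * scale), int(w * scale)))
        maps.append(resize_nn(detect(scaled), (h, w)))
    return np.mean(maps, axis=0)

# Stand-in detector: gradient magnitude (the real method uses the forest).
grad_mag = lambda im: np.abs(np.gradient(im.astype(float))[0]) + \
                      np.abs(np.gradient(im.astype(float))[1])
img = np.zeros((8, 8)); img[:, 4:] = 1.0
edges = multiscale_edges(img, grad_mag)
print(edges.shape)  # (8, 8)
```

Three detector runs (one of them on a 4×-larger image) account for the roughly ×5 slowdown noted above.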
Edge Sharpening
individual predictions are noisy and do not perfectly align to each other or to the underlying image data
sharpening takes an individual prediction y and produces a new mask that better aligns it to the image patch x:
compute the mean color of each segment s: µs = E[x(j) | y(j) = s]
reassign pixel j if its color x(j) is closer to a different segment's mean (s* = argmins ||µs − x(j)||) and that segment's label occurs in the pixel's 4-connected neighborhood
sharpening can be repeated multiple times; Dollár & Zitnick claim that in practice two steps suffice
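The two sharpening sub-steps can be sketched on a single-channel toy patch (an illustration, not the paper's implementation; the real method works on color patches):

```python
import numpy as np

def sharpen(x, y, steps=2):
    """Toy sharpening: reassign each pixel to the 4-connected neighboring
    segment whose mean intensity is closest to x(j)."""
    y = y.copy()
    h, w = y.shape
    for _ in range(steps):
        # Mean intensity mu_s of each segment s.
        mu = {s: x[y == s].mean() for s in np.unique(y)}
        new = y.copy()
        for i in range(h):
            for j in range(w):
                # Candidate labels: own label plus 4-connected neighbors.
                cands = {y[i, j]}
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        cands.add(y[i + di, j + dj])
                new[i, j] = min(cands, key=lambda s: abs(mu[s] - x[i, j]))
        y = new
    return y

# Toy patch: true intensity edge at column 2, but the mask is shifted right.
x = np.array([[0.0, 0.1, 0.9, 1.0]] * 4)
y = np.array([[0, 0, 0, 1]] * 4)
print(sharpen(x, y))  # boundary moves to column 2: rows become [0, 0, 1, 1]
```

After one step the mis-assigned column snaps to the brighter segment; further steps leave the mask unchanged, matching the "two steps suffice" observation.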
Conclusion
real-time
structured learning
a suitable preprocessing step for methods requiring speed