TRANSCRIPT
Fast Edge Detection Using Structured Forests
by Piotr Dollár, C. Lawrence Zitnick (PAMI 2015)
Vojtech Cvrcek
Reading Group Presentation
June 6, 2018
Fast edge detection June 6, 2018 1 / 22
Problem Definition
given a set of training images and corresponding segmentation masks
predict an edge map for a novel input image
Random Forest - Single Tree
a single tree maps inputs to outputs: ft : X → Y
each internal node j routes x left or right via a split function hθj : X → {L, R}
split type 1: h1(x; θ1j) = [x(k) < τ], with θ1j = (k, τ)
split type 2: h2(x; θ2j) = [x(k1) − x(k2) < τ], with θ2j = (k1, k2, τ)
combined: hθj(x) = δ h1(x; θ1j) + (1 − δ) h2(x; θ2j), with δ ∈ {0, 1}
(figure: a binary tree routing x ∈ X through L/R splits down to leaves y1, ..., y7)
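The two split-function families on this slide can be sketched in a few lines of Python (a toy illustration, not the authors' implementation; the feature indices and thresholds are made up):

```python
import numpy as np

def split_single(x, k, tau):
    """h1: route left (True) if feature k is below threshold tau."""
    return x[k] < tau

def split_pair(x, k1, k2, tau):
    """h2: route left if the difference of two features is below tau."""
    return x[k1] - x[k2] < tau

def split(x, theta, delta):
    """Combined split h = delta*h1 + (1 - delta)*h2, with delta in {0, 1}."""
    if delta == 1:
        k, tau = theta
        return split_single(x, k, tau)
    k1, k2, tau = theta
    return split_pair(x, k1, k2, tau)

x = np.array([0.2, 0.9, 0.4])
print(split(x, (0, 0.5), delta=1))     # True: x[0] = 0.2 < 0.5, go left
print(split(x, (1, 2, 0.1), delta=0))  # False: 0.9 - 0.4 >= 0.1, go right
```

In the structured-forest setting x is the long feature vector computed from an image patch, and each node's θj is chosen from randomly sampled candidates during training.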
Training
each tree is trained independently in a recursive manner
for a given node j and training set Sj ⊂ X × Y, randomly sample parameters θj from the parameter space
select θj resulting in a 'good' split of the data
(figure: examples of a good split and a bad split)
Training - Information Gain Criterion
information gain criterion:
Ij = I(Sj, SLj, SRj)   (1)
where SLj = {(x, y) ∈ Sj | h(x, θj) = 0} and SRj = Sj \ SLj are the two splits
θj = argmaxθ Ij(Sj, θ)
for multiclass classification (Y ⊂ Z) the standard definition of information gain is:
Ij = H(Sj) − Σk∈{L,R} (|Skj| / |Sj|) H(Skj)   (2)
where H(S) is either the Shannon entropy (H(S) = −Σy py log(py)) or alternatively the Gini impurity (H(S) = Σy py (1 − py))
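Equation (2) is cheap to compute directly; a toy Python sketch (not from the paper) with both impurity measures:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum_y p_y log(p_y)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity H(S) = sum_y p_y (1 - p_y)."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def info_gain(labels, left_mask):
    """Eq. (2): I_j = H(S_j) - sum_{k in {L,R}} |S_j^k|/|S_j| * H(S_j^k)."""
    left = [y for y, go_left in zip(labels, left_mask) if go_left]
    right = [y for y, go_left in zip(labels, left_mask) if not go_left]
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

labels = [0, 0, 1, 1]
print(info_gain(labels, [True, True, False, False]))  # perfect split: log 2 ≈ 0.693
print(info_gain(labels, [True, False, True, False]))  # uninformative split: 0.0
```

At training time θj is chosen as the sampled candidate maximizing this gain on the node's training set.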
Training
training stops when a maximum depth is reached, or if the information gain or the training set size falls below a fixed threshold
a single output y ∈ Y is assigned to a leaf node based on a problem-specific ensemble model
Testing
combining results from multiple trees depends on a problem-specific ensemble model
classification → majority voting
regression → averaging
Structured Forests
structured output space Y (e.g. segmentation masks)
the use of structured outputs in random forests presents the following challenges:
computing splits for structured output spaces of high dimension/complexity is time consuming
it is unclear how to define information gain
solution: map (somehow) the structured output space Y into a multiclass space C = {1, ..., k}, where the multiclass Ij applies
Structured Edges Clustering
How to efficiently compute splits? Can the task be transformed into a multiclass problem?
Y --?--> C = {1, ..., k}
start with an intermediate mapping:
Π : Y → Z   (3)
z = Π(y) is a long binary vector, which encodes whether every pair of pixels in y belongs to the same or a different segment
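The mapping Π can be illustrated on a tiny mask (a toy sketch; the real masks are 16 × 16, so z is far longer):

```python
import numpy as np
from itertools import combinations

def pi_mapping(y):
    """Pi: for every pair of pixels, encode whether they share a segment.
    y is a segmentation mask; returns a binary vector of length n*(n-1)/2."""
    flat = y.ravel()
    return np.array([int(flat[i] == flat[j])
                     for i, j in combinations(range(flat.size), 2)])

# Toy 2x2 mask with two segments (top row vs. bottom row).
y = np.array([[1, 1],
              [2, 2]])
z = pi_mapping(y)
print(z)       # [1 0 0 0 0 1]: only pairs (0,1) and (2,3) share a segment
print(len(z))  # 6 = C(4, 2)
```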
Structured Edges Clustering
the dimension of vectors z ∈ Z is reduced by PCA to m = 5, and clustering (k-means) splits Z into k = 2 clusters
Y --pairs--> Z --PCA, k-means--> C
the multiclass Ij can now be used
Structured Edges Clustering
since the elements of Y are of size 16 × 16 (256 pixels), the dimension of Z is (256 choose 2) = 32640
too expensive → randomly sample m = 256 dimensions of Z
Y --sampled pairs--> Z --PCA, k-means--> C
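The whole Y → C pipeline can be sketched with numpy alone. This is a toy illustration on 4 × 4 masks with assumed small parameters (m = 8 sampled pairs, 2 PCA dimensions, k = 2); the paper works on 16 × 16 masks with m = 256 and PCA to 5 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_pairs(mask, pairs):
    """Pi restricted to m sampled pixel pairs: 1 if the pair shares a segment."""
    flat = mask.ravel()
    return np.array([float(flat[i] == flat[j]) for i, j in pairs])

def pca(Z, dims):
    """Project rows of Z onto the top `dims` principal components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:dims].T

def kmeans(P, k, iters=20):
    """Plain k-means; returns a cluster label in {0, ..., k-1} per row."""
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((P[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([P[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels

# Toy masks: vertical splits at varying columns vs. horizontal splits at rows.
masks = []
for c in (1, 2, 3):
    m = np.zeros((4, 4), int); m[:, c:] = 1; masks.append(m)
for r in (1, 2, 3):
    m = np.zeros((4, 4), int); m[r:, :] = 1; masks.append(m)

pairs = [tuple(p) for p in rng.choice(16, size=(8, 2), replace=False)]
Z = np.array([sampled_pairs(m, pairs) for m in masks])  # shape (6, 8)
labels = kmeans(pca(Z, dims=2), k=2)
print(labels)  # one discrete class label per mask
```

The resulting discrete labels are exactly what the multiclass information gain of Eq. (2) needs at each node.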
Ensemble model
during training, we need to assign a single prediction to a leaf node
during testing, we need to combine multiple predictions into one
to select a single output from a set y1, ..., yk ∈ Y:
compute zi = Π(yi)
select yk* such that k* = argmink Σi,j (zk,j − zi,j)²   (medoid)
a domain-specific ensemble model (for edge maps): y'k* = E[y'i]
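The medoid selection above can be sketched directly (a toy illustration on short binary codes):

```python
import numpy as np

def medoid_index(zs):
    """Return k* = argmin_k sum_{i,j} (z_{k,j} - z_{i,j})^2 over codes z_i."""
    zs = np.asarray(zs, dtype=float)
    # Total squared distance from each candidate to all the others.
    costs = ((zs[:, None, :] - zs[None, :, :]) ** 2).sum(axis=(1, 2))
    return int(np.argmin(costs))

# Three binary codes; the middle one is closest to the others on average.
zs = [[0, 0, 0, 0],
      [0, 0, 0, 1],
      [0, 1, 1, 1]]
print(medoid_index(zs))  # 1
```

The chosen index k* picks the representative mask y_{k*}; distances are computed in Z because Y itself has no natural metric.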
Structured Forest Training - Overview
input:
image patch 32 × 32, described by 7228 candidate features
corresponding segmentation mask 16 × 16
randomly selected features per split
segmentation masks → clusters → split information gain
medoid → segmentation mask
averaging → soft edge map
Efficiency
Why is the method so fast?
a single decision tree evaluation predicts a whole 16 × 16 edge patch → lots of pixel information
lots of pixel information per evaluation → a small random forest suffices → fast evaluation
Multiscale Detection
the multiscale version runs the detector on the original, double, and half resolution of an input image
the resulting three edge maps are averaged
slower (×5)
improved edge quality
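A minimal sketch of the multiscale scheme, with a stand-in single-scale detector (here plain gradient magnitude instead of the structured forest, and a crude nearest-neighbor resize):

```python
import numpy as np

def resize_nn(img, shape):
    """Nearest-neighbor resize (enough for a sketch)."""
    rows = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[rows][:, cols]

def multiscale_edges(img, detect):
    """Run `detect` (any image -> edge map function) at half, original, and
    double resolution, resize the maps back, and average the three."""
    h, w = img.shape
    maps = []
    for scale in (0.5, 1.0, 2.0):
        scaled = resize_nn(img, (int(h * scale), int(w * scale)))
        maps.append(resize_nn(detect(scaled), (h, w)))
    return np.mean(maps, axis=0)

# Stand-in detector: gradient magnitude (the real method uses the forest).
grad_mag = lambda im: np.abs(np.gradient(im.astype(float))[0]) + \
                      np.abs(np.gradient(im.astype(float))[1])
img = np.zeros((8, 8)); img[:, 4:] = 1.0
edges = multiscale_edges(img, grad_mag)
print(edges.shape)  # (8, 8)
```

Three detector runs (one of them on a 4×-larger image) account for the roughly ×5 slowdown noted above.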
Edge Sharpening
individual predictions are noisy and do not perfectly align to each other or to the underlying image data
sharpening takes an individual prediction y and produces a new mask that better aligns it to the image patch x:
compute the mean color of each segment s: µs = E[x(j) | y(j) = s]
reassign pixel j if its color x(j) is closer to a different segment's mean (s* = argmins ||µs − x(j)||) and that segment's label occurs in the pixel's 4-connected neighborhood
sharpening can be repeated multiple times; Dollár & Zitnick claim that in practice two steps suffice
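The two sharpening sub-steps can be sketched on a single-channel toy patch (an illustration, not the paper's implementation; the real method works on color patches):

```python
import numpy as np

def sharpen(x, y, steps=2):
    """Toy sharpening: reassign each pixel to the 4-connected neighboring
    segment whose mean intensity is closest to x(j)."""
    y = y.copy()
    h, w = y.shape
    for _ in range(steps):
        # Mean intensity mu_s of each segment s.
        mu = {s: x[y == s].mean() for s in np.unique(y)}
        new = y.copy()
        for i in range(h):
            for j in range(w):
                # Candidate labels: own label plus 4-connected neighbors.
                cands = {y[i, j]}
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        cands.add(y[i + di, j + dj])
                new[i, j] = min(cands, key=lambda s: abs(mu[s] - x[i, j]))
        y = new
    return y

# Toy patch: true intensity edge at column 2, but the mask is shifted right.
x = np.array([[0.0, 0.1, 0.9, 1.0]] * 4)
y = np.array([[0, 0, 0, 1]] * 4)
print(sharpen(x, y))  # boundary moves to column 2: rows become [0, 0, 1, 1]
```

After one step the mis-assigned column snaps to the brighter segment; further steps leave the mask unchanged, matching the "two steps suffice" observation.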
Conclusion
real-time
structured learning
a suitable preprocessing step for methods requiring speed