ICCV 2009: MAP Inference in Discrete Models, Part 5


Course Program

9.30-10.00 Introduction (Andrew Blake)

10.00-11.00 Discrete Models in Computer Vision (Carsten Rother)

15min Coffee break

11.15-12.30 Message Passing: DP, TRW, LP relaxation (Pawan Kumar)

12.30-13.00 Quadratic pseudo-boolean optimization (Pushmeet Kohli)

1 hour Lunch break

14:00-15.00 Transformation and move-making methods (Pushmeet Kohli)

15:00-15.30 Speed and Efficiency (Pushmeet Kohli)

15min Coffee break

15:45-16.15 Comparison of Methods (Carsten Rother)

16:15-17.30 Recent Advances: Dual-decomposition, higher-order, etc. (Carsten Rother + Pawan Kumar)

All material will be online after the conference: http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/

Comparison of Optimization Methods

Carsten Rother

Microsoft Research Cambridge

Why is good optimization important?

[Data courtesy of Oliver Woodford]

Problem: minimize a binary 4-connected pairwise MRF (choose a colour mode at each pixel)

Input: Image sequence

Output: New view

[Fitzgibbon et al. ’03]

Why is good optimization important?

Belief Propagation; ICM; Simulated Annealing

Ground Truth

QPBOP [Boros ’06, Rother ’07]

Global Minimum

Graph Cut with truncation [Rother et al. ’05]

Comparison papers

• Binary, highly-connected MRFs [Rother et al. ’07]

• Multi-label, 4-connected MRFs [Szeliski et al. ’06, ’08], all online: http://vision.middlebury.edu/MRF/

• Multi-label, highly-connected MRFs [Kolmogorov et al. ’06]

Random MRFs

Three important factors:

• Connectivity (average degree of a node)

• Unary strength w

• Percentage of non-submodular terms (NS)

E(x) = w ∑ θi(xi) + ∑ θij(xi,xj)
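As a concrete reference for this energy, a minimal sketch that evaluates E(x) and brute-forces the global minimum on a toy model (the potentials below are hypothetical illustrations, not the benchmark data):

```python
import itertools

def energy(x, unary, pairwise, edges, w=1.0):
    """E(x) = w * sum_i theta_i(x_i) + sum_(i,j) theta_ij(x_i, x_j)."""
    e = w * sum(unary[i][x[i]] for i in range(len(x)))
    e += sum(pairwise[(i, j)][x[i]][x[j]] for (i, j) in edges)
    return e

# Tiny 3-node binary chain with Potts smoothness (penalty 1 for disagreement).
unary = [[0.0, 0.8], [0.5, 0.2], [1.0, 0.0]]
edges = [(0, 1), (1, 2)]
pairwise = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}

# Brute-force the global minimum (feasible only for tiny models).
best = min(itertools.product([0, 1], repeat=3),
           key=lambda x: energy(x, unary, pairwise, edges))
```

Here the minimizer is the all-ones labeling with E = 1.0; raising w makes the unary terms dominate the smoothness terms, which is exactly the "unary strength" factor above.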

Computer Vision Problems

[Plots: percentage of unlabeled pixels vs. runtime (sec); energy vs. runtime (sec)]

Conclusions:

• Connectivity is a crucial factor

• Simple methods like Simulated Annealing are sometimes best

Diagram Recognition [Szummer et al. ’04]

71 nodes; 4.8 con.; 28% non-sub; 0.5 unary strength

Ground truth

GraphCut E=119 (0 sec); ICM E=999 (0 sec); BP E=25 (0 sec)

QPBO: 56.3% unlabeled (0 sec); QPBOP (0 sec): Global Min.; Sim. Ann. E=0 (0.28 sec)

• 2700 test cases: QPBO solved nearly all (QPBOP solves all)

Binary Image Deconvolution

50x20 nodes; 80 con.; 100% non-sub; 109 unary strength

Ground Truth Input

MRF with 80-connectivity (illustration): each pixel interacts with every pixel whose blur window overlaps its own.

5x5 blur kernel (uniform weights, as illustrated):

0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
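The 80-connectivity follows from the kernel size: two pixels share a data-term interaction whenever their 5x5 blur windows overlap, i.e. whenever they are at most 4 apart in each direction. A quick sanity check of that count (assuming a K×K kernel, so offsets up to K−1):

```python
K = 5  # blur kernel size
# Two pixels interact iff their KxK windows overlap: |dx| <= K-1 and |dy| <= K-1.
neighbors = [(dx, dy)
             for dx in range(-(K - 1), K)
             for dy in range(-(K - 1), K)
             if (dx, dy) != (0, 0)]
connectivity = len(neighbors)  # (2K - 1)**2 - 1, i.e. 80 for K = 5
```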

Binary Image Deconvolution

50x20 nodes; 80 con.; 100% non-sub; 109 unary strength

Ground Truth; Input; QPBO 80% unlab. (0.1 sec)

ICM E=6 (0.03 sec); QPBOP 80% unlab. (0.9 sec); GC E=999 (0 sec)

BP E=71 (0.9 sec); QPBOP+BP+I E=8.1 (31 sec); Sim. Ann. E=0 (1.3 sec)

Comparison papers

• Binary, highly-connected MRFs [Rother et al. ’07]
  Conclusion: low connectivity tractable: QPBO(P)

• Multi-label, 4-connected MRFs [Szeliski et al. ’06, ’08], all online: http://vision.middlebury.edu/MRF/

• Multi-label, highly-connected MRFs [Kolmogorov et al. ’06]

Multiple labels – 4 connected

[Szeliski et al. ’06, ’08]

Stereo

Panoramic stitching

Image segmentation; de-noising; in-painting

“Attractive Potentials”

Stereo

Conclusions:

– Solved by alpha-expansion and TRW-S (within 0.01%-0.9% of the lower bound, true for all tests!)

– Expansion-move always better than swap-move

[Figure: image, ground truth, and TRW-S result for two stereo pairs]

De-noising and in-painting

Conclusion:

– Alpha-expansion has problems with smooth areas (potential solution: fusion-move [Lempitsky et al. ’07])

[Figure: ground truth, TRW-S, alpha-expansion, noisy input]

Panoramic stitching

• Unordered labels are (slightly) more challenging

Comparison papers

• Binary, highly-connected MRFs [Rother et al. ’07]
  Conclusion: low connectivity tractable (QPBO)

• Multi-label, 4-connected MRFs [Szeliski et al. ’06, ’08], all online: http://vision.middlebury.edu/MRF/
  Conclusion: solved by expansion-move and TRW-S (within 0.01-0.9% of the lower bound)

• Multi-label, highly-connected MRFs [Kolmogorov et al. ’06]

Multiple labels – highly connected

Stereo with occlusion: each pixel is connected to D pixels in the other image.

E(d): {1,…,D}^2n → R

[Kolmogorov et al. ’06]

Multiple labels – highly connected

• Alpha-exp. considerably better than message passing

Tsukuba: 16 labels Cones: 56 labels

Potential reason: smaller connectivity in one expansion-move

Comparison: 4-con. versus highly con.

Conclusion:• highly connected graphs are harder to optimize

Lower bound scaled to 100%:

              Tsukuba (E)   Map (E)     Venus (E)
highly-con.   103.09%       103.28%     102.26%
4-con.        100.004%      100.056%    100.014%

Comparison papers

• Binary, highly-connected MRFs [Rother et al. ’07]
  Conclusion: low connectivity tractable (QPBO)

• Multi-label, 4-connected MRFs [Szeliski et al. ’06, ’08], all online: http://vision.middlebury.edu/MRF/
  Conclusion: solved by alpha-expansion and TRW (within 0.9% of the lower bound)

• Multi-label, highly-connected MRFs [Kolmogorov et al. ’06]
  Conclusion: challenging optimization (alpha-expansion best)

How to efficiently optimize general highly-connected (higher-order) MRFs is still an open question


Advanced Topics: Optimizing Higher-Order MRFs

Carsten Rother

Microsoft Research Cambridge

Challenging Optimization Problems

• How to solve higher-order MRFs:

• Possible Approaches:

- Convert to Pairwise MRF (Pushmeet has explained)

- Branch & MinCut (Pushmeet has explained)

- Add global constraint to LP relaxation

- Dual Decomposition

Add global constraints to LP

Basic idea: for a region T, add the constraint

∑_{i∈T} xi ≥ 1

References:

[K. Kolev et al. ECCV ’08] silhouette constraint

[Nowozin et al. CVPR ’09] connectivity prior

[Lempitsky et al. ICCV ’09] bounding box prior (see talk on Thursday)

Dual Decomposition

• Well known in the optimization community [Bertsekas ’95, ’99]

• Other names: “Master-Slave” [Komodakis et al. ’07, ’09]

• Examples of dual-decomposition approaches:

– Solving the LP of TRW [Komodakis et al. ICCV ’07]

– Image segmentation with a connectivity prior [Vicente et al. CVPR ’08]

– Feature matching [Torresani et al. ECCV ’08]

– Optimizing higher-order clique MRFs [Komodakis et al. CVPR ’09]

– Marginal Probability Field [Woodford et al. ICCV ’09]

– Jointly optimizing appearance and segmentation [Vicente et al. ICCV ’09]

Dual Decomposition

Decompose E(x) = E1(x) + E2(x): E is hard to optimize, but E1 and E2 are each possible to optimize.

min_x E(x) = min_x [ E1(x) + θᵀx + E2(x) − θᵀx ]

≥ min_x1 [ E1(x1) + θᵀx1 ] + min_x2 [ E2(x2) − θᵀx2 ] = L(θ)   (“lower bound”)

• θ is called the dual vector (same size as x)

• Goal: max_θ L(θ) ≤ min_x E(x)

• Properties:

– L(θ) is concave (the optimal bound can be found)

– If x1 = x2 then the problem is solved (not guaranteed to happen)

Why is the lower bound a concave function?

L(θ) = min_x1 [ E1(x1) + θᵀx1 ] + min_x2 [ E2(x2) − θᵀx2 ],   L(θ): Rⁿ → R

Each term Li(θ) is the lower envelope of linear functions of θ: one plane Ei(x') + θᵀx' for every labeling x'. A pointwise minimum of linear functions is concave, and L(θ) is concave since it is a sum of concave functions.
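The envelope argument can be checked numerically; a 1-D toy envelope with hypothetical offsets and slopes (each pair standing for one labeling's E(x') and x'):

```python
# Lower envelope of linear functions theta -> e + theta * x  (toy (offset, slope) pairs).
planes = [(0.0, 1.0), (1.5, 0.0), (0.5, -1.0)]

def L(theta):
    """Pointwise minimum over the planes: concave by construction."""
    return min(e + theta * x for e, x in planes)

# Concavity test: L at a midpoint dominates the average of the endpoint values.
samples = [(-3.0, 2.0), (-1.0, 4.0), (0.0, 5.0)]
ok = all(L((a + b) / 2) >= (L(a) + L(b)) / 2 - 1e-12 for a, b in samples)
```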

How to maximize the lower bound?

If L(θ) were differentiable we could use gradient ascent; L(θ) is not differentiable, so we use a subgradient approach [Shor ’85].

At the current point θ', the minimizers x1 and x2 of the two subproblems give a subgradient g = x1 − x2 of L, and the update is

θ'' = θ' + λ g = θ' + λ (x1 − x2)

Dual Decomposition

L(θ) = min_x1 [ E1(x1) + θᵀx1 ] + min_x2 [ E2(x2) − θᵀx2 ]

Subproblem 1 (“slave”): x1 = argmin_x1 [ E1(x1) + θᵀx1 ]

Subproblem 2 (“slave”): x2 = argmin_x2 [ E2(x2) − θᵀx2 ]

Subgradient optimization (“master”): θ = θ + λ (x1 − x2)
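The master-slave loop can be sketched end to end; a toy version in which both slaves are solved by brute force, with hypothetical energies and a fixed step size (proper step-size schedules are the [Bertsekas ’95] topic mentioned below):

```python
import itertools

def dual_decomposition(E1, E2, n, steps=100, lam=0.5):
    """Maximize L(theta) = min_x1 [E1(x1)+theta.x1] + min_x2 [E2(x2)-theta.x2]."""
    labelings = list(itertools.product([0, 1], repeat=n))
    dot = lambda t, x: sum(ti * xi for ti, xi in zip(t, x))
    theta = [0.0] * n
    best_bound, best_x = float("-inf"), None
    for _ in range(steps):
        # Slaves: each subproblem solved exactly (brute force on this tiny model).
        x1 = min(labelings, key=lambda x: E1(x) + dot(theta, x))
        x2 = min(labelings, key=lambda x: E2(x) - dot(theta, x))
        best_bound = max(best_bound,
                         E1(x1) + dot(theta, x1) + E2(x2) - dot(theta, x2))
        # Keep the best primal solution seen ("pick x as the best of x1 or x2").
        for x in (x1, x2):
            if best_x is None or E1(x) + E2(x) < E1(best_x) + E2(best_x):
                best_x = x
        if x1 == x2:
            break  # slaves agree: the bound is tight and x1 minimizes E1 + E2
        # Master: subgradient step pushing the slaves towards agreement.
        theta = [t + lam * (a - b) for t, a, b in zip(theta, x1, x2)]
    return best_x, best_bound

# Toy decomposition E(x) = E1(x) + E2(x) over 3 binary variables.
E1 = lambda x: x[0] + 2 * (x[0] != x[1])            # unary on x0 + smoothness (x0, x1)
E2 = lambda x: 2 * (1 - x[2]) + 2 * (x[1] != x[2])  # unary on x2 + smoothness (x1, x2)
x, bound = dual_decomposition(E1, E2, 3)
```

On this toy problem the best bound reaches min E = 1 even though the slaves keep cycling with a fixed λ, which illustrates both points on the next slide: the bound need not rise monotonically, and the primal solution is taken as the best of x1 and x2.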

Example optimization

• Guaranteed to converge to the optimal bound L(θ*)

• Choose the step width λ correctly ([Bertsekas ’95])

• Pick the solution x as the better of x1 and x2

• E and L can both increase and decrease during optimization

• Each step, θ gets closer to the optimal θ*

Why can the lower bound go down?

L(θ) is the lower envelope of planes (here in 3D): a subgradient step can land on a point θ' with L(θ') ≤ L(θ), even though θ' is closer to θ*.

Analyse the model

L(θ) = min_x1 [ E1(x1) + θᵀx1 ] + min_x2 [ E2(x2) − θᵀx2 ]

Update step: θ'' = θ' + λ (x1 − x2). Look at pixel p:

Case 1: x1p = x2p, then θp'' = θp'

Case 2: x1p = 1, x2p = 0, then θp'' = θp' + λ (pushes x1p towards 0 and x2p towards 1)

Case 3: x1p = 0, x2p = 1, then θp'' = θp' − λ (pushes x1p towards 1 and x2p towards 0)


Example 1: Segmentation and Connectivity

Foreground object must be connected:

User input; standard MRF; standard MRF + h (zoom in)

E(x) = ∑ θi(xi) + ∑ θij(xi,xj) + h(x)

h(x) = { ∞ if x is not 4-connected; 0 otherwise }

[Vicente et al. ’08]
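The hard constraint h(x) is just a connectedness test on the foreground pixels; a minimal sketch via breadth-first search over the 4-neighbourhood (a hypothetical helper for illustration, not code from [Vicente et al. ’08]):

```python
from collections import deque

def h(mask):
    """0 if the foreground (nonzero pixels) of the 2-D mask is 4-connected, else infinity."""
    fg = {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}
    if len(fg) <= 1:
        return 0.0  # empty or single-pixel foreground is trivially connected
    start = next(iter(fg))
    seen, frontier = {start}, deque([start])
    while frontier:  # BFS inside the foreground over the 4-neighbourhood
        r, c = frontier.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in fg and nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return 0.0 if seen == fg else float("inf")
```

For example, a diagonal-only pair of pixels gets infinite cost, since diagonal adjacency does not count as 4-connected.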

Example 1: Segmentation and Connectivity

Derive the lower bound, with E1(x) = ∑ θi(xi) + ∑ θij(xi,xj):

min_x E(x) = min_x [ E1(x) + θᵀx + h(x) − θᵀx ]

≥ min_x1 [ E1(x1) + θᵀx1 ] + min_x2 [ h(x2) − θᵀx2 ] = L(θ)

Subproblem 1: unary + pairwise terms; global minimum by GraphCut.

Subproblem 2: unary terms + connectivity constraint; global minimum by Dijkstra.

But: the lower bound was not tight for any example.

Example 1: Segmentation and Connectivity

Let x' be the indicator vector of all pairwise terms, and keep

E(x) = ∑ θi(xi) + ∑ θij(xi,xj) + h(x),   h(x) = { ∞ if x is not 4-connected; 0 otherwise }

with E1(x) the unary + pairwise part. Derive the lower bound (h appears twice below, but since h(x) ∈ {0, ∞} and any finite-energy x has h(x) = 0, this does not change the minimum):

min_{x,x'} E(x) = min_{x,x'} [ E1(x) + θᵀx + θ'ᵀx' + h(x) − θᵀx + h(x) − θ'ᵀx' ]

≥ min_{x1,x'1} [ E1(x1) + θᵀx1 + θ'ᵀx'1 ] + min_x2 [ h(x2) − θᵀx2 ] + min_{x3,x'3} [ h(x3) − θ'ᵀx'3 ] = L(θ,θ')

Subproblem 1: unary + pairwise terms; global minimum by GraphCut.

Subproblem 2: unary terms + connectivity constraint; global minimum by Dijkstra.

Subproblem 3: pairwise terms + connectivity constraint; lower bound based on minimal paths on a dual graph.

Results: Segmentation and Connectivity

Globally optimal in 12 out of 40 cases.

[Images: input, extra input, GraphCut, global minimum]

A heuristic method, DijkstraGC, is faster and gives empirically the same or better results.

[Vicente et al. ’08]

Example 2: Dual of the LP Relaxation (from Pawan Kumar’s part) [Wainwright et al. ’01]

Decompose the graph (e.g. the 3x3 grid Va…Vi) into trees indexed by i (e.g. its rows and columns). Use the overcomplete representation x = (xa0, xa1, …, xab00, xab01, …) with parameter vector θ = (θa0, θa1, …, θab00, θab01, …), and split θ into per-tree vectors θ^i. Each tree subproblem

q*(θ^i) = min_xi θ^iᵀxi

is solved exactly by dynamic programming on the tree. The dual of the LP is

max_{θ^i} ∑_i q*(θ^i) = L({θ^i})   subject to   ∑_i θ^i = θ

Example 2: Dual of the LP Relaxation

“Original problem” vs. “lower bound” (i ranges over the trees):

min_x θᵀx = min_x ∑_i θ^iᵀx ≥ ∑_i min_xi θ^iᵀxi = ∑_i q*(θ^i) = L({θ^i})

Projected subgradient method:

θ^i = [ θ^i + λ xi ]_Ω,   Ω = { {θ^i} | ∑_i θ^i = θ }

q*(θ^i) is concave with respect to θ^i (a lower envelope of the planes θ^iᵀx'), so we are guaranteed to reach the optimal lower bound!
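The projection [·]_Ω onto the affine set ∑_i θ^i = θ simply redistributes the constraint violation equally across the trees; a sketch of one projected-subgradient step with toy numbers (not the benchmark potentials):

```python
def project(thetas, theta):
    """Euclidean projection of the tree vectors {theta^i} onto sum_i theta^i = theta."""
    m, n = len(thetas), len(theta)
    resid = [theta[k] - sum(t[k] for t in thetas) for k in range(n)]
    # Splitting the residual equally over the m trees gives the closest feasible point.
    return [[t[k] + resid[k] / m for k in range(n)] for t in thetas]

def step(thetas, theta, xs, lam):
    """One projected-subgradient update: theta^i <- [theta^i + lam * x_i]_Omega."""
    moved = [[t[k] + lam * x[k] for k in range(len(t))] for t, x in zip(thetas, xs)]
    return project(moved, theta)

# Toy setup: two trees sharing three coordinates of theta.
theta = [1.0, -2.0, 0.5]
thetas = [[0.5, -1.0, 0.0], [0.5, -1.0, 0.5]]       # feasible: sums to theta
new = step(thetas, theta, xs=[[1, 0, 1], [0, 1, 0]], lam=0.2)
```

After the step the tree vectors again sum to θ; equivalently, the update adds λ(x_i − x̄) to each θ^i, where x̄ is the mean of the tree minimizers.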


Example 2: optimizing the LP of TRW [Komodakis et al. ’07]

TRW-S compared to dual decomposition (DD):

• Not guaranteed to reach the optimal bound (DD is)

• Its lower bound always goes up (DD’s does not)

• Needs min-marginals (DD does not)

• DD is parallelizable (every tree in DD can be optimized separately)

The problem is not NP-hard (see Pushmeet Kohli’s part).

Example 3: A global perspective on low-level vision [Woodford et al. ICCV ’09] (see poster on Friday)

Add a global term which enforces a match with a marginal statistic: a cost f over the count ∑_i xi ∈ [0, n].

E(x) = ∑_i θi(xi) + ∑_{i,j∈N} θij(xi,xj) + f(∑_i xi)

(A global unary term alone: error 12.8%.) Split E into E1 (the pairwise MRF) and E2 (the global term) and solve with dual decomposition.
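One reason this decomposition is attractive: a slave of the form "unaries plus a cost on the count" can be minimized exactly by sorting, since for a fixed count k the best labeling switches on the k cheapest pixels. A sketch with hypothetical costs and f (the actual decomposition in [Woodford et al. ICCV ’09] may differ):

```python
def solve_global_slave(c, f):
    """Minimize sum_i c_i * x_i + f(sum_i x_i) over x in {0,1}^n exactly."""
    order = sorted(range(len(c)), key=lambda i: c[i])  # cheapest pixels first
    best_k, best_val, prefix = 0, f(0), 0.0
    for k, i in enumerate(order, start=1):
        prefix += c[i]                   # cost of switching on the k cheapest pixels
        if prefix + f(k) < best_val:
            best_k, best_val = k, prefix + f(k)
    x = [0] * len(c)
    for i in order[:best_k]:
        x[i] = 1
    return x, best_val

# Toy marginal statistic: f prefers exactly half of the n = 4 pixels to be on.
c = [0.5, -1.0, 2.0, -0.5]
f = lambda k: abs(k - 2)
x, val = solve_global_slave(c, f)
```

This runs in O(n log n), versus 2^n for brute force over labelings.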

Example 3: A global perspective on low-level vision

Image synthesis: input; global colour distribution prior [Kwatra ’03]

Image de-noising: noisy input; pairwise MRF; global gradient prior; ground truth; gradient strength

Example 4: Solve GrabCut globally optimally [Vicente et al. ICCV ’09] (see poster on Tuesday)

E(x,w) = ∑ θi(xi,w) + ∑ θij(xi,xj),   E(x,w): {0,1}ⁿ x {GMMs} → R

where w is the colour model: a highly connected MRF. Minimizing out w gives a higher-order MRF:

E'(x) = min_w E(x,w)

Example 4: Solve GrabCut globally optimally

E(x) = g(∑_i xi) + ∑_b fb(∑_i xi^b) + ∑_{i,j∈N} θij(xi,xj)

g is convex on [0, n] (prefers an “equal area” segmentation, with minimum at n/2); each fb is concave (each colour bin b prefers to be entirely fore- or background). Split into E1 and E2 and solve with dual decomposition.

Example 4: Solve GrabCut globally optimally [Vicente et al. ICCV ’09] (see poster on Tuesday)

Globally optimal in 60% of cases, such as…

Summary

• Dual Decomposition is a powerful technique for challenging MRFs

• Not guaranteed to give globally optimal energy

• … but for several vision problems we get tight bounds

END

Unused slides

Texture Restoration (table 1)

256x85 nodes; 15 con.; 36% non-sub; 6.6 unary strength

Training image; test image; GC E=999 (0.05 sec)

QPBO, 16.5% unlab. (1.4 sec)

QPBOP, 0% unlab. (14 sec): Global Minimum. Sim. Ann., ICM, BP, BP+I, C+BP+I (visually similar)

New-View Synthesis [Fitzgibbon et al. ’03]

385x385 nodes; 8 con.; 8% non-sub; 0.1 unary strength

QPBO 3.9% unlabelled (black) (0.7 sec)

QPBOP: Global Min. (1.4 sec); also P+BP+I, BP+I

Ground Truth; Sim. Ann. E=980 (50 sec); ICM E=999 (0.2 sec) (visually similar)

Graph Cut E=2 (0.3sec)

BP E=18 (0.6sec)

Image Segmentation – region & boundary brush

321x221 nodes; 4 con.; 0.006% non-sub; 0 unary strength

Sim. Ann. E=983 (50 sec); ICM E=999 (0.07 sec)

GraphCut E=873 (0.11 sec); BP E=28 (0.2 sec); QPBO 26.7% unlabeled (0.08 sec)

QPBOP: Global Min (3.8 sec)

Input image; user input

Non-truncated cost function

[HBF; Szeliski ’06] Fast linear system solver in the continuous domain (then discretised)

original input; HBF

[Plot: cost vs. |di − dj|]
