Tractable Higher Order Models in Computer Vision (Part II)
Slides from Carsten Rother, Sebastian Nowozin, Pushmeet Kohli, Microsoft Research Cambridge
Presented by Xiaodan Liang
Part II
• Submodularity
• Move-making algorithms
• Higher-order model: Pn Potts model
Feature selection
Factoring distributions
Problem inherently combinatorial!
Example: Greedy algorithm for feature selection
Key property: Diminishing returns
• Selection A = {}: adding the new feature X1 will help a lot (large improvement).
• Selection B = {X2, X3}: adding X1 doesn't help much (small improvement).
Submodularity: for A ⊆ B and s ∉ B,
F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
[Figure: Naïve Bayes models with class Y = "Sick" and features X1 = "Fever", X2 = "Rash", X3 = "Male"]
Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in Naïve Bayes models is submodular!
7
Why is submodularity useful?
Theorem [Nemhauser et al. '78]: The greedy maximization algorithm returns A_greedy with
F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A)     (~63% of optimal)
• Greedy algorithm gives a near-optimal solution!
• For info-gain: guarantees best possible factor unless P = NP! [Krause, Guestrin UAI '05]
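The greedy algorithm behind this (1 − 1/e) guarantee can be sketched in a few lines. This is a minimal illustration for a monotone submodular F given as a Python callable; the coverage function and its sets are invented toy data, not from the slides.

```python
def greedy_maximize(F, ground_set, k):
    """Greedily pick k elements by largest marginal gain F(A | {e}) - F(A)."""
    A = set()
    for _ in range(k):
        best = max(ground_set - A, key=lambda e: F(A | {e}) - F(A))
        A.add(best)
    return A

# Toy coverage function (monotone submodular): F(A) = |union of covered items|.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
def F(A):
    return len(set().union(*(sets[e] for e in A))) if A else 0

print(greedy_maximize(F, {1, 2, 3}, 2))  # e.g. {1, 2}, covering all 3 items
```

For coverage the greedy solution here is also optimal; the theorem guarantees at least a (1 − 1/e) fraction of the optimum in general.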
8
Submodularity in Machine Learning
Many ML problems are submodular, i.e., require optimizing a submodular F:
• Minimization: A* = argmin F(A)
– Structure learning (A* = argmin I(X_A; X_{V\A}))
– Clustering
– MAP inference in Markov Random Fields
– …
• Maximization: A* = argmax F(A)
– Feature selection
– Active learning
– Ranking
– …
Set functions

Submodular set functions
A set function F on V is called submodular if, for all A, B ⊆ V:
F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
Equivalent diminishing-returns characterization: for all A ⊆ B and s ∉ B,
F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
(adding s to the small set A yields a large improvement; adding s to the large set B yields a small improvement)
Submodularity and supermodularity
Example: Mutual information
Closedness properties
F1, …, Fm submodular functions on V and λ1, …, λm > 0
Then: F(A) = Σi λi Fi(A) is submodular!
Submodularity is closed under nonnegative linear combinations!
Extremely useful fact:
– Fθ(A) submodular ⇒ Σθ P(θ) Fθ(A) submodular!
– Multicriterion optimization: F1, …, Fm submodular, λi ≥ 0 ⇒ Σi λi Fi(A) submodular
Submodularity and Concavity
A set function of the form F(A) = g(|A|) is submodular if and only if g is concave: plotted against |A|, g(|A|) flattens out, mirroring diminishing returns.
Maximum of submodular functions
Suppose F1(A) and F2(A) are submodular.
Is F(A) = max(F1(A), F2(A)) submodular?
max(F1, F2) is not submodular in general!
[Figure: F1(A) and F2(A) plotted against |A|, with their pointwise maximum F(A)]
Minimum of submodular functions
Well, maybe F(A) = min(F1(A), F2(A)) instead?

 A       F1(A)  F2(A)  F(A)
 ∅       0      0      0
 {a}     1      0      0
 {b}     0      1      0
 {a,b}   1      1      1

F({b}) − F(∅) = 0  <  F({a,b}) − F({a}) = 1
min(F1, F2) is not submodular in general! But stay tuned…
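This counterexample can be verified mechanically. The sketch below brute-forces the submodularity inequality F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) over all subset pairs, using the table from this slide (the helper name `is_submodular` is mine, not from the slides):

```python
from itertools import combinations

def is_submodular(F, V):
    """Brute-force check of F(A) + F(B) >= F(A | B) + F(A & B) for A, B in V."""
    subsets = [frozenset(c) for r in range(len(V) + 1)
               for c in combinations(V, r)]
    return all(F(A) + F(B) >= F(A | B) + F(A & B)
               for A in subsets for B in subsets)

# The table above: F1 and F2 are submodular, but min(F1, F2) is not.
F1 = {frozenset(): 0, frozenset("a"): 1, frozenset("b"): 0, frozenset("ab"): 1}
F2 = {frozenset(): 0, frozenset("a"): 0, frozenset("b"): 1, frozenset("ab"): 1}
V = frozenset("ab")
print(is_submodular(F1.get, V))                        # True
print(is_submodular(F2.get, V))                        # True
print(is_submodular(lambda A: min(F1[A], F2[A]), V))   # False
```

The failing pair is exactly A = {a}, B = {b}: F(A) + F(B) = 0 while F(A ∪ B) + F(A ∩ B) = 1.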
Submodularity and convexity
The submodular polyhedron P_F
P_F = {x ∈ R^V : x(A) ≤ F(A) for all A ⊆ V}
Example: V = {a, b}
x({a}) ≤ F({a})
x({b}) ≤ F({b})
x({a,b}) ≤ F({a,b})

 A       F(A)
 ∅       0
 {a}     −1
 {b}     2
 {a,b}   0

[Figure: P_F drawn in the (x{a}, x{b}) plane]
Lovász extension
Example: Lovász extension
g(w) = max {wᵀx : x ∈ P_F}

 A       F(A)
 ∅       0
 {a}     −1
 {b}     2
 {a,b}   0

w = [0,1]; want g(w).
Greedy ordering: e1 = b, e2 = a (since w(e1) = 1 > w(e2) = 0)
x_w(e1) = F({b}) − F(∅) = 2
x_w(e2) = F({b,a}) − F({b}) = −2
x_w = [−2, 2]
g([0,1]) = [0,1]ᵀ[−2,2] = 2 = F({b})
g([1,1]) = [1,1]ᵀ[−1,1] = 0 = F({a,b})
[Figure: corners of the w-square labelled {}, {a}, {b}, {a,b}, with x_w vectors [−1,1] and [−2,2]]
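The greedy evaluation of g(w) on this slide (Edmonds' greedy algorithm) can be reproduced directly on the same table of F values; the function name `lovasz_extension` is my label for the sketch:

```python
def lovasz_extension(F, V, w):
    """g(w) via the greedy algorithm: sort elements by decreasing w, then
    take marginal gains of F in that order to build the maximizer x_w."""
    order = sorted(V, key=lambda e: -w[e])
    x, A = {}, frozenset()
    for e in order:
        x[e] = F[A | {e}] - F[A]     # marginal gain in greedy order
        A = A | {e}
    return sum(w[e] * x[e] for e in V), x

# Table of F values from the slide.
F = {frozenset(): 0, frozenset("a"): -1, frozenset("b"): 2, frozenset("ab"): 0}

print(lovasz_extension(F, "ab", {"a": 0, "b": 1}))  # (2, {'b': 2, 'a': -2})
print(lovasz_extension(F, "ab", {"a": 1, "b": 1}))  # value 0 = F({a,b})
```

The first call matches the worked example: greedy order b then a, x_w = [−2, 2], g([0,1]) = 2 = F({b}).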
Why is this useful?
Theorem [Lovász '83]: g(w) attains its minimum over [0,1]^n at a corner!
If we can minimize g on [0,1]^n, we can minimize F (at the corners, g and F take the same values).
F(A) submodular ⇒ g(w) convex (and efficient to evaluate)
Does the converse also hold? No: g(w1, w2, w3) = max(w1, w2 + w3) is convex, but the set function it induces on {a, b, c} is not submodular:
F({a,b}) − F({a}) = 0 < F({a,b,c}) − F({a,c}) = 1
Minimizing a submodular function
Ellipsoid algorithm
Interior-point algorithm
Example: Image denoising
[Figure: 3×3 grid MRF; latent "true" pixels Y1, …, Y9 form a grid, each connected to its observed noisy pixel X1, …, X9]
Pairwise Markov Random Field:
P(x1, …, xn, y1, …, yn) = ∏i,j ψi,j(yi, yj) ∏i φi(xi, yi)
Want: argmax_y P(y | x) = argmax_y log P(x, y) = argmin_y Σi,j Ei,j(yi, yj) + Σi Ei(yi)
where Ei,j(yi, yj) = −log ψi,j(yi, yj)
When is this MAP inference efficiently solvable (in high-treewidth graphical models)?
Xi: noisy pixels; Yi: "true" pixels
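The identity above (maximizing the posterior equals minimizing the sum of negative-log potentials) can be checked numerically on a tiny chain. The potentials below are invented toy values for a 2-pixel model, not from the slides:

```python
from itertools import product
from math import log

# psi: pairwise smoothness potential; phi: unary potentials per pixel.
psi = lambda a, b: 2.0 if a == b else 1.0
phi = [{0: 3.0, 1: 1.0}, {0: 1.0, 1: 1.0}]

def P(y):  # unnormalized joint probability
    return psi(y[0], y[1]) * phi[0][y[0]] * phi[1][y[1]]

def E(y):  # corresponding energy: E_ij = -log psi, E_i = -log phi
    return -log(psi(y[0], y[1])) - log(phi[0][y[0]]) - log(phi[1][y[1]])

states = list(product([0, 1], repeat=2))
print(max(states, key=P), min(states, key=E))  # same labelling, (0, 0)
```

Because −log is strictly decreasing, the maximizer of P and the minimizer of E always coincide; the toy values just make that concrete.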
MAP inference in Markov Random Fields[Kolmogorov et al, PAMI ’04, see also: Hammer, Ops Res ‘65]
Constrained minimization
Part II
• Submodularity
• Move-making algorithms
• Higher-order model: Pn Potts model
Multi-Label problems
Move making: expansion moves and swap moves for this problem

Metric and semi-metric potential functions
• If the pairwise potential functions define a metric, then the energy function in equation (8) can be approximately minimized using alpha-expansions.
• If the pairwise potential functions define a semi-metric, it can be minimized using alpha-beta swaps.
Move Energy
• Each move is parameterized by a binary move vector t.
• A transformation function T(x, t) maps the current labelling x and a move t to a new labelling.
• The energy of a move t: Em(t) = E(T(x, t)).
• The optimal move: t* = argmin_t Em(t).
Submodular set functions play an important role in energy minimization as they can be minimized in polynomial time
The swap move algorithm
The expansion move algorithm
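The expansion-move outer loop can be sketched as below. This is a minimal illustration: the inner binary move is solved by brute force over all 2^n move vectors, standing in for the st-mincut solver used in practice, and the toy Potts energy values are invented:

```python
from itertools import product

def alpha_expansion(x, labels, energy):
    """Repeatedly allow every variable to keep its label or switch to alpha,
    until no expansion move lowers the energy (inner move: brute force)."""
    improved = True
    while improved:
        improved = False
        for alpha in labels:
            best = min((tuple(alpha if t else xi for t, xi in zip(ts, x))
                        for ts in product([0, 1], repeat=len(x))),
                       key=energy)
            if energy(best) < energy(x):
                x, improved = best, True
    return x

# Toy Potts-style energy on a 3-pixel chain: unaries plus disagreement cost.
unary = [{1: 0, 2: 5}, {1: 4, 2: 1}, {1: 4, 2: 1}]
def energy(x):
    return (sum(unary[i][xi] for i, xi in enumerate(x))
            + 3 * sum(a != b for a, b in zip(x, x[1:])))

print(alpha_expansion((1, 1, 1), [1, 2], energy))  # (1, 2, 2)
```

The swap-move algorithm has the same outer structure, except each inner move only lets variables currently labelled alpha or beta exchange those two labels.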
Higher order potentials
• The class of higher-order clique potentials for which the expansion and swap moves can be computed in polynomial time. The clique potentials take the form:
• Question you should be asking: Can my higher-order potential be solved using α-expansions?
• To answer it: show that the move energy is submodular for all xc.
Form of the Higher Order Potentials

Moves for Higher Order Potentials
• Clique inconsistency function: defined over a clique c = {xi, xj, xk, xl, xm}, in either Sum form or Max form
• Pairwise potential
Theoretical Results: Swap
• The move energy is always submodular if the clique potential is a non-decreasing concave function.
See paper for proofs.
Condition for Swap move
Concave function: g(λa + (1 − λ)b) ≥ λ g(a) + (1 − λ) g(b) for all λ ∈ [0, 1]
Proof sketch:
• Show that all projections onto two variables of any alpha-beta-swap move energy are submodular.
• Substitute the cost of each configuration; Constraint 1, Lemma 1, and Constraint 2 then imply the theorem.
Condition for alpha expansion
• Metric:
Form of the Higher Order Potentials

Moves for Higher Order Potentials
• Clique inconsistency function: defined over a clique c = {xi, xj, xk, xl, xm}, in either Sum form or Max form
• Pairwise potential
Part II
• Submodularity
• Move-making algorithms
• Higher-order model: Pn Potts model
Image Segmentation
E(X) = Σi ci xi + Σi,j dij |xi − xj|
E: {0,1}^n → R
0 → fg, 1 → bg
n = number of pixels
[Boykov and Jolly '01] [Blake et al. '04] [Rother et al. '04]
[Figure: Image, Unary Cost, Segmentation]
Pn Potts Potentials
Patch Dictionary
(Tree)
h(Xp) = 0 if xi = 0 for all i ∈ p; Cmax otherwise
[slide credits: Kohli]
Pn Potts Potentials
E(X) = Σi ci xi + Σi,j dij |xi − xj| + Σp hp(Xp)
h(Xp) = 0 if xi = 0 for all i ∈ p; Cmax otherwise
E: {0,1}n → R
0 →fg, 1→bg
n = number of pixels
[slide credits: Kohli]
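Putting the three terms together, the energy above can be sketched directly as code; the pixel labels, costs, patches, and Cmax in the usage line are invented toy values:

```python
def segmentation_energy(x, c, d, patches, c_max):
    """E(X) = sum_i c_i x_i + sum_{i,j} d_ij |x_i - x_j| + sum_p h_p(X_p),
    where the higher-order term h_p is 0 only if every pixel in patch p
    takes label 0, and c_max otherwise (the P^n Potts patch potential)."""
    unary = sum(ci * xi for ci, xi in zip(c, x))
    pairwise = sum(w * abs(x[i] - x[j]) for (i, j), w in d.items())
    higher = sum(0 if all(x[i] == 0 for i in p) else c_max for p in patches)
    return unary + pairwise + higher

# Toy instance: 3 pixels, two overlapping patches; second patch is violated.
print(segmentation_energy([0, 0, 1], c=[1, 2, 3],
                          d={(0, 1): 2, (1, 2): 1},
                          patches=[[0, 1], [1, 2]], c_max=10))  # 3 + 1 + 10 = 14
```

The patch term rewards labelling an entire dictionary patch consistently as foreground, which is what makes the model higher-order rather than pairwise.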
Theoretical Results: Expansion
• The move energy is always submodular if the clique potential is an increasing linear function.
See paper for proofs
PN Potts Model
ψc(xc) = γk if xi = lk for all i ∈ c; γmax otherwise
[Figure sequence: a clique c; a consistent labelling of c costs γ; an inconsistent labelling costs γmax]
Optimal moves for PN Potts
• Computing the optimal swap move over Label 1 (a) and Label 2 (b); Labels 3 and 4 are unaffected.
Case 1: Not all variables in c are assigned label 1 or 2.
The move energy is independent of tc and the clique can be ignored.
Optimal moves for PN Potts
• Computing the optimal swap move
Case 2: All variables in c are assigned label 1 or 2.
Optimal moves for PN Potts
• Computing the optimal swap move
Case 2: All variables in c are assigned label 1 or 2.
The move energy can be minimized by solving an st-mincut problem.
Solving the Move Energy
• Add a constant: we can add a constant K to all possible values of the clique potential without changing the optimal move; this transformation does not affect the solution.
Solving the Move Energy
• Computing the optimal swap move
[Figure: st-graph with Source, Sink, nodes v1, v2, …, vn, and edge weights Ms (source side) and Mt (sink side)]
ti = 0 ⇒ vi ∈ Source set
tj = 1 ⇒ vj ∈ Sink set
Solving the Move Energy
• Computing the optimal swap move
Case 1: all xi = a (every vi in the Source set); the cut cost is shown on the graph.
[Figure: st-graph with Source, Sink, v1, v2, …, vn, Ms, Mt]
Solving the Move Energy
• Computing the optimal swap move
Case 2: all xi = b (every vi in the Sink set); the cut cost is shown on the graph.
[Figure: st-graph with Source, Sink, v1, v2, …, vn, Ms, Mt]
Solving the Move Energy
• Computing the optimal swap move
Case 3: the xi take both labels a and b (some vi in the Source set, some in the Sink set); the cut cost is shown on the graph.
[Figure: st-graph with Source, Sink, v1, v2, …, vn, Ms, Mt]
Recall that the cost of an st-mincut is the sum of the weights of the edges included in the st-mincut which go from the source set to the sink set.
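The st-mincut cost referred to here equals the maximum flow from source to sink, which is how these moves are computed in practice. Below is a minimal Edmonds-Karp sketch; the graph shape mirrors the slide (source → vi → sink) but the capacities are invented toy values:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow. cap: dict {(u, v): capacity}. The returned
    max-flow value equals the min st-cut cost by max-flow/min-cut duality."""
    cap = dict(cap)                        # work on a copy
    for (u, v) in list(cap):
        cap.setdefault((v, u), 0)          # residual (reverse) edges
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)
    flow = dict.fromkeys(cap, 0)
    total = 0
    while True:
        parent, q = {s: None}, deque([s])  # BFS for a shortest augmenting path
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > flow[(u, v)]:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total
        path, v = [], t                    # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] - flow[e] for e in path)
        for (u, v) in path:
            flow[(u, v)] += push
            flow[(v, u)] -= push
        total += push

# Toy graph shaped like the slide: source -> v_i -> sink.
cap = {("s", "v1"): 3, ("s", "v2"): 2, ("v1", "t"): 2,
       ("v2", "t"): 3, ("v1", "v2"): 1}
print(max_flow(cap, "s", "t"))  # 5
```

Here the min cut is the pair of source edges (3 + 2 = 5), matching the returned flow; production graph-cut codes use the Boykov-Kolmogorov algorithm instead, but the cut/flow relationship is the same.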
Optimal moves for PN Potts
• The expansion move energy
• Similar graph construction.
Experimental Results
• Texture Segmentation
Unary (Colour)
Pairwise (Smoothness)
Higher Order (Texture)
Original Pairwise Higher order
Experimental Results
Original Swap (3.2 sec)
Expansion (2.5 sec)
Pairwise Higher Order
Swap (4.2 sec)
Expansion (3.0 sec)
Experimental Results
Original
Pairwise Higher Order
Swap (4.7 sec)
Expansion (3.7 sec)
Swap (5.0 sec)
Expansion (4.4 sec)
More Higher-order models