Tractable Higher Order Models in Computer Vision (Part II)
Slides from Carsten Rother, Sebastian Nowozin, Pushmeet Kohli, Microsoft Research Cambridge
Presented by Xiaodan Liang
Part II
• Submodularity
• Move-making algorithms
• Higher-order model: Pn Potts model
Feature selection
Factoring distributions
Problem inherently combinatorial!
Example: Greedy algorithm for feature selection
Key property: Diminishing returns
• Selection A = {}: adding the new feature X1 will help a lot (large improvement).
• Selection B = {X2, X3}: adding X1 doesn't help much (small improvement).
Submodularity: for A ⊆ B and s ∉ B,
F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
[Figure: Naïve Bayes models with class Y = "Sick" and features X1 = "Fever", X2 = "Rash", X3 = "Male"]
Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in Naïve Bayes models is submodular!
7
Why is submodularity useful?
Theorem [Nemhauser et al. '78]: The greedy maximization algorithm returns A_greedy with
F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A)     (~63% of optimal)
• Greedy algorithm gives a near-optimal solution!
• For info-gain: guarantees best possible factor unless P = NP! [Krause, Guestrin UAI '05]
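The greedy algorithm behind this (1 − 1/e) guarantee can be sketched in a few lines. This is a minimal illustration for a monotone submodular F given as a Python callable; the coverage function and its sets are invented toy data, not from the slides.

```python
def greedy_maximize(F, ground_set, k):
    """Greedily pick k elements by largest marginal gain F(A | {e}) - F(A)."""
    A = set()
    for _ in range(k):
        best = max(ground_set - A, key=lambda e: F(A | {e}) - F(A))
        A.add(best)
    return A

# Toy coverage function (monotone submodular): F(A) = |union of covered items|.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
def F(A):
    return len(set().union(*(sets[e] for e in A))) if A else 0

print(greedy_maximize(F, {1, 2, 3}, 2))  # e.g. {1, 2}, covering all 3 items
```

For coverage the greedy solution here is also optimal; the theorem guarantees at least a (1 − 1/e) fraction of the optimum in general.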
8
Submodularity in Machine Learning
Many ML problems are submodular, i.e., require optimizing a submodular F:
• Minimization: A* = argmin F(A)
– Structure learning (A* = argmin I(X_A; X_{V\A}))
– Clustering
– MAP inference in Markov Random Fields
– …
• Maximization: A* = argmax F(A)
– Feature selection
– Active learning
– Ranking
– …
Set functions

Submodular set functions
A set function F on V is called submodular if, for all A, B ⊆ V:
F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
Equivalent diminishing-returns characterization: for all A ⊆ B and s ∉ B,
F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
(adding s to the small set A yields a large improvement; adding s to the large set B yields a small improvement)
Submodularity and supermodularity
Example: Mutual information
Closedness properties
F1, …, Fm submodular functions on V and λ1, …, λm > 0
Then: F(A) = Σi λi Fi(A) is submodular!
Submodularity is closed under nonnegative linear combinations!
Extremely useful fact:
– Fθ(A) submodular ⇒ Σθ P(θ) Fθ(A) submodular!
– Multicriterion optimization: F1, …, Fm submodular, λi ≥ 0 ⇒ Σi λi Fi(A) submodular
Submodularity and Concavity
A set function of the form F(A) = g(|A|) is submodular if and only if g is concave: plotted against |A|, g(|A|) flattens out, mirroring diminishing returns.
Maximum of submodular functions
Suppose F1(A) and F2(A) are submodular.
Is F(A) = max(F1(A), F2(A)) submodular?
max(F1, F2) is not submodular in general!
[Figure: F1(A) and F2(A) plotted against |A|, with their pointwise maximum F(A)]
Minimum of submodular functions
Well, maybe F(A) = min(F1(A), F2(A)) instead?

 A       F1(A)  F2(A)  F(A)
 ∅       0      0      0
 {a}     1      0      0
 {b}     0      1      0
 {a,b}   1      1      1

F({b}) − F(∅) = 0  <  F({a,b}) − F({a}) = 1
min(F1, F2) is not submodular in general! But stay tuned…
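This counterexample can be verified mechanically. The sketch below brute-forces the submodularity inequality F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) over all subset pairs, using the table from this slide (the helper name `is_submodular` is mine, not from the slides):

```python
from itertools import combinations

def is_submodular(F, V):
    """Brute-force check of F(A) + F(B) >= F(A | B) + F(A & B) for A, B in V."""
    subsets = [frozenset(c) for r in range(len(V) + 1)
               for c in combinations(V, r)]
    return all(F(A) + F(B) >= F(A | B) + F(A & B)
               for A in subsets for B in subsets)

# The table above: F1 and F2 are submodular, but min(F1, F2) is not.
F1 = {frozenset(): 0, frozenset("a"): 1, frozenset("b"): 0, frozenset("ab"): 1}
F2 = {frozenset(): 0, frozenset("a"): 0, frozenset("b"): 1, frozenset("ab"): 1}
V = frozenset("ab")
print(is_submodular(F1.get, V))                        # True
print(is_submodular(F2.get, V))                        # True
print(is_submodular(lambda A: min(F1[A], F2[A]), V))   # False
```

The failing pair is exactly A = {a}, B = {b}: F(A) + F(B) = 0 while F(A ∪ B) + F(A ∩ B) = 1.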
Submodularity and convexity
The submodular polyhedron P_F
P_F = {x ∈ R^V : x(A) ≤ F(A) for all A ⊆ V}
Example: V = {a, b}
x({a}) ≤ F({a})
x({b}) ≤ F({b})
x({a,b}) ≤ F({a,b})

 A       F(A)
 ∅       0
 {a}     −1
 {b}     2
 {a,b}   0

[Figure: P_F drawn in the (x{a}, x{b}) plane]
Lovász extension
Example: Lovász extension
g(w) = max {wᵀx : x ∈ P_F}

 A       F(A)
 ∅       0
 {a}     −1
 {b}     2
 {a,b}   0

w = [0,1]; want g(w).
Greedy ordering: e1 = b, e2 = a (since w(e1) = 1 > w(e2) = 0)
x_w(e1) = F({b}) − F(∅) = 2
x_w(e2) = F({b,a}) − F({b}) = −2
x_w = [−2, 2]
g([0,1]) = [0,1]ᵀ[−2,2] = 2 = F({b})
g([1,1]) = [1,1]ᵀ[−1,1] = 0 = F({a,b})
[Figure: corners of the w-square labelled {}, {a}, {b}, {a,b}, with x_w vectors [−1,1] and [−2,2]]
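The greedy evaluation of g(w) on this slide (Edmonds' greedy algorithm) can be reproduced directly on the same table of F values; the function name `lovasz_extension` is my label for the sketch:

```python
def lovasz_extension(F, V, w):
    """g(w) via the greedy algorithm: sort elements by decreasing w, then
    take marginal gains of F in that order to build the maximizer x_w."""
    order = sorted(V, key=lambda e: -w[e])
    x, A = {}, frozenset()
    for e in order:
        x[e] = F[A | {e}] - F[A]     # marginal gain in greedy order
        A = A | {e}
    return sum(w[e] * x[e] for e in V), x

# Table of F values from the slide.
F = {frozenset(): 0, frozenset("a"): -1, frozenset("b"): 2, frozenset("ab"): 0}

print(lovasz_extension(F, "ab", {"a": 0, "b": 1}))  # (2, {'b': 2, 'a': -2})
print(lovasz_extension(F, "ab", {"a": 1, "b": 1}))  # value 0 = F({a,b})
```

The first call matches the worked example: greedy order b then a, x_w = [−2, 2], g([0,1]) = 2 = F({b}).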
Why is this useful?
Theorem [Lovász '83]: g(w) attains its minimum over [0,1]^n at a corner!
If we can minimize g on [0,1]^n, we can minimize F (at the corners, g and F take the same values).
F(A) submodular ⇒ g(w) convex (and efficient to evaluate)
Does the converse also hold? No: g(w1, w2, w3) = max(w1, w2 + w3) is convex, but the set function it induces on {a, b, c} is not submodular:
F({a,b}) − F({a}) = 0 < F({a,b,c}) − F({a,c}) = 1
Minimizing a submodular function
Ellipsoid algorithm
Interior-point algorithm
Example: Image denoising
[Figure: 3×3 grid MRF; latent "true" pixels Y1, …, Y9 form a grid, each connected to its observed noisy pixel X1, …, X9]
Pairwise Markov Random Field:
P(x1, …, xn, y1, …, yn) = ∏i,j ψi,j(yi, yj) ∏i φi(xi, yi)
Want: argmax_y P(y | x) = argmax_y log P(x, y) = argmin_y Σi,j Ei,j(yi, yj) + Σi Ei(yi)
where Ei,j(yi, yj) = −log ψi,j(yi, yj)
When is this MAP inference efficiently solvable (in high-treewidth graphical models)?
Xi: noisy pixels; Yi: "true" pixels
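The identity above (maximizing the posterior equals minimizing the sum of negative-log potentials) can be checked numerically on a tiny chain. The potentials below are invented toy values for a 2-pixel model, not from the slides:

```python
from itertools import product
from math import log

# psi: pairwise smoothness potential; phi: unary potentials per pixel.
psi = lambda a, b: 2.0 if a == b else 1.0
phi = [{0: 3.0, 1: 1.0}, {0: 1.0, 1: 1.0}]

def P(y):  # unnormalized joint probability
    return psi(y[0], y[1]) * phi[0][y[0]] * phi[1][y[1]]

def E(y):  # corresponding energy: E_ij = -log psi, E_i = -log phi
    return -log(psi(y[0], y[1])) - log(phi[0][y[0]]) - log(phi[1][y[1]])

states = list(product([0, 1], repeat=2))
print(max(states, key=P), min(states, key=E))  # same labelling, (0, 0)
```

Because −log is strictly decreasing, the maximizer of P and the minimizer of E always coincide; the toy values just make that concrete.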
MAP inference in Markov Random Fields[Kolmogorov et al, PAMI ’04, see also: Hammer, Ops Res ‘65]
Constrained minimization
Part II
• Submodularity
• Move-making algorithms
• Higher-order model: Pn Potts model
Multi-Label problems
Move making: expansion moves and swap moves for this problem

Metric and semi-metric potential functions
• If the pairwise potential functions define a metric, then the energy function in equation (8) can be approximately minimized using alpha-expansions.
• If the pairwise potential functions define a semi-metric, it can be minimized using alpha-beta swaps.
Move Energy
• Each move is parameterized by a binary move vector t.
• A transformation function T(x, t) maps the current labelling x and a move t to a new labelling.
• The energy of a move t: Em(t) = E(T(x, t)).
• The optimal move: t* = argmin_t Em(t).
Submodular set functions play an important role in energy minimization as they can be minimized in polynomial time
The swap move algorithm
The expansion move algorithm
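The expansion-move outer loop can be sketched as below. This is a minimal illustration: the inner binary move is solved by brute force over all 2^n move vectors, standing in for the st-mincut solver used in practice, and the toy Potts energy values are invented:

```python
from itertools import product

def alpha_expansion(x, labels, energy):
    """Repeatedly allow every variable to keep its label or switch to alpha,
    until no expansion move lowers the energy (inner move: brute force)."""
    improved = True
    while improved:
        improved = False
        for alpha in labels:
            best = min((tuple(alpha if t else xi for t, xi in zip(ts, x))
                        for ts in product([0, 1], repeat=len(x))),
                       key=energy)
            if energy(best) < energy(x):
                x, improved = best, True
    return x

# Toy Potts-style energy on a 3-pixel chain: unaries plus disagreement cost.
unary = [{1: 0, 2: 5}, {1: 4, 2: 1}, {1: 4, 2: 1}]
def energy(x):
    return (sum(unary[i][xi] for i, xi in enumerate(x))
            + 3 * sum(a != b for a, b in zip(x, x[1:])))

print(alpha_expansion((1, 1, 1), [1, 2], energy))  # (1, 2, 2)
```

The swap-move algorithm has the same outer structure, except each inner move only lets variables currently labelled alpha or beta exchange those two labels.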
Higher order potentials
• The class of higher-order clique potentials for which the expansion and swap moves can be computed in polynomial time. The clique potentials take the form:
• Question you should be asking: Can my higher-order potential be solved using α-expansions?
• To answer it: show that the move energy is submodular for all xc.
Form of the Higher Order Potentials

Moves for Higher Order Potentials
• Clique inconsistency function: defined over a clique c = {xi, xj, xk, xl, xm}, in either Sum form or Max form
• Pairwise potential
Theoretical Results: Swap
• The move energy is always submodular if the clique potential is a non-decreasing concave function.
See paper for proofs.
Condition for Swap move
Concave function: g(λa + (1 − λ)b) ≥ λ g(a) + (1 − λ) g(b) for all λ ∈ [0, 1]
Proof sketch:
• Show that all projections onto two variables of any alpha-beta-swap move energy are submodular.
• Substitute the cost of each configuration; Constraint 1, Lemma 1, and Constraint 2 then imply the theorem.
Condition for alpha expansion
• Metric:
Form of the Higher Order Potentials

Moves for Higher Order Potentials
• Clique inconsistency function: defined over a clique c = {xi, xj, xk, xl, xm}, in either Sum form or Max form
• Pairwise potential
Part II
• Submodularity
• Move-making algorithms
• Higher-order model: Pn Potts model
Image Segmentation
E(X) = Σi ci xi + Σi,j dij |xi − xj|
E: {0,1}^n → R
0 → fg, 1 → bg
n = number of pixels
[Boykov and Jolly '01] [Blake et al. '04] [Rother et al. '04]
[Figure: Image, Unary Cost, Segmentation]
Pn Potts Potentials
Patch Dictionary
(Tree)
h(Xp) = 0 if xi = 0 for all i ∈ p; Cmax otherwise
[slide credits: Kohli]
Pn Potts Potentials
E(X) = Σi ci xi + Σi,j dij |xi − xj| + Σp hp(Xp)
h(Xp) = 0 if xi = 0 for all i ∈ p; Cmax otherwise
E: {0,1}n → R
0 →fg, 1→bg
n = number of pixels
[slide credits: Kohli]
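Putting the three terms together, the energy above can be sketched directly as code; the pixel labels, costs, patches, and Cmax in the usage line are invented toy values:

```python
def segmentation_energy(x, c, d, patches, c_max):
    """E(X) = sum_i c_i x_i + sum_{i,j} d_ij |x_i - x_j| + sum_p h_p(X_p),
    where the higher-order term h_p is 0 only if every pixel in patch p
    takes label 0, and c_max otherwise (the P^n Potts patch potential)."""
    unary = sum(ci * xi for ci, xi in zip(c, x))
    pairwise = sum(w * abs(x[i] - x[j]) for (i, j), w in d.items())
    higher = sum(0 if all(x[i] == 0 for i in p) else c_max for p in patches)
    return unary + pairwise + higher

# Toy instance: 3 pixels, two overlapping patches; second patch is violated.
print(segmentation_energy([0, 0, 1], c=[1, 2, 3],
                          d={(0, 1): 2, (1, 2): 1},
                          patches=[[0, 1], [1, 2]], c_max=10))  # 3 + 1 + 10 = 14
```

The patch term rewards labelling an entire dictionary patch consistently as foreground, which is what makes the model higher-order rather than pairwise.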
Theoretical Results: Expansion
• The move energy is always submodular if the clique potential is an increasing linear function.
See paper for proofs
PN Potts Model
ψc(xc) = γk if xi = lk for all i ∈ c; γmax otherwise
[Figure sequence: a clique c; a consistent labelling of c costs γ; an inconsistent labelling costs γmax]
Optimal moves for PN Potts
• Computing the optimal swap move over Label 1 (a) and Label 2 (b); Labels 3 and 4 are unaffected.
Case 1: Not all variables in c are assigned label 1 or 2.
The move energy is independent of tc and the clique can be ignored.
Optimal moves for PN Potts
• Computing the optimal swap move
Case 2: All variables in c are assigned label 1 or 2.
Optimal moves for PN Potts
• Computing the optimal swap move
Case 2: All variables in c are assigned label 1 or 2.
The move energy can be minimized by solving an st-mincut problem.
Solving the Move Energy
• Add a constant: we can add a constant K to all possible values of the clique potential without changing the optimal move; this transformation does not affect the solution.
Solving the Move Energy
• Computing the optimal swap move
[Figure: st-graph with Source, Sink, nodes v1, v2, …, vn, and edge weights Ms (source side) and Mt (sink side)]
ti = 0 ⇒ vi ∈ Source set
tj = 1 ⇒ vj ∈ Sink set
Solving the Move Energy
• Computing the optimal swap move
Case 1: all xi = a (every vi in the Source set); the cut cost is shown on the graph.
[Figure: st-graph with Source, Sink, v1, v2, …, vn, Ms, Mt]
Solving the Move Energy
• Computing the optimal swap move
Case 2: all xi = b (every vi in the Sink set); the cut cost is shown on the graph.
[Figure: st-graph with Source, Sink, v1, v2, …, vn, Ms, Mt]
Solving the Move Energy
• Computing the optimal swap move
Case 3: the xi take both labels a and b (some vi in the Source set, some in the Sink set); the cut cost is shown on the graph.
[Figure: st-graph with Source, Sink, v1, v2, …, vn, Ms, Mt]
Recall that the cost of an st-mincut is the sum of the weights of the edges included in the st-mincut which go from the source set to the sink set.
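The st-mincut cost referred to here equals the maximum flow from source to sink, which is how these moves are computed in practice. Below is a minimal Edmonds-Karp sketch; the graph shape mirrors the slide (source → vi → sink) but the capacities are invented toy values:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow. cap: dict {(u, v): capacity}. The returned
    max-flow value equals the min st-cut cost by max-flow/min-cut duality."""
    cap = dict(cap)                        # work on a copy
    for (u, v) in list(cap):
        cap.setdefault((v, u), 0)          # residual (reverse) edges
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)
    flow = dict.fromkeys(cap, 0)
    total = 0
    while True:
        parent, q = {s: None}, deque([s])  # BFS for a shortest augmenting path
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > flow[(u, v)]:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total
        path, v = [], t                    # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] - flow[e] for e in path)
        for (u, v) in path:
            flow[(u, v)] += push
            flow[(v, u)] -= push
        total += push

# Toy graph shaped like the slide: source -> v_i -> sink.
cap = {("s", "v1"): 3, ("s", "v2"): 2, ("v1", "t"): 2,
       ("v2", "t"): 3, ("v1", "v2"): 1}
print(max_flow(cap, "s", "t"))  # 5
```

Here the min cut is the pair of source edges (3 + 2 = 5), matching the returned flow; production graph-cut codes use the Boykov-Kolmogorov algorithm instead, but the cut/flow relationship is the same.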
Optimal moves for PN Potts
• The expansion move energy
• Similar graph construction.
Experimental Results
• Texture Segmentation
Unary (Colour)
Pairwise (Smoothness)
Higher Order (Texture)
Original Pairwise Higher order
Experimental Results
Original Swap (3.2 sec)
Expansion (2.5 sec)
Pairwise Higher Order
Swap (4.2 sec)
Expansion (3.0 sec)
Experimental Results
Original
Pairwise Higher Order
Swap (4.7 sec)
Expansion (3.7 sec)
Swap (5.0 sec)
Expansion (4.4 sec)
More Higher-order models