on dual decomposition and linear programming relaxations for natural language processing

On Dual Decompositionand Linear Programming Relaxations

for Natural Language Processing

Alexander M. Rush, David Sontag,Michael Collins, and Tommi Jaakkola

Dynamic Programming

Dynamic programming is a dominant technique in NLP.

I Fast

I Exact

I Easy to implement

Examples:

I Viterbi algorithm for hidden Markov models

I CKY algorithm for weighted context-free grammars

y∗ = arg maxy

f (y) ← Decoding

Model Complexity

Unfortunately, dynamic programming algorithms do not scale wellwith model complexity.

As our models become complex, these algorithms can explode interms of computational or implementational complexity.

Integration:

I f ← Easy

I g ← Easy

I f + g ← Hard

Integration (1)

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

Red1 flies2 some3 large4 jet5

N V D A N

f (y) + g(z)

I Classical problem in NLP.

I The dynamic programming intersection is prohibitively slowand complicated to implement.

Integration (2)

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

*0 Red1 flies2 some3 large4 jet5

f (y) + g(z)

I Important for improving parsing accuracy.

I The dynamic programming intersection is slow andcomplicated to implement.

Dual Decomposition

A general technique for constructing decoding algorithms

Solve complicated models

y∗ = arg maxy

f (y)

by decomposing into smaller problems.

Upshot: Can utilize a toolbox of combinatorial algorithms.

I Dynamic programming

I Minimum spanning tree

I Shortest path

I Min-Cut

I ...

Dual Decomposition Algorithms

Simple - Uses basic dynamic programming algorithms

Efficient - Faster than full dynamic programming intersections

Strong Guarantees - Gives a certificate of optimality when exact

In experiments, we find the global optimum on 99% of examples.

Widely Applicable - Similar techniques extend to other problems

Roadmap

Algorithm

Experiments

LP Relaxations

Integrated Parsing and Tagging



N V D A N


S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

HMM

Dual Decomposition

CFG

HMM for Tagging


N V D A N

I Let Z be the set of all valid taggings of a sentence

and g(z) be a scoring function.

e.g. g(z) = log p(Red1|N) + log p(V|N) + ...

z∗ = arg maxz∈Z

g(z) ← Viterbi decoding

CFG for Parsing

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

I Let Y be the set of all valid parse trees for a sentence and

f (y) be a scoring function.

e.g. f (y) = log p(S → NP VP|S) + log p(NP → N|NP) + ...

y∗ = arg maxy∈Y

f (y) ← CKY Algorithm

Problem DefinitionS

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

I Find parse tree that optimizes

score(S → NP VP) + score(VP → V NP) +

...+ score(Red1,N) + score(V,N) + ...

I Conventional Approach (Bar Hillel et al., 1961)I Replace rules like S → NP VP

with rules like SN,N → NPN,V VPV ,N

Painful. O(t6) increase in complexity for trigram tagging.

The Integrated Parsing and Tagging Problem

Find argmax

y∈ Y, z∈ Zf (y) + g(z)

such that for all i , t, y(i , t) = z(i , t)

Trees Taggings

CFG HMM

Constraints

Where y(i , t) = 1 if parse includes tag t at position i

z(i , t) = 1 if tagging includes tag t at position i


Find argmax

y∈ Y, z∈ Zf (y) + g(z)


Trees

Taggings

CFG HMM

Constraints




Find argmax

y∈ Y, z∈ Zf (y) + g(z)


Trees Taggings

CFG HMM

Constraints




Find argmax

y∈ Y, z∈ Zf (y) + g(z)


Trees Taggings

CFG

HMM

Constraints




Find argmax

y∈ Y, z∈ Zf (y) + g(z)


Trees Taggings

CFG HMM

Constraints



Algorithm Sketch

Set penalty weights equal to 0 for the tag at each position.

For k = 1 to K

y (k) ← Decode (f (y) + penalty) by CKY Algorithm

z(k) ← Decode (g(z)− penalty) by Viterbi Decoding

If y (k)(i , t) = z(k)(i , t) for all i , t Return (y (k), z(k))

Algorithm Sketch

Set penalty weights equal to 0 for the tag at each position.

For k = 1 to K

y (k) ← Decode (f (y) + penalty) by CKY Algorithm

z(k) ← Decode (g(z)− penalty) by Viterbi Decoding

If y (k)(i , t) = z(k)(i , t) for all i , t Return (y (k), z(k))

Else Update penalty weights based on y (k)(i , t)− z(k)(i , t)

CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A NN V D A NA N D A NA N D A NN V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))

Penaltiesu(i , t) = 0 for all i ,t

Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1

Convergedy∗ = arg max

y∈Yf (y) + g(y)

Keyf (y) ⇐ CFG g(z) ⇐ HMMY ⇐ Parse Trees Z ⇐ Taggingsy(i , t) = 1 if y contains tag t at position i

CKY Parsing S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding



z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A N

N V D A NA N D A NA N D A NN V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A N

N V D A N

A N D A NA N D A NN V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding



z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A NN V D A N

A N D A N

A N D A NN V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A NN V D A NA N D A N

A N D A N

N V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding



z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A NN V D A NA N D A NA N D A N

N V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)


CKY Parsing

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

A

Red

N

flies

D

some

A

large

VP

V

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

S

NP

N

Red

VP

V

flies

NP

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,t

u(i , t)y(i , t))

Viterbi Decoding


N V D A NN V D A NA N D A NA N D A N

N V D A N

z∗ = arg maxz∈Z

(g(z)−∑i ,t

u(i , t)z(i , t))


Iteration 1

u(1,A) -1

u(1,N) 1

u(2,N) -1

u(2,V ) 1

u(5,V ) -1

u(5,N) 1

Iteration 2

u(5,V ) -1

u(5,N) 1


y∈Yf (y) + g(y)Key

f (y) ⇐ CFG g(z) ⇐ HMMY ⇐ Parse Trees Z ⇐ Taggingsy(i , t) = 1 if y contains tag t at position i

Guarantees

TheoremIf at any iteration y (k)(i , t) = z(k)(i , t) for all i , t, then (y (k), z(k))is the global optimum.

In experiments, we find the global optimum on 99% of examples.

If we do not converge to a match, we can still get a result (more inpaper).

Integrated CFG and Dependency Parsing




S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

Dependency Model

Dual Decomposition

Lexicalized CFG

Dependency Parsing


I Let Z be the set of all valid dependency parses of a sentenceand g(z) be a scoring function.

e.g. g(z) = log p(some3|jet5, large4) + ...

z∗ = arg maxz∈Z

g(z) ← Eisner (2000) algorithm

Lexicalized PCFG

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

I Let Y be the set of all valid dependency parses of a sentenceand f (y) be a scoring function.

e.g. f (y) = log p(S(flies) → NP(Red) VP(flies)|S(flies)) + ...

y∗ = arg maxy∈Y

f (y) ← Modified CKY algorithm

The Integrated Constituency and Dependency Parsing Problem

Find argmax

y∈ Y, z∈ Zf (y) + g(z)

such that for all i , j , y(i , j) = z(i , j)

Trees Dependency Trees

CFG Dependency

Constraints

Where y(i , j) = 1 if parse includes dependency from word i to j

z(i , j) = 1 if parse includes dependency from word i to j


Find argmax

y∈ Y, z∈ Zf (y) + g(z)


Trees

Dependency Trees

CFG Dependency

Constraints




Find argmax

y∈ Y, z∈ Zf (y) + g(z)



CFG Dependency

Constraints




Find argmax

y∈ Y, z∈ Zf (y) + g(z)



CFG

Dependency

Constraints




Find argmax

y∈ Y, z∈ Zf (y) + g(z)



CFG Dependency

Constraints



CKY Parsing

S(flies)

NP

N

Red

VP(flies)

V

flies

D

some

NP(jet)

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

D

some

NP(jet)

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,j

u(i , j)y(i , j))

Dependency Parsing


z∗ = arg maxz∈Z

(g(z)−∑i ,j

u(i , j)z(i , j))

Penaltiesu(i , j) = 0 for all i ,j

Iteration 1

u(2, 3) -1

u(5, 3) 1


y∈Yf (y) + g(y)

Keyf (y) ⇐ CFG g(z) ⇐ Dependency ModelY ⇐ Parse Trees Z ⇐ Dependency Treesy(i , j) = 1 if y contains dependency i , j

CKY Parsing S(flies)

NP

N

Red

VP(flies)

V

flies

D

some

NP(jet)

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

D

some

NP(jet)

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,j

u(i , j)y(i , j))

Dependency Parsing


z∗ = arg maxz∈Z

(g(z)−∑i ,j

u(i , j)z(i , j))


Iteration 1

u(2, 3) -1

u(5, 3) 1


y∈Yf (y) + g(y)


CKY Parsing

S(flies)

NP

N

Red

VP(flies)

V

flies

D

some

NP(jet)

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

D

some

NP(jet)

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

S(flies)

NP

N

Red

VP(flies)

V

flies

NP(jet)

D

some

A

large

N

jet

y∗ = arg maxy∈Y

(f (y) +∑i ,j

u(i , j)y(i , j))

Dependency Parsing


z∗ = arg maxz∈Z

(g(z)−∑i ,j

u(i , j)z(i , j))


Iteration 1

u(2, 3) -1

u(5, 3) 1


y∈Yf (y) + g(y)


Roadmap

Algorithm

Experiments

LP Relaxations

Experiment

Properties:

I Exactness

I Parsing Accuracy

Experiments on:

I English Penn Treebank

Models

I Collins (1997) Model 1I Semi-Supervised Dependency Parser (Koo, 2008)I Trigram Tagger (Toutanova, 2000)

How quickly do the models converge?

0

20

40

60

80

100

<=1<=2

<=3<=4

<=10<=20

<=50

% e

xam

ples

con

verg

ed

number of iterations

Integrated Dependency Parsing

0

20

40

60

80

100

<=1<=2

<=3<=4

<=10<=20

<=50

% e

xam

ples

con

verg

ed

number of iterations

Integrated POS Tagging

Integrated Constituency and Dependency Parsing: Accuracy

87

88

89

90

91

92Collins

DepDual

F1 Score

I Collins (1997) Model 1

I Fixed, First-best Dependencies from Koo (2008)

I Dual Decomposition

Integrated Parsing and Tagging: Accuracy

87

88

89

90

91

92FixedDual

F1 Score

I Fixed, First-Best Tags From Toutanova (2000)

I Dual Decomposition

Roadmap

Algorithm

Experiments

LP Relaxations

Dual Decomposition and Linear Programming Relaxations

Theorem

I If the dual decomposition algorithm converges, then(y (k), z(k)) is the global optimum.

Questions

I What problem is dual decomposition solving?

I How come the algorithm doesn’t always converge?

Dual decomposition searches over a linear programming relaxationof the original problem.

Convex Hulls for CKY

A parse tree can be represented as a binary vector y ∈ Y.y(A→ B C , i , j , k) = 1 if rule A→ B C is used at span i , j , k.

Parsing

y∗

w

Y

conv(Y)

I If f is linear, arg maxy∈conv(Y)

f (y) is a linear program.

I The best point in an LP is a vertex. So CKY solves this LP.

Convex Hulls for CKY

A parse tree can be represented as a binary vector y ∈ Y.y(A→ B C , i , j , k) = 1 if rule A→ B C is used at span i , j , k.

Parsing

y∗

w

conv(Y)

I If f is linear, arg maxy∈conv(Y)

f (y) is a linear program.

I The best point in an LP is a vertex. So CKY solves this LP.

Combined Problem

Q = {(y , z): y ∈ Y, z ∈ Z,y(i , t) = z(i , t) for all (i , t)}

Q

conv(Q)Q′Q′ conv(Q)

Possible (y∗, z∗)

wPossible (y∗, z∗)

w

Q′ = {(µ, ν): µ ∈ conv(Y), ν ∈ conv(Z),

µ(i , t) = ν(i , t) for all (i , t)}

Dual decomposition searches over Q′

Depending on the weight vector, (y∗, z∗) ∈ Q′ could be in Q or inthe strict outer bound.

Combined Problem


conv(Q)

Q′Q′ conv(Q)



w


µ(i , t) = ν(i , t) for all (i , t)}



Combined Problem


conv(Q)

Q′

Q′ conv(Q)



w


µ(i , t) = ν(i , t) for all (i , t)}



Combined Problem


conv(Q)Q′

Q′ conv(Q)



w


µ(i , t) = ν(i , t) for all (i , t)}



Combined Problem


conv(Q)Q′

Q′ conv(Q)


w


w


µ(i , t) = ν(i , t) for all (i , t)}



Are there points strictly in the outer bound?

Q′

Possible (y∗, z∗)?

Taggings 0.5x

w1 w2 w3

A A A

+ 0.5x

w1 w2 w3

A B B

Parses 0.5 x

X

A

w1

X

A

w2

B

w3

+ 0.5 x

X

A

w1

X

B

w2

A

w3

Best result can be afractional solution.

Convexcombination ofthese structures.

Summary

A Dual Decomposition algorithm for integrated decoding

Simple - Uses only simple, off-the-shelf dynamic programmingalgorithms to solve a harder problem.

Efficient - Faster than classical methods for dynamic programmingintersection.

Strong Guarantees - Solves a linear programming relaxation whichgives a certificate of optimality.

Finds the exact solution on 99% of the examples.

Widely Applicable - Similar techniques extend to other problems

Appendix

Iterative Progress

50

60

70

80

90

100

0 10 20 30 40 50

Per

cent

age

Maximum Number of Dual Decomposition Iterations

f score% certificates

% match K=50

Deriving the Algorithm

Goal:y∗ = arg max

y∈Yf (y)

Rewrite:arg max

z∈Z,y∈Yf (z) + g(y)

s.t. z(i , j) = y(i , j) for all i , j

Lagrangian: L(u, y , z) = f (z) + g(y) +∑i,j

u(i , j) (y(i , j)− z(i , j))

The dual problem is to find minu

L(u) where

L(u) = maxy∈Y,z∈Z

L(u, y , z) = maxz∈Z

f (z) +∑i,j

u(i , j)z(i , j)

+ max

y∈Y

g(y)−∑i,j

u(i , j)y(i , j)

Dual is an upper bound: L(u) ≥ f (z∗) + g(y∗) for any u

A Subgradient Algorithm for Minimizing L(u)

L(u) = maxz∈Z

f (z) +∑i,j

u(i , j)y(i , j)

+ maxy∈Y

g(y)−∑i,j

u(i , j)z(i , j)

L(u) is convex, but not differentiable. A subgradient of L(u) at uis a vector gu such that for all v ,

L(v) ≥ L(u) + gu · (v − u)

Subgradient methods use updates u′ = u − αgu

In fact, for our L(u), gu(i , j) = z∗(i , j)− y∗(i , j)

Related Work

I Methods that use general purpose linear programming orinteger linear programming solvers (Martins et al. 2009;Riedel and Clarke 2006; Roth and Yih 2005)

I Dual decomposition/Lagrangian relaxation in combinatorialoptimization (Dantzig and Wolfe, 1960; Held and Karp, 1970;Fisher 1981)

I Dual decomposition for inference in MRFs (Komodakis et al.,2007; Wainwright et al., 2005)

I Methods that incorporate combinatorial solvers within loopybelief propagation (Duchi et al. 2007; Smith and Eisner 2008)

on dual decomposition and linear programming relaxations for natural language processing

Documents