![Page 1: Exact decoding for phrase-based SMT › slides › emnlp2014.pdf · Exact decoding for phrase-based SMT Wilker Aziz1, Marc Dymetman2, Lucia Specia1 1University of Sheffield 2Xerox](https://reader030.vdocuments.site/reader030/viewer/2022040108/5f049a497e708231d40ec91b/html5/thumbnails/1.jpg)
Exact decoding for phrase-based SMT
Wilker Aziz¹, Marc Dymetman², Lucia Specia¹
¹University of Sheffield, ²Xerox Research Centre Europe
October 27, 2014
Outline
Introduction
Approach
Results
Conclusions
Decoding
Viterbi decoding
$$d^* = \arg\max_{d \in \mathcal{D}(x)} f(d) = \arg\max_{d \in \mathcal{D}(x)} \theta^\top H(d)$$
- $\mathcal{D}(x)$ is the space of translation derivations compatible with the input $x$
- we are looking for the best derivation under a linear parameterisation ($\theta \in \mathbb{R}^m$)

Separating local and nonlocal features:
$$d^* = \arg\max_{d \in \mathcal{D}(x)} \left( \theta_1^\top H_1(d) + \theta_2^\top H_2(d) \right)$$
- "local" features assess steps in a derivation independently: $H_1(d) = \sum_{e \in d} h_1(e)$
- "nonlocal" features make weaker independence assumptions, e.g. $H_{\mathrm{LM}}(d) = \log p_{\mathrm{LM}}(\mathrm{yield}(d))$
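A minimal Python sketch of the linear model above, under toy assumptions that are not from the slides: a derivation is a list of hypothetical phrase-pair edges; `h1` returns two made-up local features (log phrase probability and a target word count, weighted as a penalty); the nonlocal feature is log p_LM(yield(d)) under an invented bigram table; and the argmax enumerates a hand-written D(x) for a hypothetical input "le chat noir", whereas a real decoder searches a hypergraph with dynamic programming.

```python
import math
from typing import List, Tuple

# Hypothetical edge: (source phrase, target phrase, log phrase probability)
Edge = Tuple[str, str, float]
Derivation = List[Edge]

# Invented bigram probabilities for the nonlocal LM feature
BIGRAM = {
    ("<s>", "the"): 0.5, ("the", "black"): 0.2, ("black", "cat"): 0.4,
    ("cat", "</s>"): 0.5, ("the", "cat"): 0.3, ("cat", "black"): 0.01,
    ("black", "</s>"): 0.1,
}


def h1(e: Edge) -> List[float]:
    """Local features of a single edge: log phrase probability, target length."""
    _, tgt, logp = e
    return [logp, float(len(tgt.split()))]


def H1(d: Derivation) -> List[float]:
    """H1(d) = sum of h1(e) over the edges e of d."""
    totals = [0.0, 0.0]
    for e in d:
        totals = [t + v for t, v in zip(totals, h1(e))]
    return totals


def target_yield(d: Derivation) -> List[str]:
    return " ".join(tgt for _, tgt, _ in d).split()


def H_LM(d: Derivation) -> float:
    """Nonlocal feature log p_LM(yield(d)); unseen bigrams get a small floor."""
    words = ["<s>"] + target_yield(d) + ["</s>"]
    return sum(math.log(BIGRAM.get(bg, 1e-4)) for bg in zip(words, words[1:]))


def score(theta: List[float], d: Derivation) -> float:
    """theta^T H(d), with H(d) the local features followed by the LM feature."""
    feats = H1(d) + [H_LM(d)]
    return sum(t * h for t, h in zip(theta, feats))


def viterbi_bruteforce(theta: List[float], D: List[Derivation]) -> Derivation:
    return max(D, key=lambda d: score(theta, d))


# Toy D(x) for the hypothetical input "le chat noir"
D_x = [
    [("le", "the", math.log(0.9)), ("chat noir", "black cat", math.log(0.5))],
    [("le", "the", math.log(0.9)), ("chat", "cat", math.log(0.8)), ("noir", "black", math.log(0.7))],
    [("le", "the", math.log(0.9)), ("noir", "black", math.log(0.7)), ("chat", "cat", math.log(0.8))],
]
theta = [1.0, -0.1, 1.0]  # made-up weights: phrase score, word penalty, LM
best = viterbi_bruteforce(theta, D_x)
print(" ".join(target_yield(best)), len(best))  # the black cat 3
```

The two derivations yielding "the black cat" differ only in their local phrase scores, while the LM feature is what pushes "the cat black" to the bottom; it can only do so by scoring the bigram ("cat", "black") across an edge boundary.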
Complexity
⟨D(x), f(d)⟩ can be seen as the intersection between
- a translation hypergraph G(x), locally parameterised,
- and a target language model A, represented as a wFSA.

Phrase-based SMT with a distortion limit (d) and an n-gram LM (I: length of the input; Δ: the target vocabulary):
- $|G(x)| \propto I\,2^{2d}$
- $|A| \propto |\Delta|^{n-1}$

Problem: the intersection is too large for standard dynamic programming.
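A back-of-the-envelope sketch of the LM side of this bound: a wFSA that remembers n−1 words of history can need on the order of |Δ|^{n−1} states. The 50,000-word vocabulary below is a hypothetical example, not a figure from the slides.

```python
def lm_state_bound(vocab_size: int, n: int) -> int:
    """Upper bound on the number of (n-1)-word histories: |Delta|^(n-1)."""
    return vocab_size ** (n - 1)


for n in (3, 5):
    print(f"{n}-gram LM: up to {lm_state_bound(50_000, n):.2e} states")
# 3-gram LM: up to 2.50e+09 states
# 5-gram LM: up to 6.25e+18 states
```

Moving from 3-gram to 5-gram LMs multiplies this bound by |Δ|², which illustrates why intersecting the full LM with G(x) is out of reach for standard dynamic programming.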
Contribution
Previous work on exact decoding:
- compact models
- simpler parameterisation using 3-gram LMs

This work:
- large models
- realistic 5-gram LMs
Intuition
Full intersection is wasteful:
$$G(x) \cap A = G(x) \cap \left( \bigcap_{i=1}^{M} A^{(i)} \right)$$
- $A$: the complete n-gram LM
- each $A^{(i)}$: encodes one n-gram

Assumption: not every n-gram participates in high-scoring derivations.
Problem: which n-grams are really necessary, and at which level of refinement?
Strategy: start with strong independence assumptions and revisit those assumptions as necessary.
OS∗: illustration
[Figure: scores (0 to 20) of candidate translations of a toy input ("the black cat", "the dark cat", "the black feline", "the dark feline", "the cat black", "the cat dark", "the feline black", "the feline dark", "the (black cat)", "cat the black", "cat the dark", "feline the black", "feline the dark"), plotting the target f against successive proposals g0 through g5.]
- We upperbound the complex target f(d) by a simpler proposal g(d) (g0).
- Our goal is f's argmax, but we cannot search through f directly.
- Instead, we find d∗ = argmax_d g(d), the derivation our proxy says is best, and assess f(d∗). This is our best solution thus far (according to f), but we cannot guarantee exactness yet: the gap between g(d∗) and the best f-score observed so far is the maximum error.
- So we refine g to bring it closer to f, e.g. by making g(d∗) = f(d∗), at the cost of a small complexity increase (extra nodes and edges) (g1).
- As a consequence, g's argmax changes: we solve d∗ = argmax_d g(d) and assess f(d∗) again; our best solution thus far might remain unchanged.
- We still cannot guarantee exactness, even though our maximum error is smaller.
- But we can continue refining g (g2 through g5) for as long as g and f disagree on the maximum, i.e. until we have a certificate of optimality.
Decoding algorithm
function Optimise(g, ε)
    d∗ ← argmax_d g(d)          ▷ proxy's argmax
    g∗ ← g(d∗)
    f∗ ← f(d∗)                  ▷ observe a point in f
    while (g∗ − f∗ ≥ ε) do      ▷ ε is the maximum error
        A ← actions(g, d∗)      ▷ collect refinement actions
        g ← refine(g, A)        ▷ update proposal
        d∗ ← argmax_d g(d)      ▷ update argmax
        g∗ ← g(d∗)
        f∗ ← max(f∗, f(d∗))     ▷ update "best so far"
    end while
    return g, d∗
end function
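A runnable toy transcription of Optimise, under assumptions that are ours rather than the slides': D(x) is a hand-written list of four candidate translations scored exhaustively instead of by dynamic programming over a hypergraph; f adds invented per-word translation scores to invented bigram LM log-scores; the proposal g is represented by a set R of refined bigrams, scoring a bigram exactly if it is in R and otherwise with the word's most optimistic bigram score (so g ≥ f everywhere); `actions` returns the still-optimistic bigrams of d∗ and `refine` adds them to R. ε must be positive here: with ε = 0 a tight bound (gap exactly 0) would never exit the loop.

```python
from typing import Dict, FrozenSet, List, Tuple

Bigram = Tuple[str, str]
Derivation = Tuple[str, ...]

# Invented local (translation) scores per target word and bigram LM log-scores.
LOCAL: Dict[str, float] = {"the": 0.0, "black": -0.5, "dark": -0.2, "cat": 0.0}
BIGRAM: Dict[Bigram, float] = {
    ("<s>", "the"): -1.0, ("the", "black"): -1.0, ("black", "cat"): -1.0,
    ("the", "dark"): -2.0, ("dark", "cat"): -3.0, ("the", "cat"): -2.0,
    ("cat", "black"): -2.0, ("cat", "dark"): -1.0,
}
UNSEEN = -5.0
CONTEXTS = ("<s>",) + tuple(LOCAL)
# Hand-written D(x): candidate translations of a hypothetical input.
CANDIDATES: List[Derivation] = [
    ("the", "black", "cat"), ("the", "dark", "cat"),
    ("the", "cat", "black"), ("the", "cat", "dark"),
]


def bigram(a: str, b: str) -> float:
    return BIGRAM.get((a, b), UNSEEN)


def bigrams(d: Derivation) -> List[Bigram]:
    seq = ("<s>",) + d
    return list(zip(seq, seq[1:]))


def optimistic(w: str) -> float:
    """Best score of w over every possible predecessor: an upper bound."""
    return max(bigram(a, w) for a in CONTEXTS)


def f(d: Derivation) -> float:
    """True objective: local scores plus exact bigram LM scores."""
    return sum(LOCAL[w] for w in d) + sum(bigram(a, b) for a, b in bigrams(d))


def g(R: FrozenSet[Bigram], d: Derivation) -> float:
    """Proposal: exact scores for refined bigrams in R, optimistic ones otherwise."""
    return sum(LOCAL[w] for w in d) + sum(
        bigram(a, b) if (a, b) in R else optimistic(b) for a, b in bigrams(d)
    )


def argmax_g(R: FrozenSet[Bigram]) -> Derivation:
    return max(CANDIDATES, key=lambda d: g(R, d))


def actions(R: FrozenSet[Bigram], d: Derivation) -> FrozenSet[Bigram]:
    """Refinement actions: the bigrams of d that g still scores optimistically."""
    return frozenset(bigrams(d)) - R


def refine(R: FrozenSet[Bigram], A: FrozenSet[Bigram]) -> FrozenSet[Bigram]:
    return R | A


def optimise(R: FrozenSet[Bigram], eps: float) -> Tuple[FrozenSet[Bigram], Derivation]:
    """Optimise(g, eps) from the slide, with g represented by its refinement set R."""
    d_star = argmax_g(R)                 # proxy's argmax
    g_star = g(R, d_star)
    f_star = f(d_star)                   # observe a point in f
    while g_star - f_star >= eps:        # eps is the maximum error
        A = actions(R, d_star)           # collect refinement actions
        R = refine(R, A)                 # update proposal
        d_star = argmax_g(R)             # update argmax
        g_star = g(R, d_star)
        f_star = max(f_star, f(d_star))  # update "best so far"
    return R, d_star


R, d_star = optimise(frozenset(), eps=1e-9)
print(d_star, len(R))  # ('the', 'black', 'cat') 5
```

In this run the bigrams ("the", "black") and ("black", "cat") of the winning translation are never refined, a small instance of the assumption that not every n-gram participates in high-scoring derivations. Like the pseudocode, the function returns the proxy's argmax d∗; with a loose ε that need not be the best derivation observed so far, so a practical variant might also keep the derivation behind f∗.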
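Proposal
In g, the true LM $p_{\mathrm{LM}}$ is replaced by an upperbound $q_{\mathrm{LM}}$ with stronger independence assumptions.
- Suppose $\alpha = y_I^J$ is a substring of $y = y_1^M$, e.g. $\alpha = \text{black}_2\,\text{cat}_3$ in $y = \text{BOS}_0\,\text{the}_1\,\text{black}_2\,\text{cat}_3\,\text{EOS}_4$.
- The contribution of $\alpha$ to the true LM score of $y$ is
$$p_{\mathrm{LM}}(\alpha) \equiv \prod_{k=I}^{J} p(y_k \mid y_1^{k-1})$$
e.g. $p(\text{black}_2 \mid \text{BOS}_0\,\text{the}_1)\; p(\text{cat}_3 \mid \text{BOS}_0\,\text{the}_1\,\text{black}_2)$.
- An upperbound to $p_{\mathrm{LM}}(\alpha)$ is
$$q_{\mathrm{LM}}(\alpha) \equiv q(y_I \mid \epsilon) \prod_{k=I+1}^{J} q(y_k \mid y_I^{k-1})$$
e.g. $q(\text{black}_2 \mid \epsilon)\; q(\text{cat}_3 \mid \text{black}_2)$,

where $q(z \mid P) \equiv \max_{H \in \Delta^*} p(z \mid HP)$.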
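A brute-force sketch of these definitions, assuming a toy trigram scorer: `PROBS` is an invented, unnormalised table looked up by longest matching suffix (no backoff weights), BOS is treated as an ordinary history token as in the example, and q(z|P) is computed by enumerating every history H of length up to n−1, which suffices because an n-gram model only sees the last n−1 words of HP. The Max-ARPA table described next is what lets the slides avoid this enumeration.

```python
import itertools
from typing import Sequence

ORDER = 3  # trigram model
VOCAB = ("BOS", "the", "black", "dark", "cat", "EOS")
# Invented, unnormalised n-gram scores (toy values, not a real LM).
PROBS = {
    ("the",): 0.1, ("black",): 0.05, ("dark",): 0.05, ("cat",): 0.04, ("EOS",): 0.1,
    ("BOS", "the"): 0.5, ("the", "black"): 0.3, ("black", "cat"): 0.5,
    ("cat", "EOS"): 0.4,
    ("BOS", "the", "black"): 0.2, ("the", "black", "cat"): 0.6,
}


def p(z: str, context: Sequence[str]) -> float:
    """Toy p(z | context): longest suffix of the last ORDER-1 words found in PROBS."""
    ctx = tuple(context)[-(ORDER - 1):]
    for start in range(len(ctx) + 1):
        key = ctx[start:] + (z,)
        if key in PROBS:
            return PROBS[key]
    raise KeyError(z)


def q(z: str, P: Sequence[str]) -> float:
    """q(z | P) = max over histories H of p(z | H P), by enumeration."""
    return max(
        p(z, H + tuple(P))
        for k in range(ORDER)
        for H in itertools.product(VOCAB, repeat=k)
    )


def p_lm(y: Sequence[str], i: int, j: int) -> float:
    """p_LM of alpha = y[i..j]: each word conditioned on its full history."""
    result = 1.0
    for k in range(i, j + 1):
        result *= p(y[k], y[:k])
    return result


def q_lm(y: Sequence[str], i: int, j: int) -> float:
    """q_LM of alpha: empty context for the first word, alpha's prefix for the rest."""
    result = q(y[i], ())
    for k in range(i + 1, j + 1):
        result *= q(y[k], y[i:k])
    return result


y = ["BOS", "the", "black", "cat", "EOS"]
print(p_lm(y, 2, 3), q_lm(y, 2, 3))  # about 0.12 and 0.18
```

For α = black₂ cat₃ the true contribution is 0.2 × 0.6, while the bound uses the best context for "black" (after "the" without BOS, 0.3) and the best context for "cat" given "black" (0.6).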
![Page 51: Exact decoding for phrase-based SMT › slides › emnlp2014.pdf · Exact decoding for phrase-based SMT Wilker Aziz1, Marc Dymetman2, Lucia Specia1 1University of Sheffield 2Xerox](https://reader030.vdocuments.site/reader030/viewer/2022040108/5f049a497e708231d40ec91b/html5/thumbnails/51.jpg)
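Max-ARPA
An n-gram is scored in its most optimistic context H:
$q(z \mid P) \equiv \max_{H \in \Delta^*} p(z \mid HP)$
Efficiently computed using a Max-ARPA table M
- start with an ARPA table
  columns: n-gram $Pz$, conditional $\log p(z \mid P)$, backoff $b(Pz)$
- compute an upper-bound view of the last 2 columns
  columns: n-gram $Pz$, max-conditional $\log q(z \mid P)$, max-backoff $m(Pz)$
Then, for an arbitrary n-gram $Pz$,
$$
q(z \mid P) =
\begin{cases}
p(z \mid P) & \text{if } Pz \notin M \text{ and } P \notin M \\
p(z \mid P) \times m(P) & \text{if } Pz \notin M \text{ and } P \in M \\
q(z \mid P) \text{ (stored in } M) & \text{if } Pz \in M
\end{cases}
$$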
8 / 21
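The three cases can be checked on a toy example. The Python sketch below derives the max-backoff m(P) and the max-conditional q(z|P) from a hypothetical ARPA-style table (plain probabilities instead of log10 values, for readability) and compares q with a brute-force maximisation of p(z|HP) over left contexts H. The recursions are one possible reading of the slide, worked out from standard ARPA backoff; they are not necessarily how the authors build their table, and out-of-vocabulary words are not handled.

```python
from functools import lru_cache
from itertools import product

# Hypothetical ARPA-style table: n-gram -> (p(z|P), backoff b(Pz)).
# Probabilities rather than log10 values; the table is prefix/suffix closed.
ARPA = {
    ("the",): (0.2, 0.5),
    ("black",): (0.1, 0.6),
    ("cat",): (0.1, 1.0),
    ("the", "black"): (0.3, 2.0),
    ("black", "cat"): (0.5, 1.0),
    ("the", "black", "cat"): (0.8, 1.0),
}
VOCAB = ["the", "black", "cat"]

def p(z, ctx):
    """Standard ARPA lookup: stored p(z|ctx), else b(ctx) * p(z | ctx[1:])."""
    if ctx + (z,) in ARPA:
        return ARPA[ctx + (z,)][0]
    if not ctx:
        raise KeyError(f"OOV word {z!r}")  # not handled in this sketch
    backoff = ARPA[ctx][1] if ctx in ARPA else 1.0
    return backoff * p(z, ctx[1:])

def left_extensions(ngram):
    """Stored n-grams of the form x + ngram (one extra word of left context)."""
    return [k for k in ARPA if len(k) == len(ngram) + 1 and k[1:] == ngram]

@lru_cache(maxsize=None)
def max_backoff(ctx):
    """m(P): largest product of backoff weights picked up by any context H P."""
    best = 1.0  # H = empty string
    for ext in left_extensions(ctx):
        best = max(best, ARPA[ext][1] * max_backoff(ext))
    return best

@lru_cache(maxsize=None)
def q(z, ctx):
    """Max-conditional q(z|P), following the three cases of the slide."""
    if ctx + (z,) not in ARPA:
        if ctx not in ARPA:
            return p(z, ctx)                  # Pz not in M and P not in M
        return p(z, ctx) * max_backoff(ctx)   # Pz not in M and P in M
    best = ARPA[ctx + (z,)][0]                # Pz in M: H = empty string, or
    for ext in left_extensions(ctx):          # H ends in x with xP stored
        best = max(best, q(z, ext))
    return best

def brute_force_q(z, ctx, max_len=3):
    """Reference value: maximise p(z | H P) over all H up to max_len words."""
    return max(p(z, h + ctx)
               for n in range(max_len + 1) for h in product(VOCAB, repeat=n))

print(p("cat", ("black",)), q("cat", ("black",)))   # 0.5 vs. 0.8
```

Here q(cat|black) = 0.8 comes from the stored trigram "the black cat", and q(black|black) = 0.12 comes from the second case: "black black" is not stored, but the context "the black" carries a backoff weight of 2.0, so m(black) = 2.0.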
Proposal hypergraph
The proposal g(d) can be efficiently represented by a hypergraph
Remark! Nodes do not store LM state
$|\langle D(x), f(d)\rangle| = |G(d) \cap A| \propto I\, 2^{2d}\, |\Delta|^{n-1}$
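$|\langle D(x), g(d)\rangle| = |G(d) \cap \underbrace{A}_{\text{upperbound}}| \propto I\, 2^{2d}$ (the factor $|\Delta|^{n-1}$ disappears, since nodes do not store LM state)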
9 / 21
Refinement
Goal: break independence assumptions
1. making larger n-grams
2. bringing g closer to f
Examples
$$g'(d) = \begin{cases} g(d) & \text{if } d \neq \arg\max_{d'} g(d') \\ f(d) & \text{otherwise} \end{cases}$$
Too local: one derivation at a time
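$$g'(d) = g(d) \left(\frac{q(z \mid hP)}{q(z \mid P)}\right)^{k}$$
where $k$ counts the occurrences of $hPz$ in yield(d)
[Figure: weighted automaton with states 0 and 1 and arcs labelled a, else, a/w(a) and z/w(z)]
Too global: refines derivations which already score poorly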
10 / 21
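Both kinds of refinement plug into the same OS* loop: take the argmax of g, stop if g and f agree on it, otherwise refine and repeat. The Python sketch below uses the global update with a bigram LM (so P is empty, h is the previous word, and q(z|h) = p(z|h)), applied to an explicit list of yields that stands in for the hypergraph. The probabilities, the yields and helper names such as `os_star_argmax` are hypothetical.

```python
import math

VOCAB = ["BOS", "the", "black", "dark", "cat"]
FLOOR = 0.01                      # hypothetical p(z|h) for unlisted bigrams
BIGRAM = {                        # hypothetical bigram LM p(z|h)
    ("BOS", "the"): 0.5, ("BOS", "black"): 0.1,
    ("the", "black"): 0.2, ("the", "dark"): 0.1, ("the", "cat"): 0.3,
    ("black", "cat"): 0.3, ("black", "the"): 0.05,
    ("dark", "cat"): 0.4, ("cat", "black"): 0.05,
}
YIELDS = [                        # explicit stand-in for the derivations D(x)
    ("BOS", "the", "black", "cat"),
    ("BOS", "the", "dark", "cat"),
    ("BOS", "the", "cat", "black"),
    ("BOS", "black", "the", "cat"),
]

def p(h, z):
    return BIGRAM.get((h, z), FLOOR)

def q(z):
    """Optimistic score q(z) = max over contexts h of p(z|h)."""
    return max(p(h, z) for h in VOCAB)

def bigrams(y):
    return list(zip(y, y[1:]))

def f(y):
    """True LM score of a yield."""
    return math.prod(p(h, z) for h, z in bigrams(y))

def g(y, refined):
    """Proposal: exact factor for refined bigrams hz, optimistic q(z) otherwise."""
    return math.prod(p(h, z) if (h, z) in refined else q(z) for h, z in bigrams(y))

def os_star_argmax(yields):
    refined = set()
    while True:
        best = max(yields, key=lambda y: g(y, refined))
        if math.isclose(g(best, refined), f(best)):
            return best, refined      # g(best) = f(best) >= f(d) for all d
        # global refinement: pick a loose bigram hz of the argmax and fix it in
        # every derivation, i.e. g'(d) = g(d) * (p(z|h) / q(z)) ** k
        hz = next(b for b in bigrams(best) if b not in refined and q(b[1]) > p(*b))
        refined.add(hz)

best, refined = os_star_argmax(YIELDS)
print(best, f(best), sorted(refined))
```

Since every factor of g is at least the corresponding factor of f, g(d) ≥ f(d) holds for every d throughout; once g and f agree on the argmax of g, that derivation also maximises f. Each refinement here touches every derivation containing hz, which is the "too global" behaviour the slide points out.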
LM state refinement[1]
Break independence assumptions (making larger n-grams), but not in every derivation
Example:
- suppose the argmax is (X1 (X2 (X3 the) black) cat)
- and node X3 currently stores an empty LM state
- this motivates a refined node X3′ whose LM state is the · LMState(X3)
- incoming edges to X3 are split
- outgoing edges from X3′ are reweighted copies of those leaving X3
This is connected to an intersection local to X3:
$\ast\,(X_3 \ast \text{the})\,z\,\ast$ with weight update $\dfrac{q(z \mid \text{the})}{q(z)}$
[1] Li and Khudanpur, 2008; Heafield et al., 2013
11 / 21
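For a left-branching derivation like the one in the example, the node split can be sketched on a left-to-right lattice. In the Python sketch below, an edge is a (source, target, word, weight) tuple whose weight already includes the optimistic LM factor of its word. Splitting X3 on "the" redirects the incoming edges that emit "the" to a new node X3′ and gives X3′ copies of X3's outgoing edges, reweighted by q(z|the)/q(z). The lattice, the node names S and X4, and all scores are hypothetical, and a real implementation works on hyperedges rather than lattice edges.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    src: str
    dst: str
    word: str
    weight: float   # includes the optimistic LM factor for `word`

def split_node(edges, node, last_word, q_uni, q_ctx):
    """Refine `node` by one word of LM state: `last_word` becomes its context."""
    refined = node + "'"                       # the refined node, e.g. X3'
    new_edges = []
    for e in edges:
        if e.dst == node and e.word == last_word:
            new_edges.append(e._replace(dst=refined))   # split incoming edges
        else:
            new_edges.append(e)
    for e in edges:
        if e.src == node:                      # reweighted copies of outgoing edges
            ratio = q_ctx[(last_word, e.word)] / q_uni[e.word]  # q(z|the) / q(z)
            new_edges.append(e._replace(src=refined, weight=e.weight * ratio))
    return new_edges

q_uni = {"black": 0.2, "dark": 0.1}                        # hypothetical q(z)
q_ctx = {("the", "black"): 0.15, ("the", "dark"): 0.05}    # hypothetical q(z|the)
lattice = [
    Edge("S", "X3", "the", 0.5),
    Edge("S", "X3", "a", 0.3),
    Edge("X3", "X4", "black", 0.2),
    Edge("X3", "X4", "dark", 0.1),
]
for edge in split_node(lattice, "X3", "the", q_uni, q_ctx):
    print(edge)
```

Paths entering X3 through "a" keep their old scores; only paths through X3′ are tightened, since q(z|the) ≤ q(z).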
Refinement actions
Goal: lower g’s maximum as much as possible
Heuristic
Refine all nodes participating in the current argmax
- by extending a node’s LM (right) state by exactly one word from yield(argmax g(d))
12 / 21
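Read on a derivation tree, the heuristic extends each node's right LM state by the next word to its left in the node's yield, up to n − 1 words. The Python sketch below assumes that derivations are nested tuples (label, child, ...) and that LM states are stored per node label; both representations and the function names are hypothetical. On the argmax (X8 (X6 (X2 the) black) cat) of the example that follows, it proposes the states "cat", "black" and "the" for X8, X6 and X2.

```python
def node_yield(tree):
    """Words spanned by a derivation (sub)tree given as (label, child, ...)."""
    words = []
    for child in tree[1:]:
        words.extend(node_yield(child) if isinstance(child, tuple) else [child])
    return words

def nodes(tree):
    """All (label, subtree) pairs of a derivation, root first."""
    yield tree[0], tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from nodes(child)

def refinement_actions(argmax_tree, states, n):
    """Extend each node's right LM state by exactly one word of its yield."""
    actions = {}
    for label, subtree in nodes(argmax_tree):
        words = node_yield(subtree)
        state = states.get(label, ())            # empty LM state by default
        if len(state) < min(n - 1, len(words)):  # room for one more word
            actions[label] = tuple(words[-(len(state) + 1):])
    return actions

argmax = ("X8", ("X6", ("X2", "the"), "black"), "cat")
first = refinement_actions(argmax, {}, n=3)
print(first)   # {'X8': ('cat',), 'X6': ('black',), 'X2': ('the',)}
print(refinement_actions(argmax, first, n=3))
```

A second call extends X8 and X6 to two words ("black cat" and "the black"), while X2 is left alone: its yield has a single word, so any further context would have to come from outside the node.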
13 / 21
[Figure: f and g for each derivation (y-axis from 0 to 20); derivations include (8 (6 (2 the) black) cat), (8 (6 (2 the) dark) feline), (5 (4 (2 the) cat) black) and (5 (7 (3 feline) the) dark)]
- the argmax is (X8 (X6 (X2 the) black) cat)
- refine spans continuing from X2 by conditioning on "the"
- refine spans continuing from X6 by conditioning on "black"
- refine spans continuing from X8 by conditioning on "cat"
- obtaining a new hypergraph $\langle D(x), g'(d)\rangle$
Experiments
German-English (WMT14 data)
- phrase extraction: 2.2M sentences
- maximum phrase length 5
- maximum translation options 40
- unpruned LMs: 25M sentences
- dev set: newstest2010 (LM interpolation and tuning)
- batch-mira tuning (cube pruning beam 5000)
- test set: newstest2012 (3,003 sentences)
- distortion limit d = 4
14 / 21
Exact decoding
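| n | build (s) | total (s) | N | \|V\| | \|E\| |
|---|---|---|---|---|---|
| 3 | 1.5 | 21 | 190 | 2.5 | 159 |
| 4 | 1.5 | 50 | 350 | 4 | 288 |
| 5 | 1.5 | 106 | 555 | 6.1 | 450 |

- build: time to build the initial proposal
- total: decoding time
- N: number of iterations
- |V|, |E|: size of the hypergraph (in thousands of nodes and edges)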
15 / 21
Cube pruning
Search errors by beam size (k)
| k | 3-gram LM | 4-gram LM | 5-gram LM |
|---|---|---|---|
| 10 | 2168 | 2347 | 2377 |
| $10^2$ | 613 | 999 | 1126 |
| $10^3$ | 29 | 102 | 167 |
| $10^4$ | 0 | 4 | 7 |
16 / 21
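To read the counts as rates, the Python snippet below divides them by the size of the test set. It assumes, although the slide does not say so, that each entry counts the newstest2012 sentences (3,003, per the Experiments slide) for which cube pruning returned a lower-scoring derivation than exact decoding.

```python
TEST_SENTENCES = 3003   # newstest2012 size, from the Experiments slide

# search errors by beam size k (outer keys) and LM order n (inner keys),
# copied from the table above
SEARCH_ERRORS = {
    10:    {3: 2168, 4: 2347, 5: 2377},
    10**2: {3: 613, 4: 999, 5: 1126},
    10**3: {3: 29, 4: 102, 5: 167},
    10**4: {3: 0, 4: 4, 5: 7},
}

def error_rates(errors, total=TEST_SENTENCES):
    """Convert raw search-error counts into percentages of the test set."""
    return {k: {n: 100.0 * c / total for n, c in row.items()}
            for k, row in errors.items()}

for k, row in error_rates(SEARCH_ERRORS).items():
    print(f"k={k:>5}: " + "  ".join(f"{n}-gram {r:5.1f}%" for n, r in row.items()))
```

Under that reading, a beam of k = 10 misses the exact solution for about 79% of the sentences with the 5-gram LM (2377/3003), against roughly 0.2% at k = 10^4 (7/3003).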
Translation quality with BLEU
Cube pruning and exact decoding with OS∗
| k | 3-gram LM | 4-gram LM | 5-gram LM |
|---|---|---|---|
| 10 | 20.47 | 20.71 | 20.69 |
| $10^2$ | 21.14 | 21.73 | 21.76 |
| $10^3$ | 21.27 | 21.89 | 21.91 |
| $10^4$ | 21.29 | 21.92 | 21.93 |
| OS∗ | 21.29 | 21.92 | 21.93 |
17 / 21
Conclusions
Exact decoding
- manageable time
- fraction of the search space
Search error curves for beam search and cube pruning
- large phrase tables
- large 5-gram LMs
Exactness at the cost of worst-case complexity, however
- we demonstrate empirically that the algorithm is practicable
18 / 21
Recent developments
1. exact k-best
2. exact sampling
19 / 21
Future work
Optimisation
- stop the search early
- error-safe pruning
Speedups
- be more selective with refinements
- LR to deal with powerset constraints (allowing for higher d)
- more grouping (partial edges)
Hiero models
20 / 21
Thanks!
Questions?
21 / 21
Variable-order LM
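| n | nodes, m=0 | nodes, m=1 | nodes, m=2 | nodes, m=3 | nodes, m=4 | LM states, m=1 | LM states, m=2 | LM states, m=3 | LM states, m=4 |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.4 | 1.2 | 0.5 | - | - | 113 | 263 | - | - |
| 4 | 0.4 | 1.6 | 1.4 | 0.3 | - | 132 | 544 | 212 | - |
| 5 | 0.4 | 2.1 | 2.4 | 0.7 | 0.1 | 142 | 790 | 479 | 103 |

Table: Average number of nodes (in thousands) whose LM state encodes an m-gram, and average number of unique LM states of order m in the final hypergraph, for different n-gram LMs (d = 4 everywhere).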
22 / 21
References I
Kenneth Heafield, Philipp Koehn, and Alon Lavie. Grouping language model boundary words to speed k-best extraction from hypergraphs. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 958–968, Atlanta, Georgia, USA, June 2013.
Zhifei Li and Sanjeev Khudanpur. A scalable decoder for parsing-based machine translation with equivalent language model state maintenance. In Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation, SSST ’08, pages 10–18, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
23 / 21