Linguistically-motivated Tree-based Probabilistic Phrase Alignment
Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)
Outline
- Background
- Tree-based Probabilistic Phrase Alignment Model
- Model Training
- Symmetrization Algorithm
- Experiments
- Conclusions
2 05/03/23
Background
- Many state-of-the-art SMT systems are based on "word-based" alignment results
  - Phrase-based SMT [Koehn et al., 2003]
  - Hierarchical Phrase-based SMT [Chiang, 2005], and so on
- Some of them incorporate syntactic information "after" word-based alignment
  - [Quirk et al., 2005], [Galley et al., 2006], and so on
- Is that enough? Can it achieve "practical" translation quality?
Background (cont.)
- The word-based alignment model works well for structurally similar language pairs
- It is not effective for language pairs with great differences in linguistic structure, such as Japanese and English (SOV versus SVO)
- For such language pairs, syntactic information is necessary even during the alignment process
Related Work
- Syntactic tree-based models: [Yamada and Knight, 2001], [Gildea, 2003], ITG by Wu
  - They incorporate operations that manipulate subtrees (re-order, insert, delete, clone) to reproduce the opposite tree structure
  - Our model does not require any such operations, and it utilizes dependency trees
- Dependency tree-based model: [Cherry and Lin, 2003]
  - Word-to-word, one-to-one alignment
  - Our model makes phrase-to-phrase alignments and can create many-to-many links
Features of the Proposed Tree-based Probabilistic Phrase Alignment Model
- A generative model similar to the IBM models
- Uses phrase dependency structures
  - "phrase" means a linguistic phrase (cf. phrase-based SMT)
- A phrase-to-phrase alignment model
  - Each phrase (node) consists of basically one content word and zero or more function words
  - Source-side content words can be aligned only to target-side content words (and likewise for function words)
- Generation starts from the root node and ends at one of the leaf nodes (cf. the IBM models go from the first word to the last word)
Dependency Analysis of Sentences
- Japanese: プロピレングリコールは血中グルコースインスリンを上昇させ、血中NEFA 濃度を減少させる
- English: Propylene glycol increases in blood glucose and insulin and decreases in NEFA concentration in the blood
[Figure: source and target dependency trees for the sentence pair, with word order, head nodes, and root nodes marked]
IBM Model vs. Tree-based Model
- IBM Model [Brown et al., 93]:
  â = argmax_a p(f|e, a) p(a|e)
  θ̂ = argmax_θ Π_{s=1}^{S} Σ_a p(f_s|e_s, a) p(a|e_s)
- Tree-based Model:
  â = argmax_a p(T_f|T_e, a) p(a|T_e)
  θ̂ = argmax_θ Π_{s=1}^{S} Σ_a p(T_{f,s}|T_{e,s}, a) p(a|T_{e,s})
- Notation: f: source sentence; e: target sentence; a: alignment; θ: parameters; T_f: source tree; T_e: target tree
Model Decomposition: Lexicon Probability
- Suppose T_f consists of J nodes and T_e consists of I nodes
- p(T_f|T_e, a) is calculated as a product over the source nodes:
  p(T_f|T_e, a) = Π_{j=1}^{J} p(f_j|e_{a_j})
- Each factor p(f_j|e_{a_j}) (the phrase translation probability) is a product of two probabilities, one for the content words and one for the function words:
  p(f_j|e_{a_j}) = p_cont(f_j|e_{a_j}) p_func(f_j|e_{a_j})
- Ex) 濃度 を – "in concentration": p_cont(濃度|concentration) p_func(を|in)
- Ex) 上昇 させ – "increase": p_cont(上昇|increase) p_func(させ|EMPTY)
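The two-factor decomposition above can be sketched in a few lines of Python. The probability tables, the romanized dictionary keys, and the function name are made-up toy values for illustration, not the paper's learned parameters.

```python
# Sketch of the lexicon-probability decomposition: each source phrase is
# scored as p_cont(content | content) * p_func(function | function).
# The tiny probability tables below hold invented values.

p_cont = {("noudo", "concentration"): 0.6, ("jyoushou", "increase"): 0.5}
p_func = {("wo", "in"): 0.3, ("sase", "EMPTY"): 0.2}

def phrase_prob(f_cont, f_func, e_cont, e_func):
    """p(f_j | e_{a_j}) = p_cont(f.cont | e.cont) * p_func(f.func | e.func)."""
    return p_cont.get((f_cont, e_cont), 0.0) * p_func.get((f_func, e_func), 0.0)

# 濃度 を - "in concentration" from the slide's example, romanized:
print(round(phrase_prob("noudo", "wo", "concentration", "in"), 2))  # 0.18
```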
Model Decomposition: Alignment Probability
- Define the parent node of f_j as f_ĵ. p(a|T_e) is decomposed as a product of target-side dependency relation probabilities conditioned on the corresponding source-side relations (the dependency relation probability):
  p(a|T_e) = Π_{j=1}^{J} p(rel_e(e_{a_j}, e_{a_ĵ}) | rel_f(f_j, f_ĵ))
- If the parent node f_ĵ has been aligned to NULL, f_ĵ instead indicates the grandparent of f_j, and this continues until f_ĵ is aligned to something other than NULL
- p(a|T_e) models tree-based reordering
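The NULL-skipping rule for the parent node can be sketched as follows. The tree, the alignment, and the function name are toy assumptions for illustration.

```python
# Sketch of the NULL-skipping rule: parent[j] gives the dependency parent
# of source node j (None for the root), and align[j] gives the aligned
# target node (None meaning aligned to NULL). If a node's parent is
# aligned to NULL we climb to the grandparent, and so on, until we reach
# an ancestor aligned to something other than NULL.

def effective_parent(j, parent, align):
    """Nearest ancestor of j whose alignment is not NULL."""
    p = parent[j]
    while p is not None and align[p] is None:
        p = parent[p]
    return p

# Toy tree: 0 is the root; 1 and 2 depend on 0; 3 depends on 2.
parent = {0: None, 1: 0, 2: 0, 3: 2}
align = {0: 5, 1: 7, 2: None, 3: 8}   # node 2 is aligned to NULL

print(effective_parent(3, parent, align))  # 0: node 2 is skipped
```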
Model Training
- The proposed model is trained by the EM algorithm
- First, the phrase translation probability is learned (Model 1)
  - Model 1 can be learned efficiently without approximation (cf. IBM Models 1 and 2)
- Next, the dependency relation probability is learned (Model 2), with the probabilities learned in Model 1 as initial parameters
  - Model 2 needs some approximation (cf. IBM Model 3 or greater); we use a beam-search algorithm
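A minimal sketch of the Model 1 EM loop, in the style of IBM Model 1 training: because every source phrase may align to every target phrase, expected counts can be collected exactly. The two-pair corpus and all symbols here are invented toy data, not the paper's training setup.

```python
# Toy EM loop for a Model-1-style phrase translation probability t(f|e).
from collections import defaultdict

corpus = [(["A", "B"], ["x", "y"]), (["A"], ["x"])]  # (source, target) pairs

# Uniform initialization of t(f|e) over the source phrase vocabulary.
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # normalizers per target phrase e
    for fs, es in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z          # posterior over target phrases
                count[(f, e)] += c
                total[e] += c
    t = defaultdict(float, {pair: count[pair] / total[pair[1]] for pair in count})

# "A" co-occurs with "x" in both pairs, so t("A"|"x") grows toward 1.
print(round(t[("A", "x")], 3))
```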
Model 1
- Each phrase f_j (1 ≤ j ≤ J) on the source side can correspond to an arbitrary phrase e_i (1 ≤ i ≤ I) on the target side, or to the NULL phrase e_0
- The probability of one possible alignment is:
  p(a, T_f|T_e) = Π_{j=1}^{J} p_cont(f_j|e_{a_j}) p_func(f_j|e_{a_j})
- Then, the tree translation probability is:
  p(T_f|T_e) = Σ_a p(a, T_f|T_e)
- This is efficiently calculated as:
  Σ_a Π_{j=1}^{J} p(f_j|e_{a_j}) = Π_{j=1}^{J} Σ_{i=0}^{I} p(f_j|e_i)
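The "efficiently calculated" identity above is just the distributive law: the sum over all alignments of a product factorizes into a product of per-phrase sums. A small brute-force check on made-up numbers:

```python
# Verify  sum_a prod_j p(f_j|e_{a_j}) == prod_j sum_i p(f_j|e_i)
# on a toy 2x3 probability table (invented values).
from itertools import product as cartesian

J, I = 2, 3                      # 2 source phrases, 3 target choices (incl. NULL)
p = [[0.1, 0.2, 0.7],            # p[j][i] = p(f_j | e_i)
     [0.5, 0.3, 0.2]]

# Left-hand side: enumerate all I^J alignments explicitly.
brute = sum(p[0][a0] * p[1][a1] for a0, a1 in cartesian(range(I), repeat=2))

# Right-hand side: product of per-phrase sums, computed in O(J * I).
fast = 1.0
for j in range(J):
    fast *= sum(p[j])

print(abs(brute - fast) < 1e-12)  # True
```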
Model 2 (imaginary ROOT node)
- The root node of a sentence is supposed to depend on an imaginary ROOT node, which works like the Start-Of-Sentence (SOS) symbol in word-based models
- The ROOT node of the source tree always corresponds to the ROOT node of the target tree
[Figure: dependency trees for 事例を通して援助の視点に必要なポイントを確認した and "the necessary point in the viewpoint of the assist was confirmed through the case", each headed by an imaginary ROOT node]
Model 2 (beam-search algorithm)
- It is impossible to enumerate all the possible alignments
- Consider only a subset of "good-looking" alignments using a beam-search algorithm
- Ex) beam width = 4
[Figure: the example trees above, with candidate alignment links and a NULL target]
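A minimal beam-search sketch in the spirit of this slide: partial alignments are extended one source node at a time and only the top-scoring candidates survive each step. The scoring function and the flat left-to-right expansion order are simplifying assumptions, since the real model walks the dependency tree.

```python
# Beam-search sketch: keep only the beam_width best partial alignments.
# score(j, i) stands in for the real model score of linking source node j
# to target node i (i == num_tgt means NULL).

def beam_search(num_src, num_tgt, score, beam_width=4):
    beams = [((), 1.0)]                      # (partial alignment, score)
    for j in range(num_src):
        expanded = [
            (al + (i,), s * score(j, i))
            for al, s in beams
            for i in range(num_tgt + 1)      # last index plays the NULL role
        ]
        expanded.sort(key=lambda x: x[1], reverse=True)
        beams = expanded[:beam_width]        # prune to the beam width
    return beams

# Toy score that strongly prefers the diagonal alignment.
best = beam_search(3, 3, lambda j, i: 0.9 if i == j else 0.05)
print(best[0][0])  # (0, 1, 2)
```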
[Figure: four successive beam-search frames over the same example trees, each showing a different candidate alignment]
Model 2 (parameter notations)
- The dependency relation between two phrases P1 and P2, rel(P1, P2), is defined as a path from P1 to P2, using the following notations:
  - "c-" if P2 is a pre-child of P1
  - "c+" if P2 is a post-child of P1
  - "p-" if P1 is a post-child of P2
  - "p+" if P1 is a pre-child of P2
  - "INCL" if P1 and P2 are the same phrase
  - "ROOT" if P2 is the imaginary ROOT node
  - "NULL" if P1 is aligned to NULL
[Figure: tree diagrams illustrating the c-, c+, p-, p+, and ROOT relations between P1 and P2]
Model 2 (parameter notations, cont.)
- When P1 and P2 are two or more nodes distant from each other, the relation is described by combining the notations
- Ex) "c-;c+" and "p-;c+;c-"
[Figure: two example trees illustrating the combined relations c-;c+ and p-;c+;c-]
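The path notation can be computed mechanically from parent pointers. The tree encoding below (a parent map plus a pre/post flag per node) and the function name are assumed toy representations, not the paper's data structures.

```python
# Sketch of the relation notation: the relation between P1 and P2 is the
# path from P1 to P2, with "c-"/"c+" for a step down to a pre-/post-child,
# "p-"/"p+" for a step up from a post-/pre-child, joined by ";".

def rel(p1, p2, parent, side):
    """side[n] is '-' if n is a pre-child of its parent, '+' if a post-child."""
    if p1 == p2:
        return "INCL"

    def ancestors(n):                       # n, parent(n), ..., root
        chain = [n]
        while parent[n] is not None:
            n = parent[n]
            chain.append(n)
        return chain

    a1, a2 = ancestors(p1), ancestors(p2)
    common = next(n for n in a1 if n in a2)  # lowest common ancestor
    # Upward steps: "p-" when leaving a post-child, "p+" a pre-child.
    up = ["p-" if side[n] == "+" else "p+" for n in a1[:a1.index(common)]]
    # Downward steps toward p2, read from the common ancestor down.
    down = ["c" + side[n] for n in reversed(a2[:a2.index(common)])]
    return ";".join(up + down)

# Toy tree: B and C are pre-children of A; D is a post-child of C.
parent = {"A": None, "B": "A", "C": "A", "D": "C"}
side = {"B": "-", "C": "-", "D": "+"}
print(rel("A", "D", parent, side))  # c-;c+
```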
Dependency Relation Probability Examples
[Figure: four alignment configurations over the example trees, each illustrating one of the following products of dependency relation probabilities]
- p(ROOT|ROOT) p(c-|c-)
- p(ROOT;c-|ROOT) p(p-|c-)
- p(ROOT|ROOT) p(c-;c+|c-)
- p(NULL|ROOT) p(ROOT|ROOT;c-)
Example
[Figure: aligned dependency trees for 事例を通して援助の視点に必要なポイントを確認した and "the necessary point in the viewpoint of the assist was confirmed through the case", both headed by imaginary ROOT nodes]
p(a, T_f|T_e) = p(確認|was confirmed) p(した|EMPTY) p(ROOT|ROOT)
  × p(事例|case) p(を 通して|through) p(c-;c+|c-)
  × p(ポイント|point) p(を|EMPTY) p(c-|c-)
  × p(必要な|necessary) p(EMPTY|EMPTY) p(c-|c-)
  × p(視点|viewpoint) p(に|in) p(c-|c-)
  × p(援助|assist) p(の|of) p(c-|c-)
Symmetrization Algorithm
- Since our model is directional, we run it bi-directionally and symmetrize the two alignment results heuristically
- The symmetrization algorithm is similar to [Koehn et al., 2003], which uses the 1-best GIZA++ word alignment result of each direction
- Our algorithm exploits the n-best alignment results of each direction
- Three steps: 1. Superimposition, 2. Growing, 3. Handling isolations
Symmetrization Algorithm 1. Superimposition
[Figure: the source-to-target 5-best and target-to-source 5-best alignment matrices are superimposed into a single score matrix]
Symmetrization Algorithm 1. Superimposition (cont.)
- Definitive alignment points are adopted: points that have no point with the same or a higher score in their row or column
- Conflicting points are discarded: points that are in the same row or column as an adopted point and are not contiguous to the adopted point on the tree
[Figure: the score matrix before and after adopting definitive points and discarding conflicts]
Symmetrization Algorithm 2. Growing
- Adopt points contiguous to already-adopted points in both the source and target tree
  - In descending order of score, from top to bottom, from left to right
- Discard conflicting points: points that have an adopted point in both the same row and the same column
[Figure: the score matrix before and after the growing step]
Symmetrization Algorithm 3. Handling Isolation
- Adopt points whose phrases are not aligned to any phrase in either the source or target language
[Figure: the score matrix before and after handling isolated phrases]
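A simplified sketch of steps 1 and 2 on a superimposed score matrix. The tree-contiguity check and step 3 (handling isolations) are omitted, and the grid values and function name are invented for illustration.

```python
# Simplified symmetrization over a superimposed score matrix.
# Cells hold superimposed n-best scores; 0 means the point never appeared.

def symmetrize(score):
    rows, cols = len(score), len(score[0])
    adopted = set()
    # 1. Superimposition: adopt points with strictly the highest score
    #    in both their row and their column ("definitive" points).
    for r in range(rows):
        for c in range(cols):
            s = score[r][c]
            if s > 0 \
               and all(score[r][k] < s for k in range(cols) if k != c) \
               and all(score[k][c] < s for k in range(rows) if k != r):
                adopted.add((r, c))
    # 2. Growing: adopt remaining points in descending score order, unless
    #    adopted points already occupy both the same row and same column.
    rest = sorted(((score[r][c], r, c)
                   for r in range(rows) for c in range(cols)
                   if score[r][c] > 0 and (r, c) not in adopted),
                  reverse=True)
    for s, r, c in rest:
        row_taken = any(rr == r for rr, cc in adopted)
        col_taken = any(cc == c for rr, cc in adopted)
        if not (row_taken and col_taken):
            adopted.add((r, c))
    return adopted

grid = [[10, 0, 0],
        [0, 7, 5],
        [0, 0, 9]]
print(sorted(symmetrize(grid)))  # [(0, 0), (1, 1), (2, 2)]
```

The point (1, 2) with score 5 is discarded in the growing step because adopted points already occupy both its row and its column.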
Alignment Experiment
- Training corpus: Japanese-English paper abstract corpus provided by JST, consisting of about 1M parallel sentences
- Gold-standard alignment: 100 manually annotated sentence pairs from the training corpus; Sure (S) alignments only [Och and Ney, 2003]
- Evaluation unit: morpheme-based for Japanese, word-based for English
- Iterations: 5 iterations for Model 1 and 5 iterations for Model 2
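The alignment evaluation measures can be computed directly from the predicted and sure link sets. The link sets below are toy data; the function name is ours.

```python
# Precision, recall, and F-measure of predicted alignment links against
# sure (S) gold links, evaluated as sets of (source, target) index pairs.

def evaluate(predicted, sure):
    tp = len(predicted & sure)           # correctly predicted links
    precision = tp / len(predicted)
    recall = tp / len(sure)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

pred = {(0, 0), (1, 2), (2, 1)}
gold = {(0, 0), (1, 2), (2, 2)}
p, r, f = evaluate(pred, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```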
Alignment Experiment (cont.)
- Comparative experiment (word-based alignment): GIZA++ with various symmetrization heuristics [Koehn et al., 2007]
- Default settings for GIZA++
- Original forms of words are used for both Japanese and English
Results (Precision / Recall / F-measure)
- proposed:
  - 1-best-intersection: 90.92 / 41.69 / 57.17
  - 1-best-grow: 83.30 / 54.33 / 65.76
  - 3-best-grow: 81.21 / 56.52 / 66.65
  - 5-best-grow: 80.59 / 57.33 / 67.00
- GIZA++:
  - intersection: 88.14 / 40.18 / 55.20
  - grow: 83.50 / 49.65 / 62.27
  - grow-final: 67.19 / 56.91 / 61.63
  - grow-final-and: 78.00 / 52.93 / 63.06
  - grow-diag: 77.34 / 53.18 / 63.03
  - grow-diag-final: 67.24 / 56.63 / 61.48
  - grow-diag-final-and: 74.95 / 54.26 / 62.95
Example of Alignment Improvement
[Figure: side-by-side alignment matrices for the proposed model and word-based alignment]
Translation Experiments
- Training corpus: same as the alignment experiments
- Test corpus: 500 paper abstract sentences
- Decoder: Moses [Koehn et al., 2007]
  - Default options, except for the phrase table limit (20 -> 10) and the distortion limit (6 -> -1)
  - No minimum error rate training
- Evaluation: BLEU, no punctuation, case-insensitive
Results (Precision / Recall / F-measure / BLEU)
- proposed:
  - 1-best-intersection: 90.92 / 41.69 / 57.17 / 12.73
  - 5-best-grow: 80.59 / 57.33 / 67.00 / 15.40
- GIZA++:
  - intersection: 88.14 / 40.18 / 55.20 / 16.35
  - grow-diag: 77.34 / 53.18 / 63.03 / 17.89
  - grow-diag-final-and: 74.95 / 54.26 / 62.95 / 17.76
- The definition of function words is improper (articles? auxiliary verbs? ...)
- A tree-based decoder is necessary
  - BLEU is essentially insensitive to syntactic structure
  - Translation quality is potentially improved
Potentially Improved Example
- Input: これ は LB 膜 の 厚み が アビジン を 吸着 する こと で 増加 した こと に よる 。
- Proposed (30.13): this is due to the increase in the thickness of the lb film avidin adsorb
- GIZA++ (33.78): the thickness of the lb film avidin to adsorption increased by it
- Reference: this was due to increased thickness of the lb film by adsorbing avidin
Conclusion
- A tree-based probabilistic phrase alignment model using dependency tree structures
  - Phrase translation probability
  - Dependency relation probability
- An n-best symmetrization algorithm
- Achieves high alignment accuracy compared to word-based models
  - Syntactic information is useful during the alignment process
- BUT: unable to improve the BLEU scores of translation
Future Work
- A more flexible model
  - Content words sometimes correspond to function words, and vice versa
- Integrate parsing probabilities into the model
  - Parsing errors easily lead to alignment errors
  - By integrating parsing probabilities, parsing results and alignments can be revised complementarily
- More syntactic information
  - Use POS tags or phrase categories in the model
Thank You!