paper presentation: hmm-based alignment
DESCRIPTION
A paper presentation on HMM-based alignment that I gave at IIT Bombay as part of the Topics in NLP course. The paper treats alignment as an HMM problem, in contrast to the IBM models that are predominantly used.
TRANSCRIPT
HMM-based Word Alignment in Statistical Translation (1996)
Lekha Muraleedharan [133050002]
Sagar Ahire [133050073]
Roadmap
● Review of Alignment
● HMM-based Alignment
● Results and Examples
Roadmap: We Are Here
● Review of Alignment (current section)
● HMM-based Alignment
● Results and Examples
Review of Alignment
● In order to translate a French sentence F to an English sentence E, the following expression can be used:
E* = argmax_E P(E | F)
   = argmax_E P(E) · P(F | E)
● To learn P(F|E), the concept of alignments is used.
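The decision rule above can be sketched in a few lines of Python. This is only an illustration for one fixed source sentence F: the candidate list and all probabilities are made up, and `best_translation` is a hypothetical helper, not something from the paper.

```python
import math

def best_translation(candidates, lm_logprob, tm_logprob):
    """Return the candidate E that maximizes log P(E) + log P(F|E)."""
    return max(candidates, key=lambda e: lm_logprob[e] + tm_logprob[e])

# Made-up probabilities for a single fixed source sentence F (assumption).
lm = {  # language model P(E)
    "Peter slept early": math.log(0.02),
    "Peter early slept": math.log(0.001),
}
tm = {  # translation model P(F|E)
    "Peter slept early": math.log(0.3),
    "Peter early slept": math.log(0.4),
}

best = best_translation(list(lm), lm, tm)
print(best)
```

Working in log space turns the product P(E) · P(F|E) into a sum, which avoids underflow on longer sentences.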
Review of Alignment
● Alignment refers to a correspondence between E and F which indicates which word in F is translated to a particular word in E.
● For example:
Hin: पीटर जल्दी सोया
Eng: Peter slept early
A: 1 3 2
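One way to hold such an alignment in code is a list of 1-based positions. The sketch below assumes that the alignment row gives, for each Hindi word, the position of its English counterpart; `aligned_pairs` is a hypothetical helper name.

```python
def aligned_pairs(source_words, target_words, alignment):
    """Pair each source word with the target word it is aligned to.

    alignment[j] is the 1-based target position for the j-th source word
    (an assumption about how the alignment row is read).
    """
    return [(s, target_words[a - 1]) for s, a in zip(source_words, alignment)]

hin = ["पीटर", "जल्दी", "सोया"]
eng = ["Peter", "slept", "early"]
pairs = aligned_pairs(hin, eng, [1, 3, 2])
for s, t in pairs:
    print(s, "<->", t)
```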
Alignment Models
Depending on the assumptions made, there are several possible alignment models:
● IBM Models (1 to 5)
● HMM-based Alignment Models
IBM Model 1, 2: The Math
[Slide equations for Model 1 and Model 2 not preserved in the transcript]
IBM Model 1
● Assumes all alignments are equally likely
● Assumes source word depends only on target word
IBM Model 2
● Assumes alignments are more likely to “lie along the diagonal”
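For reference, the two models are commonly written as below. This is a sketch in assumed notation, not a copy of the slide: F = f_1 … f_J is the source sentence, E = e_1 … e_I is the target sentence, and the sentence-length term and the empty word are left out for brevity.

```latex
% IBM Model 1: every target position i is equally likely for source position j
P(F \mid E) = \prod_{j=1}^{J} \sum_{i=1}^{I} \frac{1}{I}\, p(f_j \mid e_i)

% IBM Model 2: the alignment probability depends on the absolute positions
P(F \mid E) = \prod_{j=1}^{J} \sum_{i=1}^{I} p(i \mid j, I)\, p(f_j \mid e_i)
```

In both cases the lexical term p(f_j | e_i) is the "source word depends only on target word" assumption; the models differ only in the alignment term.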
Roadmap: We Are Here
● Review of Alignment
● HMM-based Alignment (current section)
● Results and Examples
HMM-based Alignment: The Math
HMM-based Alignment
● Assumes alignment depends only on
    ○ The previous alignment (not all previous)
    ○ The jump width
● Thus, in this model, alignments are relative to the previous aligned position rather than absolute
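The two assumptions above can be sketched as a small forward computation. Here the transition probability is p(i | i', I) = s(i − i') / Σ_l s(l − i'), so it depends only on the jump width. The uniform start distribution, the function names, and all numbers are assumptions made for illustration.

```python
def transition_probs(jump_scores, prev_i, I):
    """p(i | prev_i, I) for i = 1..I, built from jump-width scores s(i - prev_i)."""
    raw = [jump_scores.get(i - prev_i, 0.0) for i in range(1, I + 1)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no positive jump score reachable from this position")
    return [r / total for r in raw]

def hmm_likelihood(src, tgt, lexicon, jump_scores):
    """Sum over all alignments of prod_j p(a_j | a_{j-1}, I) * p(f_j | e_{a_j})."""
    I = len(tgt)
    # Uniform distribution over the first aligned position (assumption).
    alpha = [(1.0 / I) * lexicon.get((src[0], tgt[i]), 0.0) for i in range(I)]
    for f in src[1:]:
        trans = [transition_probs(jump_scores, k + 1, I) for k in range(I)]
        alpha = [
            sum(alpha[k] * trans[k][i] for k in range(I))
            * lexicon.get((f, tgt[i]), 0.0)
            for i in range(I)
        ]
    return sum(alpha)

# Toy, made-up parameters: lexicon[(f, e)] = p(f | e), scores[d] = s(d).
lexicon = {("a", "x"): 0.9, ("b", "x"): 0.1, ("a", "y"): 0.2, ("b", "y"): 0.8}
scores = {-1: 1.0, 0: 1.0, 1: 2.0}
likelihood = hmm_likelihood(["a", "b"], ["x", "y"], lexicon, scores)
print(likelihood)
```

Because the scores are indexed by the difference i − i' only, the same table is reused at every position, which is what makes the alignments relative.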
A Comparison
[Figure: IBM Model 1, IBM Model 2, and HMM-based Model side by side]
Roadmap: We Are Here
● Review of Alignment
● HMM-based Alignment
● Results and Examples (current section)
Statistical Results: Basic Framework
● Models compared:
    ○ IBM 1
    ○ IBM 2
    ○ HMM
● Corpora Used (German to French)
    ○ Avalanche Bulletins Corpus (News)
    ○ Verbmobil Corpus (Spoken Dialog)
    ○ EuTrans Corpus (Travel & Tourism)
Statistical Results: Basic Framework
● Training Process:
    ○ IBM 1: 10 iterations of EM
    ○ IBM 2: 5 iterations of Maximum Approximation
    ○ HMM: 5 iterations of Maximum Approximation
● Metric Used
    ○ Perplexity (Wikipedia: “a measurement of how well a probability model predicts a sample”)
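As a rough sketch of the metric, per-word perplexity can be computed from sentence log-likelihoods. The normalization used here (natural logs, divided by the total number of words) is an assumption; the paper's exact definition may differ.

```python
import math

def perplexity(sentence_log_probs, num_words):
    """exp of the negative average log-probability per word (natural log)."""
    return math.exp(-sum(sentence_log_probs) / num_words)

# Two made-up sentences of two words each, each with probability 1/16.
log_probs = [math.log(1 / 16), math.log(1 / 16)]
ppl = perplexity(log_probs, num_words=4)
print(ppl)
```

Lower perplexity means the model assigns higher probability to the held-out data, so smaller numbers in a results table are better.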
Statistical Results
Corpus      IBM 1    IBM 2    HMM
EuTrans     16.267   9.781    9.686
Verbmobil   46.672   30.706   26.495
Intuitive Example: 1
Hin: पीटर जल्दी सोया
Eng: Peter slept early
A: 1 3 2
Jump: N/A 2 -1
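The jump row is just the difference between consecutive alignment positions; the first position has no predecessor, hence N/A. A minimal sketch, with `jump_widths` as a hypothetical helper name:

```python
def jump_widths(alignment):
    """Differences between consecutive alignment positions (first has none)."""
    return [b - a for a, b in zip(alignment, alignment[1:])]

jumps = jump_widths([1, 3, 2])
print(jumps)
```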
Intuitive Example: पीटर जल्दी सोया
● Relatively straightforward
● As there are no major jumps, translation probabilities take precedence
Intuitive Example: 2
Hin: पीटर घर लौटने पर जल्दी सोया
Eng: Peter slept early on returning home
A: 1 6 5 4 3 2
Jump: N/A 5 -1 -1 -1 -1
Intuitive Example: पीटर घर लौटने पर जल्दी सोया
● IBM 2 favors diagonal alignments, so it struggles to find the correct alignment here, as nearly all links lie on the inverse diagonal
● HMM looks only at the previous alignment and the jump width, so this alignment, which is one long jump followed by a run of -1 jumps, keeps the total jump length small
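The contrast can be made concrete by looking at the same alignment in the two ways the models see it: as offsets from the diagonal (the absolute-position view) and as a histogram of jump widths (the relative view). This is only an illustration of the intuition, not either model's actual scoring; the helper names are hypothetical.

```python
from collections import Counter

def diagonal_offsets(alignment):
    """a_j - j for each position j (1-based); all zeros on the diagonal."""
    return [a - j for j, a in enumerate(alignment, start=1)]

def jump_histogram(alignment):
    """How often each jump width a_j - a_{j-1} occurs."""
    return Counter(b - a for a, b in zip(alignment, alignment[1:]))

alignment = [1, 6, 5, 4, 3, 2]
offsets = diagonal_offsets(alignment)
hist = jump_histogram(alignment)
print(offsets)
print(hist)
```

Most positions sit far from the diagonal, while four of the five jumps share the same width of -1, which is the regularity a jump-width model can pick up.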
Intuitive Example: 3
Hin: पीटर बहुत ही जल्दी सोया
Eng: Peter slept very early
A: 1 3 ? 4 2
Intuitive Example: पीटर बहुत ही जल्दी सोया
● The HMM model assumes that every source word has a corresponding target word
● Moreover, empty word alignments are not incorporated in the basic HMM model
● To model empty words, an HMM of order 2 is required
Intuitive Example: 4
Hin: पीटर आज कल जल्दी सोता है
Eng: Peter sleeps early these days
A: 1 2,3 3 2 2
Intuitive Example: पीटर आज कल जल्दी सोता है
● सोता है ↔ sleeps can be handled by HMM
● आज कल ↔ these days requires multi-word handling to defeat a translation like “today tomorrow”
References
● HMM-based Word Alignment in Statistical Translation (1996) by Stephan Vogel, Hermann Ney, Christoph Tillmann; COLING '96, Copenhagen
● The Mathematics of Statistical Machine Translation: Parameter Estimation (1993) by Peter Brown, Stephen Della Pietra, Vincent Della Pietra, Robert Mercer; Computational Linguistics