unsupervised constraint driven learning for transliteration discovery m. chang, d. goldwasser, d....
TRANSCRIPT
Unsupervised Constraint Driven Learning for Transliteration Discovery
M. Chang, D. Goldwasser, D. Roth, and Y. Tu
What I am going to do today…
Goal 1 : Present the transliteration work Get feedback!
Goal 2: Think about this work with CCM Tutorial …. I will try to present this work in a slightly
different way Some of them are my personal comment Different than our yesterday discussion
Please give us comment about this Make this work more general (not only
transliteration)
Constraints Driven Learning
Why Constraints? The Goal: Building a good system easily We have prior knowledge at our hand
Why not inject knowledge directly ?
How useful are constraints? Useful for supervised learning [Yih and Roth 04]
[many others]
Useful for semi-supervised learning [Chang et.al. ACL 2007]
Some times more efficient than labeling data directly
Unsupervised Constraint Driven Learning
In this work We do not use any label instance Achieve to good performance that competitive
several supervised model
Compared to [Chang et.al. ACL 2007]
In ACL 07, they use a small amount of dataset (5-20)
Reason: Bad Models can not benefit from constraints!
For some applications, we have very good resource We do not need labeled instances at all!
6
In a nutshell:
Traditional semi-supervised learning.
• Model can drift from the correct one. Model
Unlabeled Data
Prediction
Label unlabeled data
Feedback
Learn from labeled data
Unsupervised Learning
Resource
7
In a nutshell:
CODLUse constraints to generate better training samples in unsupervised learning.
Prediction+ Constraints
Model
Unlabeled Data
PredictionFeedback
More accurate labeling
Better Model
CODLImproves “Simple” Model
Using Expressive Constraints
Transliteration Generation (Not our focus)
Given a Source Transliteration; What is the target transliteration? Bush
布希 Sushi
壽司 Issues
Ambiguity : For the same source word, many different
transliteration Think about Chinese
What we want: find the most widely used transliteration
Transliteration Discovery (Our focus)
Problem Settings Give you two list of words, map them!
Advantages A relatively easy problem Can find the most widely used transliteration
Assumption: Source: English Each source entities has a transliteration in the
target candidates Target candidates might not be named entities
Algorithm Outline
Prediction Model
How to use existing resource to construct the Model?
Constraints?
Learning Algorithm
The Prediction Model
How do we make prediction? Given a source word, how to predict the best target
?
Model 1 : Vs, Vt Yes or No Issue: Not many obvious constraints can be added Not a structure prediction problem
Model 2: Vs, Vt Hidden variables Yes or No Predicting F is a structure prediction algorithm We can add constraints more easily
The Prediction Model
Score for a pair
A CCM formulation
A slightly different scoring function
Ff
tsts vvfWfvvWFscore ),|(]1[),,|(
Cc
tststs cvvFvvWFscorevvFg ),,|(),,|(),,(
More on this point in the next few slides
),(maxarg*ts vvscorev
t
||
),,(maxarg),(
t
tsFts v
vvFgvvscore
Hidden Variables
Violation
Prediction Model: Another View
The scoring function looks like weight times features!
If there is a bad feature, score - ∞
Our Hidden variable (Feature Vectors): Character Mapping
),(
),|(]1[),,|(
tsT
Fftsts
vvFW
vvfWfvvWFscore
Cc
tstsT
ts cvvFvvFWvvFg ),,|(),(),,(
),( ts vvF
Algorithm Outline
Prediction Model
How to use existing resource to construct the Model?
Constraints?
Learning Algorithm
Resource: Romanization Table
Hebrew, Russian How can you type Hebrew or Russian?
Use English Keyboard, C maps to A similar character “C” or “S” in Hebrew or Russian
Very easy to get Ambiguous
Special Case: Chinese (Pin Yin)壽司 shòu sī (Low ambiguity) Map Pin-Yin to English (sushi) Romanization Table? a a
Initialize the Table
Every character pair in the Romanization Table Weight = 0 Everything else, -1 Could have better way to do initialization
Note: All (v_s,v_t) will get zero without constraints
Cc
tstsT
ts cvvFvvFWvvFg ),,|(),(),,(
Algorithm Outline
Prediction Model
How to use existing resource to construct the Model?
Constraints?
Learning Algorithm
Constraints
General Constraints Coverage: all character need to be mapped at
least once No crossing: character mappings can not cross
each other
Language Specific Constraints General Restricted Mapping Initial Restricted Mapping Length Restriction
Algorithm Outline
Prediction Model
How to use existing resource to construct the Model?
Constraints?
Learning Algorithm
High-Level Overview
Model Resource While Converge
Use Model + Constraints to get Labels (for both F, y)
Update Model with newly labeled F and y (without Constraints) (details in the next slide)
Similar to ACL 07 Update the model without Constraints
Difference from ACL 07 We get feedback from the labels of both hidden
variables and output
Experimental Setting
Evaluation
ACC: Top candidate is (one of) the right answer
Learning Algorithm Linear SVM with C = 0.5
Dataset English-Hebrew 300: 300 English-Chinese 581:681 English-Russian 727:50648 (Target includes all
words)
Analysis
A small Russian subset was used here
1) Without Constraints (on features), Romanization Table is useless!
2) General Constraints are more important!
4) Better Constraints Lead to Better Final Results
3) Learning has great impact here!But constraints are very important, too!
Related Works (Need more work here)
Learning the score for Edit Distance
Previous transliteration works
Machine translation?
Conclusion
ML: unsupervised constraint driven algorithm Use hidden variable to find more constraints (e.g. co-ref) Use constraints to find “cleaner” feature representation
Transliteration: Usage of Normalization Table as the starting point
We can get good results without training data Right constraints (modeling) is the key
Future Work Transliteration Model: Better Model, Quicker Inference CoDL: Other applications for unsupervised CoDL
33
Constraint - Driven Learning (CODL)
=learn(Tr)For N iterations do
T= For each x in unlabeled dataset
y Inference(x, )T=T {(x, y)}
= +(1- )learn(T)
Any supervised learning algorithm parametrized by
Learn from new training data.Weight supervised and unsupervised model(Nigam2000*).
Augmenting the training set (feedback). Any inference algorithm (with constraints).
Inference(x,C, )
34
Unsupervised Constraint - Driven Learning
=Construct(Resource)For N iterations do
T= For each x in unlabeled dataset
y Inference(x, )T=T {(x, y)}
= +(1- )learn(T)
Construct the model with Resources
Learn from new training data. = 0 in this work
Augmenting the training set (feedback). Any inference algorithm (with constraints).
Inference(x,C, )