A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation
Frank Wood and Yee Whye Teh
AISTATS 2009
Presented by: Mingyuan Zhou
Duke University, ECE
December 18, 2009
Outline
• Background
• Pitman-Yor Process
• Hierarchical Pitman-Yor Process Language Models
• Doubly Hierarchical Pitman-Yor Process Language Model
• Inference
• Experimental results
• Summary
Background: Language modeling and n-Gram models
• “A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence”.
• n-Gram (n=2: bigram, n=3: trigram)
• Smoothing:
Reference: S. F. Chen and J. T. Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.
• Example
• Smoothing
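The n-gram estimation and smoothing steps above can be sketched as follows. This is a minimal illustration, not the slide's own example: the toy corpus, the function name `train_bigram`, and the choice of additive (add-delta) smoothing are all assumptions made here, since the slide does not say which smoothing method it shows.

```python
from collections import Counter

def train_bigram(sentences, delta=1.0):
    """Estimate p(word | prev) from whitespace-tokenized sentences.

    Additive smoothing is an assumption chosen for brevity; it adds
    `delta` to every bigram count so unseen bigrams get nonzero mass.
    """
    context_counts = Counter()   # how often each word occurs as a context
    bigram_counts = Counter()    # how often each (prev, word) pair occurs
    vocab = {"</s>"}             # words that can be predicted
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def prob(word, prev):
        numerator = bigram_counts[(prev, word)] + delta
        denominator = context_counts[prev] + delta * vocab_size
        return numerator / denominator

    return prob, vocab

# Hypothetical toy corpus for illustration only.
corpus = [
    "john read moby dick",
    "mary read a different book",
    "she read a book by cher",
]
prob, vocab = train_bigram(corpus)
print(prob("a", "read"))     # seen bigram
print(prob("cher", "read"))  # unseen bigram, still nonzero
```

Without smoothing (`delta=0`) the unseen bigram would receive probability zero, which is the problem smoothing is meant to address.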
• Evaluation
• Train the n-Gram model:
• Calculate:
• Cross-entropy:
• Perplexity:
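The evaluation quantities above can be sketched in a few lines. This assumes the common convention of measuring cross-entropy in bits per word on held-out text and defining perplexity as 2 raised to the cross-entropy; the function names and the per-token probability list are hypothetical stand-ins for the output of a trained model.

```python
import math

def cross_entropy_bits(token_probs):
    """Average negative log2 probability per token of the test data."""
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is 2 raised to the cross-entropy (in bits)."""
    return 2.0 ** cross_entropy_bits(token_probs)

# Hypothetical model output: every one of 6 test tokens gets probability 1/8.
test_token_probs = [1 / 8] * 6
print(cross_entropy_bits(test_token_probs))
print(perplexity(test_token_probs))
```

A model that is uniform over 8 words has a perplexity of 8, which is why perplexity is often read as an effective branching factor. Lower values indicate a better fit to the test data.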
Dirichlet Process and Pitman-Yor Process
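• Dirichlet Process
Number of unique words grows as $O(\alpha \log n)$ for $n$ words, where $\alpha$ is the concentration parameter
• Pitman-Yor Process
Number of unique words grows as $O(\alpha n^d)$, a power law in $n$ governed by the discount $d$
• When $d = 0$, the Pitman-Yor process reduces to the DP
• Both can be understood through the Chinese restaurant process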
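$G \sim \mathrm{DP}(\alpha, G_0)$

Seating probabilities, where $c_k$ is the number of customers at table $k$, $t$ is the number of occupied tables, $\alpha$ is the concentration parameter, and $d$ is the discount:

| | DP | Pitman-Yor |
|---|---|---|
| Sitting at table $k$ | $c_k / (\alpha + \sum_{k'=1}^{t} c_{k'})$ | $(c_k - d) / (\alpha + \sum_{k'=1}^{t} c_{k'})$ |
| Sitting at a new table | $\alpha / (\alpha + \sum_{k'=1}^{t} c_{k'})$ | $(\alpha + d t) / (\alpha + \sum_{k'=1}^{t} c_{k'})$ |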
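The Chinese restaurant process seating rule for both processes can be sketched with one function. The name `seating_probabilities` and the argument names are hypothetical; `alpha` stands for the concentration parameter and `d` for the discount, with `d=0` giving the Dirichlet process case.

```python
def seating_probabilities(counts, alpha, d=0.0):
    """Return (probabilities of joining each existing table, probability of a new table).

    counts[k] is the number of customers already at table k.
    With d=0 this is the Dirichlet process rule; with 0 < d < 1 it is
    the Pitman-Yor rule.
    """
    t = len(counts)                      # number of occupied tables
    total = alpha + sum(counts)          # shared normalizer
    existing = [(c - d) / total for c in counts]
    new_table = (alpha + d * t) / total
    return existing, new_table

# Two occupied tables with 3 and 1 customers.
print(seating_probabilities([3, 1], alpha=1.0, d=0.0))  # DP
print(seating_probabilities([3, 1], alpha=1.0, d=0.5))  # Pitman-Yor
```

The discount takes a little mass from every occupied table and gives it to the new-table option, so the chance of opening a new table grows with the number of tables already in use.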
Power-law properties of the Pitman-Yor Process
[Figure: two panels. First panel: number of unique words vs. number of words. Second panel: proportion of words appearing once vs. number of words. Each panel shows curves for d = 0, d = 0.5, and d = 0.9.]
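The power-law behaviour can be explored with a small simulation of the Chinese restaurant process. This is a sketch under assumptions: the function name, the seed, the sample size of 2000, and `alpha=1.0` are chosen here for illustration. "Proportion of words appearing once" is read as the fraction of unique words (tables) with a single customer, which is also an assumption.

```python
import random

def simulate_pitman_yor_crp(n, alpha, d, seed=0):
    """Seat n customers one by one; return the customer count of each table."""
    rng = random.Random(seed)
    counts = []
    for i in range(n):                    # i customers are already seated
        t = len(counts)
        r = rng.random() * (alpha + i)    # uniform draw over the total mass
        new_table_mass = alpha + d * t
        if r < new_table_mass:
            counts.append(1)              # open a new table
            continue
        r -= new_table_mass
        for k in range(t):
            r -= counts[k] - d            # mass of existing table k
            if r < 0:
                counts[k] += 1
                break
        else:
            counts[-1] += 1               # guard against floating-point round-off
    return counts

for d in (0.0, 0.5, 0.9):
    counts = simulate_pitman_yor_crp(2000, alpha=1.0, d=d)
    unique = len(counts)
    once = sum(1 for c in counts if c == 1) / unique
    print(f"d={d}: {unique} unique words, fraction appearing once = {once:.2f}")
```

Larger discounts should produce many more tables (unique words), in line with logarithmic growth at d = 0 and polynomial growth for d > 0.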
Hierarchical Pitman-Yor Process Language Models
Doubly Hierarchical Pitman-Yor Process Language Model
Inference
• Dirichlet Process: Chinese Restaurant Process
• Hierarchical Dirichlet Process: Chinese Restaurant Franchise
• Pitman-Yor Process: Chinese Restaurant Process
• Hierarchical Pitman-Yor Process: Chinese Restaurant Franchise
• Doubly Hierarchical Pitman-Yor Language Model: Graphical Pitman-Yor Process, Multi-floor Chinese Restaurant Process, Multi-floor Chinese Restaurant Franchise
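To show how a Chinese restaurant franchise state is used once a sampler has produced it, the sketch below computes the predictive word probability of a singly hierarchical Pitman-Yor language model by backing off through shorter contexts. Several things are assumptions made here for illustration: the function name `hpy_predictive`, the hand-written seating state, and the use of one `alpha` and one `d` for all context lengths. It does not cover the multi-floor representation of the doubly hierarchical model.

```python
def hpy_predictive(word, context, customers, tables, alpha, d, vocab_size):
    """Predictive probability of `word` after `context` (a tuple of words).

    customers[context][word] and tables[context][word] hold the franchise
    seating counts. The parent of a context drops its earliest word, and
    the parent of the empty context is the uniform distribution.
    """
    if context is None:
        return 1.0 / vocab_size
    parent = context[1:] if context else None
    backoff = hpy_predictive(word, parent, customers, tables, alpha, d, vocab_size)
    c = customers.get(context, {})
    t = tables.get(context, {})
    c_total = sum(c.values())
    t_total = sum(t.values())
    seen = (c.get(word, 0) - d * t.get(word, 0)) / (alpha + c_total)
    new = (alpha + d * t_total) / (alpha + c_total)
    return seen + new * backoff

# Hypothetical seating state over the vocabulary {a, b, c}.
customers = {(): {"a": 2, "b": 1}, ("a",): {"b": 2}, ("b",): {"a": 2}}
tables = {(): {"a": 1, "b": 1}, ("a",): {"b": 1}, ("b",): {"a": 2}}
for w in ("a", "b", "c"):
    print(w, hpy_predictive(w, ("a",), customers, tables, alpha=1.0, d=0.5, vocab_size=3))
```

The discounted counts play the role of the discounts in interpolated smoothing, and the table counts set how much mass is passed to the shorter context.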
Experimental results (HPYLM)
Experimental results (DHPYLM)
Summary
• DHPYLM achieves encouraging domain adaptation results.
• A graphical Pitman-Yor process is constructed, and a multi-floor Chinese restaurant representation is proposed for sampling.
• DHPYLM may be integrated into topic models to eliminate “bag-of-words” assumptions.