A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Page 1: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Yee Whye Teh

Discussed by Duan Xiangyu

Page 2: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Introduction

• N-gram language models define the probability of a word given a context consisting of the n−1 preceding words.

• This paper introduces a hierarchical Bayesian model for the above, that is, to model P(w | u) for a word w following a context u.

• The hierarchical model in this paper is the hierarchical Pitman-Yor process:

– Pitman-Yor processes can produce power-law distributions, like those observed over word frequencies in natural language.

– The hierarchical structure corresponds to smoothing techniques in language modeling (the paper relates it to interpolated Kneser-Ney smoothing).

Page 3: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Introduction to Pitman-Yor Processes

• Let W be a vocabulary of V words, G(w) the probability of word w, and G = [G(w)]_{w∈W} the vector of word probabilities. G is given a Pitman-Yor process prior:

G ~ PY(d, θ, G_0)

– where the base distribution is G_0 = [G_0(w)]_{w∈W} with G_0(w) = 1/V (uniform over the vocabulary).

– d (the discount, 0 ≤ d < 1) and θ (the strength, θ > −d) are hyper-parameters.

Page 4: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Generative Procedure of PYP

• A sequence of words x_1, x_2, … drawn i.i.d. from G.

• A sequence of draws y_1, y_2, … drawn i.i.d. from G_0.

• Let t be the current number of draws from G_0, c_k the number of words assigned to y_k, and c_· = Σ_k c_k the total number of words so far. Then:

– With probability (c_k − d) / (θ + c_·), let x_{c_·+1} = y_k, that is, the next word is assigned to a previous draw from G_0.

– With probability (θ + d·t) / (θ + c_·), let x_{c_·+1} = y_{t+1}, that is, the next word is assigned to a new draw from G_0.

This generative process of the PYP exhibits a rich-get-richer phenomenon: draws from G_0 that already have many words assigned to them attract new words with higher probability.
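To make the procedure concrete, here is a minimal sketch in Python, assuming a uniform G_0 over V integer word ids; the function name and default parameter values are illustrative, not from the paper.

```python
import random

def sample_pyp_sequence(n, V, d=0.8, theta=1.0, seed=0):
    """Draw n words x_1..x_n from G ~ PY(d, theta, G0), G0 uniform on V ids."""
    rng = random.Random(seed)
    y = []      # y[k]: the (k+1)-th draw from G0
    c = []      # c[k]: number of words assigned to y[k]
    words = []
    for _ in range(n):
        total = sum(c)                       # c_. : number of words so far
        t = len(y)                           # current number of draws from G0
        r = rng.uniform(0, theta + total)
        for k in range(t):
            # existing draw y_k chosen with probability (c_k - d) / (theta + c_.)
            r -= c[k] - d
            if r < 0:
                c[k] += 1
                words.append(y[k])
                break
        else:
            # new draw chosen with probability (theta + d*t) / (theta + c_.)
            y.append(rng.randrange(V))       # y_{t+1} ~ G0
            c.append(1)
            words.append(y[-1])
    return words
```

Because each existing draw is chosen with weight c_k − d, heavily used draws keep attracting words, while the discount d shifts mass toward new draws; this is the rich-get-richer dynamic that produces power-law behavior.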

Page 5: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

A Metaphor for the Generative Procedure of PYP

• Chinese Restaurant Process: customers (the words x) enter a restaurant and sit at tables (the draws y from G_0). A customer sits at occupied table k with probability proportional to c_k − d, and at a new table with probability proportional to θ + d·t; all customers at a table share that table's dish, the word drawn from G_0.

Page 6: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Hierarchical PYP Language Models

• Given a context u, let G_u = [G_u(w)]_{w∈W} be the vector of probabilities of words following u.

• π(u) is the suffix of u consisting of all but the earliest word. For example, if u is “1 2 3”, then π(u) is “2 3”.

• G_u ~ PY(d_{|u|}, θ_{|u|}, G_{π(u)}), and recursively G_{π(u)} ~ PY(d_{|π(u)|}, θ_{|π(u)|}, G_{π(π(u))}), with hyper-parameters shared by contexts of the same length.

• The recursion continues until the empty context ∅: G_∅ ~ PY(d_0, θ_0, G_0), with G_0 uniform over the vocabulary.

This chain of suffixes is the hierarchy; a small illustration follows below.
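A tiny sketch of the suffix recursion (helper names are hypothetical): each context backs off to π(u) until the empty context is reached.

```python
def pi(u):
    """Drop the earliest word: pi(('1', '2', '3')) == ('2', '3')."""
    return u[1:]

def backoff_chain(u):
    """All contexts visited when recursing from u down to the empty context."""
    chain = [u]
    while chain[-1]:
        chain.append(pi(chain[-1]))
    return chain

# backoff_chain(('1', '2', '3')) -> [('1', '2', '3'), ('2', '3'), ('3',), ()]
```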

Page 7: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Generative Procedure of Hierarchical PYP Language Models

• Notation:

– x_{u1}, x_{u2}, … are drawn from G_u.

– y_{u1}, y_{u2}, … are drawn from G_{π(u)}.

– We use l to index the x's and k to index the y's.

– t_{uwk} = 1 if y_{uk} = w, and 0 otherwise.

– c_{uwk} is the number of words x_{ul} assigned to y_{uk} with x_{ul} = y_{uk} = w.

– We denote marginal counts by dots:

• c_{u·k} is the number of words x_{ul} assigned to y_{uk}.

• c_{uw·} is the number of words x_{ul} = w.

• t_{u··} is the total number of draws y_{uk} from G_{π(u)}.
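This notation can be read off directly from a simple data structure. Below is a hedged sketch (a hypothetical layout, not the paper's code) in which S[u][w] is the list of per-table counts [c_{uw1}, c_{uw2}, …] for the tables in restaurant u serving word w; the marginals above then become short sums:

```python
def c_uw(S, u, w):                 # c_uw. : words x_ul equal to w in restaurant u
    return sum(S.get(u, {}).get(w, []))

def t_uw(S, u, w):                 # t_uw. : draws y_uk equal to w
    return len(S.get(u, {}).get(w, []))

def c_u(S, u):                     # c_u.. : total words in restaurant u
    return sum(sum(v) for v in S.get(u, {}).values())

def t_u(S, u):                     # t_u.. : total draws from G_pi(u)
    return sum(len(v) for v in S.get(u, {}).values())
```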

Page 8: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

cont.

• Within restaurant u, words are assigned to draws from G_{π(u)} exactly as in the single-level PYP:

– With probability (c_{u·k} − d_{|u|}) / (θ_{|u|} + c_{u··}), the next word x_{u,c_{u··}+1} is assigned to the existing draw y_{uk}.

– With probability (θ_{|u|} + d_{|u|}·t_{u··}) / (θ_{|u|} + c_{u··}), it is assigned to a new draw y_{u,t_{u··}+1} from G_{π(u)}, which in turn counts as a word drawn in restaurant π(u).

• The recursion bottoms out at G_0, from which draws are made directly.

Page 9: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Inference for Hierarchical PYP Language Models

• We are interested in the predictive probability of word w after context u, given training data D:

p(w | u, D) = ∫ p(w | u, S, Θ) p(S, Θ | D) d(S, Θ)

where S is the seating arrangement (the assignment of words to draws in every restaurant) and Θ = {d_m, θ_m} are the hyper-parameters.

• We approximate it with I posterior samples {S^(i), Θ^(i)}_{i=1}^{I}:

p(w | u, D) ≈ (1/I) Σ_{i=1}^{I} p(w | u, S^(i), Θ^(i))

where, for a given sample,

p(w | u, S, Θ) = (c_{uw·} − d_{|u|} t_{uw·}) / (θ_{|u|} + c_{u··}) + (θ_{|u|} + d_{|u|} t_{u··}) / (θ_{|u|} + c_{u··}) · p(w | π(u), S, Θ)

with the recursion ending at the base distribution G_0(w) = 1/V below the empty context. Here t_{uw·} = Σ_k t_{uwk} is the number of draws y_{uk} equal to w.
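A minimal sketch of this recursion, reusing the S[u][w] table layout from the earlier sketch; d[m] and theta[m] hold the level-m hyper-parameters, V is the vocabulary size, and all names are illustrative:

```python
def p_word(S, u, w, d, theta, V):
    """p(w | u, S, Theta): recursive predictive probability for one sample."""
    if u is None:
        return 1.0 / V                        # base distribution G0(w) = 1/V
    rest = S.get(u, {})
    cu = sum(sum(v) for v in rest.values())   # c_u..
    parent = p_word(S, u[1:] if u else None, w, d, theta, V)
    if cu == 0:
        return parent                         # empty restaurant: pure back-off
    m = len(u)
    tabs = rest.get(w, [])                    # tables serving w
    tu = sum(len(v) for v in rest.values())   # t_u..
    return ((sum(tabs) - d[m] * len(tabs)) / (theta[m] + cu)
            + (theta[m] + d[m] * tu) / (theta[m] + cu) * parent)

def predictive(w, u, samples, V):
    """Monte Carlo average over I posterior samples (S_i, d_i, theta_i)."""
    return sum(p_word(S, u, w, d, th, V) for S, d, th in samples) / len(samples)
```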

Page 10: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Gibbs Sampling for the Predictive Probability (of the last slide)
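Below is a hedged sketch of one Gibbs sweep over the seating arrangement S, reusing p_word and the S[u][w] layout from the previous sketches: each customer (a token of word w observed after context u) is removed from its restaurant and re-seated by the CRP probabilities, recursing into π(u) whenever a table is created or emptied. Hyper-parameter resampling is omitted, and all names are illustrative.

```python
import random

def unseat(S, u, w, rng):
    """Remove one customer eating w from restaurant u."""
    if u is None:
        return
    tabs = S[u][w]
    k = rng.choices(range(len(tabs)), weights=tabs)[0]   # pick a table ~ c_uwk
    tabs[k] -= 1
    if tabs[k] == 0:
        del tabs[k]                          # table emptied: its draw from the
        unseat(S, u[1:] if u else None, w, rng)  # parent disappears as well

def seat(S, u, w, d, theta, V, rng):
    """Re-seat one customer eating w in restaurant u by the CRP probabilities."""
    if u is None:
        return
    rest = S.setdefault(u, {})
    tabs = rest.setdefault(w, [])
    m, tu = len(u), sum(len(v) for v in rest.values())
    parent = u[1:] if u else None
    # table k: weight c_uwk - d; new table: (theta + d * t_u..) * p(w | pi(u))
    weights = [ck - d[m] for ck in tabs]
    weights.append((theta[m] + d[m] * tu) * p_word(S, parent, w, d, theta, V))
    k = rng.choices(range(len(weights)), weights=weights)[0]
    if k < len(tabs):
        tabs[k] += 1
    else:
        tabs.append(1)
        seat(S, parent, w, d, theta, V, rng)  # new table: one customer upstream

def gibbs_sweep(S, tokens, d, theta, V, rng=random.Random(0)):
    """One pass of Gibbs sampling over all (context, word) tokens."""
    for u, w in rng.sample(tokens, len(tokens)):   # random visiting order
        unseat(S, u, w, rng)
        seat(S, u, w, d, theta, V, rng)
```

Averaging p_word over the states S saved every few sweeps gives the Monte Carlo approximation of the predictive probability from the last slide.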