ICS 482: Natural Language Processing
Probabilistic Context-Free Grammars (Chapter 14)
Muhammed Al-Mulhem
March 1, 2009
SCFG

Probabilistic CFG (PCFG), or Stochastic CFG (SCFG), is the simplest augmentation of the CFG.

A CFG G is defined as (N, Σ, R, S); an SCFG is likewise defined as (N, Σ, R, S), where:

- N is a set of non-terminal symbols.
- Σ is a set of terminal symbols (N ∩ Σ = Ø).
- R is a set of production rules, each of the form A → β [p], where A ∈ N, β ∈ (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(A → β).
- S is the start symbol, S ∈ N.
SCFG

P(A → β) expresses the probability that A will be expanded to β.

If we consider all the possible expansions of a non-terminal A, the sum of their probabilities must be 1:

    ∑β P(A → β) = 1
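This normalization constraint is easy to check mechanically. Below is a minimal Python sketch (the dictionary layout and helper name are illustrative, not from the slides) that stores a PCFG as a map from each non-terminal to its weighted expansions and verifies that every non-terminal's rule probabilities sum to 1, using the VP rules from Example 1 on the next slide:

```python
# A PCFG as a map from each non-terminal to its weighted expansions.
# The VP rules are the ones from Example 1 on the next slide.
pcfg = {
    "VP": [(("Verb",), 0.55),
           (("Verb", "NP"), 0.40),
           (("Verb", "NP", "NP"), 0.05)],
}

def check_normalization(grammar, tol=1e-9):
    """Check that, for every non-terminal A, the probabilities of all
    rules A -> beta sum to 1."""
    for lhs, expansions in grammar.items():
        total = sum(p for _, p in expansions)
        if abs(total - 1.0) > tol:
            raise ValueError(f"rules for {lhs} sum to {total}, not 1")

check_normalization(pcfg)  # passes: 0.55 + 0.40 + 0.05 == 1.0
```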
Example 1

Attach probabilities to grammar rules. The probabilities of the rules expanding a given non-terminal, such as VP, must sum to 1:

VP → Verb         .55
VP → Verb NP      .40
VP → Verb NP NP   .05
Example 2

NP → Det N      0.4
NP → NPposs N   0.1
NP → Pronoun    0.2
NP → NP PP      0.1
NP → N          0.2

Consider the subtree in which the top NP expands as NP → NP PP and the inner NP expands as NP → Det N:

        NP
       /  \
     NP    PP
    /  \
  Det   N

P(subtree above) = P(NP → NP PP) × P(NP → Det N) = 0.1 × 0.4 = 0.04
Example 3

[Two parse trees, T1 and T2, for the same sentence, together with the rules used to generate them (not the full grammar); the figures are not reproduced here.]

The probabilities of the two parse trees are calculated by multiplying the probabilities of the production rules used to generate each tree:

P(T1) = .15 × .40 × .05 × .05 × .35 × .75 × .40 × .40 × .30 × .40 × .50 = 3.78 × 10⁻⁷
P(T2) = .15 × .40 × .40 × .05 × .05 × .75 × .40 × .40 × .30 × .40 × .50 = 4.32 × 10⁻⁷
Example 4

S  → NP VP    1.0
PP → P NP     1.0
VP → V NP     0.7
VP → VP PP    0.3
P  → with     1.0
V  → saw      1.0
NP → NP PP        0.4
NP → astronomers  0.1
NP → ears         0.18
NP → saw          0.04
NP → stars        0.18
NP → telescopes   0.1
Example 4: Astronomers saw stars with ears

[Two parse trees, t1 and t2, for the sentence; the figures are not reproduced here.]
The probabilities of the two parse trees:

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
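Each tree's probability is just the product of the probabilities of the rules in its derivation. Here is a minimal sketch reproducing the two numbers above; the rule lists for t1 (PP attached to the object NP) and t2 (PP attached to the VP) are read off the two parses and are a reconstruction, since the tree figures are not reproduced here:

```python
from math import prod

# Rule probabilities from the Example 4 grammar.
P = {
    "S -> NP VP": 1.0, "PP -> P NP": 1.0, "VP -> V NP": 0.7,
    "VP -> VP PP": 0.3, "P -> with": 1.0, "V -> saw": 1.0,
    "NP -> NP PP": 0.4, "NP -> astronomers": 0.1, "NP -> ears": 0.18,
    "NP -> saw": 0.04, "NP -> stars": 0.18, "NP -> telescopes": 0.1,
}

# t1: the PP "with ears" attaches to the object NP "stars".
t1 = ["S -> NP VP", "NP -> astronomers", "VP -> V NP", "V -> saw",
      "NP -> NP PP", "NP -> stars", "PP -> P NP", "P -> with", "NP -> ears"]
# t2: the PP attaches to the VP instead.
t2 = ["S -> NP VP", "NP -> astronomers", "VP -> VP PP", "VP -> V NP",
      "V -> saw", "NP -> stars", "PP -> P NP", "P -> with", "NP -> ears"]

print(prod(P[r] for r in t1))  # ~ 0.0009072
print(prod(P[r] for r in t2))  # ~ 0.0006804
```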
Probabilistic CFGs

- The probabilistic model: assigning probabilities to parse trees
- Getting the probabilities for the model
- Parsing with probabilities
  - Slight modification to the dynamic programming approach
  - The task is to find the max-probability tree for an input
Getting the Probabilities

- From an annotated database (a treebank)
- Learned from a corpus
Treebank

- Get a large collection of parsed sentences.
- Collect counts for each non-terminal rule expansion in the collection.
- Normalize (as sketched below).
- Done.
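This count-and-normalize recipe is maximum likelihood estimation: P(A → β) = Count(A → β) / Count(A). A minimal sketch, assuming a toy treebank represented as lists of the rules each parse tree uses (the two trees are invented for illustration):

```python
from collections import Counter

# Toy "treebank": each parsed sentence is represented by the list of
# rules its tree uses. These two trees are invented for illustration.
treebank = [
    ["S -> NP VP", "NP -> Pronoun", "VP -> Verb NP", "NP -> Det N"],
    ["S -> NP VP", "NP -> Det N", "VP -> Verb"],
]

rule_counts = Counter(rule for tree in treebank for rule in tree)
lhs_counts = Counter(rule.split(" -> ")[0] for tree in treebank for rule in tree)

# Normalize: P(A -> beta) = Count(A -> beta) / Count(A)
probs = {rule: n / lhs_counts[rule.split(" -> ")[0]]
         for rule, n in rule_counts.items()}
print(probs["NP -> Det N"])  # 2 of the 3 NP expansions, i.e. ~0.667
```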
Learning

- What if you don't have a treebank (and can't get one)?
- Take a large collection of text and parse it.
- In the case of syntactically ambiguous sentences, collect all the possible parses.
- Prorate the rule statistics gathered for rules in the ambiguous case by their probability.
- Proceed as you did with a treebank.
- This is the Inside-Outside algorithm.
Assumptions

- We're assuming that there is a grammar to be used to parse with.
- We're assuming the existence of a large, robust dictionary with parts of speech.
- We're assuming the ability to parse (i.e., a parser).
- Given all that, we can parse probabilistically.
Typical Approach

- Bottom-up dynamic programming approach
- Assign probabilities to constituents as they are completed and placed in the table
- Use the max probability for each constituent going up
Max Probability

Say we're talking about a final part of a parse, S₀ → NPᵢ VPⱼ (the subscripts mark the spans covered). The probability of the S is

    P(S → NP VP) × P(NP) × P(VP)

P(NP) and P(VP) are already known, since we're doing bottom-up parsing.
Max

- The P(NP) is known.
- What if there are multiple NPs for the span of text in question (0 to i)? Take the max. (Why? A tree's probability is a product of rule probabilities, so the best tree over a larger span can only use the best NP over this span.)
- This does not mean that other kinds of constituents for the same span are ignored (i.e., they might be in the solution); see the sketch below.
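In code, the bookkeeping amounts to keeping one best probability per (span, non-terminal) pair in the chart: same-label alternatives compete, while other labels over the same span coexist. A small sketch with a hypothetical chart cell (the numbers are made up):

```python
# chart maps (start, end, label) -> best probability found so far for a
# constituent with that label over that span.
chart = {}

def update(start, end, label, prob):
    """Keep only the max-probability derivation per (span, label)."""
    key = (start, end, label)
    if prob > chart.get(key, 0.0):
        chart[key] = prob

update(0, 3, "NP", 2.7e-4)  # one way to build an NP over words 0..3
update(0, 3, "NP", 3.1e-4)  # a better NP derivation replaces it
update(0, 3, "VP", 1.0e-5)  # a VP over the same span is kept separately
print(chart[(0, 3, "NP")], chart[(0, 3, "VP")])
```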
Probabilistic Parsing

- Probabilistic CYK (Cocke-Younger-Kasami) algorithm for parsing PCFGs
- Bottom-up dynamic programming algorithm
- Assume the PCFG is in Chomsky Normal Form (every production is either A → B C or A → a)
Chomsky Normal Form (CNF)

All rules have one of two forms:

    A → B C   where B and C are non-terminals
    A → a     where a is a terminal
Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa
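A rule-shape test makes the distinction mechanical. A minimal sketch, assuming non-terminals are written as uppercase symbols and terminals as lowercase (the helper name is illustrative):

```python
def is_cnf_rule(lhs, rhs):
    """True if the rule lhs -> rhs has CNF shape: A -> B C or A -> a."""
    if len(rhs) == 2:                            # A -> B C
        return all(sym.isupper() for sym in rhs)
    if len(rhs) == 1:                            # A -> a
        return rhs[0].islower()
    return False

print(is_cnf_rule("S", ["A", "S"]))       # True
print(is_cnf_rule("A", ["b"]))            # True
print(is_cnf_rule("S", ["A", "A", "S"]))  # False: three symbols on the right
print(is_cnf_rule("A", ["a", "a"]))       # False: two terminals
```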
Observations

- Chomsky normal form is good for parsing and proving theorems.
- It is possible to convert any context-free grammar into Chomsky normal form.
Probabilistic CYK Parsing of PCFGs

CYK algorithm: a bottom-up parser.

Input:
- A Chomsky-normal-form PCFG, G = (N, Σ, P, S, D). Assume the non-terminals have indices 1, 2, …, |N|, and the start symbol S has index 1.
- n words w₁, …, wₙ

Data structure:
- A dynamic programming array π[i, j, a] holds the maximum probability for a constituent with non-terminal index a spanning words i..j.

Output:
- The maximum-probability parse π[1, n, 1]
Base Case

- CYK fills out π[i, j, a] by induction.
- Base case: input strings of length 1 (individual words wᵢ).
- In CNF, the probability of a given non-terminal A expanding to a single word wᵢ must come only from the rule A → wᵢ, i.e., P(A → wᵢ).
Probabilistic CYK Algorithm [Corrected]

function CYK(words, grammar) returns the most probable parse and its probability

  for i ← 1 to num_words
    for a ← 1 to num_nonterminals
      if (A → wᵢ) is in grammar then
        π[i, i, a] ← P(A → wᵢ)

  for span ← 2 to num_words
    for begin ← 1 to num_words − span + 1
      end ← begin + span − 1
      for m ← begin to end − 1
        for a ← 1 to num_nonterminals
          for b ← 1 to num_nonterminals
            for c ← 1 to num_nonterminals
              prob ← π[begin, m, b] × π[m+1, end, c] × P(A → B C)
              if (prob > π[begin, end, a]) then
                π[begin, end, a] ← prob
                back[begin, end, a] ← {m, b, c}

  return build_tree(back[1, num_words, 1]), π[1, num_words, 1]
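Below is a minimal runnable Python version of this pseudocode; it indexes the chart by non-terminal names rather than integer indices, and is a sketch rather than the textbook's reference implementation. Run on the Example 4 grammar, it recovers the higher-probability parse of "astronomers saw stars with ears":

```python
from collections import defaultdict

def prob_cyk(words, lexical, binary):
    """Probabilistic CYK.
    lexical: dict word -> list of (A, P(A -> word))
    binary:  list of (A, B, C, P(A -> B C))
    Returns the chart pi and the backpointer table, with 1-based spans."""
    n = len(words)
    pi = defaultdict(float)   # (i, j, A) -> max probability of A over i..j
    back = {}                 # (i, j, A) -> (m, B, C) split, or the word itself

    # Base case: spans of length 1 come only from lexical rules A -> w_i.
    for i, w in enumerate(words, start=1):
        for a, p in lexical.get(w, []):
            pi[(i, i, a)] = p
            back[(i, i, a)] = w

    # Spans of length 2..n: combine two smaller spans with a rule A -> B C.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for a, b, c, p in binary:
                    prob = pi[(begin, m, b)] * pi[(m + 1, end, c)] * p
                    if prob > pi[(begin, end, a)]:
                        pi[(begin, end, a)] = prob
                        back[(begin, end, a)] = (m, b, c)
    return pi, back

def build_tree(back, i, j, a):
    """Reconstruct the best tree from the backpointer table."""
    entry = back[(i, j, a)]
    if isinstance(entry, str):    # lexical entry: a word
        return (a, entry)
    m, b, c = entry
    return (a, build_tree(back, i, m, b), build_tree(back, m + 1, j, c))

# The Example 4 grammar.
lexical = {
    "astronomers": [("NP", 0.1)], "ears": [("NP", 0.18)],
    "saw": [("NP", 0.04), ("V", 1.0)], "stars": [("NP", 0.18)],
    "telescopes": [("NP", 0.1)], "with": [("P", 1.0)],
}
binary = [("S", "NP", "VP", 1.0), ("PP", "P", "NP", 1.0),
          ("VP", "V", "NP", 0.7), ("VP", "VP", "PP", 0.3),
          ("NP", "NP", "PP", 0.4)]

words = "astronomers saw stars with ears".split()
pi, back = prob_cyk(words, lexical, binary)
print(pi[(1, len(words), "S")])             # ~ 0.0009072
print(build_tree(back, 1, len(words), "S"))
```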
The CYK Membership Algorithm

Input:
- Grammar G in Chomsky Normal Form
- String w

Output:
- Decide whether w ∈ L(G)
The Algorithm

Input example:

Grammar G:
S → AB
A → BB
A → a
B → AB
B → b

String w: aabbb
Write out all substrings of w:

a      a      b      b      b       all substrings of length 1
aa     ab     bb     bb             all substrings of length 2
aab    abb    bbb                   all substrings of length 3
aabb   abbb                         all substrings of length 4
aabbb                               all substrings of length 5
For each substring of length 1, find the variables that derive it, using A → a and B → b:

a: {A}    a: {A}    b: {B}    b: {B}    b: {B}
For each substring of length 2, combine the variables of its two halves using the rules S → AB, A → BB, B → AB:

aa: { }    ab: {S, B}    bb: {A}    bb: {A}
Continuing in the same way for lengths 3, 4, and 5 gives the full table:

a: {A}       a: {A}       b: {B}       b: {B}    b: {B}
aa: { }      ab: {S, B}   bb: {A}      bb: {A}
aab: {S, B}  abb: {A}     bbb: {S, B}
aabb: {A}    abbb: {S, B}
aabbb: {S, B}

Therefore, since S derives the whole string: aabbb ∈ L(G).
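Filling the table this way is mechanical. A short sketch of the Boolean (membership-only) CYK algorithm, run on the example grammar and string (function and variable names are illustrative):

```python
def cyk_member(w, terminal_rules, binary_rules, start="S"):
    """Boolean CYK: does the CNF grammar derive string w?"""
    n = len(w)
    # table[i][j] = set of variables deriving the substring of w that
    # starts at position i and has length j + 1.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):                 # substrings of length 1
        table[i][0] = {A for A, a in terminal_rules if a == ch}
    for length in range(2, n + 1):             # substrings of length 2..n
        for i in range(n - length + 1):
            for split in range(1, length):     # split into two halves
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for A, B, C in binary_rules:   # rule A -> B C
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("A", "B", "B"), ("B", "A", "B")]
print(cyk_member("aabbb", terminal_rules, binary_rules))  # True
```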
CYK Algorithm for Deciding Context-Free Languages

IDEA: For each substring of a given input x, find all variables which can derive the substring. Once these have been found, telling which variables generate x becomes a simple matter of looking at the grammar, since it's in Chomsky normal form.