ICS 482: Natural Language Processing
Probabilistic Context-Free Grammars (Chapter 14)
Muhammed Al-Mulhem
March 1, 2009
SCFG

Probabilistic CFG (PCFG), or Stochastic CFG (SCFG), is the simplest augmentation of the CFG.

A CFG G is defined as (N, Σ, R, S); an SCFG is likewise defined as (N, Σ, R, S), where:

- N is a set of non-terminal symbols.
- Σ is a set of terminal symbols (N ∩ Σ = Ø).
- R is a set of production rules, each of the form A → β [p], where A ∈ N, β ∈ (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(A → β).
- S is the start symbol, S ∈ N.
SCFG

P(A → β) expresses the probability that A will be expanded to β.

If we consider all the possible expansions of a non-terminal A, the sum of their probabilities must be 1:

    ∑β P(A → β) = 1
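This normalization constraint is easy to check mechanically. Below is a minimal Python sketch (the dictionary layout and helper name are illustrative, not from the slides) that stores a PCFG as a map from each non-terminal to its weighted expansions and verifies that every non-terminal's rule probabilities sum to 1, using the VP rules from Example 1 on the next slide:

```python
# A PCFG as a map from each non-terminal to its weighted expansions.
# The VP rules are the ones from Example 1 on the next slide.
pcfg = {
    "VP": [(("Verb",), 0.55),
           (("Verb", "NP"), 0.40),
           (("Verb", "NP", "NP"), 0.05)],
}

def check_normalization(grammar, tol=1e-9):
    """Check that, for every non-terminal A, the probabilities of all
    rules A -> beta sum to 1."""
    for lhs, expansions in grammar.items():
        total = sum(p for _, p in expansions)
        if abs(total - 1.0) > tol:
            raise ValueError(f"rules for {lhs} sum to {total}, not 1")

check_normalization(pcfg)  # passes: 0.55 + 0.40 + 0.05 == 1.0
```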
Example 1

Attach probabilities to grammar rules. The probabilities of the rules expanding a given non-terminal, such as VP, must sum to 1:

VP → Verb         .55
VP → Verb NP      .40
VP → Verb NP NP   .05
Example 2

NP → Det N      0.4
NP → NPposs N   0.1
NP → Pronoun    0.2
NP → NP PP      0.1
NP → N          0.2

Consider the subtree in which the top NP expands as NP → NP PP and the inner NP expands as NP → Det N:

        NP
       /  \
     NP    PP
    /  \
  Det   N

P(subtree above) = P(NP → NP PP) × P(NP → Det N) = 0.1 × 0.4 = 0.04
Example 3

[Two parse trees, T1 and T2, for the same sentence, together with the rules used to generate them (not the full grammar); the figures are not reproduced here.]

The probabilities of the two parse trees are calculated by multiplying the probabilities of the production rules used to generate each tree:

P(T1) = .15 × .40 × .05 × .05 × .35 × .75 × .40 × .40 × .30 × .40 × .50 = 3.78 × 10⁻⁷
P(T2) = .15 × .40 × .40 × .05 × .05 × .75 × .40 × .40 × .30 × .40 × .50 = 4.32 × 10⁻⁷
Example 4

S  → NP VP    1.0
PP → P NP     1.0
VP → V NP     0.7
VP → VP PP    0.3
P  → with     1.0
V  → saw      1.0
NP → NP PP        0.4
NP → astronomers  0.1
NP → ears         0.18
NP → saw          0.04
NP → stars        0.18
NP → telescopes   0.1
Example 4: Astronomers saw stars with ears

[Two parse trees, t1 and t2, for the sentence; the figures are not reproduced here.]
The probabilities of the two parse trees:

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
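Each tree's probability is just the product of the probabilities of the rules in its derivation. Here is a minimal sketch reproducing the two numbers above; the rule lists for t1 (PP attached to the object NP) and t2 (PP attached to the VP) are read off the two parses and are a reconstruction, since the tree figures are not reproduced here:

```python
from math import prod

# Rule probabilities from the Example 4 grammar.
P = {
    "S -> NP VP": 1.0, "PP -> P NP": 1.0, "VP -> V NP": 0.7,
    "VP -> VP PP": 0.3, "P -> with": 1.0, "V -> saw": 1.0,
    "NP -> NP PP": 0.4, "NP -> astronomers": 0.1, "NP -> ears": 0.18,
    "NP -> saw": 0.04, "NP -> stars": 0.18, "NP -> telescopes": 0.1,
}

# t1: the PP "with ears" attaches to the object NP "stars".
t1 = ["S -> NP VP", "NP -> astronomers", "VP -> V NP", "V -> saw",
      "NP -> NP PP", "NP -> stars", "PP -> P NP", "P -> with", "NP -> ears"]
# t2: the PP attaches to the VP instead.
t2 = ["S -> NP VP", "NP -> astronomers", "VP -> VP PP", "VP -> V NP",
      "V -> saw", "NP -> stars", "PP -> P NP", "P -> with", "NP -> ears"]

print(prod(P[r] for r in t1))  # ~ 0.0009072
print(prod(P[r] for r in t2))  # ~ 0.0006804
```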
Probabilistic CFGs

- The probabilistic model: assigning probabilities to parse trees
- Getting the probabilities for the model
- Parsing with probabilities
  - Slight modification to the dynamic programming approach
  - The task is to find the max-probability tree for an input
Getting the Probabilities

- From an annotated database (a treebank)
- Learned from a corpus
Treebank

- Get a large collection of parsed sentences.
- Collect counts for each non-terminal rule expansion in the collection.
- Normalize (as sketched below).
- Done.
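This count-and-normalize recipe is maximum likelihood estimation: P(A → β) = Count(A → β) / Count(A). A minimal sketch, assuming a toy treebank represented as lists of the rules each parse tree uses (the two trees are invented for illustration):

```python
from collections import Counter

# Toy "treebank": each parsed sentence is represented by the list of
# rules its tree uses. These two trees are invented for illustration.
treebank = [
    ["S -> NP VP", "NP -> Pronoun", "VP -> Verb NP", "NP -> Det N"],
    ["S -> NP VP", "NP -> Det N", "VP -> Verb"],
]

rule_counts = Counter(rule for tree in treebank for rule in tree)
lhs_counts = Counter(rule.split(" -> ")[0] for tree in treebank for rule in tree)

# Normalize: P(A -> beta) = Count(A -> beta) / Count(A)
probs = {rule: n / lhs_counts[rule.split(" -> ")[0]]
         for rule, n in rule_counts.items()}
print(probs["NP -> Det N"])  # 2 of the 3 NP expansions, i.e. ~0.667
```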
Learning

- What if you don't have a treebank (and can't get one)?
- Take a large collection of text and parse it.
- In the case of syntactically ambiguous sentences, collect all the possible parses.
- Prorate the rule statistics gathered for rules in the ambiguous case by their probability.
- Proceed as you did with a treebank.
- This is the Inside-Outside algorithm.
Assumptions

- We're assuming that there is a grammar to be used to parse with.
- We're assuming the existence of a large, robust dictionary with parts of speech.
- We're assuming the ability to parse (i.e., a parser).
- Given all that, we can parse probabilistically.
Typical Approach

- Bottom-up dynamic programming approach
- Assign probabilities to constituents as they are completed and placed in the table
- Use the max probability for each constituent going up
Max Probability

Say we're talking about a final part of a parse, S₀ → NPᵢ VPⱼ (the subscripts mark the spans covered). The probability of the S is

    P(S → NP VP) × P(NP) × P(VP)

P(NP) and P(VP) are already known, since we're doing bottom-up parsing.
Max

- The P(NP) is known.
- What if there are multiple NPs for the span of text in question (0 to i)? Take the max. (Why? A tree's probability is a product of rule probabilities, so the best tree over a larger span can only use the best NP over this span.)
- This does not mean that other kinds of constituents for the same span are ignored (i.e., they might be in the solution); see the sketch below.
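In code, the bookkeeping amounts to keeping one best probability per (span, non-terminal) pair in the chart: same-label alternatives compete, while other labels over the same span coexist. A small sketch with a hypothetical chart cell (the numbers are made up):

```python
# chart maps (start, end, label) -> best probability found so far for a
# constituent with that label over that span.
chart = {}

def update(start, end, label, prob):
    """Keep only the max-probability derivation per (span, label)."""
    key = (start, end, label)
    if prob > chart.get(key, 0.0):
        chart[key] = prob

update(0, 3, "NP", 2.7e-4)  # one way to build an NP over words 0..3
update(0, 3, "NP", 3.1e-4)  # a better NP derivation replaces it
update(0, 3, "VP", 1.0e-5)  # a VP over the same span is kept separately
print(chart[(0, 3, "NP")], chart[(0, 3, "VP")])
```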
Probabilistic Parsing

- Probabilistic CYK (Cocke-Younger-Kasami) algorithm for parsing PCFGs
- Bottom-up dynamic programming algorithm
- Assume the PCFG is in Chomsky Normal Form (every production is either A → B C or A → a)
Chomsky Normal Form (CNF)

All rules have one of two forms:

    A → B C   where B and C are non-terminals
    A → a     where a is a terminal
Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa
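A rule-shape test makes the distinction mechanical. A minimal sketch, assuming non-terminals are written as uppercase symbols and terminals as lowercase (the helper name is illustrative):

```python
def is_cnf_rule(lhs, rhs):
    """True if the rule lhs -> rhs has CNF shape: A -> B C or A -> a."""
    if len(rhs) == 2:                            # A -> B C
        return all(sym.isupper() for sym in rhs)
    if len(rhs) == 1:                            # A -> a
        return rhs[0].islower()
    return False

print(is_cnf_rule("S", ["A", "S"]))       # True
print(is_cnf_rule("A", ["b"]))            # True
print(is_cnf_rule("S", ["A", "A", "S"]))  # False: three symbols on the right
print(is_cnf_rule("A", ["a", "a"]))       # False: two terminals
```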
Observations

- Chomsky normal form is good for parsing and proving theorems.
- It is possible to convert any context-free grammar into Chomsky normal form.
Probabilistic CYK Parsing of PCFGs

CYK algorithm: a bottom-up parser.

Input:
- A Chomsky-normal-form PCFG, G = (N, Σ, P, S, D). Assume the non-terminals have indices 1, 2, …, |N|, and the start symbol S has index 1.
- n words w₁, …, wₙ

Data structure:
- A dynamic programming array π[i, j, a] holds the maximum probability for a constituent with non-terminal index a spanning words i..j.

Output:
- The maximum-probability parse π[1, n, 1]
Base Case

- CYK fills out π[i, j, a] by induction.
- Base case: input strings of length 1 (individual words wᵢ).
- In CNF, the probability of a given non-terminal A expanding to a single word wᵢ must come only from the rule A → wᵢ, i.e., P(A → wᵢ).
Probabilistic CYK Algorithm [Corrected]

function CYK(words, grammar) returns the most probable parse and its probability

  for i ← 1 to num_words
    for a ← 1 to num_nonterminals
      if (A → wᵢ) is in grammar then
        π[i, i, a] ← P(A → wᵢ)

  for span ← 2 to num_words
    for begin ← 1 to num_words − span + 1
      end ← begin + span − 1
      for m ← begin to end − 1
        for a ← 1 to num_nonterminals
          for b ← 1 to num_nonterminals
            for c ← 1 to num_nonterminals
              prob ← π[begin, m, b] × π[m+1, end, c] × P(A → B C)
              if (prob > π[begin, end, a]) then
                π[begin, end, a] ← prob
                back[begin, end, a] ← {m, b, c}

  return build_tree(back[1, num_words, 1]), π[1, num_words, 1]
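Below is a minimal runnable Python version of this pseudocode; it indexes the chart by non-terminal names rather than integer indices, and is a sketch rather than the textbook's reference implementation. Run on the Example 4 grammar, it recovers the higher-probability parse of "astronomers saw stars with ears":

```python
from collections import defaultdict

def prob_cyk(words, lexical, binary):
    """Probabilistic CYK.
    lexical: dict word -> list of (A, P(A -> word))
    binary:  list of (A, B, C, P(A -> B C))
    Returns the chart pi and the backpointer table, with 1-based spans."""
    n = len(words)
    pi = defaultdict(float)   # (i, j, A) -> max probability of A over i..j
    back = {}                 # (i, j, A) -> (m, B, C) split, or the word itself

    # Base case: spans of length 1 come only from lexical rules A -> w_i.
    for i, w in enumerate(words, start=1):
        for a, p in lexical.get(w, []):
            pi[(i, i, a)] = p
            back[(i, i, a)] = w

    # Spans of length 2..n: combine two smaller spans with a rule A -> B C.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for a, b, c, p in binary:
                    prob = pi[(begin, m, b)] * pi[(m + 1, end, c)] * p
                    if prob > pi[(begin, end, a)]:
                        pi[(begin, end, a)] = prob
                        back[(begin, end, a)] = (m, b, c)
    return pi, back

def build_tree(back, i, j, a):
    """Reconstruct the best tree from the backpointer table."""
    entry = back[(i, j, a)]
    if isinstance(entry, str):    # lexical entry: a word
        return (a, entry)
    m, b, c = entry
    return (a, build_tree(back, i, m, b), build_tree(back, m + 1, j, c))

# The Example 4 grammar.
lexical = {
    "astronomers": [("NP", 0.1)], "ears": [("NP", 0.18)],
    "saw": [("NP", 0.04), ("V", 1.0)], "stars": [("NP", 0.18)],
    "telescopes": [("NP", 0.1)], "with": [("P", 1.0)],
}
binary = [("S", "NP", "VP", 1.0), ("PP", "P", "NP", 1.0),
          ("VP", "V", "NP", 0.7), ("VP", "VP", "PP", 0.3),
          ("NP", "NP", "PP", 0.4)]

words = "astronomers saw stars with ears".split()
pi, back = prob_cyk(words, lexical, binary)
print(pi[(1, len(words), "S")])             # ~ 0.0009072
print(build_tree(back, 1, len(words), "S"))
```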
The CYK Membership Algorithm

Input:
- Grammar G in Chomsky Normal Form
- String w

Output:
- Decide whether w ∈ L(G)
The Algorithm

Input example:

Grammar G:
S → AB
A → BB
A → a
B → AB
B → b

String w: aabbb
Write out all substrings of w:

a      a      b      b      b       all substrings of length 1
aa     ab     bb     bb             all substrings of length 2
aab    abb    bbb                   all substrings of length 3
aabb   abbb                         all substrings of length 4
aabbb                               all substrings of length 5
For each substring of length 1, find the variables that derive it, using A → a and B → b:

a: {A}    a: {A}    b: {B}    b: {B}    b: {B}
For each substring of length 2, combine the variables of its two halves using the rules S → AB, A → BB, B → AB:

aa: { }    ab: {S, B}    bb: {A}    bb: {A}
Continuing in the same way for lengths 3, 4, and 5 gives the full table:

a: {A}       a: {A}       b: {B}       b: {B}    b: {B}
aa: { }      ab: {S, B}   bb: {A}      bb: {A}
aab: {S, B}  abb: {A}     bbb: {S, B}
aabb: {A}    abbb: {S, B}
aabbb: {S, B}

Therefore, since S derives the whole string: aabbb ∈ L(G).
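Filling the table this way is mechanical. A short sketch of the Boolean (membership-only) CYK algorithm, run on the example grammar and string (function and variable names are illustrative):

```python
def cyk_member(w, terminal_rules, binary_rules, start="S"):
    """Boolean CYK: does the CNF grammar derive string w?"""
    n = len(w)
    # table[i][j] = set of variables deriving the substring of w that
    # starts at position i and has length j + 1.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):                 # substrings of length 1
        table[i][0] = {A for A, a in terminal_rules if a == ch}
    for length in range(2, n + 1):             # substrings of length 2..n
        for i in range(n - length + 1):
            for split in range(1, length):     # split into two halves
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for A, B, C in binary_rules:   # rule A -> B C
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("A", "B", "B"), ("B", "A", "B")]
print(cyk_member("aabbb", terminal_rules, binary_rules))  # True
```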
CYK Algorithm for Deciding Context-Free Languages

IDEA: For each substring of a given input x, find all variables which can derive the substring. Once these have been found, telling which variables generate x becomes a simple matter of looking at the grammar, since it's in Chomsky normal form.