CS 461 – Oct. 10
• Review PL grammar as needed
• How to tell if a word is in a CFL?
– Convert to PDA and run it.
– CYK algorithm
– Modern parsing techniques
Accepting input
• How can we tell if a given source file (input stream of tokens) is a valid program? The language is defined by a CFG, so …
– Can we see if there is some derivation from the grammar?
– Can we convert the CFG to a PDA?
• Exponential performance is not acceptable (e.g. doubling every time we add a token).
• Two improvements:
– CYK algorithm, runs in O(n^3)
– Bottom-up parsing, generally linear, but with restrictions on the grammar.
CYK algorithm
• In 1965-67, discovered independently by Cocke, Younger, Kasami.
• Given any CFG and any string, can tell if grammar generates string.
• The grammar needs to be in CNF first.
– This ensures that the rules are simple. Rules are of the form X → a or X → YZ.
• Consider all substrings of length 1 first. See if these are in the language. Next try all of length 2, length 3, … up to length n.
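As a rough illustration of the CNF requirement, the sketch below stores a grammar as a Python dict and checks that every rule has the form X → a or X → YZ. The dict layout and the name `is_cnf` are assumptions made for this example, not something from the slides. The grammar is the 3+7+ grammar used in the worked example in these notes. The special case S → ε that some CNF definitions allow is ignored here.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols).
# This is the 3+7+ grammar from the worked example in these notes.
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("3",), ("A", "C")],
    "B": [("7",), ("B", "D")],
    "C": [("3",)],
    "D": [("7",)],
}

def is_cnf(grammar):
    """Return True if every rule is X -> a (one terminal) or X -> YZ (two variables)."""
    variables = set(grammar)
    for rhs_list in grammar.values():
        for rhs in rhs_list:
            terminal_rule = len(rhs) == 1 and rhs[0] not in variables
            binary_rule = len(rhs) == 2 and all(sym in variables for sym in rhs)
            if not (terminal_rule or binary_rule):
                return False
    return True

print(is_cnf(GRAMMAR))                   # True
print(is_cnf({"S": [("a", "S", "b")]}))  # False: right-hand side is too long
```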
continued
• Maintain results in an NxN table. Top right portion not used.
– The example below is for testing a word of length 3 (X marks an unused cell).
• Start at bottom; work your way up.
• For length 1, just look for “unit rules” in grammar, e.g. X → a.

1..3    X       X
1..2    2..3    X
1..1    2..2    3..3
continued
• For the general case i..j:
– Think of all possible ways this string can be broken into 2 pieces.
– Ex. 1..3 = 1..2 + 3..3 or 1..1 + 2..3
– We want to know if both pieces ∈ L. This handles rules of form A → BC.
• Let’s try an example from 3+7+ (grammar in CNF).

337 ∈ 3+7+ ?
S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7
For each len 1 string, which variables generate it?
1..1 is 3. Rules A and C.
2..2 is 3. Rules A and C.
3..3 is 7. Rules B and D.
1..3          X             X
1..2          2..3          X
1..1: A, C    2..2: A, C    3..3: B, D
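The length-1 step above can be sketched in a few lines: for each symbol of the word, collect every variable that has a rule producing that symbol. The dict representation and the name `bottom_row` are assumptions made for this illustration.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols).
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("3",), ("A", "C")],
    "B": [("7",), ("B", "D")],
    "C": [("3",)],
    "D": [("7",)],
}

def bottom_row(word, grammar):
    """For each position, collect the variables X that have a rule X -> symbol."""
    row = []
    for symbol in word:
        row.append({var for var, rhs_list in grammar.items() if (symbol,) in rhs_list})
    return row

for pos, cell in enumerate(bottom_row("337", GRAMMAR), start=1):
    print(f"{pos}..{pos}: {sorted(cell)}")
# 1..1: ['A', 'C']
# 2..2: ['A', 'C']
# 3..3: ['B', 'D']
```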
Length 2:
1..2 = 1..1 + 2..2 = (A or C)(A or C) = rule A
2..3 = 2..2 + 3..3 = (A or C)(B or D) = rule S
1..3          X             X
1..2: A       2..3: S       X
1..1: A, C    2..2: A, C    3..3: B, D
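Combining two cells, as in the length-2 step above, amounts to looking for rules X → YZ where Y comes from the left cell and Z from the right cell. A minimal sketch, assuming the same dict representation of the grammar (`combine` is a hypothetical helper name):

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols).
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("3",), ("A", "C")],
    "B": [("7",), ("B", "D")],
    "C": [("3",)],
    "D": [("7",)],
}

def combine(left, right, grammar):
    """Variables X having a rule X -> YZ with Y in `left` and Z in `right`."""
    return {
        var
        for var, rhs_list in grammar.items()
        for rhs in rhs_list
        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right
    }

print(combine({"A", "C"}, {"A", "C"}, GRAMMAR))  # {'A'}   cell 1..2
print(combine({"A", "C"}, {"B", "D"}, GRAMMAR))  # {'S'}   cell 2..3
```

An empty result plays the role of Ø: no rule matches that particular split.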
Length 3: 2 cases for 1..3:
1..2 + 3..3: (A)(B or D) = S
1..1 + 2..3: (A or C)(S) no!
We only need one case to work.
1..3: S       X             X
1..2: A       2..3: S       X
1..1: A, C    2..2: A, C    3..3: B, D
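Putting the steps of this example together gives the whole procedure. The following is a minimal sketch, not a tuned implementation: it assumes the grammar is already in CNF and stored as a dict, ignores the empty word, and uses 0-based indexing internally instead of the i..j labels on the slides. The function name `cyk` is chosen for illustration.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols).
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("3",), ("A", "C")],
    "B": [("7",), ("B", "D")],
    "C": [("3",)],
    "D": [("7",)],
}

def cyk(word, grammar, start="S"):
    """Return True if the CNF grammar generates `word` (empty word not handled)."""
    n = len(word)
    if n == 0:
        return False
    # table[length][i] = set of variables generating word[i : i + length]
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    # Length 1: rules of the form X -> a
    for i, symbol in enumerate(word):
        table[1][i] = {v for v, rhss in grammar.items() if (symbol,) in rhss}
    # Lengths 2..n: try every split into two pieces, using rules X -> YZ
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split][i]
                right = table[length - split][i + split]
                for var, rhss in grammar.items():
                    for rhs in rhss:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            table[length][i].add(var)
    return start in table[n][0]

print(cyk("337", GRAMMAR))  # True
print(cyk("73", GRAMMAR))   # False
```

The three nested loops over length, start position, and split point are where the O(n^3) bound comes from (times a factor for the number of rules).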
CYK example #2
Let’s test the word baab.
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Length 1:
‘a’ generated by A, C
‘b’ generated by B
1..4          X             X             X
1..3          2..4          X             X
1..2          2..3          3..4          X
1..1: B       2..2: A, C    3..3: A, C    4..4: B
Length 2:
1..2 = 1..1 + 2..2 = (B)(A, C) = S,A
2..3 = 2..2 + 3..3 = (A,C)(A,C) = B
3..4 = 3..3 + 4..4 = (A,C)(B) = S,C
1..4          X             X             X
1..3          2..4          X             X
1..2: S, A    2..3: B       3..4: S, C    X
1..1: B       2..2: A, C    3..3: A, C    4..4: B
Length 3: [ each has 2 chances! ]
1..3 = 1..2 + 3..3 = (S,A)(A,C) = Ø
1..3 = 1..1 + 2..3 = (B)(B) = Ø
2..4 = 2..3 + 4..4 = (B)(B) = Ø
2..4 = 2..2 + 3..4 = (A,C)(S,C) = B
1..4          X             X             X
1..3: Ø       2..4: B       X             X
1..2: S, A    2..3: B       3..4: S, C    X
1..1: B       2..2: A, C    3..3: A, C    4..4: B
Finally…
Length 4: [ has 3 chances! ]
1..4 = 1..3 + 4..4 = (Ø)(B) = Ø
1..4 = 1..2 + 3..4 = (S,A)(S,C) = Ø
1..4 = 1..1 + 2..4 = (B)(B) = Ø
Ø means we lose! baab ∉ L.
However, in general don’t give up if you encounter Ø in the middle of the process.
1..4: Ø       X             X             X
1..3: Ø       2..4: B       X             X
1..2: S, A    2..3: B       3..4: S, C    X
1..1: B       2..2: A, C    3..3: A, C    4..4: B
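To check the table above mechanically, the sketch below builds the whole table as a dict keyed by (i, j), using the same 1-based i..j labels as the slides. The dict representation and the name `cyk_table` are assumptions for this illustration. The second word, baaba, is an extra test word that is not from the slides. It is included only to illustrate the remark about Ø: its table has an empty cell at 1..3, yet S still appears in the top cell.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols).
GRAMMAR2 = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

def cyk_table(word, grammar):
    """Return {(i, j): variables generating positions i..j}, 1-based like the slides."""
    n = len(word)
    table = {}
    for i, symbol in enumerate(word, start=1):
        table[(i, i)] = {v for v, rhss in grammar.items() if (symbol,) in rhss}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into i..k and k+1..j
                for var, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and rhs[0] in table[(i, k)]
                                and rhs[1] in table[(k + 1, j)]):
                            cell.add(var)
            table[(i, j)] = cell
    return table

baab = cyk_table("baab", GRAMMAR2)
print("S" in baab[(1, 4)])    # False: baab is rejected

baaba = cyk_table("baaba", GRAMMAR2)  # extra word, not from the slides
print(baaba[(1, 3)] == set())  # True: an empty cell in the middle
print("S" in baaba[(1, 5)])    # True: the word is still accepted
```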