![Page 1: CSCI 3130: Automata theory and formal languages Andrej Bogdanov andrejb/csc3130 The Chinese University of Hong Kong Ambiguity](https://reader036.vdocuments.site/reader036/viewer/2022062301/56649ed45503460f94be4d27/html5/thumbnails/1.jpg)
CSCI 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
The Chinese University of Hong Kong
Ambiguity
Parsing algorithm for CFGs
Fall 2010
Ambiguity
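• A grammar is ambiguous if some string has more than one parse tree
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
Example: 1+2*2 has two parse trees
[Figure: two parse trees for 1+2*2. In the first, the root applies E → E + E and the right subtree derives 2*2, so the value is 1 + (2*2) = 5. In the second, the root applies E → E * E and the left subtree derives 1+2, so the value is (1+2) * 2 = 6.]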
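The two readings of 1+2*2 can be reproduced with a short sketch. It enumerates every parse tree of a flat token list under E → E + E | E * E | N by choosing each operator in turn as the root. The function name `all_values` and the token-list input are assumptions made for illustration, and parentheses are left out to keep it short.

```python
def all_values(tokens):
    """Return one value per parse tree of `tokens` under E -> E + E | E * E | N."""
    if len(tokens) == 1:
        return [int(tokens[0])]          # a single number N
    values = []
    # Operators sit at the odd positions; each one can be the root of a tree.
    for i in range(1, len(tokens), 2):
        for left in all_values(tokens[:i]):
            for right in all_values(tokens[i + 1:]):
                if tokens[i] == "+":
                    values.append(left + right)
                else:
                    values.append(left * right)
    return values


print(sorted(all_values(["1", "+", "2", "*", "2"])))  # prints [5, 6]
```

Two different values for one string is exactly the symptom of ambiguity described above.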
Disambiguation
• Sometimes we can rewrite the grammar to remove the ambiguity
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
E + E and E * E: same precedence!
Divide expression into terms and factors
[Figure: the expression 2 * (1 + 2 * 2) with its parts labelled as factors (F) and terms (T)]
Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
An expression is a sum of one or more terms: E → T | E + T
Each term is a product of one or more factors: T → F | T * F
Each factor is a parenthesized expression or a number: F → (E) | 1 | 2
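As a rough illustration of why the term/factor grammar gives a single reading, here is a minimal evaluator sketch that follows the three rules directly. The left-recursive rules E → E + T and T → T * F are handled with loops, which is a common implementation choice and not something the slides prescribe. The name `evaluate` is hypothetical, and the input is assumed to be well formed (no error handling).

```python
def evaluate(text):
    """Evaluate an expression over 1, 2, +, *, ( ) using E/T/F rules."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def factor():                        # F -> (E) | 1 | 2
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expr()
            pos += 1                     # skip the closing ")"
            return value
        return int(tok)

    def term():                          # T -> F | T * F
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            value *= factor()
        return value

    def expr():                          # E -> T | E + T
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            value += term()
        return value

    return expr()


print(evaluate("1+2*2"))                    # prints 5
print(evaluate("2 * (1 + 1 + 2 * 2) + 1"))  # prints 13
```

Because products are grouped inside terms before sums are formed, 1+2*2 can only be read as 1 + (2*2).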
Parsing example
2 * (1 + 1 + 2 * 2) + 1
E → T | E + T
T → F | T * F
F → (E) | 1 | 2
[Figure: parse tree for 2 * (1 + 1 + 2 * 2) + 1 under this grammar]
Disambiguation
• Disambiguation is not always possible
  – There exist inherently ambiguous languages
  – There is no general procedure for disambiguation
• In programming languages, ambiguity comes from precedence rules, and we can resolve it as in the example above
• In English, ambiguity is sometimes a problem:
He ate the cookies on the floor
Parsing
• Do we have a method for building a parse tree?
• Can we tell if the parse tree is unique?
S → 0S1 | 1S0S1 | T
T → S | ε
input: 00111
First attempt
• Maybe we can try all possible derivations:
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
[Figure: tree of derivations starting from S]
S ⇒ 0S1, 1S0S1, T
0S1 ⇒ 00S11, 01S0S11, 0T1
1S0S1 ⇒ 10S10S1, ...
T ⇒ S
when do we stop?
Problems
• How do we know when to stop?
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
[Figure: tree of derivations starting from S]
S ⇒ 0S1, 1S0S1
0S1 ⇒ 00S11, 01S0S11, 0T1
1S0S1 ⇒ 10S10S1, ...
when do we stop?
Problems
• Idea: Stop derivation when length exceeds |x|
• Not right because of ε-productions
• We want to eliminate ε-productions
S → 0S1 | 1S0S1 | T
T → S | ε
x = 01011
S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011
lengths: 1, 3, 7, 6, 5
Problems
• Loops among the variables (S → T → S) might make us go forever
• We want to eliminate such loops
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
Removal of ε-productions
• A variable N is nullable if there is a derivation N ⇒* ε
• How to remove ε-productions
  Find all nullable variables N
  For every production of the form A → αNβ, add another production A → αβ
  If N → ε is a production, remove it
  If S is nullable, add the special production S → ε
Example
• Find the nullable variables
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D
Finding nullable variables
• To find nullable variables, we work backwards
  – First, mark all variables A s.t. A → ε as nullable
  – Then, as long as there are productions of the form A → A1… Ak where all of A1,…, Ak are marked as nullable, mark A as nullable
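The marking procedure can be sketched as a small fixed-point loop. The representation is an assumption made for illustration: a grammar is a dict from each variable to a list of production bodies, each body is a tuple of symbols, and ε is the empty tuple `()`. The function name `nullable_variables` is likewise hypothetical.

```python
def nullable_variables(grammar):
    """Return the set of variables A with A =>* epsilon."""
    # First, mark every variable that has the production A -> epsilon.
    nullable = {A for A, bodies in grammar.items() if () in bodies}
    # Then keep marking A whenever some body A -> A1...Ak is all nullable.
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            if A in nullable:
                continue
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(A)
                changed = True
    return nullable


# The example grammar from the slides, in the assumed representation.
G = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
print(sorted(nullable_variables(G)))  # prints ['B', 'C', 'D']
```

Terminals are never in the `nullable` set, so any body containing a terminal is skipped automatically.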
Eliminating ε-productions
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D
For every production of the form A → αNβ, add another production A → αβ
If N → ε is a production, remove it
added productions: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
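A possible sketch of the elimination step follows. Instead of adding productions one at a time, it generates, for each body, every variant obtained by keeping or dropping each nullable symbol, and then discards the empty body; this should yield the same productions as the rule-by-rule procedure. The dict-of-tuples grammar representation and the names `nullable_variables` and `eliminate_epsilon` are assumptions for illustration.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables that derive epsilon (bodies are tuples, epsilon is ())."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            if A not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(A)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start):
    """Return an equivalent grammar whose only epsilon rule is start -> ()."""
    nullable = nullable_variables(grammar)
    result = {}
    for A, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept (s,) or dropped (); others are kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:             # drop A -> epsilon
                    new_bodies.add(new_body)
        result[A] = new_bodies
    if start in nullable:
        result[start].add(())            # the special production S -> epsilon
    return result


G = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
for head, bodies in eliminate_epsilon(G, "S").items():
    print(head, "->", sorted("".join(b) for b in bodies))
```

On the example grammar, B ends up with no productions at all, since its only rule was B → ε.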
Dealing with loops
• A unit production is a production of the form A1 → A2 where A1 and A2 are both variables
• Example
grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
unit productions: S → T, T → S, T → R
Removal of unit productions
• If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1
• Example
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
unit productions: S → T, T → S, T → R
becomes
S → 0S1 | 1S0S1
S → R | ε
R → 0SR
T is replaced by S in the {S, T} cycle
Removal of unit productions
• For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α
• Example
S → 0S1 | 1S0S1 | R | ε
R → 0SR
S → R → 0SR is replaced by S → 0SR, R → 0SR
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
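For comparison, here is a sketch of a different but standard way to remove unit productions: compute, for each variable A, all variables reachable from A through unit productions, and give A every non-unit body of those variables. This handles cycles and chains in one pass, but unlike the slides it does not merge the variables of a cycle, so T survives as an unreachable copy of S. The grammar representation and the name `remove_unit_productions` are assumptions for illustration.

```python
def remove_unit_productions(grammar):
    """Unit-closure method: bodies are tuples, epsilon is the empty tuple."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for A in grammar:
        # Collect every variable reachable from A using only unit productions.
        reachable = {A}
        frontier = [A]
        while frontier:
            B = frontier.pop()
            for body in grammar[B]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    frontier.append(body[0])
        # A inherits all non-unit bodies of the reachable variables.
        result[A] = {
            body for B in reachable for body in grammar[B] if not is_unit(body)
        }
    return result


G = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
for head, bodies in remove_unit_productions(G).items():
    print(head, "->", sorted("".join(b) or "ε" for b in bodies))
```

For S this gives 0S1, 1S0S1, 0SR and ε, which agrees with the final grammar in the example above.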
Recap
• After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak where a1, …, ak are terminals doesn’t shrink in length and doesn’t go into cycles
• Exception: S → ε
  – We will not use this rule at all, except to check if ε ∈ L
• Note: ε-productions must be eliminated before unit productions
Example: testing membership
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
eliminate unit, ε-productions:
S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
Derivations from S:
S ⇒ 01, 101
S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6
S ⇒ 0S1 ⇒ 0011, 01011, 00S11, strings of length ≥ 6
00S11 ⇒ only strings of length ≥ 6
Algorithm 1 for testing membership
• How to check if a string x ≠ ε is in L(G)
Eliminate all ε-productions and unit productions
Let X := S
While some new rule R can be applied to X
  Apply R to X
  If X = x, you have found a derivation for x
  If |X| > |x|, backtrack
If no more rules can be applied to X, x is not in L
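The search above might be sketched as follows, assuming the grammar has already had its ε-productions and unit productions removed and that S → ε is left out. Two simplifications are assumptions of this sketch: sentential forms are plain strings whose variables are single characters that appear as keys of the grammar dict, and only the leftmost variable is expanded, which is enough because every derivable string has a leftmost derivation. The name `derives` is hypothetical.

```python
def derives(grammar, start, x):
    """Return True if the non-empty string x can be derived from `start`."""
    seen = set()
    stack = [start]                      # X := S
    while stack:
        form = stack.pop()
        if form == x:
            return True                  # found a derivation for x
        if len(form) > len(x) or form in seen:
            continue                     # too long or already tried: backtrack
        seen.add(form)
        # Position of the leftmost variable, if any.
        i = next((k for k, c in enumerate(form) if c in grammar), None)
        if i is None:
            continue                     # a terminal string different from x
        for body in grammar[form[i]]:
            stack.append(form[:i] + body + form[i + 1:])
    return False                         # no rule left to try: x is not in L


# Grammar from the membership example, with S -> epsilon left out.
G = {"S": ["01", "101", "0S1", "10S1", "1S01", "1S0S1"]}
print(derives(G, "S", "00111"))  # prints False
print(derives(G, "S", "01011"))  # prints True
```

The length check is what guarantees termination: without ε-productions and unit productions, no step shrinks the sentential form.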
Practical limitations of Algorithm 1
• This method can be very slow if x is long
• There is a faster algorithm, but it requires that we do some more transformations on the grammar
G = CFG of the Java programming language
x = code for a 200-line Java program
algorithm might take about 10^200 steps!
Chomsky Normal Form
• A CFG is in Chomsky Normal Form if every production (except S → ε) is of the form A → BC or A → a
• Convert to Chomsky Normal Form:
A → BcDE
replace terminals with new variables:
A → BCDE
C → c
break up sequences with new variables:
A → BX1
X1 → CX2
X2 → DE
C → c
[Photo: Noam Chomsky]
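The two conversion steps can be sketched as below, assuming ε-productions and unit productions are already gone. The grammar representation (dict from variable to a list of tuple bodies) and the fresh names `T_c` and `X1, X2, ...` are assumptions of this sketch; it does not check that these names are unused in the input grammar.

```python
def to_cnf(grammar):
    """Rewrite every body of length >= 2 into bodies of exactly two variables."""
    variables = set(grammar)
    result = {}
    term_var = {}                        # terminal -> its new variable
    counter = 0

    def var_for(terminal):
        # Step 1: replace a terminal inside a long body with a new variable.
        if terminal not in term_var:
            term_var[terminal] = f"T_{terminal}"
            result[term_var[terminal]] = [(terminal,)]
        return term_var[terminal]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:           # A -> a is already in normal form
                result.setdefault(head, []).append(body)
                continue
            syms = [s if s in variables else var_for(s) for s in body]
            # Step 2: break up long sequences with new variables X1, X2, ...
            current = head
            while len(syms) > 2:
                counter += 1
                fresh = f"X{counter}"
                result.setdefault(current, []).append((syms[0], fresh))
                current = fresh
                syms = syms[1:]
            result.setdefault(current, []).append(tuple(syms))
    return result


G = {"A": [("B", "c", "D", "E")], "B": [("b",)], "D": [("d",)], "E": [("e",)]}
for head, bodies in to_cnf(G).items():
    print(head, "->", [" ".join(b) for b in bodies])
```

On A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, the same shape as the conversion shown above with T_c playing the role of the new variable for c.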
Algorithm 2 for testing membership
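S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Idea: We generate each substring of x bottom up
[Table: one row per substring length, whole string at the top, single symbols at the bottom]
SAC
–    SAC
–    B    B
SA   B    SC   SA
B    AC   AC   B    AC
b    a    a    b    a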
Parse tree reconstruction
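S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
[Table: one row per substring length, whole string at the top, single symbols at the bottom]
SAC
–    SAC
–    B    B
SA   B    SC   SA
B    AC   AC   B    AC
b    a    a    b    a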
Tracing back the derivations, we obtain the parse tree
Cocke-Younger-Kasami algorithm
Input: Grammar G in CNF, string x = x1…xk
For cells in last row
  If there is a production A → xi
    Put A in table cell ii
For cells st in other rows
  If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
    Put A in cell st
[Figure: triangular table of cells over x1 x2 … xk, with cells 11, 22, …, kk in the last row, cells 12, 23, … in the row above, and cell 1k at the top; positions s, j, t are marked between 1 and k]
Cell ij remembers all possible derivations of substring xi…xj
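A compact sketch of the table-filling procedure is given below. The representation is assumed for illustration: the grammar is a dict from variable to a list of tuple bodies, positions are 0-indexed, and `table[(s, t)]` holds the set of variables that derive the substring from position s to position t inclusive. The function name `cyk` is hypothetical and the input string is assumed to be non-empty.

```python
def cyk(grammar, x):
    """Fill the CYK table for a CNF grammar and a non-empty string x."""
    k = len(x)
    table = {}
    # Last row: cell ii gets every A with a production A -> x_i.
    for i, ch in enumerate(x):
        table[(i, i)] = {A for A, bodies in grammar.items() if (ch,) in bodies}
    # Other rows, from shorter substrings to longer ones.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):        # split into x_s..x_j and x_{j+1}..x_t
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (
                            len(body) == 2
                            and body[0] in table[(s, j)]
                            and body[1] in table[(j + 1, t)]
                        ):
                            cell.add(A)
            table[(s, t)] = cell
    return table


G = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
table = cyk(G, "baaba")
print(sorted(table[(0, 4)]))   # prints ['A', 'C', 'S']
print("S" in table[(0, 4)])    # prints True, so baaba is in L(G)
```

There are O(k^2) cells and each tries at most k split points, so for a fixed grammar the running time grows like k^3 rather than exponentially.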