![Page 1: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/1.jpg)
COMP3190: Principle of Programming Languages
Formal Language Syntax
![Page 2: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/2.jpg)
- 2 -
Motivation
The problem of parsing structured text is very common.
Consider the structure of email addresses (using a grammar):
<emailAddress> := <person> @ <host>
<person> := <word>
<host> := <word> | <word>.<host>
Goal: describe and recognize email addresses in arbitrary text.
![Page 3: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/3.jpg)
- 3 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
![Page 4: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/4.jpg)
- 4 -
Deterministic Finite Automata (DFA)
Q: finite set of states
Σ: finite set of “letters” (alphabet)
δ: Q × Σ -> Q (transition function)
q0: start state (in Q)
F: set of accept states (subset of Q)
Acceptance: the input is consumed with the automaton in a final (accept) state.
![Page 5: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/5.jpg)
- 5 -
Example of DFA
[State diagram: states q1 and q2, edges labeled 0 and 1]

| δ | 0 | 1 |
|---|---|---|
| q1 | q1 | q2 |
| q2 | q1 | q2 |
Accepts all strings that end in 1
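The transition table above can be run directly. The following is a minimal sketch in Python; the start state `q1` and the accept set `{q2}` are assumptions read off the stated language (strings ending in 1), and the names `DELTA` and `dfa_accepts` are illustrative only.

```python
# Transition table of the two-state DFA from the slide.
DELTA = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
}
START = "q1"      # assumed start state
ACCEPT = {"q2"}   # assumed accept states: a string ending in 1 stops in q2


def dfa_accepts(text):
    """Consume text one symbol at a time and test the final state."""
    state = START
    for symbol in text:
        state = DELTA[(state, symbol)]
    return state in ACCEPT
```

With these assumptions, `dfa_accepts("0101")` holds while `dfa_accepts("10")` and `dfa_accepts("")` do not.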
![Page 6: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/6.jpg)
- 6 -
Another Example of a DFA
[State diagram: states S, q1, q2, r1, r2; edges labeled a and b]
Accepts all strings that start and end with “a” OR start and end with “b”
![Page 7: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/7.jpg)
- 7 -
Non-deterministic Finite Automata (NFA)
The transition function is different: δ: Q × Σε -> P(Q)
P(Q) is the powerset of Q (the set of all subsets).
Σε is the union of Σ and the special symbol ε (denoting the empty string).
A string is accepted if at least one path consumes all the input and leads to an accept state.
![Page 8: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/8.jpg)
- 8 -
Example of an NFA
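[State diagram: states q1, q2, q3, q4; edge labels 0,1 / 1 / 0,ε / 1 / 0,1]

| δ | 0 | 1 | ε |
|---|---|---|---|
| q1 | {q1} | {q1, q2} | ∅ |
| q2 | {q3} | ∅ | {q3} |
| q3 | ∅ | {q4} | ∅ |
| q4 | {q4} | {q4} | ∅ |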
What strings does this NFA accept?
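One way to explore the question is to simulate the NFA by tracking the set of states it could be in. This is a sketch only: the start state `q1` and accept state `q4` are assumptions based on the usual reading of this diagram, and the empty string `""` is an arbitrary marker chosen here for ε-moves.

```python
EPS = ""  # marker for ε-moves (a convention chosen for this sketch)

NFA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START, ACCEPT = "q1", {"q4"}  # assumed, not stated in the slide text


def eps_closure(states):
    """All states reachable from `states` through ε-moves alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def nfa_accepts(text):
    """Accept if at least one path consumes all input and ends in ACCEPT."""
    current = eps_closure({START})
    for symbol in text:
        moved = set()
        for s in current:
            moved |= NFA.get((s, symbol), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)
```

Trying a few inputs such as `"101"`, `"11"`, and `"100"` gives a quick feel for the language before working it out by hand.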
![Page 9: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/9.jpg)
- 9 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
![Page 10: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/10.jpg)
- 10 -
Regular Expressions
R is a regular expression if R is:
“a” for some a in Σ,
ε (the empty string),
∅ (the empty language),
the union of two regular expressions,
the concatenation of two regular expressions, or
R1* (Kleene closure: zero or more repetitions of R1).
![Page 11: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/11.jpg)
- 11 -
Regular Expression Notation
a: an ordinary letter
ε: the empty string
M | N: choosing from M or N
MN: concatenation of M and N
M*: zero or more times (Kleene star)
M+: one or more times
M?: zero or one occurrence
[a-zA-Z]: character set alternation (choice)
. (period): stands for any single character except newline
![Page 12: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/12.jpg)
- 12 -
Examples of Regular Expressions
{0, 1}* 0: all strings that end in 0
{0, 1} 0*: strings that start with 1 or 0, followed by zero or more 0s
{0, 1}*: all strings
{0^n 1^n, n >= 0}: not a regular language, so no regular expression describes it!!!
![Page 13: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/13.jpg)
- 13 -
Converting a Regular Expression to an NFA
[Diagrams: NFA fragments for the base cases a and ε, and for the constructions M|N, MN, and M*, connected by ε-transitions]
![Page 14: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/14.jpg)
- 14 -
Regular expression->NFA
Language: Strings of 0s and 1s in which the number of 0s is even
Regular expression: (1*01*0)*1*
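The regular expression can be checked against the plain definition of the language. The sketch below uses Python's `re` module (its syntax happens to coincide with the slide notation for this expression) and compares both descriptions on every short bit string; the helper names are illustrative.

```python
import re
from itertools import product

EVEN_ZEROS = re.compile(r"(1*01*0)*1*")


def matches(text):
    """True if the whole string matches the regular expression."""
    return EVEN_ZEROS.fullmatch(text) is not None


def agrees_with_definition(max_len=8):
    """Compare the regex with 'number of 0s is even' on all short strings."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if matches(s) != (s.count("0") % 2 == 0):
                return False
    return True
```

An exhaustive check on short strings is not a proof, but it is a cheap way to catch a wrong expression.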
![Page 15: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/15.jpg)
- 15 -
Converting an NFA to a DFA
For a set of states S, closure(S) is the set of states that can be reached from S without consuming any input.
For a set of states S, DFAedge(S, c) is the set of states that can be reached from S by consuming input symbol c.
Each set of NFA states corresponds to one DFA state (hence at most 2^n DFA states for an NFA with n states).
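The two definitions combine into the subset construction. The following is a sketch under some assumptions: the NFA is a dictionary keyed by `(state, symbol)` with `""` standing for ε, `dfa_edge` applies `closure` to its result so that each DFA state is already ε-closed, and the sample NFA is the four-state example from the earlier slide.

```python
def closure(nfa, states):
    """States reachable from `states` without consuming input."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, ""), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def dfa_edge(nfa, states, c):
    """States reachable from `states` by consuming c, then ε-closed."""
    moved = set()
    for s in states:
        moved |= set(nfa.get((s, c), ()))
    return closure(nfa, moved)


def subset_construction(nfa, start, alphabet):
    """Build the DFA whose states are sets of NFA states."""
    start_set = closure(nfa, {start})
    table, todo, seen = {}, [start_set], {start_set}
    while todo:
        d = todo.pop()
        for c in alphabet:
            target = dfa_edge(nfa, d, c)
            table[(d, c)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start_set, table, seen


SAMPLE_NFA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", ""): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
dfa_start, dfa_table, dfa_states = subset_construction(SAMPLE_NFA, "q1", "01")
```

Only reachable subsets are generated, so the result is usually far smaller than the 2^n bound.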
![Page 16: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/16.jpg)
- 16 -
NFA -> DFA
Initial classes: {A, B, E}, {C, D}
No class requires partitioning!
Hence a two-state DFA is obtained.
![Page 17: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/17.jpg)
- 17 -
Obtaining the minimal equivalent DFA
Initially two equivalence classes: final and nonfinal states.
Search for an equivalence class C and an input letter a such that with a as input, the states in C make transitions to states in k>1 different equivalence classes.
Partition C into k classes accordingly.
Repeat until unable to find a class to partition.
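The refinement loop described above can be sketched as follows. The DFA used for the demonstration is hypothetical (it recognizes strings ending in 01 and contains a redundant state `s3`), and the function assumes a complete transition table.

```python
def minimize(states, alphabet, delta, accepting):
    """Return the final equivalence classes of a complete DFA."""
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [block for block in partition if block]
    changed = True
    while changed:
        changed = False
        for block in partition:
            for a in alphabet:
                # Group the states of this class by the class their a-move enters.
                groups = {}
                for s in block:
                    target = next(i for i, b in enumerate(partition)
                                  if delta[(s, a)] in b)
                    groups.setdefault(target, set()).add(s)
                if len(groups) > 1:          # k > 1: split the class
                    partition.remove(block)
                    partition.extend(groups.values())
                    changed = True
                    break
            if changed:
                break                        # restart the scan on the new partition
    return {frozenset(b) for b in partition}


# Hypothetical DFA for "ends in 01"; s3 behaves exactly like s0.
DELTA = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s3",
    ("s3", "0"): "s1", ("s3", "1"): "s3",
}
classes = minimize({"s0", "s1", "s2", "s3"}, "01", DELTA, {"s2"})
```

Each resulting class becomes one state of the minimal DFA; here `s0` and `s3` end up merged.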
![Page 18: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/18.jpg)
- 18 -
Example (cont.)
![Page 19: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/19.jpg)
- 19 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
![Page 20: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/20.jpg)
- 20 -
Regular Grammar
Later definitions build on earlier ones.
Nothing is defined in terms of itself (no recursion).
Regular grammar for numeric literals in Pascal:
digit -> 0|1|2|...|8|9
unsigned_integer -> digit digit*
unsigned_number -> unsigned_integer (( . unsigned_integer) | ε ) (( e (+ | - | ε ) unsigned_integer ) | ε )
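Because later definitions only refer to earlier ones, the grammar can be expanded into a single regular expression by substitution. The sketch below does this with Python's `re` module; the variable and function names are illustrative, and only a lowercase `e` is allowed for the exponent, as in the slide.

```python
import re

# Each definition is built from the ones before it, mirroring the grammar.
DIGIT = r"[0-9]"
UNSIGNED_INTEGER = rf"{DIGIT}{DIGIT}*"
UNSIGNED_NUMBER = (
    rf"{UNSIGNED_INTEGER}"
    rf"(\.{UNSIGNED_INTEGER})?"        # optional fraction
    rf"(e[+-]?{UNSIGNED_INTEGER})?"    # optional exponent
)
NUMBER = re.compile(UNSIGNED_NUMBER)


def is_unsigned_number(text):
    """True if the whole string is an unsigned_number."""
    return NUMBER.fullmatch(text) is not None
```

For example, `"6.02e+23"` is accepted, while `".5"` and `"1."` are rejected because a fraction needs digits on both sides of the point.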
![Page 21: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/21.jpg)
- 21 -
Languages and Automata in Programming Languages
Regular languages
» Recognized (accepted) by finite automata
» Useful for tokenizing program text (lexical analysis)
Context-free languages
» Recognized (accepted) by pushdown automata
» Useful for parsing the syntax of a program
![Page 22: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/22.jpg)
- 22 -
Important Theorems
A language is regular iff a regular expression describes it.
A language is regular iff a finite automaton recognizes it.
DFAs and NFAs are equally powerful.
![Page 23: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/23.jpg)
- 23 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
![Page 24: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/24.jpg)
- 24 -
Context-free Grammars
Context-free grammars are defined by substitution rules
Example sentences:
Big Jim ate green cheese
green Jim ate green cheese
Jim ate cheese
Cheese ate Jim
Rules:
P -> N
P -> AP
S -> PVP
A -> big | green
N -> cheese | Jim
V -> ate
![Page 25: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/25.jpg)
- 25 -
Context-free Grammars
Context-free grammars are used to formally describe the syntax of programming languages.
Every syntactically correct program is derived using the context-free grammar of the language.
Parsing a program involves tracing such derivation, given the context-free grammar and the program.
![Page 26: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/26.jpg)
- 26 -
Context-free Grammars
A context-free grammar consists of:
V: a finite set of variables
Σ: a finite set of terminals
R: a finite set of rules of the form variable -> {variable, terminal}*
S: the start variable
![Page 27: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/27.jpg)
- 27 -
Pushdown Automata (PDA)
A pushdown automaton consists of:
Q: a set of states
Σ: input alphabet (of terminals)
Γ: stack alphabet
δ: a set of transition rules, Q × Σε × Γε -> P(Q × Γε)
(currentState, inputSymbol, headOfStack -> newState, pushSymbolOnStack)
q0: the start state
F: the set of accept states (subset of Q)
Deterministic: at most one move is possible from any configuration.
![Page 28: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/28.jpg)
- 28 -
How does a PDA accept?
By final state:
» Consume all the input while
» Reaching a final state
By empty stack:
» Consume all the input while
» Having an empty stack
» Set of final states is irrelevant
![Page 29: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/29.jpg)
- 29 -
Example of a PDA
[State diagram: states q1, q2, q3, q4; transition labels: ε, ε -> $; 0, ε -> 0; 1, 0 -> ε; 1, 0 -> ε; ε, $ -> ε]
Notation: a, b->c: when PDA reads “a” from input, it replaces “b” at the top of stack with “c”.
What does this PDA accept?
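To experiment with the question, the PDA can be simulated with an explicit stack. This sketch rests on assumptions the slide text does not state: the edges are read in the usual textbook way (q1 to q2 pushes `$`, q2 loops pushing 0s, a 1 pops a 0 on the way to and inside q3, and q3 moves to q4 on `$`), and both q1 and q4 are taken as accept states.

```python
def pda_accepts(text):
    """Deterministic simulation of the four-state PDA (edge reading assumed)."""
    stack = ["$"]              # q1 -> q2 on ε, ε -> $ : push the bottom marker
    state = "q2"
    for symbol in text:
        if state == "q2" and symbol == "0":
            stack.append("0")  # 0, ε -> 0
        elif state in ("q2", "q3") and symbol == "1" and stack[-1] == "0":
            stack.pop()        # 1, 0 -> ε
            state = "q3"
        else:
            return False       # no transition applies
    # ε, $ -> ε reaches q4 only if every pushed 0 was popped;
    # the empty input is accepted because q1 is assumed to be accepting.
    return stack == ["$"] and (state == "q3" or text == "")
```

Running it on inputs such as `"0011"`, `"001"`, and `"0101"` shows which shapes of string survive.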
![Page 30: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/30.jpg)
- 30 -
Important Theorems
A language is context-free iff a pushdown automaton recognizes it.
Non-deterministic PDAs are more powerful than deterministic ones.
![Page 31: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/31.jpg)
- 31 -
Example of Context-free Language That Requires a Non-deterministic PDA
{w w^R | w belongs to {0, 1}*}
where w^R is w written backwards
Idea:
Non-deterministically guess the middle of the input string
![Page 32: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/32.jpg)
- 32 -
The Solution
[State diagram: states q1, q2, q3, q4; transition labels: ε, ε -> $; 0, ε -> 0; 1, ε -> 1; ε, ε -> ε; 1, 1 -> ε; 0, 0 -> ε; ε, $ -> ε]
![Page 33: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/33.jpg)
- 33 -
Derivations and Parse Trees
Nested constructs require recursion, i.e. context-free grammars
CFG for arithmetic expressions
expression -> identifier | number | - expression | (expression) | expression operator expression
operator -> + | - | * | /
![Page 34: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/34.jpg)
- 34 -
Parse Tree for Slope*x + Intercept
Is this the only parse tree for this expression and grammar?
![Page 35: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/35.jpg)
- 35 -
A Better Expression Grammar
1. expression -> term | expression add_op term
2. term -> factor | term mult_op factor
3. factor -> identifier | number | - factor | (expression)
4. add_op -> + | -
5. mult_op -> * | /
A good grammar reflects the internal structure of programs.
This grammar is unambiguous and captures (HOW?):
- operator precedence (*, / bind tighter than +, -)
- associativity (operators group left to right)
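A small evaluator makes both properties visible. The sketch below is an assumption-laden illustration: it supports numbers only (no identifiers), and it realizes the left-recursive rules `expression -> expression add_op term` and `term -> term mult_op factor` with loops, which yields the same left-to-right grouping.

```python
import re


def tokenize(text):
    """Split into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expression():              # expression -> term | expression add_op term
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value = value + term()
            else:
                value = value - term()
        return value

    def term():                    # term -> factor | term mult_op factor
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value = value * factor()
            else:
                value = value / factor()
        return value

    def factor():                  # factor -> number | - factor | (expression)
        token = advance()
        if token == "-":
            return -factor()
        if token == "(":
            value = expression()
            advance()              # closing parenthesis, not validated in this sketch
            return value
        return int(token)

    return expression()
```

Precedence falls out of the nesting (a `term` is finished before `expression` sees `+`), and associativity falls out of accumulating into `value` from the left.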
![Page 36: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/36.jpg)
- 36 -
And Better Parse Trees...
3 + 4 * 5
10 - 4 - 3
![Page 37: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/37.jpg)
- 37 -
Syntax-directed Compilation
Parser calls scanner to obtain tokens.
Assembles tokens into parse tree.
Passes tree to later phases of compilation.
Scanner: deterministic finite automaton.
Parser: pushdown automaton.
Scanners and parsers can be generated automatically from regular expressions and CFGs (e.g. lex/yacc).
![Page 38: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/38.jpg)
- 38 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
![Page 39: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/39.jpg)
- 39 -
Scanning
Accept the longest possible token in each invocation of the scanner.
Implementation:
» Capture the finite automaton using either case (switch) statements or a table and driver.
![Page 40: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/40.jpg)
- 40 -
Scanner for Pascal
![Page 41: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/41.jpg)
- 41 -
Scanner for Pascal(case Statements)
![Page 42: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/42.jpg)
- 42 -
Scanner (Table&driver)
![Page 43: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/43.jpg)
- 43 -
Scanner Generators
Start with a regular expression.
Construct an NFA from it.
Use the subset ("set of subsets") construction to obtain an equivalent DFA.
Construct the minimal equivalent DFA.
![Page 44: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/44.jpg)
- 44 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
» Top-down parsing» Bottom-up Parsing» Comparison
![Page 45: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/45.jpg)
- 45 -
Parsing approaches
Parsing in general has O(n^3) cost.
Need classes of grammars that can be parsed in linear time:
» Top-down, or predictive parsing, or recursive descent parsing, or LL parsing (Left-to-right, Left-most derivation)
» Bottom-up, or shift-reduce parsing, or LR parsing (Left-to-right, Right-most derivation)
![Page 46: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/46.jpg)
- 46 -
A Simple Grammar for a Comma-separated List of Identifiers
id_list -> id id_list_tail
id_list_tail -> , id id_list_tail
id_list_tail -> ;
String to be parsed: A, B, C;
![Page 47: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/47.jpg)
- 47 -
Top-down/bottom-up Parsing
![Page 48: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/48.jpg)
- 48 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
» Top-down parsing» Bottom-up Parsing» Comparison
![Page 49: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/49.jpg)
- 49 -
Top-down Parsing
Predicts a derivation.
Matches non-terminal against token observed in input.
![Page 50: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/50.jpg)
- 50 -
LL(1) Grammar
A grammar for which a top-down deterministic parser can be produced with one token of look-ahead.
LL(1) grammar:
» For a given non-terminal, the lookahead symbol uniquely determines the production to apply
» Top-down parsing = predictive parsing
» Driven by a predictive parsing table: non-terminals × terminals -> productions
![Page 51: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/51.jpg)
- 51 -
From Last Time: Parsing with Table
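| Partly-derived string | Lookahead | Parsed part | Unparsed part |
|---|---|---|---|
| ES’ | ( | | (1+2+(3+4))+5 |
| (S)S’ | 1 | ( | 1+2+(3+4))+5 |
| (ES’)S’ | 1 | ( | 1+2+(3+4))+5 |
| (1S’)S’ | + | (1 | +2+(3+4))+5 |
| (1+ES’)S’ | 2 | (1+ | 2+(3+4))+5 |
| (1+2S’)S’ | + | (1+2 | +(3+4))+5 |

S -> ES’
S’ -> ε | +S
E -> num | (S)

| | num | + | ( | ) | $ |
|---|---|---|---|---|---|
| S | -> ES’ | | -> ES’ | | |
| S’ | | -> +S | | -> ε | -> ε |
| E | -> num | | -> (S) | | |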
![Page 52: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/52.jpg)
- 52 -
How to Construct Parsing Tables?
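Needed: an algorithm for automatically generating a predictive parse table from a grammar.

S -> ES’
S’ -> ε | +S
E -> number | (S)

??

| | num | + | ( | ) | $ |
|---|---|---|---|---|---|
| S | -> ES’ | | -> ES’ | | |
| S’ | | -> +S | | -> ε | -> ε |
| E | -> num | | -> (S) | | |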
![Page 53: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/53.jpg)
- 53 -
Constructing Parse Tables
Can construct a predictive parser if:
» For every non-terminal, every lookahead symbol can be handled by at most 1 production
FIRST(γ) for an arbitrary string γ of terminals and non-terminals is:
» Set of symbols that might begin the fully expanded version of γ
FOLLOW(X) for a non-terminal X is:
» Set of symbols that might follow the derivation of X in the input stream
[Diagram: the expansion of X with its FIRST and FOLLOW positions marked]
![Page 54: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/54.jpg)
- 54 -
Parse Table Entries
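Consider a production X -> γ.
Add -> γ to the X row for each symbol in FIRST(γ).
If γ can derive ε (γ is nullable), add -> γ for each symbol in FOLLOW(X).
Grammar is LL(1) if there are no conflicting entries.

| | num | + | ( | ) | $ |
|---|---|---|---|---|---|
| S | -> ES’ | | -> ES’ | | |
| S’ | | -> +S | | -> ε | -> ε |
| E | -> num | | -> (S) | | |

S -> ES’
S’ -> ε | +S
E -> number | (S)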
![Page 55: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/55.jpg)
- 55 -
Computing Nullable
X is nullable if it can derive the empty string:
» If it derives ε directly (X -> ε)
» If it has a production X -> YZ ... where all RHS symbols (Y, Z) are nullable
Algorithm: assume all non-terminals are non-nullable, apply rules repeatedly until no change.

S -> ES’
S’ -> ε | +S
E -> number | (S)

Only S’ is nullable.
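The fixed-point algorithm is short enough to write out. In this sketch the grammar is encoded as a dictionary from non-terminal to a list of right-hand sides, with the empty list standing for ε; that encoding and the names are choices made here, not part of the slides.

```python
# S -> ES' ; S' -> ε | +S ; E -> num | (S)
GRAMMAR = {
    "S":  [["E", "S'"]],
    "S'": [[], ["+", "S"]],          # [] encodes the ε production
    "E":  [["num"], ["(", "S", ")"]],
}


def compute_nullable(grammar):
    """Start with nothing nullable and apply the rules until no change."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body qualifies when every symbol in it is already nullable;
            # the empty body qualifies trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable
```

For the grammar above the result is `{"S'"}`, matching the slide.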
![Page 56: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/56.jpg)
- 56 -
Computing FIRST
Determining FIRST(X):
1. if X is a terminal, then add X to FIRST(X)
2. if X -> ε then add ε to FIRST(X)
3. if X is a nonterminal and X -> Y1Y2...Yk then a is in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for j = 1...i-1 (i.e., it's possible to have an empty prefix Y1 ... Yi-1)
4. if ε is in FIRST(Y1Y2...Yk) then ε is in FIRST(X)
![Page 57: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/57.jpg)
- 57 -
FIRST Example
S -> ES’
S’ -> ε | +S
E -> number | (S)

Apply rule 1: FIRST(num) = {num}, FIRST(+) = {+}, etc.
Apply rule 2: FIRST(S’) = {ε}
Apply rule 3: FIRST(S) = FIRST(E) = {}
FIRST(S’) = FIRST(‘+’) + {ε} = {ε, +}
FIRST(E) = FIRST(num) + FIRST(‘(‘) = {num, ( }
Rule 3 again: FIRST(S) = FIRST(E) = {num, ( }
FIRST(S’) = {ε, +}
FIRST(E) = {num, ( }
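The same iteration can be expressed as a fixed-point computation. This is a sketch under the same illustrative encoding as before (dictionary of right-hand sides, empty list for ε); the string `"ε"` is used as an ordinary set member, and any symbol that is not a key of the grammar is treated as a terminal.

```python
EPS = "ε"
GRAMMAR = {
    "S":  [["E", "S'"]],
    "S'": [[], ["+", "S"]],
    "E":  [["num"], ["(", "S", ")"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence Y1...Yk given the current FIRST sets."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # rule 1 for terminals
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result            # Yi cannot vanish, so stop here (rule 3)
    result.add(EPS)                  # every symbol can vanish (rules 2 and 4)
    return result


def compute_first(grammar):
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

The first pass leaves FIRST(S) empty because FIRST(E) is still empty, just as in the worked example; the second pass fills it in.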
![Page 58: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/58.jpg)
- 58 -
Computing FOLLOW
Determining FOLLOW(X):
1. if S is the start symbol then $ is in FOLLOW(S)
2. if A -> αBβ then add all FIRST(β) != ε to FOLLOW(B)
3. if A -> αB, or A -> αBβ and ε is in FIRST(β), then add FOLLOW(A) to FOLLOW(B)
![Page 59: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/59.jpg)
- 59 -
FOLLOW Example
S -> ES’
S’ -> ε | +S
E -> number | (S)

FIRST(S) = {num, ( }
FIRST(S’) = {ε, + }
FIRST(E) = { num, ( }

Apply rule 1: FOL(S) = {$}
Apply rule 2:
S -> ES’: FOL(E) += {FIRST(S’) - ε} = {+}
S’ -> ε | +S: -
E -> num | (S): FOL(S) += {FIRST(‘)’) - ε} = {$, ) }
Apply rule 3:
S -> ES’: FOL(E) += FOL(S) = {+, $, )} (because S’ is nullable)
FOL(S’) += FOL(S) = {$, )}
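The three FOLLOW rules can likewise be iterated to a fixed point. In this sketch the FIRST sets are copied in from the worked example rather than recomputed, the grammar encoding (dictionary of right-hand sides, empty list for ε) is an illustrative choice, and `"$"` marks end of input.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "S":  [["E", "S'"]],
    "S'": [[], ["+", "S"]],
    "E":  [["num"], ["(", "S", ")"]],
}
# FIRST sets taken from the worked example.
FIRST = {"S": {"num", "("}, "S'": {EPS, "+"}, "E": {"num", "("}}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})    # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {head: set() for head in grammar}
    follow[start].add(END)                           # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                     # only non-terminals have FOLLOW
                    rest = first_of_string(body[i + 1:])
                    new = rest - {EPS}               # rule 2
                    if EPS in rest:
                        new |= follow[head]          # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "S")
```

Rule 3 covers both of its cases at once here: when B is last in the body, the rest is empty and its FIRST is just ε.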
![Page 60: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/60.jpg)
- 60 -
Putting it all Together
FOLLOW(S) = { $, ) }
FOLLOW(S’) = { $, ) }
FOLLOW(E) = { +, ), $ }
FIRST(S) = {num, ( }
FIRST(S’) = {ε, + }
FIRST(E) = { num, ( }

Consider a production X -> γ.
Add -> γ to the X row for each symbol in FIRST(γ).
If γ can derive ε (γ is nullable), add -> γ for each symbol in FOLLOW(X).

| | num | + | ( | ) | $ |
|---|---|---|---|---|---|
| S | -> ES’ | | -> ES’ | | |
| S’ | | -> +S | | -> ε | -> ε |
| E | -> num | | -> (S) | | |

S -> ES’
S’ -> ε | +S
E -> number | (S)
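The table-filling rule and a table-driven parser fit in a few lines. This is a sketch, not a full parser generator: FIRST and FOLLOW are copied from the slide, productions are stored as `(head, body)` pairs with an empty body for ε, and the input is assumed to be already tokenized, with every number replaced by the token `"num"`.

```python
EPS, END = "ε", "$"
PRODUCTIONS = [
    ("S", ["E", "S'"]),
    ("S'", []),                      # S' -> ε
    ("S'", ["+", "S"]),
    ("E", ["num"]),
    ("E", ["(", "S", ")"]),
]
FIRST = {"S": {"num", "("}, "S'": {EPS, "+"}, "E": {"num", "("}}
FOLLOW = {"S": {END, ")"}, "S'": {END, ")"}, "E": {"+", ")", END}}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    """Fill table[(non-terminal, lookahead)] = production body."""
    table = {}
    for head, body in PRODUCTIONS:
        lookaheads = first_of_string(body)
        if EPS in lookaheads:                        # body is nullable
            lookaheads = (lookaheads - {EPS}) | FOLLOW[head]
        for terminal in lookaheads:
            if (head, terminal) in table:
                raise ValueError(f"conflict at ({head}, {terminal}): not LL(1)")
            table[(head, terminal)] = body
    return table


def parse(tokens, table, start="S"):
    """Predictive parse with an explicit stack; returns True on success."""
    tokens = list(tokens) + [END]
    stack, pos = [END, start], 0
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in FIRST:                             # non-terminal: expand it
            body = table.get((top, look))
            if body is None:
                return False
            stack.extend(reversed(body))
        elif top == look:                            # terminal: match it
            pos += 1
        else:
            return False
    return pos == len(tokens)


TABLE = build_table()
TOKENS = ["(", "num", "+", "num", "+", "(", "num", "+", "num", ")", ")", "+", "num"]
```

`TOKENS` is the tokenized form of `(1+2+(3+4))+5`; `build_table` raising an error is exactly the "conflicting entries" case that makes a grammar non-LL(1).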
![Page 61: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/61.jpg)
- 61 -
Ambiguous Grammars
Construction of a predictive parse table for an ambiguous grammar results in conflicts in the table (i.e. 2 or more productions to apply in the same cell).

S -> S + S | S * S | num

FIRST(S+S) = FIRST(S*S) = FIRST(num) = { num }
![Page 62: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/62.jpg)
- 62 -
Class Problem
E -> E + T | T
T -> T * F | F
F -> (E) | num | ε

1. Compute FIRST and FOLLOW sets for this grammar
2. Compute parse table entries
![Page 63: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/63.jpg)
- 63 -
Top-Down Parsing Up to This Point
Now we know:
» How to build a parsing table for an LL(1) grammar (i.e. FIRST/FOLLOW)
» How to construct a recursive-descent parser from the parsing table
» Call tree = parse tree
Open question: can we generate the AST?
![Page 64: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/64.jpg)
- 64 -
Creating the Abstract Syntax Tree
Some class definitions to assist with AST construction:

    class Expr {}
    class Add extends Expr {
        Expr left, right;
        Add(Expr L, Expr R) { left = L; right = R; }
    }
    class Num extends Expr {
        int value;
        Num(int v) { value = v; }
    }

Class hierarchy: Num and Add both extend Expr.
![Page 65: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/65.jpg)
- 65 -
Creating the AST
Input: (1 + 2 + (3 + 4)) + 5
[Diagrams: the AST (nested + nodes over the leaves 1, 2, 3, 4, 5) next to the parse tree built from S, E, +, and parentheses]
• We got the parse tree from the call tree
• Just add code to each parsing routine to create the appropriate nodes
• Works because parse tree and call tree are the same shape, and AST is just a compressed form of the parse tree
![Page 66: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/66.jpg)
- 66 -
AST Creation: parse_E
    Expr parse_E() {
        switch (token) {          // remember, token is the lookahead token
            case num: {           // E -> number
                Expr result = new Num(token.value);
                token = input.read();
                return result;
            }
            case '(': {           // E -> (S)
                token = input.read();
                Expr result = parse_S();
                if (token != ')') ParseError();
                token = input.read();
                return result;
            }
            default: ParseError();
        }
    }

S -> ES’
S’ -> ε | +S
E -> number | (S)
![Page 67: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/67.jpg)
- 67 -
AST Creation: parse_S
    Expr parse_S() {
        switch (token) {
            case num:
            case '(': {           // S -> ES’
                Expr left = parse_E();
                Expr right = parse_Sprime();   // routine for S’ (not shown on this slide)
                if (right == null) return left;
                else return new Add(left, right);
            }
            default: ParseError();
        }
    }

S -> ES’
S’ -> ε | +S
E -> number | (S)
![Page 68: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/68.jpg)
- 68 -
Grammars
Have been using a grammar for the language “sums with parentheses”: (1+2+(3+4))+5
Started with a simple, right-associative grammar:
» S -> E + S | E
» E -> num | (S)
Transformed it to an LL(1) grammar by left factoring:
» S -> ES’
» S’ -> ε | +S
» E -> num | (S)
What if we start with a left-associative grammar?
» S -> S + E | E
» E -> num | (S)
![Page 69: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/69.jpg)
- 69 -
Reminder: Left vs Right Associativity
Consider a simpler string on a simpler grammar: “1 + 2 + 3 + 4”

Right recursion : right associative
S -> E + S
S -> E
E -> num
[Tree: 1 + (2 + (3 + 4))]

Left recursion : left associative
S -> S + E
S -> E
E -> num
[Tree: ((1 + 2) + 3) + 4]
![Page 70: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/70.jpg)
- 70 -
Left Recursion
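S -> S + E
S -> E
E -> num

Input: “1 + 2 + 3 + 4”

| Derived string | Lookahead | Input (read/unread) |
|---|---|---|
| S | 1 | 1+2+3+4 |
| S+E | 1 | 1+2+3+4 |
| S+E+E | 1 | 1+2+3+4 |
| S+E+E+E | 1 | 1+2+3+4 |
| E+E+E+E | 1 | 1+2+3+4 |
| 1+E+E+E | 2 | 1+2+3+4 |
| 1+2+E+E | 3 | 1+2+3+4 |
| 1+2+3+E | 4 | 1+2+3+4 |
| 1+2+3+4 | $ | 1+2+3+4 |

Is this right? If not, what’s the problem?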
![Page 71: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/71.jpg)
- 71 -
Left-Recursive Grammars
Left-recursive grammars don’t work with top-down parsers: we don’t know when to stop the recursion.
Left-recursive grammars are NOT LL(1)!
» S -> Sα
» S -> β
In the parse table:
» Both productions will appear in the predictive table at row S in all the columns corresponding to FIRST(β)
![Page 72: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/72.jpg)
- 72 -
Eliminate Left Recursion
Replace
» X -> Xα1 | ... | Xαm
» X -> β1 | ... | βn
With
» X -> β1X’ | ... | βnX’
» X’ -> α1X’ | ... | αmX’ | ε
See complete algorithm in Dragon book
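The replacement rule for immediate left recursion can be sketched as a small function. The encoding (lists of symbols, empty list for ε) and the function name are assumptions for illustration; indirect left recursion, which the complete algorithm also handles, is not covered.

```python
def eliminate_left_recursion(head, bodies):
    """Rewrite X -> Xα | β into X -> βX' and X' -> αX' | ε."""
    new_head = head + "'"
    # Bodies that start with the head itself supply the α parts.
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}        # nothing to do
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],  # [] is ε
    }


result = eliminate_left_recursion("S", [["S", "+", "E"], ["E"]])
```

Applied to `S -> S + E | E`, this yields `S -> E S'` and `S' -> + E S' | ε`.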
![Page 73: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/73.jpg)
- 73 -
Class Problem
Transform the following grammar to eliminate left recursion:

E -> E + T | T
T -> T * F | F
F -> (E) | num
![Page 74: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/74.jpg)
- 74 -
Creating an LL(1) Grammar
Start with a left-recursive grammar:
S -> S + E
S -> E
» and apply the left-recursion elimination algorithm:
S -> ES’
S’ -> +ES’ | ε

Start with a right-recursive grammar:
S -> E + S
S -> E
» and apply left factoring to eliminate common prefixes:
S -> ES’
S’ -> +S | ε
![Page 75: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/75.jpg)
- 75 -
Top-Down Parsing Summary
Language grammar
-> (left-recursion elimination, left factoring) -> LL(1) grammar
-> (FIRST, FOLLOW) -> predictive parsing table
-> recursive-descent parser
-> parser with AST generation
![Page 76: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/76.jpg)
- 76 -
Outline
DFA & NFA Regular expression Regular languages Context free languages &PDA Scanner Parser
» Top-down parsing» Bottom-up Parsing» Comparison
![Page 77: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/77.jpg)
- 77 -
New Topic: Bottom-Up Parsing
A more powerful parsing technology
LR grammars: more expressive than LL
» Construct right-most derivation of program
» Handle left-recursive grammars; virtually all programming language grammars are left-recursive
» Easier to express syntax
Shift-reduce parsers
» Parsers for LR grammars
» Automatic parser generators (yacc, bison)
![Page 78: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/78.jpg)
- 78 -
Bottom-Up Parsing (2)
Right-most derivation, run backward:
» Start with the tokens
» End with the start symbol
» Match substring on RHS of production, replace by LHS

S -> S + E | E
E -> num | (S)

(1+2+(3+4))+5
(E+2+(3+4))+5
(S+2+(3+4))+5
(S+E+(3+4))+5
(S+(3+4))+5
(S+(E+4))+5
(S+(S+4))+5
(S+(S+E))+5
(S+(S))+5
(S+E)+5
(S)+5
E+5
S+E
S
![Page 79: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/79.jpg)
- 79 -
Shift-Reduce Parsing
Parsing actions: A sequence of shift and reduce operations
Parser state: A stack of terminals and non-terminals (grows to the right)
Current derivation step = stack + input

| Derivation step | Stack | Unconsumed input |
| --- | --- | --- |
| (1+2+(3+4))+5 | | (1+2+(3+4))+5 |
| (E+2+(3+4))+5 | (E | +2+(3+4))+5 |
| (S+2+(3+4))+5 | (S | +2+(3+4))+5 |
| (S+E+(3+4))+5 | (S+E | +(3+4))+5 |
| ... | | |
![Page 80: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/80.jpg)
- 80 -
Shift-Reduce Actions
Parsing is a sequence of shifts and reduces
Shift: move look-ahead token to stack

| Stack | Input | Action |
| --- | --- | --- |
| ( | 1+2+(3+4))+5 | shift 1 |
| (1 | +2+(3+4))+5 | |

Reduce: replace symbols γ from top of stack with non-terminal symbol X corresponding to the production X → γ (i.e., pop γ, push X)

| Stack | Input | Action |
| --- | --- | --- |
| (S+E | +(3+4))+5 | reduce S → S+E |
| (S | +(3+4))+5 | |
![Page 81: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/81.jpg)
- 81 -
Shift-Reduce Parsing
Grammar: S → S + E | E, E → num | (S)

| Derivation | Stack | Input stream | Action |
| --- | --- | --- | --- |
| (1+2+(3+4))+5 | | (1+2+(3+4))+5 | shift |
| (1+2+(3+4))+5 | ( | 1+2+(3+4))+5 | shift |
| (1+2+(3+4))+5 | (1 | +2+(3+4))+5 | reduce E → num |
| (E+2+(3+4))+5 | (E | +2+(3+4))+5 | reduce S → E |
| (S+2+(3+4))+5 | (S | +2+(3+4))+5 | shift |
| (S+2+(3+4))+5 | (S+ | 2+(3+4))+5 | shift |
| (S+2+(3+4))+5 | (S+2 | +(3+4))+5 | reduce E → num |
| (S+E+(3+4))+5 | (S+E | +(3+4))+5 | reduce S → S+E |
| (S+(3+4))+5 | (S | +(3+4))+5 | shift |
| (S+(3+4))+5 | (S+ | (3+4))+5 | shift |
| (S+(3+4))+5 | (S+( | 3+4))+5 | shift |
| (S+(3+4))+5 | (S+(3 | +4))+5 | reduce E → num |
| ... | | | |
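The trace above can be reproduced with a few lines of Python. This is only a sketch for this one grammar: the function names `tokenize` and `shift_reduce` are made up for illustration, numbers are assumed to be single digits, and the "reduce whenever a rule matches the top of the stack" policy is an ad hoc choice that happens to work here. A general parser needs the table-driven machinery introduced on the following slides.

```python
def tokenize(text):
    # Assumption: single-digit numbers only, which is all the slide example needs.
    return ["num" if ch.isdigit() else ch for ch in text if not ch.isspace()]


def shift_reduce(tokens):
    """Return (accepted, actions) for S -> S+E | E, E -> num | (S)."""
    stack, actions = [], []
    rest = list(tokens)

    def try_reduce():
        # Rules are tried in a fixed order; S -> S+E must be tried before S -> E.
        if stack[-1:] == ["num"]:
            stack[-1:] = ["E"]
            actions.append("reduce E -> num")
        elif stack[-3:] == ["(", "S", ")"]:
            stack[-3:] = ["E"]
            actions.append("reduce E -> (S)")
        elif stack[-3:] == ["S", "+", "E"]:
            stack[-3:] = ["S"]
            actions.append("reduce S -> S+E")
        elif stack[-1:] == ["E"]:
            stack[-1:] = ["S"]
            actions.append("reduce S -> E")
        else:
            return False
        return True

    while True:
        while try_reduce():      # reduce as long as some rule matches the stack top
            pass
        if not rest:
            break
        stack.append(rest.pop(0))  # shift the next token
        actions.append("shift")
    return stack == ["S"], actions


ok, actions = shift_reduce(tokenize("(1+2+(3+4))+5"))
print(ok)
for action in actions[:12]:
    print(action)
```

The order of the tests inside `try_reduce` matters: with `S + E` on top of the stack, both S → S+E and S → E match, which is exactly the "can reduce in different ways" issue raised next.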
![Page 82: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/82.jpg)
- 82 -
Potential Problems
How do we know which action to take: whether to shift or reduce, and which production to apply?
Issues
» Sometimes can reduce but should not
» Sometimes can reduce in different ways
![Page 83: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/83.jpg)
- 83 -
Action Selection Problem
Given stack σ and look-ahead symbol b, should the parser:
» Shift b onto the stack, making it σb?
» Reduce X → γ, assuming that the stack has the form σ = αγ, making it αX?
If the stack has the form αγ, the parser should apply reduction X → γ (or shift) depending on the stack prefix α
» α is different for different possible reductions, since the γ’s have different lengths
![Page 84: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/84.jpg)
- 84 -
LR Parsing Engine
Basic mechanism
» Use a set of parser states
» Use stack with alternating symbols and states, e.g., 1 ( 6 S 10 + 5 (the numbers are states)
» Use parsing table to determine what action to apply (shift/reduce) and to determine the next state
The parser actions can be precisely determined from the table
![Page 85: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/85.jpg)
- 85 -
LR Parsing Table
Algorithm: look at entry for current state S and input terminal C
» If Table[S,C] = s(S’) then shift: push(C), push(S’)
» If Table[S,C] = X → α then reduce: pop(2*|α|), S’ = top(), push(X), push(Table[S’,X])
Table layout: one row per state
» Action table (one column per terminal): next action and next state
» Goto table (one column per non-terminal): next state
![Page 86: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/86.jpg)
- 86 -
LR Parsing Table Example
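Columns ( ) id , $ are input terminals; columns S and L are non-terminals.

| State | ( | ) | id | , | $ | S | L |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | s3 | | s2 | | | g4 | |
| 2 | S→id | S→id | S→id | S→id | S→id | | |
| 3 | s3 | | s2 | | | g7 | g5 |
| 4 | | | | | accept | | |
| 5 | | s6 | | s8 | | | |
| 6 | S→(L) | S→(L) | S→(L) | S→(L) | S→(L) | | |
| 7 | L→S | L→S | L→S | L→S | L→S | | |
| 8 | s3 | | s2 | | | g9 | |
| 9 | L→L,S | L→L,S | L→L,S | L→L,S | L→L,S | | |
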
We want to derive this in an algorithmic fashion
![Page 87: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/87.jpg)
- 87 -
Parsing Example ((a),b)
Grammar: S → (L) | id, L → S | L,S

| Derivation | Stack | Input | Action |
| --- | --- | --- | --- |
| ((a),b) | 1 | ((a),b) | shift, goto 3 |
| ((a),b) | 1(3 | (a),b) | shift, goto 3 |
| ((a),b) | 1(3(3 | a),b) | shift, goto 2 |
| ((a),b) | 1(3(3a2 | ),b) | reduce S→id |
| ((S),b) | 1(3(3S7 | ),b) | reduce L→S |
| ((L),b) | 1(3(3L5 | ),b) | shift, goto 6 |
| ((L),b) | 1(3(3L5)6 | ,b) | reduce S→(L) |
| (S,b) | 1(3S7 | ,b) | reduce L→S |
| (L,b) | 1(3L5 | ,b) | shift, goto 8 |
| (L,b) | 1(3L5,8 | b) | shift, goto 2 |
| (L,b) | 1(3L5,8b2 | ) | reduce S→id |
| (L,S) | 1(3L5,8S9 | ) | reduce L→L,S |
| (L) | 1(3L5 | ) | shift, goto 6 |
| (L) | 1(3L5)6 | | reduce S→(L) |
| S | 1S4 | $ | done |
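The table-driven algorithm can be sketched in Python using the parsing table for the nested-list grammar shown earlier. The encoding is an assumption made for illustration (the names `ACTION`, `GOTO`, `PRODS`, and `lr_parse` do not come from the slides): shift entries are `("s", state)`, reduce entries are `("r", production)`, and the stack is a flat list of alternating states and symbols.

```python
# Productions: name -> (left-hand side, length of right-hand side)
PRODS = {"S->(L)": ("S", 3), "S->id": ("S", 1), "L->S": ("L", 1), "L->L,S": ("L", 3)}
TERMINALS = ["(", ")", "id", ",", "$"]


def reduce_row(prod):
    # In an LR(0) table a reduction fills the whole row of the action table.
    return {t: ("r", prod) for t in TERMINALS}


ACTION = {
    1: {"(": ("s", 3), "id": ("s", 2)},
    2: reduce_row("S->id"),
    3: {"(": ("s", 3), "id": ("s", 2)},
    4: {"$": ("acc", None)},
    5: {")": ("s", 6), ",": ("s", 8)},
    6: reduce_row("S->(L)"),
    7: reduce_row("L->S"),
    8: {"(": ("s", 3), "id": ("s", 2)},
    9: reduce_row("L->L,S"),
}
GOTO = {(1, "S"): 4, (3, "S"): 7, (3, "L"): 5, (8, "S"): 9}


def lr_parse(tokens):
    """Return (accepted, trace) where trace lists the actions taken."""
    stack = [1]                       # states and symbols alternate; a state is on top
    tokens = list(tokens) + ["$"]
    pos, trace = 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        kind, arg = ACTION[state].get(tok, ("err", None))
        if kind == "s":               # shift: push(C), push(S')
            stack += [tok, arg]
            pos += 1
            trace.append(f"shift, goto {arg}")
        elif kind == "r":             # reduce: pop 2*|rhs|, then push X and Table[S', X]
            lhs, n = PRODS[arg]
            del stack[len(stack) - 2 * n:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            trace.append(f"reduce {arg}")
        elif kind == "acc":
            trace.append("done")
            return True, trace
        else:
            return False, trace


accepted, trace = lr_parse(["(", "(", "id", ")", ",", "id", ")"])   # ((a),b)
print(accepted)
print("\n".join(trace))
```

According to the table, shifting an id always leads to state 2, including the shift of b from state 8.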
![Page 88: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/88.jpg)
- 88 -
LR(k) Grammars
LR(k) = Left-to-right scanning, right-most derivation, k lookahead symbols
Main cases
» LR(0), LR(1)
» Some variations: SLR and LALR(1)
Parsers for LR(0) grammars:
» Determine the actions without any lookahead
» Will help us understand shift-reduce parsing
![Page 89: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/89.jpg)
- 89 -
Building LR(0) Parsing Tables
To build the parsing table:
» Define states of the parser
» Build a DFA to describe transitions between states
» Use the DFA to build the parsing table
Each LR(0) state is a set of LR(0) items
» An LR(0) item: X → α . β where X → αβ is a production in the grammar
» The LR(0) items keep track of the progress on all of the possible upcoming productions
» The item X → α . β abstracts the fact that the parser already matched the string α at the top of the stack
![Page 90: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/90.jpg)
- 90 -
Example LR(0) State
An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production
Sub-string before “.” is already on the stack (beginnings of possible γ’s to be reduced)
Sub-string after “.”: what we might see next
Example state, made of two items:
» E → num .
» E → ( . S )
![Page 91: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/91.jpg)
- 91 -
Class Problem
For the production E → num | (S), two items are:
» E → num .
» E → ( . S )
Are there any others? If so, what are they? If not, why?
![Page 92: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/92.jpg)
- 92 -
LR(0) Grammar
Nested lists
» S → (L) | id
» L → S | L,S
Examples
» (a,b,c)
» ((a,b), (c,d), (e,f))
» (a, (b,c,d), ((f,g)))
[Figure: parse tree for (a, (b,c), d)]
![Page 93: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/93.jpg)
- 93 -
Start State and Closure
Start state
» Augment grammar with production: S’ → S $
» Start state of DFA has empty stack: S’ → . S $
Closure of a parser state:
» Start with Closure(S) = S
» Then for each item in S of the form X → α . Y β, add items for all the productions Y → γ to the closure of S: Y → . γ
» Repeat until no new items are added
![Page 94: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/94.jpg)
- 94 -
Closure Example
Grammar: S → (L) | id, L → S | L,S
DFA start state: S’ → . S $
After closure:
» S’ → . S $
» S → . (L)
» S → . id
The closure is the set of possible productions to be reduced next
Added items have the “.” located at the beginning: no symbols for these items are on the stack yet
![Page 95: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/95.jpg)
- 95 -
The Goto Operation
Goto operation = describes transitions between parser states, which are sets of items
Algorithm: for state I and a symbol Y
» Collect every item [X → α . Y β] in I
» Goto(I, Y) = Closure of the set of the corresponding items [X → α Y . β]
Example: for the start state S = { S’ → . S $, S → . (L), S → . id },
Goto(S, ‘(’) = Closure( { S → ( . L) } )
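The closure and goto operations can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: the item encoding `(lhs, rhs, dot)` and the names `GRAMMAR`, `closure`, `goto`, and `show` are assumptions made for illustration.

```python
# Nested-list grammar, augmented with S' -> S $.
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}


def closure(items):
    """Add Y -> . gamma for every non-terminal Y that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


start = closure({("S'", ("S", "$"), 0)})
for item in sorted(show(i) for i in start):
    print(item)
print("--- after goto on '(' ---")
for item in sorted(show(i) for i in goto(start, "(")):
    print(item)
```

Using frozensets for states makes two states with the same items compare equal, which is what the DFA construction needs when it checks whether a state already exists.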
![Page 96: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/96.jpg)
- 96 -
Class Problem
Grammar:
» E’ → E
» E → E + T | T
» T → T * F | F
» F → (E) | id
1. If I = { [E’ → . E] }, then Closure(I) = ??
2. If I = { [E’ → E . ], [E → E . + T] }, then Goto(I,+) = ??
![Page 97: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/97.jpg)
- 97 -
Applying Reduce Actions
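Grammar: S → (L) | id, L → S | L,S
Part of the DFA:
» Start state { S’ → . S $; S → . (L); S → . id }: on id go to { S → id . }, on ( go to the next state listed
» { S → ( . L); L → . S; L → . L,S; S → . (L); S → . id }: on id go to { S → id . }, on ( stay in this state, on L go to { S → (L . ); L → L . , S }, on S go to { L → S . }
States causing reductions (the dot has reached the end!): { S → id . } and { L → S . }
Pop the RHS off the stack, replace it with the LHS X (for production X → β), then rerun the DFA (e.g., on (x))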
![Page 98: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/98.jpg)
- 98 -
Reductions
On reducing X → β with stack αβ
» Pop β off the stack, revealing prefix α and state
» Take a single step in the DFA from the top state
» Push X onto the stack with the new DFA state
Example

| Derivation | Stack | Input | Action |
| --- | --- | --- | --- |
| ((a),b) | 1 ( 3 ( 3 | a),b) | shift, goto 2 |
| ((a),b) | 1 ( 3 ( 3 a 2 | ),b) | reduce S → id |
| ((S),b) | 1 ( 3 ( 3 S 7 | ),b) | reduce L → S |
![Page 99: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/99.jpg)
- 99 -
Full DFA
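Grammar: S → (L) | id, L → S | L,S
States:
» 1: S’ → . S $; S → . (L); S → . id
» 2: S → id .
» 3: S → ( . L); L → . S; L → . L,S; S → . (L); S → . id
» 4: S’ → S . $
» 5: S → (L . ); L → L . , S
» 6: S → (L) .
» 7: L → S .
» 8: L → L , . S; S → . (L); S → . id
» 9: L → L,S .
Transitions:
» From 1: on ( go to 3, on id go to 2, on S go to 4
» From 3: on ( go to 3, on id go to 2, on S go to 7, on L go to 5
» From 5: on ) go to 6, on , go to 8
» From 8: on ( go to 3, on id go to 2, on S go to 9
» From 4: on $ go to the final state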
![Page 100: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/100.jpg)
- 100 -
Building the Parsing Table
States in the table = states in the DFA
For transition S → S’ on terminal C:
» Table[S,C] += Shift(S’)
For transition S → S’ on non-terminal N:
» Table[S,N] += Goto(S’)
If S is a reduction state X → β then:
» Table[S,*] += Reduce(X → β)
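The three table-filling rules can be sketched in Python on top of the closure and goto operations. This is an illustrative sketch under assumptions: the function name `lr0_table`, the grammar encoding, and the 0-based state numbering (which differs from the numbering on the slides) are all made up here. Each table cell holds a list of actions, so a cell with more than one entry is a conflict.

```python
LIST_GRAMMAR = {"S'": [("S", "$")], "S": [("(", "L", ")"), ("id",)],
                "L": [("S",), ("L", ",", "S")]}
SUM_GRAMMAR = {"S'": [("S", "$")], "S": [("E", "+", "S"), ("E",)], "E": [("num",)]}


def lr0_table(grammar, start="S'"):
    """Return (states, table, conflicts); table maps (state, symbol) -> actions."""

    def closure(items):
        result, work = set(items), list(items)
        while work:
            _, rhs, dot = work.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                for prod in grammar[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        work.append(item)
        return frozenset(result)

    def goto(items, sym):
        return closure({(l, r, d + 1) for (l, r, d) in items
                        if d < len(r) and r[d] == sym})

    terminals = {s for prods in grammar.values() for rhs in prods for s in rhs}
    terminals -= set(grammar)
    states = [closure({(start, grammar[start][0], 0)})]
    table, i = {}, 0
    while i < len(states):                 # `states` grows while we scan it
        state = states[i]
        for sym in sorted({r[d] for (_, r, d) in state if d < len(r)}):
            if sym == "$":                 # S' -> S . $ : accept on end of input
                table.setdefault((i, sym), []).append(("accept",))
                continue
            nxt = goto(state, sym)
            if nxt not in states:
                states.append(nxt)
            kind = "goto" if sym in grammar else "shift"
            table.setdefault((i, sym), []).append((kind, states.index(nxt)))
        for lhs, rhs, dot in state:
            if dot == len(rhs):            # reduction state: fill the whole row
                for t in terminals:
                    table.setdefault((i, t), []).append(("reduce", lhs, rhs))
        i += 1
    conflicts = sorted(key for key, acts in table.items() if len(acts) > 1)
    return states, table, conflicts


for name, grammar in [("nested lists", LIST_GRAMMAR), ("right-recursive sum", SUM_GRAMMAR)]:
    states, table, conflicts = lr0_table(grammar)
    print(name, "states:", len(states), "conflicting cells:", conflicts)
```

The nested-list grammar produces a conflict-free table, while the right-recursive sum grammar S → E + S | E ends up with a cell that holds both a shift and a reduce.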
![Page 101: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/101.jpg)
- 101 -
Computed LR Parsing Table
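Columns ( ) id , $ are input terminals; columns S and L are non-terminals. Entries sN are shifts, gN are gotos, and productions are reductions.

| State | ( | ) | id | , | $ | S | L |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | s3 | | s2 | | | g4 | |
| 2 | S→id | S→id | S→id | S→id | S→id | | |
| 3 | s3 | | s2 | | | g7 | g5 |
| 4 | | | | | accept | | |
| 5 | | s6 | | s8 | | | |
| 6 | S→(L) | S→(L) | S→(L) | S→(L) | S→(L) | | |
| 7 | L→S | L→S | L→S | L→S | L→S | | |
| 8 | s3 | | s2 | | | g9 | |
| 9 | L→L,S | L→L,S | L→L,S | L→L,S | L→L,S | | |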
![Page 102: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/102.jpg)
- 102 -
LR(0) Summary
LR(0) parsing recipe:
» Start with LR(0) grammar
» Compute LR(0) states and build DFA: use the closure operation to compute states, and the goto operation to compute transitions
» Build the LR(0) parsing table from the DFA
This can be done automatically
![Page 103: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/103.jpg)
- 103 -
Class Problem
Generate the DFA for the following grammar
» S → E + S | E
» E → num
![Page 104: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/104.jpg)
- 104 -
LR(0) Limitations
An LR(0) machine only works if states with reduce actions have a single reduce action (and no shift action)
» Always reduce regardless of lookahead
With a more complex grammar, construction gives states with shift/reduce or reduce/reduce conflicts
Need to use lookahead to choose
Example states:
» OK: { L → L , S . }
» shift/reduce: { L → L , S . ; S → S . , L }
» reduce/reduce: { L → S , L . ; L → S . }
![Page 105: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/105.jpg)
- 105 -
A Non-LR(0) Grammar
Grammar for addition of numbers
» S → S + E | E
» E → num
Left-associative version is LR(0)
Right-associative version is not LR(0), as you saw with the previous class problem
» S → E + S | E
» E → num
![Page 106: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/106.jpg)
- 106 -
LR(0) Parsing Table
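Grammar: S → E + S | E, E → num
DFA states:
» 1: S’ → . S $; S → . E + S; S → . E; E → . num
» 2: S → E . + S; S → E .
» 3: S → E + . S; S → . E + S; S → . E; E → . num
» 4: E → num .
» 5: S → E + S .
» 6: S’ → S . $
» 7: S’ → S $ .
Transitions: from 1 on E to 2, on num to 4, on S to 6; from 2 on + to 3; from 3 on E to 2, on num to 4, on S to 5; from 6 on $ to 7
Fragment of the parsing table:

| State | num | + | $ | E | S |
| --- | --- | --- | --- | --- | --- |
| 1 | s4 | | | g2 | g6 |
| 2 | S→E | s3 / S→E | S→E | | |

Shift or reduce in state 2?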
![Page 107: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/107.jpg)
- 107 -
Solve Conflict With Lookahead
3 popular techniques for employing lookahead of 1 symbol with bottom-up parsing
» SLR: Simple LR
» LALR: LookAhead LR
» LR(1)
Each has a different means of utilizing the lookahead
» Results in different processing capabilities
![Page 108: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/108.jpg)
- 108 -
SLR Parsing
SLR Parsing = Easy extension of LR(0)
» For each reduction X → β, look at next symbol C
» Apply reduction only if C is in FOLLOW(X)
SLR parsing table eliminates some conflicts
» Same as LR(0) table except reduction rows
» Adds reductions X → β only in the columns of symbols in FOLLOW(X)
Grammar: S → E + S | E, E → num
Example: FOLLOW(S) = {$}

| State | num | + | $ | E | S |
| --- | --- | --- | --- | --- | --- |
| 1 | s4 | | | g2 | g6 |
| 2 | | s3 | S→E | | |
![Page 109: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/109.jpg)
- 109 -
SLR Parsing Table
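Reductions do not fill entire rows as before
Otherwise, same as LR(0)
Grammar: S → E + S | E, E → num

| State | num | + | $ | E | S |
| --- | --- | --- | --- | --- | --- |
| 1 | s4 | | | g2 | g6 |
| 2 | | s3 | S→E | | |
| 3 | s4 | | | g2 | g5 |
| 4 | | E→num | E→num | | |
| 5 | | | S→E+S | | |
| 6 | | | s7 | | |
| 7 | | | accept | | |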
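The FOLLOW sets that decide where SLR places its reductions can be computed with a small fixed-point iteration. The sketch below is an illustration under assumptions: it handles only grammars without ε-productions (true for the sum grammar), and the names `first_sets` and `follow_sets` are chosen here, not taken from the slides.

```python
GRAMMAR = {"S'": [("S", "$")], "S": [("E", "+", "S"), ("E",)], "E": [("num",)]}


def first_sets(grammar):
    """FIRST of each non-terminal; assumes no epsilon-productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar):
    """FOLLOW of each non-terminal; assumes no epsilon-productions."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    if i + 1 < len(rhs):          # something follows sym in this rule
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:                         # sym ends the rule: inherit FOLLOW(lhs)
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR)
print("FOLLOW(S) =", sorted(follow["S"]))
print("FOLLOW(E) =", sorted(follow["E"]))
# In state 2 = { S -> E . + S ; S -> E . } SLR reduces S -> E only on FOLLOW(S).
print("reduce S -> E on '+'?", "+" in follow["S"])
```

Because + is not in FOLLOW(S), the reduction S → E is no longer entered in the + column of state 2, and only the shift remains there.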
![Page 110: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/110.jpg)
- 110 -
Class Problem
Consider:
» S → L = R
» S → R
» L → *R
» L → ident
» R → L
Think of L as l-value, R as r-value, and * as a pointer dereference
When you create the states in the SLR(1) DFA, 2 of the states are the following:
» { S → L . = R; R → L . }
» { S → R . }
Do you have any shift/reduce conflicts? (Not as easy as it looks)
![Page 111: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/111.jpg)
- 111 -
LR(1) Parsing
Get as much power as possible out of a parsing table with 1 lookahead symbol
LR(1) grammar = recognizable by a shift/reduce parser with 1 lookahead
LR(1) parsing uses similar concepts as LR(0)
» Parser states = set of items
» LR(1) item = LR(0) item + lookahead symbol possibly following production
» LR(0) item: S → . S + E
» LR(1) item: S → . S + E , +
Lookahead only has impact upon REDUCE operations: apply the reduction when lookahead = next input
![Page 112: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/112.jpg)
- 112 -
LR(1) States
LR(1) state = set of LR(1) items
LR(1) item = (X → α . β , y)
» Meaning: α already matched at top of the stack, next expect to see β y
Shorthand notation
» (X → α . β , {x1, ..., xn})
» means: (X → α . β , x1) . . . (X → α . β , xn)
Need to extend closure and goto operations
Example state:
» S → S . + E , {+,$}
» S → S + . E , num
![Page 113: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/113.jpg)
- 113 -
LR(1) Closure
LR(1) closure operation:
» Start with Closure(S) = S
» For each item in S of the form X → α . Y β , z and for each production Y → γ, add the following item to the closure of S: Y → . γ , FIRST(βz)
» Repeat until nothing changes
Similar to LR(0) closure, but also keeps track of lookahead symbol
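A possible Python sketch of the LR(1) closure follows. It rests on stated assumptions: the grammar has no ε-productions, so FIRST(βz) is FIRST of the first symbol of β when β is non-empty and {z} otherwise; items are encoded as `(lhs, rhs, dot, lookahead)`; and the augmented production is written S’ → S with $ carried as the lookahead. The names `first_sets` and `closure1` are illustrative.

```python
GRAMMAR = {"S'": [("S",)], "S": [("E", "+", "S"), ("E",)], "E": [("num",)]}


def first_sets(grammar):
    """FIRST of each non-terminal; assumes no epsilon-productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def closure1(items, grammar):
    """LR(1) closure: new items carry lookaheads taken from FIRST(beta z)."""
    first = first_sets(grammar)
    result, work = set(items), list(items)
    while work:
        _, rhs, dot, z = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            beta = rhs[dot + 1:]
            if beta:      # FIRST(beta z) = FIRST(beta[0]) when nothing derives epsilon
                lookaheads = first[beta[0]] if beta[0] in grammar else {beta[0]}
            else:         # beta is empty, so FIRST(beta z) = {z}
                lookaheads = {z}
            for prod in grammar[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


start = closure1({("S'", ("S",), 0, "$")}, GRAMMAR)
for lhs, rhs, dot, z in sorted(start):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]), ",", z)
```

For the sum grammar the result contains E → . num twice, once with lookahead + and once with lookahead $, which is what the shorthand E → . num , {+,$} abbreviates.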
![Page 114: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/114.jpg)
- 114 -
LR(1) Start State
Initial state: start with (S’ → . S , $), then apply closure operation
Example: sum grammar
» S’ → S $
» S → E + S | E
» E → num
Closure of { S’ → . S , $ }:
» S’ → . S , $
» S → . E + S , $
» S → . E , $
» E → . num , {+,$}
![Page 115: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/115.jpg)
- 115 -
LR(1) Goto Operation
LR(1) goto operation = describes transitions between LR(1) states
Algorithm: for a state I and a symbol Y (as before)
» If the item [X → α . Y β , z] is in I, then
» Goto(I, Y) = Closure( [X → α Y . β , z] )
Grammar: S’ → S $, S → E + S | E, E → num
Example: for S1 = { S → E . + S , $; S → E . , $ }, Goto(S1, ‘+’) = S2 = Closure({S → E + . S , $})
![Page 116: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/116.jpg)
- 116 -
Class Problem
Grammar: S’ → S $, S → E + S | E, E → num
1. Compute: Closure(I = {S → E + . S , $})
2. Compute: Goto(I, num)
3. Compute: Goto(I, E)
![Page 117: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/117.jpg)
- 117 -
LR(1) DFA Construction
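Grammar: S’ → S $, S → E + S | E, E → num
States and transitions of the LR(1) DFA:
» Start state { S’ → . S , $; S → . E + S , $; S → . E , $; E → . num , {+,$} }: on S go to { S’ → S . , $ }, on E go to state A, on num go to { E → num . , {+,$} }
» State A = { S → E . + S , $; S → E . , $ }: on + go to state B
» State B = { S → E + . S , $; S → . E + S , $; S → . E , $; E → . num , {+,$} }: on S go to { S → E + S . , $ }, on E go to state A, on num go to { E → num . , {+,$} }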
![Page 118: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/118.jpg)
- 118 -
LR(1) Reductions
Grammar: S’ → S $, S → E + S | E, E → num
Reductions correspond to LR(1) items of the form (X → γ . , y)
In the LR(1) DFA for this grammar, the items with the dot at the end are:
» E → num . , {+,$}
» S → E . , $
» S → E + S . , $
![Page 119: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/119.jpg)
- 119 -
LR(1) Parsing Table Construction
Same as construction of LR(0), except for reductions
For a transition S → S’ on terminal x:
» Table[S,x] += Shift(S’)
For a transition S → S’ on non-terminal N:
» Table[S,N] += Goto(S’)
If I contains {(X → γ . , y)} then:
» Table[I,y] += Reduce(X → γ)
![Page 120: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/120.jpg)
- 120 -
LR(1) Parsing Table Example
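Grammar: S’ → S $, S → E + S | E, E → num
» State 1: S’ → . S , $; S → . E + S , $; S → . E , $; E → . num , {+,$}
» State 2 (from 1 on E): S → E . + S , $; S → E . , $
» State 3 (from 2 on +): S → E + . S , $; S → . E + S , $; S → . E , $; E → . num , {+,$}
Fragment of the parsing table:

| State | + | $ | E |
| --- | --- | --- | --- |
| 1 | | | g2 |
| 2 | s3 | S→E | |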
![Page 121: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/121.jpg)
- 121 -
Class Problem
Compute the LR(1) DFA for the following grammar
» E → E + T | T
» T → T F | F
» F → F * | a | b
![Page 122: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/122.jpg)
- 122 -
LALR(1) Grammars
Problem with LR(1): too many states
LALR(1) parsing (aka LookAhead LR)
» Constructs the LR(1) DFA and then merges any 2 LR(1) states whose items are identical except for the lookahead
» Results in smaller parser tables
» Theoretically less powerful than LR(1)
LALR(1) grammar = a grammar whose LALR(1) parsing table has no conflicts
Example: { S → id . , +; S → E . , $ } + { S → id . , $; S → E . , + } = ??
![Page 123: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/123.jpg)
- 123 -
LALR Parsers
LALR(1)
» Generally same number of states as SLR (far fewer than LR(1))
» But with lookahead capability close to that of LR(1) (much better than SLR)
» Example: Pascal programming language
In SLR, several hundred states
In LR(1), several thousand states
![Page 124: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/124.jpg)
- 124 -
Automate the Parsing Process
Can automate:» The construction of LR parsing tables
» The construction of shift-reduce parsers based on these parsing tables
LALR(1) parser generators» yacc, bison
» Not much difference compared to LR(1) in practice
» Smaller parsing tables than LR(1)
» Augment LALR(1) grammar specification with declarations of precedence, associativity
» Output: LALR(1) parser program
![Page 125: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/125.jpg)
- 125 -
Associativity
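Left-associative grammar: S → S + E | E, E → num
Ambiguous version: E → E + E, E → num
What happens if we run the ambiguous grammar through LALR construction?
» One state contains { E → E + E . , +; E → E . + E , {+,$} }
» On lookahead + this state has a shift/reduce conflict
For the input 1 + 2 + 3:
» shift: 1+(2+3)
» reduce: (1+2)+3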
![Page 126: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/126.jpg)
- 126 -
Associativity (2)
If an operator is left associative
» Assign a slightly higher value to its precedence if it is on the parse stack than if it is in the input stream
» Since stack precedence is higher, reduce will take priority (which is correct for left associative)
If an operator is right associative
» Assign a slightly higher value if it is in the input stream
» Since input stream precedence is higher, shift will take priority (which is correct for right associative)
![Page 127: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/127.jpg)
- 127 -
Precedence
Layered grammar: E → E + E | T, T → T x T | num | (E)
Flat grammar: E → E + E | E x E | num | (E)
What happens if we run the flat grammar through LALR construction? Shift/reduce conflicts result:
» { E → E . + E , ...; E → E x E . , + }
» { E → E + E . , x; E → E . x E , ... }
Precedence: attach precedence indicators to terminals
Shift/reduce conflict resolved by:
1. If the precedence of the input token is greater than that of the last terminal on the parse stack, favor shift over reduce
2. If the precedence of the input token is less than or equal to that of the last terminal on the parse stack, favor reduce over shift
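The precedence and associativity rules can be combined into one small decision function. The sketch below is illustrative only: the table `PREC`, the function name `resolve`, and the right-associative operator `^` are assumptions added for the example, and real generators such as yacc/bison take this information from precedence declarations rather than from a Python dictionary.

```python
# Hypothetical precedence table: operator -> (level, associativity).
PREC = {
    "+": (1, "left"),
    "x": (2, "left"),
    "^": (3, "right"),   # not in the slides; added to show the right-associative case
}


def resolve(stack_op, input_op):
    """Decide a shift/reduce conflict between the last terminal on the stack
    and the incoming token."""
    stack_level, assoc = PREC[stack_op]
    input_level, _ = PREC[input_op]
    if input_level > stack_level:
        return "shift"               # input binds tighter: keep reading
    if input_level < stack_level:
        return "reduce"              # stack binds tighter: finish it first
    # Same level: associativity breaks the tie.
    return "reduce" if assoc == "left" else "shift"


print(resolve("+", "x"))   # stack holds E + E, next token is x
print(resolve("x", "+"))   # stack holds E x E, next token is +
print(resolve("+", "+"))   # 1 + 2 . + 3 with a left-associative +
print(resolve("^", "^"))   # right-associative operator
```

For left-associative operators the tie-break gives "reduce", which matches rule 2 above; the right-associative branch corresponds to giving the input stream the slightly higher value.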
![Page 128: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/128.jpg)
- 128 -
Abstract Syntax Tree (AST) - Review
Derivation = sequence of applied productions
» S ⇒ E+S ⇒ 1+S ⇒ 1+E ⇒ 1+2
Parse tree = graph representation of a derivation
» Doesn’t capture the order of applying the productions
AST discards unnecessary information from the parse tree
[Figure: parse tree and AST for (1+2+(3+4))+5]
![Page 129: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/129.jpg)
- 129 -
Implicit AST Construction
LL/LR parsing techniques implicitly build AST
The parse tree is captured in the derivation
» LL parsing: AST represented by applied productions
» LR parsing: AST represented by applied reductions
We want to explicitly construct the AST during the parsing phase
![Page 130: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/130.jpg)
- 130 -
AST Construction - LL
LL parsing: extend procedures for non-terminals
Grammar: S → E S’, S’ → ε | + S, E → num | (S)

Recognizer only:

    void parse_S() {
      switch (token) {
        case num: case '(':
          parse_E();
          parse_S_prime();
          return;
        default:
          ParseError();
      }
    }

With AST construction:

    Expr parse_S() {
      switch (token) {
        case num: case '(': {
          Expr left = parse_E();
          Expr right = parse_S_prime();   // NULL when S’ derives ε
          if (right == NULL) return left;
          else return new Add(left, right);
        }
        default:
          ParseError();
      }
    }
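For readers who want to run the idea, here is a Python sketch of the same recursive-descent parser with AST construction. It is a translation under assumptions, not the lecture's code: the classes `Num` and `Add`, the `Parser` class, the regular-expression tokenizer, and the helper `parse` are all introduced here for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


class Parser:
    def __init__(self, text):
        raw = re.findall(r"\d+|[()+]", text)
        self.toks = [int(t) if t.isdigit() else t for t in raw] + ["$"]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def advance(self):
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def parse_S(self):                  # S -> E S'
        left = self.parse_E()
        right = self.parse_S_prime()
        return left if right is None else Add(left, right)

    def parse_S_prime(self):            # S' -> + S | epsilon
        if self.peek() == "+":
            self.advance()
            return self.parse_S()
        return None                     # epsilon: nothing to add

    def parse_E(self):                  # E -> num | ( S )
        tok = self.advance()
        if isinstance(tok, int):
            return Num(tok)
        if tok == "(":
            inner = self.parse_S()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.parse_S()
    if parser.peek() != "$":
        raise SyntaxError("trailing input")
    return tree


print(parse("1+2+3"))
print(parse("(1+2)+3"))
```

Because the grammar is right-recursive, `1+2+3` comes out nested to the right; a parser that needs left-associative addition would have to rebuild the tree or collect the operands in a loop.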
![Page 131: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/131.jpg)
- 131 -
AST Construction - LR
We again need to add code for explicit AST construction
AST construction mechanism
» Store parts of the tree on the stack
» For each nonterminal symbol X on stack, also store the sub-tree rooted at X on stack
» Whenever the parser performs a reduce operation for a production X → γ, create an AST node for X
![Page 132: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/132.jpg)
- 132 -
AST Construction for LR - Example
Grammar: S → E + S | E, E → num | (S)
Input string: “1 + 2 + 3”
Before reduction S → E + S, the top of the stack holds:
» E with sub-tree Num(1)
» +
» S with sub-tree Add(Num(2), Num(3))
After reduction S → E + S, the top of the stack holds:
» S with sub-tree Add(Num(1), Add(Num(2), Num(3)))
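The mechanism can be sketched in Python by pairing the state stack with a value stack. The sketch uses the SLR table for S → E + S | E, E → num given earlier, with some assumptions made for illustration: the names `ACTION`, `GOTO`, `RULES`, `parse`, `Num`, and `Add` are chosen here, tokens must be separated by spaces, and state 6 accepts directly on $ instead of shifting to state 7.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


# SLR table for S -> E + S | E, E -> num.
ACTION = {
    1: {"num": ("s", 4)},
    2: {"+": ("s", 3), "$": ("r", "S->E")},
    3: {"num": ("s", 4)},
    4: {"+": ("r", "E->num"), "$": ("r", "E->num")},
    5: {"$": ("r", "S->E+S")},
    6: {"$": ("acc", None)},   # simplification: accept here rather than shift $ to state 7
}
GOTO = {(1, "E"): 2, (1, "S"): 6, (3, "E"): 2, (3, "S"): 5}

# Production -> (LHS, RHS length, action that builds the AST node from RHS values)
RULES = {
    "E->num": ("E", 1, lambda v: Num(v[0])),
    "S->E": ("S", 1, lambda v: v[0]),
    "S->E+S": ("S", 3, lambda v: Add(v[0], v[2])),
}


def parse(text):
    tokens = [("num", int(t)) if t.isdigit() else (t, None) for t in text.split()]
    tokens.append(("$", None))
    states, values, pos = [1], [], 0   # values[i] is the sub-tree for the i-th stack symbol
    while True:
        kind, value = tokens[pos]
        op, arg = ACTION[states[-1]].get(kind, ("err", None))
        if op == "s":
            states.append(arg)
            values.append(value)
            pos += 1
        elif op == "r":
            lhs, n, build = RULES[arg]
            args = values[len(values) - n:]      # sub-trees of the RHS symbols
            del values[len(values) - n:]
            del states[len(states) - n:]
            values.append(build(args))           # AST node for the LHS
            states.append(GOTO[(states[-1], lhs)])
        elif op == "acc":
            return values[-1]
        else:
            raise SyntaxError(f"unexpected {kind!r}")


print(parse("1 + 2 + 3"))
```

The lambdas in `RULES` play the role of semantic actions: each one runs when its production is reduced and sees exactly the values that were stored for the right-hand side.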
![Page 133: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/133.jpg)
- 133 -
Problems
Unstructured code: mixing parsing code with AST construction code
Automatic parser generators
» The generated parser needs to contain AST construction code
» How to construct a customized AST data structure using an automatic parser generator?
May want to perform other actions concurrently with parsing phase
» E.g., semantic checks
» This can reduce the number of compiler passes
![Page 134: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/134.jpg)
- 134 -
Syntax-Directed Definition
Solution: Syntax-directed definition
» Extends each grammar production with an associated semantic action (code): S → E + S {action}
» The parser generator adds these actions into the generated parser
» Each action is executed when the corresponding production is reduced
![Page 135: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/135.jpg)
- 135 -
Semantic Actions
Actions = C code (for bison/yacc)
The actions access the parser stack
» Parser generators extend the stack of symbols with entries for user-defined structures (e.g., parse trees)
The action code should be able to refer to the grammar symbols in the productions
» Need to refer to multiple occurrences of the same non-terminal symbol and distinguish RHS vs LHS occurrences, e.g., E → E + E
» Use dollar variables in yacc/bison ($$, $1, $2, etc.): expr ::= expr PLUS expr {$$ = $1 + $3;}
![Page 136: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/136.jpg)
- 136 -
Building the AST
Use semantic actions to build the AST AST is built bottom-up along with parsing
expr ::= NUM {$$ = new Num($1.val); }
expr ::= expr PLUS expr {$$ = new Add($1, $3); }
expr ::= expr MULT expr {$$ = new Mul($1, $3); }
expr ::= LPAR expr RPAR {$$ = $2; }
Recall: user-defined type for objects on the stack (%union)
![Page 137: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/137.jpg)
- 137 -
Outline
DFA & NFA
Regular expression
Regular languages
Context free languages & PDA
Scanner
Parser
» Top-down parsing
» Bottom-up parsing
» Comparison
![Page 138: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/138.jpg)
- 138 -
LL/LR Grammar Summary
LL parsing tables
» Non-terminals x terminals → productions
» Computed using FIRST/FOLLOW
LR parsing tables
» LR states x terminals → {shift/reduce}
» LR states x non-terminals → goto
» Computed using closure/goto operations on LR states
A grammar is:
» LL(1) if its LL(1) parsing table has no conflicts
» same for LR(0), SLR, LALR(1), LR(1)
![Page 139: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/139.jpg)
- 139 -
Top-Down Parsing
Grammar: S → S + E | E, E → num | (S)
S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ...
In a left-most derivation, the entire tree above a token (e.g., 2) has been expanded by the time the token is encountered
[Figure: parse tree for (1+2+(3+4))+5]
![Page 140: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/140.jpg)
- 140 -
Top-Down vs Bottom-Up
[Figure: two parse trees, one for top-down and one for bottom-up parsing, each split into scanned and unscanned input]
Bottom-up:
» Don’t need to figure out as much of the parse tree for a given amount of input
» More time to decide what rules to apply
![Page 141: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/141.jpg)
- 141 -
Terminology: LL vs LR
LL(k)
» Left-to-right scan of input
» Left-most derivation
» k symbol lookahead
» [Top-down or predictive] parsing or LL parser
» Performs pre-order traversal of parse tree
LR(k)
» Left-to-right scan of input
» Right-most derivation
» k symbol lookahead
» [Bottom-up or shift-reduce] parsing or LR parser
» Performs post-order traversal of parse tree
![Page 142: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/142.jpg)
- 142 -
Classification of Grammars
[Figure: nested regions, not to scale, for the grammar classes LR(0), SLR, LALR(1), LR(1), and LL(1)]
» LR(k) ⊆ LR(k+1)
» LL(k) ⊆ LL(k+1)
» LL(k) ⊆ LR(k)
» LR(0) ⊆ SLR
» LALR(1) ⊆ LR(1)
![Page 143: COMP3190: Principle of Programming Languages Formal Language Syntax](https://reader037.vdocuments.site/reader037/viewer/2022110209/56649e565503460f94b4e7ef/html5/thumbnails/143.jpg)
- 143 -
Bottom-Up Parsing
Grammar: S → S + E | E, E → num | (S)
(1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 ...
Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned
[Figure: parse tree for (1+2+(3+4))+5]