1
Languages and Compilers (Sprog og Oversættere)
Parsing
2
Parsing
– Describe the purpose of the parser
– Discuss top down vs. bottom up parsing
– Explain the necessary conditions for constructing recursive descent parsers
– Discuss the construction of an RD parser from a grammar
3
Top-down parsing
[Figure: parse tree for "The cat sees a rat ." built from the root down: Sentence → Subject Verb Object . ; Subject → The Noun, Noun → cat; Verb → sees; Object → a Noun, Noun → rat]
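A top-down parser builds the tree from the root by repeatedly expanding the leftmost non-terminal. The Java sketch below makes this visible by printing each sentential form of the leftmost derivation, using the Micro-English grammar that appears later in these slides. It is an illustration under stated assumptions, not code from the course: it uses an explicit stack instead of recursion, picks an alternative by comparing its first terminal with one token of look-ahead (this works here because every rule with several alternatives starts each one with a distinct terminal), and uses the lowercase "the" that the grammar defines. The names TopDownDemo and derive are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class TopDownDemo {
    // Micro-English grammar: each non-terminal maps to its list of alternatives.
    // Any symbol that is not a key of this map is a terminal.
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "Sentence", List.of(List.of("Subject", "Verb", "Object", ".")),
        "Subject", List.of(List.of("I"), List.of("a", "Noun"), List.of("the", "Noun")),
        "Object", List.of(List.of("me"), List.of("a", "Noun"), List.of("the", "Noun")),
        "Noun", List.of(List.of("cat"), List.of("mat"), List.of("rat")),
        "Verb", List.of(List.of("like"), List.of("is"), List.of("see"), List.of("sees")));

    // Returns the sentential forms of the leftmost derivation of the input.
    static List<String> derive(List<String> input) {
        Deque<String> stack = new ArrayDeque<>();   // symbols still to be derived, top first
        stack.push("Sentence");                     // start at the root of the parse tree
        List<String> matched = new ArrayList<>();   // terminals already matched against the input
        List<String> forms = new ArrayList<>(List.of("Sentence"));
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String lookahead = pos < input.size() ? input.get(pos) : "<eof>";
            if (!GRAMMAR.containsKey(top)) {        // terminal: it must match the next input token
                if (!top.equals(lookahead))
                    throw new IllegalArgumentException("expected " + top + " but found " + lookahead);
                matched.add(top);
                pos++;
                continue;
            }
            // Non-terminal: one token of look-ahead selects the alternative to expand.
            List<String> chosen = null;
            for (List<String> alt : GRAMMAR.get(top))
                if (GRAMMAR.get(top).size() == 1 || alt.get(0).equals(lookahead)) { chosen = alt; break; }
            if (chosen == null)
                throw new IllegalArgumentException("no alternative of " + top + " starts with " + lookahead);
            for (int i = chosen.size() - 1; i >= 0; i--)
                stack.push(chosen.get(i));          // push right to left so the leftmost symbol is on top
            List<String> form = new ArrayList<>(matched);
            form.addAll(stack);                     // ArrayDeque iterates from top to bottom
            forms.add(String.join(" ", form));
        }
        if (pos != input.size())
            throw new IllegalArgumentException("unexpected input after the sentence");
        return forms;
    }

    public static void main(String[] args) {
        derive(List.of("the", "cat", "sees", "a", "rat", ".")).forEach(System.out::println);
    }
}
```

Running main prints seven forms, from Sentence and Subject Verb Object . down to the cat sees a rat .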
4
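Bottom-up parsing
[Figure: parse tree for "The cat sees a rat ." built from the leaves up: cat → Noun, The Noun → Subject; sees → Verb; rat → Noun, a Noun → Object; finally Subject Verb Object . → Sentence]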
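Bottom-up parsing works the other way round: tokens are shifted onto a stack, and whenever the top of the stack matches the right-hand side of a production (a handle), it is reduced to the production's left-hand side. The Java sketch below traces this for the same sentence. It is a hand-written illustration under simplifying assumptions: the choice between Subject and Object for "a Noun" or "the Noun" is made by checking whether the handle sits at the bottom of the stack, which stands in for the DFA state a real LR parser would consult. ShiftReduceDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ShiftReduceDemo {
    static final Set<String> NOUNS = Set.of("cat", "mat", "rat");
    static final Set<String> VERBS = Set.of("like", "is", "see", "sees");

    // Replace the top k stack symbols (the handle) by lhs and record the reduction.
    static void reduce(List<String> stack, int k, String lhs, List<String> trace) {
        List<String> handle = new ArrayList<>(stack.subList(stack.size() - k, stack.size()));
        stack.subList(stack.size() - k, stack.size()).clear();
        stack.add(lhs);
        trace.add("reduce " + String.join(" ", handle) + " -> " + lhs);
    }

    // Perform one reduction if a handle is on top of the stack; return false otherwise.
    static boolean tryReduce(List<String> stack, List<String> trace) {
        int n = stack.size();
        String top = stack.get(n - 1);
        String below = n >= 2 ? stack.get(n - 2) : null;
        if (NOUNS.contains(top)) { reduce(stack, 1, "Noun", trace); return true; }
        if (VERBS.contains(top)) { reduce(stack, 1, "Verb", trace); return true; }
        if (top.equals("I") && n == 1) { reduce(stack, 1, "Subject", trace); return true; }
        if (top.equals("me") && "Verb".equals(below)) { reduce(stack, 1, "Object", trace); return true; }
        if (top.equals("Noun") && ("a".equals(below) || "the".equals(below))) {
            // Handle at the bottom of the stack -> Subject, otherwise -> Object
            // (a stand-in for the DFA state an LR parser would use).
            reduce(stack, 2, n == 2 ? "Subject" : "Object", trace);
            return true;
        }
        if (stack.equals(List.of("Subject", "Verb", "Object", "."))) {
            reduce(stack, 4, "Sentence", trace);
            return true;
        }
        return false;
    }

    // Shift each token, then reduce for as long as a handle is on top of the stack.
    static List<String> parse(List<String> tokens) {
        List<String> stack = new ArrayList<>();
        List<String> trace = new ArrayList<>();
        for (String tok : tokens) {
            stack.add(tok);
            trace.add("shift " + tok);
            while (tryReduce(stack, trace)) { }
        }
        trace.add(stack.equals(List.of("Sentence")) ? "accept" : "error: stack = " + stack);
        return trace;
    }

    public static void main(String[] args) {
        parse(List.of("the", "cat", "sees", "a", "rat", ".")).forEach(System.out::println);
    }
}
```

The trace ends with "reduce Subject Verb Object . -> Sentence" followed by "accept"; every node of the tree appears only after all of its children have been built.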
5
Top-down vs. bottom-up parsing
[Figure: LL analysis (top-down) builds a derivation using look-ahead; LR analysis (bottom-up) builds a reduction using look-ahead.]
6
Development of a Recursive Descent Parser
(1) Express the grammar in EBNF
(2) Grammar transformations: left factorization and left-recursion elimination
(3) Create a parser class with
– a private variable currentToken
– methods to call the scanner: accept and acceptIt
(4) Implement private parsing methods:
– add a private parseN method for each non-terminal N
– a public parse method that
• gets the first token from the scanner
• calls parseS (S is the start symbol of the grammar)
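Step (2) matters because a recursive descent parser cannot handle left recursion: a method for Expression ::= Expression + Term would call itself before consuming any token. The Java sketch below applies steps (2) to (4) to a small hypothetical expression grammar that is not part of these slides. The left-recursive rule is rewritten as Expression ::= Term (+ Term)*, and the loop implementing the star builds a bracketed string so that the left associativity is visible. A list of strings stands in for the scanner; the class name ExpressionParser and the "<eof>" marker are assumptions.

```java
import java.util.List;

// Hypothetical grammar (not from the slides):
//   Expression ::= Expression + Term | Term      -- left-recursive, unusable for RD parsing
//   Expression ::= Term ( + Term )*              -- after left-recursion elimination
//   Term       ::= number | ( Expression )
class ExpressionParser {
    private final List<String> tokens;   // stands in for the scanner (assumption)
    private int pos;
    private String currentToken;

    ExpressionParser(List<String> tokens) { this.tokens = tokens; }

    // Step (3): auxiliary methods that fetch tokens from the "scanner".
    private void acceptIt() {
        pos++;
        currentToken = pos < tokens.size() ? tokens.get(pos) : "<eof>";
    }

    private void accept(String expected) {
        if (currentToken.equals(expected)) acceptIt();
        else throw new IllegalArgumentException("expected " + expected + " but found " + currentToken);
    }

    // Step (4): public parse method; gets the first token, then calls the start symbol's method.
    public String parse() {
        pos = 0;
        currentToken = tokens.isEmpty() ? "<eof>" : tokens.get(0);
        String tree = parseExpression();
        if (!currentToken.equals("<eof>"))
            throw new IllegalArgumentException("unexpected " + currentToken);
        return tree;
    }

    // Expression ::= Term ( + Term )*   -- the while loop replaces the left recursion
    private String parseExpression() {
        String left = parseTerm();
        while (currentToken.equals("+")) {
            acceptIt();
            left = "(" + left + " + " + parseTerm() + ")";   // keeps + left-associative
        }
        return left;
    }

    // Term ::= number | ( Expression )
    private String parseTerm() {
        if (currentToken.equals("(")) {
            acceptIt();
            String inner = parseExpression();
            accept(")");
            return inner;
        }
        if (currentToken.matches("[0-9]+")) {
            String number = currentToken;
            acceptIt();
            return number;
        }
        throw new IllegalArgumentException("expected a number or ( but found " + currentToken);
    }

    public static void main(String[] args) {
        System.out.println(new ExpressionParser(List.of("1", "+", "2", "+", "3")).parse());
    }
}
```

For the tokens 1 + 2 + 3 the parser returns ((1 + 2) + 3), so rewriting the rule with a star does not change how + groups.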
7
Recursive Descent Parsing
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Define a procedure parseN for each non-terminal N
private void parseSentence();
private void parseSubject();
private void parseObject();
private void parseNoun();
private void parseVerb();
8
Recursive Descent Parsing
public class MicroEnglishParser {
    private TerminalSymbol currentTerminal;
    // Auxiliary methods will go here ...
    // Parsing methods will go here ...
}
9
Recursive Descent Parsing: Auxiliary Methods
public class MicroEnglishParser {
    private TerminalSymbol currentTerminal;

    private void accept(TerminalSymbol expected) {
        if (currentTerminal matches expected)
            currentTerminal = next input terminal;
        else
            report a syntax error;
    }
    ...
}
10
Recursive Descent Parsing: Parsing Methods
private void parseSentence() {
    parseSubject();
    parseVerb();
    parseObject();
    accept('.');
}
Sentence ::= Subject Verb Object .
11
Recursive Descent Parsing: Parsing Methods
private void parseSubject() {
    if (currentTerminal matches 'I')
        accept('I');
    else if (currentTerminal matches 'a') {
        accept('a');
        parseNoun();
    } else if (currentTerminal matches 'the') {
        accept('the');
        parseNoun();
    } else
        report a syntax error;
}
Subject ::= I | a Noun | the Noun
12
Recursive Descent Parsing: Parsing Methods
private void parseNoun() {
    if (currentTerminal matches 'cat')
        accept('cat');
    else if (currentTerminal matches 'mat')
        accept('mat');
    else if (currentTerminal matches 'rat')
        accept('rat');
    else
        report a syntax error;
}
Noun ::= cat | mat | rat
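The slides show only parseSentence, parseSubject and parseNoun. A runnable Java version of the whole Micro-English parser might look like the sketch below, with parseObject and parseVerb written by analogy. Assumptions not in the slides: terminals are plain strings, the scanner is replaced by splitting the input on whitespace, "<eof>" marks the end of input, and a syntax error is reported by throwing IllegalStateException. The methods advance, syntaxError and isSentence are hypothetical helpers.

```java
import java.util.List;

class MicroEnglishParser {
    private final String[] terminals;   // hypothetical stand-in for the scanner
    private int next = 0;
    private String currentTerminal;

    MicroEnglishParser(String input) {
        // Assumption: terminals are separated by whitespace, e.g. "the cat sees a rat ."
        terminals = input.trim().split("\\s+");
    }

    private void syntaxError() {
        throw new IllegalStateException("syntax error at '" + currentTerminal + "'");
    }

    // Fetch the next input terminal ("<eof>" once the input is exhausted).
    private void advance() {
        currentTerminal = next < terminals.length ? terminals[next++] : "<eof>";
    }

    // Auxiliary method: if currentTerminal matches expected, move on; else report an error.
    private void accept(String expected) {
        if (currentTerminal.equals(expected)) advance();
        else syntaxError();
    }

    public void parse() {
        advance();                       // get the first terminal
        parseSentence();                 // Sentence is the start symbol
        if (!currentTerminal.equals("<eof>")) syntaxError();
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (currentTerminal.equals("I")) accept("I");
        else if (currentTerminal.equals("a")) { accept("a"); parseNoun(); }
        else if (currentTerminal.equals("the")) { accept("the"); parseNoun(); }
        else syntaxError();
    }

    // Object ::= me | a Noun | the Noun   (written by analogy with parseSubject)
    private void parseObject() {
        if (currentTerminal.equals("me")) accept("me");
        else if (currentTerminal.equals("a")) { accept("a"); parseNoun(); }
        else if (currentTerminal.equals("the")) { accept("the"); parseNoun(); }
        else syntaxError();
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        if (List.of("cat", "mat", "rat").contains(currentTerminal)) accept(currentTerminal);
        else syntaxError();
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        if (List.of("like", "is", "see", "sees").contains(currentTerminal)) accept(currentTerminal);
        else syntaxError();
    }

    // Hypothetical convenience wrapper: true if the input is a Micro-English sentence.
    static boolean isSentence(String input) {
        try {
            new MicroEnglishParser(input).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSentence("the cat sees a rat ."));  // true
        System.out.println(isSentence("the cat sees rat ."));    // false
    }
}
```

The second call fails because rat is not in starters[Object] = {me, a, the}, so parseObject reports a syntax error.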
13
LL(1) Grammars
• The presented algorithm for converting EBNF into a parser does not work for all possible grammars.
• It only works for so-called “LL(1)” grammars.
• Basically, an LL(1) grammar is a grammar that can be parsed by a top-down parser with a look-ahead of one token (in the input stream of tokens).
• What grammars are LL(1)?
How can we recognize that a grammar is (or is not) LL(1)? We can deduce the necessary conditions from the parser generation algorithm, or we can use a formal definition.
14
LL(1) Grammars
parse X*:
while (currentToken.kind is in starters[X]) { parse X }
Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*.

parse X|Y:
switch (currentToken.kind) {
    cases in starters[X]: parse X; break;
    cases in starters[Y]: parse Y; break;
    default: report syntax error
}
Condition: starters[X] and starters[Y] must be disjoint sets.
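Both translation patterns can be written out in plain Java. The sketch below uses a hypothetical rule ObjectList ::= Object (and Object)*, followed by ".", which is not part of the Micro-English grammar. The star becomes a while loop guarded by starters[and Object] = {and}, and the alternatives of Object become a switch with one case per alternative. Both LL(1) conditions hold here: {and} is disjoint from the follower {.}, and {me}, {a} and {the} are pairwise disjoint. Token handling is simplified to a list of strings, and the class name PatternDemo is an assumption.

```java
import java.util.List;
import java.util.Set;

// Hypothetical rules (ObjectList is not part of the slides' grammar):
//   ObjectList ::= Object ( and Object )*     followed by "."
//   Object     ::= me | a Noun | the Noun
//   Noun       ::= cat | mat | rat
class PatternDemo {
    private final List<String> tokens;   // stands in for the scanner
    private int pos = 0;

    PatternDemo(List<String> tokens) { this.tokens = tokens; }

    private String currentToken() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }
    private void acceptIt() { pos++; }
    private void accept(String expected) {
        if (!currentToken().equals(expected))
            throw new IllegalArgumentException("expected " + expected + " but found " + currentToken());
        acceptIt();
    }

    // parse X* with X = (and Object): loop while currentToken is in starters[X] = {and}.
    private void parseObjectList() {
        parseObject();
        while (currentToken().equals("and")) {
            acceptIt();
            parseObject();
        }
    }

    // parse X | Y | Z: one case per alternative, labelled with that alternative's starters.
    private void parseObject() {
        switch (currentToken()) {
            case "me":
                acceptIt();
                break;
            case "a":
                acceptIt();
                parseNoun();
                break;
            case "the":
                acceptIt();
                parseNoun();
                break;
            default:
                throw new IllegalArgumentException("syntax error at " + currentToken());
        }
    }

    private void parseNoun() {
        if (Set.of("cat", "mat", "rat").contains(currentToken())) acceptIt();
        else throw new IllegalArgumentException("expected a noun but found " + currentToken());
    }

    // Returns true if the tokens form an ObjectList followed by "." and nothing else.
    static boolean accepts(List<String> tokens) {
        PatternDemo p = new PatternDemo(tokens);
        try {
            p.parseObjectList();
            p.accept(".");
            return p.pos == tokens.size();
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("a", "cat", "and", "the", "rat", ".")));  // true
        System.out.println(accepts(List.of("a", "cat", "and", ".")));                // false
    }
}
```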
15
Formal definition of LL(1)
A grammar G is LL(1) iff for each set of productions M ::= X1 | X2 | … | Xn:
1. starters[X1], starters[X2], …, starters[Xn] are all pairwise disjoint;
2. if Xi =>* ε, then starters[Xj] ∩ follow[M] = Ø for 1 ≤ j ≤ n, j ≠ i.
If G is ε-free, condition 1 alone is sufficient.
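For an ε-free grammar, condition 1 can be checked mechanically: compute the starters set of every non-terminal by iterating to a fixed point, then compare the starters of each pair of alternatives. The Java sketch below does this under the assumption that the grammar has no ε-productions, so the starters of an alternative are simply the starters of its first symbol. It reports no conflicts for the Micro-English grammar and flags the hypothetical left-recursive rule Expression ::= Expression + Term | Term, whose two alternatives both start with number. The names LL1Check, starters and conflicts are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LL1Check {
    // A grammar maps each non-terminal to its alternatives; other symbols are terminals.
    // Assumption: the grammar is epsilon-free, so starters[X1 X2 ... Xk] = starters[X1].
    static Map<String, Set<String>> starters(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String n : g.keySet()) first.put(n, new HashSet<>());
        boolean changed = true;
        while (changed) {                      // iterate until no starters set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet())
                for (List<String> alt : e.getValue())
                    changed |= first.get(e.getKey()).addAll(startersOf(alt, first));
        }
        return first;
    }

    // Starters of one alternative: the starters of its first symbol (copied).
    static Set<String> startersOf(List<String> alt, Map<String, Set<String>> first) {
        String x = alt.get(0);
        return first.containsKey(x) ? new HashSet<>(first.get(x)) : Set.of(x);
    }

    // Lists every pair of alternatives of the same non-terminal whose starters overlap.
    static List<String> conflicts(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = starters(g);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
            List<List<String>> alts = e.getValue();
            for (int i = 0; i < alts.size(); i++)
                for (int j = i + 1; j < alts.size(); j++) {
                    Set<String> common = new HashSet<>(startersOf(alts.get(i), first));
                    common.retainAll(startersOf(alts.get(j), first));
                    if (!common.isEmpty())
                        out.add(e.getKey() + ": " + alts.get(i) + " | " + alts.get(j) + " share " + common);
                }
        }
        return out;
    }

    static final Map<String, List<List<String>>> MICRO_ENGLISH = Map.of(
        "Sentence", List.of(List.of("Subject", "Verb", "Object", ".")),
        "Subject", List.of(List.of("I"), List.of("a", "Noun"), List.of("the", "Noun")),
        "Object", List.of(List.of("me"), List.of("a", "Noun"), List.of("the", "Noun")),
        "Noun", List.of(List.of("cat"), List.of("mat"), List.of("rat")),
        "Verb", List.of(List.of("like"), List.of("is"), List.of("see"), List.of("sees")));

    // Hypothetical left-recursive grammar, for contrast.
    static final Map<String, List<List<String>>> LEFT_RECURSIVE = Map.of(
        "Expression", List.of(List.of("Expression", "+", "Term"), List.of("Term")),
        "Term", List.of(List.of("number")));

    public static void main(String[] args) {
        System.out.println(conflicts(MICRO_ENGLISH));   // []
        System.out.println(conflicts(LEFT_RECURSIVE));  // one conflict on Expression
    }
}
```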
16
Converting EBNF into RD parsers
• The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so “mechanical” that it can easily be automated!
=> JavaCC “Java Compiler Compiler”
17
JavaCC and JJTree
18
LR parsing
– The algorithm makes use of a stack.
– The first item on the stack is the initial state of a DFA
– A state of the automaton is a set of LR(0)/LR(1) items.
– The initial state is constructed from items of the form S ::= • α (for LR(1) items, with look-ahead $), where S is the start symbol of the CFG and α is the right-hand side of one of its productions
– The stack contains, in alternating order:
• A DFA state
• A terminal symbol or part (subtree) of the parse tree being constructed
– The items on the stack are related by transitions of the DFA
– There are two basic actions in the algorithm:
• shift: get next input token
• reduce: build a new node (remove children from stack)
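The shift and reduce actions can be seen in a small table-driven parser. The Java sketch below hard-codes SLR(1) tables, built by hand from LR(0) items, for the hypothetical left-recursive grammar E ::= E + n | n, the kind of rule a recursive descent parser cannot use directly. Each stack entry pairs a DFA state with the subtree that led into it, which is one way of storing the alternating states and symbols described above; a reduce pops the children and pushes a new node through the goto on E. The grammar, the state numbering and the class name LRDemo are assumptions for illustration, not output of any of the tools mentioned in these slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hand-built SLR(1) tables for the hypothetical grammar
//   S ::= E $      E ::= E + n      E ::= n
// States (sets of LR(0) items):
//   0: S ::= . E $,  E ::= . E + n,  E ::= . n
//   1: S ::= E . $,  E ::= E . + n
//   2: E ::= n .
//   3: E ::= E + . n
//   4: E ::= E + n .
class LRDemo {
    // One stack entry: a DFA state paired with the symbol/subtree that led to it.
    static final class Entry {
        final int state;
        final String tree;
        Entry(int state, String tree) { this.state = state; this.tree = tree; }
    }

    static String parse(List<String> input) {
        Deque<Entry> stack = new ArrayDeque<>();
        stack.push(new Entry(0, ""));                 // the initial state of the DFA
        int pos = 0;
        while (true) {
            int state = stack.peek().state;
            String tok = pos < input.size() ? input.get(pos) : "$";
            boolean inFollowE = tok.equals("+") || tok.equals("$");   // follow[E] = {+, $}
            if ((state == 0 || state == 3) && tok.equals("n")) {      // shift n
                stack.push(new Entry(state == 0 ? 2 : 4, "n"));
                pos++;
            } else if (state == 1 && tok.equals("+")) {               // shift +
                stack.push(new Entry(3, "+"));
                pos++;
            } else if (state == 1 && tok.equals("$")) {               // accept
                return stack.peek().tree;
            } else if (state == 2 && inFollowE) {                     // reduce E ::= n
                String n = stack.pop().tree;
                stack.push(new Entry(gotoE(stack.peek().state), n));
            } else if (state == 4 && inFollowE) {                     // reduce E ::= E + n
                String right = stack.pop().tree;
                stack.pop();                                          // the "+"
                String left = stack.pop().tree;
                stack.push(new Entry(gotoE(stack.peek().state), "(" + left + " + " + right + ")"));
            } else {
                throw new IllegalStateException("syntax error at " + tok + " in state " + state);
            }
        }
    }

    // goto[s, E]: only state 0 has a transition on E.
    static int gotoE(int state) {
        if (state != 0) throw new IllegalStateException("no goto on E from state " + state);
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("n", "+", "n", "+", "n")));
    }
}
```

For n + n + n, parse returns ((n + n) + n): the grouping comes out left-associative without any grammar transformation.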
19
JavaCUP: An LALR generator for Java
[Figure: Definition of tokens (regular expressions) → JFlex → Java file: Scanner class, which recognizes tokens. Grammar (BNF-like specification) → JavaCUP → Java file: Parser class, which uses the Scanner to get tokens and parses the stream of tokens. Scanner and parser together form the syntactic analyzer.]
20
Steps to build a compiler with SableCC
1. Create a SableCC specification file
2. Call SableCC
3. Create one or more working classes, possibly inherited from classes generated by SableCC
4. Create a Main class activating the lexer, parser and working classes
5. Compile with javac
21
Hierarchy