Post on 22-Dec-2015
Contents
  Introduction
  A Simple Compiler
  Scanning – Theory and Practice
  Grammars and Parsing
  LL(1) Parsing
  LR Parsing
  Lex and yacc
  Semantic Processing
  Symbol Tables
  Run-time Storage Organization
  Code Generation and Local Code Optimization
  Global Optimization

Outline
  Context-Free Grammars
  Errors in Context-Free Grammars
  Transforming Extended BNF Grammars
  Parsers and Recognizers
  Grammar Analysis Algorithms

Context-Free Grammars: Concepts and Notation
A context-free grammar $G = (V_t, V_n, S, P)$ consists of:
  A finite terminal vocabulary $V_t$
    The token set produced by the scanner
  A finite nonterminal vocabulary $V_n$
    Intermediate symbols
  A start symbol $S \in V_n$ that starts all derivations
    Also called the goal symbol
  $P$, a finite set of productions (rewriting rules) of the form $A \to X_1 X_2 \cdots X_m$
    $A \in V_n$, $X_i \in V_n \cup V_t$, $1 \le i \le m$
    $A \to \lambda$ (the empty string) is a valid production

Context-Free Grammars: Concepts and Notation (Cont’d.)
Other notations
  The vocabulary $V$ of a CFG is the set of terminal and nonterminal symbols: $V = V_n \cup V_t$
  $L(G)$, the set of strings derivable from $S$, comprises the context-free language of grammar $G$
Notational conventions
  $a, b, c, \ldots$ denote symbols in $V_t$
  $A, B, C, \ldots$ denote symbols in $V_n$
  $U, V, W, \ldots$ denote symbols in $V$
  $\alpha, \beta, \gamma, \ldots$ denote strings in $V^*$
  $u, v, w, \ldots$ denote strings in $V_t^*$

Context-Free Grammars: Concepts and Notation (Cont’d.)
Derivation
  One-step derivation: if $A \to \gamma$, then $\alpha A \beta \Rightarrow \alpha \gamma \beta$
  One or more steps: $\Rightarrow^{+}$
  Zero or more steps: $\Rightarrow^{*}$
If $S \Rightarrow^{*} \beta$, then $\beta$ is said to be a sentential form of the CFG.
  $SF(G)$ is the set of sentential forms of grammar $G$.
  $L(G) = \{x \in V_t^* \mid S \Rightarrow^{+} x\}$
  $L(G) = SF(G) \cap V_t^*$; that is, the language of $G$ is simply those sentential forms of $G$ that are terminal strings.

Context-Free Grammars: Concepts and Notation (Cont’d.)
Leftmost derivation, the order followed by top-down parsers
  Written $\Rightarrow_{lm}$, $\Rightarrow_{lm}^{+}$, $\Rightarrow_{lm}^{*}$
  A sentential form produced via a leftmost derivation sequence is called a left sentential form.
E.g., leftmost derivation of F(V+V) using grammar G0
  G0:
    E → Prefix ( E )
    E → V Tail
    Prefix → F
    Prefix → λ
    Tail → + E
    Tail → λ
  E ⇒lm Prefix(E)
    ⇒lm F(E)
    ⇒lm F(V Tail)
    ⇒lm F(V+E)
    ⇒lm F(V+V Tail)
    ⇒lm F(V+V)

Context-Free Grammars: Concepts and Notation (Cont’d.)
Rightmost derivation (canonical derivation), the order traced by bottom-up parsers
  Written $\Rightarrow_{rm}$, $\Rightarrow_{rm}^{+}$, $\Rightarrow_{rm}^{*}$
  A sentential form produced via a rightmost derivation sequence is called a right sentential form.
E.g., rightmost derivation of F(V+V) using grammar G0
  G0:
    E → Prefix ( E )
    E → V Tail
    Prefix → F
    Prefix → λ
    Tail → + E
    Tail → λ
  E ⇒rm Prefix(E)
    ⇒rm Prefix(V Tail)
    ⇒rm Prefix(V+E)
    ⇒rm Prefix(V+V Tail)
    ⇒rm Prefix(V+V)
    ⇒rm F(V+V)
  Same number of steps as the leftmost derivation, but in a different order
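
The two derivations of F(V+V) can be replayed mechanically. The following is a minimal Python sketch, not taken from the slides: the dictionary encoding of G0, the function name `derive`, and the idea of selecting productions by index are all assumptions made for illustration. The empty tuple stands for λ.

```python
# Grammar G0: nonterminal -> list of right-hand sides (tuples of symbols).
# The empty tuple () stands for lambda. This encoding is an assumption.
G0 = {
    "E": [("Prefix", "(", "E", ")"), ("V", "Tail")],
    "Prefix": [("F",), ()],
    "Tail": [("+", "E"), ()],
}


def derive(form, choices, leftmost=True):
    """Expand the leftmost (or rightmost) nonterminal once per entry in choices.

    Each entry is the index of the production to apply to the nonterminal
    being expanded. Returns the list of all sentential forms produced.
    """
    forms = [tuple(form)]
    for choice in choices:
        current = forms[-1]
        positions = [i for i, sym in enumerate(current) if sym in G0]
        pos = positions[0] if leftmost else positions[-1]
        rhs = G0[current[pos]][choice]
        forms.append(current[:pos] + rhs + current[pos + 1:])
    return forms


left = derive(["E"], [0, 0, 1, 0, 1, 1], leftmost=True)
right = derive(["E"], [0, 1, 0, 1, 1, 0], leftmost=False)

for name, forms in (("lm", left), ("rm", right)):
    print(name, " => ".join(" ".join(f) for f in forms))
```

Both runs apply the same six productions and end in the same terminal string; only the order in which nonterminals are expanded differs.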

Context-Free Grammars: Concepts and Notation (Cont’d.)
A parse tree
  Rooted by the start symbol
  Its leaves are grammar symbols or λ

Context-Free Grammars: Concepts and Notation (Cont’d.)
A phrase of a sentential form is a sequence of symbols descended from a single nonterminal in the parse tree.
Simple or prime phrase
  A simple phrase is a sequence of symbols directly derived from a nonterminal.
The handle of a sentential form is the leftmost simple phrase.

Errors in Context-Free Grammars
CFGs are a definitional mechanism. They may have errors, just as programs may.
Flawed CFG
  Useless nonterminals
    Unreachable
    Derive no terminal string
  Example (S is the start symbol):
    S → A | B
    A → a
    B → B b
    C → c
  Nonterminal C cannot be reached from S
  Nonterminal B derives no terminal string
Do exercise 7.
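
Both kinds of useless nonterminal can be found with simple marking passes. The sketch below is an illustration in Python under assumed names (`productive`, `reachable`) and an assumed dictionary encoding of the example grammar; it is not code from the slides.

```python
def productive(grammar, terminals):
    """Return the nonterminals that derive at least one terminal string."""
    marked = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            # lhs is productive if some right-hand side consists only of
            # terminals and already-marked nonterminals.
            if any(all(s in terminals or s in marked for s in rhs)
                   for rhs in alternatives):
                marked.add(lhs)
                changed = True
    return marked


def reachable(grammar, start):
    """Return the nonterminals that appear in some derivation from start."""
    seen = {start}
    work = [start]
    while work:
        lhs = work.pop()
        for rhs in grammar.get(lhs, []):
            for sym in rhs:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    work.append(sym)
    return seen


# The flawed grammar from the slide.
grammar = {
    "S": [("A",), ("B",)],
    "A": [("a",)],
    "B": [("B", "b")],
    "C": [("c",)],
}
terminals = {"a", "b", "c"}

unproductive = set(grammar) - productive(grammar, terminals)
unreachable = set(grammar) - reachable(grammar, "S")
print("derive no terminal string:", unproductive)
print("unreachable:", unreachable)
```

On this grammar the first pass reports B and the second reports C, matching the two observations on the slide.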

Errors in Context-Free Grammars (Cont’d.)
Ambiguous
  Grammars that allow different parse trees for the same terminal string
  It is impossible to decide whether a given CFG is ambiguous: no algorithm answers this question for every CFG
  For certain grammar classes, we can prove that constituent grammars are unambiguous
Wrong language
  The grammar may define a language other than the one intended
  A general comparison algorithm applicable to all CFGs is known to be impossible
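
Although ambiguity cannot be decided for CFGs in general, a particular string can always be checked against a particular grammar. The sketch below counts parse trees for one small grammar, E → E + E | id, which is an assumed textbook-style example and does not appear in the slides; the function name `count_parse_trees` is likewise hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under the assumed grammar E -> E + E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every "+" strictly inside the span as the root operator.
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id"]))
print(count_parse_trees(["id", "+", "id", "+", "id"]))
```

A count greater than one for any terminal string shows that this grammar is ambiguous. The converse check over all strings is what cannot be automated in general.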

Transforming Extended BNF Grammars
Extended BNF ⇒ BNF
Extended BNF allows
  Square brackets [ ], which enclose an optional item
  Braces { }, which enclose an optional list (zero or more repetitions)
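
One common way to remove the two extensions is to introduce a fresh nonterminal N for each bracketed part: [α] becomes N with N → α | λ, and {α} becomes N with N → α N | λ. The Python sketch below illustrates that idea. The tuple encoding of `opt` and `rep` items, the function name `ebnf_to_bnf`, and the generated names N1, N2, ... are assumptions; the sketch also assumes the generated names do not clash with existing symbols.

```python
import itertools


def ebnf_to_bnf(productions):
    """Rewrite extended BNF productions into plain BNF.

    productions: list of (lhs, rhs) pairs, rhs being a list of items.
    An item is a symbol (str), ("opt", items) for [ ... ],
    or ("rep", items) for { ... }.
    Returns (lhs, rhs) pairs with plain symbols only; [] means lambda.
    """
    counter = itertools.count(1)
    result = []

    def flatten(items):
        flat = []
        for item in items:
            if isinstance(item, str):
                flat.append(item)
                continue
            kind, body = item
            fresh = f"N{next(counter)}"      # hypothetical fresh nonterminal
            inner = flatten(body)
            if kind == "opt":                # [body]  =>  N -> body | lambda
                result.append((fresh, inner))
            else:                            # {body}  =>  N -> body N | lambda
                result.append((fresh, inner + [fresh]))
            result.append((fresh, []))
            flat.append(fresh)
        return flat

    for lhs, rhs in productions:
        result.append((lhs, flatten(rhs)))
    return result


# S -> a [ b ] { c d }
for lhs, rhs in ebnf_to_bnf([("S", ["a", ("opt", ["b"]), ("rep", ["c", "d"])])]):
    print(lhs, "->", " ".join(rhs) or "λ")
```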

Parsers and Recognizers
Recognizer
  An algorithm that does a Boolean-valued test
    Is this input syntactically valid?
Parser
  Answers more general questions
    Is this input valid?
    And, if it is, what is its structure (parse tree)?

Parsers and Recognizers (Cont’d.)
Two general approaches to parsing
Top-down parser
  Expands the parse tree (via predictions) in a depth-first manner
  Preorder traversal of the parse tree
  Predictive in nature
  Traces a leftmost derivation ($\Rightarrow_{lm}$)
  LL parser, recursive descent
Bottom-up parser
  Begins at the bottom of the parse tree (the leaves, which are terminal symbols) and determines the productions used to generate the leaves
  Postorder traversal of the parse tree
  Traces a rightmost derivation ($\Rightarrow_{rm}$), in reverse
  LR parser, shift-reduce parser

Parsers and Recognizers (Cont’d.)
To parse
  begin SimpleStmt; SimpleStmt; end $
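
The difference between a recognizer and a parser can be seen on this input with a small recursive-descent (top-down) sketch in Python. The grammar used here is an assumption, since the slides that define it are not part of this transcript: Program → begin Stmts end $, Stmts → Stmt ; Stmts | λ, Stmt → SimpleStmt | begin Stmts end. The names `parse`, `recognize`, and `ParseError` are hypothetical.

```python
class ParseError(Exception):
    pass


def parse(tokens):
    """Return a parse tree as nested tuples (nonterminal, child, ...)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise ParseError(f"expected {expected!r}, found {peek()!r}")
        pos += 1
        return expected

    def program():
        return ("Program", match("begin"), stmts(), match("end"), match("$"))

    def stmts():
        if peek() in ("SimpleStmt", "begin"):    # predict Stmts -> Stmt ; Stmts
            return ("Stmts", stmt(), match(";"), stmts())
        return ("Stmts",)                         # predict Stmts -> lambda

    def stmt():
        if peek() == "SimpleStmt":
            return ("Stmt", match("SimpleStmt"))
        return ("Stmt", match("begin"), stmts(), match("end"))

    tree = program()
    if pos != len(tokens):
        raise ParseError("unexpected tokens after $")
    return tree


def recognize(tokens):
    """Boolean-valued test: is the token sequence syntactically valid?"""
    try:
        parse(tokens)
        return True
    except ParseError:
        return False


tokens = ["begin", "SimpleStmt", ";", "SimpleStmt", ";", "end", "$"]
print(recognize(tokens))
print(parse(tokens))
```

`recognize` answers only the yes/no question, while `parse` also returns the structure. The tree is built in the order a top-down parser predicts productions, that is, in preorder.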

Parsers and Recognizers (Cont’d.)
Naming of parsing techniques
  Top-down: LL
  Bottom-up: LR
  First letter: the way the token sequence is read
    L: left to right
  Second letter: the derivation that is traced
    L: leftmost
    R: rightmost

Grammar Analysis Algorithms
Goal of this section:
  Discuss a number of important analysis algorithms for grammars

Grammar Analysis Algorithms (Cont’d.)
The data structure of a grammar G
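
The slide's own definition of this data structure is not preserved in the transcript. As a rough illustration only, a grammar $G = (V_t, V_n, S, P)$ could be held in a record like the following Python sketch. All class and field names here are assumptions, and Python is used for brevity even though the slides refer to C.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Production:
    lhs: str      # a nonterminal
    rhs: tuple    # sequence of vocabulary symbols; () stands for lambda


@dataclass
class Grammar:
    terminals: set
    nonterminals: set
    start_symbol: str
    productions: list = field(default_factory=list)

    @property
    def vocabulary(self):
        """V = V_n union V_t."""
        return self.terminals | self.nonterminals

    def productions_for(self, nonterminal):
        return [p for p in self.productions if p.lhs == nonterminal]


# Grammar G0 from the derivation examples.
G0 = Grammar(
    terminals={"(", ")", "V", "F", "+"},
    nonterminals={"E", "Prefix", "Tail"},
    start_symbol="E",
    productions=[
        Production("E", ("Prefix", "(", "E", ")")),
        Production("E", ("V", "Tail")),
        Production("Prefix", ("F",)),
        Production("Prefix", ()),
        Production("Tail", ("+", "E")),
        Production("Tail", ()),
    ],
)
print(len(G0.productions), sorted(G0.vocabulary))
```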

Grammar Analysis Algorithms (Cont’d.)
What nonterminals can derive λ?
  A ⇒ BCD ⇒ BC ⇒ B ⇒ λ
  An iterative marking algorithm
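
The iterative marking idea can be sketched as follows: mark every nonterminal that has a right-hand side made up only of already-marked symbols (an empty right-hand side qualifies trivially), and repeat until nothing changes. The function name `derives_lambda`, the dictionary encoding, and the extra nonterminal X in the example are assumptions for illustration.

```python
def derives_lambda(grammar):
    """Return the set of nonterminals that can derive lambda.

    grammar: dict mapping nonterminal -> list of right-hand sides (tuples);
    the empty tuple () stands for lambda.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # all(...) over an empty right-hand side is True, which marks
            # nonterminals with a direct lambda production on the first pass.
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


grammar = {
    "A": [("B", "C", "D")],
    "B": [()],
    "C": [()],
    "D": [()],
    "X": [("A", "x")],   # x is a terminal, so X cannot derive lambda
}
print(sorted(derives_lambda(grammar)))
```

A is marked only after B, C, and D have been marked, which is why a single pass over the productions is not enough in general.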

Grammar Analysis Algorithms (Cont’d.)
First($\alpha$)
  The set of all the terminal symbols that can begin a sentential form derivable from $\alpha$
  If $\alpha$ is the right-hand side of a production, then First($\alpha$) contains terminal symbols that begin strings derivable from $\alpha$
  $\mathrm{First}(\alpha) = \{a \in V_t \mid \alpha \Rightarrow^{*} a\beta\} \cup (\text{if } \alpha \Rightarrow^{*} \lambda \text{ then } \{\lambda\} \text{ else } \emptyset)$
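
By definition, FIRST(X) can be computed with the following three steps:
  1. If $X \in T$ (X is a terminal), then FIRST(X) = {X}.
  2. If $X \in N$ (X is a nonterminal) and $X \to \lambda$ is a production, then add λ to FIRST(X).
  3. If $X \in N$ and $X \to Y_1 Y_2 \ldots Y_n$ is a production, then add all non-λ elements of FIRST($Y_1$) to FIRST(X); if λ ∈ FIRST($Y_1$), then add all non-λ elements of FIRST($Y_2$) to FIRST(X); and so on. If λ ∈ FIRST($Y_i$) for every $i$ from 1 to $n$, then add λ to FIRST(X).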

Grammar G is defined as follows:
  E  → T E’
  E’ → + T E’ | λ
  T  → F T’
  T’ → * F T’ | λ
  F  → ( E ) | id
Its FIRST sets are:
  Nonterminal   FIRST
  E             ( id
  E’            + λ
  T             ( id
  T’            * λ
  F             ( id
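
The three FIRST steps can be turned into a fixed-point computation. The following is a minimal Python sketch under assumed names (`compute_first`, `first_of_string`) and an assumed dictionary encoding of grammar G, with E' written using an ASCII apostrophe. It is an illustration, not the fill_first_set() routine the slides refer to.

```python
LAMBDA = "λ"


def first_of_string(symbols, first):
    """FIRST of a symbol string, given FIRST of every single symbol."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {LAMBDA}
        if LAMBDA not in first[sym]:
            return result
    result.add(LAMBDA)      # every symbol (or none at all) can derive lambda
    return result


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}             # step 1
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                                  # steps 2 and 3, until stable
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)   # () yields {lambda}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
first = compute_first(G, {"+", "*", "(", ")", "id"})
for nt in G:
    print(nt, sorted(first[nt]))
```

`first_of_string` is also what the definition First($\alpha$) describes for an arbitrary string, so the same helper can be reused when Follow sets are computed.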

Follow(A)
  A is any nonterminal
  Follow(A) is the set of terminals that may follow A in some sentential form
  $\mathrm{Follow}(A) = \{a \in V_t \mid S \Rightarrow^{*} \cdots A a \cdots\} \cup (\text{if } S \Rightarrow^{+} \alpha A \text{ then } \{\lambda\} \text{ else } \emptyset)$

By definition, FOLLOW(X) can be computed with the following three steps:
  1. Put $ into FOLLOW(S).
  2. For each production A → αBβ, add all non-λ elements of FIRST(β) to FOLLOW(B).
  3. For each production A → αB, or A → αBβ where λ ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).

Grammar G is defined as follows:
  E  → T E’
  E’ → + T E’ | λ
  T  → F T’
  T’ → * F T’ | λ
  F  → ( E ) | id
Its FIRST sets are:
  Nonterminal   FIRST
  F             ( id
  T’            * λ
  T             ( id
  E’            + λ
  E             ( id
Its FOLLOW sets are:
  Nonterminal   FOLLOW
  E             $ )
  E’            $ )
  T             + $ )
  T’            + $ )
  F             * + $ )
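
The FOLLOW table can be reproduced by iterating the three FOLLOW steps until no set grows. The Python sketch below is self-contained, so it repeats a compact FIRST computation; the names `compute_follow`, `compute_first`, and `first_of_string` and the dictionary encoding of G are assumptions, not the fill_follow_set() routine from the slides.

```python
LAMBDA = "λ"
END = "$"


def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        result |= first[sym] - {LAMBDA}
        if LAMBDA not in first[sym]:
            return result
    result.add(LAMBDA)
    return result


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def compute_follow(grammar, terminals, start):
    first = compute_first(grammar, terminals)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                          # step 1
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:          # only nonterminals have FOLLOW
                        continue
                    beta = first_of_string(rhs[i + 1:], first)
                    new = beta - {LAMBDA}           # step 2
                    if LAMBDA in beta:              # step 3: beta empty or nullable
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
follow = compute_follow(G, {"+", "*", "(", ")", "id"}, "E")
for nt in G:
    print(nt, sorted(follow[nt]))
```

The sketch follows the three-step formulation and uses $ as the end marker, so its output is directly comparable with the FOLLOW table for G.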

Grammar Analysis Algorithms (Cont’d.)
Definition of C data structures and subroutines
  first_set[X]
    contains terminal symbols and λ
    X is any single vocabulary symbol
  follow_set[A]
    contains terminal symbols and λ
    A is a nonterminal symbol

Grammar G0:
  E → Prefix ( E )
  E → V Tail
  Prefix → F
  Prefix → λ
  Tail → + E
  Tail → λ
The execution of fill_follow_set() using grammar G0
  Resulting follow sets:
    E        $, )
    Prefix   (
    Tail     $, )

More examples
Grammar:
  S → a S e
  S → B
  B → b B e
  B → C
  C → c C e
  C → d
The execution of fill_first_set()
The execution of fill_follow_set()
  Resulting follow sets:
    S   $, e
    B   $, e
    C   $, e