
Post on 22-Dec-2015


1

Contents
  Introduction
  A Simple Compiler
  Scanning – Theory and Practice
  Grammars and Parsing
  LL(1) Parsing
  LR Parsing
  Lex and yacc
  Semantic Processing
  Symbol Tables
  Run-time Storage Organization
  Code Generation and Local Code Optimization
  Global Optimization

2

Chapter 4: Grammars and Parsing

3

Outline
  Context-Free Grammars
  Errors in Context-Free Grammars
  Transforming Extended BNF Grammars
  Parsers and Recognizers
  Grammar Analysis Algorithms

4

Context-Free Grammars: Concepts and Notation
A context-free grammar G = (Vt, Vn, S, P)
  A finite terminal vocabulary Vt
    The token set produced by the scanner
  A finite nonterminal vocabulary Vn
    Intermediate symbols
  A start symbol S ∈ Vn that starts all derivations
    Also called the goal symbol
  P, a finite set of productions (rewriting rules) of the form A → X1 X2 ... Xm
    A ∈ Vn, Xi ∈ Vn ∪ Vt, 1 ≤ i ≤ m
    A → λ is a valid production (λ denotes the empty string)
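The 4-tuple definition can be written down directly as a data structure. The following is a minimal sketch, not the representation used later in the deck: the class name `Grammar`, its field names, the `check` method, and the example grammar S → a S b | λ are all assumptions made for illustration. The requirement that Vt and Vn be disjoint is the usual convention and is not stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for a CFG G = (Vt, Vn, S, P)."""
    terminals: frozenset      # Vt
    nonterminals: frozenset   # Vn
    start: str                # S
    productions: tuple        # P: (lhs, rhs) pairs; rhs is a tuple, () encodes lambda

    def check(self):
        """Return True if the tuple satisfies the definition of a CFG."""
        # Assumption: Vt and Vn are disjoint (the usual convention).
        if self.terminals & self.nonterminals:
            return False
        # The start symbol must be a nonterminal.
        if self.start not in self.nonterminals:
            return False
        vocabulary = self.terminals | self.nonterminals
        # Every production is A -> X1 ... Xm with A in Vn and each Xi in V.
        return all(
            lhs in self.nonterminals and all(x in vocabulary for x in rhs)
            for lhs, rhs in self.productions
        )


# Hypothetical example grammar: S -> a S b | lambda
example = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("a", "S", "b")), ("S", ())),
)
```

An empty right-hand side tuple stands for A → λ, so no special symbol for λ is needed in the vocabulary.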

5

Context-Free Grammars: Concepts and Notation (Cont’d.)
Other notations
  The vocabulary V of a CFG is the set of terminal and nonterminal symbols
    V = Vn ∪ Vt
  L(G), the set of terminal strings derivable from S, is the context-free language of grammar G
Notational conventions
  a, b, c, ... denote symbols in Vt
  A, B, C, ... denote symbols in Vn
  U, V, W, ... denote symbols in V
  α, β, γ, ... denote strings in V*
  u, v, w, ... denote strings in Vt*

6

Context-Free Grammars: Concepts and Notation (Cont’d.)
Derivation
  One-step derivation: if A → γ, then αAβ ⇒ αγβ
  One or more steps: ⇒+
  Zero or more steps: ⇒*
If S ⇒* β, then β is said to be a sentential form of the CFG.
  SF(G) is the set of sentential forms of grammar G.
L(G) = {x ∈ Vt* | S ⇒+ x}
  L(G) = SF(G) ∩ Vt*; that is, the language of G is simply those sentential forms of G that are terminal strings.

7

Context-Free Grammars: Concepts and Notation (Cont’d.)
Leftmost derivation, used by top-down parsers
  Notation: ⇒lm, ⇒lm+, ⇒lm*
  A sentential form produced via a leftmost derivation sequence is called a left sentential form.
E.g., the leftmost derivation of F(V+V)

Grammar G0
  E → Prefix ( E )
  E → V Tail
  Prefix → F
  Prefix → λ
  Tail → + E
  Tail → λ

E ⇒lm Prefix(E)
  ⇒lm F(E)
  ⇒lm F(V Tail)
  ⇒lm F(V+E)
  ⇒lm F(V+V Tail)
  ⇒lm F(V+V)

8

Context-Free Grammars: Concepts and Notation (Cont’d.)
Rightmost derivation (canonical derivation)
  Notation: ⇒rm, ⇒rm+, ⇒rm*
  Used by bottom-up parsers
  A sentential form produced via a rightmost derivation sequence is called a right sentential form.
E.g., the rightmost derivation of F(V+V)

Grammar G0
  E → Prefix ( E ) | V Tail
  Prefix → F | λ
  Tail → + E | λ

E ⇒rm Prefix(E)
  ⇒rm Prefix(V Tail)
  ⇒rm Prefix(V+E)
  ⇒rm Prefix(V+V Tail)
  ⇒rm Prefix(V+V)
  ⇒rm F(V+V)

Same number of steps as the leftmost derivation, but in a different order

9

Context-Free Grammars: Concepts and Notation (Cont’d.)
A parse tree
  Rooted by the start symbol
  Its leaves are grammar symbols or λ

10

Context-Free Grammars: Concepts and Notation (Cont’d.)
A phrase of a sentential form is a sequence of symbols descended from a single nonterminal in the parse tree.
Simple or prime phrase
  A simple phrase is a sequence of symbols directly derived from a nonterminal.
The handle of a sentential form is the leftmost simple phrase.

11

Errors in Context-Free Grammars
CFGs are a definitional mechanism. They may have errors, just as programs may.
Flawed CFGs
  Useless nonterminals
    Unreachable
    Derive no terminal string
Example (S is the start symbol)
  S → A | B
  A → a
  B → B b
  C → c
  Nonterminal C cannot be reached from S
  Nonterminal B derives no terminal string
Do exercise 7.
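Both kinds of useless nonterminal can be found with simple fixed-point computations. The sketch below runs on the example grammar S → A | B, A → a, B → B b, C → c; the function names and the dictionary encoding of the grammar are assumptions made for this illustration.

```python
# Example grammar from the slide; S is the start symbol.
GRAMMAR = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}


def productive_nonterminals(grammar):
    """Nonterminals that derive at least one terminal string."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                # A symbol that is not a key of the grammar is a terminal.
                if all(sym not in grammar or sym in productive for sym in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return productive


def reachable_nonterminals(grammar, start):
    """Nonterminals that appear in some sentential form derived from start."""
    reached = {start}
    worklist = [start]
    while worklist:
        lhs = worklist.pop()
        for rhs in grammar[lhs]:
            for sym in rhs:
                if sym in grammar and sym not in reached:
                    reached.add(sym)
                    worklist.append(sym)
    return reached


unproductive = set(GRAMMAR) - productive_nonterminals(GRAMMAR)
unreachable = set(GRAMMAR) - reachable_nonterminals(GRAMMAR, "S")
print("derive no terminal string:", sorted(unproductive))
print("unreachable from S:", sorted(unreachable))
```

On this grammar the first set is {B} and the second is {C}, matching the two observations on the slide.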

12

Errors in Context-Free Grammars (Cont’d.)
Ambiguous grammars
  Grammars that allow different parse trees for the same terminal string
  In general it is undecidable whether a given CFG is ambiguous: no algorithm can answer the question for every CFG

13

Errors in Context-Free Grammars (Cont’d.)
In general it is undecidable whether a given CFG is ambiguous
  For certain grammar classes, we can prove that the grammars in the class are unambiguous
Wrong language
  The grammar may define a language other than the intended one
  A general comparison algorithm applicable to all CFGs is known to be impossible

14

Transforming Extended BNF Grammars
Extended BNF → BNF
Extended BNF allows
  Square brackets [ ], which enclose an optional item
  Braces { }, which enclose an optional list (zero or more repetitions)
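A common way to remove the two extensions is to introduce a fresh nonterminal N for each bracketed item: A → α [X1 ... Xn] β becomes A → α N β with N → X1 ... Xn | λ, and A → α {X1 ... Xn} β becomes A → α N β with N → X1 ... Xn N | λ. The following is a sketch of that transformation under stated assumptions: the tuple encoding `("opt", [...])` and `("rep", [...])`, the function name `ebnf_to_bnf`, and the generated names N1, N2, ... are all hypothetical, and the names are assumed not to clash with existing nonterminals.

```python
import itertools


def ebnf_to_bnf(productions):
    """Rewrite EBNF productions as plain BNF.

    productions: list of (lhs, rhs). Each rhs item is a symbol string,
    ("opt", [...]) for an optional item [ ... ], or ("rep", [...]) for an
    optional list { ... }. An empty rhs list encodes lambda.
    """
    counter = itertools.count(1)
    result = []
    pending = list(productions)
    while pending:
        lhs, rhs = pending.pop(0)
        plain = []
        for item in rhs:
            if isinstance(item, str):
                plain.append(item)
                continue
            kind, body = item
            fresh = f"N{next(counter)}"  # hypothetical naming scheme
            plain.append(fresh)
            if kind == "opt":
                # N -> X1 ... Xn
                pending.append((fresh, list(body)))
            else:
                # N -> X1 ... Xn N
                pending.append((fresh, list(body) + [fresh]))
            # N -> lambda
            pending.append((fresh, []))
        result.append((lhs, plain))
    return result


# Hypothetical EBNF production: A -> a [ b ] { c }
bnf = ebnf_to_bnf([("A", ["a", ("opt", ["b"]), ("rep", ["c"])])])
for lhs, rhs in bnf:
    print(lhs, "->", " ".join(rhs) if rhs else "lambda")
```

Because new productions are pushed back onto the pending list, nested brackets inside a bracketed body are expanded by the same loop.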

15

Parsers and Recognizers
Recognizer
  An algorithm that performs a Boolean-valued test
    Is this input syntactically valid?
Parser
  Answers more general questions
    Is this input valid?
    And, if it is, what is its structure (parse tree)?

16

Parsers and Recognizers (Cont’d.)
Two general approaches to parsing
  Top-down parser
    Expands the parse tree (via predictions) in a depth-first manner
    Preorder traversal of the parse tree
    Predictive in nature
    Traces a leftmost derivation (⇒lm)
    Examples: LL parser, recursive descent

17

Parsers and Recognizers (Cont’d.)
  Bottom-up parser
    Begins at the bottom of the parse tree (the leaves, which are terminal symbols) and determines the productions used to generate the leaves
    Postorder traversal of the parse tree
    Traces a rightmost derivation (⇒rm) in reverse
    Examples: LR parser, shift-reduce parser

18

Parsers and Recognizers (Cont’d.)
To parse
  begin SimpleStmt; SimpleStmt; end $
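The difference between a recognizer and a top-down parser can be seen on this input. The grammar for the example is not preserved in the transcript, so the sketch below assumes one: Program → begin Stmts end $, Stmts → Stmt ; Stmts | λ, Stmt → SimpleStmt | begin Stmts end. The function names and the nested-tuple parse tree are likewise assumptions made for illustration.

```python
def parse_program(tokens):
    """Recursive-descent parser for the assumed grammar.

    Returns the parse tree as nested tuples (nonterminal, children...),
    or raises SyntaxError if the token list is not a valid program.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos != len(tokens) else None

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(
                f"expected {expected!r}, found {peek()!r} at token {pos}")
        pos += 1
        return expected

    def stmts():
        if peek() in ("SimpleStmt", "begin"):
            # Predict Stmts -> Stmt ; Stmts
            return ("Stmts", stmt(), match(";"), stmts())
        # Predict Stmts -> lambda
        return ("Stmts",)

    def stmt():
        if peek() == "SimpleStmt":
            return ("Stmt", match("SimpleStmt"))
        return ("Stmt", match("begin"), stmts(), match("end"))

    tree = ("Program", match("begin"), stmts(), match("end"), match("$"))
    if pos != len(tokens):
        raise SyntaxError("unexpected input after $")
    return tree


def recognize(tokens):
    """Recognizer: answers only whether the input is syntactically valid."""
    try:
        parse_program(tokens)
        return True
    except SyntaxError:
        return False


tokens = "begin SimpleStmt ; SimpleStmt ; end $".split()
print(recognize(tokens))
print(parse_program(tokens))
```

The parser builds the tree in preorder, choosing each production from the next token alone, which is the predictive behaviour described for top-down parsing.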

19

20

21

Parsers and Recognizers (Cont’d.)
Naming of parsing techniques
  Top-down: LL
  Bottom-up: LR
First letter: the way the token sequence is parsed (L: left to right)
Second letter: the derivation that is traced
  L: Leftmost
  R: Rightmost

22

Grammar Analysis Algorithms
Goal of this section:
  Discuss a number of important analysis algorithms for grammars

23

Grammar Analysis Algorithms (Cont’d.)
The data structure of a grammar G

24

Grammar Analysis Algorithms (Cont’d.)
What nonterminals can derive λ?
  A ⇒ BCD ⇒ BC ⇒ B ⇒ λ
  An iterative marking algorithm
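The iterative marking idea can be sketched as follows: mark every nonterminal that has a λ production, then repeatedly mark any nonterminal that has a right-hand side made up entirely of marked nonterminals, until nothing changes. The grammar below is a hypothetical one chosen so that A ⇒ BCD ⇒ BC ⇒ B ⇒ λ is possible; the function name `derives_lambda` is also an assumption.

```python
# Hypothetical grammar; an empty list encodes a lambda right-hand side.
GRAMMAR = {
    "A": [["B", "C", "D"]],
    "B": [["b"], []],
    "C": [[]],
    "D": [["d"], []],
    "X": [["a", "A"]],
}


def derives_lambda(grammar):
    """Return the set of nonterminals that can derive lambda."""
    marked = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            # A right-hand side derives lambda if every symbol in it is a
            # marked nonterminal; this holds trivially for an empty one.
            if any(all(sym in marked for sym in rhs) for rhs in alternatives):
                marked.add(lhs)
                changed = True
    return marked


nullable = derives_lambda(GRAMMAR)
print(sorted(nullable))
```

X is never marked because its only right-hand side starts with the terminal a, which can never be erased.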

25

26

Grammar Analysis Algorithms (Cont’d.)
First(α)
  The set of all the terminal symbols that can begin a sentential form derivable from α
  If α is the right-hand side of a production, then First(α) contains the terminal symbols that begin strings derivable from α
  First(α) = {a ∈ Vt | α ⇒* aβ} ∪ (if α ⇒* λ then {λ} else ∅)

27

By definition, the FIRST(X) set can be computed in the following three steps (T is the terminal set, N the nonterminal set):
  1. If X ∈ T, then FIRST(X) = {X}.
  2. If X ∈ N and X → λ, then add λ to FIRST(X).
  3. If X ∈ N and X → Y1 Y2 ... Yn, then add all non-λ elements of FIRST(Y1) to FIRST(X); if λ ∈ FIRST(Y1), then add all non-λ elements of FIRST(Y2) to FIRST(X); ...; if λ ∈ FIRST(Yn), then add λ to FIRST(X).

28

Grammar G is defined as follows:
  E → T E'
  E' → + T E' | λ
  T → F T'
  T' → * F T' | λ
  F → ( E ) | id

Its FIRST sets are:
        FIRST
  E     ( id
  E'    + λ
  T     ( id
  T'    * λ
  F     ( id
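The three steps can be turned into a fixed-point computation and checked against the table for grammar G. This is a minimal sketch rather than the deck's own fill_first_set(): the names `compute_first`, `first_of_string` and `LAMBDA`, and the dictionary encoding of G, are assumptions made for illustration.

```python
LAMBDA = "λ"

# Grammar G; an empty list encodes a lambda right-hand side.
G = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string, given the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        # Step 1: a terminal X has FIRST(X) = {X}.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {LAMBDA}
        if LAMBDA not in sym_first:
            return result
    # Steps 2 and 3: every symbol can vanish (or the string is empty).
    result.add(LAMBDA)
    return result


def compute_first(grammar):
    """Iterate until no FIRST set of a nonterminal grows any further."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new.issubset(first[lhs]):
                    first[lhs] |= new
                    changed = True
    return first


FIRST = compute_first(G)
for nonterminal in G:
    print(nonterminal, sorted(FIRST[nonterminal]))
```

The helper `first_of_string` also gives First(α) for an arbitrary string α, which is what the Follow computation needs for the part of a right-hand side that comes after a nonterminal.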

29

Follow(A)
  A is any nonterminal
  Follow(A) is the set of terminals that may follow A in some sentential form
  Follow(A) = {a ∈ Vt | S ⇒* ...Aa...} ∪ (if S ⇒+ αA then {λ} else ∅)

30

By definition, the FOLLOW(X) set can be computed in the following three steps:
  1. Put $ into FOLLOW(S).
  2. For each A → αBβ, add all non-λ elements of FIRST(β) to FOLLOW(B).
  3. For each A → αB, or A → αBβ where λ ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).

31

Grammar G is defined as follows:
  E → T E'
  E' → + T E' | λ
  T → F T'
  T' → * F T' | λ
  F → ( E ) | id

Its FIRST and FOLLOW sets are:
        FIRST     FOLLOW
  E     ( id      $ )
  E'    + λ       $ )
  T     ( id      + $ )
  T'    * λ       + $ )
  F     ( id      * + $ )
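The FOLLOW table can be reproduced by applying the three steps until nothing changes. The sketch below takes the FIRST sets from the table as given input rather than recomputing them; the names `compute_follow`, `first_of_string` and `LAMBDA` are assumptions, and this is not the deck's own fill_follow_set().

```python
LAMBDA = "λ"

# Grammar G; an empty list encodes a lambda right-hand side.
G = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

# FIRST sets copied from the table above.
FIRST = {
    "E": {"(", "id"},
    "E'": {"+", LAMBDA},
    "T": {"(", "id"},
    "T'": {"*", LAMBDA},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """FIRST of a symbol string; the empty string yields {lambda}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST[sym] if sym in FIRST else {sym}
        result |= sym_first - {LAMBDA}
        if LAMBDA not in sym_first:
            return result
    result.add(LAMBDA)
    return result


def compute_follow(grammar, start):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add("$")                        # step 1
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:        # only nonterminals have FOLLOW
                        continue
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {LAMBDA}   # step 2
                    if LAMBDA in beta_first:      # step 3: beta is empty or can vanish
                        new |= follow[lhs]
                    if not new.issubset(follow[sym]):
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(G, "E")
for nonterminal in G:
    print(nonterminal, sorted(FOLLOW[nonterminal]))
```

Step 3 covers both of its cases at once because `first_of_string` returns {λ} for an empty β.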

32

Grammar Analysis Algorithms (Cont’d.)
Definition of C data structures and subroutines
  first_set[X]
    contains terminal symbols and λ
    X is any single vocabulary symbol
  follow_set[A]
    contains terminal symbols and λ
    A is a nonterminal symbol

33

It is a subroutine of fill_first_set()

34

35

Grammar G0
  E → Prefix ( E ) | V Tail
  Prefix → F | λ
  Tail → + E | λ

The execution of fill_first_set() using grammar G0

36

37

Grammar G0
  E → Prefix ( E ) | V Tail
  Prefix → F | λ
  Tail → + E | λ

The execution of fill_follow_set() using grammar G0
  Resulting Follow sets
    E: $, )
    Prefix: (
    Tail: $, )

38

More examples
  S → a S e | B
  B → b B e | C
  C → c C e | d

The execution of fill_first_set()

The execution of fill_follow_set()
  Resulting Follow sets
    S: $, e
    B: $, e
    C: $, e

39

More examples
  S → A B c
  A → a | λ
  B → b | λ

The execution of fill_first_set()

The execution of fill_follow_set()
  Resulting Follow sets
    S: $
    A: b, c
    B: c