Winter 2007 SEG2101 Chapter 9
Chapter 9 Syntax Analysis

TRANSCRIPT

Page 1:

Chapter 9

Syntax Analysis

Page 2:

Contents

• Context free grammars

• Top-down parsing

• Bottom-up parsing

• Attribute grammars

• Dynamic semantics

• Tools for syntax analysis

• Chomsky’s hierarchy

Page 3:

The Role of the Parser

Page 4:

9.1: Context Free Grammars

• A context-free grammar consists of terminals, nonterminals, a start symbol, and productions.

• Terminals are the basic symbols from which strings are formed.

• Nonterminals are syntactic variables that denote sets of strings.

• One nonterminal is distinguished as the start symbol.

• The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings.

• A language that can be generated by a grammar is said to be a context-free language.

Page 5:

Example of Grammar

Page 6:

Notational Conventions

• Notation: Aho p.166
• Example: Aho p.167

E → E A E | (E) | -E | id
A → + | - | * | / | ↑

Page 7:

Derivations

• E ⇒ -E is read "E derives -E".
• E ⇒ -E ⇒ -(E) ⇒ -(id) is called a derivation of -(id) from E.
• If A → γ is a production and α and β are arbitrary strings of grammar symbols, we say αAβ ⇒ αγβ.
• If α1 ⇒ α2 ⇒ … ⇒ αn, we say α1 derives αn.

Page 8:

Derivations (II)

• ⇒ means "derives in one step."
• ⇒* means "derives in zero or more steps."
  – α ⇒* α; if α ⇒* β and β ⇒ γ, then α ⇒* γ
• ⇒+ means "derives in one or more steps."
• If S ⇒* α, where α may contain nonterminals, then we say that α is a sentential form.

Page 9:

Derivations (III)

• G: grammar, S: start symbol, L(G): the language generated by G.

• Strings in L(G) may contain only terminal symbols of G.

• A string of terminals w is said to be in L(G) if and only if S ⇒+ w.

• The string w is called a sentence of G.

• A language that can be generated by a grammar is said to be a context-free language.

• If two grammars generate the same language, the grammars are said to be equivalent.


Page 10:

Derivations (IV)

E → E A E | (E) | -E | id
A → + | - | * | / | ↑

• The string -(id+id) is a sentence of the above grammar because E ⇒ -E ⇒ -(E) ⇒ -(E A E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id). We write E ⇒* -(id+id).

Page 11:

Parse Tree

E → E + E | E * E | (E) | -E | id

Page 12:

Parse Tree (II)

Page 13:

Two Parse Trees

Page 14:

Ambiguity

• A grammar that produces more than one parse tree for some sentence is said to be ambiguous.

Page 15:

Eliminating Ambiguity

• Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.
  – E.g. "match each else with the closest unmatched then"

Page 16:

Eliminating Left Recursion

• A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α.

• A → Aα | β can be replaced by:
  A → βA′
  A′ → αA′ | ε

• A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn can be replaced by:
  A → β1A′ | β2A′ | … | βnA′
  A′ → α1A′ | α2A′ | … | αmA′ | ε

Page 17:

Algorithm: Eliminating Left Recursion

Page 18:

Examples

S → Aa | b
A → Ac | Sd | ε

Substituting S into the A-productions:
A → Ac | Aad | bd | ε

Eliminating the left recursion:
S → Aa | b
A → bdA′ | A′
A′ → cA′ | adA′ | ε
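The rewrite rule above can be sketched in Python; the grammar representation (tuples of symbols, ε as the empty tuple) and the function name are mine, not the slides':

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Replace  A -> A a1 | ... | A am | b1 | ... | bn  with
    A  -> b1 A' | ... | bn A'
    A' -> a1 A' | ... | am A' | ε   (ε shown as the empty tuple)."""
    prime = head + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}          # no left recursion: nothing to do
    return {
        head: [beta + (prime,) for beta in betas],
        prime: [alpha + (prime,) for alpha in alphas] + [()],
    }

# The substituted grammar from the slide: A -> A c | A a d | b d | ε
new = eliminate_immediate_left_recursion(
    "A", [("A", "c"), ("A", "a", "d"), ("b", "d"), ()])
```

Running it reproduces the slide's answer: A → bdA′ | A′ and A′ → cA′ | adA′ | ε.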

Page 19:

Left Factoring

• Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.

• The basic idea is that when it is not clear which of two alternative productions to use to expand a nonterminal A, we may be able to rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice.

• stmt → if expr then stmt else stmt
       | if expr then stmt

Page 20:

Algorithm: Left Factoring

Page 21:

Left Factoring (example p.178)

• A → αβ1 | αβ2 can be left-factored to:
  A → αA′
  A′ → β1 | β2

• The following grammar abstracts the dangling-else problem:
  – S → iEtS | iEtSeS | a
  – E → b
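One left-factoring step can be sketched the same way; the representation and names below are illustrative:

```python
def left_factor_once(head, alternatives):
    """One step of left factoring: pull out the longest common prefix
    shared by two or more alternatives, so A -> ab1 | ab2 becomes
    A -> a A',  A' -> b1 | b2.  Alternatives are tuples of symbols."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)
    for first, alts in groups.items():
        if first and len(alts) > 1:
            # longest common prefix of this group
            n = 0
            while all(len(a) > n and a[n] == alts[0][n] for a in alts):
                n += 1
            prefix = alts[0][:n]
            prime = head + "'"
            rest = [a for a in alternatives if a not in alts]
            return {
                head: rest + [prefix + (prime,)],
                prime: [a[n:] for a in alts],
            }
    return {head: alternatives}

# Dangling-else abstraction: S -> iEtS | iEtSeS | a
g = left_factor_once("S",
                     [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)])
```

This yields S → a | iEtSS′ with S′ → ε | eS, deferring the else decision exactly as the slide describes.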

Page 22:

9.2: Top Down Parsing

• Recursive-descent parsing
• Predictive parsers
• Nonrecursive predictive parsing
• FIRST and FOLLOW
• Construction of predictive parsing table
• LL(1) grammars
• Error recovery in predictive parsing (if time permits)

Page 23:

Recursive-Descent Parsing

• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.

• It can also be viewed as an attempt to construct a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.

Input string: w = cad

Grammar:

Page 24:

Predictive Parsers

• By carefully writing a grammar, eliminating left recursion, and left factoring the resulting grammar, we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser.

S → cAd
A → aA′
A′ → b | ε

Page 25:

Predictive Parser (II)

• Recursive-descent parsing is a top-down method of syntax analysis in which we execute a set of recursive procedures to process the input.

• A procedure is associated with each nonterminal of a grammar.

• Predictive parsing is a special case of recursive-descent parsing in which the lookahead symbol unambiguously determines the procedure selected for each nonterminal.

• The sequence of procedures called in processing the input implicitly defines a parse tree for the input.
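For the grammar S → cAd, A → aA′, A′ → b | ε from the previous slide, a predictive parser needs one procedure per nonterminal and no backtracking; a minimal sketch (class name and token handling are mine):

```python
class PredictiveParser:
    """One procedure per nonterminal, for  S -> cAd, A -> aA', A' -> b | ε.
    An illustrative sketch, not the parser from the slides."""
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def look(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else "$"

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t!r}, got {self.look()!r}")
        self.pos += 1

    def S(self):
        self.match("c"); self.A(); self.match("d")

    def A(self):
        self.match("a"); self.A_prime()

    def A_prime(self):
        if self.look() == "b":      # lookahead selects A' -> b
            self.match("b")
        # otherwise A' -> ε: consume nothing

    def parse(self):
        self.S(); self.match("$")   # require the whole input to be consumed
        return True

assert PredictiveParser(list("cabd")).parse()
assert PredictiveParser(list("cad")).parse()
```

The chain of calls S → A → A′ is exactly the preorder construction of the parse tree described above.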

Page 26:

Page 27:

Page 28:

Nonrecursive predictive parsing

Page 29:

Predictive Parsing Program

Page 30:

Parsing Table M

Input: id + id * id

Grammar:

Page 31:

Moves Made by Predictive Parser
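The moves on this slide can be sketched as a table-driven driver. I assume the usual non-left-recursive expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id; the table below is hand-built for that grammar, not copied from the slide's figure:

```python
# LL(1) parsing table; missing entries are errors.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Table-driven predictive parser: the stack starts as [$, E]."""
    stack = ["$", "E"]
    toks = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            body = TABLE.get((top, toks[i]))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {toks[i]!r}]")
            stack.extend(reversed(body))   # push right side, leftmost on top
        elif top == toks[i]:
            i += 1                         # match terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top!r}, got {toks[i]!r}")
    return True

assert parse(["id", "+", "id", "*", "id"])
```

Each iteration of the loop corresponds to one row of the "moves" table: expand a nonterminal using M, or match a terminal against the input.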

Page 32:

FIRST and FOLLOW

• If α is any string of grammar symbols, FIRST(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

• FOLLOW(A), for nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form, i.e. the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.

• If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).
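FIRST can be computed as a fixpoint over the productions; a sketch in Python, using the expression grammar as test data (the representation and names are mine, not the slides'). FOLLOW is computed by an analogous iteration over the same grammar:

```python
EPS = "ε"
# Grammar as nonterminal -> list of alternatives (tuples of symbols).
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
NT = set(GRAMMAR)

def first_of(alpha, first):
    """FIRST of a string of symbols, given FIRST sets for nonterminals."""
    out = set()
    for x in alpha:
        f = first[x] if x in NT else {x}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)            # every symbol in alpha can derive ε
    return out

def first_sets():
    """Iterate to a fixpoint: keep growing the sets until nothing changes."""
    first = {a: set() for a in NT}
    changed = True
    while changed:
        changed = False
        for a, alts in GRAMMAR.items():
            for alt in alts:
                n = len(first[a])
                first[a] |= first_of(alt, first)
                changed |= len(first[a]) != n
    return first
```

For this grammar the fixpoint gives FIRST(E) = {(, id}, FIRST(E′) = {+, ε}, and FIRST(T′) = {*, ε}.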

Page 33:

Compute FIRST(X)

Page 34:

Compute FOLLOW(A)

Page 35:

Construction of Predictive Parsing Tables

Page 36:

Example of Producing Parsing Table

Page 37:

LL(1) Grammars

• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).

• First L: scanning from left to right

• Second L: producing a leftmost derivation

• 1: using one input symbol of lookahead at each step to make parsing action decisions.

Page 38:

Properties of LL(1)

• No ambiguous or left recursive grammar can be LL(1).

• Grammar G is LL(1) iff whenever A| are two distinct productions of G and:– For no terminal a do both and derive strings

beginning with a. FIRST()FIRST()=

– At most one of and can derive the empty string.– If , the does not derive any string beginning with

a terminal in FOLLOW(A).FIRST(FOLLOW(A))FIRST(FOLLOW(A))=

*
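As a concrete check of these conditions, the left-factored dangling-else grammar S → iEtSS′ | a, S′ → eS | ε, E → b fails the third one. The FIRST and FOLLOW sets below are computed by hand and assumed correct:

```python
# Checking the LL(1) conditions for the two S'-alternatives:
# S' -> eS | ε, with FOLLOW(S') = {e, $}.
EPS = "ε"
FIRST = {("e", "S"): {"e"}, (EPS,): {EPS}}   # hand-computed FIRST sets
FOLLOW_S_PRIME = {"e", "$"}                  # hand-computed FOLLOW(S')

alpha, beta = ("e", "S"), (EPS,)

# Condition 1: FIRST sets of the two alternatives must be disjoint.
cond1 = not (FIRST[alpha] - {EPS}) & (FIRST[beta] - {EPS})

# Condition 3: since β derives ε, FIRST(α) must be disjoint from FOLLOW(S').
cond3 = not (FIRST[alpha] & FOLLOW_S_PRIME)

print(cond1, cond3)
```

Condition 3 fails because 'e' is in both FIRST(eS) and FOLLOW(S′), so the entry M[S′, e] is multiply defined and the grammar is not LL(1).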

Page 39:

LL(1) Grammars: Example

Page 40:

Non-LL(1) Grammar: Example

Page 41:

Error recovery in predictive parsing

• An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when nonterminal A is on top of the stack, a is the next input symbol, and the parsing-table entry M[A,a] is empty.

• Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.

Page 42:

How to select synchronizing set?

• Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.

• We might add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.

Page 43:

How to select synchronizing set? (II)

• If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. This may postpone some error detection, but cannot cause an error to be missed. This approach reduces the number of nonterminals that have to be considered during error recovery.

• If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal and issue a message saying that the terminal was inserted.

Page 44:

Example: error recovery

• "synch" indicates synchronizing tokens obtained from the FOLLOW set of the nonterminal in question.

• If the parser looks up entry M[A,a] and finds that it is blank, the input symbol a is skipped.

• If the entry is synch, then the nonterminal on top of the stack is popped.

• If a token on top of the stack does not match the input symbol, then we pop the token from the stack.
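These three rules can be sketched as the error branches of the table-driven driver. The toy grammar S → aSb | c, its table, and the message strings below are invented for illustration:

```python
# Panic-mode recovery for  S -> a S b | c,  FOLLOW(S) = {b, $}.
# The "synch" entries come from FOLLOW(S).
SYNCH = "synch"
TABLE = {
    ("S", "a"): ["a", "S", "b"],
    ("S", "c"): ["c"],
    ("S", "b"): SYNCH,
    ("S", "$"): SYNCH,
}

def parse_with_recovery(tokens):
    """Predictive-parser driver applying the three recovery rules.
    Returns the list of error messages (empty list means no errors)."""
    stack = ["$", "S"]
    toks = tokens + ["$"]
    i, errors = 0, []
    while len(stack) > 1:
        top = stack[-1]
        if top == "S":
            entry = TABLE.get((top, toks[i]))
            if entry is None:           # blank entry: skip the input symbol
                errors.append(f"skipped {toks[i]!r}")
                i += 1
            elif entry == SYNCH:        # synch entry: pop the nonterminal
                errors.append(f"popped {top}")
                stack.pop()
            else:                       # normal expansion
                stack.pop()
                stack.extend(reversed(entry))
        elif top == toks[i]:
            stack.pop(); i += 1         # match
        else:                           # mismatched terminal: pop ("insert") it
            errors.append(f"inserted {top!r}")
            stack.pop()
    return errors

assert parse_with_recovery(list("acb")) == []
assert parse_with_recovery(list("ac")) == ["inserted 'b'"]
```

On the erroneous input ac, the driver reports that b was "inserted" and finishes, exactly as the pop-the-terminal rule prescribes.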

Page 45:

Example: error recovery (II)

Page 46:

9.3: Bottom Up Parsing and LR Parsers

• Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (bottom) and working up towards the root (top).

• “Reducing” a string w to the start symbol of a grammar.

• At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Page 47:

Example

• Grammar:
  S → aABe
  A → Abc | b
  B → d

• Reduction:
  abbcde → aAbcde → aAde → aABe → S
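The reduction sequence can be replayed mechanically. The handle positions are supplied by hand here, since finding them is exactly the parser's job:

```python
# Right sides mapped to their left-side nonterminals.
GRAMMAR = {"aABe": "S", "Abc": "A", "b": "A", "d": "B"}

def reduce_at(sentential, pos, body):
    """Replace the handle `body` at position `pos` by its nonterminal."""
    assert sentential[pos:pos + len(body)] == body
    return sentential[:pos] + GRAMMAR[body] + sentential[pos + len(body):]

# (expected sentential form, handle position, handle) for each step
steps = [("abbcde", 1, "b"), ("aAbcde", 1, "Abc"),
         ("aAde", 2, "d"), ("aABe", 0, "aABe")]
s = "abbcde"
for expect, pos, body in steps:
    assert s == expect
    s = reduce_at(s, pos, body)
print(s)   # S
```

Note that the first step must reduce the first b, not the second; choosing the wrong substring would get the parser stuck, which is why handle selection is the hard part of shift-reduce parsing.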

Page 48:

Operator-Precedence Parsing

Grammar for expression

Can be rewritten as

With the precedence relations inserted, id + id * id can be written as:

Page 49:

LR(k) Parsers

• L: left-to-right scanning of the input

• R: constructing a rightmost derivation in reverse

• k: the number of input symbols of lookahead that are used in making parsing decisions.
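A minimal LR driver can be sketched with a hand-built LR(0) table for the toy grammar S → (S) | x; this grammar and table are my illustration, not the example in the slides:

```python
# ACTION[(state, lookahead)] is shift, reduce, or accept; GOTO handles
# nonterminals after a reduction.  Table built by hand for S -> (S) | x.
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (3, "("): ("r", "S", 1), (3, ")"): ("r", "S", 1), (3, "$"): ("r", "S", 1),
    (4, ")"): ("s", 5),
    (5, "("): ("r", "S", 3), (5, ")"): ("r", "S", 3), (5, "$"): ("r", "S", 3),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    stack = [0]                      # stack of states
    toks = tokens + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            raise SyntaxError(f"unexpected {toks[i]!r}")
        if act[0] == "s":            # shift: push the next state
            stack.append(act[1]); i += 1
        elif act[0] == "r":          # reduce: pop |body| states, then goto
            _, head, n = act
            del stack[-n:]
            stack.append(GOTO[(stack[-1], head)])
        else:                        # accept
            return True

assert lr_parse(list("((x))"))
```

The left-to-right scan and reverse rightmost derivation of the definition are visible in the trace: the parser shifts until a handle is on top, then reduces.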

Page 50:

LR Parsing

Page 51:

Shift-Reduce Parser

Page 52:

Example LR Parsing Table

Page 53:

9.4: Attribute Grammars

• An attribute grammar is a device used to describe more of the structure of a programming language than is possible with a context-free grammar.

• Some semantic properties can be evaluated at compile time; they are called "static semantics". Other properties are determined at execution time; they are called "dynamic semantics".

• The static semantics is often represented by semantic attributes which are associated with the nonterminals.

Page 54:

Attribute Grammars

• Grammars with added attributes, attribute computation functions, and predicate functions.

• Attributes: similar to variables

• Attribute computation functions: specify how attribute values are computed

• Predicate functions: state some of the syntax and static semantic rules of the language
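The attribute-computation idea can be sketched as a synthesized attribute evaluated bottom-up over a parse tree. The tree shape and function below are my illustration, not the example from the slides:

```python
# Synthesized attribute "val" for an expression grammar like
# E -> E + T | T,  T -> T * F | F,  F -> digit: each node's value is
# computed from its children's values by an attribute function.
def val(node):
    """node = ('num', n) for a leaf, or (op, left, right) for an
    operator node; returns the synthesized attribute val."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    a, b = val(left), val(right)
    return a + b if op == "+" else a * b

# parse tree of 3 + 4 * 5 (with * grouped tighter, as the grammar forces)
tree = ("+", ("num", 3), ("*", ("num", 4), ("num", 5)))
assert val(tree) == 23
```

A predicate function would be an extra check at each node (e.g. that operand types match) that rejects the tree instead of computing a value.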

Page 55:

Example of Attribute Grammar

Page 56:

Example (II)

Page 57:

Example (III)

Page 58:

Example (IV)

Page 59:

9.5: Dynamic Semantics

• Informal definition: Only informal explanations are given (in natural language) which define the meaning of programs (e.g. language reference manuals, etc.).

• Operational semantics: The meaning of the constructs of the programming language is defined in terms of the translation into another lower-level language and the semantics of this lower-level language. Usually only the translation is defined formally; the semantics of the lower-level language is defined informally.

Page 60:

Axiomatic Semantics

• Axiomatic semantics was defined to prove the correctness of programs.

• This approach is related to the approach of defining the semantics of a procedure (independently of its code) in terms of pre- and post-conditions that define properties of input and output parameters and values of state variables.

• Weakest precondition: For a given statement, and a given postcondition that should hold after its execution, the weakest precondition is the weakest condition which ensures, when it holds before the execution of the statement, that the given postcondition holds afterwards.
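The standard weakest-precondition rule for assignment, wp(x := E, Q) = Q with E substituted for x, can be checked mechanically on a small example. The statement x := x + 1 and postcondition x > 0 below are invented for illustration:

```python
# wp of an assignment by substitution: to satisfy post(x) after
# x := expr(x), the state must satisfy post(expr(x)) beforehand.
def wp_assign(expr, post):
    """Weakest precondition of 'x := expr' for postcondition post."""
    return lambda x: post(expr(x))

post = lambda x: x > 0          # must hold after the statement
stmt = lambda x: x + 1          # the assignment x := x + 1
pre = wp_assign(stmt, post)     # i.e. x + 1 > 0, i.e. x >= 0 over integers

# every integer state satisfying the wp leads to a state satisfying post
for x in range(-10, 10):
    if pre(x):
        assert post(stmt(x))
assert pre(0) and not pre(-1)   # the wp is exactly x >= 0
```

The brute-force loop only checks sufficiency on a finite range; "weakest" additionally means no strictly weaker condition would do, which the last line spot-checks at the boundary.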

Page 61:

Denotational Semantics

• Denotational semantics is a method for describing the meaning of programs.

• It is based on recursive function theory.

• Grammar:
  <bin_num> → 0 | 1
            | <bin_num> 0
            | <bin_num> 1

• Function Mbin:
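A sketch of the denotational function for this grammar, assuming it follows the standard textbook equations for binary numerals (the slide's own figure is not in the transcript):

```python
# Mbin('0') = 0,  Mbin('1') = 1
# Mbin(s '0') = 2 * Mbin(s)
# Mbin(s '1') = 2 * Mbin(s) + 1
def Mbin(numeral):
    """Meaning of a binary numeral, defined by structural recursion
    mirroring the grammar's productions."""
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    return 2 * Mbin(numeral[:-1]) + (1 if numeral[-1] == "1" else 0)

assert Mbin("110") == 6
```

The recursion decomposes the numeral exactly as the grammar does: the last digit corresponds to the production used at the root of the derivation.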

Page 62:

9.6: Tools for Syntax Analysis (YACC)

Page 63:

Page 64:

Syntax Graphs

• A graph is a collection of nodes, some of which are connected by lines (edges).

• A directed graph is one in which the lines are directional.

• A parse tree is a restricted form of directed graph.

• A syntax graph is a directed graph representing the information in BNF rules.

Page 65:

9.7: Chomsky Hierarchy

Page 66:

Turing Machine

Page 67:

Turing Machine (II)

• Unrestricted grammar

• Recognized by Turing machine

• It consists of a read-write head that can be positioned anywhere along an infinite tape.

• It is not a useful class of language for compiler design.

Page 68:

Linear-Bounded Automata

Page 69:

Linear-Bounded Automata

• Context-sensitive

• Restrictions:
  – The left-hand side of each production must have at least one nonterminal in it.
  – The right-hand side must not have fewer symbols than the left.
  – There can be no empty productions (N → ε).

Page 70:

Push-Down Automata

Page 71:

Push-Down Automata (II)

• Context-free

• Recognized by push-down automata

• Can only read its input tape but has a stack that can grow to arbitrary depth where it can save information

• An automaton with a read-only tape and two independent stacks is equivalent to a Turing machine.

• It allows at most a single nonterminal (and no terminal) on the left-hand side of each production.

Page 72:

Finite-State Automata

Page 73:

Finite State Automata (II)

• Regular language

• Anything that must be remembered about the context of a symbol on the input tape must be preserved in the state of the machine.

• It allows only one symbol (a nonterminal) on the left-hand side, and only one or two symbols on the right.