Syntax Analysis: Introduction to parsers, Context-free grammars, Push-down automata, Top-down parsing, LL grammars and parsers, Bottom-up parsing, LR grammars and parsers, Bison/Yacc parser generators, Error Handling: Detection & Recovery

Upload: thomasine-stanley

Post on 26-Dec-2015


Syntax Analysis

• Introduction to parsers
• Context-free grammars
• Push-down automata
• Top-down parsing
• LL grammars and parsers
• Bottom-up parsing
• LR grammars and parsers
• Bison/Yacc - parser generators
• Error Handling: Detection & Recovery

Introduction to parsers

[Figure: source → Lexical Analyzer → token → Parser → syntax tree → Semantic Analyzer → code. The Parser asks the Lexical Analyzer for the next token, is driven by a CFG, and shares the Symbol Table with the other phases.]

Context Free Grammar

• CFG & Terminology
– Rewrite vs. Reduce
– Derivation
• Language and CFL
– Equivalence & CNF
• Parsing vs. Derivation
– lm/rm derivation & parse tree
– Ambiguity & resolution
• Expressive power

Derivation is the reverse of parsing. If we know how sentences are derived, we may find a parsing method by working in the reverse direction.

CFG: An Example

Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → ‘(’ expr ‘)’
expr → ‘-’ expr
expr → id
op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr

Notational Conventions in CFG

• a, b, c, … [+-0-9], id: symbols in Σ
• A, B, C, …, S, expr, stmt: symbols in N
• U, V, W, …, X, Y, Z: grammar symbols in (Σ ∪ N)
• α, β, γ, … denote strings in (Σ ∪ N)*
• u, v, w, … denote strings in Σ*
• A → α1 | α2 | … | αn is an abbreviation of A → α1, A → α2, …, A → αn
• Alternatives: α1, α2, …, αn at RHS

Context-Free Grammars

A set of terminals: basic symbols from which sentences are formed

A set of nonterminals: syntactic variables denoting sets of strings

A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences

The start symbol: a distinguished nonterminal denoting the language

CFG: Components
Specification for Structures & Constituency

• CFG: formal specification of structure (parse trees)
– G = {Σ, N, P, S}
– Σ: terminal symbols
– N: non-terminal symbols
– P: production rules
– S: start symbol

CFG: Components

• Σ: terminal symbols
– the input symbols of the language

• programming language: tokens (reserved words, variables, operators, …)

• natural languages: words or parts of speech

– pre-terminal: parts of speech (when words are regarded as terminals)

• N: non-terminal symbols– groups of terminals and/or other non-terminals

• S: start symbol: the largest constituent of a parse tree

CFG: Components

• P: production (re-writing) rules
– form: A → β (A: non-terminal, β: string of terminals and non-terminals)
– meaning: A re-writes to (“consists of”, “derives into”) β, or β is reduced to A
– start with “S-productions” (S → β)

Derivations

A derivation step is an application of a production as a rewriting rule:

E ⇒ - E

A sequence of derivation steps

E ⇒ - E ⇒ - ( E ) ⇒ - ( id )

is called a derivation of “- ( id )” from E.

The symbol ⇒* denotes “derives in zero or more steps”; the symbol ⇒+ denotes “derives in one or more steps”.

CFG: Accepted Languages

• Context-Free Language
– Language accepted by a CFG
• L(G) = { w | S ⇒+ w } (strings of terminals that can be derived from the start symbol)
– Proof of acceptance: by induction
• On the number of derivation steps
• On the length of the input string

Context-Free Languages

A context-free language L(G) is the language defined by a context-free grammar G

A string of terminals w is in L(G) if and only if S ⇒+ w; w is called a sentence of G

If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G

E ⇒ - E ⇒ - ( E ) ⇒ - ( id )

G1 is equivalent to G2 if L(G1) = L(G2)

CFG: Equivalence

• Chomsky Normal Form (CNF) (Chomsky, 1963):
– ε-free, and
– Every production rule is in one of the following forms:
• A → A1 A2 [two non-terminals: A1, A2], or
• A → a [a terminal: a]
– i.e., two non-terminals or one terminal at the RHS
• Properties:
– Generates binary parse trees
– Good simplification for some algorithms
• e.g., grammar training with the inside-outside algorithm (Baker 1979)
– Good tool for theoretical proofs
• e.g., time complexity

CFG: Equivalence

• Every CFG can be converted into a weakly equivalent grammar in CNF
– equivalence: L(G1) = L(G2)
• strongly equivalent: the grammars assign the same phrase structure to each sentence (except for renaming non-terminals)
• weakly equivalent: the grammars generate the same language but do not assign the same phrase structure to each sentence
– e.g., A → B C D == {A → B X, X → C D}
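The splitting step in the example above can be sketched in Python. This covers only the binarization part of a CNF conversion (ε-removal, unit rules, and terminal isolation are not shown), and the fresh names X1, X2, ... are an assumption that presumes no existing nonterminal uses them.

```python
def binarize(productions):
    """Split every right-hand side longer than two symbols into a chain
    of binary rules, introducing fresh nonterminals X1, X2, ...
    (hypothetical names, assumed not to clash with the grammar)."""
    result = []
    counter = 0
    for head, body in productions:
        body = list(body)
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"
            # A -> B C D becomes A -> B X1, then X1 -> C D on the next pass
            result.append((head, [body[0], fresh]))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result


rules = binarize([("A", ["B", "C", "D"])])
```

The result is weakly equivalent to the input: the language is unchanged, but each parse tree gains the extra X nodes.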

CFG: An Example

Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: E, op
Productions:
E → E op E …[R1]
E → ‘(’ E ‘)’ …[R2]
E → ‘-’ E …[R3]
E → id …[R4]
op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: E

Left- & Right-most Derivations

Each derivation step needs to choose
– a nonterminal to rewrite
– an alternative to apply

A leftmost derivation always chooses the leftmost nonterminal to rewrite

E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )

A rightmost (canonical) derivation always chooses the rightmost nonterminal to rewrite

E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

Left- & Right-most Derivations

Representation of leftmost/rightmost derivations: use the sequence of productions (or production numbers) to represent a derivation sequence.

Example: E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id ) => [3], [2], [1], [4], [4] (~ R3, R2, R1, R4, R4)

Advantage: a compact representation for the parse tree (data compression). Each parse tree has a unique leftmost/rightmost derivation.

[Figure: parse tree fragment with nodes annotated R3, R2, R1]
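As a rough illustration of how a production-number sequence encodes a derivation, the sketch below replays such a sequence. It assumes, as the [3], [2], [1], [4], [4] example appears to, that R1 is simplified to E → E + E; the function name `replay` and the table layout are hypothetical.

```python
# Assumed numbering mirroring R1..R4, with R1 simplified to E -> E + E.
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["(", "E", ")"]),
    3: ("E", ["-", "E"]),
    4: ("E", ["id"]),
}
NONTERMINALS = {"E"}


def replay(sequence, start="E", leftmost=True):
    """Apply the numbered productions in order, always rewriting the
    leftmost (or rightmost) nonterminal; return every sentential form."""
    form = [start]
    forms = [list(form)]
    for number in sequence:
        head, body = PRODUCTIONS[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != head:
            raise ValueError(f"production {number} does not apply here")
        form = form[:pos] + body + form[pos + 1:]
        forms.append(list(form))
    return forms


rightmost_forms = replay([3, 2, 1, 4, 4], leftmost=False)
leftmost_forms = replay([3, 2, 1, 4, 4], leftmost=True)
```

For this sentence the same number sequence happens to describe both derivations, because the last two steps both apply R4; the intermediate sentential forms still differ.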

Parse Trees

A parse tree is a graphical representation of a derivation that filters out the order in which nonterminals are chosen for rewriting

[Figure: partial parse tree over “girl in the park”, with NP and PP nodes]

Context Free Grammar (CFG): Specification for Structures & Constituency

• Parse Tree: graphical representation of structure
– Root node (S): a sentence-level structure
– Internal nodes: constituents of the sentence
– Arcs: relationship between parent nodes and their children (constituents)
– Terminal nodes: surface forms of the input symbols (e.g., words)
• Bracketed notation: alternative representation
– e.g., [I saw [the [girl [in [the park]]]]]

Parse Tree: “I saw the girl in the park”

[Figure: 1st parse. Tree with nodes S, VP, PP and three NP nodes over the words I saw the girl in the park, tagged pron v det n p det n]

Parse Tree: “I saw the girl in the park”

[Figure: 2nd parse. Tree with nodes S, VP, PP and four NP nodes over the words I saw the girl in the park, tagged pron v det n p det n]

LM & RM: An Example

[Figure: parse tree for - ( id + id ): the root E has children - and E; that E has children (, E, ); the inner E has children E, +, E, each deriving id]

E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )

E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

Parse Trees & Derivations

Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation

Ambiguous Grammar

A grammar is ambiguous if it produces more than one parse tree for some sentence (equivalently, more than one leftmost or more than one rightmost derivation)

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
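One way to see the ambiguity concretely is to count parse trees. The sketch below assumes the reduced grammar E → E + E | E * E | id (a simplification chosen for illustration, not the full grammar of the earlier slides) and counts how many distinct trees derive a token list.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under the assumed grammar
    E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the subtree
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


trees = count_parse_trees(["id", "+", "id", "*", "id"])
```

A count above 1 for any sentence shows that the grammar is ambiguous; here the two trees correspond to the two leftmost derivations listed above.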

Ambiguous Grammar

[Figure: the two parse trees for id + id * id. In the first, the root applies E → E + E and the right subtree is id * id; in the second, the root applies E → E * E and the left subtree is id + id]

Resolving Ambiguity

Use disambiguating rules to throw away undesirable parse trees

Rewrite grammars by incorporating disambiguating rules into unambiguous grammars

An Example

The dangling-else grammar:
stmt → if expr then stmt
| if expr then stmt else stmt
| other

Two parse trees for: if E1 then if E2 then S1 else S2

An Example

[Figure: the two parse trees for if E1 then if E2 then S1 else S2; in one the else belongs to the inner if, in the other to the outer if]

Preferred parse: the else is matched with the closest then

Disambiguating Rules

Rule: match each else with the closest previous unmatched then

Remove undesired state transitions in the pushdown automaton (parser)
– shift/reduce conflict on “else”
– 1st parse: reduce
– 2nd parse: shift

Grammar Rewriting

stmt → m_stmt | unm_stmt
m_stmt → if expr then m_stmt else m_stmt | other
unm_stmt → if expr then stmt | if expr then m_stmt else unm_stmt

• m_stmt: a statement with only paired then-else
• The statement between a then and its else is an m_stmt, so it cannot contain an unmatched then
• This is the then-else pair we want matched

RE vs. CFG

Every language described by a RE can also be described by a CFG

Example: (a|b)*abb
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε

(1) Right branching
(2) Starts with a terminal symbol

RE vs. CFG

Regular Grammar:
• Right branching
• Starts with a terminal symbol

[Figure: right-branching parse tree for a string in (a|b)* abb: repeated A0 nodes each emit a or b, followed by A0 → a A1, A1 → b A2, A2 → b A3]

RE vs. CFG

RE: (a | b)*abb

[Figure: NFA with start state 0, self-loops on a and b at state 0, and transitions from 0 to 1 on a, 1 to 2 on b, 2 to 3 on b; states 0 to 3 are labeled A0 to A3]

A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε

RE vs. CFG

A DFA for (a | b)*abb

[Figure: DFA with start state 0; states 0 to 3 are labeled A0 to A3, and each transition corresponds to one production below]

A0 → b A0 | a A1
A1 → a A1 | b A2
A2 → a A1 | b A3
A3 → a A1 | b A0 | ε

CFG: Expressive Power (cont.)

• Writing a CFG for a FSA (RE)
– define a non-terminal Ni for the state with state number i
– start symbol S = N0 (assuming that state 0 is the initial state)
– for each transition δ(i,a)=j (from state i to state j on input symbol a), add a new production Ni → a Nj to P (if a == ε, add Ni → Nj)
– for each final state i, add a new production Ni → ε to P
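The construction above can be sketched as follows. The transition-triple encoding, the use of `None` for ε, and the helper `generates` (a simple membership check for the resulting right-linear grammar) are assumptions made for this illustration.

```python
def fsa_to_grammar(transitions, final_states, start_state=0):
    """Build regular-grammar productions from FSA transitions,
    given as (source, symbol, target) triples; symbol None means epsilon."""
    productions = []
    for source, symbol, target in transitions:
        if symbol is None:
            productions.append((f"N{source}", [f"N{target}"]))          # Ni -> Nj
        else:
            productions.append((f"N{source}", [symbol, f"N{target}"]))  # Ni -> a Nj
    for state in sorted(final_states):
        productions.append((f"N{state}", []))                           # Ni -> epsilon
    return f"N{start_state}", productions


def generates(start, productions, text):
    """Check membership by tracking the nonterminals reachable after each prefix."""
    def closure(names):
        # Follow unit productions Ni -> Nj
        stack, seen = list(names), set(names)
        while stack:
            name = stack.pop()
            for head, body in productions:
                if head == name and len(body) == 1 and body[0] not in seen:
                    seen.add(body[0])
                    stack.append(body[0])
        return seen

    current = closure({start})
    for ch in text:
        current = closure({body[1] for head, body in productions
                           if head in current and len(body) == 2 and body[0] == ch})
    return any(head in current and body == [] for head, body in productions)


# The NFA for (a|b)*abb: loops on state 0, then a, b, b to final state 3.
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
start, grammar = fsa_to_grammar(nfa, {3})
```

Each production mirrors exactly one transition or final state, which is why the resulting grammar is right branching.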

CFG: Expressive Power

• CFG vs. Regular Expression (R.E.)
– Every R.E. can be recognized by a FSA
– Every FSA can be represented by a CFG with production rules of the form: A → a B | ε
– (known as a “Regular Grammar”)
• Therefore, L(RE) ⊂ L(CFG)

CFG: Expressive Power (cont.)

• Chomsky Hierarchy:
– R.E.: Regular set (recognized by FSAs)
– CFG: Context-free (Pushdown automata)
– CSG: Context-sensitive (Linear bounded automata)
– Unrestricted: Recursively enumerable (Turing Machine)

Push-Down Automata

[Figure: a finite automaton that reads the Input, uses a Stack, and produces Output]

RE vs. CFG

Why use REs for lexical syntax?
– Lexical rules do not need a notation as powerful as CFGs
– REs are more concise and easier to understand than CFGs
– More efficient lexical analyzers can be constructed from REs than from CFGs
– The split provides a way of modularizing the front end into two manageable-sized components

CFG vs. Finite-State Machine

• Inappropriateness of FSA– Constituents: only terminals

– Recursion: do not allow A => … B … => … A …

• RTN (Recursive Transition Network)– FSA with augmentation of recursion

– arc: terminal or non-terminal

– if arc is non-terminal: call to a sub-transition network & return upon traversal

Nonregular Constructs

REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct
– E.g. a*b*

A nonregular construct:
– L = {a^n b^n | n ≥ 1}
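A small sketch of why a stack is enough for this construct: the recognizer below pushes one marker per a and pops one per b, which a finite automaton without a stack cannot do for unbounded n. The function name is hypothetical.

```python
def accepts_anbn(text):
    """Recognize {a^n b^n | n >= 1} with an explicit stack."""
    stack = []
    seen_b = False
    for ch in text:
        if ch == "a" and not seen_b:
            stack.append("a")      # count an 'a'
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()            # pair this 'b' with an earlier 'a'
        else:
            return False           # 'a' after 'b', unmatched 'b', or other symbol
    return seen_b and not stack


ok = accepts_anbn("aaabbb")
```

The stack here only ever holds one kind of symbol, so it acts as a counter; matching a second, independent count as in a^n b^n c^n would need more than one stack.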

Non-Context-Free Constructs

CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two (paired) given constructs
– E.g. a^n b^n

Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
• declaration/use of identifiers
– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
• #formal arguments/#actual arguments
– L3 = {a^n b^n c^n | n ≥ 0}
• e.g., b: Backspace, c: underscore

Context-Free Constructs

FA (RE) cannot keep counts

CFGs can keep count of two items but not three

Similar context-free constructs:
– L’1 = {w c w^R | w is in (a | b)*, R: reverse order}
– L’2 = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}
– L’’2 = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1}
– L’3 = {a^n b^n | n ≥ 1}

CFG Parsers

Types of CFG Parsers

Universal: can parse any CFG (e.g., CYK, Earley)

CYK: Exhaustively matching sub-ranges of input tokens against grammar rules, from smaller ranges to larger ranges

Earley: Exhaustively enumerating possible expectations from left-to-right, according to current input token and grammar

Non-universal: not all CFG’s can be parsed (e.g., recursive descent parser)

Universal (to all grammars) is NOT always efficient

Types of CFG Parsers

Practical Parsers: [“what is a good parser?”]
• Simple: simple program structure
– Left-to-right (or right-to-left) scan
– middle-out or island-driven parsing is often not preferred
– Top-down or bottom-up matching
• Efficient: efficient for good/bad inputs
– Parse normal syntax quickly
– Detect errors immediately on the next token
• Deterministic: no alternative choices during parsing, given the next token
– Small lookahead buffer (also contributes to efficiency)

Types of CFG Parsers

Top Down: matching from the start symbol down to terminal tokens

Bottom Up: matching input tokens with reducible rules, from terminals up to the start symbol

Efficient CFG Parsers

Top Down: LL Parsers
– Matching from the start symbol down to terminal tokens, left-to-right, according to a leftmost derivation sequence

Bottom Up: LR Parsers
– Matching input tokens with reducible rules, left-to-right, from terminals up to the start symbol, in the reverse order of a rightmost derivation sequence

Efficient CFG Parsers

Efficient & deterministic parsing is only possible for some subclasses of grammars, with special parsing algorithms
• Top Down: parsing LL grammars with LL parsers
• Bottom Up: parsing LR grammars with LR parsers
– LR grammars are a larger class of grammars than LL grammars

Parsing Table Construction for Efficient Parsers

Parsing Table: a pre-computed table (according to the grammar), indicating the appropriate action(s) to take in any predefined state when some input token(s) is/are under examination

Lookahead symbol(s): the input symbol(s) under examination for determining the next action(s)

Example table (lookahead columns: id, +, *, num):
State-0: action-1, action-3
State-1: action-2, action-5
State-2: action-4

Good parsers do not change their code when the grammar is revised: they are table driven.

Parsing Table Construction for Efficient Parsers

Parsing Table Construction:
– Decide on a pre-defined number of lookaheads to use for predicting the next state
– Define and enumerate all the unique states for the parsing method
– Decide the actions to take in all states with all possible lookahead(s)

Parsing Table Construction for Efficient Parsers

X-Parser: you can invent any parser and call it the X-Parser
– But its parsing algorithm may not handle all grammars deterministically, and thus efficiently

X-Grammar: any grammar whose parsing table for the X-parsing method/X-Parser has no conflicting actions in any state

Non-X Grammar: has more than one action to take in some state for some lookahead

Parsing Table Construction for Efficient Parsers

k: the number of lookahead symbols used by a parser to determine the next action
– A larger number of lookahead symbols tends to make conflicting actions less likely
– But it may result in a much larger table, which grows exponentially with the number of lookaheads
– It does not guarantee conflict-free parsing for some grammars (e.g., inherently ambiguous ones), even with infinite lookahead

X(k) Parser: an X Parser that uses k lookahead symbols to determine the next action

X(k) Grammar: any grammar deterministically parsable with an X(k) Parser

Types of Grammars Capable of Efficient Parsing

LL(k) Grammars
– Grammars that can be deterministically parsed using an LL(k) parsing algorithm
– e.g., LL(1) grammar

LR(k) Grammars
– Grammars that can be deterministically parsed using an LR(k) parsing algorithm
– e.g., SLR(1) grammar, LR(1) grammar, LALR(1) grammar

Top-Down CFG Parsers

Recursive Descent Parser

vs.

Non-Recursive LL(1) Parser

Top-Down Parsing

Construct a parse tree from the root to the leaves using leftmost derivation

S → c A B
A → a b | a
B → d
input: cad

[Figure: four snapshots of the growing parse tree: S with children c A B; A expanded with leaf a; A expanded with leaves a b; A with leaf a and B with leaf d]

Predictive Parsing

A top-down parsing method without backtracking
– there is only one alternative production to choose at each derivation step

stmt → if expr then stmt else stmt
| while expr do stmt
| begin stmt_list end

LL(k) Parsing

The first L stands for scanning the input from left to right

The second L stands for producing a leftmost derivation

The k stands for the number of input symbols for lookahead used to choose alternative productions at each derivation step

LL(1) Parsing

Use one input symbol of lookahead

Makes the same predictions as recursive-descent (predictive) parsing, but is implemented as non-recursive predictive parsing

Recursive Descent Parsing (more)

The parser consists of a set of (possibly recursive) procedures

Each procedure is associated with a nonterminal of the grammar

The calling sequence of procedures in processing the input implicitly defines a parse tree for the input

An Example

type → simple
| id
| array [ simple ] of type

simple → integer
| char
| num dotdot num

An Example

[Figure: parse tree for array [ num dotdot num ] of integer: the root type expands to array [ simple ] of type; simple expands to num dotdot num; the inner type expands to simple, which expands to integer]

An Example

procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple;
    match(']'); match(of); type
  end
  else error
end;

An Example

procedure match(t : token);
begin
  if lookahead = t then
    lookahead := nexttoken
  else error
end;

An Example

procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end;
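The three procedures above translate almost line for line into a runnable sketch. The class layout, the "$" end marker, the `ParseError` exception, and the convention that tokens arrive as plain strings (with `num` and `id` already classified by a lexer) are assumptions of this illustration, not part of the slides.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" is an assumed end marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead == expected:
            self.pos += 1                    # lookahead := nexttoken
        else:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")

    def type_(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array"); self.match("["); self.simple()
            self.match("]"); self.match("of"); self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def parse_type(tokens):
    """Return True if the tokens form a complete `type`; raise ParseError otherwise."""
    parser = Parser(tokens)
    parser.type_()
    parser.match("$")
    return True


accepted = parse_type("array [ num dotdot num ] of integer".split())
```

Each method inspects only the current lookahead token before choosing an alternative, so the parser never backtracks.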

LL(k) Constraint: Left Recursion

A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α for some string α

A → A α | β

is equivalent to

A → β R
R → α R | ε

[Figure: the left-branching tree built by A → A α | β next to the right-branching tree built by A → β R; both derive β α*]

Direct/Immediate Left Recursion

A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn

i.e., A → A αi | βj (i=1,m ; j=1,n)

is equivalent to …

A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε

Both generate (β1 | β2 | ... | βn) (α1 | α2 | ... | αm)*

An Example

E → E + T | T
T → T * F | F
F → ( E ) | id

becomes

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
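The transformation used in this example can be sketched as a small function. The list-of-symbols encoding, the primed-name convention, and the empty list standing for ε are assumptions for illustration; alternatives of the form A → A (cycles) are not handled.

```python
def eliminate_direct_left_recursion(head, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A' and A' -> a1 A' | ... | am A' | epsilon."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]  # the alpha_i
    others = [alt for alt in alternatives if not alt or alt[0] != head]      # the beta_j
    if not recursive:
        return {head: alternatives}          # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],  # [] is epsilon
    }


e_rules = eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]])
t_rules = eliminate_direct_left_recursion("T", [["T", "*", "F"], ["F"]])
```

Applied to the E and T rules above, this yields the primed productions of the transformed grammar; F is returned unchanged because it has no left-recursive alternative.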

Indirect Left Recursion

G0:
S → A a | b
A → A c | S d | ε

Problem: Indirect Left-Recursion: S ⇒ A a ⇒ S d a

Solution-Step1: Indirect to Direct Left-Recursion:
A → A c | A a d | b d | ε

Solution-Step2: Direct Left-Recursion to Right-Recursion:
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε

• Scan rules top-down
• Do not start with symbols defined earlier (=> substitute them if any)
• Resolve direct recursion

Indirect Left Recursion

Input. Grammar G with no cycles or ε-productions.
Output. An equivalent grammar with no left recursion.

1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     // Step 1: substitute leading symbols of Ai that are earlier Aj's
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ ( j < i )
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
     end
     // Step 2
     eliminate direct left recursion among the Ai-productions
   end

Left Factoring

Two alternatives of a nonterminal A have a nontrivial common prefix α if α ≠ ε and

A → α β1 | α β2

Left factoring rewrites them as

A → α A'
A' → β1 | β2

An Example

S → i E t S | i E t S e S | a
E → b

becomes

S → i E t S S' | a
S' → e S | ε
E → b
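A single left-factoring pass over one nonterminal can be sketched as below. Grouping alternatives by their first symbol, the primed-name scheme, and the empty list for ε are assumptions of this illustration; the new alternatives may themselves need another pass.

```python
def common_prefix(sequences):
    """Longest prefix shared by all symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(head, alternatives):
    """One left-factoring pass: alternatives that share a first symbol
    are replaced by prefix + new nonterminal."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {head: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)       # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new_head = head + "'" * primes       # S', S'', ... (hypothetical naming)
        result[head].append(prefix + [new_head])
        result[new_head] = [alt[len(prefix):] for alt in group]  # [] is epsilon
    return result


factored = left_factor("S", [["i", "E", "t", "S"],
                             ["i", "E", "t", "S", "e", "S"],
                             ["a"]])
```

For the dangling-else style grammar above, the pass produces the same S and S' rules as the slide, with the ε alternative listed first.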

Top-Down Parsing: as Stack Matching

Construct a parse tree from the root to the leaves using leftmost derivation

S → c A B
A → a b | a
B → d
input: cad

[Figure: four parse-tree snapshots for the input cad: (1) S with children c A B; (2) a under A; (3) a b under A; (4) a under A and d under B.]
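The trial-and-error flavor of this example can be sketched as a small backtracking recognizer. This is an illustration only; the function name `derive` and the tuple representation are assumptions, and the approach relies on the grammar having no left recursion.

```python
GRAMMAR = {
    "S": [("c", "A", "B")],
    "A": [("a", "b"), ("a",)],
    "B": [("d",)],
}


def derive(symbols, tokens):
    """Return True if the symbol sequence can derive exactly `tokens`,
    always expanding the leftmost symbol and backtracking on failure."""
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        # Terminal: it must match the next input symbol.
        return bool(tokens) and tokens[0] == head and derive(rest, tokens[1:])
    # Nonterminal: try each alternative in turn (A → a b is tried before A → a).
    return any(derive(rhs + rest, tokens) for rhs in GRAMMAR[head])


print(derive(("S",), tuple("cad")))
```

On cad, the alternative A → a b fails at b, so the recognizer falls back to A → a and then matches B → d.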

Nonrecursive Predictive Parsing – General State

[Figure: the parsing program (parser/driver) reads the Input a b c … x y z, consults the Parsing table, keeps a Stack with X on top, and produces the Output.]

Predictive: pre-computed parsing actions, M[X,a] = {X → Y1 Y2 … Yk}

Non-Recursive: “Stack + Driver Program” (instead of recursive procedures)

Nonrecursive Predictive Parsing – Expand Non-terminal

[Figure: same diagram as the general state; X on the stack has been replaced by Y1 Y2 … Yk according to M[X,a] = {X → Y1 Y2 … Yk}.]

Nonrecursive Predictive Parsing – Match Terminal

[Figure: same diagram; the stack holds Y1 Y2 … Yk, with the annotation “=a” marking a stack symbol equal to the input symbol a.]

Nonrecursive Predictive Parsing - Error Recovery

[Figure: same diagram; the stack holds Y1 Y2 … Yk, with the annotations “=a” and “=c” on stack symbols against the input a b c … x y z.]

Stack Operations

Match
– when the top stack symbol is a terminal and it matches the input symbol, pop the top stack symbol and advance the input pointer

Expand
– when the top stack symbol is a nonterminal, replace this symbol by the right-hand side of one of its productions
  • Leftmost RHS symbol at top of stack

An Example

type → simple | id | array [ simple ] of type

simple → integer | char | num dotdot num

An Example

(E = expand, M = match; the top of the stack is at the right)

Action  Stack                      Input
E       type                       array [ num dotdot num ] of integer
M       type of ] simple [ array   array [ num dotdot num ] of integer
M       type of ] simple [         [ num dotdot num ] of integer
E       type of ] simple           num dotdot num ] of integer
M       type of ] num dotdot num   num dotdot num ] of integer
M       type of ] num dotdot       dotdot num ] of integer
M       type of ] num              num ] of integer
M       type of ]                  ] of integer
M       type of                    of integer
E       type                       integer
E       simple                     integer
M       integer                    integer

Parsing program

push $S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$;    // try to match S$ with w$
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then pop X from the stack and advance ip
        else error                            // or error_recovery()
    else                                      // X is a nonterminal
        if M[X, a] = X → Y1 Y2 ... Yk then
            pop X from the stack and push Yk ... Y2 Y1 onto the stack
        else error                            // or error_recovery()
until X = $
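A minimal Python sketch of this driver, run on the type/simple grammar. The table `TABLE` is not given on the slides; it is an assumption derived by hand from the first symbols of each production, and the function name `predictive_parse` is hypothetical.

```python
# Assumed LL(1) table for:  type → simple | id | array [ simple ] of type
#                           simple → integer | char | num dotdot num
TABLE = {
    ("type", "array"): ["array", "[", "simple", "]", "of", "type"],
    ("type", "id"): ["id"],
    ("type", "integer"): ["simple"],
    ("type", "char"): ["simple"],
    ("type", "num"): ["simple"],
    ("simple", "integer"): ["integer"],
    ("simple", "char"): ["char"],
    ("simple", "num"): ["num", "dotdot", "num"],
}
NONTERMINALS = {"type", "simple"}


def predictive_parse(tokens, start="type"):
    """Return the sequence of actions ('E' = expand, 'M' = match).
    Raises SyntaxError on the first error (no recovery)."""
    stack = ["$", start]                 # top of stack is the end of the list
    tokens = list(tokens) + ["$"]        # w$
    ip = 0
    actions = []
    while True:
        top, a = stack[-1], tokens[ip]
        if top not in NONTERMINALS:      # X is a terminal or $
            if top != a:
                raise SyntaxError(f"expected {top}, got {a}")
            if top == "$":
                return actions           # S$ has been matched against w$
            stack.pop()
            ip += 1
            actions.append("M")
        else:                            # X is a nonterminal
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push Yk ... Y1, so Y1 ends up on top
            actions.append("E")


print(predictive_parse("array [ num dotdot num ] of integer".split()))
```

For the input array [ num dotdot num ] of integer, the returned actions follow the Action column of the example trace.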

Parser Driven by a Parsing Table: Non-recursive Descent

X() { // WITHOUT ε-production: X→ε
  if (LA=‘a’) then
    Y1(); Y2(); …; Yk();
  else if (LA=‘b’)
    Z1(); Z2(); …; Zm();
  else ERROR(); // no X→ε
  // else RETURN; if X→ε exists
} // Recursive descent procedure for matching X

      a                  b                  c    d
X     X → Y1 Y2 … Yk     X → Z1 Z2 … Zm
Y1    Y1 → α1            Y1 → α2
Z1    Z1 → β1            Z1 → β2

‘a’ in FirstSet( Y1 Y2 … Yk )

‘b’ in FirstSet( Z1 Z2 … Zm )

Parser Driven by a Parsing Table: Non-recursive Descent

X() { // WITH ε-production: X→ε
  if (LA=‘a’) then
    Y1(); Y2(); …; Yk();
  else if (LA=‘b’)
    Z1(); Z2(); …; Zm();
  // else ERROR(); // no X→ε
  else if (LA=??) RETURN; // if X→ε exists
} // Recursive descent procedure for matching X

      a                  b                  c    d
X     X → Y1 Y2 … Yk     X → Z1 Z2 … Zm          X → ε
Y1    Y1 → α1            Y1 → α2
Z1    Z1 → β1            Z1 → β2

‘a’ in FirstSet( Y1 Y2 … Yk )

‘b’ in FirstSet( Z1 Z2 … Zm )

‘d’ in FollowSet(X) (S ⇒* … X d …)
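The recursive procedures can be sketched concretely in Python. The grammar used here is the right-recursive expression grammar (E → T E', and so on); the class name `Parser`, the helper `accepts`, and the hard-coded follow sets in the ε branches are assumptions made for this illustration.

```python
class Parser:
    """Recursive-descent recognizer for
         E → T E'     E' → + T E' | ε
         T → F T'     T' → * F T' | ε
         F → ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def la(self):                            # the lookahead symbol LA
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.la != terminal:
            raise SyntaxError(f"expected {terminal}, got {self.la}")
        self.pos += 1

    def E(self):
        self.T(); self.E_prime()

    def E_prime(self):
        if self.la == "+":                   # LA in FIRST(+ T E')
            self.match("+"); self.T(); self.E_prime()
        elif self.la in (")", "$"):          # LA in FOLLOW(E'): apply E' → ε
            return
        else:
            raise SyntaxError(f"unexpected {self.la}")

    def T(self):
        self.F(); self.T_prime()

    def T_prime(self):
        if self.la == "*":                   # LA in FIRST(* F T')
            self.match("*"); self.F(); self.T_prime()
        elif self.la in ("+", ")", "$"):     # LA in FOLLOW(T'): apply T' → ε
            return
        else:
            raise SyntaxError(f"unexpected {self.la}")

    def F(self):
        if self.la == "id":
            self.match("id")
        elif self.la == "(":
            self.match("("); self.E(); self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.la}")   # F has no ε-production


def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False


print(accepts("id + id * id".split()))
```

The `elif` branches of `E_prime` and `T_prime` play the role of the “LA=??” test: the ε-production is applied only when the lookahead is in the follow set.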

First Sets: Predictive Parsing

The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α.
– ε is used simply to flag whether α can be null when computing first sets
– ε is not for matching any real input when parsing

FIRST(α) = {a | α ⇒* a β} ∪ {ε, if α ⇒* ε}
FIRST(α) includes ε: means that α ⇒* ε

Compute First Sets

• If X is a terminal, then FIRST(X) is {X}
• If X is a nonterminal and X → ε is a production, then add ε to FIRST(X)
• If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
  If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
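These rules amount to a fixed-point iteration. A minimal Python sketch follows; the function name `compute_first`, the dictionary representation, and the use of the string "ε" as a marker are assumptions.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],      # () stands for ε
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}


def compute_first(grammar):
    """Return dict nonterminal -> FIRST set; symbols that are not keys of
    `grammar` are treated as terminals, with FIRST(X) = {X}."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:                      # repeat until no set grows any more
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[a])
                for y in rhs:
                    fy = first[y] if y in grammar else {y}
                    first[a] |= fy - {EPS}
                    if EPS not in fy:   # Yi is not nullable: stop here
                        break
                else:
                    # every Yj is nullable, or the right-hand side is ε itself
                    first[a].add(EPS)
                changed |= len(first[a]) != before
    return first


first = compute_first(GRAMMAR)
for nonterminal, symbols in first.items():
    print(f"FIRST({nonterminal}) =", sorted(symbols))
```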

Follow Sets: Matching Empty

What to do when matching null: A → ε?
– TD recursive descent parsing: “assumes” success
– LL: more predictive => follow set of A

The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form, namely, if

S ⇒* α A a β

then a is in the follow set of A.

Compute Follow Sets

• Initialization: place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
• If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
  – ε is not considered a visible input to follow any symbol
• If there is a production A → α B, or A → α B β where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B)
  – S ⇒* … A a … implies S ⇒* … B a …
  – YES: “every symbol that can follow A will also follow B”
  – NO!: “every symbol that can follow B will also follow A”
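A Python sketch of these rules, again as a fixed-point iteration. The names `first_of_string` and `compute_follow` are hypothetical, and the first sets of the expression grammar are supplied by hand rather than computed here.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],      # () stands for ε
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
# FIRST sets of the nonterminals, written out by hand for this sketch.
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}


def first_of_string(symbols, first):
    """FIRST(β) for a string β of grammar symbols."""
    out = set()
    for y in symbols:
        fy = first.get(y, {y})          # a terminal's first set is itself
        out |= fy - {EPS}
        if EPS not in fy:
            return out
    out.add(EPS)                        # β is empty or entirely nullable
    return out


def compute_follow(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add("$")              # initialization
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue        # follow sets are kept for nonterminals only
                    fb = first_of_string(rhs[i + 1:], first)
                    new = fb - {EPS}    # FIRST(β) except ε
                    if EPS in fb:       # β ⇒* ε: FOLLOW(A) goes into FOLLOW(B)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


follow = compute_follow(GRAMMAR, FIRST, "E")
for nonterminal, symbols in follow.items():
    print(f"FOLLOW({nonterminal}) =", sorted(symbols))
```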

An Example

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

Constructing Parsing Table

Input. Grammar G.

Output. Parsing Table M.

Method.

1. For each production A → α of the grammar, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A, a].

3. If ε is in FIRST(α) [α ⇒* ε], add A → α to M[A, b] for each terminal b [including ‘$’] in FOLLOW(A).
   - If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

4. Make each undefined entry of M be error.
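Steps 1 to 3 can be sketched in Python as below. To keep the sketch short, each production of the expression grammar is annotated by hand with FIRST(α), and the follow sets are written out by hand; the function name `build_table` and this representation are assumptions. Entries are lists, so a multiply-defined entry would show up as a list with more than one production, and a missing key plays the role of an error entry (step 4).

```python
EPS = "ε"

# (A, α, FIRST(α)) for each production A → α; () stands for ε.
PRODUCTIONS = [
    ("E",  ("T", "E'"),      {"(", "id"}),
    ("E'", ("+", "T", "E'"), {"+"}),
    ("E'", (),               {EPS}),
    ("T",  ("F", "T'"),      {"(", "id"}),
    ("T'", ("*", "F", "T'"), {"*"}),
    ("T'", (),               {EPS}),
    ("F",  ("(", "E", ")"),  {"("}),
    ("F",  ("id",),          {"id"}),
]
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}


def build_table(productions, follow):
    """Return dict (A, terminal) -> list of right-hand sides."""
    table = {}
    for a, rhs, first_alpha in productions:       # step 1
        lookaheads = first_alpha - {EPS}          # step 2: terminals in FIRST(α)
        if EPS in first_alpha:                    # step 3: α ⇒* ε
            lookaheads |= follow[a]               # includes $ if $ is in FOLLOW(A)
        for terminal in lookaheads:
            table.setdefault((a, terminal), []).append(rhs)
    return table


table = build_table(PRODUCTIONS, FOLLOW)
for (a, terminal), entries in sorted(table.items()):
    for rhs in entries:
        print(f"M[{a}, {terminal}] = {a} → {' '.join(rhs) or EPS}")
```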

LL(1) Parsing Table Construction

A() { // WITH/WITHOUT ε-productions: A → α (α ⇒* ε)
  if (LA=‘a’ in First(Y1 Y2 … Yk)) then
    Y1(); Y2(); …; Yk();
  else if (LA=‘b’ in Follow(A) & ε in First(Z1 Z2 … Zm))
    Z1(); Z2(); …; Zm(); // Nullable
  else ERROR();
} // Recursive version of LL(1) parser

      a in First(α)    b in Follow(A)      c not in First(α) or Follow(A)
A     A → α            A → α (α ⇒* ε)      error
B
C

When to apply A → α? (including A → ε)

An Example

      id          +            *            (           )          $
E     E → TE'                               E → TE'
E'                E' → +TE'                             E' → ε     E' → ε
T     T → FT'                               T → FT'
T'                T' → ε       T' → *FT'                T' → ε     T' → ε
F     F → id                                F → (E)

An Example

Stack     Input           Output
$E        id + id * id$
$E'T      id + id * id$   E → TE'
$E'T'F    id + id * id$   T → FT'
$E'T'id   id + id * id$   F → id
$E'T'     + id * id$
$E'       + id * id$      T' → ε
$E'T+     + id * id$      E' → +TE'
$E'T      id * id$
$E'T'F    id * id$        T → FT'
$E'T'id   id * id$        F → id
$E'T'     * id$
$E'T'F*   * id$           T' → *FT'
$E'T'F    id$
$E'T'id   id$             F → id
$E'T'     $
$E'       $               T' → ε
$         $               E' → ε

LL(1) Grammars

A grammar is an LL(1) grammar if its predictive parsing table has no multiply-defined entries

A Counter Example

S → i E t S S' | a
S' → e S | ε
E → b

      a        b        e                   i                 t    $
S     S → a                                 S → i E t S S'
S'                      S' → ε, S' → e S                           S' → ε
E              E → b

e ∈ FOLLOW(S')

e ∈ FIRST(e S)

Disambiguation: matching closest “then”

LL(1) Grammars or Not ??

A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
– For no terminal a do both α and β derive strings beginning with a.
  • or… M[A, first(α)&first(β)] entries will have conflicting actions
– At most one of α and β can derive the empty string
  • or… M[A, follow(A)] entries have conflicting actions
– If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
  • or… M[A, first(α)&follow(A)] entries have conflicting actions
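The three conditions can be checked pairwise over the alternatives of each nonterminal. The sketch below applies them to the if-then-else grammar of the counter example; the function name `ll1_conflicts` is hypothetical, and the first and follow sets are written out by hand as assumptions rather than computed.

```python
from itertools import combinations

EPS = "ε"

# A -> list of (right-hand side, FIRST of that right-hand side), by hand.
ALTERNATIVES = {
    "S":  [("i E t S S'", {"i"}), ("a", {"a"})],
    "S'": [("e S", {"e"}), ("ε", {EPS})],
    "E":  [("b", {"b"})],
}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def ll1_conflicts(alternatives, follow):
    """Return a list of (A, α, β, offending symbols) violating the conditions."""
    conflicts = []
    for a, alts in alternatives.items():
        for (alpha, first_alpha), (beta, first_beta) in combinations(alts, 2):
            # Conditions 1 and 2: a common first terminal, or both derive ε.
            clash = first_alpha & first_beta
            # Condition 3 (in both directions): one side derives ε and the
            # other can start with a terminal in FOLLOW(A).
            if EPS in first_beta:
                clash |= first_alpha & follow[a]
            if EPS in first_alpha:
                clash |= first_beta & follow[a]
            if clash:
                conflicts.append((a, alpha, beta, clash))
    return conflicts


conflicts = ll1_conflicts(ALTERNATIVES, FOLLOW)
print(conflicts)
```

The only violation reported is for S' on the symbol e, which corresponds to the doubly-defined entry M[S', e].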

Non-LL(1) Grammar: Ambiguous According to LL(1) Parsing Table Construction

When will A → α & A → β appear in the same table cell ??

– M[A, a], a in First(α) & First(β): A → α and A → β
– M[A, b], b in Follow(A): A → α (α ⇒* ε) and A → β (β ⇒* ε)
– M[A, a], a in First(α) & Follow(A): A → α (α does not derive ε, but α ⇒* a …) and A → β (β ⇒* ε)

Examples: S' → e S | ε     X → X a | b

LL(1) Grammars or Not??

If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry => non-LL(1)

E.g., X → X a | b

=> FIRST(X) = {b} (and, of course, FIRST(b) = {b})

=> M[X,b] includes both {X → X a} and {X → b}

i.e., ambiguous grammars and grammars with left-recursive productions cannot be LL(1).

No LL(1) grammar can be ambiguous

Error Recovery for LL Parsers

Syntactic Errors

• Empty entries in a parsing table:
  – A syntactic error is encountered when the lookahead symbol corresponding to this entry is in the input buffer
  – Error recovery information can be encoded in such entries to take appropriate actions upon error
• Error Detection:
  – (1) Stacktop = x && x != input (a)
  – (2) Stacktop = A && M[A, a] = empty (error)

Error Recovery Strategies

• Panic mode: skip tokens until a token in a set of synchronizing tokens appears
  – INS (insertion) type of errors
  – sync at delimiters, keywords, …, that have clear functions
• Phrase Level Recovery
  – local INS (insertion), DEL (deletion), SUB (substitution) types of errors
• Error Production
  – define error patterns (“error productions”) in grammar
• Global Correction [Grammar Correction]
  – minimum distance correction

Error Recovery – Panic Mode

Panic mode: skip tokens until a token in a set of synchronizing tokens appears

Commonly used Synchronizing tokens:– SUB(A,ip): use FOLLOW(A) as sync set for A (pop A)

– use the FIRST set of a higher construct as sync set for a lower construct

– INS(ip): use FIRST(A) as sync set for A

– *ip = ε: use the production deriving ε as the default

– DEL(ip): If a terminal on stack cannot be matched, pop the terminal

Error Recovery – Panic Mode

[Figure: Action / Stack / Input snapshots for the three panic-mode actions. SUB(A,ip): A is on top of the stack and the input … *ip … is skipped up to a token in Follow(A). INS(ip): A is on top of the stack and the input is skipped up to a token in First(A). DEL(ip): terminal x is on top of the stack and does not match *ip.]

Error Recovery Actions Using Follow & First Sets to Sync

Expanding non-terminal A:
– M[A,a] = error (blank): skip “a” in input = delete all such “a” (until sync with sync symbol, b) /* panic */
– M[A,b] = sync (at FOLLOW(A)): pop “A” from stack = “b” is a sync symbol following A
– M[A,b] = A → α (== sync at FIRST(A)): expand A as α (same as normal parsing action)

Matching terminal “x”:
– (*sp=“x”) != “a”: pop(x) from stack = missing input token “x”

An Example

      id          +            *            (           )          $
E     E → TE'                               E → TE'     sync       sync
E'                E' → +TE'                             E' → ε     E' → ε
T     T → FT'     sync                      T → FT'     sync       sync
T'                T' → ε       T' → *FT'                T' → ε     T' → ε
F     F → id      sync         sync         F → (E)     sync       sync

FOLLOW(F)={+,*,),$}

FOLLOW(E)=FOLLOW(E')={),$}

FIRST(X) is used to expand non-ε-productions or to sync (on errors)

FOLLOW(X) is used to expand ε-productions or to sync (on errors)

An Example

Stack     Input          Output
$E        ) id * + id$   error, skip )
$E        id * + id$     id is in FIRST(E)
$E'T      id * + id$     E → TE'
$E'T'F    id * + id$     T → FT'
$E'T'id   id * + id$     F → id
$E'T'     * + id$
$E'T'F*   * + id$        T' → *FT'
$E'T'F    + id$          error, M[F,+]=synch / FOLLOW(F)
$E'T'     + id$          F popped
$E'       + id$          T' → ε
$E'T+     + id$          E' → +TE'
$E'T      id$
$E'T'F    id$            T → FT'
$E'T'id   id$            F → id
$E'T'     $
$E'       $              T' → ε
$         $              E' → ε
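A panic-mode driver along these lines can be sketched in Python. The table is transcribed from the sync-augmented parsing table for the expression grammar; the function name `parse_with_recovery` and the log format are assumptions. One further assumption is needed to reproduce the first step of the trace: when a sync entry is hit while the start symbol is the only nonterminal left on the stack, the input token is skipped instead of popping the start symbol.

```python
SYNC = "sync"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNC, ("E", "$"): SYNC,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNC, ("T", ")"): SYNC, ("T", "$"): SYNC,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNC, ("F", "*"): SYNC, ("F", ")"): SYNC, ("F", "$"): SYNC,
}


def parse_with_recovery(tokens):
    """Return a log of productions applied and recovery actions taken."""
    stack = ["$", "E"]                        # top of stack is the end of the list
    tokens = list(tokens) + ["$"]
    ip = 0
    log = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[ip]
        if top not in NONTERMINALS:           # terminal on top of the stack
            stack.pop()
            if top == a:
                ip += 1                       # match
            else:
                log.append(f"error, missing {top} popped")
            continue
        entry = TABLE.get((top, a))
        only_start_left = len(stack) == 2     # assumption: never pop the last symbol early
        if a != "$" and (entry is None or (entry == SYNC and only_start_left)):
            ip += 1                           # blank entry: skip the input token
            log.append(f"error, skip {a}")
        elif entry is None or entry == SYNC:
            stack.pop()                       # sync entry: pop the nonterminal
            log.append(f"error, {top} popped")
        else:
            stack.pop()
            stack.extend(reversed(entry))     # normal expansion
            log.append(f"{top} → {' '.join(entry) or 'ε'}")
    if tokens[ip] != "$":
        log.append("error, unexpected trailing input")
    return log


log = parse_with_recovery(") id * + id".split())
print("\n".join(log))
```

On the input ) id * + id the log contains the same two recovery actions as the trace: the leading ) is skipped and F is popped when + is seen.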

Parse Tree - Error Recovered

[Figure: parse tree built for the erroneous input after recovery; the skipped “)” is shown beside the root E, and the subtrees for T, T', E' and F end in id or ε leaves.]

) id * + id => id * F + id