Parsing

Calculate grammatical structure of program, like diagramming sentences, where:

Tokens = “words”

Programs = “sentences”

For further information, read: Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” (a.k.a. the “Dragon Book”)

Outline of coverage

Context-free grammars

Parsing

– Tabular Parsing Methods

– One pass

• Top-down

• Bottom-up

Yacc

Parser: extracts grammatical structure of program

[Figure: parse tree of a function-def with children name (main), arguments and stmt-list; the statement is an expression made of an operator (<<) applied to a variable (cout) and a string (“hello, world\n”)]

Context-free languages

Grammatical structure defined by context-free grammar

statement → labeled-statement | expression-statement | compound-statement

labeled-statement → ident : statement | case constant-expression : statement

compound-statement → { declaration-list statement-list }

[Figure labels: terminal, non-terminal]

“Context-free” = only one non-terminal in left-part

Parse trees

Parse tree = tree labeled with grammar symbols, such that:

If node is labeled A, and its children are labeled x1...xn, then there is a production A → x1...xn

“Parse tree from A” = root labeled with A

“Complete parse tree” = all leaves labeled with tokens

Parse trees and sentences

Frontier of tree = labels on leaves (in left-to-right order)

Frontier of tree from S is a sentential form

Frontier of a complete tree from S is a sentence

[Figure: parse tree with root L and children L ; E, where the inner L has child E and that E has child a; the leaves are marked “Frontier”]
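The frontier definition can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the `frontier` function are names assumed here, not part of the slides, and the tree built at the end is the L → L ; E tree with the right-hand E left unexpanded.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: a grammar symbol plus its children (hypothetical helper)."""
    label: str
    children: list = field(default_factory=list)


def frontier(node):
    """Return the labels on the leaves, in left-to-right order."""
    if not node.children:
        return [node.label]
    leaves = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves


# L -> L ; E, inner L -> E, inner E -> a; the right-hand E is not expanded.
tree = Node("L", [Node("L", [Node("E", [Node("a")])]), Node(";"), Node("E")])
print("".join(frontier(tree)))  # a;E
```

Because one leaf is still the non-terminal E, this frontier is a sentential form but not yet a sentence.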

Example

G: L → L ; E | E

E → a | b

Syntax trees from start symbol (L):

[Figure: three parse trees from L]

Sentential forms: a, a;E, a;b;b

Derivations

Alternate definition of sentence:

Given α, β in V*, say α ⇒ β is a derivation step if α = α’Aα’’ and β = α’γα’’, where A → γ is a production

α is a sentential form iff there exists a derivation (sequence of derivation steps) S ⇒ … ⇒ α (alternatively, we say that S ⇒* α)

Two definitions are equivalent, but note that there are many derivations corresponding to each parse tree

Another example

H: L → E ; L | E

E → a | b

[Figure: parse trees from L under grammar H, built from the symbols L, E, ;, a and b]

Ambiguity

For some purposes, it is important to know whether a sentence can have more than one parse tree

A grammar is ambiguous if there is a sentence with more than one parse tree

Example: E → E + E | E * E | id

[Figure: two different parse trees for one sentence made of three ids, one + and one *]

If e then if b then d else f

{ int x; y = 0; }

A.b.c = d;

Id -> s | s.id

E -> E + T -> E + T + T -> T + T + T -> id + T + T -> id + T * id + T -> id + id * id + T -> id + id * id + id
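One way to see the ambiguity of E → E + E | E * E | id concretely is to count parse trees. The sketch below is an assumption-laden illustration (the function name `count_trees` and the token-list input format are made up here): it counts, for every span of the input, how many trees derive it from E by trying each operator as the top-level split.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive toks[i:j] from E.
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                # toks[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_trees("id + id * id".split()))  # 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous.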

Ambiguity

Ambiguity is a function of the grammar rather than the language

Certain ambiguous grammars may have equivalent unambiguous ones

Grammar Transformations

Grammars can be transformed without affecting the language generated

Three transformations are discussed next:

– Eliminating Ambiguity

– Eliminating Left Recursion (i.e. productions of the form A → Aα)

– Left Factoring

Eliminating Ambiguity

Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity

For example, expressions involving additions and products can be written as follows:

E → E + T | T

T → T * id | id

The language generated by this grammar is the same as that generated by the grammar on transparency 11. Both generate id(+id|*id)*

However, this grammar is not ambiguous

Eliminating Ambiguity (Cont.)

One advantage of this grammar is that it represents the precedence between operators. In the parse tree, products appear nested within additions

[Figure: parse tree built from E, T, id, + and *, with the product nested below the addition]

An example of ambiguity in a programming language is the dangling else

Consider

S → if α then S else S | if α then S | β


When there are two nested ifs and only one else..

[Figure: two parse trees for the same statement; in the first, the root is S → if α then S else S and the inner S is an if-then; in the second, the root is S → if α then S and the inner S is an if-then-else]


In most languages (including C++ and Java), each else is assumed to belong to the nearest if that is not already matched by an else. This association is expressed in the following (unambiguous) grammar:

S → Matched | Unmatched

Matched → if α then Matched else Matched | β

Unmatched → if α then S | if α then Matched else Unmatched


Ambiguity is a property of the grammar

It is undecidable whether a context free grammar is ambiguous

The proof is done by reduction from Post’s correspondence problem

Although there is no general algorithm, it is possible to isolate certain constructs in productions which lead to ambiguous grammars


For example, a grammar containing the production A → AA | α would be ambiguous, because the substring ααα has two parses:

[Figure: two parse trees for ααα, one grouping the first two A’s under a common parent and the other grouping the last two]

This ambiguity disappears if we use the productions

A → AB | B and B → α

or the productions

A → BA | B and B → α.


Examples of ambiguous productions: A → AαA and A → αA | Aβ

A language generated by an ambiguous CFG is inherently ambiguous if it has no unambiguous CFG

– An example of such a language is L = {a^i b^j c^m | i=j or j=m}, which can be generated by the grammar:

S → AB | DC

A → aA | ε

C → cC | ε

B → bBc | ε

D → aDb | ε

Elimination of Left Recursion

A grammar is left recursive if it has a nonterminal A and a derivation A ⇒+ Aα for some string α. Top-down parsing methods (to be discussed shortly) cannot handle left-recursive grammars, so a transformation to eliminate left recursion is needed.

Immediate left recursion (productions of the form A → Aα) can be easily eliminated.

We group the A-productions as

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

where no βi begins with A. Then we replace the A-productions by

A → β1 A’ | β2 A’ | … | βn A’

A’ → α1 A’ | α2 A’ | … | αm A’ | ε
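The replacement rule above can be sketched in Python as follows. The representation is an assumption made for this example: a production is a list of symbols, the empty list stands for ε, and the fresh nonterminal is named by appending an apostrophe. `eliminate_immediate` is a hypothetical helper name.

```python
def eliminate_immediate(a, prods):
    """Remove immediate left recursion from the productions of nonterminal `a`.

    Returns a dict holding the new productions of `a` and, if needed, of a'.
    """
    fresh = a + "'"
    # A -> A alpha : keep only the alpha part.
    alphas = [p[1:] for p in prods if p and p[0] == a]
    # A -> beta : productions that do not begin with A.
    betas = [p for p in prods if not p or p[0] != a]
    if not alphas:
        return {a: prods}
    return {
        a: [beta + [fresh] for beta in betas],
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],  # [] is epsilon
    }


# E -> E + T | T  becomes  E -> T E'   and   E' -> + T E' | epsilon
print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
```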

Elimination of Left Recursion (Cont.)

The previous transformation, however, does not eliminate left recursion involving two or more steps. For example, consider the grammar

S → Aa | b

A → Ac | Sd | ε

S is left-recursive because S ⇒ Aa ⇒ Sda, but it is not immediately left recursive


Algorithm. Eliminate left recursion

Arrange nonterminals in some order A1, A2, …, An

for i = 1 to n {
    for j = 1 to i-1 {
        replace each production of the form Ai → Aj γ
        by the productions Ai → δ1 γ | δ2 γ | … | δk γ
        where Aj → δ1 | δ2 | … | δk are all the current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
}
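A minimal Python sketch of this algorithm follows, under the same assumed representation as before (productions as lists of symbols, `[]` for ε, fresh nonterminals named with an apostrophe). The function names are hypothetical. As described in the Dragon Book, the algorithm is only guaranteed to work for grammars without cycles or ε-productions, although it happens to succeed on the S/A example from the previous slide.

```python
def eliminate_immediate(a, prods):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    fresh = a + "'"
    alphas = [p[1:] for p in prods if p and p[0] == a]
    betas = [p for p in prods if not p or p[0] != a]
    if not alphas:
        return {a: prods}
    return {a: [b + [fresh] for b in betas],
            fresh: [al + [fresh] for al in alphas] + [[]]}


def eliminate_left_recursion(grammar, order):
    """Apply the algorithm to `grammar`, visiting nonterminals in `order`."""
    g = {a: [list(p) for p in prods] for a, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:
                    # Ai -> Aj gamma: substitute every current Aj-production.
                    expanded += [delta + p[1:] for delta in g[aj]]
                else:
                    expanded.append(p)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


grammar = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, prods in result.items():
    print(head, "->", " | ".join(" ".join(p) or "epsilon" for p in prods))
```

For the example grammar this yields A → b d A’ | A’ and A’ → c A’ | a d A’ | ε, with the S-productions unchanged.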


To show that the previous algorithm actually works, all we need to notice is that iteration i only changes productions with Ai on the left-hand side, and that m > i in all productions of the form Ai → Am α

Induction proof:

– Clearly true for i=1

– If it is true for all i<k, then when the outer loop is executed for i=k, the inner loop will remove all productions Ai → Am α with m < i

– Finally, with the elimination of self recursion, m in the Ai → Am α productions is forced to be > i

So, at the end of the algorithm, all derivations of the form Ai ⇒ Am α will have m > i and therefore left recursion would not be possible

Left Factoring

Left factoring helps transform a grammar for predictive parsing

For example, if we have the two productions

S → if α then S else S | if α then S

on seeing the input token if, we cannot immediately tell which production to choose to expand S

In general, if we have A → αβ1 | αβ2 and the input begins with α, we do not know (without looking further) which production to use to expand A

Left Factoring (Cont.)

However, we may defer the decision by expanding A to αA’

Then after seeing the input derived from α, we may expand A’ to β1 or to β2

Left-factored, the original productions become

A → αA’

A’ → β1 | β2
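A single left-factoring step can be sketched like this. The code is illustrative only: `left_factor` and `common_prefix` are made-up names, productions are lists of symbols with `[]` for ε, and the tokens `cond` and `other` are hypothetical stand-ins for the parts of the if-statement grammar that the example does not care about.

```python
def common_prefix(seqs):
    """Longest prefix shared by all the symbol lists in `seqs`."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(a, prods):
    """One left-factoring pass over the productions of nonterminal `a`."""
    groups = {}
    for p in prods:
        # Group productions by their first symbol (None for epsilon).
        groups.setdefault(p[0] if p else None, []).append(p)
    result = {a: []}
    fresh = a
    for first, group in groups.items():
        if first is None or len(group) < 2:
            result[a].extend(group)      # nothing to factor
            continue
        alpha = common_prefix(group)
        fresh += "'"                     # A', A'', ... for successive groups
        result[a].append(alpha + [fresh])
        result[fresh] = [p[len(alpha):] for p in group]
    return result


prods = [["if", "cond", "then", "S", "else", "S"],
         ["if", "cond", "then", "S"],
         ["other"]]
print(left_factor("S", prods))
```

Here S becomes `if cond then S S'` | `other`, and S' becomes `else S` | ε, so the choice between the two if-forms is postponed until the parser sees whether an else follows.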

Non-Context-Free Language Constructs

Examples of non-context-free languages are:

– L1 = {wcw | w is of the form (a|b)*}

– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}

– L3 = {a^n b^n c^n | n ≥ 0}

Languages similar to these that are context free

– L’1 = {w c w^R | w is of the form (a|b)*} (w^R stands for w reversed)

This language is generated by the grammar

S → aSa | bSb | c

– L’2 = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}

This language is generated by the grammar

S → aSd | aAd

A → bAc | bc

Non-Context-Free Language Constructs (Cont.)

L”2 = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1} is generated by the grammar

S → AB

A → aAb | ab

B → cBd | cd

L’3 = {a^n b^n | n ≥ 1} is generated by the grammar

S → aSb | ab

This language is not definable by any regular expression
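The grammar S → aSb | ab translates almost directly into a recursive recognizer, which is a small hint at why a stack (here, the call stack) is what separates this language from the regular ones. The function name `derives_anbn` is an assumption made for this sketch.

```python
def derives_anbn(s):
    """Return True iff `s` can be derived from S -> a S b | a b."""
    if s == "ab":                        # S -> a b
        return True
    # S -> a S b: peel one 'a' from the front and one 'b' from the back.
    return (len(s) >= 4 and s[0] == "a" and s[-1] == "b"
            and derives_anbn(s[1:-1]))


print(derives_anbn("aaabbb"))  # True
print(derives_anbn("aabbb"))   # False
```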

Non-Context-Free Language Constructs (Cont.)

Suppose we could construct a DFSM D accepting L’3.

D must have a finite number of states, say k.

Consider the sequence of states s0, s1, s2, …, sk entered by D having read ε, a, aa, …, a^k.

Since D only has k states, two of the states in the sequence have to be equal. Say, si = sj (i ≠ j).

From si, a sequence of i bs leads to an accepting (final) state. Therefore, the same sequence of i bs will also lead to an accepting state from sj. Therefore D would accept a^j b^i, which means that the language accepted by D is not identical to L’3. A contradiction.

Parsing

The parsing problem is: Given string of tokens w, find a parse tree whose frontier is w. (Equivalently, find a derivation of w.)

A parser for a grammar G reads a list of tokens and finds a parse tree if they form a sentence (or reports an error otherwise)

Two classes of algorithms for parsing:

– Top-down

– Bottom-up

Parser generators

A parser generator is a program that reads a grammar and produces a parser

The best known parser generator is yacc. It produces bottom-up parsers

Most parser generators, including yacc, do not work for every CFG; they accept a restricted class of CFG’s that can be parsed efficiently using the method employed by that parser generator

Top-down parsing

Starting from parse tree containing just S, build tree down toward input. Expand left-most non-terminal.

Algorithm: (next slide)

Top-down parsing (cont.)

Let input = a1a2...an

current sentential form (csf) = S

loop {
    suppose csf = t1...tk A α
    if t1...tk ≠ a1...ak, it’s an error
    based on ak+1..., choose production A → β
    csf becomes t1...tk β α
}

Top-down parsing example

Grammar: H: L → E ; L | E

E → a | b

Input: a;b

Sentential form    Input
L                  a;b
E;L                a;b
a;L                a;b
a;E                a;b
a;b                a;b

[Figure: the parse tree after each step, growing from the single node L to the complete tree for a;b]

LL(1) parsing

Efficient form of top-down parsing

Use only first symbol of remaining input (ak+1) to choose next production. That is, employ a function M: Σ × N → P in “choose production” step of algorithm.

When this works, grammar is called LL(1)

LL(1) examples

Example 1:

H: L → E ; L | E

E → a | b

Given input a;b, so next symbol is a.

Which production to use? Can’t tell.

⇒ H not LL(1)

LL(1) examples

Example 2:

Exp → Term Exp’

Exp’ → $ | + Exp

Term → id

(Use $ for “end-of-input” symbol.)

Grammar is LL(1): Exp and Term have only one production; Exp’ has two productions but only one is applicable at any time.

Nonrecursive predictive parsing

It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls

The key problem during predictive parsing is that of determining the production to be applied for a non-terminal

Nonrecursive predictive parsing

Algorithm. Nonrecursive predictive parsing

Set ip to point to the first symbol of w$.
repeat
    Let X be the top of the stack symbol and a the symbol pointed to by ip
    if X is a terminal or $ then
        if X == a then
            pop X from the stack and advance ip
        else error()
    else // X is a nonterminal
        if M[X,a] == X → Y1 Y2 … Yk then
            pop X from the stack
            push Yk, Yk-1, …, Y1 onto the stack with Y1 on top
            (push nothing if Y1 Y2 … Yk is ε)
            output the production X → Y1 Y2 … Yk
        else error()
until X == $
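The algorithm can be sketched in Python as below. Several things here are assumptions made for the example rather than part of the slides: the function name `predictive_parse`, the dictionary encoding of the table M, and the grammar itself, which is Example 2 with Exp’ → $ replaced by Exp’ → ε so that the end marker $ appears only at the bottom of the stack and at the end of the input.

```python
def predictive_parse(tokens, table, start, nonterminals):
    """Table-driven LL(1) parse; returns the productions used, in order."""
    inp = list(tokens) + ["$"]
    ip = 0
    stack = ["$", start]
    output = []
    while True:
        x, a = stack[-1], inp[ip]
        if x not in nonterminals:            # X is a terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, got {a!r}")
            if x == "$":                     # stack and input both exhausted
                return output
            stack.pop()
            ip += 1
        else:                                # X is a nonterminal
            rhs = table.get((x, a))
            if rhs is None:                  # empty table entry means error
                raise SyntaxError(f"no production for ({x!r}, {a!r})")
            stack.pop()
            stack.extend(reversed(rhs))      # Y1 ends up on top
            output.append((x, rhs))


# Hand-built table for: Exp -> Term Exp', Exp' -> + Exp | epsilon, Term -> id
NONTERMINALS = {"Exp", "Exp'", "Term"}
TABLE = {
    ("Exp", "id"): ["Term", "Exp'"],
    ("Exp'", "+"): ["+", "Exp"],
    ("Exp'", "$"): [],
    ("Term", "id"): ["id"],
}

for head, body in predictive_parse(["id", "+", "id"], TABLE, "Exp", NONTERMINALS):
    print(head, "->", " ".join(body) or "epsilon")
```

The productions are printed in the order of a leftmost derivation, which is exactly the "output the production" step of the algorithm.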

LL(1) grammars

No left recursion

A → Aα : If this production is chosen, parse makes no progress.

No common prefixes

A → αβ | αγ

Can fix by “left factoring”:

A → αA’

A’ → β | γ

LL(1) grammars (cont.)

No ambiguity

Precise definition requires that production to choose be unique (“choose” function M very hard to calculate otherwise)

Top-down Parsing

[Figure: two snapshots of a parse tree with root L (start symbol and root of parse tree) and children E0 … E-n, shown against the input tokens <t0,t1,…,t-i,...> and then against the remaining input tokens <t-i,...>. Caption: From left to right, “grow” the parse tree downwards]

Checking LL(1)-ness

For any sequence of grammar symbols α, define set FIRST(α) to be

FIRST(α) = { a | α ⇒* aβ for some β }

Checking LL(1)-ness

Define: Grammar G = (N, Σ, P, S) is LL(1) iff whenever there are two left-most derivations (in which the leftmost non-terminal is always expanded first)

S ⇒* wAα ⇒ wβα ⇒* wx

S ⇒* wAα ⇒ wγα ⇒* wy

such that FIRST(x) = FIRST(y), it follows that β = γ

In other words, given

1. A string wAα in V* and

2. The first terminal symbol to be derived from Aα, say t

there is at most one production that can be applied to A to yield a derivation of any terminal string beginning with wt

FIRST sets can often be calculated by inspection

FIRST Sets

Exp → Term Exp’

Exp’ → $ | + Exp

Term → id

(Use $ for “end-of-input” symbol)

FIRST($) = {$}

FIRST(+ Exp) = {+}

FIRST($) ∩ FIRST(+ Exp) = {}

⇒ grammar is LL(1)

FIRST Sets

L → E ; L | E

E → a | b

FIRST(E ; L) = {a, b} = FIRST(E)

FIRST(E ; L) ∩ FIRST(E) ≠ {}

⇒ grammar not LL(1).

Computing FIRST Sets

Algorithm. Compute FIRST(X) for all grammar symbols X

forall X ∈ V do FIRST(X) = {}
forall X ∈ Σ (X is a terminal) do FIRST(X) = {X}
forall productions X → ε do FIRST(X) = FIRST(X) U {ε}
repeat
    c: forall productions X → Y1 Y2 … Yk do
        forall i ∈ [1,k] do
            FIRST(X) = FIRST(X) U (FIRST(Yi) - {ε})
            if ε ∉ FIRST(Yi) then continue c
        FIRST(X) = FIRST(X) U {ε}
until no more terminals or ε are added to any FIRST set
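The fixed-point computation can be sketched in Python as follows. The encoding is an assumption for this example: the grammar is a dict from nonterminal to a list of productions, each production a list of symbols with `[]` for ε, and the string `"ε"` marks the empty string inside FIRST sets. The sample grammar is A → T x | T y, T → w | ε.

```python
EPS = "ε"


def first_sets(grammar, terminals):
    """Compute FIRST(X) for every grammar symbol X by iterating to a fixed point."""
    first = {t: {t} for t in terminals}
    for x in grammar:
        first[x] = set()
    changed = True
    while changed:
        changed = False
        for x, prods in grammar.items():
            for rhs in prods:
                before = len(first[x])
                for y in rhs:
                    first[x] |= first[y] - {EPS}
                    if EPS not in first[y]:
                        break            # Yi cannot vanish: stop scanning rhs
                else:
                    # Every Yi can derive epsilon (or rhs is empty).
                    first[x].add(EPS)
                if len(first[x]) != before:
                    changed = True
    return first


grammar = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
first = first_sets(grammar, {"x", "y", "w"})
print(sorted(first["A"]), sorted(first["T"]))
```

The `for ... else` construct plays the role of the labelled `continue c` in the pseudocode: the `else` branch runs only when the inner loop was not cut short by a symbol that cannot derive ε.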

FIRST Sets of Strings of Symbols

FIRST(X1X2…Xn) is the union of FIRST(X1) and all FIRST(Xi) such that ε ∈ FIRST(Xk) for k=1, 2, …, i-1 (leaving ε out of each set)

FIRST(X1X2…Xn) contains ε iff ε ∈ FIRST(Xk) for k=1, 2, …, n

FIRST Sets do not Suffice

Given the productions

A → T x

A → T y

T → w

T → ε

T → w should be applied when the next input token is w.

T → ε should be applied whenever the next terminal (the one pointed to by ip) is either x or y

FOLLOW Sets

For any nonterminal X, define set FOLLOW(X) as

FOLLOW(X) = { a | S ⇒* αXaβ }

Computing the FOLLOW Set

Algorithm. Compute FOLLOW(X) for all nonterminals X

FOLLOW(S) = {$}
forall productions A → αBβ do
    FOLLOW(B) = FOLLOW(B) U (FIRST(β) - {ε})
repeat
    forall productions A → αB or A → αBβ with ε ∈ FIRST(β) do
        FOLLOW(B) = FOLLOW(B) U FOLLOW(A)
until all FOLLOW sets remain the same
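A Python sketch of the FOLLOW computation is given below. It folds the two phases of the algorithm into one fixed-point loop, which gives the same result. The names `follow_sets` and `first_of_string` are assumptions, as is the encoding (productions as lists of symbols, `[]` for ε, `"ε"` inside FIRST sets). To keep the block short, the FIRST sets of the sample grammar A → T x | T y, T → w | ε are written out by hand.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each symbol."""
    out = set()
    for s in symbols:
        out |= first[s] - {EPS}
        if EPS not in first[s]:
            return out
    out.add(EPS)                 # every symbol can vanish (or string is empty)
    return out


def follow_sets(grammar, start, first):
    """Compute FOLLOW(X) for every nonterminal X of `grammar`."""
    follow = {a: set() for a in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                for i, b in enumerate(rhs):
                    if b not in grammar:         # only nonterminals have FOLLOW
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:              # beta is empty or can vanish
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


grammar = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
first = {"x": {"x"}, "y": {"y"}, "w": {"w"},
         "T": {"w", EPS}, "A": {"w", "x", "y"}}   # hand-computed FIRST sets
follow = follow_sets(grammar, "A", first)
print(sorted(follow["A"]), sorted(follow["T"]))
```

FOLLOW(T) comes out as {x, y}, which is the information needed to decide when to apply T → ε.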

Construction of a predictive parsing table

Algorithm. Construction of a predictive parsing table

M[:,:] = {}
forall productions A → α do
    forall a ∈ FIRST(α) do
        M[A,a] = M[A,a] U {A → α}
    if ε ∈ FIRST(α) then
        forall b ∈ FOLLOW(A) do
            M[A,b] = M[A,b] U {A → α}
Make all empty entries of M be error
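The table construction can be sketched as below. All names are assumptions made for the example, and the FIRST and FOLLOW sets are hand-computed for a variant of the earlier expression grammar (Exp → Term Exp’, Exp’ → + Exp | ε, Term → id) rather than produced by code. Missing keys in the resulting dict play the role of the error entries.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each symbol."""
    out = set()
    for s in symbols:
        out |= first[s] - {EPS}
        if EPS not in first[s]:
            return out
    out.add(EPS)
    return out


def build_table(grammar, first, follow):
    """Return M as a dict: (nonterminal, terminal) -> list of right-hand sides."""
    table = {}
    for a, prods in grammar.items():
        for rhs in prods:
            f = first_of_string(rhs, first)
            targets = f - {EPS}
            if EPS in f:
                targets |= follow[a]
            for t in targets:
                table.setdefault((a, t), []).append(rhs)
    return table


grammar = {"Exp": [["Term", "Exp'"]],
           "Exp'": [["+", "Exp"], []],
           "Term": [["id"]]}
# Hand-computed sets for this grammar (assumed inputs, not derived here).
first = {"id": {"id"}, "+": {"+"}, "Term": {"id"},
         "Exp": {"id"}, "Exp'": {"+", EPS}}
follow = {"Exp": {"$"}, "Exp'": {"$"}, "Term": {"+", "$"}}

table = build_table(grammar, first, follow)
is_ll1 = all(len(entries) == 1 for entries in table.values())
print(is_ll1)  # True
```

A cell holding more than one production signals a conflict, so checking that every cell has exactly one entry doubles as a test that the grammar is LL(1).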

Another Definition of LL(1)

Define: Grammar G is LL(1) if for every A ∈ N with productions A → α1 | … | αn

FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = ∅ for all i ≠ j

Regular Languages

Definition. A regular grammar is one whose productions are all of the type:

– A → aB

– A → a

A Regular Expression is either:

– a

– R1 | R2

– R1 R2

– R*

Nondeterministic Finite State Automaton

[Figure: automaton with start state 0 and states 1, 2, 3, with transitions labeled a and b]

Regular Languages

Theorem. The classes of languages

– Generated by a regular grammar

– Expressed by a regular expression

– Recognized by a NDFS automaton

– Recognized by a DFS automaton

coincide.

Deterministic Finite Automaton

[Figure: scanner DFA with states START, NUM, OPERATOR and KEYWORD; transitions are labeled with: space, tab, new line; digit; letter; =, +, -, /, (, ); $]

Legend:

circle = state

double circle = accept state

arrow = transition

bold, cap labels = state names

lower case labels = transition characters

Scanner code

state := start
loop
    if no input character buffered then read one, and add it to the accumulated token
    case state of
        start:
            case input_char of
                A..Z, a..z : state := id
                0..9 : state := num
                else ...
            end
        id:
            case input_char of
                A..Z, a..z : state := id
                0..9 : state := id
                else ...
            end
        num:
            case input_char of
                0..9: ...
                ...
                else ...
            end
        ...
    end;
end;

Table-driven DFA

              0-start  1-num  2-id   3-operator  4-keyword
white space   0        exit   exit   exit        exit
letter        2        error  2      exit        error
digit         1        1      2      exit        error
operator      3        exit   exit   exit        exit
$             4        error  error  exit        4
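The transition table above can drive a scanner loop directly. The following Python sketch is one possible reading of it, with several assumptions: "exit" is taken to mean "emit the accumulated token and re-examine the current character from the start state", "error" raises an exception, and the helper names `char_class` and `scan` are invented for this example.

```python
EXIT, ERROR = "exit", "error"
STATE_NAMES = ["start", "num", "id", "operator", "keyword"]
# One row per character class, one column per state (0..4), as in the table.
TABLE = {
    "space":    [0, EXIT, EXIT, EXIT, EXIT],
    "letter":   [2, ERROR, 2, EXIT, ERROR],
    "digit":    [1, 1, 2, EXIT, ERROR],
    "operator": [3, EXIT, EXIT, EXIT, EXIT],
    "$":        [4, ERROR, ERROR, EXIT, 4],
}


def char_class(ch):
    """Map a character to its row in TABLE."""
    if ch in " \t\n":
        return "space"
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch in "=+-/()":
        return "operator"
    if ch == "$":
        return "$"
    raise ValueError(f"unexpected character {ch!r}")


def scan(text):
    """Split `text` into (token kind, lexeme) pairs using TABLE."""
    tokens = []
    state, lexeme = 0, ""
    for ch in text + " ":                # trailing blank flushes the last token
        while True:
            action = TABLE[char_class(ch)][state]
            if action == ERROR:
                raise ValueError(f"bad character {ch!r} in state {STATE_NAMES[state]}")
            if action == EXIT:
                tokens.append((STATE_NAMES[state], lexeme))
                state, lexeme = 0, ""
                continue                 # re-examine ch from the start state
            state = action
            if state != 0:               # white space is not part of any token
                lexeme += ch
            break
    return tokens


print(scan("x1 = 42+y"))
```

Keeping the automaton in a data table rather than in nested case statements means the loop stays the same when states or character classes are added.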

Language Classes

[Figure: nested language classes, labeled RL [DFA=NFA], LL(1), LR(1), CFL [NPA], CSL and L0]

Question

Are regular expressions, as provided by Perl or other languages, sufficient for parsing nested structures, e.g. XML files?
