eliza.newhaven.edu/lang/attach/L5-Parsing.pdf

Structure of Programming Languages – Lecture 5

CSCI 6636 – 4536

February 25, 2020


Outline

1 Syntax and its Specification

2 Context-free Languages
   History
   Extended BNF
   Syntax Diagrams

3 The Definition of Pascal

4 Parsing
   LL Parsers
   LR Parsers

5 Homework


Syntax and its Specification

Part 1

1. Syntax and its Specification

Context-Free Languages

Extended Backus-Naur Form

Syntax Diagrams


Context-free Languages History

Context-Free Languages

Formally, almost all programming languages belong to the category called “context-free languages”. That is, the syntax of the language (excluding the type matching rules) can be described by a context-free grammar.

The set of all context-free languages is identical to the set of languages accepted by a finite-state machine that uses a stack for temporary storage.

We call such a machine a pushdown automaton.

A context-free grammar provides a simple and precise mechanism for describing the way phrases in a language are built from smaller blocks.
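To make the idea of "a finite-state machine that uses a stack" concrete, here is a minimal Python sketch of a recognizer for properly nested brackets. The function name `balanced` and the choice of three bracket kinds are assumptions made for illustration; they do not come from the lecture.

```python
def balanced(text: str) -> bool:
    """Accept text whose brackets are properly nested, using a stack."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)            # push every opening bracket
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False            # closer with no matching opener
    return not stack                    # accept only if nothing is left open
```

The stack is what lets the recognizer remember an unbounded number of open brackets; a machine with only a fixed number of states could not do this.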


Regular vs. Context-free

Regular grammars support sequences of elements, choice among a set of elements, bounded and unbounded repetition, and using a subroutine (a separate expression that defines a set of elements).

Context-free languages support all of the above, plus recursion. Recursive rules provide the ability to describe matched pairs of elements, such as parentheses. This power is necessary to describe many parts of a programming language, including:

Program blocks with nested blocks
Arithmetic expressions with nested subexpressions
Array subscripts; the ‘[’ and ‘]’ must match.
Begin-comment and matching end-comment markers

There is a part of programming languages that context-free grammars cannot describe: the semantics of types.


A Context-free Grammar G is

A finite set of nonterminal symbols, V (for vocabulary), each one representing a different type of syntactic category in the language.

A finite set of keywords and punctuation, Σ (for symbol). These are called terminal symbols.

A finite set, R, of rules or productions of the grammar.

There must be at least one rule for every nonterminal symbol.

The starting symbol, S, is used to represent the whole sentence or program. It must be an element of V.
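The four components can be written down directly as a data structure. The sketch below is one possible Python encoding, not the lecture's own; the class name `Grammar`, its field names, and the `problems` method are hypothetical. It checks the two conditions stated above (every nonterminal has a rule, and S is in V), using the AS grammar that appears later in these slides as sample data.

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    nonterminals: set   # V
    terminals: set      # Σ
    rules: dict         # R: nonterminal -> list of alternatives (tuples of symbols)
    start: str          # S

    def problems(self) -> list:
        """Return a list of violations of the conditions stated above."""
        found = []
        if self.start not in self.nonterminals:
            found.append(f"start symbol {self.start!r} is not in V")
        for nt in sorted(self.nonterminals):
            if not self.rules.get(nt):
                found.append(f"no rule for nonterminal {nt!r}")
        known = self.nonterminals | self.terminals
        for nt, alternatives in self.rules.items():
            for alt in alternatives:
                for sym in alt:
                    if sym not in known:
                        found.append(f"unknown symbol {sym!r} in a rule for {nt!r}")
        return found

# The AS grammar from later in the lecture, used here as sample data.
as_grammar = Grammar(
    nonterminals={"S", "A"},
    terminals={"x", "(", ")"},
    rules={"S": [("A",)], "A": [("(", "S", ")"), ("A", "S", "A"), ("x",)]},
    start="S",
)
```

Calling `as_grammar.problems()` returns an empty list, since the AS grammar satisfies every condition.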


History: Describing Programming Language Syntax.

Context-free grammars were developed by logicians in the 1950s. The notation they used was mathematical in nature and not friendly to computer keyboards.

The first application was to analysis of natural language.

Soon afterward, they were applied to the definition of programming languages.

Context-free grammars were then crucial in the development of languages and translation tools.

So a new notation called Backus Naur Form (BNF) was developed, better adapted to the character set supported by computers.

Later, it was extended to make it easier to use. The extended version is called EBNF, and one version of EBNF is the notation commonly used today.


Context-free Languages Extended BNF

The syntax for EBNF itself

EBNF is a notation for writing context-free grammars. We say it is a metalanguage, that is, a language for describing languages.

Nonterminal symbols will be written in non-bold type and/or enclosed in < . . . >.

Terminal symbols will be written in boldface and/or enclosed in ‘single quotes’.

Production rules. The nonterminal being defined is written at the left, followed by an “=” sign (which we will pronounce as “becomes”). After this is a set of options, which define how the nonterminal can be expanded. The rule extends up to but does not include the “;” at the end.

When a nonterminal is expanded it is replaced by one of the options from its definition.

Blank spaces between the “=” and the “;” are ignored.


Syntax for EBNF Production Rules

Alternatives are separated by vertical bars. This indicates that an ‘s’ may be replaced by an ‘a’ or a ‘bc’:

s ::= a | bc .

Parentheses may be used to indicate grouping. For example, this indicates that an ‘s’ may be replaced by an ‘ad’ or a ‘bcd’:

s ::= ( a | bc ) d .

Something enclosed in square brackets is optional. For example, this rule says that an ‘s’ may be replaced by an ‘ad’ or simply by a ‘d’:

s ::= [a] d .

Zero or more repetitions of a unit are indicated by enclosing the unit in curly braces. This rule says that an ‘s’ may be replaced by a ‘db’, an ‘adb’, an ‘aadbb’, or in general a string of any number of ‘a’s followed by a single ‘d’ and one or more ‘b’s:

s ::= {a} d b {b} .
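None of the four example rules is recursive, so each one describes a regular language and can be checked with an ordinary regular expression. The sketch below pairs each EBNF rule with a regex that is believed to be equivalent; the dictionary `rules` and the helper `derivable` are hypothetical names introduced for illustration.

```python
import re

# EBNF construct      regex equivalent
# a | bc              a|bc
# ( a | bc ) d        (?:a|bc)d
# [a] d               a?d
# {a} d b {b}         a*dbb*
rules = {
    "s ::= a | bc .":       re.compile(r"a|bc"),
    "s ::= ( a | bc ) d .": re.compile(r"(?:a|bc)d"),
    "s ::= [a] d .":        re.compile(r"a?d"),
    "s ::= {a} d b {b} .":  re.compile(r"a*dbb*"),
}

def derivable(rule: str, text: str) -> bool:
    """True if text can be derived from s under the given rule."""
    return rules[rule].fullmatch(text) is not None
```

For the last rule, `derivable("s ::= {a} d b {b} .", "aadbb")` is true while `derivable("s ::= {a} d b {b} .", "d")` is false, because at least one ‘b’ is required after the ‘d’.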


Example: A context-free grammar and EBNF notation.

This grammar defines the part of Pascal arithmetic expressions that applies to primitive types.

V is {addOp, multiplyOp, relationalOp, expression, simpleExpr, sign, term, factor, variableAccess, unsignedConstant}. The starting symbol is expression.

Σ is not, =, and all 15 operators

R is this set of rules:
1. relationalOp = < | <= | > | >= | = | <> ;
2. addOp = + | - | or | xor ;
3. multiplyOp = * | / | div | mod | and ;
4. expression = simpleExpr { relationalOp simpleExpr } ;
5. simpleExpr = [ sign ] term { addOp term } ;
6. term = factor { multiplyOp factor } ;
7. factor = variableAccess | unsignedConstant | (expression) | (not factor) ;


Applying the grammar for expressions.

Consider the expression 3 ∗ (x + 2) < limit

a. x is a variableAccess, so is limit.

b. A variableAccess is a factor and a factor is a term.

c. x+2 is a simpleExpr: term + term with no sign.

d. A simpleExpr is an expression, and with the parentheses, the whole unit is a factor.

e. 3*(x+2) is a term: factor multiplyOp factor.

f. So it is also a simpleExpr.

g. 3 ∗ (x + 2) < limit is an expression: simpleExpr relationalOp simpleExpr.

Rule 7 contains a recursive reference to rule 4. Steps d and g show nesting, the result of applying that recursive definition.
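Steps a through g can be mechanized with one function per rule, the technique discussed later in these slides under recursive descent. The following is a sketch under several assumptions: the tokenizer, the class name `Parser`, the nested-tuple tree shape, and the treatment of `sign` as `+` or `-` are all choices made for illustration, and an ASCII `*` stands in for the ∗ on the slide. Characters the tokenizer does not recognize are silently skipped.

```python
import re

RELATIONAL = {"<", "<=", ">", ">=", "=", "<>"}      # rule 1
ADD = {"+", "-", "or", "xor"}                       # rule 2
MULTIPLY = {"*", "/", "div", "mod", "and"}          # rule 3

def tokenize(text):
    return re.findall(r"<=|>=|<>|[-+*/()<>=]|\w+", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):               # rule 4
        node = self.simple_expr()
        while self.peek() in RELATIONAL:
            op = self.take()
            node = (op, node, self.simple_expr())
        return node

    def simple_expr(self):              # rule 5; sign assumed to be + or -
        sign = self.take() if self.peek() in {"+", "-"} else None
        node = self.term()
        if sign:
            node = (sign, node)
        while self.peek() in ADD:
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):                     # rule 6
        node = self.factor()
        while self.peek() in MULTIPLY:
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):                   # rule 7
        token = self.take()
        if token == "(":
            node = self.expression()    # the recursive reference to rule 4
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token == "not":
            return ("not", self.factor())
        is_word = token is not None and re.fullmatch(r"\w+", token)
        if not is_word or token in (ADD | MULTIPLY):
            raise SyntaxError(f"unexpected token {token!r}")
        return token                    # variableAccess or unsignedConstant

def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these definitions, `parse("3 * (x + 2) < limit")` returns `('<', ('*', '3', ('+', 'x', '2')), 'limit')`. The inner `('+', 'x', '2')` corresponds to steps c and d, and the outer `'<'` node to step g.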


Example: An EBNF grammar for AS.

This grammar defines a nonsense language called AS.

V is {S ,A} and the starting symbol is S

Σ is x ( )

R: On the left is the boldface presentation of the rules; on the right is the machine-compatible version that uses angle brackets around non-terminals.

1. S ::= A .        1. S = <A> ;
2. A ::= ( S ) .    2. A = ( <S> ) ;
3. A ::= ASA .      3. A = <A><S><A> ;
4. A ::= x .        4. A = x ;

Rules 2, 3, and 4 can be consolidated to: A ::= ASA | ( S ) | x .


Describing Programming Language Syntax

This grammar illustrates how matched and nested symbols are generated.

Start a derivation by writing down the starting symbol.

Apply rules to nonterminal symbols, in any order, to reach your goal.

Stop when all the nonterminals are gone.

Any rule that introduces a left-paren must also introduce a matching right-paren.

The grammar is recursive so that parenthesized units can be produced inside other pairs of parentheses.


Example: What strings are in the AS language?

Following are a few examples of AS derivations.

S → A → x .

S → A → (S) → (A) → (x) .

S → A → ASA → xSx → xAx → xxx .

S → A → ASA → (S)Sx → (A)Ax → (x)xx .
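The derivations above can be replayed mechanically. In this sketch (the names `RULES` and `derive` are hypothetical), each rule number from the AS grammar is applied to the leftmost occurrence of that rule's left-hand side, one rule per step, so the output shows more intermediate forms than the condensed derivations on the slide.

```python
# Rules 1-4 of the AS grammar: number -> (left-hand side, right-hand side)
RULES = {1: ("S", "A"), 2: ("A", "(S)"), 3: ("A", "ASA"), 4: ("A", "x")}

def derive(rule_numbers, start="S"):
    """Apply each rule to the leftmost occurrence of its left-hand side."""
    form = start
    steps = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        if lhs not in form:
            raise ValueError(f"rule {number} does not apply to {form!r}")
        form = form.replace(lhs, rhs, 1)    # rewrite one nonterminal
        steps.append(form)
    return " → ".join(steps)
```

For example, `derive([1, 2, 1, 4])` returns `S → A → (S) → (A) → (x)`, matching the second derivation above. A derivation is finished when the final form contains neither S nor A.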


Example: An EBNF Grammar for Nonsense.

This grammar includes a loop and an optional element.

The starting symbol is S .

Nonterminal symbols are: S, stop

Terminal symbols are: A B C D E –

Productions:
S ::= E { – E } B stop
S ::= [ stop ] A stop
stop ::= C | D

We use this grammar to generate four Nonsense sentences:

S → A stop → A D

S → stop A stop → D A D

S → E B stop → E B C

S → E - E - E B stop → E - E - E B D
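The loop and the optional element translate directly into a `while` and an `if` in a hand-written recognizer. This is a sketch, not part of the lecture: the function name `is_nonsense` is hypothetical, terminals are assumed to be separated by blanks, and the dash terminal is written as an ASCII hyphen.

```python
def is_nonsense(sentence: str) -> bool:
    """True if the blank-separated sentence is derivable from S."""
    tokens = sentence.split()
    stop = {"C", "D"}                           # stop ::= C | D
    i = 0
    if tokens[:1] == ["E"]:                     # S ::= E { - E } B stop
        i = 1
        while tokens[i:i + 2] == ["-", "E"]:    # { - E }: zero or more repetitions
            i += 2
        if tokens[i:i + 1] != ["B"]:
            return False
        i += 1
    else:                                       # S ::= [ stop ] A stop
        if tokens[i:i + 1] and tokens[i] in stop:   # [ stop ]: optional element
            i += 1
        if tokens[i:i + 1] != ["A"]:
            return False
        i += 1
    # both alternatives end with exactly one stop and nothing after it
    return len(tokens) == i + 1 and tokens[i] in stop
```

All four sentences generated above are accepted, while a string such as `E - B C` is rejected because every dash must be followed by an E.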


Context-free Languages Syntax Diagrams

Syntax Diagrams

An alternative formal definition metalanguage was developed for Pascal; it is often called “railroad diagrams”. It has the same elements as EBNF, but they are presented in a 2D graphic format:

Terminal symbols are boldface and enclosed in ovals. Nonterminal symbols are written in non-bold type.

Production rules: the nonterminal being defined is written at the left, followed by an arrow.

Alternatives are shown by branches in the arrow.

To expand a nonterminal, follow some branch of the arrow to its end at the right.

An optional element is handled by an empty arrow branching around it.

Repetitions of a unit are shown by the arrow looping back on itself.


AS in Syntax Diagrams

We have an alternative and a recursive rule.

[Syntax diagrams for S and A, with three sample derivations:]

S → A → x

S → A → ( S ) → ( A ) → ( x )

S → A → ASA → AAx → (S)(S)x → (A)(A)x → (x)((S))x → (x)((A))x → (x)((x))x


Nonsense in Syntax Diagrams

Here is a looping rule and an optional element.

[Syntax diagrams for S and stop, with four sample derivations:]

S → A stop → A D

S → E B stop → E B C

S → stop A stop → C A D

S → E-E-E B stop → E-E-E B D


The Definition of Pascal

Pascal Syntax

Here are large parts of the definition of Pascal. Productions involving type declarations have been omitted.

EBNF definition of a Pascal program

Syntax Diagrams for Pascal expressions


The Syntax for part of Pascal.

program ::= <program-heading> ; <program-block> . .

program-heading ::= program <identifier> [ ( <program-parameters> ) ] .

program-parameters ::= <identifier-list> .

identifier-list ::= <identifier> { , <identifier> } .

program-block ::= <block> .

block ::= <label-declaration-part> <constant-declaration-part> <type-declaration-part> <variable-declaration-part> <procedure-and-function-declaration-part> <statement-part> .

variable-declaration-part ::= [ var { <identifier-list> : <typename> ; } ] .


Continuing with Pascal.

statement-part ::= <compound-statement> .

compound-statement ::= begin <statement-sequence> end .

statement-sequence ::= <statement> { ; <statement> } .

statement ::= [ <label> : ] ( <simple-statement> | <structured-statement> ) .

simple-statement ::= <empty-statement> | <assignment-statement> | <procedure-call-statement> | <goto-statement> .

structured-statement ::= <compound-statement> | <conditional-statement> | <repetitive-statement> | <with-statement> .


Simple Statements in Pascal.

empty-statement ::= .

assignment-statement ::= ( <variable-reference> | <function-name> ) ‘:=’ <expression> .

procedure-call-statement ::= <IO-procedure-statement> | <procedure-identifier> [ ( <actual-parameter-list> ) ] .

IO-procedure-statement ::= read <read-parameter-list> | readln <readln-parameter-list> | write <write-parameter-list> | writeln <writeln-parameter-list> .

goto-statement ::= goto <label> .

label-declaration ::= [ label <label> { , <label> } ] .

label ::= <digit-sequence> .


Conditionals in Pascal.

compound-statement ::= begin <statement> { ; <statement> } end .

conditional-statement ::= <if-statement> | <case-statement> .

if-statement ::= if <boolean-expression> then <statement> [ <else-part> ] .

else-part ::= else <statement> .

case-statement ::= case <case-index> of <case-list-element> { ; <case-list-element> } [ ; ] end .

case-list-element ::= <case-constant-list> : <statement> .

case-constant-list ::= <case-constant> { , <case-constant> } .

case-constant ::= <constant> .


Loops and With in Pascal.

repetitive-statement ::= <repeat-statement> | <while-statement> | <for-statement> .

repeat-statement ::= repeat <statement-sequence> until <boolean-expression> .

while-statement ::= while <boolean-expression> do <statement> .

for-statement ::= for <control-variable> := <initial-value> ( to | downto ) <final-value> do <statement> .

with-statement ::= with <record-variable-list> do <statement> .


Pascal Expressions

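[Syntax diagrams for simple expression and expression. A simple expression is built from terms joined by operators such as + and or; an expression is built from simple expressions joined by one of =, <, <>, >, >=, <=, in.]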


Pascal Expressions

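[Syntax diagrams for term and function designator. A term is built from factors joined by *, /, div, mod, and; a function designator is a function identifier followed by a parenthesized list of actual parameters separated by commas.]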


Pascal Expressions

[Syntax diagram for factor. Recoverable elements: unsigned constant, variable, function designator, set value, ( expression ), not factor.]


Parsing


Ad-hoc Parsing

Parsing Based on EBNF


Old Languages were Parsed Ad-Hoc

These comments reflect FORTRAN-IV.

The language itself was created by collecting a lot of features. Everything about it was non-uniform and full of special cases. For example, there were half a dozen ways to punctuate a series of items. Syntax diagrams occupied 40 pages, versus 6 for Pascal.

Everything was made more difficult because the language definition said that spaces were ignored.

A FORTRAN-IV parser was basically hand-built. It would look at the next source-code character and try to figure out what it might be, given the current context.

This is a famous FORTRAN parsing problem that illustrates what is wrong with ad-hoc design: DO 200 I=1,10,2

Since DO200I is a legal variable name, we can’t know whether this is an assignment statement or a DO loop until the first comma-token is found.
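The decision can be sketched in a few lines of Python. The function `classify` is hypothetical and heavily simplified: it removes blanks (since they are not significant), then looks for a comma after the equals sign. A real FORTRAN front end would also have to ignore commas nested inside parentheses, which this sketch does not attempt.

```python
import re

def classify(statement: str) -> str:
    """Classify a FORTRAN-IV statement that begins with the letters DO."""
    text = statement.replace(" ", "").upper()   # spaces are not significant
    match = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=(.+)", text)
    if match and "," in match.group(3):
        label, var, _ = match.groups()
        return f"DO loop: label {label}, control variable {var}"
    if re.fullmatch(r"[A-Z][A-Z0-9]*=.+", text):
        return f"assignment to {text.split('=', 1)[0]}"
    return "unrecognized"
```

`classify("DO 200 I=1,10,2")` reports a DO loop with label 200 and control variable I. Replacing the first comma with a period, as in `classify("DO 200 I=1.10")` (an illustrative variant, not taken from the slide), reports an assignment to DO200I.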


Ad-Hoc Languages Today

We can list several current languages with no rhyme or reason in the design:

The C-shells: bash, tcsh, and other UNIX shell languages and scripts.

Perl

TeX and LaTeX

These are hard to learn and hard to write correctly. They are parsed and interpreted in an ad-hoc manner. Often the semantics are complicated and hard to understand.


Parsing LL Parsers

Recursive Descent Parsing: LL(k) languages

A recursive descent parser is a top-down parser built from a set of mutually-recursive and/or non-recursive procedures.

Each procedure implements one of the rules of the grammar. Thus the structure of the parser closely mirrors that of the grammar it recognizes.

A linear-time parser can be built for any language in which a look-ahead of k input symbols allows the parser to decide which production to use next (k is a non-negative integer constant).

An ambiguous grammar cannot be parsed this way.

Also, the grammar cannot contain left-recursive rules, of the form expr ::= expr + term. However, right-recursive rules, of the form expr ::= term + expr, are not a problem.
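The trouble with left recursion shows up as soon as the rule is coded as a procedure. The sketch below makes several assumptions that are not in the slide: a `| term` alternative is added to each rule so that the recursion has a base case, `term` is reduced to a single number token, and each procedure returns the position just past what it recognized.

```python
def term(tokens, pos):
    """term ::= NUMBER ; return the position after the term."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError(f"expected a number at position {pos}")

def expr_left(tokens, pos=0):
    """expr ::= expr + term | term, coded naively."""
    # The procedure must first recognize an expr at the same position,
    # so it calls itself before consuming any token and never returns.
    pos = expr_left(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        return term(tokens, pos + 1)
    return pos

def expr_right(tokens, pos=0):
    """expr ::= term + expr | term ; one token of look-ahead chooses."""
    pos = term(tokens, pos)             # consumes a token before recursing
    if pos < len(tokens) and tokens[pos] == "+":
        return expr_right(tokens, pos + 1)
    return pos

def expr_loop(tokens, pos=0):
    """expr ::= term { + term } ; the EBNF repetition becomes a loop."""
    pos = term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        pos = term(tokens, pos + 1)
    return pos
```

On the token list `["1", "+", "2", "+", "3"]`, both `expr_right` and `expr_loop` consume all five tokens, while `expr_left` exhausts the call stack and Python raises `RecursionError`.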


Top-down Parsing

The recursive descent parser starts with the starting symbol of the grammar and the beginning of the tokenized source-code file.

It then attempts to find a match for the left end of one of the possible definitions of the starting symbol.

If the left end is found, it calls itself recursively, with the rest of the source code, to find a match for the next part of the production.

This process works its way through the source code and down the list of productions. It will terminate successfully when the inner, recursive calls have all terminated and a match is found for the rightmost element in the original starting production.

If it fails at any point, it has recognized a syntax error.


Recursive Descent Parsing with Backtracking

A less-efficient top-down method exists for grammars that do not meet the criteria above.

The parser works as above, but if it fails at any point, it will backtrack and try another option from the current production.

This process will terminate when it succeeds or when all possibilities have been attempted.

Parsers that use recursive descent with backtracking may require exponential time.
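A small backtracking recognizer can be written with Python generators, where each generator yields every position at which a match could end and the caller simply asks for the next one when a later step fails. The grammar used here, S ::= a S a | a, is an assumed example, not one from the lecture; it was picked because a fixed look-ahead cannot tell the parser where the middle of the string is. The names `GRAMMAR`, `match`, `match_sequence`, and `accepts` are hypothetical, and the sketch would loop forever on a left-recursive grammar.

```python
# Assumed example grammar: S ::= a S a | a
GRAMMAR = {
    "S": [["a", "S", "a"], ["a"]],
}

def match(symbol, text, pos):
    """Yield every position where a parse of symbol starting at pos can end."""
    if symbol not in GRAMMAR:                   # terminal symbol
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:         # try each option in turn
        yield from match_sequence(alternative, text, pos)

def match_sequence(symbols, text, pos):
    """Yield every end position for matching the symbols one after another."""
    if not symbols:
        yield pos
        return
    for middle in match(symbols[0], text, pos):
        # If the rest fails, the loop resumes and tries the next way
        # of matching the first symbol: this is the backtracking step.
        yield from match_sequence(symbols[1:], text, middle)

def accepts(text, start="S"):
    """True if some parse of the start symbol consumes the whole text."""
    return any(end == len(text) for end in match(start, text, 0))
```

`accepts("aaa")` is true and `accepts("aa")` is false. The recognizer reaches those answers only after abandoning several partial parses, which is where the potentially exponential cost comes from.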


Parsing LR Parsers

LR Parsers are bottom up

An LR(k) parser analyzes the source code from left to right with a look-ahead of k input tokens.

An LR parser starts with the leaves of the parse tree (the tokens) and attempts to build up from there to the starting symbol.

It detects a syntactic error when the input does not conform to the grammar.

The syntax of many programming languages can be defined by a grammar that is LR(1), or close to being so, and for this reason LR parsers are often used in compilers.

LR parsers are difficult to produce by hand and they are usually constructed by a parser generator, also called a compiler-compiler.
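The bottom-up idea can be illustrated with a hand-written shift-reduce loop. This is only a sketch of the shift and reduce moves, not a table-driven LR(k) parser of the kind a generator produces: it uses no look-ahead and relies on the rule order being safe for this one assumed grammar, E ::= E + T | T and T ::= x | ( E ). The names `RULES` and `shift_reduce` are hypothetical.

```python
RULES = [                       # (left-hand side, right-hand side), tried in order
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["x"]),
    ("T", ["(", "E", ")"]),
]

def shift_reduce(tokens):
    """Return the list of moves if tokens reduce to E, else raise SyntaxError."""
    stack, moves = [], []
    for token in tokens:
        stack.append(token)                         # shift the next token
        moves.append(f"shift {token}")
        reduced = True
        while reduced:                              # reduce while the top matches a rule
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    moves.append(f"reduce {' '.join(rhs)} to {lhs}")
                    reduced = True
                    break
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce {stack} to E")
    return moves
```

For the input `list("x+x")` the moves are: shift x, reduce x to T, reduce T to E, shift +, shift x, reduce x to T, reduce E + T to E. The parse tree is built from the leaves upward, and the starting symbol E appears only at the very end.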


Compiler Compilers

A compiler compiler is a program whose input is a description of the language and whose output is a compiler. Yacc is a well-known Unix compiler compiler. The GNU version is called Bison. The inputs are:

A formal definition of the language’s lexical structure (expressed in EBNF).

A formal definition of preprocessing directives, if any, and their corresponding actions.

The EBNF definition of the language syntax, given that tokens have already been identified.

The code to be generated for each fully-parsed nonterminal symbol in the grammar.

The compiler compiler produces the compiler that will build a parse tree (front end) and transmute the tree into the corresponding object code (back end).


Homework

Homework 5: 12 points

1 (2) Generate a legal string of Nonsense that is longer than 10 terminals. Refer to the grammar on pages 13 and 18.

2 (2) Write an EBNF rule that defines a FORTH infinite loop (begin...repeat). This is a loop with no while inside it. Look up the precise definition in the FORTH reference spreadsheet. If you cannot find it, ask me. Invent any nonterminal symbols that you like, but use <word> and <words> to represent one or more symbols.

3 (4) Look up the syntax for the FORTH counted loop and draw a syntax diagram for it. Diagram both forms, one where you add to the loop variable on each iteration and the other where you subtract from it. The termination conditions are different and slightly tricky.

4 (4) Write a FORTH function that uses an if statement and a counted loop. Take one parameter (a number) off the stack. If it is less than 3, print an error comment. Otherwise, print the word “hooray” that many times. Turn in the code and the results by using cut-and-paste. Please, no screen shots of a black-background screen.
