doctor syntax was the fictitious hero of several 19th ...ryan/cse4250/notes/syntax.pdf · doctor...


Post on 20-Mar-2018


Doctor Syntax was the fictitious hero of several 19th century satirical poems by William Combe (1742–1823). The publications were illustrated by Thomas Rowlandson (1756–1827).

Rowlandson’s many comic illustrations offered humorous commentary on the political and social conventions of his day. Rowlandson is perhaps best known for his animated drawings of the mis-adventures of the infamous Dr. Syntax. The various escapades of the fictional English clergyman and schoolmaster, Dr. Syntax, are chronicled in three volumes or “Tours.” The Tours of Dr. Syntax were satires on a then contemporary series of picturesque adventures to various parts of England with an emphasis on landscape illustration and poetic verse.

- The tour of Doctor Syntax in search of the picturesque
- The second tour of Doctor Syntax in search of consolation
- The third tour of Doctor Syntax in search of a wife


Overview of Syntax

- Study of Language
- Structure
- Formal Languages
- Regular Expressions
- BNF, ∗Context Free Grammar, parse trees
- ∗Attribute Grammars, ∗Post Systems


Semiotics

Semiotics. Date: 1880. A general philosophical theory of signs and symbols that deals especially with their function in both artificially constructed and natural languages. The application of linguistic methods to other than natural languages. A philosophy that emphasizes the significance of language as universal.

In his 1946 book, Signs, Language, and Behavior, American philosopher Charles William Morris (1901–1979) gave these definitions for the branches of the science of signs:

- pragmatics: origin, uses and effects of signs with the behavior in which they occur;
- semantics: the signification of signs in all modes of signifying;
- syntax: combination of signs without regard for their specific significations or their relations to the behavior in which they occur.


Study of Language

- syntax is the form or structure of language
- semantics is the meaning of language
- pragmatics is the origin, use, and context of language

Distinction between semantics and syntax:

1. number versus numeral

2. character versus glyph

3. conjunction versus ∧


Syntax in NL

’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

Lewis Carroll, “Jabberwocky”

We know certain things, for instance that all this happened in the past and that there was more than one tove but only one wabe. We know these things because of the grammatical structure of the English language.


Syntax

Can you translate the poem “Jabberwocky” into other languages?

’Twas BRKPT and the I/O queue
Was SYMMING FASTRAND like the wind.
All idle was the CAU
As the last run had just FINNED.

“Beware the UNIVAC, my son,
Its FASTRAND and its high-speed drum,
And FIELDATA, and listen for
The CTMC’s hum.”

Harry Gilbert, mid 1970s


Arpawocky

Also, Arpawocky, RFC527, 22 June 1973, by D. L. Covill

Twas brillig, and the Protocols
Did USER-SERVER in the wabe.
All mimsey was the FTP,
And the RJE outgrabe,

Beware the ARPANET, my son;
The bits that byte, the heads that scratch;
Beware the NCP, and shun
the frumious system patch,


Structure

The arrangement of elements of an entity and their relationships to each other.

- Linear structure: like links of a chain
- Hierarchical structure: like nested mixing bowls, or boxes within boxes
- Trees: like a family tree, or a railroad yard

Grammars are a way of describing these and more complex structures.


Structure

Even though it is invisible, structure has great importance, even meaning. Sentence structure is crucial to disambiguate the following sentences.

1. The chicken is ready to eat. Who is eating?

2. He saw that gasoline can explode. What is exploding?

3. Flying planes can be dangerous. What is dangerous?


[Diagram: programming languages (language design, description, semantics); compilers (implementation, recognition); formal languages (expressivity)]

[Diagram: formal languages, automata (models); computability (limits); complexity (efficiency)]


Bad/Confusing Syntax In PL

The science of syntax is about unambiguous communication and as such is neutral. Let us at least be clear.

It is hard to define “good” and “bad” syntax. Only experience helps, and that takes time. The lessons of programming language design are ad hoc.


Bad/Confusing Syntax In PL

DO 10 I = 1.5
   A(I) = X + B(I)
10 CONTINUE

IF IF=THEN THEN THEN=ELSE; ELSE ELSE=IF
IF IF THEN THEN=ELSE; ELSE ELSE=0;

while ( ... ) {
   ...
   break;
   ...
}


Bad Syntax In Compilers

Not LL but is LR:

stm ::= "if" expr "then" stm "else" stm
      | "if" expr "then" stm

cf. Jacc FAQ 8.1


Lexical Structure Versus Phrase Structure

In lexical structure, the alphabet is a set of characters (e.g., Latin-1) and the goal is to categorize substrings into different tokens, each described by its own language.

In phrase structure, the alphabet is a set of tokens, and one goal is to determine whether the program is syntactically correct (i.e., is the phrase in the language or not). Actually the goal is broader than that: to identify the phrase structure, or derivation, of syntactically correct programs.


Three Views of a Program

Program as Stream of Characters

Program as Stream of Tokens

Program as Tree Structure
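The three views can be sketched in a few lines, using Python as a stand-in language. The sample statement and its variable names are invented for illustration; the standard-library modules `tokenize` and `ast` play the roles of scanner and parser here, which is an assumption of this sketch and not something the slides prescribe.

```python
import ast
import io
import tokenize

# A made-up one-line program used only for illustration.
source = "total = price * (1 + rate)\n"

# View 1: the program as a stream of characters.
chars = list(source)

# View 2: the program as a stream of tokens (kind, text).
# Layout-only tokens are dropped so that only the "words" remain.
skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in skip
]

# View 3: the program as a tree structure.
tree = ast.parse(source)

if __name__ == "__main__":
    print(chars)
    print(tokens)
    print(ast.dump(tree))
```

The character view has one element per character, the token view groups characters into names, numbers, and operators, and the tree view records which operator applies to which operands.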


Whitespace

Free-format as opposed to one statement per line. Earlier versions of FORTRAN had a fixed lexical structure: one statement per punch card, starting in column 7. COBOL and Basic also used to be fixed format. For the most part, languages today permit the programmer to choose the white space (and hence the layout) of the program. Some layouts are clearly more legible than others, hence the need for style guidelines. Python, Haskell, and Occam require proper indenting.

Should a language specify the layout (use of white space)?


Formal Language Versus Natural Language

Natural language is full of unresolvable differences and “jagged edges.” (This makes it rich and interesting.)


Formal Language

First some definitions:

- object language, the language being studied, and meta language, the language used to communicate
- an alphabet is a set of atomic symbols
- a string is a sequence of symbols
- a formal language is a set of strings

Unlike in natural languages, for a formal language L it is unambiguous whether s ∈ L or s ∉ L.

The key challenge is to find a good (finite) representation of (infinite) sets.


Object and Meta Language

Object language versus meta-language. The object language is the language being studied, and the meta-language is the language being used to communicate.

For example, in the class French 101 one may study the French language; it is the object language. To communicate with the students it may be necessary for the teacher to speak in English. English is the meta-language in this case.


Alphabet

Alphabet. The alphabet is the finite set of symbols comprising the formal language. These are the basic, atomic, distinct elements.

Σ = {a, b, c, d}
Σ = {0, 1}
Σ = {a, b, …, z, 0, 1, …, 9}
Σ = Latin-1
Σ = Unicode
Σ = {if, then, (, ), ’,’, …}

Theoreticians like to use Σ = {0, 1} because it is so simple. Sometimes the symbols of the alphabet may be chosen in a way to suggest their meaning or use. The symbols might even be composed of other sub-atomic symbols. Important character sets for programming languages to use and be written in are Latin-1, Latin-0, and Unicode, not just US-ASCII.


ISO 8859-15 (Latin-0)

NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
SP ! " # $ % & ’ ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~ DEL
XXX XXX BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3
DCS PU1 PU2 STS CCH MW SPA EPA SOS XXX SCI CSI ST OSC PM APC
NBSP ¡ ¢ £ € ¥ Š § š © ª « ¬ SHY ® ¯
° ± ² ³ Ž µ ¶ · ž ¹ º » Œ œ Ÿ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ


Unicode Supports Many Scripts


Strings

String. Given an alphabet Σ = {a, b, c, d}, a string is an (ordered) sequence of symbols from the alphabet, e.g., aaa, bac, d.

The sequence may contain no symbols; this empty string is given the special notation "". (Some authors use ε, the Greek letter epsilon.)

The most important operation on strings is string concatenation, which is sometimes denoted by the · operator. E.g., ab · db = abdb, da · ba = daba.
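As a small illustration of these definitions, the following sketch models strings over Σ = {a, b, c, d} with ordinary Python strings. The helper names `is_string_over` and `concat` are hypothetical and chosen only for this example.

```python
SIGMA = {"a", "b", "c", "d"}   # the alphabet from the text
EMPTY = ""                     # the empty string, written "" or ε

def is_string_over(s, alphabet):
    """True if every symbol of s is drawn from the alphabet."""
    return all(symbol in alphabet for symbol in s)

def concat(x, y):
    """String concatenation, the · operator of the text."""
    return x + y

if __name__ == "__main__":
    print(concat("ab", "db"))            # the first example from the text
    print(concat("da", "ba"))            # the second example from the text
    print(concat(EMPTY, "bac") == "bac") # the empty string is an identity
```

Concatenating with the empty string on either side leaves a string unchanged, which is why ε behaves like an identity element for the · operator.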


More Strings

Given an alphabet Σ = {0, 1}, the following are strings: 0, ε, 111, 010, 1.

Given an alphabet Σ = {a, b, …, z}, the following are strings: ε, x, aeiou, hello, biopsy.

Using Latin-1 as the alphabet, the following are strings: ¡Hola!, Hola!, ¿Qué pasa?, "Hi. Bye", größter gemeinsamer Teiler.

Using the alphabet

Σ = {if, then, else, begin, end, (, ), ‘,’}

the following are strings:

1. else )

2. if then

3. begin begin , , end end


Formal Language

Formal language. A formal language is a set of strings over some alphabet.

Given an alphabet Σ = {0, 1}, the following are formal languages:

L0 = ∅
L1 = {1}
L2 = {1, 0101}
L3 = {0, 01, 00}
L4 = {1, 01, 11, 001, 011, 101, 111, …}
L5 = {ε, 1, 11, 111, 1111, …}
L6 = {ε, 0, 1, 00, 01, 10, 11, 000, …} = Σ∗

The collection of legal programs in a programming language is also an example of a formal language.
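The infinite languages above can be explored with a finite program. The sketch below enumerates Σ∗ shortest-first and filters it with membership tests. The predicates `in_L4` and `in_L5` are assumptions: they are one reading consistent with the listed elements (strings ending in 1, and strings made only of 1s), since the ellipses do not pin the languages down.

```python
from itertools import count, islice, product

def sigma_star(alphabet):
    """Yield every string over the alphabet, shortest first (L6 = Σ*)."""
    symbols = sorted(alphabet)
    for n in count(0):
        for letters in product(symbols, repeat=n):
            yield "".join(letters)

def in_L4(s):
    """Assumed reading of L4: binary strings that end in 1."""
    return s.endswith("1")

def in_L5(s):
    """Assumed reading of L5: strings consisting only of 1s (including ε)."""
    return set(s) <= {"1"}

def first(n, predicate, alphabet=("0", "1")):
    """The first n strings of Σ* that satisfy the membership predicate."""
    return list(islice(filter(predicate, sigma_star(alphabet)), n))

if __name__ == "__main__":
    print(list(islice(sigma_star({"0", "1"}), 8)))
    print(first(7, in_L4))
    print(first(5, in_L5))
```

A membership predicate is one finite representation of an infinite set; regular expressions and grammars are others.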


Chomsky Hierarchy

type  name               machine
0     unrestricted       Turing machine, Post systems
1     context-sensitive  linear-bounded automata
2     context-free       pushdown automata, BNF
3     regular            finite automata, regular expression

- Regular expressions are used to describe tokens in programming languages
- BNF, CFG used to describe syntax of programming languages
- [Context-sensitive languages play no role in programming languages]
- attribute grammars, Post systems used in semantics


Description Mechanisms

1. Regular expressions (Appel, page 19, but not in Webber or Mitchell; briefly mentioned in Sebesta)
2. BNF (Chapters 2 and 3 of Webber)

“The programming language Ada expresses computation.”
“The programming language Ada is notation for computation.”
“A program in Java denotes a computation or algorithm.”
“Regular expressions express formal languages (sets of strings).”
“Regular expressions are notation for formal languages.”
“A regular expression denotes a formal language.”
“BNF expresses formal languages.”
“BNF is notation for formal languages.”
“A BNF definition denotes a formal language.”

Now consider: a programming language (like Ada) is a formal language. We can try to describe Ada using regular expressions or BNF.

Jeffrey E. F. Friedl. Mastering Regular Expressions, second edition. O’Reilly: Sebastopol, California, 2002. ISBN: 0-596-00289-0.


Regular Expressions

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

From a post by Jamie Zawinski to alt.religion.emacs in 1997.


Regular Expressions (Syntax)

1. Empty. ∅ is a regular expression.

2. Atom. Any single symbol a ∈ Σ is a regular expression.

3. Alternation. If r1 is a regular expression and r2 is a regular expression, then (r1 + r2) is a regular expression. [Maybe (r1|r2) is better.]

4. Concatenation. If r1 and r2 are regular expressions, then (r1 · r2) is a regular expression.

5. Closure. If r is a regular expression, then r∗ is a regular expression.
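The five cases can be mirrored one-to-one by a small data type. This is only a sketch: the class names `Empty`, `Atom`, `Alt`, `Cat`, and `Star` and the printer `show` are hypothetical names chosen for this example, and the ASCII `*` stands in for the ∗ of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:
    """Case 1: the expression ∅."""

@dataclass(frozen=True)
class Atom:
    """Case 2: a single symbol of the alphabet."""
    symbol: str

@dataclass(frozen=True)
class Alt:
    """Case 3: alternation (r1 + r2)."""
    left: object
    right: object

@dataclass(frozen=True)
class Cat:
    """Case 4: concatenation (r1 · r2)."""
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    """Case 5: closure r*."""
    inner: object

def show(r):
    """Print a regular expression in the fully parenthesized notation."""
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Atom):
        return r.symbol
    if isinstance(r, Alt):
        return f"({show(r.left)} + {show(r.right)})"
    if isinstance(r, Cat):
        return f"({show(r.left)} · {show(r.right)})"
    if isinstance(r, Star):
        return f"{show(r.inner)}*"
    raise TypeError(f"not a regular expression: {r!r}")

if __name__ == "__main__":
    bc = Cat(Atom("b"), Atom("c"))
    print(show(Star(Alt(Atom("a"), bc))))
    print(show(Alt(Atom("a"), Star(bc))))
```

Because every expression is built by exactly one of the five constructors, functions such as `show` can be defined by one case per constructor.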


Regular Expressions (Examples)

a · b

a + b

a · (b + c)

a + (b · c)

a∗

a + (b · c)∗

(a + (b · c))∗

(using the alphabet Σ = {a,b,c})


Regular Expressions (Examples)

(Omitting the “centered dot” and using the | for alternation.)

ab

a | b

a(b | c)

a | (bc)

a∗

a | (bc)∗

(a | (bc))∗

(using the alphabet Σ = {a,b,c})


Regular Expressions (Examples)

011011 + 01(0 + 1)0 + (10)0∗

1 + (01)∗

(0 + (10)∗

((0 + (10)∗)∗

(using the alphabet Σ = {0,1} and omitting the · symbol)


Regular Expressions (Semantics)

That is what regular expressions look like (syntax). What do regular expressions mean (semantics)?

Each regular expression denotes a formal language (a set of strings). There are an infinite number of regular expressions. (Each one is finitely constructed.) Just like programs in programming languages. We must find a way to define the meaning or denotation of each regular expression.

So, we define a (recursive) function from regular expressions (as pieces of syntax) to sets of strings (languages).


Regular Expressions (Semantics)

1. Empty. The language with no strings.

2. Atom. The language with one string of length one: a itself. Note that the notation a is ambiguous. It stands for both the symbol and the language.

3. Alternation. (r1 + r2) is the union of the two languages.

4. Concatenation. (r1 · r2) is the set of all strings formed by a string from the first language followed by a string from the second.

5. Closure. r∗ is the set of all concatenations of zero, one, or more strings from the language of r.


Regular Expressions (Examples)

a · b = {ab}
a + b = {a, b}
a · (b + c) = {ab, ac}
a + (b · c) = {a, bc}
a∗ = {ε, a, aa, aaa, …}
a + (b · c)∗ = {ε, a, bc, bcbc, …}
(a + (b · c))∗ = {ε, a, bc, aa, abc, bcbc, …}

(using the alphabet Σ = {a,b,c})


Other Definitions

Some authors replace one of the five cases of the definition

1. Empty. ∅ denoting the language with no strings

with another case

1. Epsilon. ε denoting the set consisting of the single string “”

Do you see the difference?

Notice that ε is superfluous, as {ε} is represented by ∅∗. Can you prove this?


∗Regular Expressions (Semantics)

1. Empty.

D⟦∅⟧ = {}

2. Atom.

D⟦a1⟧ = {a1}
⋮
D⟦an⟧ = {an}

3. Alternation.

D⟦(r1 + r2)⟧ = D⟦r1⟧ ∪ D⟦r2⟧

4. Concatenation.

D⟦(r1 · r2)⟧ = {x · y | x ∈ D⟦r1⟧, y ∈ D⟦r2⟧}

where x · y is string concatenation.

5. Closure.

D⟦r∗⟧ = ⋃_{i ≥ 0} (D⟦r⟧)^i

where S^i is defined recursively as follows:

S^0 = {ε}
S^(i+1) = {x · y | x ∈ S, y ∈ S^i}


∗Using the Definitions

D⟦(a · b)⟧ = {x · y | x ∈ D⟦a⟧, y ∈ D⟦b⟧}
           = {x · y | x ∈ {a}, y ∈ {b}}
           = {a · b}
           = {ab}

D⟦(a + (a · b))⟧ = D⟦a⟧ ∪ D⟦(a · b)⟧
                 = {a} ∪ {ab}
                 = {a, ab}

D⟦(a + (b · c))∗⟧ = ⋃_{i ≥ 0} (D⟦(a + (b · c))⟧)^i
                  = {ε} ∪ {a, bc} ∪ (D⟦(a + (b · c))⟧)^2 ∪ …
                  = {ε, a, bc} ∪ {aa, abc, bca, bcbc} ∪ …
                  = {ε, a, aa, bc, abc, bca, bcbc} ∪ …
                  = {ε, a, aa, bc, aaa, abc, bca, aabc, abca, bcaa, bcbc, …}
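The denotation function can be run directly if the result is cut off at a maximum string length, since the closure case may denote an infinite set. The sketch below is one way to do that, not the course's own code. The tuple encoding (`("empty",)`, `("atom", a)`, `("alt", r1, r2)`, `("cat", r1, r2)`, `("star", r)`) and the names `denote` and `concat_sets` are assumptions made for this example.

```python
def concat_sets(xs, ys, max_len):
    """{x · y | x in xs, y in ys}, keeping only strings of length <= max_len."""
    return {x + y for x in xs for y in ys if len(x + y) <= max_len}

def denote(r, max_len):
    """Strings of length <= max_len in the language denoted by r."""
    kind = r[0]
    if kind == "empty":                      # case 1: no strings
        return set()
    if kind == "atom":                       # case 2: the one-symbol string
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "alt":                        # case 3: union
        return denote(r[1], max_len) | denote(r[2], max_len)
    if kind == "cat":                        # case 4: pairwise concatenation
        return concat_sets(denote(r[1], max_len), denote(r[2], max_len), max_len)
    if kind == "star":                       # case 5: union of the powers S^i
        base = denote(r[1], max_len)
        result = {""}                        # S^0 = {ε}
        power = {""}
        while True:
            power = concat_sets(base, power, max_len)   # S^(i+1) from S^i
            if power <= result:              # no new short strings can appear
                return result
            result |= power
    raise ValueError(f"unknown regular expression: {r!r}")

if __name__ == "__main__":
    a, b, c = ("atom", "a"), ("atom", "b"), ("atom", "c")
    print(sorted(denote(("cat", a, b), 5)))
    print(sorted(denote(("star", ("alt", a, ("cat", b, c))), 3)))
    print(denote(("star", ("empty",)), 3))
```

The last call illustrates why ε is superfluous as a primitive: the closure of the empty language contains only S^0, so it denotes {ε}.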


∗ Digression: Induction

Is the recursively defined function D well-defined? Yes.

When are such recursively defined functions well-defined? Over freely generated sets.


∗ Digression: Induction

Consider the following inductive definition of a subset M of Nat, the natural numbers, using integer multiplication ·:

- 1 ∈ M,
- if n ∈ M, then 9 · n ∈ M,
- if n ∈ M, then 23 · n ∈ M.

Suppose we define the function g : M → Nat inductively by:

- g(1) = 1,
- g(9 · n) = 9,
- g(23 · n) = 23.

Prove that 0 = 1!
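The trouble with g can be seen by computing every value the three clauses assign to a number, one per way of building it. The sketch below does this; the helper names `in_M` and `g_values` are hypothetical. It is an illustration of why M is not freely generated, not a proof.

```python
def in_M(n):
    """True if n can be built from 1 by multiplying by 9 and by 23."""
    if n == 1:
        return True
    return (n % 9 == 0 and in_M(n // 9)) or (n % 23 == 0 and in_M(n // 23))

def g_values(n):
    """All values the clauses for g assign to n, one per applicable clause."""
    if n == 1:
        return {1}
    values = set()
    if n % 9 == 0 and in_M(n // 9):      # n = 9 · m with m in M
        values.add(9)
    if n % 23 == 0 and in_M(n // 23):    # n = 23 · m with m in M
        values.add(23)
    return values

if __name__ == "__main__":
    print(g_values(81))     # built only through the 9-clause
    print(g_values(207))    # 207 = 9 · 23 = 23 · 9 has two derivations
```

Since 207 is reached both as 9 · 23 and as 23 · 9, the clauses demand g(207) = 9 and g(207) = 23 at once, so g is not a function and any conclusion, including 0 = 1, can be derived from assuming it is.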


More Regular Expressions

To make regular expressions more convenient, regular expressions are almost always extended with new notation. Here are some additional meta-symbols commonly seen (perhaps with different syntax). Regular expressions with these new meta-symbols can be defined in terms of the original definitions. Let r be a regular expression over the alphabet Σ.

1. Optional. r? = (r + ∅∗)

2. One or more. (This is a second and conflicting use of the meta-character +.) r+ = (r · r∗)

3. Any. . = (a + b + … + y + z) where Σ = {a, b, …, y, z}.

4. Range. [a−z] = (a + b + … + y + z). (Assumes that Σ is ordered.)

5. Range complement. [¬c−x] = (a + b + y + z). (Assumes that Σ is ordered.)
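These equivalences can be spot-checked with Python's `re` module by comparing the sets of short strings two patterns accept. This is a sketch under assumptions: Python writes alternation as `|` instead of +, an empty alternative as in `(a|)` stands in for ∅∗, complement is written `[^…]`, and the helper `language` and the small alphabets are made up for the check.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over the alphabet, up to max_len, fully matched by pattern."""
    return {
        "".join(w)
        for n in range(max_len + 1)
        for w in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(w))
    }

if __name__ == "__main__":
    # Optional: r? behaves like (r + ∅*), i.e. r or the empty string.
    print(language("a?", "ab", 3) == language("(a|)", "ab", 3))
    # One or more: r+ behaves like (r · r*).
    print(language("a+", "ab", 3) == language("aa*", "ab", 3))
    # Any, range, and range complement over a four-letter alphabet.
    print(language(".", "abcd", 2) == language("(a|b|c|d)", "abcd", 2))
    print(language("[a-c]", "abcd", 2) == language("(a|b|c)", "abcd", 2))
    print(language("[^b-c]", "abcd", 2) == language("(a|d)", "abcd", 2))
```

Agreement on all strings up to a bound is evidence, not proof, that two expressions denote the same language; the proof goes through the definitions of the five basic cases.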


Where are regular expressions used?


xkcd: a webcomic of romance, sarcasm, math, and language by Randall Munroe


Grep

The program grep is a command-line utility for searching text, originally written for Unix. The grep command searches text files for lines matching a given regular expression and prints the matching lines to the program’s standard output.


Grep

Suppose the file preamble.txt has the following lines:

We the People of the United States, in Order to form a more perfect
Union, establish Justice, insure domestic Tranquility, provide for the
common defence, promote the general Welfare, and secure the Blessings
of Liberty to ourselves and our Posterity, do ordain and establish
this Constitution for the United States of America.

> grep and preamble.txt
common defence, promote the general Welfare, and secure the Blessings
of Liberty to ourselves and our Posterity, do ordain and establish

> grep -i people preamble.txt
We the People of the United States, in Order to form a more perfect

> grep -w do preamble.txt
of Liberty to ourselves and our Posterity, do ordain and establish

(But does not match “domestic.”)

> grep -E -w "(defense)|(defence)" preamble.txt
common defence, promote the general Welfare, and secure the Blessings


Practical Regular Expressions

grep options regexp filename

Options:

-E   extended regular expression
-i   ignore case
-x   match whole line
-w   match word in line

Syntax of regular expressions for grep

|      or                $      end of line
.      any               ?      optional
[]     set               {n,m}  n through m times
[^ ]   set complement    ()     grouping
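The grep runs on the preamble and some of the options listed above can be imitated in a few lines of Python. This is a loose sketch, not a reimplementation of grep: the function name `grep` and its keyword arguments are hypothetical, Python's `re` syntax stands in for grep's extended syntax, and `\b` only approximates the word matching of -w.

```python
import re

PREAMBLE = [
    "We the People of the United States, in Order to form a more perfect",
    "Union, establish Justice, insure domestic Tranquility, provide for the",
    "common defence, promote the general Welfare, and secure the Blessings",
    "of Liberty to ourselves and our Posterity, do ordain and establish",
    "this Constitution for the United States of America.",
]

def grep(pattern, lines, ignore_case=False, word=False, whole_line=False):
    """Return the lines that contain a match, loosely imitating grep -E."""
    if word:                                   # rough stand-in for -w
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0    # stand-in for -i
    regex = re.compile(pattern, flags)
    if whole_line:                             # stand-in for -x
        return [line for line in lines if regex.fullmatch(line)]
    return [line for line in lines if regex.search(line)]

if __name__ == "__main__":
    for line in grep("and", PREAMBLE):
        print(line)
    for line in grep("do", PREAMBLE, word=True):
        print(line)
```

Without `word=True` the pattern `do` also matches the line containing "domestic", which is the difference the -w example in the slides points out.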


grep

grep -E '.{5,}' /usr/dict/words

Aarhus
Aaron
Ababa
aback
abacus
...
zombie
zoology
Zoroastrian
zucchini
Zurich
zygote


grep

grep -i -E '[aeiou]{4,}' /usr/dict/words

aqueous
Hawaiian
IEEE
obsequious
onomatopoeia
pharmacopoeia
prosopopoeia
queue
Sequoia


grep

grep -i -E '(q[^u]|q$)' /usr/dict/words

CEQ
Colloq
IQ
Iraq
q
Qatar
QED
q's
seq


grep

Is there a regular expression that matches those words whose letters appear in alphabetical order?

grep -x -E 'a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?' /usr/dict/words

almost
begin
below
biopsy
dirty
empty
first
forty
ghost
glory

(Some of the longer words.)


grep

Can we modify the previous example to allow double letters?

grep -x -E 'a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*' /usr/dict/words

accent access almost biopsy chilly choosy effort floppy glossy knotty

(Some of the longer words.)


grep

grep -E -e '(y.*){3,}' /usr/dict/wordsA

The words with at least three y's.

polytypy: chromosomal variation between populations
psychophysiology
synonymy
syzygy: straight-line alignment of three celestial bodies


Back references

grep -E -e '(.)\1\1' /usr/dict/words

AAA AAAS IEEE iii viii
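A back reference behaves the same way in java.util.regex. The following is a minimal sketch, with the class name TripleLetter chosen here for illustration; find() searches anywhere in the word, as grep does without -x.

```java
import java.util.regex.Pattern;

final class TripleLetter {
    // (.) captures one character; \1\1 requires two more copies of that same character.
    private static final Pattern TRIPLE = Pattern.compile("(.)\\1\\1");

    static boolean hasTriple(String word) {
        return TRIPLE.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"IEEE", "viii", "queue"}) {
            System.out.println(w + " " + hasTriple(w));
        }
    }
}
```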


PERL

Perl supports the usual regular expressions:

\       Quote the next metacharacter
^       Match the beginning of the line
.       Match any character (except newline)
$       Match the end of the line (or before newline at the end)
|       Alternation
(..)    Grouping
[..]    Character class

*       Match 0 or more times
+       Match 1 or more times
?       Match 1 or 0 times
{n}     Match exactly n times
{n,}    Match at least n times
{n,m}   Match at least n but not more than m times


PERL

Perl supports “reluctant” and “possessive” quantifiers in addition to the “greedy” quantifiers. Perl has additional boundary matchers, Unicode character class operators, and look-ahead and look-behind matchers. Some of these can require exponential backtracking in the worst case.

\p{property}    Match a character with Unicode property
\P{property}    Match a character without Unicode property
(?=pattern)     A zero-width positive look-ahead assertion
(?!pattern)     A zero-width negative look-ahead assertion
(?<=pattern)    A zero-width positive look-behind assertion
(?<!pattern)    A zero-width negative look-behind assertion

For more information see the documentation at www.perldoc.com.
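Java's java.util.regex accepts the same look-ahead syntax, so the idea can be sketched there. The example below is an assumption-laden illustration (the class name LookAhead is invented here): the pattern q(?!u) should select the same single words as the earlier grep pattern (q[^u]|q$), because the negative look-ahead also succeeds at the end of the word.

```java
import java.util.regex.Pattern;

final class LookAhead {
    // A 'q' not followed by 'u'. The look-ahead is zero-width: it consumes no input.
    private static final Pattern Q_WITHOUT_U =
        Pattern.compile("q(?!u)", Pattern.CASE_INSENSITIVE);

    static boolean qWithoutU(String word) {
        return Q_WITHOUT_U.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Iraq", "Qatar", "queue"}) {
            System.out.println(w + " " + qWithoutU(w));
        }
    }
}
```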


Java

Java supports regular expressions in the class java.util.regex.Pattern. It has character class operations for union and intersection.

[...]            Character class
[^...]           Complement
[...[...]]       Union
[...&&[...]]     Intersection
[...&&[^...]]    Subtraction
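A small sketch of these class operations follows. The class name CharClasses, the helper is, and the particular letter ranges are illustrative choices, not taken from the notes.

```java
import java.util.regex.Pattern;

final class CharClasses {
    static final Pattern UNION = Pattern.compile("[a-d[m-p]]");            // a-d or m-p
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[def]]");   // d, e, or f
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");   // subtraction

    // True if the single character c belongs to the class described by p.
    static boolean is(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(is(UNION, 'n'));
        System.out.println(is(INTERSECTION, 'a'));
        System.out.println(is(CONSONANT, 'b'));
    }
}
```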


// Greedy quantifiers
String match = find("A.*c", "AbcAbc");  // AbcAbc
match = find("A.+", "AbcAbc");          // AbcAbc

// Nongreedy quantifiers
match = find("A.*?c", "AbcAbc");        // Abc
match = find("A.+?", "AbcAbc");         // Ab

// Find first substring in input that matches pattern.
static String find (String pat, CharSequence input) {
    final Pattern pattern = Pattern.compile(pat);
    final Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        return matcher.group();
    }
    return null;
}


Using regular expressions to match comments often runs into two problems:

• The “every character” pattern often does not match the characters indicating a newline.

    /* The first line,
       but not the second line. */

• The greedy “star” will match more than what was intended.

    x = 1; /* Both comments */ x = 2;
    /* at once! */ x = 3;

final static String commentMultilineString = "/\\*.*?\\*/";
final static Pattern multiLineCommentPattern =
    Pattern.compile (commentMultilineString, Pattern.DOTALL);
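To see both problems and the fix side by side, here is a hedged sketch that strips comments with the reluctant pattern above and, for contrast, with a greedy variant. The class name Comments, the helper strip, and the GREEDY pattern are additions made for this illustration.

```java
import java.util.regex.Pattern;

final class Comments {
    // Reluctant .*? stops at the first closing delimiter; DOTALL lets '.' match newlines.
    static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // Greedy variant, shown only for contrast: it runs on to the last closing delimiter.
    static final Pattern GREEDY = Pattern.compile("/\\*.*\\*/", Pattern.DOTALL);

    // Deletes every match of p from the source text.
    static String strip(Pattern p, String source) {
        return p.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "x = 1; /* Both comments */ x = 2; /* at once! */ x = 3;";
        System.out.println(strip(COMMENT, src));
        System.out.println(strip(GREEDY, src));
    }
}
```

With the greedy pattern the assignment x = 2; disappears together with the two comments, which is the failure the second bullet describes.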


Tokens

Regular expressions are used in the definition of programming languages to describe the basic lexical units: tokens.

1. NUMBER: [0-9]+

2. IDENTIFIER: [a-zA-Z][a-zA-Z0-9]*

3. a keyword: begin

4. a hex number: 0x[0-9A-F]+

5. REAL: [0-9]+'.'[0-9]+(E[0-9]+)?

Metacharacters: [ ] + ' - ( ) ? |
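These token definitions translate almost directly into java.util.regex patterns. The sketch below is illustrative only: the class name TokenPatterns, the method classify, and the order in which the classes are tried are assumptions, and the quoted '.' of the slide is written as an escaped dot.

```java
import java.util.regex.Pattern;

final class TokenPatterns {
    static final Pattern NUMBER = Pattern.compile("[0-9]+");
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
    static final Pattern HEX = Pattern.compile("0x[0-9A-F]+");
    static final Pattern REAL = Pattern.compile("[0-9]+\\.[0-9]+(E[0-9]+)?");

    // Names the token class of a complete lexeme; the keyword is checked before IDENTIFIER.
    static String classify(String lexeme) {
        if (lexeme.equals("begin")) return "KEYWORD";
        if (NUMBER.matcher(lexeme).matches()) return "NUMBER";
        if (IDENTIFIER.matcher(lexeme).matches()) return "IDENTIFIER";
        if (HEX.matcher(lexeme).matches()) return "HEX";
        if (REAL.matcher(lexeme).matches()) return "REAL";
        return "ERROR";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "x1", "begin", "0x1F", "3.14E10", "1x"}) {
            System.out.println(s + " " + classify(s));
        }
    }
}
```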


Tools


Token Stream

Given a program

float match0 (char *s) /* find a zero */
{ if (!strncmp (s, "0.0", 3))
    return 0.;
}

the lexical analyzer will return the stream

FLOAT ID(match0) LPAREN CHAR STAR ID(s) RPAREN LBRACE
IF LPAREN BANG ID(strncmp) LPAREN ID(s) COMMA STRING(0.0)
COMMA NUM(3) RPAREN RPAREN RETURN REAL(0.0) SEMI
RBRACE EOF


Lexical Tokens

An identifier is a sequence of letters and digits; the first character must be a letter. The underscore counts as a letter. Upper- and lowercase letters are different. If the input stream has been parsed into tokens up to a given character, the next token is taken to include the longest string of characters that could possibly constitute a token. Blanks, tabs, newlines, and comments are ignored except as they serve to separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords, and constants.
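The longest-match rule in this description can be sketched as follows. This is a minimal illustration, not a production lexer: the class name Lexer, the rule names, and the tie-breaking convention (the earlier rule wins when two rules match the same length, so the keyword beats the identifier) are assumptions made here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Lexer {
    // Token classes in priority order; earlier entries win ties.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BEGIN", Pattern.compile("begin"));
        RULES.put("ID", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.put("REAL", Pattern.compile("[0-9]+\\.[0-9]+(E[0-9]+)?"));
        RULES.put("NUM", Pattern.compile("[0-9]+"));
        RULES.put("WS", Pattern.compile("\\s+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null;
            int bestEnd = pos;
            // Try every rule at the current position and keep the longest match.
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("no token at position " + pos);
            }
            if (!bestKind.equals("WS")) {   // white space only separates tokens
                tokens.add(bestKind + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("begin beginning 12 3.5"));
    }
}
```

The input beginning is a single identifier rather than the keyword begin followed by ning, because the identifier rule matches a longer string.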


Regular expressions

1. How do you recognize if a string matches a regular expression?

2. How do you tokenize a character string?

These questions are left to a compiler construction class.


Expressiveness of RE

We like RE because they describe strings.
RE are great: easily understandable, concise, recognizers are easy to make, ...
But,

The major shortcoming of regular expressions, as far as the description of programming languages is concerned, is that bracketing is not expressible. Bracketing is indispensable. For example, arithmetic expressions require opening and closing parentheses. Sequences of statements are often bracketed by begin/end pairs.

The set of strings

$\{a^n \cdot b^n\} = \{\varepsilon, ab, aabb, aaabbb, \ldots\}$

is not regular!


BNF (Backus-Naur Form)

Conventions (e.g., tokens in bold font) and meta-symbols:

::=   “is defined as”
|     “or”
[]    “optional” (not necessary, EBNF)
{}    “zero or more” (not necessary, EBNF)

BNF is a collection of rules or productions. Each rule has a LHS and a RHS. The LHS is a syntactic category (a part of the language), called a nonterminal, and the RHS is a sequence of nonterminals and tokens (called terminals).

LHS1 ::= RHS1

LHS2 ::= RHS2

...

LHSn ::= RHSn


Examples and Derivations

Grammar 1 and derivation: “revolutionary ideas”
Example 3.1, page 129. Program with statement lists
Example 3.2, page 131. Simple assignment statements
Sebesta, seventh edition


Example Grammar

sentence ::= subject predicate .

subject ::= article {adjective} noun

predicate ::= verb [adverb ]

article ::= A | The

adjective ::= green | colorless | big | new | revolutionary

noun ::= caterpillar | idea

verb ::= sleeps | appears | crawls

adverb ::= furiously | infrequently
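Because this particular grammar contains no recursion, its language happens to be regular, so a single regular expression can recognize it. The sketch below is an illustration added here, not part of the original notes; the class name SentenceRegex and the assumption that symbols are separated by single blanks (including one before the period) are choices made for the example. The EBNF braces become * and the brackets become ?.

```java
import java.util.regex.Pattern;

final class SentenceRegex {
    // sentence ::= article {adjective} noun verb [adverb] .
    private static final Pattern SENTENCE = Pattern.compile(
        "(A|The)"
        + "( (green|colorless|big|new|revolutionary))*"
        + " (caterpillar|idea)"
        + " (sleeps|appears|crawls)"
        + "( (furiously|infrequently))?"
        + " \\.");

    static boolean isSentence(String s) {
        return SENTENCE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSentence("A colorless green idea sleeps furiously ."));
        System.out.println(isSentence("A idea new appears ."));
    }
}
```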


Derivation

Sentential form. A sentential form is a string of terminal and nonterminal symbols.

Derivation. A derivation of a sentential form is created from a nonterminal by repeatedly replacing the LHS nonterminal of some rule with its RHS.

Every BNF definition is a description of a formal language (a set of strings), the language of all sentences derivable from a syntactic category.

We derive two sentences used as examples by Noam Chomsky: one sentence is meaningful, the other makes no sense.


Example Derivation

sentence
  ⇒1 subject predicate .
  ⇒2 article adjective adjective noun predicate .
  ⇒3 A adjective adjective noun predicate .
  ⇒4 A new adjective noun predicate .
  ⇒5 A new revolutionary noun predicate .
  ⇒6 A new revolutionary idea predicate .
  ⇒7 A new revolutionary idea verb adverb .
  ⇒8 A new revolutionary idea appears adverb .
  ⇒9 A new revolutionary idea appears infrequently .


Example Derivation

sentence
  ⇒1 subject predicate .
  ⇒2 article adjective adjective noun predicate .
  ⇒3 A adjective adjective noun predicate .
  ⇒4 A colorless adjective noun predicate .
  ⇒5 A colorless green noun predicate .
  ⇒6 A colorless green idea predicate .
  ⇒7 A colorless green idea verb adverb .
  ⇒8 A colorless green idea sleeps adverb .
  ⇒9 A colorless green idea sleeps furiously .


Syntax versus Semantics

Two sentences (with the same structure, syntax) derived from the grammar.

A new revolutionary idea appears infrequently .
A colorless green idea sleeps furiously .

One sentence is meaningful, the other is meaningless. Clearly, the meaning of the words (semantics) is important.


Examples and Derivations

∗ Grammar 1 and derivation: “revolutionary ideas”
Example 3.1, page 112. Program with statement lists
Example 3.2, page 113. Simple assignment statements
Sebesta, fifth edition


Alternate Styles of BNF

“traditional” style:

    〈traditional style〉 ::= key

Nonterminals in angle brackets, terminals in bold font.

CFG or “mathematical” style:

    A → Ba

Nonterminals in uppercase letters, terminals in lowercase letters.

“verbatim” style:

    verbatim_style ::= "key"

ISO 14977 is much like verbatim style.

Wirth, “What can we do about the unnecessary diversity of notation for syntactic definitions?” CACM, volume 20, number 11, November 1977, page 822.


Syntax Graphs

In addition to linear forms of BNF there are syntax graphs, also known as syntax diagrams.
See Chapter 2, page 21 of Webber.
Figure 3.6, page 122 of Sebesta, fifth edition (not in sixth edition).
See Hyper GOS syntax diagram generator.


Syntax Graphs

if_statement ::=
    "if" condition "then"
        sequence_of_statements
    { "elsif" condition "then"
        sequence_of_statements }
    [ "else"
        sequence_of_statements ]
    "end" "if" ";"


∗Grammar

A grammar is a 4-tuple 〈T, N, P, S〉

• T is the set of terminal symbols,

• N is the set of nonterminal symbols, T ∩ N = ∅,

• S, a nonterminal, is the start symbol,

• P are the productions of the grammar.

A production has the form α → β where α and β are strings of terminals and nonterminals (α can't be the empty string). We are primarily interested in context-free grammars, in which α is a single nonterminal.


∗Example 1

Let G be the grammar defined by the following productions.

S → AB

A → ε

A → Aa

B → ε

B → B b

(We infer easily that T, the set of terminals, is {a, b}; N, the set of nonterminals, is {S, A, B}; and S is the start symbol.)

$L_G = \{a^i b^j \mid i, j \ge 0\}$


∗Example 2

Let G be the grammar defined by the following productions.

S → ε

S → aS b

(We infer easily that T, the set of terminals, is {a, b}; N, the set of nonterminals, is {S}; and S is the start symbol.)

$L_G = \{a^i b^i \mid i \ge 0\}$
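The two productions of this grammar can be read directly as a recursive membership test. The following is a minimal sketch; the class name AnBn and the method name derivesS are invented for the illustration.

```java
final class AnBn {
    // Mirrors the productions S -> ε and S -> a S b.
    static boolean derivesS(String s) {
        if (s.isEmpty()) {
            return true;                                      // S -> ε
        }
        return s.length() >= 2
            && s.charAt(0) == 'a'
            && s.charAt(s.length() - 1) == 'b'
            && derivesS(s.substring(1, s.length() - 1));      // S -> a S b
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "aabb", "aab", "abab"}) {
            System.out.println("\"" + s + "\" " + derivesS(s));
        }
    }
}
```

The recursion depth plays the role of the counter that a regular expression lacks.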


∗What’s a Grammar?

A grammar is a description of a language.
Every grammar denotes a formal language (a set of strings). A string of terminals ω is in the language if it is derivable from the start symbol.
We say $\gamma\alpha\delta \Rightarrow \gamma\beta\delta$ (derives in one step) whenever $\alpha \to \beta$ is a production. We write $\alpha \Rightarrow^{*} \beta$ for the reflexive and transitive closure of the $\Rightarrow$ relation. A string $\omega$ of terminals is in the language defined by grammar $G$ if $S \Rightarrow^{*} \omega$. So, $L(G) = \{\omega \in T^{*} \mid S \Rightarrow^{*} \omega\}$. In this case the string $\omega$ is called a sentence of $G$. If $S \Rightarrow^{*} \alpha$, where $\alpha$ may contain nonterminals, then we say $\alpha$ is a sentential form of the grammar $G$.
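A single derivation step is just a string rewrite, which the sketch below makes concrete for context-free productions. It is an illustration under stated assumptions: sentential forms are plain strings, uppercase letters stand for nonterminals, the leftmost occurrence is the one rewritten, and the names Derivation and step are invented here.

```java
final class Derivation {
    // One step γ A δ ⇒ γ β δ: replaces the leftmost occurrence of nonterminal A by β.
    static String step(String form, char lhs, String rhs) {
        int i = form.indexOf(lhs);
        if (i < 0) {
            throw new IllegalArgumentException("nonterminal " + lhs + " not in " + form);
        }
        return form.substring(0, i) + rhs + form.substring(i + 1);
    }

    public static void main(String[] args) {
        // Derives aabb with the productions S -> a S b and S -> ε.
        String form = "S";
        form = step(form, 'S', "aSb");
        form = step(form, 'S', "aSb");
        form = step(form, 'S', "");
        System.out.println(form);
    }
}
```

Every intermediate value of form is a sentential form; the final value contains only terminals and is therefore a sentence.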


∗Chomsky Hierarchy

type   name                grammar

0      unrestricted
1      context-sensitive   |β| ≥ |α|
2      context-free        α ∈ N
3      regular             A → ωB, A → ω


Same as BNF

Context-free grammar is the same as BNF.

S ::= AB

A ::=

A ::= Aa

B ::=

B ::= B b

OR

S ::= AB

A ::= Aa | ε

B ::= B b | ε


Parse Tree

A formal definition of a parse tree for a context-free grammar is possible, but it is not illuminating. But the trees in the figures are suggestive. Each production used in the derivation of a string appears as a subtree in the diagram. The left-hand-side nonterminal appears as a node, and all the grammar symbols in the right-hand side of the production appear as children of this node.


An example from Sebesta, 7th, Example 3.2, page 131.
A grammar for simple assignment statements:

assign ::= id := expr

expr ::= id + expr

| id ∗ expr

| (expr )

| id

id ::= A | B | C


Example Derivation

assign ⇒ id := expr
       ⇒ A := expr
       ⇒ A := id ∗ expr
       ⇒ A := B ∗ expr
       ⇒ A := B ∗ (expr)
       ⇒ A := B ∗ (id + expr)
       ⇒ A := B ∗ (A + expr)
       ⇒ A := B ∗ (A + id)
       ⇒ A := B ∗ (A + C)


Abstract Syntax Tree

Concrete parse trees may contain a lot of extra detail and are inconvenient to use directly.
(Webber, Section 3.8 Abstract Syntax Trees, page 38.)


Concrete Parse Tree


Abstract Syntax Tree

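A:=B*(A+C). Each expr node can be replaced by the kind of expression it stands for (+ or *). The id nodes carry no information beyond the identifier itself. Parentheses are no longer needed, because the shape of the tree records the grouping.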
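One possible way to represent such an abstract syntax tree is sketched below. The class names (Ast, Id, BinOp, Assign) and the show method are illustrative assumptions; show reinserts parentheses only to make the tree shape visible when printed.

```java
final class Ast {
    interface Expr {
        String show();
    }

    // A leaf: just the identifier, with no separate id node above it.
    static final class Id implements Expr {
        final String name;
        Id(String name) { this.name = name; }
        public String show() { return name; }
    }

    // An interior node labeled by its operator instead of the nonterminal expr.
    static final class BinOp implements Expr {
        final char op;
        final Expr left;
        final Expr right;
        BinOp(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public String show() {
            return "(" + left.show() + " " + op + " " + right.show() + ")";
        }
    }

    static final class Assign {
        final String target;
        final Expr value;
        Assign(String target, Expr value) {
            this.target = target;
            this.value = value;
        }
        String show() { return target + " := " + value.show(); }
    }

    public static void main(String[] args) {
        Assign a = new Assign("A",
            new BinOp('*', new Id("B"), new BinOp('+', new Id("A"), new Id("C"))));
        System.out.println(a.show());
    }
}
```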


Parse Tree

If A:=(B*A)+C were the statement, then the abstract syntax tree would be different, so no important information is lost by dropping the parentheses.


Dangling else

An ambiguous grammar is one in which some sentence has more than one parse tree. [Sebesta, 3.3.1.7 Ambiguity, page 132ff.]
The grammar of Example 3.3 [Sebesta, 7th, page 132], simple expressions, is ambiguous.
Also, this grammar of if/then/else statements is ambiguous.
Sebesta, 7th, 3.3.1.10 An Unambiguous Grammar for if-then-else, page 137; Figure 3.5, page 138.


Two Parse Trees

[Convert from PS tricks.]


Equivalent Grammar

Just because a grammar is ambiguous does not mean that all grammars for the same language are ambiguous.

S → if C then S | if C then S else S | S′

This grammar for the same language is not.

S → MS | UMS

MS → if C then MS else MS | S′

UMS → if C then S | if C then MS else UMS


Alternative Language

If the “natural” grammar is ambiguous, could it mean that the construct is confusing to programmers? Is another approach possible?
Many languages have redesigned the syntax of the if statement:

S → if C then S endif
  | if C then S else S endif
  | S′

The two statements with different structure now have different syntax.

if C1 then if C2 then S1 endif else S2 endif
if C1 then if C2 then S1 else S2 endif endif


Tools


Recursive Descent Parsers

Grammars in the right form permit a parser built from simple recursive functions. Each nonterminal turns into a function, and together these form a set of mutually recursive functions. Each function does a case analysis on the input to determine which production/RHS to follow. Terminals in the RHS are matched (consumed) and nonterminals are turned into recursive calls.
Java Applet demo:
http://cswebsrv.cs.binghamton.edu/~zdu/parsdemo/recframe.html
http://www.uni-paderborn.de/fachbereich/AG/agkastens/compiler/parsdemo/recframe.html
See example in Figure 2.10 in Scott.
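As a rough sketch of the technique, the recognizer below has one function per nonterminal. It assumes the right-recursive assignment grammar assign ::= id := expr, expr ::= id + expr | id * expr | ( expr ) | id, id ::= A | B | C; the class name AssignParser and the helpers peek and expect are invented for the illustration, and blanks are simply removed before parsing.

```java
final class AssignParser {
    private final String in;   // input with blanks removed
    private int pos = 0;       // index of the next unread character

    private AssignParser(String input) {
        this.in = input.replace(" ", "");
    }

    // True if the whole input is derivable from assign.
    static boolean accepts(String input) {
        AssignParser p = new AssignParser(input);
        try {
            p.assign();
            return p.pos == p.in.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // assign ::= id := expr
    private void assign() {
        id();
        expect(':');
        expect('=');
        expr();
    }

    // expr ::= id + expr | id * expr | ( expr ) | id
    private void expr() {
        if (peek() == '(') {
            expect('(');
            expr();
            expect(')');
        } else {
            id();
            if (peek() == '+' || peek() == '*') {
                pos++;      // consume the operator
                expr();     // nonterminal on the RHS becomes a recursive call
            }
        }
    }

    // id ::= A | B | C
    private void id() {
        char c = peek();
        if (c == 'A' || c == 'B' || c == 'C') {
            pos++;
        } else {
            throw new IllegalStateException("identifier expected at " + pos);
        }
    }

    private char peek() {
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalStateException("'" + c + "' expected at " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts("A := B * (A + C)"));
        System.out.println(accepts("A := B *"));
    }
}
```

The case analysis in expr needs only the next character to choose a production, which is what makes this grammar suitable for recursive descent.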


Limitations of BNF/CFG

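〈block〉 ::= 〈block identifier〉 :
            begin 〈statement〉 {〈statement〉}
            end 〈block identifier〉 ;

〈call〉 ::= 〈identifier〉 ( {〈expression〉} )

BNF cannot express that the two block identifiers must be the same, or that a call must supply as many expressions as the called procedure has parameters.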


Beyond BNF/CFG

Formalisms with greater expressive power

• unrestricted grammars

• Post systems

• attribute grammars

Such formalisms are used in describing semantics, a later topic.
Notice the pragmatic definition of syntax (at or below context-free) and semantics (beyond context-free).

The set of strings

$\{a^n \cdot b^n \cdot c^n\} = \{\varepsilon, abc, aabbcc, aaabbbccc, \ldots\}$

is not context free!
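Although no context-free grammar generates this set, membership is easy to decide with ordinary code, which is one reason such constraints are usually checked outside the grammar. The sketch below is an illustration only; the class name AnBnCn and the method member are assumptions made here.

```java
final class AnBnCn {
    // True if s consists of n a's, then n b's, then n c's, for some n >= 0.
    static boolean member(String s) {
        if (s.length() % 3 != 0) {
            return false;
        }
        int n = s.length() / 3;
        for (int i = 0; i < s.length(); i++) {
            char expected = i < n ? 'a' : (i < 2 * n ? 'b' : 'c');
            if (s.charAt(i) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "aabbcc", "aabbc", "abcabc"}) {
            System.out.println("\"" + s + "\" " + member(s));
        }
    }
}
```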


[Figure: three related areas. Programming languages: language design, description, semantics. Compilers: implementation, recognition. Formal languages: expressivity.]