Syntax and Semantics: Form and Meaning of Programming Languages. Copyright © 2003-2015 by Curt Hill
Syntax and Semantics
Form and Meaning of Programming Languages
Copyright © 2003-2015 by Curt Hill
Definitions
• Syntax: the form of the expressions, statements and units
• Semantics: the meaning of those expressions, statements and units
• What is needed for this course and beyond is a way to describe both clearly and unambiguously
Some Terminology
• Sentence
  – A string of characters using some alphabet
• Language
  – A set of sentences
  – Possibly infinite
• Lexeme
  – The most basic unit of the syntax
• Token
  – A class of lexemes
Programming Languages
• Here we also have characters and lexemes
• A token is a class of lexemes
  – Any lexeme is interchangeable with another of its own class as far as syntax is concerned
  – The exchange may change the meaning, but not the form
• In English: nouns, verbs etc.
  – Nouns are interchangeable, even though the meaning changes
• In programming languages: reserved words, punctuation, identifiers
Tokens and Lexemes
• The lexeme is the word or item from the language itself
• A token is the representation of the lexeme that is output by the scanner
• Tokens are often records or objects
• Tokens are often identified by an enumeration
• This may be enhanced by other information, such as an identifier in a symbol table
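The token ideas above can be sketched in Python. This is only an illustration, not the scanner of any particular compiler: the names `TokenKind`, `Token`, `scan`, the reserved-word set and the regular expression are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class TokenKind(Enum):
    """Hypothetical enumeration identifying each class of lexemes."""
    RESERVED = auto()
    IDENTIFIER = auto()
    NUMBER = auto()
    PUNCTUATION = auto()


@dataclass(frozen=True)
class Token:
    """A token as a record: its kind, its lexeme, and optional extra information."""
    kind: TokenKind
    lexeme: str
    symbol_index: Optional[int] = None  # position in the symbol table, identifiers only


RESERVED_WORDS = {"if", "else", "while"}  # assumed for the example
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def scan(source, symbol_table):
    """Split source into lexemes and classify each one as a token."""
    tokens = []
    for lexeme in LEXEME.findall(source):
        if lexeme in RESERVED_WORDS:
            tokens.append(Token(TokenKind.RESERVED, lexeme))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(
                Token(TokenKind.IDENTIFIER, lexeme, symbol_table.index(lexeme))
            )
        elif lexeme.isdigit():
            tokens.append(Token(TokenKind.NUMBER, lexeme))
        else:
            tokens.append(Token(TokenKind.PUNCTUATION, lexeme))
    return tokens


TABLE = []
TOKENS = scan("if x1 = 42 ; y = x1", TABLE)
```

Both occurrences of the lexeme `x1` become IDENTIFIER tokens that point at the same symbol table entry, so the parser only needs the token kind while later phases can look up the name.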
Formal methods of describing syntax
• Two men worthy of note
  – Noam Chomsky
    • Noted linguist and political activist
    • Devised a hierarchy of languages
  – John Backus
    • FORTRAN
    • Algol 60
    • Backus Normal (Naur) Form
Chomsky Grammars
• All languages are defined by a grammar
• A grammar contains four pieces
  – V - an alphabet
    • The legal symbols, both terminals and non-terminals
  – T - the set of terminal symbols
    • Terminals may appear in the language, such as reserved words
    • Non-terminals (the symbols in V that are not in T) may not appear
    • They are concepts or statements composed of terminals
  – P - a set of rewriting rules, called productions
  – Z - the distinguished symbol, where every derivation starts
More on Grammar
• A language is all the legal strings accepted by its grammar
• Terminals are those things that actually exist in the language
• Non-terminals are those things that only represent syntactic items
• For a parse to be complete, all non-terminals must be rewritten into terminals
• Let's consider a simple example
Binary
• The grammar is G = {V,T,P,Z}
• The alphabet, terminals and non-terminals: V = {0,1,Z,A}
• Terminals: T = {0,1}
• Non-terminals must be Z and A
• Distinguished symbol is Z
• Productions are on next screen
Productions
• P = {
    Z ::= A
    A ::= 1 A
    A ::= 0 A
    A ::= 0
    A ::= 1 }
• A production allows us to rewrite from one form to another
• A non-terminal is on the left
• Terminals and non-terminals on the right
Derive 101
Start with distinguished symbol     Z
Apply production: Z ::= A           A
Apply production: A ::= 1 A         1A
Apply production: A ::= 0 A         10A
Apply production: A ::= 1           101
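The derivation of 101 can be replayed mechanically. The following Python sketch is one possible encoding of the binary grammar; the helper names `apply` and `derive`, and the choice to store each right-hand side as a string, are assumptions made for the example.

```python
# Productions of the binary grammar; Z and A are the non-terminals.
PRODUCTIONS = {
    "Z": ["A"],
    "A": ["1A", "0A", "0", "1"],
}


def apply(form, lhs, rhs):
    """Rewrite the leftmost occurrence of the non-terminal lhs in form with rhs."""
    if rhs not in PRODUCTIONS[lhs]:
        raise ValueError(f"{lhs} ::= {rhs} is not a production")
    index = form.index(lhs)  # raises ValueError if lhs does not occur
    return form[:index] + rhs + form[index + 1:]


def derive(steps, start="Z"):
    """Apply each (lhs, rhs) step in turn and return every form produced."""
    forms = [start]
    for lhs, rhs in steps:
        forms.append(apply(forms[-1], lhs, rhs))
    return forms


DERIVATION_101 = derive([("Z", "A"), ("A", "1A"), ("A", "0A"), ("A", "1")])
# DERIVATION_101 == ["Z", "A", "1A", "10A", "101"]
```

The derivation is complete once the last form contains only the terminals 0 and 1.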
Chomsky Hierarchy
• Chomsky proposed a hierarchy of languages based on the strength of the rewriting rules
• There are four
  – Type 0 through Type 3
• Type 0 is strongest, 3 is weakest
• In programming languages we are only interested in Types 3 and 2
Type 3 - regular languages
• U ::= N or U ::= WN
• U and W are non-terminals and N is a terminal
• A non-terminal may only be replaced by a terminal, or by a non-terminal followed by a terminal
• Often used for describing tokens
• Regular expressions are of this type
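As a small illustration of a Type 3 description of a token, the Python sketch below compares a regular expression with a hand-written recogniser for a grammar of the form U ::= N or U ::= WN. The grammar U ::= 0 | 1 | U 0 | U 1, the pattern `[01]+` and the function name are assumptions chosen for the example, not part of the slides.

```python
import re

# Regular expression for the assumed token class: one or more binary digits.
BINARY = re.compile(r"[01]+")


def matches_left_linear(text):
    """Recognise U ::= 0 | 1 | U 0 | U 1 by reading one terminal at a time."""
    seen_u = False  # becomes True once the prefix read so far is a valid U
    for ch in text:
        if ch not in "01":
            return False
        # First terminal: U ::= N. Every later terminal: U ::= U N.
        seen_u = True
    return seen_u


SAMPLES = ["101", "0", "", "102", "abc"]
AGREE = all(
    (BINARY.fullmatch(s) is not None) == matches_left_linear(s) for s in SAMPLES
)
```

The recogniser needs only one flag of memory, which is the practical reason regular languages suit scanners.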
Type 2 - context free languages
• U ::= v
• U is in the set of non-terminals and v is any string of terminals and non-terminals
• A non-terminal may be replaced by any combination of terminals and non-terminals
  – The context of the non-terminal does not matter
• Most programming languages are context-free or have a few minor exceptions
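A context free rule can nest, which a Type 3 rule cannot express. As a hedged illustration, the Python sketch below recognises balanced parentheses with the assumed grammar S ::= ( S ) S | empty; the grammar and the names `parse_s` and `balanced` are hypothetical and chosen only for this example.

```python
def parse_s(text, pos=0):
    """Return the position just after an S that starts at pos.

    S ::= ( S ) S | empty
    """
    if pos < len(text) and text[pos] == "(":
        inner = parse_s(text, pos + 1)            # the S inside the parentheses
        if inner < len(text) and text[inner] == ")":
            return parse_s(text, inner + 1)       # the S that follows
        raise SyntaxError(f"missing ')' at position {inner}")
    return pos                                    # S ::= empty


def balanced(text):
    """True when the whole of text can be derived from S."""
    try:
        return parse_s(text) == len(text)
    except SyntaxError:
        return False
```

Each call replaces S without looking at what surrounds it, which is exactly the sense in which the context does not matter.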
Language Hierarchies
Type 3 Regular
Type 2 Context Free
Type 1 Context Sensitive
Type 0 Unrestricted
BNF
• John Backus described ALGOL 58 with a notation similar to context free grammars, independent of Chomsky, in 1959
• Peter Naur extended it slightly in describing ALGOL 60
• Became known as BNF for Backus Normal Form or Backus Naur Form
• A meta-language is a language that describes another language
BNF Again
• There are several meta-languages for BNF; the production rules given above are one
• Like the Chomsky grammar there are non-terminals, terminals, productions and a start symbol
  – Each non-terminal represents some abstract concept in a language
  – There is often some notational way to distinguish a terminal from a non-terminal
Simplest notation
• Form of productions: LHS → RHS
• Where:
  – LHS is a non-terminal (context free and regular grammars)
  – RHS is any sequence of terminals and non-terminals, including empty
• There can be many productions with exactly the same LHS; these are alternatives
• If the RHS contains the LHS, the rule is recursive
Simple extensions
• Sometimes there is an alternation symbol, often the vertical bar, so that one production can replace several with the same LHS
• Sometimes things enclosed in [ and ] are optional; they may be present zero or one times
• Sometimes things enclosed in { and } may be present one or more times
  – Thus [{x}] allows zero or more x items
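The bracket and brace extensions behave much like the quantifiers of regular expressions. The Python sketch below is only an analogy under that assumption, using the single terminal x and the "one or more" reading of braces given above; the pattern names and the helper `accepted` are made up for the example.

```python
import re

OPTIONAL = re.compile(r"(?:x)?")            # [x]   : zero or one x
REPEATED = re.compile(r"(?:x)+")            # {x}   : one or more x
ZERO_OR_MORE = re.compile(r"(?:(?:x)+)?")   # [{x}] : zero or more x


def accepted(pattern, text):
    """True when the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None
```

Wrapping the repetition in the optional form is what admits the empty string, which mirrors how [{x}] is built from the two simpler extensions.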
More
• The extensions are often called EBNF
• Syntax graphs are equivalent to EBNF
• These tend to be easier to read
Simple Expressions
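[Syntax graphs]
expression: term, then zero or more repetitions of + term or - term
term: factor, then zero or more repetitions of * factor or / factor
factor: constant, or ident, or ( expression )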
BNF is generative
• A derivation is sentence generation
• Leftmost derivation
  – Only the leftmost non-terminal can be rewritten
  – This is usually the kind of derivation used by compilers
  – The previous derivation was leftmost
• There are also rightmost derivations
• The order of derivation does not affect the language defined
Example BNF productions
<program> → <stmts>
<stmts>   → <stmt> | <stmt> ; <stmts>
<stmt>    → <var> = <expr>
<var>     → a | b | c | d
<expr>    → <term> + <term> | <term> - <term>
<term>    → <var> | const
Example Derivation
<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
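This leftmost derivation can be reproduced with a few lines of Python. The sketch below assumes the six example productions are stored as a dictionary of alternatives; the names `GRAMMAR`, `leftmost_step` and `derive`, and the idea of driving the derivation with a list of alternative numbers, are assumptions made for illustration.

```python
# Each non-terminal maps to its alternatives; each alternative is a list of symbols.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_step(form, choice):
    """Rewrite the leftmost non-terminal in form using alternative number choice."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            return form[:i] + GRAMMAR[symbol][choice] + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


def derive(choices, start="<program>"):
    """Return every sentential form of the leftmost derivation, as strings."""
    form = [start]
    lines = [" ".join(form)]
    for choice in choices:
        form = leftmost_step(form, choice)
        lines.append(" ".join(form))
    return lines


# Alternatives chosen at each step to reach: a = b + const
STEPS = derive([0, 0, 0, 0, 0, 0, 1, 1])
```

Changing only the list of choices generates other sentences of the same language, which is what "generative" means here.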
Parse trees
• A multi-way tree where:
  – Each interior node is a non-terminal
  – Each leaf is a terminal
  – The start symbol is the root
  – Nested under each interior node is the RHS of the production, with the LHS being the node itself
• This is a handy data structure for compilers and the like
Example Parse Tree
program
  stmts
    stmt
      var
        a
      =
      expr
        term
          var
            b
        +
        term
          const
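A parse tree like this one is easy to hold as a data structure. The Python sketch below assumes the tree is for the sentence a = b + const from the example derivation; the class `Node` and the function `frontier` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse tree node: interior nodes are non-terminals, leaves are terminals."""
    symbol: str
    children: list = field(default_factory=list)


def frontier(node):
    """Collect the leaves from left to right; together they spell the sentence."""
    if not node.children:
        return [node.symbol]
    leaves = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves


TREE = Node("program", [Node("stmts", [Node("stmt", [
    Node("var", [Node("a")]),
    Node("="),
    Node("expr", [
        Node("term", [Node("var", [Node("b")])]),
        Node("+"),
        Node("term", [Node("const")]),
    ]),
])])])

SENTENCE = " ".join(frontier(TREE))
```

The children of each interior node are the RHS of the production applied there, so the tree records the whole derivation and not just the sentence.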
Ambiguity
• A grammar is ambiguous when two or more different parse trees can be derived from the same input sequence
• Ambiguous grammars usually require some fix-up in the compiler to guarantee that only one will be chosen
• Many IF grammars are ambiguous when the else is optional: with nested ifs it is unclear which if an else belongs to
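The IF ambiguity can be demonstrated by counting parse trees. The Python sketch below assumes a deliberately tiny grammar, S ::= if S | if S else S | x, where x stands for any other statement; the grammar and the function name `count_parses` are assumptions made for the example.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count the distinct parse trees for tokens under S ::= if S | if S else S | x."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of ways tokens[lo:hi] can be derived from S."""
        if hi - lo == 1:
            return 1 if tokens[lo] == "x" else 0
        if hi - lo < 2 or tokens[lo] != "if":
            return 0
        total = count(lo + 1, hi)                 # S ::= if S
        for mid in range(lo + 2, hi - 1):         # S ::= if S else S
            if tokens[mid] == "else":
                total += count(lo + 1, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


SIMPLE = count_parses("if x else x".split())
DANGLING = count_parses("if if x else x".split())
```

Under this grammar the first input has one parse tree and the second has two, because the else can attach to either if; that second case is the one a compiler must resolve with an extra rule.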
BNF Problems
• BNF cannot capture important information
  – That a variable is defined
  – That an expression contains proper types
• Some problems like type checking could be done, but would bulk out the grammar so much as to make it unusable
  – Other problems like declare before use in C++ are impossible to catch in BNF
• Many of these types of things are called Static Semantics
The Solution?
• Attribute Grammars
• An attempt to augment the syntax with static semantic information
• Associate with each production (and with nodes of the parse tree) a function that would check the static semantic information
• Check the attributes with a set of predicates
Attribute Grammars
• A context free grammar
• For each symbol there may be a set of attribute values
• A set of functions that define these attribute values based on non-terminals
Example
Production                      Attribute
<exp> ::= <term>                val(exp) = val(term)
<exp> ::= <exp> + <term>        val(exp) = val(exp) + val(term)
<term> ::= <term> * <factor>    val(term) = val(term) * val(factor)
<term> ::= <factor>             val(term) = val(factor)
<factor> ::= ident              val(factor) = val(ident)
<factor> ::= (<exp>)            val(factor) = val(exp)
Consider: 2+4*(1+2)
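The val attribute can be computed while parsing. The Python sketch below is one possible reading of the table, not the slides' own implementation. It makes three assumptions: the expression to consider is read as 2+4*(1+2), since the grammar has no implicit multiplication; the left-recursive productions are handled with loops; and a numeric literal plays the role of ident, with its number as val(ident). The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into numbers and the symbols + * ( ); other characters are skipped."""
    return re.findall(r"\d+|[+*()]", text)


def evaluate(text):
    """Return val(exp) for text, following the attribute rules in the table."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def exp():
        val = term()                      # val(exp) = val(term)
        while peek() == "+":
            take("+")
            val = val + term()            # val(exp) = val(exp) + val(term)
        return val

    def term():
        val = factor()                    # val(term) = val(factor)
        while peek() == "*":
            take("*")
            val = val * factor()          # val(term) = val(term) * val(factor)
        return val

    def factor():
        if peek() == "(":
            take("(")
            val = exp()                   # val(factor) = val(exp)
            take(")")
            return val
        token = take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, found {token!r}")
        return int(token)                 # val(factor) = val(ident)

    result = exp()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


VALUE = evaluate("2+4*(1+2)")
```

Each function returns the attribute of the non-terminal it parses, so the values flow upward from the leaves of the parse tree to the root.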
Second Example
• Consider declarations
Production                      Attribute
<decl> ::= <type> <list>        <type, names>
<type> ::= int                  type = int
<type> ::= float                type = float
<list> ::= ident                names(list) = ident
<list> ::= ident , <list>       names(list) = ident names(list)
Now we can determine from the attributes whether an item is defined or not
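The declaration attributes can be turned into a symbol table lookup. The Python sketch below is an illustration under assumptions: declarations arrive as plain strings such as "int a, b", the symbol table is a dictionary, and the function names `names`, `decl` and `declare` are made up for the example.

```python
def names(list_tokens):
    """names(list) for <list> ::= ident | ident , <list>."""
    ident, rest = list_tokens[0], list_tokens[1:]
    if not ident.isidentifier():
        raise SyntaxError(f"expected an identifier, found {ident!r}")
    if not rest:
        return [ident]                        # names(list) = ident
    if rest[0] != "," or len(rest) < 2:
        raise SyntaxError("expected ', <list>' after the identifier")
    return [ident] + names(rest[1:])          # names(list) = ident names(list)


def decl(text):
    """Return the pair <type, names> for <decl> ::= <type> <list>."""
    tokens = text.replace(",", " , ").split()
    if len(tokens) < 2 or tokens[0] not in ("int", "float"):
        raise SyntaxError("expected 'int' or 'float' followed by a list")
    return tokens[0], names(tokens[1:])


def declare(text, symbols):
    """Record every declared name with its type in the symbol table."""
    type_name, idents = decl(text)
    for ident in idents:
        symbols[ident] = type_name
    return symbols


SYMBOLS = declare("float x", declare("int a, b", {}))
```

With the table filled in, checking whether an item is defined is a membership test such as `"a" in SYMBOLS`, and its type is `SYMBOLS["a"]`.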
YACC Uses
• YACC (Yet Another Compiler Compiler) is a common UNIX tool for constructing compilers; there are many other programs like it
• YACC uses an attribute grammar of sorts
  – Attached to each production is a function call
  – You get to write the function that does the checking at that point, including code generation
Conclusion and Summary
• Syntax is about the form of languages
• Semantics is about the meaning
• BNF represents a context free grammar