chapter three - texas tech universityredwood.cs.ttu.edu/~rwatson/classes/cs3361/ch3.doc · web...

DRAFT 5/25/2023 III-1

Chapter ThreeLanguage Semantics Design

Introduction.

The syntax of a target language defines the valid ways in which one can combine

elements of the language vocabulary to form grammatically correct sentences. Semantic

definitions of a target language define the way in which one determines the meaning of a

grammatically correct sentence. In chapter two a metalanguage is described that can

precisely define the syntax of a language. Introduced in the present chapter are two

metalanguages, which are often used for providing precise definitions of the semantics of

target languages.

Just as there is an abstract definition for the syntax of a language, there are

abstract definitions giving the semantics of a language. Recall that the abstract syntax of

a language is a set of rules which, with reference to the language vocabulary, define how

one produces valid sentences in the language. Likewise, the reader will see that the

semantic definitions of a language reference its abstract syntax rules to define precisely

the meaning of the syntactic structure.

Precise syntactic and semantic definitions are an important attribute of a language.

Languages defined according to precise, mathematical methods of syntactic and semantic

description are called formal languages. A language that does not have precise

semantics is not formal.

In the absence of a precise definition a language is likely to be syntactically and

semantically ambiguous. Natural (or spoken) languages are not defined precisely. A

good example of the ambiguity that results is found in the English word "nurse." The

DRAFT 5/25/2023 III-2

word is syntactically ambiguous because the word can be a noun or a verb. Naturally the

meaning of the word is also ambiguous in the absence of clues from the context. For

example, a statement "nurse!" could be viewed as a command to nurse (given an implied

subject such as the word "you") or as someone calling out for a nurse's attention

(assuming an implied verb such as the word "come").

The purpose of a programming language is to provide instructions to a computer.

Computers work as automatons. If a programmer tells the computer to perform task A,

the computer does not have the wherewithal to stop and ask the programmer what is

meant when he says perform task A. An automaton must carry out its work without

clarification. Therefore, the meaning of the computer instructions must be absolutely

clear. Precise semantics are required.

Once the syntax of a target language is defined formally, one can develop the

target language's semantic description. In this chapter, two approaches to semantic

definition are presented. The denotational approach is presented first and is based upon

Lattice Theory and a Lambda Calculus-based metalanguage. The axiomatic approach to

semantic definition is based upon a logic-like language.

Denotational Semantics.

The denotational approach to language definition is based upon Lattice Theory.

A Lattice is a set of elements closed under two binary operations which are commutative,

associative, idempotent, and which satisfy the absorption laws. The denotational

semantics were developed by Scott-Strachey. [Stoy]

A Lattice is a partially ordered set S, which has a least upper bound called the

supremum of S and a greatest lower bound called the infimum of S. The supremum and

infimum of S are where elements of S join and meet, respectively. Consider the

following example which depicts in a Hasse Diagram the partial ordering of the subset

relation on the power set of {a,b,c}, i.e., P{a,b,c}:

DRAFT 5/25/2023 III-3

{a,b,c}

{a,b} {b,c}{a,c}

{a} {b} {c}

{}

Assume that the node containing {a,b,c} is at level 0 of the diagram and that the null set

{} is at level 3. One can see that the subsets of a level i set are found in level i+1. The

edges of the diagram connect a set with its corresponding subsets - the subset relation is

the basis of the ordering. The join of the P{a,b,c} is the least upper bound, I.E., the set

{a,b,c} while the elements meet in the greatest lower bound of P{a,b,c}, I.E., the null

set.

The notion of containment is essential in the Lattice. Whenever information is

ordered by the subset relation, the ordering implies the amount of information contained

in an element. For example, since {a,b,c} {b,c} the set {a,b,c} contains more

information. The integers are ordered in the following way:

DRAFT 5/25/2023 III-4

T

T

The integers are ordered as a "flat" set. The numerical ordering relation does not

convey the notion of containment, i.e., For example, the integer 3 does not "contain"

the information contained in the integer 2. The special elements top and bottom, T and

, respectively, indicate all values (a contradictory situation) and no values, respectively.

These elements form the least upper bound and greatest lower bound of the Integer

Lattice.

The goal of the denotational approach is to identify functions which map from a

syntactic domain to a semantic domain (where domains may be infinite or finite). The

semantic domains are based upon the mathematical notion of the Lattice – I.E., semantic

domains are not sets, they are Lattices. Before mappings can be defined, we must first

build up the domains involved in the mappings. We must also develop an understanding

of another metalanguage, which precisely defines how the functions perform their

mappings.

To demonstrate, we will first develop a small example language LE for arithmetic

expressions:

E::= E + E | E - E | E * E | E / E | (E) | CC ::= DC | D

DRAFT 5/25/2023 III-5

D ::= 0 | 1 | 2 | ... | 9

This grammatical definition corresponds with the BNF definitions from the previous

chapter. Notice that there are no variables in this language; assume only integer

constants. The syntactic domains of this language are:

SYNTACTIC DOMAINS:C: ConsE: ExpD:Digits

The syntactic domains correspond to the LHS's of the productions. This is always

the case. For every LHS of a production there is a syntactic domain. It seems clear that

the mapping from one of LE’s syntactic objects (either C, E, or D), yields a semantic

object from the domain of integers:

: N = {...,-2,-1,0,1,2,....}

The complete semantic domain is referred to by N while is a metavariable that can take

on any possible integer value. Semantic domains require two additional elements to

denote the possibility of erroneous and undefined values. These values are represented

by two special symbols - the top and bottom of a Lattice, respectively:

Top: T Bottom:

It is now possible to denote the semantic domain:

SEMANTIC DOMAIN:: N = {T,...,-2,-1,0,1,2,....,}

This latter definition often appears as {...,-2,-1,0,1,2,...} where the degree symbol

denotes inclusion of and T.

DRAFT 5/25/2023 III-6

The next step involves the declaration of a function. The general form of a

function declaration is:

: LHS Semantic_domain

where is a semantic function name, LHS is a LHS of a BNF rule, and Semantic_domain

is a previously defined semantic domain.

Below, the semantic functions which will map from the syntactic domain of Exp

to the semantic domain of N, and from Cons to N are declared:

SEMANTIC FUNCTION DECLARATION: : Exp N : Cons N : Digits N

Given an expression or a constant based upon the BNF above, a meaning is construed in

terms of integers:

: Exp N

The expression above declares which syntactic domain is mapped to which

semantic domain. The expression indicates that there is a function which maps from

the syntactic Exp to the semantic N. It is now necessary to describe how the mapping is

to take place. Semantic equations are developed for the semantic function describing

how the mapping will take place.

Recall that for every LHS of the BNF there is a syntactic domain. It is also true

that for every RHS option of the BNF there is a semantic equation. Semantic equations

are of the general form:

f « RHS » = semantic_definition

DRAFT 5/25/2023 III-7

The symbol f is a semantic function previously declared. The RHS is a syntactic object

as defined by a RHS option for the BNF. The = sign is a replacement operator. In

determining the meaning of an instruction, if one can match f«RHS» precisely, he can

replace f «RHS» with the corresponding semantic_definition. The relationships between

BNF, domains, semantic function declaration, and semantic equations are given in figure

1.

A semantic equation states precisely how the mapping is to be performed. The

operators «» are employed to enclose and isolate syntactic objects on the LHS of the

semantic equations. The equations should be read as: the syntactic object in «»'s on the

LHS of the "=" is defined and replaced by the expression on the RHS of the "=."

SEMANTIC FUNCTION EQUATIONS:a. « E1+E2 » = « E1 » + « E2 »b. « E1-E2 » = « E1 » - « E2 »c. « E1*E2 » = « E1 » * « E2 »d. « E1/E2 » = « E1 » / « E2 »e. « (E) » = « E » f. « C » = « C » g. « CD » = ((10 « C » ) + « D » )h. « D » = « D »i. « 0 » = 0j. « 1 » = 1k. « 2 » = 2

: :r. « 9 » = 9

The semantic equations define how to map from a syntactic specification of integer

arithmetic to integers and integer arithmetic. For example, the equations i-r map from a

syntactic integer digit to an integer. Equations f-h define how to map from multi-digit

syntactic integers to multi-digit semantic integers. Equations a-f map from a

specification of integer operations to the actual integer operations.

DRAFT 5/25/2023 III-8

Backus-Naur Form A ::= RHS1 | RHS2 | ... | RHSn

Syntactic Domain: A

Semantic Domain s : S = {....}

Semantic Equations ƒ «RHS1» = semantic1 ƒ «RHS2» = semantic2 : ƒ «RHSn» = semanticn

Semantic Function Declaration

ƒ : A S

Figure 1. Relationships between Syntax and Semantics.

Consider the derivation of an integer. The mapping below is from a syntactic

representation of a number to a semantic representation. Intuitively, the mapping can be

viewed as analogous to a program - you may have written yourself – that converts an

ASCII character representation of an integer to an integer representation – I.E., a

representation upon which you can perform numerical computations.

In the first step, one wishes to know the meaning of the syntactic characters 2345.

« 2345 » = ((10 « 234 » ) + « 5 » )

The function reference matches the LHS of semantic equation g. The reference

«2345» is replaced by an appropriate instantiation of g's RHS seen on the right,

above. The resulting expression includes two references to semantic functions, «234»

and «5». These references form the LHS of equations g and n, respectively. The

references result in the following substitutions:

DRAFT 5/25/2023 III-9

= ((10 ((10 « 23 » ) + « 4 » )) + 5 )

The references «23» and «4» result in appropriate replacements from equations

g and m:

= ((10 ((10 ((10 « 2 » ) + « 3 » )) + 4 )) +

5 )

The references to «2» and «3» result in the following replacements according to

equations h and l, respectively.

= ((10 ((10 ((10 « 2 » ) + 3 )) + 4 )) + 5 )

Now the final substitution is made according to equation k. Since there are no further

references to semantic functions, the expressions are evaluated according to the standard

definitions of integer arithmetic:

= ((10 ((10 ((10 2 ) + 3 )) + 4 )) + 5 )= ((10 ((10 (20 + 3 )) + 4 )) + 5 )= ((10 ((10 23) + 4 )) + 5 )= ((10 (230 + 4 )) + 5 )= ((10 234) + 5 )= (2340 + 5 )= 2345

Now consider the meaning of an arithmetic expression. Above the equals sign

(i.e., =) is the letter of the semantic equation used in a translation step. In a string of

equalities:

e1 =a e2 =b e3 =c ... =d en-1 =e en

DRAFT 5/25/2023 III-10

The expression e1 matches the LHS of equation a and is replaced by e2 which matches

the LHS of equation b and is replaced by e3 which matches the LHS of equation c, etc.

For brevity, when equations may apply in parallel, an &-sign is used - when the same

equation may apply to several terms, a factor is assigned the equation:

«(3+2) / (4-2)» =d

«(3+2)» / «(4-2)» =2 e

«3+2» / «4-2» =a&b

«3» + «2» / «4» - «2» =4 f

«3» + «2» / «4» - «2» =4 h

«3» + «2» / «4» - «2» =l,k,m,k

3+2 / 4-2

Even with the formal definitions there is ambiguity in the result: 3+2 / 4-2.

Precise definitions can lead to ambiguity if not developed properly. The question arises:

is 2.5 (or 2 since we are dealing with integers) the answer, or is it 1.5 (or 1)? This

ambiguity flies in the face of our goal, which is to obtain a precise definition of infix

integer arithmetic.

To solve this problem of ambiguity, we will alter the BNF for expressions, the

semantic equations, and we will also adopt two conventions for evaluating the equations.

Instead of a breadth-first application of semantic equations, let us do depth-first and adopt

the additional convention that integer expressions and subexpressions are to be evaluated

at the earliest possible point in the derivation.

Since the semantics of integer data is now well understood, future examples will

refer to integers simply as cons – and we will assume the syntactic construction and

semantic derivation of integers. Specifically, this abbreviation is meant to summarize the

earlier syntax, which describes how to syntactically construct an integer. Likewise, the

semantics will summarize the equations g-r which state how to interpret the syntactic

DRAFT 5/25/2023 III-11

representation of the integer. Consider, the BNF as altered to separate additive and

multiplicative operations:

E::= T | E + T | E - TT::= F | T * F | T / FF::=(E) | cons

This set of equations results in the following domains:

SYNTACTIC DOMAINS: E: Exp T: TerF: FacC: Cons

We will continue to map to the semantic domain of integers:

SEMANTIC DOMAIN:: N = {,...,-2,-1,0,1,2,....,T}

Due to the fact that there now exist three syntactic categories from the BNF LHS's, we

need to declare three semantic functions:

SEMANTIC FUNCTION DECLARATION: : Exp N : Ter N : Fac N

Next we elaborate the semantic equations for all RHS's for a given BNF syntax rule. In

equation g the RHS is altered by placing parentheses around the semantic expression,

thus increasing its precedence for evaluation. In equation h the integer corresponding to

the syntactic integer is selected - summarizing informally equations g-q in the original

semantics.

SEMANTIC EQUATIONS:

DRAFT 5/25/2023 III-12

a. «E + T» = « E » + « T »b. « E - T »= « E » - « T »c. «T » = « T »d. «T * F» = « T » * « F » e. «T / F » = « T » / « F »f. « F » = « F »g. «(E) » = ( « E » )h. « cons » = element of N corresponding to cons

Now consider the evaluation of an expression, employing the two conventions

mentioned earlier and the altered semantic equations. The equation letter over the = sign

indicates the pertinent LHS/RHS relationships.

« (3+2) / (4-2)» =c

« (3+2) / (4-2) » =e

« (3+2) » / « (4-2) » =f

« (3+2) » / « (4-2) » =g

« (3+2) » / ( «4-2 » ) =b

« (3+2) » / ( «4 »- «2» ) =2 c

« (3+2) » / ( «4 »- «2» ) =2 f

« (3+2) » / ( «4 »- «2» ) =2 h

« (3+2) » / ( 4 - 2 ) =math

« (3+2) » / 2 =f

« (3+2) » / 2 =g

( «3+2 » ) / 2 =a

( «3 »+ «2 » ) / 2 =2 c ( «3 »+ «2 » ) / 2 =2 f ( «3 »+ «2 » ) / 2 =2 h ( 3 + 2 ) / 2 =math

5/2 =math 2

In the next section we develop the metalanguage employed for more complicated

semantic equations.

DRAFT 5/25/2023 III-13

The Denotational Semantic Metalanguage

In this section, much of the metalanguage employed in denotational semantic

defintions is presented. In the previous section we observed simple syntactic and

semantic domains. It is possible to construct more complicated domains from existing

domains with the product and sum operators: and +, respectively.

Product Domains:

Given domains D1,D2,...,Dn it is possible to form a new domain N =

D1D2...Dn which gives a set of ordered n-tuples: <d1,d2,...,dn> where for each i:

diDi . For example,

Given, B = {t,f,,T} and V = {a,b,c} B V = {<t,a >,<t,b >, <t,c >,<f,a >,<f,b >, <f,c > }

The down-arrow is an operator which permits access to the components of a tuple

formed by a product operation, t i. In this operation, t is a tuple of a product

domain and i is an integer. Given N = D1D2...Dk , n N, and di Di then n

i = di .

There is a special form of a product domain denoted by Dn where, Dn =

D1D2...Dn iff D1D2...Dn . This set is to be distinguished from the * as

a superscript. The set D* is the set of all finite strings one may form using the elements

of D. For example, assume D = {t,f}:

D* = {t,f,tt,tf,ff,ft,...}Dn = {<t,t,t >,<f,f,f >,<f,t,t >,<t,f,t >,<t,t,f >,<t,f,f >,<f,t,f >,<f,f,t >}

where n = 3

Therefore, D* = D0 D1 D2 Di

DRAFT 5/25/2023 III-14

where D0 = - the empty string

Consider the following example where B ={t,f}, L = {a,b,c}, and P = BL :

Sum Domains:

Given domains D1,D2,...,Dn it is possible to form a new domain S =

D1D2...Dn which gives a set of elements, where for any element x S, there is

some i such that x Di For example, let I be the set of integers, C the set of

alphabetic characters, and R the set of real numbers. One possible use of these sets and

the metalanguage operators is to specify the output domain of a program language. The

output specification (I C* R)* would indicate a language capable of programs that

can produce strings of integer, real, and character strings as output.

The example domain (I C* R)* warrants inspection. Inside the parentheses

one may observe that an output can be an integer (I), a character string (C*), or a real

number (R). The quantity thus formed may itself allow for strings, i.e., we can obtain

any string one can form from the sum domain itself. Thus, the output of a program can

TLB

<a,TB> <TL ,t> <b,TB> <TL ,f> <c,TB>

<a,f> <a,t> <b,t> <b,f> <c,f> <c,t>

<a,B> <L ,t> <b, B> <L ,f> <c, B>

LB

DRAFT 5/25/2023 III-15

be any conceivable combination of integer, character, or real strings. Included in the

output domain are the following:

510.43athe value of r is 5 and the value of s is 8.95etc

Consider the following example where B ={t,f}, L = {a,b,c}, and P = BL :

Functions:

Since the goal of the semantic definitions is to map from a syntactic to a semantic

domain, function definition is central to our concern. Functions are declared according to

the general form:

F = D1 D2

TL+B

TL TBa b c t f

L B

L+B

DRAFT 5/25/2023 III-16

This definition is akin to a type declaration. In the declaration one is stating that there is a

class of functions, that given an element of D1, the function will map to an element of

D2. It is possible to identify a specific function f, which is an element of a function

class:

f : F = D1 D2

So that f(d1) = d2 assuming d1 D1 and d2 D2 and that there is a mapping

between d1 and d2. Expressions for domains can be complicated:

D1 D2 ...Dn Dn+1 Dn+2

The precedence relationships in order of importance are , , and The mapping

symbol, is right associative.

Function Domains.

Frequently it is necessary to define function domains, which map domain

elements to functions:

D1 D2 D3)

Assume that C represents the syntactic representation of an integer and that N represents

the semantic representation, I.E., the set of integers. Operators is a set of arithmetic

operators {*, /, +, -}. Now consider the following function domain definition:

S=Operators C C N)

DRAFT 5/25/2023 III-17

s: S=Operators C C N)

So that s [+ (4,3) ] selects the addition function which, in turn, maps the pair (4,3) to 7.

List Operations.

Recall that Dn denotes all lists of length n that can be formed using the elements

of D. Furthermore, recall that D* = {nil} + D1+ D2+ ..represents all possible finite

lists of a domain, D. Special operators exist to manage lists serving as operands of the

denotational semantic definitions:

hd(L) D*Dtl(L) D*D*

append(L,d) D* D D*

prefix(d,L) D D* D*

Given L=nil then both the hd and the tl function return top, T, an erroneous condition.

Notice that tl([20]) evaluates to nil, but one will never see a nil from the hd operation.

Suppose D=(N+C*) where N is the domain of integers and C is the domain of characters.

D* could thus represent an output file O=D*. The process of writing to an output file

could be represented by the append. An input file I=(CN + CC*)* (where CN and CC

are syntactic representations of integers and characters, respectively) could be

represented by two operations: the input value obtained with hd, and the subsequent

input file obtained by the tl:

append(O,d) For output;hd(I) For an input value;tl(I) For the input file resulting from an input operation.

Consider the following examples where I=[7,8,9]:

DRAFT 5/25/2023 III-18

append(I,10) = [7,8,9,10]prefix(I,6) = [6,7,8,9]prefix(append(I,10),6) = [6,7,8,9,10]hd(I) = 7tl(I) = [8,9]

Domain Operations.

It is common to have identifiers which map to locations (i.e., memory locations)

or to procedures (i.e., subprograms). In order to represent identifiers as a domain

consider the following definition:

Id L + P where L is a set of locations and P is a set of procedures.

Given an identifier it is possible to map to locations or to procedures. Special operators

exist to allow one to manipulate the Sum domains exemplified above. For these

definitions assume the general sum domain S=D1+...+Dn , P is a set of named

programmed segments, L is a set of named locations, and Id = P + L:

Operator d in S.

If d is an element of Di then d in S provides an element of S corresponding to d.

This operation allows one to "inject" elements of the smaller domain into the larger, e.g.,

during declarations of procedures and variables. Therefore, when a procedure, p, is

declared, one would use the expression p in Id. The in operator is not to be confused with

the membership relation: instead of testing for membership, in is forcing membership.

From a practical standpoint, this operation defines the use of an identifier once its use is

declared.

Operator s | Di .

If s is an element of S then s | Di provides the element of Di corresponding to

s. This operation allows one to "project" elements of the smaller domain from the larger,

DRAFT 5/25/2023 III-19

e.g., during references. When a variable id is referenced, one would use the expression

id in L in order to obtain the location of id. Projection on the name of a procedure would

obtain an appropriate segment of code, in the case of a procedure name, or a memory

location, in the case of a variable.

Operator s E Di .

If s is an element of S then s E Di iff s is squarely in s E Di . This operation

is used to guard actions of injection and projection, to make certain that an object is

where one thinks.

As in the preceding, assume Id = P + L. Consider the following example:

if id E P then id in P + L

In this example, imagine that one is capturing the meaning of a procedure declaration. In

the next example, one is obtaining a named segment of code in the presence of a

reference to a procedure:

if id E P then id|P

Replacement Operation.

There is an operator to allow for the replacement of information in the context of

a function.

f[a/b] denotes function f’ which is identical to f except in f’, b maps to a.

Given f[a/b], one subsequently finds that f[b]=a. In the denotational semantics one can

define a mapping with f[a/b] and then reference it with f[b].

Semantic Equations.

DRAFT 5/25/2023 III-20

The goal of the semantic definitions is to map from a syntactic object to a

semantic object. There are two important operators found in a semantic equation. First,

there is a special set of brackets used only to encapsulate (and thus isolate) syntactic

objects:

« syntactic object » .

One can state conditional semantic equations using the operator if-then. Consider

the following semantic equation, which obtains the value of a variable:

« » = if E L then [|L] else T

This equation states that the meaning of the variable evaluates to one of two things: if the

variable is in the domain of locations L, then the meaning of the variable is projected

from the locations domain with respect to the current state (i.e., one will obtain the value

currently in the location to which the variable is mapped); otherwise an erroneous

condition exists. Notice that the semantic equation is similar to a lambda function. The

fundamental notion is one of replacement. When the LHS of an equation is matched,

e.g., «», it is replaced by the RHS, e.g., if E L then [|L] else T. The

arguments of the function are and . The general form of a semantic equation is:

LHS = RHS OR

LHS = if CONDITION then RHS(true) else RHS(false)

More Interesting Examples

A semantic metavariable, E.G., , can take on the value of any legal variable in

the target language (legal based upon the BNF - syntax definitions). Suppose the

following BNF rule exists:

::= a | b |...| z | a | b |...| z

DRAFT 5/25/2023 III-21

Let’s apply the knowledge we have gained so far by refining our language of arithmetic

expressions to allow for variables. To do this, we will need a function that allows us to

map from the syntactic variable domain to a semantic domain containing values to which

the variable may be assigned. This mapping must be done with respect to a state :

[] yields the value to which is assigned in state . Augmenting the original syntax

for arithmetic expressions, presented at the beginning of this chapter, one obtains

arithmetic expressions, which provide for the use of variables:

New Syntax:S ::= := EE::= T | E + T | E - TT::= F | T * F | T / FF::= (E) | cons |

Since cons and should both be viewed as syntactic categories, it can be claimed that

each syntactic category is a Syntactic Domain, i.e., a class of syntactic objects from

which one will map to semantic objects. However, we will not declare them as syntactic

domains, since the syntactic construction of variables and constants is well understood.

New Syntactic Domain:S:Stmt E:ExpT:TerF:Fac

From the syntactic domain, one maps to semantic domains. Below, the semantic domains

are presented:

New Semantic Domain:: St = N: N = {...,-2,-1,0,1,2,....}

For each LHS in the BNF, we declare a semantic function. In order to understand the

semantic function declarations, let us consider the example from below: : Stmt

DRAFT 5/25/2023 III-22

. The semantic function symbol is declared to take as input a statement from the

syntactic domain and, based upon the statement, map from an input to some an state.

Each function declared maps from a syntactic domain to a semantic domain:

Semantic Functions: : Stmt St St : Exp St N : Ter St N : Fac St N

Considering the declarations, one can better understand the corresponding semantic

equations. For example, given the function declaration : Stmt , there is

one equation (j.) corresponding to the single RHS option for a statement in the syntax.

The LHS of the semantic equation contains the syntactic RHS option used to select the

equation, and the additional argument, , to be used as the current state. The RHS of the

semantic equation describes how the mapping is to be accomplished and is used to

replace the LHS of the semantic equation with, in the case of equation (j.), a new state:

Semantic Equations:a. «E + T » = «E » + «T »b. «E - T » = «E » - «T »c. «T » = «T »d. «T * F» = «T » * «F »e. «T / F » = «T » / «F »f. «F » = «F »g. «(E) » = ( «E »)h. «cons » = element of N corresponding to consi. « » = [ ]

Equation (i) obtains the value associated with a variable. Given the full set of equations

here, if a variable is undefined a null value is obtained – not undefined, but null.

j. « := E » = [«E » / ]

DRAFT 5/25/2023 III-23

This equation (j) assigns the meaning of the expression E (on the RHS of the assignment

operator in the syntactic structure) to the variable (on the LHS).

« (a+10)/d » c « (a+10)/d» e «a+10) » / «d » f « (a+10) » / «d » f « (a+10)» / «d » i

« (a+10) » / 6 g

( «a+10» ) / 6 a ( «a»+ «10» ) / 6 2 c ( «a»+ «10» ) / 6 2 f ( «a»+ «10» ) / 6 i ( 14+ «10» ) / 6 h ( 14+ 10 ) / 6 math 24 / 6 math 4

In two steps translations were carried out concurrently, marked by 2 c and 2 f. This

marking is meant to denote that equation c was applied to two references: «a»and

«10», respectively; and equation f was applied to «a»and «10», respectively.

More Interesting Examples

In this section, the grammar and semantics above are extended to specify the

syntax and meaning of the while-language, which was originally introduced in the

previous chapter.

Syntax:P::= SS ::= := E | read() | write() | while C do S od | S;SC ::= E1 < E2 | E1 > E2 | E1 = E2 | E1 <= E2 | E1 >= E2 | E1E2

DRAFT 5/25/2023 III-24

E::= T | E + T | E - TT::= F | T * F | T / FF::=(E) | cons |

Syntactic Domains:P:ProgC:CondS:StmtE: Exp T: TerF:Fac

Semantic Domains:: B={true,false} : Fi = N* : CF = St Fi Fi : St = N: N = {...,-2,-1,0,1,2,....}

The most complicated semantic domain is the configuration : CF = St Fi

Fi. The configuration is a 3-tuple where the first component is the state of the

computation, the second is the state of the input file, and the third is the state of the

output file. The three components can be obtained by and

, respectively.

Semantic Function Domain: : Prog Fi Fi : Stmt CF CF : Cond St B : Exp St N : Ter St N : Fac St N

Semantic Equations:a. «E + T » = «E » + «T »b. «E - T » = «E » - «T »

DRAFT 5/25/2023 III-25

c. «T » = «T »d. «T * F» = «T » * «F »e. «T / F » = «T » / «F »f. «F » = «F »g. « (E) » = ( «E »)h. «cons » = element of N corresponding to consi. « » = [ ] j. «E1 < E2» = if «E1 »< «E2 »

then true else falsek. «E1 > E2» = if «E1 »> «E2»

then true else falsel. «E1 = E2» = if «E1 »= «E2 »

then true else falsem. «E1 <= E2» = if «E1 »<= «E2 »

then true else falsen. «E1 >= E2» = if «E1 »>= «E2 »

then true else falseo. «E1 E2» = if «E1 » «E2 »

then true else false

Since «E1 » «E2 »(where can be =,<,>,>=,<=,) evaluates to

B={true,false} the equations j through o can be simplified. For example equation o

could be simplified to:

o.’ «E1 E2» = «E1 » «E2 »

In a like manner the other equations could be simplified, too. Now consider the

assignment statement.

p. « := E» = < [«E » / ] ,, >

The assignment is a fairly sophisticated statement. The state of the machine is modified

by assigning to the variable on the LHS of the assignment operator, the result of

determining the meaning of the expression on the RHS. The expression

«E»invokes the evaluation of the arithmetic expression, which makes use of the

state (i.e.,), and not the input or output files. Clearly, the assignment has no effect

DRAFT 5/25/2023 III-26

on the input or output files, since these components are copied from the input to the

output configuration.

q. «read» = if nil then

<[ hd() / ],tl(), >else

T Where & &

As you will see, the assignment and the read statements have quite a bit in common.

They are the only statements that modify the contents of the state, i.e., replace the value

of a variable. The difference between the statements is twofold. The value to be

assigned to a variable is obtained through the evaluation of an arithmetic instruction – see

equation p. The value to be read into a variable is obtained from the second component

of the configuration – the input file. The second difference is that the assignment

statement affects only the state of the machine. The read statement affects the state of

the machine and the input file. The head of the input file replaces the value of the

referenced program variable and the tail of the input file provides the new state obtained

by the evaluation of the read statement.

r. «write» = if [] then

<,, append(,[])>else

TWhere & &

s. «while C do S od » = if «C »then«while C do S od » «S » else

The while statement’s meaning is conditional. If the condition C evaluates to

false, which it should at some point, the input configuration is passed through (if else

). When the condition evaluates to true, the meaning of the statement(s) S, comprising

DRAFT 5/25/2023 III-27

the body of the loop, are evaluated. The resulting configuration is appended (by ) to

a recursive reference to the while statement that served as the input.

t. « S1;S2 » = «S2» «S1»

Equation (t) reverses the order of the statements so that they can be evaluated in the

proper sequence. Normally we would expect the evaluation to proceed from left to right.

However, the equations are evaluated like lambda expressions – from right to left –

where the evaluation of the rightmost expression provides the arguments for the next-to-

rightmost operation. An interesting fact about equation (t) is that any number of

statements can be reversed by repeated applications. For example, consider three

statements a, b, and c:

« a,b,c »=t « c » «a,b»=t « c » «b» «a»

There is another possible order of applications, but the end result is the same:

« a,b,c »=t « b,c » «a»=t « c » «b» «a»

Equation (u) provides for the initial state of the machine:

u. « P » = («P » <nil >)

This equation sets the output file to nil, copies the input file provided by the user to

second component of the configuration, and – through the use of a lambda function- sets

all variables appearing in program P to undefined. Recall from in our earlier language,

that the variables were null.

Review of the equations.

DRAFT 5/25/2023 III-28

Let’s review the equations just presented. Notice, in particular, the equations for

the read, write, and the while. The read checks to see if the input file is not nil. If not

nil, then the head of the file is assigned to the variable referenced in the read, [

hd()/]. Next the remainder of the list is left for the input file, tl(). The read

statement is much like an assignment in terms of its affect on the state of a computation.

From the equations one can clearly see that the only statements altering the state of the

machine are the read and the assignment statements.

The write appends the meaning of the referenced variable to the end of the output

file, append(,[]). As such it has no impact on the state of the machine.

The while statement is a complicated definition. It first checks the condition

referenced in the while statement:

«while C do S od » = «C »

Since requires only the state, the first component of the configuration is obtained with

. If the condition is satisfied, the meaning of the statements of the while (i.e., S in

while C do S od) is obtained:

« S »

The evaluated body of the while forms a configuration to be used when one reconsiders

the while loop. The totality of the semantic is denoted by:

«while C do S od » « S »

When the condition C is not satisfied, the while statements are bypassed. The bypass is

effected by replacing the while statement reference «while C do S od » with the input

configuration, . Consider the derivation of the following program:

M« read(n); i:=1; while i<=n do i:=i+1 od; write(i) » [3] u

DRAFT 5/25/2023 III-29

Equation u accomplishes several purposes. First, it establishes the initial configuration of the

program. The initial configuration includes a state where all program variables are declared to

be undefined. The initial state is formed with a lambda function: . The second component

of the configuration is the input list - in this case, [3]. The final component is the program

output list, which is initially nil. The program's meaning will result from the meaning of the

program's statements . The meaning of the statements will produce a final configuration, the

third component of which (i.e., , the output file) is the communicated meaning of the

program:

(«read(n); i:=1; while i<=n do i:=i+1 od; write(i) »<nil >) =t

Equation t rearranges statement pairs since the program is evaluated as a lambda function, from

right-to-left, rather than as read from left-to-right:

(« i:=1; while i<=n do i:=i+1 od; write(i) » «read(n) »<nil >) =q

The read statement defines the reference variable with the head of the input list [ hd([3]) /n] and

adjusts the input list to remove the value just read tl([3]):

(«i:=1; while i<=n do i:=i+1 od; write(i) »<[ hd([3]) / n],tl([3]),nil>) =t

Once again, equation t rearranges statement pairs:

(«while i<=n do i:=i+1 od; write(i) » «i:=1 »<[3/n],nil,nil>) =p

The next three steps are followed to initialize program variable i:

DRAFT 5/25/2023 III-30

(«while i<=n do i:=i+1 od; write(i) »<[« 1 » /i],nil,nil>)

=c («while i<=n do i:=i+1 od; write(i) » <[« 1» /i],nil,nil>)

=f («while i<=n do i:=i+1 od; write(i) »<[« 1» /i],nil,nil>)

=h

Once again, equation t rearranges statement pairs:

(«while i<=n do i:=i+1 od; write(i) »<[1 /i],nil,nil>)

=t

Now the loop semantics are entered for the first time:

(« write(i) » «while i<=n do i:=i+1 od»<[1 /i],nil,nil>) =s

The next several steps are followed to evaluate the loop condition:

(«write(i) » if « i<=n »then« while i<=n do i:=i+1 od » «i:=i+1 »<,nil,nil> else <,nil,nil>) =m

The evaluation of « i<=n » itself results in a conditional evaluation shown below as a nested

conditional:

(«write(i) » if ( if «i »<= «n »then true else false )then«while i<=n do i:=i+1 od » «i:=i+1 » <,nil,nil> else <,nil,nil>) =2 c

(«write(i) » if ( if «i »<= «n »then true else false )then« while i<=n do i:=i+1 od » « i:=i+1 »<,nil,nil> else <,nil,nil>) =2 f

DRAFT 5/25/2023 III-31

(«write(i) » if ( if «i » <= «n » then true else false )then«while i<=n do i:=i+1 od » «i:=i+1» <,nil,nil> else <,nil,nil>) =2 i

(«write(i) » if ( if <= then true else false )then«while i<=n do i:=i+1 od » «i:=i+1 » <,nil,nil> else <,nil,nil>) =math

(«write(i) » if true then«while i<=n do i:=i+1 od » «i:=i+1 »<,nil,nil>else <,nil,nil>) =math

Since the loop condition evaluates to true, the next several steps are followed to evaluate the

loop body. The configuration obtained from the loop body will be used when the loop reference

is to be reevaluated:

(«write(i) » «while i<=n do i:=i+1 od » «i:=i+1 »<,nil,nil>) =p

(«write(i) » «while i<=n do i:=i+1 od » < [«i+1» /i] ,nil,nil>) =a («write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>)

=2 c («write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>)

=2 f («write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>)

=i+h

(«write(i) » «while i<=n do i:=i+1 od »< [+1 /i] ,nil,nil>) =math

The loop semantics are reentered:

(«write(i) » «while i<=n do i:=i+1 od»<[2 /i],nil,nil>) =s

The next several steps are followed to evaluate the loop condition:

(«write(i) » if « i<=n »then

DRAFT 5/25/2023 III-32

« while i<=n do i:=i+1 od » «i:=i+1 »<,nil,nil> else <,nil,nil>) =m







loop body:



=2 c

(«write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>) =2

f («write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>) =i+h («write(i) » «while i<=n do i:=i+1 od »< [+1 /i] ,nil,nil>)

=math

Notice now, that the loop semantics are reentered for the third time:

DRAFT 5/25/2023 III-33









loop body:



=2 c («write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>)

=2 f

DRAFT 5/25/2023 III-34

(«write(i) » «while i<=n do i:=i+1 od »< [«i»+ «1» /i] ,nil,nil>)

=i+h

(«write(i) » «while i<=n do i:=i+1 od »< [+1 /i] ,nil,nil>) =math

The loop semantics are evaluated for the final time:







(«write(i) » if false then«while i<=n do i:=i+1 od » «i:=i+1 »<,nil,nil>else <,nil,nil>) =math

Since the condition evaluates to false, the configuration is passed through to the write

statement:

(«write(i) »<,nil,nil>) =r

(<,nil,append(nil, [i]) >) =append

DRAFT 5/25/2023 III-35

(<,nil,[4]>) =extract

[4]

Axiomatic Semantics in a Nutshell.

There are other methods for defining the semantics of a programming language.

One such method is the axiomatic semantic definitions developed by C.A.R. Hoare.

[Hoare] In these definitions, one can prove programs partially correct. That is, one can

use the axiomatic semantic definitions of a target language (combined with axioms from

the problem domain) to prove programs written in the target language to be correct with

respect to the specification of the program.

The following symbols are used, respectively, for logical implication, , and a

statement "it is true that...", ├ .

Assume the problem domain of a program to be integer arithmetic. First one

develops axioms for various arithmetic rules. These axioms describe the problem

domain.

A1: x+y = y+xA2: x*y = y*xA3: (x+y)+z = x+(y+z)A4: (x*y)*z = x*(y*z)A5: x*(y+z) = x*y + x*zA6: y<=x (x-y)+y=xA7: x+0=xA8: x*0=0A9: x*1=x

There is a basic template for proof, often referred to as Hoare's Triples:

P{Q}R where P is a precondition, R is a postcondition, and Q is a program (a sequence of or a single statement)

DRAFT 5/25/2023 III-36

Using the basic form we now have a series of axioms which give the semantics of the

language features. These axioms define the language to be applied to the problem

domain.

AXIOM OF ASSIGNMENT

D0: ├ P0{ := E }P Where is an identifier and E is an expression

Meaning: P0 is obtained from P by substituting E for all occurrences of in P.

AXIOM OF CONSEQUENCE

D1: if ├ P{ Q }R and R S then ├ P{ Q }S if ├ P{ Q }R and S P then ├ S{ Q }R

There is an important difference for implications that reference a precondition as opposed

to those which reference a postcondition. Notice that when the implication involves a

postcondition, the postcondition serves as the implication's antecedent. When the

implication involves a precondition, the precondition serves as the consequent. The

reasoning behind these relationships is significant.

Since an implication S P is equivalent to S P, one cannot logically justify:

if ├ P{ Q }S and R S then ├ P{ Q }R

R is not necessarily true. However one can justify

if ├ P{ Q }R and S P then ├ S{ Q }R

DRAFT 5/25/2023 III-37

since S can be true or false, but in either case P remains true.

AXIOM OF COMPOSITION

D2: if ├ P{ Q1 }R1 and R1{ Q2 }R then ├ P{ Q1; Q2 }R

The axiom of composition is applicable when the postcondition of one statement Q1 also

describes the precondition of another statement Q2. Since pre- and postconditions are

describing states, Q2 must immediately follow Q1.

AXIOM OF ITERATION

D3: if ├ P&B{ S }P then ├ P{ while B do S od }P& B

The axiom of iteration is a safe rule. When the pattern P&B{S}P is matched, it is safe to

say that P is a loop invariant for statements S, in which case a while-loop can be

formed.

Since read's are like assignments, and since a write statement does not alter the

state or otherwise influence a computation, there is no need to define them. Let us now

consider a problem specification, a while program solution, and a proof to show the

program to be correct.

Consider the program specification and program below:

Specification:x and n are input – initial values for i and r are zero. i<n & x=x*(i+1)-r

DRAFT 5/25/2023 III-38

The program specified will multiply two numbers together (i.e., x n ). The

specification is an invariant - since this program multiplies x n, it is important to

indicate how one may compute x at any point in the program.

Program:r:=0;i:=0;while i < n do

r:=r+x;i:=i+1od

Program Proof:1. true x=x*(0+1)-0 Lemma 1

Step 1 is proven below by Lemma. This is an important step because it sets the stage for

the proof. It is based upon the specification, is mathematically true, and provides for an

initial state of an invariant.

2. x=x*(0+1)-0 {r:=0} x=x*(0+1)-r D0

Step 2 is based upon the axiom of assignment wherein the precondition can be obtained

from the postcondition by substituting zero (the RHS of the assignment) for all

occurrences of r (the LHS of the assignment).

3. x=x*(0+1)-r {i:=0} x=x*(i+1)-r D0

Step 3 is also based upon the axiom of assignment wherein the precondition can be

obtained from the postcondition by substituting zero (the RHS of the assignment) for all

occurrences of i (the LHS of the assignment).

4. true {r:=0} x=x*(0+1)-r D1 (1,2)

DRAFT 5/25/2023 III-39

Step 4 employs the axiom of consequence.

5. true {r:=0;i:=0} x=x*(i+1)-r D2 (4,3)

Since step 4's postcondition matches step 3's precondition, the axiom of composition can

be used to concatenate program statements.

6. x=x*(i+1)-r & i < n x=x*((i+1)+1)-(r+x) Lemma 2

Step 6 is proven below by Lemma. This again is an important step because it sets the

stage for establishing the needed while loop. It is indeed based upon the specification, is

mathematically true, and provides for a loop invariant that makes use of the defined

variables.

7. x=x*((i+1)+1)-(r+x) {i:=i+1} x=x*(i+1)-(r+x) D0

Step 7 is based upon the axiom of assignment wherein the precondition can be obtained

from the postcondition by substituting i+1 (the RHS of the assignment) for all

occurrences of i (the LHS of the assignment) in the postcondition.

8. x=x*(i+1)-(r+x) {r:=r+x} x=x*(i+1)-r D0

Likewise, step 8 is based upon the axiom of assignment wherein the precondition can be

obtained from the postcondition by substituting r+x (the RHS of the assignment) for all

occurrences of r (the LHS of the assignment) in the postcondition.

9. x=x*((i+1)+1)-(r+x) {i:=i+1;r:=r+x} x=x*(i+1)-r D2 (7,8)

Since step 7's postcondition matches step 8's precondition, the axiom of composition can

be used to concatenate program statements.

DRAFT 5/25/2023 III-40

10. i<n & x=x*(i+1)-r {i:=i+1;r:=r+x} x=x*(i+1)-r D1 (6,9)

Step 10 employs the axiom of consequence on steps 6 and 9. This results in exactly the

pattern for a while-loop application where P is x=x*(i+1)-r and B is i<n.

11. x=x*(i+1)-r {while i<n do i:=i+1;r:=r+x od} i<n & x=x*(i+1)-r

D3

Step 11 employs the axiom of iteration.

12. true {r:=0;i:=0;while i<n do i:=i+1;r:=r+x od} i<n & x=x*(i+1)-r

D2(5,11)

In a final application of the axiom of composition, step 12 pulls the two program

segments together. Thus the program is proven partially correct.

Partial correctness is satisfied when a program (like the one above) is shown to be

correct with respect to its specification. Notice in the proof above, that the postcondition

derived is the specification of the program. To be totally correct it must also be proven

that the program will terminate or halt. Dijkstra's guarded commands [Dijkstra] were

developed to extend the idea of Hoare's Triples to provide for total correctness.

Below, the arithmetic axioms are employed to prove the lemmas.

Lemma 1: Lemma 2:

x=x x=x*(i+1)-rx=x*1 A9 x=x*(i+1)+x-(r+x) A6x=x*1+0 A7 x=x*((i+1)+1)-(r+x) A5x=x*1+0*x A8x=x*(1+0) A5x=x*(0+1) A1x=x*(0+1)-0 A7

DRAFT 5/25/2023 III-41

REFERENCES[Stoy] Joseph E. Stoy Denotational Semantics: The Scott-Strachey Approach to

Programming Language Theory, The MIT Press, Cambridge, Massachusetts (1977).

[Dijkstra] E.W. Dijkstra, "Guarded Commands," Communications of the ACM, 18:8, 453-457.

[Hoare] C.A.R. Hoare, "An Axiomatic Basis for Computer Programming," Communications of the ACM, 12:10, 576-580.

Questions.

1. Write a program to convert character representation of a number to an integer value

without using string-to-integer conversions available in some languages. to integer

2. Repeat the code written for 1 to convert from character to real.

3. Using the semantic equations derive the meaning of the following:

M« read(n); i:=1; while i<=n do i:=i+1 od; write(i) » [2]

4. Assume D = {t,f}. Give the result of each of the following denotational operations:

a. < l, 4, [1,2,3,4], [6,5]> 2b. hd( dcba ) c. tl(dcba ) d. D2 e. D*

5. Write a while language program segment that is equivalent to the JAVA segment

below:

s=0;

DRAFT 5/25/2023 III-42

for (i=0;i<9;i++)

{s:=s+i}

6. Using the while language definitions (with the additions below) complete the

following definition to implement the repeat-until loop construct:

« repeat S until C » =

which means, repeat the statements of S until the condition C is true. S is executed at least one time since

the test of the condition occurs at the end – not the beginning of the loop body. Additions to the syntax and

semantics of the while language:

C ::= E1 < E2 | E1 > E2 | E1 = E2 | E1 <= E2 | E1 >= E2 | E1E2 | not C

«not C» = if «C»then false else true

7. Using the handout on the syntax and detonational semantics of the while language,

show the steps necessary to determine the formal meaning of

« while i< 2 do i:=i+1; s:=s*i od »<,1,2>

Assume in above, the following values for i and s: i = 1 and s = 1.

DRAFT 5/25/2023 III-43

chapter three - texas tech universityredwood.cs.ttu.edu/~rwatson/classes/cs3361/ch3.doc · web...

Documents