
Acta Informatica 14, 1-20 (1980)

© by Springer-Verlag 1980

Correctness of the Compiling Process Based on Axiomatic Semantics

Bruce Russell*

Dept. of Computer Sciences, SUNY at Stony Brook, Stony Brook, NY 11794, USA

Summary. A language that includes computed gotos and parameterized procedures is defined, and its semantics are given axiomatically. A number of program transformations are described and proved correct. Taken collectively and applied repeatedly, these transformations compile the full language into a low-level subset.

1. Introduction

It has been argued by Gerhart [2] that a theory of partial correctness can be developed based on verification rules. This paper explores the area of program transformations, or more traditionally compilation, using verification rules. A number of transformations are examined and proofs of their correctness are given. These proofs are obtained by showing that from any proof of a program, a proof of the transformed program may be derived. Hence, the transformations preserve all provable properties of the program. We thus show that a theory of partial correctness is capable of establishing the correctness of a compiling process.

The language in which the transformations are specified is a high-level, block-structured programming language that belongs to the ALGOL family. Its features include parallel assignment, parameterized procedures, and computed gotos. Its semantics are specified axiomatically. The language and its semantics are of interest in their own right, since the language includes features that, to our knowledge, have not been given axiomatic semantics in the literature, most notably the computed goto and its interaction with blocks and procedures. The language does not contain recursive procedures, and call by value-result is the only parameter convention used. It is not clear at this time how to extend the results to recursion or to other parameter conventions.

We have examined a set of transformations rather than a single compilation function for a number of reasons. The case for transformations in general has been made by Knuth [5], and a number of recent papers [6, 11] have explored various aspects. Our reason for using transformations, aside from the general ones, is that they neatly factor the proof of a compiler, which can be difficult [7, 8], into a number of smaller, independent proofs of individual transformations. By the finiteness of programs, one can argue that repeated application of the transformations will eventually compile the program.

* Current address: Megadata Corporation, 35 Orville Drive, Bohemia, NY 11716, USA

Axiomatic semantics have been chosen since they are capable of specifying entire languages, including both low-level and high-level features. They do not require sophisticated mathematics to understand, and the proofs are simpler than with alternative semantic models, such as denotational [9, 10] or operational [12] models.

In Sect. 2 we define the language, introduce our notation, and derive a few rules of inference. The if and while statements and their transformation into goto programs are discussed in Sect. 3. Section 4 examines procedures and transformations that either replace calls by the procedure body or save return addresses and transfer to the procedure body. The transformation of assignment statements into a one-address, machine-like language is the subject of Sect. 5. Our conclusions are given in Sect. 6.

2. The Language and its Formal Definition

We are going to define a programming language that belongs to the class of ALGOL-like languages. The basic statement is an assignment statement, and these may be composed into more complex forms using if, while, goto, and parameterized procedure constructs. The language has a block structure with the usual scope rule. The set of basic data objects and operations will be uninterpreted; however, the reader may think of integers, reals, and the other usual data types.

Regarding notation, the letter s, perhaps decorated with subscripts, will stand for any statement, and b for a boolean-valued expression. An identifier will be denoted by x or a, and $\bar{x}$ or $\bar{a}$ is a sequence of identifiers; similarly for $\bar{e}$, where e is an expression. A sequence of declarations will be written $\bar{x}:\bar{t}$, where the x's are identifiers and the t's are types (including procedures and labels).

An abstract syntax for the language may now be given. A program is a statement s, where s is:

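$x := e$
$\bar{x} := \bar{e}$ (parallel assignment)
$\textbf{skip}$
$\textbf{abort}$
$\textbf{if } b \textbf{ then } s$
$\textbf{if } b \textbf{ then } s_1 \textbf{ else } s_2$
$s_1;\ s_2$
$\textbf{while } b \textbf{ do } s$
$\textbf{goto } l$
$l: s$ (labelled statement)
$\textbf{call } f(\bar{a})$
$\textbf{begin } \bar{x}:\bar{t},\ s \textbf{ end}$ (block with declarations)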


The pairs $\bar{x}:\bar{t}$ will be the usual ones, such as $i:\textit{integer}$, as well as $l:\textit{labelconst}$ (a label constant), $x:\textit{label}$ (a label variable), and $f:\textbf{proc}(\bar{x}:\bar{t})\,s$, which declares a procedure named f with parameters $\bar{x}:\bar{t}$ and body s.

The semantics for the language will be specified axiomatically in the tradition of Floyd [1] and Hoare [3]. We will use P, Q, R, and T to stand for assertions, that is, formulae of the first-order predicate calculus with predicates and functions drawn from the data types of the language. As usual, the notation $P\,[s]\,Q$, where s is a program, is read informally as: if P is true of the machine state before execution of s, and s terminates normally, then Q is true of the final machine state. However, this notation needs to be extended to handle the goto and declarations.

This will be done by allowing a correctness formula to have the form:

$$L \vdash P\,[s]\,Q$$

where L is a list of assumptions. Most of these assumptions simply state that some identifier x is of some type t; in this way, properties of the type t may be used to reason about x. The same approach is used for procedure declarations: the parameter list and body of the procedure are made available in the list of assumptions to permit reasoning about a call of the procedure. Finally, for a declaration $l:\textit{labelconst}$, the entry in the list of assumptions for l is $l:P$, where P is an assertion, namely the assertion that must be true whenever control passes to l.

An informal interpretation for correctness formulae may be given as follows. Let the state sequence of an executing program consist of pairs (l, m). The m (for memory) component is just the usual mapping from identifiers to their values. The l component is the label of the next statement to be executed. It may also take on a null value to indicate that the next sequential statement should be executed.

A list of assumptions L is valid with respect to a state (l, m) if and only if:

(i) the values of identifiers in m are of the types assigned to them in L; and
(ii) if $l:P$ is in L, then P is true of m.

Now the notion of a valid $L \vdash P\,[s]\,Q$ can be given as follows. Suppose execution is started in a state that makes L valid and in which, if the initial label is null, P is true of the memory. Then, if execution terminates, it terminates in a state that makes L valid, and if the label component of that final state is null, Q is true. Note that for an empty L this interpretation reduces to the usual one. The formalization of this notion is beyond the scope of this paper.

Finally, some notation for substitutions in assertions is needed. By $P(e/x)$ we mean the result of substituting e for all free occurrences of x in P, with the usual renaming of bound variables in P to make e free for x in P. Similarly, $P(\bar{e}/\bar{x})$ means the simultaneous substitution of the e's for the x's. Clearly, $\bar{e}$ and $\bar{x}$ must have the same length, and the x's must be distinct.

An obvious property of substitution that we will need later is that $Q(\bar{a}/\bar{x})(\bar{x}/\bar{a})$ is syntactically identical to Q, for any Q, if for each $a_i$ either $a_i$ does not occur free in Q or $a_i = x_i$.
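The substitution property above, and the reason for its side condition, can be checked mechanically on small examples. The Python sketch below is only an illustration under our own assumptions: terms are nested tuples with no binders (so bound-variable renaming never arises), and the helper names subst, free_vars, round_trip, and restriction_holds are hypothetical. The post-condition $c = 2 \wedge y = y'$ is the one used in the call example later in this section.

```python
from typing import Dict, Sequence, Set, Tuple, Union

# Illustrative term syntax (an assumption): a symbol is a str, and a compound
# term or quantifier-free assertion is a tuple (operator, arg1, arg2, ...).
Term = Union[str, Tuple]


def subst(term: Term, mapping: Dict[str, Term]) -> Term:
    """Simultaneous substitution: replace each symbol v by mapping[v].
    The toy syntax has no binders, so no renaming is ever required."""
    if isinstance(term, str):
        return mapping.get(term, term)
    op, *args = term
    return (op, *(subst(arg, mapping) for arg in args))


def free_vars(term: Term) -> Set[str]:
    """All symbols occurring in the term (every occurrence is free here)."""
    if isinstance(term, str):
        return {term}
    result: Set[str] = set()
    for arg in term[1:]:
        result |= free_vars(arg)
    return result


def round_trip(q: Term, actuals: Sequence[str], formals: Sequence[str]) -> Term:
    """Compute Q(a/x)(x/a): a_i for x_i first, then x_i for a_i."""
    forward = subst(q, dict(zip(formals, actuals)))
    return subst(forward, dict(zip(actuals, formals)))


def restriction_holds(q: Term, actuals: Sequence[str], formals: Sequence[str]) -> bool:
    """The a_i are distinct, and each a_i is either not free in Q or equal to x_i."""
    fv = free_vars(q)
    distinct = len(set(actuals)) == len(actuals)
    return distinct and all(a not in fv or a == x for a, x in zip(actuals, formals))


formals, actuals = ("x", "c"), ("a", "z")
q_ok = ("and", ("=", "c", "2"), ("=", "y", "y'"))   # c = 2 and y = y'
q_bad = ("=", "c", "z")                              # mentions the actual z
print(restriction_holds(q_ok, actuals, formals), round_trip(q_ok, actuals, formals) == q_ok)
# prints: True True
print(restriction_holds(q_bad, actuals, formals), round_trip(q_bad, actuals, formals))
# prints: False ('=', 'c', 'c')
```

When the condition fails, as for q_bad, the round trip turns $c = z$ into $c = c$, which is exactly the kind of aliasing the call rule's restriction is meant to exclude.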


The semantics of the language are now given and some comments about the language and its definition follow:

Assignment Statements (assign axioms)

$$L \vdash P(e/x)\,[x := e]\,P \qquad\qquad L \vdash P(\bar{e}/\bar{x})\,[\bar{x} := \bar{e}]\,P$$

Skip Statement (skip axiom)

$$L \vdash P\,[\textbf{skip}]\,P$$

Abort Statement (abort axiom)

$$L \vdash P\,[\textbf{abort}]\,\textit{false}$$

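If Statements (if rules)

$$\frac{L \vdash P \wedge b\,[s_1]\,Q, \quad P \wedge \neg b\,[s_2]\,Q}{L \vdash P\,[\textbf{if } b \textbf{ then } s_1 \textbf{ else } s_2]\,Q} \qquad\qquad \frac{L \vdash P \wedge b\,[s]\,Q, \quad P \wedge \neg b \supset Q}{L \vdash P\,[\textbf{if } b \textbf{ then } s]\,Q}$$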

Composition Statement (composition rule)

$$\frac{L \vdash P\,[s_1]\,Q, \quad Q\,[s_2]\,R}{L \vdash P\,[s_1;\ s_2]\,R}$$

While Statement (while rule)

$$\frac{L \vdash P \wedge b\,[s]\,P}{L \vdash P\,[\textbf{while } b \textbf{ do } s]\,P \wedge \neg b}$$

All the previous rules are well known and thus require no explanation. The next rule deals with blocks and declarations; the syntax is:

$$\textbf{begin } \bar{x}:\bar{t},\ s \textbf{ end}$$

where $\bar{x}:\bar{t}$ is a sequence of identifier-type pairs and s is the body of the block. For the moment we assume that the t's are simple types like integer or boolean; the types label and labelconst, and procedures, are dealt with subsequently. A list L of variable-type pairs will be used so that at any point in the proof the types of the variables are known; in this way, properties of the types may be used in the proofs. Hence one aspect of the rule will be to add $\bar{x}:\bar{t}$ to L, the list of assumptions. However, we wish to avoid problems of holes in scopes (and later will want a static binding rule for the global variables of procedures), so we force a renaming of the variables $\bar{x}$ to avoid clashes with the variables that are mentioned in L. The rule is:

Begin End Statement (scope and declaration rule)

$$\frac{L, \bar{x}':\bar{t} \vdash P\,[s(\bar{x}'/\bar{x})]\,Q}{L \vdash P\,[\textbf{begin } \bar{x}:\bar{t},\ s \textbf{ end}]\,Q}$$

where $\bar{x}'$ is a list of distinct variables that are not mentioned in L and do not occur free in P, Q, or s.


A small example should clarify how the previous rule keeps track of free and bound variables and their types. Suppose we wish to prove something of the form:

$$x:\textit{int}, y:\textit{int} \vdash x = x' \wedge y = y'\,[\textbf{begin } y:\textit{int}, z:\textit{int},\ z := x;\ y := x \textbf{ end}]\,x = x' \wedge y = y'$$

That is, this program does not change the values of the globals x and y. Using the scope rule, we remove the begin ... end, renaming y and z so as not to clash with x, y, x', or y'. Choosing w and z as the replacements for y and z, respectively, gives:

$$x:\textit{int}, y:\textit{int}, w:\textit{int}, z:\textit{int} \vdash x = x' \wedge y = y'\,[z := x;\ w := x]\,x = x' \wedge y = y'$$

which follows using the assign axioms and the composition rule.

Turning now to labels and gotos, the basic idea is to allow an assertion P to be associated with each label l. It must then be shown that P is true whenever a $\textbf{goto } l$ is executed or whenever control reaches a statement labeled with l. The assertion must be fixed when the label constant declaration is encountered in the proof. This is accomplished by adding an assumption of the form $l:P$ for each $l:\textit{labelconst}$ to the list of assumptions. Note that a declaration $x:\textit{label}$ is handled as before. For example, the declaration:

$$x:\textit{real},\ l:\textit{labelconst},\ z:\textit{label}$$

gives rise to a list of assumptions as follows:

$$x:\textit{real},\ l:P,\ z:\textit{label}$$

Also, note that the variables mentioned in this list include the free variables of P. Now the rule for the goto is:

Goto Statement (goto axiom)

$$L \vdash P \wedge l = e\,[\textbf{goto } e]\,\textit{false}$$

where $l:P$ is in L.

For the case of the simple goto, the rule simplifies to the usual

$$L \vdash P\,[\textbf{goto } l]\,\textit{false}$$

where $l:P$ is in L.

Finally, a rule for labeled statements is required. It must ensure that the assertion associated with a label is true at the label. More precisely:

Labeled Statement (labeling rule)

$$\frac{L \vdash P\,[s]\,Q}{L \vdash P\,[l:s]\,Q}$$

where $l:P$ is in L.


Before we can give an example to help clarify reasoning about the goto, we need the following standard rules:

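Consequence Rules

$$\frac{L \vdash P \supset Q, \quad Q\,[s]\,R}{L \vdash P\,[s]\,R} \qquad\qquad \frac{L \vdash P\,[s]\,Q, \quad Q \supset R}{L \vdash P\,[s]\,R}$$

Or Rule

$$\frac{L \vdash P_1\,[s]\,Q_1, \quad P_2\,[s]\,Q_2}{L \vdash P_1 \vee P_2\,[s]\,Q_1 \vee Q_2}$$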

The example follows. Consider the program s, where s is:

begin l: labelconst, w: label,
    w := l; x := 3; goto w; x := 2;
    l: skip
end

Let the set of initial assumptions be:

$$L = w:\textit{label},\ l:(x = 3)$$

and call the body of the block s'. Now if:

$$L \vdash \textit{true}\,[s']\,x = 3 \qquad (1)$$

then:

$$\textit{true}\,[s]\,x = 3$$

by the declaration and scope rule. The proof of (1) follows:

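1. $L \vdash \textit{true}\,[w := l]\,w = l$ (assign and consequence)
2. $L \vdash w = l\,[x := 3]\,w = l \wedge x = 3$ (assign and consequence)
3. $L \vdash x = 3 \wedge w = l\,[\textbf{goto } w]\,\textit{false}$ (goto axiom)
4. $L \vdash \textit{false}\,[x := 2]\,\textit{false}$ (assign)
5. $L \vdash x = 3\,[\textbf{skip}]\,x = 3$ (skip axiom)
6. $L \vdash x = 3\,[l: \textbf{skip}]\,x = 3$ (labeling rule)
7. $L \vdash \textit{false}\,[l: \textbf{skip}]\,x = 3$ (consequence)
8. $L \vdash \textit{true}\,[w := l;\ x := 3;\ \textbf{goto } w;\ x := 2;\ l: \textbf{skip}]\,x = 3$ (from 1, 2, 3, 4, and 7 by composition)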

Q.E.D.
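To make the informal state-pair interpretation concrete for this example, the following Python sketch executes the block body s' as a flat instruction list in which the program counter plays the role of the label component l and a dictionary plays the role of the memory m. The label variable w holds a label value, goto w is a computed goto, and the assertion attached to l is checked each time control reaches l. The instruction format and the run function are our own illustrative assumptions; one execution is checked, which is no substitute for the proof above.

```python
from typing import Callable, Dict, List, Tuple

Memory = Dict[str, object]
# Hypothetical flat instruction format: (label or None, kind, operands...).
Instr = Tuple

program: List[Instr] = [
    (None, "assign", "w", lambda m: "l"),   # w := l  (store the label value l in w)
    (None, "assign", "x", lambda m: 3),     # x := 3
    (None, "goto", lambda m: m["w"]),       # goto w  (computed goto through w)
    (None, "assign", "x", lambda m: 2),     # x := 2  (never reached)
    ("l", "skip"),                          # l: skip
]


def run(code: List[Instr], memory: Memory,
        label_assertions: Dict[str, Callable[[Memory], bool]]) -> Memory:
    """Execute the flat program; whenever control reaches a labelled
    instruction (by falling through or by a goto), check that label's assertion."""
    positions = {ins[0]: i for i, ins in enumerate(code) if ins[0] is not None}
    pc = 0
    while pc < len(code):
        label, kind, *ops = code[pc]
        if label is not None:
            assert label_assertions[label](memory), f"assertion for {label} is false"
        if kind == "assign":
            var, expr = ops
            memory[var] = expr(memory)
            pc += 1
        elif kind == "goto":
            pc = positions[ops[0](memory)]   # jump to whatever label the expression yields
        else:  # skip
            pc += 1
    return memory


final = run(program, {}, {"l": lambda m: m["x"] == 3})
print(final["x"])  # prints: 3
```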


The final constructs to consider are procedures. A procedure is declared with the following syntax:

$$f:\textbf{proc}(\bar{x}:\bar{t})\,s$$

where f is the procedure name, $\bar{x}:\bar{t}$ is a list of parameter-type pairs, and s is the body of the procedure. For simplicity, we shall assume a value-result convention for all parameters, and that procedures are not recursive. The basic approach is to put this declaration into the list of assumptions, as for any other declaration. Indeed, this is what we do, with one change:

$$f:\textbf{proc}(\bar{x}':\bar{t})\,s(\bar{x}'/\bar{x})$$

is added to the assumptions. The $\bar{x}'$ are chosen not to clash with any variables mentioned in L or added to L by the set of declarations under consideration, and they now become variables mentioned in the list of assumptions. Hence, there will be no clashes between variables that are used in a block and the formal parameters of any procedure declared in that block. The scope and declaration rule described earlier is now complete.

We turn now to the call statement, which has the form:

$$\textbf{call } f(\bar{a})$$

where f is a procedure name and $\bar{a}$ is a list of variables. To prove a property of a call, we show the property for the body of the procedure. This property will be in terms of the formal parameters of the procedure rather than the actual parameters. If the formal parameters are $\bar{x}$, then a property P shown of the body becomes $P(\bar{a}/\bar{x})$ shown of the call. Informally, the rule says that if f has been declared as $f:\textbf{proc}(\bar{x}:\bar{t})\,s$, the type of each $a_i$ matches the type of the corresponding $x_i$, and some $P\,[s]\,Q$ is true of the body, then $P(\bar{a}/\bar{x})\,[\textbf{call } f(\bar{a})]\,Q(\bar{a}/\bar{x})$ is true of the call. More formally:

Call Statement (call rule)

$$\frac{L \vdash P\,[s]\,Q}{L \vdash P(\bar{a}/\bar{x})\,[\textbf{call } f(\bar{a})]\,Q(\bar{a}/\bar{x})}$$

where $f:\textbf{proc}(\bar{x}:\bar{t})\,s$ is in L, all the $a_i$ are distinct and match in type the corresponding $x_i$, and for each $a_i$, either $a_i$ does not occur free in Q or $a_i = x_i$.

The necessity of the restriction on the call rule will become apparent when the transformations are discussed; however, it may also be justified informally. It is usually possible, by clever renaming, to avoid arbitrary clashes of variable names. If this is not possible in the case of an assertion Q, the post-condition of the procedure body, and some calling parameter a, then the effect of the procedure body on the value of a is important, i.e., a is a global variable subject to change. Now, if a is also matched with some formal parameter x such that $x \neq a$, we have a case of aliasing, where a and x share the same value. As always, aliasing plays havoc with formal systems, and our condition forbids it.


This way of handling procedures may not be the best in terms of making proofs easy; in particular, we require a proof about the body of the procedure for each call. However, our interest is in using the rules to show the correctness of compilation with respect to axiomatic semantics, and for that purpose this choice of rules is justified. Further, a soundness result for the procedure rules with respect to the goto rules will follow as a by-product of the transformation process, which removes procedures from any program.

As an example of the previous rules, consider the following program, which we call s:

begin y: int, f: proc(x: int, y: int) y := x + 1,
    y := 1; call f(y, z)
end

Suppose we wish to prove:

$$y:\textit{int}, z:\textit{int} \vdash y = y'\,[s]\,z = 2 \wedge y = y'$$

that is, that the effect of the block is to leave y unchanged and to give z the value 2. Let:

$$L = y:\textit{int},\ z:\textit{int},\ a:\textit{int},\ f:\textbf{proc}(x:\textit{int}, c:\textit{int})\ c := x + 1$$

which renames the formal parameter y of the procedure to c. Then show:

$$L \vdash y = y'\,[a := 1;\ \textbf{call } f(a, z)]\,z = 2 \wedge y = y'$$

which is the body of the block with y renamed as a. Using the assign axiom, we must show:

$$L \vdash y = y' \wedge a = 1\,[\textbf{call } f(a, z)]\,z = 2 \wedge y = y'$$

This we get from the call rule by showing:

$f:\textbf{proc}(x:\textit{int}, c:\textit{int})\ c := x + 1$ is in L

and

$$L \vdash y = y' \wedge x = 1\,[c := x + 1]\,c = 2 \wedge y = y'$$

which are both obvious.

The above language definition is similar to that of PASCAL, but it extends the formal definition presented in [4], most notably by including the notion of labels and computed gotos. We need these language features since they figure prominently in program transformations. The definitions given here are based on suggestions in Knuth [5] attributed to Hoare. However, we have gone well beyond those suggestions by integrating the notion of labels with block structure and procedures, and by allowing the computed goto.

Finally, to shorten some proofs and to give some simple examples of metaproofs, we present two derived rules of inference, followed by their proofs.


Labeled Skip Rule

$$L \vdash P\,[l: \textbf{skip}]\,P$$

where $l:P$ is in L.

Proof

1. $L \vdash P\,[\textbf{skip}]\,P$ (skip axiom)
2. $L \vdash P\,[l: \textbf{skip}]\,P$ (from 1 by labeling rule)

Q.E.D.

Derived Goto Rule

$$L \vdash P\,[\textbf{goto } l]\,Q$$

where $l:P$ is in L.

Proof

1. $L \vdash P\,[\textbf{goto } l]\,\textit{false}$ (by goto axiom)
2. $L \vdash \textit{false} \supset Q$
3. $L \vdash P\,[\textbf{goto } l]\,Q$ (from 1 and 2 by consequence)

Q.E.D.

Our approach in the following sections will be to give program transformations that transform s into s'. The correctness, or more accurately the preservation of all provable properties, will be established by showing how, from a proof of:

$$L \vdash P\,[s]\,Q$$

a proof of:

$$L \vdash P\,[s']\,Q$$

may be derived.

There are interesting questions as to the relation between the input-output behavior of s and that of s'. Clearly, s' may be less defined than s. Consider the transformation that takes any program to one that loops forever; such a program is undefined on every input. However, since any

$$L \vdash P\,[s']\,Q$$

may be proved of such a program, this transformation is correct by our criterion. Establishing the relation between the input-output behavior of s and s' more precisely would require examining other properties of our proof system, such as definitional completeness. These issues are beyond the scope of the present paper.

We may assume without loss of generality that the last step in the proof of $L \vdash P\,[s]\,Q$ was a use of the rule or axiom associated with s. If not, then either the or rule or a consequence rule was used, and this identical step may be performed in the proof of $L \vdash P\,[s']\,Q$.

We now have the tools we need to study the transformations.


3. The While and If Statements

This section deals with the transformation of while and if statements into the corresponding goto programs. The abstract syntax and semantics of the while loop are given by the rule:

$$\frac{L \vdash P \wedge b\,[s]\,P}{L \vdash P\,[\textbf{while } b \textbf{ do } s]\,P \wedge \neg b}$$

Any while loop may be transformed into the following program:

begin loop: labelconst, finished: labelconst,
    loop: if ¬b then goto finished; s; goto loop;
    finished: skip
end

where loop and finished do not occur in s. Call this transformed program s'. If s contains many while loops, the transformation may be applied repeatedly to remove all of them. To justify the transformation, we must show that from

$$L \vdash P\,[\textbf{while } b \textbf{ do } s]\,P \wedge \neg b \qquad (1)$$

a proof of

$$L \vdash P\,[s']\,P \wedge \neg b \qquad (2)$$

may be given. Since the only way to establish (1) is with the while rule, we may assume a proof of:

$$L \vdash P \wedge b\,[s]\,P \qquad (3)$$

The proof of (2) will require the choice of assertions for the label constants loop and finished. These will clearly be P and $P \wedge \neg b$, respectively. Let

$$L' = L,\ \textit{loop}:P,\ \textit{finished}:P \wedge \neg b$$

Also, assume without loss of generality that loop and finished do not occur free in P, b or L. We are now ready to prove:

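Theorem 3.1. If $L \vdash P \wedge b\,[s]\,P$, then

$$L' \vdash P\,[\textit{loop}: \textbf{if } \neg b \textbf{ then goto } \textit{finished};\ s;\ \textbf{goto } \textit{loop};\ \textit{finished}: \textbf{skip}]\,P \wedge \neg b$$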


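Proof.

1. $L' \vdash P \wedge \neg b\,[\textbf{goto } \textit{finished}]\,P \wedge b$ (derived goto rule)
2. $L' \vdash P \wedge \neg(\neg b) \supset P \wedge b$
3. $L' \vdash P\,[\textbf{if } \neg b \textbf{ then goto } \textit{finished}]\,P \wedge b$ (by if rule from 1 and 2)
4. $L' \vdash P\,[\textit{loop}: \textbf{if } \neg b \textbf{ then goto } \textit{finished}]\,P \wedge b$ (by labeling rule from 3)
5. $L' \vdash P \wedge b\,[s]\,P$ (by the assumptions of the theorem, since L' includes L)
6. $L' \vdash P\,[\textbf{goto } \textit{loop}]\,P \wedge \neg b$ (derived goto rule)
7. $L' \vdash P \wedge \neg b\,[\textit{finished}: \textbf{skip}]\,P \wedge \neg b$ (derived labeled skip rule)
8. $L' \vdash P\,[\textit{loop}: \textbf{if } \neg b \textbf{ then goto } \textit{finished};\ s;\ \textbf{goto } \textit{loop};\ \textit{finished}: \textbf{skip}]\,P \wedge \neg b$ (by composition from 4, 5, 6, and 7)
9. $L \vdash P\,[\textbf{begin } \textit{loop}:\textit{labelconst},\ \textit{finished}:\textit{labelconst},\ \textit{loop}: \textbf{if } \neg b \textbf{ then goto } \textit{finished};\ s;\ \textbf{goto } \textit{loop};\ \textit{finished}: \textbf{skip} \textbf{ end}]\,P \wedge \neg b$ (by declaration and scope rule)

Q.E.D.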
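The while transformation is easy to mechanize. The following Python sketch is an illustration under our own assumptions (the tuple formats, the fresh-label counter, and the function names compile_stmt, run_flat, and run_structured are hypothetical): compile_stmt rewrites each while loop into the labelled form above, drawing fresh names so that loop and finished never occur in s, and applies itself recursively so that nested loops are removed as well. The compiled code is then run against a direct interpreter on a program with nested loops. Agreement on one run is evidence rather than proof; Theorem 3.1 is what guarantees that provable properties are preserved.

```python
import itertools
from typing import Dict, List, Tuple

Memory = Dict[str, int]
_fresh = itertools.count()   # supplies label names that cannot occur in s


def compile_stmt(stmt) -> List[Tuple]:
    """Translate structured statements into flat goto code.
    Statements (an illustrative format): ("assign", var, expr), ("seq", [stmts]),
    ("while", cond, body).  Instructions: (label, "assign", var, expr),
    (label, "if_goto", pred, target), (label, "goto", target), (label, "skip")."""
    kind = stmt[0]
    if kind == "assign":
        return [(None, "assign", stmt[1], stmt[2])]
    if kind == "seq":
        return [ins for s in stmt[1] for ins in compile_stmt(s)]
    if kind == "while":
        _, cond, body = stmt
        n = next(_fresh)
        loop, finished = f"loop{n}", f"finished{n}"
        # loop: if not b then goto finished; s; goto loop; finished: skip
        return ([(loop, "if_goto", lambda m: not cond(m), finished)]
                + compile_stmt(body)
                + [(None, "goto", loop), (finished, "skip")])
    raise ValueError(f"unknown statement {kind!r}")


def run_flat(code: List[Tuple], memory: Memory) -> Memory:
    positions = {ins[0]: i for i, ins in enumerate(code) if ins[0] is not None}
    pc = 0
    while pc < len(code):
        _, kind, *ops = code[pc]
        if kind == "assign":
            memory[ops[0]] = ops[1](memory)
            pc += 1
        elif kind == "if_goto":
            pc = positions[ops[1]] if ops[0](memory) else pc + 1
        elif kind == "goto":
            pc = positions[ops[0]]
        else:  # skip
            pc += 1
    return memory


def run_structured(stmt, memory: Memory) -> Memory:
    """Reference interpreter for the original while program."""
    kind = stmt[0]
    if kind == "assign":
        memory[stmt[1]] = stmt[2](memory)
    elif kind == "seq":
        for s in stmt[1]:
            run_structured(s, memory)
    elif kind == "while":
        while stmt[1](memory):
            run_structured(stmt[2], memory)
    return memory


# Nested loops: total counts 1 + 2 + 3 inner iterations.
prog = ("seq", [
    ("assign", "i", lambda m: 0),
    ("assign", "total", lambda m: 0),
    ("while", lambda m: m["i"] < 3, ("seq", [
        ("assign", "i", lambda m: m["i"] + 1),
        ("assign", "j", lambda m: 0),
        ("while", lambda m: m["j"] < m["i"], ("seq", [
            ("assign", "j", lambda m: m["j"] + 1),
            ("assign", "total", lambda m: m["total"] + 1),
        ])),
    ])),
])
print(run_flat(compile_stmt(prog), {}) == run_structured(prog, {}))  # prints: True
```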

The abstract syntax and semantics of the if statement are given by the rule:

$$\frac{L \vdash P \wedge b\,[s_1]\,Q, \quad P \wedge \neg b\,[s_2]\,Q}{L \vdash P\,[\textbf{if } b \textbf{ then } s_1 \textbf{ else } s_2]\,Q}$$

and the transformation is:

begin startelse: labelconst, endif: labelconst,
    if ¬b then goto startelse; s_1; goto endif;
    startelse: s_2;
    endif: skip
end

Call this program s'. In a manner similar to the while statement, this transformation may be justified by proving:

Theorem 3.2. If $L \vdash P \wedge b\,[s_1]\,Q$ and $L \vdash P \wedge \neg b\,[s_2]\,Q$, then $L \vdash P\,[s']\,Q$.

The proof is practically identical to that for the while statement and is left to the reader.
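For readers who want the omitted details, one way the proof of Theorem 3.2 could go, following Theorem 3.1 step by step, is sketched below. The label assertions are our own choice rather than the paper's: startelse is given $P \wedge \neg b$ and endif is given Q, so that $L' = L,\ \textit{startelse}:P \wedge \neg b,\ \textit{endif}:Q$, and we assume that startelse and endif do not occur free in P, Q, b, or L.

```latex
\begin{array}{lll}
1.  & L' \vdash P \wedge \neg b\,[\textbf{goto } \textit{startelse}]\,P \wedge b & \text{derived goto rule}\\
2.  & L' \vdash P \wedge \neg(\neg b) \supset P \wedge b & \\
3.  & L' \vdash P\,[\textbf{if } \neg b \textbf{ then goto } \textit{startelse}]\,P \wedge b & \text{if rule, from 1 and 2}\\
4.  & L' \vdash P \wedge b\,[s_1]\,Q & \text{hypothesis, since } L' \text{ includes } L\\
5.  & L' \vdash Q\,[\textbf{goto } \textit{endif}]\,P \wedge \neg b & \text{derived goto rule}\\
6.  & L' \vdash P \wedge \neg b\,[s_2]\,Q & \text{hypothesis}\\
7.  & L' \vdash P \wedge \neg b\,[\textit{startelse}: s_2]\,Q & \text{labeling rule, from 6}\\
8.  & L' \vdash Q\,[\textit{endif}: \textbf{skip}]\,Q & \text{derived labeled skip rule}\\
9.  & L' \vdash P\,[\textbf{if } \neg b \textbf{ then goto } \textit{startelse};\ s_1;\ \textbf{goto } \textit{endif};\ \textit{startelse}: s_2;\ \textit{endif}: \textbf{skip}]\,Q & \text{composition, from 3, 4, 5, 7, 8}\\
10. & L \vdash P\,[s']\,Q & \text{declaration and scope rule, from 9}
\end{array}
```

Compared with Theorem 3.1, the only new ingredients are the second hypothesis for $s_2$ and the use of the endif label to jump over the else branch.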


4. The Procedure Declaration and Call Statement

This section will examine two transformations of procedures and calls. The first replaces a call by a copy of the procedure body and the second involves the use of a return label.

The two rules related to procedures were the scope and declaration rule and the call rule. The scope and declaration rule caused the addition of

$$f:\textbf{proc}(\bar{x}':\bar{t})\,s(\bar{x}'/\bar{x})$$

to the list of assumptions, choosing the $\bar{x}'$ not to clash with any variables in the assumptions or in the assertions to be proved of the block. The call rule was

$$\frac{L \vdash P\,[s]\,Q}{L \vdash P(\bar{a}/\bar{x})\,[\textbf{call } f(\bar{a})]\,Q(\bar{a}/\bar{x})}$$

where $f:\textbf{proc}(\bar{x}:\bar{t})\,s$ is in L, all the $a_i$ are distinct and match in type the corresponding $x_i$, and for each $a_i$ either $a_i$ does not occur free in Q or else $a_i = x_i$.

The first transformation we examine is the substitution of the procedure body for a call. Since declarations can obviously be reordered, consider the statement

$$\textbf{begin } \bar{y}:\bar{t},\ f:\textbf{proc}(\bar{x}:\bar{t})\,s_1,\ s_2 \textbf{ end}$$

and let $s_2'$ be derived from $s_2$ by replacing some (but not necessarily all) occurrences of $\textbf{call } f(\bar{a})$ by

$$\bar{x} := \bar{a};\ s_1;\ \bar{a} := \bar{x}$$

This substitution must not occur at a place where any of the $\bar{x}$ or the free variables of $s_1$ are bound (renaming bound variables in advance permits the substitution to be made for any call). The correctness of this transformation is established by showing that from a proof of

$$L \vdash R\,[\textbf{begin } \bar{y}:\bar{t},\ f:\textbf{proc}(\bar{x}:\bar{t})\,s_1,\ s_2 \textbf{ end}]\,S \qquad (1)$$

a proof of

$$L \vdash R\,[\textbf{begin } \bar{y}:\bar{t},\ f:\textbf{proc}(\bar{x}:\bar{t})\,s_1,\ s_2' \textbf{ end}]\,S \qquad (2)$$

may be derived. The proof of (1) will proceed by using the declaration and scope rule, followed by a proof about $s_2$. These same steps may be used in the proof of (2). The proofs will differ only when we come to a call of the procedure in $s_2$ that has been replaced in $s_2'$. If we let the call be $\textbf{call } f(\bar{a})$, then the step in proof (1) will be an application of the call rule, namely:

$$\frac{L \vdash P\,[s]\,Q}{L \vdash P(\bar{a}/\bar{x})\,[\textbf{call } f(\bar{a})]\,Q(\bar{a}/\bar{x})}$$

where $f:\textbf{proc}(\bar{x}:\bar{t})\,s$ is in L.

In the proof of (2), by contrast, we need to prove:

$$L \vdash P(\bar{a}/\bar{x})\,[\bar{x} := \bar{a};\ s;\ \bar{a} := \bar{x}]\,Q(\bar{a}/\bar{x})$$

Hence, the theorem to be proved is:

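Theorem 4.1. If $L \vdash P\,[s]\,Q$ and $f:\textbf{proc}(\bar{x}:\bar{t})\,s$ is in L, then $L \vdash P(\bar{a}/\bar{x})\,[\bar{x} := \bar{a};\ s;\ \bar{a} := \bar{x}]\,Q(\bar{a}/\bar{x})$.

Proof

1. $L \vdash P(\bar{a}/\bar{x})\,[\bar{x} := \bar{a}]\,P$ (assignment axiom)
2. $L \vdash P\,[s]\,Q$ (assumption)
3. $L \vdash Q(\bar{a}/\bar{x})(\bar{x}/\bar{a})\,[\bar{a} := \bar{x}]\,Q(\bar{a}/\bar{x})$ (assignment axiom)
4. $Q(\bar{a}/\bar{x})(\bar{x}/\bar{a})$ is Q (by the restrictions placed on the $\bar{a}$ and the free variables of Q)
5. $L \vdash Q\,[\bar{a} := \bar{x}]\,Q(\bar{a}/\bar{x})$ (from 3 and 4)
6. $L \vdash P(\bar{a}/\bar{x})\,[\bar{x} := \bar{a};\ s;\ \bar{a} := \bar{x}]\,Q(\bar{a}/\bar{x})$ (by composition from 1, 2, and 5)

Q.E.D.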
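Operationally, the replacement justified by Theorem 4.1 is copy-in/copy-out: $\bar{x} := \bar{a}$ runs first, then the body, then $\bar{a} := \bar{x}$. The Python sketch below is a hypothetical illustration, not the paper's construction. It applies the replacement to the call example of Sect. 2, modelling every statement as a parallel assignment whose right-hand sides are all evaluated before any variable is updated, which matches the simultaneous substitution in the assign axiom for $\bar{x} := \bar{e}$. The names par_copy, execute, and inline_call are our own.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Memory = Dict[str, int]
# A statement is a parallel assignment: (targets, right-hand-side functions).
Stmt = Tuple[Tuple[str, ...], Tuple[Callable[[Memory], int], ...]]


def par_copy(targets: Sequence[str], sources: Sequence[str]) -> Stmt:
    """The parallel assignment targets := sources (variables to variables)."""
    return tuple(targets), tuple((lambda m, s=s: m[s]) for s in sources)


def execute(program: List[Stmt], memory: Memory) -> Memory:
    for targets, exprs in program:
        values = [e(memory) for e in exprs]   # evaluate every right-hand side first
        for t, v in zip(targets, values):
            memory[t] = v
    return memory


def inline_call(formals: Sequence[str], body: List[Stmt],
                actuals: Sequence[str]) -> List[Stmt]:
    """Replace call f(actuals) by  formals := actuals; body; actuals := formals."""
    if len(set(actuals)) != len(actuals):
        raise ValueError("actual parameters must be distinct")
    return [par_copy(formals, actuals), *body, par_copy(actuals, formals)]


# f: proc(x: int, c: int) c := x + 1, as renamed in the call example.
f_formals = ("x", "c")
f_body: List[Stmt] = [(("c",), (lambda m: m["x"] + 1,))]

# Block body with the local y renamed to a:  a := 1; call f(a, z)
program = [(("a",), (lambda m: 1,))] + inline_call(f_formals, f_body, ("a", "z"))
final = execute(program, {"y": 7, "z": 0, "x": 0, "c": 0})
print(final["z"], final["y"])  # prints: 2 7
```

As the proof in Sect. 2 predicts, z ends up as 2 and the global y keeps its initial value.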

Note that step 4 needs the restrictions we placed on the call rule.

The next transformation to consider is the replacement of calls by transfers to the body of the procedure. When the body of the procedure has been executed, a transfer to a return label takes place. Each transformed call will be labelled with a label $l_i$, and the return label will be saved in the label variable return-f. The label start-f marks the start of the procedure body, and the label fin ends the block.

The actual transformation proceeds in two steps. The first replaces $\bar{x}$ by $\bar{x}'$ to give:

$$\textbf{begin } \bar{y}:\bar{t},\ f:\textbf{proc}(\bar{x}':\bar{t})\,s(\bar{x}'/\bar{x}),\ s_2 \textbf{ end}$$

where the $\bar{x}'$ do not occur free in $s_2$ or clash with $\bar{y}$. There are now no clashes between the formal parameters and the free variables in $s_2$. The next step replaces the transformed block:

$$\textbf{begin } \bar{y}:\bar{t},\ f:\textbf{proc}(\bar{x}:\bar{t})\,s,\ s_2 \textbf{ end}$$


with

begin ȳ: t̄, x̄: t̄, start-f: labelconst, return-f: label,
    l_1: labelconst, ..., l_n: labelconst, fin: labelconst,
    s_2'; goto fin;
    start-f: s; goto return-f;
    fin: skip
end

where $s_2'$ is $s_2$ with the i-th $\textbf{call } f(\bar{a})$ replaced by:

return-f := l_i; x̄ := ā; goto start-f;
l_i: ā := x̄

Call this transformed program s'. Note that all the calls in $s_2$ have been replaced; hence, the procedure f may be discarded. Also note that the transformation applies only if none of the calls are nested inside blocks (other than blocks created by transformations). Since such inner blocks may be eliminated, the transformation may be applied quite generally.
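The following Python sketch illustrates this second transformation under our own assumptions: the instruction format, the ("call", actuals) markers, and the names remove_calls, assign, copy_vars, goto, and run are hypothetical; start-f labels a skip placed just before the body, which has the same effect as labelling the body's first statement; and the block declarations are omitted. Two calls share a single copy of the body, and the computed goto through return-f sends control back to $l_1$ or $l_2$.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Memory = Dict[str, object]
# Hypothetical instruction format: (label or None, kind, operands).
Instr = Tuple[Optional[str], str, tuple]


def assign(targets: Sequence[str], exprs: Sequence[Callable], label=None) -> Instr:
    """Parallel assignment targets := exprs."""
    return (label, "assign", (tuple(targets), tuple(exprs)))


def copy_vars(targets: Sequence[str], sources: Sequence[str], label=None) -> Instr:
    return assign(targets, [lambda m, s=s: m[s] for s in sources], label)


def goto(target: Callable[[Memory], str]) -> Instr:
    """Computed goto: target maps the memory to a label."""
    return (None, "goto", (target,))


def run(code: List[Instr], memory: Memory) -> Memory:
    positions = {lab: i for i, (lab, _, _) in enumerate(code) if lab is not None}
    pc = 0
    while pc < len(code):
        _, kind, ops = code[pc]
        if kind == "assign":
            targets, exprs = ops
            values = [e(memory) for e in exprs]   # parallel: evaluate all first
            memory.update(zip(targets, values))
            pc += 1
        elif kind == "goto":
            pc = positions[ops[0](memory)]
        else:  # skip
            pc += 1
    return memory


def remove_calls(main: list, formals: Sequence[str], body: List[Instr]) -> List[Instr]:
    """The i-th ("call", actuals) marker becomes
    return-f := l_i; formals := actuals; goto start-f; l_i: actuals := formals,
    followed by  goto fin; start-f: skip; body; goto return-f; fin: skip."""
    code: List[Instr] = []
    calls = 0
    for item in main:
        if item[0] == "call":
            calls += 1
            ret, actuals = f"l{calls}", item[1]
            code += [assign(["return-f"], [lambda m, r=ret: r]),  # save return label
                     copy_vars(formals, actuals),                   # copy in
                     goto(lambda m: "start-f"),
                     copy_vars(actuals, formals, label=ret)]        # l_i: copy out
        else:
            code.append(item)
    code += [goto(lambda m: "fin"), ("start-f", "skip", ())]
    code += body
    code += [goto(lambda m: m["return-f"]), ("fin", "skip", ())]
    return code


# f: proc(x, c) c := x + 1;  main program: a := 1; call f(a, z); call f(z, w)
body = [assign(["c"], [lambda m: m["x"] + 1])]
main = [assign(["a"], [lambda m: 1]), ("call", ("a", "z")), ("call", ("z", "w"))]
final = run(remove_calls(main, ("x", "c"), body), {"z": 0, "w": 0})
print(final["z"], final["w"])  # prints: 2 3
```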

The correctness of the first transformation, the renaming of the formal parameters, is easily shown. Simply use the declaration and scope rule. To establish the correctness of the second transformation, we must show that from a proof of:

$$L \vdash R\,[\textbf{begin } \bar{y}:\bar{t},\ f:\textbf{proc}(\bar{x}:\bar{t})\,s,\ s_2 \textbf{ end}]\,T \qquad (1)$$

a proof of

$$L \vdash R\,[s']\,T \qquad (2)$$

may be derived. Now, in the proof of (1), when the i-th $\textbf{call } f(\bar{a})$ was encountered, the call rule was used, namely:

$$\frac{L \vdash P_i\,[s]\,Q_i}{L \vdash P_i(\bar{a}/\bar{x})\,[\textbf{call } f(\bar{a})]\,Q_i(\bar{a}/\bar{x})}$$

where $f:\textbf{proc}(\bar{x}:\bar{t})\,s$ is in L.

Hence, associated with each call there is a pre-condition/post-condition pair $P_i$, $Q_i$ and a proof of $P_i\,[s]\,Q_i$; thus the proof of (2) may assume the proofs of $P_1\,[s]\,Q_1, \ldots, P_n\,[s]\,Q_n$. Now s' is a block that introduces a number of label constants. Assertions must be associated with each of these as follows.

Let $P = (P_1 \wedge \textit{return-f} = l_1) \vee \cdots \vee (P_n \wedge \textit{return-f} = l_n)$, which says that if the i-th call is executed, then $P_i$ is true. Clearly, P is the assertion true at the start of the procedure body; hence, the assumption about start-f is:

start-f: P

Also associated with each $l_i$ is the assertion that would be true upon return from the i-th call, namely $Q_i$. Hence, the assumptions

$$l_1:Q_1,\ \ldots,\ l_n:Q_n$$

are required. Finally, at the very end of the block, T is true so

fin: T

is also required. Let L' be L with all the above assumptions added, as well as $\bar{y}:\bar{t}$ and $\bar{x}:\bar{t}$. Also let

$$Q = (Q_1 \wedge \textit{return-f} = l_1) \vee \cdots \vee (Q_n \wedge \textit{return-f} = l_n)$$

which describes the state when returning from the procedure body. In the following proofs, the theorem will be given first, assuming the lemmas, and then the lemmas will be proved. To show the transformation correct, we need:

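Theorem 4.2. If $L \vdash P_1\,[s]\,Q_1, \ldots, P_n\,[s]\,Q_n$ and $L \vdash R\,[s_2]\,T$, then

$$L' \vdash R\,[s_2';\ \textbf{goto } \textit{fin};\ \textit{start-f}: s;\ \textbf{goto } \textit{return-f};\ \textit{fin}: \textbf{skip}]\,T$$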

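Proof.

1. $L' \vdash R\,[s_2']\,T$ (Lemma 4.3)
2. $L' \vdash T\,[\textbf{goto } \textit{fin}]\,P$ (by derived goto rule)
3. $L' \vdash P\,[s]\,Q$ (Lemma 4.4)
4. $L' \vdash P\,[\textit{start-f}: s]\,Q$ (by labelling rule from 3)
5. $L' \vdash Q\,[\textbf{goto } \textit{return-f}]\,T$ (Lemma 4.5)
6. $L' \vdash T\,[\textit{fin}: \textbf{skip}]\,T$ (by labelled skip rule)
7. $L' \vdash R\,[s_2';\ \textbf{goto } \textit{fin};\ \textit{start-f}: s;\ \textbf{goto } \textit{return-f};\ \textit{fin}: \textbf{skip}]\,T$ (by composition from 1, 2, 4, 5, and 6)

Q.E.D.

The first lemma we need is:

Lemma 4.3. If $L \vdash P_1\,[s]\,Q_1, \ldots, P_n\,[s]\,Q_n$ and $L \vdash R\,[s_2]\,T$, then $L' \vdash R\,[s_2']\,T$.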


Proof. The proof of $L' \vdash R\,[s_2']\,T$ proceeds exactly as the proof of $L \vdash R\,[s_2]\,T$ except where a $\textbf{call } f(\bar{a})$ occurred in $s_2$. In that case, the call rule was used to give $P_i(\bar{a}/\bar{x})\,[\textbf{call } f(\bar{a})]\,Q_i(\bar{a}/\bar{x})$ for the i-th call, and so we must show the same for the transformed program, namely:

$$L' \vdash P_i(\bar{a}/\bar{x})\,[\textit{return-f} := l_i;\ \bar{x} := \bar{a};\ \textbf{goto } \textit{start-f};\ l_i: \bar{a} := \bar{x}]\,Q_i(\bar{a}/\bar{x})$$

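1. $L' \vdash P_i(\bar{a}/\bar{x}) \wedge l_i = l_i\,[\textit{return-f} := l_i]\,P_i(\bar{a}/\bar{x}) \wedge \textit{return-f} = l_i$ (assignment axiom)
2. $L' \vdash P_i(\bar{a}/\bar{x}) \supset P_i(\bar{a}/\bar{x}) \wedge l_i = l_i$
3. $L' \vdash P_i(\bar{a}/\bar{x})\,[\textit{return-f} := l_i]\,P_i(\bar{a}/\bar{x}) \wedge \textit{return-f} = l_i$ (by consequence from 1 and 2)
4. $L' \vdash P_i(\bar{a}/\bar{x}) \wedge \textit{return-f} = l_i\,[\bar{x} := \bar{a}]\,P_i \wedge \textit{return-f} = l_i$ (assignment axiom)
5. $L' \vdash P_i \wedge \textit{return-f} = l_i \supset P$ (by definition of P)
6. $L' \vdash P\,[\textbf{goto } \textit{start-f}]\,\textit{false}$ (goto axiom)
7. $L' \vdash P_i \wedge \textit{return-f} = l_i\,[\textbf{goto } \textit{start-f}]\,\textit{false}$ (by consequence from 5 and 6)
8. $L' \vdash P_i \wedge \textit{return-f} = l_i\,[\textbf{goto } \textit{start-f}]\,Q_i$ (by consequence from 7)
9. $L' \vdash Q_i(\bar{a}/\bar{x})(\bar{x}/\bar{a})\,[\bar{a} := \bar{x}]\,Q_i(\bar{a}/\bar{x})$ (assignment axiom)
10. $Q_i(\bar{a}/\bar{x})(\bar{x}/\bar{a})$ is $Q_i$ (by the restriction placed on $\bar{a}$ and the free variables of $Q_i$)
11. $L' \vdash Q_i\,[\bar{a} := \bar{x}]\,Q_i(\bar{a}/\bar{x})$ (from 10 and 9)
12. $L' \vdash Q_i\,[l_i: \bar{a} := \bar{x}]\,Q_i(\bar{a}/\bar{x})$ (by labelling rule from 11)
13. $L' \vdash P_i(\bar{a}/\bar{x})\,[\textit{return-f} := l_i;\ \bar{x} := \bar{a};\ \textbf{goto } \textit{start-f};\ l_i: \bar{a} := \bar{x}]\,Q_i(\bar{a}/\bar{x})$ (by composition from 3, 4, 8, and 12)

Q.E.D.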

Note again the use of the restriction on the call rule in step 10. The next lemma is:

Lemma 4.4. If $L \vdash P_1\,[s]\,Q_1, \ldots, P_n\,[s]\,Q_n$ and $L \vdash R\,[s_2]\,T$, then $L' \vdash P\,[s]\,Q$.

Proof. Note that s does not use the variable return-f; hence, a proof of $P_i \wedge \textit{return-f} = l_i\,[s]\,Q_i \wedge \textit{return-f} = l_i$ proceeds exactly as the proof of $P_i\,[s]\,Q_i$. Then, using the or rule, we get $P\,[s]\,Q$. Q.E.D.


The final lemma needed is:

Lemma 4.5. If $L \vdash P_1\,[s]\,Q_1, \ldots, P_n\,[s]\,Q_n$ and $L \vdash R\,[s_2]\,T$, then $L' \vdash Q\,[\textbf{goto } \textit{return-f}]\,T$.

Proof. For each $l_i$ and $Q_i$ we have $l_i:Q_i$, so by the goto axiom we have, for each i, $L' \vdash Q_i \wedge \textit{return-f} = l_i\,[\textbf{goto } \textit{return-f}]\,\textit{false}$. Now, by the or rule, we have $L' \vdash Q\,[\textbf{goto } \textit{return-f}]\,\textit{false} \vee \cdots \vee \textit{false}$, and by consequence $L' \vdash Q\,[\textbf{goto } \textit{return-f}]\,T$. Q.E.D.

Thus, the proof of the second procedure transformation is complete.

5. The Assignment Statements

The final transformations we will explore involve the assignment statements. Their abstract syntax and semantics are given by the axiom schema:

$$P(e/x)\,[x := e]\,P$$

The transformations we examine will, through successive applications, change any assignment statement into a machine-language equivalent. A machine language for a one-address machine may be described as a subset of our existing language. Consider the variable 'a' to be special in that it does not occur in any program except as the result of a transformation; we think of 'a' as the accumulator. Further, let 'c' be any constant and 'op' any operation. The following assignment statements may then be considered as one-address machine instructions:

$a := c$ or $a := x$ (load)
$a := a \mathbin{op} c$ or $a := a \mathbin{op} x$ (operation)
$x := a$ (store)

The first transformations to be considered are:

$$x := e \;\longrightarrow\; a := e;\ x := a \qquad (1)$$

and

$$a := e_1 \mathbin{op} e_2 \;\longrightarrow\; \textbf{begin } x:t,\ a := e_2;\ x := a;\ a := e_1;\ a := a \mathbin{op} x \textbf{ end} \qquad (2)$$

where t is the same as the type of $e_2$ and x does not occur free in $e_1$ or $e_2$.

A transformation that retains the same order of evaluation of the original is:

$$a := e_1 \mathbin{op} e_2 \;\longrightarrow\; \textbf{begin } x:t_1,\ y:t_2,\ a := e_1;\ x := a;\ a := e_2;\ y := a;\ a := x;\ a := a \mathbin{op} y \textbf{ end} \qquad (3)$$


and if the particular op in question is commutative, then:

$$a := e_1 \mathbin{op} e_2 \;\longrightarrow\; \textbf{begin } x:t,\ a := e_1;\ x := a;\ a := e_2;\ a := a \mathbin{op} x \textbf{ end} \qquad (4)$$

Clearly, a single application of (1) followed by some number of applications of (2), (3) or (4) will bring all the assignments into the form of one-address machine instructions. The removal of the blocks introduced by the transformation will be discussed at the end of this section.
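The claim that repeated application of (1) and (2) yields one-address code can be illustrated with a small compiler. In the Python sketch below, which is a hypothetical illustration rather than the paper's procedure, to_accumulator applies (2) recursively and compile_assign applies (1); the tuples ("load", c or x), ("op", op, c or x), and ("store", x) stand for the three instruction forms listed earlier in this section. As assumptions of the sketch, data are integers, only +, -, and * are supported, and the block-local x of each application of (2) becomes a fresh global temporary t0, t1, ... instead of a declared block variable.

```python
import itertools
import operator
from typing import Dict, List, Tuple, Union

# Expressions (an illustrative assumption): int constant, variable name, or (op, e1, e2).
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
_fresh = itertools.count()


def to_accumulator(e: Expr) -> List[tuple]:
    """One-address code leaving the value of e in the accumulator a."""
    if isinstance(e, (int, str)):
        return [("load", e)]                          # a := c  or  a := x
    op, e1, e2 = e
    tmp = f"t{next(_fresh)}"                           # the local x of (2), made fresh
    return (to_accumulator(e2) + [("store", tmp)]      # a := e2; x := a
            + to_accumulator(e1) + [("op", op, tmp)])  # a := e1; a := a op x


def compile_assign(x: str, e: Expr) -> List[tuple]:
    """Transformation (1): x := e becomes a := e; x := a."""
    return to_accumulator(e) + [("store", x)]


def operand(v, memory: Dict[str, int]) -> int:
    return memory[v] if isinstance(v, str) else v


def run_machine(code: List[tuple], memory: Dict[str, int]) -> Dict[str, int]:
    acc = 0  # the accumulator a
    for instr in code:
        if instr[0] == "load":
            acc = operand(instr[1], memory)
        elif instr[0] == "op":
            acc = OPS[instr[1]](acc, operand(instr[2], memory))
        else:  # store
            memory[instr[1]] = acc
    return memory


def evaluate(e: Expr, memory: Dict[str, int]) -> int:
    if isinstance(e, (int, str)):
        return operand(e, memory)
    op, e1, e2 = e
    return OPS[op](evaluate(e1, memory), evaluate(e2, memory))


expr = ("*", ("-", "y", 2), ("+", "y", "x"))   # z := (y - 2) * (y + x)
code = compile_assign("z", expr)
print(run_machine(code, {"x": 3, "y": 5})["z"], evaluate(expr, {"x": 3, "y": 5}))
# prints: 24 24
```

Because (2) stores $e_2$ in the temporary before computing $e_1$, the final instruction computes $e_1 \mathbin{op} e_2$ in the right order even for a non-commutative operation such as subtraction.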

Returning to transformation (1), since a is introduced only by the transformation and did not occur in the original program, any assertion made about the original program said nothing about a. In particular, if P was an assertion about the original program, we may assume that a does not occur free in P. The theorem now follows:

Theorem 5.1. $P(e/x)\,[a := e;\ x := a]\,P$, where a does not occur free in P.

Proof

1. $P(a/x)\,[x := a]\,P$ (assign axiom)
2. $P(a/x)(e/a)\,[a := e]\,P(a/x)$ (assign axiom)
3. $P(a/x)(e/a)$ is $P(e/x)$ (properties of substitution)
4. $P(e/x)\,[a := e]\,P(a/x)$ (from 2 and 3)
5. $P(e/x)\,[a := e;\ x := a]\,P$ (from 4 and 1 by composition)

Q.E.D.

Transformation (2):

$$a := e_1 \mathbin{op} e_2 \;\longrightarrow\; \textbf{begin } x:t,\ a := e_2;\ x := a;\ a := e_1;\ a := a \mathbin{op} x \textbf{ end}$$

may be applied in contexts where $a$ is already known, so no special assumptions about $a$ in $P$ need to be made. Again, we must derive the axiom schema for the transformed assignment statement. We may assume without loss of generality that $x$, the local variable declared in the transformed program, does not occur free in $P$; if it does, the declared $x$ may be renamed.
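The proof of Theorem 5.1 amounts to pushing $P$ backwards through the two assignments with the assign axiom, plus one substitution identity (step 3). Below is a minimal sketch of that computation. It uses a hypothetical tuple encoding and illustrative function names. It also shows why the side condition matters: when $a$ occurs free in the postcondition, the computed precondition is no longer $P(e/x)$.

```python
# Hypothetical encoding (illustrative names): a variable is a str, a constant
# an int, an operation or relation the tuple (op, lhs, rhs). There are no
# binders, so every occurrence of a variable is free.
def subst(p, x, e):
    """p(e/x)."""
    if isinstance(p, str):
        return e if p == x else p
    if isinstance(p, tuple):
        return (p[0], subst(p[1], x, e), subst(p[2], x, e))
    return p


def pre_of(seq, post):
    """Push post backwards through x1 := e1; ...; xn := en using the assign axiom."""
    for x, e in reversed(seq):
        post = subst(post, x, e)
    return post


e = ("*", "y", 2)
seq = [("a", e), ("x", "a")]                # transformation (1): a := e; x := a

P = ("==", "x", ("+", "y", 1))              # a does not occur in P
print(pre_of(seq, P) == subst(P, "x", e))   # True: the precondition is P(e/x)

Q = ("==", "x", "a")                        # a occurs free in Q
print(pre_of(seq, Q) == subst(Q, "x", e))   # False: step 3 of the proof fails
```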

Theorem 5.2. $P(e_1\ \mathrm{op}\ e_2/a)\ [\mathbf{begin}\ x : t,\ a := e_2;\ x := a;\ a := e_1;\ a := a\ \mathrm{op}\ x\ \mathbf{end}]\ P$.

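Proof

1. $P(a\ \mathrm{op}\ x/a)\ [a := a\ \mathrm{op}\ x]\ P$ (assign axiom)
2. $P(a\ \mathrm{op}\ x/a)(e_1/a)\ [a := e_1]\ P(a\ \mathrm{op}\ x/a)$ (assign axiom)
3. $P(a\ \mathrm{op}\ x/a)(e_1/a)$ is $P(e_1\ \mathrm{op}\ x/a)$ (properties of substitution)
4. $P(e_1\ \mathrm{op}\ x/a)\ [a := e_1]\ P(a\ \mathrm{op}\ x/a)$ (from 2 and 3)
5. $P(e_1\ \mathrm{op}\ x/a)(a/x)\ [x := a]\ P(e_1\ \mathrm{op}\ x/a)$ (assign axiom)
6. $P(e_1\ \mathrm{op}\ x/a)(a/x)$ is $P(e_1\ \mathrm{op}\ a/a)$ (properties of substitution, and $x$ not free in $P$ or $e_1$)
7. $P(e_1\ \mathrm{op}\ a/a)\ [x := a]\ P(e_1\ \mathrm{op}\ x/a)$ (from 5 and 6)
8. $P(e_1\ \mathrm{op}\ a/a)(e_2/a)\ [a := e_2]\ P(e_1\ \mathrm{op}\ a/a)$ (assign axiom)
9. $P(e_1\ \mathrm{op}\ a/a)(e_2/a)$ is $P(e_1\ \mathrm{op}\ e_2/a)$ (properties of substitution, since $a$ does not occur free in $e_1$)
10. $P(e_1\ \mathrm{op}\ e_2/a)\ [a := e_2]\ P(e_1\ \mathrm{op}\ a/a)$ (from 8 and 9)
11. $P(e_1\ \mathrm{op}\ e_2/a)\ [a := e_2;\ x := a;\ a := e_1;\ a := a\ \mathrm{op}\ x]\ P$ (from 10, 7, 4 and 1 by composition)
12. $P(e_1\ \mathrm{op}\ e_2/a)\ [\mathbf{begin}\ x : t,\ a := e_2;\ x := a;\ a := e_1;\ a := a\ \mathrm{op}\ x\ \mathbf{end}]\ P$ (since we assumed $x$ was not free in $P$)

Q.E.D.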
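Steps 1 to 11 of this proof can likewise be replayed mechanically by substituting backwards through the four assignments of the block body; entry to and exit from the block (step 12) are not modelled. The sketch below uses a hypothetical tuple encoding and illustrative names, such as `body2`. In it, the first case yields $P(e_1\ \mathrm{op}\ e_2/a)$ as expected. The other two show the substitution identities failing when $x$, respectively $a$, occurs in $e_1$.

```python
# Hypothetical encoding (illustrative names): variable = str, constant = int,
# (op, lhs, rhs) = operation or relation; no binders, so all occurrences are free.
def subst(p, x, e):
    """p(e/x)."""
    if isinstance(p, str):
        return e if p == x else p
    if isinstance(p, tuple):
        return (p[0], subst(p[1], x, e), subst(p[2], x, e))
    return p


def pre_of(seq, post):
    """Push post backwards through x1 := e1; ...; xn := en with the assign axiom."""
    for x, e in reversed(seq):
        post = subst(post, x, e)
    return post


def body2(op, e1, e2, x="x"):
    """Body of the block produced by transformation (2); x is the local temporary."""
    return [("a", e2), (x, "a"), ("a", e1), ("a", (op, "a", x))]


P = ("<", "a", "n")
e2 = ("+", "z", 1)
for e1 in [("*", "y", 3), ("*", "x", 3), ("*", "a", 3)]:
    print(e1, pre_of(body2("-", e1, e2), P) == subst(P, "a", ("-", e1, e2)))
# ('*', 'y', 3) True
# ('*', 'x', 3) False
# ('*', 'a', 3) False
```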

The correctness of transformations (3) and (4) may be shown in the same manner as that of (2); these proofs are left to the reader.
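One way to carry out those proofs, sketched here under a hypothetical tuple encoding, is again backward substitution. For (3) it yields $P(e_1\ \mathrm{op}\ e_2/a)$ directly. For (4) it yields $P(e_2\ \mathrm{op}\ e_1/a)$, which agrees with $P(e_1\ \mathrm{op}\ e_2/a)$ only up to the commutativity of op. The illustrative helper `canon` simply sorts the operands of operators assumed commutative, and stands in for that final appeal to commutativity.

```python
# Hypothetical encoding (illustrative names): variable = str, constant = int,
# (op, lhs, rhs) = operation or relation; no binders, so all occurrences are free.
def subst(p, x, e):
    """p(e/x)."""
    if isinstance(p, str):
        return e if p == x else p
    if isinstance(p, tuple):
        return (p[0], subst(p[1], x, e), subst(p[2], x, e))
    return p


def pre_of(seq, post):
    """Push post backwards through x1 := e1; ...; xn := en with the assign axiom."""
    for x, e in reversed(seq):
        post = subst(post, x, e)
    return post


def canon(p, commutative=frozenset({"+", "*"})):
    """Sort operands of operators assumed commutative, so e1 op e2 and e2 op e1 coincide."""
    if not isinstance(p, tuple):
        return p
    op, lhs, rhs = p[0], canon(p[1], commutative), canon(p[2], commutative)
    if op in commutative:
        lhs, rhs = sorted((lhs, rhs), key=repr)
    return (op, lhs, rhs)


def body3(op, e1, e2):  # a := e1; x := a; a := e2; y := a; a := x; a := a op y
    return [("a", e1), ("x", "a"), ("a", e2), ("y", "a"), ("a", "x"), ("a", (op, "a", "y"))]


def body4(op, e1, e2):  # a := e1; x := a; a := e2; a := a op x
    return [("a", e1), ("x", "a"), ("a", e2), ("a", (op, "a", "x"))]


P = ("<", "a", "n")
e1, e2 = ("*", "y", 3), ("+", "z", 1)
for op in ("+", "-"):
    want = subst(P, "a", (op, e1, e2))    # P(e1 op e2/a)
    got3 = pre_of(body3(op, e1, e2), P)
    got4 = pre_of(body4(op, e1, e2), P)   # P(e2 op e1/a)
    print(op, got3 == want, got4 == want, canon(got4) == canon(want))
# + True False True
# - True False False
```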

Repeated applications of the above transformations will produce segments of code that consist of sequences of low level assignment statements or blocks. The blocks themselves declare only variables (i.e., no labels or procedures) and the bodies of the blocks are again a sequence of assignments and blocks.

The removal of these blocks in this setting could be accomplished as follows. The simplest approach is to rename all the local variables so they are distinct and have one block at the outermost level declare them. While this obviously works, it destroys the information given about the scope of temporaries and does not optimize the use of storage.

A better approach is to rename the locals so that two conditions are satisfied:

i) variables in nested blocks are distinct;
ii) variables in disjoint blocks are the same if they are of the same type or have the same storage requirements.

Again, let one block at the outermost level declare them all.
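Here is a minimal sketch of the second renaming scheme, under stated assumptions. Suppose a temporary of type $t$ is the $k$-th local of that type declared in a block at nesting depth $d$. It is renamed to an underscore followed by the type name, $d$ and $k$ (for example `_int1_0`). Distinct depths keep nested blocks apart (condition i). Disjoint blocks at the same depth reuse names of the same type, a restricted form of condition ii. The program encoding, the naming scheme and the functions `flatten`, `remove_blocks` and `run_flat` are hypothetical. Generated names are assumed not to clash with source variables.

```python
# Hypothetical encoding (illustrative names): a statement is
# ("assign", target, expr) or ("block", [(name, type), ...], body); an
# expression is a variable (str), a constant (int) or (op, lhs, rhs).
# Generated names start with "_", assuming no source variable does.
ARITH = {"+": lambda u, v: u + v, "-": lambda u, v: u - v, "*": lambda u, v: u * v}


def rename_expr(e, env):
    if isinstance(e, str):
        return env.get(e, e)
    if isinstance(e, tuple):
        return (e[0], rename_expr(e[1], env), rename_expr(e[2], env))
    return e


def flatten(stmts, env, depth, decls):
    """Inline every block, renaming its locals by (type, depth, position)."""
    out = []
    for s in stmts:
        if s[0] == "assign":
            out.append(("assign", env.get(s[1], s[1]), rename_expr(s[2], env)))
            continue
        _, local_decls, body = s
        inner, count = dict(env), {}
        for name, typ in local_decls:
            k = count.get(typ, 0)
            count[typ] = k + 1
            inner[name] = f"_{typ}{depth}_{k}"
            decls[inner[name]] = typ
        out.extend(flatten(body, inner, depth + 1, decls))
    return out


def remove_blocks(stmts):
    """One outermost block declaring every renamed temporary."""
    decls = {}
    body = flatten(stmts, {}, 1, decls)
    return ("block", sorted(decls.items()), body)


def run_flat(body, state):
    state = dict(state)

    def ev(e):
        if isinstance(e, str):
            return state[e]
        if isinstance(e, tuple):
            return ARITH[e[0]](ev(e[1]), ev(e[2]))
        return e

    for _, target, e in body:
        state[target] = ev(e)
    return state


# w := (y + 2) * (z - 1), shaped like the output of transformations (1) and (2)
prog = [
    ("block", [("x", "int")], [
        ("block", [("x", "int")], [("assign", "a", 1), ("assign", "x", "a"),
                                   ("assign", "a", "z"), ("assign", "a", ("-", "a", "x"))]),
        ("assign", "x", "a"),
        ("block", [("x", "int")], [("assign", "a", 2), ("assign", "x", "a"),
                                   ("assign", "a", "y"), ("assign", "a", ("+", "a", "x"))]),
        ("assign", "a", ("*", "a", "x")),
    ]),
    ("assign", "w", "a"),
]
result = remove_blocks(prog)
print(result[1])                                   # [('_int1_0', 'int'), ('_int2_0', 'int')]
print(run_flat(result[2], {"y": 3, "z": 5})["w"])  # 20
```

For this example, three block-local declarations collapse into two outermost temporaries, because the two inner blocks are disjoint and share `_int2_0`.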

6. Conclusions

We have proven the correctness of a number of program transformations that, taken collectively, compile a high level language into a low level subset of that language. This approach neatly factors the proof of a compiler into a number of much smaller proofs, one for each transformation.

Further, we were surprised that one formal definition method could comfortably carry proofs about so many different language features. This, together with the fact that axiomatic semantics can define complete languages and support correctness proofs of individual programs, strengthens the claim that axiomatic definitions are of great importance.

Also, the proofs we have given do not require any sophisticated mathematical background to understand. Only notions such as proof, substitution, and formal rules and axioms defining the language were needed.

Finally, the difficulty of proving a transformation correct, at least for the examples in this paper, is proportional to the complexity of the transformation itself. Put more directly, simple transformations have simple proofs.

These advantages were obtained by restricting the language: it includes neither recursion nor any parameter passing convention other than call by value result. Also, since a partial correctness axiomatic system was used, the method would admit many useless transformations, for example one that replaces a program by a program that never terminates. Addressing these issues forms the basis for future research.

Acknowledgements. The author gratefully acknowledges the constructive comments of J. Morris, S. Kamin and the referees.

References

1. Floyd, R.W.: Assigning meanings to programs. In: Math. aspects of computer science, J.T. Schwartz (ed.), pp. 19-32. Providence, Rhode Island: American Math. Society 1967

2. Gerhart, S.L.: Proof theory of partial correctness verification systems. SIAM J. Comput. 5, 355-377 (1976)

3. Hoare, C.A.R.: An axiomatic basis for computer programming. CACM 12, 576-580, 583 (1969)

4. Hoare, C.A.R., Wirth, N.: An axiomatic definition of the programming language PASCAL. Acta Informat. 2, 335-355 (1973)

5. Knuth, D.E.: Structured programming with goto statements. ACM Computing Surveys 6, 261-302 (1974)

6. Loveman, D.B.: Program improvement by source-to-source transformation. JACM 24, 121-145 (1977)

7. Milner, R., Weyhrauch, R.: Proving compiler correctness in a mechanized logic. In: Machine Intelligence 7, B. Meltzer, D. Michie (eds.), pp. 51-70. Edinburgh: Edinburgh Press 1972

8. Russell, B.D.: Implementation correctness involving a language with goto statements. SIAM J. Comput. 6, 403-415 (1977)

9. Scott, D., Strachey, C.: Towards a mathematical semantics for computer languages. In: Computers and automata, J. Fox (ed.), pp. 19-46. New York: John Wiley 1972

10. Tennent, R.D.: The denotational semantics of programming languages. CACM 19, 437-453 (1976)

11. Wegbreit, B.: Goal directed program transformation. IEEE Transactions on Software Engineering SE-2, 69-80 (1976)

12. Wegner, P.: The Vienna definition language. ACM Computing Surveys 4, 5-63 (1972)

Received December 4, 1978; Revised February 8, 1980
