
Assignment Commands with Array References

PETER J. DOWNEY

The Pennsylvania State University, University Park, Pennsylvania

AND

RAVI SETHI

Bell Laboratories, Murray Hill, New Jersey

ABSTRACT. Straight line programs with assignment statements involving both simple and array variables are considered. Two such programs are equivalent if they compute the same values as a function of the inputs. Testing the equivalence of array programs is shown to be NP-hard. If array variables are updated but never subsequently referenced, equivalence can be tested in polynomial time. Programs without array variables can be tested for equivalence in expected linear time.

KEY WORDS AND PHRASES: semantics, array assignments, data structures, NP-complete

CR CATEGORIES: 5.24, 5.25

1. Introduction

An array reference is either the selection of information out of an array, as in i:=a[j], or the updating of an array, as in a[i]:=j. Complicated expressions and assignments can be built up using sequences of array selections, array updates, and operations of the form a:=b$\theta$c, where $\theta$ is some operator.

We will be interested in efficient algorithms for testing whether two programs composed of assignment commands compute the same values. A variety of intriguing examples indicate why efficient equivalence algorithms are hard to find. In the following programs, c and d are assigned the same value:

a[i]:=3; a[j]:=2; c:=a[i];
b[i]:=3; b[j]:=2; d:=b[i];

Both c and d are assigned the value of if i=j then 2 else 3. Nested conditional expressions are encountered if c or d is subsequently used as in: a[i]:=2; a[j]:=c; e:=a[i]. Is it obvious that e is assigned 2?

From our earlier examples we know that c is assigned if i=j then 2 else 3, while

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. A preliminary version of this paper was presented at the 17th Annual Symposium on Foundations of Computer Science, Houston, Texas, October 1976. The work of P. J. Downey was partially supported by NSF grant MCS75-22557. Authors' addresses: P. J. Downey, Department of Computer Science, The University of Arizona, Tucson, AZ 85721; R. Sethi, Bell Laboratories, Murray Hill, NJ 07974. The authors provided camera-ready copy for this paper. © 1978 ACM 0004-5411/78/1000-0652 $00.75

Journal of the Association for Computing Machinery, Vol. 25, No. 4, October 1978, pp. 652-666.


e gets if i=j then c else 2. Putting these expressions together, e is assigned if i=j then 2 else 2, which simplifies to 2.
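The introductory example can be checked mechanically. The following is a minimal sketch, assuming arrays are modeled as Python dictionaries and that the index values range over a small set; the helper names `update` and `run` are illustrative and do not come from the paper.

```python
def update(arr, index, value):
    """Return a new array equal to arr except that arr[index] is value."""
    new = dict(arr)
    new[index] = value
    return new

def run(i, j, a):
    """Execute a[i]:=3; a[j]:=2; c:=a[i]; a[i]:=2; a[j]:=c; e:=a[i]."""
    a = update(a, i, 3)
    a = update(a, j, 2)
    c = a[i]
    a = update(a, i, 2)
    a = update(a, j, c)
    e = a[i]
    return c, e

# Check every pair of indices drawn from a three-element index set.
for i in range(3):
    for j in range(3):
        c, e = run(i, j, {0: 0, 1: 0, 2: 0})
        assert c == (2 if i == j else 3)   # c is "if i=j then 2 else 3"
        assert e == 2                      # e is always 2
```

The exhaustive loop only covers a tiny index set; the argument in the text is what establishes the result for all values.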

In the above programs, the effect of sequences of select and update commands is described using equality conditionals, which are conditional expressions of the form if A=B then C else D in which the only predicate that can occur is a test for equality.

Such equality conditionals have long been used to describe array assignments. The use of equality conditionals is implicit in McCarthy's [21] and Kaplan's [16] program semantics using state vectors. Burstall's [5] semantics of list assignment uses equality conditionals to update functions representing a list. Axioms for array assignments using such conditionals are given in [12,14,10,19]. See also [23], where axioms for assignments to a class of directed graphs are justified with respect to an interpretive model.

The interesting twist (Theorem 3.2) is that equality conditionals can be simulated by select and update commands. We show in Section 3 that the equivalence problem for programs with equality conditional assignments is NP-hard. It follows immediately that equivalence of programs with array select and update commands is NP-hard as well (even if the programs involve just one array with two elements!). The reader is referred to [1] for a discussion of NP-hard problems.

The study of equivalence of programs with arrays is motivated by the need for expression simplification algorithms in the areas of program optimization, automated program verification, and program testing by symbolic execution.

For example, the problem arises in the interactive systems for debugging and testing programs described by Boyer et al. [4] and King [17]. These systems symbolically execute the program being tested, maintaining formal expressions instead of values in the locations of the program. In the course of symbolic execution, the need arises to simplify expressions as far as possible, and to test for equality between expressions. We will be interested here in the complexity of testing equality between expressions involving array references.

2. The Model

2.1 SYNTAX. The syntax of programs in this paper is the same as that in Aho and Ullman [3] and Hoffmann and Landweber [13].

Let $S$ and $A$ be countable sets of simple and array names, respectively.$^1$ As a general rule $A, B, \ldots$ denote simple names and $\alpha, \beta, \ldots$ denote array names. Informally, associated with a simple name will be a value from some value set $V$. Array names will be functions from $V$ to $V$. A special operator (dot) will be used to select particular elements of an array, as in $\alpha.A$, which may be thought of as the value stored at the element of $\alpha$ pointed to by the current value of $A$.

For "operate" commands, we will use a finite set $O$ of operator symbols. Associated with each operator $\phi$ in $O$ is an integer $r \ge 1$, called the rank of $\phi$.

A command is a string of symbols having one of the following three forms:

1. operate: $A \leftarrow \phi B_1 B_2 \cdots B_r$,
2. select: $A \leftarrow \alpha.B$,
3. update: $\alpha.A \leftarrow B$,

where $A, B, B_1, \ldots, B_r$ are elements of $S$, $\alpha \in A$, $\phi \in O$, and $r$ is the rank of $\phi$. Expressions are built up from $O$ and $S$ in the usual way. A program $\pi$ is a triple $(P, I, U)$, where $P$ is a finite sequence of commands, and $I$

$^1$ We use the term "array" rather than the phrase "structured variable," since we will use the property of array references that a[i] and a[j] either refer to the same element, or they refer to distinct elements. This property does not hold for references to certain data structures considered by Burstall [5] and Park [24].


and $U$ are finite subsets of $S \cup A$. $I$ and $U$ may be thought of as input and output names.

We defer precise definitions of "value" and "equivalence" to Section 2.3 on semantics. Intuitively, two programs are equivalent if they map inputs to outputs in the same way under all interpretations of the operators.

2.2 THE PROBLEM. Section 3 shows that the problem of testing the equivalence of programs containing select and update commands alone is NP-hard. Therefore, in order to explore the boundary between NP-hard and polynomial equivalence problems, we examine a syntactically more restrictive class of programs. Sections 4 and 5 study the equivalence of programs with update and operate commands, i.e. those involving no selections from an array. Even for such special programs, proving equivalence raises interesting issues, which are illustrated by the following examples.

Aho and Ullman [3] note that the following two programs are equivalent:

First program: $\alpha.A \leftarrow B$; $\alpha.B \leftarrow A$
Second program: $\alpha.B \leftarrow A$; $\alpha.A \leftarrow B$

The reason is that if $A \ne B$, then the two commands can be interchanged, since $\alpha.A$ and $\alpha.B$ must refer to different locations. If $A = B$, then they become identical commands. When operate commands are also included, other interesting phenomena occur. Suppose that for some inputs and some operator $\phi$, $A$ is equivalent to $\phi AB$. An example of such an equivalence is $a = b \div a$, when $a=2$ and $b=4$. If $A$ is equivalent to $\phi AB$, then it further follows that $A$ is equivalent to $\phi\phi ABB$, and so on. Extending our notation momentarily, the following two programs are equivalent under all interpretations:

First program: $\alpha.A \leftarrow \phi C\phi\phi ABB$; $\alpha.\phi AB \leftarrow \phi CA$
Second program: $\alpha.\phi AB \leftarrow \phi CA$; $\alpha.A \leftarrow \phi C\phi\phi ABB$

For if $A$ is not equivalent to $\phi AB$ then the two commands can be permuted. If $A$ is equivalent to $\phi AB$ then the right-hand sides are forced to become equivalent.

The above examples show that to prove equivalence of even simple array programs, we must develop methods for propagating assumed equalities through expressions. In the foregoing example, the assumed equality $A = \phi AB$ implies the equality $\phi CA = \phi C\phi\phi ABB$.

It turns out that algorithms for propagating such equalities are closely related to existing algorithms for what has been called "the common subexpression problem." Cocke and Schwartz [6] give an algorithm for detecting identical subexpressions which we review in Section 5. The running time of this algorithm, $\Psi(n)$, is linear on the average, but is $O(n^2)$ in the worst case, where $n$ is the number of commands in the program. Using this algorithm as a subroutine, we will give an $O(n^2\,\Psi(n))$ algorithm for the equivalence of programs with operate and update commands.

2.3 REPRESENTATION AND SEMANTICS. Now that we know what array programs look like (syntax), we must next define what they mean (semantics). In addition, since we are investigating algorithms, we must deal with issues of efficient representation for array programs.

In an expression like $(a+b)/(c-d)$, the order in which $a+b$ and $c-d$ are computed is not particularly important. Working just with operate commands, Aho and Ullman [2] found it convenient to use a graphical representation of programs that is insensitive to the order in which subexpressions are evaluated. Moreover, in a graph, it is easy to keep track of the "current" value of a name by constructing a separate node for each distinct value. The correspondence between programs with operate commands and their dags (directed acyclic graphs) has been studied by Aho

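[Figure 2.1: dag for the program $\pi = (P, I, U)$ with $I = \{A, B, C, \alpha\}$, $U = \{E, \alpha\}$, and commands $D \leftarrow \phi AB$; $\alpha.D \leftarrow C$; $\alpha.A \leftarrow B$; $E \leftarrow \alpha.D$.]

FIG. 2.1. The dag $D(\pi)$ for a program $\pi$.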

and Ullman [2] and Culik [9]. The introduction of select and update commands does change the correspondence somewhat, but the flavor is the same.

Figure 2.1 illustrates the dag representation we will use in this paper. The symbols $|$ and $\leftarrow$ at the nonleaf nodes in Figure 2.1 can be viewed as special operators corresponding to select and update commands, respectively.

Following McCarthy [21], a useful way to view an update command is to imagine that a new array value $\alpha$ is computed, based on the old values of $\alpha$, $A$, and $B$. Nodes labeled $\leftarrow$ therefore have three sons.$^2$ Nodes corresponding to output values are called output nodes, and are represented by open circles. It is easy to devise a linear time algorithm for constructing the dag $D(\pi)$ from a program $\pi$ [2].

In order to specify the semantics of dags we will introduce the notions of data and the interpretation of operator symbols in the usual manner (see for example the treatment of flowchart schemes in Manna [20]). The point of departure is that the interpretation of $|$ and $\leftarrow$, the select and update symbols, is fully specified. We separate data from interpretations since we wish to consider select and update commands independently of operate commands.

Let $V$ be a set of values, and let $[V^r \to V]$ represent the set of all functions from $V^r$ to $V$, for all $r$, $r \ge 1$. An interpretation $\mathcal{I}$ assigns a function from $[V^r \to V]$ to each operator symbol $\phi$ from $O$ of rank $r$. Data $d$ assigns an element of $V$ to each simple name symbol in $I$, and an element of $[V \to V]$ to each array name symbol in the input name set $I$. Recall that we view each array name as a function that maps "locations" to values. $\mathcal{I}$ and $d$ will be said to have base domain $V$. The semantics of $|$ and $\leftarrow$ will be specified independently of the particular interpretation and data.

Let $\pi = (P, I, U)$ be a program. A nonleaf node $u$ in $D(\pi)$ is called a select, update, or operate node if $u$ has label $|$, $\leftarrow$, or $\phi$, for some operator symbol $\phi$, respectively.

$^2$ Luckham and Suzuki [19] express array assignment, assignment to dereferenced pointers, and assignment to Pascal record structures using these select and update operators. The "contents" and "assignment" operations of McCarthy [21] studied by Kaplan [16] are restricted forms of select and update. Kaplan assumes that in $\alpha.A$ and $\alpha.B$, $A$ and $B$ can never have the same value since index variables are not assigned to.

The value $V(u)$ of a node $u$ in $D(\pi)$ under $(\mathcal{I},d)$, sometimes written $Vu$, is given by:

1. $V(u) = d(X)$ if $u$ is a leaf with label $X$. If $X$ is a simple name, the value of $u$ will be an element of $V$; otherwise the value will be a function from $V$ to $V$.

2. $V(u) = (\mathcal{I}\phi)(Vw_1, Vw_2, \ldots, Vw_r)$ if $u$ has label $\phi$ with sons $w_1, w_2, \ldots, w_r$. Here the value of $u$ is the result of applying the interpreted function represented by the operator $\phi$ to the values of the sons of $u$.

3. $V(u) = (Vw_1)(Vw_2)$ if $u$ is a select node with sons $w_1$ and $w_2$. The value of the select node is found by applying the function represented by the first son to the value represented by the second son.

4. $V(u) = \lambda x.\ \text{if } x = Vw_2 \text{ then } Vw_3 \text{ else } (Vw_1)(x)$, if $w_1$, $w_2$, and $w_3$ are the sons of the update node $u$. Thus $V(u)$ is the function which agrees with $Vw_1$ on all arguments except $Vw_2$, where it takes on value $Vw_3$.

The value under $(\mathcal{I},d)$ of $D(\pi)$ is the set consisting of the values of the output nodes of $D(\pi)$ under $(\mathcal{I},d)$. Nodes $u$ and $w$ in a dag $D(\pi)$ are equivalent under $(\mathcal{I},d)$ if $V(u) = V(w)$ under $(\mathcal{I},d)$. Programs $\pi$ and $\pi'$ are equivalent under $(\mathcal{I},d)$ if $D(\pi)$ and $D(\pi')$ have the same value under $(\mathcal{I},d)$. Two nodes or programs are strongly equivalent (written $\equiv$) if they are equivalent under $(\mathcal{I},d)$ for all $(\mathcal{I},d)$.$^3$

Unless otherwise stated, we will work with singly rooted dags, in which the root is the only output node. Thus the value of a dag will be the value of its root.
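The four rules defining the value of a node translate directly into a recursive evaluator. The following is a minimal sketch, assuming dag nodes are encoded as nested Python tuples and array values as Python functions; the tuple tags (`"leaf"`, `"op"`, `"select"`, `"update"`) and the function name `value` are illustrative choices, not notation from the paper.

```python
def value(node, interp, data):
    """Value of a dag node under an interpretation `interp` and data `data`.

    node is one of:
      ("leaf", X)               leaf labeled with name X
      ("op", phi, w1, ..., wr)  operate node with operator symbol phi
      ("select", w1, w2)        select node
      ("update", w1, w2, w3)    update node
    """
    kind = node[0]
    if kind == "leaf":                       # rule 1
        return data[node[1]]
    if kind == "op":                         # rule 2
        args = [value(w, interp, data) for w in node[2:]]
        return interp[node[1]](*args)
    if kind == "select":                     # rule 3: apply array to index
        return value(node[1], interp, data)(value(node[2], interp, data))
    if kind == "update":                     # rule 4: new function value
        old = value(node[1], interp, data)
        index = value(node[2], interp, data)
        new = value(node[3], interp, data)
        return lambda x: new if x == index else old(x)
    raise ValueError(f"unknown node kind: {kind}")

# Dag for the two commands  alpha.A <- B;  E <- alpha.A
alpha1 = ("update", ("leaf", "alpha"), ("leaf", "A"), ("leaf", "B"))
E = ("select", alpha1, ("leaf", "A"))
data = {"A": 1, "B": 7, "C": 2, "alpha": lambda x: 0}
assert value(E, {}, data) == 7
```

Representing an updated array as a closure mirrors the lambda expression in rule 4; a practical implementation would more likely use a persistent map.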

3. Strong Equivalence

The complexity of the equivalence problem for programs depends on the kinds of commands permitted. If all commands are permitted, then dag $D(\pi)$ contains operators from $O \cup \{|, \leftarrow\}$ and names from $S \cup A$. We use the notation "programs over $(O, |, \leftarrow, S, A)$" to denote the fact that operate, select, and update commands are allowed. Similar shorthand is used for other types of programs; e.g. $(|, \leftarrow, S, A)$ denotes programs having select and update commands, but no operate commands.

The complexity of an algorithm working on a dag $D$ will be expressed in terms of the size, $|D|$, of the dag, which is the sum of the number of nodes and edges of $D$.

The first results examine the complexity of the equivalence problem for programs over $(\{\odot\}, S)$, where $\odot(A,B,C,D)$ is if $A=B$ then $C$ else $D$.

THEOREM 3.1. Let $P$ and $Q$ be programs over $(\{\odot\}, S)$. Let interpretation $\mathcal{I}$ map $\odot$ to $\lambda uvwx.$ if $u=v$ then $w$ else $x$. Determining if $P$ and $Q$ are inequivalent under $(\mathcal{I},d)$, for all $d$, is an NP-complete problem. Moreover the problem is NP-complete even if the base domain of $\mathcal{I}$ and $d$ is $\{0,1\}$.

PROOF. Let 3-SAT be the problem of determining if a Boolean formula in conjunctive normal form with three literals per clause is satisfiable. We will reduce 3-SAT to the inequivalence problem for programs. We give an example reduction, leaving the rest to the reader. Consider the Boolean formula $(y_1 + \bar{y}_2 + y_3) \cdot (\bar{y}_1 + y_2 + y_3)$. We will construct programs $P$ and $Q$ that are not equivalent if and only if the formula is satisfiable. Program $P$ is as follows:

comment: $A_i$ and $B_i$ correspond to $y_i$ and $\bar{y}_i$, respectively. $T$ will be used to test the truth or falsity of literals represented by $A_i$ or $B_i$. If at least one of the literals in clause $j$ is true, then $C_j$ will be assigned $Y$; otherwise $C_j$ will be assigned $N$. $D$ will ensure that all clauses are true.

$^3$ Different notions of equivalence appear in Aho and Ullman [3] and Hoffmann and Landweber [13]. Two programs equivalent in either of these senses are strongly equivalent, but not conversely.


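$I = \{T, Y, N, 0, 1, D, A_1, B_1, A_2, B_2, A_3, B_3, C_1, C_2\}$

$C_1 \leftarrow$ if $A_1 = T$ then $Y$ else $N$
$C_1 \leftarrow$ if $B_2 = T$ then $Y$ else $C_1$
$C_1 \leftarrow$ if $A_3 = T$ then $Y$ else $C_1$

$C_2 \leftarrow$ if $B_1 = T$ then $Y$ else $N$
$C_2 \leftarrow$ if $A_2 = T$ then $Y$ else $C_2$
$C_2 \leftarrow$ if $A_3 = T$ then $Y$ else $C_2$

$D \leftarrow$ if $C_1 = N$ then $0$ else $1$
$D \leftarrow$ if $C_2 = N$ then $0$ else $D$

comment: The remaining commands ensure that $A_i \ne B_i$.

$D \leftarrow$ if $A_1 = B_1$ then $0$ else $D$
$D \leftarrow$ if $A_2 = B_2$ then $0$ else $D$
$D \leftarrow$ if $A_3 = B_3$ then $0$ else $D$

$U = \{D\}$.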


Program $Q$ is given by $I = \{0\}$, $U = \{0\}$ and has no commands. For the two programs to have different values under $(\mathcal{I},d)$, it must be true that the output name $D$ in program $P$ has the same value as $1$. For $D$ to have the value of $1$, it must be true that $A_i \ne B_i$ and $C_j \ne N$. Moreover, the data function $d$ must assign distinct values to $Y$ and $N$ as well as $0$ and $1$. Thus $P$ and $Q$ are inequivalent if and only if the Boolean formula is satisfiable. Consider the base domain $V$ of $\mathcal{I}$ and $d$. Since all we need are distinct values for $Y$ and $N$, $0$ and $1$, $A_i$ and $B_i$, the result holds even for $V = \{0,1\}$.

Finally, we observe that the inequivalence problem is in NP by constructing a nondeterministic machine $M$. Given programs $P$ and $Q$, with a finite set $\{A, B, \ldots\}$ of input names between them, $M$ guesses which name pairs are equal in value; all other pairs are assumed unequal. $M$ then verifies that this set of equalities and inequalities is consistent, and using these relations, executes $P$ and $Q$ to see if they have unequal values. The process takes polynomial time. □
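The example reduction can be generalized and tested by brute force on small formulas. The following is a sketch only, assuming clauses are given as triples of nonzero integers (literal `k` stands for $y_k$ and `-k` for its negation) and that the base domain is $\{0,1\}$; the function names `build_program`, `inequivalent`, and `satisfiable` are hypothetical, and the exhaustive search over data is exponential, so it merely illustrates the correspondence.

```python
from itertools import product

def build_program(clauses, num_vars):
    """Commands (target, a, b, c, d) meaning: target <- if a=b then c else d."""
    name = lambda lit: f"A{lit}" if lit > 0 else f"B{-lit}"
    prog = []
    for j, clause in enumerate(clauses, 1):
        for pos, lit in enumerate(clause):
            prog.append((f"C{j}", name(lit), "T", "Y", "N" if pos == 0 else f"C{j}"))
    for j in range(1, len(clauses) + 1):
        prog.append(("D", f"C{j}", "N", "0", "1" if j == 1 else "D"))
    for i in range(1, num_vars + 1):       # enforce A_i != B_i
        prog.append(("D", f"A{i}", f"B{i}", "0", "D"))
    return prog

def inequivalent(clauses, num_vars):
    """True if some data over {0,1} makes output D of P differ from output 0 of Q."""
    prog = build_program(clauses, num_vars)
    inputs = ["T", "Y", "N", "0", "1"]
    inputs += [f"{p}{i}" for i in range(1, num_vars + 1) for p in "AB"]
    for values in product((0, 1), repeat=len(inputs)):
        env = dict(zip(inputs, values))
        for target, a, b, c, d in prog:
            env[target] = env[c] if env[a] == env[b] else env[d]
        if env["D"] != env["0"]:
            return True
    return False

def satisfiable(clauses, num_vars):
    """Brute-force satisfiability check, used only for comparison."""
    return any(
        all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause) for clause in clauses)
        for bits in product((False, True), repeat=num_vars)
    )

paper_formula = [(1, -2, 3), (-1, 2, 3)]
assert inequivalent(paper_formula, 3) == satisfiable(paper_formula, 3)
```

The names $D$, $C_1$, $C_2$ are omitted from the enumerated inputs because each is assigned before it is read.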

COROLLARY 3.1. The equivalence problem for programs over $(\{\odot\}, S)$ is NP-hard.

PROOF. Equivalence and inequivalence are complementary problems. Any problem is polynomial Turing reducible to its complement [18]. Since the inequivalence problem is NP-complete and reduces in polynomial time to its complement, the equivalence problem, it follows that the equivalence problem must be NP-hard. □

It is unlikely that the equivalence problem is in NP, since NP-completeness of two complementary problems implies that the sets NP and $\mathrm{coNP} = \{\bar{S} \mid S \in \mathrm{NP}\}$ coincide [18]. It is widely conjectured that $\mathrm{NP} \ne \mathrm{coNP}$.

Constable et al. [8] give a result similar to Theorem 3.1. They consider loop free programs with conditional branching controlled by arbitrary predicate symbols, allowing assignments to simple variables. For such programs, they show that the inequivalence problem is NP-hard. In Theorem 3.1, the only predicate is equality and there is no flow of control, since the result concerns sequences of assignments. In other words, the Constable et al. result concerns conditional statements, whereas Theorem 3.1 concerns conditional expressions.

From the examples in Section 1 and the definition of value for update nodes in Section 2.3, the reader may have noted that it is possible for sequences of select and update commands to simulate the equality conditional operator $\odot$. The interesting point is that all we need is one two-element array $\alpha$ to carry out the simulation.


THEOREM 3.2. Let $P$ and $Q$ be programs over $(|, \leftarrow, S, \{\alpha\})$. Determining if $P$ and $Q$ are inequivalent under $(\mathcal{I},d)$ for all $\mathcal{I}$ and $d$ is an NP-complete problem. Moreover the result is true even if the base domain of $\mathcal{I}$ and $d$ is $\{0,1\}$.

PROOF. In order to prove this theorem we will show that a sequence of select and update commands using one array $\alpha$ can simulate the reduction in the proof of Theorem 3.1. The command $E \leftarrow \odot(A,B,C,D)$ can be simulated by the following sequence: $\alpha.A \leftarrow D$; $\alpha.B \leftarrow C$; $E \leftarrow \alpha.A$. Each command in the proof of Theorem 3.1 can be replaced by a sequence of three commands as outlined above. Thus we can reduce satisfiability of Boolean expressions to inequivalence of programs over $(|, \leftarrow, S, A)$.
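The three-command simulation of the equality conditional can be verified exhaustively over the base domain $\{0,1\}$. This is a minimal sketch, assuming the array is modeled as a Python dictionary with two elements; the function names `cond` and `simulate` are illustrative.

```python
from itertools import product

def cond(a, b, c, d):
    """The equality conditional: if a=b then c else d."""
    return c if a == b else d

def simulate(a, b, c, d, alpha):
    """Simulate E <- cond(A,B,C,D) using one array alpha."""
    alpha = dict(alpha)   # work on a copy of the initial array contents
    alpha[a] = d          # alpha.A <- D
    alpha[b] = c          # alpha.B <- C  (overwrites the entry above iff A=B)
    return alpha[a]       # E <- alpha.A

# All values of A, B, C, D and all initial contents of the two-element array.
for a, b, c, d, x0, x1 in product((0, 1), repeat=6):
    assert simulate(a, b, c, d, {0: x0, 1: x1}) == cond(a, b, c, d)
```

The second update overwrites the first exactly when the two indices coincide, which is why the final select returns $C$ when $A=B$ and $D$ otherwise.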

The inequivalence problem can be shown to be in NP either by showing that any sequence of select and update commands can be simulated by a sequence of operate commands over the equality conditional operator, or directly by constructing a machine M much as in the proof of Theorem 3.1. []

COROLLARY 3.2. The equivalence problem for programs over $(|, \leftarrow, S, \{\alpha\})$ is NP-hard.

PROOF. As in Corollary 3.1. □

Let us review the basic reduction used in the proofs above. Note that (i) no "arithmetic" on the indices or values of the arrays was used (the original input values are simply moved about in the array $\alpha$ and in the index variables), and (ii) no "indirect addressing" through the array was used. But what if we restrict the index values to be of one mode (say integer) and the array values to be of another (say real), as would happen in numerical analysis programs? (Our attention was drawn to mode conflicts by van Leeuwen [26].)

We argue that such a separation of modes makes no difference: equivalence is still intractable. In the reduction of Theorem 3.1, $A_i$, $B_i$, and $T$ must be of the same mode since they participate in equality tests. Similarly, $C_j$, $Y$, and $N$ must be of the same mode, and $D$, $0$, and $1$ must be of the same mode. An examination of the reduction shows that if each name had an associated mode, no mode conflicts would occur.$^4$

With the negative results out of the way, we turn now to algorithms for determining equivalence of programs.

4. Characterization of Equivalence

Operate commands force us to confront the substitution property of equality illustrated by the following implication: If $a = b \div a$, then $a = b \div (b \div a) = b \div (b \div (b \div a))$, and so on. The next example shows why this property arises in program equivalence.

Example 4.1. Consider the following two equivalent programs, where $I = \{A, B\}$ and $U = \{\alpha\}$.

$\pi_1$: $\alpha.A \leftarrow \theta AB$; $\alpha.\theta AB \leftarrow \theta\theta ABB$; $\alpha.B \leftarrow \theta BA$
$\pi_2$: $\alpha.\theta AB \leftarrow \theta\theta ABB$; $\alpha.B \leftarrow \theta BA$; $\alpha.A \leftarrow \theta AB$

Under a given interpretation $(\mathcal{I},d)$, either $A$ and $\theta AB$ have the same value, or $A$ and $\theta AB$ have different values. In the first case, it follows that $\theta\theta ABB$ has the same value as $\theta AB$, which of course has the same value as $A$. In this case, the first two instructions in $\pi_1$ are both equivalent to $\alpha.A \leftarrow A$, so they can be exchanged. In the second case, $A$ and $\theta AB$ have different values, so interchanging the two commands does not affect the function which is the final value of $\alpha$. Thus, $\pi_1 \equiv \pi$, where $\pi$ is $\alpha.\theta AB \leftarrow \theta\theta ABB$; $\alpha.A \leftarrow \theta AB$; $\alpha.B \leftarrow \theta BA$. Finally, we argue similarly by cases that the last two instructions in $\pi$ can be exchanged in any interpretation, yielding $\pi_2$. Thus $\pi_1 \equiv \pi_2$. □

$^4$ Another relevant observation is that the dag of the program in the proof of Theorem 3.1 is a tree. A simpler reduction, in terms of the number of distinct modes required, is possible if the underlying graph is not restricted to being a tree. The structure of the dag used in the reduction becomes important when the boundary between polynomial and NP-hard equivalence problems is explored [25].
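Example 4.1 can be checked by enumerating every interpretation of $\theta$ and every choice of data over a two-element base domain. This is a sketch under that assumption; a check over $\{0,1\}$ is evidence, not a proof of strong equivalence, and the names `run` and `equivalent_on_V` are hypothetical.

```python
from itertools import product

V = (0, 1)

def run(program, theta, A, B, alpha):
    """Execute update commands (index_term, rhs_term); return the final array."""
    env = {"A": A, "B": B}
    def ev(term):
        if isinstance(term, str):
            return env[term]
        return theta[(ev(term[1]), ev(term[2]))]   # term = ("theta", x, y)
    alpha = dict(alpha)
    for index, rhs in program:
        alpha[ev(index)] = ev(rhs)
    return alpha

def equivalent_on_V(p, q):
    """Compare p and q under all interpretations and data with base domain V."""
    pairs = list(product(V, repeat=2))
    for table in product(V, repeat=len(pairs)):       # all binary operators on V
        theta = dict(zip(pairs, table))
        for A, B, x0, x1 in product(V, repeat=4):     # data for A, B and alpha
            alpha = {0: x0, 1: x1}
            if run(p, theta, A, B, alpha) != run(q, theta, A, B, alpha):
                return False
    return True

tAB = ("theta", "A", "B")
tBA = ("theta", "B", "A")
ttABB = ("theta", tAB, "B")
pi1 = [("A", tAB), (tAB, ttABB), ("B", tBA)]
pi2 = [(tAB, ttABB), ("B", tBA), ("A", tAB)]
assert equivalent_on_V(pi1, pi2)
```

By contrast, the pair $\alpha.A \leftarrow A$; $\alpha.A \leftarrow B$ and its reversal is distinguished by the same check, since the final value stored at $A$ depends on the order.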

By formalizing the type of argument suggested by Example 4.1, we will find that, as long as programs contain no select commands, equivalence can be tested in polynomial time.

The difference between programs $\pi_1$ and $\pi_2$ in Example 4.1 lies in the order in which particular updates of the array $\alpha$ take place. The extent to which updates can be reordered will be expressed by a logical formula, which can form the basis of an equivalence algorithm.

Let $C$ be a singly rooted dag over $(O, \leftarrow, S, A)$, that is, a dag without select nodes. Since an operate node cannot be the father of an update node, and there are no select nodes, the update nodes in $C$ must form a "chain." Moreover, if there are any update nodes in $C$, then the root of $C$ must be an update node. Henceforth we assume that $C$ does indeed have update nodes.

FIG. 4.1. A sketch of the relationship between dags $C$, $D$, $E_i$, $G_i$, $F_j$, $H_j$.

Definition 4.1. (Refer to Figure 4.1.) Let $u_0$ be the update node that is the root of $C$. For $i \ge 0$, if $u_i$ is not a leaf, then let the sons of $u_i$ be $u_{i+1}$, $v_{i+1}$, and $w_{i+1}$. Since all dags are finite, for some $n$, $u_n$ must be a leaf. Let the leaf $u_n$ have label $\alpha$. For all $i$, $1 \le i \le n$, let $E_i$ and $G_i$ be the subdags of $C$ rooted at $v_i$ and $w_i$, respectively. □

Some of the properties of $C$ and its subdags are collected in the following lemma:

LEMMA 4.1. Let $C$ be a singly rooted dag over $(O, \leftarrow, S, A)$. Then $C$ must be of the form given in Definition 4.1, where $E_i$ and $G_i$, $1 \le i \le n$, are dags over $(O,S)$ having only operate nodes and leaves.

PROOF. Immediate from the above discussion. □

Convention. Given an interpretation $(\mathcal{I},d)$, the corresponding lower case italic letter $c, d, \ldots$ denotes the value of the dag $C, D, \ldots$. Thus $c$ is simply convenient shorthand for $V(C)$. □

The following lemma shows that array programs can be directly translated into expressions involving operators and equality conditionals.

LEMMA 4.2. Let $(\mathcal{I},d)$ be an interpretation, and let $C$ be as in Definition 4.1. Let $t$ be a variable taking values from the value set $V$. Then the value of $C$ under $(\mathcal{I},d)$ is the function defined by

\[
\begin{aligned}
c(t) = {} & \text{if } t = e_1 \text{ then } g_1 \\
          & \text{elseif } t = e_2 \text{ then } g_2 \\
          & \quad\vdots \\
          & \text{elseif } t = e_n \text{ then } g_n \\
          & \text{elseif } t = e_{n+1} \text{ then } g_{n+1}
\end{aligned}
\]

where $e_{n+1} = t$ and $g_{n+1} = (d\alpha)(t)$.$^5$

PROOF. Immediate from the definition of the value of a dag in Section 2.3. □

The next lemma characterizes equivalent dags over $(O, \leftarrow, S, A)$, and reduces the equivalence problem to the validity of a simple logical formula. In the lemma we consider a dag $C$ as in Definition 4.1, and another singly rooted dag $D$ defined below.
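The conditional chain of Lemma 4.2 can be compared against executing the updates one after another. The following is a small sketch, assuming the pairs $(e_i, g_i)$ have already been evaluated to concrete values and are listed with $(e_1, g_1)$ first, that is, with the most recent update first; the function names `chain_value` and `sequential` are illustrative.

```python
def chain_value(pairs, base):
    """The function c(t) of Lemma 4.2 for evaluated pairs [(e_1, g_1), ..., (e_n, g_n)]."""
    def c(t):
        for e, g in pairs:       # if t=e_1 then g_1 elseif t=e_2 then g_2 ...
            if t == e:
                return g
        return base(t)           # final case: e_{n+1} = t, g_{n+1} = (d alpha)(t)
    return c

def sequential(pairs, base, domain):
    """Perform the same updates in program order on an explicit array."""
    arr = {t: base(t) for t in domain}
    for e, g in reversed(pairs):  # (e_n, g_n) is the earliest update
        arr[e] = g
    return arr

pairs = [(1, "x"), (2, "y"), (1, "z")]   # the update alpha.1 <- z is overwritten later
base = lambda t: 0
c = chain_value(pairs, base)
arr = sequential(pairs, base, range(4))
assert all(c(t) == arr[t] for t in range(4))
```

Because the chain tests $e_1$ first, a later update to the same index shadows an earlier one, matching the order of execution.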

Definition 4.2. Dags $C$, $E_i$, and $G_i$, $1 \le i \le n$, are as in Definition 4.1. Let $D$, with update node $x_0$ as its root, be a dag over $(O, \leftarrow, S, A)$. For all $j$, $0 \le j \le m-1$, let the sons of $x_j$ be $x_{j+1}$, $y_{j+1}$, and $z_{j+1}$, and let $x_m$ be a leaf with label $\beta$. For all $j$, $1 \le j \le m$, let $F_j$ and $H_j$ be the subdags of $D$ rooted at $y_j$ and $z_j$, respectively. □

LEMMA 4.3. Let dags $C$ and $D$ be as in Definition 4.2. Let $t$ be a variable taking values from $V$, and define $e_{n+1} = t = f_{m+1}$ and $g_{n+1} = (d\alpha)(t) = h_{m+1}$. Then $C \equiv D$ holds if and only if for all $i$, $j$, $1 \le i \le n+1$, $1 \le j \le m+1$, the following condition $K(i,j)$ holds:

$K(i,j)$: For all $(\mathcal{I},d)$, if $e_i = f_j$ then at least one of the following holds:
$e_i = e_1, \ldots, e_i = e_{i-1}$,
$e_i = f_1, \ldots, e_i = f_{j-1}$,
$g_i = h_j$.

PROOF. Suppose $C \equiv D$. Then for all $(\mathcal{I},d)$ and $t$ in the value set $V$, $c(t) = d(t)$. Choose $(\mathcal{I},d)$, $i$, and $j$ arbitrarily, and suppose $e_i = f_j$, but $e_i$ and $f_j$ are not equal to any of $e_1, \ldots, e_{i-1}, f_1, \ldots, f_{j-1}$. Choosing $t = e_i = f_j$, from $c(t) = d(t)$ and Lemma 4.2, it follows that $c(t) = c(e_i) = g_i$, and $d(t) = d(f_j) = h_j$, so $g_i = h_j$ must hold.

For the converse, suppose for all $i$ and $j$, $K(i,j)$ holds. Choose $(\mathcal{I},d)$ and $t$ to be arbitrary. Let $i$ and $j$ be the minimal integers such that $t = e_i$ and $t = f_j$. Such $i$ and $j$ exist since by definition $e_{n+1} = t = f_{m+1}$. By minimality of $i$ and $j$, $e_i$ is distinct from $e_1, \ldots, e_{i-1}$ and $f_j$ is distinct from $f_1, \ldots, f_{j-1}$. Since $t = e_i = f_j$ and $K(i,j)$ holds, we must have $g_i = h_j$. Then $c(t) = g_i = h_j = d(t)$. (Note that the last statement holds even when $i$ is $n+1$ and $j$ is $m+1$.) Since $t$ is arbitrary, $C \equiv D$ follows. □

Lemma 4.3 forms the basis for an equivalence testing algorithm. Given two programs, we merely need to be able to check polynomially many formulas of the form:

for all $(\mathcal{I},d)$, $e = f$ implies $g = h$.

5. Equivalence Algorithms

It is often the case in semantics that to show equivalence of programs under all interpretations, it suffices to check that equivalence holds under a free interpretation. Lemma 5.1 from Aho and Ullman [2] or Culik [9] is such a result.

$^5$ A result similar to Lemma 4.2 holds even when select commands are permitted. Given a dag $C$ over $(O, |, \leftarrow, S, A)$, exploiting the connection between select-update commands and equality conditionals, we can find an equivalent dag $D$, $C \equiv D$, over $(O \cup \{\odot\}, |, S, A)$, where $\odot$ is the equality conditional operator. When no updates take place, an array name just represents an arbitrary one-argument function (chosen of course by $d$). Array names can then be treated just like elements of $O$. Distributing operators in $O$ over equality conditionals, we get a characterization of the dag $C$ similar to the one in Lemma 4.2, except that the equality conditionals have a bushier tree structure than the form in Lemma 4.2.


FIG. 5.1. Trees over operate nodes have been written as prefix expressions to conserve space.

Let $u$ be the root of a dag $D$. The tree for $u$ is formed by unbraiding the subdag at $u$, making copies of all shared subdags. More precisely, for a leaf $u$, $\tau(u) = u$, and if $u$ has label $\theta$ and sons $w_1, \ldots, w_r$, then $\tau(u) = \theta\,\tau(w_1) \cdots \tau(w_r)$.

LEMMA 5.1. Let $u$ and $x$ be nodes in a dag $D$ over $(O,S)$. Then $u \equiv x$ if and only if $\tau(u) = \tau(x)$. □

This lemma is needed for the proof of algorithm correctness; we do not use trees as data structures in the algorithms.

Example 5.1. The trees shown in Figure 5.1 correspond to the programs $\pi_1$ and $\pi_2$ of Example 4.1. In this example, the value of a tree like $\theta\theta ABB$ will be written as $\theta\theta abb$.

On the basis of Lemma 4.3, the trees in Figure 5.1 are strongly equivalent if and only if each of the following implications holds.

For all $(\mathcal{I},d)$:

$K(1,1)$: $(b = a) \supset (\theta ba = \theta ab)$

$K(1,2)$: $(b = b) \supset (b = a) \lor (\theta ba = \theta ba)$

$K(1,3)$: $(b = \theta ab) \supset (b = a) \lor (b = b) \lor (\theta ba = \theta\theta abb)$

$K(1,4)$: $(b = t) \supset (b = a) \lor (b = b) \lor (b = \theta ab) \lor (\theta ba = (d\alpha)(t))$

$K(2,1)$: $(\theta ab = a) \supset (\theta ab = b) \lor (\theta\theta abb = \theta ab)$

$K(2,2)$: $(\theta ab = b) \supset (\theta ab = b) \lor (\theta ab = a) \lor (\theta\theta abb = \theta ba)$

$\ldots$

$K(4,4)$: $(t = t) \supset (t = b) \lor (t = \theta ab) \lor (t = a) \lor (t = a) \lor (t = b) \lor (t = \theta ab) \lor ((d\alpha)(t) = (d\alpha)(t))$

Each implication is found to be valid. □

Note from Lemma 4.2 that all the dags $E_i$, $G_i$, $F_j$, $H_j$ are dags over $(O,S)$. Since trees for these dags play a significant role in an equivalence algorithm, we will introduce some shorthand notation for trees.

Given dags $C, D, \ldots$, with roots $u, v, \ldots$, we write the corresponding boldface letter $\mathbf{c}, \mathbf{d}, \ldots$ for tree $\tau(u)$ for $C$, $\tau(v)$ for $D$, $\ldots$. Collecting our conventions, $\mathbf{c}$ is the tree for $C$, and $c$ is the shorthand for $V(C)$, which always equals $V(\mathbf{c})$. It makes sense to write $V(\mathbf{c})$ since a tree is also a dag.

In order to construct an algorithm on the basis of Lemma 4.3, for trees $\mathbf{e}, \mathbf{f}, \mathbf{g}, \mathbf{h}$, containing operate nodes alone, we need a way of determining whether $e = f$ implies $g = h$ under all $(\mathcal{I},d)$. This is a version of the "uniform word problem" studied in [11,22].

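Definition 5.1. Let $\mathbf{e}, \mathbf{f}, \mathbf{g}, \mathbf{h}$ be trees over $(O,S)$. The relation $\sim_{(\mathbf{e},\mathbf{f})}$ is defined by:

$\mathbf{g} \sim_{(\mathbf{e},\mathbf{f})} \mathbf{h}$ if and only if for all $(\mathcal{I},d)$, $e = f$ implies $g = h$.

If $\mathbf{g} \sim_{(\mathbf{e},\mathbf{f})} \mathbf{h}$ we say that $\mathbf{g}$ and $\mathbf{h}$ are congruent. For nodes $u$ and $x$ in a dag over $(O,S)$ we say that $u \sim_{(\mathbf{e},\mathbf{f})} x$ whenever the trees for $u$ and $x$ are related by $\sim_{(\mathbf{e},\mathbf{f})}$. □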

Before we give an algorithm for detecting congruent subtrees, we need an algorithm for detecting identical subtrees, or "common subexpressions." One solution is to use the value number algorithm of Cocke and Schwartz [6].

Beginning at the leaves, the algorithm works toward the roots, assigning an integer called a value number to each new node encountered. Let node $x$ have label $\theta$ and son nodes $x_1, \ldots, x_r$ ($r$ may be zero if $x$ is a leaf). Then an entry is made on the AVAILABLE COMPUTATIONS LIST of the form $\langle \theta\, VN(x_1) \ldots VN(x_r),\ VN(x) \rangle$. Each operation subsequently encountered is checked against the AVAILABLE COMPUTATIONS LIST to determine whether or not an identical operation has previously been performed; if so, the operation in question is redundant, and its node receives the previously recorded value number; if not, then a new value number is created, and a new entry is inserted in the AVAILABLE COMPUTATIONS LIST.

The simplest form of this algorithm is shown in Figure 5.2. A refinement usually employed is to introduce a hash table, which when accessed by keys of the form $(\theta, l_1, \ldots, l_r)$, retrieves a pointer to this entry in the AVAILABLE COMPUTATIONS LIST. With these modifications, the algorithm will run in time $O(\Psi(n))$ where $\Psi(n)$ is the

comment'/* Input" A dag D over (O,S) Node x m D has operator symbol OP(x) and pointers to its sons m left to

right order. Output" An array VN, where V N ( x ) = V N ( y ) tff the expressions computed by these nodes are identical Structures: N X ( D ) enumerates nodes of D m topological order, starting with the leaves It returns null

when there are no more nodes ACL is the avadable computations list. COUNT is an integer used to provide value numbers */

lmtlalize VN to zero; COUNT.=0; ACL = n u l l ,

call MK(D,ACL,COUNT,VN) procedure MK(D,ACL,COUNT,VN)

x.~NX(D), while x~null do

o FOP(x), let /1, • • , I r be the VNs of the sons of x , left to right /* search loop */ FOUND = f a l s e ;

if the ACL has an entry 011, • . • ,Ir,

then return the value number / assocmted with this entry and set FOUND to true

if FOUND then VN(x)'=I else {

COUNT.=COUNT+I, VN(x)-=COUNT, insert item < O l l , • . ,Ir,VN(x) > on ACL }

x:~NX(D); end

end MK

FtG 5.2 Value number algorithm

Assignment Commands with Array References 663

comment /* Input x e and xf are the roots of dags E,F with trees e,f over (O,S) F is assumed to have at least as

many nodes as E. D Js the dag to be marked (E and F may be subdags of D) Output Array VN such that VN(x)=VN(y) if and only if x ~(e,f)Y

*/ procedure CMK (D ,xe,X f )

mltmhze VN to zero, COUNT.=0, ACL =null, call MK(E,ACL,COUNT,VN), call MK(F,ACL,COUNT,VN), alter the item on ACL for xf to record a value number equal to VN(xe);

VN (xf) =VN (x e), /*The only two items on ACL with the same value are those corresponding to x e and xf */ call MK(D,ACL,COUNT,VN) end CMK

FIG 5 3 Algorithm to detect congruent nodes

time to retrieve n items from the hash table, and n=lD I. Of course, ~t'(n)=O(n 2) in the worst case, but typical hashing methods yield an expected value of O(n) for qt (n) , as long as the hash table is not too full. By using this algorithm, equivalence of programs over (O,S) can be tested in O(*(n)) ume. The same algorithm serves to solve the equivalence problem for dags over (O,I ,S ,A) in time OOI'(n)) time, for if arrays are never updated, they may be regarded as functions of one argument.

A variation of the value number algorithm may be used to determine all congruent nodes of a dag, subject to e=f. The algorithm is given in Figure 5.3.
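As a rough illustration of how this variation might look in code, the following Python sketch mirrors the steps of CMK: number E, number F, force the root of F to share the value number of the root of E, then number D against the same table. The encoding is an assumption made here, not part of the paper: D is a topologically ordered list of `(operator, sons)` pairs, E and F are given as lists of node indices of D with the root last, and the names `mk` and `cmk` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, Tuple[int, ...]]  # (operator symbol, indices of sons)


def mk(dag: List[Node], order: Iterable[int],
       acl: Dict[Tuple, int], vn: Dict[int, int], count: int) -> int:
    """Value-number the nodes in `order` (sons before parents); return the new COUNT."""
    for x in order:
        op, sons = dag[x]
        key = (op,) + tuple(vn[s] for s in sons)
        if key not in acl:
            count += 1
            acl[key] = count
        vn[x] = acl[key]
    return count


def cmk(dag: List[Node], e_nodes: List[int], f_nodes: List[int]) -> Dict[int, int]:
    """Assumed inputs: e_nodes and f_nodes list the nodes of E and F, root last."""
    acl: Dict[Tuple, int] = {}
    vn: Dict[int, int] = {}
    count = mk(dag, e_nodes, acl, vn, 0)
    count = mk(dag, f_nodes, acl, vn, count)
    xe, xf = e_nodes[-1], f_nodes[-1]
    op, sons = dag[xf]
    acl[(op,) + tuple(vn[s] for s in sons)] = vn[xe]  # alter the ACL item for xf
    vn[xf] = vn[xe]
    mk(dag, range(len(dag)), acl, vn, count)          # mark all of D
    return vn


# D contains a, b, f(a), g(b), g(f(a)), g(a); take e = b and f = f(a).
D = [("a", ()), ("b", ()), ("f", (0,)), ("g", (1,)), ("g", (2,)), ("g", (0,))]
vn = cmk(D, e_nodes=[1], f_nodes=[0, 2])
print(vn[3] == vn[4], vn[3] == vn[5])
```

Under the assumption b = f(a), the nodes for g(b) and g(f(a)) end up with the same value number, while g(a) keeps a distinct one.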

LEMMA 5.2. At the termination of Algorithm CMK on dag D,

VN(y)=VN(z) if and only if y ~(e,f) z.

PROOF. By induction on the sum of the heights of nodes y and z. The result is clearly true if nodes y and z are leaves. Assume for nodes y and z that the result is true of their sons. Nodes y and z are given the same value number by CMK iff either (i) OP(y)=OP(z) and the corresponding sons of y and z have identical value numbers, or (ii) OP(y)=OP(xe), corresponding sons of y and xe have identical value numbers, OP(z)=OP(xf), and corresponding sons of z and xf have identical value numbers.

By the inductive hypothesis, these cases yield (i) OP(y)=OP(z) and the corresponding sons of y and z are congruent, or (ii) OP(y)=OP(xe), corresponding sons of y and xe are congruent, OP(z)=OP(xf), and corresponding sons of z and xf are congruent. This condition for the two cases is equivalent to y ~(e,f) z. []

Repeated applications of Algorithm CMK will suffice to test almost all the conditions K(i,j). Only the conditions K(i,j) with i=n+1 or j=m+1, involving α, remain. The following lemma shows that these conditions can easily be dealt with.

LEMMA 5.3. Let dags C and D be as in Definition 4.2. Then the conditions K(n+1,j), 1≤j≤m+1, and K(i,m+1), 1≤i≤n+1, all hold if and only if α=β and the sets of index trees {e_1,...,e_n} and {f_1,...,f_m} are identical.

PROOF. Direct from the conditions. []

Basically, the above lemma says that the two dags must update the same elements, and the arrays must be identical outside the elements updated. The only way the arrays can agree everywhere else is for them to be the same array initially.
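One way to picture the check suggested by Lemma 5.3 is the small Python sketch below. It assumes that value numbers for the index roots have already been computed by a value number pass, so that identical index trees carry equal numbers. The function name `same_updates`, its parameters, and the sample value numbers are hypothetical and serve only as an illustration.

```python
from typing import Dict, Hashable, Iterable


def same_updates(alpha: Hashable, beta: Hashable,
                 e_roots: Iterable[int], f_roots: Iterable[int],
                 vn: Dict[int, int]) -> bool:
    """True iff the initial arrays agree and the two sets of index trees coincide."""
    if alpha != beta:                      # the arrays must be the same initially
        return False
    # Identical index trees have equal value numbers, so compare the sets of numbers.
    return {vn[x] for x in e_roots} == {vn[x] for x in f_roots}


# Hypothetical value numbers for index roots 10, 11 (in C) and 20, 21, 22 (in D).
vn = {10: 1, 11: 2, 20: 2, 21: 1, 22: 3}
print(same_updates("A", "A", [10, 11], [20, 21], vn))
print(same_updates("A", "A", [10, 11], [20, 22], vn))
```

The first call succeeds because both dags update the elements numbered 1 and 2; the second fails because D updates an element (value number 3) that C does not.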

The above observations are collected in Figure 5.4.

THEOREM 5.1. Algorithm TEST correctly decides whether C≡D in time O(k^2 Ψ(k)), where k=|C|+|D|.

PROOF. The correctness follows directly from Lemmas 4.3, 5.2, and 5.3.


comment /* Input: Dags C and D over (O,←,S,A) as in Definition 4.2. Nodes xe_i, xg_i, xf_j, xh_j are the roots of subdags over (O,S) with associated trees e_i, g_i, f_j, h_j, respectively. Let U be the union of E and F without the update nodes. Thus U is a dag over (O,S).
   Output: If C≡D then true else false. */

procedure TEST(C,D)
   /* Check that α=β and {e_1,...,e_n} = {f_1,...,f_m}. */
   0. if α≠β then return(false);
   1. Use the value number algorithm MK on U to determine the strongly equivalent nodes;
   2. Check that for every node in {xe_1,...,xe_n} there is a strongly equivalent node in {xf_1,...,xf_m}, and conversely. If not, return(false);

   /* Check that all the conditions K(i,j) hold */
   for all (i,j), 1≤i≤n, 1≤j≤m, do
      FLAG:=false;
      3. call CMK(U,xe_i,xf_j);
      4. if there exists z ∈ {xe_1,...,xe_(i-1),xf_1,...,xf_(j-1)} such that z ~(e,f) xe_i, or xg_i ~(e,f) xh_j
            then FLAG:=true;
      if FLAG=false then return(false)
   end
   /* This point is reached if and only if all K(i,j) are true. */
   return(true)
end TEST

FIG. 5.4. Equivalence testing algorithm

Line 1 of the algorithm takes O(Ψ(k)) time. Bucketing the nodes xe_1, ..., xe_n, xf_1, ..., xf_m, and checking that each bucket with nodes has at least one node from each of C and D, will ensure that the time for line 2 is bounded by O(k). Each execution of line 3 takes O(Ψ(k)), and line 4 can be implemented in constant time, so the loop is bounded by O(k^2 Ψ(k)). []

When operate commands are excluded, we no longer need to check for congruent expressions, so equivalence can be tested faster.

THEOREM 5.2. Let C and D be dags over (O,←,S,A), and let k=|C|+|D|. Equivalence can be decided in time O(k^2).

PROOF. For each of the at most O(k^2) pairs (i,j), deciding K(i,j) takes constant time since CMK need not be called in the loop in Figure 5.4, and MK need not be called in line 1.

6. Conclusion

In work on the relationships between different classes of flowchart schemata it has been noted that arrays are a "powerful" construct. Constable and Gries [7] show that their class of schemes with arrays properly includes their class of schemes allowing recursive functions. Johnson [15] considers a language with arrays but no conditional branching. Johnson gives results "characterizing selection in array references as at least as powerful computationally as conditional branching in programs."

In this paper we have focused on programs with no looping or branching but, as the examples in Section 1 show, array references give rise to a limited form of conditional assignment. A dag model for straight line programs with operators and array assignments has been introduced. Two such programs or dags are equivalent if their behavior is the same under all interpretations.


TABLE I

Problem: equivalence of dags C, D with k = |C| + |D|

Commands allowed                                      Time complexity
operate commands only (O,S)                           O(Ψ(k))
operate and select commands (O,I,S,A)                 O(Ψ(k))
update commands only (←,S,A)                          O(k^2)
operate and update commands (O,←,S,A)                 O(k^2 Ψ(k))
update and select commands (←,I,S,A)                  NP-hard
update, select, and operate commands (O,←,I,S,A)      NP-hard

Since select and update commands can simulate equality conditionals, the general problem of equivalence proves to be hard. Operate and update commands lead to a substitution phenomenon that is explored in Sections 4 and 5. The results are summarized in Table I. When more elaborate algorithms are employed to avoid hashing, the factor Ψ(k) can be improved to k in each case. Such algorithms are explored in [11].

ACKNOWLEDGMENTS. Comments by A.V. Aho, D.B. Johnson, S.C. Johnson, and M.D. McIlroy are appreciated.

REFERENCES

1. AHO, A.V., HOPCROFT, J.E., AND ULLMAN, J.D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.

2. AHO, A.V., AND ULLMAN, J.D. Optimization of straight line programs. SIAM J. Computing 1, 1 (March 1972), 1-19.

3. AHO, A.V., AND ULLMAN, J.D. Equivalence of programs with structured variables. JCSS 6, 2 (April 1972), 125-137.

4. BOYER, R.S., ELSPAS, B., AND LEVITT, K.N. SELECT -- A formal system for testing and debugging programs by symbolic execution. International Conference on Reliable Software, April 1975, pp. 234-245.

5. BURSTALL, R.M. Semantics of assignment. Machine Intelligence 2, American Elsevier, New York, N.Y., 1968, pp. 3-20.

6. COCKE, J., AND SCHWARTZ, J.T. Programming Languages and Their Compilers, Preliminary Notes, Second Revised Version. Courant Institute of Mathematical Sciences, New York, N.Y., April 1970.

7. CONSTABLE, R.L., AND GRIES, D. On classes of program schemata. SIAM J. Computing 1, 1 (March 1972), 66-118.

8. CONSTABLE, R.L., HUNT, H.B. III, AND SAHNI, S. On the computational complexity of scheme equivalence. 8th Annual Princeton Conference on Information Sciences and Systems, March 1974, pp. 15-20.

9. CULIK, K. Combinatorial problems in the theory of complexity of algorithmic nets without cycles for simple computers. Aplikace Matematiky 16 (1971), 188-202.

10. DE BAKKER, J.W. Correctness proofs for assignment statements. Mathematisch Centrum Report IW 55/76, March 1976.

11. DOWNEY, P.J., SAMET, H., AND SETHI, R. Off-line and on-line algorithms for deducing equalities. Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, Jan. 1978, pp. 158-170.

12. HOARE, C.A.R., AND WIRTH, N. An axiomatic definition of the programming language Pascal. Acta Informatica 2 (1973), 335-355.

13. HOFFMANN, C.M., AND LANDWEBER, L.H. A completeness theorem for straight-line programs with structured variables. J. ACM 23, 1 (Jan. 1976), 203-220.

14. IGARASHI, S., LONDON, R.L., AND LUCKHAM, D.C. Automatic program verification I: A logical basis and its implementation. Acta Informatica 4, 2 (1975), 145-182.

15. JOHNSON, D.B. On the power of arrays in universal languages. Seventh Annual Princeton Conference on Information Sciences and Systems, March 1973, pp. 292-296.

16. KAPLAN, D.M. Some completeness results in the mathematical theory of computation. J. ACM 15, 1 (Jan. 1968), 124-134.

17. KING, J.C. Symbolic execution and program testing. Comm. ACM 19, 7 (July 1976), 385-394.

18. LADNER, R.E., LYNCH, N.A., AND SELMAN, A.L. A comparison of polynomial time reducibilities. Theoretical Computer Science 1 (1975), 103-123.

19. LUCKHAM, D.C., AND SUZUKI, N. Automatic program verification V: Verification-oriented proof rules for arrays, records and pointers. Stanford AI Lab Memo AIM-278, March 1976.

20. MANNA, Z. Mathematical Theory of Computation. McGraw-Hill, New York, N.Y., 1974.

21. MCCARTHY, J. Towards a mathematical science of computation. IFIP 62, 1962, pp. 21-28.

22. NELSON, G., AND OPPEN, D.C. Fast decision algorithms based on union and find. 18th Annual Symposium on Foundations of Computer Science, Oct. 1977, pp. 114-119.

23. OPPEN, D.C., AND COOK, S.A. Proving assertions about programs that manipulate data structures. 7th Annual ACM Symposium on Theory of Computing, May 1975, pp. 107-116.

24. PARK, D. Some semantics of data structures. Machine Intelligence 3, American Elsevier, New York, N.Y., 1968, pp. 351-371.

25. SETHI, R. Conditional expressions with equality tests. J. ACM 25, 4 (Oct. 1978), 667-674.

26. VAN LEEUWEN, J. What makes some simple program optimization problems hard. Tech. Rep. 206, Computer Science Dept., The Pennsylvania State University, University Park, Pa., Aug. 1976.

RECEIVED AUGUST 1976, REVISED FEBRUARY 1978

Journal of the Association for Computing Machinery, Vol 25, No 4, October 1978
