Safety and Translation of Relational Calculus Queries
ALLEN VAN GELDER
University of California at Santa Cruz
and
RODNEY W. TOPOR
The University of Melbourne
Not all queries in relational calculus can be answered sensibly when disjunction, negation, and
universal quantification are allowed. The class of relational calculus queries or formulas that have
sensible answers is called the domain independent class, which is known to be undecidable.
Subsequent research has focused on identifying large decidable subclasses of domain independent
formulas. In this paper we investigate the properties of two such classes: the evaluable
formulas and the allowed formulas. Although both classes have been defined before, we give
simplified definitions, present short proofs of their main properties, and describe a method to
incorporate equality.
Although evaluable queries have sensible answers, it is not straightforward to compute them
efficiently or correctly. We introduce relational algebra normal form for formulas, from which
form the correct translation into relational algebra is trivial. We give algorithms to transform
an evaluable formula into an equivalent allowed formula and from there into relational algebra
normal form. Our algorithms avoid use of the so-called Dom relation, consisting of all constants
appearing in the database or the query.
Finally, we describe a restriction under which every domain independent formula is evaluable
and argue that the class of evaluable formulas is the largest decidable subclass of the domain
independent formulas that can be efficiently recognized.
Categories and Subject Descriptors: F.4.1 [Mathematical Logic and Formal Systems]: Mathematical
Logic–model theory; H.2.3 [Database Management]: Languages–query languages;
H.2.3 [Database Management]: Systems–query processing
General Terms: Algorithms, Languages, Theory
Additional Key Words and Phrases: Allowed formulas, domain independence, evaluable formulas,
existential normal, query translation, relational algebra, relational calculus
A preliminary version of this paper was presented at the Sixth ACM Symposium on Principles of
Database Systems, 1987. The research of Allen Van Gelder was supported in part by a grant
from the IBM Corporation and NSF grants IST-84-12791, CCR-8958590, and IRI-8902287.
Authors’ Addresses: R. W. Topor, Department of Computer Science, The University of Melbourne,
Parkville, Victoria 3052, Australia; A. Van Gelder, Baskin Computer Science Center,
University of California, Santa Cruz, CA 95064.
Permission to copy without fee all or part of this material is granted provided that the copies are
not made or distributed for direct commercial advantage, the ACM copyright notice and the title
of the publication and its date appear, and notice is given that copying is by permission of the
Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or
specific permission.
© 1991 ACM 0362-5915/91/0300-0235 $01.50
ACM Transactions on Database Systems, Vol 16, No 2, June 1991, Pages 235-278.
1. INTRODUCTION
With the increased interest in development of deductive database systems
and integration of logic programming languages such as Prolog with relational
database systems, it has become more important that relational query
systems be able to handle a wider range of relational calculus formulas
correctly and efficiently. In particular, disjunction, negation, and universal
quantification over subformulas, which are excluded from the class of
conjunctive queries [26], should be available. Current “industrial strength”
implementations handle the class of conjunctive queries well but leave much
to be desired in the areas mentioned; we shall give an example later. In
defense of these implementations, we should point out that the large majority
of queries posed by typical users to traditional databases fall into the class of
conjunctive queries.
However, in sophisticated systems of the future we envision the queries
often being generated not by the user typing them in at the terminal but by a
layer of software positioned between the user and the relational database
system. This software will access a large set of deductive rules in addition to
the user’s query in order to construct relational calculus formulas. The Nail!
project at Stanford University [18] and the LDL project at MCC [24, 19] are
two examples of research projects headed in this direction. Such software-
generated queries will surely be nonconjunctive and much more complex
than those originating directly from users; they will require correspondingly
more general back-end translators.
Another important requirement for such systems is the ability to distinguish
reasonable queries (those that can be answered sensibly) from unreasonable
ones. If the literal interpretation of a given query is not reasonable,
we argue that it is better to reject the query than to attempt to guess the
intended interpretation by transforming the query into a nonequivalent one.
This issue is addressed in Examples 2.1 and 5.4.
A preliminary version of this work was presented in a conference [27].
2. PROBLEM STATEMENT AND BACKGROUND
In this paper we shall be concerned with two main questions.
(1) Which relational calculus queries can be answered sensibly?
(2) How can such queries be answered?
For our purposes, answering a query to a database is equivalent to evaluating
a relational calculus formula with respect to an interpretation. By
“sensible” we mean that values in any correct answer are limited to constants
that occur in the query or in database relations that occur in the
query.
Not all queries in relational calculus can be answered sensibly. Two simple
examples that cannot be answered sensibly are
$$F(x) \stackrel{\mathrm{def}}{=} \neg Px$$
$$G(x, y) \stackrel{\mathrm{def}}{=} Px \lor Qy$$
where P and Q are database relations.¹ F(x) holds for arbitrary x’s that are
not in the database, and G(x, y) holds for arbitrary y values when Px is
true, and vice versa.
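The dependence of F on the underlying domain can be made concrete with a small brute-force evaluation. The following is only an illustrative sketch: the relation contents and the two domains are made-up assumptions, not data from the paper.

```python
# Hypothetical toy relation P; its contents are an assumption for illustration.
P = {("a",), ("b",)}

def answer_not_p(domain):
    """Evaluate {x | not P(x)} by brute force over an explicit domain."""
    return {(x,) for x in domain if (x,) not in P}

# Two domains that both contain every constant of the database.
small_domain = {"a", "b", "c"}
large_domain = {"a", "b", "c", "d"}

# The answer changes with the domain, so the query has no "sensible" answer.
print(answer_not_p(small_domain))
print(answer_not_p(large_domain))
```

The database is identical in both runs; only the assumed domain differs, which is exactly the behavior a domain independent formula must not exhibit.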
In the following section, we describe previous attempts to characterize
those classes of queries that can be answered sensibly.
Evaluation of relational calculus queries can be performed either by translation
into a set of clauses suitable for a Prolog interpreter [14, 23, 5] or by
translation into a relational algebra expression. Here, we are concerned
solely with the second approach.
Translation of a relational calculus query that includes disjunction and/or
negation is a theoretically solved problem, provided the query is “safe” in the
original sense of Ullman [26], as shown there. However, the practical difficulties
are such that several commercial database query systems give intuitively
unexpected results on such queries.
Example 2.1. Consider the following SQL query.
    select P.name
    from P, Q, R
    where P.name = Q.name
    or P.name = R.name;
The intended interpretation of this query (ignoring attributes other than
name) is clearly the following relational calculus formula.
$$P(name) \land (Q(name) \lor R(name)).$$
Surprisingly, if relation R is empty, SQL returns the answer nil, even if
there are matches between P and Q. QUEL behaves similarly, as do other
systems whose query language is derived from QUEL [25].
Even though SQL and QUEL may be defined to behave in this way, we
regard such behavior as confusing and incorrect. We address the problem of
the correct translation and evaluation of such queries below.
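The behavior described in Example 2.1 can be reproduced with any SQL engine that evaluates the `from` clause as a cross product. The sketch below uses Python's standard `sqlite3` module; the table contents and the helper name `run_example` are assumptions made for illustration, and the second query is just one possible way to express the intended formula.

```python
import sqlite3

def run_example(r_rows):
    """Run the literal SQL query and one rendering of the intended formula."""
    db = sqlite3.connect(":memory:")
    cur = db.cursor()
    for table in ("P", "Q", "R"):
        cur.execute(f"create table {table} (name text)")
    cur.executemany("insert into P values (?)", [("a",), ("b",)])
    cur.executemany("insert into Q values (?)", [("a",)])
    cur.executemany("insert into R values (?)", [(r,) for r in r_rows])
    # The query as written in Example 2.1 (cross product of P, Q and R).
    literal = cur.execute(
        "select distinct P.name from P, Q, R "
        "where P.name = Q.name or P.name = R.name order by P.name"
    ).fetchall()
    # P(name) AND (Q(name) OR R(name)), written with subqueries.
    intended = cur.execute(
        "select distinct name from P "
        "where name in (select name from Q) "
        "or name in (select name from R) order by name"
    ).fetchall()
    db.close()
    return literal, intended

print(run_example([]))     # R empty: the two readings disagree
print(run_example(["b"]))  # R nonempty: the two readings agree
```

With R empty the cross product is empty, so the literal query returns nothing even though "a" occurs in both P and Q.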
2.1 What Are the Problems?
Conjunctive query formulas are those that use only $\land$ and $\exists$. (Equality can be
represented in conjunctive queries by repetition of variables and substitution
of constants; for simplicity, we do not consider “built-in” predicates such as
<, >, etc.) The translation of such a formula into an equivalent relational
algebra expression is straightforward and well-known. Informally,
$A(u, v, w, x) \land B(u, v, y, z)$ becomes an equi-join on the columns of u and v,
and $\exists x\, A(x, y, z)$ becomes a projection that eliminates the column for x.
Essentially, all such formulas can be translated.
¹Technically, the queries are written as $\{x \mid \neg Px\}$ and $\{x, y \mid Px \lor Qy\}$, but the association
between named formulas and queries is clear.
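The informal translation of conjunctive formulas can be sketched with relations modeled as Python sets of tuples. The function names and sample data below are hypothetical; the sketch covers only the two cases mentioned in the text.

```python
def equi_join_uv(A, B):
    """A(u, v, w, x) AND B(u, v, y, z): join on the shared columns u and v."""
    return {
        (u, v, w, x, y, z)
        for (u, v, w, x) in A
        for (u2, v2, y, z) in B
        if (u, v) == (u2, v2)
    }

def project_out_first(A):
    """EXISTS x A(x, y, z): drop the column for x, keeping (y, z)."""
    return {row[1:] for row in A}

# Made-up sample relations.
A = {(1, 2, "w1", "x1"), (3, 4, "w2", "x2")}
B = {(1, 2, "y1", "z1"), (9, 9, "y2", "z2")}

print(equi_join_uv(A, B))
print(project_out_first({(1, "y", "z"), (2, "y", "z")}))
```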
The situation changes when we introduce disjunction and/or negation. We
intend to handle disjunction algebraically by union and handle negation by
set difference. For example, $Pxy \lor Qxy$ can be evaluated by $P \cup Q$ and
$Pxy \land \neg Qxy$ can be evaluated by P diff Q. More generally, to have a simple
representation in relational algebra, both operands of “$\lor$” must have the
same variables, and negations must appear in the form $A \land \neg B$ where the
free variables of B are a subset of those of A [26].
Queries that do not satisfy these conditions possibly cannot be answered
sensibly, as illustrated by the two earlier examples:
$$F(x) \stackrel{\mathrm{def}}{=} \neg Px$$
$$G(x, y) \stackrel{\mathrm{def}}{=} Px \lor Qy$$
The two problems here, which are the main problems aside from handling
equality, are
—the terms of a disjunction do not have the same set of free variables and
—a variable in a negative atomic formula is not limited in its range by a
positive atomic formula elsewhere in the formula.
Once we develop tools to handle these problems, then universal quantifiers
will not present any new problems; we will be able to rewrite $\forall x$ as $\neg \exists x \neg$ at
the appropriate moment.
The situation is really more complicated than it might appear at first
glance because the problem in a subformula can often be cured by some other
part of the overall formula. Thus, even though the query
$$G(x, y) \stackrel{\mathrm{def}}{=} Px \lor Qxy$$
is not reasonable, because it is true for arbitrary y values when Px is true,
the query
$$F(x) \stackrel{\mathrm{def}}{=} \exists y\, G(x, y) \equiv \exists y\,(Px \lor Qxy)$$
is reasonable. The naive translation into $\pi_1(P \cup Q)$, where $\pi_1$ means “project
onto column 1,” presents problems because the expression $P \cup Q$ is
undefined. However, in this case, F(x) has an equivalent form,
$$F(x) \equiv Px \lor \exists y\, Qxy$$
for which the naive translation $P \cup \pi_1(Q)$ is correct.
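As a quick sanity check of this "cure", the sketch below compares a brute-force evaluation of the quantified formula over explicit domains with the algebraic expression built from union and projection. The relation contents and domains are assumptions chosen for illustration; each domain is nonempty and contains all database constants.

```python
# Made-up database: P is unary, Q is binary.
P = {("a",)}
Q = {("b", "c")}

def brute_force(domain):
    """Evaluate {x | EXISTS y (P(x) OR Q(x, y))} over an explicit domain."""
    return {
        (x,)
        for x in domain
        if any((x,) in P or (x, y) in Q for y in domain)
    }

def algebra():
    """Evaluate P UNION project_1(Q) without referring to any domain."""
    return P | {(x,) for (x, _y) in Q}

for domain in ({"a", "b", "c"}, {"a", "b", "c", "d", "e"}):
    print(sorted(brute_force(domain)), sorted(algebra()))
```

Both domains give the same answer, and that answer agrees with the algebraic expression, whereas the unquantified G(x, y) would grow with the domain.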
Our goal is to develop a systematic method to distinguish the curable
problems, such as the above, from the uncurable ones, such as $\exists y\,(Px \lor Qy)$,
and to provide correct transformations for the curable ones.
2.2 Previous Work
There have been several attempts to define a class of “reasonable” queries,
i.e., a class with the following desirable properties:
–The constants in the database and the query provide a sufficient domain
for the values in the answer. Formulas with this property are called
domain independent [9, 16, 28].
–There is an efficient way to decide if the query is domain independent and,
if so, to translate the relational calculus formulas into an equivalent
relational algebra expression. Formulas with this property are often called
“safe”, although this term has been used with several definitions.
–There is an efficient way to evaluate the resulting relational algebra
expression.
The class of conjunctive queries has these properties, as shown by Ullman
[26], but this class is rather limited. The class of domain independent
formulas [9, 16], by definition the largest class having the first property listed
above, was shown by Nicolas and Demolombe [21] to be equivalent to the
class of definite formulas defined by Kuhns [12]. This class was shown to be
not recursive by DiPaola [8] and Vardi [28].
The class of safe formulas was introduced by Ullman [26] to characterize
those relational calculus formulas that had equivalent relational algebra
expressions. This class was not, however, defined purely syntactically and, in
effect, Ullman proved that every domain independent formula had an equivalent
relational algebra expression. The translation required the construction
of the Dom relation consisting of all constants in the query and in database
relations that occurred in the query. A rather narrower, and syntactic,
definition of safe appears in Ullman’s revised edition [25]. To clarify which
definition is under discussion, we shall use “semantically safe” for the
original term [26], and “syntactically safe” for the newer one [25]. (Still other
definitions of “safety” have appeared; Kifer compares several of them [11].)
Other researchers have proposed decidable subclasses of the domain independent
formulas. These subclasses include the range separable formulas [4],
the range restricted formulas [20, 5], the evaluable formulas [6], the allowed
formulas [23], and the syntactically safe formulas [25]. We define (some of)
these classes below, as we discuss them.
Of these classes, the evaluable formulas comprise the largest class, but the
definition of this class by Demolombe [6] occupies three pages; its complex
definition makes it unwieldy to work with, as evidenced by the fact that it
required ten pages just to prove that it is a subclass of the domain independent
formulas; moreover, there is no attempt there to describe how to
evaluate evaluable formulas, i.e., how to translate them into relational
algebra expressions.
The allowed formulas are a strict subclass of the evaluable formulas. They
are easier to recognize and easier to translate into relational algebra. The
syntactically safe formulas [25] are a strict subset of the allowed formulas
[23], provided that both classes treat equality alike.
The range restricted formulas comprise the evaluable formulas that are in
disjunctive normal form or conjunctive normal form [6]. In an important step
toward practical evaluation, Decker [5] extended the class of range restricted
formulas and showed how to transform any range restricted formula into an
equivalent² range form that is suitable for Prolog-style “tuple at a time”
evaluation.
Taking another approach, Aylamazan et al. have shown that even queries
that are not domain independent with potentially infinite answers can be
presented finitely, provided the database relations are finite [2]. The method
involves forming a $Dom^+$ relation consisting of the Dom relation together
with k + 1 new symbols where the query has k occurrences of equality. The
query is evaluated along standard lines [26] except that the extended domain
$Dom^+$ is used instead of Dom. This very interesting theoretical result does
not appear to be practical to implement in its current form due to the
necessity for enumerating $Dom^+$ but merits further study.
Another alternative approach is to add set constructors to an algebra-based
query language and to restrict the possible expressions to ensure domain
independence. This approach has been explored by several researchers
[13, 3, 22].
3. SUMMARY OF RESULTS
In this paper we give a much simpler definition of the evaluable formulas.
With this simpler definition, it is easier to prove properties of the evaluable
class and to see the relationship between allowed formulas and evaluable
formulas. We show that the evaluable class is invariant under a set of
well-known equivalences that can be used as rewrite rules (e.g., DeMorgan’s
laws) which we call conservative transformations. This invariance makes it
easy to see that every evaluable formula can be conservatively rewritten in
prenex-literal normal form (Definition 4.2). However, the evaluable property
is not always preserved under distribution of $\land$ over $\lor$ or $\lor$ over $\land$. Using
distribution is apparently a necessary step to put certain formulas into an
equivalent form that can be “transliterated” into relational algebra. (For
example, the reader is invited to try to construct a relational algebra
implementation of $Pxy \land (Qx \lor Ry)$ without using the distributive law.) This is our
motivation for transforming evaluable formulas into allowed formulas which
are invariant under distribution.
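To illustrate why distribution helps, the following sketch evaluates Pxy ∧ (Qx ∨ Ry) after distributing ∧ over ∨, so that each disjunct becomes a selection of P by a unary relation and the two results can be combined by an ordinary union. The helper names and sample data are assumptions for illustration only.

```python
def semijoin(P, col, U):
    """Keep the tuples of P whose column `col` appears in unary relation U."""
    return {row for row in P if (row[col],) in U}

def distributed(P, Q, R):
    """(Pxy AND Qx) OR (Pxy AND Ry): both operands have columns (x, y)."""
    return semijoin(P, 0, Q) | semijoin(P, 1, R)

def direct_reading(P, Q, R):
    """Pxy AND (Qx OR Ry), read off the formula tuple by tuple."""
    return {(x, y) for (x, y) in P if (x,) in Q or (y,) in R}

# Made-up sample relations.
P = {(1, 2), (3, 4), (5, 6)}
Q = {(1,)}
R = {(4,)}

print(distributed(P, Q, R))
```

Before distribution the subformula Qx ∨ Ry has operands with different variables, so it has no direct counterpart as a union; after distribution both operands of the union share the columns of P.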
One of our main results is an algorithm that transforms any evaluable
formula into an equivalent allowed formula.
²By equivalent we shall always mean logically equivalent.
Another main result is that every allowed formula can be effectively
translated into an equivalent relational algebra expression.
At this point we should mention two properties of formula transformations
(either into other formulas or into relational algebra expressions) that we
consider unacceptable and wish to avoid. The first property is that the
transformation does not necessarily produce a logically equivalent formula
but is only guaranteed to do so if the input formula is in a certain class (such as
the domain independent class). This puts the burden on the user of providing
a correct input formula or getting erroneous results with no warning. The
second unacceptable method is to explicitly form the Dom relation defined
above. Both these drawbacks are present, for example, in the rewrite rule:
$$\neg Pxy \;\Rightarrow\; Dom(x) \land Dom(y) \land \neg Pxy \;\Rightarrow\; Dom \times Dom - P.$$
Both our transformation algorithms have the attractive property that such
tactics are not required. The transformation from syntactically safe formulas
to relational algebra expressions given by Ullman [25] also avoids such
tactics.
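For contrast, the Dom-based rewrite rule that we wish to avoid can be sketched as follows. The helper names and data are hypothetical, and the sketch collects constants only from the relations passed in (plus any query constants supplied by the caller).

```python
from itertools import product

def dom_of(relations, query_constants=()):
    """Collect every constant in the given relations and in the query."""
    values = set(query_constants)
    for rel in relations:
        for row in rel:
            values.update(row)
    return values

def not_p_via_dom(P):
    """Evaluate NOT P(x, y) as Dom x Dom - P, as in the rewrite rule."""
    dom = dom_of([P])
    return set(product(dom, repeat=2)) - P

# Made-up sample relation.
P = {("a", "b"), ("b", "c")}
result = not_p_via_dom(P)
print(len(result))
```

The intermediate product has |Dom|² tuples regardless of how small the final answer is, and the output is accepted silently even though ¬Pxy on its own is not domain independent.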
Finally, we argue that the class of evaluable formulas is the largest
practical subclass of domain independent formulas in a certain sense. Essentially,
the domain independent class is not recursive because a given formula
may have a subformula that is superficially not domain independent, but is
unsatisfiable, hence is actually domain independent (vacuously).³ However,
formulas in which no predicate symbol is repeated cannot possibly have
unsatisfiable subformulas. We show that formulas in this class are evaluable
if and only if they are domain independent. This means that any extension to
the class of evaluable formulas that remains domain independent must at
least provide for simplifications based on common subexpressions (e.g.,
subsumption tests) and should probably include some form of inference capability
(e.g., resolution). Any nontrivial capabilities in this direction may make
the procedure too expensive to serve in a general purpose formula processor.
4. NOTATION AND DEFINITIONS
We assume the reader is familiar with the standard notation and terminology
of logic, relational calculus, and relational algebra [17, 26]. We shall
abbreviate “first order well formed formula” to formula, and “atomic formula”
to atom. A literal is either an atom or a negated atom. We assume the
absence of function symbols (other than constants) throughout.
We distinguish two classes of predicates that appear in formulas which we
call edb predicates and restriction predicates.
Definition 4.1. An edb predicate is either a predicate that represents a
database relation or a built-in predicate that represents a finite relation.
A restriction predicate is a built-in predicate whose relation is infinite and
the complement of whose relation is infinite in an infinite domain.
³The situation is not this simple, but this is the central idea.
For example, “unary equality”, represented by atoms of the form x = c
where x is a variable and c is a constant, must be interpreted by a relation of
one tuple, so is edb. However, x = y and $x \neq y$, where x and y are variables,
have infinite interpretations on infinite domains so are restriction predicates.
Note that we have not provided for relations whose complement is finite;
instead we assume the language contains a predicate for the finite complement
but not the infinite relation. For example, it turns out that defining
$x \neq c$ as a built-in predicate gains nothing. This topic is discussed further in
Section 5.3.
We primarily use P, Q, R and S to denote edb predicate symbols. We use
A, B, ..., H to denote formulas and subformulas, a, ..., d (and integers and
obvious strings) to denote constants, u, ..., z to denote variables, and s and t
to denote terms, i.e., constants or variables.
We use $\bar{x}$ to denote a tuple $(x_1, \ldots, x_n)$, where n > 0, and $A(x, \bar{y})$ to
denote a formula in which x is a free variable and $y_1, \ldots, y_n$ are (some of)
the remaining free variables of A.
Similarly, we write $\forall \bar{x}$ for $\forall x_1 \ldots \forall x_n$, and write $\exists \bar{x}$ for $\exists x_1 \ldots \exists x_n$. We
also use “%” to denote either $\forall$ or $\exists$ or, in the case of $\% \bar{x}$, a specific sequence
of (possibly mixed) quantifiers. We assume that no quantified variable occurs
outside the scope of its quantifier; i.e., we avoid $(\exists x\, A(x) \land \exists x\, B(x))$ and use
instead $(\exists x_1 A(x_1) \land \exists x_2 B(x_2))$.
We use $\equiv$ to denote logical equivalence and $\Rightarrow$ to denote logical implication;
both denote relations between formulas, not symbols within formulas.
In addition, $\stackrel{\mathrm{def}}{=}$ is often used to mean “is defined as” to give
names to formulas. We occasionally use “[ ]” as synonyms for “( )” for readability.
We adopt the usual definitions (e.g., see Manna [17]) for prenex normal
form, conjunctive normal form, and disjunctive normal form which we abbreviate
to PNF, CNF and DNF, respectively. We shall also introduce existential
normal form, abbreviated ENF, and relational algebra normal form, abbreviated
RANF. (See Definitions 9.4 and 9.5.) In addition, we occasionally refer
to the following normal form.
Definition 4.2. A formula is said to be in prenex-literal normal form
(PLNF) if it is in PNF and all negations are immediately above the atoms.
(This is sometimes called negative normal form.)
5. EVALUABLE AND ALLOWED CLASSES OF FORMULAS
In this section we define the classes of evaluable formulas and allowed
formulas, and give some of their properties. The term evaluable is due to
Demolombe [6]. We use the same term because the class is the same,
although our definition is different. (We shall not prove that the classes are
the same, as we do not rely upon this fact; we simply wish to give appropriate
credit.) Actually, there is a minor difference in that we allow the equality
predicate = which we interpret throughout as the identity relation and allow
for the possibility of other built-in predicates. Such predicates are not mentioned
by Demolombe but could easily be incorporated.
gen(x, A) if edb(A) & free(x, A).
gen(x, $\neg A$) if gen(x, pushnot($\neg A$)).
gen(x, $\exists y A$) if distinct(x, y) & gen(x, A).
gen(x, $\forall y A$) if distinct(x, y) & gen(x, A).
gen(x, $A \lor B$) if gen(x, A) & gen(x, B).
gen(x, $A \land B$) if gen(x, A).
gen(x, $A \land B$) if gen(x, B).
con(x, A) if edb(A) & free(x, A).
con(x, A) if absent(x, A).
con(x, $\neg A$) if con(x, pushnot($\neg A$)).
con(x, $\exists y A$) if distinct(x, y) & con(x, A).
con(x, $\forall y A$) if distinct(x, y) & con(x, A).
con(x, $A \lor B$) if con(x, A) & con(x, B).
con(x, $A \land B$) if gen(x, A).
con(x, $A \land B$) if gen(x, B).
con(x, $A \land B$) if con(x, A) & con(x, B).
Fig. 1. The rules that define gen and con.
5.1 Relations gen and con
To define evaluable and allowed we first need to define certain relations
between variables and (sub)formulas. We have chosen the names gen and
con for these key relations. They are abbreviations for generated and constrained.
Our generated relation is called restricted by Demolombe [6] and
pos by Topor [23]. Also, our constrained relation is similar but not identical
to the positive relation of Demolombe [6]. We prefer to reserve the terms
positive and negative to describe the polarity of atoms or subformulas within
a formula. A subformula is called positive if it occurs under an even number
of negations and negative if it occurs under an odd number.
Definition 5.1. The relations gen and con are defined inductively in
Figure 1 by a set of logical rules. We intend that the relations gen and con
hold only when they can be established by a finite number of applications of
these rules; that is, they are the inductive closures of their respective rules.
Several predicates and functions appear in these rules to support the
definitions of gen and con. We intend that they be interpreted as follows:
–edb(A) holds if A is an atomic formula whose predicate symbol represents
a database relation or a built-in that must have a finite relation (see
Definition 4.1).
–free(x, A) holds if variable x occurs (freely) in formula A.
–absent(x, A) holds if variable x does not occur freely in formula A.
–distinct(x, y) holds if x and y are different variables.
–Function pushnot($\neg A$) returns a formula B, provided A is not an edb
atom, as follows:
$\neg A$    B
In Figure 1, the &’s separating atoms are interpreted as “and”, x and y as
(object-level) variables, and A and B as (meta-level) variables representing
formulas. For example, the first rule reads, “Variable x is generated in
formula A if A is an edb atom and x occurs freely in A.”
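The rules of Figure 1 can be read almost directly as a recursive program. The sketch below is one possible encoding, under several assumptions that are not from the paper: formulas are nested Python tuples, every atom is treated as edb (restriction predicates are not modeled), string terms are variables and all other terms are constants, and `pushnot` is assumed to push a negation one level inward by double negation, De Morgan's laws, and quantifier duality.

```python
# Assumed encoding: ("atom", "P", ("x", "y")), ("not", A), ("and", A, B),
# ("or", A, B), ("exists", "x", A), ("forall", "x", A).

def free_vars(f):
    """Variables occurring freely in f (string terms are variables)."""
    tag = f[0]
    if tag == "atom":
        return {t for t in f[2] if isinstance(t, str)}
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    return free_vars(f[2]) - {f[1]}  # exists / forall

def pushnot(a):
    """Given the body a of NOT a (a is not an atom), push NOT one level in."""
    tag = a[0]
    if tag == "not":
        return a[1]
    if tag == "and":
        return ("or", ("not", a[1]), ("not", a[2]))
    if tag == "or":
        return ("and", ("not", a[1]), ("not", a[2]))
    if tag == "exists":
        return ("forall", a[1], ("not", a[2]))
    if tag == "forall":
        return ("exists", a[1], ("not", a[2]))
    raise ValueError("pushnot is not defined for a negated atom")

def gen(x, f):
    tag = f[0]
    if tag == "atom":
        return x in free_vars(f)  # edb(A) & free(x, A)
    if tag == "not":
        return f[1][0] != "atom" and gen(x, pushnot(f[1]))
    if tag in ("exists", "forall"):
        return x != f[1] and gen(x, f[2])
    if tag == "or":
        return gen(x, f[1]) and gen(x, f[2])
    return gen(x, f[1]) or gen(x, f[2])  # conjunction

def con(x, f):
    if x not in free_vars(f):
        return True  # absent(x, A)
    tag = f[0]
    if tag == "atom":
        return True  # edb(A) & free(x, A)
    if tag == "not":
        return f[1][0] != "atom" and con(x, pushnot(f[1]))
    if tag in ("exists", "forall"):
        return x != f[1] and con(x, f[2])
    if tag == "or":
        return con(x, f[1]) and con(x, f[2])
    return gen(x, f[1]) or gen(x, f[2]) or (con(x, f[1]) and con(x, f[2]))

# A small made-up formula: Pxy OR NOT Qy.
Pxy = ("atom", "P", ("x", "y"))
Qy = ("atom", "Q", ("y",))
A = ("or", Pxy, ("not", Qy))
print(gen("x", A), con("x", A), con("y", A))
```

Because a negated edb atom has no applicable rule, neither gen nor con can be established for a variable that occurs in it, which the sketch models by returning False in that case.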
To give some intuition about gen and con, consider a fixed database, and
let Dom(A) be the set consisting of all constants in the formula A and in the
database relations that occur in A [26]. We assume finite database relations,
so Dom(A) is finite.
The fact that gen(x, $A(x, \bar{y})$) holds tells us that if $A(a, \bar{b})$ is true, then
$a \in Dom(A)$. From a database viewpoint, the values of x are generated when
$A(x, \bar{y})$ is evaluated. From a logic programming viewpoint, $A(x, \bar{y})$ can be
regarded as a goal that succeeds only by binding x to values in (a subset of)
Dom(A).
Similarly, the fact that con(x, $A(x, \bar{y})$) holds tells us that if $A(a, \bar{b})$ is true,
then either:
–$a \in Dom(A)$, or
–$A(x, \bar{b})$ is true for all values of x.
Figure 2 shows a geometric interpretation of this property, where A has two
free variables. If con holds for both free variables of A(x, y), then the set of
points (x, y) where A(x, y) holds can be represented as a finite collection of
points and lines orthogonal to an axis. All isolated points at which A(x, y)
holds are in $Dom(A) \times Dom(A)$ (a sharper restriction is developed in
Section 8), and all lines intersect an axis at a point in Dom(A).
Fig. 2. Geometric interpretation of the con property for $A(x, y) \stackrel{\mathrm{def}}{=} Px \lor Qy \lor Rxy$.
The term “constrained” is used as con(x, A) holds whenever x is generated
in every disjunct of A in which it occurs. This constrains x but not
necessarily enough to generate values for it. From a logic programming
viewpoint, A can be regarded as a goal that succeeds only by binding x to
values in Dom(A) or without binding x at all.
LEMMA 5.1. For every variable x and formula A, gen( x, A) implies
con( x, A).
PROOF. Use structural induction on A. ❑
Note that the converse of Lemma 5.1 is false.
Example 5.1. In each of the following formulas con( x, A) holds but
gen( x, A) does not hold.
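$$A \stackrel{\mathrm{def}}{=} \neg Qy$$
$$A \stackrel{\mathrm{def}}{=} Pxy \lor \neg Qy$$
$$A \stackrel{\mathrm{def}}{=} (Pxy \lor \neg Qy) \land (Rx \lor Sy)$$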
Note that x need not appear in A and that con( y, A) does not hold for any of
the formulas.
Example 5.2. Suppose we have a database that contains relations
—Rwy, which holds if project w requires part y, and
–Sxy, which holds if supplier x supplies part y.
Then, a natural question to ask is: “Does some supplier supply all parts
required for project 100?” which can be expressed by the formula
$$F \stackrel{\mathrm{def}}{=} \exists x \forall y\,(\neg R(100, y) \lor Sxy)$$
Here, con(x, $\neg R(100, y) \lor Sxy$) holds but gen(x, $\neg R(100, y) \lor Sxy$) does not
hold. Neither con(y, $\neg R(100, y) \lor Sxy$) nor gen(x, $\neg R(100, y) \lor Sxy$) holds. We
shall return to this example later, in Examples 5.4 and 8.4.
The properties gen and con are used to classify relational calculus formu-
las in the next section.
5.2 Evaluable and Allowed Formulas
We now define the two main classes of relational calculus formulas consid-
ered in this paper.
Definition 5.2. A formula F is evaluable or has the evaluable property if
–For every variable x that is free in F, gen(x, F) holds.
–For every subformula $\exists x A$ of F, con(x, A) holds.
–For every subformula $\forall x A$ of F, con(x, $\neg A$) holds.
Definition 5.3. A formula F is allowed or has the allowed property if
–For every variable x that is free in F, gen(x, F) holds.
–For every subformula $\exists x A$ of F, gen(x, A) holds.
–For every subformula $\forall x A$ of F, gen(x, $\neg A$) holds.
Each definition attempts to characterize a large class of formulas that are
provably domain independent. Requirements (2) and (3) of each definition are
designed to ensure that the scope of every quantified variable in a formula is
sufficiently restricted for the entire formula to be domain independent. The
stricter requirements in Definition 5.3 permit a straightforward implementation
of $\exists x$ as projection for allowed formulas. Moreover, the allowed property
is conceptually simpler and can be tested more efficiently than the evaluable
property.
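Definitions 5.2 and 5.3 differ only in which relation is demanded of quantified variables, which suggests a single checker parameterized by gen or con. The sketch below includes its own compact encoding of the rules of Figure 1 so that it runs on its own. As assumptions for illustration, formulas are nested tuples, every atom is edb, string terms are variables, and `pushnot` uses double negation, De Morgan's laws, and quantifier duality.

```python
def fv(f):
    """Free variables; atoms look like ("atom", "P", ("x", "y"))."""
    if f[0] == "atom":
        return {t for t in f[2] if isinstance(t, str)}
    if f[0] == "not":
        return fv(f[1])
    if f[0] in ("and", "or"):
        return fv(f[1]) | fv(f[2])
    return fv(f[2]) - {f[1]}  # ("exists", x, A) or ("forall", x, A)

def pushnot(a):
    """Push the negation of NOT a one level inward (a is not an atom)."""
    if a[0] == "not":
        return a[1]
    if a[0] in ("and", "or"):
        dual = "or" if a[0] == "and" else "and"
        return (dual, ("not", a[1]), ("not", a[2]))
    dual = "forall" if a[0] == "exists" else "exists"
    return (dual, a[1], ("not", a[2]))

def holds(kind, x, f):
    """kind is "gen" or "con"; follows the rules of Figure 1."""
    if kind == "con" and x not in fv(f):
        return True
    if f[0] == "atom":
        return x in fv(f)
    if f[0] == "not":
        return f[1][0] != "atom" and holds(kind, x, pushnot(f[1]))
    if f[0] in ("exists", "forall"):
        return x != f[1] and holds(kind, x, f[2])
    if f[0] == "or":
        return holds(kind, x, f[1]) and holds(kind, x, f[2])
    if holds("gen", x, f[1]) or holds("gen", x, f[2]):
        return True
    return kind == "con" and holds("con", x, f[1]) and holds("con", x, f[2])

def subformulas(f):
    yield f
    if f[0] == "not":
        yield from subformulas(f[1])
    elif f[0] in ("and", "or"):
        yield from subformulas(f[1])
        yield from subformulas(f[2])
    elif f[0] in ("exists", "forall"):
        yield from subformulas(f[2])

def check(f, kind):
    if not all(holds("gen", x, f) for x in fv(f)):
        return False
    for s in subformulas(f):
        if s[0] == "exists" and not holds(kind, s[1], s[2]):
            return False
        if s[0] == "forall" and not holds(kind, s[1], ("not", s[2])):
            return False
    return True

def evaluable(f):
    return check(f, "con")

def allowed(f):
    return check(f, "gen")

# EXISTS x FORALL y [NOT R(100, y) OR Sxy], the formula of Example 5.2.
body = ("or", ("not", ("atom", "R", (100, "y"))), ("atom", "S", ("x", "y")))
F = ("exists", "x", ("forall", "y", body))
F1 = ("forall", "y", body)  # same body, but x is now free
print(evaluable(F), allowed(F), evaluable(F1))
```

Under these assumptions the sketch classifies F as evaluable but not allowed, and rejects F1 because gen fails for its free variable x.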
Rather than prove that our definition of evaluable yields the same class as
Demolombe [6], it is easier to reprove the important properties of the class.
We shall show that every evaluable formula (and hence every allowed
formula) is domain independent in Section 10, after developing some more
machinery. First we note the following important relationship between the
two classes of formulas.
THEOREM 5.2. Every allowed formula is evaluable.
PROOF. Immediate from Lemma 5.1. ❑
Note that the converse of Theorem 5.2 is false.
Example 5.3. The following formula is evaluable but not allowed:
$$F(y) \stackrel{\mathrm{def}}{=} \exists x\,[(Pxy \lor Qy) \land \neg Ry]$$
First note that gen(y, F(y)) holds. Next note that, for x, the subformula in
square brackets satisfies con but not gen. Finally note that the following
equivalent variant of F is allowed, indicating the sensitivity of the allowed
property to the form of the formula.
$$F'(y) \stackrel{\mathrm{def}}{=} (\exists x\,[Pxy] \lor Qy) \land \neg Ry.$$
Example 5.4. Recall Example 5.2 with relations Rwy (w requires y) and
Sxy (x supplies y). The question, “Does some supplier supply all parts
required for project 100?” was expressed by the formula
$$F \stackrel{\mathrm{def}}{=} \exists x \forall y\,[\neg R(100, y) \lor Sxy].$$
We see that F is not allowed because gen(x, $\neg R(100, y) \lor Sxy$) does not hold.
However, con(x, $\neg R(100, y) \lor Sxy$) does hold. Now, the negation of
$(\neg R(100, y) \lor Sxy)$ is $(R(100, y) \land \neg Sxy)$, and gen(y, $R(100, y) \land \neg Sxy$) holds
(and hence con also holds). So F is evaluable.
A natural extension of the above question is the apparently harmless
variant: “Which suppliers supply all parts required by project 100?” which
can be expressed by
$$F_1(x) \stackrel{\mathrm{def}}{=} \forall y\,[\neg R(100, y) \lor Sxy].$$
As x is now free in $F_1$, the evaluable property requires the stronger gen
relation to hold for x, which it does not. Therefore, $F_1$ is not evaluable.
Intuitively, the problem is that if R(100, y) is empty, then $F_1(x)$ is true for
all values of x, which may not be the intended answer.
The question can be expressed in several other ways in attempts to capture
its intended meaning more accurately. Examples of such alternative
representations are
$$F_2(x) \stackrel{\mathrm{def}}{=} \exists z\, R(100, z) \land \forall y\,[\neg R(100, y) \lor Sxy]$$
$$F_3(x) \stackrel{\mathrm{def}}{=} \exists z\, Sxz \land \forall y\,[\neg R(100, y) \lor Sxy].$$
Each of these formulas is evaluable, and $F_3$ is even allowed. Now, however, if
R(100, y) is empty, $F_2(x)$ is true for no x at all, and $F_3(x)$ is true for all
suppliers in the database. As we cannot know which of these three answers is
the intended one, we believe that it is better to ask the user to give a more
precise query than to translate the nonevaluable query into a nonequivalent
evaluable one. It is our thesis, however, that the class of evaluable formulas
is large enough that this situation will rarely arise.
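The three different answers can be observed by evaluating the formulas by brute force over an explicit domain when no part is required by project 100. The database contents and the domain below are assumptions made up for this sketch.

```python
# Made-up database: project 100 requires nothing; two suppliers exist.
R = set()                             # tuples (project, part)
S = {("s1", "p1"), ("s2", "p2")}      # tuples (supplier, part)

def supplies_all(x, domain):
    """FORALL y [NOT R(100, y) OR S(x, y)] over the given domain."""
    return all((100, y) not in R or (x, y) in S for y in domain)

def f1(domain):
    return {x for x in domain if supplies_all(x, domain)}

def f2(domain):
    project_has_parts = any((100, z) in R for z in domain)
    return {x for x in domain if project_has_parts and supplies_all(x, domain)}

def f3(domain):
    return {
        x for x in domain
        if any((x, z) in S for z in domain) and supplies_all(x, domain)
    }

domain = {"s1", "s2", "p1", "p2", 100, "other"}
print(f1(domain))  # every element of the domain, including non-suppliers
print(f2(domain))  # no x at all
print(f3(domain))  # exactly the suppliers in the database
```

Only the first answer depends on the chosen domain, which is consistent with the first formula being the one rejected as not evaluable.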
5.3 Equality and Built-in Relations in Evaluable Formulas
The definition of evaluable in this section adopts a “middle of the road”
approach to equality and other “built-in” predicates. It is conservative with
respect to equality between two variables: $\mathrm{gen}(x, x = y)$, $\mathrm{con}(x, x = y)$,
$\mathrm{gen}(x, x \neq y)$ and $\mathrm{con}(x, x \neq y)$ never hold. Many formulas with equality are
evaluable despite this conservative approach. Formulas satisfying Defini-
tion 5.2 are called strict-sense evaluable. In Appendix A we describe transfor-
mations that remove many instances of such equalities and yield an “equality
reduced” form. We call formulas that can be transformed into evaluable
formulas by means of these transformations wide-sense evaluable. Our primary
motivation for distinguishing between strict- and wide-sense evaluability
is that we wish to exploit the fact that strict-sense evaluability is
invariant under many useful transformations, as discussed in Section 6.
Fig. 3. The equivalences upon which conservative transformations are based. “%” denotes $\exists$ or $\forall$.
On the other hand, defining gen( x, x = c) to hold for “unary equality”
makes sense intuitively, since it is trivial to associate a one-tuple relation
with this atom.
If our system has other built-in predicates, we adopt the corresponding
approach: their atoms satisfy gen and con if and only if they represent
known, finite relations, so are classified as edb. We require that restriction
predicates occur in complementary pairs, such as $\geq$ and $<$; in such cases,
the definition of pushnot is extended in the obvious way (see Definition 5.1).
In this paper, mainly to simplify parts of the presentation, we assume $=$
and $\neq$ (with two variables) are the only restriction predicates. This
treatment is convenient but not central to the main topics of the paper.
6. CONSERVATIVE AND DISTRIBUTIVE TRANSFORMATIONS
In this section we study the effects of various logical transformations on the
evaluable and allowed properties of formulas with a view to identifying sets
of transformations under which these properties are invariant or preserved.
To avoid confusion, we use the terms “invariant” and “preserve” as defined
below:
Definition 6.1. Let $T(A)$ be a transformation (normally of formulas into
formulas). A property is said to be invariant under transformation $T$ if $T(A)$
has the property if and only if A does. A property is said to be preserved
under transformation T if T(A) has the property whenever A does (but
T(A) may have the property even if A does not).
Figure 3 shows some standard equivalences that are frequently useful to
manipulate formulas [1, 17]. Note that the number of atoms and binary
logical operators are invariant under these transformations. We show that
the evaluable property is invariant under transformations based on these
identities.
Definition 6.2. We say that F can be conservatively transformed to G if G
can be obtained by replacing (one copy of) a subformula of F according to one
of the equivalences in Figure 3, or by a series of such replacements.
LEMMA 6.1. The relations gen and con defined in Figure 1 are invariant
under conservative transformations ((E1–10) of Figure 3). That is, if $F(y)$ can
be conservatively transformed to $G(y)$, then $\mathrm{gen}(y, F(y)) \Leftrightarrow \mathrm{gen}(y, G(y))$, and
similarly for con.
PROOF. This is merely a matter of applying the definitions. For example,
suppose (E10) is applied, i.e.,
$$F(x, y) \stackrel{\mathrm{def}}{=} \forall x (A(x, y) \land B(x, y))$$
$$G(x, y) \stackrel{\mathrm{def}}{=} \forall x_1 A(x_1, y) \land \forall x_2 B(x_2, y)$$
($y$ need not occur in $A$ or $B$). If $\mathrm{con}(y, F(x, y))$ holds, then
$\mathrm{con}(y, A(x, y) \land B(x, y))$ also holds, and at least one of the following three
propositions is true:
- $\mathrm{gen}(y, A(x, y))$ holds: Then $\mathrm{gen}(y, \forall x_1 A(x_1, y))$ also holds.
- $\mathrm{gen}(y, B(x, y))$ holds: Then $\mathrm{gen}(y, \forall x_2 B(x_2, y))$ also holds.
- Both $\mathrm{con}(y, A(x, y))$ and $\mathrm{con}(y, B(x, y))$ hold: Then $\mathrm{con}(y, \forall x_1 A(x_1, y))$
and $\mathrm{con}(y, \forall x_2 B(x_2, y))$ also hold.
Thus, in each case, $\mathrm{con}(y, G(x, y))$ holds. The other direction and other cases
are similar. □
THEOREM 6.2. If A is evaluable and A can be conservatively transformed to
B, then B is evaluable.
PROOF. The only cases not handled by Lemma 6.1 involve moving the
quantifier for the first argument of a con by means of (E7–10). For example,
if $F$ is evaluable and contains a subformula
$$G \stackrel{\mathrm{def}}{=} (\exists x A(x) \land B)$$
and, using (E8), we form $F'$ by replacing $G$ in $F$ by
$$G' \stackrel{\mathrm{def}}{=} \exists x (A(x) \land B)$$
then we need to verify $\mathrm{con}(x, A(x) \land B)$. As $x$ is not in $B$, $\mathrm{con}(x, B)$ holds.
As $F$ is evaluable, $\mathrm{con}(x, A(x))$ also holds. Consequently, $\mathrm{con}(x, A(x) \land B)$
holds, and $F'$ is thus evaluable.
In the other direction, if $F'$ is evaluable, then $\mathrm{con}(x, A(x) \land B)$ holds. It is
impossible that $\mathrm{gen}(x, B)$ holds, so either $\mathrm{gen}(x, A(x))$ holds or $\mathrm{con}(x, A(x))$
(and $\mathrm{con}(x, B)$) hold. This establishes that $F$ is evaluable. The other cases
are similar. □
COROLLARY 6.3. Every evaluable formula can be conservatively trans-
formed into an equivalent evaluable formula in PLNF (Definition 4.2).
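$A \land (B \lor C) \equiv (A \land B) \lor (A \land C)$ (E11)
$A \lor (B \land C) \equiv (A \lor B) \land (A \lor C)$ (E12)
$\exists x (x = y \land A(x, y)) \equiv A(y, y)$ (E13)
$\forall x (x \neq y \lor A(x, y)) \equiv A(y, y)$ (E14)
Fig. 4. Other useful equivalences: distributive laws and equality elimination.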
COROLLARY 6.4. Every evaluable formula can be conservatively trans-
formed into an equivalent evaluable formula that contains no universal quan-
tifiers and has negations only immediately above atoms and existential
quantifiers.
The allowed property may not be preserved by the conservative
transformations (E9–10), as shown in the next example. In particular, conservatively
transforming an allowed formula into prenex normal form can destroy the
allowed property.
Example 6.1. The allowed formula $\exists x A(x) \lor B$, where $x$ is not free in $B$,
can be conservatively transformed into the prenex normal form $\exists x [A(x) \lor B]$,
which is not allowed.
We now consider which properties are invariant under the distributive
laws (E11–12) shown in Figure 4.
LEMMA 6.5. The relation con defined in Figure 1 is invariant under (E11)
of Figure 4 (“pushing ands”). That is, $\mathrm{con}(x, A \land (B \lor C))$ holds if and only if
$\mathrm{con}(x, (A \land B) \lor (A \land C))$ holds. In addition, gen is invariant under both
distributive laws (E11–12) of Figure 4.
PROOF. Suppose (E11) is applied, that is,
$$F \stackrel{\mathrm{def}}{=} A \land (B \lor C)$$
$$G \stackrel{\mathrm{def}}{=} (A \land B) \lor (A \land C).$$
First suppose $\mathrm{con}(y, F)$ holds. ($F$ and $G$ may or may not contain $y$.) Then at
least one of the following three propositions is true:
- $\mathrm{gen}(y, A)$ holds: Then both $\mathrm{gen}(y, A \land B)$ and $\mathrm{gen}(y, A \land C)$ also hold.
- $\mathrm{gen}(y, B \lor C)$ holds: Then both $\mathrm{gen}(y, B)$ and $\mathrm{gen}(y, C)$ hold. It follows
that $\mathrm{gen}(y, A \land B)$ and $\mathrm{gen}(y, A \land C)$ also hold.
- $\mathrm{con}(y, A)$ and $\mathrm{con}(y, B \lor C)$ both hold: Then $\mathrm{con}(y, B)$ and $\mathrm{con}(y, C)$ also
hold.
In all three cases we conclude that $\mathrm{con}(y, G)$ holds.
If $\mathrm{con}(y, G)$ is given, a similar case analysis shows that $\mathrm{con}(y, F)$ holds.
The proofs for gen are similar. □
As noted by Demolombe [6], “pushing ors” (E12) does not always preserve
con.
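Example 6.2. ([6]) Let $F$ be transformed into $G$ by applying (E12), where
$$F \stackrel{\mathrm{def}}{=} Px \lor (Qxy \land \neg Ry)$$
$$G \stackrel{\mathrm{def}}{=} (Px \lor Qxy) \land (Px \lor \neg Ry)$$
Then $\mathrm{con}(y, F)$ holds, but $\mathrm{con}(y, G)$ does not.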
Our final result in this section summarizes under which transformations
the allowed property is invariant or is preserved. This result is used in the
translation given in Section 9 from allowed formulas into relational algebra
normal form, which requires the use of distributive laws.
THEOREM 6.6. If $A$ is allowed and $B$ is obtained from $A$ by either
- a distributive transformation (E11–12) of Figure 4, or
- a conservative transformation (E1–8) of Figure 3, or
- a conservative transformation (E9–10) of Figure 3, used in the left-to-right
direction only,
then $B$ is also allowed.
PROOF. The distributive laws are immediate from Lemma 6.5. The rest is
similar to Theorem 6.2, except that we need to check that the needed gen
relations are present (instead of con) when (E7–10) are used. □
Note that, although “pushing ands” (E11) preserves con (Lemma 6.5
above), it does not always preserve the evaluable property.
Example 6.3. Let $F(z) \stackrel{\mathrm{def}}{=} \forall x \exists y A(x, y, z)$ be evaluable, where
$$A(x, y, z) \stackrel{\mathrm{def}}{=} Ryz \land (Qx \lor \neg Px).$$
Since
$$\neg A(x, y, z) \equiv \neg Ryz \lor (\neg Qx \land Px),$$
we have $\mathrm{con}(x, \neg A)$, as required for $F$ to be evaluable.
“Pushing and” by (E11) gives
$$B(x, y, z) \stackrel{\mathrm{def}}{=} (Ryz \land Qx) \lor (Ryz \land \neg Px)$$
and the corresponding $G \stackrel{\mathrm{def}}{=} \forall x \exists y B(x, y, z)$. Now, $\mathrm{con}(x, \neg B)$ does not hold, so
$G$ is not evaluable. The problem is that “pushing and” in $A$ is the same as
“pushing or” (E12) in $\neg A$. This is the one distributive transformation that
may not preserve con.
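The con failures reported in Examples 6.2 and 6.3 can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the paper's definition: the tuple encoding of formulas is hypothetical, the rules are the ones of Figure 5 with the generator argument dropped, every atom is treated as edb, "absent" is read as "not free in", and `pushnot` is assumed to move a negation one level inward by De Morgan's laws and quantifier duality (it is undefined on negated atoms).

```python
# Assumed encoding: ("atom", name, vars), ("not", A), ("and", A, B),
# ("or", A, B), ("exists", v, A), ("forall", v, A).

def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return set(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    return free_vars(f[2]) - {f[1]}              # exists / forall

def pushnot(f):
    """Move the negation in f = ("not", g) one level inward; None if g is an atom."""
    g = f[1]
    tag = g[0]
    if tag == "not":
        return g[1]
    if tag == "and":
        return ("or", ("not", g[1]), ("not", g[2]))
    if tag == "or":
        return ("and", ("not", g[1]), ("not", g[2]))
    if tag == "exists":
        return ("forall", g[1], ("not", g[2]))
    if tag == "forall":
        return ("exists", g[1], ("not", g[2]))
    return None

def gen(x, f):
    tag = f[0]
    if tag == "atom":
        return x in f[2]
    if tag == "not":
        pushed = pushnot(f)
        return pushed is not None and gen(x, pushed)
    if tag in ("exists", "forall"):
        return x != f[1] and gen(x, f[2])
    if tag == "or":
        return gen(x, f[1]) and gen(x, f[2])
    return gen(x, f[1]) or gen(x, f[2])          # "and"

def con(x, f):
    if x not in free_vars(f):                    # the "absent" rule
        return True
    tag = f[0]
    if tag == "atom":
        return True
    if tag == "not":
        pushed = pushnot(f)
        return pushed is not None and con(x, pushed)
    if tag in ("exists", "forall"):
        return con(x, f[2])
    if tag == "or":
        return con(x, f[1]) and con(x, f[2])
    return gen(x, f[1]) or gen(x, f[2]) or (con(x, f[1]) and con(x, f[2]))

Px, Qx = ("atom", "P", ("x",)), ("atom", "Q", ("x",))
Qxy, Ry = ("atom", "Q2", ("x", "y")), ("atom", "R", ("y",))
Ryz = ("atom", "R2", ("y", "z"))

# Example 6.2: "pushing ors" (E12).
F = ("or", Px, ("and", Qxy, ("not", Ry)))
G = ("and", ("or", Px, Qxy), ("or", Px, ("not", Ry)))
# Example 6.3: "pushing and" (E11) under a universal quantifier.
A = ("and", Ryz, ("or", Qx, ("not", Px)))
B = ("or", ("and", Ryz, Qx), ("and", Ryz, ("not", Px)))

print(con("y", F), con("y", G))
print(con("x", ("not", A)), con("x", ("not", B)))
```

Under these assumptions each printed pair is `True False`, in agreement with the two examples.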
7. RANGE RESTRICTED FORMULAS
Range restricted formulas are based on disjunctive and conjunctive normal
forms, and represent one of the first decidable subclasses of domain
independent formulas to be studied [20]. (Our definition is a slight extension to
permit some use of equality.) Putting formulas into normal forms requires
the use of distributive laws (E11–12) of Figure 4. Since the distributive laws
do not always preserve the evaluable property, it is not too surprising that
certain evaluable formulas become nonevaluable if we simply put them into
DNF in an attempt to make an equivalent range restricted formula, as shown
by Example 6.3. However, we show that every evaluable formula (and only
those) has an associated pair of formulas in DNF and CNF that satisfy
conditions quite similar to those required for range restricted formulas.
This theorem provides an alternative recognition mechanism for evaluable
formulas.
Definition 7.1. Let $F \stackrel{\mathrm{def}}{=} \%\vec{x}\, M$ be a formula in disjunctive normal form,
where
$$M \stackrel{\mathrm{def}}{=} (D_1 \lor \cdots \lor D_n).$$
Let $M' \stackrel{\mathrm{def}}{=} (C_1 \land \cdots \land C_m)$ be the conjunctive normal form of $M$ constructed by
applying the distributive law (E12) of Figure 4. Then $F$ is range restricted if
the following properties hold.
(1) For every free variable $x$ in $F$, $x$ occurs in a positive literal (other than
$x = y$) in every $D_j$, i.e., $\mathrm{gen}(x, M)$ holds.
(2) For every existentially quantified variable $x$ in $F$, $x$ occurs in a positive
literal (other than $x = y$) in every $D_j$ in which $x$ occurs, i.e., $\mathrm{con}(x, M)$
holds.
(3) For every universally quantified variable $x$ in $F$, $x$ occurs in a negative
literal (other than $x \neq y$) in every $C_k$ in which $x$ occurs, i.e., $\mathrm{con}(x, \neg M')$
holds.
Item 3 in the above definition was stated differently by Demolombe [6]:
3'. For every universally quantified variable $x$ in $F$, if $x$ occurs in any positive atom,
then there is some conjunction $D_j$ such that every atom of $D_j$ is negative and
contains $x$. (Either $\mathrm{con}(x, \neg D_i)$ holds for all $D_i$ or $\mathrm{gen}(x, \neg D_j)$ holds for some $D_j$;
i.e., $\mathrm{con}(x, \neg M)$ holds.)
The equivalence of the two definitions follows from Lemma 6.5, since $\neg M'$ is
obtained from $\neg M$ by repeatedly “pushing and” (E11).
THEOREM 7.1. (Demolombe [6]) Let $F$ be a formula in disjunctive normal
form. Then $F$ is evaluable if and only if $F$ is range restricted.
PROOF. Immediate from the definition, Lemma 6.1, and Lemma 6.5. □
Demolombe observes that a similar result holds for formulas in conjunctive
normal form. This theorem can be generalized to apply to all evaluable
formulas as follows.
Definition 7.2. Let $\mathrm{cnf}(F)$ (resp., $\mathrm{dnf}(F)$) be the conjunctive (resp.,
disjunctive) normal form of formula $F$ constructed by applying conservative
transformations and distributive law (E12) (resp., (E11)).
THEOREM 7.2. Let $F$ be a formula with
$$\mathrm{dnf}(F) \stackrel{\mathrm{def}}{=} \%\vec{x}\, M_d \stackrel{\mathrm{def}}{=} \%\vec{x}\,(D_1 \lor \cdots \lor D_n),$$
$$\mathrm{cnf}(F) \stackrel{\mathrm{def}}{=} \%\vec{x}\, M_c \stackrel{\mathrm{def}}{=} \%\vec{x}\,(C_1 \land \cdots \land C_m).$$
Then $F$ is evaluable if and only if the following properties hold:
(1) For every free variable $x$ in $F$, $x$ occurs in a positive literal (other than
$x = y$) in every $D_j$, i.e., $\mathrm{gen}(x, M_d)$ holds.
(2) For every existentially quantified variable $x$ in $\mathrm{dnf}(F)$, $x$ occurs in a
positive literal (other than $x = y$) in every $D_j$ in which $x$ occurs, i.e.,
$\mathrm{con}(x, M_d)$ holds.
(3) For every universally quantified variable $x$ in $\mathrm{cnf}(F)$, $x$ occurs in a
negative literal (other than $x \neq y$) in every $C_k$ in which $x$ occurs, i.e.,
$\mathrm{con}(x, \neg M_c)$ holds.
PROOF. Let $\mathrm{plnf}(F) \stackrel{\mathrm{def}}{=} \%\vec{x}\, M$ be the prenex-literal normal form of $F$
(Definition 4.2) constructed by applying conservative transformations. By
Theorem 6.2, $F$ is evaluable if and only if $\mathrm{plnf}(F)$ is evaluable.
Next, by Lemma 6.5 for every free variable $x$ in $F$, $\mathrm{gen}(x, F)$ holds if and
only if $\mathrm{gen}(x, M_d)$ holds.
Similarly, for every existentially quantified variable $x$ in $\mathrm{plnf}(F)$ (and
hence in $\mathrm{dnf}(F)$), $\mathrm{con}(x, M)$ holds if and only if $\mathrm{con}(x, M_d)$ holds.
Finally, note that
$$\mathrm{con}(x, \neg(A \lor (B \land C)))$$
$$\Leftrightarrow \mathrm{con}(x, \neg A \land (\neg B \lor \neg C))$$
$$\Leftrightarrow \mathrm{con}(x, (\neg A \land \neg B) \lor (\neg A \land \neg C)) \quad \text{(Lemma 6.5)}$$
$$\Leftrightarrow \mathrm{con}(x, \neg((A \lor B) \land (A \lor C)))$$
In other words, “pushing ors” in $M$ is the dual of “pushing ands” in $\neg M$.
Thus, for every universally quantified variable $x$ in $\mathrm{plnf}(F)$ (and hence in
$\mathrm{cnf}(F)$), $\mathrm{con}(x, \neg M)$ holds if and only if $\mathrm{con}(x, \neg M_c)$ holds. □
Again we remark that $\mathrm{dnf}(F)$ and $\mathrm{cnf}(F)$ may not themselves be
evaluable, as shown in Example 6.3.
8. TRANSFORMATION INTO AN ALLOWED FORMULA
This section presents a procedure to transform any evaluable formula into an
equivalent allowed formula. The approach used by Decker [5] to convert a
range-restricted formula into “range form” which is nearly the same as
“allowed” can be generalized quite nicely with the aid of the rules for gen
and con in Figure 1.
8.1 Ternary gen and con Relations
The first idea is to add a third argument G(x) to gen and con which
functions as a “generator” of sorts. The modified rules are shown in Figure 5.
When $x$ is free in $A$, then $G(x)$ will be a disjunction of certain edb atoms in
$A(x)$ (and possibly the placeholder $\bot$ described next) used to prove that gen
or con holds. (From now on we adopt the convention of writing $A(x)$, $G(x)$,
and so on only when $x$ is actually free in the formula; the formula may
contain other free variables besides $x$.) We see that the $G(x)$ in the
conclusion, or head, of each gen rule is constructed naturally from the subgoals.
The $G$ in con is similar, except we need to provide for the possibility that $x$
does not even occur in $A$. For this, we introduce “$\bot$” as a placeholder; it
may be thought of as a 0-ary predicate whose relation is always empty, that
is false.
$\mathrm{gen}(x, A, A)$ if $\mathrm{edb}(A)$ & $\mathrm{free}(x, A)$.
$\mathrm{gen}(x, \neg A, G)$ if $\mathrm{gen}(x, \mathrm{pushnot}(\neg A), G)$.
$\mathrm{gen}(x, \exists y A, G)$ if $\mathrm{distinct}(x, y)$ & $\mathrm{gen}(x, A, G)$.
$\mathrm{gen}(x, \forall y A, G)$ if $\mathrm{distinct}(x, y)$ & $\mathrm{gen}(x, A, G)$.
$\mathrm{gen}(x, A \lor B, G_1 \lor G_2)$ if $\mathrm{gen}(x, A, G_1)$ & $\mathrm{gen}(x, B, G_2)$.
$\mathrm{gen}(x, A \land B, G)$ if $\mathrm{gen}(x, A, G)$.
$\mathrm{gen}(x, A \land B, G)$ if $\mathrm{gen}(x, B, G)$.
$\mathrm{con}(x, A, A)$ if $\mathrm{edb}(A)$ & $\mathrm{free}(x, A)$.
$\mathrm{con}(x, A, \bot)$ if $\mathrm{absent}(x, A)$.
$\mathrm{con}(x, \neg A, G)$ if $\mathrm{con}(x, \mathrm{pushnot}(\neg A), G)$.
$\mathrm{con}(x, \exists y A, G)$ if $\mathrm{distinct}(x, y)$ & $\mathrm{con}(x, A, G)$.
$\mathrm{con}(x, \forall y A, G)$ if $\mathrm{distinct}(x, y)$ & $\mathrm{con}(x, A, G)$.
$\mathrm{con}(x, A \lor B, G_1 \lor G_2)$ if $\mathrm{con}(x, A, G_1)$ & $\mathrm{con}(x, B, G_2)$.
$\mathrm{con}(x, A \land B, G)$ if $\mathrm{gen}(x, A, G)$.
$\mathrm{con}(x, A \land B, G)$ if $\mathrm{gen}(x, B, G)$.
$\mathrm{con}(x, A \land B, G_1 \lor G_2)$ if $\mathrm{con}(x, A, G_1)$ & $\mathrm{con}(x, B, G_2)$.
Fig. 5. Expansion of rules for gen and con to produce “generators”.
Example 8.1. Consider $F(y) \stackrel{\mathrm{def}}{=} \exists x A(x, y)$, where
$$A(x, y) \stackrel{\mathrm{def}}{=} (Pxy \lor Qy) \land (Rxy \lor \neg Sy).$$
Figure 6 shows how the ternary gen and con apply to the subformulas of $F$.
Since $A(x, y)$ is the only quantified subformula in $F(y)$, we see that $F(y)$
is evaluable, but not allowed. This example is considered further in
Example 8.3.
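The generators of Example 8.1 can be computed by a direct reading of the rules in Figure 5. The following is a minimal sketch under stated assumptions: the tuple encoding and the names `gen3`, `con3`, and `BOTTOM` are hypothetical, a generator is represented as a frozenset of atoms (standing for their disjunction), negation is supported only directly above atoms (so pushnot is never needed), and the nondeterminism for conjunctions is resolved by taking the first conjunct that works.

```python
# Assumed encoding: ("atom", name, vars), ("not", atom), ("and", A, B),
# ("or", A, B), ("exists", v, A), ("forall", v, A).
BOTTOM = "⊥"   # placeholder generator: x does not occur in the subformula

def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return set(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    return free_vars(f[2]) - {f[1]}

def gen3(x, f):
    """Return a generator G with gen(x, f, G), or None if gen(x, f) fails."""
    tag = f[0]
    if tag == "atom":
        return frozenset([f]) if x in f[2] else None
    if tag == "not":                       # negated atom: no gen rule applies
        return None
    if tag in ("exists", "forall"):
        return gen3(x, f[2]) if x != f[1] else None
    g1, g2 = gen3(x, f[1]), gen3(x, f[2])
    if tag == "or":
        return g1 | g2 if g1 is not None and g2 is not None else None
    return g1 if g1 is not None else g2    # "and": either conjunct suffices

def con3(x, f):
    """Return a generator G with con(x, f, G), or None if con(x, f) fails."""
    if x not in free_vars(f):
        return frozenset([BOTTOM])
    tag = f[0]
    if tag == "atom":
        return frozenset([f])
    if tag == "not":                       # x is free in a negated atom
        return None
    if tag in ("exists", "forall"):
        return con3(x, f[2])
    if tag == "and":
        g = gen3(x, f)
        if g is not None:
            return g
    c1, c2 = con3(x, f[1]), con3(x, f[2])
    return c1 | c2 if c1 is not None and c2 is not None else None

Pxy, Qy = ("atom", "P", ("x", "y")), ("atom", "Q", ("y",))
Rxy, Sy = ("atom", "R", ("x", "y")), ("atom", "S", ("y",))
A = ("and", ("or", Pxy, Qy), ("or", Rxy, ("not", Sy)))

print(gen3("x", A))   # None: gen(x, A) fails, so F is not allowed
print(con3("x", A))   # the set {Pxy, Rxy, ⊥}, in some order
print(gen3("y", A))   # the set {Pxy, Qy}, in some order
```

The returned sets correspond to the third arguments $Pxy \lor Rxy \lor \bot$ and $Pxy \lor Qy$ shown in Figure 6.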
Definition 8.1. For any formula $G(x)$, actually containing $x$, and possibly
containing other free variables, $\exists^* G(x)$ denotes $G(x)$ with all variables
except $x$ existentially quantified. Also, $\exists^* \bot$ denotes false.
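Fig. 6. Application of rules for ternary gen and con, from Example 8.1. Read from the leaves to the root, the legible part of the derivation tree establishes:
Atoms: $\mathrm{gen}(x, Pxy, Pxy)$; $\mathrm{con}(x, Qy, \bot)$; $\mathrm{gen}(y, Pxy, Pxy)$; $\mathrm{gen}(y, Qy, Qy)$.
Disjunctions: $\mathrm{con}(x, (Pxy \lor Qy), Pxy \lor \bot)$; $\mathrm{con}(x, (Rxy \lor \neg Sy), Rxy \lor \bot)$; $\mathrm{gen}(y, (Pxy \lor Qy), Pxy \lor Qy)$.
Conjunction: $\mathrm{con}(x, (Pxy \lor Qy) \land (Rxy \lor \neg Sy), Pxy \lor Rxy \lor \bot)$; $\mathrm{gen}(y, (Pxy \lor Qy) \land (Rxy \lor \neg Sy), Pxy \lor Qy)$.
Root: $\mathrm{gen}(y, \exists x\, (Pxy \lor Qy) \land (Rxy \lor \neg Sy), Pxy \lor Qy)$.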
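Definition 8.2. The operation of truth value simplification consists of
applying the following simplifications to a formula as long as possible.
$\neg\,\mathrm{false} \to \mathrm{true}$; $\neg\,\mathrm{true} \to \mathrm{false}$
$A \land \mathrm{false} \to \mathrm{false}$; $A \land \mathrm{true} \to A$
$A \lor \mathrm{false} \to A$; $A \lor \mathrm{true} \to \mathrm{true}$
$\% x\, \mathrm{false} \to \mathrm{false}$; $\% x\, \mathrm{true} \to \mathrm{true}$
Note that “simplifications” that depend on the law of the excluded middle,
such as $A \lor \neg A \to \mathrm{true}$, are not part of this definition.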
Definition 8.3. Let $G \stackrel{\mathrm{def}}{=} P_1 \lor \cdots \lor P_m$, where the $P_i$ are atoms in formula $A$.
Then $A[G/\mathrm{false}]$ denotes the formula in which each occurrence of $P_i$ in $A$ is
replaced by false.
Example 8.2. Consider the evaluable, but nonallowed, formula
$F \stackrel{\mathrm{def}}{=} \exists x A(x)$, where
$$A(x) \stackrel{\mathrm{def}}{=} \neg \exists y (R(100, y) \land \neg Sxy).$$
Recall from Examples 5.2 and 5.4 that $F$ represents the query “Does some
supplier supply all parts required by project 100?” Here, $\mathrm{con}(x, A(x), Sxy \lor \bot)$
holds, but $\mathrm{gen}(x, A(x))$ fails.
Letting $G \stackrel{\mathrm{def}}{=} (Sxy \lor \bot)$, we have $\exists^* G(x) = \exists y\, Sxy$. Note also that
$$A[Sxy/\mathrm{false}] = \neg \exists y [R(100, y) \land \neg\,\mathrm{false}]$$
Truth value simplification removes the second conjunct. The resulting
formula and $\exists^* G(x)$ play a key role in the transformation of evaluable formulas
into allowed formulas, as discussed further in Example 8.4.
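The substitution $A[G/\mathrm{false}]$ of Definition 8.3 followed by the truth value simplification of Definition 8.2 can be sketched as two small recursive functions. This is an illustrative sketch only: the tuple encoding and the names `substitute_false` and `simplify` are assumptions, and a single bottom-up pass is used, which suffices here because every rule rewrites a node whose operands are already simplified.

```python
# Assumed encoding: ("atom", name, vars), ("not", A), ("and", A, B),
# ("or", A, B), ("exists", v, A), ("forall", v, A), plus two constants.
TRUE, FALSE = ("true",), ("false",)

def substitute_false(f, atoms):
    """A[G/false]: replace each occurrence of an atom of G in f by false."""
    tag = f[0]
    if tag == "atom":
        return FALSE if f in atoms else f
    if tag in ("true", "false"):
        return f
    if tag == "not":
        return ("not", substitute_false(f[1], atoms))
    if tag in ("and", "or"):
        return (tag, substitute_false(f[1], atoms), substitute_false(f[2], atoms))
    return (tag, f[1], substitute_false(f[2], atoms))    # exists / forall

def simplify(f):
    """Truth value simplification (Definition 8.2), applied bottom-up."""
    tag = f[0]
    if tag in ("atom", "true", "false"):
        return f
    if tag == "not":
        a = simplify(f[1])
        if a == TRUE:
            return FALSE
        if a == FALSE:
            return TRUE
        return ("not", a)
    if tag in ("exists", "forall"):
        a = simplify(f[2])
        return a if a in (TRUE, FALSE) else (tag, f[1], a)
    a, b = simplify(f[1]), simplify(f[2])
    # For "and", false absorbs and true is neutral; for "or", the reverse.
    absorbing, neutral = (FALSE, TRUE) if tag == "and" else (TRUE, FALSE)
    if absorbing in (a, b):
        return absorbing
    if a == neutral:
        return b
    if b == neutral:
        return a
    return (tag, a, b)

# Example 8.2: A(x) = not exists y (R(100, y) and not Sxy), with G = Sxy.
R100y = ("atom", "R", (100, "y"))
Sxy = ("atom", "S", ("x", "y"))
A = ("not", ("exists", "y", ("and", R100y, ("not", Sxy))))
print(simplify(substitute_false(A, {Sxy})))
```

For Example 8.2 the result is the encoding of $\neg \exists y\, R(100, y)$, in which $x$ no longer occurs. Note that no rule based on the excluded middle is applied.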
The following lemma partly motivates the definition of the third
arguments of gen and con. It shows that, for any interpretation (i.e., database),
the set of values of $x$ for which $A(x)$ holds is a subset of those for which $G(x)$
holds.
LEMMA 8.1. Let gen be defined as in Figure 5. Suppose that
$\mathrm{gen}(x, A(x), G(x))$ holds, where $A$ and $G$ may have other free variables as
well. Then
$$A(x) \Rightarrow \exists^* A(x) \Rightarrow \exists^* G(x).$$
PROOF. The first logical implication follows by its definition, and the
second is straightforward by structural induction, observing that
$\forall y A \Rightarrow \exists y A$. □
8.2 The Main Algorithm
In the following Algorithm 8.1, which implements the procedure
con-to-gen($F$), we describe a local transformation that, when repeatedly applied,
transforms an evaluable formula into an equivalent allowed formula. Before
applying con-to-gen, it is necessary to check separately that $F$ satisfies gen
on all of its free variables, and to replace $\forall y$ by $\neg \exists y \neg$ throughout. If $F$ at the
top level is not evaluable due to an internal con violation, an error message
will result; however, recursive calls of con-to-gen are made on subformulas
that may be nonevaluable.
Algorithm 8.1. con-to-gen($F$)
Input: A formula $F$ with no universal quantifiers such that $\mathrm{con}(x, A(x))$ holds in
every subformula of the form $\exists x A(x)$. (If the latter property is violated, the
algorithm halts with an error message.)
Output: A formula $F'$ equivalent to $F$ with these properties:
1. If $\mathrm{gen}(x, F)$ holds, so does $\mathrm{gen}(x, F')$.
2. If $\mathrm{gen}(x, \neg F)$ holds, so does $\mathrm{gen}(x, \neg F')$.
3. Property $\mathrm{gen}(x, A'(x))$ holds in every subformula of the form $\exists x A'(x)$.
Procedure:
1. If $F$ is an atom, then return $F$.
2. If $F$ has the form $\neg A$, then return $\neg$ con-to-gen($A$).
3. If $F$ has the form $A \land B$, then return con-to-gen($A$) $\land$ con-to-gen($B$).
4. If $F$ has the form $A \lor B$, then return con-to-gen($A$) $\lor$ con-to-gen($B$).
5. If $F$ has the form $\exists x A$, where $x$ may not occur in $A$ and $A$ may contain
other variables, then:
(a) If $\mathrm{gen}(x, A(x), G(x))$ holds, then return $\exists x$ con-to-gen($A(x)$).
(b) If $\mathrm{con}(x, A, G)$ holds, then:
(i) If $x$ is not free in $A$, and hence $G = \bot$, then return con-to-gen($A$).
(ii) If $x$ is free in $A$, then $G(x)$ is a disjunction $P_1(x) \lor \cdots \lor P_m(x)$ of
atoms in $A(x)$; here $m \geq 1$ and there may be additional disjuncts
that are “$\bot$”. Let $\mathcal{R}$ be the truth value simplification of $A[G/\mathrm{false}]$
(see Definition 8.3).$^4$ Define:
$$\hat{F} \stackrel{\mathrm{def}}{=} \exists x [\exists^* G(x) \land A(x)] \lor \mathcal{R}$$
and return con-to-gen($\hat{F}$).
(c) If $\mathrm{con}(x, A, G)$ does not hold (for any $G$), print an error message, and
halt.
$^4$ Quantified variables in $A(x)$ are given new names in $\mathcal{R}$, of course.
The key step in Algorithm 8.1 is 5b(ii), where $F$ is rewritten as $\hat{F}$. To
justify the algorithm we need to prove both that $F \equiv \hat{F}$ and that gen
properties in the larger formulas in which $F$ is embedded are preserved when
$F$ is replaced by $\hat{F}$. Proof of these facts is deferred to Section 8.3. Here, we
present some examples and discussion of the algorithm.
Example 8.3. Consider $F(y) \stackrel{\mathrm{def}}{=} \exists x A(x, y)$, where
$$A(x, y) \stackrel{\mathrm{def}}{=} (Pxy \lor Qy) \land (Rxy \lor \neg Sy)$$
as in Example 8.1. From that example we saw that $\mathrm{gen}(x, A(x, y), G)$ does
not hold (for any $G$), so $F(y)$ is not allowed. But $\mathrm{con}(x, A(x, y), Pxy \lor Rxy \lor \bot)$
does hold, and in step 5b(ii) $\mathcal{R}(y)$ is the truth value simplification of
$$(\mathrm{false} \lor Qy) \land (\mathrm{false} \lor \neg Sy).$$
Observe in passing that $\mathrm{gen}(y, \mathcal{R}(y), Qy)$ holds. Therefore, the algorithm
defines
$$\hat{F} \stackrel{\mathrm{def}}{=} \exists x [\exists y' [Pxy' \lor Rxy'] \land A(x, y)] \lor (Qy \land \neg Sy)$$
which is an allowed formula equivalent to $F$.
Example 8.4. Consider the formula $F \stackrel{\mathrm{def}}{=} \exists x A(x)$, where
$$A(x) \stackrel{\mathrm{def}}{=} \neg \exists y (R(100, y) \land \neg Sxy).$$
Recall from Examples 5.2, 5.4, and 8.2 that $F$ represents the query “Does
some supplier supply all parts required by project 100?” and was found to be
evaluable, but not allowed. From the latter example, we see that $\mathcal{R}$ of
Algorithm 8.1 is $\neg \exists y'' R(100, y'')$. It follows that the algorithm defines
$$\hat{F} \stackrel{\mathrm{def}}{=} \exists x [\exists y' Sxy' \land A(x)] \lor \neg \exists y'' R(100, y'')$$
which is an allowed formula equivalent to $F$.
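The equivalence claimed in Example 8.4 can be sanity-checked by brute force over all databases on a few small domains. The sketch below is an illustration, not a proof: the function names are hypothetical, $R(100, \cdot)$ is encoded as the set `R` of parts required by project 100, `S` is a set of (supplier, part) pairs, and all quantifiers range over a finite non-empty domain `dom`.

```python
from itertools import chain, combinations, product

def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def A(x, dom, R, S):
    # not exists y (R(100, y) and not Sxy)
    return not any(y in R and (x, y) not in S for y in dom)

def F(dom, R, S):
    # exists x A(x)
    return any(A(x, dom, R, S) for x in dom)

def F_hat(dom, R, S):
    # exists x [exists y' Sxy' and A(x)]  or  not exists y'' R(100, y'')
    first = any(any((x, y) in S for y in dom) and A(x, dom, R, S) for x in dom)
    return first or not any(y in R for y in dom)

def equivalent_on(dom):
    """Compare F and F_hat on every database over the given domain."""
    pairs = list(product(dom, repeat=2))
    for r in subsets(dom):
        for s in subsets(pairs):
            if F(dom, set(r), set(s)) != F_hat(dom, set(r), set(s)):
                return False
    return True

print(all(equivalent_on(list(range(n))) for n in (1, 2, 3)))
```

The check enumerates 4, 64, and 4096 databases for domains of size 1, 2, and 3. Agreement on these cases is consistent with Lemma 8.5 but does not replace its proof.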
It follows immediately from the correctness of this algorithm, which is
established below in Theorem 8.6 and Theorem 7.1, that every range
restricted formula can also be effectively transformed into an equivalent
allowed formula. In this special case, our procedure reduces to a variant of
Decker’s in which $\exists^* G(x)$ plays the role of Decker’s “range expression” and
$\mathcal{R}$ plays the role of his “remainder.”
Finally, we observe that the expanded rules for gen and con have some
nondeterminacy for conjunctions: the G of either conjunct can be used when
gen holds for both. This choice represents an opportunity for optimization.
8.3 Correctness of Main Algorithm
This subsection addresses the correctness of Algorithm 8.1. The correctness
theorem is supported by a series of lemmas, some of which are rather
technical. We begin with some definitions.
Definition 8.4. By subdisjunction of $P_1 \lor \cdots \lor P_n$ we mean a
“disjunction” of zero or more of the $P_i$; for zero, we use $\bot$ and for one we use just the
atom.
Definition 8.5. A formula is in literal form if the only negations are
immediately above atoms, and cannot be further simplified by combination
with a built-in predicate. (In particular, $\neg(x = y)$ and $\neg(x \neq y)$ may not
appear.)
Clearly, any formula can be put into literal form while preserving its gen
and con properties, but at the expense of including universal quantifiers. We
shall express several lemmas in terms of formulas in literal normal form, to
facilitate proofs. Although Algorithm 8.1 operates on formulas without uni-
versal quantifiers and not in literal form, the necessary properties estab-
lished in the lemmas carry over to these formulas as well.
LEMMA 8.2. Let $A(x)$ be a formula in literal form for which $\mathrm{gen}(x, A(x))$
fails, but $\mathrm{con}(x, A(x), G(x))$ holds, where $G(x) \stackrel{\mathrm{def}}{=} P_1 \lor \cdots \lor P_m$, as in Step
5b(ii) of Algorithm 8.1. Let $B(x)$ be a subformula of $A(x)$ and let $H(x)$ be a
subdisjunction of $G(x)$ such that $\mathrm{gen}(x, B(x), H(x))$ holds. ($A$, $B$, $F$, $G$, and
$H$ may have other free variables.) Then $B[G/\mathrm{false}]$ truth value simplifies to
false.
PROOF. The proof is by induction on $h$, the height of $B$. For the basis
$h = 0$, $B$ is an edb atom, so $H = B$. For $h > 0$, assume the lemma holds for
all subformulas of $A$ with height less than $h$, and consider each of the
possible principal operators.
If $B \stackrel{\mathrm{def}}{=} B_1 \lor B_2$, then there are subdisjunctions $H_1$ and $H_2$ such that
$\mathrm{gen}(x, B_1, H_1)$ and $\mathrm{gen}(x, B_2, H_2)$ both hold, and $H = H_1 \lor H_2$. But then, by
the inductive hypothesis, both $B_1[G/\mathrm{false}]$ and $B_2[G/\mathrm{false}]$ evaluate to
false.
If $B \stackrel{\mathrm{def}}{=} B_1 \land B_2$, then either $\mathrm{gen}(x, B_1, H)$ or $\mathrm{gen}(x, B_2, H)$ holds. But then,
by the inductive hypothesis, one of $B_1[G/\mathrm{false}]$ and $B_2[G/\mathrm{false}]$ evaluates to
false. Similar arguments apply to principal operators $\exists y$ and $\forall y$.
Finally, if $B \stackrel{\mathrm{def}}{=} \neg B'$, then $B'$ must be an atom, so the premise of the lemma is
false. □
LEMMA 8.3. Let $A(x)$ be a formula in literal form for which $\mathrm{gen}(x, A(x))$
fails, but $\mathrm{con}(x, A(x), G(x))$ holds, where $G(x) \stackrel{\mathrm{def}}{=} P_1 \lor \cdots \lor P_m$, as in Step
5b(ii) of Algorithm 8.1. Let $B$ be a subformula of $A(x)$ and let $H(x)$ be a
subdisjunction of $G(x)$ such that $\mathrm{gen}(x, B)$ fails, but $\mathrm{con}(x, B, H(x))$ holds.
($A$, $B$, $F$, $G$, and $H$ may have other free variables, and $H$ may equal $\bot$.) Let
$\mathcal{R}_B$ be the truth value simplification of $B[G/\mathrm{false}]$. Then $x$ is not free in $\mathcal{R}_B$
and $\neg \exists^* G(x) \land B \Rightarrow \mathcal{R}_B$.
PROOF. The proof is by induction on $h$, the height of $B$. For the basis
$h = 0$, $B$ must be an atom in which $x$ is not free, so $H = \bot$ and $\mathcal{R}_B = B$. For
$h > 0$, assume the lemma holds for all subformulas of $A$ with height less
than $h$, and consider each of the possible principal operators.
If $B \stackrel{\mathrm{def}}{=} B_1 \lor B_2$, then there are subdisjunctions $H_1$ and $H_2$ such that
$\mathrm{con}(x, B_1, H_1)$ and $\mathrm{con}(x, B_2, H_2)$ both hold, and $H = H_1 \lor H_2$. We consider
cases according to whether gen holds on subformulas. Because $\mathrm{gen}(x, B)$ does
not hold, it is impossible that $\mathrm{gen}(x, B_1)$ and $\mathrm{gen}(x, B_2)$ both hold.
Suppose $\mathrm{gen}(x, B_1)$ holds and $\mathrm{gen}(x, B_2)$ fails. Then by the definitions of
gen and con there is some subdisjunction $H_1$ of $H$ such that $\mathrm{gen}(x, B_1, H_1)$
holds, and by Lemma 8.2 $B_1[G/\mathrm{false}] = \mathrm{false}$. Therefore, $B[G/\mathrm{false}] =
B_2[G/\mathrm{false}]$, in which $x$ is not free. The inductive hypothesis also implies
that $\neg \exists^* G(x) \land B_2 \Rightarrow B_2[G/\mathrm{false}] = \mathcal{R}_B$. Moreover, by Lemma 8.1,
$B_1 \Rightarrow \exists^* H_1 \Rightarrow \exists^* G(x)$. It follows that $\neg \exists^* G(x) \land B \Rightarrow \mathcal{R}_B$.
The case when $\mathrm{gen}(x, B_2)$ holds and $\mathrm{gen}(x, B_1)$ fails is similar. Now
suppose $\mathrm{gen}(x, B_1)$ and $\mathrm{gen}(x, B_2)$ both fail. Then the inductive hypothesis
applies to both $B_1$ and $B_2$, for con must hold for both and the conclusion
follows easily for $B$.
When $B$ is of the form $B \stackrel{\mathrm{def}}{=} B_1 \land B_2$, then neither $\mathrm{gen}(x, B_1)$ nor $\mathrm{gen}(x, B_2)$
can hold, but con must hold, so the inductive hypothesis applies to both
subformulas, and the conclusion follows for $B$. Similar arguments apply to
$B \stackrel{\mathrm{def}}{=} \exists y B_1$ and $B \stackrel{\mathrm{def}}{=} \forall y B_1$.
Finally, if $B \stackrel{\mathrm{def}}{=} \neg B'$, then $B'$ must be an atom, and must not contain $x$, so
$H = \bot$ as in the base case. □
LEMMA 8.4. Let $A(x, y)$ be a formula in literal form for which
$\mathrm{gen}(x, A(x, y))$ fails, but $\mathrm{con}(x, A(x, y), G(x))$ holds, where
$G(x) = P_1 \lor \cdots \lor P_m$, as in Step 5b(ii) of Algorithm 8.1. Let $B(y)$ be a subformula of
$A(x, y)$ in which $y$, distinct from $x$, is free. ($A$, $B$, and $G$ may have other free
variables.) Let $\mathcal{R}_B$ be the truth value simplification of $B[G/\mathrm{false}]$. Then,
(1) If $\mathrm{gen}(y, B(y))$ holds, then either $\mathcal{R}_B = \mathrm{false}$ or $\mathrm{gen}(y, \mathcal{R}_B)$ holds.
(2) If $\mathrm{gen}(y, \neg B(y))$ holds, then either $\mathcal{R}_B = \mathrm{true}$ or $\mathrm{gen}(y, \neg \mathcal{R}_B)$ holds.
PROOF. The proof is by induction on $h$, the height of $B$. For the basis
$h = 0$, $B(y)$ is an atom and $\mathcal{R}_B$ is either $B$ itself or false. For $h > 0$, assume
the lemma holds for all subformulas of $A$ with height less than $h$, and
consider each of the possible principal operators.
Suppose $B(y) \stackrel{\mathrm{def}}{=} B_1 \lor B_2$. If $\mathrm{gen}(y, B)$, then $\mathrm{gen}(y, B_1)$ and $\mathrm{gen}(y, B_2)$. By
the inductive hypothesis, $\mathcal{R}_{B_i} = \mathrm{false}$ or $\mathrm{gen}(y, \mathcal{R}_{B_i})$ for each of $i = 1, 2$. The
same conclusion for $\mathcal{R}_B$ follows immediately from the definition of gen. If
$\mathrm{gen}(y, \neg B)$, then either $\mathrm{gen}(y, \neg B_1)$ or $\mathrm{gen}(y, \neg B_2)$, say the former. By the
inductive hypothesis either $\mathcal{R}_{B_1} = \mathrm{true}$ or $\mathrm{gen}(y, \neg \mathcal{R}_{B_1})$, and the same
conclusion follows for $\mathcal{R}_B$.
Suppose $B(y) \stackrel{\mathrm{def}}{=} B_1 \land B_2$. The argument when $\mathrm{gen}(y, B)$ holds is similar to
the argument for $\mathrm{gen}(y, \neg B)$ above, where $B$ was a disjunction. The
argument when $\mathrm{gen}(y, \neg B)$ holds is similar to the argument for $\mathrm{gen}(y, B)$ above,
where $B$ was a disjunction.
Suppose $B(y) \stackrel{\mathrm{def}}{=} \exists z B_1(y, z)$. If $\mathrm{gen}(y, B(y))$, then $\mathrm{gen}(y, B_1(y, z))$ holds,
the inductive hypothesis applies, and the conclusion follows. If $\mathrm{gen}(y, \neg B(y))$,
then $\mathrm{gen}(y, \neg B_1(y, z))$ holds, and the inductive hypothesis applies. Either
$\mathcal{R}_{B_1} = \mathrm{true}$, in which case $\mathcal{R}_B$ is $\exists z\, \mathrm{true}$, which simplifies to true, or
$\mathrm{gen}(y, \neg \mathcal{R}_{B_1})$ holds, in which case so does $\mathrm{gen}(y, \neg \mathcal{R}_B)$. The case
$B(y) \stackrel{\mathrm{def}}{=} \forall z B_1(y, z)$ is similar.
Finally, suppose $B(y) \stackrel{\mathrm{def}}{=} \neg B_1(y)$. Then $B_1$ is an atom, so $\mathrm{gen}(y, B(y))$
is impossible. If $\mathrm{gen}(y, \neg B(y))$, then $\mathrm{gen}(y, B_1(y), H(y))$. That is, $H = B_1$,
$\mathcal{R}_{B_1} = \mathrm{false}$, and $\mathcal{R}_B = \mathrm{true}$. □
LEMMA 8.5. Let $A(x)$ be a formula for which $\mathrm{gen}(x, A(x))$ fails, but
$\mathrm{con}(x, A(x), G(x))$ holds, where $G(x) \stackrel{\mathrm{def}}{=} P_1 \lor \cdots \lor P_m$, as in Step 5b(ii) of
Algorithm 8.1. Let $\mathcal{R}$ be the truth value simplification of $A(x)[G/\mathrm{false}]$.
Then,
$$A(x) \equiv [\exists^* G(x) \land A(x)] \lor \mathcal{R}$$
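PROOF. Let $\bar{A}$ be equivalent to $A$, but in literal form, and let
$$A'(x) \stackrel{\mathrm{def}}{=} \exists^* G(x) \land A(x)$$
$$A''(x) \stackrel{\mathrm{def}}{=} \neg \exists^* G(x) \land A(x)$$
Clearly, $A(x) \equiv A'(x) \lor A''(x)$. But it follows from Lemma 8.3 with $B = \bar{A}$
that $A''(x) \Rightarrow \mathcal{R}$. This shows that
$$A(x) \Rightarrow (\exists^* G(x) \land A(x)) \lor \mathcal{R}.$$
For the other direction, it is sufficient to show that $\mathcal{R} \Rightarrow A(x)$. This follows
from the facts that each atom in $G(x)$ occurs positively in $A(x)$, and
$\mathcal{R} = A[G/\mathrm{false}]$. □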
THEOREM 8.6. Every evaluable formula can be effectively transformed into
an equivalent allowed formula.
PROOF. By Lemma 8.5, $F \equiv \hat{F}$ in Step 5b(ii) of Algorithm 8.1. Clearly,
$\mathrm{gen}(x, \exists^* G(x) \land A(x), G(x))$ holds. Let $\bar{A}$ be equivalent to $A$, but in literal
form. By Lemma 8.3 with $B = \bar{A}$, $x$ is not free in $\mathcal{R}_{\bar{A}}$, so $x$ is not free in $\mathcal{R}$.
We need to check that $\hat{F}$ preserves gen properties in the larger
subformulas in which $F$ is embedded. If $\mathrm{gen}(y, F)$, then clearly $\mathrm{gen}(y, A)$, and by
Lemma 8.4 with $B = \bar{A}$, $\mathrm{gen}(y, \mathcal{R}_{\bar{A}})$. But $\mathcal{R}_{\bar{A}}$ is just the literal form of $\mathcal{R}$, so
$\mathrm{gen}(y, \mathcal{R})$. It follows that $\mathrm{gen}(y, \hat{F})$. Similarly, if $\mathrm{gen}(y, \neg F)$, it is sufficient
to show that $\mathrm{gen}(y, \neg \mathcal{R})$; but this also follows from Lemma 8.4. The other
steps of the algorithm clearly preserve gen properties, so by structural
induction, if the initial formula satisfies gen on all of its free variables, so
does the formula returned.
Also, it is clear that con is preserved in subformulas of $A$, and $A[G/\mathrm{false}]$
by step 5b(ii). Since $\mathcal{R}$ is the truth value simplification of $A[G/\mathrm{false}]$, some
of these subformulas may simplify to false or true; however, con holds for
these. (If a subformula of $\mathcal{R}$, say $\mathcal{R}_B(y)$, satisfies $\mathrm{con}(y, \mathcal{R}_B(y))$ when its
corresponding subformula $B(y)$ in $A$ did not satisfy $\mathrm{con}(y, B(y))$, that
cannot make a nonevaluable $F$ into an evaluable $\hat{F}$, because $B(y)$ still occurs
in $A$.) It follows that Algorithm 8.1, applied to an evaluable formula, returns
an equivalent allowed formula. □
9. TRANSLATION INTO A RELATIONAL ALGEBRA EXPRESSION
We now describe a procedure to translate any allowed formula into an
equivalent relational algebra expression, by progressing through two normal
forms, ENF and RANF, to be defined below. In combination with the trans-
formation of the previous section, this allows any evaluable formula to be
translated into an equivalent relational algebra expression.
The translation procedure has three phases:
(1) transformation of the allowed formula into existential normal form (ENF),
(2) transformation of the ENF formula into a relational algebra normal form
(RANF),
(3) translation of the RANF formula into a relational algebra expression.
For these translations, it is convenient to regard “$\land$” and “$\lor$” as polyadic
operators on sets of formulas with no order among them. Similarly, “$\exists$”
applies to a nonempty set of variables: we write $\exists \vec{x}$ to denote $\exists x_1 \cdots x_m$.
Our goal in defining RANF is to ensure that the relation for each
subformula that we need to evaluate is finite. We do not need to evaluate $\neg B$ if we
have “something to subtract it from”, i.e., if it occurs in the context of
$A \land \neg B$, but we still need to evaluate $A$ and $B$, then take their difference.
Thus, we need $B$’s free variables to be a subset of $A$’s for this to be
well-defined and finite. Similarly, if we need to evaluate $(A \lor B)$, its relation
is finite only if $A$ and $B$ have the same free variables. With these
observations as motivation, we proceed to the formal development.
9.1 Relational Algebra Normal Form
To facilitate defining relational algebra normal form, we begin with preliminary
definitions. From now on, we assume universal quantifiers have been
removed from all formulas by rewriting ∀x as ¬∃x¬.
Definition 9.1. The property genall( A) holds for formula A if and only if
gen( x, A) holds for each free variable x in A.
Note that if genall( A) holds, then the relation for A must be finite.
Definition 9.2. A formula (without universal quantifiers) is called simplified if:
(1) There is no occurrence of ¬¬A.
(2) There is no occurrence of ¬(x = y), or ¬(x ≠ y), or negation of any other
restriction atom, as defined in Definition 4.1.
(3) The polyadic operators “∧”, “∨”, and “∃” are “flattened”; that is:
(a) in subformula A_1 ∧ ... ∧ A_n, no operand A_i is itself a conjunction,
and n ≥ 2.
(b) in subformula A_1 ∨ ... ∨ A_n, no operand A_i is itself a disjunction,
and n ≥ 2.
(c) in subformula ∃x̄ A, operand A does not begin with ∃.
(4) In every subformula ∃x̄ A, each variable x_i is actually free in A.
We assume the existence of a function simplify that transforms an arbi-
trary formula into an equivalent (simplified) formula that satisfies Definition
9.2. It is important for the analysis that ∧, ∨, and ∃ are regarded as polyadic
operators and that the formulas are flattened.
Definition 9.3. A simplified formula is negative if its root is “¬”; otherwise,
it is positive. An arbitrary formula is negative (resp., positive) if its
simplified form is negative (resp., positive).
A subformula of a simplified formula is restrictive if its parent is “∧” and it
is either negative or a restriction atom (see Definition 4.1) such as x = y or
x ≠ y. Any other formula is called constructive.
Intuitively, restrictive subformulas need not be evaluated themselves because
they are only needed to remove tuples from another relation that was
evaluated. For example, in A(x, y) ∧ (x = y) ∧ ¬B(y), subformulas x = y
and ¬B(y) are restrictive, but B(y) itself is constructive.
Definition 9.4. A formula is in existential normal form (ENF) if:
(1) It is simplified.
(2) For each disjunction in the formula:
(a) The parent of the disjunction, if it has one, is “∧”.
(b) Each operand of the disjunction is a positive formula.
Example 9.1. The following allowed formulas are in ENF.
(Pxy ∧ ¬∃u Quy) ∨ ∃z(Sxyz ∧ ¬Rz)
Pxy ∧ (Qx ∨ Ry)
Pxy ∧ ¬∃z(Qxz ∧ ¬Ryz)
Px ∧ ¬∃y(Qy ∧ ¬∃z Rxyz)
Of course, a formula may be in ENF, but not be allowed:
Pz ∧ (Qx ∨ Ry).
Definition 9.4 leads to prohibited parent-child combinations in an ENF
formula, as shown in Figure 7 by nonblank entries; these entries also specify
rewrite rules to correct the violations. The diagonal entries “s” denote a call
to simplify(F) (see Definition 9.2), and have highest priority. The off-diagonal
entries refer to other rewrite rules below which are only applied after
simplification (cf. Figure 3):

                 Child
Parent    ∨      ∧      ∃      ¬
  ∨       s                    T2R
  ∧              s
  ∃       T9            s
  ¬       T3     T2†           s

† only if every conjunct of the ∧ is negative.
Fig. 7. Rewrite rules to achieve ENF; “s” denotes a call to simplify(F); blank
entries denote acceptable parent-child combinations.
¬(¬A_1 ∧ ... ∧ ¬A_n) ⟶ A_1 ∨ ... ∨ A_n (T2)
¬A ∨ B_1 ∨ ... ∨ B_n ⟶ ¬(A ∧ ¬B_1 ∧ ... ∧ ¬B_n) (T2R)
¬(A_1 ∨ ... ∨ A_n) ⟶ (¬A_1 ∧ ... ∧ ¬A_n) (T3)
∃x̄(A(x̄) ∨ B(x̄)) ⟶ (∃x̄ A(x̄) ∨ ∃x̄ B(x̄)) (T9).
The restriction to simplified formulas ensures that the following conditions
hold for application of certain rewrite rules:
(1) In T2, all A_i must be positive.
(2) In T2R, A must be positive.
The following algorithm uses the rewrite rules to achieve a transformation
into ENF.
Algorithm 9.1. enf(F)
Input: A formula F.
Output: An ENF formula equivalent to F.
Procedure: In this algorithm, “formula” means a tree data structure representing
a formula. The notation F[A/B], where A is a subformula of F, means the
data structure for F is modified by replacing the subtree for A by a tree that
represents B.
F_0 := simplify(F).
While some subformula A of F_0 matches the left side of one of the rewrite rules
T2, T2R, T3, or T9, do:
    Apply the rewrite rule to A, yielding result B.
    F_0 := simplify(F_0[A/B]).
Return F_0.
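The loop of Algorithm 9.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-tuple encoding of formulas, the function names, and the top-down order in which a matching subformula is chosen are all choices made here, not part of the paper, and equality atoms are left out, so simplify covers only double negation, flattening, and unused quantified variables.

```python
# Assumed encoding (not the paper's): ("atom", name, vars), ("not", f),
# ("and", (f1, ...)), ("or", (f1, ...)), ("exists", (v1, ...), f).

def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return set(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return set().union(*(free_vars(g) for g in f[1]))
    return free_vars(f[2]) - set(f[1])              # exists

def simplify(f):
    """Definition 9.2 without equality: no double negation, flattened operators."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag == "not":
        g = simplify(f[1])
        return g[1] if g[0] == "not" else ("not", g)
    if tag in ("and", "or"):
        ops = []
        for g in map(simplify, f[1]):
            ops.extend(g[1] if g[0] == tag else [g])
        return ops[0] if len(ops) == 1 else (tag, tuple(ops))
    bound, body = tuple(f[1]), simplify(f[2])       # exists
    if body[0] == "exists":                         # merge nested quantifiers
        bound += tuple(v for v in body[1] if v not in bound)
        body = body[2]
    bound = tuple(v for v in bound if v in free_vars(body))
    return ("exists", bound, body) if bound else body

def rewrite_once(f):
    """Apply one of T2, T2R, T3, T9 somewhere in f; return (formula, changed)."""
    tag = f[0]
    if tag == "not" and f[1][0] == "or":                                  # T3
        return ("and", tuple(("not", g) for g in f[1][1])), True
    if tag == "not" and f[1][0] == "and" and all(g[0] == "not" for g in f[1][1]):
        return ("or", tuple(g[1] for g in f[1][1])), True                 # T2
    if tag == "or" and any(g[0] == "not" for g in f[1]):                  # T2R
        i = next(i for i, g in enumerate(f[1]) if g[0] == "not")
        rest = f[1][:i] + f[1][i + 1:]
        return ("not", ("and", (f[1][i][1],) + tuple(("not", g) for g in rest))), True
    if tag == "exists" and f[2][0] == "or":                               # T9
        return ("or", tuple(("exists", f[1], g) for g in f[2][1])), True
    if tag == "not":                                # no rule here: look deeper
        g, changed = rewrite_once(f[1])
        return ("not", g), changed
    if tag in ("and", "or"):
        for i, g in enumerate(f[1]):
            h, changed = rewrite_once(g)
            if changed:
                return (tag, f[1][:i] + (h,) + f[1][i + 1:]), True
        return f, False
    if tag == "exists":
        g, changed = rewrite_once(f[2])
        return ("exists", f[1], g), changed
    return f, False

def enf(f):
    f = simplify(f)
    while True:
        f, changed = rewrite_once(f)
        if not changed:
            return f
        f = simplify(f)

P = ("atom", "P", ("x",))
Q = ("atom", "Q", ("y",))
R = ("atom", "R", ("x", "y", "z"))
# Px ∧ ∀y(¬Qy ∨ ∃z Rxyz), with ∀y written as ¬∃y¬
source = ("and", (P, ("not", ("exists", ("y",),
          ("not", ("or", (("not", Q), ("exists", ("z",), R))))))))
# Px ∧ ¬∃y(Qy ∧ ¬∃z Rxyz)
target = ("and", (P, ("not", ("exists", ("y",),
          ("and", (Q, ("not", ("exists", ("z",), R))))))))
print(enf(source) == target)
```

On this input a single application of T3 followed by simplification is enough; other inputs may need several rounds.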
LEMMA 9.1. Algorithm 9.1 transforms any formula into an equivalent
ENF formula. Further, if the input formula is allowed, so is the output.
PROOF. It is clear that F_0 satisfies Definition 9.4 when the “while”
terminates, as none of the rewrite rules in enf(F) applies. It only remains to
check that the “while” must terminate. The only possibility of looping
involves T2, T2R, and T3. But simplification ensures that T2 cannot be
followed by T3 at the same point in the formula. As mentioned above, A must
be positive when T2R is used. This ensures that the result cannot match the
left side of T2. Thus a finite number of rule applications are possible at any
given depth in the formula, and no rule increases the number of atoms, so the
total number of possible rule applications is finite.
A consequence of Theorem 6.6 is that the rewrite rules of the algorithm
preserve the “allowed” property. ❑
From ENF as a starting point, we can now define relational algebra
normal form.
Definition 9.5. A formula F is in relational algebra normal form (RANF)
if it is in ENF, genall( F) holds, and every proper constructive subformula of
F is in RANF.
Subtractive subformulas need not be in RANF because “∧ ¬” can be
implemented in relational algebra as “difference”, as described later, while
x = y and x ≠ y become selections. However, a negative subformula that is
not restrictive must be put into RANF. Of course, genall holds for a negative
subformula if and only if it has no free variables.
Example 9.2. Consider again the ENF formulas from Example 9.1. This
formula is in RANF:
(Pxy ∧ ¬∃u Quy) ∨ ∃z(Sxyz ∧ ¬Rz)
Clearly, not every allowed ENF formula is in RANF. Not only are the
following allowed formulas not in RANF, but no conservative transformation
of them yields an RANF formula.
Pxy ∧ (Qx ∨ Ry)
Pxy ∧ ¬∃z(Qxz ∧ ¬Ryz)
Px ∧ ¬∃y(Qy ∧ ¬∃z Rxyz).
It follows from the next lemma that the RANF class is unchanged if the
recursive RANF requirement in its definition is replaced by the corresponding
“allowed” requirement.
LEMMA 9.2. An ENF formula F is in RANF if and only if F is allowed and
each constructive subformula A of F is allowed.
PROOF. The “if” case follows immediately from the definition of RANF
and the fact that an allowed formula satisfies genall. To prove the “only if”
case, we use structural induction on F. For A an edb atom, genall(A) holds
and A is allowed. Assume positive formulas A, A_i, and B_j are allowed, by
inductive hypothesis. Then, by the definitions of “allowed”, gen, and genall:
(1) If genall(A_1 ∨ ... ∨ A_m) holds, then each A_i has the same free variables
and is not x = y or x ≠ y. Thus, A_1 ∨ ... ∨ A_m is allowed.
(2) If genall(A_1 ∧ ... ∧ A_m ∧ ¬B_1 ∧ ... ∧ ¬B_n) holds, then each free variable
in each B_j also occurs free in some A_i (other than x = y or x ≠ y). Thus,
A_1 ∧ ... ∧ A_m ∧ ¬B_1 ∧ ... ∧ ¬B_n is allowed.
(3) Formula ∃x̄ A is allowed, since ENF (through simplification) requires that
each variable in x̄ must actually be free in A.
(4) If ¬A is a subformula of F whose parent is not “∧”, then genall(¬A)
holds only if A has no free variables, in which case ¬A is also allowed.
The same applies to ¬B_1 ∧ ... ∧ ¬B_n.
(5) If the parent of x = y or x ≠ y is not “∧”, the parent cannot satisfy
genall.
Thus any positive formula that can be constructed from A, A_i, and B_j with
one more logical operation and satisfies genall is also allowed. ❑
9.2 Transformation into RANF
We now present an algorithm to transform an allowed ENF formula into an
equivalent RANF formula. It is based on repeated application of the rewrite
rules below. Since use of the distribution transformation (T11 below) can lead
to exponential growth in formula size, the algorithm strives to use it only
where necessary.
(∃ȳ A) ∧ B_1 ∧ ... ∧ B_n ⟶ ∃ȳ(A ∧ B_1 ∧ ... ∧ B_n) (T8R)
(A_1 ∨ ... ∨ A_m) ∧ B ⟶ (A_1 ∧ B) ∨ ... ∨ (A_m ∧ B) (T11)
¬A ∧ B ⟶ ¬(A ∧ B) ∧ B. (T15)
Transformation T15 is justified by E11 and E2, as follows:
¬A ∧ B ≡ (¬A ∧ B) ∨ (¬B ∧ B) ≡ (¬A ∨ ¬B) ∧ B ≡ ¬(A ∧ B) ∧ B.
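The propositional content of T15 and T11 can be checked mechanically. The sketch below is an illustration added here, with hypothetical function names; it treats A and B as truth values and enumerates all assignments. It says nothing about T8R, whose soundness depends on quantifier scoping rather than on truth tables.

```python
from itertools import product

def t15_lhs(a, b):
    return (not a) and b

def t15_rhs(a, b):
    return (not (a and b)) and b

def t11_lhs(a1, a2, b):
    return (a1 or a2) and b

def t11_rhs(a1, a2, b):
    return (a1 and b) or (a2 and b)

def equivalent(f, g, arity):
    """True if f and g agree on every assignment of truth values."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))

print(equivalent(t15_lhs, t15_rhs, 2))   # T15
print(equivalent(t11_lhs, t11_rhs, 3))   # T11 with two disjuncts
```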
Again by Theorem 6.6, these rewrite rules transform allowed formulas into
allowed formulas. It is easy to see that T8R and T15 transform ENF formulas
into ENF formulas, with the understanding that the rewrite rule application
is followed by resimplification of the result (see Definition 9.2). However, T11
may transform an ENF formula into a non-ENF formula, as in the case:
D ∧ ¬((A_1 ∨ A_2) ∧ B). Thus enf is called after application of T11 in the
following algorithm.
Algorithm 9.2. ranf(F)
Input: An allowed ENF formula F.
Output: An RANF formula equivalent to F.
Procedure: Figure 8 illustrates the main ideas of the algorithm. The algorithm
calls simplify and enf, which we described earlier, as well as itself.
In this algorithm, “formula” usually means a tree data structure representing
a formula. The notation F[A/B], where A is a subformula of F, means the data
structure for F is modified by replacing the subtree for A by a tree that
represents B.
The procedure selects one of the recursive cases T8R, T11, T15, or exits through
the “ELSE” case, as described below. In each case, F_1 is an allowed (not
necessarily proper) subformula of F to be replaced. F_2 is the equivalent allowed
formula that replaces F_1. (Recall that genall holds for allowed formulas.) The
notation “F_1 ≝ ...” means that F_1 matches the pattern on the right-hand side.
[Figure 8: tree diagrams of the pattern and the cure for each rule; only the
"Problem" column is recoverable as text.
T8R: gen(z, A) fails, but genall(A ∧ C) holds.
T11: gen(z, A_1 ∨ A_2) fails, but genall((A_1 ∨ A_2) ∧ C) holds.
T15: gen(z, A) fails, but C is in RANF.]
Fig. 8. Illustration of rewrite rules used in ranf (somewhat simplified).
T8R: If F_1 ≝ (∃ȳ A) ∧ B_1 ∧ ... ∧ B_n, and genall(A) does not hold, then:
    Let x̄ be the set of variables that are free in A such that gen(x_i, A) fails.
    (Since F_1 is an allowed formula, this set is disjoint from ȳ.)
    Let B_1 ∧ ... ∧ B_k be a prefix (possibly after rearrangement) of
    B_1 ∧ ... ∧ B_n such that genall(A ∧ B_1 ∧ ... ∧ B_k) holds. (At worst
    k = n, because genall(F_1) holds.)
    F_2 := ∃ȳ(A ∧ B_1 ∧ ... ∧ B_k) ∧ B_{k+1} ∧ ... ∧ B_n
    Return ranf(simplify(F[F_1/F_2])).
T11: If F_1 ≝ G ∧ B_1 ∧ ... ∧ B_n, where G ≝ A_1 ∨ ... ∨ A_m and genall(G) does
not hold, then:
    Let B_1 ∧ ... ∧ B_k be a prefix (possibly after rearrangement) of
    B_1 ∧ ... ∧ B_n such that genall(G ∧ B_1 ∧ ... ∧ B_k) holds. (At worst
    k = n, because genall(F_1) holds.)
    (Distribute B_1 ∧ ... ∧ B_k over G. Note that each disjunct G_i has the same
    set of free variables.)
    For 1 ≤ i ≤ m, do: G_i := ranf(A_i ∧ B_1 ∧ ... ∧ B_k).
    F_2 := (G_1 ∨ ... ∨ G_m) ∧ B_{k+1} ∧ ... ∧ B_n
    Return ranf(enf(F[F_1/F_2])).
T15: If F_1 ≝ ¬A ∧ B_1 ∧ ... ∧ B_n, and genall(A) does not hold, then:
    Let x̄ be the set of variables x_i that are free in A, but for which
    gen(x_i, A) does not hold.
    Let B_1 ∧ ... ∧ B_k be a prefix (possibly after rearrangement) of
    B_1 ∧ ... ∧ B_n such that all elements of x̄ are free in B_1 ∧ ... ∧ B_k and
    genall(B_1 ∧ ... ∧ B_k) holds. (At worst k = n, because genall(F_1) holds.)
    G := ranf(B_1 ∧ ... ∧ B_k).
    F_2 := ¬(A ∧ G) ∧ B_1 ∧ ... ∧ B_n
    Return ranf(simplify(F[F_1/F_2])).
ELSE: If none of cases T8R, T11, or T15 apply, then:
    Return F.
Correctness of this algorithm is proved in Section 9.3. Details of the
translation into relational algebra are covered in Section 9.4. The above
algorithm has considerable nondeterminacy, which we may view as an
opportunity for optimization in an implementation. Examples of the application
of Algorithm 9.2 are given in Examples 9.3 and 9.5 below.
Example 9.3. This example illustrates the operation of Algorithm 9.2.
From this example it appears necessary to be able to interleave the rewrites
of cases T8R, T11, and T15 in the algorithm. The initial formula is
Pxy ∧ (Qx ∨ (Ry ∧ ¬∃z[Sz ∧ (Pzx ∨ Rz)]))
The rewrite rules are applied in the order T11, T15, T8R, T11, giving the
following sequence of formulas, the last of which is in RANF.
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬∃z[Sz ∧ (Pzx ∨ Rz)])
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬(Pxy ∧ ∃z[Sz ∧ (Pzx ∨ Rz)]))
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬∃z[Pxy ∧ Sz ∧ (Pzx ∨ Rz)])
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬∃z[Sz ∧ ((Pxy ∧ Pzx) ∨ (Pxy ∧ Rz))]).
This example is discussed further in Example 9.4 below.
9.3 Correctness of Translation to RANF
Correctness of Algorithm 9.2, stated in Theorem 9.6, is established in two
stages:
(1) The algorithm terminates (Lemmas 9.3 and 9.4).
(2) If none of the cases T8R, T11, or T15 applies to F, then it is in RANF
(Lemma 9.5).
Termination is shown by the method of multisets, as described by Dershowitz
and Manna [7]. Finite multisets of nonnegative integers are well-ordered
by listing their elements in decreasing order and comparing these
lists lexicographically. We shall associate a finite multiset with each simplified
formula and use induction on multisets. A similar technique to prove
termination of a rewrite procedure was used by Lloyd and Topor [14]. Regard
the formula as a tree with atoms at the leaves and edges directed toward the
leaves.
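The multiset comparison described above is short to state in code. The following sketch is an illustration added here (the function names are hypothetical); it relies on Python's built-in lexicographic comparison of lists, under which a proper prefix is smaller than the longer list. The sample values are the cost multisets reported for the successive formulas in Example 9.4.

```python
def descending(multiset):
    """List the elements of a multiset in decreasing order."""
    return sorted(multiset, reverse=True)

def multiset_less(a, b):
    """True if multiset a precedes multiset b in the ordering described above."""
    return descending(a) < descending(b)

# Cost multisets from Example 9.4, in the order the rewrites produce them.
costs = [[5, 4, 4, 3, 3, 1], [3, 3, 1], [3, 2, 1], [2, 1], []]
print(all(multiset_less(later, earlier)
          for earlier, later in zip(costs, costs[1:])))
```

Note that a larger multiset can still be smaller in this ordering: four copies of 4 precede a single 5, which is what lets a rewrite trade one expensive element for several cheaper ones.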
Definition 9.6. The and-height of a node in a formula is the maximum
number of “∧” nodes, including itself, on any path from the node to a leaf.
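As an illustration of Definition 9.6, the sketch below computes the and-height over a formula tree. The nested-tuple encoding and the name and_height are assumptions made for this example only. The sample formula is the initial formula of Example 9.4, Pxy ∧ (Qx ∨ (Ry ∧ ¬∃z[Sz ∧ (Pzx ∨ Rz)])).

```python
# Assumed encoding: ("atom", name, vars), ("not", f), ("and", (f1, ...)),
# ("or", (f1, ...)), ("exists", (v1, ...), f).

def and_height(f):
    """Maximum number of "and" nodes on any path from this node to a leaf."""
    tag = f[0]
    if tag == "atom":
        return 0
    if tag == "not":
        return and_height(f[1])
    if tag == "exists":
        return and_height(f[2])
    deepest = max(and_height(g) for g in f[1])
    return deepest + 1 if tag == "and" else deepest

Pxy = ("atom", "P", ("x", "y"))
Qx = ("atom", "Q", ("x",))
Ry = ("atom", "R", ("y",))
Sz = ("atom", "S", ("z",))
Pzx = ("atom", "P", ("z", "x"))
Rz = ("atom", "R", ("z",))

inner_or = ("or", (Pzx, Rz))
inner_and = ("and", (Sz, inner_or))
exists_z = ("exists", ("z",), inner_and)
middle_and = ("and", (Ry, ("not", exists_z)))
outer_or = ("or", (Qx, middle_and))
formula = ("and", (Pxy, outer_or))

for node in (inner_or, inner_and, exists_z, middle_and, outer_or, formula):
    print(node[0], and_height(node))
```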
Definition 9.7. Let A be a constructive (not necessarily proper) subformula
of F, and let x_i be a free variable of A such that gen(x_i, A) fails. Let k be
the and-height of A and let B be the parent of A (if it exists). Then the
element cost of the variable-node pair (x_i, A), denoted μ_1(x_i, A), is the
nonnegative integer defined by:
μ_1(x_i, A) = 2k      if B exists and gen(x_i, B) holds,
              2k + 1  otherwise.
The node cost of A, denoted μ_2(A), is the multiset of element costs of pairs
(x_i, A) such that μ_1(x_i, A) is defined. The node cost is the empty multiset if
genall(A) holds.
The cost of a formula F, denoted μ(F), is defined to be the multiset union
of the node costs of its constructive nodes. The cost of an RANF formula is
the empty multiset.
Example 9.4. This example shows the succession of values of the cost
function μ as the formula of Example 9.3 is rewritten in Algorithm 9.2. The
initial formula and its node costs are
F ≝ Pxy ∧ (Qx ∨ (Ry ∧ ¬∃z[Sz ∧ (Pzx ∨ Rz)]))
node costs: {} {4_x, 4_y} {5_x} {3_x} {3_x} {1_x}
Only the node costs of constructive internal nodes, reading left to right, are
shown. The subscripts are annotations to show which variable produced each
element. Collecting, μ(F) = {5, 4, 4, 3, 3, 1}. The evolution of the cost during
rewriting is shown below.
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬∃z[Sz ∧ (Pzx ∨ Rz)])
node costs: {} {} {} {3_x} {3_x} {1_x}        μ = {3, 3, 1}
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬(Pxy ∧ ∃z[Sz ∧ (Pzx ∨ Rz)]))
node costs: {} {} {} {} {2_x} {3_x} {1_x}     μ = {3, 2, 1}
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬∃z[Pxy ∧ Sz ∧ (Pzx ∨ Rz)])
node costs: {} {} {} {} {2_x} {1_x}           μ = {2, 1}
(Pxy ∧ Qx) ∨ (Pxy ∧ Ry ∧ ¬∃z[Sz ∧ ((Pxy ∧ Pzx) ∨ (Pxy ∧ Rz))])
                                              μ = {}
The node costs of the intermediate formulas are shown under them. Only
node costs for internal constructive nodes are shown. Recall that A ∧ B ∧ C is
really one ternary conjunction; its node cost is shown under the left “∧”.
Observe that the μ’s are lexicographically decreasing as we proceed through
the intermediate stages. We prove below that this algorithm always produces
a decreasing sequence of μ’s.
LEMMA 9.3. In case T11 in the body of Algorithm 9.2, μ(enf(F[F_1/F_2])) ≤
μ(F[F_1/F_2]).
PROOF. The only problem is that T3 or T9 can apply at F_2 when k = n,
because in this case F_2 ≝ G_1 ∨ ... ∨ G_m. In general, these rewrite rules can
increase the and-height of some nodes, thereby increasing μ_2 for those nodes.
But F_2 is in RANF, so there are no element costs within it to increase. ❑
LEMMA 9.4. In the body of Algorithm 9.2, each recursive call to ranf is
made on an allowed ENF formula whose cost is less than μ(F).
PROOF. In case T8R the element costs in μ(F) contributed by pairs of the
form (x_i, ∃ȳ A) are absent from μ(F[F_1/F_2]). The new nodes are “∃ȳ” and
“∧” in the subformula ∃ȳ(A ∧ B_1 ∧ ... ∧ B_k). (Recall that A, B_1, ..., B_k are
all operands of one polyadic “∧”, although for notational convenience we
have written several infix symbols.) But these new nodes satisfy genall, so
contribute nothing to μ. Moreover, the element costs due to pairs of the form
(x_i, A) each decrease by 1 because the parent of A now satisfies genall.
(Calls to simplify can never increase the cost.) Checking the possible cases
for the parent of F_1 (it must be “∨”, “∃”, or “¬”) shows that
simplify(F[F_1/F_2]) is in ENF.
In case T11 there are multiple recursive calls. For those that create the
G_i, we observe that genall(A_i ∧ B_1 ∧ ... ∧ B_k) holds for all 1 ≤ i ≤ m. It
follows that these subformulas are allowed and in ENF. We need to show
that their costs are less than μ(F). This is immediate if genall(A_i) holds,
for there is some A_j that contributes element costs to F but not to this
subformula. But even if genall(A_i) fails, the element costs making up the
node cost of A_i decrease by 1 because the parent of A_i now satisfies genall.
For the last recursive call (of this case), since the G_i are in RANF and
have the same free variables, it is obvious that μ(F[F_1/F_2]) < μ(F). By
Lemma 9.3, the call to enf does not increase μ.
In case T15, it is critical that the replicated G is in RANF, so that μ(G) is
the empty set. But now genall(A ∧ G) holds, so the elements of the node cost
of A are reduced by 1 relative to μ(F). Checking the possible cases for A (its
root must be “∧” or “∃” or an atom) shows that simplify(F[F_1/F_2]) is in
ENF. ❑
LEMMA 9.5. In the body of Algorithm 9.2, if none of cases T8R, T11, or
T15 applies, then F is in RANF.
PROOF. If F is not in RANF, then let A be a maximal constructive
subformula that is not allowed, which must exist by Lemma 9.2. A must
have some free variables and must be a proper subformula of F, as F is
allowed. By the assumption of maximality, the parent of A cannot be ∃ or ∨.
If the parent of A is “¬”, then its parent is “∧” and is constructive, so must
be allowed. Thus, case T15 applies. (The “¬” cannot be the root of F with
free variables in A.)
If the parent of A is “∧” and the root of A is “∃”, then case T8R applies. If
the parent of A is “∧” and the root of A is “∨”, then case T11 applies. These
are the only possibilities because A is constructive.
It follows that such an A cannot exist in ranf(F). ❑
THEOREM 9.6. Algorithm 9.2 correctly transforms any allowed formula
into an equivalent RANF formula.
PROOF. Lemmas 9.3 and 9.4 show that recursive calls are always on
formulas that satisfy the input restrictions in the algorithm and have reduced
μ value. Since the range of μ is well ordered, chains of recursive calls
terminate. Lemma 9.5 shows that the formula returned upon exit is in
RANF. ❑
9.4 Translation into Relational Algebra
The translation of a formula F in relational algebra normal form into an
equivalent relational algebra expression is straightforward; the basics are
given by Ullman [26, 25]. The translation given earlier [26] required the
construction of the Dom relation consisting of all constants in the formula
and in database relations that occur in the formula. Our translation, like
that in the revised edition [25], does not require such a Dom relation.
We translate A ∨ B into A′ ∪ B′ (possibly after a permutation of columns),
where A′ and B′ are the translations of A and B. Similarly, we translate
A ∧ B into A′ ⋈ B′ or A′ × B′, and ∃x A into πA′. Atoms x = c are translated
either into an appropriate selection operator or into C, where C is a unary
relation containing the single tuple c. Atoms x = y and x ≠ y (and other
restriction atoms) are translated into an appropriate selection operator. To
translate A ∧ ¬B, we use the following generalized set difference operator
introduced by Hall et al. [10].
Definition 9.8. Let P and Q be relational expressions such that the
attributes of Q are a subset of those of P. The relational operation generalized
set difference, P diff Q, yields the set of tuples in P whose projections
are not in Q. That is,
P diff Q ≝ P − π(P ⋈ Q)
where the (equi-)join is on the attributes of Q and the projection is onto the
attributes of P. If P and Q have the same arity, then P diff Q is simply
P − Q (possibly after a permutation of columns).
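A small sketch may make Definition 9.8 concrete. It assumes a representation chosen here for illustration: a relation is a Python set of tuples together with a tuple of attribute names, and the function names are hypothetical. The first function computes diff directly as an anti-join; the second follows the definition literally, so the two can be compared.

```python
def diff(p_rows, p_attrs, q_rows, q_attrs):
    """Tuples of P whose projection onto the attributes of Q is not in Q."""
    idx = [p_attrs.index(a) for a in q_attrs]
    return {r for r in p_rows if tuple(r[i] for i in idx) not in q_rows}

def diff_by_definition(p_rows, p_attrs, q_rows, q_attrs):
    """P minus the projection onto P's attributes of the equi-join of P and Q."""
    idx = [p_attrs.index(a) for a in q_attrs]
    joined = {r for r in p_rows for q in q_rows
              if tuple(r[i] for i in idx) == q}
    return p_rows - joined

P = {(1, "a"), (2, "b"), (3, "a")}       # attributes (x, y)
Q = {("a",)}                             # attribute (y)
print(diff(P, ("x", "y"), Q, ("y",)))

# Same arity, columns permuted: diff reduces to ordinary set difference.
S = {(1, 2), (3, 4)}                     # attributes (x, y)
T = {(2, 1)}                             # attributes (y, x)
print(diff(S, ("x", "y"), T, ("y", "x")))
```

The direct version never materializes the join, which is in the spirit of treating diff as a primitive operator.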
More generally, we translate every conjunction A_1 ∧ ... ∧ A_m ∧ ¬B_1 ∧ ... ∧
¬B_n, where the A_i and B_j are positive subformulas, into P′ diff B′_1
diff ... diff B′_n, where P′ is the translation of A_1 ∧ ... ∧ A_m, and B′_1, ..., B′_n
are the translations of the B_j. If there are no positive A_i (m = 0), then each
B_j must be a closed formula, and we use the singleton 0-ary relation {⟨⟩}
(i.e., true) for P′.
Although we have defined P diff Q in terms of primitive relational
operators, it should be implemented as a primitive in its own right, using
techniques similar to those used for efficient joins. (In fact, diff is also called
anti-join by Maier [15].) Thus we retain diff in our final relational algebra
expression.
Example 9.5. We show below, for several allowed formulas (cf. Example
9.2), the RANF formula and relational algebra expression constructed by the
above procedures.
Allowed: Pxy ∧ (Qx ∨ Ry)
RANF: (Pxy ∧ Qx) ∨ (Pxy ∧ Ry)
Algebra: π_{1,2}(P ⋈_{1=1} Q) ∪ π_{1,2}(P ⋈_{2=1} R)

Allowed: Px ∧ ∀y(¬Qy ∨ ∃z Rxyz)
RANF: Px ∧ ¬∃y(Px ∧ Qy ∧ ¬∃z Rxyz)
Algebra: P − π_1(P × Q − π_{1,2} R)

Allowed: Pxy ∧ ∀z(¬Qxz ∨ Ryz)
RANF: Pxy ∧ ¬∃z(Pxy ∧ Qxz ∧ ¬Ryz)
Algebra: P − π_{1,2}(π_{1,2,4}(P ⋈_{1=1} Q) diff_{2=1,3=2} R)

Allowed: Px ∧ ∃y(Qy ∧ ∀z[Rz → Sxyz])
RANF: Px ∧ ∃y(Px ∧ Qy ∧ ¬∃z[Px ∧ Qy ∧ Rz ∧ ¬Sxyz])
Algebra: P ⋈_{1=1} π_1(P × Q − π_{1,2}((P × Q × R) − S)).
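To see one of these translations at work, the sketch below evaluates the query Px ∧ ∀y(¬Qy ∨ ∃z Rxyz) twice on a small database: once by brute force over all constants that occur in the relations, and once with the algebra expression read here as P − π_1(P × Q − π_{1,2} R). The sample relations and function names are invented for illustration, and the subscripts of the algebra expression are a reading of a damaged original, so this is a plausibility check rather than a statement of the authors' exact expression.

```python
from itertools import product

# Sample database (hypothetical data).
P = {(1,), (2,)}
Q = {(10,), (20,)}
R = {(1, 10, 5), (1, 20, 6), (2, 10, 7)}

def calculus_answer():
    """Evaluate Px AND forall y (NOT Qy OR exists z Rxyz) over the constants in use."""
    dom = {c for rel in (P, Q, R) for row in rel for c in row}
    return {(x,) for x in dom
            if (x,) in P
            and all((y,) not in Q or any((x, y, z) in R for z in dom)
                    for y in dom)}

def algebra_answer():
    """Evaluate P - pi_1(P x Q - pi_{1,2} R) with plain set operations."""
    p_times_q = {p + q for p, q in product(P, Q)}
    r_12 = {(x, y) for (x, y, z) in R}
    counterexamples = {(x,) for (x, y) in p_times_q - r_12}
    return P - counterexamples

print(calculus_answer() == algebra_answer())
```

The algebra version touches only P, Q, and R and never enumerates a domain, which is the point of avoiding the Dom relation.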
THEOREM 9.7. Every allowed formula can effectively be translated into an
equivalent relational algebra expression.
PROOF. Theorem 9.6 shows that every allowed formula can effectively be
translated into an equivalent formula in RANF. Structural induction on the
resulting formula then shows that the relational algebra expression produced
by the above (informal) algorithm correctly evaluates the RANF formula,
and hence the original formula. ❑
Many simplifications of the relational algebra expressions produced by the
procedures of this section can be made during their construction. Alternatively,
final expressions can be simplified using standard methods [25].
10. EVALUABLE AND DOMAIN INDEPENDENT FORMULAS
In this section we show that the evaluable class is contained in the domain
independent class and that with the restriction to formulas with no repeated
predicates evaluable is equivalent to domain independent. To do so, we use
the fact that domain independent is equivalent to definite, which we now
define, following Nicolas and Demolombe [21].
Definition 10.1. Let I be an interpretation with domain D for a formula
F, and let p_i be the relations assigned by I to the edb predicates P_i that
occur in F. Let * be a value not in D. Then the *-extension of I is the
interpretation I′ with domain D′ = D ∪ {*} that assigns the same relations
p_i to the predicates P_i as does I. We denote appropriate cross products of D
and D′ by D and D′, respectively.
Definition 10.2. A formula F is called definite if, for all interpretations I,
F is satisfied at the same points in I as in I′, where I′ is the *-extension of I.
In other words, ā satisfies F in I′ if and only if ā satisfies F in I.
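The following sketch illustrates Definitions 10.1 and 10.2 on a single interpretation. It is only an illustration: queries are modeled as hypothetical Python predicates over one free variable, relations are unary sets, and the check covers just the one interpretation given, whereas definiteness quantifies over all interpretations. A failed check therefore refutes definiteness, but a passed check does not prove it.

```python
STAR = "*"   # assumed not to occur in the domain D

def answers(query, domain, rels):
    """Points of the domain at which the one-variable query is satisfied."""
    return {x for x in domain if query(x, domain, rels)}

def agrees_on_star_extension(query, domain, rels):
    """Compare the answers in I with the answers in its *-extension."""
    return answers(query, domain, rels) == answers(query, domain | {STAR}, rels)

def not_p(x, domain, rels):
    # F(x) = NOT P(x)
    return x not in rels["P"]

def p_and_forall(x, domain, rels):
    # F(x) = P(x) AND forall y (R(y) -> Q(y))
    return x in rels["P"] and all(y not in rels["R"] or y in rels["Q"]
                                  for y in domain)

D = {"a", "b"}
rels = {"P": {"a"}, "Q": {"a"}, "R": {"a"}}
print(agrees_on_star_extension(not_p, D, rels))         # the new value * satisfies NOT P
print(agrees_on_star_extension(p_and_forall, D, rels))
```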
10.1 Evaluable Formulas are Domain independent
We now show that every evaluable formula is domain independent. This was
proved originally by Demolombe [6] for evaluable formulas as defined there.
The statement needs to be reexamined because we have used an independent
definition, and have incorporated equality.
Our proof is significantly simpler because of Theorems 8.6 and 9.6, which
state that every evaluable formula has an equivalent RANF formula. Hence
it is sufficient to prove domain independence for RANF formulas.
LEMMA 10.1. Let F(x) be a formula, possibly containing other free
variables besides x. Let I be an interpretation for F with domain D and
*-extension I′. If gen(x, F) holds, then F does not hold in I′ for any assignment
that assigns * to x.
PROOF. Use induction on formula size, which we define to be the number
of atoms plus the number of quantifiers (negations are excluded). For the
basis F is an atom and not of the form x = y; the conclusion is immediate.
For the induction, one of the following cases applies.
- F ≝ A ∧ B. One of A and B satisfies gen, and therefore by the inductive
hypothesis, does not hold if x is assigned *.
- F ≝ A ∨ B. Both of A and B satisfy gen, and therefore by the inductive
hypothesis, do not hold if x is assigned *.
- F ≝ ∃y A. A satisfies gen, and therefore by the inductive hypothesis, does
not hold if x is assigned *.
- F ≝ ¬A. If A is an atom, the conclusion holds vacuously, since gen(x, F) is
false. Otherwise, push the ¬ giving G (i.e., pushnot(¬A, G) holds). Then
either G is an atom other than x = y, or one of the above cases applies to
G. ❑
LEMMA 10.2. If F is an RANF formula, then F is definite.
PROOF. By Lemma 10.1, it is sufficient to show that gen(x, F) holds for
every free variable x in F and that gen(y, A) holds for every subformula
∃y A of F. But these properties follow immediately from the definition of
RANF. ❑
THEOREM 10.3. If F is evaluable, then F is definite, and hence is domain
independent.
PROOF. By Theorems 8.6 and 9.6 and Lemma 10.2. ❑
10.2 Evaluable Formulas with No Repeated Predicates
The class of formulas with no repeated predicates is not particularly interest-
ing in its own right. However, if a query processing method does not
recognize the presence of repeated predicates and exploit them by means of
useful transformations, then formulas with repeated predicates are no more
amenable to evaluation than those without repeated predicates. Thus study
of this class can shed light on the limits of such methods.
Essentially, the domain independent class is not recursive because a given
formula may have a subformula that is superficially not domain independent,
but is unsatisfiable, hence (vacuously) domain independent. Even though
unsatisfiability is decidable for formulas with sufficiently simple quantifier
structure (as was shown by Ackermann [1]), we do not consider it practical to
test subformulas for unsatisfiability as part of the procedure that transforms
them into relational algebra. However, formulas in which no predicate
symbol is repeated cannot possibly have unsatisfiable subformulas. We show
that formulas in this class (without equality) are evaluable if and only if they
are domain independent. This means that any extension to the class of
evaluable formulas that remains domain independent, to be implementable,
must at least provide for simplifications based on common subexpressions
(e.g., subsumption tests), and should probably include some form of inference
capability (e.g., resolution).
LEMMA 10.4. Let F be a formula with no repeated predicate symbols and
no equality that is in PLNF, i.e., F ≝ % x̄ M(x̄, ȳ), where M is quantifier-free
(see Definition 4.2). Suppose M is a conjunction of literals. If F is definite,
then F is evaluable. The same holds if M is a disjunction of literals.
PROOF. Let M ≝ P_1 ∧ ... ∧ P_n ∧ ¬Q_1 ∧ ... ∧ ¬Q_m where each P_i and Q_j is
an atom of a different predicate. Suppose F is not evaluable. Let D = {a}.
We shall prove F is not definite by constructing an interpretation I with
domain D and *-extension I′ such that F evaluates differently in I and I′.
This contradiction will prove the lemma.
As F is not evaluable, one of the following three cases applies.
(1) Variable x is universally quantified in F and con(x, ¬M) does not hold.
Then some P_i contains x. Assign each P_i to hold for (a, ..., a) and assign
each Q_j to be empty. Then F holds (on (a, ..., a) if it has free variables)
in I but not in I′.
(2) Case 1 above does not apply, x is existentially quantified in F, and
con(x, M) does not hold. Then x occurs in some Q_j and in no P_i. Also
universally quantified variables appear in no P_i. Assign each P_i and each
Q_j to hold for (a, ..., a). Then F holds (on (a, ..., a) if it has free
variables) in I′ but not in I.
(3) Case 1 above does not apply, x is free in F, and gen(x, M) does not hold.
Then con(x, M) also does not hold. Proceed as in Case 2.
The case where M is a disjunction of literals is similar. ❑
THEOREM 10.5. Let F be a formula with no repeated predicate symbols and
no equality. Then F is definite if and only if F is evaluable.
PROOF. The “if” part holds by Theorem 10.3 above. By Corollary 6.3 we
may assume F is in PLNF and is given by
where M is quantifier free. We define the size of a formula to be the number
of atoms plus the number of quantifiers in it. For the “only if” part, we show
by induction on size that if F is definite, then we can reduce to the casedef def
covered in Lemma 10.4. The base cases, M = P and M = 7P, are immediate.
Suppose con(x, M) is false for existential variable x of F. If M contains
any $\lor$'s we shall derive a contradiction.
(1) $M \stackrel{\mathrm{def}}{=} A \lor B$. Then one of con(x, A) and con(x, B) fails. Say con(x, A)
fails. Define $M_1 = A$ and let $F_1$ be F with its matrix M replaced by $M_1$.
Suppose $F_1$ were not definite; since B has no predicates in common with A,
we can extend any interpretation over $F_1$ to one over F in which B is always
false. This would show that F was not definite, a contradiction. Therefore
$F_1$ is definite, and by the inductive hypothesis, evaluable. But con(x, $M_1$)
fails, a contradiction.
(2) $M \stackrel{\mathrm{def}}{=} A \land B$. Then one of con(x, A) and con(x, B) fails. Say con(x, A)
fails. If any $\lor$'s occur in A, we can derive a contradiction by recursively
applying steps 1 and 2 to A instead of M. Assuming A is free of $\lor$'s, we
work on B. If con(x, B) fails, we can recursively apply steps 1 and 2. The
only other possibility is that con(x, B) holds and gen(x, B) fails.
(a) $B \stackrel{\mathrm{def}}{=} C \lor D$. Either con(x, C) holds and gen(x, C) fails, or con(x, D)
holds and gen(x, D) fails (or both); say the former. Define $B_2 \stackrel{\mathrm{def}}{=} C$,
$M_2$ to be M with B replaced by $B_2$, and $F_2$ to be F with its matrix M
replaced by $M_2$. Again, if $F_2$ were not definite, making D "as true as
possible" would demonstrate that F was not definite. But con(x, $M_2$) must
fail because of the similarity to M, so $F_2$ cannot be evaluable, a
contradiction of the inductive hypothesis.
(b) $B \stackrel{\mathrm{def}}{=} C \land D$. Then con(x, C) holds and gen(x, C) fails, and con(x, D)
holds and gen(x, D) fails. If any $\lor$'s occur in C or D, we can derive a
contradiction by recursively applying steps 2a and 2b to C and D
instead of B.
Therefore, a contradiction arises unless M contains no $\lor$'s. But in this case
Lemma 10.4 applies.
The arguments for free variables are similar. The arguments for universally
quantified variables interchange the roles of $\lor$ and $\land$. ❑
We conjecture that this theorem can be extended to allow some presence of
equality. However, it cannot be extended much in other directions in view of
the fact that (cf., Example 6.2)
$$F(x) \stackrel{\mathrm{def}}{=} \forall y\,[(Px \land Qy) \lor (Px \land \lnot Ry)]$$
is domain independent but not evaluable.
11. OPEN PROBLEMS
While it is fast (linear time) to determine whether a formula is evaluable, the
actual transformation into allowed form and then into RANF often involves
rewriting the current formula into an equivalent but larger formula. This
suggests some topics for further study:
(1) Can the transformation from evaluable to allowed (Section 8) introduce a
size increase that is exponential in the number of variables, or is there a
polynomial bound?
(2) Can a method that combines our approach with the work of Aylamazan et
al. [2] yield a more efficient evaluation method?
(3) Can a broader subclass of domain independent formulas be recognized
efficiently (i.e., in polynomial time) by using a restricted class of
inference rules such as subsumption and subsumption resolution to exploit
repeated predicates?
APPENDIX A. Equality Reduction and Wide-Sense Evaluability
In this appendix, we describe transformations that normalize formulas with
respect to equality ( x = y), which we call equality reduction. Certain formu-
las containing equality do not satisfy the requirements for evaluability
initially, but are evaluable after equality reduction. We say that such
formulas are evaluable in the wide sense. Wide-sense evaluability is invariant
under conservative transformations. Since every wide-sense evaluable for-
mula is equivalent to an evaluable formula, it is also domain independent.
LEMMA A.1. Let $F \stackrel{\mathrm{def}}{=} x = t \land A(x, t, \vec{y})$, where t is either a variable or a
constant, and is not required to appear in $A(x, t, \vec{y})$. Then
$$F \equiv F' \stackrel{\mathrm{def}}{=} x = t \land A(t, t, \vec{y}).$$
PROOF. In any evaluation, either x is assigned the same value as t or
both F and F' evaluate to false. ❑
The lemma generalizes the transformations (E13 - 14) in Figure 4 to free
variables.
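Lemma A.1 can be checked by brute force on a small finite domain. The following is a minimal sketch, not part of the paper: the function name `lemma_a1_holds`, the relation `R`, and the particular body `A` are hypothetical and chosen only for illustration, and t is treated as a variable ranging over the domain.

```python
from itertools import product

def lemma_a1_holds(A, domain):
    """Compare F = (x = t and A(x, t, y)) with F' = (x = t and A(t, t, y))
    under every assignment of x, t, y drawn from a finite domain."""
    for x, t, y in product(domain, repeat=3):
        f = (x == t) and A(x, t, y)
        f_prime = (x == t) and A(t, t, y)
        if f != f_prime:
            return False
    return True

# Hypothetical ternary relation, used only to build an arbitrary body A.
R = {(1, 1, 2), (1, 2, 3), (2, 2, 2), (3, 1, 1)}

def A(x, t, y):
    # Arbitrary body mixing a positive and a negated atom.
    return (x, t, y) in R or (t, x, y) not in R

print(lemma_a1_holds(A, domain=[1, 2, 3]))  # True
```

The check succeeds for any choice of `A`, because whenever x and t differ both sides are false, and whenever they agree the two bodies receive identical arguments.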
Algorithm A.1. Equality Reduction
Input: A relational calculus formula F with no universal quantifiers.
Output: An equivalent equality-reduced formula.
Procedure:
1. Apply the following transformation wherever possible. Let A(x, y) be the
maximal subformula of F in which x is free. A may have other free
variables. If A contains an atom x = y (or y = x), where y is another free
variable of A, then:
(a) Define $A_1(y)$ to be the formula that results from replacing every
occurrence of x in A by y, and then replacing y = y by true and carrying
out truth value simplification (Def. 8.2).
(b) Define $A_2(x, y)$ to be the formula that results from replacing each
occurrence of x = y in A(x, y) by false, and carrying out truth value
simplification.⁵ Variables x and/or y may not actually appear in $A_2$.
(c) Replace A(x, y) by
$$A'(x, y) \stackrel{\mathrm{def}}{=} (x = y \land A_1(y)) \lor (x \neq y \land A_2(x, y)).$$
(d) If x is bound in F, then replace $\exists x\, A'(x, y)$ by
$$A_1(y) \lor \exists x\,[x \neq y \land A_2(x, y)].$$
2. At this point all equalities between two free variables of F that remain can
be put in the form of "case splits" at the top of the formula by appropriately
"pushing ands" (En). For any case of the form $x = z \land A(z)$, where x is not
free in A and gen(z, A) holds, rewrite this case as
$$x = z \land A(x) \land A(z).$$
This typically arises when A originally contained x but it was substituted
for in Step 1 above. In an implementation, we would not actually do it this
way; we would add a column replication primitive to our relational algebra.
The correctness of the algorithm follows from Lemma A. 1 and elementary
arguments.
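Step 1 of Algorithm A.1 can be sketched in Python as follows. This is an illustration under assumptions that are not part of the paper: formulas are nested tuples, bound variables are assumed to be named apart from x and y, and `simplify` covers only the propositional part of truth value simplification (Def. 8.2 is not reproduced here). The names `subst`, `eq_to_false`, `simplify`, and `reduce_equality` are hypothetical. The input is the body of the formula in Example A.1, read as Px ∧ (Rxy ∨ x = y).

```python
def subst(f, x, y):
    """Replace free occurrences of variable x by y (no capture check)."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "atom":
        return ("atom", f[1], tuple(y if a == x else a for a in f[2]))
    if op == "eq":
        return ("eq", y if f[1] == x else f[1], y if f[2] == x else f[2])
    if op == "not":
        return ("not", subst(f[1], x, y))
    if op == "exists":
        # If x is rebound here it has no free occurrences below.
        return f if f[1] == x else ("exists", f[1], subst(f[2], x, y))
    return (op, subst(f[1], x, y), subst(f[2], x, y))  # "and" / "or"

def eq_to_false(f, x, y):
    """Replace each atom x = y (or y = x) by False."""
    if isinstance(f, bool) or f[0] == "atom":
        return f
    op = f[0]
    if op == "eq":
        return False if {f[1], f[2]} == {x, y} else f
    if op == "not":
        return ("not", eq_to_false(f[1], x, y))
    if op == "exists":
        return ("exists", f[1], eq_to_false(f[2], x, y))
    return (op, eq_to_false(f[1], x, y), eq_to_false(f[2], x, y))

def simplify(f):
    """Propositional truth value simplification; y = y becomes True."""
    if isinstance(f, bool) or f[0] == "atom":
        return f
    op = f[0]
    if op == "eq":
        return True if f[1] == f[2] else f
    if op == "not":
        g = simplify(f[1])
        return (not g) if isinstance(g, bool) else ("not", g)
    if op == "exists":
        g = simplify(f[2])
        return False if g is False else ("exists", f[1], g)
    a, b = simplify(f[1]), simplify(f[2])
    if op == "and":
        if a is False or b is False:
            return False
        if a is True:
            return b
        if b is True:
            return a
    else:  # "or"
        if a is True or b is True:
            return True
        if a is False:
            return b
        if b is False:
            return a
    return (op, a, b)

def reduce_equality(A, x, y, x_bound):
    """Return the replacement for A (x free) or for 'exists x A' (x bound)."""
    A1 = simplify(subst(A, x, y))          # step (a)
    A2 = simplify(eq_to_false(A, x, y))    # step (b)
    neq = ("not", ("eq", x, y))
    if x_bound:                            # step (d)
        return simplify(("or", A1, ("exists", x, ("and", neq, A2))))
    return simplify(("or", ("and", ("eq", x, y), A1), ("and", neq, A2)))  # (c)

P = lambda v: ("atom", "P", (v,))
R = lambda u, v: ("atom", "R", (u, v))
A = ("and", P("x"), ("or", R("x", "y"), ("eq", "x", "y")))
print(reduce_equality(A, "x", "y", x_bound=True))
```

The printed tuple encodes Py ∨ ∃x[x ≠ y ∧ Px ∧ Rxy]. A fuller version would also rename bound variables apart, as footnote 5 requires, and would repeat the step until no reducible equality remains.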
Definition A.1. A formula F is said to be wide-sense evaluable if
Algorithm A.1 transforms it into an evaluable formula as defined in
Definition 5.2.
Equality reduction can also be carried out on equalities between a variable
and a constant, and even on equalities between two constants, which may be
introduced as a result. Suppose c = d occurs, where c and d are distinct
constants. If the system assumes that the distinct name axiom $c \neq d$ is
implicit in F, then we can make it explicit at the top level:
$$F \;\longrightarrow\; c \neq d \land F$$
Now replace c = d by false throughout F and simplify, as in Step 1b. Repeat
until all equalities between constants are removed. This extension may
improve efficiency, but does not change the class of wide-sense evaluable
formulas, so we do not consider it further here.
⁵ Bound variables of A are given different names in $A_1$ and $A_2$.
Example A.1. Consider the formula $F(y) \stackrel{\mathrm{def}}{=} \exists x\, A(x, y)$, where
$$A(x, y) \stackrel{\mathrm{def}}{=} Px \land (Rxy \lor x = y).$$
This formula might arise in a transitive closure calculation: If P represents
nodes at distance at most n from a source, and R represents edges, then F
represents nodes at distance at most n + 1 from a source. This formula is not
strict-sense evaluable because gen(y, F) fails to hold.
Noting that gen(x, A(x, y)) does hold, one might be tempted to formulate
some definition of gen in which gen(y, ($Px \land x = y$)) holds "by association"
because gen(x, ($Px \land x = y$)) holds; such a proposal may be found in Ullman
[25]. The idea would be to use distribution to locate Px by x = y. Two
drawbacks of relying on distribution to fix formulas are (1) distribution does
not always preserve evaluability, and (2) distribution can substantially in-
crease formula size. Moreover, a general algorithm is not apparent.
Let us instead apply the equality reduction algorithm:
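$$A_1(y) \stackrel{\mathrm{def}}{=} Py$$
$$A_2(x, y) \stackrel{\mathrm{def}}{=} Px \land Rxy$$
$$F'(y) \stackrel{\mathrm{def}}{=} Py \lor \exists x\,[x \neq y \land Px \land Rxy].$$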
F’ is not only strict-sense evaluable, but is allowed.
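The equivalence of F and F' in Example A.1 can be checked on a small database by naive evaluation. The sketch below is only an illustration: the relations `P` and `R`, the node names, and the two function names are hypothetical, and the quantifier is evaluated by looping over an explicit finite domain, which is the kind of evaluation the translation to relational algebra is meant to avoid.

```python
def answers_F(P, R, domain):
    """F(y) = exists x [ P(x) and (R(x, y) or x = y) ], evaluated naively."""
    return {y for y in domain
            if any(x in P and ((x, y) in R or x == y) for x in domain)}

def answers_F_prime(P, R, domain):
    """F'(y) = P(y) or exists x [ x != y and P(x) and R(x, y) ]."""
    return {y for y in domain
            if y in P or any(x != y and x in P and (x, y) in R for x in domain)}

# Hypothetical graph: P holds the nodes within distance n of the source,
# R holds the edges.
P = {"a", "b"}
R = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "d")}
active = {"a", "b", "c", "d"}       # constants appearing in the database
larger = active | {"e", "f"}        # a strictly larger domain

# Both formulas give the same answers, and the answers do not change
# when the domain is enlarged.
for dom in (active, larger):
    assert answers_F(P, R, dom) == answers_F_prime(P, R, dom)
assert answers_F(P, R, active) == answers_F(P, R, larger)

print(sorted(answers_F(P, R, active)))  # ['a', 'b', 'c']
```

A finite test of this kind does not prove domain independence; it only illustrates the behavior that the evaluable form F' guarantees syntactically.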
Defining a class of formulas by means of an algorithm is not very satisfac-
tory. A better characterization of wide-sense evaluable formulas is a topic for
future research.
ACKNOWLEDGMENTS
We thank Robert Demolombe, who originated the evaluable class of formu-
las, for helpful discussions and for comments on an early draft of this work.
We also thank Hendrik Decker for helpful discussions. We are grateful to
Paris Kanellakis for bringing the article [2] to our attention. Finally, we
thank the anonymous referees for their careful reading of the paper and
numerous helpful suggestions.
REFERENCES
1. ACKERMANN, W. Solvable Cases of the Decision Problem. North-Holland, Amsterdam, 1968.
2. AYLAMAZAN, A. K., GILULA, M. M., STOLBUSHKIN, A. P., AND SCHWARTZ, G. F. Reduction of the relational model with infinite domains to the finite domain case. In Proceedings of USSR Academy of Science. 286, 2 (1986).
3. BEERI, C., NAQVI, S., RAMAKRISHNAN, R., SHMUELI, O., AND TSUR, S. Sets and negation in a logic database language (LDL1). In 6th ACM Symposium on Principles of Database Systems. 1987, 21-37.
4. CODD, E. F. Relational completeness of data base sublanguages. In Data Base Systems, R. Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1972, 65-98.
5. DECKER, H. Integrity enforcement in deductive databases. In 1st International Conference on Expert Database Systems. 1986, 271-285.
6. DEMOLOMBE, R. Syntactical characterization of a subset of domain independent formulas. Tech. Rep., ONERA-CERT, 1982.
7. DERSHOWITZ, N., AND MANNA, Z. Proving termination with multiset orderings. Commun. ACM 22, 8 (1979), 465-476.
8. DIPAOLA, R. A. The recursive unsolvability of the decision problem for the class of definite formulas. J. ACM 16, 2 (1969), 324-324.
9. FAGIN, R. Horn clauses and database dependencies. J. ACM 29, 4 (1982), 952-985.
10. HALL, P., HITCHCOCK, P., AND TODD, S. An algebra of relations for machine computation. In ACM Symposium on Principles of Programming Languages. 1975, 225-232.
11. KIFER, M. On safety, domain independence, and capturability of database queries. In Third International Conference on Data and Knowledge Bases (Jerusalem, 1988).
12. KUHNS, J. L. Answering questions by computer: A logical study. Tech. Rep. RM-5428-PR, Rand Corp., 1967.
13. KUPER, G. M. Logic programming with sets. In 6th ACM Symposium on Principles of Database Systems. 1987, 11-20.
14. LLOYD, J. W., AND TOPOR, R. W. Making Prolog more expressive. J. Logic Program. 1, 3 (1984), 225-240.
15. MAIER, D. The Theory of Relational Databases. Computer Science Press, Rockville, Md., 1983.
16. MAKOWSKY, J. A. Characterizing data base dependencies. In 8th Colloquium on Automata, Languages and Programming. Springer Verlag, New York, 1981.
17. MANNA, Z. Mathematical Theory of Computation. McGraw-Hill, New York, 1974.
18. MORRIS, K., ULLMAN, J. D., AND VAN GELDER, A. Design overview of the Nail! system. In Third International Conference on Logic Programming. 1986, 554-568.
19. NAQVI, S., AND TSUR, S. A Logic Language for Data and Knowledge Bases. Computer Science Press, Rockville, Md., 1988.
20. NICOLAS, J. M. Logic for improving integrity checking in relational databases. Acta Inf. 18, 3 (1982), 227-253.
21. NICOLAS, J. M., AND DEMOLOMBE, R. On the stability of relational queries. Tech. Rep., ONERA-CERT, 1982.
22. OZSOYOGLU, G., AND WANG, H. On set comparison operators, safety, and QBE. Tech. Rep., Case Western Reserve Univ., Cleveland, Oh., 1987.
23. TOPOR, R. Domain independent formulas and databases. Theor. Comput. Sci. 52, 3 (1987), 281-307.
24. TSUR, S., AND ZANIOLO, C. LDL: a logic-based data-language. In Proceedings of the 12th International Conference on Very Large Data Bases. 1986, 33-41. See also MCC rep. DB-150-85.
25. ULLMAN, J. D. Principles of Database and Knowledge-Base Systems. Vol. 1. Computer Science Press, Rockville, Md., 1988.
26. ULLMAN, J. D. Principles of Database Systems. Computer Science Press, Rockville, Md., 1980. (Revised ed., 1982).
27. VAN GELDER, A., AND TOPOR, R. W. Safety and correct translation of relational calculus formulas. In 6th ACM Symposium on Principles of Database Systems. 1987, 313-327.
28. VARDI, M. The decision problem for database dependencies. Inf. Process. Lett. 12, 5 (1981), 251-254.
Received July 1987; revised December 1989; accepted June 1990