optimizing fragment constraints—a performance evaluation

22
Optimizing Fragment Constraints — A Performance Evaluation H. Ibrahim,* W. A. Gray,† N. J. Fiddian‡ Department of Computer Science, University of Wales, Cardiff, United Kingdom A principal problem with integrity constraints’ use for monitoring dynamically changing database integrity is evaluation cost. This cost associated with performance of checking mechanisms is the main quantitative measure which must be supervised carefully. Based Ž. on literature, evaluating an integrity constraint cost includes these main components: i Ž. Ž . data amount accessed; ii data amount transferred across network; and iii number of sites involved. In distributed databases, where many networked sites are involved, not only amount of data accessed must be minimized but also amount of data transferred across network and number of sites involved. In Ibrahim H, Gray WA, Fiddian NJ. SICSDD: Techniques and implementation. In Proceedings of Constraint Databases and Ž . Applications, Second International Workshop on Constraint Database Systems CDB’97 , Delphi, Greece, January 1997, pp 187 207 , we introduced an integrity constraint subsystem for a relational distributed database. The subsystem consists of several techniques necessary for efficient constraint checking, particularly in a distributed envi- ronment where data distribution is transparent to application domain. Here, we show how these techniques effectively reduce constraint checking cost in such a distributed environment. This is done by analyzing and comparing generated simplified forms to respective initial constraints respecting amount of data to be accessed, amount of data transferred across network, and number of sites involved during evaluation of constraints or simplified forms. Generally, our strategy reduces data amount needing to be accessed since only fragments of relations subject to update are evaluated. Data amount trans- ferred across network and number of sites that are involved are minimized by evaluating simplified forms at target site, i.e., site where update is performed. 2001 John Wiley & Sons, Inc. I. INTRODUCTION A database state is said to be consistent if the database satisfies a set of constraints, called semantic integrity constraints. Integrity constraints specify those configurations of the data that are considered semantically correct. Any Ž . Ž . update operation insert, delete, or modify or transaction sequence of updates that occurs must not violate these constraints. Thus, a fundamental issue * Author to whom correspondence should be addressed; e-mail: [email protected]. † e-mail: [email protected]. ‡ e-mail: n.j.fi[email protected]. Ž . INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 16, 285306 2001 2001 John Wiley & Sons, Inc.

Upload: h-ibrahim

Post on 06-Jun-2016

216 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Optimizing fragment constraints—A performance evaluation

Optimizing Fragment Constraints —A Performance EvaluationH. Ibrahim,* W. A. Gray,† N. J. Fiddian‡Department of Computer Science, University of Wales, Cardiff, United Kingdom

A principal problem with integrity constraints’ use for monitoring dynamically changingdatabase integrity is evaluation cost. This cost associated with performance of checkingmechanisms is the main quantitative measure which must be supervised carefully. Based

Ž .on literature, evaluating an integrity constraint cost includes these main components: iŽ . Ž .data amount accessed; ii data amount transferred across network; and iii number of

sites involved. In distributed databases, where many networked sites are involved, notonly amount of data accessed must be minimized but also amount of data transferred

�across network and number of sites involved. In Ibrahim H, Gray WA, Fiddian NJ.SICSDD: Techniques and implementation. In Proceedings of Constraint Databases and

Ž .Applications, Second International Workshop on Constraint Database Systems CDB’97 ,�Delphi, Greece, January 1997, pp 187�207 , we introduced an integrity constraint

subsystem for a relational distributed database. The subsystem consists of severaltechniques necessary for efficient constraint checking, particularly in a distributed envi-ronment where data distribution is transparent to application domain. Here, we showhow these techniques effectively reduce constraint checking cost in such a distributedenvironment. This is done by analyzing and comparing generated simplified forms torespective initial constraints respecting amount of data to be accessed, amount of datatransferred across network, and number of sites involved during evaluation of constraintsor simplified forms. Generally, our strategy reduces data amount needing to be accessedsince only fragments of relations subject to update are evaluated. Data amount trans-ferred across network and number of sites that are involved are minimized by evaluatingsimplified forms at target site, i.e., site where update is performed. � 2001 John Wiley &Sons, Inc.

I. INTRODUCTION

A database state is said to be consistent if the database satisfies a set ofconstraints, called semantic integrity constraints. Integrity constraints specifythose configurations of the data that are considered semantically correct. Any

Ž . Ž .update operation insert, delete, or modify or transaction sequence of updatesthat occurs must not violate these constraints. Thus, a fundamental issue

* Author to whom correspondence should be addressed; e-mail:[email protected].

† e-mail: [email protected].‡ e-mail: [email protected].

Ž .INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 16, 285�306 2001� 2001 John Wiley & Sons, Inc.

Page 2: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN286

concerning integrity constraints is constraint checking, that is the process ofensuring that the integrity constraints are satisfied after the database has beenupdated.

Much attention has been paid to the maintenance of integrity in centralizeddatabases over the last decade. A naive approach to constraint checking is toperform the update and then to check whether all the integrity constraints aresatisfied in the new database state. This method, termed brute force checking, isvery expensive, impractical, and can lead to prohibitive processing costs.7,11,16

Enforcement is costly because the evaluation of all integrity constraints requiresaccessing large amounts of data which are not involved in the database updatestate transition. Researchers suggested that constraint checking can be opti-mized by exploiting the fact that the constraints are known to be satisfied priorto an update. This strategy, known as incremental integrity checking, avoidsredundantly checking constraints that are satisfied in the database before andare not affected by the update operation. It is the basis of most currentapproaches to integrity checking in databases. Another strategy is to simplify theconstraint formulae so that less data are accessed to determine the truth of aconstraint. Under the assumption that the set of initial constraints, IC, is knownto be satisfied in the state before an update, simplified forms of IC, say IC�, areconstructed such that IC is satisfied in the new state if and only if IC� issatisfied, and the evaluation cost of IC� is less than or equal to the evaluationcost of IC. This strategy is referred to as constraint simplification and thesimplified forms of these constraints are referred to as integrity tests. Thisapproach conforms with the admonition of Nicolas14 to concentrate on theproblem of finding good§ constraints. Various simplification techniques havebeen proposed where integrity tests are derived from the syntactic structure ofthe constraints and the update operations.5,7,13,14,18 Researchers in this areafocussed solely on the derivation of efficient integrity tests, claiming that theyare cheaper to enforce and reduce the amount of data accessed, thus reducingthe cost of integrity constraint checking. Three different types of integrity testare defined in Ref. 13, namely, sufficient tests, necessary tests, and complete tests.

Although this research effort yielded fruitful results that have given central-ized systems a substantial level of reliability and robustness with respect to theintegrity of their data, there has so far been little research carried out onintegrity issues for distributed databases. Devising an efficient algorithm forenforcing database integrity against updates is more crucial in a distributedenvironment. The reasons for this are described in Refs. 1, 11, 16, and 18. Thebrute force strategy of checking constraints is worse in the distributed contextsince the checking would typically require data transfer as well as computationleading to complex algorithms to determine the most efficient approach. Allow-ing an update to execute with the intention of aborting it at commit time in theevent of constraint violation is also inefficient since rollback and recovery mustoccur at all sites in which the update participated.

§ Good is intended to mean easy to maintain and easy to check.14

Page 3: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 287

A principal problem with the use of integrity constraints for monitoring theintegrity of a dynamically changing database is their cost of evaluation. This costwhich is associated with the performance of the checking mechanisms is themain quantitative measure which has to be supervised carefully. Differentcriteria are used to assess this performance such as the time to check thevalidity of the constraints against updates. Generally, an efficient constraintchecking strategy tries to minimize the utilization of the computing resourcesinvolved during the checking activities. A common goal addressed by previousresearchers in this field is to propose constraint simplification strategies whichmanage to derive a better set of constraints than the initial set. A simplificationstrategy is said to be efficient if the evaluation of the generated simplified formshas effectively reduced the cost of integrity checking compared to the evaluationof the initial constraints. Based on the following studies2,5,7,11,14,17,18 the cost of

Ž .evaluating an integrity constraint includes the following main components: iŽthe amount of data accessed locally or nonlocally for a distributed

. 7,17database �this is related to the checking space of the integrity constraint ;Ž . Ž .ii the amount of data transferred across the network; and iii the number of

Ž .sites involved, which is interrelated with ii above. For a centralized database,the main emphasis is on minimizing the amount of data accessed or thechecking space. In distributed databases, where many sites are involved, theamount of data transferred across the network and the number of sites involvedmust be minimized too. Most of the authors cited consider a single costcomponent due to the difficulty in assigning suitable weights to all cost compo-nents.

Bernstein and Blaustein2 and Nicolas14 used the following two intuitivearguments as a justification that the evaluation of a simplified constraintproduced by their algorithm effectively reduces the cost of integrity checking

Ž .compared to the evaluation of the corresponding initial constraint: i the moreconstants that are substituted for variables in a given constraint, the moreselective is the constraint, and so the easier it should be to evaluate and

Ž .therefore the cheaper the evaluation; and ii the simplified forms are derivedon a minimal substate of the database state and so involve less data access. Hsuand Imielinski7 measured the simplicity of an integrity constraint by the notion

Ž .of its checking space. The checking space of a constraint IC � , � , . . . , � is1 2 ndefined as � � � � ��� � � where � is the Cartesian product operator, and1 2 n

the � s are the range variables in the IC. A constraint IC-i is said to be simplerithan a constraint IC-j if the checking space of IC-i is smaller than the checkingspace of IC-j. The checking space of a constraint is a rough measure of thecomplexity of its evaluation and the number of variables it has to access. Thismeasurement is later used by other authors.17 Simon and Valduriez18 claimedthat their simplification method minimized the cost of integrity checking sinceonly data subject to update are evaluated. Thus, they are reducing the checkingspace by removing data that are known to be unaffected by the change.Mazumdar11 proposed a metric called scatter, � , to capture the amount ofnonlocal access necessary to evaluate a constraint in a distributed database.

Page 4: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN288

In Ref. 9, we introduced an integrity constraint subsystem for a relationalŽ .distributed database SICSDD that we developed. The subsystem provides

complete functionality and an efficient strategy for constraint enforcement.Complete functionality is attained through a modular and extensible architec-ture in which several techniques are incorporated. These techniques are neces-sary to achieve efficient constraint enforcement, particularly in a distributeddatabase. By database distribution we mean that a collection of data which

Ž .belongs logically to the same system is physically spread over the sites nodes ofa computer network where intersite data communication is a critical factoraffecting the system’s performance. In this paper, we show how the SICSDDtechniques effectively reduce the cost of constraint checking in a distributedenvironment. We do this by analyzing and comparing the generated simplifiedforms to their respective initial constraints with respect to the amount of datathat has to be accessed, the amount of data transferred across the network, andthe number of sites that may be involved during the evaluation of theseconstraints or simplified forms. In general, our strategy reduces the amount ofdata needing to be accessed since only fragments of relations subject to updateare evaluated. The amount of data transferred across the network and thenumber of sites that may be involved are minimized by evaluating the simplifiedforms at the target site, i.e., the site where an update is to be performed.

The paper is structured as follows. In Section II, the basic definitions,notations, and examples which are used in the rest of the paper are set out. InSection III, an overview is given of the techniques we adopt to derive simplifiedforms of initial constraints. In Section IV, we analyze the generated simplifiedforms and we compare them to their respective initial constraints. Conclusionsand further research are presented in the final Section V.

II. PRELIMINARIES

Our approach was developed in the context of relational databases,3 whichcan be regarded as consisting of two distinct parts, namely, an intensional partand an extensional part. A database is described by a database schema D, which

² :consists of a finite set of relation schemas, R , R , . . . , R . A relation schema1 2 mŽ . Žis denoted by R A , A , . . . , A where R is the name of the relation predi-1 2 n

. Ž .cate with n arity and the A s are the attributes of R. Let dom A be thei idomain values for attribute A . Then, an instance of R is a relation RR which isi

Ž . Ž .a finite subset of Cartesian product dom A � ��� � dom A . A database1 ninstance is a collection of instances for its relation schemas. A relational

Ž .distributed database schema is described as a quadruple D, IC, FR, AS whereIC is a finite set of integrity constraints, FR is a finite set of partitioning rules,and AS is a finite set of allocation schemas.

The database integrity constraints considered in this paper are state con-Ž . Ž . Ž .straints which include: domain IC-1 , key IC-2, IC-3 , referential IC-4 , and

Ž .general semantic integrity constraints IC-5, IC-6 . They are expressed inprenex conjunctive normal form with the range restricted property.13,14 A con-

Ž . Ž .junct literal is an atomic formula of the form R u , u , . . . , u where R is a k1 2 k

Page 5: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 289

ary relation name and each u is either a variable or a constant. A positiveiŽ . Ž .atomic formula positive literal is denoted by R u , u , . . . , u while a negative1 2 kŽ . Ž .atomic formula negative literal is prefixed by �. An in equality is a formula of

Ž .the form u OP u prefixed with � for inequality where both u and u can be1 2 1 2� 4constants or variables and OP � � , � , � , � , �� , � .

A set of fragmentation rules, FR, specifies the set of restrictions, C , thatimust be satisfied by each fragment relation RR . These rules introduce a new setiof integrity constraints and therefore have the same notation as IC. We assumethat the fragmentation of relations satisfies the completeness, the disjointness,and the reconstructability properties.15 An allocation schema locates a fragmentrelation, RR , to one or more sites. Throughout this paper, the same exampleiemp dept database is used, as given in Figure 1. Here, we assume that the�relation emp is first vertically fragmented into emp on attribute set A �1 1� 4 � 4eno, ename, eaddress and emp on attribute set A � eno, dno, ejob, esal ;2 2emp and dept are horizontally fragmented into two fragments emp and dept ,2 2 i irespectively, with predicates dno � D1 and dno � D2.

Figure 1. The emp dept intensional database.�

Page 6: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN290

III. AN OVERVIEW OF OUR OPTIMIZATION TECHNIQUES

The high execution cost of constraint enforcement is one of the majorproblems in the field of constraint handling.4 This cost can be substantiallyreduced not only by applying an efficient enforcement strategy but also bygenerating and evaluating a good set of integrity constraints. The techniquesthat are incorporated into our system seek to derive efficient sets of fragment

�constraints and a range of possible local tests. Here, ‘‘efficient’’ means a set offragment constraints which is semantically equivalent to the initial set, does notcontain any semantically or syntactically redundant constructs or constraints,eliminates any constraints which contradict already existing constraints or rules,

Ž .and contains constraints that are either more local less distributed or entirelylocal when compared to the initial set. Our techniques are identified as con-straint preprocessing, constraint distribution, and integrity test generation. They aresummarized below as a full description of these techniques can be found inRefs. 9 and 10.

Constraint Preprocessing Techniques. There are five steps that are appliedŽwhich are performed by the following procedures we use the notation X to

indicate a finite set of variables, x , x , . . . , x ; and x , x , and x are the1 2 k key join ref.key, join, and reference attribute, respectively .

Ž .i constraint transformation procedure. This procedure transforms the� �Ž .constraints specification at the relational level global constraints into a con-

Ž .straints specification at the fragment level fragment constraints . The objectiveof the transformation is to obtain a specification of constraints that can bestraightforwardly used for constructing efficient enforcement algorithms. At thisstage, the transformations are restricted to logically equivalent transformations,without considering any reformulation of the original constraints. Initially, eachoccurrence of a relation RR in a constraint is replaced by its n fragmentrelations, RR . The transformation is one to many, which means that given aiglobal constraint, the result of the transformation is a logically equivalent set offragment constraints. There are four transformation rules which are appliedduring this process. These rules analyze the patterns in the prefixes of aconstraint and the types of fragmentation that are being applied to the globalrelations. The proofs of these rules can be found in Ref. 16 and are thereforeomitted here.

Horizontal Transformation Rules.

nŽ . Ž .Ž Ž . Ž .. Ž .Ž Ž . Ž ..i �X R X � P X � � �X R X � P X .i�1 inŽ . Ž .Ž Ž . Ž .. Ž .Ž Ž . Ž ..ii �X R X � P X � � �X R X � P X .i�1 i

� An integrity test is a local test if it can be evaluated at a single site, i.e., at the sitewhere the update is to be performed, and a global test otherwise.

Page 7: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 291

Vertical Transformation Rules.

Ž . Ž .Ž Ž . Ž .. Ž .Ž Ž . Ž .i �X R X � P X � � x �X ��� �X R x , X � ��� � R x , Xkey 1 n 1 key 1 n key nŽ ..� P x , X , . . . , X .key 1 n

Ž . Ž .Ž Ž . Ž .. Ž .Ž Ž . Ž .ii �X R X � P X � � x �X ��� �X R x , X � ��� � R x , Xkey 1 n 1 key 1 n key nŽ ..� P x , X , . . . , X .key 1 n

To obtain a logically equivalent set of fragment constraints from a givenglobal constraint which contains a relation RR and is fragmented by a mixedfragmentation, we repeatedly apply horizontal and vertical transformation rulesin the same sequence of horizontal and vertical fragmentations to the globalconstraint to produce the fragment constraints.

Example. The integrity constraint IC-4 is transformed into the following set offragment constraints.

2 2 Ž .Ž Ž .FC-4 : � � � t�u�� �w� x� y�z emp t, u, � , w �t i�1 j�1 2 iŽ ..dept u, x, y, z .jŽ .ii simplification procedure. This procedure, which uses knowledge about�

the fragmentation of relations, reduces the number of fragment relationsinvolved in the evaluation of a constraint. A set of fragment constraints derivedfrom a global constraint can be simplified if the relations specified in the globalconstraint are fragmented on a join or reference attribute using the samefragmentation algorithm. Instead of deriving all combinations of fragmentconstraints, only compatible fragment constraints are constructed. Given globalrelations RR and SS which are fragmented on a join or reference attribute into nand m fragment relations, respectively, on the same fragmentation rules, thenthe following simplification rules are applied.

n mŽ . Ž .Ž Ž . Ž ..i � � � x �X �Y R x , X � S x , Y �i�1 j�1 r e f i ref j refn Ž .Ž Ž . Ž ..� � x �X �Y R x , X � S x , Y .i�1 ref i ref i refn mŽ . Ž .Ž Ž . Ž . .ii � � � x �X �Y R x , X � S x , Y ��� � ��� �i�1 j�1 join i join j joinn Ž .Ž Ž . Ž . .� � x �X �Y R x , X � S x , Y ��� � ��� .i�1 join i join i joinn mŽ . Ž .Ž Ž . Ž . .iii � � � x �X �Y R x , X � S x , Y ��� �i�1 j�1 join i join j joinn Ž .Ž Ž . Ž . .� � x �X �Y R x , X � S x , Y ��� .i�1 join i join i join

Example. FC-4 above can be simplified since both relations emp and dept aretfragmented on the reference attribute, i.e., dno, using the same fragmentationrules. The simplified set of fragment constraints are as shown below.

2 Ž .Ž Ž . Ž ..FC-4 : � � t�u�� �w� x� y�z emp t, u, � , w � dept u, x, y, z .si i�1 2 i iŽ .iii subsumption procedure. This procedure attempts to obtain an im-�

proved set of fragment constraints by removing redundant fragment constraintsfrom a set. A redundant fragment constraint is a constraint that can be impliedŽ .syntactically from other existing fragment constraints. Thus, deleting a redun-dant fragment constraint from its set does not affect the unsatisfiability orsatisfiability of the set since the truth of this constraint can be inferred from thetruth of its identical or subsumed pair. Even though a redundant set of

Page 8: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN292

constraints is semantically correct, excluding redundancy can improve the en-forcement time.

Example. Consider the following set of fragment constraints derived from IC-3Ž Ž ..by applying the transformation process step i .

2 2 Ž .Ž Ž .FC-3 : � � � w � x1� x2� y1� y2� z1� z2 dept w, x1, y1, z1 �t i�1 j�1 iŽ . Ž . Ž . Ž ..dept w, x2, y2, z2 � x1 � x2 � y1 � y2 � z1 � z2 .j

�Ž .Ž Ž .The sets � w � x1� x 2� y1� y 2� z1� z 2 dept w , x1, y1, z1 �iŽ . Ž . Ž . Ž ..�dept w , x 2, y 2, z 2 � x1 � x 2 � y1 � y 2 � z1 � z 2 andj

�Ž . Ž Ž . Ž .�w� x1� x2� y1� y2�z1�z2 dept w, x1, y1, z1 � dept w, x2, y2, z2 �j iŽ . Ž . Ž ..� � 4 � 4x1 � x2 � y1 � y2 � z1 � z2 for i � 1, 2 and j � 1, 2 are mutuallyredundant. Removing the redundant fragment constraints will generate thefollowing set of fragment constraints:

2 2 Ž .Ž Ž .FC-3 : � � � w� x1� x2� y1� y2� z1� z2 dept w, x1, y1, z1 �su i�1 j� i iŽ . Ž . Ž . Ž ..dept w, x2, y2, z2 � x1 � x2 � y1 � y2 � z1 � z2 .jŽ .iv contradiction procedure. This procedure removes the fragment con-�

straints produced by the constraint transformation procedure which contradict� �the fragmentation rules, i.e., it removes fragment constraints that are neversatisfied. The contradiction procedure designed by us is a specific-purpose�theorem prover that resembles a resolution-based theorem prover similar toRefs. 6 and 12. It is mainly designed for investigating if a fragment constraintviolates one of the existing fragmentation rules.

Example. The fragment constraint FC-3 above, when i � j, contradicts thesufragmentation rules FR-3 and FR-4.

Ž . Žv reformulation procedure. The above procedures the simplification� �.procedure, the subsumption procedure, and the contradiction procedure use� �

knowledge about data fragmentation and analyze the syntax of the constraints toŽeither remove inessential constraints a redundant constraint or a constraint

.which contradicts the fragmentation rules from the constraints set or reduceŽthe scope of the fragment relations specified in a constraint the

.simplification procedure . These procedures, which are based on syntactic crite-�ria, do not check the possible occurrence of redundant semantics in theconstraint set. The reformulation procedure, which is based on semantic crite-�

Ž . Ž .ria, attempts to i remove redundant semantic constructs that may exist, and iireformulate the set of fragment constraints derived so far into alternative formswhich can be either an antecedent¶ or an equi�alent form. In our approach,removing the redundant semantic constructs from a given fragment constraint isachieved by applying the substitution and absorption rules. This strategy makesthe constraint easier to evaluate because more constants are substituted for thevariables in the constraint which makes the constraint more selective.

¶ If an antecedent of a fragment constraint is satisfied this implies that the fragmentconstraint is satisfied but if the antecedent is falsified, then the fragment constraint hasto be evaluated.

Page 9: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 293

Example. Consider the following fragment constraint derived from IC-5 byŽ Ž .. Ž Ž ..applying the transformation step i and the contradiction step iv processes.

Ž .Ž Ž . Ž . Ž ..FC-5 : �w� x� y�z dept w, x, y, z � w � D1 � z � 4000 .t 1The literal w � D1 is a redundant construct as this can be implied from

FR-3. Removing this construct will generate the following fragment constraint.Ž .Ž Ž . Ž ..FC-5 : �w� x� y�z dept w, x, y, z � z � 4000 .r 1

The fragment constraint FC-6 below which is derived from IC-6 can bet1reformulated as FC-6� by using FC-5 derived above.r

Ž . Ž Ž . Ž .FC-6 : � t � u�� � w � x � y� z emp t, u, � , w � dept u, x, y, z �t1 21 1Ž ..w � z .

Ž .Ž Ž . Ž ..FC-6�: � t�u�� �w emp t, u, � , w � w � 4000 .21Figure 2 lists the sets of fragment constraints derived after applying the

constraint preprocessing techniques to the initial constraints. Each FC-i is aŽsemantically equivalent set to its initial constraint IC-i except for FC-6� which

.is an antecedent of FC-6 .t1Constraint Distribution Techniques. The fragment constraints constructed so

far involve data stored at different network sites. Because the complexity ofenforcing constraints is directly related to both the number of constraints in theconstraint set and the number of sites involved, our objective in this phase is toreduce the number of constraints allocated to each site for execution at thatsite. Distributing the whole set of fragment constraints to every site is notcost-effective since not all fragment constraints are affected by an update and sosites may not be affected by particular updates.

These techniques reduce the number of constraints allocated to each site byallocating a fragment constraint to a site if and only if there is a fragmentrelation at that site which is mentioned in the constraint, so that whenever anupdate occurs at a site, the validation of the fragment constraints at that site

Figure 2. The sets of fragment constraints derived by the constraint preprocessingtechniques with respect to the global constraints and fragmentation rules given inFigure 1.

Page 10: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN294

implies the global validity of the update. Consequently, these techniques intendto allocate each fragment constraint to a site or minimal number of sites andrelieve the irrelevant sites from the computation of certain sets of fragmentconstraints.

Integrity Test Generation Techniques. These techniques generate integritytests from the syntactic structure of the constraints and the update operations.The algorithms applied for deriving these tests use the substitution, subsump-tion, and absorption rules, and are closely related to Ref. 14. These algorithmscan be found in Ref. 10 and are therefore omitted here.

In a distributed database, four types of integrity tests can be identified.They are global posttests, local posttests, global pretests, and local pretests.8 We

Ž .view local pretests as more effective in a distributed database since: i only asingle site is involved in evaluating the local tests, i.e., the site where the update

Ž .is to be performed; ii as they are evaluated at a target site, this avoids remotereading and the amount of data transferred across the network is minimized�in

5 Ž .fact, no data transfer across the network is required ; and iii they areevaluated before the update is performed�this avoids the need to undoŽ .rollback and recover from an update in the event of constraint violation, andthus reduces the overhead cost of checking integrity.18

IV. EVALUATION

The key problem in integrity checking is how to efficiently evaluate con-straints. An accurate way of measuring the evaluation of the generated fragmentconstraints and simplified forms is a high priority so that they can be comparedfairly and realistically. In this section, we evaluate the derived fragment con-straints and simplified forms with respect to the following components. We use

Ž .the symbol C R1, R2, . . . , Rn to denote the set of relations or fragmentrelations specified in the constraint or simplified form C.

AA provides an estimate of the amount of data accessed, which is related tothe number and the size of the relations or fragment relations specified in agiven constraint or simplified form. This measurement indirectly indicates thesize of the checking space. It is based on the following formula: AACŽ RR1, RR2, . . . , RRn.� � RR1 � RR2 ��� � RRn where the RRi’s are the relations or fragmentrelations specified in the C and � RRi is the size of RRi. For a compoundconstraint C � � m C , where C is a simple constraint, the amount of datai�1 i iaccessed is measured as follows: Ým AA . For a disjunction of constraints,i�1 Cim � m �C � � C , the amount of data accessed is measured as follows: AA ,Ý AAi�1 i min i�1 Ci

Ž m . Žwhich is a range where AA Ý AA , respectively is the minimum maximum,min i�1 Ci.respectively amount of data that might need to be accessed.

TT provides an estimate of the amount of data transferred across thenetwork. It is measured based on the following formula, TT �Ýn dt , where dti�1 i iis the amount of data transferred from site i to the target site.

� gives a rough measurement of the amount of nonlocal access necessaryto evaluate a constraint or simplified form, and is taken from Ref. 11. This ismeasured by analyzing the number of sites that might be involved in validating

Page 11: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 295

� Ž . �the constraint. In general, the formula applied is as follows: � � site C whereCŽ .� is the locality of C, i.e., the number of sites involved, and site C is obtainedC

by the following formula.

� 4FOR each distinct relation or fragment relation, fr for i � 1, 2, . . . , k in CiDO

Ž . � � 44site C � j: j � sites �fr is allocated at site j where sites � 1, 2, . . . , n .i

For a compound constraint C � � m C , where C is a simple constraint, thei�1 i ilocality of these constraints is simply obtained from the maximum number ofsites that are involved in evaluating one of the constraints, i.e., � �C

Ž .max � , � , . . . , � . The locality of an integrity test, given an update opera-C C C1 2 m

tion, gives a rough measurement of the amount of nonlocal access necessary forverifying if the integrity test is being satisfied or violated by the updateoperation. This is measured by analyzing the number of sites that might beinvolved in validating the test. In general, the formula applied is

� �site T if site U site TŽ . Ž . Ž .i i i� � 1Ž .Ti ½ � �site T 1 otherwiseŽ .i

where � is the locality of an integrity test, T , with respect to a given updateT iiŽ . Ž .operation, U , i.e., the number of sites involved. The site T and site U arei i i

obtained by the following formula.

� 4FOR each distinct relation or fragment relation, fr for i � 1, 2, . . . , k in Ti iDO

Ž . � � 44site T � j: j � sites � fr is allocated at site j where sites � 1, 2, . . . , ni iŽ . � � 4site U � l:l � sites� � fr is allocated at site l where sites � 1, 2, . . . , ni ui

4and fr is a fragment relation specified in U .ui i

Table I shows information relating to the emp dept database. This informa-�tion includes each relation RR, its size � RR, its fragment relations RR , and theirisizes � RR . The fragment relations are derived based on the fragmentationischemas given in Section II. Cases I and II in the table represent the sites wherethe fragment relations are allocated. Here, Case I represents the reasonablecase where the fragment relations constructed based on the same fragmentationrules are allocated to the same site. It is called the reasonable case as it shouldminimize the network traffic. Case II represents a worst case where eachfragment relation is allocated to a different site of the network. It is a worst casein that network traffic is involved in any constraint evaluation that involves morethan one fragment.

In general, the amount of data accessed for evaluating the set of fragmentconstraints, AA , derived by the procedures embodied in the constraint prepro-F C - icessing techniques is less than the amount of data accessed for evaluating the

� As we assume that there is no replication of fragments across the network, for agiven update operation, U there will be only one target site, i.e., the site where theiupdate is to be performed.

Page 12: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN296

Table I. The emp�dept relations and fragment relations.a aRR � RR RR � RR Case I Case IIi i

bemp E emp e S S1 1 0 0emp e S S21 21 1 1emp e S S22 22 2 2

cdept D dept d S S1 1 1 3dept d S S2 2 2 4

a S � S where i � j.i jb E � e Ý2 e � � key where � key is the size of the repeated primary1 i�1 2 ikey values.

c D �Ý2 d .i�1 i

initial constraint, AA , i.e., AA � AA . A number of notes can be made.IC - i F C - i IC - iŽ .� RR is minimum � RR , � RR , . . . , � RR .min 1 2 n

The constraint transformation procedure transforms the initial constraints� �which include the domain, key, referential, and general semantic integrity

Žconstraints, into sets of logically equivalent fragment constraints without con-.sidering any reformulation of the original constraints by applying the transfor-

� 4mation rules given in Section III. If RR1, RR2, . . . , RRk is a finite set of globalŽ .relations specified in a constraint IC-i and each RRi with size � RRi is horizon-

Žtally fragmented into ni fragments, i.e., RR , RR , . . . , RR with sizesi1 i2 ini.� RR , � RR , . . . , � RR , respectively , then deriving an equivalent set of fragmenti1 i2 ini

Ž . Ž .constraints, FC-i , by applying the horizontal fragmentation rules i and�or iitwill obviously generate FC-i consisting of the prefix �, � or both where one oftthe following situations occurs.

Ž .a If IC-i is specified over RR1, then IC-i accessed � RR1 amount of data.Ž .Based on the horizontal transformation rule i , n1 fragment constraints are

constructed, i.e., � n1 fc , where each fragment constraint accesses � RR amountj�1 j 1 jof data. Thus, the total amount of data accessed by FC-i is equal to the amounttof data accessed by IC-i, i.e., Ýn1 � RR � � RR1 and thus AA � AA .j�1 1 j F C - i IC - it

Ž .b If IC-i is specified over RR1, then IC-i accessed � RR1 amount of data.Ž .Based on the horizontal transformation rule ii , n1 partial constraints are

constructed, i.e., � n1 fc , where each partial constraint accesses � RR amountj�1 j 1 jof data. According to the �-rule, the derived set of fragment constraints issatisfied if one of the partial constraints is satisfied. Thus, the amount of dataaccessed by FC-i is less than or equal to the amount of data accessed by IC-i,t

� n1 �i.e., � RR , Ý � RR � � RR1 and thus AA � AA .1 j�1 1 j F C - i IC - imin tŽ .c If IC-i is specified over RR1, RR2, . . . , RRk, then IC-i accessed � RR1

� RR2 ��� � RRk amount of data. Based on the horizontal transformation ruleŽ . Ž .i , n1 � n2 � ��� � nk fragment constraints are constructed, i.e.,� n1 � n2 ��� � nk fc , where each fragment constraint accesses � RRj1�1 j2�1 jk�1 j1 j2 ��� jk 1 j1 � RR ��� � RR amount of data. Thus, the total amount of data accessed2 j2 k jkby FC-i is greater than the amount of data accessed by IC-i, i.e., Ýn1 Ýn2

t j1�1 j2�1��� Ýnk � RR � RR ��� � RR � � RR1 � RR2 ��� � RRk and thusjk�1 1 j1 2 j2 k jkAA � AA .F C - i IC - it

Page 13: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 297

Ž .d If IC-i is specified over RR1, RR2, . . . , RRk, then IC-i accessed � RR1 � RR2 ��� � RRk amount of data. Based on the horizontal transformation ruleŽ . Ž . n1 n2ii , n1 � n2 � ��� � nk partial constraints are constructed, i.e., � �j1�1 j2�1��� � nk fc , where each partial constraint accesses � RR � RRjk�1 j1 j2 ��� jk 1 j1 2 j2 ��� � RR amount of data. According to the �-rule the derived set ofk jkfragment constraints is satisfied if one of the partial constraints is satisfied.

�Thus, the amount of data accessed by FC-i is � RR � RRt 1 2min minn1 n2 nk � ��� � RR ,Ý Ý ���Ý � RR � RR ��� � RR , which can bek j1�1 j2�1 jk�1 1 j1 2 j2 k jkminŽ . � �rewritten as i � RR � RR ��� � RR , � RR1 � RR2 ��� � RRk , i.e.,1 2 kmin min min

Ž . � n1 n2 nkAA � AA and ii � RR1 � RR2 ��� � RRk,Ý Ý ���Ý � RR F C - i IC - i j1�1 j2�1 jk�1 1 j1t � Ž .� RR ��� � RR , i.e., AA � AA . For i , we assume that l partial con-2 j2 k jk F C - i IC - itŽ .straints are evaluated where l � n1 � n2 � ��� � nk and the maximum amount

of data accessed for evaluating l partial constraints is � RR1 � RR2 ��� � RRkŽ . Žand for ii , we assume that l partial constraints are evaluated where 1 � l � n1

.� n2 � ��� � nk and the minimum amount of data accessed for evaluating lpartial constraints is � RR1 � RR2 ��� � RRk.

Ž .e If IC-i is specified over RR1, RR2, . . . , RRk, then IC-i accessed � RR1 � RR2 ��� � RRk amount of data. If n1 fragment constraints are constructed in

Ž .which each fragment constraint contains n2 � n3 � ��� � nk partial con-straints, i.e., � n1 � n2 ��� � nk fc , by applying the horizontal trans-j1�1 j2�1 jk�1 j1 j2 ��� jk

Ž . Ž .formation rules i and ii , then each fragment constraint accesses � RR 1 j1� RR ��� � RR amount of data. According to the -rule, each fragment2 j2 k jkconstraint is satisfied if one of its partial constraints is satisfied. Thus, the total

� n1 Ž .amount of data accessed by FC-i is Ý � RR � RR ��� � RR ,t j1�1 1 j1 2 kmin minn1 n2 nk �Ý Ý ���Ý � RR � RR ��� � RR . It is hard to show if thej1�1 j2�1 jk�1 1 j1 2 j2 k jkn1 Ž .lower bound, Ý � RR � RR ��� � RR is less than, equal to, orj1�1 1 j1 2 kmin min

greater than the amount of data accessed by IC-i and thus AA � AA orF C - i IC - it

AA � AA .**F C - i IC - it

If each global relation RR specified in a constraint IC-i is vertically frag-mented into n fragments, then we have the following. The worst situation iswhen IC-i is specified over the attributes of RR which are associated with its nfragment relations and the best situation is when only a single fragment relationis affected by the condition specified in IC-i.

Ž .f Assume that IC-i accessed � RR amount of data and based on theŽ .vertical transformation rule i , a fragment constraint is constructed where each

global relation RR is replaced by its n fragment relations RR . Thus, the amountjof data accessed by FC-i is greater than the amount of data accessed by IC-i,ti.e., � RR � RR ��� � RR � � RR and thus AA � AA . This occurs when1 2 n F C - i IC - it

IC-i is specified over the attributes of RR which are associated with its n� 4fragment relations, i.e., RR , RR , . . . , RR . The same argument is applied to the1 2 n

Ž .vertical transformation rule ii given in Section III.Ž .g If only a subset of fragment relations is affected by the condition

specified in IC-i, then AA � AA or AA � AA . This occurs when IC-i isF C - i IC - i F C - i IC - it t

** If FC-i is in the form of � n1 � n2 ��� � nk�1 � nk fc , thent j1�1 j2�1 jk�1�1 jk�1 j1 j2 ��� jk�1 jkAA � AA .FC - i IC - it

Page 14: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN298

specified over some of the attributes of RR which are associated with a subset of� 4its n fragment relations, i.e., RR , RR , . . . , RR where j � 1 and j m � n.j j1 jm

� 4This means the global relation RR in IC-i is replaced by RR , RR , . . . , RR . Itj j1 jmis hard to show if � RR � RR ��� � RR is less than, equal to, or greaterj j1 jmthan the amount of data accessed by IC-i and thus AA � AA or AA �F C - i IC - i F C - it t

AA .IC - iŽ .h If only a single fragment relation is affected by the condition specified

in IC-i, then AA � AA . This occurs when IC-i is specified over the at-F C - i IC - itŽ .tribute s of RR which is associated with a single fragment relation, RR of RR.j

This means the global relation RR in IC-i is replaced by RR only. Therefore, thejamount of data accessed by FC-i is less than the amount of data accessed bytIC-i, i.e., � RR � � RR and thus AA � AA .j F C - i IC - it

If a set of fragment constraints FC-i is derived from IC-i by applying bothtthe horizontal and vertical transformation rules, then either AA � AA orF C - i IC - it

AA � AA as this depends on the result of applying the transformation rules.F C - i IC - itŽ .As a conclusion thus far, the following should be noted: i the objective of

the constraint transformation procedure is to transform the initial constraints� �into sets of logically equivalent fragment constraints without considering any

Ž .reformulation of the original constraints; ii for a set of fragment constraintsn1 n2 nk � Ž . �FC-i in the form of � � ��� � fc see situation c above ,t j1�1 j2�1 jk�1 j1 j2 ��� jk

the amount of data accessed by FC-i is determined under the assumption thattall the fragment constraints in FC-i are evaluated�but given an updatetoperation affecting a fragment relation RR , only the fragment constraints ini jFC-i where RR appears in their specification should be evaluated�since wet i jassume that fragmentation satisfies the disjointness property, only a subset ofFC-i is evaluated, i.e., the amount of data accessed is less than that presentedt

Ž . Ž . Ž .in situation c �the same argument applies for situations a and e above;Ž .and iii we assume that the fragmentation strategies are chosen in such a way

that their effect on the integrity constraints will result in efficient constraintchecking.

As expected, evaluating the sets of fragment constraints derived by theconstraint transformation procedure without further optimization is inefficient� �Ž .with respect to the amount of data accessed since the evaluation of theseconstraints generally will access all the fragment relations specified in theseconstraints. This is similar to the brute force strategy, but is only the initial stageof our application.

The simplification procedure simplifies a set of fragment constraints which�reduces the number of joins and references between fragment relations requiredin the definition of the set of fragment constraints. Each FC-i is constructedsifrom its associated set of fragment constraints FC-i . If the global relations RRtand SS specified in a constraint IC-i are fragmented into n and m fragments,respectively, then we have the following situation.

Ž . Ž .i See simplification rule i given in Section III. Assume that IC-i ac-cessed � RR � SS amount of data and based on the transformation rules, nfragment constraints are constructed where each fragment constraint consists ofm partial constraints, i.e., � n � m fc . Applying the simplification rules willi�1 j�1 i j

Page 15: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 299

generate n fragment constraints, i.e., � n fc . Thus, the amount of dataj�1 jaccessed by FC-i is equal to the amount of data accessed by IC-i, i.e.,siÝn � RR SS � � RR � SS and thus AA � AA . The same argument isj�1 j j F C - i IC - is i

Ž . Ž .applied to the simplification rules ii and iii given in Section III which reducethe number of joins between fragment relations required in the definition of theset of fragment constraints.

The subsumption procedure removes any redundant fragment constraints�from a given set of fragment constraints, i.e., � n fc or � m fc or a set ofj�1 j j�1 jfragment constraints constructed by both. Obviously, this set of fragment con-straints is derived from the horizontal or mixed transformation rules. EachFC-i is constructed from its associated set of fragment constraints FC-i .su t

Ž . Ž .j If AA � AA and � RR is the total size of the fragment relation sF C - i IC - i rt

accessed by the redundant fragment constraint, then removing the redundantfragment constraint from its set means that the total amount of data accessed bythe new set of fragment constraints FC-i is less than the amount of datasu

Ž .accessed by IC-i, i.e., AA � � RR � AA and thus AA � AA .F C - i r IC - i F C - i IC - it suŽ . Ž .k If AA � AA and � RR is the total size of the fragment relation sF C - i IC - i rt

accessed by the redundant fragment constraint, then removing the redundantfragment constraint from its set means that the total amount of data accessed bythe new set of fragment constraints FC-i is less than the amount of datasuaccessed by FC-i and it is hard to show if it is less than, equal to, or greatertthan the amount of data accessed by IC-i and thus AA � AA or AA �F C - i IC - i F C - isu su

AA .IC - iThe contradiction procedure eliminates fragment constraints from their�

sets if they contradict one of the existing fragmentation rules, i.e., � n fc orj�1 j� m fc or a set of fragment constraints constructed by both. Obviously, this setj�1 jof fragment constraints is derived from the horizontal or mixed transformationrules. Each FC-i is constructed from its associated set of fragment constraintscgenerated by previous procedures, namely: the constraint transformation pro-� �cedure and the subsumption procedure.�

Ž .l If AA � AA or AA � AA and if � RR is the total size of theF C - i IC - i F C - i IC - i ct suŽ .fragment relation s accessed by the fragment constraint which contradicts one

of the existing fragmentation rules, then removing this fragment constraint fromits set means that the total amount of data accessed by the new set of fragmentconstraints FC-i is less than the amount of data accessed by IC-i, i.e.,cŽ . Ž .AA � � RR � AA or AA � � RR � AA and thus AA � AA .F C - i c IC - i F C - i c IC - i F C - i IC - it su c

Ž .m If AA � AA or AA � AA , then the same argument as pre-F C - i IC - i F C - i IC - it suŽ .sented in k is applied here.

Since we require that a fragment constraint which is derived by thereformulation process must be cheaper compared to the initial constraint,therefore the amount of data accessed for evaluating the derived constraintmust be less than or equal to the amount of data accessed for evaluating theinitial constraint.

Table II illustrates the amount of data needing to be accessed, AA, by theŽ .sets of fragment constraints given in Figure 2 except for FC-6� which are

derived by the procedures embodied in the constraint preprocessing techniques.

Page 16: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN300

Table II. Estimation of the amount of data accessed�the constraint preprocessingtechniques.

IC-i AA FC-i AA CommentIC - i F C - i

2IC-1 E FC-1 Ý e AA � AAi�1 2 i F C -1 IC -12 Ž .IC-2 E E FC-2 Ý 2 e 2 e e e AA � AAi�1 1 2 i 21 22 F C -2 IC -2

2IC-3 D D FC-3 Ý d d AA � AAi�1 i i F C -3 IC -32IC-4 E D FC-4 Ý e d AA � AAi�1 2 i i F C -4 IC -4

IC-5 D FC-5 d AA � AA1 F C -5 IC -52IC-6 E D FC-6 Ý e d AA � AAi�1 2 i i F C -6 IC -6

In most cases, the derived set of fragment constraints is better than the initialconstraint, i.e., AA � AA , as shown by FC-1, FC-3, FC-4, FC-5, and FC-6 inF C - i IC - i

Ž .Table II. AA � AA , only when IC-i is specified over the attribute s of aF C - i IC - iglobal relation RR which is associated with all of its fragment relations which areconstructed by a vertical fragmentation, as shown by FC-2 in Table II.

The amount of data accessed by FC-i is determined by the assumption thatall fragment constraints in FC-i are evaluated. However, given an updateoperation affecting a fragment relation RR , only the fragment constraints injFC-i containing RR in their specification should be evaluated. Since we assumejthat fragmentations satisfy the disjointness property, only a subset of FC-i isevaluated, i.e., the amount of data accessed is less than those presented in thissection. Also, we assume that the worst case, i.e., AA � AA , is a rare caseF C - i IC - isince in reality the fragmentation strategies are chosen in such a way that theireffect on the integrity constraints will result in efficient constraint checking.

The constraint distribution techniques employed by our system are appliedto the end result of the constraint preprocessing techniques. Table III shows theeffect of the constraint distribution techniques on the derived sets of fragmentconstraints with respect to � , which gives a rough measurement of the amountof nonlocal access required in constraint enforcement. As shown in the table,most of the derived sets of fragment constraints can be evaluated at a single site.Even in the worst case where each fragment is allocated to different sites of thenetwork, the number of sites involved in evaluating the derived sets of fragment

Ž .constraints is less in most cases, generally than the number of sites involved inŽ .evaluating the initial constraints shown by columns in Case II of Table III .

Table III. The reduction in the scatter metric.

IC-i Case I Case II FC-i Case I Case II

IC-1 3 3 FC-1 1 1IC-2 3 3 FC-2 3 3IC-3 2 2 FC-3 1 1IC-4 3 5 FC-4 1 2IC-5 2 2 FC-5 1 1IC-6 3 5 FC-6 1 2

Page 17: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 301

Table IV. The reduction in the amount of data accessed, thescatter metric, and the amount of data transferred across thenetwork for the integrity tests given in Table V.

a b c dTest, T AA � � TT TTi T I II I IIi

1. 0 1 1 0 02. e 3 3 e e 1 1 1

e e e1 2 j 2 je 2 i

e2 j3. e 1 1 0 02 i4. d 1 1 0 0i5. d 1 1 0 0i6. d 1 2 0 di i7. e 1 1 0 02 i8. 0 1 1 0 09. d 1 2 0 di i

10. e 1 1 0 02 i

a � of test T for Case I.ib � of test T for Case II.icTT of test T for Case I.idTT of test T for Case II.i

ŽThe integrity test generation techniques, which derive integrity tests sim-.plified forms for the set of fragment constraints, further reduce the amount of

data that has to be accessed, AA, during the evaluation of these tests. This isŽillustrated by Table IV. Table V presents the tests constructed for a given

.update operation and the set of fragment constraints, FC-i, given in Figure 2 .

Table V. The test constructed for a given update operation and the set of fragmentconstraints, FC-i, given in Figure 2.

FC-i INSERT Test, Ti

FC-1 emp 1. d � 02 iŽ .a, b, c, d

Ž .FC-2 emp 2. ��1�� 2�w1�w2� x1� y1�z12 iŽ . Ž Ž . Ž .a, b, c, d � emp a, �1, w1 � emp a, � 2, w2 1 1

Ž . �Ž .� emp a, x1, y1, z1 �1 � � 2 �2 iŽ . Ž . Ž . Ž .�.w1 � w2 � x1 � b � y1 � c � z1 � d

Ž .Ž Ž ..and � x2� y2�z2 � emp a, x2, y2, z22 jŽ .Ž Ž ..3. � x1� y1�z1 � emp a, x1, y1, z12 iŽ .Ž Ž .FC-3 dept 4. � x1� y1�z1 � dept a, x1, y1, z1 i i

Ž . �Ž . Ž . Ž .�.a, b, c, d x1 � b � y1 � c � z1 � dŽ .Ž Ž ..5. � x1� y1�z1 � dept a, x1, y1, z1iŽ .Ž Ž ..FC-4 emp 6. � x� y� y dept b, x, y, z2 i i

Ž . Ž .Ž Ž ..a, b, c, d 7. � t�� �w emp t, b, � , w2 iFC-5 dept 8. d � 40001

Ž .a, b, c, dŽ .Ž Ž . Ž ..FC-6 emp 9. � x� y�z � dept b, x, y, z d � z2 i i

Ž . Ž .Ž Ž . Ž ..a, b, c, d 10. � t�� �w emp t, b, � , w � w � d2 i

Page 18: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN302

For example, the domain constraint IC-1, which initially required E amount ofŽ . Ždata to be accessed see column AA of Table II , is reduced to 0 see columnIC - i

.AA of Table IV , i.e., no data access is required at all; the referential integrityTi

constraint IC-4, which initially required E D amount of data to be accessedŽ . 2 Žsee column AA of Table II , is reduced to Ý e d see column AA ofIC - i i�1 2 i i F C - i

.Table II by the constraint preprocessing techniques and further reduced to diŽ . Žsee column AA for test 6 of Table IV or e see column AA for test 7 ofT 2 i Ti i

.Table IV by the integrity test generation techniques.With respect to the scatter metric, � , the number of sites involved in

evaluating an integrity test for a local fragment constraint is 1 and the numberof sites involved in evaluating an integrity test for a nonlocal fragment constraintis normally more than 1. A local fragment constraint is one in which allfragment relations specified in the constraint are allocated to the same site. Forthe case of nonlocal fragment constraints, the number of sites involved can bereduced to 1 by applying the integrity test generation techniques as shown in

Ž .Table IV. For example, test 9 of FC-6 for Case II see column � of Table IVIIŽ .involves two sites, while test 10 of FC-6 see column � of Table IV for theII

same case involves a single site. As most of the work is being carried out at asingle site, therefore the amount of data transferred across the network isminimized, i.e., TT � 0.

The integrity tests listed in Table V are generated from the sets of fragmentconstraints constructed by the fragmentation rules given in Section II. Asconstraint checking is strongly influenced by the fragmentation rules used toconstruct the fragment relations, it is important to see and to analyze the effectof applying different fragmentation rules on the derived integrity tests. This isdiscussed below. Assume that the fragment emp is horizontally fragmented into2two fragments emp with some predicates different from those given in Figure2 i1, i.e., the global relations are fragmented based on different fragmentationrules. The sets of fragment constraints derived after applying the constraintpreprocessing techniques to their respective initial constraints are as shown in

Ž .Figure 3. Only the derived forms for IC-4 represented by FC-4 and IC-6sŽ .represented by FC-6 are shown, as the rest of the derived sets of fragmentsconstraints remain the same.

Table VI shows the effectiveness of the integrity test generation techniqueswhen applied to the sets of fragment constraints derived in Figure 3 with respect

Žto AA, � , and TT. Table VII presents the tests constructed for a given update.operation and the set of fragment constraints, FC-i, given in Figure 3. From

Figure 3. The sets of fragment constraints derived by the constraint preprocessingtechniques when different fragmentation rules are applied.

Page 19: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 303

Table VI. The reduction in the amount of data accessed, the scatter metric, and theamount of data transferred across the network for the integrity tests given in Table VII.

Test, T AA � � TT TTi T I II I IIi

3 3 3� � � � � �11. d ,Ý d 3 4 d ,Ý d d ,Ý dmin j�1 j min j�1 and j� � i j min j�1 j12. e 1 1 0 02 i

3 3 313. Ý d 3 4 Ý d Ý dj�1 j j�1 and j� � i j j�1 j14. e 1 1 0 02 i

this simple example, it is obvious that constraint checking is strongly influencedby the type of fragmentation and allocation used. Although the resultingsimplified forms shown in Table VI are not as efficient as those presented inTable IV, they are still better than the initial constraints with respect to AA, � ,and TT. The integrity test generation techniques formulate local simplified formsŽ .shown by tests 12 and 14 in Table VI which are cheaper than their alternative

Ž .forms tests 11 and 13, respectively, in Table VI . The sufficient tests, which areŽ . 5,11,16,18normally local tests i.e., � � 1 , are cheaper in a distributed environment

where the cost of accessing remote data for verifying the consistency of adatabase is a critical factor influencing the performance of the system.18

Some conclusions can be drawn with respect to the type of constraint beingŽ . Ž .considered. I Domain constraints IC-1 : the test generated will always have

AA � AA regardless of the fragmentation strategy used. In fact, AA � 0, � � 1,T IC - i Ti i

and TT � 0. This is because the test can be evaluated independently of the rest of8 Ž .the database as it only refers to the tuples to be updated. II Key constraints

Ž . Ž .IC-2, IC-3 : we considered: i when the global relations involved in the initialŽ . Ž .constraint are fragmented on the join attribute IC-3 , and ii when the global

relations involved in the initial constraint are not fragmented on the joinŽ .attribute IC-2 . As shown in Tables IV and V, it is always possible to deriveŽ . Ž . Ž . Žlocal tests i.e., � � 1 which are either a complete tests for case i tests 4 and

. Ž . Ž .5 of Table V or sufficient tests for case ii test 3 of Table V . These completetests are cheaper than the initial constraint with respect to the amount of data

Ž . Ž .accessed, i.e., AA � AA , � , and TT. III Referential constraints IC-4 : we haveT IC - iiŽ .considered: i when the global relations involved in the initial constraint are

Ž .fragmented on the reference attribute, and ii when the global relationsinvolved in the initial constraint are not fragmented on the reference attribute.

Table VII. The test constructed for a given update operation and the set of fragmentconstraints, FC-i, given in Figure 3.

FC-i INSERT Test, Ti

3Ž . Ž .Ž Ž ..FC-4 emp a, b, c, d 11. � � x� y�z dept b, x, y, zs 2 i j�1 jŽ .Ž Ž ..12. � t�� �w emp t, b, � , w2 i

3Ž . Ž .Ž Ž . Ž ..FC-6 emp a, b, c, d 13. � � x� y�z � dept b, x, y, z d � zs 2 i j�1 jŽ .Ž Ž . Ž ..14. � t�� �w emp t, b, � , w � w � d2 i

Page 20: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN304

ŽFor both cases two types of test are generated, namely, complete tests tests 6. Ž .and 11, respectively and sufficient tests tests 7 and 12, respectively . The

sufficient tests are cheaper than the complete tests with respect to � and TT.Ž . Ž .IV General semantic integrity constraints IC-5, IC-6 : In general, AA � AAT IC - ii

and it is always possible to generate complete tests which are normally globaltests, and sometimes possible to generate tests whose � � 1 since this dependson the fragmentation rules and the allocation schemas used, also on thecomplexity of the constraint itself.

Ž .In our method, given a set of possible simplified forms integrity tests , eachof the simplified forms is evaluated with respect to the three main componentslisted above. The most efficient one is selected based on the following heuristic

Ž . Ž .rules H1�H5 . A heuristic based approach is adopted due to: i the difficulty indetermining all the appropriate components and parameters that can be used tomeasure and to further select the efficient form from a range of possible

Ž .simplified forms; ii the difficulty in assigning suitable weights to all the costcomponents, when the interaction between these components is not clearly

Ž .defined; iii most of the values assigned to the components and parameters areestimates and not the actual values, which depend solely on the user input16 ;

Ž .and iv components which are considered as a critical factor that can influencethe performance of a system in one situation might not be critical in othersituations�for example, in a distributed database which is fully replicated, theamount of data transferred across the network in the event of evaluatingconstraints is not as important as in a distributed database with no replication.

Based on the above arguments, the following heuristic rules are applied:

Ž .H1 Given a set of simplified forms, a simplified form with the lowest � , AA, and TT

is always preferable to the others.Ž .H2 As the cost of accessing remote data for verifying the consistency of a

database state is the most critical factor that influences the performance of adistributed database, simplified forms which can be evaluated at a single sitewithout involving any interaction with the remote sites are more efficient.5,11

Therefore, a simplified form whose � � 1 is always preferable to the others.Ž .H3 In a situation where more than one local simplified form can be derived, then

the simplified form with the lowest AA is preferable to the others.Ž .H4 In a situation where the simplified forms are nonlocal, then the simplified

form with the lowest � is preferable to the others.Ž .H5 In a situation where the simplified forms are nonlocal with the same � value,

then the simplified form with the lowest TT is preferable to the others.

V. CONCLUSION

In a distributed database, the cost of accessing remote data for verifying theconsistency of the database is the most critical factor that influences theperformance of the system.5,11,16,18 In such an environment, simplified constraintforms which can be evaluated at a single site are preferable, i.e., � � 1 andTT � 0. In this paper, we outline several techniques which are essential forefficient constraint checking of fragmented relations in a distributed database.These techniques, which utilize knowledge about the database application to

Page 21: Optimizing fragment constraints—A performance evaluation

OPTIMIZING FRAGMENT CONSTRAINTS 305

derive fragment constraints and simplified forms, reduce the amount of datathat has to be accessed, the amount of data transferred across the network, andthe scatter metric which captures the scale of constraint nonlocality.

There are a number of extensions and improvements that could be made:Ž .i Consider a broader range of constraint types. We concentrate here on fourtypes of constraint, namely: domain constraints, key constraints, referentialintegrity constraints, and simple general semantic integrity constraints. Othertypes of constraint, such as aggregate constraints and transition constraints, are

Ž .worth investigating. ii The overhead of constraint checking is also stronglyinfluenced by the type of fragmentation and allocation used. Further investiga-tion of the effect of the fragmentation and allocation strategy on the derivedfragment constraints and simplified forms would be worthwhile.

References

1. Barbara D, Garcia-Molina H. The demarcation protocol: A technique for maintain-ing linear arithmetic constraints in distributed database systems. In: Proceedings of

Ž .the Conference on Extending Database Technology EDBT’92 , LNCS 580, Vienna,Austria, March 1992, p 373�388.

2. Bernstein PA, Blaustein BT. A simplification algorithm for integrity assertions andconcrete views. In: Proceedings of the 5th International Computer Software and

Ž .Applications Conference COMPSAC’81 , Chicago, IL, November 1981, p 90�99.3. Date CJ. An introduction to database systems. 6th ed. Reading, MA: Addison-Wes-

ley; 1995.4. Grefen PWPJ. Combining theory and practice in integrity control: A declarative

approach to the specification of a transaction modification subsystem. In: Proceed-Ž .ings of the 19th International Conference on Very Large Data Bases VLDB 19 ,

Dublin, Ireland, August 1993, p 581�591.5. Gupta A, Widom J. Local verification of global integrity constraints in distributed

databases. In: Proceedings of the 1993 ACM Special Interest Group on ManagementŽ .of Data SIGMOD Conference, Washington DC, May 1993, p 49�58.

6. Henschen LJ, McCune WW, Naqvi SA. Compiling constraint-checking programsfrom first-order formulas. In: Gallaire H, Minker J, Nicolas JM, editors. Advances indatabase theory. New York: Plenum; 1984, Vol 2, p 145�169.

7. Hsu A, Imielinski T. Integrity checking for multiple updates. In: Proceedings of the1985 ACM SIGMOD International Conference on the Management of Data, Austin,TX, May 1985, p 152�168.

8. Ibrahim H, Gray WA, Fiddian NJ. Rule-based integrity enforcement strategy for adistributed database. In: Proceedings of the Third Biennial World Conference onIntegrated Design and Process Technology�Issues and Applications of Database

Ž .Technology IADT’98 , Berlin, Germany, July 1998, Vol 2, p 111�118.9. Ibrahim H, Gray WA, Fiddian NJ. SICSDD: Techniques and implementation. In:

Proceedings of Constraint Databases and Applications, Second International Work-Ž .shop on Constraint Database Systems CDB’97 , Delphi, Greece, January 1997, p

187�207.10. Ibrahim H, Gray WA, Fiddian NJ. The development of a semantic integrity con-

Ž .straint subsystem for a distributed database SICSDD . In: Proceedings of the 14thŽ .British National Conference on Databases BNCOD 14 , Edinburgh, U.K., July 1996,

p 74�91.11. Mazumdar S. Optimizing distributed integrity constraints. In: Proceedings of the 3rd

International Symposium on Database Systems for Advanced Applications, Taejon,Korea, April 1993, Vol 4, p 327�334.

Page 22: Optimizing fragment constraints—A performance evaluation

IBRAHIM, GRAY, AND FIDDIAN306

12. McCarroll NF. Semantic integrity enforcement in parallel database machine. Ph.D.thesis. Sheffield, U.K.: Department of Computer Science, University of Sheffield;May 1995.

13. McCune WW, Henschen LJ. Maintaining state constraints in relational databases: AŽ .proof theoretic basis. J Assoc Comput Mach 1989;36 1 :46�68.

14. Nicolas JM. Logic for improving integrity checking in relational data bases. Acta InfŽ .1982;8 3 :227�253.

15. Ozsu MT, Valduriez P. Principles of distributed database systems. Englewood Cliffs,NJ: Prentice Hall International; 1991.

16. Qian X. Distribution design of integrity constraints. In: Proceedings of the 2ndInternational Conference on Expert Database Systems, 1989, p 205�226.

17. Qian X. An effective method for integrity constraint simplification. In: ProceedingsŽ .of the 4th International Conference on Data Engineering ICDE 88 , February 1988,

p 338�345.18. Simon E, Valduriez P. Integrity control in distributed database systems. In: Proceed-

ings of the 19th Hawaii International Conference on System Sciences, Hawaii,January 1986, p 622�632.