institut für scientific computing – universität wienp.brezany fragmentation univ.-prof. dr....

17
Institut für Scientific Computing – Universität Wien P.Brezany Fragmentation Fragmentation Univ.-Prof. Dr. Peter Brezany Institut für Scientific Computing Universität Wien Tel. 4277 39425 Sprechstunde: Di, 13.00-14.00 LV-Portal: www.par.univie.ac.at /~ brezany / teach / gckfk /300658.html

Post on 20-Dec-2015

223 views

Category:

Documents


3 download

TRANSCRIPT

Institut für Scientific Computing – Universität Wien

P.Brezany

FragmentationFragmentation

Univ.-Prof. Dr. Peter Brezany

Institut für Scientific Computing

Universität WienTel. 4277 39425

Sprechstunde: Di, 13.00-14.00

LV-Portal: www.par.univie.ac.at/~brezany/teach/gckfk/300658.html

Institut für Scientific Computing – Universität Wien

P.Brezany2

Introduction

• We already presented the various fragmentation strategies.

• Fragmentation strategies:– horizontal– vertical– nesting fragments in a hybrid fashion.

Institut für Scientific Computing – Universität Wien

P.Brezany3

Horizontal Fragmentation

• Primary horizontal fragmentation of a relation is performed using predicates that are defined on that relation.

• Derived horizontal fragmentation is the partitioning of a relation that results from predicates being defined on another relation.

• Information requirements of horizontal fragmentation

– Database information– Application information

Institut für Scientific Computing – Universität Wien

P.Brezany4

Database Example

Institut für Scientific Computing – Universität Wien

P.Brezany5

Database Information• It concerns the global conceptual schema. In this context it

is important to note how the database relations are connected to one another, especially with joins.

• Example:

Given link L1 of the above figure, the owner and member functions have thethe following values: owner(L1) = PAY; member(L1) = EMPThe quantitative information required about the database is the cardinalityof each relation R, denoted card(R).

Institut für Scientific Computing – Universität Wien

P.Brezany6

Application Information• It is required:

– qualitative information, which guides the fragmentation activity– quantitative information is incorporated primarily into the allocation models.

• The fundamental qualitative information consists of the predicates used in user queries.It is not possible to analyze all of the user applications to determine these predicates one should at least investigate the most “important“ ones. a rule of thumb: the most active 20% of user queries account for 80% of the total data access.

• Simple predicates: Given a relation R(A1, A2, ..., An), where Ai is an attribute defined over domain Di, a simple predicate pj defined on R has the form

pj : Ai Value

where {, , , , , } and Value Di. We use Pri to denote the set of all simple predicates defined on a relation Ri. The members of Pri are denoted by pij.

Example: for the relation instance PROJ:

PNAME = “Maintenance“ BUDGET 200000

Institut für Scientific Computing – Universität Wien

P.Brezany7

Application Information (cont.)

User queries often include more complicated predicates, which areBoolean combinations of simple predicates.One important combination: minterm predicate – conjunction ofsimple predicates.

Given a set of simple predicates for relation Ri, the set of minterm predicates is defined as

},...,,{Pr 21 imiii ppp},...,,{ 21 iziii mmmM

zjmkpmmM ikp

ijijiiik

1,1},|{ *

Pr

where ikik pp *or ikik pp *

So each simple predicate can occur in a minterm predicate either in its natural form or ist negated form.

Institut für Scientific Computing – Universität Wien

P.Brezany8

Application Information (cont.)

Example:

Institut für Scientific Computing – Universität Wien

P.Brezany9

Application Information (cont.)

• In terms of quantitative information about the user applications, we need to have 2 sets of data:1. Minterm selectivity: number of tuples of the relation that

would be accessed by a user query specified according to a given minterm predicate. E.g., in previous example, sel(m1)=0 since there are no tuples in PAY that satisfy the minterm predicate. sel(m2)=1

2. Access frequency: frequency with which user applications access data. If Q = {q1, q2, ..., qq} is a set of user queries, acc(qi) indicates the access frequency of query qi in a given period.

• The minterm access frequencies can be determined from the query frequencies acc(mi) – the access frequency of a minterm mi.

Institut für Scientific Computing – Universität Wien

P.Brezany10

Primary Horizontal Fragmentation

• It is defined by a selection operation on the owner relations of a database schema.

• Given a relation R, its horizontal fragments are given by

Ri = Fi(R), 1 i w

where Fi is the selection formula used to obtain fragment Ri.

Example : PROJ PROJ1 and PROJ2

PROJ1 = BUDGET 200000(PROJ)

PROJ2 = BUDGET 200000(PROJ)

Institut für Scientific Computing – Universität Wien

P.Brezany11

Primary Horizontal Fragmentation (cont.)

Example:

Institut für Scientific Computing – Universität Wien

P.Brezany12

Primary Horizontal Fragmentation (cont.)

A more formal definition of a horizontal fragment:

• A horizontal fragment of relation Ri consists of all the tuples of R that satisfy a minterm predicate mj.

• Hence, given a set of minterm predicates M, there are as many horizontal fragments of R as there are minterm predicates. minterm fragments.

• An important aspect of simple predicates is their completeness; another is their minimality.

• A set od simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to Pr.

Institut für Scientific Computing – Universität Wien

P.Brezany13

Primary Horizontal Fragmentation (cont.)

• Example: Consider the fragmentation of PROJ in the last example. If the only application that accesses PROJ wants to access the tuples according to the location, the set is complete since each tuple of each fragment PROJi, has the same probability of being accessed.

• If there is a second application which accesses only those project tuples where the budegt is less than $200.000, then Pr is not complete. Some of the tuples within each PROJi have a higher probability of being accessed due to this second application.

• To make the set of predicates complete, we need to add (BUDGET 200000, BUDGET > 20000) to Pr:

Pr = {LOC=“Montreal“, LOC=“New York“, LOC=“Paris“,

BUDGET200000, BUDGET > 20000}

Institut für Scientific Computing – Universität Wien

P.Brezany14

Primary Horizontal Fragmentation (cont.)

• The second desirable property of the set of predicates, according to which minterm predicates and turn, fragments are to be defined, is minimality.

• If a predicate influences how fragmentation is performed (i.e., causes a fragment f to be further fragmented into, say, fi and fj), there should be at least one application that accesses fi and fj differently. In other words, the simple predicate should be relevant in determining a fragmentation.

• If all the predicates of a set Pr are relevant, Pr is minimal.

Institut für Scientific Computing – Universität Wien

P.Brezany15

Primary Horizontal Fragmentation (cont.)

• Example: The set Pr defined in the previous example is

complete and minimal. If, however, we were to add the predicate

PNAME = „Instrumentation“

to Pr, the resulting set would not be minimal since the new predicate is not relevant with respect to Pr. There is no application that would access the resulting fragments any differently.

Institut für Scientific Computing – Universität Wien

P.Brezany16

Derived Horizontal Fragmentation

A derived horizontal fragmentation is defined on a member relation of a link according to a selection operation specified on its owner.

Given a link L where owner(L) = S and member(L) = R, the derivedhorizontal fragments of R are defined as

Ri = R Si, 1 i w

where w is the maximum number of fragments that will be

defined on R, and Si = Fi(S), where Fi is the formula according to which the primary horizontal fragment Si is defined.

Institut für Scientific Computing – Universität Wien

P.Brezany17

Derived Horizontal Fragmentation (cont.)Example: Consider link L1,

whereowner(L1) = PAY and member(L1) = EMP.Then we can group engineers into 2 groups according to their salary: $30.000 and > $30.000. The 2 fragmentsEMP1 and EMP2 are defined:

EMP1 = EMP PAY1

EMP2 = EMP PAY2

where

PAY1 = SAL30000 (PAY)

PAY2 = SAL>30000 (PAY)

Derived horizontal fragmentation of EMP