dist db

Upload: mohan-khedkar

Post on 04-Jun-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 dist db

    1/15

    Distributed Database Systems

    Fall 2012

    Distributed Database DesignSL02

    Design problem

    Design strategies (top-down, bottom-up)

    Fragmentation (horizontal, vertical)

    Allocation and replication of fragments, optimality, heuristics

    DDBS12, SL02 1/60 M. Bohlen

    Design Problem

    Design problem of distributed systems: Making decisions about

    the placement of dataand programsacross the sites of a computer

    network as well as possibly designing the network itself.

    In DDBMS, the distribution of applications involves Distribution of the DDBMS software Distribution of applications that run on the database

    Distribution of applications will not be considered in the following;

    instead the distribution of data is studied.

    DDBS12, SL02 2/60 M. Bohlen

    Framework of Distribution Dimension for the analysis of distributed systems

    Level of sharing: no sharing, data sharing, data + program sharing Behavior of access patterns: static, dynamic Level of knowledge on access pattern behavior: no information,

    partial information, complete information

    Distributed database design should be considered within this

    general framework.DDBS12, SL02 3/60 M. Bohlen

    Design Strategies/1

    Top-down approach Designing systems from scratch Homogeneous systems

    Bottom-up approach The databases already exist at a number of sites The databases should be connected to solve common tasks

    DDBS12, SL02 4/60 M. Bohlen

  • 8/13/2019 dist db

    2/15

    Design Strategies/2

    Top-down design strategy

    DDBS12, SL02 5/60 M. Bohlen

    Design Strategies/3

    Distribution designis the central part of the design in DDBMSs

    (the other tasks are similar to traditional databases).

    Objective: Design the LCSs by distributing the data over the sites.

    Two main aspects have to be designed carefully: Fragmentation: Relation may be divided into a number of

    sub-relations, which are distributed Allocationand replication: Each fragment is stored at site(s) with

    optimal distribution

    Distribution design issues Why fragment at all? How to fragment? How much to fragment? How to test correctness? How to allocate?

    DDBS12, SL02 6/60 M. Bohlen

    Design Strategies/4

    Bottom-up design strategy

    DDBS12, SL02 7/60 M. Bohlen

    Fragmentation/1 What is a reasonable unit of distribution? Relation or fragment of

    relation? Relationsas unit of distribution:

    If the relation is not replicated, we get a high volume of remote data

    accesses. If the relation is replicated, we get unnecessary replications, which

    cause problems in executing updates and waste disk space Might be an OK solution, if queries need all the data in the relation

    and data stays only at the sites that use the data Fragmentsof relation as unit of distribution:

    Application views are usually subsets of relations Thus, locality of accesses of applications is defined on subsets of

    relations Permits a number of transactions to execute concurrently, since they

    will access different portions of a relation Parallel execution of a single query (intra-query concurrency) However, semantic data control (especially integrity enforcement) is

    more difficult

    Fragments of relations are (usually) appropriate unit of distribution.

    DDBS12, SL02 8/60 M. Bohlen

  • 8/13/2019 dist db

    3/15

    Fragmentation/2

    Fragmentation aims to improve: Reliability Performance Balanced storage capacity and costs Communication costs Security

    The following information is used to decide fragmentation: Quantitative information: cardinality of relations, frequency of queries,

    site where query is run, selectivity of the queries, etc. Qualitative information: predicates in queries, types of access of

    data, read/write, etc.

    DDBS12, SL02 9/60 M. Bohlen

    Fragmentation/3

    Types of Fragmentation Horizontal: partitions a relation along its tuples Vertical: partitions a relation along its attributes Mixed/hybrid: a combination of horizontal and vertical fragmentation

    (a) Horizontal Fragmentation

    (b) Ver tica l Fra gmenta tio n (c) Mixe d Fra gme ntat ion

    DDBS12, SL02 10/60 M. Bohlen

    Fragmentation/4 Example of database instance

    DataDDBS12, SL02 11/60 M. Bohlen

    Fragmentation/5

    Example (contd.):Horizontal fragmentation of PROJ relation PROJ1: projects with budgets less than 200000 PROJ2: projects with budgets greater than or equal to 200000

    DDBS12, SL02 12/60 M. Bohlen

  • 8/13/2019 dist db

    4/15

    Fragmentation/6

    Example (contd.):Vertical fragmentation of PROJ relation PROJ1: information about project budgets PROJ2: information about project names and locations

    DDBS12, SL02 13/60 M. Bohlen

    Correctness Rules of Fragmentation

    Completeness Decomposition of relationR into fragments R1,R2, . . . ,Rn is complete

    iff each data item in Rcan also be found in someRi.

    Reconstruction If relationR is decomposed into fragmentsR1,R2, . . . ,Rn, then there

    should exist some relational operator that reconstructsRfrom its

    fragments, i.e.,R= R1 . . . Rn Union to combine horizontal fragments Join to combine vertical fragments

    Disjointness If relationR is decomposed into fragmentsR1,R2, . . . ,Rnand data

    itemdiappears in fragment Rj, thendishould not appear in any otherfragmentRk,k j (exception: primary key attribute for verticalfragmentation)

    For horizontal fragmentation, data item is a tuple For vertical fragmentation, data item is an attribute

    DDBS12, SL02 14/60 M. Bohlen

    Idea of Horizontal Fragmentation

    Intuitionbehind horizontal fragmentation Every site should hold all information that is used to query the site The information at the site should be fragmented so the queries of the

    site run faster

    Horizontal fragmentation isdefined as selection operation, p(R)

    Example:

    BUDGET

  • 8/13/2019 dist db

    5/15

    Information Requirements/2

    Application information:

    Simple predicates used in queries

    RelationR[A1,A2, ...,An] Aivia simple predicate ( {=, , ,},vi Di,Di is the

    domain ofA i) For relationRwe definePr= {p1, p2, ..., pm}

    Example: PNAME = Maintenance, BUDGET < 200000 Minterm predicates

    minterm predicates are conjunctions of simple predicates RelationR,Pr= {p1, p2, ..., pm} M= {m1,m2, ...,mr}such that

    M= {mi |mi= pjPrpj}, 1 j m, 1 i z

    where pj= pjor pj= pj

    DDBS12, SL02 17/60 M. Bohlen

    02.1 Information Requirements/3

    Application information (cntd):

    Minterm selectivities The number of tuples of the relation that would be accessed by a user

    query, which is specified according to a given minterm predicatemi.

    Access frequencies The frequency with which a user application qi accesses data.

    DDBS12, SL02 18/60 M. Bohlen

    02.2

    Horizontal Fragmentation/1

    We writePr to denote the complete and minimal set of simple

    predicates generated from Pr:

    The set of predicates iscompleteif and only if any two tuples in the

    same fragment are referenced with the same probability by any

    application

    The set of predicates isminimal if and only if each simple predicate

    is relevant for determining the fragmentation and for each pair of

    fragments there is at least one query that accesses the fragments

    differently

    DDBS12, SL02 19/60 M. Bohlen

    Horizontal Fragmentation/2

    Ahorizontal fragmentRiof relationRconsists of all the tuples of R that

    satisfy a predicateFj:

    Rj= Fj(R),1 j w

    whereFj is a selection formula, which is (preferably) a minterm predicate.

    Computingthe horizontal fragmentation:

    1. Determine Pr (80/20 rule)

    2. Compute Pr

    3. Determine M

    4. Minimize M (eliminating impossible minterms)

    DDBS12, SL02 20/60 M. Bohlen

  • 8/13/2019 dist db

    6/15

    Horizontal Fragmentation/3

    Example:Fragmentation of thePROJrelation

    Consider the following query: Find the name and budget of projects

    given their location. The query is issued at all three locations Fragmentation based on LOC, using the set of predicates

    {LOC= Montreal

    ,LOC= NewYork

    ,LOC= Paris

    }

    PROJ1 = LOC=Montreal(PROJ)

    PNO PNAME BUDGET LOC

    P1 Instrument. 150000 Montreal

    PROJ2 = LOC=NewYork(PROJ)

    PNO PNAME BU DGET LOC

    P2 DB Develop. 135000 New York

    P3 CAD/CAM 250000 New York

    PROJ3 = LOC=Paris(PROJ)PNO PNAME BUDGET LOC

    P4 Maintenance 310000 Paris

    DDBS12, SL02 21/60 M. Bohlen

    Horizontal Fragmentation/4

    If access is only according to the location, the above set ofpredicates is complete

    i.e., in each fragment PROJieach tuple has the same probability of

    being accessed If there is a second query/application that accesses only those

    project tuples where the budget is less than $200K, the set ofpredicates is not complete.

    P2 in PROJ2 has higher probability to be accessed

    DDBS12, SL02 22/60 M. Bohlen

    Horizontal Fragmentation/5

    Example (contd.): AddBUDGET200K)

    (LOC= Paris) (BUDGET 200K)

    (LOC= Paris) (BUDGET>200K)

    DDBS12, SL02 23/60 M. Bohlen

    Horizontal Fragmentation/6

    Example (contd.):Now,PROJi, i= 1, 2,3 will be split in twofragments

    PROJ1 = LOC=MontrBDGT

  • 8/13/2019 dist db

    7/15

    Derived Horizontal Fragmentation/1

    Defined on a member (target) relation of a link according to aselection operation specified on its owner (source).

    Each link is an equijoin. Fragments of member relation can be defined with a semijoin on

    fragments of owner relation.

    DDBS12, SL02 25/60 M. Bohlen

    02.5 Derived Horizontal Fragmentation/2

    Given a link L where owner(L) = Sand member(L) = R, thederived horizontal fragments of Rare defined as

    Ri= R Si, 1 i w

    where w is the maximum number of fragments that will be defined

    on R and

    Si= Fi(S)

    where Fi is the formula according to which the primary horizontal

    fragment Si is defined.

    DDBS12, SL02 26/60 M. Bohlen

    Derived Horizontal Fragmentation/3

    Given linkL 1 where owner(L1) = PAYand member(L 1) = EMP

    and PAY1= SAL30000(PAY) PAY2= SAL>30000(PAY)

    we get EMP1= EMP PAY1 EMP2= EMP PAY2

    DDBS12, SL02 27/60 M. Bohlen

    02.7 Vertical Fragmentation/1

    Objectiveof vertical fragmentation is to partition a relation into a set

    of smaller relations so that many of the applications will run on only

    one fragment.

    Vertical fragmentation of a relation Rproduces fragments

    R1, R2, . . . , each of which contains a subset ofRsattributes.

    Vertical fragmentation is defined using the projection operationof

    the relational algebra:

    A1,A

    2,...,A

    n

    (R)

    Example:

    PROJ1= PNO,BUDGET(PROJ)

    PROJ2= PNO,PNAME,LOC(PROJ)

    Vertical fragmentation has also been studied for (centralized) DBMS Smaller relations, and hence less page accesses e.g., MONET system

    DDBS12, SL02 28/60 M. Bohlen

  • 8/13/2019 dist db

    8/15

    Vertical Fragmentation/2

    Vertical fragmentation is more complicatedthan horizontalfragmentation

    In horizontal partitioning: fornsimple predicates, the number of

    possible minterms is 2n; some of them can be ruled out by existing

    implications/constraints. In vertical partitioning: for mnon-primary key attributes, the number

    of possible fragments is equal to B(m)(= the mth Bell number), i.e.,the number of partitions of a set withmmembers.

    For large numbers, B(m) mm (e.g.,B(15) = 109)

    DDBS12, SL02 29/60 M. Bohlen

    Vertical Fragmentation/3

    Two types of heuristics for vertical fragmentation exist: Grouping: assign each attribute to one fragment, and at each step,

    join some of the fragments until some cr iteria is satisfied.

    Bottom-up approach

    Splitting: starts with a relation and decides on beneficial partitioningsbased on the access behaviour of applications to the attributes.

    Top-down approach Results in non-overlapping fragments

    Optimal solution is probably closer to the full relation than to a set of

    small relations with only one attribute

    DDBS12, SL02 30/60 M. Bohlen

    Vertical Fragmentation/4

    Application information:The major information required as inputfor vertical fragmentation is related to applications (queries)

    Since vertical fragmentation places in one fragment those attributes

    usually accessed together, there is a need for some measure that

    would define more precisely the notion of togetherness, i.e., howclosely related the attributes are. This information is obtained from queries and collected in the

    Attribute Usage Matrixand Attribute Affinity Matrix.

    DDBS12, SL02 31/60 M. Bohlen

    Vertical Fragmentation/5

    Given are the user queries/applications Q= (q1, . . . , qq)that willrun on relationR(A1, . . . , An)

    Attribute Usage Matrix: Denotes which query uses which attribute:

    use(qi, A

    j) = 1 iffqiusesAj

    0 otherwise

    Theuse(qi, )vectors for each application are easy to define if thedesigner knows the applications that willl run on the DB (consider

    also the 80-20 rule)

    DDBS12, SL02 32/60 M. Bohlen

  • 8/13/2019 dist db

    9/15

    Vertical Fragmentation/6

    Example: Consider relationPROJ(PNO, PNAME,BUDGET, LOC)and queries:

    q1= SELECTBUDGETFROM PROJ WHEREPNO=Value

    q2= SELECTPNAME,BUDGETFROM PROJ

    q3= SELECTPNAMEFROM PROJ WHERELOC=Value

    q4

    = SELECT SUM(BUDGET) FROM PROJ WHERELOC=Value

    Abbreviations:

    A1= PNO,A2= PNAME, A3= BUDGET,A4 = LOC

    Attribute Usage Matrix

    DDBS12, SL02 33/60 M. Bohlen

    02.8 Vertical Fragmentation/7

    Attribute Affinity Matrix: Denotes the frequency of two attributesAiandAjwith respect to a set of queriesQ= (q1, . . . , qn):

    aff(Ai,Aj) = k: use(qk,Ai)=1,use(qk,Aj)=1

    ( sites l

    refl(qk) accl(qk))

    where refl(qk)is the cost (= number of accesses to (Ai, Aj)) of queryqK at

    sitel accl(qk)is the frequency of query qkat sitel

    DDBS12, SL02 34/60 M. Bohlen

    Vertical Fragmentation/8 Example (contd.):Let the cost of each query berefl(qk) = 1, and

    the frequencyaccl(qk)of the queries be as follows:

    Site1 Site2 Site3

    acc1(q1) = 15 acc2(q1) = 20 acc3(q1) = 10acc1(q2) = 5 acc2(q2) = 0 acc3(q2) = 0acc1(q3) = 25 acc2(q3) = 25 acc3(q3) = 25acc1(q4) = 3 acc2(q4) = 0 acc3(q4) = 0

    Attribute affinity matrix aff(Ai,Aj) =

    e.g.,aff(A1,A3) =1

    k=1

    3l=1accl(qk) =

    acc1(q1) +acc2(q1) +acc3(q1) = 45(q1 is the only query to access both A1 and A3)

    DDBS12, SL02 35/60 M. Bohlen

    Vertical Fragmentation/9

    Idea: Take the attribute affinity matrix (AA) and reorganize the

    attribute orders to form clusters where the attributes in each cluster

    demonstrate high affinity to one another.

    Bond energy algorithm (BEA)has been suggested for thatpurpose for several reasons:

    It is designed specifically to determine groups of similar items asopposed to a linear ordering of the items.

    The final groupings are insensitive to the order in which items are

    presented. The computation time is reasonable (O(n2), wherenis the number of

    attributes)

    DDBS12, SL02 36/60 M. Bohlen

  • 8/13/2019 dist db

    10/15

    Vertical Fragmentation/10

    BEA: Input: AA matrix Output: Clustered AA matrix (CA) Permutation is done in such a way to maximize the following global

    affinity mesaure(affinity ofAiand Ajwith their neighbors):

    AM=n

    i=1

    nj=1

    aff(Ai,Aj)[aff(Ai,Aj1) +aff(Ai,Aj+1) +

    aff(Ai1,Aj) +aff(Ai+1,Aj)]

    DDBS12, SL02 37/60 M. Bohlen

    Vertical Fragmentation/11 Example (contd.): Cluster Affinity MatrixCA after running BEA

    Elements with similar values are grouped together, and two clusters

    can be identified An additional partitioning algorithm is needed to identify the clusters

    inCA Usually more clusters and more than one candidate partitioning, thus

    additional steps are needed to select the best clustering. The resulting fragmentation after partitioning (PNOis added in

    PROJ2 explicilty as key):

    PROJ1= {PNO,BUDGET}

    PROJ2= {PNO,PNAME, LOC}

    DDBS12, SL02 38/60 M. Bohlen

    BEA Algorithm/1

    Input: The AA matrix

    Output: Clustered affinity matrix CA which is a permutation of AA

    Initialization: Place and fix one of the columns of AA in CA (we

    choose column 1 in our example)

    Iteration: Place the remaining n-i columns in the remaining i+1

    positions in the CA matrix. For each column choose the placement

    that makes the largest contribution to the global affinity measure.

    Row order: Order the rows according to the column ordering.

    DDBS12, SL02 39/60 M. Bohlen

    BEA Algorithm/2

    In order to determine the best placement we define the contribution of a

    placement.

    Contribution of a placement:

    cont(Ai,Ak,Aj) =

    2 bond(Ai,Ak)+

    2 bond(Ak, Aj)

    2 bond(Ai,Aj)

    Bond between a pair of attributes is defined as:

    bond(Ax, Ay) =n

    z=1

    aff(Az,Ax) aff(Az,Ay)

    DDBS12, SL02 40/60 M. Bohlen

  • 8/13/2019 dist db

    11/15

    BEA Algorithm/3

    Consider the following AA matrix and the corresponding CA matrix where

    A1 and A2 have been placed.

    AA =

    A1 A2 A3 A4A1 45 0 45 0

    A2 0 80 5 75

    A3 45 5 53 3A4 0 75 3 78

    CA =

    A1 A2A1 45 0

    A2 0 80

    A3 45 5A4 0 75

    bond(A3, A1) = 45 45+5 0+53 45+3 0= 4410bond(A3, A2) = 45 0+5 80+53 5+3 75= 890

    The next step is to place A3.

    DDBS12, SL02 41/60 M. Bohlen

    BEA Algorithm/4

    Ordering (0-3-1) :

    cont(A 0,A 3,A 1)

    =2 bond(A 0,A 3) +2 bond(A 3,A 1)2 bond(A 0,A 1)

    =2 0+2 4410 2 0= 8820

    Ordering (1-3-2) :

    cont(A 1,A 3,A 2)=2 bond(A 1,A 3) +2 bond(A 3,A 2)2 bond(A 1,A 2)

    =2 4410+2 890 2 225= 10150

    Ordering (2-3-4) :

    cont(A 2,A 3,A 4)

    =2 bond(A 2,A 3) +2 bond(A 3,A 4)2 bond(A 2,A 4)

    =1780

    DDBS12, SL02 42/60 M. Bohlen

    BEA Algorithm/5

    After addingA3 the CA matrix has the form

    CA =

    A1 A3 A245 45 0

    0 5 80

    45 53 5

    0 3 75

    After addingA4 and ordering rows the CA matrix has the form

    CA =

    A1 A3 A2 A4A1 45 45 0 0

    A3 45 53 5 3

    A2 0 5 80 75

    A4 0 3 75 78

    DDBS12, SL02 43/60 M. Bohlen

    02.9 BEA Algorithm/6

    The final step is to divide a set of clustered attributes {A1, ...,An}into

    two sets{A1, ...,Ai}and {Ai+1, ...,An}such that the costs of queries

    that access both sets are minimized.

    The appropriate split point on the diagonal must be determined:

    A1 A3 A2 A4A1 45 45 0 0

    A3 45 53 5 3A2 0 5 80 75

    A4 0 3 75 78

    This is an optimization problem: the optimal point on the diagonal

    must be determined.

    Note: Generalizations are needed to deal with partitions located in the middle of the matrix multiple split points

    DDBS12, SL02 44/60 M. Bohlen

  • 8/13/2019 dist db

    12/15

    BEA Algorithm/7

    AQ(qi) = {Aj |use(qi,Aj) = 1} attributes accessed byqi

    TQ= {qi |AQ(qi) TA} queries that access TA only

    BQ= {qi |AQ(qi) BA } queries that access BA only

    OQ= Q {TQ BQ} queries that access TA and BA

    CQ =

    qiQ

    Sjrefj(qi) accj(qi) cost of all queries

    CTQ=

    qiTQ

    Sjrefj(qi) accj(qi) cost of TQ queries

    CBQ=

    qiBQ

    Sjrefj(qi) accj(qi) cost of BQ queries

    COQ=

    qiOQ

    Sjrefj(qi) accj(qi) cost of other queries

    CTQ CBQ COQ2 maximize

    DDBS12, SL02 45/60 M. Bohlen

    Correctness of Vertical Fragmentation

    RelationR is decomposed into fragments R1,R2, . . . , Rn e.g.,PROJ= {PNO,BUDGET,PNAME, LOC}into

    PROJ1= {PNO,BUDGET}and PROJ2= {PNO,PNAME, LOC}

    Completeness Guaranteed by the partitioning algortihm, which assigns each

    attribute inA to one partition Reconstruction

    Join to reconstruct vertical fragments R= R1 Rn= PROJ1 PROJ2

    Disjointness Attributes have to be disjoint in VF. Two cases are distinguished:

    If tuple IDs are used, the fragments are really disjoint Otherwise, key attributes are replicated automatically by the system e.g., PNOin the above example

    DDBS12, SL02 46/60 M. Bohlen

    Mixed Fragmentation

    In most cases simple horizontal or vertical fragmentation of a DB

    schema will not be sufficient to satisfy the requirements of the

    applications.

    Mixed fragmentation (hybrid fragmentation): Consists of a

    horizontal fragment followed by a vertical fragmentation, or a vertical

    fragmentation followed by a horizontal fragmentation

    Fragmentation is defined using the selection and projection

    operations of relational algebra:

    p(A1,...,An(R))

    A1,...,An(p(R))

    DDBS12, SL02 47/60 M. Bohlen

    Replication and Allocation

    Replication: Which fragements shall be stored as multiple copies? Complete Replication

    Complete copy of the database is maintained in each site

    Selective Replication Selected fragments are replicated in some sites

    Allocation: On which sites to store the various fragments? Centralized

    Consists of a single DB and DBMS stored at one site with users

    distributed across the network

    Partitioned Database is partitioned into disjoint fragments, each fragment assigned

    to one site

    DDBS12, SL02 48/60 M. Bohlen

  • 8/13/2019 dist db

    13/15

    Replication/1

    Replicated DB fully replicated: each fragment at each site partially replicated: each fragment at some of the sites

    Non-replicated DB (= partitioned DB) partitioned: each fragment resides at only one site

    Rule of thumb:

    Ifread only queries

    update queries 1, then replication is advantageous, otherwise

    replication may cause problems

    DDBS12, SL02 49/60 M. Bohlen

    Replication/2

    Comparison of replication alternatives

    DDBS12, SL02 50/60 M. Bohlen

    Fragment Allocation/1

    Fragment allocation problem Given are:

    fragments F= {F1,F2, ...,Fn} networ k sites S= {S1,S2, ...,Sm} and applicationsQ= {q1, q2, ..., ql}

    Find: the optimal distribution of Fto S

    Optimality Minimal cost

    Communication + storage + processing (read and update) Cost in terms of time (usually)

    Performance

    Response time and/or throughput

    Constraints

    Per site constraints (storage and processing)

    DDBS12, SL02 51/60 M. Bohlen

    Fragment Allocation/2

    Required information Database Information

    selectivity of fragments size of a fragment

    Application Information RRij: number of read accesses of a query qito a fragmentFj URij: number of update accesses of query qito a fragment Fj uij: a matrix indicating which queries updates which fragments, rij: a similar matrix for retrievals originating site of each query

    Site Information USCk: unit cost of storing data at a site Sk LPCk: cost of processing one unit of data at a site Sk

    Network Information communication cost/frame between two sites frame size

    DDBS12, SL02 52/60 M. Bohlen

  • 8/13/2019 dist db

    14/15

    Fragment Allocation/3

    We discuss anallocation modelwhich attempts to minimize the total cost of processing and storage meet certain response time restrictions

    General Form:

    min(Total Cost)

    subject to response time constraint storage constraint processing constraint

    Decision variablexij

    xij=

    1 if fragmentFi is stored at siteSj0 otherwise

    DDBS12, SL02 53/60 M. Bohlen

    Fragment Allocation/4

    The total cost functionhas two components: storage and query

    processing.

    TOC=

    SkS

    FjF

    STCjk+qiQ

    QPCi

    Storage costof fragment Fjat site Sk:

    STCjk =USCk size(Fj) xij

    whereUSCkis the unit storage cost at site k

    Query processing costfor a query qiis composed of twocomponents:

    composed of processing cost (PC) and transmission cost (TC)

    QPCi= PCi+ TCi

    DDBS12, SL02 54/60 M. Bohlen

    Fragment Allocation/5

    Processing costis a sum of three components: access cost (AC), integrity contraint cost (IE), concurency control cost

    (CC)

    PCi= ACi+ IEi+ CCi

    Access cost:

    ACi=skS

    FjF

    (URij+ RRij) xij LPCk

    where LPCk is the unit process cost at sitek Integrity and concurrency costs:

    Can be similarly computed, though depends on the specific constraints

    Note:ACiassumes that processing a query involves decomposingit into a set of subqueries, each of which works on a fragment, ...,

    This is a very simplistic model Does not take into consideration different query costs depending on

    the operator or different algorithms that are applied

    DDBS12, SL02 55/60 M. Bohlen

    Fragment Allocation/6

    Thetransmission cost is composed of two components: Cost of processing updates (TCU) and cost of processing retrievals

    (TCR)

    TCi= TCUi+ TCRi

    Cost of updates: Inform all the sites that have replicas + a short confirmation message

    back

    TCUi=

    SkS

    FjF

    uij (update message cost+acknowledgment cost)

    Retrieval cost: Send retrieval request to all sites that have a copy of fragments that are

    needed + sending back the results from these sites to the originating

    site.

    TCRi=FjF

    minSkS

    (xjk (cost retrieval request+cost sending back result))

    DDBS12, SL02 56/60 M. Bohlen

  • 8/13/2019 dist db

    15/15

    Fragment Allocation/7

    Modeling theconstraints

    Response timeconstraint for a queryqi

    execution time of qi max. allowable response time for qi

    Storageconstraints for a siteSkFjF

    storage requirement ofFja tSk storage capacity ofSk

    Processingconstraints for a siteSkqiQ

    processing load ofqiat site Sk processing capacity ofSk

    DDBS12, SL02 57/60 M. Bohlen

    Fragment Allocation/8

    Solution Methods The complexity of this allocation model/problem is NP-complete Correspondence between the allocation problem and similar

    problems in other areas

    Plant location problem in operations research Knapsack problem Network flow problem

    Hence, solutions from these areas can be re-used Use different heuristics to reduce the search space

    Assume that all candidate partitionings have been determined together

    with their associated costs and benefits in terms of query processing. The problem is then reduced to find the optimal partitioning and

    placement for each relation Ignore replication at the first step and find an optimal non-replicated

    solution Replication is then handeled in a second step on top of the previous

    non-replicated solution.

    DDBS12, SL02 58/60 M. Bohlen

    Conclusion/1

    Distributed design decides on the placement of (parts of the) data

    and programs across the sites of a computer network

    There are two key technical questions: fragmentation and

    allocation/replication of data

    Horizontal fragmentation is defined via the selection operationp(R)

    Rewrites the queries of each site in the conjunctive normal form andfinds a minimal and complete set of conjunctions to determine

    fragmentation

    Vertical fragmentation via the projection operation A (R) Computes the attribute affinity matrix and groups similar attributes

    together

    Mixed fragmentation is a combination of both approaches

    DDBS12, SL02 59/60 M. Bohlen

    Conclusion/2

    Allocation/Replication of data Type of replication: no replication, partial replication, full replication Optimal allocation/replication modelled as a cost function under a set

    of constraints The complexity of the problem is NP-complete Use of different heuristics to reduce the complexity

    DDBS12, SL02 60/60 M. Bohlen