ddbms

45
DDBMS Distributed Database Management Systems Fragmentation Data Allocation

Upload: donald

Post on 11-Jan-2016

111 views

Category:

Documents


6 download

DESCRIPTION

DDBMS. Distributed Database Management Systems Fragmentation Data Allocation. Distributed Database. Distributed Database A collection of multiple, logically interrelate databases, distributed over a computer network Distributed Database Management System - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: DDBMS

DDBMS Distributed Database Management

Systems Fragmentation Data Allocation

Page 2: DDBMS

Distributed Database

Distributed Database A collection of multiple, logically interrelate

databases, distributed over a computer network Distributed Database Management System

A software system that supports the transparent creation, access, and

manipulation of interrelated data located at different sites of a computer network.

Page 3: DDBMS

Centralized vs Distributed Differences between the two

approaches: Centralized: achieves efficiency through

local optimization by using complex physical data structures

Distributed: achieves efficiency through global optimization of processing including cost of network communication and local processing.

Page 4: DDBMS

DDBMS Benefits Transparency

Separation of high-level semantics from low-level implementation issues

Extension of the data Independence concept in centralized databases

Basic Concepts Fragmentation

Allocation replication

Page 5: DDBMS

Transparency Language Transparency Fragmentation Transparency Replication Transparency Network Transparency Data Independence Data

Page 6: DDBMS

Network Protect the user from the operational details of the

network Users do not have to specify where the data is located

location transparency – naming transparency

Replication Replicas (copies) are created for performance and

reliability reasons Replication causes difficult update problems Users should be made unaware of the existence of

these copies

Transparency

Page 7: DDBMS

Fragmentation Basic fragments are parts of a relation

vertical: subset of columns – horizontal: subset of rows

Fragments are also created for performance and reliability reasons

Users should be made unaware of the existence of fragments

Transparency

Page 8: DDBMS

PERFORMANCE Greater throughput due to:

Data Localization Reduces communication overhead

Parallelism Inter-query and intra-query parallelism

DDBMS Benefits

Page 9: DDBMS

Autonomy: degree to which each DBMS can operate independently (distribution of control, not data)

Distribution: describes where the data is located physically

Heterogeneity: indicates uniformity of the DBMSs with respect to data

DDBMS ARCHITECTURAL ALTERNATIVES

Page 10: DDBMS

Autonomy

Distribution

Heterogeneity

DDBMS ARCHITECTURAL ALTERNATIVES

Page 11: DDBMS

DDBMS ARCHITECTURAL ALTERNATIVES

Autonomy (A) A0: Tight integration A1: Semi-autonomous A2: Total isolation

Distribution (D) D0: Non-distributed D1: Client Server D2: Peer-to-peer

Heterogeneity (H)H0:HomogeneousH1: Heterogeneous

3*3*2 = 18 AlternativesSome alternatives aremeaningless or not practical!

Page 12: DDBMS

Major DDBMS Architecturea Client-server Peer-to-peer Multidatabase

Page 13: DDBMS

CLIENT SERVER (Ax, D1, Hy)

Distribute the functionality between client and server to better manage the complexity of the DBMS

Two-level architecture

Page 14: DDBMS

CLIENT SERVER Typical Scenario

1. Client parses a query, decomposes it into independent site queries, and sends it to an appropriate server.

2. Each server processes a local query and sends the result relation to the client.

3. The client combines the results of all the sub-queries.

Page 15: DDBMS

PEER-TO-PEER (A0, D2, H0)

Global users submit requests via an external schema (ES) defined over a global conceptual schema (GCS), which is system maintained, and that is the union of all local conceptual schemas (LCSs)

The global system has no control over local data and processing.

Page 16: DDBMS

MULTI-DATABASE (A2, Dx, Hy)

The global conceptual schema (GCS) exists as a union of some local conceptual schemas (LCSs) only

or does not exist.

Page 17: DDBMS

Three orthogonal dimensions for data distribution level of sharing access pattern behaviour level of knowledge (of access pattern

behaviour)

Page 18: DDBMS

LEVEL OF

KNOWLEDGE

ACCESS PATTERN BEHAVIOUR

Level of Sharing

Data

Data+program

Dynamic

Static

Partial information Complete informationNo sharing

Page 19: DDBMS

The three orthogonal dimensions

Level of sharing no sharing: each application and its data execute

at one site data sharing: programs replicated at each site;

data not replicated, but moved to a site as needed data+program sharing: both data and programs

may be moved to a site as needed

Page 20: DDBMS

Access pattern behaviour static: access patterns to data do not change over time dynamic: access patterns to data change over time

how dynamic? Level of knowledge (of access pattern

behaviour) no knowledge: not a likely scenario partial knowledge: can predict, but users may deviate

significantly complete knowledge: can predict and no major

changes

The three orthogonal dimensions

Page 21: DDBMS

• TOP-DOWN Approach

• Have a database

• How to partition and /or replicate it across sites

• BOTTOM_UP Approach

• Have existing databases at different sites

• How to integrate then together and deal with heterogeneity and autonomy?

DISTRIBUTED DATABASE DESIGN

Page 22: DDBMS

DISTRIBUTED DATABASE DESIGN

Top-Down Typical approach to database design information regarding

distribution of accesses among sites nature of access to database at each site

needs to be gathered during Requirements Analysis Design local conceptual schemas by distributing the

entities over the sites of the distributed system after conceptual schema design

Distribution activity consists of: data fragmentation split up schema into pieces data allocation assign schema pieces to sites

Page 23: DDBMS

Bottom-Up Approach Necessary when databases already exist

and we need to integrate them into one database

Similar to view integration, but may be heterogeneous

DISTRIBUTED DATABASE DESIGN

Page 24: DDBMS

Top-Down Design

Requirement Analysis

System Requirement

Conceptual Design View Design

Global Conceptual Schema Access Information

External Schema Definition

Distribution Design

Page 25: DDBMS

Top-Down Design(cont’d)Distribution Design

Logical Conceptual Schema

Physical Design

Physical Schema

Observation and Monitoring

User Input

Page 26: DDBMS

Reasons for Fragmentation Relation is not an appropriate unit of

distribution Application views are usually subsets of

relations Application access is local to subsets of relations

Applications at different sites may require different parts of the same relation

Store once high remote access costs Duplicate high update costs

Concurrency of access is more limited

Page 27: DDBMS

BUT, some applications may suffer from data fragmentation Their required data is located in two or

more fragments It may be harder to check integrity

constraints

Careful design of data fragmentation is required!

Reasons for Fragmentation

Page 28: DDBMS

Completeness If a relation R is decomposed into fragments R1, R2, …,

Rn, each tuple/attribute that can be found in R can also be found in one or more of the Ri’s

Reconstruction If a relation R is decomposed into fragments R1, R2, …,

Rn, it should be possible to define a relational operator such that R = Ri RiFR

Disjointness If a relation R is horizontally decomposed into fragments

R1, R2, …,Rn and data item di is in Rj, it is not in any other fragment Rk (kj)

Correctness Rule for Fragmentation

Page 29: DDBMS

Types of Fragmentation Horizontal Vertical Hybrid

Page 30: DDBMS

HORIZONTAL DATA FRAGMENTATION

What is it? Partitions a relation along its tuples so

that each fragment has a subset of the tuples the relation.

Types of horizontal fragmentation Primary: based on a predicate Pi that selects

tuples from a relation R Derived: based on the partitioning of a relation due

to predicates defined on another relation related according to foreign keys

Page 31: DDBMS

Each fragment, Ri, is a selection on a relation R using a predicate P

HORIZONTAL DATA FRAGMENTATION: Primary

Page 32: DDBMS

HORIZONTAL DATA FRAGMENTATION: Derived

Page 33: DDBMS

HORIZONTAL DATA FRAGMENTATION: Derived

Page 34: DDBMS

HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS

Database Information Relations in the database and relationships

between them specially with joins owner member

Project Works On

Page 35: DDBMS

Application Information User query predicates examine most

important applications (80/20 rule) simple: p1: DEPT = ‘CSE’; p2 : SAL > 3000 conjunctive: minterm predicate: m1= p1ANDp2

(e.g., (DEPT=‘CSEE’) AND (SAL>3000)) minterm selectivity: number of tuples returned

against a given minterm access frequency: access frequency of user

queries

HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS

Page 36: DDBMS

Predicates & Minterms Aspects of simple predicates:

Completeness: A set of simple predicate is said to be complete iff there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to the set

If the only application that accesses Project wants to access depts, it is complete.

Page 37: DDBMS

Minimality: If a predicate influences how fragmentation is performed (causes a fragment to be further broken in fi and fj), there should be at least one application that accesses fi and fj differently.

Predicates & Minterms

Page 38: DDBMS

produces fragments R1, R2, R3, …,Rn of a relation R each fragment contains a subset of R’s attributes as

well as the primary key of R divides relations “vertically” by columns

(attributes) the objective is to obtain fragments so that

applications only need to access one fragment want to minimize execution time of applications

inherently more complicated than horizontal data fragmentation due to the total number of alternatives available

VERTICAL FRAGMENTATION

Page 39: DDBMS

VERTICAL FRAGMENTATION

Page 40: DDBMS

VERTICAL FRAGMENTATION

Page 41: DDBMS

VERTICAL FRAGMENTATION: INFORMATION REQUIREMENTS

Page 42: DDBMS

HYBRID FRAGMENTATION

Page 43: DDBMS

HYBRID FRAGMENTATION

Page 44: DDBMS

ALLOCATION Given:

a set of fragments F = {F1, F2, …, Fn} a set of sites S = {S1, S2, …, Sm} a set of transactions T = {T1, T2, …, Tp}

Find an “optimal” distribution of F to S

Page 45: DDBMS

ALLOCATION 1. Minimize cost

Cost of storing each Fi at Sj Cost of querying Fi at Sj Cost of updating Fi at all sites where it is stored Cost of data communication

2. Maximize performance Minimize response time at each site Maximize system throughput at each site

This problem is NP-hard!