DDBMS: Distributed Database Management Systems
Fragmentation and Data Allocation

Distributed Database
A collection of multiple, logically interrelated databases, distributed over a computer network.

Distributed Database Management System
A software system that supports the transparent creation, access, and manipulation of interrelated data located at different sites of a computer network.
Centralized vs Distributed
Differences between the two approaches:
Centralized: achieves efficiency through local optimization, using complex physical data structures.
Distributed: achieves efficiency through global optimization of processing, including the cost of network communication and local processing.
DDBMS Benefits: Transparency
Separation of high-level semantics from low-level implementation issues
An extension of the data independence concept in centralized databases
Basic Concepts
Fragmentation, Allocation, Replication

Transparency
Language transparency, fragmentation transparency, replication transparency, network transparency, data independence

Network Transparency
Protects the user from the operational details of the network
Users do not have to specify where the data is located
Location transparency, naming transparency
Replication Transparency
Replicas (copies) are created for performance and reliability reasons
Replication causes difficult update problems
Users should be made unaware of the existence of these copies
Fragmentation Transparency
Fragments are parts of a relation
Vertical: a subset of columns; horizontal: a subset of rows
Fragments are also created for performance and reliability reasons
Users should be made unaware of the existence of fragments
DDBMS Benefits: Performance
Greater throughput due to:
Data localization: reduces communication overhead
Parallelism: inter-query and intra-query parallelism
Autonomy: the degree to which each DBMS can operate independently (distribution of control, not of data)
Distribution: describes where the data is physically located
Heterogeneity: indicates the uniformity of the DBMSs with respect to data
DDBMS ARCHITECTURAL ALTERNATIVES
Three dimensions: Autonomy, Distribution, Heterogeneity
Autonomy (A): A0: tight integration; A1: semi-autonomous; A2: total isolation
Distribution (D): D0: non-distributed; D1: client-server; D2: peer-to-peer
Heterogeneity (H): H0: homogeneous; H1: heterogeneous
3 * 3 * 2 = 18 alternatives. Some alternatives are meaningless or not practical!
Major DDBMS Architectures
Client-server, Peer-to-peer, Multidatabase
CLIENT SERVER (Ax, D1, Hy)
Distribute the functionality between client and server to better manage the complexity of the DBMS
Two-level architecture
CLIENT SERVER Typical Scenario
1. The client parses a query, decomposes it into independent site queries, and sends each to the appropriate server.
2. Each server processes a local query and sends the result relation to the client.
3. The client combines the results of all the sub-queries.
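The three-step scenario above can be sketched in a few lines. This is a toy illustration with invented data and a hypothetical two-site layout, not a real DDBMS API:

```python
# Each "server" holds a horizontal fragment of an EMP relation.
site_a = [{"eno": 1, "dept": "CSE", "sal": 4000}]
site_b = [{"eno": 2, "dept": "EE",  "sal": 2500}]

def server_query(local_data, predicate):
    """Step 2: each server evaluates its local sub-query."""
    return [t for t in local_data if predicate(t)]

# Step 1: the client decomposes "SAL > 3000" into one sub-query per site.
predicate = lambda t: t["sal"] > 3000
partial_results = [server_query(site, predicate) for site in (site_a, site_b)]

# Step 3: the client combines (unions) the partial results.
result = [t for part in partial_results for t in part]
print(result)  # -> [{'eno': 1, 'dept': 'CSE', 'sal': 4000}]
```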
PEER-TO-PEER (A0, D2, H0)
Global users submit requests via an external schema (ES) defined over a global conceptual schema (GCS), which is system-maintained and is the union of all local conceptual schemas (LCSs)
The global system has no control over local data and processing.
MULTI-DATABASE (A2, Dx, Hy)
The global conceptual schema (GCS) either exists only as a union of some of the local conceptual schemas (LCSs), or does not exist at all.
Three orthogonal dimensions for data distribution:
level of sharing
access pattern behaviour
level of knowledge (of access pattern behaviour)

[Figure: the three orthogonal dimensions. Axes: level of sharing (no sharing, data, data+program), access pattern behaviour (static, dynamic), level of knowledge (no, partial, or complete information)]
Level of sharing
No sharing: each application and its data execute at one site
Data sharing: programs are replicated at each site; data is not replicated, but is moved to a site as needed
Data+program sharing: both data and programs may be moved to a site as needed
Access pattern behaviour
Static: access patterns to data do not change over time
Dynamic: access patterns to data change over time; how dynamic?

Level of knowledge (of access pattern behaviour)
No knowledge: not a likely scenario
Partial knowledge: access can be predicted, but users may deviate significantly
Complete knowledge: access can be predicted, and there are no major changes
DISTRIBUTED DATABASE DESIGN
• TOP-DOWN Approach
• Have a database
• How to partition and/or replicate it across sites?
• BOTTOM-UP Approach
• Have existing databases at different sites
• How to integrate them together and deal with heterogeneity and autonomy?
Top-Down
The typical approach to database design
Information regarding the distribution of accesses among sites, and the nature of access to the database at each site, needs to be gathered during requirements analysis
After conceptual schema design, design the local conceptual schemas by distributing the entities over the sites of the distributed system
The distribution activity consists of:
data fragmentation: split the schema into pieces
data allocation: assign the schema pieces to sites
Bottom-Up
Necessary when databases already exist and we need to integrate them into one database
Similar to view integration, but the databases may be heterogeneous
Top-Down Design (process)
Requirements Analysis: produces the system requirements
Conceptual Design and View Design: produce the global conceptual schema, access information, and the external schema definitions
Distribution Design: maps the global conceptual schema to local conceptual schemas
Physical Design: produces the physical schemas
Observation and Monitoring: user input and monitoring feed back into the design
Reasons for Fragmentation
A whole relation is not an appropriate unit of distribution:
Application views are usually subsets of relations
Application access is local to subsets of relations
Applications at different sites may require different parts of the same relation
Storing the relation once incurs high remote access costs; duplicating it incurs high update costs
Concurrency of access is more limited
BUT, some applications may suffer from data fragmentation:
Their required data may be located in two or more fragments
It may be harder to check integrity constraints
Careful design of data fragmentation is required!
Correctness Rules for Fragmentation
Completeness: if a relation R is decomposed into fragments R1, R2, ..., Rn, each tuple/attribute that can be found in R can also be found in one or more of the Ri's
Reconstruction: if a relation R is decomposed into fragments R1, R2, ..., Rn, it should be possible to define a relational operator ∇ such that R = ∇ Ri, for all Ri in FR (union for horizontal fragments, join for vertical fragments)
Disjointness: if a relation R is horizontally decomposed into fragments R1, R2, ..., Rn and a data item di is in Rj, it is not in any other fragment Rk (k ≠ j)
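The three rules can be checked mechanically for a horizontal fragmentation. A minimal sketch on an invented toy relation (tuples as Python tuples, fragments defined by a predicate and its negation):

```python
R = [(1, "CSE"), (2, "EE"), (3, "CSE")]
R1 = [t for t in R if t[1] == "CSE"]   # fragment: DEPT = 'CSE'
R2 = [t for t in R if t[1] != "CSE"]   # fragment: DEPT <> 'CSE'
fragments = [R1, R2]

union = {t for f in fragments for t in f}

# Completeness: every tuple of R appears in some fragment.
complete = set(R) <= union

# Reconstruction: for horizontal fragments the operator is set union.
reconstructible = union == set(R)

# Disjointness: no tuple appears in more than one fragment.
disjoint = sum(len(f) for f in fragments) == len(union)

print(complete, reconstructible, disjoint)  # -> True True True
```

Using a predicate together with its negation is what guarantees all three rules here; overlapping predicates would break disjointness.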
Types of Fragmentation
Horizontal, Vertical, Hybrid
HORIZONTAL DATA FRAGMENTATION
What is it?
Partitions a relation along its tuples, so that each fragment has a subset of the tuples of the relation
Types of horizontal fragmentation:
Primary: based on a predicate Pi that selects tuples from a relation R
Derived: based on the partitioning of a relation due to predicates defined on another relation, related according to foreign keys
Each fragment Ri is a selection on the relation R using a predicate Pi
HORIZONTAL DATA FRAGMENTATION: Primary
HORIZONTAL DATA FRAGMENTATION: Derived
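A derived fragment can be computed with a semijoin of the member relation against a primary fragment of the owner relation. A sketch with hypothetical PROJ (owner) and WORKS_ON (member) relations linked by the foreign key pno:

```python
proj = [{"pno": 10, "city": "NY"}, {"pno": 20, "city": "LA"}]
works_on = [{"eno": 1, "pno": 10}, {"eno": 2, "pno": 20}, {"eno": 3, "pno": 10}]

def semijoin(member, owner_fragment, key):
    """Keep member tuples whose key matches some tuple in the owner fragment."""
    keys = {t[key] for t in owner_fragment}
    return [t for t in member if t[key] in keys]

# Primary fragmentation of the owner relation PROJ by city...
proj_ny = [t for t in proj if t["city"] == "NY"]

# ...induces the derived fragment of the member relation WORKS_ON.
works_on_ny = semijoin(works_on, proj_ny, "pno")
print(works_on_ny)  # -> [{'eno': 1, 'pno': 10}, {'eno': 3, 'pno': 10}]
```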
HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS
Database Information
The relations in the database and the relationships between them, especially the join links between an owner relation and a member relation (e.g., Project as owner, WorksOn as member)
Application Information
User query predicates: examine the most important applications (80/20 rule)
Simple predicates: p1: DEPT = 'CSE'; p2: SAL > 3000
Conjunctive (minterm) predicates: m1 = p1 AND p2 (e.g., (DEPT = 'CSE') AND (SAL > 3000))
Minterm selectivity: the number of tuples returned by a given minterm
Access frequency: the access frequency of user queries
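The minterms of a set of simple predicates, and their selectivities, can be enumerated directly. A sketch using the two predicates above on a small invented EMP relation:

```python
from itertools import product

emp = [
    {"dept": "CSE", "sal": 4000},
    {"dept": "CSE", "sal": 2000},
    {"dept": "EE",  "sal": 3500},
]

p1 = lambda t: t["dept"] == "CSE"   # DEPT = 'CSE'
p2 = lambda t: t["sal"] > 3000      # SAL > 3000

# Each minterm is a conjunction of every simple predicate or its negation:
# 2 predicates -> 2**2 = 4 minterms; e.g. (True, False) means p1 AND NOT p2.
selectivities = {}
for signs in product([True, False], repeat=2):
    matching = [t for t in emp if p1(t) == signs[0] and p2(t) == signs[1]]
    selectivities[signs] = len(matching)

print(selectivities)
# -> {(True, True): 1, (True, False): 1, (False, True): 1, (False, False): 0}
```

The minterm fragments partition the relation, which is why minterm selectivity is the natural unit for estimating fragment sizes.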
Predicates & Minterms
Aspects of simple predicates:
Completeness: a set of simple predicates is said to be complete iff there is an equal probability of access by every application to any tuple belonging to any minterm fragment defined according to the set
Example: if the only application that accesses Project accesses it by department, then the set containing only the department predicate is complete.
Minimality: if a predicate influences how fragmentation is performed (causes a fragment to be further broken into fi and fj), there should be at least one application that accesses fi and fj differently.
VERTICAL FRAGMENTATION
Produces fragments R1, R2, R3, ..., Rn of a relation R
Each fragment contains a subset of R's attributes, as well as the primary key of R
Divides the relation "vertically" by columns (attributes)
The objective is to obtain fragments such that applications only need to access one fragment
We want to minimize the execution time of applications
Inherently more complicated than horizontal data fragmentation, due to the total number of alternatives available
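Vertical fragments are projections that each retain the primary key, so the relation can be reconstructed by joining on that key. A sketch on an invented EMP relation with key eno:

```python
emp = [{"eno": 1, "name": "Ann", "sal": 4000},
       {"eno": 2, "name": "Bob", "sal": 2500}]

def project(relation, attrs):
    """Projection: keep only the listed attributes of each tuple."""
    return [{a: t[a] for a in attrs} for t in relation]

# Each fragment keeps the key 'eno' alongside its own attribute subset.
f1 = project(emp, ["eno", "name"])
f2 = project(emp, ["eno", "sal"])

# Reconstruction: join the fragments back together on the key.
rebuilt = [{**t1, **t2} for t1 in f1 for t2 in f2 if t1["eno"] == t2["eno"]]
print(rebuilt == emp)  # -> True
```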
VERTICAL FRAGMENTATION: INFORMATION REQUIREMENTS
HYBRID FRAGMENTATION
ALLOCATION
Given:
a set of fragments F = {F1, F2, ..., Fn}
a set of sites S = {S1, S2, ..., Sm}
a set of transactions T = {T1, T2, ..., Tp}
Find an "optimal" distribution of F to S
1. Minimize cost:
the cost of storing each Fi at Sj
the cost of querying Fi at Sj
the cost of updating Fi at all sites where it is stored
the cost of data communication
2. Maximize performance:
minimize response time at each site
maximize system throughput at each site
This problem is NP-hard!
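For tiny instances the objective can be evaluated by brute force, which also shows why the problem is NP-hard: the search space grows as m^n for n fragments and m sites. A sketch with invented per-placement costs (a real model would combine the storage, query, update, and communication terms above):

```python
from itertools import product

fragments = ["F1", "F2"]
sites = ["S1", "S2"]

# cost[(fragment, site)]: hypothetical total cost of placing the fragment there.
cost = {("F1", "S1"): 3, ("F1", "S2"): 5,
        ("F2", "S1"): 4, ("F2", "S2"): 2}

# Enumerate all m**n assignments and keep the cheapest one.
best = min(product(sites, repeat=len(fragments)),
           key=lambda assign: sum(cost[(f, s)] for f, s in zip(fragments, assign)))

print(dict(zip(fragments, best)))  # -> {'F1': 'S1', 'F2': 'S2'}
```

Because exhaustive search is infeasible beyond a handful of fragments, practical allocation uses heuristics rather than enumeration.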