Distributed DBMSs – Concepts and Design

Chapter 24 in Textbook

Upload: cedric-grimes

Post on 02-Jan-2016



Overview

• Concepts.
• What is a distributed DBMS?
• Distributed Processing.
• Homogeneous vs. Heterogeneous.
• Functions of a DDBMS.
• Components of a DDBMS.
• Advantages and Disadvantages.
• DDBMS Design.
• Fragmentation.
• Replication.
• Allocation.
• DDBMS Transparencies.
• Date’s 12 Rules for a DDBMS.

Concepts

Centralized DBMS: a system with a single logical database located at one site under the control of a single DBMS.

Distributed DB: a logically interrelated collection of shared data physically distributed over a computer network.

Applications can be classified into:

• Local applications (need only the data at the site where they are issued).
• Global applications (need data from other sites as well).

Distributed DBMS

Distributed DBMS: the software system that:

• manages the distributed DB.
• makes the distribution transparent to users.
• allows users to access data at their own site as well as at remote sites.

Transparent distribution is the fundamental principle of a DDBMS.

Characteristics of DDBMS

• A collection of logically related shared data.

• The data is split into a number of fragments.

• Fragments may be replicated.

• Fragments/replicas are allocated to sites.

• The sites are linked by a communications network.

• The data at each site is under the control of a DBMS.

• The DBMS at each site can handle local applications.

• Each DBMS participates in at least one global application.

Distributed DBMS Topology

[Figure: Site 1, Site 2, Site 3 and Site 4 connected by a computer network.]

The data itself is distributed, and access to it can be local or remote.

Distributed Processing

[Figure: Site 1, Site 2, Site 3 and Site 4 connected by a computer network.]

The data itself is centralized, but access to it can be local or remote.

Homogeneous vs. Heterogeneous DDBMS

Homogeneous system: all sites use the same DBMS product.

Heterogeneous system: sites may run different DBMS products and data models.

Possible differences between data in different DBs:

• Data type difference.
• Value difference.
• Semantic difference.

Functions of a DDBMS

• Provide access to remote sites and allow the transfer of queries and data among the sites of the network.

• Store data distribution details.

• Distributed data processing.

• Security control.

• Concurrency control.

• Recovery services.

Components of a DDBMS

[Figure: Site 1 and Site 3 connected by a computer network. Component labels shown in the figure: DDBMS, DC, LDBMS, GSC, DB.]

GSC: Global system catalog

DC: Data communication component

Advantages of DDBMS

• Reflects organizational structure.

• Improved shareability and local autonomy.

• Improved availability.

• Improved reliability.

• Improved performance.

Disadvantages of DDBMS

• Complexity.

• Cost.

• Security.

• Integrity control more difficult.

• Lack of standards.

• Lack of experience.

• DB design more complex.

Distributed Relational DB Design

We have a group of tables that we want to distribute among a group of sites.

The design consists of 3 major steps:

1. Fragmentation: divide a relation into a number of sub-relations (fragments), horizontally or vertically.
2. Replication: make a copy of a fragment.
3. Allocation: decide at which site each fragment and each replica is to be stored.

Distributed Relational DB Design

When we fragment, replicate and allocate, we try to achieve:

• Locality of reference.

• Improved reliability and availability.

• Good performance.

• Balanced storage capacities and costs.

• Minimal communication costs.

Rules of Fragmentation

Completeness: nothing (no row or column) gets lost when we fragment.

Reconstruction: we can get back the original table from its fragments.

Disjointness: no row or column appears in two fragments. The one exception is vertical fragmentation, where the primary key is repeated in every fragment so that the table can be reconstructed.
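The three rules can be checked mechanically for a horizontal fragmentation. The following is only a sketch: rows are reduced to (PropertyNo, Type) pairs taken from the PropertyForRent table used in these slides, and the helper name `check_horizontal` is hypothetical.

```python
# Illustrative check of the three fragmentation rules for a horizontal
# fragmentation. The function name and the reduced rows are assumptions.

def check_horizontal(original, fragments):
    """Return a dict saying whether each rule holds for row-wise fragments."""
    original = set(original)
    fragments = [set(f) for f in fragments]
    union = set().union(*fragments)
    total = sum(len(f) for f in fragments)
    return {
        # Completeness: every original row lands in some fragment.
        "completeness": original <= union,
        # Reconstruction: the union of the fragments is exactly the original.
        "reconstruction": union == original,
        # Disjointness: no row is stored in two fragments.
        "disjointness": total == len(union),
    }

property_for_rent = [("PA14", "House"), ("PG4", "Flat"), ("PG16", "Flat"),
                     ("PG21", "House"), ("PG36", "Flat"), ("PL94", "Flat")]
houses = [r for r in property_for_rent if r[1] == "House"]
flats = [r for r in property_for_rent if r[1] == "Flat"]

good = check_horizontal(property_for_rent, [houses, flats])
# Dropping a row from one fragment breaks completeness and reconstruction.
bad = check_horizontal(property_for_rent, [houses, flats[1:]])
```

Splitting by Type satisfies all three rules because every row has exactly one Type value; a predicate set that overlaps or leaves a gap would fail one of the checks.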

Types of Fragmentation

• Horizontal fragmentation
• Vertical fragmentation
• Mixed fragmentation

Original PropertyForRent Table

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005

Horizontal Fragmentation

Based on type of property.

P1: σ_{Type=‘House’}(PropertyForRent)

P2: σ_{Type=‘Flat’}(PropertyForRent)

Fragment P1

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003

Fragment P2

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
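A horizontal fragment is a selection on the original relation, and the original is recovered with a union. The sketch below runs this in an in-memory SQLite database; keeping only three of the columns and storing both fragments in one database are simplifications made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema (assumption): only the columns needed for the example.
conn.execute("CREATE TABLE PropertyForRent "
             "(propertyNo TEXT PRIMARY KEY, type TEXT, branchNo TEXT)")
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?, ?)", [
    ("PA14", "House", "B007"), ("PG4", "Flat", "B003"),
    ("PG16", "Flat", "B003"), ("PG21", "House", "B003"),
    ("PG36", "Flat", "B003"), ("PL94", "Flat", "B005"),
])

# Each fragment is a selection (WHERE) on the original relation.
conn.execute("CREATE TABLE P1 AS SELECT * FROM PropertyForRent WHERE type = 'House'")
conn.execute("CREATE TABLE P2 AS SELECT * FROM PropertyForRent WHERE type = 'Flat'")

# Reconstruction: the union of the fragments gives back the original rows.
rebuilt = conn.execute(
    "SELECT * FROM P1 UNION SELECT * FROM P2 ORDER BY propertyNo").fetchall()
original = conn.execute(
    "SELECT * FROM PropertyForRent ORDER BY propertyNo").fetchall()
p1_keys = [row[0] for row in
           conn.execute("SELECT propertyNo FROM P1 ORDER BY propertyNo")]
```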

Original Staff Table

StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SL21     Manager     M    1 Oct 93   30000   John   White  B005
SG37     Assistant   F    10 Nov 60  12000   Ann    Beech  B003
SG14     Supervisor  M    24 Mar 58  18000   David  Ford   B003
SG5      Assistant   F    3 Jun 40   24000   Susan  Brand  B007

Vertical Fragmentation

S1: Π_{StaffNo, Position, Sex, DOB, Salary}(Staff)

S2: Π_{StaffNo, FName, LName, BranchNo}(Staff)

Fragment S1

StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000

Fragment S2

StaffNo  FName  LName  BranchNo
SL21     John   White  B005
SG37     Ann    Beech  B003
SG14     David  Ford   B003
SG5      Susan  Brand  B007
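A vertical fragment is a projection that keeps the primary key, and the original is recovered by joining the fragments on that key. A minimal sketch in an in-memory SQLite database, with both fragments held in one database purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, "
             "sex TEXT, DOB TEXT, salary INTEGER, fName TEXT, lName TEXT, "
             "branchNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("SL21", "Manager", "M", "1 Oct 93", 30000, "John", "White", "B005"),
    ("SG37", "Assistant", "F", "10 Nov 60", 12000, "Ann", "Beech", "B003"),
    ("SG14", "Supervisor", "M", "24 Mar 58", 18000, "David", "Ford", "B003"),
    ("SG5", "Assistant", "F", "3 Jun 40", 24000, "Susan", "Brand", "B007"),
])

# Each fragment is a projection; staffNo is repeated in both fragments.
conn.execute("CREATE TABLE S1 AS "
             "SELECT staffNo, position, sex, DOB, salary FROM Staff")
conn.execute("CREATE TABLE S2 AS "
             "SELECT staffNo, fName, lName, branchNo FROM Staff")

# Reconstruction: join the fragments on the shared primary key.
rebuilt = conn.execute(
    "SELECT S1.staffNo, position, sex, DOB, salary, fName, lName, branchNo "
    "FROM S1 JOIN S2 ON S1.staffNo = S2.staffNo "
    "ORDER BY S1.staffNo").fetchall()
original = conn.execute("SELECT * FROM Staff ORDER BY staffNo").fetchall()
```

Repeating staffNo in both fragments is what makes the join possible, which is why the primary key is exempt from the disjointness rule.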

Mixed Fragmentation – Vertical then Horizontal

S1: Π_{StaffNo, Position, Sex, DOB, Salary}(Staff)

S2: Π_{StaffNo, FName, LName, BranchNo}(Staff)

S2.1: σ_{BranchNo=‘B005’}(S2)

S2.2: σ_{BranchNo=‘B003’}(S2)

S2.3: σ_{BranchNo=‘B007’}(S2)

Fragment S1

StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000

Fragment S2.1

StaffNo  FName  LName  BranchNo
SL21     John   White  B005

Fragment S2.2

StaffNo  FName  LName  BranchNo
SG37     Ann    Beech  B003
SG14     David  Ford   B003

Fragment S2.3

StaffNo  FName  LName  BranchNo
SG5      Susan  Brand  B007

Derived Horizontal Fragmentation

Derived horizontal fragmentation is the horizontal fragmentation of a child table, T1, that follows from the horizontal fragmentation of a related parent table, T2.

It is not explicitly specified in the design but is implied by the fragmentation of T2.

T1 (child) has a foreign key that references T2 (parent).

The relationship between T1 and T2 is either 1-to-1 or Many-to-1.

Use the semi-join operation: T1i = T1 ⋉_F T2i, where T2i is a fragment of the parent and F is the join condition on the foreign key.

Derived Horizontal Fragmentation

The design requires the Staff table to be horizontally fragmented:

S1: σ_{BranchNo=‘B003’}(Staff)

S2: σ_{BranchNo=‘B005’}(Staff)

S3: σ_{BranchNo=‘B007’}(Staff)

StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SL21     Manager     M    1 Oct 93   30000   John   White  B005
SG37     Assistant   F    10 Nov 60  12000   Ann    Beech  B003
SG14     Supervisor  M    24 Mar 58  18000   David  Ford   B003
SG5      Assistant   F    3 Jun 40   24000   Susan  Brand  B007

Derived Horizontal Fragmentation

Fragment S1

StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SG37     Assistant   F    10 Nov 60  12000   Ann    Beech  B003
SG14     Supervisor  M    24 Mar 58  18000   David  Ford   B003

Fragment S2

StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SL21     Manager     M    1 Oct 93   30000   John   White  B005

Fragment S3

StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SG5      Assistant   F    3 Jun 40   24000   Susan  Brand  B007

Derived Horizontal Fragmentation

After fragmenting Staff, we find that there is a table related to it, PropertyForRent.

Because Staff is now fragmented, it makes sense to fragment PropertyForRent too.

[Figure: Staff (1) handles (N) PropertyForRent.]

S1: σ_{BranchNo=‘B003’}(Staff)

S2: σ_{BranchNo=‘B005’}(Staff)

S3: σ_{BranchNo=‘B007’}(Staff)

Pi: PropertyForRent ⋉_{StaffNo} Si, for i = 1, 2, 3

Original PropertyForRent Table

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005

Derived Horizontal Fragmentation

Fragment P1

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003

Fragment P2

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005

Fragment P3

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
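The semi-join that derives each PropertyForRent fragment can be written in SQL as an EXISTS subquery. The sketch below makes two assumptions that should be read with care: the tables are reduced to the columns involved, and staff SA9 (B007) and SL41 (B005) are added to Staff, because the Staff table as listed in these slides does not contain them and a semi-join on staffNo would otherwise leave PA14 and PL94 in no fragment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT)")
conn.execute("CREATE TABLE PropertyForRent "
             "(propertyNo TEXT PRIMARY KEY, staffNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [
    ("SL21", "B005"), ("SG37", "B003"), ("SG14", "B003"), ("SG5", "B007"),
    # ASSUMPTION: these two rows are not in the Staff table on the slides;
    # their branch is taken from the BranchNo column of PropertyForRent.
    ("SA9", "B007"), ("SL41", "B005"),
])
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?)", [
    ("PA14", "SA9"), ("PG4", "SG14"), ("PG16", "SG14"),
    ("PG21", "SG37"), ("PG36", "SG37"), ("PL94", "SL41"),
])

def derived_fragment(branch_no):
    """Semi-join: properties whose staffNo occurs in one Staff fragment."""
    sql = """
        SELECT p.propertyNo FROM PropertyForRent p
        WHERE EXISTS (SELECT 1 FROM Staff s
                      WHERE s.staffNo = p.staffNo AND s.branchNo = ?)
        ORDER BY p.propertyNo
    """
    return [row[0] for row in conn.execute(sql, (branch_no,))]

p1 = derived_fragment("B003")  # follows Staff fragment S1
p2 = derived_fragment("B005")  # follows Staff fragment S2
p3 = derived_fragment("B007")  # follows Staff fragment S3
```

Only PropertyForRent columns are returned, which is what distinguishes a semi-join from a full join with the Staff fragment.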

Transparencies in a DDBMS

4 main transparencies:

1. Distribution Transparency.
   a. Fragmentation.
   b. Location.
   c. Replication.
   d. Local Mapping.
   e. Naming.
2. Transaction Transparency.
3. Performance Transparency.
4. DBMS Transparency.

1. Distribution Transparency

Allows the user to perceive the DB as a single, logical entity. Types:

a. Fragmentation: the user does not need to know the data is fragmented.

b. Location: the user does not need to know the location of fragments.

c. Replication: the user does not need to know the fragments are replicated.

d. Local Mapping: the user specifies the fragment and its location.

e. Naming: the DDBMS makes sure every item name is unique.

Consider the distribution of the STAFF relation:

S1: Π_{StaffNo, Position, Sex, DOB, Salary}(Staff)

S2: Π_{StaffNo, FName, LName, BranchNo}(Staff)

S21: σ_{BranchNo=‘B003’}(S2)

S22: σ_{BranchNo=‘B005’}(S2)

S23: σ_{BranchNo=‘B007’}(S2)

a. Fragmentation Transparency

• Highest level of distribution transparency.
• The user does not need to know that the data is fragmented.
• The user treats the DDB like a centralized DB.
• Database access is based on the global schema.
• Fragmentation of the data can be changed without impacting the user.

Example:

SELECT fName, lName
FROM Staff
WHERE position = 'Manager';
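One way to picture fragmentation transparency is a view named after the global relation and defined over the fragments, so that the query on the global schema runs unchanged. The sketch below uses the fragment names S1, S21, S22 and S23 from the STAFF distribution in these slides; holding all fragments in a single in-memory SQLite database and trimming S1 to two columns are simplifications, and a real DDBMS would do this mapping through its global catalog rather than a local view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fragments (S1 trimmed to the columns the query needs: an assumption).
    CREATE TABLE S1  (staffNo TEXT PRIMARY KEY, position TEXT);
    CREATE TABLE S21 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
    CREATE TABLE S22 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
    CREATE TABLE S23 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);

    INSERT INTO S1 VALUES ('SL21', 'Manager'), ('SG37', 'Assistant'),
                          ('SG14', 'Supervisor'), ('SG5', 'Assistant');
    INSERT INTO S21 VALUES ('SG37', 'Ann', 'Beech', 'B003'),
                           ('SG14', 'David', 'Ford', 'B003');
    INSERT INTO S22 VALUES ('SL21', 'John', 'White', 'B005');
    INSERT INTO S23 VALUES ('SG5', 'Susan', 'Brand', 'B007');

    -- Global relation rebuilt from the fragments: union, then join on the key.
    CREATE VIEW Staff AS
        SELECT s2.staffNo, s1.position, s2.fName, s2.lName, s2.branchNo
        FROM (SELECT * FROM S21
              UNION ALL SELECT * FROM S22
              UNION ALL SELECT * FROM S23) AS s2
        JOIN S1 AS s1 ON s1.staffNo = s2.staffNo;
""")

# The user's query mentions only the global relation, never a fragment.
managers = conn.execute(
    "SELECT fName, lName FROM Staff WHERE position = 'Manager'").fetchall()
```

If the fragmentation changes, only the view definition has to change; the user's query stays the same.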

b. Location Transparency

The middle level of distribution transparency.

The user must know that the data is fragmented but does not need to know the location of the data.

Data location can be changed without impact on the user.

Example:

SELECT fName, lName FROM S21
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S22
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S23
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager');

c. Replication Transparency

The user is unaware of replication and location but knows that the data is fragmented.

It is on the same level as location transparency.

d. Local Mapping Transparency

The lowest level of distribution transparency.

The user must know both that the data is fragmented and where each fragment is located.

Example:

SELECT fName, lName FROM S21 AT SITE 3
WHERE staffNo IN
  (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S22 AT SITE 5
WHERE staffNo IN
  (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S23 AT SITE 7
WHERE staffNo IN
  (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager');

e. Naming Transparency

Each item in a distributed database must have a unique name.

The DDBMS must ensure that no two sites create items with the same name.

Solutions:

• Create a central name server. Drawbacks: it becomes a bottleneck and works against local autonomy.
• Prefix each object with the identifier of the site that created it. Drawback: loss of distribution transparency.
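The site-prefix solution can be sketched in a few lines. Everything named here (`global_name`, `resolve`, the site ids and the alias names) is hypothetical, and the alias table is an addition not mentioned on the slide: it is one commonly described way to hide the prefix from users again.

```python
def global_name(site_id, local_name):
    """Prefix a locally unique name with its site id to make it globally unique."""
    return f"{site_id}.{local_name}"

# Two sites may each create an object called "Staff" without clashing.
name_a = global_name("S3", "Staff")
name_b = global_name("S5", "Staff")

# ASSUMPTION: aliases map a location-free name to the prefixed one, so that
# users do not have to write the site id themselves.
aliases = {"GlasgowStaff": name_a, "LondonStaff": name_b}

def resolve(name):
    """Translate an alias to its global name; pass other names through."""
    return aliases.get(name, name)
```

The prefix guarantees uniqueness without a central server, since each site only has to keep its own names unique.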

2. Transaction Transparency

All transactions must ensure the consistency and integrity of the DDB.

Each transaction that needs to access data at multiple sites is divided into multiple sub-transactions, one per site.

Even if a transaction is split, atomicity has to be maintained: either all of its sub-transactions commit or none of them do.
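The slide does not say how atomicity is kept across sub-transactions. Two-phase commit is a widely used protocol for this, so the sketch below uses it as a stand-in; it ignores timeouts, logging and site failures, and the class and function names are hypothetical.

```python
class Site:
    """Hypothetical participant holding one sub-transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether the local sub-transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(sites):
    """Commit at every site or at none (failure-free two-phase commit sketch)."""
    votes = [site.prepare() for site in sites]
    if all(votes):
        # Phase 2a: every site voted yes, so all of them commit.
        for site in sites:
            site.commit()
        return "committed"
    # Phase 2b: at least one site voted no, so all of them abort.
    for site in sites:
        site.abort()
    return "aborted"


ok_sites = [Site("London"), Site("Glasgow")]
outcome_ok = run_global_transaction(ok_sites)

bad_sites = [Site("London"), Site("Glasgow", can_commit=False)]
outcome_bad = run_global_transaction(bad_sites)
```

In the second run London was able to commit locally, yet it ends up aborted because Glasgow voted no: the global transaction is all-or-nothing.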

3. Performance Transparency

The DDBMS performs as if it were a centralized DBMS.

Performance should not suffer because the system is distributed (network communication cost).

When a site issues a query, the system must figure out the fastest way of executing it.

The Distributed Query Processor (DQP) must figure out:

• Which fragment to access.
• Which copy of the fragment to access (if replication is used).
• Where the fragments are located.
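The three decisions can be pictured as lookups in a catalog that records where each fragment and its copies live. This is only a sketch: the catalog layout, the site names, the extra copy of S21 and the "prefer a local copy" rule are all assumptions made for illustration, not the behaviour of any particular DDBMS.

```python
# Hypothetical catalog: fragment name -> sites holding a copy of it.
# The second copy of S21 at site5 is invented to show replica selection.
CATALOG = {
    "S1": ["site5"],
    "S21": ["site3", "site5"],
    "S22": ["site5"],
    "S23": ["site7"],
}

def choose_copy(fragment, local_site):
    """Prefer a local replica; otherwise fall back to the first listed site."""
    sites = CATALOG[fragment]
    return local_site if local_site in sites else sites[0]

def plan(fragments, local_site):
    """Map every fragment a query needs to the site it will be read from."""
    return {fragment: choose_copy(fragment, local_site) for fragment in fragments}

access_plan = plan(["S1", "S21"], local_site="site5")
remote_plan = plan(["S23"], local_site="site5")
```

A real query processor would weigh communication and processing costs instead of applying a fixed preference, but the inputs are the same: the fragments needed, their copies, and their locations.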

3. Performance Transparency

Consider the following distributed DB:

• Property(propertyNo, city): 10,000 records in London
• Client(clientNo, maxPrice): 100,000 records in Glasgow
• Viewing(propertyNo, clientNo): 1,000,000 records in London

The London site wants to list the properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000.

SELECT p.propertyNo
FROM Property p INNER JOIN
     (Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
     ON p.propertyNo = v.propertyNo
WHERE p.city = 'Aberdeen' AND
      c.maxPrice > 200000;

3. Performance Transparency

After the query is issued, the DDBMS must determine the most cost-effective strategy to execute it.

Strategies:

1. Move the Client table to London and process the query there.

2. Move the Property and Viewing relations to Glasgow, process the query there, then return the result.

3. Join Property and Viewing at London, project only the property number and client number, move the result to Glasgow to join with clients with maxPrice > 200,000, then return the result.

4. Select the clients at Glasgow with maxPrice > 200,000, move them to London and join them with Viewing and the Aberdeen properties.
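To compare strategies, a simple communication-cost model helps: time = access delay + (bits sent / data rate). The sketch below applies it to strategies 1, 2 and 4 using the record counts given earlier in these slides. The access delay, data rate, tuple size and the number of clients above the price limit are hypothetical values chosen only to make the arithmetic concrete; strategy 3 is left out because the size of its intermediate join result is not given, and the cost of returning results is ignored.

```python
# ASSUMPTIONS (not from the slides): network and tuple-size parameters.
ACCESS_DELAY_S = 1.0     # fixed delay per message, in seconds
DATA_RATE_BPS = 10_000   # bits per second
TUPLE_BITS = 100         # size of one tuple, in bits

def transfer_seconds(num_tuples, messages=1):
    """Time to ship num_tuples tuples in the given number of messages."""
    return messages * ACCESS_DELAY_S + num_tuples * TUPLE_BITS / DATA_RATE_BPS

# Record counts from the slides.
PROPERTY, CLIENT, VIEWING = 10_000, 100_000, 1_000_000
# ASSUMPTION: how many clients have maxPrice > 200,000.
RICH_CLIENTS = 10

cost = {
    1: transfer_seconds(CLIENT),              # ship Client to London
    2: transfer_seconds(PROPERTY + VIEWING),  # ship Property and Viewing to Glasgow
    4: transfer_seconds(RICH_CLIENTS),        # ship only the selected clients
}
cheapest = min(cost, key=cost.get)
```

Under these assumed numbers strategy 4 wins by a wide margin because the selection at Glasgow shrinks the data before anything crosses the network; with different selectivities or link speeds the ranking could change.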

4. DBMS Transparency

Hides the fact that different sites have different local DBMSs.

Applies only to heterogeneous DDBMSs.

Date’s 12 Rules for a DDBMS

1. Local autonomy.

2. No reliance on a central site.

3. Continuous operation.

4. Location independence.

5. Fragmentation independence.

6. Replication independence.

7. Distributed query processing.

8. Distributed transaction processing.

9. Hardware independence.

10. Operating system independence.

11. Network independence.

12. Database independence.