15 distributed


Upload: cchan113

Post on 10-Dec-2015


Page 1: 15 Distributed

Distributed Database Management System (DDBS)

Motivation: Data is used at multiple distributed sites (e.g. branch offices).

Communication between sites is
- costly
- potentially unreliable

Solution:
- Allow sites to store/maintain the data they use most often or specialize in
- Share with other sites/HQ when combinations of data are necessary

Page 2: 15 Distributed

Network Topology

[Figure: example topologies over sites A to F: fully connected network, partially connected network, tree structured network, star network, ring network. Compared by cost/reliability and number of hops.]

Page 3: 15 Distributed

Tradeoffs between

1. keeping data in centralized headquarters:
• simpler maintenance
• simpler consistency enforcement
• possibly more efficient if there are many updates or aggregate computations

2. or distributed across branch offices:
• lower communication cost
• reliability
• parallelism can be implemented locally

Page 4: 15 Distributed

Advantages of DDBS (heterogeneous)

• Interconnectivity of pre-existing DBs
• Expandability (don’t need to replace whole system to grow)
• Cost (many small engines on PCs cheaper than mainframes); issue: communication costs vs. hardware computation costs
• Performance (place data near where used)
• Availability and reliability

Page 5: 15 Distributed

Complicating factors

• Maintaining data consistency (in face of replication and sharing)

• Distributed directory management (who controls mapping of data to sites)

• Security

• Heterogeneous databases (different database architectures)

Page 6: 15 Distributed

Distributed Database Design Issues

Options for storing a relation R across multiple sites:

• Replication (maintain copies/replicas of R on multiple sites)

• Fragmentation (relation stored in fragments/pieces on multiple sites)

• combination of both

Page 7: 15 Distributed

[Figure: REPLICATION: relations R1 and R2 with copies of each stored across site1, site2, site3, site4. FRAGMENTATION: horizontal, with R1 split into three row fragments (1/3 of R1 each); vertical, with R2 split into two column fragments (1/2 of R2 each, attributes A/B and C/D).]

Page 8: 15 Distributed

Replication

• Issues: (whole database replication vs. no replication, i.e. non-redundant allocation)

- what to replicate? (all relations or only frequently used shared data)

- where to replicate? (function of communication costs, usage needs, resources)

- which relations to replicate?

- "primary copy" of relation (simplifies consistency enforcement, but where located?)

Page 9: 15 Distributed

Replication (cont)

• Advantages:

– Improved availability (multiple sources for a relation if a site is down)

– Increased parallelism (sites can process (primarily) read-only operations in parallel, minimizing data transfer)

(well suited for read-only or majority read-only data access)

Page 10: 15 Distributed

Replication (cont)

• Disadvantages:
o problems/overhead for writes/updates
o costs of consistency enforcement

- updates propagated to all sites (communication costs)

- costs of synchronization/locking for consistency enforcement on update greater than in single-source models

Complicates concurrency and recovery.
Replication is inefficient in databases with frequent updates.
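The read/write asymmetry above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name, the site names, and the strict "write to every copy" policy are illustrative choices, not something the slides prescribe.

```python
class ReplicatedRelation:
    """Toy model of one relation copied to several sites (names are hypothetical)."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}  # one full copy per site
        self.up = {site: True for site in sites}    # which sites are reachable
        self.messages = 0                           # updates sent, one per copy

    def read(self, key):
        # Availability: any reachable copy can answer a read.
        for site, copy in self.copies.items():
            if self.up[site]:
                return copy.get(key)
        raise RuntimeError("no replica available")

    def write(self, key, value):
        # Assumed policy: an update must reach every copy to keep them consistent.
        if not all(self.up.values()):
            raise RuntimeError("cannot update all replicas")
        for copy in self.copies.values():
            copy[key] = value
            self.messages += 1


r = ReplicatedRelation(["site1", "site2", "site3"])
r.write("e1", 42000)   # costs one update per copy
r.up["site1"] = False  # a site goes down
value = r.read("e1")   # still answered by another copy
```

One write costs as many updates as there are copies, while a read survives the loss of a site, which is why replication suits read-mostly data.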

Page 11: 15 Distributed

FRAGMENTATION

• Vertical
• Horizontal
• Mixed

Issues:

- completeness: Every tuple/attribute in some fragment

- reconstruction:easy way of reconstructing full relation

- transparency

Page 12: 15 Distributed

HORIZONTAL FRAGMENTATION

- Fragments contain subsets of complete tuples (all attributes at all sites)

How to reconstruct: R = Rs1 ∪ Rs2 ∪ … ∪ Rsn

[Figure: original relation with attributes A1, A2, …, An and tuples T1 … Tn, split by rows into fragments stored at Site 1 and Site 2 (tuples T1 … T60 and T61 … Tn); each fragment keeps all attributes A1 … An.]

Page 13: 15 Distributed

Horizontal Fragmentation

• Example usefulness:

- Each branch office maintains the complete attribute set of its employees (salary, benefits, address/phone, departments, projects, etc.)

- Site of fragment easily determined by a key attribute value, e.g. Branch_office*
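A minimal Python sketch of horizontal fragmentation, assuming a hypothetical `employee` relation stored as a list of dicts and a `branch_office` attribute that picks the site. The function names and sample rows are illustrative only.

```python
from collections import defaultdict


def fragment_horizontally(relation, key):
    """Group complete tuples by the value of `key`: one fragment per value."""
    fragments = defaultdict(list)
    for row in relation:
        fragments[row[key]].append(row)  # the whole tuple goes to one fragment
    return dict(fragments)


def reconstruct_by_union(fragments):
    """R = Rs1 ∪ Rs2 ∪ ... ∪ Rsn: concatenate the disjoint fragments."""
    return [row for rows in fragments.values() for row in rows]


# Hypothetical sample data.
employee = [
    {"ssn": 1, "name": "Ann", "salary": 50000, "branch_office": 1},
    {"ssn": 2, "name": "Bob", "salary": 38000, "branch_office": 2},
    {"ssn": 3, "name": "Cy", "salary": 45000, "branch_office": 1},
]

fragments = fragment_horizontally(employee, "branch_office")
rebuilt = reconstruct_by_union(fragments)
```

Because each tuple lands in exactly one fragment, the union restores the relation without duplicates (tuple order may differ).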

Page 14: 15 Distributed

VERTICAL FRAGMENTATION

[Figure: original relation R with attributes A1 A2 A3 A4 and tuples t1 … tn, split by columns into fragments RS1 (at SITE1) and RS2 (at SITE2); each fragment keeps all tuples t1 … tn plus a TID column with values 1, 2, …, n.]

How to reconstruct: R = Rs1 ⋈ Rs2 ⋈ … ⋈ Rsn

TID (tuple ID): hidden attribute to ensure accurate and simple join reconstruction

Join condition: RS1.TID = RS2.TID

Page 15: 15 Distributed

VERTICAL FRAGMENTATION

Example usefulness: the Salary Office, Benefits Office, Directory (name | address | phone | fax), and Dependents Management Office each control their own appropriate attributes for all corporate branch offices.

VERTICAL: attribute-centered management (keep all instances of an attribute in one place)
HORIZONTAL: tuple/individual-centered management (keep all values of a tuple in one place)
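Vertical fragmentation with a hidden tuple ID can be sketched as follows. The attribute groups, sample rows, and function names are assumptions made for illustration. The reconstruction is an equi-join on `tid` and assumes every fragment holds every tuple ID.

```python
def fragment_vertically(relation, attribute_groups):
    """Split columns into fragments; every fragment row carries a hidden tid."""
    fragments = []
    for attrs in attribute_groups:
        fragments.append([
            {"tid": tid, **{a: row[a] for a in attrs}}
            for tid, row in enumerate(relation, start=1)
        ])
    return fragments


def reconstruct_by_join(fragments):
    """Join all fragments on tid (RS1.TID = RS2.TID), then drop the tid."""
    joined = {}
    for fragment in fragments:
        for row in fragment:
            joined.setdefault(row["tid"], {}).update(row)
    return [
        {a: v for a, v in joined[tid].items() if a != "tid"}
        for tid in sorted(joined)
    ]


# Hypothetical sample data: directory attributes vs. salary attributes.
employee = [
    {"name": "Ann", "address": "1 Main St", "salary": 50000},
    {"name": "Bob", "address": "2 Oak Ave", "salary": 38000},
]

fragments = fragment_vertically(employee, [["name", "address"], ["salary"]])
rebuilt = reconstruct_by_join(fragments)
```

Without the shared `tid`, the two fragments would have no common attribute to join on, which is why the slide adds it as a hidden column.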

Page 16: 15 Distributed

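MIXED FRAGMENTATION

[Figure: relation R with attributes A1 A2 A3 A4 A5 split both ways: horizontally into USA and Europe tuples, and vertically into A1 A2 A3 (salary attributes) and A4 A5 (benefit attributes), giving four fragments Rs1, Rs2, Rs3, Rs4.]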

Page 17: 15 Distributed

REPLICATION and FRAGMENTATION

Partition of attributes/tuples need not be disjoint

[Figure: relation with attributes A1 A2 A3 A4 A5 split into fragments (A1 A2 A3 A4) and (A2 A3 A4 A5); the overlap A2 A3 A4 is a replication of attributes.]

Page 18: 15 Distributed

TRANSPARENCY

Fragmentation Transparency
- User doesn’t need to know mapping between relations and fragmented subrelations

Replication Transparency
- User doesn’t need to know about existence or location of other copies (treat as if single copy of DB)

Location and Naming Transparency
- User shouldn’t need to know about location and full names of data on the server

[Figure: the query ΠSalary(σssn=so(Employee)) goes to a Name Server, which determines the proper site, fragment, and replica for this data access and returns the unique name Site27.Employee.Fragment3.Replica7]
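The name-server idea can be sketched as a catalog lookup. Everything here is hypothetical: the `CATALOG` layout, the second entry (`Site4`, `Replica2`), and the rule "take the first reachable copy" are assumptions for illustration. Choosing a fragment by query predicate is left out.

```python
# Hypothetical catalog: relation name -> known copies of its fragments.
CATALOG = {
    "Employee": [
        {"site": "Site27", "fragment": "Fragment3", "replica": "Replica7"},
        {"site": "Site4", "fragment": "Fragment3", "replica": "Replica2"},
    ],
}


def resolve(relation, available_sites):
    """Map a plain relation name to the unique name of a reachable copy."""
    for entry in CATALOG[relation]:
        if entry["site"] in available_sites:
            return f'{entry["site"]}.{relation}.{entry["fragment"]}.{entry["replica"]}'
    raise LookupError(f"no reachable copy of {relation}")


# The user writes only "Employee"; the name server fills in the rest.
unique_name = resolve("Employee", {"Site27", "Site4"})
fallback_name = resolve("Employee", {"Site4"})  # Site27 unreachable
```

The user's query never mentions a site, fragment, or replica, so the data can be moved or re-replicated by changing only the catalog.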

Page 19: 15 Distributed

QUERY PROCESSING IN DDBS

Issue 1: Parallel processing across fragments (horizontal fragmentation)

Employee = Emp1 ∪ Emp2 (2 fragments: Emp1 at Site 1, Emp2 at Site 2)

ΠLName(σsalary>40,000(Employee))

⇒ ΠLName(σsalary>40,000(Emp1)) ∪ ΠLName(σsalary>40,000(Emp2))

Execute in parallel on the fragments and union the results together.
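A small Python sketch of running the selection and projection on each fragment and unioning the partial results. Threads stand in for the two sites, and the sample rows and function names are assumptions; a real system would ship the subquery to each site.

```python
from concurrent.futures import ThreadPoolExecutor


def select_project(fragment, min_salary):
    """Per-site work: project LName over the tuples with salary > min_salary."""
    return {row["lname"] for row in fragment if row["salary"] > min_salary}


def query_fragments(fragments, min_salary=40_000):
    """Run the same subquery on every fragment in parallel, then union."""
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda f: select_project(f, min_salary), fragments)
        result = set()
        for names in partial:
            result |= names  # set union also removes duplicate names
    return result


# Hypothetical fragments Emp1 (Site 1) and Emp2 (Site 2).
emp1 = [{"lname": "Smith", "salary": 45000}, {"lname": "Wong", "salary": 30000}]
emp2 = [{"lname": "Zelaya", "salary": 55000}, {"lname": "Smith", "salary": 41000}]

names = query_fragments([emp1, emp2])
```

Only the small projected results cross the network, not the fragments themselves.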

Page 20: 15 Distributed

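Joins are symmetric and associative:

(A ⋈ B) ⋈ C = A ⋈ (B ⋈ C)

[Figure: join trees for both orders over relations stored at Site1 (50K), Site2 (1K), and Site3 (3K), with intermediate results labelled 0.5K.]

Parallel processing: (σxx(A)) ⋈ (B ⋈ C)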

Page 21: 15 Distributed

QUERY PROCESSING IN DDBS

Join Strategies

R = Π Fnames, Cnames, Dnames (Employee ⋈ Department), joining Mgrssn to ssn

Site 1: Employee, 10,000 records, 1,000,000 bytes
Site 2: Department, 100 records, 3,000 bytes
Site 3: result site; R has 100 records, 2,000 bytes

Strategies:
1) Ship both relations to the result site and join there: 1,003,000 bytes transferred
2) Ship Employee to site 2, join at 2, ship results to 3: 1,002,000 bytes transferred
3) Ship Department to site 1, join at 1, ship results to 3: 5,000 bytes transferred

⇒ minimize total communication cost of data transfer
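The three transfer totals can be checked with a short calculation. The assignment of Employee to Site 1 (1,000,000 bytes), Department to Site 2 (3,000 bytes), and the 2,000-byte result to Site 3 is inferred from the slide's figures, and the constant and function names are illustrative.

```python
EMPLOYEE_BYTES = 1_000_000  # Employee at Site 1: 10,000 records
DEPARTMENT_BYTES = 3_000    # Department at Site 2: 100 records
RESULT_BYTES = 2_000        # join result wanted at Site 3: 100 records


def transfer_costs():
    """Bytes shipped over the network under each strategy."""
    return {
        "1) ship both to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
        "2) ship Employee to site 2": EMPLOYEE_BYTES + RESULT_BYTES,
        "3) ship Department to site 1": DEPARTMENT_BYTES + RESULT_BYTES,
    }


costs = transfer_costs()
best = min(costs, key=costs.get)  # strategy with the fewest bytes shipped
```

Strategy 3 wins because it moves the small relation to the large one, then ships only the small result.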

Page 22: 15 Distributed

RECOVERY IN DDBS

- transaction managers / coordinators
- log managers

Problems:
- failure of site
- failure of link
- loss of messages

If server is down, elect new server
⇒ what about network partitioning?

[Figure: original server, the server’s link, and a newly elected server; it is difficult to know which had occurred, a failure of the server or of its link.]