data placement problems in database applications

63
Data Placement Problems in Database Applications An Zhu An Zhu Stanford University

Upload: nathaniel-glenn

Post on 31-Dec-2015

28 views

Category:

Documents


0 download

DESCRIPTION

Data Placement Problems in Database Applications. An Zhu Stanford University. Data Placement. Data objects Multiple disks Assignment of objects to disks Optimize performance Optimize I/O Handle dynamic situations. Outline. Multimedia Systems [GKKTZ 00] - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Data Placement Problems in Database Applications

Data Placement Problems in Database Applications

An ZhuAn Zhu

Stanford University

Page 2: Data Placement Problems in Database Applications

04/19/23 AZ 2

Data Placement

Data objects Multiple disks Assignment of objects to disks

Optimize performance Optimize I/O Handle dynamic situations

Page 3: Data Placement Problems in Database Applications

04/19/23 AZ 3

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 4: Data Placement Problems in Database Applications

04/19/23 AZ 4

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 5: Data Placement Problems in Database Applications

04/19/23 AZ 5

Multimedia Storage Systems

Movie objects Clients/subscribers Parallel disks

Limited storage: # of movies—Nj

Limited bandwidth: # of clients—Cj

Homogeneous system: Nj=k, Cj=L, j Uniform ratio: Cj/Nj=r, j

Page 6: Data Placement Problems in Database Applications

04/19/23 AZ 6

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Total Storage: 12, Total Capacity: 1800

Page 7: Data Placement Problems in Database Applications

04/19/23 AZ 7

An Example

000/600

000/600

400/600

100100

100 400400100100100

Total Storage: 12, Total Capacity: 1800

Page 8: Data Placement Problems in Database Applications

04/19/23 AZ 8

An Example

400/600

000/600

400/600

400400100100

Total Storage: 12, Total Capacity: 1800

Page 9: Data Placement Problems in Database Applications

04/19/23 AZ 9

Not All Clients Can be Satisfied

400/600

600/600

400/600

400

Total Satisfied Clients: 1400/1800=7/9

Page 10: Data Placement Problems in Database Applications

04/19/23 AZ 10

Sliding Window Algorithm

Consider one disk at a time Maintain an ordered list of movies The first consecutive k movies (or less)

with at least L combined clients Assign the first L clients to the disk and

reconsider leftover clients

Page 11: Data Placement Problems in Database Applications

04/19/23 AZ 11

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

100

Page 12: Data Placement Problems in Database Applications

04/19/23 AZ 12

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

200

Page 13: Data Placement Problems in Database Applications

04/19/23 AZ 13

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

400

Page 14: Data Placement Problems in Database Applications

04/19/23 AZ 14

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

400

Page 15: Data Placement Problems in Database Applications

04/19/23 AZ 15

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

700

Page 16: Data Placement Problems in Database Applications

04/19/23 AZ 16

An Example

000/600

000/600

600/600

100 100100100100100

100 400100 0 0 0

Max window size k=4

Page 17: Data Placement Problems in Database Applications

04/19/23 AZ 17

An Example

000/600

000/600

600/600

100 100100100100100

100 400100

Max window size k=4

Page 18: Data Placement Problems in Database Applications

04/19/23 AZ 18

An Example

600/600

000/600

600/600

100 100100100100100

Max window size k=4

400

Page 19: Data Placement Problems in Database Applications

04/19/23 AZ 19

An Example

600/600

400/600

600/600

100 100

Total Satisfied Clients: 1600/1800=8/9

Page 20: Data Placement Problems in Database Applications

04/19/23 AZ 20

Theoretical Bounds

Satisfies at least fraction of total clients

In the worst case, no algorithm can satisfy more clients

Translates to an -approximation

PTAS: (1+)-approximation, >0

21

11

k

21

11

k

Page 21: Data Placement Problems in Database Applications

04/19/23 AZ 21

Theoretical Bounds

Satisfies at least fraction of total clients

In the worst case, no algorithm can satisfy more clients

Translates to an -approximation

PTAS: (1+)-approximation, >0

21

11

k

21

11

k

Page 22: Data Placement Problems in Database Applications

04/19/23 AZ 22

Proof Sketch

Load vs. storage saturated: ML, MS

Least loaded disk: cL ML+MS=M, 0<c<1 All remaining movies each have no

more than cL/k clients Initial instance is feasible (w.l.o.g.)

Page 23: Data Placement Problems in Database Applications

04/19/23 AZ 23

An Example

600/600

400/600

600/600

100 100

Total Satisfied Clients: 1600/1800=8/9

ML=2, MS=1,

c=400/600

cL/k=100

Page 24: Data Placement Problems in Database Applications

04/19/23 AZ 24

Proof Outline

If there is a load saturated disk with less than k movies All clients are satisfied

Otherwise At most ML movies are left Satisfy at least fraction of

the clients 21

11

k

Page 25: Data Placement Problems in Database Applications

04/19/23 AZ 25

Lemma

If any of the load saturated disk has less than k objects

Any k-1 remaining movies in the list has L clients or more

Page 26: Data Placement Problems in Database Applications

04/19/23 AZ 26

Lemma

The remaining disks are all load saturated

So, all clients are satisfied

At least LAt least L

Page 27: Data Placement Problems in Database Applications

04/19/23 AZ 27

Otherwise…

Each disk has exactly k movies Total assigned movies: M·k

Initial movies: N M·k “New” movies generated: ML

# of movies left: ≤ ML

# of clients/remaining movie: ≤ cL/k Total # of remaining clients: cLML/k

Page 28: Data Placement Problems in Database Applications

04/19/23 AZ 28

Otherwise…

Total clients: ≤ M·L Assigned clients: ML·L + Ms·cL Total # of remaining clients : ≤ Ms·(1-c)L Final bound:

cLMLM

LcMkMcL

S

U

SL

SL

)1(,/min

21

11

k

Page 29: Data Placement Problems in Database Applications

04/19/23 AZ 29

Simulation Results

M=5L=100N=M·k

Zipf with=0.0( i-1 )

Page 30: Data Placement Problems in Database Applications

04/19/23 AZ 30

Recap

The problem is NP-complete PTAS: best possible approximation

bound : best possible absolute bound

Sliding window algorithm: practical with O((M+N)log(M+N)) running time

21

11

k

Page 31: Data Placement Problems in Database Applications

04/19/23 AZ 31

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the total I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 32: Data Placement Problems in Database Applications

04/19/23 AZ 32

Relational Databases

Objects: indexes, tables, views Multiple disks Minimize the total I/O access time

Page 33: Data Placement Problems in Database Applications

04/19/23 AZ 33

Past Work

Full striping Split uniformly across all available disks Utilize I/O parallelism

: transfer rate

200MB=0.05s/MB,Tt=10s200MB

Page 34: Data Placement Problems in Database Applications

04/19/23 AZ 34

=0.05s/MB,Tt=2.5s

Past Work

Full striping Split uniformly across all available disks Utilize I/O parallelism

: transfer rate

200MB =0.05s/MB,Tt=10s50MB50MB50MB

50MB50MB 50MB 50MB

Page 35: Data Placement Problems in Database Applications

04/19/23 AZ 35

Past Work

Co-accessed objects with Random I/O Seek time/per block size: 0.01s/0.1MB Seek rate: =0.1s/MB Smaller object dominates

50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB

AB

Ts=50·2=10s

Page 36: Data Placement Problems in Database Applications

04/19/23 AZ 36

Past Work

Combined access time Transfer time: Tt=(50+100)·=7.5s Seek time: Ts=min(50,100)·=10s Combined time: Tt+Ts=17.5s

50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB

AB

Page 37: Data Placement Problems in Database Applications

04/19/23 AZ 37

Past Work

Fully striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03’]

Combined time: 200·=10s

100MB 100MB200MB 200MB

Page 38: Data Placement Problems in Database Applications

04/19/23 AZ 38

Data Layout Problem

Work Load (SQL DML) A set of queries and/or updates A set of co-accessed objects (pairwise) Access stats (pairwise) Minimize the estimated I/O access time

Page 39: Data Placement Problems in Database Applications

04/19/23 AZ 39

Theoretical Questions

Approximation and its hardness Transfer time: P Seek time: Very Hard Combined time

Hard Minimizing transfer time alone is a “good”

approximation

Page 40: Data Placement Problems in Database Applications

04/19/23 AZ 40

Transfer Time

Heterogeneous disks Different rate: j

Storage constraint: cj

Objects Different size: si

Access frequency: i,i’

Solvable using Linear Programming (LP)

Page 41: Data Placement Problems in Database Applications

04/19/23 AZ 41

LP

',',',

,',',

,',,',

,

,

,

min

',,

,',,)(

,

,

,,0

iiiiii

jiiii

jjijijii

ji

ji

jiji

ji

T

iitT

jiixxt

jcx

isx

jix

Amount of object i assigned to disk j

Each object must be completely assigned

Each disk’s storage limit is kept

Transfer time for (i,i’) on disk j

Overall transfer time for (i,i’)

Minimize the total transfer time

Page 42: Data Placement Problems in Database Applications

04/19/23 AZ 42

Seek Time

Hard even on disks with no storage constraint

Integral assignment Each object is assigned to one machine

only Conversion from a fraction assignment

with no loss

Page 43: Data Placement Problems in Database Applications

04/19/23 AZ 43

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 Want: each file is spread uniformly

across a subset of disks

100MB 100MB200MB 200MB

150MB100MB

A B ABC C

Page 44: Data Placement Problems in Database Applications

04/19/23 AZ 44

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002+1252

100MB 100MB200MB 200MB

125MB125MB

A B ABC C

Page 45: Data Placement Problems in Database Applications

04/19/23 AZ 45

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002

100MB 100MB200MB 200MB

125MB125MB250MB

A B ABC C

Page 46: Data Placement Problems in Database Applications

04/19/23 AZ 46

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 0 Each file resides on only one disk

200MB400MB250MB

100MB 100MB200MB200MB250MB

A B ABC C

Page 47: Data Placement Problems in Database Applications

04/19/23 AZ 47

Implications

A polynomial time algorithm Equivalent to Minimum Edge Deletion

k-Partition NP-Hard to approximate: O(n2) Forces combined time be hard to

approximate

Page 48: Data Placement Problems in Database Applications

04/19/23 AZ 48

Combined Time

Let

Hard to approximate: ·, 1>>0 Optimize transfer time alone gives 1+

j

jj

max

)(),min(2)(

))(1(),min(2)(

212121

212121

xxxxxx

xxxxxx

Page 49: Data Placement Problems in Database Applications

04/19/23 AZ 49

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 50: Data Placement Problems in Database Applications

04/19/23 AZ 50

Load Rebalancing

Access pattern changes Initial layout no longer balanced

2

1

54

3 6

7

8

9

1011

MAX LOAD

Page 51: Data Placement Problems in Database Applications

04/19/23 AZ 51

Load Rebalancing

Relocate objects Minimize the max load with k moves

2

1

543 6

78

9

1011

MAX LOAD

Page 52: Data Placement Problems in Database Applications

04/19/23 AZ 52

Simple Algorithm (O(nlogn))

Step 1: Repeat k times Remove the largest object from the most

loaded disk The resulting max load: L(1)

Step2: Relocate the removed k objects Assign each object to the least loaded

disk The resulting max load: L(2)

Page 53: Data Placement Problems in Database Applications

04/19/23 AZ 53

Example (k=3)

Step1: L(1) OPT

2

1

543 6

78

9

1011

MAX LOAD

9

MAX LOAD

1

6

L(1)

Page 54: Data Placement Problems in Database Applications

04/19/23 AZ 54

Example (k=3)

Step2: L(2) OPT + S 2OPT Overall: max(L(1),L(2)) 2OPT

2 543

78

1011

91

6

MIN LOAD6 MIN LOAD

1 9

L(2)

Page 55: Data Placement Problems in Database Applications

04/19/23 AZ 55

Can We Do Better?

Blindly remove the large object is not wise

2

1

543 6

78

9

1011

MAX LOAD

Page 56: Data Placement Problems in Database Applications

04/19/23 AZ 56

How can we do better

Take care of large objects Large objects: size >1/2OPT Small objects: size 1/2OPT

2

1

543 6

78

9

1011

OPT

Page 57: Data Placement Problems in Database Applications

04/19/23 AZ 57

Revising The Plan

Step 1: Repeat k times Remove the largest object from the most

loaded disk The resulting max load: L(1) OPT

Step2: Relocate the removed k objects Assign each object to the least loaded

disk The resulting max load: L(2) OPT +S

2OPT

Page 58: Data Placement Problems in Database Applications

04/19/23 AZ 58

Revised Plan

Step 1: with no more than k moves Shuffle large objects and remove small

objects The resulting max load: L(1) 3/2 OPT

Step2: Relocate the removed objects Assign each object to the least loaded

disk (they are all small) The resulting max load: L(2) OPT +S

3/2 OPT just to fill in the space

Page 59: Data Placement Problems in Database Applications

04/19/23 AZ 59

Example

Step 1

2

1

543 6

78

9

1011

MAX LOAD

2

1

10 11

3/2 OPT

Page 60: Data Placement Problems in Database Applications

04/19/23 AZ 60

2

Example

Step 2

543 6

78

9 MIN LOAD

2

1

10 11

11 MIN LOAD10 MIN LOAD

OPT+S

Page 61: Data Placement Problems in Database Applications

04/19/23 AZ 61

Recap

Fast 1.5-approximation (O(nlogn)) NP-complete PTAS: generalized cost

Page 62: Data Placement Problems in Database Applications

04/19/23 AZ 62

Summary

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 63: Data Placement Problems in Database Applications

04/19/23 AZ 63

Other Research Interests

Algorithms for mobile, sensor networks and privacy preserving databases

Online Algorithms: queue management, packet switching, web caching, scheduling

Approximation Algorithms: network design, multi-product pricing

Streaming Algorithms