Fine-Grained Replication and Scheduling with Freshness and
Correctness Guarantees
F. Akal (1), C. Türker (1), H.-J. Schek (1), Y. Breitbart (2), T. Grabs (3), L. Veen (4)
(1) ETH Zurich, Institute of Information Systems, 8092 Zurich, Switzerland, {akal,tuerker,schek}@inf.ethz.ch
(2) Kent State University, Department of Computer Science, Kent OH 44240, USA, [email protected]
(3) One Microsoft Way, Redmond, WA 98052, USA, [email protected]
(4) University of Twente, 7500 AE Enschede, The Netherlands, [email protected]
This work was partially supported by Microsoft.
31st International Conference on Very Large Data Bases, Trondheim, Norway, 30 August – 2 September, 2005.
September 1st, 2005, Fuat Akal, ETH Zürich, [email protected]
Overview
- Introduction and Motivation
  - Replication in a Database Cluster
  - Need for a New Replication Scheme
- PowerDB Replication (PDBREP)
  - Overview of the PDBREP Protocol
  - Freshness Locking
  - Experimental Evaluations
- Conclusions
Introduction
Replication is an essential technique to improve read performance when writes are rare.

Different approaches have been studied so far:
- Eager replication
  - Synchronization within the same transaction
  - Conventional protocols have drawbacks regarding performance and scalability
  - Newer protocols reduce these drawbacks by using group communication
- Lazy replication
  - Decoupled replica maintenance
  - Additional effort is necessary to guarantee serializable executions
  - Earlier work focused on performance and correctness; the freshness of data was not sufficiently considered

Recently, the coordinated replication management proposed within the PowerDB project at ETH Zürich has addressed these freshness issues.
The PowerDB Approach
- Cluster of databases
  - Cluster of off-the-shelf PCs; each PC runs a commercially available RDBMS
  - Fast Ethernet connection (100 Mbit/s)
- Middleware architecture
  - Clients access the cluster through the middleware only
  - The cluster is divided into two parts (update sites and read-only sites)
  - Lazy replication management, eager from the user's perspective
- The "scale-out" vision
  - Adding new nodes yields higher performance; more nodes allow increased parallelism

[Figure: clients access the cluster of DBs (OLTP and OLAP) through the coordination middleware, which routes update and read-only transactions.]
The Early PowerDB Approach to Replication: FAS (Freshness-Aware Scheduling)
- Relies on full replication
- Read-only transactions execute where they are initiated
- Users may specify their freshness needs, i.e., how much the accessed data may deviate from up-to-date data
- Freshness is tracked at the database level, which locks the entire database
- Read-only sites are maintained by means of decoupled refresh transactions, e.g., on-demand refreshment
[Figure: update transactions (T1: w(a)) run on the update sites; read-only transactions (T2: r(a) r(b), T3: r(c) r(d)) run on fully replicated read-only sites (each holding a, b, c, d) via the PowerDB middleware; decoupled refresh transactions propagate changes. Example freshness requests: "I am fine with 2 minutes old data", "I want fresh data".]
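The on-demand refreshment above can be sketched as a small routing helper: a read-only transaction carries a staleness bound, and if no site satisfies it, a refresh transaction is run first. All names and the per-site state here are illustrative assumptions, not FAS's actual interfaces (FAS itself tracks freshness at the whole-database level).

```python
import time

# Hypothetical site state: last refresh time per read-only site.
site_last_refresh = {"s1": time.time() - 120, "s2": time.time() - 5}

def pick_site(max_staleness_sec, refresh):
    """Route a read-only transaction to a site whose data is at most
    max_staleness_sec old, refreshing a site on demand if none qualifies."""
    now = time.time()
    for site, ts in site_last_refresh.items():
        if now - ts <= max_staleness_sec:
            return site
    # No site is fresh enough: refresh the stalest one before executing.
    site = min(site_last_refresh, key=site_last_refresh.get)
    refresh(site)
    site_last_refresh[site] = time.time()
    return site
```

A user asking for "2 minutes old data" would call `pick_site(120, ...)`; a user wanting fresh data would pass a bound near zero and typically trigger a refresh.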
Systems Are Evolving… So Is PowerDB…
- Customized node groups for certain query types
  - Requires support for arbitrary physical data designs
- Read-only transactions may span many nodes, providing query parallelism
- Users may still specify their freshness needs
  - Freshness at the database-object level, i.e., fine-grained locking
- Read-only transactions should be served fast even with higher update rates and freshness requirements
  - Read-only sites must be kept as up-to-date as possible
[Figure: the update transaction T1 (w(a)) runs on update sites holding arbitrary partitions (e.g., a,b,d; b,d; a,c; a,b,c,d); read-only transactions T2 (r(a) r(b)) and T3 (r(b) r(d)) span multiple read-only sites through the PowerDB middleware; updates are continuously propagated to the read-only sites.]
Why Is There a Need for a New Replication Protocol?
- Distributed execution of read-only transactions might cause non-serializable global schedules
- Continuous update propagation must be coordinated with the execution of read-only transactions
- Arbitrary physical layouts and finer locking granules require more effort to maintain replicas and to execute read-only transactions
- A more sophisticated replication mechanism is required...
Overview of The PDBREP Protocol
[Figure: update transactions (T1, T2, T3, writing a, b, c, and d) execute on the update sites; their changes are serialized in a global log (TG1: SN 031011, w(a) w(b); TG2: SN 031012, w(c); TG3: SN 031013, w(d)) and broadcast to the read-only sites s3, s4, s5, each of which keeps a local propagation queue and an update counter vector (e.g., SN 031011, v[a]: 6, v[b]: 8, v[c]: 3, v[d]: 1) lagging behind the global counter vector (SN 031013, v[a]: 6, v[b]: 8, v[c]: 4, v[d]: 2).]

- All changes are serialized in a global log.
- Global log records are continuously broadcast to the read-only sites.
- They are enqueued in the local propagation queues in their serialization order.
- Localized log records are applied to a site by propagation transactions when that site is idle.
- Continuous broadcasting and update propagation keep each site as up-to-date as possible.
- The system does not allow propagation transactions to overwrite versions needed by read-only transactions (freshness locks).
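The broadcast-and-propagate mechanism above can be sketched as follows. A read-only site enqueues only the operations on objects it stores ("localized" records), applies them in serialization order when idle, and stops at any object pinned by a freshness lock. The class and its fields are illustrative assumptions, not the paper's data structures.

```python
from collections import deque

class ReadOnlySite:
    """Sketch of a PDBREP read-only site: broadcast log records are
    enqueued in serialization order and applied by propagation
    transactions while the site is idle."""
    def __init__(self, objects):
        self.versions = {o: 0 for o in objects}   # update counter vector
        self.queue = deque()                      # local propagation queue
        self.freshness_locked = set()             # objects pinned by readers

    def enqueue(self, record):
        # Keep only operations on locally stored objects (localization).
        ops = [o for o in record if o in self.versions]
        if ops:
            self.queue.append(ops)

    def propagate_while_idle(self):
        # Apply queued records in order, but never overwrite a version
        # that an ongoing read-only transaction has freshness-locked.
        while self.queue:
            if any(o in self.freshness_locked for o in self.queue[0]):
                break   # propagation stops until the reader commits
            for o in self.queue.popleft():
                self.versions[o] += 1
```

Propagation resumes from the head of the queue once the blocking read-only transaction commits and releases its freshness locks.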
Overview of The PDBREP Protocol
[Figure (continued): read-only transactions T4 (r(a)), T7 (r(c) r(d)), and T8 (r(a) r(d)) arrive at the read-only sites. Where fresh data is required, or freshness is not explicitly specified, refresh transactions T9 and T10 (applying TL2 and TG3, i.e., w(c) w(d)) bring the accessed objects up to the required versions, advancing the site's counter vector to the global one (SN 031013, v[a]: 6, v[b]: 8, v[c]: 4, v[d]: 2).]
To ensure correct executions, each read-only transaction determines, at its start, the versions of the objects it reads.
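This version fixing can be sketched as a small helper: given the global counter vector and a site's local vector (values taken from the slide), it computes which read objects must be refreshed before the site can serve the transaction. The function name and data layout are illustrative assumptions, not the protocol's exact structures.

```python
def required_refresh(global_vec, site_vec, read_set):
    """Return, for each object in read_set whose local version lags the
    version fixed at transaction start, the (current, required) pair."""
    return {o: (site_vec[o], global_vec[o])
            for o in read_set if site_vec[o] < global_vec[o]}
```

With the counter vectors from the slide, a transaction reading c and d on a site at SN 031011 needs c refreshed from version 3 to 4 and d from 1 to 2, while a read of a needs no refresh at all.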
Freshness Locks
[Timeline: propagation transactions run while the site is idle. When read-only transaction T1 arrives, propagation stops and a refresh transaction brings the accessed data to T1's required timestamp; T1 may then continue. When T1 commits, it releases its freshness locks. T2 and T3 each trigger another refresh and pause propagation in the same way.]
- Freshness locks are placed on objects to ensure that ongoing replica-maintenance transactions do not overwrite versions needed by ongoing read-only transactions.
- Freshness locks keep the objects accessed by a read-only transaction at a certain freshness level for the duration of that transaction.
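A minimal sketch of such a lock table follows: readers lock the objects they access, propagation checks the table before applying a change, and a commit releases the locks so propagation can resume. The API is hypothetical; it only illustrates the pin-until-commit behaviour described above.

```python
class FreshnessLockTable:
    """Sketch: a freshness lock pins an object at the state a read-only
    transaction fixed at its start; propagation must not advance the
    object while the lock is held."""
    def __init__(self):
        self.locks = {}   # object -> set of holding transaction ids

    def lock(self, txn, objects):
        for o in objects:
            self.locks.setdefault(o, set()).add(txn)

    def release(self, txn):
        # Called on commit: drop the transaction from every holder set.
        for holders in self.locks.values():
            holders.discard(txn)
        self.locks = {o: h for o, h in self.locks.items() if h}

    def may_propagate(self, obj):
        return obj not in self.locks
```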
Scheduling a Read-only Transaction
[Timeline figure: objects a and b on sites with different current freshness values (timestamps 1 through 8); freshness-lock requests by T1, T2, and T3 are shown against each object's required timestamp.]
- T1 (r1(a), r1(b), TS=1): the sites are older.
- T2 (r2(a), r2(b), TS=3): both sites are younger; b's freshness lock is upgraded and TS becomes 5.
- T3 (r3(a), r3(b), TS=7): there are younger sites; b's lock is upgraded and TS becomes 8.
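One reading of the timestamp upgrades in the T2 and T3 cases is: whenever every available copy of some accessed object is already newer than the requested timestamp, the transaction's timestamp is raised to the oldest such copy (and that copy's freshness lock is upgraded). The sketch below encodes only this rule; the copy timestamps in the usage are hypothetical values chosen to reproduce the slide's outcomes.

```python
def assign_timestamp(requested_ts, copy_timestamps):
    """Raise the transaction timestamp when all copies of some object
    are younger than requested (copy_timestamps: object -> list of the
    timestamps of its available copies)."""
    ts = requested_ts
    for versions in copy_timestamps.values():
        if min(versions) > ts:   # every copy is younger than requested
            ts = min(versions)   # settle on the oldest usable copy
    return ts
```

With TS=3 and b's copies at timestamps 5 and 6, the timestamp is raised to 5 (the T2 case); with TS=7 and a copy of b at 8, it becomes 8 (the T3 case).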
Experimental Evaluations
Investigating the influence of continuous update broadcasting and propagation on cluster performance. We considered three different...
- Settings (two basic options that can be switched on or off):
  1. No broadcasting and no propagation
  2. Broadcasting and no propagation
  3. Broadcasting and propagation
- Workloads: 50%, 75%, and 100% loaded clusters; a 50% loaded cluster means the cluster is busy evaluating queries for half of the experiment duration
- Freshness: five freshness levels (0.6, 0.7, 0.8, 0.9, 1.0, where 1.0 means the freshest) over a freshness window of 30 seconds (0.0 would mean 30-second-old data)
Looking at the scalability of PDBREP
Comparing PDBREP to its predecessor (FAS)
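The freshness levels used in the experiments can be turned into concrete staleness bounds, assuming (as the slide's endpoints suggest) a linear mapping over the 30-second freshness window: 1.0 tolerates no staleness, 0.0 tolerates data a full window old. The linearity between the endpoints is my assumption.

```python
def staleness_bound_sec(freshness, window_sec=30.0):
    """Map a freshness level in [0, 1] to the tolerated staleness in
    seconds, linearly over the freshness window (assumed mapping)."""
    if not 0.0 <= freshness <= 1.0:
        raise ValueError("freshness must be in [0, 1]")
    return (1.0 - freshness) * window_sec
```

Under this mapping, freshness 0.9 tolerates about 3 seconds of staleness and 0.6 about 12 seconds.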
Experimental Setup
- Cluster of 64 PCs: 1 GHz Pentium III, 256 MB RAM, 2 SCSI disks, 100 Mbit Ethernet; SQL Server 2000 running under Windows 2000 Advanced Server
- TPC-R database with scale factor 1, ~4.3 GB together with indexes; 200 updates per second
- Node groups (NGs) of 4 nodes: small tables are fully replicated, the huge ones are partitioned (over order_key) within the NGs
Average Query Evaluation Times for Different Workloads and Freshness Values
- Turning on propagation and/or broadcasting always improves performance.
- The lower the workload, the higher the performance gain, e.g., an 82% improvement for the 50% loaded cluster.
[Chart: query execution time (ms, 0 to 7000) for Settings 1-3 under 50%, 75%, and 100% workloads, at freshness levels 1, 0.9, 0.8, 0.7, and 0.6.]
Average Refresh Transaction Size for Different Workloads and Freshness Values
- Propagation eliminates the need for refresh transactions except at the maximum freshness requirements and workloads.
- This results in query execution times that are practically independent of the overall workload for the given update rate.
- For a fully loaded cluster, there is simply no time for propagation except at the beginning and end of transactions, which results in only a small performance improvement.
[Chart: refresh transaction size (number of changes, 0 to 3000, split into local and global changes) for Settings 1-3 under 50%, 75%, and 100% workloads, at freshness levels 1 down to 0.6.]
Scalability of PDBREP: Query Throughput for Varying Cluster Sizes
- PDBREP scales up with increasing cluster size (the chart shows scalability for the 50% loaded cluster).
- The results for freshness 0.9 and below are virtually identical due to local refresh transactions.
[Chart: queries per second (0 to 6) for cluster sizes 4, 8, 16, 32, and 64, at freshness levels 0.6, 0.7, 0.8, 0.9, and 1.]
PDBREP vs. FAS (Freshness-Aware Scheduling): Relative Query Throughput
- For all three workloads, PDBREP performs significantly better than FAS (by 30%, 72%, and 125%).
- PDBREP partitions the data while FAS relies on full replication, which results in smaller refresh transactions for PDBREP.
- PDBREP allows distributed executions and gains from parallelization.
[Chart: relative query throughput (0 to 100) for PDBREP vs. FAS under 50%, 75%, and 100% workloads, at freshness levels 1 down to 0.6.]
Conclusions
- PDBREP respects user-demanded freshness requirements and extends the notion of freshness to finer granules of data.
- PDBREP requires less refresh effort to serve queries, thanks to the continuous propagation of updates.
- PDBREP allows distributed execution of read-only transactions and produces globally correct schedules.
- PDBREP supports different physical data organization schemes.
- PDBREP scales even with higher update rates.