Fine-Grained Replication and Scheduling with Freshness and
Correctness Guarantees
F. Akal (1), C. Türker (1), H.-J. Schek (1), Y. Breitbart (2), T. Grabs (3), L. Veen (4)
(1) ETH Zurich, Institute of Information Systems, 8092 Zurich, Switzerland, {akal,tuerker,schek}@inf.ethz.ch
(2) Kent State University, Department of Computer Science, Kent OH 44240, USA, [email protected]
(3) One Microsoft Way, Redmond, WA 98052, USA, [email protected]
(4) University of Twente, 7500 AE Enschede, The Netherlands, [email protected]
This work was partially supported by Microsoft.
31st International Conference on Very Large Data Bases, Trondheim, Norway, 30 August – 2 September, 2005.
September 1st, 2005, Fuat Akal, ETH Zürich, [email protected]
Overview
- Introduction and Motivation
  - Replication in a Database Cluster
  - Need for a New Replication Scheme
- PowerDB Replication (PDBREP)
  - Overview of the PDBREP Protocol
  - Freshness Locking
  - Experimental Evaluations
- Conclusions
Introduction
Replication is an essential technique to improve read performance when writes are rare.

Different approaches have been studied so far:
- Eager replication
  - Synchronization within the same transaction
  - Conventional protocols have drawbacks regarding performance and scalability
  - Newer protocols reduce these drawbacks by using group communication
- Lazy replication
  - Decoupled replica maintenance
  - Additional effort is necessary to guarantee serializable executions
  - Earlier work focused on performance and correctness; the freshness of data was not sufficiently considered

Recently, the coordinated replication management proposed within the PowerDB project at ETH Zürich has addressed these freshness issues.
The PowerDB Approach
- Cluster of databases
  - Cluster of off-the-shelf PCs; each PC runs a commercially available RDBMS
  - Fast Ethernet connection (100 Mbit/s)
- Middleware architecture
  - Clients access the cluster through the middleware only
  - The cluster is divided into two parts (update sites and read-only sites)
  - Lazy replication management, eager from the user's perspective
- The "scale-out" vision
  - Adding new nodes yields higher performance; more nodes allow increased parallelism

[Figure: clients access the cluster of DBs (OLTP and OLAP) through the coordination middleware, which routes update and read-only transactions.]
The Early PowerDB Approach to Replication: FAS (Freshness-Aware Scheduling)
- Relies on full replication
- Read-only transactions execute where they are initiated
- Users may specify their freshness needs, i.e., how much the accessed data may deviate from up-to-date data
- Freshness is tracked at the database level, which locks the entire database
- Read-only sites are maintained by means of decoupled refresh transactions, e.g., on-demand refreshment
[Figure: update transactions (T1: w(a)) run on the update sites; read-only transactions (T2: r(a) r(b), T3: r(c) r(d)) run on fully replicated read-only sites (each holding a, b, c, d) via the PowerDB middleware; decoupled refresh transactions propagate changes. Example freshness requests: "I am fine with 2 minutes old data", "I want fresh data".]
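The on-demand refreshment above can be sketched as a small routing helper: a read-only transaction carries a staleness bound, and if no site satisfies it, a refresh transaction is run first. All names and the per-site state here are illustrative assumptions, not FAS's actual interfaces (FAS itself tracks freshness at the whole-database level).

```python
import time

# Hypothetical site state: last refresh time per read-only site.
site_last_refresh = {"s1": time.time() - 120, "s2": time.time() - 5}

def pick_site(max_staleness_sec, refresh):
    """Route a read-only transaction to a site whose data is at most
    max_staleness_sec old, refreshing a site on demand if none qualifies."""
    now = time.time()
    for site, ts in site_last_refresh.items():
        if now - ts <= max_staleness_sec:
            return site
    # No site is fresh enough: refresh the stalest one before executing.
    site = min(site_last_refresh, key=site_last_refresh.get)
    refresh(site)
    site_last_refresh[site] = time.time()
    return site
```

A user asking for "2 minutes old data" would call `pick_site(120, ...)`; a user wanting fresh data would pass a bound near zero and typically trigger a refresh.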
Systems Are Evolving… So Is PowerDB…
- Customized node groups for certain query types
  - Requires support for arbitrary physical data designs
- Read-only transactions may span many nodes, providing query parallelism
- Users may still specify their freshness needs
  - Freshness at the database-object level, i.e., fine-grained locking
- Read-only transactions should be served fast even with higher update rates and freshness requirements
  - Read-only sites must be kept as up-to-date as possible
[Figure: the update transaction T1 (w(a)) runs on update sites holding arbitrary partitions (e.g., a,b,d; b,d; a,c; a,b,c,d); read-only transactions T2 (r(a) r(b)) and T3 (r(b) r(d)) span multiple read-only sites through the PowerDB middleware; updates are continuously propagated to the read-only sites.]
Why Is There a Need for a New Replication Protocol?
- Distributed execution of read-only transactions might cause non-serializable global schedules
- Continuous update propagation must be coordinated with the execution of read-only transactions
- Arbitrary physical layouts and finer locking granules require more effort to maintain replicas and to execute read-only transactions
- A more sophisticated replication mechanism is required...
Overview of The PDBREP Protocol
[Figure: update transactions (T1, T2, T3, writing a, b, c, and d) execute on the update sites; their changes are serialized in a global log (TG1: SN 031011, w(a) w(b); TG2: SN 031012, w(c); TG3: SN 031013, w(d)) and broadcast to the read-only sites s3, s4, s5, each of which keeps a local propagation queue and an update counter vector (e.g., SN 031011, v[a]: 6, v[b]: 8, v[c]: 3, v[d]: 1) lagging behind the global counter vector (SN 031013, v[a]: 6, v[b]: 8, v[c]: 4, v[d]: 2).]

- All changes are serialized in a global log.
- Global log records are continuously broadcast to the read-only sites.
- They are enqueued in the local propagation queues in their serialization order.
- Localized log records are applied to a site by propagation transactions when that site is idle.
- Continuous broadcasting and update propagation keep each site as up-to-date as possible.
- The system does not allow propagation transactions to overwrite versions needed by read-only transactions (freshness locks).
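The broadcast-and-propagate mechanism above can be sketched as follows. A read-only site enqueues only the operations on objects it stores ("localized" records), applies them in serialization order when idle, and stops at any object pinned by a freshness lock. The class and its fields are illustrative assumptions, not the paper's data structures.

```python
from collections import deque

class ReadOnlySite:
    """Sketch of a PDBREP read-only site: broadcast log records are
    enqueued in serialization order and applied by propagation
    transactions while the site is idle."""
    def __init__(self, objects):
        self.versions = {o: 0 for o in objects}   # update counter vector
        self.queue = deque()                      # local propagation queue
        self.freshness_locked = set()             # objects pinned by readers

    def enqueue(self, record):
        # Keep only operations on locally stored objects (localization).
        ops = [o for o in record if o in self.versions]
        if ops:
            self.queue.append(ops)

    def propagate_while_idle(self):
        # Apply queued records in order, but never overwrite a version
        # that an ongoing read-only transaction has freshness-locked.
        while self.queue:
            if any(o in self.freshness_locked for o in self.queue[0]):
                break   # propagation stops until the reader commits
            for o in self.queue.popleft():
                self.versions[o] += 1
```

Propagation resumes from the head of the queue once the blocking read-only transaction commits and releases its freshness locks.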
Overview of The PDBREP Protocol
[Figure (continued): read-only transactions T4 (r(a)), T7 (r(c) r(d)), and T8 (r(a) r(d)) arrive at the read-only sites. Where fresh data is required, or freshness is not explicitly specified, refresh transactions T9 and T10 (applying TL2 and TG3, i.e., w(c) w(d)) bring the accessed objects up to the required versions, advancing the site's counter vector to the global one (SN 031013, v[a]: 6, v[b]: 8, v[c]: 4, v[d]: 2).]
To ensure correct executions, each read-only transaction determines, at its start, the versions of the objects it reads.
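This version fixing can be sketched as a small helper: given the global counter vector and a site's local vector (values taken from the slide), it computes which read objects must be refreshed before the site can serve the transaction. The function name and data layout are illustrative assumptions, not the protocol's exact structures.

```python
def required_refresh(global_vec, site_vec, read_set):
    """Return, for each object in read_set whose local version lags the
    version fixed at transaction start, the (current, required) pair."""
    return {o: (site_vec[o], global_vec[o])
            for o in read_set if site_vec[o] < global_vec[o]}
```

With the counter vectors from the slide, a transaction reading c and d on a site at SN 031011 needs c refreshed from version 3 to 4 and d from 1 to 2, while a read of a needs no refresh at all.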
Freshness Locks
[Timeline: propagation transactions run while the site is idle. When read-only transaction T1 arrives, propagation stops and a refresh transaction brings the accessed data to T1's required timestamp; T1 may then continue. When T1 commits, it releases its freshness locks. T2 and T3 each trigger another refresh and pause propagation in the same way.]
- Freshness locks are placed on objects to ensure that ongoing replica-maintenance transactions do not overwrite versions needed by ongoing read-only transactions.
- Freshness locks keep the objects accessed by a read-only transaction at a certain freshness level for the duration of that transaction.
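A minimal sketch of such a lock table follows: readers lock the objects they access, propagation checks the table before applying a change, and a commit releases the locks so propagation can resume. The API is hypothetical; it only illustrates the pin-until-commit behaviour described above.

```python
class FreshnessLockTable:
    """Sketch: a freshness lock pins an object at the state a read-only
    transaction fixed at its start; propagation must not advance the
    object while the lock is held."""
    def __init__(self):
        self.locks = {}   # object -> set of holding transaction ids

    def lock(self, txn, objects):
        for o in objects:
            self.locks.setdefault(o, set()).add(txn)

    def release(self, txn):
        # Called on commit: drop the transaction from every holder set.
        for holders in self.locks.values():
            holders.discard(txn)
        self.locks = {o: h for o, h in self.locks.items() if h}

    def may_propagate(self, obj):
        return obj not in self.locks
```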
Scheduling a Read-only Transaction
[Timeline figure: objects a and b on sites with different current freshness values (timestamps 1 through 8); freshness-lock requests by T1, T2, and T3 are shown against each object's required timestamp.]
- T1 (r1(a), r1(b), TS=1): the sites are older.
- T2 (r2(a), r2(b), TS=3): both sites are younger; b's freshness lock is upgraded and TS becomes 5.
- T3 (r3(a), r3(b), TS=7): there are younger sites; b's lock is upgraded and TS becomes 8.
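One reading of the timestamp upgrades in the T2 and T3 cases is: whenever every available copy of some accessed object is already newer than the requested timestamp, the transaction's timestamp is raised to the oldest such copy (and that copy's freshness lock is upgraded). The sketch below encodes only this rule; the copy timestamps in the usage are hypothetical values chosen to reproduce the slide's outcomes.

```python
def assign_timestamp(requested_ts, copy_timestamps):
    """Raise the transaction timestamp when all copies of some object
    are younger than requested (copy_timestamps: object -> list of the
    timestamps of its available copies)."""
    ts = requested_ts
    for versions in copy_timestamps.values():
        if min(versions) > ts:   # every copy is younger than requested
            ts = min(versions)   # settle on the oldest usable copy
    return ts
```

With TS=3 and b's copies at timestamps 5 and 6, the timestamp is raised to 5 (the T2 case); with TS=7 and a copy of b at 8, it becomes 8 (the T3 case).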
Experimental Evaluations
Investigating the influence of continuous update broadcasting and propagation on cluster performance. We considered three different...
- Settings (two basic options that can be switched on or off):
  1. No broadcasting and no propagation
  2. Broadcasting and no propagation
  3. Broadcasting and propagation
- Workloads: 50%, 75%, and 100% loaded clusters; a 50% loaded cluster means the cluster is busy evaluating queries for half of the experiment duration
- Freshness: five freshness levels (0.6, 0.7, 0.8, 0.9, 1.0, where 1.0 means the freshest) over a freshness window of 30 seconds (0.0 would mean 30-second-old data)
Looking at the scalability of PDBREP
Comparing PDBREP to its predecessor (FAS)
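The freshness levels used in the experiments can be turned into concrete staleness bounds, assuming (as the slide's endpoints suggest) a linear mapping over the 30-second freshness window: 1.0 tolerates no staleness, 0.0 tolerates data a full window old. The linearity between the endpoints is my assumption.

```python
def staleness_bound_sec(freshness, window_sec=30.0):
    """Map a freshness level in [0, 1] to the tolerated staleness in
    seconds, linearly over the freshness window (assumed mapping)."""
    if not 0.0 <= freshness <= 1.0:
        raise ValueError("freshness must be in [0, 1]")
    return (1.0 - freshness) * window_sec
```

Under this mapping, freshness 0.9 tolerates about 3 seconds of staleness and 0.6 about 12 seconds.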
Experimental Setup
- Cluster of 64 PCs: 1 GHz Pentium III, 256 MB RAM, 2 SCSI disks, 100 Mbit Ethernet; SQL Server 2000 running under Windows 2000 Advanced Server
- TPC-R database with scale factor 1, ~4.3 GB together with indexes; 200 updates per second
- Node groups (NGs) of 4 nodes: small tables are fully replicated, the huge ones are partitioned (over order_key) within the NGs
Average Query Evaluation Times for Different Workloads and Freshness Values
- Turning on propagation and/or broadcasting always improves performance.
- The lower the workload, the higher the performance gain, e.g., an 82% improvement for the 50% loaded cluster.
[Chart: query execution time (ms, 0 to 7000) for Settings 1-3 under 50%, 75%, and 100% workloads, at freshness levels 1, 0.9, 0.8, 0.7, and 0.6.]
Average Refresh Transaction Size for Different Workloads and Freshness Values
- Propagation eliminates the need for refresh transactions except at the maximum freshness requirements and workloads.
- This results in query execution times that are practically independent of the overall workload for the given update rate.
- For a fully loaded cluster, there is simply no time for propagation except at the beginning and end of transactions, which results in only a small performance improvement.
[Chart: refresh transaction size (number of changes, 0 to 3000, split into local and global changes) for Settings 1-3 under 50%, 75%, and 100% workloads, at freshness levels 1 down to 0.6.]
Scalability of PDBREP: Query Throughput for Varying Cluster Sizes
- PDBREP scales up with increasing cluster size (the chart shows scalability for the 50% loaded cluster).
- The results for freshness 0.9 and below are virtually identical due to local refresh transactions.
[Chart: queries per second (0 to 6) for cluster sizes 4, 8, 16, 32, and 64, at freshness levels 0.6, 0.7, 0.8, 0.9, and 1.]
PDBREP vs. FAS (Freshness-Aware Scheduling): Relative Query Throughput
- For all three workloads, PDBREP performs significantly better than FAS (by 30%, 72%, and 125%).
- PDBREP partitions the data while FAS relies on full replication, which results in smaller refresh transactions for PDBREP.
- PDBREP allows distributed executions and gains from parallelization.
[Chart: relative query throughput (0 to 100) for PDBREP vs. FAS under 50%, 75%, and 100% workloads, at freshness levels 1 down to 0.6.]
Conclusions
- PDBREP respects user-demanded freshness requirements and extends the notion of freshness to finer granules of data.
- PDBREP requires less refresh effort to serve queries, thanks to the continuous propagation of updates.
- PDBREP allows distributed execution of read-only transactions and produces globally correct schedules.
- PDBREP supports different physical data organization schemes.
- PDBREP scales even with higher update rates.