TRANSCRIPT
Elasca: Workload-Aware Elastic Scalability for Partition-Based Database Systems
Taha Rafiq
MMath Thesis Presentation
24/04/2013
Slide 2: Outline
1. Introduction & Motivation
2. VoltDB & Elastic Scale-Out Mechanism
3. Partition Placement Problem
4. Workload-Aware Optimizer
5. Experiments & Results
6. Supporting Multi-Partition Transactions
7. Conclusion
Slide 3: INTRODUCTION & MOTIVATION
Slide 4: DBMS Scalability
• Replication
• Partitioning
Slide 5: Traditional (DBMS) Scalability
Scalability: the ability of a system to be enlarged to handle a growing amount of work
[Diagram: Higher Load → Add Resources → Better Performance, at the cost of expensive downtime]
Slide 6: Elastic (DBMS) Scalability
Elasticity: the use of computing resources that vary dynamically to meet a variable workload
[Diagram: Higher Load → Dynamically Add Resources → Better Performance, with no downtime]
Slide 7: Elastically Scaling a Partition-Based DBMS via Re-Partitioning
[Diagram: scale out splits Partition 1 on Node 1 into Partition 1 (Node 1) and Partition 2 (Node 2); scale in merges them back onto Node 1]
Slide 8: Elastically Scaling a Partition-Based DBMS via Partition Migration
[Diagram: Node 1 initially holds P1–P4; scale out migrates P3 and P4 to a new Node 2, leaving P1 and P2 on Node 1; scale in migrates them back]
Slide 9: Partition Migration for Elastic Scalability
• Mechanism: how to add/remove nodes and move partitions
• Policy/Strategy: which partitions to move, when, and where during scale out/scale in
Slide 10: Elasca
Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer
Slide 11: VOLTDB & ELASTIC SCALE-OUT MECHANISM
Slide 12: What is VoltDB?
• In-memory, partition-based DBMS
– No disk access = very fast
• Shared-nothing architecture, serial execution
– No locks
• Stored procedures
– No arbitrary transactions
• Replication
– Fault tolerance & durability
Slide 13: VoltDB Architecture
[Diagram: three nodes, each with a Client Interface, an Initiator, and two execution-site threads (ES1, ES2) hosting partitions (P1 P2 / P3 P1 / P2 P3); clients connect to the client interfaces]
Slide 14: Single-Partition Transactions
[Diagram: the architecture from slide 13; a single-partition transaction runs entirely at the one execution site hosting its partition]
Slide 15: Multi-Partition Transactions
[Diagram: the architecture from slide 13; a multi-partition transaction is coordinated by one execution site (ES1) and touches partitions on multiple nodes]
Slide 16: Elastic Scale-Out Mechanism
[Diagram: a scale-out node with its own Client Interface, Initiator, and execution sites (ES1, ES4) takes over partitions P1 and P4, whose original copies are marked failed]
Slide 17: Overcommitting Cores
• VoltDB suggests: partitions per node < cores per node
• Wasted resources when load is low or data access is skewed
• Idea: aggregate extra partitions on each node and scale out when load increases
Slide 18: PARTITION PLACEMENT PROBLEM
Slide 19: Given… Cluster and System Specifications
• Number of CPU cores
• Memory
• Max. number of nodes
Slide 20: Given… Load Per Partition
[Bar chart: requests per second for partitions P1–P8, on a scale of 0 to 3000]
Slide 21: Given… Size of Each Partition
[Bar chart: size in MB for partitions P1–P8, on a scale of 0 to 1200]
Slide 22: Given… Current Partition-to-Node Assignment
[Table: current assignment of partitions P1–P8 to Nodes 1–3]
Slide 23: Find… Optimal Partition-to-Node Assignment (For Next Time Interval)
[Table: partitions P1–P8 against Nodes 1–3, with every entry to be determined (?)]
Slide 24: Optimization Objectives
• Maximize throughput: match the performance of a static, fully provisioned system
• Minimize resources used: use the minimum number of nodes required to meet performance demands
Slide 25: Optimization Objectives
• Minimize data movement: data movement adversely affects system performance and incurs network costs
• Balance load effectively: minimizes the risk of overloading a node during the next time interval
Slide 26: WORKLOAD-AWARE OPTIMIZER
Slide 27: System Overview
[Diagram: system overview of the workload-aware optimizer]
Slide 28: Statistics Collected
• α: the maximum number of transactions that can be executed on a partition per second
– The max capacity of an execution site
• β: the CPU overhead of host-level tasks
– How much CPU capacity the Initiator uses
Slide 29: Effect of β
[Chart: effect of β]
Slide 30: Estimating CPU Load
[Equations for: the CPU load generated by each partition, the average CPU load of host-level tasks per node, and the average CPU load per node]
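The three estimates named on this slide can be sketched in Python. This is a hypothetical reconstruction from the statistics α and β on slide 28, not the thesis's exact formulas: per-partition load is taken as request rate divided by α, and per-node load adds the host-level overhead β.

```python
# Hypothetical sketch of the slide-30 estimates (the exact formulas are in
# the thesis): alpha = max transactions/sec a partition can execute
# (slide 28), beta = CPU overhead of host-level tasks such as the Initiator.

def partition_cpu_load(request_rate, alpha):
    """CPU load generated by one partition, as a fraction of an
    execution site's capacity."""
    return request_rate / alpha

def node_cpu_load(partition_rates, alpha, beta):
    """Average CPU load of a node: the sum of its partitions' loads
    plus the host-level overhead beta."""
    return sum(partition_cpu_load(r, alpha) for r in partition_rates) + beta
```

For example, a node hosting partitions at 1500 and 500 requests/second with α = 4000 and β = 0.25 would be estimated at 0.75 of a core-equivalent.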
Slide 31: Optimizer Details
• Mathematical optimization vs. heuristics
• Mixed-Integer Linear Programming (MILP)
• Can be solved using any general-purpose solver (we use IBM ILOG CPLEX)
• Applicable to a wide variety of scenarios
Slide 32: Objective Function
Minimizes data movement as the primary objective and balances load as the secondary objective
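In words, the slide gives data movement as the primary term and load balance as a secondary term, weighted by ε (the subject of the next slide). A sketch of such an objective, with hypothetical symbols since the exact MILP formulation is in the thesis:

```latex
% Hypothetical sketch, not the thesis's exact formulation.
% x_{p,n} = 1 if partition p is assigned to node n,
% m_{p,n} = data moved if p lands on n, L_n = estimated load of node n.
\min \sum_{p}\sum_{n} m_{p,n}\, x_{p,n}
     \;+\; \varepsilon \cdot \bigl(\max_{n} L_n - \min_{n} L_n\bigr)
```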
Slide 33: Effect of ε
[Chart: effect of ε]
Slide 34: Minimizing Resources Used
• Calculate the minimum number of nodes that can handle the load of all the partitions
– Non-integer assignment
• Explicitly tell the optimizer how many nodes to use
• If the optimizer can't find a solution with the minimum N nodes, it tries again with N + 1 nodes
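The loop described on this slide can be sketched as follows (hypothetical names; in Elasca the inner call is the MILP solve, which is passed in here as a function):

```python
import math

# Hypothetical sketch of slide 34's node-minimization loop; solve(n)
# stands in for one call to the MILP optimizer with n nodes allowed.

def min_nodes(partition_loads, node_capacity):
    """Lower bound from the non-integer assignment: total load divided
    by per-node CPU capacity, rounded up."""
    return math.ceil(sum(partition_loads) / node_capacity)

def solve_with_fewest_nodes(partition_loads, node_capacity, max_nodes, solve):
    """Ask the optimizer to use the minimum node count; if that is
    infeasible, retry with N + 1 nodes up to the cluster maximum."""
    n = min_nodes(partition_loads, node_capacity)
    while n <= max_nodes:
        solution = solve(n)        # hypothetical MILP call; None = infeasible
        if solution is not None:
            return n, solution
        n += 1
    return None
```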
Slide 35: Constraints
• Replication: replicas of a given partition must be assigned to different nodes
• CPU capacity: the sum of the loads of a node's partitions must be less than the node's capacity
• Memory capacity: all the partitions assigned to a node must fit in its memory
• Host-level tasks: the overhead of host-level tasks must not exceed the capacity of a single core
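A feasibility check equivalent to these four constraints can be sketched in plain Python (the optimizer encodes them as MILP constraints; the names and parameter shapes here are illustrative):

```python
# Hypothetical validity check for the slide-35 constraints on a candidate
# placement; all names are illustrative, not the thesis's notation.

def is_feasible(assignment, loads, sizes, cpu_capacity, mem_capacity, beta):
    """assignment maps (partition, replica_index) -> node; loads/sizes map
    partition -> CPU load / size in MB; beta is the host-level overhead
    in core-equivalents. Returns True iff all four constraints hold."""
    # Replication: replicas of a given partition on different nodes.
    replica_nodes = {}
    for (part, _replica), node in assignment.items():
        replica_nodes.setdefault(part, []).append(node)
    if any(len(set(ns)) != len(ns) for ns in replica_nodes.values()):
        return False
    # Host-level tasks: overhead must not exceed the capacity of one core.
    if beta > 1.0:
        return False
    for node in set(assignment.values()):
        parts = [p for (p, _r), n in assignment.items() if n == node]
        # CPU capacity: partition loads plus host overhead fit the node.
        if sum(loads[p] for p in parts) + beta > cpu_capacity:
            return False
        # Memory capacity: assigned partitions must fit in memory.
        if sum(sizes[p] for p in parts) > mem_capacity:
            return False
    return True
```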
Slide 36: Staggering Scale In
• A fluctuating workload can result in excessive data movement
• Staggering scale in mitigates this problem
• Delay scaling in by s time steps
• Slightly more resources used, in exchange for stability
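One way to realize the staggered scale-in described here is a small state machine: a minimal sketch, assuming scale-out is applied immediately while scale-in waits for s consecutive requests (the class and names below are illustrative, not from the thesis).

```python
# Hypothetical sketch of slide 36's staggered scale-in policy.

class StaggeredScaler:
    """Apply scale-out immediately, but delay scale-in until the
    optimizer has requested fewer nodes for s consecutive time steps."""

    def __init__(self, s, initial_nodes):
        self.s = s
        self.current = initial_nodes
        self.pending = 0   # consecutive intervals that wanted fewer nodes

    def decide(self, desired_nodes):
        if desired_nodes >= self.current:
            self.current = desired_nodes   # scale out (or hold) right away
            self.pending = 0
        else:
            self.pending += 1
            if self.pending >= self.s:     # enough consecutive requests
                self.current = desired_nodes
                self.pending = 0
        return self.current
```

A brief fluctuation (one interval asking for fewer nodes) then no longer triggers data movement; only a sustained drop does, at the cost of slightly higher resource use.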
Slide 37: EXPERIMENTAL EVALUATION
Slide 38: Optimizers Evaluated
• ELASCA: our workload-aware optimizer
• ELASCA-S: ELASCA with staggered scale in
• OFFLINE: an offline optimizer that minimizes resources used and data movement
• GREEDY: a greedy first-fit optimizer
• SCO: a static, fully provisioned system (no optimization)
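The GREEDY baseline is described only as a greedy first-fit optimizer; a plausible sketch (the thesis's exact heuristic may differ) is decreasing-load first-fit bin packing:

```python
# Hypothetical first-fit-decreasing placement, as one reading of the
# GREEDY baseline on slide 38.

def greedy_first_fit(loads, node_capacity):
    """loads maps partition -> CPU load. Visit partitions in decreasing
    load order; place each on the first node with spare capacity,
    opening a new node when none fits. Returns partition -> node index."""
    placement, node_load = {}, []
    for part in sorted(loads, key=loads.get, reverse=True):
        for i, used in enumerate(node_load):
            if used + loads[part] <= node_capacity:
                placement[part], node_load[i] = i, used + loads[part]
                break
        else:   # no existing node fits: open a new one
            placement[part] = len(node_load)
            node_load.append(loads[part])
    return placement
```

Greedy placement is fast but ignores data movement: it may shuffle many partitions between intervals, which is exactly what the MILP objective penalizes.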
Slide 39: Benchmarks Used
• TPC-C: modified to be cleanly partitioned and fit in memory (3.6 GB)
• TATP: Telecommunication Application Transaction Processing benchmark (250 MB)
• YCSB: Yahoo! Cloud Serving Benchmark with a 50/50 read/write ratio (1 GB)
Slide 40: Dynamic Workloads
• Varying the aggregate request rate
– Periodic waveforms: sine, triangle, sawtooth
• Skewing the data access
– Temporal skew
– Statistical distributions: uniform, normal, categorical, Zipfian
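The periodic waveforms named above can be sketched as a small generator; the base rate, amplitude, and period below are illustrative, not the experiment's actual settings:

```python
import math

# Hypothetical generator for the slide-40 request-rate waveforms.

def request_rate(shape, t, period, base, amplitude):
    """Aggregate request rate at time t for a periodic waveform that
    oscillates between base and base + amplitude."""
    phase = (t % period) / period          # position within the cycle, [0, 1)
    if shape == "sine":
        return base + amplitude * (1 + math.sin(2 * math.pi * phase)) / 2
    if shape == "triangle":
        return base + amplitude * (1 - abs(2 * phase - 1))
    if shape == "sawtooth":
        return base + amplitude * phase
    raise ValueError(f"unknown shape: {shape}")
```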
Slide 41: Temporal Skew
[Charts: load across partitions P1–P8 at time steps t = 1, 2, 3, 4, and back to t = 1; the heavily loaded partitions change from step to step]
Slide 42: Experimental Setup
• Each experiment run for 1 hour
• 15 time intervals
– Optimizer run every four minutes
• Combination of simulation and actual runs
– Exact numbers for data movement, resources used, and load balance obtained through simulation
• Cluster of 4 nodes, plus 2 separate client machines
Slide 43: Data Movement (TPC-C), Triangle Wave (f = 1) [chart]
Slide 44: Data Movement (TPC-C), Triangle Wave (f = 1), Zipfian Skew [chart]
Slide 45: Data Movement (TPC-C), Triangle Wave (f = 4) [chart]
Slide 46: Computing Resources Saved (TPC-C), Triangle Wave (f = 1) [chart]
Slide 47: Load Balance (TPC-C), Triangle Wave (f = 1) [chart]
Slide 48: Database Throughput (TPC-C), Sine Wave (f = 2) [chart]
Slide 49: Database Throughput (TPC-C), Sine Wave (f = 2), Normal Skew [chart]
Slide 50: Database Throughput (TATP), Sine Wave (f = 2) [chart]
Slide 51: Database Throughput (YCSB), Sine Wave (f = 2) [chart]
Slide 52: Database Throughput (TPC-C), Triangle Wave (f = 4) [chart]
Slide 53: Optimizer Scalability [chart]
Slide 54: SUPPORTING MULTI-PARTITION TRANSACTIONS
Slide 55: Factors Affecting Performance
• Maximum MPT throughput (η): the maximum number of transactions an execution site can coordinate per second
• Probability of MPTs (p_mpt): the percentage of transactions that are MPTs
• Partitions involved in MPTs: the number of partitions involved in each MPT
Slide 56: Changes to Model
The CPU load generated by each partition is the sum of:
1. Load due to transaction work (same as SPTs)
2. Load due to coordinating MPTs
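Folding the MPT coordination cost into the per-partition load estimate might look like the sketch below; this is a hypothetical formula (the exact model is in the thesis), using η and p_mpt from slide 55 and the per-partition execution capacity α from slide 28:

```python
# Hypothetical sketch of the slide-56 extended load model.

def partition_cpu_load_with_mpts(request_rate, p_mpt, alpha, eta):
    """Per-partition CPU load, per slide 56: (1) load due to transaction
    work, as for single-partition transactions, plus (2) load due to
    coordinating this partition's multi-partition transactions, where
    eta is the max MPTs an execution site can coordinate per second."""
    work = request_rate / alpha                  # transaction work
    coordination = (request_rate * p_mpt) / eta  # MPT coordination
    return work + coordination
```

Because η is typically much smaller than α, even a modest MPT fraction can dominate a partition's estimated load.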
Slide 57: Maximum MPT Throughput [chart]
Slide 58: Probability of MPTs [chart]
Slide 59: Effect on Resources Saved [chart]
Slide 60: Effect on Data Movement [chart]
Slide 61: CONCLUSION
Slide 62: Related Work
• Data replication and partitioning
• Database consolidation
• Live database migration
• Key-value stores
• Data placement
Slide 63: Elasca
Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer
Slide 64: Conclusion
• Elasca = Mechanism + Optimizer
• Workload-aware optimizer
– Meets performance demands
– Minimizes computing resources used
– Minimizes data movement
– Effectively balances load
• Scalable to large problem sizes in an online setting
Slide 65: Future Work
• Migrating to VoltDB 3.0
– Intelligent client routing, master/slave partitions
• Supporting multi-partition transactions
• Automated parameter tuning
• Transaction mixes
• Workload prediction
Slide 66: Thank You
Questions?