CALVIN: FAST DISTRIBUTED TRANSACTIONS FOR
PARTITIONED DATABASE SYSTEMS
THOMSON ET AL., SIGMOD 2012
Presented by Dr. Greg Speegle
April 12, 2013
DISTRIBUTED TRANSACTIONS
- Two-phase commit slow relative to local transaction processing
- CAP Theorem
  - Option 1: Reduce availability
  - Option 2: Reduce consistency
- Goal: Provide availability and consistency by changing transaction semantics
DETERMINISTIC TRANSACTIONS
- Normal transaction execution
  - Submit SQL statements
  - Subsequent operations dependent on results
- Deterministic transaction execution
  - Submit all requests before start
  - Example: Auto-commit
  - Difficult for dependent execution
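
To make "submit all requests before start" concrete, here is a minimal sketch in which a transaction declares its read and write sets up front and carries its logic as a pure function. The `Txn` class and `apply_in_order` helper are hypothetical names for illustration, not part of Calvin. The point it shows is that two replicas applying the same ordered list end in the same state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Txn:
    """Hypothetical deterministic transaction: everything is known before it starts."""
    txn_id: str
    read_set: FrozenSet[str]
    write_set: FrozenSet[str]
    logic: Callable[[Dict[str, int]], Dict[str, int]]  # maps read values to writes


def apply_in_order(txns, store):
    """Apply transactions one by one in the given order (assumed already agreed)."""
    for t in txns:
        reads = {k: store.get(k, 0) for k in t.read_set}
        writes = t.logic(reads)
        # A transaction may only write keys it declared in advance.
        assert set(writes) <= t.write_set
        store.update(writes)
    return store


transfer = Txn("t0", frozenset({"a", "b"}), frozenset({"a", "b"}),
               lambda r: {"a": r["a"] - 10, "b": r["b"] + 10})
double_b = Txn("t1", frozenset({"b"}), frozenset({"b"}),
               lambda r: {"b": r["b"] * 2})

# Two replicas start from the same state and see the same order.
replica_1 = apply_in_order([transfer, double_b], {"a": 100, "b": 0})
replica_2 = apply_in_order([transfer, double_b], {"a": 100, "b": 0})
```

Because the order and the inputs are fixed before execution, no commit protocol is needed for the replicas to agree on the outcome.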
ARCHITECTURE
- Sequencing Layer
  - Per replica
  - Creates universal transaction execution order
- Scheduling Layer
  - Per data store
  - Executes transactions consistently with order
- Storage Layer
  - CRUD interface
ARCHITECTURE OVERVIEW
DATA MODEL
- Dataset partitioned
- Partitions are replicated
- One copy of each partition forms replica
- All replicas of one partition form replication group
- Master/slave within replication group (for asynchronous replication)
SEQUENCER
- Requests (deterministic transaction) submitted locally
- Epoch: 10 ms group of requests
- Asynchronous replication: master receives all requests & determines order
- Synchronous replication: Paxos determines order
- Batch sent to scheduler
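
The epoch batching above can be sketched as follows. This is a simplified illustration that assumes several sequencers, with their batches merged by epoch number and then by sequencer id. The function names and the tie-breaking rule are assumptions for the example, and replication (master or Paxos) is left out entirely.

```python
from collections import defaultdict

EPOCH_MS = 10  # epoch length taken from the slide


def batch_by_epoch(requests):
    """Group requests into batches keyed by (epoch number, sequencer id).

    requests: list of (arrival_ms, sequencer_id, txn_id) tuples,
    listed in arrival order for each sequencer.
    """
    batches = defaultdict(list)
    for arrival_ms, sequencer_id, txn_id in requests:
        batches[(arrival_ms // EPOCH_MS, sequencer_id)].append(txn_id)
    return batches


def global_order(batches):
    """Merge batches deterministically: earlier epoch first, then lower sequencer id.

    The tie-breaking rule is an assumption made for this sketch.
    """
    order = []
    for key in sorted(batches):
        order.extend(batches[key])
    return order


requests = [(3, 1, "t_a"), (12, 0, "t_b"), (7, 0, "t_c"), (9, 1, "t_d")]
order = global_order(batch_by_epoch(requests))
```

Here `t_c`, `t_a` and `t_d` all arrive in epoch 0 and `t_b` in epoch 1, so every scheduler that receives the same batches derives the same order without further coordination.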
SCHEDULER
- Logical concurrency control & recovery (e.g., no TIDs)
- Lock manager distributed (lock only keys stored locally)
- Strict 2PL with changes:
  - If t0 and t1 conflict and t0 precedes t1 in sequence order, t0 locks before t1
  - All lock requests by transaction processed together in sequence order
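
A minimal sketch of the ordered locking rule, assuming exclusive locks only and one FIFO queue per key. The class name `DeterministicLockManager` and its methods are hypothetical. Shared (read) locks and the actual execution of transactions are omitted to keep the example short.

```python
from collections import defaultdict, deque


class DeterministicLockManager:
    """Grants locks strictly in the order transactions are presented."""

    def __init__(self, local_keys):
        self.local_keys = set(local_keys)   # only keys stored on this node
        self.queues = defaultdict(deque)    # key -> waiting txn ids, in sequence order
        self.needed = {}                    # txn id -> local keys it requested

    def request_all(self, txn_id, keys):
        """Enqueue every lock request of one transaction at once.

        Must be called in sequence order, so an earlier transaction is
        always ahead of a later conflicting one in each queue.
        """
        local = self.local_keys & set(keys)  # remote keys are locked elsewhere
        self.needed[txn_id] = local
        for key in local:
            self.queues[key].append(txn_id)

    def ready(self, txn_id):
        """A transaction may run once it heads the queue of every key it needs."""
        return all(self.queues[k][0] == txn_id for k in self.needed[txn_id])

    def release_all(self, txn_id):
        """Release all locks after the transaction finishes (strict 2PL)."""
        for key in self.needed.pop(txn_id):
            assert self.queues[key][0] == txn_id
            self.queues[key].popleft()


lm = DeterministicLockManager({"x", "y"})
lm.request_all("t0", ["x", "z"])   # "z" lives on another node and is ignored here
lm.request_all("t1", ["x", "y"])   # conflicts with t0 on "x"
t0_ready_first = lm.ready("t0")
t1_ready_first = lm.ready("t1")
lm.release_all("t0")
t1_ready_after = lm.ready("t1")
```

In this run `t1` has to wait behind `t0` on key `x` and becomes ready only after `t0` releases its locks.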
SCHEDULER II
- Transaction executes after all locks acquired
  - Read/Write set analysis
    - Local vs Remote
    - Read-only nodes are passive participants
    - Write nodes are active participants
  - Local Reads
  - Distribute reads to active participants
  - Collect remote read results
  - Apply local writes
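
The execution phases listed above can be sketched in a single process as follows. Nodes are modelled as plain dictionaries and "sending" reads is just a dictionary merge. The function names and the `key_to_node` mapping are assumptions made for illustration.

```python
def classify_participants(read_set, write_set, key_to_node):
    """Read/write set analysis: nodes holding written keys are active,
    nodes that only hold read keys are passive."""
    active = {key_to_node[k] for k in write_set}
    passive = {key_to_node[k] for k in read_set} - active
    return active, passive


def run_transaction(read_set, write_set, key_to_node, stores, logic):
    active, passive = classify_participants(read_set, write_set, key_to_node)

    # Local reads: every participant reads the keys it stores.
    local_reads = {
        node: {k: stores[node][k] for k in read_set if key_to_node[k] == node}
        for node in active | passive
    }

    for node in active:
        # Distribute and collect: each active node ends up with all read results.
        collected = {}
        for reads in local_reads.values():
            collected.update(reads)
        # Every active node runs the same logic and applies only its local writes.
        writes = logic(collected)
        for key, value in writes.items():
            if key_to_node[key] == node:
                stores[node][key] = value
    return active, passive


key_to_node = {"x": "A", "y": "B"}
stores = {"A": {"x": 5}, "B": {"y": 7}}
active, passive = run_transaction(
    {"x", "y"}, {"y"}, key_to_node, stores,
    lambda r: {"y": r["x"] + r["y"]},
)
```

Node A only contributes the value of `x` and never executes the transaction logic, while node B collects both values and writes `y`.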
SCHEDULER III
- Deadlock Free (acyclic waits-for graph)
- Dependent Transactions
  - Read-only reconnaissance query generates read set
  - Transaction executed with resulting read/write locks
  - Re-execute if changes
- Maximum conflict footprint under 2PL
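
The reconnaissance approach for dependent transactions might look like the sketch below. The key layout (`latest_order:<customer>`, `order:<id>`) and both function names are invented for the example. Locking is omitted so that only the predict, check and re-execute cycle is visible.

```python
def reconnaissance(store, customer):
    """Read-only query that predicts which keys the transaction will touch."""
    order_id = store["latest_order:" + customer]
    return {"order:" + str(order_id)}


def execute_dependent(store, customer, predicted_keys):
    """Run the transaction only if the predicted footprint is still valid."""
    actual_keys = reconnaissance(store, customer)
    if actual_keys != predicted_keys:
        return False  # footprint changed; the caller must start over
    for key in actual_keys:
        store[key] = "shipped"
    return True


store = {"latest_order:alice": 1, "order:1": "new", "order:2": "new"}

predicted = reconnaissance(store, "alice")
store["latest_order:alice"] = 2            # another transaction changes the data
first_try = execute_dependent(store, "alice", predicted)

predicted = reconnaissance(store, "alice")  # re-run reconnaissance and retry
second_try = execute_dependent(store, "alice", predicted)
```

The first attempt is rejected because the data the prediction relied on changed in between, and the retry succeeds with the fresh read set.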
STORAGE
- Disk I/O problem
  - Pause t0 when I/O required
  - A t1 can “jump ahead” of t0 (get conflicting lock before t0)
- Solution: Delay t0, but request its data in the meantime
  - So t1 may precede t0 both in the sequence (assumed) and in execution
CHECKPOINTING
- Logging requires only ordered transactions to restore after failure
- At checkpoint time (global epoch time)
  - Keep two versions of data, before & after
  - Transactions access appropriate data
  - After all “before” transactions terminate, flush all data
  - Throw away “before” version if “after” exists
- 20% throughput impact
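
The two-version scheme above can be sketched as follows, under the assumption that each transaction knows whether it was sequenced before or after the checkpoint epoch. The `TwoVersionStore` class is hypothetical, and "flushing" is modelled as returning a snapshot dictionary rather than writing to disk.

```python
class TwoVersionStore:
    """Keeps a "before" and an "after" version of each key during a checkpoint."""

    def __init__(self, data):
        self.before = dict(data)
        self.after = {}
        self.checkpointing = False

    def begin_checkpoint(self):
        self.checkpointing = True

    def write(self, key, value, after_checkpoint):
        # Transactions sequenced after the checkpoint must not disturb the snapshot.
        if self.checkpointing and after_checkpoint:
            self.after[key] = value
        else:
            self.before[key] = value

    def read(self, key, after_checkpoint):
        # "After" transactions see their own version when one exists.
        if self.checkpointing and after_checkpoint and key in self.after:
            return self.after[key]
        return self.before[key]

    def finish_checkpoint(self):
        """Call once all "before" transactions have terminated."""
        snapshot = dict(self.before)     # stands in for flushing to disk
        self.before.update(self.after)   # drop "before" values that have an "after"
        self.after.clear()
        self.checkpointing = False
        return snapshot


s = TwoVersionStore({"x": 1})
s.begin_checkpoint()
s.write("x", 2, after_checkpoint=True)
seen_by_before = s.read("x", after_checkpoint=False)
seen_by_after = s.read("x", after_checkpoint=True)
snapshot = s.finish_checkpoint()
final_value = s.read("x", after_checkpoint=True)
```

The snapshot reflects exactly the state as of the checkpoint epoch, while later transactions keep running against their own version of the data.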
PERFORMANCE
- TPC-C benchmark (order placing)
- Throughput scales linearly with number of machines
- Per-node throughput appears asymptotic
- At high contention, outperforms RDBMS
- At low contention, worse performance
CONCLUSION
- Adds ACID capability to any CRUD system
- Performs nearly linear scale-up
- Requires deterministic transactions