CSL 771: Database Implementation
Transaction Processing
Maya Ramanath
All material (including figures) from:
Concurrency Control and Recovery in Database Systems
Phil Bernstein, Vassos Hadzilacos and Nathan Goodman
(http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)
Transactions
• Interaction with the DBMS through SQL
update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000
A transaction is a unit of interaction
ACID Properties
• Atomicity
• Consistency
• Isolation
• Durability
Database system must ensure ACID properties
Atomicity and Consistency
• Single transaction
– Execution of a transaction: “all-or-nothing”
– Either a transaction completes in its entirety, or it “does not even start”
– As if the transaction never existed
– No partial effect must be visible
Two outcomes: a transaction COMMITs or ABORTs
Consistency and Isolation
• Multiple transactions
– Concurrent execution can cause an inconsistent database state
– Each transaction is executed as if isolated from the others
Durability
• If a transaction commits, its effects are permanent
• But durability has a bigger scope
– Catastrophic failures (floods, fires, earthquakes)
What we will study…
• Concurrency Control
– Ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently
• Recovery
– Ensuring durability and consistency in case of software/hardware failures
Terminology
• Data item
– A tuple, table, block
• Read (x)
• Write (x, 5)
• Start (T)
• Commit (T)
• Abort (T)
• Active Transaction
– A transaction which has neither committed nor aborted
High level model
Transaction Manager
Scheduler
Recovery Manager
Cache Manager
Disk
Transaction 1, Transaction 2, ..., Transaction n
Recoverability (1/2)
• Transaction T aborts
– T wrote some data items
– T’ read items that T wrote
• DBMS has to…
– Undo the effects of T
– Undo the effects of T’
– But, T’ has already committed
T T’
Read (x)
Write (x, k)
Read (y)
Read (x)
Write (y, k’)
Commit
Abort
Recoverability (2/2)
• Let T1,…,Tn be a set of transactions
• Ti reads a value written by Tk, k < i
• An execution of transactions is recoverable if Ti commits after all Tk commit

T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
                Commit

T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
Commit
                Commit
Cascading Aborts (1/2)
• Because T was aborted, T1,…, Tk also have to be aborted
T T’ T’’
Read (x)
Write (x, k)
Read (y)
Read (x)
Write (y, k’)
Abort
Read (y)
Cascading Aborts (2/2)
• Recoverable executions do not prevent cascading aborts
• How can we prevent them, then?

T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
Commit
                Commit

T1              T2
Write (x,2)
Commit
                Read (x)
                Write (y,2)
                Commit
What we learnt so far…
Not recoverable:
T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
                Commit

Recoverable with cascading aborts:
T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
Commit
                Commit

Recoverable without cascading aborts:
T1              T2
Write (x,2)
Commit
                Read (x)
                Write (y,2)
                Commit
Reading a value, committing a transaction
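The three cases above can be told apart mechanically. The following is a minimal sketch, assuming a history is given as a list of (transaction, op, item) triples with only reads, writes and commits; the function name `classify` and this encoding are illustrative and not from the slides. Aborts are deliberately left out.

```python
def classify(history):
    """Classify a history given as (transaction, op, item) triples.

    op is 'r', 'w' or 'c'; aborts are not modelled in this sketch.
    """
    last_writer = {}    # item -> transaction that wrote it most recently
    committed = set()
    reads_from = []     # (reader, writer, writer had committed at read time)
    recoverable = True
    for txn, op, item in history:
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.append((txn, writer, writer in committed))
        elif op == "c":
            # Recoverable: txn commits only after everyone it read from.
            if any(r == txn and w not in committed for r, w, _ in reads_from):
                recoverable = False
            committed.add(txn)
    if not recoverable:
        return "not recoverable"
    if any(not was_committed for _, _, was_committed in reads_from):
        return "recoverable with cascading aborts"
    return "recoverable without cascading aborts"


W, R, C = "w", "r", "c"
s1 = [("T1", W, "x"), ("T2", R, "x"), ("T2", W, "y"), ("T2", C, None)]
s2 = s1[:3] + [("T1", C, None), ("T2", C, None)]
s3 = [("T1", W, "x"), ("T1", C, None), ("T2", R, "x"), ("T2", W, "y"), ("T2", C, None)]
for s in (s1, s2, s3):
    print(classify(s))
```

The only difference between the three schedules is where T1's commit falls relative to T2's read and T2's commit, which is exactly what the two checks test.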
Strict Schedule (1/2)
• “Undo”-ing the effects of a transaction
– Restore the before image of the data item
T1 T2
Write (x,1)
Write (y,3)
Write (y,1)
Commit
Read (x)
Abort
T1 T2
Write (x,1)
Write (y,3)
Commit
Equivalent to
Final value of y: 3
Strict Schedule (2/2)
T1 T2
Write (x,2)
Write (x,3)
Abort
Initial value of x: 1
Should x be restored to 1 or 3?
T1 T2
Write (x,2)
Write (x,3)
Abort
Abort
T1 restores x to 3?
T2 restores x to 2?
Do not read or write a value which has been written by an active transaction until that transaction has committed or aborted
T1 T2
Write (x,2)
Abort
Write (x,3)
The Lost Update Problem
T1 T2
Read (x)
Read (x)
Write (x, 200,000)
Commit
Write (x, 200)
Commit
Assume x is your account balance
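A small sketch of the lost update, assuming both transactions are deposits whose new balance is computed from the value each one read. The deposit interpretation, the starting balance of 100 and the function names are assumptions made for illustration.

```python
def interleaved(balance, deposit1, deposit2):
    x1 = balance               # T1: Read (x)
    x2 = balance               # T2: Read (x), before T1 has written
    balance = x1 + deposit1    # T1: Write (x), Commit
    balance = x2 + deposit2    # T2: Write (x), Commit; overwrites T1's update
    return balance


def serial(balance, deposit1, deposit2):
    balance = balance + deposit1   # T1 runs to completion
    balance = balance + deposit2   # then T2 runs
    return balance


print(interleaved(100, 200_000, 200))  # 300
print(serial(100, 200_000, 200))       # 200300
```

In the interleaved run T1's deposit disappears, because T2 writes a value derived from a balance that T1 has since changed.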
Serializable Schedules
• Serial schedule
– Simply execute transactions one after the other
• A serializable schedule is one which is equivalent to some serial schedule
SERIALIZABILITY THEORY
Serializable Schedules
T1: op11, op12, op13
T2: op21, op22, op23, op24
• Serial schedule
– Simply execute transactions one after the other
op11, op12, op13, op21, op22, op23, op24
op21, op22, op23, op24, op11, op12, op13
• Serializable schedule
– Interleave operations
– Ensure end result is equivalent to some serial schedule
Notation
r1[x] = Transaction 1, Read (x)
w1[x] = Transaction 1, Write (x)
c1 = Transaction 1, Commit
a1 = Transaction 1, Abort
r1[x], r1[y], w2[x], r2[y], c1, c2
Histories (1/3)
• Operations of transaction T can be represented by a partial order.
[Figure: partial order over the operations r1[x], r1[y], w1[z], c1]
Histories (2/3)
• Conflicting operations
– Two operations conflict if they operate on the same data item and at least one of them is a write
– An order has to be specified for conflicting operations
Histories (3/3)
• Complete History
Serializable Histories
• The goal: Ensure that the interleaved operations form a serializable history.
• The method
– When are two histories equivalent?
– When is a history serial?
Equivalence of Histories (1/2)
H ≅ H’ if
1. they are defined over the same set of transactions and they have the same operations
2. they order conflicting operations the same way
Equivalence of Histories (2/2)
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Serial History
• A complete history is serial if for every pair of transactions Ti and Tk,
– all operations of Ti occur before all operations of Tk, OR
– all operations of Tk occur before all operations of Ti
• A history is serializable if its committed projection is equivalent to a serial history.
Serialization Graph
[Figure: serialization graph over T1, T3, T2]
Serializability Theorem
A history H is serializable if and only if its serialization graph SG(H) is acyclic
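The test in the theorem can be sketched in a few lines. This is a minimal illustration, assuming a history is written as a string in the r1[x] / w2[x] / c1 notation introduced earlier; the function names `serialization_graph` and `is_acyclic` are hypothetical. Edges are built from conflicting operations of committed transactions only (the committed projection).

```python
import re


def serialization_graph(history):
    """Return (nodes, edges) of SG(H) for a history such as 'r1[x] w2[x] c1 c2'."""
    ops = re.findall(r"([rwca])(\d+)(?:\[(\w+)\])?", history)
    committed = {t for op, t, _ in ops if op == "c"}
    # Committed projection: reads and writes of committed transactions only.
    rw = [(op, t, x) for op, t, x in ops if op in "rw" and t in committed]
    edges = set()
    for i, (p, ti, x) in enumerate(rw):
        for q, tk, y in rw[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if x == y and ti != tk and "w" in (p, q):
                edges.add((ti, tk))
    return committed, edges


def is_acyclic(nodes, edges):
    """Repeatedly remove nodes with no incoming edge; a leftover means a cycle."""
    nodes, edges = set(nodes), set(edges)
    while nodes:
        sources = {n for n in nodes if all(dst != n for _, dst in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(a, b) for a, b in edges if a not in sources}
    return True


h1 = "r1[x] r1[y] w2[x] r2[y] c1 c2"
h2 = "r1[x] w2[x] w2[y] c2 w1[y] c1"
print(is_acyclic(*serialization_graph(h1)))  # True
print(is_acyclic(*serialization_graph(h2)))  # False
```

In h1 the only conflict is r1[x] before w2[x], giving the single edge T1 → T2. In h2 the conflicts on x and y point in opposite directions, so the graph has a cycle.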
On your own
How do recoverability, strict schedules, cascading aborts fit into the big picture?
LOCKING
High level model
Transaction Manager
Scheduler
Recovery Manager
Cache Manager
Disk
Transaction 1, Transaction 2, ..., Transaction n
Transaction Management
Transaction Manager
• Receives transactions
• Sends operations to scheduler
Scheduler
• Execute op
• Reject op
• Delay op
Example operations: Read1(x), Write2(y,k), Read2(x), Commit1
Transaction 1, Transaction 2, Transaction 3, ..., Transaction n
Disk
Locking
• Each data item x has a lock associated with it
• If T wants to access x
– Scheduler first acquires a lock on x
– Only one transaction can hold a lock on x
• T releases the lock after processing
Locking is used by the scheduler to ensure serializability
Notation
• Read lock and write lock: rl[x], wl[x]
• Obtaining read and write locks: rli[x], wli[x]
• Lock table
– Entries of the form [x, r, Ti]
• Conflicting locks
– pli[x] and qlk[y] conflict if x = y and p, q conflict
• Unlock: rui[x], wui[x]
Basic 2-Phase Locking (2PL)
RULE 1
On receiving pi[x]: is qlk[x] set such that p and q conflict?
– YES: pi[x] is delayed
– NO: acquire pli[x]; pi[x] is scheduled
RULE 2
pli[x] cannot be released until pi[x] is completed
RULE 3 (2 Phase Rule)
Once a transaction has released a lock, it may not obtain any other locks.
The 2-phase rule
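Once a transaction has released a lock, it may not obtain any other locks.
T1: r1[x] w1[y] c1
T2: w2[x] w2[y] c2
H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
In H, T1 releases ru1[x] and later acquires wl1[y], which violates the rule.
[Figure: serialization graph of H with a cycle between T1 and T2]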
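A minimal sketch of checking the 2-phase rule on a history that includes lock and unlock operations, assuming the rl/wl/ru/wu notation from the Notation slide. The function name `two_phase_violations` is hypothetical.

```python
import re


def two_phase_violations(history):
    """Return the transactions that acquire a lock after releasing one."""
    released = set()     # transactions that have already released some lock
    violators = set()
    for kind, txn, _item in re.findall(r"(rl|wl|ru|wu)(\d+)\[(\w+)\]", history):
        if kind in ("ru", "wu"):
            released.add(txn)
        elif txn in released:
            # A lock request after an unlock breaks the 2-phase rule.
            violators.add(txn)
    return violators


H = ("rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] "
     "wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1")
print(two_phase_violations(H))  # {'1'}
```

Only T1 is reported: T2 acquires both of its locks before releasing either, while T1 unlocks x and then locks y.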
Correctness of 2PL
2PL always produces serializable histories
Proof outline
STEP 1: Characterize properties of the scheduler
STEP 2: Prove that any history with these properties is serializable
(That is, SG(H) is acyclic)
Deadlocks (1/2)
T1: r1[x] w1[y] c1
T2: w2[y] w2[x] c2
Scheduler: rl1[x] wl2[y] r1[x] w2[y] <cannot proceed>
Deadlocks (2/2)
Strategies to deal with deadlocks
• Timeouts
– Leads to inefficiency
• Detecting deadlocks
– Maintain a wait-for graph; a cycle indicates deadlock
– Once a deadlock is detected, break the cycle by aborting a transaction
• New problem: Starvation
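Deadlock detection on a wait-for graph amounts to cycle detection. The sketch below assumes the graph is a dictionary mapping each transaction to the set of transactions it is waiting for; the function name `find_deadlock` and this representation are illustrative choices, not part of the slides.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    visiting, done = [], set()   # current DFS path, fully explored nodes

    def visit(t):
        if t in done:
            return None
        if t in visiting:
            # t is already on the current path: the path from t onward is a cycle.
            return visiting[visiting.index(t):]
        visiting.append(t)
        for u in wait_for.get(t, ()):
            cycle = visit(u)
            if cycle:
                return cycle
        visiting.pop()
        done.add(t)
        return None

    for t in wait_for:
        cycle = visit(t)
        if cycle:
            return cycle
    return None


# T1 holds rl[x] and waits for wl[y]; T2 holds wl[y] and waits for wl[x].
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))
print(find_deadlock({"T1": {"T2"}, "T2": set()}))   # None
```

Any transaction on the returned cycle is a candidate victim. Always choosing the same victim is what leads to the starvation problem noted above.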
Conservative 2PL
• Avoids deadlocks altogether
– T declares its readset and writeset
– Scheduler tries to acquire all required locks
– If not all locks can be acquired, T waits in a queue
• T never “starts” until all locks are acquired
– Therefore, it can never be involved in a deadlock
On your own
Strict 2PL (2PL which ensures only strict schedules)
Extra Information
• Assumption: Data items are organized in a tree
Can we come up with a better (more efficient) protocol?
Tree Locking Protocol (1/3)
RULE 1
On receiving ai[x]: is alk[x] set?
– YES: ai[x] is delayed
– NO: go to RULE 2
RULE 2
If x is an intermediate node and y is the parent of x, then ali[x] can be set only if Ti holds ali[y]
RULE 3
ali[x] cannot be released until ai[x] is completed
RULE 4
Once a lock is released, the same lock may not be re-obtained.
ai[x] scheduled
Tree Locking Protocol (2/3)
• Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk.
• Theorem: Tree Locking Protocol always produces Serializable Schedules
Tree Locking Protocol (3/3)
• Tree Locking Protocol avoids deadlock
• Releases locks earlier than 2PL
BUT
• Needs to know the access pattern to be effective
• Transactions should access nodes from root-to-leaf
Multi-granularity Locking (1/3)
• Granularity
– Refers to the relative size of the data item
– Attribute, tuple, table, page, file, etc.
• Efficiency depends on granularity of locking
• Allow transactions to lock at different granularities
Multi-granularity Locking (2/3)
• Lock Instance Graph
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
• Explicit and Implicit Locks
• Intention read and intention write locks
• An intention read lock conflicts with an explicit write lock; an intention write lock conflicts with explicit read and write locks. Intention locks do not conflict with other intention locks
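The conflict rules can be written down as a small compatibility table. This sketch assumes the usual four lock modes r, w, ir and iw, with ir conflicting only with w, and iw conflicting with r and w. The names `COMPATIBLE` and `compatible` are hypothetical.

```python
# Unordered pairs of lock modes that may be held on the same node
# by different transactions at the same time.
COMPATIBLE = {
    frozenset(["r"]),          # r  with r
    frozenset(["r", "ir"]),    # r  with ir
    frozenset(["ir"]),         # ir with ir
    frozenset(["ir", "iw"]),   # ir with iw
    frozenset(["iw"]),         # iw with iw
}


def compatible(held, requested):
    """True if a lock in mode `requested` can coexist with one in mode `held`."""
    return frozenset([held, requested]) in COMPATIBLE


print(compatible("ir", "iw"))  # True
print(compatible("iw", "r"))   # False
print(compatible("w", "ir"))   # False
```

Storing unordered pairs keeps the table symmetric by construction, so `compatible(a, b)` and `compatible(b, a)` always agree.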
Multi-granularity Locking (3/3)
• To set rli[x] or irli[x], first hold irli[y] or iwli[y], such that y is the parent of x.
• To set wli[x] or iwli[x], first hold iwli[y], such that y is the parent of x.
• To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x, or y is an ancestor of x.
• To release irli[x] (or iwli[x]) no child of x can be locked by Ti
The Phantom Problem
• How to lock a tuple, which (currently) does not exist?
T1: r1[x1], r1[x2], r1[X], c1
T2: w2[x3], w2[X], c2
rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X], w2[x3], w2[X], wu2[x3,X], c2, rl1[X], r1[X], ru1[x1,x2,X], c1
NON-LOCK-BASED SCHEDULERS
Timestamp Ordering (1/3)
• Each transaction is associated with a timestamp
– Ti indicates Transaction T with timestamp i.
• Each operation in the transaction has the same timestamp
Timestamp Ordering (2/3)
TO Rule
If pi[x] and qk[x] are conflicting operations, then pi[x] is processed before qk[x] iff i < k
Theorem: If H is a history representing an execution produced by a TO scheduler, then H is serializable.
Timestamp Ordering (3/3)
• For each data item x, maintain: max-rt(x), max-wt(x), c(x)
• Request ri[x]
– Grant request if TS(i) >= max-wt(x) and c(x), update max-rt(x)
– Delay if TS(i) > max-wt(x) and !c(x)
– Else abort and restart Ti
• Request wi[x]
– Grant request if TS(i) >= max-wt(x) and TS(i) >= max-rt(x), update max-wt(x), set c(x) = false
– Else abort and restart Ti
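The request rules above translate almost line by line into code. This is a sketch only, assuming timestamps are positive integers and that an item nobody has touched starts with max-rt = max-wt = 0 and c(x) true. The class and function names are hypothetical, and commit/abort handling is left out, as on the slide.

```python
class Item:
    def __init__(self):
        self.max_rt = 0        # largest timestamp that has read the item
        self.max_wt = 0        # largest timestamp that has written the item
        self.committed = True  # c(x): has the last writer committed?


def request_read(item, ts):
    if ts >= item.max_wt and item.committed:
        item.max_rt = max(item.max_rt, ts)
        return "grant"
    if ts > item.max_wt and not item.committed:
        return "delay"         # wait for the last writer to commit or abort
    return "abort"             # the read arrived too late


def request_write(item, ts):
    if ts >= item.max_wt and ts >= item.max_rt:
        item.max_wt = ts
        item.committed = False
        return "grant"
    return "abort"             # a younger transaction already read or wrote x


x = Item()
print(request_read(x, 2))    # grant
print(request_write(x, 1))   # abort
print(request_write(x, 3))   # grant
print(request_read(x, 4))    # delay
print(request_read(x, 2))    # abort
```

The write by timestamp 1 is rejected because timestamp 2 has already read x. The read by timestamp 4 is delayed because the write by timestamp 3 has not yet committed.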
ON YOUR OWN: Thomas write rule, actions taken when a transaction has to commit or abort
Validation
• Aggressively schedule all operations
• Do not commit until the transaction is “validated”
ON YOUR OWN
Summary
• Lock-based Schedulers
– 2-Phase Locking
– Tree Locking Protocol
– Multi-granularity Locking
– Locking in the presence of updates
• Non-lock-based Schedulers
– Timestamp Ordering
– Validation-based Concurrency Control (on your own)
RECOVERY
SOURCE: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom
Logging
• Log the operations in the transaction(s)
• Believe the log
– Does the log say transaction T has committed?
– Or does it say aborted?
– Or has only a partial trace (implicit abort)?
• In case of failures, reconstruct the DB from its log
The basic setup
T1
T2
T3
Tk
LOG
The Disk
Buffer Space for data and log
Buffer Space for each transaction
Transactions
Terminology
• Data item: an element which can be read or written
– tuple, relation, B+-tree index, etc.
Input x: fetch x from the disk to buffer
Read x,t: read x into local variable t
Write x,t: write value of t into x
Output x: write x to disk
Example
Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
Output P
Output S
update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000
System fails here (marked at three different points in the sequence)
Logs
• Sequence of log records
• Need to keep track of
– Start of transaction
– Update operations (Write operations)
– End of transaction (COMMIT or ABORT)
• “Believe” the log, use the log to reconstruct a consistent DB state
Types of logs
• Undo logs
– Ensure that uncommitted transactions are rolled back (or undone)
• Redo logs
– Ensure that committed transactions are redone
• Undo/Redo logs
– Both of the above
All 3 logging styles ensure atomicity and durability
Undo Logging (1/3)
• <START T>: Start of transaction T
• <COMMIT T>
• <ABORT T>
• <T, A, x>: Transaction T modified A whose before-image is x.
Undo Logging (2/3)
Actions:
Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
FLUSH LOG
Output P
Output S
FLUSH LOG
Log:
<START T>
<T, P, x>
<T, S, y>
<COMMIT T>
U1: <T, X, v> should be flushed before Output X
U2: <COMMIT T> should be flushed after all OUTPUTs
Undo Logging (3/3)
• Recovery with Undo log
1. If T has a <COMMIT T> entry, do nothing
2. If T has a <START T> entry, but no <COMMIT T>
• T is incomplete and needs to be undone
• Restore old values from <T,X,v> records
• There may be multiple transactions
– Start scanning from the end of the log
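The recovery procedure above can be sketched as a backward scan. The sketch assumes log records are Python tuples, ("START", T), ("UPDATE", T, X, before_image) and ("COMMIT", T), and that the disk is a dictionary. This encoding, the function name `undo_recover` and the sample values are illustrative assumptions.

```python
def undo_recover(log, disk):
    """Undo every update of a transaction that has no COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):                 # scan from the end of the log
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _txn, item, before_image = rec
            disk[item] = before_image         # restore the old value
    return disk


# Crash after Output P but before <COMMIT T> reached the log.
log = [
    ("START", "T"),
    ("UPDATE", "T", "P", 4000),
    ("UPDATE", "T", "S", "regular"),
]
disk = {"P": 3600, "S": "regular"}
print(undo_recover(log, disk))  # {'P': 4000, 'S': 'regular'}
```

Scanning backward matters when an uncommitted transaction updated the same item more than once: the earliest before-image is applied last and therefore wins.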
Redo Logging (1/3)
• All incomplete transactions can be ignored
• Redo all completed transactions
• <T, A, x>: Transaction T modified A whose after-image is x.
Redo Logging (2/3)
Actions:
Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
FLUSH LOG
Output P
Output S
Log:
<START T>
<T, P, x>
<T, S, y>
<COMMIT T>
Write-ahead Logging
R1: <T, X, v> and <COMMIT T> should be flushed before Output X
Redo Logging (3/3)
• Recovery with Redo Logging
– If T has a <COMMIT T> entry, redo T
– If T is incomplete, do nothing (add <ABORT T>)
• For multiple transactions
– Scan from the beginning of the log
Undo/Redo Logging (1/3)
• Undo logging: Cannot COMMIT T unless all updates are written to disk
• Redo logging: Cannot release memory unless transaction commits
• Undo/Redo logs attempt to strike a balance
Undo/Redo Logging (2/3)
Actions:
Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
FLUSH LOG
Output P
Output S
Log:
<START T>
<T, P, x, a>
<T, S, y, b>
<COMMIT T>
UR1: <T, X, a, b> should be flushed before Output X
For comparison:
U1: <T, X, v> should be flushed before Output X
U2: <COMMIT T> should be flushed after all OUTPUTs
R1: <T, X, v> and <COMMIT T> should be flushed before Output X
Undo/Redo Logging (3/3)
• Recovery with Undo/Redo Logging
– Redo all committed transactions (earliest-first)
– Undo all uncommitted transactions (latest-first)
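A sketch of the two passes, assuming update records carry both images as ("UPDATE", T, X, before, after) tuples and that the disk is a dictionary. The function name `undo_redo_recover`, the record encoding and the sample values are assumptions for illustration.

```python
def undo_redo_recover(log, disk):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    updates = [rec for rec in log if rec[0] == "UPDATE"]
    # Redo pass: committed transactions, earliest record first.
    for _, txn, item, _before, after in updates:
        if txn in committed:
            disk[item] = after
    # Undo pass: uncommitted transactions, latest record first.
    for _, txn, item, before, _after in reversed(updates):
        if txn not in committed:
            disk[item] = before
    return disk


log = [
    ("START", "T"),
    ("UPDATE", "T", "P", 4000, 3600),
    ("START", "U"),
    ("UPDATE", "U", "Q", 1, 2),
    ("COMMIT", "T"),
]
# At the crash, T's change had not reached disk but U's had.
print(undo_redo_recover(log, {"P": 4000, "Q": 2}))  # {'P': 3600, 'Q': 1}
```

Because each record holds both images, recovery reaches the right state whether or not a given Output happened before the crash. That is why only rule UR1 is needed.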
What happens if there is a crash when you are writing a log? What happens if there is a crash during recovery?
Checkpointing
• Logs can be huge… can we throw away portions of them?
• Can we avoid processing all of it when there is a crash?
ON YOUR OWN