CSL 771: Database Implementation
Transaction Processing

Maya Ramanath

All material (including figures) from:
Concurrency Control and Recovery in Database Systems
Phil Bernstein, Vassos Hadzilacos and Nathan Goodman

(http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)

Transactions

• Interaction with the DBMS through SQL

update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000

A transaction is a unit of interaction

ACID Properties

• Atomicity
• Consistency
• Isolation
• Durability

Database system must ensure ACID properties

Atomicity and Consistency

• Single transaction
– Execution of a transaction: “all-or-nothing”

Either a transaction completes in its entirety
Or it “does not even start”
– As if the transaction never existed
– No partial effect must be visible

2 outcomes: A transaction COMMITs or ABORTs

Consistency and Isolation

• Multiple transactions
– Concurrent execution can cause an inconsistent database state
– Each transaction executed as if isolated from the others

Durability

• If a transaction commits, its effects are permanent

• But, durability has a bigger scope
– Catastrophic failures (floods, fires, earthquakes)

What we will study…

• Concurrency Control
– Ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently

• Recovery
– Ensuring durability and consistency in case of software/hardware failures

Terminology

• Data item
– A tuple, table, block

• Read (x)
• Write (x, 5)

• Start (T)
• Commit (T)
• Abort (T)
• Active Transaction
– A transaction which has neither committed nor aborted

High level model

Transaction Manager

Scheduler

Recovery Manager

Cache Manager

Disk

Transaction 1, Transaction 2, …, Transaction n

Recoverability (1/2)

• Transaction T Aborts
– T wrote some data items
– T’ read items that T wrote

• DBMS has to…
– Undo the effects of T
– Undo effects of T’
– But, T’ has already committed

T T’

Read (x)

Write (x, k)

Read (y)

Read (x)

Write (y, k’)

Commit

Abort

Recoverability (2/2)

• Let T1,…,Tn be a set of transactions

• Ti reads a value written by Tk, k < i

• An execution of transactions is recoverable if Ti commits after all Tk commit

T1 T2

Write (x,2)

Read (x)

Write (y,2)

Commit

T1 T2

Write (x,2)

Read (x)

Write (y,2)

Commit

Commit

Cascading Aborts (1/2)

• Because T was aborted, T1,…, Tk also have to be aborted

T T’ T’’

Read (x)

Write (x, k)

Read (y)

Read (x)

Write (y, k’)

Abort

Read (y)

Cascading Aborts (2/2)

• Recoverable executions do not prevent cascading aborts

• How can we prevent them?

T1 T2

Write (x,2)

Read (x)

Write (y,2)

Commit

Commit

T1 T2

Write (x,2)

Commit

Read (x)

Write (y,2)

Commit

What we learnt so far…

Not recoverable:

T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
                Commit

Recoverable with cascading aborts:

T1              T2
Write (x,2)
                Read (x)
                Write (y,2)
Commit
                Commit

Recoverable without cascading aborts:

T1              T2
Write (x,2)
Commit
                Read (x)
                Write (y,2)
                Commit

Reading a value, committing a transaction
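The three cases above can be told apart mechanically. The following is a minimal sketch, not part of the original slides: it assumes a history is a list of hypothetical `(op, transaction, item)` tuples with `op` in `"r"`, `"w"`, `"c"`, `"a"`, and that a read "reads from" the most recent write that has not been undone by an abort.

```python
def classify(history):
    """Classify a history by when readers read and when they commit."""
    writers = {}      # item -> transactions whose writes are still in effect
    commit_pos = {}   # transaction -> index of its commit
    reads = []        # (reader, writer, index of the read)
    for idx, (op, t, x) in enumerate(history):
        if op == "w":
            writers.setdefault(x, []).append(t)
        elif op == "r":
            stack = writers.get(x, [])
            if stack and stack[-1] != t:
                reads.append((t, stack[-1], idx))
        elif op == "c":
            commit_pos[t] = idx
        elif op == "a":
            # An abort removes the aborted transaction's writes.
            for stack in writers.values():
                stack[:] = [w for w in stack if w != t]
    # Recoverable: a reader commits only after every writer it read from.
    recoverable = all(
        reader not in commit_pos
        or (writer in commit_pos and commit_pos[writer] < commit_pos[reader])
        for reader, writer, _ in reads
    )
    # No cascading aborts: values are read only from committed writers.
    no_cascade = all(
        writer in commit_pos and commit_pos[writer] < idx
        for _, writer, idx in reads
    )
    if not recoverable:
        return "not recoverable"
    if not no_cascade:
        return "recoverable with cascading aborts"
    return "recoverable without cascading aborts"
```

The two checks differ only in what the writer's commit is compared against: the reader's commit (recoverability) or the read itself (avoiding cascading aborts).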

Strict Schedule (1/2)

• “Undo”-ing the effects of a transaction
– Restore the before image of the data item

T1 T2

Write (x,1)

Write (y,3)

Write (y,1)

Commit

Read (x)

Abort

T1 T2

Write (x,1)

Write (y,3)

Commit

Equivalent to

Final value of y: 3

Strict Schedule (2/2)

T1 T2

Write (x,2)

Write (x,3)

Abort

Initial value of x: 1

Should x be restored to 1 or 3?

T1 T2

Write (x,2)

Write (x,3)

Abort

Abort

T1 restores x to 3? T2 restores x to 2?

Do not read or write a value which has been written by an active transaction until that transaction has committed or aborted

T1 T2

Write (x,2)

Abort

Write (x,3)
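The strictness rule can be checked on a history with a single pass. This is a sketch under the assumption that a history is a list of hypothetical `(op, transaction, item)` tuples, with `"c"` and `"a"` marking commit and abort; the function name is illustrative only.

```python
def is_strict(history):
    """True if no operation touches an item written by another active transaction."""
    dirty = {}  # item -> set of active transactions that have written it
    for op, t, x in history:
        if op in ("r", "w"):
            # Someone else wrote x and has not yet committed or aborted.
            if dirty.get(x, set()) - {t}:
                return False
            if op == "w":
                dirty.setdefault(x, set()).add(t)
        else:
            # Commit or abort: t is no longer active.
            for active_writers in dirty.values():
                active_writers.discard(t)
    return True
```

For the schedules above, writing x in T2 while T1 is still active is rejected, whereas writing x after T1 has aborted is accepted.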

The Lost Update Problem

T1 T2

Read (x)

Read (x)

Write (x, 200,000)

Commit

Write (x, 200)

Commit

Assume x is your account balance
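A small simulation makes the lost update visible. This sketch assumes, purely for illustration, that each transaction reads the balance and then writes back the value it read plus a deposit; the schedule format and the deposit amounts are hypothetical.

```python
def run(schedule, balance=0):
    """Execute (transaction, action, amount) steps against one data item x."""
    db = {"x": balance}
    local = {}  # each transaction's private copy of x
    for txn, action, amount in schedule:
        if action == "read":
            local[txn] = db["x"]
        else:  # "write": store the privately computed value
            db["x"] = local[txn] + amount
    return db["x"]

interleaved = [("T1", "read", 0), ("T2", "read", 0),
               ("T1", "write", 200_000), ("T2", "write", 200)]
serial = [("T1", "read", 0), ("T1", "write", 200_000),
          ("T2", "read", 0), ("T2", "write", 200)]
```

In the interleaved schedule T2 overwrites x using the stale value it read, so T1's update disappears; the serial schedule keeps both.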

Serializable Schedules

• Serial schedule
– Simply execute transactions one after the other

• A serializable schedule is one which is equivalent to some serial schedule

SERIALIZABILITY THEORY

Serializable Schedules

T1: op11, op12, op13

T2: op21, op22, op23, op24

• Serial schedule
– Simply execute transactions one after the other
– T1 then T2: op11, op12, op13, op21, op22, op23, op24
– T2 then T1: op21, op22, op23, op24, op11, op12, op13

• Serializable schedule
– Interleave operations
– Ensure end result is equivalent to some serial schedule

Notation

r1[x] = Transaction 1, Read (x)

w1[x] = Transaction 1, Write (x)

c1 = Transaction 1, Commit

a1= Transaction 1, Abort

r1[x], r1[y], w2[x], r2[y], c1, c2

Histories (1/3)

• Operations of transaction T can be represented by a partial order.

r1[x]

r1[y]w1[z] c1

Histories (2/3)

• Conflicting operations
– Of two ops operating on the same data item, if one of them is a write, then the ops conflict
– An order has to be specified for conflicting operations

Histories (3/3)

• Complete History

Serializable Histories

• The goal: Ensure that the interleaving operations guarantee a serializable history.

• The method
– When are two histories equivalent?
– When is a history serial?

Equivalence of Histories (1/2)

H ≅ H’ if
1. they are defined over the same set of transactions and they have the same operations
2. they order conflicting operations the same way
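The two conditions translate directly into a check. This sketch assumes totally ordered histories given as lists of hypothetical `(op, transaction, item)` tuples containing only reads and writes, with each operation appearing at most once.

```python
from collections import Counter

def conflicts(a, b):
    """Two ops of different transactions conflict if they touch the same item and one is a write."""
    (op1, t1, x1), (op2, t2, x2) = a, b
    return t1 != t2 and x1 == x2 and "w" in (op1, op2)

def conflict_order(history):
    """Set of ordered pairs (a, b) of conflicting ops with a before b."""
    return {(a, b)
            for i, a in enumerate(history)
            for b in history[i + 1:]
            if conflicts(a, b)}

def equivalent(h1, h2):
    same_ops = Counter(h1) == Counter(h2)                   # condition 1
    same_order = conflict_order(h1) == conflict_order(h2)   # condition 2
    return same_ops and same_order
```

Swapping two adjacent operations that do not conflict leaves the result unchanged, while swapping a conflicting pair does not.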

Equivalence of Histories (2/2)

Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman


Serial History

• A complete history is serial if for every pair of transactions Ti and Tk,
– all operations of Ti occur before all operations of Tk OR
– all operations of Tk occur before all operations of Ti

• A history is serializable if its committed projection is equivalent to a serial history.

Serialization Graph

T1 T3 T2

Serializability Theorem

A history H is serializable if and only if its serialization graph SG(H) is acyclic
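The theorem suggests a simple test: build SG(H) from the conflicting operations of committed transactions and look for a cycle. The sketch below assumes histories are lists of hypothetical `(op, transaction, item)` tuples and uses a topological sort for the cycle check; the function names are illustrative.

```python
def serialization_graph(history):
    """Edges Ti -> Tk for conflicting ops of committed transactions, Ti's op first."""
    committed = {t for op, t, _ in history if op == "c"}
    ops = [(op, t, x) for op, t, x in history
           if t in committed and op in ("r", "w")]
    edges = set()
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_acyclic(nodes, edges):
    """Repeatedly remove nodes with no incoming edges; leftovers imply a cycle."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == len(nodes)

def is_serializable(history):
    edges = serialization_graph(history)
    nodes = {t for edge in edges for t in edge}
    return is_acyclic(nodes, edges)
```

For the history r1[x], r1[y], w2[x], r2[y], c1, c2 the graph has the single edge T1 → T2, so the history is serializable.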

On your own
How do recoverability, strict schedules, cascading aborts fit into the big picture?

LOCKING

High level model

Transaction Manager

Scheduler

Recovery Manager

Cache Manager

Disk

Transaction 1, Transaction 2, …, Transaction n

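Transaction Management

Transaction Manager
• Receives Transactions
• Sends operations to scheduler

Scheduler
• Execute op
• Reject op
• Delay op

Example operation stream: Read1(x), Write2(y,k), Read2(x), Commit1

[Figure: Transaction 1, Transaction 2, Transaction 3, …, Transaction n feed the Transaction Manager, which passes operations to the Scheduler and on to the Disk]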

Locking

• Each data item x has a lock associated with it

• If T wants to access x
– Scheduler first acquires a lock on x
– Only one transaction can hold a lock on x

• T releases the lock after processing

Locking is used by the scheduler to ensure serializability

Notation

• Read lock and write lock
rl[x], wl[x]

• Obtaining read and write locks
rli[x], wli[x]

• Lock table
– Entries of the form [x, r, Ti]

• Conflicting locks
– pli[x], qlk[y], x = y and p,q conflict

• Unlock
rui[x], wui[x]

Basic 2-Phase Locking (2PL)

RULE 1
On receiving pi[x], check: is qlk[x] set such that p and q conflict?
– YES: pi[x] is delayed
– NO: acquire pli[x]; pi[x] is scheduled

RULE 2
pli[x] cannot be released until pi[x] is completed

RULE 3 (2 Phase Rule)
Once a lock is released no other locks may be obtained.

The 2-phase rule

Once a lock is released no other locks may be obtained.

T1: r1[x] w1[y] c1

T2: w2[x] w2[y] c2

H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
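A history written in this notation can be scanned for violations of the 2-phase rule. This is a sketch only; the function name is hypothetical and it assumes tokens are separated by whitespace, with lock tokens of the form rl/wl and unlock tokens of the form ru/wu followed by the transaction number.

```python
import re

def first_two_phase_violation(history):
    """Return the id of the first transaction that locks after unlocking, else None."""
    released = set()  # transactions that have released at least one lock
    for token in history.split():
        match = re.fullmatch(r"([rw])([lu])(\d+)\[(.+)\]", token)
        if not match:
            continue  # ordinary operations (r1[x], c1, ...) are ignored
        _, kind, txn, _ = match.groups()
        if kind == "u":
            released.add(txn)
        elif txn in released:
            return txn
    return None

H = ("rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] "
     "wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1")
```

On H the check reports transaction 1, which releases its read lock on x and later obtains a write lock on y.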

T1 T2

Correctness of 2PL

2PL always produces serializable histories

Proof outline
STEP 1: Characterize properties of the scheduler
STEP 2: Prove that any history with these properties is serializable

(That is, SG(H) is acyclic)

Deadlocks (1/2)

T1: r1[x] w1[y] c1

T2: w2[y] w2[x] c2

Scheduler: rl1[x] wl2[y] r1[x] w2[y] <cannot proceed>

Deadlocks (2/2)

Strategies to deal with deadlocks
• Timeouts
– Leads to inefficiency

• Detecting deadlocks
– Maintain a wait-for graph, cycle indicates deadlock
– Once a deadlock is detected, break the cycle by aborting a transaction
• New problem: Starvation
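Deadlock detection on a wait-for graph amounts to finding a cycle. The sketch below assumes the graph is given as a hypothetical dict mapping each transaction to the set of transactions it is waiting for; choosing which transaction on the cycle to abort is left open.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]  # the cycle closes here
        for blocker in wait_for.get(txn, ()):
            cycle = visit(blocker, path + [txn])
            if cycle:
                return cycle
        return None

    for txn in wait_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None

# T1 holds rl[x] and waits for y; T2 holds wl[y] and waits for x.
waits = {"T1": {"T2"}, "T2": {"T1"}}
```

Aborting any one transaction on the returned cycle breaks the deadlock; always picking the same victim is what can lead to starvation.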

Conservative 2PL

• Avoids deadlocks altogether
– T declares its readset and writeset
– Scheduler tries to acquire all required locks
– If not all locks can be acquired, T waits in a queue

• T never “starts” until all locks are acquired
– Therefore, it can never be involved in a deadlock

On your own
Strict 2PL (2PL which ensures only strict schedules)

Extra Information

• Assumption: Data items are organized in a tree

Can we come up with a better (more efficient) protocol?

Tree Locking Protocol (1/3)

RULE 1
On receiving ai[x], check: is alk[x] set?
– YES: ai[x] is delayed
– NO: go to RULE 2

RULE 2
If x is an intermediate node, and y is a parent of x, then ali[x] is possible only if ali[y]

RULE 3
ali[x] cannot be released until ai[x] is completed

RULE 4
Once a lock is released the same lock may not be re-obtained.

Once ali[x] is acquired, ai[x] is scheduled


• Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk.

• Theorem: Tree Locking Protocol always produces Serializable Schedules


• Tree Locking Protocol avoids deadlock

• Releases locks earlier than 2PL

BUT
• Needs to know the access pattern to be effective
• Transactions should access nodes from root-to-leaf

Multi-granularity Locking (1/3)

• Granularity
– Refers to the relative size of the data item
– Attribute, tuple, table, page, file, etc.

• Efficiency depends on granularity of locking

• Allow transactions to lock at different granularities


• Lock Instance Graph

Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman

• Explicit and Implicit Locks

• Intention read and intention write locks

• Intention locks do not conflict with other intention locks. An intention write lock conflicts with explicit read and write locks; an intention read lock conflicts only with explicit write locks


• To set rli[x] or irli[x], first hold irli[y] or iwli[y], such that y is the parent of x.

• To set wli[x] or iwli[x], first hold iwli[y], such that y is the parent of x.

• To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x, or y is an ancestor of x.

• To release irli[x] (or iwli[x]) no child of x can be locked by Ti
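The four rules can be expressed as checks against the locks a single transaction Ti holds. This is a sketch: the three-level hierarchy (database, file, record) and the function names are assumptions for illustration, and conflicts with other transactions are not modelled.

```python
# Hypothetical lock instance graph: database -> file -> record.
PARENT = {"database": None, "file": "database", "record": "file"}

# Lock mode on x -> modes on the parent of x, one of which Ti must already hold.
PARENT_MODES = {"r": {"ir", "iw"}, "ir": {"ir", "iw"},
                "w": {"iw"}, "iw": {"iw"}}

def may_set(held, node, mode):
    """Rules 1 and 2: the parent must carry a suitable intention lock."""
    parent = PARENT[node]
    if parent is None:
        return True  # the root has no parent to check
    return bool(PARENT_MODES[mode] & held.get(parent, set()))

def may_access(held, node, op):
    """Rule 3: op ('r' or 'w') needs that lock on the node or on an ancestor."""
    while node is not None:
        if op in held.get(node, set()):
            return True
        node = PARENT[node]
    return False

def may_release_intention(held, node):
    """Rule 4: no child of the node may still be locked by Ti."""
    return not any(held.get(child)
                   for child, parent in PARENT.items() if parent == node)
```

Here `held` maps each node to the set of lock modes Ti currently holds on it, for example `{"database": {"ir"}, "file": {"r"}}`.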

The Phantom Problem

• How to lock a tuple, which (currently) does not exist?

T1: r1[x1], r1[x2], r1[X], c1

T2: w2[x3], w2[X], c2

rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X], w2[x3], w2[X], wu2[x3,X], c2, rl1[X], r1[X], ru1[x1,x2,X], c1

NON-LOCK-BASED SCHEDULERS

Timestamp Ordering (1/3)

• Each transaction is associated with a timestamp
– Ti indicates Transaction T with timestamp i.

• Each operation in the transaction has the same timestamp


TO Rule
If pi[x] and qk[x] are conflicting operations, then pi[x] is processed before qk[x] iff i < k

Theorem: If H is a history representing an execution produced by a TO scheduler, then H is serializable.


• For each data item x, maintain: max-rt(x), max-wt(x), c(x)
• Request ri[x]
– Grant request if TS(i) >= max-wt(x) and c(x); update max-rt(x)
– Delay if TS(i) > max-wt(x) and !c(x)
– Else abort and restart Ti

• Request wi[x]
– Grant request if TS(i) >= max-wt(x) and TS(i) >= max-rt(x); update max-wt(x), set c(x) = false
– Else abort and restart Ti
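The request rules above can be sketched as two functions over per-item state. The class and function names are hypothetical, and starting with both timestamps at 0 and c(x) true is an assumption; setting c(x) back to true at commit is not covered here.

```python
class Item:
    """Per-item bookkeeping: max-rt(x), max-wt(x) and the commit bit c(x)."""
    def __init__(self):
        self.max_rt = 0        # largest timestamp that has read x
        self.max_wt = 0        # largest timestamp that has written x
        self.committed = True  # c(x): has the last writer committed?

def request_read(item, ts):
    if ts >= item.max_wt and item.committed:
        item.max_rt = max(item.max_rt, ts)
        return "grant"
    if ts > item.max_wt and not item.committed:
        return "delay"  # wait for the writer to commit or abort
    return "abort"      # the read arrived too late

def request_write(item, ts):
    if ts >= item.max_wt and ts >= item.max_rt:
        item.max_wt = ts
        item.committed = False
        return "grant"
    return "abort"      # a younger transaction already read or wrote x
```

A read by a younger transaction of an uncommitted value is delayed rather than aborted, which is what the commit bit is for.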

ON YOUR OWN: Thomas write rule, actions taken when a transaction has to commit or abort

Validation

• Aggressively schedule all operations
• Do not commit until the transaction is “validated”

ON YOUR OWN

Summary

• Lock-based Schedulers
– 2-Phase Locking
– Tree Locking Protocol
– Multi-granularity Locking
– Locking in the presence of updates

• Non-lock-based Schedulers
– Timestamp Ordering
– Validation-based Concurrency Control (on your own)

RECOVERY

SOURCE: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom

Logging

• Log the operations in the transaction(s)

• Believe the log
– Does the log say transaction T has committed?
– Or does it say aborted?
– Or has only a partial trace (implicit abort)?

• In case of failures, reconstruct the DB from its log

The basic setup

[Figure: transactions T1, T2, T3, …, Tk, each with its own buffer space; a shared buffer space for data and log; the disk holding the data and the LOG]

Terminology

• Data item: an element which can be read or written
– tuple, relation, B+-tree index, etc.

Input x: fetch x from the disk to buffer
Read x,t: read x into local variable t
Write x,t: write value of t into x
Output x: write x to disk

Example

Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
Output P
Output S

update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000

System fails here

System fails here

System fails here

Logs

• Sequence of log records
• Need to keep track of
– Start of transaction
– Update operations (Write operations)
– End of transaction (COMMIT or ABORT)

• “Believe” the log, use the log to reconstruct a consistent DB state

Types of logs

• Undo logs
– Ensure that uncommitted transactions are rolled back (or undone)

• Redo logs
– Ensure that committed transactions are redone

• Undo/Redo logs
– Both of the above

All 3 logging styles ensure atomicity and durability

Undo Logging (1/3)

• <START T>: Start of transaction T
• <COMMIT T>
• <ABORT T>
• <T, A, x>: Transaction T modified A whose before-image is x.

Undo Logging (2/3)

Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
FLUSH LOG
Output P
Output S
FLUSH LOG

<START T>

<T, P, x>

<T, S, y>

<COMMIT T>

U1: <T, X, v> should be flushed before Output X

U2: <COMMIT T> should be flushed after all OUTPUTs

Undo Logging (3/3)

• Recovery with Undo log
1. If T has a <COMMIT T> entry, do nothing
2. If T has a <START T> entry, but no <COMMIT T>
• T is incomplete and needs to be undone
• Restore old values from <T,X,v> records

• There may be multiple transactions
– Start scanning from the end of the log
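Recovery with an undo log can be sketched as a backward scan. The record layout here is an assumption: hypothetical tuples `("START", T)`, `("UPDATE", T, X, before_image)` and `("COMMIT", T)`, with the disk modelled as a dict.

```python
def undo_recover(log, disk):
    """Scan the log from the end and restore before-images of incomplete transactions."""
    committed = set()
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind == "COMMIT":
            committed.add(txn)           # seen before any of txn's updates
        elif kind == "UPDATE" and txn not in committed:
            _, _, item, before_image = record
            disk[item] = before_image    # undo the uncommitted write
    return disk

log = [("START", "T"), ("UPDATE", "T", "P", 1000), ("UPDATE", "T", "S", "normal"),
       ("START", "U"), ("UPDATE", "U", "Q", 5), ("COMMIT", "U")]
disk = {"P": 900, "S": "cheap", "Q": 7}
```

Because the scan runs backward, a commit record is always seen before the updates it covers, and if an item was written several times by an incomplete transaction the oldest before-image is the one left on disk.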

Redo Logging (1/3)

• All incomplete transactions can be ignored

• Redo all completed transactions
• <T, A, x>: Transaction T modified A whose after-image is x.

Redo Logging (2/3)

Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
FLUSH LOG
Output P
Output S

<START T>

<T, P, x>

<T, S, y>

<COMMIT T>

Write-ahead Logging

R1: <T, X, v> and <COMMIT T> should be flushed before Output X

Redo Logging (3/3)

• Recovery with Redo Logging
– If T has a <COMMIT T> entry, redo T
– If T is incomplete, do nothing (add <ABORT T>)

• For multiple transactions
– Scan from the beginning of the log
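Recovery with a redo log can be sketched as a forward scan. The record layout is again an assumption: hypothetical tuples `("START", T)`, `("UPDATE", T, X, after_image)` and `("COMMIT", T)`, with the disk modelled as a dict; appending the abort records for incomplete transactions is omitted.

```python
def redo_recover(log, disk):
    """Scan the log from the beginning and reapply writes of committed transactions."""
    committed = {record[1] for record in log if record[0] == "COMMIT"}
    for record in log:
        if record[0] == "UPDATE" and record[1] in committed:
            _, _, item, after_image = record
            disk[item] = after_image  # redo the committed write
    return disk

log = [("START", "T"), ("UPDATE", "T", "P", 900), ("COMMIT", "T"),
       ("START", "U"), ("UPDATE", "U", "Q", 5)]
disk = {"P": 1000, "Q": 7}
```

Incomplete transactions are simply skipped: under rule R1 none of their writes can have reached the disk.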

Undo/Redo Logging (1/3)

• Undo logging: Cannot COMMIT T unless all updates are written to disk

• Redo logging: Cannot release memory unless transaction commits

• Undo/Redo logs attempt to strike a balance


Read P, x
x -= x * 0.1
Write P, x
Read S, y
y = “CHEAP”
Write S, y
FLUSH LOG
Output P
Output S

<START T>

<T, P, x, a>

<T, S, y, b>

<COMMIT T>

UR1: <T, X, a, b> should be flushed before Output X

U1: <T, X, v> should be flushed before Output X

U2: <COMMIT T> should be flushed after all OUTPUTs

R1: <T, X, v> and <COMMIT T> should be flushed before Output X


• Recovery with Undo/Redo Logging
– Redo all committed transactions (earliest-first)
– Undo all uncommitted transactions (latest-first)
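The two passes can be sketched as follows. The record layout is an assumption: hypothetical tuples `("START", T)`, `("UPDATE", T, X, before_image, after_image)` and `("COMMIT", T)`, with the disk modelled as a dict.

```python
def undo_redo_recover(log, disk):
    """Redo committed transactions forward, then undo the rest backward."""
    committed = {record[1] for record in log if record[0] == "COMMIT"}
    # Pass 1: redo committed transactions, earliest first.
    for record in log:
        if record[0] == "UPDATE" and record[1] in committed:
            _, _, item, _, after_image = record
            disk[item] = after_image
    # Pass 2: undo uncommitted transactions, latest first.
    for record in reversed(log):
        if record[0] == "UPDATE" and record[1] not in committed:
            _, _, item, before_image, _ = record
            disk[item] = before_image
    return disk

log = [("START", "T"), ("UPDATE", "T", "P", 1000, 900), ("COMMIT", "T"),
       ("START", "U"), ("UPDATE", "U", "Q", 7, 5)]
disk = {"P": 1000, "Q": 5}
```

In this example T committed but its write never reached the disk, while U's write reached the disk without a commit; recovery repairs both.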

What happens if there is a crash when you are writing a log? What happens if there is a crash during recovery?

Checkpointing

• Logs can be huge…can we throw away portions of it?

• Can we avoid processing all of it when there is a crash?

ON YOUR OWN
