transactions. what is it? transaction - a logical unit of database processing motivation - want...
TRANSCRIPT
Transactions
What is it?
• Transaction - a logical unit of database processing
• Motivation - want consistent change of state in data
• Transactions developed in 1950's – banking activities
• Idea of transaction formalized in 1970's • Transactions in clouds in 2010’s?
Why?
Want to interleave operations-
• increases throughput of the system (number of transactions that can finish in any given period)
• interleave I/Os and CPU time
Problems
1) Inconsistent result if crash in middle of transaction
crash due to: hardware failure, system error, exception condition
2) Can have error if concurrent execution3) Uncertainty when changes permanent. -- Write to disk every time?
Concurrency control and Recovery from failures are needed to solve these problems
ACID properties – Jim Gray
• ACID properties• Atomicity - transaction is indivisible,• Consistency - correct execution takes DB
from one consistent state to another• Isolation – transactions affect each other as
if not concurrent• Durability - once a transaction completes
(commits), changes made to data are permanent and transaction is recoverable
Systems guarantees 4 properties• ACID properties• Atomicity
– all or nothing, performs transaction in entirety – solves 1) Inconsistent result if crash in middle of transaction
• Consistency – a logical property based on some consistency rule, implied by isolation – If isolation is implemented properly – solves 2) correct execution takes DB from one consistent state to
another• Isolation
– equivalent to serial schedule (serializability) • T2 -> T1 or T1 -> T2
– takes care of 2) • Durability
– changes are never lost due to subsequent failures –– solves 3) Uncertainty when changes permanent
ACID properties
• Atomicity ensures recovery
• Consistency ensures programmer and DBMS enforce consistency
• Isolation ensures concurrency control is utilized
• Durability ensures recovery
Operations of transactions
• Read and Write of data items • Granularity can be rows of a table or a table
(Granularity can affect concurrency)• Each transaction has an identifier i assigned to it
(Ti) • Transaction commit statement - causes
transaction to end (commit) successfully (Ci) • Transaction abort (rollback) statement - all writes
undone (Ai) • System or User can specify commit and abort
R’s and W’s
Notation:
R1(X) - transaction 1 reads data item X W2(Y) - transaction 2 writes to data item Y C1 - transaction 1 commits
BL1 – transaction 1 blocks A2 - transaction 2 aborts, rollback
• A series of R's and W's is a schedule (history) • Allow multiple users to execute simultaneously to
access tables in a common DB • Concurrent access - concurrency
Lost Update Problem – Dirty Write
Dirty write - some value of DB is incorrect T1 T2 where A=10 R1(A) R2(A) (A=10)
A=A-10 W1(A) A=A+20 W2(A)
R1(A)R2(A)W1(A)C1W2(A)C2
The value of A is 30 when T1 and T2 are done, but it should be 20
Dirty Read ProblemTransaction updates DB item, then fails
T1 T2 where A=10 R1(A) A=A-10 W1(A) R1(C) R2(A) (A=0, reads value T1 wrote)
A1 - T1 fails R2(B) B=B+A W2(B)
R1(A)W1(A)R1(C)R2(A)W2(A)A1 R2(B)W2(B)C2
When T1 is aborted, A is reset to value of 10, values B written by T2 are incorrect
Unrepeatable ReadTransaction reads data item twice, in between values change
T1 T2 where A=10 R1(A) R1(B) B=B+A W1(B) R2(A) A=A+20 W2(A) R1(A) R1(C) C=C+A W1(C)
R1(A)R1(B)W1(B)R2(A)W2(A)C2R1(A)R1(C)W1(C)C1
When T1 first reads A it has a value of 10, then a value of 30
Degrees of Isolation
DBs try to achieve the following:
degree 0 - doesn't overwrite data updated (dirty data) by other transactions with degree at least 1
degree 1 - no lost updates (no dirty writes)
degree 2 - no lost updates and no dirty reads
degree 3 - degree 2 plus repeatable reads
degree 4 - serializability
Serializable
A transaction history is serializable if it is equivalent to some serial schedule (does not mean it is a serial schedule) The important word here is ‘some’
Example of two serial schedules:
R1(X)W1(X)C1 R2(X)W2(X)R2(Y)W2(Y)C2 T1<< T2
R2(X)W2(X)R2(Y)W2(Y)C2 R1(X)W1(X)C1 T2 << T1
Equivalence
• Is a given schedule equivalent to some serial schedule? R2(X)W2(X)R1(X)W1(X)R2(Y)W2(Y)C1C2
• How do we answer this question?
• What is equivalence? – Need same number of operations
• How to define equivalence
Result equivalence
Result equivalent if produce same final state
R1(X) (X*2)W1(X)C1 R2(X)(X+Xmod10)W2(X)C2
if X=10 result is X=20
R1(X)R2(X)(T2:X+Xmod10)W2(X)C2(T1: X*2)W1(X)
if X=10 result is X=20
Same results if X=10, but next try X= 7: (results are 18 and 14)
Conflict equivalence
First define conflicting operations
R and W conflict if: 1. from 2 different transactions 2. reference same data item 3. at least one is a write
Conflicting operations
The conflicting operations are: Ri(A) << Wj(A) read followed by a write Wi(A) << Rj(A) write followed by a read Wi(A) << Wj(A) write followed by a write
Ri(A) << Rj(A) don't conflict Ri(A) << Wj(B) don't conflict
Conflict equivalence
Conflict equivalent if order of any 2 conflicting operations is the same in both schedules
R1(A)W1(A)C1 R2(A)W2(A)C2 or in reverse order
R1(A)<<W2(A) R2(A)<<W1(A) W1(A)<<R2(A) W2(A)<<R1(A) W1(A)<<W2(A) W2(A)<<W1(A)
Conflict equivalence
R1(A)R2(A)W1(A)C1W2(A)C2
R1(A)<<W2(A) R2(A)<<W1(A)
W1(A)<<W2(A) NOT serializable - example of lost update
ExampleIs this schedule serializable? R2(X)W2(X)R1(X)W1(X)R2(Y)W2(Y)C1C2 R2(X)<<W1(X)
W2(X)<<R1(X) W2(X)<<W1(X)
T2<<T1
Is this schedule serializable? R2(X)R1(X)W2(X)R2(Y)W1(X)C1W2(Y)C2
R2(X)<<W1(X) R1(X)<<W2(X)
W2(X)<<W1(X)
not equivalent
Strategy
• There is a simple algorithm for determining conflict serializability of a schedule
• What is the algorithm?
Testing for Equivalence
Construct a precedence graph (aka serialization graph)
1. For each Ti in S, create node Ti in graph 2. For all Rj(X) after Wi(X) create an edge Ti->Tj 3. For all Wj(X) after Ri(X) create an edge Ti->Tj 4. For all Wj(X) after Wi(X) create an edge Ti-> Tj
5. Serializable iff no cycles
If a cycle in the graph, no equivalent serial history
Example
Is this the following schedule serializable?
R1(A)R2(A)W1(A)W2(A)C1C2
T1 T2
Serializability
• If a schedule is serializable, we can say it is correct • Serializable does not mean it is serial • Difficult to test for conflict serializability - interleaving of
operations is determined by the operating system scheduler
• Concurrency control methods don't test for it.
• Instead, protocols developed that guarantee a schedule is serializable.
Concurrency Control Techniques
Protocols - set of rules that guarantee serializability
1.locking
2.Timestamps
3.multiversion
4.validation or certification - optimistic