DC8: Transactions Chapter 12

Transactions and Concurrency Control

Transactions

We study transactions here because they require a lot of synchronization and coordination.

Transactions (think databases) have database tables and data items as shared resources. Transactions have the additional capability of coordinating the update of several of these “resources” at once. It is as if a process must have the CR (critical region) for several resources at the same time.

Topics

The transaction environment

Serializability theory
  Schedules and conflicts
  Recoverability, cascading aborts

Mechanisms to enforce serializability
  Two-phase locking
  Timestamp concurrency control

The Transaction Model

A transaction is a unit of program execution that accesses and possibly updates various data items.

A transaction must see a consistent database.

During transaction execution the database may be inconsistent.

When the transaction is committed, the database must be consistent.

Two main issues to deal with:

Failures of various kinds, such as hardware failures and system crashes

Concurrent execution of multiple transactions

Concurrent execution of user programs is essential for good DBMS performance.

A user's program may carry out many operations on the data retrieved from the DB; however, the DBMS is concerned only with the reads and writes.

Users submit transactions and the DBMS interleaves the operations to achieve concurrency.

Managing this interleaving is called concurrency control.

The Transaction Model (ACID)

Atomicity. Either all operations of the transaction are properly reflected in the database or none are.

Consistency. Execution of a transaction in isolation preserves the consistency of the database.

Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.

Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Example: Funds Transfer

Transaction to transfer $50 from account A to account B:

1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Consistency requirement – the sum of A and B is unchanged by the execution of the transaction.

Atomicity requirement — if the transaction fails on any step (after step 3 and before step 6) the system ensures that its updates are not reflected in the database.

Example: Funds Transfer, continued

Durability requirement: once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the DB must persist despite failures.

Isolation requirement: if, between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be), violating the isolation requirement. Isolation can be ensured by running transactions serially.
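The atomicity and consistency requirements of the transfer can be made concrete with a small sketch. It assumes an in-memory dict stands in for the database and that an overdraft is the failure that strikes after step 3; the names `transfer`, `InsufficientFunds`, and `accounts` are illustrative and do not come from the slides.

```python
class InsufficientFunds(Exception):
    """Hypothetical failure used to force an abort partway through."""


def transfer(db, src, dst, amount):
    """Move `amount` from db[src] to db[dst] as an all-or-nothing unit."""
    snapshot = dict(db)              # old values, kept so an abort can restore them
    try:
        a = db[src]                  # 1. read(A)
        a = a - amount               # 2. A := A - amount
        db[src] = a                  # 3. write(A)
        if a < 0:
            # Failure after step 3 and before step 6.
            raise InsufficientFunds(src)
        b = db[dst]                  # 4. read(B)
        b = b + amount               # 5. B := B + amount
        db[dst] = b                  # 6. write(B)
        return True                  # commit
    except InsufficientFunds:
        db.clear()
        db.update(snapshot)          # abort: none of the updates remain
        return False


accounts = {"A": 100, "B": 200}
total_before = accounts["A"] + accounts["B"]
ok = transfer(accounts, "A", "B", 50)        # commits
failed = transfer(accounts, "A", "B", 500)   # aborts and is rolled back
```

After both calls the sum A + B is unchanged, and the aborted transfer leaves no trace of its write to A. The sketch says nothing about durability or isolation, which need stable storage and concurrency control respectively.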

Examples of primitives for transactions.

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

The Transaction Model: Aborts

(a) Transaction to reserve three flights commits:

BEGIN_TRANSACTION
  reserve SFO -> JFK;
  reserve JFK -> Nairobi;
  reserve Nairobi -> Malindi;
END_TRANSACTION

(b) Transaction aborts when the third flight is unavailable:

BEGIN_TRANSACTION
  reserve SFO -> JFK;
  reserve JFK -> Nairobi;
  reserve Nairobi -> Malindi full => ABORT_TRANSACTION

Transaction States

Active: the transaction is executing.

Failed: after the discovery that normal execution can no longer proceed.

Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
  restart the transaction (only if there was no internal logical error)
  kill the transaction

Committed: after successful completion.

Reasons for Concurrency

Multiple transactions are allowed to run concurrently in the system. Advantages are:

Increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk.

Reduced average response time for transactions: short transactions need not wait behind long ones.

Distributed Transactions

Flat transactions versus nested transactions (which allow partial results to be committed).

A nested transaction is a transaction that is logically decomposed into a hierarchy of subtransactions.

A distributed transaction is a logically flat indivisible transaction that operates on distributed data.

Distributed Transactions

(a) A nested transaction. (b) A distributed transaction.

Transaction Atomicity, Isolation, and Durability

Conceptually, when a transaction starts, it is given a private workspace to make its changes to.

When it commits, the private workspace replaces the corresponding data items in the permanent workspace. If the transaction aborts, the private workspace can simply be discarded.

This type of implementation leads to many private workspaces and thus consumes a lot of space. Also, if a transaction only reads a data table or item, it doesn’t need a private copy.
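A minimal sketch of the private-workspace idea follows, assuming the permanent workspace is a plain dict. The class name `WorkspaceTransaction` and its methods are illustrative, not taken from the slides. Note that only written items are copied, so a read-only transaction holds no private data.

```python
class WorkspaceTransaction:
    """Buffers writes privately; the shared store changes only on commit."""

    def __init__(self, store):
        self.store = store       # the permanent workspace
        self.private = {}        # holds only the items this transaction wrote

    def read(self, key):
        # Reads fall through to the permanent workspace unless this
        # transaction has already written the item.
        if key in self.private:
            return self.private[key]
        return self.store[key]

    def write(self, key, value):
        self.private[key] = value

    def commit(self):
        # The private values replace the corresponding permanent items.
        self.store.update(self.private)
        self.private = {}

    def abort(self):
        # Nothing to undo: the private workspace is simply discarded.
        self.private = {}


store = {"x": 0, "y": 0}

t1 = WorkspaceTransaction(store)
t1.write("x", 1)
own_view = t1.read("x")          # t1 sees its own write
shared_view = store["x"]         # the permanent workspace is untouched
t1.abort()

t2 = WorkspaceTransaction(store)
t2.write("y", 2)
t2.commit()
```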

Private Workspace

(a) The file index and disk blocks for a three-block file. (b) After a transaction has modified block 0 and appended block 3. (c) After committing.

More Efficient Implementation

Two common methods of implementation are write-ahead logs and before images.

With write-ahead logs, the transactions act on the permanent workspace, but before they can make a change, a log record is written to stable storage with the transaction and data item ID and the old and new values.

This log can then be used if the transaction aborts and the changes need to be rolled back.

Write-ahead Log

(a) A transaction. (b) to (d) The log before each statement is executed.

(a)
x = 0;
y = 0;
BEGIN_TRANSACTION;
  x = x + 1;
  y = y + 2;
  x = y * y;
END_TRANSACTION;

(b) [x = 0/1]
(c) [x = 0/1] [y = 0/2]
(d) [x = 0/1] [y = 0/2] [x = 1/4]
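The log in this figure can be reproduced with a short sketch. It assumes a Python list stands in for stable storage and a dict for the permanent workspace; `LoggedTransaction` and the transaction ID "T1" are illustrative names.

```python
class LoggedTransaction:
    """Updates the store in place, logging old and new values first."""

    def __init__(self, txn_id, store, log):
        self.txn_id = txn_id
        self.store = store
        self.log = log           # stand-in for stable storage

    def write(self, key, new):
        old = self.store[key]
        # The log record is written before the change is made.
        self.log.append((self.txn_id, key, old, new))
        self.store[key] = new

    def abort(self):
        # Roll back by restoring old values, newest record first.
        for txn_id, key, old, _new in reversed(self.log):
            if txn_id == self.txn_id:
                self.store[key] = old


store = {"x": 0, "y": 0}
log = []
t = LoggedTransaction("T1", store, log)
t.write("x", store["x"] + 1)             # x = x + 1
t.write("y", store["y"] + 2)             # y = y + 2
t.write("x", store["y"] * store["y"])    # x = y * y

entries = [(key, old, new) for _txn, key, old, new in log]
state_after_writes = dict(store)
t.abort()
```

The three log entries match [x = 0/1], [y = 0/2], [x = 1/4], and undoing them in reverse order returns both items to 0.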

Before- and After- Images

A before- and after-image is kept for each data item.

When a data item is changed, the old value is written to the before-image and the new value is the after-image.

Other transactions are not allowed to “see” the new value until the current transaction commits.

The after-image is made permanent and durable once the transaction which wrote it commits.

If the transaction aborts, the before-image is restored.

DBMS Organization

General organization of managers for handling transactions.

DBMS Organization

General organization of managers for handling distributed transactions.

Concurrency Control

Concurrency control schemes – mechanisms to achieve isolation, i.e., to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.

These schemes are used along with logs and before- and after-images.

We need a definition of correct execution of transactions: serializability.

Transaction Schedules

Schedules: sequences that indicate the chronological order in which instructions of concurrent transactions are executed.

A schedule for a set of transactions must consist of all instructions of those transactions.

A schedule must preserve the order in which the instructions appear in each individual transaction.

Example Schedule

Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. Here is a serial schedule, in which T1 is followed by T2.

Example Continued

Let T1 and T2 be the transactions defined previously. This schedule is not a serial schedule, but it is equivalent to Schedule 1. That is, the sum A+B is preserved. All the effects are the same as they would be if the schedule were serial.

Not Serializable

This concurrent schedule does not preserve the value of the sum A + B and is not equivalent to the serial schedule.

Serializability

Assumption – Each transaction preserves database consistency. Thus a serial execution of a set of transactions preserves database consistency.

A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.

We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules consist of only read and write instructions.

Conflict Serializability

Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.

Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they were interchanged.

Conflict Serializable

If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

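We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Example of a schedule that is not conflict serializable (one transaction's write of Q falls between the other transaction's read and write of Q, so neither serial order can be reached by swapping non-conflicting instructions):

read(Q)
              write(Q)
write(Q)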
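Conflict serializability can be checked mechanically. The sketch below uses a precedence graph, a standard technique that the slides do not name: draw an edge from Ti to Tj whenever an instruction of Ti conflicts with a later instruction of Tj, and the schedule is conflict serializable exactly when the graph has no cycle. The tuple encoding and the transaction names Ta and Tb are assumptions made for illustration.

```python
def conflicts(op1, op2):
    """Ops are (txn, action, item) tuples with action 'r' or 'w'."""
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    # Different transactions, same item, at least one write.
    return t1 != t2 and q1 == q2 and "w" in (a1, a2)


def precedence_edges(schedule):
    """Edge (Ti, Tj) means an op of Ti conflicts with a later op of Tj."""
    edges = set()
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                edges.add((earlier[0], later[0]))
    return edges


def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    nodes = {op[0] for op in schedule}
    # Peel off transactions with no incoming edge from a remaining
    # transaction; if none can be removed, the graph has a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(dst == n and src in nodes for src, dst in edges)}
        if not free:
            return False
        nodes -= free
    return True


# The three-instruction example: Tb's write falls between Ta's read and write.
interleaved = [("Ta", "r", "Q"), ("Tb", "w", "Q"), ("Ta", "w", "Q")]
serial = [("Ta", "r", "Q"), ("Ta", "w", "Q"), ("Tb", "w", "Q")]

interleaved_ok = is_conflict_serializable(interleaved)
serial_ok = is_conflict_serializable(serial)
```

For the interleaved schedule the graph contains both Ta -> Tb and Tb -> Ta, which is the cycle that rules out any equivalent serial order.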

Conflict Serializable

This schedule is conflict serializable.

Remember Recovery?

The DB must behave as if it contains all of the effects of committed transactions and none of the effects of uncommitted ones. So, when a transaction aborts, the DBMS must wipe out all its effects.

If a transaction, t1, writes a value to data item x, and t2 reads that value, what happens when t1 subsequently aborts?

Recoverable Schedules

If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence if T9 is allowed to commit before T8, the schedule is not recoverable.

In order to be recoverable, a transaction is not allowed to commit until every transaction it reads from has committed.
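The commit-order rule can be expressed as a small check. This is a sketch under stated assumptions: schedules are lists of (transaction, action, item) tuples with actions 'r', 'w', and 'c' for commit, aborts are not modelled, and the two sample schedules are invented to match the T8/T9 discussion rather than copied from the slide.

```python
def is_recoverable(schedule):
    """True if no transaction commits before a transaction it read from."""
    last_writer = {}     # item -> transaction that most recently wrote it
    reads_from = {}      # transaction -> set of transactions it read from
    committed = set()
    for txn, action, item in schedule:
        if action == "w":
            last_writer[item] = txn
        elif action == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "c":
            # Every transaction this one read from must already be committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True


# T9 reads T8's write and commits first: not recoverable.
early_commit = [("T8", "w", "A"), ("T9", "r", "A"), ("T9", "c", None)]
# T9 waits for T8 to commit: recoverable.
ordered_commit = [("T8", "w", "A"), ("T9", "r", "A"),
                  ("T8", "c", None), ("T9", "c", None)]

early_ok = is_recoverable(early_commit)
ordered_ok = is_recoverable(ordered_commit)
```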

Cascading Aborts

Cascading aborts: a single transaction failure can lead to a series of transaction rollbacks. Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable).

If T10 fails, T11 and T12 must also be rolled back.

How to Avoid Cascading Aborts

If we ensure that every transaction reads only those items whose values were written by committed transactions, the schedule will avoid cascading aborts.

This restricts the reads of a transaction. What about the writes? Will restricting writes give us a useful property?

Strict Executions

If we ensure that every transaction writes to only those items whose values were written by committed transactions, the schedule is strict.

This nice property ensures that we only have to keep one before-image.

T1: write(A)
T2: write(A)
T3: write(A)
Two of the transactions then abort.

Levels of Consistency (SQL92)

Serializable: the default.

Repeatable read: only committed records may be read, and repeated reads of the same record must return the same value. However, the execution may still not be serializable. The scheduler must maintain the RR property: if t1 makes a second read request and X has been modified, then t1 will be aborted.

Read committed: only committed records can be read, but successive reads of a record may return different (but committed) values.

Read uncommitted: even uncommitted records may be read (browse).

Repeatable Read but not Serializable

T1            T2
Read(A)
B=A+1
Write(B)
              Read(B)
              Read(C)
              Commit
Read(A)
C=A+2
Write(C)

Read Committed but not Repeatable Read

T1            T2
Read(A)
B=A+1
Write(B)
              Read(A)
              A=A*2
              Write(A)
              Commit
Read(A)
B=A+2
Write(B)

Example

Is the schedule on the next slide conflict serializable, and if so, what is a conflict-equivalent serial order?

Remember: two operations conflict if they are from different transactions, they access the same data item, and at least one of them is a write.

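[Schedule over transactions T1 to T5. The column layout of the slide is not preserved; the operations appear in this order: read(X), read(Y), read(Z), read(V), read(W), read(W), read(Y), write(Y), write(Z), read(U), read(Y), write(Y), read(Z), write(Z), read(U), write(U).]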

How to Enforce Serializability?

Pessimistic approach: prevent transactions from accessing data that might lead to a conflict.

Optimistic approach: allow transactions to access the data, but require them to “validate” before committing.

Two Phase Locking (1)

Pessimistic approach.

Easiest and most widely used way.

Scheduler maintains a lock for each data item. An item is locked on behalf of a transaction and then no other transaction can access it.

Refinement: distinguish between read locks and write locks. Read locks can be shared with other readers.

Rules for Two Phase Locking (2)

Transaction must get a read or write lock on data item d before reading d and must get a write lock on d before writing to d.

After a transaction relinquishes a lock, it may not acquire any new locks.
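The two rules can be checked for a single transaction's sequence of lock and data operations, as in the sketch below. It assumes operations are (action, item) pairs using the actions RL, WL, UL, read, and write; the function name and sample sequences are illustrative. It does not model lock conflicts between different transactions, only whether one transaction follows the protocol.

```python
def obeys_two_phase_locking(ops):
    """Check one transaction's (action, item) sequence against the 2PL rules."""
    held = {}            # item -> "R" or "W" for locks currently held
    shrinking = False    # becomes True once any lock has been released
    for action, item in ops:
        if action in ("RL", "WL"):
            if shrinking:
                return False                 # no new locks after a release
            if action == "WL":
                held[item] = "W"             # also covers a read-to-write upgrade
            else:
                held.setdefault(item, "R")
        elif action == "UL":
            held.pop(item, None)
            shrinking = True
        elif action == "read":
            if item not in held:
                return False                 # needs a read or write lock
        elif action == "write":
            if held.get(item) != "W":
                return False                 # needs a write lock
    return True


upgrade = [("RL", "Q"), ("read", "Q"), ("WL", "Q"), ("write", "Q"), ("UL", "Q")]
late_lock = [("RL", "A"), ("read", "A"), ("UL", "A"), ("WL", "B"), ("write", "B")]
weak_lock = [("RL", "A"), ("write", "A")]

upgrade_ok = obeys_two_phase_locking(upgrade)
late_lock_ok = obeys_two_phase_locking(late_lock)
weak_lock_ok = obeys_two_phase_locking(weak_lock)
```

The second sequence fails because it acquires a lock on B after releasing A; the third fails because it writes A while holding only a read lock.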

Two-Phase Locking (3)

2 Phase Locking

Two Phase Locking (4)

Strict 2PL avoids cascading aborts by preventing transactions from seeing uncommitted values.

Locks are acquired then held until the transaction is ready to commit or is aborted.

Two-Phase Locking (5)

Strict two-phase locking.

Two Phase Locking

T1            T2
RL(Q)
read(Q)
              RL(Q)
              read(Q)
              UL(Q)
WL(Q)
              commit
write(Q)
commit

RL(X) means acquire a read lock on X

WL(X) means acquire a write lock on X

UL(X) means unlock X

Two Phase Locking Prevents Schedules That Are Not Serializable

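T2: Read(B), Read(C), Commit
T1 (final steps): Read(A), C=A+2, Write(C)

This is the repeatable read example. T2 will not be able to get the locks it needs: T1 holds a write lock on B from its earlier Write(B) and, under the two-phase rule, cannot release it until it has acquired its write lock on C, so T2 cannot read both B and C in between.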

Some Serializable Schedules Are Also Prevented

The scheduler either acquires all locks at the start of all transactions or it acquires them as needed for all transactions. This schedule can only be done with 2PL by a combination of those strategies.

Pessimistic Timestamp Ordering

Every transaction gets a (Lamport, totally ordered) timestamp when it starts.

Every data item has a read ts and a write ts and a commit bit c.

The read ts is the ts of the transaction that most recently read the data item. The write ts is the ts of the transaction that most recently wrote to the item.

The commit bit c is true if and only if the most recent transaction to write to that item has committed.

The scheduler maintains the item timestamps and checks to make sure the reads and writes are correct. Goal is to enforce serializability.

Read Too Late

T1 tries to read X, but ts(T1) < write-ts(X) meaning X has been written to by a later transaction.

T1 should not be allowed to read X because it was written by a transaction that occurs later in the serialization order (transactions are serialized by start time).

Solution: T1 is aborted.

Timeline: T1 starts; T2 starts; T2 writes X; T1 tries to read X.

Write Too Late

T1 tries to write X, but the read-ts indicates that some other transaction should have read the value about to be written. write-ts(X) < ts(T1) < read-ts(X)

Solution: T1 is aborted.

Timeline: T1 starts; T2 starts; T2 reads X; T1 tries to write X.

Dirty Reads

T1 reads X that was last written by T2. The timestamps are properly ordered, but the commit bit c=false so if T2 later aborts then T1 must abort.

Solution: We can avoid cascading aborts by delaying T1’s read until T2 has committed (though not necessary to ensure serializability).

Timeline: T2 starts; T1 starts; T2 writes X; T1 tries to read X; T2 aborts.

Thomas Write Rule

T2 has written to X before T1. When T1 tries to write, the appropriate action is to do nothing. No other transaction T3 that should have read T1’s value of X got T2’s value instead, because it would have been aborted because of a too late read. Future reads of X want T2’s value or a later value, not T1’s value.

Solution: T1’s write can be skipped if T2 commits.

Timeline: T1 starts; T2 starts; T2 writes X; T1 tries to write X.

Commit Requests

Transaction commit requests are also passed to the scheduler.

To ensure strict executions, a commit request can be delayed until all transactions that wrote items that it overwrote have committed.

The scheduler sets the commit bit c on data items in the write set when it services the commit request.

TS Ordering Rules

When the scheduler receives a read request from transaction T:

If ts(T) >= write-ts(X) and c(X) is true, grant the request and set read-ts(X) to MAX{ts(T), read-ts(X)}.

If ts(T) >= write-ts(X) and c(X) is false, delay T until c(X) becomes true or the transaction that wrote X aborts.

If ts(T) < write-ts(X), abort T and restart it with a new timestamp.

TS Ordering Rules, continued

When the scheduler receives a write request from transaction T:

If ts(T) >= read-ts(X) and ts(T) >= write-ts(X), grant the request, set write-ts(X) to ts(T), and set c(X) = false.

If ts(T) >= read-ts(X) and ts(T) < write-ts(X), don't do the operation but allow T to continue as if it had been done (Thomas write rule).

If ts(T) < read-ts(X), abort T and restart it with a new timestamp.
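The read and write rules translate almost directly into code. The following is a sketch, not a full scheduler: it assumes integer timestamps, returns a decision string instead of actually delaying or restarting a transaction, and, like the rules as stated, does not special-case a transaction reading its own uncommitted write. The names `Item`, `on_read`, `on_write`, and `on_commit` are illustrative.

```python
from dataclasses import dataclass

GRANT, DELAY, ABORT, SKIP = "grant", "delay", "abort", "skip"


@dataclass
class Item:
    value: object = None
    read_ts: int = 0          # ts of the latest transaction to read the item
    write_ts: int = 0         # ts of the latest transaction to write the item
    committed: bool = True    # the commit bit c


def on_read(item, ts):
    if ts < item.write_ts:
        return ABORT                          # read too late
    if not item.committed:
        return DELAY                          # would be a dirty read
    item.read_ts = max(ts, item.read_ts)
    return GRANT


def on_write(item, ts, value):
    if ts < item.read_ts:
        return ABORT                          # write too late
    if ts < item.write_ts:
        return SKIP                           # Thomas write rule
    item.value = value
    item.write_ts = ts
    item.committed = False
    return GRANT


def on_commit(item):
    item.committed = True                     # scheduler sets the commit bit


x = Item()
steps = [
    on_write(x, 2, "b"),      # grant: write-ts(X) becomes 2, c(X) false
    on_read(x, 1),            # abort: ts 1 < write-ts 2
    on_read(x, 3),            # delay: the writer has not committed
]
on_commit(x)
steps.append(on_read(x, 3))           # grant: read-ts(X) becomes 3
steps.append(on_write(x, 1, "a"))     # abort: ts 1 < read-ts 3

y = Item()
on_write(y, 2, "b")
thomas = on_write(y, 1, "a")          # skip: a later write is already in place
```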

Pessimistic TS Ordering

If the scheduler enforces these rules, transactions will be serializable. The serial order is the order of their timestamps.

The next slide is an example of 3 transactions T1, T2, and T3. T1 runs first and completes and has used every item T2 and T3 want. In a, b, c and d, T2 requests a write(x) at the end of the given sequence. In e, f, g and h, T2 requests a read(x) at the end of the sequence.

In (d), T2 could continue (Thomas write rule) if there are no intervening reads. In (f) timestamps of T2 and T3 are reversed.

Pessimistic Timestamp Ordering

Concurrency control using timestamps.

Tent (tentative) means written, but the transaction has not yet committed.

Optimistic Timestamp Ordering

In any optimistic concurrency control, each transaction does its writes to a private workspace until completion of a validation phase.

In the validate phase, the scheduler validates the transaction by comparing its read set and write set with those of other transactions.

After validation, the write set values are written to the database and the transaction commits.

Validation is frequently done with the help of timestamps.
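One common way to validate is backward validation: at commit time, a transaction fails if any item it read was written by a transaction that committed after it started. The sketch below illustrates that variant only; the slides do not say which validation scheme they intend. A commit counter stands in for timestamps, and `OptimisticStore` and its methods are illustrative names.

```python
class OptimisticStore:
    """Private writes, then validate-and-commit (backward validation)."""

    def __init__(self, data):
        self.data = dict(data)
        self.commit_counter = 0
        self.history = []        # (commit number, set of keys written)

    def begin(self):
        return {"start": self.commit_counter, "reads": set(), "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        txn["reads"].add(key)
        return self.data[key]

    def write(self, txn, key, value):
        txn["writes"][key] = value           # private workspace only

    def commit(self, txn):
        # Validation: compare the read set with the write sets of
        # transactions that committed after this one started.
        for number, written in self.history:
            if number > txn["start"] and written & txn["reads"]:
                return False                 # validation fails: abort
        self.commit_counter += 1
        self.history.append((self.commit_counter, set(txn["writes"])))
        self.data.update(txn["writes"])      # write phase
        return True


store = OptimisticStore({"A": 100})
t1 = store.begin()
t2 = store.begin()
store.write(t1, "A", store.read(t1, "A") + 50)
store.write(t2, "A", store.read(t2, "A") - 10)
t1_committed = store.commit(t1)
t2_committed = store.commit(t2)
```

Here t2 read A before t1 committed a new value for it, so t2 fails validation and its private write is never applied.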

Summary and Examples

Rules for Two Phase Locking

Transaction must get a read or write lock on data item d before reading d and must get a write lock on d before writing to d.

A read lock can be shared with other read locks. A write lock is an exclusive lock.

After a transaction relinquishes a lock, it may not acquire any new locks.

Example One

Is this schedule serializable and would it be permitted under the rules of two phase locking? Where would the locks be acquired?

What about Strict 2PL?

Read(A), Read(B), Commit

Read(A), C=A+2, Write(C), Commit

Example Two

Is this schedule serializable and would it be permitted under the rules of two phase locking?

Pessimistic Timestamp Ordering

Every transaction gets a (Lamport, totally ordered) timestamp when it starts.

Every data item has a read ts and a write ts and a commit bit c.

The read ts is the ts of the transaction that most recently read the data item. The write ts is the ts of the transaction that most recently wrote to the item.

The commit bit c is true if and only if the most recent transaction to write to that item has committed.

TS Ordering Rules

When the scheduler receives a read request from transaction T:

If ts(T) >= write-ts(X) and c(X) is true, grant the request and set read-ts(X) to MAX{ts(T), read-ts(X)}.

If ts(T) >= write-ts(X) and c(X) is false, delay T until c(X) becomes true or the transaction that wrote X aborts.

If ts(T) < write-ts(X), abort T and restart it with a new timestamp.

TS Ordering Rules, continued

When the scheduler receives a write request from transaction T:

If ts(T) >= read-ts(X) and ts(T) >= write-ts(X), grant the request, set write-ts(X) to ts(T), and set c(X) = false.

If ts(T) >= read-ts(X) and ts(T) < write-ts(X), don't do the operation but allow T to continue as if it had been done (Thomas write rule).

If ts(T) < read-ts(X), abort T and restart it with a new timestamp.

Example One

Is this schedule serializable and how would it be handled in timestamp ordering?

Example Two

Is this schedule serializable and how would it be handled in timestamp ordering?
