
CSC 204 Module: P3_tp_204_s20.doc

Introduction to transaction processing (tp)

First reading assignment for tp: the first of the tp chapters in E&N, plus sections 1 and 2 of the second tp chapter (page/chapter numbers used here are from Edition #5)

The first tp chapter reading introduces transaction processing (tp). Its transaction representation (Section 21.1.2) is a simple model that does not correspond to the way any real RDBMS actually does tp.

The Elmasri/Navathe (E&N) textbook does not contain specifics on the Oracle tp implementation. We will cover (an introduction to) Oracle tp in this course.

First E&N discussion of some concurrent tp behaviors

Def - a transaction (tr) is a source code sequence op1, ..., opn, where each opj is a DB operation: read data, write data, a control operation (such as terminating the tr), etc. The notation includes the scope of the tr. (Unfortunately, tr scope is not fully defined in the SQL standard.)

Concurrent transactions

Consider two trs, tr1: op11, op12, ..., op1n and tr2: op21, op22, ..., op2m. In general, each tr is 1) scheduled and 2) executed. The distinct execution modes are 1) serial and 2) concurrent. With serial execution, all tr1 operations are applied, and then all tr2 operations are applied (OR the reverse order). With concurrent execution, the operations of tr1 and tr2 can be interleaved, so that some op1j are executed, j < n, then some op2k are executed, etc. In the example, op1j might have a physical I/O wait, during which tr2's operation op2k (and maybe others as well) could be started or resumed. Important: when trs interleave their operations, the order of execution within each tr must be the same as defined in that tr's source code. Although it is more complex to implement, interleaving usually improves tp throughput.

Def - throughput means the number of trs executed per unit time.

E&N identifies three bad/undesirable behaviors that can happen when concurrent transaction Rs (reads), Ws (writes), and control operations are not interleaved properly. Below, we summarize the E&N tp model, but with examples different from theirs.

Lost update problem, Fig 21.3 a) – Let x be a variable in the DB accessible to tr T1 and tr T2. Let the initial x value be 0. T1 intends to add 1 to x and T2 intends to add 2 to x. The final value of x should be 3 for either order of the adds: do all opj of T1, then all opk of T2, or the reverse (serial execution). Suppose T2 reads x first, but BEFORE T2 finishes changing x to 2 in the DB, T1 reads x from the DB (the value T1 reads is the initial DB value, 0, the same value T2 read). After T2 finishes writing the value (0+2) to x in the DB, T1 writes the value (0+1) = 1 to x in the DB, and the final value of x in the DB is 1, not 3.
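To see which interleavings cause trouble, the schedules allowed by the ordering rule above can be enumerated mechanically. The Python sketch below is a toy model, not E&N's notation or any RDBMS scheduler: it assumes one DB item x starting at 0, and each tr keeps a private copy of x between its read and its write. The names interleavings and run are hypothetical helpers. Only the two serial schedules end with x = 3; every truly interleaved schedule loses one of the two updates.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        merged, ia, ib = [], 0, 0
        for pos in range(n):
            if pos in a_slots:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def run(schedule, x_db=0):
    """Apply (tr, op, delta) steps to a single DB item and return its final value."""
    local = {}                          # each tr's private copy (y1, y2 in the text)
    for tr, op, delta in schedule:
        if op == "r":
            local[tr] = x_db            # read x_db into the tr's local variable
        else:
            x_db = local[tr] + delta    # write (local copy + delta) back to the DB
    return x_db


T1 = [("T1", "r", 0), ("T1", "w", 1)]   # T1 intends x := x + 1
T2 = [("T2", "r", 0), ("T2", "w", 2)]   # T2 intends x := x + 2

results = {}
for schedule in interleavings(T1, T2):
    label = " ".join(f"{op}{tr[1]}" for tr, op, _ in schedule)   # e.g. "r2 r1 w2 w1"
    results[label] = run(schedule)

for label, final in results.items():
    print(f"{label}  ->  x = {final}")
```

The schedule "r2 r1 w2 w1" is the scenario described in the text and ends with x = 1.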

Details of Lost update scenario

Value x_db, stored in the DB, is readable/writeable by trs T1 and T2. Recall that values stored in a DB are stored in rows that are grouped into blocks. When a value x (number, string, or other scalar type) is read from the DB, the entire block containing x is read into memory/cache (assume here a single-block read).

y1 and y2 are names for variables local to the program address spaces of trs T1 and T2, respectively. These two trs read and update their own (separate) copies of the initial x_db value.


Assume the following interleaved processing for trs T1 and T2:

1. Initially the DB value is x_db [ 0 ]   <-- Notation: Location name [current value]
2. T2 reads x_db into y2: y2 [ 0 ]
3. T2 modifies y2 by +2: y2 [ 2 ]
4. T2 is paused or suspended by the tp or OS scheduler
5. T1 reads x_db into y1: y1 [ 0 ]
6. T1 is paused or suspended by the tp or OS scheduler
7. T2 resumes and writes y2 to the DB: x_db [ 2 ]
8. T2 finishes
9. T1 resumes and modifies y1 by +1: y1 [ 1 ]
10. T1 writes y1 to the DB: x_db [ 1 ]

=====================================
The final x_db result is 1, not 3, because T2's update was overwritten/destroyed.

The overwrite happened because T1 read the initial x_db value from the database, rather than being forced to wait for T2's update of x_db to be completed.

In general, successive updates to a value in a DB must each be processed to completion. This means that whenever a DB value v is being updated by a tr, no other tr should be allowed to change v until the current update is completed (either succeeded or failed, that is, committed or rolled back).

Cause: T2's update is "lost/overwritten"; T1 read the original x_db value, not T2's changed x_db value.
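The rule just stated can be sketched as a toy exclusive lock that a tr takes when it reads x_db for update and releases only at commit. This is an assumption made for illustration (the Item class and its methods are hypical helpers, not an RDBMS API). Replaying the ten steps above under that rule forces T1 to wait at step 5, so T1 later reads T2's committed result and the final value is 3.

```python
class Item:
    """One DB item guarded by an exclusive lock that is held until commit (toy model)."""

    def __init__(self, value):
        self.value = value
        self.holder = None                  # tr currently holding the lock

    def lock_and_read(self, tr):
        """Return the value, or None if another tr holds the lock (caller must wait)."""
        if self.holder not in (None, tr):
            return None
        self.holder = tr
        return self.value

    def write(self, tr, value):
        assert self.holder == tr, "a tr must hold the lock to write"
        self.value = value

    def commit(self, tr):
        if self.holder == tr:
            self.holder = None              # lock released only at commit


x = Item(0)                                 # step 1: x_db [ 0 ]
log = []

y2 = x.lock_and_read("T2") + 2              # steps 2-3: y2 [ 2 ]
y1 = x.lock_and_read("T1")                  # step 5: T1 tries to read x_db
log.append("T1 waits" if y1 is None else f"T1 read {y1}")
x.write("T2", y2)                           # step 7: x_db [ 2 ]
x.commit("T2")                              # step 8: T2 finishes, lock released
y1 = x.lock_and_read("T1")                  # T1 retries and now sees T2's result
log.append(f"T1 read {y1}")
x.write("T1", y1 + 1)                       # x_db [ 3 ]
x.commit("T1")
print(log, x.value)                         # ['T1 waits', 'T1 read 2'] 3
```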

Dirty read problem – Let x be a variable (in the DB) accessible to trs T1 and T2. Let the initial x value be 0. T1 changes x from 0 to 1 in the DB, and continues … AND, meanwhile, T2 starts and is allowed to read the changed x value, namely 1, then computes y ← x + 4, and continues … Next, T1 aborts, and all of T1's DB effects/changes that occurred or are pending are cancelled. In tp terminology, T1 is "rolled back". Cancelling T1's effects is an RDBMS tp requirement. After T1 aborts, the change T1 made to x is (logically) invalid. However, T2 is continuing, and using T1's update. ==> We say that the current value of y that T2 computed is "dirty". The value of y (5, not 4) used an invalid update of x (the value 1, which "never existed" in the DB). Cause: computing with a value that was never "committed" as a valid value.
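The difference between reading uncommitted and committed data can be shown with a minimal Python sketch. It assumes a toy store with a committed dictionary and a separate dictionary for T1's pending changes; the read helper and its allow_dirty flag are hypothetical. A dirty read gives y = 5, a committed-only read gives y = 4, and after T1's rollback the DB still holds 0.

```python
committed = {"x": 0}    # values made permanent by COMMIT
pending = {}            # T1's uncommitted changes


def read(name, allow_dirty):
    """Return T1's pending value only when dirty reads are allowed."""
    if allow_dirty and name in pending:
        return pending[name]
    return committed[name]


pending["x"] = 1                                 # T1 changes x from 0 to 1, no commit
y_dirty = read("x", allow_dirty=True) + 4        # T2 may read uncommitted data
y_clean = read("x", allow_dirty=False) + 4       # T2 may read only committed data
pending.clear()                                  # T1 aborts: its change is rolled back

print(y_dirty, y_clean, committed["x"])          # 5 4 0
```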

Incorrect aggregate – See the E&N example. Our example: tr s adds the DB values 1, 2, 3, 4 into local variable acc, initially 0. s starts and reads 1 into acc. Before tr s reads 2, another tr x updates 2 to 1. Also, after tr s reads 3, tr x updates 3 to 2. The final correct value in acc should be 10, but s computes 1 + 1 + 3 + 4 = 9, an incorrect total. (If x had run entirely before s, the total would be 1 + 1 + 2 + 4 = 8; the interleaved total, 9, matches neither serial order.) The total depends on when tr x did its changes (the same initial state producing different query results is inconsistent). Cause: terms of the sum were modified during the summation.
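The three totals can be checked with a small Python sketch. It is a toy replay, assuming the four values sit in a list that tr s reads one position at a time; the hook before_read (hypothetical) lets tr x act just before s reads a given position. The two serial orders give 10 and 8, while the interleaving described above gives 9.

```python
def sum_by_s(db, before_read=None):
    """Tr s: add db[0..3] into acc one value at a time, as in the text."""
    acc = 0
    for i in range(len(db)):
        if before_read:
            before_read(i, db)     # give tr x a chance to act before s reads db[i]
        acc += db[i]
    return acc


def x_all(db):
    """Tr x's two updates: 2 -> 1 and 3 -> 2."""
    db[1] = 1
    db[2] = 2


def x_interleaved(i, db):
    if i == 1:
        db[1] = 1                  # before s reads 2, x updates 2 to 1
    elif i == 3:
        db[2] = 2                  # after s has read 3, x updates 3 to 2


serial_s_first = sum_by_s([1, 2, 3, 4])              # s, then x
db = [1, 2, 3, 4]
x_all(db)
serial_x_first = sum_by_s(db)                        # x, then s
interleaved = sum_by_s([1, 2, 3, 4], x_interleaved)  # the text's schedule

print(serial_s_first, serial_x_first, interleaved)   # 10 8 9
```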

The orderings of R and W operations cause the bad results. Note! Database systems differ in how they handle these 3 scenarios.

The rest of this document discusses introductory tp in an Oracle system.

Transaction (tr) – the purpose of a transaction T is to specify a 'unit of processing' in a database context. Each tr must be processed as an "atomic" unit. "Atomic" means that either a) all statements in T are executed, in the statement order specified (relative to each other; interleaving with other trs is allowed, such as t1op1 t2op1 t1op2 t1commit, etc.) OR b) one or more specific errors occur during execution that cause T to fail.


a) and b) above are mutually exclusive: for each tr, exactly one or the other occurs.

In case b), the RDBMS cancels tr T and removes from the database any data changes and/or DB state changes T caused; T is said to be "rolled back". In standard SQL, the COMMIT/ROLLBACK statements (upper or lower case), which are DCL statements, are the only explicit statements that specify the end of a tr T. COMMIT/ROLLBACK are considered tr operations, that is, part of the tr definition.

When a tr T's COMMIT statement executes successfully, T is considered a completed transaction. COMMIT is the RDBMS guarantee that ALL database state and/or data changes specified by T will be written to the DB (meaning: written successfully to the database) as soon as possible (usually within a few msec.). The tr execution time taken depends, among other things, on system load. In standard SQL, ROLLBACK is the only explicit statement for aborting tr execution. Some trs have app code logic that branches to ROLLBACK when a state or situation arises for which the tr must cancel its pending changes and abort. COMMIT and ROLLBACK are each "irrevocable" statements: there is no such thing as 'cancelling' (or "undoing") either statement once it has executed.
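The commit/rollback rules above can be tried outside Oracle. The sketch below uses SQLite through Python's standard sqlite3 module purely as a stand-in (SQLite is not Oracle, and the table acct is hypothetical): a pending change disappears after ROLLBACK and becomes permanent after COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO acct VALUES (1, 100)")
conn.commit()

# Tr A: change the balance, then ROLLBACK -> the pending change is discarded.
conn.execute("UPDATE acct SET balance = balance - 50 WHERE id = 1")
conn.rollback()
after_rollback = conn.execute("SELECT balance FROM acct WHERE id = 1").fetchone()[0]

# Tr B: the same change, then COMMIT -> the change becomes permanent.
conn.execute("UPDATE acct SET balance = balance - 50 WHERE id = 1")
conn.commit()
after_commit = conn.execute("SELECT balance FROM acct WHERE id = 1").fetchone()[0]

print(after_rollback, after_commit)   # 100 50
```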

Transaction classification by number of statements

Case 1: Single-statement transaction
This is the simplest case of a transaction, consisting of one operation. The scope of each tr is determined by the statement by which T is committed or rolled back. Consider a committed tr:

   Implicit "Start" of T   (the beginning of a session OR the finish of the most recent transaction)
   S;                      First and only non-tr-control operation
   commit;                 Explicit termination of T (here, we are not "counting" commit in tr length)

Case 2: Multiple-statement transaction
This is the general case of a transaction, consisting of n statements (of any kinds). Also (covered later), 3GL statements can be mixed with SQL in the T code body.

   S1;
   :
   Sn;
   commit;   <or rollback;>   (lower case or upper case letters are equivalent; the ";" is required)

The start of the next/new Oracle tr is established by the first DML operation after any of: 1) ROLLBACK, 2) COMMIT, OR 3) the start of an Oracle session.

An Oracle SQL SELECT (query) does NOT begin/start a tr. ==> Compare this rule with Microsoft SQL Server: BEGIN TRANSACTION marks the starting point of an explicit, local transaction that can have n statements; thus, a SELECT can start a tr and hold a table lock during SELECT execution. Without BEGIN TRANSACTION, a statement is executed as a single-statement tr that is "autocommitted" (see The Transaction Scope Setting section below).

Demo of Oracle transaction startup

SQL> column name format a50
SQL> -- Starting a new Oracle session, and give a name to first tr that is created (trs might be named (for tracking) in distributed DB)
SQL> set transaction isolation level read committed NAME 'First session tr';
Transaction set.

SQL> -- First, neither a) tr init statements nor b) session init nor c) a query begins a tr
SQL> select count(*) from abc;

  COUNT(*)
----------
         2

SQL> select name,status from v$transaction;
no rows selected

SQL>
SQL> -- Do a DML operation, without commit, and see that a tr has been started, and is pending
SQL> update abc
  2  set c = 423 where c = 21;
1 row updated.


v$transaction is an Oracle dynamic performance view having info about the state of all active trs on the DB

SQL> select name,status from v$transaction;

NAME                                               STATUS
-------------------------------------------------- ----------------
First session tr                                   ACTIVE

SQL> rollback;
Rollback complete.
SQL>
SQL> -- After rolling back the single DML statement tr, see that there is no active tr
SQL> select name,status from v$transaction;
no rows selected
SQL> spool off
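A similar "no tr until the first DML" rule can be observed without Oracle. Python's standard sqlite3 module, in its default (legacy) transaction mode, implicitly opens a transaction before INSERT, UPDATE, DELETE, or REPLACE, but not before a SELECT. The sketch below is only an analogy (SQLite has no v$transaction; the Connection.in_transaction attribute plays that role here, and the rows loaded into abc are assumptions chosen to mirror the demo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # default mode: implicit BEGIN before DML only
conn.execute("CREATE TABLE abc (c INTEGER)")
conn.execute("INSERT INTO abc VALUES (21)")
conn.execute("INSERT INTO abc VALUES (22)")
conn.commit()

states = {}
conn.execute("SELECT count(*) FROM abc").fetchone()
states["after SELECT"] = conn.in_transaction      # a query does not start a tr
conn.execute("UPDATE abc SET c = 423 WHERE c = 21")
states["after UPDATE"] = conn.in_transaction      # the DML statement opened a tr
conn.rollback()
states["after ROLLBACK"] = conn.in_transaction    # the tr has ended
print(states)
```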

Transaction execution can be terminated in several ways

- Explicitly executing a COMMIT or ROLLBACK statement
- Explicitly executing a DDL statement (create table, grant SELECT …, etc., which change the DD)
- Abort of T's execution because of a severe kind of error, such as CPU failure, disk crash, etc.
- Automatic commit of T by terminating a SQL*Plus session before executing a COMMIT (in this situation, the commit is done silently (no msg displayed) during session cleanup):
   1. Prior to COMMIT, do: SQL> exit (tr will be autocommitted)
   2. Prior to COMMIT, the session is disconnected/dropped by a networking error (tr status … ?)
- A database-specific session setting that controls tr scope (see autocommit below)

The transaction scope setting

The default SQL*Plus session setting is: SQL> set autocommit off. By contrast, with SQL> set autocommit on (that is, enabling autocommit), 1) each single SQL statement S is treated as a one-statement transaction, 2) S is automatically committed, and 3) an explicit COMMIT is not specified by the user. Autocommit keeps its current value until it is changed with SET AUTOCOMMIT.
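For comparison only, Python's sqlite3 module offers a rough analogue of SET AUTOCOMMIT ON: opening a connection with isolation_level=None makes SQLite commit each statement as soon as it runs. The sketch below (SQLite, not Oracle; the table t is hypothetical) shows that no tr is left pending after an INSERT, so a later rollback() has nothing to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)   # autocommit mode
conn.execute("CREATE TABLE t (c INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")   # a one-statement tr, committed at once
pending = conn.in_transaction              # no tr is left open
conn.rollback()                            # nothing to undo
count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(pending, count)                      # False 1
```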

A common transaction example – even a "simple" tr has costs

A very frequently executed commercial transaction, executed every day all over the world, has the form shown below. Refer to this transaction as "T".
- - - - - - - - - -
update SavingsTable set balance = balance - 500 WHERE customer_id = customer_1;
update CheckingTable set balance = balance + 500 WHERE customer_id = customer_2;
COMMIT;
- - - - - - - - - - - - - - - -
Worst-case analysis - This T is a 2-table update; assume, physically, that the tables have 2-level indexes on the customer_id column. (Assume the root index node is cached, but no leaf nodes are cached.) Therefore, with well-designed indexes, each update statement needs three disk I/Os: two reads (one leaf index node, and one DB block to read a table row) and one write to disk of the changed row.

Estimating T execution cost in I/Os

Thus, a total of 6 disk I/Os is used. Even "simple" transactions have significant physical cost. For a 3 msec average access time per random I/O on a moving-head disk (MHD), worst-case T execution is approx. 18-20 msec. This translates to a physical limit of approx. 50 such T executions per second on the server. An SSD is much faster than an MHD for reads, but ANY disk access is orders of magnitude slower than a cache access.
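The arithmetic behind these estimates, written out as a small Python sketch under the text's stated assumptions: 3 I/Os per update, 2 updates per T, 3 msec per random I/O, and I/Os performed one after another with no overlap across trs (that last point is an assumption implicit in "per second on the server").

```python
IOS_PER_UPDATE = 3     # 1 leaf-index read + 1 table-block read + 1 write of the changed row
UPDATES_PER_T = 2      # the savings update and the checking update
MS_PER_IO = 3.0        # average random I/O time on a moving-head disk (from the text)

ios_per_t = IOS_PER_UPDATE * UPDATES_PER_T       # 6 I/Os
ms_per_t = ios_per_t * MS_PER_IO                 # 18.0 msec
for ms in (ms_per_t, 20.0):                      # the text's 18-20 msec range
    print(f"{ms:.0f} msec per T -> about {1000 / ms:.1f} T per second")
# 18 msec per T -> about 55.6 T per second
# 20 msec per T -> about 50.0 T per second
```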

Optimizing the cost

It is possible to reduce T execution time in many applications. Consider storing the checking, savings, credit card, loan, and other account rows for the same person (and/or members of the same family or business) using clustered table storage, so that related rows share DB blocks. This is one of the ways that might reduce I/O.

For the moment, restrict the SQL operation/statement types in a transaction to: Query and/or row DML (INSERT/UPDATE/DELETE) on tables/views and DCL (COMMIT/ROLLBACK)

Most common transactions are short-lived (thus, can be “anonymous”); but there might be a need to monitor long-running ones and refer to them by name


{ Naming is more relevant in distributed trs, where one tr references >1 server }

Database instance modes, again
An RDBMS is installed and operated assuming a specific mode of DB processing:

1. Ordinary SQL queries and DML processing with multi-user interactive access, and many moderate-sized tables, is On-Line Transaction Processing (OLTP). Table schemas are carefully designed, with few gigantic (by rowCount) tables.

2. Almost all tables huge; long-running aggregations; infrequent (or no) updates; long batch jobs; table partitioning: this is data warehousing (DW).

3. Huge sets of related data; automated analysis seeking to "discover" previously unrealized relationships among data: this is data mining (data mining often involves "semi-structured" data such as logs, tracking, multi-media content, etc.).

{ The rest of this course focuses almost exclusively on OLTP }

SQL statements within a Transaction

(Source: WJM edits of: Oracle® Database Concepts, 10g Rel 1, Ch 4, Part# B10743-01)
A SQL statement S that runs successfully is different from a committed tr. "S executes successfully" means that S was:

1. Parsed
2. Found to be a valid SQL statement
3. Run without error as an atomic unit. Example: all rows of a multi-row update are successfully changed, from the point of view of S's executor

A SQL statement that fails (for example, because the statement violates an RDBMS integrity rule) does NOT cause the containing transaction "T" to fail. The same is true if a SQL statement hits a processing error such as divide by zero; in this case, "ORA-01476: divisor is equal to zero" is returned. After SQL statement S's error occurs, S is rolled back. However, tr T containing S continues, and the changes T has executed are still "pending" (= not yet committed).

Notes:
1. SQL is required (by the SQL standard) to return error info when a SQL statement fails. It is up to the DB tr coder/developer to write code that checks for likely/common errors and executes failure-handling code.
2. A statement parse error is not considered a failed statement execution (because the statement is not processed by the SQL execution engine).

Statement-Level Rollback

If an error occurs during SQL statement execution, all effects of that statement are rolled back. The effect of this rollback is to leave no data or state changes from the statement in the DB. The containing tr, however, is not rolled back and continues. Demo of Oracle tp: tr_withSQLerror_Demo.sql (notice particularly the last tr)
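SQLite's default ABORT conflict handling behaves like the statement-level rollback just described, so Python's sqlite3 module can serve as a rough stand-in (SQLite, not Oracle; the table t is hypothetical). In the sketch, a two-row INSERT whose second row violates the PK is backed out as a whole, while the containing tr and its earlier pending insert survive until COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INTEGER PRIMARY KEY)")

conn.execute("INSERT INTO t VALUES (1)")        # opens the tr; row 1 is pending
error = None
try:
    # One statement, two rows: 3 is fine, but 1 duplicates the PK.
    conn.execute("INSERT INTO t VALUES (3), (1)")
except sqlite3.IntegrityError as exc:
    error = exc                                  # only this statement is backed out

still_open = conn.in_transaction                 # the containing tr is still active
rows = [r[0] for r in conn.execute("SELECT c FROM t ORDER BY c")]
conn.commit()
print(type(error).__name__, still_open, rows)    # IntegrityError True [1]
```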

First Example of concurrent Transaction Interactions in Oracle

Now look at the interacting effects of concurrent transactions on the same table. Time: t1 < t2 < .. Initial state: table X has 1500 rows; U1 and U2 have all DML privileges on X. In the diagram below, time increases going downward. An Oracle "session" could be interactive via the command prompt SQL> or a script in execution. In the examples, session and schema names are interchangeable.


Transaction T1 initiated by user U1/schema
t1: User U1 issues:
    -- Add one row "r1" to table X
    insert into X values ( …. );
    select count(*) from X;
    Result: 1501 (U1 sees its uncommitted new row r1)
    :   Assume this is U1 think time, or some other kind of brief pause <- Never do this in production apps! { Pauses should NOT be done in transaction code }
t3: commit;
    [ T1 is no longer an active transaction ]

Transaction T2 initiated by user U2 (U2 could be the same user/schema as U1)
t2: U2 adds a row "r2", different from r1, into X:
    insert into X values ( …. );
    select count(*) from X;
    Count result: 1501 { U2 sees its new row r2, but not U1's UNcommitted row r1; this demos that Oracle does not read uncommitted data }
t4: User U2 issues:
    select count(*) from X;
    Result: 1502 (U2 also sees U1's COMMITted row) [ T2 is still an active transaction ]

In summary – Before T1 committed, T2 could not see T1's changes, even though T2 started after T1's row changes. (Thus, an Oracle tr can never access other trs' uncommitted changes, which prevents Dirty Reads.) After T1 committed, T2 could see data that it could not see prior to T1's commit. By default, an Oracle transaction can read the same data "D" two or more times in the same tr and retrieve different results (called non-repeatable reads (changed rows) or phantoms (new rows)).
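The counts in this example follow a simple visibility rule, sketched below as a toy Python model (not Oracle internals; the ToyDB class is hypothetical): a tr sees all committed rows plus its own uncommitted rows, and never another tr's uncommitted rows. Replaying t1 through t4 reproduces 1501, 1501, and 1502.

```python
class ToyDB:
    """Toy read-committed visibility: committed rows plus each tr's own pending rows."""

    def __init__(self, n_rows):
        self.committed = set(range(n_rows))    # the 1500 committed rows of X
        self.pending = {}                      # tr -> rows it inserted but has not committed

    def insert(self, tr, row):
        self.pending.setdefault(tr, set()).add(row)

    def count(self, tr):
        # A tr sees committed rows plus its OWN uncommitted rows, never another tr's.
        return len(self.committed | self.pending.get(tr, set()))

    def commit(self, tr):
        self.committed |= self.pending.pop(tr, set())


db = ToyDB(1500)
db.insert("T1", "r1")
c_t1 = db.count("T1")      # t1: 1501
db.insert("T2", "r2")
c_t2 = db.count("T2")      # t2: 1501
db.commit("T1")            # t3
c_t4 = db.count("T2")      # t4: 1502
print(c_t1, c_t2, c_t4)    # 1501 1501 1502
```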

A Transaction wait scenario

There are many situations in which a given tr is forced to wait. Wait is an implementation mechanism for enforcing synchronization.

When an Oracle transaction T1 does a SQL change (that is, SQL DML) of one table row r, and T1 is pending (i.e., has not done commit/rollback), . . . any other transaction T2 that now executes a change on the same r is forced to WAIT until T1 finishes with either commit or rollback. T1's row change in block B dynamically creates a small slot in the 'ITL' (Interested Transaction List) in the variable part of B's header; the slot acts like a semaphore. T2 is in a "tr wait" state; T2 waits until T1's ITL slot is released during T1's commit/rollback.

There are positive (+) and negative (-) aspects to any wait mechanism:
(+) wait is necessary for preventing various bad concurrency behaviors (illustrated earlier in this document)
(-) wait delays completion of trs, increasing response time and reducing throughput; waits are implemented by small memory cells having R/W access rules (aka "locks"), adding to tp overhead ==> Excessive locking reduces tp performance

A Demo of SQL DML waiting

In the 2-tr (T1 and T2) example above, what happens if T2 changes the SAME row as T1 did? To answer the question, assume table T(c), with PK column c (Oracle type number), exists and has 0 rows. Time: t1 < t2 < … . As before, assume trs S1 and S2 can both R/W T. Consider the display:

Schema and tr both named S1          Schema and tr both named S2

t1 (S1): insert into T values (1);

t2 (S2): select * from T;
         Result: no rows selected

t3 (S1): . . . Doing nothing . . .
t3 (S2): insert into T values (1);
         I am now hung/suspended ... No matter what I type/say/do, I cannot resume until S1's transaction is finalized . . .
         Reason for this behavior – S1 holds a row update lock on its uncommitted row "ur", and DB block B containing ur is cached; thus, when S2 inserts the same row, Oracle inspects B (cached), sees the state of ur, and S2 queues a wait on ur for S2's turn to modify ur.

t4 (S1): commit;
         Commit complete

t5 (S2): insert into T values (1)
         *
         ORA-00001: unique constraint %%%% violated

S1's commit of its transaction released the update lock S1 held on ur. Until that lock release, S2's transaction waited to get an update lock on ur. As soon as S1 committed, session S2 1) was resumed automatically and 2) its pending insert was processed (and, of course, caused an RDBMS integrity violation). Notice that S2's INSERT was applied AFTER S1's commit at time t4. By Oracle default, the DATA, that is, the rows to be INSERTed by S2, are checked relative to the state of table T after time t4. This behavior shows:


by default, an Oracle tr sees data committed by other trs as soon as the commit(s) occur
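The wait-then-fail sequence can be mimicked with a toy lock table, sketched in Python below. This is a simplification, not Oracle's ITL or enqueue mechanism: each PK value has one lock holder and a FIFO queue of waiters, a waiter resumes automatically when the holder commits, and the resumed insert is checked against the committed state at that moment. The names RowLocks, insert, finish_insert, and commit are hypothetical.

```python
from collections import deque


class RowLocks:
    """Toy row-lock table: one holder per key plus a FIFO queue of waiters."""

    def __init__(self):
        self.holder = {}                       # key -> tr holding its lock
        self.waiters = {}                      # key -> deque of waiting trs

    def request(self, tr, key):
        if self.holder.get(key) in (None, tr):
            self.holder[key] = tr
            return True                        # lock granted
        self.waiters.setdefault(key, deque()).append(tr)
        return False                           # tr must wait

    def release_all(self, tr):
        """Release tr's locks at commit/rollback; hand each one to its next waiter."""
        resumed = []
        for key, owner in list(self.holder.items()):
            if owner != tr:
                continue
            queue = self.waiters.get(key)
            if queue:
                self.holder[key] = queue.popleft()
                resumed.append((self.holder[key], key))
            else:
                del self.holder[key]
        return resumed


locks = RowLocks()
committed_keys = set()                         # PK values committed in table T
pending = {}                                   # tr -> PK values inserted, not committed
log = []


def finish_insert(tr, key):
    # The PK check uses the committed state of T at the moment the insert runs.
    if key in committed_keys:
        log.append(f"{tr}: unique constraint violated")
    else:
        pending.setdefault(tr, set()).add(key)
        log.append(f"{tr} inserted {key}")


def insert(tr, key):
    if locks.request(tr, key):
        finish_insert(tr, key)
    else:
        log.append(f"{tr} waits on key {key}")


def commit(tr):
    committed_keys.update(pending.pop(tr, set()))
    log.append(f"{tr} committed")
    for waiter, key in locks.release_all(tr):
        finish_insert(waiter, key)             # the waiter resumes automatically


insert("S1", 1)      # t1
insert("S2", 1)      # t3: S2 is suspended
commit("S1")         # t4: S2 resumes and its insert fails
print(log)
```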

Read behavior

For SQL statements executed in tp context, read behavior has two levels. The first level is the behavior when a single SQL statement reads data, and the second level is the behavior when successive reads occur within a given transaction T.

The data "D" each query reads was previously committed (never dirty), and D actually existed as database data (not transient uncommitted data). The data read by multiple SQL statements in a transaction T is more complex: even for successive executions of the same query Q, the Q results could be different. (The results of >1 read of the same data depend on T's "isolation level"; we study this soon.)

SQL Statement-Level Read Consistency

Oracle always enforces statement-level read consistency. This guarantees that all the data returned by each query Q comes from a single point in time: the time when Q began. Data returned is from database blocks as they existed at the instant Q (NOT its containing tr, if in a tr) started. As query execution proceeds, data committed before Q began is visible to Q; Q does not see changes committed by other trs during Q's execution. The figure below illustrates statement read consistency. Note that such a read does not suspend/block another transaction from updating the data being read. Thus, Oracle readers do NOT block writers (the opposite of the Microsoft SQL Server default).

Read Consistency for each Oracle Query

[Figure: tr T1 reads DB blocks Bj to calculate an aggregate, while tr T2 updates some of the same Bj.]


The SELECT statement reads several DB blocks, as shown. "SCN" means system change number. The Scan Path (blocks accessed) indicates the set of DB blocks visible to the query. Using a DB block version such as SCN 10008 is called "read around", because it substitutes for the DB block with SCN 10024. Block 10024 is an update of block 10008 that was committed AFTER the SELECT started, whereas 10008 is the block version that existed when the SELECT started. The SELECT reads two rollback storage blocks, with SCNs 10008 and 10021, instead of the changed blocks having SCN 10024.

As a query “Q” enters the execution stage, the current system change number (SCN) is determined. Think of an SCN as an internal ‘relative timestamp’. In the Figure above, Q’s SCN is 10023. It represents the relative time at which Q started reading blocks.

As data blocks are read by Q, only block versions consistent with Q's SCN are used. Blocks with changes that have more recent SCNs than Q's SCN are reconstructed from data in the rollback (aka Undo) segments, and the reconstructed blocks (10008 and 10021) are returned for Q to read.

(When block B (read by Q as its SCN 10008 version) is updated to SCN 10024, a copy of the old B (with SCN 10008) is saved in rollback storage long enough for Q to access the old B.)

Each query accesses its data with respect to the time that query execution began.

Changes by other transactions that occur during Q's execution are not visible/accessible to Q, guaranteeing that consistent data is returned for each query
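The version-selection rule can be sketched in a few lines of Python. This is a simplification, assuming each block keeps a list of (SCN, contents) versions with older versions standing in for copies in rollback/undo; real Oracle rebuilds old versions by applying undo records. Block names B1..B4 and their contents are hypothetical; the SCNs follow the figure's values. A query with SCN 10023 reads the newest version whose SCN is not after 10023, so the two blocks changed at 10024 are read as their 10008 and 10021 versions.

```python
# Versions of each block as (scn, contents), oldest first; the last entry is the
# current block, older entries stand in for copies kept in rollback/undo.
blocks = {
    "B1": [(10021, "b1")],
    "B2": [(10008, "b2 as of 10008"), (10024, "b2 changed at 10024")],
    "B3": [(10021, "b3 as of 10021"), (10024, "b3 changed at 10024")],
    "B4": [(10011, "b4")],
}


def consistent_read(versions, query_scn):
    """Return the newest (scn, contents) whose SCN is not after the query's SCN."""
    for scn, contents in reversed(versions):
        if scn <= query_scn:
            return scn, contents
    # In Oracle terms, roughly where a "snapshot too old" error would arise.
    raise LookupError("no old-enough version is still available")


QUERY_SCN = 10023                       # SCN fixed when query Q starts executing
scan = {name: consistent_read(v, QUERY_SCN)[0] for name, v in blocks.items()}
print(scan)   # {'B1': 10021, 'B2': 10008, 'B3': 10021, 'B4': 10011}
```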

Def – accesses to older versions of a DB block are called "consistent gets" (this terminology is used throughout Oracle documentation). {Some QEP generations (that is, set autotrace parameters) display "consistent gets" information below the QEP display.} The SQL statements SELECT, INSERT with a subquery, UPDATE, and DELETE all query data, either explicitly or implicitly, and all return consistent data. Each of these statements uses a query to determine which data it affects. Queries used in INSERT, UPDATE, and DELETE statements are guaranteed a consistent set of results. However, they do not see the changes made by the DML statement itself. In other words, the query in these operations sees data as it existed before the statement makes changes.

Transaction-Level Read Consistency

Oracle has the option of enforcing transaction-level read consistency. When a transaction "tr z" runs in "serializable isolation level" (aka a serializable tr), z sees/reads the state of the database as of the time z began, except for the changes z itself makes. This means the data seen by all queries within the same tr is consistent (and is the SAME data set) with respect to a single point in time. Therefore, queries made by a serializable tr z do see changes made by z itself, but z does not see changes committed by other trs while z executes. Transaction-level read consistency produces repeatable reads and does not expose a query to changes committed by other trs after z began. (More below.) Note: serializable is NOT the default Oracle isolation level! (explained next)

Oracle Transaction Isolation Levels

Def – a transaction isolation level defines the set of rules for how that transaction interacts with other concurrent transactions; the behavior rules determine what data is visible within a tr, as well as which data can be read or written, and which operations are forced to wait.

Isolation Level: Description (Outline of most important properties)

Read committed (= the default Oracle isolation level): Each query Q in a tr sees only data that was committed before Q (not the transaction!) began. Because Oracle does not prevent other transactions from modifying the data read by a query, that data can be changed by other transactions between two executions of the query. Thus, a transaction that runs a given query twice can experience nonrepeatable reads (subsequent reads see modified or deleted data) and/or phantoms (a tr re-runs a query returning the set of rows that satisfy a search condition C, and sees that another committed tr inserted additional rows satisfying C).

Serializable: Serializable transactions see only those changes that were committed at the time the transaction began, plus those changes made by the transaction itself through INSERT, UPDATE, and DELETE statements. Serializable transactions experience neither nonrepeatable reads (a re-read for which rows have been deleted or updated) nor phantoms (re-read(s) for which additional rows are returned compared to previous read(s)).

Read-only: Read-only trs 1) see only changes that were already committed at the time the tr began and 2) do not allow INSERTs, UPDATEs, and DELETEs. Any DML operation in a read-only tr causes a SQL error (and the operation's rollback).
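The key difference between the first two levels is which point in time a read uses. The Python sketch below is a toy multiversion model, not Oracle code (the class VersionedRow, the function run_query_twice, and the SCN values are hypothetical): read committed takes a snapshot at each statement start, while serializable (and, for reads, read-only) takes one snapshot at tr start. Running the same query twice around another tr's commit shows a nonrepeatable read only under read committed.

```python
class VersionedRow:
    """Committed versions of one row as (commit_scn, value); toy model."""

    def __init__(self, value, scn):
        self.versions = [(scn, value)]

    def commit_update(self, value, scn):
        self.versions.append((scn, value))

    def read_as_of(self, scn):
        """Value of the newest version committed at or before scn."""
        return max(v for v in self.versions if v[0] <= scn)[1]


def run_query_twice(isolation):
    row = VersionedRow("v0", scn=100)
    tr_start = 101                                # our tr begins

    def snapshot(stmt_start):
        # READ COMMITTED: per-statement snapshot; SERIALIZABLE: per-tr snapshot.
        return stmt_start if isolation == "READ COMMITTED" else tr_start

    first = row.read_as_of(snapshot(102))         # first execution of query Q
    row.commit_update("v1", scn=103)              # another tr commits a change
    second = row.read_as_of(snapshot(104))        # second execution of Q
    return first, second


for level in ("READ COMMITTED", "SERIALIZABLE"):
    print(level, run_query_twice(level))
# READ COMMITTED ('v0', 'v1')   -> nonrepeatable read
# SERIALIZABLE ('v0', 'v0')     -> repeatable read
```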

Setting the Isolation Level

Application designers, application developers, and database administrators can choose appropriate isolation levels for different transactions, depending on app requirements and workload. You can set the isolation level of a single (the NEXT) tr by using one of these statements as the first statement (must be first!) of that tr:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   <-- default isolation when a tr has no SET command
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET TRANSACTION READ ONLY;

Note: after a tr run in serializable or read-only isolation level finishes, the session's isolation level switches back to the default.

Source code for initiating a serializable transaction T would be:
commit;                                          -- Assume this finalizes a pending tr
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;    -- SET TRANSACTION must be the first (and only!) such tr property statement within T; that is, each tr can have <= 1 set transaction command
:                                                -- T's code body
commit;                                          -- Isolation level reverts back to the Oracle default here

To save the networking and processing cost of beginning each transaction with a SET TRANSACTION statement, you can use the ALTER SESSION statement to set the tr isolation level for all subsequent trs (until/unless a subsequent ALTER SESSION):
ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;
ALTER SESSION SET ISOLATION_LEVEL = READ COMMITTED;

More on Read Committed Isolation

The default isolation level for Oracle is read committed. This isolation level is appropriate for environments where few transactions are likely to conflict (heavy updating with many conflicts could cause many event waits).
(-) It permits nonrepeatable reads and phantom reads within a transaction.
(+) It typically provides higher transaction throughput (because serializable isolation refuses many DML update scenarios).

More on Serializable Isolation

Serializable isolation is suitable for environments:
- With large databases and short transactions that update only a few rows
- Where two concurrent transactions are very unlikely to modify the same row
- Where relatively long-running transactions are primarily read only

Oracle permits a serializable transaction to modify a data row (that it did not create) only if it can determine that previous changes to the row were made by transactions that had committed before the serializable transaction began. (Recall that changes committed by other transactions T' during the execution of serializable T are not visible in T.) In fact, a SQL change statement in serializable-mode T on data that another T' committed after T started fails, and that statement is rolled back (Oracle reports ORA-08177: can't serialize access for this transaction).


To determine efficiently whether a change has been committed, Oracle uses control information stored in the data block headers that indicates which rows in the block have committed vs. uncommitted changes.
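The serializable write rule can be sketched as a single comparison. In the toy Python model below (not Oracle's implementation), a per-row commit_scn stands in for the block-header control information, and CannotSerialize is a hypothetical stand-in for ORA-08177. T may change a row whose last change committed before T began; a change to a row that T' committed after T began fails, the row keeps its value, and T itself continues.

```python
class CannotSerialize(Exception):
    """Toy stand-in for Oracle's ORA-08177 error."""


def serializable_update(row, tr_start_scn, new_value):
    """Change row only if its last change was committed before the tr began."""
    if row["commit_scn"] > tr_start_scn:
        # Another tr committed this row after we started: the statement fails.
        raise CannotSerialize("can't serialize access for this transaction")
    row["value"] = new_value


T_START = 150                                # serializable tr T begins at SCN 150
row_a = {"value": 10, "commit_scn": 100}     # last committed before T began
row_b = {"value": 20, "commit_scn": 160}     # committed by another tr T' after T began

serializable_update(row_a, T_START, 11)      # allowed
try:
    serializable_update(row_b, T_START, 21)
    outcome = "updated"
except CannotSerialize as exc:
    outcome = f"failed: {exc}"               # row_b keeps its value; T continues

print(row_a["value"], row_b["value"], outcome)
# 11 20 failed: can't serialize access for this transaction
```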

Synchronizing concurrent transactions

Most systems that support shared data access use some form of "locking" to enforce waits. However, there are MANY WAYS to implement locking. To understand a given OS/DBMS/etc. concurrency control, there is no substitute for READING THE SYSTEM DOCUMENTATION. Specifically, commercial RDBMS tp subsystems differ from each other in many subtle ways. Different DBMSs are not "WRONG", just DIFFERENT!

Experiments with Oracle tp isolation levels

This section includes examples of each of the three isolation modes supported by Oracle. It is divided into two parts. Each Example#j within each Part has specified session environment initializations.

Part 1 involves transactions using read committed isolation. Included in the examples is a brief illustration of how internal Oracle dynamic performance views can display current tp runtime status, such as which transactions are active and the locks they hold, and which transactions are in a wait state and what they are waiting for. Also, the deadlock issue (always possible in systems that share resources) is illustrated with an example of Oracle deadlock detection and resolution. An important point here is: app tp code logic should include a handler for transactions that abort due to deadlock.

Part 2 illustrates transactions with non-default Oracle isolation levels.

Experiments with Oracle read committed (the default) isolation level Part 1

Notations & initializations used in the examples are as follows:
Tj – an Oracle SQL*Plus session
Uj – the user/schema in session Tj
T(c1, c2) – a table accessed by the Tj; PK is c1, and c2 is a non-PK column
tk – a relative point in time when a session performs an action; the tk are ordered, such that ti < tj when i < j

Examples #1, #2, #3 assume table schema T(c1, c2), where c1 is the PK column; autocommit is OFF; each Ti uses the (default) READ COMMITTED (RC) isolation level; T has n rows.

Example#1 READ COMMITTED allows a NON-repeatable read of a row being UPDATEd and committed by another tr

time   tr T1 / tr T2
t1  T1: -- Update row r's PK from k to v
        Update T set c1 = v where c1 = k;
t2  T2: Select * from T where c1 = k;
        (T2 sees the original row r having the original PK value k); T2 pauses
t3  T1: COMMIT;
t4  T2: Select * from T where c1 = k;
        (T2 sees 'no rows selected' as the result, because the original r with PK c1 = k has a committed change of its PK to v)

Explanation: At time t2, tr T2 sees the original row with c1=k because of READ COMMITTED (T2 can only see previously committed data). At time t4, after T1 committed, T2 can now see the changed row; thus it is possible to change a PK value when no other constraints (such as FK references) are violated. In some DB systems, tr T1 takes a write lock on the row being updated; this would cause tr T2 to wait at time t2 (suspended prior to the SELECT) until T1 does either rollback or commit. In Oracle, the 'read around' technique used for read consistency avoids T2 being blocked at time t2.
----------
Example#1 shows the Oracle trade-off with READ COMMITTED isolation: non-repeatable reads are possible (a negative), but concurrency is better (T2 was not blocked) than in some other RDBMSs (a positive).

Example#2 Each tr Ti updates the same row of T(c1,c2): an indefinite wait on a row update by another Tj, AND a display of tp state snapshot info

t1 (T1): UPDATE T SET c2 = v1 WHERE c1 = k;   -- update existing row “r”
t2 (T2): UPDATE T SET c2 = v2 WHERE c1 = k;   -- an update to the same row “r” accessed by T1
t3 (T2): *** This session hangs until T1 does COMMIT/ROLLBACK

Below is the wait state information after U2's session has been hung (time = t3) for 49 seconds, obtained by running script display_wait_events.sql in a concurrent (third) observer session:

SID  USERNAME  EVENT                          BLOCKING_SESSION  SECONDS_IN_WAIT
---  --------  -----------------------------  ----------------  ---------------
 76  MITCHELL  enq: TX - row lock contention                29               49
(Session 76 is the waiting T2; the blocking session 29 is T1.)

And here is basic held-locks info for the Ti, obtained from script display_held_locks_by_user.sql:

USERNAME   SID  TYPE  LMODE
--------  ----  ----  -----
MITCHELL    76  TX        0   <- tr T2's requested (not yet acquired) lock on row r
MITCHELL    29  TM        3   <- T1 holds this lock
MITCHELL    76  TM        3
MITCHELL    29  TX        6   <- T1 holds this update lock on row r

When any tr has a pending DML operation on a table T, a table lock of TM type is automatically acquired. Why?

t4 (T1): COMMIT;
t5 (T2): SELECT * FROM T WHERE c1 = k;

At time t5, the row displayed in T2 by SELECT * FROM T WHERE c1 = k will be (k, v2), whether T1 did COMMIT or ROLLBACK. If T1 committed, T1's committed change of c2 to v1 is overwritten when T2 becomes unblocked and performs its update of c2 from v1 to v2. If T1 rolled back, T2's update uses the original row “r” and changes the original c2 value to v2.

To review your understanding of this Example, assume T(c1, c2) initially contains row (1,1), and suppose that the SQL update statements executed at the indicated times were:
t1 (T1): UPDATE T SET c2 = 2 WHERE c1 = 1;
t2 (T2): UPDATE T SET c2 = c2 + 1 WHERE c1 = 1;
IF tr T1 commits at time t4, THEN the result at time t5 is:
SELECT * FROM T WHERE c1 = 1;
c1  c2
--  --
 1   3

Thus, the classical ‘lost update’ problem does not occur: T2's update is applied to the value T1 committed, not to a stale value read before T1 committed.

Next: tr scheduling has many of the same difficulties as OS process scheduling.
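The behavior in Example#2 and its review exercise can be replayed with a toy row that has one exclusive (TX-like) row lock and a FIFO queue of blocked updates; when the holder commits or rolls back, the first waiter applies its update to the latest committed value. The class and function names are hypothetical, and the model tracks only one row's c2 value. For contrast, the sketch also replays the lock-free interleaving from the E&N lost update discussion (x = 0, T1 adds 1, T2 adds 2).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Row:
    """Toy row: committed c2 value, one exclusive row lock, and a FIFO queue of blocked updates."""
    committed: int
    pending: Optional[int] = None               # lock holder's uncommitted c2 value
    holder: Optional[str] = None                # tr holding the exclusive row lock
    waiting: List[Tuple[str, Callable[[int], int]]] = field(default_factory=list)

    def update(self, tr: str, fn: Callable[[int], int]) -> bool:
        """Apply c2 = fn(c2) for tr; return False if tr must wait for the row lock."""
        if self.holder not in (None, tr):
            self.waiting.append((tr, fn))       # like 'enq: TX - row lock contention'
            return False
        base = self.pending if self.holder == tr else self.committed
        self.holder, self.pending = tr, fn(base)
        return True

    def end(self, tr: str, commit: bool) -> None:
        """COMMIT or ROLLBACK tr, release its lock, and let the first waiter proceed."""
        assert self.holder == tr
        if commit:
            self.committed = self.pending
        self.holder, self.pending = None, None
        if self.waiting:
            nxt, fn = self.waiting.pop(0)
            self.update(nxt, fn)                # waiter's update uses the latest committed c2


def review_exercise(t1_commits: bool) -> Tuple[bool, int]:
    row = Row(committed=1)                      # T contains (1, 1)
    row.update("T1", lambda c2: 2)              # t1: UPDATE T SET c2 = 2 WHERE c1 = 1
    t2_blocked = not row.update("T2", lambda c2: c2 + 1)   # t2: SET c2 = c2 + 1 (waits)
    row.end("T1", commit=t1_commits)            # t4: T1 COMMIT (or ROLLBACK)
    row.end("T2", commit=True)                  # T2 later commits
    return t2_blocked, row.committed


def lost_update_without_locks(x: int = 0) -> int:
    t2_read = x            # T2 reads x
    t1_read = x            # T1 reads x before T2 writes
    x = t2_read + 2        # T2 writes 2
    x = t1_read + 1        # T1 overwrites T2's write
    return x


print(review_exercise(True), review_exercise(False), lost_update_without_locks())
# (True, 3) (True, 2) 1
```

The queue is what turns the dangerous interleaving into a serial one: T2 cannot read-modify-write row r until T1 has finished with it.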

Example#3 A simple change to Example#2 produces deadlock. Each Ti first updates a different row of T (initially causing no waits), but then each Ti does DML access to the row with the pending update by the other session. Deadlock occurs because each Ti requests an exclusive row lock on a row that the other Tj currently holds => each tr waits for an event that will never happen in the current tp state. c2 is a non-key column of T, and the notation “rx.cy” abbreviates column cy of row rx of table T(c1, c2).

t1 (T1): UPDATE T SET c2 = v11 WHERE c1 = k1;   -- r11.c2 = v11 (row r11 has key k1)

Now T1 holds one TM (table type) lock on T and one TX (row type) lock on row r11. Locks held:

USERNAME   SID  TYPE  LMODE
--------  ----  ----  -----
MITCHELL    29  TM        3
MITCHELL    29  TX        6

t2 (T2): UPDATE T SET c2 = v21 WHERE c1 = k2;   -- r21.c2 = v21 (row r21 has key k2)

Now T2 also holds one TM lock on T and one TX lock on row r21. Session locks held:

USERNAME   SID  TYPE  LMODE
--------  ----  ----  -----
MITCHELL    29  TM        3
MITCHELL    76  TM        3
MITCHELL    76  TX        6
MITCHELL    29  TX        6

t3 (T1): UPDATE T SET c2 = v12 WHERE c1 = k2;   -- r21.c2 = v12

Now T1 holds one TM (table type) lock and one TX (row type) lock (on row r11), and has requested a second TX lock (on row r21). Current locks-held scoreboard (note: the display is NOT ordered by time of lock acquisition):

USERNAME   SID  TY  LMODE
--------  ----  --  -----
MITCHELL    29  TX      0   <- LMODE 0: T1 has requested, but not yet obtained, another TX lock
MITCHELL    29  TM      3
MITCHELL    76  TM      3
MITCHELL    76  TX      6
MITCHELL    29  TX      6

Now T1 is hung, because it is waiting for T2's exclusive lock on row r21 to be released.

t4 (T2): UPDATE T SET c2 = v22 WHERE c1 = k1;   -- r11.c2 = v22

Now T2 also becomes hung, as T1 did. The wait events now are:

SID  USERNAME  EVENT                          BLOCKING_SESSION  SECONDS_IN_WAIT
---  --------  -----------------------------  ----------------  ---------------
 29  MITCHELL  enq: TX - row lock contention                76               78
(T1, session 29, is waiting on T2; the blocking session 76 is T2.)

t5 T1 gets the standard error ORA-00060 (deadlock detected while waiting for resource), indicating that it was chosen by tp to break the deadlock. Oracle uses deadlock detection, and only the victim's blocked statement (in this example, T1's second UPDATE) is automatically rolled back, not the whole transaction. Cancelling T1's second update removes the deadlock state. The victim tr T1 a) is still pending and b) is still required to end its tr after t4, because T2 remains suspended until T1 is finalized.

T1 must now do a ROLLBACK or COMMIT (either can be done after possibly issuing additional statements). In the simplest case of no additional SQL statements, suppose that T1 commits; this commits only T1's first row update.

t6 Whether T1 did COMMIT or ROLLBACK, T2 is now unblocked. If T2 does COMMIT, both row changes by T2 are applied to the DB. (The final T contents for r11 and r21 are the same whether T1 did COMMIT or ROLLBACK, because T2's update of r11 overwrites T1's only surviving change.)

Both the wait events and locks held displays now return 0 rows.

To review your understanding of this example: initially, T(k, c2) contains rows (1,1) and (2,2), and the following operations are done (as in Example#3, T1 acts at t1 and t3, and T2 at t2 and t4):
t1: UPDATE T SET c2 = 3 WHERE k = 1;
t2: UPDATE T SET c2 = 5 WHERE k = 2;
t3: UPDATE T SET c2 = -5 WHERE k = 2;
t4: UPDATE T SET c2 = -3 WHERE k = 1;
What is the final content of table T?
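Deadlock detection can be sketched with a wait-for graph: each blocked tr points to the tr holding the lock it requested, and a deadlock is a cycle in that graph. The sketch below replays the lock requests of Example#3 (rows keyed k1 and k2). It is a minimal model under stated assumptions: the function names are hypothetical, each tr has at most one outstanding request, and choosing the longest-waiting tr as the victim simply reproduces this example's outcome (T1 got ORA-00060); it is not a statement of Oracle's victim-selection rule.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the trs on a cycle of the wait-for graph, or None.
    waits_for maps each blocked tr to the tr holding the lock it requested."""
    for start in waits_for:
        path, tr = [], start
        while tr in waits_for and tr not in path:   # follow the single outgoing edge
            path.append(tr)
            tr = waits_for[tr]
        if tr in path:                               # came back to a tr on this path
            return path[path.index(tr):]
    return None


def replay_example3():
    holders: Dict[str, str] = {}     # row key -> tr holding its exclusive (TX) lock
    waits_for: Dict[str, str] = {}   # blocked tr -> lock holder it waits for
    wait_order: List[str] = []       # blocked trs in the order they started waiting

    def request(tr: str, row: str) -> str:
        owner = holders.get(row)
        if owner in (None, tr):
            holders[row] = tr
            return "granted"
        waits_for[tr] = owner
        wait_order.append(tr)
        return "waiting"

    steps = [request("T1", "k1"),    # t1
             request("T2", "k2"),    # t2
             request("T1", "k2"),    # t3: T1 waits for T2
             request("T2", "k1")]    # t4: T2 waits for T1 -> cycle
    cycle = find_cycle(waits_for)
    victim = next(tr for tr in wait_order if tr in cycle)   # assumption: longest waiter
    del waits_for[victim]            # victim's blocked statement is cancelled; its t1 lock stays
    return steps, cycle, victim, waits_for, holders


steps, cycle, victim, waits_for, holders = replay_example3()
print(steps, cycle, victim, waits_for)
# ['granted', 'granted', 'waiting', 'waiting'] ['T1', 'T2'] T1 {'T2': 'T1'}
```

Note that after the victim's statement is cancelled, T2 is still waiting on T1, which matches the text: T1 must still COMMIT or ROLLBACK before T2 can proceed.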

Part 2: Experiments with Oracle mixed isolation levels

In the following Examples #4 and #5, session scripting explicitly specifies the transaction isolation levels (except where the default applies because SET TRANSACTION is not used). Each example uses a table schema j(c1,c2). Autocommit is off in each example. PK/FK structure is of no concern here.

Example#4 A READ ONLY transaction “t” 1) cannot see data committed after t starts and 2) can only do SELECT statements; T2 is read committed

t1 (T1): SET TRANSACTION READ ONLY;
Transaction set.
Elapsed: 00:00:00.02
(All reads are read-consistent, based on the state of the data when this tr started.)

t2 (T1):
SQL> select count(*) from j;
  COUNT(*)
----------
         6

t3 (T1):
SQL> insert into j values (20,20);
insert into j values (20,20)
*
ERROR at line 1:
ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction

t4 (T2, default READ COMMITTED isolation level):
SQL> insert into j values (20,20);
1 row created.

t5 (T2): commit;

t6 (T2):
SQL> select count(*) from j;
  COUNT(*)
----------
         7

t7 (T1):
SQL> select count(*) from j;
  COUNT(*)
----------
         6      <- T1 can't see T2's committed insert

Example#5 Interacting serializable Ti on the same table

t1 (T1): set transaction isolation level serializable;
t2 (T2): set transaction isolation level serializable;

t3: select count(*) from j;
  COUNT(*)
----------
        11      <- initial rowcount

t4 (T1): insert into j values (100,100);   -- assume this is a new row

t5 (T1): COMMIT;

t6 (T1): /* The new transaction starting here uses READ COMMITTED isolation */
SQL> select count(*) from j;
  COUNT(*)
----------
        12

t7 (T2):
SQL> select count(*) from j;
  COUNT(*)
----------
        11      <- cannot see the committed insert (100,100)

t8 (T2):
SQL> update j set c2=0 where c1=100;
0 rows updated.      <- can't see the committed new row

t9 (T1): update j set c2=200 where c1=2;   -- this row (call it row x) existed when T2 read j at t7

t10 (T1): COMMIT;

t11 (T2):
SQL> update j set c2 = 300 where c1=2;
update j set c2 = 300 where c1=2
*
ERROR at line 1:
ORA-08177: can't serialize access for this transaction
(T2 can't update a row that existed when T2 started but whose change was committed by another tr after T2 started)

Note 2 differences vs. READ COMMITTED: 1) T2 did not see committed INSERT (100,100) by T1 and 2) T2 cannot update a row x that existed before T2 started in the case another tr commits a change to x after T2 starts.
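The visibility rules of Examples #4 and #5 can be sketched with a toy multiversion table: a READ COMMITTED statement sees data committed before the statement starts; a READ ONLY or SERIALIZABLE tr sees data committed before the tr starts; and a SERIALIZABLE tr that updates a row whose latest committed change is newer than its own start fails (ORA-08177 in Oracle). This is a simplified model under stated assumptions: the class and exception names are hypothetical, row locks and deletes are ignored, and the table contents are made up (the examples give only the row counts and the fact that row c1 = 2 exists).

```python
class ReadOnlyError(Exception):
    """Stands in for ORA-01456 in this toy model."""


class SerializationError(Exception):
    """Stands in for ORA-08177 in this toy model."""


class Db:
    def __init__(self, rows):
        self.scn = 0                                          # SCN of the latest commit
        self.versions = {c1: [(0, c2)] for c1, c2 in rows.items()}   # c1 -> [(scn, c2)]

    def visible(self, snapshot_scn):
        """Newest committed version of each row with commit SCN <= snapshot_scn."""
        out = {}
        for c1, vers in self.versions.items():
            seen = [c2 for scn, c2 in vers if scn <= snapshot_scn]
            if seen:
                out[c1] = seen[-1]
        return out


class Tr:
    def __init__(self, db, level="read committed"):
        self.db, self.level = db, level
        self.start_scn = db.scn        # snapshot used by READ ONLY / SERIALIZABLE
        self.writes = {}               # this tr's uncommitted changes

    def snapshot(self):
        # READ COMMITTED: each statement sees everything committed before it starts
        return self.db.scn if self.level == "read committed" else self.start_scn

    def count(self):
        return len(self.db.visible(self.snapshot()).keys() | self.writes.keys())

    def insert(self, c1, c2):
        if self.level == "read only":
            raise ReadOnlyError("may not perform insert inside a READ ONLY transaction")
        self.writes[c1] = c2

    def update(self, c1, c2):
        if c1 not in self.db.visible(self.snapshot()) and c1 not in self.writes:
            return 0                                          # "0 rows updated"
        if self.level == "read only":
            raise ReadOnlyError("may not perform update inside a READ ONLY transaction")
        vers = self.db.versions.get(c1, [])
        if self.level == "serializable" and vers and vers[-1][0] > self.start_scn:
            raise SerializationError("can't serialize access for this transaction")
        self.writes[c1] = c2
        return 1

    def commit(self):
        self.db.scn += 1
        for c1, c2 in self.writes.items():
            self.db.versions.setdefault(c1, []).append((self.db.scn, c2))
        self.writes = {}


def replay_example4():
    db = Db({c1: c1 for c1 in range(1, 7)})               # 6 made-up rows
    t1, t2 = Tr(db, "read only"), Tr(db)
    before = t1.count()                                    # t2: 6
    try:
        t1.insert(20, 20)                                  # t3: rejected
        rejected = False
    except ReadOnlyError:
        rejected = True
    t2.insert(20, 20)                                      # t4
    t2.commit()                                            # t5
    return before, rejected, t2.count(), t1.count()        # t6: 7, t7: 6


def replay_example5():
    db = Db({c1: c1 for c1 in range(1, 12)})              # 11 made-up rows, incl. c1 = 2
    t1, t2 = Tr(db, "serializable"), Tr(db, "serializable")   # t1, t2
    initial = t2.count()                                   # t3: 11
    t1.insert(100, 100)                                    # t4
    t1.commit()                                            # t5
    t1 = Tr(db)                                            # t6: T1's new tr is READ COMMITTED
    seen = (t1.count(), t2.count(), t2.update(100, 0))     # t6: 12, t7: 11, t8: 0 rows
    t1.update(2, 200)                                      # t9
    t1.commit()                                            # t10
    try:
        t2.update(2, 300)                                  # t11
        failed = False
    except SerializationError:
        failed = True
    return initial, seen, failed


print(replay_example4())   # (6, True, 7, 6)
print(replay_example5())   # (11, (12, 11, 0), True)
```

The only difference between the two READ COMMITTED and SERIALIZABLE paths in this sketch is which SCN is used as the snapshot, plus the first-committer check on update.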

Locking as a synchronization mechanism

Here is a summary of the default lock-processing principles covered here. Source: Oracle® Database Concepts, 10g Release 2 (10.2), B14220-02, Chapter 13, “Data Concurrency and Consistency”, with modifications by WJM:
“----Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource. Resources include two general types of objects:

- User objects, such as tables and rows (structures and data)
- System objects not visible to users, such as shared data structures in memory and data dictionary rows

Implementations of a “lock” vary in the actual system processing and data structures, the kind of object locked, average lock duration, etc. There are two fundamentally different ways that a lock is acquired:

1) by queuing (FIFO waiting): wait until the requested resource is available (Demo below)
2) by requesting the resource if it is free, but RETRYING later if it is busy; in this approach there is no wait queue, and acquired access is exclusive

In all cases, an RDBMS automatically obtains the necessary locks when executing SQL statements, so users DO NOT need to “program” lock processing. Oracle automatically uses the lowest applicable level of restrictiveness to provide the highest degree of data concurrency, while also providing data integrity. Oracle also allows a user to lock data manually, if desired (dbms_lock package).

User-defined locking is rarely, if ever, needed/advised.----”

Locks vs. Latches: 2 alternative ways to implement “locking”

Most systems supporting concurrency use 1) lock AND 2) latch constructs to protect items that are being modified. Details are not covered in this course, but we will outline these mechanisms.

A lock on a resource “R” establishes state information about R for the lock duration (= the elapsed time during which the lock is held). A lock's “type” is, in simplest terms, either “exclusive” or “shared”.
Exclusive (ex) lock mode is obtained to modify data. The first transaction to lock a resource exclusively is the only transaction that can alter the resource until the exclusive lock is released.
Shared lock mode allows the associated resource to be shared, depending on the operations involved. Multiple users reading data can share the data, holding share locks to prevent concurrent access by a writer (who needs an exclusive lock).

A typical OLTP tr “x” executes in some small number of milliseconds (n × 10^-3 sec). Locks acquired by x are generally held for most of x's duration; x will typically lock items such as table rows, tables, index paths, etc. Thus, lock duration is on the order of milliseconds (m × 10^-3 sec, m a small int), and several concurrent trs may try to access a resource “r” during that time. Part of the shared lock implementation requires x to first REQUEST a lock of a given type. IF this request cannot be satisfied immediately, x's request is queued.

A latch is, technically, a special form of lock; its significant differences from a lock with queuing are:
1) latch access is ALWAYS exclusive, and NEVER shared, regardless of what the access intends to do (read/write); latches are used to protect system data structures
2) latch duration is expected to be VERY small (n × 10^-6 sec, n a small int)
3) a latch request is NEVER queued; a failed latch request results in a request retry (usually after a small randomized wait interval)
4) typically, a latched resource r is (a small amount of) state information, such as a DD view column being read/inspected or a cached buffer header being modified

In general, a well-developed commercial tp system automatically implements the locking behavior and services, without need for any app programmer coding. One notable exception: in Oracle, if you want to write your own lock subsystem (rarely, if ever, needed!), you can, using the built-in package dbms_lock.
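The two acquisition styles (a queued lock vs. a retried latch) can be sketched with Python threads. `QueuedLock` grants requests strictly in arrival order (a ticket queue), while `Latch` never queues: a failed try is simply retried after a small randomized pause. Both class names are hypothetical, the pause bound is an arbitrary assumption, and a shared counter stands in for the protected resource; real RDBMS latches are far lower-level than this.

```python
import random
import threading
import time


class QueuedLock:
    """Lock with a FIFO wait queue: requests are granted strictly in arrival (ticket) order."""

    def __init__(self):
        self._cv = threading.Condition()
        self._next_ticket = 0   # ticket handed to the next requester
        self._serving = 0       # ticket that currently owns the lock

    def acquire(self):
        with self._cv:
            ticket = self._next_ticket
            self._next_ticket += 1
            self._cv.wait_for(lambda: self._serving == ticket)   # wait in the queue

    def release(self):
        with self._cv:
            self._serving += 1
            self._cv.notify_all()


class Latch:
    """Exclusive latch: never queued; a failed request is retried after a short random pause."""

    def __init__(self, max_pause=1e-5):
        self._flag = threading.Lock()
        self._max_pause = max_pause   # assumed back-off bound, in seconds

    def get(self):
        while not self._flag.acquire(blocking=False):    # try once, no waiting
            time.sleep(random.uniform(0, self._max_pause))

    def release(self):
        self._flag.release()


def hammer(acquire, release, n_threads=4, n_iter=1000):
    """Run a read-modify-write on a shared counter that is only correct under mutual exclusion."""
    counter = [0]

    def work():
        for _ in range(n_iter):
            acquire()
            value = counter[0]
            counter[0] = value + 1
            release()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


lock, latch = QueuedLock(), Latch()
print(hammer(lock.acquire, lock.release), hammer(latch.get, latch.release))   # 4000 4000
```

Both give mutual exclusion; the difference is what a blocked requester does. The queued lock sleeps until its turn (fair, suited to millisecond-long holds), while the latch spins with back-off (no queue bookkeeping, suited to microsecond-long holds).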

Lock duration: each data change in most DBs (including Oracle) is made in a tr context => lock duration (the time a lock is held) depends on the lock release policy. Again, quoting Oracle docs: “All locks acquired by statements within a transaction are held for the duration of the transaction … Oracle releases all locks acquired by the statements within a transaction when you either commit or undo the transaction”. This prevents other transactions from overwriting uncommitted changes (and, in systems where readers also take locks, from reading them, i.e., dirty reads). An exception is “ROLLBACK TO savepointName”, which releases the locks acquired after that savepoint. Concerning the practical problem of “long-running transactions”, study the script illustrating the savepoint concept: tp_savepoints_example.docx
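The lock-duration rule and the ROLLBACK TO savepoint exception can be sketched with a toy tr that records Undo (old values) and the row locks it acquires. Rolling back to a savepoint restores the values changed after it and releases only the locks acquired after it; every other lock is held until COMMIT or ROLLBACK. The class and method names are hypothetical, and this is not the content of tp_savepoints_example.docx.

```python
class SavepointTr:
    """Toy tr: row locks are held until commit/rollback; ROLLBACK TO a savepoint undoes
    later changes and releases only the locks acquired after that savepoint."""

    def __init__(self, table):
        self.table = table        # shared data: row key -> value
        self.undo = []            # (key, old_value) records, in change order
        self.locks = []           # row keys locked by this tr, in acquisition order
        self.savepoints = {}      # savepoint name -> (undo length, locks length)

    def update(self, key, value):
        if key not in self.locks:
            self.locks.append(key)                  # acquire the row lock once
        self.undo.append((key, self.table[key]))    # save the old value first
        self.table[key] = value

    def savepoint(self, name):
        self.savepoints[name] = (len(self.undo), len(self.locks))

    def rollback_to(self, name):
        n_undo, n_locks = self.savepoints[name]
        while len(self.undo) > n_undo:              # undo newest changes first
            key, old = self.undo.pop()
            self.table[key] = old
        del self.locks[n_locks:]                    # release only post-savepoint locks

    def rollback(self):
        while self.undo:
            key, old = self.undo.pop()
            self.table[key] = old
        self._end()

    def commit(self):
        self._end()

    def _end(self):
        self.undo.clear()
        self.locks.clear()                          # all remaining locks released here
        self.savepoints.clear()


table = {"a": 1, "b": 2}
tr = SavepointTr(table)
tr.update("a", 10)
tr.savepoint("sp1")
tr.update("b", 20)
tr.update("a", 11)
tr.rollback_to("sp1")
print(table, tr.locks)   # {'a': 10, 'b': 2} ['a']
```

Row "a" stays locked after the partial rollback because it was first locked before sp1; a long-running tr can use this to give back locks on work it abandons without ending the whole tr.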

Transaction Redo and Undo overview

When a transaction T executes changes to a DB block “B”, we have n >= 1 pair(s) <new_block, old_block> (if n > 1, we have old/new block chains). The “Consistent” and “Durable” (C and D) parts of the “ACID” properties of a transaction are usually implemented in an RDBMS by some combination of Redo and Undo processing.

Redo is used to recover from catastrophic errors such as a server crash; this is the ONLY purpose of Redo. In Oracle, Redo is a complete history of tr changes; Redo is written sequentially and used ONLY during database recovery; it is NOT used for individual tr recovery. The default Redo configuration allocates a circular chain of n (default n = 3) Redo log disks that are overwritten when circularly re-used (called “no-archive” mode => older tr Redo history records are destroyed).

Undo stores the recent versions of changed DB blocks and implements:
- consistent gets (Review: means read from cache) (for query read consistency)
- rollback of failed individual transactions and/or statements
- flashback (covered later)

Figure: Some Oracle runtime physical process and disk components. Memory: database buffers (DB cache), Redo buffers, Undo buffers, SQL memory areas, user program areas. Processes: LGWR and DBW0 (DB start/stop & storage synch). Disk: tables, indexes, etc.; Redo storage of DB changes; Undo storage.

Example: Transaction Redo and Undo storage

Notation: Bxy/SCN, where x = blockType, y = BlockNumber, and SCN = SystemChangeNumber. For example, a block on database disk has blockType nil, a block in Undo has blockType u, a block on Redo log disk has blockType r, and a cached block is denoted by “c”. For any blockType, blockNumber is a block number on disk.

SystemChangeNumber (aka SCN) is the usual numerical timestamp used to order various events, such as the start time of a SQL statement, the time a block was written to a disk area, etc. (as seen in the section above: ‘Read Consistency for each Oracle Query’).

Example: Given table S(t, r, w), assume for simplicity only 2 rows, both in database block B42/1001 (DB blocking factor Bfr = 2). Call the two areas of B42/1001 where rows are stored “row slot0 & slot1”. Initially, B42/1001 contains:

B42/1001:  slot0          slot1          <- row slot
           (Eng 841 3)    (Aus 74 6)     <- row's column values

Consider a transaction T, with associated SCN shown in the format trName/SCN:
T/1002: UPDATE S SET r = 75 WHERE t = 'Aus';   (T is a simple one-DML, one-row tr)

1) Bc42 is a ‘database buffer cache’ slot storing B42, when B42 is first read from disk. (The database buffer cache stores blocks read from disk; it is also the cache where block row changes are saved before they are written back to database disk.) A standard Oracle background process named DBW0 writes DB buffers in cache to disk somewhat independently of the other Oracle background processes, at a rate that keeps buffer space available for reading blocks from disk.

2) As T executes, Redo log buffer block Brc42 caches the old and the new versions of the changed row in cache log slot slotxx (columns are numbered 0, 1, 2, …):

Brc42 slotxx
-------------------
B42 slot1 col1 74    <- old col r value
-------------------
B42 slot1 col1 75    <- new col r value
-------------------

3) A cached Undo block Buc, initially empty, now contains:

Undo block Buc
-------------------
Bu01/1003 slot1 col1 74
-------------------

If T is rolled back, B42 content is reset, so that slot1 col1 is again 74, and the cached content of block Buc is discarded.

4) If T commits successfully (success being the usual tr commit result), the Redo buffer content is force-written to Redo log disk, say into block Br29/1004.

Writing Redo buffers to Redo log disk (commit overhead) is done as fast as possible. Since tr rollback is rare (probability about 1-2%, even on busy systems), the common case to optimize is commit => Redo log disk writes are simply appended to the end of the current Redo log device. The Redo log is a simple pile-like file (no indexes or any other overhead). During database recovery, the Redo log is read in chronological order, and that access is serial; thus, nothing other than simple/fast sequential writes is needed when writing the log to disk, and serial forward reads from a specific place during recovery.

As we know, DBW0 writes B42's updates to disk with its own I/O scheduling. (On Red Hat Linux systems, ps aux | grep ora_ shows the ora_x background processes.)

B42/1005:  slot0          slot1
           (Eng 841 3)    (Aus 75 6)     (assuming block B42 is re-written to the same location from which it was read)

Redo log I/O

Oracle Redo log buffer content for a given T is written to Redo log disk BEFORE the database block changes for T are written to database disk. This synchronous ordering of I/O is the simplest way to guarantee that each committed T can be recovered. There is NO SUCH recovery GUARANTEE for an uncommitted tr! (Ex: cache contents are lost in a power failure.) In other words, once the Redo information for a tr x has been saved to Redo log disk, x can be recovered even if a failure occurs before x's changed DB blocks have been written to database disk.

Part of commit processing is to flush (meaning: “force” write) the Redo cache to Redo log disk. Recall that DBW0 independently writes modified (i.e., changed) cached DB block(s) to disk. If you need more reliability than simply assuming that Redo log files never get corrupted, you pay for it by using redundant storage to duplex the Redo log (meaning: write the log to 2 different physical devices). (Statistically, triplexing the log is the ultimate protection.)

Lifetime of tr x: x can fail between any of the times shown below:

Ts ---- … opj … ---- Tc ---- LGWR signals DBW0 ----> DBW0 writes x's changes to DB disk
(Ts: start x's operations; Tc: COMMIT, x's Redo log is complete)

{ T could fail at various points in its lifetime: before any Redo log is cached, after Redo log is written to Redo log disk, or even after T commits and before its changes are written to DB disk }

With the processing model and order given above, a successful commit guarantees that T can be recovered if a failure occurs at some time before T's database block changes are written to disk. <-- A requirement of a DBMS

{ Scenario: after a server crash, RDBMS recovery reads the Redo log, does undo (if needed), and finishes each committed T that did not complete }

Review: RDBMS recovery is the ONLY purpose of the Redo log.

The circumstances that cause Redo log buffer contents to be written to Redo log disk:

- every n (a configuration parameter) elapsed seconds, OR
- as part of commit processing, OR
- when the Redo buffer becomes a certain % full (to anticipate & avoid buffer overflow)
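The write-ahead rule and crash recovery described above can be sketched with a toy model: the Redo buffer is forced to the Redo log at COMMIT (or when it reaches a fill threshold), block writes in the style of DBW0 happen lazily, a crash loses everything in memory, and recovery reads the Redo log forward and re-applies the changes of committed trs. Class names, the fill threshold, and the record layout are assumptions for illustration. To keep it short, this model never writes uncommitted changes to the data disk, so the undo step of recovery is not needed here; Oracle's DBW0 can write uncommitted blocks, which is why real recovery also uses Undo.

```python
class ToyDb:
    """Write-ahead logging sketch: Redo reaches the log disk before data blocks reach DB disk."""

    def __init__(self, flush_at=4):
        self.data_disk = {}        # block/row -> value on database disk
        self.redo_disk = []        # Redo log disk: append-only records
        self.cache = {}            # database buffer cache (lost in a crash)
        self.redo_buffer = []      # Redo log buffer (lost in a crash)
        self.flush_at = flush_at   # assumed "% full" trigger, as a record count
        self.uncommitted = set()   # trs that have started but not committed

    def _flush_redo(self):
        self.redo_disk.extend(self.redo_buffer)   # sequential append only
        self.redo_buffer.clear()

    def write(self, tr, key, value):
        self.uncommitted.add(tr)
        self.cache[key] = (tr, value)
        self.redo_buffer.append(("change", tr, key, value))
        if len(self.redo_buffer) >= self.flush_at:   # buffer "nearly full" trigger
            self._flush_redo()

    def commit(self, tr):
        self.redo_buffer.append(("commit", tr))
        self._flush_redo()                  # commit forces Redo to the log disk
        self.uncommitted.discard(tr)

    def dbw0(self):
        """Lazily write cached blocks of committed trs (their Redo is already on disk)."""
        for key, (tr, value) in list(self.cache.items()):
            if tr not in self.uncommitted:
                self.data_disk[key] = value
                del self.cache[key]

    def crash_and_recover(self):
        self.cache.clear()
        self.redo_buffer.clear()            # memory contents are lost
        self.uncommitted.clear()
        committed = {rec[1] for rec in self.redo_disk if rec[0] == "commit"}
        for rec in self.redo_disk:          # read the Redo log forward, in order
            if rec[0] == "change" and rec[1] in committed:
                self.data_disk[rec[2]] = rec[3]   # roll forward committed changes


db = ToyDb()
db.write("T1", "B42.slot1.r", 75)
db.commit("T1")                       # T1's Redo is on the log disk; block not yet written
db.write("T2", "B42.slot0.r", 900)    # T2 never commits
db.crash_and_recover()
print(db.data_disk)                   # {'B42.slot1.r': 75}
```

T1 survives the crash even though DBW0 never wrote its block, because its Redo, including the commit record, reached the log disk first; T2 leaves no trace.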

Redo log disk architecture

There are two operating “modes” for Redo log disk. The first mode is no-archive mode (Oracle terminology). Physically, the default configuration consists of 3 Redo log “disks”, named here RD1, RD2, and RD3. The RDk are circularly chained together. tr Redo is written as a sequence of tr Redo log records. When the “current” RDk fills, a “log switch” is executed. When the last RDk (here, RD3) fills, logging switches back to RD1, overwriting the previous RD1 content.

The second mode is archive mode. This mode implements app requirements for which ALL Redo MUST BE SAVED PERMANENTLY. The only change from no-archive mode is that a copy of each full RDj is saved to (some other) permanent storage area before RDj is overwritten.
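The circular reuse of Redo log “disks”, with and without archiving, can be sketched as follows. Each RDk holds a fixed number of Redo records (an assumed capacity); when the current RDk fills, a log switch moves to the next one, wrapping from RD3 back to RD1. In archive mode, a full RDk is copied to an archive area before it is reused; in no-archive mode its old content is simply overwritten. The class and field names are hypothetical.

```python
class RedoLogGroups:
    """n circularly chained Redo log 'disks' RD1..RDn, each holding `capacity` records."""

    def __init__(self, n=3, capacity=2, archive_mode=False):
        self.logs = [[] for _ in range(n)]   # contents of RD1..RDn
        self.capacity = capacity
        self.current = 0                     # index of the current RDk
        self.archive_mode = archive_mode
        self.archived = []                   # permanent copies (archive mode only)
        self.switches = 0

    def append(self, record):
        if len(self.logs[self.current]) == self.capacity:   # current RDk is full
            self._log_switch()
        self.logs[self.current].append(record)

    def _log_switch(self):
        self.switches += 1
        self.current = (self.current + 1) % len(self.logs)  # RD3 wraps back to RD1
        nxt = self.logs[self.current]
        if nxt and self.archive_mode:
            self.archived.append(list(nxt))   # save a copy before it is overwritten
        nxt.clear()                           # no-archive mode: old Redo is destroyed


noarch, arch = RedoLogGroups(), RedoLogGroups(archive_mode=True)
for i in range(1, 8):                         # 7 Redo records, 2 per RDk
    noarch.append(f"r{i}")
    arch.append(f"r{i}")
print(noarch.logs)    # [['r7'], ['r3', 'r4'], ['r5', 'r6']]
print(arch.archived)  # [['r1', 'r2']]
```

In no-archive mode, records r1 and r2 are gone after the third log switch; in archive mode they survive in the archive copy, which is what makes a complete Redo history available.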

Introduction to table Flashback

Undo storage has been extended by Oracle in recent versions to support variations of the ability to reconstruct old versions of data. This general capability is what Oracle calls “flashback”, and it has several variations and purposes.

Here we discuss only one basic form. The FLASHBACK table … command is the simplest use. With various expressions of points in past time, a table can be queried and/or re-constructed exactly as it existed in the database at a specified time. Example: SELECT * FROM t AS OF TIMESTAMP someTimeInThePast WHERE . . . ; returns the specified subset of rows of t as they existed at time someTimeInThePast.

Table-level commands related to flashback

truncate table DEPENDENT: retains the table in the DD, but reclaims its row storage.

drop table tableName: new in Oracle 10g, a (recoverable) form of the table is still in the recyclebin. Note: recyclebin storage IS charged to your schema storage quota.

drop table tableName PURGE: the dropped table is NOT stored in the schema's recyclebin. (The recyclebin is on, not off, by default; it can be toggled on/off any time in a session.)

After drop table TableName is done (without using the PURGE clause):
- the table can no longer be referenced by its (previous) name;
- however, the table still exists in the Oracle recyclebin, with a system-generated OBJECT_NAME of the form BIN$unique_id$version (Ex: BIN$HGnc55/7rRPgQPeM/qQoRw==$0), along with its “ORIGINAL_NAME”.

A recyclebin item is 1) queryable by its OBJECT_NAME and 2) restorable as a table. The recyclebin is a schema-specific structure: each schema has its own.

Some recyclebin processing examples; flashing back a “deleted” table

If 2 versions of a table named tst had been dropped previously, we would have (notice the column display header re-namings):

select object_name, original_name, type, can_undrop as "UND", can_purge as "PUR", droptime
from recyclebin;

OBJECT_NAME                     ORIGINAL_NAME  TYPE   UND  PUR  DROPTIME
------------------------------  -------------  -----  ---  ---  -------------------
BIN$HGnc55/7rRPgQPeM/qQoRw==$0  TST            TABLE  YES  YES  2016-09-01:16:10:12
BIN$HGnc55/9rRPgQPeM/qQoRw==$0  TST            TABLE  YES  YES  2016-09-01:16:21:00

(As you can see, the OBJECT_NAME values do not differ very much, but ARE different)

A specific version of a dropped table is recovered by referring to the appropriate OBJECT_NAME:

SQL> flashback table "BIN$HGnc55/7rRPgQPeM/qQoRw==$0" to before drop;
Flashback complete.

And now the original (earlier of the 2 versions) table contents exist again, as at drop time!

SQL> select * from tst;
COL        ROW_CHNG
---------- --------
Version1   16:10:03
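The drop / recyclebin / flashback sequence above can be sketched with a toy per-schema model. DROP moves the table's rows into the schema's recyclebin under a new system-generated name, DROP ... PURGE bypasses the recyclebin, and FLASHBACK ... TO BEFORE DROP restores a chosen version under its original name. The class, its methods, and the generated names (BIN$demo1$0, ...) are hypothetical; only the overall behavior follows the text, and the second table version's contents are made up.

```python
import itertools


class Schema:
    """Toy schema: its tables plus its own recyclebin."""

    def __init__(self):
        self.tables = {}                  # table name -> rows
        self.recyclebin = []              # entries: object_name, original_name, droptime, rows
        self._ids = itertools.count(1)

    def drop(self, name, droptime, purge=False):
        rows = self.tables.pop(name)      # the old name can no longer be referenced
        if not purge:                     # DROP ... PURGE bypasses the recyclebin
            self.recyclebin.append({"object_name": f"BIN$demo{next(self._ids)}$0",
                                    "original_name": name,
                                    "droptime": droptime,
                                    "rows": rows})

    def flashback_to_before_drop(self, object_name):
        entry = next(e for e in self.recyclebin if e["object_name"] == object_name)
        if entry["original_name"] in self.tables:
            raise ValueError("original name is already used by an existing table")
        self.recyclebin.remove(entry)
        self.tables[entry["original_name"]] = entry["rows"]


s = Schema()
s.tables["TST"] = [("Version1", "16:10:03")]
s.drop("TST", "2016-09-01:16:10:12")
s.tables["TST"] = [("Version2", "16:20:50")]     # made-up contents of the second version
s.drop("TST", "2016-09-01:16:21:00")
print([(e["object_name"], e["original_name"]) for e in s.recyclebin])
s.flashback_to_before_drop("BIN$demo1$0")        # restore the earlier version
print(s.tables["TST"])                           # [('Version1', '16:10:03')]
```

Because both dropped versions share the ORIGINAL_NAME TST, the unique OBJECT_NAME is what selects which version comes back.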


Appendix

Oracle SET TRANSACTION clauses - Syntax diagram

Syntax notes: the ISOLATION LEVEL path specifies choice of isolation level for a transaction that will modify values (thus, READ ONLY is not in that path). Using USE ROLLBACK SEGMENT is optional (for high-concurrency on one data set)

Oracle Undo processing “in the Extreme”

A practical problem arises when a tp system is heavily loaded with concurrent updating, i.e., many simultaneous requests to change the same data set. This is one of MANY difficulties in engineering a tp system to be scalable.

Def – an app is scalable when it satisfies response time requirements for increasing loads

Concerning consistent gets, a heavy load can require saving a large number of old_block representations for some period of time (such as a few minutes). In very extreme situations, the rollback storage area on disk might have insufficient space. Individual transactions with heavy updating can be structured to accommodate their heavier resource demands. One example is using SET TRANSACTION to specify extra/temporary dynamic disk space in a temporary “rollback segment” to store Undo (above syntax diagram)

Chart of a typical benchmark: Throughput vs Load

The table below shows conflicting behavior. Each experiment (one table row below) runs for 3 minutes, on a dual-core laptop with a commodity disk. Avg(response time) is best for one user, but the optimal throughput occurs at about 28 users. Why? This table illustrates the impossibility of optimizing for all user categories at one time.

Concurrent   Avg. Response   Total Transactions
Sessions     Time (msec)     Completed (in 3 min)
----------   -------------   --------------------
 1            79 (the min)    4,203 (23 tr/sec)    <- best for interactive users
 2           108              6,772
 4           133             10,481
 8           198             13,346
12           244             13,639
16           310             14,798
20           337             14,749
24           369             14,176
28           428             15,181 (84 tr/sec)   <- best for overall system throughput
32           563             13,278
36           533             14,151
40           587             13,302
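The tr/sec annotations in the table follow from dividing each row's total by the 3-minute (180 s) run length. The short sketch below recomputes throughput for every row from the table's own numbers and picks out the session count with the best response time and the one with the best throughput; it only re-derives figures already in the table.

```python
RUN_SECONDS = 3 * 60   # each experiment runs for 3 minutes

# (concurrent sessions, avg response time in msec, total transactions completed)
results = [
    (1, 79, 4203), (2, 108, 6772), (4, 133, 10481), (8, 198, 13346),
    (12, 244, 13639), (16, 310, 14798), (20, 337, 14749), (24, 369, 14176),
    (28, 428, 15181), (32, 563, 13278), (36, 533, 14151), (40, 587, 13302),
]

throughput = {users: total / RUN_SECONDS for users, _, total in results}   # tr/sec
best_response = min(results, key=lambda row: row[1])[0]   # sessions with lowest avg response
best_throughput = max(throughput, key=throughput.get)     # sessions with highest tr/sec

for users, resp, total in results:
    print(f"{users:>2} sessions: {resp:>3} msec avg, {throughput[users]:5.1f} tr/sec")
print(best_response, best_throughput)                # 1 28
print(round(throughput[1]), round(throughput[28]))   # 23 84
```

The two optima land at opposite ends of the load range, which is the conflict the table is meant to show.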
