Concurrency Control in Distributed Databases Gul Sabah Arif

Post on 03-Jan-2016

Page 1: Concurrency Control in Distributed Databases Gul Sabah Arif

Concurrency Control in Distributed Databases

Gul Sabah Arif

Page 2: Concurrency Control in Distributed Databases Gul Sabah Arif

Introduction

Concurrency control is the activity of coordinating concurrent accesses to a database in a multi-user database management system (DBMS).

Several problems:

1. The lost update problem.
2. The temporary update problem.
3. The incorrect summary problem.

Serializability Theory.
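The lost update problem can be illustrated with a small simulation. This is only a sketch under assumed names: the transactions T1 and T2, the balance item, and the deposit amounts are hypothetical and do not come from the slides.

```python
# Hypothetical illustration: T1 and T2 each read a balance and write back
# the value they read plus their own deposit.
DEPOSITS = {"T1": 10, "T2": 20}

def run(schedule, start=100):
    """Execute (txn, op) steps in the given order and return the final balance."""
    balance = start
    local = {}  # private copy of the value each transaction read
    for txn, op in schedule:
        if op == "read":
            local[txn] = balance
        else:  # "write"
            balance = local[txn] + DEPOSITS[txn]
    return balance

# Serial schedule: T2 sees T1's update.
serial = run([("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")])

# Interleaved schedule: both read the old value, so T1's update is overwritten.
interleaved = run([("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")])

print(serial, interleaved)
```

The interleaved schedule is not equivalent to any serial one, which is exactly what a concurrency control scheme has to rule out.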

Page 3: Concurrency Control in Distributed Databases Gul Sabah Arif

Distributed DB Architecture

Transaction Manager

Data Manager

Scheduler

DDBS Architecture

Processing Operation

Page 4: Concurrency Control in Distributed Databases Gul Sabah Arif

Scheduling Algorithms

Concurrency control schemes must be modified for use in a distributed environment.

There are 3 basic methods for transaction concurrency control, plus hybrids that combine them.

Locking (two-phase locking, 2PL).

Timestamp ordering.

Optimistic.

Hybrid.

Page 5: Concurrency Control in Distributed Databases Gul Sabah Arif

Locking Protocols

Majority Protocol

A local lock manager at each site administers lock and unlock requests for the data items stored at that site.

When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager.

If Q is locked in an incompatible mode, the request is delayed until it can be granted.

When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.
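The behaviour of one site's lock manager for an unreplicated item can be sketched as follows. The class name, method names, and the "granted"/"delayed" return strings are hypothetical, and lock upgrades and message passing are left out; this is a minimal sketch, not the protocol's definitive form.

```python
from collections import defaultdict, deque

SHARED, EXCLUSIVE = "S", "X"

class LocalLockManager:
    """Lock manager for the data items stored at one site (hypothetical sketch)."""

    def __init__(self):
        self.holders = defaultdict(dict)   # item -> {txn: mode}
        self.waiting = defaultdict(deque)  # item -> queue of delayed (txn, mode)

    def _compatible(self, item, mode):
        held = list(self.holders[item].values())
        if not held:
            return True
        # Only shared locks are compatible with each other.
        return mode == SHARED and all(m == SHARED for m in held)

    def request(self, txn, item, mode):
        """Return 'granted' or 'delayed', standing in for the reply to the initiator."""
        if not self.waiting[item] and self._compatible(item, mode):
            self.holders[item][txn] = mode
            return "granted"
        self.waiting[item].append((txn, mode))
        return "delayed"

    def release(self, txn, item):
        """Unlock, then return the transactions whose delayed requests are now granted."""
        self.holders[item].pop(txn, None)
        granted = []
        while self.waiting[item]:
            nxt, mode = self.waiting[item][0]
            if not self._compatible(item, mode):
                break
            self.waiting[item].popleft()
            self.holders[item][nxt] = mode
            granted.append(nxt)
        return granted

lm = LocalLockManager()
r1 = lm.request("T1", "Q", SHARED)
r2 = lm.request("T2", "Q", SHARED)
r3 = lm.request("T3", "Q", EXCLUSIVE)   # incompatible with the shared locks
g1 = lm.release("T1", "Q")              # T2 still holds a shared lock
g2 = lm.release("T2", "Q")              # T3's delayed request can now be granted
```

Delayed requests are served in arrival order here, which is one possible design choice; the slides do not specify a queueing policy.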

Page 6: Concurrency Control in Distributed Databases Gul Sabah Arif

Majority Protocol (Cont.)

In the case of replicated data:

If Q is replicated at n sites, a lock request message must be sent to more than half of the n sites at which Q is stored.

The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q.

When writing the data item, the transaction performs writes on all replicas.

Benefit: can be used even when some sites are unavailable.

Drawbacks:

Requires 2(n/2 + 1) messages for handling lock requests, and (n/2 + 1) messages for handling unlock requests.

Potential for deadlock even with a single item, e.g., each of 3 transactions may hold locks on 1/3rd of the replicas of a data item.
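The message counts and the single-item deadlock can be checked with a short sketch. It assumes that n/2 in the formulas means integer division, and the function names and site labels are hypothetical.

```python
def majority(n):
    """Smallest number of replicas that is more than half of n (assumes integer division)."""
    return n // 2 + 1

def lock_messages(n):
    # One request and one grant message per contacted site.
    return 2 * majority(n)

def unlock_messages(n):
    # One unlock message per site where a lock is held.
    return majority(n)

def holds_majority(txn, replica_locks):
    """replica_locks maps each replica site to the transaction holding its lock."""
    held = sum(1 for owner in replica_locks.values() if owner == txn)
    return held >= majority(len(replica_locks))

# Deadlock on a single item: three replicas, each locked by a different transaction.
locks = {"S1": "T1", "S2": "T2", "S3": "T3"}
stuck = [t for t in ("T1", "T2", "T3") if not holds_majority(t, locks)]
print(majority(3), lock_messages(5), unlock_messages(5), stuck)
```

No transaction holds the two locks it needs and no replica is free, so none of them can proceed unless one gives up its lock.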

Page 7: Concurrency Control in Distributed Databases Gul Sabah Arif

Biased Protocol

A local lock manager runs at each site, as in the majority protocol; however, requests for shared locks are handled differently from requests for exclusive locks.

Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q.

Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q.

Advantage: imposes less overhead on read operations.

Disadvantage: additional overhead on writes.
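The asymmetry between shared and exclusive requests can be sketched as a function that returns the sites to contact. The function name is hypothetical, and picking the first replica site for a shared lock is an arbitrary assumption; the protocol only requires one site holding a replica.

```python
def sites_to_lock(mode, replica_sites):
    """Return the sites whose lock managers must be asked for a lock on Q."""
    if mode == "S":
        # Shared lock: any single replica site is enough (first one chosen here).
        return list(replica_sites[:1])
    if mode == "X":
        # Exclusive lock: every site containing a replica.
        return list(replica_sites)
    raise ValueError("mode must be 'S' or 'X'")

replicas = ["S1", "S2", "S3"]
print(sites_to_lock("S", replicas), sites_to_lock("X", replicas))
```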

Page 8: Concurrency Control in Distributed Databases Gul Sabah Arif

2 Phase Locking (2PL)

Centralized 2PL.

Primary copy 2PL.

Distributed 2PL.

Voting 2PL.

Simulation Models for 2PL

Simulation model of centralized 2PL

Page 9: Concurrency Control in Distributed Databases Gul Sabah Arif

Simulation model of distributed 2PL

Page 10: Concurrency Control in Distributed Databases Gul Sabah Arif

Timestamping

Secure Time-Stamp Based Concurrency Control Protocol for Distributed Databases

A security level is assigned to each transaction and each data item.

Common instances of totally ordered security levels are Top-Secret (TS), Secret (S), Confidential (C), and Unclassified (U).

System Model

The system consists of N sites, where each site Ni holds a secure database that is a partition of the global database spread across all N sites.

The secure distributed database is defined as a five-tuple <Dt, Tr, Ts, Sc, Lv>, where:

Dt is the set of data items,

Tr is the set of distributed transactions,

Ts is the timestamp,

Sc is the partially ordered set of security levels,

Lv is a mapping that assigns a security level to each data item and each transaction.

Page 11: Concurrency Control in Distributed Databases Gul Sabah Arif

Secure Time-Stamp Based Concurrency Control (cont)

Security level Sci is said to dominate security level Scj if Scj <= Sci.

The security policy used enforces the following restrictions:

Simple Security Property: a transaction T (subject) is allowed to read a data item (object) x only if Lv(x) <= Lv(T).

Restricted Property: a transaction T is allowed to write a data item x only if Lv(x) = Lv(T).
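The two properties can be written as simple predicates. This sketch assumes the usual ordering U < C < S < TS of the four levels named earlier; the numeric ranks and function names are hypothetical.

```python
# Assumed ranking of the four example levels: U < C < S < TS.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(a, b):
    """True if level a dominates level b, i.e. b <= a."""
    return LEVELS[b] <= LEVELS[a]

def can_read(txn_level, item_level):
    # Simple Security Property: Lv(x) <= Lv(T).
    return dominates(txn_level, item_level)

def can_write(txn_level, item_level):
    # Restricted Property: Lv(x) = Lv(T).
    return LEVELS[txn_level] == LEVELS[item_level]

print(can_read("S", "C"), can_read("C", "S"), can_write("S", "S"), can_write("TS", "S"))
```

A Secret transaction may therefore read Confidential data but may write only Secret data.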

System Architecture

The Global Transaction Manager (GTr) is a software module which translates and decomposes the transaction into subtransactions against local schemas, and coordinates the execution of the subtransactions.

GTr Layers

1. Transaction Interface
2. Authentication Check Layer
3. Security Level Assignment Layer
4. Data Manager and Transaction Manager Layer
5. Data Access Tracker (DAT)

Page 12: Concurrency Control in Distributed Databases Gul Sabah Arif

Secure Time-Stamp Based Concurrency Control (cont)

Local Transaction Manager (LTr)

Subtransaction Interface Layer

Sub Query Manager

Data Administrator Layer

Local Database

Page 13: Concurrency Control in Distributed Databases Gul Sabah Arif

A Secure Concurrency Control Protocol

Algorithm for write operation on data item x issued by subtransaction Si with timestamp Tsi:

    If (RTs(x) > Tsi) {
        Abort(Si);
    }
    ElseIf (WTs(x) > Tsi) {
        Ignore(Si);
    }
    ElseIf (Lv(x) == Lv(Si)) {   /* Lv(x) and Lv(Si) are the security levels of data item x and subtransaction Si */
        WritelockTo(x);
        Execution(x);
        WTs(x) = Tsi;
        Update DAT to Tsi;
    }
    Else {
        Abort(Si);   /* access denied due to security */
    }

Algorithm for read operation on data item x issued by subtransaction Si with timestamp Tsi:

    If (WTs(x) > Tsi) {
        Abort(Si);
        Rollback(Si);
    }
    ElseIf (Lv(x) <= Lv(Si)) {
        ReadlockTo(x);
        ExecuteOn(x);
        RTs(x) = Tsi;
        Update DAT to Tsi;
    }
    Else {
        Abort(Si);
        Rollback(Si);
    }
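The read and write rules can be exercised with a small runnable sketch. It is an illustration under assumptions, not the protocol as published: locking and the DAT update are omitted, the `Item` class and return strings are hypothetical, levels are ranked U < C < S < TS, and the read rule uses `max(RTs(x), Tsi)` instead of assigning Tsi directly so that an older reader cannot lower the read timestamp.

```python
from dataclasses import dataclass

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}  # assumed ranking of the example levels

@dataclass
class Item:
    level: str
    value: object = None
    rts: int = 0  # RTs(x): largest timestamp of a read on x
    wts: int = 0  # WTs(x): timestamp of the latest write on x

def write(item, ts, level, value):
    """Return 'abort', 'ignore' or 'ok' for a write with timestamp ts."""
    if item.rts > ts:
        return "abort"    # a younger subtransaction has already read x
    if item.wts > ts:
        return "ignore"   # obsolete write is skipped
    if LEVELS[item.level] == LEVELS[level]:
        item.value, item.wts = value, ts
        return "ok"
    return "abort"        # access denied due to security

def read(item, ts, level):
    """Return ('ok', value) or ('abort', None) for a read with timestamp ts."""
    if item.wts > ts:
        return ("abort", None)
    if LEVELS[item.level] <= LEVELS[level]:
        item.rts = max(item.rts, ts)  # deviation from the pseudocode, see lead-in
        return ("ok", item.value)
    return ("abort", None)            # access denied due to security

x = Item(level="C")
steps = [
    write(x, 5, "C", "a"),   # accepted, WTs(x) becomes 5
    read(x, 3, "S"),         # too old: WTs(x) > 3
    read(x, 8, "S"),         # accepted, RTs(x) becomes 8
    write(x, 6, "C", "b"),   # too old: RTs(x) > 6
    write(x, 9, "S", "c"),   # level mismatch
    read(x, 10, "U"),        # level too low
]
print(steps)
```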

Page 14: Concurrency Control in Distributed Databases Gul Sabah Arif

Hybrid

There are three basic techniques, and each can be used for rw scheduling, ww scheduling, or both.

Schedulers can be centralized or distributed.

Replicated data can be handled in three ways (Do Nothing, Primary Copy, Voting).

System R*

Uses a 2PL scheduler for rw and ww synchronization.

The schedulers are distributed at the DMs.

Replication is handled by the do-nothing approach.

Distributed INGRES

INGRES uses primary copy for replication.

Page 15: Concurrency Control in Distributed Databases Gul Sabah Arif

New Approaches to Concurrency Control

Total Ordering

Total ordering in networking terms describes the property of a network guaranteeing that all messages are delivered in the same order across all destinations.

In combination with the concept of transactions, this property can be used to ensure that transactions are received in the same order at all sites. This is called the ORDER CC technique.

Algorithm

Each transaction is initiated by sending its reads and write predeclares to the corresponding schedulers as a single atomic action in totally ordered fashion.

Each scheduler stores the received operation requests in a FIFO-type queue.

If a read is at the head of the queue, it is executed immediately.

The transaction can now issue its write requests in accordance with the previously given predeclares.

Upon commit, the committed values are sent in non-ordered fashion to the schedulers, which replace the corresponding predeclare statements in the queue with the received committed writes.
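The queue handling at one scheduler can be sketched as follows. The class and method names are hypothetical, and the sketch assumes that a write predeclare at the head of the queue blocks everything behind it until the committed write replaces it; the slides do not state this rule explicitly. Total-order delivery itself is taken as given by the network.

```python
from collections import deque

class OrderScheduler:
    """One scheduler's FIFO queue under the ORDER technique (hypothetical sketch)."""

    def __init__(self):
        self.queue = deque()  # entries: [kind, txn, item, value]
        self.store = {}       # committed values held at this scheduler
        self.results = []     # (txn, item, value) for each executed read

    def receive(self, txn, reads, write_predeclares):
        """Enqueue one transaction's requests as a single atomic action."""
        for item in reads:
            self.queue.append(["read", txn, item, None])
        for item in write_predeclares:
            self.queue.append(["predeclare", txn, item, None])
        self._drain()

    def commit(self, txn, writes):
        """Replace txn's predeclares with its committed writes (item -> value)."""
        for entry in self.queue:
            if entry[0] == "predeclare" and entry[1] == txn and entry[2] in writes:
                entry[0], entry[3] = "write", writes[entry[2]]
        self._drain()

    def _drain(self):
        # Execute from the head until an unresolved predeclare blocks the queue.
        while self.queue and self.queue[0][0] != "predeclare":
            kind, txn, item, value = self.queue.popleft()
            if kind == "read":
                self.results.append((txn, item, self.store.get(item)))
            else:
                self.store[item] = value

s = OrderScheduler()
s.receive("T1", reads=["x"], write_predeclares=["x"])
s.receive("T2", reads=["x"], write_predeclares=[])   # waits behind T1's predeclare
before_commit = list(s.results)
s.commit("T1", {"x": 42})
print(before_commit, s.results)
```

Because every scheduler receives the atomic actions in the same order, T2's read is placed behind T1's write at every site and sees the committed value.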

Page 16: Concurrency Control in Distributed Databases Gul Sabah Arif

Timestamp Ordering Revisited

Whenever a network layout provides predictability regarding the time at which a message will arrive at its destination, as interconnection networks do, this property can be exploited for concurrency control.

Algorithm

The transaction manager initiates a transaction by sending its reads and write predeclares to the corresponding schedulers as a single atomic action.

This atomic action is assigned a timestamp t, denoting the time by which all operations will have arrived at their respective schedulers.

When a scheduler receives an operation o, it can wait until time t has arrived before processing o.

Alternatively, it can process o ahead of time t, causing conflicting operations that arrive afterwards, but with a lower timestamp, to abort.
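The two options open to a scheduler can be sketched side by side. Both classes and their method names are hypothetical, operations are reduced to reads and writes on named items, and time is modelled as plain integers; this is a sketch under those assumptions.

```python
import heapq

class WaitingScheduler:
    """Option 1: hold each operation until its timestamp t has arrived."""

    def __init__(self):
        self.pending = []  # heap of (t, arrival_seq, op)
        self.seq = 0

    def receive(self, op, t):
        heapq.heappush(self.pending, (t, self.seq, op))
        self.seq += 1

    def advance(self, now):
        """Release, in timestamp order, every operation with t <= now."""
        ready = []
        while self.pending and self.pending[0][0] <= now:
            t, _, op = heapq.heappop(self.pending)
            ready.append((t, op))
        return ready

class EagerScheduler:
    """Option 2: process on arrival and abort late conflicting operations."""

    def __init__(self):
        self.last_read = {}   # item -> highest timestamp of a processed read
        self.last_write = {}  # item -> highest timestamp of a processed write

    def receive(self, kind, item, t):
        newer_write = self.last_write.get(item, -1) > t
        newer_read = self.last_read.get(item, -1) > t
        # A read conflicts with writes; a write conflicts with reads and writes.
        if newer_write or (kind == "write" and newer_read):
            return "abort"
        if kind == "read":
            self.last_read[item] = max(self.last_read.get(item, -1), t)
        else:
            self.last_write[item] = t
        return "processed"

w = WaitingScheduler()
w.receive("w(x)", 10)
w.receive("r(x)", 5)
first, second = w.advance(7), w.advance(10)

e = EagerScheduler()
outcomes = [
    e.receive("write", "x", 10),  # processed ahead of time
    e.receive("read", "x", 5),    # arrives later with a lower timestamp
    e.receive("read", "x", 12),
    e.receive("write", "x", 11),  # conflicts with the read at 12
]
print(first, second, outcomes)
```

Waiting never aborts but delays every operation until t; processing eagerly removes the delay at the cost of aborting conflicting late arrivals.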

Page 17: Concurrency Control in Distributed Databases Gul Sabah Arif

Conclusion

Performance Comparison

2PL, the standard technique used for centralised DBMSs, performs rather poorly in distributed systems, whereas timestamp ordering based protocols in their various forms seem to provide the best overall performance.

With 2PL and other locking techniques, deadlock prevention or detection in a distributed environment is much more complex and costly.

Timestamp ordering (TO) techniques avoid deadlocks entirely.

Basic TO (BTO) usually shows better overall performance in a distributed environment than 2PL.

ORDER outperforms both 2PL and BTO, provided network latency is low and the total ordering algorithm is implemented efficiently.

For high network latencies, ORDER appears to be a rather disadvantageous approach.

PREDICT shows basically the same advantages as ORDER.

Page 18: Concurrency Control in Distributed Databases Gul Sabah Arif

References

"A Secure Time-Stamp Based Concurrency Control Protocol For Distributed Databases", Journal of Computer Science 3 (7): 561-565, 2007.

"Some Models of a Distributed Database Management System with Data Replication", International Conference on Computer Systems and Technologies - CompSysTech'07.

"A Sophisticated Introduction to Distributed Database Concurrency Control", Harvard University, Cambridge, 1990.

"Database System Concepts", Silberschatz, McGraw-Hill, 2001.