Distributed Transaction Processing


Page 1: Distributed Transaction Processing

Distributed Transaction Processing

Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and an earlier version of CS 223 at UCI.

Page 2: Distributed Transaction Processing

Distributed Transaction

[Figure: transaction T spans multiple databases – actions a1, a2 execute at one DBMS, a3 at another, and a4, a5 at a third.]

Transaction T spans multiple databases.

Page 3: Distributed Transaction Processing

Distributed Transaction Processing

• Each DBMS performs locking & logging for recovery
• How to roll back T?
  – Easy: each DBMS can roll back independently.
• How to commit T?
  – If DBMS1 commits T and DBMS2 experiences a failure before T's REDO log records are persistent, T cannot be committed at DBMS2.
  – Loss of atomicity.
• Atomic Commit Protocols
  – The protocol guarantees that all sites make the same decision – commit, or abort!

Page 4: Distributed Transaction Processing

Atomic Commit Protocol

• Guarantee of an atomic commit protocol:
  – The protocol is resilient to communication and site failures.
  – Despite failures, once all failures are repaired, the transaction commits or aborts at all sites.
• Metrics to compare different protocols:
  – Overhead: I/O (number of records written to disk), network (number of messages – when there are no failures, and when there are failures).
  – Latency: number of rounds of messages, amount of forced I/O to disk.
  – Blocking: unbounded waiting due to failures.
  – Recovery complexity: how complex is the termination protocol.
• Most common ACP:
  – Two-phase commit (2PC)
    • Centralized 2PC, Distributed 2PC, Linear 2PC, …
  – Three-phase commit (3PC)

Page 5: Distributed Transaction Processing

Terminology

• Resource Managers (RMs)
  – Usually databases
• Participants
  – RMs that did work on behalf of the transaction
• Coordinator
  – Component that runs two-phase commit on behalf of the transaction

Page 6: Distributed Transaction Processing

States of the Transaction

• At coordinator:
  – Initiated (I) – transaction known to system
  – Preparing (P) – prepare message sent to participants
  – Committed (C) – has committed
  – Aborted (A) – has aborted
• At participant:
  – Initiated (I)
  – Prepared (P) – prepared to commit, if the coordinator so desires
  – Committed (C)
  – Aborted (A)

Page 7: Distributed Transaction Processing

[Figure: 2PC commit-case message flow between Coordinator and Participant]

Coordinator → Participant: REQUEST-TO-PREPARE
Participant → Coordinator: PREPARED*
Coordinator → Participant: COMMIT*
Participant → Coordinator: DONE

1. Local prepare work. Write prepare record on log (forced).
2. Local prepare work (lazy).
3. Write commit record on log (forced).
4. Local commit work. Write completion log record on log. Ack when durable (effectively forced, though not immediately).
5. Write completion log record (lazy).

Page 8: Distributed Transaction Processing

Protocol Database

• Coordinator maintains a protocol database (in main memory) for each transaction (sketched below).
• Protocol database
  – enables the coordinator to execute 2PC
  – answers inquiries by participants about the status of the transaction
    • cohorts may make such inquiries if they fail and later recover
  – entry for a transaction is deleted when the coordinator is sure that no one will ever inquire about the transaction again (when the decision has been acked by all the participants)
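A minimal sketch of one such entry, assuming hypothetical field and type names (the slides do not prescribe a concrete layout):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of one protocol-database entry; field names are illustrative.
@dataclass
class TxnEntry:
    tid: str
    status: str = "INITIATED"                   # INITIATED -> PREPARING -> COMMITTED / ABORTED
    cohorts: set = field(default_factory=set)   # participants doing work for this txn
    prepared: set = field(default_factory=set)  # cohorts that voted PREPARED
    acked: set = field(default_factory=set)     # cohorts that acknowledged the decision

    def all_prepared(self) -> bool:
        return self.prepared == self.cohorts

    def all_acked(self) -> bool:
        return self.acked == self.cohorts

# The coordinator keeps these entries in main memory, keyed by transaction id,
# and deletes an entry once all_acked() holds (no one will ever inquire again).
protocol_db: dict[str, TxnEntry] = {}
```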

Page 9: Distributed Transaction Processing

Two-Phase Commit – normal actions, commit case (coordinator)

– Make an entry in the protocol database for the transaction, marking its status as initiated, when the coordinator first learns about the transaction.
– Add a participant to the cohort list in the protocol database when the coordinator learns about that cohort.
– Change the status of the transaction to preparing before sending the prepare message. (It is assumed that the coordinator knows about all the participants before this step.)
– On receipt of a PREPARED message from a cohort, mark the cohort as PREPARED in the protocol database. If all cohorts are PREPARED, change the status to COMMITTED and send the COMMIT message.
  • Must force a commit log record to disk before sending the commit message.
– On receipt of an ACK message from a cohort, mark the cohort as ACKED. When all cohorts have acked, delete the transaction's entry from the protocol database.
  • Must write a completed log record to disk before deletion from the protocol database. No need to force the write, though.

These steps are sketched in code below.
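The bullets above can be read as a small event-driven state machine. The following is an illustrative reconstruction, not the slides' code; it assumes the TxnEntry/protocol_db structures from the earlier sketch, and send()/write_log() are hypothetical stand-ins for the network and the (force-capable) log manager:

```python
# Illustrative reconstruction of the coordinator's commit-case steps (a sketch, not a spec).

def send(cohort, msg, tid): ...                 # placeholder: network send
def write_log(record, force=False): ...         # placeholder: force=True flushes to disk

def begin(tid):
    protocol_db[tid] = TxnEntry(tid)            # status INITIATED: transaction known

def start_2pc(tid, cohorts):
    entry = protocol_db[tid]
    entry.cohorts = set(cohorts)                # all participants known before preparing
    entry.status = "PREPARING"
    for c in entry.cohorts:
        send(c, "REQUEST-TO-PREPARE", tid)

def on_prepared(tid, cohort):
    entry = protocol_db[tid]
    entry.prepared.add(cohort)
    if entry.all_prepared():
        write_log(("commit", tid), force=True)  # force commit record BEFORE sending COMMIT
        entry.status = "COMMITTED"
        for c in entry.cohorts:
            send(c, "COMMIT", tid)

def on_ack(tid, cohort):
    entry = protocol_db[tid]
    entry.acked.add(cohort)
    if entry.all_acked():
        write_log(("completed", tid))           # completed record need not be forced
        del protocol_db[tid]                    # no one will ever inquire again
```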

Page 10: Distributed Transaction Processing

Two-Phase Commit – normal actions, commit case (participant)

• On receipt of the PREPARE message, write a PREPARED log record before sending the PREPARED message.
  – Needs to be forced to disk, since the coordinator may now commit.
• On receipt of the COMMIT message, write a COMPLETION log record before sending the ACK to the coordinator.
  – The cohort must ensure the log is forced to disk before sending the ack – but there is no great urgency for doing so.

The participant side is sketched in code below.
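A matching sketch of the cohort side, again with hypothetical helpers (vote_yes(), send(), write_log()) standing in for the local DBMS, the network, and the log manager:

```python
# Illustrative sketch of the cohort side for the commit case (not the slides' code).

def vote_yes(tid) -> bool: ...                     # placeholder: can this site commit its part of T?
def send(coordinator, msg, tid): ...               # placeholder: network send
def write_log(record, force=False): ...            # placeholder: force=True flushes to disk

def on_request_to_prepare(tid, coordinator):
    if vote_yes(tid):
        write_log(("prepared", tid), force=True)   # must be durable: coordinator may now commit
        send(coordinator, "PREPARED", tid)
    else:
        write_log(("abort", tid))                  # lazy write, then abort locally
        send(coordinator, "NO", tid)

def on_commit(tid, coordinator):
    # local commit work, then a completion record; force before acking (no great urgency)
    write_log(("completion", tid), force=True)
    send(coordinator, "ACK", tid)
```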

Page 11: Distributed Transaction Processing

[Figure: 2PC abort-case message flow between Coordinator and Participant]

Coordinator → Participant: REQUEST-TO-PREPARE
Participant → Coordinator: NO
Coordinator → Participant: ABORT*
Participant → Coordinator: DONE

1. Local prepare work. Write prepare record on log (forced) if voting yes. Else, local abort work; write abort log record (lazy).
2. Local prepare work (lazy).
3. Write abort log record (forced).
4. Local abort work. Write abort record on log. Ack when done (forced, though not immediately).
5. Write completion log record (lazy).

Page 12: Distributed Transaction Processing

Timeout Actions

• At various stages of the protocol, the transaction waits for messages at both the coordinator and the participants.
• If a message is not received, on timeout, a timeout action is executed (sketched in code below).
• Coordinator timeout actions:
  – Waiting for votes of participants: ABORT the transaction, send aborts to all, delete the entry from the protocol database.
  – Waiting for an ack from some participant: forward the transaction to a recovery process that will periodically send COMMIT to the participant. When the participant recovers and all participants have sent an ACK, the coordinator writes a completion log record and deletes the entry from the protocol database.
• Cohort timeout actions:
  – Waiting for prepare: abort the transaction, write an abort log record, send an abort message to the coordinator. Alternatively, it could wait for the coordinator to ask for prepare and then vote NO.
  – Waiting for the decision: forward the transaction to the recovery process. The recovery process periodically issues a status-transaction call to the coordinator. Such a transaction is blocked until the failure is repaired.
  – NOTE: The recovery process could have used a different termination protocol – e.g., polling other participants to reduce blocking (cooperative termination).
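An illustrative sketch of these timeout branches; protocol_db, send(), and write_log() are the hypothetical pieces from the earlier sketches, and the recovery queue is also an assumption:

```python
# Sketch of the timeout actions described above (coordinator and cohort sides).

recovery_queue = []   # transactions handed off to a background recovery process

def coordinator_timeout(tid, waiting_for):
    entry = protocol_db[tid]
    if waiting_for == "votes":
        # no decision yet: safe to abort, tell everyone, forget the transaction
        for c in entry.cohorts:
            send(c, "ABORT", tid)
        del protocol_db[tid]
    elif waiting_for == "acks":
        recovery_queue.append(tid)      # recovery process re-sends COMMIT periodically

def cohort_timeout(tid, coordinator, waiting_for):
    if waiting_for == "prepare":
        write_log(("abort", tid))
        send(coordinator, "NO", tid)    # or: wait and vote NO when finally asked to prepare
    elif waiting_for == "decision":
        recovery_queue.append(tid)      # recovery process polls the coordinator's status;
                                        # the transaction stays blocked until it answers
```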

Page 13: Distributed Transaction Processing

2PC is blocking

[Figure: sample scenario – the coordinator and P1 have failed; surviving participants P2, P3, and P4 are each in the prepared/wait state (W).]

Page 14: Distributed Transaction Processing

Case I: P1 was in “W” and the coordinator sent commits, so P1 is in “C”.
Case II: P1 voted NO, so P1 is in “A”.

P2, P3, P4 (the surviving participants) cannot safely abort or commit the transaction.

[Figure: the coordinator and P1 have failed; P2, P3, and P4 are each in the prepared/wait state (W).]

Page 15: Distributed Transaction Processing

Recovery Actions (cohort)

• All sites execute the REDO-UNDO pass.
• Detection: a site knows it is a cohort if it finds a prepared log record for a transaction.
• If the log does not contain a completion log record:
  – reacquire all locks for the transaction
  – ask the coordinator for the status of the transaction
    • If the coordinator has no information about the transaction in its in-memory protocol database, it returns an ABORT message. If it has information and the transaction committed, it sends back a COMMIT. Else, if it has information but the transaction has not yet committed, it sends back a WAIT.
• If the log contains a completion log record:
  – do nothing

A sketch of this inquiry pass appears below.
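An illustrative sketch of the cohort-side recovery pass above (it covers only the prepared-transaction inquiry, not the REDO-UNDO pass). log_records is the scanned local log as (kind, tid) tuples; reacquire_locks() and ask_coordinator() are assumed placeholders for the lock manager and the status call:

```python
# Sketch of cohort recovery, following the bullets above.

def recover_cohort(log_records, reacquire_locks, ask_coordinator):
    prepared = {tid for (kind, tid) in log_records if kind == "prepared"}
    for tid in prepared:                      # this site is a cohort for tid
        if ("completion", tid) in log_records:
            continue                          # decision already applied locally: nothing to do
        reacquire_locks(tid)                  # hold resources until the fate of tid is known
        status = ask_coordinator(tid)         # ABORT (no entry), COMMIT, or WAIT
        # apply the decision locally; on WAIT, keep asking until the coordinator answers
```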

Page 16: Distributed Transaction Processing

Recovery Action (coordinator)

• If the protocol database were made fault-tolerant by logging every change, we could simply reconstruct the protocol database and restart 2PC from the point of failure.
• However, since we have only logged the commit and completion transitions and nothing else:
  – If the log does not contain a commit record, simply abort the transaction. If a cohort asks for the status in the future, the transaction is not in the protocol database and it will be considered aborted.
  – If there is a commit log record but no completion log record:
    • Recreate the transaction's entry as committed in the protocol database; the recovery process will ask all the participants if they are still waiting for a commit message. If no one is waiting, the completion record will be written.
  – If there is a commit log record and a completion log record:
    • Do nothing.

These three cases are sketched in code below.
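An illustrative sketch of the coordinator's three recovery cases. log_records is the scanned log as (kind, tid) tuples; rebuild_entry(), ask_if_waiting(), and write_log() are assumed placeholders:

```python
# Sketch of coordinator recovery from its log, following the three cases above.

def recover_coordinator(log_records, rebuild_entry, ask_if_waiting, write_log):
    committed = {tid for (kind, tid) in log_records if kind == "commit"}
    completed = {tid for (kind, tid) in log_records if kind == "completed"}
    for tid in committed - completed:
        # commit record but no completion record: re-create the entry as committed, and let
        # the recovery process re-offer COMMIT to any participant that is still waiting
        entry = rebuild_entry(tid, status="COMMITTED")
        if not ask_if_waiting(entry.cohorts, tid):
            write_log(("completed", tid))        # no one waiting: record completion (lazy)
    # no commit record at all: the transaction is simply treated as aborted; a cohort that
    # later inquires finds no protocol-database entry and receives ABORT
    # commit + completion records: nothing to do
```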

Page 17: Distributed Transaction Processing

2PC Analysis

• Count the number of messages, the number of log writes, and the number of forced log writes.
• Normal processing overhead:
  – Coordinator: 2 log writes (commit/abort, complete), 1 of them forced, plus 2 messages per cohort.
  – Cohort:
    • 2 log writes, both forced (prepared, committed/aborted)
    • 2 messages to the coordinator
• Various optimizations to reduce overheads! (A quick tally appears below.)
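A quick tally of these per-transaction counts for n cohorts (this only reproduces the slide's arithmetic; it measures nothing):

```python
# Sanity-check of the 2PC overhead counted above, for n cohorts.

def two_pc_cost(n_cohorts: int):
    coord_log_writes  = 2                       # commit/abort record + completion record
    coord_forced      = 1                       # only the commit/abort record is forced
    cohort_log_writes = 2 * n_cohorts           # prepared + committed/aborted, per cohort
    cohort_forced     = 2 * n_cohorts           # both of those are forced
    messages = 2 * n_cohorts + 2 * n_cohorts    # 2 from coordinator to each cohort + 2 back
    return {"log_writes": coord_log_writes + cohort_log_writes,
            "forced_log_writes": coord_forced + cohort_forced,
            "messages": messages}

print(two_pc_cost(3))   # {'log_writes': 8, 'forced_log_writes': 7, 'messages': 12}
```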

Page 18: Distributed Transaction Processing

Read-only Transaction

• A read-only participant need only respond to phase one. It doesn't care what the decision is.
• It responds Prepared-Read-Only to Request-to-Prepare, to tell the coordinator not to send the decision.
• Limitation – all other participants must be fully terminated, since the read-only participant will release locks after voting.
  – No more testing of SQL integrity constraints
  – No more evaluation of SQL triggers

Page 19: Distributed Transaction Processing

Presumed Abort

• After a coordinator decides Abort and sends Abort to participants, it forgets about T immediately.
• Participants don't acknowledge Abort (with Done).

[Figure: presumed-abort message flow – the coordinator logs Start2PC and sends Request-to-Prepare; the participant logs prepared and replies Prepared; on an Abort decision both the coordinator and the participant log abort and forget T, and no Done is sent.]

• If a participant times out waiting for the decision, it asks the coordinator to retry.
  – If the coordinator has no info for T, it replies Abort.
  – Note: lots of savings when the transaction aborts!

Page 20: Distributed Transaction Processing

Transfer of Coordination

If there is one participant, you can save a round of messages:
1. The coordinator asks the participant to prepare and become the coordinator.
2. The participant (now coordinator) prepares, commits, and tells the former coordinator to commit.
3. The former coordinator commits and replies Done.

[Figure: the coordinator logs prepared and sends Request-to-Prepare-and-transfer-coordination; the participant logs committed and replies Commit; the former coordinator logs commit and replies Done.]

• Supported by Transarc Encina, but not in any standards.

Page 21: Distributed Transaction Processing

Reducing Blocking of 2PC

• 2PC results in blocking when the cohort is in a prepared state

• Blocked transactions hold onto resources causing increased contention.

• What can we do to reduce blocking?

Page 22: Distributed Transaction Processing

Heuristic Commit

• Suppose a participant recovers, but the termination protocol leaves T blocked.
• An operator can guess whether to commit or abort.
  – Must detect wrong guesses when the coordinator recovers
  – Must run compensations for wrong guesses
• Heuristic commit
  – If T is blocked, the local resource manager (actually, the transaction manager) guesses.
  – At coordinator recovery, the transaction managers jointly detect wrong guesses.

Page 23: Distributed Transaction Processing

Cooperative Termination Protocol (CTP)

• Assume the coordinator includes a list of participants in Request-to-Prepare.
• If a participant times out waiting for the decision, it runs the following protocol (see the sketch after this list):
  1. Participant P sends Decision-Req to the other participants.
  2. If participant Q voted No, hasn't voted, or received Abort from the coordinator, it responds Abort.
  3. If participant Q received Commit from the coordinator, it responds Commit.
  4. If participant Q is uncertain, it responds Uncertain (or doesn't respond at all).
• If all participants are uncertain, then P remains blocked.
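An illustrative sketch of the two CTP roles – responder Q answering from its local knowledge of T, and initiator P combining the replies. The state names are assumptions, not the slides' notation:

```python
# Sketch of the cooperative termination protocol above (messaging omitted).

def ctp_respond(q_state: str) -> str:
    # q_state is Q's local knowledge about T
    if q_state in ("VOTED_NO", "NOT_VOTED", "RECEIVED_ABORT"):
        return "ABORT"
    if q_state == "RECEIVED_COMMIT":
        return "COMMIT"
    return "UNCERTAIN"          # Q is prepared but has no decision either

def ctp_initiate(replies: list[str]) -> str:
    # P collects replies from the other participants after timing out
    if "ABORT" in replies:
        return "ABORT"
    if "COMMIT" in replies:
        return "COMMIT"
    return "BLOCKED"            # everyone uncertain: P remains blocked
```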

Page 24: Distributed Transaction Processing

Cooperative Termination Issues

• Participants don't know when to forget T, since other participants may require CTP.
  – Solution 1 – After receiving Done from all participants, the coordinator sends End to all participants.
  – Solution 2 – After receiving a decision, a participant may forget T at any time.
• To ensure it can run CTP, a participant should include the list of participants in its vote log record.

Page 25: Distributed Transaction Processing

Is there a non-blocking protocol?

Theorem: If communication failures or total site failures (i.e., all sites are down simultaneously) are possible, then every atomic commit protocol may cause processes to become blocked.

Two exceptions:
• If we ignore communication failures, it is possible to design such a protocol (Skeen et al. 83).
• If we impose some restrictions on transactions (i.e., what data they can read/write), such a protocol can also be designed (Mehrotra et al. 92).

Page 26: Distributed Transaction Processing

Next…

• Three-phase commit (3PC)
  – Nonblocking given a reliable network (no communication failures) and no total site failures
  – Handling communication failures

Page 27: Distributed Transaction Processing

Why does 2PC block?

• An operational site that times out in the prepared state does not know whether the failed site(s) committed or aborted the transaction.
• Polling all operational sites does not work, since all of the operational sites might be in doubt.

Page 28: Distributed Transaction Processing

Approach to Making an ACP Non-blocking

• For a given state S of a transaction T in the ACP, let the concurrency set of S be the set of states that other sites could be in.
• For example, in 2PC, the concurrency set of the PREPARED state is {PREPARED, ABORT, COMMIT}.
• To develop a non-blocking protocol, one needs to ensure that:
  – the concurrency set of a state does not contain both a commit and an abort;
  – there exists no non-committable state whose concurrency set contains a commit. (A state is committable if occupancy of the state by any site implies everyone has voted to commit the transaction.)
• Necessity of these conditions is illustrated by considering a situation with only one site operational. If either condition is violated, there will be blocking.
  – Let S be a state whose concurrency set contains both commit and abort. If the last operational node is in state S, it can neither commit nor abort, since it does not know the state of the others.
  – Let S be a non-committable state whose concurrency set includes commit. If the last operational node is in state S, it cannot abort unilaterally (since S's concurrency set contains commit). It cannot commit either, since not all sites may have voted to commit – presumably one may have voted NO.
• Sufficiency is illustrated by designing a termination protocol that terminates the protocol correctly when the above conditions hold.

Page 29: Distributed Transaction Processing

[Figure: 3PC commit-case message flow between Coordinator and Participant]

Coordinator: log start-3PC record (participant list); send REQUEST-PREPARE.
Participant: log prepared record (state W); send PREPARED.
Coordinator: send PRECOMMIT.
Participant: send ACK.
Coordinator: log commit record (state C); send COMMIT.
Participant: log committed record (state C).

Page 30: Distributed Transaction Processing

[Figure: 3PC message flow annotated with timeout actions]

Coordinator:
1. Timeout waiting for PREPARED: abort.
2. Timeout waiting for ACK: ignore.

Participant:
1. Timeout waiting for REQUEST-PREPARE: abort.
2. Timeout waiting for PRECOMMIT: run the termination protocol.
3. Timeout waiting for COMMIT: run the termination protocol.

Note: a timeout here means the corresponding cohort/coordinator failed, not that a message was lost – messages are assumed not to fail.

Page 31: Distributed Transaction Processing

Three-Phase Commit – Termination Protocol

• Choose a backup coordinator from the remaining operational sites.
• The backup coordinator sends messages to the other operational sites asking them to make a transition to its local state (or to find out that such a transition is not feasible) and waits for responses.
• Based on the responses as well as its local state, it proceeds to commit or abort the transaction.
• It commits if its concurrency set includes a commit state; else, it aborts. (A sketch of this decision rule appears below.)
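An illustrative reconstruction of the backup coordinator's decision rule, combining the bullet above with the case slides that follow; the function and state names are assumptions:

```python
# Sketch of the 3PC termination decision made by the backup coordinator.

def terminate_3pc(states: list[str]) -> str:
    # states: reported states of all operational sites, including the backup coordinator
    if "ABORTABLE" in states or "ABORTED" in states:
        return "ABORT"
    if "COMMITTED" in states:
        return "COMMIT"
    if "PRECOMMITTED" in states:
        # no one committed yet, but commit is in the concurrency set:
        # send PRECOMMIT to all, collect ACKs, then COMMIT
        return "COMMIT"
    return "ABORT"              # everyone uncertain: safe to abort
```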

Page 32: Distributed Transaction Processing

Termination Protocol

[Figure: timeline – Start 3PC → Coordinator fails → Decision reached → All sites learn the decision]

• Only operational processes participate in the termination protocol.
• Recovered processes wait until the decision is reached and then learn the decision.

Page 33: Distributed Transaction Processing

[Figure: participant states along the 3PC message flow]

• Abortable (A) – before sending PREPARED
• Uncertain (U) – after sending PREPARED, before receiving PRECOMMIT
• Precommitted (PC) – after receiving PRECOMMIT and sending ACK
• Committed (C) – after receiving COMMIT

Page 34: Distributed Transaction Processing

Termination Protocol

• Elect a new coordinator
  – Use an election protocol
• The new coordinator sends STATE-REQUEST to the participants
• Makes a decision using the termination rules
• Communicates the decision to the participants

Page 35: Distributed Transaction Processing

[Figure: termination protocol – some participant is abortable]

New coordinator → participants: STATE-REQUEST*
Some participant replies ABORTABLE → new coordinator sends ABORT*.

Page 36: Distributed Transaction Processing

[Figure: termination protocol – some participant has committed]

New coordinator → participants: STATE-REQUEST*
Some participant replies COMMITTED → new coordinator sends COMMIT*.

Page 37: Distributed Transaction Processing

[Figure: termination protocol – all participants are uncertain]

New coordinator → participants: STATE-REQUEST*
All participants reply UNCERTAIN* → new coordinator sends ABORT*.

Page 38: Distributed Transaction Processing

[Figure: termination protocol – some participant is precommitted, none committed]

New coordinator → participants: STATE-REQUEST*
Replies include PRECOMMITTED but no COMMITTED → new coordinator sends PRECOMMIT*, collects ACK*, then sends COMMIT*.

Page 39: Distributed Transaction Processing

Termination Protocol

[Figure: sample scenario – the coordinator has failed; P1, P2, and P3 are all in the uncertain (W) state.]

Page 40: Distributed Transaction Processing

Termination Protocol

[Figure: sample scenario – the coordinator has failed; P1 and P2 are in the uncertain (W) state, while P3 is precommitted (PC).]

Page 41: Distributed Transaction Processing

Note: 3PC is unsafe with communication failures!

[Figure: a network partition splits the operational sites – the sites in the uncertain (W) state run the termination protocol and abort, while the sites in the precommitted (P) state run it and commit.]

Page 42: Distributed Transaction Processing

Replication

• Replicate data at multiple sites in a distributed system
• Why?
  – Better availability
  – Better response time
  – Better throughput
• Key issues
  – How to ensure consistency
  – How to propagate updates

Page 43: Distributed Transaction Processing

Basic Technique

• Treat replicated data just as ordinary data. Ensure one-copy serializability using a 2PL + 2PC combination.
• Locking protocols
  – Reads lock all copies, writes lock all copies
  – Reads lock one copy, writes lock all copies
  – Quorum-based protocols – assign weights to each copy, ensure read and write quorums conflict (see the sketch after this list)
    • Majority voting
  – Primary copy
• Propagation
  – Eager – at the time of the operation
  – Deferred – at commit time
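A small sketch of the quorum condition behind quorum-based protocols: any read quorum must intersect any write quorum, and any two write quorums must intersect. The weights and quorum sizes here are illustrative:

```python
# Check that chosen read/write quorums conflict as required for one-copy behavior.

def quorums_valid(total_weight: int, read_quorum: int, write_quorum: int) -> bool:
    reads_see_latest_write = read_quorum + write_quorum > total_weight   # read-write conflict
    writes_conflict = 2 * write_quorum > total_weight                    # write-write conflict
    return reads_see_latest_write and writes_conflict

# Example: 5 copies of weight 1 each; majority voting uses r = w = 3
print(quorums_valid(5, 3, 3))   # True
print(quorums_valid(5, 2, 2))   # False: two writes of weight 2 could miss each other
```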

Page 44: Distributed Transaction Processing

Weaker Consistency Models

• Single-Master Replication
  – Update transactions execute at the master copy.
  – Logs are shipped from the master to the backups and used to recreate the state of the database.
  – Read-only transactions can execute over an “older” transaction-consistent state of the data.
  – If the primary fails, a backup can take over transaction processing.
• Multimaster Replication
  – Update anywhere, read anywhere.
  – Could result in an inconsistent state of the database.
  – Reconciliation in case of inconsistency.