
1

CS411 Database Systems

12: Recovery

obama and eric schmidt http://www.youtube.com/watch?v=k4RRi_ntQc8
sysadmin song http://www.youtube.com/watch?v=udhd9fmOdCs
14th century sysadmin http://www.youtube.com/watch?v=8UXAF-CUmIA

2

Bad things happen, but the DB contents must live on regardless.

System crashes are the most common problem.

We’ll worry about media failure later.

3

On restart, some transactions should be aborted, others must be durable.

[Figure: transactions T1 to T5 and the point of the crash]

T1, T2, T3 should be durable.

T4, T5 should be aborted.

Recovery has a big impact on buffer management.

Force T’s writes to disk at commit time?
– Poor response time.
– If not, how do we guarantee durability?

Steal dirty buffer pool pages from uncommitted Tns?
– If not, poor throughput.
– If so, what about atomicity?

            No Steal    Steal
Force       Trivial
No Force                Desired

If T aborts, must undo T’s writes on disk!

The log helps us guarantee atomicity and durability.

Append-only file with all info needed to REDO or UNDO every write

Give it its own disk (why?)

6

Undo Logging

(force, steal)

7

Undo logs don’t need to save after-images

Log record types:

<START T> – transaction T has begun

<COMMIT T> – T has committed

<ABORT T> – T has aborted

<T, X, old_v> – T has updated element X, and its old value was old_v

8

Undo logging has 2.5 rules.

U1: If T modifies X, then the log record <T, X, old_v> must be on disk before X is written to disk

U2: If T commits, then <COMMIT T> can’t be written to disk until all data changes by T are on disk (“early OUTPUTs”)

(There may be many pages to force, & other Tns may want them in memory.)

U2.5: Need to do the right thing when a transaction aborts (what?)

(Buffer management rule, not a logging rule.)
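The ordering that U1 and U2 impose can be sketched in a few lines. This is only an illustration under stated assumptions: `UndoLoggedStore` and all of its method names are hypothetical, and plain Python dicts and lists stand in for the data disk and the log disk.

```python
class UndoLoggedStore:
    """Toy store that obeys U1 and U2 (hypothetical names throughout)."""

    def __init__(self, disk):
        self.disk = dict(disk)   # stand-in for data pages on disk
        self.buffer = {}         # dirty values not yet output to disk
        self.log_tail = []       # log records still in memory
        self.log_disk = []       # log records already flushed

    def start(self, t):
        self.log_tail.append(("START", t))

    def write(self, t, x, new_v):
        old_v = self.buffer.get(x, self.disk.get(x))
        self.log_tail.append((t, x, old_v))   # before-image only
        self.buffer[x] = new_v

    def flush_log(self):
        self.log_disk.extend(self.log_tail)
        self.log_tail.clear()

    def output(self, x):
        # U1: the update record reaches the log disk before X reaches disk.
        self.flush_log()
        self.disk[x] = self.buffer.pop(x)

    def commit(self, t, written):
        # U2: all of T's changes are on disk before <COMMIT T> is.
        for x in written:
            if x in self.buffer:
                self.output(x)
        self.log_tail.append(("COMMIT", t))
        self.flush_log()


store = UndoLoggedStore({"A": 8})
store.start("T1")
store.write("T1", "A", 16)
store.commit("T1", ["A"])
# store.log_disk is now [("START", "T1"), ("T1", "A", 8), ("COMMIT", "T1")]
```

The commit path shows why undo logging implies force: every dirty element of T is output before the commit record is flushed.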

9

Crash recovery is easy with an undo log.

1. Scan log, decide which transactions T completed.
   <START T>….<COMMIT T>….   (completed)
   <START T>….<ABORT T>…….   (completed)
   <START T>………………………   (incomplete)

2. Starting from the end of the log, undo all modifications made by incomplete transactions.

The chance of crashing during recovery is relatively high!

But undo recovery is idempotent: just restart it if it crashes.

10

Detailed algorithm for undo log recovery

From the last entry in the log to the first:
– <COMMIT T>: mark T as completed
– <ABORT T>: mark T as completed
– <T,X,v>: if T is not completed then write X=v to disk, else ignore
– <START T>: ignore

So how should we handle ordinary Tn aborts?
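The backward scan above can be sketched as follows. The tuple encoding of log records, the function name `undo_recover`, and the sample log are assumptions made for illustration; the disk is a plain dict.

```python
def undo_recover(log, disk):
    """Scan from the last log record to the first, restoring old values
    written by transactions that have no COMMIT or ABORT record."""
    completed = set()
    for record in reversed(log):
        if record[0] in ("COMMIT", "ABORT"):
            completed.add(record[1])
        elif record[0] == "START":
            continue
        else:
            t, x, old_v = record          # update record <T, X, old_v>
            if t not in completed:
                disk[x] = old_v
    return disk


log = [
    ("START", "T1"), ("T1", "A", 8),
    ("START", "T2"), ("T2", "B", 5),
    ("COMMIT", "T1"),
    ("T2", "C", 7),
]
disk = {"A": 16, "B": 50, "C": 70}
undo_recover(log, disk)
# disk is now {"A": 16, "B": 5, "C": 7}: T1's write survives, T2's are undone
```

Running `undo_recover` a second time on the result changes nothing, which is the idempotence the slides rely on when recovery itself crashes.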

11

Undo recovery practice

…
<T6,X6,v6>
……
<START T5>
<START T4>
<T1,X1,v1>
<T5,X5,v5>
<T4,X4,v4>
<COMMIT T5>
<T3,X3,v3>
<T2,X2,v2>

Which actions do we undo, in which order?

What could go wrong if we undid them in a different order?
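One way to see why the order matters: suppose an incomplete transaction wrote the same element twice. The two-record log below is a made-up illustration, not the practice log above, and `apply_undos` is a hypothetical helper.

```python
# X went 1 -> 2 -> 3 under T1, which never committed.
log = [("T1", "X", 1), ("T1", "X", 2)]


def apply_undos(records):
    disk = {"X": 3}                  # value on disk at crash time
    for _t, x, old_v in records:
        disk[x] = old_v
    return disk


backward = apply_undos(reversed(log))   # X ends at 1, its value before T1 ran
forward = apply_undos(log)              # X ends at 2, an intermediate value
```

Undoing from the end of the log restores the oldest before-image last, so it wins; undoing in log order leaves a value that T1 itself produced.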

12

Scanning a year-long log is SLOW and businesses lose money every minute their DB is down.

Solution: checkpoint the database periodically.

Easy version:

1. Stop accepting new transactions

2. Wait until all current transactions complete

3. Flush log to disk

4. Write a <CKPT> log record, flush

5. Resume transactions

13

During undo recovery, stop at the first checkpoint you reach (scanning backward, this is the most recent one).

……
<T9,X9,v9>
……
(other transactions, all completed)
<CKPT>
<START T2>
<START T3>
<START T5>
<START T4>
<T1,X1,v1>
<T5,X5,v5>
<T4,X4,v4>
<COMMIT T5>
<T3,X3,v3>
<T2,X2,v2>

After the <CKPT>: T2, T3, T4, T5. Before it: other transactions.

14

This “quiescent checkpointing” isn’t good enough for 24/7 applications. Instead:

1. Write <START CKPT(T1,…,Tk)>, where T1,…,Tk are all active transactions

2. Continue normal operation

3. When all of T1,…,Tk have completed, write <END CKPT>

15

Example of undo recovery with nonquiescent checkpointing

…………                       (earlier transactions plus T4, T5, T6)
<START CKPT T4, T5, T6>
…………                       (T4, T5, T6, plus later transactions)
<END CKPT>
………                        (later transactions)

What would go wrong if we didn’t use <END CKPT>?

16

Crash recovery algorithm with undo log, nonquiescent checkpoints.

1. Scan log backwards until the start of the latest completed checkpoint, deciding which transactions T completed.
   <START T>….<COMMIT T>….              (completed)
   <START T>….<ABORT T>…….              (completed)
   <START CKPT {T…}>….<COMMIT T>….      (completed)
   <START CKPT {T…}>….<ABORT T>…….      (completed)
   <START T>………………………              (incomplete)

2. Starting from the end of the log, undo all modifications made by incomplete transactions.
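A minimal sketch of this algorithm, assuming checkpoints never overlap and using a made-up tuple encoding for log records (`START_CKPT`, `END_CKPT`, and the function name are hypothetical). A `START_CKPT` that has no later `END_CKPT` is skipped, so the scan stops only at the start of the latest completed checkpoint.

```python
def undo_recover_ckpt(log, disk):
    completed = set()
    seen_end_ckpt = False
    for record in reversed(log):
        kind = record[0]
        if kind == "END_CKPT":
            seen_end_ckpt = True
        elif kind == "START_CKPT":
            if seen_end_ckpt:
                break                 # start of the latest completed checkpoint
        elif kind in ("COMMIT", "ABORT"):
            completed.add(record[1])
        elif kind == "START":
            continue
        else:
            t, x, old_v = record      # update record <T, X, old_v>
            if t not in completed:
                disk[x] = old_v
    return disk


log = [
    ("START", "T1"), ("T1", "A", 1),
    ("START", "T2"), ("T2", "B", 2),
    ("START_CKPT", ("T1", "T2")),
    ("START", "T3"), ("T3", "C", 3),
    ("COMMIT", "T1"), ("COMMIT", "T2"),
    ("END_CKPT",),
    ("T3", "D", 4),
]
disk = {"A": 10, "B": 20, "C": 30, "D": 40}
undo_recover_ckpt(log, disk)
# disk is now {"A": 10, "B": 20, "C": 3, "D": 4}: only T3 is undone
```

The records before `START_CKPT` are never read, which is the whole point of the checkpoint.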

17

Redo Logging

(no force, no steal)

18

Redo log entries are just slightly different from undo log entries.

<START T>, <COMMIT T>, <ABORT T> – same as before

<T, X, new_v> – T has updated element X, and its new value is new_v

19

Redo logging has one rule.

R1: If T modifies X, then both <T, X, new_v> and <COMMIT T> must be written to disk before X is written to disk (“late OUTPUT”)

Don’t have to force all those dirty data pages to disk before committing!

Don’t steal dirty buffer pages from uncommitted Tns!

Implicit and reasonable assumption: log records reach disk in order; otherwise terrible things will happen.

20

Recovery is easy with a redo log.

1. Decide which transactions T completed.
   <START T>….<COMMIT T>….   (completed)
   <START T>….<ABORT T>…….   (completed)
   <START T>………………………   (incomplete)

2. Read log from the beginning, redo all updates of committed transactions.

The chance of crashing during recovery is relatively high!

But REDO recovery is idempotent: just restart it if it crashes.

21

Example of redo recovery

<START T1>
<T1,X1,v1>
<START T2>
<T2, X2, v2>
<START T3>
<T1,X3,v3>
<COMMIT T2>
<T3,X4,v4>
<T1,X5,v5>
……

Which actions do we redo, in which order?

What could go wrong if we redid them in a different order?
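A sketch of redo recovery applied to the visible part of the log above. The trailing "……" is not modelled, so if the elided tail held further COMMIT records, more updates would be redone. The tuple encoding, the string values "v1" to "v5", the empty starting disk, and the name `redo_recover` are assumptions for illustration.

```python
def redo_recover(log, disk):
    # First pass: find the committed transactions.
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Second pass: from the beginning, reapply their updates in log order.
    for rec in log:
        if rec[0] in ("START", "COMMIT", "ABORT"):
            continue
        t, x, new_v = rec             # update record <T, X, new_v>
        if t in committed:
            disk[x] = new_v
    return disk


log = [
    ("START", "T1"), ("T1", "X1", "v1"),
    ("START", "T2"), ("T2", "X2", "v2"),
    ("START", "T3"), ("T1", "X3", "v3"),
    ("COMMIT", "T2"),
    ("T3", "X4", "v4"), ("T1", "X5", "v5"),
]
disk = redo_recover(log, {})
# disk is {"X2": "v2"}: only T2 committed in the visible part of the log
```

Uncommitted transactions need no action at all here, because under R1 none of their writes can have reached disk.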

22

Nonquiescent checkpointing is trickier with a redo log than an undo log

1. Write a <START CKPT(T1,…,Tk)>, where T1,…,Tk are the active transactions

2. Flush to disk all dirty data pages of transactions committed by the time the checkpoint started, while continuing normal operation

3. After that, write <END CKPT>

dirty = written

23

What exactly does “dirty” mean?

• When you are talking about buffer management and which buffers you can steal, a dirty page is a data page in memory that has been modified but not yet sent back to disk.

• When you are talking about concurrency control, a dirty page is a data page in memory that has been modified but not yet committed. A dirty read is a read of a dirty page.

Either way, the dirty pages are the ones that can get you in trouble.

24

Example of redo recovery with nonquiescent checkpointing

… <START T1> … <COMMIT T1> …… <START CKPT T4, T5, T6> …… <END CKPT> …… <START CKPT T9, T10> …

1. Look for the last <END CKPT>.

2. Redo from <START T>, for committed T in {T4, T5, T6}.

3. Normal redo for committed Tns that started after the matching <START CKPT T4, T5, T6>.

All data written by T1 is known to be on disk.

25

But neither undo nor redo logging matches what we would like to have for buffer management

            No Steal        Steal
Force       Trivial         Undo Logging
No Force    Redo Logging    Desired

Use undo/redo logging to attain this nirvana (the Desired cell).

26

Redo/undo logs save both before-images and after-images.

<START T>, <COMMIT T>, <ABORT T>

<T, X, old_v, new_v> – T has written element X; its old value was old_v, and its new value is new_v

Undo/redo recovery has 1.5 rules.

1. Must force the log record for an update to disk before the corresponding data page goes to disk (“write-ahead logging”).

Item X can be updated on disk once <T wrote X> is on disk, even before <T commits> is on disk (i.e., early or late OUTPUT).

1.5: Need to do the right thing when a transaction aborts (what?)

As usual, T committed iff <T commits> is on disk.

28

Recovery is more complex with undo/redo logging.

1. Redo all committed transactions, starting at the beginning of the log

2. Undo all incomplete transactions, starting from the end of the log

<START T1>
<T1,X1,v1>
<START T2>
<T2, X2, v2>
<START T3>
<T1,X3,v3>
<COMMIT T2>
<T3,X4,v4>
<T1,X5,v5>
……

REDO runs forward through the log; UNDO runs backward.

“incomplete” = started & not committed or aborted

How do we know these undos won’t undo some committed writes?
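The two passes can be sketched as below, assuming four-field update records <T, X, old_v, new_v> as defined for undo/redo logs. The function name, the tuple encoding, and the sample log are hypothetical. The sketch also assumes no committed transaction wrote an element after an incomplete one touched it, which is the situation the question above asks about.

```python
def undo_redo_recover(log, disk):
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    aborted = {r[1] for r in log if r[0] == "ABORT"}
    updates = [r for r in log if r[0] not in ("START", "COMMIT", "ABORT")]

    # 1. Redo all committed transactions, from the beginning of the log.
    for t, x, old_v, new_v in updates:
        if t in committed:
            disk[x] = new_v

    # 2. Undo all incomplete transactions, from the end of the log.
    for t, x, old_v, new_v in reversed(updates):
        if t not in committed and t not in aborted:
            disk[x] = old_v
    return disk


log = [
    ("START", "T1"), ("T1", "A", 1, 2),
    ("START", "T2"), ("T2", "B", 3, 4),
    ("COMMIT", "T1"),
    ("T2", "C", 5, 6),
]
# At the crash: A was never output (no force), B's dirty page was stolen.
disk = undo_redo_recover(log, {"A": 1, "B": 4, "C": 5})
# disk is {"A": 2, "B": 3, "C": 5}
```

Both halves are needed: the redo pass repairs the committed write that never reached disk, and the undo pass removes the uncommitted write that did.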

29

Algorithm for non-quiescent checkpoint for undo/redo

1. Write <start checkpoint, list of all active transactions> to log

2. Flush log to disk

3. Write to disk all dirty buffers, whether or not their transaction has committed (this implies some log records may need to be written to disk (WAL))

4. Write <end checkpoint> to log

5. Flush log to disk

[Figure: log containing … <start checkpoint, active Tns are T1, T2, …> … <end checkpoint> …, with dirty buffer pool pages flushed between the two records, and the active Tns and the UNDO direction marked]

Pointers are one of many tricks to speed up future undos.

30

Algorithm for undo/redo recovery with nonquiescent checkpoint

1. Backwards undo pass (end of log to start of last completed checkpoint)
   a. C = transactions that committed after the checkpoint started
   b. Undo actions of transactions that (are in A or started after the checkpoint started) and (are not in C)

2. Undo remaining actions by incomplete transactions
   a. Follow undo chains for transactions in (checkpoint active list) – C

3. Forward pass (start of last completed checkpoint to end of log)
   a. Redo actions of transactions in C

[Figure: log with active Tns … <start checkpoint, A=active Tns> … <end checkpoint> …, with the REDO range marked]

31

Examples: what to do at recovery time?

…T1 wrote A, ……
checkpoint start (T1 active)
…T1 wrote B, ……
checkpoint end
…T1 wrote C, ……
(no <T1 commit>)

Undo T1 (undo A, B, C)

32

Examples: what to do at recovery time?

…T1 wrote A, ……
checkpoint start (T1 active)
…T1 wrote B, ……
checkpoint end
…T1 wrote C, ……
T1 commit

Redo T1 (redo B, C)
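Both examples can be checked with a small sketch that lists which writes to undo and which to redo. The record encoding and the name `recovery_actions` are hypothetical; the sketch assumes the log contains one completed checkpoint, ignores aborts, and scans the whole log where a real system would follow undo chains.

```python
def recovery_actions(log):
    """Return (undo, redo): elements to undo in backward order,
    elements to redo in forward order."""
    end_idx = max(i for i, r in enumerate(log) if r[0] == "END_CKPT")
    start_idx = max(
        i for i, r in enumerate(log[:end_idx]) if r[0] == "START_CKPT"
    )
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    # Undo every write of an uncommitted transaction, newest first.
    undo = [r[2] for r in reversed(log)
            if r[0] == "WRITE" and r[1] not in committed]
    # Redo committed writes only from the checkpoint start onward:
    # earlier dirty pages were flushed by the checkpoint.
    redo = [r[2] for r in log[start_idx:]
            if r[0] == "WRITE" and r[1] in committed]
    return undo, redo


no_commit = [
    ("WRITE", "T1", "A"),
    ("START_CKPT", ("T1",)),
    ("WRITE", "T1", "B"),
    ("END_CKPT",),
    ("WRITE", "T1", "C"),
]
with_commit = no_commit + [("COMMIT", "T1")]

recovery_actions(no_commit)     # (["C", "B", "A"], [])
recovery_actions(with_commit)   # ([], ["B", "C"])
```

The write to A is undone when T1 is incomplete but not redone when T1 committed, because the checkpoint already forced A's page to disk.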

33

Real world actions

E.g., dispense cash at ATM

Ti = a1…... aj …... an

$

“Solution”:

(1) try to make idempotent

(2) execute real-world actions after commit

Why are these a problem from a DB perspective?
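The two-part “solution” can be illustrated with a rough sketch. Everything here is hypothetical: the `Dispenser` class, the request IDs, and the assumption that the device itself remembers which requests it has already served.

```python
class Dispenser:
    """Hypothetical ATM cash dispenser that ignores repeated request IDs."""

    def __init__(self):
        self.done = set()
        self.dispensed = 0

    def dispense(self, request_id, amount):
        if request_id in self.done:   # (1) idempotent: a repeat is a no-op
            return
        self.done.add(request_id)
        self.dispensed += amount


def run_real_world_actions(dispenser, committed, pending_actions):
    # (2) real-world actions are executed only after commit
    if not committed:
        return
    for request_id, amount in pending_actions:
        dispenser.dispense(request_id, amount)


atm = Dispenser()
actions = [("req-1", 100)]
run_real_world_actions(atm, True, actions)
run_real_world_actions(atm, True, actions)   # e.g. repeated after a crash
# atm.dispensed is 100, not 200
```

Deferring the action past commit means an abort never has to "un-dispense" cash, and idempotence makes it safe for recovery to repeat the action if it cannot tell whether it ran.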

34

PHYSICAL DISASTERS

35

These recovery algorithms won’t help you if your disk fails.

Solution: careful replication!

36

Example 1: Triple modular redundancy

Keep 3 copies on separate disks
• Output(X) --> three outputs
• Input(X) --> three inputs + vote

[Figure: Copy 1, Copy 2, Copy 3]
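A small sketch of the three-way vote, with dicts standing in for the three disks. The function names `tmr_output` and `tmr_input` are made up for this illustration.

```python
from collections import Counter


def tmr_output(copies, x, value):
    # Output(X): write the value to all three copies.
    for copy in copies:
        copy[x] = value


def tmr_input(copies, x):
    # Input(X): read all three copies and take the majority value.
    values = [copy[x] for copy in copies]
    value, votes = Counter(values).most_common(1)[0]
    if votes < 2:
        raise ValueError("no majority among the three copies")
    return value


copies = [{}, {}, {}]
tmr_output(copies, "X", 42)
copies[1]["X"] = 99            # one copy silently goes bad
tmr_input(copies, "X")         # returns 42
```

The vote tolerates one bad copy without needing to know which copy is bad.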

37

Example 2: Redundant writes, single reads

Keep N copies on separate disks
• Output(X) --> N outputs
• Input(X) --> Input one copy
  - if ok, done
  - else try another one

Assumes bad data can be detected (traditional but false)

[Figure: row of disk copies]
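A sketch of this scheme, in which "ok" is decided by a CRC32 checksum stored next to each value. The checksum is an assumption added for illustration; the slides only say that bad data is assumed to be detectable. The function names are hypothetical, and dicts stand in for the disks.

```python
import zlib


def rw_output(copies, x, value: bytes):
    # Output(X): write the value and its checksum to every copy.
    for copy in copies:
        copy[x] = (value, zlib.crc32(value))


def rw_input(copies, x):
    # Input(X): read one copy; if it checks out, done, else try another.
    for copy in copies:
        value, checksum = copy[x]
        if zlib.crc32(value) == checksum:
            return value
    raise OSError("all copies failed the check")


copies = [{}, {}]
rw_output(copies, "X", b"hello")
copies[0]["X"] = (b"hellp", copies[0]["X"][1])   # corrupt the first copy
rw_input(copies, "X")                            # returns b"hello"
```

Reads touch a single disk in the common case, which is the advantage over the three-way vote; the cost is the reliance on detection.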

38

Example 3: DB dump + log

[Figure: backup database, active database, log]

If active database is lost,
– restore active database from backup
– bring up-to-date using redo entries in log

39

When can log be discarded?

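[Figure: the log drawn along a time axis, with three marked points: the last needed undo, the db dump, and the checkpoint. Log before the db dump is not needed for media recovery; log before the last needed undo is not needed for undo after system failure; log before the checkpoint is not needed for redo after system failure.]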

The real picture: what’s stored where

DB: data pages, each with a pageLSN (LSN of last write to that data page); master record

RAM: Xact Table (lastLSN, status); Dirty Page Table (recLSN); flushedLSN

LOG: LogRecords, each with an LSN (log sequence number) and the fields prevLSN, XID, type, length, pageID, offset, before-image, after-image
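The pageLSN and flushedLSN fields are what make the write-ahead rule cheap to enforce: a data page may go to disk only once the log is flushed at least up to that page's pageLSN. A minimal sketch, in which the `Log` class, `write_page_to_disk`, and the dict-based page layout are all hypothetical:

```python
class Log:
    def __init__(self):
        self.records = []       # position + 1 serves as the LSN
        self.flushed_lsn = 0    # highest LSN known to be on disk

    def append(self, record):
        self.records.append(record)
        return len(self.records)          # LSN of the new record

    def flush(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


def write_page_to_disk(page, log, disk):
    # WAL: force the log up to pageLSN before the page itself goes out.
    if page["pageLSN"] > log.flushed_lsn:
        log.flush(page["pageLSN"])
    disk[page["id"]] = dict(page)


log = Log()
lsn = log.append(("T1", "P7", "old", "new"))
page = {"id": "P7", "pageLSN": lsn, "data": "new"}
disk = {}
write_page_to_disk(page, log, disk)
# log.flushed_lsn is now at least disk["P7"]["pageLSN"]
```

A single integer comparison per page write replaces any search through the log for that page's records.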

Summary of Logging/Recovery

• Recovery manager guarantees atomicity & durability, two of the ACID properties.

• Redo logging and undo logging are simple but make the system too slow in practice for serious applications.

• Use write-ahead logging with undo/redo logging to speed up the system (by allowing STEAL/NO-FORCE) without sacrificing correctness.

Summary, Cont.

• Checkpointing: A quick way to limit the amount of log to scan on recovery. Nonquiescent checkpoints are especially useful.

• Recovery works in 3 phases:
  – Analysis: Forward from checkpoint.
  – Redo: Forward.
  – Undo: Backward.
