cse544 transactions: recovery

71
1 CSE544 Transactions: Recovery Thursday, January 27, 2011 Dan Suciu -- 544, Winter 2011

Upload: kelvin

Post on 22-Feb-2016

36 views

Category:

Documents


0 download

DESCRIPTION

CSE544 Transactions: Recovery. Thursday, January 27, 2011. DB. Buffer Management in a DBMS. Application (Database server). READ WRITE. BUFFER POOL. disk page. free frame. INPUT OUTUPT. Large gap between disk I/O and memory  Buffer pool. Page Replacement Policies. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSE544 Transactions:  Recovery

1

CSE544Transactions: Recovery

Thursday, January 27, 2011

Dan Suciu -- 544, Winter 2011

Page 2: CSE544 Transactions:  Recovery

2

Buffer Management in a DBMS

DB

disk page

free frame

BUFFER POOLREADWRITE

INPUTOUTUPT

Application(Database server)

Large gap between disk I/O and memory Buffer pool

Page 3: CSE544 Transactions:  Recovery

Page Replacement Policies

• LRU = expensive– Next slide

• Clock algorithm = cheaper alternative– Read in the book

Both work well in OS, but not always in DB

Dan Suciu -- 544, Winter 2011 3

Page 4: CSE544 Transactions:  Recovery

Least Recently Used (LRU)

Dan Suciu -- 544, Winter 2011 4

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

??

Most recent Least recent

Page 5: CSE544 Transactions:  Recovery

Least Recently Used (LRU)

Dan Suciu -- 544, Winter 2011 5

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

P6, P5, P2, P8, P4, P1, P9, P3, P7

Most recent Least recent

Page 6: CSE544 Transactions:  Recovery

Least Recently Used (LRU)

Dan Suciu -- 544, Winter 2011 6

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

P6, P5, P2, P8, P4, P1, P9, P3, P7

Read(P10)

??

Most recent Least recent

Page 7: CSE544 Transactions:  Recovery

Least Recently Used (LRU)

Dan Suciu -- 544, Winter 2011 7

P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)

P6, P5, P2, P8, P4, P1, P9, P3, P7

Input(P10)

P10, P6, P5, P2, P8, P4, P1, P9, P3

Read(P10)

Most recent Least recent

Page 8: CSE544 Transactions:  Recovery

8

Atomic Transactions

• FORCE or NO-FORCE– Should all updates of a transaction be forced to

disk before the transaction commits?• STEAL or NO-STEAL

– Can an update made by an uncommitted transaction overwrite the most recent committed value of a data item on disk?

Dan Suciu -- 544, Winter 2011 Performance (e.g. LRU): NO-FORCE+STEAL

Atomicity: FORCE + NO-STEAL

Page 9: CSE544 Transactions:  Recovery

9

Atomic Transactions

• NO-FORCE: Pages of committed transactions not yet written to disk

• STEAL: Pages of uncommitted transactions already written to disk

Dan Suciu -- 544, Winter 2011 In either case, Atomicity is violated

Page 10: CSE544 Transactions:  Recovery

10

Notations• READ(X,t)

– copy element X to transaction local variable t• WRITE(X,t)

– copy transaction local variable t to element X

• INPUT(X)– read element X to memory buffer

• OUTPUT(X)– write element X to disk

Dan Suciu -- 544, Winter 2011

Page 11: CSE544 Transactions:  Recovery

11

Recovery

Write-ahead log = A file that records every single action of all running transactions

• Force log entry to disk

• After a crash, transaction manager reads the log and finds out exactly what the transactions did or did not

Dan Suciu -- 544, Winter 2011

Page 12: CSE544 Transactions:  Recovery

12

Example

Atomicity:Both A and Bare multiplied by 2,or none is.

Dan Suciu -- 544, Winter 2011

START TRANSACTION

READ(A,t); t := t*2;WRITE(A,t); READ(B,t); t := t*2;WRITE(B,t)COMMIT;

Page 13: CSE544 Transactions:  Recovery

13

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Buffer pool DiskTransaction

READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t)

Page 14: CSE544 Transactions:  Recovery

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Page 15: CSE544 Transactions:  Recovery

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Yes it’s bad: A=16, B=8….

Page 16: CSE544 Transactions:  Recovery

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Page 17: CSE544 Transactions:  Recovery

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Yes it’s bad: A=B=16, but not committed

Page 18: CSE544 Transactions:  Recovery

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

Page 19: CSE544 Transactions:  Recovery

Is this bad ?

Crash !

Action t Mem A Mem B Disk A Disk B

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT

No: that’s OK

Page 20: CSE544 Transactions:  Recovery

20

UNDO Log

Dan Suciu -- 544, Winter 2011

Page 21: CSE544 Transactions:  Recovery

21

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

Page 22: CSE544 Transactions:  Recovery

22

Crash !

WHAT DO WE DO ?

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

Page 23: CSE544 Transactions:  Recovery

23

Crash !

WHAT DO WE DO ?

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

We UNDO by setting B=8 and A=8

Page 24: CSE544 Transactions:  Recovery

24Crash !

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

What do we do now ?

Page 25: CSE544 Transactions:  Recovery

25Crash !

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

What do we do now ? Nothing: log contains COMMIT

Page 26: CSE544 Transactions:  Recovery

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

When mustwe force pagesto disk ?

Page 27: CSE544 Transactions:  Recovery

27

Action t Mem A Mem B Disk A Disk B UNDO Log

<START T>

INPUT(A) 8 8 8

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,8>

INPUT(B) 16 16 8 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,8>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

COMMIT <COMMIT T>

RULES: log entry before OUTPUT before COMMIT

Page 28: CSE544 Transactions:  Recovery

28

REDO Log

Dan Suciu -- 544, Winter 2011

Page 29: CSE544 Transactions:  Recovery

29

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

Is this bad ?

Page 30: CSE544 Transactions:  Recovery

30

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

Is this bad ? Yes, it’s bad: A=16, B=8

Page 31: CSE544 Transactions:  Recovery

31

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

Crash !

Is this bad ?

Page 32: CSE544 Transactions:  Recovery

32

Action t Mem A Mem B Disk A Disk B

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8

COMMIT

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

Crash !

Is this bad ? Yes, it’s bad: T committed but A=B=8

Page 33: CSE544 Transactions:  Recovery

33

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

Page 34: CSE544 Transactions:  Recovery

34

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

How do we recover ?

Page 35: CSE544 Transactions:  Recovery

35

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16Crash !

How do we recover ? We REDO by setting A=16 and B=16

Page 36: CSE544 Transactions:  Recovery

36

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

When mustwe force pagesto disk ?

Page 37: CSE544 Transactions:  Recovery

37

Action t Mem A Mem B Disk A Disk B REDO Log

<START T>

READ(A,t) 8 8 8 8

t:=t*2 16 8 8 8

WRITE(A,t) 16 16 8 8 <T,A,16>

READ(B,t) 8 16 8 8 8

t:=t*2 16 16 8 8 8

WRITE(B,t) 16 16 16 8 8 <T,B,16>

COMMIT <COMMIT T>

OUTPUT(A) 16 16 16 16 8

OUTPUT(B) 16 16 16 16 16

RULE: OUTPUT after COMMIT

Page 38: CSE544 Transactions:  Recovery

38

Comparison Undo/Redo

• Undo logging: OUTPUT must be done early: – Inefficient

• Redo logging: OUTPUT must be done late: – Inflexible

Dan Suciu -- 544, Winter 2011

Page 39: CSE544 Transactions:  Recovery

39

Checkpointing

• To ensure recovery we must read from the beginning of the log: this is too inefficient

• Checkpointing: periodically write information to the log to allow us to process only the tail of the log

Dan Suciu -- 544, Winter 2011

Page 40: CSE544 Transactions:  Recovery

40

ARIES Recovery Manager

• A redo/undo log• Physiological logging

– Physical logging for REDO– Logical logging for UNDO

• Efficient checkpointing

Dan Suciu -- 544, Winter 2011

Why ?

Page 41: CSE544 Transactions:  Recovery

41

ARIES Recovery Manager

Log entryies:• <START T> -- when T begins• Update: <T,X,u,v>

– T updates X, old value=u, new value=v– In practice: undo only and redo only entries

• <COMMIT T> or <ABORT T>• CLR’s – we’ll talk about them later.

Dan Suciu -- 544, Winter 2011

Page 42: CSE544 Transactions:  Recovery

42

ARIES Recovery Manager

Rule:• If T modifies X, then <T,X,u,v> must be

written to disk before OUTPUT(X)

We are free to OUTPUT early or late

Dan Suciu -- 544, Winter 2011

Page 43: CSE544 Transactions:  Recovery

43

LSN = Log Sequence Number• LSN = identifier of a log entry

– Log entries belonging to the same txn are linked

• Each page contains a pageLSN:– LSN of log record for latest update to that page– Will serve to determine if an update needs to be

redone

Dan Suciu -- 544, Winter 2011

Page 44: CSE544 Transactions:  Recovery

44

ARIES Data Structures• Active Transactions Table

– Lists all running txns (active txns)– For each txn: lastLSN = most recent update by txn

• Dirty Page Table– Lists all dirty pages– For each dirty page: recoveryLSN (recLSN)= first LSN

that caused page to become dirty• Write Ahead Log

– LSN, prevLSN = previous LSN for same txn

Dan Suciu -- 544, Winter 2011

Page 45: CSE544 Transactions:  Recovery

ARIES Data Structures

pageID recLSNP5 102

P6 103

P7 101

LSN prevLSN transID pageID Log entry101 - T100 P7

102 - T200 P5

103 102 T200 P6

104 101 T100 P5

Dirty pages Log (WAL)

transID lastLSNT100 104

T200 103

Active transactionsP8 P2 . . .

. . .

P5PageLSN=104

P6PageLSN=103

P7PageLSN=101

Buffer Pool

WT100(P7)WT200(P5)WT200(P6)WT100(P5)

Page 46: CSE544 Transactions:  Recovery

46

ARIES Normal Operation

T writes page P• What do we do ?

Dan Suciu -- 544, Winter 2011

Page 47: CSE544 Transactions:  Recovery

47

ARIES Normal Operation

T writes page P• What do we do ?

• Write <T,P,u,v> in the Log• pageLSN=LSN• lastLSN=LSN• recLSN=if isNull then LSN

Dan Suciu -- 544, Winter 2011

Page 48: CSE544 Transactions:  Recovery

48

ARIES Normal Operation

Buffer manager wants to OUTPUT(P)• What do we do ?

Buffer manager wants INPUT(P)• What do we do ?

Dan Suciu -- 544, Winter 2011

Page 49: CSE544 Transactions:  Recovery

49

ARIES Normal Operation

Buffer manager wants to OUTPUT(P)• Flush log up to pageLSN• Remove P from Dirty Pages tableBuffer manager wants INPUT(P)• Create entry in Dirty Pages table

recLSN = NULL

Dan Suciu -- 544, Winter 2011

Page 50: CSE544 Transactions:  Recovery

50

ARIES Normal Operation

Transaction T starts• What do we do ?

Transaction T commits/aborts• What do we do ?

Dan Suciu -- 544, Winter 2011

Page 51: CSE544 Transactions:  Recovery

51

ARIES Normal Operation

Transaction T starts• Write <START T> in the log• New entry T in Active TXN;

lastLSN = nullTransaction T commits/aborts• Write <COMMIT T>• Flush log up to this entry

Dan Suciu -- 544, Winter 2011

Page 52: CSE544 Transactions:  Recovery

52

Checkpoints

Write into the log

• Entire active transactions table• Entire dirty pages table

Dan Suciu -- 544, Winter 2011

Recovery always starts by analyzing latest checkpoint

Background process periodically flushes dirty pages to disk

Page 53: CSE544 Transactions:  Recovery

53

ARIES Recovery1. Analysis pass

– Figure out what was going on at time of crash– List of dirty pages and active transactions

2. Redo pass (repeating history principle)– Redo all operations, even for transactions that will not commit– Get back to state at the moment of the crash

3. Undo pass– Remove effects of all uncommitted transactions– Log changes during undo in case of another crash during undo

Dan Suciu -- 544, Winter 2011

Page 54: CSE544 Transactions:  Recovery

54

ARIES Method Illustration

[Figure 3 from Franklin97]Dan Suciu -- 544, Winter 2011

First undo and first redo log entry might bein reverse order

Page 55: CSE544 Transactions:  Recovery

55

1. Analysis Phase• Goal

– Determine point in log where to start REDO– Determine set of dirty pages when crashed

• Conservative estimate of dirty pages– Identify active transactions when crashed

• Approach– Rebuild active transactions table and dirty pages table– Reprocess the log from the checkpoint

• Only update the two data structures– Compute: firstLSN = smallest of all recoveryLSN

Dan Suciu -- 544, Winter 2011

Page 56: CSE544 Transactions:  Recovery

1. Analysis Phase(crash)Checkpoint

Dirtypages

Activetxn

Log

pageID recLSN pageID

transID lastLSN transID

firstLSN= ??? Where do we startthe REDO phase ?

Page 57: CSE544 Transactions:  Recovery

1. Analysis Phase(crash)Checkpoint

Dirtypages

Activetxn

Log

pageID recLSN pageID

transID lastLSN transID

firstLSN=min(recLSN)

Page 58: CSE544 Transactions:  Recovery

1. Analysis Phase(crash)Checkpoint

Dirtypages

Activetxn

Log

pageID recLSN pageID

transID lastLSN transID

pageID recLSN pageID

transID lastLSN transID

Replayhistory

firstLSN

Page 59: CSE544 Transactions:  Recovery

59

2. Redo Phase

Main principle: replay history• Process Log forward, starting from

firstLSN• Read every log record, sequentially• Redo actions are not recorded in the log• Needs the Dirty Page Table

Dan Suciu -- 544, Winter 2011

Page 60: CSE544 Transactions:  Recovery

60

2. Redo Phase: Details

For each Log entry record LSN: <T,P,u,v>• Re-do the action P=u and WRITE(P)• But which actions can we skip, for

efficiency ?

Dan Suciu -- 544, Winter 2011

Page 61: CSE544 Transactions:  Recovery

61

2. Redo Phase: Details

For each Log entry record LSN: <T,P,u,v>• If P is not in Dirty Page then no update• If recLSN > LSN, then no update• INPUT(P) (read page from disk):

If pageLSN > LSN, then no update• Otherwise perform update

Dan Suciu -- 544, Winter 2011

Page 62: CSE544 Transactions:  Recovery

62

2. Redo Phase: Details

What happens if system crashes during REDO ?

Dan Suciu -- 544, Winter 2011

Page 63: CSE544 Transactions:  Recovery

63

2. Redo Phase: Details

What happens if system crashes during REDO ?

We REDO again ! Each REDO operation is idempotent: doing it twice is the as as doing it once.

Dan Suciu -- 544, Winter 2011

Page 64: CSE544 Transactions:  Recovery

64

3. Undo Phase

Main principle: “logical” undo• Start from end of Log, move backwards• Read only affected log entries• Undo actions are written in the Log as special

entries: CLR (Compensating Log Records)• CLRs are redone, but never undone

Dan Suciu -- 544, Winter 2011

Page 65: CSE544 Transactions:  Recovery

65

3. Undo Phase: Details• “Loser transactions” = uncommitted

transactions in Active Transactions Table

• ToUndo = set of lastLSN of loser transactions

Dan Suciu -- 544, Winter 2011

Page 66: CSE544 Transactions:  Recovery

66

3. Undo Phase: Details

While ToUndo not empty:• Choose most recent (largest) LSN in ToUndo• If LSN = regular record <T,P,u,v>:

– Undo v– Write a CLR where CLR.undoNextLSN = LSN.prevLSN

• If LSN = CLR record:– Don’t undo !

• if CLR.undoNextLSN not null, insert in ToUndootherwise, write <END TRANSACTION> in log

Dan Suciu -- 544, Winter 2011

Page 67: CSE544 Transactions:  Recovery

67

3. Undo Phase: Details

[Figure 4 from Franklin97]

Dan Suciu -- 544, Winter 2011

Page 68: CSE544 Transactions:  Recovery

68

3. Undo Phase: Details

What happens if system crashes during UNDO ?

Dan Suciu -- 544, Winter 2011

Page 69: CSE544 Transactions:  Recovery

69

3. Undo Phase: Details

What happens if system crashes during UNDO ?

We do not UNDO again ! Instead, each CLR is a REDO record: we simply redo the undo

Dan Suciu -- 544, Winter 2011

Page 70: CSE544 Transactions:  Recovery

70

Physical v.s. Logical Loging

Why are redo records physical ?

Why are undo records logical ?

Dan Suciu -- 544, Winter 2011

Page 71: CSE544 Transactions:  Recovery

71

Physical v.s. Logical Loging

Why are redo records physical ?• Simplicity: replaying history is easy, and

idempotent

Why are undo records logical ?• Required for transaction rollback: this not

“undoing history”, but selective undo

Dan Suciu -- 544, Winter 2011