concurrent algorithms memory · non-volatile. obstacle #2: (re-)ordering 49 processor caches...

70
Concurrent Algorithms & Memory Concurrent Algorithms Fall 2018 Igor Zablotchi [Some slides courtesy of Tudor David]

Upload: others

Post on 18-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent Algorithms&

MemoryConcurrent Algorithms

Fall 2018Igor Zablotchi

[Some slides courtesy of Tudor David]

Page 2: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Introduction• This lecture is about memory in how it relates to

concurrent computing• So far, we have assumed that memory is:• Infinite• Volatile

• These assumptions need not be true:• Infinite -> Finite -> Memory reclamation• Volatile -> Persistent

• Both topics of ongoing research (my thesis)

2

Page 3: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent Data Structures

3

Lists

Trees

Hash tables

Skip lists

Page 4: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Part 1Concurrent Memory Reclamation

4

Page 5: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

What is Memory Reclamation (MR)?

• Applications need memory• Most realistic applications grow and shrink in

memory• Grow = allocate memory• Shrink = free no-longer-useful memory

5

Page 6: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

What is Memory Reclamation (MR)?

6

ds = new_data_structure(…);node n = new_node(…);insert(ds, n);// use n in some wayremove(ds,n);

Need to free n!

Page 7: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Freeing Memory is Necessary

• Otherwise, applications might run out of memory or use too much memory

7

Page 8: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Automatic Garbage Collection• Some languages (e.g., Java) have automatic

memory management• Memory is allocated & freed without explicit

programmer intervention• Garbage collector decides automatically when a

pointer should be freed

8

Page 9: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Explicit Memory Management• Other languages (e.g., C, C++) require the

programmer to allocate & free memory explicitly• Programmer needs to determine when to free

some memory location• This is our focus for this class

9

Page 10: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

1-process MR is Easy• Allocate some memory• Use it• Free after last use

10

Page 11: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

1-process MR is Easy

O1

Process P2

O2 O3 ……

Use O1Remove O1

Free O1

11

Page 12: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent MR is Difficult• No easy way for a process to determine if a

memory location will be used later by a different process

12

Page 13: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent MR is Difficult

O1

Process P1 Process P2

O2 O3 ……

Use O1Remove O1

13

Page 14: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent MR is Difficult

About to read O1

O1

Process P1 Process P2

O2 O3 ……

14

Page 15: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent MR is Difficult

O1

Process P1 Process P2

O2 O3 ……

Free O1 ?About to read O1

15

Page 16: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent MR is Difficult

O1

Process P1 Process P2

O2 O3 ……

Free O1 !About to read O1

16

Page 17: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Concurrent MR is Difficult

O1

Process P1 Process P2

O2 O3 ……

Error!

17

Page 18: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Take-away So Far• Memory reclamation = deciding when to free

memory• Necessary:• Most applications need to allocate + free• C, C++ are here to stay• No MR → excessive memory use

• Challenging (concurrent case):• Need a way to determine when all processes are done

with some memory location

18

Page 19: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

A Few MR Techniques

• Lock-free Reference Counting

• Hazard Pointers

• Epoch-Based Reclamation

19

Page 20: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lock-free Reference Counting• Main idea: • For each memory location, keep track of how many

references are held to it.• When there are 0 references, safe to reclaim.

20

Page 21: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

LFRC Example

O1 O2 O3 ……

1 1 1Reference count

A linked list. No process has references. Each node has reference count = 1 (the reference from the

previous node in the list).

21

Page 22: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

LFRC Example

O1

Process P2

O2 O3 ……

2 1 1

A thread is reading. The node that the thread is currently looking at has reference count = 2. 22

Page 23: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

LFRC Example

O1

Process P2

O2 O3 ……

1 2 1

A thread is reading. The node that the thread is currently looking at has reference count = 2. 23

Page 24: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

LFRC Example

O1

Process P2

O2 O3 ……

1 1 2

A thread is reading. The node that the thread is currently looking at has reference count = 2. 24

Page 25: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

LFRC Example

O1 O2

O3

……

1 1

1

A thread has removed node O3 from the list. O3 now has reference count = 1 (the reference from the thread). 25

Process P2

Page 26: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

LFRC Example

O1 O2

O3

……

1 1

0

The thread has released its reference to O3. O3 now has 0 references. Its memory can be freed. 26

Page 27: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Pros and cons of LFRC✓ Lock-free (wait-free version exists)✓ Easy to understand & implement

✘ Need to update reference counter on every access, even if read-only → bad performance

✘ Update of reference counter requires expensive atomic instructions → extremely bad performance!

27

Page 28: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Hazard Pointers (HP)• Main idea: • Each process announces memory locations it plans to

access: hazard pointers• Processes only free memory that is not protected by

hazard pointers

28

Page 29: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Hazard Pointers (HP)

29

O1

Process P1 Process P2

O2 O3 ……

Page 30: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Hazard Pointers (HP)

30

O1

Process P1 Process P2

O2 O3 ……

HP

Don’t free O1,I’m about to

use it.

Page 31: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Hazard Pointers (HP)

31

HP

O1

Process P1 Process P2

O2 O3 ……Don’t free O1,I’m about to

use it. I’d better not free O1, T1 is

using it.

Page 32: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

HP – More Details0. Reachability• Reachable node = can be found by following pointers

from data structure root(s)

32

O1 O3

O2

Before inserting → O2 not yet reachable

O1 O3O2

In the data structure ⬄O2 reachable

O1 O3

O2

After deletion→ O2 no longer reachable

Page 33: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

HP – More Details1. Announcing hazard pointers

33

Without hazard pointers With hazard pointers

1. Read a reference p2. Do something with p3. (Release reference to p)

1. Read a reference p2. HP = p // protect p3. Check if p is still

reachable. If yes, continue, otherwise restart operation.

4. Do something with p5. (Release reference to p)

Page 34: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

HP – More Details2. Deleting elements

• Each process has a “limbo list” containing nodes that have been deleted but not yet freed• After process pideletes a node n from the data

structure, it adds n to pi’s limbo list

34

Page 35: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

HP – More Details3. Reclaiming memory

• When the limbo list grows to a certain size R, pi initiates a scan:• For each node n in the limbo list:

• Look at HPs of all processes. Is any of them pointing to n?• If not, free n’s memory• (If yes, do nothing)

35

Page 36: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Pros and Cons of HP

✓ Limits memory use✓ Lock-free

✘ Need to update HP on every access, even if read-only → bad performance

✘ Complex to implement & use → prone to errors

36

Page 37: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Epoch-based Reclamation (EBR)• Main idea:• Processes keep track of each other’s progress• After deleting an object, when all processes have made

enough progress, memory can be freed

37

Page 38: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 1: processes declare when they enter & exit

critical sections

38

// codeenter_critical_section();// more codeexit_critical_section();// even more code

Here, we may access “dangerous” memory

(memory that can be freed)

Here, only safe memory accesses are allowed

(memory that is never freed)

Page 39: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 2: each process has an epoch (an integer,

initially 0). The epoch is incremented by 1 when entering and exiting a critical section.

→ epoch is odd if inside critical section and even otherwise39

// codeenter_critical_section();// more codeexit_critical_section();// even more code

epoch = 0

epoch = 1

epoch = 2

Page 40: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 3: After deleting an element, add it to a per-

process limbo list, together with current epochs of all processes

40

O1 1 3 4 2 5 7,O3 3 4 8 2 7 7,

Limbo list

Node epoch vectorNode

Page 41: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 4: Periodically scan limbo list

41

Scan:• cur_vec = current epoch vector• For each node n in the limbo list:

• node_vec = n’s epoch vector• For each process i:

• if node_vec[i] is odd• if node_vec[i] >= cur_vec[i]

• Continue to next node• Free node

Page 42: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 4: Periodically scan limbo list

42

O3 3 4 8 2 7 7,

Only care about odd entries (processes inside crit. sec.)! Processes outside crit. sec.

cannot access this node.

Scan:• cur_vec = current epoch vector• For each node n in the limbo list:

• node_vec = n’s epoch vector• For each process i:

• if node_vec[i] is odd• if node_vec[i] >= cur_vec[i]

• Continue to next node• Free node

Page 43: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 4: Periodically scan limbo list

43

O3 3 4 8 2 7 7,

5 4 8 4 9 8

OK to reclaim!

Scan:• cur_vec = current epoch vector• For each node n in the limbo list:

• node_vec = n’s epoch vector• For each process i:

• if node_vec[i] is odd• if node_vec[i] >= cur_vec[i]

• Continue to next node• Free node

Current Epoch vector

Page 44: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

EBR, Step by Step• Step 4: Periodically scan limbo list

44

O3 3 4 8 2 7 7,

3 4 8 4 9 9

Not OK to reclaim!

Scan:• cur_vec = current epoch vector• For each node n in the limbo list:

• node_vec = n’s epoch vector• For each process i:

• if node_vec[i] is odd• if node_vec[i] >= cur_vec[i]

• Continue to next node• Free node

Current Epoch vector

Page 45: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Pros and Cons of EBR

✓ Small overhead → very good performance✓ Easy to use

✘ Blocking (not lock-free) → can invalidate lock- or wait-freedom of data structure→ if some process is delayed inside a critical section,

memory cannot be reclaimed any more

45

Page 46: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Part 2Persistent Memory

46

Page 47: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

What Is Persistent Memory?

47

Access times ~ RAM

Byte 42

Byte 43

Byte-addressability

Durability in the face of crashes & recoveries

☞ Concurrent data structures for PM

Page 48: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Obstacle #1: Caches are Volatile

48

ProcessorCaches

Persistent Memory

Volatile Non-Volatile

Page 49: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Obstacle #2: (Re-)ordering

49

ProcessorCaches

Persistent Memory

Page 50: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Obstacles Illustrated

50

1: mark memory as allocated2: initialize memory3: change link of node 14: change link of node 25: done = 1

Write-back cache:1: mark allocation2: initialize mem3: change link 14: change link 25: done = 1

NV memory:

3: change link 1

5: done = 1

crash

Upon restart: incorrect state

Page 51: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Obstacles Illustrated

51

Write-back cache:1: mark allocation2: initialize mem3: change link 14: change link 25: done = 1

NV memory:

3: change link 1

5: done = 1

crash

Upon restart: incorrect state

1: mark memory as allocated2: persist allocation3: initialize memory4: persist memory content5: change link of node 16: persist new link7: change link of node 28: persist modified link9: done = 1

Page 52: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Obstacles Illustrated

52

Write-back cache:1: mark allocation2: initialize mem3: change link 14: change link 25: done = 1

NV memory:1: mark allocation2: initialize mem3: change link 1crash

Upon restart: incomplete operation

1: mark memory as allocated2: persist allocation3: initialize memory4: persist memory content5: change link of node 16: persist new link7: change link of node 28: persist modified link9: done = 1

Page 53: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Common Solution: Logging

53

1: log[0] = starting transaction X2: persist log[0]3: log[1] = allocating a node at address A4: persist log[1]5: mark memory as allocated6: persist allocation7: initialize memory8: persist memory content9: log[2] = previous value of link10: persist log[2]11: change link 112: persist modified link13: log[3] = previous value of link14: persist log[3]15: change link 216: persist modified link17: done = 118: persist done19: mark transaction X as finished

Frequent waiting for data to be persisted

Page 54: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

The Problem with Logging• Logging -> frequent waiting • slows down data structure performance

• Data structure performance is essential to overall system performance

54

The solution: reduce (or eliminate) logging

Page 55: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Log-free Data Structures• The main idea: use lock-free algorithms • They never leave the structure in an inconsistent state• No need for logging in the data structure algorithm

55

Page 56: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Detour: Durable Linearizability• After a restart, the structure reflects:• all operations completed (linearized) before the crash;• (potentially) some operations that were ongoing when

the crash occurred;

56

persist

1. Persistently allocate and initialize node2. Add link to new node3. Persist link to new node

If crash between steps 2 and 3,

violation of durable linearizability

Page 57: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Log-free Data Structures

57

persist

1. Persistently allocate and initialize node2. Add marked link to new node3. Persist link to new node4. Remove mark

Other threads - persist marked link if needed

Link-and-persist: atomic “modify” and “persist” link

Page 58: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Going Further: Batching

58

CLWB ACLWB B

CLWB C

Batching write-backs: beneficial for performance

time

cache line write-back

store fence

Page 59: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Going Further: Batching• A link only needs to be persisted when an operation

depends on it• Store all un-persisted links in a fast concurrent cache• When an operation directly depends on a link in the cache:

batch write-backs of all links in the cache (and empty the cache)

59

key 1 link addr1

key z link addr z

key y link addr y

Insert(X) X link addr X

Read(X)

…write-back all links

link cache

Page 60: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

YouCan’t Eliminate Fences• For any lock-free concurrent implementation of a

persistent object• there exists an execution E such that• in E, every update operation performs at least 1

persistent fence

60

Page 61: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Sequential Case

61

p1

p2

p3

update

update

update

Page 62: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Sequential Case

62

p1 ✘

p2 ✘

p3 ✘

update

crash

update

update

Page 63: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Sequential Case

63

p1 ✘

p2 ✘

p3 ✘

update

crash

if (result = SUCCESS) {print(“Done”);

}update

update

Page 64: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Sequential Case

64

p1 ✘

p2 ✘

p3 ✘

update

update

updatecrash

Need at least 1 persistent fence for every update.

Page 65: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Concurrent Case

65

p1 update

p2 update

Page 66: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Concurrent Case

66

p1 update

p2 update

I’ll just let p1 perform the

fence for both of us

Page 67: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Concurrent Case

67

p1 update

p2 update

! delayed before fence

Page 68: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Concurrent Case

68

p1 update

p2 update

! delayed before fence

Needs to perform its own fence

Page 69: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Lower Bound: Concurrent Case

69

p1 update

p2 update

! delayed before fence

Needs to perform its own fence

Both processes perform one fence per update operation.

Page 70: Concurrent Algorithms Memory · Non-Volatile. Obstacle #2: (Re-)ordering 49 Processor Caches Persistent Memory. Obstacles Illustrated 50 1: mark memory as allocated 2: initialize

Further Reading• T. E. Hart, P. E. McKenney, A. D. Brown, and J. Walpole. Performance of memory

reclamation for lockless synchronization. Journal of Parallel and Distributed Computing, 67(12), 2007.

• J. D. Valois. Lock-free linked lists using compare-and-swap. PODC 1995.• M.M. Michael, M.L. Scott. Correction of a memory management method for lock-free

data structures. Technical Report TR599, Computer Science Department, University of Rochester. 1995.

• D. L. Detlefs, P. A. Martin, M. Moir, and G. L. Steele, Jr. Lock-free reference counting. PODC 2001.

• M. M. Michael. Hazard pointers: Safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst., 15(6), 2004.

• O. Balmau, R. Guerraoui, M. Herlihy, and I. Zablotchi. Fast and Robust Memory Reclamation for Concurrent Data Structures. SPAA 2016.

• T. David, A. Dragojevic, R. Guerraoui, and I. Zablotchi. Log-Free Concurrent Data Structures. USENIX ATC 2018

• N. Cohen, R. Guerraoui, and I. Zablotchi. The Inherent Cost of Remembering Consistently. SPAA 2018

70