Spin Locks and Contention Management The Art of Multiprocessor Programming Spring 2007

Upload: raquel-kimberley

Post on 14-Dec-2015


Spin Locks and Contention Management

The Art of Multiprocessor Programming

Spring 2007

© Herlihy-Shavit 2007 2

Focus so far: Correctness

• Models
– Accurate (we never lied to you)
– But idealized (so we forgot to mention a few things)
• Protocols
– Elegant
– Important
– But naïve

New Focus: Performance

• Models
– More complicated (not the same as complex!)
– Still focus on principles (not soon obsolete)
• Protocols
– Elegant (in their fashion)
– Important (why else would we pay attention)
– And realistic (your mileage may vary)

Kinds of Architectures

• SISD (Uniprocessor)
– Single instruction stream
– Single data stream
• SIMD (Vector)
– Single instruction
– Multiple data
• MIMD (Multiprocessors): our space
– Multiple instruction
– Multiple data

MIMD Architectures

[Figure: a shared-bus architecture (processors and memory on one bus) and a distributed architecture]

• Memory contention
• Communication contention
• Communication latency

Today: Revisit Mutual Exclusion

• Think of performance, not just correctness and progress
• Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware
• And get to know a collection of locking algorithms…

What Should You Do If You Can’t Get a Lock?

• Keep trying (our focus)
– “spin” or “busy-wait”
– Good if delays are short
• Give up the processor
– Good if delays are long
– Always good on uniprocessor

Basic Spin-Lock

[Figure: threads spin on the lock; one at a time enters the critical section (CS) and resets the lock upon exit]

• …lock introduces sequential bottleneck
• …lock suffers from contention
• Notice: these are two different phenomena
– Sequential bottleneck: no parallelism
– Contention: ???

Review: Test-and-Set

• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”

Review: Test-and-Set

  // Package java.util.concurrent.atomic
  public class AtomicBoolean {
    boolean value;

    // Swap old and new values
    public synchronized boolean getAndSet(boolean newValue) {
      boolean prior = value;
      value = newValue;
      return prior;
    }
  }

Review: Test-and-Set

  AtomicBoolean lock = new AtomicBoolean(false);
  …
  boolean prior = lock.getAndSet(true);

Swapping in true is called “test-and-set” or TAS

Test-and-Set Locks

• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false

Test-and-set Lock

  class TASlock {
    // Lock state is AtomicBoolean
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
      // Keep trying until lock acquired
      while (state.getAndSet(true)) {}
    }

    void unlock() {
      // Release lock by resetting state to false
      state.set(false);
    }
  }
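The TAS lock can be tried out with a small shared-counter experiment. The following is a minimal sketch, not the authors' benchmark: the class TASLockDemo, its run method, and the thread and iteration counts are assumptions made for illustration, and the spin loop calls Thread.yield() (not on the slide) so the demo still finishes promptly when there are more threads than cores.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // getAndSet(true) returns the prior value: false means we won the lock
        while (state.getAndSet(true)) {
            Thread.yield(); // assumption: be polite on oversubscribed machines
        }
    }

    void unlock() {
        state.set(false); // release by writing false
    }
}

class TASLockDemo {
    // Each of `threads` workers increments a shared counter `iters` times
    // inside the critical section; returns the final counter value.
    static int run(int threads, int iters) {
        TASLock lock = new TASLock();
        int[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

If mutual exclusion holds, the returned total equals threads × iterations; timing the same run for different thread counts is one way to reproduce the kind of curve the deck discusses.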

Space Complexity

• TAS spin-lock has small “footprint”
• N thread spin-lock uses O(1) space
• As opposed to O(n) Peterson/Bakery
• How did we overcome the Ω(n) lower bound?
• We used a RMW operation…

Performance

• Experiment
– n threads
– Increment shared counter 1 million times
• How long should it take?
• How long does it take?

Graph

[Figure: time vs. threads; the ideal curve shows no speedup because of sequential bottleneck]

Mystery #1

[Figure: time vs. threads for the TAS lock and ideal]

What is going on? Let’s try and fix it…

Test-and-Test-and-Set Locks

• Lurking stage
– Wait until lock “looks” free
– Spin while read returns true (lock taken)
• Pouncing stage
– As soon as lock “looks” available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking

Test-and-test-and-set Lock

  class TTASlock {
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
      while (true) {
        // Wait until lock looks free
        while (state.get()) {}
        // Then try to acquire it
        if (!state.getAndSet(true))
          return;
      }
    }
  }
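The lurking and pouncing stages can be sketched as a complete class. This is an illustrative version under stated assumptions: the unlock method (which the slide omits) simply writes false as in the TAS lock, Thread.yield() in the lurking loop is an addition for oversubscribed machines, and TTASLockDemo with its run method is a hypothetical harness.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Lurking: plain reads only, so a spinner can stay in its own cache
            while (state.get()) {
                Thread.yield(); // assumption: not on the slide
            }
            // Pouncing: the lock looked free, now try the real TAS
            if (!state.getAndSet(true)) {
                return;
            }
            // TAS lost the race: back to lurking
        }
    }

    void unlock() {
        state.set(false); // assumption: same release as the TAS lock
    }
}

class TTASLockDemo {
    // Returns the final value of a counter incremented under the lock.
    static int run(int threads, int iters) {
        TTASLock lock = new TTASLock();
        int[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

The only difference from the TAS lock is the inner read loop, which is why the two look identical in an idealized memory model.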

Mystery #2

[Figure: time vs. threads for the TAS lock, TTAS lock, and ideal]

Mystery

• Both
– TAS and TTAS
– Do the same thing (in our model)
• Except that
– TTAS performs much better than TAS
– Neither approaches ideal

Opinion

• Our memory abstraction is broken
• TAS & TTAS methods
– Are provably the same (in our model)
– Except they aren’t (in field tests)
• Need a more detailed model …

Bus-Based Architectures

[Figure: three processors, each with its own cache, connected to memory by a shared bus]

• Random access memory (10s of cycles)
• Shared bus
– Broadcast medium
– One broadcaster at a time
– Processors and memory all “snoop”
• Per-processor caches
– Small
– Fast: 1 or 2 cycles
– Address & state information

Jargon Watch

• Cache hit
– “I found what I wanted in my cache”
– Good Thing™
• Cache miss
– “I had to shlep all the way to memory for that data”
– Bad Thing™

Cave Canem

• This model is still a simplification
– But not in any essential way
– Illustrates basic principles
• Will discuss complexities later

Processor Issues Load Request

[Figure: a processor whose cache lacks the data broadcasts “Gimme data” on the bus]

Memory Responds

[Figure: memory answers “Got your data right here” and the data is loaded into the requester’s cache]

Processor Issues Load Request

[Figure: another processor broadcasts “Gimme data”; a cache that already holds the data answers “I got data”]

Other Processor Responds

[Figure: the data is supplied by the responding cache; both caches now hold it]

Modify Cached Data

[Figure: one processor modifies its cached copy]

What’s up with the other copies?

Cache Coherence

• We have lots of copies of data
– Original copy in memory
– Cached copies at processors
• Some processor modifies its own copy
– What do we do with the others?
– How to avoid confusion?

Write-Back Caches

• Accumulate changes in cache
• Write back when needed
– Need the cache for something else
– Another processor wants it
• On first modification
– Invalidate other entries
– Requires non-trivial protocol …

Write-Back Caches

• Cache entry has three states
– Invalid: contains raw seething bits
– Valid: I can read but I can’t write
– Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache

Invalidate

[Figure: the writing processor broadcasts an invalidation on the bus (“Mine, all mine!”); the other cache (“Uh, oh”) drops its copy]

• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)

Another Processor Asks for Data

[Figure: a processor without the data issues a load request on the bus]

Owner Responds

[Figure: the owning cache answers “Here it is!”]

End of the Day …

[Figure: both caches hold the data again]

Reading OK, no writing
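The invalidate / owner-responds sequence can be mimicked with a toy state machine for a single cache line. This is only a sketch of the three states named on the Write-Back Caches slide: the class name CoherenceSketch, the busTransactions counter, and the rule that every miss or invalidation costs exactly one bus transaction are simplifying assumptions, not a description of any real protocol.

```java
import java.util.Arrays;

class CoherenceSketch {
    enum State { INVALID, VALID, DIRTY }

    final State[] caches;       // state of the one cache line in each processor
    int busTransactions = 0;    // toy cost model: one per miss or invalidation

    CoherenceSketch(int processors) {
        caches = new State[processors];
        Arrays.fill(caches, State.INVALID);
    }

    void read(int p) {
        if (caches[p] != State.INVALID) {
            return;                      // cache hit: no bus traffic
        }
        busTransactions++;               // cache miss: load request on the bus
        for (int i = 0; i < caches.length; i++) {
            if (caches[i] == State.DIRTY) {
                caches[i] = State.VALID; // owner responds and keeps a readable copy
            }
        }
        caches[p] = State.VALID;
    }

    void write(int p) {
        if (caches[p] == State.DIRTY) {
            return;                      // already holds write permission
        }
        busTransactions++;               // broadcast invalidation
        for (int i = 0; i < caches.length; i++) {
            if (i != p) {
                caches[i] = State.INVALID; // others lose read permission
            }
        }
        caches[p] = State.DIRTY;         // this cache acquires write permission
    }
}
```

In this model a thread that only reads a line it already holds generates no bus traffic, while every write by a different processor forces an invalidation, which is the distinction the deck goes on to use when comparing TAS and TTAS.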

Mutual Exclusion

• What do we want to optimize?
– Bus bandwidth used by spinning threads
– Release/Acquire latency
– Acquire latency for idle lock

Simple TASLock

• TAS invalidates cache lines
• Spinners
– Miss in cache
– Go to bus
• Thread wants to release lock
– Delayed behind spinners

Test-and-test-and-set

• Wait until lock “looks” free
– Spin on local cache
– No bus use while lock busy
• Problem: when lock is released
– Invalidation storm …

Local Spinning while Lock is Busy

[Figure: memory and every cache hold “busy”; spinners read their cached copies]

On Release

[Figure: the lock now reads “free”; the spinners’ cached copies are invalid]

• Everyone misses, rereads
• Everyone tries TAS

Problems

• Everyone misses
– Reads satisfied sequentially
• Everyone does TAS
– Invalidates others’ caches
• Eventually quiesces after lock acquired
– How long does this take?

Measuring Quiescence Time

[Figure: processors P1, P2, …, Pn]

X = time of ops that don’t use the bus
Y = time of ops that cause intensive bus traffic

In critical section, run ops X then ops Y. As long as quiescence time is less than X, no drop in performance.

By gradually varying X, can determine the exact time to quiesce.

Quiescence Time

[Figure: time vs. threads]

Increases linearly with the number of processors for bus architecture

Mystery Explained

[Figure: time vs. threads for the TAS lock, TTAS lock, and ideal]

TTAS lock: better than TAS but still not as good as ideal

Solution: Introduce Delay

[Figure: timeline of spin lock attempts separated by delays d, r1·d, r2·d]

• If the lock looks free
• But I fail to get it
• There must be lots of contention
• Better to back off than to collide again

Dynamic Example: Exponential Backoff

[Figure: timeline of spin lock attempts separated by delays d, 2d, 4d]

If I fail to get lock
– Wait random duration before retry
– Each subsequent failure doubles expected wait

Exponential Backoff Lock

  public class Backoff implements Lock {
    public void lock() {
      // Fix minimum delay
      int delay = MIN_DELAY;
      while (true) {
        // Wait until lock looks free
        while (state.get()) {}
        // If we win, return
        if (!state.getAndSet(true))
          return;
        // Back off for random duration
        sleep(random() % delay);
        // Double max delay, within reason
        if (delay < MAX_DELAY)
          delay = 2 * delay;
      }
    }
  }
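A runnable variant of the backoff lock might look as follows. It is a sketch under assumptions: the delay bounds (in nanoseconds) are arbitrary placeholder values rather than tuned parameters, the slide's sleep(random() % delay) is rendered with LockSupport.parkNanos and ThreadLocalRandom, and BackoffLockDemo is a hypothetical harness.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffLock {
    // Assumed bounds, in nanoseconds; real values must be tuned per platform
    static final int MIN_DELAY = 1_000;
    static final int MAX_DELAY = 1_000_000;

    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        int delay = MIN_DELAY;
        while (true) {
            while (state.get()) {
                Thread.yield();          // wait until lock looks free
            }
            if (!state.getAndSet(true)) {
                return;                  // we won
            }
            // Lost the race: back off for a random duration in [1, delay]
            LockSupport.parkNanos(ThreadLocalRandom.current().nextInt(delay) + 1);
            if (delay < MAX_DELAY) {
                delay = 2 * delay;       // double the bound, within reason
            }
        }
    }

    void unlock() {
        state.set(false);
    }
}

class BackoffLockDemo {
    // Returns the final value of a counter incremented under the lock.
    static int run(int threads, int iters) {
        BackoffLock lock = new BackoffLock();
        int[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

The delay is reset to MIN_DELAY on every call to lock(), so a thread only pays a long wait while contention persists.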

Spin-Waiting Overhead

[Figure: time vs. threads for the TTAS lock and the backoff lock]

Backoff: Other Issues

• Good
– Easy to implement
– Beats TTAS lock
• Bad
– Must choose parameters carefully
– Not portable across platforms

Idea

• Avoid useless invalidations
– By keeping a queue of threads
• Each thread
– Notifies next in line
– Without bothering the others

Anderson Queue Lock

[Figure: a flags array, initially T F F F F F F F, and a next counter; the lock is idle]

• An acquiring thread calls getAndIncrement on next to take a slot
• The first thread finds its flag true: acquired (“Mine!”)
• A second acquiring thread takes the following slot with getAndIncrement and waits on its own flag
• When the first thread releases, the flags read T T F F F F F F and the second thread has acquired the lock (“Yow!”)

Anderson Queue Lock

  class ALock implements Lock {
    // One flag per thread
    boolean[] flags = {true, false, …, false};
    // Next flag to use
    AtomicInteger next = new AtomicInteger(0);
    // Thread-local variable
    ThreadLocal<Integer> mySlot;

    public void lock() {
      // Take next slot
      int slot = next.getAndIncrement();
      mySlot.set(slot);
      // Spin until told to go
      while (!flags[slot % n]) {}
      // Prepare slot for re-use
      flags[slot % n] = false;
    }

    public void unlock() {
      // Tell next thread to go
      flags[(mySlot.get() + 1) % n] = true;
    }
  }
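For experimentation, the array-based queue lock can be written out in full. The version below is a sketch with a few assumptions: an AtomicIntegerArray (1 = go, 0 = wait) stands in for the boolean flags so that writes are visible across threads in Java, the capacity is passed to the constructor, at most that many threads may contend at once, and ALockDemo is a hypothetical harness. Thread.yield() in the spin loop is an addition for machines with fewer cores than threads.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class ALock {
    private final AtomicIntegerArray flags;   // 1 = go, 0 = wait
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);
    private final int n;                       // max number of contending threads

    ALock(int capacity) {
        n = capacity;
        flags = new AtomicIntegerArray(n);     // all zero initially
        flags.set(0, 1);                       // slot 0 may go first
    }

    void lock() {
        // Take next slot (floorMod keeps the index non-negative)
        int slot = Math.floorMod(next.getAndIncrement(), n);
        mySlot.set(slot);
        while (flags.get(slot) == 0) {
            Thread.yield();                    // spin until told to go
        }
        flags.set(slot, 0);                    // prepare slot for re-use
    }

    void unlock() {
        flags.set((mySlot.get() + 1) % n, 1);  // tell next thread to go
    }
}

class ALockDemo {
    // Runs `threads` workers against a lock sized for exactly that many threads.
    static int run(int threads, int iters) {
        ALock lock = new ALock(threads);
        int[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Each waiter spins on its own array entry, so a release touches only the next slot instead of waking every spinner.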

Performance

[Figure: time vs. threads for the queue lock and TTAS]

• Curve is practically flat
• Scalable performance
• FIFO fairness

Anderson Queue Lock

• Good
– First truly scalable lock
– Simple, easy to implement
• Bad
– Space hog
– One bit per thread
• Unknown number of threads?
• Small number of actual contenders?

CLH Lock

• FIFO order
• Small, constant-size overhead per thread

Initially

[Figure: the queue tail points to a node whose flag is false (lock is free); the thread is idle]

Purple Wants the Lock

[Figure: Purple creates a node with flag true and swaps it into tail]

Purple Has the Lock

[Figure: the old tail node’s flag is false, so Purple has acquired the lock]

Red Wants the Lock

[Figure: Red creates a node with flag true, swaps it into tail, and spins on Purple’s node]

Actually, it spins on cached copy

Purple Releases

[Figure: Purple sets its node’s flag to false (“Bingo!”); Red has acquired the lock]

Space Usage

• Let
– L = number of locks
– N = number of threads
• ALock
– O(LN)
• CLH lock
– O(L+N)
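As a rough worked example of the two bounds (the figures L = 1000 and N = 64 are made up for illustration, and constant factors are ignored):

```latex
\begin{align*}
\text{ALock:}    \quad & L \cdot N = 1000 \cdot 64 = 64\,000 \text{ flags} \\
\text{CLH lock:} \quad & L + N = 1000 + 64 = 1064 \text{ nodes}
\end{align*}
```

The ALock needs a full array of N flags for every lock, whereas the CLH lock needs one node per lock plus one node per thread.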

CLH Queue Lock

  class Qnode {
    // Not released yet
    AtomicBoolean locked = new AtomicBoolean(true);
  }

  class CLHLock implements Lock {
    // Tail of the queue
    AtomicReference<Qnode> tail;
    // Thread-local Qnode
    ThreadLocal<Qnode> myNode;
    // Thread-local predecessor, remembered for unlock()
    ThreadLocal<Qnode> myPred;

    public void lock() {
      // A recycled node must be marked "not released" again
      myNode.get().locked.set(true);
      // Swap in my node
      Qnode pred = tail.getAndSet(myNode.get());
      myPred.set(pred);
      // Spin until predecessor releases lock
      while (pred.locked.get()) {}
    }

    public void unlock() {
      // Notify successor
      myNode.get().locked.set(false);
      // Recycle predecessor’s node
      myNode.set(myPred.get());
    }
  }
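A self-contained version of the CLH lock, suitable for running, could be sketched like this. Assumptions are labeled in the comments: the node uses a volatile boolean instead of an AtomicBoolean, the tail starts out pointing at a released dummy node, a second thread-local (myPred) carries the predecessor from lock() to unlock(), and CLHLockDemo is a hypothetical harness.

```java
import java.util.concurrent.atomic.AtomicReference;

class CLHLock {
    static final class QNode {
        volatile boolean locked; // false = released (assumed volatile field)
    }

    // Tail starts at a dummy node that is already released: the lock is free
    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    void lock() {
        QNode node = myNode.get();
        node.locked = true;                 // not released yet
        QNode pred = tail.getAndSet(node);  // swap in my node
        myPred.set(pred);
        while (pred.locked) {
            Thread.yield();                 // spin on predecessor's node
        }
    }

    void unlock() {
        QNode node = myNode.get();
        node.locked = false;                // notify successor
        myNode.set(myPred.get());           // recycle predecessor's node
    }
}

class CLHLockDemo {
    // Returns the final value of a counter incremented under the lock.
    static int run(int threads, int iters) {
        CLHLock lock = new CLHLock();
        int[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Recycling the predecessor's node rather than one's own matters: the successor may still be reading the releasing thread's node when that thread tries to lock again.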

CLH Lock

• Good
– Lock release affects successor only
– Small, constant-sized space
• Bad
– Doesn’t work for uncached NUMA architectures

NUMA Architectures

• Acronym:
– Non-Uniform Memory Architecture
• Illusion:
– Flat shared memory
• Truth:
– No caches (sometimes)
– Some memory regions faster than others

NUMA Machines

• Spinning on local memory is fast
• Spinning on remote memory is slow

CLH Lock

• Each thread spins on predecessor’s memory
• Could be far away …

MCS Lock

• FIFO order
• Spin on local memory only
• Small, constant-size overhead

Initially

[Figure: the queue tail; the thread is idle]

Acquiring

[Figure: the acquiring thread allocates a Qnode and swaps it into tail]

Acquired

[Figure: the queue was empty, so the thread has acquired the lock]

Acquiring

[Figure: a second thread swaps its own node into tail, links it behind the lock holder’s node, sets its own flag to true, and spins on that flag until it becomes false (“Yes!”)]

MCS Queue Lock

  class Qnode {
    boolean locked = false;
    Qnode next = null;
  }

  class MCSLock implements Lock {
    AtomicReference<Qnode> tail;
    // Remembers this thread's node between lock() and unlock()
    ThreadLocal<Qnode> myNode;

    public void lock() {
      // Make a Qnode
      Qnode qnode = new Qnode();
      myNode.set(qnode);
      // Add my node to the tail of queue
      Qnode pred = tail.getAndSet(qnode);
      // Fix if queue was non-empty
      if (pred != null) {
        qnode.locked = true;
        pred.next = qnode;
        // Wait until unlocked
        while (qnode.locked) {}
      }
    }

    public void unlock() {
      Qnode qnode = myNode.get();
      // Missing successor?
      if (qnode.next == null) {
        // If really no successor, return
        if (tail.compareAndSet(qnode, null))
          return;
        // Otherwise wait for successor to catch up
        while (qnode.next == null) {}
      }
      // Pass lock to successor
      qnode.next.locked = false;
    }
  }
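Putting lock() and unlock() together, a runnable MCS lock might be sketched as below. The assumptions are: volatile fields in the node so that spinning threads observe updates, a thread-local myNode to carry the node from lock() to unlock(), Thread.yield() in the spin loops, and a hypothetical MCSLockDemo harness.

```java
import java.util.concurrent.atomic.AtomicReference;

class MCSLock {
    static final class QNode {
        volatile boolean locked; // assumed volatile so spinners see the hand-off
        volatile QNode next;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = new ThreadLocal<>();

    void lock() {
        QNode qnode = new QNode();            // make a node
        myNode.set(qnode);
        QNode pred = tail.getAndSet(qnode);   // add my node to the tail
        if (pred != null) {                   // queue was non-empty
            qnode.locked = true;
            pred.next = qnode;                // link behind predecessor
            while (qnode.locked) {
                Thread.yield();               // spin on my own node only
            }
        }
    }

    void unlock() {
        QNode qnode = myNode.get();
        if (qnode.next == null) {             // missing successor?
            if (tail.compareAndSet(qnode, null)) {
                return;                       // really no successor
            }
            while (qnode.next == null) {
                Thread.yield();               // successor is still linking itself
            }
        }
        qnode.next.locked = false;            // pass lock to successor
    }
}

class MCSLockDemo {
    // Returns the final value of a counter incremented under the lock.
    static int run(int threads, int iters) {
        MCSLock lock = new MCSLock();
        int[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Unlike CLH, each waiter spins on a field of its own node, which is the property that makes the lock attractive when remote memory is slow.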

Purple Release

[Figure: Purple is releasing; its node has no successor linked yet, but the swap on tail shows that another thread has enqueued]

• By looking at the queue, I see another thread is active
• I have to wait for that thread to finish
• Once the other thread’s node is linked, Purple sets that node’s flag to false and the spinning thread holds the lock

Abortable Locks

• What if you want to give up waiting for a lock?
• For example
– Timeout
– Database transaction aborted by user

Back-off Lock

• Aborting is trivial
– Just return from lock() call
• Extra benefit:
– No cleaning up
– Wait-free
– Immediate return

Queue Locks

• Can’t just quit
– Thread in line behind will starve
• Need a graceful way out

Queue Locks

[Figure: three threads spin in a queue; in the normal case each flag turns false in turn and the lock is passed down the line]

[Figure: if a thread in the middle of the queue simply quits, its flag stays true and the thread behind it keeps spinning (“pwned”)]

Abortable CLH Lock

• When a thread gives up
– Removing node in a wait-free way is hard
• Idea:
– Let successor deal with it.

Initially

[Figure: tail and the distinguished AVAILABLE node (A); the thread is idle]

• Each node holds a pointer to predecessor (or null)
• Distinguished available node means lock is free

Acquiring

[Figure: the acquiring thread creates a node and swaps it into tail]

• Null predecessor means lock not released or aborted

Acquired

• Pointer to AVAILABLE means lock is free.

Normal Case

[Figure: one thread holds the lock; two threads spin behind it]

• Null means lock is not free & request not aborted

One Thread Aborts

[Figure: the middle thread times out]

Successor Notices

• Non-null means predecessor aborted

Recycle Predecessor’s Node

Spin on Earlier Node

[Figure: the successor skips the aborted node and spins on the earlier node; when that node is released (A), the lock is now mine]

Time-out Lock

  public class TOLock implements Lock {
    // Distinguished node to signify free lock
    static Qnode AVAILABLE = new Qnode();
    // Tail of the queue
    AtomicReference<Qnode> tail;
    // Remember my node …
    ThreadLocal<Qnode> myNode;

    public boolean lock(long timeout) {
      // Create & initialize node
      Qnode qnode = new Qnode();
      myNode.set(qnode);
      qnode.prev = null;
      // Swap with tail
      Qnode pred = tail.getAndSet(qnode);
      // If predecessor absent or released, we are done
      if (pred == null || pred.prev == AVAILABLE) {
        return true;
      }
      long start = now();
      // Keep trying for a while …
      while (now() - start < timeout) {
        // Spin on predecessor’s prev field
        Qnode predPred = pred.prev;
        if (predPred == AVAILABLE) {
          // Predecessor released lock
          return true;
        } else if (predPred != null) {
          // Predecessor aborted, advance one
          pred = predPred;
        }
      }
      // Try to put predecessor back
      if (!tail.compareAndSet(qnode, pred))
        // Otherwise, if it’s too late, redirect to predecessor
        qnode.prev = pred;
      return false;
    }

    public void unlock() {
      Qnode qnode = myNode.get();
      // Clean up if no one waiting
      if (!tail.compareAndSet(qnode, null))
        // Notify successor by pointing to AVAILABLE node
        qnode.prev = AVAILABLE;
    }
  }
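The time-out lock fragments can be assembled into one runnable class. The sketch below makes several assumptions: the method is named tryLock and takes a TimeUnit, now() is rendered as System.nanoTime(), the prev field is volatile, and TOLockDemo with its run method is a hypothetical scenario (acquire a free lock, time out against a held lock, acquire again after release).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class TOLock {
    static final class QNode {
        volatile QNode prev; // null: waiting or holding; AVAILABLE: released; other: aborted
    }

    static final QNode AVAILABLE = new QNode();
    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = new ThreadLocal<>();

    boolean tryLock(long time, TimeUnit unit) {
        long start = System.nanoTime();
        long patience = unit.toNanos(time);
        QNode qnode = new QNode();               // create & initialize node
        myNode.set(qnode);
        QNode pred = tail.getAndSet(qnode);      // swap with tail
        if (pred == null || pred.prev == AVAILABLE) {
            return true;                         // predecessor absent or released
        }
        while (System.nanoTime() - start < patience) {
            QNode predPred = pred.prev;          // spin on predecessor's prev field
            if (predPred == AVAILABLE) {
                return true;                     // predecessor released lock
            } else if (predPred != null) {
                pred = predPred;                 // predecessor aborted, advance one
            } else {
                Thread.yield();
            }
        }
        // Timed out: try to put predecessor back, else redirect successor to it
        if (!tail.compareAndSet(qnode, pred)) {
            qnode.prev = pred;
        }
        return false;
    }

    void unlock() {
        QNode qnode = myNode.get();
        if (!tail.compareAndSet(qnode, null)) {  // clean up if no one waiting
            qnode.prev = AVAILABLE;              // otherwise notify successor
        }
    }
}

class TOLockDemo {
    private static void runAndJoin(Runnable r) {
        Thread t = new Thread(r);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns {acquired when free, acquired while held, acquired after release}.
    static boolean[] run() {
        TOLock lock = new TOLock();
        boolean[] r = new boolean[3];
        r[0] = lock.tryLock(1, TimeUnit.SECONDS);
        runAndJoin(() -> { r[1] = lock.tryLock(20, TimeUnit.MILLISECONDS); });
        lock.unlock();
        runAndJoin(() -> {
            r[2] = lock.tryLock(1, TimeUnit.SECONDS);
            if (r[2]) lock.unlock();
        });
        return r;
    }
}
```

In this scenario the second thread gives up while the first still holds the lock, restores the tail to its predecessor, and leaves nothing for anyone to clean up.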

One Lock To Rule Them All?

• TTAS+Backoff, CLH, MCS, TOLock…
• Each better than others in some way
• There is no one solution
• Lock we pick really depends on:
– the application
– the hardware
– which properties are important