CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel

Upload: kyle

Post on 06-Jan-2016


Page 1: CS510 Concurrent Systems  Class 2

CS510 Concurrent Systems Class 2

A Lock-Free Multiprocessor OS Kernel


CS510 - Concurrent Systems 2

The Synthesis kernel

A research project at Columbia University

Synthesis V.0
o Uniprocessor (Motorola 68020)
o No virtual memory

1991 - Synthesis V.1
o Dual 68030s
o Virtual memory, threads, etc.
o Lock-free kernel


Locking

Why do kernels normally use locks?

Locks support a concurrent programming style based on mutual exclusion

o Acquire lock on entry to critical sections
o Release lock on exit
o Block or spin if lock is held
o Only one thread at a time executes the critical section

Locks prevent concurrent access and enable sequential reasoning about critical section code
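
A minimal sketch of the locking style described above, assuming C11 atomics. The names spin_lock_t, spin_lock, spin_unlock and locked_increment are hypothetical and chosen for illustration; this is not code from Synthesis or from any particular kernel.

```c
#include <stdatomic.h>

/* Hypothetical spin lock built on a C11 atomic_flag. */
typedef struct { atomic_flag held; } spin_lock_t;

spin_lock_t lock = { ATOMIC_FLAG_INIT };
long counter = 0;              /* shared state protected by `lock` */

/* Acquire on entry: spin while another thread holds the lock. */
void spin_lock(spin_lock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                      /* busy-wait */
}

/* Release on exit. */
void spin_unlock(spin_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Critical section: only one thread at a time runs the body,
 * so it can be reasoned about as sequential code. */
long locked_increment(void) {
    spin_lock(&lock);
    long v = ++counter;
    spin_unlock(&lock);
    return v;
}
```

Every caller pays for the acquire and release, whether or not anyone else wanted the lock.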


So why not use locking?

Granularity decisions
o Simplicity vs. performance
o Increasingly poor performance (superscalar CPUs)

Complicates composition
o Need to know the locks I’m holding before calling a function
o Need to know if it's safe to call while holding those locks

Risk of deadlock

Propagates thread failures to other threads
o What if I crash while holding a lock?


Is there an alternative?

Use lock-free, “optimistic” synchronization
o Execute the critical section unconstrained, and check at the end to see if you were the only one
o If so, continue. If not, roll back and retry

Synthesis uses no locks at all!

Goal: Show that lock-free synchronization is...
o Sufficient for all OS synchronization needs
o Practical
o High performance


Locking is pessimistic

Murphy's law: “If it can go wrong, it will...”

In concurrent programming:
o “If we can have a race condition, we will...”
o “If another thread could mess us up, it will...”

Solution:
o Hide the resources behind locked doors
o Make everyone wait until we're done
o That is... if there was anyone at all
o We pay the same cost either way


Optimistic synchronization

The common case is often little or no contention
o Or at least it should be!
o Do we really need to shut out the whole world?
o Why not proceed optimistically and only incur cost if we encounter contention?

If there's little contention, there's no starvation
o So we don’t need to be “wait-free”, which guarantees no starvation
o Lock-free is easier and cheaper than wait-free

Small critical sections really help performance


How does it work?

Copy
o Write down any state we need in order to retry

Do the work
o Perform the computation

Atomically “test and commit” or retry
o Compare saved assumptions with the actual state of the world
o If different, undo work, and start over with new state
o If preconditions still hold, commit the results and continue
o This is where the work becomes visible to the world (ideally)
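
The three steps above can be sketched on a single shared word, assuming C11 atomics. The shared variable high_water and the function record_sample are hypothetical names used only for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a high-water mark updated by many threads. */
atomic_int high_water = 0;

/* Raise high_water to `sample` if `sample` is larger.
 * Returns true if this call changed the shared value. */
bool record_sample(int sample) {
    int seen = atomic_load(&high_water);       /* Copy */
    for (;;) {
        if (sample <= seen)                    /* Do the work on the local copy */
            return false;                      /* nothing to commit */
        /* Test and commit: succeeds only if high_water still equals `seen`.
         * On failure `seen` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&high_water, &seen, sample))
            return true;
    }
}
```

Nothing is visible to other threads until the compare-exchange succeeds, so "undo" here is just discarding the local result.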


Example – stack pop

Pop() {
  retry:
    old_SP = SP;
    new_SP = old_SP + 1;
    elem = *old_SP;
    if (CAS(old_SP, new_SP, &SP) == FAIL)
        goto retry;
    return elem;
}
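
A compilable sketch of the same pop, assuming C11 atomics and a small array-based stack that grows toward lower addresses. STACK_WORDS, stack_mem and seed_push are hypothetical additions for the example; seed_push is a single-threaded setup helper, not a concurrent push.

```c
#include <stdatomic.h>

#define STACK_WORDS 8                      /* hypothetical capacity */

atomic_int stack_mem[STACK_WORDS];
/* SP points at the top element; an empty stack has SP one past the array. */
_Atomic(atomic_int *) SP = stack_mem + STACK_WORDS;

/* Single-threaded setup helper only (assumption for the example). */
void seed_push(int v) {
    atomic_int *p = atomic_load(&SP) - 1;
    atomic_store(p, v);
    atomic_store(&SP, p);
}

/* Lock-free pop. The caller must ensure the stack is not empty. */
int pop(void) {
    atomic_int *old_SP, *new_SP;
    int elem;
    do {
        old_SP = atomic_load(&SP);         /* copy the global to a local */
        new_SP = old_SP + 1;               /* work on locals only */
        elem = atomic_load(old_SP);
        /* commit only if SP still equals old_SP, otherwise retry */
    } while (!atomic_compare_exchange_strong(&SP, &old_SP, new_SP));
    return elem;
}
```

The do/while loop plays the role of the goto retry in the slide's version.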


Example – stack pop

o old_SP, new_SP and elem are locals - they won’t change!
o SP is global - it may change at any time!
o CAS is an “atomic” read-modify-write instruction


CAS

CAS – single word Compare and Swap
o An atomic read-modify-write instruction
o Semantics of the single atomic instruction are:

CAS(copy, update, mem_addr)
{
    if (*mem_addr == copy) {
        *mem_addr = update;
        return SUCCESS;
    } else
        return FAIL;
}
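
In C11 the closest standard equivalent is atomic_compare_exchange_strong. The wrapper below is a sketch that keeps the slide's argument order; the SUCCESS and FAIL values and the choice of uintptr_t as the word type are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { FAIL = 0, SUCCESS = 1 };

/* Same argument order as the slide: CAS(copy, update, mem_addr). */
int CAS(uintptr_t copy, uintptr_t update, atomic_uintptr_t *mem_addr) {
    /* On failure the C11 call writes the observed value into `copy`;
     * the slide's CAS does not report it, so it is simply dropped here. */
    return atomic_compare_exchange_strong(mem_addr, &copy, update)
               ? SUCCESS : FAIL;
}
```

A caller that wants to retry has to re-read the word itself, which is exactly what the Pop loop does.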


Example – stack pop

Pop() {
  retry:
    old_SP = SP;                            // Copy: global to local
    new_SP = old_SP + 1;                    // Do work
    elem = *old_SP;                         // Do work
    if (CAS(old_SP, new_SP, &SP) == FAIL)   // Test
        goto retry;
    return elem;
}


What made it work?

It works because we can atomically commit the new stack pointer value and compare the old stack pointer with the one at commit time

This allows us to verify no other thread has accessed the stack concurrently with our operation

o i.e. since we took the copy
o Well, at least we know the address in the stack pointer is the same as it was when we started
  • Does this guarantee there was no concurrent activity?
  • Does it matter?
  • We have to be careful!
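
One way to see why care is needed is to replay a single unlucky interleaving by hand. The sketch below runs on one thread and only simulates two threads, A and B; slots, SP and replay_interleaving are hypothetical names, and the scenario is an illustration rather than something taken from the Synthesis code.

```c
#include <stdatomic.h>
#include <stdbool.h>

int slots[4];
_Atomic(int *) SP;

/* Replays one interleaving of two threads on a single thread.
 * Returns true if thread A's CAS succeeded even though the stack changed;
 * *a_elem receives the value A would return from Pop. */
bool replay_interleaving(int *a_elem) {
    slots[3] = 1;
    atomic_store(&SP, &slots[3]);          /* stack holds one element: 1 */

    /* Thread A: copy and do work, then get delayed before the CAS. */
    int *old_SP = atomic_load(&SP);
    int *new_SP = old_SP + 1;
    *a_elem = *old_SP;                     /* A reads 1 */

    /* Thread B runs meanwhile: pops 1, then pushes 2 into the same slot. */
    atomic_store(&SP, &slots[3] + 1);      /* B's pop */
    slots[3] = 2;                          /* B's push reuses the slot */
    atomic_store(&SP, &slots[3]);

    /* Thread A resumes: SP holds the same address, so the CAS succeeds. */
    return atomic_compare_exchange_strong(&SP, &old_SP, new_SP);
}
```

In this replay A's commit succeeds and A returns 1, although the element on top of the stack at commit time was 2. The stack pointer matched, but there was concurrent activity.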


Stack push

Push(elem) {
  retry:
    old_SP = SP;
    new_SP = old_SP - 1;
    old_val = *new_SP;
    if (CAS2(old_SP, old_val, new_SP, elem, &SP, new_SP) == FAIL)
        goto retry;
}

Stack push

o Copy: old_SP = SP
o Do work: compute new_SP and read old_val
o Test and commit: the CAS2

Note: this is a double compare and swap! It's needed to atomically update both the new item and the new stack pointer

o Unnecessary compare (annotation on the slide; presumably the old_val comparison)


CAS2

CAS2 = double compare and swap
o Sometimes referred to as DCAS

CAS2(copy1, copy2, update1, update2, addr1, addr2)
{
    if (*addr1 == copy1 && *addr2 == copy2) {
        *addr1 = update1;
        *addr2 = update2;
        return SUCCESS;
    } else
        return FAIL;
}
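
Standard C has no double compare-and-swap on two independent addresses, so the sketch below only models the semantics: a small internal guard flag stands in for the atomicity that the 68030's CAS2 instruction provides in hardware. That guard makes this emulation NOT lock-free. The names cas2_guard, STACK_WORDS and stack_mem, and the use of an index instead of a pointer for SP, are assumptions made for the example.

```c
#include <stdatomic.h>

enum { FAIL = 0, SUCCESS = 1 };

/* Stand-in for hardware atomicity. Illustration only, not lock-free. */
atomic_flag cas2_guard = ATOMIC_FLAG_INIT;

int CAS2(long copy1, long copy2, long update1, long update2,
         atomic_long *addr1, atomic_long *addr2) {
    int result = FAIL;
    while (atomic_flag_test_and_set(&cas2_guard))
        ;                                   /* emulate one indivisible step */
    if (atomic_load(addr1) == copy1 && atomic_load(addr2) == copy2) {
        atomic_store(addr1, update1);
        atomic_store(addr2, update2);
        result = SUCCESS;
    }
    atomic_flag_clear(&cas2_guard);
    return result;
}

#define STACK_WORDS 8                       /* hypothetical capacity */
atomic_long stack_mem[STACK_WORDS];
atomic_long SP = STACK_WORDS;               /* index of the top; empty when == STACK_WORDS */

/* Push in the style of the slide. The caller must ensure the stack is not full. */
void push(long elem) {
    long old_SP, new_SP, old_val;
    do {
        old_SP = atomic_load(&SP);
        new_SP = old_SP - 1;
        old_val = atomic_load(&stack_mem[new_SP]);
        /* commit the new element and the new stack pointer together */
    } while (CAS2(old_SP, old_val, new_SP, elem,
                  &SP, &stack_mem[new_SP]) == FAIL);
}
```

Both words change in one step or not at all, which is the property the push relies on.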


Optimistic synchronization in Synthesis

Saved state is only one or two words

Commit is done via
o Compare-and-Swap (CAS), or
o Double-Compare-and-Swap (CAS2 or DCAS)

Can we really do everything in only two words?

o Every synchronization problem in the Synthesis kernel is reduced to only needing to atomically touch two words at a time!

o Requires some very clever kernel architecture


Approach

Build data structures that work concurrently
o Stacks
o Queues (array-based to avoid allocations)
o Linked lists

Then build the OS around these data structures

Concurrency is a first-class concern


Why is this trickier than it seems?

List operations show insert and delete at the head

o This is the easy case
o What about insert and delete of interior nodes?
o Next pointers of deletable nodes are not safe to traverse, even the first time!
o Need reference counts and DCAS to atomically compare and update the count and pointer values
o This is expensive, so we may choose to defer deletes instead (more on this later in the course)
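
For reference, the easy case (insert at the head) needs only a single-word CAS. The sketch below assumes C11 atomics; struct node, head and insert_head are hypothetical names. Delete and interior-node operations are deliberately left out, since those are the cases the bullets above describe as hard.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

_Atomic(struct node *) head = NULL;

/* Insert `n` at the head of the list. `n` is private to the caller
 * until the compare-exchange publishes it. */
void insert_head(struct node *n) {
    struct node *old_head = atomic_load(&head);     /* copy */
    do {
        n->next = old_head;                         /* work on the private node */
        /* commit: on failure old_head is refreshed and we retry */
    } while (!atomic_compare_exchange_weak(&head, &old_head, n));
}
```

The new node's next pointer is safe to write because no other thread can reach the node before the commit.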

Specialized list and queue implementations can reduce the overheads


The fall-back position

If you can’t reduce the work such that it requires atomic updates to two or fewer words:

o Create a single server thread and do the work sequentially on a single CPU

o Why is this faster than letting multiple CPUs try to do it concurrently?

Callers pack the requested operation into a message

o Send it to the server (using lock-free queues!)
o Wait for a response/callback/...
o The queue effectively serializes the operations
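
A rough sketch of this pattern, assuming C11 atomics and one array-based single-producer, single-consumer queue per caller. All names (struct message, struct queue, send_request, receive_request, serve_pending, balance) are hypothetical, the "add delta" operation is made up for the example, and the response path back to the caller is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_SLOTS 8          /* hypothetical capacity; one slot stays unused */

/* A request message: hypothetical operation "add delta to the balance". */
struct message { long delta; };

struct queue {
    struct message slots[QUEUE_SLOTS];
    atomic_uint head;          /* next slot to read; written only by the server */
    atomic_uint tail;          /* next slot to write; written only by the caller */
};

/* Caller side. Returns false if the queue is full. */
bool send_request(struct queue *q, struct message m) {
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned next = (tail + 1) % QUEUE_SLOTS;
    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return false;
    q->slots[tail] = m;
    /* release: the message is written before the new tail becomes visible */
    atomic_store_explicit(&q->tail, next, memory_order_release);
    return true;
}

/* Server side. Returns false if the queue is empty. */
bool receive_request(struct queue *q, struct message *out) {
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return false;
    *out = q->slots[head];
    atomic_store_explicit(&q->head, (head + 1) % QUEUE_SLOTS,
                          memory_order_release);
    return true;
}

/* State owned by the server thread alone, so plain accesses are enough. */
long balance = 0;

/* One pass of the server loop: apply every queued request in order.
 * Returns the number of requests handled. */
int serve_pending(struct queue *q) {
    struct message m;
    int handled = 0;
    while (receive_request(q, &m)) {
        balance += m.delta;
        handled++;
    }
    return handled;
}
```

Because only the server touches balance, the operation itself needs no synchronization at all; the queue is the only shared structure.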


Lock vs lock-free critical sections


Lock_based_Pop() {
    spin_lock(&lock);
    elem = *SP;
    SP = SP + 1;
    spin_unlock(&lock);
    return elem;
}

Lock_free_Pop() {
  retry:
    old_SP = SP;
    new_SP = old_SP + 1;
    elem = *old_SP;
    if (CAS(old_SP, new_SP, &SP) == FAIL)
        goto retry;
    return elem;
}


Conclusions

This is really intriguing!

It's possible to build an entire OS without locks!

But do you really want to?
o Does it add or remove complexity?
o What if hardware only gives you CAS and no DCAS?
o What if critical sections are large or long lived?
o What if contention is high?
o What if we can’t undo the work?
o … ?