Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors
Mellor-Crummey and Scott
Presented by Robert T. Bauer
Problem
• Efficient SMMP Reader/Writer Synchronization
Basics
• Readers can “share” a data structure
• Writers need exclusive access
– A write appears to be atomic
• Issues:
– Fairness: fair means every “process” eventually runs
– Preference:
• Reader preference – a writer can starve
• Writer preference – a reader can starve
Organization
• Algorithm 1 – simple mutual exclusion
• Algorithm 2 – RW with reader preference
• Algorithm 3 – a fair lock
• Algorithm 4 – local-only spinning (fair)
• Algorithm 5 – local-only spinning, reader preference
• Algorithm 6 – local-only spinning, writer preference
• Conclusions
Paper’s Contributions
Algorithm 1 – just a spin lock
• Idea is that processors spin on their own lock record
• Lock records form a linked list
• When a lock is released, the “next” processor waiting on the lock is signaled by passing the lock
• By using “compare-and-swap” when releasing, the algorithm guarantees FIFO ordering
• Spinning is “local” by design
Algorithm 1
• Acquire Lock
pred := fetch_and_store(L, I)
if pred /= null
I->locked := true
pred->next := I
repeat while I->locked
• Release Lock
if I->next == null
if compare_and_swap(L, I, null) return
repeat while I->next == null
I->next->locked := false
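A minimal C11 sketch of this list-based queue lock (the MCS lock), assuming sequentially consistent atomics; `I` is the caller’s own queue record, as on the slide:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Each processor spins only on its own record, so spinning stays local. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

typedef _Atomic(mcs_node *) mcs_lock;         /* L points at the queue tail */

void mcs_acquire(mcs_lock *L, mcs_node *I) {
    atomic_store(&I->next, NULL);
    mcs_node *pred = atomic_exchange(L, I);   /* fetch_and_store(L, I)   */
    if (pred != NULL) {                       /* queue was non-empty     */
        atomic_store(&I->locked, true);
        atomic_store(&pred->next, I);         /* link behind predecessor */
        while (atomic_load(&I->locked))       /* spin on our own flag    */
            ;
    }
}

void mcs_release(mcs_lock *L, mcs_node *I) {
    if (atomic_load(&I->next) == NULL) {
        mcs_node *expected = I;
        /* No visible successor: try to swing the tail back to empty. */
        if (atomic_compare_exchange_strong(L, &expected, NULL))
            return;
        /* A successor is mid-enqueue; wait for it to link itself in. */
        while (atomic_load(&I->next) == NULL)
            ;
    }
    mcs_node *succ = atomic_load(&I->next);
    atomic_store(&succ->locked, false);       /* pass the lock on (FIFO) */
}
```

With no contention, acquire is one atomic swap and release is one compare-and-swap; the swap order on the tail is what gives the FIFO guarantee mentioned above.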
Algorithm 2 – Simple RW lock with reader preference
Bit 0 – writer active?
Bits 31:1 – count of interested readers
start_write – repeat until compare_and_swap(L, 0, 0x1)
start_read – atomic_add(L, 2); repeat until ((L & 0x1) = 0)
end_write – atomic_add(L, -1)
end_read – atomic_add(L, -2)
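In C11 atomics this reader-preference lock is only a few lines (a sketch; `2` is the reader increment so that bit 0 stays the writer flag, as on the slide):

```c
#include <stdatomic.h>
#include <stdint.h>

/* One lock word: bit 0 = writer active, bits 31:1 = interested readers. */
typedef atomic_uint_fast32_t rw_lock;

void start_write(rw_lock *L) {
    uint_fast32_t expected = 0;
    /* Enter only from the all-clear state: no readers, no writer. */
    while (!atomic_compare_exchange_weak(L, &expected, 0x1u))
        expected = 0;                 /* CAS rewrote expected; reset it */
}

void end_write(rw_lock *L) { atomic_fetch_sub(L, 0x1u); }

void start_read(rw_lock *L) {
    atomic_fetch_add(L, 2);           /* announce reader interest first */
    while (atomic_load(L) & 0x1u)     /* then wait out an active writer */
        ;
}

void end_read(rw_lock *L) { atomic_fetch_sub(L, 2); }
```

Because readers bump the count before checking the writer bit, a steady stream of readers keeps the count non-zero and the writer’s CAS keeps failing – exactly the reader preference (and writer starvation) described above.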
Algorithm 3 – Fair Lock
Lock layout: two words, Requests and Completions, each holding a writer count and a reader count.
start_write –
prev = fetch_clear_then_add(L->requests, MASK, 1) // ++ write requests
repeat until L->completions = prev // wait for previous readers and writers to go first
end_write – clear_then_add(L->completions, MASK, 1) // ++ write completions
start_read –
prev_writers = fetch_clear_then_add(L->requests, MASK, 1) & MASK // ++ read requests; get count of previous writers
repeat until (L->completions & MASK) = prev_writers // wait for previous writers to go first
end_read – clear_then_add(L->completions, MASK, 1) // ++ read completions
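A simplified C11 sketch of the two-counter fair lock. Assumptions on my part: each 32-bit word packs a 16-bit reader count (low half) and writer count (high half), and the paper’s `fetch_clear_then_add` primitive (which also handles counter wraparound) is replaced by a plain fetch-and-add:

```c
#include <stdatomic.h>
#include <stdint.h>

/* requests/completions each pack: low 16 bits = readers, high 16 = writers. */
enum { READER_INC = 0x1, WRITER_INC = 0x10000, READER_MASK = 0xFFFF };

typedef struct {
    atomic_uint_fast32_t requests;
    atomic_uint_fast32_t completions;
} fair_rw_lock;

void fair_start_write(fair_rw_lock *L) {
    /* ++ write requests; wait for ALL previous readers and writers. */
    uint_fast32_t prev = atomic_fetch_add(&L->requests, WRITER_INC);
    while (atomic_load(&L->completions) != prev)
        ;
}

void fair_end_write(fair_rw_lock *L) {
    atomic_fetch_add(&L->completions, WRITER_INC);   /* ++ write completions */
}

void fair_start_read(fair_rw_lock *L) {
    /* ++ read requests; snapshot the count of previous WRITERS only. */
    uint_fast32_t prev_writers =
        atomic_fetch_add(&L->requests, READER_INC) & ~(uint_fast32_t)READER_MASK;
    while ((atomic_load(&L->completions) & ~(uint_fast32_t)READER_MASK)
           != prev_writers)
        ;
}

void fair_end_read(fair_rw_lock *L) {
    atomic_fetch_add(&L->completions, READER_INC);   /* ++ read completions */
}
```

Readers wait only for earlier writers, so concurrent readers still overlap; writers wait for everything earlier. That is the fairness condition restated on the Algorithm 4 slide.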
So far so good, but …
• Algorithm 2 and 3 spin on a shared memory location.
• What we want is for the algorithms to spin on processor local variables.
• Note – results weren’t presented for Algorithms 2 and 3. We can guess their performance, though, since we know the general characteristics of contention.
Algorithm 4 – Fair R/W Lock: Local-Only Spinning
• Fairness Algorithm
– read request granted when all previous write requests have completed
– write request granted when all previous read and write requests have completed
Lock and Local Data Layout
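The layout referred to here can be sketched as C structs (field names follow the slides and the paper; the exact types and widths are my assumptions):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef enum { READING, WRITING } req_class;          /* kind of request  */
typedef enum { NONE, READER, WRITER } succ_class;     /* kind of follower */

/* Per-request queue record; each processor spins on its OWN blocked flag. */
typedef struct qnode {
    req_class class;                     /* read or write?                  */
    _Atomic(struct qnode *) next;        /* successor in the queue          */
    /* "state" on the slides is the (blocked, successor_class) pair */
    atomic_bool blocked;                 /* local spin target               */
    _Atomic succ_class successor_class;  /* what kind of request follows us */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;          /* Lock.tail – last queued request         */
    atomic_uint reader_count;       /* Lock.reader_count – active readers      */
    _Atomic(qnode *) next_writer;   /* Lock.next_writer – first waiting writer */
} rw_qlock;
```

Keeping `blocked` in the requester’s own record, rather than in the shared lock word, is what makes all the spinning in Cases 1–4 local.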
Case 1: Just a Read
pred == nil
Lock.tail → I
Upon exit:
Lock.tail → I
Lock.reader_count == 1
Case 1: Exit Read
next == nil
Lock.tail → I, so the CAS returns true
Lock.reader_count == 1
Lock.next_writer == nil
Upon exit:
Lock.tail == nil
Lock.reader_count == 0
Case 2: Overlapping Reads
After the first read:
Lock.tail → I1
Lock.reader_count == 1
When the second read arrives, pred is not nil:
pred->class == reading
pred->state == [false, none]
Lock.reader_count == 2
Case 2: Overlapping Reads
After the 2nd read enters:
Lock.tail → I2
I1->next == I2
Case 2: Overlapping Reads
When I1 finishes: next != nil
When I2 finishes: Lock.tail = nil
reader_count goes to zero after I1 and I2 finish
Case 3: Read Overlaps Write
• The previous cases weren’t interesting, but they did help us get familiar with the data structures and (some of) the code.
• Now we need to consider the case where a “write” has started, but a read is requested. The read should block (spin) until the write completes.
• We need to “prove” that the spinning occurs on a locally cached memory location.
Case 3: Read Overlaps Write – The Write
Upon exit:
Lock.tail → I
Lock.next_writer = nil
I.class = writing, I.next = nil
I.blocked = false, I.successor_class = none
pred == nil, so blocked is reset to false
Case 3: Read Overlaps Write – The Read
pred->class == writing
wait here for the write to complete
Case 3: Read Overlaps Write – The Write Completes
I.next → the read
unlock the reader
Yes! This works, but it is “uncomfortable” because concerns aren’t separated
Case 3: What if there were more than 1 reader?
change the predecessor reader
wait here
Yes! Changed by the successor
unblock the successor
Case 4: Write Overlaps Read
• Overlapping reads form a chain
• The overlapping write “spins,” waiting for the read chain to complete
• Reads that “enter” after the write “enters,” but before the write completes (even while the write is “spinning”), form a chain following the write (as with case 3)
Case 4: Write Overlaps Read
wait here
Algorithm 5 – Reader Preference R/W Lock: Local-Only Spinning
• We’ll look at the Reader-Writer-Reader case and demonstrate that the second Reader completes before the Writer is signaled to start.
1st Reader
++reader_count
WAFLAG == 0, so no writer is active
The 1st reader just runs!
Overlapping Write
queue the write
Register writer interest; the result is not zero, since there is a reader
We have a reader, so the CAS fails.
The writer blocks here, waiting for a reader to set blocked = false
2nd Reader
Still no active writer
++reader_count
Reader Completes
Only the last reader will satisfy the equality
The last reader to complete will set WAFLAG and unblock the writer
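This Reader-Writer-Reader hand-off can be sketched as the decision logic over one counter word. Heavy assumptions on my part: the flag layout, the `rp_` names, and the elision of the spin loops and the queues of waiting processes; each function just reports whether the caller may proceed (or, for `rp_reader_exit`, whether it handed the lock to the writer):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 = WIFLAG (writer interested), bit 1 = WAFLAG (writer active),
 * bits 2+ = reader count. Layout is illustrative, not the paper's. */
enum { WIFLAG = 0x1, WAFLAG = 0x2, RP_RC_INC = 0x4 };

typedef atomic_uint_fast32_t rp_word;

/* Readers never wait for a merely *interested* writer, only an active one. */
bool rp_reader_enter(rp_word *c) {
    uint_fast32_t old = atomic_fetch_add(c, RP_RC_INC);  /* ++reader_count */
    return (old & WAFLAG) == 0;          /* WAFLAG == 0 -> reader just runs */
}

/* A writer registers interest; it may start at once only if the word held
 * no readers and no other writer (the CAS from the slides). */
bool rp_writer_enter(rp_word *c) {
    atomic_fetch_or(c, WIFLAG);
    uint_fast32_t expected = WIFLAG;
    return atomic_compare_exchange_strong(c, &expected, WAFLAG);
}

/* The LAST reader out converts WIFLAG to WAFLAG; this is the point where
 * the real algorithm clears the waiting writer's blocked flag. */
bool rp_reader_exit(rp_word *c) {
    uint_fast32_t old = atomic_fetch_sub(c, RP_RC_INC);
    if (old == (RP_RC_INC | WIFLAG)) {   /* we were the last reader */
        uint_fast32_t expected = WIFLAG;
        return atomic_compare_exchange_strong(c, &expected, WAFLAG);
    }
    return false;
}
```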
Algorithm 6 – Writer Preference R/W Lock: Local-Only Spinning
• We’ll look at the Writer-Reader-Writer case and demonstrate that the second Writer completes before the Reader is signaled to start.
1st Writer
“set_next_writer”
1st writer: writer interested or active
no readers, just the writer – the writer should run
blocked = false, so the writer starts
Reader
put the reader on the queue
“register” the reader; see if there are writers
wait here for the writer to complete
2nd Writer
queue this write behind the other write
and wait
Writer Completes
start the queued write
Last Writer Completes
clear the write flags; signal the readers
Unblock Readers
++reader_count; clear “readers interested”
no writers waiting or active
empty the “waiting” reader list
when this reader continues, it will unblock the “next” reader – which will unblock the “next” reader, etc.; the reader count gets bumped
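The Writer-Reader-Writer hand-off can be sketched the same way (again my assumptions: the `wp_` names, a single word plus a waiting-writer counter standing in for the paper’s explicit reader and writer queues, and spin loops elided):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 = "writer interested or active"; bits 1+ = active reader count. */
enum { WP_WFLAG = 0x1, WP_RC_INC = 0x2 };

typedef struct {
    atomic_uint_fast32_t counts;
    atomic_uint waiting_writers;   /* writers queued behind the active one */
} wp_lock;

/* Readers defer to ANY interested or active writer (writer preference). */
bool wp_reader_enter(wp_lock *L) {
    uint_fast32_t old = atomic_fetch_add(&L->counts, WP_RC_INC);
    if (old & WP_WFLAG) {                      /* writer present: back out */
        atomic_fetch_sub(&L->counts, WP_RC_INC);
        return false;                          /* would go on reader queue */
    }
    return true;
}

/* The first writer claims the flag; later writers queue behind it. */
bool wp_writer_enter(wp_lock *L) {
    uint_fast32_t old = atomic_fetch_or(&L->counts, WP_WFLAG);
    if (old & WP_WFLAG) {
        atomic_fetch_add(&L->waiting_writers, 1);
        return false;
    }
    return old == 0;            /* run at once only if no active readers */
}

/* A finishing writer starts the next queued write if any; only the LAST
 * writer clears the flag, which is when the waiting readers are released. */
bool wp_writer_exit(wp_lock *L) {
    if (atomic_load(&L->waiting_writers) > 0) {
        atomic_fetch_sub(&L->waiting_writers, 1);
        return true;                     /* flag stays set for successor */
    }
    atomic_fetch_and(&L->counts, ~(uint_fast32_t)WP_WFLAG);
    return false;
}
```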
Results & Conclusion
• The authors reported results for a different algorithm than was presented here.
• The algorithms used for comparison were more costly in a multiprocessor environment, so they’re claiming that the algorithms presented here would be “better.”
Timing Results
Latency is costly because of the number of atomic operations.