Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems
Håkan Sundell
Philippas Tsigas
Outline
• Synchronization Methods
• Priority Queues
• Concurrent Priority Queues
• Lock-Free Algorithm: Problems and Solutions
• Experiments
• Conclusions
Synchronization
• Shared data structures need synchronization
• Synchronization using locks
  • Mutually exclusive access to the whole or parts of the data structure
[Figure: processes P1, P2 and P3]
Blocking Synchronization
• Drawbacks
  • Blocking
  • Priority inversion
  • Risk of deadlock
• Locks: semaphores, spinning, disabling interrupts, etc.
  • Reduced efficiency because of reduced parallelism
Non-blocking Synchronization
• Lock-free synchronization
  • Optimistic approach
    • Assumes it is alone and prepares an operation that later takes effect (unless interfered with) in one atomic step, using hardware atomic primitives
    • Interference is detected via shared memory and the atomic primitives
    • Retries until no other operation interferes
    • Can cause starvation
Non-blocking Synchronization
• Lock-free synchronization
  • Avoids problems with locks
  • Simple algorithms
  • Fast under low contention
• Wait-free synchronization
  • Always finishes in a finite number of its own steps
    • Complex algorithms
    • Memory consuming
    • Less efficient on average than lock-free
Priority Queues
• Fundamental data structure
• Works on a set of <value, priority> pairs
• Two basic operations:
  • Insert(v,p): adds a new element to the priority queue
  • v=DeleteMin(): removes the element <v,p> with the highest priority (that is, the minimum p)
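The two operations can be sketched sequentially as follows. This is only an illustration of the interface, not the concurrent algorithm from these slides: Python's `heapq` stands in for the queue, and the class name `SeqPriorityQueue` and its method names are assumptions made for the example. A smaller `p` is treated as a higher priority, in line with the name DeleteMin.

```python
import heapq
import itertools


class SeqPriorityQueue:
    """Sequential priority queue of <value, priority> pairs (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker so values are never compared

    def insert(self, v, p):
        # Insert(v, p): add a new element with priority p.
        heapq.heappush(self._heap, (p, next(self._order), v))

    def delete_min(self):
        # v = DeleteMin(): remove and return the value with the smallest p.
        if not self._heap:
            return None
        _, _, v = heapq.heappop(self._heap)
        return v


pq = SeqPriorityQueue()
pq.insert("write report", 3)
pq.insert("fix outage", 1)
pq.insert("review patch", 2)
first = pq.delete_min()
second = pq.delete_min()
```

Here `first` is the value inserted with priority 1 and `second` the one with priority 2.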
Sequential Priority Queues
• All implementations involve a search phase in either Insert or DeleteMin
  • Arrays: maximum complexity O(N)
  • Ordered lists: O(N)
  • Trees: O(log N)
    • Heaps: O(log N)
  • Advanced structures (i.e. combinations)
Randomized Algorithm: Skip Lists
• William Pugh: "Skip Lists: A Probabilistic Alternative to Balanced Trees", 1990
• Layers of ordered lists with different densities achieve a tree-like behavior
• Time complexity: O(log₂ N), probabilistic!
[Figure: skip list from Head to Tail with nodes 1 to 7; levels are populated with 50%, 25%, … of the nodes]
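A minimal sequential sketch of the structure, assuming a fixed height cap `MAX_LEVEL` (not taken from the slides) and a coin flip per level so that about 50% of the nodes have height 1, 25% height 2, and so on. It has no concurrency control; `SkipList`, `Node`, and `random_height` are names chosen for this example.

```python
import random

MAX_LEVEL = 16  # assumed cap on node height, not taken from the slides


def random_height(rng=random):
    """Height 1 with probability 50%, height 2 with 25%, and so on."""
    height = 1
    while height < MAX_LEVEL and rng.random() < 0.5:
        height += 1
    return height


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # one forward pointer per level


class SkipList:
    """Sequential skip list used as a priority queue (no concurrency control)."""

    def __init__(self):
        self.head = Node(float("-inf"), MAX_LEVEL)

    def insert(self, key):
        # Walk down from the top level, remembering where each level was left.
        preds = [self.head] * MAX_LEVEL
        node = self.head
        for level in reversed(range(MAX_LEVEL)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        new = Node(key, random_height())
        for level in range(len(new.next)):  # link from the lowest level upwards
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new

    def delete_min(self):
        first = self.head.next[0]
        if first is None:
            return None
        # The minimum is the head's successor on every level it occupies.
        for level in range(len(first.next)):
            self.head.next[level] = first.next[level]
        return first.key
```

Because the minimum is always the first node on the lowest level, `delete_min` needs no search; only `insert` pays the expected logarithmic search cost.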
Why Skip Lists for Concurrent Priority Queues?
• Ordered lists are simpler than trees
  • Easier to make efficient concurrently
• Search complexity is important
  • Skip lists are an alternative to trees
• Lotan and Shavit: "Skiplist-Based Concurrent Priority Queues", 2000
  • Implementation using multiple locks
[Figure: skip list with nodes 1 to 7 and a lock (L) at every level of every node]
Our Lock-Free Concurrent Skip List
• Define node state to depend on the insertion status at the lowest level as well as a deletion flag
• Insert from the lowest level going upwards
• Set deletion flag. Delete from the highest level going downwards
[Figure: skip list with nodes 1 to 7, each with a deletion flag (D); a node p with levels 1 to 3, shown without and with its D flag set]
Overlapping operations on shared data
• Example: Insert operation
  • Which of 2 or 3 gets inserted?
• Solution: Compare-And-Swap atomic primitive:

CAS(p: pointer to word, old: word, new: word): boolean
  atomic do
    if *p = old then
      *p := new; return true;
    else
      return false;

[Figure: nodes 1 and 4 in the list, with concurrent operations Insert 2 and Insert 3 between them]
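The race between Insert 2 and Insert 3 can be replayed step by step. The sketch below is an emulation under a stated assumption: Python exposes no hardware CAS, so the hypothetical `AtomicRef` class uses a private lock to make the compare and the write one indivisible step. The retry loop in `insert` is the optimistic pattern: search, prepare, publish with a single CAS, and start over if the CAS fails.

```python
import threading


class AtomicRef:
    """One shared word with compare-and-swap (CAS).

    Assumption: a private lock emulates the single atomic hardware step.
    """

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:              # plays the role of "atomic do"
            if self._value is old:    # if *p = old
                self._value = new     # then *p := new
                return True
            return False


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = AtomicRef(nxt)


def insert(head, key):
    """Search, prepare the new node, publish it with one CAS, retry on failure."""
    while True:
        pred, succ = head, head.next.load()
        while succ is not None and succ.key < key:
            pred, succ = succ, succ.next.load()
        if pred.next.cas(succ, Node(key, succ)):
            return


def keys(head):
    out, node = [], head
    while node is not None:
        out.append(node.key)
        node = node.next.load()
    return out


n4 = Node(4)
n1 = Node(1, n4)

# Both inserters read node 1's successor before either one writes.
seen_by_a = n1.next.load()
seen_by_b = n1.next.load()
ok_a = n1.next.cas(seen_by_a, Node(2, seen_by_a))  # wins: node 1 still points to 4
ok_b = n1.next.cas(seen_by_b, Node(3, seen_by_b))  # loses: node 1 now points to 2

insert(n1, 3)  # the loser retries from the search phase
```

Exactly one of the two overlapping CAS attempts succeeds, and the retry places node 3 after node 2, so no insertion is lost.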
Dynamic Memory Management
• Problem: System memory allocation functionality is blocking!
• Solution (lock-free), IBM freelists:
  • Pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS
[Figure: freelist Head, Mem 1, Mem 2, …, Mem n; Allocate takes a node from the head, Reclaim returns a used node]
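A freelist of this kind can be sketched as a stack whose head is updated only through CAS. The code is an emulation, not the IBM implementation: `AtomicRef` uses a lock to stand in for the hardware primitive, and `FreeList`, `MemNode`, `allocate`, and `reclaim` are names assumed for the example.

```python
import threading


class AtomicRef:
    """Shared word with an emulated CAS (a lock stands in for the hardware step)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value is old:
                self._value = new
                return True
            return False


class MemNode:
    def __init__(self, ident):
        self.ident = ident
        self.next_free = None


class FreeList:
    def __init__(self, count):
        self.head = AtomicRef(None)
        for ident in range(count, 0, -1):  # pre-allocate Mem n .. Mem 1
            self.reclaim(MemNode(ident))   # Mem 1 ends up at the head

    def allocate(self):
        while True:
            node = self.head.load()
            if node is None:
                return None                # pool exhausted
            if self.head.cas(node, node.next_free):
                return node                # popped from the stack

    def reclaim(self, node):
        while True:
            old = self.head.load()
            node.next_free = old
            if self.head.cas(old, node):
                return                     # pushed back onto the stack


pool = FreeList(3)
a = pool.allocate()
b = pool.allocate()
pool.reclaim(a)
```

In a language with manual memory reuse, the plain pop in `allocate` is exposed to the ABA problem that the slides discuss further on, which is why the complete scheme pairs the freelist with reference counting.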
Concurrent Insert vs. Delete operations
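• Problem:
  • Both nodes are deleted! (the node inserted next to the deleted node is lost together with it)
• Solution (Harris et al): Use bit 0 of the pointer to mark deletion status
[Figure: list 1, 2, 4 with a concurrent Delete of node 2 and Insert of node 3, steps a) and b); a second panel marks node 2 with * and adds step c)]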
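The effect of the mark can be replayed with a pointer and a mark bit that are compared and swapped together. This is an emulation under assumptions: in C the mark would occupy bit 0 of the pointer word, while the hypothetical `MarkableRef` below keeps a (reference, mark) pair behind a lock.

```python
import threading


class MarkableRef:
    """Pointer plus a one-bit deletion mark, updated together by one CAS.

    Assumption: a lock-guarded (reference, mark) pair emulates a single
    pointer word whose bit 0 carries the mark.
    """

    def __init__(self, ref=None, mark=False):
        self._ref, self._mark = ref, mark
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._ref, self._mark

    def cas(self, old_ref, old_mark, new_ref, new_mark):
        with self._lock:
            if self._ref is old_ref and self._mark == old_mark:
                self._ref, self._mark = new_ref, new_mark
                return True
            return False


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = MarkableRef(nxt)


n4 = Node(4)
n2 = Node(2, n4)
n1 = Node(1, n2)

# Inserter reads node 2's successor, intending to link 3 between 2 and 4.
succ, _ = n2.next.get()
n3 = Node(3, succ)

# Deleter, step 1: logically delete node 2 by marking its next pointer.
marked = n2.next.cas(n4, False, n4, True)

# The inserter's CAS expects an unmarked pointer to 4, so it fails
# instead of attaching node 3 to a node that is about to disappear.
inserted = n2.next.cas(succ, False, n3, False)

# Deleter, step 2: physically unlink node 2 from node 1.
unlinked = n1.next.cas(n2, False, n4, False)
```

After the failed CAS the inserter would search again and link node 3 between nodes 1 and 4.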
The ABA problem
• Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node, yet the CAS still succeeds!
[Figure: Step 1 with nodes 1, 7, 6 and 4; Step 2 with nodes 2, 7, 3 and 4]
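The problem can be reproduced deterministically on a small stack. This is a hypothetical example, not the figure from the slide: integer "addresses" in a dictionary model reusable memory, and `AtomicWord` emulates CAS with a lock.

```python
import threading


class AtomicWord:
    """Shared word with an emulated CAS (a lock stands in for the hardware step)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False


# "Memory": address -> address of the next node (0 means null).
A, B, C = 1, 2, 3
next_of = {A: B, B: C, C: 0}
head = AtomicWord(A)                 # stack: A -> B -> C

# Thread 1 starts a pop: it reads the head and the head's successor ...
t1_head = head.load()
t1_next = next_of[t1_head]
# ... and is pre-empted here.

# Thread 2 pops A and B, then pushes A back (address A is reused).
head.cas(A, next_of[A])              # pop A, head = B
head.cas(B, next_of[B])              # pop B, head = C; B is now free
next_of[A] = head.load()
head.cas(next_of[A], A)              # push A, stack: A -> C

# Thread 1 resumes. The head holds the same address A, so its CAS succeeds,
# but it installs B, which is no longer part of the stack.
aba_succeeded = head.cas(t1_head, t1_next)
```

The final CAS compares only the address, so it cannot tell that the stack changed twice in between; the head ends up pointing at a freed node.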
The ABA problem
• Solution (Valois et al): Add reference counting to each node, to prevent nodes that are of interest to some thread from being reclaimed until all threads have left the node
[Figure: New Step 2, with reference counts on the nodes: the CAS fails!]
Helping Scheme
• Threads need to traverse safely
• Need to remove marked-to-be-deleted nodes while traversing: help!
• Finds the previous node, finishes the deletion, and continues traversing from the previous node
[Figure: list 1, 2, 4 with node 2 marked (*) for deletion, showing the alternative traversal paths]
Back-Off Strategy
• For pre-emptive systems, helping is necessary for efficiency and lock-freeness
• For truly concurrent systems, overlapping CAS operations (caused by helping and others) on the same node can cause heavy contention
• Solution: For every failed CAS attempt, back off (i.e. sleep) for a certain duration, which increases exponentially
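The back-off rule can be sketched around any CAS retry loop. The example below is an illustration under assumptions: the CAS is emulated with a lock in the hypothetical `AtomicInt` class, and the starting delay `base` and the upper bound `cap` are values picked for the example, not parameters from the slides.

```python
import threading
import time


class AtomicInt:
    """Shared counter with an emulated CAS (a lock stands in for the hardware step)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False


def backoff_delays(base, cap):
    """Yield base, 2*base, 4*base, ... without exceeding cap (the cap is an assumption)."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


def increment(counter, base=1e-6, cap=1e-3):
    """CAS loop that sleeps after each failed attempt, doubling the sleep each time."""
    delays = backoff_delays(base, cap)
    while True:
        old = counter.load()
        if counter.cas(old, old + 1):
            return
        time.sleep(next(delays))


counter = AtomicInt()
threads = [
    threading.Thread(target=lambda: [increment(counter) for _ in range(200)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each operation starts again from the shortest delay, so an uncontended CAS pays nothing, while threads that keep colliding spread their retries out over time.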
Our Lock-Free Algorithm
• Based on skip lists
  • Treated as layers of ordered lists
• Uses the CAS atomic primitive
• Lock-free memory management
  • IBM freelists
  • Reference counting
• Helping scheme
• Back-off strategy
• All together proved to be linearizable
Experiments
• 1-30 threads on platforms with different levels of real concurrency
• 10000 operations by each thread, mixing Insert and DeleteMin; 100 or 1000 initial inserts
• Compare with other implementations:
  • Lotan and Shavit, 2000
  • Hunt et al, "An Efficient Algorithm for Concurrent Priority Queue Heaps", 1996
Full Concurrency
Medium Pre-emption
High Pre-emption
Conclusions
• Our work includes a real-time extension of the algorithm, using time-stamps and a time-stamp recycling scheme
• Our lock-free algorithm is suitable for both pre-emptive systems and systems with full concurrency
  • Will be available as part of the NOBLE software library, http://www.noble-library.org
• See the Technical Report for full details, http://www.cs.chalmers.se/~phs
Questions?
• Contact Information:
  • Address: Håkan Sundell and Philippas Tsigas, Computing Science, Chalmers University of Technology
  • Email: <phs, tsigas> @ cs.chalmers.se
  • Web: http://www.cs.chalmers.se/~phs/warp
Semaphores
Back-off spinlocks
Jones Skew-Heap
The algorithm in more detail
• Insert:
  1. Create node with random height
  2. Search position (remember drops)
  3. Insert or update on level 1
  4. Insert on level 2 to top (unless already deleted)
  5. If deleted then HelpDelete(1)
• All of this while keeping track of references, helping deleted nodes, etc.
The algorithm in more detail
• DeleteMin:
  1. Mark first node at level 1 as deleted, otherwise HelpDelete(1) and retry
  2. Mark next pointers on level 1 to top
  3. Delete on level top to 1 while detecting helping, indicate success
  4. Free node
• All of this while keeping track of references, helping deleted nodes, etc.
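Step 1 of DeleteMin, claiming the first node by setting its deletion flag with CAS, can be isolated in a small sketch. It is a heavy simplification and an assumption-laden emulation: only the level-1 list exists, nodes are never unlinked or freed, a thread that loses the race simply moves on instead of calling HelpDelete, and the hypothetical `AtomicFlag` emulates CAS with a lock.

```python
import threading


class AtomicFlag:
    """Deletion flag with an emulated CAS (a lock stands in for the hardware step)."""

    def __init__(self):
        self._set = False
        self._lock = threading.Lock()

    def cas(self, old, new):
        with self._lock:
            if self._set == old:
                self._set = new
                return True
            return False


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.deleted = AtomicFlag()
        self.next = nxt


def delete_min(head):
    """Claim the first node whose flag this thread manages to set."""
    node = head.next
    while node is not None:
        if node.deleted.cas(False, True):
            return node.key      # this thread owns the node
        node = node.next         # already claimed; the full algorithm would HelpDelete here
    return None


# Build the level-1 list 0, 1, ..., 99 behind a sentinel head.
head = Node(None)
for key in reversed(range(100)):
    head.next = Node(key, head.next)

results = [[] for _ in range(4)]


def worker(out):
    while True:
        key = delete_min(head)
        if key is None:
            return
        out.append(key)


threads = [threading.Thread(target=worker, args=(out,)) for out in results]
for t in threads:
    t.start()
for t in threads:
    t.join()
claimed = sorted(k for out in results for k in out)
```

Because the flag changes from unset to set exactly once, every key is returned to exactly one thread, no matter how the four workers interleave.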
The algorithm in more detail
• HelpDelete(level):
  1. Mark next pointer at level to top
  2. Find previous node (info in node)
  3. Delete on level unless already helped, indicate success
  4. Return previous node
• All of this while keeping track of references, helping deleted nodes, etc.
Correctness
• Linearizability (Herlihy 1991)
  • For an implementation to be linearizable, for every concurrent execution there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution
Correctness
• Define precise sequential semantics
• Define abstract state and its interpretation
  • Show that state is atomically updated
• Define linearizability points
  • Show that operations take effect atomically at these points with respect to sequential semantics
• Creates a total order using the linearizability points that respects the partial order
  • The algorithm is linearizable
Correctness
• Lock-freeness
  • At least one operation should always make progress
• There are no cyclic loop dependencies, and all potentially unbounded loops are "gate-kept" by CAS operations
  • The CAS operation guarantees that at least one CAS will always succeed
    • The algorithm is lock-free
Real-Time extension
• DeleteMin operations should ignore nodes that are inserted after the DeleteMin operation started
  • Nodes are inserted together with a timestamp
• Because timestamps are only used for relative comparisons, there is no need for a real-time clock
  • Generate timestamps with an increasing function
Real-Time extension
• Timestamps are potentially unbounded and will overflow
  • Recycle "wrapped-over" timestamp values by having TagFieldSize=MaxTag*2
• Timestamps at nodes can stay forever (MaxTag => unlimited)
  • Every operation traverses one step through the skip list and updates "too old" timestamps
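One way to compare recycled timestamps is modular arithmetic over a field twice as large as the maximum age. The sketch below is an assumption about how such a comparison could work, not the scheme from the Technical Report: `MAX_TAG` is given a small illustrative value, and `is_newer` is only meaningful while the two timestamps are fewer than `MAX_TAG` ticks apart, which is what refreshing "too old" timestamps is meant to ensure.

```python
MAX_TAG = 8                    # assumed maximum age difference between live timestamps
TAG_FIELD_SIZE = MAX_TAG * 2   # TagFieldSize = MaxTag * 2, as on the slide


def next_timestamp(t):
    """Increasing function with wrap-around; no real-time clock is needed."""
    return (t + 1) % TAG_FIELD_SIZE


def is_newer(a, b):
    """True if a was generated after b, assuming they are fewer than MAX_TAG ticks apart."""
    diff = (a - b) % TAG_FIELD_SIZE
    return 0 < diff < MAX_TAG


# A DeleteMin that started at time 14 should ignore a node inserted three ticks later,
# even though the later timestamp has wrapped around to a smaller number.
start = 14
inserted_at = next_timestamp(next_timestamp(next_timestamp(start)))
ignore_node = is_newer(inserted_at, start)
```

The field is twice the window size so that "newer" and "older" differences fall into separate halves of the value range and never overlap.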