Non-blocking Data Structures for High-Performance Computing
Håkan Sundell, PhD
5 August 2005, EPCC 2005
Outline
- Shared Memory
- Synchronization Methods
- Memory Management
- Shared Data Structures
  • Dictionary
- Performance
- Conclusions

Shared Memory
[Figure: two shared-memory layouts built from CPUs, caches, cache buses and memory modules]
- Uniform Memory Access (UMA)
- Non-Uniform Memory Access (NUMA)

Synchronization
- Shared data structures need synchronization!
- Accesses and updates must be coordinated to establish consistency.
[Figure: processes P1, P2 and P3]

Hardware Synchronization Primitives
- Consensus number 1
  • Atomic Read/Write
- Consensus number 2
  • Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
- Consensus number infinite
  • Atomic Compare-And-Swap (CAS)
  • Atomic Load-Linked/Store-Conditional
[Figure: Read, Write and Read-Modify-Write, M=f(M,…)]
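As a rough illustration, the primitives listed above map onto C11 `<stdatomic.h>` calls as sketched here. The wrapper names `tas`, `faa`, `swap`, `cas` and `primitives_demo` are assumptions made for this example and do not come from the talk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Test-And-Set: sets the flag and returns its previous value. */
static bool tas(atomic_flag *f) { return atomic_flag_test_and_set(f); }

/* Fetch-And-Add: returns the value held before the addition. */
static int faa(atomic_int *m, int delta) { return atomic_fetch_add(m, delta); }

/* Swap: stores v and returns the value that was replaced. */
static int swap(atomic_int *m, int v) { return atomic_exchange(m, v); }

/* Compare-And-Swap: writes new_v only if *m still equals old_v. */
static bool cas(atomic_int *m, int old_v, int new_v) {
    return atomic_compare_exchange_strong(m, &old_v, new_v);
}

/* Hypothetical single-threaded walk-through; returns the final value of m. */
int primitives_demo(void) {
    atomic_flag f = ATOMIC_FLAG_INIT;
    atomic_int m;
    atomic_init(&m, 10);
    assert(!tas(&f));           /* flag was clear, is now set */
    assert(tas(&f));            /* second call sees it set */
    assert(faa(&m, 5) == 10);   /* m: 10 -> 15 */
    assert(swap(&m, 7) == 15);  /* m: 15 -> 7 */
    assert(!cas(&m, 15, 1));    /* stale expected value: nothing written */
    assert(cas(&m, 7, 8));      /* m: 7 -> 8 */
    return atomic_load(&m);
}
```

Each wrapper is one read-modify-write step of the form M=f(M,…); only CAS makes the write conditional on the value read.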

Mutual Exclusion
- Access to shared data will be atomic because of the lock
- Reduced parallelism by definition
- Blocking, with danger of priority inversion and deadlocks
  • Solutions exist, but with high overhead, especially for multi-processor systems
[Figure: processes P1, P2 and P3]
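For comparison with the non-blocking methods, a minimal Test-And-Set spin lock might look like the sketch below. The names `spin_acquire`, `spin_release` and `locked_increment` are hypothetical, and the code is only exercised from a single thread here.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag spin = ATOMIC_FLAG_INIT;
static long shared_counter = 0;   /* protected by spin */

static void spin_acquire(void) {
    /* Loop until the previous flag value was clear; waiting threads block here. */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire)) { }
}

static void spin_release(void) {
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

/* Critical section: only one thread at a time can be between acquire and release. */
long locked_increment(void) {
    spin_acquire();
    long v = ++shared_counter;
    spin_release();
    assert(v > 0);
    return v;
}
```

If the lock holder is preempted between `spin_acquire` and `spin_release`, every other thread spins without making progress, which is the blocking behavior described above.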

Non-blocking Synchronization
- Perform operations/changes using atomic primitives
- Lock-Free Synchronization
  • Optimistic approach
    • Retries until succeeding
  • Guarantees progress of at least one operation
- Wait-Free Synchronization
  • Always finishes in a finite number of its own steps
    • Coordination with all participants
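The optimistic lock-free pattern can be sketched as a CAS retry loop. The operation chosen here (raising a shared value to at least `v`) and the names `atomic_store_max` and `store_max_demo` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

/* Lock-free update: raise *m to at least v using a CAS retry loop. */
void atomic_store_max(atomic_int *m, int v) {
    int seen = atomic_load(m);
    while (seen < v) {
        /* Succeeds only if *m still equals seen. On failure, seen is
           refreshed with the current value and the loop retries. */
        if (atomic_compare_exchange_strong(m, &seen, v)) break;
    }
}

/* Hypothetical single-threaded demonstration. */
int store_max_demo(void) {
    atomic_int m;
    atomic_init(&m, 3);
    atomic_store_max(&m, 10);   /* 3 -> 10 */
    atomic_store_max(&m, 5);    /* no change, 10 >= 5 */
    assert(atomic_load(&m) >= 5);
    return atomic_load(&m);
}
```

A failed strong CAS means some other thread changed the value in the meantime, so at least one operation made progress. That is the lock-free guarantee; it is not wait-free, since a single thread may in principle retry without bound.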

Memory Management
- Dynamic data structures need dynamic memory management
- Concurrent data structures need concurrent memory management!

Concurrent Memory Management
- Concurrent Memory Allocation
  • i.e. malloc/free functionality
- Concurrent Garbage Collection
- Questions (among many):
  • When to re-use memory?
  • How to de-reference pointers safely?
[Figure: processes P2, P1 and P3]

Lock-Free Memory Management
- Memory Allocation
  • Valois 1995: fixed block-size, fixed purpose
  • Michael 2004, Gidenstam et al. 2004: any size, any purpose
- Garbage Collection
  • Valois 1995, Detlefs et al. 2001: reference counting
  • Michael 2002, Herlihy et al. 2002: hazard pointers
  • Gidenstam, Papatriantafilou, Sundell and Tsigas 2005: hazard pointers + reference counting

Lock-Free Reference Counting
- De-referencing links
  1. Read the link contents, i.e. a pointer.
  2. Increment (FAA) the reference count on the corresponding object.
- What if the link is changed between steps 1 and 2?
- Solution by Detlefs et al.:
  • Use DCAS in step 2, operating on two arbitrary memory words. Retries if the link is found changed after step 2.
- Solution by Valois et al.:
  • The reference count field is present indefinitely.
  • Decrements the reference count and retries if the link is found changed after step 2.
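A minimal sketch of the Valois-style de-reference follows, assuming nodes are only ever recycled (never returned to the operating system) so that the count field stays readable. The names `node_t`, `deref_link`, `release_ref` and `refcount_demo` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    atomic_int refs;   /* assumed to stay valid indefinitely (nodes are recycled) */
    int value;
} node_t;

/* Drop one reference; returns 1 if it was the last one. A full scheme
   would hand the node back to a free list in that case. */
int release_ref(node_t *n) {
    return atomic_fetch_sub(&n->refs, 1) == 1;
}

/* Safe de-reference of a shared link. */
node_t *deref_link(_Atomic(node_t *) *link) {
    for (;;) {
        node_t *n = atomic_load(link);          /* step 1: read the link */
        if (n == NULL) return NULL;
        atomic_fetch_add(&n->refs, 1);          /* step 2: FAA on the count */
        if (atomic_load(link) == n) return n;   /* link unchanged: keep the reference */
        release_ref(n);                         /* link changed: undo and retry */
    }
}

/* Hypothetical single-threaded demonstration. */
int refcount_demo(void) {
    node_t a;
    atomic_init(&a.refs, 1);   /* one reference held by the link itself */
    a.value = 42;
    _Atomic(node_t *) link;
    atomic_init(&link, &a);
    node_t *n = deref_link(&link);
    assert(n == &a && n->value == 42);
    int after_deref = atomic_load(&a.refs);           /* 2 */
    release_ref(n);
    return after_deref * 10 + atomic_load(&a.refs);   /* 2 and 1 */
}
```

The increment in step 2 may hit a node that has already been removed, which is why the sketch depends on the count field remaining present.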

Lock-Free Hazard Pointers (Michael 2002)
- De-referencing links
  1. Read the link contents, i.e. a pointer.
  2. Set a hazard pointer to the read pointer value.
  3. Read the link contents again; if not the same as in step 1, restart from step 1.
- Deletion
  • After a node is deleted from the data structure, put it on a local list.
  • When the local list reaches a certain size, scan all hazard pointers globally and reclaim the memory of all nodes whose address does not match any scanned hazard pointer.
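The de-reference and deletion steps above can be sketched as follows, assuming one hazard pointer per thread. `MAX_THREADS`, `RETIRE_LIMIT`, `thread_ctx`, `protect`, `retire` and `scan` are names invented for this example, and two "threads" are simulated from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_THREADS 2    /* hypothetical: one hazard pointer per thread */
#define RETIRE_LIMIT 3   /* scan threshold, must exceed MAX_THREADS */

typedef struct node { int value; } node_t;

static _Atomic(node_t *) hazard[MAX_THREADS];   /* readable by all threads */

typedef struct {
    int id;                          /* index of this thread's hazard pointer */
    node_t *retired[RETIRE_LIMIT];   /* local list of removed nodes */
    int count;
    int reclaimed;                   /* statistic used by the demo */
} thread_ctx;

/* De-reference a link following steps 1 to 3. */
node_t *protect(thread_ctx *t, _Atomic(node_t *) *link) {
    for (;;) {
        node_t *n = atomic_load(link);           /* step 1 */
        atomic_store(&hazard[t->id], n);         /* step 2 */
        if (atomic_load(link) == n) return n;    /* step 3 */
    }
}

void unprotect(thread_ctx *t) { atomic_store(&hazard[t->id], NULL); }

/* Free every retired node that no hazard pointer refers to. */
void scan(thread_ctx *t) {
    int kept = 0;
    for (int i = 0; i < t->count; i++) {
        node_t *n = t->retired[i];
        int in_use = 0;
        for (int h = 0; h < MAX_THREADS; h++)
            if (atomic_load(&hazard[h]) == n) in_use = 1;
        if (in_use) t->retired[kept++] = n;
        else { free(n); t->reclaimed++; }
    }
    t->count = kept;
}

/* Call only after n has been removed from the data structure. */
void retire(thread_ctx *t, node_t *n) {
    assert(t->count < RETIRE_LIMIT);
    t->retired[t->count++] = n;
    if (t->count >= RETIRE_LIMIT) scan(t);
}

int hazard_demo(void) {
    thread_ctx t0 = { .id = 0 }, t1 = { .id = 1 };
    node_t *a = malloc(sizeof *a), *b = malloc(sizeof *b), *c = malloc(sizeof *c);
    assert(a && b && c);
    _Atomic(node_t *) link;
    atomic_init(&link, a);
    node_t *p = protect(&t1, &link);   /* "thread 1" protects a */
    assert(p == a);
    atomic_store(&link, NULL);         /* "thread 0" removes a from the structure */
    retire(&t0, a); retire(&t0, b); retire(&t0, c);   /* third retire triggers a scan */
    int first = t0.reclaimed;          /* b and c freed, a still protected */
    unprotect(&t1);
    scan(&t0);                         /* a can be freed now */
    return first * 10 + t0.reclaimed;
}
```

The default sequentially consistent atomics are used on purpose: the hazard pointer store in step 2 must be visible before the link is re-read in step 3.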

Lock-Free Memory Allocation
- Solution (lock-free), IBM freelists: create a linked list of the free nodes, allocate/reclaim using CAS
- Needs some mechanism to avoid the ABA problem.
[Figure: free list Head, Mem 1, Mem 2, …, Mem n, with Allocate and Reclaim operations and a block Used 1 in use]
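One common way to avoid the ABA problem is to pair the list head with a version tag that changes on every update. The sketch below assumes a fixed pool addressed by 32-bit indices and a 64-bit CAS; the names `pool_init`, `pool_alloc`, `pool_free` and the pool size are hypothetical, and this is not necessarily the mechanism used in the talk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 4        /* hypothetical number of blocks */
#define NIL UINT32_MAX     /* "no block" index */

static _Atomic(uint32_t) next_free[POOL_SIZE];   /* link field of each block */
static _Atomic(uint64_t) head;                   /* (tag << 32) | index of first free block */

static uint64_t pack(uint32_t tag, uint32_t idx) {
    return ((uint64_t)tag << 32) | idx;
}

void pool_init(void) {
    for (uint32_t i = 0; i < POOL_SIZE; i++)
        atomic_store(&next_free[i], i + 1 < POOL_SIZE ? i + 1 : NIL);
    atomic_store(&head, pack(0, 0));
}

/* Allocate: pop the first free block, or return NIL if the list is empty. */
uint32_t pool_alloc(void) {
    uint64_t old = atomic_load(&head);
    for (;;) {
        uint32_t idx = (uint32_t)old;
        if (idx == NIL) return NIL;
        uint32_t next = atomic_load(&next_free[idx]);
        uint64_t new_head = pack((uint32_t)(old >> 32) + 1, next);
        /* The tag makes the CAS fail if the head changed and changed back. */
        if (atomic_compare_exchange_strong(&head, &old, new_head)) return idx;
    }
}

/* Reclaim: push block idx back onto the list. */
void pool_free(uint32_t idx) {
    uint64_t old = atomic_load(&head);
    for (;;) {
        atomic_store(&next_free[idx], (uint32_t)old);
        uint64_t new_head = pack((uint32_t)(old >> 32) + 1, idx);
        if (atomic_compare_exchange_strong(&head, &old, new_head)) return;
    }
}

int pool_demo(void) {
    pool_init();
    uint32_t a = pool_alloc();
    uint32_t b = pool_alloc();
    pool_free(a);
    uint32_t c = pool_alloc();   /* the most recently reclaimed block */
    assert(a == 0 && b == 1 && c == 0);
    return (int)(atomic_load(&head) >> 32);   /* tag = number of successful updates */
}
```

Without the tag, a thread that read the head index and its successor could be delayed while the block is allocated and reclaimed again, and its CAS would then install a stale successor.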

Shared Data Structure: Dictionaries (Sets)
- Fundamental data structure
- Works on a set of <key,value> pairs
- Three basic operations:
  • Insert(k,v): Adds a new item
  • v=FindKey(k): Finds the item <k,v>
  • v=DeleteKey(k): Finds and removes the item <k,v>

Randomized Algorithm: Skip Lists
- William Pugh: “Skip Lists: A Probabilistic Alternative to Balanced Trees”, 1990
- Layers of ordered lists with different densities, achieving a tree-like behavior
- Time complexity: O(log_2 N), probabilistic!
[Figure: skip list from Head through nodes 1 to 7 to Tail, with level densities 50%, 25%, …]
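The 50%, 25%, … densities come from choosing each node's height at random. A minimal sketch follows, assuming a promotion probability of 1/2 per level and a maximum level of 10; the xorshift generator, the seed and the names `random_level` and `count_level2` are arbitrary choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVEL 10   /* assumed cap on the node height */

/* xorshift32 pseudo-random generator; the state must be nonzero. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Every node is on level 1; each further level is reached with probability 1/2. */
int random_level(uint32_t *seed) {
    int level = 1;
    while (level < MAX_LEVEL && (xorshift32(seed) & 1u)) level++;
    return level;
}

/* Counts how many of n generated heights reach at least level 2. */
int count_level2(int n) {
    uint32_t seed = 2005u;
    int hits = 0;
    for (int i = 0; i < n; i++) {
        int lvl = random_level(&seed);
        assert(lvl >= 1 && lvl <= MAX_LEVEL);
        if (lvl >= 2) hits++;
    }
    return hits;
}
```

Roughly half of the nodes should reach level 2, a quarter level 3, and so on, which is what gives the expected logarithmic search path.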

New Lock-Free Concurrent Skip List
- Define node state to depend on the insertion status at the lowest level as well as a deletion flag
- Insert from the lowest level going upwards
- Set the deletion flag. Delete from the highest level going downwards
[Figure: skip list nodes 1 to 7, each with a deletion flag D; a node with levels 1, 2, 3 and pointer p, shown without and with D set]

Overlapping operations on shared data
- Example: Insert operation
  • Which of 2 or 3 gets inserted?
- Solution: Compare-And-Swap atomic primitive:
    CAS(p: pointer to word, old: word, new: word): boolean
      atomic do
        if *p = old then *p := new; return true;
        else return false;
[Figure: nodes 1 and 4 in a list, with concurrent Insert 2 and Insert 3 between them]
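How CAS resolves two overlapping inserts can be sketched on a sorted singly linked list. This simplified version assumes that no deletions happen concurrently; `node_t`, `list_insert` and `insert_demo` are hypothetical names, and the demo nodes are deliberately not freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    int key;
    _Atomic(struct node *) next;
} node_t;

/* Insert key in sorted position after the sentinel head.
   Returns 1 on success, 0 if the key is already present. */
int list_insert(node_t *head, int key) {
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    atomic_init(&n->next, NULL);
    for (;;) {
        node_t *prev = head;
        node_t *cur = atomic_load(&prev->next);
        while (cur != NULL && cur->key < key) {
            prev = cur;
            cur = atomic_load(&cur->next);
        }
        if (cur != NULL && cur->key == key) { free(n); return 0; }
        atomic_store(&n->next, cur);
        /* Only one of two competing inserts after prev can succeed here;
           the loser sees a changed prev->next and searches again. */
        if (atomic_compare_exchange_strong(&prev->next, &cur, n)) return 1;
    }
}

/* Single-threaded demonstration: returns the keys read in list order. */
int insert_demo(void) {
    node_t head;
    head.key = 0;
    atomic_init(&head.next, NULL);
    list_insert(&head, 1);
    list_insert(&head, 4);
    list_insert(&head, 3);
    list_insert(&head, 2);
    int digits = 0;
    for (node_t *p = atomic_load(&head.next); p != NULL; p = atomic_load(&p->next))
        digits = digits * 10 + p->key;
    return digits;
}
```

If Insert 2 and Insert 3 both read node 1's link as pointing to 4, the CAS of the second one fails and it retries, so both keys end up in the list.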
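
Concurrent Insert vs. Delete operations
- Problem:
  • both nodes are deleted!
- Solution (Harris et al.): Use bit 0 of the pointer to mark deletion status
[Figure: list with nodes 1, 2, 4 and new node 3, with a concurrent Delete and Insert in steps a), b); the same list with the mark (*) on node 2 in steps a), b), c)]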
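The mark bit can be sketched by storing the successor address and the deletion status in one atomic word. This assumes nodes are aligned so that bit 0 of their address is always zero; `mark_deleted`, `insert_after` and `mark_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct node {
    int key;
    atomic_uintptr_t next;   /* successor address, bit 0 = deletion mark */
} node_t;

static int is_marked(uintptr_t v) { return (int)(v & 1u); }
static node_t *get_ptr(uintptr_t v) { return (node_t *)(v & ~(uintptr_t)1); }

/* Logical deletion: set bit 0 of n->next. Returns 1 if this call set the mark. */
int mark_deleted(node_t *n) {
    uintptr_t v = atomic_load(&n->next);
    while (!is_marked(v)) {
        if (atomic_compare_exchange_strong(&n->next, &v, v | 1u)) return 1;
    }
    return 0;
}

/* Link new_node directly after pred. Fails if pred is marked or its successor changed. */
int insert_after(node_t *pred, node_t *succ, node_t *new_node) {
    uintptr_t expected = (uintptr_t)succ;   /* unmarked link value */
    atomic_store(&new_node->next, expected);
    return atomic_compare_exchange_strong(&pred->next, &expected, (uintptr_t)new_node);
}

int mark_demo(void) {
    node_t n1, n2, n3, n4;
    n1.key = 1; n2.key = 2; n3.key = 3; n4.key = 4;
    atomic_init(&n4.next, 0);
    atomic_init(&n3.next, 0);
    atomic_init(&n2.next, (uintptr_t)&n4);
    atomic_init(&n1.next, (uintptr_t)&n2);
    int marked = mark_deleted(&n2);               /* Delete 2 marks its next pointer */
    int inserted = insert_after(&n2, &n4, &n3);   /* Insert 3 after 2 is now rejected */
    assert(get_ptr(atomic_load(&n2.next)) == &n4);
    return marked * 10 + inserted;
}
```

Because the mark changes the word that the insert's CAS compares against, a new node can no longer be attached to a node that is being deleted.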

Helping Scheme
- Threads need to traverse safely
- Need to remove marked-to-be-deleted nodes while traversing: Help!
- Find the previous node, finish the deletion and continue traversing from the previous node
[Figure: traversal reaching the marked node 2 (*) between nodes 1 and 4, with alternative ways (?) to continue]
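A simplified traversal with helping might look like the sketch below. It assumes the bit-0 marking convention, restarts from the head when the previous node turns out to be deleted as well, and leaves out memory reclamation entirely; the names `search_and_help` and `helping_demo` are hypothetical, and this is not the exact scheme of the talk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    int key;
    atomic_uintptr_t next;   /* bit 0 set = this node is marked for deletion */
} node_t;

#define MARK ((uintptr_t)1)
static node_t *ptr_of(uintptr_t v) { return (node_t *)(v & ~MARK); }

/* Return the first unmarked node with key >= key after the sentinel head.
   Marked nodes met on the way are unlinked on behalf of their deleters. */
node_t *search_and_help(node_t *head, int key, int *helped) {
restart:;
    node_t *prev = head;
    node_t *cur = ptr_of(atomic_load(&prev->next));
    while (cur != NULL) {
        uintptr_t cur_next = atomic_load(&cur->next);
        if (cur_next & MARK) {
            /* Help: swing prev->next from cur to cur's successor. */
            uintptr_t expected = (uintptr_t)cur;
            if (atomic_compare_exchange_strong(&prev->next, &expected, cur_next & ~MARK))
                (*helped)++;
            else if (expected & MARK)
                goto restart;   /* prev is being deleted too */
            /* Continue from the previous node, re-reading its link. */
            cur = ptr_of(atomic_load(&prev->next));
            continue;
        }
        if (cur->key >= key) return cur;
        prev = cur;
        cur = ptr_of(cur_next);
    }
    return NULL;
}

int helping_demo(void) {
    node_t head, n1, n2, n4;
    head.key = 0; n1.key = 1; n2.key = 2; n4.key = 4;
    atomic_init(&n4.next, 0);
    atomic_init(&n2.next, (uintptr_t)&n4 | MARK);   /* node 2 is marked (*) */
    atomic_init(&n1.next, (uintptr_t)&n2);
    atomic_init(&head.next, (uintptr_t)&n1);
    int helped = 0;
    node_t *found = search_and_help(&head, 2, &helped);
    assert(found == &n4);
    assert(ptr_of(atomic_load(&n1.next)) == &n4);   /* node 2 has been unlinked */
    return helped * 10 + found->key;
}
```

The searching thread finishes the deletion of node 2 itself instead of waiting for the deleting thread, which keeps the traversal non-blocking.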

Lock-Free Skip List - Techniques Summary
- The Skip List is treated as layers of ordered lists
- Uses CAS atomic primitive
- Lock-Free memory management
  • IBM Freelists
  • Reference counting (Valois+Michael&Scott)
- Helping scheme
- Back-Off strategy
- All together proved to be linearizable

Lock-Free Skip List publications
- First publications in literature:
  • H. Sundell and P. Tsigas, “Fast and Lock-Free Concurrent Priority Queues for Multi-thread Systems”, IPDPS 2003
  • H. Sundell and P. Tsigas, “Scalable and Lock-Free Concurrent Dictionaries”, SAC 2004
- Later publications:
  • M. Fomitchev and E. Ruppert, “Lock-free linked lists and skip lists”, PODC 2004
  • K. Fraser, “Practical lock-freedom”, PhD thesis, 2004

New Lock-Free Skip List!
- The thread that fulfils the deletion of a node removes the next pointer when finished.
- Allows other threads to traverse through even marked next pointers.
- If not possible to traverse forward, go back to the remembered position on previous (upper) levels.
- Helps deletions-in-progress only when absolutely necessary.
- Works with a modified version of Michael’s Hazard Pointer memory management!

Correctness
- Linearizability (Herlihy 1991)
  • For an implementation to be linearizable, for every concurrent execution there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution

Correctness
- Define precise sequential semantics
- Define abstract state and its interpretation
  • Show that state is atomically updated
- Define linearizability points
  • Show that operations take effect atomically at these points with respect to sequential semantics
- Creates a total order using the linearizability points that respects the partial order
  • The algorithm is linearizable

Memory Consistency and Out-Of-Order execution
- Models on actual multiprocessor architectures: Relaxed Memory Order etc.
- Must insert special machine instructions (memory barriers) to enforce stronger memory consistency models!
[Figure: timeline t of threads Ti, Tj and Tk with writes W(x,1), W(y,0), W(x,0), W(y,1) and reads R(y)=0, R(x)=1, R(y)=1, R(x)=1, R(x)=0, R(y)=1]
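In portable C the barriers can be expressed with C11 fences, as in this minimal sketch of publishing a value through a flag. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and the demo runs in one thread, so it shows the placement of the fences rather than an actual reordering.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;        /* plain data, written before the flag */
static atomic_int ready;   /* 0 = not published, 1 = published */

void publish(int v) {
    payload = v;
    /* Release fence: the payload write cannot be reordered after the flag write. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns 1 and copies the payload to *out once it has been published. */
int try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) return 0;
    /* Acquire fence: the payload read cannot be reordered before the flag read. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}

int barrier_demo(void) {
    int v = -1;
    assert(!try_consume(&v));   /* nothing published yet */
    publish(42);
    assert(try_consume(&v));
    return v;
}
```

Without the two fences, a relaxed memory model would allow a reader on another processor to see the flag set and still read a stale payload.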

Experiments
- Experiments with 1-32 threads performed on a Sun Fire 15K with 48 CPUs.
- Each thread performs 20000 operations, of which the first 50-10000 operations in total are Inserts; the remaining operations are randomly distributed, with equal probability, over Insert, FindKey and DeleteKey.
- Fixed skip list maximum level of 10.
- Compared with implementations of other skip list-based dictionaries and a singly linked list by Michael, using the same scenarios.
- Averaged execution time of 10 experiments.

Multi-Word Compare-And-Swap
- Operations:
  • bool CASN(int *p1, int o1, int n1, …);
  • int Read(int *p);
- Standard algorithmic approach:
  1. Try to acquire a lock on all positions of interest.
  2. If already taken, help the corresponding operation.
  3. If all are taken and all match, change the status of the operation.
  4. Remove locks and possibly write new values.
- My approach:
  • Wait-free memory management (IPDPS 2005)
  • Lock stealing and lock hand-over
  • Allow un-sorted pointers

Lock-Free Deque
- Practical algorithms in literature:
  • Michael 2003, “CAS-based lock-free algorithm for shared deques”, Euro-Par 2003
  • Sundell and Tsigas, “Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap”, OPODIS 2004
- Approach
  • Apply new memory management on lock-free deque

Conclusions
- Work performed at EPCC
  • Improved algorithm of lock-free skip list
    • Improved Michael’s hazard pointer algorithm
  • Experiments comparing with other recent dictionary algorithms
  • New implementation of CASN.
  • Experiments comparing with other recent CASN algorithms.
  • Experiments comparing a lock-free deque algorithm using different memory management techniques.
- Future work
  • Implement new lock-free/wait-free dynamic data structures.
  • More experiments.

Questions?
- Contact Information:
  • Address: Håkan Sundell, Computing Science, Chalmers University of Technology
  • Email: [email protected]
  • Web: http://www.cs.chalmers.se/~phs