concurrency in failure atomic data structures on ......transactions and concurrency, lock-free...
Storage Performance Development Kit (SPDK), Persistent Memory Development Kit (PMDK) & Intel® VTune™ Profiler Virtual Forum
Igor Chorążewicz
Software Engineer, Intel
Sergey Vinogradov
Software Engineer, Intel
SPDK, PMDK & Intel® VTune™ Profiler Virtual Forum
Agenda
01 Introduction: Basic concepts of persistent memory programming
02 Multithreading for persistent memory: Transactions and concurrency, lock-free programming
03 Concurrent data structures: Persistent TLS, concurrent map, concurrent hash map
04 Summary: Integration with pmemkv and call to action
Persistent Domain
8-Byte Power-Fail Atomicity
uint64_t v = 0;
...
pmem->v = 1024;
pmem_persist(&pmem->v, 8);

Possible values of pmem->v after a power failure:
• 0
• 1024

strcpy(pmem, "Hello, World!");
pmem_persist(pmem, 14);

Possible contents of pmem after a power failure:
• "\0\0\0\0\0\0\0\0\0\0..."
• "Hello, W\0\0\0\0\0\0..."
• "\0\0\0\0\0\0\0\0orld!\0"
• "Hello, World!\0"
...
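The four strings listed above can be reproduced with a small model. The sketch below is not PMDK code; it assumes that every aligned 8-byte chunk of the buffer independently ends up holding either its old or its new bytes, and the helper name possible_states is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Enumerates the contents a buffer may hold after a power failure, assuming
// each aligned 8-byte chunk holds either its old bytes or its new bytes.
std::set<std::string> possible_states(const std::string& before,
                                      const std::string& after) {
    assert(before.size() == after.size());
    const std::size_t chunk = 8;
    const std::size_t n = (after.size() + chunk - 1) / chunk;
    assert(n < 32);  // keep the bit mask below within an unsigned int

    std::set<std::string> states;
    for (unsigned mask = 0; mask < (1u << n); ++mask) {
        std::string s = before;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (1u << i)) {
                const std::size_t off = i * chunk;
                // replace() clips both ranges at the end of the strings.
                s.replace(off, chunk, after, off, chunk);
            }
        }
        states.insert(s);
    }
    return states;
}
```

For a 14-byte buffer of zeros overwritten with "Hello, World!" plus its terminator, the model yields two chunks and therefore four states, including the torn ones.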
libpmemobj(-cpp)
libpmemobj is a transactional object storage system for Persistent Memory
It provides synchronous ACID-like transactions, a failure atomic memory allocator and other general facilities for persistent memory programming
https://github.com/pmem/pmdk
https://github.com/pmem/libpmemobj-cpp
Libpmemobj Failure Atomic Transactions
Transaction guarantees:
• Atomicity
• Consistency
• Isolation (not provided by libpmemobj itself; the application must add it, for example with locks)
• Durability
Alternative Approach - Redo Logs
Isolation and PMDK Transactions
• Two threads atomically increment the counter.
• Both transactions log the initial value of the counter.
• An incorrect value of the counter is restored if a transaction aborts.
std::atomic<int> a = 0; // a is in persistent memory
Thread 1:
manual tx;
snapshot(&a);
++a;
…
abort();
Thread 2:
manual tx;
snapshot(&a);
++a;
…
commit();
Undo log of thread 1: a = 0
Undo log of thread 2: a = 0
An incorrect value of the counter is restored from the undo log if thread 1 aborts its transaction while thread 2 successfully commits its changes.
Solution:
1. Lock-based critical section
2. Postpone the lock release to the end of the transaction
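The lost update can be replayed deterministically on a single thread. The following sketch is a volatile-memory model, not libpmemobj code: the Tx struct is a hypothetical stand-in for a transaction whose undo log covers one int.

```cpp
#include <cassert>

// Hypothetical stand-in for a transaction with a one-entry undo log.
struct Tx {
    int* addr = nullptr;
    int saved = 0;

    void snapshot(int* p) { addr = p; saved = *p; }  // log the old value
    void abort() {                                   // roll back from the log
        assert(addr != nullptr);
        *addr = saved;
    }
    void commit() { addr = nullptr; }                // discard the log
};

// Replays the interleaving of the two threads without any isolation.
int counter_after_unisolated_abort() {
    int a = 0;
    Tx tx1, tx2;
    tx1.snapshot(&a);  // thread 1 logs a = 0
    tx2.snapshot(&a);  // thread 2 logs a = 0
    ++a;               // thread 1 increments
    ++a;               // thread 2 increments
    tx2.commit();      // thread 2 expects its increment to survive
    tx1.abort();       // thread 1 restores a = 0, wiping thread 2's update
    return a;
}
```

The function returns 0 although one increment was committed, which is the incorrect value the slide describes.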
Mutexes on Persistent Memory
• Libpmemobj pmem-aware locks
• Automatic reinitialization
• Can be embedded in pmem-resident objects
// a and mtx are in persistent memory
int a = 0;
pmem::obj::mutex mtx;
Thread 1:
manual tx(mtx);
snapshot(&a);
++a;
…
abort();
Thread 2:
manual tx(mtx);
snapshot(&a);
++a;
…
commit();
Transactions are executed serially
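A minimal sketch of the same idea in standard C++, assuming a std::mutex in place of pmem::obj::mutex and a hypothetical LockedTx type in place of a real transaction: the lock is acquired when the transaction starts and released only when the transaction object is destroyed, so the undo log of one thread can never capture an uncommitted value of another.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical transaction that holds a lock for its whole lifetime.
struct LockedTx {
    std::unique_lock<std::mutex> lock;  // released in the destructor
    int* addr = nullptr;
    int saved = 0;

    explicit LockedTx(std::mutex& m) : lock(m) {}
    void snapshot(int* p) { addr = p; saved = *p; }
    void abort() {
        assert(addr != nullptr);
        *addr = saved;
    }
    void commit() {}
};

int counter_after_serialized_transactions() {
    int a = 0;
    std::mutex mtx;
    std::thread t1([&] {
        LockedTx tx(mtx);
        tx.snapshot(&a);
        ++a;
        tx.abort();   // rolls back only this thread's increment
    });
    std::thread t2([&] {
        LockedTx tx(mtx);
        tx.snapshot(&a);
        ++a;
        tx.commit();
    });
    t1.join();
    t2.join();
    return a;
}
```

Whichever thread wins the lock first, the result is 1: the aborted increment is undone and the committed one is kept.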
Lock-Free Algorithms on Persistent Memory
• If caches are not flushed on failure (ADR)
• Cannot easily use std::atomic
• Must manually flush all data after write
• Visible value might be different than the persistent value
Lock-Free Algorithms on Persistent Memory
[Figure: the linked list 1, 2, 3 shown in the volatile domain and in the persistent domain while thread 1 and thread 2 append nodes 4 and 5.]
• Compare-and-swap is used to insert an element at the tail.
• Changes made by one thread are visible to other threads before they are persisted.
• CMPXCHG + CLWB is not atomic.
Atomic store/load - WRONG
// thread 1
// initial value of pmem->a is 0
atomic_store(&pmem->a, 1);               // visible = 1, persistent = ?
pmem_persist(&pmem->a, sizeof(pmem->a)); // visible = 1, persistent = 1

// thread 2
// initial value of pmem->b is 0
if (atomic_load(&pmem->a) == 1) {
    pmem->b = 1;                             // visible = 1, persistent = ?
    pmem_persist(&pmem->b, sizeof(pmem->b)); // visible = 1, persistent = 1
}

Possible persistent values of (a, b) = (0, 0), (1, 0), (1, 1), (0, 1)
Atomic store/load – Persist on Read
// thread 1
// initial value of pmem->a is 0
atomic_store(&pmem->a, 1);               // visible = 1, persistent = ?
pmem_persist(&pmem->a, sizeof(pmem->a)); // visible = 1, persistent = 1

// thread 2
// initial value of pmem->b is 0
if (atomic_load(&pmem->a) == 1) {
    pmem_persist(&pmem->a, sizeof(pmem->a)); // visible = 1, persistent = 1
    pmem->b = 1;                             // visible = 1, persistent = ?
    pmem_persist(&pmem->b, sizeof(pmem->b)); // visible = 1, persistent = 1
}

Possible persistent values of (a, b) = (0, 0), (1, 0), (1, 1)
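The sets of possible persistent values for the plain variant and the persist-on-read variant can be checked by brute force. The sketch below is a simplified model, not real pmem code: it assumes a store that has not been persisted yet may or may not have reached persistent memory at the moment of a crash, explores every interleaving of the two threads, and records every crash outcome. All names in it are made up for this illustration.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <utility>

// va/vb: values visible to other threads; pa/pb: values known to be in pmem.
// pc1/pc2: index of the next step of thread 1 / thread 2.
struct State {
    int va = 0, vb = 0, pa = 0, pb = 0;
    int pc1 = 0, pc2 = 0;
};

using Outcomes = std::set<std::pair<int, int>>;

// A crash may find an unpersisted store either lost or already written back.
void record_crash(const State& s, Outcomes& out) {
    for (int a : {s.pa, s.va})
        for (int b : {s.pb, s.vb})
            out.insert(std::make_pair(a, b));
}

void explore(const State& s, bool persist_on_read, Outcomes& out) {
    record_crash(s, out);
    if (s.pc1 < 2) {  // thread 1: store a, then persist a
        State n = s;
        if (s.pc1 == 0) n.va = 1; else n.pa = n.va;
        n.pc1 = s.pc1 + 1;
        explore(n, persist_on_read, out);
    }
    if (s.pc2 < 4) {  // thread 2: load a, [persist a], store b, persist b
        State n = s;
        n.pc2 = s.pc2 + 1;
        switch (s.pc2) {
        case 0: if (s.va != 1) n.pc2 = 4; break;          // branch not taken
        case 1: if (persist_on_read) n.pa = n.va; break;  // the added flush
        case 2: n.vb = 1; break;
        case 3: n.pb = n.vb; break;
        }
        explore(n, persist_on_read, out);
    }
}

Outcomes possible_outcomes(bool persist_on_read) {
    Outcomes out;
    explore(State{}, persist_on_read, out);
    assert(out.count(std::make_pair(0, 0)) == 1);  // crash before any store
    return out;
}
```

Under these assumptions possible_outcomes(false) contains (0, 1), while possible_outcomes(true) does not, matching the two lists on the slides.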
Atomic store/load – Redo Log
// thread 1
// initial value of pmem->a is 0
redo_log.set(&pmem->a, 1); // visible = 0, persistent = 0
redo_log.publish();        // visible = 1, persistent = 1

// thread 2
// initial value of pmem->b is 0
if (atomic_load(&pmem->a) == 1) {
    pmem->b = 1;            // visible = 1, persistent = ?
    pmem_persist(&pmem->b); // visible = 1, persistent = 1
}

Possible persistent values of (a, b) = (0, 0), (1, 0), (1, 1)
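One way a redo log can give this guarantee is to make the log entry durable before the target becomes visible, and to replay the log during recovery. The sketch below is an assumption about how such a log could work, not the libpmemobj implementation; Cell, run_with_crash and the step sequence are hypothetical, and the model is pessimistic in that stores which were not persisted are treated as lost.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Models one pmem location.
struct Cell {
    int visible = 0;  // what other threads read
    int durable = 0;  // what is guaranteed to be in pmem
    void store(int v) { visible = v; }
    void persist() { durable = visible; }
};

struct Result {
    bool was_visible;  // could another thread have read a == 1?
    int recovered;     // value of a after crash recovery
};

// Runs the first steps_before_crash steps, then crashes and recovers.
Result run_with_crash(std::size_t steps_before_crash) {
    Cell a, log_value, log_valid;
    const std::vector<std::function<void()>> steps = {
        // set: record the intended value durably, a is untouched
        [&] { log_value.store(1); },
        [&] { log_value.persist(); },
        // publish: commit the log first, only then modify a
        [&] { log_valid.store(1); },
        [&] { log_valid.persist(); },
        [&] { a.store(log_value.visible); },
        [&] { a.persist(); },
        [&] { log_valid.store(0); },
        [&] { log_valid.persist(); },
    };
    assert(steps_before_crash <= steps.size());
    for (std::size_t i = 0; i < steps_before_crash; ++i) steps[i]();

    const bool was_visible = (a.visible == 1);
    // Recovery: replay a committed log entry.
    if (log_valid.durable == 1) a.durable = log_value.durable;
    return {was_visible, a.durable};
}
```

For every crash point, a value that was visible before the crash is also present after recovery, so thread 2 never persists b = 1 on top of a lost a.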
Persistent Memory Leaks
If a reference to newly allocated persistent memory is not stored durably, the memory can leak.
persistent_ptr ptr;
{
    manual tx;
    ptr = make_persistent();
    commit();
}
// Memory leak if a crash happens here.
cas(&list_tail->next, nullptr, ptr);
Concurrent Data Structures
• Designing efficient concurrent algorithms for persistent memory is a challenging task
• Libpmemobj-cpp provides two concurrent data structures:
• Concurrent map
• Concurrent hash map
• Our data structures are well-tested and ready to be used in production out of the box
• Run tests under Valgrind tools (memcheck, drd, helgrind, pmemcheck, pmreorder)
• Simulate crashes using GDB
Persistent TLS (Thread Local Storage)
• Persistent memory local to a thread
• Use cases:
• Distributing global variables
• Base for redo log implementation
// Persistent version of
// tbb::enumerable_thread_specific
template <typename T>
class enumerable_thread_specific {
public:
    T& local();
    void clear();
    iterator begin();
    iterator end();
};
• Our concurrent data structures use persistent TLS to calculate size, because an atomic variable cannot be used inside transactions.
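The size calculation can be pictured as one counter per thread that only its owner updates, with the total obtained by iterating over all slots. The sketch below uses ordinary volatile memory and a hypothetical per_thread_counter class whose local() takes an explicit thread index; the real enumerable_thread_specific finds the calling thread's slot on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread counter: each thread touches only its own slot,
// so no atomic variable or lock is needed for the increments.
class per_thread_counter {
public:
    explicit per_thread_counter(std::size_t threads) : slots_(threads, 0) {}

    std::size_t& local(std::size_t thread_id) {
        assert(thread_id < slots_.size());
        return slots_[thread_id];
    }

    // Sums all slots; only meaningful while no thread is updating its slot.
    std::size_t size() const {
        std::size_t total = 0;
        for (std::size_t count : slots_) total += count;
        return total;
    }

private:
    std::vector<std::size_t> slots_;
};

std::size_t count_inserts(std::size_t threads, std::size_t inserts_per_thread) {
    per_thread_counter counter(threads);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < threads; ++id) {
        workers.emplace_back([&counter, id, inserts_per_thread] {
            for (std::size_t i = 0; i < inserts_per_thread; ++i)
                ++counter.local(id);  // stands in for one successful insert
        });
    }
    for (std::thread& w : workers) w.join();
    return counter.size();
}
```

Because each slot has a single writer, a slot could be updated inside that thread's transaction without involving other threads.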
Concurrent Map based on Concurrent Skip List
• Multilayer linked list-like data structure.
• The bottom layer is an ordinary ordered linked list.
• Each higher layer acts as an "express lane" for the lists below.
• An element in layer i appears in layer i+1 with fixed probability p (in our case p = 1/2).
• Search is wait-free.
• Insert employs optimistic lock-based synchronization.
https://www.cs.tau.ac.il/~shanir/nir-pubs-web/Papers/OPODIS2006-BA.pdf
Algorithm   Average    Worst case
Space       O(n)       O(n log n)
Search      O(log n)   O(n)
Insert      O(log n)   O(n)
Delete      O(log n)   O(n)
[Figure: skip list with keys 1, 2, 3, 4, 5, 7, 8, 9 on four layers; every layer starts at the dummy head and ends at NIL.]
Concurrent Skip List: Search Operation
• A search for a target element begins at the head element in the top list.
• It proceeds horizontally until the current element is greater than or equal to the target.
• It drops down vertically to the next lower list if the current element is greater than the target.
• Search is wait-free.
• Pointers to the next element are read atomically with load-acquire semantics.
[Figure: the skip list with keys 1, 2, 3, 4, 5, 7, 8, 9 between the dummy head and NIL on four layers.]
• Example: search for the element with key = 9.
• Search path: dummy -> 4 -> 8 -> 9
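The search walk can be sketched on a small volatile skip list. This is an illustration only, not the libpmemobj-cpp code: the SkipList class, the fixed maximum height of four layers, and the insert_unsynchronized helper (which takes the node height from the caller instead of drawing it at random) are assumptions made to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

constexpr int kMaxHeight = 4;

struct Node {
    int key;
    std::atomic<Node*> next[kMaxHeight];
    explicit Node(int k) : key(k) {
        for (auto& n : next) n.store(nullptr, std::memory_order_relaxed);
    }
};

class SkipList {
public:
    SkipList() : head_(new Node(0)) { nodes_.emplace_back(head_); }

    // Single-threaded helper used only to build the example list.
    void insert_unsynchronized(int key, int height) {
        assert(height >= 1 && height <= kMaxHeight);
        Node* preds[kMaxHeight];
        find(key, preds);
        Node* node = new Node(key);
        nodes_.emplace_back(node);
        for (int lvl = 0; lvl < height; ++lvl) {
            node->next[lvl].store(
                preds[lvl]->next[lvl].load(std::memory_order_relaxed),
                std::memory_order_relaxed);
            preds[lvl]->next[lvl].store(node, std::memory_order_release);
        }
    }

    bool contains(int key) const {
        Node* preds[kMaxHeight];
        const Node* found = find(key, preds);
        return found != nullptr && found->key == key;
    }

private:
    // Starts at the top layer of the dummy head, moves right while the next
    // key is smaller than the target, then drops one layer. Takes no locks.
    Node* find(int key, Node** preds) const {
        Node* cur = head_;
        for (int lvl = kMaxHeight - 1; lvl >= 0; --lvl) {
            Node* next = cur->next[lvl].load(std::memory_order_acquire);
            while (next != nullptr && next->key < key) {
                cur = next;
                next = cur->next[lvl].load(std::memory_order_acquire);
            }
            preds[lvl] = cur;
        }
        return cur->next[0].load(std::memory_order_acquire);
    }

    Node* head_;                               // dummy head, key unused
    std::vector<std::unique_ptr<Node>> nodes_; // owns all nodes
};
```

The search only performs acquire loads of next pointers, which pairs with the release store that links a new node.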
Concurrent Skip List: Delete Operation
Our implementation does not support a concurrent delete operation.
• There is a way to logically delete a node from the skip list. But…
• There is a memory reclamation problem.
• We need to guarantee the lifetime of an object while other threads are accessing it.
• This is hard to solve without a garbage collector.
• There are possible solutions, but they might hurt Search/Insert performance.
Concurrent Skip List: Insert Operation
• Find the insert position and remember the previous and next nodes on each layer.
• Lock the previous nodes on each layer (dummy, 4, 5).
• Check that the next nodes have not changed after the locks are acquired.
• Insert the new node by updating each layer.
[Figure: inserting key 7 into a skip list with keys 1, 3, 4, 5, 8, 9 in three steps: find the insert position, lock the previous nodes on each layer, insert.]
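The find, lock, validate, insert sequence can be shown on a single layer. The sketch below reduces the skip list to one sorted linked list in volatile memory and supports insert only, so checking that the predecessor still points to the remembered successor is enough validation. SortedList and concurrent_insert_demo are hypothetical names, and a real skip list repeats the locking and validation on every layer.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>
#include <thread>
#include <vector>

struct ListNode {
    int key;
    std::atomic<ListNode*> next{nullptr};
    std::mutex lock;
    explicit ListNode(int k) : key(k) {}
};

class SortedList {
public:
    SortedList() : head_(INT_MIN) {}
    ~SortedList() {
        ListNode* n = head_.next.load();
        while (n != nullptr) {
            ListNode* following = n->next.load();
            delete n;
            n = following;
        }
    }

    // Returns false if the key is already present.
    bool insert(int key) {
        assert(key > INT_MIN);  // INT_MIN is reserved for the dummy head
        for (;;) {
            // 1. Find the insert position without taking any lock.
            ListNode* pred = &head_;
            ListNode* succ = pred->next.load(std::memory_order_acquire);
            while (succ != nullptr && succ->key < key) {
                pred = succ;
                succ = pred->next.load(std::memory_order_acquire);
            }
            if (succ != nullptr && succ->key == key) return false;
            // 2. Lock the previous node.
            std::lock_guard<std::mutex> guard(pred->lock);
            // 3. Validate: retry if another thread changed pred->next.
            if (pred->next.load(std::memory_order_acquire) != succ) continue;
            // 4. Link the new node; the release store publishes its fields.
            ListNode* node = new ListNode(key);
            node->next.store(succ, std::memory_order_relaxed);
            pred->next.store(node, std::memory_order_release);
            return true;
        }
    }

    std::vector<int> keys() const {
        std::vector<int> out;
        for (ListNode* n = head_.next.load(); n != nullptr; n = n->next.load())
            out.push_back(n->key);
        return out;
    }

private:
    ListNode head_;  // dummy head
};

// Every thread tries to insert the same keys; each key must end up once.
std::vector<int> concurrent_insert_demo(int threads, int key_count) {
    SortedList list;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&list, key_count] {
            for (int k = 0; k < key_count; ++k) list.insert(k);
        });
    }
    for (std::thread& w : workers) w.join();
    return list.keys();
}
```

The search phase is optimistic: it takes no locks, and a failed validation simply restarts the search.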
Data Consistency in List-like Data Structures
• Data consistency = each node is reachable after a crash
• Use persistent TLS to track persistent allocations
• Each new node is always reachable via TLS
• After a crash, an insert that did not complete can be redone
[Figure: TLS holds pointers to new nodes. Step 1: the new node (7) is allocated and assigned to a pointer in TLS. Step 2: the new node is inserted into the list (dummy head, 2, 4, 5, 8).]
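The recovery idea can be modeled in a few lines. The sketch below uses volatile memory and hypothetical names (Pool, insert_with_crash, recover); it assumes that allocating the node and storing its pointer in the thread's TLS slot happen atomically, as they would inside one transaction, and it simulates a crash by stopping after a given number of steps.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int key;
    Node* next;
};

// Stands in for the persistent pool.
struct Pool {
    std::vector<std::unique_ptr<Node>> heap;  // every allocation ever made
    std::vector<Node*> tls;                   // one pending-insert slot per thread
    Node* head = nullptr;                     // sorted singly linked list
    explicit Pool(std::size_t threads) : tls(threads, nullptr) {}
};

void link(Pool& p, Node* n) {
    Node** pos = &p.head;
    while (*pos != nullptr && (*pos)->key < n->key) pos = &(*pos)->next;
    n->next = *pos;
    *pos = n;
}

bool reachable(const Pool& p, const Node* n) {
    for (const Node* cur = p.head; cur != nullptr; cur = cur->next)
        if (cur == n) return true;
    return false;
}

std::size_t list_length(const Pool& p) {
    std::size_t len = 0;
    for (const Node* cur = p.head; cur != nullptr; cur = cur->next) ++len;
    return len;
}

// Performs an insert but stops ("crashes") after crash_after steps.
void insert_with_crash(Pool& p, std::size_t tid, int key, int crash_after) {
    assert(tid < p.tls.size());
    if (crash_after < 1) return;
    p.heap.emplace_back(new Node{key, nullptr});  // step 1: allocate and
    p.tls[tid] = p.heap.back().get();             // record in TLS (atomic)
    if (crash_after < 2) return;
    link(p, p.tls[tid]);                          // step 2: insert into list
    if (crash_after < 3) return;
    p.tls[tid] = nullptr;                         // step 3: insert completed
}

// Run on restart: redo every insert that was recorded but not finished.
void recover(Pool& p) {
    for (Node*& pending : p.tls) {
        if (pending == nullptr) continue;
        if (!reachable(p, pending)) link(p, pending);
        pending = nullptr;
    }
}
```

After recover(), the number of nodes in the list equals the number of allocations for every crash point, so no allocation is left unreachable.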
Concurrent Hash Map
• Optimistic per-bucket read-write lock
• Find() acquires a read lock
• Insert() and Erase() acquire a write lock
• Each operation performs the following steps:
• Finds the required bucket using the hash
• Locks the bucket for read or write access
• Isolation: only a single writing thread can modify a bucket at a time
• Works with the nodes inside the bucket

Algorithm   Average   Worst case
Space       O(n)      O(n)
Search      O(1)      O(n)
Insert      O(1)      O(n)
Delete      O(1)      O(n)
[Figure: buckets and their nodes.]
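The per-bucket locking scheme can be sketched with std::shared_mutex from C++17. This is a volatile, fixed-size illustration with hypothetical names, not the libpmemobj-cpp concurrent_hash_map: it has no resizing, no transactions, and stores int values under std::string keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

class BucketLockedMap {
    struct Bucket {
        mutable std::shared_mutex mtx;
        std::list<std::pair<std::string, int>> nodes;
    };

public:
    explicit BucketLockedMap(std::size_t buckets) : buckets_(buckets) {
        assert(buckets > 0);
    }

    // Readers of the same bucket may run in parallel.
    bool find(const std::string& key, int& value) const {
        const Bucket& b = buckets_[index(key)];
        std::shared_lock<std::shared_mutex> lock(b.mtx);
        for (const auto& kv : b.nodes)
            if (kv.first == key) { value = kv.second; return true; }
        return false;
    }

    // Only one writer at a time may modify a bucket.
    bool insert(const std::string& key, int value) {
        Bucket& b = buckets_[index(key)];
        std::unique_lock<std::shared_mutex> lock(b.mtx);
        for (const auto& kv : b.nodes)
            if (kv.first == key) return false;
        b.nodes.emplace_back(key, value);
        return true;
    }

    bool erase(const std::string& key) {
        Bucket& b = buckets_[index(key)];
        std::unique_lock<std::shared_mutex> lock(b.mtx);
        for (auto it = b.nodes.begin(); it != b.nodes.end(); ++it)
            if (it->first == key) { b.nodes.erase(it); return true; }
        return false;
    }

private:
    // Finds the bucket for a key using its hash.
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<Bucket> buckets_;
};
```

Operations on different buckets never contend, which is why the lock granularity is the bucket rather than the whole map.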
PMEMKV
[Figure: pmemkv architecture. PMDK: libpmemobj, libpmemobj++. pmemkv: "native" engines, pmemkv core (C++), C API, C++ API (header only); memkind and TBB are also shown. Bindings: Node.js (NAPI), Java (JNI), Ruby (FFI). Applications: C, C++, Ruby, Java, JavaScript.]
• Embedded key/value storage optimized for persistent memory
• API is thread-safe
• Two concurrent engines:
• CSMAP
• CMAP
Summary
• Designing efficient concurrent algorithms for persistent memory is a challenging task
• Developers should always keep in mind visibility vs. persistency
• Libpmemobj-cpp provides efficient and well-tested concurrent data structures designed for persistent memory
• PMEMKV is a key/value storage for persistent memory
• Thread-safe out of the box
• Provides two concurrent engines: CSMAP, CMAP
Call to Action
• Try our concurrent data structures
• https://github.com/pmem/libpmemobj-cpp
• Try PMEMKV in your C/C++, Java, Python or NodeJS apps
• https://github.com/pmem/pmemkv
• Read more about persistent memory and concurrent data structures
• https://pmem.io/book/
Questions