Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping
Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller
1
FAST ’19, 2019-02-26
CRSS Confidential
Byte-addressable Non-volatile Memory
2
BNVM is coming, and with it, new optimization targets
Byte-addressable Non-volatile Memory
3
0 1 0 1 0 0 0 1
0 1 1 0 0 1 0 1
It’s not just writes... ...it’s the bits flipped by those writes
BNVM power usage
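The cost of a write can be counted as the number of bits that change. A minimal sketch, assuming the two bytes shown above are the old and new contents of a single memory location (the helper name `bit_flips` is made up for this illustration):

```python
def bit_flips(old: int, new: int) -> int:
    """Count the bits that differ between the stored value and the new value."""
    return bin(old ^ new).count("1")


old = 0b01010001  # first byte shown on the slide
new = 0b01100101  # second byte shown on the slide

flips = bit_flips(old, new)
print(flips)  # 3
```

Both values are one-byte writes, yet only three of the eight cells actually change state.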
4
Can we take advantage of this?
7
Software vs. hardware?
How hard is it to reason about bit flips?
How do we design data structures to reduce bit flips?
8
Reducing Bit Flips in Software
9
XOR linked lists
10
Traditional doubly linked list: each node stores value, prev, next
XOR linked list: each node stores value, xptr
xptr = next ⊕ prev
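A rough sketch of how traversal works when each node stores only xptr = next ⊕ prev. Python has no raw pointers, so this illustration assumes a node pool in which list indices stand in for addresses and index 0 plays the role of NULL; the class and method names are hypothetical, not taken from the talk's code:

```python
class XorList:
    """XOR linked list over a node pool; indices play the role of addresses.

    Index 0 is reserved as the null "address", so real nodes start at 1.
    """

    def __init__(self):
        self.values = [None]  # slot 0 is the null sentinel
        self.xptr = [0]       # xptr[i] = prev_index ^ next_index
        self.head = 0
        self.tail = 0

    def append(self, value):
        new = len(self.values)
        self.values.append(value)
        self.xptr.append(self.tail ^ 0)  # prev = old tail, next = null
        if self.tail:
            # The old tail's next changes from null (0) to new: XOR it in.
            self.xptr[self.tail] ^= new
        else:
            self.head = new
        self.tail = new

    def _walk(self, start):
        prev, cur = 0, start
        while cur:
            yield self.values[cur]
            # xptr = prev ^ next, so the neighbor we did not come from
            # is xptr ^ prev.
            prev, cur = cur, self.xptr[cur] ^ prev

    def forward(self):
        return list(self._walk(self.head))

    def backward(self):
        return list(self._walk(self.tail))


lst = XorList()
for v in ("a", "b", "c"):
    lst.append(v)

print(lst.forward())   # ['a', 'b', 'c']
print(lst.backward())  # ['c', 'b', 'a']
```

The same walk serves both directions because XOR is symmetric: starting from either end with a null "previous" node recovers the neighbor on the other side.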
Pointers!
11
Some actual pointers
A = 0x000055b7bda8f260
B = 0x000055b7bda8f6a0
A ⊕ B = 0x4C0 = 0b10011000000
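The arithmetic above can be checked directly. This sketch uses the two pointer values from the slide and, as an assumption made only for the comparison, supposes the field being written initially holds zero:

```python
A = 0x000055B7BDA8F260  # pointer values copied from the slide
B = 0x000055B7BDA8F6A0


def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


xor_ab = A ^ B
print(hex(xor_ab), bin(xor_ab))  # 0x4c0 0b10011000000

# Assumption: the destination field starts out as all zeros.
flips_raw = popcount(B)       # bits set when storing the raw pointer B
flips_xor = popcount(xor_ab)  # bits set when storing A ^ B instead
print(flips_raw, flips_xor)   # 27 3
```

Because the two addresses share their high-order bits, the XOR cancels them and leaves only the few low-order bits in which the pointers differ.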
Using XOR in hash tables
12
Chained hash table: each entry stores key, value, xnext
Using XOR in hash tables
13
Entry fields: key, value, xnext, D
key, value, 00000, 0
key, value, xxxxx, 1
Both indicate “entry is empty”
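One possible reading of the two "empty" encodings, sketched under loud assumptions: D is taken to be the low-order bit of the word holding xnext, an all-zero word means the entry was never used, and a set D bit means the entry was emptied without clearing xnext. The slide does not spell out this layout, and the function names are hypothetical:

```python
FLAG_D = 1  # assumption: D occupies the low-order bit of the xnext word


def is_empty(word: int) -> bool:
    """Empty if the word is all zeros, or if the D bit is set."""
    return word == 0 or (word & FLAG_D) == 1


def mark_empty(word: int) -> int:
    """Mark an entry empty by setting D, leaving the other bits untouched."""
    return word | FLAG_D


def bit_flips(old: int, new: int) -> int:
    return bin(old ^ new).count("1")


word = 0x4C0  # hypothetical xnext value with D clear

flips_by_zeroing = bit_flips(word, 0)              # overwrite with zeros
flips_by_flag = bit_flips(word, mark_empty(word))  # set only the D bit
print(flips_by_zeroing, flips_by_flag)  # 3 1
```

Under these assumptions, accepting two encodings for "empty" lets a delete flip a single bit, whatever the old xnext value was.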
From XOR linked lists to Red Black Trees
14
L R P
Standard 3-pointer red-black tree design
From XOR linked lists to Red Black Trees
15
L R P
LX RX
LX = L ⊕ P
RX = R ⊕ P
Now 2-pointer, and XOR pointers
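A small sketch of how a node that stores only LX = L ⊕ P and RX = R ⊕ P can still reach all three neighbors. Integers stand in for addresses; the specific values and the helper names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class XorNode:
    lx: int  # L ^ P
    rx: int  # R ^ P


def make_node(left: int, right: int, parent: int) -> XorNode:
    return XorNode(lx=left ^ parent, rx=right ^ parent)


def children(node: XorNode, parent: int):
    """Descending: the parent is known, so L = LX ^ P and R = RX ^ P."""
    return node.lx ^ parent, node.rx ^ parent


def parent_from_left(node: XorNode, left: int) -> int:
    """Ascending from the left child: P = LX ^ L."""
    return node.lx ^ left


# Hypothetical addresses for illustration only.
P = 0x55B7BDA8F260
L = 0x55B7BDA8F6A0
R = 0x55B7BDA8F2A0

node = make_node(L, R, P)
print(children(node, P) == (L, R))      # True
print(parent_from_left(node, L) == P)   # True
```

A traversal always arrives at a node from one of its neighbors, so that neighbor's address is available to decode the other two.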
Evaluation
16
Experimental framework
17
Bit flips are counted at the memory controller
Test different data structures, with different cache parameters, over a varying number of operations
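To make the measurement idea concrete, here is a toy model of counting flips where writes reach memory. It is not the framework used in the talk: it ignores caches entirely, and the class name and interface are invented for this illustration:

```python
class FlipCountingMemory:
    """Byte-addressable memory that tallies bit flips on every write."""

    def __init__(self, size: int):
        self.cells = bytearray(size)  # memory starts zeroed
        self.flips = 0
        self.writes = 0

    def write(self, addr: int, data: bytes) -> None:
        for offset, new in enumerate(data):
            old = self.cells[addr + offset]
            # Only bits that differ from the stored value count as flips.
            self.flips += bin(old ^ new).count("1")
            self.cells[addr + offset] = new
        self.writes += 1


mem = FlipCountingMemory(16)
mem.write(0, bytes([0b01010001]))  # 3 flips from the zeroed cell
mem.write(0, bytes([0b01100101]))  # 3 flips relative to the previous byte
mem.write(0, bytes([0b01100101]))  # same value again: 0 flips

print(mem.writes, mem.flips)  # 3 6
```

The third write shows why writes and flips are different metrics: it counts as a write but changes no cells.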
18
Bit flips: calling malloc()
Cross-over point between 40 and 48 bytes
19
Bit flips: XOR Linked Lists
XOR linked lists reduce bit flips dramatically
20
Bit flips: Hash table
Hash table already had few flips to save: chains should be short
21
Bit flips: Red-black Trees
Significant savings
22
Bit flips: L2 Cache Behavior
L2 cache has ultimately little effect!
...even when increasing in size
24
Performance: RBT insert
Performance is not significantly affected!
a) Performance cost of XORs
b) Performance benefit of smaller node size
25
Performance: hash table
Almost no effect on performance
Conclusions
26
Savings are significant with little performance impact.
We can design around bit flips, and we should.
Bit flip/write inversion
27
Data structures
System simulation
Caching effects
Pointer distance
Power and wear estimates
Full systems
Real hardware
More data structures
Additional techniques
Bit flips from algorithms
ABI modifications
Thank You! Questions?
Daniel Bittman
darrell@ucsc.edu elm@ucsc.edu palvaro@ucsc.edu
28
https://gitlab.soe.ucsc.edu/gitlab/crss/opensource-bitflipping-fast19
29
Backup slides
30
Bit flips: instrumentation
31
Performance: RBT lookup
32
Performance: XOR Linked List
Operation XOR Linked List Doubly Linked List
45 +/- 1 45 +/- 1
27 +/- 1 28 +/- 1
2.6 +/- 0.1 2.2 +/- 0.1
Stack frames
33
[Figure: stack frames of fn0 and fn1, with slots labeled arg0, pc, csr0, sp, csr1]
Stack frames
34
[Figure: stack frames of fn0 and fn1, with slots labeled arg0, pc, csr0, sp, csr1, and a region marked “Wasted space”]
35
Bit flips: stack frames