Security Refresh: Prevent Malicious Wear-out and Increase Durability for Phase-Change Memory with Dynamically Randomized Address Mapping
Nak Hee Seong, Dong Hyuk Woo, Hsien-Hsin S. Lee
Georgia Tech ECE
2
PCM as a Main Memory
• Non-volatility
• High density
• CMOS-compatible process
• Better scalability
• High read/write latency
• Limited write endurance (10^8 writes)
3
Write Endurance Schemes
Reducing bit flips
• Compare-N-Write [Yang, ISCAS-07] [Zhou, ISCA-36]
• Flip-N-Write [Cho, MICRO-42]
Evenly wearing out
• Row shifting & Segment swapping [Zhou, ISCA-36]
• Randomized Region-Based Start-Gap [Qureshi, MICRO-42]
4
What if we have
a malicious process?
5
Write Endurance Schemes
Reducing bit flips
• Compare-N-Write [Yang, ISCAS-07] [Zhou, ISCA-36]: DANGER, deterministic pattern
• Flip-N-Write [Cho, MICRO-42]: DANGER, deterministic pattern
Evenly wearing out
• Row shifting & Segment swapping [Zhou, ISCA-36]
• Randomized Region-Based Start-Gap [Qureshi, MICRO-42]
6
Write Endurance Schemes
Evenly wearing out
• Row shifting & Segment swapping [Zhou, ISCA-36]
• Randomized Region-Based Start-Gap [Qureshi, MICRO-42]
7
Write Endurance Schemes
Evenly wearing out
• Row shifting & Segment swapping [Zhou, ISCA-36]: DANGER, high HW overhead (address translation table)
• Randomized Region-Based Start-Gap [Qureshi, MICRO-42]
8
Write Endurance Schemes
Evenly wearing out
• Row shifting & Segment swapping [Zhou, ISCA-36]: DANGER, high HW overhead
• Randomized Region-Based Start-Gap [Qureshi, MICRO-42]: DANGER, static randomization (static randomizer, linear mapping)
9
Write Endurance Schemes
Evenly wearing out
• Row shifting & Segment swapping: DANGER, high HW overhead
• Randomized Region-Based Start-Gap: DANGER, static randomization
Security Refresh: low-cost dynamic randomization
11
Security Refresh
[Animation: write requests arrive over time at a four-block PCM region holding A(00), B(01), C(10), D(11), shown by memory address and by remapped memory address. RefreshInterval = 4. KEY0 = 01 is the previous key and KEY1 = 10 is the current key. RefreshMA steps through 00, 01, 10, 11; the refreshes for RefreshMA = 10 and 11 are ignored.]
Using XOR to Remap
Remap function: Remapped Memory Address = Memory Address XOR KEY
• A(00) XOR KEY(01) = 01
• B(01) XOR KEY(01) = 00
• C(10) XOR KEY(01) = 11
• D(11) XOR KEY(01) = 10
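As a minimal sketch of the remap function on this slide, the snippet below applies MA XOR KEY to the four example blocks with KEY = 01. The function name `remap` and the dictionary of blocks are illustrative choices, not names from the talk.

```python
def remap(ma: int, key: int) -> int:
    """Remapped memory address = memory address XOR key."""
    return ma ^ key

# The four blocks from the slide, keyed by their 2-bit memory address.
blocks = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}
KEY = 0b01

for name, ma in blocks.items():
    print(f"{name}({ma:02b}) XOR KEY({KEY:02b}) = {remap(ma, KEY):02b}")
```

Because XOR with a fixed key is its own inverse, applying `remap` twice with the same key returns the original address, which is what makes the mapping cheap to undo in hardware.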
12
Security Refresh
[Animation, continued: with RefreshInterval = 4, RefreshMA steps through 00, 01, 10, 11 and wraps back to 00. One full pass is a Refresh Round. The refreshes for RefreshMA = 10 and 11 are ignored. KEY0 (previous) and KEY1 (current) are updated when the round completes.]
Remap Function: MA XOR KEY
13
Dynamic Remapping
Remap Function: MA XOR KEY
[Animation, continued: the layout of A(00), B(01), C(10), D(11) in the PCM region changes from Security Refresh Round (i) to Security Refresh Round (i+1). The successive layouts are remapped by Key = 01, Key = 10 and Key = 11.]
14
Evaluation Methodology
• Monte Carlo simulations
• 4GB PCM, 4 banks
• Attack model
  • Attack a random address for each refresh round
  • Attack latency = 600 ns
15
Average Lifetime Evaluation
[Figure: average lifetime (days, axis 0 to 450) versus memory block size (256, 512, 1024, 2048, 4096, 8192 B), one series per refresh interval. Refresh intervals (write overhead): 1 (50.0%), 2 (33.3%), 4 (20.0%), 8 (11.1%), 16 (5.9%), 32 (3.0%), 64 (1.5%), 128 (0.8%). Annotation: 14 months.]
To increase lifetime:
• Shorter refresh round (refresh round = region size × refresh interval)
• Smaller block size
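The write-overhead percentages listed with the refresh intervals are consistent with one extra write per refresh interval on average, that is overhead = 1 / (interval + 1). That formula is inferred from the listed numbers rather than stated on the slide, and the function name is hypothetical. A short sketch that reproduces the legend:

```python
def write_overhead(interval: int) -> float:
    """Fraction of all writes that are refresh writes.

    Assumption (inferred from the slide's numbers): on average one extra
    write is issued per `interval` demand writes.
    """
    return 1.0 / (interval + 1)

for interval in (1, 2, 4, 8, 16, 32, 64, 128):
    print(f"interval {interval:3d}: {100 * write_overhead(interval):4.1f}%")
```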
16
Needs Shorter Round (Frequent Key Updates)
• Smaller region: higher vulnerability
• Shorter interval: higher write & performance overhead
17
Needs Shorter Round (Frequent Key Updates)
• Smaller region: higher vulnerability
• Shorter interval: higher write & performance overhead
Virtually enlarge a region with multi-level Security Refresh
Multi-Level Security Refresh
19
One-Level Security Refresh
20
Two-Level Security Refresh
21
Two-Level Security Refresh Evaluation
• Monte Carlo simulations
• 4GB PCM, 4 banks
• Attack model
  • Attack a random address for an inner refresh round
  • Attack latency = 600 ns
• Simulation
  • Memory block size: 256B
  • Outer region: 1GB, refresh interval of 128 writes
22
Two-Level Security Refresh Evaluation
[Figure: average lifetime (months, axis 0 to 100) versus the number of sub-regions (16, 32, 64, 128, 256, 512, 1024), one series per inner-level refresh interval. Inner-level refresh intervals (write overhead): 8 (11.80%), 16 (6.61%), 32 (3.78%), 64 (2.30%), 128 (1.54%), 256 (1.16%), 512 (0.97%), 1024 (0.87%). Annotations: theoretical limit = 97.09 months; 78.8 months; 1.54%.]
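The two-level write overheads in this evaluation can be reproduced by compounding the two levels, assuming the outer level adds one write per 128 demand writes and the inner level adds one write per inner interval of the writes it sees. This compounding model is an inference from the listed percentages, and the function name is hypothetical.

```python
def two_level_overhead(inner_interval: int, outer_interval: int = 128) -> float:
    """Fraction of all writes that are refresh writes with two levels.

    Assumption: the outer level inflates demand writes by (1 + 1/outer),
    and the inner level inflates what it receives by (1 + 1/inner).
    """
    total = (1 + 1 / outer_interval) * (1 + 1 / inner_interval)
    return (total - 1) / total

for inner in (8, 16, 32, 64, 128, 256, 512, 1024):
    print(f"inner interval {inner:4d}: {100 * two_level_overhead(inner):5.2f}%")
```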
Summary
• Security Refresh
  • Both security and durability
  • Low-cost, dynamic randomization
• Two-level Security Refresh
  • 78.8 months (11.80% write overhead)
  • 60.0 months (1.54% write overhead)
24
Thank You All!!
Questions?
Backup Slides
26
Write Endurance Schemes
Reducing bit flips
• Compare-N-Write: DANGER, deterministic pattern
• Flip-N-Write: DANGER, deterministic pattern
Evenly wearing out
• Row shifting & Segment swapping: DANGER, high HW overhead
• Randomized Region-Based Start-Gap: DANGER, static randomization
27
Lifetime of Prior Works
Redundant write reduction
• Data-Comparison & Write [Yang, ISCAS 2007]
• Flip-N-Write [Cho, MICRO 2009]
• Drawback: deterministic patterns
• Time to fail: ~2 minutes
Wear-leveling
• Row-Shifting & Segment-Swapping [Zhou, ISCA 2009]
  • Drawback: high hardware cost
  • Time to fail: ~34 hours
• Randomized Region-based Start-Gap [Qureshi, MICRO 2009]
  • Drawback: static randomization
  • Time to fail: ~18 minutes, or avg. 23 hours
28
Vulnerability of Prior Works
• Data-Comparison and Write
  • Repeatedly write complementary values
  • 2 minutes
• Flip-N-Write
  • Repeatedly write 0x00 and 0x01 in turn
  • 2 minutes
• Row Shifting and Segment Swapping
  • Regular shifting pattern and high hardware overhead
  • 2048 minutes for 16GB 16-bank PRAM memory
• Randomized Region Based Start-Gap
  • Static randomized address mapping
  • 34 minutes by carefully designed side-channel attacks
29
Prior Art: Dealing with Write Endurance
• Eliminating unnecessary or redundant writes
  • Partial dirty writes only [Lee, ISCA-36] [Qureshi, ISCA-36]
[Figure: an L1 or L2 cache line whose words are marked 1 1 0 0 0 1 0 0 is written to PCM main memory.]
30
Prior Art: Dealing with Write Endurance
• Eliminating unnecessary or redundant writes
  • Partial dirty writes only [Lee, ISCA-36] [Qureshi, ISCA-36]
  • Compare & write (silent stores) [Yang, ISCAS-07] [Zhou, ISCA-36]
[Figure: the incoming line FF00 DEAD BEEF 1234 5678 BCF0 0000 FFFF is compared word by word (=?) with the line read from PCM main memory, 0012 DEAD BEEF 1234 5678 CDA0 0000 1111. Only the differing words FF00, BCF0 and FFFF are written.]
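The compare-and-write idea can be sketched with the eight 16-bit words from this slide: read the stored line, compare word by word, and write only the words that differ. The function name and the list-based memory model are illustrative assumptions.

```python
def compare_and_write(stored: list, incoming: list) -> list:
    """Update `stored` in place, writing only words that changed.

    Returns the (index, word) pairs that were actually written.
    """
    writes = [(i, new) for i, (old, new) in enumerate(zip(stored, incoming))
              if old != new]
    for i, word in writes:
        stored[i] = word
    return writes

pcm_line = [0x0012, 0xDEAD, 0xBEEF, 0x1234, 0x5678, 0xCDA0, 0x0000, 0x1111]
new_line = [0xFF00, 0xDEAD, 0xBEEF, 0x1234, 0x5678, 0xBCF0, 0x0000, 0xFFFF]

written = compare_and_write(pcm_line, new_line)
print([f"{word:04X}" for _, word in written])
```

Five of the eight words are silent stores here, so only three word writes reach the PCM array.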
31
Prior Art: Dealing with Write Endurance
• Eliminating unnecessary or redundant writes
  • Partial dirty writes only [Lee, ISCA-36] [Qureshi, ISCA-36]
  • Compare & write (silent stores) [Yang, ISCAS-07] [Zhou, ISCA-36]
  • Flip-N-write (similar to bus-inverted coding) [Cho, MICRO-42]
[Figure: the word read from PCM main memory is 0001 0100 1110 1111 1100 0001 0000 1010 and the new data is 1110 1011 0000 0000 0000 1000 1111 0100. Their bitwise difference is 1111 1111 1110 1111 1100 1001 1111 1110, so the Hamming distance = 26 (out of 32) in this example.]
Idea: Reduce Hamming distance to reduce flipping
32
Prior Art: Dealing with Write Endurance
• Eliminating unnecessary or redundant writes
  • Partial dirty writes only [Lee, ISCA-36] [Qureshi, ISCA-36]
  • Compare & write (silent stores) [Yang, ISCAS-07] [Zhou, ISCA-36]
  • Flip-N-write (similar to bus-inverted coding) [Cho, MICRO-42]
[Figure: instead of the new data 1110 1011 0000 0000 0000 1000 1111 0100, its inverse 0001 0100 1111 1111 1111 0111 0000 1011 is stored with the flip bit set to 1. Against the old word 0001 0100 1110 1111 1100 0001 0000 1010 the bitwise difference is 0000 0000 0001 0000 0011 0110 0000 0001, so the Hamming distance = 6 (out of 32) in this example.]
Store inverted data with flip bit
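The Flip-N-Write example can be checked with a small sketch that uses the 32-bit words from the slides. This is a simplified model under stated assumptions: it compares only the data bits and ignores the cost of changing the flip bit itself, and the function names are illustrative.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def flip_n_write(old_word: int, new_data: int):
    """Return (word_to_store, flip_bit), choosing the cheaper encoding."""
    inverted = ~new_data & MASK
    if hamming(old_word, inverted) < hamming(old_word, new_data):
        return inverted, 1
    return new_data, 0

def decode(stored: int, flip_bit: int) -> int:
    """Recover the logical data from the stored word and its flip bit."""
    return (~stored & MASK) if flip_bit else stored

OLD = 0b0001_0100_1110_1111_1100_0001_0000_1010
NEW = 0b1110_1011_0000_0000_0000_1000_1111_0100

stored, flip = flip_n_write(OLD, NEW)
print(hamming(OLD, NEW), hamming(OLD, stored), flip)
```

With these inputs the direct write would flip 26 data bits, while the inverted encoding flips 6 and sets the flip bit.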
33
Prior Art: Dealing with Write Endurance
• Wear Leveling (evenly distribute writes)
  • Row shifting and Segment swapping [Zhou, ISCA-36]
Shift one byte for every 256 writes
[Figure: a PCM memory row with a row counter and a shift amount.]
34
Prior Art: Dealing with Write Endurance
• Wear Leveling (evenly distribute writes)
  • Row shifting and Segment swapping [Zhou, ISCA-36]
[Figure: the memory controller keeps per-segment counters and swaps a 1MB (hot) segment with a 1MB (cold) segment in PCM memory.]
4k-entry map table for 4GB PCM
35
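Prior Art: Dealing with Write Endurance
• Wear Leveling (evenly distribute writes)
  • Row shifting and Segment swapping [Zhou, ISCA-36]
  • Region-based start-gap (RBSG) [Qureshi, MICRO-42]
[Animation (courtesy: Moin Qureshi of IBM Corp.): lines A, B, C, D occupy locations 0 to 3 and location 4 is the spare GAP line. START and GAP registers and a region counter drive the gap movement.]
PCMAddr = (Start + Addr) mod N; if (PCMAddr >= Gap) PCMAddr++
(N is the number of lines in the region.)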
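The Start-Gap translation can be exercised with a minimal sketch: N lines live in N + 1 slots, the gap moves down one slot every `psi` writes, and Start advances once per full gap rotation. The class name, the `psi` parameter name, and the `mod N` wrap-around follow the commonly published Start-Gap description and are assumptions relative to this slide.

```python
class StartGap:
    """Toy model of Start-Gap wear leveling for one region."""

    def __init__(self, n_lines: int, psi: int):
        self.n = n_lines
        self.psi = psi                    # writes between gap movements
        self.start = 0
        self.gap = n_lines                # the spare slot starts at index N
        self.cells = [None] * (n_lines + 1)
        self.writes = 0

    def translate(self, addr: int) -> int:
        pcm_addr = (addr + self.start) % self.n
        if pcm_addr >= self.gap:
            pcm_addr += 1
        return pcm_addr

    def read(self, addr: int):
        return self.cells[self.translate(addr)]

    def write(self, addr: int, value) -> None:
        self.cells[self.translate(addr)] = value
        self.writes += 1
        if self.writes % self.psi == 0:
            self._move_gap()

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap around: the line in the last slot moves to slot 0.
            self.cells[0] = self.cells[self.n]
            self.cells[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # The line just above the gap moves into the gap.
            self.cells[self.gap] = self.cells[self.gap - 1]
            self.cells[self.gap - 1] = None
            self.gap -= 1
```

The mapping is a fixed rotation driven by two registers, which is the regularity that the static-randomization criticism in this talk targets.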
36
Randomized Region Based Start-Gap
[Figure: address space randomization maps each physical address (PA, 000 to 111, lines A to H) to an intermediate address (IA). Start-Gap translation then maps each IA to a memory address (MA) within Region #0 or Region #1, each of which has its own Gap line.]
37
Start-Gap Configuration
• System configuration
  • 16GB memory, 16 banks, 32KB physical page
  • 150 ns and 450 ns for PCRAM read and write latency
  • MC using open page policy
• Start-Gap configuration
  • DWF = 16
  • ψ = 100
  • Wmax = 10^8
  • Line = physical page
  • Region size: $K = 16\,\mathrm{GB} / 32\,\mathrm{KB} = 2^{19}$ lines
  • $W_{max} / \psi = 10^{8} / 100 \approx 1.91 \times 2^{19}$
[Figure: physical line addresses are interleaved across the 16 banks. Bank b holds lines 16(n-1)+b, 16n+b and 16(n+1)+b for b = 0, 1, 2, ..., 15, plus a GAP line.]
38
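Side-Channel Attack: Step 1
• Find a set (α) of logical addresses mapped to the same physical bank
  • using the latency difference between bank-conflict accesses and bank-parallel accesses
• Time: $\frac{16\,\mathrm{GB}}{32\,\mathrm{KB}} \times 4\ \text{iterations} \times (2 \times 150\,\mathrm{ns}) = 0.63\ \mathrm{sec}$
[Figure: logical line addresses A through R spread over Bank0 to Bank15 with a GAP line. Lines in the same bank cause bank conflicts, and lines in different banks are accessed in parallel. The 1st bank set is α.]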
39
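Side-Channel Attack: Step 2
• Shifting 16 lines
• Time: $16 \times \psi \times (450\,\mathrm{ns} + 150\,\mathrm{ns}) = 0.96\ \mathrm{msec}$, with ψ = 100
[Figure: after the gap has moved, the logical line addresses in Bank0 to Bank15 are shifted by one position.]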
40
Side-Channel Attack: Step 3
• Find a new set (β) of physical addresses mapped to the same bank as the first set (α).
• Finally, by comparing α with β, we find that H and G are physically contiguous line addresses.
• Time: $\frac{16\,\mathrm{GB}}{32\,\mathrm{KB}} \times 4\ \text{iterations} \times (2 \times 150\,\mathrm{ns}) = 0.63\ \mathrm{sec}$
[Figure: logical line addresses over Bank0 to Bank15 after the shift. The 2nd bank set is β.]
41
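Side-Channel Attack: Step 4
• Attack the logical line address H for one gap rotation.
  • Time: $\frac{16\,\mathrm{GB}}{32\,\mathrm{KB}} \times \psi \times (450\,\mathrm{ns} \times DWF + 150\,\mathrm{ns} + 450\,\mathrm{ns}) = 409\ \mathrm{sec}$
• Attack the logical line address G for one gap rotation.
  • Time: $\frac{16\,\mathrm{GB}}{32\,\mathrm{KB}} \times \psi \times (450\,\mathrm{ns} \times DWF + 150\,\mathrm{ns} + 450\,\mathrm{ns}) = 409\ \mathrm{sec}$
[Figure: logical line addresses over Bank0 to Bank15 during the attack, with the GAP line.]
Fail in 14 minutes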
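The per-step times of the side-channel attack can be re-derived from the Start-Gap configuration (16GB memory, 32KB lines, 150 ns read, 450 ns write, DWF = 16, ψ = 100). The way the terms are grouped below is a reading of the garbled timing expressions on the Step 1 to Step 4 slides. In particular, the factor ψ in Steps 2 and 4 is an assumption, chosen because it makes the arithmetic match the stated 0.96 msec and 409 sec.

```python
GB = 1 << 30
KB = 1 << 10
NS = 1e-9

LINES = (16 * GB) // (32 * KB)        # 2**19 lines of 32KB each
READ, WRITE = 150 * NS, 450 * NS      # PCRAM latencies from the configuration
PSI, DWF = 100, 16

step1 = LINES * 4 * (2 * READ)                        # find bank set alpha
step2 = 16 * PSI * (WRITE + READ)                     # shift 16 lines
step3 = step1                                         # find bank set beta
step4_each = LINES * PSI * (WRITE * DWF + READ + WRITE)  # one gap rotation

total = step1 + step2 + step3 + 2 * step4_each        # attack H, then G
print(f"step 1: {step1:.2f} s, step 2: {step2 * 1e3:.2f} ms, "
      f"step 4 (each): {step4_each:.0f} s, total: {total / 60:.1f} min")
```

Under these assumptions the total comes to a little under 14 minutes, in line with the "Fail in 14 minutes" figure.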
42
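Proof of Security Refresh
• Magic of XOR!!
  • Associative property: $(x \oplus y) \oplus z = x \oplus (y \oplus z)$
  • Commutative property: $x \oplus y = y \oplus x$
  • Self-inverse property: $x \oplus x = e$, where $e$ is an identity element.
• A swapped victim is also remapped by a new key.
• Assume CRP = A.
  • New location of A $= A \oplus KEY_{NEW}$
  • RMA of the victim $= A \oplus KEY_{NEW}$
  • MA of the victim $=$ RMA of the victim $\oplus\, KEY_{OLD} = A \oplus KEY_{NEW} \oplus KEY_{OLD}$
  • New location of the victim $=$ MA of the victim $\oplus\, KEY_{NEW} = A \oplus KEY_{NEW} \oplus KEY_{OLD} \oplus KEY_{NEW} = A \oplus KEY_{OLD}$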
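The swap argument can be checked exhaustively for small address widths. The sketch below assumes the reading that the block refreshed at CRP = A moves to A XOR KEY_NEW, and that the victim is whatever block currently lives there under KEY_OLD. The function name is hypothetical.

```python
from itertools import product

def swap_is_consistent(n_bits: int) -> bool:
    """Check that the victim's new location is always A's old location."""
    size = 1 << n_bits
    for a, key_old, key_new in product(range(size), repeat=3):
        new_loc_a = a ^ key_new               # where A moves to
        victim_ma = new_loc_a ^ key_old       # MA of the block living there
        victim_new_loc = victim_ma ^ key_new  # where the new key sends it
        if victim_new_loc != a ^ key_old:     # must be the slot A vacated
            return False
    return True

print(swap_is_consistent(3))
```

A single swap therefore remaps two blocks correctly at once, which is why the second block of each pair can be skipped later in the round.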
43
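How to know already remapped or not
• In other words, was the MA pointed to by CRP the victim of a previous CRP?
• If it is true, MA of the victim of $CRP_{PREV} = CRP$, where $CRP_{PREV} < CRP$.
  • RMA of the victim of $CRP_{PREV} = CRP_{PREV} \oplus KEY_{NEW}$
  • MA of the victim of $CRP_{PREV} = CRP_{PREV} \oplus KEY_{NEW} \oplus KEY_{OLD}$
  • $CRP = CRP_{PREV} \oplus KEY_{NEW} \oplus KEY_{OLD}$, so $CRP_{PREV} = CRP \oplus KEY_{NEW} \oplus KEY_{OLD}$
  • Therefore, if $CRP \oplus KEY_{NEW} \oplus KEY_{OLD} < CRP$, then the CRP was already remapped.
• Check if $CRP \oplus KEY_{NEW} \oplus KEY_{OLD} < CRP$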
44
How to select a Key for Address Translation
• Assume A is the MA of an incoming request.
• Two cases for using KEY1 ($KEY_{NEW}$):
  • if $A < CRP$,
  • or if $A \oplus KEY_{OLD} \oplus KEY_{NEW} < CRP$.
• Otherwise, use KEY0 ($KEY_{OLD}$).
45
Security Refresh Flowchart
Upper level: memory controller. Lower level: PCRAM bank array.
• Start: a request arrives from the upper level.
• Is the MA already remapped? If yes, RA = MA XOR KEY1. If no, RA = MA XOR KEY0.
• Send a request with RA to the lower level.
• Write operation? If no, end. If yes, GWC++.
• GWC overflow? If no, end.
• Is the CRP already remapped? If no, send 4 requests to the lower level (these additional 4 requests are generated for remapping):
  • Read from (CRP XOR KEY0)
  • Read from (CRP XOR KEY1)
  • Write to (CRP XOR KEY1)
  • Write to (CRP XOR KEY0)
• Advance CRP. CRP overflow? If yes, KEY0 = KEY1 and KEY1 = new key from RKG.
• End
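The flowchart can be turned into a small functional model to check that reads always return the latest written value while blocks are being swapped. This is a sketch under stated assumptions, not the hardware design: the class and method names are hypothetical, Python's `random.Random` stands in for the RKG, GWC overflow is modeled as reaching the refresh interval, and CRP is advanced after every refresh step.

```python
import random

class SecurityRefresh:
    """One-level Security Refresh over a region of 2**n_bits blocks."""

    def __init__(self, n_bits: int, interval: int, seed: int = 0):
        self.size = 1 << n_bits
        self.interval = interval                   # demand writes per refresh
        self.rng = random.Random(seed)             # stands in for the RKG
        self.key0 = self.rng.randrange(self.size)  # previous key
        self.key1 = self.rng.randrange(self.size)  # current key
        self.crp = 0                               # current refresh pointer
        self.gwc = 0                               # global write counter
        self.cells = [None] * self.size            # physical blocks

    def remapped(self, ma: int) -> bool:
        """True if ma already sits at its KEY1 location in this round."""
        return ma < self.crp or (ma ^ self.key0 ^ self.key1) < self.crp

    def translate(self, ma: int) -> int:
        return ma ^ (self.key1 if self.remapped(ma) else self.key0)

    def read(self, ma: int):
        return self.cells[self.translate(ma)]

    def write(self, ma: int, value) -> None:
        self.cells[self.translate(ma)] = value
        self.gwc += 1
        if self.gwc == self.interval:              # GWC overflow
            self.gwc = 0
            self.refresh_step()

    def refresh_step(self) -> None:
        if not self.remapped(self.crp):
            # Two reads and two writes: swap the KEY0 and KEY1 locations.
            a, b = self.crp ^ self.key0, self.crp ^ self.key1
            self.cells[a], self.cells[b] = self.cells[b], self.cells[a]
        self.crp += 1
        if self.crp == self.size:                  # CRP overflow: round done
            self.crp = 0
            self.key0 = self.key1
            self.key1 = self.rng.randrange(self.size)

# The four-block example from the talk: KEY0 = 01 (previous), KEY1 = 10.
ex = SecurityRefresh(n_bits=2, interval=4)
ex.key0, ex.key1 = 0b01, 0b10
for name, ma in zip("ABCD", range(4)):
    ex.cells[ma ^ ex.key0] = name                  # initial layout under KEY0
for _ in range(4):
    ex.refresh_step()                              # one full refresh round
print(ex.cells)
```

In the example, only the refreshes at CRP = 00 and 01 perform swaps, and those at 10 and 11 are skipped, matching the "Ignore!!" steps in the animation.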
46
Smaller Block Size
[Animation: two bar charts of per-block lifetime against the write endurance limit as the total-writes counter increases from 0, one for block addresses 0 to 7 and one for block addresses 0 to 15.]
47
Shorter Refresh Round
[Animation: two bar charts of per-block lifetime against the write endurance limit for block addresses 0 to 7. The total-writes counter runs from 0 to 44 in steps of 4 in the first chart and from 0 to 70 in steps of 2 in the second.]
48
Two-Level Security Refresh Rationale
• Inner sub-region level
  • Smaller regions
  • More frequent refresh rounds with different random keys
• Outer bank level
  • Effectively enlarge the address remapping space
• Inner and outer levels can employ their own
  • Memory block sizes
  • Refresh intervals
49
Two-Level Security Refresh
[Figure: a request from the MC reaches RANK0 to RANK3. Each rank consists of Chip0 to Chip7, each chip returns data, and two-level Security Refresh is placed with the banks inside each chip.]
Protect PCRAM from side-channel attacks by implementing Security Refresh inside a bank.
50
Two-Level Security Refresh
[Figure: block diagram of a PCM bank, with request, write data and read data signals.
• MC level (upper level)
• Bank level (SR Level 1): bank SRC, swap buffers, address decoder
• Sub-region level (SR Level 2): sub-region SRC 0, sub-region SRC 1, ..., sub-region SRC (n-1), with shared swap buffers
• Physical array level (lower level): PCM bank array with sub-region 0, sub-region 1, ..., sub-region (n-1)]
51
Two-Level Security Refresh
OuterSRC PCM Region
Sub-region #0
Sub-region #1
Sub-region #2
Sub-region #3
InnerSRC #0
InnerSRC #1
InnerSRC #2
InnerSRC #3
52
Two-Level Security Refresh Example
• Initial state
  • Sub-region 0 (RA 0 00, 0 01, 0 10, 0 11) holds data B, A, D, C
  • Sub-region 1 (RA 1 00, 1 01, 1 10, 1 11) holds data F, E, H, G
  • Bank-region: GWC = 0, CRP = 000, KEY0 = 001, KEY1 = 110
  • Sub-region 0: GWC = 0, CRP = 00, KEY0 = 00, KEY1 = 10, swap buffers buf0 and buf1
  • Sub-region 1: GWC = 0, CRP = 00, KEY0 = 00, KEY1 = 01, swap buffers buf0 and buf1
  • Refresh interval: bank-region 1, sub-region 1
• Terminology
  • MC: memory controller
  • BSRC: bank-level SRC
  • SSRC0, SSRC1: sub-region SRC
  • MA: memory address from MC
  • BRA: bank-level remapped address
  • SRA: sub-region remapped address
53
Two-Level Security Refresh Example
• Starting state: sub-region 0 holds B, A, D, C and sub-region 1 holds F, E, H, G. Bank-region GWC = 0, CRP = 000, KEY0 = 001, KEY1 = 110. Sub-region 0 GWC = 0, CRP = 00, KEY0 = 00, KEY1 = 10. Sub-region 1 GWC = 0, CRP = 00, KEY0 = 00, KEY1 = 01. Refresh interval: bank-region 1, sub-region 1.
• MC level: Wr 000, I, followed by Rd 000
• Bank level (SR Level 1), BSRC: the write is translated to Wr 001, I. The GWC overflow starts the swap Rd 001, Rd 110, Wr 110 buf0, Wr 001 buf1. CRP = 001.
• Sub-region level (SR Level 2), SSRC0: Wr 001, I. The GWC overflow starts the swap Rd 000, Rd 010, Wr 010 buf0, Wr 000 buf1, so B and D exchange places. CRP = 01.
54
Two-Level Security Refresh Example
• MC level: Rd 000 waits while the bank-level swap completes
• Bank level (SR Level 1), BSRC: Rd 001 returns I through SSRC0 and Rd 110 returns H through SSRC1. Then Wr 001 buf1 writes H and Wr 110 buf0 writes I. Bank-region CRP = 001.
• Sub-region level (SR Level 2), SSRC0: Wr 001, H. The GWC overflow starts the swap Rd 001, Rd 011, Wr 001 buf1, Wr 011 buf0, so H and C exchange places. CRP = 10.
• Sub-region level (SR Level 2), SSRC1: Wr 110, I. The GWC overflow starts the swap Rd 100, Rd 101, Wr 100 buf1, Wr 101 buf0, so F and E exchange places. CRP = 01.
55
Two-Level Security Refresh Example
• MC level: Rd 000
• Bank level (SR Level 1), BSRC: MA 000 is below CRP = 001, so it is translated with KEY1 = 110 into Rd 110
• Sub-region level (SR Level 2), SSRC1: Rd 110 returns I
• Final state: sub-region 0 holds D, C, B, H and sub-region 1 holds E, F, I, G. Bank-region GWC = 0, CRP = 001, KEY0 = 001, KEY1 = 110. Sub-region 0 GWC = 0, CRP = 10, KEY0 = 00, KEY1 = 10. Sub-region 1 GWC = 0, CRP = 01, KEY0 = 00, KEY1 = 01. Refresh interval: bank-region 1, sub-region 1.
56
Evaluation Method
• Birthday Paradox Attack
  • Can fail RBSG in 1~2 months
  • Our side-channel attack failed RBSG much faster
57
Evaluation Method
• Equivalent to “throwing random balls to buckets” (collision attack)
• To fail a PCM cell takes 10^8 collisions
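The balls-into-buckets view can be illustrated with a tiny Monte Carlo sketch: throw writes at uniformly random blocks until one block reaches its endurance limit. The parameters are scaled far down from the 10^8 endurance on the slide so that the loop finishes quickly. The function name and all numbers are illustrative and are not taken from the authors' simulator.

```python
import random

def balls_until_failure(n_buckets: int, endurance: int, rng: random.Random) -> int:
    """Number of random writes until some block has absorbed `endurance` writes."""
    counts = [0] * n_buckets
    thrown = 0
    while True:
        bucket = rng.randrange(n_buckets)
        counts[bucket] += 1
        thrown += 1
        if counts[bucket] >= endurance:
            return thrown

rng = random.Random(42)
trials = [balls_until_failure(n_buckets=64, endurance=50, rng=rng) for _ in range(20)]
print(sum(trials) / len(trials))
```

Perfect leveling would allow 64 × 49 + 1 writes before the first failure in this toy setting. The gap between that bound and the simulated average is the cost of random collisions.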
58
Performance Evaluation
• Geometric means of IPC variations
  • -1.2%, -0.7% and -0.5% for the 3 inner refresh intervals
[Figure: IPC variations (axis -7% to 1%) per benchmark for inner-level refresh intervals (write overhead) 32 (3.78%), 64 (2.30%) and 128 (1.54%). Benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, 410.bwaves, 416.gamess, 433.milc, 435.gromacs, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex, 453.povray, 454.calculix, 459.GemsFDTD, 481.wrf, 482.sphinx3, Geomean.]