towards thousand-core risc-v shared memory systems...towards thousand-core risc-v shared memory...
TRANSCRIPT
![Page 1: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/1.jpg)
Towards Thousand-Core RISC-V Shared Memory Systems
Quan Nguyen Massachusetts Institute of Technology
30 November 2016
![Page 2: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/2.jpg)
Outline
• The Tardis cache coherence protocol – Example – Scalability advantages
• Thousand-core prototype • RISC-V and Tardis
2
![Page 3: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/3.jpg)
Tardis
• A new cache coherence protocol • Enforces consistency through timestamps • Key idea: logical leases – Timestamps wts and rts: start and end of lease – Processors keep timestamp pts in register
3 Xiangyao Yu and Srinivas Devadas, “Tardis: Time Traveling Coherence Algorithm for Distributed Shared Memory”, PACT 2015.
![Page 4: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/4.jpg)
Block diagram Core
D$ Core
D$
Last-level cache
Main memory
Network-on-chip
4
![Page 5: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/5.jpg)
Metadata per cache line
• Client: – Use ordinary MESI states
• Manager: – Tracks only the exclusive owner – Tracks most recent lease
A state: M owner: 0 wts: 0 rts: 0
A state: M wts: 1 rts: 1
5
![Page 6: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/6.jpg)
Example
6
• Core 0 stores A • Manager leases A – Sets owner, wts, rts
• Core 0: – Gets A with
[wts, rts] = [0, 0] – Writes A’, creates new
version at [1, 1] – Updates its pts to 1
Core 0: Core 1: store A load B store B load A load B
Core 0 pts: 0
A state: I wts: rts:
B state: I wts: rts:
Core 1 pts: 0
A state: I wts: rts:
B state: I wts: rts:
pts: 1
A state: M wts: 0 rts: 0 A’ state: M wts: 1 rts: 1
Manager
A state: I owner: wts: 0 rts: 0
B state: I owner: wts: 0 rts: 0
A state: M owner: 0 wts: 0 rts: 0
![Page 7: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/7.jpg)
7
• Core 0 loads B – Sends pts to manager
• Manager leases B – Sets lease based on pts – [wts, rts] = [1, 11]
• Core 0 reads B at pts 1
Core 0: Core 1: store A load B store B load A load B
Core 0 pts: 1
A’ state: M wts: 1 rts: 1
B state: I wts: rts:
Core 1 pts: 0
A state: I wts: rts:
B state: I wts: rts:
Manager
A state: M owner: 0 wts: 0 rts: 0
B state: I owner: wts: 0 rts: 0 B state: S owner: wts: 1 rts: 11
B state: S wts: 1 rts: 11
Example
![Page 8: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/8.jpg)
8
• Core 1 stores B – Sends pts to manager
• Manager advances past Core 0’s lease – [wts, rts] = [12, 12]
• Instantly grant Core 0 exclusive ownership
• Core 1 writes B’ at pts 12
• Different versions of B coexist!
Core 0: Core 1: store A load B store B load A load B
Core 0 pts: 1
A’ state: M wts: 1 rts: 1
B state: S wts: 1 rts: 11
Core 1 pts: 0
A state: I wts: rts:
B state: I wts: rts:
Manager
A state: M owner: 0 wts: 0 rts: 0
B state: S owner: wts: 1 rts: 11 B state: M owner: 1 wts: 12 rts: 12
B’ state: M wts: 12 rts: 12
pts: 12
Example
![Page 9: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/9.jpg)
9
• Core 1 loads A • Manager sends Core 0
writeback request • Core 0 downgrades • Core 1 receives new
lease based on its pts – [wts, rts] = [12, 22]
• Core 1 reads A’ at pts 12
Core 0: Core 1: store A load B store B load A load B
Core 0 pts: 1
A’ state: M wts: 1 rts: 1
B state: S wts: 1 rts: 11
Core 1 pts: 12
A state: I wts: rts:
B’ state: M wts: 12 rts: 12
Manager
A state: M owner: 0 wts: 0 rts: 0
B state: M owner: 1 wts: 12 rts: 12
A’ state: S wts: 1 rts: 1
A’ state: S owner: wts: 1 rts: 1
A’ state: S wts: 12 rts: 22
A’ state: S owner: wts: 12 rts: 22
Example
![Page 10: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/10.jpg)
10
• Core 0 loads B • Cache hit; simply loads
B from data cache • Sequential order ≠
physical order
Core 0: Core 1: store A load B store B load A load B
Core 0 pts: 1
A’ state: S wts: 1 rts: 1
B state: S wts: 1 rts: 11
Core 1 pts: 12
A’ state: S wts: 12 rts: 22
B’ state: M wts: 12 rts: 12
Manager
A’ state: S owner: wts: 12 rts: 22
B state: M owner: 1 wts: 12 rts: 12
Example
![Page 11: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/11.jpg)
A case for scalability
• Track only one node: O(log N) storage • No broadcast invalidations • Timestamps not tied to core count – No need for synchronized real-time clocks – Can be compressed
11
![Page 12: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/12.jpg)
Outline
• The Tardis cache coherence protocol • Thousand-core prototype • RISC-V and Tardis
12
![Page 13: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/13.jpg)
Thousand-core shared memory systems
• Fit as many cores will fit on a ZC706 • Connect in a 3D mesh – Aurora links, six connectors per board
• Demonstrate shared memory at scale • Name: T-1000
13 Terminator 2: Judgment Day (1991) Carolco Pictures, Lightstorm Entertainment, Le Studio Canal+, and TriStar Pictures
![Page 14: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/14.jpg)
Tardis and RISC-V
• RISC-V: clean, extensible, orthogonal, free • Tardis supports RISC-V’s release consistency • Prototype Tardis on rocket-chip • Chisel simplifies extending hardware
14
![Page 15: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/15.jpg)
Outline
• The Tardis cache coherence protocol • Thousand-core prototype • RISC-V and Tardis – Release consistency – Atomic instructions – Synchronization (see S.M.)
15 Quan Nguyen, “Synchronization in Timestamp-Based Cache Coherence Protocols”, S.M. thesis, MIT, 2016.
![Page 16: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/16.jpg)
Consistency model comparison
Type Ordering rule Tardis rule
SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y
RC: acquires acq <p X ⟹ acq <s X acq <p X ⟹ acq <ts X
RC: releases X <p rel ⟹ X <s rel X <p rel ⟹ X <ts rel
RC: sync S ∈ {acq, rel} SX <p SY ⟹ SX <s SY SX <p SY ⟹ SX <ts SY
16
<p : program order <s : global memory order <ts : timestamp order
![Page 17: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/17.jpg)
Release consistency and Tardis
• Loosen constraint of pts with tsmin • Track most recent operation with tsmax • Fences: tsmin ← tsmax • Track acquires/releases with tsrel – Release: tsrel ← tsmax – Acquire: tsmin ← tsrel
17
![Page 18: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/18.jpg)
Load-reserved and store-conditional
• Tardis gives neat solution to LR/SC livelock • wts tracks cache line version • SC success condition: wtslr == wtsbefore sc
18
![Page 19: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/19.jpg)
19
• Core 0 performs lr on C – exclusive ownership – wtslr = 0
• Core 1 performs lr on C – core 0 downgraded
• Core 0 performs sc – core 1 downgraded – succeeds; wtslr == wts – writes C’ at pts 1
loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop
Core 0 pts: 0
C state: I wts: 0 rts: 0
Core 1 pts: 0
C state: I wts: 0 rts: 0
Manager
C state: I owner: wts: 0 rts: 0
LR/SC example
C state: M wts: 0 rts: 0
C state: M owner: 0 wts: 0 rts: 0 C state: M owner: 1 wts: 0 rts: 0
C state: M wts: 0 rts: 0
C state: S wts: 0 rts: 0
C state: S wts: 0 rts: 0
C state: M owner: 0 wts: 0 rts: 0
C’ state: M wts: 1 rts: 1
pts: 1
![Page 20: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/20.jpg)
Block diagram Rocket Core
HellaCache D$ Rocket Core
HellaCache D$
Last-level cache
Main memory
TileLink NoC
20
pts
metadata, hit/miss logic
message timestamps
new coherence logic
![Page 21: Towards Thousand-Core RISC-V Shared Memory Systems...Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 . Outline](https://reader034.vdocuments.site/reader034/viewer/2022042311/5ed9c66a3029ca4c5b53a63f/html5/thumbnails/21.jpg)
Thanks!
• Special thanks to Xiangyao Yu and Srini Devadas for their extensive advice and input
21