MOSAIC: The Case for a Scalable Coherence Protocol for Complex On-Chip Cache Hierarchies in Many-Core Systems
Lucía G. Menezo, Valentín Puente, José Ángel Gregorio
University of Cantabria (Spain)
University of Cantabria, Edinburgh - PACT 2013

Outline
Motivation
Directory Schemas
◦ In-cache
◦ Sparse
MOSAIC Coherence Protocol
◦ Examples
Evaluation Results
Conclusions
Motivation
◦ Performance improvement: more processors per chip
◦ Major challenge: the off-chip bandwidth wall → introduce cache into the chip → complex on-chip cache hierarchies
◦ The coherence protocol has a fundamental role to play
Motivation
Which coherence protocol should be used with a large number of cores?
◦ Broadcast-based protocols: high energy requirements
◦ Directory-based protocols: more storage needed for sharing information
MOSAIC: a new coherence protocol
◦ Directory without inclusiveness
◦ Token Coherence to guarantee correctness
Directory schemas: In-cache
◦ Each LLC block includes its tag, its data and the sharers information
◦ The LLC receives the requests, so it needs precise knowledge of the sharers
◦ Inclusiveness is necessary: any block in the private levels must also be allocated in the LLC
◦ Advantage: a less complex coherence protocol
◦ Disadvantage: every LLC block carries a storage overhead
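The sketch below gives a rough sense of the in-cache overhead. The slides do not specify how sharers are encoded, so it assumes a full-map sharer vector with one presence bit per core, stored alongside every LLC block. The LLC and block sizes come from the evaluation configurations listed later: 64 B blocks, a 16 MB LLC with 8 cores and a 32 MB LLC with 16 cores.

```python
# Sketch: sharer storage of an in-cache directory, ASSUMING a full-map
# bit-vector (one presence bit per core) attached to every LLC block.

MB = 1024 * 1024


def in_cache_sharer_bits(llc_bytes: int, block_bytes: int, cores: int) -> int:
    """Total sharer bits when every LLC block carries one bit per core."""
    blocks = llc_bytes // block_bytes
    return blocks * cores


def overhead_vs_data(block_bytes: int, cores: int) -> float:
    """Sharer bits as a fraction of the data bits of one block."""
    return cores / (block_bytes * 8)


# LLC sizes and block size taken from the evaluation configurations.
for cores, llc in [(8, 16 * MB), (16, 32 * MB)]:
    kb = in_cache_sharer_bits(llc, 64, cores) // 8 // 1024
    print(f"{cores} cores, {llc // MB} MB LLC: {kb} KB of sharer bits, "
          f"{overhead_vs_data(64, cores):.1%} of the data array")
```

Under this assumption, the 8-core LLC spends 256 KB on sharer bits and the 16-core LLC spends 1 MB. The per-block fraction doubles with the core count, and every LLC block pays it whether or not any private cache holds a copy.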
Directory schemas: In-cache
[Figure: processors (P) and their private caches, connected through the interconnection network to the LLC + in-cache directory. Every LLC entry holds an address tag (@), data and a sharers field. Callout: "Overhead!!!"]
Directory schemas: In-cache
[Figure: the same organization (processors and private caches, interconnection network, LLC + in-cache directory whose entries hold @, data and sharers), now with two "Overhead!!!" callouts.]
Directory schemas: Sparse
◦ Directory entries are separated from the data
◦ Entries are allocated on demand
◦ The overhead is proportional to the aggregate size of the private levels (not of the LLC)
◦ Capacity and associativity have to be sufficient to keep the private-level cache tags
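A sparse directory only needs entries for blocks that can live in the private levels, so its size can be estimated from the aggregate private capacity. This sketch makes several assumptions: "32KB I/D" means separate 32 KB instruction and data L1s, the exclusive 64 KB L2 counts in full, and entries take the 8 bytes quoted in the results.

```python
# Sketch: size a sparse directory so it can track every block the private
# caches may hold. ASSUMED: separate 32 KB L1-I and L1-D, a 64 KB L2
# exclusive with L1, 64 B blocks, 8-byte directory entries.

KB = 1024


def sparse_dir_entries(cores: int, private_bytes_per_core: int,
                       block_bytes: int = 64) -> int:
    """One entry per block that the private levels can hold in aggregate."""
    return cores * private_bytes_per_core // block_bytes


def sparse_dir_bytes(entries: int, entry_bytes: int = 8) -> int:
    return entries * entry_bytes


PRIVATE_PER_CORE = 32 * KB + 32 * KB + 64 * KB  # L1-I + L1-D + exclusive L2

for cores in (8, 16):
    n = sparse_dir_entries(cores, PRIVATE_PER_CORE)
    print(f"{cores} cores: {n} entries, {sparse_dir_bytes(n) // KB} KB")
```

For 8 cores this gives 16,384 entries (128 KB), which matches the "128KB 16K entries" directory in the results. For 16 cores it gives 256 KB. Neither figure depends on the LLC size.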
Directory schemas: Sparse
[Figure: processors (P) and their private caches, connected through the interconnection network to the LLC, whose entries hold an address tag (@) and data, and to a separate sparse directory, whose entries hold an address tag (@) and a sharers field.]
Directory schemas: Sparse
◦ Duplicate-tag directory: holds all the tags of the private levels
◦ Associativity = # cores × private cache associativity
◦ # sets = # private cache sets
◦ Example: 16 cores with a 4-way 32KB L1 → 64-way
[Figure: duplicate-tag directory drawn as a grid of tag entries]
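A few lines of arithmetic check the two sizing rules on this slide. The sketch assumes 64 B blocks, the block size used in the evaluation, and only tracks the L1 from the slide's example.

```python
# Sketch: geometry of a duplicate-tag directory that mirrors every
# private-cache tag. Rules from the slide:
#   associativity = #cores * private-cache associativity
#   #sets         = #private-cache sets
# ASSUMPTION: 64-byte blocks.
from typing import Tuple


def cache_sets(size_bytes: int, ways: int, block_bytes: int = 64) -> int:
    return size_bytes // (block_bytes * ways)


def duplicate_tag_geometry(cores: int, priv_size: int, priv_ways: int,
                           block_bytes: int = 64) -> Tuple[int, int]:
    """Return (sets, ways) of the duplicate-tag directory."""
    sets = cache_sets(priv_size, priv_ways, block_bytes)
    return sets, cores * priv_ways


sets, ways = duplicate_tag_geometry(cores=16, priv_size=32 * 1024, priv_ways=4)
print(f"{sets} sets x {ways} ways")  # 128 sets x 64 ways
```

The directory keeps the private cache's 128 sets but needs 64 ways. That high associativity is exactly what the next slide trades away.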
Directory schemas: Sparse
◦ Decrease associativity: now much lower than # cores × private cache associativity
◦ Increase the number of sets
◦ One tag may be present in several private caches
◦ More than one tag competing for an entry causes conflicts
◦ Inclusiveness is needed: conflicts force the invalidation of private data (recall messages)
[Figure: reduced-associativity sparse directory drawn as a grid of tag entries, each with a sharers field]
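The following toy model makes the cost of inclusiveness concrete. It is a set-associative sparse directory with LRU replacement. When a new tag arrives at a full set, the victim entry is evicted, and because the directory is inclusive, every sharer recorded in it gets a recall (invalidation). The LRU policy, the modulo set index and the class and function names are illustrative assumptions, not details from the slides.

```python
from collections import OrderedDict


class SparseDirectory:
    """Toy inclusive sparse directory. Each entry maps a block address to
    the set of cores sharing it. ASSUMED: LRU replacement and
    set index = block % number_of_sets."""

    def __init__(self, sets: int, ways: int):
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(sets)]
        self.recalls = 0  # invalidations sent to private caches

    def access(self, block: int, core: int) -> None:
        entries = self.sets[block % len(self.sets)]
        if block in entries:
            entries.move_to_end(block)            # hit: now most recently used
        elif len(entries) == self.ways:
            _victim, sharers = entries.popitem(last=False)  # evict LRU entry
            # Inclusiveness: every private copy of the victim must be recalled.
            self.recalls += len(sharers)
        entries.setdefault(block, set()).add(core)


def recalls_for(ways: int) -> int:
    """16 distinct blocks, all mapping to set 0 of a 4-set directory."""
    d = SparseDirectory(sets=4, ways=ways)
    for b in range(16):
        d.access(block=4 * b, core=b % 4)
    return d.recalls


print(recalls_for(2), recalls_for(16))  # 14 0
```

With only 2 ways, the conflicting trace triggers 14 recalls, and with 16 ways it triggers none. MOSAIC drops the inclusiveness requirement, so evictions like these could proceed without recalling private copies.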
MOSAIC Protocol
◦ Works with either an in-cache or a sparse directory
◦ No inclusiveness
◦ No invalidations of data in private caches
◦ Sharing information is reconstructed on demand
◦ Uses token counting to avoid extra traffic and guarantee correctness
Token Coherence protocol:
◦ Initially each block has # tokens (== # processors)
◦ Read request: data and 1 token
◦ Write request: data and all tokens
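A small invariant checker can capture the token rules above. In this sketch, every block starts with as many tokens as processors, all held at the LLC (an assumption: the slide does not say where they start). A cache may read a block while it holds at least one token and write it only while it holds all of them. Transfers move tokens but never create or destroy them. The class and method names are hypothetical, and the owner token of Token Coherence is left out for brevity.

```python
class TokenBlock:
    """Token counting for a single block (names are illustrative).
    The total equals the number of processors and never changes.
    Simplification: the owner token of Token Coherence is not modeled."""

    def __init__(self, num_procs: int, home: str = "LLC"):
        self.total = num_procs
        self.held = {home: num_procs}  # ASSUMED: all tokens start at home

    def transfer(self, src: str, dst: str, n: int) -> None:
        if n < 1 or self.held.get(src, 0) < n:
            raise ValueError(f"{src} cannot send {n} token(s)")
        self.held[src] -= n
        self.held[dst] = self.held.get(dst, 0) + n
        assert sum(self.held.values()) == self.total  # tokens are conserved

    def can_read(self, node: str) -> bool:
        return self.held.get(node, 0) >= 1

    def can_write(self, node: str) -> bool:
        return self.held.get(node, 0) == self.total


b = TokenBlock(num_procs=4)
b.transfer("LLC", "P2", 1)                   # read: data + 1 token
print(b.can_read("P2"), b.can_write("P2"))   # True False
b.transfer("LLC", "P0", 3)                   # write: P0 gathers the LLC's tokens
b.transfer("P2", "P0", 1)                    # ...and P2's, invalidating P2
print(b.can_write("P0"), b.can_read("P2"))   # True False
```

Correctness follows from counting alone. A writer holding every token means no other cache can hold one, so no other cache can read a stale copy.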
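MOSAIC Conceptual Approach
[Figure: private caches of P0 (state I, 0 tokens, no data), P1 (state O, 2 tokens, data) and P2 (state S, 1 token, data); the Last Level Cache, with a data slice (entry in state I, 0 tokens, no data) and a directory slice holding the sharers; a memory controller; all linked by the on-chip network. Numbered arrows 1 to 5 mark the steps of a request. Legend: State, Num. tokens, Data.]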
MOSAIC Key Facts
◦ When the data is not present in the LLC, a broadcast is issued for reconstruction
◦ Private caches report the number of tokens they hold
◦ Token counting avoids negative acknowledgements or timeouts
◦ The reconstruction message piggybacks the request type and the requestor
◦ Key: the directory may replace entries silently, with no invalidations
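Reconstruction can be sketched as follows. Each private cache answers the broadcast with the number of tokens it holds, and the directory rebuilds the sharer list from the non-zero answers. Because the total token count is known, the sketch can check that every token is accounted for without waiting on a timeout. This is one way token counting can stand in for NACKs or timeouts, not necessarily MOSAIC's exact mechanism. The reply format, the `owner` flag and the function name are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InfoReply:
    """Hypothetical answer of one private cache to a reconstruction broadcast."""
    cache: str
    tokens: int
    owner: bool = False  # ASSUMED flag: this cache holds the owner copy


def reconstruct(replies: List[InfoReply], total_tokens: int,
                llc_tokens: int = 0) -> Tuple[List[str], Optional[str]]:
    """Rebuild (sharers, owner) of one block from token-count replies."""
    counted = llc_tokens + sum(r.tokens for r in replies)
    if counted != total_tokens:  # some tokens still unaccounted for
        raise RuntimeError(f"{counted}/{total_tokens} tokens accounted for")
    sharers = [r.cache for r in replies if r.tokens > 0]
    owner = next((r.cache for r in replies if r.owner), None)
    return sharers, owner


# Token distribution of the "MOSAIC Conceptual Approach" figure (3 tokens).
replies = [InfoReply("P0", 0), InfoReply("P1", 2, owner=True),
           InfoReply("P2", 1)]
print(reconstruct(replies, total_tokens=3))  # (['P1', 'P2'], 'P1')
```

The sharing state can always be rebuilt from the tokens held in the private caches, so dropping a directory entry loses nothing needed for correctness. That is why the directory can replace entries silently.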
MOSAIC Read Request
[Figure: message sequence among P0, P1, P2, P3, the directory (Dir) and the LLC. Messages: Read, Reconstruction, Info 1 token, Info 2 tokens (Owner), Unblock (info 1 token), Forward GETS to Owner, Data + token. Cache states shown: Invalid, IS, S, O, C, A. Token labels: 3 tokens, 1 token. The directory's sharing information grows as requests complete: Sharers [P2], Owner: ¿? → Sharers [P2, P1], Owner: P1 → Sharers [P2, P1, P0], Owner: P1 → Sharers [P2, P1, P0, P3], Owner: P1.]
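MOSAIC Write Request
[Figure: message sequence among P0, P1, P2, P3, the directory (Dir) and the LLC, including a directory eviction. Messages: Write, Reconstruction, Data + 3 tokens, 1 token, Unblock (info all tokens). Cache states shown: Invalid, IS, IM, M, S, O, C, A. Token labels: 3 tokens, 1 token. Directory state shown: Sharers [P0], Owner: P0.]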
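The write example can be replayed with simple token bookkeeping. The requester waits in the transient state IM and moves to M only once it has collected every token. The sketch assumes 4 processors, hence 4 tokens, which is consistent with the "Data + 3 tokens" and "1 token" messages in the figure. The senders and the function name are hypothetical.

```python
from typing import List, Tuple


def write_request(total_tokens: int,
                  arrivals: List[Tuple[str, int]]) -> List[str]:
    """Replay token messages reaching a writer in state IM.
    The writer moves to M only when it holds every token."""
    held, trace = 0, []
    for sender, n in arrivals:
        held += n
        if held > total_tokens:
            raise ValueError("more tokens than exist: conservation violated")
        state = "M" if held == total_tokens else "IM"
        trace.append(f"{n} token(s) from {sender}: {held}/{total_tokens} -> {state}")
    return trace


# ASSUMED: 4 processors (P0-P3), so 4 tokens; senders are hypothetical.
for line in write_request(4, [("P1", 3), ("P2", 1)]):
    print(line)
```

If the last token never arrived, the writer would stay in IM. Counting, not a timeout, decides when the write may proceed.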
Evaluation methodology

| Parameter | Config 1 | Config 2 |
|---|---|---|
| Number of cores | 8 @ 3 GHz | 16 @ 3 GHz |
| Instruction window size / issue width | 128 / 4-way | 128 / 4-way |
| Block size | 64 B | 64 B |
| Private L1 size / associativity | 32 KB I/D, 2-way | 32 KB I/D, 2-way |
| Private L2 size / associativity | 64 KB, 4-way (exclusive with L1) | 64 KB, 4-way (exclusive with L1) |
| Shared L3 size / associativity | 16 MB, 16-way | 32 MB, 16-way |
| NUCA mapping | Static, interleaved across slices | Static, interleaved across slices |
| Memory capacity | 4 GB | 4 GB |
| Max. outstanding memory operations | 16 | 16 |
| Topology | 4×4 mesh | 6×6 mesh |

[Figure: floorplans. Config 1: a 4×4 mesh of routers (R) connecting LLC slices 0-15, with cores 0-7 placed around it. Config 2: a 6×6 mesh connecting LLC slices 0-31, with cores 0-15 placed around it.]
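Static, interleaved NUCA mapping gives each block a fixed home slice determined by its address. The slides do not give the interleaving granularity. A common choice, assumed here, is to interleave at block granularity using the block address, so consecutive 64 B blocks land in consecutive slices.

```python
BLOCK_BYTES = 64  # block size from the configuration table


def home_slice(addr: int, num_slices: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Static, block-interleaved NUCA mapping (ASSUMED granularity):
    consecutive blocks go to consecutive LLC slices."""
    return (addr // block_bytes) % num_slices


# 16 slices (Config 1) vs 32 slices (Config 2)
for addr in (0x0000, 0x0040, 0x0080, 0x03C0, 0x0400):
    print(hex(addr), home_slice(addr, 16), home_slice(addr, 32))
```

Address 0x400 (block 16) wraps back to slice 0 with 16 slices but lands in slice 16 with 32. The mapping is fixed, so any core can compute a block's home slice without a lookup.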
Simulation stack and Workloads
GEMS: full-system evaluation
◦ SLICC: Specification Language for Implementing Cache Coherence
Multithreaded workloads
◦ 4 Wisconsin Commercial Workloads (Apache, Jbb, OLTP, Zeus)
◦ 3 NAS Parallel Benchmarks (FT, IS, LU)
Multiprogrammed workloads
◦ 3 SPEC 2006 benchmarks in rate mode (Astar, Hmmer, Omnetpp)
MOSAIC Performance: Reducing associativity
[Figure: normalized execution time (y-axis, 0.5 to 1.1) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and their geometric mean, with MOSAIC directories of 64w128KB, 32w128KB, 2w128KB and 1w128KB. Note: 128KB = 16K entries (8 bytes per entry).]
Number of misses
[Figure: normalized number of misses (y-axis, 0 to 2) for BASE and MOSAIC on Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP and Zeus, each at directory associativities of 64, 32, 2 and 1, broken down into L2, L1I and L1D misses. Callout: "x2".]
MOSAIC Performance: Reducing associativity and capacity
[Figure: normalized execution time (y-axis, 0.4 to 1.1) for the same workloads and their geometric mean, with directories of 64w16KB, 32w16KB, 2w16KB and 1w16KB. Notes: 128KB = 16K entries (8 bytes per entry); 16KB = 2K entries.]
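The directory configurations in these charts are labeled ways × capacity (e.g. 64w16KB). The sketch reads "Nw" as the associativity and the size as the total directory storage. That reading is an interpretation, but it is consistent with the "128KB = 16K entries" and "16KB = 2K entries" notes. With the 8-byte entries quoted on the slides, each label converts to entries and sets as follows.

```python
import re
from typing import Tuple

ENTRY_BYTES = 8  # "8 bytes per entry" (from the slides)


def decode(label: str) -> Tuple[int, int, int]:
    """'64w16KB' -> (ways, entries, sets)."""
    m = re.fullmatch(r"(\d+)w(\d+)KB", label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    ways, kb = int(m.group(1)), int(m.group(2))
    entries = kb * 1024 // ENTRY_BYTES
    return ways, entries, entries // ways


for label in ("64w128KB", "1w128KB", "64w16KB", "2w16KB"):
    ways, entries, sets = decode(label)
    print(f"{label}: {entries} entries, {sets} sets x {ways} ways")
```

Under this reading, "1w128KB" is a direct-mapped directory with 16,384 sets, and "64w16KB" has only 32 sets of 64 ways.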
MOSAIC Latency
[Figure: latency in processor cycles (y-axis, 0 to 12) for BASE and MOSAIC on Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP and Zeus, each at directory associativities of 64, 32, 2 and 1, broken down into L3, Other L2, Other L1, Private L2 and Local L1. Directory size: 16KB (2K entries).]
MOSAIC Link Utilization
[Figure: average network link utilization (y-axis, 0 to 1.4) for each workload and the geometric mean, with directories of 64w128KB, 64w64KB, 64w32KB, 64w8KB, 2w128KB, 2w64KB and 2w16KB.]
MOSAIC Link Utilization vs. Dir
[Figure: normalized network link utilization (y-axis, 0 to 1.4) for each workload and the geometric mean, with 2w128KB, 2w64KB and 2w16KB directories. Callout: "40%!!"]
MOSAIC Scalability (16-core configuration)
[Figure: normalized link utilization (y-axis, 0 to 2) for each workload and the geometric mean, with directories of 128w256KB, 128w128KB, 128w64KB, 128w32KB, 2w256KB, 2w128KB, 2w64KB and 2w32KB.]
Conclusions
MOSAIC Coherence Protocol: the bandwidth scalability of a directory combined with the elegance of Token Coherence
◦ Low complexity and great scalability
◦ Very low storage overhead
◦ No noticeable energy cost
◦ An alternative for future many-core cache-coherent CMPs
Thank you for your attention
Realistic Cache Configuration
[Figure: normalized execution time (y-axis, 0 to 1.2) for each workload and the geometric mean, with MOSAIC directories of 16w512KB, 16w256KB, 16w128KB, 16w64KB and 16w32KB. Annotations: "x2 full dir", "1/10 full dir". Caches: L1 4-way 32KB, L2 8-way 256KB.]
- Same experiment with BASE: 20% impact in some cases
MOSAIC Energy
[Figure: normalized dynamic energy (y-axis, 0 to 1.8) for BASE and MOSAIC on Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP and Zeus at several directory sizes (axis labels 128, 64, 16), broken down into Network, Sparse directory, L3, L2 and L1.]