Two Ways to Exploit Multi-Megabyte Caches
AENAO Research Group @ Toronto
Kaveh Aasaraai
Ioana Burcea
Myrto Papadopoulou
Elham Safi
Jason Zebchuk
Andreas Moshovos
{aasaraai, ioana, myrto, elham, zebchuk, moshovos}@eecg.toronto.edu
EPFL, Jan. 2008, Aenao Group/Toronto
Future Caches: Just Larger?
[Figure: three CPUs, each with I$ and D$, an interconnect, and Main Memory; label: 10s – 100s of MB]
1. “Big Picture” Management
2. Store Metadata
Conventional Block Centric Cache
“Small” blocks optimize bandwidth and performance
Large L2/L3 caches especially
Fine-Grain View of Memory
L2 Cache
Big Picture Lost
“Big Picture” View
Region: 2^n-sized, aligned area of memory
Patterns and behavior exposed
Spatial locality
Exploit for performance/area/power
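Because a region is a power-of-two-sized, aligned area, the region that holds an address can be found with a shift. The sketch below is only an illustration: the 1KB region and 64-byte block sizes are borrowed from the example used later in the talk, and the function names are hypothetical.

```python
# Illustrative sizes (assumed): 1KB regions, 64-byte blocks.
REGION_SHIFT = 10   # log2(1024)
BLOCK_SHIFT = 6     # log2(64)

def region_of(addr: int) -> int:
    """Region number: drop the low REGION_SHIFT bits of the address."""
    return addr >> REGION_SHIFT

def block_in_region(addr: int) -> int:
    """Index of the block inside its region."""
    blocks_per_region = 1 << (REGION_SHIFT - BLOCK_SHIFT)
    return (addr >> BLOCK_SHIFT) & (blocks_per_region - 1)

def same_region(a: int, b: int) -> bool:
    """Two addresses share a region when their region numbers match."""
    return region_of(a) == region_of(b)
```

Neighbouring blocks map to the same region number, which is what lets a cache observe patterns at region granularity.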
Coarse-Grain View of Memory
L2 Cache
Exploiting Coarse-Grain Patterns
Many existing coarse-grain optimizations:
  Stealth Prefetching
  Run-time Adaptive Cache Hierarchy Management via Reference Analysis
  Destination-Set Prediction
  Spatial Memory Streaming
  Coarse-Grain Coherence Tracking
  RegionScout
  Circuit-Switched Coherence
Add new structures to track coarse-grain information
Hard to justify for a commercial design
Coarse-Grain Framework:
  Embed coarse-grain information in tag array
  Support many different optimizations with less area overhead
Adaptable optimization FRAMEWORK
[Figure: CPU and L2 Cache]
RegionTracker Solution
Manage blocks, but also track and manage regions
[Figure: L2 Cache with four L1s, Tag Array, Data Array, and RegionTracker; arrows labeled Block Requests, Data Blocks, Region Probes, and Region Responses]
RegionTracker Summary
Replace conventional tag array:
  4-core CMP with 8MB shared L2 cache
  Within 1% of original performance
  Up to 20% less tag area
  Average 33% less energy consumption
Optimization Framework:
  Stealth Prefetching: same performance, 36% less area
  RegionScout: 2x more snoops avoided, no area overhead
Road Map
Introduction
Goals
Coarse-Grain Cache Designs
RegionTracker: A Tag Array Replacement
RegionTracker: An Optimization Framework
Conclusion
Goals
1. Conventional Tag Array Functionality
  Identify data block location and state
  Leave data array unchanged
2. Optimization Framework Functionality
  Is Region X cached?
  Which blocks of Region X are cached? Where?
  Evict or migrate Region X
  Easy to assign properties to each Region
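The framework queries listed above can be pictured as a small interface. The following is a toy software model, not the hardware design from the talk: the class and method names are hypothetical, and a dictionary stands in for the set-associative structures introduced later.

```python
from dataclasses import dataclass, field

BLOCKS_PER_REGION = 16  # assumed: 1KB regions of 64-byte blocks


@dataclass
class RegionEntry:
    # way[i] is the data-array way holding block i, or None if not cached
    way: list = field(default_factory=lambda: [None] * BLOCKS_PER_REGION)
    # free-form per-region properties (hypothetical)
    props: dict = field(default_factory=dict)


class ToyRegionDirectory:
    def __init__(self):
        self.regions = {}  # region number -> RegionEntry

    def insert_block(self, region, block, way):
        self.regions.setdefault(region, RegionEntry()).way[block] = way

    def is_region_cached(self, region):
        """Is Region X cached?"""
        return region in self.regions

    def cached_blocks(self, region):
        """Which blocks of Region X are cached, and in which way?"""
        entry = self.regions.get(region)
        if entry is None:
            return {}
        return {b: w for b, w in enumerate(entry.way) if w is not None}

    def evict_region(self, region):
        """Evict Region X; return the block indices that were cached."""
        entry = self.regions.pop(region, None)
        if entry is None:
            return []
        return [b for b, w in enumerate(entry.way) if w is not None]

    def set_property(self, region, key, value):
        """Assign a property to a cached region."""
        self.regions[region].props[key] = value
```

The point of the sketch is that every question is answered from one per-region record, without scanning individual block tags.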
Coarse-Grain Cache Designs
Large Block Size: increased BW, decreased hit-rates
[Figure: Region X in the Tag Array and Data Array]
Sector Cache
Decreased hit-rates
[Figure: Region X in the Tag Array and Data Array]
Sector Pool Cache
High Associativity (2 - 4 times)
[Figure: Region X in the Tag Array and Data Array]
Decoupled Sector Cache
Region information not exposed
Region replacement requires scanning multiple entries
[Figure: Region X in the Tag Array, Status Table, and Data Array]
Design Requirements
Small block size (64B)
Miss-rate does not increase
Lookup associativity does not increase
No additional access latency (i.e., no scanning, no multiple block evictions)
Does not increase latency, area, or energy
Allows banking and interleaving
Fit in conventional tag array “envelope”
RegionTracker: A Tag Array Replacement
[Figure: four L1 caches, RegionTracker, and the Data Array]
3 SRAM arrays, combined smaller than tag array:
  Region Vector Array
  Block Status Table
  Evicted Region Buffer
Basic Structures
Region Vector Array (RVA) entry: Region Tag, then for each of block0 through block15 a V bit (1 bit) and a way (4 bits)
Block Status Table (BST) entry: status (3 bits) and a 2-bit field
Address: specific RVA set and BST set
RVA entry: multiple, consecutive BST sets
BST entry: one of four RVA sets
Ex: 8MB, 16-way set-associative cache, 64-byte blocks, 1KB region
Common Case: Hit
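Address fields, with bit boundaries 49, 21, 10, 6, 0: Region Tag | RVA Index | Region Offset | Block Offset
Data Array + BST Index, with bit boundaries 19, 6, 0: Index | Block Offset
[Figure: the RVA Index selects a Region Vector Array (RVA) set; the entry with the matching Region Tag supplies V (1 bit) and way (4 bits) for block0 through block15; the Block Status Table (BST) entry holds status (3 bits) and a 2-bit field; the result goes to the Data Array]
Ex: 8MB, 16-way set-associative cache, 64-byte blocks, 1KB region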
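As a rough illustration of the lookup address for this example configuration, the fields can be extracted with shifts and masks. The sketch reads the slide's labels 49, 21, 10, 6, 0 as field boundaries, which is an interpretation: Region Tag in bits 49..21, RVA Index in bits 20..10, Region Offset in bits 9..6, Block Offset in bits 5..0. The function name is hypothetical.

```python
BLOCK_BYTES = 64
REGION_BYTES = 1024

def split_address(addr: int) -> dict:
    """Split a 50-bit address into the four fields (assumed boundaries)."""
    return {
        "block_offset": addr & 0x3F,                    # bits 5..0
        "region_offset": (addr >> 6) & 0xF,             # bits 9..6
        "rva_index": (addr >> 10) & 0x7FF,              # bits 20..10
        "region_tag": (addr >> 21) & ((1 << 29) - 1),   # bits 49..21
    }

# Sanity check of the field widths against the example sizes:
# 1KB / 64B = 16 blocks per region, so the region offset needs 4 bits.
BLOCKS_PER_REGION = REGION_BYTES // BLOCK_BYTES
```

The 4-bit Region Offset selects one of the 16 per-block fields (block0 through block15) inside the matching RVA entry.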
Worst Case (Rare): Region Miss
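Address fields, with bit boundaries 49, 21, 10, 6, 0: Region Tag | RVA Index | Region Offset | Block Offset
Data Array + BST Index, with bit boundaries 19, 6, 0: Index | Block Offset
[Figure: no Region Tag in the selected Region Vector Array (RVA) set matches (No Match!); the Block Status Table (BST) entry holds status (3 bits) and Ptr (2 bits); the Evicted Region Buffer (ERB) is shown with a Ptr]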
Methodology
Flexus simulator from CMU SimFlex group
  Based on Simics full-system simulator
4-core CMP modeled after Piranha
  Private 32KB, 4-way set-associative L1 caches
  Shared 8MB, 16-way set-associative L2 cache
  64-byte blocks
Miss-rates: functional simulation of 2 billion instructions per core
Performance and Energy: timing simulation using SMARTS sampling methodology
Area and Power: full custom implementation on 130nm commercial technology
9 commercial workloads:
  WEB: SpecWEB on Apache and Zeus
  OLTP: TPC-C on DB2 and Oracle
  DSS: 5 TPC-H queries on DB2
[Figure: four processors (P), each with D$ and I$, an Interconnect, and a shared L2]
Miss-Rates vs. Area
Sector Cache: 512KB sectors, SPC and RT: 1KB regions
Trade-offs comparable to conventional cache
[Figure: Relative Miss-Rate (0.99 to 1.05) vs. Relative Tag Array Area (0.5 to 1.2) for Sector Pool Cache, RegionTracker, and Conventional Tags; Sector Cache lies off the plot at (0.25, 1.26); labeled points: 14-way, 15-way, 52-way, 48-way]
Performance & Energy
[Figure: two bar charts over WEB, OLTP, and DSS. Performance: Normalized Execution Time, y-axis 0.97 to 1.03. Energy: Reduction in Tag Energy, y-axis 0% to 50%.]
12-way set-associative RegionTracker: 20% less area
Error bars: 95% confidence interval
Performance within 1%, with 33% tag energy reduction
Road Map
Introduction
Goals
Coarse-Grain Cache Designs
RegionTracker: A Tag Array Replacement
RegionTracker: An Optimization Framework
Conclusion
RegionTracker: An Optimization Framework
[Figure: four L1 caches, RegionTracker (RVA, ERB, BST), and the Data Array]
Stealth Prefetching: average 20% performance improvement
  Drop-in RegionTracker for 36% less area overhead
RegionScout: in-depth analysis
Snoop Coherence: Common Case
[Figure: three CPUs and Main Memory; one CPU issues Read x, Read x+1, Read x+2, through Read x+n, and the other two CPUs each report a miss]
Many snoops are to non-shared regions
RegionScout
Eliminate broadcasts for non-shared regions
[Figure: three CPUs and Main Memory; a Read x miss is answered with Region Miss responses, yielding a Global Region Miss; structures labeled Non-Shared Regions and Locally Cached Regions]
RegionTracker Implementation
Minimal overhead to support RegionScout optimization
Still uses less area than conventional tag array
Non-Shared Regions: add 1 bit to each RVA entry
Locally Cached Regions: already provided by RVA
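The filtering idea can be sketched in software. This is a toy model under simplifying assumptions rather than the RegionScout hardware: evictions are not modeled, sets stand in for the RVA and its extra per-entry bit, and all names are hypothetical.

```python
class ToyNode:
    def __init__(self, name):
        self.name = name
        self.cached_regions = set()  # locally cached regions (from the RVA)
        self.non_shared = set()      # regions whose "non-shared" bit is set

    def snoop(self, region):
        """Another node asks about `region`.

        Returns True if this node caches part of the region. The requester
        is about to cache it too, so any local non-shared mark is cleared.
        """
        self.non_shared.discard(region)
        return region in self.cached_regions


def read_miss(requester, others, region):
    """Handle a read miss; return True if a snoop broadcast was sent."""
    if region in requester.non_shared:
        return False  # region known to be non-shared: skip the broadcast
    shared = False
    for node in others:          # broadcast: every other node is probed
        if node.snoop(region):
            shared = True
    requester.cached_regions.add(region)
    if not shared:               # global region miss
        requester.non_shared.add(region)
    return True
```

After one global region miss, later misses to the same region avoid the broadcast until another node touches that region.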
RegionTracker + RegionScout
[Figure: Reduction in Snoop Broadcasts (0% to 60%) for RS 7KB, RS 12KB, RS 22KB, and RSRT; BlockScout (4KB) also marked]
4 processors, 512KB L2 caches, 1KB regions
Avoid 41% of snoop broadcasts, no area overhead compared to conventional tag array
Result Summary
Replace Conventional Tag Array:
  20% less tag area
  33% less tag energy
  Within 1% of original performance
Coarse-Grain Optimization Framework:
  36% reduction in area overhead for Stealth Prefetching
  Filter 41% of snoop broadcasts with no area overhead compared to conventional cache
Predictor Virtualization
Ioana Burcea
Joint work with
Stephen Somogyi
Babak Falsafi
Predictor Virtualization
[Figure: CMP with several CPUs, each with L1-D and L1-I caches, an Interconnect, L2, and Main Memory]
Optimization Engines: Predictors
Motivating Trends
Dedicating resources to predictors hard to justify:
  Chip multiprocessors: space dedicated to predictors × #processors
  Larger predictor tables: increased performance
Memory hierarchies offer the opportunity:
  Increased capacity
  How many apps really use the space?
Use conventional memory hierarchies to store predictor information
PV Architecture contd.
[Figure: the Optimization Engine receives a request, sends a request to the Predictor Table, and gets back a prediction]
PV Architecture contd.
[Figure: the Optimization Engine sends a request to Predictor Virtualization and gets back a prediction]
PV Architecture contd.
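[Figure: the Optimization Engine sends a request to the PVProxy and gets back a prediction; the PVProxy contains the PVCache and MSHR and adds the index to PVStart; the PVTable is held in L2 and Main Memory]
On the backside of the L1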
To Virtualize Or Not to Virtualize?
1. Re-Use
2. Predictor Info Prefetching
[Figure: CPU with I$ and D$, interconnect, L2/L3, and Main Memory, annotated Common Case and Infrequent]
To Virtualize or Not?
Challenge: hit in the PVCache most of the time
Will not work for all predictors out of the box
Reuse is necessary
  Intrinsic: easy to virtualize
  Non-intrinsic: must be engineered
More so if the predictor needs to be fast to start with
Will There Be Reuse?
Intrinsic: multiple predictions per entry
  We’ll see an example
Can be engineered
  Group temporally correlated entries together: cache block
[Figure: CPU with I$ and D$, interconnect, L2/L3, and Main Memory]
Spatial Memory Streaming
Footprint: blocks accessed per memory region
Predict that next time the footprint will be the same
Handle: PC + offset within region
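A footprint can be represented as a bit vector with one bit per block of a region. The sketch below is a toy illustration, not the SMS hardware: it assumes 64-byte blocks and 32 blocks per region (the spatial-group size quoted later in the talk), uses a plain dictionary as the pattern table, and all names are hypothetical.

```python
BLOCK_BYTES = 64
BLOCKS_PER_REGION = 32                      # assumed spatial-group size
REGION_BYTES = BLOCK_BYTES * BLOCKS_PER_REGION


def footprint(accesses, region_base):
    """Bit i is set if block i of the region was accessed."""
    bits = 0
    for addr in accesses:
        if region_base <= addr < region_base + REGION_BYTES:
            bits |= 1 << ((addr - region_base) // BLOCK_BYTES)
    return bits


class ToyPatternTable:
    def __init__(self):
        self.table = {}  # handle (pc, offset within region) -> footprint

    def record(self, pc, offset, bits):
        self.table[(pc, offset)] = bits

    def predict(self, pc, offset, region_base):
        """Block addresses to prefetch, assuming the footprint repeats."""
        bits = self.table.get((pc, offset), 0)
        return [region_base + i * BLOCK_BYTES
                for i in range(BLOCKS_PER_REGION) if (bits >> i) & 1]
```

Recording a footprint under one handle and replaying it in a different region shows the prediction step: the same relative blocks are requested.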
Spatial Generations
Virtualizing SMS
[Figure: Detector and Predictor blocks with arrows labeled trigger access, patterns, and prefetches; annotation: Virtualize]
Virtualizing SMS
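[Figure: Virtual Table with 1K sets and 11 ways; PVCache with 8 sets and 11 ways; each block packs tag/pattern entries at bit positions 0, 11, 43, 54, 85, and so on, with the remaining bits unused]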
Packing Entries in One Cache Block
Index: PC + offset within spatial group
  PC → 16 bits
  32 blocks in a spatial group → 5-bit offset → 32-bit spatial pattern
  21-bit index
Pattern table: 1K sets
  10 bits to index the table → 11-bit tag
Cache block: 64 bytes
  11 entries per cache block → pattern table with 1K sets, 11-way set-associative
[Figure: cache block layout with tag/pattern entries at bit positions 0, 11, 43, 54, 85, and so on, with the remaining bits unused]
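The packing arithmetic on this slide can be checked directly. The snippet below only recomputes the numbers quoted above; the variable names are illustrative.

```python
PC_BITS = 16
OFFSET_BITS = 5            # 32 blocks per spatial group
SETS = 1024                # pattern table sets
BLOCK_BITS = 64 * 8        # one 64-byte cache block

index_bits = PC_BITS + OFFSET_BITS            # 21-bit index
set_bits = SETS.bit_length() - 1              # 10 bits select the set
tag_bits = index_bits - set_bits              # 11-bit tag
pattern_bits = 1 << OFFSET_BITS               # 32-bit spatial pattern
entry_bits = tag_bits + pattern_bits          # 43 bits per entry

entries_per_block = BLOCK_BITS // entry_bits                 # 11 entries
unused_bits = BLOCK_BITS - entries_per_block * entry_bits    # 39 bits left

# Bit position where each packed entry starts: 0, 43, 86, ...
entry_starts = [i * entry_bits for i in range(entries_per_block)]
```

One cache block therefore holds a full 11-way set of the pattern table, which is why a single block fetch brings in every candidate entry for a lookup.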
Memory Address Calculation
[Figure: the PC (16 bits) and Block offset (5 bits) yield a 10-bit value; with six zero bits (000000) appended, it is added to the PV Start Address to form the Memory Address]
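A minimal sketch of this calculation follows. The slide does not say which PC bits or which 10 index bits are used, so the code assumes the low 16 PC bits and the low 10 bits of the 21-bit index; the function names are hypothetical.

```python
def pv_index(pc: int, block_offset: int) -> int:
    """21-bit index: 16 PC bits (assumed: the low ones) and a 5-bit offset."""
    return ((pc & 0xFFFF) << 5) | (block_offset & 0x1F)


def pv_block_address(pv_start: int, pc: int, block_offset: int) -> int:
    """Address of the 64-byte block that holds the selected table set."""
    set_index = pv_index(pc, block_offset) & 0x3FF   # assumed: low 10 bits
    return pv_start + (set_index << 6)               # append six zero bits


def pv_tag(pc: int, block_offset: int) -> int:
    """Remaining 11 bits of the index, compared against the stored tags."""
    return pv_index(pc, block_offset) >> 10
```

Since each set occupies exactly one cache block, the shift by six bits turns a set number into a block-aligned offset from the table's start address.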
Simulation Infrastructure
SimFlex: CMU Impetus
  Full-system simulator based on Simics
Base processor configuration
  8-wide OoO
  256-entry ROB / 64-entry LSQ
  L1D/L1I: 64KB, 4-way set-associative
  UL2: 8MB, 16-way set-associative
Commercial workloads
  TPC-C: DB2 and Oracle
  TPC-H: Query 1, Query 2, Query 16, Query 17
  Web: Apache and Zeus
SMS – Performance Potential
[Figure: Percentage of L1 Read Misses (%), y-axis 0 to 140, split into Covered, Uncovered, and Overpredictions, for Apache, Oracle, and Qry 17; configurations per workload: Infinite, 1K-16a, 1K-11a, 512-11a, 256-11a, 128-11a, 64-11a, 32-11a, 16-11a, 8-11a]
Virtualized Spatial Memory Streaming
[Figure: Percentage Speedup, y-axis -10 to 80, for Apache, Zeus, DB2, Oracle, Qry 1, Qry 2, Qry 16, and Qry 17; series: SMS - 1K sets, SMS - 8 sets, SMS - PVCache 8 sets]
Original Prefetcher: cost 60KB
Virtualized Prefetcher: cost <1KB
Nearly identical performance
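The two storage costs are roughly consistent with the entry format described earlier. The estimate below assumes 11 ways of 43-bit entries (11-bit tag plus 32-bit pattern) per set and ignores any replacement or valid bits and the PVProxy control logic, so it is a back-of-the-envelope check rather than the authors' accounting.

```python
ENTRY_BITS = 11 + 32   # tag + spatial pattern (assumed entry format)
WAYS = 11              # entries per set (one 64-byte block per set)


def table_bytes(sets: int) -> int:
    """Storage for a pattern table with the given number of sets."""
    return sets * WAYS * ENTRY_BITS // 8


dedicated = table_bytes(1024)   # on-chip table with 1K sets
pv_cache = table_bytes(8)       # PVCache with 8 sets
```

Under these assumptions the dedicated table needs 60,544 bytes, close to the quoted 60KB, while the 8-set PVCache needs 473 bytes, under 1KB.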
Impact of Virtualization on L2 Misses
[Figure: Percentage Increase in L2 Misses, y-axis 0 to 2.5, for Apache, Oracle, and Qry 17; series: PV-8, PV-16, PV-32]
Impact of Virtualization on L2 Requests
[Figure: Percentage Increase in L2 Requests, y-axis 0 to 50, for Apache, Oracle, and Qry 17; series: PV-8, PV-16, PV-32]
Coarse-Grain Tracking
Jason Zebchuk