Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005
Chandra, Guo, Kim, Solihin - Contention Model

Cache Sharing in CMP

[Diagram: Processor Core 1 and Processor Core 2, each with a private L1 cache, both sharing a unified L2 cache.]
Impact of Cache Space Contention

[Charts: L2 cache misses and mcf's normalized IPC for mcf running alone vs. co-scheduled as mcf+art, mcf+swim, mcf+mst, and mcf+gzip.]

- Application-specific (what) and coschedule-specific (when)
- Significant: up to 4X cache misses, 65% IPC reduction
- Need a model to understand cache sharing impact
Related Work

Uniprocessor miss estimation:
- Cascaval et al., LCPC 1999
- Chatterjee et al., PLDI 2001
- Fraguela et al., PACT 1999
- Ghosh et al., ACM TOPLAS 1999
- J. Lee et al., HPCA 2001
- Vera and Xue, HPCA 2002
- Wassermann et al., SC 1997

Context switch impact on time-shared processors:
- Agarwal, ACM Trans. on Computer Systems, 1989
- Suh et al., ICS 2001

No model for cache sharing impact:
- Relatively new phenomenon: SMT, CMP
- Many possible access interleaving scenarios
Contributions

Inter-thread cache contention models:
- Two heuristic models (refer to the paper) and one analytical model
- Input: circular sequence profile for each thread
- Output: predicted number of cache misses per thread in a co-schedule

Validation:
- Against a detailed CMP simulator
- 3.9% average error for the analytical model

Insight:
- Temporal reuse patterns determine the impact of cache sharing
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Assumptions

- One circular sequence profile per thread: an average profile yields high prediction accuracy; a phase-specific profile may improve accuracy further
- LRU replacement algorithm: other policies are usually LRU approximations
- Threads do not share data: mostly true for serial applications; in parallel applications, threads are likely to be impacted uniformly
Definitions

- seqX(dX, nX) = a sequence of nX accesses to dX distinct addresses, all by thread X to the same cache set
- cseqX(dX, nX) (circular sequence) = a sequence in which the first and the last accesses are to the same address

Example: in the access stream A B C D A E E B, the whole stream is seq(5,8); A B C D A is cseq(4,5); E E is cseq(1,2); B C D A E E B is cseq(5,7).
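The circular sequences in a trace can be enumerated mechanically: each access to an address seen before closes a circular sequence. A minimal sketch (our illustration, not the paper's profiling tool):

```python
def circular_sequences(trace):
    """Enumerate circular sequences cseq(d, n) in one cache set's
    access trace.  For each access to a previously seen address, the
    span from that address's last access through the current access is
    a circular sequence; d counts distinct addresses in the span and n
    counts accesses."""
    last_pos = {}  # address -> index of its most recent access
    cseqs = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            span = trace[last_pos[addr]:i + 1]
            cseqs.append((len(set(span)), len(span)))
        last_pos[addr] = i
    return cseqs

# The slide's example stream A B C D A E E B:
print(circular_sequences(list("ABCDAEEB")))
# -> [(4, 5), (1, 2), (5, 7)]  i.e. cseq(4,5), cseq(1,2), cseq(5,7)
```

Counting how often each (d, n) pair occurs yields the per-thread profile the model takes as input.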
Circular Sequence Properties

- Thread X runs alone in the system: given a circular sequence cseqX(dX, nX), the last access is a cache miss iff dX > Assoc
- Thread X shares the cache with thread Y: if a sequence of intervening accesses seqY(dY, nY) occurs during cseqX(dX, nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc
Example

Assume a 4-way associative cache:

- X's circular sequence cseqX(2,3): A B A
- Y's intervening access sequence during cseqX's lifetime: U V V W

With no cache sharing, the final access to A is a cache hit. With cache sharing, is it a hit or a miss?
Example

Assume a 4-way associative cache, with X's circular sequence cseqX(2,3) = A B A and Y's intervening accesses U V V W. The outcome depends on the interleaving:

- A U B V V A W: seqY(2,3) intervenes in cseqX's lifetime, so dX + dY = 2 + 2 ≤ Assoc and the final A is a cache hit
- A U B V V W A: seqY(3,4) intervenes in cseqX's lifetime, so dX + dY = 2 + 3 > Assoc and the final A is a cache miss
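The two interleavings can be checked by directly simulating one LRU cache set; a small sketch (hypothetical helper, ours rather than the paper's):

```python
def lru_hit(trace, target_index, assoc):
    """Simulate a single LRU cache set of the given associativity and
    report whether the access at target_index hits."""
    stack = []  # LRU stack, most-recently-used address first
    for i, addr in enumerate(trace):
        hit = addr in stack
        if i == target_index:
            return hit
        if hit:
            stack.remove(addr)        # move to MRU position
        stack.insert(0, addr)
        del stack[assoc:]             # evict beyond associativity
    raise IndexError("target_index past end of trace")

# The slide's interleavings on a 4-way set (X accesses A, B; Y accesses U, V, W):
print(lru_hit(list("AUBVVAW"), 5, 4))  # -> True  (d_X + d_Y = 2 + 2 <= 4)
print(lru_hit(list("AUBVVWA"), 6, 4))  # -> False (d_X + d_Y = 2 + 3 > 4)
```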
Inductive Probability Model

For each cseqX(dX, nX) of thread X, compute Pmiss(cseqX), the probability that its last access is a miss.

Steps:
1. Compute E(nY), the expected number of intervening accesses from thread Y during cseqX's lifetime
2. For each possible dY, compute P(seq(dY, E(nY))), the probability of occurrence of seq(dY, E(nY))
3. If dY + dX > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX)

Misses = old_misses + Σ Pmiss(cseqX) × F(cseqX)
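The accumulation into Pmiss amounts to summing, over the distribution of Y's intervening accesses, the probability mass where dX + dY exceeds the associativity. A minimal sketch, with a hypothetical distribution P(seq(dY, E(nY))) passed in as a dict:

```python
def p_miss(d_x, assoc, p_dist):
    """Probability that the last access of cseq_X(d_x, .) misses,
    given p_dist[d_y] = P(seq(d_y, E(n_Y))) for thread Y's intervening
    accesses: the access misses exactly when d_x + d_y > assoc."""
    return sum(p for d_y, p in p_dist.items() if d_x + d_y > assoc)

# 4-way cache, cseq_X(2, .), and an illustrative (made-up) distribution
# over how many distinct lines Y touches in the meantime:
print(p_miss(2, 4, {1: 0.2, 2: 0.5, 3: 0.3}))  # -> 0.3
```

Weighting each Pmiss(cseqX) by the profile frequency F(cseqX) and summing then gives the extra misses in the formula above.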
Computing P(seq(dY, E(nY)))

Basic idea: seq(d,n) is reached from seq(d-1,n-1) by one additional access to a distinct address, or from seq(d,n-1) by one additional access to a non-distinct address. This gives the recurrence

P(seq(d,n)) = A × P(seq(d-1,n-1)) + B × P(seq(d,n-1))

where A and B are the transition probabilities. Detailed steps are in the paper.
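The recurrence can be sketched directly. The paper derives the transition probabilities from the thread's profile; the version below instead assumes a fixed probability p_new that an access touches a not-yet-seen address, which is our simplification for illustration:

```python
def p_seq(d, n, p_new):
    """P(seq(d, n)) via the slide's recurrence, under the simplifying
    assumption that each access goes to a previously unseen address
    with fixed probability p_new (transition A) and to an already-seen
    address with probability 1 - p_new (transition B)."""
    if d < 1 or n < d:
        return 0.0
    if d == 1 and n == 1:
        return 1.0  # a single access trivially touches one address
    return (p_new * p_seq(d - 1, n - 1, p_new)          # +1 distinct access
            + (1.0 - p_new) * p_seq(d, n - 1, p_new))   # +1 non-distinct access

# For fixed n the probabilities over d form a distribution:
print([round(p_seq(d, 3, 0.3), 2) for d in range(1, 4)])  # -> [0.49, 0.42, 0.09]
```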
Validation

- SESC simulator: detailed CMP + memory hierarchy
- 14 co-schedules of benchmarks (Spec2K and Olden)
- A co-schedule is terminated when an application completes

CMP cores: 2 cores, each 4-issue dynamic, 3.2 GHz

Base memory hierarchy:
- L1 I/D (private): each write-back, 32 KB, 4-way, 64 B line
- L2 unified (shared): write-back, 512 KB, 8-way, 64 B line
- L2 replacement: LRU
Validation

| Co-schedule | Actual Miss Increase | Prediction Error |
|---|---|---|
| gzip + applu | 243% / 11% | -25% / 2% |
| gzip + apsi | 180% / 0% | -9% / 0% |
| mcf + art | 296% / 0% | 7% / 0% |
| mcf + gzip | 18% / 102% | 7% / 22% |
| mcf + swim | 59% / 0% | -7% / 0% |

Error = (PM - AM) / AM, where PM is the predicted and AM the actual number of misses. Each cell lists the first / second application of the co-schedule.

- Larger error happens when the miss increase is very large
- Overall, the model is accurate
Other Observations

Based on how vulnerable applications are to cache sharing impact:
- Highly vulnerable: mcf, gzip
- Not vulnerable: art, apsi, swim
- Somewhat / sometimes vulnerable: applu, equake, perlbmk, mst

Prediction error:
- Very small, except for highly vulnerable apps
- 3.9% (average), 25% (maximum)
- Also small for different cache associativities and sizes
Case Study

Profile approximated by a geometric progression: F(cseq(d, *)) = Z × r^(d-1), i.e. F(cseq(1,*)) = Z, F(cseq(2,*)) = Zr, F(cseq(3,*)) = Zr², and so on.

- Z = amplitude
- r (0 < r < 1) = common ratio; larger r means a larger working set

What is the impact of an interfering thread on the base thread? Fix the base thread and vary the interfering thread's:
- Miss frequency = # misses / time
- Reuse frequency = # hits / time
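The geometric profile shape is easy to generate; a sketch (our illustration, with arbitrary amplitude and a truncation point for d):

```python
def geometric_profile(z, r, d_max):
    """Case-study profile approximation F(cseq(d, *)) = z * r**(d - 1):
    frequency of circular sequences with d distinct addresses, for
    d = 1 .. d_max."""
    return {d: z * r ** (d - 1) for d in range(1, d_max + 1)}

small_ws = geometric_profile(100, 0.5, 8)  # r = 0.5: frequencies decay fast
large_ws = geometric_profile(100, 0.9, 8)  # r = 0.9: long sequences stay common
print(small_ws[4], large_ws[4])  # the large-r thread reuses far more lines
```

With larger r, more frequency mass sits at high d, i.e. more of the thread's reuses need many cache ways to survive, which is why the r = 0.9 thread below is the more vulnerable one.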
Base Thread: r = 0.5 (Small Working Set)

[Chart: base thread's L2 cache misses vs. the interfering thread's multiplying factor (1x to 4x), for miss frequency and reuse frequency.]

Base thread:
- Not vulnerable to the interfering thread's miss frequency
- Vulnerable to the interfering thread's reuse frequency
Base Thread: r = 0.9 (Large Working Set)

[Chart: base thread's L2 cache misses vs. the interfering thread's multiplying factor (1x to 4x), for miss frequency and reuse frequency.]

Base thread: vulnerable to both the interfering thread's miss frequency and reuse frequency
Conclusions

New inter-thread cache contention models. Simple to use:
- Input: circular sequence profile per thread
- Output: number of misses per thread in co-schedules

Accurate: 3.9% average error

Useful insight: temporal reuse patterns determine cache sharing impact

Future work: predict and avoid problematic co-schedules; release the tool at http://www.cesr.ncsu.edu/solihin