Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?
Yunlian Jiang, Eddy Z. Zhang
Kai Tian, Xipeng Shen (presenter)
Department of Computer Science, The College of William and Mary, VA, USA
The College of William and Mary
Cache Sharing
• A common feature on modern CMPs
Data Locality
• Extensively studied for uni-core processors
• Two classes of metrics
  • At hardware level: e.g., cache miss rate
  • At program level: e.g., reuse distance
Reuse Distance (RD)
• Def: # of distinct data accessed between two adjacent references to a data element
• E.g., in b c a a c b, the second b has rd=2 (distinct data c and a in between)
[Figure: RD histogram]
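The following is a minimal Python sketch of the definition above. It uses a naive, quadratic scan, and the function names are illustrative rather than taken from the talk. It computes the reuse distance of every access in a trace, with `None` for first accesses, and counts reuses per distance to form an RD histogram.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence

def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Reuse distance of each access: # of distinct data accessed between it
    and the previous access to the same element (None for a first access).
    Naive O(N^2) scan, for illustration only."""
    last_seen = {}  # element -> index of its most recent access
    result: List[Optional[int]] = []
    for i, x in enumerate(trace):
        if x in last_seen:
            # distinct elements strictly between the two adjacent accesses
            result.append(len(set(trace[last_seen[x] + 1 : i])))
        else:
            result.append(None)
        last_seen[x] = i
    return result

def rd_histogram(trace: Sequence[Hashable]) -> Counter:
    """Number of reuses at each reuse distance (first accesses are skipped)."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

print(reuse_distances("bcaacb"))  # [None, None, None, 0, 1, 2]
print(rd_histogram("bcaacb"))     # Counter({0: 1, 1: 1, 2: 1})
```

The last entry, 2, is the rd=2 of the second b on the slide. Practical profilers use faster algorithms than this linear rescan.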
• Appealing properties
  • Hardware-independence
  • Accurate, point to point
  • Cross-input predictable
  • Bounded value (at most the data size)
Many Uses of Reuse Distance
• Cross-arch performance prediction [Marin+:SIGMETRICS04, Zhong+:PACT03]
• Model reference affinity [Zhong+:PLDI04]
• Guide memory disambiguation [Fang+:PACT05]
• Detect locality phases [Shen+:ASPLOS04]
• Software refactoring [Beyls+:HPCC06]
• Model cache sharing [Chandra+:HPCA05]
• Study data reuses [Ding+:SC04,Huang+:ASPLOS05]
• Insert cache hints [Beyls+:JSA05]
• Manage superpages [Cascaval+:PACT05]
Complexity Caused by Cache Sharing
• Data locality is not solely determined by a process itself
• Accesses by its co-runners need to be considered
Questions to Answer
• Is reuse distance applicable for locality characterization on CMP?
• What are the new challenges?
• Are these challenges addressable?
Outline
• Complexities in extending the reuse distance model to CMP
  • Loss of hardware-independence
  • A chicken-egg dilemma for performance prediction
• Addressing the issues for some multithreading applications
  • A probabilistic model to derive reuse distance in co-runs
  • Evaluation
Terms
• Concurrent reuse distance (CRD)
  • # of distinct data accessed by all co-runners between two adjacent references to a data element.
• Standalone reuse distance (SRD)
  • # of distinct data accessed by the current process between two adjacent references to a data element.
• Example
  P1: a b b c d a
  P2: p q p q (running concurrently with P1)
  For the second a: SRD = 3; CRD = 3 + 2 = 5
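The next sketch computes SRD versus CRD on an interleaved trace of (thread, element) pairs. As in the example, it assumes the co-runners share no data. The interleaving shown is one hypothetical ordering consistent with the example, and `srd_crd` is an illustrative name.

```python
from typing import List, Optional, Tuple

Access = Tuple[str, str]  # (thread id, data element)

def srd_crd(trace: List[Access], j: int) -> Optional[Tuple[int, int]]:
    """Return (SRD, CRD) for the access at position j, or None on a first access.

    Assumes no data sharing, so the previous access to the element is by the
    same thread. SRD counts distinct data touched by that thread in between;
    CRD counts distinct data touched by all co-runners in between.
    """
    thread, elem = trace[j]
    for i in range(j - 1, -1, -1):
        if trace[i][1] == elem:
            between = trace[i + 1 : j]
            srd = len({e for t, e in between if t == thread})
            crd = len({e for _, e in between})
            return srd, crd
    return None

# One hypothetical interleaving of P1: a b b c d a and P2: p q p q
trace = [("P1", "a"), ("P1", "b"), ("P2", "p"), ("P1", "b"), ("P2", "q"),
         ("P1", "c"), ("P2", "p"), ("P1", "d"), ("P2", "q"), ("P1", "a")]
print(srd_crd(trace, 9))  # (3, 5) for the second a
```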
Distinctive Property of CRD
• Example
  a b c b a (memory references by P1)
  SRD = 2; CRD = 2 + x
  r = speed(P2)/speed(P1): the larger r is, the greater x tends to be.
• Dependence on the relative running speeds of co-runners.
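To see how x grows with r, consider a hypothetical worst case: a co-runner P2 that never reuses data. The sketch below assumes an integer r, with P2 making exactly r accesses between consecutive P1 accesses. Both assumptions are simplifications made for illustration. Under them, the CRD of the second a is exactly 2 + 4r.

```python
from itertools import count
from typing import Iterator, List

def crd_of_last_access(p1: List[str], p2: Iterator[str], r: int) -> int:
    """CRD of P1's last access when P2 makes r accesses between consecutive
    P1 accesses (r = speed(P2)/speed(P1), an integer here for simplicity).
    Assumes P1's last element was accessed earlier and P1, P2 share no data."""
    merged: List[str] = []
    for k, x in enumerate(p1):
        if k > 0:
            merged.extend(next(p2) for _ in range(r))  # P2's accesses in this gap
        merged.append(x)
    last = len(merged) - 1
    prev = max(i for i in range(last) if merged[i] == merged[last])
    return len(set(merged[prev + 1 : last]))

def streaming_p2() -> Iterator[str]:
    """Hypothetical co-runner that never reuses data: p0, p1, p2, ..."""
    return (f"p{i}" for i in count())

for r in range(4):
    print(r, crd_of_last_access(list("abcba"), streaming_p2(), r))
# CRD = 2 + 4r here: x = 4r grows with r
```

A co-runner that reuses its own data would make x grow more slowly, but it still depends on r.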
Two Implications
• First, CRD is hard to measure in real programs.
• Instrumentation changes relative speeds
Relative speed: original $r = \mathrm{IPC}_i / \mathrm{IPC}_j$; after instrumentation $r' = \mathrm{IPC}'_i / \mathrm{IPC}'_j$; change of relative speed: $|r - r'| / r$
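The change metric is simple to compute. The IPC values below are hypothetical and chosen only to show how uneven instrumentation slowdown distorts the relative speed.

```python
def relative_speed_change(ipc_i: float, ipc_j: float,
                          ipc_i_instr: float, ipc_j_instr: float) -> float:
    """|r - r'| / r, with r = IPC_i/IPC_j measured natively and
    r' = IPC'_i/IPC'_j measured under instrumentation."""
    r = ipc_i / ipc_j
    r_prime = ipc_i_instr / ipc_j_instr
    return abs(r - r_prime) / r

# Hypothetical numbers: instrumentation slows thread i 4x but thread j only 2x.
print(relative_speed_change(1.5, 1.0, 0.375, 0.5))  # 0.5
```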
Two Implications (cont.)
• Second, CRD loses hardware-independence.
• Relative speeds change across architectures.
• Consequence
• Cross-arch. perf. pred. becomes hard for co-runs
Cross-Arch. Performance Prediction
[Diagram for single runs: training SRD, SRD, and IPC on the training and testing platforms, linked by a predictor; training SRD = SRD]
[Diagram for co-runs: training CRD, CRD, and IPC on the training and testing platforms; a chicken-egg dilemma]
Iterative Approach Not Applicable
[Diagram: training CRD, CRD, and IPC on the training and testing platforms, with the cycle IPC(I), IPC(J) → CRD(I), CRD(J) → CacheMiss(I), CacheMiss(J) → IPC(I), IPC(J)]
Outline
• Complexities in extending reuse distance model to CMP
• Loss of hardware-independence
• A chicken-egg dilemma for performance prediction
• Addressing the issues for some multithreading app.
• A probabilistic model to derive reuse distance in co-runs
• Evaluation
Favorable Observations
• From a systematic study [Zhang+:PPoPP’10] on PARSEC non-pipelining multithreading benchmarks
• All parallel threads of an app. conduct similar computations
• Uniform relations among threads.
These properties hold across architectures, inputs, numbers of threads, thread-core assignments, and program phases.
Implication
• Relative speeds among threads tend to remain the same across arch. and inputs.
An Efficient Way to Estimate CRD
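[Diagram: $\mathrm{SRD}_{T_1}, \mathrm{SRD}_{T_2}, \ldots, \mathrm{SRD}_{T_m}$ → probabilistic model → $\mathrm{CRD}_{T_1}, \mathrm{CRD}_{T_2}, \ldots, \mathrm{CRD}_{T_m}$]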
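Two Steps
(1) ∆ → d (# of distinct data accessed in a ∆-long interval)
  • Trace of T1: a ... a, a reuse interval of length ∆
  • During that interval, threads $T_1, T_2, \ldots, T_m$ access $d_{T_1}, d_{T_2}, \ldots, d_{T_m}$ distinct data
  • Assuming no data sharing: $\mathrm{CRD}_{T_1} = d_{T_1} + d_{T_2} + \cdots + d_{T_m}$
(2) Handle effects of data sharing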
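Step (1) amounts to summing per-thread distinct-data counts. The sketch below assumes no data sharing. It also assumes that, while T1 makes ∆ accesses, thread Tj makes about speed(Tj)/speed(T1) × ∆ accesses. Treating these speed ratios as fixed inputs is plausible for this class of applications because relative speeds tend to remain the same across architectures and inputs. The per-thread functions `distinct[j]` are hypothetical inputs, for example produced by the TDH-based model on the following slides.

```python
from typing import Callable, Sequence

def estimate_crd(delta: float,
                 distinct: Sequence[Callable[[float], float]],
                 speed_ratio: Sequence[float]) -> float:
    """Step (1), assuming no data sharing: CRD_T1 ~= d_T1 + d_T2 + ... + d_Tm.

    distinct[j](x): (expected) # of distinct data thread Tj touches in a window
    of x of its own accesses. speed_ratio[j] = speed(Tj)/speed(T1), so while T1
    makes delta accesses, Tj makes about speed_ratio[j] * delta (assumption).
    """
    return sum(d(r * delta) for d, r in zip(distinct, speed_ratio))

# Hypothetical thread that touches a new element on every access, up to 100 data.
def saturating(x: float) -> float:
    return min(x, 100.0)

print(estimate_crd(10, [saturating] * 4, [1.0, 1.0, 1.0, 1.0]))  # 40.0
```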
Time Distance (TD)
• Def: the # of elements between two adjacent references to a data element
• E.g., in b c a a c b, the second b has td=4 (rd=2)
• TD Histogram (TDH)
  • Shows the probability for an access to have a certain TD.
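Here is a sketch of TD and the TDH that follows the slide's definition: td counts the elements strictly between the two references, so the second b has td=4. Normalizing over reuses only is an assumption of this sketch, since first accesses have no TD. Exact fractions keep the probabilities readable.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Hashable, List, Sequence

def time_distances(trace: Sequence[Hashable]) -> List[int]:
    """TD of every reuse: # of elements strictly between it and the previous
    access to the same element (first accesses have no TD)."""
    last_seen = {}  # element -> index of its most recent access
    tds: List[int] = []
    for i, x in enumerate(trace):
        if x in last_seen:
            tds.append(i - last_seen[x] - 1)
        last_seen[x] = i
    return tds

def td_histogram(trace: Sequence[Hashable]) -> Dict[int, Fraction]:
    """TDH: probability that a reuse has a given TD."""
    tds = time_distances(trace)
    return {td: Fraction(c, len(tds)) for td, c in sorted(Counter(tds).items())}

print(time_distances("bcaacb"))  # [0, 2, 4]; the last one is the slide's td=4
```

Unlike reuse distance, TD needs no set of distinct elements per interval, which makes it cheaper to collect.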
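• $P_i(\Delta)$: probability for an object $O_i$ to be referenced in a ∆-long interval.
$$
\begin{aligned}
P_i(\Delta) &= P_i(\Delta-1) + q_i(\Delta) \\
P_i(\Delta-1) &= P_i(\Delta-2) + q_i(\Delta-1) \\
&\;\;\vdots \\
P_i(1) &= P_i(0) + q_i(1)
\end{aligned}
$$
Summing the equations, with $P_i(0) = 0$:
$$P_i(\Delta) = \sum_{\tau=1}^{\Delta} q_i(\tau)$$
$q_i(\Delta)$: $O_i$ is accessed at time point ∆, but not at the ∆-1 points ahead.
[Diagram: TDH → (∆ → d)]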
• $q_i(\tau)$: $O_i$ is accessed at time point τ, but not at the τ-1 points ahead. This is equivalent to (1) the object accessed at τ is $O_i$, and (2) the time distance of that reference is greater than τ. From the TDH:
$$q_i(\tau) = \frac{n_i}{T} \sum_{\delta=\tau+1}^{T} H_i(\delta)$$
$$P_i(\Delta) = \frac{n_i}{T} \sum_{\tau=1}^{\Delta} \sum_{\delta=\tau+1}^{T} H_i(\delta)$$
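The two formulas can be transcribed directly into Python. This rests on assumptions about symbols the slide leaves implicit: $n_i$ is taken as the number of references to $O_i$, $T$ as the trace length, and $H_i(\delta)$ as the fraction of $O_i$'s references with time distance δ. Summing $P_i(\Delta)$ over all objects gives the expected number of distinct data in a ∆-long interval, by linearity of expectation. The full distribution P(k, ∆) on the next slide is in the paper and is not reproduced here.

```python
from typing import Dict, List, Tuple

def q_i(tau: int, n_i: int, T: int, H_i: Dict[int, float]) -> float:
    """q_i(tau) = (n_i / T) * sum_{delta = tau+1}^{T} H_i(delta)."""
    tail = sum(H_i.get(delta, 0.0) for delta in range(tau + 1, T + 1))
    return n_i / T * tail

def p_i(Delta: int, n_i: int, T: int, H_i: Dict[int, float]) -> float:
    """P_i(Delta) = sum_{tau=1}^{Delta} q_i(tau), with P_i(0) = 0."""
    return sum(q_i(tau, n_i, T, H_i) for tau in range(1, Delta + 1))

def expected_distinct(Delta: int, objects: List[Tuple[int, Dict[int, float]]],
                      T: int) -> float:
    """E[d] for a Delta-long interval: sum over objects of P_i(Delta)
    (linearity of expectation). objects holds one (n_i, H_i) pair per object."""
    return sum(p_i(Delta, n_i, T, H_i) for n_i, H_i in objects)

# Hypothetical object: 2 references in a 10-long trace, TD 3 or 6 with equal probability.
H = {3: 0.5, 6: 0.5}
print([round(p_i(D, 2, 10, H), 3) for D in range(7)])  # [0, 0.2, 0.4, 0.5, 0.6, 0.7, 0.7]
```

Each $P_i(\Delta)$ here is an O(∆·T) double loop. A real implementation would precompute suffix sums of $H_i$.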
• P(k, ∆): prob. for a ∆-long interval to contain k distinct data.
• d: # of distinct data referenced in a ∆-long interval.
See paper for details.
Handling Data Sharing
• Two effects from data sharing on CRD
• Example
  a b X X b X c d X a (X: references by T2)
  • Scenario 1: Xs ∉ {a, b, c, d}.
    • a b p q b p c d q a → CRD = 3 + 2 = 5
  • Scenario 2: a ∈ Xs.
    • a b p a b p c d q a → breaks into 2 reuse intervals
  • Scenario 3: {b, c, d} ∩ Xs ≠ ∅.
    • a b p c b p c d c a → CRD = 3 + 1 = 4; T2's references to c should not be counted.
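The three scenarios can be checked by measuring CRD directly on the merged traces, where a reference by either thread counts as a reuse. This is an exact trace-level computation for illustration, not the probabilistic treatment the talk uses. `crd_at` is an illustrative name.

```python
from typing import Optional

def crd_at(merged: str, j: int) -> Optional[int]:
    """CRD of the reference at index j of a merged trace (one char per datum):
    distinct data between it and the previous reference to the same datum by
    any thread. None if there is no previous reference."""
    prev = merged.rfind(merged[j], 0, j)
    if prev < 0:
        return None
    return len(set(merged[prev + 1 : j]))

scenario1 = "abpqbpcdqa"  # T2 touches only p, q
scenario2 = "abpabpcdqa"  # T2 also touches a
scenario3 = "abpcbpcdca"  # T2 also touches c

print(crd_at(scenario1, 9))                        # 5
print(crd_at(scenario2, 3), crd_at(scenario2, 9))  # 2 5: two shorter reuse intervals
print(crd_at(scenario3, 9))                        # 4: T2's c is not counted twice
```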
Treating the Effects
• Probability for a reuse interval to break
• Probability for |C|=c is
• S: set of all shared data.
• N1, N2: data sizes of T1 and T2.
• n1, n2: # of distinct data accessed by T1 and T2 in an interval of length V.
• C: intersection of the data sets referenced by T1 and T2 in the interval.
See paper for details.
• Estimation accuracy of CRD histograms on synthetic traces
[Figure: s: sharing ratio; n1, n2: data sizes]
On Traces of Real Programs
• Using a simulator to record traces
  • SIMICS with GEMS
  • Simulate an UltraSPARC with a 1MB shared L2 cache.
• Three PARSEC programs
  • vips (image processing)
    • negligible shared data, 33,000 locks
    • accuracy 76%
  • swaptions (portfolio pricing)
    • 27% shared data, 23 locks
    • accuracy 74%
  • streamcluster (online clustering)
    • 3% shared data, 129,600 barriers
    • accuracy 72%
Related Work
• All-window profiling [Ding and Chilimbi]
• Predict cache misses of co-runs from circular stack distance histograms [Chandra et al., Chen & Aamodt]
• Statistical shared cache model [Berg et al.]
Conclusions
• Is reuse distance applicable for locality characterization on CMP?
  Difficult in general.
• What are the new challenges?
  Reliance on relative speeds; loss of hardware-independence; a chicken-egg dilemma.
• Are these challenges addressable?
  Yes, for a class of multithreading applications. A probabilistic model facilitates the derivation of CRD.
Thanks!
Questions?