dynamic prefetching & cache optimizations
TRANSCRIPT
- 1. Prefetch and Cache in PLDI'02
- Dynamic Hot Data Stream Prefetching...
- Hot Data Stream
- Efficient Discovery of Regular Stride...
- Irregular load
- Static Load Classification for...
- Load, 20; load-value prediction
- Dynamic Hot Data Stream Prefetching...
- 2. 2010.06.30, cited by 40
- 3.
- 4.
- Temporal data reference profile
- 5. Extract hot data streams
- 6. With the added prefetch instructions (no profiler, analyzer)
- Improvement: 5-19% speedup
- 7. Overview
- 8. Data Refs. Profiling and Analysis
- Bursty Tracing Framework for Low-overhead Temporal Profiling
- Not only the frequency, but also temporal relationships
- 9. e.g. cdeabcdeabfg vs. abcdefabcdeg
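The two example traces on slide 9 can be used to illustrate why frequency alone is not enough. The sketch below is only an illustration: `longest_repeated_substring` is a hypothetical brute-force helper, not the detection method from the paper, and each letter is assumed to stand for one data reference.

```python
from collections import Counter


def longest_repeated_substring(trace: str) -> str:
    """Return the longest substring that occurs at least twice in trace.

    Brute force, longest candidates first; fine for toy traces only.
    """
    n = len(trace)
    for length in range(n - 1, 0, -1):
        seen = set()
        for start in range(n - length + 1):
            piece = trace[start:start + length]
            if piece in seen:
                return piece
            seen.add(piece)
    return ""


trace_a = "cdeabcdeabfg"
trace_b = "abcdefabcdeg"

# Both traces contain exactly the same references the same number of times.
same_frequencies = Counter(trace_a) == Counter(trace_b)

# Their repeating subsequences differ, which only a temporal profile exposes.
repeat_a = longest_repeated_substring(trace_a)
repeat_b = longest_repeated_substring(trace_b)

print(same_frequencies, repeat_a, repeat_b)  # True cdeab abcde
```

A frequency profile treats the two traces as identical, while the temporal view shows different repeating sequences, and those sequences are what a prefetcher would want to exploit.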
- Extensions for Online Optimization
- 10. Fast Hot Data Stream Detection
- 11. Bursty Tracing Framework [15] for Low-overhead Temporal Profiling
- 2; nCheck, nInst; Vulcan
- 12. Extensions for Online Optimization
- 13. Fast Hot Data Stream Detection (1)
- Compress the profile and infer its hierarchical structure. [23]
- 14. Fast Hot Data Stream Detection (2)
- v.heat = v.length * v.frequency
- A.heat = w_A.length * A.coldUses
- 15. Overhead of profiling and analysis
- 16. Dynamic Prefetching
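The heat metric from slide 14, v.heat = v.length * v.frequency, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Stream` class, the `hot_streams` helper, and the `coverage` threshold are hypothetical names chosen here, and frequency is counted with a plain non-overlapping substring count rather than from a compressed profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    refs: str       # toy encoding: one character per data reference
    frequency: int  # number of occurrences in the profiled trace

    @property
    def heat(self) -> int:
        # v.heat = v.length * v.frequency
        return len(self.refs) * self.frequency


def hot_streams(candidates, total_refs, coverage):
    """Keep streams whose heat covers at least `coverage` of the trace.

    The threshold rule is an assumption made for this sketch.
    """
    threshold = coverage * total_refs
    return [s for s in candidates if s.heat >= threshold]


trace = "abcdefabcdeg"
candidates = [
    Stream("abcde", trace.count("abcde")),  # occurs twice
    Stream("ef", trace.count("ef")),        # occurs once
]
heats = [s.heat for s in candidates]
hot = hot_streams(candidates, total_refs=len(trace), coverage=0.5)

print(heats, [s.refs for s in hot])  # [10, 2] ['abcde']
```

Multiplying length by frequency favors streams that account for a large share of all references, so a long stream seen a few times can outrank a short one seen slightly more often.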
- Generating Detection and Prefetching Code
- 17. Injecting Detection and Prefetching Code
- 18. Generating Detection and Prefetching Code
- Split a hot data stream v = v1 v2 ... v{v.length} into a head v.head = v1 v2 ... v{headLen} and a tail v.tail = v{headLen+1} v{headLen+2} ... v{v.length}.
- 19.
- 20. Performance impact
- 21.
- 22. 2010.06.30, cited by 18
- 23.
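The head/tail split from slide 18 can be sketched as a small matcher: once the head of a hot data stream has been observed, the tail is reported as the set of addresses to prefetch. This is a simplified sketch, not the generated detection code from the paper. The `PrefixMatcher` class is hypothetical, it tracks a single stream, and its restart rule on a mismatch is a simplification of a real prefix-matching automaton.

```python
def split_stream(stream, head_len):
    """Split v into v.head (first head_len refs) and v.tail (the rest)."""
    if not 0 < head_len < len(stream):
        raise ValueError("head_len must leave a non-empty head and tail")
    return stream[:head_len], stream[head_len:]


class PrefixMatcher:
    """Report the tail for prefetching whenever the head has been seen."""

    def __init__(self, stream, head_len):
        self.head, self.tail = split_stream(stream, head_len)
        self.matched = 0  # number of head references matched so far

    def observe(self, ref):
        """Feed one reference; return the refs to prefetch (may be empty)."""
        if ref == self.head[self.matched]:
            self.matched += 1
        elif ref == self.head[0]:
            self.matched = 1  # simplified restart, not a full automaton
        else:
            self.matched = 0
        if self.matched == len(self.head):
            self.matched = 0
            return list(self.tail)
        return []


matcher = PrefixMatcher("abcde", head_len=2)
issued = []
for index, ref in enumerate("xabcdeab"):
    prefetch = matcher.observe(ref)
    if prefetch:
        issued.append((index, prefetch))

print(issued)  # [(2, ['c', 'd', 'e']), (7, ['c', 'd', 'e'])]
```

A short head triggers prefetches earlier but risks more false matches, while a longer head is more precise and leaves less of the stream to prefetch.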
- Irregular data references
- 24. Irregular load
- 25. load
- edge frequency
- 26. 17%
- 181.mcf: 1.59x, 254.gap: 1.14x
- 27. 2010.06.30, cited by 2
- 28.
- Load-value prediction [20]: load
- 29. Load-value prediction, load, speculation
- : Hardware-/Profile-based method
- Speculation
- load
- 30. C, Java
[20] M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value Locality and Load Value Prediction. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, pages 138-147, 1996.
- 31.
- Load, 20
- Region: Stack, Heap, Global space
- 32. Kind: object Field, Array element, Scalar variable
- 33. Type: Pointer, Non-pointer
- 16K, 64K, 256K 2-way set-associative cache
- 34. 5 load-value predictors, 2048/infinite entries
(i) lv, which predicts the last value for every load
(ii) l4v, which predicts one of the last four values for every load
(iii) st2d, which uses strides to predict loads
(iv) fcm, which uses a representation of the context of preceding loads to predict a load
(v) dfcm, which enhances fcm with strides.
- 35.
- 36.
- 37. Prediction
- 38.
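Two of the predictors listed above, lv and st2d, can be sketched in a few lines. This is an illustrative sketch under assumptions, not the simulator used in the study: the class names, the `accuracy` helper, and the toy load trace are hypothetical, the per-load table is an unbounded dictionary (roughly the "infinite entries" configuration), and st2d is modeled as a stride predictor that adopts a new stride only after seeing it twice in a row.

```python
class LastValue:
    """lv: predict that a load returns the same value as last time."""

    def __init__(self):
        self.last = None

    def predict(self):
        return self.last

    def update(self, value):
        self.last = value


class Stride2Delta:
    """st2d: predict last value plus a stride confirmed by two equal deltas."""

    def __init__(self):
        self.last = None
        self.stride = 0         # stride currently used for prediction
        self.last_delta = None  # most recently observed delta

    def predict(self):
        return None if self.last is None else self.last + self.stride

    def update(self, value):
        if self.last is not None:
            delta = value - self.last
            if delta == self.last_delta:
                self.stride = delta
            self.last_delta = delta
        self.last = value


def accuracy(predictor_factory, loads):
    """Fraction of (pc, value) loads predicted correctly, one predictor per pc."""
    table = {}
    hits = 0
    for pc, value in loads:
        if pc not in table:
            table[pc] = predictor_factory()
        if table[pc].predict() == value:
            hits += 1
        table[pc].update(value)
    return hits / len(loads)


# Toy trace: one load walks an array with stride 8, another reloads a constant.
loads = [(0x10, 100 + 8 * i) for i in range(5)] + [(0x20, 7)] * 5
lv_accuracy = accuracy(LastValue, loads)
st2d_accuracy = accuracy(Stride2Delta, loads)

print(lv_accuracy, st2d_accuracy)  # 0.4 0.6
```

On this trace both predictors handle the constant load, but only st2d catches the strided one, which is the kind of per-load difference a load classification could be used to exploit.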