Performance Model for Future Multicore Process Designs
Yipkei Kwok
02/06/2008

A Non-Work-Conserving Operating System Scheduler For SMT Processors
• Authors: A. Fedorova et al.
• Calculates the optimal level of parallelism of SMT processors at run time
• Analytical model
• Estimates the workload's IPC for a given degree of concurrency
• First identifies the performance bottleneck
• Suppressing L2 misses improves performance the most
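
The selection step above can be sketched as a search over thread counts: given some estimator of IPC as a function of the degree of concurrency N, pick the N with the highest estimate. This is a minimal sketch, not the authors' implementation. `optimal_concurrency` and `toy_ipc` are hypothetical names, and the toy formula is invented purely for illustration; it is not the analytical model from the paper.

```python
from typing import Callable


def optimal_concurrency(estimate_ipc: Callable[[int], float], max_threads: int) -> int:
    """Return the thread count N in 1..max_threads with the highest estimated IPC."""
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    return max(range(1, max_threads + 1), key=estimate_ipc)


def toy_ipc(n: int) -> float:
    """Made-up estimator: IPC rises with N, then falls as threads contend for the L2."""
    return n / (1.0 + 0.1 * n * n)


# With 4 hardware contexts, the toy estimator peaks at N = 3, so one
# context would be left idle (the non-work-conserving choice).
best_n = optimal_concurrency(toy_ipc, max_threads=4)
```

When the estimate keeps growing with N, the same function returns `max_threads`, which corresponds to ordinary work-conserving behavior.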

A Non-Work-Conserving Operating System Scheduler For SMT Processors
• Factors
  – N
  – perf_cache_CPI(N)
  – L2_RMR
  – L2_WMR
  – L2_WBR_R
  – L2_WBR_W
  – WSC
  – L2_MCOST

A Non-Work-Conserving Operating System Scheduler For SMT Processors
• Two-phase scheduling
  – Preparation phase
    • Collect model inputs under full parallelism
    • With hardware counters
    • Until the 100 millionth instruction retires
  – Optimization phase
    • Estimate the optimal N
    • Enforce it
    • Until … …
  – New locality phase
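
The two-phase cycle can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the authors' scheduler: `TwoPhaseScheduler`, `on_sample`, and the `estimate_optimal_n` hook are hypothetical names, and the counter keys are whatever the caller supplies. The slide elides when the optimization phase ends, so the sketch assumes an externally supplied `new_locality_phase` signal sends the scheduler back to the preparation phase.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict

# From the slide: preparation lasts until 100 million instructions retire.
PREPARATION_INSTRUCTIONS = 100_000_000


class Phase(Enum):
    PREPARATION = "preparation"
    OPTIMIZATION = "optimization"


@dataclass
class TwoPhaseScheduler:
    max_threads: int
    # Hypothetical hook: maps the collected counter values to the optimal N.
    estimate_optimal_n: Callable[[Dict[str, float]], int]
    phase: Phase = Phase.PREPARATION
    retired: int = 0
    counters: Dict[str, float] = field(default_factory=dict)
    active_threads: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        # Preparation runs under full parallelism.
        self.active_threads = self.max_threads

    def on_sample(self, retired_delta: int, counters: Dict[str, float],
                  new_locality_phase: bool = False) -> int:
        """Feed one hardware-counter sample; return the thread count to enforce."""
        if self.phase is Phase.PREPARATION:
            self.retired += retired_delta
            self.counters.update(counters)  # keep the latest value per counter
            if self.retired >= PREPARATION_INSTRUCTIONS:
                n = self.estimate_optimal_n(self.counters)
                self.active_threads = max(1, min(self.max_threads, n))
                self.phase = Phase.OPTIMIZATION
        elif new_locality_phase:
            # Assumed trigger: a new locality phase restarts data collection.
            self.phase = Phase.PREPARATION
            self.retired = 0
            self.counters = {}
            self.active_threads = self.max_threads
        return self.active_threads
```

A caller would invoke `on_sample` periodically with the number of instructions retired since the last call and enforce the returned thread count, for example by leaving the remaining hardware contexts idle.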

Limitations
• 3-56% improvement, but …
• Empirical model based on the UltraSPARC T1
• SMT only
  – But extensible with, hopefully, reasonable effort
• Once extended, it can be used for performance prediction
• What is needed?
  – Extra factors?

What new factors?
• Depends on the systems to model
• Shared-memory machines
• Threaded parallel workloads
• SMP of CMPs
• SMT per core

What new factors?
• Architecture
  – Homogeneous/heterogeneous cores
    • Difference in speed or functionality
  – Level of cache sharing
  – Interconnects

What new factors?
• Parameters
  – Number of cores
  – Cache size
  – Degree of set associativity
  – Number of cores sharing a cache
  – Bus, ring, crossbar, tiny network
  – Switching and flow mechanisms
  – Routing algorithms
  – Fault-tolerance techniques
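
One possible way to carry these parameters into a model is a single validated record. The sketch below is an assumption-laden illustration: `MachineParams`, `Interconnect`, the field names, the KiB unit, and the rule that cores divide evenly among shared caches are all hypothetical choices, not part of the slides. Switching, routing, and fault-tolerance options are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Interconnect(Enum):
    """The interconnect options listed on the slide."""
    BUS = "bus"
    RING = "ring"
    CROSSBAR = "crossbar"
    TINY_NETWORK = "tiny-network"


@dataclass(frozen=True)
class MachineParams:
    cores: int
    cache_size_kib: int      # unit chosen for this sketch
    associativity: int       # degree of set associativity
    cores_per_cache: int     # number of cores sharing one cache
    interconnect: Interconnect

    def __post_init__(self) -> None:
        counts = (self.cores, self.cache_size_kib,
                  self.associativity, self.cores_per_cache)
        if min(counts) < 1:
            raise ValueError("all counts and sizes must be positive")
        # Assumption of this sketch: every shared cache serves the same number of cores.
        if self.cores % self.cores_per_cache != 0:
            raise ValueError("cores must divide evenly among shared caches")

    @property
    def shared_caches(self) -> int:
        """Number of shared caches implied by the core count and sharing degree."""
        return self.cores // self.cores_per_cache


# Illustrative values only; they do not describe any particular processor.
example = MachineParams(cores=8, cache_size_kib=4096, associativity=8,
                        cores_per_cache=4, interconnect=Interconnect.CROSSBAR)
```

Freezing the record keeps a parameter set hashable, so it can serve as a dictionary key when comparing predictions across configurations.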

What new factors?
• Protocols
  – Cache coherence protocol at dedicated/semi-shared cache
• Algorithms
  – Block replacement algorithm
  – Algorithms of cache coherence and data consistency protocols
Potential uses
• Performance prediction for future processors
• Scheduler

Similar work exists?
• Multi2Sim (2007)
  – Framework simulating the system working as a whole
  – Yet, application-only simulation
  – Evaluates multicore-multithreaded processors
  – 3 major components simulated
    • Core
    • Cache hierarchy
    • Interconnect
  – Note: source code available

Enough?
• Limitations
  – Homogeneous cores
  – Topology
    • Bus only
    • With variable bus width, though