![Page 1: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/1.jpg)
From Serial to Parallel
Stephen Blair-Chappell, Intel Compiler Labs
www.intel.com
![Page 2: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/2.jpg)
Agenda
• Why Parallel?
• Optimising Applications
• Steps to move from Serial to Parallel
Copyright © 2007, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners
Intel® Software Development Products Overview, 6/18/2010
![Page 3: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/3.jpg)
Congratulations BCS FIG.wmv
![Page 4: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/4.jpg)
Moving to Parallel – a view from some developers
• Top 5 challenges
– Legacy
– Education
– Tools
– Fear of many cores
– Maintainability
![Page 5: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/5.jpg)
Why Parallel?
Section 1
![Page 6: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/6.jpg)
• Why is everyone going multi-core?
[Figure: Power Density Race. Power density (W/cm²) on a log scale from 1 to 10,000, plotted against year (’70 to ’10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface.]
![Page 7: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/7.jpg)
Moore’s Law reinterpreted
• Speed no longer increasing
• Number of transistors still growing
• Number of cores, rather than clock speed, is doubling every 18 months
From K. Olukotun, L. Hammond, H. Sutter, and B. Smith
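The doubling rule on this slide can be turned into a one-line projection. The sketch below is illustrative only: the function name and the baseline of 2 cores are assumptions, not figures taken from the talk.

```cpp
#include <cassert>
#include <cmath>

// Projected core count after `years`, assuming the count doubles every
// 18 months (1.5 years) starting from `baseCores`.
// Both parameters are hypothetical inputs chosen by the caller.
inline double projectedCores(double baseCores, double years) {
    return baseCores * std::pow(2.0, years / 1.5);
}
```

Under this rule the count quadruples every three years, so a hypothetical 2-core baseline gives 8 cores after 3 years and 2048 after 15.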
![Page 8: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/8.jpg)
Theoretical growth of cores
[Figure: Growth of Multicore. Number of cores on a log scale (1 to 10,000) plotted against year (2000 to 2025), with a "Cores" series and a "Cores Act" series. Labelled data points include 2, 8, 32, 128, 512 and 2048.]
![Page 9: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/9.jpg)
Future: Multicore and Manycore
All Large Core
Mixed Large and Small Core
All Small Core
Note: the above pictures don’t necessarily represent any current or future Intel products
Connections to memory bank(s), connections between processors, memory coherency models – all come into play. Diversity!
![Page 10: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/10.jpg)
Multi-core: beating the power/performance barrier
Relative single-core frequency and Vcc

| Configuration | Power | Performance |
|---|---|---|
| Over-clocked (+20%) | 1.73x | 1.13x |
| Design Frequency | 1.00x | 1.00x |
| Under-clocked (-20%) | 0.51x | 0.87x |
| Dual-core (-20%) | 1.02x | 1.73x |
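The power figures on this slide are consistent with a common rule of thumb: when Vcc is scaled together with frequency, dynamic power grows roughly with the cube of the frequency. That rule is an assumption here, not something the slide states, and the names below are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative power under the assumed cube rule: P ~ f^3 when Vcc tracks f.
inline double relativePower(double freqScale) {
    return freqScale * freqScale * freqScale;
}

struct Config {
    double power;        // relative to one core at design frequency
    double performance;  // relative to one core at design frequency
};

// Combines `cores` identical cores with the given per-core figures.
// Assumes the workload scales perfectly across cores (an idealisation).
inline Config combine(int cores, double perCorePower, double perCorePerf) {
    return Config{cores * perCorePower, cores * perCorePerf};
}
```

With this sketch, `relativePower(1.2)` is about 1.73 and `relativePower(0.8)` is about 0.51, matching the over-clocked and under-clocked rows. Two under-clocked cores at 0.51x power and 0.87x performance each come to about 1.02x power and about 1.74x performance, close to the dual-core figures shown.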
![Page 11: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/11.jpg)
Industry’s First 45 nm High-K + Metal Gate Transistor Technology
Improved Transistor Density: ~2x
Improved Transistor Switching Speed: >20%
Reduced Transistor Switching Power: ~30%
Reduction in gate oxide leakage power: >10x
45 nm S-RAM Cell: 0.346 µm²
6-transistor 65 nm S-RAM Cell: 0.570 µm²
Enables New Features, Higher Performance, Greater Energy Efficiency
[Figure: 65 nm Transistor alongside 45 nm HK + MG transistor]
![Page 12: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/12.jpg)
Intel’s Teraflops Research Chip
Podtech_Intel_Research_Day_Terascale.flv
![Page 13: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/13.jpg)
Intel’s Teraflops Research Chip

| Speed (GHz) | Power (Watts) | Perf. (Teraflops) |
|---|---|---|
| 3.16 | 62 | 1.01 |
| 5.1 | 175 | 1.63 |
| 5.7 | 265 | 1.81 |
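One way to read the three operating points above is as energy efficiency. The sketch below derives GFLOPS per watt from the table values; the struct and function names are illustrative, not part of the talk.

```cpp
#include <cassert>
#include <cmath>

// One operating point from the table above.
struct Sample {
    double ghz;        // clock speed in GHz
    double watts;      // power in watts
    double teraflops;  // performance in teraflops
};

// Energy efficiency in GFLOPS per watt (1 teraflop = 1000 gigaflops).
inline double gflopsPerWatt(const Sample& s) {
    return s.teraflops * 1000.0 / s.watts;
}
```

Applied to the three rows this gives roughly 16.3, 9.3 and 6.8 GFLOPS per watt, so each step up in clock speed buys more performance at a lower efficiency.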
![Page 14: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/14.jpg)
Larrabee
![Page 15: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/15.jpg)
![Page 16: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/16.jpg)
What can we do with faster Computers?
• Solve problems faster
– Reduce turn-around time of big jobs
– Increase responsiveness of interactive apps
[Figure: Time plotted against number of Processors]
• Get better solutions in the same amount of time
– Increase resolution of models
– Make model more sophisticated
[Figure: Problem Size plotted against number of Processors]
![Page 17: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/17.jpg)
Optimising Code
Section 2
![Page 18: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/18.jpg)
Two points to address before you start parallelising
• Will buying a faster computer solve your problem?
Dr Yann Golanski, York
• Maybe Serial Optimisation will be sufficient.
![Page 19: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/19.jpg)
A Three-Tiered Tuning Model

| Tuning Level | Question being asked | Examples of issues |
|---|---|---|
| System wide | Can my system be ‘tuned’ to improve the performance of my application? | Network, disk and memory performance. Intrusion by 3rd party programs such as virus scanners. |
| Application Heuristics | Can my application code or heuristics be changed to improve performance? | Code redundancy. Inefficient program algorithms. Poor memory allocation strategies. Bad or missing threading implementation. |
| Architectural Bottlenecks | Is the CPU architecture being used at its best? | Stalls in CPU pipeline. Data alignment. Cache misses. Using expensive instructions. Failing to use latest generation optimised instructions. |
![Page 20: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/20.jpg)
Compiler generated Optimisations
Global Compiler Options
Inter-procedural Optimisations
Profile Guided Optimisations
Vectorisation
Parallelisation
Copyright © 2008, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners
![Page 21: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/21.jpg)
Example of hand-crafted SSE instructions

    #include <emmintrin.h>  // SSE2 intrinsics

    // SUDOKU is the puzzle structure defined elsewhere in the application;
    // its BinNum member is used here as an array of __m128i bit masks.
    bool SSEHasNumber(SUDOKU *pPuzzle, __m128i BinArray[], int i, int j)
    {
        // Bitwise AND of the two 128-bit masks.
        __m128i Tmp1 = _mm_and_si128(pPuzzle->BinNum[j-1], BinArray[i]);
        __m128i Tmp2 = _mm_setzero_si128();

        // Each 32-bit lane becomes all ones where Tmp1 is zero, otherwise zero.
        Tmp2 = _mm_cmpeq_epi32(Tmp2, Tmp1);

        unsigned int p[4];
        _mm_storeu_si128((__m128i *)p, Tmp2);

        // True if any of the first three lanes of the AND was non-zero.
        return p[0] == 0 || p[1] == 0 || p[2] == 0;
    }

| | Time Taken | Speedup |
|---|---|---|
| No SSE | 4.55 sec | 1 |
| With SSE | 0.19 sec | 24 |
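For comparison, the test performed by the SSE routine can be written lane by lane in plain C++. This is a hedged scalar sketch of the same logic, not the "No SSE" code that was timed in the talk; `Mask128` and `hasNumberScalar` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four 32-bit lanes standing in for one 128-bit SSE register.
using Mask128 = std::array<std::uint32_t, 4>;

// Scalar equivalent of the AND-then-compare test: returns true if any of
// the first three lanes of (a & b) is non-zero. The fourth lane is ignored,
// as in the SSE version.
inline bool hasNumberScalar(const Mask128& a, const Mask128& b) {
    for (int lane = 0; lane < 3; ++lane) {
        if ((a[lane] & b[lane]) != 0) {
            return true;
        }
    }
    return false;
}
```

The SSE version does the AND and the zero comparison for all four lanes in one instruction each, which is where the reported speedup comes from.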
![Page 22: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/22.jpg)
Modern Architectures have lots of features to help speed up code
The internals of the Intel low power IA architecture
![Page 23: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/23.jpg)
Intel® VTune™ Performance Analyzer
Graphical tool
Helps characterise runtime performance
System-wide View of application environment
Use to tune serial and parallel code
Use to identify Hot Spots in Code
Use to generate a call graph
![Page 24: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/24.jpg)
Call graph: Application workflow
The red lines show the critical path. The critical path is the most time-consuming call path. It is based on self time.
Filter view by self time
Bright orange nodes indicate functions with the highest self time.
Intel, VTune, and the Intel logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States or other countries.
![Page 25: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/25.jpg)
The life of a program instruction
[Figure: pipeline block diagram with Inst. Fetch, Branch Pred, Decoder, Reservation Station, Execution Units, Reorder Buffer, Retirement and Memory Sub-system]
1. Instruction read from memory
2. Instruction fed to Decoder
3. Micro-ops (uops) generated
4. uops queued in RS
5. uops dispatched
6. Results sent to ROB
7. Instruction marked – all uops executed
8. Instruction sent for retirement
![Page 26: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/26.jpg)
Hardware Performance Events
[Figure: pipeline block diagram with Inst. Fetch, Branch Pred, Decoder, Reservation Station, Execution Units, Reorder Buffer, Retirement and Memory Sub-system, annotated with the events below]
BUS_TRANS_ANY.ALL_AGENTS
RS_UOPS_DISPATCHED.CYCLES_NONE
CPU_CLK_UNHALTED.CORE
INST_RETIRED.ANY
MEM_LOAD_RETIRED.L2_MISS
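Raw event counts are usually read as ratios rather than on their own. The sketch below shows three commonly derived ratios from the events listed above; the struct, the function names and the choice of ratios are assumptions for illustration, and the counts would come from a profiler run.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Event counts collected over one run (field names mirror the events above).
struct EventCounts {
    std::uint64_t cpuClkUnhaltedCore;          // CPU_CLK_UNHALTED.CORE
    std::uint64_t instRetiredAny;              // INST_RETIRED.ANY
    std::uint64_t memLoadRetiredL2Miss;        // MEM_LOAD_RETIRED.L2_MISS
    std::uint64_t rsUopsDispatchedCyclesNone;  // RS_UOPS_DISPATCHED.CYCLES_NONE
};

// Cycles per instruction: lower generally means better use of the core.
inline double cyclesPerInstruction(const EventCounts& c) {
    return static_cast<double>(c.cpuClkUnhaltedCore) / c.instRetiredAny;
}

// L2 load misses per thousand retired instructions.
inline double l2MissesPerKiloInstruction(const EventCounts& c) {
    return 1000.0 * c.memLoadRetiredL2Miss / c.instRetiredAny;
}

// Fraction of cycles in which no uops were dispatched.
inline double stalledCycleFraction(const EventCounts& c) {
    return static_cast<double>(c.rsUopsDispatchedCyclesNone) / c.cpuClkUnhaltedCore;
}
```

A high cycles-per-instruction figure together with a high stalled fraction and a high L2 miss rate would point towards the memory sub-system rather than the execution units.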
![Page 27: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/27.jpg)
Demo 0 – Using Intel® VTune™ Performance Analyzer
From 1 to 1,000,000
![Page 28: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/28.jpg)
Notice the System Wide View – Can you see any problems?
![Page 29: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/29.jpg)
What is the problem here?
• VTune Sample-over-time view
![Page 30: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/30.jpg)
The Hotspot
![Page 31: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/31.jpg)
Four Steps in Moving to Parallel
Section 2
![Page 32: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/32.jpg)
Steps in moving from Serial to Parallel
Serial
Architectural Analysis
Introducing Parallelism
Validating Correctness
Performance Tuning
Parallel
![Page 33: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/33.jpg)
Key Questions
Design
• Is my program parallel?
• Where is the best place to parallelise my program?
• How can I get my program to run faster?
• What’s the expected speedup?
Code & Debug
• How?
• How difficult?
• Is my code still working?
Verify
• Is the parallelism correct?
• Do I have deadlocks or data races?
• Do I have memory errors?
• Does my program still work as intended?
Tune
• Do my tasks do equal amounts of work?
• Is my application scalable?
• Is the threading running efficiently?
![Page 34: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/34.jpg)
Intel Software Tools Supporting Parallel Design Cycle
Serial
Architectural Analysis
Introducing Parallelism
Validating Correctness
Performance Tuning
Parallel
![Page 35: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/35.jpg)
Tools for each step from Serial to Parallel

| Step | Existing Intel Software | Intel Parallel Studio |
|---|---|---|
| Architectural Analysis | Intel® VTune™ Performance Analyzer | Advisor/Amplifier |
| Introducing Parallelism | Intel Compilers, Parallel Libraries | Composer |
| Validating Correctness | Intel® Thread Checker | Inspector |
| Performance Tuning | Intel® Thread Profiler | Amplifier |
![Page 36: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/36.jpg)
For Microsoft Visual Studio* C++ architects, developers, and software
innovators creating parallel Windows* applications.
Intel® Parallel Studio includes:
• Intel® Parallel Advisor Lite **
• Intel® Parallel Composer
• Intel® Parallel Inspector
• Intel® Parallel Amplifier
** Beta – from whatif.intel.com
Microsoft Visual Studio* plug-in
End-to-end product suite for parallelism
Forward scaling to many core
![Page 37: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/37.jpg)
Step 1
Which part of my code should I make Parallel?
Serial
Architectural Analysis
Introducing Parallelism
Validating Correctness
Performance Tuning
Parallel
![Page 38: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/38.jpg)
Key Questions - Design
Is my program parallel?
Where is the best place to parallelise my program?
How can I get my program to run faster?
What’s the expected speedup?
![Page 39: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/39.jpg)
Identifying best parts to Parallelize
[Figure: program run divided into Phase 1 and Phase 2, comparing Serial and Parallel execution]
We need to Identify Hotspots
![Page 40: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/40.jpg)
Why use parallelism? Amdahl’s Law
Describes the upper bound of parallel execution speedup
Serial code limits speedup
n = number of processors, P = fraction of the serial run time that can be parallelised
$$T_{\text{parallel}} = \{(1-P) + P/n\}\, T_{\text{serial}}$$
$$\text{Speedup} = T_{\text{serial}} / T_{\text{parallel}}$$
Example with P = 0.5 and Tserial = 1.0:
n = 2: Tparallel = 0.5 + 0.25 = 0.75, Speedup = 1.0/0.75 = 1.33
…
n = ∞: Tparallel = 0.5 + 0.0 = 0.5, Speedup = 1.0/0.5 = 2.0
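The formula on this slide is easy to check numerically. The sketch below is a minimal illustration with hypothetical function names; it takes P as the parallel fraction and n as the number of processors.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Amdahl's Law: Speedup = Tserial / Tparallel = 1 / ((1 - P) + P / n).
inline double amdahlSpeedup(double parallelFraction, double processors) {
    return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);
}

// Upper bound on speedup as the number of processors grows without limit.
inline double amdahlLimit(double parallelFraction) {
    return 1.0 / (1.0 - parallelFraction);
}
```

For P = 0.5, `amdahlSpeedup(0.5, 2)` gives about 1.33 and `amdahlLimit(0.5)` gives 2.0, the two cases worked on the slide. However many processors are added, the serial half caps the speedup at 2x.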
![Page 41: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/41.jpg)
Some code is not worth making parallel…
• Don’t parallelise code
– Just because it’s clever
– With low CPU utilisation
– That is I/O bound
• Do parallelise code that
– Eats significant CPU cycles
• You need to get visibility of the runtime behaviour
![Page 42: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/42.jpg)
Architectural Extensions can speed up your code
• Always optimise your code
• Even if you don’t go parallel, some architectural features can still give significant speed-up
• Example, SSE extensions
![Page 43: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/43.jpg)
Our Application - Prime Number Generator
// Returns true if val has no factor between 3 and sqrt(val).
// Only meaningful for odd val >= 3, which is all FindPrimes passes in.
bool TestForPrime(int val)
{   // let's start checking from 3
    int limit, factor = 3;
    limit = (long)(sqrtf((float)val)+0.5f);
    while( (factor <= limit) && (val % factor) )
        factor ++;
    return (factor > limit);
}

// Tests every second number in [start, end]; start is expected to be odd.
void FindPrimes(int start, int end)
{
    int range = end - start + 1;
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            globalPrimes[gPrimesFound++] = i;   // shared array and shared counter
        ShowProgress(i, range);
    }
}

i    factor
61   3 5 7
63   3
65   3 5
67   3 5 7
69   3
71   3 5 7
73   3 5 7 9
75   3 5
77   3 5 7
79   3 5 7 9
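The slide code relies on globals (`globalPrimes`, `gPrimesFound`) and a `ShowProgress` helper that are not shown. For readers who want to reproduce the serial baseline, here is a self-contained sketch. It is not the demo's source: the name `FindPrimesSerial`, the use of `std::vector`, the handling of 2, and stepping the factor by 2 are all assumptions made for illustration.

```cpp
#include <cmath>
#include <vector>

// Same trial-division idea as the slide; valid for odd val >= 3.
static bool TestForPrime(int val)
{
    int factor = 3;
    const int limit = static_cast<int>(std::sqrt(static_cast<double>(val)) + 0.5);
    while (factor <= limit && (val % factor) != 0)
        factor += 2;   // assumption: skip even factors, since val is odd
    return factor > limit;
}

// Returns all primes in [start, end]. Results go into a local vector
// rather than the slide's global array and counter.
std::vector<int> FindPrimesSerial(int start, int end)
{
    std::vector<int> primes;
    if (start <= 2 && end >= 2)
        primes.push_back(2);        // 2 is the only even prime
    int first = (start < 3) ? 3 : start;
    if (first % 2 == 0)
        ++first;                    // begin on an odd number
    for (int i = first; i <= end; i += 2)
        if (TestForPrime(i))
            primes.push_back(i);
    return primes;
}
```

Calling `FindPrimesSerial(1, 1000000)` covers the same range as the demo and can be timed to get a benchmark figure for a given machine.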
![Page 44: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/44.jpg)
Demo 1 – Getting the Benchmark
From 1 to 1,000,000
![Page 45: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/45.jpg)
Our Application - Prime Number Generator
![Page 46: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/46.jpg)
Optimise the serial code first
• Using Intel Compiler to automatically generate SSE instructions.
– Code ran twice as fast
– No change made to original code
Calculating Pi
            Time (secs)   Speedup
No SSE      1.29          1.00
With SSE    0.66          1.95
Example of SSE compiler-generated instructions
004018D9 movaps xmmword ptr [esp],xmm0
004018DD paddd xmm5,xmm6
004018E1 addpd xmm7,xmm3
004018E5 mulpd xmm7,xmm2
004018E9 add eax,8
004018EC mulpd xmm7,xmm7
004018F0 movaps xmm0,xmmword ptr ds:[406770h]
004018F7 addpd xmm7,xmm1
004018FB divpd xmm0,xmm7
004018FF cvtdq2pd xmm7,xmm5
00401903 paddd xmm5,xmm6
00401907 addpd xmm4,xmm0
0040190B movaps xmm0,xmmword ptr ds:[406770h]
00401912 addpd xmm7,xmm3
00401916 mulpd xmm7,xmm2
0040191A mulpd xmm7,xmm7
0040191E addpd xmm7,xmm1
00401922 divpd xmm0,xmm7
00401926 movaps xmm7,xmmword ptr [esp]
0040192A addpd xmm7,xmm0
0040192E cvtdq2pd xmm0,xmm5
00401932 movaps xmmword ptr [esp],xmm7
![Page 47: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/47.jpg)
Demo 2 – Using the Intel Compiler
From 1 to 1,000,000
![Page 48: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/48.jpg)
Swapping compilers.
• From solution drop-down menu
• Action is reversible
![Page 49: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/49.jpg)
Program built and run with Intel compiler
Speedup 1.09
![Page 50: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/50.jpg)
Identifying Hotspots
Pinpointing places where an application could be parallelised
![Page 51: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/51.jpg)
The Big Question
“How can I make my code run faster?”
![Page 52: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/52.jpg)
Today’s Question
“Where do I split up my code to take advantage of multiple CPU cores?”
![Page 53: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/53.jpg)
The task, Identifying the Hot Spot…
![Page 54: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/54.jpg)
… and Splitting up the Work.
![Page 55: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/55.jpg)
Demo 2 – Finding the Hotspots
![Page 56: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/56.jpg)
Finding a Hot Spot
![Page 57: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/57.jpg)
Where to Parallelise – Amplifier
Screenshot labels: Hotspot, Call Stack
![Page 58: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/58.jpg)
Design: What’s the expected speedup?
• Use Amdahl’s Law
Speedup = 1/[s + (1-s)/n + H(n)]
s is serial part (fraction of 1)
H is parallel overhead (ignore)
n is number of cores
s = 0
Speedup = 1 / [ 0 + ( 1 - 0 ) / 2 ]
= 1 / [ 0 + 0.5 ]
Speedup = 2 ( i.e. new time ~ 0.672 seconds)
![Page 59: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/59.jpg)
Alternate Calculation
Speedup = 1/[s + (1-s)/n + H(n)]
s is serial part (fraction of 1)
H is parallel overhead (ignore)
n is number of cores
s = 1 - (1.688 – 0.012)/1.688 = .007
Speedup = 1 / [ .007 + ( 1 - .007 ) / 2 ]
= 1 / [ 0.007 + 0.4965 ]
Speedup = 1.986 ( i.e. CPU Time ~ 0.850 seconds)
![Page 60: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/60.jpg)
Step 2
Implement Parallelism in code
Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel
![Page 61: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/61.jpg)
Key Questions – Code & Debug
How?
How difficult?
Is my code still working?
![Page 62: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/62.jpg)
Common types of parallelism
• Functional or Task Parallelism
• Data Parallelism
• Software Pipelining
![Page 63: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/63.jpg)
Task and Data Parallelism
Task parallelism
• Different job for each thread
• e.g. one thread prints, another reads keyboard
Data parallelism
• Splitting workload between multiple identical threads
• e.g. three identical threads perform calculations on data array
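The two styles can be contrasted in a short OpenMP sketch. The function names `MinAndMax` and `SumOfSquares` are hypothetical and not from the slides. If the code is built without OpenMP support the pragmas are ignored and it runs serially with the same results.

```cpp
#include <algorithm>
#include <vector>

// Task parallelism: two different jobs, each of which may run on its own
// thread. data must not be empty.
void MinAndMax(const std::vector<int>& data, int& min_out, int& max_out)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        min_out = *std::min_element(data.begin(), data.end());

        #pragma omp section
        max_out = *std::max_element(data.begin(), data.end());
    }
}

// Data parallelism: identical work on different parts of the same array.
// The reduction clause gives each thread a private partial sum.
long long SumOfSquares(const std::vector<int>& data)
{
    long long sum = 0;
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += static_cast<long long>(data[i]) * data[i];
    return sum;
}
```

In the first function the number of useful threads is fixed by the number of sections; in the second it scales with the size of the data.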
![Page 64: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/64.jpg)
Software Pipeline
         Time →
Core 1   Collect A   Collect B    Collect C    Collect D    …
Core 2               Transfer A   Transfer B   Transfer C   Transfer D   …
Core 3                            Polish A     Polish B     Polish C     …
Core 4                                         Produce A    Produce B    …
![Page 65: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/65.jpg)
Question
• How many different ways can you think of to implement parallelism?
– E.g. OpenMP, …, …
![Page 66: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/66.jpg)
![Page 67: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/67.jpg)
Auto Parallelism
Loop-level parallelism automatically supplied by the compiler
![Page 68: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/68.jpg)
Auto-parallelization
• Auto-parallelization: Automatic threading of loops without having to manually insert OpenMP* directives.
Windows*          Linux*            Mac*
/Qparallel        -parallel         -parallel
/Qpar_report[n]   -par_report[n]    -par_report[n]
• Compiler can identify “easy” candidates for parallelization, but large applications are difficult to analyze.
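To illustrate what an “easy” candidate looks like, the sketch below contrasts a loop with independent iterations against one with a loop-carried dependence. Both functions are made-up examples; whether a particular compiler parallelises the first one still depends on its analysis and options, which the parallelisation report can confirm.

```cpp
#include <vector>

// Independent iterations: each output element depends only on its own
// input, so iterations could run in any order or on different threads.
// out must be at least as large as in.
void Scale(std::vector<double>& out, const std::vector<double>& in, double k)
{
    const int n = static_cast<int>(in.size());
    for (int i = 0; i < n; ++i)
        out[i] = k * in[i];
}

// Loop-carried dependence: iteration i reads the value written by
// iteration i-1, so the iterations cannot simply be divided between
// threads as written.
void PrefixSum(std::vector<double>& a)
{
    const int n = static_cast<int>(a.size());
    for (int i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```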
![Page 69: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/69.jpg)
Optimisation Results – pi application
Optimisation             Time Taken (secs)   Speedup
default                  0.938               1
auto-vectorisation       0.375               2.5
auto-parallelism         0.516               1.8
auto-vec. & auto-par.    0.203               4.6
![Page 70: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/70.jpg)
OpenMP Architecture
• Fork-Join Model
• Worksharing constructs
• Synchronization constructs
• Directive/pragma-based parallelism
• Extensive API for finer control
![Page 71: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/71.jpg)
OpenMP Runtime
Diagram labels: User, Application, Directive Compiler, Environment Variables, Runtime Library, Threads in Operating System
![Page 72: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/72.jpg)
OpenMP Programming Model
Fork-Join Parallelism:
• Master thread spawns a team of threads as needed.
• Parallelism added incrementally until performance goals are met: i.e. the sequential program evolves into a parallel program.
Diagram labels: Master Thread in red, Parallel Regions, A Nested Parallel region, Sequential Parts
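A minimal fork-join sketch follows. The function name `CountTeamThreads` is hypothetical. Built without OpenMP support, the pragmas are ignored and the function returns 1.

```cpp
// Returns how many threads executed the parallel region.
int CountTeamThreads()
{
    // Sequential part: only the master thread runs here.
    int team_size = 0;

    // Fork: a team of threads is created for the block below.
    #pragma omp parallel
    {
        // Each thread in the team counts itself; atomic avoids a data race
        // on the shared counter.
        #pragma omp atomic
        team_size++;
    }
    // Join: the team has finished and only the master thread continues.

    return team_size;
}
```

The result usually reflects the number of cores or the `OMP_NUM_THREADS` environment variable, which is one of the environment controls the runtime reads.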
![Page 73: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/73.jpg)
Introducing Parallelism
// OpenMP: create threads here for this parallel region
// and divide iterations of the for loop between them
#pragma omp parallel for
for( int i = start; i <= end; i+= 2 ){
    if( TestForPrime(i) )
        globalPrimes[gPrimesFound++] = i;
    ShowProgress(i, range);
}
![Page 74: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/74.jpg)
Demo 3 : Adding parallelism using #pragma omp for
![Page 75: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/75.jpg)
Results – OpenMP
Amazing!
We have a speed up!
![Page 76: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/76.jpg)
Code: Is my code still working?
Bother !!!!!!!!!!!!!!!!!!! Number of primes is wrong
![Page 77: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/77.jpg)
Questions
Are the results right?
Was the run quicker?
![Page 78: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/78.jpg)
Step 3
Check for any problems
Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel
![Page 79: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/79.jpg)
Key Questions - Verify
Is the parallelism correct?
Do I have deadlocks or data races?
Do I have memory errors?
Does my program still work as intended?
![Page 80: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/80.jpg)
New paradigm requires new tools
• Using traditional debugging tools is difficult / impossible
– Printf – not re-entrant
– Debugging several threads is notoriously hard
– Many debuggers / profilers are not multi-core enabled
• Multi-core tools are available
![Page 81: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/81.jpg)
Non-deterministic Error Sources in Parallel Applications
• Shared resources require locks
• Locks can
– ‘serialize’ a program
– lead to deadlocks
Without a lock (Thread1 and Thread2 share memory X):
X=0; both threads execute X=X+1 at the same time; X=1. Wrong result (X should be 2)
With lock L1 (Thread1 and Thread2 share memory X):
X=0; each thread executes X=X+1 while holding L1; X=2
![Page 82: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/82.jpg)
Demo 4 – Checking for threading errors
![Page 83: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/83.jpg)
Checking for Errors with Parallel Inspector
![Page 84: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/84.jpg)
The Offending Sources
![Page 85: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/85.jpg)
Protecting shared variables
#pragma omp parallel for
for( int i = start; i <= end; i+= 2 ){
    if( TestForPrime(i) )
        // Will create a critical section for this reference
        #pragma omp critical
        globalPrimes[gPrimesFound++] = i;
    ShowProgress(i, range);
}

// Will create a critical section for both these references
#pragma omp critical
{
    gProgress++;
    percentDone = (int)(gProgress/range *200.0f+0.5f);
}
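A self-contained version of the protected loop is sketched below for experimentation. It is not the demo's source: the name `FindPrimesParallel`, the use of `std::vector` instead of the global array, the odd-only factor step, and the final sort are assumptions. Without OpenMP support the pragmas are ignored and the code runs serially.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Trial division, valid for odd val >= 3 (same idea as the slide code).
static bool TestForPrime(int val)
{
    int factor = 3;
    const int limit = static_cast<int>(std::sqrt(static_cast<double>(val)) + 0.5);
    while (factor <= limit && (val % factor) != 0)
        factor += 2;
    return factor > limit;
}

// Returns all primes in [start, end], found in parallel.
std::vector<int> FindPrimesParallel(int start, int end)
{
    std::vector<int> primes;
    if (start <= 2 && end >= 2)
        primes.push_back(2);
    int first = std::max(start, 3);
    if (first % 2 == 0)
        ++first;

    #pragma omp parallel for
    for (int i = first; i <= end; i += 2) {
        if (TestForPrime(i)) {
            // Only one thread at a time may modify the shared vector.
            #pragma omp critical
            primes.push_back(i);
        }
    }

    // Threads append in no fixed order, so sort before returning.
    std::sort(primes.begin(), primes.end());
    return primes;
}
```

Removing the `critical` pragma reintroduces the data race, and the number of primes reported may then vary from run to run.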
![Page 86: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/86.jpg)
Demo 5 – Fixing the threading errors
![Page 87: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/87.jpg)
Data Races Have Disappeared!
![Page 88: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/88.jpg)
Number of Primes is Correct
![Page 89: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/89.jpg)
Step 4: Tune for best performance
Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel
![Page 90: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/90.jpg)
Key Questions - Tune
Is the threading running efficiently?
Do my tasks do equal amounts of work?
Is my application scalable?
![Page 91: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/91.jpg)
Performance Issues
Load Balancing
Synchronisation Overhead
Scalability
Difficult to examine without the right tools
![Page 92: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/92.jpg)
A Reminder – Where are we?
Number of primes is correct
Almost as slow as the serial version
![Page 93: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/93.jpg)
Demo 6 – Find the Threading Performance Issues
![Page 94: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/94.jpg)
Hotspot Analysis
![Page 95: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/95.jpg)
Source View
![Page 96: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/96.jpg)
Improving the Performance
The algorithm has many more updates than the 10 needed for showing progress

Before:
void ShowProgress( int val, int range )
{
    int percentDone;
    gProgress++;
    percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    if( percentDone % 10 == 0 )
        printf("\b\b\b\b%3d%%", percentDone);
}

After:
void ShowProgress( int val, int range )
{
    int percentDone;
    static int lastPercentDone = 0;
    #pragma omp critical
    {
        gProgress++;
        percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    }
    if( percentDone % 10 == 0 && lastPercentDone < percentDone / 10){
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone++;
    }
}
This change should fix the contention issue
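To see why the extra lastPercentDone test matters, the following serial sketch counts how often the progress line would be redrawn with and without it. The function names, and the use of 100.0f with one update per step, are assumptions for illustration; the real demo prints from inside a parallel loop.

```c
/* Counts redraws when the line is printed every time the rounded
   percentage is a multiple of 10 (the original check). */
int redraws_unthrottled(int steps)
{
    int redraws = 0;
    for (int done = 1; done <= steps; done++) {
        int percent = (int)((float)done / (float)steps * 100.0f + 0.5f);
        if (percent % 10 == 0)
            redraws++;
    }
    return redraws;
}

/* Counts redraws when each multiple of 10 is printed only once.
   last_decile plays the role of lastPercentDone on the slide. */
int redraws_throttled(int steps)
{
    int redraws = 0;
    int last_decile = 0;
    for (int done = 1; done <= steps; done++) {
        int percent = (int)((float)done / (float)steps * 100.0f + 0.5f);
        if (percent % 10 == 0 && last_decile < percent / 10) {
            redraws++;
            last_decile++;
        }
    }
    return redraws;
}
```

For 100000 steps the throttled version redraws 10 times, once per decile, while the unthrottled version redraws thousands of times, because the rounded percentage stays on each multiple of 10 for many consecutive updates.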
![Page 97: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/97.jpg)
Demo 7 – Fixing the Synchronisation issues
![Page 98: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/98.jpg)
Superb Speedup … ???
Speedup 7.36
On a dual core?
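The arithmetic behind that doubt can be sketched as follows. Speedup is serial time divided by parallel time, and a value above the core count usually means the two timings are not comparable, for example because the serial baseline was measured before a later code change. The function names here are hypothetical helpers, not part of the demo.

```c
/* Speedup: how many times faster the parallel run is than the serial run. */
double speedup(double serial_seconds, double parallel_seconds)
{
    return serial_seconds / parallel_seconds;
}

/* Parallel efficiency: speedup per core; 1.0 would be ideal scaling. */
double parallel_efficiency(double speedup_value, int cores)
{
    return speedup_value / (double)cores;
}

/* Flags a speedup larger than the number of cores, which is worth
   double-checking against a fresh serial baseline. */
int speedup_is_suspicious(double speedup_value, int cores)
{
    return speedup_value > (double)cores;
}
```

With the figures from the slide, speedup_is_suspicious(7.36, 2) is true, so the serial timing needs to be measured again before the result can be trusted.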
![Page 99: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/99.jpg)
Demo 8 – Getting New Serial Benchmark
![Page 100: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/100.jpg)
That’s better (but disappointing)…
Speedup 1.55
![Page 101: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/101.jpg)
Demo 9 – Correcting the Synchronisation Issue
![Page 102: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/102.jpg)
Hotspots
![Page 103: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/103.jpg)
Locks & Waits
![Page 104: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/104.jpg)
Source Code View
![Page 105: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/105.jpg)
Fixing the synchronisation issue - 1
This fix removes the need for a critical section
void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;
    #pragma omp parallel for
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            // InterlockedIncrement returns the incremented value; subtract 1 so the first prime lands in slot 0
            globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
        ShowProgress(i, range);
    }
}
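InterlockedIncrement is a Windows API call. As a portable sketch of the same lock-free slot reservation, the example below swaps in the OpenMP atomic capture construct instead. The function name FindPrimesAtomic, the MAX_PRIMES capacity and the trial-division TestForPrime are assumptions for illustration and are not taken from the demo source.

```c
#include <stdbool.h>

#define MAX_PRIMES 2048            /* hypothetical capacity for the sketch */

static int globalPrimes[MAX_PRIMES];
static int gPrimesFound = 0;

/* Simple trial division; a stand-in for the demo's TestForPrime. */
static bool TestForPrime(int n)
{
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (int d = 3; d * d <= n; d += 2)
        if (n % d == 0) return false;
    return true;
}

/* Tests the odd numbers start..end (start must be odd) and returns how
   many primes were found. Each thread reserves its own array slot with
   one atomic read-and-increment, so no critical section is needed. */
int FindPrimesAtomic(int start, int end)
{
    gPrimesFound = 0;

    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) {
            int slot;
            /* Capture the old count and bump it as one atomic step. */
            #pragma omp atomic capture
            slot = gPrimesFound++;
            if (slot < MAX_PRIMES)
                globalPrimes[slot] = i;
        }
    }
    return gPrimesFound;
}
```

Because the capture uses post-increment, slot is the count before the update and the array fills from index 0. The primes end up in the array in whatever order the threads found them, so sort the array afterwards if ordered output matters.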
![Page 106: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/106.jpg)
Fixing the synchronisation issue - 2
This fix removes the need for a critical section
void ShowProgress( int val, int range )
{
    long percentDone, localProgress;
    static int lastPercentDone = 0;
    localProgress = InterlockedIncrement(&gProgress);
    percentDone = (int)((float)localProgress/(float)range*200.0f+0.5f);
    if( percentDone % 10 == 0 && lastPercentDone < percentDone / 10){
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone++;
    }
}
![Page 107: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/107.jpg)
That’s better (but still disappointing)…
Speedup 1.6
![Page 108: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/108.jpg)
Demo 9 – Improving the Load Balancing
![Page 109: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/109.jpg)
Threads are not doing equal work
![Page 110: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/110.jpg)
Fixing a Load Imbalance
Distribute the work more evenly
void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;
    #pragma omp parallel for schedule(static, 8)
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            // InterlockedIncrement returns the incremented value; subtract 1 so the first prime lands in slot 0
            globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
        ShowProgress(i, range);
    }
}
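The effect of schedule(static, 8) can be sketched without running any threads. With a chunk size, OpenMP deals fixed-size chunks of iterations to the threads round-robin, so each thread gets a mix of cheap early iterations and expensive late ones. The cost model below (iteration k costs k + 1 units) and the even block split used for comparison are assumptions for illustration; the split used when no schedule clause is given is implementation-defined.

```c
/* Thread that runs 0-based iteration k under schedule(static, chunk):
   chunks are handed out round-robin in thread order. */
int owner_static_chunked(long k, int chunk, int threads)
{
    return (int)((k / chunk) % threads);
}

/* Hypothetical cost model: later iterations (larger candidates) cost more. */
static long cost(long k)
{
    return k + 1;
}

/* Total cost for one thread when the n iterations are split into one
   contiguous block per thread (an assumed, commonly seen split). */
long load_blocked(long n, int threads, int thread)
{
    long block = (n + threads - 1) / threads;
    long total = 0;
    for (long k = 0; k < n; k++)
        if (k / block == thread)
            total += cost(k);
    return total;
}

/* Total cost for one thread under schedule(static, chunk). */
long load_chunked(long n, int chunk, int threads, int thread)
{
    long total = 0;
    for (long k = 0; k < n; k++)
        if (owner_static_chunked(k, chunk, threads) == thread)
            total += cost(k);
    return total;
}
```

For 1600 iterations on two threads, the block split gives the threads 320400 and 960400 cost units, a 1:3 imbalance, while chunks of 8 give 637200 and 643600, which is within about one percent of even.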
![Page 111: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/111.jpg)
That’s better
Speedup 1.92
![Page 112: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/112.jpg)
A Finely Balanced threaded Program!
![Page 113: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/113.jpg)
Scalability
![Page 114: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/114.jpg)
Key Questions - Tune
Is the threading running efficiently?
Do my tasks do equal amounts of work?
Is my application scalable?
![Page 115: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/115.jpg)
Moving to Parallel – a view from some developers
Top 5 challenges
•Legacy
•Education
•Tools
•Fear of many cores
•Maintainability
![Page 116: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/116.jpg)
Scalability http://paralleluniverse.intel.com
![Page 117: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/117.jpg)
The Results
![Page 118: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/118.jpg)
Without the printfs
![Page 119: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/119.jpg)
A run of 10 million
![Page 120: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/120.jpg)
Tools
For each step from Serial to Parallel, the existing Intel software is listed first, followed by its Intel Parallel Studio counterpart.
Architectural Analysis: Intel® VTune™ Performance Analyzer; Parallel Studio: Advisor/Amplifier
Introducing Parallelism: Intel Compilers, Parallel Libraries; Parallel Studio: Composer
Validating Correctness: Intel® Thread Checker; Parallel Studio: Inspector
Performance Tuning: Intel® Thread Profiler; Parallel Studio: Amplifier
![Page 121: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/121.jpg)
Thank you
intel.com/go/parallel
![Page 122: From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf · 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features,](https://reader030.vdocuments.site/reader030/viewer/2022040901/5e71fd016516ae4a5b347b45/html5/thumbnails/122.jpg)
Q&A