Zhelong Pan [1]
DESCRIPTION
A presentation by Daniel Huguenin on the paper "Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning", written in 2006 at Purdue University by Zhelong Pan [1] and Rudolf Eigenmann [2].
TRANSCRIPT
![Page 1: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/1.jpg)
1
Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning
A presentation by Daniel Huguenin on the paper
written in 2006 at Purdue University by
Zhelong Pan [1] and Rudolf Eigenmann [2]
This presentation as .pptx: http://tinyurl.com/6y7gy8x (or scan QR code)
The paper: http://dl.acm.org/citation.cfm?id=1122414
[1] http://www.nic.uoregon.edu/iwomp2005/IWOMP_Photos_Day1/IWOMP_Photos-Images/7.jpg
[2] https://engineering.purdue.edu/ResourceDB/ResourceFiles/image3424
![Page 2: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/2.jpg)
3
« This is a quote from the paper. Note the dedicated quotation marks. »
Any references are listed here.
The paper: http://dl.acm.org/citation.cfm?id=1122414
![Page 3: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/3.jpg)
4
THE PROBLEM
![Page 4: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/4.jpg)
5
Choose optimization options from above to maximize program performance. Good luck.
YOUR TASK!
The table is taken from page 5 of the original paper.
![Page 5: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/5.jpg)
6
« Given a set of compiler optimization options {F1, F2, ..., Fn}, find the combination that minimizes the program execution time. Do this efficiently, without the use of a priori knowledge of the optimizations and their interactions. »
OPTIMIZATION ORCHESTRATION
![Page 6: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/6.jpg)
7
GOAL
![Page 7: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/7.jpg)
8
« We present […] Combined Elimination (CE), which aims at picking the best set of compiler optimizations for a program. […] this algorithm takes the shortest tuning time, while achieving comparable or better performance than other algorithms. »
![Page 8: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/8.jpg)
9
ALGORITHMS
![Page 9: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/9.jpg)
10
- Exhaustive Search (ES)*
- Batch Elimination (BE)
- Iterative Elimination (IE)
- Combined Elimination (CE)
- Optimization Space Exploration (OSE)
- Statistical Selection (SS)*

\* Not covered in detail
![Page 10: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/10.jpg)
11
EXHAUSTIVE SEARCH
«
1. Get all 2^n combinations of n options F1, F2, ..., Fn.
2. Measure application execution time of the optimized version compiled under every possible combination.
3. The best version is the one with the least execution time.
»
« For 38 optimizations: It would take up to 2^38 program runs – a million years for a program that runs in two minutes. »
COMPLEXITY: O(2^n)
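A minimal Python sketch of exhaustive search, followed by a rough check of the "million years" estimate. The `measure` function and the `RUNTIMES_MS` table are hypothetical stand-ins for "compile with these options and time the program"; the four runtimes reuse the two-option example from the elimination slides.

```python
from itertools import product

# Hypothetical runtimes in ms, keyed by (F1, F2); 1 = option on, 0 = option off.
RUNTIMES_MS = {
    (0, 0): 320,
    (1, 0): 160,
    (0, 1): 180,
    (1, 1): 200,
}


def measure(combination):
    """Stand-in for compiling under `combination` and timing the program."""
    return RUNTIMES_MS[combination]


def exhaustive_search(n, measure):
    """Try all 2**n on/off combinations and return the fastest one."""
    return min(product((0, 1), repeat=n), key=measure)


best = exhaustive_search(2, measure)
print(best, measure(best))  # (1, 0) 160

# Rough cost for 38 options at two minutes per program run.
runs = 2 ** 38
years = runs * 2 / (60 * 24 * 365)
print(f"{runs} runs, roughly {years:,.0f} years")
```

With only two options the search is trivial; the last lines show why it stops being practical at 38 options.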
![Page 11: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/11.jpg)
13
RELATIVE IMPROVEMENT PERCENTAGE (RIP*)
* Not to be confused with Rest In Peace
$$\mathrm{RIP}_B(F_i = 0) = \frac{T(F_i = 0) - T_B}{T_B} \times 100\%$$
= A measure for the usefulness of an optimization.
B: The baseline; a configuration of optimization options
Fi: An optimization option
T_B: Execution time when compiled under B
T(Fi = 0): Execution time when compiled under B but with Fi off
![Page 12: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/12.jpg)
14
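EXAMPLE
Baseline B: F1 = 1, F2 = 1, F3 = 1
T_B: 80 ms
T(F1 = 0): 100 ms (F1 = 0, F2 = 1, F3 = 1)
$$\mathrm{RIP}_B(F_1 = 0) = \frac{T(F_1 = 0) - T_B}{T_B} \times 100\% = \frac{100\,\mathrm{ms} - 80\,\mathrm{ms}}{80\,\mathrm{ms}} \times 100\% = 25\%$$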
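The RIP definition translates directly into a small helper. This is only a sketch; the function name `rip` and its parameter names are chosen here for illustration.

```python
def rip(t_without_fi, t_baseline):
    """Relative improvement (in percent) of switching Fi off, relative to baseline B.

    A negative value means the program got faster without Fi, so Fi is harmful.
    """
    return 100.0 * (t_without_fi - t_baseline) / t_baseline


# The slide's example: T_B = 80 ms, T(F1 = 0) = 100 ms.
print(rip(100, 80))  # 25.0
```

A positive result such as this 25% says the program is slower without F1, so F1 is worth keeping.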
![Page 13: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/13.jpg)
BATCH ELIMINATION
16
Would be good if the optimizations did not affect each other.
Input: F1, F2, ..., Fn
1. Compile with all options on and execute to get T_B.
2. For each Fi: compile with all options on except Fi and execute to get T(Fi = 0).
3. Compute RIP_B(Fi = 0).
4. RIP_B(Fi = 0) < 0? Yes: Don't use Fi. No: Use Fi.
COMPLEXITY: O(n)
![Page 14: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/14.jpg)
17
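EXAMPLE

| Combination | F1 | F2 | Runtime | RIP_B |
|---|---|---|---|---|
| 1 | OFF | OFF | 320 ms | 60% |
| 2 | ON | OFF | 160 ms | -20% |
| 3 | OFF | ON | 180 ms | -10% |
| 4 (baseline, T_B) | ON | ON | 200 ms | (0%) |

NO! Both F1 and F2 have a negative RIP_B, so BE switches both off and ends up with combination 1 at 320 ms, which is slower than the 200 ms baseline.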
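The example can be replayed with a short Python sketch of Batch Elimination. The `measure` function and the `RUNTIMES_MS` lookup are hypothetical stand-ins for compiling and timing the program; the numbers are the four runtimes from the example.

```python
# Runtimes in ms from the example, keyed by (F1, F2); 1 = on, 0 = off.
RUNTIMES_MS = {(0, 0): 320, (1, 0): 160, (0, 1): 180, (1, 1): 200}


def measure(flags):
    """Stand-in for compiling with `flags` and timing the program."""
    return RUNTIMES_MS[tuple(flags)]


def rip(t_without_fi, t_baseline):
    """Relative improvement percentage of switching one option off."""
    return 100.0 * (t_without_fi - t_baseline) / t_baseline


def batch_elimination(n, measure):
    """Judge every option once against the all-on baseline, then drop all harmful ones."""
    baseline = [1] * n
    t_base = measure(baseline)
    result = list(baseline)
    for i in range(n):
        trial = list(baseline)
        trial[i] = 0  # all-on except Fi
        if rip(measure(trial), t_base) < 0:
            result[i] = 0  # Fi looked harmful in isolation
    return result


chosen = batch_elimination(2, measure)
print(chosen, measure(chosen))  # [0, 0] 320
```

Each option is tested only against the all-on baseline, so the interaction between F1 and F2 goes unnoticed and the result is the slowest combination.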
![Page 15: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/15.jpg)
ITERATIVE ELIMINATION
19
Input: F1, F2, ..., Fn
1. S = {F1, F2, ..., Fn}, B = {F1 = 1, ..., Fn = 1}
2. Compile with B and execute to get T_B.
3. For each Fi in S: compile under B, but with Fi = 0; execute to get T(Fi = 0); compute RIP_B(Fi = 0).
4. Exists Fk: RIP_B(Fk = 0) < 0?
   No: Result in B.
   Yes: Find Fk with minimal RIP_B. Set B.Fk = 0, S = S \ {Fk}, T_B = T(Fk = 0). Continue with step 3.
COMPLEXITY: O(n^2)
« [...] IE achieves better program performance than BE, since it considers the interaction of optimizations. However, when the interactions have only small effects, BE may perform close to IE in a faster way. »
![Page 16: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/16.jpg)
20
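EXAMPLE

First iteration (baseline: combination 4)

| Combination | F1 | F2 | Runtime | RIP_B |
|---|---|---|---|---|
| 1 | OFF | OFF | 320 ms | 60% |
| 2 | ON | OFF | 160 ms | -20% |
| 3 | OFF | ON | 180 ms | -10% |
| 4 (baseline, T_B) | ON | ON | 200 ms | (0%) |

Second iteration (baseline: combination 2)

| Combination | F1 | F2 | Runtime | RIP_B |
|---|---|---|---|---|
| 1 | OFF | OFF | 320 ms | 100% |
| 2 (baseline, T_B) | ON | OFF | 160 ms | (0%) |
| 3 | OFF | ON | 180 ms | |
| 4 | ON | ON | 200 ms | |

YES! IE switches off only F2, the option with the minimal RIP_B, and stops at combination 2 with 160 ms.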
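A Python sketch of Iterative Elimination on the same two-option example. As before, `measure` and `RUNTIMES_MS` are hypothetical stand-ins for compiling and timing the program.

```python
# Runtimes in ms from the example, keyed by (F1, F2); 1 = on, 0 = off.
RUNTIMES_MS = {(0, 0): 320, (1, 0): 160, (0, 1): 180, (1, 1): 200}


def measure(flags):
    """Stand-in for compiling with `flags` and timing the program."""
    return RUNTIMES_MS[tuple(flags)]


def rip(t_without_fi, t_baseline):
    """Relative improvement percentage of switching one option off."""
    return 100.0 * (t_without_fi - t_baseline) / t_baseline


def iterative_elimination(n, measure):
    """Switch off the single most harmful option per iteration, then re-measure."""
    baseline = [1] * n
    remaining = set(range(n))  # S: options still switched on and under test
    t_base = measure(baseline)
    while remaining:
        times = {}
        for i in remaining:
            trial = list(baseline)
            trial[i] = 0
            times[i] = measure(trial)
        # Option with the minimal RIP relative to the current baseline.
        k = min(times, key=lambda i: rip(times[i], t_base))
        if rip(times[k], t_base) >= 0:
            break  # no option with negative RIP is left
        baseline[k] = 0
        remaining.remove(k)
        t_base = times[k]  # T_B = T(Fk = 0)
    return baseline


chosen = iterative_elimination(2, measure)
print(chosen, measure(chosen))  # [1, 0] 160
```

After F2 is switched off, F1 is measured again under the new baseline, where removing it would double the runtime, so it stays on.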
![Page 17: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/17.jpg)
22
COMBINED ELIMINATION
Input: F1, F2, ..., Fn
1. S = {F1, F2, ..., Fn}, B = {F1 = 1, ..., Fn = 1}
2. Compile with B and execute to get T_B.
3. For each Fi in S: compile under B, but with Fi = 0; execute to get T(Fi = 0); compute RIP_B(Fi = 0).
4. Exists Fk: RIP_B(Fk = 0) < 0?
   No: Result in B.
   Yes: Find Fk with minimal RIP_B. Set B.Fk = 0, S = S \ {Fk}, T_B = T(Fk = 0).
5. CE: For all remaining Fj with negative RIP_B, check if the RIP_B is still negative under the changed B. If so, remove Fj directly. Continue with step 3.
COMPLEXITY: O(n^2)
« CE takes the advantages of both BE and IE. When the optimizations interact weakly, CE eliminates the optimizations with negative effects in one iteration, just like BE. Otherwise, CE eliminates them iteratively, like IE. »
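A Python sketch of Combined Elimination as described on this slide. The two timing functions are hypothetical: `measure_interacting` reuses the two-option example from the elimination slides, and `measure_additive` is an invented model in which the options do not interact.

```python
def rip(t_without_fi, t_baseline):
    """Relative improvement percentage of switching one option off."""
    return 100.0 * (t_without_fi - t_baseline) / t_baseline


def combined_elimination(n, measure):
    """IE-style elimination plus a BE-style sweep over the other harmful options."""
    baseline = [1] * n
    remaining = set(range(n))
    t_base = measure(baseline)

    def time_without(i):
        trial = list(baseline)
        trial[i] = 0
        return measure(trial)

    while True:
        times = {i: time_without(i) for i in remaining}
        rips = {i: rip(times[i], t_base) for i in remaining}
        # Harmful options, most harmful (lowest RIP) first.
        harmful = sorted((i for i in remaining if rips[i] < 0), key=rips.get)
        if not harmful:
            return baseline
        first, rest = harmful[0], harmful[1:]
        baseline[first] = 0
        remaining.discard(first)
        t_base = times[first]
        for j in rest:
            t_off = time_without(j)  # re-measured under the changed baseline
            if rip(t_off, t_base) < 0:
                baseline[j] = 0
                remaining.discard(j)
                t_base = t_off


# Strongly interacting options: the two-option example from the slides.
RUNTIMES_MS = {(0, 0): 320, (1, 0): 160, (0, 1): 180, (1, 1): 200}


def measure_interacting(flags):
    return RUNTIMES_MS[tuple(flags)]


# Invented additive model: F1 and F2 cost time, F3 saves time, no interaction.
def measure_additive(flags):
    return 100 + 10 * flags[0] + 5 * flags[1] - 20 * flags[2]


print(combined_elimination(2, measure_interacting))  # [1, 0]
print(combined_elimination(3, measure_additive))     # [0, 0, 1]
```

In the additive case both harmful options are removed within a single pass of the outer loop, as BE would do; in the interacting case the re-check under the changed baseline keeps F1 on, as IE would do.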
![Page 18: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/18.jpg)
23
OPTIMIZATION SPACE EXPLORATION
1. Construct a set Ω which consists of a default optimization combination (Here: All on), and n combinations that each switch a single optimization off.
2. Measure the execution time under each combination in Ω. Keep only the m fastest combinations in Ω.
3. Construct a new Ω set consisting of all unions of two optimization combinations in the old Ω set.
4. Repeat 2 and 3 until no new combinations can be generated or the performance gain becomes insignificant.
5. The fastest version in the final Ω is the result.
COMPLEXITY: O(nm^2) ~ O(n^3)
Idea from S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D. I. August. Compiler optimization-space exploration. In Proceedings of the international symposium on Code generation and optimization, pages 204–215, 2003.
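A rough Python sketch of the five steps listed on this slide. Several details are assumptions made for illustration: a combination is represented by the set of options it switches off, the "union" of two combinations is taken as the union of those sets, already measured combinations are not generated again, and `min_gain_ms` is a hypothetical threshold for "insignificant" gain. The timing function reuses the two-option example from the elimination slides.

```python
from itertools import combinations

# Runtimes in ms from the two-option example, keyed by (F1, F2); 1 = on, 0 = off.
RUNTIMES_MS = {(0, 0): 320, (1, 0): 160, (0, 1): 180, (1, 1): 200}


def measure_off(switched_off):
    """Stand-in for compiling with the given options switched off and timing the run."""
    flags = tuple(0 if i in switched_off else 1 for i in range(2))
    return RUNTIMES_MS[flags]


def optimization_space_exploration(n, measure_off, m=3, min_gain_ms=0.0):
    # Step 1: the default (nothing off) plus n combinations with one option off.
    omega = {frozenset()} | {frozenset([i]) for i in range(n)}
    timings = {}
    best = None
    while omega:
        # Step 2: measure every combination in omega, keep the m fastest.
        for combo in omega:
            timings[combo] = measure_off(combo)
        kept = sorted(omega, key=timings.get)[:m]
        # Step 4: stop when the gain over the previous round is insignificant.
        if best is not None and timings[best] - timings[kept[0]] <= min_gain_ms:
            break
        best = kept[0]
        # Step 3: new omega = unions of pairs, minus combinations already measured.
        omega = {a | b for a, b in combinations(kept, 2)} - set(timings)
    # Step 5: report the fastest combination found.
    return best, timings[best]


best, runtime = optimization_space_exploration(2, measure_off)
print(sorted(best), runtime)  # [1] 160
```

The result is the set of switched-off option indices: index 1 (F2) is off, which corresponds to combination 2 of the example at 160 ms.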
![Page 19: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/19.jpg)
24
STATISTICAL SELECTION

| | F1 | F2 | ... | Fn |
|---|---|---|---|---|
| Combination 1 | 0 | 1 | 0 | 1 |
| Combination 2 | 1 | 0 | 1 | 0 |
| Combination 3 | 1 | 1 | 0 | 0 |
| ... | | | | |
| Combination k | 0 | 0 | 1 | 0 |

COMPLEXITY: O(n^2)
You wouldn’t appreciate an in-depth explanation.
Shown in R. P. J. Pinkers, P. M. W. Knijnenburg, M. Haneda, and H. A. G. Wijshoff. Statistical selection of compiler options. In The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS '04), pages 494–501, Volendam, The Netherlands, October 2004.
![Page 20: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/20.jpg)
25
COMPLEXITY OVERVIEW

| Algorithm | Complexity |
|---|---|
| Exhaustive Search | O(2^n) |
| Optimization Space Exploration | O(nm^2) ~ O(n^3) |
| Statistical Selection | O(n^2) |
| Iterative Elimination | O(n^2) |
| Combined Elimination | O(n^2) |
| Batch Elimination | O(n) |

Ordered from slowest (turtle) to fastest (rabbit).
Turtle: http://upload.wikimedia.org/wikipedia/commons/f/f4/Florida_Box_Turtle_Digon3_re-edited.jpg
Rabbit: http://upload.wikimedia.org/wikipedia/commons/5/59/JumpingRabbit.JPG
![Page 21: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/21.jpg)
26
PERFORMANCE ANALYSIS
![Page 22: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/22.jpg)
27
TESTING ENVIRONMENT
CPUs: Pentium 4, SPARC II
Benchmark: SPEC CPU2000
Compiler: GCC Ver. 3.3.3
Pentium IV: http://www.esaitech.com/objects/catalog/product/image/thb51752.jpg
SPARC II: http://upload.wikimedia.org/wikipedia/commons/1/1c/Sun_UltraSPARCII.jpg
SPEC Logo: http://www.spec.org/images/SPECsmalllogoreg.png
GCC Logo: http://upload.wikimedia.org/wikipedia/commons/a/a9/Gccegg.svg
![Page 23: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/23.jpg)
28
Reference Set
Training Set
Executable icon: http://fromthegut.org/gwen/peachtree/Windows%20XP.pvm/Windows%20Applications/NTVDM.EXE.app/Contents/Resources/AppBigIcon.png
All other illustrations except GCC logo are from Office.com.
![Page 24: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/24.jpg)
29
SPEC CPU2000 INTEGER CODE
- Compression (2x)
- Game Playing: Chess
- Group Theory, Interpreter
- C Programming Language Compiler
- Combinatorial Optimization
- Word Processing
- PERL Programming Language
- Place and Route Simulator
- Object-oriented Database
- FPGA Circuit Placement and Routing
![Page 25: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/25.jpg)
30
TUNING TIME (INT, P4)
![Page 26: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/26.jpg)
31
PERFORMANCE (INT, P4)
![Page 27: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/27.jpg)
32
COMPARISON
![Page 28: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/28.jpg)
33
THE DOWNSIDE
CE: 2.96h
OSE: 4.51h
SS: 11.96h
Effective average tuning time on P4 @ 2.8 GHz (To scale)
![Page 29: Zhelong Pan [1]](https://reader036.vdocuments.site/reader036/viewer/2022062323/56816294550346895dd30934/html5/thumbnails/29.jpg)
34
THE FUTURE
#include <stdio.h>
for(i = 0; i < 10; ++i){
    //...
}
if(!over){
    //...
}
while(true){
    printf("%d", ++j);
    if(j > 2 * i) break;
}
iOS-style on/off switch: http://www.tobypitman.com/wp-content/uploads/2010/06/iphone-checkboxes.png