Carnegie Mellon
Compiler Optimization of Scalar Value Communication Between Speculative Threads
Antonia Zhai, Christopher B. Colohan,
J. Gregory Steffan and Todd C. Mowry
School of Computer Science
Carnegie Mellon University
Motivation
Industry is delivering multithreaded processors
Improving throughput on multithreaded processors is straightforward
How can we use multithreaded processors to improve the performance of a single application?
We need parallel programs
[Figure: two processors (Proc) sharing a cache.]
IBM Power4 processor:
  • 2 processor cores per die
  • 4 dies per module
  • 8 64-bit processors per unit
Automatic Parallelization
• Finding independent threads in integer programs is limited by:
  • Complex control flow
  • Ambiguous data dependences
  • Runtime inputs
• More fundamentally:
  • Parallelization is determined at compile time
Thread-Level Speculation: detecting data dependences at runtime
Thread-Level Speculation (TLS)
[Figure: timeline of Thread1, Thread2 and Thread3 (T1, T2, T3) running under TLS, with labels Load, Store and Retry.]
How do we communicate values between threads?
Speculation
while(1) {
  … = *q;
  . . .
  *p = …;
}
[Figure: threads Ti, Ti+1, Ti+2 and Ti+3 each execute "… = *q" and "*p = …"; one thread is marked Retry.]
Speculation is efficient when the dependence occurs infrequently
Synchronization
while(1) {
  … = a;
  a = …;
}
[Figure: thread Ti executes "a = …" followed by signal(a); thread Ti+1 stalls at wait(a) before executing "… = a".]
Synchronization is efficient when the dependence occurs frequently
Critical Forwarding Path in TLS
while(1) {
  wait(a);
  … = a;
  a = …;
  signal(a);
}
[Figure: the code from wait(a) to signal(a) is labeled as the critical forwarding path.]
Cost of Synchronization
The processors spend 43% of total execution time on synchronization
[Figure: bar chart of normalized execution time (0 to 100), split into sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE.]
Outline
Compiler optimization
  • Optimization opportunity
  • Conservative instruction scheduling
  • Aggressive instruction scheduling
Performance
Conclusions
Reducing the Critical Forwarding Path
[Figure: two execution-time timelines side by side, one with a long critical path and one with a short critical path.]
A shorter critical forwarding path means less execution time
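The relationship between the critical forwarding path and execution time can be illustrated with a toy timing model. This is a sketch under assumptions that are not on the slides: every thread gets its own processor, all threads start at time 0, forwarding a value costs nothing, and each thread is split into `pre` cycles before the wait, `crit` cycles between wait and signal, and `post` cycles after the signal. The function name `tls_exec_time` is made up for this illustration.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not taken from the slides): each thread runs
 * `pre` cycles before wait(), `crit` cycles between wait() and signal(),
 * and `post` cycles after signal(). Threads all start at time 0.
 */
long tls_exec_time(int nthreads, long pre, long crit, long post)
{
    long prev_signal = 0;  /* time at which the previous thread signaled */
    long finish = 0;       /* finish time of the most recent thread */

    assert(nthreads > 0 && pre >= 0 && crit >= 0 && post >= 0);
    for (int i = 0; i < nthreads; i++) {
        /* The first thread never stalls; later threads wait for the signal. */
        long wait_done = (i == 0 || pre > prev_signal) ? pre : prev_signal;

        prev_signal = wait_done + crit;
        finish = prev_signal + post;
    }
    return finish;
}
```

With four threads of 10 cycles each, `tls_exec_time(4, 1, 8, 1)` gives 34 cycles, while `tls_exec_time(4, 4, 2, 4)` gives 16: the thread length is the same, and only the share spent between wait and signal differs.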
A Simplified Example from GCC
do {
  counter = 0;
  if(p->jmp)
    p = p->jmp;
  if(!p->real) {
    p = p->next;
    continue;
  }
  q = p;
  do {
    counter++;
    q = q->next;
  } while(q);
  p = p->next;
} while(p);
[Figure: control flow graph (CFG) of the loop, with nodes: start; counter=0, p->jmp?; p=p->jmp; p->real?; p=p->next; q=p; counter++, q=q->next, q?; p=p->next; end. Beside it, a linked list whose nodes have jmp, next and real fields, with p pointing at one node.]
Insert Wait Before First Use of P
[Figure: the CFG of the example loop, shown first without and then with wait(p) inserted in the first node, after counter=0 and before the p->jmp? test.]
Insert Signal After Last Definition of P
[Figure: the CFG of the example loop, shown first without and then with signal(p) inserted after each of the two p=p->next assignments.]
Earlier Placement for Signals
[Figure: the CFG of the example loop with wait(p) in the first node and signal(p) after each p=p->next.]
How can we systematically find these insertion points?
Outline
Compiler optimization
  • Optimization opportunity
  • Conservative instruction scheduling
    • How to compute the forwarding value
    • Where to compute the forwarding value
  • Aggressive instruction scheduling
Performance
Conclusions
A Dataflow Algorithm
Given: a control flow graph (with entry and exit nodes)
For each node in the graph:
  • Can we compute the forwarding value at this node?
  • If so, how do we compute the forwarding value?
  • Can we forward the value at an earlier node?
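One way to picture these per-node questions is a small backward dataflow pass over the example loop. The sketch below is an illustration under assumptions, not the authors' implementation: the node numbering, the string encoding ('n' for p=p->next, 'j' for p=p->jmp), and the names `cfg`, `in_val` and `analyze` are all invented here. Each node's value lists the assignments to p that would have to run before signal p if the signal were placed at the top of that node; a node whose successors disagree is marked as blocked.

```c
#include <assert.h>
#include <string.h>

#define NNODES  8
#define MAXLEN  16
#define TOP     "?"   /* not reached yet */
#define BLOCKED "X"   /* forwarding value cannot be computed here */

struct cfg_node {
    char def;      /* 'n': p=p->next, 'j': p=p->jmp, 0: no definition of p */
    int  succ[2];  /* successor indices, -1 when unused */
};

/* Hand-written CFG of the example loop (the numbering is an assumption). */
static const struct cfg_node cfg[NNODES] = {
    { 0,   { 1,  2} },  /* 0: counter=0; p->jmp?       */
    { 'j', { 2, -1} },  /* 1: p=p->jmp                 */
    { 0,   { 3,  4} },  /* 2: p->real?                 */
    { 'n', { 7, -1} },  /* 3: p=p->next                */
    { 0,   { 5, -1} },  /* 4: q=p                      */
    { 0,   { 5,  6} },  /* 5: counter++; q=q->next; q? */
    { 'n', { 7, -1} },  /* 6: p=p->next                */
    { 0,   {-1, -1} },  /* 7: end                      */
};

char in_val[NNODES][MAXLEN];

static int is_special(const char *v)
{
    return strcmp(v, TOP) == 0 || strcmp(v, BLOCKED) == 0;
}

void analyze(void)
{
    int changed = 1;

    for (int i = 0; i < NNODES; i++)
        strcpy(in_val[i], TOP);
    strcpy(in_val[NNODES - 1], "");      /* at end, p itself is forwarded */

    while (changed) {                    /* iterate to a fixed point */
        changed = 0;
        for (int i = NNODES - 2; i >= 0; i--) {
            char out[MAXLEN], next[MAXLEN];

            /* Meet over successors: equal values survive, others block. */
            strcpy(out, TOP);
            for (int s = 0; s < 2; s++) {
                int k = cfg[i].succ[s];
                if (k < 0 || strcmp(in_val[k], TOP) == 0)
                    continue;
                if (strcmp(out, TOP) == 0)
                    strcpy(out, in_val[k]);
                else if (strcmp(out, in_val[k]) != 0)
                    strcpy(out, BLOCKED);
            }

            /* Transfer: a definition of p is prepended to the sequence. */
            if (is_special(out) || cfg[i].def == 0) {
                strcpy(next, out);
            } else if (strlen(out) + 2 > MAXLEN) {
                strcpy(next, BLOCKED);   /* give up on very long chains */
            } else {
                next[0] = cfg[i].def;
                strcpy(next + 1, out);
            }

            if (strcmp(next, in_val[i]) != 0) {
                strcpy(in_val[i], next);
                changed = 1;
            }
        }
    }
}
```

After `analyze()`, `in_val[2]` is "n" (forward p->next at the p->real? test), `in_val[1]` is "jn" (apply p=p->jmp, then p=p->next), and `in_val[0]` is "X" because the two paths out of the first node need different computations.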
Moving Instructions Across Basic Blocks
[Figure: three panels, (A), (B) and (C), each showing signal p moving from the end node up across a basic-block boundary; the blocks shown contain q = p and p=p->next.]
Example (1)-(7)
[Figure sequence: the CFG of the example loop. In Example (1), signal p sits at the end node. In each following step it is propagated backward to another node, and the annotations accumulate; each annotation pairs signal p with the p=p->next assignment that must precede it at that point.]
Nodes with Multiple Successors
[Figure: a node with multiple successors; the node and each of its successors are annotated with signal p and p=p->next.]
Example (7)-(10)
[Figure sequence: the CFG of the example loop, with signal p propagated further backward. From Example (9) on, one annotation also carries p=p->jmp.]
Nodes with Multiple Successors
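[Figure: a node with multiple successors whose annotations differ: one is signal p with p=p->next, the other is signal p with p=p->next and p=p->jmp.]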
Example (10)-(11)
[Figure sequence: the fully annotated CFG of the example loop, with signal p, p=p->next and p=p->jmp annotations.]
Finding the Earliest Insertion Point
Find at each node in the CFG:
  • Can we compute the forwarding value at this node?
  • If so, how do we compute the forwarding value?
  • Can we forward the value at an earlier node?
Earliest analysis:
  • A node is earliest if, on some execution path, no earlier node can compute the forwarding value
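The earliest test can be sketched as a check on each node's predecessors. The code below is a simplified illustration, not the authors' algorithm: it assumes a per-node computability table already exists (here filled in by hand for the example loop, with only the first node unable to compute the value), and it assumes that once a node cannot compute the value, none of its ancestors can either, so looking at immediate predecessors is enough. The node numbering and the names `computable`, `pred` and `is_earliest` are hypothetical.

```c
#include <assert.h>

#define NNODES 8
#define MAXPRED 3

/* Hand-filled assumption: 1 if the forwarding value can be computed at the node. */
static const int computable[NNODES] = {0, 1, 1, 1, 1, 1, 1, 1};

/* Predecessors of each node in the example CFG, -1 when unused. */
static const int pred[NNODES][MAXPRED] = {
    {-1, -1, -1},  /* 0: counter=0; p->jmp? (entry) */
    { 0, -1, -1},  /* 1: p=p->jmp                   */
    { 0,  1, -1},  /* 2: p->real?                   */
    { 2, -1, -1},  /* 3: p=p->next                  */
    { 2, -1, -1},  /* 4: q=p                        */
    { 4,  5, -1},  /* 5: counter++; q=q->next; q?   */
    { 5, -1, -1},  /* 6: p=p->next                  */
    { 3,  6, -1},  /* 7: end                        */
};

/* A node is earliest if it can compute the value and, along at least one
 * incoming path, the node before it cannot (or there is no node before it). */
int is_earliest(int n)
{
    int has_pred = 0;

    assert(n >= 0 && n < NNODES);
    if (!computable[n])
        return 0;
    for (int i = 0; i < MAXPRED; i++) {
        int p = pred[n][i];
        if (p < 0)
            continue;
        has_pred = 1;
        if (!computable[p])
            return 1;
    }
    return !has_pred;
}
```

Under these assumptions, nodes 1 (p=p->jmp) and 2 (p->real?) are reported as earliest, because both can be reached directly from the first node, where the value is not yet computable.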
The Earliest Insertion Point (1)-(4)
[Figure sequence: the fully annotated CFG of the example loop, on which the earliest insertion points for signal p are identified step by step.]
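The Resulting Graph
[Figure: the resulting CFG, with nodes: start; counter=0, wait(p), p->jmp?; on the taken branch, p=p->jmp, p1=p->next, signal(p1); on the other branch, a new block with p1=p->next, signal(p1); p->real?; p=p1; q=p; counter++, q=q->next, q?; p=p1; end.]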
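Read back into source form, the resulting graph corresponds roughly to the second function below. This is a sketch for comparison only: `struct node`, `wait_p`, `signal_p`, `signals_sent` and both function names are hypothetical, the TLS primitives are replaced by stubs that only count signals, and the `total` accumulator is added so the two versions can be checked against each other when run sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node with the fields used on the slides. */
struct node {
    struct node *jmp;
    struct node *next;
    int real;
};

/* Stubs standing in for the TLS primitives (assumptions, not a real API). */
static int signals_sent = 0;
static void wait_p(void) { /* a real thread would stall here */ }
static void signal_p(struct node *value) { (void)value; signals_sent++; }

/* The loop as written on the slide; returns the sum of all counter values. */
long walk_original(struct node *p)
{
    long total = 0;

    assert(p != NULL);
    do {
        int counter = 0;
        if (p->jmp)
            p = p->jmp;
        if (!p->real) {
            p = p->next;
            continue;
        }
        struct node *q = p;
        do {
            counter++;
            q = q->next;
        } while (q);
        total += counter;
        p = p->next;
    } while (p);
    return total;
}

/* The scheduled loop: p1 is computed and signaled before the inner loop. */
long walk_scheduled(struct node *p)
{
    long total = 0;

    assert(p != NULL);
    do {
        int counter = 0;
        struct node *p1;

        wait_p();
        if (p->jmp)
            p = p->jmp;
        p1 = p->next;        /* next iteration's p is known here */
        signal_p(p1);
        if (!p->real) {
            p = p1;
            continue;
        }
        struct node *q = p;
        do {
            counter++;
            q = q->next;
        } while (q);
        total += counter;
        p = p1;
    } while (p);
    return total;
}
```

Both functions visit the same nodes and return the same total; the difference is that the scheduled version issues its signal before the inner loop instead of after it, which is what shortens the critical forwarding path.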
Outline
Compiler optimization
  • Optimization opportunity
  • Conservative instruction scheduling
  • Aggressive instruction scheduling
    • Speculate on control flow
    • Speculate on data dependence
Performance
Conclusions
We Cannot Compute the Forwarded Value When…
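[Figure: two cases. Control dependence: a branch where one path executes p=p->jmp before signal p and p=p->next. Ambiguous data dependence: a call update(p) ahead of signal p and p=p->next.]
Both cases are handled with speculation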
Speculating on Control Dependences
[Figure: p=p->next and signal(p) are hoisted above the branch. The path that executes p=p->jmp is marked violate(p) and then recomputes p=p->next and signal(p).]
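The control-speculation idea can be sketched as a function that forwards a guess first and corrects it when the guess turns out wrong. This is a sequential illustration under assumptions: `struct node`, `struct forward_log` and `forward_speculatively` are hypothetical names, and violate(p) and signal(p) are modeled only as fields in a returned record rather than as real thread operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node with the fields used on the slides. */
struct node {
    struct node *jmp;
    struct node *next;
    int real;
};

/* Hypothetical record of what one thread forwarded to its successor. */
struct forward_log {
    struct node *first_value;  /* value signaled speculatively */
    struct node *final_value;  /* value the successor must end up with */
    int violated;              /* nonzero if the speculation failed */
};

struct forward_log forward_speculatively(const struct node *p)
{
    struct forward_log log;

    assert(p != NULL);

    /* Speculate that the p->jmp branch is not taken:
     * p = p->next; signal(p) */
    log.first_value = p->next;
    log.final_value = log.first_value;
    log.violated = 0;

    if (p->jmp) {
        /* Misprediction: violate(p), then
         * p = p->jmp; p = p->next; signal(p) */
        log.violated = 1;
        log.final_value = p->jmp->next;
    }
    return log;
}
```

When `p->jmp` is null the first signal is already correct and nothing is redone; otherwise the record shows a violation and the corrected value, which is the cost this kind of speculation pays on a wrong guess.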
Speculating on Data Dependence
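[Figure: p=p->next and signal(p) are moved above update(p). The hoisted code is written as p=load(addr); signal(p); and is followed by store1, store2 and store3, which may alias addr.]
This requires hardware support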
Hardware Support
p = load(addr); mark_load(addr)
store1
store2
store3
unmark_load(addr)
[Figure: the same sequence with addr = 0xADE8: mark_load(0xADE8) records the address, store2(0x8438) and store3(0x88E8) follow, and unmark_load(0xADE8) ends the marked region.]
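A software model helps to see what mark_load and unmark_load would need to track. The slides do not spell out the hardware's behavior on a match, so this sketch assumes that a store to the marked address between the two calls flags a violation. The names mark_load and unmark_load come from the slide; `struct spec_load_tracker`, `observe_store` and the function signatures are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread state behind mark_load/unmark_load. */
struct spec_load_tracker {
    uintptr_t marked_addr;  /* address read by the hoisted load */
    int       active;       /* nonzero between mark_load and unmark_load */
    int       violated;     /* set when a store hits marked_addr */
};

/* Start watching the address of the speculatively hoisted load. */
void mark_load(struct spec_load_tracker *t, uintptr_t addr)
{
    assert(t != 0);
    t->marked_addr = addr;
    t->active = 1;
    t->violated = 0;
}

/* Called for every store the thread performs while the mark is active. */
void observe_store(struct spec_load_tracker *t, uintptr_t addr)
{
    assert(t != 0);
    if (t->active && addr == t->marked_addr)
        t->violated = 1;
}

/* Stop watching; returns nonzero if the forwarded value is stale. */
int unmark_load(struct spec_load_tracker *t)
{
    assert(t != 0);
    t->active = 0;
    return t->violated;
}
```

With the addresses from the slide, marking 0xADE8 and then observing stores to 0x8438 and 0x88E8 reports no violation, while a store to 0xADE8 inside the marked region would.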
Outline
Compiler optimization
  • Optimization opportunity
  • Conservative instruction scheduling
  • Aggressive instruction scheduling
Performance
  • Conservative instruction scheduling
  • Aggressive instruction scheduling
  • Hardware optimization for scalar value communication
  • Hardware optimization for all value communication
Conclusions
Conservative Instruction Scheduling
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. U = no instruction scheduling; A = conservative instruction scheduling.]
Conservative instruction scheduling improves performance by 15%
Optimizing Induction Variable Only
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. U = no instruction scheduling; I = induction variable scheduling only; A = conservative instruction scheduling.]
Induction variable scheduling alone is responsible for a 10% performance improvement
Benefits from Global Analysis
Multiscalar instruction scheduling [Vijaykumar, Thesis’98]:
  • Uses local analysis to schedule instructions across basic blocks
  • Does not allow scheduling of instructions across inner loops
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc and go. M = Multiscalar scheduling; A = conservative scheduling.]
Aggressively Scheduling Instructions Across Control Dependences
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. A = conservative instruction scheduling; C = aggressive instruction scheduling (control).]
Speculating on control dependences alone gives no performance improvement over conservative scheduling
Aggressively Scheduling Instructions Across Control and Data Dependence
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. A = conservative instruction scheduling; C = aggressive instruction scheduling (control); D = aggressive instruction scheduling (control + data).]
Speculating on both control and data dependences improves performance by 9% over conservative scheduling
Hardware Optimization Over Non-Optimized Code
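[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. U = no instruction scheduling; E = no instruction scheduling + hardware optimization.]
Hardware optimization improves the performance of non-optimized code by 4%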
Hardware Optimization + Conservative Instruction Scheduling
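[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. A = conservative instruction scheduling; F = conservative instruction scheduling + hardware optimization.]
Adding hardware optimization to conservative scheduling improves performance by 2%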
Hardware Optimization + Aggressive Instruction Scheduling
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr and AVERAGE. D = aggressive instruction scheduling; O = aggressive instruction scheduling + hardware optimization.]
Adding hardware optimization to aggressive scheduling degrades performance slightly
Hardware optimization is less important with compiler optimization
Hardware Optimization for Communicating Both Scalar and Memory Values
Reduces cost of violation by 27.6%, translating into 4.7% performance improvement
[Figure: bar chart of normalized execution time (0 to 100), split into busy, fail, sync and other, for gcc, go, mcf, parser, perlbmk, twolf, vpr, compress, crafty, gap, gzip, ijpeg, m88k and vortex.]
Impact of Hardware and Software Optimizations
6 benchmarks are improved by 6.2-28.5%, 4 others by 2.7-3.6%
[Figure: bar chart of normalized execution time (0 to 100) for gcc, go, mcf, parser, perlbmk, twolf, vpr, compress, crafty, gap, gzip, ijpeg, m88k and vortex.]
Conclusions
The critical forwarding path is an important bottleneck in TLS
Loop induction variables serialize parallel threads
  • Can be eliminated with our instruction scheduling algorithm
Non-inductive scalars can benefit from conservative instruction scheduling
Aggressive instruction scheduling should be applied selectively
  • Speculating on control dependence alone is not very effective
  • Speculating on both control and data dependences can reduce synchronization significantly
  • GCC is the biggest beneficiary
Hardware optimization is less important as the compiler schedules instructions more aggressively
The critical forwarding path is best addressed by the compiler