Lawrence Rauchwerger
Parasol Laboratory, CS, Texas A&M University
http://parasol.tamu.edu/
Speculative Run-Time Parallelization
Outline
• Motivation
• LRPD test, a speculative run-time parallelization framework
• Recursive LRPD test for partially parallel loops
• A run-time test for loops with sparse memory accesses
• Summary and Current work
Data Dependence
Data dependence (DD):
• Data dependence relations are the essential ordering constraints among statements or operations in a program.
• A data dependence occurs when two memory accesses touch the same memory location and at least one of them writes to it.
flow:    X = ..   followed by   .. = X
anti:    .. = X   followed by   X = ..
output:  X = ..   followed by   X = ..

Three basic data dependence relations
Parallel Loop
Can a loop be executed in parallel?
• Test procedure
  – FOR every pair of load/store and store/store operations <L, S> DO
  – IF (L and S could access the same location in different iterations)
  – THEN the LOOP is sequential
• For arrays, the memory accesses are functions of the loop indices. These functions can be linear, non-linear, or an unknown map.

for i = ..
  A[i] = A[i] + B[i]        (a parallel loop)

for i = ..
  A[i+1] = A[i] + B[i]      (a sequential loop)
Parallelism Enabling Transformations
for i = ..
  temp = A[i]
  A[i] = B[i]
  B[i] = temp

for i = ..
  private temp
  temp = A[i]
  A[i] = B[i]
  B[i] = temp

(anti and output dependences on ‘temp’ in the original loop)

• Privatization: give each iteration its own location for ‘temp’
• ‘temp’ is used only as temporary storage within each iteration

Parallel or not?
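The privatized loop above can be sketched in Python (the function name is mine; the sequential `for` stands in for the doall, which is valid precisely because `temp` is private to each iteration):

```python
def swap_privatized(A, B):
    """Privatization sketch: each iteration gets its own private `temp`,
    removing the anti/output dependences on a shared temporary, so the
    iterations could run in any order (a doall)."""
    for i in range(len(A)):   # conceptually a doall
        temp = A[i]           # private per-iteration temporary
        A[i] = B[i]
        B[i] = temp
    return A, B
```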
Reduction Parallelization
A reduction is:
• An associative and commutative operation of the form x = x ⊗ exp
• x does not occur in exp or anywhere else in the loop (with some exceptions)
• Usually extracted by pattern matching

do i = ..
  A = A + B[i]

(flow, anti, and output dependences on A)

• Reduction parallelization:

pA[1:P] = 0
do p = 1, P
  do i = my_i ...
    pA[p] = pA[p] + B[i]
do p = 1, P
  A = A + pA[p]

Parallel or not?
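The transformation above can be sketched in Python (a sketch: the function name, the block partitioning of the index range, and running the "processors" as a sequential loop are my simplifications of the slide's pseudocode):

```python
def parallel_sum_reduction(B, P=4):
    """Reduction parallelization sketch: each of P (conceptual) processors
    accumulates into a private partial sum pA[p]; a final sequential pass
    combines the partials, removing the cross-iteration dependence on A."""
    n = len(B)
    pA = [0] * P                      # privatized copies of the accumulator
    for p in range(P):                # each block would run on its own processor
        for i in range(p * n // P, (p + 1) * n // P):
            pA[p] += B[i]
    A = 0
    for p in range(P):                # sequential cross-processor merge
        A += pA[p]
    return A
```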
Irregular Applications: The most challenging
for i = ..
  A[B[i]] = A[C[i]] + B[i]

Problem: the loop is sequential if any B[i] = C[j], i ≠ j (flow/anti dependence),
or any B[i] = B[j], i ≠ j (output dependence)

• From the compiler’s viewpoint, in irregular programs:
  – Data arrays are accessed via indirections that are usually input dependent.
  – Optimization and parallelization information is not available statically.
• Irregular applications involve problems defined on irregular or adaptive data structures
  – CFD, molecular dynamics, sparse linear algebra.
• Adaptation: matrix pivoting, adaptive mesh refinement, etc.
• More than 50% of scientific programs are irregular [Kennedy’94]
Irregular Programs: an Example
Sparse symmetric MVM (irregular: data accesses via indirections):

DO I = 1, N                 ! row ID
  DO K = row[I], row[I+1]
    J = col[K]              ! column ID
    B[I] += M[K] * X[J]     ! M[K] = M[I,J]
    B[J] += M[K] * X[I]     ! M[K] = M[J,I]

(figure: sparse matrix M times vector X equals B)

Dense MVM (regular):

DO I = 1, N                 ! row ID
  DO J = 1, N               ! column ID
    B[I] += M[I,J] * X[J]
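The sparse symmetric MVM can be sketched in Python (a sketch assuming CSR-style half-open row pointers, whereas the slide's Fortran DO bounds are inclusive, and assuming only strictly off-diagonal entries of one triangle are stored; a stored diagonal entry would be counted twice here):

```python
def sym_spmv(row, col, M, X, n):
    """Sparse symmetric MVM as on the slide: one triangle of M is stored
    (row pointers `row`, column indices `col`, values `M`); each stored
    entry M[K] = M[I,J] contributes to both B[I] and B[J].
    The indirection through col[] makes the access pattern irregular."""
    B = [0.0] * n
    for I in range(n):                       # row ID
        for K in range(row[I], row[I + 1]):  # stored entries of row I
            J = col[K]                       # column ID (input-dependent)
            B[I] += M[K] * X[J]              # M[K] = M[I,J]
            B[J] += M[K] * X[I]              # M[K] = M[J,I]
    return B
```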
Run-Time Parallelization: The Idea
• Perform program analysis and optimization during program execution (run-time)
• Select dynamically between pre-optimized versions of the code
do i = 1, N
  A[i+K] = A[i] + B[i]        (K, N from input;  -N < K < N)

run-time test on K:
  True  →  doall i = 1, N   A[i+K] = A[i] + B[i]
  False →  do i = 1, N      A[i+K] = A[i] + B[i]

[Wolfe 97]
Outline
• Motivation
• LRPD test, a speculative run-time parallelization framework
• Recursive LRPD test for partially parallel loops
• A run-time test for loops with sparse memory accesses
• Summary and Current work
Run-time Parallelization: Irregular Apps.
DO i = ..
  A[W[i]] = A[R[i]] + C[i]

Problem: the loop is not parallel if any R[i] = W[j] or W[i] = W[j], i ≠ j

Solution: instrument the code to
1. Collect the data access pattern (represented by W[i], R[i])
2. Verify whether any data dependence could occur

Two approaches: inspector/executor and speculation.

Inspector/executor:
  do i = ..
    trace W[i], R[i]
  analyze and schedule
  doall i = ..
    A[W[i]] = A[R[i]] ..

Speculation:
  doall i = ..
    trace W[i], R[i]
    A[W[i]] = A[R[i]] ..
  analyze
  if (fail) do i = ..   (re-execute sequentially)
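The inspector's dependence check can be sketched in Python (a sketch: the function name and the dict-based trace are mine, not from the slides):

```python
def inspector_is_doall(W, R):
    """Inspector sketch for `A[W[i]] = A[R[i]] + C[i]`: trace the access
    pattern and report whether the loop can be scheduled as a DOALL.
    The loop is sequential if any W[i] == W[j] or W[i] == R[j], i != j."""
    writes = {}
    for i, w in enumerate(W):
        if w in writes:                       # same location written twice:
            return False                      # output dependence
        writes[w] = i
    for j, r in enumerate(R):
        if r in writes and writes[r] != j:    # cross-iteration flow/anti dep
            return False
    return True                               # safe to run as a doall
```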
Run-time Parallelization: Approaches
Inspector/Executor [Saltz et al 88, 91]:
  Inspector (tracing + scheduling) → Executor → End
  On later instances, if the access pattern has not changed, reuse the schedule.

Speculative Execution [Rauchwerger & Padua 95]:
  Checkpoint → speculative execution + tracing → Test
  Success? Yes: End.  No: roll back + sequential loop.
Overview of Speculative Parallelization
(flow: Source Code → static analysis in Polaris at compile time → run-time transformations →
checkpoint → speculative parallel execution → error detection (data dependence analysis) →
success? Yes: done; No: restore the checkpoint and execute sequentially)

Run-time Optimization
• Postpones analysis and optimization until execution time
  – checkpoint/restoration mechanism
  – error detection method to test the validity of the speculation
• Uses actual run-time values of the program parameters affecting performance
• Selects dynamically between pre-optimized versions of the code
Speculative DOALL Parallelization
Main idea:
• Speculatively execute the loop in parallel
  + record accesses to data under test in shadow arrays
• Afterwards, analyze whether the loop was truly parallel (no actual dependences) by identifying multiple accesses to the same location

(flow: checkpoint → speculative parallel execution + MARK read/write → analysis →
success? Yes: end; No: restore and execute sequentially)

[Rauchwerger & Padua ’94]

Problem:
DO i = ..
  A[W[i]] = A[R[i]] + C[i]

(each processor marks a replicated W/R shadow of the data array A; the analysis phase merges them into a single shadow)
DOALL Test – Marking and Analysis
• Parallel speculative execution
  – Mark read and write operations in per-processor private shadow arrays
    (marking a write clears the read mark).
  – Increment a private write counter (# of write operations).
• Post-speculation analysis
  – Merge the private shadow arrays into global shadow arrays.
  – Count the elements that have been marked as written.
  – If (write shadow ∧ read shadow ≠ 0): an anti or flow dependence exists.
  – If (# marked elements < # write operations): an output dependence exists.
LRPD Test: Main Ideas
• Lazy (value-based) Reduction Privatization DOALL test
• Errors (the speculation fails) when:
  – there is a loop-carried flow dependence,
  – there is a loop-carried anti or output dependence for an array that is not privatizable,
  – i.e., the speculatively applied privatization and reduction parallelization transformations are INVALID.
[Rauchwerger & Padua ’95]
The problem: parallelization of the following loop:

Do i = 1, 5
  z = A(K(i))
  if B(i) then
    A(L(i)) = z + C(i)
  endif
Enddo

B(1:5) = (1,0,1,0,1)
K(1:5) = (1,2,3,4,1)
L(1:5) = (2,2,4,4,2)

• All iterations are executed concurrently
• Unsafe if some A(K(i)) == A(L(j)), i ≠ j

Types of errors (data dependence related):
• Writing a memory location in different iterations
• Writing and reading a memory location in different iterations

LRPD Test: an Example
Parallel speculative execution and marking phase:
• Allocate shadow arrays Aw, Ar, Anp (one set per processor)
• Speculatively privatize A and execute the loop in parallel
• Record accesses to data under test in the shadows

markwrite(i): if this is the first time A(i) is written in the iteration:
  • mark Aw(i)
  • clear Ar(i)
  • increment tw_A (write counter)

markread(i): if A(i) has not already been written in the iteration:
  • mark Ar(i)
  • mark Anp(i) (not privatizable)

Original loop:
do i = 1, 5
S1  z = A[K[i]]
    if (B[i]) then
S2    A[L[i]] = z + C[i]
    endif
enddo

Instrumented loop:
doall i = 1, 5
S1  z = A[K[i]]
    if (B[i]) then
      markread(K[i])
      markwrite(L[i])      (increments tw_A)
S2    A[L[i]] = z + C[i]
    endif
enddoall

LRPD Test: Marking Phase
Post-execution analysis phase: detect errors (dependences) by identifying multiple accesses to the same location.

• Compute tm(A) = sum of marks in Aw across processors (number of distinct elements written)
• If Aw ∧ Ar ≠ 0, the loop was NOT a DOALL
• If tw = tm, the loop was a DOALL
• If Aw ∧ Anp ≠ 0, the loop was NOT a DOALL
• Otherwise privatization was valid and the loop was a DOALL

Shadow array   1 2 3 4   Attempted (tw)  Counted (tm)  Outcome
Aw(1:4)        0 1 0 1   3               2             FAIL
Ar(1:4)        1 0 1 0
Anp(1:4)       1 0 1 0
Aw ∧ Ar        0 0 0 0                                 Pass
Aw ∧ Anp       0 0 0 0                                 Pass

LRPD Test: Analysis Phase
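The marking and analysis phases can be sketched as a single-threaded Python simulation of this example (a sketch: the real test marks per-processor shadows and merges them during analysis; here the shadows are already merged, and, as in the slide's instrumented loop, marking happens only on the guarded path):

```python
def lrpd_test(n, B, K, L):
    """LRPD marking + analysis sketch for the slide's loop:
        do i: z = A(K(i)); if B(i) then A(L(i)) = z + C(i)
    Returns (passed, tw, tm)."""
    Aw = [0] * (n + 1); Ar = [0] * (n + 1); Anp = [0] * (n + 1)
    tw = 0
    for i in range(len(K)):
        if B[i]:                      # marking on the guarded path only
            r, w = K[i], L[i]
            Ar[r] = 1; Anp[r] = 1     # markread: r not yet written this iter
            Aw[w] = 1; tw += 1        # markwrite: first write this iteration
            if w == r:
                Ar[r] = 0             # a write covers a same-iteration read
    tm = sum(Aw)                      # writes to distinct locations
    fail = any(Aw[x] and Ar[x] for x in range(n + 1))          # flow/anti
    fail = fail or any(Aw[x] and Anp[x] for x in range(n + 1)) # not privatizable
    fail = fail or (tw != tm)                                  # output dep
    return (not fail, tw, tm)
```

On the slide's data this reproduces the table: tw = 3 attempted writes, tm = 2 distinct locations written (location 2 is written in iterations 1 and 5), so the test FAILs on an output dependence.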
Outline
• Motivation
• LRPD test, a speculative run-time parallelization framework
• Recursive LRPD test for partially parallel loops
• A run-time test for loops with sparse memory accesses
• Summary and Current work
do i = 1, 8
  z = A[K[i]]
  A[L[i]] = z + C[i]
end do

K[1:8] = [1,2,3,1,44,2,1,1]
L[1:8] = [44,5,5,44,3,5,3,3]

Accesses to A() by iteration:
  A(1):   R at iterations 1, 4, 7, 8
  A(2):   R at iterations 2, 6
  A(3):   R at iteration 3; W at iterations 5, 7, 8
  A(44):  W at iterations 1, 4; R at iteration 5
  A(5):   W at iterations 2, 3, 6

For the LRPD test:
– One data dependence invalidates the entire speculative parallelization.
– The slowdown is proportional to the speculative parallel execution time.
– Partial parallelism is not exploited.

Partially Parallel Loop Example
• Main idea
  – Transform a partially parallel loop into a sequence of fully parallel, block-scheduled loops.
  – Iterations before the first data dependence are correct and committed.
  – Re-apply the LRPD test on the remaining iterations.
• Worst case
  – Sequential time plus testing overhead

[Dang, Yu and Rauchwerger ’02]

The Recursive LRPD
Initialize → Checkpoint → Execute as DOALL → Analyze
  on success: Commit
  on failure: Restore → Reinitialize → Restart on the remaining iterations

(figure: iterations block-scheduled over processors p0–p3; after the 1st stage the committed prefix is dropped, and the 2nd stage block-schedules the remaining iterations)

Algorithm
Example

do i = 1, 8
  B[i] = f(i)
  z = A[K[i]]
  A[L[i]] = z + C[i]
enddo

L[1:8] = [2,2,4,4,2,1,5,5]
K[1:8] = [1,2,3,4,1,2,4,2]

Transformed code:

start = newstart = 1; success = false; end = 8
initialize shadow array; checkpoint B
while (.not. success)
  doall i = newstart, end
    B[i] = f(i)
    z = pA[K[i]]
    pA[L[i]] = z + C[i]
    markread(K[i]); markwrite(L[i])
  end doall
  analyze(success, newstart)
  commit(pA, A, start, newstart-1)
  if (.not. success) then
    restore B[newstart:end]
    reinitialize shadow array
  endif
end while
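The while-loop above can be sketched as a runnable Python simulation (a sketch at iteration granularity and single-threaded; the real test commits processor-sized blocks and runs the doall in parallel). `first_conflict` plays the role of analyze(), and the access arrays are those of the recursive LRPD example without the B[i] = f(i) statement:

```python
def first_conflict(K, L, start, n):
    """Earliest iteration j in [start, n) carrying a dependence with some
    i < j: flow (reads a written location), output (rewrites one), or anti
    (writes a read one). Returns n if the remainder is fully parallel."""
    readers, writers = set(), set()
    for j in range(start, n):
        r, w = K[j], L[j]
        if r in writers or w in writers or w in readers:
            return j
        readers.add(r); writers.add(w)
    return n

def recursive_lrpd(A, K, L, C):
    """Recursive LRPD sketch: commit the dependence-free prefix (whose
    iterations may execute in any order), then re-apply the test to the
    remaining iterations. Returns the number of stages."""
    n, start, stages = len(K), 0, 0
    while start < n:
        stop = first_conflict(K, L, start, n)   # analyze()
        for i in range(start, stop):            # commit the parallel prefix
            A[L[i]] = A[K[i]] + C[i]
        start, stages = stop, stages + 1
    return stages
```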
• Implemented as a run-time pass in Polaris, plus additional hand-inserted code.
  – Privatization with copy-in/copy-out for arrays under test.
  – Replicated buffers for reductions.
  – Backup arrays for checkpointing.

Implementation
do i = 1, 8
  z = A[K[i]]
  A[L[i]] = z + C[i]
end do

K[1:8] = [1,2,3,1,44,2,1,1]
L[1:8] = [44,5,5,44,3,5,3,3]

First stage (block-scheduled: P1 = iters 1-2, P2 = 3-4, P3 = 5-6, P4 = 7-8); detect cross-processor data dependences:
  A(1):   R on P1, P2, P4
  A(2):   R on P1, P3
  A(3):   R on P2; W on P3, P4
  A(44):  W on P1, P2; R on P3
  A(5):   W on P1, P2, P3

Second stage (remaining iterations 5-8, blocks of iters 5-6 and 7-8); fully parallel:
  A(1):   R in block 7-8
  A(2):   R in block 5-6
  A(3):   W in blocks 5-6 and 7-8
  A(44):  R in block 5-6
  A(5):   W in block 5-6

Recursive LRPD Example
• Redistribute the remaining iterations across processors.
• Execution time for each stage then decreases.
• Disadvantages:
  – May uncover new dependences across processors.
  – May incur remote cache misses from data redistribution.

(figure: with redistribution, the iterations remaining after each stage are spread over all of p1–p4; without redistribution, they stay on their original processors)

Work Redistribution
Outline
• Motivation
• LRPD test, a speculative run-time parallelization framework
• Recursive LRPD test for partially parallel loops
• A run-time test for loops with sparse memory accesses
• Summary and Current work
Sparse Memory Accesses

Question: do multiple accesses to the same location exist? There are two ways to log the information needed to answer this.

1. For each data element: which operations accessed it?
   – Complexity: proportional to the number of elements
2. For each memory operation: which element did it access?
   – Complexity: proportional to the number of iterations

(figure: dense access: operations 1-4 touch data elements 1-4; sparse access: operations 1-8 are scattered over a much larger set of elements)

Overhead of the LRPD test (the first way):
• Marking (speculation) phase: proportional to the number of operations.
• Analysis phase: proportional to the number of elements.
⇒ Not efficient for loops with “sparse accesses”.

[Yu and Rauchwerger’00]
Reduce Overhead of Run-Time Test

– Use compacted shadow structures for loops with sparse access patterns (the 2nd way):
  1. Closed form: monotonic access + constant stride, e.g. the triplet (1, 3, 4)
  2. List: monotonic access + variable stride
  3. Hash table: random access
– The run-time library adaptively selects shadow structures among closed form, list, and hash.
– A compile-time technique reduces redundant markings.
Run-Time Test for Loops with Sparse Access
• Speculative execution
  – For every static marking site, mark in a temporary private shadow structure.
  – At the end of each iteration, adaptively aggregate the markings (triplet → list → hash table).
  – Overhead: proportional to the number of distinct array references.
• Analysis phase
  – Compares the aggregated shadow structures pair by pair.
  – May reduce to comparisons of ranges or triplets.
  – Overhead: proportional to the number of dynamic marking sites, a constant fraction of the number of distinct array references.
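An adaptive shadow structure can be sketched in Python (a sketch with names of my own; this simplified version has only two levels, list → hash, whereas the slides add a cheaper closed-form triplet level below the list):

```python
class AdaptiveShadow:
    """Compacted shadow structure sketch: log marked addresses in a sorted
    list while the access pattern stays monotonic, and demote to a hash set
    on the first out-of-order (random) access."""
    def __init__(self):
        self.kind = "list"
        self.addrs = []          # list while monotonic, set after demotion
    def mark(self, a):
        if self.kind == "list":
            if not self.addrs or a >= self.addrs[-1]:
                if not self.addrs or a > self.addrs[-1]:
                    self.addrs.append(a)     # keep sorted, no duplicates
                return
            self.addrs = set(self.addrs)     # random access: fall back to hash
            self.kind = "hash"
        self.addrs.add(a)
    def intersects(self, other):
        """Analysis-phase check: do two shadows share any location?"""
        return bool(set(self.addrs) & set(other.addrs))
```

With sorted lists on both sides, `intersects` could be a linear merge or a range/triplet comparison, which is what keeps the analysis cheap for sparse, monotonic access patterns.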
Combine Marks
Original:
do ...
  if (pred1) then
    A(1+W(i)) = ...
    A(2+W(i)) = ...
  ...
  if (pred1) then
    A(3+W(i)) = ...

With combined marks:
do ...
  if (pred1) then
    Mark(A, W(i), WF)
    A(1+W(i)) = ...
    A(2+W(i)) = ...
  ...
  if (pred1) then
    A(3+W(i)) = ...

• One mark covers multiple references if:
  – Their subscripts differ only by a loop-invariant expression.
  – They have the same access type among RO, RW, WR.
  – They have the same guarding predicates.
• Combining procedure:
  – Partition the subscript expressions.
  – Apply set operations during a recursive traversal of the CDG (control dependence graph).
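The grouping step of the combining procedure can be sketched in Python (a sketch of the partitioning criterion only; the tuple encoding and function name are mine, and the real procedure works on subscript expressions during a recursive CDG traversal):

```python
def combine_marks(refs):
    """Mark-combining sketch: references are (subscript_base, offset,
    access_type, predicate) tuples, e.g. ('W(i)', 1, 'WF', 'pred1').
    References whose subscripts differ only by a loop-invariant offset and
    that share access type and guarding predicate collapse into one group,
    so one Mark() call covers the whole group."""
    groups = {}
    for base, offset, acc, pred in refs:
        groups.setdefault((base, acc, pred), []).append(offset)
    return {key: sorted(offs) for key, offs in groups.items()}
```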
Outline
• Motivation
• LRPD test, a speculative run-time parallelization framework
• Recursive LRPD test for partially parallel loops
• A run-time test for loops with sparse memory accesses
• Summary and Current work
Effect of Speculative Parallelization
(figure: speedups on 2, 4, 8, and 16 processors for TRACK, SPICE, FMA3D and MDG)

Program   Techniques             Coverage   Suite
TRACK     R-LRPD                 98%        Perfect
SPICE     Sparse Test / R-LRPD   89%        SPEC’92
FMA3D     R-LRPD                 71%        SPEC’00
MDG       LRPD                   99%        Perfect
Speculative Run-time Parallelization: Summary
• Run-time techniques apply program analysis and optimization transformations during program execution.
• Speculative run-time parallelization techniques (LRPD test, etc.) collect memory access history while executing a loop in parallel.
• Recursive LRPD test can speculatively parallelize any loop.
• The overhead of run-time speculation can be further reduced by adaptively selecting among different shadow data structures.
The New Computing Challenge
• Today’s systems: general purpose, heterogeneous
  – Poor portability, low efficiency
  – Need automatic system-level software support
GAUSSIAN Quantum chemistry system
CHARMM Molecular dynamics of organic systems
SPICE Circuit simulation
ASCI Multi-physics simulations
• The Challenge: Easy to Use & High Performance
• Today’s scientific applications: bio, multi-physics, etc.
  – Time consuming, with dynamic features and irregular data structures
  – Need automatic optimization techniques to shorten execution
Today: System-Centric Computing

• Compilers are conservative (static)
• OS services are generic
• Architecture is generic
⇒ No global optimization (in the interest of dynamic applications)

(figure: the Application (algorithm) goes through development, analysis and optimization, is compiled statically, and executes on a generic System (OS & Arch) with the Input Data; the Application / Compiler / OS / HW stack is system-centric)
Approach: Application-Centric Computing (SmartApps)

(figure: the Application plus Input Data feed a run-time system for execution, analysis and optimization, built from a run-time compiler, a modular OS, and a reconfigurable architecture)

SmartApp: application-controlled, instance-specific optimization
  Compiler + OS + Architecture + Data + Feedback
SmartApps System Architecture

• The application is compiled by a parallelizing compiler augmented with run-time techniques, producing a configurable executable plus compiler-internal information.
• Get run-time information: sample input, system state, etc.
• Generate an optimal application and system configuration.
• Execute the application, continuously monitoring performance and adapting as necessary (adaptive software):
  – Small adaptation (tuning): run-time tuning by the smart-application run-time system, with no recompile/reconfigure.
  – Large adaptation (failure, phase change): recompile the application and/or reconfigure the system.
Related Publications
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization, Lawrence Rauchwerger and David Padua, PLDI’95
Parallelizing While loops for Multiprocessor Systems, Lawrence Rauchwerger and David Padua, IPPS’95
Run-time Methods for Parallelizing Partially Parallel Loops, Lawrence Rauchwerger, Nancy Amato and David Padua, ICS’95
SmartApps: An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations, F. Dang, M. J. Garzaran, M. Prvulovic, Y. Zhang, A. Jula, H. Yu, N. Amato, L. Rauchwerger and J. Torrellas, NSFNGS, 2002
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops, F. Dang, H. Yu and L. Rauchwerger, IPDPS’02
Hybrid Analysis: Static & Dynamic Memory Reference Analysis, S. Rus, L. Rauchwerger and J. Hoeflinger, ICS’02
Techniques for Reducing the Overhead of Run-time Parallelization, H. Yu and L. Rauchwerger, CC’00
Adaptive Reduction Parallelization Techniques, H. Yu and L. Rauchwerger, ICS’00
http://parasol.tamu.edu/