Post on 15-Jan-2016
Dynamic Scheduling and Dynamic Percolation
Elkin Garcia, CAPSL – UD
Based on a presentation made by Rishi Khan, ET International
Motivation
• Static scheduling is not able to achieve maximum performance on a many-core architecture (C64)
• Even for regular applications like matrix multiplication
Issues of Static Scheduling
• Blocks are not necessarily multiples of the Optimal Tile Size (OTS)
• Extra overhead for processing non-optimal sized tiles.
• It is worse when many processors share a small, fixed amount of on-chip memory
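To make the tile-size overhead concrete, the sketch below counts full and partial tiles when a block is cut into square tiles. The 6x6 tile size and the 192x192 matrix size appear elsewhere in this deck; the 40x40 block and the helper name `count_tiles` are illustrative assumptions, not values from the slides.

```python
def count_tiles(rows, cols, tile=6):
    """Count (full, partial) tiles when a rows x cols block is cut into tile x tile pieces."""
    full_r, rem_r = divmod(rows, tile)
    full_c, rem_c = divmod(cols, tile)
    full = full_r * full_c
    # Partial tiles: leftover strip along the bottom, along the right, and the corner.
    partial = (full_c if rem_r else 0) + (full_r if rem_c else 0) + (1 if rem_r and rem_c else 0)
    return full, partial

print(count_tiles(192, 192))  # (1024, 0): 192 is a multiple of 6
print(count_tiles(40, 40))    # (36, 13): a hypothetical static block that is not
```

In the second case, 13 of the 49 tiles are smaller than the optimal size, so the inner kernel tuned for 6x6 tiles cannot be used for them directly.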
Issues of Static Scheduling (2)
• SS does not consider stalls due to arbitration of shared resources.
• SS assumes that TUs doing the same amount of work will complete at the same time.
• Many-core architectures have plenty of shared resources:
– FPUs, crossbar, memory, and I-Caches that can produce unexpected stalls.
Dynamic Scheduling (DS)
Balances the load optimally in the presence of shared resources, with higher efficiency.
- Matrix C is partitioned only into tiles.
- Atomic operations are used to keep overhead low.
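A minimal sketch of the idea, assuming each thread unit claims the next tile of C from a shared counter. Python has no hardware fetch-and-add, so a lock stands in for the atomic operation here; the names `TileCounter`, `worker`, and `run`, and the worker count of 8, are hypothetical.

```python
import threading

class TileCounter:
    """Shared tile index; the lock stands in for a hardware atomic fetch-and-add."""
    def __init__(self, total):
        self.next_tile = 0
        self.total = total
        self.lock = threading.Lock()

    def fetch(self):
        # Return the next unclaimed tile index, or None when all tiles are taken.
        with self.lock:
            if self.next_tile >= self.total:
                return None
            tile = self.next_tile
            self.next_tile += 1
            return tile

def worker(counter, claims):
    # Keep claiming tiles until none remain; faster workers simply claim more.
    while (tile := counter.fetch()) is not None:
        claims[tile] += 1  # placeholder for computing one tile of C

def run(num_tiles=1024, num_workers=8):
    counter = TileCounter(num_tiles)
    claims = [0] * num_tiles
    threads = [threading.Thread(target=worker, args=(counter, claims))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claims

print(all(c == 1 for c in run()))  # True: every tile was claimed exactly once
```

Because no tile is assigned ahead of time, a thread that stalls on a shared resource just ends up processing fewer tiles, rather than holding up the others.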
[Figure: matrix multiplication, A × B = C]
Results on Cyclops64
Dynamic Percolation (DP)
• Assign tasks (codelets) to TUs at runtime with low overhead using a lock-free queue:
– Computation tasks: compute optimal tiles of 6x6.
– Data movement tasks: move inputs and outputs between SRAM and DRAM using double buffering.
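The double-buffering part can be sketched as follows, assuming one background mover that copies the next block while the current one is being computed. This is only an illustration of the overlap: it uses a Python thread pool rather than a lock-free queue, it does not model writing outputs back, and `load_tile`, `compute`, and `double_buffered` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def load_tile(dram, k):
    """Data movement task: copy block k from 'DRAM' (a list of lists) into a fresh 'SRAM' buffer."""
    return list(dram[k])

def compute(buf):
    """Computation task: placeholder for the tile computations on the buffer."""
    return sum(buf)

def double_buffered(dram):
    results = []
    with ThreadPoolExecutor(max_workers=1) as mover:
        pending = mover.submit(load_tile, dram, 0)       # fill the first buffer
        for k in range(len(dram)):
            current = pending.result()                   # wait until block k has arrived
            if k + 1 < len(dram):
                # Start copying block k+1 into the other buffer.
                pending = mover.submit(load_tile, dram, k + 1)
            results.append(compute(current))             # overlaps with the copy in flight
    return results

print(double_buffered([[1, 2], [3, 4], [5, 6]]))  # [3, 7, 11]
```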
Codelets
• A non-preemptive piece of code that can run to completion once its dependencies/events are met.
• Dependencies/Events:
– Data dependencies
– Resource constraints (threads/bandwidth)
– Desired behavior (power)
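One common way to express this is a dependency counter per codelet: each satisfied event decrements it, and the codelet becomes ready at zero. The sketch below is a single-threaded illustration under that assumption; the class `Codelet`, the function `run_ready`, and the two example events are hypothetical, not taken from the slides.

```python
class Codelet:
    """A task that becomes ready once all of its events have been signaled."""
    def __init__(self, name, num_deps, action):
        self.name = name
        self.pending = num_deps   # events still outstanding
        self.action = action

    def signal(self, ready_queue):
        # Called once per satisfied dependency/event.
        self.pending -= 1
        if self.pending == 0:
            ready_queue.append(self)

def run_ready(ready_queue, log):
    # Non-preemptive: each codelet runs to completion before the next one starts.
    while ready_queue:
        codelet = ready_queue.pop(0)
        codelet.action()
        log.append(codelet.name)

log, ready = [], []
comp = Codelet("Comp", 2, lambda: None)  # assumed events: data present, thread unit free
comp.signal(ready)       # data dependency met
run_ready(ready, log)    # nothing runs yet, one event is still outstanding
comp.signal(ready)       # resource constraint met
run_ready(ready, log)
print(log)  # ['Comp']
```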
Matrix Multiply in SRAM using Petri nets
[Figure: Petri net with elements labeled INIT, Comp1, and Clean; labels 1024, T, and Low. Matrix size: 192x192, A × B = C. Legend: place, transition, tokens.]
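For readers unfamiliar with Petri nets, here is a small token-firing sketch. The net structure is an assumption reconstructed from the figure labels (INIT produces 1024 work tokens, Comp1 consumes one token per tile, Clean fires once all tiles are finished); the place names and the function `simulate` are hypothetical. A 192x192 matrix cut into 6x6 tiles gives 32 × 32 = 1024 tiles, which appears to match the 1024 in the figure.

```python
def enabled(marking, t):
    # A transition is enabled when every input place holds enough tokens.
    return all(marking[p] >= n for p, n in t["in"].items())

def fire(marking, t):
    # Firing consumes tokens from input places and produces tokens in output places.
    for p, n in t["in"].items():
        marking[p] -= n
    for p, n in t["out"].items():
        marking[p] += n

def simulate(tiles=1024):
    transitions = {
        "INIT":  {"in": {"start": 1},        "out": {"todo": tiles}},
        "Comp1": {"in": {"todo": 1},         "out": {"finished": 1}},
        "Clean": {"in": {"finished": tiles}, "out": {"done": 1}},
    }
    marking = {"start": 1, "todo": 0, "finished": 0, "done": 0}
    fired = {name: 0 for name in transitions}
    progress = True
    while progress:                      # repeat until no transition is enabled
        progress = False
        for name, t in transitions.items():
            if enabled(marking, t):
                fire(marking, t)
                fired[name] += 1
                progress = True
    return fired, marking

fired, marking = simulate()
print(fired, marking["done"])  # {'INIT': 1, 'Comp1': 1024, 'Clean': 1} 1
```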
Matrix Multiply in DRAM
[Figure: Petri net with two groups of elements, labeled INIT, Comp1, Clean and INIT, Copy1, Clean; labels 1024, Done, 8, T, F, and Low.]
Double Buffer Computation Example
Rules:
• Always take the highest-priority task first
• If two tasks have the same priority, take the task that was enabled first
• Otherwise, choose arbitrarily
[Figure: double-buffer Petri net with two sets of elements: INIT, Comp1, Clean with INIT, Copy1, Clean, and INIT, Comp2, Clean with INIT, Copy2, Clean; labels 1024, Done, ΔT, Start, 8, T, F, and Low.]
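The selection rules can be sketched with a priority queue keyed on (priority, enable order). This is an illustration under assumptions: `TaskPool` and its methods are hypothetical names, and the task labels are borrowed from the figure annotations only as examples.

```python
import heapq
import itertools

class TaskPool:
    """Ready-task pool: highest priority first, then earliest enabled."""
    HIGH, LOW = 0, 1   # smaller number = higher priority (heapq is a min-heap)

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # enable order, used as the tie-breaker

    def enable(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def take(self):
        return heapq.heappop(self._heap)[2]

pool = TaskPool()
pool.enable(TaskPool.LOW, "Comp1")
pool.enable(TaskPool.LOW, "Comp2")
pool.enable(TaskPool.HIGH, "Init Set 1")
print([pool.take() for _ in range(3)])  # ['Init Set 1', 'Comp1', 'Comp2']
```

With a strict enable counter there are never exact ties, so the third rule (choose arbitrarily) only matters in an implementation where two tasks can become enabled at the same instant.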
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: High, Init Set 1, ΔT, 1, Low.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: High, Low, Init Set 2, Comp1(1024), 1, 1024.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp1(1020), Comp2(1024), 1020, 1024.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(1024), Clean, 1024.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(1020), Init Copy Set, 1020, 1.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(1000), Copy (8), 1000, 8.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(500), Clean, 500.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(495), Done, 495, 1.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(490), Init Set 1, 490, 1.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp2(480), Comp1(1024), 480, 1024.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp1(1024), Clean, 1024.]
Double Buffer Computation Example
[Figure: double-buffer Petri net (Comp1/Copy1 and Comp2/Copy2 sets), animation frame. Annotations: Comp1(1020), Init Copy Set 2, 1020, 1.]
The Cool Demo on MxM
• Due to Rishi Khan (ETI)
04/21/23 UHPC-Portland-Meeting-06-2009 24
Final Results
Scalability
Summary
• Static optimizations increase performance substantially.
• Dynamic Scheduling and Dynamic Percolation mitigate the unpredictable effects of resource sharing.
• The optimizations implemented are also power efficient.
04/21/23 Tutorial Project Part 2 28
Acknowledgements
• Professor Guang Gao
• ETI and CAPSL people who have helped on this project (Rishi Khan, Daniel Orozco, Kelly Livingston, Ioannis Venetis)
• Members of CAPSL