8/3/2019 Smart Art) [Compatibility Mode]
1/25
Bo Yang, Liang Guang, Tero Säntti, Juha Plosila
Parameter-Optimized Simulated Annealing for Application Mapping on Networks-on-Chip
-
Outline
Introduction
Application Mapping
Implementation of SA
Nelder-Mead Simplex Method
Experiment and Analysis
Conclusion
-
Introduction
Moore's Law is still valid (ITRS's perspective)
What could we do with billions of transistors?
Tens to hundreds of cores on a single chip
80-core Intel Terascale Chip
Tilera TILE-Gx Family with 16 to 100 processing cores
...
Intel: Why a 1,000-core chip is feasible. ZDNet, 2010
Manycore architecture has become the mainstream for parallel computing
-
Introduction
Major Concern
Communication, instead of computation
Great impact on performance and energy consumption
Conventional buses and point-to-point connections become a bottleneck
Networks-on-Chip (NoC)
Better scalability
Higher reliability
More reusability
-
Application Mapping
Application
A set of concurrent tasks
Modeled by the communication weighted graph (CWG)
Many-core NoC
A set of tiles and links
Modeled by the computation and communication resource graph (CCRG)
-
Application Mapping
The role of application mapping is to determine how to place each task on a tile of the NoC so that the specific design interests and constraints are fulfilled.
Objective: find the mapping solution that minimizes the communication energy consumption
-
Application Mapping
Energy model of NoC [Jingcao2005]
The energy consumed by one communication from task i to task j is determined by:
the data volume transferred from task i to j
the distance of the communication channel from node i to j on the NoC
the energy consumed by the switch and the link for transferring one bit of data on the NoC
-
Application Mapping
Objective Formulation
Communication energy consumption of an application (Eapp)
Given the constant per-bit switch and link energies, Eapp is linearly proportional to the sum, over all communications, of the product of data volume and distance.
Weighted Communication of an Application (WCA)
The objective of application mapping is to find the optimal solution with minimal WCA.
A smaller WCA means a better solution
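As a minimal sketch of this objective, the wording above suggests WCA = sum over all communications of (data volume × distance). The code below assumes a 2D mesh and uses the Manhattan hop count as the distance; the mesh topology, the function names and the toy volumes are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]  # (source task, destination task)


def manhattan(tile_a: int, tile_b: int, mesh_width: int) -> int:
    """Hop distance between two tiles of a 2D mesh (assumed topology)."""
    ax, ay = tile_a % mesh_width, tile_a // mesh_width
    bx, by = tile_b % mesh_width, tile_b // mesh_width
    return abs(ax - bx) + abs(ay - by)


def wca(volumes: Dict[Edge, float], mapping: Dict[int, int], mesh_width: int) -> float:
    """Sum of data volume times distance over all communications."""
    return sum(v * manhattan(mapping[i], mapping[j], mesh_width)
               for (i, j), v in volumes.items())


# Toy CWG with three tasks on a 2x2 mesh (made-up volumes).
volumes = {(0, 1): 70.0, (1, 2): 30.0, (0, 2): 10.0}
near = {0: 0, 1: 1, 2: 3}  # heavy edge 0->1 placed on adjacent tiles
far = {0: 0, 1: 3, 2: 1}   # heavy edge 0->1 placed on diagonal tiles
print(wca(volumes, near, 2), wca(volumes, far, 2))  # 120.0 180.0
```

Placing the heaviest communication on adjacent tiles gives the smaller WCA, which is the behaviour the mapping search tries to exploit.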
-
Application Mapping
NP-hard problem
To map m tasks on n cores ($m \le n$), there are $\frac{n!}{(n-m)!}$ possible solutions
The search space grows exponentially with the problem sizes m and n
Exhaustive search is infeasible (e.g., for n = m = 25, $25! \approx 1.55 \times 10^{25}$)
Heuristic search, including Simulated Annealing (SA), Tabu Search (TS), Greedy Incremental (GI), etc.
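The size of the search space can be checked with a few lines of Python. This sketch assumes that each tile hosts at most one task, so the count is the number of m-permutations of n tiles; the function name is illustrative.

```python
import math


def num_mappings(m: int, n: int) -> int:
    """Ways to place m distinct tasks on n tiles, at most one task per tile."""
    if m > n:
        raise ValueError("need m <= n")
    return math.perm(n, m)  # n! / (n - m)!


print(f"{num_mappings(25, 25):.3e}")  # 1.551e+25
```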
-
Simulated Annealing
Pro and Con
Pro: able to find global optima
Con: numerous computations and evaluations, hence long runtime
Parameters and Functions
Initial temperature T0
Final temperature Tf
Cost function Cost(S)
Temperature function Temp(i)
Acceptance function Accept(ΔC, T)
Termination function Terminate(i, R)
Move function Move(S, T)
etc.
-
Simulated Annealing
Cost function Cost(S)
Cost(S) = WCA of solution S
Temperature function Temp(i)
$\mathrm{Temp}(i) = T_0 \cdot q^{\lfloor i/L \rfloor}$
i: number of iterations
q: cooling ratio
L: number of iterations at each temperature
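A minimal sketch of the temperature schedule, assuming the geometric form Temp(i) = T0 · q^floor(i/L) that the definitions of q and L suggest. The function name and the value L = 10 are illustrative; q = 0.9 and T0 = 100 are the manually set NoCMap values quoted in the experiment setup.

```python
def temp(i: int, t0: float, q: float, iters_per_temp: int) -> float:
    """Temperature at iteration i: it is held for L iterations, then multiplied by q."""
    return t0 * q ** (i // iters_per_temp)


# The temperature stays constant within each block of L = 10 iterations.
for i in (0, 9, 10, 25):
    print(i, temp(i, t0=100.0, q=0.9, iters_per_temp=10))
```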
-
Simulated Annealing
Acceptance function Accept(ΔC, T)
Accept the move if $\mathrm{random}() < prob$, where $prob = \frac{1}{1 + \exp\left(\frac{\Delta C}{K C_0 T}\right)}$
C0: initial cost, ΔC: cost difference
K: normalizing ratio
Termination function Terminate(i, R)
$\mathrm{Temp}(i) \le T_f \wedge R \ge R_{max}$
Move function Move(S, T)
Single random swapping
A task in the current solution is randomly selected and swapped to a randomly selected tile to generate a new solution
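The acceptance and move functions can be sketched as follows. This assumes the acceptance probability 1 / (1 + exp(ΔC / (K · C0 · T))) and a mapping stored as a task-to-tile dictionary; the function names, the overflow guard, and the rule that a displaced task takes the mover's old tile are illustrative assumptions. The move here ignores the temperature argument that Move(S, T) carries on the slide.

```python
import math
import random
from typing import Dict


def accept_prob(delta_c: float, temperature: float, k: float, c0: float) -> float:
    """Probability of accepting a move with cost difference delta_c (temperature > 0)."""
    x = delta_c / (k * c0 * temperature)
    if x > 700.0:  # exp(x) would overflow; the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def accept(delta_c: float, temperature: float, k: float, c0: float,
           rng: random.Random) -> bool:
    """Draw a random number and compare it with the acceptance probability."""
    return rng.random() < accept_prob(delta_c, temperature, k, c0)


def move(mapping: Dict[int, int], num_tiles: int, rng: random.Random) -> Dict[int, int]:
    """Single random swap: put a random task on a random tile."""
    new = dict(mapping)
    task = rng.choice(sorted(new))
    target = rng.randrange(num_tiles)
    occupant = next((t for t, tile in new.items() if tile == target), None)
    if occupant is not None:
        new[occupant] = new[task]  # the displaced task takes the old tile
    new[task] = target
    return new


rng = random.Random(1)
print(accept_prob(0.0, 1.0, 1.0, 1.0))  # 0.5
print(move({0: 0, 1: 1, 2: 3}, num_tiles=4, rng=rng))
```

With this form, a worsening move (ΔC > 0) is accepted with probability below 0.5, and the probability falls towards zero as the temperature drops.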
-
Simulated Annealing
Initial temperature T0 and final temperature Tf
Solve the acceptance function for T:
$T = \frac{\Delta C}{K C_0 \ln\left(\frac{1}{prob} - 1\right)}$
T0 and Tf can be derived by:
$T_0 = \frac{\Delta C_{max}}{K C_0 \ln\left(\frac{1}{prob_0} - 1\right)}$
$T_f = \frac{\Delta C_{min}}{K C_0 \ln\left(\frac{1}{prob_f} - 1\right)}$
prob0: probability of accepting ΔCmax at temperature T0
probf: probability of accepting ΔCmin at temperature Tf
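A short sketch of this derivation in code, assuming the acceptance probability prob = 1 / (1 + exp(ΔC / (K · C0 · T))) and solving it for T. The function name and the values of C0, ΔCmax and ΔCmin are made up for illustration; K, prob0 and probf are the VOPD values from the results table.

```python
import math


def temperature_for(delta_c: float, prob: float, k: float, c0: float) -> float:
    """Temperature at which a cost difference delta_c is accepted with probability prob."""
    if not 0.0 < prob < 0.5:
        raise ValueError("prob must lie in (0, 0.5) for a positive temperature")
    return delta_c / (k * c0 * math.log(1.0 / prob - 1.0))


k, c0 = 0.72, 1000.0                  # c0 is a made-up initial cost
delta_max, delta_min = 50.0, 2.0      # made-up extremes from trial moves
t0 = temperature_for(delta_max, 0.44, k, c0)
tf = temperature_for(delta_min, 0.05, k, c0)
print(t0, tf)

# Round trip: plugging t0 back into the acceptance function recovers prob0.
print(1.0 / (1.0 + math.exp(delta_max / (k * c0 * t0))))
```

The restriction prob < 0.5 follows from the formula itself: for a positive cost difference the logarithm is positive only when 1/prob − 1 > 1.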
-
Simulated Annealing
To summarize, we need to determine the following parameters:
q: cooling ratio
K: normalizing ratio
prob0: probability of accepting ΔCmax at temperature T0
probf: probability of accepting ΔCmin at temperature Tf
ΔCmax, ΔCmin: computed using a finite number of trial moves
Considerations on parameter selection
Parameters are problem-specific
Parameters jointly affect the performance of SA
The set of parameters should be selected in a systematic way, instead of being set manually and independently.
-
Nelder-Mead Simplex Method
Method for minimization of a function f(p)
Proposed by Nelder and Mead in 1965
f(p): function with n variables x1, x2, ..., xn
n + 1 points form the initial simplex; each point pk is an n-tuple (xk1, xk2, ..., xkn)
Sort the n + 1 function values so that f(p0) ≤ f(p1) ≤ ... ≤ f(pn)
To get the minimum of f(p), in each iteration:
A new simplex is formed, either by replacing the point pn with the reflection point, the expansion point, or the contraction point, or by updating all points when the preceding replacements fail.
Sort the n + 1 function values of the points in the new simplex and continue the process
-
Nelder-Mead Simplex Method
The process terminates when f(p0), f(p1), ..., f(pn) converge to one value, which approximates the minimum value of the function f(p)
For more detail, refer to J. A. Nelder and R. Mead, "A simplex method for function minimization."
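The iteration described above can be sketched in plain Python. This is a textbook-style variant with the commonly used coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), which are assumptions here since the slides do not state them; the function name, tolerance and test function are illustrative. For brevity the sketch re-evaluates f when sorting, which would be wasteful for an expensive f.

```python
from typing import Callable, List, Sequence

Point = List[float]


def nelder_mead(f: Callable[[Sequence[float]], float], simplex: List[Point],
                tol: float = 1e-10, max_iter: int = 1000) -> Point:
    """Minimize f starting from n + 1 points; returns the best point found."""
    n = len(simplex) - 1
    pts = sorted((list(p) for p in simplex), key=f)
    for _ in range(max_iter):
        vals = [f(p) for p in pts]
        if vals[-1] - vals[0] < tol:  # function values have converged
            break
        best, worst = pts[0], pts[-1]
        # Centroid of all points except the worst one.
        c = [sum(p[d] for p in pts[:-1]) / n for d in range(n)]

        def along(t: float) -> Point:
            """Point on the line through the centroid, away from the worst point."""
            return [c[d] + t * (c[d] - worst[d]) for d in range(n)]

        refl = along(1.0)
        f_refl = f(refl)
        if f_refl < vals[0]:                      # better than the best: try expanding
            expd = along(2.0)
            new = expd if f(expd) < f_refl else refl
        elif f_refl < vals[-2]:                   # better than the second worst
            new = refl
        else:                                     # contract outside or inside
            contr = along(0.5) if f_refl < vals[-1] else along(-0.5)
            new = contr if f(contr) < min(f_refl, vals[-1]) else None
        if new is not None:
            pts[-1] = new                         # replace the worst point
        else:                                     # all replacements failed: shrink
            pts = [best] + [[(b + x) / 2 for b, x in zip(best, p)] for p in pts[1:]]
        pts.sort(key=f)
    return pts[0]


# Usage on a simple quadratic with its minimum at (1, -2).
result = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                     [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(result)
```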
-
Nelder-Mead Simplex Method
Parameter-Optimized SA (POSA) algorithm
Variables: q, K, prob0 and probf
Initial simplex: 5 initial points, each consisting of selected values of the 4 variables
The SA algorithm applies the set of parameters of each point and finds one mapping solution
The WCA of the mapping solution found by the SA algorithm is defined as the value of the function f(p) and compared with the others
The Nelder-Mead method terminates when all 5 points converge to one point, which represents the set of optimized parameters we seek
This set of optimized parameters is then applied to the SA algorithm to find the best mapping solution
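The coupling between the two methods can be sketched as an objective function f(p) that runs SA once with the parameters p = (q, K, prob0, probf) and returns the resulting WCA. Everything below is a hedged toy version: the 2D mesh, the Manhattan distance, the fixed random seed (which makes f deterministic), the clamping ranges, L, R_max, the iteration cap and the simplex values are all assumptions for illustration and do not come from the slides. Any Nelder-Mead implementation can then minimize f over the 5-point simplex.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Volumes = Dict[Tuple[int, int], float]


def wca(volumes: Volumes, tile_of: Sequence[int], width: int) -> float:
    """Volume times Manhattan hop distance, summed over all communications."""
    def dist(a: int, b: int) -> int:
        return abs(a % width - b % width) + abs(a // width - b // width)
    return sum(v * dist(tile_of[i], tile_of[j]) for (i, j), v in volumes.items())


def run_sa(params: Sequence[float], volumes: Volumes, num_tasks: int, width: int,
           seed: int = 0, L: int = 20, r_max: int = 50, max_iter: int = 5000) -> float:
    """One SA run with parameters (q, K, prob0, probf); returns the best WCA seen."""
    q, k, prob0, probf = params
    rng = random.Random(seed)
    num_tiles = width * width
    tile_of = rng.sample(range(num_tiles), num_tasks)   # random initial mapping

    def neighbour(cur: List[int]) -> List[int]:
        new = list(cur)
        task, target = rng.randrange(num_tasks), rng.randrange(num_tiles)
        if target in new:                               # occupied: swap the two tasks
            new[new.index(target)] = new[task]
        new[task] = target
        return new

    cost = c0 = wca(volumes, tile_of, width)
    # Estimate the largest and smallest cost change from a few trial moves.
    deltas = [abs(wca(volumes, neighbour(tile_of), width) - c0) for _ in range(50)]
    deltas = [d for d in deltas if d > 0] or [1.0]
    t0 = max(deltas) / (k * c0 * math.log(1 / prob0 - 1))
    tf = min(deltas) / (k * c0 * math.log(1 / probf - 1))
    best, rejected = cost, 0
    for i in range(max_iter):
        t = t0 * q ** (i // L)
        if t <= tf and rejected >= r_max:               # assumed termination rule
            break
        cand = neighbour(tile_of)
        cand_cost = wca(volumes, cand, width)
        x = (cand_cost - cost) / (k * c0 * t)
        if x < 700 and rng.random() < 1 / (1 + math.exp(x)):
            tile_of, cost, rejected = cand, cand_cost, 0
        else:
            rejected += 1
        best = min(best, cost)
    return best


def clamp(p: Sequence[float]) -> Tuple[float, float, float, float]:
    """Keep a simplex point inside ranges where SA is well defined (assumed ranges)."""
    def lim(v: float, lo: float, hi: float) -> float:
        return min(max(v, lo), hi)
    q, k, prob0, probf = p
    return lim(q, 0.5, 0.99), lim(k, 0.05, 5.0), lim(prob0, 0.01, 0.49), lim(probf, 0.01, 0.49)


# Toy application: 4 tasks on a 2x2 mesh (made-up volumes).
volumes = {(0, 1): 70.0, (1, 2): 30.0, (2, 3): 50.0, (0, 3): 10.0}


def f(p: Sequence[float]) -> float:
    """Objective for Nelder-Mead: WCA of the mapping found by SA with parameters p."""
    return run_sa(clamp(p), volumes, num_tasks=4, width=2)


# Initial simplex: 5 points, each with values for (q, K, prob0, probf).
simplex = [[0.90, 0.5, 0.40, 0.05], [0.95, 0.5, 0.40, 0.05], [0.90, 1.0, 0.40, 0.05],
           [0.90, 0.5, 0.30, 0.05], [0.90, 0.5, 0.40, 0.10]]
print([f(p) for p in simplex])
```

Because SA is stochastic, f is a noisy function in general; fixing the seed is one simple way to make simplex points comparable, at the cost of tuning towards that particular random sequence.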
-
Experiment and Result
Setup
Four applications
Video object plane decoder (VOPD, 16 tasks)
MPEG4 (12 tasks)
Multimedia systems application (MMS, 25 tasks)
H.264 decoder (H264, 16 tasks)
Reference work: NoCMap ([Jingcao2005])
Parameters are set manually
q: 0.9, T0: 100, Tf: unbounded
Exponential form of acceptance function
Random swapping move function
The simulator in NoCMap is used to obtain the communication energy consumption for both POSA and NoCMap
-
Experiment and Result
Optimized Parameters
Application q prob0 probf K
VOPD 0.91 0.44 0.05 0.72
MPEG4 0.95 0.34 0.05 0.36
MMS 0.94 0.36 0.05 0.62
H264 0.89 0.42 0.05 0.49
Parameters are problem-specific
Instead of using an identical set of parameters for different problems, different sets of parameters should be applied to the SA algorithm
-
Experiment and Result
Number of Iterations
Application NoCMap POSA POSA/NoCMap
VOPD 4.30e6 2.74e4 0.64%
MPEG4 2.61e6 2.77e4 1.06%
MMS 1.14e7 1.18e5 1.04%
H264 1.61e6 1.94e4 1.02%
Avg. 0.94%
POSA uses significantly fewer iterations
On average, less than 1% of those used by NoCMap
-
Experiment and Result
Runtime of SA (seconds)
App     NoCMap   POSA (with NM)   NoCMap/POSA (with NM)   POSA (SA only)   NoCMap/POSA (SA only)
VOPD    31.69    15.50            2.04                    0.087            364
MPEG4   15.74    9.67             1.63                    0.059            267
MMS     171.74   181.75           0.94                    1.17             147
H264    12.34    11.90            1.04                    0.072            171
Avg.                              1.41                                     237
POSA (with NM) includes the runtime of the Nelder-Mead method
POSA (SA only) is the runtime of SA applying the optimized parameters
On average, SA with the optimized parameters achieves a 237-fold speedup
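The two average speedups can be reproduced from the runtimes in the table. The snippet below only redoes that arithmetic; the variable names are illustrative, and the assignment of the two POSA columns (with and without the Nelder-Mead runtime) follows the ratios reported alongside them.

```python
# Runtimes in seconds: (NoCMap, POSA including Nelder-Mead, POSA SA-only run).
runtimes = {
    "VOPD": (31.69, 15.50, 0.087),
    "MPEG4": (15.74, 9.67, 0.059),
    "MMS": (171.74, 181.75, 1.17),
    "H264": (12.34, 11.90, 0.072),
}

with_nm = [noc / posa for noc, posa, _ in runtimes.values()]
sa_only = [noc / sa for noc, _, sa in runtimes.values()]

avg_with_nm = sum(with_nm) / len(with_nm)
avg_sa_only = sum(sa_only) / len(sa_only)
print(round(avg_with_nm, 2), round(avg_sa_only))  # 1.41 237
```

Note that for MMS the ratio including the Nelder-Mead runtime is below 1, so the parameter search costs more than it saves for that single run; the large speedup applies once the optimized parameters are available.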
-
Experiment and Result
Weighted Communication (WCA)
The mapping solution of POSA yields a WCA comparable to that of NoCMap
-
Experiment and Result
Energy Consumption (EC)
Consistent with the WCA results
The mapping solution of POSA yields communication energy consumption comparable to that of NoCMap
-
Conclusion
A method to systematically select the parameters of the SA algorithm for the application mapping problem is proposed.
With the set of optimized parameters, significantly fewer evaluations are processed in POSA, and the SA algorithm is accelerated.
The accelerated POSA algorithm achieves energy consumption comparable to that of NoCMap.
For the set of benchmarks, POSA obtains mapping solutions of the same quality while using less than 1% of the iterations of NoCMap and achieving an average speedup of 237 times.
-
Thank you for your attention!
Comments and Questions