orchestration: turning material breakthroughs into ... · orchestration: turning material...
TRANSCRIPT
Orchestration: Turning material breakthroughs into application
performance
Jeronimo Castrillon (on behalf of the Orchestration team)
TU Dresden – CfaedChair for Compiler Construction
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 2
Cfaed: Research program (revisited)
Goalto explore new technologies for electronic
information processing which overcome the limits of CMOS technology
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 3
Goal“to explore new technologies for electronic
information processing which overcome the limits of CMOS technology”
Cfaed: Research program (revisited)
Materials &Functions
Devices &Circuits
Information Processing
CMOS
(industry.focus)
A!!Silicon!Nanowires
B!!Carbon
C!!Organic
D!!BiomolecularAssembly
E!!Chemical
F Orchestration
G Resilience
HHAEC:
Highly Adaptive
ComputingEnergy6Efficient
IBiologicalSystems
Material research for post-CMOS technologies
Systems research to handle heterogeneity
Biology to inspire solutions
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 4
Orchestration Path
MissionTurn materials breakthroughs into application
performanceF Orchestration
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 5
Orchestration Path
MissionTurn materials breakthroughs into application
performanceF Orchestration
Materials)Inspired Paths (Paths A2– E)
HW/SW%StackHardware
Operating2system
SkeletonsApp.2Algorithms
Databaseapplication
High2PerformanceComputing
Mobile2Communi)cation
Underlying algorithms: self-adjusting
Programming abstractions
Compilers
Formal2Analysis
CoordinationCode synthesis: variant
generation“Micro-kernels”
Heterogeneous “template”
System analysis: modeling, verification &
reasoningCross-layer strategies
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 6
Orchestration Path: the team
! Investigators– Prof. Uwe Aßmann– Prof. Franz Baader– Prof. Christel Baier– Prof. Jeronimo Castrillon– Prof. Gerhard Fettweis– Prof. Christof Fetzer– Prof. Jochen Fröhlich
– Prof. Hermann Härtig– Prof. Wolfgang Lehner– Prof. Wolfgang E. Nagel– Dr. Marcus Völp– Prof. Axel Voigt
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 7
Orchestration Path: the team (2)
! PhD/Postdocs– Nils Asmußen– Andres Goens– Sebastian Haas– Dirk Habich– Immo Huismann– Tomas Karnagel– Sven Karol– Sascha Klüppelholz– Linda Leuschner
– Matthias Lieber– Siqi Ling– Steffen Märker– Johannes Mey– Benedikt Nöthen– Michael Raitza– Norman Rink– Annett Ungethüm
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 8
Outline
! Introduction
! HW/SW stacks
! Orchestration stack
! Outlook
! Concluding remarks
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 9
What is a stack
! Software components that form a complete platform (where you can run applications on)– Web app: Operating system, web server, database,
programming language
! Provide means to handle complexity
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 10
HW/SW Stack: Example 1
Source:%https://source.android.com/source/index.html
Android%sack
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 11
HW/SW Stack: Example 2
Source:%Heterogeneous%System%Architecturehttp://developer.amd.com/resources/heterogeneous=computing/what=is=heterogeneous=system=architecture=hsa/
HSA%Stack
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 12
Stacks and wild heterogeneity
! Abstract/hide complexity! Leave as much visible as needed for efficiency
! Emerging techs.: rethink established layers– Languages, compilers, operating systems, …
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 13
Example: Compiler – Runtime
Compiler
Runtimeenvironment
“protocol”assumptions
0x0
0xF..F
Code
Data
Heap
Stack
Endurance problem: try not to use same locations that often!
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 14
Example: Flash FTL
File%system
Hard%disk
File%system
Flash%drive
Flash%Translation%Layer%(FTL)
Mapping
Wear%leveling
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 15
Outline
! Introduction
! HW/SW stacks
! Orchestration stack
! Outlook
! Concluding remarks
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 16
Recall: orchestration stack
! Heterogeneous materials– In Cfaed: Along with orchestration
(see other DMA presentations)! Research plan
– Start with heterogeneous CMOS– Tiled for scalability, integrability– Pilot projects with materials as
they become availableMaterials)Inspired Paths (Paths A2– E)
HardwareOperating2system
SkeletonsApp.2Algorithms
Databaseapplication
High2PerformanceComputing
Mobile2Communi)cation
Compilers
Formal2Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 17
Applications/Drivers
! Mobile communication: typically run on heterogeneous
! Data-bases: benefits of heterogeneity (see below)! HPC: high level languages, abstractions for
scalability, productivity and portability– Adaptive Finite Element Methods (FEM)– Computational Fluid Dynamics (CFD)
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 18
Applications/Drivers (2)
! Adaptive Finite Element Methods (FEM)– Working with multiple-meshes – Communication across subdomains– Requires repartitioning " load balancing
– Working on! Portable templatized library! First results on heterogeneous platforms
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Dendritic%growth[Witkowski15]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 19
Applications/Drivers (3)
! Computational Fluid Dynamics (CFD)! Large applications (1010 unknowns)! Communication/computation is key
! Working on ! Heterogeneous implementation! More versatile fundamental concepts
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
[Huismann15]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 20
Hardware
! Tomahawk: heterogeneous CMOS
CM/µK
…
…
…
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
! Tiled (with NoC) for scalability! Allows to integrate wildly
heterogeneous components! CoreManager (CM): HW
support for fine-grained task dispatching
[Arnold13]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 21
Hardware
! DB-processor for sorting algorithms HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Selectivity:
[Arnold14]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 22
Operating Systems
! Microkernel for heterogeneity– Isolation (at NoC level)
! Dynamic compartments! Isolation at network boundary
– Remote control: no need for OS everywhere! Simplify integration of novel materials! OS services on remote cores
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 23
Operating Systems
! Microkernel for heterogeneity
Isolation & remote control
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
CM/ µK
…
…
…App22
App21
File2System
[Asmussen15]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 24
Compilers
! Parallel dataflow programming languages and compilers
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
[Castrillon14]
__PNkpn AudioAmp __PNin(short A[2]) __PNout(short B[2]) __PNparam(short boost){
while (1)__PNin(A) __PNout(B) {for (int i = 0; i < 2; i++)B[i] = A[i]*boost;
}}__PNprocess Amp1 = AudioAmp __PNin(C) __PNout(F) __PNparam(3);__PNprocess Amp2 = AudioAmp __PNin(D) __PNout(G) __PNparam(10);
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 25
Compilers
! SW synthesis for Tomahawk– Static/dynamic mapping– Automatic code generation
P1
P2
P6P3
P4
P5
CM/ µK
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 26
Skeletons
! Basic building blocks for parallelism! “Heterogeneous” programming
– Different possible implementations of same principles (CUDA, Fortran, …)
– Advanced “pre-compiler” to generate and manage variants
" Productivity
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 27
Skeletons
! Basic building blocks for parallelism HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Sequential*Program
do*concurrent*(i =*1:n,*j =*1:m)
mat(i,j))=)x), y(i,j,k))*)2end*do
1
!$acc*loop*private(#PRIVATE_VARS#)
#INNER#
!$acc*end*do
Style*Definitions
Style:*OpenACCType:*parallelization
AttributeLoop.private_vars()
Fragment Target:DoConcurrent
2
RECIPE*{parallelization:OpenACC}
OpenMP
Style*Application**Recipe2
Parallel*Program
!$acc*parallel
!$acc*loop*private(i,*j)
do*k*=*1,*num_ele
do*i*=*0,*po
mat(i,j))=)x), y(i,j,k))*)2end*do
end*do
!$acc*end*loop
!$acc*end*parallel
3
[Karol15]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 28
Algorithms
! Raise the level of abstraction– High-level domain specific languages (more later)– Algorithmic transformations: more optimization room– Algorithmic specification: allow for adaptability
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 29
Formal analysis
! Probabilistic model checking (PMC)– Analyze approaches across layers of the stack– Work on models for heterogeneous systems– Examples:
! Probability of finishing N tasks in T time! Probability of finishing N tasks with energy budget B! Trade-off analysis cost-utility (energy vs. performance)
– Model protocols: Spinlock, barriers, eBond, …
HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 30
Formal analysis
! Probabilistic model checking (PMC) HardwareOperating% system
SkeletonsApp.%Algorithms
Applications
Compilers
Formal%Analysis
Coordination
1.3$
1.5$
1.7$
1.9$
2.1$
2.3$
2.5$
2.7$
2.9$
200$ 300$ 400$ 500$ 600$ 700$ 800$ 900$ 1000$ 1100$ 1200$ 1300$ 1400$ 1500$ 1600$ 1700$ 1800$ 1900$ 2000$ 2100$ 2200$ 2300$ 2400$ 2500$ 2600$ 2700$ 2800$ 2900$ 3000$ 3100$ 3200$ 3300$ 3400$ 3500$ 3600$ 3700$ 3800$ 3900$ 4000$ 4100$ 4200$ 4300$ 4400$ 4500$ 4600$ 4700$ 4800$ 4900$ 5000$ 5100$ 5200$ 5300$ 5400$ 5500$ 5600$ 5700$ 5800$ 5900$ 6000$ 6100$ 6200$ 6300$ 6400$ 6500$ 6600$ 6700$ 6800$ 6900$ 7000$ 7100$ 7200$
1.3$
2.3$
3.3$
4.3$
5.3$
6.3$
7.3$
200$
300$
400$
500$
600$
700$
800$
900$
1000
$11
00$
1200
$13
00$
1400
$15
00$
1600
$17
00$
1800
$19
00$
2000
$21
00$
2200
$23
00$
2400
$25
00$
2600
$27
00$
2800
$29
00$
3000
$31
00$
3200
$33
00$
3400
$35
00$
3600
$37
00$
3800
$39
00$
4000
$41
00$
4200
$43
00$
4400
$45
00$
4600
$47
00$
4800
$49
00$
5000
$51
00$
5200
$53
00$
5400
$55
00$
5600
$57
00$
5800
$59
00$
6000
$61
00$
6200
$63
00$
6400
$65
00$
6600
$67
00$
6800
$69
00$
7000
$71
00$
7200
$
minim
al&expected&en
ergy&[W
a2s]
maximal&required&eBOND+&bandwidth&[MBit/s]Minimal%expected%energy
Maximal%required%bandwidth
eBond
[Baier14]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 31
Outline
! Introduction
! HW/SW stacks
! Orchestration stack
! Outlook
! Concluding remarks
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 32
Current and future work
! Further work on languages and methodologies– Computational biology– Chemical information processing
! Technology analysis at system-level– Pros (USPs) and cons– Attempt skip incremental improvement – System-level benefit based on assumptions
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 33
Language for computational biology
! Particle and mesh-based Simulations– Discrete models of physical phenomena
! Goal: Abstract programming language + optimization for heterogeneous HPC-clusters
Particle0Mesh DSL Analysis Static
Opt.Runtimeopt.
[Karol15]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 34
Language for computational biologyclient grayscottinteger, dimension(6) :: bcdef = …real(..), dimension(:,:), pointer :: displaceinteger :: interval
add_arg(k_rate,<#real(mk)#>,1.0_mk,0.0_mk,'k_rate',’’)add_arg(F,<#real(mk)#>,1.0_mk,0.0_mk,'F_param',’')add_arg(D_u,<#real(mk)#>,1.0_mk,0.0_mk,'Du_param',’’)add_arg(D_v,<#real(mk)#>,1.0_mk,0.0_mk,'Dv_param',’')
ppm_init(1)
U = create_field(1, "U")V = create_field(1, "V”)topo = create_topology(bcdef)c = create_particles(topo)
allocate(displace(ppm_dim,c%Npart))call random_number(displace)call c%move(displace, info)call c%apply_bc(info)
global_mapping(c, topo)discretize(U,c)discretize(V,c)ghost_mapping(c)
foreach p in particles(c) with … sca_fields(U,V)U_p = 1.0_mkV_p = 0.0_mkif (((x_p(1)-0.5)**2+(x_p(2)-0.5_mk)**2).lt.0.01)
thenU_p = 0.5_mk + 0.01_mk*noiseV_p = 0.25_mk + 0.01_mk*noise
end ifend foreach
n = create_neighlist(c,cutoff=<#4._mk * c%h_avg#>)Lap = define_op(2, [2,0, 0,2], [1.0_mk, 1.0_mk], "Lap")
W = discretize_op(Lap, c, ppm_param_op_dcpse, [order=>2,c=>1.0_mk])
o,nstag = create_ode([U,V],grayscott_rhs,[U=>c,V], rk4)interval = 1 t = timeloop()
do istage=1,nstagghost_mapping(c)ode_step(o, t, time_step, 1)
end doprint([U=>c, V=>c],interval )
end timeloop
ppm_finalize()end client
rhs grayscott_rhs(U=>parts,V)get_fields(dU,dV)
dU = apply_op(W, U)dV = apply_op(W, V)
foreach p in particles(parts) with sca_fields(U,V,dU,dV)dU_p = D_u*dU_p - U_p*(V_p**2) + F*(1.0_mk-U_p)dV_p = D_v*dV_p + U_p*(V_p**2) - (F+k_rate)*V_p
end foreach
end rhs
[Karol15]
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 35
SiNW: Components and reconfigurability
! Edge of SiNW over other technologies– P/N reconfigurability– Multi-gate devices (with low latency penalty)
! Components based on assumptions– New uArchitectures?– New pipeline structures and compiler
optimizations?
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 36
Outline
! Introduction
! HW/SW stacks
! Orchestration stack
! Outlook
! Concluding remarks
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 37
Summary
Materials)Inspired Paths (Paths A2– E)
Hardware
Operating2system
Skeletons
App.2Algorithms
Databaseapplication
High2PerformanceComputing
Mobile2Communi)cation
Compilers
Formal2Analysis
Coordination
! Orchestration path: Multi-disciplinary effort to turn materials breakthroughsinto application performance
! HW/SW stack to handle heterogeneity– Using CMOS as vehicle– Prepare for “wildly” heterogeneous systems
! Interact with material experts, understand technologies and analyze at system-level
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 38
Thanks!
Retreat:%Orchestration%and%resilience%paths%(03.2015)
Orchestration: Materials for performance
© Speaker: J. CastrillonDate: September 21, 2015Page: 39
References[Witkowski15] Witkowski, T., Ling, S., Praetorius, S., & Voigt, A. (2015). Software concepts and numerical algorithms for a scalable adaptive parallel finite element method. Advances in Computational Mathematics, 1-33[Huismann15] Huismann, I., Stiller, J., & Fröhlich, J. (2015). Two-level parallelization of a fluid mechanics algorithm exploiting hardware heterogenity. Computers & Fluids[Arnold13] Arnold, O., Matus, E., Noethen, B., Winter, M., Limberg, T., & Fettweis, G. (2014). Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs. ACM Transactions on Embedded Computing Systems (TECS), 13(3s), 107[Arnold14] Arnold, O., Haas, S., Fettweis, G., Schlegel, B., Kissinger, T., Karnagel, T., & Lehner, W. (2014). HASHI: An Application-Specific Instruction Set Extension for Hashing. ADMS@ VLDB, 25-33[Asmussen15] Asmussen, N., Volp, M., Nothen, B., & Ungethum, A. (2015, April). Demo abstract: Taming many heterogeneous cores. In Real-Time and Embedded Technology and Applications Symposium (RTAS), 2015 IEEE (pp. 329-329)[Castrillon14] J. Castrillon and R. Leupers, Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap. Springer, 2014[Karol15] Sven Karol, Well-Formed and Scalable Invasive Software Composition. PhD thesis. May 2015[Baier14] Baier, C., Dubslaff, C., & Klüppelholz, S. (2014, July). Trade-off analysis meets probabilistic model checking. In Proceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS)[Karol15] S. Karol, P. Incardona, Y. Afshar, I. Sbalzarini, and J. Castrillon. Towards a Next-Generation Parallel Particle-Mesh Language, in Domain-Specific Language Design and Implementation (DSLDI). July 2015