© 2017 Arm Limited
Arm Research Summit, 12 September 2017
Arm Supported HPC Tools
Geraint North, Distinguished Engineer, Arm HPC Tools
Continuity across Arm-compatible cores, and beyond.
Community through our partnership and open source.
Consumability through integrated, tested, supported products.
Serious Arm HPC deployments starting in 2017
Two big announcements about Arm in HPC in Europe
Open-source Arm HPC
OpenHPC is a community effort to provide a common, verified set of open-source packages for HPC deployments.
Arm's participation:
• Silver member of OpenHPC
• Arm is on the OpenHPC Technical Steering Committee in order to drive Arm build support
Status: 1.3.2 release out now
• All packages built on ARMv8 for CentOS and SUSE
• Arm machines are being used for building, and also in the OpenHPC build infrastructure
• Great work from Linaro on testing the release and adding new packages (plasma, pnetcdf, scotch, slepc)
Functional areas and the components they include:
• Base OS: RHEL/CentOS 7.1, SLES 12
• Administrative tools: Conman, Ganglia, Lmod, LosF, ORCM, Nagios, pdsh, prun
• Provisioning: Warewulf
• Resource management: SLURM, Munge, Altair PBS Pro*
• I/O services: Lustre client (community version)
• Numerical/scientific libraries: Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, Mumps
• I/O libraries: HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios
• Compiler families: GNU (gcc, g++, gfortran)
• MPI families: OpenMPI, MVAPICH2
• Development tools: Autotools (autoconf, automake, libtool), Valgrind, R, SciPy/NumPy
• Performance tools: PAPI, Intel IMB, mpiP, pdtoolkit, TAU
– Now on Arm
https://arm-hpc.gitlab.io
• 50 members joined since ISC.
• Mostly used by Arm to draw attention to new resources and announcements.
• Some involvement from others in the community.
The Arm HPC Packages Wiki is a community site to share knowledge about:
• What builds (GCC and Arm Compiler)
• What is important
• What has been tuned
• What flags/patches are needed for good performance
Categories group pages together into lists, e.g.:
• Benchmarks, debuggers, compilers, etc.
• Applications interesting to specific end-users
• Open/closed source
• Included in OpenHPC, etc.
The Wiki pages themselves are marked up with Categories and Labels, which cause the summaries and spreadsheets to be updated automatically.
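As a rough illustration of how per-page tags can drive generated summaries, here is a minimal Python sketch that rebuilds category lists from page metadata. The page records and their field names (`title`, `categories`) are hypothetical; the slides do not describe how the wiki actually builds its summaries.

```python
from collections import defaultdict

# Hypothetical page metadata: each wiki page declares the categories it belongs to.
PAGES = [
    {"title": "STREAM", "categories": ["Benchmarks", "Open source"]},
    {"title": "GDB", "categories": ["Debuggers", "Open source"]},
    {"title": "Arm Compiler", "categories": ["Compilers", "Closed source"]},
]


def build_category_index(pages):
    """Group page titles by category, so each category list is regenerated
    from the pages' own markup rather than maintained by hand."""
    index = defaultdict(list)
    for page in pages:
        for category in page["categories"]:
            index[category].append(page["title"])
    # Sort categories and titles so the generated summary is stable.
    return {cat: sorted(titles) for cat, titles in sorted(index.items())}


if __name__ == "__main__":
    for category, titles in build_category_index(PAGES).items():
        print(f"{category}: {', '.join(titles)}")
```

Because every list is derived from the pages, adding a tag to one page is enough to update every summary that includes that category.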
Commercial HPC Tools
Commercial HPC products simplify the ecosystem
Comprehensive
• Comprehensive suite of tools: compiler, libraries, debuggers and profilers
Performant
• Best-in-class performance with the latest features
• Tuned for a wide range of 64-bit ARMv8-A-based platforms
Supported
• Commercially supported by Arm
Arm commercial HPC software portfolio
• Arm HPC Compilers: Fortran, C and C++; commercially supported
• Arm Performance Libraries: BLAS, LAPACK and FFT; micro-architecturally tuned
• Allinea Forge (DDT + MAP): parallel debugging and profiling
• Allinea Performance Reports: performance summary
Arm C/C++/Fortran Compiler
Linux user-space compiler tailored for HPC on Arm
• Maintained and supported by Arm for a wide range of Arm-based SoCs running leading Linux distributions
• Based on Clang/LLVM, the leading compiler framework, with Flang for Fortran support
Latest features go into the commercial releases first
• Ahead of upstream LLVM by up to a year with the latest performance-improvement patches
• SVE support in the assembler, disassembler, intrinsics and auto-vectorizer
OpenMP
• Uses the latest open-source (now Arm) LLVM OpenMP runtime
• Changes pushed back to the community
Optimized OpenMP
Latest features and performance optimizations
Commercially supported by Arm
Arm Performance Libraries: optimized BLAS, LAPACK and FFT
Commercial 64-bit ARMv8 math libraries
• Commonly used low-level math routines: BLAS, LAPACK and FFT
• Validated with NAG's test suite, a de facto standard
Best-in-class performance with commercial support
• Tuned by Arm for Cortex-A72, Cortex-A57 and Cortex-A53
• Maintained and supported by Arm for a wide range of Arm-based SoCs, including Cavium ThunderX and ThunderX2 CN99 cores
Silicon partners can provide tuned micro-kernels for their SoCs
• Partners can contribute directly through the open-source route
• Parallel tuning within our library increases overall application performance
Commercially supported by Arm
Validated with NAG test suite
Performance on par with best-in-class math libraries
Arm Performance Libraries
[Figure: DGEMM, 1 thread on Cavium ThunderX2 CN99. Percentage of peak (0 to 100) against matrix dimension M = N = K (0 to 2000). Series: Arm Performance Libraries, OpenBLAS. HPE Comanche, Advanced Technology Preview.]
Micro-architectural tuning
• Arm cores have a variety of designs, created by both Arm and our partners
• Arm Performance Libraries are creating tailored versions of routines to target these different micro-architectures
• It is important to ensure that the correct version is installed on your system
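One way to check which micro-architecture a Linux system actually has, before choosing a library build, is to read the `CPU implementer` and `CPU part` fields that arm64 kernels report in `/proc/cpuinfo`. The sketch below does that for text in that format. Its ID table covers only the cores named on these slides, with values as listed in the Linux kernel's `arch/arm64/include/asm/cputype.h`. The function name `identify_cores` is hypothetical, and mapping a core to a particular library variant is left to the installation's own documentation.

```python
# (implementer, part) pairs for the cores named on these slides,
# as listed in the Linux kernel's arch/arm64/include/asm/cputype.h.
KNOWN_CORES = {
    (0x41, 0xD03): "Cortex-A53",
    (0x41, 0xD07): "Cortex-A57",
    (0x41, 0xD08): "Cortex-A72",
    (0x43, 0x0A1): "Cavium ThunderX",
    (0x43, 0x0AF): "Cavium ThunderX2",
}


def identify_cores(cpuinfo_text):
    """Return the set of core names found in /proc/cpuinfo-style text.

    Each processor block on arm64 Linux reports 'CPU implementer' before
    'CPU part', both as hex values. Pairs missing from KNOWN_CORES are
    reported as 'unknown (implementer, part)'."""
    found = set()
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key = key.strip()
        if key == "CPU implementer":
            implementer = int(value.strip(), 16)
        elif key == "CPU part" and implementer is not None:
            part = int(value.strip(), 16)
            found.add(KNOWN_CORES.get(
                (implementer, part), f"unknown ({implementer:#x}, {part:#x})"))
    return found
```

On a live system you would pass it the contents of `/proc/cpuinfo`. A heterogeneous SoC can report more than one core type, which is why the result is a set.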
Arm Performance Libraries
[Figure: DGEMM, 56 threads on Cavium ThunderX2 CN99. Percentage of peak (0 to 100) against matrix dimension M = N = K (0 to 10000). Series: Arm Performance Libraries, OpenBLAS. HPE Comanche, Advanced Technology Preview.]
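The DGEMM charts report throughput as a percentage of the processor's theoretical peak. The sketch below shows that arithmetic. It assumes the conventional count of 2·M·N·K floating-point operations for C = αAB + βC, and a peak of cores × clock (GHz) × double-precision FLOPs per cycle per core. The function names are illustrative, and the inputs in the usage note are hypothetical rather than measurements from these slides.

```python
def dgemm_flops(m, n, k):
    """Conventional DGEMM operation count: one multiply and one add per
    (i, j, l) triple, ignoring the O(M*N) work for alpha and beta."""
    return 2.0 * m * n * k


def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical double-precision peak in GFLOP/s."""
    return cores * ghz * flops_per_cycle


def percent_of_peak(m, n, k, seconds, peak):
    """Achieved GFLOP/s as a percentage of the given peak (in GFLOP/s)."""
    achieved = dgemm_flops(m, n, k) / seconds / 1e9
    return 100.0 * achieved / peak
```

For example, with a hypothetical single core at 2.0 GHz doing 8 FLOPs per cycle, `peak_gflops(1, 2.0, 8)` is 16 GFLOP/s. An M = N = K = 2000 multiply that takes 1.25 s then reaches about 80% of peak. For the multi-threaded chart, the peak scales with the number of cores in use.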
New in Arm Compiler for HPC 1.4
Arm Performance Libraries 2.3.0
• Supports GCC 7.1.0 and Arm Compiler 1.4
Arm Compiler 1.4
• Support for some gfortran flags in armflang, for compatibility: -ffree-form, -ffixed-form, -ffixed-line-length-0, -ffixed-line-length-132, -ffixed-line-length-none, -ffree-line-length-0, -ffree-line-length-132, -ffree-line-length-none, -fconvert={native|swap|little-endian|big-endian}
• Support for the -mcpu=native flag
• Support for vectorized math routines (from SLEEF); undocumented feature
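To illustrate what the `-fconvert` options change, the minimal Python sketch below writes and reads one record of a gfortran-style unformatted sequential file. In that layout the record is framed by 4-byte length markers, and both the markers and the data follow the chosen byte order. It assumes gfortran's default 4-byte record markers and a record containing only 8-byte reals. It is not armflang's implementation, and `write_record`/`read_record` are hypothetical helpers.

```python
import struct


def _prefix(byteorder):
    # struct prefixes: '<' little-endian, '>' big-endian, both with standard sizes.
    if byteorder not in ("little", "big"):
        raise ValueError("byteorder must be 'little' or 'big'")
    return "<" if byteorder == "little" else ">"


def write_record(values, byteorder):
    """Pack float64 values as one record: length marker, payload, length marker."""
    prefix = _prefix(byteorder)
    payload = struct.pack(f"{prefix}{len(values)}d", *values)
    marker = struct.pack(f"{prefix}i", len(payload))
    return marker + payload + marker


def read_record(data, byteorder):
    """Inverse of write_record; rejects implausible or mismatched markers."""
    prefix = _prefix(byteorder)
    (length,) = struct.unpack_from(f"{prefix}i", data, 0)
    # A wildly wrong length usually means the file uses the other byte order.
    if length < 0 or length % 8 or 8 + length > len(data):
        raise ValueError("implausible record length; wrong byte order?")
    (trailer,) = struct.unpack_from(f"{prefix}i", data, 4 + length)
    if trailer != length:
        raise ValueError("leading and trailing record markers differ")
    return list(struct.unpack_from(f"{prefix}{length // 8}d", data, 4))
```

On a little-endian AArch64 Linux system, `-fconvert=swap` and `-fconvert=big-endian` both correspond to `byteorder="big"` here, which lets a program read unformatted files written on a big-endian machine.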
Packaging
• Module files are now compatible with Lmod
Experimental tools to support SVE
With Arm Compiler, Instruction Emulator and Code Advisor
Compile, emulate, analyse
Arm HPC Compiler
• C/C++/Fortran
• SVE via auto-vectorization, intrinsics and assembly
• Compiler Insight: the compiler places the results of compile-time decisions and analysis in the resulting binary
Instruction Emulator
• Runs user-space binaries for future Arm architectures on today's systems
• Supported instructions run unmodified
• Unsupported instructions are trapped and emulated
Code Advisor
• Console or web-based output shows prioritized advice in-line with the original source code
New in Arm Instruction Emulator 1.2.1
Experimental feature to integrate with DynamoRIO to generate memory access traces. Thanks to Chris Adeniyi-Jones and Miguel Tairum-Cruz for their design input!
Output file format:
sequence, tid, bundle, isWrite, size, addr, pc
Where:
• sequence: sequence number that orders the loads/stores across multiple trace files
• tid: thread ID
• bundle: supports bundling of multiple mem_refs for gather/scatter/strided accesses
• isWrite: true if store, false if load
• size: number of bytes stored or loaded
• addr: load/store address
• pc: instruction address
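Because the sequence field orders loads and stores across multiple trace files, a post-processing tool can rebuild one global stream with a k-way merge. The minimal Python sketch below does this and also totals the bytes loaded and stored. The field order follows the format above. The exact text encoding is an assumption: comma-separated fields, decimal or 0x-prefixed hex integers, `isWrite` as 1/0 or true/false, and an optional header line. All function names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class MemRef(NamedTuple):
    """One load or store, in the field order of the trace format."""
    sequence: int
    tid: int
    bundle: int
    is_write: bool
    size: int
    addr: int
    pc: int


def _to_int(text: str) -> int:
    # Assumption: numbers are decimal, or hex when prefixed with 0x.
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def parse_line(line: str) -> MemRef:
    """Parse one comma-separated trace line (assumed encoding)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    seq, tid, bundle, is_write, size, addr, pc = fields
    return MemRef(_to_int(seq), _to_int(tid), _to_int(bundle),
                  is_write.lower() in ("1", "true"),
                  _to_int(size), _to_int(addr), _to_int(pc))


def _records(lines: Iterable[str]) -> Iterator[MemRef]:
    # Skip blank lines and an (assumed) header line naming the columns.
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("sequence"):
            yield parse_line(stripped)


def merge_traces(files: Iterable[Iterable[str]]) -> Iterator[MemRef]:
    """Interleave several trace files into one global order by sequence number.
    Assumes each individual file is already sorted by sequence."""
    return heapq.merge(*(_records(f) for f in files), key=lambda r: r.sequence)


def bytes_by_direction(refs: Iterable[MemRef]) -> dict:
    """Total bytes loaded ('read') and stored ('write')."""
    totals = {"read": 0, "write": 0}
    for ref in refs:
        totals["write" if ref.is_write else "read"] += ref.size
    return totals
```

`merge_traces` accepts any iterables of lines, so open trace files can be passed in directly. `heapq.merge` streams the records, so traces larger than memory can be merged without being loaded in full.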
Introducing the Compute Library
Optimized low-level functions for CPU and GPU
• Most popular computer vision (CV) and machine learning (ML) functions
• Supports common ML frameworks
Enable faster deployment of CV and ML
• Targeting CPU (NEON) and GPU (OpenCL)
• Significant performance uplift compared to OSS alternatives
Publicly available now (no fee, MIT license)
Key function categories: basic arithmetic, convolutions, colour manipulation, feature detection, neural networks, GEMM, pyramids, filters, image reshaping, mathematical functions
Software Grants Overview
Academicresearchers
Hardwarevendors
Softwareauthors
Systemintegrators
We will assess eligibility on a case-by-case basis for each grant application with the sales manager for the relevant region.
Guidelines for acceptance
• Single nodes and small systems intended to explore ARM technology
• Research projects with limited scope and duration
• Contributions to the ARM ecosystem
Unlikely to be eligible
• Production systems
• Long-term use for large teams
• Requests that would prevent a sale in progress
We will provide free access to HPC tools (ARM compiler, libraries, Emulator, Forge and Reports) for:
• Researchers experimenting with and porting codes to ARM hardware
• Partners porting applications and developing systems for the ARM ecosystem
Summary
Arm's ecosystem is built on partnership and choice
• We work with many organizations to drive hardware design and deliver better software
• This approach enables partners to design different products for different markets
We license IP at all levels of the stack to help customers be successful
Our 64-bit server platforms are beginning to see large, mainstream deployments
Building the software ecosystem and tools is an important part of this story
• We enhance open-source software as well as developing commercially supported options
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks