
© 2017 Arm Limited

Arm Research Summit, 12 September 2017

Arm Supported HPC Tools

Geraint North, Distinguished Engineer, Arm HPC Tools

Continuity across Arm-compatible cores, and beyond.

Community through our partnership and open source.

Consumability through integrated, tested, supported products.

Serious Arm HPC deployments starting in 2017

Two big announcements about Arm in HPC in Europe

Open-source Arm HPC

OpenHPC is a community effort to provide a common, verified set of open-source packages for HPC deployments.

Arm’s participation:
• Silver member of OpenHPC
• Arm is on the OpenHPC Technical Steering Committee in order to drive Arm build support

Status: 1.3.2 release out now
• All packages built on ARMv8 for CentOS and SUSE
• Arm machines are being used for building and also in the OpenHPC build infrastructure.
• Great work from Linaro for testing the release and addition of new packages (plasma, pnetcdf, scotch, slepc)

Functional Areas                  Components include
Base OS                           RHEL/CentOS 7.1, SLES 12
Administrative Tools              Conman, Ganglia, Lmod, LosF, ORCM, Nagios, pdsh, prun
Provisioning                      Warewulf
Resource Mgmt.                    SLURM, Munge, Altair PBS Pro*
I/O Services                      Lustre client (community version)
Numerical/Scientific Libraries    Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, Mumps
I/O Libraries                     HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios
Compiler Families                 GNU (gcc, g++, gfortran)
MPI Families                      OpenMPI, MVAPICH2
Development Tools                 Autotools (autoconf, automake, libtool), Valgrind, R, SciPy/NumPy
Performance Tools                 PAPI, Intel IMB, mpiP, pdtoolkit, TAU

– Now on Arm

https://arm-hpc.gitlab.io

• 50 members joined since ISC.
• Mostly used by Arm to draw attention to new resources and announcements.
• Some involvement from others in the community.

The Arm HPC Packages Wiki is a community site to share knowledge about:

• What builds (GCC and ARM Compiler)
• What is important
• What has been tuned
• What flags/patches are needed for good performance.

Categories group pages together into lists, e.g.:
• Benchmarks, debuggers, compilers, etc.
• Applications interesting to specific end-users.
• Open/Closed source
• Included in OpenHPC etc.

The Wiki pages themselves mark up Categories and Labels, which cause the summaries and spreadsheets to be automatically updated.

Commercial HPC Tools

Commercial HPC products simplify the ecosystem

Comprehensive
• Comprehensive suite of tools: compiler, libraries, debuggers and profilers

Performant
• Best-in-class performance with latest features
• Tuned for a wide range of 64-bit ARMv8-A-based platforms

Supported
• Commercially supported by Arm

Arm commercial HPC software portfolio

• Arm HPC Compilers: commercially supported Fortran, C and C++
• Arm Performance Libraries: micro-architecturally tuned BLAS, LAPACK and FFT
• Allinea Forge (DDT + MAP): parallel debugging and profiling
• Allinea Performance Reports: performance summary

Arm C/C++/Fortran Compiler

Linux user-space compiler tailored for HPC on Arm
• Maintained and supported by Arm for a wide range of Arm-based SoCs running leading Linux distributions
• Based on Clang/LLVM, the leading compiler framework, with Flang for Fortran support.

Latest features go into the commercial releases first
• Ahead of upstream LLVM by up to a year with latest performance improvement patches
• SVE support in the assembler, disassembler, intrinsics and autovectorizer

OpenMP
• Uses latest open-source (now Arm) LLVM OpenMP runtime
• Changes pushed back to the community

Optimized OpenMP

Latest features and performance optimizations

Commercially supported by ARM

Arm Performance Libraries
Optimized BLAS, LAPACK and FFT

Commercial 64-bit ARMv8 math libraries
• Commonly used low-level math routines: BLAS, LAPACK and FFT
• Validated with NAG’s test suite, a de facto standard

Best-in-class performance with commercial support
• Tuned by Arm for Cortex-A72, Cortex-A57 and Cortex-A53
• Maintained and supported by Arm for a wide range of Arm-based SoCs
• Including Cavium ThunderX and ThunderX2 CN99 cores

Silicon partners can provide tuned micro-kernels for their SoCs
• Partners can contribute directly through open source route
• Parallel tuning within our library increases overall application performance

Commercially supported by ARM

Validated with NAG test suite

Performance on par with best-in-class math libraries

Arm Performance Libraries

[Figure: DGEMM, 1 thread on Cavium ThunderX2 CN99 (HPE Comanche, Advanced Technology Preview). x-axis: matrix dimension (M=N=K), 0 to 2000; y-axis: percentage of peak, 0 to 100; series: ARM Performance Libraries, OpenBLAS.]

Micro-architectural tuning

• Arm cores have a variety of designs, created by both ARM and our partners
• Arm Performance Libraries are creating tailored versions of routines to target these different micro-architectures
• It is important to ensure that the correct version is installed on your system

[Figure: DGEMM, 56 threads on Cavium ThunderX2 CN99 (HPE Comanche, Advanced Technology Preview). x-axis: matrix dimension (M=N=K), 0 to 10000; y-axis: percentage of peak, 0 to 100; series: ARM Performance Libraries, OpenBLAS.]
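The DGEMM results on these slides are reported as a percentage of peak. As a rough sketch of how such a figure can be derived, the Python below assumes the conventional 2·M·N·K floating-point operation count for DGEMM and a theoretical peak of cores × clock × FLOPs per cycle. The function names, the machine parameters and the timing are illustrative assumptions, not measurements or specifications from the presentation.

```python
def dgemm_flops(m: int, n: int, k: int) -> float:
    """Conventional DGEMM operation count: one multiply and one add
    for each of the M*N*K inner-product terms."""
    return 2.0 * m * n * k


def theoretical_peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Peak double-precision rate in GFLOP/s under the stated assumption
    that every core retires flops_per_cycle operations each cycle."""
    return cores * clock_ghz * flops_per_cycle


def percent_of_peak(m: int, n: int, k: int, seconds: float, peak_gflops: float) -> float:
    """Achieved DGEMM rate as a percentage of the theoretical peak."""
    achieved_gflops = dgemm_flops(m, n, k) / seconds / 1e9
    return 100.0 * achieved_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical single-core machine and timing, chosen only to keep
    # the arithmetic easy to follow; not ThunderX2 data.
    peak = theoretical_peak_gflops(cores=1, clock_ghz=2.0, flops_per_cycle=8)
    print(f"{percent_of_peak(1000, 1000, 1000, seconds=0.15625, peak_gflops=peak):.1f}% of peak")
```

For a multi-threaded run such as the 56-thread plot, the same calculation applies with `cores` set to the number of cores in use, so the curve shows how well the library scales as well as how well each core is used.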

New in Arm Compiler for HPC 1.4

Arm Performance Libraries 2.3.0
• Supports GCC 7.1.0 and Arm Compiler 1.4

Arm Compiler 1.4
• Support for some gfortran flags in armflang for compatibility:
  -ffree-form
  -ffixed-form
  -ffixed-line-length-0 -ffixed-line-length-132 -ffixed-line-length-none
  -ffree-line-length-0 -ffree-line-length-132 -ffree-line-length-none
  -fconvert={native|swap|little-endian|big-endian}
• Support for -mcpu=native flag.
• Support for vectorized math routines (from SLEEF): undocumented feature.

Packaging
• Module files are now compatible with lmod

Experimental tools to support SVE
With Arm Compiler, Instruction Emulator and Code Advisor

Compile: ARM HPC Compiler
• C/C++/Fortran
• SVE via auto-vectorization, intrinsics and assembly.
• Compiler Insight: Compiler places results of compile-time decisions and analysis in the resulting binary.

Emulate: Instruction Emulator
• Runs userspace binaries for future ARM architectures on today’s systems.
• Supported instructions run unmodified.
• Unsupported instructions are trapped and emulated.

Analyse: Code Advisor
• Console or web-based output shows prioritized advice in-line with original source code.

New in Arm Instruction Emulator 1.2.1

Experimental feature to integrate with DynamoRIO to generate memory access traces. Thanks to Chris Adeniyi-Jones and Miguel Tairum-Cruz for their design input!

Output file format:

sequence, tid, bundle, isWrite, size, addr, pc

Where:
sequence  sequence number which orders the loads/stores across multiple trace files
tid       thread id
bundle    supports bundling of multiple mem_refs for gather/scatter/strided accesses
isWrite   true if store, false if load
size      number of bytes stored or loaded
addr      load/store address
pc        instruction address
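A minimal sketch of reading such a trace in Python. It assumes one comma-separated record per line in the field order above, that isWrite is written as 0/1 or true/false, and that numeric fields are decimal or 0x-prefixed hexadecimal. The slides do not specify these encodings, so the `MemRef` type and the `parse_line` and `merge_traces` helpers are hypothetical, not part of the Instruction Emulator.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class MemRef(NamedTuple):
    sequence: int   # orders loads/stores across multiple trace files
    tid: int        # thread id
    bundle: int     # groups mem_refs from one gather/scatter/strided access
    is_write: bool  # True for a store, False for a load
    size: int       # number of bytes stored or loaded
    addr: int       # load/store address
    pc: int         # instruction address


def _to_int(text: str) -> int:
    # Assumption: values are decimal or 0x-prefixed hex (base 0 auto-detects).
    return int(text, 0)


def parse_line(line: str) -> MemRef:
    """Parse one 'sequence, tid, bundle, isWrite, size, addr, pc' record."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    seq, tid, bundle, is_write, size, addr, pc = fields
    return MemRef(
        _to_int(seq), _to_int(tid), _to_int(bundle),
        is_write.lower() in ("1", "true"),  # assumed encodings of isWrite
        _to_int(size), _to_int(addr), _to_int(pc),
    )


def merge_traces(*traces: Iterable[str]) -> Iterator[MemRef]:
    """Interleave several traces into one stream ordered by sequence number.

    Assumes each individual trace is already sorted by sequence.
    """
    parsed = ((parse_line(line) for line in trace if line.strip()) for trace in traces)
    return heapq.merge(*parsed, key=lambda ref: ref.sequence)


if __name__ == "__main__":
    # Made-up records for two threads, for illustration only.
    thread0 = ["0, 0, 0, 0, 8, 0x1000, 0x400100", "2, 0, 0, 1, 8, 0x1008, 0x400104"]
    thread1 = ["1, 1, 0, false, 4, 0x2000, 0x400200"]
    for ref in merge_traces(thread0, thread1):
        print(ref.sequence, ref.tid, "store" if ref.is_write else "load", hex(ref.addr))
```

Merging on `sequence` reflects the field's stated purpose of ordering loads and stores across multiple trace files; `heapq.merge` does this lazily, so large traces need not be held in memory at once.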

Introducing the Compute Library

Optimized low-level functions for CPU and GPU
• Most popular Computer Vision (CV) and Machine Learning (ML) functions
• Supports common ML frameworks

Enable faster deployment of CV and ML
• Targeting CPU (NEON) and GPU (OpenCL)
• Significant performance uplift compared to OSS alternatives

Publicly available now (no fee, MIT license)

Key function categories
• Basic arithmetic
• Convolutions
• Colour manipulation
• Feature detection
• Neural network
• GEMM
• Pyramids
• Filters
• Image reshaping
• Mathematical functions

Software Grants Overview

Academic researchers
Hardware vendors
Software authors
System integrators

We will assess eligibility on a case-by-case basis for each grant application with the sales manager for the relevant region.

Guidelines for acceptance
• Single nodes and small systems intended to explore ARM technology
• Research projects with limited scope and duration
• Contribution to the ARM ecosystem

Unlikely to be eligible
• Production systems
• Long-term use for large teams
• Grants that would prevent a sale in progress

We will provide free access to HPC tools (ARM compiler, libraries, Emulator, Forge and Reports) for:
• Researchers experimenting with and porting codes to ARM hardware
• Partners porting applications and developing systems for the ARM ecosystem

Summary

Arm’s ecosystem is built on partnership and choice
• We work with many organizations to drive hardware design and deliver better software
• This method enables partners to design different products for different markets

We license IP at all levels of the stack to help customers be successful

Our 64-bit server platforms are beginning to see large, mainstream deployments

Building the software ecosystem and tools is an important part of this story
• We enhance open-source software as well as developing commercially supported options

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks
