arm processor technology update and roadmap€¦ · cavium corporate overview multi-core mips, arm...
TRANSCRIPT
Arm Processor Technology Update and Roadmap
©2016Cavium,Inc.– ConfidentialandProprietaryInformation
ARM Processor Technology Update and Roadmap
§ Cavium:GiriChukkapalliisaDistinguishedEngineerintheDataCenterGroup(DCG)
Introduction to ARM Architecture for HPC deployment and rationale for the design pointof ThunderX2 Core and SOC in the single thread performance vs throughput space ispresented in this talk. Focus is on the sustained performance per TCO and ease ofexploiting spectral parallelism of HPC applications. Preliminary experience of porting,running and performance analysis of HPC applications will be discussed.
ThunderX2 in HPC
©2017Cavium,Inc.– ConfidentialandProprietaryInformation4
Cavium Corporate Overview
Multi-Core MIPS, ARM Processors, Security, SDN Switch andServer/Storage Connectivity ~$10B TAM
Mobile InfrastructureEnterprise Data Center and Cloud Service Provider Cloud
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
MostWidelyUsed
LicensingModel
ü Over90Bshippedin25yrsü Outshipsx86by20Xper
year
ü Anyonecanbuildü Innovate&Optimize
fortargetedapplications
ARM Servers & ARM for HPCARMforHPC
§ ARM=Choice&pathtomoreoptimizedsolutions
§ MarchtoExascale openingdoorfornewISA
• MassiveparallelismrequiresSWchanges
• ARMHPCprojectsactiveworldwide
• HPChaslargeopensourcecomponent
• ThrivingARMecosystemforHPC
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
Cavium’s Proven Leadership in Silicon Design
Highestperformance,mostwidelysupported,dualsocketARMv8serversinproduction
2coreto48core,varietyofpricepoints,TDP
CommonSWarchitecture
THECPUcompanyforInfrastructure
MulticoreCPUExpert
Performance2SConfig
HighPerfCustomCores
CompletePortfolio
ARMv8architecturallicensee
Power,Perf,AreaOptimized
#1inSecurity&WirelessInfrastructure,#2inEmbedded
OPTIMIZINGARM64SERVERSFORHPC&CLOUDDATACENTER
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
§ Multithreaded,fullyoutoforderhighperformanceARMv8customcores
§ Singleanddualsocketsupport§ Highestmemorybandwidth&capacity§ Serverclassvirtualization§ ServerclassRAS
§ Extensivepowermanagement§ RichIOconfigurations§ ExtensivePowermanagement§ CoreandSocketlevelperformancecompetitivewith
nextgenincumbentserverCPUs§ Comprehensivehardwareandsoftwareecosystem
ARM Leadership –ThunderX2 FIRSTS for ARM Processors
® World’s Highest Performance Xeon Class ARM Server – 2nd generation product from Cavium
7
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
Differentiation
PCIeLanes
MemoryCapacity
MemoryBandwidth
TotalThreads
Cores Highercorecountdelivershigherthroughput
Higherthreadcount=largernumberofvCPUs
More memorybandwidthformemoryintensiveworkloads
Morememorycapacityforin-memoryworkloads
RichIOconnectivityoptionsDirectattachtoVMe devices
IncumbentServerCPUThunderX2
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
:ThrivingHPCEcosystem
StandardsBased SysManagement&FW
IndustryLeadingOperatingSystems
Linux EnterpriseSLE12
OptimizedCompilers&DevEnvironments
Debuggers,Profilers&ClusterMgt
OpenSource&CommunityFocus
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
ThunderX Momentum in HPC Continues to Grow…
Mem
ory
Band
width
Integer
Floatin
gPo
int
Vectors
1.0
2.2X2.5X
3X4X
2-4X better HPC performance
SignificantHPCEngagementsEarlyPressAnnouncements
ServerplatformsatWorld’spremierHPCLabs
Early Performance for HPC Applications
©2017Cavium,Inc.– ConfidentialandProprietaryInformation
ThunderX2 Delivers Compelling Memory Bandwidth
Details: ThunderX2CPULinuxkernel4.8.0-32-generic(4kpages)StreamcompiledwithGCCversion5.4.0-6Ubuntu16.04.4at-O3
0102030405060708090
100
%ofp
eakband
width
%ofcpuloadload
StreamScaling
copy scale add triad
HighestMemorybandwidthenablesmemoryboundapplicationstoscalebetter
©2017Cavium,Inc.– ConfidentialandProprietaryInformation13
OpenBLAS DGEMM
Details: ThunderX2CPULinuxkernel4.8.0-32-generic(4kpages)OpenBLAS compilewithGCCversion5.4.0-6ubuntu1~16.04.4at-O3
Efficientcorescapableofachievingclosetotheoreticalpeakperformance
©2017Cavium,Inc.– ConfidentialandProprietaryInformation14
OpenBLAS SGEMM
Details: ThunderX2CPULinuxkernel4.8.0-32-generic(4kpages)OpenBLAS compilewithGCCversion5.4.0-6ubuntu1~16.04.4at-O3
Efficientcorescapableofachievingclosetotheoreticalpeakperformance
Efficientcorescapableofachievingclosetotheoreticalpeakperformance
©2017Cavium,Inc.– ConfidentialandProprietaryInformation15
ThunderX2 Delivers Best-In-Class HPL Performance
Details: ThunderX2CPULinuxkernel4.8.0-32-generic(4kpages)HPLcompilewithGCCversion5.4.0-6ubuntu1~16.04.4(defaults)mpich v3.2,noopenmp,singlesockettest,processgridforeachtestcasebasedonnumberofcoresintest
Efficientcorescapableofachievingclosetotheoreticalpeakperformance
0
20
40
60
80
100
3 13 19 25 28 38 50 63 75 78 94 100
%ofp
eakpe
rform
ance
%ofsystemload
HPLscaling- %ofpeakGflops
©2017Cavium,Inc.– ConfidentialandProprietaryInformation16
ThunderX2 Performance Scaling on Real Applications
Highmemorythroughputbenefitssimplesimulations
Largehighperformancecorecountcombinedwithhighmemorythroughputbenefitscomplexsimulations