The expert's mission in China, May 9-13, 2016
TRANSCRIPT
Le Calcul Haute Performance en Chine / High Performance Computing in China
Serge G. Petiton
Outline
• Introduction and context
• The expert’s mission in China, May 9-13, 2016
• Other visits and discussions
• Projects
• Conclusion
ORAP, October 18, 2016
Introduction and context
Chinese supercomputers are #1 (93 Pf) and #2 (Linpack).
Approximately:
• #1 is 5x faster than the first US supercomputer (Titan)
• #1 is 18x faster than the first in France (TOTAL, 5.2 Pf, #11)
• #1 is 45x faster than Prolix (Meteo, 2.1 Pf, #40)
• #1 is 60x faster than the first GENCI supercomputer (1.6 Pf, #53)
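The ratios above can be checked with a few lines of arithmetic. This is only a sketch: the Linpack figures for TOTAL, Prolix and GENCI are the ones quoted on the slide, while the 17.59 Pf used for Titan is an assumption (its June 2016 Top500 Linpack value), since the slide does not state it. The dictionary keys and the function name are illustrative.

```python
# Linpack performance in petaflops (Pf).
TAIHULIGHT_PF = 93.0  # the #1 system, as quoted on the slide

reference_systems_pf = {
    "Titan (first US system)": 17.59,  # assumption: not stated on the slide
    "TOTAL (first in France)": 5.2,
    "Prolix (Meteo)": 2.1,
    "First GENCI system": 1.6,
}


def speed_ratios(top_pf, others_pf):
    """Return how many times faster the top system is than each other system."""
    return {name: top_pf / pf for name, pf in others_pf.items()}


ratios = speed_ratios(TAIHULIGHT_PF, reference_systems_pf)

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"#1 is about x{ratio:.1f} faster than {name}")
```

The exact quotients (roughly 5.3, 17.9, 44.3 and 58.1) are close to the rounded factors given on the slide.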
Processors made in China (#1)
Several important Chinese projects on HPC. Exascale supercomputer scheduled for 2020 (35 MW).
Ecosystem and research structures for HPC in China? Applications and uses of supercomputers? Is a French-Chinese collaboration on HPC possible?
Expert’s mission co-organized with the French Embassy in Beijing
Visiting Professor at the CAS in 2016. Other visits (HPC China, in Wuxi, …)
Several visits to different laboratories and companies
This talk: an attempt to share observations, discussions, remarks and other considerations on HPC in China.
Only other missions and international collaborations would allow a more realistic view.
The expert’s mission in China, May 9-13, 2016
The idea of this expert’s mission on HPC in China came from discussions during the HPC China conference in Wuxi, in November 2015.
Mission schedule:
• First Chinese-French Workshop on Extreme Computing (CFWEC2016), organized with Yutong Lu on May 9th and 10th at the National SuperComputing Center in Guangzhou (NSCC-GZ)
• Visits co-organized with Abdo Malloc (French Embassy in China, Beijing) and Haiwu He (Chinese Academy of Sciences)
o Sugon
o ParaTera
o CNIC at CAS
o CAS-CNIC supercomputing centre
o Tsinghua University
o Inspur
o National Supercomputing Center at Jinan
The report should be online soon at orap.irisa.fr
Contact at the French Embassy: Abdo Mallac
With Gabriel Antoniu (INRIA), Christophe Calvin (CEA), Thierry Collette (CEA), Michel Dayde (CNRS), Diego Klahr (TOTAL), Xavier Vigouroux (Atos)
Goal: scientific exchanges with Chinese researchers and leaders of HPC in China. Discussions for future collaborations.
Chinese speakers at CFWEC2016:
• Depei Qian: Professor at Xi’an Jiaotong University and at Beihang University (Beijing), member of the National Committee of High Technology Programs.
• Yutong Lu: Director of the National SuperComputing Center at Guangzhou and (until 2016) Professor at the National University of Defense Technology (NUDT). Deputy Chief Designer of Tianhe 2 and of the China accelerator Matrix 2000.
• Chao Yang: Professor at Beihang University and at the Institute of Software of the CAS
• Hui Yan, Liang Liu, Jingkun Chen, Jianhui Li: researchers at the NSCC-GZ
Presentation of the supercomputing centers in China and of the Chinese National Grid (CNGrid).
China was not able to obtain Intel processors (US embargo) for the Tianhe 2A. They needed processors and accelerators, so they developed their own domestic processors.
New five-year national plan: HPC is a priority
Next objective: several hundred petaflops
Exascale machine (2020, 35 MW): 3D processors, optical communications, on-chip network, …
China wants to develop international collaborations on HPC; for Depei Qian, this workshop is a first step. A China-EU collaboration already exists. HPC is an identified topic which was never launched.
Need for multidisciplinary research
Lack of “talents” for HPC in China.
“Without an ecosystem for our domestic processor, we will not succeed”
Depei Qian’s talk
CNGrid
Yutong Lu’s talk
Yutong Lu described the development of supercomputers at NUDT, especially the Tianhe ones.
No PGAS language or new programming paradigms on the Tianhe 2 software stack (but where is it proposed?)
All the national centers integrate HPC and Big Data: High Performance Data Analytics (HPDA).
Cloud is considered in the software stack.
Nevertheless, they are still separated at the national level. Some centers focus only on HPC and others already on HPDA.
Hadoop and Spark are on the same Tianhe 2 Starlight software stack, with MPI and OpenMP.
Tianhe 2 has a file system called H2FS, built on top of HDFS
During the workshop, HPC, Big Data and Clouds were often considered as part of the same ecosystem; the CNGrid and the SCGrid are also well integrated.
All the specifications are well presented, but it is really difficult to know the degree of realization and the number of applications deployed with this HPDA paradigm. Nevertheless, the idea, the dynamism and the context are real!
Software:
• Paramon: data collector, 5% overhead. Scalable up to 1000 nodes. No storage. The API may give other metrics. x86, GPU, Xeon Phi and Power.
• Paratune: characterisation of codes with respect to the communication throughput.
• OITS: system supervision with alerting by mail.
They focus on end-users and are acquiring expertise to develop software for future Chinese machines
Collaborations with:
• CNPC BGP, PetroChina and Sinopec; research institutes such as the SC-CAS of the CAS; Tsinghua University.
• HP, Dell, Lenovo and Intel, for example
Professor Zhang Youhui and his team, HPC Research and Education and Institute of HPC (IHPC), a pioneer in China in building HPC clusters based on the x86 architecture
• Development of mathematical models for scientific simulations
• Brain-inspired computing. Management of dendrite-axon using a 256x256 crossbar: processor made in China
• Approximate computing
• Neural computation accelerators
The IHPC is an important evaluation centre for the national HPC projects
This laboratory seems to be involved in (almost) all the HPC projects in China.
The Computer Network Information Center, CAS, Beijing
The SCGrid: interconnecting the CAS supercomputing centers
Research supported (62%) by the National Natural Science Foundation of China (NSFC)
Numerical algorithms, Cloud and Grid services, brain simulation, genomics. The CNIC also develops a tight collaboration with the China Electric Power Research Institute (CEPRI) on new energy management.
Deep learning is a new important topic (teams had to move to this topic while I was there)
Several talks, including
• Zhao Di, of the Advanced Brain Computing of CAS: research on brain simulation using GPU for neural convolutions with deep learning.
• Wei Chen (center of scientific computing applications & research) on large scale physical simulation: soft condensed matter physics.
• …
Joint lab CNIC-Intel and collaboration with CERN
SC-CAS: supercomputing center of the CNIC-CAS
The SC-CAS
• Visualisation and HPC application development support for the CAS, in particular for the CNIC
• Management of the SCGrid
• Management of the largest node of the CNGrid in the north of China
We visited the two companies most involved in the development of HPC machines in China:
• Sugon (ex Dawning) in Beijing (private company)
• Inspur (ex Langchao) in Jinan
Sugon developed about 10% of the Top500 machines (4th) and Inspur about 4%
The fastest systems are developed by National Centers or Universities (NUDT) with the expertise of the CAS (ICT mainly, but also other institutes), and of the IHPC of Tsinghua University.
Sugon
First in the Chinese HPC market: HPC and Data Centers
The Sugon 7000 cube (earth system numerical simulation Silicon Cube): 3600 processors
Lenovo: new Innovation center visited
Chinese companies: Sugon, Inspur
Intel technology: CPU and MIC
Targets climate modeling, with the CAS
Expertise to build high-performance hardware, including fast networks for HPC
Inspur
First Chinese company to be involved in the IT market. Hardware for servers, but also develops some software. HPC, Big Data and Cloud. Large market with Baidu, Alibaba and others
Expertise to build high-performance machines in relation with the ICT of CAS
Two joint laboratories with Nvidia and Intel. An office in Silicon Valley, at Fremont.
National Supercomputing Center at Jinan
June 2011
June 2016
Sunway BlueLight MPP
• 796 teraflops, Linpack efficiency: 74.4%
• 140,000 cores (approx.), 8704 processors
• 1 node with 256 sockets and 4 terabytes of memory
• Processor made in China: ShenWei SW1600, 16 cores, 128 gigaflops, 16 GB of RAM
• 1 MWatt (741 megaflops/Watt)
• InfiniBand with switches made in China
Important knowledge acquired to develop the following Sunway supercomputer, now #1 (Linpack).
One of the known Chinese tracks to exascale
Other visits and discussions
Institute of Software (IS), CAS, Beijing
Several international collaborations
Linear algebra, deep learning, optimizations, …
Research on distributed and parallel computing.
Discussions on Krylov methods and auto-tuning at runtime for restarted linear algebra methods
A lot of international collaborations (we discussed future French-Chinese ones)
The majority of the researchers spent time in the USA (Argonne, Yale, …) or in Europe (Germany, …)
Involved in Chinese HPC projects and other CAS application developments
With the Institute of Computing Technology (ICT), probably the most important CAS institute for HPC (excluding applications). It seems that the CNIC is now largely dedicated to deep learning computing (since spring 2016)
(invited by Chao Yang)
Lenovo Enterprise Innovation Center (EIC), Beijing
With Laurent Bodelin (CAS): first visitors
Benchmarks, demos, training
Showrooms
HPC is one of the 3 tech. fields
The same initiative exists in the US and in Germany
Projects
The China Accelerator (Matrix 2000 project led by Yutong Lu)
The General Purpose Digital Signal Processor (GPDSP) Matrix 2000 project was first scheduled for the Tianhe 2A, which was supposed to be the first supercomputer to reach 100 petaflops, before the US embargo.
It was supposed to reach 2.4 teraflops (64 bits) for 200 Watts, running at 1 GHz.
Each core has a scalar unit and a vector unit, with VLIW
Nevertheless, Tianhe 2A will use the ARM processor “Mars” developed in China, and the Matrix 2000 accelerator project seems to have been abandoned for HPC machines, for the moment (it is difficult to get any confirmation).
The ARM processor ARMv8, and the Scalable Vector Extension (SVE)
Announced at Hot Chips 2016, August 2016
Vector: 2048 bits
Back to the time of the automatic vectorization mirage: SIMD versus flow parallelism (pipeline) or data-parallel programming paradigms
The Mars chip developed by Phytium in China
ARM chip based on the ARMv8, 512 gigaflops for 100 Watts, 2 GHz. 500 M$. 28 nm, TSMC bulk, in China
Each panel has 8 “Xiaomi” cores, for a set of 64 cores
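To compare the two chip projects, the quoted figures can be turned into energy efficiency (gigaflops per Watt) and, for Mars, into floating point operations per cycle and per core. This is a rough sketch using only the numbers given on the slides (2.4 teraflops for 200 Watts for the Matrix 2000 target; 512 gigaflops, 100 Watts, 2 GHz and 64 cores for Mars); the function names are illustrative.

```python
def gflops_per_watt(gflops, watts):
    """Energy efficiency in gigaflops per Watt."""
    return gflops / watts


def flops_per_cycle_per_core(gflops, cores, clock_ghz):
    """Floating point operations issued per cycle by each core at peak."""
    return gflops / (cores * clock_ghz)


# Matrix 2000 target: 2.4 teraflops (64 bits) for 200 Watts.
matrix2000_efficiency = gflops_per_watt(2400.0, 200.0)

# Mars: 512 gigaflops for 100 Watts, 64 cores at 2 GHz.
mars_efficiency = gflops_per_watt(512.0, 100.0)
mars_flops_per_cycle = flops_per_cycle_per_core(512.0, 64, 2.0)

if __name__ == "__main__":
    print(f"Matrix 2000 target: {matrix2000_efficiency:.2f} Gflops/W")
    print(f"Mars: {mars_efficiency:.2f} Gflops/W")
    print(f"Mars: {mars_flops_per_cycle:.0f} flops per cycle per core")
```

With these figures, the Matrix 2000 target is 12 gigaflops per Watt and Mars about 5.1 gigaflops per Watt, which corresponds to 4 floating point operations per cycle on each of the 64 Mars cores.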
The Sunway TaihuLight MPP
Developed by the National Research Center of Parallel Computer Engineering and Technology (NRCPC) in Shanghai and installed at the National Supercomputer Center in Wuxi.
93 petaflops (Linpack), peak performance of 125 petaflops (efficiency: 75%). 165 120 nodes and 1.2 petabytes of memory (7.2 terabytes for each node). The electric consumption is 15.371 MWatts (Linpack), i.e. 6 gigaflops per Watt.
The Sunway TaihuLight processor is the SW 26010, designed by the Shanghai High Performance IC Design Center
Nevertheless, for the High Performance Conjugate Gradients (HPCG) benchmark, the performance is only 0.371 petaflops (0.3% of the peak performance): memory and network are not “powerful” enough.
Cf. Jack Dongarra’s report on the TaihuLight MPP
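The efficiency figures on this slide follow directly from the quoted performance and power numbers. The sketch below recomputes them; it uses only values from the slide (93 and 125 petaflops, 0.371 petaflops for HPCG, 15.371 MWatts), and the variable names are illustrative.

```python
LINPACK_PF = 93.0    # Linpack performance, petaflops
PEAK_PF = 125.0      # peak performance, petaflops
HPCG_PF = 0.371      # HPCG performance, petaflops
POWER_MW = 15.371    # electric consumption under Linpack, megawatts

# Fraction of the peak performance reached by each benchmark.
linpack_efficiency = LINPACK_PF / PEAK_PF
hpcg_fraction_of_peak = HPCG_PF / PEAK_PF

# 1 petaflops = 1e6 gigaflops and 1 MW = 1e6 W, so Pf/MW equals Gflops/W.
linpack_gflops_per_watt = LINPACK_PF / POWER_MW

if __name__ == "__main__":
    print(f"Linpack efficiency: {100 * linpack_efficiency:.1f} %")
    print(f"HPCG fraction of peak: {100 * hpcg_fraction_of_peak:.2f} %")
    print(f"Energy efficiency: {linpack_gflops_per_watt:.2f} Gflops/W")
```

The Linpack efficiency comes out at about 74%, which the slide rounds to 75%, while HPCG reaches about 0.3% of the peak and the energy efficiency is about 6 gigaflops per Watt.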
Each group of cores is composed of a management unit (MPE) and 64 computing units (CPE) structured on an 8x8 grid, connected by a network on chip (NoC)
On each SW 26010 processor, made in China, we have 4 CPE clusters (64 x 4 = 256 cores) and 4 MPEs, i.e. 260 cores, and 4 memory controllers (MC). Each MC has 8 gigabytes of memory.
The whole machine has 40,960 nodes and 10,649,600 cores for 1.31 petabytes of memory.
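The core and memory totals can be rebuilt from the per-processor description. This is a minimal sketch that assumes one processor per node (which the quoted totals imply) and decimal units (1 petabyte = 10^6 gigabytes); the constant names are illustrative.

```python
CORE_GROUPS_PER_PROCESSOR = 4   # each group: 1 MPE + 64 CPEs
CPES_PER_GROUP = 64             # 8x8 grid of computing units
MPES_PER_GROUP = 1              # one management unit per group
MEMORY_CONTROLLERS = 4          # per processor
GB_PER_MEMORY_CONTROLLER = 8
NODES = 40960                   # assumption: one processor per node

cores_per_processor = CORE_GROUPS_PER_PROCESSOR * (CPES_PER_GROUP + MPES_PER_GROUP)
total_cores = NODES * cores_per_processor

memory_per_node_gb = MEMORY_CONTROLLERS * GB_PER_MEMORY_CONTROLLER
total_memory_pb = NODES * memory_per_node_gb / 1e6  # decimal petabytes

if __name__ == "__main__":
    print(f"Cores per processor: {cores_per_processor}")
    print(f"Total cores: {total_cores}")
    print(f"Memory per node: {memory_per_node_gb} GB")
    print(f"Total memory: {total_memory_pb:.2f} PB")
```

Under these assumptions the sketch gives 260 cores per processor, 10,649,600 cores and about 1.31 petabytes in total, matching the figures quoted for the whole machine.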
Each core computes 8 double-precision floating point operations per cycle
Core: 1.45 GHz
MPE peak performance: 23.2 gigaflops, computes 16 floating point operations per cycle
Each node performance: 3.06 teraflops.
40 racks, each with 4 supernodes of 256 nodes each
Peak performance of 125.4 petaflops
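The peak performance can be derived step by step from the per-core figures. The sketch below assumes that the 8 operations per cycle apply to the 256 computing cores (CPEs) of a node and the 16 operations per cycle to its 4 management cores (MPEs), a reading that reproduces the quoted node and machine peaks; the constant names are illustrative.

```python
CLOCK_GHZ = 1.45
CPE_FLOPS_PER_CYCLE = 8    # assumption: applies to the computing cores
MPE_FLOPS_PER_CYCLE = 16
CPES_PER_NODE = 256        # 4 groups of 64 computing cores
MPES_PER_NODE = 4

RACKS = 40
SUPERNODES_PER_RACK = 4
NODES_PER_SUPERNODE = 256

# Per-core peaks in gigaflops: operations per cycle times clock in GHz.
cpe_peak_gf = CPE_FLOPS_PER_CYCLE * CLOCK_GHZ
mpe_peak_gf = MPE_FLOPS_PER_CYCLE * CLOCK_GHZ

# Node peak: all computing cores plus all management cores.
node_peak_gf = CPES_PER_NODE * cpe_peak_gf + MPES_PER_NODE * mpe_peak_gf

nodes = RACKS * SUPERNODES_PER_RACK * NODES_PER_SUPERNODE
machine_peak_pf = nodes * node_peak_gf / 1e6  # gigaflops to petaflops

if __name__ == "__main__":
    print(f"MPE peak: {mpe_peak_gf:.1f} Gflops")
    print(f"Node peak: {node_peak_gf / 1000:.2f} Tflops")
    print(f"Nodes: {nodes}")
    print(f"Machine peak: {machine_peak_pf:.1f} Pflops")
```

With this reading, a node peaks at about 3.06 teraflops and the 40,960 nodes together at about 125.4 petaflops, in agreement with the slide.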
Conclusion
Chinese HPC supercomputers and frameworks
China has had the fastest supercomputers for a few years and the largest number of machines on the Top500 since June 2016. Nevertheless, they are not always very well adapted to computations other than “Linpack” (which is also the case for some other computers), but the difference is so large that, in the end, the computing capacity is probably still considerably larger.
It is not only a matter of peak or sustained computing performance, but also of the possibility to do research on future exascale-and-beyond algorithms and programming paradigms! With larger machines, we may do more credible research on new resilient and energy-efficient algorithms and evaluate new programming paradigms (PGAS, graphs of components/tasks, …)
China has at least two known tracks to develop an exascale supercomputer by 2020: the Sunway (Chinese RISC processor) and the Tianhe (TH3, using ARM Phytium processors with SVE?) ones.
The challenge is also, and perhaps mostly, to develop the associated ecosystem.
Chinese challenge
First: develop the whole ecosystem associated with their made-in-China machines, and educate “talents” to develop efficient HPC and Big Data applications
Improve international collaborations on HPC and Big Data
French-Chinese collaborations on HPC
We decided, during discussions at NSCC-GZ, to write a white paper about developing Chinese-French research on Extreme Computational and Data Science. A draft of this paper is under review.
We established a first list of potential joint research topics (to be confirmed):
• Linear algebra
  - first experiments on Tianhe 2 at NSCC-GZ: unite-and-conquer asynchronous GMRES/ERAM-LS (realistic tests to be launched asap)
• Programming paradigms and languages
  - YML-TEZ while at CAS-CNIC: mixing computational and data computing
• I/O systems and management
• Runtime systems
• Applications: data science, climate, medicine, life science, …
• Compilers and automatic parallelization
• Machine learning
It was decided at the end of the workshop in Guangzhou to organize a second Chinese-French Workshop on Extreme Computing (CFWEC2017) next spring, probably at the Maison de la Simulation in Saclay.
Final remarks
At the 39th ORAP forum, scheduled on March 28th at CNRS Michel-Ange, Yutong Lu or another Chinese HPC expert is expected to give a talk.
Isn’t it a pleasure to study and practice what you have learned? Confucius
Our very first results on Tianhe 2, with Xinzhe Wu (U. Lille 1, CNRS/MDLS)
Institute of Computing Technology (ICT), CAS, Beijing (invited by the director Yunquan Zhang, but …)
They invited me to give a talk; we tried to visit the ICT during the expert’s mission, but …