The expert's mission in China, May 9-13, 2016

Le Calcul Haute Performance en Chine High Performance Computing in China Serge G. Petiton

Page 1: The expert's mission in China, May 9-13, 2016

Le Calcul Haute Performance en Chine
High Performance Computing in China

Serge G. Petiton

Page 2: The expert's mission in China, May 9-13, 2016

Outline

• Introduction and context
• The expert’s mission in China, May 9-13, 2016
• Other visits and discussions
• Projects
• Conclusion

ORAP, October 18, 2016

Page 4: The expert's mission in China, May 9-13, 2016

Chinese supercomputers are #1 (93 Pf) and #2 (Linpack). Approximately:
#1 is x5 faster than the first US supercomputer (Titan),
#1 is x18 faster than the first in France (TOTAL, 5.2 Pf, #11),
#1 is x45 faster than Prolix (Meteo, 2.1 Pf, #40),
#1 is x60 faster than the first GENCI supercomputer (1.6 Pf, #53).
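As a rough check of the ratios quoted above, the following sketch divides the 93 Pf Linpack figure of #1 by the Linpack figures given on the slide. The Titan value (17.59 Pf) does not appear on the slide; it is an assumption taken from the June 2016 Top500 list. All names in the code are illustrative.

```python
# Linpack performance in petaflops.
# All values come from the slide except Titan, which is an assumption
# (June 2016 Top500 list).
TAIHULIGHT_PF = 93.0
OTHERS_PF = {
    "Titan (US)": 17.59,
    "TOTAL (France)": 5.2,
    "Prolix (Meteo)": 2.1,
    "GENCI": 1.6,
}


def speed_ratios(reference_pf, others_pf):
    """Return how many times faster the reference machine is than each other one."""
    return {name: reference_pf / pf for name, pf in others_pf.items()}


ratios = speed_ratios(TAIHULIGHT_PF, OTHERS_PF)
for name, ratio in ratios.items():
    print(f"#1 is about x{ratio:.0f} faster than {name}")
```

With these inputs the ratios round to 5, 18, 44 and 58, which is consistent with the approximate x5, x18, x45 and x60 given on the slide.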

Processors made in China (#1)

Several important Chinese projects on HPC. Exascale supercomputer scheduled for 2020 (35 MW).

Ecosystem and research structures for HPC in China? Applications and uses of supercomputers? Is a French-Chinese collaboration on HPC possible?

Expert’s mission co-organized with the French Embassy in Beijing

Visiting Professor at the CAS in 2016. Other visits (HPC China, in Wuxi, ….)

Several visits to different laboratories and companies

This talk: an attempt to share observations, discussions, remarks and other considerations on HPC in China.

Only further missions and international collaborations would give a more realistic view

Page 5: The expert's mission in China, May 9-13, 2016

Outline

• Introduction and context
• The expert’s mission in China, May 9-13, 2016
• Other visits and discussions
• Projects
• Conclusion

Page 6: The expert's mission in China, May 9-13, 2016

The idea of this expert’s mission on HPC in China came from discussions during the HPC China conference in Wuxi, in November 2015.

Mission schedule:

• First Chinese-French Workshop on Extreme Computing (CFWEC 2016), organized with Yutong Lu on May 9th and 10th at the National SuperComputing Center in Guangzhou (NSCC-GZ)

• Visits co-organized with Abdo Malloc (French Embassy in China, Beijing) and Haiwu He (Chinese Academy of Sciences)

o Sugon
o ParaTera
o CNIC at CAS
o CAS-CNIC supercomputing centre
o Tsinghua University
o Inspur
o National Supercomputing Center at Jinan

The report will be online soon at orap.irisa.fr

Contact at the French Embassy: Abdo Mallac

With Gabriel Antoniu (INRIA), Christophe Calvin (CEA), Thierry Collette (CEA), Michel Dayde (CNRS), Diego Klahr (TOTAL), Xavier Vigouroux (Atos)

Page 7: The expert's mission in China, May 9-13, 2016

Goal: scientific exchanges with Chinese researchers and leaders of HPC in China. Discussions for future collaborations.

Chinese speakers at CFWEC 2016:

• Depei Qian: Professor at Xi’an Jiaotong University and at Beihang University (Beijing), member of the National Committee of High Technology Programs.

• Yutong Lu: Director of the National SuperComputing Center at Guangzhou and (until 2016) Professor at the National University of Defense Technology. Deputy chief designer of Tianhe 2 and of the China accelerator Matrix 2000.

• Chao Yang: Professor at Beihang University and at the Institute of Software of the CAS

• Hui Yan, Liang Liu, Jingkun Chen, Jianhui Li: researchers at the NSCC-GZ

Page 8: The expert's mission in China, May 9-13, 2016

Presentation of the supercomputing centers in China and of the Chinese National Grid (CNGrid).

China was not able to obtain Intel processors (US embargo) for the Tianhe 2A. They needed processors and accelerators, so they developed their own domestic processors.

New five-year national plan: HPC is a priority

Next objective: several hundred petaflops

Exascale machine (2020, 35 MW): 3D processors, optical communications, on-chip network, ….

China wants to develop international collaborations on HPC; for Depei Qian this workshop is a first step. A China-EU collaboration already exists. HPC is an identified topic which was never launched.

Need for multidisciplinary research

Lack of “talents” for HPC in China.

“Without an ecosystem for our domestic processor, we will not succeed”
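To give a sense of what the 35 MW exascale target implies, the short sketch below converts it into an energy efficiency figure. It assumes that "exascale" means exactly 10^18 floating-point operations per second and that the whole 35 MW budget is available for computing. Both are simplifying assumptions, and the names are illustrative.

```python
def gflops_per_watt(flops_per_second, power_watts):
    """Energy efficiency in gigaflops per watt."""
    return flops_per_second / power_watts / 1e9


EXAFLOP = 1e18          # assumption: exascale taken as exactly 10^18 flop/s
POWER_BUDGET_W = 35e6   # 35 MW, as stated on the slide

target = gflops_per_watt(EXAFLOP, POWER_BUDGET_W)
print(f"Required efficiency: {target:.1f} Gflops/W")
```

Under these assumptions the machine would need roughly 28.6 gigaflops per watt.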

Depei Qian’s talk

CNGrid

Page 9: The expert's mission in China, May 9-13, 2016

Yutong Lu’s talk

Yutong Lu described the development of supercomputers at NUDT, especially the Tianhe ones.

Page 11: The expert's mission in China, May 9-13, 2016

No PGAS language or new programming paradigms in the Tianhe 2 software stack (but where is it proposed?)

All the national centers integrate HPC and Big Data: High Performance Data Analytics (HPDA).

Cloud is considered in the software stack.

Nevertheless, it is still separated at the national level. Some centers focus only on HPC and others already on HPDA.

Hadoop and Spark are on the same Tianhe 2 Starlight software stack, with MPI and OpenMP.

Tianhe 2 has a file system called H2FS, built on top of HDFS

During the workshop, HPC, Big Data and Clouds were often considered as the same ecosystem; the CNGrid and the SCGrid are also well integrated.

All the specifications are well presented, but it is really difficult to know the degree of realization and the number of applications deployed with this HPDA paradigm. Nevertheless, the idea, the dynamism and the context are real!

Page 12: The expert's mission in China, May 9-13, 2016

Software:
• Paramon: data collector, 5% overhead. Scalable up to 1000 nodes. No storage. The API may give other metrics. x86, GPU, Xeon Phi and Power.
• Paratune: characterisation of codes with respect to communication throughput.
• OITS: system supervision with alerting by mail.

They focus on end users and are acquiring expertise to develop software for future Chinese machines

Collaborations with:

• CNPC BGP, PetroChina and Sinopec; research institutes such as SC-CAS, of CAS; Tsinghua University.

• HP, Dell, Lenovo and Intel, for example

Professor Zhang Youhui and his team, HPC Research and Education and Institute of HPC (IHPC), pioneers in China in building HPC clusters based on the x86 architecture

• Development of mathematical models for scientific simulations
• Brain-inspired computing. Management of dendrite-axon connections using a 256x256 crossbar: processor made in China
• Approximate computing
• Neural computation accelerators

IHPC is an important evaluation centre for the national HPC projects

This laboratory seems to be involved in (almost) all the HPC projects in China.

Page 13: The expert's mission in China, May 9-13, 2016

The Computer Network Information Center, CAS, Beijing

The SCGrid: interconnecting the CAS supercomputing centers

Research supported (62%) by the National Science Foundation of China (NSFC)

Numerical algorithms, Cloud and Grid services, brain simulation, genomics. The CNIC is also developing a close collaboration with the China Electric Power Research Institute (CEPRI) on new energy management.

Deep learning is a new important topic (teams had to move to this topic while I was there)

Several talks, including:
• Zhao Di, of the Advanced Brain Computing of CAS: research on brain simulation using GPU for neural convolutions with deep learning.

• Wei Chen (center of scientific computing applications & research) on large-scale physical simulation: soft condensed matter physics.

• ….

Joint lab CNIC-Intel and collaboration with CERN

SC-CAS: supercomputing center of the CNIC-CAS

Page 14: The expert's mission in China, May 9-13, 2016

The SC-CAS

• Visualisation and HPC application development support for CAS, in particular for CNIC
• Management of the SCGrid
• Management of the largest CNGrid node in the north of China

Page 15: The expert's mission in China, May 9-13, 2016

We visited the two companies most involved in the development of HPC machines in China:
• Sugon (ex Dawning) in Beijing (private company)
• Inspur (ex Langchao) in Jinan

Sugon developed about 10% of the Top500 machines (4th) and Inspur about 4%

The fastest systems are developed by National Centers or Universities (NUDT) with the expertise of the CAS (ICT mainly, but also other institutes) and of the IHPC of Tsinghua University.

Sugon

First in the Chinese HPC market: HPC and Data Centers. The Sugon 7000 cube (earth system numerical simulation, Silicon Cube): 3600 processors

Lenovo: new Innovation Center visited

Chinese companies: Sugon, Inspur

Page 16: The expert's mission in China, May 9-13, 2016

Intel technology: CPU and MIC. Targeted at climate modeling, with the CAS

Expertise to build high-performance hardware, including fast networks for HPC

Page 17: The expert's mission in China, May 9-13, 2016

Inspur

First Chinese company to be involved in the IT market. Hardware for servers, but it also develops some software. HPC, Big Data and Cloud. Large market with Baidu, Alibaba and others

Expertise to build high-performance machines, in relation with the ICT of CAS

Two joint laboratories with Nvidia and Intel. An office in Silicon Valley, at Fremont.

National Supercomputing Center at Jinan, June 2011

June 2016
Sunway BlueLight MPP
796 teraflops, Linpack efficiency: 74.4%
140,000 cores (approx.), 8704 processors
1 node with 256 sockets and 4 terabytes of memory.
Processor made in China: ShenWei SW1600, 16 cores, 128 gigaflops, 16 GB of RAM.
1 MW (741 megaflops/Watt)
InfiniBand with switches made in China

Important knowledge acquired to develop the following Sunway supercomputer, now #1 (Linpack).

One of the known Chinese tracks to exascale

Page 18: The expert's mission in China, May 9-13, 2016

Outline

• Introduction and context
• The expert’s mission in China, May 9-13, 2016
• Other visits and discussions
• Projects
• Conclusion

Page 19: The expert's mission in China, May 9-13, 2016

Institute of Software (IS), CAS, Beijing

Several international collaborations

Linear algebra, deep learning, optimizations, …...

Research on Distributed and Parallel Computing.

Discussions on Krylov methods and auto-tuning at runtime for restarted linear algebra methods

A lot of international collaborations (we discussed future French-Chinese ones)

The majority of the researchers have spent time in the USA (Argonne, Yale, …) or in Europe (Germany, …)

Involved in Chinese HPC projects and other CAS application developments

With the Institute of Computing Technology (ICT), probably the most important CAS institute for HPC (excluding applications). It seems that the CNIC is now largely dedicated to deep learning computing (since spring 2016)

(invited by Chao Yang)

Page 20: The expert's mission in China, May 9-13, 2016

Lenovo Enterprise Innovation Center (EIC), Beijing

With Laurent Bodelin (CAS): first visitors

Benchmarks, demos, training

Showrooms

HPC is one of the 3 tech. fields. The same initiative exists in the US and in Germany

Page 21: The expert's mission in China, May 9-13, 2016

Outline

• Introduction and context
• The expert’s mission in China, May 9-13, 2016
• Other visits and discussions
• Projects
• Conclusion

Page 22: The expert's mission in China, May 9-13, 2016

The China Accelerator (Matrix 2000 project led by Yutong Lu)

The General Purpose Digital Signal Processor (GPDSP) Matrix 2000 project was first scheduled for the Tianhe 2A, which was supposed to be the first supercomputer to reach 100 petaflops, before the US embargo.

It was supposed to reach 2.4 teraflops (64 bits) for 200 Watts, running at 1 GHz.

Each core has a scalar unit and a vector unit, with VLIW

Nevertheless, Tianhe 2A will use the ARM processor “Mars” developed in China, and the Matrix 2000 accelerator project seems to have been abandoned for HPC machines, for the moment (it is difficult to get any confirmation).

The ARM processor ARMv8, and the Scalable Vector Extension (SVE)

Announced at Hot Chips 2016, August 2016. Vector: 2048 bits

Back to the time of the automatic vectorization mirage: SIMD versus flow (pipeline) parallelism or data-parallel programming paradigms

Page 23: The expert's mission in China, May 9-13, 2016

The Mars chip developed by Phytium in China

ARM chip based on the ARMv8, 512 gigaflops for 100 Watts, 2 GHz. 500 M$. 28 nm, TSMC bulk, in China

Each panel has 8 “Xiaomi” cores, for a total of 64 cores
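The figures quoted for the Mars chip and for the planned Matrix 2000 accelerator can be compared with a little arithmetic. The sketch below is only illustrative: it uses the peak performance, power and clock values from the slides, and the `Chip` class and its method names are hypothetical helpers introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """Peak figures for one chip, as quoted on the slides."""
    name: str
    peak_gflops: float
    power_watts: float
    clock_ghz: float

    def gflops_per_watt(self):
        """Peak energy efficiency."""
        return self.peak_gflops / self.power_watts

    def flops_per_cycle(self):
        """Peak floating-point operations per clock cycle for the whole chip."""
        return self.peak_gflops / self.clock_ghz


mars = Chip("Mars (Phytium)", peak_gflops=512.0, power_watts=100.0, clock_ghz=2.0)
matrix2000 = Chip("Matrix 2000 (planned)", peak_gflops=2400.0, power_watts=200.0, clock_ghz=1.0)
MARS_CORES = 64  # 8 panels of 8 cores, as stated on the slide

for chip in (mars, matrix2000):
    print(f"{chip.name}: {chip.gflops_per_watt():.2f} Gflops/W, "
          f"{chip.flops_per_cycle():.0f} flops/cycle")
print(f"Mars per core: {mars.flops_per_cycle() / MARS_CORES:.0f} flops/cycle")
```

On these peak numbers, Mars comes out at 5.12 gigaflops per watt (4 floating-point operations per cycle per core) and the planned Matrix 2000 at 12 gigaflops per watt.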

Page 24: The expert's mission in China, May 9-13, 2016

The Sunway TaihuLight MPP

Developed by the National Research Center of Parallel Computer Engineering and Technology (NRCPC) in Shanghai and installed at the National Supercomputer Center in Wuxi.

93 petaflops (Linpack), peak performance of 125 petaflops (efficiency: 75%). 165 120 nodes and 1.2 petabytes of memory (7.2 terabytes for each node). The electric consumption is 15.371 MW (Linpack), i.e. 6 gigaflops per Watt.

The Sunway TaihuLight processor is the SW26010, designed by the Shanghai High Performance IC Design Center

Nevertheless, for the High Performance Conjugate Gradients (HPCG) benchmark, the performance is only 0.371 petaflops (0.3% of the peak performance): memory and network are not “powerful” enough.
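The efficiency figures on this slide follow directly from the quoted performance and power numbers. A minimal sketch of the arithmetic, using only values from the slide (the variable names are illustrative):

```python
LINPACK_PF = 93.0    # Linpack performance, petaflops
PEAK_PF = 125.0      # peak performance, petaflops
HPCG_PF = 0.371      # HPCG performance, petaflops
POWER_MW = 15.371    # power consumption during Linpack, megawatts

# Fraction of peak reached by each benchmark.
linpack_efficiency = LINPACK_PF / PEAK_PF
hpcg_fraction = HPCG_PF / PEAK_PF

# Energy efficiency during Linpack, in gigaflops per watt.
gflops_per_watt = (LINPACK_PF * 1e15) / (POWER_MW * 1e6) / 1e9

print(f"Linpack efficiency: {linpack_efficiency:.1%}")
print(f"HPCG fraction of peak: {hpcg_fraction:.2%}")
print(f"Energy efficiency: {gflops_per_watt:.2f} Gflops/W")
```

This gives about 74% Linpack efficiency, about 0.3% of peak on HPCG, and about 6.05 gigaflops per watt.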

Cf. Jack Dongarra’s report on the TaihuLight MPP
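To illustrate why a conjugate gradient benchmark stresses memory and network rather than arithmetic units, here is a minimal, unpreconditioned conjugate gradient sketch in pure Python on a 1D Laplacian. It is not the HPCG benchmark, which uses a preconditioned solver on a much larger 3D problem; the problem size, tolerance and function names are assumptions chosen for this example. Each iteration performs one sparse matrix-vector product and a few vector updates, so only a handful of floating-point operations are done per value read from memory.

```python
def laplacian_1d(x):
    """Apply the 1D Laplacian stencil [-1, 2, -1] to x (zero boundary values)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a, b):
    """Dot product (a global reduction on a distributed machine)."""
    return sum(u * v for u, v in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only as matvec."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for iteration in range(max_iter):
        if rs_old ** 0.5 <= tol:
            return x, iteration
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Small test problem (assumed size): the exact solution is a vector of ones.
n = 50
expected = [1.0] * n
b = laplacian_1d(expected)
solution, iterations = conjugate_gradient(laplacian_1d, b)
max_error = max(abs(s - e) for s, e in zip(solution, expected))
print(f"converged in {iterations} iterations, max error {max_error:.2e}")
```

On a distributed machine, the two dot products per iteration become global reductions and the matrix-vector product requires neighbour exchanges, which is where network performance comes in.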

Page 25: The expert's mission in China, May 9-13, 2016

Each group of cores is composed of a management unit (MPE) and 64 computing units (CPE) structured on an 8x8 grid, connected by a network on chip (NoC)

On each SW26010 processor, made in China, we have 4 groups of CPEs (64 x 4 = 256 cores) and 4 MPEs, i.e. 260 cores, and 4 memory controllers (MC). Each MC has 8 gigabytes of memory.

The whole machine has 40,960 nodes and 10,649,600 cores for 1.31 petabytes of memory.
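The core and memory totals above can be rebuilt from the per-processor figures. The sketch below assumes one SW26010 processor per node and decimal units (1 petabyte = 10^6 gigabytes); both are assumptions made for this check, and the names are illustrative.

```python
CPES_PER_GROUP = 64              # 8 x 8 grid of computing units
MPES_PER_GROUP = 1               # one management unit per group
GROUPS_PER_NODE = 4              # assumption: one SW26010 processor per node
MEMORY_CONTROLLERS_PER_NODE = 4
GB_PER_MEMORY_CONTROLLER = 8
NODES = 40_960

cores_per_node = GROUPS_PER_NODE * (CPES_PER_GROUP + MPES_PER_GROUP)
total_cores = cores_per_node * NODES

memory_per_node_gb = MEMORY_CONTROLLERS_PER_NODE * GB_PER_MEMORY_CONTROLLER
total_memory_pb = memory_per_node_gb * NODES / 1e6   # decimal petabytes

print(f"{cores_per_node} cores per node, {total_cores:,} cores in total")
print(f"{memory_per_node_gb} GB per node, {total_memory_pb:.2f} PB in total")
```

Under these assumptions the result is 260 cores and 32 GB per node, hence 10,649,600 cores and about 1.31 petabytes for the whole machine, matching the totals on the slide.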

Page 26: The expert's mission in China, May 9-13, 2016

Each core computes 8 double-precision floating-point operations per cycle

Core: 1.45 GHz

MPE peak performance: 23.2 gigaflops; it computes 16 floating-point operations per cycle

Each node performance: 3.06 teraflops.

40 racks, each with 4 supernodes of 256 nodes each

Peak performance of 125.4 petaflops.
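The peak performance can be derived step by step from the figures on this slide. The sketch below reads "8 operations per cycle" as applying to each CPE, and takes the per-node counts (256 CPEs and 4 MPEs) from the processor description on the preceding slide; this reading is an assumption, and the names are illustrative.

```python
CLOCK_GHZ = 1.45
CPE_FLOPS_PER_CYCLE = 8      # assumption: the "8 per cycle" figure applies to CPEs
MPE_FLOPS_PER_CYCLE = 16
CPES_PER_NODE = 256
MPES_PER_NODE = 4
RACKS = 40
SUPERNODES_PER_RACK = 4
NODES_PER_SUPERNODE = 256

# Per-core peak: operations per cycle times clock frequency.
cpe_gflops = CPE_FLOPS_PER_CYCLE * CLOCK_GHZ
mpe_gflops = MPE_FLOPS_PER_CYCLE * CLOCK_GHZ

# Per-node peak: all CPEs plus all MPEs.
node_gflops = CPES_PER_NODE * cpe_gflops + MPES_PER_NODE * mpe_gflops

# Whole machine: racks x supernodes x nodes.
nodes = RACKS * SUPERNODES_PER_RACK * NODES_PER_SUPERNODE
peak_pflops = node_gflops * nodes / 1e6

print(f"MPE: {mpe_gflops:.1f} Gflops, node: {node_gflops / 1e3:.2f} Tflops")
print(f"{nodes} nodes, peak: {peak_pflops:.1f} Pflops")
```

With this reading, the computation reproduces the 23.2 gigaflops per MPE, 3.06 teraflops per node, 40,960 nodes and 125.4 petaflops quoted on the slides.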

Page 27: The expert's mission in China, May 9-13, 2016

Outline

• Introduction and context
• The expert’s mission in China, May 9-13, 2016
• Other visits and discussions
• Projects
• Conclusion (2 slides)

Page 28: The expert's mission in China, May 9-13, 2016

Chinese HPC supercomputers and frameworks

China has had the fastest supercomputers for a few years and the largest number of machines on the Top500 since June 2016. Nevertheless, they are not always very well adapted to computations other than Linpack (this is also the case for some other computers), but the difference is so large that in the end the computing capacity is probably still considerably larger.

It is not only a matter of peak or sustained computing performance, but also of the possibility to do research on future exascale-and-beyond algorithms and programming paradigms! With larger machines, we may do more credible research on new resilient and energy-efficient algorithms and evaluate new programming paradigms (PGAS, graphs of components/tasks, ….)

China has at least two known tracks to develop an exascale supercomputer by 2020: the Sunway (Chinese RISC processor) and the Tianhe (TH3, using ARM Phytium processors with SVE?) ones.

The challenge is also, and perhaps mostly, to develop the associated ecosystem.

Chinese challenge

First: develop the whole ecosystem associated with their made-in-China machines, and educate “talents” to develop efficient HPC and Big Data applications

Improve international collaborations on HPC and Big Data

Page 29: The expert's mission in China, May 9-13, 2016

French-Chinese collaborations on HPC

We decided, during discussions at NSCC-GZ, to write a white paper about developing Chinese-French research on Extreme Computational and Data Science. A draft of this paper is under review.

We established a first list of potential joint research topics (to be confirmed):
• Linear algebra
- first experiments on Tianhe 2 at NSCC-GZ: unite-and-conquer asynchronous GMRES/ERAM-LS (realistic tests to be launched asap)
• Programming paradigms and languages
- YML-TEZ while at CAS-CNIC: mixing computational and data computing
• I/O systems and management
• Runtime systems
• Applications: data science, climate, medicine, life science, …..
• Compilers and automatic parallelization
• Machine learning

It was decided at the end of the workshop in Guangzhou to organize next spring a second Chinese-French Workshop on Extreme Computing (CFWEC 2017), probably at the Maison de la Simulation in Saclay.

Final remarks

At the 39th ORAP forum, scheduled on March 28th at CNRS Michel-Ange, Yutong Lu or another Chinese HPC expert is expected to give a talk.

Our very first results on Tianhe 2, with Xinzhe Wu (U. Lille 1, CNRS/MDLS)

Page 30: The expert's mission in China, May 9-13, 2016

“Isn’t it a pleasure to study and practice what you have learned?” Confucius

Page 31: The expert's mission in China, May 9-13, 2016

Institute of Computing Technology (ICT), CAS, Beijing (invited by the director Yunquan Zhang, but ….)

They invited me to give a talk; we tried to visit ICT during the expert’s mission, but …