a vision for exascale, simulation, and deep learning

“A Vision for Exascale: Simulation, Data and Learning”

Rick Stevens, Argonne National Laboratory

The University of Chicago

Crescat scientia; vita excolatur

Data-Driven Science Examples

For many problems there is a deep coupling of observation (measurement) and computation (simulation)

Cosmology: The study of the universe as a dynamical system

Materials science: Diffuse scattering to understand disordered structures

[Figure panels: sample, experimental scattering, material composition (La 60%, Sr 40%), simulated structure, simulated scattering]

Images from Salman Habib et al. (HEP, MCS, etc.) and Ray Osborne et al. (MSD, APS, etc.)

How Many Projects?

By 2020, the market for machine learning will reach $40 billion, according to market research firm IDC.

The deep learning market is projected to be ~$5B by 2020.

Markets are Developing at Different Rates ~2020

• HPC (Simulation) → [email protected]%
• Data Analysis → [email protected]%
• Deep Learning → ~$5B @ 65%

• DL > HPC in 2024
• DL > DA in 2030

Big Picture

• Mix of applications is changing
• HPC “Simulation”, “Big” Data Analytics, Machine Learning “AI”
• Many projects are combining all three modalities
– Cancer
– Cosmology
– Materials Design
– Climate
– Drug Design

Deep Learning in Climate Science

• Statistical Downscaling
• Subgrid Scale Physics
• Direct Estimate of Climate Statistics
• Ensemble Selection
• Dipole/Antipode Detection

Deep Learning in Genomics

Predicting Microbial Phenotypes

Classification of Tumors

Using deep learning to enhance cancer diagnosis and classification, ICML 2013

High Throughput Drug Screening

Deep Learning as an Opportunity in Virtual Screening, NIPS 2014

Deep Networks Screen Drugs

Deep Learning and Drug Discovery

Deep Learning in Disease Prediction

Learning Climate Disease Environment Associations

Big Data Opportunities for Global Infectious Disease Surveillance, Simon I. Hay, Dylan B. George, Catherine L. Moyes, John S. Brownstein

Neural Networks in Materials Science

• Estimate Materials Properties from Composition Parameters
• Estimate Processing Parameters for Synthesis
• Materials Genome

Searching For Lensed Galaxies

15 TB/Night
Use CNN to find Gravitational Lenses

Deep Learning is becoming a major element of scientific computing applications

• Across the DOE lab system hundreds of examples are emerging
– From fusion energy to precision medicine
– Materials design
– Fluid dynamics
– Genomics
– Structural engineering
– Intelligent sensing
– Etc.

WE ESTIMATE BY 2021 ONE THIRD OF THE SUPERCOMPUTING JOBS ON OUR MACHINES WILL BE MACHINE LEARNING APPLICATIONS

SHOULD WE CONSIDER ARCHITECTURES THAT ARE OPTIMIZED FOR THIS TYPE OF WORK?

HOW TO LEVERAGE EXASCALE?

The New HPC “Paradigm”

SIMULATION

DATA ANALYSIS

LEARNING

VISUALIZATION

The Critical Connections I

• Embedding Simulation into Deep Learning
– Leveraging simulation to provide “hints” via the Teacher-Student paradigm for DNN
– DNN invokes “Simulation Training” to augment training data or to provide supervised “labels” for generally unlabeled data
– Simulations could be invoked millions of times during training runs
– Training rate limited by simulation rates
– Ex. Cancer Drug Resistance
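The idea of a simulation supplying labels during training can be sketched in a few lines. This is a toy illustration under stated assumptions, not the approach used in the cancer drug resistance work: `simulate` is a hypothetical stand-in for an expensive simulation, and the "student" is a one-variable linear model rather than a DNN. The counter shows why the training rate is bounded by the simulation rate: every update needs one simulator call.

```python
import random


def simulate(x):
    """Hypothetical stand-in for an expensive simulation (the label source)."""
    return 3.0 * x + 0.5


def train_student(n_steps=2000, lr=0.05, seed=0):
    """Fit y = w*x + b by SGD, asking the simulator for a label at every step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    sim_calls = 0
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0)   # an unlabeled sample
        y = simulate(x)              # label produced on demand by the simulation
        sim_calls += 1
        err = (w * x + b) - y
        w -= lr * err * x            # gradient of 0.5*err**2 with respect to w
        b -= lr * err                # gradient of 0.5*err**2 with respect to b
    return w, b, sim_calls


w, b, sim_calls = train_student()
print(f"w={w:.4f} b={b:.4f} simulator calls={sim_calls}")
```

In a real setting the simulator call would dominate the cost of each step, which is the bottleneck the slide points at.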

Hybrid Models in Cancer

Teacher-Student Network Model

Teacher-Student Network Model: Simulation Based Predictions

Integrating ML and Simulation

The Critical Connections II

• Embedding Machine Learning into Simulations
– Replacing explicit first principles models with learned functions
– Faster, Lower Power, Lower Accuracy (?)
– Functions in simulations accessing ML models at high throughput
– On node invocation of dozens or hundreds of models millions of times per second?
– Ex. Nowcasting in Weather
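A minimal sketch of replacing a first principles term with a learned function inside a time-stepping loop follows. Everything here is an illustrative assumption: `first_principles_rate` is a made-up formula standing in for an expensive model, the "learned function" is a least-squares straight line rather than a neural network, and the integrator is plain explicit Euler. The point is only to show where the surrogate is called and how the accuracy loss can be measured.

```python
def first_principles_rate(y):
    """Hypothetical 'expensive' physics term; a cheap formula stands in for it."""
    return 1.0 / (1.0 + y * y)


def fit_linear_surrogate(f, lo, hi, n=101):
    """Least-squares fit of f on [lo, hi] with a straight line a + b*y."""
    ys = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    fs = [f(y) for y in ys]
    mean_y = sum(ys) / n
    mean_f = sum(fs) / n
    cov = sum((y - mean_y) * (v - mean_f) for y, v in zip(ys, fs))
    var = sum((y - mean_y) ** 2 for y in ys)
    b = cov / var
    a = mean_f - b * mean_y
    return lambda y: a + b * y


def integrate(rate, y0=1.0, dt=0.01, steps=200):
    """Explicit Euler for dy/dt = -rate(y) * y; rate is called once per step."""
    y = y0
    for _ in range(steps):
        y -= dt * rate(y) * y
    return y


surrogate_rate = fit_linear_surrogate(first_principles_rate, 0.0, 1.0)
y_exact = integrate(first_principles_rate)
y_approx = integrate(surrogate_rate)
abs_error = abs(y_exact - y_approx)
print(f"exact={y_exact:.4f} surrogate={y_approx:.4f} error={abs_error:.4f}")
```

The surrogate is fitted once and then invoked at every step, so its per-call cost and its error both accumulate over the run.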

Algorithm Approximation

Neural Acceleration for General-Purpose Approximate Programs
Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger∗
University of Washington, ∗Microsoft Research

Replacing Imperative Code with NN Computed Approximations

2.3x Speedup, 3x Power Reduction, ~7% Error


Joint Design of Advanced Computing Solutions for Cancer
DOE-NCI partnership to advance cancer research and high performance computing in the U.S.

NCI: National Cancer Institute
DOE: Department of Energy

Cancer driving computing advances

Computing driving cancer advances

DOE Secretary of Energy

Director of the National Cancer Institute

Scalable Data Analytics

Deep Learning

Large-Scale Numerical Simulation

DOE Objective: Drive Integration of Simulation, Data Analytics and Machine Learning

CORAL Supercomputers and Exascale Systems

Traditional HPC Systems

Exascale Node Concept Space

Abstract Machine Models and Proxy Architectures for Exascale Computing, Rev 1.1, Sandia National Laboratory and Lawrence Berkeley National Laboratory

Leverage Resources on the Die, in Package or on the Node

• Local high-bandwidth memory stacks
• Node based non-volatile memory
• High-Bandwidth Low Latency Fabric
• General Purpose Cores
• Dynamic Power Management

What Kind of Accelerator(s) to Add?

• Vector Processors
• Data Flow Engines
• Patches of FPGA
• Many “Nano” Cores (< 5M Tr each?)

Hardware and systems architectures are emerging for supporting deep learning

• CPUs
– AVX, VNNI, KNL, KNM, KNH, …
• GPUs
– Nvidia P100, V100, AMD Instinct, Baidu GPU, …
• ASICs
– Nervana, DianNao, Eyeriss, GraphCore, TPU, DLU, …
• FPGA
– Arria 10, Stratix 10, Falcon Mesa, …
• Neuromorphic
– TrueNorth, Zeroth, N1, …

Aurora21

• Argonne’s Exascale System
• Balanced architecture to support three pillars
– Large-scale Simulation (PDEs, traditional HPC)
– Data Intensive Applications (science pipelines)
– Deep Learning and Emerging Science AI
• Enable integration and embedding of pillars
• Integrated computing, acceleration, storage
• Towards a common software stack

Argonne Targets for Exascale

Deep Learning Applications
• Drug Response Prediction
• Scientific Image Classification
• Scientific Text Understanding
• Materials Property Design
• Gravitational Lens Detection
• Feature Detection in 3D
• Street Scene Analysis
• Organism Design
• State Space Prediction
• Persistent Learning
• Hyperspectral Patterns

Simulation Applications
• Materials Science
• Cosmology
• Molecular Dynamics
• Nuclear Reactor Modeling
• Combustion
• Quantum Computer Simulation
• Climate Modeling
• Power Grid
• Discrete Event Simulation
• Fusion Reactor Simulation
• Brain Simulation
• Transportation Networks

Big Data Applications
• APS Data Analysis
• HEP Data Analysis
• LSST Data Analysis
• SKA Data Analysis
• Metagenome Analysis
• Battery Design Search
• Graph Analysis
• Virtual Compound Library
• Neuroscience Data Analysis
• Genome Pipelines


Differing Requirements?

Deep Learning Applications
• Lower Precision (fp32, fp16)
• FMAC @ 32 and 16 okay
• Inferencing can be 8 bit (TPU)
• Scaled integer possible
• Training dominates dev
• Inference dominates pro
• Reuse of training data
• Data pipelines needed
• Dense FP typical SGEMM
• Small DFT, CNN
• Ensembles and Search
• Single Models Small
• I more important than O
• Output is models

Simulation Applications
• 64 bit floating point
• Memory Bandwidth
• Random Access to Memory
• Sparse Matrices
• Distributed Memory jobs
• Synchronous I/O multinode
• Scalability Limited Comm
• Low Latency High Bandwidth
• Large Coherency Domains help sometimes
• O typically greater than I
• O rarely read
• Output is data

Big Data Applications
• 64 bit and Integer important
• Data analysis Pipelines
• DB including NoSQL
• MapReduce/SPARK
• Millions of jobs
• I/O bandwidth limited
• Data management limited
• Many task parallelism
• Large-data in and Large-data out
• I and O both important
• O is read and used
• Output is data
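The "8 bit inference" and "scaled integer" items in the deep learning column can be illustrated with symmetric int8 quantization. This is a generic sketch, assuming one scale factor per tensor; it is not a description of how the TPU or any particular framework implements it, and the weight values are made up.

```python
def quantize_int8(values):
    """Symmetric scaled-integer quantization: value ~ scale * q, q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating-point values."""
    return [scale * qi for qi in q]


# Hypothetical weights, for illustration only.
weights = [0.82, -1.27, 0.05, 0.0, 0.6349]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

The rounding error per value is at most half the scale, which is why small models with well-bounded weights tolerate 8 bit inference better than 64 bit simulation codes would.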


Aurora21 Exascale Software

• Single Unified stack with resource allocation and scheduling across all pillars and ability for frameworks and libraries to seamlessly compose
• Minimize data movement: keep permanent data in the machine via distributed persistent memory while maintaining availability requirements
• Support standard file I/O and path to memory coupled model for Sim, Data and Learning
• Isolation and reliability for multi-tenancy and combining workflows

Towards an Integrated Stack

The New HPC “Paradigm”

SIMULATION

DATA ANALYSIS

LEARNING

VISUALIZATION

Acknowledgements

Many thanks to DOE, NSF, NIH, DOD, ANL, UC, Moore Foundation, Sloan Foundation, Apple, Microsoft, Cray, Intel, NVIDIA and IBM for supporting our research group over the years

Our Vision: Automate and Accelerate

The CANDLE Exascale Project


Drug Response: CANDLE General Workflow


ECP-CANDLE: CANcer Distributed Learning Environment

CANDLE Goals

Develop an exascale deep learning environment for cancer

Building on open source deep learning frameworks

Optimization for CORAL and exascale platforms

Support all three pilot project needs for deep

Collaborate with DOE computing centers, HPC vendors and ECP co-design and software technology projects


CANDLE Software Stack

• Workflow: Hyperparameter Sweeps, Data Management (e.g. DIGITS, Swift, etc.)
• Scripting: Network description, Execution scripting API (e.g. Keras, Mocha)
• Engine: Tensor/Graph Execution Engine (e.g. Theano, TensorFlow, LBANN-LL, etc.)
• Optimization: Architecture Specific Optimization Layer (e.g. cuDNN, MKL-DNN, etc.)
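The hyperparameter sweeps handled by the workflow layer amount to enumerating configurations and launching one training run per configuration. The sketch below shows only the enumeration step for a full grid. The function name `grid_sweep` and the search space are hypothetical, and this is not the interface of Swift, DIGITS or any CANDLE component.

```python
import itertools


def grid_sweep(grid):
    """Yield one dict per combination of the hyperparameter values in `grid`."""
    names = sorted(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


# Hypothetical search space; names and values are illustrative only.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64],
    "dropout": [0.0, 0.1, 0.5],
}
configs = list(grid_sweep(search_space))
print(len(configs), configs[0])
```

Each configuration is independent of the others, so a workflow engine can schedule them in parallel across nodes.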


DL Frameworks “Tensor Engines”

• TensorFlow (C++, symbolic diff+)
• Theano (C++, symbolic diff+)
• Neon (integrated) (Python + GPU, symbolic diff+)
• MXNet (integrated) (C++)
• LBANN (C++, aimed at scalable hardware)
• pyTorch7 THTensor (C layer, symbolic diff-, pkgs)
• Caffe (integrated) (C++, symbolic diff-)
• Mocha backend (Julia + GPU)
• CNTK backend (Microsoft) (C++)
• PaddlePaddle (Baidu) (Python, C++, GPU)

CANDLE Benchmarks: Representative problems

• Variational AutoEncoder
– Learning (non-linear) features of core data types
• AutoEncoder
– Molecular dynamics trajectory state detection
• MLP + LCNN Classification
– Cancer type from gene expression/SNPs
• MLP + CNN Regression
– Drug response (gene exp, descriptors)
• CNN
– Cancer pathology report term extraction
• RNN-LSTM
– Cancer pathology report text analysis
• RNN-LSTM
– Molecular dynamics simulation control

Progress in Deep Learning for Cancer

• AutoEncoders – learning data representations for classification and prediction of drug response, molecular trajectories
• VAEs and GANs – generating data to support methods development, data augmentation and feature space algebra, drug candidate generation
• CNNs – type classification, drug response, outcomes prediction, drug resistance
• RNNs – sequence, text and molecular trajectories analysis
• Multi-Task Learning – terms (from text) and feature extraction (data), data translation (RNAseq <-> uArray)

CANDLE FOM – Rate of Training

• “Number of networks trained per day”
– size and type of network, amount of training data, batch size, number of epochs, type of hardware
• “Number of ‘weight’ updates/second”
– Forward Pass + Backward Pass
• Training Rate = $\sum_{i=1}^{n} a_i R_i$, where $R_i$ is the rate for our benchmark $i$ and $a_i$ is a weight
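The figure of merit is a weighted sum of per-benchmark rates, which can be written out directly. The helper name `training_rate` and the numbers in the example are assumptions for illustration; the slides do not give the actual weights or measured rates.

```python
def training_rate(rates, weights):
    """Weighted sum of per-benchmark rates: sum over i of a_i * R_i."""
    if len(rates) != len(weights):
        raise ValueError("need exactly one weight per benchmark rate")
    return sum(a * r for a, r in zip(weights, rates))


# Hypothetical rates (e.g. networks trained per day) and weights.
example_rates = [2.0, 4.0]
example_weights = [0.5, 0.25]
fom = training_rate(example_rates, example_weights)
print(fom)
```

Because the sum is linear, doubling the rate of a benchmark raises the figure of merit in proportion to that benchmark's weight.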

7 CANDLE Benchmarks

Benchmark | Type | Data | ID | OD | Sample Size | Size of Network | Additional (activation, layer types, etc.)
1. P1:B1 Autoencoder | MLP | RNA-Seq | 10^5 | 10^5 | 15K | 5 layers | Log2(x+1) → [0,1], KPRM-UQ
2. P1:B2 Classifier | MLP | SNP → Type | 10^6 | 40 | 15K | 5 layers | Training Set Balance issues
3. P1:B3 Regression | MLP+LCN | expression; drug descs | 10^5 | 1 | 3M | 8 layers | Drug Response [-100, 100]
4. P2:B1 Autoencoder | MLP | MD K-RAS | 10^5 | 10^2 | 10^6-10^8 | 5-8 layers | State Compression
5. P2:B2 RNN-LSTM | RNN-LSTM | MD K-RAS | 10^5 | 3 | 10^6 | 4 layers | State to Action
6. P3:B1 RNN-LSTM | RNN-LSTM | Path reports | 10^3 | 5 | 5K | 1-2 layers | Dictionary 12K+30K
7. P3:B2 Classification | CNN | Path reports | 10^4 | 10^2 | 10^5 | 5 layers | Biomarkers

Benchmark Owners:
• P1: Fangfang Xia (ANL)
• P2: Brian Van Essen (LLNL)
• P3: Arvind Ramanathan (ORNL)
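The "Log2(x+1) → [0,1]" entry for benchmark P1:B1 suggests a log transform of expression values followed by rescaling to the unit interval. The sketch below is one plausible reading, assuming min-max scaling over a single feature; the benchmark code may scale differently, and the function name `log2_minmax` is hypothetical.

```python
import math


def log2_minmax(values):
    """Apply log2(x + 1), then min-max scale the result into [0, 1]."""
    logged = [math.log2(v + 1.0) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]


scaled = log2_minmax([0.0, 1.0, 3.0])
print(scaled)
```

Adding 1 before the logarithm keeps zero counts finite, which matters for sparse RNA-Seq data.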


https://github.com/ECP-CANDLE

Typical Performance Experience

CANDLE - Predicting drug response of tumor samples
• MLP/CNN on Keras
• 7 layers, 30M - 500M parameters
• 200 GB input size
• 1 hour/epoch on DGX-1; 200 epochs take 8 days (200 GPU hrs)
• Hyperparameter search ~200,000 GPU hrs or 8M CPU hrs

Protein function classification in genome annotation
• Deep residual convolution network on Keras
• 50 layers
• 1 GB input size
• 20 minutes/epoch on DGX-1; 200 epochs take 3 days (72 GPU hrs)
• Hyperparameter search ~72,000 GPU hrs or 2.8M CPU hrs
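The arithmetic behind these figures can be checked with a short calculation. It follows the slide in counting one wall-clock hour on the DGX-1 as one GPU hour, which is an assumption about the accounting rather than a stated fact. The "implied configurations" value is likewise an inference from the ratio of search hours to single-run hours, not a number given in the talk.

```python
def training_cost(hours_per_epoch, epochs):
    """Return (GPU hours, days) for one training run at the given epoch time."""
    gpu_hours = hours_per_epoch * epochs
    return gpu_hours, gpu_hours / 24.0


def implied_configurations(search_gpu_hours, gpu_hours_per_run):
    """How many full training runs a search budget would cover."""
    return search_gpu_hours / gpu_hours_per_run


# Drug response: 1 hour/epoch, 200 epochs.
drug_hours, drug_days = training_cost(1.0, 200)
# Protein function: 20 minutes/epoch, 200 epochs.
protein_hours, protein_days = training_cost(20.0 / 60.0, 200)

drug_runs = implied_configurations(200_000, 200)
protein_runs = implied_configurations(72_000, 72)
print(drug_hours, round(drug_days, 2), round(protein_hours, 1), round(protein_days, 2))
print(drug_runs, protein_runs)
```

The drug response run comes to 200 GPU hours, a little over 8 days. The protein run comes to roughly 67 GPU hours, which the slide rounds up to 3 days (72 hours). Both search budgets correspond to about 1,000 full training runs.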

GitHub and FTP

• ECP-CANDLE GitHub Organization:
– https://github.com/ECP-CANDLE
• ECP-CANDLE FTP Site:
– The FTP site hosts all the public datasets for the benchmarks from three pilots.
– http://ftp.mcs.anl.gov/pub/candle/public/

Things We Need

• Deep Learning Workflow Tools
• Data Management for Training Data and Models
• Performance Measurement, Modeling and Monitoring of Training Runs
• Deep Network Model Visualization
• Low-level Solvers, Optimization and Data Encoding
• Programming Models/Runtimes to support next generation Parallel Deep Learning with sparsity
• OS Support for High-Throughput Training
