using parallel python tools to postprocessdata for cmip6€¦ · using parallel python tools to...
TRANSCRIPT
air • planet • peopleCISL/TDD/ASAP
© UCAR, 2015
Using Parallel Python Tools to Postprocess Data for CMIP6
Sheri MickelsonKevin Paul
Eighth Symposium on Advances in Modeling and Analysis Using Python
AMS 2018
1
air • planet • people© UCAR, 2015
What is CMIP6?
Using Parallel Python Tools to Postprocess Data for CMIP6
2
§ Internationally coordinated effort to run sets of defined experiments. There are roughly 30 different centers from around the world that will be participating.
§ Each experiment has a defined set of protocols, forcings, and requested output.
§ Running the same experiments with multiple models leads to stronger results.
§ The results of these simulations are evaluated as part of the IPCC Climate Assessment Reports. The results are also used by international governments for policy decisions and for further research by scientific institutions and universities.
air • planet • people© UCAR, 2015
CMIP5 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
3
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(NCO)
CMOR
Diagnostics(NCO/NCL)
PushtoESGF
Theseareallofthestepsthatweneedtotaketopublishourdatatothecommunity.ForCMIP5thisprocesstook15monthstopostprocess 200TBofdata.
air • planet • people© UCAR, 2015
CMIP5 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
4
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(NCO)
CMOR
Diagnostics(NCO/NCL)
PushtoESGF
Series 1Field 1
Slice 1
Fiel
d 1
Fiel
d 2
Fiel
d 3
Slice 5
Fiel
d 1
Fiel
d 2
Fiel
d 3
Slice 3
Fiel
d 1
Fiel
d 2
Fiel
d 3
Slice 4
Fiel
d 1
Fiel
d 2
Fiel
d 3
Slice 2
Fiel
d 1
Fiel
d 2
Fiel
d 3
Series 2Field 2
Series 3Field 3
ConvertingfromTimeSlicetoTimeSeries
Modeloutputsdatainaformatthathasonetimesliceandmultiplevariables.Thepreferreddistributionformatisonevariableandmultipletimeslices.Thisstepconvertsfromoneformattotheother.TheexistingmethodusedNCOfortheconversion.ThiswasthemostexpensivestepinCMIP5.
air • planet • people© UCAR, 2015
CMIP5 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
5
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(NCO)
CMOR
Diagnostics(NCO/NCL)
PushtoESGF
6ComponentDiagnosticPackages(Atm,Lndx2,Ocn,SeaIce,BGC)
Usedtodocumentandevaluatetheclimatesimulation.
Alloftheoriginalpackages:1. Containatoplevelcontrolscript2. CreateclimatologyfileswithNCO
tools3. CreatehundredsofplotswithNCL
scripts4. Createwebpagesthatallowusersto
browsethroughplots
air • planet • people© UCAR, 2015
CMIP5 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
6
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(NCO)
CMOR
Diagnostics(NCO/NCL)
PushtoESGF
ThisstepStandardizes themodeloutputininordertomakeiteasiertocompareagainstothermodelsfortheintercomparison.
Someexamples:• Fileformats(e.g.,NetCDF4)• Namesoffilesanddirectorystructure• Fileattributes(e.g.,institution,MIPname,…)• Namesofdimensions(e.g.,lat,lon,…)• Namesofvariables(e.g.,psl,ta,tas,…)• Dimensionsofvariables• Variabledatatypes(e.g.,float,double,
…)• Attributesofvariables(e.g.,units,…)• Rangesoftime(e.g.,2006to2100)• Derivingvariablesthatarenotoutputted
directly
UsedFortranandNCLcode,NCO,andCMOR
air • planet • people© UCAR, 2015
CMIP5 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
7
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(NCO)
CMOR
Diagnostics(NCO/NCL)
PushtoESGF
Motivation:ForCMIP6wewillhavetopostprocess 6PBofdatawithinthesameamountoftime.WeneededbettermethodsinordertobeabletocreatethisamountofdataforthecommunityintimeforAR6.
WeNeededtodoThreeThings:1. IncreasePerformance:Addedparallelizationintotheworkflow2. ReduceHumanIntervention:Workedonintegratingourworkflows
intoanautomatedworkflowengine3. ProjectManagement:Everythingiscoordinatedthroughacentral
database
CMIP66PB
(CurrentPrediction)
CMIP5200TB
air • planet • people© UCAR, 2015
CMIP6 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
8
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(PyReshape)
PyConform
Diagnostics(PyAverager)
PushtoESGF
WorkflowDriv
enbyC
ylc
ExperimentsUpdate
TheirStatusinRunDatabase
WerewrotetoolsinPythonandaddedtaskparallelization.
Allofthetoolsdependon:• MPI4Pyforinternodecommunication• PyNIO andNetCDF4-pythonforI/O
air • planet • people© UCAR, 2015
Parallelization Methods
Using Parallel Python Tools to Postprocess Data for CMIP6
9
Slice 1Fi
eld
1Fi
eld
2Fi
eld
3Slice 3
Fiel
d 1
Fiel
d 2
Fiel
d 3
Slice 2
Fiel
d 1
Fiel
d 2
Fiel
d 3
Series 1Field 1
Series 2Field 2
Series 3Field 3
Rank 1
Rank 2
Rank 3
Tim
e A
vera
ged
Clim
atol
ogy
File
Tim
e A
vera
ges
(Int
erna
l M
emor
y)T
ime-
Seri
es
File
s
Var 1 Var 2 Var 3
Rank 1 Rank 2 Rank 3
Avg
Var 1
Avg
Var 2
Avg
Var 3
Rank 0
Avg
Var 1
Avg
Var 2
Avg
Var 3
Var 1 Var 2 Var 3
Rank 1 Rank 2 Rank 3
Avg
Var 1
Avg
Var 2
Avg
Var 3
Rank 0
Avg
Var 1
Avg
Var 2
Avg
Var 3
Var 1 Var 2 Var 3
Rank 1 Rank 2 Rank 3
Avg
Var 1
Avg
Var 2
Avg
Var 3
Rank 0
Avg
Var 1
Avg
Var 2
Avg
Var 3
AVG1
AVG 2
AVG 3
AVG 4
AVG 5
AVG 6
AVG 7
AVG 8
AVG 9
Ave
rage
s to
C
ompu
te
InterCommunicator 1 InterCommunicator 2 InterCommunicator 3
PyReshaper
PyAverager
PyConform
“x = X1 + X2”Read:X1[i]
Read:X2[i]
Evaluate:(X1+X2)[i]
Map:iàj
Validate:> minimum< maximum
dimensions = [j]etcetera
Write:x[j] File
“y = X1 - X2”
Read:X1[i]
Read:X2[i]
Evaluate:(X1-X2)[i]
Map:iàj
Validate:> minimum< maximum
dimensions = [j]etcetera
Write:y[j] File
air • planet • people© UCAR, 2015
PyReshaper Performance
Using Parallel Python Tools to Postprocess Data for CMIP6
10
ResultsarefromrunningthePyReshaper toolon16yellowstone cores,4coreson4nodes
air • planet • people© UCAR, 2015
PyAverager Performance
Using Parallel Python Tools to Postprocess Data for CMIP6
11
0.1
1
10
100
1000
ATM-SE ICE LND OCN Total
Tim
e (m
inut
es) l
og s
cale
CESM Model Component
NCOPyAverager
Timetocomputeclimatologyfilesfor10yearsofCESMmonthlytimeslicefiles.ThePyAverager ranon120coresonyellowstone andthediagnosticson16yellowstone cores.
1
10
100
Tim
e (m
in) l
og s
cale
Performance Comparison Across Diagnostic Packages
OriginalPyAverager/NCL in Parallel
air • planet • people© UCAR, 2015
PyConform Performance(Preliminary Timing Numbers)
Using Parallel Python Tools to Postprocess Data for CMIP6
12
CESMCaseNameCMIP5Table
InputDatasetSize
OutputDatasetSize
OriginalSerialRuntime
PyConformParallelRuntime(16Procs) SPEEDUP
b40.rcp4_5.1deg.006 Amon 84 GB 62GB 72mins 2mins 38x
b40.20th.track1.1deg.012Amon 135GB 102GB 120mins 8mins 16x
3hr 540GB 506GB 6hours 11mins 34x
air • planet • people© UCAR, 2015
CMIP6 Workflow
Using Parallel Python Tools to Postprocess Data for CMIP6
13
ModelRun
Publication
Post-Processing
CESMModelRun
TimeSeriesConversion(PyReshape)
PyConform
Diagnostics(PyAverager)
PushtoESGF
WorkflowDriv
enbyC
ylc
ExperimentsUpdate
TheirStatusinRunDatabase
WeadoptedCylc asourworkflowEngine(writtenbyHilaryOliveratNIWA)
Weauto-generatetheworkflowdescriptionfilesfromboththeCESMandpostprocessenvironments.Alltheuserneedstodoiseditatoplevelscripttosetcertainvariables(runlength,whichdiagnosticstorun,etc)andthenmanuallystarttherunthroughCylc’sGUIorcommandlineinterface.
air • planet • people© UCAR, 2015
Use Cases of Experiments That Used Cylc
§ Used Cylc to complete 1,240 out of 1,860 total runs ~750 TB timeslice output in about 1 month
§ Used Cylc to run and postprocess part of a 30 member ensemble in a couple of months
§ Used Cylc to build and run over 20,000 forecast ensembles in a couple of months
Using Parallel Python Tools to Postprocess Data for CMIP6
14
air • planet • people© UCAR, 2015
Questions?§ PyReshaper
§ https://github.com/NCAR/pyreshaper§ PyAverager
§ https://github.com/NCAR/pyAverager§ PyConform (still in development)
§ https://github.com/NCAR/PyConform§ CESM/Cylc WF
§ https://github.com/NCAR/CESM-WF§ Cylc
§ https://cylc.github.io/cylc/
Conact Info mickelso .at. ucar.edu
Using Parallel Python Tools to Postprocess Data for CMIP6
15