Parallel IO in the Community Earth System Model
Jim Edwards (NCAR, [email protected]), John Dennis (NCAR), Ray Loy (ANL), Pat Worley (ORNL)
NCAR, P.O. Box 3000, Boulder, CO 80307-3000, USA
Workshop on Scalable IO in Climate Models, 27/02/2012
Some CESM 1.1 capabilities:
• Ensemble configurations with multiple instances of each component
• Highly scalable capability, proven to 100K+ tasks
• Regionally refined grids
• Data assimilation with DART
Prior to PIO
• Each model component was independent, with its own IO interface
• A mix of file formats: NetCDF, binary (POSIX), and binary (Fortran)
• A gather-scatter method to interface with serial IO (sketched below)
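To make the old pattern concrete, here is a minimal sketch of the gather-then-serial-write approach that PIO replaced. It is illustrative only: the routine, the variable names, and the use of netCDF-Fortran on the root task are assumptions, not CESM code.

   ! Pre-PIO pattern (sketch): gather the field to task 0, write serially.
   ! Assumes MPI and the netCDF-Fortran (nf90) library; names are hypothetical.
   subroutine gather_and_write(comm, local_field, nlocal, counts, displs, nglobal)
     use mpi
     use netcdf
     implicit none
     integer, intent(in) :: comm, nlocal, nglobal
     real(8), intent(in) :: local_field(nlocal)
     integer, intent(in) :: counts(:), displs(:)   ! per-task sizes and offsets
     real(8), allocatable :: global_field(:)
     integer :: rank, ierr, ncid, dimid, varid

     call MPI_Comm_rank(comm, rank, ierr)
     if (rank == 0) then
        allocate(global_field(nglobal))   ! the whole field on one task:
     else                                 ! the memory bottleneck PIO removes
        allocate(global_field(1))         ! dummy; only significant at the root
     end if

     ! Large fan-in to a single task
     call MPI_Gatherv(local_field, nlocal, MPI_REAL8, global_field, counts, &
                      displs, MPI_REAL8, 0, comm, ierr)

     ! Serial write from the root task only
     if (rank == 0) then
        ierr = nf90_create('out.nc', nf90_clobber, ncid)
        ierr = nf90_def_dim(ncid, 'n', nglobal, dimid)
        ierr = nf90_def_var(ncid, 'T', nf90_double, (/dimid/), varid)
        ierr = nf90_enddef(ncid)
        ierr = nf90_put_var(ncid, varid, global_field)
        ierr = nf90_close(ncid)
     end if
     deallocate(global_field)
   end subroutine gather_and_write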
Steps toward PIO
• Converge on a single file format: NetCDF selected
  – Self-describing
  – Lossless, with a lossy capability (netCDF4 only)
  – Works with the current postprocessing tool chain
• Extension to parallel IO
• Reduce the single-task memory profile
• Maintain a single, decomposition-independent file format
• Performance (a secondary issue)
• Parallel IO from all compute tasks is not the best strategy:
  – Data rearrangement is complicated, leading to numerous small and inefficient IO operations
  – MPI-IO aggregation alone cannot overcome this problem
Parallel I/O library (PIO)
• Goals:
  – Reduce per-MPI-task memory usage
  – Be easy to use
  – Improve performance
• Write/read a single file from a parallel application (see the sketch below)
• Multiple backend libraries: MPI-IO, NetCDF3, NetCDF4, pNetCDF, NetCDF+VDC
• A meta-IO library: a potential interface to other general libraries
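A minimal end-to-end sketch of how an application drives PIO follows. It is based on the PIO1-era Fortran API (PIO_init, PIO_initdecomp, PIO_write_darray); the grid sizes, task counts, and the trivial block decomposition are illustrative assumptions, and exact argument lists and kind names vary between PIO versions.

   ! Minimal PIO1 usage sketch: init, describe the decomposition, write, finalize.
   ! Grid sizes, task counts, and the identity-style compdof are illustrative.
   program pio_sketch
     use mpi
     use pio
     implicit none
     type(iosystem_desc_t) :: iosys
     type(file_desc_t)     :: file
     type(io_desc_t)       :: iodesc
     type(var_desc_t)      :: vdesc
     integer :: rank, nprocs, ierr, i, nlocal
     integer :: dimids(2)
     integer, parameter :: nx = 3600, ny = 2400    ! POP-like horizontal grid
     integer(kind=PIO_offset), allocatable :: compdof(:)
     real(8), allocatable :: local(:)

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

     ! 4 I/O tasks strided every 4th compute task, box rearranger:
     ! these are the tuning knobs discussed later in the talk
     call PIO_init(rank, MPI_COMM_WORLD, 4, 0, 4, PIO_rearr_box, iosys)

     ! Trivial block decomposition: compdof lists, for each locally owned
     ! element, its offset in the global array (assumes nx*ny divides evenly)
     nlocal = (nx*ny)/nprocs
     allocate(compdof(nlocal), local(nlocal))
     do i = 1, nlocal
        compdof(i) = int(rank, PIO_offset)*nlocal + i
     end do
     local = real(rank, 8)
     call PIO_initdecomp(iosys, PIO_double, (/nx, ny/), compdof, iodesc)

     ! Create the file through the pnetcdf backend and define one variable
     ierr = PIO_createfile(iosys, file, PIO_iotype_pnetcdf, 'out.nc')
     ierr = PIO_def_dim(file, 'nx', nx, dimids(1))
     ierr = PIO_def_dim(file, 'ny', ny, dimids(2))
     ierr = PIO_def_var(file, 'T', PIO_double, dimids, vdesc)
     ierr = PIO_enddef(file)

     ! One collective call writes the distributed array to a single file
     call PIO_write_darray(file, vdesc, iodesc, local, ierr)

     call PIO_closefile(file)
     call PIO_freedecomp(iosys, iodesc)
     call PIO_finalize(iosys, ierr)
     call MPI_Finalize(ierr)
   end program pio_sketch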
[Architecture diagram: the CESM components (the CPL7 coupler, the CAM atmospheric model, the CLM land model, the POP2 ocean model, the CICE sea ice model, and the CISM land ice model) all perform IO through PIO, which in turn sits on top of the backend libraries: netcdf3, pnetcdf, netcdf4 (via HDF5), VDC, and MPI-IO.]
PIO design principles
• Separation of concerns
• Separate computational and I/O decompositions
• Flexible user-level rearrangement
• Encapsulate expert knowledge
Separation of concerns
• What versus how (illustrated in the snippet below):
  – The user's concern: what to write/read to/from disk, e.g. "I want to write T, V, PS."
  – The library developer's concern: how to access the disk efficiently, e.g. "How do I construct I/O operations so that write bandwidth is maximized?"
• Improves ease of use
• Improves robustness
• Enables better reuse
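In code, the user's "what" reduces to one PIO_write_darray call per field; the "how" (rearrangement, aggregation, backend choice) stays hidden behind the file, variable, and decomposition descriptors. A hedged fragment, continuing the earlier sketch (the vdesc_* descriptors and *_local arrays are assumed to have been defined there):

   ! The user states *what* to write; PIO decides *how*.
   call PIO_write_darray(file, vdesc_T,  iodesc, t_local,  ierr)   ! temperature
   call PIO_write_darray(file, vdesc_V,  iodesc, v_local,  ierr)   ! wind
   call PIO_write_darray(file, vdesc_PS, iodesc, ps_local, ierr)   ! surface pressure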
Separate computational and I/O decompositions
[Figure: data laid out in the computational decomposition is rearranged into a distinct I/O decomposition before reaching disk.]
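The mapping between the two decompositions is expressed through the compdof argument of PIO_initdecomp: each task lists the global offsets of the elements it owns, and PIO derives the rearrangement from that alone. A hedged sketch (the 2x2 block layout over four tasks is invented for illustration; rank, iosys, and iodesc are as in the earlier sketch):

   ! Describe which global elements this task computes; PIO infers the
   ! rearrangement onto the I/O tasks. Sketch: an 8x8 grid split into
   ! 4x4 blocks across four tasks arranged 2x2.
   integer, parameter :: nx = 8, ny = 8, bx = 4, by = 4
   integer(kind=PIO_offset) :: compdof(bx*by)
   integer :: i, j, n, px, py

   px = mod(rank, 2)          ! this task's block column
   py = rank / 2              ! this task's block row
   n = 0
   do j = 1, by
      do i = 1, bx
         n = n + 1
         ! 1-based offset of local element (i,j) in the global column-major array
         compdof(n) = (py*by + j - 1)*nx + px*bx + i
      end do
   end do
   call PIO_initdecomp(iosys, PIO_double, (/nx, ny/), compdof, iodesc)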
Flexible user-level rearrangement
• A single technical solution is not suitable for the entire user community:
  – User A: Linux cluster, 32-core job, 200 MB files, NFS file system
  – User B: Cray XE6, 115,000-core job, 100 GB files, Lustre file system
• Different compute environments require different technical solutions!
Writing distributed data (I)
[Figure: computational decomposition -> rearrangement -> I/O decomposition.]
+ Maximizes the size of individual I/O operations to disk
- Non-scalable user-space buffering
- Very large fan-in -> large MPI buffer allocations
The correct solution for User A.
Writing distributed data (II)
[Figure: computational decomposition -> rearrangement -> I/O decomposition.]
+ Scalable user-space memory
+ Relatively large individual I/O operations to disk
- Very large fan-in -> large MPI buffer allocations
Writing distributed data (III)
[Figure: computational decomposition -> rearrangement -> I/O decomposition.]
+ Scalable user-space memory
+ Smaller fan-in -> modest MPI buffer allocations
- Smaller individual I/O operations to disk
The correct solution for User B.
Encapsulate expert knowledge
• Flow-control algorithm
• Match the size of I/O operations to the file system stripe size
  – Cray XT5/XE6 + Lustre file system
  – Minimizes message-passing traffic at the MPI-IO layer
• Load-balance disk traffic over all I/O nodes
  – IBM Blue Gene/{L,P} + GPFS file system
  – Utilizes Blue Gene-specific topology information
(The knobs that select these strategies are sketched below.)
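Which of the preceding strategies you get is selected at initialization: the number of I/O tasks, their stride across compute tasks, and the rearranger are all PIO_init arguments. A hedged sketch of tuning them per machine (the specific values are illustrative, not tuned recommendations):

   ! The rearrangement strategy is chosen through PIO_init (sketch).

   ! User A: small cluster, small files: funnel I/O through one task
   call PIO_init(rank, MPI_COMM_WORLD, 1, 0, 1, PIO_rearr_box, iosys)

   ! User B: large Cray + Lustre: many I/O tasks spread across the job,
   ! so fan-in stays modest and each I/O op is closer to a stripe in size
   call PIO_init(rank, MPI_COMM_WORLD, 64, 0, 128, PIO_rearr_box, iosys)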
Experimental setup
• Did we achieve our design goals?
• Impact of PIO features:
  – Flow control
  – Varying the number of IO tasks
  – Different general I/O backends
• Read/write a 3D POP-sized variable [3600x2400x40]
• 10 files, 10 variables per file [max bandwidth reported]
• Using Kraken (Cray XT5) + Lustre filesystem
  – Used 16 of 336 OSTs
3D POP arrays [3600x2400x40]
[Figures: read and write bandwidth for the 3D POP arrays, varying flow control, the number of IO tasks, and the I/O backend.]
PIOVDC: parallel output to a VAPOR Data Collection (VDC)
• VDC: a wavelet-based, gridded data format supporting both progressive access and efficient data subsetting
• Progressive access: data may be read back at different levels of detail, permitting the application to trade off speed against accuracy
  – Think Google Earth: less detail when the viewer is far away, progressively more detail as the viewer zooms in
  – Enables rapid (interactive) exploration and hypothesis testing that can later be validated against the full-fidelity data as needed
• Subsetting: arrays are decomposed into smaller blocks, which significantly improves extraction of arbitrarily oriented subarrays
• Wavelet transform
  – Similar to Fourier transforms
  – Computationally efficient: O(n)
  – The basis for many multimedia compression technologies (e.g. MPEG-4, JPEG 2000)
Other PIO Users
• Earth System Modeling Framework (ESMF)
• Model for Prediction Across Scales (MPAS)
• Geophysical High Order Suite for Turbulence (GHOST)
• Data Assimilation Research Testbed (DART)
[Figure: write performance on BG/L (Penn State University, April 26, 2010).]
[Figure: read performance on BG/L (Penn State University, April 26, 2010).]
100:1 compression with coefficient prioritization
[Figure: 1024^3 Taylor-Green turbulence, enstrophy field (P. Mininni, 2006); no compression vs. coefficient prioritization (VDC2).]
[Figure: 4096^3 homogeneous turbulence simulation; volume rendering of the original enstrophy field and the 800:1 compressed field. Data provided by P.K. Yeung (Georgia Tech) and Diego Donzis (Texas A&M). Original: 275 GB/field; 800:1 compressed: 0.34 GB/field.]
F90 code generation
PIO's type- and rank-specific routines are generated from templates by the genf90.pl script. A template fragment (tmp.F90.in):

   interface PIO_write_darray
   ! TYPE real,int
   ! DIMS 1,2,3
      module procedure write_darray_{DIMS}d_{TYPE}
   end interface
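genf90.pl expands the {TYPE} and {DIMS} placeholders into one module procedure per combination. (A typical invocation, assumed here rather than taken from the CESM build, is: perl genf90.pl tmp.F90.in > tmp.F90.) The generated file begins with a line marker pointing back at the template: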
# 1 "tmp.F90.in"
interface PIO_write_darray
module procedure dosomething_1d_real
module procedure dosomething_2d_real
module procedure dosomething_3d_real
module procedure dosomething_1d_int
module procedure dosomething_2d_int
module procedure dosomething_3d_int
end interface
• PIO is open source: http://code.google.com/p/parallelio/
• Documentation is generated with doxygen: http://web.ncar.teragrid.org/~dennis/pio_doc/html/
• Thank you
Existing I/O libraries
• netCDF3
  – Serial
  – Easy to implement
  – Limited flexibility
• HDF5
  – Serial and parallel
  – Very flexible
  – Difficult to implement
  – Difficult to achieve good performance
• netCDF4
  – Serial and parallel
  – Based on HDF5
  – Easy to implement
  – Limited flexibility
  – Difficult to achieve good performance
Existing I/O libraries (cont'd)
• Parallel-netCDF
  – Parallel
  – Easy to implement
  – Limited flexibility
  – Difficult to achieve good performance
• MPI-IO
  – Parallel
  – Very difficult to implement
  – Very flexible
  – Difficult to achieve good performance
• ADIOS
  – Serial and parallel
  – Easy to implement
  – BP file format: easy to achieve good performance
  – All other file formats: difficult to achieve good performance