March 9, 2009 10th International LCI Conference - HDF5 Tutorial 1
Tutorial II: HDF5 and NetCDF-4
10th International LCI Conference
Albert Cheng, Neil Fortner
The HDF Group
Ed Hartnett
Unidata/UCAR
Outline
8:30 – 9:30   Introduction to HDF5 data, programming models and tools
9:30 – 10:00  Advanced features of the HDF5 library
10:30 – 11:30 Advanced features of the HDF5 library (continued)
11:30 – 12:00 Introduction to Parallel HDF5
1:00 – 2:30   Introduction to Parallel HDF5 (continued) and Parallel I/O Performance Study
3:00 – 4:30   NetCDF-4
Introduction to HDF5 Data, Programming Models and Tools
What is HDF?
HDF is…
• HDF stands for Hierarchical Data Format
• A file format for managing any kind of data
• A software system to manage data in the format
• Designed for high-volume or complex data
• Designed for every size and type of system
• Open format and software library, tools
• There are two HDFs: HDF4 and HDF5
• Today we focus on HDF5
Brief History of HDF
1987         At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF.
Early 1990s  NASA adopted HDF for the Earth Observing System project.
1996         DOE’s ASC (Advanced Simulation and Computing) project began collaborating with the HDF group (NCSA) to create “Big HDF” (the increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files). “Big HDF” became HDF5.
1998         HDF5 was released with support from National Labs, NASA, and NCSA.
2006         The HDF Group spun off from the University of Illinois as a non-profit corporation.
Why HDF5?
In one sentence ...
Answering big questions …
• Matter and the universe
• Weather and climate
• Life and nature
[Figure: Total Column Ozone (Dobson), August 24, 2001 vs. August 24, 2002]
… involves big data …
… varied data …
(Thanks to Mark Miller, LLNL)
… and complex relationships …
[Figure: genome assembly browser showing contigs, reads, contig summaries and qualities, discrepancies, coverage depth, read quality, aligned bases, percent match, trace, and SNP score]
… on big computers …
… and small computers …
How do we…
• Describe our data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and repositories?
• Achieve storage and I/O efficiency?
• Give applications and tools easy access to our data?
Solution: HDF5!
• Can store all kinds of data in a variety of ways
• Runs on most systems
• Lots of tools to access data
• Emphasis on standards (HDF-EOS, CGNS)
• Library and format emphasis on I/O efficiency and storage
HDF5 Philosophy
A single platform with multiple uses:
• One general format
• One library, with
  • Options to adapt I/O and storage to data needs
  • Layers on top and below
• Ability to interact well with other technologies
• Attention to past, present, and future compatibility
Who uses HDF5?
• Applications that deal with big or complex data
• Over 200 different types of apps
• 2+ million product users world-wide
• Academia, government agencies, industry
NASA EOS remote sense data
• HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
• Petabytes of data stored in HDF and HDF5 to support the Global Climate Change Research Program.
Structure of HDF5 Library
• Applications
• Object API (C, F90, C++, Java)
• Library internals
• Virtual file I/O
• File or other “storage”
HDF Tools
- HDFView and Java Products
- Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)
HDF5 Applications & Domains
[Diagram: communities and applications (simulation, visualization, remote sensing; thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models; HDF-EOS, CGNS, ASC) sit on the HDF5 data model & API; the virtual file layer (I/O drivers: Stdio, MPI I/O, Split Files, Custom) maps the HDF5 format onto storage: a file on a parallel file system, a single file, split metadata and raw-data files, or a user-defined device.]
HDF5: The Format
An HDF5 “file” is a container…
  lat | lon | temp
  ----|-----|-----
   12 |  23 | 3.1
   15 |  24 | 4.2
   17 |  21 | 3.6
  [raster image with palette]
…into which you can put your data objects
Structures to organize objects
[Diagram: the root group “/” and a subgroup “/foo” are “Groups”; the “Datasets” they organize include raster images (with palette), a 3-D array, a 2-D array, and a table (lat | lon | temp).]
HDF5 model
• Groups – provide structure among objects
• Datasets – where the primary data goes
  • Data arrays
  • Rich set of datatype options
  • Flexible, efficient storage and I/O
• Attributes, for metadata

Everything else is built essentially from these parts.
HDF5: The Software
HDF5 Software
• Tools, Applications, Libraries
• HDF5 I/O Library
• HDF5 File
Users of HDF5 Software
• Tools & Applications – most data consumers are here: scientific/engineering applications, plus domain-specific libraries/APIs and tools
• HDF5 Application Programming Interface – applications, tools, and power users call this API to create, read, write, query, etc.
• “Virtual file layer” (VFL) – modules that adapt I/O to specific features of a system, or do I/O in some special way
• “HDF5 File” – could be on a parallel system, in memory, a collection of files, etc.
• File system, MPI-IO, SAN, other layers
HDF5 Data Model
HDF5 model (recap)
• Groups – provide structure among objects
• Datasets – where the primary data goes
  • Data arrays
  • Rich set of datatype options
  • Flexible, efficient storage and I/O
• Attributes, for metadata
• Other objects
  • Links (point to data in a file or in another HDF5 file)
  • Datatypes (can be stored for complex structures and reused by multiple datasets)
HDF5 Dataset
A dataset has two parts:
• Data – the stored array itself
• Metadata
  • Dataspace – rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype – IEEE 32-bit float
  • Attributes – Time = 32.4, Pressure = 987, Temp = 56
  • Storage info – chunked, compressed
HDF5 Dataspace
• Two roles:
  • Dataspace contains spatial info about a dataset stored in a file
    • Rank and dimensions
    • Permanent part of the dataset definition
  • Dataspace describes the application’s data buffer and the data elements participating in I/O
• Example: file dataspace of rank = 2, dimensions = 4x6; memory dataspace of rank = 1, dimensions = 12
HDF5 Datatype
• Datatype – how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound
• Can be stored in a file as an HDF5 object (HDF5 committed datatype)
• Can be shared among different datasets
HDF5 Datatype
• HDF5 atomic types include
  • normal integer & float
  • user-definable (e.g., 13-bit integer)
  • variable-length types (e.g., strings)
  • references to objects/dataset regions
  • enumeration – names mapped to integers
  • array
• HDF5 compound types
  • Comparable to C structs (“records”)
  • Members can be atomic or compound types
HDF5 dataset: array of records
• Dimensionality: 5 x 3
• Datatype (one record): int8, int4, int16, and a 2x3x2 array of float32
Special storage options for dataset
• Chunked – better subsetting access time; compressible; extendable
• Compressed – improves storage efficiency, transmission speed
• Extendable – arrays can be extended in any direction
• External – metadata stays in the HDF5 file, raw data goes to a separate binary file (e.g., metadata for dataset “Fred” in file A, data for “Fred” in file B)
HDF5 Attribute
• Attribute – data of the form “name = value”, attached to an object by the application
• Operations are similar to dataset operations, but…
  • Not extendible
  • No compression or partial I/O
• Can be overwritten, deleted, or added during the “life” of a dataset
HDF5 Group
• A mechanism for organizing collections of related objects
• Every file starts with a root group, “/”
• Similar to UNIX directories
• Can have attributes
Path to HDF5 object in a file
[Diagram: the root group “/” contains X and Y; Y holds a dataset “temp” and a subgroup “bar”, which holds its own “temp”.]
• / (root)
• /X
• /Y
• /Y/temp
• /Y/bar/temp
Shared HDF5 objects
• /A/P
• /B/R
• /C/R
[Diagram: the root group “/” contains groups A, B, and C; A holds dataset P, while B and C share a single dataset R, reachable as both /B/R and /C/R.]
HDF5 Data Model: Example
ENSIGHT
Automotive crash simulation
Solid modeling
HDF5 mesh
Mesh Example, in HDFView
HDF5 Software
HDF5 software stack
• Tools & Applications
• HDF I/O Library
• HDF File
Structure of HDF5 Library
• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Perform data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, network, memory, etc.)
Write – from memory to disk
[Diagram: a data array in memory is written to the dataset on disk.]
Partial I/O
• Move just part of a dataset
• (a) Hyperslab from a 2D array to the corner of a smaller 2D array
• (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
Partial I/O (continued)
• Move just part of a dataset
• (c) A sequence of points from a 2D array to a sequence of points in a 3D array
• (d) Union of hyperslabs in file to union of hyperslabs in memory
Layers – parallel example
• I/O flows through many layers from application to disk:
  • Application
  • I/O library (HDF5)
  • Parallel I/O library (MPI-I/O)
  • Parallel file system (GPFS)
  • Disk architecture & layout of data on disk
• Parallel computing system (Linux cluster): compute nodes connected through a switch network and I/O servers to the disks
Virtual I/O layer
• Object API (C, Fortran 90, Java, C++)
• Library internals
• Virtual file I/O (C only)
Virtual file I/O layer
• A public API for writing I/O drivers
• Allows HDF5 to interface to disk, memory, or a user-defined device
• Example virtual file I/O drivers: File, File Family, Stdio, MPI I/O, Core (memory), Custom, …
• Each driver maps the abstract “storage” onto a real file, a family of files, memory, or another device
Applications & Domains
[Diagram: domain-specific APIs (UDM – LANL; SAF – LLNL, SNL; H5Part – Grids COTS; HDF-EOS – NASA; IDL) and common domain-specific data models sit on the HDF5 data model & API; HDF5 serial & parallel I/O passes through the HDF5 virtual file layer (I/O drivers: MPI I/O, Multi, Stdio, Custom, Core) to write the HDF5 format onto storage: a file on a parallel file system, a single file, split metadata and raw-data files, system memory, or a user-defined device.]
Portability & Robustness
• Runs almost anywhere
  • Linux and UNIX workstations
  • Windows, Mac OS X
  • Big ASC machines, Crays, VMS systems
  • TeraGrid and other clusters
  • Source and binaries available from http://www.hdfgroup.org/HDF5/release/index.html
• QA
  • Daily regression tests on key platforms
  • Meets NASA’s highest technology readiness level
Other Software
• The HDF Group
  • HDFView
  • Java tools
  • Command-line utilities
  • Web browser plug-in
  • Regression and performance testing software
  • Parallel h5diff
• 3rd party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabVIEW)
• Communities (EOS, ASC, CGNS)
• Integration with other software (iRODS, OPeNDAP)
Creating an HDF5 File with HDFView
Example: Create this HDF5 File
[Diagram: root group “/” containing “A” (a 4x6 array of integers), group “B”, and “Storm”.]
Demo
• Demonstrate the use of HDFView to create the HDF5 file
• Use h5dump to see the contents of the HDF5 file
• Use h5import to add data to the HDF5 file
• Use h5repack to change properties of the stored objects
• Use h5diff to compare two files
Introduction to HDF5 Programming Model and APIs
Structure of HDF5 Library (recap)
• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Perform data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O API (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, mpi-io, memory, etc.)
Goals of HDF5 Library
• Provide flexible API to support a wide range of operations on data.
• Support high performance access in serial and parallel computing environments.
• Be compatible with common data models and programming languages.
• Because of these goals, the HDF5 API is rich and large
Operations Supported by the API
• Create groups, datasets, attributes, linkages
• Create complex data types
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use a variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Query about file structure and properties
• Query about object structure, content, properties
Characteristics of the HDF5 API
• For flexibility, the API is extensive: 300+ functions
• This can be daunting… but there is hope
  • A few functions can do a lot
  • Start simple
  • Build up knowledge as more features are needed
• Library functions are categorized by object type
• The “H5Lite” API supports basic capabilities
[Image: Victorinox Swiss Army CyberTool 34]
The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings
• C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on
• Example APIs:
  • H5D: Dataset interface, e.g., H5Dread
  • H5F: File interface, e.g., H5Fopen
  • H5S: dataSpace interface, e.g., H5Sclose
Compiling HDF5 Applications
• h5cc – HDF5 C compiler command (similar to mpicc)
• h5fc – HDF5 F90 compiler command (similar to mpif90)
• h5c++ – HDF5 C++ compiler command

To compile:
  % h5cc h5prog.c
  % h5fc h5prog.f90
Compile option: -show
-show: displays the compiler commands and options without executing them
% h5cc -show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API
    -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include
    -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
    -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O
    -fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions
    -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o
    -L/home/packages/hdf5_1.6.6/Linux_2.6/lib
    /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a
    /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a
    -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
General Programming Paradigm
• Properties of an object are optionally defined
  • Creation property lists
  • Access property lists
  • Default values are used if none are defined
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
Order of Operations
• An order is imposed on operations by argument dependencies
For Example:
A file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
• Objects can be closed in any order.
HDF5 Defined Types
• For portability, the HDF5 library defines its own types:
  • hid_t: object identifiers (native integer)
  • hsize_t: size used for dimensions (unsigned long or unsigned long long)
  • hssize_t: for specifying coordinates, and sometimes for dimensions (signed long or signed long long)
  • herr_t: function return value
  • hvl_t: variable-length datatype
• For C, include hdf5.h in your HDF5 application.
Example: Create this HDF5 File
[Diagram: root group “/” with “A” (a 4x6 array of integers) and group “B”.]
Example: Step by Step
[Diagram: start from the root group “/”; first create dataset “A” (a 4x6 array of integers), then group “B”.]
Example: Create a File
[Diagram: a new file containing only the root group “/”.]
Steps to Create a File
1. Decide any special properties the file should have
   • Creation properties, like size of the user block
   • Access properties, such as metadata cache size
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed
Code: Create a File
hid_t file_id;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

• H5F_ACC_TRUNC flag – removes an existing file
• H5P_DEFAULT flags – create a regular UNIX file and access it with the HDF5 SEC2 I/O file driver
Example: Add a Dataset
[Diagram: dataset “A” (a 4x6 array of integers) added under the root group “/”.]
Dataset Components
• Data – the stored array itself
• Metadata
  • Dataspace – rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype – IEEE 32-bit float
  • Attributes – Time = 32.4, Pressure = 987, Temp = 56
  • Storage info – chunked, compressed
Dataset Creation Property List
• Dataset creation property list: information on how to store data in a file, e.g., chunked, or chunked & compressed
Steps to Create a Dataset
1. Define dataset characteristics
   • Dataspace – 4x6; Datatype – integer
   • Properties (if needed)
2. Decide where to put it – “root group”
   • Obtain location identifier
3. Decide link or path – “A”
4. Create link and dataset in file
5. (Eventually) Close everything
Code: Create a Dataset

hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: rank 2, current dims 4x6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

/* Create a dataset: pathname "A", datatype, dataspace,
   property list (default) */
dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE,
                        dataspace_id, H5P_DEFAULT);

/* Terminate access to dataset, dataspace, file */
status = H5Dclose (dataset_id);
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);
Example: Create a Group
[Diagram: file.h5 – root group “/” with dataset “A” (a 4x6 array of integers) and new group “B”.]
Steps to Create a Group
1. Decide where to put it – “root group”
   • Obtain location identifier
2. Decide link or path – “B”
3. Create link and group in file
   • Specify the number of bytes to store names of objects to be added to the group (as a hint) – or use the default.
4. (Eventually) Close the group.
Code: Create a Group
hid_t file_id, group_id;
...
/* Open "file.h5" */
file_id = H5Fopen ("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

/* Create group "/B" in the file. (The HDF5 1.8 H5Gcreate takes
   link-creation, group-creation, and group-access property lists.) */
group_id = H5Gcreate (file_id, "/B", H5P_DEFAULT,
                      H5P_DEFAULT, H5P_DEFAULT);

/* Close group and file. */
status = H5Gclose (group_id);
status = H5Fclose (file_id);
HDF5 Information
HDF Information Center: http://www.hdfgroup.org
HDF Help email address: [email protected]
HDF users mailing list: [email protected]
Questions?
Introduction to HDF5 Command-line Tools
HDF5 Command-line Tools
• Readers
  • h5dump, h5diff, h5ls
  • h5stat, h5check (new in release 1.8)
• Writers
  • h5import, h5repack, h5repart, h5jam/h5unjam
  • h5copy, h5mkgrp (new in release 1.8)
• Converters
  • h4toh5, h5toh4, gif2h5, h52gif
h5dump
h5dump: exports (dumps) the contents of an HDF5 file
• Multiple output types: ASCII, binary, XML
• Complete or selected file content
  • Object header information (the structure)
  • Attributes (the metadata)
  • Datasets (the data): all dataset values, or subsets of dataset values
  • Properties (filters, storage layout, fill value)
  • Specific objects: groups / datasets / attributes / named datatypes / soft links
• h5dump -h, --help lists all option flags
Example: h5dump
No options: “all” contents go to standard out.

% h5dump Sample.h5
HDF5 "Sample.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
         DATA {
         (0,0): 0.01, 0.02, 0.03,
         (1,0): 0.1, 0.2, 0.3,
         (2,0): 1, 2, 3,
         (3,0): 10, 20, 30
         }
      }
   }
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 0, 1, 2, 3, 4, 5,
      (1,0): 10, 11, 12, 13, 14, 15,
      (2,0): 20, 21, 22, 23, 24, 25,
      (3,0): 30, 31, 32, 33, 34, 35,
      (4,0): 40, 41, 42, 43, 44, 45
      }
   }
}
}
h5dump - object header information
HDF5 "Sample.h5" {
GROUP "/" {
GROUP "Floats" {
DATASET "FloatArray" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
}
}
DATASET "IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
}
}
}
-H option: Object header information only
% h5dump -H Sample.h5
h5dump – specific dataset
HDF5 "Sample.h5" {
DATASET "/Floats/FloatArray" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
DATA {
(0,0): 0.01, 0.02, 0.03,
(1,0): 0.1, 0.2, 0.3,
(2,0): 1, 2, 3,
(3,0): 10, 20, 30
}
}
}
-d dataset option: Specific dataset only
% h5dump -d /Floats/FloatArray Sample.h5
h5dump – dataset values to file
HDF5 "Sample.h5" {
DATASET "/IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
DATA {
}
}
}
-o file option: Dataset values output to file
% h5dump -o Ofile -d /IntArray Sample.h5
% cat Ofile
   (0,0): 0, 1, 2, 3, 4, 5,
   (1,0): 10, 11, 12, 13, 14, 15,
   (2,0): 20, 21, 22, 23, 24, 25,
   (3,0): 30, 31, 32, 33, 34, 35,
   (4,0): 40, 41, 42, 43, 44, 45

-y option: Do not output array indices with data values
h5dump – binary output
-b FORMAT option: Binary output, where FORMAT can be:
  MEMORY – data exported with datatypes matching memory on the system where h5dump is run
  FILE – data exported with datatypes matching those in the HDF5 file being dumped
  LE – data exported with a pre-defined little-endian datatype
  BE – data exported with a pre-defined big-endian datatype
• Typically used with the -d dataset and -o outputFile options
• Allows data values to be exported for use with other applications
• When -b and -d are used together, array indices are not output
h5dump – binary output
% h5dump -b BE -d /IntArray -o OBE Sample.h5
% od -b OBE | head -2
0000000 000 000 000 000 000 000 000 001 000 000 000 002 000 000 000 003
0000020 000 000 000 004 000 000 000 005 000 000 000 012 000 000 000 013

% h5dump -b LE -d /IntArray -o OLE Sample.h5
% od -b OLE | head -2
0000000 000 000 000 000 001 000 000 000 002 000 000 000 003 000 000 000
0000020 004 000 000 000 005 000 000 000 012 000 000 000 013 000 000 000

% h5dump -b MEMORY -d /IntArray -o OME Sample.h5
% od -b OME | head -2
0000000 000 000 000 000 001 000 000 000 002 000 000 000 003 000 000 000
0000020 004 000 000 000 005 000 000 000 012 000 000 000 013 000 000 000
h5dump – properties information
HDF5 "Sample.h5" {GROUP "/" { GROUP "Floats" { DATASET "FloatArray" { DATATYPE H5T_IEEE_F32LE DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) } STORAGE_LAYOUT { CONTIGUOUS SIZE 48 OFFSET 3696 } FILTERS { NONE } FILLVALUE { FILL_TIME H5D_FILL_TIME_IFSET VALUE 0 } ALLOCATION_TIME { H5D_ALLOC_TIME_LATE } …
-p option: Print dataset filters, storage layout, fill value
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 99
• % h5dump –p –H Sample.h5
h5import
h5import: loads data into an existing or new HDF5 file
• Data loaded from ASCII or binary files
• Each input file corresponds to the data values for one dataset
• Integer (signed or unsigned) and float data can be loaded
• Per-dataset settable properties include:
  • datatype (int or float; size; architecture; byte order)
  • storage (compression, chunking, external file, maximum dimensions)
• Properties set via
  • command line: % h5import in in_opts [in2 in2_opts] -o out
  • configuration file: % h5import in -c conf1 [in2 -c conf2] -o out
Example: h5import
Create Sample2.h5 based on Sample.h5

% cat config.FloatArray
PATH /Floats/FloatArray
INPUT-CLASS TEXTFP
RANK 2
DIMENSION-SIZES 4 3

% cat in.FloatArray
0.01 0.02 0.03
0.1 0.2 0.3
1 2 3
10 20 30

% h5dump -d /Floats/FloatArray -y Sample.h5
HDF5 "Sample.h5" {
DATASET "/Floats/FloatArray" {
   DATATYPE  H5T_IEEE_F32LE
   DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
   DATA {
      0.01, 0.02, 0.03,
      0.1, 0.2, 0.3,
      1, 2, 3,
      10, 20, 30
   }
}
}
Example: h5import (continued)

% cat config.IntArray
PATH /IntArray
INPUT-CLASS TEXTIN
RANK 2
DIMENSION-SIZES 5 6

% cat in.IntArray
0 1 2 3 4 5
10 11 12 13 14 15
20 21 22 23 24 25
30 31 32 38 34 35
40 41 42 43 44 45

Input and configuration files ready; issue the command:
% h5import in.FloatArray -c config.FloatArray \
           in.IntArray -c config.IntArray -o Sample2.h5
h5mkgrp
h5mkgrp: makes groups in an HDF5 file.
Usage: h5mkgrp [OPTIONS] FILE GROUP...
OPTIONS
-h, --help Print a usage message and exit
-l, --latest Use latest version of file format to create groups
-p, --parents No error if existing, make parent groups as needed
-v, --verbose Print information about OBJECTS and OPTIONS
-V, --version Print version number and exit
Example:
% h5mkgrp Sample2.h5 /EmptyGroup
Introduced in HDF5 release 1.8.0.
h5diff
h5diff: compares HDF5 files and reports differences
• compare two HDF5 files: % h5diff file1 file2
• compare the same object in two files: % h5diff file1 file2 object
• compare different objects in two files: % h5diff file1 file2 object1 object2
Option flags:
  (none): report the number of differences found in objects and where they occurred
  -r: in addition, report the differences
  -v: in addition, print a list of objects and warnings; typically used when comparing two files without specifying objects
Example: h5diff

% h5diff -v Sample.h5 Sample2.h5

file1     file2
---------------------------------------
  x         x    /
            x    /EmptyGroup
  x         x    /Floats
  x         x    /Floats/FloatArray
  x         x    /IntArray

group  : </> and </>
0 differences found
group  : </Floats> and </Floats>
0 differences found
dataset: </Floats/FloatArray> and </Floats/FloatArray>
0 differences found
dataset: </IntArray> and </IntArray>
size:  [5x6]  [5x6]
position    IntArray    IntArray    difference
----------------------------------------------
[ 3 3 ]     33          38          5
h5repack
h5repack: copies an HDF5 file to a new file with a specified filter and storage layout

• Removes unused space introduced when
  • objects were deleted
  • compressed datasets were updated and no longer fit in the original space
  • full space allocated for variable-length data was not used
• Optionally applies a filter to datasets
  • gzip, szip, shuffle, checksum
• Optionally applies a storage layout to datasets
  • contiguous, chunked, compact
h5repack: filters
• -f FILTER option: apply a filter; FILTER can be:
  • GZIP to apply GZIP compression
  • SZIP to apply SZIP compression
  • SHUF to apply the HDF5 shuffle filter
  • FLET to apply the HDF5 checksum (Fletcher32) filter
  • NBIT to apply NBIT compression
  • SOFF to apply the HDF5 scale/offset filter
  • NONE to remove all filters

Compression is not performed if the data is smaller than 1 KB, unless the -m flag is used.
h5repack: storage layout
• -l LAYOUT option: apply a storage layout; LAYOUT can be:
  • CHUNK to apply chunked layout
  • COMPA to apply compact layout
  • CONTI to apply contiguous layout
Example: h5repack (filter)

• Tropospheric Emission Spectrometer (TES) on Aura, the third of NASA's Earth Observing System spacecraft.
• Makes global 3-D measurements of ozone and of other chemical species involved in its formation and destruction.

% h5repack -f SHUF -f GZIP=1 TES-Aura.he5 TES-rp.he5
% ls -sk TES-Aura.he5 TES-rp.he5
75608 TES-Aura.he5
56808 TES-rp.he5

About 25% reduction in file size (the original is 33% larger than the repacked file).
Example: h5repack (layout)
% h5repack -m 1 -l Floats/FloatArray:CHUNK=4x1 \
           Sample.h5 Sample-rp.h5
% h5dump -p -H Sample-rp.h5
HDF5 "Sample-rp.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE H5T_IEEE_F32LE
         DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
         STORAGE_LAYOUT {
            CHUNKED ( 4, 1 )
            SIZE 48
         }
         FILTERS { NONE }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE 0
         }
         ALLOCATION_TIME { H5D_ALLOC_TIME_INCR }
         …
Performance Tuning & Troubleshooting
• HDF5 tools can assist with performance tuning and troubleshooting:
  • Discover objects and their properties in HDF5 files
      h5dump -p
  • Get file size overhead information
      h5stat
  • Find locations of objects in a file
      h5ls
  • Discover differences
      h5diff, h5ls
  • Find the location of raw data
      h5ls -var
  • Check whether a file conforms to the HDF5 File Format Specification
      h5check
h5stat
h5stat: Prints statistics about HDF5 files
• Reports two types of statistics:
  • High-level information about objects:
    • Number of objects of each type (groups, datasets, datatypes)
    • Number of unique datatypes
    • Size of raw data
  • Information about objects' structural metadata:
    • Size of structural metadata (total/free)
      • Object headers, local and global heaps
    • Size of B-trees
    • Object header fragmentation
h5stat
• Helps to…
  • troubleshoot size overhead in HDF5 files
  • choose appropriate properties and storage strategies
• Usage:
    % h5stat --help
    % h5stat file.h5
• Full specification at: http://www.hdfgroup.uiuc.edu/RFC/HDF5/h5stat/
Introduced in HDF5 release 1.8.0.
h5check
• Verifies that a file is encoded according to the HDF5 File Format Specification
    http://www.hdfgroup.org/HDF5/doc/H5.format.html
• Does not use the HDF5 library
• Used to confirm that files written by the HDF5 library are compliant with the specification
• The tool is not part of the HDF5 source code distribution
    ftp://ftp.hdfgroup.org/HDF5/special_tools/h5check/
Questions?
HDF5 Advanced Topics
Outline
• Part I
  • Overview of HDF5 datatypes
• Part II
  • Partial I/O in HDF5
    • Hyperslab selection
    • Dataset region references
    • Chunking and compression
• Part III
  • Performance issues (how to do it right)
• Part IV
  • Performance benefits of HDF5 version 1.8
Part I: HDF5 Datatypes

Quick overview of the most difficult topics
HDF5 Datatypes
• HDF5 has a rich set of pre-defined datatypes and supports the creation of an unlimited variety of complex user-defined datatypes.
• Datatype definitions are stored in the HDF5 file with the data.
• Datatype definitions include information such as byte order (endianness), size, and floating-point representation to fully describe how the data is stored and to ensure portability across platforms.
• Datatype definitions can be shared among objects in an HDF file, providing a powerful and efficient mechanism for describing data.
Example

• Array of integers on an IA32 platform: the native integer is little-endian, 4 bytes (H5T_NATIVE_INT). H5Dwrite stores it in the file as H5T_STD_I32LE.
• Array of integers on a SPARC64 platform: the native integer is big-endian, 8 bytes (H5T_NATIVE_INT). H5Dread converts the stored little-endian 4-byte integers to that native type automatically.
• The same conversion machinery handles other layouts, e.g. writing a little-endian 4-byte integer to a VAX G-floating value with H5Dwrite.
Storing Variable Length Data in HDF5
(Figure: data elements arriving over time; some streams produce the same amount of data at each time step, others produce varying amounts, motivating fixed-length and variable-length array storage.)
HDF5 Fixed and Variable Length Array Storage
Storing Strings in HDF5
• Array of characters (array datatype or extra dimension in the dataset)
  • Quick access to each character
  • Extra work to access and interpret each string
• Fixed length
      string_id = H5Tcopy(H5T_C_S1);
      H5Tset_size(string_id, size);
  • Wasted space in shorter strings
  • Can be compressed
• Variable length
      string_id = H5Tcopy(H5T_C_S1);
      H5Tset_size(string_id, H5T_VARIABLE);
  • Overhead, as for all VL datatypes
  • Compression is not applied to the actual data
Storing Variable Length Data in HDF5
• Each element is represented by the C structure
      typedef struct {
          size_t len;  /* Length of VL data (in base-type units) */
          void   *p;   /* Pointer to the VL data */
      } hvl_t;
• The base type can be any HDF5 type
      H5Tvlen_create(base_type)
Example

hvl_t data[LENGTH];

for(i=0; i<LENGTH; i++) {
    data[i].p = malloc((i+1)*sizeof(unsigned int));
    data[i].len = i+1;
}
tvl = H5Tvlen_create(H5T_NATIVE_UINT);

(Here data[0].p points to a 1-element buffer, and data[4].len is 5.)
Reading HDF5 Variable Length Array
hvl_t rdata[LENGTH];
/* Create the memory vlen type */
tvl = H5Tvlen_create(H5T_NATIVE_UINT);
ret = H5Dread(dataset, tvl, H5S_ALL, H5S_ALL,
              H5P_DEFAULT, rdata);
/* Reclaim the read VL data */
H5Dvlen_reclaim(tvl, H5S_ALL, H5P_DEFAULT, rdata);

• On read, the HDF5 library allocates memory for the data; the application only needs to allocate the array of hvl_t elements (pointers and lengths).
Storing Tables in an HDF5 File
Example

a_name (integer)   b_name (float)   c_name (double)
0                  0.               1.0000
1                  1.               0.5000
2                  4.               0.3333
3                  9.               0.2500
4                  16.              0.2000
5                  25.              0.1667
6                  36.              0.1429
7                  49.              0.1250
8                  64.              0.1111
9                  81.              0.1000

Multiple ways to store a table:
  • Dataset for each field
  • Dataset with compound datatype
  • If all fields have the same type:
    • 2-dim array
    • 1-dim array of an array datatype

Choose to achieve your goal:
  • How much overhead will each type of storage create?
  • Do I always read all fields?
  • Do I need to read some fields more often?
  • Do I want to use compression?
  • Do I want to access some records only?
HDF5 Compound Datatypes
• Compound types
  • Comparable to C structs
  • Members can be atomic or compound types
  • Members can be multidimensional
  • Can be written/read by a field or a set of fields
  • Not all data filters can be applied (e.g. shuffle, SZIP)
HDF5 Compound Datatypes
• Which APIs to use?
  • H5TB APIs
    • Create, read, get info, and merge tables
    • Add, delete, and append records
    • Insert and delete fields
    • Limited control over table properties (only GZIP compression at level 6, default allocation time for the table, extendible, etc.)
  • PyTables http://www.pytables.org
    • Based on H5TB
    • Python interface
    • Indexing capabilities
  • HDF5 APIs
    • H5Tcreate(H5T_COMPOUND) and H5Tinsert calls to create a compound datatype
    • H5Dcreate, etc.
    • See the H5Tget_member* functions for discovering properties of an HDF5 compound datatype
Creating and Writing Compound Dataset
• h5_compound.c example
typedef struct s1_t {
    int    a;
    float  b;
    double c;
} s1_t;

s1_t s1[LENGTH];
Creating and Writing Compound Dataset
/* Create datatype in memory. */
s1_tid = H5Tcreate(H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);

Note:
• Use the HOFFSET macro instead of calculating offsets by hand.
• The order of the H5Tinsert calls is not important when HOFFSET is used.
Creating and Writing Compound Dataset
/* Create dataset and write data */
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL,
                  H5P_DEFAULT, s1);

Note:
• In this example the memory and file datatypes are the same.
• The type is not packed.
• Use H5Tpack to save space in the file:
      status  = H5Tpack(s1_tid);
      dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
File Content with h5dump
HDF5 "SDScompound.h5" {
GROUP "/" {
   DATASET "ArrayOfStructures" {
      DATATYPE {
         H5T_STD_I32BE "a_name";
         H5T_IEEE_F32BE "b_name";
         H5T_IEEE_F64BE "c_name";
      }
      DATASPACE { SIMPLE ( 10 ) / ( 10 ) }
      DATA {
         {
            [ 0 ],
            [ 0 ],
            [ 1 ]
         },
         {
            [ 1 ],
         …
Reading Compound Dataset
/* Open the dataset, discover its datatype, and read the data. */
dataset = H5Dopen(file, DATASETNAME, H5P_DEFAULT);
s2_tid  = H5Dget_type(dataset);
mem_tid = H5Tget_native_type(s2_tid, H5T_DIR_ASCEND);
s1 = malloc(H5Tget_size(mem_tid)*number_of_elements);
status = H5Dread(dataset, mem_tid, H5S_ALL, H5S_ALL,
                 H5P_DEFAULT, s1);

Note:
• We could construct the memory type as we did in the writing example.
• A general application needs to discover the type in the file, find the corresponding memory type, allocate space, and then read.
Reading Compound Dataset by Fields
typedef struct s2_t {
    double c;
    int    a;
} s2_t;
s2_t s2[LENGTH];
…
s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));
H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);
…
status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL,
                 H5P_DEFAULT, s2);
New Way of Creating Datatypes
• Another way to create a compound datatype
#include "H5LTpublic.h"
…
s2_tid = H5LTtext_to_dtype(
             "H5T_COMPOUND "
             "{H5T_NATIVE_DOUBLE \"c_name\"; "
             " H5T_NATIVE_INT \"a_name\";}",
             H5LT_DDL);
Need Help with Datatypes?
• Check our support web pages
• http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-api/api18-c.html
• http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-api/api16-c.html
Part II: Working with Subsets
Collect data one way…
• Array of images (3D)

Display data another way…
• Stitched image (2D array)

Data is too big to read…
• Need to select and access the same elements of a dataset

Refer to a region…
HDF5 Library Features
• The HDF5 library provides capabilities to
  • Describe subsets of data and perform write/read operations on subsets
    • Hyperslab selections and partial I/O
  • Store descriptions of data subsets in a file
    • Object references
    • Region references
  • Use efficient storage mechanisms to achieve good performance while writing/reading subsets of data
    • Chunking, compression
Partial I/O in HDF5
How to Describe a Subset in HDF5?
• Before writing or reading a subset of data, one has to describe it to the HDF5 library.
• HDF5 APIs and documentation refer to a subset as a "selection" or "hyperslab selection".
• If a selection is specified, the HDF5 library performs I/O on that selection only, not on all elements of the dataset.
Types of Selections in HDF5
• Two types of selections
  • Hyperslab selection
    • Regular hyperslab
    • Simple hyperslab
    • Result of set operations on hyperslabs (union, difference, …)
  • Point selection
• Hyperslab selection is especially important for doing parallel I/O in HDF5 (see the Parallel HDF5 tutorial)
Regular Hyperslab

• Collection of regularly spaced, equal-sized blocks
Simple Hyperslab

• Contiguous subset or sub-array
Hyperslab Selection

• Result of a union operation on three simple hyperslabs
Hyperslab Description
• Start - starting location of a hyperslab (1,1)
• Stride - number of elements that separate each block (3,2)
• Count - number of blocks (2,6)
• Block - block size (2,1)
• Everything is "measured" in number of elements
Simple Hyperslab Description
• Two ways to describe a simple hyperslab
  • As several blocks
    • Stride - (2,1)
    • Count - (2,6)
    • Block - (2,1)
  • As one block
    • Stride - (1,1)
    • Count - (1,1)
    • Block - (4,6)
• No performance penalty for one way or the other
H5Sselect_hyperslab Function
space_id   Identifier of the dataspace
op         Selection operator: H5S_SELECT_SET or H5S_SELECT_OR
start      Array with the starting coordinates of the hyperslab
stride     Array specifying the number of positions to move along each dimension between selected blocks
count      Array specifying how many blocks to select from the dataspace, in each dimension
block      Array specifying the size of the element block (NULL indicates a block of a single element in each dimension)
Reading/Writing Selections
Programming model for reading from a dataset in a file:

1. Open the dataset.
2. Get the file dataspace handle of the dataset and specify the subset to read from.
   a. H5Dget_space returns the file dataspace handle; the file dataspace describes the array stored in the file (number of dimensions and their sizes).
   b. H5Sselect_hyperslab selects the elements of the array that participate in the I/O operation.
3. Allocate a data buffer of an appropriate shape and size.
Reading/Writing Selections
Programming model (continued):

4. Create a memory dataspace and specify the subset to write to.
   a. The memory dataspace describes the data buffer (its rank and dimension sizes).
   b. Use the H5Screate_simple function to create the memory dataspace.
   c. Use H5Sselect_hyperslab to select the elements of the data buffer that participate in the I/O operation.
5. Issue H5Dread or H5Dwrite to move the data between the file and the memory buffer.
6. Close the file dataspace and memory dataspace when done.
Example : Reading Two Rows
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
• Data in a file• 4x6 matrix
• Buffer in memory• 1-dim array of length 14
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 157
Example: Reading Two Rows

File selection: rows 2 and 3 of the matrix.

• start  = {1,0}
• stride = {1,1}
• count  = {2,6}
• block  = {1,1}

filespace = H5Dget_space(dataset);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                    start, NULL, count, NULL);
Example: Reading Two Rows

Memory selection: elements 1 through 12 of the buffer.

• start[1] = {1}
• count[1] = {12}
• dim[1]   = {14}

memspace = H5Screate_simple(1, dim, NULL);
H5Sselect_hyperslab(memspace, H5S_SELECT_SET,
                    start, NULL, count, NULL);
Example: Reading Two Rows

H5Dread(…, …, memspace, filespace, …, …);

Result in the memory buffer:

  -1  7  8  9 10 11 12 13 14 15 16 17 18 -1
Things to Remember
• The number of elements selected in the file and in the memory buffer must be the same
  • H5Sget_select_npoints returns the number of selected elements in a hyperslab selection
• HDF5 partial I/O is tuned to move data between selections that have the same dimensionality; avoid choosing subsets that have different ranks (as in the example above)
• Allocate a buffer of an appropriate size when reading data; use H5Tget_native_type and H5Tget_size to get the correct size of a data element in memory
HDF5 Region References and Selections

• Need to select and access the same elements of a dataset

Saving Selected Region in a File
Reference Datatype
• Reference to an HDF5 object
  • Pointer to a group or a dataset in a file
  • The predefined datatype H5T_STD_REF_OBJ describes object references
• Reference to a dataset region (i.e., to a selection)
  • Pointer to a dataspace selection
  • The predefined datatype H5T_STD_REF_DSETREG describes region references
Reference to Dataset Region
• File REF_REG.h5: the root group holds a dataset "Matrix" and a dataset of region references into it

  Matrix:
  1 1 2 3 3 4 5 5 6
  1 2 2 3 4 4 5 6 6
Reference to Dataset Region
Example
dsetr_id = H5Dcreate(file_id, "REGION_REFERENCES",
                     H5T_STD_REF_DSETREG, …);
…
H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, …);
H5Rcreate(&ref[0], file_id, "MATRIX", H5R_DATASET_REGION, space_id);
…
H5Dwrite(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL,
         H5P_DEFAULT, ref);
Reference to Dataset Region

HDF5 "REF_REG.h5" {
GROUP "/" {
   DATASET "MATRIX" {
   ……
   }
   DATASET "REGION_REFERENCES" {
      DATATYPE  H5T_REFERENCE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
         (0): DATASET /MATRIX {(0,3)-(1,5)},
         (1): DATASET /MATRIX {(0,0), (1,6), (0,8)}
      }
   }
}
}
Chunking in HDF5
HDF5 Chunking
• Dataset data is divided into equally sized blocks (chunks).
• Each chunk is stored separately as a contiguous block in the HDF5 file.

(Figure: in application memory, the metadata cache holds the dataset header — datatype, dataspace, attributes — and a chunk index; in the file, the header and chunk index locate chunks A, B, C, and D, which may be scattered in any order.)
HDF5 Chunking
• Chunking is needed for
  • Enabling compression and other filters
  • Extendible datasets
HDF5 Chunking
• If used appropriately, chunking improves partial I/O for big datasets
• Only two chunks are involved in I/O
HDF5 Chunking
• A chunk has the same rank as the dataset
• Chunk dimensions do not need to be factors of the dataset's dimensions
Creating Chunked Dataset
1. Create a dataset creation property list.
2. Set the property list to use chunked storage layout.
3. Create the dataset with the above property list.

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 100;
H5Pset_chunk(dcpl_id, rank, ch_dims);
dset_id = H5Dcreate(…, dcpl_id);
H5Pclose(dcpl_id);
Writing or Reading Chunked Dataset
1. The chunking mechanism is transparent to the application.
2. Use the same set of operations as for a contiguous dataset, for example:
     H5Dopen(…);
     H5Sselect_hyperslab(…);
     H5Dread(…);
3. Selections do not need to coincide precisely with chunk boundaries.
HDF5 Filters
• HDF5 filters modify data during I/O operations
• Available filters:
  1. Checksum (H5Pset_fletcher32)
  2. Shuffle filter (H5Pset_shuffle)
  3. Data transformation (in 1.8.*)
  4. Compression
     • Scale + offset (in 1.8.*)
     • N-bit (in 1.8.*)
     • GZIP (deflate), SZIP (H5Pset_deflate, H5Pset_szip)
     • User-defined filters (e.g. BZIP2)
• An example of a user-defined compression filter can be found at http://www.hdfgroup.uiuc.edu/papers/papers/bzip2/
Creating Compressed Dataset
1. Create a dataset creation property list.
2. Set the property list to use chunked storage layout.
3. Set the property list to use filters.
4. Create the dataset with the above property list.

crp_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 100;
H5Pset_chunk(crp_id, rank, ch_dims);
H5Pset_deflate(crp_id, 9);
dset_id = H5Dcreate(…, crp_id);
H5Pclose(crp_id);
Writing Compressed Dataset
(Figure: chunks of a chunked dataset in application memory pass through a per-dataset chunk cache and the filter pipeline on their way to the file.)

• The default chunk cache size is 1 MB.
• Filters, including compression, are applied when a chunk is evicted from the cache.
• Chunks in the file may have different sizes.
Chunking Basics to Remember
• Chunking creates storage overhead in the file.
• Performance is affected by
  • Chunking and compression parameters
  • Chunk cache size (H5Pset_cache call)
• Some hints for getting better performance
  • Use a chunk size not smaller than the file system block size (e.g. 4 KB).
  • Use a compression method appropriate for your data.
  • Avoid using selections that do not coincide with chunk boundaries.
Example
Creates a compressed 1000x20 integer dataset in a file:

% h5dump -p -H zip.h5
HDF5 "zip.h5" {
GROUP "/" {
   GROUP "Data" {
      DATASET "Compressed_Data" {
         DATATYPE H5T_STD_I32BE
         DATASPACE SIMPLE { ( 1000, 20 ) ………
         STORAGE_LAYOUT {
            CHUNKED ( 20, 20 )
            SIZE 5316
         }
Example (continued)
         FILTERS {
            COMPRESSION DEFLATE { LEVEL 6 }
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE 0
         }
         ALLOCATION_TIME { H5D_ALLOC_TIME_INCR }
      }
   }
}
}
Example (bigger chunk)
Creates a compressed 1000x20 integer dataset in a file; a better compression ratio is achieved with larger chunks.

% h5dump -p -H zip.h5
HDF5 "zip.h5" {
GROUP "/" {
   GROUP "Data" {
      DATASET "Compressed_Data" {
         DATATYPE H5T_STD_I32BE
         DATASPACE SIMPLE { ( 1000, 20 ) ………
         STORAGE_LAYOUT {
            CHUNKED ( 200, 20 )
            SIZE 2936
         }
Part III: Performance Issues (How to Do It Right)
Performance of Serial I/O Operations
• The next slides show the performance effects of using different access patterns and storage layouts.
• We use three test cases, each consisting of writing selections to an array of characters.
• Data is stored in row-major order.
• Tests were executed on a THG Linux x86_64 box using h5perf_serial and HDF5 version 1.8.0.
Serial Benchmarking Tool
• The benchmarking tool h5perf_serial was publicly released with HDF5 1.8.1
• Features include:
  • Support for POSIX and HDF5 I/O calls
  • Support for datasets and buffers with multiple dimensions
  • Entire dataset access using a single I/O operation or several
  • Selection of contiguous or chunked storage for HDF5 operations
Contiguous Storage (Case 1)
• Rectangular dataset of size 48K x 48K, with write selections of 512 x 48K.
• HDF5 storage layout is contiguous.
• Good I/O pattern for POSIX and HDF5, because each selection is contiguous.
• POSIX: 5.19 MB/s
• HDF5: 5.36 MB/s
Contiguous Storage (Case 2)
• Rectangular dataset of 48K x 48K, with write selections of 48K x 512.
• HDF5 storage layout is contiguous.
• Bad I/O pattern for POSIX and HDF5, because each selection is noncontiguous.
• POSIX: 1.24 MB/s
• HDF5: 0.05 MB/s
Chunked Storage
• Rectangular dataset of 48K x 48K, with write selections of 48K x 512.
• HDF5 storage layout is chunked; chunk and selection sizes are equal.
• Bad I/O case for POSIX, because selections are noncontiguous.
• Good I/O case for HDF5, since selections are contiguous thanks to the chunked layout.
• POSIX: 1.51 MB/s
• HDF5: 5.58 MB/s
Conclusions
• Access patterns with small I/O operations incur high latency and overhead costs many times over.
• Chunked storage may improve I/O performance by making data selections contiguous in the file.
Writing Chunked Dataset
• 1000x100x100 dataset
  • 4-byte integers
  • Random values 0-99
• 50x100x100 chunks (20 total)
  • Chunk size: 2 MB
• Write the entire dataset using 1x100x100 slices
  • Slices are written sequentially
Test Setup
• 20 chunks
• 1000 slices
• Chunk size is 2 MB
Test Setup (continued)
• Tests performed with 1 MB and 5 MB chunk cache sizes
• Cache size set with the H5Pset_cache function:

    H5Pget_cache(fapl, NULL, &rdcc_nelmts,
                 &rdcc_nbytes, &rdcc_w0);
    H5Pset_cache(fapl, 0, rdcc_nelmts,
                 5*1024*1024, rdcc_w0);

• Tests performed with no compression and with gzip (deflate) compression
Effect of Chunk Cache Size on Write
• No compression

Cache size      I/O operations  Total data written           File size
1 MB (default)  1002            75.54 MB                     38.15 MB
5 MB            22              38.16 MB                     38.15 MB

• Gzip compression

Cache size      I/O operations  Total data written           File size
1 MB (default)  1982            335.42 MB (322.34 MB read)   13.08 MB
5 MB            22              13.08 MB                     13.08 MB
Effect of Chunk Cache Size on Write
• With the 1 MB cache size, a chunk will not fit into the cache
  • All writes to the dataset must be immediately written to disk
  • With compression, the entire chunk must be read and rewritten every time a part of the chunk is written to
    • Data must also be decompressed and recompressed each time
    • Non-sequential writes could result in a larger file
  • Without compression, the entire chunk must be written when it is first written to the file
    • If the selection were not contiguous on disk, it could require as much as one I/O operation for each element
Effect of Chunk Cache Size on Write
• With the 5 MB cache size, a chunk is written only after it is full
  • Drastically reduces the number of I/O operations
  • Reduces the amount of data that must be written (and read)
  • Reduces processing time, especially with the compression filter
Conclusion
• It is important to make sure that a chunk will fit into the raw data chunk cache
• If you will be writing to multiple chunks at once, increase the cache size even more
  • Try to design chunk dimensions to minimize the number of chunks you will be writing to at once
Reading Chunked Dataset
• Read the same dataset, again by slices, but with slices that cross through all the chunks
• Two orientations for the read plane
  • Plane includes the fastest changing dimension
  • Plane does not include the fastest changing dimension
• Measure total read operations and total size read
• Chunk sizes of 50x100x100 and 10x100x100
• 1 MB cache
Test Setup

• Chunks
• Read slices: vertical and horizontal
Results
• Read slice includes the fastest changing dimension

Chunk size  Compression  I/O operations  Total data read
50          Yes          2010            1307 MB
10          Yes          10012           1308 MB
50          No           100010          38 MB
10          No           10012           3814 MB
Results (continued)
• Read slice does not include the fastest changing dimension

Chunk size  Compression  I/O operations  Total data read
50          Yes          2010            1307 MB
10          Yes          10012           1308 MB
50          No           10000010        38 MB
10          No           10012           3814 MB
Effect of Cache Size on Read
• When compression is enabled, the library must always read each entire chunk once for each call to H5Dread.
• When compression is disabled, the library’s behavior depends on the cache size relative to the chunk size.• If the chunk fits in cache, the library reads each
entire chunk once for each call to H5Dread• If the chunk does not fit in cache, the library reads
only the data that is selected• More read operations, especially if the read plane
does not include the fastest changing dimension• Less total data read
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 199
Conclusion
• In this case cache size does not matter when reading if compression is enabled.
• Without compression, a larger cache may not be beneficial, unless the cache is large enough to hold all of the chunks
• The optimum cache size depends on the exact shape of the data, as well as the hardware
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 200
Hints for Chunk Settings
• Chunk dimensions should align as closely as possible with the hyperslab dimensions used for read/write
• The chunk cache size (rdcc_nbytes) should be large enough to hold all the chunks in the selection
  • If this is not possible, it may be best to disable chunk caching altogether (set rdcc_nbytes to 0)
• rdcc_nelmts should be a prime number at least 10 to 100 times the number of chunks that can fit into rdcc_nbytes
• rdcc_w0 should be set to 1 if chunks that have been fully read/written will never be read/written again
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 201
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 202
Part IV: Performance Benefits of HDF5 version 1.8
What Did We Do in HDF5 1.8?
• Extended the file format specification
• Reviewed group implementations
• Introduced a new link object
• Revamped the metadata cache implementation
• Improved handling of datasets and datatypes
• Introduced shared object header messages
• Extended error handling
• Enhanced backward/forward API and file format compatibility
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 203
What Did We Do in HDF5 1.8?
And much more good stuff to make HDF5
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 204
• Better and Faster
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 205
HDF5 File Format Extension
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 206
HDF5 File Format Extension
• Why:
  • Address deficiencies of the original file format
  • Reduce space overhead in an HDF5 file
  • Enable new features
• What:
  • A new routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (compare with the earliest version in which the object became available, for example, the array datatype)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 207
HDF5 File Format Extension
Example
/* Use the latest version of the file format for each object created in a file */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
fid = H5Fcreate(…, …, …, fapl_id);
or
fid = H5Fopen(…, …, fapl_id);
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 208
Group Revisions
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 209
Better Large Group Storage
• Why:
  • Faster, more scalable storage and access for large groups
• What:
  • A new format and method for storing groups with many links
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 210
Informal Benchmark
• Create a file and a group in the file
• Create up to 10^6 groups, with one dataset in each group
• Compare file sizes and performance of HDF5 1.8.1 using the latest group format against HDF5 1.8.1 (default, old format) and HDF5 1.6.7
• Note: default 1.8.1 and 1.6.7 became very slow after 700,000 groups
Time to Open and Read a Dataset
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 211
[Chart: time (milliseconds, 0.1 to 1000, log scale) to open and read a dataset vs. number of groups (10,000 to 1,000,000), for HDF5 1.6, 1.8 (old groups), and 1.8 (new groups)]
File Size
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 212
[Chart: file size (kilobytes, 0 to 1,000,000) vs. number of groups (0 to 800,000), for 1.8 (old groups) and 1.8 (new groups)]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 213
Questions?
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 214
Data Storage and I/O in HDF5
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 215
Software stack
• Life cycle: what happens to data when it is transferred from the application buffer to an HDF5 file, and from an HDF5 file to the application buffer?
[Diagram: application (data buffer) → H5Dwrite → object API → library internals → virtual file I/O → unbuffered I/O → data in a file or other "storage"]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 216
Goals
• Understanding what happens to data inside the HDF5 library helps in writing efficient applications
• Goals of this talk:
  • Describe some basic operations and data structures, and explain how they affect performance and storage sizes
  • Give some "recipes" for improving performance
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 217
Topics
• Dataset metadata and array data storage layouts
• Types of dataset storage layouts
• Factors affecting I/O performance
• I/O with compact datasets
• I/O with contiguous datasets
• I/O with chunked datasets
• Variable length data and I/O
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 218
HDF5 dataset metadata and array data storage layouts
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 219
HDF5 Dataset
• Data array
  • Ordered collection of identically typed data items distinguished by their indices
• Metadata
  • Dataspace: rank and dimensions of the dataset array
  • Datatype: information on how to interpret the data
  • Storage properties: how the array is organized on disk
  • Attributes: user-defined metadata (optional)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 220
HDF5 Dataset
[Diagram: a dataset consists of data and metadata. Metadata: dataspace (rank = 3, dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7), datatype (IEEE 32-bit float), storage info (chunked, compressed), attributes (Time = 32.4, Pressure = 987, Temp = 56)]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 221
Metadata cache and dataset data
• Dataset data is typically kept in application memory
• The dataset header is kept in a separate space, the metadata cache
[Diagram: dataset data in application memory; the dataset header (datatype, dataspace, attributes, …) in the metadata cache; both are written to the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 222
Metadata and metadata cache
• HDF5 metadata
  • Information about HDF5 objects used by the library
  • Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, superblock, etc.
  • Usually small compared to raw data sizes (KB vs. MB-GB)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 223
Metadata and metadata cache
• Metadata cache
  • Space allocated to hold pieces of HDF5 metadata
  • Allocated by the HDF5 library in the application's memory space
  • Cache behavior affects overall performance
  • The metadata cache implementation prior to HDF5 1.6.5 could cause performance degradation for some applications
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 224
Types of data storage layouts
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 225
HDF5 dataset storage layouts
• Contiguous
• Chunked
• Compact
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 226
Contiguous storage layout
• Metadata header stored separately from the dataset data
• Data stored as one contiguous block in the HDF5 file
[Diagram: the dataset header (datatype, dataspace, attributes, …) in the metadata cache; the dataset data as one contiguous block in the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 227
Chunked storage
• Chunking: a storage layout in which a dataset is partitioned into fixed-size multi-dimensional tiles (chunks)
• Used for extendible datasets and for datasets with filters applied (checksum, compression)
• The HDF5 library treats each chunk as an atomic object
• Greatly affects performance and file sizes
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 228
Chunked storage layout
• Dataset data divided into equal-sized blocks (chunks)
• Each chunk stored separately as a contiguous block in the HDF5 file
[Diagram: the dataset header and chunk index in the metadata cache; chunks A, B, C, D stored as separate contiguous blocks in the file and located via the chunk index]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 229
Compact storage layout
• Dataset data and metadata stored together in the object header
[Diagram: the dataset header (datatype, dataspace, attributes, dataset data) goes from the metadata cache to the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 230
Factors affecting I/O performance
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 231
What goes on inside the library?
• Operations on data inside the library
  • Copying to/from internal buffers
  • Datatype conversion
  • Scattering/gathering
  • Data transformation (filters, compression)
• Data structures used
  • B-trees (groups, dataset chunks)
  • Hash tables
  • Local and global heaps (variable length data: link names, strings, etc.)
• Other concepts
  • HDF5 metadata, metadata cache
  • Chunking, chunk cache
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 232
Operations on data inside the library
• Copying to/from internal buffers
• Datatype conversion, such as
  • float to integer
  • little-endian to big-endian
  • 64-bit integer to 16-bit integer
• Scattering/gathering
  • Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O
• Data transformation (filters, compression)
  • Checksum on raw data and metadata (in 1.8.0)
  • Algebraic transform
  • GZIP and SZIP compression
  • User-defined filters
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 233
I/O performance
• I/O performance depends on
  • Storage layouts
  • Dataset storage properties
  • Chunking strategy
  • Metadata cache performance
  • Datatype conversion performance
  • Other filters, such as compression
  • Access patterns
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 234
I/O with different storage layouts
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 235
Writing a compact dataset
[Diagram: the dataset header (datatype, dataspace, attributes, …) and the dataset data are assembled together in the metadata cache and written to the file]
• One write stores both the header and the dataset data
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 236
Writing contiguous dataset – no conversion
[Diagram: data moves directly from the application buffer to the dataset data block in the file; the dataset header is written from the metadata cache]
• No subsetting in memory or in the file is performed
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 237
Writing a contiguous dataset with datatype conversion
[Diagram: data passes from the application buffer through a 1 MB conversion buffer before being written to the dataset data block in the file; the dataset header is written from the metadata cache]
• No subsetting in memory or in the file is performed
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 238
Partial I/O with contiguous datasets
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 239
Writing whole dataset – contiguous rows
[Figure: an M x N application array in memory written to the file as one contiguous block]
• Data is contiguous in the file
• One I/O operation
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 240
Sub-setting of contiguous dataset: series of adjacent rows
[Figure: a subset of adjacent rows of an M x N array; the entire dataset is contiguous in the file, and so is the subset]
• Subset is contiguous in the file
• One I/O operation
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 241
Sub-setting of contiguous dataset: adjacent, partial rows
[Figure: N elements selected from each of M adjacent rows]
• Data is scattered in the file in M contiguous blocks
• Several small I/O operations
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 242
Sub-setting of contiguous dataset: extreme case, writing a column
[Figure: one element selected from each row of an M x N array]
• Subset data is scattered in the file in M different locations
• Several small I/O operations
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 243
Sub-setting of contiguous dataset: data sieve buffer
[Figure: scattered one-element selections are gathered by memcpy into a 64 KB sieve buffer in memory before being written; the data is scattered in the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 244
Performance tuning for contiguous dataset
• Datatype conversion
  • Avoid it for better performance
  • Use the H5Pset_buffer function to customize the conversion buffer size
• Partial I/O
  • Write/read in big contiguous blocks
  • Use H5Pset_sieve_buf_size to improve performance for complex subsetting
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 245
I/O with Chunking
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 246
Chunked storage layout
• Raw data divided into equal-sized blocks (chunks)
• Each chunk stored separately as a contiguous block in the file
[Diagram: the dataset header and chunk index in the metadata cache; chunks A, B, C, D stored as separate contiguous blocks in the file and located via the chunk index]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 247
Information about chunking
• The HDF5 library treats each chunk as an atomic object
  • Compression and other filters are applied to each chunk
  • Datatype conversion is performed on each chunk
• Chunk size greatly affects performance
  • Chunk overhead adds to the file size
  • Chunk processing involves many steps
• Chunk cache
  • Caches chunks for better performance
  • The size of the chunk cache is set per file (default size 1 MB)
  • Each chunked dataset has its own chunk cache
  • A chunk may be too big to fit into the cache
  • Memory use may grow if the application keeps opening datasets
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 248
Chunk cache
[Diagram: application memory holds the metadata cache (dataset headers Dataset_1 … Dataset_N, chunking B-tree nodes) and a chunk cache per dataset; the default chunk cache size is 1 MB]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 249
Writing chunked dataset
• Filters, including compression, are applied when a chunk is evicted from the cache
[Diagram: chunks A, B, C pass from the chunk cache through the filter pipeline to the chunked dataset in the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 250
Partial I/O with Chunking
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 251
Partial I/O for chunked dataset
Example: write the green subset of the dataset, converting the data.
The dataset is stored as six chunks in the file. The subset spans four chunks, numbered 1-4 in the figure, so four chunks must be written to the file. But first, the four chunks must be read from the file, to preserve the parts of each chunk that are not to be overwritten.
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 252
Partial I/O for chunked dataset
• For each of the four chunks, on writing:
  • Read the chunk from the file into the chunk cache, unless it is already there
  • Determine which part of the chunk will be replaced by the selection
  • Move those elements from the application buffer to the conversion buffer
  • Perform the conversion
  • Replace that part of the chunk in the cache with the corresponding elements from the conversion buffer
• Apply filters (compression) when the chunk is flushed from the chunk cache
• For each element, 3 (or more) memcpy operations are performed
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 253
Partial I/O for chunked dataset
[Diagram: elements participating in I/O are gathered from the application buffer, pass through the conversion buffer, and land in the corresponding chunk (chunk 3) in the chunk cache]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 254
Partial I/O for chunked dataset
[Diagram: filters are applied to the chunk (chunk 3) in the chunk cache, and the chunk is written to the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 255
Variable length data and I/O
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 256
Examples of variable length data
• Each element is a string of variable length:
  A[0] "the first string we want to write"
  …
  A[N-1] "the N-th string we want to write"
• Each element is a record of variable length:
  A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
  A[1] (0,0,110,2005) [length = 4]
  …
  A[N] (1,2,3,4,5,6,7,8,9,10,11,12,…,M) [length = M]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 257
Variable length data in HDF5
• Variable length data is described in an HDF5 application by:
  typedef struct {
      size_t length;
      void   *p;
  } hvl_t;
• Base type can be any HDF5 type: H5Tvlen_create(base_type)
• ~20 bytes of overhead for each element
• Data cannot be compressed
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 258
Variable length data storage in HDF5
[Diagram: in the file, the dataset header and a dataset whose elements are pointers into a global heap; the actual variable length data lives in the global heap]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 259
Variable length datasets and I/O
• When writing variable length data, elements in the application buffer always go through the conversion buffer and are copied to global heaps in the metadata cache before ending up in the file
[Diagram: application buffer → conversion buffer → raw VL data in a global heap in the metadata cache]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 260
There may be more than one global heap
• On a write request, VL data goes through conversion and is written to a global heap; elements of the same dataset may be written to different heaps
[Diagram: raw VL data from the application buffer passes through the conversion buffer into two different global heaps in the metadata cache]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 261
Variable length datasets and I/O
[Diagram: raw VL data passes from the application buffer through the conversion buffer into global heaps in the metadata cache, and from there to the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 262
VL chunked dataset in a file
[Diagram: the file contains the dataset header, the chunk B-tree, the dataset chunks, and the heaps with the VL data]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 263
Writing chunked VL datasets
[Diagram: hvl_t pointers in application buffers reference the VL data; the VL data passes through the conversion buffer into a global heap (raw data), while chunks 1-4 pass through the chunk cache and the filter pipeline to the file; the metadata cache holds dataset headers and B-tree nodes]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 264
Hints for variable length data I/O
• Avoid closing/opening a file while writing VL datasets
  • Global heap information is lost
  • Global heaps may have unused space
• Avoid alternately writing to different VL datasets
  • Data from different datasets will go into the same heap
• If the maximum length of a record is known, consider using fixed-length records and compression
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 265
Questions?
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 266
Parallel HDF5 Tutorial
Albert Cheng
The HDF Group
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 267
Parallel HDF5: Introductory Tutorial
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 268
Outline
• Overview of Parallel HDF5 design
• Setting up the parallel environment
• Programming model for:
  • Creating and accessing a file
  • Creating and accessing a dataset
  • Writing and reading hyperslabs
• Parallel tutorial available at http://www.hdfgroup.org/HDF5/Tutor/
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 269
Overview of Parallel HDF5 Design
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 270
PHDF5 Requirements
• Support the MPI programming model
• PHDF5 files compatible with serial HDF5 files
  • Shareable between different serial and parallel platforms
• Single file image presented to all processes
  • A one-file-per-process design is undesirable
    • Expensive post processing
    • Not usable by a different number of processes
• Standard parallel I/O interface
  • Must be portable to different platforms
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 271
PHDF5 Implementation Layers
[Diagram: application → I/O library (HDF5) → parallel I/O library (MPI-I/O) → parallel file system (GPFS), running on a parallel computing system (Linux cluster) with compute nodes, a switch network/I/O servers, and the disk architecture & layout of data on disk]
PHDF5 is built on top of the standard MPI-IO API
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 272
Parallel Environment Requirements
• MPI with MPI-IO, e.g.:
  • MPICH2 ROMIO
  • A vendor's MPI-IO
• A POSIX compliant parallel file system, e.g.:
  • GPFS
  • Lustre
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 273
MPI-IO vs. HDF5
• MPI-IO is an input/output API
  • It treats the data file as a "linear byte stream", and each MPI application must provide its own file view and data representations to interpret those bytes
• All data stored is machine dependent except for the "external32" representation
  • External32 is defined in big-endianness
    • Little-endian machines have to do data conversion in both read and write operations
    • 64-bit sized data types may lose information
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 274
MPI-IO vs. HDF5 Cont.
• HDF5 is data management software
  • It stores data and metadata according to the HDF5 data format definition
  • An HDF5 file is self-describing
  • Each machine can store data in its own native representation for efficient I/O without loss of data precision
  • Any necessary data representation conversion is done automatically by the HDF5 library
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 275
How to Compile PHDF5 Applications
• h5pcc – HDF5 C compiler command
  • Similar to mpicc
• h5pfc – HDF5 F90 compiler command
  • Similar to mpif90
• To compile:
  % h5pcc h5prog.c
  % h5pfc h5prog.f90
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 276
h5pcc/h5pfc -show option
• -show displays the compiler commands and options without executing them (a dry run)

% h5pcc -show Sample_mpio.c
mpicc -I/home/packages/phdf5/include \
  -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE \
  -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE \
  -D_BSD_SOURCE -std=c99 -c Sample_mpio.c
mpicc -std=c99 Sample_mpio.o \
  -L/home/packages/phdf5/lib \
  /home/packages/phdf5/lib/libhdf5_hl.a \
  /home/packages/phdf5/lib/libhdf5.a -lz -lm -Wl,-rpath \
  -Wl,/home/packages/phdf5/lib
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 277
Collective vs. Independent Calls
• MPI definition of a collective call
  • All processes of the communicator must participate in the right order. E.g.:
    Process1: call A(); call B();    Process2: call A(); call B();    **right**
    Process1: call A(); call B();    Process2: call B(); call A();    **wrong**
• Independent means not collective
• Collective is not necessarily synchronous
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 278
Programming Restrictions
• Most PHDF5 APIs are collective
• PHDF5 opens a parallel file with a communicator
  • Returns a file handle
  • Future access to the file is via the file handle
  • All processes must participate in collective PHDF5 APIs
  • Different files can be opened via different communicators
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 279
Examples of PHDF5 API
• Examples of PHDF5 collective APIs
  • File operations: H5Fcreate, H5Fopen, H5Fclose
  • Object creation: H5Dcreate, H5Dopen, H5Dclose
  • Object structure: H5Dextend (increase dimension sizes)
• Array data transfer can be collective or independent
  • Dataset operations: H5Dwrite, H5Dread
  • Collectiveness is indicated by function parameters, not by function names as in the MPI API
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 280
What Does PHDF5 Support ?
• After a file is opened by the processes of a communicator:
  • All parts of the file are accessible by all processes
  • All objects in the file are accessible by all processes
  • Multiple processes may write to the same data array
  • Each process may write to an individual data array
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 281
PHDF5 API Languages
• C and F90 language interfaces
• Platforms supported:
  • Most platforms with MPI-IO support, e.g. IBM SP, Linux clusters, SGI Altix, Cray XT3, …
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 282
Programming model for creating and accessing a file
• HDF5 uses an access template object (property list) to control the file access mechanism
• General model to access an HDF5 file in parallel:
  • Set up the MPI-IO access template (access property list)
  • Open the file
  • Access data
  • Close the file
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 283
Setup MPI-IO access template
Each process of the MPI communicator creates an access template and sets it up with MPI parallel access information.

C:
  herr_t H5Pset_fapl_mpio(hid_t plist_id, MPI_Comm comm, MPI_Info info);

F90:
  h5pset_fapl_mpio_f(plist_id, comm, info)
    integer(hid_t) :: plist_id
    integer        :: comm, info
plist_id is a file access property list identifier
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 284
C Example Parallel File Create
  23   comm = MPI_COMM_WORLD;
  24   info = MPI_INFO_NULL;
  26   /*
  27    * Initialize MPI
  28    */
  29   MPI_Init(&argc, &argv);
  30   /*
  34    * Set up file access property list for MPI-IO access
  35    */
->36   plist_id = H5Pcreate(H5P_FILE_ACCESS);
->37   H5Pset_fapl_mpio(plist_id, comm, info);
  38
->42   file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
  49   /*
  50    * Close the file.
  51    */
  52   H5Fclose(file_id);
  54   MPI_Finalize();
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 285
F90 Example Parallel File Create
  23   comm = MPI_COMM_WORLD
  24   info = MPI_INFO_NULL
  26   CALL MPI_INIT(mpierror)
  29   !
  30   ! Initialize FORTRAN predefined datatypes
  32   CALL h5open_f(error)
  34   !
  35   ! Setup file access property list for MPI-IO access.
->37   CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error)
->38   CALL h5pset_fapl_mpio_f(plist_id, comm, info, error)
  40   !
  41   ! Create the file collectively.
->43   CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
  45   !
  46   ! Close the file.
  49   CALL h5fclose_f(file_id, error)
  51   !
  52   ! Close FORTRAN interface
  54   CALL h5close_f(error)
  56   CALL MPI_FINALIZE(mpierror)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 286
Creating and Opening Dataset
• All processes of the communicator open/close a dataset with a collective call
  C:   H5Dcreate or H5Dopen; H5Dclose
  F90: h5dcreate_f or h5dopen_f; h5dclose_f
• All processes of the communicator must extend an unlimited-dimension dataset before writing to it
  C:   H5Dextend
  F90: h5dextend_f
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 287
C Example: Create Dataset
  56   file_id = H5Fcreate(…);
  57   /*
  58    * Create the dataspace for the dataset.
  59    */
  60   dimsf[0] = NX;
  61   dimsf[1] = NY;
  62   filespace = H5Screate_simple(RANK, dimsf, NULL);
  63
  64   /*
  65    * Create the dataset with default properties, collectively.
  66    */
->67   dset_id = H5Dcreate(file_id, "dataset1", H5T_NATIVE_INT,
  68                       filespace, H5P_DEFAULT);
  70   H5Dclose(dset_id);
  71   /*
  72    * Close the file.
  73    */
  74   H5Fclose(file_id);
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 288
F90 Example: Create Dataset
  43   CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
  73   CALL h5screate_simple_f(rank, dimsf, filespace, error)
  76   !
  77   ! Create the dataset with default properties.
  78   !
->79   CALL h5dcreate_f(file_id, "dataset1", H5T_NATIVE_INTEGER, filespace, dset_id, error)
  90   !
  91   ! Close the dataset.
  92   CALL h5dclose_f(dset_id, error)
  93   !
  94   ! Close the file.
  95   CALL h5fclose_f(file_id, error)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 289
Accessing a Dataset
• All processes that have opened the dataset may do collective I/O
• Each process may do an independent and arbitrary number of data I/O access calls
  • C: H5Dwrite and H5Dread
  • F90: h5dwrite_f and h5dread_f
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 290
Programming model for dataset access
• Create and set a dataset transfer property list
  • C: H5Pset_dxpl_mpio
    • H5FD_MPIO_COLLECTIVE
    • H5FD_MPIO_INDEPENDENT (default)
  • F90: h5pset_dxpl_mpio_f
    • H5FD_MPIO_COLLECTIVE_F
    • H5FD_MPIO_INDEPENDENT_F (default)
• Access the dataset with the defined transfer property
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 291
C Example: Collective write
  95   /*
  96    * Create property list for collective dataset write.
  97    */
  98   plist_id = H5Pcreate(H5P_DATASET_XFER);
->99   H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
 100
 101   status = H5Dwrite(dset_id, H5T_NATIVE_INT,
 102                     memspace, filespace, plist_id, data);
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 292
F90 Example: Collective write
  88   ! Create property list for collective dataset write
  89   !
  90   CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
->91   CALL h5pset_dxpl_mpio_f(plist_id, &
                              H5FD_MPIO_COLLECTIVE_F, error)
  92
  93   !
  94   ! Write the dataset collectively.
  95   !
  96   CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, &
                      error, &
                      file_space_id = filespace, &
                      mem_space_id = memspace, &
                      xfer_prp = plist_id)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 293
Writing and Reading Hyperslabs
• Distributed memory model: data is split among processes
• PHDF5 uses the HDF5 hyperslab model
  • Each process defines its memory and file hyperslabs
  • Each process executes a partial write/read call
    • Collective calls
    • Independent calls
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 294
Set up the Hyperslab for Read/Write
H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                    offset, stride, count, block)
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 295
Example 1: Writing dataset by rows
[Figure: the file is divided into four blocks of rows, written by P0, P1, P2, and P3]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 296
Writing by rows: Output of h5dump
HDF5 "SDS_row.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE H5T_STD_I32BE
      DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
      DATA {
         10, 10, 10, 10, 10,
         10, 10, 10, 10, 10,
         11, 11, 11, 11, 11,
         11, 11, 11, 11, 11,
         12, 12, 12, 12, 12,
         12, 12, 12, 12, 12,
         13, 13, 13, 13, 13,
         13, 13, 13, 13, 13
      }
   }
}
}
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 297
Example 1: Writing dataset by rows

count[0]  = dimsf[0]/mpi_size;
count[1]  = dimsf[1];
offset[0] = mpi_rank * count[0]; /* = 2 */
offset[1] = 0;

[Figure: process 1's block in memory maps to rows offset[0] through offset[0] + count[0] - 1 of the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 298
Example 1: Writing dataset by rows
  71   /*
  72    * Each process defines a dataset in memory and writes it to the hyperslab
  73    * in the file.
  74    */
  75   count[0] = dimsf[0]/mpi_size;
  76   count[1] = dimsf[1];
  77   offset[0] = mpi_rank * count[0];
  78   offset[1] = 0;
  79   memspace = H5Screate_simple(RANK, count, NULL);
  80
  81   /*
  82    * Select hyperslab in the file.
  83    */
  84   filespace = H5Dget_space(dset_id);
  85   H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 299
Example 2: Writing dataset by columns
[Figure: P0 and P1 each write interleaved columns of the file]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 300
Writing by columns: Output of h5dump
HDF5 "SDS_col.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE H5T_STD_I32BE
      DATASPACE SIMPLE { ( 8, 6 ) / ( 8, 6 ) }
      DATA {
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200,
         1, 2, 10, 20, 100, 200
      }
   }
}
}
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 301
Example 2: Writing dataset by column
[Figure: each process's memory block (dimsm[0] x dimsm[1]) maps to every other column of the file, with block dimensions block[0] x block[1]; P0 starts at offset[1] = 0, P1 at offset[1] = 1, and stride[1] = 2]
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 302
Example 2: Writing dataset by column
/*
 * Each process defines a hyperslab in the file.
 */
count[0] = 1;
count[1] = dimsm[1];
offset[0] = 0;
offset[1] = mpi_rank;
stride[0] = 1;
stride[1] = 2;
block[0] = dimsf[0];
block[1] = 1;

/*
 * Each process selects a hyperslab.
 */
filespace = H5Dget_space(dset_id);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, stride, count, block);
Example 3: Writing dataset by pattern
Writing by Pattern: Output of h5dump
HDF5 "SDS_pat.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32BE
      DATASPACE SIMPLE { ( 8, 4 ) / ( 8, 4 ) }
      DATA {
         1, 3, 1, 3,
         2, 4, 2, 4,
         1, 3, 1, 3,
         2, 4, 2, 4,
         1, 3, 1, 3,
         2, 4, 2, 4,
         1, 3, 1, 3,
         2, 4, 2, 4
      }
   }
}
}
Example 3: Writing dataset by pattern
offset[0] = 0;
offset[1] = 1;
count[0] = 4;
count[1] = 2;
stride[0] = 2;
stride[1] = 2;
Example 3: Writing by pattern
/* Each process defines a dataset in memory and
 * writes it to the hyperslab in the file.
 */
count[0] = 4;
count[1] = 2;
stride[0] = 2;
stride[1] = 2;
if (mpi_rank == 0) { offset[0] = 0; offset[1] = 0; }
if (mpi_rank == 1) { offset[0] = 1; offset[1] = 0; }
if (mpi_rank == 2) { offset[0] = 0; offset[1] = 1; }
if (mpi_rank == 3) { offset[0] = 1; offset[1] = 1; }
Example 4: Writing dataset by chunks
Writing by Chunks: Output of h5dump
HDF5 "SDS_chnk.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32BE
      DATASPACE SIMPLE { ( 8, 4 ) / ( 8, 4 ) }
      DATA {
         1, 1, 2, 2,
         1, 1, 2, 2,
         1, 1, 2, 2,
         1, 1, 2, 2,
         3, 3, 4, 4,
         3, 3, 4, 4,
         3, 3, 4, 4,
         3, 3, 4, 4
      }
   }
}
}
Example 4: Writing dataset by chunks
block[0]  = chunk_dims[0];
block[1]  = chunk_dims[1];
offset[0] = chunk_dims[0];
offset[1] = 0;
Example 4: Writing by chunks
count[0]  = 1;
count[1]  = 1;
stride[0] = 1;
stride[1] = 1;
block[0]  = chunk_dims[0];
block[1]  = chunk_dims[1];
if (mpi_rank == 0) { offset[0] = 0;             offset[1] = 0; }
if (mpi_rank == 1) { offset[0] = 0;             offset[1] = chunk_dims[1]; }
if (mpi_rank == 2) { offset[0] = chunk_dims[0]; offset[1] = 0; }
if (mpi_rank == 3) { offset[0] = chunk_dims[0]; offset[1] = chunk_dims[1]; }
Parallel HDF5: Intermediate Tutorial
Outline
• Performance
• Parallel tools
My PHDF5 Application I/O is slow
• If my application I/O performance is slow, what can I do?
  • Use larger I/O data sizes
  • Independent vs. collective I/O
  • Specific I/O system hints
  • Increase parallel file system capacity
Write Speed vs. Block Size
TFLOPS: HDF5 Write vs. MPIO Write (file size 3200 MB, 8 nodes)
[Chart: throughput in MB/s for block sizes of 1, 2, 4, 8, 16, and 32 MB; series: HDF5 Write, MPIO Write]
Independent vs. Collective Access
• A user reported that the independent data transfer mode was much slower than the collective data transfer mode
• The data array was tall and thin: 230,000 rows by 6 columns
Debug Slow Parallel I/O Speed (1)
• Writing to one dataset
• Using 4 processes == 4 columns
• Data type is 8-byte doubles
• 4 processes, 1000 rows == 4 x 1000 x 8 = 32,000 bytes
• % mpirun -np 4 ./a.out i t 1000
  • Execution time: 1.783798 s.
• % mpirun -np 4 ./a.out i t 2000
  • Execution time: 3.838858 s.
• # Difference of 2 seconds for 1000 more rows = 32,000 bytes
• # A speed of 16 KB/sec!!! Way too slow.
Debug Slow Parallel I/O Speed (2)
• Build a version of PHDF5 with
  • ./configure --enable-debug --enable-parallel …
• This allows tracing of MPI-IO calls in the HDF5 library.
• E.g., to trace MPI_File_read_xx and MPI_File_write_xx calls:
  • % setenv H5FD_mpio_Debug "rw"
Debug Slow Parallel I/O Speed (3)
% setenv H5FD_mpio_Debug 'rw'
% mpirun -np 4 ./a.out i t 1000    # Indep.; contiguous.
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=2056 size_i=8
in H5FD_mpio_write mpi_off=2048 size_i=8
in H5FD_mpio_write mpi_off=2072 size_i=8
in H5FD_mpio_write mpi_off=2064 size_i=8
in H5FD_mpio_write mpi_off=2088 size_i=8
in H5FD_mpio_write mpi_off=2080 size_i=8
…
# total of 4000 of these little 8-byte writes == 32,000 bytes
Independent calls are many and small
• Each process writes one element of one row, skips to the next row, writes one element, and so on.
• Each process issues 230,000 writes of 8 bytes each.
• Not good == just like many independent cars driving to work: wasted gas, wasted time, and a total traffic jam.
Debug Slow Parallel I/O Speed (4)
% setenv H5FD_mpio_Debug 'rw'
% mpirun -np 4 ./a.out i h 1000    # Indep., chunked.
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=3688  size_i=8000
in H5FD_mpio_write mpi_off=11688 size_i=8000
in H5FD_mpio_write mpi_off=27688 size_i=8000
in H5FD_mpio_write mpi_off=19688 size_i=8000
in H5FD_mpio_write mpi_off=96    size_i=40
in H5FD_mpio_write mpi_off=136   size_i=544
in H5FD_mpio_write mpi_off=680   size_i=120
in H5FD_mpio_write mpi_off=800   size_i=272
…
Execution time: 0.011599 s.
Use Collective Mode or Chunked Storage
• Collective mode combines many small independent calls into a few bigger calls == like people commuting to work together by train.
• Chunks of columns speed things up too == like people living and working in the suburbs to reduce overlapping traffic.
Independent vs. Collective write
(6 processes, IBM p-690, AIX, GPFS)

# of Rows   Data Size (MB)   Independent (sec.)   Collective (sec.)
16384       0.25             8.26                 1.72
32768       0.50             65.12                1.80
65536       1.00             108.20               2.68
122918      1.88             276.57               3.11
150000      2.29             528.15               3.63
180300      2.75             881.39               4.12
Independent vs. Collective write (cont.)
Performance (non-contiguous)
[Chart: time in seconds (0 to 1000) vs. data space size in MB (0 to 3), plotting the Independent and Collective columns of the table above]
Effects of I/O Hints: IBM_largeblock_io
• GPFS at LLNL ASCI Blue machine
• 4 nodes, 16 tasks
• Total data size 1024 MB
• I/O buffer size 1 MB

                    IBM_largeblock_io=false    IBM_largeblock_io=true
Tasks               MPI-IO     PHDF5           MPI-IO     PHDF5
16 write (MB/s)     60         48              354        294
16 read (MB/s)      44         39              256        248
Parallel Tools
• ph5diff
  • Parallel version of the h5diff tool
• h5perf
  • Performance measuring tool showing I/O performance for different I/O APIs
ph5diff
• A parallel version of the h5diff tool
• Supports all features of h5diff
• An MPI parallel tool
• Manager process (proc 0)
  • coordinates the remaining processes (workers) to "diff" one dataset at a time;
  • collects the output from each worker and prints it out.
• Works best if there are many datasets in the two files with few differences.
• Available in v1.8.
h5perf
• An I/O performance measurement tool
• Tests 3 file I/O APIs:
  • POSIX I/O (open/write/read/close…)
  • MPI-IO (MPI_File_{open,write,read,close})
  • PHDF5
    • H5Pset_fapl_mpio (using MPI-IO)
    • H5Pset_fapl_mpiposix (using POSIX I/O)
• Gives an indication of I/O speed upper limits
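The two PHDF5 modes correspond to different file-access property list settings, roughly as follows (a sketch, not h5perf's actual code; the file name is an assumption):

```c
/* Sketch: choosing the PHDF5 file driver (HDF5 1.8).
 * "perftest.h5" is an illustrative file name. */
hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL);  /* MPI-IO   */
/* or: H5Pset_fapl_mpiposix(fapl_id, MPI_COMM_WORLD, 0); */ /* POSIX   */
hid_t file_id = H5Fopen("perftest.h5", H5F_ACC_RDWR, fapl_id);
```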
h5perf: Some features
• Check (-c): verify data correctness
• 2-D chunk patterns added in v1.8
• -h shows the help page
h5perf: example output 1/3

% mpirun -np 4 h5perf          # Ran on a Linux system
Number of processors = 4
Transfer Buffer Size: 131072 bytes, File size: 1.00 MBs
  # of files: 1, # of datasets: 1, dataset size: 1.00 MBs
     IO API = POSIX
        Write (1 iteration(s)):
           Maximum Throughput:   18.75 MB/s
           Average Throughput:   18.75 MB/s
           Minimum Throughput:   18.75 MB/s
        Write Open-Close (1 iteration(s)):
           Maximum Throughput:   10.79 MB/s
           Average Throughput:   10.79 MB/s
           Minimum Throughput:   10.79 MB/s
        Read (1 iteration(s)):
           Maximum Throughput: 2241.74 MB/s
           Average Throughput: 2241.74 MB/s
           Minimum Throughput: 2241.74 MB/s
        Read Open-Close (1 iteration(s)):
           Maximum Throughput:  756.41 MB/s
           Average Throughput:  756.41 MB/s
           Minimum Throughput:  756.41 MB/s
h5perf: example output 2/3

% mpirun -np 4 h5perf
…
     IO API = MPIO
        Write (1 iteration(s)):
           Maximum Throughput:  611.95 MB/s
           Average Throughput:  611.95 MB/s
           Minimum Throughput:  611.95 MB/s
        Write Open-Close (1 iteration(s)):
           Maximum Throughput:   16.89 MB/s
           Average Throughput:   16.89 MB/s
           Minimum Throughput:   16.89 MB/s
        Read (1 iteration(s)):
           Maximum Throughput:  421.75 MB/s
           Average Throughput:  421.75 MB/s
           Minimum Throughput:  421.75 MB/s
        Read Open-Close (1 iteration(s)):
           Maximum Throughput:  109.22 MB/s
           Average Throughput:  109.22 MB/s
           Minimum Throughput:  109.22 MB/s
h5perf: example output 3/3

% mpirun -np 4 h5perf
…
     IO API = PHDF5 (w/MPI-I/O driver)
        Write (1 iteration(s)):
           Maximum Throughput:  304.40 MB/s
           Average Throughput:  304.40 MB/s
           Minimum Throughput:  304.40 MB/s
        Write Open-Close (1 iteration(s)):
           Maximum Throughput:   15.14 MB/s
           Average Throughput:   15.14 MB/s
           Minimum Throughput:   15.14 MB/s
        Read (1 iteration(s)):
           Maximum Throughput: 1718.27 MB/s
           Average Throughput: 1718.27 MB/s
           Minimum Throughput: 1718.27 MB/s
        Read Open-Close (1 iteration(s)):
           Maximum Throughput:   78.06 MB/s
           Average Throughput:   78.06 MB/s
           Minimum Throughput:   78.06 MB/s
Transfer Buffer Size: 262144 bytes, File size: 1.00 MBs
  # of files: 1, # of datasets: 1, dataset size: 1.00 MBs
Useful Parallel HDF Links
• Parallel HDF information site
  http://www.hdfgroup.org/HDF5/PHDF5/
• Parallel HDF5 tutorial available at
  http://www.hdfgroup.org/HDF5/Tutor/
• HDF Help email address
  [email protected]
Questions?
Parallel I/O Performance Study
(preliminary results)
Albert Cheng
The HDF Group
Introduction
• Parallel performance is affected by the I/O access pattern, the file system, and the MPI communication mode.
• Determining how these elements interact provides hints for improving performance.
• This study presents four test cases using h5perf and h5perf_serial.
  • h5perf has been extended to support parallel testing of 2D datasets.
  • h5perf_serial, based on h5perf, allows serial testing of n-dimensional datasets and various file drivers.
• Testing includes various combinations of MPI communication modes and HDF5 storage layouts.
• Finally, we make recommendations that can improve I/O performance for specific patterns.
Testing Systems and Configuration
System    Architecture                   File System   MPI Implementation
abe       Linux cluster with Intel 64    Lustre        MVAPICH2 1.0.2p1 with Intel compiler
cobalt    ccNUMA with Itanium 2          CXFS          SGI Message Passing Toolkit 1.16
mercury   Linux cluster with Itanium 2   GPFS          MPICH Myrinet 1.2.5.10, GM 2.0.8, Intel 8.0

Processors       4
Dataset Size     64K x 64K (4 GB)
I/O Selection    64 MB per processor (shape depends on test case)
API              HDF5 v1.8.1 (default build options)
Iterations       3
MPI-IO Type      Collective / Independent
Storage Layout   Contiguous / Chunked (chunk size depends on test case)
HDF5 Storage Layouts
• Contiguous
  • HDF5 assigns a static contiguous region of storage for raw data.
HDF5 Storage Layouts
• Chunked
  • HDF5 defines separate regions of storage for raw data, named chunks, which are pre-allocated in row-major order when a file is created in parallel.
  • This layout is only valid when a file is created and the chunks are pre-allocated; further modification of the file may cause the chunks to be arranged differently.
[Figure: chunks C0-C3 of the dataset and their row-major order in file storage]
Test Cases
• Case A• The transfer selections extend over the entire columns
with a size of 64K×1K. If the storage is chunked, the size of the chunks is 1K×1K. The selections are interleaved horizontally with respect to the processors.
Test Cases
• Case B• The transfer selection only spans half the columns with a size of
32K×2K. If the storage is chunked, the size of the chunks is
2K×2K. The selections are interleaved horizontally with respect
to the processors.
Test Cases
• Case C
  • The transfer selections only span half the rows, with a size of 2K x 32K. If the storage is chunked, the chunk size is 2K x 2K. The lower dimension (columns) is evenly divided among the processors.
Test Cases
• Case D
  • The transfer selection extends over entire rows, with a size of 1K x 64K. If the storage is chunked, the chunk size is 1K x 1K. The lower dimension (columns) is evenly divided among the processors.
Access Patterns
• Contiguous
  • Each processor retrieves a separate region of contiguous storage. An example of this pattern is case D using contiguous storage.
• Non-contiguous
  • Separate regions are still assigned to each processor, but such regions contain gaps. Examples of this pattern include case C using contiguous storage, and collective cases C-D using chunked storage.
Access Patterns
• Interleaved (or overlapped)
  • Each processor writes into many portions that are interleaved with respect to the other processors. For example, using contiguous storage with cases A-B generates this pattern.
  • Another instance results from using chunked storage with collective cases A-B.
Performance Results and Analysis
• The results correspond to the maximum throughput of Write Open-Close operations over 3 iterations.
• Serial throughput is the performance baseline, since our objective is to determine how parallel access can improve performance.
• Unlike GPFS and CXFS, Lustre does not stripe files by default. To enable parallel access, the directory/file must be striped using the lfs command.
I/O Performance in Lustre
                      NON-STRIPED                          STRIPED
COLLECTIVE    Case A   Case B   Case C   Case D    Case A   Case B   Case C   Case D
Contiguous    11.66    23.68    46.12    36.67     25.35    50.26    42.67    119.26
Chunked       179.85   117.31   124.88   106.95    180.33   224.28   86.88    93.45

INDEPENDENT   Case A   Case B   Case C   Case D    Case A   Case B   Case C   Case D
Contiguous    5.92     8.17     20.98    304.06    6.7      10.81    73.45    298.09
Chunked       219.15   328.04   12.15    8.16      158.9    133.27   12.94    10.51
I/O Performance in Lustre
• Striping partitions the file space into stripes and assigns them to several Object Storage Targets (OSTs) in round-robin fashion.
• Since each OST stores portions of the file that differ from those on the other OSTs, they can all access the file in parallel.
• The default configuration on abe uses a stripe size of 4 MB and a stripe count of 16.
• Striping improves performance when the I/O request of each processor spans several stripes (and OSTs) after MPI aggregation, if any.
• When the processors make small independent I/O requests that are practically contiguous, as in cases A-B using chunked storage, a single OST can provide better performance due to asynchronous operations.
I/O Performance
[Chart: abe, maximum throughput in MB/s (log scale, 1 to 1000) for cases A-D; series: serial/cont, serial/chk, ind/cont, ind/chk, coll/cont, coll/chk]
I/O Performance
[Chart: cobalt, maximum throughput in MB/s (log scale, 1 to 1000) for cases A-D; series: serial/cont, serial/chk, ind/cont, ind/chk, coll/cont, coll/chk]
I/O Performance
[Chart: mercury, maximum throughput in MB/s (log scale, 0.1 to 1000) for cases A-D; series: serial/cont, serial/chk, ind/cont, ind/chk, coll/cont, coll/chk]
Performance of Serial I/O
• Access using contiguous storage has the steepest performance trend as the cases change from A to D.
• When using chunked storage, the throughput remains almost constant at the upper bound.
• The allocation of chunks at the time they are written causes the access pattern to be virtually contiguous regardless of the test cases.
Performance of Independent I/O
• Processors perform their I/O requests independently from each other.
• For contiguous storage, performance improves as the tests move from A to D.
• For chunked storage, throughput is high for interleaved cases A-B, since the written blocks (chunks) are larger and caching is exploited. For cases C-D, the many write requests (one per chunk) multiply the overhead due to unnecessary locking and caching in Lustre and CXFS.
• Unlike these file systems, GPFS has shown better scalability [1,2].
Performance of Collective I/O
• The participating processors coordinate and combine their many requests into fewer I/O operations, reducing latency.
• Since the file space is evenly divided among the processors, there is no need for locking, which reduces overhead [3].
• For contiguous storage, performance is overall high but there is still an increasing trend as the cases change from A to D.
• For chunked storage, the performance is even higher with minor variations among the tests cases because several chunks can be written with a single I/O operation.
Conclusion
• It is important to determine the access pattern by analyzing the I/O requirements of the application and the storage implementation.
• For contiguous access patterns, independent access is preferable because it avoids the unnecessary overhead of collective calls.
• For non-contiguous patterns, there is little difference between independent and collective access. However, writing many chunks in independent mode may be expensive in Lustre and CXFS if caching is not exploited.
• For interleaved access patterns, collective mode is usually faster.
• For all access patterns, collective mode and chunked storage provide the combination that yields the highest average performance.
References
1. J. Borrill, L. Oliker, J. Shalf, and H. Shan. Investigation of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark. In Proceedings of SC’07: High Performance Networking and Computing, Reno, NV, November 2007.
2. W. Liao, A. Ching, K. Coloma, A. Choudhary, and L. Ward. An Implementation and Evaluation of Client-Side File Caching for MPI-IO. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1-10, March 2007.
3. R. Thakur, W. Gropp, and E. Lusk. Data Sieving and Collective I/O in ROMIO. In Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation. IEEE Computer Society Press, February 1999.
Questions?