NCSA-NARA investigations of HDF5 in support of EXPRESS-Driven data
Mike Folk
The HDF NARA Project
PDES, Inc. Offsite Meeting, September 24-29, 2006
PDES, Inc. Offsite Sept 2006 2
Acknowledgement
This report is based upon work supported by the National Archives and Records Administration (NARA) through the grant NARA NSF 0202 GPG. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NARA.
Participants
• Mike Folk, Vailin Choi, Elena Pourmal – The HDF Group
• Mark Conrad and Bob Chadduck – NARA
• David Price – EuroSTEP
• Keith Hunten – Lockheed-Martin
• Steve Cooper and Denny Moore – Electric Boat
• Others
1. What is HDF5?
HDF5 is
• A file format for managing any kind of data
• A software system to manage data in the format
• Suited especially to large-volume or complex data
• Suited for every size and type of system
• Open file format, open software
Definitions
• “HDF” – Hierarchical Data Format
  • Originated in 1988 at NCSA, University of Illinois at Urbana-Champaign
• “HDF5”
  • Successor to HDF, introduced in 1998
An HDF5 file is a container…
[Figure: example data objects, including a palette and the following table]
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
…into which you can put your data objects.
HDF5 data model
• HDF5 file – container for data objects
• Primary objects
  • Groups
  • Datasets
• Additional ways to organize data
  • Attributes for metadata
  • Sharable objects
  • Storage and access properties
Everything else is built from these parts.
HDF “groups” for organizing objects in files
[Figure: the root group “/” and a subgroup “/foo”, organizing a palette, raster images, a 3-D array, a 2-D array, and a table (lat, lon, temp)]
HDF5 “dataset” for holding the data
[Figure: a dataset consists of data plus metadata. Example metadata:]
• Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Attributes: time = 32.4, pressure = 987, temp = 56
• Storage info: chunked, compressed
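To make the anatomy of the example dataset concrete, here is a minimal, stdlib-only Python sketch that models its metadata as a plain object. It is not the HDF5 API: the class and field names (`DatasetInfo`, `itemsize`, `raw_nbytes`) are hypothetical, chosen only to show how the dataspace and datatype together determine the raw data size (4 × 5 × 7 elements × 4 bytes = 560 bytes before compression).

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class DatasetInfo:
    """Hypothetical stand-in for the metadata an HDF5 dataset carries."""
    dims: tuple[int, ...]          # dataspace: one extent per dimension
    datatype: str                  # e.g. "IEEE 32-bit float"
    itemsize: int                  # bytes per element implied by the datatype
    attributes: dict[str, float] = field(default_factory=dict)
    storage: tuple[str, ...] = ()  # storage properties, e.g. chunked, compressed

    @property
    def rank(self) -> int:
        return len(self.dims)

    def raw_nbytes(self) -> int:
        """Uncompressed size of the data: element count times element size."""
        return prod(self.dims) * self.itemsize


# The example from the slide: rank 3, 4 x 5 x 7, 32-bit floats.
ds = DatasetInfo(
    dims=(4, 5, 7),
    datatype="IEEE 32-bit float",
    itemsize=4,
    attributes={"time": 32.4, "pressure": 987, "temp": 56},
    storage=("chunked", "compressed"),
)
print(ds.rank, ds.raw_nbytes())  # 3 560
```

Keeping the size calculation next to the metadata mirrors the point of the slide: the dataspace and datatype are self-describing, so any reader can work out how many bytes of data to expect.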
Datatypes (array elements)
• Datatype – how to interpret a data element
• Two classes: atomic and compound
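Why the datatype matters: the same bytes decode to very different values depending on the declared type and byte order. This stdlib-only Python sketch illustrates the idea with `struct` format codes; these are Python's codes, not HDF5 datatype identifiers.

```python
import struct

# The 4 bytes of IEEE 754 single-precision 1.0, stored little-endian.
raw = struct.pack("<f", 1.0)

as_float_le = struct.unpack("<f", raw)[0]  # 1.0: the intended interpretation
as_int_le = struct.unpack("<i", raw)[0]    # same bits read as a 32-bit integer
as_float_be = struct.unpack(">f", raw)[0]  # wrong byte order: a tiny, meaningless value

print(raw.hex(), as_float_le, as_int_le)  # 0000803f 1.0 1065353216
```

Storing the datatype (including byte order) alongside the data is what lets a reader on a different platform recover the intended values.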
Datatypes
• HDF5 atomic types
  • normal integer & float
  • user-definable (e.g., 13-bit integer)
  • fixed-length and variable-length multiples (e.g., strings)
  • references to objects/dataset regions
  • enumeration – names mapped to integers
  • array
• HDF5 compound types
  • Records with fields – comparable to C structs
  • Members can be atomic or compound types
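Because a compound datatype is comparable to a C struct, Python's standard `struct` module gives a rough feel for what a fixed-size record looks like in memory. This sketch packs the rows of the lat/lon/temp table from the container slide into binary records. The field layout (two little-endian 32-bit integers and a 32-bit float, no padding) is an assumption for illustration, not necessarily how HDF5 would lay out such a table.

```python
import struct

# Hypothetical record layout: lat (int32), lon (int32), temp (float32),
# little-endian, no padding. Analogous to a C struct with three members.
RECORD = struct.Struct("<iif")

rows = [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)]

# Pack every row into one contiguous byte buffer: a 1-D array of records.
buffer = b"".join(RECORD.pack(*row) for row in rows)

# Read the records back; temp returns as the nearest 32-bit float.
decoded = [RECORD.unpack_from(buffer, i * RECORD.size) for i in range(len(rows))]

print(RECORD.size, len(buffer))  # 12 36
print(decoded[0][:2])            # (12, 23)
```

A table stored this way is a one-dimensional dataset whose elements are records, which is why compound types suit tabular data.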
“Groups”
• A mechanism for collections of related objects
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
[Figure: example hierarchy under the root group “/”, with members named tom, dick, harry, a, b, and c]
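The analogy to UNIX directories extends to naming: an object is located by a path that starts at the root group `/`. The sketch below models groups as nested Python dictionaries and resolves such paths. It is only a conceptual illustration; the layout (which names are groups and which are datasets) is hypothetical and is not read from the figure.

```python
from typing import Any

# Hypothetical hierarchy: dicts play the role of groups, strings stand in for datasets.
root: dict[str, Any] = {
    "tom": {"a": "dataset a", "b": "dataset b"},
    "dick": "dataset dick",
    "harry": {"c": "dataset c"},
}


def resolve(tree: dict[str, Any], path: str) -> Any:
    """Follow an absolute path such as '/tom/a' from the root group."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path starting with '/'")
    node: Any = tree
    for name in filter(None, path.split("/")):  # skip empty parts, e.g. from '/'
        if not isinstance(node, dict) or name not in node:
            raise KeyError(path)
        node = node[name]
    return node


print(resolve(root, "/tom/a"))  # dataset a
print(resolve(root, "/harry"))  # {'c': 'dataset c'}
```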
Special Storage Options
• Chunked: better subsetting access time; extendable
• Compressed: improves storage efficiency, transmission speed
• Extendable: arrays can be extended in any direction
• Split file: metadata in one file, raw data in another (in the figure, the metadata for dataset “Fred” is in File A and the data for Fred is in File B)
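Why chunking helps subsetting: chunked data is stored in fixed-size blocks, so reading a subset only has to touch the chunks that overlap it. A minimal sketch of that bookkeeping follows, assuming an n-dimensional dataset and a rectangular selection given as half-open ranges. The function name is hypothetical, and the real library's chunk indexing is more involved.

```python
from itertools import product


def chunks_touched(start, stop, chunk_shape):
    """Return the chunk coordinates overlapping a rectangular selection.

    start, stop: per-dimension half-open bounds of the selection [start, stop).
    chunk_shape: size of one chunk in each dimension.
    """
    ranges = []
    for lo, hi, c in zip(start, stop, chunk_shape):
        if hi <= lo:
            return []  # an empty selection touches no chunks
        # First and last chunk index covered in this dimension.
        ranges.append(range(lo // c, (hi - 1) // c + 1))
    return list(product(*ranges))


# A 100 x 100 dataset stored in 10 x 10 chunks (100 chunks in total).
touched = chunks_touched(start=(0, 0), stop=(5, 25), chunk_shape=(10, 10))
print(touched)  # [(0, 0), (0, 1), (0, 2)]
```

Only 3 of the 100 chunks need to be read (and, if the dataset is compressed, decompressed) for this selection.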
Mesh Example, in HDFView
HDF5 Software
[Figure: layered software: Tools & Applications on top of the HDF I/O Library, which reads and writes the HDF File]
Features of the library
• Ability to create and access complex data structures
• Fast, flexible I/O
• Data transformation and filtering during I/O
• Flexible API for power users
• Compatibility with common data models
  • Able to represent all common data structures
  • Supports key language models – C, Fortran, Java, etc.
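“Data transformation and filtering during I/O” refers to filters that the library applies to each chunk as it is written and reverses when it is read. The sketch below imitates a two-stage pipeline in the spirit of HDF5's byte-shuffle and deflate (zlib) filters, using only the Python standard library. It is a conceptual stand-in, not the library's implementation or its on-disk encoding.

```python
import struct
import zlib


def shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, etc. (often compresses better)."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    n = len(data) // itemsize
    return bytes(data[i * itemsize + b] for b in range(itemsize) for i in range(n))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for b in range(itemsize):
        for i in range(n):
            out[i * itemsize + b] = data[b * n + i]
    return bytes(out)


def write_filters(chunk: bytes, itemsize: int) -> bytes:
    """Apply the filters in order on the way to storage."""
    return zlib.compress(shuffle(chunk, itemsize))


def read_filters(stored: bytes, itemsize: int) -> bytes:
    """Undo the filters in reverse order on the way back to the application."""
    return unshuffle(zlib.decompress(stored), itemsize)


# A chunk of 1000 little-endian 32-bit integers with small values.
chunk = struct.pack("<1000i", *range(1000))
stored = write_filters(chunk, itemsize=4)
assert read_filters(stored, itemsize=4) == chunk
print(len(chunk), len(stored) < len(chunk))  # 4000 True
```

The application only ever sees the original bytes; the filters are invisible above the I/O layer.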
Other info
• Library and tools run almost anywhere
• Other software from THG (The HDF Group)
  • Java viewer
  • Command-line utilities
• Other software
  • Commercial (IDL, Matlab, LabVIEW, etc.)
  • Community (EOS, ASCI, etc.)
  • Integration with other software (SRB, databases, etc.)
Making HDF useful for your application
• There are many ways to organize and access data in HDF5.
• How do we apply these capabilities to a particular domain, such as product data?
• We have to decide how to organize and access our data in a way that best addresses our needs,
  • and create data models, APIs and tools as appropriate to support our applications,
  • or adapt existing data models, APIs and tools to support them.
Sample uses of HDF
1. NASA Earth Observing System (EOS)
[Figure: EOS platforms and instruments: Terra (CERES, MISR, MODIS, MOPITT); Aqua (6/01) (CERES, MODIS, AMSR); Aura (TES, HIRDLS, MLS, OMI)]
2. Advanced Simulation & Computing (ASC)
Question: How do we maintain a nuclear stockpile in the absence of testing?
Answer: Very large simulations
ASC Data requirements
• Large datasets (> a terabyte)
• Fast I/O on massively parallel systems
• Complex data and extensive metadata
• Availability on leading-edge systems
3. Bioinformatics
Managing genomic data
[Figure: sample DNA sequence]
DNA sequencing workflows are complex
• Diverse formats
• Highly redundant data
• Multiple levels of information
• Complex associations
• Repeated file processing
• Non-scalable storage
• Lack of persistence
HDF5 as binary exchange format for bioinformatics
4. Flight test data
Boeing flight test
HDF role in the Software Stack
[Figure: HDF5's role in the software stack, from top to bottom:]
• Apps: simulation, visualization, remote sensing… (examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models)
• Common application-specific data models, with app-specific APIs or GUIs: BioHDF, SAF, HDF-Packet, HDF-EOS, Matlab (communities: LANL; LLNL, SNL; Grids; COTS; NASA)
• HDF5 data model & API
• HDF5 serial & parallel I/O
• HDF5 virtual file layer (I/O drivers): MPI I/O, Split Files, Stdio, Custom, Stream
• HDF5 format
• Storage: file; file on parallel file system; split metadata and raw data files; user-defined device; ? across the network or to/from another application or library
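The virtual file layer separates the library's logical view of a file (an address space of bytes) from where those bytes actually live. A rough Python sketch of that separation follows: two hypothetical drivers implement the same read/write-at-offset interface, one backed by memory and one by an ordinary file. The class and method names are invented for illustration and do not correspond to the HDF5 C driver API.

```python
import os
import tempfile
from typing import Protocol


class Driver(Protocol):
    """Hypothetical driver interface: byte-addressed reads and writes."""
    def write(self, addr: int, data: bytes) -> None: ...
    def read(self, addr: int, size: int) -> bytes: ...


class MemoryDriver:
    """Keeps the whole 'file' in a growable in-memory buffer."""
    def __init__(self) -> None:
        self.buf = bytearray()

    def write(self, addr: int, data: bytes) -> None:
        end = addr + len(data)
        if end > len(self.buf):
            self.buf.extend(b"\0" * (end - len(self.buf)))  # grow, zero-filled
        self.buf[addr:end] = data

    def read(self, addr: int, size: int) -> bytes:
        return bytes(self.buf[addr:addr + size])


class PosixFileDriver:
    """Maps the same addresses onto an ordinary operating-system file."""
    def __init__(self, path: str) -> None:
        self.f = open(path, "w+b")

    def write(self, addr: int, data: bytes) -> None:
        self.f.seek(addr)
        self.f.write(data)

    def read(self, addr: int, size: int) -> bytes:
        self.f.seek(addr)
        return self.f.read(size)

    def close(self) -> None:
        self.f.close()


def store_and_fetch(driver: Driver) -> bytes:
    """Code above the driver layer does not know where the bytes live."""
    driver.write(8, b"raw data")
    return driver.read(8, 8)


print(store_and_fetch(MemoryDriver()))  # b'raw data'

with tempfile.TemporaryDirectory() as tmp:
    disk = PosixFileDriver(os.path.join(tmp, "example.bin"))
    print(store_and_fetch(disk))  # b'raw data'
    disk.close()
```

Swapping drivers changes where data is stored (memory, a local file, and in the real library also parallel or split files) without changing the code that uses the data model above it.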
2. Why is there interest in HDF5 for product data? (Courtesy of David Price, EuroSTEP)
Needs
• STEP and related models exist using EXPRESS
• ASCII and XML STEP formats are defined, and software has been developed
• But ASCII/XML don’t adapt well to highly voluminous, complex data:
  • Finite element analysis
  • Computational fluid dynamics
  • Heterogeneous product data
EuroSTEP project
• VIVACE: “Value Improvement through a Virtual Aeronautical Collaborative Enterprise”
• Deliverable: EXPRESS-driven Large Volume Binary Data Representation
Survey of State of the Art
• Candidates
  • ASN.1: Abstract Syntax Notation One
  • HDF5: Hierarchical Data Format
  • XML/Binary
  • CGNS: CFD General Notation System
  • SDAI implementation by LKSoft
• Found HDF5 most suitable for very large scientific datasets and complex relationships
Goal: Create an open-source toolkit mapping EXPRESS to HDF5
[Figure: the HDF5 software stack from the earlier slide, extended for product data: product model applications (examples: product modeling, data mining tools, visualization tools) sit on top of STEP data models, which reach the HDF5 data model & API through a STEP-HDF5 layer alongside the other application-specific APIs (BioHDF, SAF, HDF-Packet, HDF-EOS, Matlab)]
NARA-sponsored work
NCSA-THG NARA Research
• Investigate the viability of scientific data formats, such as HDF5, for long-term preservation of engineering data in the federal archives
Heterogeneous data aggregation with HDF5
• Goal: Using NARA’s TWR collection, investigate the possibilities and limitations of using HDF5 as a container for archiving heterogeneous collections of records, with special attention to STEP data.
Activities
• Use files, datatypes, and structures in NARA’s TWR collection: STEP files, photos, schematics, etc.
• Map these to HDF5 objects and structures, exploiting features of HDF5
• Assess benefits and costs in terms of storage efficiency and accessibility
• Investigate use of HDF5 as a container for the collection
Relationship with EuroSTEP, Electric Boat, et al.
• Working together to develop mappings from EXPRESS to HDF5
• Sharing data for testing
• Periodic meetings to share information and coordinate research
• Some involvement with standardization
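As a rough illustration of what an EXPRESS-to-HDF5 mapping involves, the sketch below turns a simplified EXPRESS entity with fixed-size attributes into a record layout, much as a compound datatype describes a C struct. The type table, the entity, and the function name are assumptions for illustration only. This is not the mapping being developed with EuroSTEP and Electric Boat, which must also handle inheritance, references between entities, optional attributes, and variable-size aggregates.

```python
import struct

# Hypothetical mapping from EXPRESS simple types to struct format codes.
SIMPLE_TYPES = {
    "REAL": "d",     # 64-bit IEEE float
    "INTEGER": "q",  # 64-bit signed integer
    "BOOLEAN": "?",  # one byte
}


def record_format(attributes):
    """Build a little-endian struct format from (name, type, count) triples.

    count > 1 stands for a fixed-size aggregate such as ARRAY [1:3] OF REAL.
    """
    codes = []
    for name, express_type, count in attributes:
        if express_type not in SIMPLE_TYPES:
            raise ValueError(f"{name}: unsupported type {express_type}")
        codes.append(f"{count}{SIMPLE_TYPES[express_type]}")
    return "<" + "".join(codes)


# Simplified entity, loosely modelled on a STEP Cartesian point:
#   ENTITY point3d; id : INTEGER; coordinates : ARRAY [1:3] OF REAL; END_ENTITY;
point3d = [("id", "INTEGER", 1), ("coordinates", "REAL", 3)]

fmt = record_format(point3d)
print(fmt, struct.calcsize(fmt))  # <1q3d 32
```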
Investigating I/O efficiency and size
• Explore different datatypes and storage options for B-spline surface models (later: finite element models)
• Two types of data: the B-splines themselves and Cartesian points
• Variables
  • Different HDF5 datatypes
  • Dataset compression
  • Use of extra indexes in HDF5 for fast access
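The kind of size comparison described here can be approximated in a few lines. This sketch encodes the same Cartesian points three ways: as STEP Part 21 text (`CARTESIAN_POINT` instances), as packed binary doubles, and as zlib-compressed binary. It is a back-of-the-envelope model with made-up grid points, not the measurement procedure used in the study, and it ignores HDF5's own file overhead (one reason small HDF5 files need not be smaller than STEP files).

```python
import struct
import zlib
from itertools import product

# Hypothetical test data: points on a regular 10 x 10 x 10 grid.
points = [(float(x), float(y), float(z)) for x, y, z in product(range(10), repeat=3)]

# 1. STEP Part 21 style text, one entity instance per point.
step_text = "".join(
    f"#{i + 1}=CARTESIAN_POINT('',({x},{y},{z}));\n"
    for i, (x, y, z) in enumerate(points)
).encode("ascii")

# 2. Packed binary: three 64-bit floats per point (an n x 3 array of doubles).
binary = struct.pack(f"<{3 * len(points)}d", *(c for p in points for c in p))

# 3. The same binary buffer, deflate-compressed.
compressed = zlib.compress(binary)

for label, blob in [("STEP text", step_text), ("binary", binary), ("compressed", compressed)]:
    print(f"{label:>10}: {len(blob)} bytes")
```

Trying float32 instead of float64, or a different compression level, in the same sketch shows how much the choice of datatype and storage option affects size.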
Some results
• Small files
  • HDF5 not appreciably better than STEP, sometimes worse
• Large files
  • Compression always made HDF5 files smaller
  • Even without compression, HDF5 storage was better
  • Indexing approach also tended to save space
• Lessons
  • HDF5 can provide very efficient storage for Cartesian points
  • Choice of datatypes and data storage is important
HDF5 as container
HDFView Demo
Thank you
HDF Information
• HDF Information Center
  • http://hdfgroup.org/
• HDF Help email address
  • [email protected]
• HDF users mailing list
  • [email protected]