Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps

Ruth Duerr, NSIDC
Christopher Lynnes, GES DISC
The HDF Group

Oct. 16 2008, HDF and HDF-EOS Workshop XII

Background and basic concept

HDF4 is

FLEXIBLE

EXTENSIBLE

SELF-DESCRIBING

I’m Plastic Man!

But there’s a cost…

Complexity!

How do we save HDF users from having to deal with all of the complexity under the hood?

Through the HDF software libraries, either by using the HDF APIs directly or by using HDF tools that depend on the HDF libraries.

But what about the future…

• There is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term.

• It is possible, especially in the distant future, that the libraries may not be available.

Really smart people and software?

Maybe future data users and their computers will be so smart that the HDF4 format will be a piece of cake.

Maybe not.

We need an “easy” button

“If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to] extend hdfls to print a hierarchical map of a data file, [and] write ncdump/hdp-like utilities to find, assemble and write out SDSes and vdatas.”

“Leveraging HDF Utilities,” Christopher Lynnes, HDF Workshop X

HDF4 file layout

The project

HDF4 mapping

• Problem: The complex internal byte layout of HDF files requires one to use the API to access HDF data. This makes long-term readability of HDF data dependent on long-term allocation of resources to support HDF software.

• Proposed solution: Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.

HDF4 mapping project activities

1. Assess and categorize HDF4 data held by NASA
   • To determine what types of objects to map.
   • To get an idea of the magnitude of the project.

2. Develop prototype for proof of concept
   • Develop markup-language based layout specification.
   • Develop tool to produce layout for an HDF4 file.
   • Develop and test two independent tools to read HDF4 data based solely on the map files.

Project activities (continued)

3. Assess results and plan next steps
   • Present results and options for proceeding to the community.
   • Assess the likely usefulness of this approach, as well as any desirable modifications.
   • Evaluate the effort required for a full solution that best meets community needs.
   • Submit a proposal for the work needed to provide a full solution.

1. Assess and categorize

How many HDF4 products?

Data Center    HDF4 Products
ASF                        0
GES-DISC                 236
GHRC                      54
ASDC                      63
LP-DAAC                   67
NSIDC                     47
ORNL-DAAC                  2
PO.DAAC                   22
SDAC                       0
MrDC                      95
Total                    586

Data characteristics

Product Characteristics Examined

• Product Identification
  • Product Name
  • Data Level
  • Archive Location
  • Product Version
• Whether the product was multi-file
• For HDF-EOS products
  • HDF-EOS version
  • For point data
    • Number of point data sets
    • Maximum number of levels
  • For swath data
    • Number of swaths
    • Maximum number of dimensions
    • Organized by time, space, both, or other
    • Whether dimension maps were used
  • For gridded data
    • Number of grids
    • Max number of dimensions in a grid
    • Number of projections used
    • Whether any grids were indexed
• HDF Version
• For raster data
  • Number of 8-bit rasters
  • Number of 24-bit rasters
  • Number of general rasters
  • Whether any rasters had attributes
  • Whether any rasters were compressed
  • Whether any rasters were chunked
  • Whether there were any palettes
• For SDS data
  • Number of SDSs
  • Maximum number of dimensions
  • Did any SDS have attributes
  • Was any SDS annotated
  • Were dimension scales used
  • Was compression used and if so what kind
  • Was chunking used
• For Vdata
  • Number of Vdata structures
  • Did any Vdata have attributes
  • Did any Vdata fields have attributes
  • Was compression used and if so what kind
  • Was chunking used

Other results

• Slightly more than half of the HDF4 products are in HDF-EOS 2 format

• Grids are the most common HDF-EOS data structures in use

• No products use a combination of grid, swath, and point data structures

2. Prototype and proof of concept

HDF4 mapping prototype workflow

[Workflow diagram]

• hmap (linked with HDF4 library) reads the HDF4 File “H4.hdf” and produces the HDF4 Mapping File (XML document) “H4.hdf.map.xml”.
• The mapping file supplies: Groups, Data Objects, Structural and Application Metadata; Locations of Object Data.
• The HDF4 file supplies: Object Data.
• Reader 1 (C program) and Reader 2 (Perl Script) consume both.

Proof-of-concept results

• The HDF Group created prototype map generation software and a draft map specification

• Map generator was tested on a wide variety of data products

• GES-DISC and NSIDC independently wrote software that uses maps to read data files in NSIDC’s and GES-DISC’s archives

• Summary - the concept is feasible!

Example map fragment

<?xml version="1.0" encoding="utf-8"?>
<hdf4:HDFMap xmlns:hdf4="http://www.hdfgroup.org/HDF4/HDF4Map">
  <hdf4:RootGroup>
    <hdf4:SDS objName="data1" objPath="/" objID="xid-DFTAG_NDG-2">
      <hdf4:Attribute name="data range" ntDesc="32-bit signed integer">
        0 255
      </hdf4:Attribute>
      <hdf4:Datatype dtypeClass="INT" dtypeSize="4" byteOrder="BE" />
      <hdf4:Dataspace ndims="2">
        10 100
      </hdf4:Dataspace>
      <hdf4:Datablock nblocks="1">
        <hdf4:BlockOffset> 2502 </hdf4:BlockOffset>
        <hdf4:BlockNbytes> 4000 </hdf4:BlockNbytes>
      </hdf4:Datablock>
    </hdf4:SDS>
  </hdf4:RootGroup>
</hdf4:HDFMap>
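The fragment above carries what a library-free reader needs: element size and byte order, array shape, and the byte range holding the values. As a rough illustration (not the project's C or Perl readers), the Python sketch below parses that fragment and pulls the array straight out of the file with a seek and a read. The function name `read_sds`, the `"LE"` byte-order value, and the limits to the `INT` class and a single data block are assumptions made for this sketch; the draft map specification may differ. The demo file is stand-in bytes, not a real HDF4 file.

```python
import math
import os
import struct
import tempfile
import xml.etree.ElementTree as ET

NS = {"hdf4": "http://www.hdfgroup.org/HDF4/HDF4Map"}

# The map fragment from the slide, kept as bytes for the XML parser.
MAP_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<hdf4:HDFMap xmlns:hdf4="http://www.hdfgroup.org/HDF4/HDF4Map">
  <hdf4:RootGroup>
    <hdf4:SDS objName="data1" objPath="/" objID="xid-DFTAG_NDG-2">
      <hdf4:Attribute name="data range" ntDesc="32-bit signed integer">
        0 255
      </hdf4:Attribute>
      <hdf4:Datatype dtypeClass="INT" dtypeSize="4" byteOrder="BE" />
      <hdf4:Dataspace ndims="2">
        10 100
      </hdf4:Dataspace>
      <hdf4:Datablock nblocks="1">
        <hdf4:BlockOffset> 2502 </hdf4:BlockOffset>
        <hdf4:BlockNbytes> 4000 </hdf4:BlockNbytes>
      </hdf4:Datablock>
    </hdf4:SDS>
  </hdf4:RootGroup>
</hdf4:HDFMap>"""

# Assumption: only signed integers are handled here.
INT_CODES = {1: "b", 2: "h", 4: "i", 8: "q"}
# Assumption: "LE" is a guess at the little-endian spelling; the slide shows only "BE".
BYTE_ORDERS = {"BE": ">", "LE": "<"}


def read_sds(map_xml, data_path, obj_name):
    """Return (dims, flat list of values) for one SDS, using only the map."""
    root = ET.fromstring(map_xml)
    matches = [s for s in root.iterfind(".//hdf4:SDS", NS)
               if s.get("objName") == obj_name]
    if not matches:
        raise KeyError(obj_name)
    sds = matches[0]

    dtype = sds.find("hdf4:Datatype", NS)
    if dtype.get("dtypeClass") != "INT":
        raise NotImplementedError("sketch handles only the INT class")
    size = int(dtype.get("dtypeSize"))
    order = BYTE_ORDERS[dtype.get("byteOrder")]
    dims = [int(n) for n in sds.find("hdf4:Dataspace", NS).text.split()]

    block = sds.find("hdf4:Datablock", NS)
    if block.get("nblocks") != "1":
        raise NotImplementedError("sketch handles one contiguous block")
    offset = int(block.find("hdf4:BlockOffset", NS).text)
    nbytes = int(block.find("hdf4:BlockNbytes", NS).text)

    # Sanity check: 10 * 100 elements * 4 bytes must equal the 4000 bytes mapped.
    count = math.prod(dims)
    if count * size != nbytes:
        raise ValueError("shape and element size do not match BlockNbytes")

    with open(data_path, "rb") as f:
        f.seek(offset)
        raw = f.read(nbytes)
    if len(raw) != nbytes:
        raise EOFError("file is shorter than the map claims")
    return dims, list(struct.unpack(f"{order}{count}{INT_CODES[size]}", raw))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "H4.hdf")
        with open(path, "wb") as f:
            # Stand-in bytes: 2502 filler bytes, then 1000 big-endian int32 values.
            f.write(b"\0" * 2502 + struct.pack(">1000i", *range(1000)))
        dims, values = read_sds(MAP_XML, path, "data1")
        print(dims, values[:3], values[-1])
```

The reader touches nothing but the XML and a byte range, which is the point of the map: the HDF4 library is needed once, to generate the map, and not again to read the data.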

Next steps

Effort for full implementation

• Generate maps for existing archives
  • GES-DISC approach: append the map XML to the XML files already kept for each file in their archive
  • NSIDC non-ECS data implementation: add an XML file for each data file in the same directory
  • Other systems TBD
• Generate maps for new data
  • Add map generation as a step in the ingest process using a stand-alone tool
  • Request product generation systems to use new API calls that generate maps
• Develop a production-quality implementation of the mapping tool, and possibly an API.
• Possibly do a similar assessment for HDF5 maps.
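For the one-map-per-data-file layout described for NSIDC's non-ECS data, an archive could be checked for files that still lack a map with a sketch along these lines. The `.map.xml` suffix follows the “H4.hdf.map.xml” name in the workflow slide; the `.hdf` data-file extension and the function name `files_missing_maps` are assumptions, and real archives may name files differently.

```python
from pathlib import Path


def files_missing_maps(archive_dir, data_suffix=".hdf", map_suffix=".map.xml"):
    """List data files that have no sidecar map file in the same directory."""
    missing = []
    for data_file in sorted(Path(archive_dir).rglob("*" + data_suffix)):
        # "H4.hdf" is expected to sit next to "H4.hdf.map.xml".
        map_file = data_file.with_name(data_file.name + map_suffix)
        if not map_file.exists():
            missing.append(data_file)
    return missing
```

Running this before and after a map-generation pass gives a simple measure of how much of an existing archive has been covered.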

How you can help

• Consider what it might take to implement this for your archive - contact Ruth if you’d like support

• Review the materials on the wiki and elsewhere - comment heavily!

For more information

• Wiki page added to Confluence wiki
• Project page at The HDF Group website: http://www.hdfgroup.org/projects/hdf4mapping/
• Paper at 2008 fall AGU
• Paper “Ensuring Long Term Access to Remotely Sensed Data with Layout Maps” in the upcoming TGRSS special issue on archiving and distribution

Thank you.

This report is based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA Award NNX06AC83A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
