ibm haifa research lab © 2008 ibm corporation contacts: simona cohen, michael factor, dalit naor
TRANSCRIPT
IBM Haifa Research Lab
© 2008 IBM Corporation
Contacts: Simona Cohen, Michael Factor, Dalit Naor
http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Preservation DataStores (PDS): A demonstration of Preservation Aware Storage
IBM Haifa Research Lab
© 2008 IBM Corporation2 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
PDS Overview
OAIS–based preservation-aware storage, media-agnostic, and generic storage to support logical preservation
Manage preservation specific metadata– Compute fixity– Update technical provenance– Manage PDI RepInfo– Ensure referential integrity
Storlet container– Module container that can execute restricted modules with predefined interfaces for
data intensive functions, e.g., transformations, fixity calculations– Optimal scheduling– Update PDS modules, e.g., fixity algorithm, packaging format
Managing availability/ data loss– Physically co-locate data and metadata– Cluster related AIPs on the same media unit, based upon their relative importance
AIP identifier generation – globally unique identifiers
Content DataObject
Representation Information
Reference
Context Fixity
Provenance
Content Information Preservation Descriptive Information
IBM Haifa Research Lab
© 2008 IBM Corporation3 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
PDS Architecture
Layered approach is based on open standards: OAIS, XAM, OSD
In CASPAR, all layers are utilized. In other embodiments, only some layers can be used
Utilizes XAM to provide logical abstraction of containers (XSets)
Offloads preservation functionality to storage
ingestAIP, accessAIP, transformAIP, getPreservationPolicy
IBM Haifa Research Lab
© 2008 IBM Corporation4 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
AIP Representation in the Preservation Engine
The basic building-block of an AIP is the Information Object
– A single instance of data is a byte stream
– Zero or more instances of RepInfo to interpret the data
RepInfo is also an AIP– Shared resource with
unique ID
– RepInfo recursive network
– Keep the design simple
IBM Haifa Research Lab
© 2008 IBM Corporation5 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Preservation of an AIP
Ingest AIP– Storing an AIP in PDS
Bit and logical preservation of an AIP– Bit migrations
– Format migrations - data transformation• Transformation modules are packed as AIPs and preserved• Transformation result is a new version for the original AIP
– Migrations are documented as Provenance records
– During migrations PDS performs operations on AIP• Update PDI (e.g., fixity calculation, additional provenance events)• Execute previously loaded storlets
Access AIP– Retrieval of an AIP
• By retrieval time, the original AIP may have several versions and copies
IBM Haifa Research Lab
© 2008 IBM Corporation
Demo
IBM Haifa Research Lab
© 2008 IBM Corporation7 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
MST DataNatural Environment Research Council (NERC) Mesosphere-Stratosphere-Troposphere (MST) Radar at Aberystwyth
Provides measurements of vertical and horizontal components of the wind
Highly complex data
Preserved for the atmospheric scientific community
Needs long-term preservation for research purposes
Community is highly dependent on interoperability
IBM Haifa Research Lab
© 2008 IBM Corporation8 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
British Atmospheric Data Centre and Self-Describing Data Formats
BADC strongly advocate the use of self-describing files to capture provenance and semantic information
Two self-describing file formats compete in the Atmospheric Science Community – NASA AMES (ASCII)
– NetCDF (binary)
The current user community finds NetCDF more suitable
IBM Haifa Research Lab
© 2008 IBM Corporation9 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
10 years later
20 years later
Ingest AIP
Load and apply transformation
Access and view transformed data
View AIP
Demo Flow
IBM Haifa Research Lab
© 2008 IBM Corporation10 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_view
Video:
IBM Haifa Research Lab
© 2008 IBM Corporation11 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Ingest AIP flow in PDS
Preservation Engine
PDS web service
XAM
XAM to OSD (VIM)
OSD
ingest (AIP)
XUID
AIP ID
create XSet
write fields (AIP data)
commit
create OSD objects
write (XSet data)
Unpack AIP Generate AIP ID Compute fixity Add “ingest”
provenance event
store (OID, XUID) mapping persistently
store (XUID, AIP ID) mapping persistently
IBM Haifa Research Lab
© 2008 IBM Corporation12 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
10 years later
20 years later
Ingest AIP
Load and apply transformation
Access and view transformed data
View AIP
Demo Flow
IBM Haifa Research Lab
© 2008 IBM Corporation13 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_ingest
Video:
IBM Haifa Research Lab
© 2008 IBM Corporation14 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
IBM Haifa Research Lab
© 2008 IBM Corporation15 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Format Change
Converting NetCDF to NASA-AMES
– The MST NetCDF file version is no longer supported by the NetCDF foundations libraries, and files are not backwards compatible with this NetCDF version
– The skills base of the atmospheric scientist in the future is heavily biased towards the use of ASCII-based file formats
– The NASA-AMES format has become the preferred data format of the atmospheric scientists’ community
Converting NetCDF to PNG
– Provide additional quick image view of the data for users with limited access permissions
IBM Haifa Research Lab
© 2008 IBM Corporation16 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Load transformation flow in PDS
Preservation Engine
PDS web service
loadTransformation (AIP)
AIP ID
Register transformation Ingest transformation AIP
IBM Haifa Research Lab
© 2008 IBM Corporation17 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Transform AIP flow in PDS
PDS web service
transformAip (targetAIPID, transformationAIPID)
AIP ID
Access transformation AIP Access target AIP Execute transformation Ingest result as new AIP, a
version of the target AIP
Preservation Engine
IBM Haifa Research Lab
© 2008 IBM Corporation18 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
10 years later
20 years later
Ingest AIP
Load and apply transformation
Access and view transformed data
View AIP
Demo Flow
IBM Haifa Research Lab
© 2008 IBM Corporation19 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_xform
Video:
IBM Haifa Research Lab
© 2008 IBM Corporation20 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
IBM Haifa Research Lab
© 2008 IBM Corporation21 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Access AIP flow in PDS
Preservation Engine
PDS web service
XAM
XAM to OSD (VIM)
OSD
access (AIP ID)
read OSD objects
AIP data
AIP
XSet data
open XSet (XUID)
read fields
Lookup XUID by AIP ID
Lookup OSD OID by XUID
Validate AIP ID Validate fixity Add “access”
provenance event Package AIP
IBM Haifa Research Lab
© 2008 IBM Corporation22 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Demo Flow
10 years later
20 years later
Ingest AIP
Load and apply transformation
Access and view transformed data
View AIP
IBM Haifa Research Lab
© 2008 IBM Corporation23 http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_access
Video:
IBM Haifa Research Lab
© 2008 IBM Corporation
Thank you!
Haifa preservation team:
Simona Cohen, Michael Factor, Aner Hamama, Ealan Henis, Dalit Naor, Petra Reshef, Shahar Ronen