Page 1:

IBM Haifa Research Lab

© 2008 IBM Corporation

Contacts: Simona Cohen, Michael Factor, Dalit Naor

http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml

Preservation DataStores (PDS): A demonstration of Preservation Aware Storage

Page 2:

PDS Overview

OAIS-based, preservation-aware storage: media-agnostic, generic storage to support logical preservation

Manage preservation-specific metadata
– Compute fixity
– Update technical provenance
– Manage PDI RepInfo
– Ensure referential integrity

Storlet container
– Module container that can execute restricted modules with predefined interfaces for data-intensive functions, e.g., transformations, fixity calculations
– Optimal scheduling
– Update PDS modules, e.g., fixity algorithm, packaging format

Managing availability / data loss
– Physically co-locate data and metadata
– Cluster related AIPs on the same media unit, based upon their relative importance

AIP identifier generation
– Globally unique identifiers

[Figure: AIP structure. Content Information: Content Data Object, Representation Information. Preservation Description Information: Reference, Context, Fixity, Provenance.]
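The fixity computation and identifier generation listed on this slide can be illustrated with a minimal Python sketch. The slides do not say which digest algorithm or identifier scheme PDS uses, so SHA-256 and random UUIDs are assumptions here, and the function names are hypothetical.

```python
import hashlib
import uuid


def compute_fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Return a fixity record: the algorithm name and the hex digest of the data."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def verify_fixity(data: bytes, fixity: dict) -> bool:
    """Recompute the digest with the recorded algorithm and compare."""
    return compute_fixity(data, fixity["algorithm"])["digest"] == fixity["digest"]


def generate_aip_id() -> str:
    """Hypothetical AIP identifier: a random UUID in URN form (an assumption)."""
    return uuid.uuid4().urn


content = b"MST radar sample"
fixity = compute_fixity(content)
aip_id = generate_aip_id()
```

Storing the algorithm name next to the digest is one way to allow the fixity algorithm to be replaced later without invalidating older records.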

Page 3:

PDS Architecture

Layered approach is based on open standards: OAIS, XAM, OSD

In CASPAR, all layers are used. In other embodiments, only some of the layers may be used

Utilizes XAM to provide logical abstraction of containers (XSets)

Offloads preservation functionality to storage

ingestAIP, accessAIP, transformAIP, getPreservationPolicy

Page 4:

AIP Representation in the Preservation Engine

The basic building block of an AIP is the Information Object
– A single instance of data, which is a byte stream
– Zero or more instances of RepInfo to interpret the data

RepInfo is also an AIP
– Shared resource with unique ID
– RepInfo recursive network
– Keep the design simple
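As an illustration of the recursive RepInfo network, the sketch below models every object, RepInfo included, as the same simplified AIP record and walks the references from one AIP. The `AIP` class, the `collect_rep_info` helper, and all identifiers are hypothetical and are not taken from PDS.

```python
from dataclasses import dataclass, field


@dataclass
class AIP:
    """Hypothetical, simplified AIP: one byte stream plus the IDs of its RepInfo AIPs."""
    aip_id: str
    data: bytes
    rep_info_ids: list[str] = field(default_factory=list)


def collect_rep_info(aip_id: str, store: dict[str, AIP]) -> list[str]:
    """Walk the RepInfo network and return every reachable RepInfo ID once."""
    seen: list[str] = []
    stack = list(store[aip_id].rep_info_ids)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared RepInfo or a cycle: visit only once
        seen.append(current)
        stack.extend(store[current].rep_info_ids)
    return seen


# Two data AIPs share one RepInfo AIP, which has RepInfo of its own.
store = {
    "aip:mst-1": AIP("aip:mst-1", b"data 1", ["rep:netcdf"]),
    "aip:mst-2": AIP("aip:mst-2", b"data 2", ["rep:netcdf"]),
    "rep:netcdf": AIP("rep:netcdf", b"format description", ["rep:encoding"]),
    "rep:encoding": AIP("rep:encoding", b"encoding description"),
}
```

Because RepInfo is stored like any other AIP, a single store and a single traversal cover both data and its interpretation chain.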

Page 5:

Preservation of an AIP

Ingest AIP
– Storing an AIP in PDS

Bit and logical preservation of an AIP
– Bit migrations
– Format migrations (data transformation)
  • Transformation modules are packed as AIPs and preserved
  • Transformation result is a new version of the original AIP
– Migrations are documented as Provenance records
– During migrations PDS performs operations on the AIP
  • Update PDI (e.g., fixity calculation, additional provenance events)
  • Execute previously loaded storlets

Access AIP
– Retrieval of an AIP
  • By retrieval time, the original AIP may have several versions and copies

Page 6:

Demo

Page 7:

MST Data

Natural Environment Research Council (NERC) Mesosphere-Stratosphere-Troposphere (MST) Radar at Aberystwyth

Provides measurements of vertical and horizontal components of the wind

Highly complex data

Preserved for the atmospheric scientific community

Needs long-term preservation for research purposes

Community is highly dependent on interoperability

Page 8:

British Atmospheric Data Centre and Self-Describing Data Formats

BADC strongly advocate the use of self-describing files to capture provenance and semantic information

Two self-describing file formats compete in the atmospheric science community
– NASA AMES (ASCII)
– NetCDF (binary)

The current user community finds NetCDF more suitable

Page 9:

Demo Flow

[Figure: demo timeline with the steps View AIP, Ingest AIP, Load and apply transformation, and Access and view transformed data, and the time markers "10 years later" and "20 years later"]

Page 10:

Video: pds_view

Page 11:

Ingest AIP flow in PDS

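[Sequence diagram]
Layers: PDS web service, Preservation Engine, XAM, XAM to OSD (VIM), OSD
Request from the PDS web service: ingest (AIP); the AIP ID is returned
Preservation Engine: unpack AIP, generate AIP ID, compute fixity, add “ingest” provenance event, store (XUID, AIP ID) mapping persistently
Preservation Engine to XAM: create XSet, write fields (AIP data), commit; the XUID is returned
XAM to OSD (VIM): create OSD objects, write (XSet data), store (OID, XUID) mapping persistently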
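The ingest flow can be sketched in Python as follows. This is a simplified illustration, not the PDS implementation: the XAM layer is replaced by a hypothetical in-memory class, the OSD layer and AIP unpacking are omitted, and SHA-256, the UUID-based identifiers, and all class and field names are assumptions.

```python
import hashlib
import uuid
from datetime import datetime, timezone


class InMemoryXam:
    """Stand-in for the XAM layer: keeps each committed XSet as a dict of fields."""

    def __init__(self):
        self.xsets = {}

    def create_and_commit(self, fields: dict) -> str:
        xuid = f"xuid-{uuid.uuid4()}"  # hypothetical XUID format
        self.xsets[xuid] = dict(fields)
        return xuid


class PreservationEngine:
    """Hypothetical engine covering the ingest steps named in the diagram."""

    def __init__(self, xam: InMemoryXam):
        self.xam = xam
        self.xuid_by_aip_id = {}  # the (XUID, AIP ID) mapping, in memory only

    def ingest(self, content: bytes) -> str:
        aip_id = uuid.uuid4().urn                     # generate AIP ID
        fixity = hashlib.sha256(content).hexdigest()  # compute fixity
        event = {                                     # "ingest" provenance event
            "event": "ingest",
            "time": datetime.now(timezone.utc).isoformat(),
        }
        xuid = self.xam.create_and_commit(
            {"content": content, "fixity": fixity, "provenance": [event]}
        )
        self.xuid_by_aip_id[aip_id] = xuid            # store the mapping
        return aip_id


engine = PreservationEngine(InMemoryXam())
new_aip_id = engine.ingest(b"MST radar sample")
```

A real deployment would persist the mapping rather than hold it in a dictionary, as the diagram requires.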

Page 12:

Demo Flow

[Figure: demo timeline, as on page 9]

Page 13:

Video: pds_ingest


Page 15:

Format Change

Converting NetCDF to NASA-AMES

– The NetCDF version of the MST files is no longer supported by the NetCDF foundation libraries, and files are not backward compatible with this NetCDF version

– The skill base of future atmospheric scientists is heavily biased towards ASCII-based file formats

– The NASA-AMES format has become the preferred data format of the atmospheric science community

Converting NetCDF to PNG

– Provides an additional quick image view of the data for users with limited access permissions

Page 16:

Load transformation flow in PDS

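[Sequence diagram]
Layers: PDS web service, Preservation Engine
Request from the PDS web service: loadTransformation (AIP); the AIP ID is returned
Preservation Engine: register transformation, ingest transformation AIP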

Page 17:

Transform AIP flow in PDS

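[Sequence diagram]
Layers: PDS web service, Preservation Engine
Request from the PDS web service: transformAip (targetAIPID, transformationAIPID); an AIP ID is returned
Preservation Engine: access transformation AIP, access target AIP, execute transformation, ingest result as a new AIP that is a version of the target AIP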
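The load and transform flows can be sketched together in Python. This is an illustrative model under stated assumptions: the in-memory `aips` and `transformations` dictionaries, the record fields, and the toy transformation are hypothetical, and the real PDS executes packaged modules rather than plain Python callables.

```python
import uuid
from typing import Callable, Optional

# Hypothetical in-memory AIP store: AIP ID -> record.
aips: dict[str, dict] = {}
# Registered transformations: transformation AIP ID -> callable.
transformations: dict[str, Callable[[bytes], bytes]] = {}


def ingest(content: bytes, version_of: Optional[str] = None,
           provenance: Optional[list] = None) -> str:
    """Store content as a new AIP and return its ID."""
    aip_id = uuid.uuid4().urn
    aips[aip_id] = {
        "content": content,
        "version_of": version_of,
        "provenance": list(provenance or []),
    }
    return aip_id


def load_transformation(func: Callable[[bytes], bytes]) -> str:
    """Ingest the module as an AIP and register it under the returned ID."""
    aip_id = ingest(func.__name__.encode())  # placeholder for the packaged module
    transformations[aip_id] = func
    return aip_id


def transform_aip(target_aip_id: str, transformation_aip_id: str) -> str:
    """Apply a registered transformation; the result is a new version of the target."""
    transform = transformations[transformation_aip_id]  # access transformation AIP
    target = aips[target_aip_id]                        # access target AIP
    result = transform(target["content"])               # execute transformation
    event = {"event": "transformation", "module": transformation_aip_id}
    return ingest(result, version_of=target_aip_id, provenance=[event])


def to_upper_ascii(data: bytes) -> bytes:
    """Toy stand-in for a real format migration such as NetCDF to NASA-AMES."""
    return data.upper()


original_id = ingest(b"wind profile")
module_id = load_transformation(to_upper_ascii)
migrated_id = transform_aip(original_id, module_id)
```

The original AIP is left untouched; the migrated copy records both its parent and the module that produced it, which mirrors the provenance requirement for migrations.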

Page 18:

Demo Flow

[Figure: demo timeline, as on page 9]

Page 19:

Video: pds_xform


Page 21:

Access AIP flow in PDS

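[Sequence diagram]
Layers: PDS web service, Preservation Engine, XAM, XAM to OSD (VIM), OSD
Request from the PDS web service: access (AIP ID); the AIP is returned
Preservation Engine: validate AIP ID, look up XUID by AIP ID, validate fixity, add “access” provenance event, package AIP
Preservation Engine to XAM: open XSet (XUID), read fields; AIP data is returned
XAM to OSD (VIM): look up OSD OID by XUID, read OSD objects; XSet data is returned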
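The access flow can be sketched in Python along the same lines. The sketch assumes the stored fields and mappings are plain dictionaries and that fixity is a SHA-256 digest; the `access` function, `FixityError`, and the field names are hypothetical, and the XAM and OSD lookups are collapsed into dictionary reads.

```python
import hashlib


class FixityError(Exception):
    """Raised when stored content no longer matches its recorded digest."""


def access(aip_id: str, xuid_by_aip_id: dict, xsets: dict) -> dict:
    """Return a packaged AIP after validating its ID and fixity."""
    if aip_id not in xuid_by_aip_id:               # validate AIP ID
        raise KeyError(f"unknown AIP ID: {aip_id}")
    xuid = xuid_by_aip_id[aip_id]                  # look up XUID by AIP ID
    fields = xsets[xuid]                           # open XSet and read fields
    digest = hashlib.sha256(fields["content"]).hexdigest()
    if digest != fields["fixity"]:                 # validate fixity
        raise FixityError(f"fixity mismatch for {aip_id}")
    fields["provenance"].append({"event": "access"})  # "access" provenance event
    return {"aip_id": aip_id, **fields}            # package AIP


sample = b"MST radar sample"
xsets = {
    "xuid-1": {
        "content": sample,
        "fixity": hashlib.sha256(sample).hexdigest(),
        "provenance": [{"event": "ingest"}],
    }
}
xuid_by_aip_id = {"aip-1": "xuid-1"}
packaged = access("aip-1", xuid_by_aip_id, xsets)
```

Checking fixity before recording the access event means a corrupted object is reported without its provenance being extended.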

Page 22:

Demo Flow

[Figure: demo timeline, as on page 9]

Page 23:

Video: pds_access

Page 24:

Thank you!

Haifa preservation team:

Simona Cohen, Michael Factor, Aner Hamama, Ealan Henis, Dalit Naor, Petra Reshef, Shahar Ronen