e-IRG Open Workshop on e-Infrastructures 4-5 Oct 2006 CASPAR Project Digital Preservation and Digital interoperability

Post on 21-Dec-2015

e-IRG Open Workshop on e-Infrastructures 4-5 Oct 2006

CASPAR Project

Digital Preservation and Digital interoperability

Outline
• Unfamiliar Data
• Usability
• Link to Preservation
• OAIS Reference Model
• OAIS Information Model
• Representation Information
• Preservation and Virtualisation
• CASPAR project

Unfamiliar Data
• E-Research/e-Infrastructures allow users to find and try to use data from many sources
• Some familiar sources
• Most available sources will be unfamiliar
• How can one be sure that the unfamiliar data is used correctly?
• Garbage in – garbage out principle
• Various horror stories

Usability
• Ability for the user to “do something” with the bits
• Preferably using software
  – Even better if software does not have to be specially written
• Better still if user does not have to guess what to do or trawl around looking for documentation
• Could use existing software to display and process – but how do we prevent nonsense being produced accidentally?

Link to Preservation
• An archive is just another remote source of digitally encoded information
  – Preserved digital data was created some time ago
  – possibly a considerable time ago (decades)
• Digital Preservation can mean many things
• Simplest type is just keeping the “bits” and making sure they are available
• A more useful definition comes from OAIS

OAIS Reference Model
• ISO 14721: Reference Model for an Open Archival Information System (OAIS).
• An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.

• Long Term Preservation: The act of maintaining information, in a correct and Independently Understandable form, over the Long Term.

• Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.

• Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.

• Independently Understandable: the information has sufficient documentation to allow it to be understood and used by the Designated Community without having to resort to special resources not widely available, including named individuals.

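Information Objects

[Diagram: OAIS Information Object model. Labels: Information Object; Data Object, “interpreted using” Representation Information; Representation Information, itself “interpreted using” further Representation Information; Physical Object; Digital Object; Bit Sequence; multiplicities marked “1+”.]

Recursion ends at KNOWLEDGEBASE (of whom?) (tacit knowledge)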

Representation Information
• The Data Object is “interpreted using” the Representation Information (RepInfo)
• The Reference Model is designed to ensure that an OAIS is not set the impossible task of having to provide all possible RepInfo immediately
• Hence:
  – Take account of the Designated Community and its associated Knowledge Base
• Note that RepInfo may itself need further RepInfo
• NB very important for CERTIFICATION
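
A minimal sketch of how the recursion can stop at the Designated Community's Knowledge Base. The class, the function and the example network (names such as "instrument dictionary") are hypothetical and chosen only for illustration; nothing here is a CASPAR or OAIS API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """One piece of Representation Information (hypothetical model)."""
    name: str
    needs: list = field(default_factory=list)  # names of further RepInfo


def repinfo_to_supply(start, network, knowledge_base):
    """List the RepInfo an archive must supply, in discovery order.

    The walk does not descend below anything that is already part of the
    Designated Community's knowledge base: that is where recursion ends.
    """
    to_supply = []
    seen = set()
    queue = deque(start)
    while queue:
        name = queue.popleft()
        if name in seen or name in knowledge_base:
            continue
        seen.add(name)
        to_supply.append(name)
        queue.extend(network[name].needs)
    return to_supply


# Made-up representation network for an astronomical data file.
network = {
    "FITS standard": RepInfo("FITS standard", ["PDF", "English"]),
    "instrument dictionary": RepInfo("instrument dictionary", ["XML", "English"]),
    "PDF": RepInfo("PDF"),
    "XML": RepInfo("XML", ["Unicode"]),
    "Unicode": RepInfo("Unicode"),
    "English": RepInfo("English"),
}
start = ["FITS standard", "instrument dictionary"]

# An astronomer who already knows FITS, XML and English needs little.
astronomer_needs = repinfo_to_supply(start, network, {"FITS standard", "XML", "English"})
# With an empty knowledge base the whole network has to be supplied.
everything = repinfo_to_supply(start, network, set())
```

The same data needs very different amounts of RepInfo depending on who the Designated Community is, which is why its definition matters for certification.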

Representation Information
• The Representation Information accompanying a physical object, like a moon rock, may give additional meaning
  – It typically is a result of some analysis of the physically observable attributes of the rock
• The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning.
  – It typically maps the bits into commonly recognized data types such as character, integer, and real and into groups of these data types.
  – It associates these with higher level meanings which can have complex inter-relationships that are also described
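
As a rough illustration of structure RepInfo mapping bits to data types, the sketch below decodes one record using Python's standard `struct` module. The record layout, field names and units are assumptions made up for this example.

```python
import struct

# Hypothetical structure description for one fixed-length record:
# (field name, struct format code, higher-level meaning).
RECORD_DESCRIPTION = [
    ("station", "4s", "station code, 4 ASCII characters"),
    ("count", "i", "number of samples, 32-bit signed integer"),
    ("temperature", "d", "mean temperature in kelvin, 64-bit real"),
]


def decode_record(bits, description, byte_order=">"):
    """Map a sequence of bits to named, typed values using the description."""
    fmt = byte_order + "".join(code for _, code, _ in description)
    values = struct.unpack(fmt, bits)
    record = {}
    for (name, code, _), value in zip(description, values):
        if code.endswith("s"):
            value = value.decode("ascii")  # characters rather than raw bytes
        record[name] = value
    return record


# The "bag of bits" as it might sit in an archive (big-endian, 16 bytes).
bits = struct.pack(">4sid", b"RAL1", 12, 285.5)

record = decode_record(bits, RECORD_DESCRIPTION)

# The same bits read with the wrong byte order still unpack without error,
# but the values are nonsense: the count is no longer 12.
misread = struct.unpack("<4sid", bits)
```

The last line shows the "garbage in, garbage out" risk: existing software will happily process the bits, and only the RepInfo says which interpretation is the right one.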

Designated Community
• general English reading public educated to High School and above, with access to a Web Browser (HTML 4.0 capable)
• GIS data: GIS researchers - undergraduates and above, having an understanding of the concepts of Geographic data; having access to current (2005, USA) GIS tools/computer software e.g. ArcInfo (2005)
• Astronomer (undergraduate and above) with access to FITS software such as FITSIO, familiar with astronomical spectrographic instruments
• Student of Middle English with an understanding of TEI encoding and access to an XML rendering environment.
  – Variant 1: Cannot understand TEI
  – Variant 2: Cannot understand TEI and no access to XML rendering environment
  – Variant 3: No understanding of Middle English but does understand TEI and XML
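
The effect of the variants can be sketched as a set difference between what is needed to use a TEI-encoded Middle English text and what each community already has. The labels are hypothetical, and the sketch assumes that Variant 1 keeps its XML rendering environment and that Variant 3 has one.

```python
# Hypothetical labels for what is needed to use the text.
REQUIRED = {"Middle English", "TEI encoding", "XML rendering environment"}

# Knowledge bases of the baseline community and its three variants.
COMMUNITIES = {
    "baseline": {"Middle English", "TEI encoding", "XML rendering environment"},
    "variant 1": {"Middle English", "XML rendering environment"},
    "variant 2": {"Middle English"},
    "variant 3": {"TEI encoding", "XML rendering environment"},
}


def repinfo_gap(required, knowledge_base):
    """Return what the archive has to supply for this community."""
    return sorted(required - knowledge_base)


gaps = {name: repinfo_gap(REQUIRED, kb) for name, kb in COMMUNITIES.items()}
```

Under these assumptions the baseline community needs nothing extra, while each variant needs a different addition to the archive's RepInfo.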

Rep.Info. Classification

Structure
• Distinguish
  – formats which are used mainly for rendering – to be followed by human inspection, and
  – formats used for automated processing – particularly important for science data
• Distinguish:
  – Things with unknown structure – needs software
    • proprietary software e.g. MS Word
    • Open Source software e.g. CDF
  – Things with known/well described structure
    • ASCII file, FITS file, telemetry etc
  – Document the format
  – Use description language if possible e.g. EAST, DFDL
  – The EAST tools are themselves Representation Information which in due course will have to be fully defined – the closure of their Representation Nets will be the EAST standard
• Higher level definitions should include useful scientific objects and humanities objects

Layered Model from OAIS

Semantics
– Meaning/Relationships
  • Data Dictionaries
  • Thesauri
  • Ontologies
  • Semantic interoperability

Time Dependent Information
– Many, perhaps most, datasets change over time and the state at each particular moment in time may be important. It may be useful to break the issue into separate parts.
  • at each moment in time we could, in principle, take a snapshot and store it. That snapshot has its associated Representation Net.
  • efficient storage of a series of snapshots may lead one to store differences or include time tags in the data
    – Additional Representation Information would be needed which describes how to get to a particular time's snapshot from the efficiently encoded version.
– Also applies to ANNOTATION – who said what about which and when did they say it
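
A small sketch of the "store differences" option, assuming each snapshot is a flat dictionary and time tags are ISO date strings. The delta format is invented here; the rule for applying it is exactly the kind of additional Representation Information the slide refers to.

```python
def make_delta(old, new):
    """Describe how to turn snapshot `old` into snapshot `new`."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "remove": removed}


def apply_delta(snapshot, delta):
    """Return a new snapshot with one delta applied."""
    result = dict(snapshot)
    result.update(delta["set"])
    for key in delta["remove"]:
        result.pop(key, None)
    return result


def snapshot_at(base, timed_deltas, when):
    """Rebuild the state at time `when` from a base snapshot and tagged deltas."""
    state = dict(base)
    for time_tag, delta in sorted(timed_deltas, key=lambda item: item[0]):
        if time_tag > when:
            break
        state = apply_delta(state, delta)
    return state


# Three made-up states of one dataset record.
s0 = {"site": "A", "depth_m": 10}
s1 = {"site": "A", "depth_m": 12, "note": "recalibrated"}
s2 = {"site": "A", "depth_m": 12}

# Only the first snapshot and the differences are stored.
deltas = [
    ("2005-01-01", make_delta(s0, s1)),
    ("2006-01-01", make_delta(s1, s2)),
]

mid_2005 = snapshot_at(s0, deltas, "2005-06-30")
mid_2006 = snapshot_at(s0, deltas, "2006-06-30")
```

Without a preserved description of the delta format, a future user holding only `s0` and `deltas` could not recover the intermediate states.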

Actions and Processes (Behaviour)

• Some information has, as an integral part of its content, an implicit or explicit process associated with it
  – An example of this is a database or other time dependent or reactive system such as a Neural Net.
• Emulations
  – Limited – but may be adequate for rendered document-type data

Sharing RepInfo
• RepInfo is needed
• RepInfo is extensive
• May need to “extend” RepInfo as Designated Community and/or its knowledgebase changes
• How can we avoid every Repository repeating the work?
  – Need to control costs
• Need to share the effort

Requirements
• Data users - need to be able to obtain pre-identified RepInfo
• Curators: need to be able to find suitable pre-existing RepInfo to re-use
Or
• Create RepInfo

Registry for Representation Info

The Digital Object could have RepInfo packed with it, as well as CPID

Support automated access & processing

[Diagram: User, Archive and Rep. Info. Registry/Repository connected by a network; the Digital Object and the Representation Information each carry a CPID.]

• 1 – User gets data from archive. Data has associated Curation Persistent Identifier (CPID)
• 2 – User unfamiliar with data so requests Rep.Info. using CPID
• 3 – User receives Rep.Info – which has its own CPID in case it is not immediately usable
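
The three steps can be sketched with in-memory stand-ins for the archive and the registry. The CPID strings, record fields and function names are all hypothetical; this is not the interface of any real registry.

```python
# Hypothetical archive: each object carries its bits and a CPID.
ARCHIVE = {
    "spectrum-001": {"bits": b"\x00\x01", "cpid": "cpid:fits"},
}

# Hypothetical registry: each RepInfo entry has its own CPID (or None).
REGISTRY = {
    "cpid:fits": {"title": "FITS format description", "cpid": "cpid:pdf"},
    "cpid:pdf": {"title": "PDF specification", "cpid": "cpid:english-text"},
    "cpid:english-text": {"title": "Plain English text", "cpid": None},
}


def get_data(archive, object_id):
    """Step 1: the user gets the data and its associated CPID."""
    item = archive[object_id]
    return item["bits"], item["cpid"]


def collect_repinfo(registry, cpid, understood):
    """Steps 2 and 3: request RepInfo by CPID, following each entry's own
    CPID until the user reaches something already understood."""
    chain = []
    visited = set()
    while cpid is not None and cpid not in understood and cpid not in visited:
        visited.add(cpid)
        entry = registry[cpid]
        chain.append(entry["title"])
        cpid = entry["cpid"]
    return chain


bits, cpid = get_data(ARCHIVE, "spectrum-001")

# A user who can already read PDF only needs the FITS description.
for_pdf_reader = collect_repinfo(REGISTRY, cpid, {"cpid:pdf"})
# A user with no relevant knowledge follows the chain to its end.
for_novice = collect_repinfo(REGISTRY, cpid, set())
```

Because every step is a lookup by identifier, the flow can run without human intervention, which is what supports automated access and processing.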

Use of RepInfo

[Diagram: a data object's CPID points to a label held in an External Registry; the label lists Structure = CPID, Semantics = CPID and Rendering s/w = CPID. The diagram also carries the annotation “copy”.]

Each “bag of bits” has an associated pointer (CPID) to a Label

• DCC Label – points to other RepInfo

CASPAR – EU FP6
Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
• Closely follows DCC Development ideas
• Approx 16 M Euro – 8.8M from EU
• 17 Partners
• Led by CCLRC
  – Co-ordinator: David Giaretta

See http://www.casparpreserves.eu

CASPAR Consortium

See http://www.casparpreserves.eu

CASPAR information flow architecture

[Diagram not reproduced; the only surviving label is “Rep Info”.]

CASPAR Integrated architecture

See http://www.casparpreserves.eu

Possible Infrastructure Build-up

[Diagram labels: European Preservation Infrastructure; Task Force on Permanent Access Alliance; Other Alliance Members; CCLRC Curation Activities; CASPAR; Other CCLRC projects; FP7 projects.]

http://tfpa.kb.nl