Where are we with Digital Preservation?
Andrew Waugh, Public Record Office Victoria


Post on 11-Jan-2016




Where are we?

• “It is not the end. It may not even be the beginning of the end. But it is undoubtedly the end of the beginning.”
  – Winston Churchill
• This talk will cover:
  – Consensus views on digital preservation
  – Open questions and future challenges

What this presentation will cover

• Understanding (building systems)
• Storage (preserving the bit strings)
• Access (preserving the meaning)
• Metadata (preserving the context & authenticity)
• Transfer (overcoming system senescence)

Understanding

• Communication requires shared terminology and concepts
• Open Archival Information System (OAIS) reference model (ISO 14721:2003)
  – http://public.ccsds.org/publications/archive/650x0b1.pdf
  – High-level terminology is very widely used, but few use the detail in the model
  – Does not cover preservation
  – Pre-web, and the detail does not reflect actual implementations
  – Currently under review

Trusted digital repositories

• How can you be sure that an organisation (and its system) is capable of holding your digital objects?
• Trustworthy Repositories Audit and Certification (TRAC)
  – CRL/NARA (2007)
  – http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91
  – Administrative rather than technical focus
  – High level (cannot be tested)
  – Based on OAIS; the basis for audit checklists

Audit checklists

• Provide tests to see whether a repository can be trusted
  – DRAMBORA: DCC/DPE (2007)
  – Risk-based self-certification
  – http://www.repositoryaudit.eu/
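A small risk register can illustrate the risk-based approach. The sketch below is not the DRAMBORA toolkit or its scoring scheme. It assumes hypothetical 1–5 scales for likelihood and impact, scores each risk as likelihood times impact (a common risk-matrix convention), and ranks the results. The `Risk` class, `rank_risks` and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical self-assessment risk register."""
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (catastrophic)

    @property
    def severity(self) -> int:
        # Severity as likelihood times impact, a common risk-matrix convention.
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from most to least severe."""
    return sorted(risks, key=lambda r: r.severity, reverse=True)


# Illustrative entries only; the scores are invented for the example.
register = [
    Risk("Loss of key technical staff", likelihood=3, impact=4),
    Risk("Undetected media corruption", likelihood=2, impact=5),
    Risk("Loss of funding", likelihood=2, impact=3),
]

if __name__ == "__main__":
    for risk in rank_risks(register):
        print(f"{risk.severity:>2}  {risk.name}")
```

Ranking by severity shows a self-assessing repository where to spend effort first. The scales and the product rule are the design choices most worth revisiting for a real audit.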

Open-source digital repositories

• Open-source digital repository software
  – DSpace (http://www.dspace.org/)
  – Fedora (http://www.fedora-commons.org/)
• Both came out of the academic community and primarily support institutional repositories

Storage – preserving the bit string

• The fundamental task of digital preservation is ensuring that the bits making up the digital objects are preserved
• A “solved” problem: large-scale data repositories have existed for decades, and there is lots of operational experience
• Archival twist: actively monitor the health of stored objects using hashes
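The archival twist is usually called fixity checking. You record a cryptographic hash when an object is stored, then periodically re-hash the object and compare the two values. Below is a minimal Python sketch. It assumes the objects are plain files and that SHA-256 is the chosen algorithm. `sha256_of` and `audit` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(recorded: dict[Path, str]) -> list[Path]:
    """Re-hash each stored object; return those that are missing or whose
    hash no longer matches the value recorded at ingest."""
    damaged = []
    for path, expected in recorded.items():
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(path)
    return damaged
```

At ingest, `{path: sha256_of(path)}` would be stored alongside the object. A scheduled job then calls `audit` and raises an alert, or triggers repair from another copy, for anything it returns.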

Storage – future challenges

• Reducing storage cost (and the chance of error)
  – In 2005 the Swedish National Archives estimated costs of between 4 and 8 euro per digitised page, mostly in system and support costs
  – http://www.tape-online.net/docs/Palm_Black_Hole.pdf
• Reducing risks
  – Administrator risk vs packaged risk
• Ideal storage system
  – Packaged (e.g. with built-in administration, such as EMC Centera)
  – Open, so that you can trust it and replace components
• CLOCKSS
  – Uses redundant copies at participating institutions to ensure preservation (LOCKSS)
  – http://www.clockss.org/clockss/Home
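The idea behind LOCKSS ("Lots Of Copies Keep Stuff Safe") is that independent copies can vote on what the correct content is and repair the odd one out. The sketch below shows only that core intuition, using a simple majority vote over SHA-256 hashes. The real LOCKSS protocol uses sampled polls between peers and defences against malicious peers, and none of that is modelled here. `repair_by_majority` and the site names are hypothetical.

```python
import hashlib
from collections import Counter


def repair_by_majority(replicas: dict[str, bytes]) -> list[str]:
    """Compare replicas by hash and overwrite any copy that disagrees with
    the strict majority. Returns the names of the replicas that were repaired."""
    if not replicas:
        return []
    hashes = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    winner, votes = Counter(hashes.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # No strict majority: automatic repair would just be guessing.
        raise RuntimeError("no majority among replicas; manual investigation needed")
    good_copy = next(replicas[site] for site, h in hashes.items() if h == winner)
    repaired = [site for site, h in hashes.items() if h != winner]
    for site in repaired:
        replicas[site] = good_copy
    return repaired


if __name__ == "__main__":
    copies = {"site-a": b"record v1", "site-b": b"record v1", "site-c": b"record v?"}
    print(repair_by_majority(copies))  # → ['site-c']
```

Requiring a strict majority is the key design choice. With only two copies that disagree, the sketch refuses to guess, which is one reason redundancy schemes favour several independent copies.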

Access – preserving the meaning

• What do you do when you no longer have an application to open the data files?
• The current approach is either:
  – Do nothing now, with eventual migration
  – Normalisation upon accession
• A future approach might be emulation

Migration

• Save what you capture now and convert to new formats as required
  – Web harvesting (studies show web sites mostly use safe formats: HTML, XML, JPEG, GIF, etc.)
  – Formats (and software) are proving surprisingly resilient

Normalisation

• Convert upon accession to a small number of long-term preservation formats (LTPFs)
  – E.g. PDF/A (PROV), ODF (NAA)
  – Immediate cost upon accession, but expected lower long-term management cost
  – Criteria for a good LTPF (Library of Congress): http://www.digitalpreservation.gov/formats/intro/intro.shtml
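Normalisation upon accession amounts to applying a policy table to each incoming file. Below is a minimal sketch. It assumes a hypothetical policy that maps a few source formats to the preservation formats named above, and it only plans actions without converting anything. Using file extensions as a proxy for format is a deliberate simplification. In practice the format would first be identified from the file's content, for example with DROID.

```python
from pathlib import PurePath

# Hypothetical accession policy: source extension -> target preservation format.
# The targets echo the slide's examples (PDF/A, ODF); the mapping is illustrative.
POLICY = {
    ".doc": "PDF/A",
    ".docx": "PDF/A",
    ".rtf": "PDF/A",
    ".xls": "ODF spreadsheet",
    ".xlsx": "ODF spreadsheet",
}
# Formats this hypothetical policy already accepts as long-term preservation formats.
KEEP_AS_IS = {".xml", ".txt", ".csv"}


def plan_normalisation(filenames: list[str]) -> dict[str, str]:
    """Decide, per file, whether to keep it, convert it, or hold it for review."""
    actions = {}
    for name in filenames:
        ext = PurePath(name).suffix.lower()  # extension as a crude proxy for format
        if ext in KEEP_AS_IS:
            actions[name] = "keep"
        elif ext in POLICY:
            actions[name] = f"convert to {POLICY[ext]}"
        else:
            actions[name] = "hold for manual review"
    return actions
```

Keeping the policy as data rather than code lets an archive review and change its set of preservation formats without touching the workflow. Anything the policy does not cover falls through to manual review instead of being silently converted.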

Challenges

• What is it? Tools to determine file formats:
  – PRONOM (repository of format descriptions) and DROID (format classifier): http://www.nationalarchives.gov.uk/pronom/
  – JHOVE (Harvard): classifier and simple validation, http://hul.harvard.edu/jhove/
• How accurate is the conversion?
• Is it a valid file according to the standard?
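Format identification tools such as DROID work largely by matching byte signatures ("magic numbers") against a registry such as PRONOM. The sketch below illustrates the idea with a handful of hand-picked leading signatures. It does not reproduce DROID's algorithm or PRONOM's data, which handle offsets, versions and far more formats. `identify` and `SIGNATURES` are hypothetical names.

```python
# A tiny, hand-picked set of leading byte signatures ("magic numbers").
# Real registries such as PRONOM hold far richer signatures than this.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container (e.g. ODF, OOXML)"),
]


def identify(data: bytes) -> str:
    """Return a format name based on the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))  # → PDF
    print(identify(b"plain text"))  # → unknown
```

Identification is not validation. A file that starts with `%PDF-` may still be truncated or malformed, and the signature says nothing about PDF/A conformance. Checking validity is the job of validation tools such as JHOVE.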

Metadata is better data

• Metadata is information about the bit string:
  – What it is (semantic)
  – What it is (technical)
  – How it relates to other digital objects
  – What is its history?
  – How is it to be managed?
• Unfortunately, there are lots and lots of large metadata standards
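The five questions above map naturally onto separate groups of fields. The Python sketch below shows one hypothetical way to hold them together for a single object. The class name, field names and example values are illustrative and are not taken from any of the standards discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    """Hypothetical container grouping metadata about one digital object."""
    object_id: str
    descriptive: dict[str, str] = field(default_factory=dict)  # what it is (semantic)
    technical: dict[str, str] = field(default_factory=dict)    # what it is (technical)
    relationships: list[tuple[str, str]] = field(default_factory=list)  # (relation, other object id)
    history: list[str] = field(default_factory=list)           # events, oldest first
    management: dict[str, str] = field(default_factory=dict)   # e.g. retention, access conditions

    def record_event(self, description: str) -> None:
        """Append to the object's history; earlier entries are never rewritten."""
        self.history.append(description)


meta = ObjectMetadata(
    "obj-0001",
    descriptive={"title": "Minutes of board meeting"},
    technical={"format": "PDF/A"},
)
meta.relationships.append(("is_part_of", "series-0042"))
meta.record_event("ingested")
meta.record_event("fixity check passed")
```

Even this toy structure hints at the problem of large standards. Each group can grow without limit, which is how standards end up with very many elements.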

Metadata standards

• For an excellent summary of metadata standards, see the Metadata chapter in the DCC Digital Curation Manual
  – http://www.dcc.ac.uk/resource/curation-manual/chapters/metadata/metadata.pdf

Digital preservation metadata

• Data Dictionary for Preservation Metadata (PREMIS)
  – Little descriptive information and nothing format-specific
  – http://www.loc.gov/standards/premis/
• ISO 23081 (Metadata for records)
• National Archives of Australia Recordkeeping Metadata Standard
  – http://www.naa.gov.au/Images/rkms_pt1_2_tcm2-1036.pdf
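PREMIS records an object's history as events linked to the objects they affect. The sketch below uses Python's standard `xml.etree.ElementTree` to build an event record whose element names follow the PREMIS event entity: identifier, type, date/time, outcome and linked object. It is simplified. It omits the PREMIS namespace, agents and most optional elements, and it has not been validated against the PREMIS schema. The function name and identifier values are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def premis_style_event(event_id: str, event_type: str, object_id: str,
                       outcome: str, when: datetime) -> str:
    """Serialise a simplified, namespace-free event record whose element
    names follow the PREMIS event entity. Illustrative only."""
    event = ET.Element("event")

    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "local"
    ET.SubElement(ident, "eventIdentifierValue").text = event_id

    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when.isoformat()

    outcome_info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(outcome_info, "eventOutcome").text = outcome

    # Link the event back to the object it happened to.
    link = ET.SubElement(event, "linkingObjectIdentifier")
    ET.SubElement(link, "linkingObjectIdentifierType").text = "local"
    ET.SubElement(link, "linkingObjectIdentifierValue").text = object_id

    return ET.tostring(event, encoding="unicode")


if __name__ == "__main__":
    print(premis_style_event("evt-0001", "fixity check", "obj-0001", "pass",
                             datetime.now(timezone.utc)))
```

Recording each fixity check, migration or transfer as a separate event, rather than overwriting a status field, is what lets the history question on the earlier metadata slide be answered later.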

Future challenges

• Too many competing standards
  – Which do I implement?
• Too many elements
  – Increases the cost of standard development and software implementation
• Few elements are ever used
  – Metadata is too expensive and too hard to capture

Transfer – overcoming system senescence

• Digital objects have a much longer life than the systems that hold them
  – Move objects to digital repositories where they can be properly managed
  – Move them from one digital repository to its replacement
• Storage is so cheap that holders may be tempted to keep digital objects (until it is too late)
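When objects move to a repository or to its replacement, the receiver needs to check that what arrived is what was sent. One widely used convention for this is BagIt, a packaging format with a payload directory and checksum manifests. The sketch below borrows only two BagIt ideas: a `data/` payload and a `manifest-sha256.txt` file. It omits the `bagit.txt` declaration and tag files, so it does not produce a conformant bag. `write_manifest` and `verify_manifest` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def _sha256(path: Path) -> str:
    # Reads the whole file at once; fine for a sketch, chunk it for large objects.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(root: Path) -> Path:
    """Sender side: list '<sha256>  <relative path>' for every file under root/data."""
    payload = sorted(p for p in (root / "data").rglob("*") if p.is_file())
    lines = [f"{_sha256(p)}  {p.relative_to(root).as_posix()}\n" for p in payload]
    manifest = root / "manifest-sha256.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest


def verify_manifest(root: Path) -> list[str]:
    """Receiver side: return manifest entries that are missing or altered.
    (Files added in transit are not detected by this simple check.)"""
    problems = []
    manifest = (root / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split("  ", 1)
        target = root / rel_path
        if not target.is_file() or _sha256(target) != expected:
            problems.append(rel_path)
    return problems
```

The manifest travels with the objects, so the receiver can verify the transfer without the sending system still existing. That matters most when the sending system is being retired.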

Future challenges

• Current systems are not designed around the assumption that digital objects must be relocated
  – Clay Shirky, “AIHT: Conceptual Issues from Practical Tests”, D-Lib Magazine, Vol. 11 No. 12, December 2005, http://www.dlib.org/dlib/december05/shirky/12shirky.html
• ADRI-UN/CEFACT work on a standard to transfer custody of digital records

More information

• If I have whetted your appetite...
  – PADI annotated bibliography of digital preservation (http://www.nla.gov.au/padi/)
  – D-Lib Magazine (http://www.dlib.org/)

Final thoughts

• We know about compasses, and we have some charts, but there are a lot of rocks out there… We are a long way from satellite navigation

• What about small/medium archives… personal archives?

• Are photographs better kept as digital files or as negatives?
  – http://www.wilhelm-research.com/