Bratislava, Slovakia, 12 April 2011
Steve Knight, Programme Director, Preservation Research and Consultancy
National Library of New Zealand
University Library in Bratislava
Today
- Some introductory comments about the digital environment
- A bit about the National Digital Heritage Archive project at NLNZ
- Leveraging the NDHA for the Government Digital Archive and the public record
- A look at how we’re starting to model preservation risk management
- Some comments on the state of digital preservation
- Some comments on getting started in digital preservation
Some introductory remarks about the digital environment
“The problem with floppy disks and the loss of NASA’s records of the first moon landing are two of the most striking examples of what can happen when digital preservation is not taken care of properly.”
Interview Dutch Ministry of Education, Culture & Science (OCW), 8 Feb 2010
PARSE.Insight
June 2010
In 2007, the amount of digital information created in a year surpassed, for the first time, the amount of storage to deal with it.
Of course we don't need to store all the bits created - like digital TV signals, phone-call routing information, or old email spam.
But if we wanted to, we couldn't.
IDC, 2008
About the only growth rate that hasn’t gone negative since the recession began is the creation of new digital information.
People are still taking pictures, making phone calls, sending emails, blogging, and putting up videos on YouTube.
Enterprises are still capturing daily transaction records, adding to their data warehouses.
Governments are still requiring more information be kept and protected, forcing the migration to digital TV, and taking surveillance photos of their citizens.
IDC, 2008
High-Energy Physics’ two questions:
“What is the world made of?” “What holds it together?”
Unique, costly, non-reproducible data! Pushing energy and precision frontiers
Why should we preserve our data?
EO archives and datasets are invaluable:
– Analysing the state of the Earth, its environment and its variability over time requires a very large number of observations;
– It is impossible to go back in time and resample environmental data, therefore global and complete measurements need to be performed;
– The value of an environmental data set is impossible to estimate and it is impossible to foresee its potential future uses.
PARSE.Insight
June 2010
Q: When would you invest effort into preservation?
A: Whenever I have time to do so. This would not be a priority because there would be no recognition for the large effort involved.
PARSE.Insight
June 2010
A bit about the National Digital Heritage Archive project at the National Library of New Zealand
Technology as the enemy
(Diagram: the evolution of technology environments. An information object is shown on a timeline from 2004 to 2030 as technology moves from Windows XP / Word 2000 to Windows Vista / Word 2007 and on to unknown future versions of Windows and Word, with a marked ‘danger point’.)
What is the real digital preservation problem?
‘the problem of preserving digital information for the future is not only, or even primarily, a problem of fine tuning a narrow set of technical variables. It is not a clearly defined problem … rather, it is a grander problem of organizing ourselves over time and as a society to maneuver effectively in a digital landscape. It is a problem of building … the various systematic supports … that will enable us to tame the anxieties and move our cultural records naturally and confidently into the future.’
Garrett, J. & Waters, D. (eds). (1996). Preserving digital information …
What do we do this for?
“A National Library is a place where a nation nourishes its memory and exerts its imagination where it connects with its past and invents its future.”
Pierre Ryckmans. 1996. “Perplexities of an electronically illiterate old man.” Quadrant, September 1996, No 329.
Why should national libraries or everyone here care?
What we’ve been doing at NLNZ
(Diagram labels: Storage; Futures; Testing; Training)
Some milestones
There is still a long way to go
• Jul 2003: Preservation Metadata Schema and Logical Data Model (iteration 2)
• May 2004: NDHA Business Requirements Specifications
• Jul 2004: NDHA Programme established
• Sep 2004: Metadata Extraction Tool
• Sep 2005: Object Management System (OMS)
• Sep 2005: Interim Electronic Legal Deposit (IELD) online submission mechanism/processes
• Nov 2005: Sun Centre of Excellence for Digital Futures in Libraries announced
• Nov 2005: NDHA Functional Requirement Specifications
• Sep 2006: Web harvesting: Web Curator Tool (WCT)
• Mar 2007: NDHA / DigiTool Gap Analysis completed
• Oct 2008: Phase 1 Rosetta delivered
• Feb 2009: NDHA launched at NLNZ
• Nov 2009: Phase 2 Rosetta delivered
• 24 Mar 2010: Production Release of Rosetta v2.0
Collaboration
Partnership
The NDHA Programme will be successful and delivered in a timely and cost-effective manner
Design & Build
Sun Centre of Excellence
OAIS model
OAIS Reference Model
OAIS base map
Rosetta functionality v2.0
From producer management, workflow automation and delivery, to audit trails and reporting, the format registry, and preservation risk management, planning and action
http://ndha-wiki.natlib.govt.nz/ndha/pages/BackgroundInformation
• User management
• Producer management
• Deposit 1
• Deposit 2
• Validation stack
• Intellectual Entity (IE) data model
• Submission Information Package (SIP) submission
• SIP processing
• Deposit registration
• Technical analyst
• Workbench
• Consolidated appraisal workbench
• Rosetta transformers
• Deposit Application Programme Interface (API)
• Audit & provenance
• Process management
• User management API
• Permanent repository
• Format Registry
• Preservation planning
• Delivery
• Meditor
• Reports
• Back office configuration
Integration
• Deposit applications development
• Existing collection management systems integration
• Browser based content delivery systems development
• Existing resource discovery and delivery systems integration
• Reporting systems
• Common services integration
• Data migration
Integration work stream
It’s not all about the Digital Preservation System
Internal Submission Application
• Submission Information Package (SIP) Creation Tool (templates, hotkey support)
Packages up
• Files (supports complex digital objects)
• Metadata (Structure map creation – METS)
• Digital object structure – multiple representations
• Fixity generation (MD5)
• Links to descriptive record – CMS integration
• Links to producer records
• Submits SIP to the NDHA
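The fixity step in the packaging list can be illustrated with a minimal Python sketch. It assumes a SIP is simply a directory of files and produces one MD5 checksum per file; the function names `md5_of_file` and `sip_fixity_manifest` are hypothetical, and the sketch does not reproduce how the actual submission tool records checksums in the SIP metadata.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sip_fixity_manifest(sip_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the SIP root, '/'-separated)
    to its MD5 digest. Directories themselves are skipped."""
    return {
        path.relative_to(sip_dir).as_posix(): md5_of_file(path)
        for path in sorted(sip_dir.rglob("*"))
        if path.is_file()
    }
```

Recomputing the manifest after transfer and comparing it with the one generated at packaging time is the usual way such checksums are used to confirm that no file changed in transit.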
INDIGO
Forms …
• Romanic: indicum, Indicus
• Spanish: indico
• Portuguese: endego
• Dutch: indigo
NDHA: in dey go
Integration Points
(Diagram: integration points between customer systems and the digital preservation system. Labels include PDS authentication; staging, deposit, operational and permanent areas; the Deposit API; customer systems creating IEs; V.S. and enrichment plug-ins; migration tools; technical metadata extraction; CMS integrations; publishing via OAI and SRU/SRW; viewers; and the Access API.)
Technology Infrastructure
How ready is our infrastructure for digital preservation?
Maturity
Leveraging the NDHA for the Government Digital Archive and the public record
The NDHA and the GDA = NZDA?
The public record
Public Records Act 2005
“through the systematic creation and preservation of public archives and local authority archives, to enhance the accessibility of records that are relevant to the historical and cultural heritage of New Zealand and to New Zealanders' sense of their national identity.”
A government archiving point of view
GDA – leveraging the learnings from the NDHA
A shared vision of how government digital information should be preserved
NZ$12.6 million, 1 July 2010 to 30 June 2013
Control transfers of archival materials from government departments
Manage re-use of content by the creating department
Provide general access to government archives online
Manage preservation processes
Government Digital Archive (GDA)
A government archiving point of view
Conceptual – how is content defined across the two organisations and how should it be preserved
Practical – managing the system for NLNZ and ANZ specific requirements as they become clearer
Rosetta enhancements:
• enhanced bulk updating
• enhanced support for consortium management
• ITP in local libraries
• enhanced support for plug-ins
• enhanced functionality for deposit arrangements
• enhanced functionality for ingest workflows
• enhanced support for delivery rules
• enhanced functionality for logon and identity verification.
Clear benefit to the wider digital preservation community as we work to include increased support for archival practice in the Rosetta system.
Challenges 1
A government archiving point of view
Appraisal of very large transfers from government departments
Providing simple, secure access for departments to their own content
Updating primary collection management systems to incorporate digital preservation workflows
Migration of current corpus to Rosetta – approx 70TB
In a small country, continuing to extrapolate outwards for a national approach to digital preservation – primary research, data sets etc.
Digital preservation as central to a national knowledge infrastructure.
This is the challenge and the opportunity for Archives New Zealand and the National Library of New Zealand.
Challenges 2
It’s not just about the technology
A look at how we’re starting to model risk management
NLNZ definition of obsolescence
• We define format obsolescence in relation to the Library’s ability to render files within the repository.
• If we cannot view, render, or migrate formats then they are “at-risk”.
‘Risk is about the impending loss of the means of providing access’
Pearson & Webb, IJDC 1:3, 2008
Obsolescence at NLNZ
Risk is based on formats
• We accept all formats
• Risk assessment has to be:
- Automated (to a degree)
- Meaningful and obtainable
- Granular
- Cognizant of internal and external factors
- Able to be acted on… (bytestream)
Some attributes of a preservation risk assessment process
• A Local Format section (including a permanent format identifier or ‘signature’)
• An Application section (that records the Library’s available or tested tools)
• A Risk section (that documents known problems that can affect our ability to render digital objects)
• Works with PRONOM (from The National Archives, UK)
• Works with the UDFR initiative
A tripartite approach to risk assessment
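One way to picture how the three sections fit together is a small data model. This is a minimal sketch, not the Rosetta format registry: the class and field names are hypothetical, the format identifier is assumed to be a PRONOM-style string, and `is_at_risk` encodes one reading of the NLNZ definition of obsolescence, namely that a format is at-risk when no recorded application can view, render or migrate it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Risk:
    """Risk section entry: a known problem that can affect rendering."""
    format_id: str
    description: str
    characteristic: Optional[str] = None  # None means the whole format


@dataclass
class LocalFormat:
    """Local Format section entry, keyed by a permanent identifier."""
    format_id: str  # hypothetical PRONOM-style identifier
    name: str
    risks: list[Risk] = field(default_factory=list)


@dataclass
class Application:
    """Application section entry: a tool the Library has available or tested."""
    name: str
    can_view: set[str] = field(default_factory=set)     # format IDs it can view
    can_render: set[str] = field(default_factory=set)   # format IDs it can render
    can_migrate: set[str] = field(default_factory=set)  # format IDs it can migrate


def is_at_risk(fmt: LocalFormat, applications: list[Application]) -> bool:
    """True when no recorded application can view, render or migrate the format."""
    return not any(
        fmt.format_id in app.can_view
        or fmt.format_id in app.can_render
        or fmt.format_id in app.can_migrate
        for app in applications
    )
```

Keeping risks on the format record, separate from the application check, lets a format carry known problems (for example with a particular compression scheme) even while some tool can still render it.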
The Format Registry
Libraries will document:
• Formats that can be rendered;
• Specific versions of formats that can be rendered;
• The particular characteristics within these versions that are “problematic” (for example compression and colour encoding);
• Applications that can render variations of formats, versions and characteristics;
• The sustainability of applications and formats.
Ok, so what sort of stuff is going to be in these three libraries?
What else is in the Libraries?
The Application library also tells us about:
a. Contract dates with vendor
b. Tech services schedules
c. Controlling the application in the system
d. Vendor support dates
e. Review date if no other date in place.
Defining the timelines related to particular application based format risk criteria
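The dates in the Application library can be read as a timeline for application-based format risk. A minimal sketch, assuming the dates are stored as optional fields (the field names and both functions are hypothetical) and that the earliest recorded date is when the risk should be reassessed, with the review date used only when no other date is in place, as item e describes:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ApplicationDates:
    """Hypothetical date fields for one Application library entry."""
    contract_end: Optional[date] = None          # a. contract dates with vendor
    tech_services_review: Optional[date] = None  # b. tech services schedules
    vendor_support_end: Optional[date] = None    # d. vendor support dates
    review_date: Optional[date] = None           # e. used only if nothing else is set


def next_risk_date(dates: ApplicationDates) -> Optional[date]:
    """Earliest recorded date at which support for the application may lapse.
    Falls back to the review date only when no other date is recorded."""
    known = [
        d
        for d in (dates.contract_end, dates.tech_services_review, dates.vendor_support_end)
        if d is not None
    ]
    if known:
        return min(known)
    return dates.review_date


def is_due_for_review(dates: ApplicationDates, today: date) -> bool:
    """True when the next risk date has been reached or passed."""
    when = next_risk_date(dates)
    return when is not None and when <= today
```

Taking the earliest date is a deliberately conservative choice: the format risk is revisited as soon as any one of the support arrangements could end.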
Format library and risk grading
• A local library of formats connected to the global registry.
– Extend the global registry with local information.
– Extend the global registry with additional formats.
• Connected to the application library, characteristics and risk
• Each format can have one or more risks attached
• A risk can refer to a subset of the format
• Risks are updated by users and can be global or local
• Users can view other institutions’ risks and import them.
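The rules listed for the format library can be sketched as operations on a shared list of risks. This assumes a flat registry; the `FormatRisk` record, the "global"/"local" scope strings and both function names are hypothetical and are not the data model of Rosetta or UDFR.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FormatRisk:
    """One risk attached to a format in a hypothetical shared registry."""
    format_id: str                        # format the risk is attached to
    description: str
    institution: str                      # institution that recorded the risk
    scope: str = "local"                  # "global" or "local"
    characteristic: Optional[str] = None  # subset of the format; None = whole format


def risks_for(registry: list[FormatRisk], format_id: str,
              institution: str) -> list[FormatRisk]:
    """Risks that apply for one institution: every global risk on the
    format plus that institution's own local risks."""
    return [
        r for r in registry
        if r.format_id == format_id
        and (r.scope == "global" or r.institution == institution)
    ]


def import_risk(risk: FormatRisk, institution: str) -> FormatRisk:
    """Copy another institution's risk into this institution's local
    library; the original record is left unchanged."""
    return replace(risk, institution=institution, scope="local")
```

Importing by copy, rather than sharing one mutable record, keeps each institution free to edit its local version without altering the risk as another institution recorded it.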
Some more on the link to UDFR and other format based tools and services such as PRONOM and DROID
Some comments on the state of digital preservation
Where are we up to in digital preservation?
• Language
• The data deluge
• Products, tools and services
• Standards, quality assurance and confidence
What are some of the issues we face?
A Tower of Babel or a lingua franca?
• Repositories
• Data archiving
• Digital archiving
• Life cycle
• Digital curation
• Data curation
• Digital preservation
• Standards
• Certification/Audit
We need clarity and certainty about what we mean when we say digital preservation.
What do we mean when we talk about digital preservation?
Paper - 0.01%
Optical - 0.002%
Film - 7%
Magnetic - 92%
The data deluge
BBC: petabytes per week
CERN LHC: black holes (mini or otherwise)
How much data?
Content complexity
- Kam Woods: CDs
- Alex Ball: CAD (engineering)
- Mark Guttenbrunner: gaming
- Astronomy, oceanography
- Management of data sets
Work on digital preservation is really only just beginning.
Products, tools and services
• Few tools managing formats: JHOVE, DROID and MET
• None deal with formats in a satisfactory manner
• Limited formats, overlapping functionality
• Problems regarding accuracy are well documented
We need:
• a comprehensive management approach
• a strategy to identify the risk of format obsolescence
• a strategy to mitigate the risk of format obsolescence
• the ability to identify the specific files that are most at risk
• ready access to detailed, accurate information describing the file formats
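Identifying the specific files that are most at risk can be illustrated by joining a file inventory to per-format risk information. This is a minimal sketch: it assumes format identification has already been run (for example with a tool such as DROID) and reduced to a plain path-to-identifier mapping, and it ranks files by the number of recorded risks on their format. That ranking rule and both function names are hypothetical, not an NLNZ grading scheme.

```python
from collections import Counter
from typing import Mapping


def most_at_risk_files(inventory: Mapping[str, str],
                       risk_counts: Mapping[str, int],
                       limit: int = 10) -> list[tuple[str, str, int]]:
    """Rank files by the number of known risks on their identified format.

    inventory maps file path -> format identifier; risk_counts maps
    format identifier -> number of recorded risks. Files whose format has
    no recorded risk are left out. Ties are broken by path.
    """
    scored = [
        (path, fmt, risk_counts.get(fmt, 0))
        for path, fmt in inventory.items()
        if risk_counts.get(fmt, 0) > 0
    ]
    scored.sort(key=lambda item: (-item[2], item[0]))
    return scored[:limit]


def files_per_format(inventory: Mapping[str, str]) -> Counter:
    """How many files in the repository carry each format identifier."""
    return Counter(inventory.values())
```

The per-format count complements the ranking: a format with a single risk but many thousands of files may deserve attention before a riskier format held in only a handful of files.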
‘Managing format is fundamentally important’
Steve Abrams (iPres 2008)
Standards, quality assurance and confidence
• OAIS
• PREMIS
• NARA
• PLANETS
• NDIIPP
• CASPAR
• SHAMAN
• DURASPACE
• HathiTrust
• Requirements
• Certification & Audit
Is the OAIS model still relevant or do we hold to it too tightly instead of developing more granular standards for digital preservation?
Where’s the agreement as to what comprises digital preservation?
Economic sustainability
Sustainable economics for a digital planet
Blue Ribbon Task Force
February 2010
‘Digital preservation strategies face the following challenges’:
• uncertainty about selection criteria for assessing long-term value, especially with large-scale data sets, small ‘hand-crafted’ digital collections, and the emerging genres of collective authorship on the web
• misalignment of incentives between those who are in a position to preserve and those who benefit from preservation and access
• lack of clear responsibility for digital preservation, coupled with a prevailing assumption that it is someone else’s problem
• little coordination of preservation activities across diffused stakeholder communities
• difficulty in separating preservation costs from other costs, that is, in distinguishing between the process of making things available now and making things available in the future
• difficulty in valuing or monetizing the costs and benefits of digital preservation, which are necessary to secure funding and investment.
Some comments on getting started in digital preservation
Strategic drivers
What are the key drivers for your institution to look at a digital preservation programme?
It is important to have a clear discussion of the strategic drivers for digital preservation including:
• Does your organisation have a long-term preservation mandate?
• What is the nature of your digital collections?
• What is the extent/size of your digital collections now and in the future?
• What are your institutional policy requirements for digital preservation?
• What is the status of digital preservation within your institution?
• What is your available resourcing/staffing to implement/support digital preservation?
• What is your funding environment for digital preservation?
Business models
What are the options available to you in determining an appropriate business model for digital preservation?
Business models and therefore costs may vary from institution to institution and may significantly influence the nature of a digital preservation programme:
• Does your institution have a national/regional mandate?
• Is there potential for a consortial arrangement?
  - Collaboration
  - Shared infrastructure
  - Cost efficiency
  - Archives NZ and National Library of NZ
  - Goportis consortium in Germany
• Is there potential for revenue generation, e.g. for third-party hosting?
Build or buy?
• Commercial solution vs. building it yourself vs. project-based company
  - User community
  - Enhancements
  - Continuity
  - Open source 80% (JHOVE, DROID)
• Important to look at the required institutional outcome
• Repository solutions, digital archiving solutions and digital preservation systems are unlikely to be the same thing
Goal
Collection, preservation and access in perpetuity
The Solution
• Digital Preservation System (DPS)
  - generic software solution for the wider market
  - broad-ranging digital preservation solution for a range of community interests
• NDHA is the NLNZ implementation of DPS
  - wider functionality and business change are required for practical digital preservation within any given institution
Digital Preservation Solution (DPS) = Rosetta
It is important from NLNZ perspective that the solution is not NLNZ specific
What we think we’re doing
Components of digital preservation
(Diagram labels: Storage; Futures; Risk management; Planning; Migration; Emulation; Testing; Training; Provenance, Context, Authenticity, Integrity; Storage; Psychology; Processes/strategies)
It is not possible to do everything at once
Each of these is a substantial and necessary aspect of the overall digital preservation puzzle. However, NLNZ has concentrated primarily on issues related to provenance, context, authenticity and integrity. Phase 2 will develop and implement our thinking regarding risk management and preservation planning. Rather than attempting to implement a ‘big bang’ system, we have taken on components of the digital preservation continuum one at a time, which has allowed us to progress at a pace that suits our overall capability and capacity. We also need to be mindful that what we do now may not have any longevity in the context of a sustainable digital preservation programme, i.e. today’s solutions for digital preservation will undoubtedly not be tomorrow’s solutions.
Deployment and implementation
Deployment and implementation need to be undertaken with a view to available funding and staffing resources.
Critical project staffing includes:
• Project Manager (preferably high quality to manage overall implementation)
• Technical Lead (preferably internal resource with good knowledge of the infrastructure)
• Business Lead (preferably a champion from the business)
• Each of these should be supported by an appropriately resourced and sized team.
Deployment and implementation
Deployment and implementation also need to take into account the materials to be preserved.
Determining which materials to preserve in the first instance should begin with:
• A resource type where the parameters of the objects are well understood, e.g. the results of an internal digitisation programme where all the specifications have been set by the institution
• New resource types can then be added depending on need, learning complexity, internal capability/capacity etc.
Ongoing staffing
It is not yet clear what ongoing resourcing will be required for digital preservation.
However, it is likely to vary from institution to institution.
NLNZ has created a new NDHA business unit comprising:
• Manager NDHA
• Preservation Policy Analyst
• Preservation Technical Analyst
• Rosetta Configuration Analyst
• Preservation Ingest Analyst
• Preservation Requirements Analyst
• NDHA Developer
We do not yet know whether this will be sufficient staffing for a digital preservation programme.
Getting started
So, getting started is the key.
Given the newness of digital preservation as a discipline, a combination of the above approaches will allow an institution to implement at its own speed and according to the funding and human resources available. It also provides a time window in which to undertake the strategic and policy planning needed to support the funding and resourcing of a sustainable digital preservation programme.
Some key challenges ahead
• Agreed lexicon describing what we mean by digital preservation and what we want from digital preservation systems
• Capability/capacity to respond to technological change and innovation
• Citizen-created content impacting on our collection, description and preservation processes
• Content (ie digital preservation) systems are our core operational systems, not the catalogue
• Defining, resourcing and pursuing the research agenda (understanding the web, science data sets etc)
• Quality assurance of products and tools
• Professional services market (commercial or otherwise)
• Digital preservation as a component of a national knowledge infrastructure
• A coordinated national/international approach to supporting digital preservation research, products and services
A conclusion
Digital preservation is at the heart of our business within 10 years
The most important reasons for preservation are the ones we do not see now
PARSE.Insight
June 2010
Steve Knight
Thank you