The Digital Object Management Programme (DOM)
Richard Masters, Programme Manager
PRESERV Partners Meeting 18th November 2005
DOM Programme Mission and Vision
Our mission is to enable the United Kingdom to preserve and use its digital output forever.
Our vision is to create a management system for digital objects that will:
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel
DOM Programme Scope
Providing a generic and cost-effective infrastructure for the Library’s digital material that will:
• take in material of many types
• take in material coming from many sources
• store it all securely for the long term
• allow controlled access
• endure
We already have a wide range of materials to deal with:
• Existing voluntary deposit scheme, operational since 2000 (1.5 TB)
• Digitised versions of BL material, from early ’90s onwards (15 to 20 TB)
• Electronic journals (1 TB)
• New digitisation initiatives: newspapers, sound, etc.
• Sound Archive material (150 TB, growing at 30 TB per year)
• Web archiving, cartographic data, picture library, purchased and donated digital materials
We must be prepared for:
• Legal deposit legislation for non-print material: Royal Assent was given in October 2003, but the law needs secondary legislation to bring it into force. The first materials will probably be hand-held (DVDs, CD-ROMs).
Our storage planning figure is 300 TB after 5 years.
DOM System – key features
Scalability
• 100s of TBs, millions of objects, millions of users
Resilience
• Conventional DR is not adequate
• Duty of care means we have to have multiple sites
Integrity and authenticity
• Identify and repair damaged objects
• A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
Rights management
• Current rights agreements and licences are complex legal documents
• Separate policy and enforcement
Representation model
• Need to deal with complex structured objects, e.g. digitised newspaper, OCR text, articles
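To make the representation model item concrete, here is a minimal sketch of how a digitised newspaper might be modelled as one complex object with page images, OCR text, and articles. All class names, field names, and file names are assumptions made for illustration; the slides do not describe the actual DOM data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """One scanned page: the image file plus its OCR text (hypothetical fields)."""
    number: int
    image_file: str
    ocr_text: str = ""


@dataclass
class Article:
    """A logical unit within an issue, spanning one or more pages."""
    title: str
    page_numbers: List[int]


@dataclass
class NewspaperIssue:
    """A complex object: pages give the physical structure, articles the logical one."""
    identifier: str
    pages: List[Page] = field(default_factory=list)
    articles: List[Article] = field(default_factory=list)

    def text_of(self, article: Article) -> str:
        """Join the OCR text of every page the article appears on."""
        by_number = {page.number: page for page in self.pages}
        return "\n".join(by_number[n].ocr_text for n in article.page_numbers)


# Hypothetical example data, not taken from any real collection.
issue = NewspaperIssue(
    identifier="example-issue-001",
    pages=[
        Page(1, "page-0001.tif", "Front page text"),
        Page(2, "page-0002.tif", "Continued text"),
    ],
    articles=[Article("Lead story", [1, 2])],
)
print(issue.text_of(issue.articles[0]))
```

The point of the sketch is that one stored object carries several related representations (image, text, logical structure) that have to be preserved and presented together.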
DOM architecture – 2 key concepts
Heterogeneous Storage
• Storage is supplied by several vendors
• Storage is independent of all vendors
• ‘Commodity’ storage
• Avoid paying for unneeded features of high performance and high resilience
Multiple Sites
• Same design implemented on several sites
• But may be different equipment
• 2 sites at first, aim for 4
• Dark Archive
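One way to picture these two concepts together is a small vendor-neutral storage interface with one store per site behind it. The sketch below is an illustration under stated assumptions: `ObjectStore`, `InMemoryStore`, `MultiSiteRepository`, and the site names are all hypothetical and are not taken from the DOM design.

```python
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Minimal vendor-neutral interface; any storage product could sit behind it."""

    def put(self, object_id: str, data: bytes) -> None: ...
    def get(self, object_id: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one site's commodity storage (a dict instead of real hardware)."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.objects: Dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self.objects[object_id]  # raises KeyError if the object is absent


class MultiSiteRepository:
    """Same design at every site: each ingest is written to all of them."""

    def __init__(self, sites: List[ObjectStore]) -> None:
        self.sites = sites

    def ingest(self, object_id: str, data: bytes) -> None:
        for store in self.sites:
            store.put(object_id, data)

    def retrieve(self, object_id: str) -> bytes:
        # Try each site in turn, so losing one copy does not lose the object.
        for store in self.sites:
            try:
                return store.get(object_id)
            except KeyError:
                continue
        raise KeyError(object_id)


site_a = InMemoryStore("site-a")
site_b = InMemoryStore("site-b")
repo = MultiSiteRepository([site_a, site_b])
repo.ingest("obj-1", b"example content")

del site_a.objects["obj-1"]      # simulate loss of the copy at one site
print(repo.retrieve("obj-1"))    # still served from the other site
```

Because the repository only depends on the `put`/`get` interface, each site could use different equipment without changing the rest of the system.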
DOM architecture – 2 more key concepts
Integrity
• System can monitor the object store continuously to detect object corruption
• It would then initiate object recovery
Authenticity
• Long-term assurance that an object when presented is the same as when it was ingested
• Based on the use of cryptographic signing techniques
• Each object is ‘signed’ when it is ingested
• The signature is verified when required
• The signing mechanism is ‘tightly’ controlled
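The sign-on-ingest, verify-when-required and detect-then-recover steps above can be sketched in a few lines. This is a simplified illustration, not the DOM mechanism: it uses an HMAC (a keyed hash from the Python standard library) as a stand-in for the cryptographic signing the slides mention, and the key, the `sign`/`verify`/`scrub` functions, and the site names are all assumptions.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; in practice access to it would be tightly controlled.
SIGNING_KEY = b"demo-key-held-by-the-signing-service"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Produce a keyed SHA-256 digest of the object at ingest time."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the object still matches the signature recorded at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


def scrub(replicas: Dict[str, Dict[str, bytes]],
          signatures: Dict[str, str]) -> List[Tuple[str, str]]:
    """Check every copy at every site; overwrite bad copies with a verified one."""
    repaired = []
    for object_id, signature in signatures.items():
        good = None
        for store in replicas.values():
            data = store.get(object_id)
            if data is not None and verify(data, signature):
                good = data
                break
        if good is None:
            continue  # no intact copy anywhere; would need manual attention
        for site, store in replicas.items():
            data = store.get(object_id)
            if data is None or not verify(data, signature):
                store[object_id] = good
                repaired.append((site, object_id))
    return repaired


original = b"digitised page image bytes"
signatures = {"obj-1": sign(original)}           # recorded at ingest
replicas = {
    "site-a": {"obj-1": original},
    "site-b": {"obj-1": b"corrupted bytes"},     # simulated corruption
}
repaired = scrub(replicas, signatures)
print(repaired)
```

A real deployment would more likely use public-key signatures, so that verification does not require access to the signing key; the HMAC here only keeps the example self-contained.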
DOM Component Architecture
[Diagram: components and their listed functions]
Content Providers
Accession/Ingest
• Format Validation
• Format Conversion
• Request/Rerequest
• Metadata Validation/Creation
Repository
• Storage
• Digital Preservation
• Continuous Validation
• Performance Management
• Metadata
DRM
Resource Discovery/User Interface
• Combined Resource Discovery with other collections
Digital Preservation
Metadata
Management Information / Technical Operations
Researchers
Published papers
“The large-scale archival storage of digital objects”, February 2005
The 4th in the series of Digital Preservation Coalition Technology Watch reports, available at: http://www.dpconline.org/graphics/reports/
Adam Farquhar et al., “Design for the Long Term: Authenticity and Object Representation”
Presented at the Archiving 2005 conference, April 2005
http://www.bl.uk/about/policies/dom/pdf/archiving2005l.pdf
Sean Martin, with Mary Baker and Kim Keeton of HP Labs, “Why Traditional Storage Systems Don’t Help Us Save Stuff Forever”
Presented at the 1st IEEE Workshop on Hot Topics in System Dependability, 30th June 2005, Yokohama, Japan
http://www.stanford.edu/~candea/hotdep/papers/baker_forever.pdf
UK Web Archiving Consortium
• Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA
• Sub-licences with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract:
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support
International Internet Preservation Consortium
Developing advanced web archiving technologies
Smart Crawler
• Continuous adaptive crawler, adjusting crawl priority on the fly
• Based on IA Heritrix
• Working on requirements now
• Expect to begin tender process in June
Content Management
Archival formats
Framework
Metrics and Test Bed