Harvesting and DAMS Glen Robson, DAMS Manager, National Library of Wales


Posted on 30-Dec-2015


Page 1: Harvesting and DAMS

Harvesting and DAMS
Glen Robson, DAMS Manager, National Library of Wales

Page 2: Harvesting and DAMS

What do we do when it gets here?
• Normalise metadata
• Migrate?
• Storage
• Access

Page 3: Harvesting and DAMS

Normalise Metadata

• Consistency
  ▫ Convert to NLW standards (METS)
  ▫ Consistent METS between projects
• Add technical metadata
  ▫ Link file format to the PRONOM registry
  ▫ Automatic technical metadata: JHOVE or the NZ (National Library of New Zealand) Metadata Extraction Tool
• Add preservation metadata (PREMIS)
  ▫ Object's history
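As a rough illustration of the technical and preservation metadata described above, the sketch below builds a minimal PREMIS 3 `object` element recording a SHA-256 fixity value, the file size and a PRONOM format key. It assumes Python's standard `xml.etree.ElementTree`; the identifier scheme and the PUID passed in are hypothetical inputs (in practice the PUID would come from a format-identification tool such as DROID), and the output has not been validated against the PREMIS schema.

```python
import hashlib
import xml.etree.ElementTree as ET

# PREMIS 3 and XML Schema instance namespaces.
PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def q(tag: str) -> str:
    """Qualify a local tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_object(data: bytes, identifier: str, puid: str) -> ET.Element:
    """Build a minimal PREMIS <object> for one file's bytes.

    `identifier` is a hypothetical local ID; `puid` is the PRONOM format
    key, assumed to come from a format-identification step run earlier.
    """
    obj = ET.Element(q("object"), {f"{{{XSI_NS}}}type": "premis:file"})

    ident = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(ident, q("objectIdentifierType")).text = "local"
    ET.SubElement(ident, q("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, q("objectCharacteristics"))
    # Fixity: a checksum recorded at ingest, for later integrity checks.
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, q("size")).text = str(len(data))

    # Link the file's format to its PRONOM registry entry.
    fmt = ET.SubElement(chars, q("format"))
    registry = ET.SubElement(fmt, q("formatRegistry"))
    ET.SubElement(registry, q("formatRegistryName")).text = "PRONOM"
    ET.SubElement(registry, q("formatRegistryKey")).text = puid
    return obj
```

Serialising the result with `ET.tostring(obj, encoding="unicode")` gives an XML fragment that could be embedded in a METS administrative metadata section.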

Page 4: Harvesting and DAMS

Harvesting

• Take a copy of the metadata and the thesis
• Different formats
  ▫ PDF, Word and text
• Complex objects
  ▫ E.g. one PDF per chapter
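A complex object such as a thesis harvested as one PDF per chapter can be described with a METS `fileSec` and a logical `structMap`. This is a minimal sketch, assuming the chapter files arrive as an ordered list of URLs; the `thesis_mets` function, the `FILE001`-style IDs and the `TYPE` values are illustrative choices, not an NLW profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a local tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def thesis_mets(title: str, chapter_urls: list[str]) -> ET.Element:
    """Describe one thesis delivered as one PDF per chapter (hypothetical profile)."""
    mets = ET.Element(m("mets"), {"LABEL": title})

    # fileSec: one <file> per chapter PDF, pointing at its location.
    file_sec = ET.SubElement(mets, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": "original"})

    # structMap: the thesis as a whole, with chapters in reading order.
    struct_map = ET.SubElement(mets, m("structMap"), {"TYPE": "logical"})
    thesis_div = ET.SubElement(struct_map, m("div"), {"TYPE": "thesis", "LABEL": title})

    for n, url in enumerate(chapter_urls, start=1):
        file_id = f"FILE{n:03d}"
        file_el = ET.SubElement(file_grp, m("file"), {"ID": file_id, "MIMETYPE": "application/pdf"})
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        chapter = ET.SubElement(thesis_div, m("div"), {"TYPE": "chapter", "ORDER": str(n)})
        ET.SubElement(chapter, m("fptr"), {"FILEID": file_id})
    return mets
```

Keeping the chapter order in the `structMap` rather than in file names means the thesis can be presented as a single object even though it was deposited as several files.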

Page 5: Harvesting and DAMS

Migration

• Input:
  ▫ 221 application/msword
  ▫ 4 application/octet-stream
  ▫ 114 application/pdf
  ▫ 3 application/vnd.ms-excel
  ▫ 340 text/plain
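A breakdown like the one above can be produced by tallying MIME types across a harvested batch. The sketch below is an assumption about method, not necessarily how these figures were produced: it uses Python's `mimetypes.guess_type`, which only looks at the file extension, and counts anything it cannot map as `application/octet-stream`.

```python
import mimetypes
from collections import Counter


def mime_profile(filenames: list[str]) -> Counter:
    """Tally guessed MIME types for a harvested batch.

    mimetypes guesses from the file extension only; anything it cannot
    map is counted as application/octet-stream.
    """
    counts: Counter = Counter()
    for name in filenames:
        mime, _encoding = mimetypes.guess_type(name)
        counts[mime or "application/octet-stream"] += 1
    return counts
```

Because extension-based guessing never inspects content, a mislabelled or extensionless file is reported wrongly or as unknown; signature-based identification against PRONOM is more reliable.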

Page 6: Harvesting and DAMS

Now or later?

• Migrate on ingest
  ▫ How do you choose the format?
  ▫ Storage cost

• Migrate on obsolescence
  ▫ Tools available?

Page 7: Harvesting and DAMS

Migration

• Microsoft Word
  ▫ Can open it now
  ▫ Have to have a copy of Word

• application/octet-stream
  ▫ Can’t open it now
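For files reported as `application/octet-stream`, one common first step (assumed here, not described in the talk) is to inspect the leading bytes for a known signature. The sketch checks a handful of well-known magic numbers; the `SIGNATURES` table and `sniff` function are hypothetical and far smaller than the PRONOM signature set used by tools such as DROID.

```python
# Leading-byte ("magic number") signatures for a few formats in the harvest.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    # OLE2 compound file: the container for legacy .doc and .xls, so this
    # alone cannot tell a Word document from an Excel workbook.
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file"),
    # ZIP container: used by, for example, OOXML and OpenDocument files.
    (b"PK\x03\x04", "ZIP container"),
    (b"{\\rtf", "RTF"),
]


def sniff(data: bytes) -> str:
    """Return a coarse format label from a file's first bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

To use it on a file, read only the first few bytes: `with open(path, "rb") as f: label = sniff(f.read(16))`.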

Page 8: Harvesting and DAMS

Storage

• LOCKSS
• University copy
• NLW copy
  ▫ Archive copy on tape
  ▫ Archive copy on optical disc
  ▫ Archive copy offsite
  ▫ Access copy
• EThOS copy
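Keeping several copies (tape, optical disc, offsite) is only useful if each can be shown to match the original. A minimal fixity audit, assuming a SHA-256 checksum was recorded at ingest and each copy can be opened as a binary stream, might look like this; the function names are hypothetical.

```python
import hashlib
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def audit_copies(reference: str, copies: dict[str, BinaryIO]) -> dict[str, bool]:
    """Map each named copy (tape, optical, offsite...) to True if it matches `reference`."""
    return {name: sha256_stream(stream) == reference for name, stream in copies.items()}
```

An access copy in a different format will not match the original's checksum, so in this sketch it would need its own recorded value.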

Page 9: Harvesting and DAMS

Access

• Convert to MARC
  ▫ Digital and print in MARC
  ▫ Single point of access for all collections

• Mostly automated
  ▫ Best use of resources
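Converting harvested records to MARC can be largely automated with a field crosswalk. The mapping below is a deliberately simplified, hypothetical one that assumes simple Dublin Core-style field names in the harvested records and maps them to MARC tags 100, 245, 260, 650 and 856, rendered as plain text lines rather than a real MARC record. A production crosswalk, such as the Library of Congress Dublin Core to MARC mapping, also handles indicators, added entries and repeatable fields.

```python
# Simplified, illustrative crosswalk: harvested field name -> (MARC tag, subfield code).
CROSSWALK = {
    "creator": ("100", "a"),
    "title": ("245", "a"),
    "date": ("260", "c"),
    "subject": ("650", "a"),
    "identifier": ("856", "u"),
}


def to_marc_lines(record: dict[str, list[str]]) -> list[str]:
    """Render a harvested record as 'TAG $xVALUE' lines, ordered by tag."""
    lines = []
    for field, values in record.items():
        if field not in CROSSWALK:
            continue  # unmapped fields are left for a cataloguer to review
        tag, code = CROSSWALK[field]
        for value in values:
            lines.append(f"{tag} ${code}{value}")
    # Stable sort on the tag only, so repeated fields keep their original order.
    return sorted(lines, key=lambda line: line[:3])
```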

Page 10: Harvesting and DAMS

Lessons Learnt and Problems Encountered
• Started using Fedora in 2004
  ▫ Ingested 3 digitisation projects and 2 mass digitisation projects
  ▫ Ingesting video and radio programmes
• Started with a pilot
• Purchased VITAL, based on Fedora
• Project driven

Pages 11-14: Harvesting and DAMS

Lesson 1: Physical carriers degrade or become obsolete

Page 15: Harvesting and DAMS

Why is this a problem for the library?
• Deposit
  ▫ Sometimes no choice on carrier
  ▫ Depositors aren’t in a position to change the carrier

Page 16: Harvesting and DAMS

Lesson 1: Physical carriers degrade or become obsolete
• Age
• Storage conditions
• Sunlight
• Temperature

• “Widely differing claims have been made for the life expectancy of CD-Rs, but it is generally accepted that they will last longer than the associated technology and are therefore suitable for preservation purposes. CD-Rs offer storage capacities of 650 MB to 700 MB. CD-RW is based upon a different recording process to CD-R, and is not recommended for archival storage.”

• http://www.nationalarchives.gov.uk/documents/media_care.rtf

Page 17: Harvesting and DAMS

Practical Example
• Deposit of CDs from Cliff McLucas and the Brith Gof theatre company
• 22% of the Cliff McLucas CDs and 60% of the Brith Gof CDs could not be copied or read.
• According to the sleeves, many of the Brith Gof discs contain material relating to performances between about 1989 and 1992.
• The only real solution is to copy data from the carrier as soon as possible.
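Copying data off the carrier as early as possible can be scripted so that unreadable files are recorded rather than silently skipped. The sketch below assumes the disc is mounted as an ordinary directory; `rescue_copy` is a hypothetical helper that works at file level only (it does not image the disc) and records a SHA-256 for each file it manages to copy.

```python
import hashlib
import shutil
from pathlib import Path


def rescue_copy(source: Path, target: Path) -> tuple[dict[str, str], list[str]]:
    """Copy every readable file from a mounted carrier, recording failures.

    Returns (copied, failed): `copied` maps relative path -> SHA-256 of the
    copy; `failed` lists relative paths that raised an OS error.
    """
    copied: dict[str, str] = {}
    failed: list[str] = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = target / rel
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies content and timestamps
            copied[rel.as_posix()] = hashlib.sha256(dst.read_bytes()).hexdigest()
        except OSError:
            # Read errors from a degraded disc surface here as OSError.
            failed.append(rel.as_posix())
    return copied, failed
```

Running it once per disc and counting the discs with any failures would give figures comparable to the percentages above.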

Page 18: Harvesting and DAMS

CDAS

Page 19: Harvesting and DAMS

Lesson 2: Digital can get BIG
• Wills Project
  ▫ 182,404 wills
  ▫ 816,325 images
  ▫ 998,729 Fedora objects

• Welsh Journals
  ▫ 50 titles
  ▫ Thousands of pages

• Offair
  ▫ 40,000 records

• SCIF Newspapers and Magazines
  ▫ 2 million pages

• Repository: more than 3 million objects
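The Wills Project figures are consistent with one Fedora object per will plus one per image (an inference from the numbers, not something the slide states):

```latex
\underbrace{182{,}404}_{\text{wills}} + \underbrace{816{,}325}_{\text{images}}
  = \underbrace{998{,}729}_{\text{Fedora objects}},
\qquad
\frac{816{,}325}{182{,}404} \approx 4.5 \ \text{images per will}
```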

Page 20: Harvesting and DAMS

Problems

• Processing takes time
• Management
• Discovery
• Cost
• Cataloguing / metadata

Page 21: Harvesting and DAMS

Lesson 2: Digital can get BIG
• Sgrîn, a Cardiff media company
• Company closing down (2006)
• Collect data from the shared drive
• Stats:
  ▫ 29.2 GB
  ▫ 68,446 files
  ▫ Microsoft Word documents: 32,086
  ▫ JPEG images: 18,093
  ▫ Rich Text Format: 2,707
  ▫ Microsoft Excel documents: 2,498
  ▫ Microsoft Works word-processor documents: 2,127
  ▫ Files with a missing file extension: 2,036

• Selection?
• Cataloguing?
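A first pass over a collection like this can be automated before selection and cataloguing begin. The sketch below, assuming the drive is reachable as a local path, counts files, total bytes and files per extension; `profile_drive` is a hypothetical helper, and since the slide does not say whether 29.2 GB is decimal or binary, it returns raw bytes.

```python
from collections import Counter
from pathlib import Path


def profile_drive(root: Path) -> dict:
    """Summarise a shared drive before selection: file count, bytes, extensions."""
    by_ext: Counter = Counter()
    total_bytes = 0
    n_files = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue  # skip directories
        n_files += 1
        total_bytes += path.stat().st_size
        # An empty suffix corresponds to "files with a missing file extension".
        by_ext[path.suffix.lower() or "(none)"] += 1
    return {"files": n_files, "bytes": total_bytes, "by_extension": by_ext}
```

Extension counts are only a proxy for format; distinguishing, for example, Word from Works documents needs format identification.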

Page 22: Harvesting and DAMS

Lesson 3: Metadata is expensive
• Accessioning:
  ▫ Depositor adds metadata (Roda)
  ▫ Deposit comes with metadata (EThOS)

• Digitisation
  ▫ Structure / context
  ▫ From catalogue
  ▫ Write once, use many

• Automate as much as possible
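"Write once, use many" can be as simple as copying descriptive fields from one catalogue record onto every digitised image and generating only the structural fields. A minimal sketch under that assumption; `page_records` and its field names (`sequence`, `file`, `label`) are hypothetical.

```python
def page_records(catalogue: dict[str, str], image_files: list[str]) -> list[dict[str, str]]:
    """Derive one record per digitised image from a single catalogue record.

    Descriptive fields are written once in the catalogue and copied to every
    image; only the structural fields (sequence, file, label) are generated.
    """
    records = []
    for seq, filename in enumerate(image_files, start=1):
        record = dict(catalogue)  # copy, so the catalogue record is never modified
        record["sequence"] = str(seq)
        record["file"] = filename
        record["label"] = f"{catalogue.get('title', 'Untitled')}, image {seq}"
        records.append(record)
    return records
```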

Page 23: Harvesting and DAMS

Lesson 4: You can’t automate everything

• Offair recording
• Original plan:
  ▫ BOB system records programmes, with metadata from the EPG
  ▫ Harvest from BOB and create MARC record
  ▫ Ingest

• Totally automated

Page 24: Harvesting and DAMS

Lesson 4: You can’t automate everything

• Spanners in the works:
  ▫ Duplicate recordings
  ▫ Failed recordings
  ▫ EPG errors

• New workflow:
  ▫ BOB system records programmes, with metadata from the EPG
  ▫ Fix records that fail validation (human process)
  ▫ Harvest from BOB and create MARC record
  ▫ Ingest
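The human step in the new workflow only needs to see the records that fail validation. The sketch below shows one way such a check could route recordings; the `Recording` fields and the three rules (same channel and start time means a duplicate, zero bytes means a failed recording, an empty title or non-positive duration means an EPG error) are assumptions, not BOB's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Recording:
    channel: str
    start: datetime
    end: datetime
    title: str
    size_bytes: int


def problems(rec: Recording, seen: set[tuple[str, datetime]]) -> list[str]:
    """Return reasons a recording needs a human check; empty means auto-ingest."""
    reasons = []
    key = (rec.channel, rec.start)
    if key in seen:
        reasons.append("duplicate recording")
    seen.add(key)
    if rec.size_bytes == 0:
        reasons.append("failed recording")
    if not rec.title.strip() or rec.end <= rec.start:
        reasons.append("EPG error")
    return reasons


def triage(recordings: list[Recording]) -> tuple[list[Recording], list[tuple[Recording, list[str]]]]:
    """Split recordings into those safe to ingest and those for human review."""
    seen: set[tuple[str, datetime]] = set()
    auto, manual = [], []
    for rec in recordings:
        reasons = problems(rec, seen)
        if reasons:
            manual.append((rec, reasons))
        else:
            auto.append(rec)
    return auto, manual
```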

Page 25: Harvesting and DAMS

Lesson 5: Things Change

Page 26: Harvesting and DAMS

Ingest Early

• Items managed early
• Missing items picked up earlier
• Change / creation at the same point
• One interface rather than one for creation and one for editing

• Preserve but allow change
  ▫ Systems make it difficult
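"Preserve but allow change" can be modelled as append-only metadata: an edit adds a new version instead of overwriting, so earlier states remain available. A minimal sketch, assuming metadata is a flat dictionary; `VersionedRecord` is hypothetical and not how any particular repository stores versions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedRecord:
    """Append-only metadata: each edit adds a version; earlier versions are kept."""

    versions: list[tuple[datetime, dict]] = field(default_factory=list)

    def update(self, metadata: dict) -> None:
        # Store a copy with a UTC timestamp so later edits cannot alter history.
        self.versions.append((datetime.now(timezone.utc), dict(metadata)))

    @property
    def current(self) -> dict:
        return dict(self.versions[-1][1]) if self.versions else {}

    def history(self, key: str) -> list:
        """Value of `key` in every version, oldest first (None where absent)."""
        return [meta.get(key) for _, meta in self.versions]
```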

Page 27: Harvesting and DAMS

Lesson 6: Workflows not Projects
• Develop specific project-based workflows
• Have to be customised each time
• Symptom of project-based funding

• Digitisation workflow
• Generic services
  ▫ Technical metadata
  ▫ Checksums
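Generic services can be written once and composed into a workflow that every project reuses, instead of building a custom workflow per project. In this sketch the checksum and technical-metadata services are deliberately minimal stand-ins (a real technical-metadata service might call JHOVE); all names are hypothetical.

```python
import hashlib
from typing import Callable

# A service takes an item (a dict with "id" and "data" bytes) and returns
# metadata to add. Services are reusable across projects.
Service = Callable[[dict], dict]


def checksum_service(item: dict) -> dict:
    return {"sha256": hashlib.sha256(item["data"]).hexdigest()}


def technical_metadata_service(item: dict) -> dict:
    # Minimal stand-in: a real service would extract far more than size.
    return {"size": len(item["data"])}


def run_workflow(items: list[dict], services: list[Service]) -> list[dict]:
    """Apply the same generic services to every item, whatever the project."""
    results = []
    for item in items:
        meta = {"id": item["id"]}
        for service in services:
            meta.update(service(item))
        results.append(meta)
    return results


DIGITISATION_WORKFLOW: list[Service] = [checksum_service, technical_metadata_service]
```

A new project then supplies only its items; adding a service to `DIGITISATION_WORKFLOW` changes every project's processing at once.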

Page 28: Harvesting and DAMS

Preservation Paranoia

• Lessons we may learn:
  ▫ How much metadata is too much?
  ▫ How much technical metadata should we have?
  ▫ Migrations of MS Word: PDF, text, an image of each page, Open Office XML

Page 29: Harvesting and DAMS

Summary

• Physical carriers degrade or become obsolete
• Digital can get BIG
• Metadata is expensive
• You can’t automate everything
• Things change
• Workflows not projects
• Preservation paranoia

Page 30: Harvesting and DAMS

Questions