A Preservation Repository in Prose Being a Story of the DRS Past, Present and Future By Andrea Goethals, Wendy Gogel In Cambridge, Massachusetts 2009

Today’s Agenda

DRS 1: Being a Story of the Past

A Transition: Being a Story of the Present

DRS 2 and You!: Being a Story of the Future

Questions?

DRS 1: Being a Story of the Past

1997-2007

The Formative Years: LDI

• November 1997: proposal for the Library Digital Initiative (LDI)

“…create the first-generation technical infrastructure to support storage of and access to digital library materials.”

• In July 1998, LDI was approved and funded

• In December 1998, planning for DRS began

Digital Repository Service (DRS)

• Provides a set of professionally managed services to ensure the usability of securely stored digital objects over time
• Is both a preservation and an access repository
• Includes bundled delivery services

Launched October 2000

LDI Grant projects

49 grants were awarded, 1999-2006

• Digitizing analog collections: images, text, audio, music scores
• Born digital: biomedical images, geospatial data, websites
• Online cataloging projects

Digitizing Facilities

June 1999
• Harvard College Library Imaging Services

2001-2002
• HCL Fine Arts Library Digital Imaging Lab (FAL DIL)
• Harvard Art Museum Digital Imaging and Visual Resources (DIVR)
• Harvard College Library Audio Preservation Services (HCL APS)
• Peabody Museum of Archaeology and Ethnology

The First Deposit

• And the first object was deposited on October 23, 2000…

With metadata

• Administrative
  • Stewardship, contacts (e.g., HCL Harvard-Yenching Library, Ray Lum, etc.)
  • Billing account (e.g., 33-digit account number)
  • Access flag (e.g., open to the public, restricted to the Harvard community, no access)

• Technical
  • Physical characteristics (e.g., for images, x and y resolution, MD5 signature, pixel width and height, compression, bit sample rate, etc.)
  • Production methods (e.g., for images, Scitex; Leaf Volare; Leaf Colorshop 5.x)
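The administrative and technical metadata above can be pictured as a per-file record. The following is a minimal Python sketch, not the DRS schema: the field names, the single-letter access codes and the `fixity_ok` helper are hypothetical, and only the categories (stewardship, billing account, access flag, MD5 signature, pixel dimensions) come from the slide. It shows why an MD5 signature recorded at deposit is useful: re-hashing the stored file later reveals any change.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class AccessFlag(Enum):
    """The three access levels named on the slide (codes are hypothetical)."""
    PUBLIC = "P"        # open to the public
    HARVARD_ONLY = "R"  # restricted to the Harvard community
    NO_ACCESS = "N"     # no access


@dataclass
class FileMetadata:
    # Administrative metadata
    steward: str
    billing_account: str
    access: AccessFlag
    # Technical metadata (a small subset of the image properties listed)
    md5: str
    pixel_width: Optional[int] = None
    pixel_height: Optional[int] = None


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 signature, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, record: FileMetadata) -> bool:
    """Re-hash the stored file and compare it with the MD5 recorded at deposit."""
    return md5_of(path) == record.md5


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "page0001.tif"  # hypothetical file name
        image.write_bytes(b"example image bytes")
        record = FileMetadata(
            steward="HCL Harvard-Yenching Library",
            billing_account="0" * 33,  # placeholder for a 33-digit account number
            access=AccessFlag.HARVARD_ONLY,
            md5=md5_of(image),
        )
        print(fixity_ok(image, record))  # True
        image.write_bytes(b"corrupted bytes")
        print(fixity_ok(image, record))  # False
```

Reading in chunks keeps memory use flat for large master files; the same check works unchanged for audio or text files.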

The first book was deposited on June 29, 2001

The first audio recording was deposited on January 28, 2003

• Matins for Sunday after the Elevation of the Holy Cross

• Laura Boulton (1899-1980) Collection of Byzantine and Orthodox Musics, Archive of World Music

• One of a series of Byzantine hymns and liturgies recorded in a monastery on Patmos, 1960.

• Logbook (Part I, p. 1-10)

The first georeferenced map was deposited on January 14, 2005

• Barnstable, Massachusetts 15 Minute Digital Raster Graphic

• From an 1893 Historic USGS map reprinted in 1907

Systems and Services

1985
• HOLLIS – our OPAC

1998-1999
• VIA (Visual Information Access) – union catalog
• OASIS (Online Archival Search Information System) – union catalog

1999-2000
• OLIVIA – image cataloging tool

Systems and Services

2000-2001
• DRS (Digital Repository Service) – preservation and access repository
• NRS (Name Resolution Service) – to resolve persistent identifiers
• AMS (Access Management Service) – to provide access controls
• IDS (Image Delivery Service)
• PDS (Page Delivery Service)
• FTS (Full-text Search Service)
• NRS Web Admin
• Policy Web Admin
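The NRS entry above (resolving persistent identifiers) rests on one idea: a citation points at a stable name, and the resolver maps that name to wherever the content currently lives. A minimal Python sketch of that indirection follows; the class, its methods and the `urn:example:` names are hypothetical and do not reflect the actual NRS interface or Harvard's identifier syntax.

```python
from typing import Dict, Optional


class NameResolver:
    """Toy persistent-name resolver (hypothetical; not the NRS interface).

    Citations use the persistent name; only the name-to-location mapping
    changes when content moves, so existing links keep working.
    """

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        # A persistent name is assigned once and never reused.
        if name in self._locations:
            raise ValueError(f"name already registered: {name}")
        self._locations[name] = url

    def repoint(self, name: str, new_url: str) -> None:
        # Update the target, e.g. after content moves to a new server.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name: str) -> Optional[str]:
        # Return the current location, or None for an unknown name.
        return self._locations.get(name)


if __name__ == "__main__":
    nrs = NameResolver()
    nrs.register("urn:example:drs-0001", "http://old-server.example/obj/1")
    nrs.repoint("urn:example:drs-0001", "http://new-server.example/obj/1")
    print(nrs.resolve("urn:example:drs-0001"))  # http://new-server.example/obj/1
```

In a web setting the resolver would typically answer with an HTTP redirect to the current location; the dictionary lookup is the core of that behavior.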

Systems and Services

2001-2002
• DRS Web Admin – staff interface to DRS
• PDS Maint
• Harvard Geospatial Library – union catalog

2002-2003
• TED (TEmplated Database) – collection building tool
• SDS (Streaming Delivery Service) – for audio delivery
• ADS (Asynchronous Delivery) – for large files
• Cross-catalog search – for federated searching

Systems and Services

2003-2004
• Dynamic IDS – for zoom and pan features with JPEG 2000 (JP2)
• DMART – audio deposit tool

2004-2005
• RList – course reserves tool

2005-2006
• Virtual Collections

2006-2007
• Batch Builder

2008-2009
• Google data loading
• WAX

A Transition: Being a Story of the Present

2008-2009

2008: new DRS storage system

• New servers, new storage arrays, new tape library, new storage software
• Increased storage capacity
• Less complex: the DRS loader no longer needs to know the details of the storage system
• Higher availability for deliverable content
• Copies stored in 3 different geographic locations
• 3 copies of “low use” content, 4 copies of “high use” content
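The copy counts above amount to a simple replication policy that can be checked file by file. The Python sketch below is illustrative only: the `Copy` record, the site labels and the rule that every individual file must span all 3 locations are assumptions, not a description of how the 2008 storage software enforces its copies.

```python
from dataclasses import dataclass
from typing import List

# Figures from the slide: 3 copies of low-use content, 4 of high-use content,
# stored in 3 geographic locations. Applying the location rule to each
# individual file is an assumption made for this sketch.
REQUIRED_COPIES = {"low": 3, "high": 4}
REQUIRED_LOCATIONS = 3


@dataclass(frozen=True)
class Copy:
    location: str  # hypothetical site label, e.g. "site-A"


def replication_problems(use: str, copies: List[Copy]) -> List[str]:
    """Return policy violations for one file (an empty list means compliant)."""
    problems = []
    needed = REQUIRED_COPIES[use]
    if len(copies) < needed:
        problems.append(f"{use}-use file has {len(copies)} copies, needs {needed}")
    sites = {c.location for c in copies}
    if len(sites) < REQUIRED_LOCATIONS:
        problems.append(
            f"copies span {len(sites)} locations, needs {REQUIRED_LOCATIONS}"
        )
    return problems


if __name__ == "__main__":
    three_sites = [Copy("site-A"), Copy("site-B"), Copy("site-C")]
    print(replication_problems("low", three_sites))   # []
    print(replication_problems("high", three_sites))  # ['high-use file has 3 copies, needs 4']
```

Returning a list of problems rather than a single boolean lets an audit report say why a file is out of policy.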

Cumulative file count per format type

[Chart: cumulative file count by format type, 2000-2008. One panel shows Image, Text and Container files (y-axis 0 to 9,000,000); a second panel shows Audio files (y-axis 0 to 20,000).]

Annual file size per Harvard unit (GB)

[Chart: annual file size in GB per Harvard unit, 2000-2008 (y-axis 0 to 14,000). Units: Arn. Arb., Divinity, FAS Mus/Spec. Libs, HCL, GSD, GSE, HBS, CHS, Law, Countway, HAM, HU Archives, KSG, Radcliffe. Callouts: HCL, Art Museums.]

Cumulative non-Google file size per use (GB)

• April 2009: 45,742 GB

[Chart: cumulative non-Google file size in GB, October 2000 to February 2009 (y-axis 0 to 50,000), split into low use and high use.]

Cumulative file size (GB)

• April 2009: 105,652 GB

[Chart: cumulative file size in GB, October 2000 to February 2009 (y-axis 0 to 120,000), split into Google and non-Google content.]

Do it yourself: http://hul.harvard.edu/ois/reporting/

2008: new program, new position

• HUL takes the next step in its commitment to digital preservation and establishes:

1. Digital Preservation and Repository Manager position
• March 2008
• Andrea Goethals

2. Digital Preservation Program
• June 2008
• Established within OIS

2008/9 priorities of new digital preservation program

1. Define additional infrastructure requirements to support digital preservation
• DRS enhancements
• Global Digital Format Registry (GDFR)

2. Identify and analyze new formats for the DRS to support
• PDF, email, audio, architectural drawings, etc.

3. Establish a communication network with the 2 communities we inhabit
• Broader digital preservation community
• Harvard community

Avenues of communication

• Broader digital preservation community
  • Conferences and meetings
  • Collaborative projects
  • Email conversations, blogs, newsgroups

• Harvard community
  • Committees (ULC, CCCC, DMCC, DCSWG, etc.)
  • Digital project librarians
  • Ad hoc focus groups, meetings and email with stakeholders (depositors, curators and collection managers)
  • Customer surveys

These communities inform our thinking about:

• Concepts and terms
• Metadata
• Data models
• Content
• Recommended & supported formats
• Best practices
• Preservation planning and actions
• Storage, management and monitoring
• Certifications
• Registries
• Tools and services

DRS customer survey 2008

• August-September 2008
• Users of DRS tools or services
• To evaluate the level of satisfaction with DRS tools, services, and websites
• To understand any unmet needs

Survey findings

• Question 1: What word or phrase best describes the DRS?

• In general the DRS is valued for its preservation services and perceived as stable, secure and trusted.

Other key findings of the survey

DRS customers want:
• Support for more formats
• Guidance on preservation formats and content creation
• Better search and editing/management tools
• Delivery services that use common or popular third-party applications

Trends in DRS customer needs

1. Problem of abundance
2. Remote creators
3. Diversity of formats

1. Problem of abundance

DRS owners and depositors:

• Are increasingly overwhelmed by the amount of digital content to preserve

• Can’t fully process the material they want to deposit into the DRS

• Can’t go through a deposit process that is time-consuming

2. Remote creators

• Increasingly DRS owners and depositors are acquiring content they did not create

• DRS staff cannot influence the formats or technical properties of this content during creation

3. Diversity of formats

• DRS owners and depositors increasingly need to preserve formats and genres that aren’t currently supported by the DRS:

CAD formats
Spreadsheet formats
3D visualization formats
Presentation formats
Additional audio formats
Databases
Video formats
Locally archived websites
Executable file formats
Raw survey data
Word processing formats
Raw camera files

Implications of these trends

The DRS needs to:
• accept and preserve minimally processed content
• provide a time-efficient deposit process
• support a broad range of formats and genres

And it:
• can’t rely on the content being in “preservable” formats prior to deposit into the DRS

DRS 2 and You!: Being a Story of the Future

2009 -

DRS 2 changes

Why?
1. To better support digital preservation
2. To better support the needs of DRS depositors, curators and collection managers

DRS 2 changes

1. New conceptual foundation
• Objects
• Content models

2. User improvements
• Support for opaque objects
• Support for new file formats
• Deposit, management & delivery tools
• Guidance & user community

3. A new approach to metadata

4. Increased preservation planning and activities

Objects

• Currently the DRS has only a file level
  • All management has to be done at the individual file level
• Objects are aggregations of files
  • Page-turned object
  • Still image object
• A more intuitive unit for management, reporting and searching
  • Example: How many page-turned objects do I have in the DRS?
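The shift from files to objects can be sketched as a data structure in which an object groups its files under a content model, so the slide's example question becomes a one-line count. This Python sketch is hypothetical: the class names, identifiers and model labels are illustrative, not the DRS 2 data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class DrsFile:
    name: str
    file_format: str


@dataclass
class DrsObject:
    """An object aggregates files and is typed by its content model."""
    object_id: str
    content_model: str  # e.g. "page-turned", "still image", "audio", "opaque"
    files: List[DrsFile] = field(default_factory=list)


def count_by_content_model(objects: List[DrsObject]) -> Counter:
    """Tally objects per content model in a single pass."""
    return Counter(obj.content_model for obj in objects)


if __name__ == "__main__":
    book = DrsObject(
        "obj-1", "page-turned",
        [DrsFile(f"p{i:04d}.tif", "TIFF") for i in range(1, 4)],
    )
    photo = DrsObject("obj-2", "still image", [DrsFile("img.jp2", "JPEG 2000")])
    holdings = [book, photo]
    print(count_by_content_model(holdings)["page-turned"])  # 1 (object)
    print(sum(len(o.files) for o in holdings))              # 4 (files)
```

At the file level the three page images would be counted as three unrelated items; grouping them under one object is what makes the count meaningful.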

Content models

• Types of objects
  • Example: audio content model

Support for opaque objects

• A special content model
• Allows files in any format
• The digital equivalent of buying time at HD
• Content can be minimally processed
• Must be intended for long-term preservation
• The content could be fully processed by depositors but not yet supported by the DRS
• Will receive some preservation services
• Will be on a path to fuller DRS preservation
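The opaque content model can be read as a relaxed validation rule at deposit: ordinary content models accept only formats the DRS supports, while opaque objects accept any format. A minimal Python sketch of that rule follows; the format lists and model names in it are hypothetical placeholders, and it leaves out the reduced preservation services that opaque objects receive.

```python
from typing import Dict, FrozenSet, List

# Hypothetical per-model format lists for illustration; not the DRS's actual lists.
SUPPORTED_FORMATS: Dict[str, FrozenSet[str]] = {
    "still image": frozenset({"TIFF", "JPEG", "JPEG 2000"}),
    "audio": frozenset({"WAVE", "MP3", "MP4/AAC"}),
    "document": frozenset({"PDF"}),
}
OPAQUE = "opaque"


def deposit_errors(content_model: str, file_formats: List[str]) -> List[str]:
    """Return reasons to reject a deposit; an empty list means it is accepted."""
    if content_model == OPAQUE:
        # Opaque objects accept files in any format.
        return []
    allowed = SUPPORTED_FORMATS.get(content_model)
    if allowed is None:
        return [f"unknown content model: {content_model}"]
    return [
        f"{fmt} not supported for {content_model}"
        for fmt in file_formats
        if fmt not in allowed
    ]


if __name__ == "__main__":
    print(deposit_errors("audio", ["WAVE", "AutoCAD DWG"]))
    # ['AutoCAD DWG not supported for audio']
    print(deposit_errors(OPAQUE, ["AutoCAD DWG", "unidentified"]))  # []
```

Under this rule a depositor with unsupported material can still get it into the repository as an opaque object, then re-deposit it under a richer content model once the DRS supports the format.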

Support for new file formats

• PDF
• Audio
  • MP3, MP4/AAC
• Drawings
  • AutoCAD
  • Adobe Illustrator
• Video
• What’s next?

Deposit, management & delivery tools

• Enhanced Batch Builder
  • Integrated with File Information Tool Set (FITS)
• Enhanced DRS Web Admin
  • Better searching
  • Richer management and reporting
  • Ability to perform batch updates
• File Delivery Service (FDS)
  • Created for PDF delivery
  • Delivers a file to the user’s web browser

Future of http://hul.harvard.edu/ois/

Guidance & user community

New website for digital preservation

• Formats central
• Content models
• DRS practices
• HUL digital preservation projects
• Emerging standards and best practices
• Tools, services, registries
• Resources & experts

A new approach to metadata

• Moving towards community-standard schemas
  • PREMIS, MODS, MIX, textMD, etc.
• Metadata files on the file system alongside content files
  • “Object descriptors”
  • Preservation, rights, descriptive metadata
• More reliance on embedded metadata
  • Automatic extraction at deposit time by FITS
  • Third-party delivery applications are becoming aware of file-embedded metadata
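The “object descriptor” idea, metadata files stored on the file system alongside the content, can be sketched with Python's standard `xml.etree.ElementTree`. The sketch borrows a few PREMIS object semantic units (identifier, MD5 fixity, size) but is deliberately simplified and not schema-valid; the file-naming convention and identifier type are hypothetical, and it does not attempt the MODS, MIX or textMD parts of a real descriptor.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def write_descriptor(content: Path, object_id: str) -> Path:
    """Write a minimal PREMIS-style descriptor beside a content file.

    Element names follow PREMIS object semantic units (objectIdentifier,
    objectCharacteristics/fixity, size), but namespaces and other required
    elements are omitted, so the output is NOT schema-valid PREMIS.
    """
    data = content.read_bytes()
    obj = ET.Element("object")
    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "local"  # hypothetical
    ET.SubElement(ident, "objectIdentifierValue").text = object_id
    chars = ET.SubElement(obj, "objectCharacteristics")
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "MD5"
    ET.SubElement(fixity, "messageDigest").text = hashlib.md5(data).hexdigest()
    ET.SubElement(chars, "size").text = str(len(data))
    # Hypothetical naming convention: "<file>.descriptor.xml" in the same directory.
    out = content.with_name(content.name + ".descriptor.xml")
    ET.ElementTree(obj).write(out, encoding="utf-8", xml_declaration=True)
    return out
```

Keeping the descriptor next to the content file means the metadata travels with the bits when storage is migrated, rather than living only in a database.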

Increased preservation planning and activities

• More granular format identification
• Sub-file characterization
• Preservation plans per content model
• Digital first aid (content & metadata)
• “Localization,” migrations, normalizations
• Technology watch
• Virus checking
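Format identification, the first item above, typically starts by matching a file's leading bytes against known signatures. The Python sketch below illustrates that technique with a handful of well-known signatures; it is a stand-in for, not a description of, the DRS tooling (FITS wraps established identification tools such as DROID and JHOVE), and the PDF version check hints at what “more granular” identification adds.

```python
from typing import List, Optional, Tuple

# A few well-known file signatures ("magic numbers"). Production tools use far
# larger signature registries; this table is only an illustration.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"ID3", "MP3 (with ID3v2 tag)"),  # ID3v2 tags typically prefix MP3 audio
]


def identify(header: bytes) -> Optional[str]:
    """Guess a format from a file's leading bytes; None if nothing matches."""
    if header.startswith(b"%PDF-"):
        # Finer granularity: the header also carries the version, e.g. "%PDF-1.4".
        return "PDF " + header[5:8].decode("ascii", errors="replace")
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        # WAVE: "RIFF", a 4-byte chunk size, then "WAVE".
        return "WAVE"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3"))  # PDF 1.4
    print(identify(b"RIFF\x24\x08\x00\x00WAVEfmt "))  # WAVE
    print(identify(b"plain text"))                    # None
```

Signature matching only says what a file claims to be; validation (checking that the rest of the file conforms to the format) is a separate, deeper step.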

DRS 2 process

• Phases of work
  • DRS 2.1, 2.2, 2.3, etc.
• Themed phases
  • DRS 2.1: “Object Security and Integrity”
  • DRS 2.2: “Management and Monitoring”
• Includes support for new formats
  • DRS 2.1: PDFs, opaque objects
  • DRS 2.2: more audio formats (MP3, MP4/AAC)

http://hul.harvard.edu/ois/systems/drs/enhancements.html

Questions?
