

Page 1: Web At Risk: Extending the Digital Curation Mission to the Web

Digital Preservation Program

Web At Risk: Extending the Digital Curation Mission to the Web

Patricia Cruse, Director, Digital Preservation Program
Kirsten Neilsen, Digital Preservation Services Manager

California Digital Library

DigCCurr 2007, April 18-20, UNC
Building Capabilities for Digital Curation Repositories

Page 2

The Digital Preservation Program

• Established in 2002
• UC-wide program
• Goal: ensure long-term availability of, and access to, materials that are important to research, teaching, and learning on the UC campuses
• Centrally managed
• Central and external funds
• A partnership

Page 4

Cornerstone of the Program: Digital Preservation Repository (DPR)

• Suite of tools & services:
– Digital Preservation Repository
– Documentation, guidelines, policies
• International standards & open source
• Service-oriented architecture: flexible, adaptable, simple
• Preservation partnership:
– Curate
– Preserve

Page 5

Digital Preservation Repository core services

• A set of services that support the long-term retention of digital objects:
– Submit (deposit) digital objects
– Manage digital objects: add versions, replace, update, delete
– Request dissemination
– Request administrative reports (forthcoming)
• What the service is not…
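
The core services listed above amount to a small object lifecycle. The sketch below is a hypothetical, in-memory stand-in, not the DPR's actual interface: the class and method names are assumptions, identifiers are arbitrary strings, and only submit, add-version, delete, and dissemination are modeled. Replace, update, and administrative reports are left out because the slide does not define their semantics.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Version:
    content: bytes
    sha256: str  # digest recorded at deposit time, usable for later fixity checks


@dataclass
class ToyRepository:
    """Hypothetical in-memory model of repository core services (not the DPR API)."""

    objects: dict[str, list[Version]] = field(default_factory=dict)

    def submit(self, obj_id: str, content: bytes) -> int:
        """Deposit a new object; returns its first version number (1)."""
        if obj_id in self.objects:
            raise ValueError(f"{obj_id!r} has already been deposited")
        self.objects[obj_id] = []
        return self.add_version(obj_id, content)

    def add_version(self, obj_id: str, content: bytes) -> int:
        """Append a new version to an existing object; returns its version number."""
        versions = self.objects[obj_id]  # KeyError if the object was never submitted
        versions.append(Version(content, hashlib.sha256(content).hexdigest()))
        return len(versions)

    def delete(self, obj_id: str) -> None:
        """Remove an object and all of its versions."""
        del self.objects[obj_id]

    def disseminate(self, obj_id: str, version: Optional[int] = None) -> bytes:
        """Return the latest version, or a specific 1-based version number."""
        versions = self.objects[obj_id]
        if version is None:
            return versions[-1].content
        if not 1 <= version <= len(versions):
            raise IndexError(f"{obj_id!r} has no version {version}")
        return versions[version - 1].content
```

For example, submitting `b"v1"` and then adding `b"v2"` under the same identifier yields version numbers 1 and 2, and `disseminate()` without a version number returns `b"v2"`. Keeping every version, each with its own digest, is one simple way to support both versioning and later integrity checks.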

Page 8

DPR to Web Archiving Service

Page 9

Web-at-Risk: NDIIPP Funds, Jan 2005 – Jan 2008

• Build tools to allow librarians to capture, curate, and preserve web-based government and political information:
– Create topical and event-based archives
– Capture individual sites and documents

• Assess the impact of these tools on traditional collection development practices.

• Explore web archiving service sustainability.

Page 10

Project Partners

Page 11

Preserving the Web

• Why all the fuss?
• What is “web archiving”?
• Web Archiving Service (WAS):
– Collecting content
– Curating content
• Current status & future plans

Page 14

• 2003 survey of the .gov domain:
– As much as 65 percent of all government publications distributed to libraries through the Federal Depository Library Program were, at the time, produced exclusively in electronic form and distributed via the web.

Page 15

What is a “web archive”?

• Automated method to gather web content
• Collections composed of multiple sites
• Captured content preserved
• Meaningful access to content provided
– Public or end-user access may not be available
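
The first two points, an automated method of gathering content into collections of multiple sites, describe a crawl that starts from seed URLs and follows links. A minimal breadth-first sketch, assuming a hypothetical in-memory `links` map stands in for fetching and parsing pages, and assuming scope is limited to the seeds' hosts (production crawlers support much richer scope rules):

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(seeds: list[str], links: dict[str, list[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl of a hypothetical link graph, staying on the seeds' hosts.

    `links` maps a URL to the URLs found on that page; it stands in for
    fetching and parsing real pages. Returns URLs in the order captured.
    """
    hosts = {urlsplit(seed).hostname for seed in seeds}
    queue = deque(dict.fromkeys(seeds))  # de-duplicated, order preserved
    seen = set(queue)
    captured: list[str] = []
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        captured.append(url)  # a real crawler would fetch and store the response here
        for target in links.get(url, []):
            # Follow only unseen links that stay on one of the seed hosts.
            if target not in seen and urlsplit(target).hostname in hosts:
                seen.add(target)
                queue.append(target)
    return captured
```

With seeds on two hosts, links pointing to a third host appear in `links` but are never queued, so the capture stays within the collection's sites; `max_pages` caps the crawl the way a real crawl budget would.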

Page 17

Domain-Based Web Archives

• Archives: Nordic Web Archive, Kulturarw3, National Web Archive
• Institutions: Nordic National Libraries, National Library of Sweden, National Library of Iceland

Page 18

Topical Web Archives

Page 19

Event-Based Web Archives

Page 21

Web Archiving Lingo

• Crawler
• Host
• Site
• Seed
• Capture
• Robots.txt
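
Several of these terms meet when a crawler decides whether to capture a URL: the URL must belong to a seed's host, and that host's robots.txt must allow it. The sketch below uses Python's standard `urllib.robotparser`; the user-agent string, the example hosts, and the rule of scoping by host are assumptions for illustration, and robots.txt text is passed in directly instead of being fetched from the site.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

AGENT = "example-archive-bot"  # hypothetical crawler user-agent


def robots_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text the crawler has already retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_capture(url: str, robots_by_host: dict[str, RobotFileParser]) -> bool:
    """True if the URL is on a seed host and that host's robots.txt allows it.

    `robots_by_host` holds one parser per seed host, so a missing entry
    means the URL is outside the collection's scope.
    """
    host = urlsplit(url).hostname
    parser = robots_by_host.get(host)
    if parser is None:
        return False  # not one of the seed hosts: out of scope
    return parser.can_fetch(AGENT, url)
```

Keeping one parser per host mirrors the fact that robots.txt rules apply to the host that serves them, not to the collection as a whole.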

Page 25

Sample Collection Plan

• Section 1. Mission & Scope
• Section 2. Selection
• Section 3. Acquisition
• Section 4. Descriptive Metadata
• Section 5. Rights and Access
• Section 6. Maintenance and Weeding
• Section 7. Preservation

• Appendix A. Letter of Agreement
• Appendix B. Seed List
• Appendix C. Metadata

Page 26

Flexibility in the face of uncertainty

Page 27

What metadata will you need?

• Title: Parallel Title, Alternate Title, Added Title, Series Title, Serial Title, Uniform Title, Other
• Creator: Creator Name, Creator Role, Creator Information
• Contributor: Contributor Name, Contributor Role, Contributor Information
• Publisher: Publisher Name, Place of Publication, Publisher Information
• Date: Original Resource Creation Date, Digital Creation Date
• Language
• Description: Content Description, Physical Description
• Subject and Keywords
• Primary Source
• Coverage: Place Name, Time Period, Date, Date Range
• Source
• Relation
• Collection
• Institution
• Rights Management
• Resource Type
• Format
• Identifier: URL, URN, DOI, ISBN, ISSN, OCLC No., Report No., Government Document No., Accession or Local Control No., UNT Catalog No., RISM No., Other Identifier
• Note
• Metadata Information: Metadata Creator, Date of Creation, Metadata Modifier, Date of Modification
• File Information: File Size, File Name, Format Name, Format Version, File Description, Resolution, Dimension, Duration, Rate, Tonal Resolution, Color, Compression, Other File Information
• Fixity Information: Authentication Type, Authentication Result, Date (First Date, Last Date)
• System Information:
– Software: Creation Application Software (Creation Application Name, Creation Application Version), Access Application Software (Access Application Name, Access Application Version), Other Software Information
– Hardware: Creation Hardware, Access Hardware, Other Hardware Information
• Documentation
• Structural Composition
• Storage Medium
• Access Inhibitors: Inhibitor Key, Functionality, Exception
• Alteration History: Action Taken, Date of Alteration, Modifier, Other Alteration Information
• Metadata Information: Metadata Editor/Modifier, Metadata Creation/Modification Date, Metadata Modification Action, Other Metadata Information
• Comments

Page 28

Rights Management Approaches
• Library of Congress
– Extensive rights management efforts
– Permission secured for any site not clearly in the public domain
– If no response, the site is not captured
• Internet Archive
– Opt-out policy
– Obeys robots.txt
• WAS
– Flexibility
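
The two established approaches above can be read as two decision rules, and the WAS "flexibility" point as letting each collection pick one. This is a hypothetical sketch: the enum, the function, its parameters, and the per-collection mapping are illustrative assumptions, not how WAS actually records rights decisions.

```python
from enum import Enum
from typing import Optional


class RightsPolicy(Enum):
    PERMISSION_FIRST = "permission"  # Library of Congress approach: ask first
    OPT_OUT = "opt-out"              # Internet Archive approach: capture unless excluded


def capture_allowed(
    policy: RightsPolicy,
    *,
    public_domain: bool = False,
    permission: Optional[bool] = None,  # True granted, False refused, None no response
    opted_out: bool = False,
    robots_allows: bool = True,
) -> bool:
    """Decide whether a site may be captured under the collection's policy."""
    if policy is RightsPolicy.PERMISSION_FIRST:
        # Public-domain sites are captured; otherwise only with explicit
        # permission, so a site whose owner never responds is not captured.
        return public_domain or permission is True
    if policy is RightsPolicy.OPT_OUT:
        # Capture by default, honoring opt-out requests and robots.txt.
        return not opted_out and robots_allows
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical per-collection choices, illustrating the "flexibility" approach.
COLLECTION_POLICIES = {
    "state-agency-reports": RightsPolicy.PERMISSION_FIRST,
    "election-sites": RightsPolicy.OPT_OUT,
}
```

Under this reading, flexibility means the policy is a property of the collection rather than of the whole service, so two curators can apply different rules to different collections.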

Page 29

Preservation

• Content preserved in the DPR
– Bit preservation (fixity, integrity)
– Replication
– Desiccation
• Massive storage requirements
– Multiple projects investigating mass storage environments
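
Bit preservation across replicas comes down to recording a digest when content is deposited and periodically recomputing it for every copy. A minimal sketch using SHA-256 from Python's `hashlib`; the algorithm, the chunk size, and the replica names are assumptions, since the slide does not say which digest the DPR uses. Reading in fixed-size chunks keeps memory use flat even for very large captures.

```python
import hashlib
import io
from typing import BinaryIO, Mapping


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a binary stream, read in fixed-size chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def failed_replicas(recorded: str, replicas: Mapping[str, bytes]) -> list[str]:
    """Names of replicas whose recomputed digest differs from the recorded one.

    Replicas are passed as in-memory bytes to keep the sketch self-contained;
    in practice each would be a file opened in binary mode at its storage site.
    """
    return sorted(
        name
        for name, data in replicas.items()
        if sha256_stream(io.BytesIO(data)) != recorded
    )
```

A non-empty result flags copies that have drifted from the recorded digest; a replicated system can then repair them from a copy that still verifies.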

Page 30

WAS: Now & into the Future

• Current status
– In development
– Roll-out to current curators: 12/07
• Beyond 2007
– Extending service to additional curators
– Developing end-user access
– Exploring release of open access tools

Page 31

Acknowledgements
• Tracy Seneca, Web Archiving Coordinator
– CDL WAS development team
• Kathleen Murray
– UNT Partners
• NDIIPP