
VRC: Preservation Risk Management for Web Resources

Nancy Y. McGovern, ECURE 2004


VRC Funding

• Part of a 4(5)-year NSF-funded project

– supported by the Digital Libraries Initiative, Phase 2 (Grant No. IIS-9905955, the Prism Project)

• Also partially funded by a grant from The Andrew W. Mellon Foundation

– Political Communications Web Archiving: http://www.crl.edu/content/PolitWeb.htm

• For updates:

– http://irisresearch.library.cornell.edu/VRC/


Current Team

Anne R. Kenney, Advisor

Nancy Y. McGovern, Project Manager

Richard Entlich, Sr. Researcher

William R. Kehoe, Technology Coordinator

Ellie Buckley, Digital Research Specialist

Erica Olsen (recent)

Carl Lagoze, CIS PI


Research Scope

see, "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism"

by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette

in D-Lib Magazine, January 2002

http://www.dlib.org/dlib/january02/kenney/01kenney.html


Virtual…

• because VRC develops models to represent essential features of selected Web sites

• that enable ongoing monitoring over time

• to identify, respond to, and mitigate potential risks to the site integrity and longevity


Remote…

• because VRC is intended for use by cultural heritage institutions

• interested in the longevity of Web resources

• residing on remote servers

– not owned or managed by the monitoring institution


Control…

• because at the most proactive end of the VRC approach

• a monitoring organization may act to protect another organization's resources

• by agreement or implicit consent

• through notification and/or action


Purpose

• Develop a model for research libraries (adaptable to other contexts)

• Support spectrum from passive monitoring to active capture

• Lifecycle support: selection to capture

• Understand nature of Web resources

• Promulgate good practice


Types of Web Resources

Two types of initiatives for monitoring and/or capture of:

• Web-based publications [Web site as a means]

• All of (or a subset of) a Web site consisting of pages within a boundary defined by a URL (or a portion of one) [Web site as an end] (VRC)


Nature of Risks

Two perspectives on Web-based risk:

• potential liability of an institution based upon the content of its Web site, or a Web site for which it is responsible

• potential threats to the integrity and longevity of a Web resource (VRC)


Types of Risks

Include:

• technological obsolescence

• security weaknesses and breaches

• human error in developing/maintaining sites

• organizational issues; benign neglect

• power and technology failures

• inadequate backup and secondary systems


Risk Factors

• Organizational Context

• Combination of indicators

• Monitoring (change/loss over time)

• Triggers (events, organizational, upgrades)

• Degradation of site management indicators


VRC Stages

1. Identification

2. Analysis

3. Appraisal

4. Strategy

5. Detection

6. Response


Human – Tool Scenario

1. Identification

– Human: identify Web resources of interest

– Tool: verify list, expand list

2. Analysis

– Tool: crawl sites, generate characterizations

– Human: accept/revise characterizations

3. Appraisal

– Human: define/review attributes of value

– Tool: support appraisal, capture results


Human – Tool Scenario

4. Strategy

– Human: develop/review strategies

– Tool: plot appraisals, compile strategies

5. Detection

– Human: define risk parameters

– Tool: identify/assess risks; propose responses

6. Response

– Tool: propose risk response based on rules; automatic response for some risk categories

– Human: monitor automated responses; select response based on recommended actions
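
The split between tool and human in the Detection and Response stages might be sketched as a small rule table. This is only an illustration: the names `RiskEvent`, `RULES`, `propose_response`, and `triage`, along with the risk categories and responses, are hypothetical and are not taken from the VRC toolkit.

```python
from dataclasses import dataclass

# Hypothetical rule table: risk category -> (proposed response, automatic?)
RULES = {
    "server_outdated": ("notify site maintainer", True),
    "pages_missing": ("capture remaining pages", False),
    "site_unreachable": ("recheck, then escalate to human", True),
}


@dataclass
class RiskEvent:
    site: str
    category: str


def propose_response(event):
    """Return (response, automatic) for a detected risk; unknown risks go to a human."""
    return RULES.get(event.category, ("refer to human reviewer", False))


def triage(events):
    """Split detected risks into automated responses and ones awaiting human selection."""
    automated, needs_human = [], []
    for event in events:
        response, automatic = propose_response(event)
        (automated if automatic else needs_human).append((event.site, response))
    return automated, needs_human
```

Under these assumptions, a human reviewer would monitor the `automated` list and choose among the recommended actions in `needs_human`.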


Contextual Layers


Server-level Monitoring

• Potential multi-site impact

• Server vulnerabilities put site content at risk

– deletion or modification

• Patches and new versions of Microsoft IIS and Apache server released frequently

• Apache HTTP Server 1.3 security updates

– to version 1.3.26 on June 18, 2002

– to version 1.3.27 on October 3, 2002
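
One way to track upgrades like these is to read the version string a server reports in its HTTP `Server` response header. The slides do not say how the project gathered version data, so the sketch below is an assumption, and the function names `apache_version` and `percent_upgraded` are hypothetical. Servers can suppress or alter this header, so the result is an indicator only.

```python
import re


def apache_version(server_header):
    """Extract the Apache version as a tuple from a Server header value, or None."""
    match = re.search(r"Apache/(\d+)\.(\d+)\.(\d+)", server_header or "")
    return tuple(int(part) for part in match.groups()) if match else None


def percent_upgraded(server_headers, minimum):
    """Percentage of Apache 1.3 servers reporting at least the `minimum` version."""
    versions = [apache_version(header) for header in server_headers]
    apache_13 = [v for v in versions if v is not None and v[:2] == (1, 3)]
    if not apache_13:
        return 0.0
    upgraded = sum(1 for v in apache_13 if v >= minimum)
    return 100.0 * upgraded / len(apache_13)
```

Running `percent_upgraded(headers, (1, 3, 26))` on each week's set of observed headers would give one weekly data point of the "% servers upgraded" kind.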


Server-level Monitoring

[Figure: Apache HTTP server upgrades. X-axis: week of server check, weekly from 6/17/2002 to 1/13/2003. Y-axis: % servers upgraded, 0 to 70. Series: upgrades to Apache 1.3.26 (Asia sites, ARL sites) and upgrades to Apache 1.3.27 (Asia sites, ARL sites).]


VRC Toolkit

• Identify tools for each stage (adopt, adapt, define, devise)

• Leverage existing tools; apply to longevity

• Analyze steps - automated and manual

• Formalize protocol

• Provide a framework to map existing tools, plug gaps with new developments


VRC Toolkit

Development steps:

– extensive literature review

– development of tool categories

– definition of categories and test protocols

– survey existing tools for evaluation

– select representative for testing

– highlight findings in category summaries


Web Crawling

• traversing Web sites via links

• a capability common to most tools, but with different purposes and results

• the VRC toolkit needs more than just Web crawlers
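
The core step of traversing a site via links, limited to pages within a boundary defined by a URL prefix, might look like the sketch below. It handles one already-fetched page and does no network access. The names `LinkCollector` and `links_in_boundary` are hypothetical, and a real crawler would also need fetching, politeness rules, and error handling.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_in_boundary(page_url, html, boundary):
    """Return absolute links from `html` that fall inside the URL `boundary` prefix."""
    collector = LinkCollector()
    collector.feed(html)
    in_scope = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so each page is listed once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(boundary) and absolute not in in_scope:
            in_scope.append(absolute)
    return in_scope
```

A crawl would repeat this for each returned link until no unvisited in-scope pages remain.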


Tool Categories

Link checkers

Web site monitors

Web crawlers

Site management

Change Management

Site Mapping (includes visualization)


OAIS Issues

• Pre-Ingest: Selection options

• Ingest: Capture

– vs. monitoring

– Targets, level and frequency

• Archival Storage: Formats

• Access: Site(s) vs. Page(s)

• AIP: Metadata issues


Management Issues

• frequency of capture, determined by

– nature of sites/pages

– events: technological, organizational

– resources

• well-informed crawling

• valuable vs. archival


Mandate

• to fully document the site by capturing all changes to the pages/sites

• to capture significant changes to pages/sites

• to record periodic versions of the site

• to capture a one-time copy of pages/sites
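
The four mandates imply different capture rules. The sketch below shows how a tool might decide whether to capture a page under each one. The function name `should_capture`, the mandate labels, and the `threshold` and `period_days` defaults are all hypothetical, and measuring "significant" change by text similarity is an assumption, not a VRC definition.

```python
import hashlib
from difflib import SequenceMatcher


def should_capture(mandate, previous, current, days_since_capture,
                   threshold=0.10, period_days=30):
    """Decide whether to capture `current` page content under a given mandate.

    `previous` is the last captured content (None if never captured).
    `threshold` and `period_days` are illustrative values only.
    """
    if previous is None:
        return True  # nothing captured yet; every mandate needs a first copy
    if mandate == "one_time":
        return False
    if mandate == "periodic":
        return days_since_capture >= period_days
    # Compare content digests to detect any change at all.
    changed = (hashlib.sha256(previous.encode()).hexdigest()
               != hashlib.sha256(current.encode()).hexdigest())
    if mandate == "all_changes":
        return changed
    if mandate == "significant_changes":
        # Fraction of content that differs, from difflib's similarity ratio.
        matcher = SequenceMatcher(None, previous, current, autojunk=False)
        return changed and (1.0 - matcher.ratio()) >= threshold
    raise ValueError(f"unknown mandate: {mandate}")
```

How often this check runs would itself depend on the nature of the sites, triggering events, and available resources.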


Current Activities

• VRC Preservation Risk Management Program:

– Map stages to tool requirements

– Apply to potential organizational scenarios

– Enable risk/response scenario development

• Toolkit:

– Revise and populate tool inventory

– VRC Control Site


Future Projects

• Develop approach for building human sexuality collection: capturing Web blogs and other Internet communications

• State Government Web site case study

• Demonstrators for toolkit scenarios


For Discussion

What would the VRC approach have to address to be of interest, value, and/or potential impact for archivists and records managers?