
MELLON E-JOURNAL ARCHIVING PROJECT

January 20, 2002

DIGITAL PRESERVATION

THE BIG ISSUE IN DIGITAL LIBRARIES

• Digital is inherently fragile
  – constant technological change yields a short life for all digital materials

• Nothing will be saved passively
  – preservation requires constant and conscious action

• A core role for research libraries in the digital era?

JOURNAL ARCHIVING IN THE PAPER ERA

• Large-scale redundancy

• Access copy and archival copy usually the same

• Not just storage, but preservation
  – includes environmental control, library binding, repair, reformatting...

• Deliberate, long-term archiving largely the role of national and research libraries

E-JOURNAL MODEL IS DIFFERENT

• “Copies” are remote, held in publisher systems
  – not replicated across different institutions

• Perpetual license provides limited comfort in the absence of independent copies

• Long-term preservation involves very different issues than day-to-day access

LACK OF ARCHIVING A GROWING PROBLEM

• Libraries bearing double costs
  – the e-journals users prefer
  – the paper for preservation

• Publishers cannot convert totally to digital
  – authors and editors distrust e-only journals because of concerns about persistence
  – libraries demand paper for preservation

• Libraries preserving the paper version, but the electronic version is more complete and increasingly the copy of record

MELLON E-JOURNAL ARCHIVING PROGRAM

• 13 institutions invited to submit proposals for a one-year planning project

• Six planning proposals were selected and funded in December 2000
  – an additional project focused on technology (LOCKSS) was also funded

• Second round of Mellon grants, to be announced in June, will fund actual implementation

SIX PLANNING PROJECTS

• Publisher-based
  – Harvard (Wiley, Blackwell, University of Chicago Press)
  – Penn (Oxford and Cambridge University Presses)
  – Yale (Elsevier)

• Discipline-based
  – Cornell (agriculture)
  – NYPL (performing arts)

• Dynamic e-journals
  – MIT

SOME BASIC ASSUMPTIONS

• Archive should be independent of publishers
  – the responsibility of institutions for whom archiving is a core mission

• Archiving requires active publisher partnership

• Address long timeframes (100 years?)

• Archive design based on the Open Archival Information System (OAIS) model

OBJECTIVES FOR PLANNING PROJECTS

• Develop draft archiving agreements with publisher partners

• Design technical architecture for an archive

• Formulate an acquisitions and growth plan

• Articulate access policies

• Address validation/certification

• Design an organizational model, staffing plan, and long-term funding model

Key planning issues/decisions…

BASE ON DL INFRASTRUCTURE

• Use existing infrastructure for storage, management, preservation, access

• Enhanced to comply with OAIS model

• New ingest and rendering functions

ARCHIVING AGREEMENT

• Explicit archiving license with publisher

• License addresses what content is archived, responsibilities of parties, conditions of use, economics

• Not always an easy negotiation
  – archiving involves handing the publisher’s intellectual property to an independent party

PUSH MODEL

• Publishers will “push” content to Harvard for archiving
  – ongoing, regular deposit following online publication of each issue
    • (what happens when issues disappear?)

WHAT CONTENT IS DEPOSITED?

• “Journal issues” are complex
  – publishers do not treat all journal content the same (e.g., “front matter” treated as web pages, not objects in content management systems)
  – “associated materials” (datasets, images, tables, etc.) not in the print versions
  – advertising usually dynamic, and can involve country-specific complexities

SOME COMMON STUFF

• Journal description
• Editorial board
• Instructions to authors
• Rights and usage terms
• Copyright statement
• Ordering information
• Reprint information
• Indexes
• Career information
• News
• Events lists
• Discussion fora
• Editorials
• Errata
• Reviewers
• Conference announcements

ARCHIVE MOST CONTENT

• Exclude little except advertisements
  – different from most “local loading”

• Articles include supplementary materials

• Include an “issue object” in addition to the article components
  – masthead, news, jobs, meetings, etc.

• Reference links problematic
  – dynamic, frequently separate from article
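
The “archive most content” policy can be sketched as a filter over one issue deposit. This is a minimal illustration, not the project's ingest code: `Component`, `IssueDeposit`, `build_archival_package`, and the `kind` labels are hypothetical names. The only rules taken from the slides are that advertisements are excluded, supplementary materials stay with their articles, and issue-level content (masthead, news, jobs, meetings) is gathered into a separate issue object.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical component labels; the slides describe these kinds of content
# but do not define a schema for them.
EXCLUDED_KINDS = {"advertisement"}
ISSUE_LEVEL_KINDS = {"masthead", "news", "jobs", "meetings"}


@dataclass
class Component:
    name: str
    kind: str  # e.g. "article", "supplementary", "masthead", "advertisement"


@dataclass
class IssueDeposit:
    journal: str
    issue: str
    components: List[Component] = field(default_factory=list)


def build_archival_package(deposit: IssueDeposit) -> Dict[str, List[str]]:
    """Drop advertisements; split the rest into an issue object and article components."""
    kept = [c for c in deposit.components if c.kind not in EXCLUDED_KINDS]
    return {
        # Issue-level material forms the "issue object".
        "issue_object": [c.name for c in kept if c.kind in ISSUE_LEVEL_KINDS],
        # Articles keep their supplementary materials alongside them.
        "articles": [c.name for c in kept if c.kind not in ISSUE_LEVEL_KINDS],
    }


deposit = IssueDeposit("Example Journal", "12(3)", [
    Component("a1.pdf", "article"),
    Component("a1-data.csv", "supplementary"),
    Component("masthead.html", "masthead"),
    Component("banner.gif", "advertisement"),
])
print(build_archival_package(deposit))
# {'issue_object': ['masthead.html'], 'articles': ['a1.pdf', 'a1-data.csv']}
```

Reference links, which the slides flag as problematic, are deliberately left out of this sketch.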

STANDARD ARCHIVAL ARTICLE DTD

• Publishers’ SGML formats vary widely

• Consultant report on practicality of a common archival XML DTD

• A common DTD dramatically reduces archive complexity

• Issues include
  – how low a common denominator?
  – extended character sets, formulae, etc.
  – sacrifice of functionality and original appearance
  – transformations involve risks

DEPOSIT MORE THAN ONE FORMAT?

• Archive must accept PDF in any case
  – so include both SGML and PDF when available?
    • belt and suspenders
  – inclined to do this

• Accept publisher’s original SGML also?
  – conversion to archival DTD will result in loss
  – inclined to not do this

“DARK-TO-LIGHT”

• Archived material not accessible at deposit
  – do not compete with publishers

• Content becomes accessible after “trigger event”
  – default then is universal access

• But how do you know “dark” archival content is still good?
  – it would be better if there were some ongoing access...
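
One common answer to the “is dark content still good?” question, not named on the slides, is periodic fixity checking: record a checksum for every file at ingest and recompute it during audits. A minimal sketch, assuming a hypothetical manifest that maps relative paths to SHA-256 digests; `sha256_of` and `audit_dark_content` are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large deposits need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_dark_content(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or no longer match their ingest checksum.

    `manifest` (hypothetical format) maps each path, relative to `root`,
    to the SHA-256 hex digest recorded when the file was ingested.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

An audit that returns an empty list shows only that the stored bits are intact; it says nothing about whether the content can still be rendered, which is why some ongoing access remains attractive.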

ACCESS MODEL

• Archived content always accessible to anyone with an appropriate license from the publisher
  – might be satisfied by batch export

• After trigger, simple online functionality
  – assume same functionality for auditors
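
The access rules above can be read as a small decision function. This is a sketch under stated assumptions: the `Role` values, the `access_mode` name, and the returned mode strings are hypothetical, and giving auditors online access before a trigger is one reading of “assume same functionality for auditors”.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    PUBLIC = "public"
    LICENSEE = "licensee"  # holds an appropriate license from the publisher
    AUDITOR = "auditor"    # verifies archived content (hypothetical role)


def access_mode(role: Role, triggered: bool) -> Optional[str]:
    """Return how a user may reach archived content, or None if it stays dark to them."""
    if triggered:
        # After a trigger event the default is universal, simple online access.
        return "online"
    if role is Role.AUDITOR:
        # Assumption: auditors get the same simple online functionality.
        return "online"
    if role is Role.LICENSEE:
        # Licensed users are always entitled; batch export might satisfy this.
        return "batch_export"
    return None
```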

TRIGGER EVENTS

• “N” years after deposit
  – “N” set by publisher title-by-title

• When title/year no longer commercially accessible on the Internet
  – still problematic with some publishers

• When content enters public domain
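
The three trigger events can be combined into one check: content goes “light” as soon as any of them occurs. A minimal sketch, assuming a hypothetical `TitleYear` record; “N” is modeled as an optional per-title embargo in years, and the commercial-availability and public-domain flags would have to come from some external process the slides do not specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TitleYear:
    """One title/year of deposited content (hypothetical record)."""
    deposit_date: date
    embargo_years: Optional[int]  # "N", set by the publisher title by title
    commercially_available: bool  # still offered on the Internet by a publisher
    public_domain: bool


def add_years(d: date, years: int) -> date:
    """Same month and day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_triggered(item: TitleYear, today: date) -> bool:
    """True once any trigger event has occurred, making the content 'light'."""
    if item.public_domain or not item.commercially_available:
        return True
    if item.embargo_years is not None:
        return today >= add_years(item.deposit_date, item.embargo_years)
    return False


item = TitleYear(date(2002, 1, 20), embargo_years=5,
                 commercially_available=True, public_domain=False)
print(is_triggered(item, date(2006, 12, 31)), is_triggered(item, date(2007, 1, 20)))
# False True
```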

PRESERVATION

• Format-by-format issue

• Archive specifies preferred formats, which will be kept renderable

• Just maintain bits for others
  – e.g., “associated materials” (datasets, models, etc.) generally accepted in ANY format
    • maintaining the viability of such wildly heterogeneous materials is unrealistic
  – keep unaltered for future “digital archeology”
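
The two-tier preservation policy amounts to classifying each deposited file by format. In this sketch the `PREFERRED_FORMATS` set is hypothetical (the slides say the archive specifies preferred formats but do not list them), and identifying formats by MIME type is an assumption; anything outside the set is stored unaltered as bits only.

```python
from typing import Dict

# Hypothetical preferred formats; the archive would publish its own list.
PREFERRED_FORMATS = {"application/pdf", "application/xml", "image/tiff"}


def preservation_plan(files: Dict[str, str]) -> Dict[str, str]:
    """Map each file name to 'renderable' (preferred format, kept usable over time)
    or 'bits-only' (stored unaltered for future digital archeology)."""
    return {
        name: "renderable" if mime.lower() in PREFERRED_FORMATS else "bits-only"
        for name, mime in files.items()
    }


print(preservation_plan({
    "article.pdf": "application/pdf",
    "model-output.dat": "application/octet-stream",
}))
# {'article.pdf': 'renderable', 'model-output.dat': 'bits-only'}
```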

ECONOMIC MODEL

• First question is not who pays, but what it will cost
  – reducing costs to the minimum is critical

• In general, publishers expected to bear preparation costs for archived objects

• Process automation critical to keeping costs low
  – ingest process
  – auditing

PAYMENT WITH DEPOSIT

• Two-part fee
  – ingest fee to cover up-front costs
    • varies with publisher effort to create easily archived objects?
  – “dowry” to create a maintenance endowment

• Sources include subscribers, authors, societies
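
The two-part fee can be made concrete with simple arithmetic. The sketch below is an assumption, not the project's pricing: the ingest fee is a per-item processing cost (which would fall as publishers deliver more easily archived objects), and the dowry is sized as a perpetual endowment whose annual payout covers maintenance, i.e. dowry = annual cost / payout rate, ignoring inflation and changing costs. The function name and all numbers are hypothetical.

```python
from typing import Tuple


def two_part_fee(items: int, ingest_cost_per_item: float,
                 annual_maintenance_cost: float,
                 payout_rate: float) -> Tuple[float, float]:
    """Return (ingest_fee, dowry) for one deposit."""
    if not 0 < payout_rate < 1:
        raise ValueError("payout_rate must be between 0 and 1")
    # Up-front costs scale with the number of objects to ingest.
    ingest_fee = items * ingest_cost_per_item
    # Endowment whose yearly payout covers maintenance indefinitely.
    dowry = annual_maintenance_cost / payout_rate
    return ingest_fee, dowry


# Hypothetical numbers: 100 articles at $2.00 each; $50/year upkeep at a 5% payout.
fee, dowry = two_part_fee(100, 2.0, 50.0, 0.05)
print(f"ingest fee ${fee:,.2f}, dowry ${dowry:,.2f}")
# ingest fee $200.00, dowry $1,000.00
```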

NEXT...

• Proposal to Mellon by April 1 for funding to implement an archive
  – particular parameters of the call for proposals still uncertain

• Original plan suggested 3- or 4-year projects

• Intent is to implement the archive, contract for deposit, begin operations
  – learn by getting our hands dirty
  – help understand issues, costs