Access to Digital Back Copy


Uploaded by: edina-university-of-edinburgh

Posted on 10-Jul-2015


TRANSCRIPT

Page 1: Access to Digital Back Copy

Access to Digital Back Copy

http://www.flickr.com/photos/shinez/5000985919/

Page 2: Access to Digital Back Copy

Our Shared Task is to ensure researchers, students & their teachers have ease of access, and continuing access, to online resources for scholarship:

• licence to use
• “ease” = usability: access to content & tools
• “continuing” = preservation

Page 3: Access to Digital Back Copy

What was once available in print, on-shelf locally …

… is now online & accessed remotely, ‘anytime/anywhere’, exploiting the telematic opportunity! (1990s Euro-speak)

We’ve seen improved Ease of Access.

But what of Continuity of Access?

Page 4: Access to Digital Back Copy

Back copy was once available in print, on-shelf locally (or via that tedious ILL, interlibrary loan) …

Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/

… so the question is: where exactly is the digital back copy?

Page 5: Access to Digital Back Copy

… not in the custody of Libraries

Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/

Libraries boast of ‘e-collections’, but maybe they only have ‘e-connections’

=> a real & present threat to the integrity of what is published as the scholarly record

Page 6: Access to Digital Back Copy

Ensuring access to digital back copy: the following questions are implicit:

1. What exactly was once on library shelves, & what exactly is the scholarly record? … and where is it now?

Page 7: Access to Digital Back Copy

Ensuring access to digital back copy: the following questions are implicit:

1. What exactly was once on library shelves, & what exactly is the scholarly record?
2. What is now ‘on the Web’? … or rather, what was once ‘on the Web’?

Page 8: Access to Digital Back Copy

Ensuring access to digital back copy: the following questions are implicit:

1. What exactly was once on library shelves, & what exactly is the scholarly record?
2. What is now ‘on the Web’?
3. What of other (external) resources, now issued online & needed for scholarship, e.g. Gov. Docs, the cultural record?

Page 9: Access to Digital Back Copy

Ensuring access to digital back copy: the following questions are implicit:

1. What exactly was once on library shelves, & what exactly is the scholarly record?
2. What is now ‘on the Web’?
3. What of other (external) resources needed for scholarship, e.g. Gov. Docs, the cultural record?
4. & whose responsibility is it to archive content? Each research library; consortia; national/state libraries/archives?

& is this a national, or a trans-national, challenge?

Page 10: Access to Digital Back Copy

What every country should know: trans-national action!

Percentage of the 132,806 ISSNs issued for e-serials (December 2013), by country:

• US: 20%
• UK: 9%
• Brazil: 6%
• Germany: 6%
• Canada: 5.5%
• Spain: 5%
• Rest of World: > 50%

Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own.

Page 11: Access to Digital Back Copy

Ensuring researchers, students and their teachers have ease of access, and continuing access, to online resources used for scholarship:

• licence to use
• “ease” = usability: access to content & services
• “continuing” = preservation: security & integrity of medium; replication; usability of format; back content; semantic drift; archiving; trust & verification

Access to Digital Back Copy: Search for digital shelving …

Page 12: Access to Digital Back Copy

Reflect upon a landmark, 10+ years ago.

Its editor, Linda Cantara [Abbott], passed away on 22 August 2013.

Page 13: Access to Digital Back Copy

Her summary of “responsibility for archiving the content of electronic journals” involved some familiar organisational names.

And so began different investigations, all addressing key issues:
• Identification of what should be archived
• Guidelines for accessing e-journal archives
• Development of sustainable economic and business models

Page 14: Access to Digital Back Copy

The result includes some digital shelves:

a. Web-scale not-for-profit archiving agencies …
b. National libraries …
c. Research libraries: consortia & specialist centres …

… alongside other Keepers with archival intent:

National Science Library, Chinese Academy of Sciences

Different models

100+

Page 15: Access to Digital Back Copy

Many archiving organisations: a Good Thing

“Digital information is best preserved by replicating it at multiple archives run by autonomous organizations”

B. Cooper and H. Garcia-Molina (2002)

Bad stuff will happen!
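The replication argument can be put as a rough calculation. This is a minimal sketch that assumes, purely for illustration, that each archive loses a given title independently and with the same probability; in practice failures (shared software, shared funding, a publisher withdrawing content) can be correlated, which is why the quotation stresses autonomous organisations. The per-archive loss probability used here is an invented figure, not a measured one.

```python
def probability_all_copies_lost(p_loss: float, copies: int) -> float:
    """Chance that every copy of a title is lost.

    Assumes each archive fails independently with the same probability
    p_loss: an idealisation that autonomous organisations help approach.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_loss ** copies


if __name__ == "__main__":
    assumed_p_loss = 0.1  # hypothetical per-archive loss probability
    for k in (1, 2, 3):
        risk = probability_all_copies_lost(assumed_p_loss, k)
        print(f"{k} keeper(s): P(all copies lost) = {risk:.3f}")
```

With the assumed 10% figure this prints 0.100, 0.010 and 0.001: under independence each extra keeper cuts the risk tenfold, which is one way to read the later use of 3+ keepers as a 'KeepSafe' threshold.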

Page 16: Access to Digital Back Copy

The following themes recur:

1. Identify Threat & Seek Remedy ✔
2. What’s the (scale of the) Present Danger?
   • How do we know?
3. What’s the Remedy?
   • How best to implement the remedy?
4. Monitor progress / Reflect / Re-think
5. Repeat ↵

Moving towards some practical steps …

Page 17: Access to Digital Back Copy

… to discover who is looking after what

*What’s New in 2014 and what’s coming*

• e.g. Library of Congress and Scholars Portal now reporting in
• New functionality
• Evidence of what is archived

Page 18: Access to Digital Back Copy

Keepers Registry: an online service that has:

• free-to-web facilities:
  – search and browse by serial title, ISSN and by publisher
  – ‘Holdings statement’: issues & volumes
  – summary statistics; date of last update for each ‘Keeper’

+

• a Members Area [enabling additional functionality]:
  – check archival status of a list of ISSNs
  – machine (API) interfaces, e.g. OpenURL link [3rd-party website]
  – statistics, beyond those provided on the simple user interface
• the Keepers Area [to be ‘co-designed’]
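Checking the archival status of a list of ISSNs goes more smoothly if the list is cleaned first. A minimal sketch, assuming the input is a plain list of strings: it applies the standard ISSN check-digit rule (weights 8 down to 2, modulus 11, with 'X' standing for 10) and normalises each entry to the NNNN-NNNC form. The function names are illustrative only, not part of any Keepers Registry interface.

```python
from typing import List, Optional, Tuple

DIGITS = "0123456789"


def issn_check_character(first_seven: str) -> str:
    """Check character for the first seven ISSN digits (weights 8..2, mod 11)."""
    if len(first_seven) != 7 or any(c not in DIGITS for c in first_seven):
        raise ValueError("expected exactly seven ASCII digits")
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def normalise_issn(raw: str) -> Optional[str]:
    """Return 'NNNN-NNNC' for a valid ISSN, or None if malformed or failing the check."""
    compact = raw.strip().upper().replace("-", "")
    if len(compact) != 8 or any(c not in DIGITS for c in compact[:7]):
        return None
    if compact[7] != issn_check_character(compact[:7]):
        return None
    return f"{compact[:4]}-{compact[4:]}"


def split_issn_list(raw_list: List[str]) -> Tuple[List[str], List[str]]:
    """Separate a title list into valid (normalised) ISSNs and rejected entries."""
    valid, rejected = [], []
    for raw in raw_list:
        issn = normalise_issn(raw)
        if issn is None:
            rejected.append(raw)
        else:
            valid.append(issn)
    return valid, rejected
```

For example, split_issn_list(["0378-5955", "2434561x", "0378-5954"]) keeps the first two (normalised to "0378-5955" and "2434-561X") and rejects the third, whose check digit is wrong.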

Page 19: Access to Digital Back Copy

Keepers Registry: an online service that is …

Needed & wanted by one or more Use Communities:

1. the means to discover who is looking after what, how & on what access terms
2. the lens on what is being kept safe => what is at risk of loss
3. a showcase for archival organisations of all types, worldwide.

Sustainable …
• Technologically: the software/hardware/data
• Organisationally: EDINA & ISSN IC, Jisc Core Service
• Financially: costs understood; has recurrent revenue

… and has successfully made the transition to a sustainable service!

Page 20: Access to Digital Back Copy

Need to know who is looking after what & how?

The Keepers Registry

A Project to Pilot an E-journal Preservation Registry Service

Project Data Model:

(a) ISSN Register: METADATA on extant e-serials
(b) Digital Preservation Agencies: METADATA on preservation action
    Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance

(a) + (b), with ISSN-L as kernel field, according to user requirements => E-Journal Preservation Registry (E-J Preservation Registry Service)

Serials, March 2009; “Tales from the Keepers Registry”, Serials Review 39.1 (2013)
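The data model joins two sources on ISSN-L, the linking ISSN that groups the different medium versions (print, online) of one serial. A minimal sketch of that join, assuming a register extract that maps each ISSN to its ISSN-L, and keeper reports that may cite either the print or the online ISSN. All identifiers and keeper names below are placeholders, not real data, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical ISSN Register extract: each ISSN -> its ISSN-L (placeholder values).
ISSN_TO_ISSN_L = {
    "1111-1111": "1111-1111",  # print ISSN of serial A (also its ISSN-L)
    "2222-2222": "1111-1111",  # online ISSN of serial A
    "3333-3333": "3333-3333",  # serial B
}

# Hypothetical preservation-action reports: (keeper, ISSN as reported).
KEEPER_REPORTS = [
    ("Keeper 1", "1111-1111"),
    ("Keeper 2", "2222-2222"),
    ("Keeper 3", "3333-3333"),
    ("Keeper 3", "9999-9999"),  # not found in the register extract
]


def keepers_by_issn_l(reports, issn_to_issn_l):
    """Group keepers under the ISSN-L of the serial they report on."""
    grouped = defaultdict(set)
    unmatched = []
    for keeper, issn in reports:
        issn_l = issn_to_issn_l.get(issn)
        if issn_l is None:
            unmatched.append((keeper, issn))  # left for manual matching
        else:
            grouped[issn_l].add(keeper)
    return dict(grouped), unmatched


if __name__ == "__main__":
    grouped, unmatched = keepers_by_issn_l(KEEPER_REPORTS, ISSN_TO_ISSN_L)
    for issn_l, keepers in sorted(grouped.items()):
        print(issn_l, sorted(keepers))
    print("unmatched:", unmatched)
```

Without the ISSN-L step, serial A would appear as two titles each held by one keeper, rather than as one title held by two.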

Page 21: Access to Digital Back Copy

10 Questions & Some Short Answers

1. What type of resources are recorded in the Keepers Registry?

Very short answer: Serial content

The streams of content (in digital form) that are:
• issued online in parts (e.g. journal content)
• issued through change over time (e.g. web pages).

The Registry follows the rules used for ISSN assignment. Such serial titles include:
• digitised journal content as well as born-digital
• e-books that are issued as a series (having an ISSN)
• contents of selected websites
• what may be made available via repositories.

Page 22: Access to Digital Back Copy

10 Questions & Some Short Answers …

2. Is the purpose of the Registry MAINLY to record ‘scholarly resources’?
• and does that also mean cultural heritage resources?

Very short answer: That was the motivation, but …

Page 23: Access to Digital Back Copy

The Scholarly Record & Serials … [not to scale]

[Diagram labels: ‘The Scholarly Record’; ‘resources needed for scholarship’; Continuing Resources, divided into Issued in Parts (Serials), e.g. ‘e-journals’, and Content changes over time (Integrating), e.g. Websites, Databases, Repositories; also ‘Book-length work’ and ‘Gov Docs’.]

Page 24: Access to Digital Back Copy

10 Questions & Some Short Answers (cont.)

3. Why does the Keepers Registry have a global remit, and why not national registries?
• Researchers (& libraries/publishers) in any one country are dependent upon content written & published as serials in countries other than their own

4. Does the Keepers Registry intend to carry out audit or certification?
• No, but each ‘keeper’ can report such information

5. What granularity is recorded about archived content?
• Issue & volume (& year if available)
• Not article-level, although keepers can report at that level

Page 25: Access to Digital Back Copy

10 Questions & Some Short Answers (cont.)

6. Is thekeepers.org only intended for librarians and policy-makers, or also for individual scholars?
• Open to all, but geared to librarians who would be stewards

7. What is meant by archived, and is this the same as preserved?
• Someone is keeping it with archival intent; preservation levels?

8. Can the Keepers Registry help print archiving initiatives?
• It already assists UK Research Reserve

9. Can the Keepers Registry help digitisation initiatives?

10. And what about the Internet Archive?
• Interesting you should ask: the ability to ‘see the streams’?

Page 26: Access to Digital Back Copy

What’s the (scale of the) Present Danger? How do we know?

Titles ‘ingested & archived’ by at least 1 ‘keeper’, as recorded in the Keepers Registry:
• 2011: 16,558
• 2013: 21,557
• November 2014: 26,195

9,656 ‘ingested & archived’ by 3+ keepers

More archives reporting into the Registry & more archiving!

Page 27: Access to Digital Back Copy

“Are we there yet?” … “Don’t think so”

‘Ingest Ratio’ = titles being ingested by one or more Keepers / ‘online serials’ in ISSN Register
= 26,195 / 136,965 [in March 2014]
=> 19% (we do not know the archival status of the other 80% or so of e-serials having an ISSN)

‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register
= 9,656 / 136,965
=> 7%
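Both ratios are simple quotients. Recomputing them from the figures on the slide confirms the rounding; this is only a check of the arithmetic, using the numbers exactly as given.

```python
ONLINE_SERIALS = 136_965     # 'online serials' in the ISSN Register (slide figure)
INGESTED_BY_1_PLUS = 26_195  # titles ingested by one or more Keepers
INGESTED_BY_3_PLUS = 9_656   # titles ingested by three or more Keepers


def ratio(part: int, whole: int) -> float:
    """Share of the ISSN Register's online serials covered by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


ingest_ratio = ratio(INGESTED_BY_1_PLUS, ONLINE_SERIALS)
keepsafe_ratio = ratio(INGESTED_BY_3_PLUS, ONLINE_SERIALS)

if __name__ == "__main__":
    print(f"Ingest Ratio:   {ingest_ratio:.0%}")             # 19%
    print(f"KeepSafe Ratio: {keepsafe_ratio:.0%}")           # 7%
    print(f"Status unknown: {1 - ingest_ratio:.0%}")         # 81%
```

The last line gives 81%, which the slide rounds to "about 80%".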

Page 28: Access to Digital Back Copy

Evidence using the Title List Comparison tool

As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178 & https://www.era.lib.ed.ac.uk/handle/1842/6682

In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke) checked the archival status of serial titles they regarded as important:

‘Ingest Ratio’ = 22% to 28%, i.e. about a quarter

=> the fate of c.75% is unknown
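A title list comparison of this kind reduces to set arithmetic: which of the titles a library regards as important appear among those reported as archived? A sketch, assuming both lists are already normalised ISSNs (ideally ISSN-Ls); the function name and the sample identifiers are hypothetical, and this is not a description of how the Registry's own tool is built.

```python
def compare_title_list(library_issns, archived_issns):
    """Return (archived, unknown, ingest_ratio) for a library's title list.

    'unknown' means no keeper reports the title, so its fate is unknown;
    it does not mean the title is known to be lost.
    """
    wanted = set(library_issns)
    if not wanted:
        raise ValueError("title list is empty")
    archived = wanted & set(archived_issns)
    unknown = wanted - archived
    return sorted(archived), sorted(unknown), len(archived) / len(wanted)


if __name__ == "__main__":
    # Placeholder ISSNs standing in for a library's 'important' titles.
    library = ["0000-0001", "0000-0002", "0000-0003", "0000-0004"]
    reported_archived = ["0000-0002", "0000-0009"]
    archived, unknown, r = compare_title_list(library, reported_archived)
    print(archived, unknown, f"{r:.0%}")
```

Here one title in four is reported as archived (25%), leaving three whose fate is unknown, the same shape of result as the 22% to 28% found by the three libraries.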

Page 29: Access to Digital Back Copy

BIG publishers act early but incompletely.

Priority: find an economic way to archive content from …

… very many ‘at risk’ e-journals from many small publishers.

Page 30: Access to Digital Back Copy

User-centric Evidence

… with usage logs for the UK OpenURL Router*:
• 8.5m full-text requests in the UK during 2012
• => 53,311 online titles requested

Analysis in 2013:

‘Ingest Ratio’ = 32% (16,985 / 53,311)

=> over two thirds, 68% (36,326 titles), held by none!

* As reported in the Keepers Registry Blog. The OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; it was developed & delivered by EDINA as part of Jisc support for UK universities & colleges.

Next Step is to focus on the ‘scholarly record’?
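This user-centric evidence starts from resolver requests rather than from a library's title list. A minimal sketch, assuming log entries are OpenURL 1.0 (Z39.88-2004) key/encoded-value URLs carrying rft.issn or rft.eissn; the resolver host and log lines are invented, and real Router logs may be formatted differently. It collects the distinct requested ISSNs and then re-checks the slide's own arithmetic.

```python
from urllib.parse import parse_qs, urlsplit


def requested_issns(openurl_log):
    """Distinct ISSNs found in the rft.issn / rft.eissn keys of OpenURL requests."""
    found = set()
    for url in openurl_log:
        params = parse_qs(urlsplit(url).query)
        for key in ("rft.issn", "rft.eissn"):
            for value in params.get(key, []):
                found.add(value.strip().upper())
    return found


def held_by_none(requested_count: int, archived_count: int):
    """Titles requested but reported by no keeper, as a count and a share."""
    missing = requested_count - archived_count
    return missing, missing / requested_count


if __name__ == "__main__":
    sample_log = [  # hypothetical log lines
        "https://resolver.example.org/?url_ver=Z39.88-2004&rft.issn=0378-5955&rft.volume=1",
        "https://resolver.example.org/?url_ver=Z39.88-2004&rft.eissn=0378-5955",
        "https://resolver.example.org/?url_ver=Z39.88-2004&rft.issn=1234-5679",
    ]
    print(sorted(requested_issns(sample_log)))

    # Figures from the slide: 53,311 titles requested, 16,985 ingested somewhere.
    missing, share = held_by_none(53_311, 16_985)
    print(missing, f"{share:.0%}")
```

The final line reproduces 36,326 titles and 68%. In a real analysis the print and online ISSNs of one title would first be mapped to a single ISSN-L, or the same serial would be counted twice.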

Page 31: Access to Digital Back Copy

Imagine CNI 2020

• Best Case scenario
  – Publishers (& Libraries) have acted
  – Together with the Keepers they have ensured that all the e-journal content used by researchers this year (in 2014) has been preserved and can be used successfully in 2020

Page 32: Access to Digital Back Copy

Imagine CNI 2020

Page 33: Access to Digital Back Copy

Added remarks from related projects:

• Keepers Extra: 2-year investment by Jisc to ensure that the Keepers Registry is all it can be
• Hiberlink: investigation into the threat of ‘reference rot’, with a bonus: report of potential remedy
  – with thanks to the Andrew W. Mellon Foundation
• SafeNet: 2-year investigation for Jisc into a PLN (Private LOCKSS Network) for the UK, with part focus on ‘post-cancellation access’

Page 34: Access to Digital Back Copy

Keepers Extra: 2-year (Jisc) Project

Builds on the work of the eJournal Archiving Group run by Jisc in 2012/13 (we may re-name this project as JARVIG):

• Assign priority of attention: collection judgement & decisions
• Provide librarians with a toolkit relating to collection coverage, using the Keepers Registry
• R&D on data quality and metadata challenges
  – might lead to service enhancements for the Keepers Registry
  – improve ‘holdings display’
• Governance?
• Extend the Keepers Registry model
  – to recognise identifiers other than ISSN (URN?)
  – a model for how other types of scholarly content are kept safe?

Page 35: Access to Digital Back Copy

Two-year project funded by the Andrew W. Mellon Foundation: we have something to report now & will have yet more to say in 2015.

‘Reference Rot’: when what was referenced & cited ceases to say the same thing, or ‘has ceased to be’ (http://www.snorgtees.com/this-parrot-has-ceased-to-be)

… undermining the integrity of what is published

Page 36: Access to Digital Back Copy

An International Team at Work, funded by the Andrew W. Mellon Foundation

• Los Alamos National Laboratory:
  Research Library: Martin Klein, (Rob Sanderson), Harihar Shankar, Herbert Van de Sompel

• University of Edinburgh:
  Language Technology Group: Beatrice Alex, Claire Grover, Richard Tobin, Ke “Adam” Zhou
  EDINA*: Neil Mayo, Muriel Mewissen (Project Manager), Christine Rees, Tim Stickland, Richard Wincewicz, Peter Burnhill

* EDINA: Centre for Service Delivery & Digital Expertise

Page 37: Access to Digital Back Copy

Reference Rot = Link Rot + Content Drift

“when links to web resources no longer point to what they once did”

Investigating Reference Rot in Web-Based Scholarly Communication

Page 38: Access to Digital Back Copy

Link Rot


Page 39: Access to Digital Back Copy

+ Content Drift: what is at the end of the URI has changed, or gone!

[Screenshots of http://dl00.org as it appeared in 2000, 2004, 2005 and 2008]

(a) Dynamic content, as values on a web page change over time

(b) Static content, but very different (often unrelated) web pages
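The two failure modes can be told apart mechanically, at least naively. A sketch, assuming we stored a fingerprint (here a SHA-256 hash) of the cited page when it was referenced and have since re-fetched it, recording the final HTTP status after any redirects (None if the host no longer responds). Exact hashing is deliberately crude: case (a), dynamic content, will register as drift even when the page is still "the same", so a real study would need a fuzzier comparison. This is not Hiberlink's method; it only illustrates the definition Reference Rot = Link Rot + Content Drift, and all names are hypothetical.

```python
import hashlib
from enum import Enum
from typing import Optional


class ReferenceState(Enum):
    WORKS = "works as intended"
    LINK_ROT = "link rot"
    CONTENT_DRIFT = "content drift"


def fingerprint(content: bytes) -> str:
    """Hash of the page body; identical bytes give identical fingerprints."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_fingerprint: str,
                       status_now: Optional[int],
                       content_now: bytes = b"") -> ReferenceState:
    """Classify a cited URI by comparing its state now with when it was cited.

    status_now is assumed to be the status after following redirects.
    """
    # No response at all, or an HTTP error such as 404 'Page Not Found'.
    if status_now is None or status_now >= 400:
        return ReferenceState.LINK_ROT
    # The URI resolves (perhaps via a re-direct) but serves something else.
    if fingerprint(content_now) != cited_fingerprint:
        return ReferenceState.CONTENT_DRIFT
    return ReferenceState.WORKS
```

For instance, with cited = fingerprint(b"original page"), a later 404 is classified as link rot, while a 200 that serves b"different page" is content drift.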

Page 40: Access to Digital Back Copy

What of the references to Web resources that were cited in the landmark publication?

Page 41: Access to Digital Back Copy

11 years later, few references work as intended

Page 42: Access to Digital Back Copy

A re-direct [from RLG to OCLC] but ‘content drift’

Fail !!

Page 43: Access to Digital Back Copy

Reference no longer works: ‘link rot’

Fail !!

Page 44: Access to Digital Back Copy

Reference no longer works: ‘link rot’

Fail !!

Page 45: Access to Digital Back Copy

A re-direct but content not found

Fail !!

Page 46: Access to Digital Back Copy

Successful link: URI works as expected

Page 47: Access to Digital Back Copy

Successful link: URI works as expected

Page 48: Access to Digital Back Copy

Classic link rot: ‘Page Not Found’

Fail !!

Page 49: Access to Digital Back Copy

Reference to the Web is to an e-journal that is still current

Page 50: Access to Digital Back Copy

Classic link rot: ‘Page Not Found’

Fail !!

Page 51: Access to Digital Back Copy

URI works but content drift: reference is not as intended

Fail !!

Page 52: Access to Digital Back Copy

This is a Threat to The Integrity of

The Scholarly Record

hiberlink.org

Page 53: Access to Digital Back Copy

What we are doing in Hiberlink:

1. Creating evidence on the extent of ‘Reference Rot’
   – Main focus has been on references (& URIs) made in journal articles
     • incl. reference rot in Supreme Court judgments, with Harvard Law Library & Perma.cc
   – ETD2014 was an opportunity to look at Reference Rot & the e-Thesis
   – PRELIDA is an opportunity to look at the impact on Linked Data
2. Understanding the preparation/publication/ingest workflow(s)
   – Identifying opportunities for productive intervention
3. Prototypes for pro-active archiving to enable remedy
   – Embedding such ‘solutions’ in existing tools & infrastructure
   – Propose/test new infrastructure for temporal referencing
     • supporting & using the Memento protocol
4. Raising awareness & seeking collaborative actions … through events like this
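Temporal referencing with Memento (RFC 7089) works by datetime negotiation: a client asks a TimeGate for the archived version of an original URI closest to a given moment, by sending an Accept-Datetime header in RFC 1123 date format. A minimal sketch that only builds such a request with the Python standard library and does not send it; the TimeGate address and the "append the original URI" convention are assumptions for illustration, not a particular service's documented interface.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime_value(when: datetime) -> str:
    """Format an aware datetime as Memento's Accept-Datetime expects (RFC 1123, GMT)."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request(timegate: str, original_uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request.

    Assumes a TimeGate that takes the original URI appended to its base URL.
    """
    return Request(timegate + original_uri,
                   headers={"Accept-Datetime": accept_datetime_value(when)})


if __name__ == "__main__":
    req = timegate_request("https://timegate.example.org/",  # placeholder TimeGate
                           "http://dl00.org/",
                           datetime(2000, 6, 1, tzinfo=timezone.utc))
    print(req.full_url)
    print(req.header_items())
```

urllib stores the header name as "Accept-datetime"; HTTP header names are case-insensitive, so this is harmless. A TimeGate typically answers by redirecting to the selected Memento, whose own response carries a Memento-Datetime header.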

Page 54: Access to Digital Back Copy

Remedy for The Integrity of The Scholarly Record

Envisage the best opportunities for intervention to make remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’.

3 basic workflows:
a. Study: Preparation -> (Review) -> Submission
b. Publication: Editorial -> (Revision) -> Acceptance -> Issue
c. Post-Publication: Deposit/Ingest -> Provide/Access -> Use

Identify the Actors involved in:
a. Composition: author/creator
b. Public Release: editor/referee/copy-editor
c. Curation: librarian / repository manager / archivist

Page 55: Access to Digital Back Copy

‘Work in progress’ to effect Remedy (1)

1. Hiberlink Plug-in: to help authors and middle-folk (publishers/librarians) do the right thing:
   – Zotero: used by authors to manage references
     https://www.zotero.org/
   – Open Journal Systems (OJS): used by OA publishers
     https://pkp.sfu.ca/ojs/

Page 56: Access to Digital Back Copy

Hiberlink Plug-in for Zotero:
a. Triggers archiving of referenced web content
b. Returns a Datetime URI for the archived content

For use during preparation of a thesis & before final submission, but also before deposit with the Library (& maybe for repair by the Library …)

Page 57: Access to Digital Back Copy

‘Work in progress’ to effect Remedy (2)

1. Hiberlink Plug-in: to enable pro-active archiving
2. Missing Link: re-factor the HTML link that is returned
   a) Take a simple URI, to the French National Library (say)
   b) Augment the link with a set of Datetime & location pairs
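The Missing Link step can be sketched as a function that keeps the original href and attaches dated archive locations. The attribute names below (data-versiondate-1, data-versionurl-1, and so on) are hypothetical, chosen only for illustration, since the slide does not give the project's markup; the archive URL is a placeholder. Values are HTML-escaped so that arbitrary URIs cannot break the attribute syntax.

```python
from html import escape


def augment_link(original_uri, link_text, versions):
    """Return an <a> element whose href is the original URI, augmented with
    (datetime, archived-location) pairs as hypothetical data-* attributes."""
    attrs = [f'href="{escape(original_uri, quote=True)}"']
    for i, (when, where) in enumerate(versions, start=1):
        attrs.append(f'data-versiondate-{i}="{escape(when, quote=True)}"')
        attrs.append(f'data-versionurl-{i}="{escape(where, quote=True)}"')
    return f"<a {' '.join(attrs)}>{escape(link_text)}</a>"


if __name__ == "__main__":
    print(augment_link(
        "http://www.bnf.fr/",
        "French National Library",
        [("2014-11-01", "https://archive.example.org/20141101/http://www.bnf.fr/")],
    ))
```

The design keeps the plain href untouched, so a browser that ignores the extra attributes still follows the original link, while a script aware of them could offer the archived copies when the original fails.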

Page 58: Access to Digital Back Copy

‘Work in progress’ to effect Remedy (3)

1. Hiberlink Plug-in: to enable pro-active archiving
2. Missing Link: re-factoring the HTML link

The first two approaches support the ‘perfect scenario’:
• All authors archive all their cited URIs
• e.g. (but not exclusively) with Hiberlink / Zotero

3. HiberActive
   – Enables repositories to ‘stop the rot’ by actively archiving those references in e-theses
   – A notification hub, a component for the infrastructure
     • testing workflow with ResourceSync, CORE & external archive programme

Page 59: Access to Digital Back Copy

Back copy was once available in print, on-shelf locally (or via that tedious ILL) …

Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/

… so the question is: where exactly is the digital back copy?

• In scholarly e-journals; in alternative ‘scholarly’ & other Web venues
• That which supports the scholarly statement: references / citations, in scholarly e-journals and on the ‘Web at Large’

Page 60: Access to Digital Back Copy

Meanwhile: Promote & engage the real heroes!

a. Web-scale not-for-profit archiving agencies …
b. National libraries …
c. Research libraries: consortia & specialist centres …

National Science Library, Chinese Academy of Sciences

100+

Page 61: Access to Digital Back Copy

What you can also do today!

1. Engage now with the real heroes of this story: those that provide digital shelving
2. Go to the Keepers Registry => thekeepers.org
   • Search on Title/ISSN: check key volumes & issues are being archived
   • Browse by publisher
3. Sign up to test the new Member Services:
   • Title List Comparison tool: are your titles actually being archived?
   • & check archival status for ISSNs listed in citations
   • Linking Options for ‘archival status’ on your website

Page 62: Access to Digital Back Copy

BIG publishers act early but incompletely.

Priority: work with other organisations to find an economic way to archive content from …

… very many ‘at risk’ e-journals from many small publishers, including Gov Docs!

Page 63: Access to Digital Back Copy

Access to Digital Back Copy

http://www.flickr.com/photos/shinez/5000985919/

Thank you