Cambridge University Library ESS update for UCS

From Books to Bits - IT developments in the University Library

Ed Chamberlain - Systems Development Librarian

Overview

1. UL and the ‘shift to the digital’

2. ESS team and work

3. Resource Discovery service

4. COMET – Cambridge Open METadata

Cambridge University Library …

One of six legal deposit libraries

6.5+ million items

1 major site

‘Mausoleum of dusty old books for the humanities’

4 ‘dependent libraries’

Wider group of college and departmental libraries – not a federation or single service

Shift to the digital

Approx. 40% of the UL and dependents' materials budget is now spent on online resources (ejournals, database subscriptions)

e.g. Science Direct, Web of Knowledge, JSTOR, CUP ejournals

Majority of this on STEM publications

Reliance on subscription content housed on publishers' websites

Increasing cost

Shift to the digital

Legal deposit electronic intake to start in the next year

Publishers can submit electronic versions of material for legal deposit to the legal deposit agency

Initially voluntary for periodicals only

Dependent on law being passed

Internal digitisation

Digitising special collections for some time with external funding

No Google books project

Planning for a unified digital library

DSpace

Organisational change

New skills base in staff

Changes in buildings

Changes in services

Changes in approach

UL divisional layout

Who we are?

30+ staff in the UL

Led by Patricia Killiard

Mix of skills and backgrounds - (Librarians, I.T. Officers, Developers, Early Career Researchers ...)

ESS – major areas of activity

Two recent projects …

Both in similar area – how library readers can find our stuff, but very different in tone and scope

Resource Discovery platform – commercial software acquisition and implementation (2008-2010)

COMET (Cambridge Open METadata) – JISC funded exercise in publishing linked open data (2011)

Resource Discovery

What do you mean by Resource Discovery?

Catalogue alone does not represent the true scope of library resources

Library catalogues of print collections (Newton)

Online article databases

Abstract only – Web of Knowledge, Scopus

Full text – JSTOR, Science Direct, journal publisher sites etc.

A-Z of ejournal titles

Ebook websites

Repository content

Archive catalogue

Other stuff (content on our websites)

Problems with Newton

Newton – traditional library catalogue:

Replicates Author / Title / Subject card index on the web

Tied into Voyager – part of the same application stack as library ‘back office’

Cambridge setup fragmented by databases (e.g. colleges A-N)

Trend in search towards:

Keyword based searching

Initial ‘dumb’ search – refine afterwards

Is this a good thing?

Google generation?

‘Although young people demonstrate an ease and familiarity with computers, they rely on the most basic search tools and do not possess the critical and analytical skills to assess the information that they find on the web.’

‘The study calls for libraries to respond urgently to the changing needs of researchers and other users and to understand the new means of searching and navigating information. Learning what researchers want and need is crucial if libraries are not to become obsolete, the report warns.’

Nicholas, D., et al. "The Google generation: the information behaviour of the researcher of the future." Aslib Proceedings 60.4 (2008): 290-310.

Diminishing brand?

Many students expressed low levels of awareness of electronic resources, combined with a high use of Google.

Very few undergraduate students identified librarians as a source of either recommendations, or of help in searching for information.

However, they regarded the library as a key source of information material, and as a useful study space.

Information Skills Provision: Mapping the information skills of Cambridge undergraduates and induction / training provision across the University. Lizz Edwards-Waller, 2009 (http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf)

Two main types of response …

Attempt to ‘educate them’

Try and adapt our resources and mechanisms to better suit their needs

Information rich, time poor

What could we do?

Adopt the newer trend of library ‘resource discovery software’

Common features

Recognize that library resources do not end at the catalogue

Harvest resources from ‘silos’ (catalogue, repository) etc.

Separate front end application from backend

What we did

Went to tender (full EU):

Open source options look promising now, but not there at the time

We wanted to move quickly

Reached decision by June 2009

What happened

Five months of contract negotiation – signed in October 2009

Hardware purchased December 2009

Software installed January 2010

Live by August 2010

What we got

Aquabrowser – used by Harvard, Edinburgh, York, Chicago, National Libraries of Wales and Scotland

Scalable, proven

Minimal hardware requirements

Relatively user friendly

Affordable

How does it work?


Different stages: Import step 1 and 2

First step

– Individual datasources

– Mapping data structures X, Y, Z to AquaBrowser data format

– Defining indexes

Second step

– Merging

• Bibs, holdings, items

– Enriching

• Edition grouping, FRBR’ization
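The two import steps can be sketched in a few lines of Python. This is only an illustration of the idea, not AquaBrowser's actual import code: the field names, the sample rows and the matching rule used for edition grouping (normalised title plus author) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical source rows; the field names are invented for illustration.
catalogue_rows = [
    {"bib_id": "b1", "title": "Origin of Species", "author": "Darwin, Charles", "year": "1859"},
    {"bib_id": "b2", "title": "Origin of species.", "author": "Darwin, Charles", "year": "1872"},
]
holdings_rows = [
    {"bib_id": "b1", "location": "UL", "classmark": "A.1"},
    {"bib_id": "b2", "location": "UL", "classmark": "A.2"},
    {"bib_id": "b2", "location": "Science", "classmark": "Q.9"},
]


def map_record(row):
    """Step 1: map a source-specific row onto one common record shape."""
    return {
        "id": row["bib_id"],
        "title": row["title"].strip(),
        "creator": row["author"],
        "date": row["year"],
        "holdings": [],
    }


def merge_holdings(bibs, holdings):
    """Step 2a: attach holdings to the bibliographic record they belong to."""
    by_id = {bib["id"]: bib for bib in bibs}
    for holding in holdings:
        bib = by_id.get(holding["bib_id"])
        if bib is not None:
            bib["holdings"].append((holding["location"], holding["classmark"]))
    return list(by_id.values())


def work_key(bib):
    """Assumed grouping rule: lower-cased title without punctuation, plus author."""
    title = "".join(ch for ch in bib["title"].lower() if ch.isalnum() or ch == " ")
    return (" ".join(title.split()), bib["creator"].lower())


def group_editions(bibs):
    """Step 2b: group records that look like editions of the same work."""
    groups = defaultdict(list)
    for bib in bibs:
        groups[work_key(bib)].append(bib["id"])
    return dict(groups)


bibs = merge_holdings([map_record(row) for row in catalogue_rows], holdings_rows)
groups = group_editions(bibs)
```

Here both Darwin records end up in one edition group while keeping their own holdings. A real grouping rule would need to be far more careful than this title and author match.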

What does it cover?

Voyager / Newton catalogues (about 6 million bibliographic records)

Most of DSpace (harvested as Dublin Core)

‘Just in time’ search of article databases

What can we do with it?

Web interface: search.lib.cam.ac.uk

Branded as LibrarySearch for Cambridge

XML API (REST-like) produces Marc21-XML and Dublin Core data: www.lib.cam.ac.uk/api
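As a rough idea of what a developer might do with Dublin Core output, the sketch below reads Dublin Core elements out of an XML document using the Python standard library. The sample record and its wrapping `record` element are made up for the example. The real response format of the API is not shown in these slides, so nothing here should be read as its documented structure.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample; not an actual response from the library API.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example title</dc:title>
  <dc:creator>Example, Author</dc:creator>
  <dc:creator>Second, Author</dc:creator>
  <dc:date>2010</dc:date>
</record>"""


def parse_dc(xml_text):
    """Collect every Dublin Core element into a dict of name -> list of values."""
    root = ET.fromstring(xml_text)
    fields = {}
    prefix = "{" + DC_NS + "}"
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


record = parse_dc(SAMPLE)
```

Values are kept as lists because Dublin Core elements are repeatable, as with the two creators in the sample.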

Problems

Historical nature of Cambridge bibliographic records

No policy of centralised cataloguing in Cambridge

Lots of duplicate records across Cambridge libraries

ID centric de-duplication – works up to a point
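A minimal sketch of ID-centric de-duplication, and of why it only works up to a point. It assumes the identifier is an ISBN, which the slides do not state, and the records are invented. Records are clustered on a normalised ISBN-13. Anything without a usable identifier cannot be matched at all.

```python
from collections import defaultdict


def normalise_isbn(raw):
    """Return an ISBN-13 string, or None if no usable ISBN is present."""
    digits = "".join(c for c in raw.upper() if c.isdigit() or c == "X")
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        # Convert ISBN-10 to ISBN-13: prefix 978, then recompute the check digit.
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def dedupe(records):
    """Cluster record ids by normalised ISBN; collect the ones with no ISBN."""
    clusters = defaultdict(list)
    unmatched = []
    for rec in records:
        key = normalise_isbn(rec.get("isbn") or "")
        if key:
            clusters[key].append(rec["id"])
        else:
            unmatched.append(rec["id"])
    return dict(clusters), unmatched


# Hypothetical records from three libraries describing the same book.
records = [
    {"id": "r1", "isbn": "0-306-40615-2", "library": "UL"},
    {"id": "r2", "isbn": "9780306406157", "library": "College"},
    {"id": "r3", "isbn": "", "library": "Department"},
]
clusters, unmatched = dedupe(records)
```

The first two records merge because their ISBNs normalise to the same value. The third has no identifier, so it stays a separate record even though it describes the same book.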

More problems

Cannot totally replace Newton

Not the original intention

Place for multiplicity of interfaces

Shift focus of marketing and development to LibrarySearch

Coming soon:

British Library electronic legal deposit

Archives catalogues

Search engine crawlable ...

COMET

(Cambridge Open METadata)

Background

Peter Murray-Rust and the JISC Open Bibliography Project

JISC followed this up with a general call for ‘Infrastructure for Resource Discovery’

COMET (Cambridge Open METadata)

Releasing large subset of UL records under a Public Domain Data License

Identifying IPR history of our bibliographic data

Documenting process and releasing tools for others to do the same

Some as Marc21

Converting to useful linked RDF

Establishing a triplestore for the library
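To give a feel for what converting a record to RDF involves, here is a small sketch that writes a simplified bibliographic record as N-Triples ready to load into a triplestore. It is not the COMET conversion tool. The base URI is a placeholder, the record structure is invented, and using plain literals with the Dublin Core element set is just one simple choice among the available vocabularies.

```python
from urllib.parse import quote

# Dublin Core element set namespace (an existing vocabulary).
DC = "http://purl.org/dc/elements/1.1/"
# Placeholder base URI, assumed for the example only.
BASE = "http://example.org/id/entry/"


def literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(record):
    """Emit one triple per field value, with the record URI as subject."""
    subject = "<" + BASE + quote(record["id"], safe="") + ">"
    lines = []
    for field in ("title", "creator", "date"):
        for value in record.get(field, []):
            lines.append(f"{subject} <{DC}{field}> {literal(value)} .")
    return "\n".join(lines)


# Hypothetical record, already reduced from Marc21 to a few simple fields.
example = {
    "id": "rec1",
    "title": ['The "Origin" of species'],
    "creator": ["Darwin, Charles"],
    "date": ["1859"],
}
ntriples = to_ntriples(example)
```

Each record gets one stable URI built from its identifier, and every statement about it hangs off that URI. Richer modelling would use URIs rather than literals for people and subjects.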

Why?

Part of a larger bid across the UK to open up data to provide data for national level discovery options

See what developers can do with our stuff

Gain in-house understanding of semantic web

Better realise value in records through contribution to the public domain

Why not the whole lot?

Legal ownership of bibliographic data

Large chunks of records from cataloguing collectives – reuse as RDF under public domain license not necessarily covered

OCLC, the major record provider, is a partner on the project

Problems

RDF vocabs – no accepted practice for bibliographic material

Marc21 does not translate well

Triplestores – relative immaturity of software

URI construction – needs to be done in a sensible, extensible fashion

What?

Eventually hope that we could provide all our metadata in this way

Joint effort with Caret – parallel project at the Fitzwilliam

Triplestore at data.lib.cam.ac.uk

Drawing on external developments – no modelling of data – use existing vocabs and URI guidelines

Project blogspot: http://cul-comet.blogspot.com/

Ed Chamberlain

[email protected]

@edchamberlain
