Cambridge University Library ESS update for UCS
Talk given at UCS morning seminar 4/5/11
From Books to Bits - IT developments in the
University Library
Ed Chamberlain - Systems Development Librarian
Overview
1. UL and the ‘shift to the digital’
2. ESS team and work
Resource Discovery service
COMET – Cambridge Open METadata
Cambridge University Library …
One of six legal deposit libraries
6.5+ million items
1 major site
‘Mausoleum of dusty old books for the humanities’
4 ‘dependent libraries’
Wider group of college and departmental libraries – not a federation or single service
Shift to the digital
Approx. 40% of UL and dependents' materials budget now spent on online resources (ejournals, database subscriptions)
e.g. Science Direct, Web of Knowledge, JSTOR, CUP ejournals
Majority of this on STEM publications
Reliance on subscription content housed on publishers' websites
Increasing cost
Shift to the digital
Legal deposit electronic intake to start in the next year
Publishers can submit electronic versions of material for legal deposit to the legal deposit agency
Initially voluntary for periodicals only
Dependent on law being passed
Internal digitisation
Digitising special collections for some time with external funding
No Google books project
Planning for a unified digital library
DSpace
Organisational change
New skills base in staff
Changes in buildings
Changes in services
Changes in approach
UL divisional layout
Who we are?
30+ staff in the UL
Led by Patricia Killiard
Mix of skills and backgrounds - (Librarians, I.T. Officers, Developers, Early Career Researchers ...)
ESS – major areas of activity
Two recent projects …
Both in similar area – how library readers can find our stuff, but very different in tone and scope
Resource Discovery platform – commercial software acquisition and implementation (2008-2010)
COMET (Cambridge Open METadata) – JISC funded exercise in publishing linked open data (2011)
Resource Discovery
What do you mean by Resource Discovery?
Catalogue alone does not represent the true scope of library resources
Library catalogues of print collections (Newton)
Online article databases
Abstract only – Web of Knowledge, Scopus
Full text – JSTOR, Science Direct, journal publisher sites etc.
A-Z of ejournal titles
Ebook websites
Repository content
Archive catalogue
Other stuff (content on our websites)
Problems with Newton
Newton – traditional library catalogue:
Replicates Author / Title / Subject card index on the web
Tied into Voyager – part of the same application stack as library ‘back office’
Cambridge setup fragmented by databases (e.g. colleges A-N)
Trend in search towards:
Keyword based searching
Initial ‘dumb’ search – refine afterwards
Is this a good thing?
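The "search first, refine afterwards" pattern can be illustrated with a minimal sketch. This is not how Newton or any particular product works: the sample records, field names and function names below are all hypothetical, and matching is plain substring matching rather than a real search index.

```python
from collections import Counter

# Hypothetical sample records; titles, authors and field names are made up.
RECORDS = [
    {"title": "Introduction to Geology", "author": "Author, A.", "format": "book"},
    {"title": "Field Geology Handbook", "author": "Writer, B.", "format": "book"},
    {"title": "Annals of Geology", "author": "", "format": "ejournal"},
    {"title": "Medieval Manuscripts", "author": "Scholar, C.", "format": "book"},
]


def keyword_search(records, query):
    """Initial 'dumb' search: keep records containing every query word."""
    words = query.lower().split()
    hits = []
    for rec in records:
        text = f"{rec['title']} {rec['author']}".lower()
        if all(word in text for word in words):
            hits.append(rec)
    return hits


def facet_counts(hits, field):
    """Count the values of one field across the hits, to offer as refinements."""
    return Counter(rec[field] for rec in hits)


def refine(hits, field, value):
    """Narrow an existing result set to one facet value."""
    return [rec for rec in hits if rec[field] == value]


hits = keyword_search(RECORDS, "geology")
formats = facet_counts(hits, "format")
books = refine(hits, "format", "book")
```

The reader types one keyword, sees how the hits break down by format, and only then narrows the set, instead of choosing Author / Title / Subject up front.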
Google generation?
‘Although young people demonstrate an ease and familiarity with computers, they rely on the most basic search tools and do not possess the critical and analytical skills to assess the information that they find on the web.’
‘The study calls for libraries to respond urgently to the changing needs of researchers and other users and to understand the new means of searching and navigating information. Learning what researchers want and need is crucial if libraries are not to become obsolete, the report warns.’
Nicholas, D., et al. "The Google generation: the information behaviour of the researcher of the future." Aslib Proceedings 60.4 (2008): 290-310.
Diminishing brand?
Many students expressed low levels of awareness of electronic resources, combined with a high use of Google.
Very few undergraduate students identified librarians as a source of either recommendations, or of help in searching for information.
However, they regarded the library as a key source of information material, and as a useful study space.
Information Skills Provision: Mapping the information skills of Cambridge undergraduates and induction / training provision across the University. Lizz Edwards-Waller, 2009 (http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf)
Two main types of response …
Attempt to ‘educate them’
Try and adapt our resources and mechanisms to better suit their needs
Information rich, time poor
What could we do?
Adopt the newer trend of library ‘resource discovery software’
Common features
Recognize that library resources do not end at the catalogue
Harvest resources from ‘silos’ (catalogue, repository) etc.
Separate front end application from backend
What we did
Went to tender (full EU):
Open source options look promising now, but not there at the time
We wanted to move quickly
Reached decision by June 2009
What happened
Five months of contract negotiation – signed in October 2009
Hardware purchased December 2009
Software installed January 2010
Live by August 2010
What we got
Aquabrowser – used by Harvard, Edinburgh, York, Chicago, National Libraries of Wales and Scotland
Scalable, proven
Minimal hardware requirements
Relatively user friendly
Affordable
How does it work?
Different stages: Import step 1 and 2
First step
– Individual datasources
– Mapping data structures X, Y, Z to AquaBrowser data format
– Defining indexes
Second step
– Merging
• Bibs, holdings, items
– Enriching
• Edition grouping, FRBR’ization
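The two import steps can be sketched roughly as follows. This is an illustration of the general idea only, not AquaBrowser's actual import code or data format: every field name, identifier scheme and function here is an assumption, and the edition grouping is a deliberately crude title/author key rather than real FRBR processing.

```python
from collections import defaultdict


# Step 1: map each source's own structure to one common record shape.
def map_catalogue(raw):
    """Hypothetical mapping for a catalogue export keyed by MARC-like tags."""
    return {"id": f"cat:{raw['bib_id']}", "title": raw["245a"],
            "author": raw.get("100a", ""), "source": "catalogue"}


def map_repository(raw):
    """Hypothetical mapping for a repository export with Dublin Core-like keys."""
    return {"id": f"repo:{raw['handle']}", "title": raw["dc_title"],
            "author": raw.get("dc_creator", ""), "source": "repository"}


# Step 2a: merge bibs, holdings and items into one structure per bib record.
def merge(bibs, holdings, items):
    merged = {bib["id"]: {**bib, "holdings": []} for bib in bibs}
    items_by_holding = defaultdict(list)
    for item in items:
        items_by_holding[item["holding_id"]].append(item["barcode"])
    for holding in holdings:
        if holding["bib_id"] in merged:
            merged[holding["bib_id"]]["holdings"].append(
                {"location": holding["location"],
                 "items": items_by_holding[holding["id"]]})
    return merged


# Step 2b: enrich by grouping editions under a crude "work" key.
def work_key(rec):
    return (rec["title"].lower().strip(" ./"), rec["author"].lower().strip(" ,."))


def group_editions(records):
    groups = defaultdict(list)
    for rec in records:
        groups[work_key(rec)].append(rec["id"])
    return dict(groups)


raw_cat = [{"bib_id": 1, "245a": "Example title /", "100a": "Author, A."},
           {"bib_id": 2, "245a": "Example title", "100a": "Author, A"}]
raw_repo = [{"handle": "example/1", "dc_title": "A thesis", "dc_creator": "Student, B."}]

bibs = [map_catalogue(r) for r in raw_cat] + [map_repository(r) for r in raw_repo]
holdings = [{"id": "h1", "bib_id": "cat:1", "location": "UL"}]
items = [{"holding_id": "h1", "barcode": "0001"},
         {"holding_id": "h1", "barcode": "0002"}]

merged = merge(bibs, holdings, items)
editions = group_editions(bibs)
```

Keeping the per-source mapping separate from the merge and enrich stage means a new datasource only needs a new mapping function.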
What does it cover?
Voyager / Newton catalogues (about 6 million bibliographic records)
Most of DSpace (harvested as Dublin Core)
‘Just in time’ search of article databases
What can we do with it?
Web interface: search.lib.cam.ac.uk
Branded as LibrarySearch for Cambridge
XML API (REST-like) produces Marc21-XML and Dublin Core data: www.lib.cam.ac.uk/api
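As a rough idea of what a developer might do with Marc21-XML output, the sketch below parses a small embedded record and pulls out a few Dublin Core-like fields. It does not call the API, whose request parameters are not described here. The sample record is invented and the field selection (001, 100$a, 245$a) is an assumption about what a client would want. Only the MARCXML namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace; everything in SAMPLE is invented for illustration.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">example-0001</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Author, A.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example title /</subfield>
      <subfield code="c">A. Author.</subfield>
    </datafield>
  </record>
</collection>"""


def subfield(record, tag, code):
    """Return the first matching subfield's text, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    element = record.find(path, MARC_NS)
    return element.text if element is not None else None


def to_simple_dc(record):
    """Reduce one MARC record to a few Dublin Core-like fields."""
    return {
        "identifier": record.findtext("marc:controlfield[@tag='001']",
                                      namespaces=MARC_NS),
        "title": subfield(record, "245", "a"),
        "creator": subfield(record, "100", "a"),
    }


root = ET.fromstring(SAMPLE)
records = [to_simple_dc(rec) for rec in root.findall("marc:record", MARC_NS)]
```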
Problems
Historical nature of Cambridge bibliographic records
No policy of centralised cataloguing in Cambridge
Lots of duplicate records across Cambridge libraries
ID centric de-duplication – works up to a point
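A minimal sketch of ID centric de-duplication, and of why it only works up to a point: records are clustered on a normalised identifier (ISBN is assumed here), so older records with no identifier cannot be matched and stay as separate entries. The record shape and identifiers are hypothetical, and this is not the matching logic of the actual product.

```python
from collections import defaultdict


def normalise_isbn(isbn):
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def deduplicate(records):
    """Cluster record ids by ISBN; records without one form their own cluster."""
    clusters = defaultdict(list)
    for rec in records:
        isbn = rec.get("isbn")
        if isbn:
            key = ("isbn", normalise_isbn(isbn))
        else:
            # No shared identifier to match on: the record stays on its own.
            key = ("local", rec["id"])
        clusters[key].append(rec["id"])
    return list(clusters.values())


# Hypothetical records from different libraries; the ISBNs are placeholders.
records = [
    {"id": "ul-1", "isbn": "0-00-000000-0"},
    {"id": "college-7", "isbn": "0000000000"},
    {"id": "dept-3", "isbn": None},
    {"id": "dept-4", "isbn": None},
]

clusters = deduplicate(records)
```

The two records sharing an ISBN collapse into one cluster, while the two without identifiers remain separate even if they describe the same book.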
More problems
Cannot totally replace Newton
Not the original intention
Place for multiplicity of interfaces
Shift focus of marketing and development to LibrarySearch
Coming soon:
British Library electronic legal deposit
Archives catalogues
Search engine crawlable ...
COMET
(Cambridge Open METadata)
Background
Peter Murray-Rust and the JISC Open Bibliography Project
JISC followed this up with a general call for ‘Infrastructure for Resource Discovery’
COMET (Cambridge Open METadata)
Releasing large subset of UL records under a Public Domain Data License
Identifying IPR history of our bibliographic data
Documenting process and releasing tools for others to do the same
Some as Marc21
Converting to useful linked RDF
Establishing a triplestore for the library
Why?
Part of a larger bid across the UK to open up data for national level discovery options
See what developers can do with our stuff
Gain in-house understanding of semantic web
Better realise value in records through contribution to the public domain
Why not the whole lot?
Legal ownership of bibliographic data
Large chunks of records from cataloguing collectives – reuse as RDF under public domain license not necessarily covered
OCLC, the major record provider, is a partner on the project
Problems
RDF vocabs – no accepted practice for bibliographic material
Marc21 does not translate well
Triplestores – relative immaturity of software
URI construction – needs to be done in a sensible, extensible fashion
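To make the URI and vocabulary questions concrete, here is a minimal sketch that mints a URI for a record and emits N-Triples using the Dublin Core elements vocabulary. The `/id/entry/` path pattern, the `source_localid` scheme and the record shape are assumptions made for illustration, not the project's actual URI design, and real bibliographic RDF would need a richer mapping than two properties.

```python
from urllib.parse import quote

# Hypothetical URI pattern; only the host name comes from the talk.
BASE = "http://data.lib.cam.ac.uk/id/entry/"
# Dublin Core elements 1.1 namespace, whose properties accept plain literals.
DC = "http://purl.org/dc/elements/1.1/"


def mint_uri(source, local_id):
    """Build a stable URI from the source name and its local record id."""
    return f"{BASE}{quote(source, safe='')}_{quote(str(local_id), safe='')}"


def literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(record):
    """Emit one N-Triples line per populated field."""
    subject = f"<{mint_uri(record['source'], record['id'])}>"
    lines = []
    for field in ("title", "creator"):
        if record.get(field):
            lines.append(f"{subject} <{DC}{field}> {literal(record[field])} .")
    return lines


record = {"source": "catalogue", "id": 1234,
          "title": 'An "example" title', "creator": "Author, A."}
triples = to_ntriples(record)
```

Putting the source name into the URI keeps identifiers unique across the separate catalogue databases, and percent-encoding the parts keeps the URI valid if a local id contains unusual characters.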
What?
Eventually hope that we could provide all our metadata in this way
Joint effort with Caret – parallel project at the Fitzwilliam
Triplestore at data.lib.cam.ac.uk
Drawing on external developments – no modelling of data – use existing vocabs and URI guidelines
Project blogspot: http://cul-comet.blogspot.com/
Ed Chamberlain
[email protected]
@edchamberlain