Aggregation Using Linked Data – LOCAH Project Experiences
DESCRIPTION
Workshop with Paul Walk and Herbert Van De Sompel at OAI7, Geneva, http://indico.cern.ch/conferenceTimeTable.py?confId=103325#20110622
TRANSCRIPT
www.bath.ac.uk
Aggregation Using Linked Data – LOCAH Project Experiences
23rd June 2011
OAI7, Geneva, Switzerland
Adrian Stevenson
LOCAH Project Manager
LOCAH Project
• Linked Open Copac and Archives Hub
• Funded by #JiscEXPO 2/10 ‘Expose’ call
– 1 year project. Started August 2010
• Partners & Consultants:
– UKOLN – Adrian Stevenson, Julian Cheal
– Mimas – Jane Stevenson, Bethan Ruddock, Yogesh Patel
– Eduserv – Pete Johnston
– Talis – Leigh Dodds, Tim Hodson
– OCLC - Ralph LeVan, Thom Hickey
– Ed Summers
• http://blogs.ukoln.ac.uk/locah/ tag: #locah
Archives Hub and Copac
• UK National Data Services based at Mimas
• Archives Hub is an aggregation of archival descriptions from archive repositories across the UK
– http://archiveshub.ac.uk
• Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries
– http://copac.ac.uk
What is LOCAH Doing?
• Part 1: Exposing Archives Hub & Copac data as Linked Data
• Part 2: Creating a prototype visualisation
• Part 3: Reporting on opportunities and barriers
We’re Aggregating
• If something is identified, it can be linked to
• We take items from one dataset and link them to items from other datasets
[Diagram of datasets: BBC, VIAF, DBPedia, Archives Hub, Copac, GeoNames]
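A minimal sketch of the linking idea above, assuming a small hand-made lookup of name labels to external identifiers. The example.org URIs, the two sample tables and the `link_by_label` helper are hypothetical, and exact matching on a normalised label stands in for whatever matching the project actually used. `owl:sameAs` is a common choice of linking predicate, not a confirmed detail of LOCAH.

```python
# Hypothetical sketch: link items in one dataset to items in another
# by matching on a normalised name label.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Assumed local items: URI -> label (illustrative URIs and names only).
LOCAL_ITEMS = {
    "http://example.org/hub/person/1": "Smith,  Jane",
    "http://example.org/hub/person/2": "Unknown, Person",
}

# Assumed external dataset: normalised label -> URI (illustrative only).
EXTERNAL_ITEMS = {
    "smith, jane": "http://example.org/external/person/42",
}


def normalise(label):
    """Lower-case and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def link_by_label(local_items, external_items):
    """Return (subject, predicate, object) triples for every label match."""
    triples = []
    for uri, label in sorted(local_items.items()):
        target = external_items.get(normalise(label))
        if target is not None:
            triples.append((uri, OWL_SAME_AS, target))
    return triples


links = link_by_label(LOCAL_ITEMS, EXTERNAL_ITEMS)
for s, p, o in links:
    # Print each link as an N-Triples style statement.
    print(f"<{s}> <{p}> <{o}> .")
```

Items with no match are simply left unlinked, which is usually safer than guessing.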
Enhancing our data
• Already have some links:
– Time - reference.data.gov.uk URIs
– Location - UK Postcodes URIs and Ordnance Survey URIs
– Names - Virtual International Authority File
• Matches and links widely-used authority files - http://viaf.org/
– Names - DBPedia
• Also looking at:
– Subjects - Library of Congress Subject Headings and DBPedia
http://data.archiveshub.ac.uk/
‘Aggregates’ property points to http://www.openarchives.org/ore/terms/aggregates
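As a rough illustration of the ‘Aggregates’ property, the sketch below writes statements that tie one aggregation to its members using the ORE `aggregates` URI given above. The aggregation and member URIs and the `aggregation_triples` helper are hypothetical; this is not the actual output of data.archiveshub.ac.uk.

```python
# Hypothetical sketch: express "this aggregation aggregates these items"
# as N-Triples style lines using the ORE aggregates property.

ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def aggregation_triples(aggregation_uri, member_uris):
    """Return one statement per member, each pointing from the aggregation."""
    return [
        f"<{aggregation_uri}> <{ORE_AGGREGATES}> <{member}> ."
        for member in member_uris
    ]


# Illustrative URIs only; real identifiers would come from the dataset.
lines = aggregation_triples(
    "http://example.org/id/collection/1",
    [
        "http://example.org/id/item/1",
        "http://example.org/id/item/2",
    ],
)
print("\n".join(lines))
```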
Visualisation Prototype
• Using Timemap
– Googlemaps and Simile
– http://code.google.com/p/timemap/
• Early stages with this
• Will give location and ‘extent’ of archive.
• Will link through to Archives Hub
BBC Music
APIs, Mashups and Linked Data
• Mashups work against a fixed set of data sources
• Hand crafted by humans
• Don’t integrate well
• Linked Data promises an unbound global data space
• Easy dataset integration
• Generic ‘mesh-up’ tools
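One way to read ‘easy dataset integration’: if each source is a set of triples that use shared URIs, combining sources is a set union, and everything said about one URI can be gathered without source-specific code. The sketch below uses made-up data, and the example.org URIs and the `describe` helper are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative identifiers only.
BATH = "http://example.org/id/place/bath"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
COUNTRY = "http://example.org/vocab/country"  # hypothetical predicate

# Two assumed sources, each a set of (subject, predicate, object) triples.
SOURCE_A = {(BATH, LABEL, "Bath")}
SOURCE_B = {
    (BATH, COUNTRY, "UK"),
    (BATH, LABEL, "Bath"),  # duplicate statement, removed by the union
}

# Integration is a plain set union because both sources share the same URI.
merged = SOURCE_A | SOURCE_B


def describe(triples, subject):
    """Group every statement about one subject by predicate."""
    description = defaultdict(set)
    for s, p, o in triples:
        if s == subject:
            description[p].add(o)
    return dict(description)


about_bath = describe(merged, BATH)
print(about_bath)
```

A hand-crafted mashup would instead need a custom mapping for each pair of sources.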
Aggregation / Integration Challenges
Sustainability
• Can you rely on data sources long-term?
• Ed Summers at the Library of Congress created http://lcsh.info
• Linked Data interface for LOC subject headings
• People started using it
Library of Congress Subject Headings
Scalability
• Will the Web of Data scale?
Example by Bradley Allen, Elsevier at LOD LAM Summit, SF, USA
Data Modelling
• Complexity
– Archival description is hierarchical and multi-level
• Dirty Data
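A sketch of why hierarchical, multi-level description complicates modelling: each level of a nested record needs its own identifier and an explicit link to its parent before it can be expressed as flat statements. The nested dictionary shape, the URI pattern and the `hasPart` predicate are all hypothetical and do not represent the LOCAH data model.

```python
# Hypothetical sketch: flatten a multi-level description into
# parent -> child statements.

HAS_PART = "http://example.org/vocab/hasPart"  # hypothetical predicate

# Assumed nested record: a top-level unit with two sub-levels.
record = {
    "id": "fonds1",
    "children": [
        {"id": "series1", "children": [{"id": "item1", "children": []}]},
        {"id": "series2", "children": []},
    ],
}


def uri_for(unit_id):
    """Mint an illustrative URI for one level of description."""
    return f"http://example.org/id/unit/{unit_id}"


def flatten(unit):
    """Walk the hierarchy depth-first, emitting one triple per parent/child pair."""
    triples = []
    for child in unit["children"]:
        triples.append((uri_for(unit["id"]), HAS_PART, uri_for(child["id"])))
        triples.extend(flatten(child))
    return triples


triples = flatten(record)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```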
Licensing
• ‘Ownership’ of data
• Hard to track attribution
• CC0 for Archives Hub and Copac data
Linked Data the Way for Aggregation?
• Enables ‘straightforward’ aggregation of wide variety of data sources
• New channels into your data services
• Researchers are more likely to discover sources
• ‘Hidden’ collections of repositories become part of the Web
Questions for Discussion
• Will using vocabularies and ontologies always be too difficult?
– Or will the tools appear?
– MS Access for Linked Data?
• Will the Web of Data scale?
Questions if you’ve bought in
– What constitutes data worth linking to?
– How to find datasets suitable for interlinking?
– How to make my dataset worth linking to?
– How to encourage others to link to my data?
– What is the added value of links?
– How to determine the quality of a link?
Attribution and CC License
• Sections of this presentation adapted from materials created by other members of the LOCAH Project
• This presentation is available under a Creative Commons Non Commercial-Share Alike licence:
http://creativecommons.org/licenses/by-nc/2.0/uk/