Linked Data and the LOCAH Project (ILI2011)
Slides for a presentation given at the Internet Librarian International Conference (ILI2011), October 2011.
Bethan Ruddock, Library and Archival Services, Mimas, University of Manchester
@bethanar
LINKED DATA AND THE LOCAH PROJECT
#ILI2011
LINKED OPEN COPAC & ARCHIVES HUB
JISC-funded project (under JISCexpo - exposing digital content for education and research)
September 2010 – August 2011
Staff from Mimas, UKOLN, Eduserv
Additional expertise from Talis, OCLC, Library of Congress
PROJECT AIMS
Put archival and bibliographic data at the heart of the Linked Data Web, making new links between diverse content sources, enabling the free and flexible exploration of data and enabling researchers to make new connections between subjects, people, organisations and places to reveal more about our history and society.
Make a collection of resources available on the Web as structured data, in particular linked data, where a case can be made that it would benefit teaching, learning, research, administration and/or knowledge transfer in UK higher education
Develop a prototype with instructional step-by-step demonstration and documentation to show how the structured content can be used by 3rd party tools and services
Explore and report on the opportunities and barriers in making content structured and exposed on the Web for discovery and use. Such opportunities and barriers may coalesce around licensing implications, trust, provenance, sustainability and usability
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
THE DATA: COPAC
• Merged union catalogue of the holdings of over 60 UK libraries
• Over 50 million records
• Consolidated records
• MODS XML (not MARC)
A Copac consolidated record created from 5 contributed records. Lines show how contributed records match with one another.
THE DATA: ARCHIVES HUB
• Descriptions of archive collections from over 200 UK repositories
• Nearly 25,000 descriptions – collection-level and multi-level
• EAD (Encoded Archival Description)
CHALLENGES: VARIANCE
• Data from many sources – should adhere to standards
AACR2, ISAD(G) BUT
Differences in implementation
CHALLENGES: DATA
dct:publisher: unknown
260 $b: unknown
dct:publisher definition: ‘entity responsible for making the resource available’
CHALLENGES: MULTIPLE SOURCES
A ‘match graph’ of a consolidated Copac record
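One way to read a match graph: each contributed record is a node, each pairwise match is an edge, and a consolidated record corresponds to a group of connected nodes. The sketch below illustrates that idea with a small union-find. The record identifiers and match pairs are invented for illustration, and this is not a description of Copac's actual matching or consolidation process.

```python
def consolidate(record_ids, matches):
    """Group record ids into sets connected by pairwise matches (union-find)."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent pointers to the group representative, shortening the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matches:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), set()).add(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical contributed records and the matches found between them.
records = ["r1", "r2", "r3", "r4", "r5", "r6"]
matches = [("r1", "r2"), ("r2", "r3"), ("r4", "r5"), ("r3", "r5")]
groups = consolidate(records, matches)
```

Here five records end up in one group even though not every pair matched directly, which is why a consolidated record can draw on contributions that only match one another indirectly.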
CHALLENGES: VOCABULARY
[Diagram: ‘Stuff’ with the relationship labels ‘created’, ‘collected’ and ‘relates to’]
ORIGINATION
LICENSING
• Data comes from contributors: not ours to redistribute!
• Concerns: provenance, trust, control
• Consulted: liaised with contributors and stakeholders
THE TECHY STUFF
Specifications required a lot of brainstorming…
Image used under a CC licence from http://www.flickr.com/photos/blankdots/4865831504/
ARCHIVES HUB MODEL
[Diagram of the Archives Hub data model]
Entities: ArchivalResource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Organisation, Repository (Agent), Place, PostcodeUnit, Concept, Genre, Function, ConceptScheme, Book, Object, Language, Level, Extent, Creation, Birth, Death, TemporalEntity
Relationships: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, administeredBy/administers, topic/page, hasPart/partOf, encodedAs/encodes, hasBiogHist/isBiogHistFor, representedBy, foaf:focus, is-a, inScheme, level, language, extent, participates in, at time, product of, in
COPAC MODEL
Node name              MODS field        Ontology
BibliographicResource  <modscollection>  bibo

Cardinality  Property                               URI/literal                       Ontology
0..1         copac:creator (Creator)                URI                               dc
0..m         copac:contributor (Contributor)        URI                               copac
0..1         event:producedIn (Production Date)     URI                               event
0..1         dct:issued (Production Date)           URI                               dc
0..m         pode:publicationPlace (Place)          URI                               pode
0..m         isbd:P1016 (Place)                     URI                               isbd
0..m         dct:publisher (Publisher)              URI                               dc
0..1         dct:isPartOf (Series)                  URI                               dc
1..m         copac:HeldBy (Institution)             URI, with Institution as subject
1..1         bibo:type (Type)                       URI                               bibo
0..m         dct:subject (Subject)                  URI                               dc
0..m         skos:subject (subject)                 URI                               skos
0..m         dct:language (Language)                URI                               dc
1..1         hub:encodedAs (mods)                   URI                               hub
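The cardinalities in the Copac model can be read as simple constraints on each BibliographicResource. The sketch below shows one way to check them, assuming a resource is held as a plain dictionary from property name to a list of values. The dictionary shape, the `check_cardinality` helper and the example.org URIs are illustrative assumptions, not part of the LOCAH tooling.

```python
import math

# (min, max) per property, transcribed from the Copac model; "m" is modelled as infinity.
CARDINALITY = {
    "copac:creator": (0, 1),
    "copac:contributor": (0, math.inf),
    "event:producedIn": (0, 1),
    "dct:issued": (0, 1),
    "pode:publicationPlace": (0, math.inf),
    "isbd:P1016": (0, math.inf),
    "dct:publisher": (0, math.inf),
    "dct:isPartOf": (0, 1),
    "copac:HeldBy": (1, math.inf),
    "bibo:type": (1, 1),
    "dct:subject": (0, math.inf),
    "skos:subject": (0, math.inf),
    "dct:language": (0, math.inf),
    "hub:encodedAs": (1, 1),
}


def check_cardinality(resource):
    """Return a list of problems; the list is empty if every constraint holds.

    `resource` maps a property name to a list of values (hypothetical shape).
    """
    problems = []
    for prop, (lo, hi) in CARDINALITY.items():
        n = len(resource.get(prop, []))
        if n < lo:
            problems.append(f"{prop}: expected at least {lo}, found {n}")
        elif n > hi:
            problems.append(f"{prop}: expected at most {hi}, found {n}")
    return problems


# Hypothetical resource: two creators and no holding institution.
example = {
    "bibo:type": ["http://example.org/type/book"],
    "hub:encodedAs": ["http://example.org/mods/1"],
    "copac:HeldBy": [],
    "copac:creator": ["http://example.org/agent/a", "http://example.org/agent/b"],
}
problems = check_cardinality(example)
```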
data.copac.ac.uk
data.archiveshub.ac.uk
Visualisation prototype using Timemap – Google Maps and Simile
http://code.google.com/p/timemap/
Early stages with this
Will give location and ‘extent’ of archive.
Will link through to Archives Hub
Linking
BBC: Cranford
VIAF: Dickens
DBPedia: Gaskell
Hub: Gaskell
Copac: Cranford
Geonames: Manchester
DBPedia: Dickens
Hub: Dickens
CHALLENGES: ANONYMOUS
Mask image used under a CC licence from http://www.yourbdnews.com
Anonymous / anonymous / Anon. / anon. / anon
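A minimal sketch of how these variants might be detected before a name string is treated as an identifiable agent. It assumes that differences in case, surrounding whitespace and a trailing full stop are the only variation to handle. The `is_anonymous` helper and the contrasting personal name are illustrative, not taken from the LOCAH code.

```python
# Canonical keys for the variants seen on the slide, after normalisation.
ANONYMOUS_FORMS = {"anonymous", "anon"}


def is_anonymous(name):
    """True if `name` is a variant of 'Anonymous'.

    Ignores case, surrounding whitespace and a trailing full stop.
    """
    key = name.strip().rstrip(".").strip().lower()
    return key in ANONYMOUS_FORMS


# The five variants from the slide, plus one illustrative personal name.
variants = ["Anonymous", "anonymous", "Anon.", "anon.", "anon", "Dickens, Charles"]
flags = [is_anonymous(v) for v in variants]
```

Flagging these strings separately avoids treating every "Anonymous" creator as if it were one and the same agent.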
data.copac.ac.uk/doc/bibliographicresource/6947473
data.copac.ac.uk/doc/concept/agent/6947473lacywilliam
data.copac.ac.uk/doc/bibliographicresource/6947473
data.copac.ac.uk/doc/agent/rys
data.archiveshub.ac.uk/doc/archivalresource/gb1086colour
data.archiveshub.ac.uk/doc/concept/unesco/photography
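The example URIs above appear to share a pattern: a dataset host, a `doc` segment, one or more segments naming the type of thing, and a final identifier. The sketch below splits a URI along those lines. The pattern is inferred only from the six examples shown, and `parse_locah_uri` is a hypothetical helper rather than anything published by the project.

```python
from urllib.parse import urlsplit


def parse_locah_uri(uri):
    """Split a LOCAH-style document URI into (host, type path, identifier)."""
    if "://" not in uri:
        uri = "http://" + uri  # the slides show the URIs without a scheme
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    # Expect at least: "doc", one type segment, and an identifier.
    if len(segments) < 3 or segments[0] != "doc":
        raise ValueError(f"unexpected URI shape: {uri}")
    return parts.netloc, "/".join(segments[1:-1]), segments[-1]


book = parse_locah_uri("data.copac.ac.uk/doc/bibliographicresource/6947473")
concept = parse_locah_uri("data.archiveshub.ac.uk/doc/concept/unesco/photography")
```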
WHAT NEXT?
Linking Lives
• name-based approach into the data
• integrating archival resource with other resources: DBPedia, VIAF, Copac...
• route into archives for different audiences?
• issues around trust and provenance to be explored
FINALLY…
The LOCAH data is open for use…
…please play with it!
Image used under a CC licence from http://www.flickr.com/photos/huladancer22/530743543/
@bethanar
bethaninfoprof.wordpress.com
bethan.ruddock@manchester.ac.uk
LOCAH blog: http://blogs.ukoln.ac.uk/locah/
Image used under a CC licence from http://www.flickr.com/photos/theilluminated/5386099858/