LOCAH Project and Considerations of Linked Data Approaches


Post on 16-May-2015






Presentation given at JISC 'Managing Research Data International Workshop', Birmingham, UK. 29th March 2011 http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmevents/mrdinternationalworkshop.aspx


  • 1. UKOLN is supported by: LOCAH Project and Considerations of Linked Data Approaches. 29th March 2011, JISC Managing Research Data International Workshop, Birmingham, UK. Adrian Stevenson, LOCAH Project Manager


  • The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.
  • The Semantic Web is the goal or end result; Linked Data provides the means to reach that goal
  • From Linked Data: The Story So Far - Heath, Bizer and Berners-Lee 2009

3. The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today. Bizer/Cyganiak/Heath Linked Data Tutorial, linkeddata.org

4. In essence, it marks a shift in thinking from publishing data in human readable HTML documents to machine readable documents. That means that machines can do a little more of the thinking work for us. http://www.linkeddatatools.com/semantic-web-basics

5.

  • But haven't we been putting linked data on the web for years?
    • In CSV, relational databases, XML etc?
  • Well yes, but these approaches are not so easy to integrate
  • Web 2.0 mashups work against a fixed set of data sources
  • Linked Data applications operate on top of an unbound, global data space.

6. So what's been happening?

7.

8. Data.gov.uk Officially launched 21st January 2010

9. BBC Music

10. A little bit of the techy stuff

11. Linked Data is

  • A way of publishing data on the web that:
    • Encourages reuse
    • Reduces redundancy
    • Maximises inter-connectedness
    • Enables network effects
  • So how is this achieved?

12. Presentational tagging HTML

  • Manchester Physiotherapy Centre

    Welcome to the Manchester Physiotherapy Centre home page. Do you feel pain? Have you had an injury? Let our staff take care of your body and soul.

    Consultation hours
    Mon 11am - 7pm
    Tue 11am - 7pm
    Wed 3pm - 7pm
    Thu 11am - 7pm
    Fri 11am - 3pm
  • Please note that we will not be offering consultation during the weeks of the Olympic games.

13. Semantic tagging

  • Physiotherapy
  • Manchester Physiotherapy Centre
  • Lisa Davenport
  • Steve Matthews
  • Kelly Townsend
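
The difference between the two kinds of tagging can be sketched in code. The minimal Python example below assumes a hypothetical semantically tagged version of the physiotherapy page; the element names (`clinic`, `speciality`, `name`, `staff`, `person`) are illustrative assumptions, not the markup shown on the original slide. Because each value is labelled with what it means, a program can pull out the staff names without guessing from layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup; the element names are assumptions for illustration.
SEMANTIC_DOC = """
<clinic>
  <speciality>Physiotherapy</speciality>
  <name>Manchester Physiotherapy Centre</name>
  <staff>
    <person>Lisa Davenport</person>
    <person>Steve Matthews</person>
    <person>Kelly Townsend</person>
  </staff>
</clinic>
"""

def describe_clinic(xml_text):
    """Return the clinic name, speciality and staff list from semantic markup."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name"),
        "speciality": root.findtext("speciality"),
        "staff": [p.text for p in root.iter("person")],
    }

clinic = describe_clinic(SEMANTIC_DOC)
print(clinic["name"], "-", ", ".join(clinic["staff"]))
```

With purely presentational HTML the same information is only recoverable by scraping headings and line breaks, which tends to break whenever the page design changes.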

14. Linked Data Design Issues

  • URIs
  • LD Design Issues
  • Triples

http://www.w3.org/DesignIssues/LinkedData.html

15. URIs and HTTP

  • A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource - RFC 3986
    • HTTP URIs can be de-referenced
      • A URL is a type of URI
  • HTTP URIs are used for real world things
      • http://adrianstevenson.com/id/me
      • http://dbpedia.org/page/Tim_Berners-Lee
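
As a rough illustration of what "de-referencing" an HTTP URI involves, the sketch below prepares a GET request that asks for an RDF representation through the `Accept` header. The helper name `build_rdf_request` is hypothetical, and whether a given server honours the requested media type is an assumption; the request is built but not sent, so the example runs offline.

```python
from urllib.request import Request
from urllib.parse import urlparse

def build_rdf_request(uri):
    """Prepare (but do not send) an HTTP GET that asks for RDF rather than HTML."""
    parts = urlparse(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only HTTP(S) URIs can be de-referenced this way")
    # Content negotiation: the Accept header states the preferred representation.
    return Request(uri, headers={"Accept": "application/rdf+xml"})

req = build_rdf_request("http://dbpedia.org/page/Tim_Berners-Lee")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup.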

16. RDF

  • Resource Description Framework
    • a language for representing information about resources on the Web
    • RDF can be used to represent things identified on the Web, even when they cannot be directly retrieved on the Web
  • Describes relations using triples
  • http://www.w3.org/TR/REC-rdf-syntax/

17. Triples

  • Triple statements
    • Things have properties with values
    • Subject - Predicate - Object
  • Triples are the basis of RDF
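
A minimal sketch of the subject, predicate, object idea in Python, using the two statements from the slide diagram. Plain strings stand in for URIs here, and the `match` helper is a hypothetical illustration of pattern matching over triples, not part of any RDF library.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Plain strings stand in for URIs; real RDF would use full HTTP URIs.
triples = [
    Triple("Repository", "providesAccessTo", "Archival Resource"),
    Triple("Keith Richards", "isMemberOf", "The Rolling Stones"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

for t in match(triples, predicate="isMemberOf"):
    print(t.subject, t.predicate, t.obj)
```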

  • Repository - Provides Access To - Archival Resource
  • Keith Richards - Is Member Of - The Rolling Stones

18. BBC Music

19. LOCAH Project

20. What is the LOCAH Project?

  • Linked Open Copac and Archives Hub
  • Funded by #JiscEXPO 2/10 Expose call
  • 1 year project. Started August 2010
  • http://blogs.ukoln.ac.uk/locah/ tag: #locah

21. What are the Archives Hub and Copac?

  • National data services
  • The Archives Hub is an aggregation of archival descriptions from archive repositories across the UK
    • http://archiveshub.ac.uk
  • Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries
    • http://copac.ac.uk

22. What is LOCAH Doing?

  • Part 1: Exposing Archives Hub & Copac data as Linked Data
  • Part 2: Creating a prototype visualisation
  • Part 3: Reporting on opportunities and barriers

23. LOCAH Linked Data

  • If something is identified, it can be linked to
  • We can then take items from one dataset and link them to items from other datasets
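
The linking idea can be sketched as follows. This assumes, purely for illustration, that cross-dataset links are expressed as `owl:sameAs` statements; the prefixed names are shorthand stand-ins for full URIs and the label values are invented, not taken from the Archives Hub or DBpedia.

```python
# Prefixed names such as "hub:Gaskell" are shorthand stand-ins for full URIs.
SAME_AS = "owl:sameAs"

# Invented example statements, one per dataset, plus one cross-dataset link.
hub = {("hub:Gaskell", "rdfs:label", "Gaskell (Archives Hub record)")}
dbpedia = {("dbpedia:Gaskell", "rdfs:label", "Gaskell (DBpedia record)")}
links = {("hub:Gaskell", SAME_AS, "dbpedia:Gaskell")}

def follow_links(start, graph):
    """Collect every identifier reachable from start via sameAs, in either direction."""
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p != SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen

merged = hub | dbpedia | links
aliases = follow_links("hub:Gaskell", merged)
labels = sorted(o for s, p, o in merged if s in aliases and p == "rdfs:label")
print(labels)
```

Once the link exists, a query that starts from the Hub identifier can also pick up what the other dataset says about the same entity.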

[Diagram of linked datasets: BBC, VIAF, DBpedia, Archives Hub, Copac, GeoNames]

24. The Linking benefits of Linked Data

[Diagram of linked entities: BBC:Cranford, VIAF:Dickens, DBpedia:Gaskell, Hub:Gaskell, Copac:Cranford, Geonames:Manchester, DBpedia:Dickens, Hub:Dickens]

25. Archives Hub Model (as at 14/2/2011)

[Diagram of the data model. Entities: Archival Resource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Place, Concept, Genre, Function, Organisation, Repository (Agent), Book, Language, Level, Concept Scheme, Object, Postcode Unit, Extent, Creation, Birth, Death, Temporal Entity. Relationships: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in]

26. Enhancing our data

  • Already have some links:
    • lexvo.org URIs for languages of archival materials
    • reference.data.gov.uk URIs for time periods
    • Postcodes, using both UK Postcodes URIs and Ordnance Survey URIs
    • Virtual International Authority File
      • Matches and links widely-used authority files - http://viaf.org/
    • DBPedia
  • Also looking at:
    • Library of Congress Subject Headings

27. http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner

28. http://data.archiveshub.ac.uk/doc/person/ncarules/chamberlainarthurneville1869-1940statesman

29. How are we creating the Visualisation Prototype?

  • Based on researcher use cases
  • Data queried from SPARQL endpoint
  • Use tools such as Simile, Many Eyes, Google Charts
  • Also looking at custom built prototype
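
Querying a SPARQL endpoint over HTTP can be sketched as below. The endpoint address is a hypothetical placeholder (the slides do not give one), and the query is a generic "first ten triples" request rather than one used by the project. The sketch only builds the request URL, with the query text carried in the `query` parameter as the SPARQL protocol specifies for GET requests.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint address, for illustration only.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

def sparql_get_url(endpoint, query):
    """Build a GET URL with the query URL-encoded in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = sparql_get_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
print(decoded == QUERY.strip())
```

A visualisation tool would fetch this URL and feed the returned result rows into a chart, map or timeline.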

30. Use Case Slide http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_LOCAH

31. Visualisation Prototype

  • Using Timemap
    • Google Maps and Simile
    • http://code.google.com/p/timemap/
  • Early stages with this
  • Will give location and extent of archive.
  • Will link through to Archives Hub

32. Some issues

  • Data Modelling
  • Sustainability
  • Provenance
  • Licensing

33. Data Modelling Challenges

  • Archival description is hierarchical and multi-level
  • Archives Hub: inconsistencies in data and lack of standardisation
    • there's no content standard in the UK
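
One way to picture the multi-level modelling problem is to flatten a nested description into triples. The sketch below uses the `hasPart`/`partOf` relationship names that appear in the Archives Hub model diagram; the collection, series and item identifiers are hypothetical, and the traversal is an illustrative assumption rather than the project's actual transformation.

```python
# Hypothetical multi-level description: a collection containing series and items.
collection = {
    "id": "collection-1",
    "children": [
        {"id": "series-1", "children": [{"id": "item-1", "children": []}]},
        {"id": "series-2", "children": []},
    ],
}

def hierarchy_to_triples(node):
    """Walk the tree depth-first, emitting hasPart and partOf for each parent-child pair."""
    triples = []
    for child in node["children"]:
        triples.append((node["id"], "hasPart", child["id"]))
        triples.append((child["id"], "partOf", node["id"]))
        triples.extend(hierarchy_to_triples(child))
    return triples

for triple in hierarchy_to_triples(collection):
    print(triple)
```

Each level of the hierarchy becomes an explicit pair of statements, which is why inconsistencies in the source descriptions show up directly in the resulting data.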

34. Sustainability

  • Can you rely on data sources long-term?
  • Ed Summers at the Library of Congress created http://lcsh.info
  • Linked Data interface for LOC subject headings
  • People started using it

35. Library of Congress Subject Headings

36. Provenance

  • Triples create individual statements
  • OK if data watermarked
  • But can often be a problem
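
The provenance problem can be illustrated with a small sketch. It uses quads (a triple plus a fourth element naming its source), which is one common way to "watermark" statements; the slides do not say this is the approach LOCAH takes. All statements and source labels below are invented for illustration.

```python
from collections import namedtuple

# A quad is a triple plus a fourth element naming where the statement came from.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "source"])

# Illustrative statements and source labels; none are taken from real data.
quads = [
    Quad("hub:Dickens", "owl:sameAs", "viaf:Dickens", "Archives Hub"),
    Quad("hub:Dickens", "owl:sameAs", "dbpedia:Dickens", "Archives Hub"),
    Quad("dbpedia:Dickens", "rdfs:label", "Dickens", "DBpedia"),
]

def statements_from(store, source):
    """Return only the statements attributed to the given source."""
    return [q for q in store if q.source == source]

def strip_provenance(store):
    """Reduce quads to bare triples, discarding where each statement came from."""
    return [(q.subject, q.predicate, q.obj) for q in store]

print(len(statements_from(quads, "Archives Hub")))
print(strip_provenance(quads)[0])
```

Once the fourth element is dropped, nothing in a bare triple says who asserted it, which is the situation the slide warns about.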

37. Licensing

  • Nature of Linked Data: each triple as a piece of data
  • Ownership of data
  • Hard to track attribution
  • We're using CC BY-NC 2.0 for now

38. Questions?

39. Attribution and CC License

  • Sections of this presentation adapted from materials created by other members of the LOCAH Project
  • This presentation is available under Creative Commons Non Commercial-Share Alike:
  • http://creativecommons.org/licenses/by-nc/2.0/uk/

