building a linked open data set

Implementing a Linked Open Data set Joel Richard Smithsonian Libraries [email protected] SLA Annual Conference, July 2012

Upload: joel-richard

Post on 11-Jul-2015

TRANSCRIPT

Page 1: Building a Linked Open Data Set

Implementing a Linked Open Data set

Joel Richard

Smithsonian Libraries

[email protected]

SLA Annual Conference, July 2012

Page 2: Building a Linked Open Data Set

Who are the Smithsonian Libraries?

• 20 Libraries in the U.S. and Panama

• Supports research of staff and the public

• Strong effort to digitize pre-1923 texts

• Taxonomic Literature II is one of these texts


Page 3: Building a Linked Open Data Set

Summary of Agenda

• Our data set and process

• Conversion to Linked Data

• Storing Linked Data

• Examples and More Info

• Summary

• … and best brew pubs in Chicago


Page 4: Building a Linked Open Data Set

Disclaimer

We are still learning.


Page 5: Building a Linked Open Data Set

What is Linked Data?

HTTP URIs identify things to humans and computers

Identifiers are related to other identifiers (or values) via predicates in a “triple”:

Charles Darwin // Creator // On the Origin of Species
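A minimal sketch of the triple above, assuming plain Python tuples instead of an RDF library. The `Triple` and `describe` names are hypothetical and exist only for illustration:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: two things related by a predicate."""
    subject: str
    predicate: str
    obj: str


# The example from the slide, in the order shown there.
statement = Triple("Charles Darwin", "Creator", "On the Origin of Species")


def describe(t: Triple) -> str:
    # Render the triple in the slide's "a // b // c" notation.
    return " // ".join(t)


print(describe(statement))
```

In real linked data each of the three parts would normally be an HTTP URI (or, for the last part, a literal value) rather than a bare label.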

See also:

http://linkeddata.org/

http://en.wikipedia.org/wiki/Linked_Data

http://richard.cyganiak.de/2007/10/lod/


Page 6: Building a Linked Open Data Set


http://richard.cyganiak.de/2007/10/lod/

Page 7: Building a Linked Open Data Set

Taxonomic Literature II

Essential Reference Tool for Botanists

Authors and their Publications from 1753 to 1940

It is a “database in book form.”

Page 8: Building a Linked Open Data Set


Page 9: Building a Linked Open Data Set

Our process

Scanned the pages

Hired a contractor for OCR and correction (99.97% accuracy)

Received an XML dataset from the contractor

Verified the data and imported it into SQL Server

Built a website to search the data


Page 10: Building a Linked Open Data Set


Page 11: Building a Linked Open Data Set

Great! Let’s make some linked data!

First...what does 99.97% accuracy mean?


~12,000 Errors
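A sketch of the arithmetic behind that figure. It assumes accuracy is measured per character and that ~12,000 is the total error count; the slides state neither, so the implied text size is an inference, not a reported number:

```python
from fractions import Fraction

# Figures from the slides.
accuracy = Fraction("99.97") / 100   # 99.97% accuracy
reported_errors = 12_000             # "~12,000 Errors"

# Assumption: accuracy is per character, so the error rate is 1 - accuracy.
error_rate = 1 - accuracy

# Text size at which this error rate yields the reported error count.
implied_characters = reported_errors / error_rate

print(f"error rate: {float(error_rate):.2%}")
print(f"implied characters: {int(implied_characters):,}")
```

Under those assumptions, a 0.03% error rate still leaves thousands of mistakes once the text runs to tens of millions of characters.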

Page 12: Building a Linked Open Data Set

Great! Let’s make some linked data!

Select Identifiers for your data

http://library.si.edu/tl-2/author/darwin

http://library.si.edu/tl-2/title/origin_of_species

http://library.si.edu/tl-2/title/1313

Choose vocabularies for predicates (harder than it sounds)

OWL, FOAF, Dublin Core, Open Graph, SIOC, SKOS, BIBO, etc.
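One way the identifiers above could be minted, as a sketch only. The base path is taken from the slide; `slugify` and `mint_uri` are hypothetical helpers, and the slides do not say how the real URIs were generated:

```python
import re
from typing import Union

BASE = "http://library.si.edu/tl-2"  # base path as shown on the slide


def slugify(label: str) -> str:
    """Lowercase a label and join its alphanumeric words with underscores."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return "_".join(words)


def mint_uri(kind: str, key: Union[str, int]) -> str:
    """Build an identifier from a resource kind and a name or record number."""
    local = str(key) if isinstance(key, int) else slugify(key)
    return f"{BASE}/{kind}/{local}"


print(mint_uri("author", "Darwin"))
print(mint_uri("title", "Origin of Species"))
print(mint_uri("title", 1313))
```

Name-based identifiers are easier to read, while number-based ones avoid collisions when two authors or titles share a label.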


Page 13: Building a Linked Open Data Set


Mondeca Labs

Linked Open Vocabularies (LOV)

Vocabulary of a Friend (VOAF)

A vocabulary for describing other vocabularies

http://labs.mondeca.com/dataset/lov

Page 14: Building a Linked Open Data Set


http://library.si.edu/tl2/author/darwin

http://library.si.edu/tl2/title/origin…

tl2:creator http://library.si.edu/tl2/title/1313

owl:sameAs http://viaf.org/viaf/27063124

dc:creator http://library.si.edu/tl2/author/darwin

owl:sameAs http://www.archive.org/details/originofspecies00darwuoft
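The links on this slide can be written out as N-Triples, one statement per line. This is a sketch of one reading of the diagram: it assumes the first two links belong to the author and the last two to the title, and that the title resource is the one numbered 1313. The `tl2` namespace URI is hypothetical, since the slides do not give it, and the Dublin Core elements namespace is assumed for `dc:`:

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"   # assumed dc: namespace
TL2_CREATOR = "http://library.si.edu/tl2/terms/creator"  # hypothetical namespace

AUTHOR = "http://library.si.edu/tl2/author/darwin"
TITLE = "http://library.si.edu/tl2/title/1313"  # assumed to be the title node


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one URI-to-URI statement as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    (AUTHOR, TL2_CREATOR, TITLE),
    (AUTHOR, OWL_SAME_AS, "http://viaf.org/viaf/27063124"),
    (TITLE, DC_CREATOR, AUTHOR),
    (TITLE, OWL_SAME_AS,
     "http://www.archive.org/details/originofspecies00darwuoft"),
]

lines = [ntriple(*t) for t in triples]
print("\n".join(lines))
```

The two `owl:sameAs` statements are what connect the local data set to outside data sets (VIAF and the Internet Archive).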

Page 15: Building a Linked Open Data Set


foaf:lastName, foaf:familyName

foaf:firstName, foaf:givenName

foaf:name, skos:prefLabel

tl2:birthYear

tl2:deathYear

skos:definition

tl2:personAbbreviation

tl2:titleNumber

dc:title

event:place

dc:publisher

dc:created

tl2:titleAbbreviation

http://library.si.edu/tl2/author/darwin
RDF Type = foaf:Person

http://library.si.edu/tl2/title/origin…
RDF Type = bibo:Book
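A sketch of how one author record might be turned into statements using the person predicates listed above. The record field names (`last_name`, `birth_year`, and so on) and the `describe_person` helper are hypothetical; the actual TL-2 database schema is not shown in the slides:

```python
# Hypothetical record fields mapped to predicates named on the slide.
PERSON_FIELDS = {
    "last_name": "foaf:familyName",
    "first_name": "foaf:givenName",
    "full_name": "foaf:name",
    "birth_year": "tl2:birthYear",
    "death_year": "tl2:deathYear",
}


def describe_person(uri, record):
    """Yield (subject, predicate, object) tuples for one author record."""
    yield (uri, "rdf:type", "foaf:Person")
    for field, predicate in PERSON_FIELDS.items():
        value = record.get(field)
        if value is not None:  # skip fields the record lacks
            yield (uri, predicate, value)


record = {
    "last_name": "Darwin",
    "first_name": "Charles",
    "full_name": "Charles Darwin",
    "birth_year": 1809,
    "death_year": 1882,
}

uri = "http://library.si.edu/tl2/author/darwin"
triples = list(describe_person(uri, record))
for t in triples:
    print(t)
```

Each populated field becomes one triple, which is why the triple count grows much faster than the record count.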

Page 16: Building a Linked Open Data Set

Great! Let’s make some linked data!

How are we going to store all this?

We’re using Drupal. RDFa is built in; RDF Extensions is an add-on module.

Probably not a good idea for very large datasets.

TL-2: 10,000 authors + 37,000 titles becomes about 400,000 triples.
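The figures above imply an average number of triples per record. A quick worked check, using only the rounded numbers from the slide:

```python
# Rounded figures from the slide.
authors = 10_000
titles = 37_000
triples = 400_000

records = authors + titles
triples_per_record = triples / records

print(f"{records:,} records, about {triples_per_record:.1f} triples each")
```

The result is only an average over rounded inputs; individual authors and titles will carry more or fewer statements.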


Page 17: Building a Linked Open Data Set

Storage considerations

Performance of Drupal Import:

Feeds Import: 7 Hours for 35k Records

Other options? Still searching…

Our linked data set will grow to at least 600-700k Drupal nodes.

Is Drupal the best way to do this?
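A rough projection of what the import rate above would mean at the expected size. This sketch assumes the rate stays constant and that one imported record corresponds to one Drupal node; neither assumption comes from the slides:

```python
# Measured figures from the slide.
records_imported = 35_000
hours_taken = 7

rate_per_hour = records_imported / hours_taken  # records per hour


def projected_hours(target_nodes: int) -> float:
    """Estimate import time, assuming the measured rate scales linearly."""
    return target_nodes / rate_per_hour


for target in (600_000, 700_000):
    hours = projected_hours(target)
    print(f"{target:,} nodes: about {hours:.0f} hours ({hours / 24:.1f} days)")
```

At this rate a full load would take days of continuous importing, which is the motivation for looking at other storage options.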


Page 18: Building a Linked Open Data Set

Storage considerations

2000 US Census

19 million households received “long form”

Joshua Tauberer: converted to 1 billion triples

http://www.rdfabout.com/demo/census/

Carefully consider your storage options!


Page 19: Building a Linked Open Data Set

Storage

ARC2 used by Drupal 7

RDBMS via D2RQ

RDBMS via Triplify

OpenLink Virtuoso

See Also:

http://www.w3.org/2001/sw/rdb2rdf/use-cases/


Page 20: Building a Linked Open Data Set

Linked Data. What’s the point?

Disambiguation

Connecting Relevant Information

More visible via search

Enrichment of your data

Easier reuse of data


Page 21: Building a Linked Open Data Set


Page 22: Building a Linked Open Data Set


http://en.openei.org/apps/mashathon2010/

Page 23: Building a Linked Open Data Set


http://data.nytimes.com/schools/schools.html

Page 24: Building a Linked Open Data Set


http://data.nytimes.com/N38444093941437235523

Page 25: Building a Linked Open Data Set


http://www.worldcat.org/oclc/7619054

Page 27: Building a Linked Open Data Set


Thank you!

?

[email protected]

slideshare.net/joelrichard