Building the New Open Linked Library
Presented at the LITA National Forum, September 30, 2011.
Building the New Open Linked Library: Theory and Practice
…and results!
Keri Thompson, Joel Richard, Trish Rose-Sandler
LITA National Forum, September 30, 2011
• Founded in 1846
• 1.5 million volumes in the collection, plus assorted archival collections
• 15,000 volumes scanned and online
• 20 libraries serving ~500 researchers/curators, plus hundreds of fellows and interns
• 102 library staff
• 1.5 web staff
• Founding member of the Biodiversity Heritage Library
Smithsonian Libraries
WHY Linked Open Data?
• It’s cool
• “Increase and Diffusion of Knowledge”
• Share and contribute to a global database
• Create context around our data
• Allow data to be reused/repurposed by ourselves and others
• Improve discoverability of our content
Linked Data in our Library
“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”
Tim Berners-Lee, Linked Data – Design Issues
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
Linked Data
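Rule 3 is commonly implemented with HTTP content negotiation. One URI serves HTML to browsers and RDF to machine clients, chosen from the request's Accept header. Below is a minimal Python sketch. It assumes just two representations (HTML and RDF/XML) and treats only `*/*` as a wildcard. `choose_representation` is a hypothetical helper, not part of Drupal or any library.

```python
from typing import Optional

# Representations this sketch assumes the server can produce
AVAILABLE = ("text/html", "application/rdf+xml")


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if media_type:
            ranges.append((media_type, q))
    # sorted() is stable, so types with equal q keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_representation(accept: Optional[str]) -> str:
    """Pick the media type to serve for a URI lookup; HTML is the fallback."""
    if not accept:
        return "text/html"
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return "text/html"
    return "text/html"


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

Falling back to HTML when nothing matches is a design choice here. A stricter server could answer 406 Not Acceptable instead.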
• Publishing structured data on the web
• RDF (Resource Description Framework)
• Enables computer-to-computer queries
• Uses standard ontologies (vocabularies)
• Data expressed as “triples” (stored in a “triplestore”)
• Freely available to use, reuse, and republish with no restrictions
• Made available through various mechanisms, such as .csv files and APIs
Example triple:
Subject (URI): http://library.si.edu/tl2/author/charles-darwin
Predicate: owl:sameAs
Object: http://viaf.org/viaf/27063124
Linked Data Open Data
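The triple is the unit everything else is built from. The toy in-memory store below, written in Python, makes "data in triples" and "computer-to-computer queries" concrete. It is an illustration only: a real triplestore adds indexes, typed literals, and SPARQL. The class name `TinyTripleStore` is hypothetical. The sample triples are copied from the TL-2 Darwin record shown later in this deck.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TinyTripleStore:
    """Toy triplestore: a set of (subject, predicate, object) tuples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


darwin = "http://library.si.edu/tl2/author/darwin"
book = "http://library.si.edu/tl2/book/1313"
viaf = "http://viaf.org/viaf/27063124"

store = TinyTripleStore()
store.add(darwin, "tl2:creatorOf", book)
store.add(darwin, "owl:sameAs", viaf)
store.add(darwin, "foaf:lastName", "Darwin")
store.add(book, "dc:creator", darwin)

# Which local resource is the same as VIAF record 27063124?
print(store.match(p="owl:sameAs", o=viaf))
```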
Organically grown since 1995
• 83,000 HTML pages
• 3,700 ColdFusion pages
• 253,000 JPEG files
• 27,000 PNG files
• 46,000 PDFs
No CMS.
Our Website
1. Analyze and categorize our current & future online content
2. Create high-level data models for common content types
Questions: Where are we metadata-rich? What do we have that others don’t? What is feasible right now?
Digital Library Planning
• 400+ Online “books”
• Exhibitions
• Research Tools
• Image Collections (60,000+ images)
• “Brochure” content (About us, Locations, Hours)
• Bibliographies, Fact Sheets, Subject Guides
• Databases, inventories and database-like books
Collections not on our website:
• ~15,000 digitized volumes, with many more planned
• Other analog collections that will be digitized
Content Analysis
Books (and book-like objects)
• Expose bibliographic data for reuse
• Consume links to other internal content and external authoritative data
Databases
• Expose data previously unavailable
• Provide authoritative data
• Consume our data and others’ to create new aggregate websites
Linked Data in our Library
1. Decide which data elements should be exposed as linked data for each content type
2. Choose appropriate vocabularies
3. Create a rough timeline and plan for migrating site content (= 1 year*)
* Optimism included in this estimate
Linked Digital Library Planning
Implement all this linked open data goodness (and a shiny new website) by moving to Drupal 7
Linked Data in our Library
Drupal and Linked Data
• Native support for RDFa in Drupal 7.
• The RDF Extensions (rdfx) module adds even more features.
• Vocabularies can be imported and cached for reuse.
• Few or no modifications to the HTML are needed to support RDFa.
What’s the difference between RDF, RDF/XML and RDFa?
<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<img typeof="foaf:Image" src="http://localhost:8087/images/origin-of-species.png" alt="The origin of species cover image" title="The origin of species cover image" />
<div rel="bibo:authorList">
<a href="/content/darwin-charles-1809-1882">Darwin, Charles, 1809-1882
</a>
</div>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
<div rel="owl:sameAs">
<a href="http://www.worldcat.org/oclc/1184647" target="_blank">http://www.worldcat.org/oclc/1184647</a>
</div>
URI: http://library.si.edu/book/origin-of-species
RDFa Sample
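The Python scanner below, built on the standard library's `html.parser`, shows roughly what a machine can pull out of this RDFa markup. It is deliberately simplified. It only pairs `property` with `content` or with the element's text, and `rel` with the next nested `href`. It ignores `about`/`typeof` scoping and prefix declarations, so it is an illustration, not a conformant RDFa processor. `SimpleRDFaScanner` is a hypothetical name.

```python
from html.parser import HTMLParser


class SimpleRDFaScanner(HTMLParser):
    """Collect (predicate, value) pairs; NOT a conformant RDFa processor."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []
        self._text_prop = None   # property waiting for its element's text
        self._text: list[str] = []
        self._rel = None         # rel waiting for a nested link's href

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "property" in a:
            if a.get("content") is not None:
                # Value given explicitly, as on the <meta> element
                self.pairs.append((a["property"], a["content"]))
            else:
                self._text_prop, self._text = a["property"], []
        if "rel" in a:
            self._rel = a["rel"]
        elif self._rel and a.get("href"):
            self.pairs.append((self._rel, a["href"]))
            self._rel = None

    def handle_data(self, data):
        if self._text_prop:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the pending property
        if self._text_prop:
            self.pairs.append((self._text_prop, "".join(self._text).strip()))
            self._text_prop = None


SNIPPET = """
<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<div rel="bibo:authorList">
  <a href="/content/darwin-charles-1809-1882">Darwin, Charles, 1809-1882</a>
</div>
<div property="dc:created">November 24, 1859</div>
<div rel="owl:sameAs">
  <a href="http://www.worldcat.org/oclc/1184647">http://www.worldcat.org/oclc/1184647</a>
</div>
"""

scanner = SimpleRDFaScanner()
scanner.feed(SNIPPET)
scanner.close()
for predicate, value in scanner.pairs:
    print(predicate, "->", value)
```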
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/terms/" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:owl="http://www.w3.org/2002/07/owl#">
<rdf:Description rdf:about="http://localhost:8087/content/origin-species">
<rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
<dc:title>The Origin of Species</dc:title>
<dc:created>November 24, 1859</dc:created>
<bibo:numPages>1000</bibo:numPages>
<dc:language>english</dc:language>
<bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/>
<owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
</rdf:Description>
</rdf:RDF>
RDF/XML Sample
URI: http://library.si.edu/book/origin-of-species.rdf
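Flattening RDF/XML into N-Triples (one triple per line) is a handy way to check what a page actually asserts. The Python sketch below uses only `xml.etree.ElementTree`. It assumes the simple layout of the sample: one `rdf:Description` per subject, with each property given either as `rdf:resource` or as plain text. Blank nodes, `rdf:parseType`, datatypes and language tags are not handled. `rdfxml_to_ntriples` is a hypothetical helper. The embedded excerpt declares the `owl` namespace, which `owl:sameAs` requires.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>"""


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    # Escape backslashes first, then quotes and newlines (N-Triples string rules)
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rdfxml_to_ntriples(xml_text: str) -> list[str]:
    """Flatten simple rdf:Description blocks into N-Triples lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        if about is None:
            continue  # blank-node subjects are out of scope for this sketch
        for prop in desc:
            # ElementTree reports names as "{namespace}localname"
            ns, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF}}}resource")
            obj = iri(resource) if resource is not None else literal((prop.text or "").strip())
            lines.append(f"{iri(about)} {iri(ns + local)} {obj} .")
    return lines


for line in rdfxml_to_ntriples(SAMPLE):
    print(line)
```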
• Fields, Views, Views UI
• Node Reference
• SPARQL Endpoint, SPARQL API
• RESTful Web Services
• SPARQL Views
• RDF External Vocabulary Importer
Caveat: Some modules are not ready for Drupal 7
• e.g., the Biblio module (no CCK or RDF capabilities)
What other modules are we using?
• Drupal 7 comes with several namespaces. We will use: DC Terms, FOAF, SKOS, OWL
• We're working with books, so we need the Bibliographic Ontology:
• Website: http://bibliontology.com/
• Namespace: http://purl.org/ontology/bibo/
• Prefix: “bibo”
• We may also create our own vocabulary.
What about Namespaces/Vocabularies?
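Prefixes such as `bibo:` are shorthand for full namespace IRIs. The Python sketch below expands and compacts names using the vocabularies listed above. The namespace IRIs are the published ones. The prefix `dc` for DC Terms follows the samples in this deck. The helper names are hypothetical. A local vocabulary (for example the `tl2:` terms used later) would be one more entry once it has a namespace IRI.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",              # DC Terms, prefixed "dc" as in the samples
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a compact name such as 'bibo:Book' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact_iri(iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Shorten an IRI when a known namespace matches; otherwise return it unchanged."""
    # Try longer namespaces first so the most specific prefix wins
    for prefix, ns in sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True):
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return iri


print(expand_curie("bibo:Book"))
print(compact_iri("http://xmlns.com/foaf/0.1/Person"))
```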
Adding a Namespace to Drupal
Setting up RDF Mappings in Drupal
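Drupal generates the RDFa itself once fields are mapped to predicates. Purely to illustrate a mapping driving the markup, here is a Python sketch. `BOOK_MAPPING` and `render_rdfa` are hypothetical and do not mirror Drupal's internal data structures. Values are HTML-escaped, and fields without a mapping are rendered without RDFa.

```python
from html import escape

# Hypothetical field-to-predicate mapping for a "book" content type
BOOK_MAPPING = {
    "title": "dc:title",
    "created": "dc:created",
    "num_pages": "bibo:numPages",
    "language": "dc:language",
}


def render_rdfa(record: dict[str, str], mapping: dict[str, str]) -> str:
    """Emit one <div> per field, adding a property attribute when mapped."""
    parts = []
    for field, value in record.items():
        predicate = mapping.get(field)
        if predicate is None:
            # Unmapped fields are still displayed, just without RDFa
            parts.append(f"<div>{escape(value)}</div>")
        else:
            parts.append(f'<div property="{escape(predicate)}">{escape(value)}</div>')
    return "\n".join(parts)


print(render_rdfa({"title": "The Origin of Species", "num_pages": "1000"}, BOOK_MAPPING))
```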
Taxonomic Literature 2 (1977-2009)
• The standard reference work for plant taxonomic literature from Linnaeus to 1940.
• Contains botanists, authors, biographies, citations, and species.
• Indexed and cross-referenced.
• Should be digitized and on the web!
• SIL aims to be an authority for botanist names on the Internet.
Databases: TL-2
Taxonomic Literature 2 (TL-2). v1., p. 600
TL-2 Page Sample
http://library.si.edu/tl2/author/darwin
http://library.si.edu/tl2/book/1313
tl2:creatorOf http://library.si.edu/tl2/book/1313
owl:sameAs http://viaf.org/viaf/27063124
dc:creator http://library.si.edu/tl2/author/darwin
owl:sameAs http://www.archive.org/details/originofspecies00darwuoft
TL-2 Page Sample
foaf:lastName, foaf:familyName
foaf:firstName, foaf:givenName
foaf:name, skos:prefLabel
tl2:birthYear
tl2:deathYear
tl2:description
tl2:personAbbrev
tl2:bookNumber
dc:title
event:place
dc:publisher
dc:created
tl2:bookAbbreviation
http://library.si.edu/tl2/author/darwin: RDF type = foaf:Person
http://library.si.edu/tl2/book/1313: RDF type = bibo:Book
TL-2 Page Sample
http://library.si.edu/tl2/author/darwin
  tl2:creatorOf <http://library.si.edu/tl2/book/1313>
  owl:sameAs <http://viaf.org/viaf/27063124>
  foaf:lastName "Darwin"
  foaf:familyName "Darwin"
  foaf:firstName "Charles"
  foaf:givenName "Charles"
  foaf:name "Darwin, Charles Robert"
  skos:prefLabel "Darwin, Charles Robert"
  tl2:birthYear "1809"
  tl2:deathYear "1882"
  tl2:description "British evolutionary biologist"
  tl2:personAbbrev "Darwin"
http://library.si.edu/tl2/book/1313
  dc:creator <http://library.si.edu/tl2/author/darwin>
  owl:sameAs <http://www.archive.org/details/originofspecies00darwuoft>
  tl2:bookNumber "1313"
  bibo:shortTitle "On the origin of species"
  dc:title "On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life."
  event:place "London"
  dc:publisher "John Murray"
  dc:created "1859"
  tl2:bookAbbreviation "Origin sp."
TL-2 Page Sample Results
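The field mapping can be read as a small data-driven transform: each TL-2 field feeds one or more predicates, and each record gets an `rdf:type`. The Python sketch below follows that reading, using the Darwin values shown above. The field names (`last_name`, `birth_year` and so on) and `record_to_triples` are hypothetical. Predicates are kept as prefixed names rather than expanded IRIs.

```python
# Hypothetical TL-2 field names; predicates follow the mapping slide
AUTHOR_MAPPING = {
    "last_name": ["foaf:lastName", "foaf:familyName"],
    "first_name": ["foaf:firstName", "foaf:givenName"],
    "full_name": ["foaf:name", "skos:prefLabel"],
    "birth_year": ["tl2:birthYear"],
    "death_year": ["tl2:deathYear"],
}


def record_to_triples(uri: str, rdf_type: str, record: dict[str, str],
                      mapping: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Emit an rdf:type triple, then one triple per (field, predicate) pair."""
    triples = [(uri, "rdf:type", rdf_type)]
    for field, value in record.items():
        for predicate in mapping.get(field, []):  # unmapped fields are skipped
            triples.append((uri, predicate, value))
    return triples


darwin = {
    "last_name": "Darwin",
    "first_name": "Charles",
    "full_name": "Darwin, Charles Robert",
    "birth_year": "1809",
    "death_year": "1882",
}
triples = record_to_triples("http://library.si.edu/tl2/author/darwin",
                            "foaf:Person", darwin, AUTHOR_MAPPING)
for t in triples:
    print(*t)
```

Emitting the same value under two predicates (e.g. `foaf:lastName` and `foaf:familyName`) mirrors the slide: consumers that know either term will find the data.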
• Two Content Types: Authors (Botanists) and Publications
• Node Reference between Authors and Publications based on the TL-2 index.
• Other data will be available once it is parsed:
• Herbaria
• Institutions
• Species names
• Bibliographies
• Handwriting samples
• Postage stamps
Setting up TL-2 in Drupal
• Create Content Types (Digital Library books & TL-2)
• Create import process
• May be able to use the Feeds module for import
• Must create node references during the import
• Must accommodate the blocks of unparsed information in TL-2
• Create a search interface specifically for TL-2
Image Credits: Database: eponas-deeway (http://eponas-deeway.deviantart.com); Magnifying Glass: Flahorn (http://flahorn.deviantart.com/)
Getting Data into Drupal
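Creating node references during the import suggests two passes. First create all author nodes, then create publications and link each one to its author through the TL-2 index. The Python sketch below shows that ordering and is independent of the Feeds module or any Drupal API. The record keys (`abbrev`, `author_abbrev`), `Node`, and `import_tl2` are hypothetical. The second publication in the usage example is made up to show an unresolved reference.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    nid: int
    kind: str   # "author" or "publication"
    title: str
    refs: list[int] = field(default_factory=list)  # referenced node IDs


def import_tl2(authors: list[dict], publications: list[dict]) -> tuple[dict[int, Node], list[str]]:
    """Pass 1 creates author nodes; pass 2 creates publications and links them."""
    next_nid = count(1)
    nodes: dict[int, Node] = {}
    author_by_abbrev: dict[str, int] = {}

    for rec in authors:
        node = Node(next(next_nid), "author", rec["name"])
        nodes[node.nid] = node
        author_by_abbrev[rec["abbrev"]] = node.nid

    unresolved: list[str] = []
    for rec in publications:
        node = Node(next(next_nid), "publication", rec["title"])
        nodes[node.nid] = node
        author_nid = author_by_abbrev.get(rec["author_abbrev"])
        if author_nid is None:
            unresolved.append(rec["title"])  # set aside for manual review
        else:
            node.refs.append(author_nid)             # publication -> author
            nodes[author_nid].refs.append(node.nid)  # author -> publication
    return nodes, unresolved


nodes, unresolved = import_tl2(
    [{"name": "Darwin, Charles Robert", "abbrev": "Darwin"}],
    [
        {"title": "Origin sp.", "author_abbrev": "Darwin"},
        {"title": "Made-up unmatched work", "author_abbrev": "Nobody"},
    ],
)
print(nodes[2].refs, unresolved)
```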
Resolve /node/22365.rdf and /tl2/author/charles-darwin
Handling "See also" and "Same as" entries in the TL-2 indexes.
Can we search our own data using SPARQL?
• Should we? Does it make sense?
Discuss/Extend vocabulary for our special needs.
Set up linked data within our site:
• Image collections
• Trade literature
• Exhibitions
What else is there to do?
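Resolving both URL forms comes down to two lookups. First, strip an optional `.rdf` suffix to pick the output format. Then map the remaining path to a node, either from the `/node/<id>` pattern or through a path alias. The Python sketch below illustrates this. The alias table and its node ID are invented for illustration, and `resolve` is a hypothetical helper rather than Drupal's routing.

```python
import re
from typing import Optional

# Hypothetical alias table; the node ID here is made up for illustration
ALIASES = {"/tl2/author/charles-darwin": 101}

NODE_PATH = re.compile(r"/node/(\d+)")


def resolve(path: str) -> Optional[tuple[int, str]]:
    """Map a request path to (node ID, format), or None if nothing matches."""
    fmt = "html"
    if path.endswith(".rdf"):
        path, fmt = path[: -len(".rdf")], "rdf"
    m = NODE_PATH.fullmatch(path)
    if m:
        return int(m.group(1)), fmt
    nid = ALIASES.get(path)
    return (nid, fmt) if nid is not None else None


print(resolve("/node/22365.rdf"))
print(resolve("/tl2/author/charles-darwin"))
```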
LinkedData.org
http://linkeddata.org/guides-and-tutorials
http://linkeddatabook.com/editions/1.0/
Drupal Groups
http://groups.drupal.org/semantic-web
http://groups.drupal.org/libraries
Tim Berners-Lee, TED talks
Tim Berners-Lee on the next Web (2009)
The year open data went worldwide (2010)
Other Resources
BHL is…
• A consortium of 13 natural history and botanical libraries and research institutions
• An open access digital library for legacy biodiversity literature
• An open data repository of taxonomic names and bibliographic information
Allows data that was created for a specific purpose and audience to interact with other data and serve new, previously unimagined roles.
Benefits of open data
What information have we opened up?
Essentially, everything – our metadata (descriptive, rights, structural), our image files, scientific names, OCR’d files
Technical methods for opening data
• Data exports
• APIs
• OpenURL
• OAI-PMH
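To illustrate the OAI-PMH route, the Python sketch below builds ListRecords request URLs with the standard library. `https://example.org/oai` is a placeholder rather than BHL's endpoint, and `list_records_url` is a hypothetical helper. The argument rules come from the OAI-PMH 2.0 specification: `metadataPrefix` on the first request, and only `resumptionToken` on follow-up requests.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint; substitute the repository's real OAI-PMH base URL
BASE_URL = "https://example.org/oai"


def list_records_url(metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    A follow-up request carries only the verb and the resumptionToken (an
    exclusive argument); the first request names the metadata format and
    may add selective-harvesting arguments such as "from".
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date
    return f"{BASE_URL}?{urlencode(params)}"


print(list_records_url(from_date="2011-01-01"))
```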
Who is reusing our data?
• Tropicos
• Rod Page – BioGUID, BioStor
• Encyclopedia of Life
• Ryan Schenk – Visualizing taxonomic synonyms
Who is reusing our data? Tropicos
Tropicos
Tropicos
Who is reusing our data?
Rod Page – BioGUID – http://bioguid.info/bhl/
Who is reusing our data?
Rod Page – BioStor – http://biostor.org/
Who is reusing our data?
Rod Page – BioStor – http://biostor.org/
Who is reusing our data?
Encyclopedia of Life – http://eol.org/
Who is reusing our data?
Encyclopedia of Life – http://eol.org/
Who is reusing our data?
Encyclopedia of Life – http://eol.org/
Who is reusing our data?
Ryan Schenk – http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/
Who is reusing our data?
Making open data successful
• Promote it!
Do a code challenge
Publicly display your data’s copyright/licensing and API terms of service
Keri Thompson, Head of Web Services, Smithsonian Institution Libraries
@DigiKeri_SIL
Joel Richard, Lead Developer, Smithsonian Institution Libraries
Trish Rose-Sandler, Data Analyst, Biodiversity Heritage Library
Building the New Open Linked Library
Thank You!