Building the New Open Linked Library

Building the New Open Linked Library: Theory and Practice …and results! Keri Thompson, Joel Richard, Trish Rose-Sandler. LITA National Forum, September 30, 2011


DESCRIPTION

Presented at the LITA National Forum, September 30, 2011

TRANSCRIPT

Page 1: Building the New Open Linked Library

Building the New Open Linked Library: Theory and Practice

…and results!

Keri Thompson, Joel Richard, Trish Rose-Sandler

LITA National Forum, September 30, 2011

Page 2: Building the New Open Linked Library

• Founded in 1846
• 1.5 million volumes in collection, plus assorted archival collections
• 15,000 volumes scanned and online
• 20 libraries serving ~500 researchers/curators + hundreds of fellows and interns
• 102 library staff
• 1.5 web staff
• Founding member of the Biodiversity Heritage Library

Smithsonian Libraries


Page 3: Building the New Open Linked Library

WHY Linked Open Data?

• It’s cool
• “Increase and Diffusion of Knowledge”
• Share, contribute to a global database
• Create context around our data
• Allow data to be reused/repurposed by ourselves and others
• Improve discoverability of our content

Linked Data in our Library


Page 4: Building the New Open Linked Library

“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”

Tim Berners-Lee, Linked Data – Design Issues

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things.
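Rules 3 and 4 are what make the data explorable: look a URI up, get triples back, then follow the URIs inside them. The sketch below imitates that loop in memory. The WEB dictionary is a hypothetical stand-in for HTTP lookups, filled with the TL-2 links that appear later in this deck; a real client would fetch each URI over HTTP and parse the RDF it gets back.

```python
from collections import deque

# Hypothetical stand-in for the web of data: each URI maps to the triples a
# client would get back by dereferencing it. URIs and links come from the
# TL-2 example later in this deck; a real client would fetch them over HTTP.
WEB = {
    "http://library.si.edu/tl2/author/darwin": [
        ("http://library.si.edu/tl2/author/darwin", "tl2:creatorOf",
         "http://library.si.edu/tl2/book/1313"),
        ("http://library.si.edu/tl2/author/darwin", "owl:sameAs",
         "http://viaf.org/viaf/27063124"),
    ],
    "http://library.si.edu/tl2/book/1313": [
        ("http://library.si.edu/tl2/book/1313", "dc:creator",
         "http://library.si.edu/tl2/author/darwin"),
        ("http://library.si.edu/tl2/book/1313", "owl:sameAs",
         "http://www.archive.org/details/originofspecies00darwuoft"),
    ],
}


def dereference(uri):
    """Rule 3 stand-in: 'look up' a URI and return the triples describing it."""
    return WEB.get(uri, [])


def follow_your_nose(start, max_hops=2):
    """Rule 4 in action: crawl outward from one URI along the links it exposes.

    Returns the set of URIs discovered and every triple collected on the way.
    """
    seen = {start}
    queue = deque([(start, 0)])
    collected = []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            # Follow only HTTP URIs (rule 2) and stop expanding at max_hops.
            if o.startswith("http://") and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return seen, collected


uris, triples = follow_your_nose("http://library.si.edu/tl2/author/darwin")
for uri in sorted(uris):
    print(uri)
```

Starting from the author alone, the crawl reaches the book, the VIAF record and the Internet Archive copy, which is the "when you have some of it, you can find other, related, data" effect in miniature.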

Linked Data


Page 5: Building the New Open Linked Library

• Publishing structured data on the web
• RDF (Resource Description Framework)
• Enables computer-to-computer queries
• Uses standard ontologies (vocabularies)
• Data in “triples” (“triplestore”)
• Freely available to use, reuse, republish with no restrictions
• Made available through various mechanisms such as .csv files, APIs

Subject (URI): http://library.si.edu/tl2/author/charles-darwin
Predicate: owl:sameAs
Object: http://viaf.org/viaf/27063124
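The triple above can also be written in N-Triples, one of the simplest RDF serializations: one statement per line, full URIs in angle brackets, and a closing " .". A minimal Python sketch, assuming owl:sameAs expands to the standard OWL namespace (http://www.w3.org/2002/07/owl#); the helper names uri, literal and ntriple are made up for this example.

```python
# Full URI behind the slide's owl:sameAs shorthand (OWL namespace + "sameAs").
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """An IRI term in N-Triples is written inside angle brackets."""
    return f"<{value}>"


def literal(value):
    """A plain string literal: escape backslashes, double quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """One statement per line, terminated by ' .'"""
    return f"{subject} {predicate} {obj} ."


line = ntriple(
    uri("http://library.si.edu/tl2/author/charles-darwin"),
    uri(OWL_SAME_AS),
    uri("http://viaf.org/viaf/27063124"),
)
print(line)
```

Literal objects (names, dates, titles) go through literal() instead of uri(), so a value containing quotes or newlines still fits on one line.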

Linked Data / Open Data


Page 6: Building the New Open Linked Library

Organically grown since 1995

• 83,000 HTML pages
• 3,700 ColdFusion pages
• 253,000 JPEG files
• 27,000 PNG files
• 46,000 PDFs

No CMS.

Our Website


Page 7: Building the New Open Linked Library

1. Analyze and categorize our current & future online content

2. Create high-level data models for common content types

Questions:
• Where are we metadata-rich?
• What do we have that others don’t?
• What is feasible right now?

Digital Library Planning


Page 8: Building the New Open Linked Library

• 400+ Online “books”
• Exhibitions
• Research Tools
• Image Collections (60,000+ images)
• “Brochure” content (About us, Locations, Hours)
• Bibliographies, Fact Sheets, Subject Guides
• Databases, inventories and database-like books

Collections not on our website:
• ~15,000 digitized volumes, with many more planned
• Other analog collections that will be digitized

Content Analysis


Page 9: Building the New Open Linked Library

Books (and book-like objects)
• Expose bibliographic data for reuse
• Consume links to other internal content and external authoritative data

Databases
• Expose data previously unavailable
• Provide authoritative data
• Consume our data and others’ to create new aggregate websites

Linked Data in our Library


Page 10: Building the New Open Linked Library

1. Decide which data elements should be exposed as linked data for each content type

2. Choose appropriate vocabularies
3. Create a rough timeline and plan for migrating site content (=1 year*)

* Optimism included in this estimate

Linked Digital Library Planning


Page 11: Building the New Open Linked Library

Implement all this linked open data goodness (and a shiny new website) by moving to Drupal 7

Linked Data in our Library


Page 12: Building the New Open Linked Library

Drupal and Linked Data

• Native support for RDFa in Drupal 7.

• RDF Extensions (rdfx) – even more features.

• Vocabularies can be imported and cached for reuse.

• Few or no modifications to HTML to support RDFa.

What’s the difference between RDF, RDF/XML and RDFa?


Page 13: Building the New Open Linked Library

<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<img typeof="foaf:Image" src="http://localhost:8087/images/origin-of-species.png" alt="The origin of species cover image" title="The origin of species cover image" />
<div rel="bibo:authorList">
  <a href="/content/darwin-charles-1809-1882">Darwin, Charles, 1809-1882</a>
</div>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
<div rel="owl:sameAs">
  <a href="http://www.worldcat.org/oclc/1184647" target="_blank">http://www.worldcat.org/oclc/1184647</a>
</div>

URI: http://library.si.edu/book/origin-of-species

RDFa Sample
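Because RDFa lives in ordinary HTML attributes, a program can read the triples straight out of the page. The sketch below uses Python's standard html.parser on a well-formed copy of the fragment above and pulls out the property= and rel= statements. It handles only the patterns used on this slide and assumes every statement is about /book/origin-species, as the <meta> element says; TinyRDFaReader is a made-up name, and for real pages a conforming RDFa processor is the right tool.

```python
from html.parser import HTMLParser

# A well-formed copy of the RDFa fragment on the slide.
RDFA_HTML = """
<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<img typeof="foaf:Image" src="http://localhost:8087/images/origin-of-species.png"
     alt="The origin of species cover image" title="The origin of species cover image" />
<div rel="bibo:authorList">
  <a href="/content/darwin-charles-1809-1882">Darwin, Charles, 1809-1882</a>
</div>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
<div rel="owl:sameAs">
  <a href="http://www.worldcat.org/oclc/1184647" target="_blank">http://www.worldcat.org/oclc/1184647</a>
</div>
"""


class TinyRDFaReader(HTMLParser):
    """Reads a small subset of RDFa: property= (value from content= or the element's
    text) and rel= (value from the href of the next nested link). Prefixes, typeof=
    and nested about= scoping are ignored, so this is NOT a conforming RDFa processor."""

    def __init__(self, page_subject):
        super().__init__()
        self.page_subject = page_subject  # subject used when an element has no about=
        self.triples = []
        self._open_property = None  # (subject, predicate, tag) waiting for element text
        self._open_rel = None       # (subject, predicate) waiting for a nested href
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about", self.page_subject)
        if "property" in a:
            if "content" in a:
                self.triples.append((subject, a["property"], a["content"]))
            else:
                self._open_property = (subject, a["property"], tag)
                self._text = []
        if "rel" in a:
            self._open_rel = (subject, a["rel"])
        elif "href" in a and self._open_rel is not None:
            s, p = self._open_rel
            self.triples.append((s, p, a["href"]))
            self._open_rel = None

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Close a pending property= at the end tag of the element that opened it.
        if self._open_property is not None and tag == self._open_property[2]:
            s, p, _ = self._open_property
            self.triples.append((s, p, "".join(self._text).strip()))
            self._open_property = None


reader = TinyRDFaReader(page_subject="/book/origin-species")
reader.feed(RDFA_HTML)
reader.close()
for triple in reader.triples:
    print(triple)
```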


Page 14: Building the New Open Linked Library

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <dc:created>November 24, 1859</dc:created>
    <bibo:numPages>1000</bibo:numPages>
    <dc:language>english</dc:language>
    <bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>

RDF/XML Sample


URI: http://library.si.edu/book/origin-of-species.rdf
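RDF/XML is just another way of writing the same triples. A minimal sketch with Python's xml.etree.ElementTree that walks the flat rdf:Description pattern of this sample (embedded below as well-formed XML, including an xmlns:owl declaration) and prints full-URI triples. The function name is made up, and it is not a general RDF/XML parser: rdf:parseType, nested nodes, language tags and datatypes are ignored, and a library such as rdflib handles the full syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <dc:created>November 24, 1859</dc:created>
    <bibo:numPages>1000</bibo:numPages>
    <dc:language>english</dc:language>
    <bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>
"""


def flat_rdfxml_triples(document):
    """Yield (subject, predicate, object) from flat rdf:Description blocks.

    Handles only rdf:about subjects, rdf:resource objects and plain text literals.
    """
    root = ET.fromstring(document.encode("utf-8"))
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes a qualified name as "{namespace}local";
            # namespace + local name is the predicate URI.
            namespace, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            yield subject, namespace + local, obj


triples = list(flat_rdfxml_triples(RDF_XML))
for triple in triples:
    print(triple)
```

The predicates come out as full URIs (for example http://purl.org/dc/terms/title), which is what the dc: and bibo: prefixes abbreviate.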

Page 15: Building the New Open Linked Library

• Fields, Views, Views UI

• Node Reference

• SPARQL Endpoint, SPARQL API

• RESTful Web Services

• SPARQL Views

• RDF External Vocabulary Importer

Caveat: Some modules are not yet ready for Drupal 7

• e.g., the Biblio module (no CCK or RDF capabilities)

What other modules are we using?
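The SPARQL Endpoint and SPARQL API modules let a site answer SPARQL queries over its RDF. To give a rough idea of what such a query does, the sketch below evaluates a basic graph pattern (the triple patterns inside a SPARQL WHERE clause) over a few in-memory TL-2 triples from later in this deck. It is a toy written for this illustration, not how those Drupal modules work, and it ignores OPTIONAL, FILTER and the rest of SPARQL.

```python
# A few TL-2 triples from later in this deck, written with the deck's prefixes.
TRIPLES = [
    ("http://library.si.edu/tl2/author/darwin", "tl2:creatorOf",
     "http://library.si.edu/tl2/book/1313"),
    ("http://library.si.edu/tl2/author/darwin", "foaf:name", "Darwin, Charles Robert"),
    ("http://library.si.edu/tl2/book/1313", "bibo:shortTitle", "On the origin of species"),
    ("http://library.si.edu/tl2/book/1313", "dc:created", "1859"),
]


def match(triples, pattern, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple.
    Pattern terms starting with '?' are variables; anything else must match exactly."""
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break
        else:
            yield candidate


def select(triples, patterns):
    """Evaluate a basic graph pattern: every pattern must match, sharing variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(triples, pattern, old)]
    return bindings


# Roughly: SELECT ?name ?title WHERE {
#   ?author tl2:creatorOf ?book . ?author foaf:name ?name . ?book bibo:shortTitle ?title }
rows = select(TRIPLES, [
    ("?author", "tl2:creatorOf", "?book"),
    ("?author", "foaf:name", "?name"),
    ("?book", "bibo:shortTitle", "?title"),
])
for row in rows:
    print(row["?name"], "|", row["?title"])
```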


Page 16: Building the New Open Linked Library

• Drupal 7 comes with several namespaces. We will use: DC Terms, FOAF, SKOS, OWL
• We're working with books, so we need the Bibliographic Ontology:
  • Website: http://bibliontology.com/
  • Namespace: http://purl.org/ontology/bibo/
  • Prefix: “bibo”
• We may also create our own vocabulary.

What about Namespaces/Vocabularies?
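A prefix such as bibo: is shorthand: it stands for a namespace URI, and prefix + local name gives the full term URI. A small sketch of that expansion and its inverse, assuming the usual published namespace URIs for these vocabularies and binding dc to DC Terms as the RDF/XML sample does. The tl2 prefix is deliberately left out, since no namespace URI for it is given here.

```python
# Published namespace URIs for the prefixes named on the slide.
# "dc" is bound to DC Terms; "tl2" is absent because no namespace is given for it.
NAMESPACES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(curie, namespaces=NAMESPACES):
    """'bibo:Book' -> full URI. Raises KeyError for an unregistered prefix."""
    prefix, local = curie.split(":", 1)
    return namespaces[prefix] + local


def compact(full_uri, namespaces=NAMESPACES):
    """Inverse of expand(): use the first matching namespace, else return the URI unchanged."""
    for prefix, namespace in namespaces.items():
        if full_uri.startswith(namespace):
            return f"{prefix}:{full_uri[len(namespace):]}"
    return full_uri


print(expand("bibo:Book"))
print(compact("http://xmlns.com/foaf/0.1/givenName"))
```

Failing loudly on an unknown prefix (rather than guessing) is deliberate: a silently wrong namespace produces URIs nobody else can link to.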


Page 17: Building the New Open Linked Library

Adding a Namespace to Drupal


Page 18: Building the New Open Linked Library

Setting up RDF Mappings in Drupal


Page 19: Building the New Open Linked Library

Taxonomic Literature 2 (1977-2009)

• The standard reference work for plant taxonomic literature from Linnaeus to 1940.
• Contains botanists, authors, biographies, citations, and species.
• Indexed and cross-referenced.
• Should be digitized & on the web!
• SIL aims to be an authority for botanist names on the Internet.

Databases: TL-2


Page 20: Building the New Open Linked Library

Taxonomic Literature 2 (TL-2), v. 1, p. 600

TL-2 Page Sample


Page 21: Building the New Open Linked Library

http://library.si.edu/tl2/author/darwin
  tl2:creatorOf http://library.si.edu/tl2/book/1313
  owl:sameAs http://viaf.org/viaf/27063124

http://library.si.edu/tl2/book/1313
  dc:creator http://library.si.edu/tl2/author/darwin
  owl:sameAs http://www.archive.org/details/originofspecies00darwuoft

TL-2 Page Sample


Page 22: Building the New Open Linked Library

http://library.si.edu/tl2/author/darwin (RDF Type = foaf:Person)
• foaf:lastName, foaf:familyName
• foaf:firstName, foaf:givenName
• foaf:name, skos:prefLabel
• tl2:birthYear
• tl2:deathYear
• tl2:description
• tl2:personAbbrev

http://library.si.edu/tl2/book/1313 (RDF Type = bibo:Book)
• tl2:bookNumber
• dc:title
• event:place
• dc:publisher
• dc:created
• tl2:bookAbbreviation

TL-2 Page Sample
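The mapping above pairs each TL-2 author field with one or more predicates, and applying it to a record yields the triples listed on the results slide that follows. A minimal sketch of that idea in Python: the record field names (last_name, birth_year and so on) are hypothetical, and this is not Drupal's RDF module code, which applies its mappings when it renders a node.

```python
# Field -> predicate(s), following the author half of the slide above.
# The field names on the left are hypothetical.
AUTHOR_MAPPING = {
    "last_name": ["foaf:lastName", "foaf:familyName"],
    "first_name": ["foaf:firstName", "foaf:givenName"],
    "full_name": ["foaf:name", "skos:prefLabel"],
    "birth_year": ["tl2:birthYear"],
    "death_year": ["tl2:deathYear"],
    "description": ["tl2:description"],
    "abbreviation": ["tl2:personAbbrev"],
}


def record_to_triples(subject, rdf_type, record, mapping):
    """Emit an rdf:type triple, then one triple per (field value, mapped predicate).
    Fields without a mapping are skipped."""
    triples = [(subject, "rdf:type", rdf_type)]
    for field, value in record.items():
        for predicate in mapping.get(field, []):
            triples.append((subject, predicate, value))
    return triples


# Values from the TL-2 results slide.
darwin = {
    "last_name": "Darwin",
    "first_name": "Charles",
    "full_name": "Darwin, Charles Robert",
    "birth_year": "1809",
    "death_year": "1882",
    "description": "British evolutionary biologist",
    "abbreviation": "Darwin",
}

triples = record_to_triples("http://library.si.edu/tl2/author/darwin",
                            "foaf:Person", darwin, AUTHOR_MAPPING)
for _, predicate, value in triples:
    print(predicate, value)
```

Mapping one field to two predicates (foaf:lastName and foaf:familyName, for instance) costs nothing at render time and lets consumers that know only one of the terms still find the value. Publications work the same way with a second mapping and bibo:Book as the type.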


Page 23: Building the New Open Linked Library

http://library.si.edu/tl2/author/darwin
  tl2:creatorOf http://library.si.edu/tl2/book/1313
  owl:sameAs http://viaf.org/viaf/27063124
  foaf:lastName “Darwin”
  foaf:familyName “Darwin”
  foaf:firstName “Charles”
  foaf:givenName “Charles”
  foaf:name “Darwin, Charles Robert”
  skos:prefLabel “Darwin, Charles Robert”
  tl2:birthYear “1809”
  tl2:deathYear “1882”
  tl2:description “British evolutionary biologist”
  tl2:personAbbrev “Darwin”

http://library.si.edu/tl2/book/1313
  dc:creator http://library.si.edu/tl2/author/darwin
  owl:sameAs http://www.archive.org/details/originofspecies00darwuoft
  tl2:bookNumber “1313”
  bibo:shortTitle “On the origin of species”
  dc:title “On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life.”
  event:place “London”
  dc:publisher “John Murray”
  dc:created “1859”
  tl2:bookAbbreviation “Origin sp.”

TL-2 Page Sample Results


Page 24: Building the New Open Linked Library

• Two Content Types: Authors (Botanists) and Publications

• Node Reference between Authors and Publications based on the TL-2 index.

• Other data will be available once it's parsed:
  • Herbaria
  • Institutions
  • Species names
  • Bibliographies
  • Handwriting Samples
  • Postage Stamps

Setting up TL-2 in Drupal


Page 25: Building the New Open Linked Library

• Create Content Types (Digital Library books & TL-2)
• Create import process
  • May be able to use the Feeds module for import
  • Must create node references during the import
  • Must accommodate the blocks of unparsed information in TL-2
• Create a search interface specifically for TL-2

Image Credits: Database: eponas-deeway (http://eponas-deeway.deviantart.com); Magnifying Glass: Flahorn (http://flahorn.deviantart.com/)

Getting Data into Drupal
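Creating node references during the import is easiest in two passes: first create a node for every author and publication, then turn each TL-2 index entry into a reference in both directions (tl2:creatorOf one way, dc:creator the other). A Drupal-independent sketch of that approach; the input rows, field names and numeric IDs are hypothetical, and in Drupal the IDs would come from saving the nodes (possibly through the Feeds module).

```python
# Hypothetical rows as they might come out of a TL-2 parser.
# Field names ("key", "books", "number", ...) are illustrative assumptions.
AUTHORS = [
    {"key": "darwin", "name": "Darwin, Charles Robert", "books": ["1313"]},
]
BOOKS = [
    {"number": "1313", "short_title": "On the origin of species"},
]


def import_tl2(authors, books):
    """Two-pass import: create all nodes first, then wire up references.

    Returns (nodes, unresolved). IDs are a local counter, not Drupal nids.
    """
    nodes = {}
    next_id = 1

    # Pass 1: every record becomes a node, so every index entry has a target.
    for book in books:
        nodes[("book", book["number"])] = {"id": next_id, "data": book, "refs": []}
        next_id += 1
    for author in authors:
        nodes[("author", author["key"])] = {"id": next_id, "data": author, "refs": []}
        next_id += 1

    # Pass 2: turn each index entry into references in both directions,
    # mirroring tl2:creatorOf (author -> book) and dc:creator (book -> author).
    unresolved = []
    for author in authors:
        author_node = nodes[("author", author["key"])]
        for number in author["books"]:
            book_node = nodes.get(("book", number))
            if book_node is None:
                unresolved.append((author["key"], number))  # e.g. a book not yet parsed
                continue
            author_node["refs"].append(book_node["id"])
            book_node["refs"].append(author_node["id"])
    return nodes, unresolved


nodes, unresolved = import_tl2(AUTHORS, BOOKS)
print(nodes[("author", "darwin")]["refs"], nodes[("book", "1313")]["refs"], unresolved)
```

Keeping the unresolved entries, instead of dropping them, leaves a worklist for the blocks of TL-2 that have not been parsed yet.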


Page 26: Building the New Open Linked Library

Resolve /node/22365.rdf and /tl2/author/charles-darwin

Handling "See also" and "Same as" entries in the TL-2 indexes.

Can we search our own data using SPARQL?
• Should we? Does it make sense?

Discuss/Extend vocabulary for our special needs.

Set up linked data within our site:
• Image collections
• Trade literature
• Exhibitions

What else is there to do?
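One common way to make one resource answer both as a web page and as RDF is content negotiation: an explicit .rdf suffix selects RDF/XML, and otherwise the client's Accept header decides. A minimal sketch of that decision in Python, assuming only HTML and RDF/XML are offered; negotiate and parse_accept are made-up names, and the sketch does not cover path aliases or the 303 See Other redirects often used for URIs that name real-world things such as a person.

```python
# Media types this sketch can serve; anything else falls back to HTML.
SUPPORTED = ("text/html", "application/rdf+xml")


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable (also with reverse=True), so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(path, accept="text/html"):
    """Return (resource_path, media_type) for a request.

    An explicit '.rdf' suffix always wins; otherwise the Accept header decides.
    """
    if path.endswith(".rdf"):
        return path[: -len(".rdf")], "application/rdf+xml"
    for media, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media in SUPPORTED:
            return path, media
        if media == "*/*":
            break  # anything goes: serve the human-readable page
    return path, "text/html"


print(negotiate("/node/22365.rdf"))
print(negotiate("/tl2/author/charles-darwin", "application/rdf+xml"))
print(negotiate("/tl2/author/charles-darwin", "text/html;q=0.5, application/rdf+xml"))
```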


Page 27: Building the New Open Linked Library

LinkedData.org
• http://linkeddata.org/guides-and-tutorials
• http://linkeddatabook.com/editions/1.0/

Drupal Groups
• http://groups.drupal.org/semantic-web
• http://groups.drupal.org/libraries

Tim Berners-Lee, TED talks
• Tim Berners-Lee on the next Web (2009)
• The year open data went worldwide (2010)

Other Resources


Page 28: Building the New Open Linked Library

BHL is…
• A consortium of 13 natural history and botanical libraries and research institutions
• An open access digital library for legacy biodiversity literature
• An open data repository of taxonomic names and bibliographic information


Page 29: Building the New Open Linked Library


Page 30: Building the New Open Linked Library


Page 31: Building the New Open Linked Library

Allows data which was created for a specific purpose and audience to interact with other data to serve new, previously unimagined roles.


Benefits of open data

Page 32: Building the New Open Linked Library

What information have we opened up?

Essentially, everything – our metadata (descriptive, rights, structural), our image files, scientific names, OCR’d files


Page 33: Building the New Open Linked Library

Technical methods for opening data

• Data exports
• APIs
• OpenURL
• OAI-PMH
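Of these methods, OpenURL is the easiest to show in a few lines: a link carries citation metadata as key/value pairs that a resolver can match against a collection. A sketch building an OpenURL 1.0 (Z39.88-2004) book link in key/encoded-value form with Python's urllib.parse; book_openurl is a made-up helper, and the resolver address is a placeholder, not an actual BHL endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def book_openurl(resolver_base, title, author, year):
    """Build an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a book."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # book metadata format
        "rft.genre": "book",
        "rft.btitle": title,
        "rft.au": author,
        "rft.date": year,
    }
    return f"{resolver_base}?{urlencode(params)}"


# resolver_base is a placeholder, not a real resolver address.
link = book_openurl("https://resolver.example.org/openurl",
                    "On the origin of species", "Darwin, Charles Robert", "1859")
print(link)

# Decoding the query string gives the citation back unchanged.
print(parse_qs(urlsplit(link).query)["rft.btitle"])
```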


Page 34: Building the New Open Linked Library

Who is reusing our data?

• Tropicos
• Rod Page – BioGUID, BioStor
• Encyclopedia of Life
• Ryan Schenk – Visualizing taxonomic synonyms


Page 35: Building the New Open Linked Library


Who is reusing our data?
Tropicos

Page 36: Building the New Open Linked Library


Tropicos

Page 37: Building the New Open Linked Library


Tropicos

Who is reusing our data?

Page 38: Building the New Open Linked Library


Rod Page – BioGUID – http://bioguid.info/bhl/

Who is reusing our data?

Page 39: Building the New Open Linked Library


Rod Page – BioStor – http://biostor.org/

Who is reusing our data?

Page 40: Building the New Open Linked Library


Rod Page – BioStor – http://biostor.org/

Who is reusing our data?

Page 41: Building the New Open Linked Library


Encyclopedia of Life – http://eol.org/

Who is reusing our data?

Page 42: Building the New Open Linked Library


Encyclopedia of Life – http://eol.org/

Who is reusing our data?

Page 43: Building the New Open Linked Library

Encyclopedia of Life – http://eol.org/


Who is reusing our data?

Page 44: Building the New Open Linked Library

Ryan Schenk – http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/


Who is reusing our data?

Page 45: Building the New Open Linked Library

Making open data successful
• Promote it!


Page 46: Building the New Open Linked Library

Do a code challenge


Page 47: Building the New Open Linked Library

Publicly display your data’s copyright/licensing and API terms of service


Page 48: Building the New Open Linked Library

Keri Thompson, Head of Web Services, Smithsonian Institution Libraries
@DigiKeri_SIL

Joel Richard, Lead Developer, Smithsonian Institution Libraries

Trish Rose-Sandler, Data Analyst, Biodiversity Heritage Library

Building the New Open Linked Library

Thank You!
