Building the New Open Linked Library

DESCRIPTION

Presented at the LITA National Forum, September 30, 2011

TRANSCRIPT

Building the New Open Linked Library: Theory and Practice

…and results!

Keri Thompson, Joel Richard, Trish Rose-Sandler

LITA National Forum, September 30, 2011

• Founded in 1846
• 1.5 million volumes in collection, plus assorted archival collections
• 15,000 volumes scanned and online
• 20 libraries serving ~500 researchers/curators + hundreds of fellows and interns
• 102 library staff
• 1.5 web staff
• Founding member of the Biodiversity Heritage Library

Smithsonian Libraries

WHY Linked Open Data?

• It’s cool
• “Increase and Diffusion of Knowledge”
• Share, contribute to a global database
• Create context around our data
• Allow data to be reused/repurposed by ourselves and others
• Improve discoverability of our content

Linked Data in our Library

“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”

Tim Berners-Lee, Linked Data – Design Issues

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things.

Linked Data

Linked Data
• Publishing structured data on the web
• RDF (Resource Description Framework)
• Enables computer-to-computer queries
• Uses standard ontologies (vocabularies)
• Data in “triples” (“triplestore”)

Open Data
• Freely available to use, reuse, republish with no restrictions
• Made available through various mechanisms such as .csv files, APIs

Example triple:
URI: http://library.si.edu/tl2/author/charles-darwin
Predicate: owl:sameAs
Object: http://viaf.org/viaf/27063124
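To make the idea of a triple concrete, here is a minimal Python sketch that holds the URI/predicate/object example from this slide in a small data structure and writes it out as one N-Triples line. The `Triple` class and `to_ntriples` helper are illustrative names, not part of the presentation, and the sketch assumes the object is a URI rather than a literal.

```python
from typing import NamedTuple

# Full namespace URI behind the "owl:" prefix.
OWL = "http://www.w3.org/2002/07/owl#"


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# The example from the slide: our Darwin URI is the same entity as the VIAF record.
darwin_same_as = Triple(
    "http://library.si.edu/tl2/author/charles-darwin",
    OWL + "sameAs",
    "http://viaf.org/viaf/27063124",
)

print(to_ntriples(darwin_same_as))
```

A triplestore is, roughly, a database holding many such statements and answering queries over them.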

Organically grown since 1995

• 83,000 HTML pages
• 3,700 ColdFusion pages
• 253,000 JPEG files
• 27,000 PNG files
• 46,000 PDFs

No CMS.

Our Website

1. Analyze and categorize our current & future online content

2. Create high-level data models for common content types

Questions: Where are we metadata-rich? What do we have that others don’t? What is feasible right now?

Digital Library Planning

• 400+ Online “books”
• Exhibitions
• Research Tools
• Image Collections (60,000+ images)
• “Brochure” content (About us, Locations, Hours)
• Bibliographies, Fact Sheets, Subject Guides
• Databases, inventories and database-like books

Collections not on our website:
• ~15,000 digitized volumes, with many more planned
• Other analog collections that will be digitized

Content Analysis

Books (and book-like objects)
• expose bibliographic data for reuse
• consume links to other internal content and external authoritative data

Databases
• expose data previously unavailable
• provide authoritative data
• consume our data and others’ to create new aggregate websites

Linked Data in our Library

1. Decide which data elements should be exposed as linked data for each content type

2. Choose appropriate vocabularies
3. Create a rough timeline and plan for migrating site content (=1 year*)

* Optimism included in this estimate

Linked Digital Library Planning

Implement all this linked open data goodness (and a shiny new website) by moving to Drupal 7

Linked Data in our Library

Drupal and Linked Data

• Native support for RDFa in Drupal 7.

• RDF Extensions (rdfx) – even more features.

• Vocabularies can be imported and cached for reuse.

• Few or no modifications to HTML to support RDFa.

What’s the difference between RDF, RDF/XML and RDFa?


<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<img typeof="foaf:Image" src="http://localhost:8087/images/origin-of-species.png" alt="The origin of species cover image" title="The origin of species cover image" />
<div rel="bibo:authorList">
  <a href="/content/darwin-charles-1809-1882">Darwin, Charles, 1809-1882</a>
</div>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
<div rel="owl:sameAs">
  <a href="http://www.worldcat.org/oclc/1184647" target="_blank">http://www.worldcat.org/oclc/1184647</a>
</div>

URI: http://library.si.edu/book/origin-of-species

RDFa Sample
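RDFa places the statements inside ordinary HTML attributes, so a machine can recover them from the same page a person reads. The sketch below uses Python's standard `html.parser` to pull `property` values out of part of the sample above. It is a deliberately simplified illustration, not a conforming RDFa processor: it ignores `rel`, `about`, `typeof`, and prefix resolution, and the class name `PropertyExtractor` is made up for this example.

```python
from html.parser import HTMLParser

# A shortened copy of the RDFa sample from the slide.
RDFA_SNIPPET = """
<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from `property` attributes only."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # The value is carried in the content attribute (as on <meta>).
            self.pairs.append((prop, attributes["content"]))
        else:
            # The value is the element's text, delivered via handle_data.
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


extractor = PropertyExtractor()
extractor.feed(RDFA_SNIPPET)
for prop, value in extractor.pairs:
    print(prop, "=", value)
```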

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <dc:created>November 24, 1859</dc:created>
    <bibo:numPages>1000</bibo:numPages>
    <dc:language>english</dc:language>
    <bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>

RDF/XML Sample

URI: http://library.si.edu/book/origin-of-species.rdf
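RDF/XML carries the same statements as a standalone XML document. As a rough illustration, the following Python sketch reads a shortened version of the sample with the standard library's `xml.etree.ElementTree` and lists the triples it contains. It only understands the flat `rdf:Description` pattern used on this slide, not the full RDF/XML syntax, and the function name `triples_from_rdfxml` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A shortened copy of the RDF/XML sample (XML declaration omitted).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(xml_text):
    """Return (subject, predicate, object) tuples from flat rdf:Description blocks."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # An rdf:resource attribute means a URI object; otherwise use the text.
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = triples_from_rdfxml(SAMPLE)
for triple in triples:
    print(triple)
```

In both serializations the predicates expand to the same full URIs, which is why the RDFa page and the .rdf document describe the same book.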

• Fields, Views, Views UI

• Node Reference

• SPARQL Endpoint, SPARQL API

• RESTful Web Services

• SPARQL Views

• RDF External Vocabulary Importer

Caveat: Some modules not ready for Drupal 7
• e.g., Biblio module (no CCK, RDF capabilities)

What other modules are we using?

• Drupal 7 comes with several namespaces. We will use: DC Terms, FOAF, SKOS, OWL

• We're working with books, so we need the Bibliographic Ontology:
  • Website: http://bibliontology.com/
  • Namespace: http://purl.org/ontology/bibo/
  • Prefix: “bibo”

• We may also create our own vocabulary.

What about Namespaces/Vocabularies?
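A prefix such as “bibo” is only shorthand for a namespace URI. The Python sketch below shows, under simple assumptions, how a prefixed name like bibo:Book expands to a full URI. The namespace URIs listed are the published ones for each vocabulary, with "dc" bound to DC Terms as in the RDF/XML sample; the `expand` helper is a hypothetical name, not a Drupal function.

```python
# Prefix-to-namespace table for the vocabularies named on this slide.
NAMESPACES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand 'prefix:local' into a full URI, or raise KeyError if unknown."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in namespaces:
        raise KeyError(f"unknown or missing prefix in {prefixed_name!r}")
    return namespaces[prefix] + local


print(expand("bibo:Book"))
print(expand("dc:title"))
```

Adding a local vocabulary would amount to one more entry in the table, with a namespace URI the library controls.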

Adding a Namespace to Drupal


Setting up RDF Mappings in Drupal


Taxonomic Literature 2 (1977-2009)

• The standard reference work for plant taxonomic literature from Linnaeus to 1940.
• Contains botanists, authors, biographies, citations, and species.
• Indexed and cross referenced.
• Should be digitized & on the web!
• SIL aims to be an authority for botanist names on the Internet.

Databases: TL-2

Taxonomic Literature 2 (TL-2). v1., p. 600

TL-2 Page Sample


http://library.si.edu/tl2/author/darwin
  tl2:creatorOf http://library.si.edu/tl2/book/1313
  owl:sameAs http://viaf.org/viaf/27063124

http://library.si.edu/tl2/book/1313
  dc:creator http://library.si.edu/tl2/author/darwin
  owl:sameAs http://www.archive.org/details/originofspecies00darwuoft

TL-2 Page Sample

http://library.si.edu/tl2/author/darwin
RDF Type = foaf:Person
  foaf:lastName, foaf:familyName
  foaf:firstName, foaf:givenName
  foaf:name, skos:prefLabel
  tl2:birthYear
  tl2:deathYear
  tl2:description
  tl2:personAbbrev

http://library.si.edu/tl2/book/1313
RDF Type = bibo:Book
  tl2:bookNumber
  dc:title
  event:place
  dc:publisher
  dc:created
  tl2:bookAbbreviation

TL-2 Page Sample

http://library.si.edu/tl2/author/darwin
  tl2:creatorOf “http://library.si.edu/tl2/book/1313”
  owl:sameAs “http://viaf.org/viaf/27063124”
  foaf:lastName “Darwin”
  foaf:familyName “Darwin”
  foaf:firstName “Charles”
  foaf:givenName “Charles”
  foaf:name “Darwin, Charles Robert”
  skos:prefLabel “Darwin, Charles Robert”
  tl2:birthYear “1809”
  tl2:deathYear “1882”
  tl2:description “British evolutionary biologist”
  tl2:personAbbrev “Darwin”

http://library.si.edu/tl2/book/1313
  dc:creator “http://library.si.edu/tl2/author/darwin”
  owl:sameAs “http://www.archive.org/details/originofspecies00darwuoft”
  tl2:bookNumber “1313”
  bibo:shortTitle “On the origin of species”
  dc:title “On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life.”
  event:place “London”
  dc:publisher “John Murray”
  dc:created “1859”
  tl2:bookAbbreviation “Origin sp.”

TL-2 Page Sample Results
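The mapping and results above follow one pattern: each field of a record is assigned one or more predicates, and every field value then becomes one statement per predicate. The Python sketch below imitates that idea outside Drupal. The field names (`last_name`, `full_name`, and so on) and the `map_record` helper are invented for illustration; only the predicates and values come from the slides.

```python
# Hypothetical field names on the left; predicates from the slide on the right.
AUTHOR_MAPPING = {
    "last_name": ["foaf:lastName", "foaf:familyName"],
    "first_name": ["foaf:firstName", "foaf:givenName"],
    "full_name": ["foaf:name", "skos:prefLabel"],
    "birth_year": ["tl2:birthYear"],
    "death_year": ["tl2:deathYear"],
}


def map_record(subject, record, mapping):
    """Turn one record into triples, one per (field value, mapped predicate)."""
    triples = []
    for field, predicates in mapping.items():
        if field not in record:
            continue  # skip fields the record does not carry
        for predicate in predicates:
            triples.append((subject, predicate, record[field]))
    return triples


darwin = {
    "last_name": "Darwin",
    "first_name": "Charles",
    "full_name": "Darwin, Charles Robert",
    "birth_year": "1809",
    "death_year": "1882",
}

author_triples = map_record(
    "http://library.si.edu/tl2/author/darwin", darwin, AUTHOR_MAPPING
)
for triple in author_triples:
    print(triple)
```

Mapping one field to two predicates (for example foaf:lastName and foaf:familyName) lets consumers that know either term find the value.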

• Two Content Types: Authors (Botanists) and Publications

• Node Reference between Authors and Publications based on the TL-2 index.

• Other data is available when it's parsed:

  • Herbaria
  • Institutions
  • Species names
  • Bibliographies
  • Handwriting Samples
  • Postage Stamps

Setting up TL-2 in Drupal

• Create Content Types (Digital Library books & TL-2)
• Create import process
• May be able to use the Feeds module for import
• Must create node references during the import.
• Must accommodate the blocks of unparsed information in TL-2
• Create a search interface specifically for TL-2

Image Credits: Database: eponas-deeway (http://eponas-deeway.deviantart.com); Magnifying Glass: Flahorn (http://flahorn.deviantart.com/)

Getting Data into Drupal
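One common way to create references during an import is to work in two passes: first create every node, then link authors and publications once all identifiers are known. The Python sketch below models that approach with plain dictionaries. It does not use the Drupal or Feeds APIs, and the record layout (`key`, `title`, `authors`, `unparsed`) is an assumption made purely for illustration.

```python
def import_records(authors, books):
    """Two-pass import: create nodes first, then resolve references between them."""
    nodes = {}
    key_to_nid = {}
    next_nid = 1

    # Pass 1: create a node for every record and remember its new id.
    for kind, records in (("author", authors), ("book", books)):
        for record in records:
            nodes[next_nid] = {
                "type": kind,
                "title": record["title"],
                "refs": [],
                # Text that could not be parsed is kept as-is on the node.
                "unparsed": record.get("unparsed", ""),
            }
            key_to_nid[(kind, record["key"])] = next_nid
            next_nid += 1

    # Pass 2: link books and authors in both directions.
    unresolved = []
    for record in books:
        book_nid = key_to_nid[("book", record["key"])]
        for author_key in record.get("authors", []):
            author_nid = key_to_nid.get(("author", author_key))
            if author_nid is None:
                unresolved.append((record["key"], author_key))
                continue
            nodes[book_nid]["refs"].append(author_nid)
            nodes[author_nid]["refs"].append(book_nid)
    return nodes, unresolved


authors = [{"key": "darwin", "title": "Darwin, Charles Robert"}]
books = [{
    "key": "1313",
    "title": "On the origin of species",
    "authors": ["darwin", "not-in-index"],
    "unparsed": "raw block of TL-2 text",
}]

nodes, unresolved = import_records(authors, books)
print(nodes)
print(unresolved)
```

Collecting unresolved references in a list, rather than failing, leaves room to review index entries that do not match any imported author.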

Resolve /node/22365.rdf and /tl2/author/charles-darwin

Handling "See also" and "Same as" entries in the TL-2 indexes.

Can we search our own data using SPARQL?
• Should we? Does it make sense?

Discuss/Extend vocabulary for our special needs.

Set up linked data within our site
• image collections
• trade literature
• exhibitions

What else is there to do?
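To give a feel for what searching our own data with SPARQL would involve, here is a small Python sketch that matches one triple pattern against a handful of the TL-2 sample statements. It is an in-memory stand-in for a single basic pattern, not a SPARQL engine or endpoint; the comment shows the SPARQL query it approximates, and the `match` helper is a name chosen for this example.

```python
AUTHOR = "http://library.si.edu/tl2/author/darwin"
BOOK = "http://library.si.edu/tl2/book/1313"

# A few statements taken from the TL-2 sample results.
TRIPLES = [
    (AUTHOR, "tl2:creatorOf", BOOK),
    (AUTHOR, "owl:sameAs", "http://viaf.org/viaf/27063124"),
    (AUTHOR, "foaf:name", "Darwin, Charles Robert"),
    (BOOK, "dc:creator", AUTHOR),
    (BOOK, "dc:publisher", "John Murray"),
    (BOOK, "dc:created", "1859"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly equivalent SPARQL:
#   PREFIX dc: <http://purl.org/dc/terms/>
#   SELECT ?book WHERE { ?book dc:creator <http://library.si.edu/tl2/author/darwin> }
books_by_darwin = [t[0] for t in match(TRIPLES, p="dc:creator", o=AUTHOR)]
print(books_by_darwin)
```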

LinkedData.org
http://linkeddata.org/guides-and-tutorials
http://linkeddatabook.com/editions/1.0/

Drupal Groups
http://groups.drupal.org/semantic-web
http://groups.drupal.org/libraries

Tim Berners-Lee, TED talks
Tim Berners-Lee on the next Web (2009)
The year open data went worldwide (2010)

Other Resources

BHL is…
• A consortium of 13 natural history and botanical libraries and research institutions
• An open access digital library for legacy biodiversity literature.
• An open data repository of taxonomic names and bibliographic information

Benefits of open data

Allows data which was created for a specific purpose and audience to interact with other data to serve new, previously unimagined roles.

What information have we opened up?

Essentially, everything – our metadata (descriptive, rights, structural), our image files, scientific names, OCR’d files

Technical methods for opening data

• Data exports
• APIs
• OpenURL
• OAI-PMH
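Of the methods listed, OAI-PMH is the most uniform for a harvester: every request is an HTTP URL with a `verb` and a few standard arguments. The Python sketch below builds `ListRecords` request URLs. The verb, `metadataPrefix`, `set`, and `resumptionToken` arguments come from the OAI-PMH protocol; the base URL `https://example.org/oai` is a placeholder, not BHL's actual endpoint, and the helper name is made up for this example.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc",
                         set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is an exclusive argument: when continuing a harvest,
    it is sent alone with the verb.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, for illustration only.
BASE_URL = "https://example.org/oai"

first_page = oai_list_records_url(BASE_URL)
next_page = oai_list_records_url(BASE_URL, resumption_token="abc123")
print(first_page)
print(next_page)
```

A harvester would fetch the first URL, read the resumption token from the response if one is present, and keep requesting follow-up pages until no token is returned.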

Who is reusing our data?

• Tropicos
• Rod Page – BioGUID (http://bioguid.info/bhl/), BioStor (http://biostor.org/)
• Encyclopedia of Life (http://eol.org/)
• Ryan Schenk – Visualizing taxonomic synonyms (http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/)

Making open data successful
• Promote it!
• Do a code challenge
• Publicly display your data’s copyright/licensing and API terms of service

Keri Thompson, Head of Web Services
Smithsonian Institution Libraries
thompsonk@si.edu, @DigiKeri_SIL

Joel Richard, Lead Developer
Smithsonian Institution Libraries
richardjm@si.edu

Trish Rose-Sandler, Data Analyst
Biodiversity Heritage Library
trisha.rose-sandler@mobot.org

Building the New Open Linked Library

Thank You!
