harvesting repositories: dpla, europeana, & other case studies

Harvesting Repositories DPLA, Europeana, and Other Case Studies ALA Conference June 25, 2016

Upload: eohallor

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Harvesting Repositories DPLA, Europeana, and Other Case Studies

ALA Conference June 25, 2016

Page 2: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Introductions

Erin Tripp, Bus. Dev.

Staff librarian since 2011. Erin delivers Islandora training at events worldwide and has managed more than 40 digital repository projects.

Contact Details

●  Email: [email protected]

●  Twitter: @eeohalloran or @discgarden

●  Hashtags: #islandora #ALAAC16

Page 3: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Agenda

Objectives Overview

By Show of Hands & Introductions

Why Should We Care?

Repository Requirements

OAI-PMH Overview

Case Studies

Top Takeaways

Page 4: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Objectives for Today

Learn a thing or two about:

●  OAI-PMH

●  Common Harvesters

●  Who to ask for help

●  What questions to ask

●  Confidence to continue learning or try a new tool

Page 5: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

By Show of Hands...

Who is interested in a

●  National Harvester,

●  State Harvester,

●  Subject Harvester, or

●  Proprietary Discovery Service Harvester?

Who has already been involved in a harvesting project?

Who has experience using

●  XSLTs

●  OAI-PMH

●  REPOX?

Page 6: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Why should we care? Discoverability.

Page 7: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Why should we care? Discoverability.

In February 2015, LITA panelists named enhancing discoverability among the Top Technology Trends (Enis, 2015)

Making content accessible where the search originates (e.g. Google, Google Scholar, WorldCat, DPLA, Europeana) creates value for digital libraries and users

Repositories contributing to aggregators can see site visits increase by 55 to 109 per cent (DPLA, n.d.)

Page 8: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Why should we care? Discoverability.

Increased exposure

●  Through blogs, social media and Wikipedia

●  Provide richer context and increase the visibility of your collections

●  Make your collections available for re-use by other services (Europeana, n.d.)

Access to valuable skills

●  Data modelling

●  Copyright and licensing

●  Reporting on access usage analytics (Europeana, n.d.)

Page 9: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Why should we care? Discoverability.

Using open source

●  Linking up to thousands of other collections

●  Interoperable (no vendor lock-in or proprietary formats)

●  Access to Wikimedia Commons (Europeana, n.d.)

Expanding your network

●  Connect with like-minded industry professionals

●  Identify potential partners and joint funding opportunities

●  Reach out to other sectors – creatives, education, tourism and more (Europeana, n.d.)

Page 10: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Why should we care? Discoverability.

Anecdotally, a repository harvest can:

●  Act as an incentive for people to deposit content into the repository and build buy-in from stakeholders

●  Clean up and normalize metadata, resulting in better raw material to support discovery

Page 11: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

OAI-PMH Overview

Page 12: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

OAI-PMH

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Low-barrier mechanism for repository interoperability

OAI-PMH is a set of six requests (aka verbs or services) that are invoked within HTTP
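As a rough illustration of "six requests invoked within HTTP", the sketch below lists the six verbs defined by OAI-PMH 2.0 and builds the GET URL for one of them. The helper name build_request and the base URL https://example.org/oai2 are assumptions made up for this example; substitute your own repository endpoint.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request.

    build_request is a hypothetical helper, not part of any library.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Arguments left as None are skipped so optional ones can be omitted.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Placeholder endpoint; a real harvester would use the repository's base URL.
print(build_request("https://example.org/oai2", "Identify"))
```

Each verb is simply a value of the verb query parameter, so a harvester needs nothing beyond an ordinary HTTP client.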

Page 13: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Providers

Data Providers are repositories that expose structured metadata via OAI-PMH = Repository

Service Providers then make OAI-PMH service requests to harvest that metadata = Harvester

Page 14: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Vocabulary

Request/ Verb/ Service

The action that the service provider (harvester) is requesting from the data provider (repository)

Response Size

The maximum number of records to issue per response

Page 15: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Vocabulary… continued

Resumption Token

When a request matches more records than the response size allows, a resumptionToken is issued so that the service provider can resume harvesting from where it left off
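A minimal sketch of how a harvester might follow resumption tokens, assuming the repository returns standard OAI-PMH 2.0 XML. The function harvest_all and its fetch parameter are hypothetical names chosen for this example: fetch stands for any callable that takes a URL and returns the response body as text, which keeps the loop independent of a particular HTTP client.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by every element in an OAI-PMH 2.0 response.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_all(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element, following resumptionTokens.

    `fetch` is an assumed callable: it takes a URL and returns the
    response body as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI_NS}record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token marks the final page of results.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

In practice fetch could be a thin wrapper around urllib.request.urlopen; passing it in as a parameter also makes the loop easy to try against saved responses before pointing it at a live repository.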

Identify

This request is used to retrieve information about a repository. Some of the information returned is required by OAI-PMH. Example: YourSite/oai2?verb=Identify
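To make the "required information" concrete, the sketch below reads the elements that OAI-PMH 2.0 requires in an Identify response. The sample XML and all of its values are invented for illustration; parse_identify is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Elements that every Identify response must include under OAI-PMH 2.0.
REQUIRED = (
    "repositoryName",
    "baseURL",
    "protocolVersion",
    "adminEmail",
    "earliestDatestamp",
    "deletedRecord",
    "granularity",
)

# A trimmed, made-up response used only for illustration.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>https://example.org/oai2</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2011-01-01T00:00:00Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return a dict of the required Identify fields (None if absent)."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", NS)
    if identify is None:
        raise ValueError("response has no <Identify> element")
    return {name: identify.findtext(f"oai:{name}", namespaces=NS)
            for name in REQUIRED}


info = parse_identify(SAMPLE_IDENTIFY)
print(info["repositoryName"], info["protocolVersion"])
```

A None value in the result flags a required element the repository failed to supply, which is a useful first check before a harvest.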

Page 16: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Vocabulary… continued

ListMetadataFormats

This request is used to retrieve the metadata formats available from a repository. Example: YourSite/oai2?verb=ListMetadataFormats

ListRecords

This request is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp. Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc
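The selective-harvesting arguments mentioned above (set, from, until) can be sketched as follows, together with a small parser that pulls the identifier and first Dublin Core title from each record. The function names, the base URL, and the sample record are assumptions for illustration; only the argument names and XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, until_date=None):
    """Build a ListRecords URL with optional set and datestamp filters."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Return (identifier, first dc:title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


# Made-up response containing a single oai_dc record.
SAMPLE_RECORDS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai2", from_date="2016-01-01"))
print(extract_titles(SAMPLE_RECORDS))
```

Passing a from date on each run lets a harvester pick up only records added or changed since its last harvest, which also reduces load on the data provider.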

Page 17: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Vocabulary… continued

ListSets

This request is used to retrieve the set structure of a repository, useful for selective harvesting.

All Collections Example: YourSite/oai2?verb=ListSets

Specific Collection Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=ir_citationCollection
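A short sketch of reading a ListSets response into a lookup of setSpec to setName, which a harvester could use to choose the set argument for a later ListRecords request. The sample XML is invented: the setSpec ir_citationCollection comes from the example above, while its display name and the second set are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up response; set names are placeholders for illustration.
SAMPLE_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>ir_citationCollection</setSpec>
      <setName>Citation Collection</setName>
    </set>
    <set>
      <setSpec>example_videoCollection</setSpec>
      <setName>Video Collection</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return a dict mapping each setSpec to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("oai:setSpec", namespaces=NS): s.findtext("oai:setName", namespaces=NS)
        for s in root.iterfind(".//oai:set", NS)
    }


for spec, name in parse_sets(SAMPLE_SETS).items():
    print(spec, "=", name)
```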

Page 18: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Repository Requirements

Accessible to the web

Storing standards-based, XML descriptive metadata

The ability to apply additional metadata mapping if needed (whether in or external to the repository)

Access to documentation and XSLTs used for metadata mapping
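Metadata mapping of this kind is normally written as an XSLT. Because Python's standard library has no XSLT processor, the sketch below swaps in a plain ElementTree crosswalk to show the same idea: a table of source paths and target fields. The three MODS to Dublin Core pairings and the sample record are simplified assumptions, not a complete or official mapping.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"mods": MODS}

# (MODS path, Dublin Core element) pairs: a deliberately tiny crosswalk.
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:originInfo/mods:dateIssued", "date"),
]


def mods_to_dc(mods_xml):
    """Map a MODS record to a flat element holding Dublin Core fields."""
    source = ET.fromstring(mods_xml)
    target = ET.Element("record")
    for path, dc_name in CROSSWALK:
        for node in source.iterfind(path, NS):
            if node.text and node.text.strip():
                ET.SubElement(target, f"{{{DC}}}{dc_name}").text = node.text.strip()
    return target


# Made-up MODS record for illustration.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample title</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <originInfo><dateIssued>2016</dateIssued></originInfo>
</mods>"""

ET.register_namespace("dc", DC)  # Serialize with the conventional dc: prefix.
print(ET.tostring(mods_to_dc(SAMPLE_MODS), encoding="unicode"))
```

Keeping the mapping in a table, as an XSLT keeps it in templates, makes it easy to document and review, which is the point of the requirement above.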

Page 19: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Repository Requirements

Pass XML metadata to service provider from the:

1.  Preservation (storage) component or

2.  Discovery (index) component

Provide a method to harvest a thumbnail (TN) and link back to the repository

Accommodate customization

Page 20: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Repository Requirements … Continued

For example: the University of South Carolina video content model is tiered for preservation, media production and streaming web access. We only want to harvest one of the three possible records

Page 21: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Case Study Europeana

Page 22: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Europeana

Our material comes from all over Europe and the scope of the collections is really quite astonishing. [...]

http://www.europeana.eu/

http://pro.europeana.eu/

Page 23: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Intermediate Aggregator

The Digibess repository stores digitized objects from 18 Economic and Social Sciences libraries in Italy

Europeana requires an intermediate aggregator, such as the national harvester Cultura Italia

Cultura Italia harvests the custom “Pico” metadata format from Digibess and is then harvested by Europeana

Page 24: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Harvesting Tools

Digibess pre-dated Islandora OAI module and REPOX aggregator

Used Proai servlet oaiprovider-1.2.2

The harvest prompted an examination of both the general needs and the specific applications of the protocol

Page 25: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Digibess on Europeana

Page 26: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

REPOX

Since the Digibess project, a new intermediate aggregator called REPOX has been released. It aims to provide [...] Europeana partners a simple solution to import, convert and expose their bibliographic data via OAI-PMH

http://repox.sysresearch.org/

Page 27: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Case Study Digital Public Library of America (DPLA)

Page 28: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

DPLA

The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world.

https://dp.la/info/

Page 29: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Service Hub

Empire State Digital Network (ESDN) is the New York State service hub for the DPLA

Hosted and administered by the Metropolitan New York Library Council in conjunction with eight allied regional library councils working collectively in New York State as the ESLN

Liaises with partners on data aggregation, mapping and licensing

Page 30: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Mapping & Testing

Harvests from partners using OAI-PMH

Provides all partner metadata to DPLA through one OAI-PMH feed from REPOX

Undertakes data review and QA prior to exposing feed to DPLA for harvest
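A data review step like this could be partly automated. The sketch below checks flattened records for empty or missing fields before a feed is exposed. The list of required fields and the record layout are hypothetical placeholders, not the actual rules used by ESDN or DPLA.

```python
# Hypothetical required fields; a real hub would take these from its
# own metadata application profile.
REQUIRED_FIELDS = ("title", "rights", "identifier")


def review_record(record):
    """Return a list of problems found in one flattened metadata record.

    `record` is assumed to map field names to lists of string values.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        values = [v for v in record.get(field, []) if v and v.strip()]
        if not values:
            problems.append(f"missing {field}")
    return problems


def review_feed(records):
    """Map each failing record's position in the feed to its problems."""
    report = {}
    for position, record in enumerate(records):
        problems = review_record(record)
        if problems:
            report[position] = problems
    return report


# Made-up records for illustration.
sample = [
    {"title": ["Map of Albany"], "rights": ["Public domain"], "identifier": ["abc:1"]},
    {"title": ["  "], "identifier": ["abc:2"]},
]
print(review_feed(sample))
```

A report keyed by record makes it straightforward to send each partner only the problems that concern their own collections.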

Page 31: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

ESDN on DPLA

Page 32: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Case Study Other Discovery Services

Page 33: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Other Discovery Services

WorldCat, Summon, and Primo are commercial discovery services

Local discovery layers can also collocate resources for discovery

OAI-PMH modules within your repository framework can allow these services to harvest your repository

Page 34: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Everyone is Harvesting Everyone

Connecticut State Library aggregates data to Research It

State Library harvests University of Connecticut Archives and Special Collections, ILS and others

University of Connecticut Library harvests to Summon/Primo and will be harvested by DPLA

Page 35: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Creating Lots of Portals

University of Connecticut Library started harvesting in mid-2014

Notable increases in access to digital content since harvest (one of many factors)

Access statistics available at CTDA Statistics

Page 36: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

University of Connecticut on Research It - EBSCOhost

Page 37: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Harvesting Top Takeaways

Page 38: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Top Takeaways - Data Providers

●  Server Load/ Application Load

●  Permissions / Copyright

●  Relationships with Service Providers

●  Repository Buy-in

●  Increased Discovery

●  Metadata Normalization

Page 39: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Top Takeaways - Service Providers

●  Knowledge of

○  XSLT,

○  OAI-PMH, and

○  Metadata Schemas (DC, MODS, QDC, MARC XML)

●  Technical staff to set up and maintain the aggregator and write scripts to transform harvested metadata

●  Relationships with Data Providers

Page 40: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Harvesting Discussion

Page 41: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Discussion

●  What are your biggest challenges?

●  What Resources do you find helpful?

●  What was your AH HA! moment?

●  What was most useful in this presentation?

Page 42: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Harvesting Demonstration

Page 43: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Demonstration

To follow along or try it at home, navigate to….

http://sandbox.discoverygarden.ca/ OR

http://islandora.ca/downloads

Click Islandora > Islandora Utility Modules > Islandora OAI

Page 44: Harvesting Repositories:  DPLA, Europeana, & Other Case Studies

Questions? Contact us at: [email protected]