harvesting repositories: dpla, europeana, & other case studies

Post on 15-Apr-2017

Harvesting Repositories DPLA, Europeana, and Other Case Studies

ALA Conference June 25, 2016

Introductions

Erin Tripp, Bus. Dev.

Staff librarian since 2011. Erin delivers Islandora training at events worldwide and has managed more than 40 digital repository projects.

Contact Details

●  Email: erin@discoverygarden.ca

●  Twitter: @eeohalloran or @discgarden

●  Hashtags: #islandora #ALAAC16

Agenda

Objectives Overview

By Show of Hands & Introductions

Why Should We Care? Repository Requirements

OAI-PMH Overview

Case Studies

Top Takeaways

Objectives for Today

Learn a thing or two about:

●  OAI-PMH

●  Common Harvesters

●  Who to ask for help

●  What questions to ask

●  Confidence to continue learning / try a new tool

By Show of Hands...

Who is interested in a

●  National Harvester,

●  State Harvester,

●  Subject Harvester, or

●  Proprietary Discovery Service Harvester?

Who has already been involved in a harvesting project?

Who has experience using

●  XSLTs

●  OAI-PMH

●  REPOX?

Why should we care? Discoverability.

February 2015 LITA panelists said Top Technology Trends include enhancing discoverability (Enis, 2015).

Making content accessible where the search originates (e.g. Google, Google Scholar, WorldCat, DPLA, Europeana) creates value for digital libraries and users.

Repositories contributing to aggregators can see site visits increase by 55 to 109 per cent (DPLA, n.d.).

Why should we care? Discoverability.

Increased exposure through blogs, social media and Wikipedia

Provide richer context and increase the visibility of your collections

Make your collections available for re-use by other services (Europeana, n.d.)

Access to valuable skills

Data modelling

Copyright and licensing

Reporting on access usage analytics (Europeana, n.d.)

Why should we care? Discoverability.

Using open source

Linking up to thousands of other collections

Interoperable (no vendor lock in/ proprietary formats)

Access to Wikimedia Commons (Europeana, n.d.)

Expanding your network

Connect with like-minded industry professionals

Identify potential partners and joint funding opportunities

Reach out to other sectors – creatives, education, tourism and more (Europeana, n.d.)

Why should we care? Discoverability.

Anecdotally, repository harvest can:

●  Act as an incentive for people to deposit content into the repository / buy-in from stakeholders

●  Clean up and normalize metadata, resulting in better raw material to support discovery

OAI-PMH Overview

OAI-PMH

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Low-barrier mechanism for repository interoperability

OAI-PMH is a set of six requests (aka verbs or services) that are invoked within HTTP
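As a rough illustration of what "invoked within HTTP" means, the sketch below builds the GET URL for any of the six verbs named in the OAI-PMH 2.0 specification. The base URL `http://example.org/oai2` and the helper name `build_request` are assumptions made for this example; substitute the endpoint your own repository exposes.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Only include the arguments the caller actually supplied.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    base = "http://example.org/oai2"  # assumed endpoint, not a real service
    print(build_request(base, "Identify"))
    print(build_request(base, "ListRecords", metadataPrefix="oai_dc"))
```

Each request is an ordinary URL, so it can also be pasted into a browser to inspect the XML response by eye.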

Providers

Data Providers are repositories that expose structured metadata via OAI-PMH = Repository

Service Providers then make OAI-PMH service requests to harvest that metadata = Harvester

Vocabulary

Request / Verb / Service: The action that the service provider (harvester) is requesting from the data provider (repository)

Response Size: The maximum number of records to issue per response

Vocabulary… continued

Resumption Token

When a request matches more records than the response size allows, a resumptionToken is issued so that the service provider can resume harvesting from where it left off
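A harvester typically handles this with a loop: request a page, process it, and ask again with the token until no token comes back. The following is a minimal sketch of that loop, not any particular harvester's implementation. The `fetch` callable, `http_fetch` and the example endpoint are assumptions introduced here so the loop can be exercised without a live repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(params, base_url="http://example.org/oai2"):
    """Fetch one response over HTTP (base_url is an assumed endpoint)."""
    with urlopen(f"{base_url}?{urlencode(params)}", timeout=30) as response:
        return response.read().decode("utf-8")


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield record identifiers, following resumptionTokens until exhausted.

    `fetch` is any callable that takes a dict of query arguments and
    returns the response XML as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # The token is an exclusive argument: send only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Calling `list(harvest_identifiers(http_fetch))` would walk every page of a real endpoint; passing a stub in place of `http_fetch` makes the loop easy to test offline.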

Identify

This request is used to retrieve information about a repository. Some of the information returned is required as part of the OAI-PMH.

Example: YourSite/oai2?verb=Identify
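To make the required information concrete, here is a small sketch that pulls the mandatory Identify elements out of a response. The element names come from the OAI-PMH 2.0 specification; the sample response and its values (repository name, email, dates) are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Elements the specification requires exactly once in an Identify response.
REQUIRED_ONCE = (
    "repositoryName",
    "baseURL",
    "protocolVersion",
    "earliestDatestamp",
    "deletedRecord",
    "granularity",
)

# Invented sample response, trimmed to the Identify payload.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai2</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2011-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the required Identify fields as a dict."""
    identify = ET.fromstring(xml_text).find("oai:Identify", NS)
    if identify is None:
        raise ValueError("response does not contain an Identify element")
    info = {name: identify.findtext(f"oai:{name}", namespaces=NS)
            for name in REQUIRED_ONCE}
    # adminEmail is required too, but may repeat.
    info["adminEmail"] = [e.text for e in identify.findall("oai:adminEmail", NS)]
    return info


if __name__ == "__main__":
    for key, value in parse_identify(SAMPLE_IDENTIFY).items():
        print(f"{key}: {value}")
```

The `granularity` and `earliestDatestamp` values are worth checking before a first harvest, since they determine which datestamps a selective harvest can use.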

Vocabulary… continued

ListMetadataFormats

This request is used to retrieve the metadata formats available from a repository.

Example: YourSite/oai2?verb=ListMetadataFormats

ListRecords

This request is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp.

Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc
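The sketch below shows one way a selective ListRecords request might be assembled and its oai_dc response read. The argument names `set`, `from` and `until` are defined by the protocol; the function names, the endpoint `http://example.org/oai2` and the sample response are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None, until_date=None):
    """Build a ListRecords URL, adding only the selective arguments supplied."""
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        query["set"] = set_spec
    if from_date:
        query["from"] = from_date
    if until_date:
        query["until"] = until_date
    return f"{base_url}?{urlencode(query)}"


def parse_records(xml_text):
    """Return one dict per record: identifier, datestamp, deleted flag, titles."""
    records = []
    for rec in ET.fromstring(xml_text).iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            # Deleted records carry status="deleted" and no metadata.
            "deleted": header.get("status") == "deleted",
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
        })
    return records


# Invented two-record response: one live record, one deleted.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:ir_1</identifier>
        <datestamp>2016-06-25</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:ir_2</identifier>
        <datestamp>2016-06-26</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai2",
                           set_spec="ir_citationCollection",
                           from_date="2016-01-01"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

Harvesting by datestamp is what makes incremental updates cheap: a harvester can record the time of its last run and pass it as `from` on the next one.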

Vocabulary… continued

ListSets

This request is used to retrieve the set structure of a repository, useful for selective harvesting.

All Collections Example: YourSite/oai2?verb=ListSets

Specific Collection Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=ir_citationCollection

Repository Requirements

Accessible to the web

Storing standards-based, XML-based descriptive metadata

The ability to apply additional metadata mapping if needed (whether inside or external to the repository)

Access to documentation and XSLTs used for metadata mapping
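To show what a metadata mapping does, the sketch below crosswalks a few MODS elements to Dublin Core. In practice this mapping is usually written as an XSLT stylesheet; Python's standard library has no XSLT processor, so plain ElementTree code stands in for it here. The three-row crosswalk is a deliberately simplified assumption, not a complete or official MODS-to-DC mapping, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use readable prefixes when the result is serialized.
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

# (path in the MODS record, Dublin Core element to create): illustrative subset.
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:abstract", "description"),
]

SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Annual Report</title></titleInfo>
  <name><namePart>Example, Author</namePart></name>
  <abstract>An invented record used to exercise the crosswalk.</abstract>
</mods>"""


def mods_to_dc(mods_xml):
    """Return an oai_dc:dc element built from a MODS record string."""
    source = ET.fromstring(mods_xml)
    target = ET.Element(f"{{{OAI_DC}}}dc")
    for path, dc_name in CROSSWALK:
        for node in source.iterfind(path, {"mods": MODS}):
            # Skip empty elements rather than emitting blank DC fields.
            if node.text and node.text.strip():
                ET.SubElement(target, f"{{{DC}}}{dc_name}").text = node.text.strip()
    return target


if __name__ == "__main__":
    print(ET.tostring(mods_to_dc(SAMPLE_MODS), encoding="unicode"))
```

Keeping the crosswalk as a data table rather than inline logic mirrors why documentation of the XSLTs matters: whoever maintains the harvest needs to see at a glance which source field feeds which target field.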

Repository Requirements

Pass XML metadata to service provider from the:

1.  Preservation (storage) component or

2.  Discovery (index) component

Provide a method to harvest a thumbnail (TN) and link back to the repository

Accommodate customization

Repository Requirements … Continued

For example: the University of South Carolina video content model is tiered for preservation, media production and streaming web access. We only want to harvest one of the three possible records.

Case Study Europeana

Europeana

Our material comes from all over Europe and the scope of the collections is really quite astonishing. [...]

http://www.europeana.eu/

http://pro.europeana.eu/

Intermediate Aggregator

The Digibess repo stores digitized objects from 18 Economic and Social Sciences libraries in Italy.

Europeana requires an intermediate aggregator: a national harvester such as Cultura Italia.

Cultura Italia harvests the custom “Pico” metadata format from Digibess and is then harvested by Europeana.

Harvesting Tools

Digibess pre-dated the Islandora OAI module and the REPOX aggregator

Used Proai servlet oaiprovider-1.2.2

The harvest led to an examination of both the general needs and the specific applications of the protocol

Digibess on Europeana

REPOX

Since the Digibess project, a new intermediate aggregator called REPOX has been released. It aims to provide [...] Europeana partners a simple solution to import, convert and expose their bibliographic data via OAI-PMH.

http://repox.sysresearch.org/

Case Study Digital Public Library of America (DPLA)

DPLA

The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world.

https://dp.la/info/

Service Hub

Empire State Digital Network (ESDN) is the New York State service hub for the DPLA

Hosted and administered by the Metropolitan New York Library Council in conjunction with eight allied regional library councils working collectively in New York State as the ESLN

Liaises with partners for data aggregation, mapping and licensing

Mapping & Testing

Harvests from partners using OAI-PMH

●  Provides all partner metadata to DPLA through one OAI-PMH feed from REPOX

Undertakes data review and QA prior to exposing feed to DPLA for harvest
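A data review step of this kind can be partly automated. The sketch below flags harvested records that lack a value in fields a hub might insist on before exposing its feed. The field list and record shape are hypothetical choices for this example and are not ESDN's or DPLA's actual requirements.

```python
# Hypothetical rule set: fields that must hold at least one non-blank value.
REQUIRED_FIELDS = ("title", "rights", "identifier")


def review_records(records):
    """Split records into those that pass and a list of (id, missing fields)."""
    passed, problems = [], []
    for record in records:
        missing = [
            field for field in REQUIRED_FIELDS
            if not any(value.strip() for value in record.get(field, []))
        ]
        if missing:
            problems.append((record.get("id", "?"), missing))
        else:
            passed.append(record)
    return passed, problems


if __name__ == "__main__":
    # Invented records: each field maps to a list of values, as in Dublin Core.
    harvested = [
        {"id": "partner1:001", "title": ["Map of Albany"],
         "rights": ["Public domain"], "identifier": ["http://example.org/001"]},
        {"id": "partner1:002", "title": ["  "],
         "identifier": ["http://example.org/002"]},
    ]
    ok, flagged = review_records(harvested)
    print(f"{len(ok)} record(s) ready for the feed")
    for record_id, missing in flagged:
        print(f"{record_id} is missing: {', '.join(missing)}")
```

A report like this gives the hub something specific to send back to a partner, which supports the liaison role described above.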

ESDN on DPLA

Case Study Other Discovery Services

Other Discovery Services

WorldCat, Summon, & Primo are commercial discovery services.

Local discovery layers can also collocate resources for discovery.

OAI-PMH modules within your repository framework can allow these services to harvest your repository.

Everyone is Harvesting Everyone

Connecticut State Library is aggregating data to Research It.

The State Library harvests University of Connecticut Archives and Special Collections, ILS and other.

University of Connecticut Library harvests to Summon / Primo and will be harvested by DPLA.

Creating Lots of Portals

University of Connecticut Library started harvesting in mid 2014.

Notable increases in access to digital content since harvest (one of many factors).

Access statistics available at CTDA Statistics.

University of Connecticut on Research It - EBSCOhost

Harvesting Top Takeaways

Top Takeaways - Data Providers

●  Server Load/ Application Load

●  Permissions / Copyright

●  Relationships with Service Providers

●  Repository Buy-in

●  Increased Discovery

●  Metadata Normalization

Top Takeaways - Service Providers

●  Knowledge of XSLT, OAI-PMH, and metadata schemas (DC, MODS, QDC, MARC XML)

●  Technical staff to set-up and maintain the aggregator & write scripts to transform harvested metadata

●  Relationships with Data Providers

Harvesting Discussion

Discussion

●  What are your biggest challenges?

●  What Resources do you find helpful?

●  What was your AH HA! moment?

●  What was most useful in this presentation?

Harvesting Demonstration

Demonstration

To follow along or try it at home, navigate to:

http://sandbox.discoverygarden.ca/

OR

http://islandora.ca/downloads

Click Islandora > Islandora Utility Modules > Islandora OAI

Questions? Contact us at: erin@discoverygarden.ca
