Harvesting and Resolution Methods for Building OAI-based Services

OCLC Online Computer Library Center Harvesting and Resolution Methods for Building OAI-based Services Jeffrey A. Young [email protected] CERN OAI3 Workshop# 4 Geneva, Switzerland 14 February 2004



Page 1: Harvesting and Resolution Methods for Building OAI-based Services

OCLC Online Computer Library Center

Harvesting and Resolution Methods for Building OAI-based Services

Jeffrey A. Young

[email protected]

CERN OAI3 Workshop #4

Geneva, Switzerland

14 February 2004

Page 2: Introductions

Name

Affiliation

Plans

Needs

Technical experience

Page 3: Review OAI-PMH Protocol

Identify

ListSets

ListMetadataFormats

ListRecords

ListIdentifiers

GetRecord
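Each of the six verbs above is issued as an HTTP request against a repository's base URL, with the verb and its arguments passed as query parameters. The following is a minimal sketch of how such request URLs can be built. The base URL and the record identifier are hypothetical placeholders, and `build_request` is an illustrative helper, not part of any OAI toolkit.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs reviewed on this slide.
VERBS = {"Identify", "ListSets", "ListMetadataFormats",
         "ListRecords", "ListIdentifiers", "GetRecord"}


def build_request(base_url, verb, **args):
    """Return the GET URL for one OAI-PMH request (illustrative helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # "from" is a reserved word in Python, so callers pass from_ instead.
    if "from_" in args:
        args["from"] = args.pop("from_")
    query = {"verb": verb, **args}
    return f"{base_url}?{urlencode(query)}"


BASE = "http://repository.example.org/oai"  # hypothetical base URL

print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords",
                    metadataPrefix="oai_dc", from_="2004-01-01"))
print(build_request(BASE, "GetRecord",
                    identifier="oai:example.org:1",  # hypothetical identifier
                    metadataPrefix="oai_dc"))
```

Pasting the Identify URL of a real repository into a browser is a quick way to confirm that the base URL is correct before writing any harvesting code.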

Page 4: Find Repositories to Harvest

http://www.openarchives.org/Register/BrowseSites.pl

http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

http://oai.grainger.uiuc.edu/registry/

Friends lists

Communities (e.g. www.ndltd.org)

Page 5: Exercise: Getting Started

What are your data sources?

How will you add value?

Who will design the system?

Who will create/operate the software?

Who will create/maintain the data?

Who will advocate for it politically?

Who will benefit?

Who will pay?

Page 6: Metadata

Metadata is data about data

Metadata formats: two extremes
– Dublin Core
– MARC

Metadata can be relative
– Who created this document?
– Who created the metadata about this document?

Keep in mind, though, that OAI works just as well for sharing XML content

Page 7: XML/DTD/XSD/XSL

XML - eXtensible Markup Language

DTD - Document Type Definition

XSD - XML Schema Definition

XSL - eXtensible Stylesheet Language

Page 8: eXtensible Markup Language

Meta-markup language

HTML – HyperText Markup Language

XHTML – eXtensible HyperText Markup Language

Page 9: XML Overview

Well-formed XML

XML Namespaces

Valid XML
– DTDs
– XML Schemas

OAI Items vs. Records
– Item identifiers
– Multiple metadata record representations
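The difference between well-formed and valid XML can be illustrated with a short check. This is a minimal sketch using Python's standard library parser, which tests well-formedness only; it does not validate against a DTD or XML Schema, so validity would need a separate validating tool. The sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML (tags nested and closed properly)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Every start tag has a matching end tag: well-formed.
good = "<record><title>The art of the fugue</title></record>"

# <title> is never closed before </record>: not well-formed.
bad = "<record><title>The art of the fugue</record>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid, for example by using elements that its schema does not define.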

Page 10: XML Namespaces

Ambiguous XML elements
– <wind>NNE</wind>
– <wind>Clockwise</wind>

Prefixes help identify and differentiate elements
– <weather:wind>SE</weather:wind>
– <toy:wind>Widdershins</toy:wind>

But prefixes are arbitrary and potentially ambiguous, so what we really need is a URI (i.e., prefixes are a local shorthand for the URI)
– <weather:wind xmlns:weather="someURI">NW</weather:wind>
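The point that a prefix is only local shorthand for a URI can be seen in how a namespace-aware parser reports element names. In this sketch the two namespace URIs are hypothetical, and the prefixes `weather` and `w` are deliberately bound to the same URI to show that the parser treats them as the same element.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this illustration.
WEATHER = "http://example.org/weather"
TOY = "http://example.org/toy"

doc = f"""
<report xmlns:weather="{WEATHER}" xmlns:w="{WEATHER}" xmlns:toy="{TOY}">
  <weather:wind>SE</weather:wind>
  <w:wind>NW</w:wind>
  <toy:wind>Widdershins</toy:wind>
</report>
""".strip()

root = ET.fromstring(doc)

# ElementTree replaces each prefix with its URI, written as {URI}localname.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)

# Selecting by URI finds both weather elements, whatever prefix they used.
weather_winds = [e.text for e in root.findall(f"{{{WEATHER}}}wind")]
toy_winds = [e.text for e in root.findall(f"{{{TOY}}}wind")]
print(weather_winds, toy_winds)
```

Code that processes harvested records should therefore match on the namespace URI, never on the prefix a particular repository happens to use.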

Page 11: XML Schema Definition

Defines what an XML document contains
– XHTML
– oai_dc
– MARC21 XML

Page 12: What is our “item”?

Work – a distinct intellectual or artistic creation
– J.S. Bach’s The art of the fugue

Expression – the intellectual or artistic realization of a work
– The composer’s score for organ
– An arrangement for chamber orchestra by Anthony Lewis

Manifestation – the physical embodiment of an expression of a work
– CD, printed score, multimedia kit, etc.

Item – a single exemplar of a manifestation

Page 13: Exercise: Data Definition

Design a metadata format for items in your project
– List the elements you need
– Consider the encoding rules
– Consider using controlled vocabularies

Assign an XML namespace

Map a crosswalk to Dublin Core

Create a sample item with both formats
– Consider assigning OAI sets

Report issues, problems, and concerns
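One way to approach the crosswalk step of this exercise is a simple lookup table from local element names to Dublin Core elements. The sketch below assumes a made-up local format (`composer`, `work_title`, `carrier`, `shelf_mark`); only the two namespace URIs are the standard oai_dc and Dublin Core ones. Local elements with no mapping are dropped, which is the usual cost of crosswalking to a simpler format.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical local element names mapped to Dublin Core element names.
CROSSWALK = {
    "composer": "creator",
    "work_title": "title",
    "carrier": "format",
}


def to_oai_dc(local_record):
    """Build an oai_dc:dc element from a dict of local metadata values."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for local_name, value in local_record.items():
        dc_name = CROSSWALK.get(local_name)
        if dc_name is None:
            continue  # no Dublin Core equivalent, so the value is dropped
        values = value if isinstance(value, list) else [value]
        for v in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = v
    return root


# A made-up sample item in the hypothetical local format.
sample = {
    "composer": "Bach, Johann Sebastian",
    "work_title": "The art of the fugue",
    "carrier": ["CD", "printed score"],
    "shelf_mark": "M 22",  # local-only element, lost in the crosswalk
}

print(ET.tostring(to_oai_dc(sample), encoding="unicode"))
```

Writing the table down explicitly makes it easy to see which local elements lose information when reduced to Dublin Core.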

Page 14: Exercise: A Simple Harvester

XOAIHarvester – a simple harvester written in XSLT

http://errol.oclc.org/oai:xmlregistry.oclc.org:xoai/xoaiharvester.xsl

The purpose of the Perl script is to manage incremental harvesting

Caveat! OAI is merely the first step. Once data is harvested, OAI provides absolutely no guidance for doing something useful with it.
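For comparison, the core loop of a harvester can be sketched in a few lines of Python. This is not XOAIHarvester or its Perl script; it is a separate illustration of the same two ideas: following resumption tokens until the list is complete, and using the `from` argument to harvest incrementally. The `fetch` parameter is an assumption of this sketch so that the loop can be exercised without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Default fetcher: return the raw bytes of an HTTP GET response."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", from_date=None, fetch=fetch):
    """Yield every <record> element from ListRecords, page by page."""
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Incremental harvesting: only records changed since the last run.
        args["from"] = from_date
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(args)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing new since from_date
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # missing or empty token: the list is complete
        # A resumptionToken is an exclusive argument: send it with the verb only.
        args = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Usage against a hypothetical repository:
#   for record in harvest("http://repository.example.org/oai",
#                         from_date="2004-01-01"):
#       print(record.find(f"{OAI}header/{OAI}identifier").text)
```

To make harvesting incremental across runs, the caller would store the date of each successful harvest and pass it as `from_date` next time.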

Page 15: Concerns

Data quality

Duplicates

Intellectual Property Rights (IPR)

The appropriate copy problem

Persistence

Page 16: Repository Variables

MetadataPrefix
– oai_dc – the lowest common denominator

Set
– Hierarchical
– Allows selective harvesting
– Works best with community agreement
– Client warrant
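The set hierarchy is expressed in the setSpec itself: a colon-separated path from the top-level set down to the specific set. The sketch below shows how a harvester might reason about that hierarchy locally. The set names are invented for illustration, and both helper functions are hypothetical.

```python
def ancestors(set_spec):
    """Return the setSpec and all of its parents, from the top level down."""
    parts = set_spec.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts) + 1)]


def in_set(record_set_specs, requested):
    """True if any setSpec of the record is `requested` or lies beneath it."""
    return any(spec == requested or spec.startswith(requested + ":")
               for spec in record_set_specs)


# Invented set names for illustration.
print(ancestors("music:baroque:fugue"))
print(in_set(["music:baroque"], "music"))   # a subset belongs to its parent
print(in_set(["musicology"], "music"))      # a shared prefix is not enough
```

Selective harvesting itself is done by the repository: the harvester simply adds a `set` argument to ListRecords or ListIdentifiers, and these helpers are only useful for filtering or grouping records after the fact.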

Page 17: Exercise: Select/Create Tools

http://www.oaforum.org/oaf_db/list_db/list_software.php

http://www.openarchives.org/tools/tools.html

http://www.cs.cornell.edu/people/simeon/software/utf8conditioner/

http://harvest.physik.uni-oldenburg.de/dc/index.html

Page 18: An Alternative Service Model

ERRoLs are URLs to content and services related to repositories in the OAI Registry at UIUC

http://errol.oclc.org/

Page 19: Discussion

Issues, Problems, Concerns?

Page 20: Music Services

Organizational issues

Cultural issues

Collection policies

Best practices

Consensus-building

Controlled vocabularies
– http://alcme.oclc.org/gsafd/

Do items represent digital and/or physical entities?

Authority control

Page 21: Repository Descriptors

Repository-level “description” elements
– oai-identifier description – identifier layout
– eprints description – content & policies
– friends description – discover repositories
– branding description – branding information
– olac-archive description – archive info

Record-level “about” elements
– Rights statements
– Provenance statements
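As an example of using a repository-level description, the friends container in an Identify response lists the base URLs of other repositories, which a harvester can follow to discover new data sources. The sketch below parses a made-up Identify response; the repository names and URLs are hypothetical, while the two namespace URIs are those of OAI-PMH 2.0 and its friends description.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
FRIENDS = "{http://www.openarchives.org/OAI/2.0/friends/}"

# A trimmed, made-up Identify response with a friends description.
SAMPLE_IDENTIFY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://friend-one.example.org/oai</baseURL>
        <baseURL>http://friend-two.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>
""".strip()


def friend_base_urls(identify_xml):
    """Return the base URLs listed in the friends description, if any."""
    root = ET.fromstring(identify_xml)
    path = (f"{OAI}Identify/{OAI}description/"
            f"{FRIENDS}friends/{FRIENDS}baseURL")
    return [element.text.strip() for element in root.findall(path)]


print(friend_base_urls(SAMPLE_IDENTIFY))
```

A repository without a friends description simply yields an empty list, so the same function can be applied to every Identify response a harvester collects.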

Page 22: XSLT Overview

[Diagram: an XML Document and an XSLT Stylesheet are fed to an XSLT Processor, which produces an XML Document]

Page 23: Validate Repositories

http://www.openarchives.org/data/registerasprovider.html

http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

http://www.w3.org/2001/03/webdata/xsv

Page 24: Example Service Providers

ARC - A Cross Archive Search Service (experimental research service)

http://arc.cs.odu.edu/

Dokumenten- und Publikationsserver der Humboldt-Universität zu Berlin (search service, German language user interface)

http://edoc.hu-berlin.de/oaisearch/

iCite (citation index)

http://icite.sissa.it/

NCSTRL – Networked Computer Science Technical Reference Library (search engine)

http://www.ncstrl.org/

my.OAI (value-added search interface to a selected list of metadata databases)

http://www.myoai.com/

Physnet (simple search interface to an experimental OAI harvester)

http://physnet.uni-oldenburg.de/oai/query.php

ProPrint (printing-on-demand service, German and English language user interfaces offered)

http://www.proprint-service.de/

Page 25: Resources

http://www.openarchives.org/

http://www.oaforum.org/

Page 26: Everything you need to know

http://www.oaforum.org/otherfiles/oaf_d23_technical2.pdf