The Open Archives Initiative and OAIster: Past, Present and Future
Kat Hagedorn
University of Michigan Libraries
April 6, 2006

The oy(ai)ster and the hare

Well, if oysters had feet…
    Other projects move faster (think Google)
    OAI still building speed
    Follows the punctuated equilibrium model…

* © Johnny Hart!

[Chart: OAIster records over time. X-axis: months (Jun-02 to Feb-06); y-axis: # records (0 to 8,000,000)]

[Chart: OAIster repositories over time. X-axis: months (Jun-02 to Feb-06); y-axis: # repositories (0 to 700)]

Why OAIster?

And why the silly name?
Initially, wanted to build the Academic HotBot (yup, you read that right)
Essentially, a union catalog of those “objects” that couldn’t easily be spidered
Currently, have more records that link to “objects” than there are records in our OPAC

What does OAIster contain?

Harvest everything available
    except obvious test repositories
Keep nearly everything
    must have a digital object link
    must have decent metadata
    must be scholarly or informational
For example…

Why do (should) people use it?

It’s big-- over 7 million records last month
It’s varied-- contains articles, books, images of artwork, datasets, videos, audio, finding aids, manuscripts
It keeps growing-- as long as they keep paying my salary

One interface to rule them all?

If you don’t know this…
    www.oaister.org
    www.oaister.umdl.umich.edu/o/oaister
…how do you get to the content?
We consider part of our mission making this metadata as widely available as possible, so…

Approached us as part of a big content appropriation push
Send them our metadata monthly-- takes them about a week to include it in the search index
For example--

SRU interface

Federated search engines are “it” now-- trying to solve problem of how to search simultaneously
Perfect place for OAIster
Built SRU interface (Z39.50 deemed older tech at this point)
ExLibris building connector for MetaLib tool
For example--
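
As a rough illustration of what a federated search tool sends to an SRU interface, the sketch below builds an SRU 1.1 searchRetrieve request URL. The endpoint address and the sample CQL query are assumptions made up for the example; the slides do not give the actual OAIster SRU address or its supported indexes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint for illustration only; not OAIster's real SRU address.
SRU_BASE_URL = "https://sru.example.org/oaister"


def build_sru_url(cql_query, start_record=1, maximum_records=10,
                  base_url=SRU_BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",   # standard SRU operation name
        "version": "1.1",
        "query": cql_query,              # CQL query string
        "startRecord": start_record,     # 1-based position of first record
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Assumed example query; which indexes a given server supports will vary.
url = build_sru_url('dc.title = "open archives"')
print(url)

# Round-trip check: the query survives URL encoding unchanged.
print(parse_qs(urlsplit(url).query)["query"])
```

A federated search client would issue one such request per target and merge the XML responses, which is why a plain HTTP interface like this is easier to connect to than Z39.50.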

OAI: what it is (finally)

Stands for Open Archives Initiative
    “…develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.”
Includes a Protocol for Metadata Harvesting (PMH), i.e., what we use to fill OAIster
Consists of data providers and service providers

OAI: what it is not

OAI ≠ open access
    “…defining and promoting machine interfaces that facilitate the availability of content from a variety of providers. Openness does not mean ‘free’ or ‘unlimited’ access to the information repositories that conform to the OAI-PMH.”
However, a large majority of OAIster records are available to all and sundry
Perfect opportunity-- freely sharing free stuff

OAIster and open access

We harvest a large number of open access “self-publishing” repositories, e.g.,
    DSpace: 68
    EPrints: 113
    OJS: 21
Plus green and gold standard peer-reviewed digital object records from repositories like PLOS and arXiv

OAI-PMH model

Data providers:
    XML UTF-8 metadata records
    hosted by shareware software
Service providers:
    discover the data provider
    harvest that metadata
    transform it…
    index it and make it searchable
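
The harvest step can be sketched roughly as follows: a service provider requests ListRecords in the oai_dc format, reads the Dublin Core fields out of each record, and follows the resumptionToken until none is returned. This is a minimal sketch and not the UM harvester itself; the sample response, the repository address in the usage line, and the function names are made up for illustration. Only the verb, arguments, and XML namespaces come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response standing in for what a data provider would return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
          <dc:type>Text</dc:type>
          <dc:identifier>http://example.org/thesis/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token); each record maps DC field name to values."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # e.g. deleted records carry a header but no metadata
            continue
        fields = {}
        for el in dc:
            name = el.tag.split("}")[1]  # strip the "{namespace}" prefix
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append(fields)
    token_el = root.find("oai:ListRecords/oai:resumptionToken", NS)
    has_token = token_el is not None and token_el.text
    return records, (token_el.text.strip() if has_token else None)


records, token = parse_list_records(SAMPLE)
print(records[0]["title"], token)
print(list_records_url("https://repo.example.org/oai", token))
```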

Transformation tool

Remove “no digital object” records
Add normalized fields for limiting search
    currently resource type normalized to 5 values: text, image, audio, video, dataset
    planning on date normalization
Maps Simple Dublin Core to our own DLXS Bibliographic Class for indexing
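
The first two steps above might look something like the sketch below. The real tool is XSLT-based; this Python version only illustrates the logic. The lookup table and the rule "a record has a digital object if some identifier is an http(s) URL" are assumptions for the example, since the slides do not state the actual rules.

```python
# Hypothetical lookup table mapping free-text dc:type values onto the five
# normalized resource types named in the slides.
TYPE_MAP = {
    "text": "text", "article": "text", "book": "text", "thesis": "text",
    "image": "image", "stillimage": "image", "photograph": "image",
    "sound": "audio", "audio": "audio",
    "movingimage": "video", "video": "video",
    "dataset": "dataset",
}


def has_digital_object_link(record):
    """Assumed test: at least one identifier is an http(s) URL."""
    return any(
        value.lower().startswith(("http://", "https://"))
        for value in record.get("identifier", [])
    )


def normalize_type(record):
    """Return the first recognized normalized type, or None if nothing matches."""
    for value in record.get("type", []):
        key = value.lower().replace(" ", "")
        if key in TYPE_MAP:
            return TYPE_MAP[key]
    return None


def transform(records):
    """Drop records without a digital object link; add a normalized type field."""
    kept = []
    for record in records:
        if not has_digital_object_link(record):
            continue
        out = dict(record)
        out["normalized_type"] = normalize_type(record)
        kept.append(out)
    return kept


harvested = [
    {"title": ["Map of Ann Arbor"], "type": ["Still Image"],
     "identifier": ["http://example.org/maps/12"]},
    {"title": ["Catalog card only"], "type": ["Text"],
     "identifier": ["call-number-123"]},
]
result = transform(harvested)
print(result)
```

Keeping the original type values next to the normalized field means the search interface can limit by the five broad types without losing what the data provider supplied.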

System design

[Diagram. Components shown: OAI-enabled DC records; UM harvester; Record storage; XSLT transformation tool; XSL stylesheets (per source type); BibClass indexes; Search interface (XPAT)]

MODS / Aquifer portals

Only harvest Simple Dublin Core for OAIster
Experimenting with harvesting MODS

Why MODS?

Is the metadata standard of choice among richer, enhanced formats
Offers more focused ability to search and retrieve records
Based on MARC, but human-readable
Digital Library Federation (we’re members) is pushing for its use

What’d we do with MODS?

Mapping MODS to DLXS Bibliographic Class with many modifications
    adding attributes-- handle display title (The quick fox…) vs. sort title (quick fox…, The)
    merging fields-- nameParts
    splitting out subject fields-- topical, name, geographical, hierarchical
Not all that perfect
    merged fields don’t always make sense
    not fully leveraging the richer fields in search
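
To make the display-title versus sort-title and merged-namePart ideas concrete, here is a small sketch that reads a MODS fragment and derives both title forms plus a single merged name string. The sample record and the "family, given" merge order are assumptions for illustration; the slides do not show the actual DLXS mapping. The nonSort, title, and namePart elements are standard MODS.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up MODS fragment for the example.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The </nonSort>
    <title>quick fox</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Doe</namePart>
    <namePart type="given">Jane</namePart>
  </name>
</mods>"""


def titles(mods_root):
    """Return (display_title, sort_title) from the first titleInfo element."""
    info = mods_root.find("mods:titleInfo", MODS_NS)
    non_sort = info.findtext("mods:nonSort", default="", namespaces=MODS_NS).strip()
    title = info.findtext("mods:title", default="", namespaces=MODS_NS).strip()
    if non_sort:
        # Leading article stays in front for display, moves to the end for sorting.
        return f"{non_sort} {title}", f"{title}, {non_sort}"
    return title, title


def merged_name(name_el):
    """Merge typed nameParts into one string (assumed order: family, given)."""
    parts = {
        part.get("type"): (part.text or "").strip()
        for part in name_el.findall("mods:namePart", MODS_NS)
    }
    family, given = parts.get("family"), parts.get("given")
    if family and given:
        return f"{family}, {given}"
    # Fall back to whatever parts exist, in document order.
    return " ".join(value for value in parts.values() if value)


root = ET.fromstring(SAMPLE_MODS)
display_title, sort_title = titles(root)
name = merged_name(root.find("mods:name", MODS_NS))
print(display_title, "|", sort_title, "|", name)
```

The fallback branch in merged_name hints at why merged fields do not always make sense: once untyped or unusual parts are joined into one string, the distinctions MODS recorded are gone.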

What else?

Added bookbag functions
Added thumbnails
Created better search interface
Next…
    tackle date normalization
    downloading of MODS directly from interface
    port useful features and widgets to OAIster

Onwards…

Receive grant to work on metadata remediation…
…meaning ways to cluster and classify metadata so it is more easily searchable and browseable
And continue to work on best practices for data providers

Who will win?*

* kidding…?

Questions?

Kat Hagedorn
University of Michigan Libraries
Digital Library Production Service
www.oaister.org
[email protected]