OAIster: Metadata Pointing to Digital Objects

Kat Hagedorn
Metadata Harvesting/DLXS Librarian

University of Michigan Libraries
February 18, 2004

background

• One-year Mellon grant project to test the feasibility of making OAI-enabled metadata for digital objects accessible to the public

• Digital Library Production Service at University of Michigan Libraries began work in December 2001

• Launched in June 2002

highlights

• Any audience
• Any subject matter
• Any format
• Freely accessible
• No dead ends
• One-stop shopping

…retrieving the “hidden web”

tool we borrowed

• University of Illinois Urbana-Champaign open-source OAI protocol harvester

• Java edition for our Unix environment
• Worked collaboratively to iron out kinks
  – resumptionToken / retryAfter
  – inexplicable kill
  – bogus records in MySQL table
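
The resumptionToken / retryAfter kink concerns two flow-control features of OAI-PMH harvesting: paging through a long ListRecords result, and backing off when a repository answers HTTP 503 with a Retry-After header. The following is a minimal Python sketch of that logic, not the UIUC harvester's code (which was Java); the function names, the example base URL, and the 60-second default are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_request_url(base_url, response_xml):
    """Return the URL of the next ListRecords page, or None if the list is complete.

    An absent or empty resumptionToken element means there are no more pages.
    """
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    # resumptionToken is an exclusive argument: it is sent with the verb only.
    query = urlencode({"verb": "ListRecords", "resumptionToken": token.text.strip()})
    return f"{base_url}?{query}"


def retry_delay_seconds(status, headers, default=60):
    """Return how many seconds to wait before retrying; 0 means no retry was requested.

    Only the delta-seconds form of Retry-After is parsed here. An HTTP-date
    value (or a missing header) falls back to `default`, a hypothetical choice.
    """
    if status != 503:
        return 0
    value = headers.get("Retry-After", "").strip()
    return int(value) if value.isdigit() else default
```

A harvester loop would call `next_request_url` after every page and stop when it returns None, sleeping for `retry_delay_seconds` whenever a request is refused.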

development environment

• Digital Library Extension Service (DLXS)
• Develop open-source middleware and license XPAT search engine for building and mounting digital libraries

• Middleware consists of document classes, i.e., Text, Image, Bib, FindAid

• Originally designed to make SGML encoded texts available online

tool we developed

• Runs in DLXS environment using BibClass
• Current BibClass web templates modified
• Additional Java-based transformation tool to:
  – Concatenate DC metadata records
  – Filter out records with no digital object
  – Count records
  – Convert from UTF-8 to ISO-8859-1
  – Transform DC records into BibClass records using XSLT
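
The filtering, counting, concatenation, and encoding steps of the transformation tool can be sketched as follows. This is an illustrative Python sketch, not the Java tool itself. Two points are assumptions: the test for "has a digital object" (here, any dc:identifier that is an http or https URL) and the handling of characters that ISO-8859-1 cannot represent (here, numeric character references). The XSLT step is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace of unqualified Dublin Core elements.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def points_to_object(record_xml):
    """Assumed heuristic: a record points to a digital object if at least
    one dc:identifier value is an http(s) URL."""
    root = ET.fromstring(record_xml)
    return any(
        (el.text or "").strip().lower().startswith(("http://", "https://"))
        for el in root.iter(f"{DC_NS}identifier")
    )


def to_latin1(text):
    """Encode text as ISO-8859-1 bytes. Characters outside that character set
    become numeric character references (an assumption, chosen so that no
    data is silently dropped)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def prepare(records):
    """Filter, count, concatenate, and re-encode a list of DC record strings.

    Returns (latin1_bytes, number_of_records_kept).
    """
    kept = [r for r in records if points_to_object(r)]
    return to_latin1("\n".join(kept)), len(kept)
```

Filtering before concatenation keeps the record count and the concatenated output in agreement.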

system design

[Diagram showing: OAI-enabled DC records; non-OAI-enabled DC records; UIUC harvester; record storage; XSL stylesheets (per source type); XSLT transformation tool; BibClass indexes; search interface (XPAT)]

result

• One place to look for digital objects
• Big
  – 3,016,251 metadata records
  – 267 institutions (as of last week…)
• Popular
  – Averages 3300 search sessions / month
  – Picked up in March ‘03: average 3500 now
  – 43,894 searches in one year (June 2002 – July 2003)

repositories: e.g.,

• arXiv Eprint Archive: math and physics pre- and post-prints

• Online Archive of California: manuscripts, photographs, and works of art held in institutions across California

• Sammelpunkt, Elektronisch Archivierte Theorie: archive of philosophical publications

• British Women Romantic Poets Project: collection of poems written by British women between 1789 and 1832

repositories: stats

• As of February ‘04, out of 267 repositories…
• International and U.S.
  – U.S.: 50.5% (135)
  – Intl: 49.5% (132)
• By subject
  – Humanities: 24% (65)
  – Science: 30% (81)
  – Mixed: 46% (121)
• E-prints and pre-prints
  – Using eprints.org software: 39% (104)
  – Not using eprints.org software: 61% (163)

major issues encountered

• Metadata variation
• Records not leading to digital objects
• Access restrictions on digital objects described in records
• Duplicate records for a single digital object

issue: metadata variation

• With more records, users need more restrictions

• Consistent metadata needed to facilitate these restrictions

• One option: normalization of data

issue: metadata variation

• Type: the obvious quick win
  – 240 metadata values mapped to four generic values (text, image, audio, video)
  – e.g.,
      audio, sound = audio
      motion, animation, newsreels, etc. = video
      watercolour, watercolor, slides, etc. = image
      article, articles, booklet, diss, story, etc. = text
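
The type mapping described above amounts to a lookup table. The Python sketch below contains only the example values shown on the slide, not the full set of 240; the table name, the function name, and the choice to return None for unmapped values are assumptions.

```python
# Example values from the slide only; the real mapping covered 240 values.
TYPE_MAP = {
    "audio": "audio", "sound": "audio",
    "motion": "video", "animation": "video", "newsreels": "video",
    "watercolour": "image", "watercolor": "image", "slides": "image",
    "article": "text", "articles": "text", "booklet": "text",
    "diss": "text", "story": "text",
}


def normalize_type(value):
    """Map a raw dc:type value to one of text, image, audio, video.

    Matching ignores case and surrounding whitespace. Unmapped values
    return None so they can be reviewed rather than guessed at.
    """
    return TYPE_MAP.get(value.strip().lower())
```

Keeping unmapped values visible (rather than defaulting them to "text") makes it easier to extend the table as new repositories are harvested.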

issue: metadata variation

• Date: where to begin?
  – Most records with at least one date
  – Some records include up to seven dates
  – No consistent style of date
• Subject: out of context, what meaning?
  – Many records with at least one subject element
  – But over 100 records with more than 50 subjects
  – And one record with 1000!

issue: metadata variation

• Sample date values

<date>2-12-01</date>
<date>2002-01-01</date>
<date>0000-00-00</date>
<date>1822</date>
<date>between 1827 and 1833</date>
<date>18--?</date>
<date>November 13, 1947</date>
<date>SEP 1958</date>
<date>235 bce</date>
<date>Summer, 1948</date>
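
To illustrate why date normalization is hard, here is a minimal Python sketch that tries to pull a single four-digit year out of each sample value. It is one possible approach, not something OAIster is stated to do. The accepted year range (1000 to 2099), the choice of the first year in a range, and the decision to give up on ambiguous values are all assumptions.

```python
import re

# A standalone four-digit year from 1000 to 2099 (assumed range).
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_year(value):
    """Return the first plausible four-digit year in a date string, or None.

    Values such as "2-12-01", "18--?", "0000-00-00" and "235 bce" are left
    unresolved rather than guessed at.
    """
    match = YEAR.search(value)
    return int(match.group(1)) if match else None


samples = [
    "2-12-01", "2002-01-01", "0000-00-00", "1822", "between 1827 and 1833",
    "18--?", "November 13, 1947", "SEP 1958", "235 bce", "Summer, 1948",
]
years = {s: extract_year(s) for s in samples}
```

With these rules, six of the ten sample values yield a year and four stay unresolved, which shows how much a simple rule leaves behind.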

issue: metadata variation

• Sample subject values

<subject>30,51,52</subject>
<subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject>
<subject>Slavery--United States--Controversial literature</subject>
<subject>view of interior with John Henry sculpture</subject>
<subject>Particles (Nuclear physics) -- Research.</subject>

issue: no digital objects

• Some records contain links to further description of digital object

• But not the digital object itself
• Culling difficult
• One option: add explanatory text to site
• Or, unfortunately, spot-check and remove repositories with this issue

issue: access restrictions

• No records where metadata itself is restricted in use (as far as we know!)

• Definitely some records where objects are restricted to licensed users

• One option: add explanatory text to site
• Or sub-set OAIster into free and “partially” free repositories

issue: duplicate records

• Two records harvested, different identifiers, same object described and pointed to

• Two records harvested inadvertently through aggregators and original repositories

issue: duplicate records

• Need algorithm to automate de-duplication

• Were duplicates to be identified, how to deal with the issue?
  – Suppress?
  – Group?
  – Flag?

• So far, not addressed in OAIster
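
As a purely hypothetical illustration of what an automated de-duplication pass could look like, the Python sketch below groups records whose object URLs match after light normalization. OAIster had not addressed the problem, so this is not its method. The normalization rules (lower-case host, ignore scheme and trailing slash), the function names, and the sample identifiers are all assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def url_key(url):
    """Build a comparison key from a URL: lower-cased host, path without a
    trailing slash, plus the query string if present. The scheme is ignored."""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    return key + ("?" + parts.query if parts.query else "")


def group_duplicates(records):
    """Group OAI identifiers that point to the same object URL.

    `records` is an iterable of (oai_identifier, object_url) pairs.
    Returns only the groups with more than one identifier.
    """
    groups = defaultdict(list)
    for oai_id, url in records:
        groups[url_key(url)].append(oai_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Grouping rather than deleting leaves open the choice among suppressing, grouping, and flagging duplicates in the interface. A URL match will miss duplicates that point to the same object through different URLs.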

future of OAIster

• Advanced searching
• Grouping to aid browsing
• Further normalization of data
• Handling duplicate records
• Saving/emailing/downloading records
• Collaboration with other services: search, instructional…
• More user testing…

current state of protocol

• Popular
• As Peter Suber says:
  – “…no other single idea or technology in the [open-source movement] has enjoyed this density of endorsement and adoption in a six month period.”
• Data providers over one year:
  – June ‘02: 56 repositories / 274,062 records
  – June ‘03: 187 repositories / 1,246,953 records
  – Over three-fold increase for repositories
  – Over four-fold increase for records

future of protocol

• Branching out
  – DC required vs. highly recommended
  – Use of OAI in closed environments
  – Static repository protocol
  – OAI-rights committee

• OAI evangelism

contact info

• Kat Hagedorn
• University of Michigan Libraries, Digital Library Production Service
• [email protected]
• http://www.oaister.org/