Renardus

www.renardus.org

Follow the Fox to Renardus: an Academic Subject Gateway Service for Europe

Cross-browsing and Cross-searching in a Distributed Network of Subject Gateways:

Architecture, Data Model and Classification

Dr. Heike Neuroth & Traugott Koch

State Library of Lower Saxony and the University Library of Göttingen, Germany [email protected]

NetLab, Lund University Library Development Department, Sweden [email protected]

ELAG 2001, Prague, 6-8 June 2001. Neuroth & Koch

Content

- Renardus (aim, partners, etc.)
- Subject Gateway (definition, elements)
- Renardus Application Profile (working steps, metadata core set, data model, etc.)
- Renardus Collection Level Description
- Renardus Technical Approach
- DDC Mapping for Cross-Browsing (methods, mapping relationships, etc.)
- Outlook


What is Renardus?

- EU-funded project: EC contribution 1.7 million EUR; 2.3 million EUR including costs not funded by the EC
- 1 January 2000 - 30 June 2002
- funded under “Information Society Technologies” (IST-1999-10562), “Promoting a User-friendly Information Society”, a major theme of the European Union's 5th Framework Programme
- Partners drawn from 7 countries:
  - Project management: National Library, The Hague (NL)
  - Denmark, Finland, Sweden, France, United Kingdom, Germany


Objectives

- to provide access to distributed quality-controlled subject gateways (high-quality metadata collections) across Europe via one single interface:
  - cross-search
  - cross-browse
- and to develop and define:
  - metadata solutions: Renardus Application Profile, Renardus Namespaces, Renardus Collection Level Description
  - technical solutions
  - organizational/business models


Member Subject Gateways

- DAINet: German Agricultural Information System
- Document Server DEPOSIT: Deposit of German Online Dissertations
- DutchESS: Dutch Electronic Subject Service
- EELS: Engineering Electronic Library, Sweden
- FVL: The Finnish Virtual Library
- NOVAGate: Libraries of Nordic Agricultural & Veterinary Univ.
- SSG-FI: MathGuide, Geo-Guide, History Guide, Anglistik Guide
- RDN hubs: Resource Discovery Network (EEVL, SOSIG, OMNI, ...)
- Danish Electronic Research Library (future partner)
- Les Signets: Collection of Internet Resources (future partner)


Subject Gateway

“Quality-controlled subject gateways are Internet services which apply a rich set of quality measures to support systematic resource discovery. Considerable manual effort is used to secure a selection of resources which meet quality criteria and to display a rich description of these resources with standards-based metadata. Regular checking and updating ensure good collection management. A main goal is to provide a high quality of subject access through indexing resources using controlled vocabularies and by offering a deep classification structure for advanced searching and browsing.”


Subject Gateway cont.

Elements:
- creation: manual/intellectual, experts etc.
- selection and collection development: policy, selection criteria etc.
- collection management: maintenance of collection etc.
- resource description/metadata: rich set of metadata, formalized content description etc.
- subject classification/subject access: controlled vocabularies etc.
- standards: allow interoperability etc.
- value-adding features: display, usage features etc.


Working Steps - General

- selection of necessary/meaningful elements:
  - for a service like Renardus: “Meta-Subject Gateway”, European service (multilingual access, search, browse)
  - for search, filter, sort, and display options
  - for browse, subject access
- selection of common metadata format (exchange format):
  - Dublin Core Metadata Element Set v1.1
  - Dublin Core Qualifiers
  - others
  - home-grown


Working Steps - Analysis

- first survey of partners' metadata formats and detailed description of each subject gateway
- GENERAL: name of SG, acronym, responsible organization, source of funding, time for record creation, general description etc.
- COLLECTION/SELECTION: target user group, common primary language of target audience, collection scope, geographical and language coverage, selection criteria, granularity, resource types, resource formats etc.
- CONTENT - METADATA: metadata scheme, metadata set, crosswalks, interoperability, cataloging rules, authority files etc.


Working Steps - Analysis cont.

- CONTENT - OTHERS: metadata browsable, searchable; language(s) of descriptions, thesauri, interface, translation support etc.; keywords, classification systems, etc.
- INDEX TYPE/TECHNICAL NOTES: search engine, indexing system, structure of data storage etc.
- INTELLECTUAL PROPERTY RIGHTS (IPR): copyright, branding
- VARIOUS: (quality) control, link checking, record checking/update etc.; backlinks of the gateway, statistical analysis of log files etc.


First Results

definition of 8 metadata elements, based on Dublin Core, without detailed semantics or syntax:
- DC.Title
- DC.Creator
- DC.Description
- DC.Subject
- DC.Identifier
- DC.Language
- DC.Type
- Country
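As a rough illustration of the eight-element core set, a record could be held in a small Python structure like the one below. The element names come from the list above; the class name, the `to_pairs` helper, and all sample values are assumptions made for this sketch and are not part of Renardus.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CoreRecord:
    """One resource description using the eight first-round elements."""
    title: str
    identifier: str
    description: str = ""
    creator: List[str] = field(default_factory=list)
    subject: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)
    resource_type: List[str] = field(default_factory=list)
    country: str = ""

    def to_pairs(self) -> List[Tuple[str, str]]:
        """Flatten the record into (element name, value) pairs, skipping empty values."""
        pairs = [("DC.Title", self.title), ("DC.Identifier", self.identifier)]
        if self.description:
            pairs.append(("DC.Description", self.description))
        pairs += [("DC.Creator", value) for value in self.creator]
        pairs += [("DC.Subject", value) for value in self.subject]
        pairs += [("DC.Language", value) for value in self.language]
        pairs += [("DC.Type", value) for value in self.resource_type]
        if self.country:
            pairs.append(("Country", self.country))
        return pairs


# Invented sample values, for illustration only.
record = CoreRecord(
    title="Example Mathematics Guide",
    identifier="http://example.org/math",
    subject=["Mathematics"],
    language=["en"],
    country="DE",
)
pairs = record.to_pairs()
```

At this stage the slides leave semantics and syntax open, so the sketch deliberately treats every value as a plain string.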


Renardus Data Model

detailed investigations of each element about:
- semantics and syntax of each element
- qualifiers (refinements, encoding schemes)
- cataloging rules (creator, description, keywords)
- namespace
- repeatability of each element
- form of obligation (mandatory, strongly recommended, optional)
- language qualifier (for title, description, subject)

and:
- administrative elements
- future elements (rights, publisher), additional elements (format, etc.)
- common browsing structure via classification system (home-grown, reuse of an existing system, which one)


Renardus Data Model cont.

[Figure: data model diagram]


Renardus Application Profile

Renardus Application Profile based on four namespaces, to be encoded in RDF/XML:
- Dublin Core Namespace: [DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description
- Dublin Core Qualifiers Namespace: [DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers
- Renardus Namespace: [RMES version 0.1, 2001-04-30] Renardus Metadata Element Set
- Renardus Namespace Qualifiers: [RMES Qualifiers version 0.1, 2001-04-30] Renardus Metadata Element Set
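To make the idea of mixing namespaces in one RDF/XML record more concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The RDF and Dublin Core namespace URIs are the published ones; the Renardus namespace URI, the `rmes:country` element name, and the `encode_record` function are placeholders assumed for illustration, since the slides do not give them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Placeholder URI: the real Renardus namespace is not given in the slides.
RMES_NS = "http://example.org/renardus/rmes/0.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("rmes", RMES_NS)


def encode_record(about: str, title: str, lang: str, country: str) -> str:
    """Serialize a tiny record that draws elements from two namespaces."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    # Dublin Core element with a language tag (serialized as xml:lang).
    title_el = ET.SubElement(desc, f"{{{DC_NS}}}title")
    title_el.set(f"{{{XML_NS}}}lang", lang)
    title_el.text = title
    # Hypothetical Renardus-specific element.
    country_el = ET.SubElement(desc, f"{{{RMES_NS}}}country")
    country_el.text = country
    return ET.tostring(root, encoding="unicode")


xml_text = encode_record("http://example.org/math", "Example Guide", "en", "DE")
```

Qualifier namespaces would be added the same way, with one more registered prefix per namespace.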


Renardus AP cont.

“content metadata”

Title and Title.Alternative

title: DCMES: mandatory, not repeatable, language tag

title.alternative: DCMES Qualifiers: optional, repeatable, language tag

Creator

DCMES: strongly recommended, repeatable

RMES Qualifiers (LastName, FirstName): strongly recommended, repeatable

Description

DCMES: mandatory in text version, repeatable, language tag


Renardus AP cont.

Subject

DCMES: mandatory, repeatable, language tag

DCMES Qualifiers: strongly recommended, repeatable, language tag

RMES Qualifiers (all other encoding schemes): mandatory, repeatable, language tag

RMES Qualifiers (Ren-DDC): mandatory, repeatable

Identifier

DCMES Qualifiers: mandatory, repeatable (probably in the pilot system)

RMES Qualifiers: “Operational System” with Qualifiers “Archive”, “Mirror” ...

Language

DCMES Qualifiers: strongly recommended, repeatable


Renardus AP cont.

Type

DCMES: strongly recommended, repeatable

DCMES Qualifiers (DCT1): strongly recommended, repeatable

DCMES Qualifiers (DCT2): “Operational System”

Country

RMES Qualifiers: strongly recommended, not repeatable

“administrative metadata”

Full Record URL

RMES Qualifiers: strongly recommended, not repeatable

SBIG ID

RMES Qualifiers: mandatory, not repeatable
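The obligation and repeatability rules listed on the application profile slides lend themselves to a mechanical check. The sketch below transcribes the top-level rule for each element (qualifier-level rules are left out) and reports violations. The dictionary keys such as `fullRecordURL` and `sbigID`, and the `check_record` function, are hypothetical spellings chosen for this example.

```python
from typing import Dict, List

# (obligation, repeatable) per element, transcribed from the profile slides.
# "M" = mandatory, "R" = strongly recommended, "O" = optional.
PROFILE = {
    "title": ("M", False),
    "title.alternative": ("O", True),
    "creator": ("R", True),
    "description": ("M", True),
    "subject": ("M", True),
    "identifier": ("M", True),
    "language": ("R", True),
    "type": ("R", True),
    "country": ("R", False),
    "fullRecordURL": ("R", False),  # hypothetical key for "Full Record URL"
    "sbigID": ("M", False),         # hypothetical key for "SBIG ID"
}


def check_record(record: Dict[str, List[str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for element, (obligation, repeatable) in PROFILE.items():
        values = record.get(element, [])
        if obligation == "M" and not values:
            problems.append(f"missing mandatory element: {element}")
        if not repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {element}")
    for element in record:
        if element not in PROFILE:
            problems.append(f"unknown element: {element}")
    return problems


# Invented sample record.
good = {
    "title": ["Example Guide"],
    "description": ["A guide."],
    "subject": ["Mathematics"],
    "identifier": ["http://example.org/math"],
    "sbigID": ["gw-math"],
}
bad = dict(good, title=["One", "Two"], subject=[])
```

A strongly recommended element that is missing is not reported here; a real normalizing tool might issue a warning instead.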


Renardus CLD Schema

Collection Level Description: simple description of collections, locations and related people or organizations

in Renardus: to provide information about participating Subject Gateways:
- users choose Subject Gateways for thematic search (semi-automatic selection by subject)
- well-structured background information (human and machine readable)
- promotion
- registry of Subject Gateways


Renardus CLD Format

Format: based on RSLP Collection Description (UKOLN):

Dublin Core metadata elements (e.g. DC.Title, DC.Description, DC.Subject)

RSLP metadata elements (cld.country)

Renardus specific metadata elements (e.g. rencld:acronym, rencld:subjectNotation, rencld:resourceLanguage etc.)
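One way collection descriptions could support the semi-automatic selection of gateways by subject is sketched below. The three `rencld:` field names appear in the format list above; the gateway acronyms, all values, the interpretation of `rencld:subjectNotation` as a list of DDC numbers, and the prefix test in `covers` are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical collection descriptions; acronyms and values are invented.
CLDS: List[Dict[str, Any]] = [
    {"rencld:acronym": "GW-MATH", "rencld:subjectNotation": ["510"],
     "rencld:resourceLanguage": ["en", "de"]},
    {"rencld:acronym": "GW-AGRI", "rencld:subjectNotation": ["630"],
     "rencld:resourceLanguage": ["en"]},
    {"rencld:acronym": "GW-SCI", "rencld:subjectNotation": ["500"],
     "rencld:resourceLanguage": ["en"]},
]


def covers(declared: str, requested: str) -> bool:
    """Simplified hierarchy test: strip trailing zeros from a whole-number
    notation ("510" -> "51") and check whether the requested number starts with it."""
    stem = declared if "." in declared else declared.rstrip("0")
    return requested.startswith(stem)


def gateways_for(ddc_number: str, clds: List[Dict[str, Any]]) -> List[str]:
    """Return acronyms of gateways whose declared notations cover the requested class."""
    return [
        cld["rencld:acronym"]
        for cld in clds
        if any(covers(n, ddc_number) for n in cld["rencld:subjectNotation"])
    ]


selected = gateways_for("516.3", CLDS)
```

The prefix heuristic ignores the irregularities of the real DDC hierarchy, so it should be read as a stand-in for a proper classification lookup.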


Renardus CLD Tool

- WWW-based form
- RDF, RDF/XML, and text encoding
- file is saved locally; each partner can update its description at any time
- Renardus broker gathers all Subject Gateway descriptions



Renardus Technical Approach

PREPARATION
- investigation:
  - of available standards and technologies
  - of functional and user requirements
  - of service provider requirements
- formulation of use cases in UML
- development of data model
  - data model
- choosing architecture (decentralized vs. centralized)
  - architectural diagram
- search/retrieval protocol common profile (map data model to the protocol Z39.50)
  - Z39.50 profile, Bath compliant
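Mapping a data model to Z39.50 largely means deciding which Bib-1 Use attribute each searchable element corresponds to. The sketch below shows what such a table and a query in Prefix Query Format (PQF) could look like. The attribute numbers are standard Bib-1 values, but which access points the Renardus profile actually defined is an assumption here, and `to_pqf` and `and_query` are helper names invented for the example.

```python
# Bib-1 Use attribute numbers. The choice of elements is illustrative and
# not taken from the Renardus Z39.50 profile itself.
USE_ATTRIBUTES = {
    "any": 1016,
    "title": 4,
    "creator": 1003,
    "subject": 21,
    "description": 62,
}


def to_pqf(field: str, term: str) -> str:
    """Build a single PQF clause such as: @attr 1=4 "fox"."""
    if '"' in term:
        # Quoting rules are left out of this sketch.
        raise ValueError("terms containing double quotes are not supported")
    return f'@attr 1={USE_ATTRIBUTES[field]} "{term}"'


def and_query(*clauses: str) -> str:
    """Combine clauses with @and, a binary prefix operator, nesting left to right."""
    query = clauses[0]
    for clause in clauses[1:]:
        query = f"@and {query} {clause}"
    return query


query = and_query(to_pqf("title", "fox"), to_pqf("subject", "zoology"))
```

Elements without a natural Bib-1 counterpart, such as Country, would need a profile-specific decision that this sketch does not attempt.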



Renardus Technical Approach cont.

IMPLEMENTATION
- data normalization
  - encoding RDF/XML (RDF normalizing toolkit)
  - classification mapping (mapping tool adapted from CARMENx)
  - CLDs (CLD tool adapted from RSLP)
- creation of participants' Renardus servers (Z39.50, Z'mbol)
- implementation of broker software and functionality
  - cross-searching (Zebril and modified EUROPAGATE simultaneous gateway)
  - cross-browsing (browsing tool, SQL)
- user interface implementation (with use cases)
  - screen layout (Zebril and HTML, Javascript)
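The cross-searching role of a broker, sending one query to several gateways at once and merging the answers, can be sketched in a few lines. This is not the Zebril or EUROPAGATE implementation: the gateways here are in-memory stubs, and `make_stub_gateway` and `cross_search` are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Gateway = Callable[[str], List[Record]]


def make_stub_gateway(name: str, records: List[Record]) -> Gateway:
    """Return a search function over a fixed record list (stands in for a remote server)."""
    def search(term: str) -> List[Record]:
        needle = term.lower()
        return [dict(r, gateway=name) for r in records if needle in r["title"].lower()]
    return search


def cross_search(term: str, gateways: Dict[str, Gateway]) -> List[Record]:
    """Query all gateways in parallel and merge the hits, sorted by title."""
    merged: List[Record] = []
    with ThreadPoolExecutor(max_workers=max(1, len(gateways))) as pool:
        futures = {name: pool.submit(search, term) for name, search in gateways.items()}
        for future in futures.values():
            try:
                merged.extend(future.result(timeout=5))
            except Exception:
                # A failing or slow gateway should not sink the whole search.
                continue
    return sorted(merged, key=lambda r: r["title"].lower())


def broken_gateway(term: str) -> List[Record]:
    raise RuntimeError("gateway down")


# Invented sample data.
gateways = {
    "A": make_stub_gateway("A", [{"title": "Fox ecology"}]),
    "B": make_stub_gateway("B", [{"title": "Arctic fox"}, {"title": "Wolves"}]),
    "C": broken_gateway,
}
hits = cross_search("fox", gateways)
```

Tolerating a failed gateway matters in a decentralized setup, where the broker has no control over the availability of the partner servers.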


DDC Mapping for Cross-Browsing

- why subject cross-browsing and classification?
- why switching language?
  - browsing/mapping from DDC to the local systems/browsing structures
- why DDC?
  - comparison to alternatives
  - research license, allowed changes
- analysis of partners' classification systems
  - types, adaptations, number of levels and classes, subject overlap


DDC Mapping for Cross-Browsing cont.

- mapping approaches and issues
- mapping methods
  - mapping between classes, not between individual resources
  - priorities: e.g. only well-used classes are mapped
  - recommendations for local improvements
- mapping relationships
  - fully equivalent, narrower and broader equivalent, major and minor overlap
  - reuse for retrieval result clustering
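A class-level mapping with the five relationship types could be represented as follows. The relationship names are taken from the slide; the `Relation` and `ClassMapping` types, the `index_by_ddc` helper, the gateway acronyms, and the local class names are assumptions for this sketch, and the numeric order of the enum is only a convenience for sorting.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Relation(Enum):
    FULLY_EQUIVALENT = 1
    NARROWER_EQUIVALENT = 2
    BROADER_EQUIVALENT = 3
    MAJOR_OVERLAP = 4
    MINOR_OVERLAP = 5


@dataclass(frozen=True)
class ClassMapping:
    """Links one DDC class to one class of a local gateway classification."""
    ddc: str
    gateway: str
    local_class: str
    relation: Relation


def index_by_ddc(mappings: Iterable[ClassMapping]) -> Dict[str, List[ClassMapping]]:
    """Group mappings by DDC class, closest relationship first."""
    index: Dict[str, List[ClassMapping]] = defaultdict(list)
    for mapping in mappings:
        index[mapping.ddc].append(mapping)
    return {
        ddc: sorted(items, key=lambda m: m.relation.value)
        for ddc, items in index.items()
    }


# Invented sample mappings.
mappings = [
    ClassMapping("510", "GW-SCI", "Natural sciences", Relation.BROADER_EQUIVALENT),
    ClassMapping("510", "GW-MATH", "Mathematics", Relation.FULLY_EQUIVALENT),
    ClassMapping("630", "GW-AGRI", "Farming", Relation.MAJOR_OVERLAP),
]
index = index_by_ddc(mappings)
```

Because the mapping is between classes rather than resources, the table stays small and does not need to change when individual records are added to a gateway.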


DDC Mapping for Cross-Browsing cont.

- technical solution
  - sources: local classifications, CORC Web Dewey
  - mapping tool adapted from CARMENx (MySQL, PHP, Javascript)
  - syntax of the mapping information
  - creation of the browsing pages
- usage of the DDC mapping in Renardus
  - “browse and jump”
  - why not virtual browsing?
  - DDC classification search (in advanced search)
  - user interface solutions
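The “browse and jump” idea, browsing the DDC hierarchy at the broker and then jumping to the mapped page of a local gateway, can be illustrated with a small page generator. The actual Renardus pages were produced by a tool based on MySQL and PHP; this Python sketch, the `browse_page` function, and the sample gateway names and URLs are assumptions for illustration only.

```python
from html import escape
from typing import List, Tuple


def browse_page(ddc: str, caption: str, links: List[Tuple[str, str, str]]) -> str:
    """Render a browsing page for one DDC class.

    Each link is (gateway name, mapping relationship, URL of the local browse page).
    """
    lines = [f"<h1>{escape(ddc)} {escape(caption)}</h1>", "<ul>"]
    for gateway, relation, url in links:
        lines.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(gateway)}</a>'
            f" ({escape(relation)})</li>"
        )
    lines.append("</ul>")
    return "\n".join(lines)


# Invented gateways and URLs.
page = browse_page(
    "510",
    "Mathematics",
    [
        ("GW-MATH", "fully equivalent", "http://example.org/gw-math/browse?c=math"),
        ("GW-SCI", "broader equivalent", "http://example.org/gw-sci/browse?c=science&l=1"),
    ],
)
```

The user leaves the broker at the moment of the jump and continues in the local gateway's own browsing structure, which is the difference from a virtual browsing approach where the broker would merge the resources itself.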


[Figure: mapping between MSC 2000 and DDC]


DDC Mapping for Cross-Browsing cont.

- future
  - recommendations for subject access efforts in gateways and brokers
  - multilingual access to the DDC top-levels
  - automatic mapping (and classification) as support
  - owners should take over for sustainable mapping
- documentation
  - DDC mapping report (D7.4)
  - practical mapping guidelines (D7.4)
  - paper at IFLA Satellite Conf., August 2001


Outlook

- June 2001: Public Deliverable WP 6, D6.5
  - Renardus Application Profile
  - Renardus Namespaces
  - Renardus Collection Level Description
  - DDC Mapping
- June 2001: Beta version of Renardus broker
  - first DDC mapping results
  - first evaluations of broker will start
- November 2001: Renardus Workshop for future participating Subject Gateways


URLs & References

- Renardus - http://www.renardus.org
- SUB Renardus - http://renardus.sub.uni-goettingen.de/ (also with D7.4)
- News Digest SIGN-UP Form - http://www.renardus.org/news/sign-up.html
- Evaluation of existing data models (D6.1) - http://www.renardus.org/deliverables/d6_1/docframe.htm
- DCMI Dublin Core Metadata Initiative - http://www.dublincore.org/
- Dublin Core Metadata Element Set, Version 1.1: Reference Description - http://www.dublincore.org/documents/dces/
- Dublin Core Qualifiers - http://www.dublincore.org/documents/dcmes-qualifiers/
- DCMI Agents Working Group - http://www.dublincore.org/groups/agents/
- DCMI Type Working Group - http://www.dublincore.org/groups/type/


URLs & References

- RSLP Collection Description - http://www.ukoln.ac.uk/metadata/rslp/
- CLD Collection Level Description - http://ukoln.ac.uk/metadata/cld/
- RSLP Collection Description Tool - http://www.ukoln.ac.uk/metadata/rslp/tool/
- Subject Gateways (Traugott Koch): Online Information Review, Vol. 24, Number 1, 2000


Cross-Search

- basic index: Title, Description, Subject
- field search:
  - Title
  - Creator (in DC Simple and later in RMES Qualifiers)
  - Description
  - DDC Captions (also cross-browsable!)
  - Subject (in future: several encoding schemes for keyword and classification systems of partners)
  - Type
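The difference between the basic index and a field search can be shown with a toy matcher: a basic search looks in Title, Description and Subject together, while a field search is restricted to one element. The `matches` function, the lowercase field keys, and the sample records are assumptions for this sketch; a real server would use a proper index rather than substring tests.

```python
from typing import Dict, List, Optional

# Elements combined in the basic index, as listed on the slide.
BASIC_INDEX = ("title", "description", "subject")


def matches(record: Dict[str, str], term: str, field: Optional[str] = None) -> bool:
    """Case-insensitive substring match in one field, or in the basic index if no field is given."""
    fields = (field,) if field else BASIC_INDEX
    needle = term.lower()
    return any(needle in record.get(name, "").lower() for name in fields)


# Invented sample records.
records: List[Dict[str, str]] = [
    {"title": "Fox ecology", "description": "Field studies", "subject": "Zoology"},
    {"title": "Forest guide", "description": "Includes foxes and deer", "subject": "Forestry"},
    {"title": "Number theory", "creator": "Fox, A.", "subject": "Mathematics"},
]

basic_hits = [r["title"] for r in records if matches(r, "fox")]
title_hits = [r["title"] for r in records if matches(r, "fox", field="title")]
creator_hits = [r["title"] for r in records if matches(r, "fox", field="creator")]
```

Note that Creator is searchable as a field but is not part of the basic index, so the third record is only found through the creator search.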


Filter Options

- Type
  - DCMI Type 1 (mapping of partners' document types to Dublin Core Type 1)
  - in future also meaningful: mapping to the Sub Type List of DCMI?
  - probably no Renardus-specific type list
- Language (of resources, and languages of metadata = Language Tag)
- Country
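Filtering by Type presupposes that each partner's local document types have been mapped to the DCMI Type vocabulary. A minimal sketch of such a mapping and of the three filters follows. “Text”, “Service” and “Collection” are DCMI Type terms; the local type labels, the `TYPE_MAP` table, and the `apply_filters` function are invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical local document types mapped to DCMI Type terms.
TYPE_MAP = {
    "journal": "Text",
    "mailing list": "Service",
    "link collection": "Collection",
}


def normalise_type(local_type: str) -> Optional[str]:
    """Map a partner's local type label to a DCMI Type term, or None if unmapped."""
    return TYPE_MAP.get(local_type.strip().lower())


def apply_filters(
    records: List[Dict[str, str]],
    dc_type: Optional[str] = None,
    language: Optional[str] = None,
    country: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep records that satisfy every filter that was actually set."""
    kept = []
    for record in records:
        if dc_type and normalise_type(record.get("local_type", "")) != dc_type:
            continue
        if language and record.get("language") != language:
            continue
        if country and record.get("country") != country:
            continue
        kept.append(record)
    return kept


# Invented sample records.
records = [
    {"title": "Math Journal", "local_type": "Journal", "language": "en", "country": "DE"},
    {"title": "Agri List", "local_type": "Mailing list", "language": "en", "country": "SE"},
    {"title": "Geo Links", "local_type": "Link collection", "language": "de", "country": "DE"},
]
german_hits = apply_filters(records, country="DE")
text_hits = apply_filters(records, dc_type="Text", language="en")
```

Records whose local type has no mapping are dropped by the Type filter, which is one reason a complete type mapping per partner is needed before the filter is offered.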


Sorting

- Title (alphabetic sorting)
- in future: Type, Language, Country? (central architecture)
- Subject: Ren-DDC Classification
  - mapping relation (fully equivalent, narrower equivalent, broader equivalent, major overlap, minor overlap)
- in discussion: Subject - Keywords: sorting by subject indexing group (controlled vocabulary versus free keywords), but problematic!
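Sorting by mapping relation could work by ranking the five relationship types and using the title as a tie-breaker. In the sketch below the ranking simply follows the order in which the slide lists the relations, which is an assumption, as are the `RELATION_RANK` table, the `sort_hits` function, and the sample hits.

```python
from typing import Dict, List

# Rank follows the order given on the slide; the actual ordering used by
# the service is not specified there.
RELATION_RANK = {
    "fully equivalent": 0,
    "narrower equivalent": 1,
    "broader equivalent": 2,
    "major overlap": 3,
    "minor overlap": 4,
}


def sort_hits(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort by mapping relation first, then alphabetically by title."""
    return sorted(
        hits,
        key=lambda hit: (RELATION_RANK[hit["relation"]], hit["title"].lower()),
    )


# Invented sample hits.
hits = [
    {"title": "Zoology links", "relation": "minor overlap"},
    {"title": "fox ecology", "relation": "fully equivalent"},
    {"title": "Animal science", "relation": "fully equivalent"},
    {"title": "Biology portal", "relation": "broader equivalent"},
]
ordered = [hit["title"] for hit in sort_hits(hits)]
```

A sort like this needs all hits in one place, which fits the slide's remark that further sort keys depend on a central architecture.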


