flexElink Winter presentation 26 February 2002 Flexible linking (and formatting) management software Hector Sanchez Universitat Jaume I Ing. Informatica CERN ETT-DH



flexElink

Winter presentation, 26 February 2002

Flexible linking (and formatting) management software

Hector Sanchez

Universitat Jaume I, Ing. Informatica

CERN ETT-DH


Hector Sanchez 26 February 2002 @ CERN

Contents

Introduction

Project overview: definition, scenarios, architecture, technology

Main features

Benefits & results


Introduction

Link in the scope of FlexElink

Reference to the fulltext version or an Internet resource related to a certain bibliographic record (not necessarily a URL)

Stored vs. generated links

Generated links considerably reduce maintenance

Link managers

Know when to create a link and build it from bibliographic data

Link managers @ CDS: SetLink, GoDirect, Dynamic Format


Project goals

New link management tool

Improvement of the formatting tool

Integration of already existing LM technologies used at CDS

Be able to adapt to new situations and needs

Independent of the formatter

Work over different types of inputs

Cover all possible formatting functions needed

Reduce maintenance

Avoid ‘hardcode’ maintenance

Make it easy to use for CDS clients


Scenario 1: Brief formats

Output: Original XML record with its HTML version

Input: Bunch of records in OAI MARC XML

[Diagram: components cv3t5, flexElink and cxtm; data formats ‘CERN MARC’, OAI MARC XML, OAI MARC XML* and SQL; Bibliographic DB (ALEPH) and Consultation DB (MySQL)]

Input record (OAI MARC XML):

<oai_marc>
  <varfield id="041" i1="" i2="">
    <subfield label="a">und</subfield>
  </varfield>
  ...
</oai_marc>

Output record (OAI MARC XML*, with the HTML version added in the FMT field):

<oai_marc>
  <varfield id="041" i1="" i2="">
    <subfield label="a">und</subfield>
  </varfield>
  ...
  <varfield id="FMT" i1="" i2="">
    <subfield label="f">h</subfield>
    <subfield label="g">HTML</subfield>
  </varfield>
</oai_marc>
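The brief-format scenario amounts to appending a formatted version to each record. The sketch below is a minimal Python illustration and not flexElink's own code. It assumes, following the example record, that the formatted text goes into a varfield with id "FMT", with the format code in subfield "f" and the text in subfield "g". The function name add_format and the sample title are hypothetical.

```python
import xml.etree.ElementTree as ET

def add_format(record_xml, format_code, formatted_text):
    """Return the record with an extra FMT varfield holding its formatted version."""
    record = ET.fromstring(record_xml)
    fmt = ET.SubElement(record, "varfield", {"id": "FMT", "i1": "", "i2": ""})
    ET.SubElement(fmt, "subfield", {"label": "f"}).text = format_code
    ET.SubElement(fmt, "subfield", {"label": "g"}).text = formatted_text
    # Serialising escapes the markup inside the "g" subfield (e.g. < becomes &lt;).
    return ET.tostring(record, encoding="unicode")

record = (
    '<oai_marc><varfield id="041" i1="" i2="">'
    '<subfield label="a">und</subfield></varfield></oai_marc>'
)
enriched = add_format(record, "h", "<b>Some title</b>")
```

Keeping the original fields untouched and only adding one field means the enriched record can still be read by any consumer of plain OAI MARC XML.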


Scenario 2: Detailed formats

Output: HTML version to be displayed or PHP to be saved to a file

Input: record in OAI MARC XML

[Diagram: CDS search, flexElink and Consultation DB (MySQL); labels: OAI MARC XML, HTML page, links to fulltext & references, PHP file, setlink output, pre-generated references inclusion]


Architecture overview

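[Diagram: input records -> RecordSeparator -> individual record -> VariableExtractor -> internal variables -> BehaviorProcessor -> text output. The BehaviorProcessor uses the LinkManager to solve links. Configuration stores: Extraction rules, Behavior repository, Link repository. Admins maintain them through the Web configuration interface.]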
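The chain of components named in the architecture can be pictured as a small pipeline in which every stage is replaceable. The Python sketch below is only an illustration under assumptions: the function run_pipeline and the three toy stand-ins are hypothetical names, and the one-record-per-line input is chosen purely to keep the demo short.

```python
def run_pipeline(batch, separate, extract, process):
    """Chain the stages; each argument is any callable with the right shape."""
    outputs = []
    for record in separate(batch):          # RecordSeparator: batch -> individual records
        variables = extract(record)         # VariableExtractor: record -> internal variables
        outputs.append(process(variables))  # BehaviorProcessor: variables -> text output
    return outputs

# Toy stand-ins: one record per line (an assumption made only for this demo).
def split_lines(batch):
    return [line for line in batch.splitlines() if line.strip()]

def title_only(record):
    return {"title": [record.strip()]}

def bold_title(variables):
    return "<b>" + variables["title"][0] + "</b>"

html = run_pipeline("First record\nSecond record\n", split_lines, title_only, bold_title)
```

Because the stages only meet through their inputs and outputs, swapping bold_title for another formatter leaves the separator and extractor unchanged.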


Technology

OO analysis and design

Component-based delegation & collaboration lead to more decoupled and reusable software

Almost any part of the system can be substituted, modified or extended without affecting the rest

Implementation tools

100% open source & freeware


Main features: Internal variables

Maps the values in the input OAI MARC XML records into internal variables

This mapping can be configured using the Extraction Rules

The rules tell the extraction module which values to extract from the input and which variables to map them to

Makes the rest of the configuration independent of the input

Developed for OAI MARC XML but it can be adapted to other input types (DB) by specialising the extraction module


Main features: Internal Variables

OAI MARC XML extraction rules example

Input record:

<oai_marc>
  <varfield id="037" i1="" i2="">
    <subfield label="a">SCAN-0009119</subfield>
  </varfield>
  <varfield id="100" i1="" i2="">
    <subfield label="a">Racah, Giulio</subfield>
  </varfield>
  <varfield id="100" i1="" i2="">
    <subfield label="a">Guignard, G</subfield>
    <subfield label="e">editor</subfield>
  </varfield>
  <varfield id="909" i1="C" i2="0">
    <subfield label="b">11</subfield>
  </varfield>
</oai_marc>

Extraction rules:

<varfield id="100" i1="" i2=""> -> variable: author

<subfield label="a"> -> field: name

<subfield label="e"> -> field: editor

Resulting internal variable:

Variable: author

Value #0 field: name = Racah, Giulio

Value #1 field: name = Guignard, G; field: editor = editor
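The mapping in this example can be reproduced with a few lines of Python. This is a sketch under assumptions, not the flexElink extraction module: the rule format (a dict from varfield id to a variable name plus a subfield-to-field map) and the function name extract_variables are hypothetical, and the indicators i1/i2 are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule format: varfield id -> (variable name, {subfield label: field name})
EXTRACTION_RULES = {
    "100": ("author", {"a": "name", "e": "editor"}),
}

def extract_variables(record_xml, rules):
    """Map the varfields of one OAI MARC XML record to internal variables."""
    variables = {}
    for varfield in ET.fromstring(record_xml).findall("varfield"):
        rule = rules.get(varfield.get("id"))
        if rule is None:
            continue  # no rule for this varfield: it is not exposed as a variable
        name, field_map = rule
        value = {}
        for subfield in varfield.findall("subfield"):
            field = field_map.get(subfield.get("label"))
            if field is not None:
                value[field] = subfield.text
        # A repeated varfield yields one value per occurrence.
        variables.setdefault(name, []).append(value)
    return variables

RECORD = """<oai_marc>
  <varfield id="037" i1="" i2=""><subfield label="a">SCAN-0009119</subfield></varfield>
  <varfield id="100" i1="" i2=""><subfield label="a">Racah, Giulio</subfield></varfield>
  <varfield id="100" i1="" i2=""><subfield label="a">Guignard, G</subfield>
    <subfield label="e">editor</subfield></varfield>
  <varfield id="909" i1="C" i2="0"><subfield label="b">11</subfield></varfield>
</oai_marc>"""

variables = extract_variables(RECORD, EXTRACTION_RULES)
```

Only the rules mention MARC tags; everything downstream works with names such as author, which is what keeps the rest of the configuration independent of the input format.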


Main features: Behaviours

Behaviour: Describes how the input has to be processed in order to achieve the desired output

Support for multiple behaviours

[Diagram: a Behaviour contains Condition 1 with its Actions and Condition 2 with its Actions]

Condition: Expression whose associated actions are applied only if it is TRUE for the current input record data

Action: Set of statements that describes how the output has to be built (e.g. formats) if the corresponding condition is met

Conditions and actions are expressed using the Evaluation Language
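A behaviour can be pictured as an ordered list of condition/action pairs evaluated against the internal variables of one record. The Python sketch below is an illustration only. It assumes that the first condition that holds decides the output, which the slides do not state explicitly, and the names apply_behaviour, base and title are hypothetical.

```python
def apply_behaviour(behaviour, variables):
    """Return the output of the first action whose condition holds for this record."""
    for condition, action in behaviour:
        if condition(variables):
            return action(variables)
    return ""  # no condition matched: nothing is produced

# Two condition/action pairs; the second condition is always true and acts as a default.
EXAMPLE_BEHAVIOUR = [
    (lambda v: v.get("base") == "27", lambda v: "standard: " + v["title"]),
    (lambda v: True, lambda v: "default: " + v["title"]),
]

standard_output = apply_behaviour(EXAMPLE_BEHAVIOUR, {"base": "27", "title": "T"})
default_output = apply_behaviour(EXAMPLE_BEHAVIOUR, {"base": "11", "title": "T"})
```

In flexElink itself the conditions and actions are Evaluation Language expressions stored in the behaviour repository; plain Python callables stand in for them here.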


Main features: Evaluation Language

Specially designed for FlexElink

Context-free grammar

Extensible via User Defined Functions (UDFs)

Operations that are defined in PHP

Simple Knowledge Base management

Allows interaction with the Link manager

Re-usability of expressions through Formats

Enables access to internal variables


Main features: Behaviours

Simple behaviour example

Behaviour: SIMPLE

Condition 1: $909C0.b="27"

Action 1: "<b>" $245.a "</b>" forall($0248.a){ rep_prefix(" – ") $0248.a separator("; ") }

Condition 2: ""=""

Action 2: "<b>" $245.a "</b>" forall($100.a){ rep_prefix("– Authors: ") $100.a separator("; ") }

UDFs: rep_prefix, separator

Internal Variables:

100.a author name

245.a title

0248.a standard ref

909C0.b base #
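To make the SIMPLE behaviour concrete, here is a Python sketch of what its two actions might produce. The semantics are assumptions, since the slides do not define them: forall is taken to iterate over the repeated values of a variable, rep_prefix to emit its text once before the first value, and separator to emit its text between values. The helper names forall and format_simple are hypothetical.

```python
def forall(values, prefix="", separator=""):
    """Join repeated values; the prefix appears once, and only if there are values."""
    if not values:
        return ""
    return prefix + separator.join(values)

def format_simple(variables):
    """Mimic the SIMPLE behaviour for one record given as a dict of internal variables."""
    title = "<b>" + variables["245.a"] + "</b>"
    if variables.get("909C0.b") == "27":
        # Condition 1: records in base 27 list their standard references.
        return title + forall(variables.get("0248.a", []), " – ", "; ")
    # Condition 2 ("" = "") always holds, so every other record lists its authors.
    return title + forall(variables.get("100.a", []), "– Authors: ", "; ")

brief = format_simple({
    "245.a": "Some title",
    "909C0.b": "11",
    "100.a": ["Racah, Giulio", "Guignard, G"],
})
```

Under these assumptions a record without authors yields only the bold title, because the prefix is suppressed when there is nothing to list.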


Main features: Link Manager

Generates links from stored rules

These rules are also expressed using the Evaluation Language

Supports different types of link solving

External linking: Just generate the link from the rules

Internal linking: The link is always a file; it checks the existence, access, formats, etc.

Can be extended: The LM is just a framework to which new linking logic can be added

Independent of the formatter

It has no access to Internal Variables; it receives data as parameters
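The two kinds of link solving, and the idea of the Link Manager as a framework for new linking logic, can be sketched in Python as a registry of solver functions. Everything named here is hypothetical: the LinkManager class and its methods, the example.org URL pattern, and the root/base/category/id directory layout. The solvers receive their data as plain parameters, mirroring the point that the Link Manager has no access to internal variables.

```python
import os
import tempfile

class LinkManager:
    """Registry of link solvers keyed by link type (hypothetical API)."""
    def __init__(self):
        self._solvers = {}

    def register(self, link_type, solver):
        self._solvers[link_type] = solver

    def solve(self, link_type, *params):
        solver = self._solvers.get(link_type)
        return solver(*params) if solver else []

def external_journal(journal, volume, page):
    """External linking: build the URL from the rule without checking it."""
    return ["https://example.org/%s/%s/%s" % (journal, volume, page)]

def internal_fulltext(root):
    """Internal linking: return only files that exist and are readable."""
    def solve(base, categ, rec_id):
        directory = os.path.join(root, base, categ, rec_id)
        if not os.path.isdir(directory):
            return []
        paths = (os.path.join(directory, name) for name in sorted(os.listdir(directory)))
        return [p for p in paths if os.path.isfile(p) and os.access(p, os.R_OK)]
    return solve

lm = LinkManager()
lm.register("JOURNAL", external_journal)

with tempfile.TemporaryDirectory() as root:
    record_dir = os.path.join(root, "11", "SCAN", "0009119")
    os.makedirs(record_dir)
    open(os.path.join(record_dir, "0009119.pdf"), "w").close()
    lm.register("FULLTEXT", internal_fulltext(root))
    found = [os.path.basename(p) for p in lm.solve("FULLTEXT", "11", "SCAN", "0009119")]
    missing = lm.solve("FULLTEXT", "11", "SCAN", "no-such-record")
```

Adding a new kind of linking then means registering one more solver, without touching the formatter.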


Main features: Link Manager

Example: simple link definition and access from EL

Generation of records with already solved fulltext links

FULLTEXT link definition

Link manager call:

"<b>" $245.a "</b><br>"
link("FULLTEXT", $base, $categ, $id) {
  "<b>Fulltext access:</b>"
  forall($link){
    "<a href=\"" $link "\">[" $link.format_id "]</a>"
  }
}
else{
  "No link found"
}


Benefits

More modular and specialised CDS Search

The OO approach eases the maintenance and allows future extensibility

Only one way of configuring formats and links

All the configuration is kept in a DB, separate from the logic

Possible to generate different configuration views

Search Engine doesn’t know anything about linking or formatting

[Diagram: users, Search Engine and flexElink; labels: query, results, formats, links, format/link config]


Results

It’s already being successfully used for

Pre-generated CDS Search BRIEF formats

On-the-fly creation of CDS Search DETAILED formats

HTML pages of the fulltext extracted references

Speed optimisation (test over 15’000 records)

BRIEF format creation (average): 0.05 sec/record

DETAILED format creation (average): 0.15 sec/record

Testing for future replacement of GoDirect and SetLink

GoDirect: ‘automatically’ migrated 91% of journals

SetLink: ready for defining new fulltext rules