flexElink
Winter presentation, 26 February 2002
Flexible linking (and formatting) management software
Hector Sanchez
Universitat Jaume I, Computer Engineering
CERN ETT-DH
Hector Sanchez 26 February 2002 @ CERN
Contents
Introduction
Project overview: definition, scenarios, architecture, technology
Main features
Benefits & results
Introduction
Link in the scope of flexElink: a reference to the full-text version of a bibliographic record, or to an Internet resource related to it (not necessarily a URL)
Stored vs. generated links: generated links considerably reduce maintenance; the system only needs to know when to create a link and how to build it from the bibliographic data
Link managers at CDS: SetLink, GoDirect, Dynamic Format
Project goals
New link management tool
Improvement of the formatting tool
Integration of the link management (LM) technologies already used at CDS
Be able to adapt to new situations and needs
Independent of the formatter
Work with different types of input
Cover all the formatting functions needed
Reduce maintenance: avoid hard-coded solutions that must be maintained by hand
Make it easy to use for CDS clients
Scenario 1: Brief formats
Input: a batch of records in OAI MARC XML
Output: each original XML record together with its HTML version
Diagram: Bibliographic DB (ALEPH, 'CERN MARC') → cv3t5 → OAI MARC XML → flexElink → OAI MARC XML* → cxtm → SQL → Consultation DB (MySQL)
Input record:
<oai_marc> <varfield id="041" i1="" i2=""> <subfield label="a">und</subfield> </varfield>...</oai_marc>
Output record (OAI MARC XML*, with the HTML version added in an FMT field):
<oai_marc> <varfield id="041" i1="" i2=""> <subfield label="a">und</subfield> </varfield>... <varfield id="FMT" i1="" i2=""> <subfield label="f">h</subfield> <subfield label="g">HTML</subfield> </varfield> </oai_marc>
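The enrichment step of Scenario 1 can be sketched with Python's standard ElementTree: parse one OAI MARC XML record and append an FMT varfield whose subfields f and g carry the format code and the generated HTML, as in the output record above. This is a minimal sketch, not flexElink's code; the helper name add_fmt_field and the sample HTML are hypothetical, and the format code "h" is copied from the slide without assuming what it stands for.

```python
import xml.etree.ElementTree as ET

# Input record from the slide (subset, with the closing tag spelled correctly).
RECORD = """<oai_marc>
  <varfield id="041" i1="" i2="">
    <subfield label="a">und</subfield>
  </varfield>
</oai_marc>"""


def add_fmt_field(record_xml: str, fmt_code: str, html: str) -> str:
    """Return the record with an extra FMT varfield holding a formatted version.

    Hypothetical helper: subfield f carries the format code and subfield g
    the formatted text, mirroring the slide's output record.
    """
    root = ET.fromstring(record_xml)
    fmt = ET.SubElement(root, "varfield", {"id": "FMT", "i1": "", "i2": ""})
    ET.SubElement(fmt, "subfield", {"label": "f"}).text = fmt_code
    # The HTML is stored as text, so ElementTree escapes "<" and ">" on output.
    ET.SubElement(fmt, "subfield", {"label": "g"}).text = html
    return ET.tostring(root, encoding="unicode")


enriched = add_fmt_field(RECORD, "h", "<b>Language: und</b>")
print(enriched)
```

Storing the HTML as escaped text keeps the record well-formed XML; ElementTree cannot emit CDATA sections, so escaping is the natural choice with this library.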
Scenario 2: Detailed formats
Input: a record in OAI MARC XML
Output: an HTML version to be displayed, or PHP to be saved to a file
Diagram: CDS Search → OAI MARC XML → flexElink → HTML page (with links to full text and references) or PHP file
Also shown: SetLink output; pre-generated references inclusion; Consultation DB (MySQL)
Architecture overview
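Components: Record Separator, Variable Extractor, Behavior Processor, Link Manager, Web configuration interface
Data flow: input records → Record Separator → individual record → Variable Extractor (Extraction rules) → internal variables → Behavior Processor (Behavior repository) → text output
The Behavior Processor asks the Link Manager (Link repository) to solve links
Admins maintain the extraction rules, behaviours and links through the Web configuration interface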
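The data flow above can be sketched as a small Python pipeline. The class names mirror the slide's components, but their interfaces, the hypothetical <collection> wrapper around input records, the rule formats and the example.org URL are all assumptions made for illustration; the repositories are plain dictionaries instead of a database, and the Web configuration interface is left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Internal variables: variable name -> list of values (one per occurrence).
Variables = Dict[str, List[str]]


class RecordSeparator:
    """Splits the input into individual records (hypothetical <collection> wrapper)."""

    def split(self, raw_xml: str) -> List[ET.Element]:
        return ET.fromstring(raw_xml).findall("oai_marc")


class VariableExtractor:
    """Maps record values to internal variables according to extraction rules."""

    def __init__(self, rules: Dict[Tuple[str, str], str]):
        self.rules = rules  # (varfield id, subfield label) -> variable name

    def extract(self, record: ET.Element) -> Variables:
        variables: Variables = {}
        for vf in record.findall("varfield"):
            for sf in vf.findall("subfield"):
                name = self.rules.get((vf.get("id"), sf.get("label")))
                if name is not None:
                    variables.setdefault(name, []).append(sf.text or "")
        return variables


class LinkManager:
    """Solves links from stored rules; it only sees the parameters it is given."""

    def __init__(self, rules: Dict[str, Callable[..., List[str]]]):
        self.rules = rules

    def solve(self, link_name: str, *params: str) -> List[str]:
        return self.rules[link_name](*params)


class BehaviorProcessor:
    """Builds the text output for one record from its internal variables."""

    def __init__(self, behaviour: Callable[[Variables, LinkManager], str], links: LinkManager):
        self.behaviour = behaviour
        self.links = links

    def process(self, variables: Variables) -> str:
        return self.behaviour(variables, self.links)


def run(raw_xml: str, separator: RecordSeparator, extractor: VariableExtractor,
        processor: BehaviorProcessor) -> List[str]:
    """input records -> individual records -> internal variables -> text output"""
    return [processor.process(extractor.extract(r)) for r in separator.split(raw_xml)]


# Usage with made-up data and a hypothetical link rule.
RAW = """<collection><oai_marc>
  <varfield id="037" i1="" i2=""><subfield label="a">SCAN-0009119</subfield></varfield>
  <varfield id="245" i1="" i2=""><subfield label="a">Example title</subfield></varfield>
</oai_marc></collection>"""

links = LinkManager({"FULLTEXT": lambda rid: [f"https://example.org/fulltext/{rid}.pdf"]})


def brief(v: Variables, lm: LinkManager) -> str:
    title = v.get("title", [""])[0]
    links_html = " ".join(lm.solve("FULLTEXT", v.get("id", [""])[0]))
    return f"<b>{title}</b> {links_html}"


extractor = VariableExtractor({("037", "a"): "id", ("245", "a"): "title"})
output = run(RAW, RecordSeparator(), extractor, BehaviorProcessor(brief, links))
print(output)
```

Each stage only depends on the output of the previous one, which is what allows a component (for example the extractor) to be swapped without touching the others.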
Technology
OO analysis and design
Implementation tools: 100% open source and freeware
Component-based delegation and collaboration lead to more decoupled and reusable software
Almost any part of the system can be replaced, modified or extended without affecting the rest
Main features: Internal variables
Maps the values in the input OAI MARC XML records into internal variables
This mapping can be configured using the Extraction Rules
Tells the extraction module which values to extract from the input and to which variables it has to map them
Makes the rest of the configuration independent of the input
Developed for OAI MARC XML, but it can be adapted to other input types (e.g. a database) by specialising the extraction module
Main features: Internal Variables
OAI MARC XML extraction rules example
<oai_marc>
  <varfield id="037" i1="" i2=""> <subfield label="a">SCAN-0009119</subfield> </varfield>
  <varfield id="100" i1="" i2=""> <subfield label="a">Racah, Giulio</subfield> </varfield>
  <varfield id="100" i1="" i2=""> <subfield label="a">Guignard, G</subfield> <subfield label="e">editor</subfield> </varfield>
  <varfield id="909" i1="C" i2="0"> <subfield label="b">11</subfield> </varfield>
</oai_marc>
Extraction rule:
  <varfield id="100" i1="" i2=""> → variable: author
  <subfield label="a"> → field: name
  <subfield label="e"> → field: editor
Result:
  Variable: author
  Value #0: name = Racah, Giulio
  Value #1: name = Guignard, G; editor = editor
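The extraction rule above can be sketched in Python: each matching varfield yields one value of the variable, and each mapped subfield becomes a field of that value. The abstract Extractor reflects the idea that other inputs (e.g. a database) are handled by specialising the extraction module. This is a sketch under assumptions: the ExtractionRule layout, the matching on indicators i1/i2, and the "base"/"number" rule for field 909 are hypothetical, not flexElink's actual rule format.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List

Value = Dict[str, str]              # field name -> text, e.g. {"name": "Racah, Giulio"}
Variables = Dict[str, List[Value]]  # variable name -> list of values


@dataclass
class ExtractionRule:
    variable: str                   # internal variable, e.g. "author"
    varfield_id: str                # e.g. "100"
    subfields: Dict[str, str]       # subfield label -> field name
    i1: str = ""                    # indicators must match too (assumption)
    i2: str = ""


class Extractor(ABC):
    """Input-specific extraction module; a DB input would get its own subclass."""

    @abstractmethod
    def extract(self, record, rules: List[ExtractionRule]) -> Variables: ...


class OaiMarcExtractor(Extractor):
    def extract(self, record: str, rules: List[ExtractionRule]) -> Variables:
        root = ET.fromstring(record)
        variables: Variables = {rule.variable: [] for rule in rules}
        for rule in rules:
            for vf in root.findall("varfield"):
                if (vf.get("id"), vf.get("i1"), vf.get("i2")) != (rule.varfield_id, rule.i1, rule.i2):
                    continue
                # One value per matching varfield, one field per mapped subfield.
                value = {rule.subfields[sf.get("label")]: sf.text or ""
                         for sf in vf.findall("subfield") if sf.get("label") in rule.subfields}
                if value:
                    variables[rule.variable].append(value)
        return variables


RECORD = """<oai_marc>
  <varfield id="037" i1="" i2=""><subfield label="a">SCAN-0009119</subfield></varfield>
  <varfield id="100" i1="" i2=""><subfield label="a">Racah, Giulio</subfield></varfield>
  <varfield id="100" i1="" i2=""><subfield label="a">Guignard, G</subfield><subfield label="e">editor</subfield></varfield>
  <varfield id="909" i1="C" i2="0"><subfield label="b">11</subfield></varfield>
</oai_marc>"""

rules = [
    ExtractionRule("author", "100", {"a": "name", "e": "editor"}),
    ExtractionRule("base", "909", {"b": "number"}, i1="C", i2="0"),  # hypothetical rule
]
variables = OaiMarcExtractor().extract(RECORD, rules)
for i, value in enumerate(variables["author"]):
    print(f"Value #{i}", value)
```

Because the behaviours only see the resulting variables, the same configuration keeps working if OaiMarcExtractor is replaced by another Extractor subclass.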
Main features: Behaviours
Behaviour: describes how the input has to be processed in order to achieve the desired output
Support for multiple behaviours
Behaviour structure: Condition 1 → Actions; Condition 2 → Actions; ...
Condition: an expression; its associated actions are applied only if it evaluates to TRUE for the data of the current input record
Action: a set of statements describing how the output has to be built (e.g. formats) when the corresponding condition holds
Conditions and actions are expressed using the Evaluation Language
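A behaviour can be modelled as an ordered list of (condition, action) pairs kept in a repository of named behaviours. In this sketch Python callables stand in for Evaluation Language expressions, and it assumes that the first condition evaluating to TRUE selects the actions; the slides do not state whether later matching conditions also fire. The "DEMO" behaviour and its output strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Variables = Dict[str, List[str]]
Condition = Callable[[Variables], bool]
Action = Callable[[Variables], str]


@dataclass
class Behaviour:
    name: str
    rules: List[Tuple[Condition, Action]]  # evaluated in order

    def apply(self, variables: Variables) -> Optional[str]:
        # Assumption: the first condition that is TRUE selects the actions.
        for condition, action in self.rules:
            if condition(variables):
                return action(variables)
        return None  # no condition matched this record


def first(v: Variables, name: str) -> str:
    """First value of an internal variable, or "" if it is missing."""
    return (v.get(name) or [""])[0]


# A repository holding several behaviours, selected by name.
behaviours: Dict[str, Behaviour] = {
    "DEMO": Behaviour("DEMO", [
        (lambda v: first(v, "909C0.b") == "27", lambda v: "standard: " + first(v, "245.a")),
        (lambda v: True, lambda v: "other: " + first(v, "245.a")),
    ]),
}

print(behaviours["DEMO"].apply({"909C0.b": ["27"], "245.a": ["Title A"]}))
print(behaviours["DEMO"].apply({"909C0.b": ["11"], "245.a": ["Title B"]}))
```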
Main features: Evaluation Language
Specially designed for flexElink
Context-free grammar
Extensible via User Defined Functions (UDFs): operations defined in PHP
Simple knowledge base management
Allows interaction with the Link Manager
Reusability of expressions through formats
Access to internal variables
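To illustrate how a context-free expression language can be extended with UDFs and read internal variables, here is a tiny recursive-descent evaluator in Python. The grammar it accepts (string literals, $variables and name(...) calls, concatenated) is only a hypothetical subset inspired by the examples in these slides, not the real Evaluation Language; the UDFs upper and default are made up, and a $variable is assumed to yield its first value.

```python
import re
from typing import Callable, Dict, List, Tuple

Variables = Dict[str, List[str]]
UDFS: Dict[str, Callable[..., str]] = {}


def udf(name: str):
    """Register a Python function as a UDF (flexElink's UDFs are written in PHP)."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        UDFS[name] = fn
        return fn
    return register


@udf("upper")
def upper(text: str) -> str:
    return text.upper()


@udf("default")
def default(value: str, fallback: str) -> str:
    return value if value else fallback


# Token groups: "literal", $variable, name( , ) , ","
TOKEN = re.compile(r'\s*(?:"([^"]*)"|\$([\w.]+)|(\w+)\(|(\))|(,))')


def tokenize(expr: str) -> List[Tuple]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {expr[pos:]!r}")
        tokens.append(m.groups())
        pos = m.end()
    return tokens


def evaluate(expr: str, variables: Variables) -> str:
    """expr := term* ; term := literal | $var | name "(" [expr ("," expr)*] ")" """
    tokens = tokenize(expr)

    def sequence(i: int, nested: bool) -> Tuple[str, int]:
        out = []
        while i < len(tokens):
            lit, var, func, rparen, comma = tokens[i]
            if rparen or comma:
                if not nested:
                    raise SyntaxError("unexpected ')' or ','")
                break  # end of a function argument
            if lit is not None:
                out.append(lit)
                i += 1
            elif var is not None:
                out.append((variables.get(var) or [""])[0])  # first value only
                i += 1
            else:
                args, i = arguments(i + 1)
                out.append(UDFS[func](*args))
        return "".join(out), i

    def arguments(i: int) -> Tuple[List[str], int]:
        args: List[str] = []
        if i < len(tokens) and tokens[i][3]:
            return args, i + 1  # empty argument list
        while True:
            arg, i = sequence(i, nested=True)
            args.append(arg)
            if i >= len(tokens):
                raise SyntaxError("missing ')'")
            if tokens[i][3]:
                return args, i + 1
            i += 1  # skip ','

    result, _ = sequence(0, nested=False)
    return result


print(evaluate('"<b>" upper($245.a) "</b>"', {"245.a": ["Racah"]}))
```

New operations are added by registering a function under a name, without touching the parser, which is the kind of extensibility the UDF mechanism provides.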
Main features: Behaviours
Simple behaviour example
Behaviour: SIMPLE
Condition: $909C0.b="27"
Action: "<b>" $245.a "</b>" forall($0248.a){ rep_prefix(" – ") $0248.a separator("; ") }
Condition: ""=""
Action: "<b>" $245.a "</b>" forall($100.a){ rep_prefix("– Authors: ") $100.a separator("; ") }
Legend: UDFs; internal variables ($100.a = author name, $245.a = title, $0248.a = standard reference, $909C0.b = base number)
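The SIMPLE behaviour can be transcribed into Python to make its intent explicit. The semantics of forall, rep_prefix and separator are inferred from their names and are assumptions: the prefix is emitted once when the variable has at least one value, the separator goes between consecutive values, and an empty variable produces nothing. The conditions are assumed to be tried in order, so ""="" acts as the default branch. The prefixes are copied from the slide; the sample record is made up.

```python
from typing import Callable, Dict, List

Variables = Dict[str, List[str]]


def forall(values: List[str], body: Callable[[str], str],
           rep_prefix: str = "", separator: str = "") -> str:
    """Assumed semantics: prefix once if there is at least one value,
    separator between consecutive values, nothing for an empty list."""
    if not values:
        return ""
    return rep_prefix + separator.join(body(value) for value in values)


def first(v: Variables, name: str) -> str:
    return (v.get(name) or [""])[0]


def simple(v: Variables) -> str:
    """Behaviour SIMPLE: base 27 lists standard references, otherwise authors."""
    title = "<b>" + first(v, "245.a") + "</b>"
    if first(v, "909C0.b") == "27":
        return title + forall(v.get("0248.a", []), lambda ref: ref,
                              rep_prefix=" – ", separator="; ")
    # ""="" is always TRUE, so it acts as the default branch.
    return title + forall(v.get("100.a", []), lambda name: name,
                          rep_prefix="– Authors: ", separator="; ")


record = {"245.a": ["Example title"], "909C0.b": ["11"],
          "100.a": ["Racah, Giulio", "Guignard, G"]}
print(simple(record))
```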
Main features: Link Manager
Generates links from stored rules
These rules are also expressed using the Evaluation Language
Supports different types of link solving
External linking: just generates the link from the rules
Internal linking: the link is always a file; the LM checks its existence, access rights, available formats, etc.
Can be extended: the LM is just a framework to which new linking logic can be added
Independent of the formatter: it has no access to internal variables and receives its data as parameters
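The two kinds of link solving, and the idea of the LM as an extensible framework, can be sketched with one solver class per kind behind a common interface. Python format strings stand in for rules written in the Evaluation Language; the directory layout, URL templates and format list are hypothetical, while the parameter names base, categ and id follow the link call in the next example. The manager only receives parameters, never internal variables.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List

Link = Dict[str, str]  # {"url": ..., "format_id": ...}


class LinkSolver(ABC):
    """One kind of link solving; new kinds are added by subclassing."""

    @abstractmethod
    def solve(self, params: Dict[str, str]) -> List[Link]: ...


class ExternalLink(LinkSolver):
    """External linking: just generate the link from the rule."""

    def __init__(self, url_template: str):
        self.url_template = url_template

    def solve(self, params: Dict[str, str]) -> List[Link]:
        return [{"url": self.url_template.format(**params), "format_id": "url"}]


class InternalLink(LinkSolver):
    """Internal linking: the link is a file, so check existence, access and formats."""

    def __init__(self, root: str, path_template: str, url_template: str, formats: List[str]):
        self.root, self.path_template = root, path_template
        self.url_template, self.formats = url_template, formats

    def solve(self, params: Dict[str, str]) -> List[Link]:
        links = []
        for fmt in self.formats:
            rel = self.path_template.format(format=fmt, **params)
            path = os.path.join(self.root, rel)
            if os.path.isfile(path) and os.access(path, os.R_OK):
                links.append({"url": self.url_template.format(path=rel), "format_id": fmt})
        return links


class LinkManager:
    """Framework holding named link definitions; it never sees internal variables."""

    def __init__(self) -> None:
        self.definitions: Dict[str, LinkSolver] = {}

    def define(self, name: str, solver: LinkSolver) -> None:
        self.definitions[name] = solver

    def link(self, name: str, **params: str) -> List[Link]:
        return self.definitions[name].solve(params)


# Usage: only the PDF exists, so only that format is returned.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "27", "ART"))
    open(os.path.join(root, "27", "ART", "123.pdf"), "w").close()
    lm = LinkManager()
    lm.define("FULLTEXT", InternalLink(root, "{base}/{categ}/{id}.{format}",
                                       "https://example.org/files/{path}", ["pdf", "ps.gz"]))
    fulltext_links = lm.link("FULLTEXT", base="27", categ="ART", id="123")
print(fulltext_links)
```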
Main features: Link Manager
Example: simple link definition and access from the Evaluation Language
Use case: generation of records with already-solved full-text links
"<b>" $245.a "</b><br>"
link("FULLTEXT", $base, $categ, $id) {
  "<b>Fulltext access:</b>"
  forall($link){ "<a href=\"" $link "\">[" $link.format_id "]</a>" }
} else {
  "No link found"
}
Slide annotations: FULLTEXT link definition; Link Manager call
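On the caller side, the snippet above reads as: emit the title, ask the Link Manager for FULLTEXT links, then either loop over the links or fall back to the else branch. A Python rendering of that control flow follows; solve is a hypothetical stand-in for the link("FULLTEXT", ...) call, fake_solve returns made-up data, and the HTML escaping is an addition not shown in the slide.

```python
from html import escape
from typing import Callable, Dict, List

Link = Dict[str, str]  # {"url": ..., "format_id": ...}


def render_fulltext(title: str, solve: Callable[..., List[Link]],
                    base: str, categ: str, id_: str) -> str:
    """Python rendering of the EL snippet; solve stands in for the LM call."""
    out = "<b>" + escape(title) + "</b><br>"
    links = solve("FULLTEXT", base, categ, id_)
    if links:  # link(...) { ... }
        out += "<b>Fulltext access:</b>"
        for link in links:  # forall($link) { ... }
            out += f'<a href="{escape(link["url"])}">[{escape(link["format_id"])}]</a>'
    else:  # else { "No link found" }
        out += "No link found"
    return out


def fake_solve(name: str, base: str, categ: str, id_: str) -> List[Link]:
    """Hypothetical link solver: only record 123 has a full text."""
    if id_ == "123":
        return [{"url": f"https://example.org/{base}/{categ}/{id_}.pdf", "format_id": "pdf"}]
    return []


print(render_fulltext("Example title", fake_solve, "27", "ART", "123"))
print(render_fulltext("Example title", fake_solve, "27", "ART", "999"))
```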
Benefits
More modular and specialised CDS Search
The OO approach eases maintenance and allows future extensions
Only one way of configuring formats and links
All the configuration is kept in a DB, separated from the logic
Possible to generate different configuration views
The Search Engine knows nothing about linking or formatting
Diagram: users send queries to the Search Engine; the query results go to flexElink, which produces the formats and links from the format/link configuration
Results
It is already being used successfully for:
Pre-generated CDS Search BRIEF formats
On-the-fly creation of CDS Search DETAILED formats
HTML pages of the references extracted from full texts
Speed optimisation (test over 15,000 records)
BRIEF format creation (average): 0.05 s/record
DETAILED format creation (average): 0.15 s/record
Testing for the future replacement of GoDirect and SetLink
GoDirect: 91% of journals migrated 'automatically'
SetLink: ready for defining new full-text rules