Bielefeld Academic Search Engine
a Scientific Search Service for Institutional Repositories
Open Scholarship 2006: New Challenges for Open Access Repositories
Univ. of Glasgow, 18-20 October 2006
Friedrich Summann
Bielefeld University Library
Overview:
• BASE: concept and content
• Overview BASE user-interface and further visions
• BASE dataflow
• OAI harvesting challenges
• BASE interfaces
• Demo
BASE: concept and content
• BASE uses Fast Data Search
• BASE uses a Linux-based multi-node system
• BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content
• BASE displays result lists as bibliographic data and full-text hits
• The BASE frontend is written in PHP using the search API from Fast Data Search
• BASE offers sorting, search refinement and search history
http://www.base-search.net
BASE: concept and content
[Architecture diagram. Labels: Search API; query and result processing pipeline; document processing pipeline; file traverser; filter; search; index files; connectors; web crawler; tuning, administration and debugging]
BASE: concept and content
At present 3.8 million documents in 274 collections, 15 of them web-crawled data
BASE: concept and content
Projekt Gutenberg-DE
Internet Library of Early Journals Oxford
Various Institutional Repositories
Springer Link Metadata
Cornell HistMath Fulltext Crawl
University of Michigan Historical Math
CiteSeer
Zentralblatt Mathematik
Bielefeld Univ: Math. Preprints
ArXiv
OPAC UL Bielefeld
Ifo Institute Munich
PubMed
Journals of Enlightenment (Digital Collection of Bielefeld UL)
Special view on IR server collections
Collections are listed in a configuration file:
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
Parametric search is possible
The frontend is ready for multiple views (independent views with their own configuration and layouts on the same backend)
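The collection list and its clustering could be read roughly as follows. This is a minimal sketch, assuming the configuration file is INI-compatible; the `CLUSTERS` table, the function names, the reading of `descdd_*` as a short display label, and the membership of `ftubirmingham` in the European cluster are all assumptions made for illustration, not the BASE frontend code (which is written in PHP).

```python
import configparser

# Entry copied from the slide; the file layout is assumed to be INI-compatible.
CONFIG_TEXT = """
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
"""

# Hypothetical cluster table: cluster name -> collection keys.
# Listing ftubirmingham here is an assumption made for this example.
CLUSTERS = {
    "Institutional Repositories Europe": [
        "ftubarcelona", "ftubath", "ftubristol", "ftuhelsinki", "ftubirmingham",
    ],
}


def load_collections(text):
    """Parse the collection list and strip the surrounding double quotes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {key: value.strip('"') for key, value in parser[section].items()}
        for section in parser.sections()
    }


def short_labels(collections, cluster, lang="en"):
    """Return the short labels of all configured collections in a cluster."""
    field = "descdd_" + lang
    return [
        collections[key][field]
        for key in CLUSTERS[cluster]
        if key in collections  # skip cluster members missing from the config
    ]


collections = load_collections(CONFIG_TEXT)
print(collections["ftubirmingham"]["url"])
print(short_labels(collections, "Institutional Repositories Europe"))
```

Keeping the cluster table separate from the per-collection entries means a collection can appear in several views without being configured twice.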
BASE: end-user interface (1)
Displays search results as bibliographic data and full-text hits
BASE: end-user interface (2)
The result list (left-hand side)
If the document contains metadata (e.g. title, author, abstract), the displayed description is highlighted
BASE: end-user interface (3)
The result list (right-hand side)
• Various options to sort the result set
• Search refinement by author, keyword, document type, language etc.
• Search history comprises up to 10 queries
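A search history capped at 10 queries can be modelled with a bounded queue. The sketch below is only an illustration: the class name and the choice to move a repeated query to the newest position are assumptions, and the slides do not say how BASE stores the history.

```python
from collections import deque

MAX_HISTORY = 10  # the slide states that the history comprises up to 10 queries


class SearchHistory:
    """Keep the most recent queries, dropping the oldest one when full."""

    def __init__(self, limit=MAX_HISTORY):
        self._queries = deque(maxlen=limit)

    def add(self, query):
        # Assumption: repeating a query moves it to the newest position
        # instead of storing it twice.
        if query in self._queries:
            self._queries.remove(query)
        self._queries.append(query)

    def recent(self):
        """Return the stored queries from newest to oldest."""
        return list(reversed(self._queries))


history = SearchHistory()
for n in range(12):
    history.add(f"query {n}")
print(len(history.recent()))
print(history.recent()[0], "...", history.recent()[-1])
```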
BASE: end-user interface (4)
Search Refinement
Select an author ...
... only documents by this author are displayed
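Refinement by author amounts to counting the authors in the current result set and then filtering on the selected one. The following sketch shows that idea on a handful of made-up records; the record fields and function names are hypothetical, and in BASE itself this work is done by the search engine rather than in application code like this.

```python
from collections import Counter

# Hypothetical result records; the field names are assumptions, not the BASE schema.
RESULTS = [
    {"title": "Paper A", "authors": ["Meier, A."], "language": "en"},
    {"title": "Paper B", "authors": ["Meier, A.", "Schulz, B."], "language": "de"},
    {"title": "Paper C", "authors": ["Schulz, B."], "language": "en"},
]


def author_facet(results):
    """Count how many hits each author has, most frequent first."""
    counts = Counter(author for rec in results for author in rec["authors"])
    return counts.most_common()


def refine_by_author(results, author):
    """Keep only the hits that list the selected author."""
    return [rec for rec in results if author in rec["authors"]]


print(author_facet(RESULTS))
print([rec["title"] for rec in refine_by_author(RESULTS, "Meier, A.")])
```

The same pattern would apply to the other refinement fields named on the previous slide (keyword, document type, language).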
Google Scholar integration
Check citations (citing articles) in Google Scholar ...
Vision: DDC Browsing
BASE dataflow
[Dataflow diagram. Sources: OAI data, web pages, database records. Stages: harvesting, pre-processing, processing, internal index (FAST), user interface (PHP)]
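The stages named in the dataflow can be pictured as a chain of small steps feeding an index. The sketch below is a toy model under stated assumptions: the stage order follows the order of the diagram labels, the sample records are invented, and a plain in-memory inverted index stands in for the FAST index, whose API is not shown in the slides.

```python
import re
from collections import defaultdict


def harvest():
    """Stand-in for harvesting: yield raw records from the three source types."""
    yield {"source": "oai", "id": "rec-1", "title": "  Open Access  Repositories "}
    yield {"source": "web", "id": "rec-2", "title": "Search Engines for Libraries"}
    yield {"source": "database", "id": "rec-3", "title": "open access in libraries"}


def preprocess(records):
    """Normalise whitespace so that all sources share one record shape."""
    for rec in records:
        yield {**rec, "title": " ".join(rec["title"].split())}


def process(records):
    """Derive the searchable tokens for each record."""
    for rec in records:
        yield {**rec, "tokens": re.findall(r"\w+", rec["title"].lower())}


def build_index(records):
    """Build a toy inverted index: token -> set of record ids."""
    index = defaultdict(set)
    for rec in records:
        for token in rec["tokens"]:
            index[token].add(rec["id"])
    return index


index = build_index(process(preprocess(harvest())))
print(sorted(index["open"]))
print(sorted(index["libraries"]))
```

Because each stage is a generator, records stream through the chain one at a time, which matters once a collection holds millions of documents.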
OAI-compliant university repositories in BASE
[Map: number of repositories per country; the remaining map labels cannot be matched to countries in this transcript]
USA 82, Canada 14, South America 2, Africa 3, India 5, Australia 11, New Zealand 1
OAI harvesting challenges
Repositories do not respond or deliver error messages
Data contain only references without any full text
Links to the document are not included or do not work
Access to full text is often restricted
XML file is not well-formed
Field content varies
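Several of these problems can be detected automatically when a response arrives. The sketch below checks an OAI-PMH response for three of them: XML that is not well-formed, an `error` element returned by the repository, and records without any link to the document. The function name and the sample data are hypothetical, and treating `dc:identifier` values that start with `http` as document links is an assumption about how repositories fill that field.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def check_response(xml_text):
    """Return a list of problem descriptions for one OAI-PMH response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    # OAI-PMH reports protocol errors as <error code="..."> elements.
    for error in root.iter(OAI + "error"):
        problems.append(f"repository error: {error.get('code')}")

    for record in root.iter(OAI + "record"):
        ident = record.findtext(f"{OAI}header/{OAI}identifier", default="?")
        # Assumption: a usable document link is a dc:identifier holding a URL.
        links = [
            el.text for el in record.iter(DC + "identifier")
            if el.text and el.text.startswith(("http://", "https://"))
        ]
        if not links:
            problems.append(f"{ident}: no link to the document")
    return problems


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Record with a link</dc:title>
          <dc:identifier>http://example.org/doc/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Reference only</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(check_response(SAMPLE))
print(check_response("<OAI-PMH><record></OAI-PMH>"))
```

Checking whether a link actually works, or whether access to the full text is restricted, would need a network request per document and is left out of this sketch.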
Some Rules from the Harvesting Practice
Standard repository software is great - for OAI harvesting as well
Small collections - small problems
Getting the related full text is complicated
Libraries produce better metadata
Data aggregation may produce problems
Writing e-mails helps - sometimes
BASE interfaces
Search form
HTTP calls
Web Service
Local integration (via search form)
E-Repository Integration
<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
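The same request that the form submits can also be assembled in code. The sketch below builds, but deliberately does not send, a POST request with the form's fields `q` and `s=all`; the function name is hypothetical, and whether the endpoint still accepts these fields is not confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_SEARCH_URL = "http://www.base-search.net/index.php"  # form action from the slide


def build_search_request(query):
    """Build (but do not send) the POST request the search form would submit."""
    # The form limits q to 512 characters and sends the hidden field s=all.
    fields = {"q": query[:512], "s": "all"}
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(
        BASE_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


request = build_search_request("open access repositories")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would perform the actual search; that step is omitted so the example runs without network access.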
Prototype: Search Based on SOAP interface (EU project DRIVER)
Thank you!