regal - a repository for electronic documents and bibliographic data
graphthinking
a Repository for Electronic Documents and Bibliographic Data
Felix Ostrowski (graphthinking, @literarymachine)
Jan Schnasse (hbz, @InspektorHicks)
ELAG, June 11th 2014, University of Bath
Rationale: A new foundation for Edoweb
● A system to gather, describe and archive deposit copies of electronic publications and websites on behalf of the State Library Center of Rhineland-Palatinate (LBZ)
● Operated by the North Rhine-Westphalian Library Service Center (hbz) since 2002
● Technical evolution: OPUS – Digitool – regal
The current system and its shortcomings: Digitool
● Digitool end-of-life is coming
● Unwanted/unexpected dependencies on other projects hosted on the same Digitool instance
● Performance issues (we have millions of objects in Digitool)
● No easily configurable search indexes or OAI-PMH interfaces for single collections
● No out-of-the-box support for regional requirements (e.g. metadata delivery to the German National Library); extra money/developer hours needed
The current system and its shortcomings: Homemade
● Mix of self-developed and Ex Libris components
● Vicious circle
– introduction of workarounds
– unpredictable migration costs
– decision to stay on obsolete version
– running out of support
– introduction of workarounds
● Administrative responsibilities in different hbz working groups
Altogether, this leads to an expensive, hard-to-maintain and outdated system that doesn't satisfy our needs or those of our clients.
The following aspects are mandatory to achieve our goals
● Increase the overall performance
● Provide an up-to-date, modern user interface
● Use open source software (Fedora, Elasticsearch, Drupal)
● Seamlessly import (meta-)data from Digitool and potentially other (repository) systems
● Integrate the system with the emerging Linked-Open-Data ecosystem, especially authority data
● Loosen the tight integration with Ex Libris Aleph
● Expose (meta-)data for easy discovery & re-use by others
Overview of the new architecture
[Architecture diagram: regal-drupal (frontend), regal (backend), Fedora, Elasticsearch, Ex Libris Aleph, lobid API]
Data model
● Simple hierarchical data model consisting of nodes associated via hasPart and partOf relations
● Each node is identified by a namespace combined with a Universally Unique Identifier (UUID)
● Each node can have a bitstream and a metadata stream
● Metadata canonically stored as RDF N-Triples
● Bitstream can contain arbitrary data
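The data model above can be sketched in a few lines of Python. This is only an illustration, not regal's actual implementation: the Node class, the "edoweb" namespace string, the base URI http://example.org/resource/ and the two predicate URIs are assumptions made for the example.

```python
import uuid

# Hypothetical base and vocabulary URIs, chosen only for this sketch.
BASE = "http://example.org/resource/"
HAS_PART = "http://example.org/vocab/hasPart"
PART_OF = "http://example.org/vocab/partOf"


class Node:
    """One node of the hierarchy, identified by namespace + UUID."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.pid = f"{namespace}:{uuid.uuid4()}"
        self.parent = None
        self.children = []

    @property
    def uri(self):
        return BASE + self.pid

    def add_part(self, child):
        # Record the relation in both directions (hasPart / partOf).
        child.parent = self
        self.children.append(child)

    def to_ntriples(self):
        # One N-Triples statement per line: <subject> <predicate> <object> .
        lines = [
            f"<{self.uri}> <{HAS_PART}> <{child.uri}> ."
            for child in self.children
        ]
        if self.parent is not None:
            lines.append(f"<{self.uri}> <{PART_OF}> <{self.parent.uri}> .")
        return "\n".join(lines)
```

For example, creating a journal node and an issue node and calling journal.add_part(issue) yields one hasPart statement on the journal and one partOf statement on the issue.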
Fedora (3.7.1)
● mainly used to organize and associate multiple datastreams and their versions
● provides long-term accessible data storage
● usage of Proai as OAI-PMH solution
Elasticsearch (1.1.0)
● Used to provide performant lookup (for metadata and full-text)
● Stores compacted JSON-LD
● Faceting can be used to browse the collection
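To give an idea of what a compacted JSON-LD document looks like, here is a deliberately simplified sketch. It only swaps full property IRIs for short terms from a context and is not the full JSON-LD compaction algorithm. The compact function, the chosen context and the example subject URI are assumptions; the two Dublin Core term IRIs are real.

```python
import json

# Hypothetical context: short term -> full property IRI.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "issued": "http://purl.org/dc/terms/issued",
}


def compact(subject, properties, context=CONTEXT):
    """Replace property IRIs with context terms (simplified stand-in)."""
    iri_to_term = {iri: term for term, iri in context.items()}
    doc = {"@context": context, "@id": subject}
    for iri, values in properties.items():
        key = iri_to_term.get(iri, iri)  # IRIs without a term stay as-is
        # Single values are unwrapped, multiple values stay a list.
        doc[key] = values[0] if len(values) == 1 else values
    return doc


if __name__ == "__main__":
    doc = compact(
        "http://example.org/resource/edoweb:1",
        {
            "http://purl.org/dc/terms/title": ["An example title"],
            "http://purl.org/dc/terms/issued": ["2014"],
        },
    )
    print(json.dumps(doc, indent=2))
```

Short keys such as "title" are convenient to address in search queries and facets, which is presumably part of why the compacted form is the one that gets indexed.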
Backend / API
● Java Web API (RESTful) implemented with Jersey
● Abstracts access to storage & indexing, transparently updates Fedora and different Elasticsearch indexes
● Provides resources as OAI-ORE aggregations
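The idea of one API that transparently updates the storage and several indexes can be illustrated with a small facade. The real backend is written in Java with Jersey; the Python classes below (InMemoryStore, InMemoryIndex, Repository) are hypothetical in-memory stand-ins for Fedora and Elasticsearch and only show the write-through pattern.

```python
class InMemoryStore:
    """Stand-in for Fedora: keeps every version of a metadata stream."""

    def __init__(self):
        self.versions = {}

    def put(self, pid, metadata):
        self.versions.setdefault(pid, []).append(dict(metadata))


class InMemoryIndex:
    """Stand-in for one Elasticsearch index exposing selected fields."""

    def __init__(self, fields):
        self.fields = set(fields)
        self.docs = {}

    def index(self, pid, metadata):
        # Only the latest state is kept, restricted to this index's fields.
        self.docs[pid] = {k: v for k, v in metadata.items() if k in self.fields}


class Repository:
    """Single entry point: callers never talk to store or indexes directly."""

    def __init__(self, store, indexes):
        self.store = store
        self.indexes = list(indexes)

    def update(self, pid, metadata):
        self.store.put(pid, metadata)       # archive a new version
        for index in self.indexes:          # then refresh every index
            index.index(pid, metadata)
```

A caller only ever invokes Repository.update; which indexes exist and which fields each one exposes stays a backend concern.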
Drupal Frontend
● Re-use of common features
– User management
– Template-system
– Field API
– RDF Mappings
– HTML-Form API
● Extended with custom modules for
– Storage Backend
– Linked Data Fields
– JavaScript UI enhancements
No big surprises for plaintext input...
Catalinking
Simple lookup widget with configurable data sources (currently only the lobid API is implemented)
Additional linked data is integrated on-the-fly
Client-side sorting (and soon also searching) of linked data
Exposing data
Importing data
This is simply a shortcut; any linked data URI can be used.
Tada!
Managing structure
Possible child nodes: in the case of a monograph these are only files. Journals provide more complex structures (volumes, issues, articles).
Basic technical metadata added by the backend.
Move an object by setting its new parent.
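Why setting the parent is enough can be shown with a small sketch in which only the partOf direction is stored and hasPart is derived from it. Whether regal stores the relations this way is not stated here; the move and has_part functions and the identifiers in the usage note are hypothetical.

```python
def move(part_of, pid, new_parent):
    """Re-parent pid, refusing moves that would create a cycle."""
    ancestor = new_parent
    while ancestor is not None:
        if ancestor == pid:
            raise ValueError(f"{pid} cannot become its own descendant")
        ancestor = part_of.get(ancestor)
    part_of[pid] = new_parent


def has_part(part_of, pid):
    """Derive the children of pid from the stored partOf relations."""
    return sorted(child for child, parent in part_of.items() if parent == pid)
```

With part_of = {"journal": None, "vol1": "journal", "vol2": "journal", "issue1": "vol1"}, calling move(part_of, "issue1", "vol2") changes a single entry, and has_part then reports the issue under the second volume instead of the first.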
Faceted search, brought to us by Elasticsearch
Facets can be added and removed individually.
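The behaviour of individually selectable facets can be mimicked in memory. In the real system Elasticsearch computes the counts on the server; the search function, the field names and the sample documents below are assumptions made purely for illustration.

```python
from collections import Counter


def search(docs, selected, facet_fields):
    """Return the docs matching every selected facet value, plus counts."""
    hits = [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in selected.items())
    ]
    # Counts are computed over the current hits, so they shrink as
    # facets are added and grow again when one is removed.
    facets = {
        field: Counter(doc[field] for doc in hits if field in doc)
        for field in facet_fields
    }
    return hits, facets


if __name__ == "__main__":
    docs = [
        {"title": "A", "type": "monograph", "language": "de"},
        {"title": "B", "type": "journal", "language": "de"},
        {"title": "C", "type": "monograph", "language": "en"},
    ]
    selected = {"type": "monograph"}      # add a facet
    hits, facets = search(docs, selected, ["type", "language"])
    print([d["title"] for d in hits], dict(facets["language"]))
    del selected["type"]                  # remove it again
    hits, _ = search(docs, selected, ["type", "language"])
    print([d["title"] for d in hits])
```

Because the selection is a plain mapping from field to value, adding or removing one facet never disturbs the others.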
Anybody can say anything about anything...
Local views on remote resources, e.g. authors and classifications.
Obstacles encountered / lessons learned: Drupal
● is designed to be standalone, so we basically have two backends
● its HTML Form API can be awkward to work with if you don't want to do things the "Drupal-way"
● a pure JavaScript / HTML5 frontend might replace Drupal in upcoming versions
Obstacles encountered / lessons learned: Fedora
● is more of an infrastructure than a storage system
● because of its complexity, we consider authorization via XACML a big disadvantage
● OAI-PMH is also not supported very well
● we are still looking for a more lightweight solution
● perhaps as lightweight as simply using the file system for both bitstreams and metadata
Obstacles encountered / lessons learned: Elasticsearch
● Works very well with JSON-LD in general
● but needs some care to create proper mappings
● and could use a more generic notion of relations than only parent/child.
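One kind of care that mappings for JSON-LD typically need is keeping URIs from being split into tokens by the analyzer. Which fields actually needed attention in regal is not stated here, so the following is a guess at a representative case. It uses the Elasticsearch 1.x mapping syntax ("string" with "index": "not_analyzed"; later versions use a "keyword" type instead). The build_mapping helper and all field and type names are hypothetical.

```python
import json


def uri_field():
    # ES 1.x syntax: index the whole value as one exact token.
    return {"type": "string", "index": "not_analyzed"}


def build_mapping(doc_type, uri_properties, text_properties):
    """Build an index-creation body with exact-match URI fields."""
    properties = {"@id": uri_field()}
    for name in uri_properties:
        # Linked resources appear as objects like {"@id": "http://..."}.
        properties[name] = {"properties": {"@id": uri_field()}}
    for name in text_properties:
        # Plain text stays analyzed so it remains full-text searchable.
        properties[name] = {"type": "string"}
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    body = build_mapping("monograph", ["creator"], ["title"])
    print(json.dumps(body, indent=2))
```

With such a mapping, a facet on creator.@id groups documents by the complete URI rather than by fragments of it.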
Further regal applications
● Migrate further Digitool and non-Digitool repositories
● Frontend: Prototype of an OER World Map
Good news: Linked Data Works!
● regal / Edoweb is not a research project,
● it is integrated into the hbz IT landscape,
● it is on the web,
● it does not require expertise in Linked Data,
● and real librarians will use it to create real catalog entries.
Thank you!
Questions? Now or later to