Upload: bart-hanssens

Post on 07-Apr-2017

Publishing RDF SKOS with Java microservices

Fedict, Brussels, January 2017

Agenda

- Linked Data
- Resource Description Framework
- Triple stores
- Jena and RDF4j
- Dropwizard

Overview

[Diagram: Dropwizard (logo: drawing of a hat) and RDF4j, with components Jetty, Lucene, RDF store, Jersey, Freemarker, Slf4j]

Linked Data

Semantic web

- Making the web machine-readable
- Distributed / web
  - Challenging for queries
  - Data not guaranteed to be available / persistent
- Add meaning to relations / links

Identifier

- Using URI as identifier
- Dereferenceable URI
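
A dereferenceable URI is one that an HTTP client can look up to get a description of the thing it identifies. The following is a minimal sketch using the JDK's `java.net.http` API; the URI `http://example.com/id/concept/1` and the class name `Dereference` are illustrative assumptions, and the request is only built and printed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class Dereference {
    /** Builds a GET request that asks for a Turtle description of the resource. */
    static HttpRequest describe(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", "text/turtle") // preferred RDF serialization
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumption: example identifier, not taken from the slides
        HttpRequest req = describe("http://example.com/id/concept/1");
        System.out.println(req.method() + " " + req.uri());
        System.out.println("Accept: " + req.headers().firstValue("Accept").orElse(""));
    }
}
```

Passing the request to `HttpClient.newHttpClient().send(...)` would perform the actual lookup, provided a server answers at that address.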

Resource Description Framework

RDF Basics

- Triple: subject (S), predicate (P), object (O)
- S and P are resource identifiers (IRI)
  - http://example.com, mailto:[email protected], urn:example:1234-56789, ...
- O can be:
  - Identifier (link to something else)
  - Literal
    - String value with optional language tag
    - OR typed value (e.g. XSD date, integer, ...)
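
The triple structure can be sketched in a few lines of plain Java (16 or later, for records). This is not the RDF4j or Jena model API: the class and record names (`TripleDemo`, `Iri`, `Literal`, `Triple`) and the example IRIs are assumptions made for illustration, and IRIs are written out without validation.

```java
class TripleDemo {
    /** Anything that can appear as the object of a triple. */
    interface Term { String toNTriples(); }

    record Iri(String value) implements Term {
        public String toNTriples() { return "<" + value + ">"; }
    }

    /** A literal with an optional language tag OR an optional datatype IRI. */
    record Literal(String text, String lang, String datatype) implements Term {
        public String toNTriples() {
            String quoted = "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"")
                    .replace("\n", "\\n").replace("\r", "\\r") + "\"";
            if (lang != null) return quoted + "@" + lang;
            if (datatype != null) return quoted + "^^<" + datatype + ">";
            return quoted;
        }
    }

    /** Subject and predicate are IRIs, the object is an IRI or a literal. */
    record Triple(Iri subject, Iri predicate, Term object) {
        public String toNTriples() {
            return subject.toNTriples() + " " + predicate.toNTriples()
                    + " " + object.toNTriples() + " .";
        }
    }

    public static void main(String[] args) {
        // Assumption: example identifiers, not taken from the slides
        Iri concept = new Iri("http://example.com/id/concept/1");
        Iri title = new Iri("http://purl.org/dc/terms/title");
        Iri issued = new Iri("http://purl.org/dc/terms/issued");
        Iri seeAlso = new Iri("http://www.w3.org/2000/01/rdf-schema#seeAlso");

        // Literal with language tag
        System.out.println(new Triple(concept, title, new Literal("Brussel", "nl", null)).toNTriples());
        // Typed literal (XSD date)
        System.out.println(new Triple(concept, issued,
                new Literal("2017-01-15", null, "http://www.w3.org/2001/XMLSchema#date")).toNTriples());
        // Object that links to something else
        System.out.println(new Triple(concept, seeAlso, new Iri("http://example.com/id/concept/2")).toNTriples());
    }
}
```

Each printed line is one N-Triples statement, which also illustrates that RDF output can be produced without any triple store.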

RDF serializations

- RDF is not a file format
  - Although the .rdf extension is often used for RDF/XML
- Popular serializations
  - N-Triples (.nt): fast and easy
  - Turtle (.ttl): human-friendly
  - RDF/XML (.rdf): XML flows
  - JSON-LD (.json): web devs

Vocabularies

- Based upon RDF Schema
  - Somewhat similar to XML Schema
  - Classes and properties
- Can (and should!) be mixed and reused
- Popular vocabularies
  - Dublin Core: generic title, description
  - SKOS: broader / narrower term
  - ROV: registered organizations
  - http://lov.okfn.org/dataset/lov/
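
As an illustration of the SKOS broader / narrower relation, here is a small plain-Java sketch that writes Turtle for a child-to-parent map, stating `skos:broader` and its inverse `skos:narrower`. The `ex:` prefix, the base IRI, the concept names and the class name `SkosDemo` are assumptions; only the SKOS namespace and the two property names come from the vocabulary itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SkosDemo {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    /** Writes Turtle for child -> parent pairs, in both directions. */
    static String toTurtle(String base, Map<String, String> broader) {
        StringBuilder sb = new StringBuilder();
        sb.append("@prefix skos: <").append(SKOS).append("> .\n");
        sb.append("@prefix ex: <").append(base).append("> .\n\n");
        for (Map.Entry<String, String> e : broader.entrySet()) {
            String child = e.getKey();
            String parent = e.getValue();
            sb.append("ex:").append(child).append(" skos:broader ex:").append(parent).append(" .\n");
            sb.append("ex:").append(parent).append(" skos:narrower ex:").append(child).append(" .\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: hypothetical concept scheme, not taken from the slides
        Map<String, String> broader = new LinkedHashMap<>();
        broader.put("brussels", "belgium");
        broader.put("belgium", "europe");
        System.out.print(toTurtle("http://example.com/id/place/", broader));
    }
}
```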

Notes

- RDF can be generated without a triple store
- Less suitable for:
  - Very large tabular sets (e.g. RDBMS dumps)
  - Tiny sensor data

Jena and RDF4j

Jena vs RDF4j

- Both great open source Java frameworks
  - Reading / writing / converting RDF, triple stores, ...
- Apache Jena
  - https://jena.apache.org/
  - Better performance / more scalable?
- Eclipse RDF4j (Sesame)
  - http://rdf4j.org/
  - Better architecture (Sails)?

Why (not) RDF4j as data store

- Embedded store / standalone server
  - 100 - 150 million triples
- No out-of-the-box HA / replication
  - Probably not needed for publishing smaller sets
  - Running multiple shared-nothing instances?
- Bonus: Sail abstraction
  - Switch to GraphDB, Blazegraph with minor changes

Triple stores

Triple store vs RDBMS

- Triple stores (TS) are optimized for storing triples
- TS often lack fine-grained checks
  - Few checks for data types, non-null
  - Commercial stores like Stardog offer more options
  - Work in progress: https://www.w3.org/TR/shacl/
- Full-text search often handled by Lucene
  - Often a product-specific extension
- Queries and updates with SPARQL (SQL-like)
  - And / or a custom API, faster but less portable

Popular stores

- Small / medium sets
  - Apache Jena store (part of framework)
  - Eclipse RDF4j store (part of framework)
- Larger sets
  - Blazegraph (GPU acceleration in commercial version)
  - Ontotext GraphDB (free demo)
  - Oracle Spatial and Graph
  - Virtuoso (hybrid XML / RDBMS / TS)

Distributed queries

- SPARQL endpoints
  - Advanced queries
  - Heavy load on server side
- Linked Data Fragments
  - Very basic queries
  - Shifting workload to client
  - More network traffic
  - http://linkeddatafragments.org/concept/
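
The trade-off between the two approaches can be sketched with an in-memory list of triples. The server side offers only a single triple-pattern lookup (the "very basic query"), and the client combines several lookups itself, which means more round trips. All names here (`PatternDemo`, `match`, `labelsBelow`) and the sample data are assumptions for illustration, not the Linked Data Fragments API.

```java
import java.util.ArrayList;
import java.util.List;

class PatternDemo {
    record Triple(String s, String p, String o) {}

    /** Server side: triples matching one pattern; null acts as a wildcard. */
    static List<Triple> match(List<Triple> data, String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : data) {
            boolean hit = (s == null || s.equals(t.s()))
                    && (p == null || p.equals(t.p()))
                    && (o == null || o.equals(t.o()));
            if (hit) out.add(t);
        }
        return out;
    }

    /** Client side: joins two pattern lookups to get labels of concepts directly below a parent. */
    static List<String> labelsBelow(List<Triple> data, String parent) {
        List<String> labels = new ArrayList<>();
        for (Triple child : match(data, null, "skos:broader", parent)) {
            // One extra lookup per child: this is the additional network traffic
            for (Triple label : match(data, child.s(), "skos:prefLabel", null)) {
                labels.add(label.o());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Assumption: hypothetical data, prefixed names used as plain strings
        List<Triple> data = List.of(
                new Triple("ex:brussels", "skos:broader", "ex:belgium"),
                new Triple("ex:brussels", "skos:prefLabel", "Brussels"),
                new Triple("ex:antwerp", "skos:broader", "ex:belgium"),
                new Triple("ex:antwerp", "skos:prefLabel", "Antwerp"),
                new Triple("ex:belgium", "skos:prefLabel", "Belgium"));
        System.out.println(labelsBelow(data, "ex:belgium"));
    }
}
```

A SPARQL endpoint would answer the same question with a single query containing both patterns, at the cost of doing the join on the server.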

Dropwizard

Microservices

- Mixing REST / SOA / Unix philosophy
  - Do one thing and do it well
  - Back-end
- Also in Java
  - Traditional Java EE too complex for small apps
  - Pippo, Red Hat WildFly Swarm, Jooby, Ninja, ...
  - Using annotations, default config

REST

- HTTP methods
  - GET, PUT, POST, DELETE, PATCH, HEAD, ...
- Content negotiation
  - HTTP request header
  - Automatically serve different formats using the same URL
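
Content negotiation on the `Accept` request header can be sketched as follows. In a JAX-RS framework such as Jersey this selection is normally driven by `@Produces` annotations; the plain-JDK version below only shows the idea. The class name `ConnegDemo`, the list of supported types and the fallback to the first supported type (instead of answering 406 Not Acceptable) are assumptions, and partial wildcards such as `text/*` are ignored for brevity.

```java
import java.util.List;

class ConnegDemo {
    // Assumption: formats this hypothetical service can produce, default first
    static final List<String> SUPPORTED = List.of(
            "text/html", "text/turtle", "application/ld+json", "application/n-triples");

    /** Picks the supported media type with the highest q-value from an Accept header. */
    static String negotiate(String accept) {
        String best = null;
        double bestQ = 0.0;
        if (accept != null) {
            for (String part : accept.split(",")) {
                String[] fields = part.trim().split(";");
                String type = fields[0].trim().toLowerCase();
                double q = 1.0; // default weight when no q parameter is given
                for (int i = 1; i < fields.length; i++) {
                    String f = fields[i].trim();
                    if (f.startsWith("q=")) {
                        try {
                            q = Double.parseDouble(f.substring(2));
                        } catch (NumberFormatException e) {
                            q = 0.0; // unparsable weight: treat as not acceptable
                        }
                    }
                }
                String candidate = null;
                if (type.equals("*/*")) {
                    candidate = SUPPORTED.get(0);
                } else if (SUPPORTED.contains(type)) {
                    candidate = type;
                }
                if (candidate != null && q > bestQ) {
                    best = candidate;
                    bestQ = q;
                }
            }
        }
        return best != null ? best : SUPPORTED.get(0);
    }

    public static void main(String[] args) {
        System.out.println(negotiate("text/turtle"));
        System.out.println(negotiate("application/rdf+xml;q=0.9, application/ld+json;q=0.8"));
        System.out.println(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"));
    }
}
```

The same URL can thus return HTML to a browser and Turtle or JSON-LD to an RDF client, depending only on the header value.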

Dropwizard

- Initially developed by Yammer
  - http://www.dropwizard.io
- Modular but opinionated
  - Jetty server, Jersey JAX-RS, Jackson JSON, Metrics
- Very good for REST
  - Less suitable for front-end apps
- Easy deployment
  - 1 uberjar (no need for Docker?)


Notes

- Small hack for file type / language negotiation
  - For human-friendly HTML view
  - Use Jersey UriConnegFilter
- Not intended for multiple vhosts, heavy caching
  - Proxy / web server in front
- Authentication
  - Maybe Pac4j (3rd party): http://www.pac4j.org/
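
The slides do not show the file type negotiation hack itself. The general idea behind suffix-based negotiation, where a file extension in the URL stands in for an `Accept` header so that links work in a browser, might look like the plain-JDK sketch below. It is not the filter's actual code: the class name `SuffixDemo`, the record `Rewritten` and the suffix table are assumptions, with extensions following the serializations listed earlier.

```java
import java.util.Map;

class SuffixDemo {
    // Assumption: hypothetical suffix table for this sketch
    static final Map<String, String> TYPES = Map.of(
            "html", "text/html",
            "nt", "application/n-triples",
            "ttl", "text/turtle",
            "rdf", "application/rdf+xml",
            "json", "application/ld+json");

    /** Path without the suffix, plus the media type to negotiate (null if none). */
    record Rewritten(String path, String accept) {}

    /** Strips a known extension from the last path segment and maps it to a media type. */
    static Rewritten rewrite(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot > slash) { // only look at a dot in the last path segment
            String type = TYPES.get(path.substring(dot + 1).toLowerCase());
            if (type != null) {
                return new Rewritten(path.substring(0, dot), type);
            }
        }
        return new Rewritten(path, null); // no known suffix: leave the request as-is
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/concept/123.ttl"));
        System.out.println(rewrite("/concept/123.html"));
        System.out.println(rewrite("/concept/123"));
    }
}
```

With such a rewrite in place, `/concept/123.ttl` and `/concept/123` with `Accept: text/turtle` reach the same resource method.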

Thanks!

Bart Hanssens / Fedict
WTC III, Simon Bolivarlaan 30
1000 Brussels
[email protected] [at] fedict.be | www.fedict.belgium.be

Fedict 2014. All rights reserved