open data node - platform and methodology - 2015-may

The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.

Open Data Node

Platform and Methodology

Peter Hanečák <[email protected]>, EEA s.r.o.

May, 2015


Who am I

● Peter Hanečák <[email protected]>

● member of COMSODE project

– leader of WP2 (architecture and design of ODN)

– leader of WP4 (implementation of ODN)

● enthusiast in many things “Open”, active in NGOs and other communities

– member of OpenData.sk and SOIT

– Fedora Linux packager

https://www.facebook.com/hany.sk

https://www.linkedin.com/in/peterhanecak

https://twitter.com/PHanecak

Agenda

● What is COMSODE

● What is COMSODE Methodology

● What is Open Data Node (ODN)

● Integration with ODN

● HW and SW requirements

● Future of ODN

COMSODE

● Components Supporting the Open Data Exploitation

● main target: publication platform for Open Data

– software tool

● supplemental goal: methodology for publication of Open Data

– mainly for those with little or no experience with Open Data

– because software by itself is of little use to such people and organizations

● validation: pilots

– pilots by 3rd parties

– pilot by COMSODE itself: 150 datasets + 3rd party-like Search app by Spinque

COMSODE Methodology

● publication plan

● preparation of publication

● realization of publication

● archiving

reference:

● http://www.comsode.eu/index.php/deliverables/

● Deliverable D5.1 + ANNEX 1 and 2

Open Data Node

help with many publication steps as outlined in the Methodology

handle the complexities present in data sources

make it easy to publish high-quality (Linked) Open Data from those sources in an automated fashion

most common use-cases: raising data from 2★ to 3★ or higher (on the 5-star Open Data scale)

● input: XLS, SQL DB, ...

● transformations: XLS, SQL -> CSV, „bad CSV“ -> CSV, CSV -> Linked Data

● output:

– tabular/relational data: CSV, REST API

– Linked Data: RDF, SPARQL endpoint
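
The "CSV -> Linked Data" step above can be illustrated with a minimal sketch. It is not how ODN itself performs the transformation; it only shows the idea of turning each CSV row into RDF triples, serialized as N-Triples. The `http://example.org/` namespace, the function names and the sample data are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not taken from the slides


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, key_column: str) -> list[str]:
    """Emit one triple per non-key, non-empty cell of each CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key itself, empty cells and surplus unnamed cells.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


sample = "id,name,city\n1,Alice,Bratislava\n2,Bob,\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

A real mapping would normally reuse existing vocabularies and typed literals instead of generating one ad-hoc property per column.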

[Figure: diagram contrasting "not Open Data" with "Open Data"]

Open Data Node

ODN can be used by:

● data publishers

● data users

Many publishers are also users, thus the data ecosystem is quite complex.

ODN can be used in many roles within that ecosystem.

Open Data Node

● platform supporting the whole OD publishing process

● modular design

● allows creating a distributed network of nodes

● can be integrated into existing infrastructure

Open Data Node

● extraction, transformation and enrichment of internal data

● storage of resulting Open Data

● publishing of stored Open Data on the Web

● cataloging functionality

● management functions

Open Data Node

● publication plan

● preparation of publication

● realization of publication

● archiving

Integration with Open Data Node

● data harvesting side

● data publication side

● special cases

Integration with Open Data Node

data publication side: as implied by most common use-cases

● files: CSV, RDF

● API: REST API, SPARQL endpoint
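
As a rough sketch of how a data user might talk to the SPARQL endpoint listed above: the standard SPARQL Protocol allows a query to be sent as an HTTP GET request with the query text in a `query` parameter. The endpoint URL below is hypothetical (each ODN deployment defines its own), and the example only builds the request URL without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint URL; a real ODN deployment defines its own address.
ENDPOINT = "http://odn.example.org/sparql"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET request URL for the given query."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, query)
print(url)

# Decoding the URL again recovers the original query text.
assert parse_qs(urlsplit(url).query)["query"] == [query]
```

The resulting URL could then be fetched with any HTTP client, typically with an `Accept` header naming the desired result format.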

Integration with Open Data Node

data harvesting side: as implied by most common use-cases

● files: XLS, „bad CSV“, ... - almost anything(*)

● API: SQL, SOAP, ... - almost anything(*)

● plus all the „Open Data files and APIs“

(*) depending on how prominent the format/technology is, or on the particular interest of a „customer“
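
The slides do not define what makes a CSV „bad“. The sketch below assumes a few typical problems (a byte-order mark, a semicolon delimiter, stray whitespace, blank lines, short rows) and shows how such a file could be normalized into plain comma-separated CSV. The function name and the sample input are illustrative only, not part of ODN.

```python
import csv
import io


def clean_csv(raw: str, delimiter: str = ";") -> str:
    """Normalize a messy delimited file into comma-separated CSV."""
    raw = raw.lstrip("\ufeff")  # drop a leading byte-order mark if present
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    width = None
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank or whitespace-only lines
        if width is None:
            width = len(cells)  # the first non-empty row is taken as the header
        # Pad short rows with empty cells so every row has the header width.
        cells += [""] * (width - len(cells))
        writer.writerow(cells)
    return out.getvalue()


messy = "\ufeffid; name ;city\n\n1; Alice ;Bratislava\n2;Bob\n"
print(clean_csv(messy), end="")
```

Rows longer than the header are passed through unchanged here; a production pipeline would more likely report them as errors.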

Integration with Open Data Node

special cases:

● ODN/Management: integration of SSO with your existing infrastructure

● ODN/Storage: direct access to SPARQL endpoint

● ODN/InternalCatalog: direct access to management API

● etc.

HW and SW requirements

HW:

● CPU: common x86_64 compatible (dual/quad core is recommended)

● memory: minimum 4 GB (recommended 8 GB) (*)

● storage: minimum 40 GB (*)

SW:

● OS: Debian 7.6 „Wheezy“

● OpenJDK 7

(*) Subject to size of transformed data and requirements on transformation operations.
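
The minimums above can be expressed as a small pre-installation check. This is only a sketch: the function name is made up, the memory size is passed in by hand because the Python standard library has no portable way to read it, and the thresholds are the slide's minimums, which may be too low for large datasets.

```python
import platform
import shutil

# Minimums taken from the slide; actual needs depend on the data being processed.
MIN_MEMORY_GB = 4
MIN_STORAGE_GB = 40


def check_requirements(arch: str, memory_gb: float, storage_gb: float) -> list[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if arch not in ("x86_64", "AMD64"):  # common names for 64-bit x86
        problems.append(f"unsupported CPU architecture: {arch}")
    if memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory {memory_gb:.1f} GB is below {MIN_MEMORY_GB} GB")
    if storage_gb < MIN_STORAGE_GB:
        problems.append(f"storage {storage_gb:.1f} GB is below {MIN_STORAGE_GB} GB")
    return problems


# Free space on the root file system; the memory figure is supplied manually.
free_gb = shutil.disk_usage("/").free / 1024**3
print(check_requirements(platform.machine(), memory_gb=8, storage_gb=free_gb))
```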

Future of ODN

Key point: Open Source

Future depends on many factors:

● strength of communities

– around ODN itself

– around individual components: UnifiedViews, CKAN, PostgreSQL, etc.

● how well business goes for the commercial partners that use and maintain ODN (EEA, etc.)

Existing achievements strengthening the future:

● consortium around UnifiedViews: three companies and other organizations

● Slovak government as customer for ODN

● around 10 COMSODE Pilots in various EU countries (so far, at various stages)

ODN implementation in Slovakia

in the eDemokracia project, ODN is used as:

● centralized component

● de-centralized component

ODN implementation in Slovakia

ODN as part of centralized component:

● heavily customized

– only some modules used, commercial version of triplestore, clustered RDBMS, etc.

● decomposed to multiple servers

● integrated with other components

– centralized SSO, OCR and content classification services, etc.

● an “upgrade” for the existing data portal data.gov.sk

● incorporated as an extension into the top-level GOV portal slovensko.sk

ODN implementation in Slovakia

ODN as de-centralized component:

● ODN with little customization

– central catalog and storage preconfigured

– etc.

● distributed as „live DVD“

● for gov. organizations and municipalities