BDE SC6-ws-05/12/2016 technology part - SWC


Upload: bigdataeurope

Posted on 15-Jan-2017


TRANSCRIPT

Page 1

BIG DATA EUROPE
PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL
EUROPE IN A CHANGING WORLD - INCLUSIVE, INNOVATIVE AND REFLECTIVE SOCIETIES

WORKSHOP: THE CHALLENGES OF BIG DATA FOR SOCIETIES IN A CHANGING WORLD, 05 DECEMBER 2016
MARTIN KALTENBÖCK (CFO, SEMANTIC WEB COMPANY)

Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

BDE SC6 Workshop

Page 2

Big Data Europe (CSA: 2015-17)

Show societal value of Big Data: 7 Domains

Lower the barrier to using big data technologies:
o Required effort and resources
o Limited data science skills

Help establish cross-lingual/organizational/domain Data Value Chains


Page 3

Big Data Europe

CSA measure | Activity | Result
COORDINATION | Stakeholder Engagement (Requirements Elicitation) | Create and Manage Societal Big Data Interest Groups
SUPPORT | Design, Realise, Evaluate Big Data Aggregator Platform | Cloud-deployment ready Big Data Aggregator Platform

Page 4

THE BDE PLATFORM ARCHITECTURE & COMPONENTS


Page 5

The three Big Data "V"s: Variety is often neglected

Page 6

Current State of Platform Architecture

Page 7

Adding a Semantic Layer to Data Lakes

Semantic Data Lake
• Central place for model, schema and data historization
• Combination of scale-out (cost reduction) and semantics (increased control & flexibility)
• Grows incrementally (pay-as-you-go)

Diagram components:
• Inbound: data sources (Manufacturing, Marketing, Sales, Support, Accounting), inbound raw data store
• Data lake (order of magnitude cheaper scalable data store)
• Knowledge graph for relationship definition and metadata
• Outbound and consumption: frontend to access relationship and KPI definition / documentation; frontend to access (ad hoc) reports; outbound data delivery to target systems
• Standards shown: JSON-LD, CSVW, R2RML, XML2RDF
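To make the idea of lifting tabular data into the knowledge graph concrete, here is a minimal Python sketch that turns one CSV record into a JSON-LD node using only the standard library. The column names, the `row_to_jsonld` helper and the `http://example.org/budget#` vocabulary are hypothetical; a real pipeline would more likely declare the mapping in a CSVW metadata file or an R2RML mapping than in hand-written code.

```python
import csv
import io
import json

# Hypothetical vocabulary; not taken from the slides.
VOCAB = "http://example.org/budget#"
CONTEXT = {
    "@vocab": VOCAB,
    # Declare "amount" as a typed literal so its string value is read as xsd:decimal.
    "amount": {"@id": VOCAB + "amount",
               "@type": "http://www.w3.org/2001/XMLSchema#decimal"},
}


def row_to_jsonld(row: dict, base: str) -> dict:
    """Lift one tabular record (hypothetical columns) into a JSON-LD node."""
    return {
        "@context": CONTEXT,
        "@id": f"{base}{row['department']}/{row['year']}",
        "@type": "BudgetLine",  # expanded against @vocab
        "department": row["department"],
        "year": int(row["year"]),
        "amount": row["amount"],  # lexical form kept; typed via the context
    }


sample = "department,year,amount\nMarketing,2016,1250.00\n"
rows = list(csv.DictReader(io.StringIO(sample)))
doc = row_to_jsonld(rows[0], "http://example.org/line/")
print(json.dumps(doc, indent=2))
```

Keeping the vocabulary in `@context` rather than in every key means the same JSON can be consumed as plain JSON by a frontend and as RDF by the triple store.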

Page 8

Why use BDE technology?

Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File system | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | No | No | No | No | Yes
High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | Yes | Yes | Yes | Yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom

Page 9

SC6 PILOT
CITIZENS BUDGET ON MUNICIPAL LEVEL
ARCHITECTURE & COMPONENTS


Page 10

SC6 in Big Data Europe – what is included

Europe in a changing world - inclusive, innovative and reflective societies

Social Sciences, Smart Statistics, (Digital) Humanities, Digital (Research) Archives

www.big-data-europe.eu

Page 11

SC6: Social Sciences


Pilot focus area: Citizens budget spending on municipal level

Big Data focus area: Statistical and research data linking & integration

Selected key data assets: Detailed budget execution data at city level, statistical data from public data portals and statistical offices, federated social sciences data catalogs
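Linking budget data with statistical data typically means joining the two on a shared key. The sketch below assumes a hypothetical `municipality_code` present in both sources and derives spending per inhabitant; the record types, the `per_capita_spending` helper and all figures are illustrative, not taken from the pilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetRecord:
    municipality_code: str  # hypothetical key shared by both sources
    year: int
    spent_eur: float


@dataclass(frozen=True)
class PopulationRecord:
    municipality_code: str
    year: int
    population: int


def per_capita_spending(budget, population):
    """Join on (municipality_code, year) and return EUR spent per inhabitant.

    Budget rows without a matching (non-zero) population figure are skipped.
    """
    pop_index = {(p.municipality_code, p.year): p.population for p in population}
    result = {}
    for b in budget:
        key = (b.municipality_code, b.year)
        pop = pop_index.get(key)
        if pop:
            result[key] = b.spent_eur / pop
    return result


# Illustrative numbers only, not real municipal data.
budget = [BudgetRecord("M001", 2016, 1_000_000.0), BudgetRecord("M002", 2016, 500_000.0)]
population = [PopulationRecord("M001", 2016, 20_000)]
print(per_capita_spending(budget, population))  # {('M001', 2016): 50.0}
```

In an RDF setting the same join would be expressed by linking both datasets to a common municipality URI; the dictionary index here just makes the matching step explicit.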

Page 12

SC6 Pilot: Idea & Objectives

State of the art:
o Budget: the most important document of public policy
o Budget execution affects everyday lives
o Citizens are more involved in city-level activities

Objective: Can we make budgets more useful for citizens, researchers and decision makers?


Page 13

SC6 Pilot: Idea & Objectives

Create an online dashboard on economic data:
o Harvest data from several sources in different formats
o Normalise the data (RDF)
o Link & map the data (attributes, structure, languages)
o Analyse the data: financial ratios (comparisons, predictions etc.)
o Visualise the analysis on an online dashboard, including help & information for understanding the data & analysis
o Provide raw data (for further use as open data)
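One financial ratio such a dashboard could show is the budget execution rate (executed amount divided by approved amount). The following sketch, using hypothetical figures, computes that ratio to compare municipalities and adds a naive linear year-end projection as a stand-in for the "predictions" step. Both `execution_rate` and `project_year_end` are hypothetical helpers; the slides do not specify which ratios or prediction method the pilot uses.

```python
def execution_rate(approved: float, executed: float) -> float:
    """Share of the approved budget that has actually been spent."""
    if approved <= 0:
        raise ValueError("approved budget must be positive")
    return executed / approved


def project_year_end(executed_to_date: float, day_of_year: int, days_in_year: int = 365) -> float:
    """Naive linear extrapolation of spending to the end of the year."""
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError("day_of_year out of range")
    return executed_to_date * days_in_year / day_of_year


# Hypothetical (approved, executed) amounts in EUR.
lines = {
    "Municipality A": (2_000_000.0, 1_500_000.0),
    "Municipality B": (800_000.0, 720_000.0),
}
rates = {name: execution_rate(a, e) for name, (a, e) in lines.items()}
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)  # ['Municipality B', 'Municipality A']
print(project_year_end(300_000.0, 73))  # 1500000.0
```

Ratios like this are useful for comparison precisely because they remove the effect of very different budget sizes between large and small municipalities.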


Page 14

2 H2020 projects working together on the SC6 Pilot

• Big Data Europe

• Your Data Stories

SC6 Pilot core team: Ivana Versic (CESSDA), Michalis Vafopoulos (NCSR-D), Martin Kaltenböck (SWC), Jürgen Jakobitsch (SWC), Hossein Abroshan (CESSDA)

SC6 Pilot Partners

Page 15

Data used / produced in Pilot

Budget Data and Budget Execution Data
• Municipality of Athens, Greece
  o Description: detailed budget execution data
  o Frequency: daily
  o Ownership: open
  o Format: API
• Municipality of Thessaloniki, Greece
  o Description: detailed budget execution data
  o Frequency: daily
  o Ownership: open
  o Format: CSV, XLS (files provided for download)
• Municipality of Kalamaria, Greece
  o Description: detailed budget execution data
  o Frequency: weekly
  o Ownership: open
  o Format: CSV, XLS (files provided for download)

Additional Open Data
  o Description: economic taxonomies etc.
  o Ownership: open
  o Format: RDF (SKOS, OWL), other
  o E.g. COFOG (UN Classification of the Functions of Government)

Size of data
  o ~30 million triples (statements) for 1 year
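Normalising the municipal CSV files into RDF and aligning them with COFOG could look roughly like the sketch below, which writes N-Triples for each row. The namespaces, column names, the `row_to_ntriples` helper and the local-code-to-COFOG mapping are hypothetical; only the COFOG division numbers (09 Education, 07 Health) are real. Counting triples per row is also a quick way to estimate how volume grows as more sources are added.

```python
import csv
import io

BASE = "http://example.org/budget/"   # hypothetical namespace
COFOG = "http://example.org/cofog/"   # hypothetical stand-in for a COFOG SKOS scheme
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from local budget codes to COFOG divisions
# (09 = Education, 07 = Health).
LOCAL_TO_COFOG = {"15": "09", "25": "07"}


def literal(value: str, datatype: str) -> str:
    """Serialise a typed literal, escaping backslashes and quotes for N-Triples."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def row_to_ntriples(municipality: str, row: dict) -> list:
    subject = f"<{BASE}{municipality}/entry/{row['entry_id']}>"
    triples = [
        f"{subject} <{BASE}amount> {literal(row['amount'], 'decimal')} .",
        f"{subject} <{BASE}date> {literal(row['date'], 'date')} .",
    ]
    cofog = LOCAL_TO_COFOG.get(row["local_code"])
    if cofog is not None:  # unmapped codes are left for manual curation
        triples.append(f"{subject} <{BASE}classification> <{COFOG}{cofog}> .")
    return triples


sample = (
    "entry_id,date,local_code,amount\n"
    "1,2016-03-01,15,1200.50\n"
    "2,2016-03-01,99,80.00\n"
)
all_triples = [
    t
    for row in csv.DictReader(io.StringIO(sample))
    for t in row_to_ntriples("thessaloniki", row)
]
print(len(all_triples))  # 5
```

Leaving unmapped codes unclassified, instead of guessing, keeps the taxonomy alignment as a separate curation step, which is where a thesaurus tool would come in.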

Page 16

4 Vs of Big Data in the SC6 Pilot

Variety: requirement arising from harvesting budget data and budget execution data from several sources, available in different structures and formats.

Volume: requirement arising from the growing amount of available open budget data and budget execution data.

Velocity: requirements arising from budget execution data that publishers provide on a continuous basis (daily, weekly, monthly).

Veracity: refers to biases, noise and abnormalities in the data. Even within the same country the published data differ, because they often come from different systems or because public accounting standards are not enforced entirely uniformly (e.g. across different municipal departments).
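Veracity problems often start with something as simple as number formatting: one source writes 1.234,56, another 1,234.56. A minimal normalisation sketch, assuming the last separator in the string is the decimal separator; the `parse_amount` helper is hypothetical, and the heuristic misreads values such as 1,234 that use a thousands separator without decimals, so such sources still need per-source configuration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(raw: str) -> Optional[Decimal]:
    """Parse amounts written as 1234.56, 1,234.56 or 1.234,56 (EUR sign allowed).

    Heuristic: whichever of ',' or '.' occurs last is treated as the decimal
    separator; the other one is treated as a thousands separator and dropped.
    Returns None for values that cannot be parsed, so they can be flagged.
    """
    s = raw.strip().replace(" ", "").replace("€", "")
    if not s:
        return None
    if s.rfind(",") > s.rfind("."):
        s = s.replace(".", "").replace(",", ".")
    else:
        s = s.replace(",", "")
    try:
        value = Decimal(s)
    except InvalidOperation:
        return None
    # Reject NaN / Infinity, which Decimal would otherwise accept.
    return value if value.is_finite() else None


for raw in ["1.234,56", "1,234.56", "1234.56 €", "n/a"]:
    print(repr(raw), "->", parse_amount(raw))
```

`Decimal` is used instead of `float` so that amounts keep their exact cents after normalisation.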


Page 17

SC6: Social Sciences


Pilot Architecture & Components

Page 18

SC6 Pilot - Architecture


Page 19

SC6 Pilot: Technical Components

• Apache Flume, https://flume.apache.org/ (data ingestion)
• Apache Kafka, http://kafka.apache.org (messaging service)
• Apache Spark, http://spark.apache.org (distributed analysis, transformation)
• Apache HDFS, http://hadoop.apache.org (raw data storage)
• SWC's PoolParty Semantic Suite, http://poolparty.biz (data consolidation, curation, mapping)
• OpenLink's Virtuoso, http://virtuoso.openlinksw.com (triple store, data storage)
• Apache HTTP, http://httpd.apache.org (linked data serving)
• Apache Avro, http://avro.apache.org/docs/current/ (intermediate data schema)
• D3 JS Library, https://d3js.org/ (visualisation of RDF data using SPARQL queries)
• SWC's PoolParty GraphSearch (SPARQL-based interface component for filter & faceted search)
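As an illustration of how a visualisation or search component could pull data from the triple store, the sketch below builds a SPARQL 1.1 Protocol GET request asking for JSON results. The endpoint URL (Virtuoso's default `/sparql` path on port 8890), the `b:` vocabulary, the helper names and the query shape are assumptions; the request is only constructed, not sent, so the snippet runs offline.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://localhost:8890/sparql"  # hypothetical Virtuoso endpoint
BUDGET = "http://example.org/budget/"      # hypothetical vocabulary


def sparql_string(value: str) -> str:
    """Quote a value as a SPARQL string literal (escape backslash and quote)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def totals_by_classification(municipality: str) -> str:
    """SELECT query summing amounts per classification for one municipality."""
    return f"""PREFIX b: <{BUDGET}>
SELECT ?classification (SUM(?amount) AS ?total)
WHERE {{
  ?entry b:municipality {sparql_string(municipality)} ;
         b:classification ?classification ;
         b:amount ?amount .
}}
GROUP BY ?classification
ORDER BY DESC(?total)"""


def build_request(query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET, asking for JSON results."""
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(totals_by_classification("thessaloniki"))
# urllib.request.urlopen(req) would execute it; omitted so the sketch runs offline.
print(parse_qs(urlparse(req.full_url).query)["query"][0])
```

Escaping user-supplied values before placing them in the query matters once the dashboard lets users pick a municipality or filter interactively.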


Page 20

SC6 Pilot: 1st version implemented


https://bde.poolparty.bizGraphSearchSC6

Page 21

SC6 Pilot: Pilot Evaluation

Evaluation approach for the SC6 Pilot (starts 01/2017):
• Invite municipalities to evaluate and use the system
• Invite the community (open data, data community, BDE community, W3C)
• Evaluate within the participating projects (BDE, DataStories; invite: OpenBudget)
• BDE SC6 workshop in Cologne, 5.12.2016

Additional evaluation: tests over time with
• a growing amount of data
• a growing number of different sources & formats docked onto the system
• additional analytics in place


Page 23

Contacts:

CESSDA, http://cessda.net/
• Ivana Ilijasic Versic, [email protected]
• Hossein Abroshan, [email protected]

NCSR-D, http://www.demokritos.gr/?lang=en
• Michalis Vafopoulos, [email protected]

Semantic Web Company (SWC), http://www.semantic-web.at
• Martin Kaltenböck, [email protected]
• Jürgen Jakobitsch, [email protected]

Page 24

Questions & Contacts
www.big-data-europe.eu
#BigDataEurope

Martin Kaltenböck
CFO, Semantic Web Company
[email protected]

http://www.linkedin.com/in/martinkaltenboeck
https://twitter.com/kalte2707
http://de.slideshare.net/MartinKaltenboeck
http://blog.semantic-web.at