bde sc6-ws-05/12/2016 technology part - swc
TRANSCRIPT
BIG DATA EUROPE
PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL
EUROPE IN A CHANGING WORLD - INCLUSIVE, INNOVATIVE AND REFLECTIVE SOCIETIES
WORKSHOP: THE CHALLENGES OF BIG DATA FOR SOCIETIES IN A CHANGING WORLD, 05 DECEMBER 2016
MARTIN KALTENBÖCK (CFO, SEMANTIC WEB COMPANY)
Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
BDE SC6 Workshop
Big Data Europe (CSA: 2015-17)
Show societal value of Big Data: 7 Domains
Lower the barrier for using big data technologies
o Required effort and resources
o Limited data science skills
Help establish cross-lingual/organizational/domain Data Value Chains
1 mai 2023
Big Data Europe
CSA Measures
o COORDINATION: Stakeholder Engagement (Requirements Elicitation)
o SUPPORT: Design, Realise, Evaluate Big Data Aggregator Platform
Results
o Create and Manage Societal Big Data Interest Groups
o Cloud-deployment ready Big Data Aggregator Platform
THE BDE PLATFORM ARCHITECTURE & COMPONENTS
The three Big Data "V"s: Variety is often neglected
Current State of Platform Architecture
Adding a Semantic Layer to Data Lakes
Semantic Data Lake
• central place for model, schema and data historization
• combination of scale-out (cost reduction) and semantics (increased control & flexibility)
• grows incrementally (pay-as-you-go)
Inbound
o Data Sources: Manufacturing, Marketing, Sales, Support, Accounting
o Inbound Raw Data Store
Data Lake (order of magnitude cheaper scalable data store)
Knowledge Graph for Relationship Definition and Meta Data
Outbound and Consumption
o Frontend to Access Relationship and KPI Definition / Documentation
o Frontend to Access (ad hoc) Reports
o Outbound Data Delivery to Target Systems
JSON-LD, CSVW, R2RML, XML2RDF
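JSON-LD is one of the formats named on this slide. As a rough illustration of what a semantically annotated record in the data lake could look like, the sketch below builds a small JSON-LD document in Python. The namespace `http://example.org/budget/`, the property names and the sample values are assumptions made for this example and are not taken from the BDE platform.

```python
import json

# Hypothetical budget entry expressed as JSON-LD (all IRIs are made up).
doc = {
    "@context": {
        "ex": "http://example.org/budget/prop/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "category": "ex:category",
        # Typed term: values of "amount" are read as xsd:decimal literals.
        "amount": {"@id": "ex:amount", "@type": "xsd:decimal"},
    },
    "@id": "http://example.org/budget/entry/1",
    "category": "Education",
    "amount": "1200.50",
}

# Plain JSON tooling is enough to store or ship the document.
serialised = json.dumps(doc, indent=2)
```

The `@context` carries the semantics, so the same record stays readable for ordinary JSON consumers while still mapping onto RDF.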
Why use BDE Technology?
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High Availability | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm UI + Custom
SC6 PILOT
CITIZENS BUDGET ON MUNICIPAL LEVEL
ARCHITECTURE & COMPONENTS
SC6 in Big Data Europe – what is included
Europe in a changing world - inclusive, innovative and reflective societies
o Social Sciences
o Smart Statistics
o (Digital) Humanities
o Digital (Research) Archives
www.big-data-europe.eu
SC6: Social Sciences
Pilot focus area: Citizens budget spending on municipal level
Big Data focus area: Statistical and research data linking & integration
Selected key data assets: Detailed budget execution data at city level, statistical data from public data portals and statistical offices, federated social sciences data catalogs
SC6 Pilot: Idea & Objectives
State of the Art:
o Budget: the most important document of public policy
o Budget execution affects everyday lives
o Citizens are more involved in city-level activities
Objective:
Can we make budgets more useful for citizens, researchers and decision makers?
SC6 Pilot: Idea & Objectives
Create an online Dashboard on Economic Data
o Harvest data from several sources in different formats
o Normalise the data (RDF)
o Link & map the data (attributes, structure, languages)
o Analyse the data: financial ratios (comparisons, predictions etc.)
o Visualise the analysis on an online dashboard, including help & information to understand data & analysis
o Provide raw data (for further use as open data)
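The "analyse" step of the dashboard can be illustrated with one simple financial ratio. The sketch below is not the pilot's implementation: the record fields (`municipality`, `cofog`, `approved`, `spent`), the sample figures and the choice of an "execution rate" (spent divided by approved budget) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical, already-harvested budget lines (all values are made up).
records = [
    {"municipality": "A", "cofog": "09", "approved": 1000.0, "spent": 400.0},
    {"municipality": "A", "cofog": "07", "approved": 500.0, "spent": 500.0},
    {"municipality": "B", "cofog": "09", "approved": 2000.0, "spent": 500.0},
]


def execution_rate(rows):
    """Return spent/approved per municipality (an assumed example ratio)."""
    approved = defaultdict(float)
    spent = defaultdict(float)
    for row in rows:
        approved[row["municipality"]] += row["approved"]
        spent[row["municipality"]] += row["spent"]
    # Skip municipalities without an approved budget to avoid dividing by zero.
    return {m: spent[m] / approved[m] for m in approved if approved[m] > 0}


rates = execution_rate(records)
```

Ratios of this kind allow comparisons across municipalities of different size, which raw totals do not.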
2 H2020 projects working together on the SC6 Pilot
• Big Data Europe
• Your Data Stories
SC6 Pilot core team: Ivana Versic (Cessda), Michalis Vafopoulos (NCSR-D), Martin Kaltenböck (SWC), Jürgen Jakobitsch (SWC), Hossein Abroshan (Cessda)
SC6 Pilot Partners
Data used / produced in Pilot
Budget Data and Budget Execution Data
Municipality of Athens, Greece
o Description: budget execution data in detail
o Frequency: daily
o Ownership: open
o Format: API
Municipality of Thessaloniki, Greece
o Description: budget execution data in detail
o Frequency: daily
o Ownership: open
o Format: csv, xls (files for download provided)
Municipality of Kalamaria, Greece
o Description: budget execution data in detail
o Frequency: weekly
o Ownership: open
o Format: csv, xls (files for download provided)
Additional Open Data
o Description: economic taxonomies etc.
o Ownership: open
o Format: RDF (SKOS, OWL), other
o E.g. COFOG (UN classification)
Size of Data
o ~30 million triples (statements) for 1 year
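Since two of the sources publish CSV files and the pilot stores its data as RDF triples, a conversion step sits in between. The following is a minimal sketch of such a step, assuming a hypothetical namespace (`http://example.org/budget/`) and hypothetical column names (`date`, `category`, `amount`). It emits one N-Triples statement per column and only escapes backslashes and double quotes, so it is a simplification rather than a complete serialiser.

```python
import csv
import io

BASE = "http://example.org/budget/"  # hypothetical namespace, not the pilot's
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def row_to_ntriples(row_id, row):
    """Turn one CSV row into N-Triples lines (one triple per column)."""
    subject = f"<{BASE}entry/{row_id}>"
    lines = []
    for column, value in row.items():
        predicate = f"<{BASE}prop/{column}>"
        if column == "amount":
            # Amounts become typed literals so they can be summed later.
            obj = f'"{value}"^^<{XSD_DECIMAL}>'
        else:
            # Escape backslashes and quotes for a plain string literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


sample_csv = "date,category,amount\n2016-01-15,Education,1200.50\n"
triples = []
for i, row in enumerate(csv.DictReader(io.StringIO(sample_csv)), start=1):
    triples.extend(row_to_ntriples(i, row))
```

In this sketch each CSV row yields as many triples as it has columns, which gives a feel for how detailed execution data adds up to millions of statements per year.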
4 Vs of Big Data in SC6 Pilot
Variety: requirement based on harvesting budget data and budget execution data from several sources, available in different structures and formats.
Volume: requirement regarding the growing amount of open budget data and budget execution data available.
Velocity: requirement regarding budget execution data that the publisher provides on a continuous basis (daily, weekly, monthly).
Veracity: refers to the biases, noise and abnormality in data. Even within the same country, published data differ because they often come from different systems, or because public accounting standards are not enforced uniformly (e.g. across different municipal departments).
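Variety and veracity show up concretely as differing column names and number formats between publishers. The sketch below illustrates one way to harmonise such records onto a common schema. The source names (`source_a`, `source_b`), the header names and the mapping table are hypothetical; the amount parser assumes that the last separator in a number is the decimal mark, which is ambiguous for inputs such as `1,234`.

```python
from decimal import Decimal

# Hypothetical per-source header mappings onto one common schema.
FIELD_MAP = {
    "source_a": {"AMT": "amount", "CODE": "code"},
    "source_b": {"amount_eur": "amount", "account_code": "code"},
}


def parse_amount(text):
    """Parse '1.234,56' (comma decimal) or '1,234.56' (dot decimal)."""
    text = text.strip()
    if text.rfind(",") > text.rfind("."):
        # The last separator is a comma, so treat it as the decimal mark.
        text = text.replace(".", "").replace(",", ".")
    else:
        # Dot decimal (or no separator): commas are thousands separators.
        text = text.replace(",", "")
    return Decimal(text)


def harmonise(source, row):
    """Rename source-specific fields and normalise the amount."""
    mapping = FIELD_MAP[source]
    out = {mapping[key]: value for key, value in row.items() if key in mapping}
    out["amount"] = parse_amount(out["amount"])
    out["source"] = source
    return out
```

For example, `harmonise("source_a", {"AMT": "1.234,56", "CODE": "6011"})` and `harmonise("source_b", {"amount_eur": "1,234.56", "account_code": "6011"})` produce records with the same `amount` and `code` fields.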
SC6: Social Sciences
Pilot Architecture & Components
SC6 Pilot - Architecture
SC6 Pilot: Technical Components
o Apache Flume, https://flume.apache.org/ (data ingestion)
o Apache Kafka, http://kafka.apache.org (messaging service)
o Apache Spark, http://spark.apache.org (distributed analysis, transformation)
o Apache HDFS, http://hadoop.apache.org (raw data storage)
o SWC's PoolParty Semantic Suite, http://poolparty.biz (data consolidation, curation, mapping)
o OpenLink's Virtuoso, http://virtuoso.openlinksw.com (triple store, data storage)
o Apache HTTP, http://httpd.apache.org (linked data serving)
o Apache Avro, http://avro.apache.org/docs/current/ (intermediate data schema)
o D3 JS Library, https://d3js.org/ (visualisation of RDF data using SPARQL queries)
o SWC's PoolParty GraphSearch (SPARQL based interface component for filter & faceted search)
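The component list says the visualisation layer reads RDF data through SPARQL queries against the triple store. As a hedged sketch of what such a query could look like, the code below builds an aggregate query and the corresponding GET URL as defined by the SPARQL 1.1 Protocol (the query travels in the `query` parameter). The endpoint URL and the `ex:` vocabulary are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:8890/sparql"  # hypothetical endpoint URL

# Total amount per category, largest first (vocabulary is made up).
QUERY = """
PREFIX ex: <http://example.org/budget/prop/>
SELECT ?category (SUM(?amount) AS ?total)
WHERE {
  ?entry ex:category ?category ;
         ex:amount ?amount .
}
GROUP BY ?category
ORDER BY DESC(?total)
"""


def build_request_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET URL with the query URL-encoded."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


url = build_request_url(ENDPOINT, QUERY)
```

A client would fetch this URL with an `Accept: application/sparql-results+json` header and hand the JSON bindings to the charting code.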
SC6 Pilot: 1st version implemented
https://bde.poolparty.bizGraphSearchSC6
SC6 Pilot: Pilot Evaluation
Evaluation Approach SC6 Pilot (starts 01/2017):
o Invite municipalities to evaluate and use the system
o Invite community (open data, data community, BDE community, W3C)
o Evaluate within the participating projects (BDE, DataStories, invite: OpenBudget)
o BDE SC6 workshop in Cologne, 5.12.2016
Additional evaluation: tests over time with
o a growing amount of data
o a growing number of different sources & formats docked onto the system
o additional analytics in place
How to benefit best from BDE
• BDE Workshops & Webinars
• Use & expand the BDE Platform (BDE github)
• Visit Website: news, events, community, …
• Big Data Europe W3C Community Group
• 7+1x Mailing Lists – stay tuned!
• BDE Platform website coming soon!!
• Related EC Call on Big Data, open until 02 Feb 2017: Policy-development in the age of big data: data-driven policy-making, policy-modelling and policy-implementation
Contacts:
CESSDA, http://cessda.net/
o Ivana Ilijasic Versic, [email protected]
o Hossein Abroshan, [email protected]
NCSR-D, http://www.demokritos.gr/?lang=en
o Michalis Vafopoulos, [email protected]
Semantic Web Company (SWC), http://www.semantic-web.at
o Martin Kaltenböck, [email protected]
o Jürgen Jakobitsch, [email protected]
Questions & Contacts
#BigDataEurope
Martin Kaltenböck
CFO, Semantic Web Company
[email protected]
http://www.linkedin.com/in/martinkaltenboeck
https://twitter.com/kalte2707
http://de.slideshare.net/MartinKaltenboeck
http://blog.semantic-web.at