TRANSCRIPT
20-3-2017
ESS Workshop Dissemination of Official
Statistics as open data
ESS Workshop: dissemination of official statistics as open data Malta 2017
The information and views set out in the report are those of the author(s) and do not necessarily reflect the
official opinion of the European Union. Neither the European Union institutions and bodies nor any person
acting on their behalf may be held responsible for the use which may be made of the information contained
therein.
Authors:
Oscar CORCHO (Full Professor at the Department of Artificial Intelligence (Universidad Politécnica de Madrid))
Evangelos KALAMPOKIS (Research fellow with the Information Technologies Institute of the Centre for Research & Technology - Hellas (CERTH-ITI))
Eoin MACCUIRC (Webmaster at the Data dissemination Unit of CSO Ireland)
Joan Miquel PIQUÉ (External expert of DevStat)
Paola VOTTA (DevStat)
Table of Contents
Introduction
1. Key conclusions and Recommendations from the Workshop
1.1 Opportunities and Use Cases for the adoption of (Linked) Open Data in Official Statistics
1.2 The current landscape of challenges and tools
1.3 Strategy and Policy for an effective (Linked) Open Data adoption in official statistics
Annexes
Annex 1: Programme of the Workshop
Annex 2: Description of workshop sessions
2.1 Opening Session
2.1.1 Summary of presentations
2.2 Session 1: Opportunities and Use Cases
2.2.1 Objective of the session
2.2.2 Summary of presentations
2.3 Session 2: Challenges and Tools
2.3.1 Objective of the session
2.3.2 Summary of presentations
2.4 Session 3: Strategy and Policy
2.4.1 Objective of the session
2.4.2 Summary of presentations
2.5 Closing Session
Annex 3: Methodology of the Group Discussions
3.1 Objective of the group discussions
3.2 Methodology
3.2.1 Session 1 - Activity: Enriched Double SWOT analysis
3.2.2 Session 2 - Activity: Lotus Flower
3.2.3 Session 3 - Activity: From…to
Introduction
This document describes the findings and conclusions obtained from the ESS workshop
on dissemination of official statistics as open data, held in St. Julians (Malta) on 18th -
19th January 2017.
Open data is an emerging discipline with a huge potential for the (re-)dissemination of
official statistics. There are numerous questions to address for the statistical
community. The workshop aimed at bringing together various stakeholders to explore
and prepare the ESS future orientation in this area of work. 56 delegates from 27
countries, including representatives from private companies and academia, joined
together in order to learn about the ESS vision of disseminating official statistics as
(Linked) Open Data, and to discuss the potential benefits of such an approach, the
limitations and challenges for a broader uptake, the key ingredients involved in the
different topics to be addressed, and the steps to be taken to achieve this vision.
The event was preceded by a preparatory meeting in Luxembourg, held on 28th
November 2016, which brought together a group of 25 experts from Eurostat, the Office for Official
Publications of the European Communities, several national statistical offices and
two private companies (DevStat and PriceWaterHouseCoopers). During this
meeting, several discussions took place and agreements were reached, such as the
selection of three Proofs of Concept (PoC)1 to continue focusing on, from those
proposed in the initial study done by PwC. The first PoC selected was the one on using
LOD as a vehicle for harmonization, standardization and management of classifications
handled by national statistical offices, the second on the usage of Linked Data principles
for the publication of data and metadata from national statistical offices, and the last
one on using LOD technologies for integration of multiple data sets from multiple data
providers.
Besides that, several needs were also identified, such as the need for clear guidelines for
the adoption of an LOD approach in official statistics, the need to study better the
benefits that LOD provides in this area, the need to characterise further the users of
such data, and the need to address the perceived complexity of linked data across data
providers and users, including the development of strategy roadmaps,
recommendations about the governance of the generated LOD and the generation of
technical guidelines.
The results from the preparatory meeting were used to design the ESS Workshop, which
was structured around three main sessions:
1 A proof of concept is considered to be an experiment or pilot project that demonstrates the feasibility of an approach
or concept. In this case, PoCs will demonstrate the feasibility of using Linked Open Data for the dissemination and exploitation of official statistics.
• Session 1: Opportunities and Use cases
• Session 2: Challenges and tools
• Session 3: Strategy and policy
Each session combined expert lectures around the main topic, presentations of national
experiences and group discussions.
The expert lectures and presentations aimed at showing the key ingredients and
opportunities that open data, in general, and Linked Open Data, in particular, offer for
the dissemination of official statistics, as well as the organisational and technological
support and challenges associated with such adoption. They also illustrated examples of
the work already done by different Member States, which in some cases is already
deployed and available in their production systems.
These presentations also identified challenges and barriers for a wider and faster
adoption of these approaches in national statistical institutes (NSIs) from EU member
and associate states. These benefits, challenges and barriers were all reflected in the
discussions held in the group discussion activities.
The group discussions (organised as round tables) were based on different
activities aimed at letting participants first develop their ideas individually and then
share them with the other participants in order to define a common view. At the end of every
activity, a representative of every team presented to the whole audience a short
summary of the discussion, its results, and the common agreements reached.
To make it easier to read and use, this report first presents the key conclusions
from the Workshop in the following chapter. The Workshop sessions are then described
in Annex 2, summarising the presentations and explaining the methodology used for the
group activities under each session. Annex 1 contains the programme of the
Workshop.
All presentations are available on the CROS portal2.
2 http://ec.europa.eu/eurostat/cros/content/presentations-11_en
1. Key conclusions and Recommendations from
the Workshop
The main conclusions from the workshop are the following:
• There is a shared understanding of the benefits of LOD among NSIs that have experimented with it. LOD supports more flexible means of data dissemination, enhances data exploration across datasets and enables linking with other sources (e.g. within a national statistical system) while keeping the information on data provenance. Indirect benefits are that LOD projects foster internal coherence of data and metadata, reinforce the role of NSIs as standard setters and stimulate partnerships.
• LOD is an area in which NSIs are still largely experimenting. It is not yet perceived as mature for full production but there was broad agreement on the advantages of developing further steps in a coordinated way at ESS level. This can best be achieved through concrete results and pilots to further demonstrate the feasibility and benefits from LOD. The following priorities were identified:
o There is a need to build capacities at NSI and ESS level, through training, common pilot projects and collaboration across multidisciplinary teams (IT, dissemination, content and classifications).
o Governance is a key element in LOD. Common governance approaches and processes for LOD should be developed collaboratively and embedded in existing structures of the ESS, Eurostat and NSIs. The existing governance structures and communities will first be examined.
o The technology is available but the ESS would benefit from an evaluation and selection of standard tools and from guidelines concerning performance issues.
o The ESS should liaise more systematically with standards setters beyond the EU, and learn from experiments outside the EU (e.g. Australia, Japan).
In this section, a brief analysis of the discussions held during the working groups and the
results achieved by each activity is presented, which will help in setting up the
communication, training and work strategy to achieve the ESS vision 2020 for improving
dissemination of official statistics. We recall that the event was structured around three
sessions, each of them characterised by a different activity.
1.1 Opportunities and Use Cases for the adoption of
(Linked) Open Data in Official Statistics
The conclusions presented in this section are based on the analysis of the results of the
group activity that was done during session 1. This activity intended to establish a
framework to visualise the key elements in order to start developing action, setting a
starting point with regards to different perspectives of the situation, reaching a
consensus about causes, effects, and main challenges.
As a result of the group discussion on the enriched SWOT analysis, where participants
were organised in seven tables, seven consolidated enriched SWOTs were generated. These
SWOTs were analysed after the meeting, and the following main conclusions can
be drawn (as a summary SWOT):
• Strengths are generally shared across all participants, who understand the main benefits of adopting a (Linked) Open Data policy: better interoperability, improving access to data and metadata, the possibility of handling multiple formats and standardisation.
• Weaknesses are also identified and generally shared across tables, including the lack of appropriate skills for data production and dissemination inside NSIs, and for data (re)use by users, the perceived immaturity of tooling despite existing success stories, the lack of clear guidelines and how-to, and lack of awareness and buy-in from top management in some NSIs.
• In terms of opportunities, such an approach is perceived as a good way to give more visibility and relevance to the work done by NSIs, and to address some current limitations, for example by providing more trust, allowing provenance to be handled, improving data literacy overall and opening up new opportunities for business and society.
• Threats identified were the risk that this is merely a fashionable approach that will not last, and the possibility of private companies releasing data first if NSIs do not do it quickly, with NSIs thereby losing relevance in the data value chain landscape.
Many of the points raised in the SWOTs were accompanied by proposals for measures
and actions that would be required, according to participants, in order to boost and
enhance strengths and opportunities, as well as to mitigate the potential negative
effects of weaknesses and threats. Most of these proposals were oriented towards the
need and opportunity for good training, for the development of good simple use cases
that would act as a showcase and allow easier replication, for the establishment of good
technological, organisational and, in general, governance guidelines for all to follow, and
for the common development of a joint fabric of utilities and software to be reused and
adapted throughout the NSIs.
1.2 The current landscape of challenges and tools
The second group activity focused on identifying the main ingredients needed to address
challenges identified in the previous session: (1) Interoperability to join data silos and
standardisation, (2) Codelists, (3) Technical complexity of Linked Data for Users, (4)
Technical complexity of Linked data for the Producers (skills gap), (5) Naming policies
and handling of temporal references and versions in identifiers and data, (6)
Reputational risk, data quality and confidentiality, (7) Risk of low performance/
Immature tooling.
This activity was performed using the Lotus Flower technique, with the main objective
of identifying the main places where existing technologies and approaches can help, and
those where more investment is needed.
Given the fact that each table discussed a different topic, it is hard to provide a
consolidated view on all the ingredients that were addressed. However, they may be
grouped into the following three main areas:
• Governance/organisational: need for standardisation (common vocabularies, level of aggregation, metadata, common codelists, etc.), need to understand better users and their needs, need to start with low hanging fruit first, need to establish clear naming policies, need to address confidentiality, need to maintain visibility of NSIs’ work.
• Technological: use of cloud services, create catalogues of mature tools, ensure the sustainability of tools and create a common set of tools to be used by all NSIs.
• Training: need for documentation (developer-oriented, user-oriented, management-oriented), guidelines and cookbooks.
1.3 Strategy and Policy for an effective (Linked) Open Data
adoption in official statistics
The final activity was focused on making participants think and agree collectively about
the concrete steps to be taken in order to make the ESS vision 2020 a reality, based on
the previous identification of the elements to be tackled. This was performed using a
“From…to” activity, aimed to define the next steps in the short term, and guidelines and
priorities for an action plan. Key aspects and trends were integrated to draft future
scenarios. Each group was given a project to develop, for which it prepared a list of key
aspects (including trends and possible external factors) and drafted future
scenarios.
Participants were divided into 8 groups, each of which discussed one of the following
five topics: (1) Strategy and Policy, (2) People and capabilities, (3) Data and Metadata,
(4) Linked Data Governance, (5) Technology Infrastructure, as well as the following three
proposed “Proofs of Concept” (PoC): (6) Codelists, (7) Data publishing and provenance
and (8) Data integration.
The main discussions held under this activity can be summarised as follows:
• LOD is an area in which NSIs are still largely experimenting. There was broad agreement on the advantages of developing further steps in a coordinated way at ESS level. This can best be achieved through concrete pilots to further demonstrate the feasibility and benefits from LOD and a maturity model to assess advances across NSIs.
• The general lack of skills needs to be addressed through training.
• Governance is a key element in LOD. Common governance approaches and processes for LOD should be developed collaboratively and embedded in existing structures of the ESS, Eurostat and NSIs. The existing governance structures and communities will first be examined.
• Uptake of the LOD approach will be possible if the landscape of formats is clearly understood, a limited number of sources is clearly identified, supporting scripts and ETLs are shared, and agreements on naming are reached.
• Codelists need to be standardised through international collaboration, with good governance established to ensure quality.
• For the PoC on codelists, the steps towards the development of a Linked Open Classification repository were discussed: XKOS profile, a central store for codelists, establish liaisons with different organisations.
• For the PoC on data publication, the need to select already harmonised datasets, export in different formats (including JSON-Stat) and the need for software were identified.
• For the PoC on data integration, similar aspects were also discussed.
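As an illustration of the export formats mentioned for the PoC on data publication, the sketch below builds a small dataset shaped like JSON-stat 2.0 and looks up one value. It was not presented at the workshop and rests on assumptions: the dataset, its dimension names and its values are invented, and only the subset of JSON-stat fields needed here (`id`, `size`, `dimension`, `value`) is used, with values stored in row-major order.

```python
import json

# Hypothetical dataset; areas, years and values are invented for illustration.
dataset = {
    "version": "2.0",
    "class": "dataset",
    "label": "Hypothetical counts by area and year",
    "id": ["area", "year"],          # dimension ids, in storage order
    "size": [2, 2],                  # number of categories per dimension
    "dimension": {
        "area": {"category": {"index": ["MT", "IE"]}},
        "year": {"category": {"index": ["2015", "2016"]}},
    },
    "value": [10, 11, 20, 21],       # row-major: the last dimension varies fastest
}


def lookup(ds: dict, **coords: str):
    """Return the value for one combination of category ids."""
    position = 0
    for dim_id, size in zip(ds["id"], ds["size"]):
        index = ds["dimension"][dim_id]["category"]["index"].index(coords[dim_id])
        position = position * size + index
    return ds["value"][position]


if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
    print(lookup(dataset, area="IE", year="2015"))
```

The same flat `value` array can be read by any client that knows the dimension order and sizes, which is what makes such a format convenient for re-dissemination.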
Annexes
Annex 1: Programme of the Workshop
Wednesday 18 January 2017
11:30 – 12:30 Registration, Networking
12:30 – 13:15 Buffet Lunch
13:15 – 14:00 Welcome address: Mr Reuben Fenech (NSI Malta)
Opening address: Ms Martina Hahn (Eurostat)
Presentation of the Workshop Objectives: Mr Jose Enrique Vila (DevStat) & Ms Christine Kormann (Eurostat)
Session 1: Opportunities and Use Cases
The purpose of this session is to establish the added value of LOD for official statistics and its users, based on existing experience
14:00 – 14:45 Expert Lecture I: Linked Statistical Data 101
Dr Oscar Corcho, Full Professor at the Department of Artificial Intelligence (Madrid)
14:45 – 15:15 Study on LOD requirements: main findings and proofs of concept
Dr Nikolaos Loutas, PwC Data and Analytics
15:15 – 15:45 Coffee Break
15:45 – 16:30 National Experiences
Experience in Italy: Mr Giovanni Barbieri, ISTAT
Experience in the UK: Mr Darren Barnes, ONS
Experience in Ireland: Mr Eoin MacCuirc, NSI Ireland
16:30 – 18:00 Group discussion (First Part)
20:00 Social Dinner
Thursday 19 January 2017 - Morning
Session 2: Challenges & Tools
The purpose of this session is to examine the technical and organisational challenges raised by LOD, available tools and areas for further development
9:30 – 10:15 Expert Lecture II: Linked Statistical Data: challenges and tools
Dr Evangelos Kalampokis, research fellow with the Information Technologies Institute of the Centre for Research & Technology - Hellas (CERTH-ITI)
10:15 – 10:45 Coffee Break
10:45 – 11:45 National Experiences
Experience in France: Mr Franck Cotton, INSEE
LOD approaches in NSIs – Geographic Information perspective: Mr Hannes Reuter, Eurostat
11:45 – 13:00 Group discussion (Second Part)
13:00 – 14:00 Buffet Lunch
Thursday 19 January 2017 - Afternoon
Session 3: Strategy and Policy
The purpose of this session is to discuss the possible strategic orientations for the ESS in the field of LOD, level of ambition, common projects and governance issues
14:00 – 14:30 Expert Lecture III: Where are we going? – Delivering open data in Europe
Mr Eoin MacCuirc, Webmaster at the Data dissemination Unit of CSO Ireland
14:30 – 14:45 Building blocks for an open data strategy of the European Statistical System
Dr Nikolaos Loutas, PwC Data and Analytics
14:45 – 15:45 Group discussion (Third Part)
15:45 – 16:00 Coffee Break
16:00 – 17:00 Presentation of Conclusions
Closing address: Mr Emanuele Baldacci (Eurostat)
Annex 2: Description of workshop sessions
2.1 Opening Session
Chair: Dr José Enrique Vila
2.1.1 Summary of presentations
The welcome address was provided by Mr Reuben Fenech (Director General at the
National Statistical Office of Malta). Ms. Martina Hahn (Head of Unit Methodology and
corporate Architecture in Eurostat) then made the opening presentation,
acknowledging the presence of representatives of several National Statistical Institutes
(NSIs). She presented the main objectives of the workshop, which were focused on
developing a shared understanding of the opportunities and challenges of open data
and Linked Open Data for official statistics, as well as contributing to the elaboration of
the ESS strategy in this field.
She then described the ESS vision 2020 (a common strategic response of
the ESS to the challenges that official statistics are facing, such as the data revolution,
the development of new metrics, or the price of statistics) and the DIGICOM project,
which provides the framework for the organisation of this event. DIGICOM Work Package
3 (WP3) deals with open data dissemination, addressing all open data-related aspects
for the area of official statistics, facilitating and harmonising APIs to European data, and
improving re-dissemination.
Following this workshop, the contractor PriceWaterHouseCoopers will finalise its study
on LOD for official statistics (February 2017) and an initial draft of the ESS open data
strategy will be produced (also for February 2017). An ESSnet is then planned to be
launched in 2017.
A general presentation on open data was also provided by Ms. Christine Kormann
(DIGICOM project manager in Eurostat), ranging from the definition of open data and the 5 stars
of open data proposed by Tim Berners-Lee, to the existence of open data barometers
that help to understand the maturity of open data adoption in Europe, and the
availability of the pan-European open data portal. An interesting point was made about
the fact that open data is now a policy requirement that matches perfectly with the NSIs'
mission to disseminate official statistics, while LOD is a set of design principles that the
ESS may decide to use or apply. A final reflection was made on the high importance of
official statistics in this open data landscape, since statistics are considered high value
datasets for open data portals, according to the G8.
Dr José Enrique Vila (Devstat) then provided a quick presentation of the workshop
logistics and agenda, paying special attention to the fact that the workshop would
contain three group activities facilitated by a mediator (Dr Oscar Corcho) and several
facilitators.
2.2 Session 1: Opportunities and Use Cases
Chair: Dr José Enrique Vila
2.2.1 Objective of the session
The objective of this session was to provide an initial tutorial on the main characteristics
to be considered when applying Linked Open Data principles for the dissemination of
official statistics, and then to reflect on the main opportunities, benefits and added value
that LOD can provide for official statistics and its users, based on the existing experience
from some early adopters.
2.2.2 Summary of presentations
Dr Oscar Corcho (Universidad Politécnica de Madrid) started this session with an initial
tutorial-oriented presentation (an expert lecture) on how to apply Linked Open Data for
the dissemination of official statistics, which he entitled Linked Statistical Data 101. The
presentation was structured in four main blocks: an initial part describing the main
foundations for (Linked) Open Data, then a set of examples from the application of LOD
principles in a regional statistical office in Spain (Aragón), then the foundations of LOD
for official statistics (more specifically, the W3C RDF Data Cube recommendation) and
finally an initial set of ideas to ignite the discussions during the group activity.
[Slide: W3C Data Cube]
Some questions raised after this presentation concerned whether data reusers are
explicitly asking for RDF. Dr Oscar Corcho answered that what they normally
ask for is good, permanent identifiers to refer to the data, and that if RDF is
provided, they do not find it really problematic to deal with such data, since there are
development libraries in many programming languages.
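To make the two points in this answer more concrete, the following is a minimal sketch, not taken from the presentation, of how a single statistical observation could be given a permanent identifier and written as RDF using terms from the W3C RDF Data Cube vocabulary. The base URI `http://example.org/`, the URI pattern, the dataset name and the value are all hypothetical; only the `qb:` and `sdmx-*` vocabulary terms are real.

```python
# Minimal sketch: one observation as RDF Data Cube triples in Turtle syntax.
# The example.org URIs, the URI pattern, the dataset name and the value are hypothetical.

BASE = "http://example.org/"  # assumed base URI for permanent identifiers

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def observation_uri(dataset: str, area: str, period: str) -> str:
    """Build a stable identifier from the dataset name and the dimension values."""
    return f"{BASE}data/{dataset}/{area}/{period}"


def observation_turtle(dataset: str, area: str, period: str, value: int) -> str:
    """Serialise a single qb:Observation as Turtle."""
    obs = observation_uri(dataset, area, period)
    return (
        f"<{obs}> a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}dataset/{dataset}> ;\n"
        f"    sdmx-dimension:refArea <{BASE}area/{area}> ;\n"
        f'    sdmx-dimension:refPeriod "{period}"^^xsd:gYear ;\n'
        f"    sdmx-measure:obsValue {value} .\n"
    )


if __name__ == "__main__":
    print(PREFIXES)
    print(observation_turtle("population", "MT", "2016", 1234))
```

Because the identifier is derived only from the dataset and its dimension values, the same observation keeps the same URI across releases, which is the property reusers were reported to ask for.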
After this initial presentation, Mr Daniel Brulé (PwC) discussed the study that had
been carried out in the previous months by PwC on LOD requirements for official
statistics. The study objectives are the following:
• The identification and study of LOD initiatives in the ESS
• The identification and assessment of implementations, supported use cases and public learning resources for LOD in statistics
• The identification of current initiatives and projects on LOD in Statistics
• The proposal of proofs of concept that demonstrate the benefits of LOD for official statistics
• The identification of a high-level architecture for LOD in statistics
• The definition of a joint LOD strategy at ESS level
The approach taken for the study was also presented, based on desk research, onsite
visits and audioconferences with nine stakeholders coming from NSIs, standardisation
organisations like W3C, and academia.
Some of the initial insights obtained were related to:
• Why NSIs use LOD. Mostly for interconnecting datasets within the NSI, with other NSIs and Eurostat, and to publish official statistics in machine-readable, linkable formats.
• Several use cases are available: metadata catalogues at the EU and national levels, integrated access to EU and BEA data, the Scottish Index of Multiple Deprivation, LOD for fact checking, selecting the best places to invest, or the Digital Agenda Scoreboard.
• Cost structure, for development, maintenance, promotion and licensing.
• Channels, like NSIs portals, endpoints and APIs, and mobile apps.
• Customer relationships, like contests (e.g. hackathons), and feedback.
After these two initial presentations, several presentations on national experiences
were made: Mr Giovanni Barbieri from ISTAT, Mr Darren Barnes from ONS United
Kingdom and Mr Eoin MacCuirc from CSO Ireland.
Mr Giovanni Barbieri described the principles and main design decisions behind the
datiopen.istat.it portal. Some of the benefits highlighted in his presentation
were the opportunity to reinforce trust, getting closer to users, making it easier for users
to retrieve data, providing richer services to users, reaching new users, giving
information back and improving metadata. He then presented use cases from
Italy on spatial querying in mobile apps, federated queries on ISTAT and
ISFRA datasets, and connecting with social media. He concluded that recent
technological advances in the open data community enable new advanced dissemination
channels for official statistics.
Mr Darren Barnes described the current landscape of Linked Data from a UK perspective.
The number of Linked Data providers from the UK (e.g., Government, Geographical data
producers) is relatively small. The main challenges identified are not technical but rather
the lack of skills and training inside NSIs to deal with this type of approach. He also
raised the need for agreed standards and vocabularies to deploy an LOD approach
conveniently: identifiers, data models, vocabularies for dimensions, attributes and
measures, reference data, metadata and API methods. Next steps include proof of
concept demos.
Finally, in the last presentation Mr Eoin MacCuirc discussed the status of open
data and linked open data in Ireland as a whole, not just at the National Statistical Office.
Statements about how to produce knowledge from data were discussed, referring to
the CSO statement of strategy 2016-2018. The presentation also pointed out the
importance of good collaboration with the academic sector. In an analogy to how the
Web was created, the presentation also reflected on the fact that what is now seen
as difficult (publishing linked data and linking it to other datasets) is similar to what was
done at the beginning of the Web.
The next steps for CSO are related to data audits, assisting data publishers (in fact, one
of the last points covered in this presentation was the fact that now CSO is being
contacted by other organisations for help in the publication of their data), and publishing
high value datasets. The main benefits identified for the adoption of LOD were:
• More transparency and accountability of public bodies
• Better data discipline in public bodies, providing for greater efficiency and effectiveness of service delivery
• More citizen participation and inclusion
• Business innovation, business creation and business efficiency, leading to economic growth
• Opportunity for CSO to play a pivotal role
During the question-and-answer session for this presentation, Mr MacCuirc was asked
which high value datasets should be considered for publication first. He answered
that the ones already proposed by the Open Knowledge Foundation (OKFN) for the whole
world are one possible starting point. It is not completely clear yet who the
users are and how they are making use of the Linked Data. In any case, it is extremely
important to work on achieving good data literacy among the population,
starting especially with young people and children.
2.3 Session 2: Challenges and Tools
Chair: Ms Martina Hahn
2.3.1 Objective of the session
The purpose of this session was to examine the technical and organisational challenges
raised by LOD, available tools and areas for further development.
The first presentation, by Evangelos Kalampokis (University of Macedonia), described
existing tools that can be used for publishing, combining and exploiting Linked Statistical
Data, as well as limitations of these tools that should be addressed in order to achieve
the vision of Linked Data Cube Analytics. The second presentation, by Franck Cotton
(INSEE), described the experience of INSEE in France with regard to publishing Linked
Statistical Data. Finally, Hannes Reuter (Eurostat) presented the experiences of Member
States in linking geospatial information with statistics.
2.3.2 Summary of presentations
In the first presentation of the session Evangelos Kalampokis initially motivated the
value of Linked Data technologies in statistics by introducing the vision of Linked Data
Cube Analytics. This vision describes a paradigm where multi-dimensional statistical
datasets are connected through Web standards and users are able to perform
innovative data analytics scenarios on top of multiple datasets.
To this end, various software tools have already been developed. These tools
address requirements related to publishing (e.g. Grafter, TARQL from the OpenCube
toolkit, QBer, CSV2DataCube from the LOD2 Statistical Workbench), combining (e.g. the
OpenCube Compatibility Explorer and the StatSpace Explorer) and exploiting (e.g. the
OpenCube OLAP Browser, the CODE Linked Data Query Wizard) Linked Statistical Data.
Most of the tools cover exploitation-related functionality, while only a few aim at
data integration.
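To illustrate what the publishing step involves, the following is a minimal sketch, not the behaviour of any of the tools named above. It turns each row of a small CSV file into one qb:Observation written as Turtle. The `ex:` namespace, the column names and the toy values are assumptions made for the example, and a complete cube would also need a data structure definition, which is omitted here.

```python
import csv
import io

# Hypothetical input with made-up values: one row per observation.
CSV_TEXT = """area,year,population
A,2011,100
B,2011,200
"""

# qb: is the W3C RDF Data Cube namespace; ex: is a hypothetical namespace.
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/cube/> .\n"
)


def csv_to_observations(text, dataset="ex:population"):
    """Write each CSV row as one qb:Observation in Turtle syntax."""
    blocks = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        blocks.append(
            f"ex:obs{i} a qb:Observation ;\n"
            f"    qb:dataSet {dataset} ;\n"
            f"    ex:refArea ex:{row['area']} ;\n"
            f"    ex:refPeriod \"{row['year']}\" ;\n"
            f"    ex:population {int(row['population'])} .\n"
        )
    return PREFIXES + "\n" + "\n".join(blocks)


turtle = csv_to_observations(CSV_TEXT)
print(turtle)
```

The dimensions (`ex:refArea`, `ex:refPeriod`) and the measure (`ex:population`) are hard-coded here; real converters typically let the publisher map columns to dimensions and measures.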
The characteristics that differentiate the publishing tools include, among others, the user
interface (i.e. graphical user interface or command line interface), the technical format
of the raw datasets (e.g. CSV, JSON-stat, RDBMS), and the structure of the produced
linked data cube. Exploitation tools, on the other hand, are characterised by the type of
analysis (e.g. OLAP operations and map or graph visualisations), the domain of
application (e.g. tourism and health), and the way they are provided (e.g. web application or
standalone tool). The most important category of tools presented was that of
tools enabling the integration and analysis of multiple datasets. The main characteristics
of these tools are the way they identify the datasets to join and the resulting
integrated data.
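As a rough illustration of the integration idea, the sketch below joins two toy cubes on the dimensions they have in common. It is an assumption-laden simplification, not how the tools mentioned above work: cubes are plain Python dictionaries, all names and values are made up, and the shared dimensions are assumed to identify each observation uniquely.

```python
# Toy cubes with made-up values; each observation maps dimension names
# to values and carries exactly one measure.
unemployment = {
    "dimensions": {"refArea", "refPeriod"},
    "measure": "unemploymentRate",
    "observations": [
        {"refArea": "A", "refPeriod": "2015", "unemploymentRate": 9.0},
        {"refArea": "B", "refPeriod": "2015", "unemploymentRate": 5.0},
    ],
}
population = {
    "dimensions": {"refArea", "refPeriod"},
    "measure": "population",
    "observations": [
        {"refArea": "A", "refPeriod": "2015", "population": 1000},
        {"refArea": "C", "refPeriod": "2015", "population": 3000},
    ],
}


def join_cubes(left, right):
    """Join two cubes on their shared dimensions (inner join)."""
    dims = sorted(left["dimensions"] & right["dimensions"])
    if not dims:
        return []  # nothing in common, so the cubes cannot be joined
    # Index the right-hand cube by the values of the shared dimensions.
    index = {tuple(obs[d] for d in dims): obs for obs in right["observations"]}
    joined = []
    for obs in left["observations"]:
        key = tuple(obs[d] for d in dims)
        match = index.get(key)
        if match is not None:
            row = dict(zip(dims, key))
            row[left["measure"]] = obs[left["measure"]]
            row[right["measure"]] = match[right["measure"]]
            joined.append(row)
    return joined


joined = join_cubes(unemployment, population)
print(joined)
```

Only area "A" appears in both cubes for 2015, so the joined result has a single row carrying both measures.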
Although these tools are a major step towards achieving the vision of linked data cube
analytics, Evangelos Kalampokis also presented some challenges that need to be
addressed. Statistical data are modelled using the RDF Data Cube (QB) vocabulary, which
is a W3C standard, but the degrees of freedom of the vocabulary allow data publishers
to follow different practices in applying it. For example, they
define the unit of a measured variable at different levels (that is, qb:DataSet,
qb:MeasureProperty, and qb:Observation) and using different types of properties (for
example, qb:AttributeProperty and qb:MeasureProperty). This results in Linked
Statistical Data silos and in software tools that cannot be reused across different
datasets. A set of guidelines for publishing linked statistical data, currently
being developed in the course of the H2020 OpenGovIntelligence project, was
presented to address these challenges.
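The consequence for tool builders can be sketched as follows. This is a minimal illustration under stated assumptions: triples are plain tuples of prefixed names, the `ex:` resources are hypothetical, and the lookup order (observation, then dataset, then measure property) is a choice made for the example rather than something prescribed by the QB vocabulary. The properties qb:dataSet, qb:measureType and sdmx-attribute:unitMeasure do exist in the respective vocabularies.

```python
UNIT = "sdmx-attribute:unitMeasure"

# Three hypothetical publishers attach the unit at three different levels.
TRIPLES = {
    # Publisher 1: unit on every observation.
    ("ex:obs1", "qb:dataSet", "ex:ds1"),
    ("ex:obs1", UNIT, "ex:percent"),
    # Publisher 2: unit once on the dataset.
    ("ex:obs2", "qb:dataSet", "ex:ds2"),
    ("ex:ds2", UNIT, "ex:persons"),
    # Publisher 3: unit on the measure property.
    ("ex:obs3", "qb:dataSet", "ex:ds3"),
    ("ex:obs3", "qb:measureType", "ex:income"),
    ("ex:income", UNIT, "ex:euro"),
}


def objects(triples, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def resolve_unit(triples, observation):
    """Look for a unit on the observation, its dataset, then its measure."""
    candidates = [observation]
    candidates += objects(triples, observation, "qb:dataSet")
    candidates += objects(triples, observation, "qb:measureType")
    for node in candidates:
        units = objects(triples, node, UNIT)
        if units:
            return units[0]
    return None  # no unit found at any of the three levels


for obs in ("ex:obs1", "ex:obs2", "ex:obs3"):
    print(obs, resolve_unit(TRIPLES, obs))
```

A tool written for only one of these publishing practices would find no unit in the other two datasets, which is one way the silos described above arise.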
Moreover, the complexity of linked data technologies often hampers the wide
exploitation of these software tools and thus the wide adoption and exploitation of
Linked Statistical Data. One effort to address this challenge is the JSON-QB API
developed during the H2020 OpenGovIntelligence project. This API is designed to
help developers use Linked Statistical Data while assuming minimal knowledge of
linked data.
Finally, the performance of the tools was presented as a challenge, especially in
web-based applications, with large datasets and in the execution of federated queries.
In the second presentation of the session, Franck Cotton (INSEE) described the
experience of France in publishing official statistics using linked data technologies.
INSEE, which is the official statistics authority in France, has been experimenting with
RDF since 2006, starting with an official geographic code, a static website
(http://rdf.insee.fr) and an identification namespace (http://id.insee.fr). INSEE
introduced the XKOS vocabulary, which extends SKOS and enables a richer description of
code lists, and published statistical classifications linked to other datasets (e.g. DBpedia
and Geonames) through a SPARQL endpoint. In 2012 INSEE used RDF for its internal
metadata repository and published its first census dataset as linked data, which is
updated every year. INSEE is also very active in the research community: it
participates in research projects (e.g. Datalift) and co-organises the SemStats workshop,
which has taken place every year alongside the International Semantic Web Conference since
2013. The Linked Statistical Data of INSEE are available through zip files, SPARQL
endpoints, and dereferenceable URIs.
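For readers unfamiliar with the last two access routes, the sketch below shows how a client might prepare the corresponding HTTP requests using only the Python standard library. The endpoint and resource URLs are hypothetical placeholders on example.org, not INSEE addresses, and the requests are built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical addresses used only for illustration.
ENDPOINT = "http://example.org/sparql"
RESOURCE = "http://example.org/id/region/1"

QUERY = (
    "SELECT ?s WHERE { ?s a <http://purl.org/linked-data/cube#DataSet> } LIMIT 5"
)


def sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def dereference_request(uri):
    """Build a request that asks the server for a Turtle description of a URI."""
    return Request(uri, headers={"Accept": "text/turtle"})


sparql_req = sparql_request(ENDPOINT, QUERY)
deref_req = dereference_request(RESOURCE)
print(sparql_req.full_url)
print(deref_req.get_header("Accept"))
# Passing either request to urllib.request.urlopen would actually send it.
```

The first request follows the SPARQL Protocol convention of passing the query in a `query` parameter; the second relies on HTTP content negotiation, which is what makes a URI dereferenceable as data rather than only as a web page.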
INSEE participated in the UNECE-HLG Implementing Modernstats Standard project
(2016) with two concrete projects related to linked open metadata. The first one
concerned the creation of an RDF store with international and national classifications from
numerous organisations (e.g. UN and Eurostat) and countries (e.g. France, Slovenia and
Italy). The second project concerned UNECE models and standards such as GSIM and CSPA.
Next steps at INSEE include publishing descriptive metadata as linked data, also taking
quality into account, as well as promoting existing datasets through INSEE’s new
dissemination policy and new open data legislation in France.
After all these years of experimentation with linked data, INSEE believes that the
technology is mature and that publishing not only data but also metadata as linked
data is very useful. Linked Statistical Data provide instant data services to users both
inside and outside INSEE, facilitate data integration, support collaboration with other
publishers, and enable distributed data storage, thus limiting data replication. However,
INSEE has identified several challenges, such as naming things with URIs (including
modelling and versioning of the data) and the need for more “profiled” standards.
In the final presentation of the session, Hannes Reuter (Eurostat) presented the
experiences of individual Member States in linking geospatial information. This type of
information is important because an action on “merging statistics and geospatial
information” has been active since 2012.
In particular, Hannes Reuter presented three cases from the UK, Poland, and Finland. In
the UK, data related to geographic codes, postcode lookup and geographic boundaries
were transformed to RDF and published through a portal. Apart from the RDF data, this
portal also provides a user interface. In the Polish case, the statistical units for which
data can be published were identified, their geometries were harmonised for the respective
years, and they were published as RDF. In Finland, the INSPIRE datasets were published with URIs and a
technical infrastructure has been created for dereferencing the URIs.
Beyond what NSIs have done, Hannes Reuter described the EEA Semantic Data Platform
(http://semantic.eea.europa.eu).
In conclusion, Hannes Reuter emphasised the need to bring together the data, metadata,
and geospatial worlds and mentioned that different approaches and methods can be
followed in the implementation of linked geospatial data.
2.4 Session 3: Strategy and Policy
Chair: Ms Martina Hahn
2.4.1 Objective of the session
The objective of this session was to discuss the possible strategic directions for the ESS
in LOD, the level of ambition, common projects and governance issues.
The session had three parts: first, an expert lecture; second, a presentation of a
study by PwC on developing an ESS LOD strategy, covering the necessary building blocks and
recommending a way forward; and third, an opportunity for a group discussion.
While LOD can deliver benefits for the ESS, NSIs and data (re)users, there are many
elements to consider in ensuring its smooth implementation. NSIs and countries in the ESS
are at different starting points, having different data ecosystems and different levels of
engagement with LOD.
A maturity model approach to delivering LOD nationally and throughout the ESS could
prove optimal. Early adopters can pave the way for a successful implementation of LOD
in the ESS. Strategic test bed projects can deliver well-defined outcomes. Lessons
learned, strategies, tools and resources can be shared. Gradually, in a structured and
systematic way, the new LOD ecosystem would permeate the ESS and deliver its
benefits.
From the session, it was apparent that the LOD journey has already started in the ESS.
The question the session primarily addressed was how the ESS can best guide this
journey.
2.4.2 Summary of presentations
The session started with an expert lecture from Mr Eoin MacCuirc (CSO, Ireland) entitled
“Where are we going? Delivering open data in Ireland and Europe”.
The lecture shared the journey of the CSO from its early days in LOD, through publishing
the Irish Census 2011 as LOD, to Irish involvement in the OpenCube project and current LOD
projects.
The lecture continued with an outline of how Ireland adopted the Open Government
Partnership and the story of Ireland’s open data portal, data.gov.ie. Mr MacCuirc shared
the resources published to guide the strategic development of open data in Ireland.
Though published in 2014 and 2015, the Best Practice Handbook, the Open Data
Publication Handbook and the Technical Framework provide essential information for
publishing open data nationally, and the Roadmap lays out initial steps on such a journey.
Mr MacCuirc finished his lecture with some lessons learned and the open data charter
principles.
The key points raised in the lecture were that publishing LOD is possible in a national
and NSI context, and that resources and tools are available to guide producers in publishing LOD.
Collaboration is key: as with any innovation, there are new skillsets and tools, technical and
infrastructural challenges, and data and metadata challenges, and most NSOs in the ESS do
not currently possess the people, capacity and capabilities required in the LOD sphere.
Mr MacCuirc encouraged those present to embrace an incremental and experimental
approach to publishing LOD and stated that the ESS could play a key role in coordinating
and liaising with international LOD developments.
The expert lecture was followed by the presentation of the study conducted by PwC
“Towards a joint linked open data strategy for the European Statistical System” and
presented by Dr Nikolaus Loutas. The presentation outlined the essential building blocks
of a LOD strategy for NSIs and proposed LOD proofs of concept for collaborative
development in the ESS.
For each building block Dr Loutas began with recommendations and concluded with key
strategic questions.
The presentation outlined a practical approach to developing a joint LOD strategy in the
ESS. Looking at key players in the ESS LOD landscape and at a high-level architecture for
LOD for official statistics the essential building blocks of an ESS LOD strategy were
identified:
• Strategy and policy
• People and Capabilities
• Data and Metadata
• Linked Data Governance
• Technology and infrastructure
The list of recommendations and the key strategic questions for each building block were
proposed for further discussion by the group.
The presentation concluded with a set of proposed proofs of concept for collaborative
development within the ESS in publishing linked open (meta)data:
1. Linking official statistics within an NSO to improve data dissemination
2. Publishing standardised nomenclatures as linked open metadata
3. Linking official statistics with other data (across ESS) to develop value added
services and apps
Each of these proofs of concept was viewed as a quick project involving Eurostat, NSIs
and other LOD actors, allowing some NSIs to dip their toes into LOD while
simultaneously building collaboration, capability and capacity within the ESS LOD
community.
Dr Loutas’ presentation was followed by a group discussion based on the activity
“From...to” selected for this session.
2.5 Closing Session
The closing address was given by Ms Martina Hahn (Eurostat), who underlined that the
objective of raising awareness of the potential and challenges of LOD for official statistics was
clearly met. She mentioned that, besides the 56 onsite participants from 27
countries, approximately 100 additional persons attended the meeting online through
streaming, following the talks and seeing the results of the group discussions.
She thanked all the attendees for their active participation throughout the whole
workshop and for their ideas and suggestions, which undoubtedly contributed to the
success of the Workshop.
Ms Hahn recalled the schedule ahead for DIGICOM WP3 during 2017 and provided the
following general conclusions:
1) LOD is an area in which NSIs are still largely experimenting. It is not yet perceived as mature for full production but there was broad agreement on the advantages of developing further steps in a coordinated way at ESS level.
2) This can best be achieved through a “low-hanging fruit” approach and concrete pilots to further demonstrate the feasibility and benefits from LOD and collect more feedback from users.
3) Existing experience (as compiled in the inventory of practices from PwC) can be shared and reused. Learning from experience beyond the EU (e.g. Japan, Australia) will also be useful.
4) There is a shared understanding of the benefits of LOD among NSIs that have experimented with it.
5) It is time to engage other NSIs, redisseminators, researchers and other actors involved in the data value chain ecosystem. They need to be well informed about the benefits of this approach, and this requires good "marketing" and training material.
6) The combination of LOD and official statistics requires a multidisciplinary approach in which different skills need to be taken into account and trained.
7) Further work is needed to develop good governance strategies on, e.g., standard models, vocabularies, codelists and URIs, building on existing models such as those of the SDMX community.
Annex 3: Methodology of the Group Discussions
Group discussions were considered an important part of the reflection to be
carried out during the conference. For that reason, a specific team was
devoted to analysing the programme and preparing specific methods and techniques to
stimulate debate and obtain the most input from the participants and the best results
in terms of effectiveness and productivity.
The preparation of the discussion groups was carried out in parallel with the definition of
the different items and topics of the programme, and according to the profiles of
participants and the objectives of every part of the Workshop. Several meetings were held
from October 2016 to January 2017 to fine-tune the objectives and expected
results and to adapt the methodology of every session to specific needs.
3.1 Objective of the group discussions
Formal content and presentations are usually just a part of what is hoped for
from this sort of event, since there is probably as much knowledge in the
audience as that provided by the speakers. It is key to provide the environment
and the tools to create trust and a fruitful atmosphere in which participants can contribute
ideas and feedback, and eventually go one step further in the
identification and design of solutions to the needs and challenges posed by the different
topics addressed.
A specific document with detailed information about the objectives for every discussion
was delivered and used for the preparation of the discussions. It is relevant, though, to
cite here the following general aims:
• Set the starting point regarding different perspectives of the situation, reaching
a consensus with regard to causes, effects, and main challenges of the current
context.
• Establish a framework to visualize the key elements in order to start
developing action.
• Harmonize visions and expectations.
• Agree and set up challenges to be addressed.
• Work towards key aspects to face and overcome different challenges, to
bring them into reality.
• Define next steps in the short term, and guidelines and priorities for an
action plan.
3.2 Methodology
Each group discussion followed a specific methodology according to the
objectives and tasks to be performed. The logic of the group discussions was based
on a “Diverge-Converge” framework, in order to gather as many ideas as possible and
then focus them on the most relevant ones or those gathering the highest consensus.
From this, 4 methodologies were implemented:
• Enriched double SWOT (Session 1).
• Lotus Flower (Session 2)
• From-To (Session 3).
• Elevator pitch (to present the results of every team for each of the 3
activities, in 2 or 3 steps versions).
In addition, a specific document was provided for the facilitators to follow and correctly
apply every technique, in order to harmonise the development of the sessions, to
work from the same rules and parameters, and to make the results as
comparable as possible, taking into account the different issues discussed and the
particular profile and dynamics of every discussion. The instructions provided in advance
to the team of moderators and organisers, and also presented during the briefing
of the conference, included the following:
1 Have a flipchart close to each table/group, so that all the members can work
comfortably and participate at the same time.
2 Be completely free to focus on every demanding task (question, mirror, focus,
funnel, scanning and “everyone’s happy” tasks). Please, no comments and no
last-minute issues: no “please inform the group that---”, no “tell them to pick the
menu for dinner…”, etc.
3 Be extremely sensitive to the group’s mood, and feel entitled to act accordingly,
being flexible with timings, results, etc. If a group feels better working a bit differently,
let them get results.
4 Coordinate with the other facilitators. In the end, they are seeking a result
for the whole group, working from smaller pieces. Facilitators are a team
themselves.
5 Collect and keep pictures of every wall, result, etc. While doing the activity this seems
trivial, but the day after, details overlap and become mixed and confusing. Keep a record
of everything.
6 Think about recording the dialogue or parts of the discussion. When you hear a
conversation a second time, you pick up a lot of new details and nuances. Audio
recording can easily be done with a phone, and it is always a useful tool (always
ask for permission, and record only if possible).
7 Controlling time is key: not to count every second, but to be able to do everything
that should be done. A pattern helps to advance, to cover everything, and not to forget.
Smartphones are also a decent tool for this.
8 Have enough resources to be relaxed and ready to get the most out of the group. When
the group activity is on, everyone is working for the facilitators, who must feel
backed and supported.
3.2.1 Session 1 - Activity: Enriched Double SWOT analysis
The Enriched double SWOT analysis activity consisted of the usual elements of a SWOT
grid, plus a proposal to enrich the positive aspects or keep the negative ones under control. It
focused on understanding the benefits that Linked Data may bring and why it is worth
investing in it. Participants first developed their SWOT individually and then, as a group,
worked on a common vision.
Participants had 15 minutes to develop their SWOT individually and then 2 minutes each to
present their vision to the group, without debate and trying not to repeat issues and
concepts that had already been raised. At the same time, the facilitator wrote down the
different elements on the canvas available next to each table, trying to establish links with
the ideas of the other participants. A joint discussion of 15 minutes followed, in which the
group tried to define a common scenario.
The final presentation of the conclusions by each group was made following the
methodology of the “Elevator pitch” (duration: 2 minutes). This methodology implied
that a representative of each group (mainly the facilitator) presented to the whole
audience a short conclusion of the discussion, results, and common agreements reached
by his/her group. The “Elevator pitch” was organised as a climb of “3 floors”:
• 1st Floor (duration: 30 seconds): Composition of the team: who they are, profiles and backgrounds.
• 2nd Floor (duration: 45 seconds): The issues discussed and reflections, presented as bullet points.
• 3rd Floor (duration: 45 seconds): Conclusions, key issues and needs to be addressed.
3.2.2 Session 2 - Activity: Lotus Flower
This activity stimulated participants to carry out an open-minded exercise to identify as
many key factors as possible for addressing challenges, making it easier to share and work
together on specific topics and detailed issues. The Lotus Flower technique helps to
represent and organise ideas, easing the discussion towards consensus and calls for action.
In this activity, every participant worked out the relevant aspects for addressing a specific
challenge; these were afterwards presented to the group, and a final joint proposal was
prepared.
The participants had 15 minutes to develop their Lotus flower individually, taking into
account the challenge assigned, and trying to define the maximum number of elements
needed to face the challenge.
The following step consisted of dividing the group into two subgroups, in which the
members presented the aspects they wished to highlight, in order to start a
joint discussion on the key aspects for dealing with the assigned challenge (duration:
25 minutes). Then, the two subgroups came together again and shared the key
aspects chosen, in order to define a final joint proposal (duration: 15 minutes).
The final presentation of the conclusions by each group was made following the
methodology of the “Elevator pitch” (duration: 2 minutes per group), this time in a climb
of “2 floors”:
• Brief description of the challenge
• Needed elements to address the challenge
3.2.3 Session 3 - Activity: From…to
The final activity focused on making participants think about and agree collectively on
the concrete steps to be taken in order to make the ESS Vision 2020 a reality, based on
the previous identification of the elements to be tackled.
The “From…To” technique aims to produce concrete statements, linked to reality and
strongly focused on the future actions needed to achieve the goals, following a classical
past-present-future timeline logic for discussing and shaping strategy and priorities.
Each group received a project to work on and the following template:
From… Current situation To…
Working as a team, the group had to define a list of key aspects for the development of
the project (duration: 15 minutes), where trends and external factors were also
identified (duration: 10 minutes). Then, working in pairs, the key aspects and trends
were integrated to draft future scenarios, using the template received (duration: 15
minutes).
Then, the group reconvened for 20 minutes to define a common scenario. The
presentation of the final conclusions to the other groups again followed the Elevator
pitch, organised as a climb of “3 floors”:
• 1st Floor (duration: 1 minute): From…To: where we come from and where we want to get to.
• 2nd Floor (duration: 1 minute): Key success factors for the project.
• 3rd Floor (duration: 1 minute): First steps to be taken in an Action Plan.