Linked Statistical Data 101

ESS Workshop on dissemination of official statistics as open data

18-19 January 2017, Malta

Oscar Corcho

Escuela Técnica Superior de Ingenieros Informáticos

Universidad Politécnica de Madrid

Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net/

ocorcho@fi.upm.es

Contents

• Foundations of (Linked) Open Data

- For public administrations, in general

- For statistical offices, in particular

• Linked Statistical Data by example

- A use case from IAEST (Aragón Statistical Office)

• A bit of technical background

- W3C RDF DataCube

• Preparing the discussion on benefits for different

types of stakeholders

What is Open Data?

• Open data is data that can be freely used, re-used

and redistributed by anyone - subject only, at most, to

the requirement to attribute and share alike

• Key aspects:

- Availability and access: the data must be available as a

whole and at no more than a reasonable reproduction cost,

preferably by downloading over the Internet. The data must

also be available in a convenient and modifiable form.

- Re-use and redistribution: the data must be provided

under terms that permit re-use and redistribution including

the intermixing with other datasets.

- Universal participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups

[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]

Relevant Legislation. Europe and Spain

• Open Access Initiative (2001). Scientific information; > 510 orgs

• Aarhus Convention (1998). Right to participate and access; 41

countries and the EU

• PSI Directives. PSI reuse (2003/98/EC and 2013/37/EU)

• Council of Europe Convention on Access to Official Documents (2009)

- 12 countries

• Law 37/2007. PSI reuse (transposition of directive 2003/98/EC)

- Modified by Law 18/2015 (BOE 10/07/2015, directive 2013/37/EU)

• Law 11/2007. Citizen access to public services, and rights to good

quality services

• RD 4/2010 Esquema Nacional de Interoperabilidad

- Open standards, technology neutral, open source

• RD 1495/2011. Develops Law 37/2007 for national agencies

• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)

[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]

An Explosion of Open Data Portals

Open Data and how to publish it

1) On a posterboard

- For those with a lot of free time available

- Or those who happen to be there at the right time

Adapted from: Antonio Rodríguez Pascual (IGN)

Open Data and how to publish it

2) On a Web page or mobile app

- For people, but not downloadable

Adapted from: Antonio Rodríguez Pascual (IGN)

Open Data and how to publish it

3) In files

- These can be downloaded and used by humans and by information systems (XML, HTML, CSV, GTFS, etc.)

- Luckily, it is not a scanned PDF

Adapted from: Antonio Rodríguez Pascual (IGN)

Open Data and how to publish it

4) Via Web Services

- They can be used by systems (sometimes persons)

- They allow generating added value

- Ease of integration in the application logic

Adapted from: Antonio Rodríguez Pascual (IGN)

All together…, Shaken, not stirred…

What is Linked Data?

1. Use URIs to identify resources

2. Use HTTP URIs, so that they can be found

3. Use de-referenceable URIs, that is, provide useful data (RDF, JSON, SPARQL)

4. Include links to other URIs.

• http://www.w3.org/DesignIssues/LinkedData.html
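
The third principle is usually met through HTTP content negotiation: the client states in the Accept header which RDF serialisation it would like. The sketch below only builds such a request and does not send it. The helper name build_rdf_request is hypothetical, the example URI is the Ilche municipality URI used later in these slides, and whether a given server honours these media types is an assumption to verify per server.

```python
from urllib.request import Request

# Media types for common RDF serialisations (Turtle, RDF/XML, JSON-LD).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/ld+json;q=0.8"


def build_rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request asking for RDF about `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: only HTTP(S) URIs can be looked up on the Web.
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT})


request = build_rdf_request(
    "http://opendata.aragon.es/recurso/territorio/Municipio/Ilche"
)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup.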

Open Data and how to publish it

5) Via semantically enhanced APIs and Linked Data

- To be used by systems (and sometimes persons)

- It allows generating added-value services

- Standardised formats (JSON, JSON-LD, RDF)

- Standardised models (vocabularies, ontologies)

[Figure: comparison of the publication options, with the labels "Difficult to reuse", "√ Reusable", "Not open", "√ Reusable, open", "Difficult to link together" and "√ Reusable, open, complete, easier to link"]

Data representation formats

And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.

Recap: The 5-star categorisation from TBL

Which type of data and which (re)users?

[Figure: types of data (infrastructure such as cartography, streets, directories and codes; microdata; macrodata; metadata) and types of (re)users (analysts, journalists, citizens, researchers)]

[source: Alberto González Yanes (ISTAC)]

Our use case: Aragón

• IAEST

- Instituto Aragonés de

Estadística

• Good open data

ecosystem

- Aragón Open Data

• http://opendata.aragon.es/

- Zaragoza

• http://datos.zaragoza.es/

Reports and templates from Oracle BI

Current Web application for local statistics

Statistics about municipalities

• On the IAEST website

- http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento

• On Aragón Open Data

- http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas

What have we done?

[Figure: general architecture, with the labels "Transformation process", "Linked Data", "Publication process", "SPARQL", "API" and "Elda"]

This is not the purpose of my talk

https://github.com/aragonopendata/local-data-aragopedia

URIs for datasets

• Let’s look for the dataset on “Number of homes per

owner per municipality”

- Número de hogares por tipo de propietario por municipio

• The dataset has a URI

- http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM

URIs for each observation

• And now we can point to specific observations in this

dataset

- In 2001, the number of buildings owned by one person in the

municipality of Ilche

• http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133
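
The two example URIs suggest that an observation URI embeds the identifier of its dataset. The sketch below derives the dataset URI from an observation URI under that assumption. The function name dataset_uri_for is hypothetical and the URI pattern is inferred from these two examples only, not from documented policy.

```python
from urllib.parse import urlparse

# Path prefixes copied from the two example URIs in the slides.
OBSERVATION_PATH = "/recurso/iaest/observacion/"
DATASET_PATH = "/recurso/iaest/dataset/"


def dataset_uri_for(observation_uri: str) -> str:
    """Derive the dataset URI from an observation URI (assumed pattern)."""
    parts = urlparse(observation_uri)
    if not parts.path.startswith(OBSERVATION_PATH):
        raise ValueError(f"not an observation URI: {observation_uri}")
    # The remainder is expected to look like "<dataset id>/<observation id>".
    remainder = parts.path[len(OBSERVATION_PATH):]
    dataset_id, _, observation_id = remainder.partition("/")
    if not dataset_id or not observation_id:
        raise ValueError(f"missing dataset or observation id: {observation_uri}")
    return f"{parts.scheme}://{parts.netloc}{DATASET_PATH}{dataset_id}"


observation = (
    "http://opendata.aragon.es/recurso/iaest/observacion/"
    "01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133"
)
print(dataset_uri_for(observation))
```

In practice the authoritative link from an observation to its dataset is the qb:dataSet property in the data itself; parsing the URI is only a convenience.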

And links to other URIs in Aragón

• The municipality of Ilche

- http://opendata.aragon.es/recurso/territorio/Municipio/Ilche

- This information is owned by another department of the

Government of Aragón

And links to codelists

• Types of owners

- http://opendata.aragon.es/kos/iaest/clase-de-propietario

• The community

• A person

• A society

• A public organisation

SPARQL endpoint

The female population of Zaragoza aged 0-15 grew until 2013 and then declined

PREFIX qb: <http://purl.org/linked-data/cube#>

# Female population aged 0-15 in Zaragoza, one row per year
SELECT DISTINCT ?year ?personas
WHERE {
  ?x a qb:Observation ;
     qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> ;
     <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year ;
     <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> ;
     <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> ;
     <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres> ;
     <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}
ORDER BY ?year

Examples at

https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
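
A client can send such a query over HTTP and read the answer in the SPARQL JSON results format. The sketch below builds the GET URL defined by the SPARQL 1.1 Protocol and extracts rows from a results document, without contacting any server. The endpoint address is an assumption, the function names are hypothetical, and the sample response contains placeholder values rather than real figures.

```python
import json
from urllib.parse import urlencode

# Assumption: illustrative endpoint address; check the portal for the real one.
ENDPOINT = "http://opendata.aragon.es/sparql"


def build_query_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: a query may be sent via GET in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json: str, variables: list[str]) -> list[tuple[str, ...]]:
    """Extract the bound values from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [tuple(b[v]["value"] for v in variables) for b in bindings]


# Hypothetical response: placeholder values that only illustrate the format.
SAMPLE_RESULTS = """
{
  "head": {"vars": ["year", "personas"]},
  "results": {"bindings": [
    {"year": {"type": "literal", "value": "YEAR-1"},
     "personas": {"type": "literal", "value": "COUNT-1"}},
    {"year": {"type": "literal", "value": "YEAR-2"},
     "personas": {"type": "literal", "value": "COUNT-2"}}
  ]}
}
"""

print(build_query_url(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
for year, personas in rows_from_results(SAMPLE_RESULTS, ["year", "personas"]):
    print(year, personas)
```

A real client would also send the header Accept: application/sparql-results+json so that the endpoint returns this format.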

W3C Data Cube

http://www.w3.org/TR/vocab-data-cube/

DataSets and Observations

Observations in a dataset

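[Figure: RDF graph of one observation. At class level, a qb:Observation points to a qb:DataSet through qb:dataSet. At instance level, iaest-data:01-010003M/22001/030-045 has rdf:type qb:Observation, qb:dataSet iaest:01-010003M (rdf:type qb:DataSet), sdmx:refArea aod:Abiego, iaest:superficieUtil iaest-codelist:superficie030-045 and iaest:numeroHogares "1"^^xsd:int]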
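
The observation in this diagram can be written down as a handful of subject, predicate, object statements. The sketch below lists them in Python using the prefixed names from the slide. The names Triple and observation_triples are hypothetical, the prefixes are left unexpanded because the slides do not give their namespace IRIs, the lowercase spelling of the iaest prefixes is an assumption, and the printed lines are a reading aid rather than guaranteed-valid Turtle.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def observation_triples(obs_id: str, dataset: str, values: dict[str, str]) -> list[Triple]:
    """Return the triples for one qb:Observation attached to `dataset`."""
    triples = [
        Triple(obs_id, "rdf:type", "qb:Observation"),
        Triple(obs_id, "qb:dataSet", dataset),
    ]
    # One triple per dimension or measure value.
    triples += [Triple(obs_id, prop, value) for prop, value in values.items()]
    return triples


triples = observation_triples(
    "iaest-data:01-010003M/22001/030-045",
    "iaest:01-010003M",
    {
        "sdmx:refArea": "aod:Abiego",
        "iaest:superficieUtil": "iaest-codelist:superficie030-045",
        "iaest:numeroHogares": '"1"^^xsd:int',
    },
)
for t in triples:
    print(t.subject, t.predicate, t.obj, ".")
```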

DataCube Structure Definition

Describing the dataset

[Figure: how the dataset is described. At class level, a qb:DataSet points through qb:structure to a qb:DataStructureDefinition, which points through qb:component to qb:ComponentSpecification nodes, each pointing through qb:componentProperty to a qb:ComponentProperty. At instance level, iaest:01-010003M has qb:structure iaest--dsd:01-010003M, whose components declare sdmx:refArea and iaest:superficieUtil with qb:dimension and iaest:numeroHogares with qb:measure]
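
One practical use of a data structure definition is checking that every observation carries a value for each declared dimension and measure. The sketch below is a simplified plain-Python version of that idea and not the SPARQL-based integrity checks of the Data Cube specification. The dictionary layout and the name missing_components are hypothetical; the component names come from the slide.

```python
# Components declared for dataset iaest:01-010003M, as shown on the slide.
DSD = {
    "dimensions": ["sdmx:refArea", "iaest:superficieUtil"],
    "measures": ["iaest:numeroHogares"],
}


def missing_components(observation: dict[str, str], dsd: dict[str, list[str]]) -> list[str]:
    """List the declared dimensions and measures that the observation lacks."""
    required = dsd["dimensions"] + dsd["measures"]
    return [component for component in required if component not in observation]


complete = {
    "sdmx:refArea": "aod:Abiego",
    "iaest:superficieUtil": "iaest-codelist:superficie030-045",
    "iaest:numeroHogares": "1",
}
# Same observation with one dimension removed, to show a failing case.
incomplete = {k: v for k, v in complete.items() if k != "iaest:superficieUtil"}

print(missing_components(complete, DSD))
print(missing_components(incomplete, DSD))
```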

Dimensions

[Figure: dimensions and measures. qb:DimensionProperty and qb:MeasureProperty are linked to qb:ComponentProperty by rdfs:subClassOf. The properties sdmx:refArea, iaest:superficieUtil and iaest:numeroHogares are typed with these classes through rdf:type, and rdfs:range arrows point to esadm:Municipio, iaest:SuperficieUtil and xsd:int. The diagram also shows qb:concept and repeats the chain of qb:DataSet, qb:structure, qb:DataStructureDefinition, qb:component, qb:ComponentSpecification and qb:componentProperty, plus qb:Observation with qb:dataSet]

SKOS Codelists

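[Figure: SKOS code lists. sdmx:CodeList is linked to skos:ConceptScheme by rdfs:subClassOf. iaest:SuperficieUtil is linked by qb:codeList to the code list iaest-codelist:SuperficieUtil, which lists iaest-codelist:superficie030-045, iaest-codelist:superficie046-060 and iaest-codelist:superficie180-mas through skos:hasTopConcept. The rdf:type arrows relate these nodes to sdmx:CodeList and skos:Concept]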
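
A code list makes it possible to reject observation values that are not among the allowed codes for a dimension. The sketch below shows that check in plain Python. Only the three codes visible on the slide are included, so the set is deliberately partial; the name is_valid_code, the dictionary layout and the lowercase prefix spelling are assumptions.

```python
# Codes visible on the slide for the "superficie util" dimension.
# The real code list contains more codes than the three shown there.
CODE_LISTS = {
    "iaest:superficieUtil": {
        "iaest-codelist:superficie030-045",
        "iaest-codelist:superficie046-060",
        "iaest-codelist:superficie180-mas",
    },
}


def is_valid_code(dimension: str, code: str) -> bool:
    """True if `code` belongs to the code list attached to `dimension`."""
    if dimension not in CODE_LISTS:
        raise KeyError(f"no code list registered for {dimension}")
    return code in CODE_LISTS[dimension]


print(is_valid_code("iaest:superficieUtil", "iaest-codelist:superficie030-045"))
print(is_valid_code("iaest:superficieUtil", "iaest-codelist:unknown"))
```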

Why Linked Statistical Data? (I)

• Facilitate data (re)use by developers outside our

organisation

• Data access APIs (according to standards)

• Do they prefer CSVs, PC-Axis, SDMX, RDF?

• Fine-grained data granularity (refer to specific facts)

• Integration with other data sources from other public

or private organisations

- E.g., Government of Aragón for municipalities

• Allow for queries across datasets

- E.g., tell me how many municipalities may benefit from this

funding that I am making available with these restrictions:

number of registered companies lower than 5 and

unemployed population higher than 15%
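
The funding question above works because both datasets identify municipalities with the same URIs, which then act as join keys. The sketch below shows the join and the two thresholds in plain Python. The municipality keys and all figures are hypothetical placeholders, and the function name eligible_municipalities is an assumption; against real data the same logic would be expressed as one SPARQL query over both datasets.

```python
def eligible_municipalities(
    companies: dict[str, int],
    unemployment_rate: dict[str, float],
    max_companies: int = 5,
    min_unemployment: float = 0.15,
) -> list[str]:
    """Municipalities with fewer than `max_companies` registered companies
    and an unemployment rate above `min_unemployment`."""
    # Only municipalities present in both datasets can be evaluated.
    shared = companies.keys() & unemployment_rate.keys()
    return sorted(
        m for m in shared
        if companies[m] < max_companies and unemployment_rate[m] > min_unemployment
    )


# Hypothetical figures, keyed by placeholder municipality identifiers.
companies = {"municipio/A": 3, "municipio/B": 12, "municipio/C": 4}
unemployment = {"municipio/A": 0.18, "municipio/B": 0.20, "municipio/C": 0.10}

print(eligible_municipalities(companies, unemployment))
```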

Why Linked Statistical Data? (II)

• Internal benefits as well

- Codelists are made available and more visible internally

- Methodology and metadata explicitly described as part of

the RDF DataCube data (e.g., reference years in datasets)
