Eurostat 4. SDMX: Main objects for data exchange 1 Raynald Palmieri Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, 27-29 October 2015


Page 1: 4. SDMX: Main objects for data exchange

Raynald Palmieri, Eurostat, Unit B5: “Central data and metadata services”

SDMX Basics course, 27-29 October 2015

Page 2: The SDMX Components

• Describe statistics in a standard way: objects and their relationships (Data Structure Definition (DSD), Concepts, Code List)
• Central management and standard access: SDMX Registry, SDMX Web Services
• Cross Domain Concepts, Cross Domain Code Lists, Statistical Domains, Metadata Common Vocabulary

Exchange modes:
• Push: the provider generates and sends a file to the receiver
• Pull: the provider opens a web service to the data; the receiver downloads regularly
• Hub: a special case of pull, where the receiver downloads on end-user request

Page 3: Describing the data exchange

[Diagram labels: Who? What? When? Who? Where? How? What?]

Page 4: Dataflows - classification

• Category: Tourism
• Sub categories
• Statistical Tables = data flows

Page 5: SDMX Implementation steps

• DSD sharing
• Dataflows
• Concepts & Code lists
• SDMX Data Structure Definition

Page 6: SDMX Implementation steps

[Diagram labels: Data Provider?; Provision agreement; Dataflows (definition of flows): Table 1, Table 2, Table 3; Data Structure (definition of table structures)]

Page 7: Dataflows - classification

[Figure: classification tree with the column labels "Categories" and "Dataflows"; entries: Tourism, Capacity, Occupancy, Night_Spent, Arrival_of_residents, Occupancy_rate]

Page 8: Concepts & Codelists: Tourism Example

• What do we want to exchange?
• Statistical tables

Page 9: SDMX Implementation steps: Preparation phase

• DSD sharing
• Dataflows
• Concepts & Code lists
• SDMX Data Structure Definition

Page 10: Model of the statistical table

Number of touristic establishments in Italy, annual data (unit: Number)

Rows: Time. Columns: Indicator.

Time      A100 Hotels and similar   B010 Tourist Campsites   B020 Holiday dwellings
2002A00   33411                     2374                     61479
2003A00   33480                     2530                     58526
2004A00   33518                     2529                     56586
2005A00   33527                     2411                     68385
2006A00   33768                     2510                     68376
2007A00   34058                     2587                     61810

Highlighted cell: 2529 (Tourism establishments, Italy, Annual data; row 2004A00, column B010 Tourist Campsites).
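To make the idea of "modelling" the table concrete, the sketch below flattens the cross-tabulation into one record per cell, so that each value carries everything needed to interpret it. It is only an illustration: the key names (FREQUENCY, COUNTRY, TOURISM_ACTIVITY, TIME, OBS_VALUE) are borrowed from the concept table used later in the course, and treating the column codes A100/B010/B020 as TOURISM_ACTIVITY values is an assumption.

```python
# Illustrative sketch: the table from the slide, keyed by time period and column code.
TABLE = {
    "2002A00": {"A100": 33411, "B010": 2374, "B020": 61479},
    "2003A00": {"A100": 33480, "B010": 2530, "B020": 58526},
    "2004A00": {"A100": 33518, "B010": 2529, "B020": 56586},
    "2005A00": {"A100": 33527, "B010": 2411, "B020": 68385},
    "2006A00": {"A100": 33768, "B010": 2510, "B020": 68376},
    "2007A00": {"A100": 34058, "B010": 2587, "B020": 61810},
}


def to_observations(table, country="IT", frequency="A"):
    """Flatten the cross-tab into one self-describing record per cell."""
    observations = []
    for period, row in table.items():
        for activity, value in row.items():
            observations.append({
                "FREQUENCY": frequency,          # annual data (from the table title)
                "COUNTRY": country,              # Italy (from the table title)
                "TOURISM_ACTIVITY": activity,    # assumed role of the column code
                "TIME": period,
                "OBS_VALUE": value,
            })
    return observations


observations = to_observations(TABLE)
```

Each of the 18 cells becomes one record; the highlighted value 2529 is the record with TIME "2004A00" and column code "B010".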

Page 11: Model of the statistical table: What do we need to do first?

• Sources
– Existing data set tables (from website, from applications)
– Data Collection Instruments (questionnaires/Excel spreadsheets)
– Handbooks, User Guides
– Database Tables
– Existing Data Structure Definitions (from other organisations)
– Legislation/Regulation
• Identify the Concepts
– A concept is a unit of knowledge created by a unique combination of characteristics (SDMX Information Model)

Page 12: Identifying the concepts

[Figure: the statistical table annotated with the concepts TIME, COUNTRY, FREQUENCY, TOURISM_ACTIVITY, TOURISM_INDICATOR, OBS_VALUE, UNIT and OBS_STATUS, and with the flags E and P]

Concept Identifier   Concept name         Format
FREQUENCY            Frequency            A1
COUNTRY              Country              A2
TOURISM_INDICATOR    Tourism Indicator    AN4
TOURISM_ACTIVITY     Tourism Activity     AN4
TIME                 Time Period          N4
OBS_VALUE            Observation          N15
UNIT                 Unit                 AN2
OBS_STATUS           Observation status   A1
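The Format column can be read as a type letter plus a length. The sketch below checks a value against such a format string. It rests on assumptions the slides do not spell out: that A means alphabetic, N means digits only, AN means alphanumeric, and that the number is a maximum length rather than an exact one. The function name `conforms` is hypothetical.

```python
import re

# Assumed reading of the slide's format notation: A = letters, N = digits,
# AN = letters or digits; the trailing number is treated as a maximum length.
FORMAT_PATTERN = re.compile(r"^(AN|A|N)(\d+)$")


def conforms(value: str, fmt: str) -> bool:
    """Return True if value fits a format string such as 'A1', 'AN4' or 'N15'."""
    match = FORMAT_PATTERN.match(fmt)
    if match is None:
        raise ValueError(f"unrecognised format: {fmt!r}")
    kind, max_length = match.group(1), int(match.group(2))
    if not 0 < len(value) <= max_length:
        return False
    if kind == "A":
        return value.isalpha()
    if kind == "N":
        return value.isdigit()
    return value.isalnum()


checks = {
    ("A", "A1"): conforms("A", "A1"),          # FREQUENCY
    ("IT", "A2"): conforms("IT", "A2"),        # COUNTRY
    ("B010", "AN4"): conforms("B010", "AN4"),  # TOURISM_ACTIVITY
    ("2004", "N4"): conforms("2004", "N4"),    # TIME
}
```

Under these assumptions a period written as "2002A00" would not fit N4, while "2002" would.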

Page 13: Concept Scheme

Concept Scheme:

ID: CS_TOURISM
Version: 1.0
Name (English): List of statistical concepts for Tourism tables
Name (French): Liste des concepts statistiques pour les tables Tourisme
Description (English): Concept list to be used for all Tourism tables
Description (French): Liste des concepts valable pour toutes les tables Tourisme

Concept Identifier   Concept name         Format   Code list
FREQUENCY            Frequency            A1       CL_FREQ
COUNTRY              Country              A2       CL_AREA
TOURISM_INDICATOR    Tourism Indicator    AN4      CL_TOUR_INDICATOR
TOURISM_ACTIVITY     Tourism Activity     AN4      CL_TOUR_ACTIVITY
TIME                 Time Period          N4
OBS_VALUE            Observation          N15
UNIT                 Unit                 AN2      CL_UNIT
OBS_STATUS           Observation status   A1       CL_OBS_STATUS
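One way to picture the concept scheme is as a small container of concept records. The sketch below is a plain-Python illustration, not an SDMX library API: the class and field names (`Concept`, `ConceptScheme`, `fmt`, `codelist`) are hypothetical, while the identifiers and values are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Concept:
    id: str
    name: str
    fmt: str                        # format notation as shown on the slide
    codelist: Optional[str] = None  # None for non-coded concepts


@dataclass
class ConceptScheme:
    id: str
    version: str
    names: Dict[str, str]           # language code -> name
    concepts: List[Concept] = field(default_factory=list)

    def get(self, concept_id: str) -> Concept:
        """Look up a concept by identifier; raises KeyError if absent."""
        for concept in self.concepts:
            if concept.id == concept_id:
                return concept
        raise KeyError(concept_id)

    def coded(self) -> List[str]:
        """Identifiers of the concepts that reference a code list."""
        return [c.id for c in self.concepts if c.codelist is not None]


CS_TOURISM = ConceptScheme(
    id="CS_TOURISM",
    version="1.0",
    names={
        "en": "List of statistical concepts for Tourism tables",
        "fr": "Liste des concepts statistiques pour les tables Tourisme",
    },
    concepts=[
        Concept("FREQUENCY", "Frequency", "A1", "CL_FREQ"),
        Concept("COUNTRY", "Country", "A2", "CL_AREA"),
        Concept("TOURISM_INDICATOR", "Tourism Indicator", "AN4", "CL_TOUR_INDICATOR"),
        Concept("TOURISM_ACTIVITY", "Tourism Activity", "AN4", "CL_TOUR_ACTIVITY"),
        Concept("TIME", "Time Period", "N4"),
        Concept("OBS_VALUE", "Observation", "N15"),
        Concept("UNIT", "Unit", "AN2", "CL_UNIT"),
        Concept("OBS_STATUS", "Observation status", "A1", "CL_OBS_STATUS"),
    ],
)
```

Here `CS_TOURISM.coded()` lists six of the eight concepts; TIME and OBS_VALUE carry a format but no code list.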

Page 14: Identify/Define Code Lists

• Purpose of a Code List
– Constrains the value domain of concepts when used in a structure like a data structure definition
– Defines a shortened, language-independent representation of the values
– Gives semantic meaning to the values, possibly in multiple languages
• Agreeing on harmonised code lists is an important aspect of defining a data structure definition

Page 15: Concepts & Codelists: Tourism Example

SDMX Code List

ID: CL_AREA
Version: 1.0
Maintenance Agency: ESTAT
Name (English): List of geographical ISO codes

Code ID   Name (English)
AT        Austria
BE        Belgium
DE        Germany
ES        Spain
FR        France
IT        Italy
PT        Portugal

A code list is a maintainable SDMX container. It is identified uniquely by an ID, a maintenance agency, and a version. The name can be provided in several languages.

Partial code lists can also be exchanged (v2.1). The content of the partial code list is specified in a Constraint.
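As a rough sketch of how a code list constrains a concept's value domain, the snippet below models CL_AREA with the codes shown on the slide. The `CodeList` class and its methods are hypothetical names for illustration, and the fallback to English when a translation is missing is a design assumption, not something the slides prescribe.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CodeList:
    id: str
    agency: str
    version: str
    # code id -> {language code: name}
    codes: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def is_valid(self, code: str) -> bool:
        """A value is acceptable only if it is one of the listed codes."""
        return code in self.codes

    def name(self, code: str, lang: str = "en") -> str:
        """Return the name in the requested language, falling back to English."""
        names = self.codes[code]
        return names.get(lang, names["en"])


CL_AREA = CodeList(
    id="CL_AREA",
    agency="ESTAT",
    version="1.0",
    codes={
        "AT": {"en": "Austria"},
        "BE": {"en": "Belgium"},
        "DE": {"en": "Germany"},
        "ES": {"en": "Spain"},
        "FR": {"en": "France"},
        "IT": {"en": "Italy"},
        "PT": {"en": "Portugal"},
    },
)
```

With this list, `CL_AREA.is_valid("IT")` holds, while a code that is not listed, such as "XX", is rejected.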

Page 16: Exercise: Deriving a concept scheme from a table

Concept Identifier   Concept name   Format   Code list

Page 17: Deriving a concept scheme from a table: Proposed solution

Concept Identifier   Concept name        Format   Code list
FREQUENCY            Frequency           A1       CL_FREQ
UNIT                 Unit                AN2      CL_UNIT
GEO                  Country             A2       CL_GEO
NACE_R2              Economic Activity   AN11     CL_NACE_R2
WASTE                Type of Waste       AN15     CL_WASTE
HAZARD               Hazardous Waste     AN5      CL_HAZARDOUS
OBS_VALUE            Observation         N15
TIME                 Time Period         N4

Page 18: Data Set Structure

• Computers need to know the structure of data in terms of:
– Dimensionality
– Additional metadata
– Measures (Observation)
– Concepts
– Valid content: Code Lists, non-coded format (integer, date, text)

Page 19: Concepts play roles in a Data Structure

• Comprises
– Concepts that identify the observation value: Dimensions
– Concepts that add additional metadata about the observation value (as a value or the context of the value): Attributes
– Concept that is the observation value: Measure
– Any of these may be coded, text, date/time, number, etc.: Representation

Page 20: Deriving a data structure from a table

DIMENSIONS: FREQUENCY, COUNTRY, TOURISM_INDICATOR, TOURISM_ACTIVITY, TIME
ATTRIBUTES: UNIT, OBS_STATUS (E, P in the example table)
MEASURES: OBS_VALUE
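The split into dimensions, attributes and measures can be sketched as a small structure that takes one record apart into its key, its value and its extra metadata. This is an illustration only: the assignment of concepts to roles follows the usual reading of the tourism example and should be treated as an assumption, the sample codes "NR" and "P" for UNIT and OBS_STATUS are hypothetical, and `split_observation` is not an SDMX API.

```python
# Assumed role assignment for the tourism example (illustrative, not authoritative).
TOURISM_STRUCTURE = {
    "dimensions": ["FREQUENCY", "COUNTRY", "TOURISM_INDICATOR", "TOURISM_ACTIVITY", "TIME"],
    "attributes": ["UNIT", "OBS_STATUS"],
    "measure": "OBS_VALUE",
}


def split_observation(record, structure):
    """Split a flat record into (key, value, attributes) according to concept roles."""
    missing = [d for d in structure["dimensions"] if d not in record]
    if missing:
        # Dimensions identify the observation, so none of them may be absent.
        raise ValueError(f"missing dimensions: {missing}")
    key = tuple(record[d] for d in structure["dimensions"])
    value = record[structure["measure"]]
    attributes = {a: record[a] for a in structure["attributes"] if a in record}
    return key, value, attributes


record = {
    "FREQUENCY": "A",
    "COUNTRY": "IT",
    "TOURISM_INDICATOR": "A003",
    "TOURISM_ACTIVITY": "B010",
    "TIME": "2004",
    "OBS_VALUE": 2529,
    "UNIT": "NR",        # hypothetical unit code
    "OBS_STATUS": "P",   # hypothetical status flag
}

key, value, attributes = split_observation(record, TOURISM_STRUCTURE)
```

The dimensions together form the key that identifies the observation; the attributes can be dropped without losing the ability to locate the value.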

Page 21: Data Structure Definition

Data Structure Definition:

ID: TOURISM_A
Version: 1.0
Name (English): Structure of the Tourism table
Name (French): Structure de la table Tourisme
Description (English): Data Structure Definition for Tourism activity
Description (French): Définition de la structure de données pour l'activité Touristique

Page 22: Data Structure Definition - Summary

[Diagram: DSD, Concept Scheme and Code lists, connected by three arrows labelled "Reference"]

Page 23: Data Structure Definition - Design

Data Structure Wizard

• Java desktop application
• Graphical Interface
• For DSD designers
• Maintenance of SDMX v2.0/2.1 data and metadata structures
• Web service to query/submit SDMX registries

Page 24: Publishing DSDs: SDMX Registry

• Graphical User Interface
• Web service

Page 25: Exercise: Consult a DSD

Registry URL (test purpose): https://webgate.acceptance.ec.europa.eu/sdmxregistry/

DSD: WASTE_GENER

Page 26: Exercise: Browse the different objects of the DSD

Codelists:
• CL_FREQ
• CL_GEO_EUCCEFTA
• CL_WASTE
• CL_HAZARD
• CL_NACE_R2_WASTE

Concept Scheme:
• CS_WASTE

DSD:
• WASTE_GENER

Page 27: SDMX Implementation steps

• DSD sharing
• Dataflows
• Concepts & Code lists
• SDMX Data Structure Definition

Page 28: DSD Sharing: Tourism Example

Columns: Table/Concept, FREQ, TOURISM_INDICATOR, TOURISM_ACTIVITY, DEST, DURATION, COUNTRY, PURPOSE, TIME, TINFO, UNIT, UNIT_MULTIPLIER, OBS_STATUS

Rows (values in column order, blank cells omitted):
tour_cap_nat   A x x x x x x x
tour_cap_bed   A x A003 x x x 1000 x
tour_dem_toq   Q x O x x x x x x
tour_dem_exq   Q O x x x x x x x

Page 29: How to achieve DSD sharing? Use of Constraints

The Constraint can define one or both of:
• the Codes in a Code List that are applicable. Ex: (A, M, W, Q) -> (A)
• the list of series keys that are applicable

A Constraint can be used to restrict a DSD when only a subset of the DSD content is meaningful. Constraints are usually linked to the dataflows or the provision agreements.

Example series key:

FREQ   COUNTRY   TOURISM_INDICATOR   TOURISM_ACTIVITY
A      IT        A003                B100
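The two kinds of restriction can be sketched in a few lines: one filter that narrows a code list to the applicable codes, and one check that a series key is among those allowed. The dictionary layout and the function names (`partial_codelist`, `series_allowed`) are hypothetical; only the codes and the example key come from the slide.

```python
# Full frequency code list from the slide's example: (A, M, W, Q).
CL_FREQ = ["A", "M", "W", "Q"]

# Order of the concepts that make up the example series key.
KEY_ORDER = ("FREQ", "COUNTRY", "TOURISM_INDICATOR", "TOURISM_ACTIVITY")

# Hypothetical constraint layout: allowed codes per concept and allowed series keys.
constraint = {
    "allowed_codes": {"FREQ": {"A"}},
    "allowed_series": {("A", "IT", "A003", "B100")},
}


def partial_codelist(codes, concept, constraint):
    """Keep only the codes the constraint allows for this concept."""
    allowed = constraint["allowed_codes"].get(concept)
    if allowed is None:
        return list(codes)  # concept is not constrained
    return [code for code in codes if code in allowed]


def series_allowed(series, constraint):
    """Check a series against both the code restrictions and the key list."""
    for concept, allowed in constraint["allowed_codes"].items():
        if series[concept] not in allowed:
            return False
    keys = constraint.get("allowed_series")
    if keys is None:
        return True
    return tuple(series[c] for c in KEY_ORDER) in keys


annual = {"FREQ": "A", "COUNTRY": "IT", "TOURISM_INDICATOR": "A003", "TOURISM_ACTIVITY": "B100"}
quarterly = dict(annual, FREQ="Q")
```

Applied to the example, `partial_codelist(CL_FREQ, "FREQ", constraint)` leaves only "A", the annual series passes `series_allowed`, and the quarterly variant does not.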

Page 30: Constraints – Example

Columns: Table/Concept, FREQ, TOURISM_INDICATOR, TOURISM_ACTIVITY, DEST, DURATION, COUNTRY, PURPOSE, TIME, TINFO, UNIT, UNIT_MULTIPLIER, OBS_STATUS

Rows (values in column order, blank cells omitted):
tour_cap_nat   A x x x x x x x
tour_cap_bed   A x A003 x x x 1000 x
tour_dem_toq   Q x O x x x x x x
tour_dem_exq   Q O x x x x x x x

Figure labels: DSD_TOUR_CAP_XS, DSD_TOUR_DEM_XS

Page 31: SDMX Dataset

[Figure: the example table, with the flags P and E]

DSD: defines the structure.

Dataset = XML file describing the table content according to the DSD.
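To give a feel for what "an XML file describing the table content" might look like, the sketch below serialises one series from the tourism table using Python's standard library. The element and attribute names (DataSet, Series, Obs, TIME, OBS_VALUE) are simplified for illustration and the output is not schema-valid SDMX-ML; the indicator code A003 is borrowed from the constraint example and is an assumption here.

```python
import xml.etree.ElementTree as ET


def build_dataset(series_key, observations):
    """Build a simplified, SDMX-ML-like XML tree for one series.

    series_key:   dict of dimension name -> code, written as Series attributes
    observations: list of (time period, value) pairs
    """
    dataset = ET.Element("DataSet")
    series = ET.SubElement(dataset, "Series", series_key)
    for period, value in observations:
        ET.SubElement(series, "Obs", {"TIME": period, "OBS_VALUE": str(value)})
    return dataset


dataset = build_dataset(
    {
        "FREQUENCY": "A",
        "COUNTRY": "IT",
        "TOURISM_INDICATOR": "A003",   # assumed indicator code
        "TOURISM_ACTIVITY": "B010",
    },
    [("2002", 2374), ("2003", 2530), ("2004", 2529)],  # B010 column of the table
)

xml_text = ET.tostring(dataset, encoding="unicode")
```

The dimension values are stated once on the series, and each observation only adds the time period and the value, which is what keeps this kind of layout compact.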

Page 32: Syntaxes for SDMX datasets

• Based on a common Information Model
• SDMX-EDI (GESMES/TS)
– EDIFACT syntax
– Time-series oriented
– One format for Data Sets
• SDMX-ML
– XML syntax
– Different formats for Data Sets
– Easier validation (XML based)

Page 33: SDMX-ML formats: Conversions

Equivalent formats, based on the same IM:
• Generic SDMX-ML
• Cross-sectional SDMX-ML
• Compact SDMX-ML

Can be expanded to other formats (e.g. CSV, GESMES).

Page 34: SDMX data common header

Element                     Example
id                          TEST0000
test                        true
truncated                   false
name                        FISH_AQ_TEST
prepared                    2010-30-01T09:30:47+01:00
senderid                    ESTAT
sendername                  Eurostat
sendercontactname           G. Smith
sendercontactdepartment     Statistics
sendercontactrole           Response
sendercontacttelephone      0210 2222222
sendercontactfax            0210 00010999
sendercontactx400
sendercontacturi            www.sdmx.org
sendercontactemail          [email protected]
receiverid                  NSI_GB
receivername                CSO
receivercontactname         P. Mustermann
receivercontactdepartment   Statistics
receivercontactrole         Statistician
receivercontacttelephone    02101234567
receivercontactfax          02103810999
receivercontactx400
receivercontacturi          www.sdmx.org
receivercontactemail        [email protected]
datasetagency               ESTAT
datasetid                   FISH_AQX
datasetaction               Append
extracted                   2010-30-01T09:30:47+01:00
reportingbegin              2008-01-01T00:00:00
reportingend                2008-12-31T00:00:00
source                      DH
lang                        en

Page 35: SDMX 2.0 vs 2.1

Page 36: SDMX-ML formats

Equivalent representations for reporting Datasets

Version 2.0: 4 data messages, each with a distinct format.
• GenericData
• CrossSectionalData
• CompactData
• UtilityData

Version 2.1: 4 data messages, based on two general formats:
• GenericData, GenericTimeSeriesData
• StructureSpecificData, StructureSpecificTimeSeriesData

Figure label: "Phased out"

Page 37: Data Structure Definition (DSD)

• Support for non-time-series data structures
• Measure Dimension

Version 2.0 DSD:
• Dimensions and Measure dimension
• Attributes
• Measures
• Concepts
• Code lists

Version 2.1 DSD:
• Measure Dimension
• Dimensions
• Attributes
• Primary Measure
• Concepts (Concept Scheme)
• Code lists
• Concept role: explicit element

Page 38: Constraint

Version 2.0:
• Constraint is embedded in the object it constrains (Dataflow, Provision agreement)
• Constraint is only available for use in a Registry context

Version 2.1:
• Constraint is independently maintained
• The same Constraint can be “used” to constrain multiple objects (Dataflow, Provision agreement, DSD)

Page 39: Code List (Version 2.1)

[Diagram labels: Common Code list; Constraint 1; Constraint 2; Partial; DSD; DSD]

Page 40: Questions