Page 1: USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE

USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE

Matjaž Jug, Pavle Kozjek, Tomaž Špeh
Statistical Office of the Republic of Slovenia

Page 2

Overview

- Current statistical production cycle in SORS
- Using the metadata in Blaise applications
- The role of metadata in automatic editing system in SAS
- Metadata connected with the data in Oracle data warehouse
- Lessons learnt
- Questions

Page 3

Current statistical production cycle

- Entry and micro editing (Blaise)
- Macro and statistical editing (SAS)
- Storing and analysis (Oracle)
- Dissemination (PC-Axis)
- Central metadata stores (Klasje & Metis)

Page 4

Using the metadata in Blaise applications

- Generation of (high-speed) data-entry applications with Gentry (used by non-IT personnel)
- Metadata-based transformations between different data structures (EXTRA FAT, FAT, THIN)

Page 5

Gentry – tool for generation of the Blaise data-entry application

- Questionnaire structure and layout (name, blocks, tables, routing etc.)
- Field characteristics (length, data type, constants, other parameters)

[Figure labels: Field characteristics; Data type]
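The idea of generating a data-entry application from questionnaire metadata can be sketched as follows. This is only an illustration in Python, not Gentry itself: the `FieldMeta` structure, the `generate_datamodel` function and the sample fields are hypothetical, and the emitted text only loosely follows the shape of a Blaise data model.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """Hypothetical metadata record for one questionnaire field."""
    name: str
    label: str
    data_type: str   # "string" or "integer" in this sketch
    length: int      # maximum number of characters or digits


def blaise_type(field: FieldMeta) -> str:
    """Map the metadata type to a Blaise-like type expression."""
    if field.data_type == "string":
        return f"STRING[{field.length}]"
    if field.data_type == "integer":
        # An integer of n digits is expressed as the range 0..99...9
        return f"0..{10 ** field.length - 1}"
    raise ValueError(f"unsupported data type: {field.data_type}")


def generate_datamodel(name: str, fields: list[FieldMeta]) -> str:
    """Build the text of a data model from field metadata."""
    lines = [f"DATAMODEL {name}", "FIELDS"]
    for fld in fields:
        lines.append(f'  {fld.name} "{fld.label}" : {blaise_type(fld)}')
    lines.append("RULES")
    lines.extend(f"  {fld.name}" for fld in fields)
    lines.append("ENDMODEL")
    return "\n".join(lines)


# Illustrative metadata, not taken from a real SORS questionnaire
table12 = [
    FieldMeta("Provider", "Reporting unit", "string", 8),
    FieldMeta("Quantity", "Quantity sold", "integer", 6),
]
model_text = generate_datamodel("Table12", table12)
print(model_text)
```

Keeping the field list as data means a non-IT user would only edit the metadata, while the generator stays unchanged.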

Page 6

Gentry – example of generated application

[Screenshot labels: section; header; Data entry for table 12]

Page 7

Transformations

All data for one unit (provider) in one row (EXTRA FAT): suitable for micro editing

PROVIDER    Industry  SizeClass  QuantityProduct1  ValueEURProduct1  QuantityProduct2  ValueEURProduct2
Provider 1  A         big        150               300               200               400

Classification and continuous variables in the columns (FAT): suitable for analysis

PROVIDER    PRODUCT    Industry  SizeClass  Quantity  ValueEUR
Provider 1  Product 1  A         big        150       300
Provider 1  Product 2  A         big        200       400

Classification variables in the columns and continuous variables in the rows (THIN)

PROVIDER    PRODUCT    Industry  SizeClass  ContVariables  ContObservations
Provider 1  Product 1  A         big        Quantity       150
Provider 1  Product 1  A         big        ValueEUR       300
Provider 1  Product 2  A         big        Quantity       200
Provider 1  Product 2  A         big        ValueEUR       400

Metadata-based transformation in Blaise

Metadata-based transformation in SAS
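A minimal sketch of the two transformations, written in Python rather than Blaise or SAS. The metadata dictionary `META` and the function names are assumptions made for illustration; the sample row is the EXTRA FAT row for Provider 1 shown on this slide.

```python
# Hypothetical metadata describing the EXTRA FAT layout: which columns are
# classification variables, and which products and continuous variables exist.
META = {
    "classification": ["PROVIDER", "Industry", "SizeClass"],
    "products": ["Product 1", "Product 2"],
    "continuous": ["Quantity", "ValueEUR"],
}


def wide_column(variable: str, product: str) -> str:
    """Column name in the EXTRA FAT layout, e.g. QuantityProduct1."""
    return variable + product.replace(" ", "")


def extra_fat_to_fat(row: dict, meta: dict) -> list[dict]:
    """One row per provider -> one row per provider and product."""
    fat_rows = []
    for product in meta["products"]:
        out = {key: row[key] for key in meta["classification"]}
        out["PRODUCT"] = product
        for variable in meta["continuous"]:
            out[variable] = row[wide_column(variable, product)]
        fat_rows.append(out)
    return fat_rows


def fat_to_thin(fat_rows: list[dict], meta: dict) -> list[dict]:
    """Continuous variables move from columns into rows."""
    thin_rows = []
    for row in fat_rows:
        for variable in meta["continuous"]:
            out = {key: row[key] for key in meta["classification"]}
            out["PRODUCT"] = row["PRODUCT"]
            out["ContVariables"] = variable
            out["ContObservations"] = row[variable]
            thin_rows.append(out)
    return thin_rows


extra_fat = {
    "PROVIDER": "Provider 1", "Industry": "A", "SizeClass": "big",
    "QuantityProduct1": 150, "ValueEURProduct1": 300,
    "QuantityProduct2": 200, "ValueEURProduct2": 400,
}
fat = extra_fat_to_fat(extra_fat, META)
thin = fat_to_thin(fat, META)
for thin_row in thin:
    print(thin_row)
```

Because the column roles live in `META`, the same two functions would serve another survey once its metadata is supplied.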

Page 8

The role of metadata in automatic editing system in SAS

General system for automated editing

Process metadata

Page 9

The role of metadata in automatic editing system in SAS

In order to be general, the tool must be able to:
- recognize the data which are due to be subjected to editing and/or imputation;
- recognize which editing method should be applied, and with what parameters.
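One way to read these requirements is as a metadata table that drives a generic routine. The sketch below is in Python rather than SAS, and the rule table, method names, variables and parameters are all hypothetical; it shows only the dispatch idea, not the SORS system.

```python
from statistics import mean

# Hypothetical process metadata: which variable to treat, with which method,
# and with what parameters.
EDIT_RULES = [
    {"variable": "turnover", "method": "mean", "params": {}},
    {"variable": "employees", "method": "constant", "params": {"value": 0}},
]


def impute_mean(values, **params):
    """Replace missing values (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_constant(values, value, **params):
    """Replace missing values (None) with a fixed value."""
    return [value if v is None else v for v in values]


# Registry that resolves the method name stored in the metadata
METHODS = {"mean": impute_mean, "constant": impute_constant}


def run_editing(data: dict, rules: list) -> dict:
    """Apply to each listed variable the method named in the metadata."""
    result = dict(data)
    for rule in rules:
        method = METHODS[rule["method"]]
        result[rule["variable"]] = method(data[rule["variable"]], **rule["params"])
    return result


survey = {
    "turnover": [100, None, 300],
    "employees": [5, None, 12],
    "region": ["A", "B", "A"],   # not listed in the rules, so left untouched
}
edited = run_editing(survey, EDIT_RULES)
print(edited)
```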

Page 10

Process indicators – level 1

Mode of data collection
- 1 data provided directly by reporting unit
- 2 data from administrative source
- 3 data computed from original values
- 4 imputed data – imputation of non-response
- 5 imputed data – imputation due to invalid values detected through the editing process
- 6 data missing because the unit is not eligible for the item (logical skip)

Page 11

Process indicators – level 2

Data status
- 1 original value
- 2 corrected value

Page 12

Process indicators – level 3

Method of data correction
- 11 correction after telephone contact
- 12 data reported at a later stage

Page 13

Process indicators – level 3

Reporting methods
- 11 reporting by mail questionnaire
- 12 computer assisted telephone interview (CATI)
- 13 telephone interview without computer assistance
- 14 paper assisted personal interview (PAPI)
- 15 computer assisted personal interview (CAPI)
- 16 paper assisted self interviewing
- 17 computer assisted self interviewing
- 18 web reporting

Page 14

Process indicators – level 3

Imputation methods
- 10 method of zero values
- 11 logical imputation
- 12 historical data imputation
- 13 mean values imputation
- 14 nearest neighbour imputation
- 15 hot-deck imputation
- 16 cold-deck imputation
- 17 regression imputation
- 18 method of the most frequent value
- 19 estimation of annual value based on infra-annual data
- 21 stochastic hot-deck (random donor)
- 22 regression imputation with random residuals
- 23 multiple imputation

Page 15

Process indicators examples - xy.zz

11.15 means:
1 - data provided directly by reporting unit
11 - original value
11.15 - computer assisted personal interview (CAPI)

42.19 means:
4 - imputed data – imputation of non-response
42 - corrected value
42.19 - estimation of annual value based on infra-annual data
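The xy.zz reading can be sketched as a small decoder in Python. The code lists are copied from the slides. How the level 3 list is selected is an assumption, since the slides do not state it: here imputed data (x = 4 or 5) use the imputation methods, other corrected values use the correction methods, and everything else uses the reporting methods. The function names are hypothetical.

```python
LEVEL1 = {
    1: "data provided directly by reporting unit",
    2: "data from administrative source",
    3: "data computed from original values",
    4: "imputed data - imputation of non-response",
    5: "imputed data - imputation due to invalid values detected "
       "through the editing process",
    6: "data missing because the unit is not eligible for the item (logical skip)",
}
LEVEL2 = {1: "original value", 2: "corrected value"}
CORRECTION = {
    11: "correction after telephone contact",
    12: "data reported at a later stage",
}
REPORTING = {
    11: "reporting by mail questionnaire",
    12: "computer assisted telephone interview (CATI)",
    13: "telephone interview without computer assistance",
    14: "paper assisted personal interview (PAPI)",
    15: "computer assisted personal interview (CAPI)",
    16: "paper assisted self interviewing",
    17: "computer assisted self interviewing",
    18: "web reporting",
}
IMPUTATION = {
    10: "method of zero values",
    11: "logical imputation",
    12: "historical data imputation",
    13: "mean values imputation",
    14: "nearest neighbour imputation",
    15: "hot-deck imputation",
    16: "cold-deck imputation",
    17: "regression imputation",
    18: "method of the most frequent value",
    19: "estimation of annual value based on infra-annual data",
    21: "stochastic hot-deck (random donor)",
    22: "regression imputation with random residuals",
    23: "multiple imputation",
}


def level3_table(x: int, y: int) -> dict:
    """Pick the level 3 code list (selection rule is an assumption)."""
    if x in (4, 5):
        return IMPUTATION
    if y == 2:
        return CORRECTION
    return REPORTING


def decode(indicator: str) -> tuple:
    """Split an 'xy.zz' indicator into its three descriptions."""
    head, tail = indicator.split(".")
    x, y, zz = int(head[0]), int(head[1]), int(tail)
    return LEVEL1[x], LEVEL2[y], level3_table(x, y)[zz]


print(decode("11.15"))
print(decode("42.19"))
```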

Page 16

Statistical process

[Diagram labels: Key responders; Other units; Blaise; SAS; Oracle]

Page 17

Metadata connected with the data in Oracle data warehouse

On-line access to:
- Historical data
- Data from different phases (not only final data)
- Data for multiple surveys (not only data marts)
- Statistical (variables & classifications) and process (time stamps, status indicators...) metadata connected with the data

...accessible for third-party tools

Page 18

Conceptual star schema for SBS

THIN table design

Fact table: FACTS_SBS (OBS_VALUE, WEIGHT)

Dimensions (classification dimensions and metadata dimensions): DIM_NACE, DIM_NUTS, DIM_ORG_FORM, DIM_VARIABLE, DIM_SOURCE, DIM_INDICATOR, DIM_OBS_UNIT, DIM_TIME, ...
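A star design of this kind could be queried roughly as below. This is a sketch in Python using the standard `sqlite3` module in place of Oracle. The table names and the OBS_VALUE and WEIGHT columns come from the slide; the key columns, the `code` and `name` columns, and all sample rows are invented for illustration, and only three of the dimensions are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DIM_NACE (nace_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE DIM_VARIABLE (variable_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE DIM_INDICATOR (indicator_id INTEGER PRIMARY KEY, code TEXT);
-- THIN design: one fact row per observed value of one variable
CREATE TABLE FACTS_SBS (
    nace_id INTEGER REFERENCES DIM_NACE(nace_id),
    variable_id INTEGER REFERENCES DIM_VARIABLE(variable_id),
    indicator_id INTEGER REFERENCES DIM_INDICATOR(indicator_id),
    OBS_VALUE REAL,
    WEIGHT REAL
);
""")

# Invented sample rows
conn.executemany("INSERT INTO DIM_NACE VALUES (?, ?)", [(1, "C"), (2, "G")])
conn.executemany("INSERT INTO DIM_VARIABLE VALUES (?, ?)",
                 [(1, "Turnover"), (2, "Employees")])
conn.executemany("INSERT INTO DIM_INDICATOR VALUES (?, ?)",
                 [(1, "11.15"), (2, "42.19")])
conn.executemany("INSERT INTO FACTS_SBS VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 100.0, 2.0),
    (1, 1, 2, 50.0, 2.0),   # imputed value, process indicator 42.19
    (2, 1, 1, 80.0, 1.5),
    (1, 2, 1, 10.0, 2.0),
])

# Weighted totals by activity and variable
totals = conn.execute("""
    SELECT n.code, v.name, SUM(f.OBS_VALUE * f.WEIGHT)
    FROM FACTS_SBS f
    JOIN DIM_NACE n ON n.nace_id = f.nace_id
    JOIN DIM_VARIABLE v ON v.variable_id = f.variable_id
    GROUP BY n.code, v.name
    ORDER BY n.code, v.name
""").fetchall()

# Process metadata stored next to the data: count the imputed observations
imputed = conn.execute("""
    SELECT COUNT(*)
    FROM FACTS_SBS f
    JOIN DIM_INDICATOR i ON i.indicator_id = f.indicator_id
    WHERE i.code LIKE '4%' OR i.code LIKE '5%'
""").fetchone()[0]

print(totals)
print(imputed)
```

Because the process indicator is a dimension of the fact table, any SQL-capable tool can filter or count by data status without a separate metadata lookup.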

Page 19

[Data flow diagram; layout not recoverable. Labels: Loading, Transactional star, Editing table, Input tables, Oracle Discoverer, Analytical cubes, Imputations, Incorrect data, Imputation table, Input data, Editing, Clean data, Corrected data, Automatic corrections, Editing form, Manual corrections, Classification server, Business register, Searching, Imputed data, Classifications, Metadata server, Variables, Sources, ..., Extractions, Analytical data, Extracted data, Analysis Results, Control query, Analytical query, Parameter query]

Page 20

Lessons learnt

The role of central repositories for metadata
- Natural source of conceptual metadata
- Metadata have to be exact, complete and consistent
- Process metadata should be connected with the data

Harmonisation of metadata concepts
- Local metadata vs. global metadata
- A cultural change is needed

Technical considerations
- The possibilities for metadata exchange and system integration are good (XML, SQL)

Page 21

Questions