Developing Statistical Information Systems and XML Information Technologies - Possibilities and Practicable Solutions heikki.rouhuvirta@stat.fi Geneva, 8-10 May 2007 Heikki Rouhuvirta, Statistical Methodology R&D


Page 1: Developing Statistical Information Systems and XML Information Technologies - Possibilities and Practicable Solutions heikki.rouhuvirta@stat.fi Geneva,

Developing Statistical Information Systems and XML Information Technologies

- Possibilities and Practicable Solutions

heikki.rouhuvirta@stat.fi

Geneva, 8-10 May 2007

Heikki Rouhuvirta, Statistical Methodology R&D

Page 2:

01.04.2007 2 Heikki Rouhuvirta

Approaches to Statistics Production

- Sources to statistics – Data Processing
- Sources to statistics – Statistical Methodology
- Statistics as Information

Page 3:

IT in Statistics Production

[Figure: process diagram. Labels: registers; inquiries; other statistical data; dirty data; statistical data set (tilasto-aineisto); datum; compilation / combining of data; logical verifications; imputation etc.; processing into statistical concepts; quality control and approval of data for the purpose of statistics compilation; protection of unit-level data; further processing; analyses; reporting; release.]

Page 4:

Methodological Processing of Statistical Data in Statistics Production

[Figure: flow diagram starting from the sources of statistical data. Labels: registers; data collection; raw data = archival data (I); editing; edited data = archival data (II); quality control, imputation, estimation; final data = archival data (III); compilation; statistical data; tabulation, other computation.]

Page 5:

Statistical Information

[Figure: diagram of statistical information. Labels: statistical information; matrix; KeysFormat; table; questionnaire; form; organisation of data; concept model; statistical report; content of numerical statistical data; evaluation of data quality; file identification data; processing parameter data; content of numerical source data; graphics.]

Page 6:

Challenge:

- create solutions that unite the foregoing points of view
- the solutions offer the services that statistics production needs
- the solutions are easily recognisable by a user and offer an adequate informative basis for each individual task
- with the solutions, the entirety of tasks is manageable for the statistician

Key for Solution:

- exploitation of XML technology

Page 7:

XML Specification for Statistical Information: Common Structure of Statistical Information (CoSSI)

[Figure: diagram of statistical information with XML specifications (DTDs) attached to its parts. Labels: Basics of XML; statistical information; matrix; KeysFormat; table; questionnaire; form; organisation of data; concept model; statistical report; content of numerical statistical data; evaluation of data quality; file identification data; processing parameter data; content of numerical source data; graphics.]

Specifications named in the figure:

- matrix.dtd, KeysFormat.dtd, table.dtd, question.dtd, cxqf.dtd
- statmeta.dtd, quality declaration.dtd, docmeta.dtd
- xyz_procmeta.dtd (taxmeta.dtd, vrkmeta.dtd, ….)
- publication.dtd
- graphics.dtd

Page 8:

… the result from a statistics standpoint …

Page 9:

Statistics Production and Statistical Information

Stages of Processing:

0. Defining
1. Collecting
2. Editing
3. Producing public statistics
4. Using

Model of Data Organisation: matrix module, table module, statmeta module

[Figure labels: basic format: data matrix and description; condensed format: table and description; descriptions in different documents; matrix model including statmeta; table model including statmeta; statistical metadata model; condensing; interpreting.]

Page 10:

… case studies of XML in statistics production …

Page 11:

XML Database and Statistical Information

[Figure: structure of the XML database. Labels: eXist XND; XMLDB statistics; Collection; Instances; Statistics n (in all a. 200...); Open area; Public; Restricted-use; Statistics has writing rights; Statmeta; Procmeta; SAS; Publications, Tables and statmeta; Tables; Publications; Data Descriptions.]

Operations:

- Filtering
- User rights (view)
- Structural retrieval
- Validation
- Scalability

Page 12:

Retrieval of Statistical Metadata for a Variable - Simple User Interface

Page 13:

Browsing the Documents in the XML Database

Page 14:

Saving Documents to XML Database

Page 15:

Event log of XML Database

/db/logs/contents.xml

...
<event timestamp="2007-03-02T10:57:47.941+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
</event>
<event timestamp="2007-03-02T10:57:48.235+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif</path>
</event>
<event timestamp="2007-03-02T10:57:48.898+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.gif</path>
</event>
<event timestamp="2007-03-02T10:57:49.89+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.png</path>
</event>
<event timestamp="2007-03-02T10:58:35.741+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_eq_00.gif</path>
</event>
<event timestamp="2007-03-02T11:26:28.432+02:00">
  <type>UPDATE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
</event>
</events>

Collections and resources as listed on the slide:

/db
/system        admin dba
/config        admin dba
users.xml      admin dba rwurwu---
/Tilastot      admin dba
/logs          admin dba
contents.xml   admin dba rwurwur--
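The event log above is plain XML, so it can be summarised with a few lines of code. The following is a minimal sketch in Python using only the standard library; it is not part of the system described in the slides. The opening `<events>` tag is an assumption (the slide elides the start of the file), and the function name `summarize_events` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Three of the events shown on the slide. The opening <events> tag is an
# assumption: the slide elides the beginning of /db/logs/contents.xml.
LOG_XML = """<events>
  <event timestamp="2007-03-02T10:57:47.941+02:00">
    <type>STORE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
  </event>
  <event timestamp="2007-03-02T10:57:48.235+02:00">
    <type>STORE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif</path>
  </event>
  <event timestamp="2007-03-02T11:26:28.432+02:00">
    <type>UPDATE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
  </event>
</events>"""


def summarize_events(xml_text):
    """Return (count per event type, list of (timestamp, type, path))."""
    root = ET.fromstring(xml_text)
    rows = [
        (ev.get("timestamp"), ev.findtext("type"), ev.findtext("path"))
        for ev in root.iter("event")
    ]
    counts = Counter(kind for _, kind, _ in rows)
    return counts, rows


counts, rows = summarize_events(LOG_XML)
for timestamp, kind, path in rows:
    # Print the timestamp, the event type and the file name only.
    print(timestamp, kind, path.rsplit("/", 1)[-1])
print(dict(counts))
```

A summary of this kind would show, for instance, which publication files were stored or updated on a given day.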

Page 16:

Tabulation Application Architecture in SAS

[Figure: architecture diagram. Labels: Windows workstation; SAS Enterprise Guide; SAS Add-in Tabulate_UI.dll; Visual Studio C#; Web Services SOAP; Java Apache/Tomcat Web Services; SAS server; SAS dataset; SAS macro variable; proc Tabulate; proc Template; ODS XML tagset; XML files (XML-statistical CALS, matrix, ….); metadata server: eXist XMLDB (docmeta, statmeta); dissemination server: eXist XMLDB.]

Page 17:

Tabulation Wizard User Interface in SAS EG

Page 18:

SAS Data Editing Process

[Figure: data editing in SAS EG. Labels: Source Data; SAS datastep; xmlLib, xmlMap, xml engine; SAS dataset; SAS dataset1; SAS dataset2; Data editing/complementing; Metadata editing/complementing; Statistical Data *.xml (matrix.dtd); Statistical Metadata *.xml (statmeta.dtd); XMLDB; A; B; NOT IN USE: SAS metadata repository.]

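Page 19:

Logical schema of an XML file

Statistical data matrix, with statistical units $a_1, \dots, a_n$ as rows and variables $x_1, \dots, x_p$ as columns:

$$
\begin{array}{c|cccccc}
 & x_1 & x_2 & \cdots & x_j & \cdots & x_p \\ \hline
a_1 & x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
a_2 & x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
a_i & x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ip} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
a_n & x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
\end{array}
$$

Parts of the XML file named in the figure:

- Matrix title
- Document metadata
- Statistical data matrix
- Variables
- Class values
- Statistical units
- Statistical metadata
- Footnotes
- XDF
- Data
- Statistical data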

Page 20:

Archiving and Backing Up to XML

[Figure: archiving of statistical data to XML. Labels: SAS Dataset; Data; RDB; xmlLib, xmlMap, xml engine; ODS; XQuery/SQL; XMLDB; XML files (FS or XMLDB); Statistical Data Archives/Backup (matrix.dtd) = Metadata + Numeric Data; Data Description (statmeta.dtd). The figure shows the observation matrix (statistical units as rows, variables as columns) next to the XML outline below.]

Observation Matrix (matrix.dtd) - Content of xml-file:

<?xml version="1.0" encoding="iso-8859-1" ?>
<statmatrix>
  <matrixtitlegrp>
    <matrixtitle>
      <matrixmaintitle>Employees 2001.</matrixmaintitle>
    </matrixtitle>
  </matrixtitlegrp>
  <docmeta>
  <statxdf>
    <array>
      <fieldAxis>
      <axis axisId="axisvar1" axisIdRef="var1">
      <axis axisId="axisvar2" axisIdRef="var2">
      <axis axisId="axisvar3" axisIdRef="var3">
      <read>
      <statmeta>
      <data>
    </array>
  </statxdf>
</statmatrix>

(Outline only: the slide shows the opening tags of docmeta, fieldAxis, axis, read, statmeta and data without their contents or closing tags.)
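To illustrate how an archived observation matrix of this kind could be read back, here is a minimal Python sketch using only the standard library. The sample is a well-formed rendition of the outline shown on the slide. Closing the abbreviated elements as empty elements, and placing the `axis` elements directly inside `array`, are assumptions made only so that the sample parses; the actual matrix.dtd may nest them differently. The function name `describe_matrix` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed rendition of the slide's outline. How the abbreviated elements
# are closed here is an assumption, not the structure defined by matrix.dtd.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" ?>
<statmatrix>
  <matrixtitlegrp>
    <matrixtitle>
      <matrixmaintitle>Employees 2001.</matrixmaintitle>
    </matrixtitle>
  </matrixtitlegrp>
  <docmeta/>
  <statxdf>
    <array>
      <fieldAxis/>
      <axis axisId="axisvar1" axisIdRef="var1"/>
      <axis axisId="axisvar2" axisIdRef="var2"/>
      <axis axisId="axisvar3" axisIdRef="var3"/>
      <read/>
      <statmeta/>
      <data/>
    </array>
  </statxdf>
</statmatrix>"""


def describe_matrix(xml_bytes):
    """Return the matrix title and a mapping from axisId to axisIdRef."""
    root = ET.fromstring(xml_bytes)
    title = root.findtext("matrixtitlegrp/matrixtitle/matrixmaintitle")
    # iter() finds <axis> elements wherever they are nested, so the lookup
    # does not depend on the assumed placement inside <array>.
    axes = {ax.get("axisId"): ax.get("axisIdRef") for ax in root.iter("axis")}
    return title, axes


title, axes = describe_matrix(SAMPLE)
print(title)
for axis_id, variable_ref in axes.items():
    print(axis_id, "->", variable_ref)
```

Because the title and the variable references travel in the same file as the numeric data, a reader of the archive does not need a separate description file to find out what each axis means.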

Page 21:

Example of XQuery/SQL

Page 22:

Content of XML file

Page 23:

Production and Dissemination of Tables in Publishing Process

[Figure: publishing process diagram. Labels, by group:]

- Statistical application: SAS, SuperStar, PX-Edit, PC-Axis, ...
- Outputs of the statistical applications: tables, figures, matrices, PXML
- XMLDB eXist
- Metadata: statistical md, document md, classifications, processing md
- Publication editor: XML-Editor Arbortext
- Publication production (monthly, quarterly and yearly publications, publication tables...)
- Conversions & Publishing: Saxon (XSLT 2.0), XEP (XSL-FO), PX-Web
- Output formats: HTML, PDF, others (RSS, txt...)
- Dissemination database (XMLDB eXist): publications, matrices, tables

Page 24:

XML Publication Editor - User Interface

Page 25:

Retrieval of Statistical Information

1) Specifying Search Terms

Full Text Search (with operators): the words or phrase that best describe the statistical information to be found

2) Focussing Search

Where to search (choose one or more items):

- Table Titles
- Table Contents
- Publication Titles
- Publication Contents
- Graphics
- Data Descriptions
- etc.

3) Defining Result View

Show (choose one or more items):

- Document Titles
- Occurrences
- Link list
- etc.
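The three steps of the retrieval interface (search terms, search focus, result view) can be sketched as a small function. This is a hedged illustration in Python, not the implementation behind the slides: the `Doc` record, the `search` function, the AND/OR handling and the sample documents are all hypothetical, and a real XML database would run the search as a query over the stored documents.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_title: str
    fields: dict  # search area -> text, e.g. "Table Titles" -> "Employees 2001"


def search(docs, terms, scopes, operator="AND"):
    """Return (document title, search area, occurrences) for each hit."""
    combine = all if operator == "AND" else any
    hits = []
    for doc in docs:
        for scope in scopes:
            text = doc.fields.get(scope, "").lower()
            if text and combine(term.lower() in text for term in terms):
                occurrences = sum(text.count(term.lower()) for term in terms)
                hits.append((doc.doc_title, scope, occurrences))
    return hits


# Illustrative documents, not taken from the slides.
docs = [
    Doc("Employment statistics", {
        "Table Titles": "Employees 2001",
        "Data Descriptions": "Number of employees by industry",
    }),
    Doc("Wage statistics", {"Table Titles": "Wages 2001"}),
]

# Step 1: search terms; step 2: where to search; step 3: what to show.
for doc_title, scope, occurrences in search(
    docs, ["employees"], ["Table Titles", "Data Descriptions"]
):
    print(doc_title, "|", scope, "|", occurrences)
```

Restricting the search to named areas such as table titles or data descriptions is possible only because the documents are structured; with flat text the second step would have nothing to select from.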

Page 26:

… and statistical information in tables

Page 27:

Table 1. Statistical Metadata in an informative statistical table (I)

[Figure: schematic statistical table with column headings Variable 1 to Variable 3, row headings Class value 1 and Class value 2, and cells Statistical figure 1 to Statistical figure 8, annotated with the metadata elements listed below.]

- Statistical metadata: title, subtitle, footnote, metadata reference (quality declaration)
- Document metadata elements: subject, keywords, content description, date, identifier
- Statistical metadata elements: name, specification, concept definition, concept definition description, operational definition, operational definition description, calculation name, calculation formula, calculation description, measurement unit, measurement description
- Statistical metadata elements: code, name, description
- Document metadata elements: classification id, type, author, date
- Statistical metadata elements: note
- Register metadata elements: name, concept definition, formation instruction, law, interpretation of law, law cases, etc.
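The element lists above can be read as record types. The sketch below, in Python, models a subset of the elements for a variable (name, specification, concept definition, and so on) and for a class value (code, name, description), and reports which elements are still empty. The class and function names, the choice of elements and the sample values are illustrative assumptions; the actual structure is defined by statmeta.dtd, which the slides do not reproduce.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ClassValue:
    code: str
    name: str
    description: Optional[str] = None


@dataclass
class VariableMetadata:
    # A subset of the statistical metadata elements named on the slide.
    name: str
    specification: Optional[str] = None
    concept_definition: Optional[str] = None
    operational_definition: Optional[str] = None
    calculation_formula: Optional[str] = None
    measurement_unit: Optional[str] = None
    class_values: List[ClassValue] = field(default_factory=list)


def missing_elements(meta):
    """Names of metadata elements that have not been filled in yet."""
    return [
        f.name for f in fields(meta)
        if getattr(meta, f.name) in (None, [], "")
    ]


# Illustrative values, not taken from the slides.
variable = VariableMetadata(
    name="Employees",
    concept_definition="Persons in paid employment",
    measurement_unit="persons",
    class_values=[ClassValue(code="1", name="Class value 1")],
)
print(missing_elements(variable))
```

A check of this kind is one way a structured description can help the statistician see what is still undocumented before a table is released.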

Page 28:

Table 1. Statistical Metadata in an informative statistical table (II)

[Figure: schematic statistical table with column headings Variable 1 to Variable 3, row headings Class value 1 and Class value 2, and cells Statistical figure 1 to Statistical figure 8, with a quality declaration attached.]

Quality declaration

- Quality Indicators: Coefficient of Variation, Value=0.92
- Quality Indicators: Coefficient of Variation, Value=0.87
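The slide does not say how its coefficient of variation values are computed. One common definition is the standard deviation divided by the mean; in survey statistics the standard error of an estimate divided by the estimate is also used. The Python sketch below shows the first definition only, as an assumption, with illustrative numbers that are not taken from the slides.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative numbers, not from the slides
cv = coefficient_of_variation(sample)
print(round(cv, 2))
```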

Page 29:

Table 1. Statistical Metadata in an informative statistical table (III)

[Figure: schematic statistical table with column headings Variable 1 to Variable 3, row headings Class value 1 and Class value 2, and cells Statistical figure 1 to Statistical figure 8, with a quality declaration attached.]

Quality declaration

- Quality Indicators: Coefficient of Variation, Value=0.92
- Quality Indicators: Coefficient of Variation, Value=0.87

Page 30:

Conclusions

XML Based Service Environment in Statistics Production

The statistics production solution briefly described above indicates the kinds of services that a statistical information system could offer in future, both to statisticians and to the users of statistical data. The foundation for statistics production is an XML-based information architecture and standard applications exploiting it.

Basing the implementation of the information architecture on XML allows the use of standard and standard-like specifications, but the special characteristics of statistical information should be taken into account when applying and implementing them. If, for instance, the possibilities of a semantic structural specification are not exploited in the structural analysis and the final structure of statistical data, the solutions become complicated from the point of view of information management and ineffective in practice. From the perspective of application development, it seems especially important that the information architecture itself contains no application-specific data specifications, because a single monolithic application serving both statistics production and information service provision is unlikely.

A semantically relevant structure helps the statistician and the user of statistics to control the correctness of contents.

Page 31:

Thank you for your attention!