Developing Statistical Information Systems and XML Information Technologies
- Possibilities and Practicable Solutions
Geneva, 8-10 May 2007
Heikki Rouhuvirta, Statistical Methodology R&D
01.04.2007 2Heikki Rouhuvirta
Approaches to Statistics Production
Sources to statistics – Data Processing
Sources to statistics – Statistical Methodology
Statistics as Information
IT in Statistics Production
[Diagram: flow of data in statistics production. Labels recovered from the figure:]
Sources: registers; inquiries; other statistical data
Processing: dirty data; compilation / combining of data; logical verifications; imputation etc.; processing into statistical concepts; quality control and approval of data for the purpose of statistics compilation; protection of unit-level data
Data: datum; statistical data set (tilasto-aineisto)
Outputs: analyses; reporting; release; further processing (reporting and release each appear twice in the figure)
Methodological Processing of Statistical Data in Statistics Production
[Diagram: processing chain from the sources of statistical data to statistical data. Labels recovered from the figure:]
sources of statistical data: registers; data collection
raw data = archival data (I)
editing
edited data = archival data (II)
quality control; imputation; estimation
final data = archival data (III)
compilation: tabulation; other computation
statistical data
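The chain on this slide archives the data at three points: raw, edited and final. A minimal sketch of that idea follows. The record layout, the editing rule (negative values are treated as missing) and the imputation rule (mean of the observed values) are assumptions made purely for illustration; the slides do not specify any particular editing or imputation method.

```python
from copy import deepcopy
from statistics import mean


def run_pipeline(raw_records):
    """Return the final data and the three archival snapshots (I, II, III)."""
    # raw data = archival data (I)
    archive = {"I": deepcopy(raw_records)}

    # Editing (hypothetical rule): a negative value is treated as missing.
    edited = [
        {**r, "value": None if (r["value"] is not None and r["value"] < 0) else r["value"]}
        for r in raw_records
    ]
    # edited data = archival data (II)
    archive["II"] = deepcopy(edited)

    # Imputation (hypothetical rule): fill missing values with the observed mean.
    # Note: mean() raises StatisticsError if no value at all was observed.
    observed = [r["value"] for r in edited if r["value"] is not None]
    fill = mean(observed)
    final = [{**r, "value": fill if r["value"] is None else r["value"]} for r in edited]
    # final data = archival data (III)
    archive["III"] = deepcopy(final)
    return final, archive
```

Keeping a separate snapshot per stage means that any later stage can be rerun from the archived input of that stage without going back to the sources.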
Statistical Information
[Diagram: components of statistical information. Labels recovered from the figure:]
organisation of data: matrix; KeysFormat; table; questionnaire form
concept model: content of numerical statistical data; evaluation of data quality; file identification data; processing parameter data; content of numerical source data
statistical report
graphics
Challenge:
create solutions that unite the foregoing points of view
the solutions offer the services that statistics production needs
the solutions are easily recognizable by a user and offer an adequate informative basis for each individual task
with the solutions, the whole set of tasks is manageable for the statistician
Key for Solution:
exploitation of XML Technology
XML Specification for Statistical Information: Common Structure of Statistical Information (CoSSI)
[Diagram: components of statistical information and the specifications (DTDs) attached to them. Labels recovered from the figure:]
organisation of data: matrix; KeysFormat; table; questionnaire form
Specifications: matrix.dtd; KeysFormat.dtd; table.dtd; question.dtd; cxqf.dtd
concept model: content of numerical statistical data; evaluation of data quality; file identification data; processing parameter data; content of numerical source data
Specifications: statmeta.dtd; quality declaration.dtd; docmeta.dtd; xyz_procmeta.dtd; taxmeta.dtd; vrkmeta.dtd; ….
statistical report: publication.dtd
graphics: graphics.dtd
Basic of XML
… the result from a statistics standpoint …
Statistics Production and Statistical Information
Stages of Processing
0. Defining
1. Collecting
2. Editing
3. Producing public statistics
4. Using
[Diagram: formats of statistical information across the stages, linked by arrows labelled "condensing" and "interpreting"]
basic format: data matrix and description
condensed format: table and description
descriptions in different documents
Model of Data Organisation
matrix model including statmeta (matrix module)
table model including statmeta (table module)
statistical metadata model (statmeta module)
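The step from the basic format (data matrix) to the condensed format (table) can be illustrated with a small cross-tabulation. This is a minimal sketch: the record layout and the variable names `region` and `sector` are hypothetical, and counting units is only one of many possible ways of condensing a matrix.

```python
from collections import Counter


def condense(matrix, row_var, col_var):
    """Condense unit-level records into a table of counts per class pair."""
    # One (row class, column class) pair per statistical unit.
    counts = Counter((rec[row_var], rec[col_var]) for rec in matrix)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Fill empty cells with 0 so that every row has the same columns.
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


# Hypothetical data matrix: one dict per statistical unit.
units = [
    {"region": "North", "sector": "A"},
    {"region": "North", "sector": "B"},
    {"region": "South", "sector": "A"},
    {"region": "North", "sector": "A"},
]
table = condense(units, "region", "sector")
```

The table keeps only the class values and the counts; the unit-level rows cannot be recovered from it, which is why the description (statmeta) has to travel with both formats.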
… case studies of XML in statistics production …
XML Database and Statistical Information
[Diagram: eXist XND (XMLDB statistics) with an open area and a restricted-use area; Statistics has writing rights. Labels recovered from the figure:]
Collection: Statistics n; Public: Statistics n (in all a. 200...)
Instances: Statmeta; Procmeta; Tables; Publications; Data; Descriptions
Public instances: Publications; Tables and statmeta
SAS
Operations:
- Filtering
- User rights (view)
- Structural retrieval
- Validation
- Scalability
Retrieval of Statistical Metadata for a Variable - Simple User Interface
Browsing Documents in the XML Database
Saving Documents to XML Database
Event log of XML Database
/db/logs/contents.xml
...
<event timestamp="2007-03-02T10:57:47.941+02:00"> <type>STORE</type> <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path> </event>
<event timestamp="2007-03-02T10:57:48.235+02:00"> <type>STORE</type> <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif</path> </event>
<event timestamp="2007-03-02T10:57:48.898+02:00"> <type>STORE</type> <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.gif</path> </event>
<event timestamp="2007-03-02T10:57:49.89+02:00"> <type>STORE</type> <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.png</path> </event>
<event timestamp="2007-03-02T10:58:35.741+02:00"> <type>STORE</type> <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_eq_00.gif</path> </event>
<event timestamp="2007-03-02T11:26:28.432+02:00"> <type>UPDATE</type> <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path> </event>
</events>
/db
/system admin dba
/config admin dba
users.xml admin dba rwurwu---
/Tilastot admin dba
/logs admin dba
contents.xml admin dba rwurwur--
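An event log of this shape is easy to summarise with a few lines of code. The sketch below uses Python's standard ElementTree parser on a two-event sample taken from the log. The enclosing `<events>` start tag is an assumption: the slide shows only the closing tag, and the beginning of the file is elided.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two events copied from the slide; the <events> root start tag is assumed.
SAMPLE_LOG = """<events>
  <event timestamp="2007-03-02T10:57:47.941+02:00">
    <type>STORE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
  </event>
  <event timestamp="2007-03-02T11:26:28.432+02:00">
    <type>UPDATE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
  </event>
</events>"""


def summarize_events(xml_text):
    """Return (timestamp, type, path) tuples and a count of events per type."""
    root = ET.fromstring(xml_text)
    events = [
        (e.get("timestamp"), e.findtext("type"), e.findtext("path"))
        for e in root.iter("event")
    ]
    return events, Counter(kind for _, kind, _ in events)


events, counts = summarize_events(SAMPLE_LOG)
```

The same function could feed a simple audit report, for example listing every UPDATE under a given collection path.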
Tabulation Application Architecture in SAS
[Diagram: components of the tabulation application. Labels recovered from the figure:]
Windows workstation: SAS Enterprise Guide; SAS Add-in Tabulate_UI.dll; Visual Studio C#; Web Services
Web Services: SOAP; Java; Apache/Tomcat
metadata server: eXist XMLDB (docmeta, statmeta)
SAS server: SAS dataset; SAS macro variable; proc Tabulate; proc Template; ODS XML tagset
XML files: statistical CALS; matrix; ….
dissemination server: eXist XMLDB
Tabulation Wizard User Interface in SAS EG
SAS Data Editing Process
[Diagram: data editing with SAS and the XML database. Labels recovered from the figure:]
Source Data; SAS datastep; SAS dataset
xml engine (xmlLib, xmlMap)
Statistical Data *.xml (matrix.dtd)
Statistical Metadata *.xml (statmeta.dtd)
XMLDB
SAS EG: Data editing/complementing (SAS dataset1, SAS dataset2); Metadata editing/complementing
A; B
NOT IN USE: SAS metadata repository
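Logical schema of an XML file
\[
\begin{array}{c|cccccc}
 & x_1 & x_2 & \dots & x_j & \dots & x_p \\
\hline
a_1 & x_{11} & x_{12} & \dots & x_{1j} & \dots & x_{1p} \\
a_2 & x_{21} & x_{22} & \dots & x_{2j} & \dots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
a_i & x_{i1} & x_{i2} & \dots & x_{ij} & \dots & x_{ip} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
a_n & x_{n1} & x_{n2} & \dots & x_{nj} & \dots & x_{np}
\end{array}
\]
Columns: variables x_1 … x_p. Rows: statistical units a_1 … a_n.
Matrix title
Document metadata
Statistical data matrix
Variables
Class values
Statistical units
Statistical metadata
Footnotes
XDF
Data
Statistical data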
Archiving and Backing Up to XML
[Diagram: archiving paths into XML. Labels recovered from the figure:]
SAS Dataset: xml engine (xmlLib, xmlMap); ODS
Data in RDB: XQuery/SQL
XMLDB
XML files: FS or XMLDB
Statistical Data Archives/Backup (matrix.dtd)
Data Description (statmeta.dtd)
Metadata + Numeric Data
[Figure: observation matrix with variables x_1 … x_p as columns and statistical units a_1 … a_n as rows, shown next to each XML file]
Observation Matrix (matrix.dtd) - Content of xml-file (outline, element contents omitted):
<?xml version="1.0" encoding="iso-8859-1" ?>
<statmatrix>
  <matrixtitlegrp>
    <matrixtitle>
      <matrixmaintitle>Employees 2001.</matrixmaintitle>
    </matrixtitle>
  </matrixtitlegrp>
  <docmeta>
  <statxdf>
    <array>
      <fieldAxis>
      <axis axisId="axisvar1" axisIdRef="var1">
      <axis axisId="axisvar2" axisIdRef="var2">
      <axis axisId="axisvar3" axisIdRef="var3">
      <read>
      <statmeta>
      <data>
    </array>
  </statxdf>
</statmatrix>
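To show how an archived matrix file might be read back, the sketch below parses a well-formed completion of the outline. The completion is an assumption: the elements that the slide shows only as opening tags (`docmeta`, `fieldAxis`, `axis`, `read`, `statmeta`, `data`) are written here as empty elements, their real content and nesting under matrix.dtd are not shown in the slides, and the encoding declaration is left out for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the outline; empty elements stand in
# for content that the slide omits.
STATMATRIX = """<statmatrix>
  <matrixtitlegrp>
    <matrixtitle>
      <matrixmaintitle>Employees 2001.</matrixmaintitle>
    </matrixtitle>
  </matrixtitlegrp>
  <docmeta/>
  <statxdf>
    <array>
      <fieldAxis/>
      <axis axisId="axisvar1" axisIdRef="var1"/>
      <axis axisId="axisvar2" axisIdRef="var2"/>
      <axis axisId="axisvar3" axisIdRef="var3"/>
      <read/>
      <statmeta/>
      <data/>
    </array>
  </statxdf>
</statmatrix>"""


def describe_matrix(xml_text):
    """Return the main title and a mapping from axis id to referenced variable."""
    root = ET.fromstring(xml_text)
    title = root.findtext("matrixtitlegrp/matrixtitle/matrixmaintitle")
    axes = {a.get("axisId"): a.get("axisIdRef") for a in root.iter("axis")}
    return title, axes


title, axes = describe_matrix(STATMATRIX)
```

Because the title and the axis references live in the same file as the numeric data, a backup read this way carries its own description.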
Example of XQuery/SQL
Content of XML file
Production and Dissemination of Tables in Publishing Process
[Diagram: publishing process. Labels recovered from the figure:]
Statistical application: SAS; SuperStar; PX-Edit; PC-Axis; ...
tables; matrices; figures
Publication editor: XML-Editor Arbortext
Publication production (monthly, quarterly, yearly publications, publication tables...)
XMLDB eXist
Metadata:
- statistical md
- document md
- classifications
- processing md
Conversions & Publishing: Saxon (XSLT 2.0); XEP (XSL-FO); PX-Web; PXML
Outputs: HTML; PDF; others (RSS, txt...)
XMLDB eXist dissemination database:
- publications
- matrices
- tables
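The conversion step of this process turns XML tables into HTML. In the slides that work is done with XSLT 2.0 (Saxon); the sketch below swaps in plain Python with ElementTree to show the same kind of mapping. It assumes a bare CALS layout (`table/tgroup/thead|tbody/row/entry`); the "statistical CALS" variant used in the project may differ, and the sample content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table in plain CALS markup.
CALS = """<table>
  <title>Example table</title>
  <tgroup cols="2">
    <thead><row><entry>Variable 1</entry><entry>Variable 2</entry></row></thead>
    <tbody><row><entry>Class value 1</entry><entry>Statistical figure 1</entry></row></tbody>
  </tgroup>
</table>"""


def cals_to_html(xml_text):
    """Map a CALS table to an HTML table string (title, header and body rows)."""
    src = ET.fromstring(xml_text)
    html = ET.Element("table")
    caption = src.findtext("title")
    if caption:
        ET.SubElement(html, "caption").text = caption
    # Header entries become <th>, body entries become <td>.
    for section, cell_tag in (("thead", "th"), ("tbody", "td")):
        part = src.find(f"tgroup/{section}")
        if part is None:
            continue
        out = ET.SubElement(html, section)
        for row in part.findall("row"):
            tr = ET.SubElement(out, "tr")
            for entry in row.findall("entry"):
                ET.SubElement(tr, cell_tag).text = entry.text
    return ET.tostring(html, encoding="unicode")


html_table = cals_to_html(CALS)
```

A stylesheet-based conversion has the advantage that the same source can also be sent to XSL-FO for PDF output, which this sketch does not attempt.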
XML Publication Editor - User Interface
Retrieval of Statistical Information
1) Specifying Search Terms
Full Text Search (with operators): the words or phrase that best describe the statistical information to be found
2) Focusing Search
Where to search (choose one or more items):
Table Titles
Table Contents
Publication Titles
Publication Contents
Graphics
Data Descriptions
etc.
3) Defining Result View
Show (choose one or more items):
Document Titles
Occurrences
Linklist
etc.
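The three retrieval steps (search terms, search focus, result view) can be sketched as a single function. This is an illustrative in-memory version only: the `Document` class and its fields are hypothetical, search operators are not implemented (any term counts as a hit), and a real implementation would query the XML database instead.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Maps a searchable part (e.g. "Table Titles") to its text.
    parts: dict = field(default_factory=dict)


def search(documents, terms, scopes, show=("Document Titles",)):
    """Step 1: terms. Step 2: scopes to search. Step 3: what to show."""
    terms = [t.lower() for t in terms]
    results = []
    for doc in documents:
        hits = {}
        for scope in scopes:
            text = doc.parts.get(scope, "").lower()
            n = sum(text.count(t) for t in terms)
            if n:
                hits[scope] = n
        if not hits:
            continue
        row = {}
        if "Document Titles" in show:
            row["title"] = doc.title
        if "Occurrences" in show:
            row["occurrences"] = hits
        results.append(row)
    return results


docs = [
    Document("Employment 2001", {"Table Titles": "Employees by region 2001",
                                 "Table Contents": "employees employees"}),
    Document("Prices", {"Table Titles": "Consumer prices"}),
]
found = search(docs, ["employees"], ["Table Titles", "Table Contents"],
               show=("Document Titles", "Occurrences"))
```

Separating the scope from the result view mirrors the user interface: narrowing where to look does not change what is displayed for each hit.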
… and statistical information in tables
Table 1. Statistical Metadata in an informative statistical table (I)
[Table: schematic statistical table with columns Variable 1, Variable 2 and Variable 3, rows Class value 1 and Class value 2, and cells Statistical figure 1 … Statistical figure 8]
Statistical metadata: title, subtitle, footnote, metadata reference (quality declaration)
Document metadata elements: subject, keywords, content description, date, identifier
Statistical metadata elements: name, specification, concept definition, concept definition description, operational definition, operational definition description, calculation name, calculation formula, calculation description, measurement unit, measurement description
Statistical metadata elements: code, name, description
Document metadata elements: classification id, type, author, date
Statistical metadata elements: note
Register metadata elements: name, concept definition, formation instruction, law, interpretation of law, law cases, etc.
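The metadata elements listed for a variable and for its class values map naturally onto a small data structure. The sketch below uses a subset of the element names from the slide; the class names, the field types and the choice of which elements to include are assumptions for illustration, not the structure defined by statmeta.dtd.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ClassValue:
    # Elements listed on the slide for class values: code, name, description.
    code: str
    name: str
    description: str = ""


@dataclass
class VariableMetadata:
    # A subset of the elements listed on the slide for a variable.
    name: str
    concept_definition: str = ""
    operational_definition: str = ""
    calculation_formula: str = ""
    measurement_unit: str = ""
    class_values: List[ClassValue] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


variable = VariableMetadata(
    name="Variable 1",
    class_values=[ClassValue("1", "Class value 1"), ClassValue("2", "Class value 2")],
)
as_dict = asdict(variable)  # nested plain dicts, ready for serialisation
```

Attaching the class values to the variable keeps the row labels of a table and their descriptions in one place.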
Table 1. Statistical Metadata in an informative statistical table (II)
[Table: schematic statistical table with columns Variable 1, Variable 2 and Variable 3, rows Class value 1 and Class value 2, and cells Statistical figure 1 … Statistical figure 8]
Quality declaration
Quality Indicators: Coefficient of Variation, Value=0.92
Quality Indicators: Coefficient of Variation, Value=0.87
Table 1. Statistical Metadata in an informative statistical table (III)
[Table: schematic statistical table with columns Variable 1, Variable 2 and Variable 3, rows Class value 1 and Class value 2, and cells Statistical figure 1 … Statistical figure 8]
Quality declaration
Quality Indicators: Coefficient of Variation, Value=0.92
Quality Indicators: Coefficient of Variation, Value=0.87
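The quality indicator attached to the table cells is a coefficient of variation. As a reminder of what such a figure expresses, the sketch below computes it as the sample standard deviation divided by the mean. This is the textbook definition and an assumption here: the slides do not say how the values 0.92 and 0.87 were obtained (for survey estimates the CV is often the standard error divided by the estimate), and the input values are made up.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (undefined for mean 0)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


# Hypothetical observations, for illustration only.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

Because the result is a ratio, it has no unit, so indicators for figures measured on different scales can be compared directly.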
Conclusions
XML Based Service Environment in Statistics Production
The statistics production solution briefly described above gives indications of the kinds of services that could be produced from a statistical information system in future, both for statisticians and for the users of statistical data. The foundation for statistics production is an XML-based information architecture and standard applications exploiting it.
Basing the implementation of the information architecture on XML allows utilisation of standard and standard-like specifications, but the special characteristics of statistical information should be taken into consideration in their application and implementation. If, for instance, the possibilities of a semantic structural specification are not exploited in the structural analysis and in the final structure of statistical data, the solutions become complicated from the point of view of information management and ineffective in practice. From the perspective of application development, it seems especially important that the information architecture itself does not contain application-specific data specifications, because we are unlikely to see a situation where just one monolithic application serves both statistics production and information service provision.
A semantically relevant structure helps the statistician and the user of statistics to control the correctness of contents.
Thank you for your attention!