OGC Sept 2010: meta-propagation of uncertainties within workflows
DESCRIPTION
To begin with, let us quote QA4EO (Quality Assurance for Earth Observation): “If the vision of GEOSS is to be achieved, Quality Indicators (QIs) should be ascribed to data and, in particular, to delivered information products, at each stage of the data processing chain - from collection and processing to delivery. A QI should provide sufficient information to allow all users to readily evaluate a product’s suitability for their particular application, i.e. its “fitness for purpose”. To ensure that this process is internationally harmonised and consistent, the QI needs to be based on a documented and quantifiable assessment of evidence demonstrating the level of traceability to internationally agreed (where possible SI) reference standards. Such standards may be manmade, natural or intrinsic in nature. The documented evidence should include a description of the processes used, together with an uncertainty budget (or other appropriate quality performance measure). The guidelines of QA4EO provide a template and guidance on how to achieve this in a harmonised and robust manner.”
For interoperability purposes, each dataset and process registered within EuroGEOSS possesses appropriate metadata elements. The metadata description and the semantics attached to each component of a workflow (datasets and processing services) allow these components to be updated or swapped. When the quality of the components varies, the outputs of the workflow can become unreliable. Knowing the level of uncertainty in each dataset involved and the sensitivity of each processing step, it is possible to define the quality of a workflow and the level of uncertainty of its outputs by error propagation principles.
Reusing a given model encapsulated in a scientific workflow implies running the workflow with either the same datasets, though not necessarily from the same sources, or different datasets, which do not necessarily have the scale required by the workflow. From error propagation principles and the quality metadata of the workflow components, the effect on workflow quality of using datasets from different sources or at different scales can be assessed. As part of the integrated modelling activity, this assessment helps the modeller choose appropriate datasets or refine the workflow model, for example by adding data assimilation, downscaling or multiple-scale integration steps to the scientific model and its associated workflow. The workflow quality assessment also helps the modeller swap or refine the processing steps. Under these modelling activities, the workflow is seen as the concrete support of a conceptual model and evolves as the conceptual model does. On top of the quality descriptors existing in ISO 19157, the present document describes the requirements for uncertainty analysis within scientific workflows.
TRANSCRIPT
© 2010 Open Geospatial Consortium, Inc.
Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes
OGC Technical Committee
September 20-24, 2010
Toulouse, France
Didier G Leibovici and Amir Pourabdollah
Centre for Geospatial Science
University of Nottingham
OGC®
outline
• integrated modelling / scientific workflow: model building / reusing / user’s perspective / rescaling / quality assessment
• uncertainty / sensitivity analyses for workflows: error propagation / uncertainty analysis / emulator (“metamodelling”) / use of metadata
• metadata for data and for processes: quality metadata / UncertML / quality principles & measures for processes
• metamodel for workflows: notation / encoding / enrichment
• towards Web Workflow Service? WPS / WWS / requirements for workflow assessment
FP7 European project
OGC initiatives related to workflows
• OWS-5 http://www.opengeospatial.org/projects/initiatives/ows-5
conflation workflow and SWE workflow
• OWS-6 http://www.opengeospatial.org/projects/initiatives/ows-6
GeoProcessing Workflow, Decision Support Service
http://www.opengeospatial.org/pub/www/ows6/web_files/ows6.html
OGC OWS-5 conflation workflow
OGC OWS-6 landslide sensor geoprocessing workflow
Debris flow operational scenario
integrated modelling/ scientific workflow
model building
reusing
user’s perspective
multidiscipline
rescaling
quality assessment
uncertainties
integrated modelling/ scientific workflow
• representation BPMN
toy example: greenness model
Data3 = P1(Data1, Data2)
Data3 = P1’(Data1, Data2, Data7)
Data6 = P2(Data3, Data4, Data5)
[BPMN diagram labels: P1’, D7]
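The toy example, including the swap of P1 for P1’, can be sketched as a small table of steps. This is only an illustration: the bodies of `p1`, `p1_prime` and `p2` and all numeric values are hypothetical placeholders, since the slides do not define what the processes compute.

```python
from typing import Callable, Dict, List, Tuple

# Each step maps an output name to (process function, list of input names).
Step = Tuple[Callable[..., float], List[str]]


def p1(d1: float, d2: float) -> float:
    """Hypothetical stand-in for P1."""
    return d1 + d2


def p1_prime(d1: float, d2: float, d7: float) -> float:
    """Hypothetical stand-in for P1', which takes the extra input Data7."""
    return (d1 + d2) * d7


def p2(d3: float, d4: float, d5: float) -> float:
    """Hypothetical stand-in for P2."""
    return d3 * d4 - d5


def run(workflow: Dict[str, Step], data: Dict[str, float]) -> Dict[str, float]:
    """Evaluate steps in insertion order; each step may use earlier outputs."""
    values = dict(data)
    for output, (process, inputs) in workflow.items():
        values[output] = process(*(values[name] for name in inputs))
    return values


workflow: Dict[str, Step] = {
    "Data3": (p1, ["Data1", "Data2"]),
    "Data6": (p2, ["Data3", "Data4", "Data5"]),
}
# Swapping P1 for P1' replaces one entry; the P2 step is untouched.
variant = dict(workflow, Data3=(p1_prime, ["Data1", "Data2", "Data7"]))

data = {"Data1": 1.0, "Data2": 2.0, "Data4": 3.0, "Data5": 4.0, "Data7": 0.5}
result = run(workflow, data)
result_variant = run(variant, data)
```

Keeping the workflow as data rather than as hard-coded calls is what makes a component swap a one-entry change, which is the property the later slides rely on.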
uncertainty / accuracy /sensitivity
• error propagation (via the model)
– variables interaction
– spatial dependence of uncertainties
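For a differentiable model Y = f(X_1, ..., X_n), the usual first-order (Taylor) approximation gives one concrete form of error propagation. The notation is assumed here and does not come from the slides.

```latex
\operatorname{Var}(Y) \approx
  \sum_{i=1}^{n} \left(\frac{\partial f}{\partial x_i}\right)^{2} \sigma_i^{2}
  + 2 \sum_{i<j} \frac{\partial f}{\partial x_i}\,
                 \frac{\partial f}{\partial x_j}\,
                 \operatorname{Cov}(X_i, X_j)
```

Here \sigma_i^2 is the variance of input X_i and the derivatives are evaluated at the input means. Dependence between input variables enters through the covariance terms, as does spatial correlation of errors when the inputs are values at neighbouring locations. Non-additive interaction effects are not captured at first order.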
sensitivity and uncertainty analysis
sampling design and model building
sampling design and propagation
uncertainty / accuracy /sensitivity
• uncertainty analysis: what is the output uncertainty?
• and sensitivity analysis: where does the output uncertainty come from?
1. use quality metadata about the inputs (distribution, variance, ...)
2. design the sampling accordingly
3. look at the output distribution, variance, ... and compare with the inputs
A. using the model
B. using an emulator (see UncertWeb project)
C. can we do a simple estimation without 2 and 3?
for each atomic process
Workflow level
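Steps 1 to 3 with option A (using the model itself) can be sketched as a Monte Carlo run for a single atomic process. The model, the Gaussian assumption and the numbers are hypothetical. Freezing one input at a time is only a crude sensitivity estimate, and it reads as a share of variance only for additive models with independent inputs.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def model(x1: float, x2: float) -> float:
    """Hypothetical atomic process; stands in for any step of a workflow."""
    return 2.0 * x1 + x2


# Step 1: quality metadata about the inputs (assumed Gaussian: mean, std dev).
input_metadata: Dict[str, Tuple[float, float]] = {"x1": (10.0, 1.0), "x2": (5.0, 2.0)}


def sample_output(
    rng: random.Random,
    metadata: Dict[str, Tuple[float, float]],
    n: int,
    frozen: Sequence[str] = (),
) -> List[float]:
    """Steps 2 and 3: sample inputs from their metadata and run the model.

    Inputs named in `frozen` are held at their mean (no uncertainty).
    """
    outputs = []
    for _ in range(n):
        draws = {
            name: mean if name in frozen else rng.gauss(mean, sd)
            for name, (mean, sd) in metadata.items()
        }
        outputs.append(model(**draws))
    return outputs


rng = random.Random(42)
n = 20000

# Uncertainty analysis: what is the output variance?
total_var = statistics.variance(sample_output(rng, input_metadata, n))

# Sensitivity analysis: how much variance disappears when one input is fixed?
var_without_x1 = statistics.variance(sample_output(rng, input_metadata, n, frozen=("x1",)))
var_without_x2 = statistics.variance(sample_output(rng, input_metadata, n, frozen=("x2",)))
share_x1 = 1.0 - var_without_x1 / total_var
share_x2 = 1.0 - var_without_x2 / total_var
```

For this linear model the analytic output variance is 2² × 1² + 2² = 8, with each input contributing half, so the sampled values can be checked against a known answer.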
propagating thematic uncertainty
[Diagram, shown twice: inputs Z, X1, X2, X3 (with hats, i.e. estimates) and output Ŷ. Between the two copies: “variance = / > / < ?”]
propagating thematic uncertainty
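[Diagram, shown twice: inputs Z, X1, X2, X3 (with hats) and output Y, compared using the symbols =, ~, >, <, <<, >> and a question mark. Side labels: X̂1, “sensitivity information”. The formulas in the bullets below were lost in extraction.]
• is … in the “tolerance” of … according to …?
• if … then …
• if …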
propagating thematic uncertainty
[Diagram, shown twice: inputs Z, X1, X2, X3 (with hats) and output Y, compared using the symbols =, ~, >, <, <<, >>.]
Need more than sensitivity information.
Need a kind of meta-sensitivity, i.e. for various sampling variances, a variance transfer function.
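One possible reading of a "variance transfer function" is a table of output variance against input variance, built once per process and then interpolated whenever a new dataset with a different variance is plugged in. The sketch below follows that reading. The process, the Gaussian input and the piecewise-linear interpolation are all assumptions made for illustration.

```python
import random
import statistics
from bisect import bisect_right
from typing import List, Sequence, Tuple


def process(x: float) -> float:
    """Hypothetical nonlinear atomic process."""
    return x * x


def build_transfer_table(
    mean: float, input_variances: Sequence[float], n: int = 20000, seed: int = 0
) -> List[Tuple[float, float]]:
    """Estimate the output variance for each candidate input variance."""
    rng = random.Random(seed)
    table = []
    for var_in in sorted(input_variances):
        sd = var_in ** 0.5
        outputs = [process(rng.gauss(mean, sd)) for _ in range(n)]
        table.append((var_in, statistics.variance(outputs)))
    return table


def transfer(table: List[Tuple[float, float]], var_in: float) -> float:
    """Piecewise-linear interpolation of the tabulated transfer function."""
    xs = [v for v, _ in table]
    if not xs[0] <= var_in <= xs[-1]:
        raise ValueError("input variance outside the tabulated range")
    # Index of the right-hand knot of the interval containing var_in.
    i = min(bisect_right(xs, var_in), len(xs) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (var_in - x0) / (x1 - x0)


# Built once and stored with the process metadata ...
table = build_transfer_table(mean=3.0, input_variances=[0.0, 0.25, 0.5, 1.0])
# ... then reused for a dataset whose quality metadata reports variance 0.4.
estimated = transfer(table, 0.4)
```

For this particular process and a Gaussian input with mean 3, the exact output variance is 4 × 9 × σ² + 2σ⁴, which is 14.72 at σ² = 0.4, so the interpolated estimate can be compared with a known value.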
metadata for data and for processes
• ISO standards (data and services)
19115, 19113, 19114, 19135, 19138, 19119, (19139)
• UncertML (OGC discussion paper)
encoding uncertainty measures
ISO 19113: Quality principles; ISO 19114: Quality evaluation procedures; ISO 19115: Metadata; ISO 19138: Data quality measures; ISO 19135: Registration
metadata for data
Table 1: Data quality elements and data quality sub-elements with definitions (ISO 19113)
metadata for processes (proposal)
Metadata for processes / basic measures
• encoding using the same structure as in ISO19115/ISO19139 for data quality: DQ_element → PQ_element
• registration of measures (ISO19135):
PQ_ConflationInformationLoss, PQ_ThematicClassificationPropagation, PQ_QuantitativeAttributePropagation, PQ_ConceptualSemanticConformance, PQ_DomainConsistency, PQ_TopologicalPreservation
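A process-quality element shaped like a data-quality element might look as follows. The field names are assumptions loosely modelled on DQ_Element, and the example values are invented. Only the six PQ_ measure names come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Measure names listed on the slide, treated here as a small registry.
REGISTERED_MEASURES = {
    "PQ_ConflationInformationLoss",
    "PQ_ThematicClassificationPropagation",
    "PQ_QuantitativeAttributePropagation",
    "PQ_ConceptualSemanticConformance",
    "PQ_DomainConsistency",
    "PQ_TopologicalPreservation",
}


@dataclass
class QuantitativeResult:
    """Hypothetical result holder: a value and its unit."""
    value: float
    unit: str


@dataclass
class PQElement:
    """Process-quality element shaped like a data-quality element.

    Field names are assumptions loosely modelled on DQ_Element.
    """
    element_type: str
    name_of_measure: str
    measure_description: Optional[str] = None
    evaluation_method_type: Optional[str] = None
    results: List[QuantitativeResult] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Only measures present in the registry are accepted.
        if self.element_type not in REGISTERED_MEASURES:
            raise ValueError(f"unregistered measure: {self.element_type}")


element = PQElement(
    element_type="PQ_QuantitativeAttributePropagation",
    name_of_measure="variance transfer",   # hypothetical measure name
    evaluation_method_type="indirect",     # hypothetical value
    results=[QuantitativeResult(value=1.8, unit="ratio")],
)
```

Rejecting unregistered names at construction time mimics, in a very reduced way, the role a register of measures would play.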
Metadata workflow quality / metadata propagation
Dynamic metadata, e.g.
- discrepancy of scales (data chosen vs expected input)
- capitalising on uses: dynamic also by web 2.0
- parameter choices
model building
reusing
user’s perspective
multidiscipline
rescaling
quality assessment
metamodel for workflows
• representing / storing & navigate / execute
• notation: BPMN; encoding: XPDL; enrichment: (extensions); engine: XPDL or BPEL engine
PNML (Petri-Nets)
• enrichment with metadata (quality element)
• enrichment with semantics related to quality (tags)
e.g. greenery / greenness model
XPDL 2.1 process meta-model
attached with quality metadata
XPDL 2.1 linking with BPMN
attached with quality metadata
Extended attributes
• Without namespace
• With namespace
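One plausible reading of the two options is a flat Name/Value pair versus an extended attribute carrying child elements from a separate quality vocabulary. The sketch below builds both with Python's standard `xml.etree.ElementTree`. The `pq` namespace URI, the attribute names and the values are hypothetical, and the XPDL namespace URI should be checked against the schema in use.

```python
import xml.etree.ElementTree as ET

# XPDL 2.1 namespace as commonly published; verify against your schema.
XPDL_NS = "http://www.wfmc.org/2008/XPDL2.1"
# Hypothetical namespace for process-quality metadata.
PQ_NS = "http://example.org/process-quality"

ET.register_namespace("xpdl", XPDL_NS)
ET.register_namespace("pq", PQ_NS)


def q(ns: str, tag: str) -> str:
    """Build a namespace-qualified tag in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


activity = ET.Element(q(XPDL_NS, "Activity"), {"Id": "P1", "Name": "P1"})
attrs = ET.SubElement(activity, q(XPDL_NS, "ExtendedAttributes"))

# Without namespace: a flat Name/Value pair on the extended attribute.
ET.SubElement(
    attrs,
    q(XPDL_NS, "ExtendedAttribute"),
    {"Name": "PQ_DomainConsistency", "Value": "0.95"},
)

# With namespace: structured child content from another vocabulary.
holder = ET.SubElement(attrs, q(XPDL_NS, "ExtendedAttribute"), {"Name": "quality"})
measure = ET.SubElement(holder, q(PQ_NS, "PQ_QuantitativeAttributePropagation"))
measure.text = "0.12"

xml_text = ET.tostring(activity, encoding="unicode")
```

The flat form is simpler to read back, while the namespaced form leaves room for nested quality elements that a plain string value cannot hold.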
BPMN/XPDL Example
Data3= P1(Data1, Data2)
BPMN/XPDL Example – Step 2
Data3 = P1(Data1, Data2)
Data6 = P2(Data3, Data4, Data5)
BPMN/XPDL Example – Step 3
Data3 = P1’(Data1, Data2, Data7)
Data6 = P2(Data3, Data4, Data5)
towards Web Workflow Service?
• needs to easily combine / assess / refine web data/process services
• in a “WPS” fashion (WPS are atomic Workflows)
• and other things: validation using PNML
towards Web Workflow Service?
• WPS executing a workflow
see OWS-5 and OWS-6 (“hard-coded” and/or using a BPEL engine)
• WPS acting like a workflow service. WPS GetCapabilities:
. specific operations stored as available processes (Op)
. list of the workflow processes (Wkf)
the principle is that the Ops inform on a Wkf by returning an enriched XPDL file representing the workflow
• WWS: the “WPS acting” option is unbalanced with respect to the intrinsic properties of the existing processes living in the WPS
towards Web Workflow Service?
• WPS acting like a workflow service. WPS GetCapabilities:
. specific operations stored as available processes (Op)
. list of the workflow processes (Wkf)
the principle is that the Ops inform on a Wkf by returning an enriched XPDL file representing the workflow
1. OpShow Id_Wkf returns the (enriched) XPDL of a Wkf
2. OpSet data/processes (modifiable entries of Wkf) returns the updated XPDL file with the updated metadata (particularly the propagated metadata)
3. OpExecute, same as OpSet but runs the Wkf as an “aggregated process”; returns an XPDL also containing the links to the outputs
4. OpStatus returns the status per node of the Wkf in an XPDL file
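The contract of the four operations can be sketched with an in-memory stand-in. Everything here is hypothetical: a plain dictionary replaces the enriched XPDL document, execution is simulated rather than performed, and the output links are placeholders.

```python
from typing import Dict, List, Optional


class WorkflowStore:
    """In-memory stand-in for a WPS exposing workflow operations.

    A plain dict replaces the enriched XPDL document for brevity.
    """

    def __init__(self) -> None:
        self._workflows: Dict[str, dict] = {}

    def register(self, wkf_id: str, nodes: List[str]) -> None:
        """Store a workflow with unbound, idle nodes."""
        self._workflows[wkf_id] = {
            "id": wkf_id,
            "nodes": {n: {"binding": None, "status": "idle"} for n in nodes},
            "outputs": {},
        }

    def op_show(self, wkf_id: str) -> dict:
        """OpShow: return the workflow description."""
        return self._workflows[wkf_id]

    def op_set(self, wkf_id: str, bindings: Dict[str, Optional[str]]) -> dict:
        """OpSet: bind data or processes to nodes, return the updated description."""
        wkf = self._workflows[wkf_id]
        for node, binding in bindings.items():
            wkf["nodes"][node]["binding"] = binding
        return wkf

    def op_execute(self, wkf_id: str, bindings: Dict[str, Optional[str]]) -> dict:
        """OpExecute: like OpSet, then (here, simulated) run and add output links."""
        wkf = self.op_set(wkf_id, bindings)
        for node, info in wkf["nodes"].items():
            info["status"] = "done"  # a real service would run the node here
            wkf["outputs"][node] = f"results/{wkf_id}/{node}"  # placeholder link
        return wkf

    def op_status(self, wkf_id: str) -> Dict[str, str]:
        """OpStatus: status per node."""
        nodes = self._workflows[wkf_id]["nodes"]
        return {name: info["status"] for name, info in nodes.items()}


store = WorkflowStore()
store.register("greenness", ["P1", "P2"])
status_before = store.op_status("greenness")
shown = store.op_set("greenness", {"P1": "P1_prime"})
done = store.op_execute("greenness", {})
status_after = store.op_status("greenness")
```

Every operation returns the workflow description itself, which mirrors the principle stated above that each Op answers with an enriched representation of the workflow.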
towards Web Workflow Service?
• WWS
• GetCapabilities: OGC generic request
• DescribeWorkflow: request to retrieve the definition of a workflow in a number of standard formats, in which XPDL is the primary choice. It corresponds to OpShow.
• DefineWorkflow: like OpSet, allows setting or modifying a workflow (fixed workflow with user’s inputs, partially modifiable workflow with user’s inputs and swaps of internal processes or data, or user’s workflow)
• ExecuteWorkflow: like OpExecute, launches the execution in “instant” or “delayed” mode, as in WPS, and requests the execution status as XPDL or “other workflow format”.
Parameters to manage the
- different levels of aggregation/hierarchy (e.g. an erosion model may have a precipitation model and a run-off model, among other sub-models)
- incomplete but published conceptual workflows (collaborations)
summary
• integrated modelling / scientific workflow: model building / reusing / user’s perspective / rescaling / quality assessment
• uncertainty / sensitivity analyses for workflows: error propagation / uncertainty analysis / emulator (“metamodelling”) / use of metadata
• metadata for data and for processes: quality metadata / UncertML / quality principles & measures for processes
• metamodel for workflows: notation / encoding / enrichment
• towards Web Workflow Service? WPS / WWS / requirements for workflow assessment
FP7 European project