Stetl for INSPIRE Data Transformation

INSPIRE Transformation with Stetl - A lightweight Python Framework for Geospatial ETL Just van den Broecke EuroGeographics - KEN Workshop Paris, Oct 8, 2013 www.justobjects.nl

Upload: just-van-den-broecke

Post on 11-May-2015


DESCRIPTION

Slides of a presentation given at the EuroGeographics KEN workshop on INSPIRE Data Harmonization, Paris, Oct 8-9, 2013: http://www.eurogeographics.org/event/inspire-ken-schema-transformation-workshop. Describes the Stetl ETL framework and cases of INSPIRE transformation. A video recording of this presentation is available at https://www.youtube.com/watch?v=vjdpYBm4AaM (the first half covers XSLT; Stetl for INSPIRE starts about halfway).

TRANSCRIPT

Page 1: Stetl for INSPIRE Data Transformation

INSPIRE Transformation with Stetl -

A lightweight Python Framework for Geospatial ETL

Just van den Broecke

EuroGeographics - KEN Workshop

Paris, Oct 8, 2013

www.justobjects.nl

Page 2: Stetl for INSPIRE Data Transformation

About Me

Independent Open Source Geospatial Professional

Secretary OSGeo Dutch Local Chapter

Member of the Dutch OpenGeoGroep

Just van den Broecke, [email protected], www.justobjects.nl

Page 3: Stetl for INSPIRE Data Transformation

We have a Problem

Page 4: Stetl for INSPIRE Data Transformation

The Rich GML Problem

Page 5: Stetl for INSPIRE Data Transformation

Rich GML = Complex Mess

Page 6: Stetl for INSPIRE Data Transformation

INSPIRE Dutch National Datasets

Germany: AFIS-ALKIS-ATKIS

UK: OS Mastermap

...

Page 7: Stetl for INSPIRE Data Transformation

“Semi GML” e.g. Dutch Addresses & Buildings (BAG)

Arbitrary Nesting

Page 8: Stetl for INSPIRE Data Transformation

The Street Name!

A Street Element in an INSPIRE Annex I Address..

Page 9: Stetl for INSPIRE Data Transformation

Complex Model

Transformations

Page 10: Stetl for INSPIRE Data Transformation

100+ MB GML Files

Page 11: Stetl for INSPIRE Data Transformation
Page 12: Stetl for INSPIRE Data Transformation

Millions of Objects

Page 13: Stetl for INSPIRE Data Transformation

10s of Millions of <Elements>

Page 14: Stetl for INSPIRE Data Transformation

Multiple Transformation Steps

Page 15: Stetl for INSPIRE Data Transformation

Solution is Spatial ETL

Page 16: Stetl for INSPIRE Data Transformation

But How? (with FOSS)

Page 17: Stetl for INSPIRE Data Transformation

FOSS ETL - DIY ? Maybe

Page 18: Stetl for INSPIRE Data Transformation

FOSS ETL - High Level

Page 19: Stetl for INSPIRE Data Transformation

FOSS ETL - Lower Level

Each powerful individually but cannot do the entire ETL

ogr2ogr

Page 20: Stetl for INSPIRE Data Transformation

FOSS ETL - How to Combine?

[Diagram: ogr2ogr + (logos of other tools) = ?]

Page 21: Stetl for INSPIRE Data Transformation

Example - 2011 Kadaster ESDIN

http://inspire.kademo.nl/doc/design-etl.html

Good ideas, but hard to scale and reuse. Need a framework.

Page 22: Stetl for INSPIRE Data Transformation

FOSS ETL : Add Python to Equation

[Diagram: Python wrapped around (ogr2ogr + logos of other tools) = ?]

Page 23: Stetl for INSPIRE Data Transformation

[Diagram: Python wrapped around (ogr2ogr + logos of other tools) = Stetl]

Page 24: Stetl for INSPIRE Data Transformation

Stetl = Simple, Streaming, Spatial, Speedy ETL

Page 25: Stetl for INSPIRE Data Transformation

GML1

GML2

Stetl

From Barrels of GML to Maps

Page 26: Stetl for INSPIRE Data Transformation
Page 27: Stetl for INSPIRE Data Transformation

From Local National Data to INSPIRE DL Services

[Diagram labels: Source <GML> (Dutch Addresses + Buildings); NLExtract; Stetl; deegree blobstore; Stetl; deegree WFS; INSPIRE <GML> (INSPIRE Addresses); Atom Feed]

Page 28: Stetl for INSPIRE Data Transformation

Stetl Concepts

Page 29: Stetl for INSPIRE Data Transformation

Process Chain

Source -> Input -> Filter -> Filter -> Output -> Target

Stetl concepts
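The process chain idea can be illustrated with a minimal sketch in plain Python. This is not Stetl's implementation: `run_chain`, `strip_filter` and `upper_filter` are hypothetical names chosen for illustration, and the "source" is just a list of strings.

```python
from typing import Any, Callable, Iterable, List


def run_chain(source: Iterable[Any],
              filters: List[Callable[[Any], Any]],
              output: Callable[[Any], None]) -> None:
    """Pull items from the input, pass each through every filter in order, then write it."""
    for item in source:
        for apply_filter in filters:
            item = apply_filter(item)
        output(item)


def strip_filter(data: str) -> str:
    """Hypothetical filter: remove surrounding whitespace."""
    return data.strip()


def upper_filter(data: str) -> str:
    """Hypothetical filter: convert to upper case."""
    return data.upper()


# Input -> Filter -> Filter -> Output, with a list acting as the target.
results: List[str] = []
run_chain([" delft ", " utrecht "], [strip_filter, upper_filter], results.append)
```

Each component only knows about the data it receives and returns, which is what allows inputs, filters and outputs to be recombined freely.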

Page 30: Stetl for INSPIRE Data Transformation

Process Chain

gml -> Input -> Filter -> Filter -> Output

Stetl concepts

Page 31: Stetl for INSPIRE Data Transformation

Example: GML to PostGIS

gml -> Reader -> ogr2ogr

Stetl concepts

Page 32: Stetl for INSPIRE Data Transformation

Example: INSPIRE Model Transform

ogr2ogr -> XSLT -> Writer -> gml

Stetl concepts

Simple Features

Complex Features

Page 33: Stetl for INSPIRE Data Transformation

Example: deegree Store

ogr2ogr -> XSLT -> deegree Writer

Stetl concepts

Or via WFS-T

Page 34: Stetl for INSPIRE Data Transformation

Process Chain - How?

Input Filters Output

Stetl concepts

Page 35: Stetl for INSPIRE Data Transformation

Example: XML to Shape

XML Input -> XSLT Filter -> ogr2ogr Output

Page 36: Stetl for INSPIRE Data Transformation

Example: XML to Shape

The Source

Page 37: Stetl for INSPIRE Data Transformation

Example: XML to Shape

XML Input

Page 38: Stetl for INSPIRE Data Transformation

Example: XML to Shape

XML Input -> XSLT Filter

Page 39: Stetl for INSPIRE Data Transformation

Example: XML to Shape

Prepare XSLT Script

Page 40: Stetl for INSPIRE Data Transformation

Example: XML to Shape

XSLT GML Output

Page 41: Stetl for INSPIRE Data Transformation

Example: XML to Shape

XML Input -> XSLT Filter -> ogr2ogr Output

Page 42: Stetl for INSPIRE Data Transformation

Example: XML to Shape

The Stetl Config File

Process Chain: XML Input -> XSLT Filter -> ogr2ogr Output
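The config file shown on this slide is not preserved in the transcript. As a rough sketch of how such a file maps to a process chain, the snippet below parses a config in the format used elsewhere in these slides (an `[etl]` section whose `chains` entry joins section names with `|`). The section names `transformer_xslt` and `output_ogr_shape`, the class paths `filters.xsltfilter.XsltFilter` and `outputs.ogroutput.Ogr2OgrOutput`, and the script name `cities2gml.xsl` are assumptions for illustration; only `inputs.fileinput.XmlFileInput`, `input/cities.xml` and the `script` option appear in the slides. The parsing uses Python's standard `configparser` and is not Stetl's own loader.

```python
import configparser

# Hypothetical config for the XML-to-Shape chain; names marked in the text are assumptions.
CONFIG_TEXT = """
[etl]
chains = input_xml_file|transformer_xslt|output_ogr_shape

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_ogr_shape]
class = outputs.ogroutput.Ogr2OgrOutput
"""


def chain_sections(parser: configparser.ConfigParser):
    """Return, per chain, the ordered (section name, class path) pairs."""
    chains = []
    for chain in parser.get("etl", "chains").split(","):
        names = [name.strip() for name in chain.split("|")]
        chains.append([(name, parser.get(name, "class")) for name in names])
    return chains


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
chains = chain_sections(parser)
```

Each name in `chains` points to a section that names the component class and its options, so a chain can be rearranged by editing one line.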

Page 43: Stetl for INSPIRE Data Transformation

Running Stetl

stetl -c etl.cfg

Page 44: Stetl for INSPIRE Data Transformation

Result Shapefile viewed in QGIS

Page 45: Stetl for INSPIRE Data Transformation

Installing Stetl

via PyPI

sudo pip install stetl

Deps
• GDAL + Python bindings
• lxml (XML processing)
• psycopg2 (Postgres)

Page 46: Stetl for INSPIRE Data Transformation

Speed: Streaming

Input Filter Output

gml

Stetl concepts
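Streaming here means that features are handed down the chain one at a time instead of loading a 100+ MB GML file into memory. A minimal sketch of that idea follows, using the standard library's `xml.etree.ElementTree.iterparse` rather than lxml (which the slides list as a Stetl dependency). The sample document, the `City` element and the function name `stream_features` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Tiny stand-in for a large GML file (hypothetical content).
SAMPLE = f"""<FeatureCollection xmlns:gml="{GML_NS}">
  <gml:featureMember><City><name>Amsterdam</name></City></gml:featureMember>
  <gml:featureMember><City><name>Paris</name></City></gml:featureMember>
</FeatureCollection>"""


def stream_features(source, tag=f"{{{GML_NS}}}featureMember"):
    """Yield one featureMember element at a time while the file is being parsed."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element
            element.clear()  # drop the children of the feature just processed


names = [member.findtext("City/name")
         for member in stream_features(io.BytesIO(SAMPLE.encode("utf-8")))]
```

The consumer sees each feature as soon as its closing tag is parsed, so downstream filters can start working before the whole file has been read.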

Page 47: Stetl for INSPIRE Data Transformation

Speed: Going Native

[Diagram: gml -> Input -> Filter -> Output; Stetl components call native C libraries and programs such as ogr2ogr]

Stetl concepts
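"Going native" can be pictured as Python assembling a command and handing the heavy lifting to a native program. The sketch below builds an ogr2ogr call that converts a GML file to an ESRI Shapefile; the helper names and file paths are hypothetical, and this is not how Stetl's own ogr2ogr components are written.

```python
import subprocess
from typing import List


def build_ogr2ogr_command(gml_path: str, shape_path: str) -> List[str]:
    """Assemble an ogr2ogr call that converts a GML file to an ESRI Shapefile."""
    # ogr2ogr expects the destination dataset before the source dataset.
    return ["ogr2ogr", "-f", "ESRI Shapefile", shape_path, gml_path]


def run_native(command: List[str]) -> None:
    """Run the native program; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


# Hypothetical paths; run_native(command) would execute it if GDAL is installed.
command = build_ogr2ogr_command("output/cities.gml", "output/cities.shp")
```

Python stays responsible for orchestration only, while parsing and format conversion run in compiled code.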

Page 48: Stetl for INSPIRE Data Transformation

Example Components

Stetl concepts

Input        Filters            Output
XMLFile      XSLT               GMLFile
ogr2ogr      XMLAssembler       ogr2ogr
LineStream   XMLValidator       WFS-T
deegree*     FeatureExtractor   deegree*
YourInput    YourFilter         YourOutput

Page 49: Stetl for INSPIRE Data Transformation

Example: XsltFilter Python

    from util import Util, etree
    from filter import Filter
    from packet import FORMAT

    log = Util.get_log("xsltfilter")


    class XsltFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section,
                            consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

            self.xslt_file_path = self.cfg.get('script')
            self.xslt_file = open(self.xslt_file_path, 'r')
            # Parse XSLT file only once
            self.xslt_doc = etree.parse(self.xslt_file)
            self.xslt_obj = etree.XSLT(self.xslt_doc)
            self.xslt_file.close()

        def invoke(self, packet):
            if packet.data is None:
                return packet
            return self.transform(packet)

        def transform(self, packet):
            packet.data = self.xslt_obj(packet.data)
            log.info("XSLT Transform OK")
            return packet

Page 50: Stetl for INSPIRE Data Transformation

Your Own Components

Stetl concepts

Step 1 - Define Class

    # Imports and logger as in the XsltFilter example
    from util import Util
    from filter import Filter
    from packet import FORMAT

    log = Util.get_log("myfilter")


    class MyFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section,
                            consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

        def invoke(self, packet):
            log.info("CALLING MyFilter OK!!!!")
            return packet

Step 2 - Config Class

    [etl]
    chains = input_xml_file|my_filter|output_std

    [input_xml_file]
    class = inputs.fileinput.XmlFileInput
    file_path = input/cities.xml

    # My custom component
    [my_filter]
    class = my.myfilter.MyFilter

    [output_std]
    class = outputs.standardoutput.StandardXmlOutput

Page 51: Stetl for INSPIRE Data Transformation

Data Structures

Stetl concepts

• Components exchange Packets
• Packet contains data and status
• Data formats, e.g.:
    xml_line_stream
    etree_doc
    etree_element (feature)
    etree_element_array
    string
    any
    ..
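A packet that carries data plus status could look roughly like the sketch below. The class layout, the `end_of_stream` flag and the `can_consume` helper are assumptions for illustration, not Stetl's actual `Packet` class; only the format names (`etree_doc`, `string`, `any`) come from the slide.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Packet:
    """Hypothetical packet: a payload plus status about the stream."""
    data: Any = None
    format: Optional[str] = None   # e.g. "etree_doc" or "string"
    end_of_stream: bool = False    # status flag: no more data will follow


def can_consume(consumes: str, packet: Packet) -> bool:
    """Check whether a component declaring `consumes` can accept this packet."""
    return consumes == "any" or packet.format == consumes


doc_packet = Packet(data="<cities/>", format="etree_doc")
last_packet = Packet(end_of_stream=True)
```

Declaring what each component consumes and produces lets a chain be checked for compatible neighbours before any data flows.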

Page 52: Stetl for INSPIRE Data Transformation

deegree Integration

Stetl concepts

• Input
    DeegreeBlobstoreInput
• Output
    DeegreeBlobstoreOutput
    DeegreeFSLoaderOutput
    WFSTOutput

Page 53: Stetl for INSPIRE Data Transformation

Cases - The Netherlands

• INSPIRE Download Services
    publish to deegree store (WFS)
    generate GML files (for Atom Feed)

• National GML Datasets
    GML to PostGIS (Top10NL, BGT)

Page 54: Stetl for INSPIRE Data Transformation

Top10NL Extract

    [etl]
    chains = input_sql_pre|schema_name_filter|output_postgres,
             input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
             input_sql_post|schema_name_filter|output_postgres

    # Pre SQL file inputs to be executed
    [input_sql_pre]
    class = inputs.fileinput.StringFileInput
    file_path = sql/drop-tables.sql,sql/create-schema.sql

    # Post SQL file inputs to be executed
    [input_sql_post]
    class = inputs.fileinput.StringFileInput
    file_path = sql/delete-duplicates.sql

    # Generic filter to substitute Python-format string values like {schema} in string
    [schema_name_filter]
    class = filters.stringfilter.StringSubstitutionFilter
    # format args {schema} is schema name
    format_args = schema:{schema}

    [output_postgres]
    class = outputs.dboutput.PostgresDbOutput
    database = {database}
    host = {host}
    port = {port}
    user = {user}
    password = {password}
    schema = {schema}

    # The source input file(s) from dir and produce gml:featureMember elements
    [input_big_gml_files]
    class = inputs.fileinput.XmlElementStreamerFileInput
    file_path = {gml_files}
    element_tags = featureMember

Parameter Substitution
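The `{schema}`, `{database}` and similar placeholders in the configuration above follow Python's format-string syntax. A minimal sketch of the substitution step, assuming plain `str.format` and made-up parameter values (`top10nl`, `localhost`, `public`), might look like this; how Stetl itself receives and applies the values is not shown in the slides.

```python
# Fragment of a config template with Python-format placeholders.
CONFIG_TEMPLATE = """[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
schema = {schema}
"""


def substitute(template: str, **params: str) -> str:
    """Fill every {name} placeholder; a missing parameter raises KeyError."""
    return template.format(**params)


# Hypothetical values for illustration only.
config_text = substitute(CONFIG_TEMPLATE,
                         database="top10nl", host="localhost", schema="public")
```

Keeping connection details out of the config file means the same chain definition can be reused across databases and environments.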

Page 55: Stetl for INSPIRE Data Transformation

Top10NL+BAG (Dutch Topo + Buildings)

Page 56: Stetl for INSPIRE Data Transformation

BGT - Dutch Large Scale Topo

Page 57: Stetl for INSPIRE Data Transformation

Cases - INSPIRE Transforms

•Simple: Dutch Admin Borders to AU

•Advanced: Dutch Addresses to AD

Page 58: Stetl for INSPIRE Data Transformation

INSPIRE - XSLT STRUCTURE

[Diagram: XSLT script structure per INSPIRE theme]

Theme CP
    Local CP GML to INSPIRE Spatial Dataset
    Local CP GML to INSPIRE GML
    Generate CP INSPIRE GML

Theme AU
    Local AU GML to INSPIRE Spatial Dataset
    Local AU GML to INSPIRE GML
    Generate AU INSPIRE GML

Theme GN
    Local GN GML to INSPIRE Spatial Dataset
    Local GN GML to INSPIRE GML
    Generate GN INSPIRE GML

Reusable XSLT Scripts (Called by All)

Legend: Locally Specific XSL, Generic XSL, XSLT Template Call

Page 59: Stetl for INSPIRE Data Transformation

XSLT - 3 MAIN STEPS/SCRIPTS

1. Generate Spatial Dataset GML Container (specific)

2. Extract data values from local OGR simple feature data (specific)

3. Call XSLT template per Theme Feature type (generic)

Page 60: Stetl for INSPIRE Data Transformation

XSLT AU - STEP 1

Page 61: Stetl for INSPIRE Data Transformation

XSLT AU - STEP 2

Page 62: Stetl for INSPIRE Data Transformation

XSLT AU - STEP 3

Page 63: Stetl for INSPIRE Data Transformation

XSLT - REUSE

Page 64: Stetl for INSPIRE Data Transformation

STETL CONFIG

Page 65: Stetl for INSPIRE Data Transformation

STETL CONFIG AD

Page 66: Stetl for INSPIRE Data Transformation

Case: INSPIRE DL Services - Dutch Addresses

[Diagram labels: Source <GML> (Dutch Addresses + Buildings); NLExtract; Stetl; deegree blobstore; Stetl; deegree WFS; INSPIRE <GML> (INSPIRE Addresses); Atom Feed; Other Uses (Geocoder etc.)]

Page 67: Stetl for INSPIRE Data Transformation

Project Status - Sept 21, 2013

• v1.0.4 installable via PyPI
• Documentation on www.stetl.org
• Real world transforms done
• Seeking feedback, support and contributors

Page 68: Stetl for INSPIRE Data Transformation

Rich GML Problem Solved?