geospatial etl with stetl - geopython 2016

46
Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz - Switserland June 24, 2016 www.justobjects.nl www.stetl.org

Upload: just-van-den-broecke

Post on 23-Jan-2018

1.101 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Geospatial ETL with Stetl - GeoPython 2016

Geospatial Data Processing with Stetl

Just van den BroeckeGeoPython 2016

Muttenz - SwitserlandJune 24, 2016

www.justobjects.nl

www.stetl.org

Page 2: Geospatial ETL with Stetl - GeoPython 2016

About MeIndependent Open Source Geospatial Professional

+ Secretary OSGeo Dutch Local Chapter + Member of the Dutch OpenGeoGroep

Just van den [email protected] www.justobjects.nl

Page 3: Geospatial ETL with Stetl - GeoPython 2016

Agenda

• Spatial ETL• Stetl

• Concepts• Cases• Status

• Q & A

Page 4: Geospatial ETL with Stetl - GeoPython 2016

Spatial ETL

Page 5: Geospatial ETL with Stetl - GeoPython 2016

Data Wrangling

Page 6: Geospatial ETL with Stetl - GeoPython 2016

ETL - Extract Transform Load

Page 7: Geospatial ETL with Stetl - GeoPython 2016

Spatial ETL

GML

PostGIS

Shapefile

WFS

CSV

SQLite

XML

GMLPostGIS

Shapefile

WFSCSV

SQLite

XML

Page 8: Geospatial ETL with Stetl - GeoPython 2016

GMLPostGIS

Shapefile

WFSCSV

SQLite

XML

Spatial ETL Example non-standard source

Page 9: Geospatial ETL with Stetl - GeoPython 2016

From: https://live.osgeo.org/en/overview/gdal_overview.html

Page 10: Geospatial ETL with Stetl - GeoPython 2016
Page 11: Geospatial ETL with Stetl - GeoPython 2016

Plenty of Tools…

Each tool is powerful by itself but cannot do the entire ETL

ogr2ogr

Spatial ETL

Page 12: Geospatial ETL with Stetl - GeoPython 2016

FOSS ETL - How to Combine Components?

=+ + ?+ ..

Page 13: Geospatial ETL with Stetl - GeoPython 2016

Example - 2011 INSPIRE-FOSS

http://inspire.kademo.nl/doc/design-etl.html

Nice ideas but hard to scale, deploy and reuse.

Need Framework

Page 14: Geospatial ETL with Stetl - GeoPython 2016

Solution: Add Python to the Equation

=+ + ?( )+ ..

Page 15: Geospatial ETL with Stetl - GeoPython 2016

Stetl

Solution: Add Python to the Equation

=+ +( )+ ..

Page 16: Geospatial ETL with Stetl - GeoPython 2016

Stetl =

Simple Streaming

Spatial Speedy

ETL

Page 17: Geospatial ETL with Stetl - GeoPython 2016

Stetl Concepts

Page 18: Geospatial ETL with Stetl - GeoPython 2016

Process Chain

Input Filter OutputFilter

Stetl concepts

Source Target

Page 19: Geospatial ETL with Stetl - GeoPython 2016

Process Chain

Input Filter Outputgml

Filter

Stetl concepts

Page 20: Geospatial ETL with Stetl - GeoPython 2016

Example: GML to PostGIS

GMLReader

PG Output

gml

Stetl concepts

Page 21: Geospatial ETL with Stetl - GeoPython 2016

Example: Data Model Transform

OGRReader XSLT

GMLWriter

gml

Stetl concepts

Simple Features

Complex Features

or Jinja2 !

Page 22: Geospatial ETL with Stetl - GeoPython 2016

Process Chain - How?

Input Filters Output

Stetl concepts

Stetl Config File

Instantiate

Page 23: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

XMLInput

XSLTFilter

OGROutput

Page 24: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

The Source File

Page 25: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

XMLInput

Page 26: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

XMLInput

XSLTFilter

Page 27: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

Prepare XSLT Script

Page 28: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

XSLT GML Output

Page 29: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

XMLInput

XSLTFilter

OGROutput

OGC Simple

Features

XML DOM

Page 30: Geospatial ETL with Stetl - GeoPython 2016

Example: XML to Shape

The Stetl Config File

ProcessChain

XMLInputXSLT

Filter

ogr2ogrOutput

Page 31: Geospatial ETL with Stetl - GeoPython 2016

Running Stetl

stetl -c etl.cfg

stetl -c etl.cfg [-a <properties>]

Page 32: Geospatial ETL with Stetl - GeoPython 2016

Installing Stetl - PyPi

Deps•GDAL+Python bindings•lxml (xml proc)•psycopg2 (Postgres)

sudo pip install stetl

Page 33: Geospatial ETL with Stetl - GeoPython 2016

Installing Stetl - new: Docker

https://hub.docker.com/r/justb4/stetl

Page 34: Geospatial ETL with Stetl - GeoPython 2016

Speed: Streaming

Input Filter Output

gml

Stetl concepts

Page 35: Geospatial ETL with Stetl - GeoPython 2016

Speed: Going Native

Input Filter Output

gml

ogr2ogr StetlStetl

Native C Libs/Progs

Calls

Stetl concepts

Page 36: Geospatial ETL with Stetl - GeoPython 2016

Example Components

Input Filters Output

Stetl concepts

XMLFile XSLT GML

ogr2ogr XMLAssembler GDAL/OGR

LineStream XMLValidator WFS-T

Postgres/PostGIS Jinja2 Postgres/PostGISdeegree* FeatureExtractor deegree*YourInput YourFilter YourOutput

Page 37: Geospatial ETL with Stetl - GeoPython 2016

Example: XsltFilter Pythonfrom util import Util, etreefrom filter import Filterfrom packet import FORMAT

log = Util.get_log("xsltfilter")

class XsltFilter(Filter): # Constructor def __init__(self, configdict, section): Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

self.xslt_file_path = self.cfg.get('script') self.xslt_file = open(self.xslt_file_path, 'r') # Parse XSLT file only once self.xslt_doc = etree.parse(self.xslt_file) self.xslt_obj = etree.XSLT(self.xslt_doc) self.xslt_file.close()

def invoke(self, packet): if packet.data is None: return packet return self.transform(packet)

def transform(self, packet): packet.data = self.xslt_obj(packet.data) log.info("XSLT Transform OK") return packet

Page 38: Geospatial ETL with Stetl - GeoPython 2016

[etl]chains = input_xml_file|my_filter|output_std

[input_xml_file]class = inputs.fileinput.XmlFileInputfile_path = input/cities.xml

# My custom component[my_filter] class = my.myfilter.MyFilter

[output_std]class = outputs.standardoutput.StandardXmlOutput

class MyFilter(Filter): # Constructor def __init__(self, configdict, section): Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

def invoke(self, packet): log.info("CALLING MyFilter OK!!!!") return packet

Your Own Components

Stetl concepts

Step 1- Define Class

Step 2- Configure Class

Page 39: Geospatial ETL with Stetl - GeoPython 2016

Data Structures

Stetl concepts

• Components exchange Packets • Packet contains data and status• Data formats, e.g. :

xml_line_stream etree_docetree_element (feature)etree_element_arraystringany..

Page 40: Geospatial ETL with Stetl - GeoPython 2016

Cases

Page 41: Geospatial ETL with Stetl - GeoPython 2016
Page 42: Geospatial ETL with Stetl - GeoPython 2016

Cases - GML to PostGIS

• National Dutch Open Datasets (GML) http://nlextract.nl ✴ Topography: Top10NL, BGT✴ Cadastral Parcels (BRK)

• Ordnance Survey UK✴ PoC Topography (OS Mastermap)

Page 43: Geospatial ETL with Stetl - GeoPython 2016

Cases - IoT/SensorWeb• SOSPilot

✴ Dutch Air Quality Data✴ publish to PostGIS+SOS (SOS-T)✴ EU Reporting (Jinja2 Filter)✴ sospilot.geonovum.nl

• Smart Emission✴ AQ Sensors hosted by citizens✴ Calibration/Aggregation✴ publication to SOS and OGC SensorThings✴ data.smartemission.nl

Page 44: Geospatial ETL with Stetl - GeoPython 2016

Project Status - June 24, 2016

• v1.0.9 installable via PyPi or Docker • Documentation on www.stetl.org • Real world transforms done

Page 45: Geospatial ETL with Stetl - GeoPython 2016

Project - Planned & in progress

• PyWPS integration • GUI - Flask+Celery+Redis? - Node Red? - Jupiter Notebook?• More on GitHub https://github.com/geopython/stetl

Page 46: Geospatial ETL with Stetl - GeoPython 2016

Thank You !

www.stetl.org

github.com/geopython/stetl