open source geospatial business intelligence (geobi): definition, architectures, projects,...
DESCRIPTION
Slides of the presentation I gave at the OSGIS 2011 conference, June 21-22, 2011, Nottingham, UK.TRANSCRIPT
Open Source Geospatial Business Intelligence (GeoBI):
Definition, architectures, projects, challenges and outlooks
Prof. Thierry Badard3rd OSGIS Conference
Nottingham University, Nottingham, UK – June 21-22, 2011
● Open data / Data deluge● BI & Geospatial BI (GeoBI)
● Definition● Architecture● Offerings
● Open source GeoBI solutions● Conclusion & outlooks
Content of the presentation
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Open data, OSM, social networks, mobility ...
Are you ready for the data deluge?
● Store and face huge amounts of data, that are very diverse and heterogeneous● Sales● Stocks● Socio-economical data● Surveys● …● Impact assessment of their advertisment campaigns● Website visits● Social networks
Nowadays, companies and public bodies …
● Which imposes to take informed and strategic decisions more and more often and rapidly in order to keep competitive
● So, they require to analyze these huge amounts of data rapidly, simply and efficiently
● For that they use more and more some BI (Business Intelligence) tools
● These tools produce:● Synthesis reports (PDF, Excel, ...)● Interactive analytical dashboards
In a world that changes rapidly ...
Reports and dashboards
Reports and dashboards
Reports and dashboards
Reports and dashboards
Reports and dashboards
Reports and dashboards
● About 80% of all data stored in corporate databases has a spatial component [Franklin, 1992]
● Mail address● Postal/ZIP code● Coordinates● IP, phone number, …
● As time, space (location) is a crucial dimension that should be taken into account when analyzing corporate data!
Location is everywhere! ;-)
● It requires to get the right tools in order to observe, handle and explore this type of information in BI tools ...
● Let us imagine you are HSBC … or ministry for public safety … or Haiti government ...
● Do you think that cross tables or charts are the best media to answer these type of questions ?
How to take into account location …
Le bon médium, c'est la carte !The right medium is obviously a map!
Le bon médium, c'est la carte !The right medium is obviously a map!
Le bon médium, c'est la carte !The right medium is obviously a map!
● Brings :● The map ...● and all the tools required to represent, navigate and
analyse the geospatial dimension
● In the very core of BI tools of the market● In order to fully take into account the spatial
dimension in the corporate data analysis and decision processes based on these data!
So, GeoBI (Geospatial BI) ...
Web 2.0
Business Intelligence
(BI)Geospatial
Ind
ustr
y
3 billions $ in software
(total of 15 billions $ IT)
8,8 billions $ in software
(increased of 22% in 2008)Siz
e
GeoBI
Pla
yers
BI and geospatial industry
Back in 2005 ...
● GeoSOA research team: research in BI, geospatial and mobiliy
● Purposes: ● Propose ubiquitous and mobile decision making tools which make use of the
geospatial dimension (mobile GeoBI),
● And find ways to deliver a business info adapted to the context of the user (location, environment, situation, pupose, ...)
● But, unfortunately: ● No GeoBI tool were available and enabling:
– The processing of large volumes of information that come from very diverse and heterogeneous data sources
– The rapid delivery of these data to numerous concurrent users in a connected/disconnected modes (web services architecture)
– The support of standards dedicated to mobility
– With key GeoBI capabilities and with a rich, comprehensive and extensible API at an affordable cost
– Relying on a robust and complete BI architecture in order to be able to fulfill all types of user requirements
● Transactional databases● Web ressources● XML, flat files, proprietary file formats (Excel spreadsheets, …)● LDAP● …
Classical architecture of a BI infrastructure
● Repository of an organization’s historical data, for analysis purposes.
● Primarily destined to analysts and decision makers.
● Separate from operational (OLTP) systems (source data) But often stored in relational DBMS: Oracle, MSSQL,
PostgreSQL, MySQL, Ingres, …
● Contents are often presented in a summarized form (e.g. key performance indicators, dashboards, OLAP client applications, reports).Need to define some metrics/measures
The Data Warehouse: the crucial/central part!
● Optimized for: Large volumes of data (up to terabytes); Fast response (<10 s) to analytical queries (vs. update speed for
transactional DB): de-normalized data schemas (e.g. star or snowflake schemas),
Introduces some redundancy to avoid time consuming JOIN queries
all data are stored in the DW across time (no corrections), summary (aggregate) data at different levels of details and/or time scales, (multi)dimensional modeling (a dimension per analysis axis).
All data are interrelated according to the analysis axes (OLAP datacube paradigm)
● Focus is thus more on the analysis / correlation of large amount of data than on retrieving/updating a precise set of data!
● Specific methods to propagate updates into the DW needed!
The Data Warehouse: the crucial/central part!
● MDX stands for MultiDimensional eXpressions● Multidimensional query language● De facto standard from Microsoft for SQL Server OLAP Services
(now Analysis Services)● Also implemented by other OLAP servers (Essbase, Mondrian) and
clients (Proclarity, Excel PivotTables, Cognos, JPivot, …)● MDX is for OLAP data cubes what SQL is for relational databases● Looks like a SQL query but relies on a different model (close to the
one used in spreasheets)● SELECT
{ [Measures].[Store Sales] } ON COLUMNS, { [Date].[2002], [Date].[2003] } ON ROWSFROM SalesWHERE ( [Store].[USA].[CA] )
MDX query language
● SELECT { [Product].[All Products].[Drink], [Product].[All Products].[Food] } ON COLUMNS, { [Store].[All Stores].[USA].[WA].[Yakima].[Store 23], [Store].[All Stores].[USA].[CA].[Beverly Hills].[Store 6], [Store].[All Stores].[USA].[OR].[Portland].[Store 11] } ON ROWSFROM WarehouseWHERE ([Time].[1997], [Measures].[Units Shipped])
OLAP client software propose: Alternate representation modes (pie charts, diagrams, etc.) Different tools to refine queries/explore data
Drill down, roll up, pivot, … Based on operators provided by MDX
Results representation
Require to consistently integrate the geospatial component in all parts of the architecture!
Geospatial BI adds maps and spatial analysis!
● Pentaho (http://www.pentaho.org)
Kettle Mondrian
Weka
PentahoReporting
+ CDF: Community Dashboard Framework+ Other projects: olap4j, JPivot, Halogen, …
Pentaho open source BI software stack
GeoKNIME
& integration in various dashboard and reporting
tools
S
Spatial
Spatial• PostGIS• Oracle Spatial
Towards an open source geospatial BI stack
● A type of software used to populate databases or data warehouses from heterogeneous data sources.
● ETL stands for:− Extract – Extract data from data sources
− Transform – Transformation of data in order to correct errors, make some data cleansing, change the data structure, make them compliant to defined standards, etc.
− Load – Load transformed data into the target DBMS
● An ETL tool should manage the insertion of new data and the updating of existing data.
● Should be able to perform transformations from :− An OLTP system to another OLTP system
− An OLTP system to an analytical data warehouse
An ETL tool is …
Why use an ETL tool?
● Automation of complex and repetitive data processing without producing any specific code
● Conversion between various data formats
● Migration of data from a DBMS to another
● Data feeding into various DBMS
● Population of analytical data warehouses for decision support purposes
● etc.
● GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle)
● Kettle is a metadata-driven ETL with direct execution of transformations− No intermediate code generation!
● Support of several DBMS and file formats− DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, ...
(total of 37)
− Read/write support of various data file formats: text, Excel, Access, DBF, XML, ...
● Numerous transformation steps
● Support of methods for the updating of DW
GeoKettle
● GeoKettle provides a true and consistent integration of the spatial component− All steps provided by Kettle are able to deal with geospatial data types
− Some geospatial dedicated steps have been added
● First release in May 2008: 2.5.2-20080531
● Current stable version: 3.2.0-r188-20090706
● To be released shortly: GeoKettle 2.0 with many new features!
● Released under LGPL at http://www.geokettle.org
● Used in different organizations and countries:− Some ministries, bank, insurance, integrators, …
− E.g. GeoETL from Inova is in fact GeoKettle! :-)
● A growing community of users and developers
GeoKettle
● Transformations vs. Jobs:
− Running in parallel vs. running sequentially
● All can be stored in a central repository (database)
− But each transformation or job could also be saved in a simple XML file!
● Offers different interfaces:
− Spoon: GUI for the edition of transformations and jobs
− Pan: command line interface for running transformations
− Kitchen: command line interface for running jobs
− Carte: Web service for the remote execution of transformations and jobs
GeoKettle
GeoKettle – Spoon
● Version 2.0 provides support for:
− Handling geometry data types (based on JTS)
− Accessing Geometry objects in JavaScript
− It allows the definition of custom transformation steps by the user (“Modified JavaScript Value” step)
− Topological predicates (Intersects, crosses, etc.) and aggregation operators (envelope, union, geometry collection, ...)
− SRS definition and transformations
− Input / Output with some spatial DBMS
- Native support for Oracle, PostGIS and MySQL
- MS SQL Server 2008 and SQLite/Spatialite
- Ingres and IBM DB2 can be used but it requires some tricks
− GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR support (~33 vector data formats and DBMS)
− Sensor Web (SOS) and metadata (CSW) services
− Cartographic preview
− Spatial analysis and advanced geo-processing
GeoKettle
● GeoKettle releases are aligned with the ones of Pentaho Data Integration (Kettle), − GeoKettle then benefits all new features provided by
PDI (Kettle).
● Kettle is natively designed to be deployed in cluster and web service environments. − It makes GeoKettle a perfect software component to be deployed
as a service (SaaS) in cloud computing environments as those provided by Amazon EC2.
− It enables then the scalable, distributed and on demand processing of large and complex volumes of geospatial data in minutes for critical applications and without requiring a company to invest in an expensive IT infrastructure of servers, networks and software.
GeoKettle
GeoKettle – Requirements and install
● Very simple installation procedure● All you need is a Java Runtime Environment
● Version 5 or higher
● Just unzip the binary archive of GeoKettle ...● And let’s go !
● Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows)
● Need help, please visit our wiki:● http://wiki.spatialytics.org
- Demo -
GeoKettle
● Upcoming features:− Implementation of data matching and conflation steps/jobs in order to
allow advanced geometric data cleansing and comparison of geospatial datasets
− Read/write support for other DBMS, GIS file formats and services− LAS (LiDAR), ...
− WFS-T, Sensor Web (TML, SensorML, SOS-T, ...), ...
− Dedicated steps for social media (Twitter, ...), OSM, generalization, ...
− Support of the third dimension, (x,y,z, M), linear reference systems, raster ...
− Raster support: development in progress of a plugin to integrate all capabilities provided by the Sextante library (BeETLe)
GeoKettle
● GeoMondrian is a "spatially-enabled" version of Pentaho Analysis Services (Mondrian)
● GeoMondrian brings to the Mondrian OLAP server what PostGIS brings to the PostgreSQL DBMS– i.e. a consistent and powerful support for geospatial data.
● Licensed under the EPL
● http://www.geo-mondrian.org
GeoMondrian
● As far as we know, it is the first implementation of a true Spatial OLAP (SOLAP) Server− And it is an open source project! ;-)
● Provides a consistent integration of spatial objects into the OLAP data cube structure− Instead of fetching them from an external spatial DBMS, web
service or a GIS file
● Implements a native Geometry data type● Provides first spatial extensions to the MDX language
− Add spatial analysis capabilities to the analytical queries
● At present, it only supports PostGIS datawarehouses− But other DBMS will be supported in the next version!
GeoMondrian
● Goal: bring to Mondrian and MDX what SQL spatial extensions do for relational DBMS (i.e. Simple Features for SQL and implementations such as PostGIS).
● Example query: filter spatial dimension members based on distance from a feature− SELECT
{[Measures].[Population]} on columns, Filter( {[Unite geographique].[Region economique].members}, ST_Distance([Unitegeographique].CurrentMember.Properties("geom"), [Unite geographique].[Province].[Ontario].Properties("geom")) < 2.0 ) on rows FROM [Recensements] WHERE [Temps].[Rencensement 2001 (2001-2003)].[2001]
Spatially enabled MDX
● Many more possibilities:− in-line geometry constructors (from WKT)
− member filters based on topological predicates (intersects, contains, within, …)
− spatial calculated members and measures (e.g. aggregates of spatial features, buffers)
− calculations based on scalar attributes derived from spatial features (area, length, distance, …)
Spatially enabled MDX
- Demo -
GeoMondrian
● SOLAPLayers is a lightweight cartographic component (framework) which enables navigation in geospatial (Spatial OLAP or SOLAP) data cubes, such as those handled by GeoMondrian.
● It aims to be integrated into existing dashboard frameworks in order to produce interactive geo-analytical dashboards.
● Such dashboards help in supporting the decision making process by including the geospatial dimension in the analysis of enterprise data.
● First version stems from a GSoC 2008 project performed under the umbrella of OSGeo.
● Licensed under BSD (client part) and EPL (server part).● http://www.solaplayers.org
SOLAPLayers
● Version 1 was based on OpenLayers and Dojo● It allows:
− the connection with a Spatial OLAP server such as GeoMondrian,
− some basic navigation capabilities in the geospatial data cubes,
− and the cartographic representation of some measures as static or dynamic choropleth maps, maps with proportional symbols.
SOLAPLayers v1
SOLAPLayers v1
SOLAPLayers v1
● Version 1 was a mostly proof of concept!
● It presents important limitations:− Allows only the cartographic representation (no crosstabs or
charts)
− Works only for one measure and the spatial dimension !
− Offers limited navigation capabilities in the geospatial data cubes
− Is able to connect to GeoMondrian only
− Extending the framework is difficult due to thelack of flexibility and the poor documentation of Dojo,
− Integration with other currently used geo-web and dashboard frameworks was difficult
− ...
SOLAPLayers v1
● So, SOLAPLayers has undergone (and is still undergoing ;-) ) a deep re-engineering!
● Version 2 is fully based on ExtJS/GeoExt (and hence OpenLayers)
− It will make its integration with other geo/web and BI/dashboard frameworks easier
− It provides some new ExtJS components dedicated to GeoBI!
− Based on the philosophy for the development of applications adopted by these geo-web frameworks, it allows an easier creation/maintenance of the produced geo-analytical dashboards!
− Like ExtJS, it supports internationalization!
SOLAPLayers 2.0
MDX
OLA
P4J
SOLAPJSON
Client
Authentication
Built-in or LDAPServer
Server
Native or XML/A
SOLAP Server
1
SOLAPLayers 2.0 – Architecture
MDX
OLA
P4J
SOLAPJSON
Client
Authentication
Built-in or LDAPServer
Server
Native or XML/A
SOLAP Server
Native or XML/A
Geospatial data source(WFS, DBMS, ...)
OLAP Server
1
2
SOLAPLayers 2.0 – Architecture
MDX
OLA
P4J
SOLAPJSON
Client
Authentication
Built-in or LDAPServer
Server
Native or XML/A
SOLAP Server
Native or XML/A
Geospatial data source(WFS, DBMS, ...)
OLAP Server
Bridge architecture • Maximize what is in place in organisations• But, no Geo-MDX capabilities available!• Maximize what is in place in organisations• But, no Geo-MDX capabilities available!
1
2
SOLAPLayers 2.0 – Architecture
MDX
OLA
P4J
SOLAPJSON
Client
Authentication
Built-in or LDAPServer
Server
Native or XML/A
SOLAP Server
Native or XML/A
Geospatial data source(WFS, DBMS, ...)
OLAP Server
Geospatial data source(WFS, DBMS, ...)
1
2
3
SOLAPLayers 2.0 – Architecture
MDX
OLA
P4J
SOLAPJSON
Client
Authentication
Built-in or LDAPServer
Server
Native or XML/A
SOLAP Server
Native or XML/A
Geospatial data source(WFS, DBMS, ...)
OLAP Server
Geospatial data source(WFS, DBMS, ...)
• For simple geo-dashboards• Based on transactional data• Thematic mapping• No Geo-MDX and drill-down or roll-up capabilities!
1
2
3
SOLAPLayers 2.0 – Architecture
SOLAPLayers 2.0 – Architecture
● Many more connectors to come …– GeoKettle
– CDA (BI domain)
– Oracle/MSSQL support
– ...
● Many more output formats to be available ...– Image (PNG, ...)
– KML
– ...
Define the template of the dashboard in a HTML file1
SOLAPLayers 2.0 – Geo-dashboard made easy!
Define the template of the dashboard in a HTML file1
Define your dashboard components in a JS file and map it to the div in the HTML file2
SOLAPLayers 2.0 – Geo-dashboard made easy!
Enjoy! ;-)
3
SOLAPLayers 2.0 – Geo-dashboard made easy!
SOLAPLayers 2.0
SOLAPLayers 2.0
SOLAPLayers 2.0
SOLAPLayers 2.0
SOLAPLayers 2.0
SOLAPLayers 2.0
- Demo -
● Constats● Quartiers pauvres
– Tenderloin– Bayview– Mission
● Quartiers riches– Central– Southern– Northern– …
● Sud-Est de San Francisco– Criminalité élevée
GeoKNIME
GeoKNIME
77
>> voir Vidéo
78
Drogue/Narcotique
Violence armée
Prostitution
Cluster#1Cluster#2Cluster#3
Nœud de configuration
KNN
Prédiction
Valeurs réelles
● A rich and passionating topic ...
● And it is just an introduction ;-)
● Do not hesitate to ask for more demos!
● R&D is still required in this domain ...
● Open source allows cooperation/collaboration and makes research work easier: students can use the infrastructure to develop their own research without reinventing the wheel
● Based on this infrastructure, some research works on mobile GeoBI have already been performed:
– Web services for the dissemination of geo-analytical data
– Methods for real time updating and integration of data stemming from sensors
– Definition of mobile GeoBI context profiles
– Implementation of first mobile and context aware GeoBI clients
– …
● Towards active geospatial datawarehouses ...
Conclusion & outlooks