ST-Toolkit: a Framework for Trajectory Data Warehousing
Presentation of "ST-Toolkit: a Framework for Trajectory Data Warehousing", a short paper published at AGILE 2011, Utrecht, 20 April 2011.
Authors: Simone Campora, Jose Fernandes De Macedo, Laura Spinsanti
AGILE 2011, Utrecht, 20/04/2011
Overview
Agenda
• Introduction
• Generic Data Warehouse Schema for Trajectories
• Generic Data Warehouse Architecture
• Some experiments
• Conclusions
Why Trajectory Data Warehousing
The motivation behind Trajectory Data Warehouses (TrDWs) is to transform the raw trajectories of moving objects into valuable information that can be exploited, in an OLAP (or spatio-temporal OLAP, STOLAP) fashion, for decision making in ubiquitous applications such as location-based services and traffic control management.
Problems in Trajectory Management
• Rapid access to huge data archives (e.g. our dataset counts 2 million records in one week only!)
• Knowledge discovery on trajectories (extracting interesting patterns from raw coordinates)
• Knowledge presentation (how to deliver the extracted information?)
• Semantic integration (how to integrate semantic support data?)
Our Contribution
It is developed along two main axes:
• Theoretical
  • by providing a generic TrDW schema proposal that is robust, intuitive and fits the most common use cases
  • by providing a unified, non-fragmented overview of the main topics of trajectory data warehousing
• Architectural
  • by deploying a modular, cross-database, cross-platform middleware to support spatio-temporal data warehouse modeling
Trajectory Extraction
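Extraction levels, from raw data to application semantics:
• Raw Data Level
• Cleaned Data Level
• Trajectory Level (Semantic Level)
• Episodes Level: the trajectory is segmented into alternating stops and moves
• Application Domain Level: stops are annotated with domain places, e.g. garage, workplace, lunch, meeting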
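The step from cleaned data to the Episodes Level can be illustrated with a simple speed-threshold rule. The sketch below is an assumption for illustration, not ST-Toolkit's extraction procedure: a fix whose speed from the previous fix is below a hypothetical threshold `maxStopSpeed` (m/s) is labelled STOP, otherwise MOVE, and consecutive fixes with the same label are merged into one episode. `GpsFix`, `Episode` and `EpisodeSegmenter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** A cleaned GPS fix: planar coordinates in metres, timestamp in seconds (hypothetical type). */
record GpsFix(double x, double y, long t) {}

/** A STOP or MOVE episode covering fix indices [from, to] (hypothetical type). */
record Episode(String type, int from, int to) {}

final class EpisodeSegmenter {
    private EpisodeSegmenter() {}

    /** Labels fixes by speed threshold, then merges runs of equal labels into episodes. */
    static List<Episode> segment(List<GpsFix> fixes, double maxStopSpeed) {
        int n = fixes.size();
        List<Episode> episodes = new ArrayList<>();
        if (n == 0) return episodes;

        String[] labels = new String[n];
        for (int i = 1; i < n; i++) {
            GpsFix a = fixes.get(i - 1), b = fixes.get(i);
            double dt = Math.max(1L, b.t() - a.t());           // guard against zero time gaps
            double speed = Math.hypot(b.x() - a.x(), b.y() - a.y()) / dt;
            labels[i] = speed < maxStopSpeed ? "STOP" : "MOVE";
        }
        // the first fix has no predecessor, so it inherits the label of the second one
        labels[0] = n > 1 ? labels[1] : "STOP";

        int start = 0;
        for (int i = 1; i <= n; i++) {
            // close the current run at the end of the list or when the label changes
            if (i == n || !labels[i].equals(labels[start])) {
                episodes.add(new Episode(labels[start], start, i - 1));
                start = i;
            }
        }
        return episodes;
    }
}
```

Practical stop detectors often add a minimum stop duration so that a short slowdown is not reported as a stop; this sketch omits that rule.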
Generic Data Warehouse Schema
• Solution built around “Episodes”
• Independent external semantics
• Trajectory-based pre-grouping
(MultiDimER notation)
• Episode fact: Moving Entity, Avg Speed, Var Speed, Type, Lifetime, Shape
• TIME dimension, levels: Time, Day, Month, Year (attributes shown: Name, Timestamp)
• TRAJECTORY GROUP dimension, levels: Trajectories (ID, Lifetime, Avg Travel Time, Avg Speed, Travel Type, Moving Entity), Trajectory Groups, Main Trajectory Groups (Shape, Number of Trajectories, Number of Episodes)
• EVENTS dimension, level: Events (Name, Category, Average Visit Time)
• SPACE dimension, levels: Space, Area (Shape, Surface, Name), Region (Shape, Name), Environment (Shape, Name)
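A reduced version of the schema can be written down as plain Java records to show how episode facts roll up along a dimension hierarchy. In this sketch, which is an assumption rather than ST-Toolkit's data model, each `EpisodeFact` references a `TimeMember` (Year, Month, Day) and an `EventMember` (Name, Category), and `lifetimeByMonthAndCategory` sums episode lifetimes (assumed to be in seconds) at the Month level per event category. All type and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Member of the Time dimension, hierarchy Day -> Month -> Year (hypothetical encoding). */
record TimeMember(int year, int month, int day) {}

/** Member of the Events dimension (Name, Category). */
record EventMember(String name, String category) {}

/** Episode fact: references to the Time and Events dimensions plus measures. */
record EpisodeFact(String movingEntity, TimeMember time, EventMember event,
                   String type, long lifetimeSeconds, double avgSpeed) {}

final class EpisodeCubeSketch {
    private EpisodeCubeSketch() {}

    /** Rolls facts up to the Month level and sums lifetimes per "yyyy-MM / category". */
    static Map<String, Long> lifetimeByMonthAndCategory(List<EpisodeFact> facts) {
        return facts.stream().collect(Collectors.groupingBy(
                f -> String.format("%04d-%02d / %s",
                        f.time().year(), f.time().month(), f.event().category()),
                TreeMap::new,                                   // sorted keys for readable output
                Collectors.summingLong(EpisodeFact::lifetimeSeconds)));
    }
}
```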
Data Warehouse Design Issues
Designing a Data Warehouse solution can be tricky because of the lack of standard interfaces: every commercial/academic solution implements a different approach to instantiate multidimensional models into databases. This leads to:
• Longer learning curves
• Difficulties when migrating to different architectures (RDBMS, distributed file systems, … )
• Difficulties in replicating the same TrDW on the same architecture
Generic Data Warehouse Architecture
• J2EE Application Server exposing Remote Objects, a JDBC Provider, an MDX Endpoint and an XML/A Web Service
• Generic Data Warehouse Interfacing Middleware: Data Loader (ETL), Data Warehouse Designer, Query Parser and Translator, Generic Object Interface for Data Warehouse Objects
• Back ends: RDBMS, Oracle OLAP (analytical workspaces containing cubes), Mondrian (through Mondrian Bridging), JDBC, ...
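The "Generic Object Interface" and "Mondrian Bridging" components suggest that a cube is described once as generic objects and then translated for each back end. As a hedged sketch of that idea, the code below turns a hypothetical generic `CubeSpec` into a Mondrian schema XML document. The element and attribute names (`Schema`, `Cube`, `Table`, `Dimension`, `Hierarchy`, `Level`, `Measure`, `aggregator`) follow Mondrian's schema format; `CubeSpec`, `DimensionSpec`, `LevelSpec`, `MeasureSpec`, `MondrianBridge` and all table/column names are assumptions, not ST-Toolkit's API.

```java
import java.util.List;

/** Back-end independent descriptions of cube objects (hypothetical). */
record LevelSpec(String name, String column) {}
record DimensionSpec(String name, String table, String primaryKey, String foreignKey, List<LevelSpec> levels) {}
record MeasureSpec(String name, String column, String aggregator) {}
record CubeSpec(String name, String factTable, List<DimensionSpec> dimensions, List<MeasureSpec> measures) {}

/** Translates a generic cube description into a Mondrian schema document (sketch). */
final class MondrianBridge {
    private MondrianBridge() {}

    static String toSchemaXml(String schemaName, CubeSpec cube) {
        StringBuilder xml = new StringBuilder();
        xml.append("<Schema name=\"").append(esc(schemaName)).append("\">\n");
        xml.append("  <Cube name=\"").append(esc(cube.name())).append("\">\n");
        xml.append("    <Table name=\"").append(esc(cube.factTable())).append("\"/>\n");
        for (DimensionSpec d : cube.dimensions()) {
            // each dimension joins the fact table through its foreign key
            xml.append("    <Dimension name=\"").append(esc(d.name()))
               .append("\" foreignKey=\"").append(esc(d.foreignKey())).append("\">\n");
            xml.append("      <Hierarchy hasAll=\"true\" primaryKey=\"")
               .append(esc(d.primaryKey())).append("\">\n");
            xml.append("        <Table name=\"").append(esc(d.table())).append("\"/>\n");
            for (LevelSpec l : d.levels()) {               // levels from coarsest to finest
                xml.append("        <Level name=\"").append(esc(l.name()))
                   .append("\" column=\"").append(esc(l.column())).append("\"/>\n");
            }
            xml.append("      </Hierarchy>\n    </Dimension>\n");
        }
        for (MeasureSpec m : cube.measures()) {
            xml.append("    <Measure name=\"").append(esc(m.name()))
               .append("\" column=\"").append(esc(m.column()))
               .append("\" aggregator=\"").append(esc(m.aggregator())).append("\"/>\n");
        }
        xml.append("  </Cube>\n</Schema>\n");
        return xml.toString();
    }

    /** Minimal escaping for XML attribute values. */
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Other back ends (e.g. an Oracle OLAP analytical workspace) would get their own translator from the same `CubeSpec`, which is the point of keeping the description generic.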
Example: Data Warehouse Design
Design Module - Mondrian
• Input data: GPS dataset, events (e.g. POIs) and space areas
• Data warehouse: an Episode Cube whose Episode Facts are linked to the Time, Space, Event and Trajectory dimensions
Java Object Translation, across four levels: Raw Data Level, Run Time Memory Level, Database Storage Level, Multidimensional Mapping Level
• Trajectory Extractor (ETL procedure), which yields List<Trajectory> and List<Episode>
• Other in-memory objects: List<Event> (LifeSpanPoint), List<Geometry>, List<LifeSpan>
• Relational load procedures (alternatively Hibernate persistence): Event Table Loader, Space Table Loader, Trajectory Table Loader, Episodes Fact Table Loader, Time Table Loader
First Step: data are streamed from raw datasets into primary memory.
Second Step: Java objects are instantiated, buffered and asynchronously sent to the RDBMS.
Third Step: Java objects are persisted into the RDBMS and properly indexed.
Fourth Step: the multidimensional model is instantiated from RDBMS data sources plus DW metadata definitions.
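The second and third steps (buffering Java objects and sending them asynchronously to the RDBMS) follow a common batching pattern. The sketch below shows one way to implement it with a single background writer thread. `BatchingLoader`, the batch size and the `Consumer<List<T>>` sink (which in a real loader might run a JDBC batch insert or a Hibernate flush) are assumptions, not the toolkit's actual loader classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Buffers objects in memory and hands full batches to a sink on a background thread. */
final class BatchingLoader<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;          // e.g. a JDBC batch insert (assumption)
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private List<T> buffer = new ArrayList<>();    // filled by a single producer thread

    BatchingLoader(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one object; a full buffer is sent asynchronously. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    /** Hands the current buffer (if non-empty) to the writer thread and starts a new one. */
    void flush() {
        if (buffer.isEmpty()) return;
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        writer.execute(() -> sink.accept(batch));  // batches are written in submission order
    }

    /** Flushes the remainder and waits until every batch has been written. */
    @Override
    public void close() {
        flush();
        writer.shutdown();
        try {
            if (!writer.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("writer did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // preserve the interrupt status
        }
    }
}
```

Using a single writer thread keeps batches in order and avoids concurrent writes to the same tables, at the cost of limiting write parallelism.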
Some Experiments
The Milano Dataset
Our experiments aim to test:
• SOLAP queries
• STOLAP queries
• validation of the custom-specified “Presence” measure
Feature        Value
Records        2075213
Trajectories   83134
Stops          464584
Moves          1527495
POIs           39776
What is the role of semantics in query complexity?
STOLAP Query
STOLAP query with semantics: give the number of visits of a moving entity to events of type “Restaurant”, considering only trajectories that started within a range Θ of a residential area (where a residential area is a record of the Event dimension).
STOLAP query without semantics: give the number of visits of a moving entity occurring within a range Θ of regions with a high concentration Ω of trajectories at lunch time (12:00-13:00) or dinner time (19:00-20:00), considering only trajectories that started near a residential area (defined as an area where a number ω of trajectories start).
// distance-based spatial filter between the event shape and the stop measure
SpatialFilter filter = new DistanceFilter(eventDimension.getProperty("Event Shape"), stopMeasure, 1);

OlapQuery query = new OlapQuery();
query.addSelection(presenceMeasure, OlapQuery.COLUMNS);   // measure on columns
query.addSelection(eventDimension, OlapQuery.ROWS);       // event members on rows
query.addFilter(filter);
query.addCondition("[Event].[Food Shop]");
query.addCondition("[Trajectory].[Trajectory Group].[Number of Trajectories > 10]");
query.setCube(stCube);
query.execute();
N_VISITS   OBJET_ID
64640      89754
56055      78796
52015      70702
49995      76930
47470      79088
46460      82085
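To clarify what the semantic query computes, independently of the OLAP engine, the sketch below counts, per moving entity, the stops that fall within a distance Θ (`theta`) of an event of a given category, keeping only trajectories whose first point lies within Θ of a residential-area event. This in-memory version, its planar distance, the category label "Residential Area" and the types `Stop`, `Poi` and `SemanticVisitCounter` are illustrative assumptions; the toolkit itself evaluates the query through its OLAP layer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A stop episode of a trajectory, planar coordinates (hypothetical type). */
record Stop(String entityId, String trajectoryId, double x, double y) {}

/** A record of the Event dimension, e.g. a POI (hypothetical type). */
record Poi(String category, double x, double y) {}

final class SemanticVisitCounter {
    private static final String RESIDENTIAL = "Residential Area"; // hypothetical category label

    private SemanticVisitCounter() {}

    /** True if (x, y) lies within theta of at least one event of the given category. */
    private static boolean nearAny(double x, double y, List<Poi> events, String category, double theta) {
        for (Poi p : events) {
            if (p.category().equals(category) && Math.hypot(x - p.x(), y - p.y()) <= theta) return true;
        }
        return false;
    }

    /**
     * Counts visits per entity to events of targetCategory.
     * trajectoryStart maps a trajectory id to the {x, y} of its first point.
     */
    static Map<String, Integer> countVisits(List<Stop> stops, Map<String, double[]> trajectoryStart,
                                            List<Poi> events, String targetCategory, double theta) {
        Map<String, Integer> visits = new TreeMap<>();
        for (Stop s : stops) {
            double[] start = trajectoryStart.get(s.trajectoryId());
            // semantic condition: the trajectory must start near a residential area
            if (start == null || !nearAny(start[0], start[1], events, RESIDENTIAL, theta)) continue;
            if (nearAny(s.x(), s.y(), events, targetCategory, theta)) {
                visits.merge(s.entityId(), 1, Integer::sum);
            }
        }
        return visits;
    }
}
```

With semantics, "residential area" is just a lookup on Event records; the query without semantics would first have to derive such areas (and the Ω-dense regions) from the trajectories themselves, which is where the extra complexity comes from.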
Presence Measure Validation
Presence measure problem: how to aggregate the number of trajectories within a hierarchical, fully geometric dimension while avoiding the double-counting problem?
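Example: a single trajectory crosses four adjacent areas A, B, C and D. Each area counts it once (1 + 1 + 1 + 1), so a Sum aggregation at the parent level returns 4 instead of 1 (1 != 4).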
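The double-counting problem can be reproduced in a few lines. In the sketch below (hypothetical names and data), one trajectory `T1` crosses the areas A, B, C and D: summing the per-area counts at the parent level gives 4, whereas counting distinct trajectories over the union of the child areas gives the expected 1. The distinct count is shown only to illustrate the problem; the toolkit's own approach relies on spatial operators.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PresenceRollUp {
    private PresenceRollUp() {}

    /** Naive roll-up: adds per-area trajectory counts, so shared trajectories are counted repeatedly. */
    static int naiveSum(Map<String, Set<String>> trajectoriesByArea) {
        int total = 0;
        for (Set<String> ids : trajectoriesByArea.values()) total += ids.size();
        return total;
    }

    /** Distinct roll-up: each trajectory is counted once across all child areas. */
    static int distinctCount(Map<String, Set<String>> trajectoriesByArea) {
        Set<String> union = new HashSet<>();
        for (Set<String> ids : trajectoriesByArea.values()) union.addAll(ids);
        return union.size();
    }
}
```

Distinct counts are not additive, so they cannot be rolled up from pre-aggregated child totals; this is what makes the measure hard to support in a standard cube.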
Solution: define an aggregation algorithm that can use spatial operators!
Our application can inject SQL expressions to define spatial aggregates:
String sqlExpression = "case when get_trj_space_area_intersections(trdw_episode_facts.geom) > 0 then ceil(1/get_trj_space_area_intersections(trdw_episode_facts.geom)) else 0 end ";
Measure presence = new VirtualMeasure("Trj Presence Measure", factTable, "presence", sqlExpression);
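The SQL expression relies on a database function, `get_trj_space_area_intersections`, whose definition is not shown. Assuming it returns the number of Space-dimension areas that a geometry intersects, its logic can be sketched in Java with `java.awt.geom`, approximating each trajectory as a polyline and each area as an axis-aligned rectangle; both simplifications, and the class name `AreaIntersections`, are assumptions for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

final class AreaIntersections {
    private AreaIntersections() {}

    /**
     * Counts the areas touched by a polyline given as {x, y} vertices.
     * Hypothetical stand-in for get_trj_space_area_intersections.
     */
    static int countIntersectedAreas(List<double[]> polyline, List<Rectangle2D> areas) {
        int count = 0;
        for (Rectangle2D area : areas) {
            if (touches(polyline, area)) count++;
        }
        return count;
    }

    /** True if any segment of the polyline (or its single point) meets the rectangle. */
    private static boolean touches(List<double[]> polyline, Rectangle2D area) {
        if (polyline.size() == 1) {
            double[] p = polyline.get(0);
            return area.contains(p[0], p[1]);
        }
        for (int i = 1; i < polyline.size(); i++) {
            double[] a = polyline.get(i - 1), b = polyline.get(i);
            if (area.intersectsLine(a[0], a[1], b[0], b[1])) return true;
        }
        return false;
    }
}
```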
Results on a subset of 260 trajectories (presence per province – municipality):
Milano – Arese: 2
Milano – Assago: 2
Milano – Bollate: 1
Milano – Bresso: 2
Milano – Buccinasco: 2
Milano – Cesano Boscone: 6
Milano – Cormano: 2
Milano – Corsico: 2
Milano – Cusano Milanino: 2
Milano – Gaggiano: 2
Milano – Locate di Triulzi: 2
Milano – Milano: 186
Milano – Novate: 2
Milano – Opera: 2
Milano – Pero: 2
Milano – Peschiera Borromeo: 2
Milano – Rho: 14
Milano – Rozzano: 2
Milano – San Donato Milanese: 1
Milano – San Giuliano Milanese: 6
Milano – Segrate: 2
Milano – Settimo Milanese: 8
Milano – Trezzano Rosa: 4
Milano – Zibido San Giacomo: 2
Monza and Brianza – Mezzano: 2
Province level: Milano: 258; Monza and Brianza: 2
Region level: Lombardia: 260
Conclusions
Summarizing: we are proposing
• a cross-database cross-platform generic middleware for spatio-temporal DW
• a modular architecture that can be enriched with user-defined aggregation functions
• a proposal for independent integration of Semantics for Trajectories
• the first (known) implementation of a semantically enriched Trajectory Data Warehouse
Thanks for your attention
Any questions? Suggestions? Comments?
For more information: http://st-toolkit.sourceforge.net/