OrientDB: Time Series and Event Sequences (Codemotion Milan 2014)
Post on 10-Jul-2015
Time flows, my friend
Luigi Dell’Aquila
Orient Technologies LTD
Twitter: @ldellaquila
Managing event sequences and time series with a Document-Graph Database
Codemotion Milan 2014
Time What…?
Time series:
A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)
Time What…?
Event sequences:
• A set of events with a timestamp
• A set of relationships “happened before/after”
• Cause and effect relationships
Time What…?
Time as a dimension:
• Direct:
– E.g. the beginning and end of a relationship (I’ve been a friend of John since…)
• Calculated:
– E.g. speed (distance/time)
Time What…?
Time as a constraint:
• Query execution time!
The problem: Fast and Effective
Fast and Effective
Fast write: Time doesn’t wait! Writes just arrive
Fast read: a lot of data to be read in a short time
Effective manipulation: complex operations like
- Aggregation
- Prediction
- Analysis
Current approaches
0. Relational approach: table
Timestamp            Value
2014:11:21 14:35:00  1321
2014:11:21 14:35:01  2444
2014:11:21 14:35:02  2135
2014:11:21 14:35:03  1833
Current approaches
0. Relational approach: table
HH  MM  SS  Value
14  35  0   1321
14  35  1   2444
14  35  2   2135
14  35  3   1833
Current approaches
0. Relational – Advantages
• Simple
• It can be used together with your application data (operational)
Current approaches
0. Relational – Disadvantages
• Slow read (relies on an index)
• Slow insert (update the index…)
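The relational layout, and the index that both reads and inserts depend on, can be sketched with SQLite from the Python standard library. This is only an illustration: the table name, column names and the ISO timestamp format (chosen so that string order matches time order) are assumptions, not something the slides prescribe.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts TEXT NOT NULL, value INTEGER NOT NULL)")
# Range reads rely on this index, and every insert has to update it.
conn.execute("CREATE INDEX idx_samples_ts ON samples (ts)")

rows = [
    ("2014-11-21 14:35:00", 1321),
    ("2014-11-21 14:35:01", 2444),
    ("2014-11-21 14:35:02", 2135),
    ("2014-11-21 14:35:03", 1833),
]
conn.executemany("INSERT INTO samples (ts, value) VALUES (?, ?)", rows)


def total_between(start, end):
    """Sum the values whose timestamp falls in [start, end]."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM samples WHERE ts BETWEEN ? AND ?",
        (start, end),
    )
    return cur.fetchone()[0]
```

Every aggregation over a time window goes through the index on ts, which is the cost the slide points at.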
Current approaches
1. Document Database
• Collections of Documents instead of tables
• Schemaless
• Complex data structures
Current approaches
1. Document approach: Minute Based
{
  timestamp: "2014-11-21 12.05",
  load: [10, 15, 3, … 30] // array of 60, one per second
}
Current approaches
1. Document approach: Hour Based
{
  timestamp: "2014-11-21 12.00",
  load: {
    0: [10, 15, 3, … 30], // array of 60, one per second
    1: [0, 12, 31, … 24],
    …
    59: [10, 10, 1, … 16]
  }
}
Current approaches
1. Document approach – Advantages
• Fast write: one insert per document, every other sample is an in-place update
• Fast fetch
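A minimal sketch of the minute-based write path, assuming a plain Python dict as a stand-in for the document collection (the names collection and record are hypothetical). The first sample of a minute creates the document, and the remaining 59 only update it.

```python
from datetime import datetime

# Hypothetical in-memory "collection": minute key -> document.
collection = {}


def record(ts, value):
    """Store one per-second sample in its minute document.

    Returns "insert" when a new document was created, "update" otherwise.
    """
    key = ts.strftime("%Y-%m-%d %H.%M")
    doc = collection.get(key)
    if doc is None:
        # First sample of the minute: create the document with 60 empty slots.
        doc = {"timestamp": key, "load": [None] * 60}
        collection[key] = doc
        operation = "insert"
    else:
        operation = "update"
    doc["load"][ts.second] = value
    return operation
```

Reading a whole minute back is a single lookup by key, which is where the fast fetch comes from.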
Current approaches
1. Document approach – Disadvantages
• Fixed time windows
• Single point per unit
• How to pre-aggregate?
• Relationships with the rest of the world?
• Relationships between events?
Current approaches
2. Graph Database
• Nodes/Edges instead of tables
• Index free adjacency
• Fast traversal
• Dynamic structure
Current approaches
2. Graph approach: linked sequence
e1 -next-> e2 -next-> e3 -next-> e4 -next-> e5
(timestamp on vertex)
Current approaches
2. Graph approach: linked sequence (tag based)
(Diagram: events e1 to e5, each carrying a tag list such as [Tag1, Tag2], [Tag1] or [Tag2]; events that share a tag are chained by a per-tag edge, nextTag1 or nextTag2)
Current approaches
2. Graph approach: Hierarchy
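(Diagram: a tree with one level per time unit, Days -> Hours -> Minutes -> Seconds, with numbered children at each level (hours 1 … 24, minutes 1 … 60); the events e1, e2, e3 … e60 hang off the lowest level)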
Current approaches
2. Graph approach: mixed
(Diagram: the same Days -> Hours -> Minutes -> Seconds hierarchy, combined with the linked sequence of events e1, e2, e3 … e60)
Current approaches
2. Graph approach – Advantages
• Flexible
• Events can be connected together in different ways
• You can connect events to other entities
• Fast traversal of dynamic time windows
• Fast aggregation (based on hierarchy)
Current approaches
2. Graph approach – Disadvantages
• Slow writes (vertex + edge + maintenance)
• Not so fast reads
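To make the trade-off concrete, here is a small plain-Python sketch of the linked sequence, with hypothetical Event, link and window names. Reading a dynamic time window only follows next edges from a starting vertex, while each write has to create a vertex and an edge.

```python
class Event:
    """A vertex carrying a timestamp, linked to its successor by a 'next' edge."""

    def __init__(self, name, timestamp):
        self.name = name
        self.timestamp = timestamp
        self.next = None


def link(events):
    """Chain the events in order (one edge per pair) and return the first one."""
    for current, following in zip(events, events[1:]):
        current.next = following
    return events[0]


def window(start, until):
    """Follow 'next' edges from start, collecting events with timestamp <= until."""
    found = []
    node = start
    while node is not None and node.timestamp <= until:
        found.append(node.name)
        node = node.next
    return found
```

The window can start at any event and stop at any timestamp, with no fixed bucket size involved.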
Can we mix different models and get all the advantages?
Can we mix all this with the rest of application logic?
Multi-Model!
• Document database (schema-free, complex properties)
• Graph database (index-free adjacency, fast traversal)
• SQL (extended)
• Operational (schema, ACID)
• OO concepts (classes, inheritance, polymorphism)
• REST/JSON interface
• Native JavaScript (extend the query language, expose services, event hooks)
• Distributed architecture (multi-master replication/sharding)
OrientDB
First step: put them together
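(Diagram: a graph hierarchy Days -> Hours -> Minutes; each minute vertex holds a document with one value per second)
{
  0: 1000,
  1: 1500,
  …
  59: 96
}
Graph: the Days/Hours/Minutes hierarchy
Document: the per-second map <- IT’S A VERTEX TOO!!!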
OrientDB
First step: put them together
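(Diagram: the graph hierarchy now stops at Days -> Hours; each hour vertex holds a document with one nested map per minute)
{
  0: {
    0: 1000,
    1: 1500,
    …
    59: 210
  },
  1: { … },
  …
  59: { … }
}
Graph: the Days/Hours hierarchy
Document: the per-minute, per-second map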
Where should I stop?
It depends on my domain and requirements.
OrientDB
Result:
• Same insert speed as the Document approach
• But with the flexibility of a Graph
• (as a side effect of mixing models, documents can also contain “pointers” to other elements of the application domain)
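The combined model can be sketched in plain Python as a vertex that also carries a document. This is not OrientDB code: the Vertex class and the write function are hypothetical, and the hierarchy stops at the minute level as in the first variant of this step.

```python
class Vertex:
    """A graph vertex that is also a document (a free-form dict of fields)."""

    def __init__(self, label):
        self.label = label
        self.children = {}  # outgoing edges, keyed by the time unit value
        self.doc = {}       # document part: per-second values on minute vertices

    def child(self, key, label):
        """Return the child for key, creating the vertex and edge when missing."""
        if key not in self.children:
            self.children[key] = Vertex(label)
        return self.children[key]


root = Vertex("root")


def write(day, hour, minute, second, value):
    """Walk (or create) Days -> Hours -> Minutes, then update the document."""
    minute_vertex = (
        root.child(day, "day").child(hour, "hour").child(minute, "minute")
    )
    minute_vertex.doc[second] = value
    return minute_vertex
```

Vertices and edges are created only for the first sample of a minute; later samples in that minute touch just the document, which is why the insert cost stays close to the Document approach.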
OrientDB
Second step: Pre-aggregate
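(Diagram: the Days -> Hours -> Minutes hierarchy with per-second documents on the minute vertices; a sum() is pre-computed on the levels above the raw values)
{
  0: 1000,
  1: 1500,
  …
  59: 96
}
Graph: the Days/Hours/Minutes hierarchy, with sum() stored on its vertices
Document: the per-second map <- IT’S A VERTEX TOO!!!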
OrientDB
How to aggregate
Hooks: server-side triggers (Java or JavaScript), executed when DB operations happen (e.g. insert or update)
Java interface:
public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);
OrientDB
Aggregation logic
• Second 0 -> insert
• Second 1 -> update
• …
• Second 57 -> update
• Second 58 -> update
• Second 59 -> update + aggregate
– Write aggregate value on minute vertex
• Minute == 59? Calculate aggregate on hour vertex
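The aggregation logic above can be sketched as a function that plays the role of an after-insert/after-update hook. This is plain Python, not the OrientDB hook API: TimeVertex and on_after_write are hypothetical names, and it assumes samples arrive in order so that second 59 closes a minute.

```python
class TimeVertex:
    """Hypothetical hour or minute vertex: samples or children plus an aggregate."""

    def __init__(self):
        self.samples = {}  # minute vertices only: second -> value
        self.minutes = {}  # hour vertices only: minute -> TimeVertex
        self.sum = None    # None means "not aggregated yet"


def on_after_write(hour, minute, second, value):
    """Stand-in for a hook that runs after each sample is written."""
    minute_vertex = hour.minutes.setdefault(minute, TimeVertex())
    minute_vertex.samples[second] = value
    if second == 59:
        # Last second of the minute: write the aggregate on the minute vertex.
        minute_vertex.sum = sum(minute_vertex.samples.values())
        if minute == 59:
            # Last minute of the hour: aggregate the minute sums on the hour vertex.
            hour.sum = sum(
                m.sum for m in hour.minutes.values() if m.sum is not None
            )
```

Until a minute or an hour is complete its sum stays None, which is the signal the query logic uses to descend one level.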
OrientDB
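(Diagram: the Days -> Hours -> Minutes hierarchy with an aggregate stored on each vertex. Vertices marked complete carry a value (sum = 1000, sum = 15000, sum = 300); vertices marked incomplete carry sum = null. The minute document shown is {0: 1, 1: 12, … 59: 3})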
OrientDB
Query logic:
• Traverse from the root node to the specified level (filtering based on vertex data)
• Is there an aggregate value?
– Yes: return it
– No: go one level down and do the same
Aggregation on a level will be VERY fast if you have horizontal edges!
OrientDB
How to calculate aggregate values with a query
Input params:
- Root node (suppose it is #11:11)
select sum(aggregateVal) from (
  traverse out() from #11:11
  while in().aggregateVal is null
)
With the same logic you can query based on time windows
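The same query logic, written as a recursive function over a hypothetical in-memory hierarchy (the Node class is an assumption, not the OrientDB data model): a stored aggregate is returned as-is, and only vertices without one are expanded.

```python
class Node:
    """Hypothetical hierarchy vertex: optional aggregate, children, raw values."""

    def __init__(self, aggregate=None, children=(), values=()):
        self.aggregate = aggregate
        self.children = list(children)
        self.values = list(values)  # raw samples, only on the lowest level


def total(node):
    """Return the stored aggregate when present, otherwise go one level down."""
    if node.aggregate is not None:
        return node.aggregate
    return sum(node.values) + sum(total(child) for child in node.children)
```

For example, a day with one complete hour and one hour still in progress reads one stored value for the first and descends only into the second.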
OrientDB
Third step: Complex domains
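(Diagram: an Hours -> Minutes hierarchy; each minute vertex holds an enriched document)
{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {
    val: 96,
    eventTags: [tag1, tag2]
    …
  }
}
Graph: the Hours/Minutes hierarchy
Document <- Enrich the domain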
OrientDB
Another use case: Event Categories and OO
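(Diagram: the tag-based linked sequence of events e1 to e5, with tag lists such as [Tag1, Tag2, Tag3], [Tag1], [Tag1, Tag2] and [Tag2], chained by nextTag1 and nextTag2 edges, plus a nextTag3 edge leading to an event tagged [Tag3])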
OrientDB
Another use case: Event Categories and OO
Suppose tags are hierarchical categories (classes for vertices and/or edges)
nextTAG
  nextTagX
    nextTag1
    nextTag2
  nextTag3
OrientDB
Subset of events
TRAVERSE out('nextTag1') FROM <e1>
e1 -nextTag1-> e2 -nextTag1-> e4 -nextTag1-> e5
(tag lists shown: [Tag1, Tag2, Tag3], [Tag1], [Tag1, Tag2], [Tag1])
OrientDB
Subset of events
TRAVERSE out('nextTag2') FROM <e1>
e1 -nextTag2-> e3 -nextTag2-> e5
(tag lists shown: [Tag1, Tag2, Tag3], [Tag1, Tag2], [Tag2])
OrientDB
Subset of events (Polymorphic!!!)
TRAVERSE out('nextTagX') FROM <e1>
(Diagram: both the nextTag1 and the nextTag2 chains are followed, so e1, e2, e3, e4 and e5 are all visited; tag lists shown: [Tag1, Tag2, Tag3], [Tag1], [Tag1, Tag2], [Tag1], [Tag2])
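The three traversals can be reproduced with a small plain-Python sketch. It assumes the class hierarchy nextTAG > nextTagX > nextTag1/nextTag2 (with nextTag3 directly under nextTAG) and the edge list below, both read off the diagrams; PARENT, EDGES, is_a and traverse are hypothetical names, not OrientDB calls.

```python
# Assumed edge class hierarchy: child class -> parent class.
PARENT = {
    "nextTag1": "nextTagX",
    "nextTag2": "nextTagX",
    "nextTagX": "nextTAG",
    "nextTag3": "nextTAG",
}

# Assumed edges of the example sequence: (source, edge class, target).
EDGES = [
    ("e1", "nextTag1", "e2"),
    ("e2", "nextTag1", "e4"),
    ("e4", "nextTag1", "e5"),
    ("e1", "nextTag2", "e3"),
    ("e3", "nextTag2", "e5"),
]


def is_a(edge_class, wanted):
    """True when edge_class equals wanted or inherits from it."""
    while edge_class is not None:
        if edge_class == wanted:
            return True
        edge_class = PARENT.get(edge_class)
    return False


def traverse(start, edge_class):
    """Depth-first traversal following edges of edge_class or its subclasses."""
    visited, stack = [], [start]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.append(vertex)
        for source, label, target in EDGES:
            if source == vertex and is_a(label, edge_class):
                stack.append(target)
    return visited
```

Asking for the parent class nextTagX matches both subclasses, which is the polymorphic behaviour the last slide highlights.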
Connect all this with the rest of your application domain
You’ll see, everything will get more complex: you will discover new time-related dimensions (speed, position…) and new needs (complex forecasting)
CHASE!
Chase
• Your target is running away
• You have informers that track his moves (coordinates at a point in time) and give you additional (unstructured) information
• You have a street map
• You want to:
– Catch him ASAP
– Predict his moves
– Be sure that he is inside an area
Chase
• Map is made of points and distances
• You also have speed limits for streets
(Diagram: Map points point1 … pointN connected by Streets; each Street carries a distance and a speed limit)
Distance  Max speed
1 km      70 km/h
2 km      120 km/h
8 km      90 km/h
Chase
• Map is made of points and distances
• You also have speed limits for streets
• Distance / Speed = TIME!!!
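As a worked example, the three streets from the map slide give these minimum travel times. The helper name travel_time_minutes is hypothetical, and the result assumes the target drives at exactly the speed limit.

```python
def travel_time_minutes(distance_km, max_speed_kmh):
    """Minimum time to drive a street at its speed limit, in minutes."""
    return distance_km / max_speed_kmh * 60


# Streets from the map slide: (distance in km, speed limit in km/h).
streets = [(1, 70), (2, 120), (8, 90)]
times = [travel_time_minutes(d, v) for d, v in streets]
```

That is roughly 0.86, 1.0 and 5.33 minutes, so each street edge can carry a time weight derived from its two stored properties.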
Chase
You have a time series of your target’s moves
{
  Timestamp: 29/11/2014 17:15:00
  LAT: 19,12223
  LON: 42,134
}
{
  Timestamp: 29/11/2014 17:55:00
  LAT: 19,12223
  LON: 42,134
}
(Diagram: each document is an Event; the Events are linked into an Event sequence)
Chase
You have a time series of your target’s moves
(Diagram: events dated 20/11/2014 1:20:00 PM and 21/11/2014 2:35:00 PM, placed on Map points connected by Streets)
Chase
You have a time series of your target’s moves
(Diagram: an Event sequence with Events dated 20/11/2014 13:20:00, 21/11/2014 14:35:00 and 29/11/2014 17:55:00; a Where edge links each Event to a Map point, and Map points are connected by Streets)
Chase
Vertices and edges are also documents
So you can store complex information inside them
{
  timestamp: 22213989487987,
  lat: xxxx,
  lon: yyy,
  informer: 15,
  additional: {
    speed: 120,
    description: "the target was in a car",
    car: {
      model: "Fiat 500",
      licensePlate: "AA 123 BB"
    }
  }
}
Chase
Now you can:
• Predict his moves (eg. statistical methods, interpolation on lat/lon + time)
• Calculate how far he can be (based on last position, avg speed and street data)
• Reach him quickly (shortest path, Dijkstra)
• … intelligence?
Chase
But to have all this you need:
• An easy way for your informers to send time series events
Hint: REST interface
With OrientDB you can expose JavaScript functions as REST services!
Chase
And you need:
• An extended query language
E.g.
TRAVERSE out("street") FROM (
  SELECT out("point") FROM #11:11 // my last event
) WHILE canBeReached($current, #11:11)
(where he could be)
Chase
With OrientDB you can write
function canBeReached(node, event)
in JavaScript and use it in your queries
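The slides do not show the body of canBeReached, so the following is only one possible reading, sketched in Python rather than as an OrientDB JavaScript function. It runs Dijkstra over travel time (distance divided by speed limit) from the last known position and compares the result with the time elapsed since the last event. The STREETS map, the one-way street edges and the three-argument signature are all assumptions made for the example.

```python
import heapq

# Hypothetical street map: point -> list of (neighbor, distance_km, max_speed_kmh).
# Streets are treated as one-way here to keep the sketch short.
STREETS = {
    "A": [("B", 1, 70), ("C", 2, 120)],
    "B": [("D", 8, 90)],
    "C": [("D", 8, 90)],
    "D": [],
}


def fastest_times(origin):
    """Dijkstra over travel time in hours (distance / speed limit)."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, point = heapq.heappop(queue)
        if elapsed > best.get(point, float("inf")):
            continue  # stale queue entry
        for neighbor, distance_km, max_speed_kmh in STREETS[point]:
            candidate = elapsed + distance_km / max_speed_kmh
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def can_be_reached(point, last_seen_point, hours_since_last_seen):
    """True when the target could have driven to point since the last event."""
    needed = fastest_times(last_seen_point).get(point)
    return needed is not None and needed <= hours_since_last_seen
```

With the target last seen at A, point C is reachable after 0.02 hours while D is not, so a traversal guarded by such a function would stop expanding at D.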
Chase
It’s just a game, but think about:
• Fraud detection
• Traffic routing
• Multi-dimensional analytics
• Forecasting
• …
Summary
One model is not enough
One of the most common issues I hear from my customers is:
“I have a zoo of technologies in my application stack, and it’s getting worse every day”
My answer is: Multi-Model DB
of course ;-)
From: “choose the right data model for your use case”
To: “Your application has multiple data models, you need all of them!”
This is NoSQL 2.0!!!
Thank you!
@ldellaquila
l.dellaquila@orientechnologies.com