orientdb - time series and event sequences - codemotion milan 2014


Upload: luigi-dellaquila

Post on 10-Jul-2015


Category:

Data & Analytics



DESCRIPTION

Managing event sequences and time series with a Document-Graph Database

TRANSCRIPT

Page 1: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time flows, my friend

Luigi Dell’Aquila

Orient Technologies LTD

Twitter: @ldellaquila

Managing event sequences and time series with a Document-Graph Database

Codemotion Milan 2014

Page 2: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time What…?

Page 3: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time What…?

Time series:

A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)

Page 4: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time What…?

Event sequences:

• A set of events with a timestamp

• A set of relationships “happened before/after”

• Cause and effect relationships

Page 5: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time What…?

Time as a dimension:

• Direct:

– E.g. begin and end of relationships (I’m a friend of John since…)

• Calculated

– E.g. speed (distance/time)

Page 6: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time What…?

Time as a constraint:

• Query execution time!

Page 7: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

The problem: Fast and Effective

Page 8: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Fast and Effective

Fast write: Time doesn’t wait! Writes just arrive

Fast read: a lot of data to be read in a short time

Effective manipulation: complex operations like

- Aggregation

- Prediction

- Analysis

Page 9: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

Page 10: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

0. Relational approach: table

Timestamp             Value
2014:11:21 14:35:00   1321
2014:11:21 14:35:01   2444
2014:11:21 14:35:02   2135
2014:11:21 14:35:03   1833

Page 11: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

0. Relational approach: table

HH   MM   SS   Value
14   35   0    1321
14   35   1    2444
14   35   2    2135
14   35   3    1833

Page 12: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

0. Relational – Advantages

• Simple

• It can be used together with your application data (operational)

Page 13: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

0. Relational – Disadvantages

• Slow read (relies on an index)

• Slow insert (update the index…)

Page 14: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

1. Document Database

• Collections of Documents instead of tables

• Schemaless

• Complex data structures

Page 15: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

1. Document approach: Minute Based

{
  timestamp: "2014-11-21 12.05",
  load: [10, 15, 3, … 30]  // array of 60, one per second
}
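A minimal sketch of the minute-based write path, assuming an in-memory `Map` as a stand-in for the document collection. The names `minuteDocs`, `minuteKey` and `recordSample` are hypothetical and not part of any database API; the key format uses a colon rather than the dot shown on the slide.

```javascript
// Hypothetical in-memory stand-in for a document collection, keyed by minute.
const minuteDocs = new Map();

// Truncate a Date to its minute (UTC), e.g. "2014-11-21 12:05".
function minuteKey(date) {
  return date.toISOString().slice(0, 16).replace("T", " ");
}

// Store one sample: the first write of a minute inserts the document,
// every later write in the same minute only updates it.
function recordSample(date, value) {
  const key = minuteKey(date);
  let doc = minuteDocs.get(key);
  if (doc === undefined) {
    doc = { timestamp: key, load: new Array(60).fill(null) }; // one slot per second
    minuteDocs.set(key, doc);
  }
  doc.load[date.getUTCSeconds()] = value;
  return doc;
}

recordSample(new Date("2014-11-21T12:05:00Z"), 10);
recordSample(new Date("2014-11-21T12:05:01Z"), 15);
```

Both calls end up in the same document, which is the point of the layout: the second write is an update, not a new record.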

Page 16: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

1. Document approach: Hour Based

{
  timestamp: "2014-11-21 12.00",
  load: {
    0: [10, 15, 3, … 30],  // array of 60, one per second
    1: [0, 12, 31, … 24],
    …
    59: [10, 10, 1, … 16]
  }
}
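As a rough illustration of the hour-based layout, the sketch below keeps one array per minute inside the hour document and reads back a range of seconds. All function names are hypothetical, and the document is a plain in-memory object.

```javascript
// Hypothetical hour document: load[minute][second] = value.
function newHourDoc(timestamp) {
  return { timestamp, load: {} };
}

// Write one value, creating the per-minute array on first use.
function putSample(hourDoc, minute, second, value) {
  if (!(minute in hourDoc.load)) {
    hourDoc.load[minute] = new Array(60).fill(null);
  }
  hourDoc.load[minute][second] = value;
}

// Read all recorded values between two offsets (seconds from the start of the hour).
function readRange(hourDoc, fromSec, toSec) {
  const out = [];
  for (let s = fromSec; s <= toSec; s++) {
    const perMinute = hourDoc.load[Math.floor(s / 60)];
    const value = perMinute ? perMinute[s % 60] : null;
    if (value !== null) out.push(value);
  }
  return out;
}
```

A range that stays inside one hour is a single document fetch; a range that crosses hour boundaries needs several documents, which is where the fixed window starts to show.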

Page 17: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

1. Document approach – Advantages

• Fast write: one insert, then in-place updates (one new document per 60 samples in the minute-based layout)

• Fast fetch

Page 18: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

1. Document approach – Disadvantages

• Fixed time windows

• Single point per unit

• How to pre-aggregate?

• Relationships with the rest of the world?

• Relationships between events?

Page 19: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph Database

• Nodes/Edges instead of tables

• Index free adjacency

• Fast traversal

• Dynamic structure

Page 20: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph approach: linked sequence

e1 --next--> e2 --next--> e3 --next--> e4 --next--> e5

(timestamp on vertex)
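The linked sequence can be sketched with plain objects, where the `next` property plays the role of the edge. This is only an in-memory illustration with hypothetical names, not graph database code.

```javascript
// Each event is a vertex carrying its timestamp; "next" stands in for the edge.
function appendEvent(tail, timestamp, payload) {
  const event = { timestamp, payload, next: null };
  if (tail) tail.next = event;
  return event;
}

// Walk the chain from a starting event and collect events up to a time limit.
// The window is dynamic: any start event and any limit will do.
function eventsUntil(start, limit) {
  const out = [];
  for (let e = start; e !== null && e.timestamp <= limit; e = e.next) {
    out.push(e);
  }
  return out;
}
```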

Page 21: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph approach: linked sequence (tag-based)

e1 [Tag1, Tag2]
e2 [Tag1]
e3 [Tag2]
e4 [Tag1]
e5 [Tag1, Tag2]

nextTag1: e1 --> e2 --> e4 --> e5
nextTag2: e1 --> e3 --> e5
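One way to maintain per-tag chains is to remember the latest event seen for each tag and link every new event from it. The sketch below assumes in-memory objects; `makeEvent`, `appendTagged` and `followTag` are hypothetical helper names.

```javascript
// Hypothetical in-memory event; outgoing edges are stored per tag.
function makeEvent(name, tags) {
  return { name, tags, next: new Map() };
}

// Append an event: for each of its tags, link it from the latest event with that tag.
function appendTagged(lastByTag, event) {
  for (const tag of event.tags) {
    const prev = lastByTag.get(tag);
    if (prev) prev.next.set(tag, event);
    lastByTag.set(tag, event);
  }
  return event;
}

// Follow only the edges of one tag, starting from a given event.
function followTag(start, tag) {
  const names = [];
  for (let e = start; e !== undefined; e = e.next.get(tag)) names.push(e.name);
  return names;
}
```

Appending events in time order is enough to build every chain; reading one tag never touches events that lack it.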

Page 22: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph approach: Hierarchy

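[Diagram: a time tree. Days (1, …) branch into Hours (1 … 8 … 24), Hours into Minutes (1, 2 … 60), Minutes into Seconds, with events e1, e2, e3 … e60 at the Seconds level.]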
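A minimal sketch of the hierarchy, assuming a tree of in-memory vertices keyed by day, hour, minute and second (UTC). The vertex shape and the function names are illustrative assumptions.

```javascript
// Find or create the child vertex with the given key.
function child(node, key) {
  if (!node.children.has(key)) {
    node.children.set(key, { children: new Map(), events: [] });
  }
  return node.children.get(key);
}

// Attach an event under day -> hour -> minute -> second.
function insertEvent(root, date, event) {
  const day = date.toISOString().slice(0, 10);
  const path = [day, date.getUTCHours(), date.getUTCMinutes(), date.getUTCSeconds()];
  let node = root;
  for (const key of path) node = child(node, key);
  node.events.push(event);
}

// Collect every event below a vertex, e.g. everything recorded in one hour.
function eventsBelow(node) {
  let out = [...node.events];
  for (const c of node.children.values()) out = out.concat(eventsBelow(c));
  return out;
}
```

Selecting a time unit is a walk of at most four steps from the root, whatever the total number of events.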

Page 23: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph approach: mixed

[Diagram: the Days / Hours / Minutes / Seconds tree from the previous slide, with events e1, e2, e3 … e60 at the Seconds level.]

Page 24: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph approach – Advantages

• Flexible

• Events can be connected together in different ways

• You can connect events to other entities

• Fast traversal of dynamic time windows

• Fast aggregation (based on hierarchy)

Page 25: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Current approaches

2. Graph approach – Disadvantages

• Slow writes (vertex + edge + maintenance)

• Not so fast reads

Page 26: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Can we mix different models and get all the advantages?

Page 27: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Can we mix all this with the rest of application logic?

Page 28: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Multi-Model!

Page 29: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

• Document database (schema-free, complex properties)

• Graph database (index-free adjacency, fast traversal)

• SQL (extended)

• Operational (schema - ACID)

• OO concepts (Classes, inheritance, polymorphism)

• REST/JSON interface

• Native Javascript (extend query language, expose services, event hooks)

• Distributed (Multi-master replica/sharding) architecture

Page 30: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

First step: put them together

[Diagram: Days (1, …) -> Hours (1 … 8 … 24) -> Minutes (1, 2 … 60); each minute holds a document of per-second values]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

Page 31: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

First step: put them together

[Diagram: Days (1, …) -> Hours (1 … 8 … 24) -> Minutes (1, 2 … 60)]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

Graph: the Days / Hours / Minutes tree

Document: the per-second values <- IT’S A VERTEX TOO!!!
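Putting the two models together can be sketched as a tree that stops at the minute level, where each minute vertex is also a document holding the per-second values. This is an in-memory illustration under that assumption; `vertex`, `minuteVertex` and `write` are hypothetical names.

```javascript
// A vertex is a graph node that is also a document:
// edges live in "children", data lives in "doc".
function vertex() {
  return { children: new Map(), doc: {} };
}

// Walk day -> hour -> minute (UTC), creating vertices as needed,
// and return the minute vertex.
function minuteVertex(root, date) {
  const keys = [date.toISOString().slice(0, 10), date.getUTCHours(), date.getUTCMinutes()];
  let node = root;
  for (const key of keys) {
    if (!node.children.has(key)) node.children.set(key, vertex());
    node = node.children.get(key);
  }
  return node;
}

// Writing a sample is one short traversal plus an update of the minute document.
function write(root, date, value) {
  const v = minuteVertex(root, date);
  v.doc[date.getUTCSeconds()] = value;
  return v;
}
```

Within a minute only the first write creates vertices; the remaining writes are document updates on an existing vertex.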

Page 32: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

First step: put them together

[Diagram: Days (1, …) -> Hours (8 … 24 …)]

{
  0: {
    0: 1000,
    1: 1500,
    …
    59: 210
  },
  1: { … },
  …
  59: { … }
}

Graph: the Days / Hours tree

Document: per-minute maps of per-second values

Page 33: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Where should I stop?

It depends on my domain and requirements.

Page 34: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Result:

• Same insert speed as the Document approach

• But with the flexibility of a Graph

• (as a side effect of mixing models, documents can also contain “pointers” to other elements of the application domain)

Page 35: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Second step: Pre-aggregate

[Diagram: Days (1, …) -> Hours (1 … 8 … 24) -> Minutes (1, 2 … 60)]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

Graph: the Days / Hours / Minutes tree

Document: the per-second values <- IT’S A VERTEX TOO!!!

Page 36: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Second step: Pre-aggregate

[Diagram: Days (1, …) -> Hours (1 … 8 … 24) -> Minutes (1, 2 … 60)]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

Graph: the Days / Hours / Minutes tree

Document: the per-second values <- IT’S A VERTEX TOO!!!

sum()

Page 37: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Second step: Pre-aggregate

[Diagram: Days (1, …) -> Hours (1 … 8 … 24) -> Minutes (1, 2 … 60)]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

Graph: the Days / Hours / Minutes tree

Document: the per-second values <- IT’S A VERTEX TOO!!!

sum()

sum()

Page 38: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

How to aggregate

Hooks: server-side triggers (Java or Javascript), executed when DB operations happen (e.g. insert or update)

Java interface:

public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);

Page 39: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Aggregation logic

• Second 0 -> insert

• Second 1 -> update

• …

• Second 57 -> update

• Second 58 -> update

• Second 59 -> update + aggregate

– Write aggregate value on minute vertex

    • Minute == 59? Calculate aggregate on hour vertex
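The aggregation logic above can be sketched as a callback that runs after each per-second write. The sketch assumes samples arrive in order and that second 59 is always written; `onSecondWritten` is a hypothetical stand-in for a hook, not the OrientDB hook API, and sum is used as the aggregate.

```javascript
// Hypothetical hour vertex: holds its minute vertices and a pre-aggregated sum.
function newHour() {
  return { minutes: {}, sum: null };
}

// Stand-in for an "after insert/update" hook on the per-second write.
function onSecondWritten(hour, minute, second, value) {
  if (!(minute in hour.minutes)) hour.minutes[minute] = { values: {}, sum: null };
  const m = hour.minutes[minute];
  m.values[second] = value; // second 0 creates the minute, later seconds update it
  if (second === 59) {
    // The minute is complete: write its aggregate on the minute vertex.
    m.sum = Object.values(m.values).reduce((a, b) => a + b, 0);
    if (minute === 59) {
      // The hour is complete too: aggregate the minute sums on the hour vertex.
      hour.sum = Object.values(hour.minutes)
        .reduce((a, mm) => a + (mm.sum === null ? 0 : mm.sum), 0);
    }
  }
}
```

Until a unit is complete its `sum` stays `null`, which is what lets a reader tell finished aggregates from unfinished ones.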

Page 40: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

[Diagram: the Days / Hours / Minutes tree with pre-aggregated values. Complete vertices carry their aggregate (sum = 1000, sum = 15000, sum = 300); incomplete vertices have sum = null.]

{
  0: 1,
  1: 12,
  …
  59: 3
}

Page 41: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Query logic:

• Traverse from root node to specified level (filtering based on vertex data)

• Is there aggregate value?

– Yes: return it

– No: go one level down and do the same

Aggregation on a level will be VERY fast if you have horizontal edges!
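The "return it or go one level down" rule is a short recursion. The sketch below assumes in-memory vertices with an optional `sum`, an optional `children` array and, at the lowest level, a `values` document; these field names are assumptions for illustration.

```javascript
// Return the aggregate for a vertex: use the stored value when there is one,
// otherwise go one level down and combine the children.
function aggregate(node) {
  if (node.sum !== null && node.sum !== undefined) return node.sum;
  if (node.children) {
    return node.children.reduce((total, c) => total + aggregate(c), 0);
  }
  // Lowest level without a stored aggregate: add up the raw values.
  return Object.values(node.values || {}).reduce((a, b) => a + b, 0);
}
```

Only incomplete branches are expanded, so the cost depends on how much of the requested range is still open rather than on the number of raw samples.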

Page 42: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

How to calculate aggregate values with a query

Input params:

- Root node (suppose it is #11:11)

select sum(aggregateVal) from (
  traverse out() from #11:11
  while in().aggregateVal is null
)

With the same logic you can query based on time windows

Page 43: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Third step: Complex domains

[Diagram: Hours (1) -> Minutes (1, 2 … 60); each minute holds a document]

{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {
    val: 96,
    eventTags: [tag1, tag2]
    …
  }
}

Graph: the Hours / Minutes tree

Document <- Enrich the domain

Page 44: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Another use case: Event Categories and OO

e1 [Tag1, Tag2, Tag3]
e2 [Tag1]
e3 [Tag2]
e4 [Tag1]
e5 [Tag1, Tag2]

nextTag1: e1 --> e2 --> e4 --> e5
nextTag2: e1 --> e3 --> e5
nextTag3: e1 --> a further event tagged [Tag3] (labelled "e3" in the extracted slide text)

Page 45: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Another use case: Event Categories and OO

Suppose tags are hierarchical categories (Classes for vertices and/or edges)

nextTAG
    nextTagX
        nextTag1
        nextTag2
    nextTag3

Page 46: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Subset of events

TRAVERSE out('nextTag1') FROM <e1>

e1 [Tag1, Tag2, Tag3] --nextTag1--> e2 [Tag1] --nextTag1--> e4 [Tag1] --nextTag1--> e5 [Tag1, Tag2]

Page 47: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Subset of events

TRAVERSE out('nextTag2') FROM <e1>

e1 [Tag1, Tag2, Tag3] --nextTag2--> e3 [Tag2] --nextTag2--> e5 [Tag1, Tag2]

Page 48: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

OrientDB

Subset of events (Polymorphic!!!)

TRAVERSE out('nextTagX') FROM <e1>

e1 [Tag1, Tag2, Tag3]
e2 [Tag1]
e3 [Tag2]
e4 [Tag1]
e5 [Tag1, Tag2]

nextTag1: e1 --> e2 --> e4 --> e5
nextTag2: e1 --> e3 --> e5
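The polymorphic traversal can be mimicked in memory by giving each edge a class and checking that class against a small inheritance table. The table below assumes nextTag1 and nextTag2 extend nextTagX, and nextTagX and nextTag3 extend nextTAG; the helper names are hypothetical, and this is not how OrientDB implements it.

```javascript
// Assumed class hierarchy for edges: child class -> parent class.
const parentOf = {
  nextTag1: "nextTagX",
  nextTag2: "nextTagX",
  nextTagX: "nextTAG",
  nextTag3: "nextTAG",
};

// True when edgeClass is cls or one of its subclasses.
function isA(edgeClass, cls) {
  for (let c = edgeClass; c !== undefined; c = parentOf[c]) {
    if (c === cls) return true;
  }
  return false;
}

// Depth-first traversal that follows every outgoing edge whose class
// is cls or a subclass of cls; returns the reached vertex names, sorted.
function traverseOut(edges, start, cls) {
  const seen = new Set([start]);
  const stack = [start];
  while (stack.length > 0) {
    const from = stack.pop();
    for (const e of edges) {
      if (e.from === from && isA(e.cls, cls) && !seen.has(e.to)) {
        seen.add(e.to);
        stack.push(e.to);
      }
    }
  }
  return [...seen].sort();
}
```

Asking for the parent class returns the union of the Tag1 and Tag2 chains without naming either of them in the query.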

Page 49: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Connect all this with the rest of your application domain

Page 50: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

You’ll see, everything will get more complex: you will discover new time-related dimensions (speed, position…) and new needs (complex forecasting)

Page 51: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

CHASE!

Page 52: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

• Your target is running away

• You have informers that track his moves (coordinates at a point in time) and give you additional (unstructured) information

• You have a street map

• You want to:

– Catch him ASAP

– Predict his moves

– Be sure that he is inside an area

Page 53: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

Page 54: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

Page 55: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

• Map is made of points and distances

• You also have speed limits for streets

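[Diagram: map points point1 … pointN (Map point) connected by streets (Street), each labelled with a distance and a speed limit]

Distance: 1 km, Max speed: 70 km/h
Distance: 2 km, Max speed: 120 km/h
Distance: 8 km, Max speed: 90 km/h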

Page 56: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

• Map is made of points and distances

• You also have speed limits for streets

• Distance / Speed = TIME!!!
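As a small worked example, the travel time of a street follows directly from its distance and speed limit. The three streets use the figures from the map slide; treating them as one route is an assumption made only for the sake of the example.

```javascript
// Minimum time (in minutes) needed to drive a street at its speed limit.
function minTravelMinutes(street) {
  return (street.distanceKm / street.maxSpeedKmh) * 60;
}

// Figures from the map slide; chaining them into one route is an assumption.
const streets = [
  { distanceKm: 1, maxSpeedKmh: 70 },
  { distanceKm: 2, maxSpeedKmh: 120 },
  { distanceKm: 8, maxSpeedKmh: 90 },
];

// Lower bound for the whole route: the sum of the per-street minimum times.
const totalMinutes = streets.reduce((t, s) => t + minTravelMinutes(s), 0);
```

For instance, 2 km at 120 km/h takes one minute, so every edge of the street graph gets a time weight that can be compared with event timestamps.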

Page 57: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

You have a time series of your target’s moves

[Diagram: an event sequence (Event sequence) made of events (Event), each one a document]

{
  Timestamp: 29/11/2014 17:15:00
  LAT: 19,12223
  LON: 42,134
}

{
  Timestamp: 29/11/2014 17:55:00
  LAT: 19,12223
  LON: 42,134
}

{
  Timestamp: 29/11/2014 17:55:00
  LAT: 19,12223
  LON: 42,134
}

Page 58: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

You have a time series of your target’s moves

[Diagram: map points (Map point) connected by streets (Street); two points are labelled with timestamps]

21/11/2014 2:35:00 PM
20/11/2014 1:20:00 PM

Page 59: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

You have a time series of your target’s moves

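[Diagram: events (Event) in an event sequence (Event sequence), linked by "Where" to map points (Map point); map points are connected by streets (Street)]

21/11/2014 14:35:00
20/11/2014 13:20:00
29/11/2014 17:55:00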

Page 60: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

Vertices and edges are also documents

So you can store complex information inside them

{
  timestamp: 22213989487987,
  lat: xxxx,
  lon: yyy,
  informer: 15,
  additional: {
    speed: 120,
    description: "the target was in a car",
    car: {
      model: "Fiat 500",
      licensePlate: "AA 123 BB"
    }
  }
}

Page 61: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

Now you can:

• Predict his moves (eg. statistical methods, interpolation on lat/lon + time)

• Calculate how far he can be (based on last position, avg speed and street data)

• Reach him quickly (shortest path, Dijkstra)

• … intelligence?
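For the "reach him quickly" point, a compact Dijkstra over the street map with travel time as the edge weight might look like the sketch below. It assumes two-way streets and uses a linear scan instead of a priority queue to stay short; the data shape `{ from, to, distanceKm, maxSpeedKmh }` is hypothetical.

```javascript
// streets: [{ from, to, distanceKm, maxSpeedKmh }], treated as two-way (an assumption).
// Returns a Map from each reachable point to the fastest travel time in hours.
function fastestTimes(streets, source) {
  const adj = new Map();
  const link = (a, b, hours) => {
    if (!adj.has(a)) adj.set(a, []);
    adj.get(a).push({ to: b, hours });
  };
  for (const s of streets) {
    const hours = s.distanceKm / s.maxSpeedKmh;
    link(s.from, s.to, hours);
    link(s.to, s.from, hours);
  }
  const best = new Map([[source, 0]]);
  const done = new Set();
  while (true) {
    // Pick the unsettled point with the smallest known time.
    let current = null;
    for (const [p, t] of best) {
      if (!done.has(p) && (current === null || t < best.get(current))) current = p;
    }
    if (current === null) return best;
    done.add(current);
    // Relax every street leaving the settled point.
    for (const { to, hours } of adj.get(current) || []) {
      const t = best.get(current) + hours;
      if (!best.has(to) || t < best.get(to)) best.set(to, t);
    }
  }
}
```

Weighting by time rather than distance means a longer but faster road can win, which is usually what a chase needs.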

Page 62: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

But to have all this you need:

• An easy way for your informers to send time series events

Hint: REST interface

With OrientDB you can expose Javascript functions as REST services!

Page 63: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

And you need:

• An extended query language

E.g.

TRAVERSE out("street") FROM (
  SELECT out("point") FROM #11:11  // my last event
) WHILE canBeReached($current, #11:11)

(where he could be)

Page 64: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

With OrientDB you can write

function canBeReached(node, event)

In Javascript and use it in your queries
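What such a function could look like is sketched below, outside the database. This version takes the current time as an extra argument to stay deterministic and expects the travel time to be precomputed on the node; both choices, and every field name, are assumptions rather than the author's implementation. `whereCouldHeBe` imitates the TRAVERSE ... WHILE query in plain Javascript.

```javascript
// node: { hoursFromLastSeen }, event: { point, timestamp } (epoch milliseconds).
// True when the target had enough time to get to the node since the event.
function canBeReached(node, event, now) {
  const elapsedHours = (now - event.timestamp) / 3600000;
  return node.hoursFromLastSeen <= elapsedHours;
}

// streetsFrom: { point: [{ to, distanceKm, maxSpeedKmh }] } (hypothetical shape).
// Expands street by street from the last known point and stops wherever
// the predicate fails; returns the sorted names of the reachable points.
function whereCouldHeBe(streetsFrom, event, now) {
  const reached = new Map([[event.point, 0]]);
  const queue = [event.point];
  while (queue.length > 0) {
    const p = queue.shift();
    for (const s of streetsFrom[p] || []) {
      const hours = reached.get(p) + s.distanceKm / s.maxSpeedKmh;
      const known = reached.get(s.to);
      const candidate = { hoursFromLastSeen: hours };
      if (canBeReached(candidate, event, now) && (known === undefined || hours < known)) {
        reached.set(s.to, hours);
        queue.push(s.to);
      }
    }
  }
  return [...reached.keys()].sort();
}
```

The reachable area grows as time passes since the last sighting, so the same call with a later `now` returns a larger set of points.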

Page 65: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Chase

It’s just a game, but think about:

• Fraud detection

• Traffic routing

• Multi-dimensional analytics

• Forecasting

• …

Page 66: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Summary


Page 68: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

One model is not enough

One of the most common issues my customers raise is:

“I have a zoo of technologies in my application stack, and it’s getting worse every day”

My answer is: Multi-Model DB

of course ;-)

Page 69: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

From: “choose the right data model for your use case”

To: “Your application has multiple data models, you need all of them!”

Page 70: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

This is NoSQL 2.0!!!

Page 71: OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Thank you!

@ldellaquila
