Suffering from "complex domains with many relationships"? There's a new cure: Graph Database
Luca Garulli – Founder and CEO @ Orient Technologies Ltd, author of OrientDB
www.orientechnologies.com
www.twitter.com/lgarulli
(c) Luca Garulli. Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.
1979: first relational DBMS available as a product
2009: NoSQL movement
Hey, 30 years is a huge span of time in the IT field!
Before 2009, teams of developers always fought over the choice of:
Operating System
Programming Language
Middleware (App-Servers)
What about the Database?
One of the main reasons RDBMS users resist moving to a NoSQL product is the complexity of the model:
OK, NoSQL products are great for Big Data and big scale,
but...
...what about the model?
What is the NoSQL answer to managing complex domains?
Key-Value stores?
Column-based?
Document databases?
Graph databases!
CAUTION! This presentation will not use a social-like domain with the classic friend-of-friend^N paradigm, where graph databases are already widely used...
...Instead, we will explore how to think «graphically» about one of the most common domains in the enterprise world:
the good old CRM* domain
* today an RDBMS is used in 99% of cases
Every developer knows the Relational Model, but who knows the Graph one?
Back to school: Graph Theory crash course
Basic Graph
Luca --Likes--> NoSQLDay
Property Graph Model*
Vertex «Luca»: name: Luca, surname: Garulli, company: Orient Tech
Vertex «NoSQLDay»: date: Nov 15, 2013
Edge «Likes», from «Luca» to «NoSQLDay»: since: 2013
Vertices and Edges can have properties
Edges are directed
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
Property Graph Model
Luca --Likes (since: 2013)--> NoSQLDay
Luca --Speaks (title: «Switching...», abstract: «This talk presents...»)--> NoSQLDay
An Edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships
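The property graph model can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `PropertyGraphSketch`, `PVertex` and `PEdge` are hypothetical names invented here, not the Blueprints or OrientDB API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal property graph; not the Blueprints API. */
class PropertyGraphSketch {

    /** A vertex has properties plus lists of outgoing and incoming edges. */
    static class PVertex {
        final Map<String, Object> props = new HashMap<>();
        final List<PEdge> out = new ArrayList<>();
        final List<PEdge> in = new ArrayList<>();

        PVertex set(String key, Object value) {
            props.put(key, value);
            return this;
        }

        /** Creates a directed, labeled edge from this vertex to the target. */
        PEdge addEdge(String label, PVertex target) {
            PEdge edge = new PEdge(label, this, target);
            out.add(edge);
            target.in.add(edge);
            return edge;
        }
    }

    /** An edge is directed (from -> to), labeled, and carries its own properties. */
    static class PEdge {
        final String label;
        final PVertex from;
        final PVertex to;
        final Map<String, Object> props = new HashMap<>();

        PEdge(String label, PVertex from, PVertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }

        PEdge set(String key, Object value) {
            props.put(key, value);
            return this;
        }
    }

    public static void main(String[] args) {
        PVertex luca = new PVertex().set("name", "Luca").set("surname", "Garulli");
        PVertex nosqlDay = new PVertex().set("name", "NoSQLDay");
        // Two distinct edges between the same pair of vertices.
        luca.addEdge("Likes", nosqlDay).set("since", 2013);
        luca.addEdge("Speaks", nosqlDay).set("title", "Switching...");
        System.out.println(luca.out.size() + " edges leave the Luca vertex");
    }
}
```

Since every edge links exactly two vertices, a 1-N or N-M relationship is simply a set of edges that share a label.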
Property Graph Model
[Diagram: the vertices Daniel, Luca, NoSQLDay and Udine, connected by the edges Likes, Organizes, FriendOf, located and Studies]
Congratulations, this is your diploma in «Graph Theory»
Now let's go back to our domain: the CRM
Domain: the super minimal CRM
Registry system: Customer, Address
Order system: Order, Stock
How does a Relational DBMS manage relationships?
Relational World: 1-1 Relationships
JOIN Customer.Address -> Address.Id
Customer
Id Name Address
10 Luca 34
11 Jill 44
34 John 54
56 Mark 66
88 Steve 68
Address
Id Location
34 Rome
44 London
54 Moscow
66 New Mexico
68 Palo Alto
Customer.Address is the foreign key; Customer.Id and Address.Id are the primary keys
Relational World: 1-N Relationships
Inverse JOIN Address.Customer -> Customer.Id
Customer
Id Name
10 Luca
11 Jill
34 John
56 Mark
88 Steve
Address
Id Customer Location
24 10 Rome
33 10 London
44 34 Moscow
66 56 Cologne
68 88 Palo Alto
Relational World: N-M Relationships
Additional table with 2 JOINs:
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
Customer
Id Name
10 Luca
11 Jill
34 John
56 Mark
88 Steve
Address
Id Location
24 Rome
33 London
44 Moscow
66 Cologne
68 Palo Alto
CustomerAddress
Id Address
10 24
10 33
34 44
What's wrong with the Relational Model?
The JOIN is the evil!
These are all JOINs executed every time you traverse a relationship!
Customer
Id Name
10 Luca
11 Jill
34 John
56 Mark
88 Steve
Address
Id Location
24 Rome
33 London
44 Moscow
66 Cologne
68 Palo Alto
CustomerAddress
Id Address
10 24
10 33
34 24
A JOIN means searching for a key in another table
The first rule to improve performance is indexing all the keys
An index speeds up searches, but slows down inserts, updates and deletes
So in the best case a JOIN is a lookup into an index
This is done for every single JOIN!
If you traverse hundreds of relationships you're executing hundreds of JOINs
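To make the cost concrete, here is a minimal Java sketch of resolving the N-M relationship from the sample tables through indexes. It is a stand-in under stated assumptions, not how any particular RDBMS is implemented: a `TreeMap` (a balanced tree) plays the role of each index, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: an N-M relationship resolved through indexed tables. */
class JoinLookupSketch {

    /** Index on Address.Id, filled with rows of the sample Address table. */
    static TreeMap<Integer, String> addressIndex() {
        TreeMap<Integer, String> index = new TreeMap<>();
        index.put(24, "Rome");
        index.put(33, "London");
        index.put(44, "Moscow");
        index.put(66, "Cologne");
        index.put(68, "Palo Alto");
        return index;
    }

    /** Index on CustomerAddress.Id: customer id -> address ids. */
    static TreeMap<Integer, List<Integer>> customerAddressIndex() {
        TreeMap<Integer, List<Integer>> index = new TreeMap<>();
        index.put(10, Arrays.asList(24, 33));
        index.put(34, Arrays.asList(44));
        return index;
    }

    /** Resolves the locations of one customer; every hop is a tree search, not a direct link. */
    static List<String> locationsOf(int customerId,
                                    TreeMap<Integer, List<Integer>> joinIndex,
                                    TreeMap<Integer, String> addresses) {
        List<String> locations = new ArrayList<>();
        // JOIN 1: search the customer id in the index of the join table.
        List<Integer> addressIds =
                joinIndex.getOrDefault(customerId, Collections.<Integer>emptyList());
        for (int addressId : addressIds) {
            // JOIN 2: one more index search for every related row.
            locations.add(addresses.get(addressId));
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(locationsOf(10, customerAddressIndex(), addressIndex()));
    }
}
```

Every related row triggers one more search in a tree, so traversing hundreds of relationships repeats that search hundreds of times.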
Index Lookup: is it really that fast?
Index Lookup: how does it work?
Think of an address book in which we have to find Luca's phone number
A-Z -> A-L | M-Z
A-L -> A-D | E-L
M-Z -> M-R | S-Z
Index algorithms are all similar and based on balanced trees
A-D -> A-B | C-D
E-L -> E-G | H-L
E-G -> E-F | G
H-L -> H-J | K-L
Path followed: A-Z -> A-L -> E-L -> H-L -> K-L -> Luca
Found! This lookup took 5 steps, and the number of steps grows with the index size!
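The growth of the step count can be checked with a small Java sketch. It assumes a plain binary search over a sorted array as a stand-in for the balanced tree of a real index. The halving logic is the same idea, but the class and method names are hypothetical.

```java
/** Hypothetical sketch: counting the steps of a lookup in a sorted index. */
class IndexLookupSteps {

    /** Binary search over sorted keys; returns how many halving steps were performed. */
    static int stepsToFind(long[] sortedKeys, long key) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == key) {
                return steps;
            }
            if (sortedKeys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps; // key not present: the search still paid for every step
    }

    /** Builds an index holding the keys 0 .. n-1. */
    static long[] keys(int n) {
        long[] k = new long[n];
        for (int i = 0; i < n; i++) {
            k[i] = i;
        }
        return k;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1_000, 1_000_000}) {
            System.out.println(size + " entries: " + stepsToFind(keys(size), 0) + " steps");
        }
    }
}
```

The step count grows with the logarithm of the number of entries: slowly, but it never stops growing, and it is paid again for every JOIN.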
An index lookup is executed for each JOIN
Querying more tables can easily produce millions of JOINs/lookups!
Here is the rule: more entries = more lookup steps = slower JOIN
Oh! This is why the performance of my database drops as it becomes bigger, and bigger, and bigger!
Is there a better way to manage relationships?
“A graph database is any storage system that provides index-free adjacency”
- Marko Rodriguez (author of TinkerPop Blueprints)
How does a GraphDB manage index-free relationships?
OrientDB: an open source (Apache licensed) document-graph NoSQL DBMS
Let's go back to the Graph Stuff
How does OrientDB manage relationships?
OrientDB: traverse a relationship
Vertex «Luca»: RID = #13:35, label: 'Customer', name: 'Luca'
Vertex «Rome»: RID = #13:100, label: 'Address', name: 'Rome'
The Record ID (RID) is the physical position
OrientDB: traverse a relationship
Vertex «Luca»: RID = #13:35, out: [#14:54], label: 'Customer', name: 'Luca'
Edge «Lives»: RID = #14:54, out: [#13:35], in: [#13:100], label: 'Lives'
Vertex «Rome»: RID = #13:100, in: [#14:54], label: 'Address', name: 'Rome'
The Edge's RID is saved inside both vertices, as «out» and «in»
A GraphDB handles a relationship as a physical LINK to the record, assigned when the edge is created
An RDBMS, on the other hand, computes the relationship every time you query the database
Isn't that crazy?!
This means jumping from an O(log N) algorithm to a near O(1) one:
the traversal cost is no longer affected by the database size!
This is huge in the Big Data age
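Index-free adjacency can be illustrated with a toy Java storage in which a record's id is simply its position. This is a deliberately simplified sketch, not OrientDB's actual storage engine: the names are hypothetical, and the link goes straight from record to record, skipping the intermediate edge record.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of index-free adjacency; not OrientDB's storage engine. */
class IndexFreeAdjacency {

    /** A record stores the positions of the records it links to. */
    static class Rec {
        final String name;
        final List<Integer> out = new ArrayList<>();

        Rec(String name) {
            this.name = name;
        }
    }

    /** The "cluster": the id of a record is its position in this list. */
    final List<Rec> records = new ArrayList<>();

    /** Appends a record and returns its position, which acts as its id. */
    int add(String name) {
        records.add(new Rec(name));
        return records.size() - 1;
    }

    /** The link is written once, when the relationship is created. */
    void link(int from, int to) {
        records.get(from).out.add(to);
    }

    /** Follows the links of a record: one positional access per link, no key search. */
    List<String> neighbours(int position) {
        List<String> names = new ArrayList<>();
        for (int target : records.get(position).out) {
            names.add(records.get(target).name); // cost does not depend on records.size()
        }
        return names;
    }

    public static void main(String[] args) {
        IndexFreeAdjacency db = new IndexFreeAdjacency();
        int luca = db.add("Luca");
        int rome = db.add("Rome");
        db.link(luca, rome);
        System.out.println(db.neighbours(luca));
    }
}
```

Following a link costs the same whether the list holds two records or two hundred million, which is the point of the O(1) claim.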
OrientDB in the Blueprints micro-benchmark, on commodity hardware, with a hot cache, traverses 29.6 million records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home with an RDBMS*!
* unless you live in Google's server farm
Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org) Type 'help' to display all the commands supported.
orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs
orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs
Create the graph in Java
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

// Open (or create) the local database
Graph graph = new OrientGraph("local:/tmp/db/graph");
Vertex luca = graph.addVertex("class:Customer");
luca.setProperty("name", "Luca");
Vertex rome = graph.addVertex("class:Address");
rome.setProperty("name", "Rome");
// Connect the two vertices with a "Lives" edge
Edge edge = luca.addEdge("Lives", rome);
graph.shutdown();
Query the graph in SQL
orientdb> select in('Lives') from Address where name = 'Rome'
---+------+---------+--------------------+--------------------+--------+
  #| RID  |@class   |label               |out_Lives           |in      |
---+------+---------+--------------------+--------------------+--------+
  0| 13:35|Customer |Luca                |[#14:54]            |        |
---+------+---------+--------------------+--------------------+--------+
1 item(s) found. Query executed in 0.007 sec(s).
in('Lives'): incoming vertices
More on query power
orientdb> select sum( out('Order').total ) from Customer where name = 'Luca'
orientdb> traverse both('Friend') from Customer while $depth <= 7
orientdb> select from ( traverse both('Friend') from Customer while $depth <= 7 ) where @class = 'Customer' and city.name = 'Udine'
Query vs traversal
Once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them!
All you need is a few root vertices from which to start traversing
Query vs traversal
[Diagram: the vertices Customers, SpecialCustomers, Stocks, Luca, Mark, Jill, Order2332, Order8834 and WhiteSoap, with a callout on one of them: «This is a root vertex»]
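Crossing records from a root vertex can be sketched as a depth-limited breadth-first traversal in plain Java, similar in spirit to a `traverse ... while $depth <= N` command. The class name and adjacency map are hypothetical. The vertex names come from the diagram, but the edges between them are assumed for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: depth-limited traversal starting from a root vertex. */
class TraversalSketch {

    /** Sample graph; the edges are assumed, only the vertex names come from the diagram. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("Customers", Arrays.asList("Luca", "Mark", "Jill"));
        g.put("Luca", Arrays.asList("Order2332"));
        g.put("Mark", Arrays.asList("Order8834"));
        g.put("Order2332", Arrays.asList("WhiteSoap"));
        g.put("Order8834", Arrays.asList("WhiteSoap"));
        return g;
    }

    /** Visits every vertex reachable from root within maxDepth hops, breadth first. */
    static List<String> traverse(Map<String, List<String>> adjacency, String root, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String vertex = queue.poll();
            visited.add(vertex);
            int d = depth.get(vertex);
            if (d == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (String next : adjacency.getOrDefault(vertex, Collections.<String>emptyList())) {
                if (!depth.containsKey(next)) { // skip vertices that were already reached
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(traverse(sampleGraph(), "Customers", 2));
    }
}
```

No query over the whole data set is involved: the work is proportional to the vertices actually reached from the root.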
Temporal based graph
[Diagram: Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00 and Hour 9/4/2013 10:00, with Order2332, Order2333 and Order2334 linked into the calendar]
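A calendar tree of this kind can be sketched in plain Java as follows. The sketch assumes that orders are linked to the Hour vertices and that 9/4/2013 means 9 April 2013. All class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a calendar tree (Year -> Month -> Day -> Hour) with orders on the hours. */
class CalendarGraphSketch {

    /** A calendar vertex: children keyed by their number, plus the orders linked to it. */
    static class Node {
        final Map<Integer, Node> children = new TreeMap<>();
        final List<String> orders = new ArrayList<>();

        /** Returns the child for the key, creating it on first use. */
        Node child(int key) {
            return children.computeIfAbsent(key, k -> new Node());
        }
    }

    /** The root vertex from which every temporal traversal starts. */
    final Node calendar = new Node();

    /** Links an order to the Hour vertex of its timestamp, creating the path if needed. */
    void addOrder(String order, LocalDateTime when) {
        calendar.child(when.getYear())
                .child(when.getMonthValue())
                .child(when.getDayOfMonth())
                .child(when.getHour())
                .orders.add(order);
    }

    /** Collects the orders of one day by walking down from the root, hour by hour. */
    List<String> ordersOfDay(int year, int month, int day) {
        List<String> result = new ArrayList<>();
        Node dayNode = calendar.child(year).child(month).child(day);
        for (Node hour : dayNode.children.values()) {
            result.addAll(hour.orders);
        }
        return result;
    }

    public static void main(String[] args) {
        CalendarGraphSketch graph = new CalendarGraphSketch();
        graph.addOrder("Order2332", LocalDateTime.of(2013, 4, 9, 9, 0));
        graph.addOrder("Order2333", LocalDateTime.of(2013, 4, 9, 10, 0));
        System.out.println(graph.ordersOfDay(2013, 4, 9));
    }
}
```

Finding the orders of a day, month or year becomes a walk down from the Calendar root rather than a date-range search over all orders.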
Location based graph
[Diagram: Location -> Country Italy -> Region Lazio -> State RM -> City Rome and City Fiumicino, with Order2332, Order2333 and Order2334 linked into the location tree]
Mix & Merge graphs
[Diagram: the Calendar graph and the Location graph merged into one, sharing the vertices Order2332, Order2333 and Order2334]
This is your database
Get the last customer who bought 'Barolo'
select last(out('Order').in('Customer')) from Stock where name = 'Barolo'
#34:22
Get his country
select out('City') from #34:22
Udine, Italy
#55:12
Get orders from that country
select in('Customer') from #55:12
Let's move like a Spider on the web
Subscribe using the code "nosqlday" to get 20% off, for all NoSQLDay attendees!
Questions & (maybe) Answers
Luca Garulli
www.twitter.com/lgarulli