The ingraph project and incremental evaluation of Cypher queries
Gábor Szárnyas, József Marton
Incremental Queries
Live railway model
[Figure: railway network graph with segments a to g, a switch div and trains 1 and 2; edge types NEXT, STRAIGHT, TOP and ON. Two situations are highlighted on the model: proximity detection and trailing the switch.]
Proximity detection
[Figure: train t1 is ON segment seg1, train t2 is ON segment seg2, and seg2 is reachable from seg1 through 1..2 NEXT edges (annotated "≤ 1 segments").]
MATCH
  (t1:Train)-[:ON]->(seg1:Segment)
  -[:NEXT*1..2]->(seg2:Segment)
  <-[:ON]-(t2:Train)
RETURN t1, t2, seg1, seg2
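The proximity pattern can be sketched outside a graph database as a bounded walk over NEXT edges. The following Python snippet is only an illustration under stated assumptions: the function name `proximity_matches`, the edge-list encoding and the toy track layout are hypothetical, and the sketch does not enforce Cypher's relationship-uniqueness rule.

```python
from collections import defaultdict

def proximity_matches(on_edges, next_edges, min_hops=1, max_hops=2):
    """Return (t1, t2, seg1, seg2) rows for the proximity pattern.

    on_edges:   list of (train, segment) pairs, i.e. (t)-[:ON]->(seg)
    next_edges: list of (segment, segment) pairs, i.e. (s1)-[:NEXT]->(s2)
    """
    successors = defaultdict(set)
    for src, dst in next_edges:
        successors[src].add(dst)
    trains_on = defaultdict(set)
    for train, seg in on_edges:
        trains_on[seg].add(train)

    rows = []
    for t1, seg1 in on_edges:
        # Enumerate NEXT walks of length 1..max_hops starting at seg1.
        frontier = [(seg1,)]
        for hops in range(1, max_hops + 1):
            frontier = [path + (nxt,)
                        for path in frontier
                        for nxt in successors[path[-1]]]
            if hops >= min_hops:
                for path in frontier:
                    for t2 in sorted(trains_on[path[-1]]):
                        rows.append((t1, t2, seg1, path[-1]))
    return rows

# Hypothetical layout: s1 -> s2 -> s3 -> s4, with three trains.
on = [("T1", "s1"), ("T2", "s3"), ("T3", "s4")]
nxt = [("s1", "s2"), ("s2", "s3"), ("s3", "s4")]
print(proximity_matches(on, nxt))
```

This evaluates the query from scratch on every call, which is exactly what incremental evaluation aims to avoid.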
Trailing the switch
[Figure: train t is ON segment seg, which is the STRAIGHT continuation of a switch whose position is div (diverging).]
MATCH (t:Train)-[:ON]->(seg:Segment)
  <-[:STRAIGHT]-(sw:Switch)
WHERE sw.position = 'diverging'
RETURN t.number, sw
Evaluate continuously
Incremental queries
Register a set of standing queries
Continuously evaluate queries on changes
The Rete algorithm (1974)
o Originally for rule-based expert systems
o Indexes the graph and caches interim query results
Ujhelyi, Z. et al.
EMF-IncQuery: An integrated development environment for live model queries
Science of Computer Programming (SCP), 2015
http://www.sciencedirect.com/science/article/pii/S0167642314000082
Trailing the switch: Rete network
π t.number, sw
σ sw.position = 'diverging'
⋈
ON, STRAIGHT (input nodes)
[Figure: animation of the Rete network evaluating "Trailing the switch" on the live railway model. The ON input node stores the tuples (a, 1) and (e, 2), the STRAIGHT input node stores (e, div). The join produces (e, div, 2), which passes the selection and is projected to (div, 2). When train 2 moves from segment e to segment d, the ON tuple becomes (d, 2) and the match is retracted from the join, selection and projection nodes in turn.]
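The propagation of changes through the network for "Trailing the switch" can be sketched in a few lines of Python. This is a toy illustration, not ingraph's implementation: the class name `TrailingTheSwitch`, its method names and the use of the train identifier as `t.number` are assumptions, switch positions are treated as static, and the memories are scanned linearly where a real Rete network would use indexed lookups.

```python
from collections import Counter

class TrailingTheSwitch:
    """Input memories for ON and STRAIGHT, followed by join, selection, projection."""

    def __init__(self, switch_positions):
        self.switch_positions = dict(switch_positions)  # switch -> position
        self.on = set()           # memory of the ON input node: (train, segment)
        self.straight = set()     # memory of the STRAIGHT input node: (switch, segment)
        self.results = Counter()  # result bag: (train number, switch) -> multiplicity

    def _emit(self, train, switch, sign):
        # Selection on sw.position, then projection to (t.number, sw).
        if self.switch_positions.get(switch) != "diverging":
            return
        self.results[(train, switch)] += sign
        if self.results[(train, switch)] == 0:
            del self.results[(train, switch)]

    def update_on(self, train, segment, sign):
        """Insert (sign=+1) or delete (sign=-1) an ON edge; assumes no duplicate inserts."""
        if sign > 0:
            self.on.add((train, segment))
        else:
            self.on.remove((train, segment))
        # Join the changed tuple against the opposite memory only.
        for switch, seg in self.straight:
            if seg == segment:
                self._emit(train, switch, sign)

    def update_straight(self, switch, segment, sign):
        """Insert (sign=+1) or delete (sign=-1) a STRAIGHT edge; assumes no duplicate inserts."""
        if sign > 0:
            self.straight.add((switch, segment))
        else:
            self.straight.remove((switch, segment))
        for train, seg in self.on:
            if seg == segment:
                self._emit(train, switch, sign)

    def matches(self):
        return sorted(self.results)

net = TrailingTheSwitch({"div": "diverging"})
net.update_straight("div", "e", +1)
net.update_on(1, "a", +1)
net.update_on(2, "e", +1)
print(net.matches())          # train 2 trails the switch
net.update_on(2, "e", -1)     # train 2 leaves segment e ...
net.update_on(2, "d", +1)     # ... and enters segment d
print(net.matches())          # the match has been retracted
```

Each update touches only the tuples that join with the changed edge, so the cost depends on the size of the change rather than on the size of the graph.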
ingraph
PoC query engine for openCypher
Based on the Rete algorithm
Goals:
o Provide incremental query evaluation
o Cover standard openCypher constructs
o Run in a parallel and distributed fashion for scalability
Szárnyas, G. et al.
IncQuery-D: A distributed incremental model query framework in the cloud.
MODELS, 2014
https://link.springer.com/chapter/10.1007/978-3-319-11653-2_40
Architecture & Building Blocks
MATCH (t:Train)-[:ON]->(seg:Segment)
  <-[:STRAIGHT]-(sw:Switch)
WHERE sw.position = 'diverging'
RETURN t.number, sw
[Figure: processing pipeline, built up step by step; components in parentheses]
openCypher query
  -> (Query parser) -> Query syntax tree
  -> (Relational algebra builder) -> Relational Graph Algebra
  -> (Transformer and optimizer) -> Rete network model
  -> (Query deployer) -> Rete network
Operators of Relational Graph Algebra
Basic relational algebra
o projection, selection, join, left outer join, antijoin, union
Common extensions
o aggregation (𝛾), duplicate-elimination (𝛿), sort (𝜏), top (𝜆)
Specific extensions
o get-vertices (◯)
o expand-out (↑), expand-in (↓), expand-both (↕)
o all-different (≡)
o unwind (𝜔)
Szárnyas, G., Marton, J. and Varró, D.:
Formalising openCypher Graph Queries in Relational Algebra.
https://arxiv.org/abs/1705.02844
Incremental query evaluation
RGA defines a plan for a search-based engine
Some operators cannot be maintained incrementally
o expand-out, expand-in, …
o use edge indexers and joins instead
Implement with graph transformations
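The rewrite from expand operators to edge indexers and joins can be sketched on a toy graph. The Python snippet below is a sketch under assumptions: the helper names `get_vertices`, `expand_out`, `get_edges` and `natural_join`, as well as the dictionary-based graph encoding, are hypothetical and only mirror the operator names on the slides.

```python
def get_vertices(graph, label, var):
    """Nullary operator: one row per vertex carrying the given label."""
    return [{var: v} for v, labels in graph["nodes"].items() if label in labels]

def expand_out(rows, graph, src, edge_type, dst):
    """Search-style operator: navigate outgoing edges from row[src]."""
    return [{**row, dst: d}
            for row in rows
            for (s, t, d) in graph["edges"]
            if s == row[src] and t == edge_type]

def get_edges(graph, src, edge_type, dst):
    """Indexer-style operator: all edges of one type as a flat relation."""
    return [{src: s, dst: d} for (s, t, d) in graph["edges"] if t == edge_type]

def natural_join(left, right):
    """Join rows that agree on all shared attribute names."""
    return [{**l, **r}
            for l in left
            for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

def canonical(rows):
    """Order-independent representation for comparing two relations."""
    return sorted(tuple(sorted(row.items())) for row in rows)

graph = {
    "nodes": {"t1": {"Train"}, "t2": {"Train"},
              "a": {"Segment"}, "e": {"Segment"}, "div": {"Switch"}},
    "edges": [("t1", "ON", "a"), ("t2", "ON", "e"), ("div", "STRAIGHT", "e")],
}

trains = get_vertices(graph, "Train", "t")
navigated = expand_out(trains, graph, "t", "ON", "seg")
joined = natural_join(trains, get_edges(graph, "t", "ON", "seg"))
print(canonical(navigated) == canonical(joined))
```

The join form is the one that lends itself to incremental maintenance, because both inputs are plain relations whose changes can be propagated as tuples.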
Query “Trailing the switch”
Query “Close proximity”
Accessing attributes
Assuming that x is a column of a graph relation, we use the notation “x.a” in selection conditions to express the access to the corresponding value of property a in the property graph.
Hölsch, J. and Grossniklaus, M.:
An algebra and equivalences to transform graph patterns in Neo4j,
GraphQ 2016, EDBT,
http://kops.uni-konstanz.de/handle/123456789/33584
Difficult to implement in incremental algorithms:
“Schema calculation” problem
[Figure: the Rete network of “Trailing the switch”, each node annotated with 1. external schema, 2. extra attributes, 3. internal schema]
node                                    external schema   internal schema
π t.number, sw                          t.number, sw      t.number, sw
σ sw.position = 'diverging'             t, seg, sw        t, seg, t.number, sw, sw.position
⋈                                       t, seg, sw        t, seg, t.number, sw, sw.position
(t:Train)-[:ON]->(seg:Segment)          t, seg            t, seg, t.number
(sw:Switch)-[:STRAIGHT]->(seg:Segment)  sw, seg           sw, seg, sw.position
Extra attributes in this example: t.number and sw.position
This is the current implementation
Works, but fragile
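One way to picture the schema calculation is a pass that pushes the property accesses required by upper operators down to the input nodes and then derives the internal schemas bottom-up. The Python sketch below reproduces the schemas of the "Trailing the switch" network under that assumption; the plan encoding and the function name `internal_schema` are hypothetical and do not describe ingraph's actual code.

```python
def internal_schema(node, needed=frozenset(), out=None):
    """Return (internal schema of node, {label: internal schema}) for a plan tree."""
    out = {} if out is None else out
    # Property accesses required by this operator or by any operator above it.
    needed = needed | frozenset(node.get("uses", ()))
    kind = node["kind"]
    if kind == "input":
        variables = node["schema"]
        # Extra attributes: required properties whose variable is bound here.
        extras = sorted(p for p in needed
                        if "." in p and p.split(".")[0] in variables)
        schema = variables + extras
    elif kind == "join":
        left = internal_schema(node["left"], needed, out)[0]
        right = internal_schema(node["right"], needed, out)[0]
        schema = left + [c for c in right if c not in left]
    elif kind == "select":
        schema = internal_schema(node["child"], needed, out)[0]
    elif kind == "project":
        internal_schema(node["child"], needed, out)
        schema = list(node["uses"])
    else:
        raise ValueError(f"unknown operator kind: {kind}")
    out[node["label"]] = schema
    return schema, out

on = {"kind": "input", "label": "ON", "schema": ["t", "seg"]}
straight = {"kind": "input", "label": "STRAIGHT", "schema": ["sw", "seg"]}
join = {"kind": "join", "label": "join", "left": on, "right": straight}
select = {"kind": "select", "label": "select", "uses": ["sw.position"], "child": join}
project = {"kind": "project", "label": "project", "uses": ["t.number", "sw"], "child": select}

_, schemas = internal_schema(project)
for label, schema in schemas.items():
    print(label, schema)
```

The fragility is visible even here: every operator has to agree on which attributes were appended and in what order.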
Nested Relational Algebra (NRA)
Additional operators
o Nest (𝜈) ~ collect
o Unnest (𝜇) ~ UNWIND
Catch: incrementality requires Flat Relational Algebra (FRA)
Roth, M.A., Korth, H.F. and Silberschatz, A.:
Extended algebra and calculus for nested relational databases.
ACM Transactions on Database Systems (TODS), 1988
http://dl.acm.org/citation.cfm?id=49347
Nested relation:
name  works
      year  company
John  1982  Big Biz, Inc.
      2010  Fusion Power Plant, Ltd.
Unnested (flat) relation:
name  works.year  works.company
John  1982        Big Biz, Inc.
John  2010        Fusion Power Plant, Ltd.
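The nest and unnest operators can be sketched on the example relation with plain Python dictionaries. The function names `nest` and `unnest` and the row encoding are illustrative assumptions; note that this simple `unnest` drops rows whose nested relation is empty, so the two functions are inverses only when every nested relation has at least one row.

```python
def unnest(rows, attr):
    """Unnest (μ): one flat row per element of the nested relation `attr`."""
    flat = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k != attr}
        for inner in row[attr]:
            flat.append({**rest, **{f"{attr}.{k}": v for k, v in inner.items()}})
    return flat

def nest(rows, attr):
    """Nest (ν): group flat rows by the remaining attributes, collecting `attr.*`."""
    prefix = attr + "."
    groups = {}
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if not k.startswith(prefix))
        inner = {k[len(prefix):]: v for k, v in row.items() if k.startswith(prefix)}
        groups.setdefault(key, []).append(inner)
    return [{**dict(key), attr: inner_rows} for key, inner_rows in groups.items()]

nested = [{
    "name": "John",
    "works": [
        {"year": 1982, "company": "Big Biz, Inc."},
        {"year": 2010, "company": "Fusion Power Plant, Ltd."},
    ],
}]

flat = unnest(nested, "works")
for row in flat:
    print(row)
print(nest(flat, "works") == nested)
```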
Property graphs as nested relations
Node/relationship properties:
id  name  age  favColours     beerRatings
1   John  32   [blue, green]  {lager: 5, ale: 3}
List: favColours becomes a nested relation
id  value
0   blue
1   green
Map: beerRatings becomes a nested relation
key    value
lager  5
ale    3
Paths: […]
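The list and map encodings above can be sketched as a small recursive conversion. The helper names `to_nested` and `node_to_nested_row` are hypothetical, and paths are left out because the slides do not spell out their encoding.

```python
def to_nested(value):
    """Encode a property value: lists become (id, value) rows, maps (key, value) rows."""
    if isinstance(value, list):
        return [{"id": i, "value": to_nested(v)} for i, v in enumerate(value)]
    if isinstance(value, dict):
        return [{"key": k, "value": to_nested(v)} for k, v in value.items()]
    return value  # atomic values stay as they are

def node_to_nested_row(properties):
    """Turn the property map of one node or relationship into a nested-relation row."""
    return {name: to_nested(value) for name, value in properties.items()}

john = {
    "id": 1,
    "name": "John",
    "age": 32,
    "favColours": ["blue", "green"],
    "beerRatings": {"lager": 5, "ale": 3},
}
row = node_to_nested_row(john)
print(row["favColours"])
print(row["beerRatings"])
```

Because the conversion is recursive, a list nested inside a map (as in the composite data structures discussed later) is handled by the same two rules.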
Flattening NRA to FRA
It is possible to transform NRA to flat algebra expressions
Research questions:
o Does it solve the schema calculation problem?
o Is it fast enough for practical implementations?
Paredaens, J. and Van Gucht, D.:
Converting nested algebra expressions into flat algebra expressions.
ACM Transactions on Database Systems (TODS), 1992
http://dl.acm.org/citation.cfm?id=128768
Incremental maintenance of FRA
Szárnyas, G., Maginecz, J. and Varró, D.:
Evaluation of optimization strategies for incremental graph queries.
Periodica Polytechnica EECS, 2017
http://docs.inf.mit.bme.hu/preprints/perpol2016-gqo.pdf
For a change Δs on the input:
o define the change Δt on the output
o update internal data structures
Maintenance of the antijoin operator
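Maintenance of the antijoin can be sketched as follows: the operator keeps the left tuples grouped by join key and a counter of matching right tuples per key, and maps each input change Δs to an output change Δt. This is a minimal sketch, not the IRE implementation; the class name `IncrementalAntijoin`, the `(tuple, sign)` change encoding and the segment/train example are assumptions.

```python
from collections import defaultdict

class IncrementalAntijoin:
    """Maintains left ▷ right: left tuples that have no matching right tuple."""

    def __init__(self, left_key, right_key):
        self.left_key = left_key
        self.right_key = right_key
        self.left = defaultdict(list)         # join key -> stored left tuples
        self.right_count = defaultdict(int)   # join key -> number of right tuples

    def update_left(self, tup, sign):
        """Apply a left-side change; return the output changes as (tuple, sign) pairs."""
        key = self.left_key(tup)
        if sign > 0:
            self.left[key].append(tup)
        else:
            self.left[key].remove(tup)
        # The change is visible in the output only if nothing on the right blocks it.
        return [(tup, sign)] if self.right_count[key] == 0 else []

    def update_right(self, tup, sign):
        """Apply a right-side change; return the output changes as (tuple, sign) pairs."""
        key = self.right_key(tup)
        before = self.right_count[key]
        after = before + sign
        self.right_count[key] = after
        if before == 0 and after > 0:
            # First matching right tuple: retract all left tuples with this key.
            return [(t, -1) for t in self.left[key]]
        if before > 0 and after == 0:
            # Last matching right tuple removed: left tuples reappear.
            return [(t, +1) for t in self.left[key]]
        return []

# Hypothetical use: segments with no train on them.
free_segments = IncrementalAntijoin(left_key=lambda seg: seg[0],
                                    right_key=lambda on: on[1])
print(free_segments.update_left(("a",), +1))     # segment a appears, it is free
print(free_segments.update_right((1, "a"), +1))  # train 1 enters a, a is retracted
print(free_segments.update_right((1, "a"), -1))  # train 1 leaves, a is free again
```

The counter is what makes deletions cheap: the operator never has to rescan the right input to decide whether a left tuple is still blocked.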
IRE – Incremental Relational Engine
Incremental (flat) relational engine built on Akka
Independent from Cypher and property graphs
Source code: https://github.com/ftsrg/ingraph/tree/master/ire
OCIM1 revisited
Composite data structures
o Lists
o Maps
o Paths
Nested data structures
[e1, e2, {k: […]}]
CIR-2017-220
Current challenges
Update operations
Presume a perfectly working incremental query engine
How to perform updates?
o Low-level API operations:
indexer.addTuple()
o Adding new nodes:
CREATE (…)
o Matching and creating:
MATCH (n)
CREATE (n)-[:REL]->(:Label)
o Loading CSVs (legacy construct):
LOAD CSV FROM … AS line
CREATE (:Label {prop1: toInt(line[2]), …})
Not well suited to Rete
Roadmap
Research
o Formalise openCypher using Nested Relational Algebra
o Transform nested expressions to Flat Relational Algebra
Development
o Support for LDBC’s Social Network Benchmark / BI workload
o Use TCK for testing
o Implement NRA to FRA transformation
• See if it works
• Run benchmarks
o Use Akka clustering and Docker Compose for deployment
o Discover more use cases
Related resources
Repository:
https://github.com/ftsrg/ingraph
Technical report:
http://docs.inf.mit.bme.hu/ingraph/pub/opencypher-report.pdf
Formalisation (preprint):
https://arxiv.org/abs/1705.02844