graph query languages: update from ldbc
TRANSCRIPT
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Graph Query Languages
Juan F. Sequeda, Ph.DCo-FounderCapsenta
1Smart Data – Graphorum Conference – January 20, 2017
@juansequeda @ LDBCouncil
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Take away message
• Graph Databases need a Standardized Query Language
• It’s complicated• “Those who fail to learn from history are doomed to repeat it”
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Linked Data Benchmark Council (LDBC)
• LDBC is a non-‐profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software.
• LDBC was established as an outcome of the LDBC EU project funded by the European Commission within the 7th Framework Programme (Grant Agreement No. 317548).
http://ldbcouncil.org/
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
LDBC Organization (non-‐profit)
“sponsors”
+ non-‐profit members (FORTH, STI2) & personal members+ Task Forces, volunteers developing benchmarks+ TUC: Technical User Community (8 workshops, ~40 graph and
RDF user case studies, 18 vendor presentations)
9th TUC Meeting, SAP Headquarters in Walldorf Germany, February 9-‐10 2017http://ldbcouncil.org/9th-‐tuc-‐meeting-‐sap-‐headquarters-‐walldorf-‐germany-‐february-‐9-‐10-‐2017
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
LDBC Benchmarks
5
http://ldbcouncil.org/benchmarks
Graphalytics Semantic Publishing Social Network
VLDB 2016
SIGMOD 2015
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Graph Query Language Task Force
• Study query languages for graph data management systems, specifically systems storing “Property Graph” data
• Query language should cover the needs of important use cases: social network benchmark, interactive and BI workloads
• Devise a list of desired features and functionalities
• Renzo Angles, Universidad de Talca
• Marcelo Arenas, PUC Chile -‐ task force lead• Pablo Barceló, Universidad de Chile
• Peter Boncz, Vrije Universiteit Amsterdam
• George Fletcher, Eindhoven University of Technology
• Claudio Gutierrez, Universidad de Chile
• Tobias Lindaaker, Neo Technology• Marcus Paradies, SAP
• Raquel Pau, UPC
• Arnau Prat, UPC / Sparsity• Juan Sequeda, Capsenta
• Oskar van Rest, Oracle Labs• Hannes Voigt, TU Dresden
• YinglongXia, IBM
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Property Graph Data Model
id: 123name: Juan Sequeda
Personid: 456name: Marcelo Arenas
Personknows
since: 2010
• Neo4j’s Cypher• Oracle’s PGQL• Tinkerpop’sGremlin
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Graph Query Languages Today
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Standard Syntax and Semantics• Standard– Vendor lock-‐in– Prevents true performance competition à improvement of systems
• Syntax– Hard to define benchmarks– Hard to compare Graph DBMS– Applications are not portable
• Semantics– PGQL: iso/homomorphism– Cypher: “edge” isomorphism– Gremlin: homomorphism– SPARQL: homomorphism
Best Paper WWW2012
Angles and Gutierrez. The Expressive Power of SPARQL. ISWC2008
Best Paper ISWC2006. 10 Year Award 2016
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Closed Language
SQL
GraphQuery
Language*
however …
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Path Support
• Conjunctive Query (CQ)• Regular Path Queries (RPQ): regular expression over edge labels• Conjunctive Regular Path Query (CRPQ)• Union Conjunctive Regular Path Query (UCRPQ)• Path Matching supported. What about further processing of Paths?
• Require for an (a,b)* RPQ that the sum of some property on all b edges along the path is greater some give threshold
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Desired Query Functionalities
• Adjacency Queries• Graph Pattern Matching• Navigational Queries• Aggregate Queries• Sub queries
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Adjacency queries
• Property access– Get the firstName and lastName of a person having email "$email”
• Neighborhood of a node– Get the firstName and lastName of the friends of a person identified by email "$email".
• K-‐neighborhood of a node– Get the email, firstName and lastName of friends of the friends of a person having email "$email" (excluding the start person) (i.e. get a list of recommended friends) (directed 2-‐neighborhood)
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Graph Pattern Matching• Join
– Get the creationDate and content of the messages created by a person identified by email "$email1" and commented by another person identified by "$email2".
• Union– Get the creationDate and content of the messages either created or liked by a person
identified by email "$email".• Intersection
– Get the email, firstNameand lastName of the common friends between two persons identified by emails "$email1" and "$email2" respectively.
• Difference– Given two friends identified by emails "$email1" and "$email2" respectively, get the email,
firstName and lastName of the friends of the second person which are not friends of the first person (this questions is relevant for friendship recommendations).
• Optional– Given a person identified by email "$email", get the title of all the messages created by such
person, and the content of the first comment replying each message (if it exists).• Filter
– Get the properties of the people whose firstNameincludes the string "xxx" (it implies use of wildcards).
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Navigational queries• Reachability
– Is there a friendship connection between two persons identified by emails "$email1" and "$email2" respectively?
• All Path Finding– Get the friendship paths between two persons identified by emails "$email1"
and "$email2" respectively.• Shortest Path Finding
– The shortest friendship path between two persons identified by emails "$email1" and "$email2" respectively".
• Regular Path Query– Get the firstName of friends of the friends of the friends of a person identified
by email "$email".• Conjunctive Regular Path Queries
– Given a target message created on "$dateTime" by a person identified by email "$email", for each comment replying the target message, get the comment's content and the email of the comment´s creator.
• Filtered regular path query– Given a person identified by email "$email", get the title of all the messages
liked by such person between "$dateTime1" and "$dateTime2”.
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Where are we now?
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Decision: Property Graph Data Model
In the following definition, we assume the existence of the following sets:• L is an infinite set of (node and edge) labels;• P is an infinite set of property names;• V is an infinite set of literals (actual values).Moreover, we assume that SET(X) is the set of all finite subsets of a given
set X.Then a property graph is a tupleG = (N, E, ρ, λ, σ), where:• nodes: N is a finite set of nodes;• edges: E is a finite set of edges such thatN and E have no elements in
common;ρ : E→ (N × N) is a total function;
• labels: λ : (N U E) → SET(L) is a total function;
• properties: σ : (N U E) × P→ V is a partial function.
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Property Graph Data Model
id: 123name: Juan Sequeda
Personid: 456name: Marcelo Arenas
Personknows
since: 2010
L is an infinite set of (node and edge) labelsP is an infinite set of property namesV is an infinite set of literals (actual values)N is a finite set of nodesE is a finite set of edges such thatN and E have no elements in common
L = {Person, knows}P = {id, name, since}V = {“123”, “456”, “Juan Sequeda”, “Marcelo Arenas”, “2010”}N = {n1, n2}E = {e1}
ρ (edge) : E →(N × N) is a total function
λ (labels): (N U E) → SET(L) is a total function
σ (properties): (N U E) × P→ V is a partial function
ρ = [ρ(e1) = (n1,n2) ]
λ = [λ(n1) = “Person”, λ(n2) = “Person”, λ(e1) = “knows” ]
σ = [σ(n1) = {(“id”, “123”), (“name”, “Juan Sequeda”)},σ(n2) = {(“id”, “456”), (“name”, “Marcelo Arenas”)},σ(e1) = {(“since”, “2010”)}]
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Unified Data Model Options
• Extend Relational model (where cells can contain actual values such as strings, integers, dates,…) with objects of type either NODE, EDGE, PATH or GRAPH– Embedding Graphs in Relational Database– Similar Postgres + JSON datatype–What happens if you join on a GRAPH column?
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Unified Data Model Options
• “Natural” Graph Data model
20
1 2 3 4
5 6
id label E.id E.dest
1 purple 10 2
2 green 11 3
13 5
3 green 12 4
14 6
4 purple no out edges
5 green 15 6
6 purple 16 4
10 11 12
13 1415 16
Data graph G
SELECT x, yFROM G (x:purple)-[e:*]->(y:purple)
Query 1:
id x y
90 1 4
91 1 6
x=1y=4
90x=1y=6
91
id x y E.id E.dest
90 1 4 no out edges
91 1 6 no out edges
New Nodes Because ofRelation Projection
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(1) Paths make things “weird”!
SELECT x, y, CHEAPEST PATH edgeIdFROM G (x:purple)-[e:*]->(y:purple)
Query 2a:
id x y
90 1 4
91 1 6
id x y E.id E.dest E.edgeId E.order
90 1 4 20 90 10 1
21 90 11 2
22 90 12 3
91 1 6 23 91 10 1
24 91 13 2
25 91 15 3
x=1y=4
90x=1y=6
91edgeId=10order=1
edgeId=11order=2
edgeId=12order=3
edgeId=10order=1
edgeId=13order=2
edgeId=15order=3
1 2 3 4
5 6
10 11 12
13 1415 16
Data graph G
Paths are Self Edges with an Order
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(2) Paths make things “weird”!
22
SELECT x, y, pFROM G (x:purple)-[CHEAPEST p:+]->(y:purple)Query 2b:
id x y
90 1 4
91 1 6
x=1y=4
90x=1y=6
91id x y p E.id E.dest
90 1 4 [10,11,12] no out edges
91 1 6 [10,13,15] no out edges
x=1y=4p=[10,11,12]
90x=1y=6p=[10,13,15]
91
1 2 3 4
5 6
10 11 12
13 1415 16
Data graph G
Paths are Datatypes
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(1) Graph Projection (instead of Relation Projection)
23
Query 3a:
1 2 3 4
5 6
10 11 12
13 1415 16
Data graph G
SELECT (x), (y)FROM G (x:orange)-[:+]->(y:orange)
id label E.id E.dest
1 purple no out edges
4 purple no out edges
6 purple no out edges
1 4
6
id x y
90 1 4
91 1 6
x=1y=4
90x=1y=6
91
id x y E.id E.dest
90 1 4 no out edges
91 1 6 no out edges
Remember: this is the Relational Projection
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(2) Graph Projection (instead of Relation Projection)
24
id label E.id E.dest
1 purple 21 4
22 6
4 purple 23 6
6 purple no out edges
Query 3b:
1 4
6
21
2322
SELECT (x)--(y)FROM G (x:purple)-[:+]->(y:purple)
1 2 3 4
5 6
10 11 12
13 1415 16
Data graph G
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(1) Graph Projection with Paths
25
1 2 3 4
5 6
10 11 12
13 1415 16
Data graph G
SELECT (PATH p)FROM G (x:orange)-[CHEAPEST p:*]->(y:orange)WHERE y.id = 4
1 2 3 4
6
10 11 12
16id label E.id E.dest
1 purple 10 2
2 green 11 3
3 green 12 4
4 purple no out edges
6 purple 16 6
Query 4a:
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(2) Graph Projection with Paths
26
1 2 3 4
5 6
10 11 12
13 1415 16
Data graph G
Query 4b:
21
23l=1
L=3
SELECT (x)-[:{l=LENGTH OF p}]-(y)FROM G (x:orange)-[CHEAPEST p:*]->(y:orange)WHERE y.id = 4
id label E.id E.dest E.l
1 purple 21 4 34 purple 23 6 16 purple no out edges
1 4
6
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Wrap up
• We want to hear from you!
• 9th LDBC Technical User Community Meeting, SAP Headquarters in WalldorfGermany, February 9-‐10 2017
• http://ldbcouncil.org/9th-‐tuc-‐meeting-‐sap-‐headquarters-‐walldorf-‐germany-‐february-‐9-‐10-‐2017
27
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
THANK YOU
Juan Sequeda, Ph.DCo-‐Founder – [email protected]@juansequeda
28
Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016http://www.iospress.nl/book/integrating-‐relational-‐databases-‐with-‐the-‐semantic-‐web/