20141216 graph database prototyping ams meetup
Graph Database Prototyping
@ AMS GraphDB
meetup
Agenda for Tonight
• Building a Graph Database Prototype
• 3 parts
  – Graph database & modeling concepts
  – Prototyping tools & import
  – Graph querying with Cypher
Data Modeling With Neo4j
Topics
• Graph model building blocks
• Quick intro to Cypher
• Example modeling process
• Modeling tips
• Recipes for common modeling scenarios
• Refactoring
• Test-driven data modeling
Graph Model Building Blocks
Property Graph Data Model
Four Building Blocks
• Nodes
• Relationships
• Properties
• Labels
Nodes
Nodes
• Used to represent entities and complex value types in your domain
• Can contain properties
  – Used to represent entity attributes and/or metadata (e.g. timestamps, version)
  – Key-value pairs
    • Java primitives
    • Arrays
    • null is not a valid value
  – Every node can have different properties
Entities and Value Types
• Entities
  – Have unique conceptual identity
  – Change attribute values, but identity remains the same
• Value types
  – No conceptual identity
  – Can substitute for each other if they have the same value
    • Simple: single value (e.g. colour, category)
    • Complex: multiple attributes (e.g. address)
Relationships
Relationships
• Every relationship has a name and a direction
  – Add structure to the graph
  – Provide semantic context for nodes
• Can contain properties
  – Used to represent quality or weight of relationship, or metadata
• Every relationship must have a start node and end node
  – No dangling relationships
Relationships (continued)
Nodes can have more than one relationship
Self relationships are allowed
Nodes can be connected by more than one relationship
Variable Structure
• Relationships are defined with regard to node instances, not classes of nodes
  – Two nodes representing the same kind of “thing” can be connected in very different ways
    • Allows for structural variation in the domain
  – Contrast with relational schemas, where foreign key relationships apply to all rows in a table
    • No need to use null to represent the absence of a connection
Labels
Labels
• Every node can have zero or more labels
• Used to represent roles (e.g. user, product, company)
  – Group nodes
  – Allow us to associate indexes and constraints with groups of nodes
Four Building Blocks
• Nodes
  – Entities
• Relationships
  – Connect entities and structure domain
• Properties
  – Entity attributes, relationship qualities, and metadata
• Labels
  – Group nodes by role
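The four building blocks can be sketched as plain data structures. This is only an illustration in Python, not how Neo4j stores data: the class names `Node` and `Relationship` and the `since` property are assumptions made for the example, while the rules enforced (no null property values, every relationship has a start and an end node) come from the slides above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    labels: Set[str] = field(default_factory=set)             # zero or more labels
    properties: Dict[str, Any] = field(default_factory=dict)  # key-value pairs

    def __post_init__(self):
        # The slides state that null is not a valid property value.
        if any(value is None for value in self.properties.values()):
            raise ValueError("null is not a valid property value")


@dataclass
class Relationship:
    type: str    # the relationship's name
    start: Node  # the direction runs from start ...
    end: Node    # ... to end
    properties: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Every relationship needs both ends: no dangling relationships.
        if self.start is None or self.end is None:
            raise ValueError("a relationship needs a start node and an end node")


ian = Node({"Person"}, {"name": "Ian"})
acme = Node({"Company"}, {"name": "Acme"})
# "since" is a hypothetical relationship property, used only for illustration.
works_for = Relationship("WORKS_FOR", ian, acme, {"since": 2010})
# A self relationship is allowed: start and end may be the same node.
knows_self = Relationship("KNOWS", ian, ian)
```

Two `Person` nodes built this way can carry different properties and different relationships, which is the structural variation the slides describe.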
Designing a Graph Model
Models
Images: en.wikipedia.org
Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
Design for Queryability
Model Query
Method
1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each question
5. Convert entities and relationships to paths
  – These become the basis of the data model
6. Express questions as graph patterns
  – These become the basis for queries
Application/End-User Goals
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
Questions To Ask of the Domain
Which people, who work for the same company as me, have similar skills to me?
As an employee I want to know who in the company has similar skills to me So that we can exchange knowledge
Identify Entities
Which people, who work for the same company as me, have similar skills to me?
Person
Company
Skill
Identify Relationships Between Entities
Which people, who work for the same company as me, have similar skills to me?
Person WORKS_FOR Company
Person HAS_SKILL Skill
Convert to Cypher Paths
Person WORKS_FOR Company
Person HAS_SKILL Skill
Relationship
Label
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
Consolidate Paths
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
Create Person Subgraph
MERGE (c:Company{name:'Acme'})
MERGE (p:Person{name:'Ian'})
MERGE (s1:Skill{name:'Java'})
MERGE (s2:Skill{name:'C#'})
MERGE (s3:Skill{name:'Neo4j'})
CREATE UNIQUE (c)<-[:WORKS_FOR]-(p),
              (p)-[:HAS_SKILL]->(s1),
              (p)-[:HAS_SKILL]->(s2),
              (p)-[:HAS_SKILL]->(s3)
RETURN c, p, s1, s2, s3
Candidate Data Model
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
Express Question as Graph Pattern
Which people, who work for the same company as me, have similar skills to me?
Cypher Query
Which people, who work for the same company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
Graph Pattern
Which people, who work for the same company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
Anchor Pattern in Graph
If an index for Person.name exists, Cypher will use it
Create Projection of Results
Which people, who work for the same company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
First Match
Second Match
Third Match
Running the Query
+-----------------------------------+
| name   | score | skills           |
+-----------------------------------+
| "Lucy" | 2     | ["Java","Neo4j"] |
| "Bill" | 1     | ["Neo4j"]        |
+-----------------------------------+
2 rows
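To make the matching steps concrete, here is a rough in-memory sketch in Python of what the query computes. It is not how Cypher executes the query. Ian's skills come from the "Create Person Subgraph" slide; Lucy's and Bill's relationships are assumptions, chosen so the sketch reproduces the result table above. The helper names `targets`, `sources`, and `similar_colleagues` are hypothetical.

```python
from collections import defaultdict

# Relationships as (start, type, end) triples.
# Lucy's and Bill's rows are assumed; the slides only show them in the results.
relationships = [
    ("Ian", "WORKS_FOR", "Acme"),
    ("Lucy", "WORKS_FOR", "Acme"),
    ("Bill", "WORKS_FOR", "Acme"),
    ("Ian", "HAS_SKILL", "Java"),
    ("Ian", "HAS_SKILL", "C#"),
    ("Ian", "HAS_SKILL", "Neo4j"),
    ("Lucy", "HAS_SKILL", "Java"),
    ("Lucy", "HAS_SKILL", "Neo4j"),
    ("Bill", "HAS_SKILL", "Neo4j"),
]


def targets(start, rel_type):
    """End nodes reached from `start` over outgoing `rel_type` relationships."""
    return [e for s, t, e in relationships if s == start and t == rel_type]


def sources(end, rel_type):
    """Start nodes that reach `end` over `rel_type` relationships."""
    return [s for s, t, e in relationships if e == end and t == rel_type]


def similar_colleagues(name):
    """Colleagues of `name` with the skills they share, highest score first."""
    shared = defaultdict(list)
    for company in targets(name, "WORKS_FOR"):           # anchor on "me"
        for colleague in sources(company, "WORKS_FOR"):  # same company
            if colleague == name:
                continue
            for skill in targets(name, "HAS_SKILL"):     # skills in common
                if skill in targets(colleague, "HAS_SKILL"):
                    shared[colleague].append(skill)
    rows = [(c, len(skills), skills) for c, skills in shared.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


print(similar_colleagues("Ian"))
```

Each shared skill found for a colleague corresponds to one match of the graph pattern; counting and collecting them per colleague is the projection step in the RETURN clause.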
From User Story to Model and Query
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
Person WORKS_FOR Company
Person HAS_SKILL Skill
Which people, who work for the same company as me, have similar skills to me?
Modeling Tips
Properties Versus Relationships
Use Relationships When…
• You need to specify the weight, strength, or some other quality of the relationship
• AND/OR the attribute value comprises a complex value type (e.g. address)
• Examples:
  – Find all my colleagues who are expert (relationship quality) at a skill (attribute value) we have in common
  – Find all recent orders delivered to the same delivery address (complex value type)
Use Properties When…
• There’s no need to qualify the relationship
• AND the attribute value comprises a simple value type (e.g. colour)
• Examples:
  – Find those projects written by contributors to my projects that use the same language (attribute value) as my projects
If Performance is Critical…
• Small property lookup on a node will be quicker than traversing a relationship
  – But traversing a relationship is still faster than a SQL join…
• However, many small properties on a node, or a lookup on a large string or large array property will impact performance
  – Always performance test against a representative dataset
Relationship Granularity
Align With Use Cases
• Relationships are the “royal road” into the graph
• When querying, well-named relationships help discover only what is absolutely necessary
  – And eliminate unnecessary portions of the graph from consideration
General Relationships
• Qualified by property
Specific Relationships
Best of Both Worlds
Model and Query Recipes
Events and Actions
• Often involve multiple parties
• Can include other circumstantial detail, which may be common to multiple events
• Examples
  – Patrick worked for Acme from 2001 to 2005 as a Software Developer
  – Sarah sent an email to Lucy, copying in David and Claire
Timeline Trees
• Discrete events
  – No natural relationships to other events
• You need to find events at differing levels of granularity
  – Between two days
  – Between two months
  – Between two minutes
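One way to picture a timeline tree is as nested levels of granularity, with events hanging off the finest level. The Python sketch below is an assumption-laden illustration, not the structure from the slides: it stops at day granularity, uses nested dictionaries in place of graph nodes, and the names `timeline`, `add_event`, and `events_between`, as well as the event labels, are hypothetical.

```python
from datetime import date

# year -> month -> day -> [events]; each dictionary level stands in for one
# level of nodes in the tree.
timeline = {}


def add_event(day, event):
    """Attach an event to its day, creating year/month/day levels as needed."""
    days = timeline.setdefault(day.year, {}).setdefault(day.month, {})
    days.setdefault(day.day, []).append(event)


def events_between(start, end):
    """Events from start to end (inclusive), walking year -> month -> day."""
    found = []
    for year in sorted(timeline):
        if not start.year <= year <= end.year:
            continue  # skip the whole year subtree
        for month in sorted(timeline[year]):
            if not (start.year, start.month) <= (year, month) <= (end.year, end.month):
                continue  # skip the whole month subtree
            for day in sorted(timeline[year][month]):
                if start <= date(year, month, day) <= end:
                    found.extend(timeline[year][month][day])
    return found


# Hypothetical events, for illustration only.
add_event(date(2014, 12, 16), "event A")
add_event(date(2014, 12, 31), "event B")
add_event(date(2015, 1, 22), "event C")

december = events_between(date(2014, 12, 1), date(2014, 12, 31))
print(december)
```

Because whole year and month subtrees are skipped when they fall outside the range, coarser queries (between two months) and finer ones (between two days) use the same walk. Adding hour and minute levels would follow the same pattern.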
Example Timeline Tree
Pitfalls and Anti-Patterns
Modeling Entities as Relationships
• Limits data model evolution
  – A relationship connects two things
  – Modeling an entity as a relationship prevents it from being related to more than two things
• Smells:
  – Lots of attribute-like properties
  – Heavy use of relationship indexes
• Entities hidden in verbs:
  – E.g. emailed, reviewed
Example: Movie Reviews
• Initial requirements:
  – People review films
  – Application aggregates reviews from multiple sites
Initial Model
New Requirements
• Allow users to comment on each other’s reviews
  – Can’t connect a review to a third entity
Revised model
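The difference between the two models can be sketched with plain triples in Python. The people, film, and relationship type names (`REVIEWED`, `WROTE_REVIEW`, `REVIEW_OF`, `COMMENTED_ON`) are all assumptions for illustration; the original diagrams are not part of this transcript.

```python
# Initial model: the review is a relationship, so it has exactly two ends
# and nothing else can point at it.
initial = [
    ("Alice", "REVIEWED", "Film A"),
]

# Revised model: the review is a node, so any number of things can connect to it.
revised = [
    ("Alice", "WROTE_REVIEW", "review-1"),
    ("review-1", "REVIEW_OF", "Film A"),
    ("Bob", "COMMENTED_ON", "review-1"),
]


def connected_to(node, rels):
    """All nodes joined to `node` by a relationship in either direction."""
    incoming = {start for start, _, end in rels if end == node}
    outgoing = {end for start, _, end in rels if start == node}
    return incoming | outgoing


print(connected_to("review-1", revised))
```

In the revised model the review is connected to three things (its author, the film, and a commenter), which a single relationship between a person and a film cannot express.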
Model Actions in Terms of Products
Now for
Some Prototyping!
Draw a Model!
E.g. using Visio, www.apcjones.com/arrows, http://graphjson.io, OmniGraffle
Creating a prototype DB out of our model?
Now for Some
Queries!
Next meetup!
• January 22nd: how to create an APPLICATION on top of our newly created database
BACKUP slides: Cypher Query Language
Nodes and Relationships
()-->()
Labels and Relationship Types
(:Person)-[:FRIEND]->(:Person)
Properties
(:Person{name:'Peter'})-[:FRIEND]->(:Person{name:'Lucy'})
Identifiers
(p1:Person{name:'Peter'})-[r:FRIEND]->(p2:Person{name:'Lucy'})
Cypher
MATCH graph_pattern
WHERE binding_and_filter_criteria
RETURN results
Cypher
MATCH (p:Person)-[:FRIEND]->(friends)
WHERE p.name = 'Peter'
RETURN friends
Lookup Using Identifier + Label
MATCH (p:Person)-[:FRIEND]->(friends)
WHERE p.name = 'Peter'
RETURN friends