introduction to graph databases, neo4j and spring data - english 2015 edition
Introduction to graph databases, Neo4j and Spring Data
Aleksander M. Stensby
Monokkel A/S
• Aleksander M. Stensby
• CEO in Monokkel AS
• Previously COO in Integrasco AS
• Working with search and data analysis since 2004
www.monokkel.io
• Persistence, processing and presentation of data
Persistence – Processing – Presentation
Agenda
• Intro to graph databases and modelling
• Neo4j and Cypher
• SpringData Neo4j
Relational databases…
…when it comes to relations
Relational databases are awesome when it comes to aggregation of data, schema mapping and tabular data!
BUT
We have a tendency to force _all_ our problems into our relational databases!
Join hell... Recursion... Null checks...
Polyglot persistence
NoSQL
Not only SQL
NoSQL
• Key-value stores
  – Amazon Dynamo
  – Ex: Voldemort, ...
• BigTable clones
  – Google’s BigTable
  – Ex: HBase, ...
• Document databases
  – Lotus Notes
  – Ex: MongoDB, CouchDB
NoSQL
• Graph databases
  – Euler and graph theory
  – Ex: Neo4j, VertexDB, AllegroGraph, Giraph, OrientDB, etc...
NoSQL
[Figure: the NoSQL families KV, BigTable, Document and Graphs plotted by size against complexity. Annotations: "90%" and "Billions of nodes and relationships".]
enough buzzwords…
Friends of friends...
Table Person: ID, Name
Table Friends: PersonID, FriendID
«Aleks»’s friends
    SELECT p1.Name
    FROM Person p1
    JOIN Friends ON Friends.FriendID = p1.ID
    JOIN Person p2 ON Friends.PersonID = p2.ID
    WHERE p2.Name = 'Aleks'
Friends with «Aleks»
    SELECT p1.Name
    FROM Person p1
    JOIN Friends ON Friends.PersonID = p1.ID
    JOIN Person p2 ON Friends.FriendID = p2.ID
    WHERE p2.Name = 'Aleks'
«Aleks»’s friends’ friends?
    SELECT p1.Name AS PERSON, p2.Name AS FRIEND_OF_FRIEND
    FROM Friends v1
    JOIN Person p1 ON v1.PersonID = p1.ID
    JOIN Friends v2 ON v2.PersonID = v1.FriendID
    JOIN Person p2 ON v2.FriendID = p2.ID
    WHERE p1.Name = 'Aleks' AND v2.FriendID <> p1.ID
Source: Neo4j in Action
1,000,000 persons
Depth  RDBMS (s)   Neo4j (s)  Records
2      0.016       0.01       ~2,500
3      30.267      0.168      ~110,000
4      1543.505    1.359      ~600,000
5      DNF         2.132      ~800,000
A graph database...
• uses graph structures with
  – Nodes
  – Relationships
  – Attributes
• to store information!
• Excellent for relationships – but not necessarily best at aggregating data
Graph databases and neo4j
• Visual – Schema-less!
• Doubly-linked lists – each node has a list of incoming and outgoing relationships
• Direct lookup = O(1)
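A minimal sketch in plain Java of the "list of incoming and outgoing relationships" idea. It is not Neo4j's storage format or API; the class names SketchNode and SketchRelationship are made up for illustration. Each node holds references to its own relationships, so stepping to a neighbour follows a reference instead of searching an index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Neo4j classes.
class SketchRelationship {
    final SketchNode start;
    final SketchNode end;
    final String type;

    SketchRelationship(SketchNode start, SketchNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class SketchNode {
    final String name;
    // Every node keeps both directions, so the graph can be walked either way.
    final List<SketchRelationship> outgoing = new ArrayList<>();
    final List<SketchRelationship> incoming = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    // Registers one relationship object in both endpoint lists.
    SketchRelationship connectTo(SketchNode other, String type) {
        SketchRelationship r = new SketchRelationship(this, other, type);
        outgoing.add(r);
        other.incoming.add(r);
        return r;
    }
}
```

Calling `aleks.connectTo(tarjei, "FRIENDS_WITH")` makes the same relationship object reachable from `aleks.outgoing` and from `tarjei.incoming`.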
A graph stores data in nodes
ALEKS TARJEI
Nodes are connected by relationships
FRIENDS_WITH
ALEKS TARJEI
Nodes have attributes
ALEKS TARJEI
Age: 29 First Name: Aleksander Last Name: Stensby
Age: 30 First Name: Tarjei Last Name: Romtveit
FRIENDS_WITH
Relationships can also have attributes
ALEKS TARJEI
Age: 29 First Name: Aleksander Last Name: Stensby
Age: 30 First Name: Tarjei Last Name: Romtveit
FRIENDS_WITH Since: 01.01.2004
Relationships can be bi-directional
ALEKS TARJEI
Age: 29 First Name: Aleksander Last Name: Stensby
Age: 30 First Name: Tarjei Last Name: Romtveit
FRIENDS_WITH Since: 01.01.2004
FRIENDS_WITH Since: 01.01.2004
ALEKS TARJEI
Age: 29 First Name: Aleksander Last Name: Stensby
Age: 30 First Name: Tarjei Last Name: Romtveit
FRIENDS_WITH Since: 01.01.2004
ALEKS TARJEI
Age: 29 First Name: Aleksander Last Name: Stensby Type: Person
Age: 30 First Name: Tarjei Last Name: Romtveit Type: Person
FRIENDS_WITH Since: 01.01.2004
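The Aleks and Tarjei example from these slides can be written down as a small property-graph sketch in plain Java. The classes PropertyNode, PropertyRelationship and PropertyGraphDemo are hypothetical names for illustration, not Neo4j types; attributes are kept in simple maps on both nodes and relationships.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these are not Neo4j classes.
class PropertyNode {
    final Map<String, Object> attributes = new LinkedHashMap<>();

    PropertyNode set(String key, Object value) {
        attributes.put(key, value);
        return this;
    }
}

class PropertyRelationship {
    final PropertyNode start;
    final PropertyNode end;
    final String type;
    final Map<String, Object> attributes = new LinkedHashMap<>();

    PropertyRelationship(PropertyNode start, PropertyNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }

    PropertyRelationship set(String key, Object value) {
        attributes.put(key, value);
        return this;
    }
}

class PropertyGraphDemo {
    // Builds the two Person nodes and the FRIENDS_WITH relationship from the slides.
    static PropertyRelationship aleksAndTarjei() {
        PropertyNode aleks = new PropertyNode()
                .set("Type", "Person").set("Age", 29)
                .set("First Name", "Aleksander").set("Last Name", "Stensby");
        PropertyNode tarjei = new PropertyNode()
                .set("Type", "Person").set("Age", 30)
                .set("First Name", "Tarjei").set("Last Name", "Romtveit");
        return new PropertyRelationship(aleks, tarjei, "FRIENDS_WITH")
                .set("Since", "01.01.2004");
    }
}
```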
different types of nodes
Person Person
Sport
Programming language
Drink
Person
Some use cases
• Social Media
• Recommendations
• Geo routing and logistics
• Access management
• Network management
• Finance / Fraud
Neo4j -[IS_A]-> Property Graph
Neo4j 2.2.x
• Currently early access milestone 4
• New page cache -> high concurrency
• Transactional and batch write performance
• Cost-Based Optimizer for Cypher
(node)-[relationship]->(node)
(Aleks)-[FRIENDS_WITH]->(Tarjei)
Cypher
"Make the simple things simple, and the complex things possible"
Intro to Cypher
• START • MATCH • WHERE • RETURN • CREATE • DELETE • SET • FOREACH • WITH
MATCH <pattern> WHERE RETURN
Describe what you want to retrieve with PATTERNS
(a)-[r]->(b)
(Aleks)-[FRIENDS_WITH]->(Tarjei)
Path depth
(a)-[*]->(b)
START
START n=node(1) RETURN n
MATCH
MATCH (movie:Movie) RETURN movie
WHERE
MATCH (movie:Movie) WHERE movie.title = 'Blade Runner' RETURN movie
START a=node(*) MATCH (a:Person) WHERE a.name='Danny DeVito' RETURN a
START a=node(*) MATCH (a:Person {name: 'Danny DeVito'} ) WHERE a.name='Danny DeVito' RETURN a
Neo4j 2.0.1
Friend of friend...
    MATCH (aleks)-[r:KNOWS]-()-[r2:KNOWS]->(friend_of_friend)
    WHERE aleks.firstName = 'Aleks'
    RETURN friend_of_friend.firstName
Friend of friend...
    MATCH (aleks)-[:KNOWS*2..2]->(friend_of_friend)
    WHERE aleks.firstName = 'Aleks'
      AND NOT (aleks)-[:KNOWS]->(friend_of_friend)
    RETURN friend_of_friend.firstName
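To make the logic of the friend-of-friend query concrete, here is a rough plain-Java sketch over an in-memory adjacency map. It does not use Neo4j or Cypher; the class FriendGraph and the names Kari and Ola are invented for illustration. It follows two outgoing KNOWS hops and drops anyone who is already a direct friend, returning each name once (a Cypher query like the one above would return one row per matching path).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not a Neo4j API.
class FriendGraph {
    // person -> people that person KNOWS (directed)
    private final Map<String, Set<String>> knows = new HashMap<>();

    void addKnows(String from, String to) {
        knows.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        knows.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    // Two hops out, excluding people reachable in one hop.
    Set<String> friendsOfFriends(String person) {
        Set<String> direct = knows.getOrDefault(person, Collections.emptySet());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            for (String candidate : knows.getOrDefault(friend, Collections.emptySet())) {
                if (!direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }
}
```

With Aleks knowing Tarjei and Kari, and Tarjei knowing Kari and Ola, `friendsOfFriends("Aleks")` contains only Ola, because Kari is already a direct friend.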
http://docs.neo4j.org/refcard/2.1.7/
DEMO
And a whole lot more…
• Traversals to navigate the graph
• Traversals identify paths
• Paths order nodes
• Cypher parameters
• Indexes
• Neo4j embedded, Neo4j REST
• Neo4j 2.x – Labels and automatic indexes
• Gremlin (procedural) – Cypher (declarative)
• Enterprise (multi-instance cluster, online backup)
• Neo4j Spatial
Language drivers
Contrib
Intro to SpringData Neo4j
• SpringData • Repositories • Quick example
    <dependencies>
      <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-neo4j</artifactId>
        <version>3.2.2.RELEASE</version>
      </dependency>
    </dependencies>
Versions used:
• neo4j 2.1.7
• spring-data-neo4j 3.2.2
• spring 4.1.4
SpringData Neo4j Config
    @Configuration
    @EnableNeo4jRepositories("no.stensby.javabin.neo.repositories")
    public class Config extends Neo4jConfiguration {
    }

    @Autowired
    protected Neo4jTemplate template;
SpringData Neo4j Config
    @Configuration
    @EnableNeo4jRepositories("no.stensby.javabin.neo.repositories")
    public class Config extends Neo4jConfiguration {
        public Config() {
            setBasePackage("no.stensby.javabin.neo.domain");
        }
    }

    @Autowired
    protected Neo4jTemplate template;
SpringData Neo4j Config
Embedded:
    @Bean
    GraphDatabaseService graphDatabaseService() {
        return new GraphDatabaseFactory().newEmbeddedDatabase("tmp/neo4j");
    }
REST:
    @Bean
    GraphDatabaseService graphDatabaseService() {
        return new SpringRestGraphDatabase("http://localhost:7474/db/data");
    }
Test:
    @Bean(destroyMethod = "shutdown")
    @Scope(SCOPE_PROTOTYPE)
    public GraphDatabaseService graphDatabaseService() {
        return new TestGraphDatabaseFactory().newImpermanentDatabase();
    }
SpringData Neo4j - Domain
    @NodeEntity
    public class Person {
        @GraphId
        public Long nodeId;
        @Indexed(unique = true)
        public int id;
        public String firstName;
        public String lastName;
    }
1-1 relationships
    @NodeEntity
    public class Address {
        private Country country;
        ...
    }
1-1 relationships
    @NodeEntity
    public class Address {
        @Fetch
        private Country country;
        ...
    }
1-m relationships - @RelatedTo
    @RelatedTo(type = "ADDRESS")
    private Set<Address> addresses = new HashSet<Address>();

    @RelatedTo
    private Set<Person> venner = new HashSet<Person>();
m-m relationships - @RelatedToVia
Actor:
    @Fetch
    @RelatedToVia(type = "ACTS_IN")
    Set<Role> roles = new HashSet<Role>();
Movie:
    @Fetch
    @RelatedToVia(type = "ACTS_IN", direction = Direction.INCOMING)
    Set<Role> cast = new HashSet<Role>();
Role:
    @RelationshipEntity(type = "ACTS_IN")
    public class Role {
        @GraphId
        Long nodeId;
        @StartNode
        Actor actor;
        @EndNode
        Movie movie;
    }
SpringData Neo4j - Repositories
    @Autowired
    Neo4jTemplate template;

    GraphRepository<Person> personRepository = template.repositoryFor(Person.class);
SpringData Neo4j - Repositories
    public interface PersonRepository extends GraphRepository<Person> {
    }

    @Autowired
    PersonRepository repository;
SpringData Neo4j - Repositories
    public interface PersonRepository extends GraphRepository<Person> {
        List<Person> findByFirstName(String firstName);
    }
SpringData Neo4j - Repositories
    public interface PersonRepository extends GraphRepository<Person> {
        List<Person> findByFirstName(String firstName);

        @Query("MATCH (p:Person{firstName:{0}}) RETURN p")
        List<Person> getPersonWithFirstName(String firstName);
    }
SpringData Neo4j - Repositories
    public interface PersonRepository extends GraphRepository<Person> {
        List<Person> findByFirstName(String firstName);

        @Query("MATCH (p:Person{firstName:{0}}) RETURN p")
        List<Person> getPersonWithFirstName(String firstName);

        @Query("MATCH (p:Person{firstName:{0}})-[:knows]->(friends) " +
               "RETURN friends")
        Iterable<Person> findFriendsOfPerson(String firstName);
    }
Lessons Learned
• Be careful with versions and upgrading …
  – Neo 1.9.x -> Neo 2.x = lots of problems…
  – Lots of breaking changes…
  – Spring Data Neo4j != Neo4j
• Nodes are “first-class” citizens in the graph
  – Hyperedges are not supported -> use “event nodes”
• Aggregations are expensive – caching stats on nodes!
• Unique and expressive relationship types
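One way to read "caching stats on nodes" is to keep running totals as attributes on the node and update them on every write, so a read never has to walk all the relationships. The plain-Java sketch below assumes a movie-rating scenario that is not in the slides; the class MovieStats and its fields are hypothetical and only stand in for attributes stored on a node.

```java
// Hypothetical illustration only; the fields stand in for attributes cached on a Movie node.
class MovieStats {
    private long ratingCount; // cached number of ratings
    private long ratingSum;   // cached sum of all ratings

    // Called whenever a rating relationship would be created.
    void addRating(int stars) {
        ratingCount++;
        ratingSum += stars;
    }

    // Reads the aggregate from the cached totals, without visiting any ratings.
    double averageRating() {
        return ratingCount == 0 ? 0.0 : (double) ratingSum / ratingCount;
    }

    long ratingCount() {
        return ratingCount;
    }
}
```

The trade-off is that every write has to keep the cached totals in step with the relationships it creates or deletes.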
Pitfalls
• Don’t use the graphid as your ID
• Initialize Set
• Use @Fetch with care
• Use standard getters/setters
• Be careful with String escaping
Inspiration
• Single Malt Scotch Whisky http://gist.neo4j.org/?8139605
• Chess Games and Positions http://gist.neo4j.org/?6506717
• Movie Recommendations http://gist.neo4j.org/?8173017
• Food Recipes Recommendation http://gist.neo4j.org/?8731452