GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling Neo4j Applications | San Francisco | 10.22.2014 | @iansrobinson

Upload: neo4j-the-open-source-graph-database

Post on 02-Dec-2014

TRANSCRIPT

Page 1: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling Neo4j Applications

San Francisco | 10.22.2014

@iansrobinson

Page 2: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

The Burden of Success

• More users
• Larger datasets
• More concurrent requests
• More complex queries

Page 3: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling is a Feature

• It doesn’t come for free
• Conditions of success:
  – Understand current needs
    • Design for an order of magnitude growth
  – Iterative and incremental development
  – Unit tests
    • Bedrock of asserted behaviour
  – Performance tests

Page 4: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Overview

• Scaling Reads
  – Latency
  – Throughput
• Scaling Writes
• Hardware

Page 5: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling Reads - Latency

Page 6: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Query Latency

latency = f(search_area)

Page 12: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Search Area

search_area = f(domain_invariants)

Page 13: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Search Area

search_area = f(domain_invariants)

Absolute: Every user has 50 friends

Page 15: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Search Area

search_area = f(domain_invariants)

Absolute: Every user has 50 friends
Relative: Every user is friends with 10% of the user base
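To make the difference between the two kinds of invariant concrete, here is a minimal sketch in Python. It assumes the query of interest is a depth-limited friend traversal and uses a crude upper bound for the search area (it ignores overlapping friendships). The function names and the depth parameter are illustrative assumptions, not part of the talk.

```python
def search_area_absolute(friends_per_user, depth):
    """Upper bound on users reached when every user has a fixed friend count.

    The bound does not depend on the size of the user base.
    """
    return sum(friends_per_user ** d for d in range(1, depth + 1))


def search_area_relative(user_count, percent_friends, depth):
    """Upper bound on users reached when every user is friends with a
    fixed percentage of the user base.

    The bound grows with the user base and is capped at the whole user base.
    """
    friends_per_user = user_count * percent_friends // 100
    reached = sum(friends_per_user ** d for d in range(1, depth + 1))
    return min(user_count, reached)


if __name__ == "__main__":
    for users in (1_000, 100_000, 10_000_000):
        print(users,
              search_area_absolute(50, depth=1),
              search_area_relative(users, 10, depth=1))
```

Under the absolute invariant the bound stays constant as the user base grows, so latency should stay roughly flat; under the relative invariant the bound, and with it the latency, grows with the data.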

Page 17: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Reducing Read Latency

• The Blackadder solution

Page 18: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Reducing Read Latency

• The Blackadder solution
• Improve the Cypher query
• Change the model
• Use an Unmanaged Extension

Page 19: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Improve Cypher Query

• Small queries, separated by WITH
• Start from low-cardinality nodes

http://thought-bytes.blogspot.co.uk/2013/01/optimizing-neo4j-cypher-queries.html
http://wes.skeweredrook.com/pragmatic-cypher-optimization-2-0-m06/

Page 20: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Change the Model

Goal
Do less work (in the query)
– By exploring less of the graph

How?
Identify inferred relationships
– Replace with use-case specific shortcuts

Page 21: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Change the Model - From

MATCH (:Person {username:'ben'})
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (colleague:Person)
RETURN colleague

Page 23: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Change the Model - To

MATCH (:Person {username:'ben'})
      -[:WORKED_WITH]-
      (colleague:Person)
RETURN colleague

Page 24: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Tradeoff

More expensive writes
More data

Cheaper reads

When to add the new relationship?
• With tx
• Queue for subsequent tx
• Periodic/batch

Page 25: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Refactor Existing Data

MATCH (p1:Person)
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (p2:Person)
WHERE NOT ((p1)-[:WORKED_WITH]-(p2))
WITH DISTINCT p1, p2 LIMIT 10
MERGE (p1)-[r:WORKED_WITH]-(p2)
RETURN count(r)

Page 26: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Select Batch

MATCH (p1:Person)
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (p2:Person)
WHERE NOT ((p1)-[:WORKED_WITH]-(p2))
WITH DISTINCT p1, p2 LIMIT 10
MERGE (p1)-[r:WORKED_WITH]-(p2)
RETURN count(r)

Batch size (LIMIT 10)

Page 27: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Add New Relationship

MATCH (p1:Person)
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (p2:Person)
WHERE NOT ((p1)-[:WORKED_WITH]-(p2))
WITH DISTINCT p1, p2 LIMIT 10
MERGE (p1)-[r:WORKED_WITH]-(p2)
RETURN count(r)

Page 28: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Continue While count(r) > 0

MATCH (p1:Person)
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (p2:Person)
WHERE NOT ((p1)-[:WORKED_WITH]-(p2))
WITH DISTINCT p1, p2 LIMIT 10
MERGE (p1)-[r:WORKED_WITH]-(p2)
RETURN count(r)
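The "continue while count(r) > 0" step implies a small driver loop on the client side. The sketch below shows one way that loop might look in Python. It assumes a hypothetical `run_batch` callable that executes the batch query above in its own transaction and returns the value of `count(r)`; how that callable talks to Neo4j is deliberately left out, and the iteration cap is an added safety assumption.

```python
def refactor_in_batches(run_batch, max_iterations=1_000_000):
    """Run the batch query repeatedly until it reports no more work.

    run_batch: hypothetical callable that executes one batch (one transaction)
               and returns count(r) for that batch.
    Returns the sum of all reported counts.
    """
    total = 0
    for _ in range(max_iterations):
        processed = run_batch()
        if processed <= 0:
            # Nothing left to refactor: every pair already has WORKED_WITH.
            return total
        total += processed
    raise RuntimeError("batch loop did not finish within max_iterations")


if __name__ == "__main__":
    # Stand-in for the database: pretend three batches find work, then none.
    fake_counts = iter([10, 10, 3, 0])
    print(refactor_in_batches(lambda: next(fake_counts)))
```

Keeping each batch in its own small transaction bounds the transactional state per commit, which is the point of the LIMIT in the query.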

Page 29: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Use Unmanaged Extensions

REST API: /db/data/cypher
Extensions: /my-extension/service

Page 30: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

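RESTful Resource

import java.io.IOException;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import org.codehaus.jackson.map.ObjectMapper;   // assumed: Jackson 1.x as bundled with Neo4j 2.x
import org.neo4j.server.database.CypherExecutor;

// ColleagueFinder is the application's own query class (not shown on the slide).
@Path("/similar-skills")
public class ColleagueFinderExtension {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private final ColleagueFinder colleagueFinder;

    // The server injects the CypherExecutor when it instantiates the resource.
    public ColleagueFinderExtension( @Context CypherExecutor cypherExecutor ) {
        this.colleagueFinder = new ColleagueFinder( cypherExecutor.getExecutionEngine() );
    }

    // Handles GET /similar-skills/{name} and returns the result as JSON.
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    @Path("/{name}")
    public Response getColleagues( @PathParam("name") String name ) throws IOException {
        String json = MAPPER.writeValueAsString( colleagueFinder.findColleaguesFor( name ) );
        return Response.ok().entity( json ).build();
    }
}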

Page 31: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

JAX-RS Annotations

@Path("/similar-skills")
public class ColleagueFinderExtension {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private final ColleagueFinder colleagueFinder;

    public ColleagueFinderExtension( @Context CypherExecutor cypherExecutor ) {
        this.colleagueFinder = new ColleagueFinder( cypherExecutor.getExecutionEngine() );
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    @Path("/{name}")
    public Response getColleagues( @PathParam("name") String name ) throws IOException {
        String json = MAPPER.writeValueAsString( colleagueFinder.findColleaguesFor( name ) );
        return Response.ok().entity( json ).build();
    }
}

Page 32: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Inject Database/Cypher Execution Engine

@Path("/similar-skills")
public class ColleagueFinderExtension {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private final ColleagueFinder colleagueFinder;

    public ColleagueFinderExtension( @Context CypherExecutor cypherExecutor ) {
        this.colleagueFinder = new ColleagueFinder( cypherExecutor.getExecutionEngine() );
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    @Path("/{name}")
    public Response getColleagues( @PathParam("name") String name ) throws IOException {
        String json = MAPPER.writeValueAsString( colleagueFinder.findColleaguesFor( name ) );
        return Response.ok().entity( json ).build();
    }
}

Page 33: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

1. Get Close to the Data

[Diagram: Application; MATCH, MATCH, CREATE, DELETE, MERGE, MATCH]

Single request, many operations
– Reduce network latencies

Page 34: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

2. Multiple Implementation Options

[Diagram: REST API; Extensions]

• Cypher
• Traversal Framework
• Graph Algo Package
• Core API

Page 35: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

3. Control Request/Response Format

JSON, CSV, protobuf, etc

{ users: [ { id: 1234}, { id: 9876} ] }

1a 03 08 96 01

Domain-specific representations
– Compact
– Conserve bandwidth
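To illustrate why a compact representation conserves bandwidth, the sketch below compares the JSON payload from the slide with a base-128 varint encoding of the same ids, the integer encoding that protobuf uses. This is only a sketch: it is not a full protobuf message (no field tags or lengths), and the helper names are assumptions. The slide's trailing bytes `96 01` are consistent with the varint encoding of 150.

```python
import json


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (low group first)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_ids(ids):
    """Concatenate the varint encodings of a list of ids."""
    return b"".join(encode_varint(i) for i in ids)


if __name__ == "__main__":
    ids = [1234, 9876]
    as_json = json.dumps({"users": [{"id": i} for i in ids]})
    as_binary = encode_ids(ids)
    print(len(as_json), "bytes as JSON")
    print(len(as_binary), "bytes as varints:", as_binary.hex())
```

An unmanaged extension can return whichever of these formats suits the client, since it controls the response body itself.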

Page 36: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

4. Control HTTP Headers

[Diagram: Application; Reverse Proxy]

GET /my-extension/service/top-10

HTTP/1.1 200 OK
Cache-Control: max-age=60
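The effect of `Cache-Control: max-age=60` on a reverse proxy can be sketched as a small time-based cache. The Python below is an illustration only, not how any particular proxy is implemented. The class name, the `fetch` callable (standing in for a request to the extension) and the injectable clock are all assumptions made so the example is self-contained.

```python
import time


class MaxAgeCache:
    """Serve a cached body until its max-age has elapsed, then refetch."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # hypothetical: callable(path) -> (body, max_age_seconds)
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries = {}       # path -> (body, expires_at)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]      # still fresh: the backend is not contacted
        body, max_age = self._fetch(path)
        self._entries[path] = (body, now + max_age)
        return body


if __name__ == "__main__":
    calls = []

    def fetch(path):
        calls.append(path)
        return ("response #%d" % len(calls), 60)

    cache = MaxAgeCache(fetch)
    cache.get("/my-extension/service/top-10")
    cache.get("/my-extension/service/top-10")
    print(len(calls), "backend request(s) for 2 client requests")
```

With a 60 second lifetime, the database sees at most one request per minute for this resource, however many clients ask for it.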

Page 37: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

5. Integrate with Backend Systems

[Diagram: Application; REST API; Extensions; RDBMS; LDAP]

Page 38: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Migrating to Extensions

• Re-implement original query inside extension
• Modify request/response formats and headers
• Refactor implementation to use lower parts of the stack where necessary
• Measure, measure, measure

Page 39: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling Reads - Throughput

Page 40: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scale Horizontally For High Read Throughput

[Diagram: Application]

Page 41: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scale Horizontally For High Read Throughput

[Diagram: Application; Load Balancer; Master, Slave, Slave]

Page 42: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scale Horizontally For High Read Throughput

[Diagram: Application; Read Load Balancer; Write Load Balancer; Master, Slave, Slave]

Page 43: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Configure HAProxy as Read Load Balancer

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    option httpchk GET /db/manage/server/ha/slave
    server s1 10.0.1.10:7474 maxconn 32 check
    server s2 10.0.1.11:7474 maxconn 32 check
    server s3 10.0.1.12:7474 maxconn 32 check

listen admin
    bind *:8080
    stats enable

Page 44: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Configure HAProxy as Read Load Balancer

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    option httpchk GET /db/manage/server/ha/slave
    server s1 10.0.1.10:7474 maxconn 32 check
    server s2 10.0.1.11:7474 maxconn 32 check
    server s3 10.0.1.12:7474 maxconn 32 check

listen admin
    bind *:8080
    stats enable

Response from GET /db/manage/server/ha/slave, by instance state:

State     Status          Body
Master    404 Not Found   false
Slave     200 OK          true
Unknown   404 Not Found   UNKNOWN
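The health check above works because HAProxy's HTTP check treats 2xx and 3xx responses as healthy and anything else as down, so only instances answering 200 on the slave endpoint receive reads. The following Python sketch models that decision. The table of responses is taken from the slide; the function names and the cluster layout in the usage example are illustrative assumptions.

```python
# Responses of GET /db/manage/server/ha/slave by instance state (from the slide).
HA_SLAVE_ENDPOINT = {
    "master": (404, "false"),
    "slave": (200, "true"),
    "unknown": (404, "UNKNOWN"),
}


def check_passes(status_code):
    """An HTTP health check passes on 2xx and 3xx status codes."""
    return 200 <= status_code < 400


def read_pool(cluster):
    """Return the servers that the read load balancer would keep in rotation.

    cluster: mapping of server name -> current HA state.
    """
    return [
        name
        for name, state in cluster.items()
        if check_passes(HA_SLAVE_ENDPOINT[state][0])
    ]


if __name__ == "__main__":
    print(read_pool({"s1": "master", "s2": "slave", "s3": "slave"}))
```

When a master election changes the roles, the next round of health checks moves servers in and out of the read pool without any configuration change.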

Page 45: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

This Isn’t The Throughput You Were Looking For

[Diagram: Application; Load Balancer; instances 1, 2, 3]

MATCH (c:Country{name:'Australia'})...
MATCH (c:Country{name:'Zambia'})...
MATCH (c:Country{name:'Norway'})...

Page 46: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Cache Sharding Using Consistent Routing

[Diagram: Application; Load Balancer; instances 1, 2, 3]

MATCH (c:Country{name:'Australia'})...
MATCH (c:Country{name:'Zambia'})...
MATCH (c:Country{name:'Norway'})...

A-I → 1
J-R → 2
S-Z → 3

MATCH (c:Country{name:'Zimbabwe'})...
MATCH (c:Country{name:'Japan'})...
MATCH (c:Country{name:'Brazil'})...
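The routing table above can be sketched as a pure function of the request. The Python below is a minimal illustration of consistent routing by first letter; the function name and the error handling are assumptions. The point is that the same country always lands on the same instance, so each instance's cache only has to hold its own slice of the graph.

```python
def route(country_name):
    """Map a country name to an instance number using the slide's ranges:
    A-I -> 1, J-R -> 2, S-Z -> 3."""
    first = country_name[:1].upper()
    if "A" <= first <= "I":
        return 1
    if "J" <= first <= "R":
        return 2
    if "S" <= first <= "Z":
        return 3
    raise ValueError("no route for %r" % country_name)


if __name__ == "__main__":
    for name in ("Australia", "Zambia", "Norway", "Zimbabwe", "Japan", "Brazil"):
        print(name, "->", route(name))
```

Requests for Zambia and Zimbabwe both reach instance 3, so the second query is more likely to find the relevant part of the graph already in memory.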

Page 47: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Configure HAProxy for Cache Sharding

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    balance url_param country_code
    server s1 10.0.1.10:7474 maxconn 32
    server s2 10.0.1.11:7474 maxconn 32
    server s3 10.0.1.12:7474 maxconn 32

listen admin
    bind *:8080
    stats enable

Page 49: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling Writes - Throughput

Page 50: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Factors Impacting Write Performance

• Managing transactional state
  – Creating and committing are expensive operations
• Contending for locks
  – Nodes and relationships

Page 51: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Improving Write Throughput

• Delay taking expensive locks
• Batch/queue writes

Page 52: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Delay Expensive Locks

• Identify contended nodes
• Involve them as late as possible in a transaction

Page 53: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Add Linked List Item + Update Pointers

Page 54: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Add Linked List Item + Update Pointers

Locked

Page 57: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Add Linked List Item

Page 58: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Add Linked List

Page 61: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Add Pointers

Locked
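The slide sequence above appears to contrast holding a lock for the whole operation with building the new list items first and touching the contended node only at the end. The Python sketch below shows the second approach using an in-memory linked list and a `threading.Lock` as a stand-in for the database's lock on a contended node. The class and method names are assumptions for illustration; Neo4j's own locking is not modelled.

```python
import threading


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class ContendedList:
    """Singly linked list whose head pointer plays the contended node."""

    def __init__(self):
        self.head = None
        self.lock = threading.Lock()

    def add_items(self, values):
        values = list(values)
        if not values:
            return
        # Step 1: build the new chain privately. No shared state is touched,
        # so no lock is needed however long this takes.
        first = last = Node(values[0])
        for value in values[1:]:
            last.next = Node(value)
            last = last.next
        # Step 2: take the contended lock only for the pointer update.
        with self.lock:
            last.next = self.head
            self.head = first

    def to_list(self):
        with self.lock:
            out, node = [], self.head
            while node is not None:
                out.append(node.value)
                node = node.next
            return out


if __name__ == "__main__":
    shared = ContendedList()
    shared.add_items([1, 2])
    shared.add_items([3])
    print(shared.to_list())
```

The lock is held for two pointer assignments regardless of how many items are added, which keeps the window of contention small when many writers target the same node.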

Page 62: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Batch Writes

• Multiple CREATE/MERGE statements per request
  – Good for integration with backend systems
• Queue
  – Good for small, online transactions

Page 63: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Single-Threaded Queue

[Diagram: Write, Write, Write; Queue; Single Thread; Batch]
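A single-threaded queue of this kind can be sketched with Python's standard `queue` module. In the sketch below many producers put writes on a queue and one consumer drains it, committing whatever has accumulated as a single batch. The `commit_batch` callable (standing in for one database transaction), the `None` shutdown sentinel and the `max_batch` limit are assumptions made for the example.

```python
import queue


def run_batch_writer(write_queue, commit_batch, max_batch=100):
    """Drain write_queue on the calling thread, committing writes in batches.

    write_queue:  queue.Queue of pending writes; None signals shutdown.
    commit_batch: hypothetical callable that applies a list of writes
                  in one transaction.
    """
    while True:
        item = write_queue.get()          # block until at least one write arrives
        if item is None:
            return
        batch = [item]
        stop = False
        while len(batch) < max_batch:
            try:
                nxt = write_queue.get_nowait()   # take whatever else is waiting
            except queue.Empty:
                break
            if nxt is None:
                stop = True
                break
            batch.append(nxt)
        commit_batch(batch)               # one create/commit per batch
        if stop:
            return


if __name__ == "__main__":
    q = queue.Queue()
    for write in ("w1", "w2", "w3", "w4", "w5"):
        q.put(write)
    q.put(None)
    run_batch_writer(q, lambda batch: print("commit", batch), max_batch=2)
```

In a real deployment the function would run on its own thread, for example via `threading.Thread(target=run_batch_writer, args=(q, commit_batch))`, while request threads only call `q.put`.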

Page 64: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Queue Location Options

[Two diagrams, each labelled Application, showing alternative queue locations]

Page 65: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Benefits of Batched Writes

• Less transactional state management
  – Create/commit per batch rather than per write
• No contention for locks
  – No deadlocks
• Query consolidation
  – Reduce the amount of work inside the database

Page 66: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Query Consolidation

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sam
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address1
CREATE address2
DELETE address1
MATCH sam
CREATE sam-[:LIVES_AT]-address2

Page 67: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Eliminate Duplicate Lookups

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sam
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address1
CREATE address2
DELETE address1
MATCH sam
CREATE sam-[:LIVES_AT]-address2

Page 69: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Eliminate Duplicate Lookups

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address1
CREATE address2
DELETE address1
CREATE sam-[:LIVES_AT]-address2

Page 71: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Eliminate Unnecessary Writes

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address1
CREATE address2
DELETE address1
CREATE sam-[:LIVES_AT]-address2

Page 73: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Eliminate Unnecessary Writes

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address2
CREATE sam-[:LIVES_AT]-address2
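The two consolidation steps shown on these slides can be sketched as a pass over the queued operations. The Python below represents each operation as a tuple and is an illustration only: the tuple format and function name are assumptions, and it assumes that node names are unique within a batch and that no relationship in the batch refers to a node that is created and then deleted.

```python
def consolidate(ops):
    """Remove duplicate lookups and writes that cancel out within one batch.

    ops: list of tuples such as ("MATCH", "sam"), ("CREATE", "address1"),
         ("DELETE", "address1") or ("CREATE_REL", "sam", "KNOWS", "jenny").
    """
    created = {op[1] for op in ops if op[0] == "CREATE"}
    # Nodes both created and deleted in this batch never need to exist.
    transient = {op[1] for op in ops if op[0] == "DELETE" and op[1] in created}

    seen_matches = set()
    result = []
    for op in ops:
        kind = op[0]
        if kind in ("CREATE", "DELETE") and op[1] in transient:
            continue                      # eliminate unnecessary writes
        if kind == "MATCH":
            if op[1] in seen_matches:
                continue                  # eliminate duplicate lookups
            seen_matches.add(op[1])
        result.append(op)
    return result


if __name__ == "__main__":
    batch = [
        ("MATCH", "sam"), ("MATCH", "jenny"),
        ("CREATE_REL", "sam", "KNOWS", "jenny"),
        ("MATCH", "sam"), ("MATCH", "sarah"),
        ("CREATE_REL", "sam", "KNOWS", "sarah"),
        ("CREATE", "address1"), ("CREATE", "address2"),
        ("DELETE", "address1"),
        ("MATCH", "sam"),
        ("CREATE_REL", "sam", "LIVES_AT", "address2"),
    ]
    for op in consolidate(batch):
        print(op)
```

Consolidation of this kind is only possible because the single writer sees the whole batch at once; with one transaction per write, each request would repeat its own lookups.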

Page 74: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Tradeoff

Latency

Higher throughput

In-memory or durable queues?
• Lost writes in event of crash
• Transactional dequeue?

Page 75: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Further Reading

http://maxdemarzi.com/2013/09/05/scaling-writes/
http://maxdemarzi.com/2014/07/01/scaling-concurrent-writes-in-neo4j/

Page 76: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Hardware  

Page 77: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Memory

• SLC (single-level cell) SSD w/SATA
• Lots of RAM
  – 8-12G heap
  – Explicitly memory-map store files

Page 78: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Object Cache

• 2G for 12G heap
• No object cache
  – consistent throughput at expense of latency

Page 79: GraphConnect 2014 SF: From Zero to Graph in 120: Scale

AWS

• HVM (hardware virtual machine) over PV (paravirtual)
• EBS-optimized instances
• Provisioned IOPS