Intro to Graphs and Neo4j

Post on 26-Jan-2015

DESCRIPTION

This presentation introduces the graph model as the obvious choice for rich, connected data. Graph databases are a category of open-source NoSQL datastores that specialize in storing, handling and querying graph structures efficiently. Use cases illustrate the applicability of the graph model across many domains. Neo4j, the most widely used graph database, supports the property graph model, which is explained in detail. Querying a graph database calls for a language that is powerful and expressive, yet friendly and easy to understand, and tailored to graph patterns. Neo4j's Cypher is such a query language, developed from the ground up to express challenging use cases in a comprehensible way. A series of examples rounds off the presentation and applies the lessons learned.

TRANSCRIPT

(c) Neo Technology, Inc 2014

Graph Database Introduction

Meetup April 2014

Michael Hunger michael@neotechnology.com

@mesirii @neo4j

Agenda

1. Why Graphs, Why Now?

2. What Is A Graph, Anyway?

3. Graphs In The Real World

4. The Graph Landscape

i) Popular Graph Models

ii) Graph Databases

iii) Graph Compute Engines

Why Graphs?

The World is a Graph

Some Use-Cases

Social Network

(Network) Impact Analysis

Route Finding

Recommendations

Logistics

Access Control

Fraud Analysis

Securities & Debt

What Is A Graph, Anyway?

A Graph

Node

Relationship

Four Graph Model Building Blocks

Property Graph Data Model

Nodes

Relationships

Relationships (continued)

Nodes can have more than one relationship

Self relationships are allowed

Nodes can be connected by more than one relationship

Labels

Four Building Blocks

๏ Nodes

• Entities

๏ Relationships

• Connect entities and structure domain

๏ Properties

• Attributes and metadata

๏ Labels

• Group nodes by role

Whiteboard Friendliness

Easy to design and model: the graph is a direct representation of the whiteboard model

[Figure: example movie graph]

Nodes (labels and properties):

• Person, Actor: name: Tom Hanks, born: 1956

• Person, Actor: name: Hugo Weaving, born: 1960

• Person, Director: name: Lana Wachowski, born: 1965

• Movie: title: Cloud Atlas, released: 2012

• Movie: title: The Matrix, released: 1999

Relationships (type and properties):

• Tom Hanks ACTED_IN Cloud Atlas (roles: Zachry)

• Hugo Weaving ACTED_IN Cloud Atlas (roles: Bill Smoke)

• Hugo Weaving ACTED_IN The Matrix (roles: Agent Smith)

• Lana Wachowski DIRECTED Cloud Atlas

• Lana Wachowski DIRECTED The Matrix
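The four building blocks can be sketched in a few lines of Python, using the movie graph above as data. The `Node` and `Relationship` classes and the `actors_of` helper are illustrative assumptions, not Neo4j's API; they only show how labels, properties and typed relationships fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with labels (roles) and properties (attributes)."""
    labels: set
    properties: dict


@dataclass
class Relationship:
    """A typed, directed connection that can carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


tom = Node({"Person", "Actor"}, {"name": "Tom Hanks", "born": 1956})
hugo = Node({"Person", "Actor"}, {"name": "Hugo Weaving", "born": 1960})
lana = Node({"Person", "Director"}, {"name": "Lana Wachowski", "born": 1965})
cloud_atlas = Node({"Movie"}, {"title": "Cloud Atlas", "released": 2012})
matrix = Node({"Movie"}, {"title": "The Matrix", "released": 1999})

relationships = [
    Relationship(tom, "ACTED_IN", cloud_atlas, {"roles": "Zachry"}),
    Relationship(hugo, "ACTED_IN", cloud_atlas, {"roles": "Bill Smoke"}),
    Relationship(hugo, "ACTED_IN", matrix, {"roles": "Agent Smith"}),
    Relationship(lana, "DIRECTED", cloud_atlas),
    Relationship(lana, "DIRECTED", matrix),
]


def actors_of(movie):
    """Names of all nodes with an ACTED_IN relationship to the given movie."""
    return sorted(
        r.start.properties["name"]
        for r in relationships
        if r.type == "ACTED_IN" and r.end is movie
    )
```

Calling `actors_of(cloud_atlas)` walks the relationship list rather than joining tables, which is the point of the model: connections are first-class data.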

Aggregate vs. Connected Data-Model

What is NOSQL?

It’s not “No to SQL”

It’s not “Never SQL”

It’s “Not Only SQL”

NOSQL \no-seek-wool\ n. Describes an ongoing trend in which developers increasingly opt for non-relational databases to help solve their problems, in an effort to use the right tool for the right job.

NOSQL Databases

• Key-Value: Riak, Redis

• Column oriented: Cassandra

• Document: Mongo, Couch

• Graph: Neo4j

• Relational: MySQL, Postgres

Living in a NOSQL World

[Figure: chart with the axes "Volume ~= Size" and "Density ~= Complexity", placing Key-Value Store, Column Family, Document Databases, Graph Databases and RDBMS. Additional labels: "Aggregate Oriented" and "90% of use cases".]

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.”

Martin Fowler

Aggregate Oriented Model
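To make the point of the quote concrete, here is a small Python sketch. The order data and field names are invented for illustration. Fetching an order by id is a single lookup because access is aligned with the aggregate, while a product-sales question has to visit every aggregate.

```python
from collections import Counter

# Hypothetical order aggregates, keyed by order id (illustrative data only).
orders = {
    "o1": {
        "customer": "alice",
        "lines": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 1}],
    },
    "o2": {
        "customer": "bob",
        "lines": [{"product": "book", "qty": 1}],
    },
}


def get_order(order_id):
    """Access aligned with the aggregate: a single key lookup."""
    return orders[order_id]


def units_sold_per_product():
    """Access that cuts across aggregates: every order must be visited."""
    totals = Counter()
    for order in orders.values():                    # "map" over all aggregates
        for line in order["lines"]:
            totals[line["product"]] += line["qty"]   # "reduce" by product
    return dict(totals)
```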

The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions.

Michael Hunger

Connected Data Model

Relational vs. Graph

You know relational

[Figure: relational schema with the tables users, user_skill and skills]

now consider relationships...

Looks different, fine. Who cares?

๏ a sample social graph

• with ~1,000 persons

๏ average 50 friends per person

๏ pathExists(a,b) limited to depth 4

๏ caches warmed up to eliminate disk I/O

Database              # persons    query time
Relational database   1,000        2000 ms
Neo4j                 1,000        2 ms
Neo4j                 1,000,000    2 ms
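The slides do not show how `pathExists(a,b)` was implemented. One plausible reading is a depth-limited breadth-first search, sketched below in plain Python over an adjacency dict. The function name, the sample data and the `max_depth` parameter are assumptions for illustration; this is not Neo4j's implementation and says nothing about the timings in the table.

```python
from collections import deque


def path_exists(friends, a, b, max_depth=4):
    """Breadth-first search from a, giving up after max_depth hops."""
    if a == b:
        return True
    visited = {a}
    frontier = deque([(a, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == b:
                return True
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Tiny illustrative friendship chain: ann - bob - cid - dan - eve - fay
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cid"],
    "cid": ["bob", "dan"],
    "dan": ["cid", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}
```

The work done depends on the neighbourhood that is actually explored, not on the total number of persons stored, which is the intuition behind the constant Neo4j query time in the table.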

Neo4j is a Graph Database

• A Graph Database:

• a schema-free labeled Property Graph

• perfect for complex, highly connected data

• A Graph Database:

• reliable with real ACID Transactions

• scalable: Billions of Nodes and Relationships, Scale out with highly available Neo4j-Cluster

• fast with more than 2M traversals / second

• Server with HTTP API, or Embeddable on the JVM

• Declarative Query Language

Graph Database: Pros & Cons

• Strengths:

• Powerful data model, as general as RDBMS

• Whiteboard friendly, agile development

• Fast, for connected data

• Easy to query

• Weaknesses:

• Sharding (they can scale up and out reasonably well)

• Global Queries / Number Crunching

• Binary Data / Blobs

• Requires conceptual shift

• graph-like thinking becomes addictive

Graph Querying

You know how to query a relational database!

Just use SQL

[Figure: tables users, user_skills and skills]

select skills.name
from users join user_skills on (...) join skills on (...)
where users.name = 'Michael'
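The join above can be tried end to end with Python's built-in `sqlite3` module. The slide elides the join conditions as `(...)`, so the column names (`id`, `user_id`, `skill_id`) and the sample rows below are assumptions made only for this sketch.

```python
import sqlite3

# In-memory database with a hypothetical schema for the three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skills (user_id INTEGER, skill_id INTEGER);
    INSERT INTO users  VALUES (1, 'Michael'), (2, 'Andreas');
    INSERT INTO skills VALUES (1, 'Java'), (2, 'Cypher');
    INSERT INTO user_skills VALUES (1, 1), (1, 2), (2, 2);
""")

# Two joins are needed to get from a user to the names of their skills.
rows = conn.execute(
    """
    SELECT skills.name
    FROM users
    JOIN user_skills ON user_skills.user_id = users.id
    JOIN skills      ON skills.id = user_skills.skill_id
    WHERE users.name = ?
    ORDER BY skills.name
    """,
    ("Michael",),
).fetchall()

michael_skills = [name for (name,) in rows]
```

Each additional hop in the data (for example, colleagues who share a skill) adds further joins through the link table.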

How to query a graph?

You traverse the graph

// find starting nodes
MATCH (me:Person {name:'Andreas'})

// then traverse the relationships
MATCH (me:Person {name:'Andreas'})-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2

Cypher: a pattern-matching query language for graphs

Cypher attributes

#1 Declarative

You tell Cypher what you want, not how to get it

#2 Expressive

Optimize syntax for reading

MATCH (a:Actor)-[r:ACTS_IN]->(m:Movie)
RETURN a.name, r.role, m.title

#3 Pattern Matching

Patterns are easy for your human brain

#4 Idempotent

State change should be expressed idempotently

Query Structure

MATCH (n:Label)-[:REL]->(m:Label)
WHERE n.prop < 42
WITH n, count(m) as cnt, collect(m.attr) as attrs
WHERE cnt > 12
RETURN n.prop,
       extract(a2 in filter(a1 in attrs WHERE a1 =~ "...-.*") | substring(a2,4,size(a2)-1)) AS ids
ORDER BY length(ids) DESC
SKIP 5
LIMIT 10

MATCH describes the pattern

WHERE filters the result set

RETURN returns the result rows (projection)

ORDER BY, SKIP, LIMIT sort and paginate

WITH combines query parts like a pipe; WITH + WHERE = HAVING

Collections: powerful data structure handling
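The clause-by-clause structure of the query above maps onto a familiar filter, group, filter, project, sort, paginate pipeline. The Python sketch below is only an analogue of that flow over made-up tuples; `run_query`, its parameters and the sample data are assumptions, and the count threshold is lowered from 12 so that the tiny dataset produces a result.

```python
import re
from collections import defaultdict

# Each tuple stands for one matched (n)-[:REL]->(m) pair: (n.prop, m.attr).
# The data is made up for illustration.
matches = [
    (1, "abc-001"), (1, "abc-002"), (1, "zz-9"),
    (7, "xyz-123"),
    (50, "abc-777"),   # dropped by the first WHERE (n.prop < 42)
]


def run_query(matches, max_prop=42, min_count=2, skip=0, limit=10):
    # MATCH ... WHERE n.prop < 42
    filtered = [(n, attr) for n, attr in matches if n < max_prop]

    # WITH n, count(m) AS cnt, collect(m.attr) AS attrs  (groups by n)
    groups = defaultdict(list)
    for n, attr in filtered:
        groups[n].append(attr)

    # WHERE cnt > min_count  (plays the role of SQL's HAVING)
    kept = {n: attrs for n, attrs in groups.items() if len(attrs) > min_count}

    # RETURN n.prop, <filter by regex, then cut off the first four characters>
    pattern = re.compile(r"...-.*")
    rows = [
        (n, [a[4:] for a in attrs if pattern.fullmatch(a)])
        for n, attrs in kept.items()
    ]

    # ORDER BY length(ids) DESC SKIP ... LIMIT ...
    rows.sort(key=lambda row: len(row[1]), reverse=True)
    return rows[skip:skip + limit]
```

The `WITH` step is where the pipe happens: everything after it only sees `n`, the count and the collected list, just as a later pipeline stage only sees the previous stage's output.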

Concrete Example

MATCH (:Country {name:"Sweden"})<-[:REGISTERED_IN]-(c:Company)<-[:WORKS_AT]-(p:Person:Developer)
WHERE p.age < 42
WITH c, count(p) as cnt, collect(p.empId) as emp_ids
WHERE cnt > 12
RETURN c.name AS company_name,
       extract(id2 in filter(id1 in emp_ids WHERE id1 =~ "...-.*") | substring(id2,4,size(id2)-1)) AS last_emp_id_digits
ORDER BY length(last_emp_id_digits) DESC
SKIP 5
LIMIT 10

CREATE creates nodes, relationships and patterns

CREATE (y:Year {year:2014})
FOREACH (m IN range(1,12) |
  CREATE (:Month {month:m})-[:IN]->(y)
)

CREATE - nodes, rels, structures

MERGE matches or creates

MERGE (y:Year {year:2014})
ON CREATE SET y.created = timestamp()
FOREACH (m IN range(1,12) |
  MERGE (:Month {month:m})-[:IN]->(y)
)

MERGE - get or create
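The get-or-create behaviour is what makes a statement safe to run twice. A rough Python analogue, using a plain dict as a stand-in for the graph (the `graph` layout and the function name are assumptions for illustration, not how Neo4j stores data):

```python
import time


def merge_year_with_months(graph, year):
    """Get or create a year and its twelve months; safe to call repeatedly."""
    # MERGE (y:Year {year: ...}) ON CREATE SET y.created = timestamp()
    if year not in graph["years"]:
        graph["years"][year] = {"year": year, "created": time.time()}

    # range(1,12) in Cypher includes both ends, hence range(1, 13) here.
    for m in range(1, 13):
        # MERGE (:Month {month:m})-[:IN]->(y): a month is identified by the
        # whole pattern, modelled here as the (year, month) key.
        graph["months"].setdefault((year, m), {"month": m})

    return graph["years"][year]


graph = {"years": {}, "months": {}}
merge_year_with_months(graph, 2014)
merge_year_with_months(graph, 2014)   # second call creates nothing new
```

Replacing the get-or-create steps with unconditional inserts would correspond to `CREATE`, which adds another year and twelve more months on every run.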

SET, REMOVE update attributes and labels

MATCH (year:Year)
WHERE year.year % 4 = 0 AND (year.year % 100 <> 0 OR year.year % 400 = 0)
SET year:Leap
WITH year
MATCH (year)<-[:IN]-(feb:Month {month:2})
SET feb.days = 29
CREATE (feb)<-[:IN]-(:Day {day:29})

SET, REMOVE, DELETE
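A leap-year condition is easy to get subtly wrong, so it is worth checking outside the database. The sketch below states the Gregorian rule in Python and can be compared against the standard library's `calendar.isleap`; the helper names are illustrative.

```python
import calendar


def is_leap(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_february(year):
    """Mirrors the SET feb.days = 29 step for years labelled Leap."""
    return 29 if is_leap(year) else 28


# Cross-check against the standard library over a range that includes
# the century edge cases 1700, 1800, 1900 (not leap) and 1600, 2000 (leap).
mismatches = [y for y in range(1600, 2401) if is_leap(y) != calendar.isleap(y)]
```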

INDEX, CONSTRAINTS represent optional schema

CREATE CONSTRAINT ON (y:Year) ASSERT y.year IS UNIQUE
CREATE INDEX ON :Month(month)

INDEX / CONSTRAINT

Graph Query Examples

Social Recommendation

MATCH (person:Person)-[:IS_FRIEND_OF]->(friend),
      (friend)-[:LIKES]->(restaurant),
      (restaurant)-[:LOCATED_IN]->(loc:Location),
      (restaurant)-[:SERVES]->(type:Cuisine)
WHERE person.name = 'Philip' AND loc.location = 'New York' AND type.cuisine = 'Sushi'
RETURN restaurant.name

* Cypher query language example: http://maxdemarzi.com/?s=facebook
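Read as a traversal, the pattern above says: start at the person, hop to friends, hop to restaurants they like, and keep those in the right location serving the right cuisine. A plain-Python sketch of that walk follows. All data except the names 'Philip', 'New York' and 'Sushi' is invented, and the dict layout is an assumption for illustration.

```python
# Illustrative data only: one dict per relationship type.
friends_of = {"Philip": ["Emil", "Andreas"]}
likes = {"Emil": ["iSushi", "Zushi Zam"], "Andreas": ["iSushi", "Pasta Place"]}
located_in = {"iSushi": "New York", "Zushi Zam": "New York", "Pasta Place": "New York"}
serves = {"iSushi": "Sushi", "Zushi Zam": "Sushi", "Pasta Place": "Italian"}


def recommend(person, location, cuisine):
    """Restaurants liked by the person's friends, filtered by place and cuisine."""
    results = []
    for friend in friends_of.get(person, []):          # (person)-[:IS_FRIEND_OF]->(friend)
        for restaurant in likes.get(friend, []):       # (friend)-[:LIKES]->(restaurant)
            if (located_in.get(restaurant) == location     # -[:LOCATED_IN]->(loc)
                    and serves.get(restaurant) == cuisine):  # -[:SERVES]->(type)
                results.append(restaurant)
    return results
```

Like the query, this yields one row per matching path, so a restaurant liked by two friends appears twice; counting those duplicates is one simple way to rank recommendations.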

Network Management Example

Network Management - Create

CREATE
  (crm {name:"CRM"}),
  (dbvm {name:"Database VM"}),
  (www {name:"Public Website"}),
  (wwwvm {name:"Webserver VM"}),
  (srv1 {name:"Server 1"}),
  (san {name:"SAN"}),
  (srv2 {name:"Server 2"}),

  (crm)-[:DEPENDS_ON]->(dbvm),
  (dbvm)-[:DEPENDS_ON]->(srv2),
  (srv2)-[:DEPENDS_ON]->(san),
  (www)-[:DEPENDS_ON]->(dbvm),
  (www)-[:DEPENDS_ON]->(wwwvm),
  (wwwvm)-[:DEPENDS_ON]->(srv1),
  (srv1)-[:DEPENDS_ON]->(san)

Practical Cypher

Network Management - Impact Analysis

// Server 1 Outage
MATCH (n)<-[:DEPENDS_ON*]-(upstream)
WHERE n.name = "Server 1"
RETURN upstream

upstream
{name:"Webserver VM"}
{name:"Public Website"}

Network Management - Dependency Analysis

// Public website dependencies
MATCH (n)-[:DEPENDS_ON*]->(downstream)
WHERE n.name = "Public Website"
RETURN downstream

downstream
{name:"Database VM"}
{name:"Server 2"}
{name:"SAN"}
{name:"Webserver VM"}
{name:"Server 1"}

Network Management - Statistics

// Most depended on component
MATCH (n)<-[:DEPENDS_ON*]-(dependent)
RETURN n, count(DISTINCT dependent) AS dependents
ORDER BY dependents DESC
LIMIT 1

n               dependents
{name:"SAN"}    6
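The three network management queries are all variable-length traversals over `DEPENDS_ON`. The following Python sketch rebuilds the same dependency graph as an adjacency dict and computes the same answers; the function names are illustrative and the traversal is a simple depth-first walk, not how Neo4j evaluates `[:DEPENDS_ON*]`.

```python
# component -> components it directly depends on (same edges as the CREATE statement)
depends_on = {
    "CRM": ["Database VM"],
    "Database VM": ["Server 2"],
    "Server 2": ["SAN"],
    "Public Website": ["Database VM", "Webserver VM"],
    "Webserver VM": ["Server 1"],
    "Server 1": ["SAN"],
    "SAN": [],
}


def downstream(component):
    """Everything `component` depends on, directly or transitively."""
    seen, stack = set(), list(depends_on[component])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on[node])
    return seen


def upstream(component):
    """Everything that depends on `component`, directly or transitively."""
    return {c for c in depends_on if component in downstream(c)}


def most_depended_on():
    """The component with the most distinct dependents, with that count."""
    return max(((c, len(upstream(c))) for c in depends_on), key=lambda pair: pair[1])
```

`upstream("Server 1")` corresponds to the impact analysis, `downstream("Public Website")` to the dependency analysis, and `most_depended_on()` to the statistics query.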

How to get started?

๏ Full day Neo4j Training & Online Training

๏ Free e-Books

• Graph Databases, Neo4j 2.0 (DE)

๏ neo4j.org

• http://neo4j.org/develop/modeling

๏ docs.neo4j.org

• Data Modeling Examples

๏ http://console.neo4j.org

๏ http://gist.neo4j.org

๏ Get Neo4j

• http://neo4j.org/download

๏ Participate

• http://groups.google.com/group/neo4j

Thank You

Time for Questions!