Slides from GraphDay Santa Clara
TRANSCRIPT
SANTA CLARA APRIL 14, 2016
Agenda
09:00-09:30 Breakfast and Registration
09:30-10:15 Graphs in Action: Driving Digital Transformation with Neo4j
10:15-11:00 Under the Hood: What’s a Graph and Where Do They Fit
11:00-11:30 Break
11:30-12:30 Transform Your Data: A Worked Example
12:30-13:30 Lunch
13:30-17:00 Training Session
Speakers
Lars Nordwall Emil Eifrem Kevin Van Gundy Nicole White
Driving Digital Transformation With Neo4j
GRAPHS IN ACTION
Santa Clara, April 14, 2016
Lars Nordwall Chief Operating Officer
@lnordwall
2016 Reality
174
Corporate Threat
Of 700 business leaders surveyed in the European Union, United States, China and Japan, a majority identified large digital players or start-ups as the greatest competitive threat to profitable growth.
Source: Accenture Strategy research, summer 2015
Corporate Life Span
The average corporate life span has been falling for more than half a century.
Standard & Poor’s data show it was:
- 61 years in 1958
- 25 years in 1980
- 18 years in 2011
Digitization is placing unprecedented pressure on organizations to evolve.
At the present rate, 75 percent of S&P 500 incumbents will be gone by 2027
Source: McKinsey, 2015
Dilemma
Everyone collects data today. The more data, the better.
“Store first, ask questions later”
Everyone seems to be hiring data scientists today (or at least trying to).
There is another dimension beyond data volume: Data Relationships
Why a graph database?
Today we see graph projects in virtually every industry:
Social networks
Retail
HR & Recruiting
Manufacturing & Logistics
Health Care
Telco
Finance
Retail
Neo4j solves retail-related challenges for some of the largest companies in the world
Adidas uses Neo4j to combine content and product data into a single, searchable graph database which is used to create a personalized customer experience
“We have many different silos, many different data domains, and in order to make sense out of our data, we needed to bring those together and make them useful for us,” – Sokratis Kartelias, Adidas
eBay Now Tackles eCommerce Delivery Service Routing with Neo4j
“We needed to rebuild when growth and new features made our slowest query longer than our fastest delivery - 15 minutes! Neo4j gave us best solution” – Volker Pacher, eBay
Walmart uses Neo4j to give customers the best web experience through relevant and personal recommendations
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.” - Marcos Wada, Walmart
Traditional Retail Value Chain
Component Manufacturers, Assembly Plants, Wholesalers, Retailers, End Consumers
Logistics
The Online Retail Value Chain
[Diagram, built up over several slides: data sources added domain by domain]
SALES CHANNELS: Store, Mobile, Webstore
SUPPLY CHAIN: Shipping, Inventory, Express goods, Home delivery
PRODUCTS: Ratings, Price-range, Category
MARKETING: Content, Promotions, Online advertising
CRM and CUSTOMER EXPERIENCE: Loyalty Programs, Returns, Feedback, Reviews, Tweets, Emails, Customer support
PAYMENTS: Credit Card, Cash, Mobile Pay, Purchase History
Digital transformation in retail today requires putting all this data to good use
SHOPPING EXPERIENCE
Related products
People who bought X also bought Y
Recommendations (In Real-Time)
The main product
[Graph diagram, built up over several slides: a LOOKS_AT relationship pointing to the KITCHEN AID SERIES product, which is progressively connected to Complaints, Reviews, Tweets, Emails, Returns, Inventory, Home delivery, Express goods, Location, Promotions, Bundling, Purchase History, Price-range and Category]
To get results in real time from a dataset that is highly interconnected, you need a graph database!
Under the Hood: What’s a Graph, and Where Do They Fit
Santa Clara, April 14, 2016
Emil Eifrem CEO, Neo Technology
Founder, Neo4j
What is the most powerful database in the world?
The internet
Genetic Ancestry of One Single Corn Variety
Philip’s LinkedIn Graph
GOT IT. GRAPHS.
BUT WHAT IS A GRAPH?
A Graph Is
[Diagrams: NODEs connected by RELATIONSHIPs. Example nodes: PERSON, CHECKING ACCOUNT, BANK; HOTEL, ROOM, BOOKING; PAUL McCARTNEY, BEATLES, SINGER, COMPOSER, HEY JUDE; COMPANY, NEO, STANFORD, COLUMBIA. Example relationships: WITH, HAS, PERFORMED, BELONGS_TO, KNOWS, WORKS_AT, STUDIED_AT.]
PROPERTY
NAME:ANNE
SINCE:2012
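The node, relationship and property vocabulary from these slides can be sketched in a few lines of plain Python. This is only an illustration of the data model, not how Neo4j stores data. The `Node` and `Relationship` classes, the person named Dan, and the choice to attach SINCE:2012 to a KNOWS relationship are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes carry properties such as NAME:ANNE from the slide.
anne = Node("Person", {"name": "Anne"})
dan = Node("Person", {"name": "Dan"})  # hypothetical second person
neo = Node("Company", {"name": "Neo"})

# Relationships are typed and directed, and can carry properties too
# (placing SINCE:2012 on KNOWS is an assumption).
relationships = [
    Relationship(anne, "KNOWS", dan, {"since": 2012}),
    Relationship(anne, "WORKS_AT", neo),
]


def neighbors(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]


print([n.properties["name"] for n in neighbors(anne, "KNOWS")])
```

The point of the sketch is that relationships are first-class records with their own type, direction and properties, rather than foreign-key columns.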
Use of Graphs has created some of the most successful companies in the world
[Diagram: a graph of nodes labeled with percentage scores: A 3.3%, B 38.4%, C 34.3%, D 3.8%, E 8.1%, F 3.9%, and five unlabeled nodes at 1.8% each]
Neo4j Use Cases
Real Time Recommendations
Master Data Management
Fraud Detection
Identity & Access Management
Graph Based Search
Network & IT-Operations
GRAPH THINKING: Real Time Recommendations
[Diagram: nodes connected by VIEWED and BOUGHT relationships]
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.” Marcos Wada
Software Developer, Walmart
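The "people who bought X also bought Y" idea behind the VIEWED and BOUGHT diagram can be sketched as a two-hop walk: from a product to its buyers, then on to their other purchases. The customers, products and the `also_bought` helper below are hypothetical and only illustrate the traversal. They do not describe Walmart's system.

```python
from collections import Counter

# Hypothetical BOUGHT relationships: customer -> products.
bought = {
    "ann": {"mixer", "bowl", "whisk"},
    "bob": {"mixer", "bowl"},
    "cat": {"mixer", "toaster"},
    "dan": {"toaster"},
}


def also_bought(product, top_n=3):
    """Rank other products by how many buyers of `product` also bought them."""
    counts = Counter()
    for basket in bought.values():
        if product in basket:  # hop 1: product -> customer
            counts.update(basket - {product})  # hop 2: customer -> other products
    return counts.most_common(top_n)


print(also_bought("mixer"))
```

In this toy data, "bowl" ranks first because two of the three mixer buyers also bought it.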
GRAPH THINKING: Master Data Management
[Diagram: nodes, including REGION nodes, connected by MANAGES, LEADS and COLLABORATES relationships]
Neo4j is the heart of Cisco HMP: used for governance and single source of truth and a one-stop shop for all of Cisco’s hierarchies.
GRAPH THINKING: Fraud Detection
[Diagram: nodes connected by OPENED_ACCOUNT, HAS, IS_ISSUED and LIVES relationships]
“Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high-level of accuracy, and are capable of stopping advanced fraud scenarios in real-time.”
Gorka Sadowski, Cyber Security Expert
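One way to picture a fraud ring is as a group of account holders tied together by shared identifiers such as an address, a phone number or an SSN. The sketch below groups holders by walking holder to identifier to holder. The data and the `find_rings` helper are hypothetical, and real fraud detection involves considerably more than connected groups.

```python
from collections import defaultdict

# Hypothetical HAS relationships: account holder -> identifiers.
has = {
    "holder1": {"address:1 Main St", "phone:555-0100"},
    "holder2": {"address:1 Main St", "ssn:2"},
    "holder3": {"phone:555-0100", "ssn:2"},
    "holder4": {"address:9 Elm St"},
}


def find_rings(has):
    """Group holders that are connected through any chain of shared identifiers."""
    by_identifier = defaultdict(set)
    for holder, identifiers in has.items():
        for identifier in identifiers:
            by_identifier[identifier].add(holder)

    seen, rings = set(), []
    for start in has:
        if start in seen:
            continue
        ring, stack = set(), [start]
        while stack:  # depth-first walk: holder -> identifier -> holder
            holder = stack.pop()
            if holder in ring:
                continue
            ring.add(holder)
            for identifier in has[holder]:
                stack.extend(by_identifier[identifier] - ring)
        seen |= ring
        if len(ring) > 1:  # a lone holder is not a ring
            rings.append(ring)
    return rings


print(find_rings(has))
```

Here holders 1, 2 and 3 form one group even though no single identifier is shared by all three, which is the kind of indirect link that is awkward to find with pairwise joins.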
GRAPH THINKING: Graph Based Search
[Diagram: nodes connected by PUBLISH, INCLUDE, CREATE, CAPTURE, IN, INSOURCE, USES and SOURCE relationships]
Uses Neo4j to manage the digital assets inside of its next generation in-flight entertainment system.
GRAPH THINKING: Network & IT-Operations
[Diagram: nodes connected by BROWSES, CONNECTS, BRIDGES, ROUTES, POWERS, HOSTS and QUERIES relationships]
Uses Neo4j for network topology analysis for big telco service providers
GRAPH THINKING: Identity And Access Management
[Diagram: nodes, including ID nodes, connected by TRUSTS, AUTHENTICATES, OWNS and CAN_READ relationships]
UBS was the recipient of the 2014 Graphie Award for “Best Identity And Access Management App”
Neo4j Adoption by Selected Verticals
SOFTWARE
FINANCIAL SERVICES
RETAIL
MEDIA & BROADCASTING
SOCIAL NETWORKS
TELECOM
HEALTHCARE
TECHNICAL BENEFITS OF GRAPH DATABASES
Intuitiveness, Speed, Agility
Intuitiveness
Real-Time Query Performance
[Chart: Response Time (y-axis) versus Connectedness and Size of Data Set (x-axis)]
Relational and Other NoSQL Databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
1000x Advantage
“Minutes to milliseconds”
Speed
“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”
- Volker Pacher, Senior Developer
“Minutes to milliseconds” performance
Queries up to 1000x faster than RDBMS or other NoSQL
Agility
A Naturally Adaptive Model + A Query Language Designed for Connectedness = Agility
Cypher
Typical Complex SQL Join vs. The Same Query using Cypher
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
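To make the variable-length patterns concrete, here is a rough Python equivalent over a small in-memory hierarchy. `*0..3` means paths of zero to three MANAGES hops, and `*1..3` means one to three hops. The hierarchy and the helpers `reachable` and `report_totals` are hypothetical, and the sketch assumes names are unique.

```python
# Hypothetical MANAGES relationships: manager -> direct reports.
manages = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cat", "Dan"],
    "Cat": ["Eve"],
}


def reachable(start, min_hops, max_hops):
    """Yield the end node of every MANAGES path of min_hops..max_hops from start."""
    frontier = [start]
    for hops in range(max_hops + 1):
        if hops >= min_hops:
            yield from frontier
        frontier = [report for node in frontier for report in manages.get(node, [])]


def report_totals(boss):
    """Mirror the query: for each sub within 0..3 hops, count reports within 1..3 hops."""
    totals = {}
    for sub in reachable(boss, 0, 3):
        reports = list(reachable(sub, 1, 3))
        if reports:  # MATCH drops subordinates that have no reports at all
            totals[sub] = totals.get(sub, 0) + len(reports)
    return totals


print(report_totals("John Doe"))
```

Because the lower bound on the first pattern is zero, the boss is included as his own "subordinate" row, which is why John Doe appears in the result.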
Project Impact
Less time writing queries:
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability & troubleshooting
CYPHER
Users Love Cypher
openCypher
Ann Loves Dan
(Ann)-[:LOVES]->(Dan)
Impact on the Business
Increase revenue
• Do new & impossible things
• Faster time-to-market
How?
• Value from data relationships
• Batch to real time
• 1000x faster
Reduce cost
• Lower infrastructure costs
How?
• Neo4j is ultra efficient & normally needs far less hardware than any alternative
THANK YOU!
Coffee Break
Next session: Transform Your Data: A Worked Example
TRANSFORM YOUR DATA
Santa Clara, April 14, 2016
Neo4j @ GraphDay
"The future is now…"
[Diagram: ACCOUNT HOLDER 1, ACCOUNT HOLDER 2 and ACCOUNT HOLDER 3 linked to ADDRESS, PHONE NUMBER, SSN 2, BANK ACCOUNT, CREDIT CARD and UNSECURED LOAN nodes]
AGENDA
• SQL Pains
• Building a Neo4j Application
• Moving from RDBMS -> Graph Models
• Walk through an Example
• Creating Data in Graphs
• Querying Data
SQL
Day in the Life of a RDBMS Developer
SELECT p.name, c.country, c.leader, p.hair, u.name, u.pres, u.state
FROM people p
LEFT JOIN country c ON c.id = p.country
LEFT JOIN uni u ON p.uni = u.id
WHERE u.state = 'CT'
[Slide filled with the word JOIN, repeated 66 times]
Have you seen Ted's UUID?
SQL Pains
• Complex to model and store relationships
• Performance degrades with increases in data
• Queries get long and complex
• Maintenance is painful
Graph Gains
• Easy to model and store relationships
• Performance of relationship traversal remains constant with growth in data size
• Queries are shortened and more readable
• Adding additional properties and relationships can be done on the fly - no migrations
How do you use Neo4j?
CREATE MODEL
LOAD DATA
QUERY DATA
Language Drivers
Native Server-Side Extensions
Architectural Options
[Architecture diagram]
Application: Data Storage and Business Rules Execution
Graph Database Cluster: Neo4j, Neo4j, Neo4j
Data Mining and Aggregation
Ad Hoc Analysis
Bulk Analytic Infrastructure: Hadoop, EDW…
Data Scientist
End User
Databases: Relational, NoSQL, Hadoop
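RDBMS to Graph Options
MIGRATE ALL DATA: the Application uses a Graph Database holding all data
MIGRATE GRAPH DATA: the Application uses a Relational Database for non-graph data and a Graph Database for graph data
DUPLICATE GRAPH DATA: the Application uses a Relational Database holding all data and a Graph Database holding the graph data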
FROM RDBMS TO GRAPHS
Northwind
Northwind - the canonical RDBMS Example
( )-[:TO]->(Graph)
( )-[:IS_BETTER_AS]->(Graph)
Starting with the ER Diagram
Locate the Foreign Keys
Drop the Foreign Keys
Find the JOIN Tables
(Simple) JOIN Tables Become Relationships
Attributed JOIN Tables -> Relationships with Properties
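The modeling steps above can be sketched on a tiny table set. The rows and column names below are hypothetical, loosely in the spirit of Northwind. A foreign key column is dropped and becomes a relationship, and an attributed JOIN table becomes relationships that carry properties.

```python
# Hypothetical relational rows; the column names are illustrative.
products = [{"productID": 1, "productName": "Chai", "categoryID": 10}]
categories = [{"categoryID": 10, "categoryName": "Beverages"}]
orders = [{"orderID": 100}]
order_details = [{"orderID": 100, "productID": 1, "quantity": 12}]  # attributed JOIN table

nodes, relationships = {}, []


def add_nodes(label, rows, key, drop=()):
    """Turn each row into a node, leaving out foreign key columns listed in `drop`."""
    for row in rows:
        nodes[(label, row[key])] = {k: v for k, v in row.items() if k not in drop}


add_nodes("Product", products, "productID", drop=("categoryID",))  # drop the foreign key
add_nodes("Category", categories, "categoryID")
add_nodes("Order", orders, "orderID")

# A foreign key becomes a relationship.
for row in products:
    relationships.append(
        (("Product", row["productID"]), "PART_OF", ("Category", row["categoryID"]), {})
    )

# An attributed JOIN table becomes relationships with properties.
for row in order_details:
    relationships.append(
        (("Order", row["orderID"]), "INCLUDES", ("Product", row["productID"]),
         {"quantity": row["quantity"]})
    )

print(nodes)
print(relationships)
```

After the conversion, the Product node no longer has a `categoryID` property, since that information now lives in the PART_OF relationship.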
Querying a Subset Today
As a Graph
QUERYING THE GRAPH
using openCypher
Who do people report to?
MATCH (sub:Employee)-[:REPORTS_TO]->(e:Employee)
RETURN *
Who do people report to?
MATCH (sub:Employee)-[:REPORTS_TO]->(e:Employee)
RETURN e.employeeID AS managerID, e.firstName AS managerName,
       sub.employeeID AS employeeID, sub.firstName AS employeeName;
Who does Robert report to?
MATCH p=(sub:Employee)-[:REPORTS_TO]->(e:Employee)
WHERE sub.firstName = 'Robert'
RETURN p
What is Robert’s reporting chain?
MATCH p=(sub:Employee)-[:REPORTS_TO*]->(e:Employee)
WHERE sub.firstName = 'Robert'
RETURN p
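The unbounded `REPORTS_TO*` pattern follows the relationship for as many hops as exist. A rough Python counterpart simply keeps walking upward until there is no manager left. The `reports_to` data and the `reporting_chain` helper are illustrative assumptions. Note that the Cypher query returns one path per chain length, while this sketch returns only the full chain.

```python
# Illustrative REPORTS_TO relationships: employee -> manager.
reports_to = {"Robert": "Steven", "Steven": "Andrew", "Nancy": "Andrew"}


def reporting_chain(name):
    """Follow REPORTS_TO upward from `name` and return the managers in order."""
    chain, seen = [], {name}
    while name in reports_to:
        name = reports_to[name]
        if name in seen:  # guard against accidental cycles in the data
            break
        seen.add(name)
        chain.append(name)
    return chain


print(reporting_chain("Robert"))
```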
Report: Product Cross-Selling
MATCH (o:Order)-[:INCLUDES]->(:Product {productName:'Chocolade'}),
      (employee)-[:SOLD]->(o),
      (employee)-[:SOLD]->(otherOrder)-[:INCLUDES]->(other:Product)
RETURN employee.firstName, other.productName, COUNT(DISTINCT otherOrder) AS count
ORDER BY count DESC;
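The cross-selling report reads as follows: find employees who sold an order containing the product, then count, per other product, the distinct other orders by the same employee that include it. A small in-memory sketch of that logic is shown here. The data and the `cross_sell` helper are hypothetical, and the sketch assumes Cypher's rule that one relationship is not matched twice within a single MATCH, so `otherOrder` must differ from `o`.

```python
from collections import defaultdict

# Hypothetical data: employee -[:SOLD]-> orders, order -[:INCLUDES]-> products.
sold = {"Nancy": [1, 2, 4], "Robert": [3]}
includes = {1: ["Chocolade", "Chai"], 2: ["Chai", "Tofu"], 3: ["Tofu"], 4: ["Chai"]}


def cross_sell(product):
    """Return (employee, other product, distinct other orders), highest count first."""
    other_orders = defaultdict(set)
    for employee, order_ids in sold.items():
        with_product = [o for o in order_ids if product in includes[o]]
        for other_order in order_ids:
            # otherOrder must be a different order from the one matched as `o`.
            if not any(o != other_order for o in with_product):
                continue
            for other in includes[other_order]:
                other_orders[(employee, other)].add(other_order)
    rows = [(emp, other, len(ids)) for (emp, other), ids in other_orders.items()]
    return sorted(rows, key=lambda row: -row[2])


print(cross_sell("Chocolade"))
```

Robert never sold the product, so he contributes no rows. Nancy's Chai count is 2 because two of her other orders include it.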
POWERING AN APP
Simple App
Simple Python Code
But how do I liberate my RDBMS data?
CSV
CSV files for Northwind
3 Steps to Creating the Graph
1. IMPORT NODES
2. CREATE INDEXES
3. IMPORT RELATIONSHIPS
Importing Nodes
// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName, description: row.Description});

// Create orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName;

// Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});

// Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID, unitPrice: toFloat(row.UnitPrice)});
Creating Indexes
CREATE CONSTRAINT ON (p:Product) ASSERT p.productID IS UNIQUE;
CREATE CONSTRAINT ON (e:Employee) ASSERT e.employeeID IS UNIQUE;
CREATE CONSTRAINT ON (c:Customer) ASSERT c.customerID IS UNIQUE;
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerName);
Sew it together…
Creating Relationships
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[pu:INCLUDES]->(product)
ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
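The three-step shape of the import (nodes, then indexes, then relationships) can be mimicked in plain Python to show why the order matters: relationship import looks up both endpoints by key, so the nodes and their indexes need to exist first. The CSV snippets below are hypothetical in-memory stand-ins for the Northwind files, and the dictionaries stand in for database indexes.

```python
import csv
import io

# Hypothetical stand-ins for categories.csv and products.csv.
CATEGORIES_CSV = "CategoryID,CategoryName\n1,Beverages\n2,Condiments\n"
PRODUCTS_CSV = (
    "ProductID,ProductName,UnitPrice,CategoryID\n"
    "1,Chai,18.0,1\n"
    "3,Aniseed Syrup,10.0,2\n"
)


def rows(text):
    """Parse CSV text with a header row into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


# Step 1: import nodes, converting types explicitly (like toFloat in the Cypher).
categories = [
    {"categoryID": r["CategoryID"], "categoryName": r["CategoryName"]}
    for r in rows(CATEGORIES_CSV)
]
products = [
    {"productID": r["ProductID"], "productName": r["ProductName"],
     "unitPrice": float(r["UnitPrice"])}
    for r in rows(PRODUCTS_CSV)
]

# Step 2: create indexes so relationship import can find nodes by key.
category_index = {c["categoryID"]: c for c in categories}
product_index = {p["productID"]: p for p in products}

# Step 3: import relationships by matching both endpoints, then linking them.
part_of = []
for r in rows(PRODUCTS_CSV):
    product = product_index.get(r["ProductID"])
    category = category_index.get(r["CategoryID"])
    if product and category:  # like MATCH: rows with a missing endpoint are skipped
        part_of.append((product["productName"], "PART_OF", category["categoryName"]))

print(part_of)
```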
High Performance LOADing
neo4j-import
4.58 million things and their relationships…
Loads in 100 seconds!
WRAPPING UP
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”