how we learned to stop worrying and solve the distributed graph problem
DESCRIPTION
Graphs, what are they and why? Graph Data Management. Why do we need it? Problems in Distributed Graph How we solved the problems? Finding the value in Big Data.TRANSCRIPT
www.Objectivity.com
How we Learned to Stop Worrying and Solve the
Distributed Graph Problem
Ibrahim Sallam
Director of Development
What we’ll talk about
• Graphs, what are they and why?• Graph Data Management. Why do we need it?• Problems in Distributed Graph• How we solved the problems
Graphs
Graphs
http://opte.org
Simple Graph
Node <-> Node
The Value of Data
• “[finding] Japanese restaurants in New York City liked by people from Japan” – a challenging Facebook query.
• "The value of any piece of information is only known when you can connect it with something else that arrives at a future point in time,” – CIA’s CTO Ira "Gus" Hunt
Connecting Data
Person Building?
Work Live RR Visit Eat Shop
Relationships are everywhere
CRM, Sales & Marketing Network
Mgmt, Telecom
Intelligence (Government& Business)
Finance
HealthcareResearch: Genomics
Social Networks
PLM (Product Lifecycle Mgmt)
IG
Graph Data
• Data structure for representing complicated relationships between entities.
• Wide applications in…– Bioinformatics.– Social networks.– Network management… etc.
Why a Graph Database ?
The Graph Specialist !
• Everyone specializes – Doctors, Lawyers, Bankers, Developers
• Why was data so normalized for so long !• NoSQL is all about the data specialist• Specializing in…– Distribution / deployment– Physical data storage– Logical data model– Query mechanism
What Problem!
Ingest
Update
Navigate
Massive graph data require efficient and intelligent tools to analyze and understand it.
Scaling Writes
• Big/Fast data demands write performance• Most NoSQL solutions allow you to scale
writes by…– Partitioning the data– Understanding your consistency requirements– Allowing you to defer conflicts
Ingest
App-2(Ingest V2)
App-2(E23{ V2V3})
Scaling Graph Writes ACID Transactions
InfiniteGraph
Objectivity/DB Persistence Layer
App-1(Ingest V1)
App-3(Ingest V3)
V1 V2 V3
App-1(E1 2{ V1V2})
App-3
E12 E23
Ingest
High Performance Edge Ingest
IG Core/API
C1
C2
C3
E12
E23
Targ
et C
onta
iner
s
Pipeline Containers
E(1->2)
E(3->1)
E(2->3)
E(2->1)
E(2->3)E(3->1)
E(1->2)
E(3->2)
E(1->2)
E(2->3)
E(3->1)
E(2->1)
E(2->3)
E(3->1)
E(3->2)
E(1->2)
Pipeline
Agent
Ingest
Trade offs
• Excellent for efficient use of page cache• Able to maintain full database consistency• Achieves highest ingest rate in distributed
environments• Almost always has highest “perceived” rate• Trading Off :
• Eventual consistency in graph (connections)• Updates are still atomic, isolated and durable but phased• External agent performs graph building
Ingest
Result…
1
2
4
0
50000
100000
150000
200000
250000
300000
350000
400000
450000
500000
1 client
2 clients
4 clients
8 clients
1 client
2 clients
4 clients
8 clients
# of Processes
Nod
es a
nd E
dges
per
seco
nd
Ingest
Scaling Reads and Query
Distributed API
Application(s)
Partition 1 Partition 3Partition 2 Partition ...n
Processor Processor Processor Processor
Partitioning and Read Replicas… easy right !
Why are Graphs Different ?
Distributed API
Application(s)
Partition 1 Partition 3Partition 2 Partition ...n
Processor Processor Processor Processor
Navigate
Distributed Navigation
• Detect local hops and perform in memory traversal
• Send the partial path to the distributed processing to continue the navigation.
• Intelligently cache remote data when accessed frequently
• Route tasks to other hosts when it is optimal
Navigate
Distributed Navigation Server
Processor
Distributed API
Partition 1 Partition 2
Processor
Application
A
XY
B
C
D
EP(A,B,C,D)
F
G
Navigate
Schema – It’s not your enemy ! (at least not all the time...)
• Schema vs Schema-less– Database religion– InfiniteGraph supports schema, but does not
restrict connection types between vertices–We take advantage of the schema when pruning. –…Working on a hybrid support.
GraphViews Leveraging Schema in the Graph
Patient Prescription
Drug
Ingredient
Outcome
Complaint
Visit
Allergy
Physician
Navigate
Schema Enables Views
• GraphViews are extremely powerful• Allow Big Data to appear small !• Connection inference can lead to exponential
gains in query performance• Views are reusable between queries• Built into the native kernel
Navigate
Why InfiniteGraph™?
• Objectivity/DB is a proven foundation– Building distributed databases since 1993– A complete database management system
• Concurrency, transactions, cache, schema, query, indexing
• It’s a Graph Specialist !– Simple but powerful API tailored for data
navigation.– Easy to configure distribution model
Advanced Configured Placement
• Physically co-locate “closely related” data• Driven through a declarative placement model• Dramatically speeds “local” reads
Facility Data Page(s)Patient Data Page(s)
MrCitizen
Visit Visit
Dr Jones
San Jose
Facility
Dr Smith
Primary Physician
HasHas With
At
Located Located
Facility Data Page(s)
Dr Blake
Sunny-vale
Dr Quinn
Located Located
With
At
Fully Distributed Data Model
Zone 2Zone 1
HostA
IG Core/API
Distributed Object and Relationship Persistence Layer
Customizable Placement
HostB HostC HostX
AddVertex()
Polyglot NoSQL Architectures
Distributed Data Processing Platform Document
GraphDatabase
RDBMS
Partitioned Distributed DB (often Document / KV)
Users
Appl
icati
ons
External / Legacy Data Tr
ansf
orm
ation
\ M
DM
Business
What else!
• Distributed update.
Update
… we are working on it.
Conclusion
Have we solved all the distributed graph problems?
Not quite, but…
We Learned to Stop Worrying.
QUESTIONS?