how we learned to stop worrying and solve the distributed graph problem

30
www.Objectivit y.com How we Learned to Stop Worrying and Solve the Distributed Graph Problem Ibrahim Sallam Director of Development

Upload: infinitegraph

Post on 14-Jan-2015

327 views

Category:

Technology


0 download

DESCRIPTION

Graphs, what are they and why? Graph Data Management. Why do we need it? Problems in Distributed Graph How we solved the problems? Finding the value in Big Data.

TRANSCRIPT

Page 1: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

www.Objectivity.com

How we Learned to Stop Worrying and Solve the

Distributed Graph Problem

Ibrahim Sallam

Director of Development

Page 2: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

What we’ll talk about

• Graphs, what are they and why?• Graph Data Management. Why do we need it?• Problems in Distributed Graph• How we solved the problems

Page 3: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Graphs

Page 4: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Graphs

http://opte.org

Simple Graph

Node <-> Node

Page 5: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

The Value of Data

• “[finding] Japanese restaurants in New York City liked by people from Japan” – a challenging Facebook query.

• "The value of any piece of information is only known when you can connect it with something else that arrives at a future point in time,” – CIA’s CTO Ira "Gus" Hunt

Page 6: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Connecting Data

Person Building?

Work Live RR Visit Eat Shop

Page 7: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Relationships are everywhere

CRM, Sales & Marketing Network

Mgmt, Telecom

Intelligence (Government& Business)

Finance

HealthcareResearch: Genomics

Social Networks

PLM (Product Lifecycle Mgmt)

IG

Page 8: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Graph Data

• Data structure for representing complicated relationships between entities.

• Wide applications in…– Bioinformatics.– Social networks.– Network management… etc.

Page 9: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Why a Graph Database ?

Page 10: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

The Graph Specialist !

• Everyone specializes – Doctors, Lawyers, Bankers, Developers

• Why was data so normalized for so long !• NoSQL is all about the data specialist• Specializing in…– Distribution / deployment– Physical data storage– Logical data model– Query mechanism

Page 11: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

What Problem!

Ingest

Update

Navigate

Massive graph data require efficient and intelligent tools to analyze and understand it.

Page 12: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Scaling Writes

• Big/Fast data demands write performance• Most NoSQL solutions allow you to scale

writes by…– Partitioning the data– Understanding your consistency requirements– Allowing you to defer conflicts

Ingest

Page 13: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

App-2(Ingest V2)

App-2(E23{ V2V3})

Scaling Graph Writes ACID Transactions

InfiniteGraph

Objectivity/DB Persistence Layer

App-1(Ingest V1)

App-3(Ingest V3)

V1 V2 V3

App-1(E1 2{ V1V2})

App-3

E12 E23

Ingest

Page 14: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

High Performance Edge Ingest

IG Core/API

C1

C2

C3

E12

E23

Targ

et C

onta

iner

s

Pipeline Containers

E(1->2)

E(3->1)

E(2->3)

E(2->1)

E(2->3)E(3->1)

E(1->2)

E(3->2)

E(1->2)

E(2->3)

E(3->1)

E(2->1)

E(2->3)

E(3->1)

E(3->2)

E(1->2)

Pipeline

Agent

Ingest

Page 15: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Trade offs

• Excellent for efficient use of page cache• Able to maintain full database consistency• Achieves highest ingest rate in distributed

environments• Almost always has highest “perceived” rate• Trading Off :

• Eventual consistency in graph (connections)• Updates are still atomic, isolated and durable but phased• External agent performs graph building

Ingest

Page 16: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Result…

1

2

4

0

50000

100000

150000

200000

250000

300000

350000

400000

450000

500000

1 client

2 clients

4 clients

8 clients

1 client

2 clients

4 clients

8 clients

# of Processes

Nod

es a

nd E

dges

per

seco

nd

Ingest

Page 17: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Scaling Reads and Query

Distributed API

Application(s)

Partition 1 Partition 3Partition 2 Partition ...n

Processor Processor Processor Processor

Partitioning and Read Replicas… easy right !

Page 18: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Why are Graphs Different ?

Distributed API

Application(s)

Partition 1 Partition 3Partition 2 Partition ...n

Processor Processor Processor Processor

Navigate

Page 19: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Distributed Navigation

• Detect local hops and perform in memory traversal

• Send the partial path to the distributed processing to continue the navigation.

• Intelligently cache remote data when accessed frequently

• Route tasks to other hosts when it is optimal

Navigate

Page 20: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Distributed Navigation Server

Processor

Distributed API

Partition 1 Partition 2

Processor

Application

A

XY

B

C

D

EP(A,B,C,D)

F

G

Navigate

Page 21: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Schema – It’s not your enemy ! (at least not all the time...)

• Schema vs Schema-less– Database religion– InfiniteGraph supports schema, but does not

restrict connection types between vertices–We take advantage of the schema when pruning. –…Working on a hybrid support.

Page 22: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

GraphViews Leveraging Schema in the Graph

Patient Prescription

Drug

Ingredient

Outcome

Complaint

Visit

Allergy

Physician

Navigate

Page 23: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Schema Enables Views

• GraphViews are extremely powerful• Allow Big Data to appear small !• Connection inference can lead to exponential

gains in query performance• Views are reusable between queries• Built into the native kernel

Navigate

Page 24: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Why InfiniteGraph™?

• Objectivity/DB is a proven foundation– Building distributed databases since 1993– A complete database management system

• Concurrency, transactions, cache, schema, query, indexing

• It’s a Graph Specialist !– Simple but powerful API tailored for data

navigation.– Easy to configure distribution model

Page 25: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Advanced Configured Placement

• Physically co-locate “closely related” data• Driven through a declarative placement model• Dramatically speeds “local” reads

Facility Data Page(s)Patient Data Page(s)

MrCitizen

Visit Visit

Dr Jones

San Jose

Facility

Dr Smith

Primary Physician

HasHas With

At

Located Located

Facility Data Page(s)

Dr Blake

Sunny-vale

Dr Quinn

Located Located

With

At

Page 26: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Fully Distributed Data Model

Zone 2Zone 1

HostA

IG Core/API

Distributed Object and Relationship Persistence Layer

Customizable Placement

HostB HostC HostX

AddVertex()

Page 27: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Polyglot NoSQL Architectures

Distributed Data Processing Platform Document

GraphDatabase

RDBMS

Partitioned Distributed DB (often Document / KV)

Users

Appl

icati

ons

External / Legacy Data Tr

ansf

orm

ation

\ M

DM

Business

Page 28: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

What else!

• Distributed update.

Update

… we are working on it.

Page 29: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

Conclusion

Have we solved all the distributed graph problems?

Not quite, but…

We Learned to Stop Worrying.

Page 30: How we Learned to Stop Worrying and Solve the Distributed Graph Problem

QUESTIONS?