solving problems with graphs
DESCRIPTION
Who am I and why do I feel that the world is not infinitely perfect? Which technologies should I use to rectify this situation? Enter the graph and the graph traversal.
TRANSCRIPT
MARKO A. RODRIGUEZ
http://THINKAURELIUS.COM
SOLVING PROBLEMS WITH GRAPHS
YOU WILL NOT KNOW ME BY LOOKING WITHIN
[Figure: a single vertex, marko.]
TO KNOW ME IS TO KNOW MY WORLD
[Figure: marko alongside the vertices gremlin and russell.]
AND IN MY WORLD THERE ARE THINGS
[Figure: marko created gremlin; marko knows russell.]
AND TO KNOW THESE THINGS IS TO KNOW THEIR WORLD
... AD INFINITUM
[Figure: the graph grows outward to include people (marko, pavel, russell), software (gremlin, blueprints, titan, faunus, cql, cassandra, hbase, hadoop), and companies (hortonworks, cloudera), linked by created, knows, works, uses, and depends edges.]
I POSIT THAT THE SOLUTIONS TO PROBLEMS EXIST IN THE WORLD AND THAT RESOLUTION IS SOUGHT IN LINKAGE.
PROBLEM
• I want organizations with Big Graph Data to use the Aurelius Graph Cluster.
SOLUTION #1
• Hortonworks and Cloudera use Big Data technology. I know Russell. Partnership?
[Figure: the same graph with two proposed partner edges added.]
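Solution #1 amounts to finding a path through the graph from marko to a company. A minimal sketch of that idea in plain Python, using breadth-first search, follows. The edge list is an assumption pieced together from the slide text ("I know Russell", "Pavel and I created Gremlin") and the works label in the figure; the function name `shortest_path` is hypothetical and not part of any Aurelius tool.

```python
from collections import deque

# Assumed edges (source, label, target); the slides do not list them explicitly.
EDGES = [
    ("marko", "knows", "russell"),
    ("russell", "works", "hortonworks"),
    ("marko", "created", "gremlin"),
    ("pavel", "created", "gremlin"),
]

def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns vertices and labels in order."""
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for label, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [label, nxt])
    return None  # no linkage found

print(shortest_path(EDGES, "marko", "hortonworks"))
```

The returned list alternates vertices and edge labels, so the path reads as a sentence: marko knows russell, russell works (at) hortonworks.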
SOLUTION #2
• Pavel and I created Gremlin. He is a Cassandra engineer at Twitter. Meeting?
[Figure: the same graph with a proposed meets edge added.]
[Figure: marko and a region labeled SOFTWARE.]
[Figure: a graph with vertices marko, puppy, mama, males, curley, chula, biscuit, whettle, jackson, mia, missy, and scout, linked by pet and sniffs edges. Later builds label vertices with step numbers 1 through 5.]
PROBLEM
• What is the best way to save this dog community from disease?
SOLUTION
• Inoculate the most central dog.
[Figure: the dog graph with an inoculate edge added.]
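"Most central" can be made concrete in several ways; the slides do not say which measure is meant. As one possible reading, the sketch below computes closeness centrality with breadth-first search in plain Python. The sniffs edge list is hypothetical (the transcript does not preserve which dogs sniff which), and the function names are illustrative only.

```python
from collections import deque

# Hypothetical undirected "sniffs" edges; the real edge list is not in the transcript.
SNIFFS = [
    ("puppy", "mama"), ("puppy", "curley"), ("puppy", "chula"),
    ("puppy", "biscuit"), ("mama", "mia"), ("mia", "missy"),
    ("biscuit", "scout"),
]

def build_adjacency(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def closeness(adjacency, start):
    """(number of other reachable dogs) / (sum of shortest-path lengths to them)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def most_central(edges):
    adjacency = build_adjacency(edges)
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(adjacency), key=lambda dog: closeness(adjacency, dog))

print(most_central(SNIFFS))
```

With this made-up edge list the dog closest to all others is puppy, so it would be the one to inoculate. A different measure (degree, betweenness, eigenvector) could pick a different dog.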
[Figure: marko and regions labeled DOGS and SOFTWARE.]
[Figure: a hockey graph with marko, teams (avalanche, blackhawks, wild, panthers), games #1 to #n, periods #1 to #3, penalties #1 to #3, goals #1 and #2, and players (jason, steve, craig, patrick), linked by captain, played, p1/p2/p3, penalties, goals, scored, assisted, and received edges.]
PROBLEM
• Who do I draft for the 2012/2013 season?
SOLUTION
• Craig/Patrick play well together. Draft them as a pair?
[Figure: the hockey graph with two proposed draft edges added.]
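One way to read "play well together" is that two players keep showing up on the same goals, one scoring and the other assisting. A minimal plain-Python sketch of that co-occurrence count follows. The goal records are invented for illustration (only the player names come from the slides), and `pair_counts` and `best_pair` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical goal records: (scorer, [assisters]). Only the names come from the slides.
GOALS = [
    ("craig", ["patrick"]),
    ("patrick", ["craig", "steve"]),
    ("jason", ["steve"]),
    ("craig", ["patrick"]),
]

def pair_counts(goals):
    """Count how often each pair of players is involved in the same goal."""
    counts = Counter()
    for scorer, assisters in goals:
        players = sorted({scorer, *assisters})
        for pair in combinations(players, 2):
            counts[pair] += 1
    return counts

def best_pair(goals):
    return pair_counts(goals).most_common(1)[0][0]

print(best_pair(GOALS))
```

In graph terms this walks from each goal vertex along its scored and assisted edges and counts the pairs of players it reaches.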
[Figure: marko and regions labeled DOGS, SOFTWARE, and SPORTS.]
THIS IS JUST MY SUBSET OF THE WORLD AND MY PERSONAL PROBLEMS.
OUR PERSONAL WORLDS ...
... ARE EMBEDDED WITHIN A LARGER WORLD ...
... OF OTHER PEOPLE AND ARTIFACTS ...
... AND TOGETHER WE NAVIGATE THAT WORLD ...
... TRYING TO BETTER OUR LIVES ...
... BY SOLVING OUR PROBLEMS ...
... BY OPTIMALLY LINKING OURSELVES WITHIN THE WORLD.
WHY DO WE ENCODE OURSELVES?
WHY ARE WE CREATING A UNIVERSAL MODEL?
This is where I work.
I like these books.
I think about these ideas.
These are my friends.
I work on these projects.
I visit these webpages.
I wrote these articles.
I am.
BECAUSE WE HAVE FAITH IN THE ALGORITHM.
Where is the best place for me to live?
What career path should I choose given my interests and expertise?
Who should I fall in love with and live my life with?
What movie should I watch tonight with the friends I'm meeting up with?
Who should I befriend?
What ideas will inspire me?
WE HAVE FAITH THAT TOGETHER, WITH COMPUTERS, WE WILL DETERMINE THE OPTIMAL EMBEDDING.
WE WILL GENERATE ENTHRALLING CONNECTIONS ...
THAT RESONATE WITH US ...
TO CREATE AND EXPERIENCE ...
EVEN GREATER THINGS.
HOW DO WE STORE AND PROCESS A WORLD MODEL?
BILLIONS OF VERTICES
TRILLIONS OF EDGES
SOLVING MILLIONS OF PROBLEMS/SECOND
AURELIUS GRAPH CLUSTER
GRAPH COMPUTING
GRAPH (DATA STRUCTURE) + TRAVERSAL (ALGORITHM)
GRAPH
[Figure: a property graph. A vertex (name: marko, age: 32) has a likes edge (star: 5, time: 2002) to a vertex (name: Fountain Head, published: 1943). Callouts mark the VERTEX, the EDGE, the edge LABEL, and the PROPERTIES (KEY/VALUE pairs).]
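The property graph on this slide can be written down as a small data structure: vertices and edges both carry key/value properties, and each edge also has a label and two endpoints. The sketch below is a plain-Python illustration under those assumptions; the class and field names (`Vertex`, `Edge`, `out_vertex`, `in_vertex`) are hypothetical and are not the Blueprints or Titan API.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    # Arbitrary key/value properties, e.g. {"name": "marko", "age": 32}.
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    out_vertex: Vertex          # tail of the edge
    label: str                  # edge label, e.g. "likes"
    in_vertex: Vertex           # head of the edge
    properties: dict = field(default_factory=dict)

# The example from the slide.
marko = Vertex({"name": "marko", "age": 32})
book = Vertex({"name": "Fountain Head", "published": 1943})
likes = Edge(marko, "likes", book, {"star": 5, "time": 2002})

print(likes.out_vertex.properties["name"], likes.label,
      likes.in_vertex.properties["name"], likes.properties)
```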
TRAVERSAL
A traversal is an algorithmic walk over a (sub)graph in order to make explicit information that is implicit within its structure.
1.) graph derivation ("my father's father is my grandfather.")
2.) graph statistic ("many paths lead to Rome.")
[Figure: a walk spreading across a graph, with vertices numbered 1, 2, and 3 by step.]
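Both kinds of traversal named above can be illustrated in a few lines of plain Python. This is only a sketch: the family and road data are made up, the road graph is assumed to be acyclic (the recursive count would not terminate on a cycle), and the function names are hypothetical.

```python
# 1.) Graph derivation: infer an implicit "grandfather" edge from two "father" edges.
FATHER = {"me": "dad", "dad": "grandpa"}  # hypothetical data

def grandfather(person):
    father = FATHER.get(person)
    return FATHER.get(father)  # None when either step is missing

# 2.) Graph statistic: count the distinct paths that lead to Rome.
ROADS = {  # hypothetical directed, acyclic road graph
    "milan": ["florence", "rome"],
    "florence": ["rome"],
    "naples": ["rome"],
    "rome": [],
}

def count_paths(graph, start, goal):
    if start == goal:
        return 1
    return sum(count_paths(graph, nxt, goal) for nxt in graph.get(start, []))

print(grandfather("me"))
print(count_paths(ROADS, "milan", "rome"))
```

The derivation produces a new fact about one vertex pair; the statistic summarizes many walks into a single number.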
TRAVERSAL
[Figure: a graph with person vertices (name: marko; name: josh, birthmonth: 12; name: jen, birthmonth: 2; name: pavel; name: russell), a place vertex (name: santa fe), and item vertices A, B, C, and D, linked by knows, likes, and lives edges.]
PROBLEM
• What should I buy for my friends' upcoming birthdays?
x = []  // items liked by the friends with upcoming birthdays
g.V('name','marko').out('knows')  // marko's friends
 .filter{(it.birthmonth - currentMonth + 12) % 12 < 2}  // birthday this month or next
 .out('likes').aggregate(x)  // what those friends like, collected into x
 .in('likes').out('likes')  // what people with the same tastes also like
 .except(x)  // minus what the friends already like
 .groupCount()  // count how often each remaining item is reached
Result: C:2, D:1
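The same walk can be traced step by step outside of Gremlin. The plain-Python sketch below mirrors each step (friends, birthday filter, likes, aggregate, in/out likes, except, group count). The names and birth months come from the slide; which vertices each knows and likes edge connects is not recoverable from the transcript, so the edge data here is an assumption chosen for illustration, and `gift_ideas` is a hypothetical function name.

```python
from collections import Counter

# Names and birth months are from the slide; the edges below are assumed.
KNOWS = {"marko": ["josh", "jen", "pavel", "russell"]}
BIRTHMONTH = {"josh": 12, "jen": 2}
LIKES = {
    "jen": ["A", "B"],
    "josh": ["B"],
    "pavel": ["A", "C", "D"],
    "russell": ["B", "C"],
}

def gift_ideas(person, current_month):
    # out('knows') + filter: friends whose birthday is this month or next.
    friends = [
        f for f in KNOWS.get(person, [])
        if f in BIRTHMONTH and (BIRTHMONTH[f] - current_month) % 12 < 2
    ]
    # out('likes').aggregate(x): everything those friends already like.
    liked = [item for f in friends for item in LIKES.get(f, [])]
    x = set(liked)
    counts = Counter()
    for item in liked:
        # in('likes'): everyone who likes this item.
        fans = [p for p, items in LIKES.items() if item in items]
        for fan in fans:
            # out('likes').except(x).groupCount(): their other likes, tallied.
            for other in LIKES[fan]:
                if other not in x:
                    counts[other] += 1
    return dict(counts)

print(gift_ideas("marko", current_month=1))
```

With January as the current month only jen passes the filter, and the assumed edges give C a count of 2 and D a count of 1.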
GRAPH COMPUTING SYSTEMS
AURELIUS GRAPH CLUSTER
TITAN: Distributed Graph Database
FAUNUS: Graph Analytics Engine
FULGORA: Fast Graph Processor
Apache 2 Licensed
TITAN: DISTRIBUTED GRAPH DATABASE
Represents the world as a single, atomic graph structure.
Distributed over a multi-machine cluster using existing distributed data systems such as Cassandra and HBase.
Supports numerous short-lived, topologically local, real-time traversals.
FAUNUS: GRAPH ANALYTICS ENGINE
Extracts an ephemeral snapshot of the master graph.
Leverages Hadoop as its distributed computing engine.
Executes long-lived, topologically global analyses of the graph.
FULGORA: FAST GRAPH PROCESSOR
Stores a compressed subset of the master graph.
Contained within the confines of a single high memory/CPU machine.
Evaluates heavily threaded, memory-efficient graph and machine learning algorithms.
AURELIUS GRAPH CLUSTER: AN INTEGRATED BIG GRAPH DATA SOLUTION
TITAN: Distributed Graph Database
FAUNUS: Graph Analytics Engine
FULGORA: Fast Graph Processor
[Figure: data flows among the three systems, labeled Map/Reduce, Update Graph, Output Graph, and Output Statistics, with output also going to Other Data Analysis Tools.]
CREDITS
PRESENTER: MARKO A. RODRIGUEZ
CONTRIBUTORS: MATTHIAS BROCHELER, STEPHEN MALLETTE, DAN LAROCQUE, VADAS GINTAUTAS
MANY THANKS TO: AURELIUS COMMUNITY, TINKERPOP COMMUNITY, KETRINA YIM
RELATED MATERIAL
FAITH IN THE ALGORITHM 2: COMPUTATIONAL EUDAEMONICS
http://arxiv.org/abs/0904.0027
A COLLECTIVELY GENERATED MODEL OF THE WORLD
http://markorodriguez.files.wordpress.com/2011/01/collective-model.pdf
GRAPHS, BRAINS, AND GREMLIN
http://markorodriguez.com/2011/07/14/graphs-brains-and-gremlin/
STRUCTURAL ABSTRACTIONS IN THE BRAIN
http://thinkaurelius.com/2012/05/08/structural-abstractions-in-brains-and-graphs/