tinkerpop: a story of graphs, dbs, and graph dbs
DESCRIPTION
Intro to TinkerPop and the Aurelius Graph Cluster for the Graph DB Workshop, Texas Linux Festival 2014
TRANSCRIPT
TinkerPop: a story of graphs, DBs, and graph DBs
Joshua Shinavier and James Thornton
Texas Linux Festival, June 13th, 2014
Once, there was a thing
v(1)
Let’s call it a vertex
The vertex had some metadata
v(1)
name: “Graph DB workshop”
We’ll call that a property
v(1)
name: “Graph DB workshop”
You are here.
In fact, the vertex had multiple properties
v(1)
name: “Graph DB workshop”, type: “Event”
The properties were of various types
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
Our vertex was not alone
Thus, an edge
The edge was directed…
…and labeled
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
partOf
The label types the relationship
You are here, too.
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
v(3)
name: “Chef Workshop”, type: “Event”, starts: 1402664400000, ends: 1402696800000
v(4)
name: “Canonical Charm School”, type: “Event”, starts: 1402664400000, ends: 1402696800000
partOf
partOf
partOf
More vertices joined the fun…
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
v(3)
name: “Chef Workshop”, type: “Event”, starts: 1402664400000, ends: 1402696800000
v(4)
name: “Canonical Charm School”, type: “Event”, starts: 1402664400000, ends: 1402696800000
partOf
partOf
partOf
v(7)
name: “TinkerPop suite”, type: “Software”
v(8)
name: “Aurelius Graph Cluster”, type: “Software”
hasTopic
hasTopic
More labels, too
Now it was a labeled multigraph
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
v(3)
name: “Chef Workshop”, type: “Event”, starts: 1402664400000, ends: 1402696800000
v(4)
name: “Canonical Charm School”, type: “Event”, starts: 1402664400000, ends: 1402696800000
partOf
partOf
partOf
v(6)
name: “Joshua Shinavier”, type: “Person”, githubId: “joshsh”
v(5)
name: “James Thornton”, type: “Person”, githubId: “espeed”
presentedBy
presentedBy
v(7)
name: “TinkerPop suite”, type: “Software”
v(8)
name: “Aurelius Graph Cluster”, type: “Software”
hasTopic
hasTopic
A few more edges
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
v(3)
name: “Chef Workshop”, type: “Event”, starts: 1402664400000, ends: 1402696800000
v(4)
name: “Canonical Charm School”, type: “Event”, starts: 1402664400000, ends: 1402696800000
partOf
partOf
partOf
v(6)
name: “Joshua Shinavier”, type: “Person”, githubId: “joshsh”
v(5)
name: “James Thornton”, type: “Person”, githubId: “espeed”
presentedBy
presentedBy
v(7)
name: “TinkerPop suite”, type: “Software”
v(8)
name: “Aurelius Graph Cluster”, type: “Software”
hasTopic
hasTopic
contributesTo
contributesTo
contributesTo
Some edges also had properties
v(1)
name: “Graph DB workshop”, type: “Event”, starts: 1402682400000, ends: 1402696800000
v(2)
name: “Texas Linux Fest”, type: “Event”, starts: 1402664400000, ends: 1402808400000
v(3)
name: “Chef Workshop”, type: “Event”, starts: 1402664400000, ends: 1402696800000
v(4)
name: “Canonical Charm School”, type: “Event”, starts: 1402664400000, ends: 1402696800000
partOf
partOf
partOf
v(6)
name: “Joshua Shinavier”, type: “Person”, githubId: “joshsh”
v(5)
name: “James Thornton”, type: “Person”, githubId: “espeed”
presentedBy
presentedBy
v(7)
name: “TinkerPop suite”, type: “Software”
v(8)
name: “Aurelius Graph Cluster”, type: “Software”
hasTopic
hasTopic
contributesTo
contributesTo
contributesTo
weight: 0.2
weight: 0.8
weight: 1.0
We call this a Property Graph
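As a rough illustration of the Property Graph model built up in these slides, here is a minimal Python sketch: vertices and directed, labeled edges, each carrying a key/value property map. The names `Vertex`, `Edge`, `PropertyGraph`, `add_vertex`, and `add_edge` are invented for this example and are not a TinkerPop API. The edge directions are also an assumption, since the slide diagrams survive only as text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vid: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    out_v: Vertex   # the vertex the edge leaves
    label: str      # the label types the relationship
    in_v: Vertex    # the vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)  # a list, so parallel edges are allowed

    def add_vertex(self, vid, **properties):
        v = Vertex(vid, properties)
        self.vertices[vid] = v
        return v

    def add_edge(self, out_v, label, in_v, **properties):
        e = Edge(out_v, label, in_v, properties)
        self.edges.append(e)
        return e


g = PropertyGraph()
workshop = g.add_vertex(1, name="Graph DB workshop", type="Event",
                        starts=1402682400000, ends=1402696800000)
fest = g.add_vertex(2, name="Texas Linux Fest", type="Event",
                    starts=1402664400000, ends=1402808400000)
josh = g.add_vertex(6, name="Joshua Shinavier", type="Person", githubId="joshsh")

# Assumed directions: the workshop is part of the festival and presented by a person
g.add_edge(workshop, "partOf", fest)
g.add_edge(workshop, "presentedBy", josh)

labels = sorted(e.label for e in g.edges if e.out_v is workshop)
print(labels)
```

Edge properties such as `weight` would be passed to `add_edge` the same way vertex properties are passed to `add_vertex`.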
Many graph DB data models are variations on this theme
Neo4j
OrientDB
Sparksee*
* the graph database previously known as DEX
etc.
Enter Blueprints
• single Property Graph API supported by diverse graph database backends
• choose your favorite, but avoid vendor lock-in
• Blueprints : graph DB :: JDBC : RDBMS
• implementations, “ouplementations”, test suites, and helper utilities are built on top
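The JDBC analogy can be sketched in a few lines: application code is written once against a shared interface, and the storage backend is swapped underneath it. The sketch below is in Python and uses a made-up three-method interface (`add_vertex`, `add_edge`, `out`) with two toy in-memory backends. It is not the real Blueprints interface, which is a Java API, and the backends stand in for real databases only by analogy.

```python
from abc import ABC, abstractmethod


class Graph(ABC):
    """Hypothetical minimal graph API (not the real Blueprints interface)."""

    @abstractmethod
    def add_vertex(self, vid, **props): ...

    @abstractmethod
    def add_edge(self, out_id, label, in_id): ...

    @abstractmethod
    def out(self, vid, label): ...


class AdjacencyGraph(Graph):
    """Toy backend 1: one adjacency list per vertex."""

    def __init__(self):
        self.props = {}
        self.adj = {}

    def add_vertex(self, vid, **props):
        self.props[vid] = props
        self.adj[vid] = []

    def add_edge(self, out_id, label, in_id):
        self.adj[out_id].append((label, in_id))

    def out(self, vid, label):
        return [head for (lbl, head) in self.adj[vid] if lbl == label]


class TripleGraph(Graph):
    """Toy backend 2: one flat list of (out, label, in) triples."""

    def __init__(self):
        self.props = {}
        self.triples = []

    def add_vertex(self, vid, **props):
        self.props[vid] = props

    def add_edge(self, out_id, label, in_id):
        self.triples.append((out_id, label, in_id))

    def out(self, vid, label):
        return [i for (o, lbl, i) in self.triples if o == vid and lbl == label]


def load_and_query(g: Graph):
    """Application code: knows only the shared interface, not the backend."""
    g.add_vertex(1, name="Graph DB workshop")
    g.add_vertex(2, name="Texas Linux Fest")
    g.add_edge(1, "partOf", 2)  # direction assumed for illustration
    return g.out(1, "partOf")


results = [load_and_query(backend) for backend in (AdjacencyGraph(), TripleGraph())]
print(results)
```

Both backends return the same answer to the same application code, which is the sense in which a common API helps avoid vendor lock-in.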
Blueprints implementations
Now we need a query language…
• build it on the Blueprints API
• query over any Blueprints-compatible DB
• make it path-like, with side-effects
• match abstract traversals through the graph, filtering, ranking, and mutating as you go
• make it interactive. How about a REPL?
• a domain-specific language for traversing graphs
• Turing-complete, permits access to the full JDK
• has been adapted to various JVM languages
• Gremlin : graph DB :: SQL : RDBMS… sort of
Enter Gremlin
Think “pipes and filters”
• Pipes: dataflow framework. The basis of Gremlin
• Frames: Java bean framework for graphs
• Furnace: Property Graph algorithms
• Rexster: high-performance graph database server
The rest of the TinkerPop family
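To make the "pipes and filters" idea concrete, here is a toy path-like traversal in Python. Each step wraps the previous step in a lazy generator, so nothing is computed until the results are pulled through at the end. The `Traversal` class and the dictionary-based graph are invented for this sketch and are not Gremlin or Pipes themselves; the `hasTopic` edge directions are assumed.

```python
class Traversal:
    """Toy traversal: every step returns a new Traversal over a lazy generator."""

    def __init__(self, graph, source):
        self.graph = graph
        self.source = source  # an iterable of vertex ids

    def out(self, label):
        # Pipe: follow outgoing edges that carry the given label
        edges = self.graph["edges"]
        src = self.source
        return Traversal(
            self.graph,
            (head for v in src for (tail, lbl, head) in edges
             if tail == v and lbl == label),
        )

    def has(self, key, value):
        # Filter: keep only vertices whose property matches
        props = self.graph["vertices"]
        src = self.source
        return Traversal(self.graph, (v for v in src if props[v].get(key) == value))

    def values(self, key):
        # Terminal step: pull everything through the pipeline
        props = self.graph["vertices"]
        return [props[v][key] for v in self.source]


graph = {
    "vertices": {
        1: {"name": "Graph DB workshop", "type": "Event"},
        2: {"name": "Texas Linux Fest", "type": "Event"},
        7: {"name": "TinkerPop suite", "type": "Software"},
        8: {"name": "Aurelius Graph Cluster", "type": "Software"},
    },
    # (out vertex, label, in vertex); directions are assumed, not taken from the slides
    "edges": [(1, "partOf", 2), (1, "hasTopic", 7), (1, "hasTopic", 8)],
}

names = Traversal(graph, [1]).out("hasTopic").has("type", "Software").values("name")
print(names)
```

In Gremlin 2.x the same question would look roughly like `g.v(1).out('hasTopic').has('type', 'Software').name`.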
TinkerPop is…
• a developer group creating an open-source graph DB stack
• a community of users and third-party implementors
• a foundation for building high-performance graph applications of any size
• model some data on your laptop
• build massive clustered applications
• open source, BSD licensed
A detailed guide to the rest of this workshop
• intro to the Aurelius Graph Cluster
• demos of graph tools and concepts
• guided installation of tools
• preview of TinkerPop3
Thanks!
The Aurelius Graph Cluster
In TinkerPop…
• we adapt various graph DBs to a unified API
• they become Property Graph databases
With AGC…
• we adapt various high-performance databases to the Titan API
• they become graph databases
Take your pick of CAP
Titan highlights
• graphs, transactions scale with the number of machines in a cluster
• geo, numeric range, and full text search for vertices and edges
• support for either of two indexing backends
• ElasticSearch, Lucene
• native support for Blueprints, Rexster
Dealing with supernodes
• Titan’s vertex-centric indices permit ordered querying from a vertex
• e.g. retrieve “knows” edges… in order of “since” timestamp
• iterates efficiently, even if there are thousands of edges
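The idea behind ordered, per-vertex edge lookup can be sketched as follows: each vertex keeps its own edges sorted by (label, sort key), so a query for one label within a key range is a binary search plus a contiguous slice, no matter how many other edges the vertex has. This is a simplified Python illustration with an invented `VertexCentricIndex` class; it does not reflect Titan's actual storage layout or API.

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Toy index: each vertex keeps its edges sorted by (label, sort key)."""

    def __init__(self):
        # vertex id -> sorted list of (label, sort key, other vertex id)
        self._edges = defaultdict(list)

    def add_edge(self, out_id, label, in_id, sort_key):
        bisect.insort(self._edges[out_id], (label, sort_key, in_id))

    def edges(self, vid, label, lo=float("-inf"), hi=float("inf")):
        """Return (sort key, other id) pairs with lo <= sort key < hi, in key order."""
        row = self._edges[vid]
        start = bisect.bisect_left(row, (label, lo))
        end = bisect.bisect_left(row, (label, hi))
        return [(key, other) for (_, key, other) in row[start:end]]


idx = VertexCentricIndex()
idx.add_edge(1, "knows", 2, sort_key=2010)  # "since" values are made up
idx.add_edge(1, "knows", 3, sort_key=2005)
idx.add_edge(1, "knows", 4, sort_key=2012)
idx.add_edge(1, "likes", 5, sort_key=2011)

all_knows = idx.edges(1, "knows")
recent_knows = idx.edges(1, "knows", lo=2008)
print(all_knows)
print(recent_knows)
```

Only the matching slice of the vertex's edge list is touched, which is why this kind of index helps with supernodes.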
What about Faunus?
Faunus…
• is a Hadoop-based graph analytics engine
• in Titan 0.5 will simply be called Titan/Hadoop
• adds support for global distributed graph operations
• applies (a subset of) Gremlin in a breadth-first fashion
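Breadth-first evaluation can be illustrated with a small sketch: instead of following one path at a time, every traverser in the graph advances one hop per pass over the full edge list, and only a count of traversers per vertex is kept. This is a single-machine Python illustration with an invented `out_step` function; it is not Faunus code, and the edge directions are assumed.

```python
from collections import Counter


def out_step(frontier, edges, label):
    """Advance every traverser one hop in a single pass over the edge list.

    frontier maps vertex id -> number of traversers currently at that vertex.
    """
    nxt = Counter()
    for (tail, lbl, head) in edges:
        if lbl == label and tail in frontier:
            nxt[head] += frontier[tail]
    return nxt


# (out vertex, label, in vertex); directions are assumed for illustration
edges = [
    (1, "partOf", 2), (3, "partOf", 2), (4, "partOf", 2),
    (1, "hasTopic", 7), (1, "hasTopic", 8),
]

# A global operation: start with one traverser on every vertex
frontier = Counter({v: 1 for v in (1, 2, 3, 4, 7, 8)})
frontier = out_step(frontier, edges, "partOf")
print(dict(frontier))  # → {2: 3}
```

Each call to `out_step` corresponds loosely to one distributed pass over the data, which is why this style suits whole-graph analytics better than single-vertex lookups.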
Faunus inputs and outputs
• Hadoop SequenceFile format (in/out)
• Titan graph DB (in/out)
• GraphSON format (in/out)
• Rexster (in)
• RDF (in)
• Gremlin scripts (in/out)
Demo time
TinkerPop3
What’s new in TP3
• new Gremlin implementation which makes good use of Java 8 closures, enables introspection and optimization of traversals
• new OLAP API with support for message passing systems like Giraph, Hama, Faunus, etc.
• revamped I/O utilities with support for GraphSON, GraphML, and GremlinKryo
• new server model, incl. remote execution of scripts via WebSocket API, server plugin support, customizable serialization formats
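The message-passing style mentioned in the OLAP bullet can be sketched as a loop of supersteps: in each step, every vertex that received messages updates its state and sends new messages along its outgoing edges, and the computation halts when no messages are in flight. The Python sketch below computes hop distance from a source vertex. The function `run_supersteps` and its parameters are invented for illustration; this is not the TinkerPop3 OLAP API.

```python
def run_supersteps(vertices, edges, source, max_steps=10):
    """Bulk-synchronous sketch: hop distance from `source` via message passing."""
    distance = {v: None for v in vertices}
    inbox = {source: [0]}  # messages delivered before the first superstep
    for _ in range(max_steps):
        outbox = {}
        for v, messages in inbox.items():
            best = min(messages)
            if distance[v] is None or best < distance[v]:
                distance[v] = best
                # State changed: notify out-neighbours
                for (tail, head) in edges:
                    if tail == v:
                        outbox.setdefault(head, []).append(best + 1)
        if not outbox:
            break  # no messages in flight, so halt
        inbox = outbox
    return distance


vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (1, 3), (4, 1)]  # (out vertex, in vertex), made-up data
distances = run_supersteps(vertices, edges, source=1)
print(distances)
```

Vertex 4 stays unreached because its only edge points toward the source, not away from it.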
Gremlitron
• Blueprints, Pipes, and Gremlin are all integrated in TinkerPop3
• Frames obsoleted by Gremlin DSLs
• Furnace is Gremlin OLAP
• Rexster is Gremlin Server
Try it out
• at:
• https://github.com/tinkerpop/tinkerpop3
• mailing list:
• https://groups.google.com/forum/gremlin-users
• we welcome your feedback and/or PRs