Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra
Presenter: Matthias Broecheler, Managing Partner at Aurelius LLC
Storing relationship data in Cassandra entails data denormalization or pointer chasing inside the application, which reduces developer productivity, is error prone, and is slow due to a lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.
AURELIUS THINKAURELIUS.COM
Titan:db Scaling Relationship Data with C*
Matthias Broecheler @mbroecheler September 11, 2014
#CassandraSummit
Multi-Relational Data Structure
Graph
Titan = Cassandra + Graph
Titan 0.5
Cassandra
Linear scalability
fault tolerance
open source
multi datacenter
high performance
Key ColumnA ColumnB ColumnC ColumnD ColumnE ColumnF
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
CREATE INDEX ON User.username, User.email, Product.productid
CREATE INDEX ON username(User), email(User), productid(Product)
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
Buy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ buy? Application level join
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
Buy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
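The "application level join" from the slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dictionaries and the `products_bought_by` helper are hypothetical stand-ins for the User, Product, and Buy tables, not Cassandra or Titan API calls. The point is that the application itself has to scan the Buy rows and then chase each product id.

```python
# Toy in-memory stand-ins for the User, Product, and Buy tables on the slides.
users = {
    "matt": {"email": "matt@", "password": "12345"},
    "john": {"email": "john@", "password": "qwerty"},
    "billy": {"email": "billy@", "password": "abcde"},
}
products = {
    52235: {"name": "cup", "price": 12.55},
    42215: {"name": "spoon", "price": 7.22},
    24529: {"name": "knife", "price": 5.32},
}
buys = [
    ("matt", 52235, "9/5/14"),
    ("billy", 42215, "8/7/14"),
    ("billy", 42215, "8/7/14"),
]


def products_bought_by(username):
    """Application-level join: filter the Buy rows, then look up each product."""
    if username not in users:
        return []
    product_ids = [pid for (user, pid, _when) in buys if user == username]
    return [products[pid]["name"] for pid in product_ids]


print(products_bought_by("matt"))
```

Every new question about the data ("who bought this product?", "what was bought recently?") needs another hand-written function of this kind, which is the productivity cost the talk describes.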
What did ‘matt’ buy?
g.V.has('username','matt').out('buy')
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ recently buy?
Application level join
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
Buy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
What did ‘matt’ recently buy?
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ recently buy?
slow
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
What did ‘matt’ recently buy?
Rewrite join logic
username time productid
matt 9/5/14 52235
billy 8/7/14 42215
billy 8/7/14 42215
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ recently buy?
CREATE INDEX ON buy edges by time OUT direction
g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
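Why an index on buy edges by time makes the "recently bought" query fast can be sketched as follows. This is a minimal Python model, not Titan's implementation: the `BuyEdgeIndex` class and the integer `yyyymmdd` timestamps are assumptions made for illustration. Each user's buy edges are kept sorted newest first, so the ten most recent purchases are a prefix of the list rather than the result of sorting every edge at query time.

```python
import bisect


class BuyEdgeIndex:
    """Per-vertex 'buy' edges kept sorted by time, newest first (hypothetical model)."""

    def __init__(self):
        # username -> list of (-timestamp, productid), kept in ascending order,
        # which means descending order of the real timestamp.
        self._out = {}

    def add_buy(self, username, productid, timestamp):
        edges = self._out.setdefault(username, [])
        bisect.insort(edges, (-timestamp, productid))

    def recent(self, username, limit=10):
        """Return up to `limit` product ids, most recent purchase first."""
        edges = self._out.get(username, [])
        return [productid for (_neg_ts, productid) in edges[:limit]]


index = BuyEdgeIndex()
index.add_buy("matt", 42215, 20140807)
index.add_buy("matt", 52235, 20140905)
index.add_buy("matt", 24529, 20140101)
print(index.recent("matt", limit=2))
```

The sort cost is paid once at write time; the read only touches the first `limit` entries, however many purchases the user has made.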
productid username time
52235 matt 9/5/14
42215 billy 8/7/14
42215 billy 8/7/14
Who bought ‘52235’? More application joins
productid time username
52235 9/5/14 matt
42215 8/7/14 billy
42215 8/7/14 billy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
g.V.has('productid',52235).in('buy')
CREATE INDEX ON buy edges by time IN direction
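One way to picture why the reverse question needs no extra join table in a graph model: each edge is reachable from both of its endpoints. The sketch below is an assumption-laden toy in Python (the `TinyGraph` class and its method names are hypothetical, not Titan's API) that records every edge under its source and its target, so `out` and `in` lookups are symmetric.

```python
from collections import defaultdict


class TinyGraph:
    """Toy adjacency-list graph that stores each edge in both directions."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # source -> [(label, target)]
        self.in_edges = defaultdict(list)   # target -> [(label, source)]

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def out(self, vertex, label):
        """Vertices reached by following `label` edges away from `vertex`."""
        return [t for (l, t) in self.out_edges.get(vertex, []) if l == label]

    def in_(self, vertex, label):
        """Vertices that point at `vertex` through `label` edges."""
        return [s for (l, s) in self.in_edges.get(vertex, []) if l == label]


g = TinyGraph()
g.add_edge("matt", "buy", 52235)
g.add_edge("billy", "buy", 42215)
g.add_edge("billy", "buy", 42215)
print(g.in_(52235, "buy"))
```

With hand-rolled tables, the same symmetry requires maintaining a second (and, with time ordering, a third and fourth) denormalized table in application code.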
Who bought ‘52235’?
Product join tables won’t scale
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
username time productid
matt 9/5/14 52235
billy 8/7/14 42215
billy 8/7/14 42215
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
productid username time
52235 matt 9/5/14
42215 billy 8/7/14
42215 billy 8/7/14
productid time username
52235 9/5/14 matt
42215 8/7/14 billy
42215 8/7/14 billy
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
PARTITION Product Vertices
Token Ring (BOP)
Edge Cut
- Assigns ids to map vertices into “optimal” token range
- Maintains virtual partitions
Vertical Partitioning = divide communities
Vertex Cut
Combined Graph Partitioning
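The two partitioning strategies named on these slides can be contrasted with a small Python sketch. It is illustrative only: the integer vertex ids, the `partition_of` mapping, and the modulo placement rule are assumptions, not how Titan assigns ids or tokens. An edge cut places whole vertices on partitions and pays for every edge that crosses a boundary; a vertex cut instead splits the edge list of one high-degree vertex (such as a popular product) across partitions.

```python
def edge_cut_size(edges, partition_of):
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for (u, v) in edges if partition_of[u] != partition_of[v])


def vertex_cut(hub, edges, num_partitions):
    """Spread the hub's edges over partitions, keyed by the other endpoint's id."""
    shards = {p: [] for p in range(num_partitions)}
    for (u, v) in edges:
        if u == hub:
            shards[v % num_partitions].append((u, v))
        elif v == hub:
            shards[u % num_partitions].append((u, v))
    return shards


# Users 1..4 all buy product 100; users 1 and 2 also know each other.
edges = [(1, 100), (2, 100), (3, 100), (4, 100), (1, 2)]
partition_of = {1: 0, 2: 0, 3: 1, 4: 1, 100: 0}

print(edge_cut_size(edges, partition_of))
print(vertex_cut(100, edges, num_partitions=2))
```

A good edge cut keeps tightly connected communities on the same partition so few traversals cross machines, while the vertex cut keeps a single hot vertex from overloading one partition. Combining both is what the "Combined Graph Partitioning" slide refers to.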
Database
Datastore
Transactions
v = g.V.has('username','matt').has('password','12345').next()
p = g.V.has('productid',52235).next()
e = v.addEdge('buy',p)
e.setProperty('time','9/11/2014')
o = g.addVertex([orderid:242343])
o.addEdge('buyer',v)
o.addEdge('product',p)
g.commit()
unit of work
Atomicity Consistency
Isolation Durability
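The "unit of work" idea can be sketched independently of any database: changes are buffered inside the transaction and become visible together on commit, or not at all on rollback. The Python below is a toy model under that assumption. The `Graph` and `Transaction` classes are hypothetical, and it says nothing about how Titan achieves durability or isolation on Cassandra.

```python
class Graph:
    """Toy store holding committed (source, label, target) edges."""

    def __init__(self):
        self.edges = []

    def new_transaction(self):
        return Transaction(self)


class Transaction:
    """Buffers edge additions until commit() or rollback()."""

    def __init__(self, graph):
        self._graph = graph
        self._pending = []

    def add_edge(self, source, label, target):
        self._pending.append((source, label, target))

    def commit(self):
        # All buffered changes become visible together.
        self._graph.edges.extend(self._pending)
        self._pending = []

    def rollback(self):
        # Discard everything buffered since the transaction started.
        self._pending = []


graph = Graph()
tx = graph.new_transaction()
tx.add_edge("matt", "buy", 52235)
tx.add_edge("order-242343", "buyer", "matt")
print(len(graph.edges))  # nothing is visible before commit
tx.commit()
print(len(graph.edges))
```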
Transaction Consistency
u = g.addVertex([username:'matt'])
p = g.V.has('username','senior').next()
u.addEdge('father',p)
p.setProperty('surname','Jones')
g.commit()
Locks acquired to ensure consistency constraints are enforced
• Index Uniqueness
• Multiplicity Constraints
• Cardinality Constraints
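As an illustration of why a lock is needed for a constraint such as index uniqueness, here is a single-process Python sketch. It is an assumption-heavy simplification: the `UniqueIndex` class is hypothetical and uses an in-memory `threading.Lock`, whereas the slides do not show how Titan acquires locks across a Cassandra cluster. The check ("is this username taken?") and the write ("claim it") must happen under one lock, otherwise two concurrent transactions could both pass the check.

```python
import threading


class UniqueIndex:
    """Toy unique index: each value may be owned by at most one vertex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}  # indexed value -> vertex id

    def claim(self, value, vertex_id):
        # Check and write under the same lock so concurrent claims cannot interleave.
        with self._lock:
            current = self._owner.get(value)
            if current is not None and current != vertex_id:
                raise ValueError(f"{value!r} is already used by vertex {current}")
            self._owner[value] = vertex_id


usernames = UniqueIndex()
usernames.claim("matt", vertex_id=1)
try:
    usernames.claim("matt", vertex_id=2)
except ValueError as err:
    print("rejected:", err)
```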
Polyglot Data Architecture
© Jay Kreps @ LinkedIn
Transaction modifications
logged
Consumers
Titan Event Framework
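The pattern on these slides (transaction modifications are logged, and consumers read the log) can be modeled with an append-only list and per-consumer offsets. The Python below is a hedged sketch of that general pattern only; `ChangeLog`, `Consumer`, and the tuple format of a change are invented for illustration and do not reflect Titan's event framework API.

```python
class ChangeLog:
    """Append-only log with one entry per committed transaction."""

    def __init__(self):
        self._entries = []

    def append(self, changes):
        self._entries.append(list(changes))
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset):
        return self._entries[offset:]


class Consumer:
    """Reads the log from its own offset, e.g. to update a search index or cache."""

    def __init__(self, log):
        self._log = log
        self._offset = 0

    def poll(self):
        batch = self._log.read_from(self._offset)
        self._offset += len(batch)
        return batch


log = ChangeLog()
indexer = Consumer(log)
log.append([("addEdge", "matt", "buy", 52235)])
print(indexer.poll())
print(indexer.poll())  # nothing new since the last poll
```

Because each consumer tracks its own offset, downstream systems in a polyglot architecture can process the same stream of graph changes at their own pace.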
Use Cases
http://arli.us/magazinaluiza
Security
Fraud
http://arli.us/cisco-sec1
© Sean York @ Pearson Education
http://bit.ly/WPTitanSEAGraph
http://arli.us/musicgraphintro
Music Graph
Knowledge Graph
TitanDB.io
Relationships + Cassandra