Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra


Upload: planet-cassandra

Post on 02-Dec-2014


DESCRIPTION

Presenter: Matthias Broecheler, Managing Partner at Aurelius LLC

Storing relationship data in Cassandra entails either data denormalization or pointer chasing inside the application. Both reduce developer productivity, are error prone, and are slow due to the lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.

TRANSCRIPT

Page 1: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

AURELIUS THINKAURELIUS.COM

Titan:db Scaling Relationship Data with C*

Matthias Broecheler @mbroecheler September XI, MMXIV

#CassandraSummit

Page 2: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Storing relationship data in Cassandra entails either data denormalization or pointer chasing inside the application. Both reduce developer productivity, are error prone, and are slow due to the lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.

Page 3: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Multi-Relational Data Structure

Graph

Page 4: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Titan = Cassandra + Graph

Page 5: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Titan 0.5

Page 6: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Cassandra

Linear scalability

Fault tolerance

Open source

Multi datacenter

High performance

Page 7: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Key   ColumnA   ColumnB   ColumnC   ColumnD   ColumnE   ColumnF  

Page 8: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra
Page 9: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32

CREATE INDEX ON User.username, User.email, Product.productid

Page 10: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

CREATE INDEX ON username(User), email(User), productid(Product)

User (username: matt, email: matt@, password: 12345)

Product (productid: 52235, name: cup, price: 12.55)
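The slide above models each User and Product as a vertex that carries its own properties, with indexes on username, email, and productid for lookup. A minimal Python sketch of that idea follows. It assumes plain dictionaries stand in for vertices, and `build_index` is a hypothetical helper rather than anything from Titan.

```python
# Illustrative only: dictionaries stand in for vertices, and build_index is a
# hypothetical helper, not a Titan API.
vertices = [
    {"label": "User", "username": "matt", "email": "matt@", "password": "12345"},
    {"label": "Product", "productid": 52235, "name": "cup", "price": 12.55},
]


def build_index(vertices, key):
    """Map each value of `key` to the vertices that carry it."""
    index = {}
    for v in vertices:
        if key in v:
            index.setdefault(v[key], []).append(v)
    return index


by_username = build_index(vertices, "username")
by_productid = build_index(vertices, "productid")

# Index lookups replace a scan over all vertices.
matt = by_username["matt"][0]
cup = by_productid[52235][0]
```

Both vertex types live in one collection here; the index is what makes a lookup by property value cheap.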

Page 11: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Buy

username  productid  time
matt      52235      9/5/14
billy     42215      8/7/14
billy     42215      8/7/14

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32

Page 12: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)

Page 13: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ buy?

Application level join

Buy

username  productid  time
matt      52235      9/5/14
billy     42215      8/7/14
billy     42215      8/7/14

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32
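To make "application level join" concrete, here is a rough Python sketch of the work the application has to do itself when the data sits in separate tables. The rows are copied from the slide; the function name `products_bought_by` is made up for illustration.

```python
# Rows copied from the Buy and Product tables on the slide.
buy = [
    ("matt", 52235, "9/5/14"),
    ("billy", 42215, "8/7/14"),
    ("billy", 42215, "8/7/14"),
]
product = {
    52235: {"name": "cup", "price": 12.55},
    42215: {"name": "spoon", "price": 7.22},
    24529: {"name": "knife", "price": 5.32},
}


def products_bought_by(username):
    """Application-level join: scan Buy, then look up each Product by key."""
    result = []
    for buyer, productid, _time in buy:
        if buyer == username:
            result.append(product[productid]["name"])
    return result


matt_products = products_bought_by("matt")
```

The join logic lives in application code, so every new question about the data needs another hand-written function like this one.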

Page 14: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ buy?

g.V.has('username','matt').out('buy')

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)
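The traversal above reads as "find the vertex whose username is matt, then follow its outgoing buy edges". The toy in-memory graph below sketches that behaviour in Python. The `Graph` class and its method names are illustrative and only mimic the shape of the Gremlin query; they say nothing about how Titan implements it.

```python
# Toy in-memory property graph; the class and method names are illustrative
# and only mimic the shape of the Gremlin query.
class Graph:
    def __init__(self):
        self.vertices = {}   # vertex id -> property dict
        self.out_edges = {}  # vertex id -> list of (label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.out_edges[vid] = []

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def has(self, key, value):
        """Ids of vertices whose property `key` equals `value`."""
        return [vid for vid, p in self.vertices.items() if p.get(key) == value]

    def out(self, vids, label):
        """Follow outgoing edges with the given label from each vertex id."""
        return [dst for vid in vids
                for (lbl, dst, _props) in self.out_edges[vid] if lbl == label]


g = Graph()
g.add_vertex(1, username="matt", email="matt@", password="12345")
g.add_vertex(2, productid=52235, name="cup", price=12.55)
g.add_edge(1, "buy", 2, time="9/5/14")

bought = [g.vertices[v]["name"] for v in g.out(g.has("username", "matt"), "buy")]
```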

Page 15: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ recently buy?

Application level join

Buy

username  productid  time
matt      52235      9/5/14
billy     42215      8/7/14
billy     42215      8/7/14

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32

Page 16: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ recently buy?

g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)

Page 17: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ recently buy?

g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV

slow

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)

Page 18: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ recently buy?

Rewrite join logic

Buy

username  productid  time
matt      52235      9/5/14
billy     42215      8/7/14
billy     42215      8/7/14

username  time    productid
matt      9/5/14  52235
billy     8/7/14  42215
billy     8/7/14  42215

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32

Page 19: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

What did ‘matt’ recently buy?

CREATE INDEX ON buy edges by time OUT direction

g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)
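One way to picture an index on buy edges by time is a per-vertex edge list that is kept sorted as edges are written, so the ten most recent purchases come from a slice rather than a sort at query time. The Python sketch below shows that idea under stated assumptions: `BuyEdges` is a hypothetical class, and the two extra purchases are made-up sample data that do not appear on the slides.

```python
import bisect
from datetime import date


class BuyEdges:
    """Outgoing 'buy' edges of one vertex, kept sorted by time on insert."""

    def __init__(self):
        self._edges = []  # (time, productid) tuples, ascending by time

    def add(self, time, productid):
        # insort keeps the list ordered, so writes pay the ordering cost.
        bisect.insort(self._edges, (time, productid))

    def most_recent(self, n=10):
        """Newest-first product ids; no sort is needed at query time."""
        return [pid for _time, pid in reversed(self._edges[-n:])]


edges = BuyEdges()
edges.add(date(2014, 9, 5), 52235)   # the purchase shown on the slide
edges.add(date(2014, 8, 7), 42215)   # made-up sample purchase
edges.add(date(2014, 9, 1), 24529)   # made-up sample purchase

recent = edges.most_recent(2)
```

The design choice is to move the sorting work to write time, where it is paid once per edge, instead of repeating it on every read.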

Page 20: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Who bought ‘52235’?

More application joins

Buy

productid  username  time
52235      matt      9/5/14
42215      billy     8/7/14
42215      billy     8/7/14

productid  time    username
52235      9/5/14  matt
42215      8/7/14  billy
42215      8/7/14  billy

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32

Page 21: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Who bought ‘52235’?

g.V.has('productid',52235).in('buy')

CREATE INDEX ON buy edges by time IN direction

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)
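Answering "who bought this product" from the product's side is cheap if each edge can be reached from both of its endpoints. A small Python sketch of that layout follows, assuming two hypothetical adjacency maps (`out_edges` and `in_edges`). It illustrates the idea only and is not a description of Titan's storage format.

```python
from collections import defaultdict

out_edges = defaultdict(list)  # buyer -> [(label, productid, time)]
in_edges = defaultdict(list)   # productid -> [(label, buyer, time)]


def add_edge(src, label, dst, time):
    """Record one logical edge under both of its endpoints."""
    out_edges[src].append((label, dst, time))
    in_edges[dst].append((label, src, time))


def who_bought(productid):
    """IN direction: start at the product, follow 'buy' edges backwards."""
    return [src for label, src, _t in in_edges[productid] if label == "buy"]


def what_bought(username):
    """OUT direction: start at the user, follow 'buy' edges forwards."""
    return [dst for label, dst, _t in out_edges[username] if label == "buy"]


# Rows taken from the Buy table on the slides.
add_edge("matt", "buy", 52235, "9/5/14")
add_edge("billy", "buy", 42215, "8/7/14")
```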

Page 22: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Product join tables won’t scale

Buy

username  productid  time
matt      52235      9/5/14
billy     42215      8/7/14
billy     42215      8/7/14

username  time    productid
matt      9/5/14  52235
billy     8/7/14  42215
billy     8/7/14  42215

productid  username  time
52235      matt      9/5/14
42215      billy     8/7/14
42215      billy     8/7/14

productid  time    username
52235      9/5/14  matt
42215      8/7/14  billy
42215      8/7/14  billy

User

username  email   password
matt      matt@   12345
john      john@   qwerty
billy     billy@  abcde

Product

productid  name   price
52235      cup    12.55
42215      spoon  7.22
24529      knife  5.32

Page 23: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

PARTITION Product Vertices

User (username: matt, email: matt@, password: 12345)
  -- buy (time: 9/5/14) -->
Product (productid: 52235, name: cup, price: 12.55)

Page 24: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Token Ring (BOP)

Edge Cut

- Assigns ids to map vertices into “optimal” token range
- Maintains virtual partitions

Vertical Partitioning = divide communities
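The goal of an edge cut is to place vertices so that as few edges as possible cross partition boundaries, which is why keeping communities together matters. The Python sketch below counts crossing edges for two placements of a hypothetical six-vertex graph. The graph, the partition numbers, and the `edge_cut` helper are all invented for illustration.

```python
# Hypothetical six-vertex graph with two tightly connected communities.
edges = [("a", "b"), ("b", "c"), ("a", "c"),   # community 1
         ("x", "y"), ("y", "z"), ("x", "z"),   # community 2
         ("c", "x")]                           # single link between them


def edge_cut(assignment):
    """Number of edges whose endpoints sit in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Placement that keeps each community on one partition.
by_community = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
# Placement that ignores community structure.
alternating = {"a": 0, "b": 1, "c": 0, "x": 1, "y": 0, "z": 1}

cut_by_community = edge_cut(by_community)
cut_alternating = edge_cut(alternating)
```

Every cut edge is a traversal step that has to leave the local partition, so the community-aware placement needs far fewer remote hops on this example.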

Page 25: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Vertex Cut
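A vertex cut takes the opposite approach for a very high-degree vertex, such as a popular product: instead of keeping the vertex whole, its edge list is spread over several partitions. The following Python sketch assumes a four-partition cluster and a simple modulo placement rule. Both are illustrative assumptions, not Titan's actual scheme.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed cluster size, for illustration only


def partition_of(buyer_id):
    """Hypothetical placement rule: spread edges by the buyer's id."""
    return buyer_id % NUM_PARTITIONS


# partition -> buyers whose 'buy' edge to product 52235 is stored there
product_52235_in_edges = defaultdict(list)
for buyer_id in range(1000):  # 1000 made-up buyers
    product_52235_in_edges[partition_of(buyer_id)].append(buyer_id)

# Each partition holds a quarter of the product's edges.
sizes = {p: len(b) for p, b in product_52235_in_edges.items()}

# Reading all buyers means gathering the pieces from every partition.
all_buyers = sorted(b for part in product_52235_in_edges.values() for b in part)
```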

Page 26: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Combined Graph Partitioning

Page 27: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Database

Datastore

Page 28: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Transactions

// next() resolves each traversal to a single vertex
v = g.V.has('username','matt').has('password','12345').next()
p = g.V.has('productid',52235).next()
e = v.addEdge('buy',p)
e.setProperty('time','9/11/2014')
o = g.addVertex([orderid:242343])
o.addEdge('buyer',v)
o.addEdge('product',p)
g.commit()

unit of work

Atomicity
Consistency
Isolation
Durability
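The "unit of work" idea can be sketched as a transaction object that buffers its writes and makes them visible only on commit. The Python below is a deliberately small illustration: the `Transaction` class and the key names are hypothetical, and it shows only the buffering behaviour, not the full ACID guarantees listed on the slide.

```python
class Transaction:
    """Buffers writes and applies them to the store only on commit."""

    def __init__(self, store):
        self._store = store
        self._pending = {}

    def put(self, key, value):
        self._pending[key] = value

    def get(self, key):
        # A transaction sees its own uncommitted writes first.
        if key in self._pending:
            return self._pending[key]
        return self._store.get(key)

    def commit(self):
        # All buffered writes reach the store together.
        self._store.update(self._pending)
        self._pending = {}

    def rollback(self):
        # Discard the buffered writes; the store is untouched.
        self._pending = {}


store = {}
tx = Transaction(store)
tx.put("edge:matt-buy-52235", {"time": "9/11/2014"})
tx.put("vertex:order-242343", {"orderid": 242343})
visible_before_commit = dict(store)
tx.commit()
```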

Page 29: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Transaction Consistency

u = g.addVertex([username:'matt'])
// next() resolves the traversal to a single vertex
p = g.V.has('username','senior').next()
u.addEdge('father',p)
p.setProperty('surname','Jones')
g.commit()

Locks are acquired to ensure that consistency constraints are enforced:

•  Index Uniqueness
•  Multiplicity Constraints
•  Cardinality Constraints
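To see why a uniqueness constraint needs a lock, consider two writers that both try to create the username matt: the check and the write have to happen as one step. The Python sketch below uses a single-process `threading.Lock` as a stand-in. `UniqueIndex` is a hypothetical class, and a database spread over a Cassandra cluster would need a distributed mechanism that this sketch does not attempt to show.

```python
import threading


class UniqueIndex:
    """Rejects a second vertex with the same value for a unique key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}  # value -> vertex id

    def claim(self, value, vertex_id):
        # The check and the write happen under one lock, so two concurrent
        # callers cannot both see the value as free.
        with self._lock:
            if value in self._owner and self._owner[value] != vertex_id:
                return False
            self._owner[value] = vertex_id
            return True


usernames = UniqueIndex()
first = usernames.claim("matt", 1)
second = usernames.claim("matt", 2)
```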

Page 30: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Polyglot Data Architecture

© Jay Kreps @ LinkedIn

Page 31: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Titan Event Framework

Transaction modifications are logged; consumers read from the log.
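The pattern on this slide, where committed changes are written to a log that downstream systems read at their own pace, can be sketched in a few lines of Python. The `ChangeLog` class, the consumer names, and the record fields below are all hypothetical and are not taken from Titan's API.

```python
class ChangeLog:
    """Append-only log of committed changes; each consumer tracks its offset."""

    def __init__(self):
        self._entries = []
        self._offsets = {}  # consumer name -> index of next unread entry

    def append(self, change):
        self._entries.append(change)

    def poll(self, consumer):
        """Return the entries the named consumer has not yet seen."""
        start = self._offsets.get(consumer, 0)
        self._offsets[consumer] = len(self._entries)
        return self._entries[start:]


log = ChangeLog()
log.append({"op": "addEdge", "label": "buy", "out": "matt", "in": 52235})
search_batch = log.poll("search-indexer")
log.append({"op": "addVertex", "orderid": 242343})
search_batch_2 = log.poll("search-indexer")
analytics_batch = log.poll("analytics")
```

Because each consumer keeps its own offset, a slow consumer does not hold back a fast one, and a new consumer can replay the log from the start.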

Page 32: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Use Cases

Page 33: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

http://arli.us/magazinaluiza

Page 34: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Security

Fraud

http://arli.us/cisco-sec1

Page 35: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

© Sean York @ Pearson Education

Page 36: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

http://bit.ly/WPTitanSEAGraph

Page 37: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

http://arli.us/musicgraphintro

Music Graph

Knowledge Graph

Page 38: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

TitanDB.io

Relationships + Cassandra

Page 39: Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

AURELIUS THINKAURELIUS.COM