Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Post on 02-Dec-2014

DESCRIPTION

Presenter: Matthias Broecheler, Managing Partner at Aurelius LLC

Storing relationship data in Cassandra entails data denormalization or pointer chasing inside the application, which reduces developer productivity, is error-prone, and is slow due to lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.

TRANSCRIPT

AURELIUS THINKAURELIUS.COM

Titan:db Scaling Relationship Data with C*

Matthias Broecheler @mbroecheler September XI, MMXIV

#CassandraSummit

Multi-Relational Data Structure

Graph

Titan = Cassandra + Graph

Titan 0.5

Cassandra

Linear scalability

Fault tolerance

Open source

Multi datacenter

High performance

Key   ColumnA   ColumnB   ColumnC   ColumnD   ColumnE   ColumnF  

username   email    password
matt       matt@    12345
john       john@    qwerty
billy      billy@   abcde

productid   name    price
52235       cup     12.55
42215       spoon   7.22
24529       knife   5.32

User Product

CREATE INDEX ON User.username, User.email, Product.productid

CREATE INDEX ON username(User), email(User), productid(Product)

User Product

productid: 52235 name: cup price: 12.55

username: matt email: matt@ password: 12345

Buy

username   productid   time
matt       52235       9/5/14
billy      42215       8/7/14
billy      42215       8/7/14

buy time: 9/5/14

What did ‘matt’ buy? Application level join
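The application-level join named on this slide can be sketched in Python. This is only an illustration: plain dictionaries and a list stand in for the Cassandra tables, no database client is involved, and the names `users`, `products`, `buys`, and `products_bought_by` are made up for the example. The rows are taken from the slides.

```python
# In-memory stand-ins for the three tables on the slides (not a Cassandra client).
users = {
    "matt": {"email": "matt@", "password": "12345"},
    "john": {"email": "john@", "password": "qwerty"},
    "billy": {"email": "billy@", "password": "abcde"},
}
products = {
    52235: {"name": "cup", "price": 12.55},
    42215: {"name": "spoon", "price": 7.22},
    24529: {"name": "knife", "price": 5.32},
}
# Buy join table: (username, productid, time)
buys = [
    ("matt", 52235, "9/5/14"),
    ("billy", 42215, "8/7/14"),
]


def products_bought_by(username):
    """Join Buy against Product in application code."""
    if username not in users:
        return []
    # Step 1: read the join table for this user.
    product_ids = [pid for (user, pid, _time) in buys if user == username]
    # Step 2: one extra lookup per product id (the pointer chasing).
    return [products[pid]["name"] for pid in product_ids]


print(products_bought_by("matt"))
```

Every new question about the relationship (most recent purchases, buyers of a product) needs another hand-written function of this kind, which is the productivity cost the talk describes.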

What did ‘matt’ buy?

g.V.has('username','matt').out('buy')
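To make the contrast with the join concrete, here is a minimal in-memory property graph in Python. It is not Titan's implementation or API; the `Graph` class and its `add_vertex`, `add_edge`, `has`, and `out` methods are hypothetical names chosen to mirror the Gremlin query above.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: vertices with properties, labeled edges with properties."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> property dict
        self.out_edges = defaultdict(list)  # vertex id -> [(label, edge props, target id)]
        self._next_id = 0

    def add_vertex(self, **props):
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = props
        return vid

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, props, target))

    def has(self, key, value):
        """Ids of vertices whose property `key` equals `value` (linear scan, no index)."""
        return [vid for vid, props in self.vertices.items() if props.get(key) == value]

    def out(self, vid, label):
        """Ids of vertices reached by following outgoing edges with `label`."""
        return [target for (lbl, _props, target) in self.out_edges[vid] if lbl == label]


g = Graph()
matt = g.add_vertex(username="matt", email="matt@", password="12345")
cup = g.add_vertex(productid=52235, name="cup", price=12.55)
g.add_edge(matt, "buy", cup, time="9/5/14")

# Rough analogue of g.V.has('username','matt').out('buy')
bought = [g.vertices[p]["name"] for v in g.has("username", "matt") for p in g.out(v, "buy")]
print(bought)
```

The relationship is stored once, as an edge, and the query is a traversal over it rather than a join written by hand.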

What did ‘matt’ recently buy?

Application level join

g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV

What did ‘matt’ recently buy?

slow

What did ‘matt’ recently buy?

Rewrite join logic

username   time     productid
matt       9/5/14   52235
billy      8/7/14   42215
billy      8/7/14   42215

What did ‘matt’ recently buy?

CREATE INDEX ON buy edges by time OUT direction

g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
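One way to picture the edge index on this slide is to keep each vertex's outgoing ‘buy’ edges sorted by time, so the most recent ones can be read off one end without sorting at query time. The Python sketch below illustrates that idea only. `EdgeIndex`, `add`, and `most_recent` are hypothetical names, times are written as ISO date strings (an assumption, so that string order matches chronological order), the extra purchases for matt are invented for the example, and nothing here reflects how Titan lays out data in Cassandra.

```python
import bisect
from collections import defaultdict


class EdgeIndex:
    """Per-vertex list of (time, target) pairs kept sorted by time."""

    def __init__(self):
        self._edges = defaultdict(list)  # source vertex -> sorted [(time, target)]

    def add(self, source, time, target):
        # insort keeps the list ordered on every write, paying the cost at insert time.
        bisect.insort(self._edges[source], (time, target))

    def most_recent(self, source, limit=10):
        """Targets of the `limit` newest edges, newest first."""
        newest = self._edges.get(source, [])[-limit:]
        return [target for (_time, target) in reversed(newest)]


index = EdgeIndex()
index.add("matt", "2014-09-05", 52235)
index.add("matt", "2014-08-01", 24529)  # invented purchase
index.add("matt", "2014-09-10", 42215)  # invented purchase

print(index.most_recent("matt", limit=2))
```

The query reads only the tail of one vertex's edge list, so its cost depends on the limit rather than on how many purchases the user has made.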

productid   username   time
52235       matt       9/5/14
42215       billy      8/7/14
42215       billy      8/7/14

Who bought ‘52235’? More application joins

productid   time     username
52235       9/5/14   matt
42215       8/7/14   billy
42215       8/7/14   billy

g.V.has('productid',52235).in('buy')

CREATE INDEX ON buy edges by time IN direction

Who bought ‘52235’?

Product join tables won’t scale

username   productid   time
matt       52235       9/5/14
billy      42215       8/7/14
billy      42215       8/7/14

username   time     productid
matt       9/5/14   52235
billy      8/7/14   42215
billy      8/7/14   42215

username   email    password
matt       matt@    12345
john       john@    qwerty
billy      billy@   abcde

productid   name    price
52235       cup     12.55
42215       spoon   7.22
24529       knife   5.32

productid   username   time
52235       matt       9/5/14
42215       billy      8/7/14
42215       billy      8/7/14

productid   time     username
52235       9/5/14   matt
42215       8/7/14   billy
42215       8/7/14   billy

PARTITION Product Vertices

Token Ring (BOP)

Edge Cut

- Assigns ids to map vertices into “optimal” token range
- Maintains virtual partitions

Vertical Partitioning = divide communities

Vertex Cut

Combined Graph Partitioning
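The two partitioning styles named on these slides can be illustrated with a small Python sketch. In an edge cut, every vertex lives in exactly one partition and edges between partitions are "cut". In a vertex cut, the edges of one heavily connected vertex are spread over several partitions. The functions `count_cut_edges` and `spread_hub_edges`, the partition assignment, and john's purchase are all assumptions made for illustration; this is not Titan's placement algorithm.

```python
from collections import defaultdict


def count_cut_edges(edges, partition_of):
    """Edge cut: each vertex sits in one partition; count edges that cross partitions."""
    return sum(1 for (src, dst) in edges if partition_of[src] != partition_of[dst])


def spread_hub_edges(hub, edges, partition_of):
    """Vertex cut: store each edge of `hub` in the partition of its other endpoint."""
    placement = defaultdict(list)
    for (src, dst) in edges:
        if hub in (src, dst):
            other = dst if src == hub else src
            placement[partition_of[other]].append((src, dst))
    return dict(placement)


# Hypothetical assignment of vertices to two partitions.
partition_of = {"matt": 0, "john": 0, "billy": 1, "cup": 1}
# 'buy' edges from users to one popular product (john's purchase is invented).
edges = [("matt", "cup"), ("john", "cup"), ("billy", "cup")]

print(count_cut_edges(edges, partition_of))
print(spread_hub_edges("cup", edges, partition_of))
```

With the edge cut, two of the three edges cross partitions because the product sits in one place. With the vertex cut, the product's edges are stored next to the users who bought it, at the price of the product vertex spanning both partitions.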

Database

Datastore

Transactions

v = g.V.has('username','matt').has('password','12345')
p = g.V.has('productid',52235)
e = v.addEdge('buy',p)
e.setProperty('time','9/11/2014')
o = g.addVertex([orderid:242343])
o.addEdge('buyer',v)
o.addEdge('product',p)
g.commit()

unit of work

Atomicity Consistency

Isolation Durability

Transaction Consistency

u = g.addVertex([username:'matt'])
p = g.V.has('username','senior')
u.addEdge('father',p)
p.setProperty('surname','Jones')
g.commit()

Locks acquired to ensure consistency constraints are enforced

• Index Uniqueness
• Multiplicity Constraints
• Cardinality Constraints
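A rough Python sketch of the idea on these slides: a transaction buffers its changes as one unit of work, and at commit time a lock is held while a uniqueness constraint is checked and the changes are applied. The classes `Store`, `Transaction`, and `UniqueViolation` are hypothetical, the only constraint modeled is a unique `username`, and the lock is a single in-process `threading.Lock`, which is an assumption that leaves out everything distributed about locking on top of Cassandra.

```python
import threading


class UniqueViolation(Exception):
    """Raised when a commit would break the unique-username constraint."""


class Store:
    """Toy store with one unique index on 'username'."""

    def __init__(self):
        self.vertices = []
        self.usernames = set()
        self.lock = threading.Lock()

    def begin(self):
        return Transaction(self)


class Transaction:
    """Unit of work: changes are buffered and applied together on commit."""

    def __init__(self, store):
        self.store = store
        self.pending = []

    def add_vertex(self, **props):
        self.pending.append(props)
        return props

    def commit(self):
        # Hold the lock while checking and applying, so two commits
        # cannot both claim the same username.
        with self.store.lock:
            names = [p["username"] for p in self.pending if "username" in p]
            if len(names) != len(set(names)) or self.store.usernames & set(names):
                self.pending.clear()
                raise UniqueViolation("username already taken")
            self.store.vertices.extend(self.pending)
            self.store.usernames.update(names)
            self.pending.clear()


store = Store()
tx = store.begin()
tx.add_vertex(username="matt")
tx.commit()

tx2 = store.begin()
tx2.add_vertex(username="matt")
try:
    tx2.commit()
    conflict = False
except UniqueViolation:
    conflict = True

print(len(store.vertices), conflict)
```

The second transaction fails as a whole, so nothing from it becomes visible.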

Polyglot Data Architecture

© Jay Kreps @ LinkedIn

Transaction modifications

logged

Consumers

Titan Event Framework
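The pattern these slides point to, logging transaction modifications so that other systems can consume them, can be sketched as an append-only log with independent readers. The Python below is a minimal illustration under assumptions: `ChangeLog`, `append`, and `poll` are hypothetical names, the consumer names are invented, and the record layout is made up. It does not show Titan's own logging API.

```python
class ChangeLog:
    """Append-only log of committed changes; each consumer tracks its own read offset."""

    def __init__(self):
        self.entries = []
        self.offsets = {}  # consumer name -> index of the next unread entry

    def append(self, change):
        self.entries.append(change)

    def poll(self, consumer):
        """Return entries this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.entries[start:]
        self.offsets[consumer] = len(self.entries)
        return batch


log = ChangeLog()
# A committed transaction appends a description of what it changed.
log.append({"op": "addEdge", "label": "buy", "out": "matt", "in": 52235})

first = log.poll("search-indexer")   # sees the change
second = log.poll("search-indexer")  # nothing new since the last poll
other = log.poll("recommender")      # a different consumer still sees it

print(len(first), len(second), len(other))
```

Because each consumer keeps its own offset, a slow or newly added consumer does not affect the others, which is what makes a log a convenient hand-off point in a polyglot architecture.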

Use Cases

http://arli.us/magazinaluiza

Security

Fraud

http://arli.us/cisco-sec1

© Sean York @ Pearson Education

http://bit.ly/WPTitanSEAGraph

http://arli.us/musicgraphintro

Music Graph

Knowledge Graph

TitanDB.io

Relationships + Cassandra
