Choosing the Right NOSQL Database

Tobias Ivarsson, Hacker @ Neo Technology
twitter: @thobe / @neo4j / #neo4j
email: [email protected]
web: http://neo4j.org/
web: http://thobe.org/

Upload: tobias-lindaaker

Post on 15-Jan-2015


DESCRIPTION

My presentation from JavaOne 2010 on how to

TRANSCRIPT

Page 2: Choosing the right NOSQL database

Image credit: http://browsertoolkit.com/fault-tolerance.png

Page 3: Choosing the right NOSQL database

Image credit: http://browsertoolkit.com/fault-tolerance.png

Page 4: Choosing the right NOSQL database

Image credit: http://browsertoolkit.com/fault-tolerance.png

This is still the view a lot of people have of NOSQL.

Page 5: Choosing the right NOSQL database

The Technologies

๏ Graph Databases - Neo4j
๏ Document Databases - MongoDB
๏ Column Family Databases - Cassandra

Page 6: Choosing the right NOSQL database

Neo4j is a Graph Database


Graph databases FOCUS on the interconnection between entities.

Page 7: Choosing the right NOSQL database

Neo4j --IS_A--> Graph Database

Graph databases FOCUS on the interconnection between entities.

Page 8: Choosing the right NOSQL database

Other Graph Databases

๏ Neo4j
๏ Sones GraphDB
๏ Infinite Graph (by Objectivity)
๏ AllegroGraph (by Franz Inc.)
๏ HypergraphDB
๏ InfoGrid
๏ DEX
๏ VertexDB
๏ FlockDB

Page 9: Choosing the right NOSQL database

Document Databases


Page 10: Choosing the right NOSQL database

Document Databases

๏ MongoDB
๏ Riak
๏ CouchDB
๏ SimpleDB (internal at Amazon)

Page 11: Choosing the right NOSQL database

ColumnFamily DBs


Page 12: Choosing the right NOSQL database

ColumnFamily Databases

๏ Cassandra
๏ BigTable (internal at Google)
๏ HBase (part of Hadoop)
๏ Hypertable

Page 13: Choosing the right NOSQL database

Application 1: Blog system

Page 14: Choosing the right NOSQL database

Requirements for a Blog System

๏ Get blog posts for a specific blog ordered by date
  • possibly filtered by tag
๏ Blogs can have an arbitrary number of blog posts
๏ Blog posts can have an arbitrary number of comments

Page 15: Choosing the right NOSQL database

the choice: Document DB

Page 16: Choosing the right NOSQL database

“Schema” design

๏ Represent each Blog as a Collection of Post documents
๏ Represent Comments as nested documents in the Post documents
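To make the document shape concrete, here is a minimal sketch in plain Python, with a list of dicts standing in for one blog collection. The sample posts, dates, and the helper name `posts_by_date` are illustrative assumptions, not part of the talk; a real document database would run the filter and sort on the server.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for one blog collection: a list of post
# documents, each carrying its comments as nested documents.
my_blog = [
    {
        "title": "JavaOne 2010",
        "pub_date": datetime(2010, 9, 20),  # made-up date
        "tags": ["conference", "java"],
        "comments": [
            {"message": "Nice talk!", "date": datetime(2010, 9, 21)},
        ],
    },
    {
        "title": "Graphs everywhere",
        "pub_date": datetime(2010, 10, 1),  # made-up date
        "tags": ["graphs"],
        "comments": [],
    },
]


def posts_by_date(blog, tag=None):
    """Return posts newest first, optionally keeping only those with `tag`."""
    selected = [p for p in blog if tag is None or tag in p["tags"]]
    return sorted(selected, key=lambda p: p["pub_date"], reverse=True)


titles = [p["title"] for p in posts_by_date(my_blog)]
java_titles = [p["title"] for p in posts_by_date(my_blog, tag="java")]
```

Because comments live inside the post, reading a post with all its comments is a single lookup, which is the point of nesting them.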

Page 17: Choosing the right NOSQL database

Creating a blog post

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Date;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.Mongo;
    // ...
    Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB
    // ...
    DB blogs = mongo.getDB( "blogs" ); // Access the blogs database
    DBCollection myBlog = blogs.getCollection( "myBlog" );

    DBObject blogPost = new BasicDBObject();
    blogPost.put( "title", "JavaOne 2010" );
    blogPost.put( "pub_date", new Date() );
    blogPost.put( "body", "Publishing a post about JavaOne in my MongoDB blog!" );
    blogPost.put( "tags", Arrays.asList( "conference", "java" ) );
    blogPost.put( "comments", new ArrayList<DBObject>() ); // no comments yet

    myBlog.insert( blogPost );

Page 18: Choosing the right NOSQL database

Retrieving posts

    // ...
    import com.mongodb.DBCursor;
    // ...

    public Object getAllPosts( String blogName ) {
        DBCollection blog = db.getCollection( blogName );
        return renderPosts( blog.find() );
    }

    public Object getPostsByTag( String blogName, String tag ) {
        DBCollection blog = db.getCollection( blogName );
        return renderPosts( blog.find( new BasicDBObject( "tags", tag ) ) );
    }

    private Object renderPosts( DBCursor cursor ) {
        // order by publication date (descending)
        cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );
        // ...
    }

Page 19: Choosing the right NOSQL database

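Adding a comment

    DBCollection myBlog = blogs.getCollection( "myBlog" );
    // ...

    void addComment( String blogPostId, String message ) {
        // Assumes "_id" was stored as a String; driver-generated ids are
        // ObjectId values and would need new ObjectId( blogPostId ) here.
        DBCursor posts = myBlog.find( new BasicDBObject( "_id", blogPostId ) );
        if ( !posts.hasNext() ) throw new NoSuchElementException();

        DBObject blogPost = posts.next();

        @SuppressWarnings( "unchecked" )
        List<DBObject> comments = (List<DBObject>) blogPost.get( "comments" );
        comments.add( new BasicDBObject( "message", message )
                .append( "date", new Date() ) );

        myBlog.save( blogPost ); // writes the whole post back, new comment included
    }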

Page 20: Choosing the right NOSQL database

Application 2: Twitter Clone

Page 21: Choosing the right NOSQL database

Requirements for a Twitter Clone

๏ Handle high load - especially high write load
  • Twitter generates 300GB of tweets / hour (April 2010)
๏ Retrieve all posts by a specific user, ordered by date
๏ Retrieve all posts by people a specific user follows, ordered by date

Page 22: Choosing the right NOSQL database

the choice: ColumnFamily DB

Page 23: Choosing the right NOSQL database

Schema design

๏ Main keyspace: “Twissandra”, with these ColumnFamilies:
  • User - user data, keyed by user id (UUID)
  • Username - inverted index from username to user id
  • Friends - who is user X following?
  • Followers - who is following user X?
  • Tweet - the actual messages
  • Userline - timeline of tweets posted by a specific user
  • Timeline - timeline of tweets posted by users that a specific user follows

Page 24: Choosing the right NOSQL database

... that’s a lot of denormalization ...

๏ ColumnFamilies are similar to tables in an RDBMS
๏ Each ColumnFamily can only have one Key
๏ This makes the data highly shardable
๏ Which in turn enables very high write throughput
๏ Note however that each ColumnFamily will require its own writes
  • There are no ACID transactions
  • YOU as a developer are responsible for Consistency!
  • (again, this gives you really high write throughput)
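A minimal sketch of what "each ColumnFamily will require its own writes" means in practice. Two plain dicts stand in for the Friends and Followers ColumnFamilies; the function names and the user ids are hypothetical, and no real Cassandra client is involved.

```python
import time
from collections import defaultdict

# Hypothetical stand-ins for two ColumnFamilies: row key -> {column: value}.
FRIENDS = defaultdict(dict)    # who is user X following?
FOLLOWERS = defaultdict(dict)  # who is following user X?


def follow(user_id, friend_id):
    """Record one logical fact with two independent writes."""
    now = time.time()
    FRIENDS[user_id][friend_id] = now
    FOLLOWERS[friend_id][user_id] = now  # if this write is lost, the data diverges


def inconsistencies():
    """List (user, friend) pairs present in FRIENDS but missing in FOLLOWERS."""
    return [
        (user, friend)
        for user, friends in FRIENDS.items()
        for friend in friends
        if user not in FOLLOWERS.get(friend, {})
    ]


follow("alice", "bob")
FRIENDS["carol"]["bob"] = time.time()  # simulate a half-completed follow
broken = inconsistencies()
```

Nothing in the store ties the two writes together, so a repair check like `inconsistencies()` is the application's job.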

Page 25: Choosing the right NOSQL database

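Create user

    from uuid import uuid4 as uuid  # assumption: any UUID generator fits here

    new_useruuid = str(uuid())

    USER.insert(new_useruuid, {
        'id': new_useruuid,
        'username': username,
        'password': password,
    })
    USERNAME.insert(username, {'id': new_useruuid})

Follow user

    import time

    # One logical change, two writes: one per ColumnFamily
    FRIENDS.insert(useruuid, {frienduuid: time.time()})
    FOLLOWERS.insert(frienduuid, {useruuid: time.time()})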

Page 26: Choosing the right NOSQL database

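Create message

    import struct
    import time

    tweetuuid = str(uuid())
    timestamp = int(time.time() * 1e6)  # microseconds since the epoch

    TWEET.insert(tweetuuid, {
        'id': tweetuuid,
        'user_id': useruuid,
        'body': body,
        '_ts': timestamp,
    })

    # Column name: timestamp packed as a big-endian double; value: the tweet id
    message_ref = {struct.pack('>d', timestamp): tweetuuid}

    USERLINE.insert(useruuid, message_ref)

    # Fan out to the author's own timeline and to each follower's timeline
    TIMELINE.insert(useruuid, message_ref)
    for otheruuid in FOLLOWERS.get(useruuid, column_count=5000):
        TIMELINE.insert(otheruuid, message_ref)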
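Why pack the timestamp as a big-endian double? A short sketch, assuming the ColumnFamily orders column names by comparing raw bytes (an assumption about the schema, not stated on the slide). The timestamps below are made up.

```python
import struct

# Microsecond timestamps as on the slide; the values are made up.
timestamps = [1285000000000000, 1285000000000001, 1284999999999999, 42]

packed = [struct.pack(">d", ts) for ts in timestamps]

# Sorting the raw bytes gives the same order as sorting the numbers.
# This holds for non-negative doubles in big-endian layout.
by_bytes = [struct.unpack(">d", p)[0] for p in sorted(packed)]
by_value = [float(ts) for ts in sorted(timestamps)]
```

With that property, a byte-ordered row of message references is already a chronological timeline, so no sorting is needed at read time.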

Page 27: Choosing the right NOSQL database

Get messages

For all users this user follows

    timeline = TIMELINE.get(useruuid,
                            column_start=start,
                            column_count=NUM_PER_PAGE,
                            column_reversed=True)

    tweets = TWEET.multiget( timeline.values() )

By a specific user

    timeline = USERLINE.get(useruuid,
                            column_start=start,
                            column_count=NUM_PER_PAGE,
                            column_reversed=True)

    tweets = TWEET.multiget( timeline.values() )
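The paging parameters above can be mimicked in plain Python. This is a sketch only: a dict stands in for one Userline row, `get_page` is a hypothetical helper, and treating the start column as inclusive is an assumption about the slice semantics.

```python
import struct

NUM_PER_PAGE = 2


def get_page(row, start=None, count=NUM_PER_PAGE):
    """Return up to `count` (column, value) pairs, newest first.

    `row` maps packed timestamps to tweet ids. `start` is the column to
    begin at (inclusive), mimicking column_start with column_reversed=True.
    """
    names = sorted(row, reverse=True)
    if start is not None:
        names = [n for n in names if n <= start]
    return [(n, row[n]) for n in names[:count]]


# Hypothetical userline: four tweets at made-up timestamps.
userline = {struct.pack(">d", ts): "tweet-%d" % ts for ts in (10, 20, 30, 40)}

first = get_page(userline)
# Fetch one extra column to learn where the next page starts.
probe = get_page(userline, count=NUM_PER_PAGE + 1)
next_start = probe[-1][0]
second = get_page(userline, start=next_start)
```

The tweet ids returned by each page would then go to a multi-key lookup against the Tweet ColumnFamily, as on the slide.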

Page 28: Choosing the right NOSQL database

Application 3: Social Network

Page 29: Choosing the right NOSQL database

Requirements for a Social Network

๏ Interact with friends
๏ Get recommendations for new friends
๏ View the social context of a person, i.e. How do I know this person?

Page 30: Choosing the right NOSQL database

the choice: Graph DB

Page 31: Choosing the right NOSQL database

“Schema” design

๏ Persons represented by Nodes
๏ Friendship represented by Relationships between Person Nodes
๏ Groups represented by Nodes
๏ Group membership represented by Relationship from Person Node to Group Node
๏ Index for Person Nodes for lookup by name
๏ Index for Group Nodes for lookup by name
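The model above can be sketched in a few lines of plain Python. The `Graph` class and its method names are hypothetical stand-ins, not the Neo4j API, and the membership edge below is illustrative.

```python
from collections import defaultdict


class Graph:
    """Minimal property graph: nodes, typed relationships, name indexes."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.rels = defaultdict(list)   # node id -> [(type, other id)]
        self.index = defaultdict(dict)  # index name -> {name: node id}

    def create_node(self, index_name, name, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = dict(props, name=name)
        self.index[index_name][name] = node_id
        return node_id

    def relate(self, a, b, rel_type):
        # Store both directions so lookups can ignore direction.
        self.rels[a].append((rel_type, b))
        self.rels[b].append((rel_type, a))

    def neighbours(self, node_id, rel_type):
        return [other for t, other in self.rels[node_id] if t == rel_type]


g = Graph()
anderson = g.create_node("persons", "Thomas Anderson", age=29)
morpheus = g.create_node("persons", "Morpheus", rank="Captain")
crew = g.create_node("groups", "Nebuchadnezzar crew")

g.relate(anderson, morpheus, "FRIENDSHIP")
g.relate(morpheus, crew, "MEMBERSHIP")  # illustrative membership

found = g.index["persons"]["Morpheus"]
friend_names = [g.nodes[n]["name"] for n in g.neighbours(found, "FRIENDSHIP")]
```

Persons and groups are both just nodes; only the relationship type and the index they are registered in tell them apart.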

Page 32: Choosing the right NOSQL database

A small social graph example

[Figure: a small social graph. Group nodes: Nebuchadnezzar crew, Agent taskforce. Person nodes: Morpheus, Agent Smith, Agent Brown, Cypher, Trinity, Thomas Anderson, Tank, Dozer. Relationship types: FRIENDSHIP, MEMBERSHIP. Relationship qualifiers shown: Family, Lovers.]

Page 33: Choosing the right NOSQL database

Creating the social graph

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.index.IndexService;
    import org.neo4j.index.lucene.LuceneIndexService;
    import org.neo4j.kernel.EmbeddedGraphDatabase;
    // ...
    GraphDatabaseService graphDb = new EmbeddedGraphDatabase(
            GRAPH_STORAGE_LOCATION );
    IndexService indexes = new LuceneIndexService( graphDb );

    Transaction tx = graphDb.beginTx();
    try {
        Node mrAnderson = graphDb.createNode();
        mrAnderson.setProperty( "name", "Thomas Anderson" );
        mrAnderson.setProperty( "age", 29 );
        indexes.index( mrAnderson, "persons", "Thomas Anderson" );

        Node morpheus = graphDb.createNode();
        morpheus.setProperty( "name", "Morpheus" );
        morpheus.setProperty( "rank", "Captain" );
        indexes.index( morpheus, "persons", "Morpheus" );

        Relationship friendship = mrAnderson.createRelationshipTo(
                morpheus, SocialGraphTypes.FRIENDSHIP );

        tx.success();
    } finally {
        tx.finish();
    }

Page 34: Choosing the right NOSQL database

Making new friends

    // (run inside a transaction, as when creating the graph)
    Node person1 = indexes.getSingle( "persons", person1Name );
    Node person2 = indexes.getSingle( "persons", person2Name );

    person1.createRelationshipTo( person2, SocialGraphTypes.FRIENDSHIP );

Joining a group

    // (run inside a transaction, as when creating the graph)
    Node person = indexes.getSingle( "persons", personName );
    Node group = indexes.getSingle( "groups", groupName );

    person.createRelationshipTo( group, SocialGraphTypes.MEMBERSHIP );

Page 35: Choosing the right NOSQL database

How do I know this person?

    Node me = ...
    Node you = ...

    PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath(
            Traversals.expanderForTypes(
                    SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),
            /* maximum depth: */ 4 );

    Path shortestPath = shortestPathFinder.findSinglePath( me, you );

    for ( Node friend : shortestPath.nodes() ) {
        System.out.println( friend.getProperty( "name" ) );
    }
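What a depth-limited shortest-path search does can be sketched as a breadth-first search over an adjacency dict. This is not how Neo4j implements it; `shortest_path` is a hypothetical helper and the friendships are illustrative.

```python
from collections import deque


def shortest_path(friends, start, goal, max_depth=4):
    """Breadth-first search; returns the list of names, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_depth:  # path length counted in relationships
            continue
        for friend in friends.get(path[-1], ()):
            if friend in seen:
                continue
            if friend == goal:
                return path + [friend]
            seen.add(friend)
            queue.append(path + [friend])
    return None


# Illustrative, undirected friendships (each listed in both directions).
friends = {
    "Thomas Anderson": ["Morpheus"],
    "Morpheus": ["Thomas Anderson", "Trinity"],
    "Trinity": ["Morpheus", "Cypher"],
    "Cypher": ["Trinity", "Agent Smith"],
    "Agent Smith": ["Cypher"],
}

path = shortest_path(friends, "Thomas Anderson", "Agent Smith")
too_far = shortest_path(friends, "Thomas Anderson", "Agent Smith", max_depth=3)
```

The depth limit keeps the search bounded: beyond a few hops, "how do I know this person?" stops being a meaningful answer anyway.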

Page 36: Choosing the right NOSQL database

Recommend new friends

    Node person = ...

    TraversalDescription friendsOfFriends = Traversal.description()
            .expand( Traversals.expanderForTypes(
                    SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) )
            .prune( Traversal.pruneAfterDepth( 2 ) )
            .breadthFirst() // Visit my friends before their friends.
            // Visit a node at most once (don’t recommend direct friends)
            .uniqueness( Uniqueness.NODE_GLOBAL )
            .filter( new Predicate<Path>() {
                // Only return friends of friends
                public boolean accept( Path traversalPos ) {
                    return traversalPos.length() == 2;
                }
            } );

    for ( Node recommendation : friendsOfFriends.traverse( person ).nodes() ) {
        System.out.println( recommendation.getProperty( "name" ) );
    }
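The same traversal rules (breadth first, stop after depth 2, visit each node once, keep only depth 2) can be sketched in plain Python. `recommend_friends` is a hypothetical helper and the friendships are illustrative.

```python
from collections import deque


def recommend_friends(friends, person):
    """Return people exactly two FRIENDSHIP hops away, visiting each once."""
    depth = {person: 0}
    queue = deque([person])
    recommendations = []
    while queue:
        current = queue.popleft()
        if depth[current] == 2:
            recommendations.append(current)
            continue  # prune: do not look beyond depth 2
        for friend in friends.get(current, ()):
            if friend not in depth:  # global node uniqueness
                depth[friend] = depth[current] + 1
                queue.append(friend)
    return recommendations


# Illustrative, undirected friendships (each listed in both directions).
friends = {
    "Thomas Anderson": ["Morpheus"],
    "Morpheus": ["Thomas Anderson", "Trinity", "Tank"],
    "Trinity": ["Morpheus", "Cypher"],
    "Tank": ["Morpheus"],
    "Cypher": ["Trinity"],
}

recommended = recommend_friends(friends, "Thomas Anderson")
```

Because breadth-first order reaches direct friends at depth 1 first, the visit-once rule keeps them from showing up again as depth-2 recommendations.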

Page 37: Choosing the right NOSQL database

When to use Document DB (e.g. MongoDB)

๏ When data is collections of similar entities
  • But semi-structured (sparse) rather than tabular
  • When fields in entries have multiple values

Page 38: Choosing the right NOSQL database

When to use ColumnFamily DB (e.g. Cassandra)

๏ When scalability is the main issue
  • Both scaling size and scaling load
    ‣ In particular scaling write load
๏ Linear scalability (as you add servers) both in read and write
๏ Low level - will require you to duplicate data to support queries

Page 39: Choosing the right NOSQL database

When to use Graph DB (e.g. Neo4j)

๏ When deep traversals are important
๏ For complex domains
๏ When how entities relate is an important aspect of the domain

Page 40: Choosing the right NOSQL database

When not to use a NOSQL Database

๏ RDBMSes have been the de facto standard for years, and still have better tools for some tasks
  • Especially for reporting
๏ When maintaining a system that works already
๏ Sometimes when data is uniform / structured
๏ When aggregations over (subsets of) the entire dataset are key
๏ But please don’t use a Relational database for persisting objects

Page 41: Choosing the right NOSQL database

Complex problem? - right tool for each job!

Image credits: Unknown :’(

Page 42: Choosing the right NOSQL database

Polyglot persistence

๏ Use multiple databases in the same system - use the right tool for each part of the system
๏ Examples:
  • Use an RDBMS for structured data and a Graph Database for modeling the relationships between entities
  • Use a Graph Database for the domain model and a Document Database for storing large data objects
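A rough sketch of the first example, assuming SQLite (from the Python standard library) as the RDBMS and a plain dict standing in for the graph database. The table layout, the helper `friends_of`, and every value except Thomas Anderson's age are illustrative.

```python
import sqlite3

# Structured, uniform data goes to the relational store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
db.executemany(
    "INSERT INTO person (id, name, age) VALUES (?, ?, ?)",
    [(1, "Thomas Anderson", 29), (2, "Morpheus", 40), (3, "Trinity", 30)],
)

# Relationships go to the graph store; a dict stands in for it here.
friendships = {1: [2], 2: [1, 3], 3: [2]}


def friends_of(person_id):
    """Traverse in the 'graph', then fetch the records from the RDBMS."""
    ids = friendships.get(person_id, [])
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        "SELECT name FROM person WHERE id IN (%s) ORDER BY name" % placeholders,
        ids,
    )
    return [name for (name,) in rows]


morpheus_friends = friends_of(2)
# Reporting-style aggregation stays in SQL, where the tooling is strongest.
average_age = db.execute("SELECT AVG(age) FROM person").fetchone()[0]
```

The shared key (here the person id) is what links the two stores; keeping it in sync is the application's job.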

Page 43: Choosing the right NOSQL database

http://neotechnology.com

- the Graph Database company