NOSQL DATABASES A comparison between the MongoDB, Cassandra, and Redis databases
OCTOBER 20, 2015 ANDREW HYTE
Contents
Introduction
MongoDB
    History
    Data Model
    Physical Storage
    Transactions
    Scalability
Cassandra
    History
    Data Model
    Physical Storage
    Transactions
    Scalability
Redis
    History
    Data Model
    Physical Storage
    Transactions
    Scalability
Differences
Conclusions
Introduction NoSQL databases are acclaimed by many web developers for their partition tolerance. Most NoSQL
databases have that in common; however, there are many differences in the way they achieve that
scalability and in how much availability or consistency they provide. This report examines the history,
data models, physical storage, transactional capabilities, and scalability of three different types of
NoSQL databases. Pay attention to the differences between the three and the advantages or
disadvantages one affords over another.
MongoDB MongoDB may be the most well-known of all NoSQL databases. This assumption is supported by the fact
that the DB-Engines group has ranked MongoDB as the fourth most popular database
management system overall, led in popularity only by three relational databases (Oracle, MySQL, and
Microsoft SQL Server) (DB-Engines).
History MongoDB was originally created by Eliot Horowitz and Dwight Merriman, two developers who founded
“DoubleClick” (Chodorow). The two left their company, went on to found several other startups, and ran
into the same problem over and over: how to scale out an application? What they decided to do next
was to create a platform as a service similar to Google App Engine. They originally planned to
call the database ED, for Eliot and Dwight. The database was only part of this PaaS package. The system
as a whole was not readily adopted, and the project would have been a flop had it not been for the
database. People were saying things like, “Well, the database is cool, but blech, app engine” (Chodorow).
When the owners recognized what they had, they decided to strip out the database, name it Mongo
(from “humongous”), and open source it. The database quickly started gaining traction and soon
had many developers not only using it but also contributing to the project and creating their own
versions. Today there are countless projects which originated from MongoDB such as: Casbah, Morphia,
MongoMapper, Mongoose, CandyGram, MongoKit, Mongoid, and Ming, to name a few.
Data Model MongoDB stores data as documents in the Binary JSON (BSON) format instead of the tables and rows used in
a traditional relational database (MongoDB). The document data model works well
for most modern software applications built on the object oriented programming paradigm. Because the
document data model is lightweight, traversable, and fast, MongoDB supports rich queries along
with the ability to uniquely index data. Indexes support the efficient traversal of collections when
querying data. If no indexes are defined, Mongo must scan every document in a
collection rather than only those matched by the index. One of the most interesting ways Mongo has
made it possible to index and query data is with geospatial indexes. There is a built in 2dsphere
index which allows data to be indexed in relation to a sphere, such as the earth. This makes
geospatial queries exceptionally quick. MongoDB uses BSON to build its document data
structure; BSON builds on JSON to include some extra types and to provide efficient encoding and
decoding of data in different languages.
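To make the index discussion concrete, the following is a pure-Python sketch (not MongoDB's actual storage engine) of a collection of JSON-like documents and a single-field index; the documents, field names, and helper functions are all hypothetical illustrations of why an indexed lookup avoids a full collection scan.

```python
# A "collection" is a list of JSON-like documents (Python dicts).
collection = [
    {"_id": 1, "name": "Alice", "city": "Provo", "tags": ["nosql", "db"]},
    {"_id": 2, "name": "Bob", "city": "Orem"},
    {"_id": 3, "name": "Carol", "city": "Provo"},
]

def full_scan(docs, field, value):
    """Without an index, every document must be examined."""
    return [d for d in docs if d.get(field) == value]

def build_index(docs, field):
    """A hypothetical single-field index: value -> list of matching documents."""
    index = {}
    for d in docs:
        index.setdefault(d.get(field), []).append(d)
    return index

city_index = build_index(collection, "city")
# The indexed lookup touches only the matching documents, not all of them.
assert city_index["Provo"] == full_scan(collection, "city", "Provo")
```

The same idea scales down from millions of documents to three: the index trades extra storage and write-time bookkeeping for query-time traversal of only the relevant documents.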
Physical Storage Thanks to Mongo’s ability to easily shard a database into many sections, a database is not limited
by the physical storage of a single machine. One could have a 1 TB database on a single machine or just as easily have
four 250 GB shards distributed across multiple machines. This report discusses sharding further in the
Scalability section.
One topic to be discussed in physical storage is replication. Mongo can replicate its data easily, creating
multiple servers that hold the same data. If needed, one of these servers can become the primary
server. Mongo allows this to happen easily with its implementation of replica set elections. The primary
in a set is the only database that can accept write operations. If a primary member of the set becomes
unavailable, elections make it possible to resume normal operation without requiring the intervention
of a DBA.
Figure 1: If a member fails, an election is held to change the primary. https://docs.mongodb.org/manual/core/replica-set-elections/
Elections take some time and don’t allow for writes during the process; for these reasons, Mongo avoids
holding elections unless absolutely necessary (MongoDB).
The following are some of the factors which drive elections. The replica members send heartbeats
(pings) to each other every two seconds; if a heartbeat is not returned within 10 seconds, the delinquent
server is marked as unavailable. Priorities may be set on a replica member: if the highest priority
member is already the primary, then no election will be held, and members with a priority of zero cannot
become primary and are not considered in elections. A member must be able to connect to a
majority of the other members in order to be eligible to become primary. If no member can
connect to a majority of the replica set, then no primary will be elected.
The following are a few of the events which may trigger an election: if there is not currently a primary,
an election will be held. This may occur if a new replica set has been added, a secondary loses
connection with a primary, or a primary steps down. Primaries will step down if they are asked
specifically with a command, if another member has a higher priority and is eligible to be a primary, or if
the primary loses contact with the majority of the group.
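The eligibility rules above can be sketched in a few lines of Python. This is a simplified illustration, not MongoDB's actual election protocol (which involves terms, voting rounds, and more state); the member names, priorities, and connectivity map are hypothetical.

```python
def eligible_primaries(members, can_reach):
    """Return members that could become primary: nonzero priority, and able
    to reach a majority of the set (counting themselves)."""
    majority = len(members) // 2 + 1
    out = []
    for name, priority in members.items():
        if priority == 0:
            continue  # zero-priority members are never considered
        reachable = 1 + sum(1 for other in can_reach.get(name, ()) if other in members)
        if reachable >= majority:
            out.append(name)
    return out

members = {"A": 2, "B": 1, "C": 0}                 # name -> priority
can_reach = {"A": {"B"}, "B": {"A"}, "C": set()}   # after a network partition
# C has priority 0 and is cut off; A and B can each reach a majority of 3.
assert eligible_primaries(members, can_reach) == ["A", "B"]
```

In a real replica set, the eligible member with the highest priority would then win the vote and begin accepting writes.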
Elections are one way MongoDB makes it practical to distribute a database across many different
machines while still using replication. A large benefit is that this functionality is already integrated. The
operational overhead of MongoDB is very light, since much of the needed tooling is already built in,
such as analytics, text search, geospatial indexing, in-memory performance, and global replication (MongoDB).
Transactions Mongo doesn’t advertise that it can do transactions. It does, however, offer a “transaction-like”
operation which makes a series of writes conditional on the success of all of those writes. It is called
“transaction-like” because intermediate processes can still return data while the
transaction is being committed. This process is known as the two-phase commit. Two-phase
commits allow data to be written to multiple documents while still allowing the data to be
recovered, should an error occur. There are various transaction commands which give the user full
control over the order of the data write process. The most important part of transactions is not the
write syntax, but what happens in the case there is an error in the procedure. Mongo offers different
“states” which refer to the steps in the transaction process; most notably there are the “Applied” and
the “Pending” states. For each state in the transaction process, if an error occurs, there are certain
operations which may be used to revert to a previous state, start a transaction over, or even to
“Rollback” or undo an applied action.
Even though MongoDB does not offer true relational-database-style transactions, the technology does a
good job of making up for it. Two-phase commits are a useful workaround, and there is plenty of
documentation available for anyone who wishes to explore the functionality further.
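The state progression described above (pending, applied, done) can be sketched with plain dicts standing in for MongoDB documents. This is a hedged illustration of the pattern only: the account names and amounts are hypothetical, and a real implementation uses update operations with `$inc` and per-document `pendingTransactions` bookkeeping so a crash at any step is recoverable.

```python
# Two "account" documents, each tracking transactions still pending on it.
accounts = {"A": {"balance": 100, "pending": []},
            "B": {"balance": 50, "pending": []}}

def two_phase_transfer(accounts, src, dst, amount):
    txn = {"state": "initial", "src": src, "dst": dst, "amount": amount}
    txn["state"] = "pending"              # 1. mark the transaction pending
    accounts[src]["balance"] -= amount    # 2. apply to each document, recording
    accounts[src]["pending"].append(txn)  #    the txn so an error here leaves a
    accounts[dst]["balance"] += amount    #    trail that can be rolled back or
    accounts[dst]["pending"].append(txn)  #    resumed
    txn["state"] = "applied"              # 3. all writes are applied
    accounts[src]["pending"].remove(txn)  # 4. clean up the bookkeeping
    accounts[dst]["pending"].remove(txn)
    txn["state"] = "done"
    return txn

txn = two_phase_transfer(accounts, "A", "B", 30)
assert txn["state"] == "done"
assert accounts["A"]["balance"] == 70 and accounts["B"]["balance"] == 80
```

The value of the pattern is entirely in step 2's bookkeeping: if the process dies mid-transfer, a recovery job can find the pending transaction and either finish applying it or undo it.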
Scalability
Figure 2: Sharding and Replication. http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf
Historically, scaling has mostly happened in a vertical manner, meaning that when an
application needed more memory or storage space, an upgrade to the database server would be
performed: resources such as disks, RAM, or processors could be added, replaced, or
upgraded. Mongo’s approach to scaling, like that of most other NoSQL databases, is horizontal scaling, or
sharding. With Mongo, a user can set up the database to shard automatically.
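The core mechanic of auto-sharding can be sketched as chunk splitting: when a contiguous range of the shard key grows past a size threshold, it is split in half and one half can migrate to another shard, keeping the cluster balanced without operator intervention. The key ranges, sizes, and threshold below are hypothetical and far smaller than real chunk sizes.

```python
def split_chunk(chunk):
    """Split a (lo, hi, size_mb) chunk of the shard-key range at its midpoint,
    dividing its data evenly between the two halves."""
    lo, hi, size = chunk
    mid = (lo + hi) // 2
    return (lo, mid, size // 2), (mid, hi, size - size // 2)

MAX_CHUNK = 64           # pretend the chunk-size threshold is 64 MB
chunk = (0, 1000, 100)   # key range [0, 1000) currently holding 100 MB

if chunk[2] > MAX_CHUNK:                # too big: split it
    left, right = split_chunk(chunk)
assert left == (0, 500, 50) and right == (500, 1000, 50)
```

After a split, a balancer process can move one of the resulting chunks to a less-loaded shard; clients keep routing by key range and never need to know a migration happened.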
Cassandra
History Avinash Lakshman invented Cassandra at Facebook. The project was started during one of
Facebook’s hackathons in 2007, with the main goal of helping query the massive amounts of data the
company was dealing with, particularly in users’ inboxes. The project was released to the open source
community in July of 2008, and by February 2010 it was considered an Apache top-level project. The
software is speculated to have been named after the Greek mythological prophet Cassandra. The myth goes
that the princess of Troy was given prophetic powers by Apollo, who wished for something in return. When
he did not get what he wanted, he cursed her so that no one would ever believe her word again.
An astute blogger at Kellabyte.com points out that the creators at Facebook may have put a little more
thought into the name than just a cool Greek myth: “Cassandra is the cursed Oracle” (Kellabyte).
Data Model
The data model is meant to be somewhat familiar to traditional RDBMS users. An instance of Cassandra
has one table which is made up of multiple column families as defined by the user. Each column family
can contain one of two structures: super columns or columns, and there is no limit on the number of these
that can be stored in a column family. Columns have a name, a value, and a user defined timestamp
associated with them. The number of columns allowed in a column family is very large. Super
columns are a data structure with a name and an unbounded number of columns associated with
them; otherwise they exhibit the same characteristics as columns. Column families are made up of rows, and
each row is made up of columns and has a primary key, the first part of which is the partition key. This is
where things become interesting, in the way the database is distributed, which is addressed further in the
Scalability section of this report. Every row is uniquely identified by its partition key, which is a string
with no limit on its size. All rows are distributed across the cluster based on the hashed value of that key.
One feature which should be mentioned is CQL, the Cassandra Query Language, which is very familiar to users of SQL.
For example, a CREATE TABLE statement looks like this:
CREATE TABLE tweets (
tweet_id uuid PRIMARY KEY,
author varchar,
body varchar
);
Or adding a new column in a table:
ALTER TABLE users ADD birth_date INT;
This may raise the question: how is Cassandra different from a relational database? The key is
in the way it allocates storage for a column. In a traditional RDBMS, each row reserves storage space
for every column it is associated with, even if nothing is populated in that column for a particular
entry.
Figure 3: In a static-column storage engine, each row must reserve space for every column
Figure 4: In a sparse-column engine, space is only used by columns present in each row
In Cassandra a row is sparse, meaning only columns which have data are stored. In this way Cassandra
affords its users the flexibility normally associated with a schemaless system like MongoDB, while also
providing the benefits of a defined schema as RDBMS typically have. This also means that Cassandra
can easily support thousands of columns per table without wasting space if each row only needs a few of
them (Ellis).
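The contrast between Figures 3 and 4 can be shown with two dict-based sketches of the same logical row; the column names and values are hypothetical, and real storage engines of course work at the byte level rather than with Python dicts.

```python
# The declared schema names five columns.
schema = ["id", "name", "email", "phone", "twitter"]

# Static-column engine (Figure 3): every row reserves a slot per column,
# storing an explicit null for anything unpopulated.
static_rows = [
    {"id": 1, "name": "jbellis", "email": None, "phone": None, "twitter": None},
]

# Sparse-column engine (Figure 4): absent columns simply are not stored.
sparse_rows = [
    {"id": 1, "name": "jbellis"},
]

# Both encode the same logical row, but the sparse form stores 2 cells
# instead of 5 — the saving grows with the number of declared columns.
assert len(sparse_rows[0]) == 2 and len(static_rows[0]) == len(schema)
```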
Physical Storage Nodes make up the basic infrastructure of Cassandra. A data center is a collection of nodes, and these
data centers can be either physical or virtual. A cluster contains one or more data centers and
may be distributed over multiple physical locations.
Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure.
One of the biggest advantages of Cassandra is that servers do not depend on each other to a
degree that would cause cascading failures if one server lost connection with another. Creator Avinash
Lakshman described the problem that led to Cassandra as a fragile system with
too many points of failure: Facebook had a lot of data sitting on a lot of servers, which
created a sort of “house of cards” effect, and when one server went down, it caused big issues for the
system as a whole. With Cassandra, data can be distributed across many systems in such a way that one
server’s failure, which inevitably happens, has only the smallest impact on the entire application.
Transactions Cassandra doesn’t use ACID transactions with rollback mechanisms, but instead offers atomic, isolated,
and durable transactions with eventual consistency, and it allows the user to decide
how strong or eventual they would like a transaction’s consistency to be. Atomicity means that everything in a
transaction succeeds or else the entire transaction fails; transactions cannot interfere with
each other; and completed transactions persist in the event of crashes or failure. Lightweight
transactions can be used in INSERT and UPDATE statements using the IF clause in CQL. For example:
INSERT INTO USERS (login, email, name, login_count)
values ('jbellis', '[email protected]', 'Jonathan Ellis', 1)
IF NOT EXISTS
Or
UPDATE users
SET reset_token = null, password = 'newpassword'
WHERE login = 'jbellis'
IF reset_token = 'some-generated-reset-token'
In these cases the preceding commands will only take place if the IF condition is met.
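The IF clause above is a compare-and-set: the write happens only when the current value still matches the expected one. A minimal sketch of that semantics over a dict-backed "row" follows; the column names and token values are hypothetical, and real lightweight transactions run a Paxos round among replicas rather than a local check.

```python
def update_if(row, expected, updates):
    """Apply updates only if every (column, value) in expected matches the
    row's current contents; return whether the write was applied."""
    if all(row.get(col) == val for col, val in expected.items()):
        row.update(updates)
        return True      # the "[applied] = true" case in CQL terms
    return False

user = {"login": "jbellis", "reset_token": "tok-123", "password": "old"}

# First reset succeeds: the stored token matches the one presented.
assert update_if(user, {"reset_token": "tok-123"},
                 {"password": "newpassword", "reset_token": None})
assert user["password"] == "newpassword"

# Replaying the same (now stale) token no longer matches, so nothing happens.
assert not update_if(user, {"reset_token": "tok-123"}, {"password": "evil"})
```

This is exactly why the reset-token example is safe against replays: the condition and the write are a single atomic step, so two racing requests cannot both succeed.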
Scalability Cassandra is designed to handle large amounts of data across multiple nodes with no single point of
failure. The architecture of Cassandra takes into account that system failures can and will happen. To
remedy this problem, the system employs a peer to peer distributed system. According to a post by the
creator of the database on Facebook in August 2008 (close to when the technology was first developed),
Facebook was using Cassandra for its Inbox search system and had scaled to a cluster of 600+ cores and
120+ TB of disk space. In this same post on Facebook Avinash Lakshman says,
“Reliability at massive scale is a very big challenge. Outages in the service can have
significant negative impact. Hence Cassandra aims to run on top of an infrastructure of
hundreds of nodes (possibly spread across different datacenters). At this scale, small and
large components fail continuously; the way Cassandra manages the persistent state in the
face of these failures drives the reliability and scalability of the software systems relying on
this service.”
Data is replicated across systems, and eventual consistency is the mantra of the system: a
user may write data to any one of the nodes in the cluster, and the changes are eventually pushed
out to the rest of the nodes via peer to peer communication. Consistency is one of the
tradeoffs Cassandra makes in order to achieve high availability and partition tolerance. An advantage which
stems from this tradeoff, however, is that the system scales incrementally: growing the cluster
can be as easy as dropping in a new node and having it automatically initialized with data.
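The peer-to-peer placement described above can be sketched as a ring: hash the partition key to pick a home node, then take the next nodes around the ring as replicas. This is an illustrative simplification with hypothetical node names and replication factor; Cassandra itself assigns token ranges to nodes rather than using simple modulo placement.

```python
import hashlib

nodes = ["node-a", "node-b", "node-c", "node-d"]

def replicas_for(partition_key, replication_factor=3):
    """Hash the key to a starting position, then walk the ring to collect
    the replica set."""
    h = int(hashlib.sha1(partition_key.encode()).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

placement = replicas_for("inbox:user42")
assert len(set(placement)) == 3              # three distinct replicas
assert placement == replicas_for("inbox:user42")  # placement is deterministic
```

Because any node can compute this placement from the key alone, a write can be accepted anywhere and routed to the right replicas, and losing one of the three replicas still leaves two copies serving the data.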
Redis
History Redis (“REmote DIctionary Server”) is a key value database originally developed by an Italian
software engineer named Salvatore Sanfilippo (Russo). While working at a company he had started, Sanfilippo
developed an application called LLOOGG which allowed a developer to see, in real time, who was accessing his
site and what actions they were taking. With the rapid rate and large amount of data coming
into the application, there was no way his original implementation
using MySQL could keep up and scale to its needs. So in early 2009, Sanfilippo started working on
Redis to take care of those scalability needs. By June 2009, Redis was released as the production
database for LLOOGG. After this initial release, Redis became a hit in the NoSQL community. Sanfilippo
added features quickly and was always helping resolve database corruption bugs and other Redis
related issues. In March of 2010, Sanfilippo was hired by VMware to work full time on Redis, even
though it was BSD licensed open source.
Data Model The data model used in Redis is very familiar to computer scientists. A programmer uses strings, lists,
sets, sorted sets, and hashes on a regular basis, and these are exactly the types of data that can be stored in a
Redis database. Regardless of the data being stored, it is always identified by a key, and that key is
always a string.
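The type mapping can be sketched with a toy in-memory store where every value is reached through a string key; the keys and values are hypothetical stand-ins for what the real SET, LPUSH, SADD, ZADD, and HSET commands would create.

```python
store = {}
store["page:title"] = "hello"                         # string
store["recent:visitors"] = ["carol", "bob", "alice"]  # list (newest first)
store["online:users"] = {"alice", "bob"}              # set
store["leaderboard"] = {"alice": 120.0, "bob": 95.5}  # sorted set (member -> score)
store["user:1"] = {"name": "Alice", "visits": "3"}    # hash (field -> value)

# Sorted-set members come back ordered by score, as ZRANGE would return them.
ranked = sorted(store["leaderboard"], key=store["leaderboard"].get)
assert ranked == ["bob", "alice"]

# Whatever the value type, the key identifying it is always a string.
assert all(isinstance(k, str) for k in store)
```

Choosing the right value type is most of Redis data modeling: the list gives cheap push/pop at the ends, the set gives membership tests, and the sorted set keeps members permanently ordered by score.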
Physical Storage The way that Redis can perform 100,000+ SETs and 80,000+ GETs per second is by requiring the entire
dataset to be loaded into memory at all times (Russo). This may be argued to be one of the main
disadvantages of using Redis, because the amount of RAM needed is proportional to the size of the data
set, and while RAM is very fast, it is also very expensive.
Replication is available in Redis through a master and slave topology. There is one master, which may
have any number of slaves, and each slave can in turn have as many slaves of its own as desired. This allows for
many different server configurations and a great deal of customization. When a slave is initialized, it subscribes as a
slave of another member of the topology. After the initialization process, the slave is given a snapshot
of the current master and is then notified of all commands the master receives after initiating that
snapshot.
Data persistence is achieved in various ways. If data durability is not of great importance, then the
snapshot technique is recommended: a snapshot of the entire data set is taken every x
seconds and written to disk. This operation has been optimized to use at most 2x the memory
needed for the entire set. If durability is desired, then the append-only file method is suggested. This
method appends each write command to a file on disk; upon server failure and restart, the entire file is
simply replayed to rebuild the dataset in active memory. The synchronization process may be configured
to run with every command, every second, or whenever the OS decides to sync.
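The append-only idea can be sketched in a few lines: log every write before applying it, and rebuild the dataset after a crash by replaying the log from the start. The command tuples here are a hypothetical format, not the real AOF encoding.

```python
log = []

def execute(data, command, log):
    """Append the write command to the AOF, then apply it in memory."""
    log.append(command)
    op, key, *rest = command
    if op == "SET":
        data[key] = rest[0]
    elif op == "DEL":
        data.pop(key, None)

def replay(log):
    """After a restart, rebuild the in-memory dataset from the log."""
    data = {}
    for op, key, *rest in log:
        if op == "SET":
            data[key] = rest[0]
        elif op == "DEL":
            data.pop(key, None)
    return data

data = {}
execute(data, ("SET", "a", "1"), log)
execute(data, ("SET", "b", "2"), log)
execute(data, ("DEL", "a"), log)
assert replay(log) == data == {"b": "2"}
```

The durability knob mentioned above is just how often the log is fsynced to disk: per command is safest and slowest, per second loses at most a second of writes, and OS-decided is fastest.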
Transactions MULTI, EXEC, DISCARD, and WATCH are the commands most often associated with transactions in Redis.
A user may queue up multiple commands using MULTI; instead of executing these commands immediately, Redis
queues them, and all queued commands are run once EXEC is called. A user may call DISCARD to flush
the transaction queue and exit the transaction. If an error occurs while a queued command executes,
however, this will not stop the execution of the other commands in the transaction. In order to keep
commands fast, there are no rollback capabilities in Redis.
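The queue-then-execute behavior, including the fact that a failing command does not abort the rest, can be sketched as follows. This is an illustration of the semantics only, with hypothetical commands expressed as Python callables; the real server parses and type-checks commands at queue time as well.

```python
class TinyTxn:
    """A toy model of Redis MULTI/EXEC/DISCARD over a dict-backed store."""

    def __init__(self, data):
        self.data, self.queue = data, []

    def multi_queue(self, fn):   # MULTI ... : commands are queued, not run
        self.queue.append(fn)

    def discard(self):           # DISCARD: drop the queue, exit the txn
        self.queue = []

    def execute(self):           # EXEC: run everything that was queued
        results = []
        for fn in self.queue:
            try:
                results.append(fn(self.data))
            except Exception as e:   # an error does not abort the rest,
                results.append(e)    # and nothing is rolled back
        self.queue = []
        return results

txn = TinyTxn({"counter": 0})
txn.multi_queue(lambda d: d.__setitem__("counter", d["counter"] + 1))
txn.multi_queue(lambda d: d["missing"])   # this command raises KeyError
txn.multi_queue(lambda d: d.__setitem__("counter", d["counter"] + 1))
results = txn.execute()
assert txn.data["counter"] == 2           # the later command still ran
assert isinstance(results[1], KeyError)   # the error is reported in place
```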
Scalability Since data is stored as key value pairs, Redis makes it very easy to partition the data set and distribute
it over multiple computers. Because Redis is an in-memory database, the overall possible size of a Redis
instance depends on how much RAM is available. Partitioning and distributing to multiple computers
adds more resources and therefore increases the overall capacity to the total amount of RAM in the
cluster.
Redis is architected in a way that allows multiple choices of partitioning strategy. One of the
more useful strategies is called hash partitioning: the key name is hashed according to
some hash function, and the hash is then taken modulo the number of computers in
the Redis cluster. The resulting number tells the program which computer the key value pair should
be stored on.
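The strategy above fits in a few lines. The node names are hypothetical, and CRC32 here stands in for whatever hash function the client chooses.

```python
import zlib

cluster = ["redis-0", "redis-1", "redis-2"]

def node_for(key, nodes=cluster):
    """Hash the key name, take it modulo the cluster size, and use the
    result as an index into the node list."""
    return nodes[zlib.crc32(key.encode()) % len(nodes)]

assert node_for("user:1000") in cluster
# The same key always maps to the same node...
assert node_for("user:1000") == node_for("user:1000")
# ...but note that changing the cluster size remaps most keys, which is
# the classic drawback of naive modulo partitioning.
```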
Figure 5: Redis is easily partitioned thanks to the Key Value data model.
As an example of how Redis scales, we can observe what Twitter has done with it. In 2014, according
to highscalability.com, the timeline feature of Twitter alone was using around 40 TB of RAM. The Redis
deployment running the timeline feature handled over 30 million queries per second across more than 6,000
instances (Hoff).
Differences The main differences between the three databases compared in this report are in the data models and
the amount of consistency vs. availability afforded.
MongoDB is an example of a document database where the main advantage is in the flexibility of the
data schema. As long as the data can be represented as a JSON object it can be stored in the database.
The superior sharding and redundancy of Mongo allows for pretty good consistency while maintaining
scalability.
Cassandra is an example of a table database and is best used in situations where the stakes are high and
data must always be available. The peer to peer architecture between nodes allows for high availability
and ensures that there is no single point of failure.
Redis is built for speed. With great speed comes great resource demands, since this key value store
requires that the entire data set be loaded in RAM at all times. The nature of a key value
database allows for easy partitioning, and Redis generally values consistency over availability.
Conclusions
One of the greatest reasons to choose a specific technology may be the amount of documentation there
is on that technology. Not only does this help with learning, but it can also be a great advantage in using
the technology in the most effective way possible. Although all three technologies considered in this
report have pretty good documentation on their respective websites, at this point in time MongoDB has
a humongous advantage in community documentation over the other two.
All three technologies have their own scaling options and handle them relatively well, so a technology
decision for me would depend on the use case. I would choose:
Redis as a very interesting option for a system which is projected to grow quickly and needs to make
exceptionally fast reads and writes.
Mongo for a quick project which may not be completely thought through, due to the flexibility it
affords. The simplicity of the BSON format would be useful for the same reason. I realize that this way of
thinking may cause problems down the road, but who are we kidding: as developers, sometimes we just
want to quickly throw something together.
Cassandra for extremely large data sets which may have lots of different connections, with lots of reads
and writes happening at any given time. Cassandra is also useful in situations where I need to be able to
run complex queries on the data and get my results quickly. I would not use Cassandra in applications
such as banking software where consistency is paramount, since the system runs with the mantra that
eventual consistency is good enough.
References
Chodorow, Kristina. http://www.kchodorow.com/blog/2010/08/23/history-of-mongodb/. 23 Aug. 2010. Accessed 18 Oct. 2015.
DB-Engines. http://db-engines.com/en/. n.d. Accessed 10 Oct. 2015.
Ellis, Jonathan. http://www.datastax.com/dev/blog/schema-in-cassandra-1-1. 15 Feb. 2012. Accessed 20 Oct. 2015.
Hoff, Todd. http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html. 8 Sep. 2014. Accessed 20 Oct. 2015.
Kellabyte. http://kellabyte.com/2013/01/04/the-meaning-behind-the-name-of-apache-cassandra/. 4 Jan. 2013. Accessed 20 Oct. 2015.
MongoDB. http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf. n.d. Accessed 19 Oct. 2015.
—. https://docs.mongodb.org/manual/core/replica-set-elections/. n.d. Accessed 19 Oct. 2015.
—. https://www.mongodb.com/json-and-bson. n.d. Accessed 18 Oct. 2015.
Russo, Michael. http://blog.mjrusso.com/2010/10/17/redis-from-the-ground-up.html#heading_toc_j_0. 17 Oct. 2010. Accessed 19 Oct. 2015.