NOSQL DATABASES A comparison between the MongoDB, Cassandra, and Redis databases
OCTOBER 20, 2015 ANDREW HYTE
Contents
Introduction
MongoDB
    History
    Data Model
    Physical Storage
    Transactions
    Scalability
Cassandra
    History
    Data Model
    Physical Storage
    Transactions
    Scalability
Redis
    History
    Data Model
    Physical Storage
    Transactions
    Scalability
Differences
Conclusions
Introduction NoSQL databases are acclaimed by many web developers for their partition tolerance. Most NoSQL
databases have that in common; however, there are many differences in the way they achieve that
scalability and in how much availability or consistency they provide. This report examines the history,
data models, physical storage, transactional capabilities, and scalability of three different types of
NoSQL databases. Pay attention to the differences between the three and the advantages or
disadvantages one affords over another.
MongoDB MongoDB may be the most well-known of all NoSQL databases. This assumption is supported by the fact
that the DB-Engines group has ranked MongoDB as the fourth most popular database
management system overall, led in popularity only by three relational databases (Oracle, MySQL, and
Microsoft SQL Server) (DB-Engines).
History MongoDB was originally created by Eliot Horowitz and Dwight Merriman, two developers who founded
“DoubleClick” (Chodorow). The two left their company, went on to found several other startups, and ran
into the same problem over and over: how to scale out an application? What they decided to do next
was to create a platform as a service similar to Google App Engine. They originally planned to
call the database ED, for Eliot and Dwight. The database was only part of this PaaS package. The system
as a whole was not readily adopted, and the project would have been a flop had it not been for the
database. People were saying things like, “Well, the database is cool, but blech, app engine” (Chodorow).
When the owners recognized what they had, they decided to strip out the database, name it Mongo
(from “humongous”), and open source it. The database quickly started gaining traction and soon
had many developers not only using it but also contributing to the project and creating their own
versions. Today there are countless projects which originated from MongoDB such as: Casbah, Morphia,
MongoMapper, Mongoose, CandyGram, MongoKit, Mongoid, and Ming, to name a few.
Data Model MongoDB stores data as documents in the Binary JSON (BSON) format instead of the tables and rows used in
a traditional relational database (MongoDB). The document data model works well
for most modern software applications built on the object oriented programming paradigm. Because the
document data model is lightweight, traversable, and fast, MongoDB supports rich queries along
with the ability to uniquely index data. Indexes support the efficient traversal of collections when
querying data. If no indexes are defined, Mongo must scan every document in a
collection rather than only those matched by the index. One of the most interesting ways Mongo has
made it possible to index and query data is with geospatial indexes. There is a built in 2dsphere
index which allows data to be indexed in relation to a sphere, such as the earth. This makes
geospatial queries exceptionally quick. MongoDB uses BSON to build its document data
structure; BSON builds on JSON to include some extra types and to provide efficient encoding and
decoding of data in different languages.
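To make the index discussion concrete, the following is a pure-Python sketch (not MongoDB's actual storage engine) of a collection of JSON-like documents and a single-field index; the documents, field names, and helper functions are all hypothetical illustrations of why an indexed lookup avoids a full collection scan.

```python
# A "collection" is a list of JSON-like documents (Python dicts).
collection = [
    {"_id": 1, "name": "Alice", "city": "Provo", "tags": ["nosql", "db"]},
    {"_id": 2, "name": "Bob", "city": "Orem"},
    {"_id": 3, "name": "Carol", "city": "Provo"},
]

def full_scan(docs, field, value):
    """Without an index, every document must be examined."""
    return [d for d in docs if d.get(field) == value]

def build_index(docs, field):
    """A hypothetical single-field index: value -> list of matching documents."""
    index = {}
    for d in docs:
        index.setdefault(d.get(field), []).append(d)
    return index

city_index = build_index(collection, "city")
# The indexed lookup touches only the matching documents, not all of them.
assert city_index["Provo"] == full_scan(collection, "city", "Provo")
```

The same idea scales down from millions of documents to three: the index trades extra storage and write-time bookkeeping for query-time traversal of only the relevant documents.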
Physical Storage Thanks to Mongo’s ability to easily shard a database into many sections, a database is not limited
by the physical storage of a single machine. One could have a 1 TB database on a single machine or just as easily have
four 250 GB shards distributed across multiple machines. This report discusses sharding further in the
Scalability section.
One topic to be discussed in physical storage is replication. Mongo can replicate its data easily, creating
multiple servers that hold the same data. If needed, one of these servers can become the primary
server. Mongo allows this to happen easily with its implementation of replica set elections. The primary
in a set is the only database that can accept write operations. If a primary member of the set becomes
unavailable, elections make it possible to resume normal operation without requiring the intervention
of a DBA.
Figure 1: If a member fails, an election is held to change the primary. https://docs.mongodb.org/manual/core/replica-set-elections/
Elections take some time and don’t allow for writes during the process; for these reasons, Mongo avoids
holding elections unless absolutely necessary (MongoDB).
The following are some of the factors which drive elections. The replica members send heartbeats
(pings) to each other every two seconds; if a heartbeat is not returned within 10 seconds, the delinquent
server is marked as unavailable. Priorities may be set on a replica member: if the highest priority
member is already the primary, then no election will be held, and members with a priority of zero cannot
become primary and are not considered in elections. A member must be able to connect to a
majority of the other members in order to be eligible to become primary. If no member can
connect to a majority of the replica set, then no primary will be elected.
The following are a few of the events which may trigger an election: if there is not currently a primary,
an election will be held. This may occur if a new replica set has been added, a secondary loses
connection with a primary, or a primary steps down. Primaries will step down if they are asked
specifically with a command, if another member has a higher priority and is eligible to be a primary, or if
the primary loses contact with the majority of the group.
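The eligibility rules above can be sketched in a few lines of Python. This is a simplified illustration, not MongoDB's actual election protocol (which involves terms, voting rounds, and more state); the member names, priorities, and connectivity map are hypothetical.

```python
def eligible_primaries(members, can_reach):
    """Return members that could become primary: nonzero priority, and able
    to reach a majority of the set (counting themselves)."""
    majority = len(members) // 2 + 1
    out = []
    for name, priority in members.items():
        if priority == 0:
            continue  # zero-priority members are never considered
        reachable = 1 + sum(1 for other in can_reach.get(name, ()) if other in members)
        if reachable >= majority:
            out.append(name)
    return out

members = {"A": 2, "B": 1, "C": 0}                 # name -> priority
can_reach = {"A": {"B"}, "B": {"A"}, "C": set()}   # after a network partition
# C has priority 0 and is cut off; A and B can each reach a majority of 3.
assert eligible_primaries(members, can_reach) == ["A", "B"]
```

In a real replica set, the eligible member with the highest priority would then win the vote and begin accepting writes.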
Elections are one way MongoDB makes it practical to distribute a database across many different
machines while still using replication. A large benefit is that this functionality is already integrated. The
operational overhead of MongoDB is very light, since much of the needed tooling is already built in,
such as analytics, text search, geospatial indexing, in-memory performance, and global replication (MongoDB).
Transactions Mongo doesn’t advertise that it can do transactions. It does, however, offer a “transaction-like”
operation which makes a series of writes conditional on the success of all of those writes. It is called
“transaction-like” because intermediate processes can still return data while the
transaction is being committed. This process is known as the two-phase commit. Two-phase
commits allow data to be written to multiple documents while still allowing the data to be
recovered, should an error occur. There are various transaction commands which give the user full
control over the order of the data write process. The most important part of transactions is not the
write syntax, but what happens in the case there is an error in the procedure. Mongo offers different
“states” which refer to the steps in the transaction process; most notably there are the “Applied” and
the “Pending” states. For each state in the transaction process, if an error occurs, there are certain
operations which may be used to revert to a previous state, start a transaction over, or even to
“Rollback” or undo an applied action.
Even though MongoDB does not offer true relational-database-style transactions, the technology does a
good job of making up for it. Two-phase commits are a useful workaround, and there is plenty of
documentation available for anyone who wishes to explore the functionality further.
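The state progression described above (pending, applied, done) can be sketched with plain dicts standing in for MongoDB documents. This is a hedged illustration of the pattern only: the account names and amounts are hypothetical, and a real implementation uses update operations with `$inc` and per-document `pendingTransactions` bookkeeping so a crash at any step is recoverable.

```python
# Two "account" documents, each tracking transactions still pending on it.
accounts = {"A": {"balance": 100, "pending": []},
            "B": {"balance": 50, "pending": []}}

def two_phase_transfer(accounts, src, dst, amount):
    txn = {"state": "initial", "src": src, "dst": dst, "amount": amount}
    txn["state"] = "pending"              # 1. mark the transaction pending
    accounts[src]["balance"] -= amount    # 2. apply to each document, recording
    accounts[src]["pending"].append(txn)  #    the txn so an error here leaves a
    accounts[dst]["balance"] += amount    #    trail that can be rolled back or
    accounts[dst]["pending"].append(txn)  #    resumed
    txn["state"] = "applied"              # 3. all writes are applied
    accounts[src]["pending"].remove(txn)  # 4. clean up the bookkeeping
    accounts[dst]["pending"].remove(txn)
    txn["state"] = "done"
    return txn

txn = two_phase_transfer(accounts, "A", "B", 30)
assert txn["state"] == "done"
assert accounts["A"]["balance"] == 70 and accounts["B"]["balance"] == 80
```

The value of the pattern is entirely in step 2's bookkeeping: if the process dies mid-transfer, a recovery job can find the pending transaction and either finish applying it or undo it.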
Scalability
Figure 2: Sharding and Replication. http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf
Historically, scaling has mostly happened in a vertical manner, meaning that when an
application needed more memory or storage space, an upgrade to the database server would be
performed: resources such as disks, RAM, or processors could be added, replaced, or
upgraded. Mongo’s approach to scaling, like that of most other NoSQL databases, is horizontal scaling, or
sharding. With Mongo, a user can set up the database to shard automatically.
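The core mechanic of auto-sharding can be sketched as chunk splitting: when a contiguous range of the shard key grows past a size threshold, it is split in half and one half can migrate to another shard, keeping the cluster balanced without operator intervention. The key ranges, sizes, and threshold below are hypothetical and far smaller than real chunk sizes.

```python
def split_chunk(chunk):
    """Split a (lo, hi, size_mb) chunk of the shard-key range at its midpoint,
    dividing its data evenly between the two halves."""
    lo, hi, size = chunk
    mid = (lo + hi) // 2
    return (lo, mid, size // 2), (mid, hi, size - size // 2)

MAX_CHUNK = 64           # pretend the chunk-size threshold is 64 MB
chunk = (0, 1000, 100)   # key range [0, 1000) currently holding 100 MB

if chunk[2] > MAX_CHUNK:                # too big: split it
    left, right = split_chunk(chunk)
assert left == (0, 500, 50) and right == (500, 1000, 50)
```

After a split, a balancer process can move one of the resulting chunks to a less-loaded shard; clients keep routing by key range and never need to know a migration happened.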
Cassandra
History Avinash Lakshman invented Cassandra at Facebook. The project was started during one of
Facebook’s hackathons in 2007, with the main goal of helping query the massive amounts of data the
company was dealing with, particularly in users’ inboxes. The project was released to the open source
community in July of 2008, and by February 2010 it was considered an Apache top-level project. The
software is speculated to have been named after the Greek mythological prophet Cassandra. The myth goes
that the princess of Troy was given prophetic powers by Apollo, who wished for something in return. When
he did not get what he wanted, he cursed her so that no one would ever believe her word again.
An astute blogger at Kellabyte.com points out that the creators at Facebook may have put a little more
thought into the name than just a cool Greek myth: “Cassandra is the cursed Oracle” (Kellabyte).
Data Model
The data model is meant to be somewhat familiar to traditional RDBMS users. An instance of Cassandra
has one table which is made up of multiple column families as defined by the user. Each column family
can contain one of two structures: super columns or columns, and there is no limit on the number of these
that can be stored in a column family. Columns have a name, a value, and a user defined timestamp
associated with them. The number of columns allowed in a column family is very large. Super
columns are a data structure with a name and an unbounded number of columns associated with
them; otherwise they exhibit the same characteristics as columns. Column families are made up of rows, and
each row is made up of columns and has a primary key, the first part of which is the partition key. This is
where things become interesting, in the way the database is distributed, which is addressed further in the
Scalability section of this report. Every row is uniquely identified by its partition key, which is a string
with no limit on its size. All rows are distributed across the cluster based on the hashed value of that key.
One feature which should be mentioned is CQL, the Cassandra Query Language, which is very familiar to users of SQL.
For example, a CREATE TABLE statement looks like this:
CREATE TABLE tweets (
tweet_id uuid PRIMARY KEY,
author varchar,
body varchar
);
Or adding a new column in a table:
ALTER TABLE users ADD birth_date INT;
This may raise the question: how is Cassandra different from a relational database? The key is
in the way it allocates storage for a column. In a traditional RDBMS, each row reserves storage space
for every column it is associated with, even if nothing is populated in that column for a particular
entry.
Figure 3: In a static-column storage engine, each row must reserve space for every column
Figure 4: In a sparse-column engine, space is only used by columns present in each row
In Cassandra a row is sparse, meaning only columns which have data are stored. In this way Cassandra
affords its users the flexibility normally associated with a schemaless system like MongoDB, while also
providing the benefits of a defined schema as RDBMS typically have. This also means that Cassandra
can easily support thousands of columns per table without wasting space if each row only needs a few of
them (Ellis).
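The contrast between Figures 3 and 4 can be shown with two dict-based sketches of the same logical row; the column names and values are hypothetical, and real storage engines of course work at the byte level rather than with Python dicts.

```python
# The declared schema names five columns.
schema = ["id", "name", "email", "phone", "twitter"]

# Static-column engine (Figure 3): every row reserves a slot per column,
# storing an explicit null for anything unpopulated.
static_rows = [
    {"id": 1, "name": "jbellis", "email": None, "phone": None, "twitter": None},
]

# Sparse-column engine (Figure 4): absent columns simply are not stored.
sparse_rows = [
    {"id": 1, "name": "jbellis"},
]

# Both encode the same logical row, but the sparse form stores 2 cells
# instead of 5 — the saving grows with the number of declared columns.
assert len(sparse_rows[0]) == 2 and len(static_rows[0]) == len(schema)
```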
Physical Storage Nodes make up the basic infrastructure of Cassandra. A data center is a collection of nodes, and these
data centers can be either physical or virtual. A cluster contains one or more data centers and
may be distributed over multiple physical locations.
Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure.
One of the biggest advantages of Cassandra is that servers do not depend on each other to a
degree that would cause cascading failures if one server lost connection with another. Creator Avinash
Lakshman described the problem that led to Cassandra as a fragile system with
too many points of failure: Facebook had a lot of data sitting on a lot of servers, which
created a sort of “house of cards” effect, and when one server went down, it caused big issues for the
system as a whole. With Cassandra, data can be distributed across many systems in such a way that one
server’s failure, which inevitably happens, has only the smallest impact on the entire application.
Transactions Cassandra doesn’t use ACID transactions with rollback mechanisms, but instead offers atomic, isolated,
and durable transactions with eventual consistency, and it allows the user to decide
how strong or eventual they would like a transaction’s consistency to be. Atomicity means that everything in a
transaction succeeds or else the entire transaction fails; transactions cannot interfere with
each other; and completed transactions persist in the event of crashes or failure. Lightweight
transactions can be used in INSERT and UPDATE statements using the IF clause in CQL. For example:
INSERT INTO USERS (login, email, name, login_count)
values ('jbellis', '[email protected]', 'Jonathan Ellis', 1)
IF NOT EXISTS
Or
UPDATE users
SET reset_token = null, password = 'newpassword'
WHERE login = 'jbellis'
IF reset_token = 'some-generated-reset-token'
In these cases the preceding commands will only take place if the IF condition is met.
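The IF clause above is a compare-and-set: the write happens only when the current value still matches the expected one. A minimal sketch of that semantics over a dict-backed "row" follows; the column names and token values are hypothetical, and real lightweight transactions run a Paxos round among replicas rather than a local check.

```python
def update_if(row, expected, updates):
    """Apply updates only if every (column, value) in expected matches the
    row's current contents; return whether the write was applied."""
    if all(row.get(col) == val for col, val in expected.items()):
        row.update(updates)
        return True      # the "[applied] = true" case in CQL terms
    return False

user = {"login": "jbellis", "reset_token": "tok-123", "password": "old"}

# First reset succeeds: the stored token matches the one presented.
assert update_if(user, {"reset_token": "tok-123"},
                 {"password": "newpassword", "reset_token": None})
assert user["password"] == "newpassword"

# Replaying the same (now stale) token no longer matches, so nothing happens.
assert not update_if(user, {"reset_token": "tok-123"}, {"password": "evil"})
```

This is exactly why the reset-token example is safe against replays: the condition and the write are a single atomic step, so two racing requests cannot both succeed.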
Scalability Cassandra is designed to handle large amounts of data across multiple nodes with no single point of
failure. The architecture of Cassandra takes into account that system failures can and will happen. To
remedy this problem, the system employs a peer to peer distributed system. According to a post by the
creator of the database on Facebook in August 2008 (close to when the technology was first developed),
Facebook was using Cassandra for its Inbox search system and had scaled to a cluster of 600+ cores and
120+ TB of disk space. In this same post on Facebook Avinash Lakshman says,
“Reliability at massive scale is a very big challenge. Outages in the service can have
significant negative impact. Hence Cassandra aims to run on top of an infrastructure of
hundreds of nodes (possibly spread across different datacenters). At this scale, small and
large components fail continuously; the way Cassandra manages the persistent state in the
face of these failures drives the reliability and scalability of the software systems relying on
this service.”
Data is replicated across systems, and eventual consistency is the mantra of the system: a
user may write data to any one of the nodes in the cluster, and the changes are eventually pushed
out to the rest of the nodes via peer to peer communication. Consistency is one of the
tradeoffs Cassandra makes in order to achieve high availability and partition tolerance. An advantage which
stems from this tradeoff, however, is that the system scales incrementally: growing the cluster
can be as easy as dropping in a new node and having it automatically initialized with data.
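The peer-to-peer placement described above can be sketched as a ring: hash the partition key to pick a home node, then take the next nodes around the ring as replicas. This is an illustrative simplification with hypothetical node names and replication factor; Cassandra itself assigns token ranges to nodes rather than using simple modulo placement.

```python
import hashlib

nodes = ["node-a", "node-b", "node-c", "node-d"]

def replicas_for(partition_key, replication_factor=3):
    """Hash the key to a starting position, then walk the ring to collect
    the replica set."""
    h = int(hashlib.sha1(partition_key.encode()).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

placement = replicas_for("inbox:user42")
assert len(set(placement)) == 3              # three distinct replicas
assert placement == replicas_for("inbox:user42")  # placement is deterministic
```

Because any node can compute this placement from the key alone, a write can be accepted anywhere and routed to the right replicas, and losing one of the three replicas still leaves two copies serving the data.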
Redis
History Redis (“REmote DIctionary Server”) is a key value database originally developed by an Italian
software engineer named Salvatore Sanfilippo (Russo). While working at a company he had started, Sanfilippo
developed an application called LLOOGG which allowed a developer to see, in real time, who was accessing his
site and what actions they were taking. With the rapid rate and large amount of data coming
into the application, there was no way his original implementation
using MySQL could keep up and scale to its needs. So in early 2009, Sanfilippo started working on
Redis to take care of those scalability needs. By June 2009, Redis was released as the production
database for LLOOGG. After this initial release, Redis became a hit in the NoSQL community. Sanfilippo
added features quickly and was always helping resolve database corruption bugs and other Redis
related issues. In March of 2010, Sanfilippo was hired by VMware to work full time on Redis, even
though it was BSD licensed open source.
Data Model The data model used in Redis is very familiar to computer scientists. A programmer uses strings, lists,
sets, sorted sets, and hashes on a regular basis, and these are exactly the types of data that can be stored in a
Redis database. Regardless of the data being stored, it is always identified by a key, and that key is
always a string.
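The type mapping can be sketched with a toy in-memory store where every value is reached through a string key; the keys and values are hypothetical stand-ins for what the real SET, LPUSH, SADD, ZADD, and HSET commands would create.

```python
store = {}
store["page:title"] = "hello"                         # string
store["recent:visitors"] = ["carol", "bob", "alice"]  # list (newest first)
store["online:users"] = {"alice", "bob"}              # set
store["leaderboard"] = {"alice": 120.0, "bob": 95.5}  # sorted set (member -> score)
store["user:1"] = {"name": "Alice", "visits": "3"}    # hash (field -> value)

# Sorted-set members come back ordered by score, as ZRANGE would return them.
ranked = sorted(store["leaderboard"], key=store["leaderboard"].get)
assert ranked == ["bob", "alice"]

# Whatever the value type, the key identifying it is always a string.
assert all(isinstance(k, str) for k in store)
```

Choosing the right value type is most of Redis data modeling: the list gives cheap push/pop at the ends, the set gives membership tests, and the sorted set keeps members permanently ordered by score.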
Physical Storage The way that Redis can perform 100,000+ SETs and 80,000+ GETs per second is by requiring the entire
dataset to be loaded into memory at all times (Russo). This may be argued to be one of the main
disadvantages of using Redis, because the amount of RAM needed is proportional to the size of the data
set, and while RAM is very fast, it is also very expensive.
Replication is available in Redis through a master and slave topology. There is one master, which may
have any number of slaves, and each slave can in turn have as many slaves of its own as desired. This allows for
many different server configurations and a great deal of customization. When a slave is initialized, it subscribes as a
slave of another member of the topology. After the initialization process, the slave is given a snapshot
of the current master and is then notified of all commands the master receives after initiating that
snapshot.
Data persistence is achieved in various ways. If data durability is not of great importance, then the
snapshot technique is recommended: a snapshot of the entire data set is taken every x
seconds and written to disk. This operation has been optimized to use at most 2x the memory
needed for the entire set. If durability is desired, then the append-only file method is suggested. This
method appends each write command to a file on disk; upon server failure and restart, the entire file is
simply replayed to rebuild the dataset in active memory. The synchronization process may be configured
to run with every command, every second, or whenever the OS decides to sync.
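The append-only idea can be sketched in a few lines: log every write before applying it, and rebuild the dataset after a crash by replaying the log from the start. The command tuples here are a hypothetical format, not the real AOF encoding.

```python
log = []

def execute(data, command, log):
    """Append the write command to the AOF, then apply it in memory."""
    log.append(command)
    op, key, *rest = command
    if op == "SET":
        data[key] = rest[0]
    elif op == "DEL":
        data.pop(key, None)

def replay(log):
    """After a restart, rebuild the in-memory dataset from the log."""
    data = {}
    for op, key, *rest in log:
        if op == "SET":
            data[key] = rest[0]
        elif op == "DEL":
            data.pop(key, None)
    return data

data = {}
execute(data, ("SET", "a", "1"), log)
execute(data, ("SET", "b", "2"), log)
execute(data, ("DEL", "a"), log)
assert replay(log) == data == {"b": "2"}
```

The durability knob mentioned above is just how often the log is fsynced to disk: per command is safest and slowest, per second loses at most a second of writes, and OS-decided is fastest.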
Transactions MULTI, EXEC, DISCARD, and WATCH are the commands most often associated with transactions in Redis.
A user may queue up multiple commands using MULTI; instead of executing these commands immediately, Redis
queues them, and all queued commands are run once EXEC is called. A user may call DISCARD to flush
the transaction queue and exit the transaction. If an error occurs while a queued command executes,
however, this will not stop the execution of the other commands in the transaction. In order to keep
commands fast, there are no rollback capabilities in Redis.
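The queue-then-execute behavior, including the fact that a failing command does not abort the rest, can be sketched as follows. This is an illustration of the semantics only, with hypothetical commands expressed as Python callables; the real server parses and type-checks commands at queue time as well.

```python
class TinyTxn:
    """A toy model of Redis MULTI/EXEC/DISCARD over a dict-backed store."""

    def __init__(self, data):
        self.data, self.queue = data, []

    def multi_queue(self, fn):   # MULTI ... : commands are queued, not run
        self.queue.append(fn)

    def discard(self):           # DISCARD: drop the queue, exit the txn
        self.queue = []

    def execute(self):           # EXEC: run everything that was queued
        results = []
        for fn in self.queue:
            try:
                results.append(fn(self.data))
            except Exception as e:   # an error does not abort the rest,
                results.append(e)    # and nothing is rolled back
        self.queue = []
        return results

txn = TinyTxn({"counter": 0})
txn.multi_queue(lambda d: d.__setitem__("counter", d["counter"] + 1))
txn.multi_queue(lambda d: d["missing"])   # this command raises KeyError
txn.multi_queue(lambda d: d.__setitem__("counter", d["counter"] + 1))
results = txn.execute()
assert txn.data["counter"] == 2           # the later command still ran
assert isinstance(results[1], KeyError)   # the error is reported in place
```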
Scalability Since data is stored as key value pairs, Redis makes it very easy to partition the data set and distribute
it over multiple computers. Because Redis is an in-memory database, the overall possible size of a Redis
instance depends on how much RAM is available. Partitioning and distributing to multiple computers
adds more resources and therefore increases the overall capacity to the total amount of RAM in the
cluster.
Redis is architected in a way that allows multiple choices of partitioning strategy. One of the
more useful strategies is called hash partitioning: the key name is hashed according to
some hash function, and the hash is then taken modulo the number of computers in
the Redis cluster. The resulting number tells the program which computer the key value pair should
be stored on.
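The strategy above fits in a few lines. The node names are hypothetical, and CRC32 here stands in for whatever hash function the client chooses.

```python
import zlib

cluster = ["redis-0", "redis-1", "redis-2"]

def node_for(key, nodes=cluster):
    """Hash the key name, take it modulo the cluster size, and use the
    result as an index into the node list."""
    return nodes[zlib.crc32(key.encode()) % len(nodes)]

assert node_for("user:1000") in cluster
# The same key always maps to the same node...
assert node_for("user:1000") == node_for("user:1000")
# ...but note that changing the cluster size remaps most keys, which is
# the classic drawback of naive modulo partitioning.
```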
Figure 5: Redis is easily partitioned thanks to the Key Value data model.
As an example of how Redis scales, we can observe what Twitter has done with it. In 2014, according
to highscalability.com, the timeline feature of Twitter alone was using around 40 TB of RAM. The Redis
deployment running the timeline feature handled over 30 million queries per second across more than 6,000
instances (Hoff).
Differences The main differences between the three databases compared in this report are in the data models and
the amount of consistency vs. availability afforded.
MongoDB is an example of a document database where the main advantage is in the flexibility of the
data schema. As long as the data can be represented as a JSON object it can be stored in the database.
The superior sharding and redundancy of Mongo allows for pretty good consistency while maintaining
scalability.
Cassandra is an example of a table database and is best used in situations where the stakes are high and
data must always be available. The peer to peer architecture between nodes allows for high availability
and ensures that there is no single point of failure.
Redis is built for speed. With great speed comes great resource demands, since this key value store
requires that the entire data set be loaded in RAM at all times. The nature of a key value
database allows for easy partitioning, and Redis generally values consistency over availability.
Conclusions
One of the greatest reasons to choose a specific technology may be the amount of documentation there
is on that technology. Not only does this help with learning, but it can also be a great advantage in using
the technology in the most effective way possible. Although all three technologies considered in this
report have pretty good documentation on their respective websites, at this point in time MongoDB has
a humongous advantage in community documentation over the other two.
All three technologies have their own scaling options and handle them relatively well, so a technology
decision for me would depend on the use case. I would choose:
Redis as a very interesting option for a system which is projected to grow quickly and needs to make
exceptionally fast reads and writes.
Mongo for a quick project which may not be completely thought through, due to the flexibility it
affords. The simplicity of the BSON format would be useful for the same reason. I realize that this way of
thinking may cause problems down the road, but who are we kidding: as developers, sometimes we just
want to quickly throw something together.
Cassandra for extremely large data sets which may have lots of different connections, with lots of reads
and writes happening at any given time. Cassandra is also useful in situations where I need to be able to
run complex queries on the data and get my results quickly. I would not use Cassandra in applications
such as banking software where consistency is paramount, since the system runs with the mantra that
eventual consistency is good enough.
References
Chodorow, Kristina. http://www.kchodorow.com/blog/2010/08/23/history-of-mongodb/. 23 Aug. 2010. Accessed 18 Oct. 2015.
DB-Engines. http://db-engines.com/en/. n.d. Accessed 10 Oct. 2015.
Ellis, Jonathan. http://www.datastax.com/dev/blog/schema-in-cassandra-1-1. 15 Feb. 2012. Accessed 20 Oct. 2015.
Hoff, Todd. http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html. 8 Sep. 2014. Accessed 20 Oct. 2015.
Kellabyte. http://kellabyte.com/2013/01/04/the-meaning-behind-the-name-of-apache-cassandra/. 4 Jan. 2013. Accessed 20 Oct. 2015.
MongoDB. http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf. n.d. Accessed 19 Oct. 2015.
—. https://docs.mongodb.org/manual/core/replica-set-elections/. n.d. Accessed 19 Oct. 2015.
—. https://www.mongodb.com/json-and-bson. n.d. Accessed 18 Oct. 2015.
Russo, Michael. http://blog.mjrusso.com/2010/10/17/redis-from-the-ground-up.html#heading_toc_j_0. 17 Oct. 2010. Accessed 19 Oct. 2015.