ATHABASCA UNIVERSITY
A Survey of Data Consistency in Cloud DBs
by
Hank (Yonghang) Lin
A project submitted in partial fulfillment
of the requirements for the degree of
MASTER OF SCIENCE in INFORMATION SYSTEMS
Athabasca, Alberta
August, 2014
© Hank Lin, 2014
DEDICATION
This paper is dedicated to my wife for her encouragement and support in pursuing my educational
goals. Special thanks to my employer, Enabil Solution, which supported me and partially sponsored
my tuition.
ABSTRACT
This essay is a survey of data consistency in cloud databases. It explores the five
essential characteristics of a cloud computing platform identified by NIST (National Institute of
Standards and Technology): on-demand self-service, broad network access, resource pooling, rapid
elasticity and measured service. These characteristics differentiate a cloud platform from a
traditional data center and make it an attractive infrastructure solution for enterprises. Most
cloud platforms utilize commodity machines to build a server farm and can deliver high scalability
and availability at low cost. The common architecture of a traditional RDBMS (Relational DataBase
Management System) cluster is shared-disk, which is well suited to a big machine. Most cloud DBs,
on the other hand, choose a shared-nothing architecture, which allows the DB to scale out to
thousands of nodes, as the nodes do not interfere with one another. This essay reviews a
traditional RDBMS cluster, Oracle RAC, and a number of DBs that can offer strong data consistency
in the cloud environment, and analyzes how, and at what level, each handles data consistency from
an architectural perspective.
Strong data consistency is a must for many enterprise applications. This essay can help IT
architects and DBAs understand the differences between traditional RDBMS and cloud DBs so that
they can evaluate the changes and effort required when considering migrating their applications to
the cloud. It can also serve as material for university students interested in the data
management domain.
The remainder of this essay is organized as follows: Chapter I introduces the research
background and states my research objective and methodology. Chapter II presents a literature
review of the state of the art in the cloud data management domain. Foundational knowledge of
cloud computing, database consistency theories, common cloud DB categories and traditional RDBMS
is discussed as well, so that readers can better understand the subject and learn about its latest
developments and challenges. The research methodology is detailed in Chapter III. In Chapter IV,
seven cloud DBs are selected for analysis, with a focus on their architectures. Issues, challenges
and opportunities are discussed in Chapter V, and Chapter VI concludes this essay. References are
provided at the end.
ACKNOWLEDGMENTS
This research was guided by my supervisor, Professor Qing Tan of Athabasca University. I am
thankful to Professor Tan for his valuable suggestions, guidance and support.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
CHAPTER I: INTRODUCTION
1. Research Background
2. Research Purpose and Objective
3. Research Methodology
4. Research Scope and Contribution
CHAPTER II: LITERATURE REVIEW
1. Cloud Computing
2. ACID and CAP theorem
3. NoSQL and NewSQL
4. RDBMS Cluster – Oracle RAC
5. Cloud Computing Database and Data Management
CHAPTER III: METHODOLOGY
CHAPTER IV: CASE STUDIES
1. Megastore
2. SAP HANA
3. VoltDB
4. MySQL Cluster
5. ScaleDB
6. NuoDB
7. ClustrixDB
CHAPTER V: CHALLENGE, OPPORTUNITY AND TREND
CHAPTER VI: CONCLUSIONS AND RECOMMENDATIONS
References

LIST OF FIGURES
Figure 1: Research methodology diagram
Figure 2: PACELC Tradeoffs for Distributed Data Services (Abadi, 2012)
Figure 3: Trend Popularity Data provided by DB-Engines
Figure 4: Operations across entity groups (Baker et al., 2011)
Figure 5: The SAP HANA database architecture (Färber, 2012)
Figure 6: MySQL Node Architecture (Ronstrom, 2004)
Figure 7: ScaleDB Cluster with Mirrored Storage (Shadmon, 2009)
Figure 8: NuoDB architecture (NuoDB, 2013)
Figure 9: Clustrix Distributed Query Processing (Clustrix, 2014)

LIST OF TABLES
Table 1: Reviewed DB comparison
CHAPTER I: INTRODUCTION
1. Research Background
Cloud computing is becoming a major trend. Gartner (2008) describes cloud computing as a style
of computing in which scalable and elastic IT-enabled capabilities are delivered “as a service” using
Internet technologies. The advantages and benefits of cloud computing are well known; the major
ones include cost efficiency, scalability, continuous availability and on-demand provisioning.
Even though enterprises still have some concerns regarding the cloud platform, such as security
and privacy, its user base is growing constantly and most major IT players have bet on it and
invested in it heavily.
With the continued development of globalization, more enterprises now access global
markets, and their workforces and client bases are spread over multiple regions. It is essential
that their applications are available all the time and can be accessed from anywhere. Also, user
tolerance for latency in application response time is approaching zero. Cloud computing can
address the concerns of high availability, easy access and fast response time, but the majority of
data management solutions in the cloud today cannot meet the mandatory data consistency
requirements of many enterprise applications.
In this essay, the consistency of a distributed database refers to the level of guarantee as to
when a committed write becomes visible to other clients/users in a distributed, concurrently
accessible system. Doug Terry defines six possible consistency guarantees in his paper (Terry, 2013):
1. Eventual Consistency
2. Consistent Prefix
3. Bounded Staleness
4. Monotonic Reads
5. Read My Own Writes
6. Strong Consistency
Most NoSQL databases are designed around the eventual consistency principle, while strong
consistency is one of the key characteristics of traditional relational databases. Between
eventual consistency and strong consistency, Consistent Prefix guarantees that reads observe an
ordered prefix of the sequence of writes. Bounded Staleness guarantees that retrieved data is no
staler than a defined time bound. Monotonic Reads guarantees that successive reads within a
session never go backwards in time, and is also called a “session guarantee.” Read My Own Writes
offers strong consistency for a single client: it guarantees that all writes performed by the
client are visible to that client's subsequent reads. The middle four consistency models are all
forms of eventual consistency, but stronger than basic eventual consistency.
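To make these guarantees concrete, the following toy sketch (illustrative only; the class and method names are invented, not drawn from any real system) models a primary replica with one lagging secondary and shows how strong, eventual and read-my-writes reads differ:

```python
# Toy model of a replicated key-value store illustrating read guarantees.

class Replica:
    def __init__(self):
        self.data = {}       # key -> (version, value)
        self.version = 0

class Store:
    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()   # lags behind until sync() runs

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = (self.primary.version, value)
        return self.primary.version  # client keeps this for read-my-writes

    def sync(self):
        # Replication eventually catches the secondary up to the primary.
        self.secondary.data = dict(self.primary.data)
        self.secondary.version = self.primary.version

    def read_strong(self, key):
        # Strong consistency: always served by the up-to-date primary.
        return self.primary.data.get(key, (0, None))[1]

    def read_eventual(self, key):
        # Eventual consistency: may be served by a stale replica.
        return self.secondary.data.get(key, (0, None))[1]

    def read_my_writes(self, key, last_write_version):
        # Read My Own Writes: fall back to the primary if the replica
        # has not yet applied this client's last write.
        if self.secondary.version >= last_write_version:
            return self.secondary.data.get(key, (0, None))[1]
        return self.primary.data.get(key, (0, None))[1]

store = Store()
v = store.write("balance", 100)
print(store.read_eventual("balance"))      # None (replica still stale)
print(store.read_strong("balance"))        # 100
print(store.read_my_writes("balance", v))  # 100 (falls back to primary)
store.sync()
print(store.read_eventual("balance"))      # 100 (eventually consistent)
```

The same mechanism generalizes: Bounded Staleness would compare timestamps instead of versions, and Consistent Prefix would require the secondary to apply writes strictly in order.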
The trade-off between consistency and availability or scalability is well explained by Eric
Brewer’s CAP theorem. For many enterprise applications, strong consistency is a must in some use
cases, but not all use cases demand it; for the benefit of availability and scalability, data
consistency can be relaxed in some of them. For example, the account balance of a prepaid cellular
service is critical and has to be tracked in real time so that a subscriber’s call request can be
authorized. But the account balance is not so critical for the majority of postpaid accounts, as
the carrier collects payment at the end of the customer’s bill cycle. As long as the account
balance of a postpaid account becomes consistent before the end of the bill cycle, the carrier can
bill the customer correctly. Even for a prepaid account, not all the changes within its
transactions demand strong consistency: the account balance must stay consistent at all times, but
the call detail records do not have to. A customer is unlikely to notice if call details are
posted with a delay of seconds or even minutes. From a business perspective, some data in some
transactions require absolute consistency while other data does not, and different customers may
have different consistency needs. If database consistency can be customized, and the job of
choosing the desired consistency level for each transaction or user can be left to business
analysts, then the system does not have to pay the consistency cost in cases where consistency is
not really needed.
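As a sketch of this idea, the snippet below models a hypothetical API in which each write carries a business-chosen consistency level; the replica layout and key names are assumptions for illustration, though stores such as Cassandra expose a similar per-request consistency level:

```python
# Toy model: three replicas of an account record. A "strong" write is
# applied to every replica synchronously; a "relaxed" write touches one
# replica and relies on later background replication (not modeled here).
# The write() API and the key names are hypothetical.

REPLICAS = [dict(), dict(), dict()]

def write(key, value, strong=True):
    if strong:
        for replica in REPLICAS:    # synchronous: pay the latency cost
            replica[key] = value
    else:
        REPLICAS[0][key] = value    # cheap: anti-entropy catches up later

# The prepaid balance must be visible everywhere before the next call
# is authorized; call detail records can tolerate replication lag.
write("acct:7:balance", 25.00, strong=True)
write("acct:7:cdr:0091", {"duration_s": 120}, strong=False)

print([r.get("acct:7:balance") for r in REPLICAS])    # [25.0, 25.0, 25.0]
print(sum("acct:7:cdr:0091" in r for r in REPLICAS))  # 1
```

The relaxed write returns immediately after one replica accepts it, which is exactly the availability/latency gain that a business analyst would buy by declaring call-detail records eventually consistent.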
Many new databases claiming to offer strong data consistency in the cloud have emerged in
recent years; some even label themselves NewSQL. It is interesting to understand their
architectures and how they overcome the overhead of maintaining strong consistency while keeping
their systems highly scalable and available in the cloud environment.
2. Research Purpose and Objective
The purpose of this research is to survey the differences between traditional RDBMS and
cloud-based databases and to investigate how different databases handle the data consistency
challenge from an architectural perspective. In this essay, seven data management solutions were
selected for review through a systematic approach: Google Megastore, SAP HANA, VoltDB, MySQL
Cluster, ScaleDB, NuoDB and ClustrixDB.
Although databases in the cloud can deliver much better scalability and availability, along
with higher performance, than traditional relational databases, the consistency model is a major
obstacle to migrating a traditional database into the cloud.
The objective of this essay is to provide enough background for readers who want to know
more about cloud databases so that they can understand the complexity of data consistency in
cloud databases.
3. Research Methodology
In this essay, a systematic approach is employed to select the solutions for review. The
Gartner Magic Quadrant for Operational Database Management Systems is used as the primary
reference when selecting solutions, with the DB-Engines ranking as a secondary factor.
My research interests focus on the potential for a traditional OLTP (OnLine Transaction
Processing) system to be migrated to the cloud and achieve the high scalability that the cloud
platform can deliver, while its mandatory strong-consistency attribute is preserved. The solutions
chosen for review should be closely associated with these interests: highly scalable, cloud-based
databases that guarantee strong consistency. It is impractical to check all existing solutions, as
novel ones emerge all the time; it makes more sense to study the solutions that have been
recognized by the market than to divert effort to investigating niche players.
The major selection criteria for review include:
1. Solutions must have the potential to scale in the cloud for OLTP transactions.
2. Solutions must be able to provide strong consistency.
3. Solutions should be either well-known or have great growth potential.
The research methodology diagram in Figure 1 illustrates how my research was conducted.
[Figure 1 depicts the workflow: define a research area (migrating OLTP DBs to cloud DBs while
protecting the consistency attribute); define criteria to select major players claiming that
ability; follow the criteria, using the Gartner data, to select DBs for inclusion; review related
papers on the selected DBs and investigate their architectures; synthesize the findings from the
individual studies; and interpret the findings to offer recommendations.]
Figure 1: Research methodology diagram
4. Research Scope and Contribution
While many researchers have studied various database solutions suited to or designed for the
cloud, they have tended to study each DB’s unique advantages and make full comparisons. My
research assumes that cloud-based databases, inheriting the characteristics of the cloud platform,
offer better availability and scalability than traditional RDBMS, and that data consistency
therefore becomes a user’s major concern when planning to move applications to the cloud. While
there is no doubt that each solution has its best-fit use cases, my research tries to investigate,
first and foremost, solutions that address a user’s mandatory data consistency requirement.
The limitations of traditional relational databases cause major concerns when a company is
considering moving to a cloud-based database. Many new solutions have emerged in recent years to
tackle the issues of data management in the cloud; however, it is not realistic to study all of
them. In this essay, I use a systematic approach to select seven DBs, explore their designs and
present an unbiased view of their architectures along with their pros and cons.
The outcomes of this research will assist practitioners who are considering migrating their
traditional applications into the cloud to understand the architectural differences between
traditional RDBMS and databases suitable for the cloud, and to assess different database
management solutions, especially from the perspective of their consistency needs.
In the following chapters of this essay, the Literature Review chapter presents an overview
of state-of-the-art research in the cloud data management domain, from which readers can learn the
latest developments and challenges. Foundational knowledge of cloud computing, database
consistency theories, common cloud DB categories and traditional RDBMS is also discussed in this
chapter, which helps readers understand why strong consistency matters. The Case Studies chapter
focuses on how individual solutions in the cloud address the data consistency issue. The remaining
chapters discuss issues, challenges and opportunities, and summarize what I found in my research.
CHAPTER II: LITERATURE REVIEW
Cloud-based data management solutions live in the cloud, so it is important to know the
characteristics of a cloud platform in order to understand why traditional RDBMS cannot perform
well there. An overview of cloud computing is covered in this chapter. The most popular
cloud-based data management solutions, such as Cassandra, MongoDB and Apache HBase, are referred
to as NoSQL. These NoSQL solutions mostly spawned from the recognition and application of the CAP
theorem, while traditional RDBMS stick to the ACID principle; the difference between the CAP
theorem and the ACID principle is discussed in this chapter. Trying to overcome the pitfalls of
NoSQL solutions, NewSQL is emerging, which aims to preserve the traditional RDBMS’s
characteristics in the cloud environment. Before the cloud databases are studied, a popular RDBMS
cluster solution, Oracle Real Application Cluster (RAC), is used as an example to illustrate the
architecture of a traditional RDBMS cluster. At the end of this chapter, an extensive review of
the related research literature is provided.
1. Cloud Computing
One of the most prominent IT trends of the last decade has been the emergence of cloud
computing. Cloud computing is the delivery model for providing pervasive, readily available,
on-demand network access to a shared pool of configurable computing resources. These resources can
be quickly provisioned with minimal management. The technological foundation consists of the
collection of hardware and software required to support the delivery model. It can be divided into
a physical layer and an abstraction layer: the physical layer consists of the network, storage and
server infrastructure, while the abstraction layer is composed of the software implemented across
the physical layer. This cloud model promotes availability and is composed of five essential
characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity,
and measured service. (Mell & Grance, 2009)
Virtualization, high-speed Internet and cloud management are the key enabling technologies
behind the emerging cloud computing paradigm.
In cloud computing, virtualization is the creation of a virtual version of something, such as a
hardware platform, operating system (OS), storage device or network resource. The usual goal of
virtualization is to centralize administrative tasks while improving scalability and overall
hardware-resource utilization. By leveraging virtualization technology, a company can pool IT
assets into resource pools to be carved up, consumed, and released back into the pool as workloads
require, making IT a more utility-like resource. Physical and logical resources are made available
through a virtual service layer across the enterprise. The concept of cloud computing has captured
the attention and imagination of organizations of all sizes because its service delivery model
converts the power of virtualization into measurable business value by adding provisioning and
billing capabilities.
When people switch on their computers, they expect applications powered by cloud computing
to work just like locally installed software, and they want information served up immediately.
Cloud computing requires not just high-speed but also high-quality broadband connections that are
always on. While many websites are usable on non-broadband or slow broadband connections,
cloud-based applications often are not. Connection speed (in KB/s, MB/s or GB/s) is important in
the use of cloud computing services. Also important is Quality of Service (QoS), whose indicators
include the number of dropped connections, response time, delays in the processing of network data
(latency) and loss of data (packet loss). Cloud computing can be costly because it requires an
“always on” connection and transfers large amounts of data, which is hard on users who pay by the
megabyte or gigabyte or are limited by a data cap. The arrival of superfast network links, such as
fiber optics, delivered high-speed access to the worldwide web and opened the door to cloud
computing, with its high expectation of accessing cloud platforms from any location in an instant.
Cloud management is the software and technology designed for operating and monitoring the
applications, data and services residing in the cloud. Cloud management tools help ensure that a
company’s cloud-based resources are working optimally and interacting properly with users and
other services. Cloud management strategies typically involve numerous tasks, including
performance monitoring (response times, latency, uptime, etc.), security and compliance auditing
and management, and initiating and overseeing disaster recovery and contingency plans. With cloud
computing growing more complex, and a wide variety of private, hybrid and public cloud-based
systems and infrastructure already in use, more flexible and scalable cloud management systems are
required.
Cloud computing represents a shift of application architecture from traditional vertical
scalability to horizontal scalability. A typical cloud platform utilizes commodity hardware rather
than big machines such as mainframes. Vertical scaling is the process of beefing up a server by
adding more CPUs, more memory or faster disks; such machines are not just expensive, but also
limited by their designed capacity. Horizontal scaling, by contrast, is no longer bound by the
physical size of a server: it scales by adding more nodes. As commodity machines are less reliable
than big machines, cloud platforms often use virtualization technology to pool hardware resources
so that the platform can tolerate failures.
There are three primary models of cloud computing services: Infrastructure as a Service
(IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). (Hogan et al., 2011)
An IaaS provider delivers and manages infrastructure for end users, including storage,
network and computing resources. Examples of these service providers are AWS, OpenStack,
Rackspace and IBM.
A PaaS provider delivers a platform on which end users can create and deploy applications.
Examples of these service providers are Amazon's AWS Elastic Beanstalk, Force.com and Engine Yard.
SaaS is a model in which many users make use of software hosted by the service provider and
pay only for the time it is being used. This can be preferable to buying the hardware and software
outright, since the model removes the burden of updating and licensing the software and is, of
course, more economical. Examples of these service providers are Salesforce and Google Apps.
A cloud computing platform can come in four different deployment models, known as public
clouds, private clouds, community clouds and hybrid clouds. (Hogan et al., 2011) Public cloud
providers offer computing resources to the general public; resources can include hardware, a
platform for application development or whole applications. Amazon Elastic Compute Cloud (EC2),
Google AppEngine and the Windows Azure Services Platform are among the most well-known public
clouds. Private clouds are built by a single company for its own applications. A private cloud is
typically on premise, but may be 100% owned by the business yet located at a third-party hosting
facility. The initial cost of a private cloud can be high, so private clouds suit large
corporations that have concerns around data security and want to maintain control of their own
infrastructure. Community clouds are shared within a specific community whose members have similar
concerns; they are like private clouds, but built and serviced for a number of members within a
community. Hybrid clouds can be a composite of any two or more of the cloud models described
above; they can provide high-availability backup for critical systems and meet different required
levels of security and management.
Cloud computing is seen by many as the next wave of information technology for
individuals, companies and governments. Many researchers have highlighted the large potential
benefits of adoption, ranging from economic growth and potentially sizeable improvements in
employment to enabling innovation and collaboration. Virtualization, high-speed Internet and agile
cloud management are the fundamental technologies driving the growth of cloud computing. With the
way the world is embracing the cloud, it will become one of the revolutionary technologies of the
near future.
In general, a cloud computing platform is built on top of commodity hardware through
virtualization technology. The benefits of cloud computing include no up-front investment, lower
operating costs, high scalability, easy access, and reduced business risks and maintenance
expenses. (Zhang, 2010)
Traditional RDBMS do not fit the new cloud environments well. A cloud-based data
management system should be able to run on commodity machines, as they form a typical cloud
environment. However, commodity hardware is prone to failure, so cloud-based databases have to be
fault tolerant. The cloud platform is highly scalable and elastic, and cloud-based databases are
expected to take advantage of it, offering high-speed data processing while scaling easily.
Security and privacy concerns are always a hot topic in cloud computing, as data may now be stored
on third-party premises on resources shared among different tenants; cloud-based databases have to
address the same concerns so that an enterprise can have sufficient trust to embrace them.
2. ACID and CAP theorem
ACID is an acronym for Atomicity, Consistency, Isolation and Durability. These four
properties guarantee that database transactions are processed reliably, and they are the
foundation of traditional RDBMS. A transaction is a single logical operation on the data, composed
of a series of read and write operations. An RDBMS must ensure the ACID properties in the face of
concurrent access or even system failure.
Atomicity states that each transaction is atomic: either all of its operations are carried
out or none are. Even when a system crash occurs, the RDBMS can roll back an incomplete
transaction to guarantee atomicity.
Consistency states that only valid data will be written to the database, without breaking
pre-set constraints. If a transaction violates the database’s consistency rules, such as
constraints, cascades or triggers, the entire transaction is rolled back. The consistency property
ensures that every transaction brings the database from one valid state to another, and the rule
must be satisfied across all nodes, even in a cluster environment.
Isolation refers to the requirement that other sessions cannot see data that has yet to be
committed during a transaction. Each transaction is unaware of other transactions executing
concurrently in the system: if several transactions are executed concurrently, the results must be
the same as if they had been executed serially in some order.
Durability refers to the guarantee that once a transaction commits, all of its changes are
persisted. The common technique is to write all transactions into a redo log that can be replayed
to restore the system state after a failure; a transaction is only deemed committed after all of
its changes have been written to the log successfully.
The way a distributed system can guarantee transaction atomicity in the presence of failures
is through the so-called two-phase commit protocol. In the first phase, the transaction
coordinator sends a prepare request to each participant; every participant must vote, send back an
acknowledgment and block the related resources. In the second phase, the coordinator sends a
commit command once it has received a green light from all participants. Any resources held by a
participant are unavailable to other transactions between the first and second phases, and if the
coordinator fails before delivering the second-phase message, those resources remain blocked until
it recovers.
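The protocol can be sketched as a minimal, failure-free simulation (real implementations must also log votes durably and handle coordinator crashes, which this sketch omits; all names are illustrative):

```python
# Minimal two-phase commit simulation.

class Participant:
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes     # a no vote models e.g. a constraint failure
        self.prepared = False
        self.committed = False

    def prepare(self):
        # Phase 1: lock local resources and vote. Resources stay blocked
        # until the coordinator announces the global outcome.
        self.prepared = self.vote_yes
        return self.vote_yes

    def commit(self):
        # Phase 2 (global commit): persist changes, release resources.
        self.committed = True
        self.prepared = False

    def abort(self):
        # Phase 2 (global abort): undo changes, release resources.
        self.committed = False
        self.prepared = False

class Coordinator:
    def run(self, participants):
        votes = [p.prepare() for p in participants]   # phase 1: collect votes
        if all(votes):                                # unanimous yes -> commit
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:                        # any no -> global abort
            p.abort()
        return "aborted"

print(Coordinator().run([Participant("n1"), Participant("n2")]))         # committed
print(Coordinator().run([Participant("n1"), Participant("n2", False)]))  # aborted
```

The blocking window described above corresponds to the span between `prepare()` returning and `commit()`/`abort()` being invoked: in a real system a participant that has voted yes cannot unilaterally release its locks during that window.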
The CAP theorem was proposed in 2000 by Eric Brewer, a professor at the University of
California, Berkeley. It states that a distributed system cannot guarantee Consistency,
Availability and Partition tolerance simultaneously. Consistency in the CAP theorem refers to all
nodes in the distributed system seeing the same data at the same time; it is not the same as the
consistency in the ACID properties of RDBMS and is actually closer to the Atomicity property of
ACID. Availability refers to the service remaining accessible: the system as a whole continues to
operate in spite of some nodes failing and can respond to all requests in a timely fashion.
Partition tolerance means that the system can continue to operate despite arbitrary message loss
or failure of part of the system. In a distributed environment, network communication is critical
for multiple nodes to work as a single system, and a partition-tolerant application allows the
system to continue functioning unless a total network failure occurs. When a distributed data
store is partitioned into two sets of nodes, either both partitions deny all write requests in
order to guarantee data consistency, or each partition remains available and accepts updates, in
which case the data will likely diverge. The first approach protects data consistency but
sacrifices availability; the second maximizes availability but cannot guarantee data consistency.
The CAP theorem suggests that a system can have at most two of the three desirable
properties. Scalability has become an essential facet of cloud computing, allowing a system to
scale horizontally, and in today’s business environment more and more applications have to be
highly available, with even long latency being unacceptable. Consistency thus becomes the only
property left to compromise, and most NoSQL systems trade off consistency for the other two. The
common consistency model of NoSQL is known as BASE (Basically Available, Soft state, Eventual
consistency).
The BASE is eventual consistency. Its consistency is weaker than ACID, but it makes the
system easier to scale and maintains availability. ACID and BASE adopt two opposite design
philosophies. While ACID is pessimistic and requires consistency at the end of every operation,
BASE is optimistic and acknowledges that data might become inconsistent, but it will become
consistent eventually.
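The contrast can be sketched in a few lines of Python. The toy replicas below are an illustration only, not any particular product's implementation: each accepts conflicting writes independently while "partitioned", then converges through a last-write-wins merge, which is the essence of eventual consistency.

```python
import time

class Replica:
    """A toy replica that accepts writes locally and syncs lazily."""
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value):
        self.data[key] = (time.time(), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge(self, other):
        # Last-write-wins: keep the entry with the newer timestamp.
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)

# Two replicas accept conflicting writes while partitioned...
a, b = Replica(), Replica()
a.write("balance", 100)
time.sleep(0.01)
b.write("balance", 90)
# ...then anti-entropy merging makes them converge (eventual consistency).
a.merge(b); b.merge(a)
assert a.read("balance") == b.read("balance") == 90
```

Under ACID, the second write would have been blocked or serialized; under BASE both writes succeed immediately and consistency is restored only when the replicas exchange state.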
3. NoSQL and NewSQL
The NoSQL database is a fundamentally new approach to managing very large sets of distributed data. The data created today is so large and complex that it is very difficult to process in a traditional RDBMS. The term “big data” describes that massive volume of data and was named one of the most overused buzzwords of 2013 by FactSet Research Systems Inc. As a result, cloud-based NoSQL databases emerged to tackle big data.
Compared to traditional relational databases, a NoSQL database provides a mechanism for
storage and retrieval of data that uses looser consistency models. Motivations for that approach
include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases
are finding significant and growing industry use in big data and real-time web applications. (NoSQL
Wikipedia, 2013)
NoSQL is widely used for exploring big data. Many Internet giants found that the traditional RDBMS could not handle their unprecedented volume of data within an acceptable time, so they developed their own in-house solutions: Amazon’s Dynamo, Google’s BigTable, LinkedIn’s Voldemort, Twitter’s FlockDB and Facebook’s Cassandra. Those databases are designed and optimized for their specific use cases.
The common genres of NoSQL DB include key-value, columnar, document-oriented, and
graph databases.
Key-value is a very simple structure similar to a dictionary. Every value is looked up by a unique key. Values are isolated and independent of each other, so relationships must be handled in application logic. (Hecht, 2011) The key-value store is suited to simple operations.
Amazon Dynamo, LinkedIn Voldemort and Riak are all key-value stores.
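The genre is easy to picture with an ordinary dictionary. The snippet below is purely illustrative (the keys, values, and field names are invented); it shows why relationships must live in the application: the store itself supports nothing beyond lookup by key.

```python
# A dictionary models the key-value genre: values are opaque and
# independent, so any relationship lives in application code.
store = {}
store["user:1"] = {"name": "Alice", "order_ids": ["order:7"]}
store["order:7"] = {"item": "book", "total": 25}

# There is no join: the application follows the reference itself.
user = store["user:1"]
orders = [store[oid] for oid in user["order_ids"]]
assert orders[0]["item"] == "book"
```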
Columnar databases (aka column-oriented) share some similarity with key-value databases
in that values are queried by matching keys. Unlike a key-value database, the value in a column-oriented DB is composed of many columns, so all the related column values can be retrieved in one
lookup. Google Bigtable is the most representative columnar database. HBase is inspired by Google
Bigtable and is Bigtable’s open source implementation.
Whereas a column-oriented database groups a number of columns, document-oriented databases pack whole objects into a single value. JavaScript Object Notation
(JSON) and Extensible Markup Language (XML) are common representations of document-
oriented objects. The most prominent document stores are CouchDB, MongoDB and Riak. (Hecht,
2011)
Graph databases focus on the free interrelation of data; therefore they are very efficient in
traversing relationships between different entities. Existing graph databases are not very scalable
compared to other genres of NoSQL DBs, because it is expensive to traverse multiple distributed nodes to retrieve data. According to the DB-Engines Ranking, Neo4j is the most popular graph database in
use today.
The common technique used to partition and distribute data across nodes in the cloud is
Distributed Hash Tables (DHT). DHTs, which distribute lookup and storage over a number of peers
with no central coordination required, offer a scalable alternative to the central server lookup.
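A minimal consistent-hashing ring, the technique most DHTs build on, can be sketched as follows. This is an illustrative toy (the node names and the choice of MD5 are arbitrary assumptions), but it shows how any peer can locate a key's owner with no central coordinator.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to nodes on a hash ring; a key belongs to the first
    node whose hash follows the key's hash (wrapping around)."""
    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        hashes = [entry[0] for entry in self.ring]
        # Wrap to the first node if the key hashes past the last one.
        idx = bisect.bisect(hashes, h) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # any client can compute this locally
assert owner in {"node-a", "node-b", "node-c"}
```

Because node positions are derived from hashes, adding or removing a node only remaps the keys between it and its neighbour, which is what makes DHTs attractive for elastic clusters.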
As most NoSQL databases do not provide strong data consistency, which is critical for
many OLTP systems, NoSQL technology is complementary to RDBMS, not a replacement.
(Pokorny, 2013)
NewSQL was first defined by Matthew Aslett, an analyst with the 451 Group. NewSQL has
the scalability and flexibility promised by NoSQL while retaining support for SQL queries and/or
ACID (atomicity, consistency, isolation and durability). It claims to improve performance for
appropriate workloads to the extent that the advanced scalability promised by some NoSQL
databases becomes irrelevant (Aslett, 2011). Like NoSQL, NewSQL uses shared-nothing
architecture and can scale-out to a large number of nodes without suffering a performance
bottleneck. Unlike NoSQL, which sacrifices consistency, NewSQL uses the relational data model
and primarily has an SQL interface. It supports the ACID properties for application transactions.
The traditional OLTP solution cannot handle massive OLTP workloads and is incapable of feeding real-time analytics data warehouses, as the traditional way of transferring data from OLTP
to OLAP (OnLine Analytics Processing) typically takes tens of minutes to hours (Stonebraker,
2011).
NewSQL preserves SQL and data consistency while offering high performance and
scalability.
Many OLTP applications require a strong consistency property. NewSQL is an alternative
for bringing the relational data model into the NoSQL database. It can make it easy for traditional
applications to migrate to a highly scalable cloud environment.
4. RDBMS Cluster – Oracle RAC
A traditional RDBMS cluster is a shared storage architecture. Oracle RAC is one of the most
successful products for enterprise mission-critical applications. Here I use Oracle RAC as an
example to illustrate the mechanism of traditional RDBMS cluster.
Oracle Real Application Clusters (RAC) allows multiple instances to access a single
database. RAC evolved from Oracle Parallel Server (OPS), and Oracle introduced RAC in 2001 with its cache fusion technology in the Oracle 9i release. In 2004, Oracle extended RAC with its own Clusterware server software, entering a field previously dominated by IBM HACMP (High Availability Cluster Multi-Processing), HP Serviceguard and Sun Cluster. Clusterware allows multiple nodes to work together and be viewed as a single system. The Oracle RAC
infrastructure is a key component for implementing the Oracle enterprise grid computing
architecture. The database of the Oracle RAC system is stored in a shared storage; all nodes must be
able to access the shared storage simultaneously.
In order to maintain data consistency across multiple instances, Oracle utilizes their
proprietary cache fusion technology to ensure data consistency within all the nodes. Cache fusion
technology, also known as cache coherence, maintains the consistency of data blocks in the buffer
caches within multiple instances. Any node in the RAC must acquire a cluster-wide data lock before a block can be modified or read. The Oracle Global Cache Service (GCS) maintains buffer cache coherence through a high-speed interconnect.
Oracle built a robust mechanism to protect data integrity when facing system failure. Each
node has a daemon process CSSD (Cluster Services Synchronization Daemon) to monitor the health
of the system and communicate with other nodes. When a severe issue is detected by the local
CSSD, the notification is broadcast to other nodes. RAC cluster can evict the membership of failed
nodes. When a network failure occurs, the CSSD in each node cannot communicate with others.
RAC uses voting disks to decide the node eviction. Voting disks are located on shared storage and
should be visible to all nodes. They are used to monitor disk heartbeats: every node must update its block of the voting disk periodically. If the disk block is not updated within a short timeout period, then
that node is considered unhealthy. The cluster can evict the unhealthy node or reboot it, depending
upon the quorum of that node, to avoid a split-brain situation.
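The disk-heartbeat idea reduces to a small sketch. The function below is not Oracle's algorithm, merely an illustration of the timeout rule it describes: a node that has not refreshed its voting-disk block within the window is flagged as unhealthy and becomes a candidate for eviction (the node names and timeout value are invented).

```python
TIMEOUT = 30  # seconds without a disk heartbeat before a node is suspect

def unhealthy_nodes(heartbeats, now, timeout=TIMEOUT):
    """heartbeats maps node -> last time it updated its voting-disk block.
    A node that missed the timeout window is considered unhealthy."""
    return [node for node, last in heartbeats.items()
            if now - last > timeout]

# node3 last wrote its block 45 seconds ago, so it is flagged.
heartbeats = {"node1": 100.0, "node2": 100.0, "node3": 60.0}
assert unhealthy_nodes(heartbeats, now=105.0) == ["node3"]
```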
Oracle RAC can prevent the server from being a single point of failure and provide high
availability and scalability. It can combine smaller commodity servers into a cluster to create
scalable environments that support mission-critical business applications.
However, compared to cloud-based data management solutions, the pitfall of Oracle RAC is
obvious. First, RAC can only prevent server failure. The communication latency between RAC
nodes is critical, so it is rare to deploy RAC nodes in different locations. The storage is shared; if it
fails, all the nodes will go down. The high availability of RAC is limited compared to the cloud
databases.
Second, Oracle RAC cannot provide linear scale-out performance. The performance of a RAC system degrades as more nodes are added to the cluster. Oracle implemented a
Global Resource Directory (GRD) to record information about how resources are used within a
cluster database. It also introduced cache fusion technology to speed the data block movement
around a cluster. When multiple nodes are trying to access the same data set, the global locking
mechanism is used to protect the consistency property. It will cause contention and slow down the
performance. Oracle suggests partitioning the data between different applications. The shared
storage architecture prevents RAC from scaling further. Since all the read/write will go to the same
storage, sooner or later it will hit the storage speed limit when more nodes are added. RAC did offer
some scalability, but not much. Theoretically, Oracle RAC can have up to 255 nodes; however, it
has only been tested with up to 16 nodes. Actually, it is not common to see RAC on more than 6 or
8 nodes. Most instances of the RAC database are two-node. Oracle recognized the problem with
their RAC. In the latest Oracle 12c, Oracle introduced a new architecture called Flex Clusters,
which divide the nodes into two different types: Hub and Leaf. The Hub nodes are the same as the
traditional cluster nodes in its previous version. The Leaf nodes are connected only with the
corresponding attached Hub Nodes and they are not connected with each other. (Hussain et al.,
2013) The new architecture greatly reduces interconnect traffic and provides room to scale the cluster beyond what the traditional design allows.
Finally, RAC implementation is expensive. As the speed of interconnection is critical for
RAC’s performance, Oracle suggests a high-end switch to reduce the latency of cache fusion. Storage performance is a common bottleneck when all the nodes access the same storage, and high-end storage is the usual remedy to improve overall RAC performance. Moreover, Oracle charges their
license fee based on the number of nodes. Oracle packs their RAC software and hardware together
into one appliance named Oracle Exadata. The list price for the basic eighth-rack two-node Exadata starts from $220,000 plus $55,000 in storage cost, with an additional support fee, based on Oracle’s July 17, 2014 price list. (http://www.oracle.com/us/corporate/pricing/exadata-pricelist-070598.pdf)
The top configuration of Exadata is eight-node with a list price of around $1.5 million.
Despite the fact that Oracle RAC is an expensive solution with limited improvement in scalability and availability, it is still a good choice for existing Oracle customers who do not want to change their code and want to run their applications seamlessly.
5. Cloud Computing Database and Data Management
Many early papers have discussed cloud computing and the distributed cloud database. NIST
defines cloud computing as a model for enabling ubiquitous, convenient, on-demand network access
to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications,
and services) that can be rapidly provisioned and released with minimal management effort or
service provider interaction (Mell & Grance, 2011). Its key characteristics include: on-demand self-
service, broad network access, resource pooling, rapid elasticity and measured service.
A data management system can be migrated to the cloud platform and even become
Database-as-a-Service (DBaaS). DBaaS must be able to scale out elastically and support databases and workloads of different sizes. Curino et al. believe database partitioning is essential to allow multi-node load balancing and scale-out, and that partitioning should be done in a way that minimizes the number of cross-node distributed transactions. They proposed a graph-based partitioning method to spread large databases across many machines. (Curino, 2011)
Jaroslav Pokorny states that traditional applications use vertical scaling to support a larger system, while cloud computing uses horizontal scaling to scale out in a more effective and cheaper way. NoSQL databases in the cloud relax some of the usual database constraints to achieve horizontal scaling. The author discusses the ACID principle and the CAP theorem. Traditional RDBMSs always try to guarantee consistency at all costs. Distributed Hash Tables (DHT) are a common technique used to partition the data so that a DBMS can scale out in the cloud. The author selected two NoSQL DBs, Cassandra and Google DB, as typical examples and discusses their data models, how to query them, and how the data gets stored. Cassandra even allows the developer to choose the consistency
degree in their client application to enable real-time transaction processing in the cloud. Also, the
author lists 10 popular NoSQL DBs and makes a comparison. The author concluded that the current NoSQL solution is good for unstructured data, but since it has difficulty providing even simple ACID properties, it is not a replacement for traditional RDBMSs. (Pokorny, 2013)
Rick Cattell walked through over twenty scalable data stores (Cattell, 2011) and compared
them from concurrency control, data storage, replication mechanism and transaction consistency
perspectives. He summarized six key features of a NoSQL DB. One of NoSQL’s key characteristics
is shared-nothing horizontal scaling architecture. Shared-nothing architecture allows NoSQL to
replicate and partition data over many servers, in order to scale and process data at much faster
speeds. It trades ACID constraints for performance and scalability. In his study, MySQL Cluster,
VoltDB, Clustrix, ScaleDB, ScaleBase and NuoDB are tagged as high consistency databases.
Not all NoSQL systems sacrifice consistency for high availability. Facebook Cassandra is a
key-value NoSQL store. It extends the concept of eventual consistency and implements a tunable consistency mechanism that allows a developer to decide how consistent the requested data should be. The developer can make trade-offs between consistency and latency. Cassandra allows clients
to specify a desired consistency level, ZERO, ONE, QUORUM and ALL, with each read or write
operation. ZERO indicates no consistency guarantee but offers the lowest possible latency, and ALL
is the highest consistency but it sacrifices availability. QUORUM is a middle-ground, ensuring
strong consistency. Use of these consistency levels should be tuned in order to strike the appropriate
balance between consistency and latency for the application. In addition to reduced latency, lower
consistency requirements mean that read and write services remain more highly available in the
event of a network partition. (Featherston, 2010)
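The rule behind these levels is the quorum overlap condition R + W > N. The sketch below is plain Python, not Cassandra's API; it shows why QUORUM reads combined with QUORUM writes yield strong consistency while ONE reads and writes do not.

```python
def is_strongly_consistent(n_replicas, write_level, read_level):
    """A read is guaranteed to overlap the latest acknowledged write
    whenever the read and write replica sets must intersect: R + W > N."""
    return read_level + write_level > n_replicas

N = 3                  # replication factor
QUORUM = N // 2 + 1    # 2 of 3 replicas

# QUORUM reads + QUORUM writes share at least one replica, so a read
# always observes the most recent acknowledged write.
assert is_strongly_consistent(N, QUORUM, QUORUM)   # 2 + 2 > 3
# ONE + ONE need not overlap: lower latency, weaker consistency.
assert not is_strongly_consistent(N, 1, 1)         # 1 + 1 <= 3
```

Tuning the two levels per operation is exactly the consistency/latency trade-off Featherston describes: lowering either level shrinks the set of replicas that must respond, cutting latency and improving availability under partition.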
Dr. Michael Stonebraker and Rick Cattell, leading researchers and technology entrepreneurs
in the database field, claim that one size does not fit all. (Stonebraker & Cattell, 2011) They studied
many scalable SQL and NoSQL data stores introduced over the past five years for consistency
guarantees, per-server performance, scalability for read versus write loads, automatic recovery from
failure of a server, programming convenience, and administrative simplicity. These data stores can
manage data volume that exceeds the capacity of single-server RDBMSs. These researchers think
that a developer needs to redesign the application for scalability, partition application data into
“shards,” avoid operations that span partitions, design for parallelism, and weigh requirements for
consistency guarantees. They presented ten rules that a system should follow in order to achieve
scalable performance. Some of the rules are: shared-nothing architecture, leverage of fast memory,
high availability, automatic recovery, avoidance of multi-node operations, not trying to build ACID
yourself, and recognition that per-node performance matters.
At the Symposium on Principles of Distributed Computing (PODC) 2000, Eric Brewer
presented his CAP theorem, also known as Brewer’s CAP Theorem. The three key properties of
distributed databases are tolerance for Network Partition, Consistency and Availability. Brewer
states that any shared-data system can only have, at most, two of those properties. Daniel Abadi of
Yale University observes that consistency has become adjustable in modern distributed database systems. He criticized the CAP theorem for focusing on the trade-off between consistency and availability only when a partition occurs, while ignoring latency. He argued that a system should also trade off latency and consistency in the absence of a partition. He went further and proposed his PACELC theorem: if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)? (Abadi, 2012) Figure 2 below illustrates the PACELC trade-offs.
Figure 2: PACELC Tradeoffs for Distributed Data Services (Abadi, 2012)
Eric Brewer recognized the limitation of his CAP theorem in the evolving environment. The
CAP theorem asserts that any networked shared-data system can have only two of three desirable
properties. Twelve years after presenting the CAP theorem, Brewer thinks that designers can optimize consistency and availability by explicitly handling partitions, thereby achieving some trade-off of all three. (Brewer, 2012)
The CAP theorem has become a guiding principle for designing NoSQL databases. However, many transactional applications cannot give up strong consistency requirements. Tim Kraska et al. found through their experiments that there is a non-trivial trade-off between cost, consistency and availability.
They advocate finding a balance between cost, consistency, and availability and present a number of
techniques that let the system dynamically adapt the consistency level by monitoring the data and/or
gathering temporal statistics of the data. (Kraska, 2009) They acknowledge that high consistency
implies high cost per transaction and reduced availability and propose to divide the data into three
categories: Category A – Serializable, Category B – Adaptive and Category C - Session
Consistency. They understand the challenge in relaxed consistency models is to provide the best
possible cost/benefit ratio while still providing understandable behavior to the developer. The
different categories bear different consistency level guarantees. The system only pays the
consistency cost when it matters.
CHAPTER III: METHODOLOGY
Cloud computing is becoming one of the hottest trends in the current IT industry. The traditional RDBMS was not designed for the cloud environment, so it does not fit the cloud’s unique characteristics. Novel data management solutions have boomed in recent years, and it is not feasible to review them all. I employ a systematic approach to select the solutions for study in this essay.
The major selection criteria for review include:
Solutions must have potential to scale in the cloud for OLTP transactions.
Solutions must be able to provide strong consistency.
Solutions should be either well-known or have great growth potential.
Gartner Magic Quadrant for Operational Database Management Systems is used as an
important reference to narrow down the candidate solutions. This essay focuses on solutions that
can already or have potential to scale in the cloud for OLTP transactions. The ability to provide
strong consistency is a key factor. Gartner Magic Quadrants is a research methodology and
visualization tool for monitoring and evaluating the progress and position of companies in a
specific, technology-based market. Their report is based on their survey of hundreds of customers.
Gartner requests feedback on a vendor’s completeness of vision and ability to execute it. Gartner sets inclusion criteria for the companies listed in Magic Quadrants. For the 2013 Gartner Magic Quadrant for Operational Database Management Systems, companies to be considered must have over 100 customers across at least two of the major geographic regions, with a minimum revenue of $20 million.
SAP, Oracle, NuoDB, Clustrix and VoltDB, which I am reviewing, are listed in the 2013
Gartner Magic Quadrant for Operational Database Management Systems. Megastore from Google is
only used within Google and is not considered based on Gartner’s inclusion criteria, but given
Google’s influence in the cloud and Megastore’s unique design, Megastore is selected as one of the
DBs to be reviewed. ScaleDB is selected mostly because it is one of the MySQL variant databases: while MySQL Cluster adopts a shared-nothing architecture, ScaleDB tries to solve the same scaling problem with a surprisingly retro shared-disk architecture.
The DB-Engines ranking is a good reference that helped me understand the popularity of the reviewed DBs. DB-Engines ranks databases by popularity, which is scored through a number of parameters such as the number of mentions on websites, general interest according to Google Trends, the number of job offers, and so on. As I focus on cloud-based OLTP databases, a relatively new domain, it is understandable that most of the reviewed DBs have a relatively low ranking. But their recent growth trends can provide some additional insight beyond Gartner’s Magic Quadrant.
Below is the graph of reviewed DB growth trends that I drew using the tool provided by DB-Engines. MySQL ranks very high on the popularity scale and was second in the overall ranking, but
it is an umbrella for all MySQL products. The MySQL cluster in my essay should only account for a
small portion of the MySQL market. As you can see in the graph below, five out of six DBs are
gaining popularity, the exception being MySQL. Again, Google’s Megastore is not listed as it is
only used in internal projects.
Figure 3: Trend Popularity Data provided by DB-Engines.
The selected data management solutions are reviewed with a focus on their architecture and
the data consistency they can provide.
CHAPTER IV: CASE STUDIES
1. Megastore
Megastore is a storage system developed by Google and has been widely deployed within
Google for many years. It blends the scalability of a NoSQL data store with the convenience of a
traditional RDBMS in a novel way and provides both strong consistency guarantees and high
availability. (Baker et al., 2011)
The goal of Megastore is to overcome the weaknesses of common NoSQL solutions which
can only provide eventual consistency. Megastore is designed for high availability, scalability and
low latency data management. More importantly, it can provide full ACID semantics, which is critical for many applications. It is built upon Google’s BigTable (Google’s NoSQL key-value store) and adds ACID transactions, secondary indexes, queues and other primitives. It optimizes the Paxos algorithm to
achieve low latency replication operation across geographically distributed datacenters, in order to
provide reasonable latencies for interactive applications in a highly distributed environment.
The data in Megastore is partitioned into so-called entity groups which are hierarchically
linked sets of entities. In one entity group, all the entities share the common prefix of the primary
key. Megastore tables are either entity group root tables or child tables. Each child table must
declare a single distinguished foreign key referencing a root table. (Baker et al., 2011) Each record
in the root table, along with all the associated data in the child tables, is deemed a single entity group. Megastore keeps a transaction log for each entity group, replicated to the other copies, so it can ensure the ACID properties within an entity group. Megastore supports two-phase commit for operations across entity groups, but two-phase commit is expensive. Google recommends always using asynchronous messaging when consistency for a cross-entity-group operation is not absolutely required.
The figure below illustrates two-phase commit and asynchronous messaging for cross-entity-group operations.
Figure 4: Operations across entity groups (Baker et al., 2011)
Megastore provides three types of read methods for users’ various requirements: current, snapshot, and inconsistent reads. As it is built on top of Google’s BigTable NoSQL technology,
Megastore stores multiple values with different timestamps to achieve multiversion concurrency
control (MVCC). The current read fetches the latest committed version of data. The snapshot read
fetches the data by timestamp. The inconsistent read is similar to the current read and tries to get the
latest version of data, but unlike current read, the inconsistent read does not check whether all the
committed transaction logs have been applied, so it does not guarantee that it can fetch the latest
version of data. The inconsistent read is faster than the other two types of reads and can be used
when stale or partially applicable data can be tolerated.
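The three read types rest on multiversioning. The toy store below is an illustration of MVCC, not Megastore's code: every committed version is kept with its timestamp, a current read takes the newest version, and a snapshot read picks the newest version at or before a requested timestamp (the key and values are invented).

```python
class MVCCStore:
    """Keeps every committed version of a value with its timestamp,
    as a BigTable-style multiversioned cell does."""
    def __init__(self):
        self.versions = {}  # key -> sorted list of (timestamp, value)

    def write(self, key, ts, value):
        self.versions.setdefault(key, []).append((ts, value))
        self.versions[key].sort()

    def current_read(self, key):
        # Latest committed version.
        return self.versions[key][-1][1]

    def snapshot_read(self, key, ts):
        # Newest version at or before the requested timestamp.
        candidates = [v for t, v in self.versions[key] if t <= ts]
        return candidates[-1]

store = MVCCStore()
store.write("photo:1", ts=10, value="v1")
store.write("photo:1", ts=20, value="v2")
assert store.current_read("photo:1") == "v2"
assert store.snapshot_read("photo:1", ts=15) == "v1"
```

An inconsistent read would behave like `current_read`, but against a replica that may not yet have applied all committed log entries, which is why it is faster and why its result may be stale.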
Megastore is built upon BigTable and adds traditional RDBMS primitives such as ACID,
indexes and schemas. It provides transparent replication and failover between data centers through
Paxos. It stores the transaction in the log first and applies it for each entity group. The ACID
properties are protected over an entity group for any transaction. For transaction across multiple
entity groups, users can choose light-weight asynchronous messaging or expensive two-phase
commit to suit their needs. Data writes need a majority of replicas to be up in order to commit, so it
may be more costly. Global consistency comes at a cost and Megastore writes are relatively slow.
Megastore is scalable, partition tolerant, and provides adjustable ACID control. It is suitable for a large-scale transaction processing system where the data can be easily partitioned. It performs best in applications with mostly reads and small updates.
2. SAP HANA
SAP HANA is an in-memory, columnar-based, relational database developed by the German software giant SAP AG. The name HANA is short for "High-Performance Analytic Appliance". SAP HANA employs a hybrid engine that can process both column-based and row-based stores. The row-based engine is suitable for OLTP and the column-based engine is suitable for OLAP. SAP merged the two engines into one DB in order to provide real-time analytics functionality. Both
engines share a common persistency layer, which provides data persistency consistent across both
engines (Krutov et al., 2014). It has a logging system that records all changes in in-memory pages.
The logger writes all the committed transactions on persistent storage. The key component that is
responsible for ensuring transactional ACID is Transaction Manager. It coordinates database
transactions, controls transactional isolation, and keeps track of running and closed transactions.
The transaction manager works with the persistency layer to achieve atomic and durable
transactions and provide consistent views of data.
Figure 5: The SAP HANA database architecture (Färber, 2012)
SAP HANA adopted the shared-nothing architecture and is designed for a highly distributed
system. The system can be split into three different functional components: Name server, Index
server and Statistics server. The Name server stores the topology information of the SAP HANA
system and provides directory service. The Index server contains the actual data stores and the
engines for processing the data. The Statistics server collects historical performance data for alerting
and performance analysis purposes. All the servers can be deployed across multiple nodes. One of the index servers becomes the master of the index-server cluster, and the other index servers act as slaves. If the application data is not partitioned properly, the master index server has to forward a request to all the slave index servers, performing a multi-hop process that increases processing latency. When the data is well partitioned, the master index server only needs to forward a request to the index server hosting the corresponding data partition.
SAP HANA relies on MVCC as the underlying concurrency control mechanism; the system provides distributed snapshot isolation and distributed locking to synchronize multiple writers. It uses a distributed locking scheme with a global deadlock detection mechanism, avoiding a centralized lock server as a potential single point of failure (Wada et al., 2011).
SAP HANA provides true ACID guarantees. If the data can be partitioned properly, SAP HANA can scale to hundreds of nodes with minimal performance penalty. Its architecture is
good for performing both OLTP and OLAP in one place. It eliminates the traditional Extract,
Transform and Load (ETL) process that pulls the data from OLTP into OLAP for data mining and
reduces data redundancy.
SAP HANA can be scaled out for read-heavy data analytics processing. It suffers a performance penalty if online transaction processing has to access data in a remote distributed environment.
3. VoltDB
VoltDB is an open source OLTP database that implements the design of the academic H-
Store project. It is an in-memory DB that adopts the shared-nothing architecture and is designed to run on a cluster rather than a single big machine. Tables can be partitioned across multiple servers in
the cluster. It can scale-out on a commodity machine and deliver ultra-high performance while
protecting the ACID properties.
VoltDB is designed to run much faster than traditional RDBMSs. Its main founder, Michael
Stonebraker, identified the four major overheads of traditional RDBMSs: buffer pool overhead,
multi-threading overhead, record-level locking, and the write-ahead log. He proposed a novel design that eliminates all four of those overheads.
The interface that VoltDB exposes is the stored procedure. A transaction is a stored procedure call, executed sequentially and exclusively against its data. That is how VoltDB protects the transaction atomicity and isolation properties. VoltDB is a pure SQL system. It
supports a large subset of SQL-92, including most SQL data types, along with filtering, joins and
aggregates (VoltDB whitepaper, 2010). VoltDB can accept ad-hoc queries, but it simply compiles each query on the fly into a temporary stored procedure and calls it the same way (Stonebraker, 2011).
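The single-threaded execution model can be illustrated with a toy executor. This is plain Python, not VoltDB's engine: transactions are queued and each runs to completion before the next begins, so operations never interleave and no record-level locks are needed (the `counter` data and `increment` procedure are invented for the example).

```python
from queue import Queue
import threading

class SerialExecutor:
    """Runs each transaction (a stored-procedure call) to completion,
    one at a time, against its partition's data."""
    def __init__(self):
        self.data = {"counter": 0}
        self.queue = Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            proc, args, done = self.queue.get()
            proc(self.data, *args)   # exclusive access: no other work runs
            done.set()

    def call(self, proc, *args):
        done = threading.Event()
        self.queue.put((proc, args, done))
        done.wait()                  # block until the transaction finishes

def increment(data, amount):
    data["counter"] += amount        # atomic without any locking

ex = SerialExecutor()
for _ in range(100):
    ex.call(increment, 1)
assert ex.data["counter"] == 100
```

Serial execution trades concurrency within a partition for the elimination of locking and latching overhead, which is why the design favors many short transactions over long-running ones.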
VoltDB achieves high availability through automatic intra-cluster and inter-cluster
replication. Data is synchronously committed to replicate partitions within the cluster before
transactions commit, which provides durability against single-node failures. For inter-cluster replication over a WAN, transactions are asynchronously committed to a replica cluster. VoltDB implements a concept called command logging for transaction-level durability (VoltDB whitepaper, 2010). When a disaster occurs, VoltDB simply replays the logged commands to restore the data for
recovery.
For now, VoltDB only supports hash partitioning. When a new node is added to the cluster, VoltDB has to redistribute the data across the servers and cannot provide service until the redistribution is completed. Also, hash partitioning is not good for range searches. VoltDB is best suited to applications with a high volume of small transactions.
4. MySQL Cluster
MySQL Cluster enhances the standard open source MySQL with an in-memory clustered
storage engine known as the Network DataBase engine (NDB). It employs a shared-nothing clustering architecture and an automatic sharding technique, which give MySQL Cluster far greater scalability than a traditional RDBMS.
Unlike the Oracle RAC, in which all the nodes in the RAC are treated equally, nodes in the
MySQL cluster are assigned three different roles: Storage Node (SN), Management Server Node
(MGM) and MySQL Server Node. The SN stores all the data and replicates the data between nodes
to ensure high availability. It also handles all the database transactions. The MGM handles the
system configuration and it is only used at start-up and system re-configuration (Ronstrom, 2004).
The MySQL server node sits in between the application and the SN. It knows how the data is
partitioned in the SN and acts as a broker that takes the application SQL request and sends it to the
appropriate SN.
Figure 6: MySQL Node Architecture (Ronstrom, 2004)
MySQL Cluster uses a two-phase commit protocol to guarantee data consistency. All
the changes of a transaction are replicated synchronously to the nodes holding the other copies of
the data before the transaction commits. MySQL Cluster supports the read-committed transaction
isolation level and will not read uncommitted data from other transactions.
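The synchronous replication step can be sketched as a minimal two-phase commit, assuming a coordinator and a set of replicas. This is illustrative only, not NDB's actual wire protocol:

```python
class Replica:
    def __init__(self):
        self.committed = {}
        self.staged = None

    def prepare(self, change):   # phase 1: stage the change and vote
        self.staged = change
        return True              # vote "yes" (a real node may vote "no")

    def commit(self):            # phase 2: make the staged change durable
        self.committed.update(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None

def two_phase_commit(replicas, change):
    # Phase 1: every replica holding a copy of the data must vote yes.
    if all(r.prepare(change) for r in replicas):
        for r in replicas:       # Phase 2: all replicas commit together
            r.commit()
        return True
    for r in replicas:           # any "no" vote aborts everywhere
        r.abort()
    return False

nodes = [Replica(), Replica()]
ok = two_phase_commit(nodes, {"k": "v"})
print(ok, nodes[0].committed == nodes[1].committed)  # True True
```

The point of the two phases is that no replica applies the change until all replicas have promised they can, so the copies never diverge on a committed transaction.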
5. ScaleDB
ScaleDB is a pluggable storage engine that transforms MySQL into a cluster of database
servers. It uses a shared-disk architecture similar to Oracle RAC's.
ScaleDB is composed of three different types of nodes: Database Node, Cluster Manager
Node and ScaleDB Storage Node (Shadmon, 2009). Its design is almost the same as MySQL
Cluster except for the shared-disk architecture. All the storage nodes form a global cache and
persistency layer. The global cache manages caching of shared data and guarantees cache
coherency.
Figure 7: ScaleDB Cluster with Mirrored Storage (Shadmon, 2009)
ScaleDB uses a locking mechanism to guarantee ACID properties. The local lock manager
maintains locks at the node level, while the distributed lock manager manages cluster level locks.
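The two-level locking idea can be sketched as follows. The `LockManager` class is a hypothetical simplification in which a single mutex stands in for cluster-level arbitration; ScaleDB's actual local and distributed lock managers are far more involved:

```python
import threading

class LockManager:
    """Toy cluster-level lock table: one node at a time may hold the
    lock on a shared resource (e.g. a cached page)."""
    def __init__(self):
        self.cluster_locks = {}        # resource -> owning node
        self.mutex = threading.Lock()  # stands in for distributed arbitration

    def acquire(self, node, resource):
        with self.mutex:
            owner = self.cluster_locks.get(resource)
            if owner is None:
                self.cluster_locks[resource] = node
                return True
            return owner == node       # re-entrant for the same node

    def release(self, node, resource):
        with self.mutex:
            if self.cluster_locks.get(resource) == node:
                del self.cluster_locks[resource]

dlm = LockManager()
print(dlm.acquire("node1", "page:42"))  # True
print(dlm.acquire("node2", "page:42"))  # False: node1 holds the lock
dlm.release("node1", "page:42")
print(dlm.acquire("node2", "page:42"))  # True
```

In a real shared-disk cluster the lock table itself is distributed, but the invariant is the same: only one node may mutate a shared page at a time, which is what keeps the global cache coherent.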
ScaleDB can offer high availability. The cluster manager node detects the failure of other
nodes and takes action to resolve the issue. A standby cluster manager takes over if the master
cluster manager fails.
Shared-disk architectures have historically faced scalability challenges and performance
bottlenecks, which explains why most new databases choose shared-nothing architecture. But
the speed of network connection has increased drastically in recent years and the cost of high
performance storage has been significantly reduced, especially in the cloud. ScaleDB is suitable for
the cloud environment and delivers high availability. There is no single point of failure in ScaleDB.
As all the storage nodes need to share their storage with other nodes, its scalability is limited.
6. NuoDB
NuoDB is a distributed database designed with global application deployment challenges in
mind. It is a true SQL service, with all the properties of ACID transactions, standard SQL language
support, and relational logic (NuoDB, 2013).
NuoDB is composed of three layers: an administrative tier, a transactional tier and a storage
tier. The transaction layer maintains atomicity, consistency and isolation, while the storage layer is
responsible for durability. All of these layers can run on a single server, but by design they are
intended to run on separate servers. The node for the transaction layer is known as the transaction
engine (TE) and the node for the storage layer is known as the storage management node (SM). Unlike
a typical hub-and-spoke design, NuoDB uses peer-to-peer services to move data between its nodes. The
administrative tier is responsible for monitoring, managing and automating database activity. As all
processes are peers, all the hosts are equal and there is no single point of failure. The peer-to-peer
communication is formed through a local management agent installed on every host, and the
administrative tier has a global view of all the nodes.
NuoDB introduced a new concept called Atoms. Atoms are chunks of data used for
simplifying internal communication and caching. A NuoDB database is simply a collection of Atoms
that can easily be stored in key-value stores or other kinds of storage.
NuoDB uses MVCC to avoid conflicts between concurrent reads and writes. For write-write
conflicts, NuoDB picks one host as the chairman of the object to act as a tie-breaker. Only a TE that
caches the object can be selected as chairman, so most mediation is done locally. NuoDB sends
asynchronous update messages to all peers that have a copy of the object when TE commits a
transaction.
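The chairman mechanism can be sketched roughly as follows. This is a hypothetical simplification of NuoDB's mediation protocol; the class and method names are invented:

```python
class Chairman:
    """Per-object conflict mediator: one TE that caches the object is
    elected chairman and grants writes in arrival order."""
    def __init__(self):
        self.pending_writer = None

    def request_write(self, te):
        if self.pending_writer is None:
            self.pending_writer = te
            return "granted"
        return "rejected"        # second concurrent writer must retry/abort

    def finish_write(self):
        self.pending_writer = None

chair = Chairman()
print(chair.request_write("TE-1"))  # granted
print(chair.request_write("TE-2"))  # rejected: write-write conflict
chair.finish_write()
print(chair.request_write("TE-2"))  # granted
```

Because the chairman is always a TE that already caches the object, the common case needs no extra network hop, which is why most mediation is local.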
Figure 8: NuoDB architecture (NuoDB, 2013)
NuoDB is a high performance and resilient database solution that supports SQL and ACID
properties. The peer-to-peer architecture makes NuoDB very easy to scale-out and achieve high
availability. The unique Atom design allows different data stores to be selected for storage, such as
a local file system, Amazon S3 or a Hadoop Distributed Filesystem.
Every SM node of NuoDB holds a complete copy of the database; NuoDB relies on the
underlying storage to provide the data distribution. To guard against a single SM failure, NuoDB
allows installation of multiple SM nodes, but each additional SM node means additional storage for
another copy of the dataset. If there is a limit on the underlying storage, NuoDB cannot go beyond
that limit, since it cannot combine local disks into one big pool.
For consistency during write-write conflicts, NuoDB uses a tie-breaker mechanism that is
similar to the locking systems of traditional RDBMSs. This might become a bottleneck in a highly
concurrent OLTP system. NuoDB is best suited for a hybrid workload that combines both OLTP and
OLAP requirements.
NuoDB adopted a tunable commit protocol that allows the user to trade off configurable
durability for performance.
NuoDB is well suited for applications that need transactional consistency, highly scalable
performance and simple operation.
7. ClustrixDB
ClustrixDB is designed from the ground up for scale-out in the cloud. It adopts a fully-
distributed shared-nothing architecture and can easily be scaled out by adding more nodes. It fully
supports SQL, is fully ACID compliant, and can handle massive volumes of ACID transactions.
Just like many other NewSQL DBs, ClustrixDB stores multi-version rows with timestamps,
so there is no conflict in providing consistent read while updating the record. The older versions are
garbage-collected when no longer used. For concurrent updates, ClustrixDB uses two-phase locking
(2PL) to order updates. Writers always read the latest committed information and acquire locks
before making any changes (Clustrix, 2014).
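The combination of non-blocking versioned reads and locked writes can be sketched as follows. This is a single-node toy model with invented names; Clustrix's real implementation is distributed across nodes:

```python
import threading

class Row:
    """Readers pick a committed version by timestamp and never block;
    writers serialize through a lock (a simplification of 2PL)."""
    def __init__(self, value, ts):
        self.versions = [(ts, value)]      # (commit_timestamp, value)
        self.write_lock = threading.Lock()

    def read(self, as_of):
        # Consistent read: newest version committed at or before `as_of`.
        for ts, value in reversed(self.versions):
            if ts <= as_of:
                return value

    def update(self, fn, ts):
        with self.write_lock:              # writer acquires the lock first,
            latest = self.versions[-1][1]  # reads the latest committed value,
            self.versions.append((ts, fn(latest)))  # appends a new version

row = Row(10, ts=1)
row.update(lambda v: v + 5, ts=2)
print(row.read(as_of=1), row.read(as_of=2))  # 10 15
```

A reader that started before the update still sees the old version, so updates never conflict with consistent reads, which is the behaviour described above.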
ClustrixDB distributes multiple copies of data intelligently across nodes and can parallelize
the query automatically by using multiple nodes and multiple cores on each node, so it can
accelerate the queries without sharding the database.
Figure 9: Clustrix Distributed Query Processing (Clustrix, 2014)
ClustrixDB is suitable for applications that run both OLTP and OLAP workloads at the same time.
ClustrixDB claims it can scale linearly to hundreds of cores since it employs shared-nothing
architecture, intelligent data distribution, and distributed query processing.
CHAPTER V: CHALLENGE, OPPORTUNITY AND TREND
In this essay, I have reviewed eight databases: Google Megastore, SAP HANA, VoltDB,
MySQL Cluster, ScaleDB, NuoDB and ClustrixDB, along with Oracle RAC. In the latest
2013 Gartner Magic Quadrant for Operational Database Management Systems, Oracle and SAP
were recognized as market leaders, while VoltDB, NuoDB and Clustrix were listed as niche players
that have potential to challenge the leaders. Google’s Megastore is only used internally and cannot
be evaluated by third parties.
Most current databases provide multi-version concurrency control (MVCC) for read
consistency. But the way the DBs I reviewed implement MVCC is different than traditional
RDBMSs. Traditional RDBMSs such as Oracle RAC only keep the latest version of the data in the
database and try to reconstruct older versions of data dynamically as required. The new approach for
MVCC is to store multiple versions of data in the database and garbage-collect records when they
are no longer needed. Of the seven reviewed DBs, all except VoltDB take this new approach to
MVCC. VoltDB instead serializes all transactions, so there is no concurrency issue at all. It is obvious that storing
multiple versions of records requires more disk space, but it can avoid the expensive operation of
reconstructing older versions of data. Storing multiple versions of records can achieve better
performance and scalability and opens a door toward eventual consistency.
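The "store multiple versions and garbage-collect" approach can be sketched as follows. This is a single-node toy model; real systems track the oldest active reader cluster-wide:

```python
class VersionedRow:
    """Multi-version storage: old versions are kept until no reader
    can still see them, then garbage-collected."""
    def __init__(self, value, ts):
        self.versions = [(ts, value)]  # (commit_timestamp, value)

    def write(self, value, ts):
        self.versions.append((ts, value))

    def read(self, as_of):
        for ts, value in reversed(self.versions):
            if ts <= as_of:
                return value

    def garbage_collect(self, oldest_reader_ts):
        # Keep every version newer than the oldest live reader, plus the
        # single version that reader would see; drop the rest.
        visible = max((v for v in self.versions if v[0] <= oldest_reader_ts),
                      key=lambda v: v[0], default=None)
        self.versions = [v for v in self.versions
                         if v[0] > oldest_reader_ts or v is visible]

row = VersionedRow("v1", ts=1)
row.write("v2", ts=5)
row.write("v3", ts=9)
row.garbage_collect(oldest_reader_ts=6)      # no reader older than ts 6
print(len(row.versions), row.read(as_of=6))  # 2 v2
```

Unlike the Oracle-style approach, no old version ever has to be reconstructed: reads simply select an already-stored version, at the price of the extra disk space noted above.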
Application consistency requirements might vary for different types of applications.
Applications such as social networking and commenting can tolerate a different view of data for a
short period of time. For mission critical applications, such as bank account management, strong
consistency has to be guaranteed.
Of the reviewed DBs, only Megastore allows the developer to choose among three different
read consistency levels. SAP HANA, MySQL Cluster, NuoDB and ScaleDB all enforce ACID
compliance. VoltDB sequences all operations. There is no concurrency issue for VoltDB, so no
need to worry about data consistency.
In a distributed cloud environment, providing consistency is much more complex than in a
traditional database. It requires a great deal of communication involving remote locks, which force
systems to wait on each other before proceeding to mutate shared data. It is common to employ a
data sharding technique to avoid cross-node consistency issues. The inevitable communication delays
and reduced reliability across networks increase the complexity of guaranteeing complete consistency
in a cloud environment. Megastore uses Paxos, and supports two-phase commit for operations that
span entity groups. Google recognized the overhead of two-phase commit, so it recommends using
asynchronous messaging whenever consistency across entity groups is not absolutely required.
The common strategy is to employ two-phase commit, but its complexity and costs reduce the
system scalability and performance even when using the best practices and the most advanced
technologies of the time.
There is always a cost attached to strong consistency. For many enterprise applications,
strong consistency is a must in some of its use cases. But the consistency requirement in many use
cases can be relaxed for the benefit of availability and scalability. For example, the account balance
of a prepaid cellular service is critical and has to be tracked in real time in order to authorize a
subscriber's call request. But the account balance is not that important for the majority
of postpaid accounts since the carrier collects payment at the end of the customer’s bill cycle. As
long as the account balance of a postpaid account becomes consistent before the bill cycle, it is good
enough for the carrier to bill the customer correctly. Even for a prepaid account, not all the changes
within its transactions demand strong consistency. Account balance must maintain consistency all
the time, but the call detail does not have to be online right after the customer hangs up the phone.
The customer is unlikely to notice if the call details are posted with a delay of seconds or even
minutes. From a business perspective, some data in some transactions require absolute consistency
while others do not. Even different customers may have different consistency needs. If a database
allows its consistency to be customized, leaving the business analyst the job of choosing the
desired consistency level for different transactions or users, then the system does not have to pay the
consistency costs for transactions where consistency is not really needed. Google’s Megastore
allows prioritization of consistency over performance. It provides three levels of read consistency:
current, snapshot, and inconsistent reads. If the database can let the user customize the consistency
model for the content of data on top of session level consistency control, it should be able to reduce
the cost of maintaining strong consistency in transactions. For instance, a money transfer in a bank
must be executed under the strong consistency model. If the data management system can impose
strong consistency on the account balance information while applying eventual consistency to the
auditing and logging information, it might still meet the banking system's requirements, while
significantly reducing the amount of data that must be synchronized, and hence the cost of
maintaining strongly consistent transactions.
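Such per-data consistency rationing could look roughly like this hypothetical policy-table sketch, in the spirit of Kraska et al. (2009); the category names and replication hooks are invented for illustration:

```python
# Policy table: each data category is assigned a consistency level, so
# only the data that needs it pays for strong consistency.
POLICY = {
    "account_balance": "strong",    # must be synchronously replicated
    "call_detail":     "eventual",  # may lag by seconds or minutes
    "audit_log":       "eventual",
}

def write(category, record, replicate_sync, replicate_async):
    # Default to strong consistency for any unclassified category.
    if POLICY.get(category, "strong") == "strong":
        replicate_sync(record)   # wait for all replicas before acknowledging
    else:
        replicate_async(record)  # acknowledge now, propagate in background

synced, queued = [], []
write("account_balance", {"acct": 1, "balance": 20}, synced.append, queued.append)
write("call_detail", {"acct": 1, "secs": 95}, synced.append, queued.append)
print(len(synced), len(queued))  # 1 1
```

Only the balance update pays the synchronous replication cost; the call detail is propagated lazily, matching the prepaid example above.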
Among the traditional major RDBMS products, Oracle, Microsoft SQL Server and
IBM DB2 rank first, third and fifth respectively in the DB-Engines popularity ranking. Cloud
DBs have a relatively lower ranking of popularity. Table 1 compares the seven reviewed DBs in
terms of popularity, storage architecture and consistency. The popularity is based on the DB-Engine
ranking on relational DBMSs for July 2014. The latest result can be found at http://db-
engines.com/en/ranking/relational+dbms
MySQL is ranked the second-most popular database after Oracle, thanks to the wide adoption
of the regular MySQL database. The MySQL Cluster I discussed in this essay is quite different from
the single-machine MySQL database, but DB-Engines ranks the whole MySQL family as a single
product, so the number-two ranking does not reflect the popularity of MySQL Cluster at all.
Database        Popularity    Storage Architecture    Consistency
Megastore       Google only   Shared-nothing          Adjustable ACID
SAP HANA        13            Shared-nothing          ACID
VoltDB          40            Shared-nothing          ACID
MySQL Cluster   2             Shared-nothing          ACID
ScaleDB         78            Shared-disk             ACID
NuoDB           43            Shared-nothing          Tunable ACID
ClustrixDB      52            Shared-nothing          ACID

Table 1: Reviewed DB comparison
Speed, scalability and availability are all interesting areas for comparison, but each DB
might vary in performance in different use cases; there are no standard criteria to rank these
systems. Most benchmarks are conducted internally and under each vendor's favoured use cases.
VoltDB is touted as a super-fast OLTP engine. It targets high-velocity data to provide real time
analytics on it. SAP HANA is well-known for its ability to combine OLTP and OLAP engines in
one. MySQL Cluster pioneered the shared-nothing architecture and has the ability to scale to tens
of nodes. ScaleDB is a pluggable engine for MySQL and provides a cloud-based
solution for scalability. NuoDB claims high scalability through its three-tier structure of
administrative, transactional and storage layers. ClustrixDB is designed from the ground up for
scale-out in the cloud, so it is plausible that it has great scalability there.
According to the InformationWeek 2014 State of Database Technology Survey of 955
business technology professionals, the traditional RDBMS is still widely used today, but the cloud
DBs are becoming attractive. MongoDB ranks tenth in use today. The popularity trend depicted
in Chapter III shows that five of the reviewed cloud DBs gained popularity, while the other two
(Megastore and MySQL Cluster) have no data to measure.
Traditional RDBMSs are suitable for processing structured data in a transactional fashion,
but cannot handle massive data volumes with decent performance. NoSQL DBs can scale easily to
process high volumes of all kinds of data, but they cannot guarantee strong consistency. Ensuring strong consistency
in the cloud environment is expensive and comes at the cost of performance and availability. It is a
challenge for the data management solution providers to come up with a solution that can overcome
the limitations of both RDBMSs and NoSQL DBs. NewSQL is emerging to tackle the issue.
CHAPTER VI: CONCLUSIONS AND RECOMMENDATIONS
Cloud computing offers significant benefits and presents great opportunities for the
information technology industry. The traditional RDBMS is designed to run on a single big
machine. Its cluster solutions are based on shared-disk architecture and group a number of servers
into one cluster. This protects the system when any part of a server experiences a hardware issue,
but it cannot scale linearly, since the shared storage is a single point of failure and a performance
bottleneck. These RDBMSs are not natively suited to a cloud environment. Many NoSQL solutions
have been invented by Internet companies to tackle their scalability and performance challenges in
the cloud. They use partitioning or a sharding technique to distribute data into multiple nodes so
they can better scale with near-linear performance. But they have to relax the notion of strong
consistency for the benefit of improved scalability and only offer eventual consistency. For many
web applications, eventual consistency is good enough to satisfy their requirements. For instance, a
social networking or job-search website, in order to maintain high availability, avoids the overhead
of synchronizing data between distant nodes. It is common to keep multiple replicas in the cloud.
The replica approach can be master/replica, master/master, or distributed peers. Most NoSQL DBs do
not guarantee the strong data consistency that traditional RDBMSs offer. Strong data consistency is
still critical for many enterprise OLTP applications, so there is tremendous opportunity for new
vendors to introduce novel solutions that address these enterprises’ concerns. NewSQLs are
emerging and trying to combine both NoSQL’s and traditional RDBMS’s advantages.
In this essay, I have surveyed cloud computing technology, cloud NoSQL and NewSQL DBs, data
consistency theory, the traditional RDBMS cluster Oracle RAC, and seven selected data management
solutions. All seven database solutions recognized the importance of strong data consistency and
tried to solve the issue through their unique architectures.
This essay explored the common architecture of cloud-based OLTP data management
solutions: shared-nothing storage, MVCC, and partitioning or sharding to avoid transactions across
distant nodes. In addition, this work has identified the challenges for future development in the
domain, especially fine-grained data consistency control. I hope my work will provide a better
understanding of data consistency challenges in the cloud environment and will assist system
architects in making better decisions when they are thinking of migrating their application to the
cloud. It can also be used as material for university students who have interests in the data
management domain.
References
Abadi, D. J. (2012). Consistency tradeoffs in modern distributed database system design. IEEE
Computer, 45(2), 37.
Aslett, M. (2011). How will the database incumbents respond to NoSQL and NewSQL? The 451
Group, San Francisco, 1-5.
Baker, J., Bond, C., Corbett, J. C., Furman, J. J., Khorlin, A., Larson, J., & Yushprakh, V. (2011,
January). Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
In CIDR (Vol. 11, pp. 223-234).
Brewer, E. A. (2000, July). Towards robust distributed systems. In PODC (p. 7).
Brewer, E. (2012). CAP twelve years later: How the "rules" have changed. Computer, 45(2), 23-29.
Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), 12-27.
Curino, C., Jones, E. P., Popa, R. A., Malviya, N., Wu, E., Madden, S., ... & Zeldovich, N. (2011).
Relational cloud: A database-as-a-service for the cloud.
Färber, F., Cha, S. K., Primsch, J., Bornhövd, C., Sigg, S., & Lehner, W. (2012). SAP HANA
database: data management for modern business applications. ACM Sigmod Record, 40(4),
45-51.
Featherston, D. (2010). Cassandra: Principles and application. University of Illinois, 7, 28.
Feinberg, D., Adrian, M., & Heudecker, N. (2013). Magic Quadrant for Operational Database
Management Systems. Gartner Research Note.
Hecht, R., & Jablonski, S. (2011). NoSQL Evaluation. In International Conference on Cloud and
Service Computing.
Hogan, M., Liu, F., Sokol, A., & Tong, J. (2011). NIST cloud computing standards roadmap. NIST
Special Publication, 35.
Clustrix Documentation. (2014). [Online] Available at:
http://docs.clustrix.com/display/CLXDOC/Home.
Hussain, S. J., Farooq, T., Shamsudeen, R., & Yu, K. (2013). New Features in RAC 12c. In Expert
Oracle RAC 12c (pp. 97-122). Apress.
Kraska, T., Hentschel, M., Alonso, G., & Kossmann, D. (2009). Consistency Rationing in the
Cloud: Pay only when it matters. Proceedings of the VLDB Endowment, 2(1), 253-264.
Kossmann, D., Kraska, T., & Loesing, S. (2010, June). An evaluation of alternative architectures for
transaction processing in the cloud. In Proceedings of the 2010 ACM SIGMOD International
Conference on Management of data (pp. 579-590). ACM.
Krutov, I., Vey, G., & Bachmaier, M. (2014). In-memory Computing with SAP HANA on IBM
eX5 Systems. IBM Redbooks.
Kumar, R., Gupta, N., Maharwal, H., Charu, S., & Yadav, K. (2014). Critical Analysis of Database
Management Using NewSQL.
Mell, P., & Grance, T. (2009). The NIST definition of cloud computing. National Institute of
Standards and Technology, 53(6), 50.
NoSQL - Wikipedia, the free encyclopedia. 2013. [ONLINE] Available at:
http://en.wikipedia.org/wiki/NoSQL
NuoDB, Inc. (2013). NuoDB: A Technical Whitepaper.
Pokorny, J. (2013). NoSQL databases: a step to database scalability in web environment.
International Journal of Web Information Systems, 9(1), 69-82.
Ronstrom, M., & Thalmann, L. (2004). MySQL cluster architecture overview. MySQL Technical
White Paper.
Shadmon, M. (2009). The ScaleDB Storage Engine.
Solid IT, “DB-Engines Ranking of database management systems” [Online]. Available: http://db-
engines.com/en/ranking
Stonebraker, M. (2011). NewSQL: An Alternative to NoSQL and Old SQL for New OLTP Apps.
Stonebraker, M., & Cattell, R. (2011). 10 rules for scalable performance in simple operation
datastores. Communications of the ACM, 54(6), 72-80.
Stonebraker, M., & Weisberg, A. (2013). The VoltDB Main Memory DBMS. IEEE Data Eng. Bull.,
36(2), 21-27.
Terry, D. (2013). Replicated data consistency explained through baseball. Communications of the
ACM, 56(12), 82-89.
VoltDB, LLC. (2010). VoltDB Technical Overview. Whitepaper.
Wada, H., Fekete, A., Zhao, L., Lee, K., & Liu, A. (2011, January). Data Consistency Properties
and the Trade-offs in Commercial Cloud Storage: the Consumers' Perspective. In CIDR
(Vol. 11, pp. 134-143).
Zhang, Q., Cheng, L., & Boutaba, R. (2010). Cloud computing: state-of-the-art and research
challenges. Journal of internet services and applications, 1(1), 7-18.