AN EVALUATION OF KEY-VALUE STORES IN

SCIENTIFIC APPLICATIONS

A Thesis Presented to

the Faculty of the Department of Computer Science

University of Houston

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

By

Sonia Shirwadkar

May 2017

AN EVALUATION OF KEY-VALUE STORES IN

SCIENTIFIC APPLICATIONS

Sonia Shirwadkar

APPROVED:

Dr. Edgar Gabriel, Chairman
Dept. of Computer Science, University of Houston

Dr. Weidong Shi
Dept. of Computer Science, University of Houston

Dr. Dan Price
Honors College, University of Houston

Dean, College of Natural Sciences and Mathematics

Acknowledgments

“No one who achieves success does so without the help of others. The wise acknowledge

this help with gratitude.” - Alfred North Whitehead

Although I have a long way to go before I am wise, I would like to take this opportunity

to express my deepest gratitude to all the people who have helped me in this journey.

First and foremost, I would like to thank Dr. Gabriel for being a great advisor. I

appreciate the time, effort and ideas that you have invested to make my graduate experience

productive and stimulating. The joy and enthusiasm you have for research were contagious

and motivational for me, even during tough times. You have been an inspiring teacher and

mentor and I would like to thank you for the patience, kindness and humor that you have

shown. Thank you for guiding me at every step and for the incredible understanding you

showed when I came to you with my questions. It has indeed been a privilege working with

you.

I would like to thank Dr. Shi and Dr. Price for accepting to be my committee members.

I truly appreciate the time and effort you spent in reviewing my thesis and providing

valuable feedback.

A special thanks to my PSTL lab-mates Shweta, Youcef, Tanvir, and Raafat. You have

contributed immensely to my personal and professional time at the University of Houston.

The last nine months have been a joy mainly because of the incredible work environment

in the lab. Thank you for being great friends and for all the encouragement that you have

given me.

A big thanks to Hope Queener and Jason Marsack at the College of Optometry for

teaching me the value of teamwork and work ethic. I truly enjoyed working with you.

I have been extremely fortunate to have the constant support, guidance, and faith of

my friends. A big thank you to all my friends in India, for constantly motivating me to

follow my dreams. Thank you for the late-night calls, care packages, and all the love that

you have given me in the time that I have been away from home. I would like to thank my

friends Omkar, Tejus, Sneha, Sonal, Aditya, and Shweta for being my family away from

home. I will forever be grateful for the constant assurance and encouragement that you

gave me. I would also like to thank my friends, classmates, and roommates here in Houston

for all their help and support.

A special thanks to all my teachers. I would not be here if not for the wisdom that you

have shared. You have empowered me to chase my dreams. Each one of you has taught

me important life lessons that have always guided me. I will be eternally grateful to have

been your student.

Last but by no means least, I would like to thank my family for always being there

for me. I would like to start by thanking my Mom and Dad for their unconditional love and

support. A very big thank you to Kaka and Kaku for all their love, concern and advice.

You all have taught me the beauty of hard-work and perseverance and this thesis would

never have been possible without you.

Finally, I would like to thank Parikshit for being my greatest source of motivation.

You inspire me every day to be a better version of myself, and I would never have made it

without you.

AN EVALUATION OF KEY-VALUE STORES IN

SCIENTIFIC APPLICATIONS

An Abstract of a Thesis

Presented to

the Faculty of the Department of Computer Science

University of Houston

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

By

Sonia Shirwadkar

May 2017

Abstract

Big data analytics is a rapidly evolving multidisciplinary field that involves the use of computing capacity, tools, techniques, and theories to solve scientific and engineering problems.

With the big data boom, scientific applications now have to analyze huge volumes of data.

NoSQL [1] databases are gaining popularity for this type of application due to their scalability and flexibility. There are various types of NoSQL databases available in the market

today [2], including key-value databases. Key-value databases [3] are the simplest NoSQL

databases where every single item is stored as a key-value pair. In-memory key-value stores

are specialized key-value databases that maintain data in main memory instead of the disk.

Hence, they are well suited for applications with high frequencies of alternating read and

write cycles.

The focus of this thesis is to analyze popular in-memory key-value stores and compare their performance. We have performed the comparisons based on parameters such as

in-memory caching support, supported programming languages, scalability, and utilization

from parallel applications. Based on the initial comparisons, we evaluated two key-value

stores in detail, namely Memcached [4] and Redis [5]. To perform extensive analysis of these

two data stores, a set of micro-benchmarks has been developed and evaluated for both

Memcached and Redis. Tests were performed to evaluate the scalability, responsiveness, and data-load-handling capacity; Redis outperformed Memcached in all test cases.

To further analyze the in-memory caching ability of Redis, we integrated it as a caching

layer into an air-quality simulation [6] based on Hadoop [7] MapReduce [8], which calculates

the eight-hour rolling average of ozone concentration at various sites in Houston, TX. Our

aim was to compare the performance of the original air-quality application that uses the

disk for data storage, to our application that uses in-memory caching. Initial results show

that there is no performance gain achieved by integrating Redis as a caching layer. Further

optimizations and configurations of the code are reserved for future work.

Contents

1 Introduction
1.1 Brief Overview of Key-Value Data Stores
1.2 Goals of this Thesis
1.3 Organization of this Document
2 Background
2.1 In-memory Key-value Stores
2.1.1 Redis
2.1.2 Memcached
2.1.3 Riak
2.1.4 Hazelcast
2.1.5 MICA (Memory-store with Intelligent Concurrent Access)
2.1.5.1 Parallel Data Access
2.1.5.2 Network Stack
2.1.5.3 Key-value Data Structures
2.1.6 Aerospike
2.1.7 Comparison of Key-Value Stores
2.2 Brief Overview of Message Passing Interface (MPI)
2.3 Brief Overview of MapReduce Programming and Hadoop Eco-system
2.3.1 Integration of Key-Value Stores in Hadoop
3 Analysis and Results
3.1 MPI Micro-benchmark
3.1.1 Description of the Micro-benchmark Applications
3.1.1.1 Technical Data
3.1.2 Comparison of Memcached and Redis using our Micro-benchmark
3.1.2.1 Varying the Number of Client Processes
3.1.2.1.1 Using Values of Size 1 KB
3.1.2.1.2 Using Values of Size 32 KB
3.1.2.2 Varying the Number of Server Instances
3.1.2.3 Varying the Size of the Value
3.1.2.4 Observations and Final Conclusions
3.2 Air-quality Simulation Application
3.3 Integration of Redis in Hadoop
3.3.1 Technical Data
3.4 Results and Comparison
4 Conclusions and Outlook
Bibliography

List of Figures

1.1 Key-value pairs
2.1 Redis Cluster
2.2 Redis in a Master-Slave Architecture
2.3 Memcached Architecture
2.4 Riak Ring Architecture
2.5 Hazelcast In-memory Computing Architecture
2.6 Hazelcast Architecture
2.7 MICA Approach
2.8 Aerospike Architecture
2.9 Word Count Using Hadoop MapReduce
3.1 Time Taken to Store and Retrieve Data When the Number of Client Processes is Varied
3.2 Time Taken to Retrieve Data When the Number of Client Processes is Varied
3.3 Time Taken to Store and Retrieve Data When the Number of Servers is Varied
3.4 Time Taken to Store and Retrieve Data when the Value Size is Varied
3.5 Customized RecordWriter to Read in Data from Redis
3.6 Customized RecordReader to Write Data to Redis
3.7 Comparison of Execution Times (in minutes) for Air-quality Applications Using HDFS and Redis

List of Tables

2.1 Summary of features of key-value stores
3.1 Time taken to store and retrieve data when the number of client processes is varied
3.2 Time taken to store and retrieve data when the number of client processes is varied
3.3 Time taken to store and retrieve data when the number of servers is varied
3.4 Time taken to store and retrieve data when the size of the value is varied
3.5 Time taken to execute the original air-quality application
3.6 Time taken to execute the air-quality application using Redis

Chapter 1

Introduction

Traditionally, science has been divided into theoretical and applied/experimental branches.

Scientific computing (or computational science), though closely related to the theoretical side, also has features related to the experimental domain. Computational science has now become the third pillar of science, and scientists increasingly employ scientific computing tools and techniques to solve many problems in the fields of science and engineering.

Problems ranging from designing the wing of an airplane to predicting the weather are

being solved using scientific computing methodologies. However, the data generated in

such problems is in the range of hundreds of gigabytes, while some applications even deal

with terabytes of data. “Big Data” is a term that is generally used to describe such a

collection of data which is huge in size and yet growing exponentially with time. The New

York Stock Exchange generates terabytes of new trade data per day. Social media sites like

Facebook ingest and generate around 500+ terabytes of data per day. A single jet engine

can generate 10+ terabytes of data in 30 minutes of flight time; with thousands of flights scheduled per day, the data generated reaches several petabytes [9]. Many of these

applications are either real-time or must provide results in a timely manner. Traditionally, such applications were executed using specialized hardware along with

conventional data storage and retrieval methods. However, as the scale of data increased, so did the need for larger, scalable data storage methods, which is why large data centers came into use. The massive datasets collected are so large and complex that none of the traditional data-management tools can store or process them efficiently, mainly because these tools do not scale with the data.

The data being produced can be structured, unstructured, or semi-structured. Relational databases are bound by their schema and hence limit the type of data that can be entered into the database. They cannot accommodate the volume, velocity, and variety of the data being produced. Moreover, the data being collected cannot simply be discarded, because larger datasets can be analyzed to generate more accurate correlations, which may lead to more concrete decision-making, resulting in greater operational efficiencies and

may lead to more concrete decision-making resulting in greater operational efficiencies and

profits. In the early 2000s, the volumes of data being handled by organizations like Google

started outgrowing the capacities of the legacy RDBMS software. The exponential growth

of the web also contributed to this data explosion and gradually businesses all around began

facing the issue of managing increasingly large volumes of data. While Internet giants such

as Amazon, Facebook, and Google may have been the first to truly struggle with the “big

data problem”, enterprises across industries were struggling to manage massive quantities

of data, or data entering systems at a high velocity, or both. It wasn’t long before data

scientists and engineers designed a new system to meet the increasing data-management

demands. As a result, the term “NoSQL” was introduced to describe the data-management

systems that retained some RDBMS-like qualities but went beyond the constraints of

traditional SQL-based databases.

A NoSQL database environment is a non-relational database system optimized for horizontal scaling onto a large, distributed network of nodes. It enables rapid, ad-hoc organization and analysis of massive volumes of diverse data types. NoSQL is a whole new way of

thinking about databases. The easiest way to think of NoSQL is as a database that does

not adhere to the traditional relational database management system (RDBMS) structure

and sometimes it is also referred to as ‘not only SQL’. It is not built on tables and does

not necessarily employ SQL to manipulate data. NoSQL databases also commonly do not

provide full ACID (atomicity, consistency, isolation, durability) [10] guarantees. NoSQL

also helps ensure availability of data even in the face of hardware failures. If one or more

database servers, or nodes, go down, the other nodes in the system are able to continue

with operations without data loss, thereby showing true fault tolerance. When deployed

properly, NoSQL databases enable high performance while also guaranteeing availability.

This is immensely beneficial because system updates, modifications, and maintenance can

be carried out without having to take the database offline. As NoSQL databases do not

strictly adhere to the ACID properties, they provide real location independence. This

means that read and write operations to a database can be performed regardless of where

that I/O operation physically occurs, with the operation being propagated out from that location so that it is available to users and machines at other sites. Such functionality is

very difficult to architect for relational databases. NoSQL databases guarantee eventual

consistency of the data across all nodes.

A NoSQL data model can support use cases that do not fit well into an RDBMS. A NoSQL database is able to accept all types of data (structured, semi-structured, or unstructured) much more easily than a relational database, which relies on a predefined schema. NoSQL systems are designed so that they can be easily integrated into new cloud-computing architectures that have emerged over the past decade to allow massive computations to be run

inexpensively and efficiently. Data organized in NoSQL systems can be analyzed to gain

insights about previously unknown patterns and trends with minimal coding and without

the need for data scientists and additional infrastructure. This makes operational big-data

workloads much easier to manage, cheaper, and faster to implement. Each organization

has different requirements for its NoSQL database, and as a result there are various NoSQL data stores on the market from different vendors, including Amazon, Google, and others, to handle big data. However, NoSQL databases can be broadly categorized as follows:

• Key-value store: In a key-value store, the data consists of an indexed key and a value,

hence the name.

• Document database: Expands on the basic idea of key-value stores, where “documents” contain more complex data and each document is assigned a unique key.

• Column store: Instead of storing data in rows as done by RDBMS, these databases

are designed for storing data tables as sections of columns of data, rather than as

rows of data.

• Graph database: Based on graph theory, these databases are designed for data whose

relations are well-represented as a graph and have elements which are interconnected.

This thesis focuses on key-value stores and on in-memory key-value stores in particular.

1.1 Brief Overview of Key-Value Data Stores

A key-value store is a simple database that uses an associative array (map or dictionary)

as the fundamental data structure in which each key is associated with one value. In each

key-value pair, the key can be in the form of a string, such as a filename, URI, or hash, while the value, on the other hand, can be any kind of data. The value is stored as a

BLOB (Binary Large OBject). The value is essentially binary data and can be anything: numbers, strings, counters, JSON, XML, HTML, binaries, images, or short videos. As a result, key-value stores require minimal upfront database design and are faster

to deploy. Also, since data is referenced by keys, there is no need to index the data to

improve performance. However, since the type of the values is not known to the store, you cannot filter or

control what’s returned from a request based on the value. Key-value stores provide a way

to store, retrieve, and update data using get, put, and delete commands. The simplicity of

this model makes a key-value store fast, easy to use, scalable, portable, and flexible. Figure

1.1 shows a collection of keys and the values associated with them. These key-value pairs

are then ultimately stored in a key-value database configured to store and retrieve data in

an efficient manner.

Figure 1.1: Key-value pairs

As seen in Figure 1.1, the data is stored in the form of key-value pairs. The key needs to

be unique throughout the dataset since it serves as the index for the value into the datastore.

Key-value databases are designed so as to enable efficient storage and retrieval of key-value

pairs. Typically key-value datastores are implemented using hash-tables since retrieval

from hash-tables can be done in O(1) time if the hash-table is implemented properly. Key-

value stores can use consistency models ranging from eventual consistency to serializability.

Some maintain the data in memory (known as in-memory key-value stores) while others

employ storage devices to maintain the data. There are many types of key-value stores

available today, but this thesis focuses on in-memory key-value stores and specifically on

two in-memory key-value databases, namely Memcached and Redis.
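To make the model above concrete, the following minimal sketch (our own illustrative code, not taken from any particular key-value store) implements the get, put, and delete operations on top of a fixed-size hash table with chaining; it omits resizing, persistence, and concurrency.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024 /* fixed table size; real stores resize dynamically */

typedef struct entry {
    char *key;
    char *value;         /* values are opaque blobs; strings here for brevity */
    struct entry *next;  /* chaining resolves hash collisions */
} entry;

static entry *table[NBUCKETS];

/* djb2 string hash: maps a key to a bucket index */
static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h % NBUCKETS;
}

/* put: insert or overwrite; average O(1) when the table is sized well */
void put(const char *key, const char *value) {
    unsigned long b = hash(key);
    for (entry *e = table[b]; e; e = e->next)
        if (strcmp(e->key, key) == 0) {   /* key exists: update in place */
            free(e->value);
            e->value = strdup(value);
            return;
        }
    entry *e = malloc(sizeof *e);
    e->key = strdup(key);
    e->value = strdup(value);
    e->next = table[b];
    table[b] = e;
}

/* get: return the stored value, or NULL on a miss */
const char *get(const char *key) {
    for (entry *e = table[hash(key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return NULL;
}

/* delete: remove a pair if present */
void del(const char *key) {
    for (entry **p = &table[hash(key)]; *p; p = &(*p)->next)
        if (strcmp((*p)->key, key) == 0) {
            entry *e = *p;
            *p = e->next;
            free(e->key);
            free(e->value);
            free(e);
            return;
        }
}

int main(void) {
    put("name", "John Smith");
    printf("name -> %s\n", get("name"));
    del("name");
    printf("after delete: %s\n", get("name") ? get("name") : "(nil)");
    return 0;
}
```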

1.2 Goals of this Thesis

Over the years, traditional databases have been the go-to solution for all data storage and

analysis requirements. Although traditional databases have been the tried and tested way

to store data, in recent years, we have seen a tremendous shift in the status quo and NoSQL

databases have emerged as the solution for all “big data” applications. This is because

traditional databases are unable to keep up with the “volume, velocity and variety” of the

data currently being generated. There are many types of NoSQL databases each designed

with a specific purpose and target group in mind. Key-value stores are one such type of

NoSQL database and have found widespread use due to their simplicity and their ability to

be easily integrated into any environment with minimal effort. In-memory key-value stores

are a special kind of key-value store that retain data in RAM. They are now increasingly

being used in enterprise applications to improve application performance by enhancing the

speed with which data is written/read. The goal of this thesis is to evaluate and compare

the various in-memory key-value stores currently available. This evaluation is done in three

phases. In the first phase, we evaluate and compare popular and widely-used in-memory

key-value stores available in the market today. In the second phase, we evaluate and

compare in detail, the performance of two in-memory key-value stores, namely Memcached

and Redis using a micro-benchmark that we have developed using C and the OpenMPI

library. In the final phase, we integrate Redis into an application performing large-scale

data analysis so as to analyze if in-memory key-value stores enhance the performance of

the application. The application that we have used is an air-quality simulation developed

using Hadoop MapReduce, which analyzes an air-quality dataset of 48.5 GB, containing

measurements of pollutants from various sensors spread all over Texas. This application

analyzes the given dataset to calculate the eight-hour rolling average of air quality at sites

across Houston, TX.

1.3 Organization of this Document

The rest of the thesis is organized as follows. Chapter 2 discusses the details of various

widely-used in-memory key-value stores. It also outlines the details of the Open MPI library and the Hadoop framework. In Chapter 3, we describe in detail the Open MPI micro-benchmark with which we evaluate and compare the performance of Memcached and Redis. We

also discuss the details of the Hadoop MapReduce air-quality simulation application and

evaluate the performance results after integrating Redis into this application. In Chapter

4, we present the conclusion of the work.

Chapter 2

Background

In the previous chapter, we discussed briefly the limitations of traditional relational databases

and how they fall short when dealing with huge volumes of data. Relational databases offer

many powerful data management tools and techniques. However, a majority of applications today only require basic functionality to store and retrieve data by primary key and do not require the complex querying and management features offered by RDBMSs. Enterprise-level relational databases require sophisticated hardware and trained professionals

for day-to-day operations which increases the cost of maintaining applications using these

databases. Also, the available replication strategies are limited and typically choose consistency over availability. Despite improvements being made, it is still difficult to scale out

databases or use smart partitioning schemes for load balancing. To overcome the limitations discussed earlier, NoSQL databases were proposed as the solution. Various types of

NoSQL databases are now increasingly being used for large-scale data analytics applications. Key-value stores are one such type of NoSQL database and are widely used in

production environments for their performance and simplicity. In-memory key-value stores

are a specialized form of key-value stores, and they will be the main focus of this thesis. In

this chapter, we describe in detail some widely used in-memory key-value stores available

in the market today. We will also briefly explain the Open MPI library and the Hadoop framework, which we have used to develop benchmarks and applications for our evaluation.

2.1 In-memory Key-value Stores

Key-value stores are the simplest form of NoSQL databases. A key-value store allows

you to store data, indexed by unique keys. The value is just a blob and the database is

usually not concerned about the content or type of the value. In other words, key-value

stores do not have a type-defined schema but rely on client-defined semantics for understanding

what the values are. Key-value stores tend to have great performance, because the access

pattern in key-value stores can be optimized. One benefit of this approach is that it is very simple to build a key-value store. Also, applications using key-value stores are

easily scalable. In-memory key-value stores are a specialized form of key-value stores which

are highly optimized so as to allow extremely fast reads from and writes to the database. In-

memory key-value databases store the data in main memory (RAM), so that any request

to read/write data can be serviced by just accessing the RAM instead of the disk. It is

for this reason that such key-value stores are now increasingly being incorporated

as caching layers in time-sensitive data analytics applications. In the following subsections

we describe and compare the details of some widely used in-memory key-value stores.

2.1.1 Redis

Redis [5] is a very popular open-source (BSD licensed), in-memory, key-value data store.

Redis is widely used because of its great performance, adaptability, rich set of data

structures, and a simple API. According to the creator, Salvatore Sanfilippo, Redis is an

“in-memory data structure store used as a database, cache, and message broker” [5]. This

is because Redis provides support for storing not only string values but also complex data

structures like hashes, lists, and sets. Redis also has support for replication, Lua scripting

[11], Least Recently Used (LRU) eviction [12] and different levels of on-disk persistence.

On-disk persistence means that apart from maintaining data in the memory, there is an

option to also persist the data by either dumping the data to the disk periodically or by

appending each write command to a log file. Persistence can be optionally disabled, if the

application just requires a high-performance, in-memory, caching mechanism.

The architecture of any Redis application is simple and consists of two main processes: the Redis client and the Redis server. The client and server processes can be on the same computer or on two different computers. The server is responsible for storing data in memory and

handling all read/write requests from the client. The client can be the Redis console client

(provided by RedisLabs) or any other application developed using Redis-client libraries

(available for a wide variety of programming languages). For trivial applications, which

require basic caching facilities, one instance of a Redis server will suffice. However, most

production level applications, will require more than one instance of a Redis server. “Redis

Cluster” (available since version 3.0) is a fairly new feature of Redis which involves running

multiple instances of the Redis server on machines in the cluster. The basic structure of

Redis deployed in a cluster is as follows:

Figure 2.1: Redis Cluster

In a cluster, all the server instances are connected to each other and together, they

maintain metadata about the state of the network. There may be more than one instance

of the server running on one physical machine. The servers communicate using a customized

and highly optimized version of the gossip protocol [13]. Client applications connect to

these server instances and issue read/write requests. A requesting client application can

be of two types:

• Dummy-client requests: The client is responsible for locating the correct node on which the data is located and for issuing requests to the respective node.

• Smart-client requests: The client forwards its request to any one of the nodes. The request is then forwarded to the appropriate server where the requested data is present.

The details of how the above connections are created and maintained are hidden from the end-user applications by Redis-client libraries. Redis can be deployed in a cluster in a variety

of ways, but the most common is a sharded master-slave configuration, so as to enable replication of data. The logical structure of such an architecture is as follows:

Figure 2.2: Redis in a Master-Slave Architecture

The slave nodes are exact replicas of the master nodes, which ensures that the required

data will be available even if a particular master node goes down. The Redis cluster

manager tries to allocate slaves and masters such that the replicas are in different physical

servers. “Redis Cluster” also provides many other useful features like adding and removing

nodes while applications are running, resharding of keys across nodes in the cluster, and multi-key operations (e.g., using wildcard characters to retrieve key-value pairs). Redis, whether running on a standalone machine or in a cluster, greatly improves the performance of applications. Due to its diverse set of useful features, and also because of its performance and ease of use, Redis today is one of the leading key-value stores used in industry and academia.
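As an illustration of the client side described above, the following minimal sketch uses the hiredis C client library to store and retrieve one key-value pair; it assumes a Redis server is already running on localhost at the default port 6379, and the key and value are placeholders.

```c
#include <stdio.h>
#include <stdlib.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* connect to an (assumed) local Redis server on the default port */
    redisContext *ctx = redisConnect("127.0.0.1", 6379);
    if (ctx == NULL || ctx->err) {
        fprintf(stderr, "connection error: %s\n", ctx ? ctx->errstr : "out of memory");
        return EXIT_FAILURE;
    }

    /* SET stores the value under the given key */
    redisReply *reply = redisCommand(ctx, "SET %s %s", "site:42:ozone", "0.071");
    freeReplyObject(reply);

    /* GET retrieves it again */
    reply = redisCommand(ctx, "GET %s", "site:42:ozone");
    if (reply && reply->type == REDIS_REPLY_STRING)
        printf("site:42:ozone = %s\n", reply->str);
    freeReplyObject(reply);

    redisFree(ctx);
    return EXIT_SUCCESS;
}
```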

2.1.2 Memcached

Memcached [4] is an open-source, high-performance, distributed, in-memory key-value store

which is used as a caching layer in many applications that deal with huge volumes of data.

Memcached is used for data caching in LiveJournal, Slashdot, Wikipedia, and other high-

traffic sites [14]. According to Brad Fitzpatrick, the creator of Memcached, the primary

motivation for creating Memcached was to improve the load performance of dynamic websites by caching individual objects on dynamic web pages. The main idea behind Memcached is to collect the main memory available in all machines connected in a network and pool it together so that the collective main memory capacity appears as one cohesive unit to applications using Memcached. This means that Memcached does not require

extremely powerful servers to execute. Memcached can be run on commodity hardware,

connected together in a network. Nodes can be added or removed from the network without any adverse effects. Also, the effective total amount of RAM made available to client applications is larger and can easily be adjusted to suit application requirements.

Memcached is designed to have a client-server architecture. Memcached server instances

are run over nodes in the network, wherever memory is available, and each server listens

on a user-defined IP and port. The main memory from all running Memcached server

instances forms a single, common memory pool, and client applications use this memory

pool to store and retrieve data. Multiple Memcached server instances can run on a single

physical machine. The basic structure of Memcached in action in the network is as follows:

Figure 2.3: Memcached Architecture

In Figure 2.3 [4], we have three Memcached server instances running in the cluster.

The keyspace is divided among the server instances such that each Memcached server is

responsible for a particular set of key-value pairs. To store/retrieve a key-value pair, client

applications are supposed to send requests to the correct Memcached instance. This is

done by logically considering each Memcached server itself to be a bucket in a hash table.

To store/retrieve a key, the client calculates the hash of the key, which points to the correct

Memcached instance. Each Memcached instance, in turn, holds a hash table of its assigned

key-value pairs. The client application can then store/retrieve the key-value pair. Thus,

Memcached acts as a two-layer global hash table. End-users need not worry about the details of how to connect to the correct Memcached instances. There are many

Memcached client libraries available in a wide variety of languages like C, C++, Perl, Java,

PHP, Ruby, and Python. These libraries abstract away the internal details and present a

simplified API which can then be used by applications. In Figure 2.3, if the application

requests the key ‘foo’ (using a client library), the client library calculates the hash value of

the key to locate the server which will process the request (in this case, ‘foo’ is present in

server 2). The request is then forwarded to the correct server (server 2). The server then

responds to the client library by searching for ‘foo’ in its local hash table and returning it

to the user.
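The server-selection step described above can be summarized in a few lines; this is our own illustrative code using simple modulo hashing, whereas production clients typically use consistent hashing to minimize remapping when servers are added or removed.

```c
/* Map a key to one of num_servers Memcached instances (illustrative only). */
unsigned pick_server(const char *key, unsigned num_servers) {
    unsigned long h = 5381;               /* djb2 string hash */
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return (unsigned)(h % num_servers);   /* index of the responsible server */
}
```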

Each server instance is independent of the others, and they do not communicate with each other. Also, the data inside the servers is maintained on a least-recently-used basis

to make room for new items. In case a server fails or if the requested data is not present

in the cache, requests to the server result in a cache miss, which the application may then

handle appropriately. Memcached clients have to be configured appropriately to deal with

node failures. If no effort is taken in this direction, requests for keys assigned to a failed

Memcached instance simply result in cache misses. Memcached is designed for fast access

to data by using optimized memory allocation algorithms, avoiding locking objects so as to

avoid waits, and fetching multiple keys at the same time. Due to its compactness, simplicity, and high performance, Memcached is widely used as a caching layer in many applications

that require high-speed access to data [15].
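To illustrate how the client libraries mentioned above hide these details, the following minimal sketch uses the libmemcached C client to store and fetch one key-value pair; it assumes a single Memcached server listening on localhost at the default port 11211, and the key and value are placeholders.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libmemcached/memcached.h>

int main(void) {
    /* create a client handle and register one (assumed) local server */
    memcached_st *memc = memcached_create(NULL);
    memcached_server_add(memc, "127.0.0.1", 11211);

    const char *key = "foo";
    const char *value = "bar";

    /* the library hashes the key to pick a server, then stores the pair */
    memcached_return_t rc = memcached_set(memc, key, strlen(key),
                                          value, strlen(value),
                                          (time_t)0, (uint32_t)0);
    if (rc != MEMCACHED_SUCCESS)
        fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

    /* a get on the same key is routed to the same server */
    size_t len;
    uint32_t flags;
    char *result = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
    if (result) {
        printf("%s = %.*s\n", key, (int)len, result);
        free(result);  /* memcached_get returns malloc'ed memory */
    }

    memcached_free(memc);
    return 0;
}
```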

2.1.3 Riak

The Riak key-value store (known as Riak KV) [16] is a highly resilient key-value database. It

is highly optimized to be available and scalable while running on a cluster of commodity

hardware. Riak also provides in-memory caching by integrating Redis as the caching

layer into its key-value database. This helps reduce latency and improves application

performance. Riak stores data as a combination of keys and values, where the value can

be anything ranging from JSON, XML, and HTML to binaries and images. Keys are binary

values which are used to uniquely identify a value. An application using Riak is part of a

client-server request-response model. Client applications are responsible for connecting to a

Riak server and making read or write requests. User applications wanting to leverage Riak,

need not delve into the details of how to communicate with Riak servers. They can simply

make use of simple APIs provided by client libraries, available for many programming

languages like Java, Ruby, Python, PHP, Erlang, .NET, Node.js, C, Clojure, Go, Perl,

Scala, and R. A Riak server is responsible for satisfying incoming client requests and can

function as a stand-alone instance or be grouped with other servers to form a Riak cluster. All

the Riak instances in a cluster work together, by pooling together their individual hardware

resources to provide a global view of the database to client applications. They communicate

with each other to provide data availability and partition tolerance.

Riak, working in a cluster, has a peer-to-peer architecture in which all the nodes can

fulfill read and write requests. All nodes have the same set of functionalities, which is

why there is no single point of failure in the architecture. Riak’s architecture is arranged

in the form of a “Ring”. Nodes in the cluster are assigned logical partitions and these

partitions are all considered as part of the same hash space (In Figure 2.4 [16], node 0

is responsible for all green partitions while all orange partitions are handled by node 1

and so on). Each partition is a logical entity that is managed by a separate process. This

process is responsible for storing data and serving incoming read and write requests. Since the

workload is distributed among multiple processes, Riak is extremely scalable. A physical

machine in the network may have one or more partitions stored locally. Depending on

the replication factor (say N), replicas of data stored in one partition are also stored in

the “next N partitions” of the hash space. Nodes in the cluster communicate with each

other by exchanging a data structure known as “Ring state”. At any given point in time,

each node in the cluster knows the state of the entire cluster. A client can request a

particular piece of information from any node in the cluster. If a node receives a request

for data that is not present locally, it forwards the request to the proper node by consulting

the ring state. The ring architecture explained above can be logically depicted as follows:

Figure 2.4: Riak Ring Architecture

Riak is an eventually consistent database [17], which means that data is evenly distributed among all nodes in the cluster and that if a node goes down, key-value pairs are

redistributed in an efficient manner. When a particular node goes down, a neighboring

node will take over its responsibilities. When the failed node returns, the updates received

by the neighboring node are handed back to it. This ensures that data is always available.

Riak also guarantees eventually consistent replicas of the data, meaning that while data

is always available, not all replicas may have the most recent update at the exact same

time. Due to its simple architecture, high performance, and well-documented client library APIs, Riak has found widespread use in many corporations like Uber, Alert Logic,

Zephyr and Rovio.
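Since Riak also exposes its key-value operations over HTTP, a client can be written with any HTTP library; the following minimal sketch uses libcurl to store one value, assuming a Riak node on localhost at the default HTTP port 8098 (the bucket and key names are placeholders).

```c
#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    /* PUT /buckets/<bucket>/keys/<key> stores the value under that key */
    struct curl_slist *hdrs = curl_slist_append(NULL, "Content-Type: text/plain");
    curl_easy_setopt(curl, CURLOPT_URL,
                     "http://127.0.0.1:8098/buckets/sensors/keys/site42");
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "ozone=0.071");

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "PUT failed: %s\n", curl_easy_strerror(res));

    /* a GET on the same URL would return the stored value */
    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```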

2.1.4 Hazelcast

Hazelcast [18] is an open source, in-memory data store written in Java. According to the

documentation, “Hazelcast is an In-Memory Data Grid (IMDG) and allows for data to

be evenly distributed among the nodes of a computer cluster and is designed to scale up

to hundreds of thousands of nodes”. While in-memory key-value stores like Redis started

providing cluster support only after a few initial versions, Hazelcast was developed from

the ground up with the intention of leveraging distributed computer architectures.

Hazelcast’s architecture can be described as peer-to-peer. There are no masters and slaves, and hence there is no single point of failure. All nodes store an equal amount of data and

do an equal amount of processing. The oldest node in the cluster is the de-facto leader and

manages cluster membership by determining which node is responsible for which particular

chunk of data. As new nodes join or drop out, the cluster re-balances accordingly. Each

server instance runs in a separate Java Virtual Machine [19] and there may be more than

one server instance running on a single physical machine. Hazelcast supports a client-

server request-response design. Client applications making data requests are serviced by

Hazelcast server instances running on nodes in the cluster. User applications do not need to

delve into the details of connecting to Hazelcast servers and making requests. There are a

wide variety of client libraries that enable user applications to communicate with Hazelcast

instances distributed on nodes in the network. Client libraries are provided for popular

programming languages like Java, C++, .NET, Node.js, Python, and Scala. Figure 2.5

[20] depicts the communication mechanism between the client and server applications.

Figure 2.5: Hazelcast In-memory Computing Architecture

The client application makes requests to the Hazelcast server, which fulfills them. The communication pattern between the client and the servers can be one of the following:

• Embedded topology

The client application, the data and the Hazelcast instance all reside on the same

node and share a single JVM. The client and the server communicate with each other

directly.

• Client plus member topology

The client application and the Hazelcast instances are not tightly coupled and may

reside on different nodes of the cluster. They communicate with each other over the

network.

The two topologies listed above are depicted below [20].

(a) Embedded Topology (b) Client plus Member Topology

Figure 2.6: Hazelcast Architecture

Although the embedded topology is comparatively simple and there are no extra nodes

to manage or maintain, the client plus member topology is mostly preferred. This is because it provides greater flexibility in terms of cluster mechanics. Member JVMs can

be taken down and restarted without affecting the application. The client plus member

topology isolates the application code from cluster-related events. Hazelcast client ap-

plications can be either a “native client” or a “lite client”. A native client maintains a

connection to any one node in the cluster and is redirected appropriately by that node

when making requests. A lite client maintains data about each and every node in the cluster and makes requests to the correct Hazelcast instance. The Hazelcast instances share

the keyspace such that no one instance is over-burdened. In case of node crashes,

Hazelcast also provides recovery and fail-over capabilities. Hazelcast is an open source

library which is easily distributed in the form of a JAR file without the need to install

any software. It supports in-built data structures like maps, queues, multimaps and also

allows for the creation of custom data structures. Hazelcast is used in many enterprise

applications and has a huge client base that includes American Express, Deutsche Bank,

Domino’s Pizza, and JCPenney.

2.1.5 MICA (Memory-store with Intelligent Concurrent Access)

MICA [21] is “a scalable in-memory key-value store that handles 65.6 to 76.9 million key-

value operations per second using a single general-purpose multi-core system” [21]. MICA

can be integrated into applications using a request-response, client-server model. MICA is

installed across nodes in the cluster and client applications can connect to these instances

to make requests. The requesting client needs to know which server instance to contact. To

serve multiple client requests efficiently, MICA is designed for high single-node throughput

and low end-to-end latency. MICA also strives to achieve consistent performance across

workloads, and can handle small, variable-length key-value items while still running on

commodity hardware. To achieve all the above performance gains, MICA makes key design

decisions regarding parallel data access, the network stack, and key-value data structures.

The following sub-sections describe these design choices in detail.

2.1.5.1 Parallel Data Access

To enable truly parallel access to data, MICA creates one or more data partitions (“shards”)

per CPU core and stores key-value items in a partition determined by their key. An

item’s partition is determined using a 64-bit hash of the item’s key, calculated by the

client application. Sometimes, such partitioning may lead to skewed workloads wherein a

particular partition is being used more often than others. In this case, MICA exploits CPU

caches and packet burst I/O to disproportionately speed up the more loaded partitions, nearly

eliminating the penalty from skewed workloads. MICA can operate in EREW (Exclusive

Read Exclusive Write) or CREW (Concurrent Read Exclusive Write) modes. EREW

assigns a single CPU core to each partition for all operations. The absence of concurrent

access to partitions removes the need for synchronization and inter-core communication,

making MICA scale linearly with CPU cores. CREW allows any core to read partitions, but

only a single core can write. This combines the benefit of concurrent read and exclusive

write; the former allows all cores to process read requests, while the latter still reduces

expensive cache-line transfer.
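The partitioning scheme can be sketched as follows (our own illustrative code, not MICA’s; FNV-1a stands in for the 64-bit key hash computed by the client):

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_PARTITIONS 16 /* e.g., one partition per CPU core */

/* 64-bit FNV-1a hash of the key */
static uint64_t hash64(const void *key, size_t len) {
    const unsigned char *p = key;
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* In EREW mode the partition index also identifies the single core that may
 * read and write the item, so no locking or inter-core messaging is needed. */
static unsigned owner_partition(const void *key, size_t len) {
    return (unsigned)(hash64(key, len) % NUM_PARTITIONS);
}
```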

2.1.5.2 Network Stack

MICA uses Intel’s DPDK [22] instead of standard socket I/O. This allows the user-level server software to control NICs (Network Interface Cards) and transfer packet data with

minimal overhead. This is done because the key-value pairs to be sent over the network

are usually small compared to traditional TCP/IP packets. Also, TCP/IP features like congestion control and error correction are not strictly required for this particular

case. By bypassing socket I/O, MICA avoids any additional network features that are not

required and hence avoids delays. For NUMA (non-uniform memory access) systems [23],

the data is partitioned such that the CPU core and the NIC only access packet buffers

stored in their respective NUMA domains. Each key-value pair to be transmitted is an

individual packet; to further increase transmission speeds, MICA uses bursty I/O. MICA

also ensures that no CPU core is overloaded with requests by using processor affinity to

determine which CPU is responsible for which partition of data. Requests for keys are then

forwarded accordingly by the client.

2.1.5.3 Key-value Data Structures

MICA can be used either for storing data (no existing items can be removed without an

explicit client request) or for caching data (existing items may be removed to reclaim space

for new items). MICA uses separate memory allocators for cache and store semantics.

MICA uses a circular log for caching. New data is appended to the log and existing data

is modified in place. The oldest items at the head of the log are evicted to make space for

newer entries when the cache is full. Although the natural eviction is FIFO, MICA can

provide LRU eviction by reinserting any requested items at the tail. In store mode, MICA

uses a lossy concurrent hash index to index stored items. Both the above data structures

exploit cache semantics to provide fast writes and simple memory management. Each

MICA partition consists of a single circular log and lossy concurrent hash index.

Figure 2.7 [24] clearly depicts MICA’s in-memory key-value store approach. It also shows

how a client request is forwarded to the server and how each design decision discussed

above plays a part in enhancing the performance.

Figure 2.7: MICA Approach

MICA is entirely written using the C programming language and it has a client library

in C. Applications that want to leverage MICA as a key-value store or cache can use this

client library to make requests to MICA instances installed on a cluster. Although MICA has a set of impressive features, it is not as widely used as its counterparts. The

reasons for this include limited documentation as well as a lack of client libraries in other

programming languages.

2.1.6 Aerospike

Aerospike is a distributed, scalable NoSQL database. It is developed from the ground up

keeping clustering and persistence in mind. Its architecture is composed of the following

layers [25]:

• Application layer

All end-user applications fall in this layer.

• Client layer

This layer consists of a set of client libraries written in a variety of languages like

C, Java, C#/.NET, Go, Perl, and Python. These client libraries are responsible for

monitoring the cluster on which Aerospike is installed and forwarding application

requests to the correct node.

• Clustering and distribution layer

This layer manages cluster communications and automates fail-over, replication,

cross-data center synchronization, and intelligent re-balancing and data migration.

• Data storage layer

This layer reliably stores data in DRAM and Flash for fast retrieval.

Figure 2.8: Aerospike Architecture

Aerospike uses a shared-nothing architecture, where every node in the Aerospike cluster

is identical: all nodes are peers and there is no single point of failure. Data is distributed

evenly and randomly across all nodes within the cluster. Nodes within the cluster communicate with each other using a “heartbeat call” to monitor inter-node connectivity and

to maintain meta-data about the cluster state. When a node is added or removed from

the cluster, data is automatically redistributed among the nodes. Aerospike also allows

for replication of data so as to ensure reliability and availability even if a node goes down.

Replication is done on geographically separated nodes so as to ensure maximum availabil-

ity. Any changes to the main data partition is also immediately reflected in the replicas.

On cluster startup, Aerospike configures policy containers -namespaces (similar to RDBMS

databases). Namespaces are divided into sets (similar to RDBMS tables) and records (sim-

ilar to RDBMS rows). Each record has a unique indexed key, and one or more bins (similar

to RDBMS columns) that contain the record values. Applications can read or write this

data by making requests using Aerospike client libraries. When data is to be stored, the

client library computes a hash to determine which node the data is to be stored on and

forwards the request accordingly. Similarly, to read a particular key-value pair, the hash

of the key is calculated by the client library and the request is forwarded to that node

accordingly. If a node goes down, the client libraries communicate with the replicas until

the node comes back up again. Aerospike maintains secondary indices of data in memory for faster retrieval. One major feature of Aerospike is that the data can be persisted onto SSD (Solid State Drive) storage. This hybrid model enables faster fetching of data as compared to

traditional HDD (Hard Disk Drive) storage. Aerospike also supports data types, queries

and User Defined Functions (UDF). Aerospike has steadily gained recognition for being a

high-performing, scalable key-value store and is being used by organizations like Kayak,

AppNexus, Adform and Yashi.

2.1.7 Comparison of Key-Value Stores

In the previous sections, we briefly described the salient features of some widely used in-

memory key-value stores. In this section, we will compare them so as to pick the ones that

we would like to further analyze. The comparison is done on the following factors:

• Programming languages

The aim is to select a database which has client libraries in widely-used major programming languages. This ensures that the key-value store can be easily integrated

into scientific and big-data applications.

• Hadoop and HPC support

We want to select a database which can be easily integrated into Hadoop and High

Performance Computing environments (in our case we aim for Open MPI support).

This is because we will be analyzing the key-value store using an Open MPI micro-

benchmark and a Hadoop application.

• In-memory storage

Our aim is to analyze key-value stores which can be integrated as a caching layer in

compute-intensive applications to see if we observe any performance benefits. Hence,

we look for a key-value store that maintains data in memory.

• Storage on files or databases

We also would ideally like the key-value database to persist data onto secondary

storage so that data is not lost.

• Access from remote locations

We plan to install the key-value store onto a cluster and then access the database

remotely using client applications, which is why easy remote access is important for

us.

• Support for parallel storage and operations

Ideally, we want data operations to be performed in parallel. The key-value store

should be able to run in a cluster and should be able to process multiple incoming

simultaneous requests. Unrelated data requests should not block one another, and data operations should be performed as soon as possible.

• Open Source

From a financial perspective, we aim to select key-value stores that are open source.

Table 2.1 gives a summary of the relevant features of all the key-value stores discussed

above:

Table 2.1: Summary of features of key-value stores

Comparing the features of all the above in-memory key-value stores, we found Redis to

be the best fit. Riak fulfills all of the above requirements but its in-memory key-value store

internally uses Redis, so we decided not to move forward with it. Similarly, Aerospike

also has some promising features, but it requires a Solid State Drive (SSD) as the backing

store, which, we believe, largely restricts its scope. Hazelcast and MICA do not have the

option to back data onto a secondary storage medium, which is why we did not select

them. Although Memcached too does not allow backing of data onto secondary storage,

based on surveys [26], we observed that Redis and Memcached are the most widely-used

key-value stores. Hence, we decided to select Redis and Memcached for further analysis.

In the next sections, we examine the details of the Message Passing Interface (MPI), used in parallel computing, and of the Hadoop framework, typically used for analysis on a

cluster of commodity hardware. We will also examine the ways in which in-memory key-

value stores can possibly be integrated into these environments so as to offer performance

improvements.

2.2 Brief Overview of Message Passing Interface (MPI)

Traditionally, computational problems were solved using serial algorithms, where instructions

were executed one after the other. In parallel computing, a problem is broken down into

discrete parts that can be executed concurrently by compute resources that communicate

and co-ordinate with each other to produce the desired results. Parallel computing is thus

used to either solve problems that are too large to be solved by a single compute resource

or to solve problems faster than a single compute resource. The compute resources can

be either a single computer with multiple cores or a set of computers connected through

a network. If the compute resource is a single multi-core computer, then communication

is done by reading or writing to shared memory. However for a distributed architecture,

communication is done using sockets, message passing, or Remote Procedure Calls (RPC).

Generally, shared-memory systems are easier to program than distributed-memory systems. This is largely because of the inherent complexity of designing and coordinating concurrent tasks, combined with a lack of portable algorithms, standardized environments, and software development toolkits. Microprocessor architectures also evolve constantly; as a result, parallel software developed with a particular architecture in mind soon becomes outdated, undermining the effort invested in its design. Hence, there is a need for a standard library that enables


programmers to develop portable, high-performance, parallel applications. MPI stands for

Message Passing Interface [27] and it is a standard that is created and maintained by the

MPI Forum, an open group consisting of parallel computing experts from the industry

as well as academia. The MPI standard provides an Application Programming Interface (API) [28] that is used for portable, high-performance inter-process communication (IPC) [29] via message passing.

On most operating systems, an “MPI process” usually corresponds to the operating system’s concept of a process, and processes working together to solve a particular problem are organized into a group so as to enable communication between them. MPI is designed to be implemented as middleware, meaning that upper-level applications invoke MPI functions to perform message passing without going into the details of how exactly communication takes place. MPI defines a high-level API and it abstracts away the actual underlying

communication methods used to transfer messages between processes. This abstraction is

done to hide the complexity of inter-process communication from the upper-level applica-

tion and also to make the application portable across different environments. A properly

written MPI application is meant to be source-compatible across a wide variety of plat-

forms and network types. MPI exposes APIs for point-to-point communication (e.g., send and receive) and also for other communication patterns, such as collective communication. A collective operation is one in which multiple processes are involved in a single communication; a reliable broadcast, in which one MPI process sends a message to all other MPI processes in the group, is an example of a collective operation. There are many

implementations of the MPI standard targeted for a wide variety of platforms, operating

systems, and network types. Some implementations are open source while others are closed

source. Open MPI, as its name implies, is an open source implementation of MPI and is

widely used in many high-performance computing environments. We have developed a micro-benchmark using Open MPI to analyze the performance of Redis and Memcached.

The details of this benchmark are given in the next chapter.
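To make the programming model concrete, the following is a minimal, self-contained sketch (illustrative only, not part of our benchmark) showing both a point-to-point transfer and a collective broadcast; it can be compiled with mpicc and launched with mpirun:

/* Minimal MPI sketch: one point-to-point send/receive, then a broadcast. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size > 1) {
        if (rank == 0) {            /* point-to-point: rank 0 sends to rank 1 */
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    /* collective: rank 0 broadcasts the token to every process in the group */
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d of %d has token %d\n", rank, size, token);

    MPI_Finalize();
    return 0;
}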

2.3 Brief Overview of MapReduce Programming and Hadoop Eco-system

Lately, there has been a deluge of data that is both huge in volume and varied in form. Traditional data analysis tools are not equipped to handle the magnitude and variety of the data being generated, and that is where Hadoop [7] comes in. “The Apache Hadoop software library is a framework

that allows for the distributed processing of large data sets across clusters of computers us-

ing simple programming models. It is designed to scale up from single servers to thousands

of machines, each offering local computation and storage. Rather than rely on hardware to

deliver high-availability, the library itself is designed to detect and handle failures at the

application layer, so delivering a highly-available service on top of a cluster of computers,

each of which may be prone to failures.” [30]. In Hadoop, data storage and data analysis are both performed using the same set of nodes, which allows Hadoop to improve the performance of large-scale computations by exploiting the principle of spatial locality [31]. Also, a Hadoop cluster is comparatively inexpensive due to its use of commodity hardware. Together, Hadoop-based frameworks have become the de facto standard for storing and processing big data.

The Hadoop framework consists of three main components:

• HDFS: Hadoop Distributed File System (HDFS) [30] is a distributed file system

which is used to store very large files.

• MapReduce Framework: The MapReduce [30] module is responsible for carrying out

distributed analysis tasks by implementing the MapReduce paradigm.


• YARN: Yet Another Resource Negotiator (YARN) [32] is the resource manager for the framework and is responsible for managing and allocating resources to applications as and when required.

The origins of the Hadoop framework are largely inspired by the Google File System

[33] and MapReduce paradigm [8] introduced in 2004. These concepts laid the foundation

for the Hadoop framework and by 2009 Hadoop came to be widely used as a large-scale

data-analysis platform. In this model, the total computational requirements of a Hadoop

application are divided among nodes in the cluster, and the data to be processed is stored in

HDFS. HDFS divides the file into blocks and stores those blocks onto nodes in the cluster.

HDFS also provides fault tolerance by storing replicas of file blocks in the cluster; the default replica count is three (which may be configured according to the requirements of the application). HDFS stores the first replica on the same rack where the original data is present, so as to quickly overcome the failure of a node and continue processing. Another replica is stored on a separate rack so that, in the event of a rack failure, the data is still available and can be analyzed.

The MapReduce style of programming is exceptionally flexible and can be used to

solve a wide array of data analytics problems. A Hadoop cluster consists of computational

nodes which can share workloads and take advantage of a very large aggregate bandwidth

across the cluster. Hadoop clusters typically consist of a few master nodes, which control the storage and processing systems in Hadoop, and many slave nodes, which store all the cluster’s data and are also where the data gets processed. MapReduce involves performing a sequence of operations on distributed data sets. The data consists of key-value pairs,

and the computations have only two phases: a map phase and a reduce phase. The

key concept here is divide and conquer. A typical MapReduce application will have the

following phases:


• During the Map phase, input data is split into a large number of fragments, each of

which is assigned to a map task.

• These map tasks are distributed across the cluster.

• Each map task processes the key-value pairs from its assigned fragment and produces

a set of intermediate key-value pairs.

• The intermediate data set is sorted by key, and the sorted data is partitioned into a

number of fragments that matches the number of reduce tasks. This phase is known

as the sort and shuffle phase.

• During the Reduce phase, each reduce task processes the data fragment that was

assigned to it and produces an output key-value pair.

• These reduce tasks are also distributed across the cluster and write their output to

HDFS when finished.

To put this in perspective, we can make use of a basic word-count example. The word-count operation takes place in two stages: a mapper phase and a reducer phase. In the mapper phase, the input text/document is tokenized into words, and a key-value pair is formed for each word such that the key is the word itself and the value is ‘1’. All the values corresponding to a key go to one reducer; in the reduce phase, the keys are grouped together and the values for identical keys are added. This process can be visualized better as seen in Figure 2.9 [34], and a small stand-alone sketch of the computation follows the figure.


Figure 2.9: Word Count Using Hadoop MapReduce
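To make the two phases concrete, the following is a small, stand-alone C sketch of the same word-count computation on a single machine. The sample input is hypothetical, and the program only mimics the map, shuffle, and reduce steps in-process; it is an illustration, not Hadoop code:

/* Conceptual word count mirroring the map -> shuffle -> reduce phases. */
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 128

int main(void)
{
    char buf[] = "the quick brown fox jumps over the lazy dog the end";

    /* Map phase: tokenize the input; each token is an implicit (word, 1) pair. */
    char *words[MAX_WORDS];
    int   nwords = 0;
    for (char *tok = strtok(buf, " "); tok != NULL && nwords < MAX_WORDS;
         tok = strtok(NULL, " "))
        words[nwords++] = tok;

    /* Shuffle + reduce phase: group identical keys and sum their counts. */
    char *keys[MAX_WORDS];
    int   counts[MAX_WORDS], nkeys = 0;
    for (int i = 0; i < nwords; i++) {
        int j;
        for (j = 0; j < nkeys; j++)
            if (strcmp(keys[j], words[i]) == 0) { counts[j]++; break; }
        if (j == nkeys) { keys[nkeys] = words[i]; counts[nkeys++] = 1; }
    }

    for (int j = 0; j < nkeys; j++)          /* emit the final (word, count) pairs */
        printf("(%s, %d)\n", keys[j], counts[j]);
    return 0;
}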

In a MapReduce application, both the map and reduce functions are distributed. When a MapReduce application is launched, many copies of the program are started across the cluster. One of the copies is called the master, and it controls the rest of the copies - the workers. The master is responsible for distributing the data across the workers and ensuring that all the workers are engaged in the successful completion of tasks. In case of a failure, tasks are automatically re-scheduled across the available workers. The intermediate key-value pairs generated by the map function are distributed across the multiple workers which run the reduce function. The intermediate values are sorted and then merged by the reduce function, which emits them as output. This distribution

of resources is handled by the YARN module of the Hadoop framework. In the next

subsection, we briefly describe our reasoning behind integrating an in-memory key-value

store into a Hadoop application and the potential benefits that we may gain.


2.3.1 Integration of Key-Value Stores in Hadoop

The input, temporary results and the output of a MapReduce application are read/written

from/to the disk via HDFS. Although HDFS is optimized to handle huge loads, the disk

will tend to slow down the performance. Although a majority of MapReduce applications

are meant to be executed in batch-processing mode, there are some applications that may

require quick delivery of intermediate results. Scientific applications fall in this category

and hence, this thesis aims to introduce an in-memory key-value store that will act as the

primary backing store for MapReduce applications instead of HDFS. This is done with the

intention of improving the overall performance of the application by reducing the time to

read/write results. To achieve this, we studied, analyzed and compared the features of

many key-value stores widely used today. Our aim was to find a key-value store which

had the ability to retain data in the main memory so as to reduce retrieval time, which supports parallel computing and Hadoop applications, and which, preferably, is also open source. The key-value stores that we analyzed for this purpose were discussed earlier in this chapter, and from them we selected the ones that best suit our needs.

In the next chapter, we will compare the performance of Memcached and Redis using a micro-benchmark. We will also discuss the working of the air-quality simulation application in detail, and how integrating an in-memory cache into this application can possibly yield performance benefits.


Chapter 3

Analysis and Results

The previous chapter gave an overview of various in-memory key-value stores, Open MPI, and the Hadoop framework. After evaluating some widely used in-memory key-value stores, we were most interested in evaluating the performance of Redis and Memcached in detail. To perform this analysis, we have developed a micro-benchmark application. We were also interested in integrating an in-memory key-value database into a compute-intensive application to evaluate whether it yields any performance benefits. For this analysis, we have used an air-quality simulation that generates the eight-hourly air-quality average around sites in Houston.

In the initial part of this chapter, we describe the micro-benchmark application in detail

and present our results and observations. We then describe the air-quality application

in detail and present our strategy for incorporating an in-memory cache into a Hadoop

application. We then conclude this chapter with our results and findings.


3.1 MPI Micro-benchmark

To compare the performance offered by Memcached to that offered by Redis, we have

developed two micro-benchmark applications using C and the MPI library, one each for

Memcached and Redis. The main intention behind developing these two micro-benchmarks

was to do an initial performance analysis of Memcached and Redis. The micro-benchmark

is a C program that establishes a basic communication setup between Memcached/Redis

servers running in a cluster and the respective client applications. The micro-benchmarks

have been developed so that a user can easily specify configurations using only command

line arguments and input files. The parameters that the user can influence are as follows:

• Number of Servers

The number of Memcached/Redis servers to be used and their respective hostnames

are passed in an input text file to the program. These servers then work together to

handle incoming client requests.

• Number of Clients

The number of client processes making requests to the server can be specified us-

ing command-line arguments. The number of client processes storing data can be

configured separately from the number of clients retrieving data.

• Number of key-value pairs to be stored and retrieved

The total number of key-value pairs to be stored and retrieved can be indicated using

command-line arguments.

• Individual value size

The size of individual values can be specified using command-line arguments.

In our analysis, the main properties that we want to evaluate are the scalability, reliability,


and load-balancing ability of Memcached and Redis. By varying input parameters to

the benchmarks, we have evaluated and compared both Memcached and Redis to test

for the above conditions. In the next section, we present details of the micro-benchmark

application and our findings.

3.1.1 Description of the Micro-benchmark Applications

Although we have developed two micro-benchmark applications, one each for Memcached and Redis, the two are very similar and only differ in the parts that require communication and synchronization with either Memcached or Redis. We now give details of the Memcached benchmark application; later on, we explain the Redis benchmark by describing only the sections of code that differ.

In the previous chapter, we explained that Memcached client libraries can be integrated

into user applications to make requests to the server to store, retrieve or modify a particu-

lar key-value pair. Memcached has client libraries for programming languages such as C, C++, Java, and C#/.NET. Since we are using MPI and the C programming language for our benchmark application, we have used libMemcached as our client library. libMemcached is an open-source C/C++ client library for the Memcached server which has been designed to be light on memory usage, to be thread-safe, and to provide full access to server-side methods. Our MPI micro-benchmark applications make requests to Memcached server instances with the help of APIs exposed by libMemcached. The cluster that we have used

for our evaluations is the crill cluster at the University of Houston and the details of this

cluster are provided later on in this chapter.

Our MPI benchmark application acts as a client and sends requests to Memcached servers. We first validate the input parameters and initialize the MPI environment. Once everything has been set up, we establish a connection to the required


number of Memcached servers by using host-names from a given input text file. In the fol-

lowing sample code, each line from the input file is fetched and interpreted as a host-name

with which a connection is to be established.

/* Read one server host name per line from the input file and append it to
 * the server list (default Memcached port 11211). The assembled list is
 * pushed to the memcached handle once, after the loop, rather than on
 * every iteration. */
while ((readLen = getline(&line, &length, fp)) != -1)
{
    line[readLen - 1] = '\0';   /* strip the trailing newline */
    servers = memcached_server_list_append(servers, line, 11211, &rc);
}
rc = memcached_server_push(memc, servers);

Once the connections have been established, key-value pairs are stored onto the Memcached

servers. Depending on the number of instances of the client application to be executed and

the number of key-value pairs to be stored/retrieved, the keyspace is divided equally among

the MPI processes. Each MPI process is responsible for handling its subset of the keyspace,

independent of the other MPI processes. For example, if 4 MPI processes are given the

task of storing 20 key-value pairs, each process will generate and store 5 key-value pairs

onto Memcached servers. Out of these 4 MPI processes, if only 2 processes are given the

responsibility to retrieve key-value pairs, then each retrieving client will be responsible

for fetching 10 key-value pairs. Special care has been taken to avoid duplicate keys in

the dataset by using a combination of the current MPI process’ rank and offset of the

current key within the subset of data assigned to the current instance. Values are alphanumeric strings generated using a random function. These key-value pairs are later retrieved one by one, and the time taken to store and retrieve them is recorded. Care has been taken to insert MPI barrier statements between the generation and storage of the key-value pairs and their retrieval, because the storage and retrieval phases are pipelined one after the other. The following code section demonstrates how each MPI process stores its share of key-value pairs onto the Memcached servers.

MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

/* Divide the keyspace evenly among the MPI processes: each process
 * handles the contiguous range [keyMin, keyMax]. */
nKeyValPairs = atoi(argv[2]);
nSubsetSize  = nKeyValPairs / numtasks;
keyMin = taskid * nSubsetSize;
keyMax = ((taskid + 1) * nSubsetSize) - 1;

start = MPI_Wtime();
while (keyMin <= keyMax)
{
    sprintf(key, "%d", keyMin);       /* key: unique offset in the keyspace */
    gen_random(value, valueSize);     /* value: random alphanumeric string */
    rc = memcached_set(memc, key, strlen(key), value, strlen(value),
                       (time_t)0, (uint32_t)0);
    keyMin++;
}
end = MPI_Wtime();
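The retrieval phase is symmetric. The following sketch shows the corresponding lookup loop, assuming keyMin and keyMax have been reset to the bounds of this process' subset and with error handling abbreviated:

start = MPI_Wtime();
while (keyMin <= keyMax)
{
    size_t   valueLen;
    uint32_t flags;

    sprintf(key, "%d", keyMin);
    /* memcached_get returns a malloc'ed copy of the value, or NULL on a miss */
    char *fetched = memcached_get(memc, key, strlen(key),
                                  &valueLen, &flags, &rc);
    if (fetched != NULL)
        free(fetched);
    keyMin++;
}
end = MPI_Wtime();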

The working of the micro-benchmark application for Redis is very similar to the one described above, and for brevity, we skip most of the code sections for the Redis micro-benchmark. As our Redis client library, we have used Hiredis, a compact C client library for the Redis server. Hiredis is the official C client library recommended by Redis Labs; it is thread-safe and offers built-in write replication, auto-reconnect, and a couple of other useful features.
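For illustration, a minimal sketch of the equivalent store and fetch operations using Hiredis is shown below; the host name is hypothetical, and the actual Redis benchmark follows the same overall structure as the Memcached code shown earlier:

#include <hiredis/hiredis.h>

/* Connect to one Redis server instance (hypothetical host, default port). */
redisContext *ctx = redisConnect("redis-host", 6379);
if (ctx == NULL || ctx->err) {
    /* handle the connection error and abort */
}

/* SET and GET play the same role as memcached_set/memcached_get above. */
redisReply *reply = redisCommand(ctx, "SET %s %s", key, value);
freeReplyObject(reply);

reply = redisCommand(ctx, "GET %s", key);
if (reply != NULL && reply->type == REDIS_REPLY_STRING) {
    /* reply->str points to the fetched value */
}
freeReplyObject(reply);

redisFree(ctx);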

Thus, using these two benchmarks, we performed measurements to analyze and compare

the performance of Memcached and Redis. In the next section, we give technical details of

the hardware and software resources used.


3.1.1.1 Technical Data

For the analysis of our benchmark we have used the crill cluster at the University of

Houston. The crill cluster consists of 16 nodes with four 12-core AMD Opteron (Magny

Cours) processors each (48 cores per node, 768 cores total), 64 GB of main memory and

two dual-port InfiniBand HCAs per node. The cluster has a PVFS2 (v2.8.2) parallel file

system with 15 I/O servers and a stripe size of 1 MB. The file system is mounted onto the

compute nodes over the second InfiniBand network interconnect of the cluster. The cluster

utilizes SLURM as a resource manager. For development, we have used the Open MPI library (version 2.0.1), Memcached (version 1.4.20), Redis (version 3.2.8), libMemcached (version 1.0.18), and Hiredis (version 1.0.0).

In the next few sections, we explain in detail the process that we have used to analyze and

compare the performance of Memcached and Redis using the benchmark applications.

3.1.2 Comparison of Memcached and Redis using our Micro-benchmark

Integrating a database into a mission-critical application is often a major decision, and organizations typically invest a lot of effort in selecting one that suits their needs. Any such analysis of databases is incomplete without taking into consideration how well they perform in terms of speed. The amount of time taken to store and retrieve data is one of the main parameters affecting the efficiency of a database. Hence, our benchmarks

focus mainly on the time taken to store and retrieve a pre-determined amount of data.

However, there can be many factors that affect how fast data is stored and retrieved from

the database. The major parameters that we are concerned with are as follows:

• Responsiveness.

To test for responsiveness, we vary the number of processes storing and retrieving


data to/from the database servers. We believe that this experiment will give us an

idea of how well a server handles parallel requests coming in from multiple client

applications. Ideally, even as the number of parallel client requests increases, the

database should stay responsive. This will ensure that even if clients work together

to complete a single huge task, the performance is not hampered.

• Scalability.

To test for scalability, we vary the number of Memcached/Redis server instances

running in the cluster. This will help us gain insights about how well a database

performs load balancing. We believe that, as the number of servers increases, the data is distributed evenly among the growing number of servers. Hence, the time taken by an individual server to search for a data item and return it to the client should also go down, which will in turn lead to lower execution times. (A conceptual sketch of how keys are distributed across servers follows this list.)

• Functionality in case of varying data load.

This experiment is aimed at understanding how well a database performs irrespective

of the size of data to be stored/fetched. To do this, we incrementally vary the size

of the value to be stored and retrieved from the database. We expect that as data

sizes increase, the execution times will also increase. The main aim of this experiment is to test whether both Memcached and Redis perform well despite increasing data loads.
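As referenced in the scalability criterion above, the following is a conceptual, stand-alone C sketch of how a client library can spread keys across server instances by hashing; the hash function and names are simplified illustrations and do not reproduce any particular library's implementation:

/* Conceptual sketch: map each key to one of nservers instances. With more
 * servers, each instance receives fewer keys and thus less work. */
#include <stdio.h>

/* toy hash; real client libraries use stronger functions */
static unsigned int hash_key(const char *key)
{
    unsigned int h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

int main(void)
{
    const char *keys[] = { "1024", "1025", "1026" };  /* hypothetical keys */
    int nservers = 8;

    for (int i = 0; i < 3; i++)
        printf("%s -> server %u\n", keys[i], hash_key(keys[i]) % nservers);
    return 0;
}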

We believe that analyzing Memcached and Redis based on the above three criteria will

give us an overall understanding of their performance. It will also help quantify the overall

performance levels of the two databases. We have executed our micro-benchmarks on the

crill cluster, keeping in mind the above parameters. In the next few sections, we will

examine and compare the results that we have observed.


3.1.2.1 Varying the Number of Client Processes

In this analysis, our main aim is to observe the performance of Memcached and Redis in

the face of parallel data requests. To do this, we gradually increase the number of client

processes making requests to the servers, while keeping all other aspects of the application

fixed. This experiment has been performed in two parts. In the first part, we vary the

number of client processes while keeping the value size fixed at 1 KB. In the second part

we vary the clients and keep the value size fixed at 32 KB. The reasoning for this two-part

evaluation is explained in the following subsections.

3.1.2.1.1 Using Values of Size 1 KB

For this case, we have generated, stored and retrieved 100,000 key-value pairs where each

key is 20 characters long and each value is of size 1 KB. We have used eight Memcached

and Redis server instances. The number of MPI processes is varied from 1 to 64 in steps of

powers of 2. The processes work together to store and retrieve the data. We have recorded

three readings for storage and retrieval times and reported the minimum. The minimum storage and retrieval times observed in each case are given in Table 3.1:

Table 3.1: Time taken to store and retrieve data when the number of client processes is varied (1 KB values).


Figure 3.1 shows a comparison of the data storage and retrieval times for Memcached and Redis.


Figure 3.1: Time Taken to Store and Retrieve Data When the Number of Client Processes

is Varied.

As observed in Figure 3.1, for both Memcached and Redis, the time taken to store and retrieve data decreases as the number of processes increases. However, towards the end, the performance of Memcached is significantly worse than that of Redis. This leads us to conclude that Memcached is unable to keep up once the number of simultaneous client requests grows beyond a certain threshold. Also, comparing the two directly, we can clearly see that Redis gives better storage and retrieval times than Memcached.

3.1.2.1.2 Using Values of Size 32 KB

In production-level applications, data size typically exceeds 1 KB. Hence, to get an idea of

how Memcached and Redis would perform while integrated with a regular application, we


decided to generate, store and retrieve 100,000 key-value pairs where each key is 20 char-

acters long and each value is of size 32 KB. For this analysis, we have used 16 Memcached

and Redis server instances. As in the previous case, the number of processes is varied from

1 to 64 in steps of powers of 2. We have recorded three readings for storage and retrieval

times and reported the minimum. The minimum storage and retrieval times observed in each case are given in Table 3.2:

Table 3.2: Time taken to store and retrieve data when the number of client processes is varied (32 KB values).

Figure 3.2 compares the storage and retrieval times for Memcached and Redis when the value size is 32 KB and the number of client processes is varied.


Figure 3.2: Time Taken to Store and Retrieve Data When the Number of Client Processes is Varied (32 KB Values).

As seen in Figure 3.2, the time taken to store and retrieve 100,000 key-value pairs (each value of 32 KB) follows the same pattern as the one where we used 1 KB values. However, in this case Memcached performs significantly worse than Redis. While storing and retrieving data, we observed that, despite initial spikes, the storage and retrieval times gradually decrease overall as the number of clients is increased. Also, when the value size was increased to 32 KB, we noticed a considerable number of cache misses for Memcached. The reason for these misses is that Memcached does not back data to a secondary store; it is purely an in-memory key-value store.

Thus, for both of the above cases (data of size 1 KB and 32 KB), we conclude that Redis

is better at handling parallel client requests. Redis also performs better than Memcached

while storing and retrieving data. We observed that Redis was more reliable than Mem-

cached and that it strives to achieve data availability in most cases irrespective of data size.

In the next section, we will present the results observed while testing for the scalability of

both databases.


3.1.2.2 Varying the Number of Server Instances

We now run the second experiment by varying the number of Memcached and Redis server

instances running in the cluster. As part of this experiment, we generate, store and retrieve

100,000 key-value pairs with each key of 20 characters and each value of size 1 KB. We run

this experiment using 16 MPI processes, all of which share the load of storing and fetching the data. The numbers of server instances used are 1, 2, 4, 8, 12, and 16. We

have recorded three readings for storage and retrieval times and reported the minimum.

The minimum storage and retrieval times observed in each case are given in Table 3.3:

Table 3.3: Time taken to store and retrieve data when the number of servers is varied.

Figure 3.3 compares the performance of both databases as the number of servers is varied.


Figure 3.3: Time Taken to Store and Retrieve Data When the Number of Servers is Varied.

From the above graphs, we can see a downward trend in the time taken to store and retrieve the data. This is a positive indication of the scalability and load-balancing abilities of both Memcached and Redis. However, in this case too, we found that Redis outperforms Memcached.

3.1.2.3 Varying the Size of the Value

The previous two cases focused mainly on analyzing the responsiveness and scalability of

Memcached and Redis. In this case, we subject both Memcached and Redis to increasing

levels of data load and analyze how well they perform regular functions like storing and

retrieving data. For this case we have generated, stored and retrieved 100,000 key-value

pairs where each key is 20 characters long. We have used 16 Memcached and Redis server

instances and 16 MPI processes. All the MPI processes are equally responsible for handling

the load. The value size is varied from 1 KB to 64 KB in steps of powers of 2. We have recorded three readings for storage and retrieval times and reported the minimum. The minimum storage and retrieval times observed in each case are given in Table 3.4:

Table 3.4: Time taken to store and retrieve data when the size of the value is varied.

Figure 3.4 shows the difference in data storage and retrieval times for Memcached and Redis when the data load is varied.


Figure 3.4: Time Taken to Store and Retrieve Data when the Value Size is Varied.

In this experiment, we observed that, as the size of individual values was increased, both Memcached and Redis gradually started taking more time to store and retrieve the data. Figure 3.4 clearly shows a linear relationship between the size of the data and the storage and retrieval times. For values of up to 1 KB, both Memcached and Redis perform reasonably well while storing and retrieving data. However, we see that, as the data size is increased beyond 1 KB, the execution times double with each step. In this experiment too, we observed that Redis performs better than Memcached while storing and retrieving data. Also, in the case of Memcached, we observed that, as the data size increased, the number of data misses also increased.

3.1.2.4 Observations and Final Conclusions

In this manner, we have performed a comprehensive analysis of Memcached and Redis using our Open MPI micro-benchmark. We analyzed both key-value stores so as to gain an idea of how they react to varying data loads and varying numbers of client requests. We also performed experiments to test the scalability of these two databases. Both

key-value stores performed fairly well in the test cases. However, as the number of client

requests and the volume of data to be stored/fetched increased, the difference in perfor-

mance between the two databases became apparent. Redis outperformed Memcached in

all our test cases. Also, we noted that Redis was generally more reliable than Memcached

in terms of data availability. This observation can be attributed to the fact that, contrary

to Redis, Memcached does not have any option to back data to secondary storage. As a

result, incorporating Memcached as an in-memory key-value cache into an application may

lead to more cache misses as the volume of data and incoming requests increases. Taking

into consideration the above results, we conclude that Redis is a much better candidate to

incorporate into applications as an in-memory caching mechanism. In the next section, we

present the details of the air-quality simulation application and the method that we have

used to integrate Redis into this application.

3.2 Air-quality Simulation Application

In the previous section, we analyzed and compared two in-memory key-value stores, namely

Memcached and Redis. After analysis we concluded that Redis outperformed Memcached

in most instances. In this section, we integrate Redis as a caching layer into a data analysis application to see if it gives any significant performance benefits. The application that we are using is a Hadoop MapReduce application that is responsible for calculating the eight-hour rolling average of air-quality data gathered around sites in Houston [6]. We are using a dataset that contains information on pollutants measured by various sensors placed all across Texas, from 2009 to 2013. We are using a total of five input files, one for each year.

The total size of the dataset is 48.5 GB and all the input files are stored in HDFS. Each


input file is a comma-separated list of information. Each line consists of the following fields:

year, month, day, hour, min, region, parameter id, parameter name, site, cams, value, and

flag. The problem that we tried to solve is to compute the eight-hour rolling average of O3

concentration in the air around sites in Houston, TX. This problem is broken down into

two parts. In the first part, for every site in Houston, we calculate the hourly average.

In the second part we combine the hourly averages to calculate the eight-hourly averages.

Using Hadoop MapReduce we can solve this problem using two MapReduce jobs.

The first MapReduce job computes the average of O3 concentration around sites in Houston

for every hour. The data present in the input directory is divided into blocks and given

as input to the mapper. It outputs (key,value) pairs which are then used by the reducer

to perform the required aggregation. The key emitted by the mapper is a combination

of siteId, year, day of the year, and the hour. Only data points having the valid flag

set, belonging to sites in Houston, with parameter name as O3 and whose pollutant value

is not null are considered as valid data points for our measurement. The corresponding

pollutant concentration is emitted by the Mapper as the value. The Reducer gets as input

a subset of keys, and each key is associated with a list of values. For each key, the sum of the values and the number of values associated with that key are computed. If the frequency count for a given hour is above a certain threshold (greater than five in our case), the corresponding hourly average is computed. If the frequency is lower, a dummy value (“-1”

in our case) is emitted so as to indicate that the value is inconsequential. The second

MapReduce job calculates the eight-hour rolling average of O3 concentration around sites

in Houston. The mapper receives as input the hourly averages computed in the previous

MapReduce job. The mapper emits eight keys that indicate the eight consecutive hours

starting from the hour indicated in the input key and the average pollutant concentration

value corresponding to the base hour. Special care has been taken to ensure that the hours


emitted by the mapper roll over after 24 hours. Every instance of the reducer receives as input a list of average O3 concentration values associated with a particular hour. For

every hour, the sum of the averages and a frequency count is computed similar to the earlier

MapReduce job. If the total number of valid entries for a given hour is above a certain

threshold (greater than six in our case), the corresponding eight-hour rolling average is

computed. If the frequency count is less, a dummy value (“NA” in our case) is emitted by

the reducer to indicate an inconsequential entry.
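To make the rolling-window arithmetic concrete, the following is a small, stand-alone C sketch of the second job's core computation. The 24-entry hourly array and its sample values are hypothetical; entries of -1 mark hours flagged as inconsequential by the first job, mirroring the dummy values described above:

/* Eight-hour rolling average with hour rollover, mirroring the reduce logic
 * described above. hourly[] holds one site's 24 hourly averages. */
#include <stdio.h>

int main(void)
{
    double hourly[24];
    for (int h = 0; h < 24; h++)            /* hypothetical sample data */
        hourly[h] = (h % 5 == 0) ? -1.0 : 20.0 + h;

    for (int h = 0; h < 24; h++) {
        double sum = 0.0;
        int    valid = 0;
        for (int w = 0; w < 8; w++) {
            int src = (h - w + 24) % 24;    /* window wraps past midnight */
            if (hourly[src] >= 0.0) { sum += hourly[src]; valid++; }
        }
        /* threshold from the text: more than six valid hourly entries */
        if (valid > 6)
            printf("hour %2d: %.2f\n", h, sum / valid);
        else
            printf("hour %2d: NA\n", h);
    }
    return 0;
}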

Thus, using the above two MapReduce jobs, we have calculated the eight-hourly averages

of O3 concentration. The next section describes our reasoning for integrating Redis as a

caching layer into this application, and gives details of how this was achieved.

3.3 Integration of Redis in Hadoop

In the previous section, we described the air-quality MapReduce application in detail and

pointed out that input data to the application comes from HDFS and the output data is written to HDFS. However, intermediate data, like the data passed from the first MapReduce job to the second MapReduce job, as well as the data passed on from the Mapper to the Reducer, is also written to HDFS. We believe that introducing an in-memory key-value

store as a caching layer may boost the performance of this application, because data will

be read in from RAM and not from the disk. To test this hypothesis, we have decided

to incorporate Redis as an in-memory cache in the air-quality application. To do this, we

have customized the data input source and output destinations to suit our requirements.

In the previous chapter, we discussed that, when a MapReduce job starts, each input file

is divided into splits and each of these splits is assigned to an instance of the Mapper.

Each split is further divided into records of key-value pairs which are then processed by

the Mapper. The ’InputFormat’ class is responsible for configuring how contiguous chunks


of input are generated from blocks in HDFS (or other sources). This class also provides

a ’RecordReader’ class that generates key-value pairs from each individual split. Hadoop

provides a set of standard InputFormat classes, but in our case, we use our own Input-

Format and RecordReader classes so as to read in data from Redis. Similarly, to write

our data to Redis instead of HDFS, we need to provide our own implementation of the

RecordWriter class.

In the new application, we will still have two MapReduce jobs where the first job calculates

the hourly averages and the second calculates the eight-hourly average. The flow of the

new application will be as follows:

• The Mapper of the first MapReduce job reads data from the input file stored in

HDFS and emits (key, value) pairs which are then used by the reducer to perform

the required aggregation.

• The Reducer calculates and emits the corresponding hourly averages to Redis instead

of HDFS. To write to Redis, we use our own customized RecordWriter as follows:

Figure 3.5: Customized RecordWriter to Write Data to Redis

• The Mapper of the second job reads in the hourly averages from Redis and emits eight keys that indicate the eight consecutive hours starting from the hour indicated in the input key, along with the input average value corresponding to the base hour. The input data is read from Redis instead of HDFS. To achieve this, we implement our own RecordReader as follows:


Figure 3.6: Customized RecordReader to Read Data from Redis

• Finally, the Reducer of the second MapReduce job reads in the output of the previous step from Redis, then calculates and emits the final eight-hourly averages to HDFS.

To integrate Redis into our Hadoop application, we make use of a Java Redis client library called Jedis, the Java client officially recommended by Redis Labs. We have then implemented customized RecordReader and RecordWriter classes to read/write data from/to Redis using Jedis.

This concludes the description of the air-quality simulation application and of our customized version using Redis. In the next section, we present the details of the hardware and software resources used for our analysis.

3.3.1 Technical Data

The whale cluster located at the University of Houston is used to perform the analyses for this part of the work. It has 57 Appro 1522H nodes (whale-001 to whale-057). Each node has two 2.2 GHz quad-core AMD Opteron processors (8 cores total) with 16 GB main memory and Gigabit Ethernet. The cluster uses a 144-port 4x InfiniBand DDR Voltaire Grid Director ISR 2012 switch and two 48-port HP GE switches for the network interconnect. For storage, a 4 TB NFS /home file system and a 7 TB HDFS file system (using triple replication) are used. For development, we have used Hadoop (version 2.7.2), Redis (version 3.2.8), and Jedis (version 2.8).


In the next section, we compare the performance of both air-quality simulation applications described previously and present our conclusions.

3.4 Results and Comparison

In the previous section, we discussed in detail the Hadoop air-quality application and also

our customized implementation with in-memory caching. In this section, we analyze the

performance of the two applications with respect to the time taken to complete execution.

We then compare the execution times of both applications to see if integrating Redis as a

caching layer provides any benefits.

To perform our analysis, we have used the whale cluster at the University of Houston. For

our analysis, we have varied the number of reducers from 1 to 20 in steps of 5. We have

executed both applications three times on the whale cluster and reported the minimum

of the three. The results that we observed for the original air-quality application are in

Table 3.5:

Table 3.5: Time taken to execute original air-quality application

No. of Reducers    Execution Time
1                  5 min, 9 sec
5                  3 min, 33 sec
10                 3 min, 0 sec
15                 2 min, 42 sec
20                 2 min, 41 sec


The execution times that we observed for the air-quality application in which we integrated

Redis are in Table 3.6:

Table 3.6: Time taken to execute the air-quality application using Redis

No. of Reducers    Execution Time
1                  5 min, 49 sec
5                  3 min, 43 sec
10                 3 min, 43 sec
15                 2 min, 45 sec
20                 2 min, 44 sec

Figure 3.7 presents these execution timings graphically, comparing the total execution time taken by both air-quality applications.


Figure 3.7: Comparison of Execution Times (in minutes) for Air-quality Applications Using

HDFS and Redis.


Contrary to our expectations, we observed that integrating Redis into our application did not provide any added performance benefits. In fact, the total time taken by the application using in-memory caching is higher than that of the original application. We believe that the delay is introduced by the fact that we are using a single Redis hash to store data. As a result, this hash becomes a bottleneck when a client tries to write multiple key-value pairs to the database. When a client wants to write data to Redis, it connects to a Redis server instance and requests access to the hash. The client then waits until it receives a response from the server before sending the next request. Essentially, all requests from a single client are serialized, and a delay is introduced in completing the requests. When we use more than one client, the delays accrue and we see poor performance. To

solve this problem, Redis provides an advanced feature called pipelining [35]. Using Redis pipelining, it is possible to send multiple commands to the server without waiting for the replies. This essentially means that a client buffers up a batch of commands and ships them to the server in one go. The benefit is that we save the network round-trip time for every command.
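Conceptually, pipelining looks as follows. This sketch uses the Hiredis C client from our micro-benchmark (Jedis exposes an analogous Pipeline interface); the context and the key and value variables are assumed to exist:

/* Queue several commands locally, then collect all replies in one pass,
 * paying the network round-trip cost once instead of once per command. */
redisAppendCommand(ctx, "HSET results %s %s", key1, val1);
redisAppendCommand(ctx, "HSET results %s %s", key2, val2);
redisAppendCommand(ctx, "HSET results %s %s", key3, val3);

redisReply *reply;
for (int i = 0; i < 3; i++) {
    if (redisGetReply(ctx, (void **)&reply) == REDIS_OK)
        freeReplyObject(reply);
}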

However, due to a lack of proper documentation for Jedis and time constraints, we could not explore this option of pipelining client requests, but we intend to continue exploring it. We believe that introducing pipelining will give better results, revealing the true benefits of using Redis as a caching layer in scientific applications. With this, we conclude the analysis and results chapter of this thesis; in the next chapter, we conclude the thesis by summarizing our analysis, observations, and findings.


Chapter 4

Conclusions and Outlook

In recent years, industry as well as academia have faced an unprecedented data explosion, and performing analyses on these large datasets is becoming increasingly common.

Data analysis is performed so as to find previously unknown correlations between datasets; however, at the same time, there is a tremendous need to make proper use of the available

computing resources. Also, traditional RDBMS databases are unable to keep up with the

huge volume of data that is being generated. To complicate matters further, data being

generated is obtained from various sources and may be structured or unstructured. NoSQL

databases overcome many of the shortcomings of RDBMS systems and have emerged as

a solution to store and analyze big data. There are many types of NoSQL databases and

lately key-value NoSQL databases are being increasingly used due to their simplicity and

ease of use. In-memory key-value stores are a special kind of key-value databases that re-

tain data in main memory instead of on secondary storage. This is done so as to speed-up

access to data. As a result, they are being used in compute intensive applications as an

intermediate caching layer to store intermediate and final results. This ensures faster read

times and hence enhances the performance of the application. The main focus of this thesis


is to analyze and compare the various in-memory key-value stores available in the market

today.

We have analyzed popular in-memory key-value stores like Memcached, Redis, Riak, Hazelcast, Aerospike, and MICA. We have then compared them based on features like in-memory caching, support for multiple parallel requests, open-source licensing, easy access from remote locations, etc. Based on our analysis, we were most interested in studying Redis and Memcached

in detail. To do this, we have developed a micro-benchmark using C and the Open MPI library so as to analyze and compare Memcached and Redis. Based on our analysis, we

concluded that Redis was more scalable and reliable as compared to Memcached. Also,

we noticed Redis to be more resilient in the face of large data requests. Based on this

observation, we concluded Redis to be the better of the two.

To test how well Redis performs as an in-memory cache, we have integrated it into a Hadoop

MapReduce application that measures the eight-hourly average of air quality around sites

in Houston. We have used a 48.5 GB dataset that contains data collected from various

sites in Texas from 2009 to 2013. This task has been achieved in two parts using two

MapReduce jobs. The first job is responsible for calculating hourly averages and the second

job calculates the final eight-hourly averages. The main aim was to compare the execution

times of this application with a similar Hadoop application that does not use in-memory

caching. Although we observed promising results for the second part of the application, overall, integrating Redis as a caching layer did not offer any performance benefits.

However, we believe that this problem can be solved using an advanced feature called Redis

pipelining and we wish to explore this further.

In the future, we are interested in benchmarking other in-memory key-value stores like Riak. We also want to integrate Memcached as a caching layer into a data analysis application to observe its performance in a real-world scenario. Further, we would like to explore other components of the NoSQL eco-system so as to improve the analytical abilities of big data applications.


Bibliography

[1] Rick Cattell. Scalable SQL and NoSQL data stores. SIGMOD Rec., 39(4):12–27, May 2011.

[2] Ameya Nayak, Anil Poriya, and Dikshay Poojary. Types of NoSQL databases and its comparison with relational databases. International Journal of Applied Information Systems, 5(4), March 2013. Published by Foundation of Computer Science, New York, USA.

[3] Key-value database - Wikipedia. https://en.wikipedia.org/wiki/Key-value_database. [Online; accessed 16-Mar-2017].

[4] Brad Fitzpatrick. Distributed caching with memcached. Linux J., 2004(124):5–, August 2004.

[5] Redis. https://redis.io/. [Online; accessed 23-Dec-2016].

[6] Haripriya Ayyalasomayajula, Edgar Gabriel, Peggy Lindner, and Daniel Price. Air quality simulations using big data programming models. In Big Data Computing Service and Applications (BigDataService), 2016 IEEE Second International Conference on, pages 182–184. IEEE, 2016.

[7] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop distributed file system. In Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST '10, pages 1–10, Washington, DC, USA, 2010. IEEE Computer Society.

[8] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.

[9] Introduction to big data: Types, characteristics & benefits. http://www.guru99.com/what-is-big-data.html. [Online; accessed 04-Nov-2016].

[10] ACID - Wikipedia. https://en.wikipedia.org/wiki/ACID. [Online; accessed 21-Feb-2017].

[11] The programming language Lua. https://www.lua.org/. [Online; accessed 27-Dec-2016].

[12] Using Redis as an LRU cache - Redis. https://redis.io/topics/lru-cache. [Online; accessed 24-Dec-2016].

[13] Gossip protocol - Wikipedia. https://en.wikipedia.org/wiki/Gossip_protocol. [Online; accessed 11-Jan-2017].

[14] Memcached - a distributed memory object caching system. https://memcached.org/. [Online; accessed 27-Nov-2016].

[15] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling Memcache at Facebook. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, NSDI '13, pages 385–398, Berkeley, CA, USA, 2013. USENIX Association.

[16] Riak KV Enterprise technical overview. http://info.basho.com/rs/721-DGT-611/images/RiakKV%20Enterprise%20Technical%20Overview-6page.pdf. [Online; accessed 01-Feb-2017].

[17] Consistent hashing - Wikipedia. https://en.wikipedia.org/wiki/Consistent_hashing. [Online; accessed 31-Dec-2016].

[18] An architect's view of Hazelcast IMDG - hazelcast.com. https://hazelcast.com/resources/architects-view-hazelcast/. [Online; accessed 02-Feb-2017].

[19] Java virtual machine - Wikipedia. https://en.wikipedia.org/wiki/Java_virtual_machine. [Online; accessed 28-Feb-2017].

[20] Hazelcast documentation. http://docs.hazelcast.org/docs/3.3/manual/pdf/hazelcast-documentation-3.3.5.pdf. [Online; accessed 03-Feb-2017].

[21] Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. MICA: A holistic approach to fast in-memory key-value storage. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), pages 429–444, Seattle, WA, 2014. USENIX Association.

[22] Data Plane Development Kit. http://dpdk.org/. [Online; accessed 03-Mar-2017].

[23] Non-uniform memory access - Wikipedia. https://en.wikipedia.org/wiki/Non-uniform_memory_access. [Online; accessed 07-Mar-2017].

[24] MICA: A holistic approach to fast in-memory key-value storage. http://www.slideserve.com/schuyler/mica-a-holistic-approach-to-fast-in-memory-key-value-storage. [Online; accessed 04-Feb-2017].

[25] Aerospike architecture. http://www.aerospike.com/docs/architecture. [Online; accessed 04-Mar-2017].

[26] DB-Engines ranking - popularity ranking of key-value stores. https://db-engines.com/en/ranking/key-value+store. [Online; accessed 03-Feb-2017].

[27] MPI: A message-passing interface standard. http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf. [Online; accessed 13-Dec-2016].

[28] Application programming interface - Wikipedia. https://en.wikipedia.org/wiki/Application_programming_interface. [Online; accessed 02-Mar-2017].

[29] Inter-process communication - Wikipedia. https://en.wikipedia.org/wiki/Inter-process_communication. [Online; accessed 26-Feb-2017].

[30] Apache Hadoop. http://hadoop.apache.org/. [Online; accessed 10-Feb-2017].

[31] Locality of reference - Wikipedia. https://en.wikipedia.org/wiki/Locality_of_reference. [Online; accessed 21-Feb-2017].

[32] Apache Hadoop YARN. https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html. [Online; accessed 17-Apr-2017].

[33] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, SOSP '03, pages 29–43, New York, NY, USA, 2003. ACM.

[34] Hadoop word count example. https://cs.calvin.edu/courses/cs/374/exercises/12/lab/. [Online; accessed 12-Dec-2016].

[35] Redis pipelining. https://redis.io/topics/pipelining. [Online; accessed 12-Apr-2017].

[36] M. Berezecki, E. Frachtenberg, M. Paleczny, and K. Steele. Many-core key-value store. In Proceedings of the 2011 International Green Computing Conference and Workshops, IGCC '11, pages 1–8, Washington, DC, USA, 2011. IEEE Computer Society.

[37] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload analysis of a large-scale key-value store. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '12, pages 53–64, New York, NY, USA, 2012. ACM.

[38] Shared memory hash table - Vishesh Handa's blog. http://vhanda.in/blog/2012/07/shared-memory-hash-table/. [Online; accessed 01-Dec-2016].

[39] Tom White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., second edition, October 2010.

[40] Srinath Perera and Thilina Gunarathne. Hadoop MapReduce Cookbook. Packt Publishing, first edition, February 2013.

[41] Introduction to MapReduce and Hadoop. http://people.csail.mit.edu/matei/talks/2010/amp_mapreduce.pdf. [Online; accessed 17-Apr-2017].

[42] Edgar Gabriel. COSC 6374 Parallel Computation, Fall 2015. http://www2.cs.uh.edu/~gabriel/courses/cosc6374_f15/index.shtml. [Online; accessed 17-Apr-2017].

[43] Edgar Gabriel. COSC 6339 Big Data Analytics, Spring 2015. http://www2.cs.uh.edu/~gabriel/courses/cosc6339_s15/index.shtml. [Online; accessed 17-Apr-2017].

[44] RDBMS, 2016. [Online; accessed 28-Nov-2016].

[45] Emilio Coppa. Hadoop architecture overview. http://ercoppa.github.io/HadoopInternals/HadoopArchitectureOverview.html. [Online; accessed 17-Apr-2017].

[46] Open MPI: Open source high performance computing. https://www.open-mpi.org/. [Online; accessed 17-Apr-2017].

[47] Jeffrey M. Squyres. The architecture of open source applications (volume 2): Open MPI. http://www.aosabook.org/en/openmpi.html, 2015. [Online; accessed 17-Apr-2017].