
Copyright IBM Corporation 2012 Trademarks


Consider the Apache Cassandra database

What are the pros and cons of this NoSQL database?

Srinath Perera

Senior Software Architect

WSO2 Inc

03 July 2012

NoSQL storage provides a flexible and scalable alternative to relational databases, and among the many such storage systems, Cassandra is one of the popular choices. Move beyond the well-known details and explore the less obvious ones associated with Cassandra. You'll examine the Cassandra data model, storage schema design, architecture, and potential surprises associated with Cassandra.

Introduction

In the database history article "What Goes Around Comes Around" (see Resources), Michael Stonebraker describes in detail how storage techniques have evolved over time. Before arriving at the relational model, developers tried other models, such as hierarchical and directed graph models.

It is worth noting that the SQL-based relational model, which is the de facto standard even now, has prevailed for about 30 years. Given the short history and fast pace of computer science, this is a remarkable achievement. The relational model is so well established that for many years, selecting data storage for an application was an easy choice for the solution architect: the choice was invariably a relational database.

Developments such as growing user bases, mobile devices, the extended online presence of users, cloud computing, and multi-core systems have led to increasingly large-scale systems. High-tech companies such as Google and Amazon were among the first to hit these problems of scale. They soon found that relational databases could not adequately support large-scale systems.

To circumvent those challenges, Google and Amazon came up with two alternative solutions, Big Table and Dynamo (see Resources), in which they relaxed the guarantees provided by the relational data model to achieve higher scalability. Eric Brewer's "CAP Theorem" (see Resources) later formalized those observations. It states that for scalable distributed systems, consistency, availability, and partition tolerance are trade-offs: it is impossible to build a system that provides all three properties at once. Soon, based on the earlier work by Google and Amazon and the understanding acquired about scalable systems, a new class of storage systems was proposed. They were named "NoSQL" systems. The name first meant "do not use SQL if you want to scale," and it was later redefined to "not only SQL" to signal that there are other solutions in addition to SQL-based ones.

There are many NoSQL systems, and each relaxes or alters some aspect of the relational model. It is worth noting that no NoSQL solution works for all scenarios; each outperforms and outscales relational models only for some subset of use cases. My earlier article "Finding the Right Data Solution for Your Application in the Data Storage Haystack" discusses how to match application requirements to NoSQL solutions (see Resources).

Apache Cassandra (see Resources) is one of the first and most widely used NoSQL solutions.

This article takes a detailed look at Cassandra and points out details and tricky points not readily

apparent when you look at Cassandra for the first time.

Apache Cassandra

Cassandra is a NoSQL column family implementation that supports the Big Table data model and uses the architectural aspects introduced by Amazon Dynamo. Some of the strong points of Cassandra are:

Highly scalable and highly available with no single point of failure

NoSQL column family implementation

Very high write throughput and good read throughput

SQL-like query language (since 0.8) and support for search through secondary indexes

Tunable consistency and support for replication

Flexible schema

These positive points make it easy to recommend Cassandra, but it is crucial for a developer to delve into the details and tricky points of Cassandra to grasp the intricacies of this system.

Cassandra stores data according to the column family data model, depicted in Figure 1.




Figure 1. Cassandra data model

What is a column? Column is a bit of a misnomer, and the name cell might have been easier to understand. I will stick with column, as that is the common usage.

The Cassandra data model consists of columns, rows, column families, and keyspaces. Let's look at each part in detail.

Column: the most basic unit in the Cassandra data model. Each column consists of a name, a value, and a timestamp. For this discussion, ignore the timestamp; you can then represent a column as a name-value pair (such as author="Asimov").

Row: a collection of columns labeled with a name. For example, Listing 1 shows how a row might be represented:

Listing 1. Example of a row

    "Second Foundation"-> {
        author="Asimov",
        publishedDate="..",
        tag1="sci-fi", tag2="Asimov"
    }

Cassandra consists of many storage nodes and stores each row within a single storage node (plus any replicas of that row). Within each row, Cassandra always stores columns sorted by their column names. Using this sort order, Cassandra supports slice queries: given a row, users can retrieve the subset of its columns whose names fall within a given column name range. For example, a slice query with the range tag0 to tag9999 will get all the columns whose names fall between tag0 and tag9999.

Column family: a collection of rows labeled with a name. Listing 2 shows how sample data might look:

Listing 2. Example of a column family

    Books->{
        "Foundation"->{author="Asimov", publishedDate=".."},
        "Second Foundation"->{author="Asimov", publishedDate=".."},
    }

It is often said that a column family is like a table in a relational model. As shown in the

following example, the similarities end there.

Keyspace: a group of many column families. It is only a logical grouping of column families and provides an isolated scope for names.

Finally, super columns reside within a column family and group several columns under one key. Because the Cassandra developers discourage the use of super columns, I do not discuss them here.
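The model above can be pictured as nested maps. The following Java sketch is a hypothetical in-memory illustration (KeyspaceSketch and its methods are not part of Cassandra or any client library): a keyspace holds column families, each column family maps row keys to rows, and each row keeps its columns sorted by name, which is what makes slice queries possible. Timestamps, replication, and distribution across nodes are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory picture of one keyspace:
// column family name -> row key -> (column name -> value).
// Columns inside a row are kept sorted by name, as Cassandra stores them.
class KeyspaceSketch {
    private final Map<String, Map<String, TreeMap<String, String>>> columnFamilies =
            new HashMap<>();

    // Writes one column. Rows are created on first use, and different rows
    // of the same column family may hold completely different columns.
    void insert(String columnFamily, String rowKey, String columnName, String value) {
        columnFamilies
                .computeIfAbsent(columnFamily, cf -> new HashMap<>())
                .computeIfAbsent(rowKey, key -> new TreeMap<>())
                .put(columnName, value);
    }

    private TreeMap<String, String> findRow(String columnFamily, String rowKey) {
        Map<String, TreeMap<String, String>> rows = columnFamilies.get(columnFamily);
        return rows == null ? null : rows.get(rowKey);
    }

    // A single value is addressed by two keys: the row key and the column name.
    String get(String columnFamily, String rowKey, String columnName) {
        TreeMap<String, String> row = findRow(columnFamily, rowKey);
        return row == null ? null : row.get(columnName);
    }

    // Slice query: names of the columns in [start, end] (inclusive), in sorted order.
    List<String> slice(String columnFamily, String rowKey, String start, String end) {
        List<String> names = new ArrayList<>();
        TreeMap<String, String> row = findRow(columnFamily, rowKey);
        if (row != null) {
            names.addAll(row.subMap(start, true, end, true).keySet());
        }
        return names;
    }
}
```

After inserting the four columns of the row in Listing 1, slice("Books", "Second Foundation", "tag0", "tag9999") returns [tag1, tag2]. Names are compared here as Java strings; in Cassandra, the ordering depends on the comparator configured for the column family.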

Cassandra versus RDBMS data models

From the above description of the Cassandra data model, data is placed in a two-dimensional (2D) space within each column family. To retrieve data in a column family, users need two keys: the row name and the column name. In that sense, the relational model and Cassandra are similar, although there are several crucial differences.

Relational columns are homogeneous across all rows in the table, and a clear vertical relationship usually exists between data items. That is not the case with Cassandra columns, which is why Cassandra stores the column name with each data item (column).

With the relational model, the 2D data space is complete: each point in the 2D space has at least the null value stored there. Again, this is not the case with Cassandra, where some rows can contain only a few items while other rows contain millions.

With a relational model, the schema is predefined and cannot be changed at runtime, while

Cassandra lets users change the schema at runtime.

Cassandra always stores data such that columns are sorted by their names. This makes it easy to search for data within a row through slice queries over column names, but harder to search across rows unless you use an order-preserving partitioner.

Another crucial difference is that column names in an RDBMS represent metadata about data, but never data. In Cassandra, however, column names can include data. Consequently, Cassandra rows can have millions of columns, while a relational table usually has tens of columns.

Using a well-defined, immutable schema, relational models support sophisticated queries that include JOINs, aggregations, and more. With a relational model, users can define the data schema without worrying about queries. Cassandra does not support JOINs or most SQL search methods. Therefore, the schema has to be tailored to the queries required by the application.





To explore these differences, consider a book rating site where users can add books (author, rank, price, link) and comments (text, time, name), and can tag books. The application needs to support the following operations by the users:

Adding books

Adding comments for books

Adding tags for books

Listing books sorted by rank

Listing books given a tag

Listing the comments given a book ID

It is rather trivial to implement this application with a relational model. Figure 2 shows the entity-relationship (ER) diagram for the database design.

Figure 2. ER Model for the Book rating site

Let's see how this can be implemented using the Cassandra data model. Listing 3 shows a potential schema with Cassandra, where the first line represents the "Books" column family, which has multiple rows, each having properties of the book as columns. <TS1> and <TS2> denote timestamps.

Listing 3. Cassandra schema for the book rating sample

    Books[BookID->(author, rank, price, link, tag<TS1>, tag<TS2> ..,
        cmt+<TS1>= text + "-" + author) ]
    Tags2BooksIndex[TagID->(<TS1>=bookID1, <TS2>=bookID2, ..) ]
    Tags2AuthorsIndex[TagID->(<TS1>=bookID1, <TS2>=bookID2, ..) ]
    RanksIndex["RANK" -> (rank=bookID)]

Table 1 is a sample data set as per the schema.

Table 1. Sample data for the book rating site

Column family name: sample data set

Books:
    "Foundation" -> ("author"="Asimov", "rank"=9, "price"=14, "tag1"="sci-fi", "tag2"="future",
        "cmt1311031405922"="best book-sanjiva", "cmt1311031405923"="well I disagree-srinath")
    "I Robot" -> ("author"="Asimov", "rank"=7, "price"=14, "tag1"="sci-fi", "tag2"="robots",
        "cmt1311031405924"="Asimov's best-srinath", "cmt1311031405928"="I like foundation better-sanjiva")

RanksIndex:
    "Rank" -> (9="Foundation", 7="I Robot")

Tags2BooksIndex:
    "sci-fi" -> ("1311031405918"="Foundation", "1311031405919"="I Robot")
    "future" ->

Tags2AuthorsIndex:
    "sci-fi" -> (1311031405920="Asimov")
    "future" ->

This example shows several design differences between the relational and Cassandra models. The Cassandra model stores data about books in a single column family called "Books," and the other three column families are indexes built to support queries.

Looking at the "Books" column family in detail, the model uses a row to represent each book

where a book name is the row ID. Details about the book are represented as columns stored within

the row.

Looking closely, you might notice that data items with a 1:M relationship to books, such as comments and tags, are also stored within the book's row. To do that, append a timestamp to the column names for tags and comments. This approach stores all data about a book within the same row and avoids having to do JOINs to retrieve the data. This is how Cassandra circumvents its lack of support for JOINs.

This provides several advantages.

You can read all data about a book through a single query reading the complete row.

You can retrieve comments and tags without a JOIN by using slice queries that have cmt0-

cmt9999 and tag0-tag9999 as starting and ending ranges.

Because Cassandra stores columns sorted by their column names, slice queries are very fast. It is worth noting that storing all the details about a data item in a single row and using sort orders are the most crucial ideas behind Cassandra data design. Most Cassandra data model designs follow these ideas in some form. Users can exploit the sort orders both when storing data and when building indexes. For example, another side effect of appending timestamps to column names is that, because column names are stored in sorted order, comments whose column names are postfixed with timestamps are stored in the order they were created, and search results come back in the same order.
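A minimal sketch of the "Books" row design, assuming epoch-millisecond timestamps as in Table 1. BookRowSketch is a hypothetical class that models a single row as a sorted map; it shows how the cmt0 to cmt9999 and tag0 to tag9999 slices return comments in creation order and tags, without any JOIN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one row of the "Books" column family. The book's
// properties, tags, and comments all live in the same row; comment columns
// are named "cmt" + timestamp, so their sorted order is their creation order.
class BookRowSketch {
    private final TreeMap<String, String> columns = new TreeMap<>();

    void put(String columnName, String value) {
        columns.put(columnName, value);
    }

    // Assumes timestamps with the same number of digits (for example, epoch
    // milliseconds), so that string order matches time order.
    void addComment(long timestampMillis, String text, String author) {
        columns.put("cmt" + timestampMillis, text + "-" + author);
    }

    // Slice query over [start, end], returning values in column-name order.
    List<String> slice(String start, String end) {
        return new ArrayList<>(columns.subMap(start, true, end, true).values());
    }
}
```

Adding the two "Foundation" comments from Table 1 in either order, slice("cmt0", "cmt9999") returns [best book-sanjiva, well I disagree-srinath], because cmt1311031405922 sorts before cmt1311031405923. Columns such as author, price, and rank fall outside that range, so the slice returns only comments.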

Cassandra's basic design does not include any search methods. Although it supports secondary indexes, they were added later on top of the core design, and they have several limitations, including lack of support for range queries.

Consequently, the best results in a Cassandra data design require users to implement searches by building custom indexes and utilizing column and row sort orders. The other three column families (Tags2BooksIndex, Tags2AuthorsIndex, and RanksIndex) do exactly that. Since users need to search for books given a tag, the "Tags2BooksIndex" column family builds an index by storing the tag name as the row ID and all books tagged with that tag as columns under that row. As shown by the example, timestamps are used as the column names, but only to provide unique column IDs. The search implementation simply reads the index by looking up the row by tag name and finding the matches by reading all columns stored within that row.
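The index maintenance described above can be sketched as follows. This is an illustration under stated assumptions, not code from the article's sample: TagIndexSketch is a hypothetical class, column families are modeled as maps, and a simple counter stands in for real timestamps. The point is that the application writes the index entry itself, alongside the data, and a tag search is a single read of one index row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the Books and Tags2BooksIndex column families:
// each column family maps a row key to that row's sorted columns.
class TagIndexSketch {
    final Map<String, TreeMap<String, String>> books = new HashMap<>();
    final Map<String, TreeMap<String, String>> tags2BooksIndex = new HashMap<>();

    // Stand-in for real timestamps: strictly increasing, so column names never collide.
    private long clock = 1311031405918L;

    // Tagging a book writes twice: once into the book's row and once into the
    // index row for the tag. Cassandra does not maintain this index for us.
    void tagBook(String bookId, String tag) {
        String ts = Long.toString(clock++);
        books.computeIfAbsent(bookId, k -> new TreeMap<>()).put("tag" + ts, tag);
        tags2BooksIndex.computeIfAbsent(tag, k -> new TreeMap<>()).put(ts, bookId);
    }

    // Search = look up the index row by tag name and read all of its columns.
    List<String> booksWithTag(String tag) {
        List<String> result = new ArrayList<>();
        TreeMap<String, String> indexRow = tags2BooksIndex.get(tag);
        if (indexRow != null) {
            result.addAll(indexRow.values());
        }
        return result;
    }
}
```

The design trades extra writes, which Cassandra handles well, for a single-row read at query time. The cost is that every code path that adds or removes a tag must also update the index row.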

Table 2 discusses how each of the queries required by the application is implemented using the above Cassandra indexes.

Table 2. Comparison of query implementations

Query description: List books sorted by the rank.
Query as SQL: Run the query "Select * from Books order by rank" and then, on each result, do "Select tag from Tags where bookid=?" and "Select comment from Comments where bookid=?"
Cassandra implementation: Do a slice query on the "RanksIndex" column family to receive an ordered list of books, and for each book do a slice query on the "Books" column family to read the details about the book.

Query description: Given a tag, find the authors whose books have the given tag.
Query as SQL: Select distinct author from Tags, Books where Tags.bookid=Books.bookid and tag=?
Cassandra implementation: Read all columns for the given tag from Tags2AuthorsIndex using a slice query.

Query description: Given a tag, list books that have the given tag.
Query as SQL: Select bookid from Tags where tag=?
Cassandra implementation: Read all columns for the given tag from Tags2BooksIndex using a slice query.

Query description: Given a book, list the comments for that book in sorted order of the time when the comments were created.
Query as SQL: Select text, time, user from Comments where bookid=? Order by time
Cassandra implementation: In the "Books" column family, do a slice query on the row corresponding to the given book. The comments come back in sorted order because timestamps are used in the column names.
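The first row of Table 2 can be sketched in the same in-memory style. RankListingSketch is hypothetical; it assumes a numeric column comparator for the single "Rank" row of RanksIndex (so that 10 sorts after 9) and distinct ranks. Reading the index row in descending order gives the ranking, and each book's details then come from one read of its "Books" row. Cassandra's slice queries can also be run in reversed order, which the descending read imitates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "List books sorted by the rank" (Table 2, first row).
class RankListingSketch {
    // RanksIndex has a single row "Rank"; column name = rank, value = book ID.
    // Long keys model an assumed numeric comparator.
    final TreeMap<Long, String> ranksIndexRow = new TreeMap<>();
    // Books column family: book ID -> (column name -> value).
    final Map<String, Map<String, String>> books = new HashMap<>();

    void addBook(String bookId, long rank, String author) {
        Map<String, String> row = new HashMap<>();
        row.put("author", author);
        row.put("rank", Long.toString(rank));
        books.put(bookId, row);
        ranksIndexRow.put(rank, bookId); // assumes ranks are distinct
    }

    // One slice over the index row (highest rank first), then one row read per book.
    List<String> booksByRankDescending() {
        List<String> result = new ArrayList<>();
        for (String bookId : ranksIndexRow.descendingMap().values()) {
            Map<String, String> details = books.get(bookId);
            result.add(bookId + " (rank " + details.get("rank") + ", " + details.get("author") + ")");
        }
        return result;
    }
}
```

With the two books of Table 1, booksByRankDescending() returns "Foundation (rank 9, Asimov)" followed by "I Robot (rank 7, Asimov)".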

Although the above design can efficiently support queries required by the book-rating site, it can

only support queries that it is designed for and cannot support ad-hoc queries. For example, it

cannot do the following queries without building new indexes.

Select * from Books where price > 50;

Select * from Books where author="Asimov"

It is possible to change the design to support those and other queries by either building

appropriate indexes or by writing code to walk through the data. The need for custom code to

support new queries, however, is a limitation compared to relational models where adding new

queries often needs no changes to the schema.
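The "writing code to walk through the data" option looks roughly like this for Select * from Books where price > 50. FullScanSketch is hypothetical and runs over an in-memory map; against a real cluster, the same loop would have to page through every row over the network, so its cost grows with the size of the whole column family rather than with the number of matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical full scan: without an index on price, every row is read
// and filtered by the client.
class FullScanSketch {
    static List<String> booksPricedAbove(Map<String, Map<String, String>> books, int minPrice) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : books.entrySet()) {
            String price = row.getValue().get("price");
            // Rows are schemaless: a book may have no price column at all.
            if (price != null && Integer.parseInt(price) > minPrice) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }
}
```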

From the 0.8 release, Cassandra supports secondary indexes where users can specify a search

by a given property, and Cassandra automatically builds indexes for searching based on that

property. That model, however, provides less flexibility. For example, secondary indexes do not

support range queries and provide no guarantees on sort orders of results.

Using Cassandra from the Java environment

Cassandra has many clients written in different languages. This article focuses on the Hector client (see Resources), which is the most widely used Java client for Cassandra. Users can add Hector to their application by adding the Hector JARs to the application classpath. Listing 4 shows a sample Hector client.





First, connect to a Cassandra cluster. Use the instructions in the Cassandra Getting Started

Page (see Resources) to set up a Cassandra node. Unless its configuration has been changed, it

typically runs on port 9160. Next, define a keyspace. This can be done either through the client or

through the conf/cassandra.yaml configuration file.

Listing 4. Sample Hector client code for Cassandra

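    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.ColumnQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    // Connect to the cluster (the node listens for clients on port 9160 by default)
    Cluster cluster = HFactory.createCluster("TestCluster",
            new CassandraHostConfigurator("localhost:9160"));
    // Define a keyspace
    Keyspace keyspace = HFactory.createKeyspace("BooksRating", cluster);
    // Now let's add a new column
    String rowID = "Foundation";
    String columnFamily = "Books";
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    mutator.insert(rowID, columnFamily,
            HFactory.createStringColumn("author", "Asimov"));
    // Now let's read the column back
    ColumnQuery<String, String, String> columnQuery =
            HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily(columnFamily).setKey(rowID).setName("author");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    HColumn<String, String> column = result.get(); // null if the column does not exist
    System.out.println(column == null ? "not found" : column.getValue());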





Figure 3. Cassandra cluster

Cassandra uses consistent hashing to assign data items to nodes. In simple terms, Cassandra uses a hash algorithm to calculate a hash of the key of each data item stored in Cassandra (the row ID). The hash range, that is, all possible hash values (also known as the key space, not to be confused with a keyspace in the data model), is divided among the nodes in the Cassandra cluster. Cassandra then assigns each data item to the node whose portion of the hash range contains the item's hash, and that node is responsible for storing and managing the data item. The paper "Cassandra - A Decentralized Structured Storage System" (see Resources) provides a detailed discussion of the Cassandra architecture.
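The placement rule can be sketched in Java. HashRingSketch is hypothetical, not Cassandra code, and rests on these assumptions: MD5 is used because Cassandra's RandomPartitioner hashes row keys with MD5; each node holds exactly one token; a row belongs to the first node whose token is greater than or equal to the row key's hash, wrapping around the ring; and extra replicas go to the next nodes around the ring, similar in spirit to Cassandra's SimpleStrategy.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hashing ring: token -> node name.
class HashRingSketch {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    // MD5 of the row key, read as a non-negative 128-bit integer.
    static BigInteger hash(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(rowKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Owner = first token >= hash (wrapping around); replicas = following nodes.
    List<String> replicasFor(String rowKey, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(hash(rowKey));
        if (entry == null) {
            entry = ring.firstEntry(); // hash is past the last token: wrap around
        }
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }
}
```

Adding or removing a node only moves the rows in the hash range next to that node, which is why consistent hashing suits a cluster that grows over time.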

The resulting architecture provides the following properties:

Cassandra distributes data among its nodes transparently to the users. Any node can accept any request (read, write, or delete) and route it to the correct node, even if the data is not stored on that node.

Users can define how many replicas are needed, and Cassandra handles replica creation and management transparently.

Tunable consistency: when storing and reading data, users can choose the expected consistency level for each operation. For example, if the "quorum" consistency level is used while writing or reading, data is written to and read from more than half of the replicas that hold the data item. Support for tunable consistency enables users to choose the consistency level best suited to the use case.

Cassandra provides very fast writes; they are actually faster than reads, and a node can transfer data at about 80-360MB/sec. It achieves this using two techniques:

1. Cassandra keeps recently written data in memory at the responsible node: updates are applied in memory and written to persistent storage (the file system) lazily. To avoid losing data, however, Cassandra writes all transactions to a commit log on disk. Unlike updating data items in place on disk, writes to the commit log are append-only and therefore avoid rotational delay while writing to the disk. For more information on disk-drive performance characteristics, see Resources.

2. Unless a write requests full consistency, Cassandra writes the data to enough nodes to satisfy the requested consistency level without resolving any data inconsistencies; it resolves inconsistencies only when the data is first read. This process is called "read repair."
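The quorum arithmetic behind tunable consistency can be checked with a few lines of Java. QuorumSketch is a hypothetical helper, not a Cassandra API; replicationFactor is the number of replicas of a row. A quorum is a strict majority of those replicas, and when the replicas contacted by reads and writes together exceed the replication factor (R + W > RF), every read overlaps at least one replica that holds the latest acknowledged write.

```java
// Hypothetical helper for quorum arithmetic. "replicationFactor" is the
// number of replicas of each row, not the number of nodes in the cluster.
class QuorumSketch {
    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when any read set and any write set must share at least one replica,
    // so at least one replica contacted by the read has the latest acknowledged write.
    static boolean readsOverlapWrites(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With a replication factor of 3, quorum(3) is 2, so quorum reads plus quorum writes (2 + 2 > 3) overlap, while reading and writing at a single replica (1 + 1) do not.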
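The first write technique can be sketched as: append to a log, apply in memory, flush lazily. WritePathSketch is a simplified, hypothetical model; the flush threshold, the key=value log format, and keeping flushed tables in a list are assumptions for illustration (Cassandra writes flushed data to immutable sorted files called SSTables).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical write path: commit log first, then memory, then lazy flush.
class WritePathSketch {
    private final Path commitLog;
    private final int flushThreshold;
    private final TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> flushedTables = new ArrayList<>();

    WritePathSketch(Path commitLog, int flushThreshold) {
        this.commitLog = commitLog;
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) throws IOException {
        // 1. Durability: append the mutation to the end of the log (sequential I/O).
        Files.write(commitLog, (key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // 2. Apply the update in memory; no in-place disk update on the write path.
        memtable.put(key, value);
        // 3. Persist lazily: only once the in-memory table is large enough.
        if (memtable.size() >= flushThreshold) {
            flushedTables.add(new TreeMap<>(memtable)); // stands in for writing a sorted file
            memtable.clear();
        }
    }

    int memtableSize() { return memtable.size(); }

    int flushedCount() { return flushedTables.size(); }
}
```

If the process crashes before a flush, the in-memory updates can be rebuilt by replaying the commit log, which is why the log write comes first.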

The resulting architecture is highly scalable. You can build a Cassandra cluster with tens to hundreds of nodes that is capable of handling terabytes to petabytes of data. Distributed systems involve trade-offs, however, and scale almost never comes for free. As mentioned before, a user might face many surprises when moving from a relational database to Cassandra. The next section discusses some of them.

Possible surprises with Cassandra

Be aware of these differences when you move from a relational database to Cassandra.

No transactions, no JOINs

It is well known that Cassandra does not support ACID transactions. Although it has a batch operation, there is no guarantee that the sub-operations within a batch are carried out atomically. This is discussed further under "Failed operations may leave changes."

Furthermore, Cassandra does not support JOINs. If you need to join two column families, you must retrieve and join the data programmatically. This is often expensive and time-consuming for large data sets. Cassandra circumvents this limitation by storing as much data as possible in the same row, as described in the example.
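A programmatic join looks roughly like this for the second query in Table 2, done without the Tags2AuthorsIndex column family. ClientJoinSketch is hypothetical and uses in-memory maps for Tags2BooksIndex and Books. It shows where the cost comes from: one read for the index row, one more read per matching book, and the distinct step also done by the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side join; each column family is row key -> sorted columns.
class ClientJoinSketch {
    // Equivalent of: Select distinct author from Tags, Books
    //                where Tags.bookid=Books.bookid and tag=?
    static List<String> authorsOfBooksWithTag(Map<String, TreeMap<String, String>> tags2BooksIndex,
                                              Map<String, TreeMap<String, String>> books,
                                              String tag) {
        List<String> authors = new ArrayList<>();
        TreeMap<String, String> indexRow = tags2BooksIndex.get(tag); // read 1: the index row
        if (indexRow == null) {
            return authors;
        }
        for (String bookId : indexRow.values()) {
            TreeMap<String, String> bookRow = books.get(bookId); // one extra read per book
            String author = (bookRow == null) ? null : bookRow.get("author");
            if (author != null && !authors.contains(author)) {
                authors.add(author); // "distinct" is also done by the client
            }
        }
        return authors;
    }
}
```

Storing the answer directly in a dedicated index row, as Tags2AuthorsIndex does, replaces this loop with a single read.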

No foreign keys and keys are immutable

Cassandra does not support foreign keys, so Cassandra cannot manage data consistency on a user's behalf; the application must handle it. Furthermore, users cannot change keys. For use cases that need to change keys, it is recommended to use surrogate keys: generate a key to use instead of the natural key, and manage the natural key as a property (column).
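A minimal sketch of the surrogate-key approach, assuming a random UUID as the generated key. SurrogateKeySketch is a hypothetical class over an in-memory map: the row key never changes, and the natural key (the title) is an ordinary column, so renaming a book is just a column update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical Books column family keyed by generated IDs.
class SurrogateKeySketch {
    final Map<String, Map<String, String>> books = new HashMap<>();

    String addBook(String title, String author) {
        String id = UUID.randomUUID().toString(); // generated row key, never changes
        Map<String, String> row = new HashMap<>();
        row.put("title", title);   // natural key stored as a property
        row.put("author", author);
        books.put(id, row);
        return id;
    }

    // Renaming updates a column; index entries that store this ID stay valid.
    void renameBook(String id, String newTitle) {
        books.get(id).put("title", newTitle);
    }
}
```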

Keys have to be unique

Each key, for example row keys and column keys, has to be unique in its scope. If the same key is used twice, the later write overwrites the data.

There are two solutions to this problem. First, you can use a composite key, that is, create the key by combining several fields; this solution is often used with row keys. Second, when there is a danger of the same key occurring twice, postfix the key with a random value or a timestamp. This often happens with indexes that store a value as the column name. For example, in the book rating application, the rank was used as the column name. To avoid two entries having the same column name because both books have the same rank, a timestamp is added to the rank as a postfix.
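The overwrite problem and the timestamp postfix can be seen side by side in a small sketch. RankIndexRowSketch is hypothetical; the "rank-timestamp" naming and the counter standing in for real timestamps are assumptions, and ranks are assumed to have equal width so that string order still sorts by rank first.

```java
import java.util.TreeMap;

// Hypothetical index row such as RanksIndex: column name derived from the rank,
// column value = book ID. Writing an existing column name replaces its value.
class RankIndexRowSketch {
    final TreeMap<String, String> columns = new TreeMap<>();
    private final boolean postfixTimestamp;
    private long clock = 1311031405920L; // stand-in for real timestamps

    RankIndexRowSketch(boolean postfixTimestamp) {
        this.postfixTimestamp = postfixTimestamp;
    }

    void add(String rank, String bookId) {
        // With the postfix, two books ranked 9 get distinct names such as
        // "9-1311031405920" and "9-1311031405921".
        String name = postfixTimestamp ? rank + "-" + (clock++) : rank;
        columns.put(name, bookId);
    }
}
```

Without the postfix, adding a second book with rank 9 leaves one column holding only the second book; with it, both entries survive, in insertion order within the same rank.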

Failed operations may leave changes

As explained before, Cassandra does not support atomic operations. Instead, it supports idempotent operations. Idempotent operations leave the system in the same state regardless of how many times they are carried out. Cassandra's insert and delete operations are idempotent (counter updates, available since 0.8, are an exception). If such an operation fails, you can retry it without any problem. This provides a mechanism to recover from transient failures.

Cassandra also supports batch operations, but they do not have any atomicity guarantees either. Since the operations are idempotent, the client can keep retrying until all operations of the batch are successful.

Idempotent operations are not equivalent to atomic operations. If an operation is successful, all is well, and the outcome is identical to that of an atomic operation. If an operation fails, the client can retry, and if the retry succeeds, again all is well. If, however, the operation fails even after retrying, it might leave side effects, unlike an atomic operation. Unfortunately, with Cassandra, this is a complexity that programmers have to deal with themselves.
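The retry pattern, and its failure mode, can be sketched as follows. IdempotentRetrySketch is hypothetical: an in-memory map stands in for Cassandra, and a failure is simulated right after the first write of each failing attempt. With enough attempts the batch completes; if retries run out, the earlier writes of the batch stay behind, which is the side effect described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store with simulated transient failures.
class IdempotentRetrySketch {
    final Map<String, String> store = new HashMap<>();
    private int failuresToSimulate;

    IdempotentRetrySketch(int failuresToSimulate) {
        this.failuresToSimulate = failuresToSimulate;
    }

    // Applies the batch column by column. A simulated failure after the first
    // write leaves a partial result behind, as a non-atomic batch can.
    private void applyBatch(Map<String, String> batch) {
        boolean firstWrite = true;
        for (Map.Entry<String, String> write : batch.entrySet()) {
            store.put(write.getKey(), write.getValue()); // "set column to value" is idempotent
            if (firstWrite && failuresToSimulate > 0) {
                failuresToSimulate--;
                throw new IllegalStateException("simulated timeout after a partial write");
            }
            firstWrite = false;
        }
    }

    // Returns true on success; false means partial writes may remain.
    boolean writeWithRetry(Map<String, String> batch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                applyBatch(batch);
                return true;
            } catch (IllegalStateException e) {
                // Safe to retry: re-applying columns already written changes nothing.
            }
        }
        return false;
    }
}
```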

Searching is complicated

Searching is not built into the core of the Cassandra architecture, and search mechanisms are

layered on top using sort orders as described earlier. Cassandra supports secondary indexes

where the system automatically builds them, with some limited functionality. When secondary

indexes do not work, users have to learn the data model and build indexes using sort orders and

slices.

Three types of complexity are associated with building search methods:

1. Building custom search methods requires programmers to understand indexing and, to a certain extent, storage details. Therefore, Cassandra needs more highly skilled developers than relational models do.

2. Custom indexes depend heavily on sort orders, which are complicated. There are two types of sort orders: first, columns are always sorted by name; second, row sort orders work only if an order-preserving partitioner (see Resources) is used.

3. Adding a new query often needs new indexes and code changes, unlike with relational models. This requires developers to analyze queries before storing the data.

Super columns and order preserving partitioners are discouraged

Cassandra super columns can be useful when modeling multi-level data, as they add one more level to the hierarchy. Anything that can be modeled with super columns, however, can also be supported through columns. Hence, super columns do not provide additional power. Also, they do not support secondary indexes. Therefore, the Cassandra developers discourage the use of super columns. Although there is no firm date for discontinuing support, it might happen in future releases.





A partitioner in Cassandra decides how to distribute (shard) data among Cassandra nodes, and

there are many implementations. If an order-preserving partitioner is used, rowIDs are stored in

a sorted order and Cassandra can do slices (searches) across rowIDs as well. This partitioner

does not distribute the data uniformly among its nodes, however, and with large datasets, some

of the nodes might be hard-pressed while others are lightly loaded. Therefore, developers also

discourage the use of order-preserving partitioners.

Healing from failure is manual

If a node in a Cassandra cluster has failed, the cluster will continue to work if you have replicas.

Full recovery, which means redistributing data and compensating for missing replicas, is a manual operation through a command-line tool called nodetool (see Resources). Also, while the manual operation happens, the system will be unavailable.

It remembers deletes

Cassandra is designed such that it continues to work without a problem even if a node goes down (or gets disconnected) and comes back later. One consequence is that data deletion becomes complicated. For example, assume a node is down, and while it is down, a data item is deleted in the other replicas. When the unavailable node comes back, it will reintroduce the deleted data item during the syncing process unless Cassandra remembers that the data item has been deleted.

Therefore, Cassandra has to remember that the data item has been deleted. In the 0.8 release, Cassandra remembered all the data even after it was deleted. This caused disk usage to keep growing for update-intensive workloads. Cassandra does not have to remember all the deleted data, just the fact that a data item has been deleted. This was fixed in later releases of Cassandra.
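Remembering a delete can be sketched with timestamped markers, often called tombstones. TombstoneSketch is hypothetical: each column version carries a timestamp, a delete is stored as a version with no value, and replicas reconcile by keeping the newest version (last write wins). The marker keeps only the column name and timestamp, not the deleted data. In Cassandra, tombstones are themselves discarded after a configurable grace period (gc_grace_seconds).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replica whose deletes are recorded as timestamped tombstones.
class TombstoneSketch {
    // One column version; value == null marks a tombstone.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    final Map<String, Cell> columns = new HashMap<>();

    void write(String name, String value, long timestamp) {
        apply(name, new Cell(value, timestamp));
    }

    // A delete is just a newer version without a value.
    void delete(String name, long timestamp) {
        apply(name, new Cell(null, timestamp));
    }

    // Last write wins: keep whichever version has the larger timestamp.
    void apply(String name, Cell incoming) {
        Cell current = columns.get(name);
        if (current == null || incoming.timestamp > current.timestamp) {
            columns.put(name, incoming);
        }
    }

    // Syncing with another replica merges every column version it holds.
    void syncFrom(TombstoneSketch other) {
        other.columns.forEach(this::apply);
    }

    // A tombstone reads as "no value".
    String read(String name) {
        Cell cell = columns.get(name);
        return cell == null ? null : cell.value;
    }
}
```

If the delete had simply removed the column instead, syncing with the replica that was down would bring the old value back, because nothing newer would be left to outvote it.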

Conclusion

This article delves into some details that are not readily apparent when you consider Cassandra.

I described the Cassandra data model, comparing it with the relational data model, and

demonstrated a typical schema design with Cassandra. A key observation is that, unlike the relational model, which breaks data into many tables, Cassandra tends to keep as much data as possible within the same row to avoid having to join that data for retrieval.

You also looked at several limitations of the Cassandra-based approach. These limitations,

however, are common to most NoSQL solutions, and are often conscious design trade-offs to

enable high scalability.





Downloads

Book rating sample code: CassandraSample.zip (42KB)
http://www.ibm.com/developerworks/apps/download/index.jsp?contentid=823612&filename=CassandraSample.zip&method=http&locale=





Resources

Learn

Read What Goes Around Comes Around (Michael Stonebraker and Joey Hellerstein, 2007) if you are interested in the history of storage technologies.

Read the Getting Started Page in the Cassandra Wiki to install Cassandra, run single-node Cassandra, and find an overview of how to configure multinode clusters.

Read the paper Cassandra - A Decentralized Structured Storage System (Avinash Lakshman and Prashant Malik, 2009) to understand the Cassandra architecture in more detail.

Read more about Google's Big Table and Amazon's Dynamo.

Read about Eric Brewer's CAP theorem (Julian Browne, January 2009).

See Finding the Right Data Solution for Your Application in the Data Storage Haystack (Srinath Perera, InfoQ, October 2011) for an overview of the NoSQL landscape and recommendations on how to choose the right NoSQL storage.

Learn more about Disk-drive performance characteristics on Wikipedia.

Read about nodetool, a simple command-line interface to these exposed operations and attributes, on the Cassandra Wiki.

Find more details about consistency levels in the Cassandra API wiki.

Read more about storage configurations.

The Open Source developerWorks zone provides a wealth of information on open source tools and using open source technologies.

Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.

Attend a free developerWorks Live! briefing to get up to speed quickly on IBM products and tools, as well as IT industry trends.

Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.

Follow developerWorks on Twitter.

Get products and technologies

Explore Cassandra on the project website.

Check out the Hector client on the project website.

Download Cassandra and find instructions on how to use it at cassandra.apache.org.

Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Check out developerWorks blogs and get involved in the developerWorks community.

Get involved in the developerWorks community. Connect with other developerWorks users

while exploring the developer-driven blogs, forums, groups, and wikis.





About the author

Srinath Perera

Srinath works as a Senior Software Architect at WSO2 Inc., where he oversees the overall WSO2 platform architecture with the CTO. He also serves as a research scientist at Lanka Software Foundation and teaches as visiting faculty at the Department of Computer Science and Engineering, University of Moratuwa. He is a co-founder of Apache Axis2, has been involved with the Apache Web Service project since 2002, and is a member of the Apache Software Foundation and of the Apache Web Service project PMC. Srinath is also a committer on the Apache open source projects Axis, Axis2, and Geronimo. Srinath received his Ph.D. and M.Sc. in Computer Sciences from Indiana University, Bloomington, USA, and his Bachelor of Science in Computer Science and Engineering from the University of Moratuwa, Sri Lanka.

Copyright IBM Corporation 2012

(www.ibm.com/legal/copytrade.shtml)

Trademarks

(www.ibm.com/developerworks/ibm/trademarks/)