os apache cassandra pdf
DESCRIPTION
apcahceTRANSCRIPT
-
5/26/2018 Os Apache Cassandra PDF
1/15
Copyright IBM Corporation 2012 Trademarks
Consider the Apache Cassandra database Page 1 of 15
Consider the Apache Cassandra database
What are the pros and cons of this NoSQL database?
Srinath Perera([email protected])
Senior Software Architect
WSO2 Inc
03 July 2012
NoSQL storage provides a flexible and scalable alternative to relational databases, andamong many such storages, Cassandra is one of the popular choices. Move beyond the well-
known details and explore the less obvious details associated with Cassandra. You'll examine
the Cassandra data model, storage schema design, architecture, and potential surprises
associated with Cassandra.
Introduction
In the database history article "What Goes Around Comes Around," (see Resources) Michal
Stonebraker describes in detail how storage techniques have evolved over time. Before arrivingat the relational model, developers tried other models such as hierarchical and directed graph.
It is worth noting that the SQL-based relational modelwhich is the de facto standard even now
has prevailed for about 30 years. Given the short history and fast pace of computer science,
this is a remarkable achievement. The relational model is so well-established that for many years,
selecting data storage for an application was an easy choice for the solution architect. The choice
was invariably a relational database.
Developments like increasing user bases of systems, mobile devices, extended online presence
of users, cloud computing, and multi-core systems have led to increasingly large-scale systems.
High-tech companies such as Google and Amazon were among first to hit those problems of
scale. They soon found out that relational databases are not adequate to support large-scale
systems.
To circumvent those challenges, Google and Amazon came up with two alternative solutions:
Big Table and Dynamo (see Resources) where they relaxed the guarantees provided by the
relational data model to achieve higher scalability. Eric Brewer's "CAP Theorem" (see Resources)
later formalized thoseobservations. It claims that for scalable systems, consistency, availability,
and partition tolerance are trade-offs where it is impossible to build systems containing all those
http://www.ibm.com/legal/copytrade.shtmlhttp://www.ibm.com/legal/copytrade.shtmlmailto:[email protected]:[email protected]://www.ibm.com/developerworks/ibm/trademarks/http://www.ibm.com/legal/copytrade.shtml -
5/26/2018 Os Apache Cassandra PDF
2/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 2 of 15
properties. Soon, based on earlier work by Google and Amazon, and understanding acquired
about scalable systems, a new class of storage systems was proposed. They were named
"NoSQL" systems. The name first meant "do not use SQL if you want to scale" and later it was
redefined to "not only SQL" to mean that there are other solutions in addition to SQL-based
solutions.
There are many NoSQL systems, and each relaxes or alters some aspect of the relational model.
It is worth noting that none of the NoSQL solutions work for all scenarios. Each does better than
relational models and scales for some subsets of the use cases. My earlier article "Finding the
Right Data Solution for Your Application in the Data Storage Haystack" discusses how to match
application requirements to NoSQL solutions (see Resources).
Apache Cassandra (see Resources) is one of the first and most widely used NoSQL solutions.
This article takes a detailed look at Cassandra and points out details and tricky points not readily
apparent when you look at Cassandra for the first time.
Apache Cassandra
Cassandra is a NoSQL Column family implementation supporting the Big Table data model using
the architectural aspects introduced by Amazon Dynamo. Some of the strong points of Cassandra
are:
Highly scalable and highly available with no single point of failure
NoSQL column family implementation
Very high write throughput and good read throughput
SQL-like query language (since 0.8) and support search through secondary indexes
Tunable consistency and support for replication Flexible schema
These positive points make it easy to recommend Cassandra, but it is crucial for a developer to
delve into the details and tricky points of Cassandra to grasp the intricacies of this program.
Cassandra stores data according to the column family data model, depicted in Figure 1.
-
5/26/2018 Os Apache Cassandra PDF
3/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 3 of 15
Figure 1. Cassandra data model
What is a Column?Columnis bit of a misnomer, and possibly the name cellwould have been easier to
understand. I will stick with columnas that is the common usage.
Cassandra data model consists of columns, rows, column families, and keyspace. Let's look at
each part in detail.
Column the most basic unit in the Cassandra data model, and each column consists of a
name, a value, and a timestamp. For this discussion, ignore the timestamp, and then you can
represent a column as a name value pair (such as author="Asimov").
Row a collection of columns labeled with a name. For example, Listing 1shows how a row
might be represented:
Listing 1. Example of a row "Second Foundation"-> {
author="Asimov",
publishedDate="..",
tag1="sci-fi", tag2="Asimov"
}
Cassandra consists of many storage nodes and stores each row within a single storage node.
Within each row, Cassandra always stores columns sorted by their column names. Using this
sort order, Cassandra supports slice queries where given a row, users can retrieve a subset of
-
5/26/2018 Os Apache Cassandra PDF
4/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 4 of 15
its columns falling within a given column name range. For example, a slice query with range
tag0 to tag9999 will get all the columns whose names fall between tag0 and tag9999.
Column family a collection of rows labeled with a name. Listing 2shows how sample data
might look:
Listing 2. Example of a column family
Books->{
"Foundation"->{author="Asimov", publishedDate=".."},
"Second Foundation"->{author="Asimov", publishedDate=".."},
}
It is often said that a column family is like a table in a relational model. As shown in the
following example, the similarities end there.
Keyspace a group of many column families together. It is only a logical grouping of column
families and provides an isolated scope for names.
Finally, super columns reside within a column family that groups several columns under a one key.As developers discourage the use of super columns, I do not discuss them here.
Cassandra versus RDBMS data models
From the above description of the Cassandra data model, data is placed in a two dimensional
(2D) space within each column family. To retrieve data in a column family, users need two keys:
row name and column name. In that sense, both the relational model and Cassandra are similar,
although there are several crucial differences.
Relational columns are homogeneous across all rows in the table. A clear vertical relationship
usually exists between data items, that is not the case with Cassandra columns. This is the
reason Cassandra stores the column name with each data item (column).
With the relational model, 2D data space is complete. Each point in the 2D space should have
at least the null value stored there. Again, this is not the case with Cassandra, and it can have
rows containing only a few items, while other rows can have millions of items.
With a relational model, the schema is predefined and cannot be changed at runtime, while
Cassandra lets users change the schema at runtime.
Cassandra always stores data such that columns are sorted based on their names. This
makes it easier to search for data through a column using slice queries, but it is harder to
search for data through a row unless you use an order-preserving partitioner.
Another crucial difference is that column names in RDMBS represent metadata aboutdata, but never data. In Cassandra, however, the names of columns can include data.
Consequently, Cassandra rows can have millions of columns, while a relational model usually
has tens of columns.
Using a well-defined immutable schema, relational models support sophisticated queries
that include JOINs, aggregations, and more. With a relational model, users can define the
data schema without worrying about queries. Cassandra does not support JOINs and most
SQL search methods. Therefore, schema has to be catered to the queries required by the
application.
-
5/26/2018 Os Apache Cassandra PDF
5/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 5 of 15
To explore the above differences, consider a book rating site where users can add books (author,
rank, price, link), comments (text, time, name), and tag them. The Application needs to support the
following operations by the users:
Adding books
Adding comments for books Adding tags for books
Listing books sorted by rank
Listing books given a tag
Listing the comments given a book ID
It is rather trivial to implement the above application with a relational model. Figure 2shows the
Entityrelationship (ER) diagram for the database design.
Figure 2. ER Model for the Book rating site
Let's see how this can be implemented using the Cassandra data model. Listing 3shows a
potential schema with Cassandra, where the first line represents the "Books" column family which
has multiple rows, each having properties of the book as columns. and denote
timestamps.
Listing 3. Cassandra schema for the book rating sample
Books[BookID->(author, rank, price, link, tag, tag ..,
cmt+= text + "-" + author) ]
Tags2BooksIndex[TagID->(=bookID1, =bookID2, ..) ]
Tags2AuthorsIndex[TagID->(=bookID1, =bookID2, ..) ]RanksIndex["RANK" -> (rank=bookID)]
Table 1is a sample data set as per the schema.
Table 1. Sample data for the book rating site
Column Family Name Sample Dataset
Books "Foundation" -> ("author"="Asimov", "rank"=9, "price"=14, "tag1"="sci-
fi", "tag2"="future", "cmt1311031405922"="best book-sanjiva",
"cmt1311031405923"="well I disagree-srinath")
-
5/26/2018 Os Apache Cassandra PDF
6/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 6 of 15
"I Robot" -> ("author"="Asimov", "rank"=7, "price"=14, "tag1"="sci-
fi" "tag2"="robots", "cmt1311031405924"="Asimov's best-srinath",
"cmt1311031405928"="I like foundation better-sanjiva")
RanksIndex "Rank" -> (9="Foundation", 7="I Robot")
Tags2BooksIndex "sci-fi" -> ("1311031405918"="Foundation", "1311031405919"="I Robot"
"future" ->
Tags2AuthorsIndex "sci-fi" -> (1311031405920="Asimov")
"future" ->
This example shows several design differences between the relational and Cassandra models.
The Cassandra model stores data about books in a single column family called "Books," and the
other three Column Families are indexes built to support queries.
Looking at the "Books" column family in detail, the model uses a row to represent each book
where a book name is the row ID. Details about the book are represented as columns stored within
the row.
Looking closely, you might notice that data items stored (like comments, and tags that have 1:M
relationship with books) are also within a single row. To do that, append the time stamp to the
column names for tags and comments. This approach stores all data within the same column. This
action avoids having to do JOINs to retrieve data. Cassandra circumvents the lack of support for
JOINs through this approach.
This provides several advantages.
You can read all data about a book through a single query reading the complete row.
You can retrieve comments and tags without a JOIN by using slice queries that have cmt0-
cmt9999 and tag0-tag9999 as starting and ending ranges.
Because Cassandra stores columns sorted by their column names, making slice queries is very
fast. It is worth noting that storing all the details about the data item in a single row and the use of
sort orders are the most crucial ideas behind the Cassandra data design. Most Cassandra data
model designs follow these ideas in some form. User can use the sort orders while storing data
and building indexes. For example, another side effect of appending time stamps to column names
is that as column names are stored in the sorted order, comments having column names post-
fixed by the timestamps are stored in the order they are created, and search results would have
the same order.
Cassandra does not support any search methods from the basic design. Although it supports
secondary indexes, they are supported using indexes that are built later, and secondary indexes
have several limitations including lack of support for range queries.
Consequently, the best results in a Cassandra data design needs users to implement searches
by building custom indexes and utilizing column and row sort orders. Other three-column families
(Tags2BooksIndex, Tags2AuthorsIndex, and RankIndex) do exactly that. Since users need to
search for books given a tag, "Tags2BooksIndex" column family builds an index by storing the tag
name as the row ID and all books tagged by that tag as columns under that row. As shown by the
-
5/26/2018 Os Apache Cassandra PDF
7/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 7 of 15
example, timestamps are added as the column keys, but that is to provide a unique column ID.
The search implementation simply reads the index by looking up the row by tag name and finding
the matches by reading all columns stored within that rowID.
Table 2discusses how each of the queries required by the application is implemented using the
above Cassandra indexes.
Table 2. Comparison of query implementations
Query description Query as SQL Cassandra implementation
List books sorted by the rank Run the query
"Select * from Books order by
rank"and then on each result do "Select
tag from Tags where bookid=?"
and "Select comment from Comments
where bookid=?"
Do a slice query on "RankIndex" column family
to receive an ordered list of books, and for
each book do a slice query on "Books" column
family to read the details about the book.
Given a tag, find the authors whose books
have the given tag.
Select distinct author
from Tags, Books whereTags.bookid=Books.bookid and tag=?
Read all columns for the given tag from
Tags2Authors using a slice query.
Given a tag, list books that have the given tag. Select bookid from Tags where
tag=?
Read all columns for the given tag from
Tags2BooksIndex using a slice query.
Given a book, list the comments for that book
in sorted order of the time when the comments
were created.
Select text, time, user from
Comments where bookid=? Order by
time
In "Books" column family, do a slice query from
the row corresponding to the given book. They
are in sorted order due to timestamps used as
the column name.
Although the above design can efficiently support queries required by the book-rating site, it can
only support queries that it is designed for and cannot support ad-hoc queries. For example, it
cannot do the following queries without building new indexes.
Select * from Books where price > 50;
Select * from Books where author="Asimov"
It is possible to change the design to support those and other queries by either building
appropriate indexes or by writing code to walk through the data. The need for custom code to
support new queries, however, is a limitation compared to relational models where adding new
queries often needs no changes to the schema.
From the 0.8 release, Cassandra supports secondary indexes where users can specify a search
by a given property, and Cassandra automatically builds indexes for searching based on that
property. That model, however, provides less flexibility. For example, secondary indexes do not
support range queries and provide no guarantees on sort orders of results.
Using Cassandra from the Java environment
Cassandra has many clients written in different languages. This article focuses on the Hector client
(see Resources), which is the most widely used Java client for Cassandra. Users can add to their
application by adding the Hector JARs to the application classpath. Listing 4shows a sample
Hector client.
-
5/26/2018 Os Apache Cassandra PDF
8/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 8 of 15
First, connect to a Cassandra cluster. Use the instructions in the Cassandra Getting Started
Page (see Resources) to set up a Cassandra node. Unless its configuration has been changed, it
typically runs on port 9160. Next, define a keyspace. This can be done either through the client or
through the conf/cassandra.yaml configuration file.
Listing 4. Sample Hector client code for Cassandra
Cluster cluster = HFactory.createCluster('TestCluster',
new CassandraHostConfigurator("localhost:9160"));
//define a keyspace
Keyspace keyspace = HFactory.createKeyspace("BooksRating", cluster);
//Now let's add a new column.
String rowID = "Foundation";
String columnFamily = "Books";
Mutator
mutator = HFactory.createMutator(keyspace, user);
mutator.insert(rowID, columnFamily,
HFactory.createStringColumn("author", "Asimov"));
//Now let's read the column back
ColumnQuery
columnQuery = HFactory.createStringColumnQuery(keyspace);
columnQuery.setColumnFamily(columnFamily).setKey(wso2).setName("address");
QueryResult
-
5/26/2018 Os Apache Cassandra PDF
9/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 9 of 15
Figure 3. Cassandra cluster
Cassandra uses consistent hashing to assign data items to nodes. In simple terms, Cassandra
uses a hash algorithm to calculate the hash for keys of each data item stored in Cassandra (for
example, column name, row ID). The hash range or all possible hash values (also known as
keyspace) is divided among the nodes in the Cassandra cluster. Then Cassandra assigns each
data item to the node, and that node is responsible for storing and managing the data item. The
paper "Cassandra - A Decentralized Structured Storage System" (see Resources) provides adetailed discussion about Cassandra architecture.
The resulting architecture provides the following properties:
Cassandra distributes data among its nodes transparently to the users. Any node can accept
any request (read, write, or delete) and route it to the correct node even if the data is not
stored in that node.
Users can define how many replicas are needed, and Cassandra handles replica creation and
management transparently.
Tunable consistency: When storing and reading data, users can choose the expected
consistency level per each operation. For example, if the "quorum" consistency level is used
while writing or reading, data is written and read from more than half of the nodes in the
cluster. Support for tunable consistency enables users to choose the consistency level best
suited to the use case.
Cassandra provides very fast writes, and they are actually faster than reads where it can
transfer data about 80-360MB/sec per node. It achieves this using two techniques.
Cassandra keeps most of the data within memory at the responsible node, and any
updates are done in the memory and written to the persistent storage (file system) in
a lazy fashion. To avoid losing data, however, Cassandra writes all transactions to a
-
5/26/2018 Os Apache Cassandra PDF
10/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 10 of 15
commit log in the disk. Unlike updating data items in the disk, writes to commit logs are
append-only and, therefore, avoid rotational delay while writing to the disk. For more
information on disk-drive performance characteristics, see Resources.
Unless writes have requested full consistency, Cassandra writes data to enough nodes
without resolving any data inconsistencies where it resolves inconsistencies only at the
first read. This process is called "read repair."
The resulting architecture is highly scalable. You can build a Cassandra cluster that has 10s of
100s of nodes that is capable of handling terabytes to petabytes of data. There is a trade-off with
distributed systems, and scale almost never comes for free. As mentioned before, a user might
face many surprises moving from a relational database to Cassandra. The next section discusses
some of them.
Possible surprises with Cassandra
Be aware of these differences when you move from a relational database to Cassandra.
No transactions, no JOINs
It is well known that Cassandra does not support ACID transactions. Although it has a batch
operation, there is no guarantee that sub-operations within the batch operation are carried out in
an atomic fashion. This will be discussed more under Failed operations may leave changes.
Furthermore, Cassandra does not support JOINs. If a user needs to join two column families, you
must retrieve and join data programmatically. This is often expensive and time-consuming for large
data sets. Cassandra circumvents this limitation by storing as much data as possible in the same
row, as described in the example.
No foreign keys and keys are immutable
Cassandra does not support foreign keys, so it is not possible for Cassandra to manage the data
consistency on a user's behalf. Therefore, the application should handle the data consistency.
Furthermore, users cannot change the keys. It is recommended to use surrogate keys (generated
keys instead of the key, and managing the key as a property) with the use cases that need
changes to the keys.
Keys have to be uniqueEach key, for example row keys and column keys, has to be unique in its scope, and if the same
key has been used twice it will overwrite the data.
There are two solutions to this problem. First, you can use a composite key. In other words, create
the key by combining several fields together, and this solution is often used with row keys. The
second solution is when there is a danger of the same key occurring twice, postfix the key with a
random value or a timestamp. This often happens with indexes when an index stores a value as
the column name. For example, in the book rating application the rank was used as the column
-
5/26/2018 Os Apache Cassandra PDF
11/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 11 of 15
name. To avoid having two entries having the same column name because both have the same
rank, the timestamp is added to the rank as a postfix.
Failed operations may leave changes
As explained before, Cassandra does not support atomic operations. Instead, it supports
idempotent operations. Idempotent operations leave the system in the same state regardless of
how many times the operations are carried out. All Cassandra operations are idempotent. If an
operation fails, you can retry it without any problem. This provides a mechanism to recover from
transient failures.
Also Cassandra supports batch operations, but they do not have any atomicity guarantees either.
Since the operations are idempotent, the client can keep retrying until all operations of the batch
are successful.
Idempotent operations are not equal to atomic operations. If an operation is successful, all is well
and the outcome is identical to atomic operations. If an operation fails, the client can retry, and ifit is successful, again all is well. If, however, the operations fails even after retrying, unlike with
atomic operations, it might leave side effects. Unfortunately, with Cassandra, this is a complexity
that programmers have to deal with themselves.
Searching is complicated
Searching is not built into the core of the Cassandra architecture, and search mechanisms are
layered on top using sort orders as described earlier. Cassandra supports secondary indexes
where the system automatically builds them, with some limited functionality. When secondary
indexes do not work, users have to learn the data model and build indexes using sort orders and
slices.
Three types of complexities area associated with building search methods:
1. Building custom search methods require programmers to understand indexing and details
about storage to a certain extent. Therefore, Cassandra needs higher skilled developers than
with relational models.
2. Custom indexes heavily depend on sorted orders, and they are complicated. There are two
types of sort orders: first, the columns are always sorted by name, and second, the row sort
orders work only if an order-preserving partitioner (see Resources) is used.
3. Adding a new query often needs new indexes and code changes unlike with relational
models. This requires developers to analyze queries before storing the data.
Super columns and order preserving partitioners are discouraged
Cassandra super columns can be useful when modeling multi-level data, where it adds one more
level to the hierarchy. Anything that can be modeled with super columns, however, can also be
supported through columns. Hence, super columns do not provide additional power. Also, they
do not support secondary indexes. Therefore, the Cassandra developers discourage the use of
super columns. Although there is no firm date for discontinuing support, it might happen in future
releases.
-
5/26/2018 Os Apache Cassandra PDF
12/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 12 of 15
A partitioner in Cassandra decides how to distribute (shard) data among Cassandra nodes, and
there are many implementations. If an order-preserving partitioner is used, rowIDs are stored in
a sorted order and Cassandra can do slices (searches) across rowIDs as well. This partitioner
does not distribute the data uniformly among its nodes, however, and with large datasets, some
of the nodes might be hard-pressed while others are lightly loaded. Therefore, developers also
discourage the use of order-preserving partitioners.
Healing from failure is manual
If a node in a Cassandra cluster has failed, the cluster will continue to work if you have replicas.
Full recovery, which is to redistribute data and compensate for missing replicas, is a manual
operation through a command line tool called node tool(see Resources). Also, while the manual
operation happens, the system will be unavailable.
It remembers deletes
Cassandra is designed such that it continues to work without a problem even if a node goes down
(or gets disconnected) and comes back later. A consequence is this complicates data deletions.
For example, assume a node is down. While down, a data item has been deleted in replicas.
When the unavailable node comes back on, it will reintroduce the deleted data item at the syncing
process unless Cassandra remembers that data item has been deleted.
Therefore, Cassandra has to remember that the data item has been deleted. In the 0.8 release,
Cassandra was remembering all the data even if it is deleted. This caused disk usage to keep
growing for update-intensive operations. Cassandra does not have to remember all the deleted
data, but just the fact that a data item has been deleted. This fix was done in later releases of
Cassandra.
Conclusion
This article delves into some details that are not readily apparent when you consider Cassandra.
I described the Cassandra data model, comparing it with the relational data model, and
demonstrated a typical schema design with Cassandra. A key observation is that unlike the
relational model that breaks data into many tables, Cassandra tends to keep as much as data as
possible within the same row to avoiding having to join that data for retrieval.
You also looked at several limitations of the Cassandra-based approach. These limitations,
however, are common to most NoSQL solutions, and are often conscious design trade-offs to
enable high scalability.
-
5/26/2018 Os Apache Cassandra PDF
13/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 13 of 15
Downloads
Description Name Size
Book rating sample code CassandraSample.zip 42KB
http://www.ibm.com/developerworks/apps/download/index.jsp?contentid=823612&filename=CassandraSample.zip&method=http&locale= -
5/26/2018 Os Apache Cassandra PDF
14/15
developerWorks ibm.com/developerWorks/
Consider the Apache Cassandra database Page 14 of 15
Resources
Learn
Read What goes around comes around(Michael Stonebraker and Joey Hellerstein, 2007), if
you are interested about the history of storage technologies. Read Getting Started Pagein the Cassandra Wiki to install Cassandra, run single node
Cassandra, and find an overview of how to configure multinode clusters.
Read the paper, Cassandra - A Decentralized Structured Storage System(Avinash
Lakshman and Prashant Malik, 2009) to understand the Cassandra architecture in more
detail.
Read more about Google's Big Tableand Amazon's Dynamo.
Read about Eric Brewer's CAP theorem(Julian Browne, January 2009).
See Finding the Right Data Solution for Your Application in the Data Storage Haystack
(Srinath Perera, InfoQ, October 2011) for an overview of NoSQL landscape and
recommendations onhow to choose the right NoSQL storage. Learn more about Disk-drive performance characteristicson Wikipedia.
Read about node tool, a simple command line interface to these exposed operations and
attributes on the Cassandra Wiki.
Find more details about consistency levels in the Casandra API wiki.
Read more about storage configurations.
The Open Source developerWorks zoneprovides a wealth of information on open source
tools and using open source technologies.
Stay current with developerWorks technical events and webcastsfocused on a variety of IBM
products and ITindustry topics.
Attend a free developerWorks Live! briefingto get up-to-speed quickly on IBM products and
tools, as well as IT industry trends.
Watch developerWorks on-demand demosranging from product installation and setup demos
for beginners, to advanced functionality for experienced developers.
Follow developerWorks on Twitter.
Get products and technologies
Explore Cassandraon the project website.
Check out the Hector clienton the project website.
Download Cassandraand find instructions on how to use it at cassandra.apache.org.
Evaluate IBM productsin the way that suits you best: Download a product trial, try a productonline, use a product in a cloud environment, or spend a few hours in the SOA Sandbox
learning how to implement ServiceOrientedArchitecture efficiently.
Discuss
Check out developerWorks blogsand get involved in the developerWorks community.
Get involved in the developerWorks community. Connect with other developerWorks users
while exploring the developer-driven blogs, forums, groups, and wikis.
http://www.ibm.com/developerworks/downloads/index.htmlhttp://cassandra.apache.org/https://github.com/rantav/hectorhttp://cassandra.apache.org/http://cassandra.apache.org/http://www.twitter.com/developerworks/http://www.twitter.com/developerworks/http://www.ibm.com/developerworks/offers/lp/demos/index.htmlhttp://www.ibm.com/developerworks/offers/techbriefings/index.htmlhttp://www.ibm.com/developerworks/offers/techbriefings/events.htmlhttp://www.ibm.com/developerworks/opensource/http://wiki.apache.org/cassandra/StorageConfigurationhttp://wiki.apache.org/cassandra/StorageConfigurationhttp://en.wikipedia.org/wiki/Disk-drive_performance_characteristicshttp://www.infoq.com/articles/perera-data-storage-haystackhttp://www.infoq.com/articles/perera-data-storage-haystackhttp://www.julianbrowne.com/article/viewer/brewers-cap-theoremhttp://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdfhttp://db.cs.pitt.edu/courses/cs3551/11-1/handouts/10-1.1.1.115.1568.pdfhttp://www.ibm.com/developerworks/communityhttp://www.ibm.com/developerworks/communityhttp://www.ibm.com/developerworks/blogs/http://www.ibm.com/developerworks/downloads/soasandbox/index.htmlhttp://www.ibm.com/developerworks/downloads/index.htmlhttp://cassandra.apache.org/https://github.com/rantav/hectorhttp://cassandra.apache.org/http://www.twitter.com/developerworks/http://www.ibm.com/developerworks/offers/lp/demos/index.htmlhttp://www.ibm.com/developerworks/offers/techbriefings/index.htmlhttp://www.ibm.com/developerworks/offers/techbriefings/events.htmlhttp://www.ibm.com/developerworks/opensource/http://wiki.apache.org/cassandra/StorageConfigurationhttp://wiki.apache.org/cassandra/APIhttp://wiki.apache.org/cassandra/NodeToolhttp://en.wikipedia.org/wiki/Disk-drive_performance_characteristicshttp://www.infoq.com/articles/perera-data-storage-haystackhttp://www.julianbrowne.com/article/viewer/brewers-cap-theoremhttp://db.cs.pitt.edu/courses/cs3551/11-1/handouts/10-1.1.1.115.1568.pdfhttp://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdfhttp://www.cs.cornell.edu/Projects/ladis2009/papers/Lakshman-ladis2009.PDFhttp://wiki.apache.org/cassandra/GettingStartedhttp://idke.ruc.edu.cn/seminars/phd/2007/11.07/What%20Goes%20Around%20Comes%20Around.pdf -
5/26/2018 Os Apache Cassandra PDF
15/15
ibm.com/developerWorks/ developerWorks
Consider the Apache Cassandra database Page 15 of 15
About the author
Srinath Perera
Srinath works as a Senior Software architect at WSO2 Inc., where he overlooks the
overall WSO2 platform architecture with the CTO. He also serves as a research
scientist at Lanka Software Foundation and teaches as a visiting faculty at
Department of Computer Science and Engineering, University of Moratuwa. He
is a co-founder of Apache Axis2, and he has been involved with the Apache Web
Service project since 2002 and is a member of Apache Software foundation, PMC
and the Apache Web Service project. Srinath is also a committer of Apache open
source projects Axis, Axis2, and Geronimo. Srinath received his Ph.D. and M.Sc.
in Computer Sciences from Indiana University, Bloomington, USA and received
his Bachelor of Science in Computer Science and Engineering from University of
Moratuwa, Sri Lanka.
Copyright IBM Corporation 2012
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)
http://www.ibm.com/developerworks/ibm/trademarks/http://www.ibm.com/legal/copytrade.shtml