    Copyright IBM Corporation 2012 Trademarks

    Consider the Apache Cassandra database

    What are the pros and cons of this NoSQL database?

    Srinath Perera

    Senior Software Architect

    WSO2 Inc

    03 July 2012

    NoSQL storage provides a flexible and scalable alternative to relational databases, and among many such stores, Cassandra is one of the popular choices. Move beyond the well-known details and explore the less obvious details associated with Cassandra. You'll examine the Cassandra data model, storage schema design, architecture, and potential surprises associated with Cassandra.

    Introduction

    In the database history article "What Goes Around Comes Around" (see Resources), Michael

    Stonebraker describes in detail how storage techniques have evolved over time. Before arriving at the relational model, developers tried other models such as hierarchical and directed graph.

    It is worth noting that the SQL-based relational model, which is the de facto standard even now,

    has prevailed for about 30 years. Given the short history and fast pace of computer science,

    this is a remarkable achievement. The relational model is so well-established that for many years,

    selecting data storage for an application was an easy choice for the solution architect. The choice

    was invariably a relational database.

    Developments like increasing user bases of systems, mobile devices, extended online presence

    of users, cloud computing, and multi-core systems have led to increasingly large-scale systems.

    High-tech companies such as Google and Amazon were among the first to hit those problems of

    scale. They soon found out that relational databases are not adequate to support large-scale

    systems.

    To circumvent those challenges, Google and Amazon came up with two alternative solutions:

    Big Table and Dynamo (see Resources) where they relaxed the guarantees provided by the

    relational data model to achieve higher scalability. Eric Brewer's "CAP Theorem" (see Resources)

    later formalized those observations. It claims that for scalable systems, consistency, availability,

    and partition tolerance are trade-offs where it is impossible to build systems containing all those

    properties. Soon, based on earlier work by Google and Amazon, and understanding acquired

    about scalable systems, a new class of storage systems was proposed. They were named

    "NoSQL" systems. The name first meant "do not use SQL if you want to scale" and later it was

    redefined to "not only SQL" to mean that there are other solutions in addition to SQL-based

    solutions.

    There are many NoSQL systems, and each relaxes or alters some aspect of the relational model.

    It is worth noting that none of the NoSQL solutions work for all scenarios. Each does better than

    relational models and scales for some subsets of the use cases. My earlier article "Finding the

    Right Data Solution for Your Application in the Data Storage Haystack" discusses how to match

    application requirements to NoSQL solutions (see Resources).

    Apache Cassandra (see Resources) is one of the first and most widely used NoSQL solutions.

    This article takes a detailed look at Cassandra and points out details and tricky points not readily

    apparent when you look at Cassandra for the first time.

    Apache Cassandra

    Cassandra is a NoSQL column family implementation supporting the Big Table data model using

    the architectural aspects introduced by Amazon Dynamo. Some of the strong points of Cassandra

    are:

    Highly scalable and highly available with no single point of failure

    NoSQL column family implementation

    Very high write throughput and good read throughput

    SQL-like query language (since 0.8) and support for search through secondary indexes

    Tunable consistency and support for replication

    Flexible schema

    These positive points make it easy to recommend Cassandra, but it is crucial for a developer to

    delve into the details and tricky points of Cassandra to grasp the intricacies of this program.

    Cassandra stores data according to the column family data model, depicted in Figure 1.

    Figure 1. Cassandra data model

    What is a Column?

    "Column" is a bit of a misnomer, and possibly the name "cell" would have been easier to

    understand. I will stick with "column" as that is the common usage.

    The Cassandra data model consists of columns, rows, column families, and keyspaces. Let's look at

    each part in detail.

    Column: the most basic unit in the Cassandra data model. Each column consists of a

    name, a value, and a timestamp. For this discussion, ignore the timestamp, and then you can

    represent a column as a name-value pair (such as author="Asimov").

    Row: a collection of columns labeled with a name. For example, Listing 1 shows how a row

    might be represented:

    Listing 1. Example of a row

    "Second Foundation"-> {

    author="Asimov",

    publishedDate="..",

    tag1="sci-fi", tag2="Asimov"

    }

    Cassandra consists of many storage nodes and stores each row within a single storage node.

    Within each row, Cassandra always stores columns sorted by their column names. Using this

    sort order, Cassandra supports slice queries where given a row, users can retrieve a subset of

    its columns falling within a given column name range. For example, a slice query with range

    tag0 to tag9999 will get all the columns whose names fall between tag0 and tag9999.

    Column family: a collection of rows labeled with a name. Listing 2 shows how sample data

    might look:

    Listing 2. Example of a column family

    Books->{

    "Foundation"->{author="Asimov", publishedDate=".."},

    "Second Foundation"->{author="Asimov", publishedDate=".."},

    }

    It is often said that a column family is like a table in a relational model. As shown in the

    following example, the similarities end there.

    Keyspace: a group of many column families together. It is only a logical grouping of column

    families and provides an isolated scope for names.

    Finally, super columns reside within a column family and group several columns under one key. As the Cassandra developers discourage the use of super columns, I do not discuss them here.
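    The sorted-columns behavior behind slice queries can be illustrated without a running cluster. The following is only a minimal in-memory sketch, not Cassandra or Hector code: the class RowSliceSketch and its methods are hypothetical names, and a java.util.TreeMap stands in for a row whose columns are kept sorted by name.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of a single row: columns are kept sorted by column name. */
final class RowSliceSketch {
    // Column name -> value; TreeMap keeps the names in lexicographic order.
    private final TreeMap<String, String> columns = new TreeMap<>();

    void put(String name, String value) {
        columns.put(name, value);
    }

    /** Returns the columns whose names fall within [start, end], in name order. */
    SortedMap<String, String> slice(String start, String end) {
        return columns.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        RowSliceSketch row = new RowSliceSketch();
        row.put("author", "Asimov");
        row.put("publishedDate", "..");
        row.put("tag1", "sci-fi");
        row.put("tag2", "Asimov");
        // Only the tag columns fall inside the requested name range.
        System.out.println(row.slice("tag0", "tag9999")); // {tag1=sci-fi, tag2=Asimov}
    }
}
```

    Because the names are compared as strings, a range such as tag0 to tag9999 selects every column whose name sorts between those two bounds, which is why a common prefix is a convenient way to group related columns.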

    Cassandra versus RDBMS data models

    From the above description of the Cassandra data model, data is placed in a two dimensional

    (2D) space within each column family. To retrieve data in a column family, users need two keys:

    row name and column name. In that sense, both the relational model and Cassandra are similar,

    although there are several crucial differences.

    Relational columns are homogeneous across all rows in the table. A clear vertical relationship

    usually exists between data items; that is not the case with Cassandra columns. This is the

    reason Cassandra stores the column name with each data item (column).

    With the relational model, 2D data space is complete. Each point in the 2D space should have

    at least the null value stored there. Again, this is not the case with Cassandra, and it can have

    rows containing only a few items, while other rows can have millions of items.

    With a relational model, the schema is predefined and cannot be changed at runtime, while

    Cassandra lets users change the schema at runtime.

    Cassandra always stores data such that columns are sorted based on their names. This

    makes it easier to search for data through a column using slice queries, but it is harder to

    search for data through a row unless you use an order-preserving partitioner.

    Another crucial difference is that column names in an RDBMS represent metadata about data, but never data. In Cassandra, however, the names of columns can include data.

    Consequently, Cassandra rows can have millions of columns, while a relational model usually

    has tens of columns.

    Using a well-defined immutable schema, relational models support sophisticated queries

    that include JOINs, aggregations, and more. With a relational model, users can define the

    data schema without worrying about queries. Cassandra does not support JOINs and most

    SQL search methods. Therefore, the schema has to be tailored to the queries required by the

    application.

    To explore the above differences, consider a book rating site where users can add books (author,

    rank, price, link), comments (text, time, name), and tag them. The application needs to support the

    following operations by the users:

    Adding books

    Adding comments for books

    Adding tags for books

    Listing books sorted by rank

    Listing books given a tag

    Listing the comments given a book ID

    It is rather trivial to implement the above application with a relational model. Figure 2 shows the

    entity-relationship (ER) diagram for the database design.

    Figure 2. ER Model for the Book rating site

    Let's see how this can be implemented using the Cassandra data model. Listing 3 shows a

    potential schema with Cassandra, where the first line represents the "Books" column family which

    has multiple rows, each having properties of the book as columns. <TS1> and <TS2> denote

    timestamps.

    Listing 3. Cassandra schema for the book rating sample

    Books[BookID->(author, rank, price, link, tag<TS1>, tag<TS2> ..,

    cmt+<TS1>= text + "-" + author) ]

    Tags2BooksIndex[TagID->(<TS1>=bookID1, <TS2>=bookID2, ..) ]

    Tags2AuthorsIndex[TagID->(<TS1>=bookID1, <TS2>=bookID2, ..) ]

    RanksIndex["RANK" -> (rank=bookID)]

    Table 1 is a sample data set as per the schema.

    Table 1. Sample data for the book rating site

    Column Family Name    Sample Dataset

    Books                 "Foundation" -> ("author"="Asimov", "rank"=9, "price"=14, "tag1"="sci-fi",
                          "tag2"="future", "cmt1311031405922"="best book-sanjiva",
                          "cmt1311031405923"="well I disagree-srinath")
                          "I Robot" -> ("author"="Asimov", "rank"=7, "price"=14, "tag1"="sci-fi",
                          "tag2"="robots", "cmt1311031405924"="Asimov's best-srinath",
                          "cmt1311031405928"="I like foundation better-sanjiva")

    RanksIndex            "Rank" -> (9="Foundation", 7="I Robot")

    Tags2BooksIndex       "sci-fi" -> ("1311031405918"="Foundation", "1311031405919"="I Robot")
                          "future" ->

    Tags2AuthorsIndex     "sci-fi" -> (1311031405920="Asimov")
                          "future" ->

    This example shows several design differences between the relational and Cassandra models.

    The Cassandra model stores data about books in a single column family called "Books," and the

    other three Column Families are indexes built to support queries.

    Looking at the "Books" column family in detail, the model uses a row to represent each book

    where a book name is the row ID. Details about the book are represented as columns stored within

    the row.

    Looking closely, you might notice that data items that have a 1:M relationship with books (like

    comments and tags) are also stored within a single row. To do that, append the timestamp to the

    column names for tags and comments. This approach stores all data within the same row, which

    avoids having to do JOINs to retrieve data. Cassandra circumvents the lack of support for

    JOINs through this approach.

    This provides several advantages.

    You can read all data about a book through a single query reading the complete row.

    You can retrieve comments and tags without a JOIN by using slice queries that have cmt0-

    cmt9999 and tag0-tag9999 as starting and ending ranges.

    Because Cassandra stores columns sorted by their column names, making slice queries is very

    fast. It is worth noting that storing all the details about the data item in a single row and the use of

    sort orders are the most crucial ideas behind the Cassandra data design. Most Cassandra data

    model designs follow these ideas in some form. Users can use the sort orders while storing data

    and building indexes. For example, another side effect of appending timestamps to column names

    is that, because column names are stored in sorted order, comments whose column names are postfixed

    with timestamps are stored in the order they were created, and search results have

    the same order.

    Cassandra does not support any search methods from the basic design. Although it supports

    secondary indexes, they are supported using indexes that are built later, and secondary indexes

    have several limitations including lack of support for range queries.

    Consequently, getting the best results from a Cassandra data design requires users to implement searches

    by building custom indexes and utilizing column and row sort orders. The other three column families

    (Tags2BooksIndex, Tags2AuthorsIndex, and RanksIndex) do exactly that. Since users need to

    search for books given a tag, "Tags2BooksIndex" column family builds an index by storing the tag

    name as the row ID and all books tagged by that tag as columns under that row. As shown by the

    example, timestamps are added as the column keys, but that is to provide a unique column ID.

    The search implementation simply reads the index by looking up the row by tag name and finding

    the matches by reading all columns stored within that rowID.
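    The index lookup described above can be sketched in plain Java. This is a simplified in-memory illustration under stated assumptions, not Hector code: TagIndexSketch, addTag, and booksForTag are hypothetical names, and nested maps stand in for the Tags2BooksIndex column family.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the Tags2BooksIndex column family: tag -> (timestamp -> book). */
final class TagIndexSketch {
    // Row ID (tag name) -> columns sorted by column name (a timestamp string).
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    /** Adds an index entry; the timestamp only serves as a unique column name. */
    void addTag(String tag, long timestamp, String bookId) {
        rows.computeIfAbsent(tag, t -> new TreeMap<>())
            .put(Long.toString(timestamp), bookId);
    }

    /** Searching means looking up the row by tag name and reading all of its columns. */
    List<String> booksForTag(String tag) {
        TreeMap<String, String> columns = rows.get(tag);
        if (columns == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(columns.values());
    }

    public static void main(String[] args) {
        TagIndexSketch index = new TagIndexSketch();
        index.addTag("sci-fi", 1311031405918L, "Foundation");
        index.addTag("sci-fi", 1311031405919L, "I Robot");
        System.out.println(index.booksForTag("sci-fi")); // [Foundation, I Robot]
    }
}
```

    The application has to keep such an index up to date itself: every time a tag is added to a book, it writes to both the Books row and the index row.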

    Table 2 discusses how each of the queries required by the application is implemented using the

    above Cassandra indexes.

    Table 2. Comparison of query implementations

    Query description: List books sorted by the rank.
    Query as SQL: Run the query "Select * from Books order by rank" and then on each result do "Select tag from Tags where bookid=?" and "Select comment from Comments where bookid=?"
    Cassandra implementation: Do a slice query on the "RanksIndex" column family to receive an ordered list of books, and for each book do a slice query on the "Books" column family to read the details about the book.

    Query description: Given a tag, find the authors whose books have the given tag.
    Query as SQL: Select distinct author from Tags, Books where Tags.bookid=Books.bookid and tag=?
    Cassandra implementation: Read all columns for the given tag from Tags2AuthorsIndex using a slice query.

    Query description: Given a tag, list books that have the given tag.
    Query as SQL: Select bookid from Tags where tag=?
    Cassandra implementation: Read all columns for the given tag from Tags2BooksIndex using a slice query.

    Query description: Given a book, list the comments for that book in sorted order of the time when the comments were created.
    Query as SQL: Select text, time, user from Comments where bookid=? Order by time
    Cassandra implementation: In the "Books" column family, do a slice query from the row corresponding to the given book. The comments are in sorted order because timestamps are used in the column names.

    Although the above design can efficiently support queries required by the book-rating site, it can

    only support queries that it is designed for and cannot support ad-hoc queries. For example, it

    cannot do the following queries without building new indexes.

    Select * from Books where price > 50;

    Select * from Books where author="Asimov"

    It is possible to change the design to support those and other queries by either building

    appropriate indexes or by writing code to walk through the data. The need for custom code to

    support new queries, however, is a limitation compared to relational models where adding new

    queries often needs no changes to the schema.

    From the 0.8 release, Cassandra supports secondary indexes where users can specify a search

    by a given property, and Cassandra automatically builds indexes for searching based on that

    property. That model, however, provides less flexibility. For example, secondary indexes do not

    support range queries and provide no guarantees on sort orders of results.

    Using Cassandra from the Java environment

    Cassandra has many clients written in different languages. This article focuses on the Hector client

    (see Resources), which is the most widely used Java client for Cassandra. Users can add Hector to their

    application by adding the Hector JARs to the application classpath. Listing 4 shows a sample

    Hector client.

    First, connect to a Cassandra cluster. Use the instructions in the Cassandra Getting Started

    Page (see Resources) to set up a Cassandra node. Unless its configuration has been changed, it

    typically runs on port 9160. Next, define a keyspace. This can be done either through the client or

    through the conf/cassandra.yaml configuration file.

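    Listing 4. Sample Hector client code for Cassandra

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.ColumnQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    //connect to the cluster
    Cluster cluster = HFactory.createCluster("TestCluster",
        new CassandraHostConfigurator("localhost:9160"));

    //define a keyspace
    Keyspace keyspace = HFactory.createKeyspace("BooksRating", cluster);

    //Now let's add a new column.
    String rowID = "Foundation";
    String columnFamily = "Books";
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    mutator.insert(rowID, columnFamily,
        HFactory.createStringColumn("author", "Asimov"));

    //Now let's read the column back
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily(columnFamily).setKey(rowID).setName("author");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();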

    Figure 3. Cassandra cluster

    Cassandra uses consistent hashing to assign data items to nodes. In simple terms, Cassandra

    uses a hash algorithm to calculate the hash for keys of each data item stored in Cassandra (for

    example, column name, row ID). The hash range or all possible hash values (also known as

    keyspace) is divided among the nodes in the Cassandra cluster. Then Cassandra assigns each

    data item to the node, and that node is responsible for storing and managing the data item. The

    paper "Cassandra - A Decentralized Structured Storage System" (see Resources) provides a detailed discussion about Cassandra architecture.
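    The idea of dividing a hash range among nodes can be sketched as a small ring. This is a minimal illustration under stated assumptions rather than Cassandra's implementation: HashRingSketch and its methods are hypothetical, the ring has only 100 positions, and String.hashCode() is a stand-in for the hash function a real partitioner would use.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy consistent-hash ring: each node owns the hash range that ends at its token. */
final class HashRingSketch {
    private static final int RING_SIZE = 100; // assumed tiny ring, for readability

    // Token (position on the ring) -> node name, sorted by token.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node, int token) {
        ring.put(token, node);
    }

    /** Stand-in hash function mapping a row ID to a ring position. */
    static int hash(String rowId) {
        return Math.floorMod(rowId.hashCode(), RING_SIZE);
    }

    /** Owner = first node whose token is >= the position, wrapping to the first node. */
    String nodeForHash(int position) {
        Map.Entry<Integer, String> owner = ring.ceilingEntry(position);
        // Assumes at least one node has been added to the ring.
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    String nodeFor(String rowId) {
        return nodeForHash(hash(rowId));
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        ring.addNode("node-A", 25);
        ring.addNode("node-B", 50);
        ring.addNode("node-C", 75);
        System.out.println(ring.nodeForHash(60)); // node-C
        System.out.println(ring.nodeForHash(90)); // node-A (wraps around the ring)
        System.out.println(ring.nodeFor("Foundation"));
    }
}
```

    With this layout, adding or removing one node changes the ownership of only the range next to its token, which is the property that makes consistent hashing attractive for clusters that grow over time.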

    The resulting architecture provides the following properties:

    Cassandra distributes data among its nodes transparently to the users. Any node can accept

    any request (read, write, or delete) and route it to the correct node even if the data is not

    stored in that node.

    Users can define how many replicas are needed, and Cassandra handles replica creation and

    management transparently.

    Tunable consistency: When storing and reading data, users can choose the expected

    consistency level for each operation. For example, if the "quorum" consistency level is used

    while writing or reading, data is written to and read from more than half of the replicas of

    that data item. Support for tunable consistency enables users to choose the consistency level best

    suited to the use case.

    Cassandra provides very fast writes, which are actually faster than reads; it can

    transfer data at about 80-360MB/sec per node. It achieves this using two techniques.

    Cassandra keeps most of the data within memory at the responsible node, and any

    updates are done in the memory and written to the persistent storage (file system) in

    a lazy fashion. To avoid losing data, however, Cassandra writes all transactions to a

    commit log on the disk. Unlike updating data items on the disk, writes to commit logs are

    append-only and, therefore, avoid rotational delay while writing to the disk. For more

    information on disk-drive performance characteristics, see Resources.

    Unless writes have requested full consistency, Cassandra writes data to enough nodes

    without resolving any data inconsistencies; it resolves inconsistencies only at the

    first read. This process is called "read repair."
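    The arithmetic behind the "quorum" consistency level can be made concrete with a short sketch. It assumes the usual definition of a quorum as a strict majority of the replicas of a data item; QuorumSketch and its method names are hypothetical and not part of any Cassandra API.

```java
/** Arithmetic behind tunable consistency, for a given replication factor. */
final class QuorumSketch {
    /** A quorum is a strict majority of the replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** A read is sure to touch a replica holding the latest write when the sets overlap. */
    static boolean overlaps(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int replicas = 3;
        int q = quorum(replicas);
        System.out.println(q);                        // 2
        System.out.println(overlaps(q, q, replicas)); // true: quorum writes + quorum reads
        System.out.println(overlaps(1, 1, replicas)); // false: a read may miss the latest write
    }
}
```

    Lower consistency levels need fewer acknowledgements and so respond faster, at the cost of possibly reading stale data until the replicas converge.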

    The resulting architecture is highly scalable. You can build a Cassandra cluster that has tens or

    hundreds of nodes and is capable of handling terabytes to petabytes of data. There is a trade-off with

    distributed systems, and scale almost never comes for free. As mentioned before, a user might

    face many surprises moving from a relational database to Cassandra. The next section discusses

    some of them.

    Possible surprises with Cassandra

    Be aware of these differences when you move from a relational database to Cassandra.

    No transactions, no JOINs

    It is well known that Cassandra does not support ACID transactions. Although it has a batch

    operation, there is no guarantee that sub-operations within the batch operation are carried out in

    an atomic fashion. This will be discussed more under Failed operations may leave changes.

    Furthermore, Cassandra does not support JOINs. If you need to join two column families, you

    must retrieve and join data programmatically. This is often expensive and time-consuming for large

    data sets. Cassandra circumvents this limitation by storing as much data as possible in the same

    row, as described in the example.

    No foreign keys and keys are immutable

    Cassandra does not support foreign keys, so it is not possible for Cassandra to manage the data

    consistency on a user's behalf. Therefore, the application should handle the data consistency.

    Furthermore, users cannot change the keys. It is recommended to use surrogate keys (generated

    keys instead of the key, and managing the key as a property) with the use cases that need

    changes to the keys.

    Keys have to be unique

    Each key, for example row keys and column keys, has to be unique in its scope. If the same

    key is used twice, the second write overwrites the existing data.

    There are two solutions to this problem. First, you can use a composite key. In other words, create

    the key by combining several fields together; this solution is often used with row keys. Second,

    when there is a danger of the same key occurring twice, postfix the key with a

    random value or a timestamp. This often happens with indexes when an index stores a value as

    the column name. For example, in the book rating application the rank was used as the column

    name. To avoid having two entries having the same column name because both have the same

    rank, the timestamp is added to the rank as a postfix.

    Failed operations may leave changes

    As explained before, Cassandra does not support atomic operations. Instead, it supports

    idempotent operations. Idempotent operations leave the system in the same state regardless of

    how many times the operations are carried out. All Cassandra operations are idempotent. If an

    operation fails, you can retry it without any problem. This provides a mechanism to recover from

    transient failures.

    Cassandra also supports batch operations, but they do not have any atomicity guarantees either.

    Since the operations are idempotent, the client can keep retrying until all operations of the batch

    are successful.

    Idempotent operations are not equal to atomic operations. If an operation is successful, all is well

    and the outcome is identical to atomic operations. If an operation fails, the client can retry, and if it is successful, again all is well. If, however, the operation fails even after retrying, unlike with

    atomic operations, it might leave side effects. Unfortunately, with Cassandra, this is a complexity

    that programmers have to deal with themselves.
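    The retry pattern that idempotent operations make possible might look like the following sketch. It is an illustration under assumptions, not client-library code: RetrySketch and retry are hypothetical names, and the simulated failure stands in for a timeout where the write may already have been applied.

```java
import java.util.HashMap;
import java.util.Map;

/** Retrying an idempotent write: repeating it leaves the same final state. */
final class RetrySketch {
    /** Runs op until it succeeds or maxAttempts is reached; returns whether it succeeded. */
    static boolean retry(Runnable op, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return true;
            } catch (RuntimeException e) {
                // Transient failure: trying again is safe only because op is idempotent.
            }
        }
        return false; // the caller still has to deal with possible partial changes
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        int[] calls = {0};
        boolean ok = retry(() -> {
            row.put("author", "Asimov"); // the write is applied...
            if (++calls[0] < 3) {
                // ...but the acknowledgement is lost on the first two attempts.
                throw new IllegalStateException("simulated timeout");
            }
        }, 5);
        System.out.println(ok + " " + row + " after " + calls[0] + " attempts");
    }
}
```

    If retry returns false, some of the writes may nevertheless have been applied, which is exactly the side-effect problem described above.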

    Searching is complicated

    Searching is not built into the core of the Cassandra architecture, and search mechanisms are

    layered on top using sort orders as described earlier. Cassandra supports secondary indexes

    where the system automatically builds them, with some limited functionality. When secondary

    indexes do not work, users have to learn the data model and build indexes using sort orders and

    slices.

    Three types of complexities are associated with building search methods:

    1. Building custom search methods requires programmers to understand indexing and details

    about storage to a certain extent. Therefore, Cassandra needs higher skilled developers than

    with relational models.

    2. Custom indexes heavily depend on sorted orders, and they are complicated. There are two

    types of sort orders: first, the columns are always sorted by name, and second, the row sort

    orders work only if an order-preserving partitioner (see Resources) is used.

    3. Adding a new query often needs new indexes and code changes unlike with relational

    models. This requires developers to analyze queries before storing the data.

    Super columns and order preserving partitioners are discouraged

    Cassandra super columns can be useful when modeling multi-level data, where they add one more

    level to the hierarchy. Anything that can be modeled with super columns, however, can also be

    supported through columns. Hence, super columns do not provide additional power. Also, they

    do not support secondary indexes. Therefore, the Cassandra developers discourage the use of

    super columns. Although there is no firm date for discontinuing support, it might happen in future

    releases.

    A partitioner in Cassandra decides how to distribute (shard) data among Cassandra nodes, and

    there are many implementations. If an order-preserving partitioner is used, rowIDs are stored in

    a sorted order and Cassandra can do slices (searches) across rowIDs as well. This partitioner

    does not distribute the data uniformly among its nodes, however, and with large datasets, some

    of the nodes might be hard-pressed while others are lightly loaded. Therefore, developers also

    discourage the use of order-preserving partitioners.

    Healing from failure is manual

    If a node in a Cassandra cluster has failed, the cluster will continue to work if you have replicas.

    Full recovery, which is to redistribute data and compensate for missing replicas, is a manual

    operation through a command line tool called node tool (see Resources). Also, while the manual

    operation happens, the system will be unavailable.

    It remembers deletes

    Cassandra is designed such that it continues to work without a problem even if a node goes down

    (or gets disconnected) and comes back later. A consequence is that this complicates data deletions.

    For example, assume a node is down. While it is down, a data item is deleted from the other replicas.

    When the unavailable node comes back on, it will reintroduce the deleted data item at the syncing

    process unless Cassandra remembers that data item has been deleted.

    Therefore, Cassandra has to remember that the data item has been deleted. In the 0.8 release,

    Cassandra remembered all the data even after it was deleted. This caused disk usage to keep

    growing for update-intensive operations. Cassandra does not have to remember all the deleted

    data, but just the fact that a data item has been deleted. This fix was done in later releases of

    Cassandra.
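    Why a delete has to be remembered can be seen in a simplified model of replica synchronization. The sketch below assumes that replicas reconcile each column by keeping the entry with the newest timestamp, and it represents a remembered delete as a marker entry (commonly called a tombstone); TombstoneSketch, Cell, and merge are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replica sync: a delete is kept as a marker so stale replicas cannot undo it. */
final class TombstoneSketch {
    /** A stored cell: value plus write timestamp; a null value marks a delete. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** For each column, the cell with the newest timestamp wins. */
    static Map<String, Cell> merge(Map<String, Cell> a, Map<String, Cell> b) {
        Map<String, Cell> merged = new HashMap<>(a);
        for (Map.Entry<String, Cell> e : b.entrySet()) {
            merged.merge(e.getKey(), e.getValue(),
                (x, y) -> x.timestamp >= y.timestamp ? x : y);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> staleReplica = new HashMap<>();
        staleReplica.put("author", new Cell("Asimov", 1L)); // missed the delete

        Map<String, Cell> withMarker = new HashMap<>();
        withMarker.put("author", new Cell(null, 2L));       // delete remembered

        Map<String, Cell> withoutMarker = new HashMap<>();  // delete forgotten

        System.out.println(merge(staleReplica, withMarker).get("author").isTombstone());
        System.out.println(merge(staleReplica, withoutMarker).get("author").value);
    }
}
```

    In the first merge the delete marker is newer than the stale value, so the delete survives; in the second merge nothing records the delete, and the stale value comes back.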

    Conclusion

    This article delves into some details that are not readily apparent when you consider Cassandra.

    I described the Cassandra data model, comparing it with the relational data model, and

    demonstrated a typical schema design with Cassandra. A key observation is that unlike the

    relational model that breaks data into many tables, Cassandra tends to keep as much data as

    possible within the same row to avoid having to join that data for retrieval.

    You also looked at several limitations of the Cassandra-based approach. These limitations,

    however, are common to most NoSQL solutions, and are often conscious design trade-offs to

    enable high scalability.

    Downloads

    Description Name Size

    Book rating sample code CassandraSample.zip 42KB

    http://www.ibm.com/developerworks/apps/download/index.jsp?contentid=823612&filename=CassandraSample.zip&method=http&locale=

    Resources

    Learn

    Read "What Goes Around Comes Around" (Michael Stonebraker and Joey Hellerstein, 2007) if you are interested in the history of storage technologies.

    Read the Getting Started page in the Cassandra Wiki to install Cassandra, run single-node Cassandra, and find an overview of how to configure multinode clusters.

    Read the paper "Cassandra - A Decentralized Structured Storage System" (Avinash Lakshman and Prashant Malik, 2009) to understand the Cassandra architecture in more detail.

    Read more about Google's Big Table and Amazon's Dynamo.

    Read about Eric Brewer's CAP theorem (Julian Browne, January 2009): http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

    See "Finding the Right Data Solution for Your Application in the Data Storage Haystack" (Srinath Perera, InfoQ, October 2011) for an overview of the NoSQL landscape and recommendations on how to choose the right NoSQL storage: http://www.infoq.com/articles/perera-data-storage-haystack

    Learn more about disk-drive performance characteristics on Wikipedia: http://en.wikipedia.org/wiki/Disk-drive_performance_characteristics

    Read about node tool, a simple command line interface to these exposed operations and attributes, on the Cassandra Wiki: http://wiki.apache.org/cassandra/NodeTool

    Find more details about consistency levels in the Cassandra API wiki: http://wiki.apache.org/cassandra/API

    Read more about storage configurations: http://wiki.apache.org/cassandra/StorageConfiguration


    Get products and technologies

    Explore Cassandra on the project website.

    Check out the Hector client on the project website.

    Download Cassandra and find instructions on how to use it at cassandra.apache.org.

    Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

    Discuss

    Check out developerWorks blogs and get involved in the developerWorks community.

    Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

    http://idke.ruc.edu.cn/seminars/phd/2007/11.07/What%20Goes%20Around%20Comes%20Around.pdf
    http://wiki.apache.org/cassandra/GettingStarted
    http://www.cs.cornell.edu/Projects/ladis2009/papers/Lakshman-ladis2009.PDF
    http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
    http://db.cs.pitt.edu/courses/cs3551/11-1/handouts/10-1.1.1.115.1568.pdf
    http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
    http://www.infoq.com/articles/perera-data-storage-haystack
    http://en.wikipedia.org/wiki/Disk-drive_performance_characteristics
    http://wiki.apache.org/cassandra/NodeTool
    http://wiki.apache.org/cassandra/API
    http://wiki.apache.org/cassandra/StorageConfiguration
    http://www.ibm.com/developerworks/opensource/
    http://www.ibm.com/developerworks/offers/techbriefings/events.html
    http://www.ibm.com/developerworks/offers/techbriefings/index.html
    http://www.ibm.com/developerworks/offers/lp/demos/index.html
    http://www.twitter.com/developerworks/
    http://cassandra.apache.org/
    https://github.com/rantav/hector
    http://www.ibm.com/developerworks/downloads/index.html
    http://www.ibm.com/developerworks/downloads/soasandbox/index.html
    http://www.ibm.com/developerworks/blogs/
    http://www.ibm.com/developerworks/community

    About the author

    Srinath Perera

    Srinath works as a Senior Software Architect at WSO2 Inc., where he oversees the
    overall WSO2 platform architecture with the CTO. He also serves as a research
    scientist at Lanka Software Foundation and teaches as visiting faculty at the
    Department of Computer Science and Engineering, University of Moratuwa. He
    is a co-founder of Apache Axis2, has been involved with the Apache Web
    Service project since 2002, and is a member of the Apache Software Foundation
    and of the Apache Web Service project PMC. Srinath is also a committer on the
    Apache open source projects Axis, Axis2, and Geronimo. Srinath received his Ph.D.
    and M.Sc. in Computer Sciences from Indiana University, Bloomington, USA, and
    his Bachelor of Science in Computer Science and Engineering from the University
    of Moratuwa, Sri Lanka.

    Copyright IBM Corporation 2012

    (www.ibm.com/legal/copytrade.shtml)

    Trademarks

    (www.ibm.com/developerworks/ibm/trademarks/)
