NoSQL & Couchbase Sangharsh Agarwal

Post on 16-Jul-2015

NoSQL & Couchbase

Sangharsh Agarwal

Relational Databases

• MySQL, PostgreSQL, SQLite, Oracle, etc.

• Good at

– Schemas

– Strong Consistency

– Transactions

– “Mature” and well tested

– Availability of Expertise

What is NoSQL?

• It’s not Anti SQL or ‘NO’ SQL.

• It means (N)ot (O)nly SQL.

• A more exact name would be "non-relational database".

What is NoSQL?

• Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.

• A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.

• Motivations for NoSQL include simplicity of design, horizontal scaling and finer control over availability.

• Data structures in NoSQL (e.g. key-value, graph, or document) differ from those of an RDBMS, and therefore some operations are faster in NoSQL and some in an RDBMS.

“Is NoSQL a complete replacement of RDBMS?”

“NO”

Common Features of NoSQL

• Open Source

• Schema-less

• Scalability with Scale Out, not Scale Up

• Distribution with Sharding

• Eventual Consistency

• Commodity Class Nodes

• Parallel Query with MapReduce

• Cloud Readiness

• High Availability
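To illustrate "Distribution with Sharding", here is a minimal sketch of hash-based sharding. The function names `shard_for` and `distribute` and the `user::N` key format are assumptions made for this example; this is a toy model, not the placement algorithm of any particular NoSQL product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard they hash to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user::{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)   # scale out: 3 commodity nodes
four_nodes = distribute(keys, 4)    # add a fourth node

for shard, members in sorted(four_nodes.items()):
    print(f"shard {shard}: {len(members)} keys")
```

With plain modulo hashing, adding a node changes the shard of many existing keys, which is one reason real systems usually add a level of indirection between keys and servers.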

NoSQL Data Models (1/2)

• Distributed Caches: Couchbase, Memcached, Velocity

• Wide Column Stores: Accumulo, Cassandra, Druid, HBase

• Document Stores: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB

NoSQL Data Models (2/2)

• Key-value Stores: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE

• Graph Databases: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog

Why NoSQL (1/2)

• Interactive applications have changed dramatically over the last 15 years. In the late ‘90s, large web companies emerged with dramatic increases in scale on many dimensions:

– The number of concurrent users skyrocketed. (Big Users)

– The amount of data collected and processed soared. (IoT)

– The amount of unstructured or semi-structured data exploded. (Big Data/Cloud)

• Dealing with the above issues became more and more difficult using relational database technology.

• Relational databases are essentially architected to run on a single machine and use a rigid schema-based approach to modeling data.

Why NoSQL (2/2)

• Schema-less: an ALTER operation in an RDBMS is costly.

• RDBMSs are less capable of dealing with Big Data.

• RDBMSs are not a good fit for object-oriented programmers.

• RDBMSs support scale-up better than scale-out.

• RDBMSs cannot readily handle unstructured or semi-structured data.

Big Users

• Not that long ago, 1,000 daily users of an application was a lot and 10,000 was an extreme case.

• Today, with the growth in global Internet use, the increased number of hours users spend online, and the growing popularity of smartphones and tablets, it's not uncommon for apps to have millions of users a day.

Internet of Things

• The amount of machine-generated data is increasing with the proliferation of digital telemetry.

• There are 14 billion things connected to the Internet.

– By 2020, 32 billion things will be connected to the Internet.

– By 2020, 10% of data will be generated by embedded systems.

– By 2020, 20% of target-rich data will be generated by embedded systems.

• Telemetry data is small, semi-structured and continuous. It’s a challenge for relational databases.

• To address this challenge, the innovative enterprise is relying on NoSQL technology to scale concurrent data access to millions of connected things.

Big Data

• The amount of data is growing rapidly, and the nature of data is changing as well, as developers find new data types – most of which are unstructured or semi-structured – that they want to incorporate into their applications.

• Data is becoming easier to capture and access through third parties such as Facebook, Dun and Bradstreet, and others.

• NoSQL provides a data model that maps better to the application’s organization of data and simplifies the interaction between the

The Cloud

• Three-Tier Internet Architecture: Applications today are increasingly developed using a three-tier internet architecture, are cloud-based, and use a Software-as-a-Service business model that needs to support the collective needs of thousands of customers.

• This approach requires a horizontally scalable architecture that easily scales with the number of users and the amount of data the application has.

• NoSQL technologies have been built from the ground up to be distributed, scale-out technologies and therefore fit better with the highly distributed nature of the three-tier Internet architecture.

Data Models

• Relational and NoSQL data models are very different.

• The relational model takes data and separates it into many interrelated tables.

• Tables reference each other through foreign keys that are stored in columns as well.

• NoSQL databases have a very different model.

• For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format.
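The contrast between the two models can be sketched in a few lines. The `users` and `orders` rows, the field names and the `to_document` helper below are all invented for illustration; the sketch only shows how rows linked by a foreign key might be aggregated into one JSON document.

```python
import json

# Relational shape: two hypothetical tables linked by a foreign key (user_id).
users = [{"id": 1, "name": "Alice"}]
orders = [
    {"id": 10, "user_id": 1, "item": "book"},
    {"id": 11, "user_id": 1, "item": "pen"},
]


def to_document(user, all_orders):
    """Aggregate a user row and its related order rows into one document."""
    return {
        "type": "user",
        "name": user["name"],
        "orders": [
            {"id": o["id"], "item": o["item"]}
            for o in all_orders
            if o["user_id"] == user["id"]
        ],
    }


doc = to_document(users[0], orders)
print(json.dumps(doc, indent=2))
```

Reading the user together with all of its orders then needs a single document fetch instead of a join across tables.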

The CAP Theorem

Published by Eric Brewer in 2000, the theorem is a set of basic requirements that describe any distributed system (not just storage/database systems).

• Consistency - All the servers in the system will have the same data so anyone using the system will get the same copy regardless of which server answers their request.

• Availability - The system will always respond to a request, even if the response is stale data, data that is not consistent across the system, or just a message saying the system isn't working.

• Partition Tolerance - The system continues to operate as a whole even if individual servers fail or can't be reached.

It's theoretically impossible to have all 3 requirements met at the same time, so a combination of 2 must be chosen, and this choice is usually the deciding factor in which technology is used.

ACID vs BASE Theorems

ACID Properties

ACID is a set of properties that apply specifically to database transactions, defined as follows:

• Atomicity - Everything in a transaction must happen successfully or none of the changes are committed. This avoids a transaction that changes multiple pieces of data from failing halfway and only making a few changes.

• Consistency - The data will only be committed if it passes all the rules in place in the database (e.g. data types, triggers, constraints).

• Isolation - Transactions won't affect other transactions by changing data that another operation is counting on; and other users won't see partial results of a transaction in progress (depending on isolation mode).

• Durability - Once data is committed, it is durably stored and safe against errors, crashes or any other (software) malfunctions within the database.
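Atomicity and consistency can be demonstrated with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` and `balances` helpers and the sample balances are assumptions made for this sketch; the CHECK constraint stands in for "the rules in place in the database".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the failed transfer the credit to bob is executed first and then undone by the rollback, so no half-finished change is left behind.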

BASE Theorem

• Basically Available - The system guarantees availability in the sense of the CAP theorem: there will be a response to any request. However, that response could still be a ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.

• Soft state - The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’

• Eventual consistency - The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one.
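Eventual consistency can be pictured with a toy in-memory model. The `Replica` and `Cluster` classes and the explicit `propagate` step are hypothetical and exist only for this sketch; real systems propagate writes in the background over a network.

```python
from collections import deque


class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """A primary copy plus replicas that are updated asynchronously."""

    def __init__(self, num_replicas):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = deque()  # writes accepted but not yet propagated

    def write(self, key, value):
        # The write is acknowledged before the replicas have seen it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def propagate(self):
        # Stand-in for background replication: drain the pending writes.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


cluster = Cluster(num_replicas=2)
cluster.write("balance", 100)
stale = cluster.read_from_replica(0, "balance")  # replica has not caught up yet
cluster.propagate()
fresh = cluster.read_from_replica(0, "balance")  # replicas now agree with primary
print(stale, fresh)
```

Between `write` and `propagate` the system is in the "soft state" described above: it answers requests, but different copies may disagree.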

Couchbase

Couchbase - The NoSQL document database

• Couchbase Server, originally known as Membase, is an open source, distributed (shared-nothing architecture) NoSQL document-oriented database that is optimized for interactive applications. These applications must serve many concurrent users while creating, storing, retrieving, aggregating, manipulating and presenting data.

• Couchbase is designed to provide easy-to-scale key-value or document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large scale deployments.

• In the parlance of Eric Brewer’s CAP theorem, Couchbase is a CP type system.

Couchbase Features

Easy Scalability

It’s easy to scale your database layer with Couchbase Server, whether within a cluster or across clusters in multiple data centers. With one click of a button, no downtime, and no changes to your app, you can grow your cluster from 1 to 25 to 100s of servers while keeping the workload evenly distributed.

Consistent High Performance

Couchbase Server’s consistent sub-millisecond response times mean an awesome experience for your app users. Consistent, high throughput lets you serve more users with fewer servers. Data and workload are equally spread across all servers.

Always On

With Couchbase Server, your application is always online, 24x365. Whether you are upgrading your database, system software or hardware – or recovering from a disaster – you can count on zero app downtime with Couchbase Server.

Flexible Data Model

You shouldn’t have to worry about the database when you change your application. With Couchbase Server, there is no fixed schema, so records can have different structures and can be changed at any time without modification to other documents in the database.

Couchbase Features..

Flexible Data Model

1. JSON Support

2. Indexing and Querying

3. Incremental Map Reduce

Easy Scalability

1. Clone to Grow with Auto-Sharding

2. Cross-Cluster Replication (XDCR)

Consistent High Performance

1. Built-In Object-Level Cache (memcached)

Always On 24x365

1. Zero Downtime Maintenance

2. Data Replication With Auto-Failover

3. Management and Monitoring UI

4. Reliable Storage Architecture

Why Couchbase?

• Couchbase provides the world’s most complete, most scalable and best performing NoSQL database.

• Couchbase provides a shared-nothing architecture, a single node type, a built-in caching layer, true auto-sharding and the world’s first NoSQL mobile offering.

Couchbase Architecture (1/3)

High-Level Deployment Architecture.

Couchbase Architecture (2/3)

• In Couchbase Server, the data manager stores and retrieves data in response to data operation requests from applications.

• Every server in a Couchbase cluster includes a built-in multi-threaded object-managed cache, which provides consistent low latency for read and write operations.

• The cluster manager supervises server configuration and interaction between servers within a Couchbase cluster.

Node architecture diagram of Couchbase Server

Couchbase Architecture (3/3)

Data flow within Couchbase during a write operation

1. The client writes a document into the cache, and the server sends the client a confirmation.

2. The document is added into the intra-cluster replication queue to be replicated to other servers within the cluster.

3. The document is also added into the disk write queue to be asynchronously persisted to disk. The document is persisted to disk after the disk-write queue is flushed.

4. After the document is persisted to disk, it’s replicated to other Couchbase Server clusters using cross datacenter replication (XDCR) and eventually indexed.
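The four steps above can be sketched as a toy in-memory model. The `Node` class, its queue attributes and its method names are hypothetical and are not part of any Couchbase SDK or of the server's internals; the sketch only mirrors the order of events described on the slide.

```python
from collections import deque


class Node:
    """Toy model of one server: a cache, a disk and three queues."""

    def __init__(self):
        self.cache = {}
        self.disk = {}
        self.replication_queue = deque()  # intra-cluster replication
        self.disk_write_queue = deque()   # asynchronous persistence
        self.xdcr_queue = deque()         # cross datacenter replication

    def write(self, key, doc):
        # Step 1: store in the cache and confirm to the client.
        self.cache[key] = doc
        # Step 2: queue the document for replication inside the cluster.
        self.replication_queue.append((key, doc))
        # Step 3: queue the document to be persisted to disk later.
        self.disk_write_queue.append((key, doc))
        return "ack"

    def replicate_to(self, peers):
        # Drain the intra-cluster queue into the caches of the other nodes.
        while self.replication_queue:
            key, doc = self.replication_queue.popleft()
            for peer in peers:
                peer.cache[key] = doc

    def flush_disk_queue(self):
        # Persist queued documents; step 4: hand them to XDCR afterwards.
        while self.disk_write_queue:
            key, doc = self.disk_write_queue.popleft()
            self.disk[key] = doc
            self.xdcr_queue.append((key, doc))


node, peer = Node(), Node()
ack = node.write("user::1", {"name": "Alice"})
on_disk_before_flush = "user::1" in node.disk
node.replicate_to([peer])
node.flush_disk_queue()
on_disk_after_flush = "user::1" in node.disk
print(ack, on_disk_before_flush, on_disk_after_flush, len(node.xdcr_queue))
```

The client receives its confirmation before the document reaches the disk or the other nodes, which is where the low write latency in this flow comes from.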

Couchbase’s Elasticsearch Connector

• Together, Couchbase and Elasticsearch enable you to build richer and more powerful apps with full-text search, indexing and querying and real-time analytics for use cases such as content stores or aggregating data from varied data sources.

“The plug-in for Elasticsearch extends Couchbase Server’s flexibility even further, allowing users to build self-adapting interactive applications.”

Thanks
