Selecting No-SQL DBMS Finding the best NoSQL DBMS @Mohammed Fazuluddin

Upload: mohammed-fazuluddin

Post on 06-Apr-2017


TRANSCRIPT

Page 1: Selecting best NoSQL

Selecting No-SQL DBMS

Finding the best NoSQL DBMS

@Mohammed Fazuluddin

Page 2: Selecting best NoSQL

Topics

Why choose NoSQL database
Overview
Brief on different types of NoSQL

Page 3: Selecting best NoSQL

Why choose NoSQL database

To improve programmer productivity by using a database that better matches an application's needs.

To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.

Since most of the NoSQL databases are open source, testing them is a simple matter of downloading these products and setting up a test environment.

Separating parts of applications into services also allows you to introduce NoSQL into an existing application.

Page 4: Selecting best NoSQL

Overview

NoSQL means that when designing a software solution there is more than one storage mechanism that could be used, depending on the needs.

Due to increasing needs for scalability and performance, alternative systems have emerged, namely NoSQL technology.

There are hundreds of readily available NoSQL databases, and each has different use case scenarios.

NoSQL databases can be divided into four main categories:

Document Database
Key-value Database
Column Based Database
Graph Database

Page 5: Selecting best NoSQL

Overview

Before going down the NoSQL path, it's good to recheck whether your existing DBMS software can be used for the current requirement.

Using NoSQL databases allows developers to develop without having to convert in-memory structures to relational structures.
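As a rough illustration of that point, the sketch below stores the same nested structure twice: once as a single JSON document, and once split across two relational tables that must be reassembled on read. The `order` data, the table names and the `load_order` helper are all made up for the example, and the standard-library `json` and `sqlite3` modules merely stand in for a document store and a relational DBMS.

```python
import json
import sqlite3

# Hypothetical in-memory structure an application might hold.
order = {
    "order_id": 1001,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# Document style: the nested structure is stored and read back as one unit.
document = json.dumps(order)
restored = json.loads(document)

# Relational style: the same structure is split across two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute(
    "INSERT INTO orders VALUES (?, ?)", (order["order_id"], order["customer"])
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["order_id"], item["sku"], item["qty"]) for item in order["items"]],
)


def load_order(conn, order_id):
    """Rebuild the nested structure from the two tables."""
    oid, customer = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "order_id": oid,
        "customer": customer,
        "items": [{"sku": sku, "qty": qty} for sku, qty in rows],
    }
```

Both paths return the same structure; the difference is the mapping code the relational path needs.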

NoSQL does not have a prescriptive definition, but we can make a set of common observations, such as:

Not using the relational model
Running well on clusters
Mostly open-source
Built for the 21st century web estates
Schema-less

Page 6: Selecting best NoSQL

Document Database

Page 7: Selecting best NoSQL

Document Database

The document store DBMS stores data at the document level using a format such as JavaScript Object Notation (JSON) or XML.

The document data model makes it easy for developers to store and combine data of any structure, without giving up data access and indexing functionality.

Database administrators (DBAs) can dynamically modify the schema without downtime.

Document databases work well for event logging, online shopping, content management and in-depth analytical processing.

The schema flexibility of document databases can also be useful for projects that require rapid prototyping.
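The idea of storing documents of any structure while keeping indexed access can be sketched with a small in-memory model. `TinyDocumentStore` and its methods are hypothetical names invented for this illustration, not the API of any real document DBMS, and the sketch ignores persistence, updates to existing documents and nested-field indexes.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Illustrative in-memory document store (hypothetical, not a real DBMS API)."""

    def __init__(self):
        self._docs = {}      # doc_id -> document (a dict of any shape)
        self._indexes = {}   # field -> {value -> set of doc_ids}

    def create_index(self, field):
        # Build an index over a top-level field for documents already stored.
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc_id, doc):
        # No schema check: each document may carry different fields.
        # Simplification: re-inserting an existing doc_id is not handled.
        self._docs[doc_id] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        # Use the index when one exists, otherwise scan every document.
        if field in self._indexes:
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        return [d for _, d in sorted(self._docs.items()) if d.get(field) == value]


store = TinyDocumentStore()
store.create_index("type")
store.insert("e1", {"type": "login", "user": "alice"})
store.insert("e2", {"type": "purchase", "user": "bob", "items": ["A-1"], "total": 19.5})
store.insert("e3", {"type": "login", "user": "carol", "device": {"os": "linux"}})

logins = store.find("type", "login")   # served from the index
bobs = store.find("user", "bob")       # falls back to a full scan
```

The three documents have different fields, yet all of them are reachable through the same `type` index.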

Page 8: Selecting best NoSQL

Document Database

One of the leading NoSQL DBMSs is MongoDB, an open source document store DBMS.

It's designed to make it easy to develop and run modern applications that rely on structured and unstructured data while delivering scalability and high availability, and supporting rapidly changing data.

There are probably more technicians familiar with it than any other NoSQL DBMS, making it somewhat easier to staff MongoDB projects.

MongoDB stores data as documents in a binary representation called BSON (Binary JSON).

MongoDB is specifically designed for rapidly building applications that scale globally and are inexpensive to operate.

Page 9: Selecting best NoSQL

Document Database

Another option is Couchbase Server, a JSON-based document store derived from CouchDB, which is an Apache open source project.

Couchbase Server delivers eventual consistency for transactions, as opposed to ACID (atomicity, consistency, isolation, and durability).
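What eventual consistency means in practice can be shown with a toy model: a write is acknowledged by one copy immediately and reaches the other copy only later, so a read in between may return stale data. The class and method names below are hypothetical, and the sketch makes no claim about how Couchbase Server actually replicates data.

```python
from collections import deque


class EventuallyConsistentPair:
    """Toy primary/replica pair with delayed replication (illustrative only)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()   # writes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged once the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, key):
        # May return stale data (or None) until replication catches up.
        return self.replica.get(key)

    def replicate(self):
        # Apply all queued writes to the replica, in order.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


pair = EventuallyConsistentPair()
pair.write("cart:alice", "1 item")
stale = pair.read_from_replica("cart:alice")   # replica has not seen the write yet
pair.replicate()
fresh = pair.read_from_replica("cart:alice")   # replica has converged
```

An ACID system would instead make the write visible everywhere (or nowhere) as a single step, at the cost of more coordination.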

Many NoSQL offerings rely on command line interface (CLI) administration, but Couchbase Server administration tasks can be performed using the Web, CLI or RESTful API.

Another option is MarkLogic Server, which can handle JSON, XML and Resource Description Framework (RDF) data natively and offers critical enterprise features such as ACID transactions, automated failover and security.

Page 10: Selecting best NoSQL

Key-Value Database

Page 11: Selecting best NoSQL

Key-Value Database

The key-value approach is somewhat similar to the document approach. Both offer flexible schemata, but the data in a key-value store isn't structured using a format like JSON.

Key-value databases excel at session management, serving ad content and managing user or product profiles. When data is encoded in many different ways without a rigorous schema, using a key-value database can make sense.

One of the leading key-value DBMSs is Redis, an open source, BSD-licensed, key-value data store.

Redis is a key-value store, but it also supports different kinds of data structures. Whereas with traditional key-value stores you associate string keys to string values, in Redis the value isn't limited to a simple string but can also hold more complex data structures.
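The difference between a plain string-to-string store and one whose values can be richer structures can be sketched as follows. `TinyKV` is a hypothetical in-memory class; its method names are loosely modeled on Redis command names (SET, RPUSH, SADD, HSET and so on), but it is not the Redis client API and it does not reproduce Redis semantics in full.

```python
class TinyKV:
    """Illustrative key-value store whose values may be structures (not Redis)."""

    def __init__(self):
        self._data = {}

    # Plain string values.
    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    # List values: append on the right, read an inclusive range.
    def rpush(self, key, *values):
        self._data.setdefault(key, []).extend(values)

    def lrange(self, key, start, stop):
        items = self._data.get(key, [])
        # Simplification: only stop == -1 ("to the end") is treated specially.
        end = len(items) if stop == -1 else stop + 1
        return items[start:end]

    # Set values: duplicates are dropped.
    def sadd(self, key, *members):
        self._data.setdefault(key, set()).update(members)

    def smembers(self, key):
        return set(self._data.get(key, set()))

    # Hash values: a small field -> value map stored under one key.
    def hset(self, key, field, value):
        self._data.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self._data.get(key, {}).get(field)


kv = TinyKV()
kv.set("user:1:name", "alice")
kv.rpush("user:1:recent", "p1", "p2")
kv.sadd("user:1:tags", "a", "b", "a")
kv.hset("user:1:profile", "city", "Pune")
```

Each key still maps to exactly one value; only the kind of value varies.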

Page 12: Selecting best NoSQL

Key-Value Database

Another NoSQL key-value DBMS option is Riak from Basho Technologies. 

Riak is a fault-tolerant, highly available, scalable, distributed multi-model DBMS.

Riak open source is free under the Apache 2 license whereas Riak Enterprise requires a commercial license agreement, sold by Basho Technologies.

Riak is more accurately termed a multi-model platform, supporting key-value, object store and search capabilities all from the same platform.

Riak is an open source, distributed DBMS that's implemented across multiple servers. Any server can respond to read or write requests, and if one server fails, the other servers continue to act upon client requests.
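The behavior described here, where any server can answer and the cluster survives a failed server, can be sketched with a deliberately simplified model in which every live node holds a full copy of the data. `TinyCluster` and `ClusterNode` are hypothetical names; a real distributed store such as Riak partitions and replicates data in a more sophisticated way that this sketch does not attempt to show.

```python
class ClusterNode:
    """One server in the toy cluster."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class TinyCluster:
    """Toy cluster: every live node stores every key (illustrative only)."""

    def __init__(self, names):
        self.nodes = [ClusterNode(n) for n in names]

    def _live(self):
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("no live nodes")
        return live

    def fail(self, name):
        # Simulate a server going down.
        for node in self.nodes:
            if node.name == name:
                node.alive = False

    def put(self, key, value):
        # Any live node can coordinate; here the first live one does.
        live = self._live()
        for node in live:
            node.data[key] = value
        return live[0].name

    def get(self, key):
        # Any live node that has the key can answer the read.
        for node in self._live():
            if key in node.data:
                return node.data[key]
        return None


cluster = TinyCluster(["a", "b", "c"])
cluster.put("k", "v1")
cluster.fail("a")                           # one server goes down
value_after_failure = cluster.get("k")      # still served by another node
coordinator = cluster.put("k2", "v2")       # writes continue as well
```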

Page 13: Selecting best NoSQL

Column Database

Page 14: Selecting best NoSQL

Column Database

A column store NoSQL DBMS allows you to store data with keys mapped to values and the values grouped into families that are often accessed together.

A column database is well-suited for data where writes are uncommon and applications need to access a few columns of many rows all at once.

Column stores work well for event logging, content management and counting/categorizing for analytics.

Column stores are also useful when you have expiring data because you can set up a column to automatically expire.
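A minimal sketch of the two ideas above, values grouped into families under a row key and columns that expire automatically, might look like this. `TinyColumnFamilyStore` is a hypothetical in-memory class, not the API of Cassandra or HBase, and the clock is injected so that expiry can be demonstrated without waiting.

```python
class TinyColumnFamilyStore:
    """Illustrative row -> family -> column store with optional expiry."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._rows = {}       # row_key -> family -> column -> (value, expires_at)

    def put(self, row_key, family, column, value, ttl=None):
        # ttl is a lifetime in seconds; None means the column never expires.
        expires_at = None if ttl is None else self._clock() + ttl
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = (value, expires_at)

    def get_family(self, row_key, family):
        # Return all unexpired columns of one family for one row.
        now = self._clock()
        columns = self._rows.get(row_key, {}).get(family, {})
        return {
            name: value
            for name, (value, expires_at) in columns.items()
            if expires_at is None or expires_at > now
        }


now = [0]   # fake clock so the example is deterministic
store = TinyColumnFamilyStore(clock=lambda: now[0])
store.put("user42", "profile", "name", "alice")
store.put("user42", "session", "token", "abc", ttl=30)

before = store.get_family("user42", "session")   # token still valid
now[0] = 31                                      # 31 seconds later
after = store.get_family("user42", "session")    # token has expired
profile = store.get_family("user42", "profile")  # unaffected
```

Reading one family touches only the columns grouped under it, which is the access pattern these stores are built around.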

Apache Cassandra is one of the top NoSQL column family DBMSs. It was originally developed at Facebook and later released as an open source project, and is therefore freely available to download and use.

Page 15: Selecting best NoSQL

Column Database

Apache Cassandra is designed to be used by online applications that require fast performance with no downtime. It was engineered to handle very large amounts of data spread out across commodity servers to deliver high availability without a single point of failure.

DataStax, a commercial vendor, has created an enterprise-level version of Cassandra with support, called DataStax Enterprise.

DataStax Enterprise is free to use in development environments; use in production requires the purchase of a license (or enrollment in the startup program).

DataStax offers subscriptions for both production and non-production environments that include certified software and support.

Page 16: Selecting best NoSQL

Column Database

Apache HBase is another leading open source NoSQL column store. Designed to deliver random, real-time, read/write access to large amounts of data using commodity hardware, HBase is modeled after Google's Bigtable storage system.

It's built on top of Hadoop and the Hadoop Distributed File System (HDFS). Although Hadoop and HBase are open source projects, there are commercial providers such as Cloudera, which offers Cloudera Enterprise, combining Apache Hadoop and other open source projects into a single, highly scalable system for analytical processing. Of course, Cloudera isn't the only commercial provider; for example, Hortonworks and MapR Technologies are other leading providers of Hadoop distributions that include HBase.

Page 17: Selecting best NoSQL

Graph Database

The graph database NoSQL category focuses on relationships between values and stores data using graph structures with nodes, edges and properties.

In a graph database every element contains direct pointers to its adjacent elements, so no index lookups are necessary to move from one element to the next.

It is used in social media (relationship management), search, network and IT operations, fraud detection, real-time recommendations, digital asset management and master data management; essentially any application that benefits from harnessing the power of data relationships using graphs.

The leading graph database is Neo4j. Neo4j is a native graph database system in which things are stored as nodes, and the relationships between those things form the structure of the database.

Page 18: Selecting best NoSQL

Graph Database

Page 19: Selecting best NoSQL

Graph Database

Graph databases allow you to store entities and relationships between these entities. Entities are also known as nodes, which have properties.

Nodes can have different types of relationships between them, allowing you to both represent relationships between the domain entities and to have secondary relationships for things like category, path, time-trees, quad-trees for spatial indexing, or linked lists for sorted access.

Since most of the power from the graph databases comes from the relationships and their properties, a lot of thought and design work is needed to model the relationships in the domain that we are trying to work with.

Relationships are first-class citizens in graph databases; most of the value of graph databases is derived from the relationships.
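The model described on this slide, nodes with properties joined by typed relationships that carry their own properties, can be sketched in a few lines. All class and function names here are hypothetical and the data is invented; this is an in-memory illustration, not the API or query language of Neo4j or any other graph database.

```python
class GraphNode:
    """An entity with properties and direct references to its relationships."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.outgoing = []   # relationships that start at this node


class Relationship:
    """A typed, directed edge that can carry its own properties."""

    def __init__(self, start, rel_type, end, **properties):
        self.start = start
        self.rel_type = rel_type
        self.end = end
        self.properties = properties


def connect(start, rel_type, end, **properties):
    rel = Relationship(start, rel_type, end, **properties)
    start.outgoing.append(rel)
    return rel


def neighbors(node, rel_type):
    # Follow direct references from the node; no global lookup is involved.
    return [r.end for r in node.outgoing if r.rel_type == rel_type]


def friends_of_friends(node):
    # Two-hop traversal, skipping the start node and its direct friends.
    direct = neighbors(node, "FRIEND")
    result = []
    for friend in direct:
        for candidate in neighbors(friend, "FRIEND"):
            if candidate is not node and candidate not in direct and candidate not in result:
                result.append(candidate)
    return result


alice = GraphNode("alice", city="Pune")
bob = GraphNode("bob")
carol = GraphNode("carol")
book = GraphNode("NoSQL primer", kind="book")

since = connect(alice, "FRIEND", bob, since=2015)
connect(bob, "FRIEND", alice, since=2015)
connect(bob, "FRIEND", carol, since=2016)
connect(alice, "LIKES", book)

suggestions = friends_of_friends(alice)
```

Because the relationship type is part of the edge, the same pair of nodes can be linked in several ways (`FRIEND`, `LIKES`) and each traversal picks only the type it cares about.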

Page 20: Selecting best NoSQL

Graph Database

There are many graph databases available, such as Neo4j, InfiniteGraph, OrientDB, or FlockDB (which is a special case: a graph database that only supports single-depth relationships or adjacency lists, where you cannot traverse more than one level deep for relationships).

Neo4j offers ACID transactions, high-availability clustering for enterprise deployments, and comes with a Web-based administration tool.

Neo4j isn't new technology; the company has been in business for more than a decade.

Another option is Titan, which is optimized for storing and querying graphs represented over a cluster of machines.

Page 21: Selecting best NoSQL

Graph Database

Titan has a pluggable storage architecture that allows it to build on proven database technology such as Apache Cassandra, Apache HBase or Oracle Berkeley DB.

Choosing a multi-model approach can make sense for applications needing several different NoSQL approaches (such as key/value for some data and graph for others).

Most NoSQL DBMS offerings are open source and can be licensed for free under an open source license or via a commercial license from a vendor that offers support and upgrades.

The commercial option is recommended for organizations intending to use NoSQL databases in production applications and systems.

Page 22: Selecting best NoSQL

The multi-model DBMS

Another choice in the NoSQL market is the multi-model DBMS. A growing number of vendors have delivered DBMS products that support more than one (or all) of the NoSQL models (and in some cases relational, too). Examples of multi-model NoSQL products include DataStax Enterprise, FoundationDB, CortexDB and OrientDB.

Your existing relational DBMS may also be an option. The relational vendors are working to expand their DBMSs to embrace NoSQL, and some have already started to introduce NoSQL capabilities.

One example is IBM DB2. DB2 for Linux, Unix and Windows offers a column store capability (albeit a relational column store) and can store RDF graph triples and JSON documents, which may obviate the need for DB2 users to acquire a graph or document database.

Page 23: Selecting best NoSQL

Thank You