no sq lv2

NoSQL Presented By: Nusrat Sharmin

Uploaded by: nusrat-sharmin

Posted on 14-Apr-2017


TRANSCRIPT

Page 1

NoSQL
Presented By: Nusrat Sharmin

Page 2

What is NoSQL?
NoSQL stands for ‘Not Only SQL’, implying that when designing a software solution or product, there is more than one storage mechanism that could be used, depending on the needs
A class of non-relational data storage systems
Usually do not require a fixed table schema (they are schema-less), nor do they use the concept of joins
Run well on clusters
Mostly open-source, distributed, and built for 21st-century web estates
Designed to cope with the scale and agility challenges that face modern applications
Built to take advantage of the cheap storage and processing power available today

Page 3

Why NoSQL Databases?
They allow developers to develop without having to convert in-memory structures to relational structures
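
As a minimal illustration of that conversion step, the Python sketch below stores one nested in-memory structure as a single JSON document, and then splits the same structure into flat relational-style rows that have to be reassembled on the way back. The order data and the two-table layout are hypothetical.

```python
import json

# Hypothetical in-memory order built by application code: a nested structure.
order = {
    "id": 1001,
    "customer": "alice",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Document-style storage: the structure is serialized as-is, in one piece.
stored_document = json.dumps(order)
restored = json.loads(stored_document)

# Relational-style storage: the same data is split into flat rows in two
# hypothetical tables, with the order id repeated as a foreign key.
orders_table = [(order["id"], order["customer"])]
order_items_table = [(order["id"], item["sku"], item["qty"]) for item in order["items"]]

# Reading it back means reassembling (joining) the rows into the nested shape.
order_id, customer = orders_table[0]
rebuilt = {
    "id": order_id,
    "customer": customer,
    "items": [{"sku": sku, "qty": qty}
              for (oid, sku, qty) in order_items_table if oid == order_id],
}
```

Both paths end with the same object, but only the relational one needs the extra mapping code in each direction.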

Page 4

Why NoSQL Databases?
A move away from using databases as integration points, in favor of encapsulating databases within applications and integrating through services
The rise of the web as a platform also created a vital change in data storage: the need to support large volumes of data by running on clusters
Relational databases were not designed to run on clusters
For example, the data storage needs of an ERP application are very different from those of Facebook or Etsy

Page 5

Data Models of NoSQL
A data model is a set of constructs for representing information
Relational model: tables, columns, and rows
Storage model: how the DBMS stores and manipulates the data internally
A data model is usually independent of the storage model
Data models for NoSQL systems:
Aggregate data models: key-value, document, column-family
Distribution models

Page 6

Aggregate Data Models
Data as units that have a complex structure: more structure than just a set of tuples
Example: a complex record with simple fields, arrays, and records nested inside
Aggregate in Domain-Driven Design: a collection of related objects that we treat as a unit, i.e., a unit for data manipulation and for the management of consistency
Advantages of aggregates: they are easier for application programmers to work with, and easier for database systems to handle when operating on a cluster
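
A minimal Python sketch of an aggregate, assuming a hypothetical order domain: the Order root owns simple fields, a nested address record, and an array of line items, and a plain dict stands in for an aggregate-oriented store that reads and writes the whole unit under one key.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Address:          # record nested inside the aggregate
    street: str
    city: str

@dataclass
class LineItem:         # element of an array inside the aggregate
    sku: str
    qty: int
    unit_price: float

@dataclass
class Order:            # hypothetical aggregate root: the order and everything it owns
    order_id: int       # simple fields
    customer: str
    shipping: Address   # nested record
    items: List[LineItem] = field(default_factory=list)  # array of records

    def total(self) -> float:
        return sum(item.qty * item.unit_price for item in self.items)

# Hypothetical aggregate-oriented store: one key maps to one whole aggregate.
store: Dict[int, Order] = {}

def save(order: Order) -> None:
    # The aggregate is written as a single unit, never field by field.
    store[order.order_id] = order

def load(order_id: int) -> Order:
    # One lookup returns the order together with its address and items.
    return store[order_id]

save(Order(7, "bob", Address("1 Main St", "Springfield"),
           [LineItem("A1", 2, 2.5), LineItem("B7", 1, 10.0)]))
print(load(7).total())  # 15.0
```

Because the aggregate travels as one unit, a cluster can place it on a single node and update it without coordinating with other records.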

Page 7

Distribution Models
Aggregate-oriented databases make the distribution of data easier: the distribution mechanism only has to move the aggregate, without worrying about related data, since all the related data is contained in the aggregate
There are two styles of distributing data:
Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of the data
Replication copies data across multiple servers, so each bit of data can be found in multiple places. It comes in two forms:
Master-slave replication makes one node the authoritative copy that handles writes, while slaves synchronize with the master and may handle reads. This reduces the chance of update conflicts
Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data. This avoids loading all writes onto a single server, which would create a single point of failure
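
The sketch below illustrates sharding and master-slave replication in Python under simplifying assumptions. Everything lives in one process, and the hypothetical Node class stands in for a server. Sharding uses a simple hash-modulo scheme; real databases use their own partitioning schemes. Master-slave replication copies state only when sync() is called. The code uses hashlib rather than Python's built-in hash(), because hash() of a string can differ between interpreter runs, which would send the same key to different shards.

```python
import hashlib
from typing import Dict, List, Optional

class Node:
    """Hypothetical server holding a plain key-value map."""
    def __init__(self, name: str):
        self.name = name
        self.data: Dict[str, str] = {}

def shard_for(key: str, shards: List[Node]) -> Node:
    # Hypothetical hash-modulo sharding: a stable hash of the key picks exactly
    # one server, so each server is the single source for its subset of keys.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

class MasterSlaveGroup:
    """Master-slave replication: the master takes every write, slaves serve reads."""
    def __init__(self, master: Node, slaves: List[Node]):
        self.master = master
        self.slaves = slaves
        self._next_slave = 0

    def write(self, key: str, value: str) -> None:
        self.master.data[key] = value  # only the authoritative copy is written

    def sync(self) -> None:
        # Slaves synchronize by copying the master's current state.
        for slave in self.slaves:
            slave.data = dict(self.master.data)

    def read(self, key: str) -> Optional[str]:
        # Reads are spread over the slaves in round-robin order.
        slave = self.slaves[self._next_slave % len(self.slaves)]
        self._next_slave += 1
        return slave.data.get(key)

# Usage: place three keys on three shards, then replicate one write.
shards = [Node("s0"), Node("s1"), Node("s2")]
for user_key in ["user:1", "user:2", "user:3"]:
    shard_for(user_key, shards).data[user_key] = "profile"

group = MasterSlaveGroup(Node("master"), [Node("slave1"), Node("slave2")])
group.write("x", "1")
before_sync = group.read("x")   # None: the slaves have not synchronized yet
group.sync()
after_sync = group.read("x")    # "1"
```

Between write() and sync() a slave can return stale data; that window is the price paid for sending every write to a single authoritative node.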

Page 8

CAP Theorem
Proposed by Eric Brewer (keynote at the ACM Symposium on Principles of Distributed Computing, July 2000)
Three properties of a system: consistency, availability, and partition tolerance
You can have at most two of these three properties for any shared-data system
To scale out, you have to partition. That leaves either consistency or availability to choose from; in almost all cases, choose availability over consistency
[Figure: CAP diagram showing Consistency, Availability, and Partition tolerance]

Page 9

CAP Theorem: Consistency
Once a writer has written, all readers will see that write
Two kinds of consistency:
Strong consistency: ACID (Atomicity, Consistency, Isolation, Durability)
Weak consistency: BASE (Basically Available, Soft state, Eventual consistency)
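
The difference can be sketched in Python with a hypothetical replicated store. It models only how a write reaches the copies, not full ACID transactions. With synchronous replication every replica is updated before write() returns, so every later read sees the write. With asynchronous replication the write is queued, reads may be stale for a while, and the copies converge once propagate() runs, which is the eventual consistency behind BASE.

```python
from typing import Dict, List, Optional, Tuple

class ReplicatedStore:
    """Hypothetical store with a primary copy and several read replicas.

    synchronous=True:  a write reaches every replica before write() returns,
                       so any later read sees it (strong consistency).
    synchronous=False: writes are queued and copied later by propagate(),
                       so reads may be stale until then (eventual consistency).
    """
    def __init__(self, n_replicas: int, synchronous: bool):
        self.primary: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(n_replicas)]
        self.synchronous = synchronous
        self.pending: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        if self.synchronous:
            for replica in self.replicas:
                replica[key] = value
        else:
            self.pending.append((key, value))

    def propagate(self) -> None:
        # Background replication step: apply queued writes in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, replica_index: int, key: str) -> Optional[str]:
        return self.replicas[replica_index].get(key)

strong = ReplicatedStore(n_replicas=2, synchronous=True)
strong.write("balance", "100")
print(strong.read(1, "balance"))    # 100

eventual = ReplicatedStore(n_replicas=2, synchronous=False)
eventual.write("balance", "100")
print(eventual.read(1, "balance"))  # None (stale: not replicated yet)
eventual.propagate()
print(eventual.read(1, "balance"))  # 100
```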

Page 10

CAP Theorem: Availability
The system is available during software and hardware upgrades and node failures
Traditionally thought of as the server/process being available five 9’s (99.999%) of the time
However, in a system with many nodes, at almost any point in time there is a good chance that a node is either down or there is a network disruption among the nodes
We want a system that is resilient in the face of network disruption
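
A quick back-of-the-envelope calculation in Python shows why both points hold. The per-node availability of 99.9% and the cluster size of 1,000 nodes are hypothetical numbers, and the second function assumes nodes fail independently, which real clusters do not strictly satisfy.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for an availability fraction such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def prob_some_node_down(node_availability: float, n_nodes: int) -> float:
    """Probability that at least one of n nodes is down at a given moment,
    assuming each node is up with the given probability, independently."""
    return 1.0 - node_availability ** n_nodes

print(round(downtime_minutes_per_year(0.99999), 2))  # five 9's: 5.26 minutes a year
# Hypothetical cluster: 1,000 nodes, each available 99.9% of the time.
print(round(prob_some_node_down(0.999, 1000), 3))    # 0.632
```

Under these assumptions, even though each node is individually quite reliable, at a given moment there is roughly a 63% chance that at least one of the 1,000 nodes is down, which is why the system as a whole has to tolerate failures instead of relying on every server staying up.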

Page 11

CAP Theorem: Partition Tolerance
A system can continue to operate in the presence of network partitions

Page 12

CAP Theorem
Theorem: You can have at most two of these properties for any shared-data system

Page 13

Types of NoSQL Databases
Key-Value (‘the big hash table’)
Schema-less: column-based, document-based, and graph-based

Page 14

Key-Value databases
The simplest NoSQL data stores to use from an API perspective
The client can get the value for a key, put a value for a key, or delete a key from the data store
The data store just stores the value as a blob, without caring what is inside
The value can hold whatever the aggregate contains, but an aggregate can only be accessed by a lookup based on its key
Examples: Riak, Redis, Memcached, Berkeley DB, HamsterDB, Amazon DynamoDB (not open-source), Project Voldemort, and Couchbase
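
A minimal in-memory Python sketch of that get/put/delete interface. The KeyValueStore class is hypothetical and is not the API of Redis, Riak, or any other product listed above. Values are kept as opaque bytes, so the store cannot look inside them: the application serializes the aggregate (JSON here) and can only find it again by its key.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Hypothetical in-memory key-value store: values are opaque blobs."""
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # deleting a missing key is a no-op

kv = KeyValueStore()
# The application, not the store, decides how the aggregate is encoded.
kv.put("session:42", json.dumps({"user": "alice", "cart": ["A1"]}).encode("utf-8"))
session = json.loads(kv.get("session:42"))
kv.delete("session:42")
```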

Page 15

Document databases
The main concept is the ‘document’: the database stores and retrieves documents, which can be XML, JSON, BSON, and so on
Documents are self-describing, hierarchical tree data structures that can consist of maps, collections, and scalar values
The documents stored are similar to each other but do not have to be exactly the same
Documents are stored in the ‘value’ part of a key-value store, i.e., a key-value store where the values are examinable
Examples: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
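
To show what ‘examinable values’ means, here is a hypothetical Python DocumentStore sketch, not the API of MongoDB or any other product. It parses each JSON value on insert, so queries can match on fields inside the documents, and the stored documents do not all need the same fields.

```python
import json
from typing import Any, Dict, List

class DocumentStore:
    """Hypothetical document store: a key-value store whose values are parsed JSON."""
    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc_id: str, json_text: str) -> None:
        self._docs[doc_id] = json.loads(json_text)

    def get(self, doc_id: str) -> Dict[str, Any]:
        return self._docs[doc_id]          # lookup by key still works

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Match documents whose top-level fields equal every criterion;
        # a document that lacks one of the fields does not match.
        return [doc for doc in self._docs.values()
                if all(f in doc and doc[f] == v for f, v in criteria.items())]

people = DocumentStore()
# Similar documents, but not exactly the same shape.
people.insert("1", '{"name": "Alice", "city": "Dhaka", "tags": ["admin"]}')
people.insert("2", '{"name": "Bob", "city": "Dhaka", "address": {"zip": "1205"}}')
people.insert("3", '{"name": "Carol", "city": "Oslo"}')
print([d["name"] for d in people.find(city="Dhaka")])  # ['Alice', 'Bob']
```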

Page 16

Column family stores
Store data in column families as rows that have many columns associated with a row key
Column families are groups of related data that is often accessed together
Different rows do not have to have the same columns
Columns can be added to any row at any time without having to add them to other rows
Examples: Cassandra, HBase, Hypertable, Amazon DynamoDB
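
The row-key and column-family layout can be sketched in Python with nested dictionaries: row key, then column family, then column name. This is a hypothetical in-memory model that ignores the storage and distribution details of real stores such as Cassandra or HBase.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical layout: row key -> column family -> column name -> value
Table = Dict[str, Dict[str, Dict[str, str]]]

def new_table() -> Table:
    return defaultdict(lambda: defaultdict(dict))

def put(table: Table, row_key: str, family: str, column: str, value: str) -> None:
    # Writing to a column creates it for this row only; other rows are unaffected.
    table[row_key][family][column] = value

def get_family(table: Table, row_key: str, family: str) -> Dict[str, str]:
    # Read one column family of one row: related data that is accessed together.
    return dict(table.get(row_key, {}).get(family, {}))

users = new_table()
put(users, "user1", "profile", "name", "Alice")
put(users, "user1", "profile", "email", "alice@example.com")
put(users, "user2", "profile", "name", "Bob")
put(users, "user2", "activity", "last_login", "2017-04-14")  # column only user2 has
print(get_family(users, "user1", "profile"))  # {'name': 'Alice', 'email': 'alice@example.com'}
print(get_family(users, "user2", "profile"))  # {'name': 'Bob'}
```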

Page 17

Graph stores
Allow you to store entities and the relationships between these entities
Entities are also known as nodes; a node can be an instance of an object in the application
Relations are known as edges
Nodes are organized by relationships, which allows you to find interesting patterns between the nodes
In a relational database, a complex relationship requires complex joins
When storing a graph-like structure in an RDBMS, you model the graph beforehand based on the traversal you need; if the traversal changes, the data will have to change

Page 18

Graph stores
Traversing joins or relationships inside the database is very fast
Nodes can have different types of relationships
The value of graph databases is derived from the relationships
Relationships don’t only have a type, but also a start node and an end node
Adding new relationship types is easy; changing existing nodes and relationships is similar to a data migration
Examples: Neo4J, Infinite Graph, OrientDB, FlockDB
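
A small Python sketch of typed relationships and traversal, assuming a hypothetical in-memory Graph class (not Neo4J's or any other product's API). Each relationship is stored as a type plus a start and an end node, and a breadth-first search follows only relationships of the requested type. In a relational schema, the same question (friends of friends, to any depth) would need repeated self-joins or a recursive query.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

class Graph:
    """Hypothetical graph store: nodes plus typed, directed relationships."""
    def __init__(self) -> None:
        # start node -> list of (relationship type, end node)
        self.outgoing: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, start: str, rel_type: str, end: str) -> None:
        # New relationship types need no schema change: any string works.
        self.outgoing[start].append((rel_type, end))

    def reachable(self, start: str, rel_type: str) -> Set[str]:
        """Nodes reachable from start via rel_type edges only (breadth-first)."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for edge_type, neighbor in self.outgoing.get(node, []):
                if edge_type == rel_type and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen - {start}

g = Graph()
g.relate("alice", "FRIEND", "bob")
g.relate("bob", "FRIEND", "carol")
g.relate("carol", "FRIEND", "dave")
g.relate("alice", "WORKS_AT", "acme")
print(sorted(g.reachable("alice", "FRIEND")))  # ['bob', 'carol', 'dave']
```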

Page 19

Key/Value Vs. Schema-less

Key/Value
Pros: very fast, very scalable, simple model, able to distribute horizontally
Cons: many data structures (objects) can’t be easily modeled as key-value pairs

Schema-less
Pros: the schema-less data model is richer than key-value pairs; eventual consistency; many are distributed; still provide excellent performance and scalability
Cons: typically no ACID transactions or joins

Page 20

SQL Vs. NoSQL

Types
SQL: One type (SQL database), with minor variations
NoSQL: Many different types: key-value, document, column-family, and graph databases

Development History
SQL: Developed in the 1970s to deal with the first wave of data storage applications
NoSQL: Developed in the 2000s to deal with the limitations of SQL databases, particularly concerning scale, replication, and unstructured data storage

Examples
SQL: MySQL, Postgres, Oracle
NoSQL: MongoDB, Cassandra, HBase, Neo4J

Data Storage Model
SQL: Individual records are stored as rows in tables with columns, much like a spreadsheet. Separate data is stored in separate tables, and join operations are used for querying the data
NoSQL: Varies based on database type. For example, key-value stores function similarly to SQL databases but have only two columns, ‘key’ and ‘value’, with more information sometimes stored in the ‘value’. Document databases do away with the table-and-row model, storing all relevant data together in a single document in JSON, XML, etc.

Page 21

SQL Vs. NoSQL

Schemas
SQL: Predefined, i.e., structure and data types are fixed
NoSQL: Dynamic; unlike SQL, dissimilar data can be stored if necessary

Scaling
SQL: Vertical, i.e., a single server must be made increasingly powerful; spreading a SQL database over many servers requires additional engineering
NoSQL: Horizontal, i.e., to add capacity, a database administrator can simply add more commodity servers or cloud instances

Sharding
SQL: Manual sharding
NoSQL: Auto-sharding

Development Model
SQL: Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle)
NoSQL: Open-source

Supports Transactions
SQL: Yes; updates can be configured to complete entirely or not at all
NoSQL: In certain circumstances and at certain levels (e.g., document level vs. database level)

Data Manipulation
SQL: Specific language using SELECT, INSERT, and UPDATE statements, e.g. SELECT fields FROM table WHERE
NoSQL: Object-oriented APIs

Consistency
SQL: Strong consistency
NoSQL: Depends on product. Some provide strong consistency (e.g., MongoDB), whereas others offer eventual consistency (e.g., Cassandra)
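
The Data Manipulation row can be illustrated in Python. The SQL half uses the standard-library sqlite3 module. The NoSQL half is a hypothetical Collection class written only to show the object-oriented style (method calls instead of a query language); it is not the API of any real NoSQL driver.

```python
import sqlite3
from typing import Any, Dict, List

# SQL: a dedicated statement language (CREATE, INSERT, UPDATE, SELECT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO users (name, city) VALUES (?, ?)",
                 [("Alice", "Dhaka"), ("Bob", "Oslo")])
conn.execute("UPDATE users SET city = ? WHERE name = ?", ("Dhaka", "Bob"))
sql_names = [row[0] for row in
             conn.execute("SELECT name FROM users WHERE city = ? ORDER BY name", ("Dhaka",))]

# Hypothetical object-style API: the same operations as method calls.
class Collection:
    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def add(self, doc: Dict[str, Any]) -> None:
        self.docs.append(dict(doc))

    def find(self, match: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [d for d in self.docs
                if all(k in d and d[k] == v for k, v in match.items())]

    def update_where(self, match: Dict[str, Any], changes: Dict[str, Any]) -> None:
        for doc in self.find(match):
            doc.update(changes)  # find() returns the stored dicts, so this edits them

users = Collection()
users.add({"name": "Alice", "city": "Dhaka"})
users.add({"name": "Bob", "city": "Oslo"})
users.update_where({"name": "Bob"}, {"city": "Dhaka"})
api_names = sorted(d["name"] for d in users.find({"city": "Dhaka"}))
```

Both halves end with the same answer; the difference is whether the operations are written as SQL text or as calls on application objects.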

Page 22

Handling Relational Data
NoSQL databases lack the ability to do joins in queries. There are three main techniques for handling relational data:
Multiple queries: instead of retrieving all the data with one query, it is acceptable to do several queries
Caching/replication/non-normalized data: instead of storing only foreign keys, it is common to store the actual foreign values along with the model’s data
Nesting data: put more data in a smaller number of collections, so that a single document can contain all the data needed for a specific task
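
The three techniques can be sketched in Python with plain dictionaries standing in for NoSQL collections. The authors and posts data, and the function names, are hypothetical.

```python
from typing import Any, Dict

# Hypothetical collections keyed by id; the store itself offers no joins.
authors: Dict[str, Dict[str, Any]] = {"a1": {"name": "Alice"}}
posts: Dict[str, Dict[str, Any]] = {"p1": {"title": "Intro to NoSQL", "author_id": "a1"}}

# 1. Multiple queries: two lookups, combined in application code.
def post_with_author(post_id: str) -> Dict[str, Any]:
    post = posts[post_id]                   # first query
    author = authors[post["author_id"]]     # second query, replacing the join
    return {**post, "author_name": author["name"]}

# 2. Non-normalized data: store the foreign value next to the foreign key,
#    so one read is enough.
denormalized_posts = {
    "p1": {"title": "Intro to NoSQL", "author_id": "a1", "author_name": "Alice"},
}

# 3. Nesting: one document holds everything a task needs, e.g. an author page.
author_pages = {
    "a1": {"name": "Alice", "posts": [{"title": "Intro to NoSQL"}]},
}

def rename_author(author_id: str, new_name: str) -> None:
    # The cost of copying data: every copy has to be kept in step.
    authors[author_id]["name"] = new_name
    author_pages[author_id]["name"] = new_name
    for post in denormalized_posts.values():
        if post["author_id"] == author_id:
            post["author_name"] = new_name
```

Techniques 2 and 3 trade cheaper reads for more work on writes, as rename_author shows.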

Page 23

Benefits of NoSQL
Cheap and easy to implement (open source)
Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
Down nodes are easily replaced
No single point of failure
Easy to distribute
Don’t require a schema
Can scale up and down
Relax the data consistency requirement (CAP)

Page 24

Conclusion
NoSQL databases don’t mean the demise of RDBMS databases
They improve programmer productivity
They improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput
We are entering an era of ‘Polyglot Persistence’: a technique that uses different data storage technologies to handle varying data storage needs
It can apply across an enterprise or within a single application
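
Polyglot persistence within a single application can be sketched in Python as below. The ShopStorage class and its split are hypothetical: orders go to a relational database (the standard-library sqlite3 module, where using the connection as a context manager commits or rolls back the insert), while short-lived session data goes to a key-value store, for which a plain dict stands in.

```python
import json
import sqlite3
from typing import Any, Dict, Optional

class ShopStorage:
    """Hypothetical application storage that mixes two technologies."""
    def __init__(self) -> None:
        # Relational store for orders, where transactions matter.
        self.orders_db = sqlite3.connect(":memory:")
        self.orders_db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
        # Key-value store for sessions (a dict stands in for a real one).
        self.sessions: Dict[str, bytes] = {}

    def place_order(self, customer: str, total: float) -> int:
        with self.orders_db:  # commits on success, rolls back on an exception
            cursor = self.orders_db.execute(
                "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total))
        return cursor.lastrowid

    def order_total(self, order_id: int) -> float:
        row = self.orders_db.execute(
            "SELECT total FROM orders WHERE id = ?", (order_id,)).fetchone()
        return row[0]

    def save_session(self, session_id: str, data: Dict[str, Any]) -> None:
        self.sessions[session_id] = json.dumps(data).encode("utf-8")

    def load_session(self, session_id: str) -> Optional[Dict[str, Any]]:
        raw = self.sessions.get(session_id)
        return json.loads(raw) if raw is not None else None
```

The rest of the application calls only ShopStorage methods, so either backing store could later be swapped for another technology without touching the callers.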


Page 26

Q & A

Thank You!