bigdata, nosql & elasticsearch

BIG DATA, NOSQL & ELASTICSEARCH BY SANURA HETTIARACHCHI INTERN AT VOCANIC

Upload: sanura-hettiarachchi

Post on 24-Jun-2015

Category:

Technology



DESCRIPTION

These slides briefly explain the big data concept, NoSQL database systems, and Elasticsearch, all of which are fairly modern technologies.

TRANSCRIPT

Page 1: BigData, NoSQL & ElasticSearch

BIG DATA, NOSQL & ELASTICSEARCH

BY SANURA HETTIARACHCHI

INTERN AT VOCANIC

Page 2: BigData, NoSQL & ElasticSearch

GRAPH DATABASE

• Database that uses graph structures with nodes, edges, and properties to represent and store data.

• Nodes represent entities.

• Properties are pertinent information relating to nodes.

• Edges represent the relationships between nodes. Much of the important information is stored in the edges.
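
The node, edge, and property model above can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not how any particular graph database stores data; the class names (`Node`, `Edge`, `Graph`) and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity, with properties attached to it."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labelled, directed relationship between two nodes."""
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory graph: nodes keyed by id, edges in a list."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, properties))

    def neighbours(self, node_id, label=None):
        """Follow outgoing edges from node_id, optionally filtered by label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("acme", kind="company")
g.add_edge("alice", "bob", "KNOWS", since=2014)
g.add_edge("alice", "acme", "WORKS_AT")

print(g.neighbours("alice"))
print(g.neighbours("alice", label="KNOWS"))
```

Note that the question "who does alice know?" is answered by walking edges rather than by joining tables.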

Page 3: BigData, NoSQL & ElasticSearch

ADVANTAGES & DISADVANTAGES

• Faster for associative data sets

• Map more directly to the structure of object-oriented applications.

• Do not typically require expensive join operations.

• Depend less on a rigid schema, so they are better suited to managing ad hoc and changing data with evolving schemas.

o Relational databases are typically faster at performing the same operation on large numbers of data elements.

o Relational databases are well known.

Page 4: BigData, NoSQL & ElasticSearch

BIG DATA

• Any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.

• Requires "massively parallel software running on tens, hundreds, or even thousands of servers".

Page 5: BigData, NoSQL & ElasticSearch

FACTORS OF GROWTH, CHALLENGES AND OPPORTUNITIES OF BIG DATA

• Volume – the quantity of data that is generated.

• Variety – the category to which the data belongs.

• Velocity – how fast the data is generated and processed to meet the demands.

• Variability – the inconsistency which can be shown by the data at times.

• Complexity – data needs to be linked, connected and correlated in order to be able to grasp the information.

Page 6: BigData, NoSQL & ElasticSearch

HORIZONTAL & VERTICAL SCALING

• Horizontal scaling - scale by adding more machines to your pool of resources.

• Vertical scaling - scale by adding more power (CPU, RAM, etc.) to your existing machine.

• Horizontal scaling is easier to scale dynamically by adding more machines into the existing pool.

• Vertical scaling is often limited to the capacity of a single machine.

• Examples of horizontally scaled systems are cloud data stores such as DynamoDB, Cassandra, and MongoDB.

• An example of a vertically scaled system is MySQL, including MySQL hosted on Amazon RDS (a managed cloud database service).
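
As a rough illustration of horizontal scaling, the sketch below routes each key to one machine in a pool by hashing the key. The function name `pick_machine`, the node names, and the use of CRC32 with modulo are assumptions for the example; stores such as those listed above use their own placement schemes.

```python
import zlib


def pick_machine(key, machines):
    """Map a key to one machine by hashing it (hypothetical placement rule)."""
    return machines[zlib.crc32(key.encode("utf-8")) % len(machines)]


pool = ["node-1", "node-2"]
keys = ["user:1", "user:2", "user:3", "user:4"]

# Placement with two machines in the pool.
before = {k: pick_machine(k, pool) for k in keys}

# Scale horizontally: add a machine, then recompute the placement.
pool.append("node-3")
after = {k: pick_machine(k, pool) for k in keys}

print(before)
print(after)
```

With plain modulo placement, adding a machine can move many existing keys to a different node. Consistent hashing is a common way to reduce that movement.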

Page 7: BigData, NoSQL & ElasticSearch

ACID

• Atomicity - all of a transaction happens, or none of it does.

• Consistency - every transaction moves the database from one valid state to another, so defined rules and constraints always hold.

• Isolation - one transaction cannot read data from another transaction that is not yet completed.

• Durability - once a transaction is complete, it is guaranteed that all of the changes have been recorded to a durable medium.
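
Atomicity can be seen in a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are made up for the example. The second update violates a `CHECK` constraint, so the first update is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


failed = transfer(conn, "a", "b", 150)   # would make a's balance negative
after_failed = balances(conn)
succeeded = transfer(conn, "a", "b", 40)
after_success = balances(conn)

print(failed, after_failed)
print(succeeded, after_success)
```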

Page 8: BigData, NoSQL & ElasticSearch

NOSQL

• Basically a large serialized object store

• (mostly) retrieve objects by defined ID

• In general, doesn’t support complicated queries

• Doesn’t have a structured schema

• Recommends de-normalization

• Designed to be distributed (cloud-scale) out of the box

• Because of this, drops the ACID requirements

• Any database can answer any query

• Any write query can operate against any database and will “eventually” propagate to other distributed servers
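
The "serialized object store, retrieved by ID" idea can be sketched as follows. `ObjectStore` is a hypothetical in-memory stand-in, not the API of any real NoSQL product. The stored order is de-normalized: the customer details are embedded in the order instead of living in a separate table.

```python
import json


class ObjectStore:
    """Hypothetical store: each object is serialized to JSON and kept under an ID."""

    def __init__(self):
        self._blobs = {}

    def put(self, object_id, obj):
        self._blobs[object_id] = json.dumps(obj)

    def get(self, object_id):
        return json.loads(self._blobs[object_id])


store = ObjectStore()

# De-normalized document: customer data is copied into the order itself,
# so reading the order needs one lookup by ID and no join.
store.put(
    "order:1",
    {
        "customer": {"name": "Alice", "city": "Colombo"},
        "items": [{"sku": "A1", "qty": 2}],
    },
)

order = store.get("order:1")
print(order["customer"]["name"], order["items"][0]["qty"])
```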

Page 9: BigData, NoSQL & ElasticSearch

BASE - THE OPPOSITE OF ACID

• Basically Available – guaranteed availability

• Soft-state – the state of the system may change, even without a query (because of node updates)

• Eventually Consistent – the system will become consistent over time
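
Eventual consistency can be illustrated with a toy cluster in which a write lands on one replica first and reaches the others only when propagation runs. The `Cluster` and `Replica` classes are assumptions for this sketch; real systems propagate updates in the background over the network.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster: writes hit one replica, then propagate to the rest later."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # queued updates as (replica_index, key, value)

    def write(self, key, value, via=0):
        self.replicas[via].data[key] = value
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, value))

    def read(self, key, via=0):
        return self.replicas[via].data.get(key)

    def propagate(self):
        """Deliver all queued updates; afterwards every replica agrees."""
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()


cluster = Cluster(3)
cluster.write("x", 1, via=0)

stale = cluster.read("x", via=1)   # replica 1 has not seen the write yet
cluster.propagate()
fresh = cluster.read("x", via=1)   # now consistent with replica 0

print(stale, fresh)
```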

Page 10: BigData, NoSQL & ElasticSearch

WHY NOSQL?

• Today, data is becoming easier to access and capture through third parties such as Facebook, Google+ and others.

• Personal user information, social graphs, geo-location data, user-generated content and machine logging data are just a few examples of data that has been increasing exponentially.

• Using the above services properly requires processing huge amounts of data, a workload that SQL databases were never designed for and handle poorly.

• NoSQL databases have evolved to handle this huge data properly.

Page 11: BigData, NoSQL & ElasticSearch

CAP THEOREM

• Consistency - This means that the data in the database remains consistent after the execution of an operation.

• Availability - This means that the system is always on, with no downtime.

• Partition Tolerance - This means that the system continues to function even if the communication among the servers is unreliable.

Distributed systems must be partition tolerant, so when a network partition occurs we have to choose between Consistency and Availability.
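
The choice between Consistency and Availability can be sketched with a toy replica that behaves differently once it is cut off from its peers. The `ReplicaNode` class and the `"CP"`/`"AP"` mode strings are illustrative assumptions, not a model of any specific database.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a stale read."""


class ReplicaNode:
    def __init__(self, mode):
        self.mode = mode          # "CP" favours consistency, "AP" availability
        self.value = "v1"         # last value this node knows about
        self.partitioned = False  # True when the node cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest value, so refuse to answer.
            raise Unavailable("cannot confirm latest value during partition")
        # An AP node always answers, even though the value may be stale.
        return self.value


cp_node = ReplicaNode("CP")
ap_node = ReplicaNode("AP")
cp_node.partitioned = True
ap_node.partitioned = True

ap_result = ap_node.read()
try:
    cp_result = cp_node.read()
except Unavailable:
    cp_result = "unavailable"

print(ap_result, cp_result)
```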

Page 12: BigData, NoSQL & ElasticSearch

DIFFERENT TYPES OF NOSQL

Column Store

• Column data is saved together, as opposed to row data

• Super useful for data analytics

• HBase (on Hadoop), Cassandra, Hypertable

Key-Value Store

• A key that refers to a payload

• MemcacheDB, Azure Table Storage, Redis

Document / XML / Object Store

• Key (and possibly other indexes) point at a serialized object

• DB can operate against values in document

• MongoDB, CouchDB, RavenDB

Graph Store

• Nodes are stored independently, and the relationships between nodes (edges) are stored with the data
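
To make the column store idea concrete, the sketch below holds the same records in a row layout and in a column layout. The field names and values are made up. An analytic query such as a sum over one field only needs to touch that field's list in the column layout.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "LK", "amount": 10},
    {"id": 2, "country": "SG", "amount": 25},
    {"id": 3, "country": "LK", "amount": 5},
]

# Column layout: all values of one field are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating one field reads a single contiguous list.
total = sum(columns["amount"])

print(columns)
print(total)
```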

Page 13: BigData, NoSQL & ElasticSearch

RDBMS VS NOSQL

RDBMS | NoSQL
Structured and organized data | Semi-structured or unorganized data
Structured Query Language (SQL) | No declarative query language
Tight consistency | Eventual consistency
ACID transactions | BASE transactions
Data and relationships stored in tables | No predefined schema

Page 14: BigData, NoSQL & ElasticSearch

ELASTICSEARCH

• Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.

• It is based on Apache Lucene, a free, open source information retrieval software library originally written in Java.

• Elasticsearch is distributed, which means that indices can be divided into shards and each shard can have zero or more replicas.
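
The shard and replica arrangement can be sketched as below. `number_of_shards` and `number_of_replicas` are Elasticsearch index settings; the values chosen, the `shard_for` helper, and the CRC32 hash are assumptions for illustration, since Elasticsearch uses its own routing hash internally.

```python
import zlib

# Index settings as they might be sent when creating an index (values assumed).
index_settings = {
    "settings": {
        "number_of_shards": 3,    # primary shards the index is divided into
        "number_of_replicas": 1,  # copies kept of each primary shard
    }
}


def shard_for(doc_id, number_of_shards):
    """Pick a primary shard for a document ID (stand-in hash, not the real one)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


n_primary = index_settings["settings"]["number_of_shards"]
n_replicas = index_settings["settings"]["number_of_replicas"]

# Each primary shard plus its replicas must be placed somewhere in the cluster.
total_shard_copies = n_primary * (1 + n_replicas)

placement = {doc_id: shard_for(doc_id, n_primary) for doc_id in ["1", "2", "3"]}

print(total_shard_copies)
print(placement)
```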

Page 15: BigData, NoSQL & ElasticSearch

OVERVIEW

• Real time

Elasticsearch supports real-time GET requests, which makes it suitable as a NoSQL solution.

• Distributed

It is built to scale horizontally out of the box. As you need more capacity, just add more nodes, and let the cluster reorganize itself to take advantage of the extra hardware.

• High Availability

Elasticsearch clusters detect and remove failed nodes, and reorganize themselves to ensure that your data is safe and accessible.

Page 16: BigData, NoSQL & ElasticSearch

OVERVIEW

• Multi Tenancy

A cluster can host multiple indices which can be queried independently or as a group.

• Document Oriented

Store complex real-world entities in Elasticsearch as structured JSON documents. All fields are indexed by default, and all the indices can be used in a single query to return results at breathtaking speed.

• Conflict Management

Optimistic version control can be used where needed to ensure that data is never lost due to conflicting changes from multiple processes.
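
Optimistic version control can be sketched generically: every document carries a version number, and a write succeeds only if the writer saw the current version. `VersionedStore` below is a hypothetical in-memory illustration of the idea, not the Elasticsearch API.

```python
class VersionConflict(Exception):
    """Raised when a writer's expected version is out of date."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, doc, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, doc)
        return new_version


store = VersionedStore()
store.put("item:1", {"stock": 10})

# Two processes read the same version of the document.
version_a, doc_a = store.get("item:1")
version_b, doc_b = store.get("item:1")

# Process A writes first and succeeds.
store.put("item:1", {"stock": doc_a["stock"] - 1}, expected_version=version_a)

# Process B's write is rejected instead of silently overwriting A's change.
try:
    store.put("item:1", {"stock": doc_b["stock"] - 3}, expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True

print(conflict, store.get("item:1"))
```

On a conflict, the losing process would typically re-read the document and retry its change against the new version.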

Page 17: BigData, NoSQL & ElasticSearch

THANK YOU