Exploring NoSQL and Implementing through Cassandra

Post on 11-Apr-2017

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

DILEEP KALIDINDI

23rd February 2015

Explore, Build & Operate NoSQL with Apache Cassandra

Who am I?

Dileep Varma Kalidindi

Current: Senior Engineer @Responsys (since Apr’14), Circles Team.

Fascination: Problem solving, distributed & Big Data churning systems.

Past: 8+ years with VeriSign, Informatica Labs, NTT Data.

Hobbies: Adventure sports.

Are we good?

Data

Data has never had a single structure, and neither have its modelling techniques.

Applications evolved from OLAP and OLTP to Web, Mobile & Social.

Big Data comes with different characteristics – Volume, Velocity, Variety, Veracity & Value.

Responsys Data:

Need for better-suited data models and storage models, but why?

Impedance Mismatch – Data model & Storage model

SQL relational model is user-oriented

In-store concurrency, integrity, consistency and data type validity

Transactional guarantees, schemas and referential integrity

Purpose-built applications tend to control integrity and validity themselves (not aggregation fancy)

Difference between the persistent data model and the in-memory data structures.

Data duplication and denormalization are now first-class citizens!

Scale-up to scale-wide – NoSQL multi-node vs RDBMS clustering.

Conceptual – ACID, BASE & CAP

Transactions, consistency and availability – could we prioritize?

CAP theorem - consequences

Agenda

NoSQL

NoSQL Implementations – for various purposes

Architecture fit – Polyglot persistence

Data modelling – concepts in view of NoSQL

Cassandra – Architecture

Database Internals

CQL & DEMO

Installation, Configuration & tools

Oracle NoSQL – pitch by Sheetal

# NoSQL

NoSQL

Non-relational, distributed, open-source & horizontally scalable #nxtGen

NoSQL is an accidental neologism.

Schema-less storage systems built for the 5 V’s of Big Data.

Decentralized – Every node in the cluster is identical

High availability – no SPoF (single point of failure), designed to tolerate network failures

Open source and no-cost models (except for enterprise support)

NoSQL – Architecture fit-in

Polyglot persistence thinking fits the right data store to the appropriate data sets.

Service usage over direct data usage.

Concerns

Operational concerns like licensing, support, tools, upgrade, auditing.

Security of datastore, contexts, authorization etc.

Integration with ETL and data transfer utilities.

Deployment complexity

Data models – in view of NoSQL

NoSQL models are application-specific: “What questions do I have?”

Relational models are driven by the structure of the data: “What answers do I have?”

Modelling techniques

Conceptual: Denormalization, Aggregates & Application side joins

General: Atomic aggregates, Enumerable keys, Dimensionality reduction, Index table & Composite key index

Hierarchical: Tree aggregation, Materialized paths, Nested sets & batch graph processing
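One of the hierarchical techniques listed above, materialized paths, can be sketched in a few lines of Python. This is only an illustration under an assumption: the store keeps keys in sorted order, so a whole subtree can be fetched with one prefix range scan. The category data and the `subtree` helper are hypothetical and not taken from the talk.

```python
from bisect import bisect_left

# Hypothetical category tree: each node is stored under its full path.
rows = {
    "/electronics": "Electronics",
    "/electronics/phones": "Phones",
    "/electronics/phones/android": "Android",
    "/electronics/laptops": "Laptops",
    "/books": "Books",
}

# Assumption: the store keeps keys sorted, as many ordered key-value stores do.
sorted_keys = sorted(rows)


def subtree(prefix):
    """Return every path at or below `prefix` using a single range scan."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if key == prefix or key.startswith(prefix + "/"):
            result.append(key)
        elif not key.startswith(prefix):
            break  # left the prefix range, nothing further can match
    return result
```

Calling `subtree("/electronics/phones")` walks only the keys that share that prefix, which is why no join or recursive lookup is needed to read a branch of the tree.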

Data models – deep view

Conceptual: Denormalization

Query data volume or IO per query vs total data volume

Processing complexity vs total data volume

Aggregates:

Simple Atomic

Tree aggregation:

NoSQL - implementations

If one implementation fits all, then why not RDBMS?

Classification is driven from the application's point of view!

Key-Value

Strong aggregate which is opaque to the database

Oracle NoSQL, Windows Azure & Redis

Document database

Structure in the aggregate

MongoDB, CouchDB & RavenDB

NoSQL - implementations

Column family structures

Two-level aggregate structure

Key & a row aggregate; the row aggregate is a group of columns.

Bigtable, HBase & Cassandra

Graph database

Neo4j

NoSQL – implementations – CAP fit

Apache Cassandra - Continuous availability, linear scalability & operational simplicity

About

Column store NoSQL database.

Originally developed by Facebook (2007) and now an Apache project

Masterless architecture with all nodes in a ring topology

Commercial add-ons & support (“enterprise edition”) by DataStax

Data center replication, Scalability (wide), Fault-tolerance & Tunable consistency.

Online load balancing, flexible schema, key-oriented queries & CAP-aware

Implementation of good security standards, operations, monitoring & utilities.

Column – Key-value pair

Counter column

Expiring column

Super column

Column family – Collection of rows - Map<RowKey, OrderedColumnCollection>

Dynamic (Wide)

Static (Narrow)

Keyspace – contains column families & super column families
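The column family map described above can be approximated with a small Python sketch. It is a simplification, not Cassandra's storage engine: it assumes timestamps and TTLs are plain integers in seconds, that the newest timestamp wins when a column is rewritten, and that an expiring column simply disappears from reads once its TTL has passed. The `Column` and `ColumnFamily` classes are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Column:
    name: str
    value: str
    timestamp: int              # write time in seconds (simplified)
    ttl: Optional[int] = None   # seconds to live; None means the column never expires


class ColumnFamily:
    """Row key -> columns, with columns returned in column-name order."""

    def __init__(self, name):
        self.name = name
        self.rows: Dict[str, Dict[str, Column]] = {}

    def insert(self, row_key, column):
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Last write wins: keep the column with the newest timestamp.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get_row(self, row_key, now):
        """Return the columns of a row that are still live at time `now`."""
        row = self.rows.get(row_key, {})
        live = [c for c in row.values()
                if c.ttl is None or c.timestamp + c.ttl > now]
        return sorted(live, key=lambda c: c.name)
```

A wide (dynamic) row is then just a row whose inner dictionary keeps growing with new column names, while a narrow (static) row uses the same few column names for every key.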

Cassandra – data model

CAP values – AP (Availability & Partition tolerance). Consistency is eventual, and stronger consistency is available at the cost of latency. No row locking (HBase wins!)
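The trade-off between consistency and latency can be made concrete with the usual replica-overlap rule: with N replicas, a read that contacts R replicas is guaranteed to see a write acknowledged by W replicas when R + W > N. The sketch below only does that arithmetic; the function names are hypothetical and it does not talk to a cluster.

```python
def quorum(replication_factor):
    """Smallest majority of replicas, e.g. 2 out of 3."""
    return replication_factor // 2 + 1


def overlaps(read_replicas, write_replicas, replication_factor):
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, quorum reads plus quorum writes overlap (2 + 2 > 3), while reading and writing a single replica does not (1 + 1 <= 3), which is the eventually consistent, lower-latency end of the dial.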

Linear scaling of Cassandra – throughput vs number of nodes.

Cassandra cluster – Partitioner generates tokens for row keys

Write in action

Read in action
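How a partitioner maps row keys onto a ring of nodes can be sketched as follows. This is a toy model under stated assumptions: MD5 stands in for the partitioner's hash function, each node owns a single token, and replicas are chosen by walking the ring clockwise from the owning node. The `Ring` class and the node names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(row_key):
    """Hash a row key to an integer token (MD5 used here as a stand-in)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> the token that node owns
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas_for_token(self, tok, replication_factor=1):
        # The first node whose token is >= tok owns it; past the end, wrap to the start.
        idx = bisect_left(self.tokens, tok) % len(self.entries)
        count = min(replication_factor, len(self.entries))
        return [self.entries[(idx + i) % len(self.entries)][1] for i in range(count)]

    def replicas(self, row_key, replication_factor=1):
        return self.replicas_for_token(token(row_key), replication_factor)


# Four hypothetical nodes with evenly spaced tokens over the 128-bit MD5 range.
ring = Ring({"a": 0, "b": 2**126, "c": 2**127, "d": 3 * 2**126})
```

Because every node can compute `ring.replicas(key, 3)` locally, any node can coordinate a read or write, which fits the masterless design described in the talk.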

Cassandra – Architecture

Installation & Configuration

Yum installation is the easiest - /etc/yum.repos.d/datastax.repo

cassandra.yaml configuration

cluster_name, data_file_directories, commitlog_directory (directory locations)

Start Cassandra: cassandra -f

Start CLI: cqlsh

Stop Cassandra: service stop or process kill

Demo

CQL in action

CQL 3.0 is much like SQL. All names are case-insensitive unless enclosed in double quotes

CQL data types

Create keyspace: Responsys_Demo

Create table, index, user

Other SQL-like functions

Cassandra – Monitoring

JMX interface – DEMO

Nodetool – Cassandra JMX interface

cfstats, netstats, ring & other operations

DataStax OpsCenter

Nagios monitoring

Cassandra logging & GC logging

Summary, Conclusions & References

Summary – Quick recap

Data evolution

ACID, BASE & CAP

NoSQL, data models, implementations

Cassandra & data model

Architecture

Installations & Operations

Q & A

Thank you

APPENDIX