Introduction to Cassandra Architecture


Nick Bailey (@nickmbailey)

Intro to Cassandra Architecture


4.1 Cassandra - Introduction

Why does Cassandra Exist?

Dynamo Paper (2007)
• How do we build a data store that is:
  • Reliable
  • Performant
  • "Always On"
• Nothing new and shiny
• 24 papers cited

Also the basis for Riak and Voldemort

BigTable (2006)

• Richer data model
• 1 key, lots of values
• Fast sequential access
• 38 papers cited

Cassandra (2008)
• Distributed features from Dynamo
• Data model and storage from BigTable
• Graduated to a top-level Apache project on February 17, 2010

Cassandra - More than one server

• All nodes participate in a cluster
• Shared nothing
• Add or remove nodes as needed
• More capacity? Add a server

[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, and MySQL — VLDB benchmark]

Cassandra - Fully Replicated

• Clients write to their local data center
• Data syncs across the WAN
• Replication is configured per data center


Cassandra for Applications


Summary

•The evolution of the internet and online data created new problems

•Apache Cassandra was based on a variety of technologies to solve these problems

•The goals of Apache Cassandra are all about staying online and performant

•Apache Cassandra is a database best used for applications, close to your users

4.1.2 Cassandra - Basic Architecture

Row: one partition key (Partition Key 1) together with its columns (Column 1-4).

Partition: multiple rows that share the same partition key (Partition Key 1), each row holding Column 1-4.

Partition with Clustering: rows within a partition (Partition Key 1) ordered by a clustering value (Cluster 1-4), each row holding Column 1-3.

Table: a collection of partitions; rows keyed by Partition Key 1 and rows keyed by Partition Key 2 live in the same table.

Keyspace: the container for tables; Keyspace 1 holds Table 1 and Table 2, each made up of partitions (Partition Key 1, Partition Key 2).

Node = Server

Token
• Each partition key is hashed to a token (a 64-bit value with the default Murmur3 partitioner; the older RandomPartitioner produced 128-bit values)
• Consistent hash range from -2^63 to 2^63 - 1
• Each node owns a range of those values
• The token is the beginning of that range, up to the next node's token value
• Virtual nodes break these ranges down further

The cluster (one server):

Server   Token   Range
1        0       0-100

The cluster (two servers):

Server   Token   Range
1        0       0-50
2        51      51-100

The cluster (four servers):

Server   Token   Range
1        0       0-25
2        26      26-50
3        51      51-75
4        76      76-100
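As a side note (a hedged sketch, not from the deck, assuming a running cluster), you can inspect which tokens a node owns from cqlsh by querying the system tables; nodetool ring reports the same information:

-- tokens owned by the node cqlsh is connected to
SELECT listen_address, tokens FROM system.local;

-- tokens owned by every other node in the cluster
SELECT peer, tokens FROM system.peers;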

Summary

• Tables store rows of data by column
• Partitions are similar data grouped by a partition key
• Keyspaces contain tables and are grouped by data center
• Tokens show node placement in the range of cluster data

4.1.3 Cassandra - Replication, High Availability and Multi-datacenter

Replication (DC1: RF=1)

Node       Primary
10.0.0.1   00-25
10.0.0.2   26-50
10.0.0.3   51-75
10.0.0.4   76-100

Replication (DC1: RF=2)

Node       Primary   Replica
10.0.0.1   00-25     76-100
10.0.0.2   26-50     00-25
10.0.0.3   51-75     26-50
10.0.0.4   76-100    51-75

Replication (DC1: RF=3)

Node       Primary   Replica   Replica
10.0.0.1   00-25     76-100    51-75
10.0.0.2   26-50     00-25     76-100
10.0.0.3   51-75     26-50     00-25
10.0.0.4   76-100    51-75     26-50
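A minimal CQL sketch (keyspace name hypothetical) of how the replication factor from these tables would be declared; the data center name must match what the cluster's snitch reports:

CREATE KEYSPACE weather
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};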

Replication (DC1: RF=3)

A client writes to partition 15 — which nodes must acknowledge the write? That is determined by the consistency level.

Consistency level

Consistency Level   Number of Nodes Acknowledged
One                 One (read repair triggered)
Local One           One (read repair in local DC)
Quorum              51% of replicas
Local Quorum        51% of replicas in the local DC

Consistency (DC1: RF=3)

Node       Primary   Replica   Replica
10.0.0.1   00-25     76-100    51-75
10.0.0.2   26-50     00-25     76-100
10.0.0.3   51-75     26-50     00-25
10.0.0.4   76-100    51-75     26-50

A client writes to partition 15, which is replicated on 10.0.0.1, 10.0.0.2, and 10.0.0.3. With CL = One, the write is acknowledged as soon as one of those three replicas responds. With CL = Quorum, a majority of the replicas (two of the three) must acknowledge the write before the client gets an ack.
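The consistency level is chosen by the client per request rather than stored in the schema. A hedged cqlsh sketch (the keyspace name is the hypothetical one from the sketch above; raw_weather_data is the table defined later in this deck); with RF=3, QUORUM means two of the three replicas must acknowledge:

CONSISTENCY QUORUM;   -- cqlsh command: applies to subsequent requests in this session

INSERT INTO weather.raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 11, -5.0);   -- illustrative values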

Multi-datacenter (DC1: RF=3, DC2: RF=3)

DC1 nodes:

Node       Primary   Replica   Replica
10.0.0.1   00-25     76-100    51-75
10.0.0.2   26-50     00-25     76-100
10.0.0.3   51-75     26-50     00-25
10.0.0.4   76-100    51-75     26-50

DC2 nodes:

Node       Primary   Replica   Replica
10.1.0.1   00-25     76-100    51-75
10.1.0.2   26-50     00-25     76-100
10.1.0.3   51-75     26-50     00-25
10.1.0.4   76-100    51-75     26-50

A client writes partition 15 to DC1; the write is also replicated across the WAN to the three replicas for that partition in DC2.
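A hedged CQL sketch extending the earlier hypothetical keyspace so it is replicated in both data centers (the names 'DC1' and 'DC2' must match what the snitch reports); with a LOCAL_* consistency level, only replicas in the client's local data center need to acknowledge:

ALTER KEYSPACE weather
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};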

Summary

•Replication Factor indicates how many times your data is copied

•Consistency Level specifies how many replicas must acknowledge a read or write

•Replication Factor together with Consistency Level is critical for uptime

4.2.1.1.3 Cassandra - Read and Write Path (Node Architecture)

Writes

CREATE TABLE raw_weather_data (
  wsid text, year int, month int, day int, hour int,
  temperature double, dewpoint double, pressure double,
  wind_direction int, wind_speed double,
  sky_condition int, sky_condition_text text,
  one_hour_precip double, six_hour_precip double,
  PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);

Writes

CREATE TABLE raw_weather_data (
  wsid text, year int, month int, day int, hour int,
  temperature double,
  PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 10, -5.6);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 9, -5.1);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 8, -4.9);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.3);

Write Path

The client issues:

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.3);

On the node, the write is appended to the commit log on disk and written to the memtable in memory. Memtables are later flushed to immutable SSTables on disk, and compaction merges SSTables in the background.

Read Path

The client issues:

SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999'
  AND year = 2005 AND month = 12 AND day = 1
  AND hour >= 7 AND hour <= 10;

On the node, the read merges data from the memtable in memory with data from the relevant SSTables on disk before returning the result to the client.

Summary

• By default, writes are durable
• The client receives an ack when the consistency level is achieved
• Reads must always go to disk
• Compaction is data housekeeping


Questions?
