Introduction to Cassandra Architecture
Nick Bailey (@nickmbailey)
4.1 Cassandra - Introduction
Why does Cassandra Exist?
Dynamo Paper (2007)
• How do we build a data store that is:
  • Reliable
  • Performant
  • “Always On”
• Nothing new and shiny
• 24 papers cited
Also the basis for Riak and Voldemort
BigTable (2006)
• Richer data model
• 1 key, lots of values
• Fast sequential access
• 38 papers cited
Cassandra (2008)
• Distributed features of Dynamo
• Data model and storage from BigTable
• Graduated to a top-level Apache project on February 17, 2010
Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, and MySQL. Source: VLDB benchmark]
Cassandra - Fully Replicated
• Clients write to their local data center
• Data syncs across the WAN
• Replication is configured per data center
Cassandra for Applications
Summary
•The evolution of the internet and online data created new problems
•Apache Cassandra was based on a variety of technologies to solve these problems
•The goals of Apache Cassandra are all about staying online and performant
•Apache Cassandra is a database best used for applications, close to your users
4.1.2 Cassandra - Basic Architecture
Row
[Diagram: one row, identified by Partition Key 1, holding Column 1 through Column 4]
Partition
[Diagram: four rows that all share Partition Key 1, each holding Column 1 through Column 4]
Partition with Clustering
[Diagram: one partition for Partition Key 1 holding four rows, labeled Cluster 1 through Cluster 4, each with Column 1 through Column 3]
Table
[Diagram: one table holding two partitions, Partition Key 1 and Partition Key 2, each with four rows of Column 1 through Column 4]
Keyspace
[Diagram: Keyspace 1 containing Table 1 and Table 2; each table holds rows of Column 1 through Column 4 grouped under Partition Key 1 and Partition Key 2]
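Node
[Diagram: a node is a single server]
Token
• Each partition key is hashed to a token (a 64-bit value with the default Murmur3Partitioner)
• Consistent hash range between -2^63 and 2^63 - 1
• Each node owns a range of those values
• In the diagrams that follow, a node's token marks the start of its range, which runs up to the next node's token
• Virtual nodes break these ranges down further
[Diagram: a server and its data, with a Token/Range table starting at token 0 …]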
The cluster (one server)
Token  Range
0      0-100

The cluster (two servers)
Token  Range
0      0-50
51     51-100

The cluster (four servers)
Token  Range
0      0-25
26     26-50
51     51-75
76     76-100
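The token tables above can be read as a lookup: given a partition's token, find the server whose range contains it. The following is a minimal sketch of that lookup, assuming the simplified 0-100 token space used on these slides rather than Cassandra's real token range. The server names and the `owner` function are hypothetical and only serve the illustration.

```python
from bisect import bisect_right

# Hypothetical four-node ring in the simplified 0-100 token space of the slides.
# Each entry is (token, server); a token marks the start of that server's range.
RING = [(0, "server-1"), (26, "server-2"), (51, "server-3"), (76, "server-4")]
TOKENS = [token for token, _ in RING]


def owner(partition_token: int) -> str:
    """Return the server whose token range contains partition_token."""
    if not 0 <= partition_token <= 100:
        raise ValueError("token outside the simplified 0-100 range")
    # bisect_right returns the position of the first ring token that is
    # greater than partition_token; the entry before it starts our range.
    index = bisect_right(TOKENS, partition_token) - 1
    return RING[index][1]


if __name__ == "__main__":
    for token in (15, 26, 60, 100):
        print(token, "->", owner(token))
```

Adding a server to this model only means adding one entry to `RING`; the ranges of the neighboring entries shrink accordingly, which is the behavior the one, two, and four server slides illustrate.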
Summary
• Tables store rows of data by column
• Partitions group rows that share a partition key
• Keyspaces contain tables, and replication is set per keyspace for each data center
• Tokens show node placement in the range of cluster data
4.1.3 Cassandra - Replication, High Availability and Multi-datacenter
Replication
DC1: RF=1
Node      Primary
10.0.0.1  00-25
10.0.0.2  26-50
10.0.0.3  51-75
10.0.0.4  76-100
Replication
DC1: RF=2
Node      Primary  Replica
10.0.0.1  00-25    76-100
10.0.0.2  26-50    00-25
10.0.0.3  51-75    26-50
10.0.0.4  76-100   51-75
Replication
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

Client: write to partition 15
???
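One way to answer the question on the slide is to walk the ring: the node whose range contains the token is the primary, and the next RF - 1 nodes clockwise hold the replicas. This is a minimal sketch of the placement shown in the RF tables, using the node addresses and the simplified 0-100 ranges from the slides. The `replicas` function is a hypothetical helper, not part of any Cassandra API, and it ignores racks and virtual nodes.

```python
from bisect import bisect_right

# Node addresses and range starts copied from the DC1 slides (simplified 0-100 ring).
RING = [(0, "10.0.0.1"), (26, "10.0.0.2"), (51, "10.0.0.3"), (76, "10.0.0.4")]


def replicas(partition_token: int, rf: int) -> list[str]:
    """Return the primary owner followed by the next rf - 1 nodes on the ring."""
    if not 0 <= partition_token <= 100:
        raise ValueError("token outside the simplified 0-100 range")
    if not 1 <= rf <= len(RING):
        raise ValueError("rf must be between 1 and the number of nodes")
    tokens = [token for token, _ in RING]
    primary = bisect_right(tokens, partition_token) - 1
    # Walk clockwise from the primary, wrapping around at the end of the ring.
    return [RING[(primary + step) % len(RING)][1] for step in range(rf)]


if __name__ == "__main__":
    print(replicas(15, 3))
```

For partition 15 with RF=3 this yields 10.0.0.1, 10.0.0.2, and 10.0.0.3, which agrees with the table: those are the three nodes that list 00-25 as a primary or replica range.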
Consistency level
Consistency Level  Number of Nodes Acknowledged
One                One - Read repair triggered
Local One          One - Read repair in local DC
Quorum             A majority of replicas (more than half)
Local Quorum       A majority of replicas in the local DC
Consistency
[Diagram sequence: a client writes partition 15 to the DC1 ring (RF=3, same nodes and ranges as on the Replication slides), first with CL = One, then with CL = Quorum]
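The difference between CL = One and CL = Quorum shows up when replicas are down. The sketch below is a toy model under stated assumptions: a single data center, every live replica acknowledges, and a quorum is a strict majority of the replicas (RF // 2 + 1). The function names `required_acks` and `write_succeeds` are hypothetical.

```python
def required_acks(consistency_level: str, rf: int) -> int:
    """Replica acknowledgements needed within a single data center."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return rf // 2 + 1  # strict majority of the replicas
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_succeeds(consistency_level: str, replica_up: list[bool]) -> bool:
    """True if enough live replicas exist to satisfy the consistency level."""
    acks = sum(replica_up)  # assumption: every live replica acknowledges
    return acks >= required_acks(consistency_level, len(replica_up))


if __name__ == "__main__":
    one_down = [True, True, False]   # RF=3, one replica unavailable
    two_down = [True, False, False]  # RF=3, two replicas unavailable
    for level in ("ONE", "QUORUM"):
        print(level, write_succeeds(level, one_down), write_succeeds(level, two_down))
```

With RF=3, both levels tolerate one unavailable replica, but only CL = One still accepts the write when two replicas are down.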
Multi-datacenter
Client: write to partition 15

DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

DC2: RF=3
Node      Primary  Replica  Replica
10.1.0.1  00-25    76-100   51-75
10.1.0.2  26-50    00-25    76-100
10.1.0.3  51-75    26-50    00-25
10.1.0.4  76-100   51-75    26-50
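With two data centers, the choice between Quorum and Local Quorum decides whether a write has to wait on the WAN. The sketch below computes how many acknowledgements each level needs for the RF=3 per data center setup on these slides, assuming a quorum is a strict majority and that Quorum counts replicas across all data centers. The `REPLICATION` mapping and `acks_for` function are hypothetical names for this illustration.

```python
# Replication factor per data center, as on the two-data-center slides.
REPLICATION = {"DC1": 3, "DC2": 3}


def quorum(replica_count: int) -> int:
    """Strict majority of replica_count replicas."""
    return replica_count // 2 + 1


def acks_for(consistency_level: str, local_dc: str) -> int:
    """Acknowledgements a write must collect before the client gets its ack."""
    if consistency_level in ("ONE", "LOCAL_ONE"):
        return 1
    if consistency_level == "LOCAL_QUORUM":
        # Only replicas in the client's own data center are counted.
        return quorum(REPLICATION[local_dc])
    if consistency_level == "QUORUM":
        # Replicas in every data center are counted together.
        return quorum(sum(REPLICATION.values()))
    raise ValueError(f"unsupported consistency level: {consistency_level}")


if __name__ == "__main__":
    for level in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM"):
        print(level, acks_for(level, "DC1"))
```

In this setup Local Quorum needs 2 acknowledgements, all obtainable inside DC1, while Quorum needs 4 of the 6 replicas, so at least one must come from DC2.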
Summary
•Replication Factor indicates how many times your data is copied
•Consistency Level specifies how many replicas must acknowledge a read or write
•Replication Factor along with Consistency Level are critical for uptime
4.2.1.1.3 Cassandra - Read and Write Path (Node Architecture)
Writes
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    dewpoint double,
    pressure double,
    wind_direction int,
    wind_speed double,
    sky_condition int,
    sky_condition_text text,
    one_hour_precip double,
    six_hour_precip double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);

Writes
-- Same table reduced to one measurement column for the examples that follow
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);

INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,10,-5.6);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,9,-5.1);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,8,-4.9);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.3);
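The primary key above makes wsid the partition key and (year, month, day, hour) the clustering columns, stored in descending order. The sketch below models only that ordering for the four inserted rows, in plain Python with no database involved. The tuple layout and the function `in_clustering_order` are assumptions made for the illustration.

```python
# Rows for partition wsid = '10010:99999' as (year, month, day, hour, temperature).
# They are listed in an arbitrary arrival order on purpose.
rows = [
    (2005, 12, 1, 8, -4.9),
    (2005, 12, 1, 10, -5.6),
    (2005, 12, 1, 7, -5.3),
    (2005, 12, 1, 9, -5.1),
]


def clustering_key(row):
    """The clustering columns: year, month, day, hour."""
    return row[:4]


def in_clustering_order(partition_rows):
    """Sort rows as CLUSTERING ORDER BY (... DESC) would keep them: newest first."""
    return sorted(partition_rows, key=clustering_key, reverse=True)


if __name__ == "__main__":
    for row in in_clustering_order(rows):
        print(row)
```

Because the rows of a partition are kept in this order, a query for the most recent hours of one weather station reads a contiguous run from the front of the partition.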
Write Path
Client:
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.3);
[Diagram: Client -> Node. In memory: a Memtable holding rows of wsid, year, month, day, hour, temp. On disk: the Commit Log and Data (SSTables), with Compaction across the SSTables.]
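The write path in the diagram can be sketched as a toy model: append to the commit log, update the memtable, flush the memtable to an immutable SSTable once it is full, and compact SSTables into one. This is a simplified illustration under stated assumptions, not Cassandra's implementation: the `Node` class, its `flush_threshold`, and clearing the whole commit log on flush are all hypothetical simplifications.

```python
class Node:
    """Toy model of one node's write path."""

    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []  # append-only; stands in for the on-disk commit log
        self.memtable = {}    # in-memory table: key -> value
        self.sstables = []    # immutable, sorted snapshots; stand in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value) -> str:
        self.commit_log.append((key, value))  # 1. record the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"

    def flush(self) -> None:
        """Turn the memtable into a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed writes no longer need the log

    def compact(self) -> None:
        """Merge all SSTables into one; later SSTables overwrite earlier ones."""
        merged = {}
        for sstable in self.sstables:  # oldest first
            merged.update(dict(sstable))
        self.sstables = [tuple(sorted(merged.items()))]


if __name__ == "__main__":
    node = Node(flush_threshold=2)
    for hour, temp in ((10, -5.6), (9, -5.1), (8, -4.9), (7, -5.3)):
        node.write(("10010:99999", 2005, 12, 1, hour), temp)
    print(len(node.sstables), "SSTables before compaction")
    node.compact()
    print(len(node.sstables), "SSTable after compaction")
```

The point of the ordering in `write` is that the sequential commit log append happens before the in-memory update, so an acknowledged write survives a restart even though the memtable itself is lost.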
Read Path
Client:
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999'
  AND year = 2005 AND month = 12 AND day = 1
  AND hour >= 7 AND hour <= 10;
[Diagram: Client -> Node. In memory: a Memtable holding rows of wsid, year, month, day, hour, temp. On disk: Data (SSTables). The read draws on both.]
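A read may find versions of the same row in the memtable and in several SSTables, so the node has to merge them. The sketch below assumes each stored cell carries a write timestamp and that the newest timestamp wins. The `read` function and the dictionary layout are hypothetical and leave out real-world details such as bloom filters, caches, and tombstones.

```python
def read(key, memtable, sstables):
    """Return the newest value for key across the memtable and all SSTables.

    Every store maps key -> (write_timestamp, value).
    """
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for sstable in sstables:
        if key in sstable:
            candidates.append(sstable[key])
    if not candidates:
        return None  # the key was never written
    # The version with the highest write timestamp wins.
    _, value = max(candidates, key=lambda cell: cell[0])
    return value


if __name__ == "__main__":
    key = ("10010:99999", 2005, 12, 1, 7)
    old_sstable = {key: (100, -5.9)}  # first reading, already flushed to disk
    new_sstable = {key: (200, -5.3)}  # corrected reading, flushed later
    memtable = {}                     # nothing newer in memory
    print(read(key, memtable, [old_sstable, new_sstable]))
```

Compaction reduces the number of SSTables a read like this has to consult, which is why the summary calls it data housekeeping.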
Summary
• By default, writes are durable
• The client receives an ack when the consistency level is achieved
• Reads check the memtable and may have to read several SSTables from disk
• Compaction is data housekeeping
Questions?