cassandra explained 100922022328 phpapp01
TRANSCRIPT
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
1/50
Cassandra Explained
Disruptive Code
September 22, 2010
Eric [email protected]
@jericevanshttp://blog.sym-link.com
mailto:[email protected]:[email protected] -
8/6/2019 Cassandra Explained 100922022328 Phpapp01
2/50
Outline
Background
Description
API(s) Code Samples
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
3/50
Background
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
4/50
The Digital Universe
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
5/50
Consolidation
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
6/50
Old Guard
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
7/50
Vertical Scaling Sucks
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
8/50
Influential Papers
BigTable
Strong consistency
Sparse map data model GFS, Chubby, et al
Dynamo
O(1) distributed hash table (DHT) BASE (aka eventual consistency)
Client tunable consistency/availability
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
9/50
NoSQL
HBase
MongoDB
Riak Voldemort
Neo4J
Cassandra
Hypertable
HyperGraphDB
Memcached Tokyo Cabinet
Redis
CouchDB
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
10/50
NoSQL Big data
HBase
MongoDB
Riak Voldemort
Neo4J
Cassandra
Hypertable
HyperGraphDB
Memcached Tokyo Cabinet
Redis
CouchDB
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
11/50
Bigtable / Dynamo
Bigtable
HBase
Hypertable
Dynamo
Riak
Voldemort
Cassandra ??
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
12/50
Dynamo-Bigtable Lovechild
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
13/50
CAP Theorem Pick Two
CP
Bigtable
Hypertable
HBase
AP
Dynamo
Voldemort
Cassandra
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
14/50
CAP Theorem Pick Two
Consistency Availability
Partition Tolerance
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
15/50
Description
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
16/50
Properties
Symmetric
No single point of failure
Linearly scalable Ease of administration
Flexible partitioning, replica placement
Automated provisioning High availability (eventual consistency)
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
17/50
P2P Routing
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
18/50
P2P Routing
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
19/50
Partitioning
Random
128bit namespace, (MD5)
Good distribution
Order Preserving
Tokens determine namespace
Natural order (lexicographical)
Range / cover queries
Yours ??
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
20/50
Replica Placement
SimpleSnitch
Default
N-1 successive nodes
RackInferringSnitch
Infers DC/rack from IP
PropertyFileSnitch
Configured w/ a properties file
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
21/50
Bootstrap
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
22/50
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
23/50
Bootstrap
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
24/50
Choosing Consistency
Read
Level Description
ZERO N/AANY N/A
ONE 1 replica
QUORUM (N / 2) +1
ALL All replicas
Write
Level Description
ZERO Hail MaryANY 1 replica (HH)
ONE 1 replica
QUORUM (N / 2) +1
ALL All replicas
R + W > N
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
25/50
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
26/50
Quorum ((N/2) + 1)
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
27/50
Data Model
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
28/50
Overview
Keyspace
Uppermost namespace
Typically one per application
ColumnFamily Associates records of a similar kind
Record-level Atomicity
Indexed Column
Basic unit of storage
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
29/50
Sparse Table
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
30/50
Column
name
byte[]
Queried against (predicates)
Determines sort order value
byte[]
Opaque to Cassandra
timestamp
long
Conflict resolution (Last Write Wins)
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
31/50
Column Comparators
Bytes
UTF8
TimeUUID
Long
LexicalUUID
Composite (third-party)
http://github.com/edanuff/CassandraCompositeType
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
32/50
API
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
33/50
RPC
THRIFT
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
34/50
RPC
THRIFT
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
35/50
RPC
AVRO
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
36/50
RPC
AVRO
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
37/50
Idiomatic Client Libraries
Pelops, Hector (Java)
Pycassa (Python)
Cassandra (Ruby)
Others
http://wiki.apache.org/cassandra/ClientOptions
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
38/50
Code Samples
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
39/50
Pycassa Python Client API
# creating a connectionfrom pycassa import connecthosts = ['host1:9160', 'host2:9160']client = connect('Keyspace1', hosts)
# creating a column family instancefrom pycassa import ColumnFamilycf = ColumnFamily(client, Standard1)
# reading/writing a columncf.insert('key1', {'name': 'value'})print cf.get('key1')['name']
1. http://github.com/vomjom/pycassa
http://github.com/vomjom/pycassahttp://github.com/vomjom/pycassa -
8/6/2019 Cassandra Explained 100922022328 Phpapp01
40/50
Address Book Setup
# conf/cassandra.yamlkeyspaces:- name: AddressBook
column_families:
- name: Addressescompare_with: BytesTyperows_cached: 10000keys_cached: 50
comment: 'No comment'
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
41/50
Adding an entry
key = uuid()
columns = {'first': 'Eric',
'last': 'Evans','email': '[email protected]','city': 'San Antonio','zip': 78250
}
addresses.insert(key, columns)
mailto:'[email protected]:'[email protected] -
8/6/2019 Cassandra Explained 100922022328 Phpapp01
42/50
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
43/50
Indexing (manual)
# conf/cassandra.yamlkeyspaces:- name: AddressBook
column_families:
- name: Addressescompare_with: BytesTyperows_cached: 10000keys_cached: 50
comment: 'No comment'- name: ByCitycompare_with: UTF8Type
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
44/50
Updating the index
key = uuid()
columns = {'first': 'Eric',
'last': 'Evans','email': '[email protected]','city': 'San Antonio','zip': 78250
}
addresses.insert(key, columns)byCity.insert('San Antonio', {key: ''})
mailto:'[email protected]:'[email protected] -
8/6/2019 Cassandra Explained 100922022328 Phpapp01
45/50
Indexing (auto)
# conf/cassandra.yamlkeyspaces:- name: AddressBook
column_families:
- name: Addressescompare_with: BytesTyperows_cached: 10000keys_cached: 50
comment: 'No comment' column_metadata:- name: cityindex_type: KEYS
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
46/50
Querying the Index
from pycassa.index import create_index_expressionfrom pycassa.index import create_index_clause
e = create_index_expression('city', 'San Antonio')clause = create_index_clause([e])
results = address.get_indexed_slices(clause)
for (key, columns) in results.items(): print %(first)s %(last)s % columns
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
47/50
Timeseries
# conf/cassandra.yamlkeyspaces:- name: Sites
column_families:
- name: Statscompare_with: LongType
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
48/50
Logging values
# time as long (milliseconds since epoch)tstmp = long(time() * 1e6)
stats.insert('org.apache', {tstmp: value})
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
49/50
Slicing
begin = long(start * 1e6)
stats.get_range('org.apache',column_start=begin)
end = long((start + 86400) * 1e6)
stats.get_range(start='org.apache',finish='org.debian',column_start=begin,column_finish=end)
-
8/6/2019 Cassandra Explained 100922022328 Phpapp01
50/50
Questions?