Jonathan Ellis, "Apache Cassandra 2.0 and 2.1". Talk at Cassandra conf...
©2013 DataStax Confidential. Do not distribute without consent.
Modern Apache Cassandra
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra
Five years of Cassandra
[Timeline: axis ticks at Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13; releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2..., 2.0; DSE]
Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases
Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support
Time series data
Multi-datacenter support
Distributed counters
Hadoop support
Application/Use Case
• Adobe AudienceManager: web analytics, content management, and online advertising
Why Cassandra?
• Low-latency
• Scalable
• Multi-datacenter
• Tuneable consistency
Bootstrapping
Tuneable consistency
• (We’ll come back to this)
Application/Use Case
• Logging
• Notifications
Why Cassandra?
• Efficient writes
• Durable
• Scalable
• High availability
Durable + efficient writes
[Animated diagram: each write is appended to the commit log (hard drive) and applied to the memtable (memory)]
write(k1, c1:v)
write(k1, c2:v)
write(k2, c1:v c2:v)
write(k1, c1:v c3:v)
Memtable after the four writes:
k1 c1:v c2:v c3:v
k2 c1:v c2:v
flush: memtable (memory) to SSTable with index / BF (hard drive), followed by cleanup
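The write path shown in the diagram can be sketched in a few lines of Python. This is a toy model only, not Cassandra's implementation: the `Node` class, its method names, and the use of plain lists and dicts in place of on-disk files are all assumptions made for illustration, and the index / BF step is left out.

```python
class Node:
    """Toy model of the write path: commit log + memtable, flushed to SSTables."""

    def __init__(self):
        self.commit_log = []  # append-only list standing in for the on-disk log
        self.memtable = {}    # in memory: row key -> {column: value}
        self.sstables = []    # immutable snapshots standing in for on-disk SSTables

    def write(self, key, columns):
        # 1. Append to the commit log first, so the write is durable.
        self.commit_log.append((key, dict(columns)))
        # 2. Merge the columns into the row held in the memtable.
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable...
        self.sstables.append(dict(sorted(self.memtable.items())))
        # ...after which the memtable and the log entries it covered can be dropped.
        self.memtable = {}
        self.commit_log = []


# The four writes from the slides, followed by a flush.
node = Node()
node.write("k1", {"c1": "v"})
node.write("k1", {"c2": "v"})
node.write("k2", {"c1": "v", "c2": "v"})
node.write("k1", {"c1": "v", "c3": "v"})
log_length_before_flush = len(node.commit_log)
node.flush()
```

Writes never modify data in place: they are sequential appends plus an in-memory update, which is why the slide pairs "durable" with "efficient".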
High availability
• 99.9999% availability on Cassandra
• (We’ll come back to this, too)
Core values
• Massive scalability
• High performance
• Ease of use
• Reliability/Availability
[Chart: VLDB benchmark (RWS). Throughput (ops/sec, 0 to 80000) vs. number of nodes (0 to 12) for Cassandra, HBase, Redis, and MySQL]
[Chart: Endpoint benchmark (RW). Throughput (ops/sec, 0 to 35000) vs. number of nodes (1, 2, 4, 8, 16, 32) for Cassandra, HBase, and MongoDB]
Ease of use
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
Classic partitioning (SPOF)
[Diagram: client → router → partition 1, partition 2, partition 3, partition 4]
(Not a theoretical problem)
https://speakerdeck.com/mitsuhiko/a-year-of-mongodb
http://aphyr.com/posts/288-the-network-is-reliable
Fully distributed, no SPOF
[Diagram: client connected to a ring of nodes holding partitions (visible labels: p1, p3, p6)]
Primary key determines placement*
Partitioning
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12  gender: M
suzy    age: 10  gender: F

PK      Murmur Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
Murmur* hash operation yields a 64-bit number for keys of any size.
The “token ring”
[Diagram: Node A, Node B, Node C, and Node D arranged in a ring]
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...

Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
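Looking up the owner of a token in the range table above amounts to a search over the sorted range ends. The sketch below assumes a shrunken 16-bit ring that uses only the leading four hex digits visible on the slides (the full tokens are truncated there); `RANGE_ENDS`, `TOKENS`, and `owner` are illustrative names, not Cassandra APIs.

```python
from bisect import bisect_left

# Inclusive end of each node's range, reduced to the leading 16 bits.
# A's range wraps around: it starts just after 0xc000 and ends at 0x0000.
RANGE_ENDS = [0x0000, 0x4000, 0x8000, 0xC000]
OWNERS = ["A", "B", "C", "D"]

# Leading 16 bits of each key's token as shown on the slides.
TOKENS = {"jim": 0x5E02, "carol": 0xA9A0, "johnny": 0xF4EB, "suzy": 0x78B4}


def owner(token):
    """Return the node whose (start, end] range contains the token."""
    index = bisect_left(RANGE_ENDS, token)
    # Tokens past the last range end wrap around to the first node.
    return OWNERS[index % len(OWNERS)]


placement = {key: owner(token) for key, token in TOKENS.items()}
```

With these values jim and suzy land on node C, carol on node D, and johnny wraps around to node A.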
[Diagram: token ring of Node A, Node B, Node C, and Node D with carol (a9a0198010...) placed on the ring]
Replication
[Diagram: token ring of Node A, Node B, Node C, and Node D with carol (a9a0198010...) replicated to additional nodes]
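One simple way to pick replicas is to start at the node that owns the token and keep walking the ring. The sketch below assumes that ring-order placement and a replication factor of 3; both are assumptions for illustration (the slides do not state the strategy or the factor), and `replicas` is a hypothetical helper.

```python
NODES = ["A", "B", "C", "D"]  # ring order, following the token ranges


def replicas(primary, replication_factor):
    """Return the primary owner plus the next nodes around the ring."""
    start = NODES.index(primary)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


# carol's token (a9a0198010...) falls in node D's range.
carol_replicas = replicas("D", 3)
```

Datacenter-aware or rack-aware placement would need more than this, which is why the sketch is limited to a single ring.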
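Virtual nodes
[Diagram: without vnodes, the ring has one range each for Node A, Node B, Node C, and Node D; with vnodes, the ring is split into many smaller ranges (visible labels: A, A’, A’’, B, B’, C, C’, C’’, D, D’)]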
A closer look at reads
[Animated diagram: client, coordinator, and three replicas that are 40% busy, 90% busy, and 30% busy]
Rapid read protection
[Animated diagram: client, coordinator, and three replicas that are 40% busy, 90% busy, and 30% busy; in the later frames the 30%-busy replica is marked X]
Rapid Read Protection
NONE
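The idea behind rapid read protection can be illustrated with a small deterministic simulation: if the first replica has not answered after a short wait, the coordinator also asks the next one and takes whichever response arrives first. The function name, the millisecond figures, and the fixed wait are assumptions for illustration, not Cassandra's actual algorithm or settings.

```python
def first_response_ms(replica_latencies_ms, speculate_after_ms=None):
    """Return when the coordinator gets its first response, or None on timeout.

    replica_latencies_ms: response time of each replica, in the order the
    coordinator would try them; None means the replica never answers.
    speculate_after_ms: wait before also asking the next replica;
    None disables speculation, so only the first replica is asked.
    """
    if speculate_after_ms is None:
        return replica_latencies_ms[0]
    best = None
    for attempt, latency in enumerate(replica_latencies_ms):
        sent_at = attempt * speculate_after_ms
        if best is not None and best <= sent_at:
            break  # a response arrived before this extra request was due
        if latency is not None:
            done = sent_at + latency
            best = done if best is None else min(best, done)
    return best


# The first replica is down (the X on the slide); the others answer in 12 and 40 ms.
without_protection = first_response_ms([None, 12, 40])
with_protection = first_response_ms([None, 12, 40], speculate_after_ms=10)
healthy = first_response_ms([5, 12, 40], speculate_after_ms=10)
```

Without speculation the read waits on the failed replica until it times out; with it, the client sees a response shortly after the wait elapses, and a healthy first replica triggers no extra requests.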
Consistency levels
[Animated diagram: client, coordinator, and three replicas that are 40% busy, 90% busy, and 30% busy]
Consistency levels
• ONE
• QUORUM
• LOCAL_QUORUM
• LOCAL_ONE
• TWO
• ALL
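The arithmetic behind QUORUM is short enough to write out. A quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor. The helper names below are illustrative only.

```python
def quorum(replication_factor):
    """Majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)                                          # QUORUM with 3 replicas
quorum_both = read_sees_latest_write(q, q, rf)          # QUORUM reads + QUORUM writes
one_both = read_sees_latest_write(1, 1, rf)             # ONE reads + ONE writes
all_write_one_read = read_sees_latest_write(1, rf, rf)  # ALL writes + ONE reads
```

This is the trade-off that tuneable consistency exposes: ONE is the cheapest level but gives no overlap guarantee, while QUORUM on both sides guarantees overlap and still tolerates one replica being down when there are three.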
#CASSANDRAEU
Race condition
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00');
INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ea24e13ad9...', '2011-06-20 13:50:01');
This one wins
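The race can be reproduced with a toy last-write-wins store. Everything here is an assumption for illustration: the `table` dict, the helper names, and in particular the use of the slide's created_date values as write timestamps (in the slides created_date is an ordinary column).

```python
table = {}  # username -> (write_timestamp, row)


def select(username):
    entry = table.get(username)
    return entry[1] if entry else None


def insert(username, row, timestamp):
    # Last write wins: keep whichever row carries the newest timestamp.
    current = table.get(username)
    if current is None or timestamp >= current[0]:
        table[username] = (timestamp, row)


# Both clients check first, and both see no existing row.
seen_by_first = select("pmcfadin")
seen_by_second = select("pmcfadin")

# Both then insert; neither knows about the other.
insert("pmcfadin", {"password": "ba27e03fd9..."}, "2011-06-20 13:50:00")
insert("pmcfadin", {"password": "ea24e13ad9..."}, "2011-06-20 13:50:01")

winner = select("pmcfadin")
```

Checking before writing does not help, because nothing ties the check to the write: the second insert silently replaces the first.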
Lightweight transactions
INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
 True

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
 False     | pmcfadin | 2011-06-20 ... | Patrick McFadin
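The result semantics of IF NOT EXISTS (applied or not, plus the existing row on failure) can be mimicked in a single process. In this sketch a lock stands in for the distributed agreement that Cassandra performs; the `Users` class and `insert_if_not_exists` method are hypothetical names, not driver APIs.

```python
import threading


class Users:
    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stand-in for the consensus round

    def insert_if_not_exists(self, username, row):
        # The check and the write happen as one atomic step.
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                # Not applied: report the row that is already there.
                return {"applied": False, **existing}
            self._rows[username] = {"username": username, **row}
            return {"applied": True}


users = Users()
first = users.insert_if_not_exists(
    "pmcfadin", {"name": "Patrick McFadin", "created_date": "2011-06-20 13:50:00"}
)
second = users.insert_if_not_exists(
    "pmcfadin", {"name": "Patrick McFadin", "created_date": "2011-06-20 13:50:01"}
)
```

Unlike the plain inserts in the race condition slide, the second caller learns that it lost and sees the row that won.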
Paxos
• All operations are quorum-based
• Each replica sends information about unfinished operations to the leader during prepare
• Paxos made Simple
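The two bullets above map onto the prepare and propose phases of single-decree Paxos. The sketch below is a minimal in-memory version under simplifying assumptions: no networking, no durability, and none of the extra steps Cassandra adds around these two phases. `Acceptor` and `run_round` are illustrative names.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest ballot promised so far
        self.accepted = None  # (ballot, value) accepted but possibly unfinished

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted  # report any unfinished proposal
        return False, None

    def propose(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def run_round(acceptors, ballot, value):
    """Try to get `value` chosen; return the value actually chosen, or None."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: prepare / promise. A quorum of promises is required.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < quorum:
        return None

    # If any replica reported an unfinished proposal, the leader must
    # finish the newest one instead of pushing its own value.
    unfinished = [p for p in promises if p is not None]
    if unfinished:
        value = max(unfinished, key=lambda p: p[0])[1]

    # Phase 2: propose / accept. Again a quorum is required.
    accepts = sum(a.propose(ballot, value) for a in acceptors)
    return value if accepts >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = run_round(acceptors, ballot=1, value="ba27e03fd9...")
second = run_round(acceptors, ballot=2, value="ea24e13ad9...")
stale = run_round(acceptors, ballot=1, value="other")
```

The second round holds a higher ballot but still returns the first value, because the replicas report it during prepare; the round with a stale ballot gets no promises at all.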
Details
• 4 round trips vs 1 for normal updates
• Paxos state is durable
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
Use with caution
• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
Cassandra 2.1
User defined types
CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
);
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, address>
);
SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
Collection indexing
CREATE TABLE songs (
  id uuid PRIMARY KEY,
  artist text,
  album text,
  title text,
  data blob,
  tags set<text>
);
CREATE INDEX song_tags_idx ON songs(tags);
SELECT * FROM songs WHERE 'blues' IN tags;

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
More-efficient repair
2.1 roadmap
• Efficient handling of cold data
• Counters 2.0
• Only repair new-since-last-repair data
• January/February 2014
Questions?