
Things You Should Be Doing When Using Cassandra Drivers

Rebecca Mills, Junior Evangelist at DataStax, @rebccamills


Posted on 18-Jul-2015


Page 1: Things You Should Be Doing When Using Cassandra Drivers

Page 2

What do I do?

2 Confidential

•  Try to create awareness for open source Cassandra

•  Develop content

•  Identify problems newcomers might be encountering

•  Develop strategies and material to ease that first experience of using Cassandra

Page 3

Of course, all this extends to drivers!

•  Learn and play with the drivers as much as I can

•  Develop “Getting Started” tutorials for drivers in various programming languages

•  Make it my mission to bring the details to light

Page 4

So How Can We Communicate with Cassandra in “X” Language?

Page 5

We have what you need!

•  DataStax provides drivers for Java, Python, and C#

•  Fresh out of the oven: Ruby, Node.js, and C++

•  Also loads of open source drivers to choose from

•  Check out the Planet Cassandra Client Drivers section

Page 6

Let’s get into some of the basics of smart Cassandra driver usage:

Page 7

1. One Cluster instance per cluster

•  The Cluster object configures important aspects of how connections and queries are handled:
   –  Contact points
   –  Retry policies
   –  Load balancing policies

from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'],
                  compression=True,
                  load_balancing_policy=TokenAwarePolicy(
                      DCAwareRoundRobinPolicy(local_dc='US_EAST')))

Page 8

2. One Session per keyspace

•  Handles query execution and connection pooling

•  Long-lived object

•  Not to be used in a short-lived, request/response fashion

•  Share the same Cluster and Session instances across your application
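To make “share the same Cluster and Session instances” concrete, here is a minimal sketch of a lazily created, process-wide shared object. `SharedInstance` is a hypothetical helper, not part of any driver; in an application its factory would be something like `lambda: Cluster([...]).connect('demo')`, while the demo uses a stand-in factory so it runs without a cluster.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedInstance(Generic[T]):
    """Hypothetical helper: build one long-lived object on first use and
    return that same object to every later caller."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._instance: Optional[T] = None

    def get(self) -> T:
        # Double-checked locking: the factory runs at most once, even when
        # several threads ask for the instance at the same moment.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


# Stand-in factory (assumption) that records every object it creates;
# in an application it would connect a real Cluster and return its Session.
created = []


def make_fake_session() -> object:
    session = object()
    created.append(session)
    return session


shared_session = SharedInstance(make_fake_session)

# Simulate many request handlers asking for the session concurrently.
workers = [threading.Thread(target=shared_session.get) for _ in range(20)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(len(created))  # prints 1: every handler shares one session
```

Creating the session per request would instead repeat connection setup and pool creation on every call, which is exactly what the slide warns against.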

Page 9

Cluster & Session

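from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'],
                  compression=True,
                  load_balancing_policy=TokenAwarePolicy(
                      DCAwareRoundRobinPolicy(local_dc='US_EAST')))
session = cluster.connect('demo')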

Page 10

3. Use Prepared Statements

•  If you execute a statement more than once

•  Has multiple benefits

•  Prepare once, bind and execute multiple times

•  We’ll talk more about this soon!

Page 11

Cool

Useful

Page 12

Page 13

Deep Dives:

•  Prepared Statements

•  Load Balancing Policies

•  Retry Policies

•  Connection Pooling

•  Async API

Page 14

Why use Prepared Statements?

•  More performant than plain query strings

•  Parsed only once on the server

•  We expect you to use them for repeated queries in production

•  Help avoid CQL injection

Page 15

Prepared Statements

Consider a string:

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Jones', 35, 'Austin', '[email protected]', 'Bob')
""")

Page 16

Prepared Statements

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Smith', 24, 'Tampa', '[email protected]', 'Bob')
""")

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Power', 45, 'New York', '[email protected]', 'Kate')
""")

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Renolds', 33, 'Miami', '[email protected]', 'Carl')
""")

Page 17

Prepared Statements

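Now the same, as a prepared statement:

# Prepare once: the server parses the statement and returns an ID for it
prepared_stmt = session.prepare(
    "INSERT INTO users (lastname, age, city, email, firstname) VALUES (?, ?, ?, ?, ?)")

# Bind values and execute: only the ID and the values are sent
bound_stmt = prepared_stmt.bind(['Jones', 35, 'Austin', '[email protected]', 'Bob'])
session.execute(bound_stmt)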

Page 18

What’s the difference?

Page 19

Prepared Statements

INSERT with strings: the client sends the entire query string to Cassandra (large amount of data, parse cost)

INSERT with PreparedStatements: the client sends only the query ID and bound values (smaller amount of data, no parsing)
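The comparison above can be made concrete with rough arithmetic. This sketch estimates the bytes a client sends for each INSERT from Page 16, once as full CQL text and once as a statement ID plus bound values. The 16-byte ID, the 4-byte int size, and ignoring protocol framing and per-value length prefixes are simplifying assumptions, and the email addresses are hypothetical placeholders, so the numbers show the trend rather than measure the wire protocol.

```python
# Back-of-the-envelope model of what the client sends per INSERT.
# Assumptions (not measured): a 16-byte prepared-statement ID, CQL int values
# encoded as 4 bytes, text values as UTF-8; protocol framing and per-value
# length prefixes are ignored.

STATEMENT_ID_BYTES = 16

INSERT_TEMPLATE = (
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES ('{}', {}, '{}', '{}', '{}')"
)

# Rows from the slides; the email addresses are hypothetical placeholders.
ROWS = [
    ("Smith", 24, "Tampa", "smith@example.com", "Bob"),
    ("Power", 45, "New York", "power@example.com", "Kate"),
    ("Renolds", 33, "Miami", "renolds@example.com", "Carl"),
]


def value_size(value) -> int:
    """Approximate serialized size of one bound value."""
    if isinstance(value, int):
        return 4  # CQL int is a 32-bit integer
    return len(value.encode("utf-8"))


def string_payload(row) -> int:
    """Bytes of the full CQL text when the values are inlined."""
    return len(INSERT_TEMPLATE.format(*row).encode("utf-8"))


def prepared_payload(row) -> int:
    """Bytes of the statement ID plus the bound values."""
    return STATEMENT_ID_BYTES + sum(value_size(v) for v in row)


for row in ROWS:
    print(f"{row[0]:<8} string={string_payload(row):>4}  prepared={prepared_payload(row):>4}")
```

The gap grows with the length of the statement text, and the string version also pays the server-side parse cost on every execution.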

Page 20

So what does that mean to me?

Page 21

Speed!

Page 22

Prepared Statements

http://techblog.netflix.com/2013/12/astyanax-update.html

Page 23

Prepared Statements

Putting a prepared statement in a for loop is an anti-pattern:

for (int i = 0; i < 10; i++) {
    // Anti-pattern: the same statement is re-prepared on every iteration
    PreparedStatement ps = session.prepare("UPDATE user SET disabled = 1 WHERE id = ?");
    session.execute(ps.bind(i));
}
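The fix for the loop above is to call `prepare` once, before the loop, and only bind and execute inside it. When the same statements are needed from many places in an application, a small cache keyed by the query string does the same job. `PreparedStatementCache` is a hypothetical helper, not a driver API, and `fake_prepare` stands in for `session.prepare` so the sketch runs without a cluster.

```python
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")


class PreparedStatementCache(Generic[S]):
    """Hypothetical helper: prepare each distinct CQL string once, then return
    the cached result. Not thread-safe; guard get() with a lock if shared."""

    def __init__(self, prepare: Callable[[str], S]) -> None:
        self._prepare = prepare  # e.g. session.prepare in the Python driver
        self._cache: Dict[str, S] = {}

    def get(self, query: str) -> S:
        stmt = self._cache.get(query)
        if stmt is None:
            stmt = self._prepare(query)  # the only round trip to the server
            self._cache[query] = stmt
        return stmt


# Stand-in for session.prepare that records each call.
prepare_calls = []


def fake_prepare(query: str):
    prepare_calls.append(query)
    return ("prepared", query)


cache = PreparedStatementCache(fake_prepare)
UPDATE_USER = "UPDATE user SET disabled = 1 WHERE id = ?"

for i in range(10):
    stmt = cache.get(UPDATE_USER)  # same object on every iteration
    # with a real session: session.execute(stmt.bind([i]))

print(len(prepare_calls))  # prints 1
```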

Page 24

Load Balancing

•  A load balancing policy determines which node receives each insert or query.

•  A client can read from or write to any node, but sometimes that is inefficient.

•  If a node receives a read or write for data owned by another node, it coordinates that request for the client.

•  We can use a load balancing policy to control that choice.

Page 25

Load Balancing deep dive

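Using this example:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.DefaultRetryPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(
                new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();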

Page 26

Example data model

CREATE TABLE users (
    username text PRIMARY KEY,
    firstName text,
    lastName text
);

INSERT INTO users (username, firstName, lastName)
VALUES ('rmills', 'Rebecca', 'Mills');

INSERT INTO users (username, firstName, lastName)
VALUES ('pmcfadin', 'Patrick', 'McFadin');

Page 27

Discover cluster

Diagram: the client connects through .addContactPoint("10.0.0.1") and discovers the rest of the ring in DC1 (RF=3):

10.0.0.1  tokens 00-25
10.0.0.2  tokens 26-50
10.0.0.3  tokens 51-75
10.0.0.4  tokens 76-100
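The ring on this slide can be modeled in a few lines to show how a token maps to the node that owns it. This sketch uses the slides' toy 0-100 token space with inclusive range ends; real clusters using the Murmur3 partitioner have 64-bit signed tokens, and the driver reads the actual ring from cluster metadata via the contact point rather than from a hard-coded table.

```python
import bisect

# Toy ring from the slides (real Murmur3 tokens are 64-bit signed integers).
# Each entry is (last token of the range, node that owns the range as primary).
RING = [
    (25, "10.0.0.1"),   # 00-25
    (50, "10.0.0.2"),   # 26-50
    (75, "10.0.0.3"),   # 51-75
    (100, "10.0.0.4"),  # 76-100
]
RANGE_ENDS = [end for end, _ in RING]


def primary_owner(token: int) -> str:
    """Return the node whose range contains the token (range ends are inclusive)."""
    if not 0 <= token <= RANGE_ENDS[-1]:
        raise ValueError(f"token {token} is outside the toy ring")
    # bisect_left finds the first range whose end is >= token
    return RING[bisect.bisect_left(RANGE_ENDS, token)][1]


print(primary_owner(15))  # 'rmills' on the slides -> 10.0.0.1
print(primary_owner(77))  # 'pmcfadin' on the slides -> 10.0.0.4
```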

Page 28

Populate connection pool

Diagram: the client opens connections to the nodes in DC1. Data placement with RF=3:

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
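The replica table can be derived rather than memorized. Assuming SimpleStrategy-style placement (each range is stored on its primary owner and the next RF-1 nodes clockwise), which matches the table above, this sketch rebuilds the Primary/Replica columns for RF=3 on the toy ring.

```python
from typing import List

# Nodes in ring order and the range each owns as primary (toy ring from the slides).
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
RANGES = ["00-25", "26-50", "51-75", "76-100"]
RF = 3


def replicas_for_range(range_index: int, rf: int = RF) -> List[str]:
    """Primary owner plus the next rf-1 nodes clockwise (SimpleStrategy-style)."""
    return [NODES[(range_index + offset) % len(NODES)] for offset in range(rf)]


def ranges_held_by(node: str, rf: int = RF) -> List[str]:
    """Ranges stored on a node: its primary range first, then replica copies."""
    held = []
    for range_index, token_range in enumerate(RANGES):
        replicas = replicas_for_range(range_index, rf)
        if node in replicas:
            # position 0 = primary copy, higher positions = replica copies
            held.append((replicas.index(node), token_range))
    return [token_range for _, token_range in sorted(held)]


print(f"{'Node':<10}{'Primary':<9}{'Replica':<9}{'Replica':<9}")
for node in NODES:
    print(f"{node:<10}" + "".join(f"{r:<9}" for r in ranges_held_by(node)))
```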

Page 29

Request for data

Diagram: the client sends this query to the DC1 ring; the partition key 'rmills' hashes (Murmur3) to token 15.

SELECT firstName FROM users WHERE userName = 'rmills';

Page 30

Token Aware

Diagram: the query for 'rmills' (token = 15) under withLoadBalancingPolicy(new TokenAwarePolicy(...)).

Page 31

Token Aware

Diagram: which node should receive the query for token 15 (SELECT firstName FROM users WHERE userName = 'rmills';)? The replica table from Page 28 is shown alongside.

Page 32

Token Aware

Diagram: the same query for token 15 on the DC1 ring. Per the replica table from Page 28, token 15 falls in range 00-25, stored on 10.0.0.1 (primary), 10.0.0.2 and 10.0.0.3 (replicas).

Page 33

Token Aware

Diagram (next animation step): the same query for token 15 on the DC1 ring, with the replica table from Page 28.

Page 34

Token Aware

Diagram: the same query for 'rmills' on the DC1 ring, now under

withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))

Page 35

Token Aware

Diagram (next animation step): the query for 'rmills' under withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy())), with the replica table from Page 28.

Page 36

Token Aware - Retry

Diagram: the query for 'rmills' under withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy())); the labels "Timeout" and "Retry" mark a request that times out and is retried.
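The retry step can be sketched as walking the query plan: if the chosen replica times out, try the next host. This is a simplified illustration with hypothetical names (`execute_with_retry`, `RequestTimeout`, `fake_send`); the drivers' real retry policies, such as `DefaultRetryPolicy`, decide per error type and consistency level whether to retry, where, or to rethrow.

```python
from typing import Callable, Iterable, List, Tuple


class RequestTimeout(Exception):
    """Hypothetical stand-in for a driver timeout error."""


def execute_with_retry(query_plan: Iterable[str],
                       send: Callable[[str], str],
                       max_attempts: int = 2) -> Tuple[str, List[str]]:
    """Send to hosts in query-plan order, moving on after each timeout.

    Returns the first successful result and the hosts tried; raises
    RequestTimeout once max_attempts hosts (or the whole plan) have timed out.
    """
    tried: List[str] = []
    for host in query_plan:
        if len(tried) >= max_attempts:
            break
        tried.append(host)
        try:
            return send(host), tried
        except RequestTimeout:
            continue  # simplified: always move on to the next host
    raise RequestTimeout(f"gave up after trying {tried}")


# Token 15 -> replicas 10.0.0.1, 10.0.0.2, 10.0.0.3; pretend 10.0.0.1 is overloaded.
def fake_send(host: str) -> str:
    if host == "10.0.0.1":
        raise RequestTimeout(host)
    return f"Rebecca (served by {host})"


result, tried = execute_with_retry(["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_send)
print(result, tried)
```

Because the token-aware plan lists replicas first, the retry lands on another node that already holds the data instead of on an arbitrary coordinator.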

Page 37

Without Token Aware

Using this modified example:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.DefaultRetryPolicy;

Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy())
        .build();

Page 38

Request for data

Diagram: the client sends this query to the DC1 ring; the partition key 'pmcfadin' hashes (Murmur3) to token 77.

SELECT firstName FROM users WHERE userName = 'pmcfadin';

Page 39

No Token Aware

Diagram: the query for 'pmcfadin' (token = 77) under .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy()), with the replica table from Page 28.

Page 40

Data placement

Diagram: the query for 'pmcfadin' under .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy()). Per the replica table from Page 28, token 77 falls in range 76-100, stored on 10.0.0.4 (primary), 10.0.0.1 and 10.0.0.2 (replicas).

Page 41

Standard Round Robin

Diagram: with plain .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy()), the driver sends the query for token 77 to the next node in turn, which may not hold that data; that node must then coordinate the request with the replicas.
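Without token awareness, the coordinator is simply the next node in turn. This sketch counts how often plain round robin sends the 'pmcfadin' query (token 77, stored on 10.0.0.4, 10.0.0.1 and 10.0.0.2) to a node that holds no copy and must forward it. Starting the rotation at 10.0.0.1 is a simplifying assumption, and even a replica acting as coordinator still contacts other replicas when the consistency level requires it; the sketch only counts the extra hop.

```python
from itertools import cycle, islice
from typing import List

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
# Token 77 lies in range 76-100: primary 10.0.0.4, replicas 10.0.0.1 and 10.0.0.2.
REPLICAS_TOKEN_77 = {"10.0.0.4", "10.0.0.1", "10.0.0.2"}


def round_robin_coordinators(n_requests: int) -> List[str]:
    """Coordinator chosen for each request by plain round robin, ignoring tokens."""
    return list(islice(cycle(NODES), n_requests))


coordinators = round_robin_coordinators(8)
needs_forwarding = [node for node in coordinators if node not in REPLICAS_TOKEN_77]

print(coordinators)
print(f"{len(needs_forwarding)} of {len(coordinators)} requests hit a node with no copy of token 77")
```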

Page 42

Load Balancing

•  Default before Java driver 2.0.2: RoundRobinPolicy

•  Now: TokenAwarePolicy
   –  Adds token awareness to a child policy
   –  Acts as a filter that wraps around another policy
   –  Reduces network hops, because the replicas for the query's token are tried first
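The wrapping behaviour described above can be sketched as two small classes: a round-robin child policy, and a token-aware wrapper that puts the replicas for the query's token first and then appends the child's remaining hosts. The class names and the toy ring (tokens 0-100, RF=3, placement as in the Page 28 table) are assumptions for illustration; the real TokenAwarePolicy works from the driver's cluster metadata and also takes data-center locality into account.

```python
import bisect
from itertools import cycle, islice
from typing import Callable, List

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
RANGE_ENDS = [25, 50, 75, 100]  # toy ring: 00-25, 26-50, 51-75, 76-100


def toy_replicas(token: int, rf: int = 3) -> List[str]:
    """Primary owner of the token's range plus the next rf-1 nodes clockwise."""
    start = bisect.bisect_left(RANGE_ENDS, token)
    return [NODES[(start + i) % len(NODES)] for i in range(rf)]


class RoundRobin:
    """Child policy sketch: each new query plan starts one node further along."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self._start = 0

    def query_plan(self) -> List[str]:
        plan = list(islice(cycle(self.nodes), self._start, self._start + len(self.nodes)))
        self._start = (self._start + 1) % len(self.nodes)
        return plan


class TokenAware:
    """Wrapper sketch: replicas for the token first, then the child's other hosts."""

    def __init__(self, child: RoundRobin, replicas_for: Callable[[int], List[str]]) -> None:
        self.child = child
        self.replicas_for = replicas_for

    def query_plan(self, token: int) -> List[str]:
        replicas = self.replicas_for(token)
        rest = [node for node in self.child.query_plan() if node not in replicas]
        return replicas + rest


policy = TokenAware(RoundRobin(NODES), toy_replicas)
plan_rmills = policy.query_plan(15)    # 'rmills' -> token 15
plan_pmcfadin = policy.query_plan(77)  # 'pmcfadin' -> token 77
print(plan_rmills)
print(plan_pmcfadin)
```

Non-replica hosts stay at the end of the plan, so the request can still be served if every replica is down.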

Page 43

Load Balancing - Whitelist

•  Ensures only the hosts from a provided list are used

•  Wraps a child policy

•  Used to limit the effects of automatic peer discovery

•  Execute queries only on a given list of hosts
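A whitelist wrapper can be sketched the same way: it asks its child policy for a query plan and drops every host that is not on the allowed list, so hosts found by automatic peer discovery never receive queries. The classes here are illustrative assumptions; the Java driver ships a `WhiteListPolicy` that wraps a child policy in this spirit.

```python
from itertools import cycle, islice
from typing import Iterable, List


class RoundRobin:
    """Child policy sketch: each new query plan starts one host further along."""

    def __init__(self, hosts: Iterable[str]) -> None:
        self.hosts = list(hosts)
        self._start = 0

    def query_plan(self) -> List[str]:
        plan = list(islice(cycle(self.hosts), self._start, self._start + len(self.hosts)))
        self._start = (self._start + 1) % len(self.hosts)
        return plan


class WhiteList:
    """Wrapper sketch: filters the child's plan down to an explicit list of hosts."""

    def __init__(self, child: RoundRobin, allowed: Iterable[str]) -> None:
        self.child = child
        self.allowed = set(allowed)

    def query_plan(self) -> List[str]:
        return [host for host in self.child.query_plan() if host in self.allowed]


# Peer discovery found all four nodes, but the application should only use two.
discovered = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
policy = WhiteList(RoundRobin(discovered), allowed=["10.0.0.1", "10.0.0.2"])

plans = [policy.query_plan() for _ in range(4)]
for plan in plans:
    print(plan)
```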

Page 44

Asynchronous Statements

•  Native binary protocol supports request pipelining

•  A single connection can be used for several simultaneous and independent request/response exchanges
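Pipelining works because the native protocol tags every request and response frame with a stream ID, so responses can come back in any order and still be matched to their requests (protocol v1 and v2 allow 128 stream IDs per connection; v3 and later allow 32768). The toy `MultiplexedConnection` below is a hypothetical model of only that bookkeeping, with no real I/O.

```python
import random
from typing import Dict, List, Tuple


class MultiplexedConnection:
    """Toy model of request pipelining over one connection: every in-flight
    request carries a stream ID, and each response is matched back to its
    request by that ID, whatever order the responses arrive in."""

    def __init__(self, max_streams: int = 128) -> None:
        self._free_ids: List[int] = list(range(max_streams))
        self._in_flight: Dict[int, str] = {}

    def send(self, query: str) -> int:
        if not self._free_ids:
            raise RuntimeError("all stream IDs in use; wait for a response")
        stream_id = self._free_ids.pop(0)
        self._in_flight[stream_id] = query
        return stream_id

    def receive(self, stream_id: int, payload: str) -> Tuple[str, str]:
        query = self._in_flight.pop(stream_id)
        self._free_ids.append(stream_id)  # the ID can be reused by a later request
        return query, payload

    @property
    def in_flight(self) -> int:
        return len(self._in_flight)


conn = MultiplexedConnection()
queries = [f"SELECT firstName FROM users WHERE userName = '{u}'"
           for u in ("rmills", "pmcfadin")]
stream_ids = [conn.send(q) for q in queries]  # both sent without waiting
print(conn.in_flight)  # 2

# The server may answer in any order; simulate that with a seeded shuffle.
arrival = stream_ids[:]
random.Random(42).shuffle(arrival)
matched = {}
for sid in arrival:
    query, payload = conn.receive(sid, f"rows for stream {sid}")
    matched[query] = payload
```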

Page 45

Asynchronous Statements

•  You don’t have to wait for a query to complete and return rows directly: non-blocking I/O

•  The method returns a future object almost immediately
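The same non-blocking pattern can be tried with Python's standard library. This sketch uses `concurrent.futures` as an analogy, with a hypothetical `slow_query` function standing in for a round trip to Cassandra. The DataStax Python driver's `execute_async` returns its own `ResponseFuture` and does not use a thread per request, but the calling code has the same shape: submit, keep working, then call `result()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(lastname: str) -> dict:
    """Hypothetical stand-in for a query that takes about 200 ms to come back."""
    time.sleep(0.2)
    return {"lastname": lastname, "age": 35}


with ThreadPoolExecutor(max_workers=4) as executor:
    started = time.perf_counter()
    future = executor.submit(slow_query, "Jones")  # returns a Future right away
    submit_elapsed = time.perf_counter() - started

    # ... the caller is free to do other work here ...

    row = future.result(timeout=5)  # blocks only when the value is needed
    total_elapsed = time.perf_counter() - started

print(f"submit returned after {submit_elapsed:.3f}s, result after {total_elapsed:.3f}s: {row}")
```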

Page 46

Asynchronous Statements

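import logging

from cassandra import ReadTimeout

log = logging.getLogger(__name__)

# assumes `session` (Page 9) and a `lastname` value to look up
query = "SELECT * FROM users WHERE lastname=%s"
future = session.execute_async(query, [lastname])

# ... do some other work

try:
    rows = future.result()
    user = rows[0]
    print(user.firstname, user.age)
except ReadTimeout:
    log.exception("Query timed out:")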

Page 47

Asynchronous Statements

# build a list of futures
# (lastnames_to_fetch: an iterable of last names to look up)
futures = []
query = "SELECT * FROM users WHERE lastname=%s"
for lastname in lastnames_to_fetch:
    futures.append(session.execute_async(query, [lastname]))

# wait for them to complete and use the results
for future in futures:
    rows = future.result()
    print(rows[0].firstname, rows[0].age)

Page 48

Where can I download the drivers?

Page 49

Planet Cassandra

•  A great place for Apache Cassandra resources!

•  Blog posts, webinars, tutorials, and much more!

•  Also a great place for your driver needs

Page 50

Page 51

Page 52

Thank You!

Twitter: @rebccamills