a deep dive into apache cassandra for .net developers

88
A Deep Dive into Apache Cassandra for .NET Developers Luke Tillman (@LukeTillman) Language Evangelist at DataStax

Upload: luke-tillman

Post on 11-Jul-2015

751 views

Category:

Technology


5 download

TRANSCRIPT

Page 1: A Deep Dive into Apache Cassandra for .NET Developers

A Deep Dive into Apache Cassandra for

.NET Developers

Luke Tillman (@LukeTillman)

Language Evangelist at DataStax

Page 2: A Deep Dive into Apache Cassandra for .NET Developers

Who are you?!

• Evangelist with a focus on the .NET Community

• Long-time .NET Developer

• Recently presented at Cassandra Summit 2014 with Microsoft

• Very Recent Denver Transplant

2

Page 3: A Deep Dive into Apache Cassandra for .NET Developers

Why should I care?

• Cassandra Core Principles

– Ease of Use

– Massive Scalability

– High Performance

– Always Available

• Growth!

3

DB Engines Rankings (January 2014)

http://db-engines.com/en/ranking

Page 4: A Deep Dive into Apache Cassandra for .NET Developers

1 What is Cassandra and how does it work?

2 Cassandra Query Language (CQL)

3 Data Modeling Like a Pro

4 .NET Driver for Cassandra

5 Tools and Code

4

Page 5: A Deep Dive into Apache Cassandra for .NET Developers

What is Cassandra and how does it work?

5

Page 6: A Deep Dive into Apache Cassandra for .NET Developers

What is Cassandra?

• A Linearly Scaling and Fault Tolerant Distributed Database

• Fully Distributed

– Data spread over many nodes

– All nodes participate in a cluster

– All nodes are equal (masterless)

– No SPOF (shared nothing)

6

Page 8: A Deep Dive into Apache Cassandra for .NET Developers

What is Cassandra?

• Fault Tolerant

– Nodes Down != Database Down

– Datacenter Down != Database Down

8

Page 9: A Deep Dive into Apache Cassandra for .NET Developers

What is Cassandra?

• Fully Replicated

• Clients write local

• Data syncs across WAN

• Replication Factor per DC

9

US Europe

Client

Page 10: A Deep Dive into Apache Cassandra for .NET Developers

Cassandra and the CAP Theorem

• The CAP Theorem limits what distributed systems can do

• Consistency

• Availability

• Partition Tolerance

• Limits? “Pick 2 out of 3”

• Cassandra is an AP system that is Eventually Consistent

10

Page 11: A Deep Dive into Apache Cassandra for .NET Developers

Two knobs control Cassandra fault tolerance

• Replication Factor (server side)

– How many copies of the data should exist?

11

Client

B AD

C AB

A CD

D BC

Write A

RF=3

Page 12: A Deep Dive into Apache Cassandra for .NET Developers

Two knobs control Cassandra fault tolerance

• Consistency Level (client side)

– How many replicas do we need to hear from before we acknowledge?

12

Client

B AD

C AB

A CD

D BC

Write A

CL=QUORUM

Client

B AD

C AB

A CD

D BC

Write A

CL=ONE

Page 13: A Deep Dive into Apache Cassandra for .NET Developers

Consistency Levels

• Applies to both Reads and Writes (i.e. is set on each query)

• ONE – one replica from any DC

• LOCAL_ONE – one replica from local DC

• QUORUM – 51% of replicas from any DC

• LOCAL_QUORUM – 51% of replicas from local DC

• ALL – all replicas

• TWO

13

Page 14: A Deep Dive into Apache Cassandra for .NET Developers

Consistency Level and Speed

• How many replicas we need to hear from can affect how quickly

we can read and write data in Cassandra

14

Client

B AD

C AB

A CD

D BC

5 µs ack

300 µs ack

12 µs ack

12 µs ack

Read A

(CL=QUORUM)

Page 15: A Deep Dive into Apache Cassandra for .NET Developers

Consistency Level and Availability

• Consistency Level choice affects availability

• For example, QUORUM can tolerate one replica being down and

still be available (in RF=3)

15

Client

B AD

C AB

A CD

D BC

A=2

A=2

A=2

Read A

(CL=QUORUM)

Page 16: A Deep Dive into Apache Cassandra for .NET Developers

Consistency Level and Eventual Consistency

• Cassandra is an AP system that is Eventually Consistent so

replicas may disagree

• Column values are timestamped

• In Cassandra, Last Write Wins (LWW)

16

Client

B AD

C AB

A CD

D BC

A=2

Newer

A=1

Older

A=2

Read A

(CL=QUORUM)

Christos from Netflix: “Eventual Consistency != Hopeful Consistency”

https://www.youtube.com/watch?v=lwIA8tsDXXE

Page 17: A Deep Dive into Apache Cassandra for .NET Developers

Writes in the cluster

• Fully distributed, no SPOF

• Node that receives a request is the Coordinator for request

• Any node can act as Coordinator

17

Client

B AD

C AB

A CD

D BC

Write A

(CL=ONE)

Coordinator Node

Page 18: A Deep Dive into Apache Cassandra for .NET Developers

Writes in the cluster – Data Distribution

• Partition Key determines node placement

18

Partition Key

id='pmcfadin' lastname='McFadin'

id='jhaddad' firstname='Jon' lastname='Haddad'

id='ltillman' firstname='Luke' lastname='Tillman'

CREATE TABLE users ( id text, firstname text, lastname text, PRIMARY KEY (id) );

Page 19: A Deep Dive into Apache Cassandra for .NET Developers

Writes in the cluster – Data Distribution

• The Partition Key is hashed using a consistent hashing function

(Murmur 3) and the output is used to place the data on a node

• The data is also replicated to RF-1 other nodes

19

Partition Key

id='ltillman' firstname='Luke' lastname='Tillman'

Murmur3 id: ltillman Murmur3: A

B AD

C AB

A CD

D BC

RF=3

Page 20: A Deep Dive into Apache Cassandra for .NET Developers

Hashing – Back to Reality

• Back in reality, Partition Keys actually hash to 128 bit numbers

• Nodes in Cassandra own token ranges (i.e. hash ranges)

20

B AD

C AB

A CD

D BC

Range Start End

A 0xC000000..1 0x0000000..0

B 0x0000000..1 0x4000000..0

C 0x4000000..1 0x8000000..0

D 0x8000000..1 0xC000000..0

Partition Key

id='ltillman' Murmur3 0xadb95e99da887a8a4cb474db86eb5769

Page 21: A Deep Dive into Apache Cassandra for .NET Developers

Writes on a single node

• Client makes a write request

Client

UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'

Disk

Memory

Page 22: A Deep Dive into Apache Cassandra for .NET Developers

Writes on a single node

Client

UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'

Commit Log

id='ltillman', firstname='Luke'

Disk

Memory

• Data is appended to the Commit Log

• Append only, sequential IO == FAST

Page 23: A Deep Dive into Apache Cassandra for .NET Developers

Writes on a single node

Client

UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'

Commit Log

id='ltillman', firstname='Luke'

Disk

Memory

Memtable for Users Some

Other

Memtable id='ltillman' firstname='Luke' lastname='Tillman'

• Data is written/merged into Memtable

Page 24: A Deep Dive into Apache Cassandra for .NET Developers

Writes on a single node

Client

UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'

Commit Log

id='ltillman', firstname='Luke'

Disk

Memory

Memtable for Users Some

Other

Memtable id='ltillman' firstname='Luke' lastname='Tillman'

• Server acknowledges to client

• Writes in C* are FAST due to simplicity, log structured storage

Page 25: A Deep Dive into Apache Cassandra for .NET Developers

Writes on a single node

Client

UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'

Data Directory

Disk

Memory

Memtable for Users Some

Other

Memtable id='ltillman' firstname='Luke' lastname='Tillman'

Some

Other

SSTable

SSTable

#1 for

Users

SSTable

#2 for

Users

• Once Memtable is full, data is flushed to disk as SSTable (Sorted

String Table)

Page 26: A Deep Dive into Apache Cassandra for .NET Developers

Compaction

• Compactions merge and unify data in our SSTables

• SSTables are immutable, so this is when we consolidate rows

26

SSTable

#1 for

Users

SSTable

#2 for

Users

SSTable #3 for

Users

id='ltillman' firstname='Lucas' (timestamp=Older)

lastname='Tillman'

id='ltillman' firstname='Luke' lastname='Tillman'

id='ltillman' firstname='Luke' (timestamp=Newer)

Page 27: A Deep Dive into Apache Cassandra for .NET Developers

Reads in the cluster

• Same as writes in the cluster, reads are coordinated

• Any node can be the Coordinator Node

27

Client

B AD

C AB

A CD

D BC

Read A

(CL=QUORUM)

Coordinator Node

Page 28: A Deep Dive into Apache Cassandra for .NET Developers

Reads on a single node

• Client makes a read request

28

Client

SELECT firstname, lastname FROM users WHERE id = 'ltillman'

Disk

Memory

Page 29: A Deep Dive into Apache Cassandra for .NET Developers

Reads on a single node

• Data is read from (possibly multiple) SSTables and merged

29

Client

SELECT firstname, lastname FROM users WHERE id = 'ltillman'

Disk

Memory

SSTable #1 for Users

id='ltillman' firstname='Lucas' (timestamp=Older)

lastname='Tillman'

SSTable #2 for Users

id='ltillman' firstname='Luke' (timestamp=Newer)

firstname='Luke' lastname='Tillman'

Page 30: A Deep Dive into Apache Cassandra for .NET Developers

Reads on a single node

• Any unflushed Memtable data is also merged

30

Client

SELECT firstname, lastname FROM users WHERE id = 'ltillman'

Disk

Memory

firstname='Luke' lastname='Tillman' Memtable

for Users

Page 31: A Deep Dive into Apache Cassandra for .NET Developers

Reads on a single node

• Client gets acknowledgement with the data

• Reads in Cassandra are also FAST but often limited by Disk IO

31

Client

SELECT firstname, lastname FROM users WHERE id = 'ltillman'

Disk

Memory

firstname='Luke' lastname='Tillman'

Page 32: A Deep Dive into Apache Cassandra for .NET Developers

Compaction - Revisited

• Compactions merge and unify data in our SSTables, making

them important to reads (less SSTables = less to read/merge)

32

SSTable

#1 for

Users

SSTable

#2 for

Users

SSTable #3 for

Users

id='ltillman' firstname='Lucas' (timestamp=Older)

lastname='Tillman'

id='ltillman' firstname='Luke' lastname='Tillman'

id='ltillman' firstname='Luke' (timestamp=Newer)

Page 33: A Deep Dive into Apache Cassandra for .NET Developers

Cassandra Query Language (CQL)

33

Page 34: A Deep Dive into Apache Cassandra for .NET Developers

Data Structures

• Keyspace is like RDBMS Database or Schema

• Like RDBMS, Cassandra uses Tables to store data

• Partitions can have one row (narrow) or multiple

rows (wide)

34

Keyspace

Tables

Partitions

Rows

Page 35: A Deep Dive into Apache Cassandra for .NET Developers

Schema Definition (DDL)

• Easy to define tables for storing data

• First part of Primary Key is the Partition Key

CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );

Page 36: A Deep Dive into Apache Cassandra for .NET Developers

Schema Definition (DDL)

• One row per partition (familiar)

CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );

name ...

Keyboard Cat ...

Nyan Cat ...

Original Grumpy Cat ...

videoid

689d56e5- …

93357d73- …

d978b136- …

Page 37: A Deep Dive into Apache Cassandra for .NET Developers

Clustering Columns

• Second part of Primary Key is Clustering Columns

• Clustering columns affect ordering of data (on disk)

• Multiple rows per partition

37

CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);

Page 38: A Deep Dive into Apache Cassandra for .NET Developers

Clustering Columns – Wide Rows (Partitions)

• Use of Clustering Columns is where the (old) term “Wide Rows”

comes from

38

videoid='0fe6a...'

userid= 'ac346...'

comment= 'Awesome!'

commentid='82be1...' (10/1/2014 9:36AM)

userid= 'f89d3...'

comment= 'Garbage!'

commentid='765ac...' (9/17/2014 7:55AM)

CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);

Page 39: A Deep Dive into Apache Cassandra for .NET Developers

Inserts and Updates

• Use INSERT or UPDATE to add and modify data

• Both will overwrite data (no constraints like RDBMS)

• INSERT and UPDATE functionally equivalent 39

INSERT INTO comments_by_video ( videoid, commentid, userid, comment) VALUES ( '0fe6a...', '82be1...', 'ac346...', 'Awesome!');

UPDATE comments_by_video SET userid = 'ac346...', comment = 'Awesome!' WHERE videoid = '0fe6a...' AND commentid = '82be1...';

Page 40: A Deep Dive into Apache Cassandra for .NET Developers

TTL and Deletes

• Can specify a Time to Live (TTL) in seconds when doing an

INSERT or UPDATE

• Use DELETE statement to remove data

• Can optionally specify columns to remove part of a row

40

INSERT INTO comments_by_video ( ... ) VALUES ( ... ) USING TTL 86400;

DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';

Page 41: A Deep Dive into Apache Cassandra for .NET Developers

Querying

• Use SELECT to get data from your tables

• Always include Partition Key and optionally Clustering Columns

• Can use ORDER BY and LIMIT

• Use range queries (for example, by date) to slice partitions

41

SELECT * FROM comments_by_video WHERE videoid = 'a67cd...' LIMIT 10;

Page 42: A Deep Dive into Apache Cassandra for .NET Developers

Data Modeling Like a Pro

42

Page 43: A Deep Dive into Apache Cassandra for .NET Developers

Cassandra Data Modeling

• Requires a different mindset than RDBMS modeling

• Know your data and your queries up front

• Queries drive a lot of the modeling decisions (i.e. “table per

query” pattern)

• Denormalize/Duplicate data at write time to do as few queries

as possible come read time

• Remember, disk is cheap and writes in Cassandra are FAST

43

Page 44: A Deep Dive into Apache Cassandra for .NET Developers

Getting to Know Your Data

44

User

id

firstname

lastname

email

password Video

id

name

description

location

preview_image

tags features

Comment comment

id

adds

timestamp

posts

timestamp

1

n

n

1

1

n n

m

rates

rating

Page 45: A Deep Dive into Apache Cassandra for .NET Developers

Application Workflows

45

User Logs

into site

Show basic

information

about user

Show videos

added by a

user

Show

comments

posted by a

user

Search for a

video by tag

Show latest

videos added

to the site

Show

comments

for a video

Show ratings

for a video

Show video

and its

details

Page 46: A Deep Dive into Apache Cassandra for .NET Developers

Queries come from Workflows

46

Users

User Logs

into site

Find user by email

address

Show basic

information

about user Find user by id

Comments

Show

comments

for a video

Find comments by

video (latest first)

Show

comments

posted by a

user

Find comments by

user (latest first)

Ratings

Show ratings

for a video Find ratings by video

Page 47: A Deep Dive into Apache Cassandra for .NET Developers

Users – The Relational Way

• Single Users table with all user data and an Id Primary Key

• Add an index on email address to allow queries by email

User Logs

into site

Find user by email

address

Show basic

information

about user Find user by id

Page 48: A Deep Dive into Apache Cassandra for .NET Developers

Users – The Cassandra Way

User Logs

into site

Find user by email

address

Show basic

information

about user Find user by id

CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) );

CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );

Page 49: A Deep Dive into Apache Cassandra for .NET Developers

Modeling Relationships – Collection Types

• Cassandra doesn’t support JOINs, but your data will still have

relationships (and you can still model that in Cassandra)

• One tool available is CQL collection types

CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );

Page 50: A Deep Dive into Apache Cassandra for .NET Developers

Modeling Relationships – Client Side Joins

50

CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );

CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );

Currently requires query for video,

followed by query for user by id based

on results of first query

Page 51: A Deep Dive into Apache Cassandra for .NET Developers

Modeling Relationships – Client Side Joins

• What is the cost? Might be OK in small situations

• Do NOT scale

• Avoid when possible

51

Page 52: A Deep Dive into Apache Cassandra for .NET Developers

Modeling Relationships – Client Side Joins

52

CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) );

CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) );

or

Page 53: A Deep Dive into Apache Cassandra for .NET Developers

.NET Driver for Cassandra

53

Page 54: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra

• Open Source (on GitHub), available via NuGet

• Bootstrap using the Builder and then reuse the ISession object

Cluster cluster = Cluster.Builder() .AddContactPoint("127.0.0.1") .Build(); ISession session = cluster.Connect("killrvideo");

54

Page 55: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra

• Executing CQL with SimpleStatement

• Sync and Async API available for executing statements

• Use Async API for executing queries in parallel

var videoId = Guid.NewGuid(); var statement = new SimpleStatement("SELECT * FROM videos WHERE videoid = ?", videoId); RowSet rows = await session.ExecuteAsync(statement);

55

Page 56: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra

• Getting values from a RowSet is easy

• Rowset is a collection of Row (IEnumerable<Row>)

RowSet rows = await _session.ExecuteAsync(statement); foreach (Row row in rows) { var videoId = row.GetValue<Guid>("videoid"); var addedDate = row.GetValue<DateTimeOffset>("added_date"); var name = row.GetValue<string>("name"); }

56

Page 57: A Deep Dive into Apache Cassandra for .NET Developers

CQL 3 Data Types to .NET Types

• Full listing available in driver docs (http://www.datastax.com/docs)

CQL 3 Data Type .NET Type

bigint, counter long

boolean bool

decimal, float float

double double

int int

uuid, timeuuid System.Guid

text, varchar string (Encoding.UTF8)

timestamp System.DateTimeOffset

varint System.Numerics.BigInteger

Page 58: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra - PreparedStatement

• SimpleStatement useful for one-off CQL execution (or when

dynamic CQL is a possibility)

• Use PreparedStatement for better performance

• Pay the cost of Prepare once (server roundtrip)

• Save the PreparedStatement instance and reuse

PreparedStatement prepared = session.Prepare( "SELECT * FROM user_credentials WHERE email = ?");

Page 59: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra - PreparedStatement

• Bind variable values to get BoundStatement for execution

• Execution only has to send variable values

• You will use these all the time

• Remember: Prepare once, bind and execute many

BoundStatement bound = prepared.Bind("[email protected]"); RowSet rows = await _session.ExecuteAsync(bound);

Page 60: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra - BatchStatement

• Add Simple/Bound statements to a batch

BoundStatement bound = prepared.Bind(video.VideoId, video.Name); var simple = new SimpleStatement( "UPDATE videos SET name = ? WHERE videoid = ?" ).Bind(video.Name, video.VideoId); // Use an atomic batch to send over all the mutations var batchStatement = new BatchStatement(); batchStatement.Add(bound); batchStatement.Add(simple); RowSet rows = await _session.ExecuteAsync(batch);

Page 61: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra - BatchStatement

• Batches are Logged (atomic) by default

• Use when you want a group of mutations (statements) to all

succeed or all fail (denormalizing at write time)

• Really large batches are an anti-pattern (and Cassandra will

warn you)

• Not a performance optimization for bulk-loading data

Page 62: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra – Statement Options

• Options like Consistency Level and Retry Policy are available at

the Statement level

• If not set on a statement, driver will fallback to defaults set when

building/configuring the Cluster

62

IStatement bound = prepared.Bind("[email protected]") .SetPageSize(100) .SetConsistencyLevel(ConsistencyLevel.LocalOne) .SetRetryPolicy(new DefaultRetryPolicy()) .EnableTracing();

Page 63: A Deep Dive into Apache Cassandra for .NET Developers

Lightweight Transactions (LWT)

• Use when you don’t want writes to step on each other

• AKA Linearizable Consistency

• Serial Isolation Level

• Be sure to read the fine print: has a latency cost associated

with using it, so use only where needed

• The canonical example: unique user accounts

Page 64: A Deep Dive into Apache Cassandra for .NET Developers

Lightweight Transactions (LWT)

• Returns a column called [applied] indicating success/failure

• Different from the relational world where you might expect an

Exception (i.e. PrimaryKeyViolationException or similar)

var statement = new SimpleStatement("INSERT INTO user_credentials (email,

password) VALUES (?, ?) IF NOT EXISTS");

statement = statement.Bind("[email protected]", "Password1!");

RowSet rows = await _session.ExecuteAsync(statement);

var userInserted = rows.Single().GetValue<bool>("[applied]");

Page 65: A Deep Dive into Apache Cassandra for .NET Developers

Automatic Paging

• The Problem: Loading big result sets into memory is a recipe for disaster (OutOfMemoryExceptions, etc.)

• Better to load and process a large result set in pages (chunks)

• Doing this manually with Cassandra prior to 2.0 was a pain

• Automatic Paging makes paging on a large RowSet transparent

Page 66: A Deep Dive into Apache Cassandra for .NET Developers

Automatic Paging

• Set a page size on a statement

• Iterate over the resulting RowSet

• As you iterate, new pages are fetched transparently when the

Rows in the current page are exhausted

• Will allow you to iterate until all pages are exhausted

boundStatement = boundStatement.SetPageSize(100); RowSet rows = await _session.ExecuteAsync(boundStatement); foreach (Row row in rows) { }

Page 67: A Deep Dive into Apache Cassandra for .NET Developers

Typical Pager UI in a Web Application

• Show page of records in UI and allow user to navigate

Page 68: A Deep Dive into Apache Cassandra for .NET Developers

Typical Pager UI in a Web Application

• Automatic Paging – this is not the feature you are looking for

Page 69: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra

• Mapping results to DTOs: if you like using CQL for querying, try

Mapper component (formerly CqlPoco package)

public class User { public Guid UserId { get; set; } public string Name { get; set; } } // Create a mapper from your session object var mapper = new Mapper(session); // Get a user by id from Cassandra or null if not found var user = client.SingleOrDefault<User>( "SELECT userid, name FROM users WHERE userid = ?", someUserId);

69

Page 70: A Deep Dive into Apache Cassandra for .NET Developers

.NET and Cassandra

• Mapping results to DTOs: if you like LINQ, use built-in LINQ

provider

[Table("users")] public class User { [Column("userid"), PartitionKey] public Guid UserId { get; set; } [Column("name")] public string Name { get; set; } } var user = session.GetTable<User>() .SingleOrDefault(u => u.UserId == someUserId) .Execute();

70

Page 71: A Deep Dive into Apache Cassandra for .NET Developers

Tools and Code

71

Page 72: A Deep Dive into Apache Cassandra for .NET Developers

Installing Cassandra

• Planet Cassandra for all things C* (planetcassandra.org)

Page 73: A Deep Dive into Apache Cassandra for .NET Developers

Installing Cassandra

• Windows installer is super easy way to do development and

testing on your local machine

• Production Cassandra deployments on Linux (Windows

performance parity is coming in 3.0)

Page 74: A Deep Dive into Apache Cassandra for .NET Developers

In the Box – Command Line Tools

• Use cqlsh REPL for running CQL against your cluster

74

Page 75: A Deep Dive into Apache Cassandra for .NET Developers

In the Box – Command Line Tools

• Use nodetool for information on your cluster (and lots more)

• On Windows, available under apache-cassandra\bin folder 75

Page 76: A Deep Dive into Apache Cassandra for .NET Developers

DevCenter

• Get it on DataStax

web site

(www.datastax.com)

• GUI for Cassandra

development (think

SQL Server

Management Studio

for Cassandra)

76

Page 77: A Deep Dive into Apache Cassandra for .NET Developers

Sample Code – KillrVideo

• Live demo available at http://www.killrvideo.com

– Written in C#

– Live Demo running in Azure

– Open source: https://github.com/luketillman/killrvideo-csharp

77

Page 78: A Deep Dive into Apache Cassandra for .NET Developers

Questions? Overtime (use cases)?

Follow me for updates or to ask questions later: @LukeTillman

78

Page 79: A Deep Dive into Apache Cassandra for .NET Developers

Overtime: Who’s using it?

79

Page 80: A Deep Dive into Apache Cassandra for .NET Developers

Cassandra Adoption

Page 81: A Deep Dive into Apache Cassandra for .NET Developers

Some Common Use Case Categories

• Product Catalogs and Playlists

• Internet of Things (IoT) and Sensor Data

• Messaging (emails, IMs, alerts, comments)

• Recommendation and Personalization

• Fraud Detection

• Time series and temporal ordered data

http://planetcassandra.org/apache-cassandra-use-cases/

Page 82: A Deep Dive into Apache Cassandra for .NET Developers

The “Slide Heard Round the World”

• From Cassandra Summit 2014, got a lot of attention

• 75,000+ nodes

• 10s of PBs of data

• Millions ops/s

• One of the largest known Cassandra deployments

82

Page 83: A Deep Dive into Apache Cassandra for .NET Developers

Spotify

• Streaming music web service

• > 24,000,000 music tracks

• > 50TB of data in Cassandra

Why Cassandra?

• Was PostgreSQL, but hit scaling

problems

• Multi Datacenter Availability

• Integration with Spark for data

processing and analytics

Usage

• Catalog

• User playlists

• Artists following

• Radio Stations

• Event notifications

83

http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/

Page 84: A Deep Dive into Apache Cassandra for .NET Developers

eBay

• Online auction site

• > 250TB of data, dozens of nodes,

multiple data centres

• > 6 billion writes, > 5 billion reads

per day

Why Cassandra?

• Low latency, high scale, multiple data

centers

• Suited for graph structures using

wide rows

Usage

• Building next generation of

recommendation engine

• Storing user activity data

• Updating models of user interests in

real time

84

http://planetcassandra.org/blog/5-minute-c-interview-ebay/

Page 85: A Deep Dive into Apache Cassandra for .NET Developers

FullContact

• Contact management: from multiple

sources, sync, de-dupe, APIs available

• 2 clusters, dozens of nodes, running

in AWS

• Based here in Denver

Why Cassandra?

• Migated from MongoDB after

running into scaling issues

• Operational simplicity

• Resilience and Availability

Usage

• Person API (search by email, Twitter

handle, Facebook, or phone)

• Searched data from multiple sources

(ingested by Hadoop M/R jobs)

• Resolved profiles

85

http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/

Page 86: A Deep Dive into Apache Cassandra for .NET Developers

Instagram

• Photo-sharing, video-sharing and

social networking service

• Originally AWS (Now Facebook data

centers?)

• > 20k writes/second, >15k

reads/second

Why Cassandra?

• Migrated from Redis (problems

keeping everything in memory)

• No painful “sharding” process

• 75% reduction in costs

Usage

• Auditing information – security,

integrity, spam detection

• News feed (“inboxes” or activity feed)

– Likes, Follows, etc.

86

http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/

Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY

Page 87: A Deep Dive into Apache Cassandra for .NET Developers

Netflix

• TV and Movie streaming service

• > 2700+ nodes on over 90 clusters

• 4 Datacenters

• > 1 Trillion operations per day

Why Cassandra?

• Migrated from Oracle

• Massive amounts of data

• Multi datacenter, No SPOF

• No downtime for schema changes

Usage

• Everything! (Almost – 95% of DB use)

• Example: Personalization

– What titles do you play?

– What do you play before/after?

– Where did you pause?

– What did you abandon watching after 5

minutes?

87

http://planetcassandra.org/blog/case-study-netflix/

Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA

Page 88: A Deep Dive into Apache Cassandra for .NET Developers

Go forth and build awesome things!

Follow me for updates or to ask questions later: @LukeTillman

88