Cassandra Day NY 2014: Utilizing Apache Cassandra at Ultravisual

Post on 15-Jan-2015

DESCRIPTION

Cassandra has been an integral part of Ultravisual’s infrastructure since its launch, allowing us to rapidly prototype and build new features that further enhance user experience. Over the course of this discussion we will cover three key topics: how Cassandra came to be used at Ultravisual and the key problem it solved; how the usage of Cassandra as part of our stack has evolved alongside the product; and, finally, some of the experiences we’ve had with deploying and running Cassandra in a production environment.

TRANSCRIPT

CASSANDRA @ ULTRAVISUAL

Cassandra Day New York 2014

Skye Book, Lead Systems Architect

ULTRAVISUAL

A visual network for inspiration, expression, and collaboration

The Feed

• A user’s first taste of UV

• More than just posts

• Constantly being tweaked and re-thought

The Old Way

Started Simple

“Show me recent posts in collections I follow”

    SELECT DISTINCT _post.*
    FROM _post
    JOIN _collection_post cp ON _post.uuid = cp.post_uuid
    JOIN _collection_follow cf ON cp.c_uuid = cf.collection_uuid
    WHERE cf.user_id = ?
    ORDER BY _post.created_at DESC
    LIMIT 20 OFFSET 0

The Old Way

Added Complexity

“Show me people recently followed by my connections”

    SELECT a.*
    FROM _user_follow a, _user_follow b
    WHERE b.follower = 12345
    AND a.follower = b.followed
    ORDER BY a.followed_at DESC
    LIMIT 20 OFFSET 0

The Old Way

Every new feature needs another query

Feed requests generate a disproportionate amount of load compared to normal CRUD ops

Reframing the Problem

From This:

A place for posts, new collections, social activity, and anything else interesting

Image credit: nitro404.com/computers/knex.php

Reframing the Problem

To This:

A list of items interesting to the user

The New Way

Model First

• With an SQL background, this can be misleading.

• Essential Question: “How do I need to access this data?”

“Try to model data as a log of user intent”

– Rick Branson, Instagram, Cassandra Summit 2013

The New Way

    user  status  created_at  story                         json
    2     0       61b97280    user_follow:3:5               {"foo":"bar"}
    2     1       5daa04c0    post:bfbd0a39                 {"foo":"bar"}
    2     1       565752e0    collection_follow:5:d70961c1  {"foo":"bar"}
    2     1       4a8189e0    user_follow:3:5               {"foo":"bar"}

Primary Key: the key columns to the left of json. Cached story JSON: the json column.

Model for user feeds

• Fast to fetch user stories

• Cached JSON means almost zero SQL requests
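The read path described in these bullets can be sketched in a few lines. This is only an in-memory illustration, not Ultravisual's code: the `feeds` dictionary stands in for the Cassandra table, `append_story` and `fetch_page` are hypothetical names, and integer timestamps replace the time-based UUIDs shown on the slide.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the feed table: one partition per user, each
# holding (created_at, story, cached_json) rows. Not a real Cassandra client.
feeds = defaultdict(list)


def append_story(user, created_at, story, payload):
    """Append a story to a user's feed, caching its rendered JSON with the key."""
    feeds[user].append((created_at, story, json.dumps(payload)))


def fetch_page(user, limit=20):
    """Return the newest `limit` stories, decoded from the cached JSON only.

    Nothing here goes back to a relational store: the page is served
    entirely from what was written alongside the feed entry.
    """
    newest_first = sorted(feeds[user], key=lambda row: row[0], reverse=True)
    return [json.loads(cached) for _, _, cached in newest_first[:limit]]


append_story(2, 1, "user_follow:3:5", {"n": 1})
append_story(2, 3, "post:bfbd0a39", {"n": 3})
append_story(2, 2, "collection_follow:5:d70961c1", {"n": 2})
page = fetch_page(2, limit=2)
```

The point of the sketch is the shape of the access pattern: one partition per user, newest entries first, with the rendered story stored next to its key.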

Fast.

Response times cut from hundreds of milliseconds to the 30 ms range

Launch Week

Featured by Apple!

Cluster Disk Usage (chart): 26% / 74%

Don’t be too cute

cqlsh:ultravisual> ALTER TABLE latest_feed DROP json;

Handling Deletions

• Data is only appended, never deleted from user feeds

• Adapted Instagram’s ‘Anti-Column’ solution

• Avoids missed deletions for nodes down longer than GCGraceSeconds

• Avoids race condition where deletion arrives before write.

Sam follows Sandy

    user  created_at  status  story
    2     4a8189e0    1       user_follow:3:5

Sam unfollows Sandy

    user  created_at  status  story
    2     61b97280    0       user_follow:3:5
    2     4a8189e0    1       user_follow:3:5

Negated Entries

    user  created_at  status  story
    2     61b97280    0       user_follow:3:5
    2     4a8189e0    1       user_follow:3:5

• Keeps all entries in a single time series

• First page can usually be populated by a single read

    user  status  created_at  story
    2     0       61b97280    user_follow:3:5
    2     1       4a8189e0    user_follow:3:5

• Splits user’s row into two lists, live and undo

• Will always require at least two reads
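A minimal sketch of how a reader might resolve negated entries in the single time series layout. It assumes that the newest entry for a given story decides whether it is shown (status 1 is live, status 0 is a negation); the slides do not state the exact rule, so treat that as an assumption. `resolve_feed` is a hypothetical name and integer timestamps stand in for the time-based UUIDs.

```python
def resolve_feed(rows):
    """Collapse an append-only feed into the stories that are currently live.

    rows: iterable of (created_at, status, story) tuples.
    Assumption: the newest entry per story wins, so a later negation
    (status 0) hides an earlier live entry (status 1).
    """
    seen = set()
    live = []
    # Walk newest first, the way a feed page would be read.
    for created_at, status, story in sorted(rows, key=lambda r: r[0], reverse=True):
        if story in seen:
            continue  # a newer entry already decided this story
        seen.add(story)
        if status == 1:
            live.append((created_at, story))
    return live


rows = [
    (1, 1, "user_follow:3:5"),               # Sam follows Sandy
    (2, 1, "collection_follow:5:d70961c1"),
    (3, 1, "post:bfbd0a39"),
    (4, 0, "user_follow:3:5"),               # Sam unfollows Sandy
]
feed = resolve_feed(rows)
```

Because live entries and negations share one series, a single newest-first read usually carries everything needed to build the first page. With the split layout, the live list and the undo list would each need their own read before the same resolution could happen.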

Further Uses

• User Notifications

• User Onboarding

• Reshare Statistics

• User & Content Reports

• API Statistics

User Onboarding

    user  created_at  sequence      step  content
    2     61b97280    onboaring_v2  1     rec_collections_1
    3     5daa04c0    onboaring_v2  2     rec_collections_2
    5     565752e0    onboaring_v3  1     find_friends
    6     4a8189e0    onboaring_v3  1     find_friends

Sequenced feed entries for users on signup

Production Experiences

Drivers

• Java: Started with Astyanax, moved to the DataStax driver v2

• Node.js: node-cassandra-cql

Cryptic message with large batch updates in pre-release versions of 2.0 driver

DS Driver Issue 229

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected protocol error occured. This is a bug in this library, please report: Unknown code 256 for a consistency level

As of 2.0, batches with more than 64k statements throw a better exception:

java.lang.IllegalStateException: Batch statement cannot contain more than 65536 statements.
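One way to stay clear of the batch size limit is to split a large set of statements into smaller groups before building batches. The helper below is a generic sketch, not part of any driver: `chunked` is a hypothetical name, and the chunk size of 1000 used in the example is an arbitrary illustrative value, not a recommendation from the talk.

```python
from itertools import islice


def chunked(statements, size):
    """Yield consecutive lists of at most `size` items from `statements`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(statements)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 70,000 placeholder statements, more than a single batch would accept.
statements = [f"stmt-{i}" for i in range(70_000)]
batches = list(chunked(statements, 1000))
```

Each chunk would then be sent as its own batch, so no single batch approaches the limit reported by the exception.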

Compression

Just use LZ4

Cassandra-4851

Unfortunate truth in Cassandra 2.0.5

    cqlsh:test> SELECT * FROM user_feed WHERE user = 2 AND created_at > :some_uuid AND status=0;
    cqlsh:test> Bad Request: PRIMARY KEY part status cannot be restricted (preceding part created_at is either not restricted or by a non-EQ relation)

Cassandra-4851

Adds CQL3 support for vector comparison syntax

    cqlsh:test> SELECT * FROM timeline WHERE day = '21 Jun 2014' AND (hour,min) >= (3,50) AND (hour,min,sec) <= (4,37,30);

Available in 2.0.6
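The vector comparison in that query compares the clustering columns as a tuple, in order, rather than column by column. Python tuples compare the same way, so the semantics can be illustrated without a cluster. `in_range` is a hypothetical helper and the rows are made-up (hour, min, sec) values.

```python
def in_range(row, lower, upper):
    """Mirror `(hour,min) >= lower AND (hour,min,sec) <= upper` for one row.

    row is an (hour, min, sec) tuple. Each bound is compared against the
    matching prefix of the row, element by element from the left.
    """
    return row[:len(lower)] >= lower and row[:len(upper)] <= upper


lower, upper = (3, 50), (4, 37, 30)
rows = [(3, 49, 59), (3, 50, 0), (4, 0, 0), (4, 37, 30), (4, 37, 31)]
matches = [r for r in rows if in_range(r, lower, upper)]
```

Note that (4, 0, 0) matches even though its minute is below 50: the hour already decides the comparison. A naive per-column filter such as `hour >= 3 AND min >= 50` would wrongly drop it, which is why the tuple form matters for slicing a time series.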

Production Experiences

Upgrades

• Manual package installs (dsc20 from DataStax)

• One node at a time

• Upgrade, wait for healthy status & operations, move on

• OpsCenter provides good overview
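The upgrade routine above (one node, wait for health, move on) can be sketched as a small loop. Everything here is hypothetical: `rolling_upgrade`, the `upgrade` and `is_healthy` callables, and the polling defaults are placeholders for whatever package install and health check an operator actually uses.

```python
import time


def rolling_upgrade(nodes, upgrade, is_healthy,
                    poll_seconds=30, max_polls=40, sleep=time.sleep):
    """Upgrade nodes strictly one at a time.

    After each upgrade, poll `is_healthy(node)` up to `max_polls` times.
    Stop with an error at the first node that never reports healthy, so
    the remaining nodes are left untouched.
    """
    done = []
    for node in nodes:
        upgrade(node)
        for _ in range(max_polls):
            if is_healthy(node):
                break
            sleep(poll_seconds)
        else:
            raise RuntimeError(f"{node} did not become healthy; upgraded so far: {done}")
        done.append(node)
    return done


# Usage with stand-in callables: each node reports healthy on its second poll.
polls = {}
upgraded = []

def fake_upgrade(node):
    upgraded.append(node)

def fake_is_healthy(node):
    polls[node] = polls.get(node, 0) + 1
    return polls[node] >= 2

result = rolling_upgrade(["node-a", "node-b"], fake_upgrade, fake_is_healthy,
                         sleep=lambda seconds: None)
```

Halting on the first unhealthy node keeps the rest of the cluster on the old version, which matches the cautious one-at-a-time approach described in the bullets.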

Production Experiences

Speaking of OpsCenter…

• Don’t be alarmed if nodes appear but agent data does not

• opscenterd often needs a restart after cluster upgrade to see agents again

Production Experiences

Service Discovery

• Running on AWS using Ec2MultiRegionSnitch

• Using OpsWorks (Amazon’s Chef service) for seed config

Chef Cookbook

github.com/skyebook/cassandra-opsworks-chef-cookbook

• Forked from Michael Klishin’s awesome C* cookbook

• Added integration with OpsWorks’ stack.json

    # Add this node as the first seed
    # If using the multi-region snitch, we must use the public IP address
    if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
      seed_array << node["opsworks"]["instance"]["ip"]
    else
      seed_array << node["opsworks"]["instance"]["private_ip"]
    end

    node["opsworks"]["layers"]["cassandra"]["instances"].each do |instance_name, values|
      if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
        seed_array << values["ip"]
      else
        seed_array << values["private_ip"]
      end
    end

    set[:cassandra][:seeds] = seed_array

Questions
