netflix at-disney-09-26-2014


Upload: mdaxini

Post on 28-Nov-2014


Category:

Software



DESCRIPTION

Slides from a presentation by Monal Daxini at Disney, Glendale, CA, about Netflix Open Source Software, Cloud Data Persistence, and Cassandra Best Practices

TRANSCRIPT

Page 1: Netflix at-disney-09-26-2014

Cloud Data Persistence @

Monal Daxini Senior Software Engineer

Cloud Database Engineering

@monaldax

50m+ Subscribers

Page 2: Netflix at-disney-09-26-2014

Summary

Netflix OSS

Microservices

m@Netflix Season 1, 2

Cassandra @ Netflix

Cassandra Best Practices

Coming Soon…

Page 3: Netflix at-disney-09-26-2014

Start with Zero To Cloud With @NetflixOSS

https://github.com/Netflix-Skunkworks/zerotocloud

Page 4: Netflix at-disney-09-26-2014

Karyon/Governator

Hystrix

Ribbon/Eureka

Curator

EVCache

Astyanax

Turbine

Servo

Blitz4J

Function OSS Library

RxJava

Archaius

Page 5: Netflix at-disney-09-26-2014

Building Apps and AMIs

ASG /Cluster

WAR

ASG/Cluster

App AMI

Deploy

Launch Instances

@stonse

Page 6: Netflix at-disney-09-26-2014
Page 7: Netflix at-disney-09-26-2014

NetflixOSS

Suro Data Pipeline

Eureka

Zuul

Edda

Page 8: Netflix at-disney-09-26-2014

Micro Services

Microservices do NOT mean better availability

Need Fault Tolerant Architecture

Service Dependency View

Distributed Tracing (Dapper inspired)

Page 9: Netflix at-disney-09-26-2014

Micro Services

1 response - 1 monolithic service 99.99% uptime

1 response - 30 micro services each 99.99% uptime

overall ~99.7% uptime (0.9999^30), about 26 hours of downtime per year
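
The compound-uptime arithmetic on this slide can be checked directly. The sketch below assumes that dependency failures are independent and that a response needs every one of the 30 services, which is a simplification; the function names are illustrative and not from the talk. Under those assumptions 0.9999^30 comes out at roughly 0.997.

```python
# Compound availability of a request that must touch every dependency.
# Assumption: failures are independent and every dependency is required.

HOURS_PER_MONTH = 30 * 24
HOURS_PER_YEAR = 365 * 24


def compound_availability(per_service: float, n_services: int) -> float:
    """Probability that all n services are up at the same time."""
    return per_service ** n_services


def downtime_hours(availability: float, period_hours: float) -> float:
    """Expected hours of unavailability over the given period."""
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    overall = compound_availability(0.9999, 30)
    print(f"overall availability: {overall:.4%}")
    print(f"downtime per month:   {downtime_hours(overall, HOURS_PER_MONTH):.1f} h")
    print(f"downtime per year:    {downtime_hours(overall, HOURS_PER_YEAR):.1f} h")
```

The point of the slide survives the arithmetic: a fan-out across many services multiplies the individual failure probabilities, so availability has to come from the architecture and not from the uptime of each service alone.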

Page 10: Netflix at-disney-09-26-2014

Micro Services

Actual Scale

~2 Billion Edge Requests per day

Results in ~20 Billion Fan out requests to

~100 different MicroServices

Page 11: Netflix at-disney-09-26-2014

Fault Tolerant Arch

Dependency Isolation

Aggressive timeouts

Circuit breakers
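
The three techniques on this slide can be combined in one small wrapper. What follows is a minimal sketch in plain Python, not Hystrix and not Netflix's implementation: `GuardedDependency`, its parameters, and the thresholds are all hypothetical names chosen for illustration. Each dependency gets its own bounded thread pool (isolation), every call is bounded by a short timeout, and repeated failures open a breaker that fails fast to a fallback.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class GuardedDependency:
    """Wraps calls to one dependency with its own pool, a timeout, and a breaker."""

    def __init__(self, name, pool_size=4, timeout_s=0.05,
                 failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.name = name
        # Dependency isolation: a dedicated, bounded pool per dependency.
        self.pool = ThreadPoolExecutor(max_workers=pool_size)
        self.timeout_s = timeout_s
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: fail fast without calling fn
            # Half-open: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            # Aggressive timeout: stop waiting even if fn keeps running.
            result = self.pool.submit(fn).result(timeout=self.timeout_s)
        except Exception:  # errors raised by fn and timeouts alike
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A caller would create one `GuardedDependency` per downstream service and pass a cheap fallback (a cached or default value), so that a slow dependency costs at most `timeout_s` per request and nothing at all while its circuit is open.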

Page 12: Netflix at-disney-09-26-2014

MicroServices Container

Synchronous: Tomcat, ThreadPool (1 thread per request)

Asynchronous: RxNetty (UDP, TCP, WebSockets, SSE), EventLoops

Page 13: Netflix at-disney-09-26-2014

MicroServices Container

Rx

ease async programming

avoid callback hell

Netty to leverage EventLoop

Rx + Netty = RxNetty

Page 14: Netflix at-disney-09-26-2014

* Courtesy Brendan Gregg

Page 15: Netflix at-disney-09-26-2014

AWS Maint

Page 16: Netflix at-disney-09-26-2014

@Netflix Season-1

Media Cloud Engineering

Page 17: Netflix at-disney-09-26-2014

Encoding PaaS

Master - Worker Pattern

Decoupled by Priority Queues with message lease

State in Cassandra
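
A priority queue with message leases, as named on this slide, can be sketched in memory as follows. This is an illustration of the pattern only: the class and method names are hypothetical, the slide does not describe the real implementation, and durable state (kept in Cassandra in the talk) is replaced here by plain dictionaries. A worker leases a message instead of removing it; if the worker does not acknowledge before the lease expires, the message becomes visible again for another worker.

```python
import heapq
import itertools


class LeasedPriorityQueue:
    """In-memory priority queue whose messages are leased, not removed, on receive."""

    def __init__(self, lease_s=30.0):
        self.lease_s = lease_s
        self._heap = []                # (priority, seq, message_id)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority
        self._payloads = {}            # message_id -> (priority, payload)
        self._leases = {}              # message_id -> lease expiry time

    def put(self, message_id, payload, priority):
        self._payloads[message_id] = (priority, payload)
        heapq.heappush(self._heap, (priority, next(self._seq), message_id))

    def receive(self, now):
        """Lease the highest-priority visible message (lowest number wins)."""
        self._requeue_expired(now)
        while self._heap:
            _, _, message_id = heapq.heappop(self._heap)
            # Skip stale heap entries for messages already acked or leased.
            if message_id in self._payloads and message_id not in self._leases:
                self._leases[message_id] = now + self.lease_s
                return message_id, self._payloads[message_id][1]
        return None

    def ack(self, message_id):
        """Worker finished: drop the message for good."""
        self._leases.pop(message_id, None)
        self._payloads.pop(message_id, None)

    def _requeue_expired(self, now):
        """Make messages visible again when a worker's lease has run out."""
        for message_id, expiry in list(self._leases.items()):
            if expiry <= now:
                del self._leases[message_id]
                priority, _ = self._payloads[message_id]
                heapq.heappush(self._heap, (priority, next(self._seq), message_id))
```

Passing `now` explicitly keeps the sketch deterministic; a real master would use a clock and persist the lease table so that a restart does not lose in-flight work.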

Page 18: Netflix at-disney-09-26-2014

Oracle >> Cassandra

Data Model & Lack of ACID

Client Cluster Symbiosis

Embrace Eventual Consistency

Data Migration

Shadow Write / Reads
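
Shadow writes and reads can be sketched as a thin wrapper around the old and new stores. The example below is a hedged illustration, not the migration code from the talk: `ShadowingStore` is a hypothetical name and both stores are stood in for by Python dictionaries. The existing store stays the system of record, every write is mirrored to the new store, and every read is compared so that mismatches can be inspected before cutting over.

```python
class ShadowingStore:
    """Serves from the primary store while mirroring traffic to a candidate store."""

    def __init__(self, primary, shadow):
        self.primary = primary    # system of record during the migration
        self.shadow = shadow      # new store being validated
        self.mismatches = []      # (key, primary_value, shadow_value)
        self.shadow_errors = []   # shadow failures never reach the caller

    def write(self, key, value):
        self.primary[key] = value
        try:
            self.shadow[key] = value
        except Exception as exc:
            self.shadow_errors.append((key, exc))

    def read(self, key):
        expected = self.primary.get(key)
        try:
            actual = self.shadow.get(key)
            if actual != expected:
                self.mismatches.append((key, expected, actual))
        except Exception as exc:
            self.shadow_errors.append((key, exc))
        # The caller always gets the primary's answer.
        return expected
```

Keys written before shadowing began show up as mismatches until a backfill copies them, which is one way to measure how far the data migration has progressed.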

Page 19: Netflix at-disney-09-26-2014

Object To Cassandra Mapping

/**
 * @author mdaxini
 */
@CColumnFamily(name = "Sequence", shared = true)
@Audited(columnFamily = "sequence_audit")
public class SequenceBean {
    @CId(name = "id")
    private String sequenceName;

    @CColumn(name = "sequenceValue")
    private Long sequenceValue;

    @CColumn(name = "updated")
    @TemporalAutoUpdate
    @JsonProperty("updated")
    private Date updated;
    // ...
}

Page 20: Netflix at-disney-09-26-2014

Object To Cassandra Mapping

@JsonAutoDetect(JsonMethod.NONE)
@JsonIgnoreProperties(ignoreUnknown = true)
@CColumnFamily(name = "task")
public class Job {
    @CId
    private JobKey jobKey;
    // ...
}

public final class TaskKey {
    @CId(order = 0)
    private Long packageId;

    @CId(order = 1)
    private UUID taskId;
    // ...
}

Page 21: Netflix at-disney-09-26-2014

Priority-Scheduling Queue

Evolution:

One SQS Queue per priority range

Store and forward (rate-adaptive) to SQS Queue

Rule based priority, leases, RDBMS based with prefetch

Page 22: Netflix at-disney-09-26-2014

Encoding PaaS Farm

One command deployment and upgrade

Self Serve

Homogeneous View of Windows and Linux

Pioneered Ubuntu - production since 2011

Page 23: Netflix at-disney-09-26-2014

Innovate Fast, Build for Pragmatic Scale

Innovate for Business, Standardize Later*

Page 24: Netflix at-disney-09-26-2014

@Netflix Season-2

Cloud Database Engineering

[CDE]

Page 25: Netflix at-disney-09-26-2014

Platform Big Data/Caching & Services

Cassandra Astyanax Priam

CassJMeter Hadoop Platform As a Service

Genie

Lipstick

Adapted from a slide by @stonse

Caching

Inviso*

Page 26: Netflix at-disney-09-26-2014

CDE Charter

Spark*

Solr*

* Under Construction

Dynomite*

Redis

ElasticSearch

Cassandra (1.2.x >> 2.0.x)

Priam

Astyanax

Skynet*

Page 27: Netflix at-disney-09-26-2014

All OLTP Data in Cassandra


Almost!

Page 28: Netflix at-disney-09-26-2014

Cassandra Prod Footprint

90+ Clusters

2700+ Nodes

4 Datacenters (Amazon Regions)

>1 Trillion operations per day

Page 29: Netflix at-disney-09-26-2014

Cassandra Best Practices* Usage

*Practices I have found useful, YMMV

Page 30: Netflix at-disney-09-26-2014

Use RandomPartitioner

Have at least 3 replicas (quorum)

Same number of replicas - simpler operations


create keyspace oracle with placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {us-west-2 : 3, us-east : 3}
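
The advice to keep at least 3 replicas follows from quorum arithmetic. The helper names below are illustrative; the formulas are the standard ones (a quorum is floor(RF / 2) + 1, and a read is guaranteed to see the latest write when R + W > RF). With RF = 2 a quorum needs both replicas, so a single node failure blocks quorum operations; with RF = 3 one replica can be down.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set intersects every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
    print("quorum reads + quorum writes overlap at RF=3:", overlaps(2, 2, 3))
    print("ONE reads + ONE writes overlap at RF=3:", overlaps(1, 1, 3))
```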

Page 31: Netflix at-disney-09-26-2014

Move to CQL3 from thrift

Codifies best practices

Leverage Collections (albeit restricted cardinality)

Use Key Caching

As a default turn off Row Caching

Rename all composite columns in one ALTER TABLE statement.

Page 32: Netflix at-disney-09-26-2014

Watch length of column names

Use “COMPACT STORAGE” wisely

Cannot use collections - depends on CompositeType

Non-compact storage uses 2 more bytes per internal cell, but is preferred.

* Image courtesy Datastax blog

Page 33: Netflix at-disney-09-26-2014

cqlsh:test> SELECT * FROM events;

 key    | column1 | column2 | value
--------+---------+---------+---------
 tbomba |       4 |     120 | event 1
 tbomba |       4 |    2500 | event 2
 tbomba |       9 |     521 | event 3
 tbomba |      10 |    3525 | event 4

* Courtesy Datastax blog

CREATE TABLE events (
    key text,
    column1 int,
    column2 int,
    value text,
    PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE;
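
To make the effect of COMPACT STORAGE concrete, the sketch below shows roughly how the four CQL rows above collapse into one internal storage row. It is a simplified model in plain Python, not Cassandra's actual on-disk format, and `to_storage_rows` is a hypothetical helper: the partition key selects the storage row, and each CQL row becomes a single cell whose name is the composite of the clustering columns.

```python
from collections import OrderedDict

# Rows as returned by the SELECT above: (key, column1, column2, value).
cql_rows = [
    ("tbomba", 4, 120, "event 1"),
    ("tbomba", 4, 2500, "event 2"),
    ("tbomba", 9, 521, "event 3"),
    ("tbomba", 10, 3525, "event 4"),
]


def to_storage_rows(rows):
    """Group CQL rows by partition key; each CQL row becomes one internal cell."""
    storage = OrderedDict()
    for key, column1, column2, value in rows:
        cell_name = f"{column1}:{column2}"  # composite of the clustering columns
        storage.setdefault(key, OrderedDict())[cell_name] = value
    return storage


if __name__ == "__main__":
    for key, cells in to_storage_rows(cql_rows).items():
        print(key)
        for name, value in cells.items():
            print(f"  {name} -> {value}")
```

Because the only non-key column is stored as the cell value, a compact table has no room for additional regular columns or collections, which is the trade-off the previous slide warns about.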

Page 34: Netflix at-disney-09-26-2014

Prefer CL_ONE

data replication within 500ms across the region

If using quorum reads and writes, set read_repair_chance to 0.0 or a very low value.

Make sure repairs are run often

Eventual Consistency does not mean hopeful consistency

Page 35: Netflix at-disney-09-26-2014

Avoid secondary indexes for high cardinality values

In most cases we set gc_grace_seconds to 10 days

Avoid hot rows

detect using node level latency metrics

Page 36: Netflix at-disney-09-26-2014

Avoid heavy rows

Avoid rows that are too wide (< 100K columns, if the columns are small)

Don’t use C* as a Queue

Tombstones will bite you

Page 37: Netflix at-disney-09-26-2014

SizeTieredCompactionStrategy

write heavy workload

non-predictable I/O, 2x disk space

LeveledCompactionStrategy

read heavy workloads

predictable I/O, roughly 2x the I/O of STCS

Page 38: Netflix at-disney-09-26-2014

LeveledCompactionStrategy

SizeTieredCompactionStrategy

* Image courtesy Datastax blog

Page 39: Netflix at-disney-09-26-2014

Guesstimate and then validate sstable_size_in_mb

Hint: based on write rate and size

160mb for LeveledCompactionStrategy

SizeTieredCompactionStrategy - C* default 50mb

Page 40: Netflix at-disney-09-26-2014

Atomic batches

no isolation, only atomic for row within partition key

no automatic rollback

Lightweight transactions

Page 41: Netflix at-disney-09-26-2014

Cassandra Best Practices Operations

*Practices we have found useful, YMMV

Page 42: Netflix at-disney-09-26-2014

If your C* cluster footprint is significant

must have good automation

at least a C* semi-expert

Use cstar_perf to validate your initial clusters

We don’t use vnodes

On each node, size the disk to hold 2x the expected data; use ephemeral SSDs, not EBS

Page 43: Netflix at-disney-09-26-2014

Monitoring and alerting

read write latency - co-ordinator & node level

Compaction stats

Heap Usage

Network

Max & Min Row sizes

Page 44: Netflix at-disney-09-26-2014

Fixed tokens, double the cluster to expand

Important to size the cluster for app needs initially

benefits of fixed tokens outweigh those of vnodes

Take backups of all the nodes

to allow for eventual consistency on restores

Note: by default the commitlog is fsynced only every 10 seconds
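
The "fixed tokens, double the cluster" rule on this slide can be illustrated with token arithmetic. The sketch below assumes RandomPartitioner, whose token space runs from 0 to 2^127, and evenly spaced initial tokens; the function names are illustrative, not taken from Priam or any Netflix tool. When the node count doubles, every existing node keeps its token and each new node takes the midpoint of an existing range, so no existing node has to move.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in [0, 2**127)


def initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens for a cluster of node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def doubled_tokens(tokens: list) -> list:
    """Keep every existing token and add one at the midpoint of each range."""
    ordered = sorted(tokens)
    result = []
    for i, start in enumerate(ordered):
        end = ordered[(i + 1) % len(ordered)]
        # Range width with wrap-around; a single node owns the full ring.
        span = (end - start) % RING_SIZE or RING_SIZE
        result.extend([start, (start + span // 2) % RING_SIZE])
    return sorted(result)


if __name__ == "__main__":
    before = initial_tokens(4)
    after = doubled_tokens(before)
    print("existing tokens kept:", set(before) <= set(after))
    print("still evenly spaced: ", after == initial_tokens(8))
```

Growing by any factor other than two would require moving tokens on existing nodes to stay balanced, which is why the initial sizing of the cluster matters so much with this approach.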

Page 45: Netflix at-disney-09-26-2014

Run repairs before GCGraceSeconds expires

Throttle compactions and repairs

Repairs can take a long time

run a primary range and a Keyspace at a time to avoid performance impact.
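
Whether a repair schedule respects the gc_grace_seconds window can be checked with a few lines. The sketch below uses the 10-day value mentioned earlier in the deck and a deliberately conservative rule of thumb, which is an assumption and not a formula from the talk: the gap between repair starts plus the time one repair takes must stay below gc_grace_seconds, so that every tombstone is propagated to all replicas before it becomes eligible for garbage collection.

```python
from datetime import timedelta

GC_GRACE = timedelta(days=10)  # gc_grace_seconds = 864000


def repair_is_safe(repair_interval: timedelta,
                   repair_duration: timedelta,
                   gc_grace: timedelta = GC_GRACE) -> bool:
    """True when a repair started every repair_interval, and taking
    repair_duration to finish, always completes inside the gc_grace window."""
    return repair_interval + repair_duration < gc_grace


if __name__ == "__main__":
    weekly = timedelta(days=7)
    print("weekly repair taking 2 days:", repair_is_safe(weekly, timedelta(days=2)))
    print("weekly repair taking 4 days:", repair_is_safe(weekly, timedelta(days=4)))
```

Throttling makes repairs gentler but also longer, so the duration term deserves as much monitoring as the schedule itself.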

Page 46: Netflix at-disney-09-26-2014

Schema disagreements - pick the nodes with the older date and restart them one at a time.

nodetool resetlocalschema is not persistent on 1.2

Recycle nodes in AWS to prevent staleness

Expanding to new region

Launch nodes in new region without bootstrapping

Change Keyspace replication

Run nodetool rebuild on nodes in new region.

Page 47: Netflix at-disney-09-26-2014

More Info

http://techblog.netflix.com/

http://netflix.github.io/

http://slideshare.net/netflix

https://www.youtube.com/user/NetflixOpenSource

https://www.youtube.com/user/NetflixIR $$$

Page 48: Netflix at-disney-09-26-2014

??