Transcript
Page 1: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Launching PS4 with Cassandra

Page 2: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Introduction •  Alexander Filipchik – Staff Software Engineer at SNEI

•  Dustin Pham – Staff Software Engineer at SNEI

Page 3: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Agenda •  Journey towards Cassandra •  Cassandra-backed PS4 Features •  Ops-y Stuff •  Lessons learned

Page 4: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Journey towards Cassandra

Page 5: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Challenges •  Small Team •  Legacy Support •  Hardware Deadline •  Scaling @ Peak Time

Page 6: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Why Cassandra •  Strong community •  Horizontally scalable architecture •  Good performance •  Cost effective •  New adventure :)

Page 7: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

PS4 Features backed by Cassandra

Page 8: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Cassandra-backed PS4 features

•  What’s New •  Video Library •  My Library •  PS Now •  Notifications •  LiveArea •  Store catalog •  Pre-order •  PS Plus •  Recommendations •  Remote Download •  Share •  Authentication •  +more

Page 22: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Ops-y Stuff

Page 23: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Infrastructure •  Hosted in cloud and physical DCs •  Several hundred nodes and growing •  Cluster by feature •  Vnodes and Assigned token clusters •  Astyanax Client

Page 24: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Stats for PS4 cloud nodes •  Data throughput: Gigabytes / sec •  Cassandra read/writes: > 200,000 / sec •  Data size: tens of terabytes •  10M PS4 and 80M PS3 sold

Page 25: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Clusters •  Cluster per Read/Write pattern initially •  Now use cluster per feature •  Seeds referenced by DNS names •  Size Tiered compaction •  Manual compactions for some CFs

Page 26: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

A typical node •  m2.4xl + i2.2xl •  2 ephemeral disks (~ 2 x 800 GB) •  Commit log on root partition •  Topology managed in the topology file, which is managed by Chef

Page 27: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

AWS
•  Nodes are interleaved between AZs
  – Replication factor spreads data across AZs
  – Minimizes downtime due to AZ outage

[Diagram: nodes alternating between Availability Zone A and Availability Zone C]
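
The interleaving idea can be sketched roughly as follows. This is a simplified model, not Cassandra's actual replica placement code: ring positions are assigned to zones round-robin, and replicas are picked by walking clockwise and preferring zones that have not been used yet. The zone names, node count, and function names are hypothetical.

```python
from itertools import cycle

def interleave(node_count, zones):
    """Assign ring positions to zones round-robin: A, C, A, C, ..."""
    zone_iter = cycle(zones)
    return [next(zone_iter) for _ in range(node_count)]

def replicas(ring, start, rf):
    """Walk clockwise from `start`, preferring nodes in zones not yet used."""
    chosen, seen_zones = [], set()
    n = len(ring)
    # First pass: take one node per zone that has not been used yet.
    for step in range(n):
        idx = (start + step) % n
        if ring[idx] not in seen_zones:
            chosen.append(idx)
            seen_zones.add(ring[idx])
        if len(chosen) == rf:
            return chosen
    # Second pass: every zone is used, so fill the rest in ring order.
    for step in range(n):
        idx = (start + step) % n
        if idx not in chosen:
            chosen.append(idx)
        if len(chosen) == rf:
            break
    return chosen

ring = interleave(6, ["zone-a", "zone-c"])
placement = replicas(ring, start=0, rf=2)
zones_used = {ring[i] for i in placement}
print(placement, zones_used)
```

With two zones and a replication factor of 2, every pair of replicas in this model spans both zones, which is why losing one zone still leaves a copy of the data reachable.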

Page 28: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Disk Layout

Pre-Launch: 2 ephemerals in RAID 0 (AWS m2.4xl, Eph0 + Eph1)
•  Higher throughput (I/O spreads across 2 devices for reading & writing)
•  If you lose 1 device, you lose the array!

Launch: 2 ephemerals in RAID 1 (AWS m2.4xl, Eph0 + Eph1)
•  Higher throughput for reading (I/O spreads across 2 devices), but not for writing
•  If you lose 1 device, the array continues in degraded mode
•  ½ the available space

Current: 2 individual ephemerals (AWS m2.4xl, Eph0 and Eph1)
•  Higher throughput (I/O spreads across 2 devices for reading & writing)
•  If you lose 1 device, Cassandra stops (configurable)
•  No RAID overhead
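
The capacity side of the three layouts is simple arithmetic. The sketch below uses the approximate 800 GB per ephemeral disk from the "typical node" slide; the function name and the wording of the failure notes are illustrative assumptions, not measurements.

```python
def layout_summary(disk_gb, disks=2):
    """Usable capacity and what one lost device means, per layout."""
    return {
        "RAID 0": {
            "usable_gb": disk_gb * disks,  # striped: all space is usable
            "one_disk_lost": "whole array is lost",
        },
        "RAID 1": {
            "usable_gb": disk_gb,  # mirrored: half the raw space
            "one_disk_lost": "array continues in degraded mode",
        },
        "individual disks": {
            "usable_gb": disk_gb * disks,  # no RAID layer at all
            "one_disk_lost": "Cassandra stops (configurable)",
        },
    }

summary = layout_summary(800)
for name, info in summary.items():
    print(f"{name}: {info['usable_gb']} GB usable; {info['one_disk_lost']}")
```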

Page 29: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Cluster Resizing

Page 30: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Thrift Payload Size

thrift_framed_transport_size_in_mb  thrift_max_message_length_in_mb

Page 31: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Bouncing Nodes •  phi_convict_threshold
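
phi_convict_threshold controls how suspicious the accrual failure detector must become before it marks a node as down. A minimal sketch of the idea, assuming heartbeat gaps follow an exponential distribution (a simplification, not Cassandra's code), shows why a higher threshold tolerates longer silences and so reduces flapping on a noisy network. The 1000 ms mean interval is a made-up value.

```python
import math

def phi(elapsed_ms, mean_interval_ms):
    """Suspicion level under an exponential heartbeat model.

    P(no heartbeat for `elapsed`) = exp(-elapsed / mean), and
    phi is -log10 of that probability.
    """
    return elapsed_ms / (mean_interval_ms * math.log(10))

def silence_tolerated_ms(threshold, mean_interval_ms):
    """How long a node may stay silent before phi exceeds the threshold."""
    return threshold * mean_interval_ms * math.log(10)

# Compare two thresholds for a hypothetical 1000 ms mean heartbeat gap.
for threshold in (8, 12):
    print(threshold, round(silence_tolerated_ms(threshold, 1000)), "ms")
```

In this model, each additional unit of threshold buys a fixed extra amount of tolerated silence, so the threshold trades detection speed against false convictions.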

Page 32: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Inter-DC Latency

Page 33: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Monitor system health •  Nagios •  Kibana/Elasticsearch •  Graphite •  AWS Cloudwatch •  App level monitoring •  Opscenter

Page 34: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Page 35: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

App level metrics

Page 36: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Lessons Learned

Page 37: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Fun with Astyanax Client
•  Cross DC Latencies
  – Several second latencies in JP and EE data centers
  – Astyanax configs to ensure local datacenters used
•  Imbalanced node traffic
  – Hashing algorithm (MD5 vs Murmur3)
•  DNS Caching in the JVM
  – Stale seed nodes

Page 38: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

A tale of 2 Nodes

Page 39: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Cluster lessons
•  A single bad node can raise app latencies significantly
•  Taking out an entire Cassandra cluster is easy (not so fun)
  – Compressing data before sending to Cassandra helps a lot
•  Corrupted SSTable resulted in cascading failure
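
Client-side compression of the kind mentioned above can be sketched like this. The slides do not say which codec or serialization format was used; zlib and JSON are assumptions here, and the payload is a hypothetical, repetitive stand-in for real application data.

```python
import json
import zlib

def pack(value):
    """Serialize to JSON and compress before writing it as a column value."""
    return zlib.compress(json.dumps(value).encode("utf-8"))

def unpack(blob):
    """Reverse of pack(): decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical payload; real savings depend on how repetitive the data is.
payload = {"items": [{"id": i, "status": "available"} for i in range(500)]}
raw = json.dumps(payload).encode("utf-8")
blob = pack(payload)
print(len(raw), "bytes raw,", len(blob), "bytes compressed")
```

Smaller values mean smaller requests on the wire and less data to flush and compact, at the cost of CPU time in the application.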

Page 40: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Page 41: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  Monitoring – Memtable flush frequency – Hinted handoffs – Garbage collection – Compactions – Histograms

Page 42: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  VPNs are a dangerous bottleneck
•  Easier to rebuild a node than to fix
•  Backup data
  – Replication factor helps but does not account for data corruption

Page 43: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  Denormalization costs
•  Disk is cheap but EC2s are not
•  TTL on almost everything
•  Adjust gc_grace_period based on TTL times
•  Transactions? Be creative
•  Load test with real data

Page 44: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  Replication strategy:
  – Read / Write pattern
  – Data is source of truth or not
  – Data locality
  – User level data vs App level data
•  Cluster wide commands should be staggered
  – Global repair :(
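
One way to stagger a cluster-wide command is to give every node its own start time. The sketch below only builds and prints such a schedule; the node names, start date, and four-hour gap are hypothetical, and the gap should be at least as long as one run takes on a node.

```python
from datetime import datetime, timedelta

def staggered_schedule(nodes, first_start, gap):
    """Give each node its own start time so no two runs begin together."""
    return [(node, first_start + i * gap) for i, node in enumerate(nodes)]

# Hypothetical node names and timing.
nodes = ["cass-01", "cass-02", "cass-03"]
schedule = staggered_schedule(nodes, datetime(2014, 9, 1, 2, 0), timedelta(hours=4))
for node, start in schedule:
    print(f"{start:%Y-%m-%d %H:%M}  run 'nodetool repair -pr' on {node}")
```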

Page 45: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Tokens
•  Vnodes vs Assigned Tokens
  – Increased chattiness on gossip protocol with vnodes
  – Perceived slowness on repair and cleanup operations on vnodes-enabled cluster
  – Astyanax client does not like vnodes…
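
For a cluster with assigned tokens, each node gets one evenly spaced initial token. A minimal sketch of that calculation for both partitioners follows; the function name is hypothetical, and the slides do not show how the team actually generated their tokens.

```python
def assigned_tokens(node_count, partitioner="murmur3"):
    """Evenly spaced initial tokens for a cluster without vnodes."""
    if partitioner == "murmur3":
        # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1.
        step = 2**64 // node_count
        return [-(2**63) + i * step for i in range(node_count)]
    if partitioner == "random":
        # RandomPartitioner (MD5) tokens span 0 .. 2**127 - 1.
        step = 2**127 // node_count
        return [i * step for i in range(node_count)]
    raise ValueError(f"unknown partitioner: {partitioner}")

for token in assigned_tokens(4):
    print(token)
```

Adding nodes to such a cluster means recalculating and moving tokens by hand, which is the operational cost vnodes were meant to remove.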

Page 46: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Compactions
•  Compactions are your worst enemy
  – Larger disk usage = high CPU & longer compactions
•  Leveled compaction vs size-tiered compaction
  – Start up time
  – CPU tradeoff
  – IO tradeoff
•  Updates + Removals eat up disks
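
The size-tiered idea can be illustrated with a simplified bucketing sketch: SSTables of similar size are grouped, and a group becomes a compaction candidate once it has enough members. This is not Cassandra's implementation; the 0.5 and 1.5 bounds and the threshold of 4 are treated as assumptions, and the sizes are invented.

```python
def size_tiered_buckets(sizes_mb, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * low <= size <= avg * high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets

def compaction_candidates(sizes_mb, min_threshold=4):
    """Buckets with enough similar-sized SSTables to trigger a compaction."""
    return [b for b in size_tiered_buckets(sizes_mb) if len(b) >= min_threshold]

# Invented SSTable sizes in MB.
sizes = [10, 11, 9, 12, 400, 100]
print(size_tiered_buckets(sizes))
print(compaction_candidates(sizes))
```

Because merged output lands in a larger tier, big SSTables are rewritten rarely, and overwritten or deleted data can occupy disk until the tier holding it is compacted.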

Page 47: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

We are hiring… sonyentertainmentnetwork.com/careers

