Cassandra Summit 2014: Deploying Cassandra for Call of Duty
DESCRIPTION
Presenters: Seán O Sullivan, Service Reliability Engineer, and Tim Czerniak, Software Engineer, at Demonware
This presentation covers the eight-month evaluation process we underwent to migrate some of Call of Duty’s core services from MySQL to Cassandra. We will outline our requirements, the process we followed for the evaluation, decisions we made around our schema, configuration and hardware, and some issues we encountered.
TRANSCRIPT
DEMONWARE
Deploying Cassandra for Call of Duty
#CassandraSummit
Tim Czerniak Software Engineer
DemonWare
Seán O Sullivan Operations Engineer
DemonWare
DEMON-WHO?
DemonWare is a subsidiary of Activision-Blizzard
We write, deploy and maintain client and server applications for Activision and Blizzard games
SERVICES
• Matchmaking
• Leaderboards
• Chat
• File Storage
• Leagues
• Social Network Integration
• etc…
TECHNOLOGIES
Client: C++, HTTP
Server: Python, Erlang, MySQL, CentOS, Puppet
OUR UNUSUAL USE CASE
[Chart; labels: Release, First weekend, Christmas, Peak]
“By failing to prepare, you are preparing to fail.”
– Benjamin Franklin
OUR PREDICAMENT
Needed to share data cross-DC…
…but MySQL isn’t so good at that.
SERVICES
• Progress store
  • High write, low read
  • File size ~4KB
  • Persistent
• Presence
  • High write, high read
  • Data size minimal
  • Transient
• Messaging
  • Low write, low read
  • Transient
REQUIREMENTS
• Cross DC
• Ease of consolidation and expansion
• Manageability for the operations teams
• Throughput
  • Storage: 1,500,000 reqs/min
  • Presence: 250,000 reqs/min
  • Messaging: 850,000 reqs/min
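For a sense of scale, the per-minute throughput figures in the requirements can be converted to per-second rates. The following is a small sketch of that arithmetic; the names `REQS_PER_MIN` and `per_second` are illustrative, and the combined total assumes all three services peak at the same moment, which the talk does not state.

```python
# Peak throughput requirements from the slide, in requests per minute.
REQS_PER_MIN = {
    "storage": 1_500_000,
    "presence": 250_000,
    "messaging": 850_000,
}

def per_second(reqs_per_min: int) -> float:
    """Convert a requests-per-minute figure to requests per second."""
    return reqs_per_min / 60

rates = {name: per_second(n) for name, n in REQS_PER_MIN.items()}

# Summing assumes the three peaks coincide (an assumption, not from the talk).
total_per_min = sum(REQS_PER_MIN.values())

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} reqs/sec")
print(f"combined: {total_per_min:,} reqs/min = {per_second(total_per_min):,.0f} reqs/sec")
```

Storage alone works out to 25,000 reqs/sec, which is the figure a load test would need to sustain for that service.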
EVALUATION
• Shortlisted suitable options
  • Riak
  • Cassandra
• Re-wrote our application backend, twice
LOAD TESTING
• Two clusters
• Single CPU, SSD and average memory
• Dual CPU, Spindles and high memory
• Used realistic user profiles
• Included peaks and troughs during testing
• Ran a soak test
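One way to include peaks and troughs in a load test is to drive the request rate from a time-of-day curve instead of a constant. The sketch below is illustrative only: the sinusoidal shape, the function name `target_rate`, and the trough ratio are assumptions, not the profile used in our tests.

```python
import math

def target_rate(t_seconds: float, peak_rate: float,
                trough_ratio: float = 0.3,
                period_seconds: float = 86_400) -> float:
    """Target requests/sec at time t for a daily sinusoidal load curve.

    The rate swings between peak_rate * trough_ratio and peak_rate,
    starting at the trough at t = 0 and reaching the peak half a period later.
    """
    trough = peak_rate * trough_ratio
    midpoint = (peak_rate + trough) / 2
    amplitude = (peak_rate - trough) / 2
    phase = 2 * math.pi * t_seconds / period_seconds
    return midpoint - amplitude * math.cos(phase)

# Sample the curve every six hours for a 25,000 reqs/sec peak
# (1,500,000 reqs/min, the storage requirement, expressed per second).
for hour in range(0, 25, 6):
    print(hour, round(target_rate(hour * 3600, peak_rate=25_000)))
```

A load generator could call `target_rate` once per second and pace its requests accordingly; a soak test would simply run the same curve over several periods.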
THE WINNER???
• Initially Riak was a slam-dunk
• Erlang-based (we know Erlang)
• Tooling is excellent
• Performed well
• Previously evaluated
THE WINNER
• Cassandra won in the end
• Write performance
• Richer feature set
• Maturity of codebase and tooling
• Testing continued 24/7 until launch
SCHEMA
• Progress store
  • A perfect fit!
• Presence
  • More relational
  • High throughput (Tombstones!)
  • TTLs
• Messaging
  • Time-series data, well suited
  • Tombstones!
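The tombstone concern can be illustrated with a toy model. In Cassandra, deleted data and data whose TTL has expired are kept as tombstones until compaction removes them after a grace period, so a read over a frequently rewritten partition may have to skip many dead cells. The sketch below is an in-memory illustration only, not Cassandra's storage engine; `Cell` and `read_partition` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    value: Optional[str]          # None marks an explicit delete (a tombstone)
    written_at: float             # write time in seconds
    ttl: Optional[float] = None   # time to live in seconds, if any

    def is_tombstone(self, now: float) -> bool:
        """A cell is dead if it was deleted or its TTL has elapsed."""
        if self.value is None:
            return True
        return self.ttl is not None and now >= self.written_at + self.ttl

def read_partition(cells: list, now: float):
    """Return live values plus the number of tombstones scanned past."""
    live = [c.value for c in cells if not c.is_tombstone(now)]
    tombstones = sum(1 for c in cells if c.is_tombstone(now))
    return live, tombstones

# A presence-style partition: one short-lived write per second for 100 seconds.
cells = [Cell(f"online-{i}", written_at=i, ttl=60) for i in range(100)]
live, dead = read_partition(cells, now=120)
print(len(live), "live,", dead, "tombstones scanned")
```

In this model most of the cells read at `now=120` are already dead, which is the pattern that makes high-throughput, TTL-heavy tables worth load testing.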
SCHEMA: LESSONS LEARNED
• Keep it simple
• It’s not a relational DB
• Get your partition keys and clustering keys right.
• C* will do what it does best
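To show why partition and clustering keys matter, here is a toy model of a time-series table: rows are grouped by partition key and kept sorted by clustering key, so a time-range query touches one partition and reads a contiguous slice. This is a sketch of the idea only; `TimeSeriesTable` and the sample keys are hypothetical and not our schema.

```python
import bisect
from collections import defaultdict

class TimeSeriesTable:
    """Toy model: rows grouped by partition key, sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, value)
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        """Insert a row, keeping the partition ordered by clustering key."""
        bisect.insort(self._partitions[partition_key], (clustering_key, value))

    def slice(self, partition_key, start, end):
        """Rows in one partition with start <= clustering key < end."""
        rows = self._partitions[partition_key]
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]

table = TimeSeriesTable()
table.insert("user:42", 1003, "gg")
table.insert("user:42", 1001, "hi")
table.insert("user:42", 1002, "ready?")
table.insert("user:7", 1001, "hello")

# Messages for one recipient in a time window: a single contiguous slice.
print(table.slice("user:42", 1001, 1003))
```

A query that cannot name the partition key has no such shortcut in this model, which mirrors why relational-style access patterns fit poorly.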
SCHEMA: LESSONS LEARNED
• Don’t ignore CAP theorem
• Cassandra has tuneable consistency, but there will be trade-offs
• Load test with real numbers
• Some issues aren’t evident in unit-tests
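The consistency trade-off can be made concrete with the usual overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The sketch below only does that arithmetic; the replication factor of 3 is an assumption for illustration, not a value from the talk.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a quorum: a strict majority."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3  # assumed replication factor, for illustration only
q = quorum(rf)
print("QUORUM read, QUORUM write:", reads_see_latest_write(q, q, rf))
print("ONE read, ONE write:", reads_see_latest_write(1, 1, rf))
print("ONE read, ALL write:", reads_see_latest_write(1, rf, rf))
```

Lower levels such as ONE reduce latency and tolerate more node failures, at the cost of possibly reading stale data; higher levels do the reverse.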
CONFIG
• Default settings, probably not what you want
• Changed many settings off the bat
• Reverted some (oops)
HARDWARE
• 2x Intel Xeon E5-2620 @ 2GHz
• 2x 480GB SSD (RAID-1)
• 32GB RAM
• 1Gb non-dedicated network
MONITORING
• Graphite
• Nagios
• Jolokia
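For context on the Graphite side, metrics reach Carbon in a simple plaintext form: one line per data point containing the metric path, the value, and a Unix timestamp. The sketch below formats and optionally sends such a line; the metric path and the helper names are hypothetical, and port 2003 is Carbon's usual plaintext port rather than something stated in the talk.

```python
import socket
import time

def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp\\n'."""
    return f"{path} {value} {timestamp}\n"

def send_metric(host: str, port: int, path: str, value: float) -> None:
    """Send a single metric to a Carbon plaintext listener."""
    line = graphite_line(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Formatting only; no network connection is made here.
print(graphite_line("cassandra.node1.write_latency_ms", 1.7, 1400000000), end="")
```

Calling `send_metric("graphite.example.com", 2003, ...)` would push one data point, assuming a Carbon listener is reachable at that (hypothetical) host.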
GOTCHAS
• Vnodes and rack awareness
• Load balancers
• Dev differs from production (of course...)
• Launching in a DC we didn't load test in
LAUNCH
• Request to simulate a node failure
• Two nodes died over Christmas
• Expanding to other titles
QUESTIONS?
APPENDIX
cassandra.conf:
auto_bootstrap: false
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
trickle_fsync: true
rpc_server_type: hsha
<% if virtual == "physical" -%>
concurrent_reads: 128
<% else -%>
concurrent_reads: 32
<% end -%>
concurrent_writes: <%= processorcount.to_i * 8 -%>
multithreaded_compaction: false
<% if virtual == "physical" -%>
compaction_throughput_mb_per_sec: 0
<% else -%>
compaction_throughput_mb_per_sec: 16
<% end -%>
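The conditional settings in the template above can be mirrored in a few lines of Python to see what a given host would receive. This is a sketch of the template's branching only; `render_settings` is a hypothetical helper, and the sample `virtual` and `processorcount` values are illustrative rather than taken from our hosts.

```python
def render_settings(virtual: str, processorcount: int) -> dict:
    """Mirror the template's branches for a host's 'virtual' and 'processorcount' facts."""
    physical = virtual == "physical"
    return {
        # Physical hosts get a much larger read pool than virtual ones.
        "concurrent_reads": 128 if physical else 32,
        # Writers scale with the processor count reported for the host.
        "concurrent_writes": processorcount * 8,
        # 0 disables compaction throttling; virtual hosts are capped at 16 MB/s.
        "compaction_throughput_mb_per_sec": 0 if physical else 16,
    }

print(render_settings("physical", 24))  # hypothetical physical host
print(render_settings("kvm", 4))        # hypothetical virtual host
```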
cassandra-env.sh:
<% if virtual == "physical" -%>
JVM_OPTS="$JVM_OPTS -Xss180k"
<% else -%>
JVM_OPTS="$JVM_OPTS -Xss228k"
<% end -%>
JVM_EXTRA_OPTS="$JVM_EXTRA_OPTS -javaagent:/usr/share/java/graphite-reporter-agent.jar -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8080,host=<%= hostname %>"
EXTRA_CLASSPATH="/usr/share/java/metrics-graphite-2.0.3.jar"