Cassandra Summit 2010 Performance Tuning


Upload: driftx

Post on 03-Jul-2015


Page 1: Cassandra Summit 2010 Performance Tuning

Cassandra Summit 1.0
Performance Tuning

Brandon Williams
Riptano, Inc.
[email protected]
[email protected]
@faltering
driftx on freenode

August 10, 2010

Brandon Williams Cassandra Summit 1.0

Page 2: Cassandra Summit 2010 Performance Tuning

Tuning Writes / Tuning Reads

Making writes faster

- Use a separate IO device for the commit log.
- Hard to accomplish in the cloud
  - Rackspace: one IO device, but it’s persistent (RAID array underneath)
  - EC2: EBS is slow, local disk is not persistent
    - You could put the commitlog on the ephemeral drive anyway, at the price of durability
    - But then, why have a commitlog at all?
    - Maybe you can disable it in 0.7/0.8
- Realservers: one RAID array, bad RAID options
- Will anyone ever offer SSDs?
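Whether the commit log really sits on its own device can be checked from the operating system. A minimal sketch, assuming hypothetical directory paths (the two paths below are illustrative and not taken from the talk). It compares the device IDs reported by os.stat; this identifies the filesystem, so two partitions on the same physical disk would still look separate.

```python
import os


def same_device(path_a, path_b):
    """Return True when both paths are on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical locations; substitute the directories from your own config.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"
    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        if same_device(commitlog_dir, data_dir):
            print("commit log and data share a device")
        else:
            print("commit log and data are on separate devices")
    else:
        print("directories not found; adjust the paths")
```

If the two directories share a device, commit log appends compete with data file reads and writes for the same disk.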

Page 10: Cassandra Summit 2010 Performance Tuning

What else?

- concurrent writers (concurrent readers for reads)
  - increase if you have lots of cores
- memtable flush writers
  - increase if you have lots of IO
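The slide gives no numbers. As a starting point only, the comments in later cassandra.yaml files suggest roughly 8 concurrent writes per core and 16 concurrent reads per data drive. A sketch of that arithmetic, with both multipliers treated as assumptions to be checked by measurement:

```python
import os


def suggested_concurrency(cores, data_drives,
                          writers_per_core=8, readers_per_drive=16):
    """Starting values only; the multipliers are assumptions, not from the talk."""
    return {
        "concurrent_writes": cores * writers_per_core,
        "concurrent_reads": data_drives * readers_per_drive,
    }


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(suggested_concurrency(cores, data_drives=1))
```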

Page 12: Cassandra Summit 2010 Performance Tuning

What are all these options?

- memtable throughput in mb
- memtable operations in millions
- memtable flush after mins
- bigger memtables improve writes?
  - no, but they can improve reads
  - what?
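The three settings can be read as independent flush triggers: a memtable is flushed when any one of them is reached. A simplified model of that decision (the class and function names are illustrative, and the real accounting inside Cassandra is more involved):

```python
from dataclasses import dataclass


@dataclass
class MemtableLimits:
    throughput_mb: float          # memtable throughput in mb
    operations_millions: float    # memtable operations in millions
    flush_after_mins: float       # memtable flush after mins


def should_flush(bytes_written, operations, age_seconds, limits):
    """True as soon as any single threshold has been reached."""
    return (
        bytes_written >= limits.throughput_mb * 1024 * 1024
        or operations >= limits.operations_millions * 1_000_000
        or age_seconds >= limits.flush_after_mins * 60
    )
```

Raising the limits delays the flush, which is why larger memtables mean fewer, larger SSTables.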

Page 15: Cassandra Summit 2010 Performance Tuning

Compaction: the slayer of reads

- a necessary evil
- IO contention hell
- you can reduce compaction priority in 0.6.4 or later
  - -Dcassandra.compaction.priority=1
  - constantly outstripping it means you need more nodes
  - reducing the priority affects CPU usage, not IO
- avoid reading from slow hosts
  - dynamic snitch
  - accrual failure detector
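The idea behind avoiding slow hosts can be illustrated with a toy latency tracker: keep a smoothed latency per replica and read from the fastest one. This is only a sketch of the concept, not the dynamic snitch's actual algorithm; the class name, the smoothing factor, and the host addresses are invented for illustration.

```python
class LatencyTracker:
    """Tracks an exponentially weighted moving average of latency per host."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha   # weight given to the newest sample
        self.scores = {}     # host -> smoothed latency in milliseconds

    def record(self, host, latency_ms):
        previous = self.scores.get(host)
        if previous is None:
            self.scores[host] = float(latency_ms)
        else:
            self.scores[host] = self.alpha * latency_ms + (1 - self.alpha) * previous

    def fastest_first(self, hosts):
        """Order hosts by smoothed latency; hosts with no samples sort last."""
        return sorted(hosts, key=lambda h: self.scores.get(h, float("inf")))


tracker = LatencyTracker()
tracker.record("10.0.0.1", 5)
tracker.record("10.0.0.2", 50)
tracker.record("10.0.0.2", 70)   # e.g. a replica that is busy compacting
print(tracker.fastest_first(["10.0.0.2", "10.0.0.1", "10.0.0.3"]))
```

A host that slows down during compaction drifts to the back of the ordering and recovers once its latency drops again.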

Page 24: Cassandra Summit 2010 Performance Tuning

Compaction (cont’d)

- bigger memtables absorb more overwrites
  - fewer sstables make for more efficient compaction
- if you are write once then read-only, you *could* turn it off
  - merge-on-read and bloomfilters save you
  - someday, you’ll want to repair
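How a bigger memtable absorbs overwrites can be seen in a small simulation. The sketch below is a deliberately crude model, assuming the memtable is flushed once it holds a fixed number of distinct keys (real flushes are driven by size, operation count, and age); the function name is illustrative.

```python
def count_flushes(keys, memtable_capacity):
    """Count flushes when the memtable holds at most `memtable_capacity` distinct keys."""
    memtable = {}
    flushes = 0
    for position, key in enumerate(keys):
        memtable[key] = position   # an overwrite replaces the entry in place
        if len(memtable) >= memtable_capacity:
            flushes += 1           # each flush would produce one SSTable
            memtable = {}
    return flushes


# 1000 writes that keep overwriting the same 10 keys.
writes = [i % 10 for i in range(1000)]
print(count_flushes(writes, 5))    # small memtable: flushes constantly
print(count_flushes(writes, 20))   # large memtable: overwrites never leave memory
```

Every flush avoided is one less SSTable for compaction to merge later.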

Page 29: Cassandra Summit 2010 Performance Tuning

Know your read pattern

- how much data is in the working set?
  - disk is slow: you want that in memory
  - sometimes you can’t afford the cost
- how many reads are repeats?
- doing lots of random IO within a row?
  - column index size in kb
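For wide rows, the column index size controls how finely a row is indexed: a smaller block means more index entries but less data to scan per column lookup. A rough model of that trade-off, assuming the index simply divides the row into fixed-size blocks (a simplification; the function name is illustrative):

```python
import math


def column_index_tradeoff(row_bytes, index_block_kb):
    """Return (index entries, worst-case bytes scanned to reach one column)."""
    block_bytes = index_block_kb * 1024
    entries = math.ceil(row_bytes / block_bytes)
    return entries, min(row_bytes, block_bytes)


ten_mb = 10 * 1024 * 1024
print(column_index_tradeoff(ten_mb, 64))   # coarse index, larger scans
print(column_index_tradeoff(ten_mb, 4))    # fine index, smaller scans
```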

Page 35: Cassandra Summit 2010 Performance Tuning

Caches

- on a cold hit, each row requires two seeks
  - one to find the row’s position in the index
    - key cache eliminates this
  - another to read the row
    - row cache eliminates this, too
- columns in the row are contiguous afterwards
  - make fat rows
  - but not too fat, since the row is the unit of distribution
- the OS file cache
  - use a good OS
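The seek counts above translate into a simple expected-cost formula: a row cache hit costs no seeks, a key cache hit costs one, and a cold read costs two. A sketch of that arithmetic, assuming the key cache hit rate is measured over reads that already missed the row cache (the function name is illustrative):

```python
def expected_seeks(row_hit_rate, key_hit_rate):
    """Average disk seeks per read, ignoring the OS file cache."""
    row_miss = 1.0 - row_hit_rate
    seeks_after_row_miss = key_hit_rate * 1 + (1.0 - key_hit_rate) * 2
    return row_miss * seeks_after_row_miss


print(expected_seeks(0.0, 0.0))   # everything cold
print(expected_seeks(0.0, 1.0))   # perfect key cache, no row cache
print(expected_seeks(1.0, 0.0))   # perfect row cache
```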

Page 45: Cassandra Summit 2010 Performance Tuning

Caching Strategies

- key cache
  - excellent bang for your buck
  - half your seeks are gone
  - a lot of keys fit in a relatively small amount of memory
- row cache
  - all seeks are gone
  - but more heap usage = more GC pressure
    - trying to use 32GB of row cache will wreck you
  - estimating the correct size can be difficult
    - use the average row size in cfstats as a starting point
    - in 0.7, each SSTable has a persistent row size histogram
    - the penalty for being wrong can be catastrophic: OOM
    - can’t be done programmatically in Java, or Cassandra would do it for you
    - this is why you can’t set an absolute amount in bytes
  - if you enable it on very fat rows, it can be bad
    - keep your indexes in a different column family
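The cfstats starting point amounts to multiplying the number of cached rows by the average row size. A back-of-the-envelope sketch follows; the overhead_factor parameter is a made-up knob standing in for Java object overhead, which the talk notes cannot be computed exactly, so treat the result as a lower bound rather than a guarantee.

```python
def row_cache_heap_bytes(rows_cached, avg_row_bytes, overhead_factor=1.0):
    """Rough heap needed to cache `rows_cached` rows of average size."""
    return int(rows_cached * avg_row_bytes * overhead_factor)


def max_rows_for_budget(budget_bytes, avg_row_bytes, overhead_factor=1.0):
    """Largest row count that fits in a heap budget under the same estimate."""
    return int(budget_bytes // (avg_row_bytes * overhead_factor))


# Example numbers are invented: 2 KB average row size, 1 GB heap budget.
print(row_cache_heap_bytes(100_000, 2048))
print(max_rows_for_budget(1024 ** 3, 2048))
```

Because the average hides fat rows, leave generous headroom and watch heap usage after enabling the cache.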

Page 52: Cassandra Summit 2010 Performance Tuning

Caching Strategies (cont’d)

- OS file cache: it’s free
  - no size estimation needed
- mmap is great
  - unless it makes you swap
    - switch to mmap index only
    - why do you have swap enabled, anyway?
- Absolute numbers vs percentages
  - percentages can be an OOM time bomb
  - harder to calculate how much memory the cache will use
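Why a percentage can be a time bomb: the cache grows with the data set, while an absolute number stays put. A small sketch, assuming a setting that is either a string such as "10%" or an absolute row count (the function name and the row counts are illustrative):

```python
def cached_rows(total_rows, setting):
    """Number of rows the cache may hold for a percentage or absolute setting."""
    if isinstance(setting, str) and setting.endswith("%"):
        return int(total_rows * float(setting[:-1]) / 100)
    return min(int(setting), total_rows)


for total in (1_000_000, 100_000_000):
    print(total, cached_rows(total, "10%"), cached_rows(total, 200_000))
```

With the percentage, a hundredfold increase in rows means a hundredfold increase in cache entries, and heap usage follows.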

Page 59: Cassandra Summit 2010 Performance Tuning

Caching Strategies (cont’d)

- lookup order:
  - row cache
  - key cache
  - disk (file cache?)
- sizing your caches:
  - large key cache
  - smaller row cache for very hot rows
  - leave the rest to the OS
- don’t make your heap larger than needed
- monitor hit rates via JMX
  - actually, monitor everything you can
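The lookup order can be sketched as a toy read path that counts seeks. This models only the order of checks described above; the class name and its fields are invented, the caches never evict, and the OS file cache is ignored.

```python
class ToyReadPath:
    """Toy model of the lookup order: row cache, then key cache, then disk."""

    def __init__(self, disk_rows, use_row_cache=True):
        self.disk_rows = disk_rows          # key -> row, stands in for SSTable data
        self.use_row_cache = use_row_cache
        self.row_cache = {}                 # key -> row
        self.key_cache = {}                 # key -> position of the row on disk
        self.seeks = 0

    def read(self, key):
        if key in self.row_cache:
            return self.row_cache[key]      # row cache hit: no seeks
        if key not in self.key_cache:
            self.seeks += 1                 # seek 1: find the row's position in the index
            self.key_cache[key] = key       # toy stand-in for a file offset
        self.seeks += 1                     # seek 2: read the row itself
        row = self.disk_rows[key]
        if self.use_row_cache:
            self.row_cache[key] = row
        return row


path = ToyReadPath({"user1": {"name": "x"}}, use_row_cache=False)
path.read("user1")   # cold: two seeks
path.read("user1")   # key cache hit: one seek
print(path.seeks)
```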

Page 63: Cassandra Summit 2010 Performance Tuning

Test, Measure, Tweak, Repeat

- use stress.py as a baseline
  - make sure you have multiprocessing
- move to real world data
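The measure step can be as simple as timing a repeated operation and comparing the numbers before and after each tweak. The harness below is a generic sketch, not stress.py; the function name is illustrative and the operation passed in is a stand-in for a real client call.

```python
import statistics
import time


def measure(operation, iterations):
    """Run `operation` repeatedly and report throughput and latency figures."""
    latencies = []
    start = time.perf_counter()
    for _ in range(iterations):
        began = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - began)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "count": iterations,
        "ops_per_sec": iterations / elapsed if elapsed > 0 else float("inf"),
        "median_s": statistics.median(latencies),
        "p99_s": latencies[p99_index],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real read or write against your cluster.
    print(measure(lambda: sum(range(1000)), 500))
```

Change one setting at a time between runs so that any difference in the numbers can be attributed to it.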

Page 66: Cassandra Summit 2010 Performance Tuning

Settings you don’t need to touch

- commitlog rotation threshold in mb
- SlicedBufferSizeInKB
- FlushIndexBufferSizeInMB

Page 67: Cassandra Summit 2010 Performance Tuning

The End

Questions?