C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner
DESCRIPTION
Librato's Metrics platform relies on Cassandra as its sole data storage platform for time-series data. This session will discuss how we have scaled from a single six-node Cassandra ring two years ago to the multiple storage rings that handle over 150,000 writes/second today. We'll cover the steps we have taken to scale the platform, including the evolution of our underlying schema, operational tricks, and client-library improvements. The session will finish with our suggestions on how we believe Cassandra as a project and its community can be improved.
TRANSCRIPT
#CASSANDRA13
Time-Series Metrics with Cassandra
Mike Heffner
What we do.
October 2011
- Decision: all measurements in Cassandra
- Single EC2 ring: 6 * m1.large
- Cassandra 0.8.x
- How does this work?
Today
- Multiple sharded rings
- ~250,000 writes / second
- EC2: m1.xlarge and m2.4xlarge
- Cassandra 1.1.x
- Read load: < 1%
Talk Highlights
- Matching Schema to Storage
- Optimally Expiring Data
- Monitor Everything
Matching Schema to Storage
What is a Measurement?
(Metric ID, Source)
(X, Y) => (Timestamp, Value)
Measurement CF
Example: Select measurements between times [T1, T2]:
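The wide-row layout and range select described above can be sketched with a plain in-memory model; the metric names, sources, and values below are purely illustrative, not Librato's actual schema:

```python
# Illustrative sketch of the Measurement CF: each (metric ID, source)
# pair keys a wide row whose columns map timestamps to values.
# All names and numbers here are made up for illustration.
measurements = {
    ("cpu.user", "web-1"): {       # row key: (metric ID, source)
        100: 0.42,                 # column: timestamp -> value
        160: 0.57,
        220: 0.31,
    },
}

# Select measurements between times [T1, T2]:
T1, T2 = 100, 200
row = measurements[("cpu.user", "web-1")]
selected = {t: v for t, v in sorted(row.items()) if T1 <= t <= T2}
print(selected)  # {100: 0.42, 160: 0.57}
```

Because columns within a row are sorted by timestamp, the real query is a contiguous column slice rather than a scan.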
Locating Rows
Let us calculate the maximum row size:
- 1-minute records
- 1-week TTL
- 7 days * 24 hours * 60 minutes => 10,080 columns
- 3 longs * 8 bytes * 10,080 => ~240 KB (not bad)
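The arithmetic above checks out directly; the three-longs-per-column layout is taken from the slide:

```python
# Verify the slide's max-row-size arithmetic: 1-minute records kept for
# a 1-week TTL, with each column assumed to hold 3 longs (per the slide).
columns_per_row = 7 * 24 * 60        # one week of 1-minute records
bytes_per_column = 3 * 8             # 3 longs * 8 bytes each
row_bytes = columns_per_row * bytes_per_column
print(columns_per_row)   # 10080
print(row_bytes)         # 241920 bytes, i.e. ~240 KB
```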
Row Storage Over Time
Seek All The SStables
Examining CF SSTables

nodetool cfhistograms Metrics metric_id_epochs_60

Metrics/metric_id_epochs_60 SSTables-per-read histogram:

Offset  SSTables
1          28821
2          58859
3         201198
4         178326
5         223016
6         154952
7          83289
8          21552
10         81104
Splitting the Rows
Retrieve time bases for times 31->45 for metric ID 12:

mget(Rows: [12, EBase_30], [12, EBase_40], Columns: {31->45})
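A hypothetical sketch of how a query range maps onto the split rows: rows are keyed by (metric ID, epoch base), and a time range expands to the set of bucketed row keys it spans. The bucket width of 10 is assumed only to match the slide's EBase_30/EBase_40 example:

```python
# Hypothetical row-splitting scheme: each row key is (metric_id, epoch_base),
# where epoch_base buckets time into fixed-width windows. The width of 10
# is an assumption chosen to reproduce the slide's example.
def row_keys_for_range(metric_id, t_start, t_end, base_width=10):
    """Return the (metric_id, epoch_base) row keys covering [t_start, t_end]."""
    first = (t_start // base_width) * base_width
    last = (t_end // base_width) * base_width
    return [(metric_id, base) for base in range(first, last + 1, base_width)]

print(row_keys_for_range(12, 31, 45))
# [(12, 30), (12, 40)] -> the EBase_30 and EBase_40 rows from the slide
```

Splitting rows this way bounds row size and lets reads touch only the SSTables holding the relevant time buckets.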
Examining CF SSTables

nodetool cfhistograms Metrics metric_id_epochs_60

Before:

Offset  SSTables
1          28821
2          58859
3         201198
4         178326
5         223016
6         154952
7          83289
8          21552
10         81104

After:

Offset  SSTables
1        3491820
2        5389762
3        4095760
4        1310741
5           9976
/graph me
Optimally Expiring Data
TTL Expiration
- Churn of about 750 GB / day
- 12 TB total
- ~6% of the data set per day
- gc_grace = 0
- STC (size-tiered compaction)
Synchronized Compactions
nodetool compact
* http://hight3ch.com/garbage-truck-crushing-a-car/
nodetool cleanup
Cleanup
- Not just for topology changes
- Tombstoned rows (no longer referenced)
- Rotated row keys decrease references
- Cons: must process every sstable
Immutable SStables
Leverage SStable Mod Time
- If now - mtime > TTL => all data in the sstable has expired
- We can quickly eliminate entire sstables:

  find -mtime +<TTL> -name '*.db' | xargs rm

- Fast and low overhead
- Cons: requires a rolling restart

26G 2013-05-17 09:44 Metrics-metrics_60-hf-7209-Data.db
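Because SSTables are immutable, the file's modification time bounds the age of its newest column; once that exceeds the TTL, every column inside has expired. A minimal Python sketch of that check, assuming a flat data directory of `*-Data.db` files and the 1-week TTL from the earlier example:

```python
# Hedged sketch of the mtime-based expiry check: an SSTable whose last
# modification is older than the TTL contains only expired data, so the
# whole file can be dropped. Directory layout and TTL are assumptions.
import os
import time

TTL_SECONDS = 7 * 24 * 3600  # 1-week TTL, as in the earlier example

def fully_expired_sstables(data_dir, ttl=TTL_SECONDS, now=None):
    """Yield paths of *-Data.db files whose mtime is older than the TTL."""
    now = time.time() if now is None else now
    for name in os.listdir(data_dir):
        if not name.endswith("-Data.db"):
            continue  # skip index/filter/statistics components
        path = os.path.join(data_dir, name)
        if now - os.path.getmtime(path) > ttl:
            yield path
```

As the slide notes, deleting the files out from under a running node still requires a rolling restart so Cassandra drops its references to them.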
nodetool setcompactionthreshold
Increasing minor compactions
- By default, STC requires a minimum of 4 sstables per compaction
- Leads to large non-compacted sstables
- Dropping the minimum to 2 can flatten storage growth:

  nodetool setcompactionthreshold <ks> <cf> 2

- Cons: CPU/IO increase
Result
Effective Monitoring
Ring Dashboards
Disk Errors => Throw Away
- If you ever see this, replace the node!

  end_request: I/O error, dev xvdb, sector 467940617

- Mark the node down, bootstrap a new one
- No metric for this?
Cassandra Log Volume
- Count log lines seen every 10 minutes
- Track over time
- Can identify:
  - Unbalanced workloads
  - Schema disagreements
  - Phantom gossip nodes
  - GC activity
- grep -v '.java' => exceptions
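The counting step above can be sketched in a few lines: bucket log lines into 10-minute windows and count them, so a spike in volume flags trouble even before you read the messages. The timestamp layout assumed here is a generic log4j-style format, not necessarily Librato's exact configuration:

```python
# Sketch of the log-volume counter: count Cassandra log lines per
# 10-minute window. The assumed line layout is log4j-style:
#   LEVEL [ThreadName] YYYY-MM-DD HH:MM:SS,ms message...
from collections import Counter
from datetime import datetime

def count_per_window(lines, window_minutes=10):
    """Return a Counter mapping window-start datetimes to line counts."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        try:
            ts = datetime.strptime(parts[2] + " " + parts[3].split(",")[0],
                                   "%Y-%m-%d %H:%M:%S")
        except (IndexError, ValueError):
            continue  # skip lines without a parseable timestamp
        bucket = ts.replace(minute=(ts.minute // window_minutes) * window_minutes,
                            second=0)
        counts[bucket] += 1
    return counts
```

Graphing these counts per node over time is what surfaces the unbalanced workloads and gossip anomalies listed above.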
Q & A
Mike Heffner
/mheffner