C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Upload: planet-cassandra

Post on 24-Jan-2015

DESCRIPTION

Librato's Metrics platform relies on Cassandra as its sole data storage platform for time-series data. This session will discuss how we have scaled from a single six-node Cassandra ring two years ago to the multiple storage rings that handle over 150,000 writes/second today. We'll cover the steps we have taken to scale the platform, including the evolution of our underlying schema, operational tricks, and client-library improvements. The session will finish with our suggestions on how Cassandra as a project, and its community, can be improved.

TRANSCRIPT

Page 1: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

#CASSANDRA13

Time-Series Metrics with Cassandra Mike Heffner

Page 2: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

What we do.

Page 3: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

October 2011

• Decision: All measurements in Cassandra
• Single EC2 Ring: 6 * m1.large
• Cassandra 0.8.x
• How does this work?

Page 4: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Today

• Multiple sharded rings
• ~250,000 writes / second
• EC2: m1.xlarge and m2.4xlarge
• Cassandra 1.1.x
• Read load: < 1%

Page 5: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Talk Highlights

• Matching Schema to Storage
• Optimally Expiring Data
• Monitor Everything

Page 6: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Matching Schema to Storage

Page 7: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

What is a Measurement?

( Metric ID, Source )

(X, Y) => (Time stamp, Value)

Page 8: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Measurement CF

Example: Select measurements between times [T1, T2]:

Page 9: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Locating Rows

Let us calculate the maximum row size:

• 1 minute records
• 1 week TTL
• 7 days * 24 hours * 60 minutes => 10,080
• 3 Longs * 8 bytes * 10k => ~240KB (not bad)
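The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the estimate as stated: it counts the three 8-byte Longs per record and ignores any per-column storage overhead, which the slide does not mention. The function and constant names are made up for illustration.

```python
# Worked version of the slide's row-size estimate.
MINUTES_PER_WEEK = 7 * 24 * 60   # one record per minute, kept for a 1 week TTL
BYTES_PER_RECORD = 3 * 8         # 3 Longs at 8 bytes each (overhead ignored)


def max_row_bytes(records: int = MINUTES_PER_WEEK,
                  bytes_per_record: int = BYTES_PER_RECORD) -> int:
    """Upper bound on row payload size for the given record count."""
    return records * bytes_per_record


print(MINUTES_PER_WEEK)   # 10080 records per row
print(max_row_bytes())    # 241920 bytes, i.e. roughly 240KB
```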

Page 10: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Row Storage Over Time

Page 11: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Row Storage Over Time

Page 12: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Seek All The SStables

Page 13: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Examining CF SSTables

nodetool cfhistograms Metrics metric_id_epochs_60

Metrics/metric_id_epochs_60 histograms

Offset  SSTables
1       28821
2       58859
3       201198
4       178326
5       223016
6       154952
7       83289
8       21552
10      81104

Page 14: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Splitting the Rows

Retrieve Time Bases for Times 31->45 for metric ID 12:

mget(Rows: [12, EBase_30], [12, EBase_40], Columns: {31->45})
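A minimal sketch of how the row keys in this lookup could be derived. It assumes a fixed time-base width of 10 time units and a row key of (metric ID, time base), both inferred from the example on the slide (times 31->45 map to bases 30 and 40). The function names are hypothetical and are not taken from Librato's client library.

```python
from typing import List, Tuple


def time_base(t: int, width: int) -> int:
    """Floor a timestamp to the start of its time-base bucket."""
    return t - (t % width)


def row_keys_for_range(metric_id: int, start: int, end: int,
                       width: int) -> List[Tuple[int, int]]:
    """List every (metric_id, time_base) row that can hold columns in [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    first = time_base(start, width)
    last = time_base(end, width)
    return [(metric_id, base) for base in range(first, last + width, width)]


# Metric ID 12, times 31 through 45, assumed bucket width of 10.
print(row_keys_for_range(12, 31, 45, 10))   # [(12, 30), (12, 40)]
```

A reader then issues one multiget across those rows with the column slice 31->45, so the number of rows touched grows with the query window rather than with the age of the metric.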

Page 15: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Examining CF SSTables

nodetool cfhistograms Metrics metric_id_epochs_60

Before:

Metrics/metric_id_epochs_60

Offset  SSTables
1       28821
2       58859
3       201198
4       178326
5       223016
6       154952
7       83289
8       21552
10      81104

After:

Metrics/metric_id_epochs_60

Offset  SSTables
1       3491820
2       5389762
3       4095760
4       1310741
5       9976
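To put a single number on the before/after comparison, the two histograms can be reduced to a weighted mean. This sketch assumes the Offset column is the number of SSTables touched by a read and the SSTables column is how many reads fell in that bucket; the dictionaries simply copy the figures from the slide.

```python
from typing import Dict

# Offset -> count, copied from the two cfhistograms outputs on the slide.
before: Dict[int, int] = {
    1: 28821, 2: 58859, 3: 201198, 4: 178326, 5: 223016,
    6: 154952, 7: 83289, 8: 21552, 10: 81104,
}
after: Dict[int, int] = {
    1: 3491820, 2: 5389762, 3: 4095760, 4: 1310741, 5: 9976,
}


def mean_sstables_per_read(hist: Dict[int, int]) -> float:
    """Weighted mean of the offsets, weighted by the read counts."""
    total_reads = sum(hist.values())
    return sum(offset * count for offset, count in hist.items()) / total_reads


print(f"before: {mean_sstables_per_read(before):.2f} SSTables per read")
print(f"after:  {mean_sstables_per_read(after):.2f} SSTables per read")
```

Under that reading, the mean drops from roughly 4.9 SSTables per read to roughly 2.2, and the long tail out to 10 SSTables disappears.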

Page 16: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

/graph me

Page 17: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Optimally Expiring Data

Page 18: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

TTL Expiration

• Churn of about 750GB / day
• 12 TB total
• 6% of data set
• gc_grace = 0
• STC

Page 19: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Synchronized Compactions

Page 20: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Page 21: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

nodetool compact

Page 22: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

* http://hight3ch.com/garbage-truck-crushing-a-car/

Page 23: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

nodetool cleanup

Page 24: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Cleanup

• Not just for topology changes
• Tombstoned rows (not referenced)
• Rotated row keys decrease references
• Cons: Must process every sstable.

Page 25: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Immutable SStables

Page 26: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Leverage SStable Mod Time

• If now - mtime > TTL => all data is expired
• We can quickly eliminate entire sstables:
    find -mtime +<TTL> -name '*.db' | xargs rm
• Fast and low overhead
• Cons: Rolling restart

26G 2013-05-17 09:44 Metrics-metrics_60-hf-7209-Data.db
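The same mtime test can be sketched in Python as a dry run that only lists candidates and deletes nothing. This is an illustration, not the tooling used in the talk: the function name, the demo file names, and the one-week TTL in the demo are assumptions, and it matches every *.db file just as the find command does.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def expired_sstable_files(data_dir: str, ttl_seconds: float,
                          now: Optional[float] = None) -> List[Path]:
    """Return *.db files whose mtime is more than ttl_seconds in the past."""
    now = time.time() if now is None else now
    return [
        path
        for path in sorted(Path(data_dir).rglob("*.db"))
        if path.is_file() and now - path.stat().st_mtime > ttl_seconds
    ]


# Demo on a throwaway directory with hypothetical file names.
WEEK = 7 * 24 * 60 * 60
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "Metrics-example-hf-1-Data.db"
    new = Path(tmp) / "Metrics-example-hf-2-Data.db"
    for f in (old, new):
        f.write_bytes(b"")
    stale = time.time() - 8 * 24 * 60 * 60   # pretend the file was last written 8 days ago
    os.utime(old, (stale, stale))
    candidates = [p.name for p in expired_sstable_files(tmp, WEEK)]

print(candidates)
```

Listing first and removing in a separate step makes it easier to review what would go before the rolling restart that the slide notes as the cost of this approach.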

Page 27: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

nodetool setcompactionthreshold

Page 28: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Increasing minor compactions

• By default, STC requires a minimum of 4 ssts
• Leads to large non-compacted sstables
• Dropping to 2 can flatten the storage growth
    nodetool setcompactionthreshold <ks> <cf> 2
• Cons: CPU/IO increase

Page 29: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Result

Page 30: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Effective Monitoring

Page 31: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Ring Dashboards

Page 32: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Disk Errors => Throw Away

• If you ever see this, replace!
    end_request: I/O error, dev xvdb, sector 467940617
    end_request: I/O error, dev xvdb, sector 467940617
• Mark node down, bootstrap new
• No metric for this?

Page 33: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Cassandra Log Volume

• Count log lines seen every 10 minutes
• Track over time
• Can identify:
    - Unbalanced workloads
    - Schema disagreements
    - Phantom gossip nodes
    - GC activity
• grep -v '.java' => exceptions
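Counting log lines per 10-minute window could look something like the sketch below. It assumes each log line carries a timestamp of the form YYYY-MM-DD HH:MM:SS somewhere in the line; the sample lines are invented for the demo and are not real Cassandra output. Lines without a timestamp (for example stack-trace continuations) are skipped here, which is a simplification.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed timestamp layout; capture everything down to the minute.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def count_lines_per_bucket(lines: Iterable[str],
                           bucket_minutes: int = 10) -> Counter:
    """Count log lines per time bucket (bucket_minutes should divide 60)."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # no timestamp on this line, skip it
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
        bucket = ts.replace(minute=ts.minute - ts.minute % bucket_minutes)
        counts[bucket] += 1
    return counts


# Hypothetical sample lines, made up for this example.
sample = [
    " INFO [main] 2013-05-17 09:44:01,123 Example.java (line 1) hypothetical message",
    " INFO [main] 2013-05-17 09:48:59,000 Example.java (line 2) hypothetical message",
    " WARN [main] 2013-05-17 09:51:00,500 Example.java (line 3) hypothetical message",
]
counts = count_lines_per_bucket(sample)
for bucket, n in sorted(counts.items()):
    print(bucket.strftime("%Y-%m-%d %H:%M"), n)
```

Submitting each bucket's count as a metric per node gives the over-time view the slide describes, where a node whose log volume diverges from its peers stands out.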

Page 34: C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Q & A

Mike Heffner

/mheffner

/mheffner