TRANSCRIPT

Page 1: Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra (Sam Bisbee, Threat Stack) | C* Summit 2016

Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra

Sam Bisbee, Threat Stack CTO

Page 2:

Typical [time series] problems on C*

● Disk utilization creates a scaling pattern of lighting money on fire

– Only works for a month or two, even with 90% disk utilization

● Every write-up we found focused on schema design for tracking integers across time

– There are days we wish we only tracked integers

● Data drastically loses value over time, but C*'s design doesn't acknowledge this

– TTLs only address zero-value states, not partial value

– E.g., 99% of reads are for data in its first day

● Not all sensors are equal

Page 3:

Categories of Time Series Data

[Chart: data categories plotted by Volume of Tx's vs. Size of Tx's — CRUD / Web 2.0, System Monitoring (CPU, etc.), traditional object store, and Threat Stack]

Page 4:

Categories of Time Series Data

[Same chart as the previous slide, annotated: "Traditional time series on C*, what everyone writes about" and "We're going to need a bigger boat. Or disks."]

Page 5:

We care about this thing called margins

(see: we're in Boston, not the Valley)

Page 6:

Data at Threat Stack

● 5 to 10 TB per day of raw data

– Crossed several TB per day in first few months of production with ~4 people

● 80,000 to 150,000 Tx per second, analyzed in real time

– Internal goal of analyzing, persisting, and firing alerts in <1s

● 90% write to 10% read tx

● Pre-compute query results for 70% of queries for UI

– Optimized lookup tables & complex data structures, not just “query & cache”

● 100% AWS, distrust of remote storage in our DNA

– This is not just EBS bashing. This applies to all databases on all platforms, even a cage in a data center.

● By the way, we're on DSE 4.8.4 (C* 2.1)

Page 7:

Generic data model

● Entire platform assumes that events form a partially ordered, eventually consistent, write ahead log

– A wonderful C* use case, so long as you only INSERT

● UPDATE is a dirty word and C* counters are “banned”

– We do our big counts elsewhere (“right tool for the right job”)

● No DELETEs, too many key permutations and don't want tombstones

● Duplicate writes will happen

– Legitimate: fully or partially failed batches of writes

– Legitimate: sensor resends data because it doesn't see platform's acknowledgement of data

– How-do-you-even-computer: people cannot configure NTP, so have fun constantly receiving data from 1970

● TTL on insert time, store and query on event time
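
To make the model concrete, here is a minimal CQL sketch of an insert-only table keyed on event time with the TTL applied at insert time. The table and column names (events, sensor_id, event_day, payload) and the 2-day TTL are illustrative assumptions, not Threat Stack's actual schema.

    -- Hypothetical event table: partitioned by sensor and event day (see the
    -- partition key discussion later in the deck), clustered by event time.
    CREATE TABLE events (
        sensor_id  uuid,
        event_day  text,       -- event day, e.g. '2016-09-08'
        event_ts   timestamp,  -- event time (what we store and query on)
        event_id   timeuuid,
        payload    blob,
        PRIMARY KEY ((sensor_id, event_day), event_ts, event_id)
    ) WITH CLUSTERING ORDER BY (event_ts DESC, event_id DESC);

    -- Insert only: no UPDATE, no DELETE, no counters. Duplicates are tolerated.
    -- The TTL counts from insert time, even though the row is keyed on event time.
    INSERT INTO events (sensor_id, event_day, event_ts, event_id, payload)
    VALUES (?, ?, ?, ?, ?)
    USING TTL 172800;  -- illustrative 2-day TTL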

Page 8:

We need to show individual events or slices, so we cannot use time-granularity rows (1 min, 15 min, 30 min, 1 hr, etc.)

Page 9:

Creating and updating tables' schema

● ALTER TABLE isn't fun, so we support dual writes instead

– Create new schema, performing dual reads for new & old

– Cut writes over to new schema

– After TTL time, DROP TABLE old

● Each step is verifiable with unit tests and metrics

● Maintains the insert-only data model, at the cost of temporary extra disk utilization

● Allows trivial testing of analysis and A/B'ing of schema

– Just toss a new schema in, gather some insights, and then feel free to drop it
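
A sketch of the same migration in CQL, assuming hypothetical table names events_v1 and events_v2:

    -- Step 1: stand the new schema up alongside the old one
    -- (the application reads from both events_v1 and events_v2 during the transition).
    CREATE TABLE events_v2 (
        sensor_id  uuid,
        event_day  text,
        event_ts   timestamp,
        event_id   timeuuid,
        payload    blob,
        new_column text,  -- whatever the schema change was
        PRIMARY KEY ((sensor_id, event_day), event_ts, event_id)
    );

    -- Step 2: cut writes over to events_v2 (an application config change, no CQL).

    -- Step 3: once the TTL has passed and events_v1 only holds expired data,
    -- drop it instead of ever running ALTER TABLE or DELETE.
    DROP TABLE events_v1;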

Page 10:

AWS Instance Types & EBS

● EBS is generally banned on our platform

– Too many of us lived through the great outage

– Too many of us cannot live with unpredictable I/O patterns

– Biggest reason: you cannot RI EBS

● Originally used i2.2xlarge's in 2014/2015

– Considering the amount of "learning" we did, we were very grateful for SSDs given the amount of streaming we had to do

● Moved to d2.xlarge's and d2.2xlarge's in 2015

– RAID 0 the spindles with xfs

– We like the CPU and RAM to disk ratio, especially since compaction stops after a few hours

Page 11:

$/TB on AWS

                    i2.2xlarge              d2.2xlarge              c3.2xlarge + 6 x 2TB io1 EBS
    No Prepay       $619.04 / 1.6TB         $586.92 / 12TB          $1,713.16 / 12TB
                    = $386.90 / TB / month  = $48.91 / TB / month   = $142.77 / TB / month
    Partial Prepay  $530.37 / 1.6TB         $502.12 / 12TB          $1,684.59 / 12TB
                    = $331.48 / TB / month  = $41.85 / TB / month   = $140.39 / TB / month
    Full Prepay     $519.17 / 1.6TB         $492.00 / 12TB          $1,680.84 / 12TB
                    = $324.48 / TB / month  = $41.00 / TB / month   = $140.07 / TB / month

● Amortizes one-time RI across 1yr, focusing on cost instead of cash out of pocket

● Does not account for N=3 replication in the cluster, so ×3 for each record, then ×2 for worst-case compaction headroom (realistically you need MUCH LESS)

● The c3 column is sized to match the d2's disk size, so it is not a fair comparison versus the i2

Page 12:

We only store some raw data in C*

● Deleting data proved too difficult in the early days, even with DTCS (slides coming on how we solved this)

● Re-streaming due to regular maintenance could take a week or more

– Dropping instance size doesn't solve throughput problem since all resources are cut, not just disk size

– Another reason not to use EBS since you'll “never” get close to 100% disk utilization

● Due to the aforementioned C* durability design, the cost of data for days 2..N is too high even if you drop the replica count

Page 13:

Tying C* to raw data

● Every query must constrain a minimum of:

– Sensor ID

– Event Day

● Every query result must include a minimum of:

– Sensor ID

– Event Day

– Event ID

● Batches of (sensor_id, event_day, event_id) triples are then used to look up the raw events from raw data storage

– This isn't always necessary (aggregates, correlations, etc.)

– Even with additional hops, full reads are still <1s
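
A sketch of what such a query looks like against the hypothetical events table from the earlier slides; the literal values are illustrative:

    -- Every query constrains at least the sensor and the event day (the partition
    -- key), and every result carries the (sensor_id, event_day, event_id) triple
    -- needed to fetch the raw event from raw data storage.
    SELECT sensor_id, event_day, event_id, event_ts
    FROM events
    WHERE sensor_id = 3f2504e0-4f89-11d3-9a0c-0305e82c3301
      AND event_day = '2016-09-08'
      AND event_ts >= '2016-09-08 00:00:00'
      AND event_ts <  '2016-09-08 01:00:00';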

Page 14:

Using triples to batch writes

● Partition key starts with sensor id and event day

– Bonus: you get a fresh ring location every day! Helps average out your schema mistakes over the TTL

● Event batches off of RabbitMQ are already constrained to a single sensor id and event day

– Allows mapping a single AMQP read to a single C* write (RabbitMQ is podded, not clustered)

– Flow state of pipeline becomes trivial to understand

● Batch C* writes on partition key, then on data size (soft cap at 5120 bytes, Cassandra's internal batch size warning threshold)
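
Because every statement in such a batch targets the same (sensor id, event day) partition, an unlogged batch stays a single mutation. A sketch against the hypothetical events table; the 5120-byte soft cap corresponds to Cassandra's default batch_size_warn_threshold_in_kb of 5:

    -- All statements in the batch share one partition (one sensor, one event day),
    -- so an UNLOGGED batch is effectively a single mutation and stays cheap.
    -- Keep the batch under ~5120 bytes, Cassandra's default batch size warning threshold.
    BEGIN UNLOGGED BATCH
      INSERT INTO events (sensor_id, event_day, event_ts, event_id, payload)
      VALUES (3f2504e0-4f89-11d3-9a0c-0305e82c3301, '2016-09-08',
              '2016-09-08 13:37:00', now(), 0xCAFE) USING TTL 172800;
      INSERT INTO events (sensor_id, event_day, event_ts, event_id, payload)
      VALUES (3f2504e0-4f89-11d3-9a0c-0305e82c3301, '2016-09-08',
              '2016-09-08 13:37:01', now(), 0xBEEF) USING TTL 172800;
    APPLY BATCH;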

Page 15:

Compaction woes, STCS & DTCS

● Used STCS in 2014/2015, expired data would get stuck ∞

– “We could rotate tables” → eh, no

– “We could rotate clusters” → oh c'mon, hell no

– “We could generate every historic permutation of keys within that time bucket with Spark and run DELETEs” →...............

● Used DTCS in 2015, but expired data still got stuck ∞

– When deciding whether an SSTable is too old to compact, DTCS compares "now" against the SSTable's max timestamp (most recent write)

– If you write constantly (time series), then SSTables will rarely or never stop compacting

– This means you never realize the true value of DTCS for time series: the ability to unlink whole expired SSTables from disk

Page 16:

Cluster disk states assuming constant sensor count

[Chart: Disk Util over Time, comparing "what you want" against "what you get" after the initial build up to the retention period]

Page 17:

MTCS, fixing DTCS

https://github.com/threatstack/mtcs

Now compare with min time (oldest write)

Page 18:

MTCS settings

● Never run repairs (never worked on STCS or DTCS anyway) and hinted handoff is off (great way to kill a cluster anyway)

● max_sstable_age_days = 1

base_time_seconds = 1 hour

● Results in roughly hour-bucketed, sequential SSTables

– Reads are happy due to the day or hour resolution, which we have to provide in the partition key anyway

● Rest of DTCS sub-properties are default

● Not worried about really old and small SSTables since those are simply unlinked “soon”
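
Roughly how these settings map onto CQL. The sketch below uses the stock DTCS class name for illustration; with MTCS the 'class' entry would instead point at the compaction strategy class shipped in the threatstack/mtcs jar (its exact name is not given in the slides):

    -- Hour buckets, SSTables stop being compaction candidates after one day.
    ALTER TABLE events WITH compaction = {
      'class': 'DateTieredCompactionStrategy',
      'max_sstable_age_days': '1',
      'base_time_seconds': '3600'   -- 1 hour
    };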

Page 19:

MTCS + sstablejanitor.sh

● Even with MTCS, SSTables were still not getting unlinked

● Enter sstablejanitor.sh

– Cron job fires it once per hour

– Iterates over each SSTable on disk for MTCS tables (chef/cron feeds it a list of tables and their TTLs)

– Uses sstablemetadata to determine max timestamp

– If past TTL, then uses JMX to invoke CompactionManager's forceUserDefinedCompaction on the table

● Hack? Yes, cron + sed + awk + JMX qualifies as a hack, but it works like a charm and we don't carry expired data

● Bonus: don't need to reserve half your disks for compaction

Page 20:

Page 21:

Discussion

@threatstack @sbisbee