![Page 1: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/1.jpg)
Making Cassandra perform as a time series database
Paul [email protected]
![Page 2: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/2.jpg)
Introduction
• real time streaming analytics for monitoring and alerting
• ingest many billions of points of timeseries data per day
• ingest at 1 second resolution
• all of this data ends up in cassandra
#CassandraSummit
![Page 3: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/3.jpg)
What we’re talking about
• a metric is an abstract quantity such as CPU load or heap size
• a source is some entity which measures and reports metrics
• a datapoint is a value for a metric from a source at some time
• a timeseries is a sequence of those datapoints over time
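The four terms above can be written down as a small data model. This is only an illustrative sketch: the class names, field names, and millisecond timestamps are assumptions made for the example, not SignalFx's actual types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Datapoint:
    metric: str        # abstract quantity, e.g. "cpu.load" (hypothetical name)
    source: str        # entity that measures and reports the metric
    timestamp_ms: int  # when the value was measured (assumed unit: ms)
    value: float


@dataclass
class Timeseries:
    metric: str
    source: str
    points: List[Datapoint] = field(default_factory=list)

    def append(self, point: Datapoint) -> None:
        # A timeseries only holds datapoints for its own (metric, source) pair.
        if (point.metric, point.source) != (self.metric, self.source):
            raise ValueError("datapoint belongs to a different timeseries")
        self.points.append(point)


series = Timeseries("cpu.load", "host-1")
series.append(Datapoint("cpu.load", "host-1", 1_000, 0.42))
series.append(Datapoint("cpu.load", "host-1", 2_000, 0.40))
```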
![Page 4: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/4.jpg)
overall performance (versions 0→1→2→3)
![Page 5: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/5.jpg)
original ingest path (version 0)
[Diagram components: sources, ingest server, loader queue, TSDB server, TSDB clients, TSDB C*]
![Page 6: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/6.jpg)
TSDB schema (versions 0,1,2,3)
CREATE TABLE table_0 (
    segment text,
    time timestamp,
    value blob,
    PRIMARY KEY (segment, time)
) WITH COMPACT STORAGE;
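In CQL, `PRIMARY KEY (segment, time)` makes `segment` the partition key and `time` the clustering column, so the rows of one segment are stored together and ordered by time. The in-memory model below is a rough sketch of those semantics only. The class `SegmentTable` and its methods are hypothetical, and what a "segment" contains is not stated in the slides.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, bytes]  # (time in ms, value blob)


class SegmentTable:
    """Toy model of PRIMARY KEY (segment, time): rows grouped by segment, sorted by time."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[Row]] = defaultdict(list)

    def insert(self, segment: str, time_ms: int, value: bytes) -> None:
        rows = self._rows[segment]
        # (time_ms,) sorts before any (time_ms, value), so this finds the
        # first row whose time is >= time_ms.
        i = bisect.bisect_left(rows, (time_ms,))
        if i < len(rows) and rows[i][0] == time_ms:
            rows[i] = (time_ms, value)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (time_ms, value))

    def scan(self, segment: str, start_ms: int, end_ms: int) -> List[Row]:
        """Return rows of one segment with start_ms <= time < end_ms."""
        rows = self._rows.get(segment, [])
        lo = bisect.bisect_left(rows, (start_ms,))
        hi = bisect.bisect_left(rows, (end_ms,))
        return rows[lo:hi]


table = SegmentTable()
table.insert("seg-a", 2_000, b"b")
table.insert("seg-a", 1_000, b"a")
table.insert("seg-a", 3_000, b"c")
table.insert("seg-a", 2_000, b"B")  # overwrites the earlier row at t=2000
result = table.scan("seg-a", 1_000, 3_000)
```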
![Page 7: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/7.jpg)
cassandra operation (version 0)
![Page 8: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/8.jpg)
initial performance (version 0)
![Page 9: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/9.jpg)
buffered writes rationale (version 1)
• writing every datapoint individually is very expensive
• buffer data in memory
• write many points in a batch statement
• buffers are dropped when they have been written to cassandra
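The buffering idea above might be sketched as follows. This is a minimal illustration, not the actual ingest code: `WriteBuffer`, the `write_batch` callback, and the `max_points` threshold are all hypothetical, and the callback stands in for whatever issues the batch statement to Cassandra.

```python
from typing import Callable, List, Tuple

Point = Tuple[str, int, float]  # (segment, timestamp_ms, value); assumed shape


class WriteBuffer:
    """Accumulates datapoints in memory and writes them out in batches."""

    def __init__(self, write_batch: Callable[[List[Point]], None], max_points: int = 1000):
        self._write_batch = write_batch
        self._max_points = max_points
        self._points: List[Point] = []

    def __len__(self) -> int:
        return len(self._points)

    def add(self, point: Point) -> None:
        self._points.append(point)
        if len(self._points) >= self._max_points:
            self.flush()

    def flush(self) -> None:
        if not self._points:
            return
        # The buffer is dropped only after the write returns; if write_batch
        # raises, the points stay buffered for a later retry.
        self._write_batch(self._points)
        self._points = []


batches: List[List[Point]] = []
buf = WriteBuffer(batches.append, max_points=3)
for t in range(7):
    buf.add(("seg-a", t * 1000, float(t)))
buf.flush()  # write out the final partial batch
```

In this run seven datapoints reach the store in three write calls rather than seven.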
![Page 10: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/10.jpg)
buffered write ingest path (versions 1,2)
[Diagram components: sources, ingest server, memory tier, migrator, TSDB server, TSDB clients, TSDB C*]
![Page 11: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/11.jpg)
buffered writes operation (version 1)
![Page 12: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/12.jpg)
buffered writes performance (versions 0→1)
![Page 13: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/13.jpg)
packed writes rationale (version 2)
• writing data point-by-point means a column for each datapoint
• pack a buffer of datapoints into a block and write the block
• this will reduce the number of columns and write operations
• will have more impact on storage than on performance
• schema and overall flow remain the same
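To make the packing step concrete, here is one possible encoding of a buffer of datapoints into a single blob. The slides do not describe the real block format, so the fixed-width layout below (8-byte timestamp plus 8-byte double per point) is purely an assumption for illustration.

```python
import struct
from typing import List, Tuple

# Hypothetical layout: big-endian int64 timestamp (ms) followed by a float64 value.
_POINT = struct.Struct(">qd")


def pack_block(points: List[Tuple[int, float]]) -> bytes:
    """Pack many (timestamp_ms, value) pairs into one blob."""
    return b"".join(_POINT.pack(ts, value) for ts, value in points)


def unpack_block(block: bytes) -> List[Tuple[int, float]]:
    """Recover the (timestamp_ms, value) pairs from a packed blob."""
    return list(_POINT.iter_unpack(block))


points = [(1_000, 0.5), (2_000, 0.75), (3_000, 1.0)]
block = pack_block(points)      # one blob, so one column instead of three
restored = unpack_block(block)
```

The packed block fits the existing `value blob` column, which is consistent with the schema staying unchanged between versions.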
![Page 14: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/14.jpg)
packed writes operation (version 2)
![Page 15: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/15.jpg)
packed writes performance (versions 1→2)
![Page 16: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/16.jpg)
redo-log rationale (version 3)
• if the ingest server dies, we lose the buffered data
• fix this with more cassandra
• write a persistent log of data as it’s written to the memory-tier
• when an ingest server restarts it will reload its memory-tier from this log
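The recovery behaviour described above can be sketched with a stand-in log. Everything here is hypothetical: `InMemoryLog` plays the role of the persistent Cassandra log table, `IngestServer` is a toy, and appending to the log before updating the memory tier is an assumed ordering that the slides do not specify.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int, float]  # (segment, timestamp_ms, value); assumed shape


class InMemoryLog:
    """Stand-in for the persistent log; it outlives any one IngestServer object."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def append(self, entry: Entry) -> None:
        self.entries.append(entry)


class IngestServer:
    def __init__(self, log: InMemoryLog) -> None:
        self.log = log
        self.memory_tier: Dict[str, List[Tuple[int, float]]] = {}
        # On (re)start, reload the memory tier from whatever the log holds.
        for entry in log.entries:
            self._buffer(entry)

    def _buffer(self, entry: Entry) -> None:
        segment, ts, value = entry
        self.memory_tier.setdefault(segment, []).append((ts, value))

    def ingest(self, entry: Entry) -> None:
        self.log.append(entry)  # assumed ordering: log first, then memory
        self._buffer(entry)


log = InMemoryLog()
server = IngestServer(log)
server.ingest(("seg-a", 1_000, 0.5))
server.ingest(("seg-a", 2_000, 0.75))
del server                      # simulate the ingest server dying
restarted = IngestServer(log)   # the memory tier is rebuilt from the log
```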
![Page 17: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/17.jpg)
redo-log diagram (version 3)
[Diagram components: sources, ingest server, memory tier, migrator, TSDB server, TSDB clients, TSDB C*, log C*]
![Page 18: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/18.jpg)
log schema (version 3)
CREATE TABLE table_0 (
    stamp text,
    sequence bigint,
    value blob,
    PRIMARY KEY (stamp, sequence)
) WITH COMPACT STORAGE;
![Page 19: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/19.jpg)
packed writes with log operation (version 3)
![Page 20: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/20.jpg)
log performance (versions 2→3)
![Page 21: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/21.jpg)
what we found
• matching the workload to the database is very important
• load is much more dependent on rate of writes than on volume of data written
• for our very write-heavy workload we saw 4x performance improvement by doing fewer, larger writes
• it turns out to be cheaper to write data twice efficiently than once naively
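A back-of-the-envelope count shows why writing the data twice can still mean far fewer write operations. The functions and the batch sizes used below are illustrative assumptions, not figures from the talk.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def writes_naive(points: int) -> int:
    # One write operation per datapoint.
    return points


def writes_packed_with_log(points: int, points_per_block: int,
                           points_per_log_write: int) -> int:
    # Each datapoint is written twice: once in a log write and once in a
    # packed block, but both kinds of write carry many points per operation.
    return ceil_div(points, points_per_block) + ceil_div(points, points_per_log_write)


# Hypothetical sizes: 1,000 points per packed block, 100 points per log write.
naive_ops = writes_naive(1_000_000)
packed_ops = writes_packed_with_log(1_000_000, 1_000, 100)
```

With these made-up sizes the operation count falls from 1,000,000 to 11,000 even though every point is written twice. The measured gain reported above is 4x, so operation count is clearly not the only cost involved.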
![Page 22: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/22.jpg)
overall performance (versions 0→1→2→3)
![Page 23: SignalFx: Making Cassandra Perform as a Time Series Database](https://reader034.vdocuments.site/reader034/viewer/2022042618/5889018c1a28abcf5f8b64ef/html5/thumbnails/23.jpg)
Thanks
Paul [email protected]
WE’RE [email protected]://signalfx.com/careers.html