Working with time series data with InfluxDB
Paul Dix @pauldix
What is time series data?
Stock trades and quotes
Metrics
Analytics
Events
Sensor data
Two kinds of time series data…
Regular time series
t0 t1 t2 t3 t4 t6 t7
Samples at regular intervals
Irregular time series
t0 t1 t2 t3 t4 t6 t7
Events whenever they come in
Inducing a regular time series from an irregular one
query: select count(customer_id) from events where time > now() - 1h group by time(1m), customer_id
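As a rough illustration of what the query above computes, the following sketch buckets irregular events into fixed one-minute windows and counts them per customer. The function name, the (timestamp, customer_id) tuple layout, and the use of Unix seconds are assumptions made for the example, not InfluxDB internals.

```python
from collections import Counter

def count_per_minute(events, now, window_s=3600, bucket_s=60):
    """Count events per (bucket start, customer_id) over the trailing window.

    events: iterable of (unix_seconds, customer_id) pairs arriving at
    arbitrary times (an irregular series).
    """
    counts = Counter()
    for ts, customer_id in events:
        if ts > now - window_s:            # where time > now() - 1h
            bucket = ts - ts % bucket_s    # group by time(1m)
            counts[(bucket, customer_id)] += 1
    return dict(counts)

events = [(1000, "a"), (1010, "a"), (1075, "a"), (1030, "b")]
print(count_per_minute(events, now=1100))
```

Each key in the result is one regular interval for one customer, so the output is a regular series even though the input events were not evenly spaced.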
Data that you ask questions about over time
InfluxDB is an open source distributed time series database*
* still working on the distributed part
Why would you want a database for time series
data?
Scale
Example from DevOps
• 2,000 servers, VMs, containers, or sensor units
• 200 measurements per server/unit
• every 10 seconds
• = 3,456,000,000 distinct points per day
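The arithmetic behind that figure can be checked in a few lines. The variable names are only for illustration.

```python
units = 2_000                  # servers, VMs, containers, or sensor units
measurements_per_unit = 200
interval_s = 10                # one sample every 10 seconds
seconds_per_day = 24 * 60 * 60

series = units * measurements_per_unit            # 400,000 distinct series
samples_per_series = seconds_per_day // interval_s  # 8,640 samples per day
points_per_day = series * samples_per_series
points_per_second = series // interval_s

print(f"{points_per_day:,} points per day")
print(f"{points_per_second:,} points per second")
```

That works out to a sustained 40,000 points per second.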
Sharding data
usually requires application level code
Data retention
application level code and sharding
Rollups and aggregation
InfluxDB features
SQL style query language
Retention policies
automatically managed data retention
Continuous queries
for rollups and aggregation
HTTP API - 2 endpoints
Write: HTTP POST
/write?db=mydb&rp=foo
Read: HTTP GET
/query?db=mydb&rp=foo&q=
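A minimal sketch of how a client might address the two endpoints, using only the Python standard library. The base address, the helper names, and the placeholder request body are assumptions; the slides do not show the body format for writes, so it is left opaque here. The requests are built but not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

BASE = "http://localhost:8086"  # assumed local server address

def write_request(db, rp, body):
    """Build (but do not send) an HTTP POST to /write."""
    url = f"{BASE}/write?{urlencode({'db': db, 'rp': rp})}"
    return Request(url, data=body.encode("utf-8"), method="POST")

def query_request(db, rp, q):
    """Build (but do not send) an HTTP GET to /query; q is URL-encoded."""
    url = f"{BASE}/query?{urlencode({'db': db, 'rp': rp, 'q': q})}"
    return Request(url, method="GET")

# Placeholder body: the actual write format is not shown in the slides.
w = write_request("mydb", "foo", "<points in the server's write format>")
r = query_request("mydb", "foo", "select * from cpu")
print(w.get_method(), w.full_url)
print(r.get_method(), r.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running server.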
InfluxDB Schema
• Measurements (e.g. cpu, temperature, event, memory)
• Tags (e.g. region=uswest, host=serverA, sensor=23)
• Fields (e.g. value=23.2, info='this is some extra stuff', present=true)
• Timestamp (nanosecond epoch)
All data is indexed by measurement, tagset,
and time
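One way to picture the schema and its index is the sketch below: a point carries a measurement, tags, fields, and a nanosecond timestamp, and points are grouped under a key made of the measurement plus the tag set, then ordered by time. The `Point` class, `series_key`, and `build_index` are hypothetical names for illustration and say nothing about how InfluxDB stores data internally.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Point:
    measurement: str
    tags: dict          # indexed, e.g. {"region": "uswest"}
    fields: dict        # not indexed, e.g. {"value": 23.2}
    timestamp_ns: int   # nanoseconds since the Unix epoch

def series_key(point):
    """Measurement plus the sorted tag set identifies one series."""
    return (point.measurement, tuple(sorted(point.tags.items())))

def build_index(points):
    """Group points by series key, each series ordered by time."""
    index = defaultdict(list)
    for p in points:
        index[series_key(p)].append((p.timestamp_ns, p.fields))
    for rows in index.values():
        rows.sort(key=lambda row: row[0])
    return dict(index)

tags = {"region": "uswest", "host": "serverA"}
points = [
    Point("cpu", tags, {"value": 25.0}, 2_000_000_000),
    Point("cpu", tags, {"value": 23.2, "present": True}, 1_000_000_000),
    Point("temperature", {"sensor": "23"}, {"value": 70.1}, 1_000_000_000),
]
index = build_index(points)
print(index[("cpu", (("host", "serverA"), ("region", "uswest")))])
```

Fields stay out of the key, so only the measurement, tags, and time are usable for lookup in this sketch.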
Influx CLI
$ ./influx
Connected to http://localhost:8086 version 0.9
InfluxDB shell 0.9
>
Create a database
CREATE DATABASE foo
Create a retention policy
CREATE RETENTION POLICY <rp-name> ON <db-name> DURATION <duration> REPLICATION <n> [DEFAULT]
CREATE RETENTION POLICY high_precision ON mydb DURATION 7d REPLICATION 3 DEFAULT
Writes will go into this RP unless otherwise specified
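If the statement is generated from code rather than typed into the CLI, a small helper can fill in the template. This is a sketch with a hypothetical function name; it does no quoting or escaping, so it is only suitable for trusted, simple identifiers.

```python
def create_retention_policy(name, db, duration, replication, default=False):
    """Fill in the CREATE RETENTION POLICY template from the slide."""
    stmt = (f"CREATE RETENTION POLICY {name} ON {db} "
            f"DURATION {duration} REPLICATION {replication}")
    # DEFAULT is optional; when present, writes that name no RP land here.
    return stmt + " DEFAULT" if default else stmt

print(create_retention_policy("high_precision", "mydb", "7d", 3, default=True))
```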
Discovery
Inverted index
of measurements and tags
Discovery
SHOW MEASUREMENTS
SHOW MEASUREMENTS where host = 'serverA'
SHOW TAG KEYS
SHOW TAG KEYS from cpu
SHOW TAG VALUES from cpu WITH KEY = 'region'
SHOW SERIES
SHOW SERIES where service = 'redis'
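To show why an inverted index makes these discovery queries cheap, here is a toy version that maps each (tag key, tag value) pair to the set of series carrying it. The `SeriesIndex` class and its method names are hypothetical and only mimic two of the statements above; the real index is not described in the slides.

```python
from collections import defaultdict

class SeriesIndex:
    """Toy inverted index: (tag key, tag value) -> set of series."""

    def __init__(self):
        self.by_tag = defaultdict(set)
        self.measurements = set()

    def add(self, measurement, tags):
        series = (measurement, tuple(sorted(tags.items())))
        self.measurements.add(measurement)
        for pair in tags.items():
            self.by_tag[pair].add(series)

    def show_measurements(self, where=None):
        """Like SHOW MEASUREMENTS, optionally filtered by one tag pair."""
        if where is None:
            return sorted(self.measurements)
        return sorted({m for m, _ in self.by_tag.get(where, set())})

    def show_tag_values(self, measurement, key):
        """Like SHOW TAG VALUES from <measurement> WITH KEY = <key>."""
        return sorted({v for (k, v), series in self.by_tag.items() if k == key
                       for m, _ in series if m == measurement})

idx = SeriesIndex()
idx.add("cpu", {"host": "serverA", "region": "uswest"})
idx.add("cpu", {"host": "serverB", "region": "useast"})
idx.add("memory", {"host": "serverA", "region": "uswest"})

print(idx.show_measurements(("host", "serverA")))
print(idx.show_tag_values("cpu", "region"))
```

The filtered lookup touches only the series filed under the requested tag pair instead of scanning every series.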
Queries
SQL-ish
select * from some_series where time > now() - 1h
Aggregates
select percentile(90, value) from cpu where time > now() - 1d group by time(10m)
Aggregates
select percentile(90, value) from cpu where time > now() - 1d group by time(10m), region
Group by a tag
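The grouped aggregate can be pictured as the sketch below: values are collected per (10-minute bucket, region) and a percentile is taken within each group. It uses the nearest-rank definition of a percentile, which is an assumption; InfluxDB's exact method may differ. The tuple layout and Unix-second timestamps are also illustrative.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_by_bucket(points, p, bucket_s=600):
    """points: iterable of (unix_seconds, region, value) tuples."""
    groups = defaultdict(list)
    for ts, region, value in points:
        # group by time(10m), region
        groups[(ts - ts % bucket_s, region)].append(value)
    return {key: percentile(vals, p) for key, vals in groups.items()}

points = [(i * 10, "uswest", i + 1) for i in range(10)]  # values 1..10
points += [(600, "uswest", 5.0), (30, "useast", 7)]
print(percentile_by_bucket(points, 90))
```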
Where against Regex (field)
select value from some_log_series where value =~ /.*ERROR.*/ and time > '2014-03-01' and time < '2014-03-03'
Where against Regex (tag)
select value from some_log_series where host =~ /.*asdf.*/ and time > '2014-03-01' and time < '2014-03-03' group by host
Functions
min max percentile first last stddev mean count sum median distinct count(distinct)
more soon: difference, histogram, moving_average
Continuous queries
CREATE CONTINUOUS QUERY "10m_event_count"
ON mydb
BEGIN
  SELECT count(value) INTO "6_months".events FROM events GROUP BY time(10m)
END;
Other tools
Telegraf
data collection
Chronograf
Grafana
More coming
• Compression
• Clustering
• Custom functions