dunning time-series-2015

© 2014 MapR Technologies

Agenda

• The Internet is turning upside down

• A story

• The last (mile) shall be first

• Time series on NoSQL

• Faster time series on NoSQL

• Summary


How the Internet Works

• Big content servers feed data across the backbone to

• Regional caches and servers feed data across neighborhood transport to

• The “last mile”

• Bits are nearly conserved, $ are concentrated centrally

– But total $ mass at the edge is much higher


How The Internet Works

[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]


Conservation of Bits Decreases Bandwidth

[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]


Total Investment Dominated by Last Mile

[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]


The Rub

• What's the problem?

– Speed (end-to-end latency, backbone bandwidth)

– Feasibility (cost for consumer links)

– Caching

• What do we need?

– Cheap last-mile hardware

– Good caches


First: an apology for going off-script


Now, the story


By the 1840s, the NY-SF sailing time was down to 130-180 days


In 1851, the record was set at 89 days by the Flying Cloud


The difference was due (in part) to big data and a primitive kind of time-series database


These charts were free … if you donated your data


But how does this apply today?


What has changed?

Where will it lead?


Things


Emitting data


How The Internet Works

[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]


How the Internet is Going to Work

[Diagram: server, caches, gateways, switches, controllers, and machines m1 to m6]


Where Will The $ Go?

[Diagram: server, caches, gateways, switches, controllers, and machines m1 to m6]


Sensors


Controllers


The Problems

• Sensors and controllers have little processing or space

– SIM cards = 20 MHz processor, 128 kb space = 16 kB

– Arduino mini = 15kB RAM (more EPROM)

– BeagleBone/Raspberry Pi = 500 MB RAM

• Sensors and controllers have little power

– Very common to power down 99% of the time

• Sensors and controllers often have very low bandwidth

– Mesh networks with base rates << 1 Mb/s

– Power line networking

– Intermittent 3G/4G/LTE connectivity


What Do We Need to Do With a Time Series

• Acquire

– Measurement, transmission, reception

– Mostly not our problem

• Store

– We own this

• Retrieve

– We have to allow this

• Analyze and visualize

– We facilitate this via retrieval


Retrieval Requirements

• Retrieve by time-series, time range, tags

– Possibly pull millions of data points at a time

– Possibly do on-the-fly windowed aggregations

• Search by unstructured data

– Typically requires time-windowed faceting after search

– Also need to dive in with first kind of retrieval
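The "on-the-fly windowed aggregations" requirement can be sketched as follows. This is a minimal in-memory illustration, not how any particular database implements it; the function name `windowed_aggregate` and the sample data are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def windowed_aggregate(points, start, end, window_s, agg=mean):
    """Aggregate (timestamp, value) pairs in [start, end) into fixed windows.

    Returns a sorted list of (window_start, aggregated_value) pairs.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        if start <= ts < end:
            # Align each point to the start of the window it falls in.
            window_start = start + ((ts - start) // window_s) * window_s
            buckets[window_start].append(value)
    return [(w, agg(vs)) for w, vs in sorted(buckets.items())]


# Hypothetical samples: (seconds, value). The point at t=130 is out of range.
points = [(0, 1.0), (10, 3.0), (65, 5.0), (119, 7.0), (130, 9.0)]
result = windowed_aggregate(points, start=0, end=120, window_s=60)
print(result)
```

Passing `agg=max` or `agg=len` gives other per-window summaries without changing the retrieval step.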


Storage choices and trade-offs

• Flat files

– Great for rapid ingest with massive data

– Handles essentially any data type

– Less good for data requiring frequent updates

– Harder to find specific ranges

• Traditional relational db

– Ingests up to tens of thousands of rows/sec; prefers well-structured (numerical) data; expensive

• Non-relational db: Tables (such as MapR tables in M7 or HBase)

– Ingests up to 100,000 rows/sec

– Handles wide variety of data

– Good for frequent updates

– Easily scanned in a range


Specific Example

• Consider a server farm

• Lots of system metrics

• Typically 100-300 stats / 30 s

• Loads, RPCs, packets, requests/s

• Common to have 100 – 10,000 machines


The General Outline

• 10 samples / second / machine

x 1,000 machines

= 10,000 samples / second

• This is what OpenTSDB was designed to handle

• Install and go, but don’t test at scale


Specific Example

• Consider oil drilling rigs

• When drilling wells, there are *lots* of moving parts

• Typically a drilling rig makes about 10K samples/s

• Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others

• Typical project has 100 rigs


The General Outline

• 10K samples / second / rig

x 100 rigs

= 1M samples / second

• But wait, there’s more

– Suppose you want to test your system

– Perhaps with a year of data

– And you want to load that data in << 1 year

• 100x real-time = 100M samples / second
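The arithmetic on this slide can be checked with a few lines. The helper names below are illustrative only; the input figures (10K samples/s per rig, 100 rigs, 100x real-time) come from the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def aggregate_rate(samples_per_source, sources, speedup=1):
    """Total samples/second across all sources, optionally replayed faster."""
    return samples_per_source * sources * speedup


def backfill_days(history_years, speedup):
    """Days needed to load `history_years` of data at `speedup` x real time."""
    return history_years * 365 / speedup


real_time = aggregate_rate(10_000, 100)             # live ingest rate
backfill = aggregate_rate(10_000, 100, speedup=100)  # test-load ingest rate
samples_per_year = real_time * SECONDS_PER_YEAR       # size of one year of data
days = backfill_days(1, 100)

print(real_time, backfill, samples_per_year, days)
```

At 100x real-time, a year of history loads in under four days, which is why the backfill case rather than live ingest sets the throughput target.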


How Should That Work?

[Diagram: samples, message queue, collector, MapR table, web service, users]


A First Attempt

OpenTSDB is a distributed Time Series Database built on top of HBase, enabling you …

– to store & index, as well as

– to query & plot

… metrics at scale.


Design Goals

• Distributed storage of metrics

• Fast and easy metric queries

• Scale out to thousands of machines and billions of data points

• No single point of failure (SPOF)


Key concepts

• data point: (timestamp, value), e.g. (00:38, 56)

• + metric, e.g. mysql.com_delete

• + tag: key=value, e.g. schema=userdb

• = time series


Example TS

...

1409497082 327810227706 mysql.bytes_received schema=foo host=db1

1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1

1409497106 327812421706 mysql.bytes_received schema=foo host=db1

1409497113 6604901075387 mysql.bytes_sent schema=foo host=db

...

UNIX epoch timestamp: $(date +%s)

a metric (often hierarchical)

two tags
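To make the data model concrete, here is a small sketch that parses lines in the layout shown on this slide (timestamp, value, metric, then tags) and groups them into time series keyed by metric plus tags. The names `DataPoint` and `parse_line` are made up for the example, and the field order follows the slide rather than any particular wire protocol.

```python
from collections import defaultdict
from typing import NamedTuple


class DataPoint(NamedTuple):
    timestamp: int  # UNIX epoch seconds
    value: int      # the slide's values are integers; floats would need float()
    metric: str
    tags: tuple     # sorted (key, value) pairs, so the tuple is hashable


def parse_line(line):
    """Parse 'timestamp value metric key=value ...' into a DataPoint."""
    ts, value, metric, *tag_fields = line.split()
    tags = tuple(sorted(tuple(field.split("=", 1)) for field in tag_fields))
    return DataPoint(int(ts), int(value), metric, tags)


lines = [
    "1409497082 327810227706 mysql.bytes_received schema=foo host=db1",
    "1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1",
    "1409497106 327812421706 mysql.bytes_received schema=foo host=db1",
]

# A time series is identified by its metric together with its full tag set.
series = defaultdict(list)
for line in lines:
    point = parse_line(line)
    series[(point.metric, point.tags)].append((point.timestamp, point.value))

for key, points in series.items():
    print(key, points)
```

Sorting the tags before building the key means `schema=foo host=db1` and `host=db1 schema=foo` land in the same series.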


Declare metric

$ tsdb mkmetric mysql.bytes_sent mysql.bytes_received

metrics mysql.bytes_sent: [0, 0, 1]

metrics mysql.bytes_received: [0, 0, 2]

… or use --auto-metric


Collect metric

• tcollector: gathers data from local collectors, pushes it to TSDs, and provides deduplication

• Many collectors are bundled

– General: iostat, netstat, etc.

– Others: MySQL, HBase, etc.

• … or roll your own

http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html

https://github.com/OpenTSDB/tcollector/tree/master/collectors/0
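A "roll your own" collector can be very small: tcollector runs collector scripts and reads data points from their standard output, one per line, as `metric timestamp value tag=value ...`. The sketch below assumes that line format; the metric name `demo.heartbeat` and the tag `source=sketch` are hypothetical.

```python
import sys
import time


def format_point(metric, value, tags, timestamp=None):
    """Render one data point as 'metric timestamp value k=v ...'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_text = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_text}".rstrip()


def emit_once(out=sys.stdout):
    """Write a single sample; a real collector would measure something here."""
    out.write(format_point("demo.heartbeat", 1, {"source": "sketch"}) + "\n")
    out.flush()  # flush so the parent process sees the line promptly


emit_once()
# A long-running collector would instead loop:
#     while True:
#         emit_once()
#         time.sleep(15)
```

Keeping the formatting in a pure function makes the collector easy to test without running tcollector itself.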


The Whole Picture

HBase or MapR-DB


Wide Table Design: Point-by-Point


Wide Table Design: Hybrid Point-by-Point + Blob

Insertion of data as blob makes original columns redundant

Non-relational, but you can query these tables with Drill
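The two layouts can be illustrated with plain dictionaries. This is a simplified sketch, assuming one row per series per hour with one column per sample offset, and a later step that packs the row into a single blob column. Real OpenTSDB uses compact binary row keys and values; the string keys, JSON blob, and function names here are stand-ins chosen for readability.

```python
import json


def row_key(metric, tags, timestamp):
    """One row per (metric, tags, hour): the key embeds the hour's start time."""
    base_hour = timestamp - timestamp % 3600
    tag_text = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}|{base_hour}|{tag_text}"


def put_point(table, metric, tags, timestamp, value):
    """Point-by-point layout: each sample is its own column in the wide row."""
    row = table.setdefault(row_key(metric, tags, timestamp), {})
    row[timestamp % 3600] = value  # column qualifier = offset within the hour


def compact_row(row):
    """Hybrid layout: replace the per-point columns with one blob column."""
    return {"blob": json.dumps(sorted(row.items()))}


table = {}
tags = {"schema": "foo", "host": "db1"}
for ts, value in [(1409497082, 327810227706), (1409497106, 327812421706)]:
    put_point(table, "mysql.bytes_received", tags, ts, value)

key = row_key("mysql.bytes_received", tags, 1409497082)
compacted = compact_row(table[key])
print(key, table[key], compacted)
```

Once the blob holds every point in the row, the original per-point columns carry no extra information, which is the redundancy the slide refers to.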


Status to This Point

• Each sample requires one insertion, compaction requires another

• Typical performance on SE cluster

– 1 edge node + 4 cluster nodes

– 20,000 samples per second observed

– Would be faster on performance cluster, possibly not a lot

• Suitable for server monitoring

• Not suitable for large scale history ingestion

• Bulk load helps a little, but not much

• Still 1000x too slow for industrial work


Speeding up OpenTSDB

20,000 data points per second per node in the test cluster

Why can’t it be faster?


Speeding up OpenTSDB: open source MapR extensions

Available on GitHub: https://github.com/mapr-demos/opentsdb


Status to This Point

• 3600 samples require one insertion

• Typical results on SE cluster

– 1 edge node + 4 cluster nodes

– 14 million samples per second observed

– ~700x faster ingestion

• Typical results on performance cluster

– 2-4 edge nodes + 4-9 cluster nodes

– 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)

• Suitable for large scale history ingestion

• 30 million data points retrieved in 20s

• Ready for industrial work
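The key change, one insertion per 3600 samples instead of one per sample, can be sketched as a buffering writer. This is an illustration of the batching idea only, not the code of the MapR extensions; the class name `BlobBatchWriter`, the `insert` callback, and the series name are hypothetical.

```python
class BlobBatchWriter:
    """Buffer samples per series and insert them as one batch per `batch_size`."""

    def __init__(self, insert, batch_size=3600):
        self.insert = insert          # callback: insert(series_id, list_of_points)
        self.batch_size = batch_size
        self.buffers = {}

    def add(self, series_id, timestamp, value):
        buf = self.buffers.setdefault(series_id, [])
        buf.append((timestamp, value))
        if len(buf) >= self.batch_size:
            self.flush(series_id)

    def flush(self, series_id):
        """Write whatever is buffered for this series as a single insertion."""
        buf = self.buffers.pop(series_id, [])
        if buf:
            self.insert(series_id, buf)


insertions = []
writer = BlobBatchWriter(lambda sid, pts: insertions.append((sid, len(pts))))
for t in range(7200):  # two hours of one-per-second samples
    writer.add("rig1.pressure", t, 0.0)

speedup = 14_000_000 / 20_000  # observed rates quoted on the slides
print(insertions, speedup)
```

Here 7200 samples cost two insertions rather than 7200, and the ratio of the two observed rates matches the ~700x figure on the slide.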


Key Results

• Ingestion is network limited

– Edge nodes are the critical resource

– Number of edge nodes defines a limit to scaling

• With enough edge nodes scaling is near perfect

• Performance of raw OpenTSDB is limited by stateless daemon

• Modified OpenTSDB can run 1000x faster


Overall Ingestion Rate

[Plot: total ingestion rate (millions of points / second) versus number of nodes (4, 5, 8, 9), for one ingestor and two ingestors]

Normalized Ingestion Rate

[Plot: ingestion per node (millions of points / second) versus number of nodes (4, 5, 8, 9), for one ingestor and two ingestors]


Why MapR?

• MapR tables are inherently faster, safer

– Sustained > 1GB/s ingest rate in tests

• Mirror to M5 or M7 cluster to isolate analytics load

• Transaction logs involve frequent appends, many files


When is this All Wrong?

• In some cases, retrieval by series-id + time range is not sufficient

• May need very flexible retrieval of events based on text-like criteria

• Search may be better than a classic time-series database

• Can scale Lucene-based search to > 1 million events / second


When is it Even More Right

• In many industrial settings, data rates from individual sensors are relatively high

– Latency to view is still measured in seconds, not sample points

• This allows batching at source

• Common requirement for highly variable sample rates

– 1 sample/s baseline, switching to 10k samples/s

– Small batches during slow times are just fine since number of sensors is constant

– Requires variable window sizes
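Batching at the source with variable window sizes can be sketched as a batcher that flushes on whichever comes first: a latency bound or a size bound. This is a minimal illustration under assumed limits; the class name `SourceBatcher`, the `send` callback, and the thresholds are hypothetical, and timestamps are integer milliseconds to keep the example deterministic.

```python
class SourceBatcher:
    """Collect samples at the sensor and send them in variable-size batches."""

    def __init__(self, send, max_latency_ms=2000, max_points=5):
        self.send = send                  # callback: send(list_of_points)
        self.max_latency_ms = max_latency_ms
        self.max_points = max_points
        self.batch = []

    def add(self, timestamp_ms, value):
        # Flush first if the oldest buffered sample has waited long enough.
        if self.batch and timestamp_ms - self.batch[0][0] >= self.max_latency_ms:
            self.flush()
        self.batch.append((timestamp_ms, value))
        # Flush immediately if the batch is full (high sample rate).
        if len(self.batch) >= self.max_points:
            self.flush()

    def flush(self):
        if self.batch:
            self.send(self.batch)
            self.batch = []


sizes = []
batcher = SourceBatcher(lambda pts: sizes.append(len(pts)))
for t in range(0, 4000, 1000):     # slow phase: 1 sample/s
    batcher.add(t, 0.0)
for t in range(4000, 5000, 100):   # fast phase: 10 samples/s in this toy run
    batcher.add(t, 0.0)
batcher.flush()
print(sizes)
```

During the slow phase the latency bound produces small batches; during the fast phase the size bound takes over, so viewing latency stays bounded in seconds either way.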


Summary

• The internet is turning upside down

• This will make time series ubiquitous

• Current open source systems are much too slow

• We can fix that with modern NoSQL systems

– (I wear a red hat for a reason)


Questions


Thank You

Ted Dunning, Chief Application Architect, MapR Technologies

@mapr
