scalable real-time analytics using druid
TRANSCRIPT
Scalable Real-time Analytics using Druid
Nishant Bangarwa and Slim Bouguerra
Hadoop Summit, June 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
– History and Motivation
– Druid Architecture
– Druid vs. Big Data
History
Development started at Metamarkets in 2011
Initial use case – power ad-tech analytics product
Open sourced in late 2012
– GPL licensed initially
– Switched to Apache V2 in early 2015
150+ contributors today
Motivation
Business Intelligence / OLAP use cases that need interactive real-time visualizations on complex data streams, e.g.:
– Real-time bidding events
– User activity streams
– Voice call logs
– Network traffic flows
– Firewall events
– Application performance metrics
Solutions Evaluated
RDBMS (Postgres, MySQL)
– Star schema with aggregate tables
– Slow performance at large scale (up to 20 sec page load times)
– Query caching helped, but arbitrary queries stayed slow
Key/value stores (HBase, Cassandra, BigTable)
– Pre-aggregate all dimensional combinations
– Fast queries were achieved
– Precomputation scales exponentially
– Takes time to precompute (up to 9 hrs with 14 dimensions)
– Not cost-effective
What is Druid ?
– Column-oriented distributed datastore
– Sub-second query times
– Real-time streaming ingestion
– Arbitrary slicing and dicing of data
– Automatic data summarization
– Approximate algorithms (HyperLogLog, theta sketches)
– Scalable to petabytes of data
– Highly available
Companies Using Druid
Key Features of Druid
Scalability
Ability to handle:
– petabytes of data
– billions of events/day

Largest Druid cluster:
– 50 trillion+ events
– 50 PB+ of raw data
– over 500 TB of compressed queryable data
– ingestion rate over 500,000 events/sec
– 10–100K events/sec/core
Fast Response Time
– Critical for interactive user experience
– Avg query times ~500 ms
– 90th percentile under 1 sec
– 99th percentile under 10 sec
– Handles 1000s of concurrent queries
Arbitrary slicing n’ dicing
Ability to support arbitrary filtering, splitting and aggregation of data
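As a concrete sketch of this kind of arbitrary slicing and dicing, here is roughly what a Druid native groupBy query looks like when POSTed to a broker. The "events" datasource, the dimension/metric names, and the broker URL are all hypothetical, invented for illustration:

```python
import json

# A sketch of a Druid native groupBy query; the "events" datasource and
# the field names are made up. It asks for impressions and clicks per
# (domain, gender) over one hour, filtered to a single country.
query = {
    "queryType": "groupBy",
    "dataSource": "events",
    "granularity": "hour",
    "dimensions": ["domain", "gender"],
    "filter": {"type": "selector", "dimension": "country", "value": "US"},
    "aggregations": [
        {"type": "count", "name": "impressions"},
        {"type": "longSum", "name": "clicks", "fieldName": "clicked"},
    ],
    "intervals": ["2011-01-01T00:00:00/2011-01-01T01:00:00"],
}

# In a real deployment this JSON body would be POSTed to a broker,
# e.g. http://<broker-host>:8082/druid/v2
print(json.dumps(query, indent=2))
```

Any combination of filters, dimensions, and aggregations can be swapped in at query time, which is what "no pre-canned drill-downs" means in practice.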
Queries on Immediate Data
Immediate insights into current data
Ability to query data as soon as it is ingested
Recent data more important than old data
Highly Available
– Data replication across nodes
– Shared-nothing architecture
– No single point of failure
Rolling upgrades without downtime
– Maintains backwards compatibility
– Each node can be upgraded independently
– Easy to run experiments
– No downtime
Druid Architecture
Early Druid Architecture
[Diagram: Batch Data → Hadoop → Historical Nodes → Broker Node → Queries]
Historical Nodes
– Shared-nothing architecture
– Main workhorses of the Druid cluster
– Load immutable, read-optimized segments
– Respond to queries
– Use memory-mapped files to load segments
Broker Nodes
– Keep track of segment announcements in the cluster
– Scatter queries across historical and realtime nodes
– Merge results from the different query nodes
– (Distributed) caching layer
Coordinator Nodes
– Assign segments to historical nodes
– Interval-based cost function to distribute segments
– Make sure query load is uniform across historical nodes
– Handle replication of data
– Configurable rules to load/drop data
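The load/drop rules are ordinary JSON documents submitted to the coordinator. A minimal sketch of such a rule set, assuming an illustrative 30-day retention window and the default tier name (the specific values are not from the talk):

```python
# Hypothetical retention rules for one datasource: keep the most recent
# 30 days with two replicas on the default tier, drop everything older.
retention_rules = [
    {
        "type": "loadByPeriod",
        "period": "P30D",                        # ISO-8601 period: last 30 days
        "tieredReplicants": {"_default_tier": 2},
    },
    {"type": "dropForever"},                     # everything else is unloaded
]
```

Rules are evaluated top-down and the first rule matching a segment's interval wins, so the drop rule only applies to segments older than 30 days.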
Current Druid Architecture
[Diagram: Batch Data → Hadoop → Historical Nodes; Streaming Data → ETL (Samza, Kafka, Storm, Spark etc.) → Realtime Nodes, with hand-off of segments to the Historical Nodes; a Broker Node serves Queries across both.]
Realtime Nodes
– Ability to ingest streams of data
– Both push- and pull-based ingestion
– Store data in a write-optimized structure
– Periodically convert the write-optimized structure to read-optimized segments
– Events are queryable as soon as they are ingested
Druid vs. Big Data
Question: number of unique users in the last minute?
The classic approach: pre-compute aggregates for every possible set of dimensions, run an ETL pipeline with thousands of stages to compute them, load the results into a complex stack of database layers, and repeat every hour/day/week/year.

This will not scale for:
– billions of users
– billions of events per hour
– retaining years' worth of data
Summarize data you must !!
Any solution ?
Summarization
Each row is an ad impression; clicked == 1 is an actual click. What is the summarization of the hour?

timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832057930  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   3909846810  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   0381837193  Female  0
Example courtesy of Eric Tschetter, used with his permission
Simple: just add up the numbers.
Summarization – Hourly, Simple
timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832057930  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   3909846810  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   0381837193  Female  0

timestamp             impressions  clicks
2011-01-01T00:00:00Z  10           6

We cannot query by domain, user, or gender!
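The hourly rollup above can be sketched in a few lines of Python over the ten sample rows (only the timestamp and clicked columns matter for it):

```python
from collections import defaultdict

# (timestamp, clicked) pairs for the ten sample impressions.
events = [
    ("2011-01-01T00:01:35Z", 1), ("2011-01-01T00:03:03Z", 0),
    ("2011-01-01T00:04:51Z", 1), ("2011-01-01T00:05:33Z", 1),
    ("2011-01-01T00:05:53Z", 0), ("2011-01-01T00:06:17Z", 1),
    ("2011-01-01T00:23:15Z", 0), ("2011-01-01T00:38:51Z", 1),
    ("2011-01-01T00:49:33Z", 1), ("2011-01-01T00:49:53Z", 0),
]

rollup = defaultdict(lambda: [0, 0])   # hour -> [impressions, clicks]
for ts, clicked in events:
    hour = ts[:13] + ":00:00Z"         # truncate timestamp to hour granularity
    rollup[hour][0] += 1
    rollup[hour][1] += clicked

print(dict(rollup))  # {'2011-01-01T00:00:00Z': [10, 6]}
```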
Summarization – Hourly, Gender + Domain

timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832057930  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   0381837193  Female  0

timestamp             domain      gender  impressions  clicks
2011-01-01T00:00:00Z  bieber.com  Female  4            2
2011-01-01T00:00:00Z  ultra.com   Female  3            1
2011-01-01T00:00:00Z  ultra.com   Male    3            2

(+) Number of rows per hour is bounded by the cardinality of (domain × gender)
(–) Query granularity cannot be finer than one hour
(–) Cannot answer the number of unique users!
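The same rollup keyed by (domain, gender) can be sketched as follows, shown here for the seven Female rows of the sample data:

```python
from collections import defaultdict

# (domain, gender, clicked) for the seven Female sample rows, in time order.
rows = [
    ("bieber.com", "Female", 1), ("bieber.com", "Female", 0),
    ("ultra.com",  "Female", 0), ("ultra.com",  "Female", 1),
    ("bieber.com", "Female", 0), ("bieber.com", "Female", 1),
    ("ultra.com",  "Female", 0),
]

summary = defaultdict(lambda: [0, 0])  # (domain, gender) -> [impressions, clicks]
for domain, gender, clicked in rows:
    summary[(domain, gender)][0] += 1
    summary[(domain, gender)][1] += clicked

print(dict(summary))
# {('bieber.com', 'Female'): [4, 2], ('ultra.com', 'Female'): [3, 1]}
```

The number of output rows is bounded by the cardinality of the grouping key, no matter how many raw impressions arrive.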
Summarization, compute unique
timestamp             domain      gender  impressions  clicks
2011-01-01T00:00:00Z  bieber.com  Female  4            2
2011-01-01T00:00:00Z  ultra.com   Female  3            1
2011-01-01T00:00:00Z  ultra.com   Male    3            2

timestamp      domain      gender  impressions  clicks  unique
2011-01-01T00  bieber.com  Female  4            2       [4312345532, 3484920241, 4730093842, 4930097162]
2011-01-01T00  ultra.com   Female  3            1       [5832057930, 5789283478, 0381837193]
2011-01-01T00  ultra.com   Male    3            2       [9530174728, 4098310573]

– "unique" grows linearly!!! Billion-entry sets per row!!!
– The push-down-aggregates-and-merge approach cannot be applied.
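The problem with exact uniques can be seen directly: the per-group user sets must be carried around and unioned at query time, and their size grows with the number of distinct users. A sketch using the sample rows:

```python
# Exact per-group user sets from the sample hour above (ids as strings,
# which preserves the leading zero in 0381837193).
uniques = {
    ("bieber.com", "Female"): {"4312345532", "3484920241", "4730093842", "4930097162"},
    ("ultra.com",  "Female"): {"5832057930", "5789283478", "0381837193"},
    ("ultra.com",  "Male"):   {"9530174728", "4098310573"},
}

# Answering "unique users overall" requires unioning every set at query
# time; with billions of users these sets no longer fit in a row.
total = set().union(*uniques.values())
print(len(total))  # 9 unique users across all groups
```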
Sketching groundwork
timestamp      domain      gender  impressions  clicks  uniques
2011-01-01T00  bieber.com  Female  4            2       [4312345532, 3484920241, 4730093842, 4930097162]
2011-01-01T00  ultra.com   Female  3            1       [5832057930, 5789283478, 0381837193]
2011-01-01T00  ultra.com   Male    3            2       [9530174728, 4098310573]

timestamp      domain      gender  impressions  clicks  uniques
2011-01-01T00  bieber.com  Female  4            2       <sub-linear data structure>
2011-01-01T00  ultra.com   Female  3            1       <sub-linear data structure>
2011-01-01T00  ultra.com   Male    3            2       <sub-linear data structure>

Requirements:
– Streamable
– Mergeable at query time
– Approximates the number of unique users with predictable error
– Limited memory, independent of the data size
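A toy K-Minimum-Values sketch (the same family as the theta sketches Druid uses, but stripped down; this is an illustration, not Druid's actual implementation) meets all four requirements: it consumes a stream item by item, two sketches merge into one, the estimate has predictable error ~1/√k, and memory is capped at k hash values regardless of stream size:

```python
import hashlib

class KMVSketch:
    """Keep the k smallest normalized hash values ever seen; if the k-th
    smallest is theta, estimate the unique count as (k - 1) / theta."""

    def __init__(self, k=64):
        self.k = k
        self.values = set()          # at most k smallest hashes seen so far

    def _hash(self, item):
        h = hashlib.sha1(str(item).encode()).digest()
        return int.from_bytes(h[:8], "big") / 2**64   # uniform in [0, 1)

    def add(self, item):
        # Duplicates hash identically, so the set dedups them for free.
        self.values.add(self._hash(item))
        if len(self.values) > self.k:
            self.values.remove(max(self.values))

    def merge(self, other):
        merged = KMVSketch(self.k)
        merged.values = set(sorted(self.values | other.values)[: self.k])
        return merged

    def estimate(self):
        if len(self.values) < self.k:
            return len(self.values)   # exact (up to hash collisions) while small
        return (self.k - 1) / max(self.values)

# Usage: two nodes sketch overlapping user streams, then merge at query time.
a, b = KMVSketch(512), KMVSketch(512)
for i in range(12000):
    a.add("user%d" % i)               # users 0..11999
for i in range(8000, 20000):
    b.add("user%d" % i)               # users 8000..19999 (4000 overlap)
merged = a.merge(b)
estimate = merged.estimate()          # ~20000 true uniques
```

The intuition: k uniform hashes in [0, 1) from n distinct items put the k-th smallest near k/n, so inverting it recovers n, and merging is just "keep the k smallest of both sets".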
Theta Sketches (KMV): open sourced by Yahoo! [datasketches.github.io]

Predictable approximation error can be traded off against sketch size:
– k = 4096 corresponds to an RSE of +/- 3.2% with 95% confidence
– k = 16K corresponds to an RSE of +/- 1.6% with 95% confidence

Limited memory footprint, independent of data size:
– k = 4096 -> 32768 bytes
– k = 16384 -> 131072 bytes

Mergeable at query time:
– "merge rate of about 14.5 million sketches per second per processor thread" [http://datasketches.github.io/docs/Theta/ThetaMergeSpeed.html]

Intersections can be computed at query time. Duplication insensitive.
https://speakerdeck.com/druidio/approximate-algorithms-and-sketches-in-druid
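The figures above are consistent with the usual theta-sketch rule of thumb, RSE ≈ 1/√k, doubled for a roughly 95% (two standard error) bound, and with k retained hash values of 8 bytes each:

```python
import math

# Sanity-check the slide's error and memory figures (rule-of-thumb
# formula, not the exact DataSketches derivation).
for k in (4096, 16384):
    bound95 = 2 / math.sqrt(k)   # ~2 standard errors => ~95% confidence
    memory = k * 8               # k retained 8-byte hash values
    print(k, round(bound95 * 100, 2), memory)

# k = 4096  -> +/- 3.12% and 32768 bytes
# k = 16384 -> +/- 1.56% and 131072 bytes
# (matching the slide's rounded 3.2% / 1.6% and byte counts)
```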
Druid success story !
– Replaced a 5,000-node HBase cluster serving six petabytes of metrics to power Flurry mobile analytics alone [infoworld.com]
– Tracking more than 2 billion mobile devices [Flurry SDK @ MDC 2016]
– Real-time ingestion of 20 billion events per day [Flurry SDK @ MDC 2016]
– Sub-second query latency
– Can query data from the last 15 seconds
“Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.”
― Kurt Vonnegut
Monitoring
Monitoring (system level):
– CPU
– Memory
– Network IO
– …

Alerting:
– Logged exceptions
– System metrics thresholds

What about exploratory debugging and performance tuning?
Exploratory Debugging / Performance Tuning, Hard !!

Guess why?
– A distributed application running on multiple machines with different configurations, across multiple data centers…
– Cannot run benchmarks in the production environment.
– Cannot reproduce the production load pattern.
– Hard to obtain insights from log files:
  – Need to be interactive and real-time.
  – Need to be able to arbitrarily slice and dice the benchmark results.
Druid internal metrics
Event types:
– Periodic events
– Query-related events
– Ingestion-related events

Anatomy of events:
{"timestamp": "2016-05-01T10:14:00", "metric": "query/time", "service": "druid/broker", "value": "234", "type": "groupBy", "id": "12374095094", …}

– Unbounded cardinality for dimensions like query id.
– Very high throughput of emitted events.
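For instance, turning a stream of such events into minute-granularity latency aggregates can be sketched as below (the events are invented, shaped like the example above, with values kept numeric for arithmetic; the unbounded-cardinality query id is simply dropped from the rollup key):

```python
from collections import defaultdict

# Invented metric events shaped like Druid's query/time emissions.
events = [
    {"timestamp": "2016-05-01T10:14:03", "metric": "query/time",
     "service": "druid/broker", "value": 234, "type": "groupBy"},
    {"timestamp": "2016-05-01T10:14:41", "metric": "query/time",
     "service": "druid/broker", "value": 120, "type": "timeseries"},
    {"timestamp": "2016-05-01T10:15:02", "metric": "query/time",
     "service": "druid/historical", "value": 87, "type": "groupBy"},
]

# Average query/time per (service, minute).
buckets = defaultdict(list)
for e in events:
    if e["metric"] == "query/time":
        minute = e["timestamp"][:16]   # truncate to minute granularity
        buckets[(e["service"], minute)].append(e["value"])

avg = {k: sum(v) / len(v) for k, v in buckets.items()}
print(avg)
# {('druid/broker', '2016-05-01T10:14'): 177.0,
#  ('druid/historical', '2016-05-01T10:15'): 87.0}
```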
Metrics Cluster Architecture
[Diagram: Druid nodes emit metric events over HTTP through a VIP to collector nodes; real-time nodes ingest the events, and brokers perform query rewrite and scatter/gather across the real-time and historical nodes. Each real-time node can handle ~20k events/sec at a granularity of one minute.]
Summary

Scalability:
– Horizontal scalability
– Columnar storage, indexing and compression
– Multi-tenancy

Real-time:
– Ingestion latency < seconds
– Query latency < seconds

Arbitrary slicing and dicing of big data, like a ninja:
– No more pre-canned drill-downs
– Query with more fine-grained granularity

High availability and rolling deployment capabilities:
– Less costly to run
– Very active open source community
Thank you ! Questions ?
Druid as a Platform
[Diagram: Druid at the center, integrating with:]
– Batch ingestion (Hadoop, Spark, …)
– Streaming ingestion (Storm, Samza, Spark Streaming, Kafka, …)
– Web services (Fili)
– Visualizations (Pivot, Grafana, Caravel)
– Machine learning (SciPy, R, ScalaNLP)