scalable real-time analytics using druid
TRANSCRIPT
Scalable Real-time Analytics using Druid
Nishant Bangarwa and Slim Bouguerra
Hadoop Summit, June 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
– History and Motivation
– Druid Architecture
– Druid vs. Big Data
History
Development started at Metamarkets in 2011
Initial use case – power ad-tech analytics product
Open sourced in late 2012
– GPL licensed initially
– Switched to Apache V2 in early 2015
150+ contributors today
Motivation
Business Intelligence / OLAP use cases that need interactive real-time visualizations on complex data streams, e.g.:
– Real-time bidding events
– User activity streams
– Voice call logs
– Network traffic flows
– Firewall events
– Application performance metrics
Solutions Evaluated
RDBMS (Postgres, MySQL)
– Star schema with aggregate tables
– Slow performance at large scale (up to 20 sec page load times)
– Query caching helped, but arbitrary queries stayed slow
Key/value stores (HBase, Cassandra, BigTable)
– Pre-aggregate all dimensional combinations
– Fast queries were achieved
– Precomputation scales exponentially
– Takes time to precompute (up to 9 hrs with 14 dimensions)
– Not cost-effective
What is Druid ?
– Column-oriented distributed datastore
– Sub-second query times
– Real-time streaming ingestion
– Arbitrary slicing and dicing of data
– Automatic data summarization
– Approximate algorithms (HyperLogLog, theta sketches)
– Scalable to petabytes of data
– Highly available
Companies Using Druid
Key Features of Druid
Scalability
Ability to handle:
– petabytes of data
– billions of events/day

Largest Druid cluster:
– 50 trillion+ events
– 50 PB+ of raw data
– over 500 TB of compressed queryable data
– ingestion rate over 500,000 events/sec
– 10–100K events/sec/core
Fast Response Time
– Critical for interactive user experience
– Avg query times ~500 ms
– 90th percentile under 1 sec
– 99th percentile under 10 sec
– Handles 1000s of concurrent queries
Arbitrary slicing n’ dicing
Ability to support arbitrary filtering, splitting and aggregation of data
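As a concrete sketch of this kind of arbitrary slicing and dicing, here is roughly what a Druid native groupBy query looks like when POSTed to a broker. The "events" datasource, the dimension/metric names, and the broker URL are all hypothetical, invented for illustration:

```python
import json

# A sketch of a Druid native groupBy query; the "events" datasource and
# the field names are made up. It asks for impressions and clicks per
# (domain, gender) over one hour, filtered to a single country.
query = {
    "queryType": "groupBy",
    "dataSource": "events",
    "granularity": "hour",
    "dimensions": ["domain", "gender"],
    "filter": {"type": "selector", "dimension": "country", "value": "US"},
    "aggregations": [
        {"type": "count", "name": "impressions"},
        {"type": "longSum", "name": "clicks", "fieldName": "clicked"},
    ],
    "intervals": ["2011-01-01T00:00:00/2011-01-01T01:00:00"],
}

# In a real deployment this JSON body would be POSTed to a broker,
# e.g. http://<broker-host>:8082/druid/v2
print(json.dumps(query, indent=2))
```

Any combination of filters, dimensions, and aggregations can be swapped in at query time, which is what "no pre-canned drill-downs" means in practice.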
Queries on Immediate Data
Immediate insights into current data
Ability to query data as soon as it is ingested
Recent data more important than old data
Highly Available
– Data replication across nodes
– Shared-nothing architecture
– No single point of failure
Rolling upgrades without downtime
– Maintains backwards compatibility
– Each node can be upgraded independently
– Easy to run experiments
– No downtime
Druid Architecture
Early Druid Architecture
[Diagram: Batch Data → Hadoop → Historical Nodes → Broker Node → Queries]
Historical Nodes
– Shared-nothing architecture
– Main workhorses of the Druid cluster
– Load immutable, read-optimized segments
– Respond to queries
– Use memory-mapped files to load segments
Broker Nodes
– Keep track of segment announcements in the cluster
– Scatter queries across historical and realtime nodes
– Merge results from the different query nodes
– (Distributed) caching layer
Coordinator Nodes
– Assign segments to historical nodes
– Interval-based cost function to distribute segments
– Make sure query load is uniform across historical nodes
– Handle replication of data
– Configurable rules to load/drop data
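The load/drop rules are ordinary JSON documents submitted to the coordinator. A minimal sketch of such a rule set, assuming an illustrative 30-day retention window and the default tier name (the specific values are not from the talk):

```python
# Hypothetical retention rules for one datasource: keep the most recent
# 30 days with two replicas on the default tier, drop everything older.
retention_rules = [
    {
        "type": "loadByPeriod",
        "period": "P30D",                        # ISO-8601 period: last 30 days
        "tieredReplicants": {"_default_tier": 2},
    },
    {"type": "dropForever"},                     # everything else is unloaded
]
```

Rules are evaluated top-down and the first rule matching a segment's interval wins, so the drop rule only applies to segments older than 30 days.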
Current Druid Architecture
[Diagram: Batch Data → Hadoop → Historical Nodes; Streaming Data → ETL (Samza, Kafka, Storm, Spark etc.) → Realtime Nodes, with hand-off of segments to the Historical Nodes; a Broker Node serves Queries across both.]
Realtime Nodes
– Ability to ingest streams of data
– Both push- and pull-based ingestion
– Store data in a write-optimized structure
– Periodically convert the write-optimized structure to read-optimized segments
– Events are queryable as soon as they are ingested
Druid vs. Big Data
Question: number of unique users in the last minute?
The classic approach: pre-compute aggregates for every possible set of dimensions, run an ETL pipeline with thousands of stages to compute them, load the results into a complex stack of database layers, and repeat every hour/day/week/year.

This will not scale for:
– billions of users
– billions of events per hour
– retaining years' worth of data
Summarize data you must !!
Any solution ?
Summarization
Each row is an ad impression; clicked == 1 is an actual click. What is the summarization of the hour?

timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832057930  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   3909846810  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   0381837193  Female  0
Example courtesy of Eric Tschetter, used with his permission
Simple: just add up the numbers.
Summarization – Hourly, Simple
timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832057930  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   3909846810  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   0381837193  Female  0

timestamp             impressions  clicks
2011-01-01T00:00:00Z  10           6

We cannot query by domain, user, or gender!
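The hourly rollup above can be sketched in a few lines of Python over the ten sample rows (only the timestamp and clicked columns matter for it):

```python
from collections import defaultdict

# (timestamp, clicked) pairs for the ten sample impressions.
events = [
    ("2011-01-01T00:01:35Z", 1), ("2011-01-01T00:03:03Z", 0),
    ("2011-01-01T00:04:51Z", 1), ("2011-01-01T00:05:33Z", 1),
    ("2011-01-01T00:05:53Z", 0), ("2011-01-01T00:06:17Z", 1),
    ("2011-01-01T00:23:15Z", 0), ("2011-01-01T00:38:51Z", 1),
    ("2011-01-01T00:49:33Z", 1), ("2011-01-01T00:49:53Z", 0),
]

rollup = defaultdict(lambda: [0, 0])   # hour -> [impressions, clicks]
for ts, clicked in events:
    hour = ts[:13] + ":00:00Z"         # truncate timestamp to hour granularity
    rollup[hour][0] += 1
    rollup[hour][1] += clicked

print(dict(rollup))  # {'2011-01-01T00:00:00Z': [10, 6]}
```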
Summarization – Hourly, Gender + Domain

timestamp             domain      user        gender  clicked
2011-01-01T00:01:35Z  bieber.com  4312345532  Female  1
2011-01-01T00:03:03Z  bieber.com  3484920241  Female  0
2011-01-01T00:04:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:05:33Z  ultra.com   4098310573  Male    1
2011-01-01T00:05:53Z  ultra.com   5832057930  Female  0
2011-01-01T00:06:17Z  ultra.com   5789283478  Female  1
2011-01-01T00:23:15Z  bieber.com  4730093842  Female  0
2011-01-01T00:38:51Z  ultra.com   9530174728  Male    1
2011-01-01T00:49:33Z  bieber.com  4930097162  Female  1
2011-01-01T00:49:53Z  ultra.com   0381837193  Female  0

timestamp             domain      gender  impressions  clicks
2011-01-01T00:00:00Z  bieber.com  Female  4            2
2011-01-01T00:00:00Z  ultra.com   Female  3            1
2011-01-01T00:00:00Z  ultra.com   Male    3            2

(+) Number of rows per hour is bounded by the cardinality of (domain × gender)
(–) Query granularity cannot be finer than one hour
(–) Cannot answer the number of unique users!
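The same rollup keyed by (domain, gender) can be sketched as follows, shown here for the seven Female rows of the sample data:

```python
from collections import defaultdict

# (domain, gender, clicked) for the seven Female sample rows, in time order.
rows = [
    ("bieber.com", "Female", 1), ("bieber.com", "Female", 0),
    ("ultra.com",  "Female", 0), ("ultra.com",  "Female", 1),
    ("bieber.com", "Female", 0), ("bieber.com", "Female", 1),
    ("ultra.com",  "Female", 0),
]

summary = defaultdict(lambda: [0, 0])  # (domain, gender) -> [impressions, clicks]
for domain, gender, clicked in rows:
    summary[(domain, gender)][0] += 1
    summary[(domain, gender)][1] += clicked

print(dict(summary))
# {('bieber.com', 'Female'): [4, 2], ('ultra.com', 'Female'): [3, 1]}
```

The number of output rows is bounded by the cardinality of the grouping key, no matter how many raw impressions arrive.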
Summarization, compute unique
timestamp             domain      gender  impressions  clicks
2011-01-01T00:00:00Z  bieber.com  Female  4            2
2011-01-01T00:00:00Z  ultra.com   Female  3            1
2011-01-01T00:00:00Z  ultra.com   Male    3            2

timestamp      domain      gender  impressions  clicks  unique
2011-01-01T00  bieber.com  Female  4            2       [4312345532, 3484920241, 4730093842, 4930097162]
2011-01-01T00  ultra.com   Female  3            1       [5832057930, 5789283478, 0381837193]
2011-01-01T00  ultra.com   Male    3            2       [9530174728, 4098310573]

– "unique" grows linearly!!! Billion-entry sets per row!!!
– The push-down-aggregates-and-merge approach cannot be applied.
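The problem with exact uniques can be seen directly: the per-group user sets must be carried around and unioned at query time, and their size grows with the number of distinct users. A sketch using the sample rows:

```python
# Exact per-group user sets from the sample hour above (ids as strings,
# which preserves the leading zero in 0381837193).
uniques = {
    ("bieber.com", "Female"): {"4312345532", "3484920241", "4730093842", "4930097162"},
    ("ultra.com",  "Female"): {"5832057930", "5789283478", "0381837193"},
    ("ultra.com",  "Male"):   {"9530174728", "4098310573"},
}

# Answering "unique users overall" requires unioning every set at query
# time; with billions of users these sets no longer fit in a row.
total = set().union(*uniques.values())
print(len(total))  # 9 unique users across all groups
```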
Sketching groundwork
timestamp      domain      gender  impressions  clicks  uniques
2011-01-01T00  bieber.com  Female  4            2       [4312345532, 3484920241, 4730093842, 4930097162]
2011-01-01T00  ultra.com   Female  3            1       [5832057930, 5789283478, 0381837193]
2011-01-01T00  ultra.com   Male    3            2       [9530174728, 4098310573]

timestamp      domain      gender  impressions  clicks  uniques
2011-01-01T00  bieber.com  Female  4            2       <sub-linear data structure>
2011-01-01T00  ultra.com   Female  3            1       <sub-linear data structure>
2011-01-01T00  ultra.com   Male    3            2       <sub-linear data structure>

Requirements:
– Streamable
– Mergeable at query time
– Approximates the number of unique users with predictable error
– Limited memory, independent of the data size
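A toy K-Minimum-Values sketch (the same family as the theta sketches Druid uses, but stripped down; this is an illustration, not Druid's actual implementation) meets all four requirements: it consumes a stream item by item, two sketches merge into one, the estimate has predictable error ~1/√k, and memory is capped at k hash values regardless of stream size:

```python
import hashlib

class KMVSketch:
    """Keep the k smallest normalized hash values ever seen; if the k-th
    smallest is theta, estimate the unique count as (k - 1) / theta."""

    def __init__(self, k=64):
        self.k = k
        self.values = set()          # at most k smallest hashes seen so far

    def _hash(self, item):
        h = hashlib.sha1(str(item).encode()).digest()
        return int.from_bytes(h[:8], "big") / 2**64   # uniform in [0, 1)

    def add(self, item):
        # Duplicates hash identically, so the set dedups them for free.
        self.values.add(self._hash(item))
        if len(self.values) > self.k:
            self.values.remove(max(self.values))

    def merge(self, other):
        merged = KMVSketch(self.k)
        merged.values = set(sorted(self.values | other.values)[: self.k])
        return merged

    def estimate(self):
        if len(self.values) < self.k:
            return len(self.values)   # exact (up to hash collisions) while small
        return (self.k - 1) / max(self.values)

# Usage: two nodes sketch overlapping user streams, then merge at query time.
a, b = KMVSketch(512), KMVSketch(512)
for i in range(12000):
    a.add("user%d" % i)               # users 0..11999
for i in range(8000, 20000):
    b.add("user%d" % i)               # users 8000..19999 (4000 overlap)
merged = a.merge(b)
estimate = merged.estimate()          # ~20000 true uniques
```

The intuition: k uniform hashes in [0, 1) from n distinct items put the k-th smallest near k/n, so inverting it recovers n, and merging is just "keep the k smallest of both sets".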
Theta Sketches (KMV): open sourced by Yahoo! [datasketches.github.io]

Predictable approximation error can be traded off against sketch size:
– k = 4096 corresponds to an RSE of +/- 3.2% with 95% confidence
– k = 16K corresponds to an RSE of +/- 1.6% with 95% confidence

Limited memory footprint, independent of data size:
– k = 4096 -> 32768 bytes
– k = 16384 -> 131072 bytes

Mergeable at query time:
– "merge rate of about 14.5 million sketches per second per processor thread" [http://datasketches.github.io/docs/Theta/ThetaMergeSpeed.html]

Intersections can be computed at query time. Duplication insensitive.
https://speakerdeck.com/druidio/approximate-algorithms-and-sketches-in-druid
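The figures above are consistent with the usual theta-sketch rule of thumb, RSE ≈ 1/√k, doubled for a roughly 95% (two standard error) bound, and with k retained hash values of 8 bytes each:

```python
import math

# Sanity-check the slide's error and memory figures (rule-of-thumb
# formula, not the exact DataSketches derivation).
for k in (4096, 16384):
    bound95 = 2 / math.sqrt(k)   # ~2 standard errors => ~95% confidence
    memory = k * 8               # k retained 8-byte hash values
    print(k, round(bound95 * 100, 2), memory)

# k = 4096  -> +/- 3.12% and 32768 bytes
# k = 16384 -> +/- 1.56% and 131072 bytes
# (matching the slide's rounded 3.2% / 1.6% and byte counts)
```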
Druid success story !
– Replaced a 5,000-node HBase cluster serving six petabytes of metrics to power Flurry mobile analytics alone [infoworld.com]
– Tracking more than 2 billion mobile devices [Flurry SDK @ MDC 2016]
– Real-time ingestion of 20 billion events per day [Flurry SDK @ MDC 2016]
– Sub-second query latency
– Can query data from the last 15 seconds
“Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.”
― Kurt Vonnegut
Monitoring
Monitoring (system level):
– CPU
– Memory
– Network IO
– …

Alerting:
– Logged exceptions
– System metrics thresholds

What about exploratory debugging and performance tuning?
Exploratory Debugging / Performance Tuning, Hard !!

Guess why?
– A distributed application running on multiple machines with different configurations, across multiple data centers…
– Cannot run benchmarks in the production environment.
– Cannot reproduce the production load pattern.
– Hard to obtain insights from log files:
  – Need to be interactive and real-time.
  – Need to be able to arbitrarily slice and dice the benchmark results.
Druid internal metrics
Event types:
– Periodic events
– Query-related events
– Ingestion-related events

Anatomy of events:
{"timestamp": "2016-05-01T10:14:00", "metric": "query/time", "service": "druid/broker", "value": "234", "type": "groupBy", "id": "12374095094", …}

– Unbounded cardinality for dimensions like query id.
– Very high throughput of emitted events.
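For instance, turning a stream of such events into minute-granularity latency aggregates can be sketched as below (the events are invented, shaped like the example above, with values kept numeric for arithmetic; the unbounded-cardinality query id is simply dropped from the rollup key):

```python
from collections import defaultdict

# Invented metric events shaped like Druid's query/time emissions.
events = [
    {"timestamp": "2016-05-01T10:14:03", "metric": "query/time",
     "service": "druid/broker", "value": 234, "type": "groupBy"},
    {"timestamp": "2016-05-01T10:14:41", "metric": "query/time",
     "service": "druid/broker", "value": 120, "type": "timeseries"},
    {"timestamp": "2016-05-01T10:15:02", "metric": "query/time",
     "service": "druid/historical", "value": 87, "type": "groupBy"},
]

# Average query/time per (service, minute).
buckets = defaultdict(list)
for e in events:
    if e["metric"] == "query/time":
        minute = e["timestamp"][:16]   # truncate to minute granularity
        buckets[(e["service"], minute)].append(e["value"])

avg = {k: sum(v) / len(v) for k, v in buckets.items()}
print(avg)
# {('druid/broker', '2016-05-01T10:14'): 177.0,
#  ('druid/historical', '2016-05-01T10:15'): 87.0}
```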
Metrics Cluster Architecture
[Diagram: Druid nodes emit metric events over HTTP through a VIP to collector nodes; real-time nodes ingest the events, and brokers perform query rewrite and scatter/gather across the real-time and historical nodes. Each real-time node can handle ~20k events/sec at a granularity of one minute.]
Summary

Scalability:
– Horizontal scalability
– Columnar storage, indexing and compression
– Multi-tenancy

Real-time:
– Ingestion latency < seconds
– Query latency < seconds

Arbitrary slicing and dicing of big data, like a ninja:
– No more pre-canned drill-downs
– Query with more fine-grained granularity

High availability and rolling deployment capabilities:
– Less costly to run
– Very active open source community
Thank you ! Questions ?
Druid as a Platform
[Diagram: Druid at the center, integrating with:]
– Batch ingestion (Hadoop, Spark, …)
– Streaming ingestion (Storm, Samza, Spark Streaming, Kafka, …)
– Web services (Fili)
– Visualizations (Pivot, Grafana, Caravel)
– Machine learning (SciPy, R, ScalaNLP)