High cardinality time series search: A new level of scale - Data Day Texas 2016


Page 1: High cardinality time series search: A new level of scale - Data Day Texas 2016


Eric Sammer – CTO and co-founder, @esammer

Data Day Texas 2016

High cardinality time series search: A new level of scale

Page 2: High cardinality time series search: A new level of scale - Data Day Texas 2016


Context

• We build a system for large scale realtime collection, processing, and analysis of event-oriented machine data

• On prem or in the cloud, but not SaaS

• Supportability is a big deal for us
  • Predictability of performance, even under failures
  • Ease of configuration and operation
  • Behavior in wacky environments

• All of our decisions are informed by this - YMMV

Page 3: High cardinality time series search: A new level of scale - Data Day Texas 2016


What I mean by “scale”

• Typical: 10s of TB of new data per day

• Average event size ~200-500 bytes

• 20TB per day (back-of-envelope arithmetic sketched below)
  • @200 bytes = 1.2M events / second, ~109.9B events / day, 40.1T events / year
  • @500 bytes = 509K events / second, ~43.9B events / day, 16T events / year

• Retaining years online for query
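
A minimal back-of-envelope sketch of the figures above, assuming 20TB/day means binary terabytes (that interpretation is what reproduces the slide's numbers):

public class ScaleMath {
  public static void main(String[] args) {
    double bytesPerDay = 20.0 * 1024L * 1024 * 1024 * 1024; // 20 TiB of new data per day
    for (int eventSizeBytes : new int[] {200, 500}) {
      double eventsPerDay = bytesPerDay / eventSizeBytes;
      double eventsPerSecond = eventsPerDay / 86_400;       // seconds per day
      double eventsPerYear = eventsPerDay * 365;
      System.out.printf("@%d bytes: %,.0f events/sec, %.1fB events/day, %.1fT events/year%n",
          eventSizeBytes, eventsPerSecond, eventsPerDay / 1e9, eventsPerYear / 1e12);
    }
  }
}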

Page 4: High cardinality time series search: A new level of scale - Data Day Texas 2016


General purpose search – the good parts

• We originally built against SolrCloud (but most of this goes for Elasticsearch too)

• Amazing feature set for general purpose search

• Good support for moderate scale

• Excellent at
  • Content search – news sites, document repositories
  • Finite size datasets – product catalogs, job postings, things you prune
  • Low(er) cardinality datasets that (mostly) fit in memory

Page 5: High cardinality time series search: A new level of scale - Data Day Texas 2016


Problems with general purpose search systems

• Fixed shard allocation models – always N partitions

• Multi-level and semantic partitioning is painful without building your own macro query planner

• All shards open all the time; poor resource control for high retention

• APIs are record-at-a-time focused for NRT indexing; poor ingest performance (aka: please stop making everything REST!)

• Ingest concurrency is wonky

• High write amplification on data we know won’t change

• Other smaller stuff…

Page 6: High cardinality time series search: A new level of scale - Data Day Texas 2016


“Well actually…”

Plenty of ways to push general purpose systems

(We tried many of them)

• Using multiple collections as partitions, macro query planning

• Running multiple JVMs per node for better utilization

• Pushing historical searches into another system

• Building weirdo caches of things

At some point the cost of hacking outweighed the cost of building

Page 7: High cardinality time series search: A new level of scale - Data Day Texas 2016


Warning!

• This is not a condemnation of general purpose search systems!

• Unless the sky is falling, use one of those systems

Page 8: High cardinality time series search: A new level of scale - Data Day Texas 2016


We built a thing: Rocana Search

High cardinality, low latency, parallel search system for time-oriented events

Page 9: High cardinality time series search: A new level of scale - Data Day Texas 2016


Features of Rocana Search

• Fully parallelized ingest and query, built for large clusters

• Every node is an indexer, query coordinator, and executor

• Optimized for high cardinality time-oriented event data

• Built to keep all data online and queryable without wasting resources for infrequently used data

• Fully durable, resistant to node failures

• Operationally friendly: online ops, predictable resource usage and performance

• Uses battle tested open source components (Kafka, Lucene, HDFS, ZK)

Page 10: High cardinality time series search: A new level of scale - Data Day Texas 2016


Major differences

• Storage and partition model looks more like range-partitioned tables in databases; new partitions easily added, old ones dropped, support for multi-field partitioning

• Partitions subdivided into slices for parallel writes

• Query engine aggressively prunes partitions by analyzing predicates (sketched below)

• Ingestion path is Kafka, built for extremely high throughput of small events

What we know about our data allows us to optimize
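
To make the predicate-analysis point concrete: a minimal sketch of time-range partition pruning, using hypothetical Partition and TimeRange types rather than Rocana's actual classes. Given the time bounds extracted from a query's predicates, only partitions whose span overlaps the query range are ever opened.

import java.util.List;
import java.util.stream.Collectors;

public class PartitionPruner {

  // Hypothetical time-based partition covering [startMillis, endMillis)
  record Partition(String path, long startMillis, long endMillis) {}

  // Hypothetical time bounds pulled out of a query predicate like time:[x TO y]
  record TimeRange(long startMillis, long endMillis) {}

  // Keep only partitions that overlap the query's time range; everything else
  // is skipped without ever touching its index.
  static List<Partition> prune(List<Partition> all, TimeRange query) {
    return all.stream()
        .filter(p -> p.startMillis() < query.endMillis()
                  && p.endMillis() > query.startMillis())
        .collect(Collectors.toList());
  }
}

A query with no usable time predicate would simply fall back to all partitions.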

Page 11: High cardinality time series search: A new level of scale - Data Day Texas 2016


Architecture

(A single node)

Page 12: High cardinality time series search: A new level of scale - Data Day Texas 2016


Collections, partitions, and slices

• A search collection is split into partitions by a partition strategy (see the sketch below)
  • Think: “By year, month, day, hour”

• Partitioning invisible to queries (e.g. `time:[x TO y] AND host:z` works normally)

• Partitions are divided into slices to support lock-free parallel writes
  • Think: “This hour has 20 slices, each of which is independent for write”
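
A minimal sketch of what a “by year, month, day, hour” partition strategy plus Kafka-partition-based slicing could look like; the class and method names here are illustrative assumptions, not Rocana's actual API.

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class HourlyPartitionStrategy {

  private static final DateTimeFormatter HOUR_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

  // Partition: derived from the event timestamp, e.g. "2016/01/16/09"
  public String partitionFor(long eventTimestampMillis) {
    return HOUR_FORMAT.format(Instant.ofEpochMilli(eventTimestampMillis));
  }

  // Slice: derived from the Kafka partition the event arrived on, so each
  // consumer writes to its own slice with no locking or coordination
  public int sliceFor(int kafkaPartition) {
    return kafkaPartition;
  }
}

With 20 Kafka partitions feeding a collection, each hourly partition ends up with 20 independent slices, matching the example above.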

Page 13: High cardinality time series search: A new level of scale - Data Day Texas 2016


Collections, partitions, and slices

Page 14: High cardinality time series search: A new level of scale - Data Day Texas 2016


From events to partitions to slices

Page 15: High cardinality time series search: A new level of scale - Data Day Texas 2016


Assigning slices to nodes

Page 16: High cardinality time series search: A new level of scale - Data Day Texas 2016


Following the write path (sketched in code below)

• One of the search nodes is the exclusive owner of KP 0 and KP 1 (KP = Kafka partition)

• Consume a batch of events

• Use the partition strategy to figure out to which RS (Rocana Search) partition it belongs

• Kafka messages carry the partition so we know the slice

• Event written to the proper partition/slice

• Eventually the indexes are committed

• If the partition or slice is new, metadata service is informed
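
A minimal sketch of that write loop using the plain Kafka consumer and Lucene APIs, reusing the HourlyPartitionStrategy sketched earlier; the topic name, index layout, field names, and per-batch commit are illustrative assumptions, not Rocana's implementation.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class WritePath {
  private final Map<String, IndexWriter> writers = new HashMap<>();
  private final HourlyPartitionStrategy strategy = new HourlyPartitionStrategy();

  public void run(KafkaConsumer<String, byte[]> consumer) throws Exception {
    consumer.subscribe(Collections.singletonList("events"));              // assumed topic name
    while (true) {
      // Consume a batch of events from the Kafka partitions this node owns
      ConsumerRecords<String, byte[]> batch = consumer.poll(Duration.ofSeconds(1));
      for (ConsumerRecord<String, byte[]> record : batch) {
        // The partition strategy maps the event time to a time-based partition;
        // the Kafka partition number identifies the slice
        String partition = strategy.partitionFor(record.timestamp());
        int slice = strategy.sliceFor(record.partition());

        Document doc = new Document();
        doc.add(new StoredField("raw", record.value()));  // a real document would index parsed event fields
        writerFor(partition, slice).addDocument(doc);
      }
      // Commit open indexes after each batch; a real system would commit on a
      // size/time threshold instead, and inform the metadata service of any
      // newly created partition or slice
      for (IndexWriter writer : writers.values()) {
        writer.commit();
      }
    }
  }

  // Lazily open one Lucene index per partition/slice
  private IndexWriter writerFor(String partition, int slice) throws Exception {
    String key = partition + "/" + slice;
    IndexWriter writer = writers.get(key);
    if (writer == null) {
      writer = new IndexWriter(
          FSDirectory.open(Paths.get("/data/search", key)),               // assumed local index root
          new IndexWriterConfig(new StandardAnalyzer()));
      writers.put(key, writer);
    }
    return writer;
  }
}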

Page 17: High cardinality time series search: A new level of scale - Data Day Texas 2016


Query engine basics

• Queries submitted to coordinator via RPC

• Coordinator (smart) parses, plans, schedules and monitors fragments, merges results, responds to client

• Fragments are submitted to executors for processing

• Executors (dumb) search exactly what they’re told, stream to coordinator

• A fragment is generated for every partition/slice that may contain data (see the fan-out sketch below)
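
A minimal sketch of the coordinator's fan-out, with hypothetical Fragment and FragmentExecutor abstractions and a simplified result type: one fragment per surviving partition/slice, scheduled on a pool of executors, results merged as they complete.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

public class QueryCoordinator {

  // Hypothetical unit of work: search one partition/slice with the query
  record Fragment(String partition, int slice, String query) {}

  // "Dumb" executor: searches exactly what it is told and returns matches
  interface FragmentExecutor {
    List<String> search(Fragment fragment);
  }

  List<String> run(String query, List<String> partitions, int slicesPerPartition,
                   FragmentExecutor executor, ExecutorService pool) {
    // Plan: one fragment for every partition/slice that may contain data
    List<Fragment> fragments = new ArrayList<>();
    for (String partition : partitions) {
      for (int slice = 0; slice < slicesPerPartition; slice++) {
        fragments.add(new Fragment(partition, slice, query));
      }
    }

    // Schedule fragments on the executor pool...
    List<CompletableFuture<List<String>>> futures = fragments.stream()
        .map(f -> CompletableFuture.supplyAsync(() -> executor.search(f), pool))
        .toList();

    // ...and merge the streamed results before responding to the client
    List<String> merged = new ArrayList<>();
    for (CompletableFuture<List<String>> future : futures) {
      merged.addAll(future.join());
    }
    return merged;
  }
}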

Page 18: High cardinality time series search: A new level of scale - Data Day Texas 2016


Some implications

• Search processes are on the same nodes as the HDFS DataNode

• First replica of any event received by search from Kafka is written locally

• Result: Unless nodes fail, all reads are local (HDFS short circuit reads)

• Linux kernel page cache is useful here

• HDFS caching can be used

• Search has an off-heap block cache as well

• In case of failure, any search node can read any index

• HDFS overhead winds up being very small, and we still get its advantages (short-circuit read settings sketched below)
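
For the short-circuit read point, a minimal sketch of the standard HDFS client settings involved (the socket path is an assumption and must match the DataNode's own configuration):

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitReads {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    // Let the client read local block files directly instead of streaming
    // them through the DataNode
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Unix domain socket shared by the client and the DataNode (assumed path)
    conf.set("dfs.domain.socket.path", "/var/run/hadoop-hdfs/dn_socket");
    return conf;
  }
}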

Page 19: High cardinality time series search: A new level of scale - Data Day Texas 2016


Contrived query scenario

• 80 Kafka partitions (80 slices)

• Collection partitioned by day

• 80 nodes, 16 executor threads each

• Query: time:[2015-01-01 TO 2016-01-01] AND service:sshd
  • 365 * 80 = 29,200 fragments generated for the query (a lot!)
  • 29,200 / (80 * 16) ≈ 23 “waves” of fragments
  • If each “wave” takes ~0.5 second, the query takes ~11 seconds (arithmetic sketched below)
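
The same back-of-envelope arithmetic, purely for illustration:

public class ContrivedQueryMath {
  public static void main(String[] args) {
    int days = 365;                       // one year, partitioned by day
    int slices = 80;                      // one slice per Kafka partition
    int nodes = 80;
    int executorThreadsPerNode = 16;
    double secondsPerWave = 0.5;

    int fragments = days * slices;                        // 29,200 fragments
    int executorSlots = nodes * executorThreadsPerNode;   // 1,280 concurrent fragments
    double waves = (double) fragments / executorSlots;    // ~22.8
    System.out.printf("%,d fragments, ~%.0f waves, ~%.0f seconds%n",
        fragments, Math.ceil(waves), waves * secondsPerWave);
  }
}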

Page 20: High cardinality time series search: A new level of scale - Data Day Texas 2016


More real, but preliminary

• 24 AWS EC2 d2.2xl, instance storage

• Ingesting data at ~3 million events per minute (50K eps)
  • 24 Kafka partitions / RS slices
  • Index size: 5.9 billion events

• Query: All events, facet by 3 fields
  • No tuning (default config): ~10 seconds (with a silly bug)
  • 10 concurrent instances of the same query: ~21 seconds total
  • 50 concurrent instances: ~41 seconds

• We will do much better shortly (*ahem*, Brett)!

Page 21: High cardinality time series search: A new level of scale - Data Day Texas 2016


What we’ve really shown

In the context of search, scale means:

• High cardinality: Billions of events per day

• High speed ingest: Hundreds of thousands of events per second

• Not having to age data out of the collection

• Handling large, concurrent queries, while ingesting data

• Fully utilizing modern hardware

These things are very possible

Page 22: High cardinality time series search: A new level of scale - Data Day Texas 2016


Next steps

• Read replicas

• Smarter partition elimination in complex queries

• Speculative execution of query fragments

• Additional metadata for index fields to improve storage efficiency

• Smarter cache management

• Better visibility into performance and health

• Strong consensus (e.g. Raft, multi-paxos) for metadata?

Page 23: High cardinality time series search: A new level of scale - Data Day Texas 2016


Thank you!

Hopefully I still have time for questions.

rocana.com

@esammer

[email protected]

(ask me for stickers)

The (amazing) core search team:

• Brett Hoerner - @bretthoerner

• Michael Peterson - @quux00

• Mark Tozzi - @not_napoleon