Time-oriented event search. A new level of scale

Upload: hadoop-summit

Post on 06-Jan-2017

TRANSCRIPT

Page 1: Time-oriented event search. A new level of scale

© Rocana, Inc. All Rights Reserved. | 1

Joey Echeverria, Platform Technical Lead - @fwiffo

Michael Peterson, Platform Engineer - @quux00

Hadoop Summit Ireland 2016

Time-oriented event search: A new level of scale

Page 2

Context

• We built a system for large-scale, real-time collection, processing, and analysis of event-oriented machine data

• On-prem or in the cloud, but not SaaS

• Supportability is a big deal for us
  • Predictable performance, including under failures
  • Ease of configuration and operation
  • Behavior in wacky environments

• All of our decisions are informed by this - YMMV

Page 3

What I mean by “scale”

• Typical: 10s of TB of new data per day

• Average event size ~200-500 bytes

• 20TB per day
  • @200 bytes = 1.2M events / second, ~109.9B events / day, 40.1T events / year
  • @500 bytes = 509K events / second, ~43.9B events / day, 16T events / year
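The per-second, per-day, and per-year figures follow directly from the daily volume and the event size. A minimal Python sketch of that arithmetic is below. It assumes "TB" means a binary terabyte (TiB, 2^40 bytes), because that is the reading that reproduces the slide's numbers; with decimal terabytes, 20 TB at 200 bytes would be exactly 100B events per day.

```python
# Back-of-the-envelope event rates for a fixed daily data volume.
# Assumption: "TB" is a binary terabyte (TiB = 2**40 bytes), which is the
# reading that reproduces the slide's numbers.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365
TIB = 2 ** 40


def ingest_rates(daily_bytes: float, event_size: int) -> dict[str, float]:
    """Return events per second, per day, and per year for a daily volume."""
    per_day = daily_bytes / event_size
    return {
        "per_second": per_day / SECONDS_PER_DAY,
        "per_day": per_day,
        "per_year": per_day * DAYS_PER_YEAR,
    }


for size in (200, 500):
    r = ingest_rates(20 * TIB, size)
    print(
        f"@{size} bytes: {r['per_second']:,.0f} events/s, "
        f"{r['per_day'] / 1e9:.2f}B events/day, "
        f"{r['per_year'] / 1e12:.2f}T events/year"
    )
```

Under that assumption, 20 TiB per day at 200 bytes per event works out to about 1.27M events per second, which the slide rounds down to 1.2M.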

• Retaining years of data online for query

Page 4

General purpose search – the good parts

• We originally built against SolrCloud (but most of this goes for Elasticsearch too)

• Amazing feature set for general purpose search

• Good support for moderate scale

• Excellent at
  • Content search – news sites, document repositories
  • Finite size datasets – product catalogs, job postings, things you prune
  • Low(er) cardinality datasets that (mostly) fit in memory

Page 5

Problems with general purpose search systems

• Fixed shard allocation models – always N partitions

• Multi-level and semantic partitioning is painful without building your own macro query planner

• All shards open all the time; poor resource control for high retention

• APIs are focused on record-at-a-time, near-real-time (NRT) indexing; poor ingest performance (aka: please stop making everything REST!)

• Ingest concurrency is wonky

• High write amplification on data we know won’t change

• Other smaller stuff…

Page 6

“Well actually…”

Plenty of ways to push general purpose systems

(We tried many of them)

• Using multiple collections as partitions, macro query planning

• Running multiple JVMs per node for better utilization

• Pushing historical searches into another system

• Building weirdo caches of things

At some point the cost of hacking outweighed the cost of building

Page 7

Warning!

• This is not a condemnation of general purpose search systems!

• Unless the sky is falling, use one of those systems

Page 8

We built a thing: Rocana Search

High-cardinality, low-latency, parallel search system for time-oriented events

Page 9

Key Goals for Rocana Search

• Higher indexing throughput per node than Solr for time-oriented event data

• Scale horizontally better than Solr
  • Support an arbitrary number of dynamically created partitions

• Arbitrarily large amounts of indexed data on disk
  • All data queryable without wasting resources on infrequently used data

• Ability to add/remove Search nodes dynamically without any manual restarts or rebalances

Page 10

Some Key Features of Rocana Search

• Fully parallelized ingest and query, built for large clusters

• Every node is an indexer

[Diagram: Kafka and four Hadoop nodes, each running Rocana Search]

Page 11

Some Key Features of Rocana Search

• Every node is a query coordinator and executor

[Diagram: a query client and four Rocana Search nodes, each with a coordinator (Coord) and an executor (Exec)]

Page 12

Architecture

(A single node)

[Diagram: a single Rocana Search (RS) node; components shown: Metadata, Index Management, Coordinator, Executor, Lucene Indexes, HDFS, ZooKeeper (ZK), Kafka, Data Producers, Query Client]

Page 13

Sharding Model: datasets, partitions, and slices

• A search dataset is split into partitions by a partition strategy
  • Think: “By year, month, day”
  • Partitioning is invisible to queries (e.g. `time:[x TO y] AND host:z` works normally)

• Partitions are divided into slices to support lock-free parallel writes
  • Think: “This day has 20 slices, each of which is independent for write”
  • Number of slices == Kafka partitions
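A partition strategy can be as simple as a function from an event's timestamp to a partition name. The sketch below assumes day-granularity partitions named like the ones on the next slide (`2016/01/01`), UTC timestamps in epoch milliseconds, and that an event's slice is simply the Kafka partition it arrived on. `day_partition` and `route` are hypothetical names, not Rocana Search's API.

```python
from datetime import datetime, timezone


def day_partition(timestamp_ms: int) -> str:
    """Hypothetical "by year, month, day" strategy: epoch millis (UTC) -> "YYYY/MM/DD"."""
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return ts.strftime("%Y/%m/%d")


def route(timestamp_ms: int, kafka_partition: int) -> tuple[str, int]:
    """Return (partition, slice) for one event.

    The slice number is the Kafka partition the event was read from, so
    slice numbers range over the same values as Kafka partition numbers.
    """
    return day_partition(timestamp_ms), kafka_partition


# 2016-01-01T12:00:00Z arriving on Kafka partition 3
print(route(1_451_649_600_000, 3))  # ('2016/01/01', 3)
```

Because the partition name is derived from the event's own timestamp, a query like `time:[x TO y]` can be mapped back to partition names without the user ever mentioning partitions.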

Page 14

Datasets, partitions, and slices

Dataset “events”
  Partition “2016/01/01”
    Slice 0, Slice 1, Slice 2, …, Slice N
  Partition “2016/01/02”
    Slice 0, Slice 1, Slice 2, …, Slice N

Page 15

From events to partitions to slices

Page 16

Assigning slices to nodes
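Since the number of slices equals the number of Kafka partitions, assigning slices to nodes amounts to giving each node exclusive ownership of some Kafka partitions. The slides do not say which policy Rocana Search uses; the sketch below assumes a simple round-robin over a sorted node list, and `assign_slices` is a hypothetical name.

```python
def assign_slices(num_kafka_partitions: int, nodes: list[str]) -> dict[str, list[int]]:
    """Hypothetical round-robin assignment of Kafka partitions (= slice numbers)
    to search nodes; each partition gets exactly one owner."""
    if not nodes:
        raise ValueError("at least one search node is required")
    ordered = sorted(set(nodes))  # deterministic order, duplicates ignored
    owners: dict[str, list[int]] = {node: [] for node in ordered}
    for kp in range(num_kafka_partitions):
        owners[ordered[kp % len(ordered)]].append(kp)
    return owners


print(assign_slices(8, ["rs-3", "rs-1", "rs-2", "rs-4"]))
# {'rs-1': [0, 4], 'rs-2': [1, 5], 'rs-3': [2, 6], 'rs-4': [3, 7]}
```

Exclusive ownership is what keeps writes lock-free: no two nodes ever append to the same slice. How ownership moves when nodes join or leave is not covered in the slides.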

Page 17

The write path

• One of the search nodes is the exclusive owner of Kafka partitions (KP) 0 and 1

• Consume a batch of events

• Use the partition strategy to determine which RS partition each event belongs to

• Kafka messages carry the partition so we know the slice

• Each event is written to the proper partition/slice

• Eventually the indexes are committed

• If the partition or slice is new, the metadata service is informed
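The write path steps can be sketched for a single node as follows. This is a hypothetical, in-memory illustration: `SliceWriter` is an invented name, a Python list stands in for a Lucene index writer, and the `on_new` callback stands in for informing the metadata service, which here happens at commit time to follow the order on the slide.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable, Iterable

Key = tuple[str, int]  # (partition, slice)


def day_partition(timestamp_ms: int) -> str:
    """Hypothetical day-granularity partition strategy (UTC)."""
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return ts.strftime("%Y/%m/%d")


class SliceWriter:
    """Hypothetical single-node write path. Lists stand in for Lucene index
    writers; on_new stands in for the metadata service notification."""

    def __init__(self, on_new: Callable[[str, int], None]) -> None:
        self.pending: dict[Key, list[str]] = defaultdict(list)  # uncommitted events
        self.committed: dict[Key, int] = defaultdict(int)      # committed counts
        self.known: set[Key] = set()                            # registered keys
        self.on_new = on_new

    def write_batch(self, batch: Iterable[tuple[int, int, str]]) -> None:
        """Batch items are (kafka_partition, timestamp_ms, event_body)."""
        for kafka_partition, timestamp_ms, body in batch:
            # Partition comes from the strategy; slice comes from the Kafka partition.
            key = (day_partition(timestamp_ms), kafka_partition)
            self.pending[key].append(body)

    def commit(self) -> None:
        """Commit pending events, then register any new partition/slice."""
        for key, events in self.pending.items():
            self.committed[key] += len(events)
            if key not in self.known:
                self.known.add(key)
                self.on_new(*key)
        self.pending.clear()


registered: list[Key] = []
writer = SliceWriter(lambda p, s: registered.append((p, s)))
writer.write_batch([
    (0, 1_451_606_400_000, "event a"),  # 2016-01-01, Kafka partition 0
    (1, 1_451_606_400_000, "event b"),  # 2016-01-01, Kafka partition 1
    (0, 1_451_692_800_000, "event c"),  # 2016-01-02, Kafka partition 0
])
writer.commit()
print(sorted(registered))
```

Because each node owns its Kafka partitions exclusively, a writer like this never needs to coordinate with other nodes before appending to a slice.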

Page 18

Query

• Queries are submitted to a coordinator via RPC

• The coordinator parses the query and aggressively prunes the partitions to search by analyzing predicates

• The coordinator schedules and monitors fragments, merges results, and responds to the client

• Fragments are submitted to executors for processing

• Executors search exactly what they’re told and stream results to the coordinator

• A fragment is generated for every slice that may contain matching data
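Coordinator-side planning can be sketched in a few lines. The sketch assumes the coordinator knows which day partitions and slices exist (a hypothetical `{partition: [slice ids]}` catalog) and that results are ordered newest-first by timestamp; `prune`, `fragments`, and `merge_results` are illustrative names. The standard library's `heapq.merge` does the streaming merge of already-sorted executor results.

```python
import heapq
import itertools
from datetime import date
from typing import Iterable

Catalog = dict[str, list[int]]  # hypothetical: day partition -> slice ids
Hit = tuple[int, str]           # (timestamp_ms, event_id)


def prune(catalog: Catalog, start: date, end: date) -> list[str]:
    """Keep only day partitions that can satisfy time:[start TO end].

    Zero-padded "YYYY/MM/DD" names sort like dates, so a string range check works.
    """
    lo, hi = start.strftime("%Y/%m/%d"), end.strftime("%Y/%m/%d")
    return sorted(p for p in catalog if lo <= p <= hi)


def fragments(catalog: Catalog, start: date, end: date) -> list[tuple[str, int]]:
    """One fragment per (partition, slice) that may contain matching data."""
    return [(p, s) for p in prune(catalog, start, end) for s in catalog[p]]


def merge_results(streams: Iterable[Iterable[Hit]], limit: int) -> list[Hit]:
    """Merge executor result streams, each already sorted newest-first."""
    merged = heapq.merge(*streams, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, limit))


catalog = {"2016/01/01": [0, 1], "2016/01/02": [0, 1], "2016/01/05": [0]}
print(fragments(catalog, date(2016, 1, 1), date(2016, 1, 2)))
```

Here a two-day query produces four fragments, and the `2016/01/05` partition is never touched; pruning on real predicates would of course be richer than a date range.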

Page 19

Some benefits of the design

• Search processes run on the same nodes as the HDFS DataNodes

• The first replica of any event received by search from Kafka is written locally
  • Unless nodes fail, all reads are local (HDFS short-circuit reads)
  • The Linux kernel page cache is useful here
  • HDFS caching could also be used (not yet doing this)

• Search also uses an off-heap block cache

• In case of failure, any search node can read any index

• HDFS overhead winds up being very small, and we still get its advantages

Page 20

Initial Benchmarks: Event ingest and indexing

• Early days on this...

• Most recent data we have for Rocana Search vs. Solr
  • Using the CMS garbage collector
  • On Hadoop/HDFS (CDH)
  • On AWS (d2.2xlarge): 8 vCPUs, 61 GiB RAM
  • 4 nodes, 8 shards
  • 12-hour run, ~300 GiB indexed to disk

• Solr = 11,000 events/sec

• Rocana Search = 36,500 events/sec
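The headline numbers imply roughly a 3.3x difference in ingest throughput for this setup. A small Python check of that arithmetic follows; the event totals assume each rate was sustained for the whole 12-hour run, which the slide does not state.

```python
# Ratio and totals implied by the slide's ingest rates.
SOLR_EPS = 11_000           # events/sec, from the slide
ROCANA_EPS = 36_500         # events/sec, from the slide
RUN_SECONDS = 12 * 60 * 60  # 12-hour run

speedup = ROCANA_EPS / SOLR_EPS
print(f"Rocana Search vs. Solr: {speedup:.2f}x")  # 3.32x

# Assumption: each rate was sustained for the entire run.
for name, eps in (("Solr", SOLR_EPS), ("Rocana Search", ROCANA_EPS)):
    print(f"{name}: ~{eps * RUN_SECONDS:,} events in 12 hours")
```

At those rates, Rocana Search would index about 1.58B events over the run versus about 475M for Solr.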

Page 21

Initial Benchmarks: Query During Ingest

Page 22

Initial Benchmarks: Query (No Ingest)

Page 23

What we’ve really shown

In the context of search, scale means:

• High cardinality: Billions of unique events per day

• High speed ingest: Hundreds of thousands of events per second

• Not having to age data out of the dataset

• Handling large, concurrent queries, while ingesting data

• Fully utilizing modern hardware

These things are very possible

Page 24

Thank you!

Questions?

The Rocana Search Team:

• Michael Peterson - @quux00

• Mark Tozzi - @not_napoleon

• Brad Cupit - @bradcupit

• Brett Hoerner - @bretthoerner

• Joey Echeverria - @fwiffo

• Eric Sammer - @esammer