Time Series Processing with Solr and Spark


Page 1: Time Series Processing with Solr and Spark

October 11-14, 2016 • Boston, MA

Page 2: Time Series Processing with Solr and Spark

Time Series Processing with Solr and Spark Josef Adersberger (@adersberger)

CTO, QAware

Page 3: Time Series Processing with Solr and Spark


TIME SERIES 101

Page 4: Time Series Processing with Solr and Spark


WE’RE SURROUNDED BY TIME SERIES

▸ Operational data: Monitoring data, performance metrics, log events, …

▸ Data Warehouse: Dimension time

▸ Measured Me: Activity tracking, ECG, …

▸ Sensor telemetry: Sensor data, …

▸ Financial data: Stock charts, …

▸ Climate data: Temperature, …

▸ Web tracking: Clickstreams, …

▸ …


Page 5: Time Series Processing with Solr and Spark


WE’RE SURROUNDED BY TIME SERIES (Pt. 2)

▸ Oktoberfest: Visitor and beer consumption trend (chart annotation: “the singularity”)

Page 6: Time Series Processing with Solr and Spark


TIME SERIES: BASIC TERMS

▸ univariate time series
▸ multivariate time series
▸ multi-dimensional time series (time series tensor)
▸ time series set
▸ observation


Page 7: Time Series Processing with Solr and Spark


ILLUSTRATIVE OPERATIONS ON TIME SERIES

Time series => Time series: align, diff, downsampling, outlier

Time series => Scalar: min/max, avg/med, slope, std-dev


Page 8: Time Series Processing with Solr and Spark

OUR USE CASE

Page 9: Time Series Processing with Solr and Spark

Monitoring data analysis of a business-critical, worldwide distributed software system. Enable root cause analysis and anomaly detection.

> 1,000 nodes worldwide

> 10 processes per node

> 20 metrics per process (OS, JVM, App-spec.)

Measured every second.

= about 6.3 trillion observations p.a. (> 1,000 nodes × > 10 processes × > 20 metrics × 1 observation/s ≈ 200,000 observations/s, over ~31.5 million seconds per year)

Data retention: 5 yrs.

Page 10: Time Series Processing with Solr and Spark
Page 11: Time Series Processing with Solr and Spark


USE CASE: EXPLORING

Drill-down: host → process → measurements → counters (metrics)

Query time series metadata

Superimpose time series


Page 12: Time Series Processing with Solr and Spark


USE CASE: STATISTICS


Page 13: Time Series Processing with Solr and Spark


USE CASE: ANOMALY DETECTION

Featuring Twitter Anomaly Detection (https://github.com/twitter/AnomalyDetection) and Yahoo EGADS (https://github.com/yahoo/egads).


Page 14: Time Series Processing with Solr and Spark


USE CASE: SQL AND ZEPPELIN


Page 15: Time Series Processing with Solr and Spark

CHRONIX SPARK: https://github.com/ChronixDB/chronix.spark

Page 16: Time Series Processing with Solr and Spark
Page 17: Time Series Processing with Solr and Spark

http://www.datasciencecentral.com

Page 18: Time Series Processing with Solr and Spark
Page 19: Time Series Processing with Solr and Spark


AVAILABLE TIME SERIES DATABASES

https://github.com/qaware/big-data-landscape

Page 20: Time Series Processing with Solr and Spark

EASY-TO-USE BIG TIME SERIES DATA STORAGE & PROCESSING ON SPARK

Page 21: Time Series Processing with Solr and Spark


THE CHRONIX STACK (chronix.io)

Big time series database: scale-out, storage-efficient, interactive queries

No separate servers: Drop-in to existing Solr and Spark installations

Integrated into the relevant open source ecosystem


Core: Chronix Storage, Chronix Server, Chronix Spark, Chronix Format

Collection: Logstash, fluentd, collectd; ingestion bridge for Prometheus, KairosDB, OpenTSDB, InfluxDB, Graphite

Analytics Frontends: Grafana, Chronix Analytics, Zeppelin

Page 22: Time Series Processing with Solr and Spark


Distributed Data & Data Retrieval (per node)
‣ Data sharding
‣ Fast index-based queries
‣ Efficient storage format

Distributed Processing
‣ Heavy lifting distributed processing
‣ Efficient integration of Spark and Solr

Result Processing
‣ Post-processing on a smaller set of time series

icon credits to Nimal Raj (database), Arthur Shlain (console) and alvarobueno (takslist)


Page 23: Time Series Processing with Solr and Spark


TIME SERIES MODEL

Set of univariate multi-dimensional numeric time series

▸ set … because it’s more flexible and better to parallelise if operations can input and output multiple time series.

▸ univariate … because multivariate will introduce too much complexity (and we have our set to bundle multiple time series).

▸ multi-dimensional … because the ability to slice & dice in the set of time series is very convenient for a lot of use cases.

▸ numeric … because it’s the most common use case.

A single time series is identified by a combination of its non-temporal dimensional values (e.g. unit “mem usage” + host “aws42” + process “tomcat”)
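
To make this identity rule concrete, here is a minimal sketch (a hypothetical helper, not part of the Chronix API) that derives a stable identity key from the non-temporal dimension values:

import java.util.Map;
import java.util.TreeMap;

public final class TimeSeriesIdentity {

    // Build a stable identity string from the dimension map.
    // A TreeMap keeps the dimensions sorted, so the key is independent of insertion order.
    public static String identityOf(Map<String, String> dimensions) {
        return new TreeMap<>(dimensions).toString();
    }

    public static void main(String[] args) {
        Map<String, String> dims = new TreeMap<>();
        dims.put("metric", "mem usage");
        dims.put("host", "aws42");
        dims.put("process", "tomcat");
        // All observations sharing these dimension values belong to the same logical time series.
        System.out.println(identityOf(dims));
    }
}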


Page 24: Time Series Processing with Solr and Spark


CHRONIX SPARK API: ENTRY POINTS

ChronixRDD
‣ Represents a set of time series
‣ Distributed operations on sets of time series

ChronixSparkContext
‣ Creates ChronixRDDs
‣ Speaks with the Chronix Server (Solr)


Page 25: Time Series Processing with Solr and Spark


CHRONIX SPARK API: DATA MODEL

ChronixRDD (RDD of MetricTimeSeries)

+ toDataFrame() => MetricObservationDataFrame
+ toDataset() => Dataset<MetricTimeSeries>
+ toObservationsDataset() => Dataset<MetricObservation>
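
A minimal usage sketch of these conversions, assuming a SQLContext named sqlContext and a populated ChronixRDD named rdd; toDataFrame(sqlContext) appears verbatim on a later slide, while the exact parameters of toDataset() and toObservationsDataset() are an assumption:

// One row per observation, e.g. for Spark SQL
DataFrame observations = rdd.toDataFrame(sqlContext);
observations.printSchema();

// Typed Dataset API: one entry per time series / per observation
Dataset<MetricTimeSeries> tsDataset = rdd.toDataset(sqlContext);
Dataset<MetricObservation> obsDataset = rdd.toObservationsDataset(sqlContext);
System.out.println("time series in the set: " + tsDataset.count());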

Page 26: Time Series Processing with Solr and Spark


SPARK APIs FOR DATA PROCESSING

            RDD      DataFrame   Dataset
typed       yes      no          yes
optimized   medium   highly      highly
mature      yes      yes         medium
SQL         no       yes         no


Page 27: Time Series Processing with Solr and Spark


CHRONIX RDD

Statistical operations

the set characteristic: a JavaRDD of MetricTimeSeries

Filter the set (esp. by dimensions)
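
A hedged sketch of how the dimension filter and the statistical operations could be used together; min()/max()/mean() appear on a later slide, while the attribute("process") accessor on MetricTimeSeries is an assumption:

// Keep only the time series of one process (filter on a dimension) ...
JavaRDD<MetricTimeSeries> tomcatOnly = rdd.filter(ts ->
    "tomcat".equals(ts.attribute("process")));

// ... and compute aggregate statistics over the whole ChronixRDD.
double min  = rdd.min();
double max  = rdd.max();
double mean = rdd.mean();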


Page 28: Time Series Processing with Solr and Spark


METRICTIMESERIES DATA TYPE

access all timestamps

the multi-dimensionality: get/set dimensions (attributes)

access all observations as stream

access all numeric values
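
A small sketch exercising those accessors; attributes(), getTimestampsAsArray() and getValuesAsArray() are taken from the joinChunks() code later in this deck, the points() stream accessor is an assumption:

static void inspect(MetricTimeSeries ts) {
    // the multi-dimensionality: dimensions are stored as attributes
    System.out.println(ts.attributes());

    // access all timestamps and all numeric values as primitive arrays
    long[]   timestamps = ts.getTimestampsAsArray();
    double[] values     = ts.getValuesAsArray();
    System.out.println(timestamps.length + " observations, first value: " + values[0]);

    // access all observations as a stream (accessor name assumed)
    ts.points().forEach(System.out::println);
}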


Page 29: Time Series Processing with Solr and Spark
Page 30: Time Series Processing with Solr and Spark


//Create Chronix Spark context from a SparkContext / JavaSparkContext
ChronixSparkContext csc = new ChronixSparkContext(sc);

//Read data into ChronixRDD
SolrQuery query = new SolrQuery(
    "metric:\"java.lang:type=Memory/HeapMemoryUsage/used\"");
ChronixRDD rdd = csc.query(query,
    "localhost:9983",                  //ZooKeeper host
    "chronix",                         //Solr collection for Chronix
    new ChronixSolrCloudStorage());

//Calculate the overall min/max/mean of all time series in the RDD
double min = rdd.min();
double max = rdd.max();
double mean = rdd.mean();

DataFrame df = rdd.toDataFrame(sqlContext);
DataFrame res = df
    .select("time", "value", "process", "metric")
    .where("process='jenkins-jolokia'")
    .orderBy("time");
res.show();


Page 31: Time Series Processing with Solr and Spark

CHRONIX SPARK INTERNALS

Page 32: Time Series Processing with Solr and Spark


Distributed Data & Data Retrieval
‣ Data sharding (OK)
‣ Fast index-based queries (OK)
‣ Efficient storage format


Page 33: Time Series Processing with Solr and Spark


CHRONIX FORMAT: CHUNKING TIME SERIES

TIME SERIES (logical as well as physical chunk)
‣ start: TimeStamp
‣ end: TimeStamp
‣ dimensions: Map<String, String>
‣ observations: byte[]

Chunking:
‣ 1 logical time series = n physical time series (chunks)
‣ 1 chunk = fixed amount of observations
‣ 1 chunk = 1 Solr document
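
A back-of-the-envelope sketch of the chunking rule; the concrete chunk size of 10,000 observations is only an illustrative assumption:

// 1 chunk = fixed amount of observations, 1 chunk = 1 Solr document
static int numberOfChunks(int observations, int observationsPerChunk) {
    return (observations + observationsPerChunk - 1) / observationsPerChunk; // ceiling division
}

// e.g. one metric sampled every second for a day, 10,000 observations per chunk (assumed):
int chunks = numberOfChunks(86_400, 10_000); // = 9 physical chunks / Solr documents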


Page 34: Time Series Processing with Solr and Spark


CHRONIX FORMAT: ENCODING OF OBSERVATIONS

Binary encoding of all timestamp/value pairs (observations) with ProtoBuf, incl. binary compression. Delta encoding leads to more effective binary compression:

‣ … of time stamps (DDC, Date-Delta-Compaction):
  per chunk: timespan, number of observations
  periodically distributed time stamps (pts) = timespan / number of observations
  real time stamps (rts): if |pts(x) - rts(x)| < threshold then rts(x) = pts(x), else value_to_store = pts(x) - rts(x)

‣ … of values: diff
  value_to_store = value(x) - value(x-1)
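
A minimal sketch of the two delta encodings described above (thresholded Date-Delta-Compaction of timestamps and plain diffs of values); illustrative only, not the actual Chronix ProtoBuf format:

// Timestamp deltas against the periodically distributed time stamps (pts).
static long[] encodeTimestamps(long[] rts, long threshold) {
    long timespan = rts[rts.length - 1] - rts[0];
    long step = timespan / (rts.length - 1);           // pts: timespan / nbr. of observations
    long[] deltas = new long[rts.length];
    for (int i = 0; i < rts.length; i++) {
        long pts = rts[0] + i * step;
        long delta = pts - rts[i];
        // close enough to the expected timestamp: store 0 and reconstruct rts(x) as pts(x)
        deltas[i] = Math.abs(delta) < threshold ? 0 : delta;
    }
    return deltas;                                      // mostly zeros, so it compresses very well
}

// Value diffs: value_to_store = value(x) - value(x-1)
static double[] encodeValues(double[] values) {
    double[] diffs = new double[values.length];
    diffs[0] = values[0];
    for (int i = 1; i < values.length; i++) {
        diffs[i] = values[i] - values[i - 1];
    }
    return diffs;
}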


Page 35: Time Series Processing with Solr and Spark


CHRONIX FORMAT: TUNING CHUNK SIZE AND CODEC

GZIP + 128 kBytes

Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger: “Chronix: Efficient Storage and Query of Operational Time Series”, International Conference on Software Maintenance and Evolution 2016 (submitted)


(Chart: storage demand vs. access time per chunk size and codec)

Page 36: Time Series Processing with Solr and Spark


CHRONIX FORMAT: STORAGE EFFICIENCY BENCHMARK


Page 37: Time Series Processing with Solr and Spark


CHRONIX FORMAT: PERFORMANCE BENCHMARK

(Benchmark chart: runtime in seconds per query and number of queries)


Page 38: Time Series Processing with Solr and Spark


Distributed Processing
‣ Heavy lifting distributed processing
‣ Efficient integration of Spark and Solr


Page 39: Time Series Processing with Solr and Spark


SPARK AND SOLR BEST PRACTICES: ALIGN PARALLELISM

• Unit of parallelism in Spark: Partition
• Unit of parallelism in Solr: Shard
• 1 Spark Partition = 1 Solr Shard

(Diagram: SolrDocuments (chunks) within a Solr shard map to the TimeSeries of exactly one ChronixRDD partition)


Page 40: Time Series Processing with Solr and Spark


ALIGN THE PARALLELISM WITHIN CHRONIXRDD

public ChronixRDD queryChronixChunks(
    final SolrQuery query,
    final String zkHost,
    final String collection,
    final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage)
    throws SolrServerException, IOException {

  // first get a list of replicas to query for this collection
  List<String> shards = chronixStorage.getShardList(zkHost, collection);

  // parallelize the requests to the shards
  JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(
      (FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(
          new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);
  return new ChronixRDD(docs);
}

Figure out all Solr shards (using CloudSolrClient in the background)

Query each shard in parallel and convert SolrDocuments to MetricTimeSeries


Page 41: Time Series Processing with Solr and Spark


SPARK AND SOLR BEST PRACTICES: PUSHDOWN

SolrQuery query = new SolrQuery(
    "<Solr query containing filters and aggregations>");
ChronixRDD rdd = csc.query(query, …


Predicate pushdown
• Pre-filter time series based on their metadata (dimensions, start, end) with Solr.

Aggregation pushdown
• Perform pre-aggregations (min/max/avg/…) at ingestion time and store them as metadata.
• (to come) Perform aggregations at the Solr level at query time by enabling Solr to decode observations.
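
As a concrete illustration of predicate pushdown, the Solr query can already restrict the chunks by dimensions and by time range before anything reaches Spark. The field names follow the fl list of the export-handler code later in the deck (metric, host, start, end); the exact query syntax and the epoch-millisecond bounds are assumptions:

// Only the heap-usage metric of one host, restricted to chunks overlapping a time window.
SolrQuery query = new SolrQuery(
    "metric:\"java.lang:type=Memory/HeapMemoryUsage/used\" AND host:aws42"
    + " AND start:[* TO 1467331200000] AND end:[1467244800000 TO *]");

ChronixRDD rdd = csc.query(query,
    "localhost:9983",                 // ZooKeeper host
    "chronix",                        // Solr collection
    new ChronixSolrCloudStorage());   // only matching chunks are shipped to Spark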

Page 42: Time Series Processing with Solr and Spark


SPARK AND SOLR BEST PRACTICES: EFFICIENT DATA TRANSFER

Reduce volume: Pushdown & compression

Use efficient protocols: Low-overhead, bulk, stream

Avoid remote transfer: Place Spark tasks (each processes one partition) on the Solr node holding the corresponding shard (to come, by using SolrRDD).


(Data transfer pipeline: Solr export handler → CloudSolrStream (Solr / SolrJ) → bulk of JSON tuples → format decoder → ChronixRDD (Chronix Spark))

Page 43: Time Series Processing with Solr and Spark


private Stream<MetricTimeSeries> streamWithCloudSolrStream(String zkHost,
    String collection, String shardUrl, SolrQuery query,
    TimeSeriesConverter<MetricTimeSeries> converter) throws IOException {
  Map params = new HashMap();
  params.put("q", query.getQuery());
  params.put("sort", "id asc");
  params.put("shards", extractShardIdFromShardUrl(shardUrl));
  params.put("fl",
      Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +
      ", metric, host, measurement, process, ag, group");
  params.put("qt", "/export");
  params.put("distrib", false);

  CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);
  solrStream.open();
  SolrTupleStreamingService tupStream =
      new SolrTupleStreamingService(solrStream, converter);
  return StreamSupport.stream(
      Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);
}

Pin query to one shard

Use export request handler

Boilerplate code to stream response


Page 44: Time Series Processing with Solr and Spark

Time Series Databases should be first-class citizens.

Chronix leverages Solr and Spark to be storage-efficient and to allow interactive queries for big time series data.

Page 45: Time Series Processing with Solr and Spark

THANK YOU! QUESTIONS?

Mail: [email protected] Twitter: @adersberger

TWITTER.COM/QAWARE - SLIDESHARE.NET/QAWARE

Page 46: Time Series Processing with Solr and Spark

BONUS SLIDES

Page 47: Time Series Processing with Solr and Spark

PERFORMANCE

Page 48: Time Series Processing with Solr and Spark

codingvoding.tumblr.com

Page 49: Time Series Processing with Solr and Spark

PREMATURE OPTIMIZATION IS NOT EVIL IF YOU HANDLE BIG DATA

Josef Adersberger

Page 50: Time Series Processing with Solr and Spark

PERFORMANCE

USING A JAVA PROFILER WITH A LOCAL CLUSTER

Page 51: Time Series Processing with Solr and Spark

PERFORMANCE

HIGH-PERFORMANCE, LOW-OVERHEAD COLLECTIONS

Page 52: Time Series Processing with Solr and Spark

PERFORMANCE

830 MB -> 360 MB (-57%)

unveiled wrong Jackson handling inside of SolrClient

Page 53: Time Series Processing with Solr and Spark


THE SECRETS OF DISTRIBUTED PROCESSING PERFORMANCE

Rule 1: Be as close to the data as possible! (CPU cache > memory > local disk > network)

Rule 2: Reduce data volume as early as possible! (as long as you don’t sacrifice parallelization)

Rule 3: Parallelize as much as possible! (max = #cores * x)

Page 54: Time Series Processing with Solr and Spark

PERFORMANCE

THE RULES APPLIED

‣ Rule 1: Be as close to the data as possible!
  1. Solr caching
  2. Spark in-memory processing with activated RDD compression
  3. Binary protocol between Solr and Spark

‣ Rule 2: Reduce data volume as early as possible!
  ‣ Efficient storage format (Chronix Format)
  ‣ Predicate pushdown to Solr (query)
  ‣ Group-by & aggregation pushdown to Solr (faceting within a query)

‣ Rule 3: Parallelize as much as possible!
  ‣ Scale-out on data level with SolrCloud
  ‣ Scale-out on processing level with Spark

Page 55: Time Series Processing with Solr and Spark

APACHE SPARK 101

Page 56: Time Series Processing with Solr and Spark

CHRONIX SPARK WONDERLAND

ARCHITECTURE

Page 57: Time Series Processing with Solr and Spark

APACHE SPARK

SPARK TERMINOLOGY (1/2)

▸ RDD: Has transformations and actions. Hides data partitioning & distributed computation. References a set of partitions (“output partitions”), materialized or not, and has dependencies on other RDDs (“input partitions”). RDD operations are evaluated as late as possible (when an action is called). As long as it is not the root RDD, the partitions of an RDD are kept in memory, but they can also be persisted on request.

▸ Partitions: (Logical) chunks of data. Default unit and level of parallelism: inside a partition, everything is a sequential operation on records. Has to fit into memory. Can have different representations (in-memory, on disk, off-heap, …).
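
A tiny standalone example of these two terms: lazy transformations on a partitioned RDD that only execute once an action is called (plain Spark, independent of Chronix):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddBasics {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-basics").setMaster("local[4]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // 4 partitions = the unit and level of parallelism for this RDD
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8), 4);

            // transformations are lazy: nothing is computed yet
            JavaRDD<Integer> squares = numbers.map(x -> x * x);

            // the action triggers a job; each task processes exactly one partition
            long count = squares.filter(x -> x > 10).count();
            System.out.println(count + " squares > 10 in " + squares.getNumPartitions() + " partitions");
        }
    }
}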

Page 58: Time Series Processing with Solr and Spark

APACHE SPARK

SPARK TERMINOLOGY (2/2)

▸ Job: A computation job which is launched when an action is called on a RDD.

▸ Task: The atomic unit of work (function). Bound to exactly one partition.

▸ Stage: Set of Task pipelines which can be executed in parallel on one executor.

▸ Shuffling: If partitions need to be transferred between executors. Shuffle write = outbound partition transfer. Shuffle read = inbound partition transfer.

▸ DAG Scheduler: Computes DAG of stages from RDD DAG. Determines the preferred location for each task.

Page 59: Time Series Processing with Solr and Spark

THE COMPETITORS / ALTERNATIVES

CHRONIX RDD VS. SPARK-TS

▸ Spark-TS provides no specific time series storage; it uses the Spark persistence mechanisms instead. This leads to less efficient storage usage and fewer possibilities for performance optimizations via predicate pushdown.

▸ In contrast to Spark-TS, Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation.

▸ Chronix provides multi-dimensional time series as this is very useful for data warehousing and APM.

▸ Chronix has support for Datasets as this will be an important Spark API in the near future. But Chronix currently doesn’t support an IndexedRowMatrix for SparkML.

▸ Chronix is purely written in Java. There is no explicit support for Python and Scala yet.

▸ Chronix does not support a ZonedTime, as this would make it way more complicated.

Page 60: Time Series Processing with Solr and Spark

CHRONIX SPARK INTERNALS

Page 61: Time Series Processing with Solr and Spark


CHRONIXRDD: GET THE CHUNKS FROM SOLR

public ChronixRDD queryChronixChunks(
    final SolrQuery query,
    final String zkHost,
    final String collection,
    final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage)
    throws SolrServerException, IOException {

  // first get a list of replicas to query for this collection
  List<String> shards = chronixStorage.getShardList(zkHost, collection);

  // parallelize the requests to the shards
  JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(
      (FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(
          new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);
  return new ChronixRDD(docs);
}

Figure out all Solr shards (using CloudSolrClient in the background)

Query each shard in parallel and convert SolrDocuments to MetricTimeSeries

Page 62: Time Series Processing with Solr and Spark


BINARY PROTOCOL WITH STANDARD SOLR CLIENT

private Stream<MetricTimeSeries> streamWithHttpSolrClient(String shardUrl,
    SolrQuery query, TimeSeriesConverter<MetricTimeSeries> converter) {
  HttpSolrClient solrClient = getSingleNodeSolrClient(shardUrl);
  solrClient.setRequestWriter(new BinaryRequestWriter());
  query.set("distrib", false);
  SolrStreamingService<MetricTimeSeries> solrStreamingService =
      new SolrStreamingService<>(converter, query, solrClient, nrOfDocumentPerBatch);
  return StreamSupport.stream(
      Spliterators.spliteratorUnknownSize(solrStreamingService, Spliterator.SIZED), false);
}

Use HttpSolrClient pinned to one shard

Use binary (request) protocol

Boilerplate code to stream response

Page 63: Time Series Processing with Solr and Spark


private Stream<MetricTimeSeries> streamWithCloudSolrStream(String zkHost,
    String collection, String shardUrl, SolrQuery query,
    TimeSeriesConverter<MetricTimeSeries> converter) throws IOException {
  Map params = new HashMap();
  params.put("q", query.getQuery());
  params.put("sort", "id asc");
  params.put("shards", extractShardIdFromShardUrl(shardUrl));
  params.put("fl",
      Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +
      ", metric, host, measurement, process, ag, group");
  params.put("qt", "/export");
  params.put("distrib", false);

  CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);
  solrStream.open();
  SolrTupleStreamingService tupStream =
      new SolrTupleStreamingService(solrStream, converter);
  return StreamSupport.stream(
      Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);
}

EXPORT HANDLER PROTOCOL

Pin query to one shard

Use export request handler

Boilerplate code to stream response

Page 64: Time Series Processing with Solr and Spark


CHRONIXRDD: FROM CHUNKS TO TIME SERIES

public ChronixRDD joinChunks() {
  JavaPairRDD<MetricTimeSeriesKey, Iterable<MetricTimeSeries>> groupRdd
      = this.groupBy(MetricTimeSeriesKey::new);

  JavaPairRDD<MetricTimeSeriesKey, MetricTimeSeries> joinedRdd
      = groupRdd.mapValues((Function<Iterable<MetricTimeSeries>, MetricTimeSeries>) mtsIt -> {
        MetricTimeSeriesOrdering ordering = new MetricTimeSeriesOrdering();
        List<MetricTimeSeries> orderedChunks = ordering.immutableSortedCopy(mtsIt);
        MetricTimeSeries result = null;
        for (MetricTimeSeries mts : orderedChunks) {
          if (result == null) {
            result = new MetricTimeSeries
                .Builder(mts.getMetric())
                .attributes(mts.attributes()).build();
          }
          result.addAll(mts.getTimestampsAsArray(), mts.getValuesAsArray());
        }
        return result;
      });

  JavaRDD<MetricTimeSeries> resultJavaRdd =
      joinedRdd.map((Tuple2<MetricTimeSeriesKey, MetricTimeSeries> mtTuple) -> mtTuple._2);

  return new ChronixRDD(resultJavaRdd);
}

group chunks according to their identity

join chunks to logical time series