ron crocker - evaluating streaming framework performance for a large-scale aggregation pipeline

47
Confidential ©2008-15 New Relic, Inc. All rights reserved. E VALUATING S TREAMING F RAMEWORK P ERFORMANCE FOR A L ARGE -S CALE A GGREGATION P IPELINE R ON C ROCKER ([email protected]) P RINCIPAL E NGINEER & A RCHITECT I NGEST P IPELINE 1

Upload: flink-forward

Post on 08-Jan-2017

116 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

EVALUATING STREAMING FRAMEWORK PERFORMANCE FOR A LARGE-SCALE

AGGREGATION PIPELINE

RON CROCKER ([email protected])

PRINCIPAL ENGINEER & ARCHITECT INGEST PIPELINE

1

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

4

EVERY MINUTE

requestsacceptsover 16M

storesover

analyticevents2M

aggregatesover 800M metrics

3Bqueriesover

datapoints

differentservices

containsover 200

maintained/built by 25+ engineering

teams

2.5 morethan

SSDstorage

PETABYTES

Thanks for the pic! https://www.flickr.com/photos/stephenyeargin/7466608166

�����

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

9

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

▪ Double-click to edit▪

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Goals for evaluating streaming systems

• Understand performance characteristics

• Understand operations characteristics

11

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

How New Relic works…

… the cartoon version

12

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

13

A1 An instance of your application running on a host

A2 Another instance of your application running on another host

An More instances of your application running on more hosts

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

14

A1

A2

An

New Relic Agent reports data to New Relic

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

15

A1

A2

An

▪ Agent Token (≈ account ID, agent ID)▪ Duration (time-period covered)▪ Timeslices: Each timeslice contains▪ Metric name▪ Metric stats▪ Count, total time, exclusive time, min,

max, sum of squares

HTTP post to <something>.newrelic.com

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

16

▪ ▪

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

TimesliceResolver

MinuteAggregator

MinuteWriter

HourAggregator

HourWriter

aggr

egat

ed_m

inut

e_ti

mesl

ices

_dat

a

reso

lved

_tim

esli

ce_d

ata

aggr

egat

ed_h

ourl

y_ti

mesl

ices

_dat

a

raw_

time

slic

e_da

ta

HTTP termination

OtherConsumers

17

A1

A1

▪ Agent Token (≈ account ID, agent ID)▪ Duration (time-period covered)▪ Timeslices: Each timeslice contains▪ Metric name▪ Metric stats▪ Count, total time, exclusive time, min,

max, sum of squares

▪ Account ID ▪ Agent ID ▪ Start time ▪ Duration (time-period covered)▪ Timeslices: Each timeslice contains▪ Metric name▪ Metric stats▪ Count, total time, exclusive time, min,

max, sum of squares

▪ Account ID▪ Agent ID▪ Application Agent IDs ▪ Start time▪ Duration (time-period covered)▪ Timeslices: Each timeslice contains▪ Metric ID ▪ Metric stats▪ Count, total time, exclusive time, min,

max, sum of squares

▪ Account ID▪ Agent ID ▪ Timeslices: Each timeslice contains▪ Metric ID▪ Start time ▪ Duration (time-period covered) ▪ Metric stats▪ Count, total time, exclusive time,

min, max, sum of squares

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

The Experiment

18

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

19

TimesliceResolver

MinuteAggregator

MinuteWriter

HourAggregator

HourWriter

aggr

egat

ed_m

inut

e_ti

mesl

ices

_dat

a

reso

lved

_tim

esli

ce_d

ata

aggr

egat

ed_h

ourl

y_ti

mesl

ices

_dat

a

raw_

time

slic

e_da

ta

HTTP termination

OtherConsumers

Why Minute Aggregator?

▪No external dependencies ▪ Performance comparisons solely focused on processing

▪Repeatable ▪ We can compare across technologies without needing to normalize

▪ Important to our business ▪ ProvIDes aggregation across instances of your application▪ We could have benchmarked something else, like Yahoo benchmark or

word count, but would it have mattered?20

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

21

TimesliceResolver

MinuteAggregator

MinuteWriter

HourAggregator

HourWriter

aggr

egat

ed_m

inut

e_ti

mesl

ices

_dat

a

reso

lved

_tim

esli

ce_d

ata

aggr

egat

ed_h

ourl

y_ti

mesl

ices

_dat

a

raw_

time

slic

e_da

ta

HTTP termination

OtherConsumers

What about Hour Aggregator?

▪Similar to Minute Aggregator ▪ No external dependencies, Repeatable, Important to the business

▪Needs to run for several hours to understand performance ▪ … and I'm that patient

▪Extra credit: Integrate into stream implementations

22

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Goals for evaluating streaming systems

• Understand performance characteristics • Performance at different arrival rates:• 100%• 6000%• To infinity and beyond

• Understand operations characteristics • No explicit goal

23

Evaluation Framework

24

Datacenter

StagingKafka

AWS

ExperimentKafka

VPC

Baseline

Flink

Spark

Load driver

AWS Configurations

▪Kafka + ZK ▪ 3 i2.8xlarge hosts▪Baseline ▪ 3 m4.4xlarge hosts▪Flink ▪ 4 m4.4xlarge hosts▪Spark ▪ EMR - 1 master , 3 workers, all m4.4xlarge

25

AWS Configuration i2.8xlarge m4.4xlarge

Cores 32 16

RAM 244GB 64GB

Network Bandwidth 10Gbps 2Gbps

Experimental Kafka system

▪Kafka 0.8.2.2 ▪ NR fork, includes back ports of some 0.9 features

▪# partitions: 16 ▪ It's possible that this is too few partitions for the Baseline system

26

Load Driver

▪Generates simple synthetic load based on real traffic ▪ Real traffic = output of Timeslice Resolver ▪ Load generated based on repeated messages

▪Synthesizing interesting load is challenging: ▪ Un-bundle timeslices▪ Generate re-bundled with new IDs - Agent, Account and/or Metrics▪ Repeat as necessary to get to load point

27

Kafka

Baseline system - Our incumbent Minute Aggregator

28

▪ Consume Aggregate Agent

Aggregate Applications

▪ Consume Aggregate Agent

Aggregate Applications

Produce

Construct Minute

Bundles

▪Kafka ▪

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

29

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

30

Distributions are not friendly…

31

Average # timeslices: 279 Geometric Mean # timeslices: 64

Median # timeslices: 44

Long tail…

Flink configuration

Job Manager

Task Manager(16 slots)

Task Manager(16 slots)

Task Manager(16 slots)

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

33

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

34

AWS EMR

Spark configuration

Master

Slave Slave Slave

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

36

But the Spark Streaming solution generates WRONG results

▪… because there is no Event Time windowing

▪… leading to me abandoning Spark Streaming

37

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Results

38

Results

39

Technology 100 % 500 % 4000 % 6000 % more…

Baseline

Flink

Spark

X Flat throughput with Kafka lag

X Flat throughput without Kafka lag

X Wrong answers…

Opportunities to improve the experiment▪MORE BANDWIDTH

▪ I don't know the limit of the Flink implementation

▪Key space domain expansion [All] ▪ Scaling in rate domain only, with the same set of keys

▪ This is too easy on the key-based systems [Flink, Spark] ▪ This may be hard on the baseline system as well

▪ Inclusion of Database sinks [Flink, Spark] ▪ Kafka sinks are still needed for downstream functions

40

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Thank you

41

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Extra credit

42

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

43

TimesliceResolver

MinuteAggregator

MinuteWriter

HourAggregator

HourWriter

aggr

egat

ed_m

inut

e_ti

mesl

ices

_dat

a

reso

lved

_tim

esli

ce_d

ata

aggr

egat

ed_h

ourl

y_ti

mesl

ices

_dat

a

raw_

time

slic

e_da

ta

HTTP termination

OtherConsumers

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

44

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

45

Confidential ©2008-15 New Relic, Inc. All rights reserved.  

Thank you

46