SCALABILITY AND STATE: A CRITICAL ASSESSMENT OF THROUGHPUT OBTAINABLE
ON BIG DATA STREAMING FRAMEWORKS
FOR APPLICATIONS WITH AND WITHOUT STATE INFORMATION
Shinhyung Yang, Yonguk Jeong, ChangWan Hong, Hyunje Jun
and Bernd Burgstaller
Department of Computer Science
Yonsei University
International Workshop on Autonomic Solutions for Parallel and
Distributed Data Stream Processing (Auto-DaSP 2017),
Santiago de Compostela, August 29, 2017.
Motivation
Characteristics of real-time stream processing:
sub-second latency
incoming events arrive at high velocity and high density
real-time data analysis on incoming streams
information perishes over time (e.g., GPS data)
Batch processing (MapReduce) is unable to meet the sub-second
latency requirements of stream analytics applications
Example: Urban traffic management
[Figure: urban traffic management pipeline. Components: GPS position information, user move history, cell tower statistics, statistics predictor, cell tower density aggregator, prediction aggregator. Annotations: 10M tuples/sec; population of Seoul: 10M (daytime).]
Contributions
1. Determined the maximum throughput obtainable from current streaming engines
Apache Storm, Apache Flink, Spark Streaming
2. Created and adapted streaming analysis benchmarks
Adapted: Yahoo streaming benchmark
simulation of an advertisement analytics pipeline
Created: trend detection benchmark
real-world streaming analysis identifying and predicting importance of real-world events
3. Dynamic Cloud profiling through the Kieker framework
4. Made production-level framework configurations available on GitHub (for reproducibility of results)
5. Compared the Cloud trend detector to a hand-tuned, single-node, lock-less shared-memory re-implementation of trend detection,
to check for a possible glass ceiling in streaming framework performance.
Benchmark 1: Yahoo streaming benchmark
Tests the performance of existing Big Data streaming engines:
Apache Storm, Apache Flink, and Apache Spark Streaming
An advertising analytics pipeline of streaming operations:
events arrive through Kafka
JSON format is deserialized
events are filtered, projected, and joined
windowed counts of events per campaign are stored in the Redis in-memory database
[Figure: benchmark pipeline running on Storm, Flink, or Spark Streaming: Apache Kafka (stream source) -> Deserialize JSON -> Filter -> Transform / Projection -> Join -> Time Window Aggregation -> Increment & Store -> Redis (in-memory database).]
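The pipeline stages listed above can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not the benchmark's code: the field names (`ad_id`, `event_type`, `event_time`), the `"view"` filter value, the 10-second window, and the use of a dictionary in place of Redis are all assumptions made for the example.

```python
import json
from collections import Counter

WINDOW_MS = 10_000  # assumed window length; the slides do not state one


def windowed_campaign_counts(raw_events, ad_to_campaign, wanted_type="view"):
    """Count events per (campaign, window start), one step per pipeline stage."""
    counts = Counter()
    for raw in raw_events:
        event = json.loads(raw)                          # deserialize JSON
        if event["event_type"] != wanted_type:           # filter
            continue
        ad_id, ts = event["ad_id"], event["event_time"]  # projection
        campaign = ad_to_campaign.get(ad_id)             # join (dict stands in for Redis)
        if campaign is None:
            continue
        window_start = ts - ts % WINDOW_MS               # assign to a time window
        counts[(campaign, window_start)] += 1            # increment & store
    return counts


# Hypothetical sample data: three "view" events and one "click" event.
sample_events = [
    json.dumps({"ad_id": "a1", "event_type": "view", "event_time": 1_000}),
    json.dumps({"ad_id": "a1", "event_type": "click", "event_time": 2_000}),
    json.dumps({"ad_id": "a2", "event_type": "view", "event_time": 9_999}),
    json.dumps({"ad_id": "a1", "event_type": "view", "event_time": 12_000}),
]
sample_join_table = {"a1": "c1", "a2": "c1"}
sample_counts = windowed_campaign_counts(sample_events, sample_join_table)
```

In a streaming engine each stage would be a separate actor with its own parallelism; the single loop here only shows what each stage contributes to the final per-campaign count.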
Experimental Setup
Cloud setup
from Yahoo’s publication [YH2016]
30 Cloud nodes are configured on Google Compute Engine
Each Cloud node is equipped with:
16 virtual CPUs (vCPUs),
i.e., 16 Intel hyperthreads
Intel Xeon @ 2.50 GHz
24 GB RAM
The total provided Cloud resources include:
480 vCPUs
720 GB RAM
Cloud Infrastructure vs. Application Nodes
[Figure: Cloud infrastructure. Components: Kafka producers (data generators), Kafka cluster, Zookeeper cluster, Redis, streaming engine.]
Cloud infrastructure setup
3 Zookeeper nodes
1 Redis in-memory database node
1 Kafka cluster (5 Kafka broker nodes)
10 Kafka producer nodes
19 infrastructure nodes, 11 application-specific nodes
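The resource totals and the node split quoted on these slides follow from simple arithmetic. The sketch below reproduces them; all numbers come from the slides, while the variable names are chosen for this example only.

```python
NODES_TOTAL = 30        # Cloud nodes on Google Compute Engine
VCPUS_PER_NODE = 16     # vCPUs (hyperthreads) per node
RAM_GB_PER_NODE = 24    # RAM per node in GB

# Infrastructure nodes as listed on the slide.
infrastructure = {
    "zookeeper": 3,
    "redis": 1,
    "kafka_brokers": 5,
    "kafka_producers": 10,
}

total_vcpus = NODES_TOTAL * VCPUS_PER_NODE
total_ram_gb = NODES_TOTAL * RAM_GB_PER_NODE
infrastructure_nodes = sum(infrastructure.values())
application_nodes = NODES_TOTAL - infrastructure_nodes
```

Only the 11 application-specific nodes run the streaming engine, so at most 11 × 16 = 176 of the 480 vCPUs are available to the benchmarked application itself.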
Benchmarking Cloud applications
Measuring CPU utilization in the Cloud
Kieker dynamic profiling framework
specialized in measuring the performance of Cloud systems
The Kieker agent and our sample-based profiler are deployed on all application-specific nodes
per-core, per-second CPU utilization of the nodes is sampled every 500 ms (to fulfill the per-second sampling rate)
Our sample-based profiler accumulates the sampling data from all nodes
[Figure: 11 application-specific streaming engine nodes; Kieker agent; sample-based profiler.]
Storm: Average Node vCPU Utilization
The average (AVG) graph shows the vCPU utilization of the Cloud nodes that run application actors.
Utilization is averaged across all vCPUs of a node.
Storm: Actor Instance Allocation
Evaluation of orchestration efficiency
We profiled each streaming engine and drew its actor allocation graph.
Spark Streaming is not included because of differences in its programming interface.
Each Cloud node is represented by a unique color; the same color is used for the same Cloud node in the graph and in the orchestration diagram.
All actor instances are included to provide the complete picture.
[Figure: Storm actor instance allocation. Actors: KafkaSpout, Deserialize, EventFilter, EventProjection, RedisJoin, Campaign Processor.]
Comparing Streaming Engines
Average node vCPU utilization across the three streaming engines
Flink and Spark Streaming under-utilize the Cloud nodes
[Figure: average node vCPU utilization; panels: Storm, Flink, Spark Streaming.]
Flink Actor Instance Allocation
Flink’s orchestration graph
Flink actors are confined to 5 nodes; the other 6 nodes are left idle.
One node (green) is over-allocated with actor instances.
Flink favors vertical over horizontal scaling, but does not balance the load across nodes.
[Figure: Flink actor instance allocation. Actors: KafkaConsumer, Deserialize, EventFilter, RedisJoin, Campaign Processor.]
Differences in Orchestration Strategies
[Figure: orchestration diagrams; panels: Flink’s orchestration, Storm’s orchestration.]
Orchestration details:
| Streaming engine | Flink | Storm |
| --- | --- | --- |
| Number of participating nodes | 5 | 10 |
| Throughput (tuples/sec) | 282,141 | 24,703 |
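To relate the throughput figures to the number of participating nodes, one can normalize per node. The sketch below does this with the numbers from the table; the per-node metric itself is an illustration added here, not a figure reported on the slides.

```python
# Values taken from the orchestration details table.
results = {
    "Flink": {"nodes": 5, "tuples_per_sec": 282_141},
    "Storm": {"nodes": 10, "tuples_per_sec": 24_703},
}

# Throughput divided by the number of nodes that actually hosted actors.
per_node = {
    engine: r["tuples_per_sec"] / r["nodes"] for engine, r in results.items()
}

# Ratio of the absolute throughputs.
flink_over_storm = (
    results["Flink"]["tuples_per_sec"] / results["Storm"]["tuples_per_sec"]
)

for engine, value in per_node.items():
    print(f"{engine}: {value:,.1f} tuples/sec per participating node")
print(f"Flink / Storm throughput ratio: {flink_over_storm:.1f}")
```

Flink reaches the higher throughput on half as many nodes, which is consistent with the slide's observation that it favors vertical over horizontal scaling.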
Remarks
Streaming engines employ different orchestration strategies
Users are given only high-level configuration options
Users can neither select the number of actor instances nor assign actor instances to nodes
Storm: CV of vCPU Utilization per Node
High CV: the vCPUs of a node are utilized to largely varying degrees.
Low CV: the vCPUs of a node are utilized to the same degree.
Ideal:
high average vCPU utilization
low CV
"all vCPUs are humming"
Comparing Streaming Engines
CV graphs of the three streaming engines
Storm shows low CV, whereas Spark Streaming’s CV values are highly scattered
[Figure: CV of node vCPU utilization; panels: Storm, Flink, Spark Streaming.]
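Assuming CV denotes the coefficient of variation (standard deviation divided by the mean), the per-node metric in these graphs can be computed as sketched below. The use of the population standard deviation and the sample utilization values are assumptions for illustration; the slides do not specify either.

```python
from statistics import mean, pstdev


def coefficient_of_variation(utilizations):
    """Standard deviation of the per-vCPU utilizations divided by their mean."""
    avg = mean(utilizations)
    if avg == 0:
        return 0.0  # an idle node has no variation to report
    return pstdev(utilizations) / avg


# Hypothetical 16-vCPU nodes: one evenly loaded, one with 4 busy and 12 idle vCPUs.
balanced_node = [0.80] * 16
skewed_node = [1.0] * 4 + [0.0] * 12

cv_balanced = coefficient_of_variation(balanced_node)
cv_skewed = coefficient_of_variation(skewed_node)
```

The balanced node corresponds to the "all vCPUs are humming" ideal (high average, CV near zero), whereas the skewed node has a low average and a high CV.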
Benchmark 2: Trend Detection
A popular streaming analysis used in social network services and
search engines:
discovering, measuring, and comparing changes in time series data from
online user interactions
Point-by-point Poisson model
models the probability of observing a particular count of some quantity when many
sources have individually low probabilities of contributing to the count
most effective for finding trending keywords from small sets of time series
data
example: keyword trending for a soccer match
Example data set
Wikipedia’s actual page traffic data collected over three months (150 GB,
67M tuples)
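A point-by-point Poisson test can be sketched as follows: given an expected count for a keyword, compute how unlikely the observed count is under a Poisson distribution and flag the keyword when that probability is small. The function names, the threshold `alpha`, and the way the expected count is obtained are assumptions for this example; the slides do not give the detector's exact decision rule.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_tail(observed, lam):
    """P(X >= observed) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(observed))


def is_trending(observed, expected, alpha=0.01):
    """Flag a keyword whose observed count is improbably high (hypothetical rule)."""
    return poisson_tail(observed, expected) < alpha


# Hypothetical counts: a keyword normally seen about 10 times per interval.
quiet_interval = is_trending(observed=11, expected=10.0)
busy_interval = is_trending(observed=30, expected=10.0)
```

With an expected count of 10, observing 11 occurrences is unremarkable, while 30 occurrences would be flagged.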
Cloud Trend Detector
Implemented in Java with the Storm API
Stateful versus stateless actors
stateful: a global data structure is required to maintain all state
stateless: the global data structure is removed from the Cloud application to avoid
expensive communication overhead
Re-designing the trend detector to become stateless:
introduces speculative trend detection
a parallel reduction algorithm is a natural fit for this purpose
[Figure: actors a, b, c and a’, b’, c’; code fragment illustrating state: int sum = 0; sum++;]
Parallel Reduction Algorithm
N-layer Cloud trend detector with parallel reduction
The Cloud trend detector is created dynamically at the beginning of the run-time
with a given number of layers.
Each trend detection node receives a partial stream and evaluates each
keyword’s trendiness.
Each aggregator node evaluates trendiness from the results
of its two preceding nodes.
[Figure: a Data Generator feeds Trend Detection nodes, whose results are combined by layers of Aggregator nodes.]
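The layered reduction described above can be sketched as a pairwise merge of partial results. In this sketch the partial result of a trend detection node is simply a keyword count table, and an aggregator adds two tables; both are simplifying assumptions, since the slides do not show what the nodes exchange.

```python
from collections import Counter


def detect(partial_stream):
    """Leaf node: count keyword occurrences in its share of the stream."""
    return Counter(partial_stream)


def aggregate(left, right):
    """Aggregator node: merge the results of its two preceding nodes."""
    return left + right


def parallel_reduce(partials):
    """Merge partial results pairwise, layer by layer, until one remains."""
    layer = list(partials)
    layers_used = 0
    while len(layer) > 1:
        next_layer = [
            aggregate(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)
        ]
        if len(layer) % 2 == 1:
            next_layer.append(layer[-1])  # an unpaired node passes through unchanged
        layer = next_layer
        layers_used += 1
    return layer[0], layers_used


# Hypothetical keyword stream split round-robin across four detection nodes.
stream = ["goal", "goal", "rain", "goal", "vote", "rain", "goal", "goal"]
partials = [detect(stream[i::4]) for i in range(4)]
merged, depth = parallel_reduce(partials)
```

No node needs access to a global data structure: each one works only on its inputs, which is what makes the design stateless. Four leaves need two aggregator layers; in general the depth grows logarithmically with the number of leaves.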
Single-node Trend Detector
Stateful Trend Detection
Implemented in C++ for a shared-memory multicore computer
[Figure: single-node trend detector. Data Generator 0 and Data Generator 1 each feed their own worker threads; the workers access a lock-free hashmap that holds, for each keyword of the trending list (keyword1 ... keywordn), its timestamps (t1 ... ti).]
Thread-to-core Allocation
Each thread is allocated to a single, dedicated core on a CPU
one data generator thread d is employed per CPU
the remaining cores are filled with worker threads w
each worker thread has a dedicated streaming queue to receive tuples from
a data generator
the worker threads receive tuples from the data generator thread pinned to
the same CPU
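The allocation rule above can be written down as a small planning function. This is a sketch under assumptions: cores are numbered contiguously per CPU, and the data generator takes the first core of each CPU; the slides state neither. The function only computes the plan and does not pin any threads.

```python
def plan_allocation(num_cpus, cores_per_cpu):
    """Map each core id to (role, cpu index, worker index or None)."""
    plan = {}
    for cpu in range(num_cpus):
        first_core = cpu * cores_per_cpu
        # One data generator thread per CPU (assumed to sit on the first core).
        plan[first_core] = ("datagenerator", cpu, None)
        # All remaining cores of this CPU host worker threads fed by that generator.
        for worker in range(1, cores_per_cpu):
            plan[first_core + worker] = ("worker", cpu, worker)
    return plan


# Machine from the results slide: 2 CPUs with 22 physical cores each.
plan = plan_allocation(num_cpus=2, cores_per_cpu=22)
generators = [core for core, (role, _, _) in plan.items() if role == "datagenerator"]
workers = [core for core, (role, _, _) in plan.items() if role == "worker"]
```

On the 2 × 22-core machine this yields 2 data generator threads and 42 worker threads, with every worker on the same CPU as the generator that feeds it.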
Lock-free SPSC Queue
Lock-free single-producer single-consumer (SPSC) queues are employed:
each worker thread has its own dedicated streaming queue
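The idea behind an SPSC queue is that the producer is the only writer of the tail index and the consumer is the only writer of the head index, so neither side needs a lock. The ring buffer below sketches only this index discipline and is exercised from a single thread; it is not the authors' C++ queue, and a real lock-free version would additionally need atomic loads and stores with suitable memory ordering. The class and method names are hypothetical.

```python
class SpscRingBuffer:
    """Bounded ring buffer with separate producer-owned and consumer-owned indices."""

    def __init__(self, capacity):
        # One slot always stays empty so that "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; modified only by the consumer
        self._tail = 0  # next slot to write; modified only by the producer

    def try_push(self, item):
        """Producer side: store the item, or return False if the queue is full."""
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = next_tail  # publish only after the slot has been filled
        return True

    def try_pop(self):
        """Consumer side: return the oldest item, or None if the queue is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item


# Single-threaded walk-through with a capacity of two tuples.
queue = SpscRingBuffer(capacity=2)
pushed = [queue.try_push(t) for t in ("tuple-a", "tuple-b", "tuple-c")]
popped = [queue.try_pop() for _ in range(3)]
```

Because `None` signals an empty queue here, the sketch cannot carry `None` as a payload; a production queue would use a separate status value.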
Lock-free Hashmap
Lock contention is removed by employing a lock-free hashmap
Correctness is guaranteed by storing all timestamps
Timebucket Evaluation
All timestamps of received keywords are stored.
Trendiness of a keyword is evaluated periodically.
[Figure: single-node trend detector with time-bucket evaluation. The timestamps t1 ... ti stored for a keyword are grouped into time-bucket1.]
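Storing every timestamp and evaluating periodically can be sketched as follows: arrivals are appended per keyword, and at evaluation time the timestamps are grouped into fixed-length buckets. The class name, the bucket length, and the plain dictionary (standing in for the lock-free hashmap) are assumptions for this example.

```python
from collections import defaultdict


class TimestampStore:
    """Keeps all arrival timestamps per keyword and groups them into time buckets."""

    def __init__(self, bucket_seconds):
        self.bucket_seconds = bucket_seconds
        self.timestamps = defaultdict(list)  # keyword -> list of arrival times

    def record(self, keyword, timestamp):
        """Called for each received keyword: store its timestamp."""
        self.timestamps[keyword].append(timestamp)

    def bucket_counts(self, keyword):
        """Periodic evaluation step: count stored timestamps per time bucket."""
        counts = defaultdict(int)
        for ts in self.timestamps.get(keyword, []):
            counts[int(ts // self.bucket_seconds)] += 1
        return dict(counts)


# Hypothetical arrivals (in seconds) of one keyword, evaluated in 60-second buckets.
store = TimestampStore(bucket_seconds=60)
for arrival in (5, 30, 61, 119, 125):
    store.record("goal", arrival)
goal_buckets = store.bucket_counts("goal")
```

The per-bucket counts are the time series on which a trendiness test can then operate.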
Experimental Results: Single-node Trend Detector
Single-node trend detector
2 Intel Xeon E5-2699 v4 CPUs (22 physical cores per CPU)
512 GB RAM
Achieved throughput: 3,217,432 tuples/s
Cloud Trend Detector Orchestration
[Figure: Cloud trend detector orchestration. Actors: KafkaSpout, Deserialize, TrendDetection, Aggregator (three shown), SinkNode.]
Utilization & Throughput
Cloud trend detector shows under-utilized Cloud nodes
Comparison of Cloud & single-node trend detectors
| Trend detection type | Cloud | Single-node |
| --- | --- | --- |
| Participating node count | 30 | 1 |
| Throughput (tuples/s) | 72,499 | 3,217,432 |
| Implementation time | 2 | 3 |
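The gap between the two detectors can be quantified from the table. The sketch below computes the ratio of the absolute throughputs and, as an additional illustration not reported on the slides, the ratio after normalizing the Cloud figure per participating node.

```python
import math

# Values taken from the comparison table.
cloud = {"nodes": 30, "tuples_per_sec": 72_499}
single_node = {"nodes": 1, "tuples_per_sec": 3_217_432}

# Ratio of the absolute throughputs.
speedup = single_node["tuples_per_sec"] / cloud["tuples_per_sec"]

# Ratio against the Cloud detector's throughput per participating node.
cloud_per_node = cloud["tuples_per_sec"] / cloud["nodes"]
per_node_speedup = single_node["tuples_per_sec"] / cloud_per_node

print(f"absolute speedup: {speedup:.1f}x (10^{math.log10(speedup):.2f})")
print(f"per-node speedup: {per_node_speedup:.0f}x (10^{math.log10(per_node_speedup):.2f})")
```

The single-node detector is roughly 44 times faster in absolute terms, and the per-node gap is larger still because the Cloud figure is spread over 30 nodes.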
Conclusion
Big Data streaming platforms exhibit:
Low throughput
Disadvantageous orchestration decisions:
over-subscribed nodes (Flink), under-utilized nodes (all)
inconsistent vertical scaling (Flink), inefficient horizontal scaling (Storm)
Our stateful lock-less single-node trend detector features:
vertical scaling on a shared-memory multicore computer
it outperformed its Cloud-based counterpart by more than an order of magnitude in throughput (3,217,432 vs. 72,499 tuples/s)
Envisioned future work:
Determine and resolve main bottlenecks of streaming platforms
Orchestration? Scaling? Communication latencies? JVM-induced overhead?
Attempt efficient vertical scaling for Cloud applications (inspired by Flink’s orchestration).
Orchestration of streaming applications for the Cloud
Acknowledgements
Research supported by the Next-Generation Information Computing
Development Program through the National Research Foundation of
Korea (NRF), funded by the Ministry of Science, ICT & Future Planning
under grant NRF2015M3C4A7065522.
Thank you…