Very Large Scale Stream Processing inside Alibaba

longda@alibaba

Alibaba

Current

Next Future

Team Introduction

• Apache Storm PMC
• The first Storm team in China
• Storm 0.5.1/0.5.4/0.6.0/0.6.2/0.7.0/0.7.1
• JStorm 0.7.1/0.9.0/0.9.1/0.9.2/0.9.3/0.9.3.1/0.9.4/0.9.4.1/0.9.5/0.9.5.1/0.9.6/0.9.6.1/0.9.6.2/0.9.6.3/0.9.7/0.9.7.1/0.9.7.2/0.9.8/2.0.4/2.1.0
• Our job: do everything
  • Application development
  • JStorm platform evolution
  • JStorm/Storm technical support
  • Maintain all clusters

In Alibaba

• Everywhere
• 1,600 machines; 70K machines will be deployed
• More than 1,000 applications, 1,500 topologies
• 1.5 PB
• 2 trillion messages

Large Scale Application

• Tlog/eagleeye: 1000 billion messages, 700 TB of logs; monitors the logs of 200K machines
• Rds Monitor: 200 TB of logs
• CTU Security: 200 billion messages; monitors all trade/user actions, 500w
• DB Monitor: 200 billion messages, 500w
• BI Realtime Monitor: 200 billion messages, more than 2000 KPIs
• Alimama Anti Cheat: 100 billion messages
• Living Room: 11.11 Living Room, 12.12 Living Room, Spring Festival Living Room
• Others: all kinds of monitoring systems

Advanced Features

• User Side Functionality
• Stability Enhancement
• Performance Improvement

Stable

• Customer feedback
• Not a single incident in the Alimama cluster since it switched to JStorm

Advanced Feature – Improve Stability

• Improve Stability
  • Redesign Metric System
  • Backpressure
  • Resource Isolation
  • Nimbus HA
  • TopologyMaster
  • Redesign ZK usage
  • Modify OS settings in RPM

Redesign Metric System

• Key points:
  • Response time (RT) of every tuple stage, including the wait time between stages and the network cost
  • Avoid noise
• Pluggable
  • Provides an API to fetch all metrics
• Koala
  • Displays all metrics simply and directly
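The slide names per-stage response time as the key metric but does not say how it is computed. One possible reading, sketched below, is to timestamp a tuple at each stage boundary and take the differences; the stage names, the millisecond timestamps, and the function name are assumptions made up for this illustration, not JStorm's metric API.

```python
def stage_latencies(marks):
    """Return the time spent between each pair of consecutive marks.

    marks is an ordered list of (stage_name, timestamp_ms) pairs recorded
    as one tuple moves through the pipeline.
    """
    if len(marks) < 2:
        return {}
    return {
        f"{a}->{b}": tb - ta
        for (a, ta), (b, tb) in zip(marks, marks[1:])
    }


# Hypothetical stage names and timestamps for a single tuple.
marks = [
    ("emit", 0),
    ("send", 2),            # left the sender's queue 2 ms after emit
    ("receive", 7),         # arrived 5 ms later (network cost)
    ("execute_start", 10),  # waited 3 ms in the receiver's queue
    ("execute_end", 14),    # spent 4 ms being processed
]
latencies = stage_latencies(marks)
```

Because each interval is kept separately, queue wait time and network cost show up as their own entries instead of being folded into one end-to-end number.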

New UI

Backpressure

• The description in the Heron paper is too brief to use directly
• The design is complicated
• Works well on our online system, at 6 times the normal level
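The slide does not describe the backpressure design itself. As a rough illustration of the general idea only, here is a minimal sketch of one common scheme: a buffer with a high and a low watermark that raises a throttle flag for upstream producers. The class name, the thresholds, and the flag are assumptions for this example and are not taken from JStorm or Heron.

```python
from collections import deque


class WatermarkQueue:
    """Buffer that asks upstream to pause between two watermarks."""

    def __init__(self, high=8, low=2):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.high = high        # start throttling at this depth
        self.low = low          # stop throttling at or below this depth
        self.items = deque()
        self.throttled = False  # the signal a coordinator would relay upstream

    def put(self, item):
        self.items.append(item)
        if len(self.items) >= self.high:
            self.throttled = True

    def get(self):
        item = self.items.popleft()
        if self.throttled and len(self.items) <= self.low:
            self.throttled = False
        return item


queue = WatermarkQueue(high=4, low=1)
for i in range(4):
    queue.put(i)
paused_when_full = queue.throttled   # depth reached the high watermark

for _ in range(3):
    queue.get()
resumed_after_drain = not queue.throttled  # depth fell to the low watermark
```

Using two thresholds instead of one avoids flipping the flag on every tuple when the queue depth hovers around a single limit.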

Resource Isolation

• Cluster isolation, controlled through one unified portal: Koala
• Within one cluster:
  • Cgroups: CPU share + limit
  • User-defined scheduler: force a topology to run on specific nodes
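To make "share + limit" concrete, the sketch below computes values for the Linux cgroup v1 cpu controller: `cpu.shares` is a relative weight (the kernel default is 1024), while `cpu.cfs_quota_us` divided by `cpu.cfs_period_us` is a hard cap. The function name and the convention of 1024 shares per requested core are assumptions for illustration; the slide does not say how JStorm maps worker settings to cgroup values.

```python
def cpu_cgroup_settings(weight_cores, limit_cores, period_us=100_000):
    """Return cgroup v1 cpu controller values for one worker.

    weight_cores sets the relative share under contention (assumed
    convention: 1024 shares per core); limit_cores is the hard ceiling.
    """
    if weight_cores <= 0 or limit_cores <= 0:
        raise ValueError("core counts must be positive")
    return {
        "cpu.shares": int(1024 * weight_cores),
        "cpu.cfs_period_us": period_us,
        "cpu.cfs_quota_us": int(limit_cores * period_us),
    }


# A worker weighted as one core that may burst up to two cores.
settings = cpu_cgroup_settings(weight_cores=1, limit_cores=2)
```

The share only matters when CPUs are contended, so an idle machine lets the worker use up to its limit, and a busy one divides CPU time in proportion to the shares.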

Nimbus HA

• Nimbus HA
• Running for more than 20 months
• Stable

TopologyMaster

• The topology’s central controller; takes over some jobs from Nimbus
• Backpressure coordinator
• Metrics collector/calculator
• Heartbeat collector

Redesign ZK usage

• No dynamic data is stored on ZK, especially metrics and heartbeats
• ZK can’t support more than 400 Storm nodes
• ZK can support 2,000 JStorm nodes; currently in Alibaba, many JStorm ZK clusters support 800 nodes

RPM Setting

• Easy installation of JStorm
• Modifies:
  • Local ephemeral port range
  • Ulimit
  • Cron jobs
  • Environment variables

Advanced Features – From User Side

• User Side Functionality
  • User-Defined Scheduler
  • User-Defined Log
  • User-Defined Metrics
  • Graceful Shutdown
  • Dynamic Expand/Reload/Restart
  • Customized Memory Usage
  • Different Netty Policy
  • Classloader

User-Defined Scheduler

• Just use the API:
  • Customize every worker’s CPU/memory usage
  • Customize topology assignment
  • Assign topology by user
  • Bind several components (such as a spout and a bolt) into one worker
  • Bind upstream/downstream components into one worker
  • Force one component to run on specific machines
  • Force one component’s tasks to run on different machines
  • Force a topology to run on specific machines
  • Force use of the old assignment
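Two of the placement constraints listed for the user-defined scheduler, pinning a component to specific machines and keeping its tasks on different machines, can be sketched as a small assignment function. This is a simplified illustration under assumed names (`assign_tasks`, `allowed_hosts`, `spread`); it does not reproduce the JStorm scheduler API.

```python
from itertools import cycle


def assign_tasks(task_ids, hosts, allowed_hosts=None, spread=False):
    """Map each task of one component to a host.

    allowed_hosts restricts placement to specific machines; spread=True
    requires every task to land on a different machine.
    """
    candidates = [
        h for h in hosts if allowed_hosts is None or h in allowed_hosts
    ]
    if not candidates:
        raise ValueError("no host satisfies the constraint")
    if spread and len(task_ids) > len(candidates):
        raise ValueError("not enough hosts to keep every task apart")
    # Round-robin over the candidates; with spread=True the check above
    # guarantees that no host is used twice.
    return {task: host for task, host in zip(task_ids, cycle(candidates))}


hosts = ["node-a", "node-b", "node-c", "node-d"]
spread_out = assign_tasks([1, 2, 3], hosts, spread=True)
pinned = assign_tasks([4, 5, 6], hosts, allowed_hosts={"node-b"})
```

Failing fast when a constraint cannot be met is a design choice for this sketch; a real scheduler might instead fall back to a default assignment.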

User-Defined Log

• Switch to the user’s log configuration
• Switch between logback and log4j
• Redirect System.out to any file
• Add tags (clustername/hostname/topologyname/workerid/taskid)
• Dynamically change log settings:
  • Enable/disable debug, debug log sample rate
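The tagging and debug sampling ideas can be illustrated with Python's standard `logging` filters. This is only an analogy sketched under assumptions: JStorm itself uses logback or log4j, and the class names `TagFilter` and `DebugSampler` and the tag values are invented for this example.

```python
import logging
import random


class TagFilter(logging.Filter):
    """Attach fixed context tags to every record for the formatter to print."""

    def __init__(self, **tags):
        super().__init__()
        self.tags = tags

    def filter(self, record):
        for key, value in self.tags.items():
            setattr(record, key, value)
        return True


class DebugSampler(logging.Filter):
    """Keep only a fraction of DEBUG records; pass everything above DEBUG."""

    def __init__(self, rate, rng=None):
        super().__init__()
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate  # may be changed at runtime to adjust sampling
        self.rng = rng if rng is not None else random.Random()

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True
        return self.rng.random() < self.rate


logger = logging.getLogger("topology-demo")
logger.setLevel(logging.DEBUG)
logger.addFilter(TagFilter(topologyname="demo", workerid=6800, taskid=7))
sampler = DebugSampler(rate=0.1)
logger.addFilter(sampler)
```

Since the sampler reads `rate` on every record, assigning a new value to `sampler.rate` changes the debug volume without restarting anything, which mirrors the dynamic setting change the slide describes.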

User-Defined Metrics

• Using Java metrics
  • User-defined metrics
  • Web UI display
• Using Alimonitor
  • All metrics are sent to Alimonitor
  • User-defined alarms
  • History display
• Koala System (JStorm portal)
  • All metrics are sent to the Koala System
  • History display
  • User-defined alarms

Graceful shutdown

• Problems resolved:
  • No data loss during shutdown
  • All workers must be killed
  • ZK is left clean

Dynamic Expand/Reload/Restart

• Expand
  • Doesn’t kill current workers, doesn’t impact the current data flow
• Restart
  • Reset all configuration
  • Modify worker/component parallelism
• Reload
  • Reload binary
  • Reload configuration

Customized memory usage

• Customize worker memory: worker.memory.size
• Modify GC: worker.gc.childopts
• Use the user-defined scheduler API
• Queue mode
  • Capacity limited/unlimited

Advanced Netty Feature

• Sync/async mode
• Async mode blocking policy
• Async cache policy

Classloader

• Resolves class conflicts between the application and JStorm

• 6 servers (24 cores/98 GB)
• 18 spouts/18 bolts/18 ackers

[Figure: Throughput vs workers. Series: JStorm, Storm. X axis: workers (0 to 60); Y axis: poll tuples/10s (0 to 12,000,000). Data labels: 6243680, 6830500, 5595900, 5474180, 3379800; 9280598, 10818815, 9065965, 6819139, 5610201.]

Performance Improvement

1. Smart batch policy
2. Add one thread to deserialize tuples in every task
3. Remove the total send/receive stage
4. Separate send and receive operations in the spout
5. Fix several bugs that caused the CPU to spin idly
6. Reduce the performance impact of the metrics system
7. Tune the acker code
8. Tune GC
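The slide does not explain what makes the batch policy "smart". For orientation only, the sketch below shows a generic batching rule that flushes when a batch is full or when its oldest tuple has waited too long; the class name, the thresholds, and the injectable clock are assumptions for this example and should not be read as JStorm's actual policy.

```python
import time


class Batcher:
    """Collect tuples and flush on size or age, whichever comes first."""

    def __init__(self, flush, max_size=100, max_delay=0.01,
                 clock=time.monotonic):
        self.flush = flush          # callback that receives a list of tuples
        self.max_size = max_size    # flush once this many tuples are buffered
        self.max_delay = max_delay  # or once the oldest has waited this long (s)
        self.clock = clock
        self.buf = []
        self.first_at = None

    def add(self, tup):
        if not self.buf:
            self.first_at = self.clock()
        self.buf.append(tup)
        full = len(self.buf) >= self.max_size
        stale = self.clock() - self.first_at >= self.max_delay
        if full or stale:
            self.drain()

    def drain(self):
        """Send whatever is buffered; safe to call when empty."""
        if self.buf:
            self.flush(self.buf)
            self.buf = []


sent = []
batcher = Batcher(sent.append, max_size=3, max_delay=1.0)
for value in ["a", "b", "c"]:
    batcher.add(value)
```

In this sketch the age check only runs when a new tuple arrives, so a caller with bursty input would also need to call `drain()` from a timer to bound latency during quiet periods.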

Architecture

[Diagram: ZooKeeper; UI, Nimbus, and three supervisors; each supervisor runs workers, and each worker runs tasks.]

Merge into Storm

• Replace the Clojure core

Redesign our SQL Engine

• The current SQL engine is customized, not general-purpose

Where should Storm/JStorm go

1. A more powerful SQL engine
2. A more powerful high-level programming framework
   1. Easier to learn and to debug
   2. Provides higher throughput
3. A high-level scheduler
   1. I don’t prefer offline systems, like Hadoop/Spark/Yarn
   2. I prefer online systems: Elastic Online Scheduler/Docker/virtual machines
   3. More lightweight

Thanks!

Welcome to join us: QQ/WeChat: 32147704
