jstorm introduction-0.9.6

Company

LOGO

An Introduction of JStorm

LongdaFeng([email protected])

Longda Feng

Alibaba

Agenda

Question and Answer.

Basic Concept & Scenarios

Background

JStorm vs Storm

Why start JStorm?

Who are we?

JStorm Team was among one of the earliest that uses Storm in China. Storm 0.5.1/0.5.4/0.6.0/0.6.2/0.7.0/0.7.1

JStorm 0.7.1/0.9.0/0.9.1/0.9.2/0.9.3/…

Our Duties Application Development

JStorm System Development

JStorm System Operation

Longda Feng

Alibaba

Who are Using JStorm

Many small Chinese companies are using

JStorm

Longda Feng

Alibaba

How Big?

More than 3000 servers

More than 3 trillion messages per day

Longda Feng

Alibaba

What is JStorm?

JStorm is a distributed programming

framework

Similar to Hadoop MapReduce but designed

for real-time/in-memory scenarios

Users can build powerful distributed

applications from very simple APIs

Longda Feng

Alibaba

What is JStorm?

Redesigned Storm in Java.

Proved stable running in huge clusters.

Much faster

Much more powerful

Longda Feng

Alibaba

Basic Conception

Pipe-lined data processing

Longda Feng

Alibaba

Advantage 1

Easy learning:

Simple Building Blocks: Topology/Spout/Bolt

APIs

Out of Box RPC/Fault-tolerance/Real-time

Data Grouping & Combining

Longda Feng

Alibaba

Advantage 2

Excellent Scalability

Horizontally Scalable

DAG-based

Adjustable parallelism of each component

Longda Feng

Alibaba

Stable

Guarantees Fault-Tolerance

No Single Point of Failure

• Nimbus HA

• Any Supervisor can be shutdown

New worker will be spawned and replace the

failed one automatically

Longda Feng

Alibaba

Accuracy

Acking framework guarantees no lost of

data

Transaction framework guarantees data

accuracy.

Longda Feng

Alibaba

Scenarios

Stateless Computation

All data come from Tuple

Use Cases：

Log Analysis

Pipe-lined System

Message converter

Statistical Analysis

Real-time Recommendation Algorithm

Longda Feng

Alibaba

Longda Feng

Alibaba

Why start JStorm

Storm community is not as active as we’ve expected

Tailored for enterprise environment

Fixed critical bugs in Storm

Provided professional technical support, improved app development pace.

Reduced operational cost.

How Many Versions?

https://github.com/alibaba/JStorm/releases 0.9.6(2014/9/22)

0.9.5.1(2014/9/14)

0.9.5 (2014/8/27)

0.9.4.1 (2014/8/15)

0.9.4(2014/7/18)

0.9.3.1 (2014/5/31)

0.9.3 (2014/5/10)

0.9.2 (2014/4/8)

0.9.1(2014/1/24)

0.9.0(2013/12/30)

0.7.1(2013/4/28)Longda Feng

Alibaba

JStorm is a superset of Storm

The program run in Storm can run in

JStorm without changing code

Longda Feng

Alibaba

More stable (1) -- nimbus HA

Nimbus HA

Dual-Nimbus HA

Longda Feng

Alibaba

More stable (2) -- RPC

Netty supports 2 RPC modes

Async

Sync

• Sending speed keeps up with the receiving speed,

therefore the data flow is more stable.

Longda Feng

Alibaba

More stable(3) – resource isolation

Malicious Worker won’t mess up with

others

Supported CPU Isolation with cgroups

Supported Memory Isolation

Resources quota can be enforced on each

group (before 0.9.5)

Longda Feng

Alibaba

More stable(4) -- Monitor

Monitor every component in your

Topology

Many more metrics(70+) than storm

Supported user-defined metrics

Supported user-defined alerts

Longda Feng

Alibaba

More stable (5) – CPU usage

Better utilizing CPU resource

Improved disruptor implementation

• Drop CPU usage from 300% to 10% when

processing queue is full

Avoid CPU spin-waiting

• Relocating nextTuple/ack/fail work to a different

thread

Longda Feng

Alibaba

More stable(6) -- more catch

Add try-catch in any place.

Nimbus/supervisor main thread

Spout/bolt initialization/cleanup

All IO operation, serialization/deserialization

All ZK operation

Longda Feng

Alibaba

More stable(7) -- ZK

Reduced unnecessary ZK usage：

Removed useless watcher

Increased ZK heartbeat frequency

Detect failed worker without a full scan of the

entire ZK directory

Longda Feng

Alibaba

More stable（8） -- other

Improved GC Tuning.

Guaranteed that all workers killed after kill

command is issued

Guaranteed single supervisor/nimbus per

instance

Avoid excessive use of local ports by

Netty client

。。。

Longda Feng

Alibaba

More powerful scheduler

Balancing Tasks with regard of ：

CPU

Memory

Net

Longda Feng

Alibaba

CPU assignment

By default assign each worker a single

CPU slot

Application can be configured to utilize

more slots

Why：

Some task creates extra threads to do other

things in Alimama, one CPU slot doesn’t meet

requirement

Longda Feng

Alibaba

Memory Usage

Default worker memory is 2G

Application can be configured to utilize

more memory slots

Why:

In Alipay Mdrill application, Solr bolt will apply

much more memory

Longda Feng

Alibaba

Smarter Balancing

With JStorm Scheduler:

Tasks that exchange data heavily tend to be

assigned to the same worker to avoid

networking cost.

Longda Feng

Alibaba

User Defined Scheduler

User define task run one designated

worker

User can setting how many CPU slot /memory

slot will be used

Why：

In Taobao TAE project, some bolts want to

run in user defined-nodes

Longda Feng

Alibaba

Task on Different Node

Task of one component can be scheduled

to run on different nodes

Why：

In ALIPAY Mdrill, Solr bolt must run different

node

Longda Feng

Alibaba

Task on Single Node

All tasks can be scheduled to run on a

single node.

Why:

In Taobao TLog, there are many small jobs, in

order to reduce network cost, all task of one

job must run on single node.

Longda Feng

Alibaba

Old Assignment

“Last Assignment Policy”

By default , a task will run on the machine it

runs previous time

Why：

In Alibaba CDO, When restart one application,

user wanted to reuse old workers

Longda Feng

Alibaba

Pluginable

Be able to run on:

Hadoop yarn(more stable than storm)

Alibaba Apsara Clould System

Alibaba Elastic Resource Pool

Longda Feng

Alibaba

Classloader

Resolved application jar-confliction with

JStorm

Longda Feng

Alibaba

More convenient UI

More useful stats collected and displayed.

Browse Worker Log in UI

Longda Feng

Alibaba

Support libjar

Don’t need assembly all dependency jars

into one jar

Submit libjar with libjar parameter

Support worker.classpath

Longda Feng

Alibaba

Faster

6 Servers (24core/98G)

18 Spout/18 Bolt/18 Acker

Longda Feng

Alibaba

9280598

10818815

9065965

6819139

5610201

62436806830500

5595900 5474180

3379800

0

2000000

4000000

6000000

8000000

10000000

12000000

0 10 20 30 40 50 60

pollt

uple

s/1

0s

workers

Throughput vs workers

jstorm

storm

JStorm 41W/S Sending Speed

Longda Feng

Alibaba

Storm 41W/S Sending Speed

Longda Feng

Alibaba

Why Faster

Reduce memory-copying by zeroMq

Dedicated Deserializing Thread

Better Tuned Sampling Logic

Better Tuned Acking Framework

Better Tuned GC

Longda Feng

Alibaba

Other Improvement

More than 100 improvements

https://github.com/alibaba/JStorm/blob/master/history.md

Fixed assign topology competition

Reset rebalance/reassigned worker timeout as 4 minutes

Graceful worker shutdown

Improvement on thrift server

Avoid mistakenly killing of worker while rebalancing jobs.

。。。。

Longda Feng

Alibaba

https://github.com/alibaba/jstorm/blob/master/history.md

More document

https://github.com/alibaba/JStorm/wiki

Google-group:[email protected]

Wangwang：JStorm

QQ：228374502

Laiwang: JStorm

Longda Feng

Alibaba

https://github.com/alibaba/jstorm/wiki

Join us

Welcome to Join us

[email protected]

Longda Feng

Alibaba

Company

LOGO

纪君祥（Longda Feng）

jstorm introduction-0.9.6

Internet