jstorm introduction-0.9.6
DESCRIPTION
JStorm introductionTRANSCRIPT
Longda Feng
Alibaba
Agenda
Question and Answer.
Basic Concept & Scenarios
Background
JStorm vs Storm
Why start JStorm?
Who are we?
JStorm Team was among one of the earliest that uses Storm in China. Storm 0.5.1/0.5.4/0.6.0/0.6.2/0.7.0/0.7.1
JStorm 0.7.1/0.9.0/0.9.1/0.9.2/0.9.3/…
Our Duties Application Development
JStorm System Development
JStorm System Operation
Longda Feng
Alibaba
Who are Using JStorm
Many small Chinese companies are using
JStorm
Longda Feng
Alibaba
How Big?
More than 3000 servers
More than 3 trillion messages per day
Longda Feng
Alibaba
What is JStorm?
JStorm is a distributed programming
framework
Similar to Hadoop MapReduce but designed
for real-time/in-memory scenarios
Users can build powerful distributed
applications from very simple APIs
Longda Feng
Alibaba
What is JStorm?
Redesigned Storm in Java.
Proved stable running in huge clusters.
Much faster
Much more powerful
Longda Feng
Alibaba
Basic Conception
Pipe-lined data processing
Longda Feng
Alibaba
Advantage 1
Easy learning:
Simple Building Blocks: Topology/Spout/Bolt
APIs
Out of Box RPC/Fault-tolerance/Real-time
Data Grouping & Combining
Longda Feng
Alibaba
Advantage 2
Excellent Scalability
Horizontally Scalable
DAG-based
Adjustable parallelism of each component
Longda Feng
Alibaba
Stable
Guarantees Fault-Tolerance
No Single Point of Failure
• Nimbus HA
• Any Supervisor can be shutdown
New worker will be spawned and replace the
failed one automatically
Longda Feng
Alibaba
Accuracy
Acking framework guarantees no lost of
data
Transaction framework guarantees data
accuracy.
Longda Feng
Alibaba
Scenarios
Stateless Computation
All data come from Tuple
Use Cases:
Log Analysis
Pipe-lined System
Message converter
Statistical Analysis
Real-time Recommendation Algorithm
Longda Feng
Alibaba
Longda Feng
Alibaba
Why start JStorm
Storm community is not as active as we’ve expected
Tailored for enterprise environment
Fixed critical bugs in Storm
Provided professional technical support, improved app development pace.
Reduced operational cost.
How Many Versions?
https://github.com/alibaba/JStorm/releases 0.9.6(2014/9/22)
0.9.5.1(2014/9/14)
0.9.5 (2014/8/27)
0.9.4.1 (2014/8/15)
0.9.4(2014/7/18)
0.9.3.1 (2014/5/31)
0.9.3 (2014/5/10)
0.9.2 (2014/4/8)
0.9.1(2014/1/24)
0.9.0(2013/12/30)
0.7.1(2013/4/28)Longda Feng
Alibaba
JStorm is a superset of Storm
The program run in Storm can run in
JStorm without changing code
Longda Feng
Alibaba
More stable (1) -- nimbus HA
Nimbus HA
Dual-Nimbus HA
Longda Feng
Alibaba
More stable (2) -- RPC
Netty supports 2 RPC modes
Async
Sync
• Sending speed keeps up with the receiving speed,
therefore the data flow is more stable.
Longda Feng
Alibaba
More stable(3) – resource isolation
Malicious Worker won’t mess up with
others
Supported CPU Isolation with cgroups
Supported Memory Isolation
Resources quota can be enforced on each
group (before 0.9.5)
Longda Feng
Alibaba
More stable(4) -- Monitor
Monitor every component in your
Topology
Many more metrics(70+) than storm
Supported user-defined metrics
Supported user-defined alerts
Longda Feng
Alibaba
More stable (5) – CPU usage
Better utilizing CPU resource
Improved disruptor implementation
• Drop CPU usage from 300% to 10% when
processing queue is full
Avoid CPU spin-waiting
• Relocating nextTuple/ack/fail work to a different
thread
Longda Feng
Alibaba
More stable(6) -- more catch
Add try-catch in any place.
Nimbus/supervisor main thread
Spout/bolt initialization/cleanup
All IO operation, serialization/deserialization
All ZK operation
Longda Feng
Alibaba
More stable(7) -- ZK
Reduced unnecessary ZK usage:
Removed useless watcher
Increased ZK heartbeat frequency
Detect failed worker without a full scan of the
entire ZK directory
Longda Feng
Alibaba
More stable(8) -- other
Improved GC Tuning.
Guaranteed that all workers killed after kill
command is issued
Guaranteed single supervisor/nimbus per
instance
Avoid excessive use of local ports by
Netty client
。。。
Longda Feng
Alibaba
More powerful scheduler
Balancing Tasks with regard of :
CPU
Memory
Net
Longda Feng
Alibaba
CPU assignment
By default assign each worker a single
CPU slot
Application can be configured to utilize
more slots
Why:
Some task creates extra threads to do other
things in Alimama, one CPU slot doesn’t meet
requirement
Longda Feng
Alibaba
Memory Usage
Default worker memory is 2G
Application can be configured to utilize
more memory slots
Why:
In Alipay Mdrill application, Solr bolt will apply
much more memory
Longda Feng
Alibaba
Smarter Balancing
With JStorm Scheduler:
Tasks that exchange data heavily tend to be
assigned to the same worker to avoid
networking cost.
Longda Feng
Alibaba
User Defined Scheduler
User define task run one designated
worker
User can setting how many CPU slot /memory
slot will be used
Why:
In Taobao TAE project, some bolts want to
run in user defined-nodes
Longda Feng
Alibaba
Task on Different Node
Task of one component can be scheduled
to run on different nodes
Why:
In ALIPAY Mdrill, Solr bolt must run different
node
Longda Feng
Alibaba
Task on Single Node
All tasks can be scheduled to run on a
single node.
Why:
In Taobao TLog, there are many small jobs, in
order to reduce network cost, all task of one
job must run on single node.
Longda Feng
Alibaba
Old Assignment
“Last Assignment Policy”
By default , a task will run on the machine it
runs previous time
Why:
In Alibaba CDO, When restart one application,
user wanted to reuse old workers
Longda Feng
Alibaba
Pluginable
Be able to run on:
Hadoop yarn(more stable than storm)
Alibaba Apsara Clould System
Alibaba Elastic Resource Pool
Longda Feng
Alibaba
Classloader
Resolved application jar-confliction with
JStorm
Longda Feng
Alibaba
More convenient UI
More useful stats collected and displayed.
Browse Worker Log in UI
Longda Feng
Alibaba
Support libjar
Don’t need assembly all dependency jars
into one jar
Submit libjar with libjar parameter
Support worker.classpath
Longda Feng
Alibaba
Faster
6 Servers (24core/98G)
18 Spout/18 Bolt/18 Acker
Longda Feng
Alibaba
9280598
10818815
9065965
6819139
5610201
62436806830500
5595900 5474180
3379800
0
2000000
4000000
6000000
8000000
10000000
12000000
0 10 20 30 40 50 60
pollt
uple
s/1
0s
workers
Throughput vs workers
jstorm
storm
JStorm 41W/S Sending Speed
Longda Feng
Alibaba
Storm 41W/S Sending Speed
Longda Feng
Alibaba
Why Faster
Reduce memory-copying by zeroMq
Dedicated Deserializing Thread
Better Tuned Sampling Logic
Better Tuned Acking Framework
Better Tuned GC
Longda Feng
Alibaba
Other Improvement
More than 100 improvements
https://github.com/alibaba/JStorm/blob/master/history.md
Fixed assign topology competition
Reset rebalance/reassigned worker timeout as 4 minutes
Graceful worker shutdown
Improvement on thrift server
Avoid mistakenly killing of worker while rebalancing jobs.
。。。。
Longda Feng
Alibaba
More document
https://github.com/alibaba/JStorm/wiki
Google-group:[email protected]
Wangwang:JStorm
QQ:228374502
Laiwang: JStorm
Longda Feng
Alibaba
Company
LOGO
纪君祥(Longda Feng)