Streaming Computing: Architectures and Technologies

Streaming Computing: Some thoughts and technology choices for event-driven processing

Natalino Busa - 29 Aug. 2013

Upload: natalino-busa

Posted on 14-Dec-2014


DESCRIPTION

Some loose thoughts about the latest buzzwords: streaming computing, real-time processing, and in-memory computing.

TRANSCRIPT


Streaming Computing: Some thoughts and technology choices for event-driven processing

Natalino Busa - 29 Aug. 2013


Outline

● Concurrency
● Streaming computing
● Technologies
  ○ Gigaspaces
  ○ Storm
  ○ Akka
● Comparison matrix
● Opportunities


Algorithms: a tribute

Numbers and Algorithms:

The 9th-century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi,

whose work built upon that of the 7th-century Indian mathematician Brahmagupta.

We owe a lot to these guys!!!


Why do we need parallelism?

It gets bigger,

It doesn’t get much faster

BUT

We get more cores on a chip.

More cores = more parallelism.

We are happy now, right?


Moore’s law

Every 18 months, the number of CPU cores doubles.

Another interpretation:

Every 18 months, the number of idle CPU cores doubles.
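Taken at face value (the usual statement of Moore’s law concerns transistor counts rather than cores), the slide's rule gives an exponential core count $N(t)$ starting from $N_0$ cores, and a single-threaded program keeps only one of them busy:

```latex
\begin{aligned}
N(t) &= N_0 \cdot 2^{t/18} && \text{cores after } t \text{ months} \\
N_{\text{idle}}(t) &= N(t) - 1 && \text{for a single-threaded program}
\end{aligned}
```

For example, starting from 2 cores, after 36 months there are 2 · 2^2 = 8 cores, 7 of them idle.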


More parallelism

We trade:

Time vs. (CPU, memory, I/O)


Modern applications

Scalability:

● Vertical: concurrency (use all the cores, memory and I/O of a given machine)

● Horizontal: distribution (use all the machines in the cluster)

High availability: fault tolerance at all levels (local, distributed)

(the Terminator effect: you can stop it, but you can’t kill it)
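Vertical scaling means keeping every core of one machine busy. A minimal Python sketch, assuming the work splits into independent chunks: a pool sized with `os.cpu_count()` processes the chunks concurrently. The names `checksum` and `process_all` are hypothetical. Threads keep the sketch simple; for CPU-bound pure-Python code the GIL limits thread parallelism, so `ProcessPoolExecutor` would be the usual swap.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def checksum(chunk: bytes) -> int:
    # Hypothetical per-chunk work: a simple additive checksum.
    return sum(chunk) % 65521

def process_all(chunks: list[bytes]) -> list[int]:
    # One worker per core; os.cpu_count() may return None, so fall back to 1.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        return list(pool.map(checksum, chunks))

if __name__ == "__main__":
    data = [bytes([i % 256] * 1000) for i in range(8)]
    print(process_all(data))
```

Horizontal scaling applies the same split across machines instead of cores, at the cost of moving data over the network.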


Streaming applications

Performance: efficient use of resources:

CPU and memory, but also OS threads and sockets

Asynchronous:

event-driven, reacts to new data

Distributed:

more machines = more performance; the algorithm is partitioned and/or replicated across the cluster
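The asynchronous and distributed properties listed above can be sketched together in one process. Assumptions: events are (key, value) pairs, each partition is an `asyncio` task standing in for a cluster node, and events are routed by a stable hash of the key so all events for one key land in the same partition. The names `route`, `partition_worker`, `run` and `NUM_PARTITIONS` are hypothetical.

```python
import asyncio
import zlib

NUM_PARTITIONS = 4  # stands in for the number of nodes in the cluster

def route(key: str, n: int = NUM_PARTITIONS) -> int:
    # Stable hash (crc32): the same key always maps to the same partition,
    # unlike Python's built-in hash() of str, which is randomized per process.
    return zlib.crc32(key.encode()) % n

async def partition_worker(queue: asyncio.Queue, totals: dict) -> None:
    # Event-driven: the worker waits until a new event arrives, then reacts.
    while True:
        event = await queue.get()
        if event is None:  # sentinel: shut down
            break
        key, value = event
        totals[key] = totals.get(key, 0) + value  # partition-local state

async def run(events):
    queues = [asyncio.Queue() for _ in range(NUM_PARTITIONS)]
    states = [{} for _ in range(NUM_PARTITIONS)]
    workers = [asyncio.create_task(partition_worker(q, s))
               for q, s in zip(queues, states)]
    for key, value in events:
        await queues[route(key)].put((key, value))
    for q in queues:
        await q.put(None)
    await asyncio.gather(*workers)
    # Merge for inspection; a key never appears in two partitions.
    merged = {}
    for s in states:
        merged.update(s)
    return merged

if __name__ == "__main__":
    print(asyncio.run(run([("a", 1), ("b", 2), ("a", 3)])))
```

Because routing is by key, each partition owns its keys' state outright and needs no locks or cross-node coordination.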


What to increase?

More CPU: it helps when there is computation involved

More MEMORY: it helps when there is more state to keep

More I/O: it helps when there are more messages to transfer


Streaming or batch?

[Diagram: Data flows from a source system through our system (Processing) to a target system. Natalino Busa - 12 Feb. 2013]

What differentiates streaming from batch?

● Granularity of data
● Granularity of processing

Granularity impacts:

Throughput, latency, and the cost of the system!
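Granularity of processing can be made concrete with a batching helper: a batch size of 1 behaves like streaming (each event handled as it arrives), while one batch holding everything behaves like batch processing. A minimal sketch; `batches` is a hypothetical helper, not part of any framework named in these slides.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batches(events: Iterable[T], size: int) -> Iterator[list[T]]:
    # Group an event stream into lists of at most `size` elements.
    # size=1 -> per-event processing (streaming); a huge size -> one big batch.
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(events)
    while chunk := list(islice(it, size)):
        yield chunk

if __name__ == "__main__":
    print(list(batches(range(5), 2)))
```

Larger batches amortize per-call overhead (higher throughput), but an event must wait until its batch is full (higher latency). Python 3.12+ ships `itertools.batched`, which yields tuples instead of lists.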


The choice is yours

1000 events/sec (1 KB/event)

running on 100 cores all day long

“Wait a day, then process”

86.4 M events ≈ 86 GB of data

Latency: 24 hours, Throughput: 1 update/day

BATCH: Hadoop

Latency: 1 ms, Throughput: 1000 updates/sec

STREAMING: Akka

“Do not wait”

Process the 1 KB of data every millisecond.

“Both are valid options. It depends on the application domain and the requirements/specs of the target and source systems”
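The daily volume follows from the rates on the slide: 1000 events/sec over 86,400 seconds gives 86.4 million events, and at 1 KB each (taking 1 KB = 1000 bytes, an assumption) about 86 GB per day. A quick check:

```python
EVENTS_PER_SEC = 1000
EVENT_SIZE_BYTES = 1000          # 1 KB, decimal convention assumed
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY
bytes_per_day = events_per_day * EVENT_SIZE_BYTES

# Streaming: one event every 1/1000 s, i.e. one update each millisecond.
streaming_interval_ms = 1000 / EVENTS_PER_SEC

print(f"{events_per_day:,} events/day")              # 86,400,000 events/day
print(f"{bytes_per_day / 1e9:.1f} GB/day")           # 86.4 GB/day
print(f"{streaming_interval_ms} ms between events")  # 1.0 ms between events
```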


Mapping it to existing applications

Granularity of data 256 GB, by granularity of processing:

● 1 CPU: Traditional DB systems
● 100 CPUs: Big Data (Hadoop)

Granularity of data 1 KB, by granularity of processing:

● 1 CPU: Traditional mail server
● 100 CPUs: Web application server


Technologies: Gigaspaces


Technologies: Storm

Topology, supervising, scaling
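In Storm, a topology is a graph of spouts (stream sources) and bolts (processing steps). The following is not Storm code: it is a single-process Python imitation of the classic word-count topology, with generators standing in for one spout and two bolts. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    # Spout: emits one tuple (here, a sentence) per incoming record.
    for line in lines:
        yield line

def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    # Bolt 1: splits each sentence into lowercase words.
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word

def count_bolt(words: Iterable[str]) -> Counter:
    # Bolt 2: keeps a running count per word (the stateful step).
    counts = Counter()
    for word in words:
        counts[word] += 1
    return counts

def run_topology(lines: Iterable[str]) -> Counter:
    # Wiring the graph: spout -> split -> count.
    return count_bolt(split_bolt(sentence_spout(lines)))

if __name__ == "__main__":
    print(run_topology(["the cow jumped", "The dog"]))
```

In a real Storm cluster each spout and bolt runs as many parallel tasks, and a fields grouping on the word sends every occurrence of a word to the same counting task, so each count stays local to one task.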


Technologies: Akka

Supervising: tree of actors

Topology (static and dynamic actors)

Scaling and distributed processing
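Akka's model pairs actors (private state, a mailbox, one message processed at a time) with supervisors that decide what happens when a child fails; Restart is one of Akka's supervision directives, alongside Resume, Stop and Escalate. The sketch below is plain Python, not the Akka API: a hypothetical `Supervisor` drains a child's mailbox sequentially and replaces the child with a fresh instance when it raises. `CounterActor` and the `"boom"` message are likewise hypothetical.

```python
from collections import deque

class CounterActor:
    """Toy actor: private state, only touched while handling one message."""

    def __init__(self):
        self.count = 0

    def receive(self, msg):
        if msg == "boom":
            raise ValueError("simulated failure")
        self.count += msg

class Supervisor:
    """Toy supervisor: restarts a failing child with fresh state."""

    def __init__(self, child_factory):
        self.child_factory = child_factory
        self.child = child_factory()
        self.mailbox = deque()
        self.restarts = 0

    def tell(self, msg):
        # Fire-and-forget: the sender only enqueues the message.
        self.mailbox.append(msg)

    def run(self):
        while self.mailbox:
            # One message at a time, so the child's state needs no locks.
            msg = self.mailbox.popleft()
            try:
                self.child.receive(msg)
            except Exception:
                # Restart strategy: discard the crashed actor and its state.
                self.restarts += 1
                self.child = self.child_factory()

if __name__ == "__main__":
    sup = Supervisor(CounterActor)
    for m in [1, 2, "boom", 5]:
        sup.tell(m)
    sup.run()
    print(sup.child.count, sup.restarts)
```

The failure stays contained: the supervisor keeps processing the mailbox after the crash, and only the child's state is lost.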


Technology matrix

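Granularity of data (rows) vs. granularity of processing (columns):

● Small data, small processing: Akka
● Small data, big processing: Akka, Gigaspaces
● Big data, small processing: ?
● Big data, big processing: Storm

System end-to-end throughput:

● High (~10’000 events/sec): Akka
● Medium (~100 events/sec): Storm / Gigaspaces
● Low (~10 events/sec): Scripting languages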


Big Data in motion

All three are: distributed, fault-tolerant, streaming.

- Storm
  ++ multi-language
  -- not user/admin friendly
  -- slow supervising
  Processing elements are JVMs; ideal when data is coarse-grained.

- Akka
  ++ high throughput, fine-grained actors
  ++ dynamic topologies
  -- low-level, but high performance
  Processing elements are small and lightweight; ideal for millions of transactions per second.

- Gigaspaces
  ++ combines memory + application distribution
  -- framework API is not very flexible
  Processing elements are JVMs; ideal for an all-in-one solution with little customization.


Opportunity: Lambda Architecture

Logic layer: Software as a Service, e.g. a real-time predictor

Natalino Busa - 12 Feb. 2013 (from http://www.manning.com/marz/)
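In the Lambda Architecture described in the Manning book referenced on the slide, a batch layer periodically recomputes views over the full master dataset, a speed layer covers the events that arrived since the last batch run, and queries merge both. A minimal sketch with counts per key; the function names and the `batch_cutoff` timestamp parameter are hypothetical.

```python
from collections import Counter

def batch_view(master_dataset, batch_cutoff):
    # Batch layer: full recomputation over all events up to the cutoff.
    return Counter(key for ts, key in master_dataset if ts <= batch_cutoff)

def speed_view(master_dataset, batch_cutoff):
    # Speed layer: only the recent events the batch run has not covered yet.
    # (A real speed layer updates incrementally as events arrive;
    # it is recomputed here for brevity.)
    return Counter(key for ts, key in master_dataset if ts > batch_cutoff)

def query(key, batch, realtime):
    # Serving side: merge the precomputed batch view with the real-time view.
    # Counter returns 0 for keys it has never seen.
    return batch[key] + realtime[key]

if __name__ == "__main__":
    events = [(1, "a"), (2, "b"), (3, "a"), (4, "a")]  # (timestamp, key)
    b, r = batch_view(events, 2), speed_view(events, 2)
    print(query("a", b, r), query("b", b, r))
```

When the next batch run moves the cutoff forward, the speed layer's share shrinks accordingly, so any approximation it makes is eventually replaced by the batch result.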


Opportunity: Batch + Streaming

[Diagram: Batch Computing and Streaming Computing, backed by Data Warehouses and Messaging Busses, alongside In-Memory Distributed Databases (batch and streaming); Front End Services provide low-latency HTTP API services to an HTML5 Client / Responsive App via FETCH (refresh) and PUSH (SSE, notifications).]
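The PUSH path in the diagram mentions Server-Sent Events (SSE). On the wire, an SSE message is a block of `field: value` lines (such as `event:`, `id:` and `data:`) terminated by a blank line, with a multi-line payload sent as several `data:` lines. A small sketch of a frame encoder; `sse_frame` is a hypothetical helper, and the transport (an HTTP response with `Content-Type: text/event-stream`) is left out.

```python
from typing import Optional

def sse_frame(data: str, event: Optional[str] = None,
              event_id: Optional[str] = None) -> str:
    # Encode one Server-Sent Events message as text/event-stream lines.
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Each payload line becomes its own "data:" field;
    # the browser rejoins them with "\n".
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"

if __name__ == "__main__":
    print(repr(sse_frame('{"price": 42}', event="update")))
```

Compared with FETCH (refresh), the server keeps one connection open and writes a frame whenever new data arrives, so the client sees updates without polling.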