cloud connect 03 08-2011

Cloud Event Processing Analyze Sense Respond CloudConnect March 8, 2011

Upload: colin-clark

Post on 23-Dec-2014

TRANSCRIPT

Page 1: Cloud connect 03 08-2011

Cloud Event Processing

Analyze ∙ Sense ∙ Respond

CloudConnect, March 8, 2011

Page 2: Cloud connect 03 08-2011

CLOUD EVENT PROCESSING

Welcome

• High Velocity Big Data
• What is Complex Event Processing?
• Analyzing Time Series with SAX
• What is Map/Reduce?
• Correlating with Historical Data
• Using the Cloud
• Questions

Page 3: Cloud connect 03 08-2011

Data Growth*

[Chart: data growth across Category 1 through Category 4; vertical axis from 0 to 18]

*It would appear that things will actually get worse, not better

Page 4: Cloud connect 03 08-2011

High Velocity Big Data

• What is Big Data?
  – You’ve got Big Data issues when you can’t turn the data into information fast enough to act on:
    • Earthquake
    • Brownout
    • Market Crash
    • Terrorist Event
  – You’ve got Big Data when you have to consider its actual physicality
• What is High Velocity Big Data?
  – Big Data In Flight…
    • You don’t get to store it before you analyze it

Page 5: Cloud connect 03 08-2011

What is Complex Event Processing?

• Complex Event Processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying only the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.
  – From Wikipedia

Page 6: Cloud connect 03 08-2011

What? What is CEP?

• Domain Specific Language
  – Makes it easier to deal with events
• Continuous Query
  – Select symbol, side, price from tradeStream
• Time/Length Windows
  – Select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side
• Pattern Matching
  – select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side!=Side and a.OrderQty = OrderQty and a.Symbol = Symbol))]
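
To make the Time/Length Windows bullet concrete, here is a rough Python sketch of what a 10-minute time window with a grouped average has to do on every event. It is not how any particular CEP engine works internally; the class name `TimeWindowAverage` and its methods are made up for illustration, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average price per (symbol, side) over a sliding time window.

    Hypothetical helper, roughly mirroring:
    select symbol, side, avg(price) from tradeStream.win:time(10 minutes)
    group by symbol, side
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()            # (timestamp, key, price), oldest first
        self.sums = defaultdict(float)   # running price total per group
        self.counts = defaultdict(int)   # number of live events per group

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, key, price = self.events.popleft()
            self.sums[key] -= price
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.sums[key], self.counts[key]

    def on_event(self, timestamp, symbol, side, price):
        """Add one trade and return the current average for every group."""
        self._expire(timestamp)
        key = (symbol, side)
        self.events.append((timestamp, key, price))
        self.sums[key] += price
        self.counts[key] += 1
        return {k: self.sums[k] / self.counts[k] for k in self.counts}
```

Every live event stays in memory until it expires, which is why many windows over a busy stream become a memory problem on a single machine.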

Page 7: Cloud connect 03 08-2011

Wouldn’t It Be Cool

• Select * from everything where itsInteresting = toMe in last 10 minutes;

• Select * from everything where earthQuake > .8;

• Select * from everything where terroristsWillStrike > .9;

Page 8: Cloud connect 03 08-2011

CEP – Current Benefits*

• Really Fast!
• Low Latency!
• Provides a ‘ready made’ framework to build real-time pattern matching applications
• Think at a higher level
  – Productivity

*your mileage may vary, widely

Page 9: Cloud connect 03 08-2011

CEP – Current Limitations

• Memory Bound
  – If you have a lot of events and windows, you risk running out of memory on a single machine
• Compute Bound
  – To ensure high throughput and low latency, most CEP engines are actually doing simplistic things
    • e.g. Filtering events
• Black Box
  – What’s going on in there?

Page 10: Cloud connect 03 08-2011

Checkpoint

• Ok, so by using Complex Event Processing
  – You can analyze data in flight
  – But
    • You’re constrained by:
      – Available compute
      – Memory
    • Because there’s still too much data to process on one machine…

Page 11: Cloud connect 03 08-2011

The Problem With Time Series

• Dimensionality
  – How can I recognize something?
• Distance Measures
  – How do I find similar occurrences?
• Time
  – By the time I process the data, the information has little value…

Page 12: Cloud connect 03 08-2011

Symbolic Aggregate Approximation

• SAX reduces numerical data to a short string, or SAX word.
• Thousands of points of continuous numerical data become a short word such as ‘ABCEDEFGH’
• The SAX approximation of the data fits in main memory, yet retains features of interest
• Creating SAX words from historical and streaming data allows us to perform all kinds of magic…

[Figure: SAX Encoding. A time series plotted over 0 to 120 on the horizontal axis is cut into segments labeled a, b, or c, giving the SAX word baabccbc]

SAX Advantages:
• Patterns identified and described using SAX actually look like the underlying data
• Other algorithms sometimes don’t actually describe the underlying patterns or take way too much work to be useful in real time
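
The slides do not show how a SAX word is computed, so here is a minimal Python sketch of the usual three steps: z-normalize, reduce with Piecewise Aggregate Approximation (PAA), then map each PAA value to a letter using breakpoints that split the standard normal curve into equal-probability regions. The function names are my own, and the sketch assumes the series length divides evenly into the requested number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale the series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)   # a flat series carries no shape
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    if len(series) % segments:
        raise ValueError("series length must be a multiple of segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equal-probability regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Return the SAX word (lowercase letters) for a numeric series."""
    cuts = breakpoints(alphabet_size)
    frames = paa(znormalize(series), segments)
    # bisect_right counts how many cut points lie at or below the value,
    # which is the index of the letter for that frame.
    return "".join(chr(ord("a") + bisect_right(cuts, value)) for value in frames)
```

For example, `sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4)` turns a steadily rising series into a four-letter word over the alphabet a to d.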

Page 13: Cloud connect 03 08-2011

SAX – 5 Use Cases

• Indexing
  – Given a time series, find similar time series in the database
• Clustering
  – Find natural groupings in the time series
• Classification
  – Automagically sort patterns found in time series into categories
• Summarization
  – Condense verbose data into meaningful information
• Anomaly Detection
  – Find surprising, interesting, or unexpected behavior
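
As one illustration of the Anomaly Detection use case, here is a deliberately simple heuristic: once each window of data has been reduced to a SAX word, words that almost never occur are candidates for "surprising" behavior. This is my own sketch, not a method from the talk, and `rare_words` is a hypothetical name.

```python
from collections import Counter


def rare_words(words, max_count=1):
    """Return (position, word) for SAX words seen at most max_count times."""
    counts = Counter(words)
    return [(i, word) for i, word in enumerate(words) if counts[word] <= max_count]
```

Because the words are short strings, counting them is cheap enough to do on every new window.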

Page 14: Cloud connect 03 08-2011

Why SAX is Cool

• Lower Bounding
  – Distances between SAX words lower-bound the distances between the underlying series, so the patterns identified and described using SAX actually look like the underlying data
• Dimensionality Reduction
  – Previously intractable problems become possible in real time
• Other algorithms sometimes don’t describe underlying patterns
• Or they take way too much work to be useful in real time
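
The Lower Bounding bullet refers to the distance usually called MINDIST in the SAX literature: a distance computed on two SAX words that never exceeds the Euclidean distance between the z-normalized series they came from. A minimal Python sketch follows, assuming lowercase words of equal length and the same equal-probability breakpoints used to build them; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equal-probability regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol_distance(a, b, cuts):
    """Gap between the regions of two letters; identical or adjacent letters give 0."""
    low, high = sorted((ord(a) - ord("a"), ord(b) - ord("a")))
    if high - low <= 1:
        return 0.0
    return cuts[high - 1] - cuts[low]


def mindist(word_a, word_b, original_length, alphabet_size):
    """Lower bound on the distance between the two original (normalized) series."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = sum(symbol_distance(a, b, cuts) ** 2 for a, b in zip(word_a, word_b))
    return sqrt(original_length / len(word_a)) * sqrt(total)
```

Since the bound is never larger than the true distance, a search can discard candidates using only the short words and touch the raw data for the few that remain.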

Page 15: Cloud connect 03 08-2011

A Day’s Worth of IBM

Page 16: Cloud connect 03 08-2011

Normalized & PAA Applied

Page 17: Cloud connect 03 08-2011

And Finally, SAX

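[Figure: the PAA-reduced series labeled with SAX symbols A through G; label groups legible in the chart include ED D, BC C, ABCCE, FG, and EDDCCBC]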

Page 18: Cloud connect 03 08-2011

Checkpoint

• We’ve reduced dimensionality
• We know where we are
  – The current pattern is AABASDGF
• We’re calculating it in ‘real-time’*
  – Using Complex Event Processing
• But
  – There’s still too much data to process on one machine…
• How can we process more data in the same amount of time?

*I much prefer the term event-driven

Page 19: Cloud connect 03 08-2011

What is Map/Reduce?

• Framework for processing ginormous datasets using a large number of computers (nodes) in a cluster.
• "Map": The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node.
• "Reduce": Takes the answers to all the sub-problems and combines them in a way to get the output - the answer to the problem it was originally trying to solve.
  – From Wikipedia

Page 20: Cloud connect 03 08-2011

What? What is Map/Reduce?

• WordCount Example (classic)
  – Map scans text for words and emits {word,1}
  – Combine collapses key values on the same node: {word,1,1,1} -> {word,3}
  – Shuffle/Sort merges results from different nodes
    • {node A,“NoSQL”,50} {node B,“Oracle”,50} {node B,“NoSQL”,50} becomes
    • {node A,“NoSQL”,50} {node B,“NoSQL”,50} {node B,“Oracle”,50}
  – Reduce
    • Outputs {“NoSQL”,100} {“Oracle”,50}
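
The four WordCount phases can be sketched in a few lines of Python. Everything here runs in a single process, so the "nodes" are just separate input strings; the function names are illustrative and do not belong to any Map/Reduce framework.

```python
from collections import Counter, defaultdict


def map_phase(text):
    """Map: emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in text.split()]


def combine_phase(pairs):
    """Combine: collapse pairs that share a key on the same node."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return sorted(counts.items())


def shuffle_sort(node_outputs):
    """Shuffle/Sort: bring the partial counts for each word together, in key order."""
    grouped = defaultdict(list)
    for pairs in node_outputs:
        for word, n in pairs:
            grouped[word].append(n)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reduce: add up the partial counts for each word."""
    return {word: sum(partials) for word, partials in grouped}


def word_count(text_per_node):
    """Run map and combine per node, then shuffle/sort and reduce across nodes."""
    node_outputs = [combine_phase(map_phase(text)) for text in text_per_node]
    return reduce_phase(shuffle_sort(node_outputs))
```

Feeding it one node holding “NoSQL” 50 times and another holding “Oracle” 50 times plus “NoSQL” 50 times reproduces the totals on the slide.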

Page 21: Cloud connect 03 08-2011

SAX and Map/Reduce

• SAX is an ‘embarrassingly parallel’ problem
• Using parallel processing allows SAX words to be computed more quickly
• Using Streaming Map/Reduce provides results even faster, increasing the value of data even more
  – Partition by symbol and sort by timestamp
  – Calculate SAX words for each symbol, in parallel
• CEP Time Windows to the Rescue!
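
The two sub-steps above (partition by symbol and sort by timestamp, then compute a SAX word per symbol in parallel) might look like the following Python sketch. It is a single-machine stand-in, not Streaming Map/Reduce: a thread pool plays the role of the worker nodes, the tick layout `(timestamp, symbol, price)` is assumed, and each symbol is assumed to have at least `segments` ticks.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, pstdev


def sax_word(prices, segments, alphabet_size):
    """Compact SAX: z-normalize, average equal-width frames, map to letters."""
    mu, sigma = mean(prices), pstdev(prices)
    normalized = [(p - mu) / sigma if sigma else 0.0 for p in prices]
    width = len(normalized) // segments   # trailing points past segments*width are ignored
    frames = [mean(normalized[i * width:(i + 1) * width]) for i in range(segments)]
    cuts = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    return "".join(chr(ord("a") + bisect_right(cuts, f)) for f in frames)


def partition_by_symbol(ticks):
    """Map side: group ticks by symbol, then order each group by timestamp."""
    partitions = defaultdict(list)
    for timestamp, symbol, price in ticks:
        partitions[symbol].append((timestamp, price))
    return {symbol: [price for _, price in sorted(rows)]
            for symbol, rows in partitions.items()}


def sax_by_symbol(ticks, segments, alphabet_size, workers=4):
    """Reduce side: one independent SAX computation per symbol."""
    partitions = partition_by_symbol(ticks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {symbol: pool.submit(sax_word, prices, segments, alphabet_size)
                   for symbol, prices in partitions.items()}
        return {symbol: future.result() for symbol, future in futures.items()}
```

No symbol's word depends on any other symbol's data, which is what makes the problem embarrassingly parallel; in a real cluster each partition would go to a separate process or machine rather than a thread.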

Page 22: Cloud connect 03 08-2011

Checkpoint

• CEP is great, but I still have to tell it what I’m looking for, right?

• SAX can help us reduce dimensionality, what else can it do for us?

• How do I relate Streaming Data to Historical Data?

• How do I do this while the Information still has value?

Page 23: Cloud connect 03 08-2011

High Velocity Big Data Pattern

[Diagram with labeled stages: OnRamp, Events, several parallel Map stages, SAX, Reduce, and Context, plus Historical Events with their own Map and Reduce stages]

Page 24: Cloud connect 03 08-2011

So What Do We Need?

• Complex Event Processing
• The Algorithm (SAX)
• Processing Model – Streaming Map/Reduce
• Context – The Historical Aspect
• What Do We Call This?

Page 25: Cloud connect 03 08-2011

What is DarkStar?

• Platform as a Service (PaaS)
• Provides Distributed
  – Complex Event Processing
  – Streaming Map/Reduce
  – Messaging
  – Web Services
  – Monitoring/Management
• Applications are built on top, or inside
  – SAX runs inside of DarkStar
  – SAX is not a component of DarkStar, but an add-in library
• And deployed in a cluster
  – Virtualized Resources

Page 26: Cloud connect 03 08-2011

DarkStar

• What patterns are occurring in my data, right now?
  – CEP based streaming Map/Reduce
    • Use a cluster of machines
• When did this pattern happen before?
  – Database with embedded Map/Reduce
    • No need to move data outside the database for processing
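
To illustrate the "When did this pattern happen before?" question, here is a small Python sketch that scans stored SAX words for ones matching the current word, optionally tolerating a few differing letters. It is a plain linear scan in one process and says nothing about how DarkStar or its database does this; the names and the `(timestamp, word)` layout are assumptions.

```python
def hamming(word_a, word_b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(word_a, word_b))


def find_previous_occurrences(history, current_word, max_mismatches=0):
    """Return timestamps of historical SAX words close to the current one.

    history is an iterable of (timestamp, sax_word) pairs.
    """
    return [timestamp for timestamp, word in history
            if len(word) == len(current_word)
            and hamming(word, current_word) <= max_mismatches]
```

With an embedded Map/Reduce, the same comparison would run next to each partition of the historical data, and only the matching timestamps would leave the database.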

Page 27: Cloud connect 03 08-2011

The Cloud

• Elastic Resource
  – Grows/Shrinks according to demand
• Virtualization
  – Efficient utilization of compute
• The Previously Unthinkable
  – Is now possible, if not already commonplace
• Peering can provide access to Big Pipes and Secure Data

Page 28: Cloud connect 03 08-2011

Thank You!

• Questions?

• Contact Me
  – Colin Clark
  – @EventCloudPro
  – [email protected]