Real Time Analytics – Big Data Case Study

Upload: nasscom

Post on 15-May-2015


Page 1: Real time analytics   case study

Real Time Analytics – Big Data

Case Study


Agenda

Big Data

Real Time Analytics

Why is it needed?

Case Study – Telecom Industry

2 Impetus Confidential


Big Data & Hadoop


What is Big Data?

Three dimensions of Big Data

• Volume

o Collecting terabytes of information or more

• Velocity

o Analyzing millions of trade events generated per day

• Variety

o Structured or unstructured data such as text, sensor data, click streams, audio, video and log files


Big Data

Data is the key to business; it can be used for:

• User behavior analysis

• Ad targeting

• Trending topics

• Recommendations

How?

• Hadoop is the de facto standard for batch-processing data analytics

o Provides a parallel computation framework (MapReduce)

o Redundant, fault-tolerant data storage

o Designed to reliably store data on commodity machines

o Designed with hardware failures in mind

• Based on the designs of Google’s GFS and MapReduce

• Real-time analytics? No
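To make the Map Reduce model named above concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop. The function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real job would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a batch of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The whole batch must be available before word_count returns anything, which is why this model suits batch jobs rather than real-time analytics.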


Real Time Analytics


What is Real Time Analytics?

What is it?

• Real-time analytics is a process of delivering information about events as they occur

Some Examples

• Financial Industry - Fraud Detection, Trading

• E-commerce - Recommendations

• Telecom Industry - Machine to Machine communication

• Supply Chain Management

• Business Activity Monitoring


Why is it needed?

Time is money

• Intra-day risk analysis in real time could translate into increased profits

Helps organizations stay ahead of the competition

• E-commerce: surfacing information based on what a user is browsing or interested in could improve sales and the user experience

• Content creators could produce relevant, high-quality content


Case Study – Telecommunication Industry


The Company, Challenge & Benefits


Company

• Telecom firm providing a wireless network service designed to deliver Machine-to-Machine (M2M) communications to millions of devices.

Challenge

• Design a near-real-time solution for predicting patterns based on data generated by M2M communication and sent over the wireless network.

• The solution should support the addition of new near-real-time streams with little change.

• Enable customers to get real-time alerts for business-critical situations.

Benefits

• Enabled customers to react to their critical business needs in real time.

• Improved Customer Experience.

• Reduced operating cost.


Examples

Machine to Machine Communication

• Vineyard watering

o Spread over a huge area

o Critical to maintain the water-level threshold

• Vehicle Tracking & Geo-fencing

o Mark the permitted radius of vehicle movement (for example, valet parking)
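Both examples above reduce to a simple per-event check. The sketch below is plain Python under stated assumptions: positions arrive as (latitude, longitude) pairs in degrees, the fence is a circle around a fixed center, and the names haversine_m, outside_geofence and below_threshold are hypothetical rather than taken from the case study.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_geofence(center, radius_m, position):
    """True when the reported position lies beyond the permitted radius."""
    distance = haversine_m(center[0], center[1], position[0], position[1])
    return distance > radius_m


def below_threshold(level, threshold):
    """True when a reported water level has dropped under the threshold."""
    return level < threshold
```

An alerting layer would call one of these checks for each incoming event and notify the customer whenever it returns True.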


Incoming Data Attributes

Continuous input streams

• Events as they happen

High data volume

• 1,000 to 100,000 events per second

Varied sources

• Data coming from multiple sources


Expected Goals

Identify patterns

• Devices sending incorrect or duplicate data

Reliability

• Events are processed as they happen

• Events are not missed in case of failure

Scalability

• Should support increases in volume

Capability to add more queries

• Should be possible to add more queries for a particular type of incoming stream

Notification / Alerts System
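The "identify patterns" goal above (devices sending incorrect or duplicate data) can be sketched in plain Python. The event fields device_id, seq and value, the valid range, and the bounded memory of recent keys are all assumptions made for illustration. The case study does not describe its actual event schema.

```python
from collections import OrderedDict


class DuplicateFilter:
    """Remembers the most recent `capacity` event keys and flags repeats."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_duplicate(self, device_id, sequence_no):
        key = (device_id, sequence_no)
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered key
        return False


def classify(events, low=0.0, high=100.0):
    """Return (device_id, seq, reason) alerts for suspicious events."""
    dedup = DuplicateFilter()
    alerts = []
    for event in events:
        if dedup.is_duplicate(event["device_id"], event["seq"]):
            alerts.append((event["device_id"], event["seq"], "duplicate"))
        elif not low <= event["value"] <= high:
            alerts.append((event["device_id"], event["seq"], "out_of_range"))
    return alerts
```

Bounding the memory of seen keys keeps the filter usable on a continuous stream, at the cost of missing duplicates that arrive after their key has been evicted.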


Technology Stack – What is needed?

Event Processing capability

• Esper

o Processing engine for data streams

o SQL-Like Support – run queries on data stream

o Sliding windows (time or length)

o Pattern Matching

o Executes a large number of queries simultaneously


Technology Stack – Esper

Esper - Simple steps to get started

• Get an Esper instance

• Create a statement (Esper Query Language)

• Register the statement with the Esper engine

• Create a listener

• Attach the listener to the statement


Technology Stack – Esper

Esper – Sample Queries

Time based window

select avg(price) from StockTickEvent.win:time(30 sec)

Length based window

select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol
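To show what the two sample queries compute, here is a plain-Python sketch of a time window and a length window. It is not Esper and does not use the Esper API; the class names are hypothetical. Two simplifications are assumed: an event expires once it is at least the window length older than the newest event, and the grouped sketch returns the averages of every symbol currently in the window.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average price over events seen within the last `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._events = deque()  # (timestamp, price), oldest first

    def add(self, timestamp, price):
        self._events.append((timestamp, price))
        # Drop events that have aged out of the window.
        while self._events[0][0] <= timestamp - self.seconds:
            self._events.popleft()
        return sum(p for _, p in self._events) / len(self._events)


class LengthWindowGroupAverage:
    """Per-symbol average price over the last `length` events overall."""

    def __init__(self, length):
        self._events = deque(maxlen=length)  # oldest event falls off the front

    def add(self, symbol, price):
        self._events.append((symbol, price))
        sums, counts = defaultdict(float), defaultdict(int)
        for sym, p in self._events:
            sums[sym] += p
            counts[sym] += 1
        return {sym: sums[sym] / counts[sym] for sym in sums}
```

Note that the length window holds the last events across all symbols, and the grouping is applied to whatever that shared window contains.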


Technology Stack - Storm

Data Carrier for Esper

• Storm

o Facilitates data transfer

o Continuous Computation

o Distributed, Fault tolerant

o Scalable, No Data Loss

o Provides parallelism

o Acking & Replay capability
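The acking and replay idea listed above can be illustrated with a small plain-Python analogue. This is not Storm code: ReplayingSource and run_with_acks are hypothetical names, and the sketch assumes a single process in which a failed message is simply re-queued and retried later.

```python
from collections import deque


class ReplayingSource:
    """Emits messages and keeps each one pending until it is acked."""

    def __init__(self, items):
        self._queue = deque(enumerate(items))  # (message_id, payload)
        self._pending = {}

    def next_tuple(self):
        if not self._queue:
            return None
        msg_id, payload = self._queue.popleft()
        self._pending[msg_id] = payload  # remember until acked or failed
        return msg_id, payload

    def ack(self, msg_id):
        self._pending.pop(msg_id, None)  # fully processed, safe to forget

    def fail(self, msg_id):
        # Put the message back so it is emitted again later.
        self._queue.append((msg_id, self._pending.pop(msg_id)))


def run_with_acks(source, process):
    """Process every message, replaying any whose processing raised."""
    results = []
    while (item := source.next_tuple()) is not None:
        msg_id, payload = item
        try:
            results.append(process(payload))
            source.ack(msg_id)
        except Exception:
            source.fail(msg_id)
    return results
```

This gives at-least-once processing: no message is dropped after a transient failure, but a message that always fails would be retried forever, so a real system would cap retries.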


Technology Stack - Storm

Basic concept of Storm

• Streams, Spouts & Bolts

• A stream is an unbounded sequence of tuples

• Spouts are data emitters, retrieving data from outside the Storm cluster

• Bolts are data processors; they receive one or more streams and emit (potentially) one or more streams
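The spout and bolt roles described above can be mimicked with Python generators. This is only an analogy in plain Python, not the Storm API. The function names, the (device_id, value) tuple layout and the low-level alert rule are assumptions for illustration.

```python
def sensor_spout(readings):
    """Spout analogue: pulls data from an outside source and emits tuples."""
    for device_id, value in readings:
        yield (device_id, value)


def threshold_bolt(stream, threshold):
    """Bolt analogue: consumes a stream and emits only low readings."""
    for device_id, value in stream:
        if value < threshold:
            yield (device_id, value, "LOW")


def alert_bolt(stream):
    """Bolt analogue: turns flagged tuples into alert messages."""
    for device_id, value, status in stream:
        yield f"ALERT {device_id}: level {value} is {status}"


def run_topology(readings, threshold):
    """Wire the spout and bolts together and collect the emitted alerts."""
    return list(alert_bolt(threshold_bolt(sensor_spout(readings), threshold)))
```

For example, run_topology([("v1", 40), ("v2", 12)], 15) yields one alert, for device v2. In Storm the same wiring would be declared as a topology and run in parallel across worker nodes.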


Technology Stack - Storm

Storm Cluster

• Topology – A graph of spouts and bolts that are connected with stream groupings

• Master Node – Runs a daemon called Nimbus

o Distributes code across the cluster

o Assigns tasks to machines

o Monitors for failures

• Worker Node – Runs a daemon called Supervisor

o Listens for assigned work

o Starts/stops worker processes

o Executes a subset of the topology

• Coordination between Nimbus and the Supervisors is done through ZooKeeper


Technology Stack - Flume

Log Data Collection

• Flume

o Stream oriented data flow

o Log streaming from various sources

o Collect, aggregate & move data to a centralized data store

o Distributed, Reliable

o Failover and recovery mechanism


Technology Stack - Flume

Flume

• Agent – Receives data from an application

• Collector – Writes data to permanent storage

• Master – Separate service controlling all the other nodes


Technology Stack - Messaging

Bridging the gap between Flume & Storm

• Queue Messaging System

o Robust messaging

o Flexible routing

o Highly available

o Makes Flume & Storm integration loosely coupled

• RabbitMQ fits the requirement
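The loose coupling that a queue provides can be sketched with Python's standard library. Here queue.Queue stands in for the message broker. This is not RabbitMQ or its client API, and the collector and processor functions are hypothetical stand-ins for the Flume and Storm sides.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream in this sketch


def collector(lines, out_queue):
    """Producer side: publishes each collected line to the queue."""
    for line in lines:
        out_queue.put(line)
    out_queue.put(SENTINEL)


def processor(in_queue, results):
    """Consumer side: reads from the queue without knowing the producer."""
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            break
        results.append(item.strip().upper())


def run_pipeline(lines):
    """Run producer and consumer concurrently, linked only by the queue."""
    q = queue.Queue(maxsize=100)  # bounded, so a slow consumer slows the producer
    results = []
    consumer = threading.Thread(target=processor, args=(q, results))
    consumer.start()
    collector(lines, q)
    consumer.join()
    return results
```

Because each side depends only on the queue, either one can be replaced or scaled without changing the other, which is the property the slide attributes to the messaging layer.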


Fitting it all together

[Figure: architecture diagram; surviving label: Data Center]


References

Esper

http://esper.codehaus.org/

Storm

https://github.com/nathanmarz/storm

https://github.com/tomdz/storm-esper

Flume

http://archive.cloudera.com/cdh/3/flume/UserGuide/#_architecture

Queue Messaging System

http://www.rabbitmq.com/


Thank You