Introduction to WSO2 Data Analytics Platform

An Introduction to the WSO2 Analytics Platform
Srinath Perera, VP Research, WSO2; Apache Member (@srinath_perera)


Post on 16-Apr-2017


Page 1: Introduction to  WSO2 Data Analytics Platform


Collect Data
• One Sensor API to publish events
  – REST, Thrift, Java, JMS, Kafka
  – Java clients, JavaScript clients*
• First you define streams (think of a stream as an infinite table in a SQL DB)
• Then publish events via the Sensor API

“Publish once, process any way you like”

Collecting Data: Example
• Java example: create and send events
• Events are sent asynchronously
• See the client given in http://goo.gl/vIJzqc for more info

    // Initialize stream
    Agent agent = new Agent(agentConfiguration);
    publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );

    // Define stream
    StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
    definition.addPayloadData("sid", STRING);
    ...
    publisher.addStreamDefinition(definition);
    ...

    // Send events
    Event event = new Event();
    event.setPayloadData(eventData);
    publisher.publish(STREAM_NAME, VERSION, event);

Data Collection Examples
• Collect data from inbuilt agents in WSO2 products, Tomcat, etc.
• Collect your log data via Logstash
• Collect JVM and JMX stats via an agent
• Ingest data from message queues such as JMS or Kafka
• Pull data from an RSS feed, or scrape a web page
• Write a custom agent to collect data from your system and push it to DAS

Photo credit: http://www.torange.us/ (CC license)

Analysis: Batch Analytics
• Batch analytics reads data from disk (or some other storage) and processes it record by record
• “MapReduce” is the most widely used technology for batch analytics
  – Apache Hadoop
  – Apache Spark: 30X faster and much more flexible
• Analytics (min, max, average, correlation, histograms; might join or group data in many ways)
• Key Performance Indicators (KPIs)
  – E.g. profit per square foot for retail
• Presented as a dashboard

SQL-like Queries: Spark SQL
• Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many
• Expressive, short, and sweet
• Defines core operations that cover 90% of problems
• Lets experts dig in when they like! (via user-defined functions)

    insert overwrite table BusSpeed
    select getHour(ts) as hour, avg(v) as avgV, busID
    from BusStream
    group by busID, getHour(ts);

Spark SQL Query

Count entries where the username is not empty, grouped by username and ordered by the count

    SELECT username, COUNT(*) AS count
    FROM wikiData
    WHERE username <> ''
    GROUP BY username
    ORDER BY count DESC
    LIMIT 10
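For readers who think in code rather than SQL, the same filter, group, count, sort, and limit steps can be sketched over an in-memory list with plain Java streams. This is only an illustration of what the query computes, not how Spark SQL executes it; the class and method names are made up for this sketch, and the tie-break on username is an addition that the SQL does not specify.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopUsernames {
    /** Counts non-empty usernames and returns the `limit` most frequent, highest count first. */
    static List<Map.Entry<String, Long>> topUsernames(List<String> usernames, int limit) {
        Map<String, Long> counts = usernames.stream()
                // WHERE username <> ''
                .filter(name -> name != null && !name.isEmpty())
                // GROUP BY username, COUNT(*)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                // ORDER BY count DESC (ties broken by username; an assumption, not in the SQL)
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                // LIMIT n
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Calling `TopUsernames.topUsernames(names, 10)` mirrors the `LIMIT 10` in the query.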

Use Case: API Usage
• Looking at different API calls by country
• Designed to draw attention to what APIs are used and where

Value of Some Insights Degrades Fast!
• For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time.
• We need technology that can produce outputs fast
  – Static queries, but need very fast output (alerts, real-time control)
  – Dynamic and interactive queries (data exploration)


Realtime Analytics: Complex Event Processing

CEP Queries 1

Calculate the average temperature over a 1-minute sliding window, grouped by roomNo

    define stream TempStream (roomNo string, temp double);

    from TempStream#window.time(1 min)
    select roomNo, avg(temp) as avgTemp
    group by roomNo
    insert all events into AvgRoomTempStream;
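To make the sliding-window idea concrete, here is a minimal plain-Java sketch of a per-room average over the last minute of readings. It is not how the CEP engine is implemented: the class name `SlidingAverage` and its method are hypothetical, timestamps are assumed to arrive in order, and old readings are only expired when a new event for the same room arrives.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SlidingAverage {
    private static final class Reading {
        final long timestampMillis;
        final double temp;

        Reading(long timestampMillis, double temp) {
            this.timestampMillis = timestampMillis;
            this.temp = temp;
        }
    }

    private final long windowMillis;
    private final Map<String, Deque<Reading>> windows = new HashMap<>();
    private final Map<String, Double> sums = new HashMap<>();

    SlidingAverage(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /** Adds a reading and returns the room's average over the current window. */
    double onEvent(String roomNo, long timestampMillis, double temp) {
        Deque<Reading> window = windows.computeIfAbsent(roomNo, key -> new ArrayDeque<>());
        window.addLast(new Reading(timestampMillis, temp));
        double sum = sums.getOrDefault(roomNo, 0.0) + temp;

        // Drop readings that have fallen out of the window; the newest one always stays.
        while (window.peekFirst().timestampMillis <= timestampMillis - windowMillis) {
            sum -= window.removeFirst().temp;
        }
        sums.put(roomNo, sum);
        return sum / window.size();
    }
}
```

Keeping a running sum per room avoids rescanning the window on every event, which is the usual reason to expire incrementally.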

CEP Queries 2
• Using data from a football game
• The kick stream shows kicks by players on the ball
• Ball possession is a hit by me, followed by any number of hits by me, followed by a hit by someone else

    from every k1 = KickStream,
         KickStream[playerid == k1.playerid]*,
         KickStream[playerid != k1.playerid]
    select ..
    insert into BallPosessionStream;
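The possession rule described above (a kick by one player, any number of further kicks by the same player, then a kick by someone else) can be sketched as a single pass over an ordered list of kicks. This is a simplified illustration with hypothetical names; it emits non-overlapping runs and does not reproduce the exact matching semantics of the CEP sequence operator.

```java
import java.util.ArrayList;
import java.util.List;

class BallPossession {
    /** One completed possession: who had the ball and how many consecutive kicks they made. */
    static final class Possession {
        final String playerId;
        final int kicks;

        Possession(String playerId, int kicks) {
            this.playerId = playerId;
            this.kicks = kicks;
        }
    }

    /** Scans kicks in arrival order; a possession is emitted when a different player kicks. */
    static List<Possession> detect(List<String> kickPlayerIds) {
        List<Possession> result = new ArrayList<>();
        String current = null;
        int count = 0;
        for (String playerId : kickPlayerIds) {
            if (playerId.equals(current)) {
                count++;                       // same player keeps the ball
            } else {
                if (current != null) {
                    result.add(new Possession(current, count)); // someone else kicked
                }
                current = playerId;
                count = 1;
            }
        }
        // The final run is not emitted: no other player has kicked yet to end it.
        return result;
    }
}
```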

People Tracking via BLE
• Track people through BLE via triangulation
• Higher-level logic via Complex Event Processing
• Traffic monitoring
• Smart retail
• Airport management


Realtime Soccer Analysis

Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM

Scaling CEP Queries on top of Storm
▪ Accepts CEP queries with hints about how to partition streams
▪ Partitions the streams, builds an Apache Storm topology running CEP nodes as Storm spouts, and runs it. See http://goo.gl/pP3kdX for more info.

CEP Queries on Storm
• @dist(parallel='4') asks to run the query with 4 nodes
• Use a partition definition to break up the data so the parts can run in parallel

    define partition on TempStream.region {
        @dist(parallel='4')
        from TempStream[temp > 33]
        insert into HighTempStream;
    }

    from HighTempStream#window(1h)
    select max(temp) as max
    insert into HourlyMaxTempStream;
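Why partitioning by region allows parallel execution can be shown with a small sketch: every event with the same key is routed to the same worker, so each worker can filter its share independently. The hash-based routing below is an assumption made for illustration; the source does not say how the platform assigns partitions to Storm nodes, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionedFilter {
    static final class TempEvent {
        final String region;
        final double temp;

        TempEvent(String region, double temp) {
            this.region = region;
            this.temp = temp;
        }
    }

    /** Picks a worker index for a partition key; the same region always maps to the same worker. */
    static int workerFor(String region, int parallelism) {
        return Math.floorMod(region.hashCode(), parallelism);
    }

    /** Routes events to workers by region and keeps only readings above the threshold. */
    static List<List<TempEvent>> route(List<TempEvent> events, int parallelism, double threshold) {
        List<List<TempEvent>> perWorker = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perWorker.add(new ArrayList<>());
        }
        for (TempEvent event : events) {
            if (event.temp > threshold) {          // same filter as temp > 33 in the query
                perWorker.get(workerFor(event.region, parallelism)).add(event);
            }
        }
        return perWorker;
    }
}
```

`Math.floorMod` is used instead of `%` so that negative hash codes still give a valid index.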

Interactive Analytics
• The best way to explore data is by asking ad-hoc questions
• Interactive analytics (search) lets you query the system and receive fast results (<10s)
• Shows data in context (e.g. by grouping events from the same transaction together)
• Built using Lucene-based indexes

    SparkSQL> SELECT * FROM TWITTER_DATA

Predictive Analytics
• Can you “write a program to drive a car?”
• Machine learning
  – Takes in a lot of examples and builds a program that matches those examples
  – We call that program a “model”
• Lots of tools
  – R (statistical language)
  – scikit-learn (Python)
  – Apache Spark’s MLBase and Apache Mahout (Java)

Predictive Analytics in DAS
• Building models
  – With the WSO2 Machine Learner product via a wizard (powered by MLlib)
  – Build models using R and export them as PMML
• Built models can be used with both WSO2 CEP and ESB

Using the Model

Within CEP

    from InputStream#ml:predict('/../diabetes-model', 'double')
    select *
    insert into PredictionStream;

Within ESB

    <predict>
        <model storage-location="../downloaded-ml-model"/>
        <features>
            <feature name="SI2" expression="$body/features/SI2"/>
            ..
        </features>
        <predictionOutput property="result"/>
    </predict>

WSO2 Machine Learner
• Upload or select data
• Explore the data
• Train a machine learning model

WSO2 Machine Learner
• Compare results
• Understand why
• Iterate

Supported Algorithms
• Deep learning based classification (H2O’s Stacked Autoencoders Classifier)
• Classification algorithms: Decision Trees, Linear Regression, Lasso Regression, SVM, Naïve Bayes
• K-Means clustering for unsupervised learning on your data
• Anomaly detection using the K-Means algorithm to identify fraud, network penetration, and other difficult scenarios
• Recommendations engine (Collaborative Filtering algorithm)

Predict Wait Time in the Airport
• Predicting the time to go through the airport
• Real-time updates and events to passengers
• Lets the airport manage by allocating resources

Predict Promising Customers
• A typical website can get millions of users
• Only a very small fraction converts
• For each user, we know what they access, where they work, country, browser, OS, etc.
• The problem is to predict which users will convert
• Used logistic regression, Random Forest, survival modeling, etc.

Predict Super Bowl
• Predicted 7 of the 11 games
• Done with the Random Forest algorithm
• Even the ones we missed are instructive

See Yuda’s post: Predicting the Super Bowl with Machine Learning

Anomaly Detection: Markov Models
• Can model the probability of a sequence
• Given a sequence, can predict its likelihood, and use that to detect anomalies
• Implemented with WSO2 CEP
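A minimal sketch of the idea, assuming a first-order Markov chain over discrete event types: learn transition counts from normal sequences, score a new sequence by its average log-likelihood, and flag it when the score falls below a threshold. The class, the add-one smoothing, and the threshold are illustrative assumptions, not the WSO2 CEP implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MarkovAnomaly {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> totals = new HashMap<>();
    private final Set<String> states = new HashSet<>();

    /** Records the state-to-state transitions seen in one normal sequence. */
    void train(List<String> sequence) {
        states.addAll(sequence);
        for (int i = 0; i + 1 < sequence.size(); i++) {
            String from = sequence.get(i);
            String to = sequence.get(i + 1);
            counts.computeIfAbsent(from, key -> new HashMap<>()).merge(to, 1, Integer::sum);
            totals.merge(from, 1, Integer::sum);
        }
    }

    /** Transition probability with add-one smoothing, so unseen transitions are rare, not impossible. */
    double transitionProbability(String from, String to) {
        int count = counts.getOrDefault(from, Collections.<String, Integer>emptyMap()).getOrDefault(to, 0);
        int total = totals.getOrDefault(from, 0);
        return (count + 1.0) / (total + Math.max(1, states.size()));
    }

    /** Mean log-probability per transition; lower values mean a less likely sequence. */
    double averageLogLikelihood(List<String> sequence) {
        double sum = 0.0;
        int transitions = 0;
        for (int i = 0; i + 1 < sequence.size(); i++) {
            sum += Math.log(transitionProbability(sequence.get(i), sequence.get(i + 1)));
            transitions++;
        }
        return transitions == 0 ? 0.0 : sum / transitions;
    }

    boolean isAnomalous(List<String> sequence, double threshold) {
        return averageLogLikelihood(sequence) < threshold;
    }
}
```

Averaging per transition keeps long and short sequences comparable; the threshold itself would have to be tuned on real data.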

Anomaly Detection: Clustering
• Use clustering to identify normal behavior as clusters
• Consider points away from all clusters as anomalies
• A point is considered away from a cluster if it is outside the 99th percentile line for that cluster
• Included in WSO2 ML
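One way to read the percentile rule is sketched below: given cluster centroids (assumed to come from a prior K-Means run, not shown), compute for each cluster the 99th percentile of the distances of its training points, and flag a new point when its distance to the nearest centroid exceeds that cluster's cutoff. The class name, the Euclidean distance, and the nearest-rank percentile are assumptions for illustration, not the WSO2 ML implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClusterAnomalyDetector {
    private final double[][] centroids;
    private final double[] thresholds;

    /** Learns one distance cutoff per cluster from points that represent normal behavior. */
    ClusterAnomalyDetector(double[][] centroids, double[][] trainingPoints, double percentile) {
        this.centroids = centroids;
        List<List<Double>> distances = new ArrayList<>();
        for (int c = 0; c < centroids.length; c++) {
            distances.add(new ArrayList<>());
        }
        for (double[] point : trainingPoints) {
            int c = nearest(point);
            distances.get(c).add(distance(point, centroids[c]));
        }
        thresholds = new double[centroids.length];
        for (int c = 0; c < centroids.length; c++) {
            thresholds[c] = percentile(distances.get(c), percentile);
        }
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    int nearest(double[] point) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(point, centroids[c]) < distance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Nearest-rank percentile; returns 0 for an empty cluster. */
    static double percentile(List<Double> values, double pct) {
        if (values.isEmpty()) {
            return 0.0;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    boolean isAnomaly(double[] point) {
        int c = nearest(point);
        return distance(point, centroids[c]) > thresholds[c];
    }
}
```

A detector built with `percentile = 99.0` corresponds to the 99th percentile line mentioned in the slide.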

Communicate: Dashboards
• Dashboards give an “overall idea” at a glance (e.g. a car dashboard)
  – Boring when everything is good!!
• Build your own dashboard
  – WSO2 DAS supports a gadget generation wizard
  – You can write your own gadgets using D3 and JavaScript

Gadget Generation Wizard
• Starts with data in tabular format
• Map each column to a dimension in your plot, like X, Y, color, point size, etc.
• Create a chart with a few clicks

Powered by the VizGrammar library, which uses Vega underneath (see https://github.com/wso2/VizGrammar)

Communicate: Alerts
▪ Done with CEP queries
▪ Last mile
  - Email, SMS
  - Push notifications to a UI
  - Pager
  - Trigger a physical alarm

Real Life Use Cases
▪ Cisco (OEM the platform with Cisco solutions, Health, Smart Parking)
▪ Experian (Digital Marketing) - see video
▪ Pacific Controls (Smart City Platform, vehicle tracking, building monitoring) - see video
▪ Throttling and Anomaly Detection (by a group of Telco companies)
▪ API Analytics (13+ customers)

“No battle plan survives contact with the enemy” -- Helmuth von Moltke

Key Differentiators
• Open source, under the Apache 2 license
• “Publish data once, analyze it any way you like” experience
• Flexible packaging, or as a scalable cluster
• Rich, extensible, SQL-like configuration language
• Compact, easy-to-learn syntax addressing complex requirements, such as time windows, patterns, and sequences, which would be complex to develop in a programming language such as Java
• Rich set of data connectors, which can be easily extended
• Events only need to be published once from applications to the platform, and can be consumed by the batch or real-time pipeline
• Performance on a single node satisfies 90% of use cases
• Part of the overall WSO2 platform


More Information▪Introducing WSO2 Analytics Platform: Note for Architects, https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/

▪WSO2 Data Analytics Server, http://wso2.com/products/data-analytics-server/

▪WSO2 Complex Event Processor, http://wso2.com/products/complex-event-processor/

▪WSO2 Machine Learner, http://wso2.com/products/machine-learner/


Thank You