WSO2 Analytics Platform: The One Stop Shop for All Your Data Needs

Posted on 15-Apr-2017

Page 1: WSO2 Analytics Platform - The one stop shop for all your data needs

WSO2 Analytics Platform: The One Stop Shop for All Your Data Needs

Sriskandarajah Suhothayan

Associate Director/Architect, WSO2

Anjana Fernando

Senior Technical Lead, WSO2

Page 2

WSO2 Analytics Platform

WSO2 Analytics Platform uniquely combines real-time, interactive, and batch analytics with predictive analytics to turn data from IoT, mobile, and Web apps into actionable insights.

Page 3

WSO2 Analytics Platform

Page 4

WSO2 Data Analytics Server

• Fully open-source solution for building systems and applications that collect and analyze both real-time and persisted data and communicate the results

• Part of the WSO2 Big Data Analytics Platform

• High-performance data capture framework

• Highly available and scalable by design

• Pre-built Data Agents for WSO2 products

Page 5

Case Study: Smart Home

• DEBS (Distributed Event Based Systems) is a premier academic conference, which posts a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)

• Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events

• We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec in distributed throughput.

• The WSO2 CEP-based solution was one of the four finalists (along with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)

• The only generic solution to become a finalist

Page 6

Customer Stories

Experian delivers a digital marketing platform in which CEP plays a key role in analyzing customer behavior in real time and offering targeted promotions. CEP was chosen after careful analysis, primarily for its openness and open-source nature, the fact that support is driven by engineers, and the availability of a complete middleware platform, integrated with CEP, for additional use cases.

Eurecat, the Catalunya innovation center in Spain, uses CEP to analyze data from iBeacons deployed in department stores, either to offer instant rebates to users or to send them help if they seem to be “stuck” in a shop area. They chose WSO2 for its real-time processing, the variety of available IoT connectors, its extensible framework, and its rich configuration language. They also use WSO2 ESB in conjunction with WSO2 CEP.

Pacific Controls is an innovative company delivering an IoT platform of platforms: Galaxy 2021. The platform manages all kinds of devices within a building and makes automated decisions, such as moving an elevator or starting the air conditioning, based on certain conditions. Within Galaxy 2021, CEP is used for monitoring alarms and specific conditions. Pacific Controls also uses other products from the WSO2 platform, such as WSO2 ESB and WSO2 Identity Server.

A leading airline uses CEP to enhance customer experience by calculating the average time to reach the boarding gate (going through security, walking, etc.). They also want to track the time it takes to clean a plane, in order to better streamline the boarding process and notify both the airline and customers about potential delays. They evaluated WSO2 CEP first because they were already using our platform, and adopted it because it addressed all their requirements.

Page 7

Healthcare Data Monitoring

• Enables searching, visualizing, and analyzing healthcare records (HL7) across 20 hospitals in Italy

• Used in combination with WSO2 ESB

• Custom toolbox tailored to the customer’s requirements (to replace an existing system)

Page 8

WSO2 DAS Architecture

Page 9

Data Processing Pipeline

Collect Data

• Define a schema for the data

• Send events to the batch and/or real-time pipeline

• Publish events

Analyze

• Spark SQL for batch analytics

• Siddhi Query Language for real-time analytics

• Predictive models for machine learning

Communicate

• Alerts

• Dashboards

• API

Page 10

Highly Pluggable Event Receiver Architecture

Page 11

Data Model

Data is published conforming to a strongly typed data stream:

{
  "name": "stream.name",
  "version": "1.0.0",
  "nickName": "stream nick name",
  "description": "description of the stream",
  "metaData": [
    {"name": "meta_data_1", "type": "STRING"}
  ],
  "correlationData": [
    {"name": "correlation_data_1", "type": "STRING"}
  ],
  "payloadData": [
    {"name": "payload_data_1", "type": "BOOL"},
    {"name": "payload_data_2", "type": "LONG"}
  ]
}
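A typed stream means every published event must match this schema section by section. The Python sketch below is a hypothetical illustration of that check, not the DAS receiver code; the Python type mapping and the names `conforms` and `validate_event` are assumptions.

```python
import json

# Stream definition in the JSON form shown above (nickName/description omitted).
STREAM_DEF = json.loads("""
{
  "name": "stream.name",
  "version": "1.0.0",
  "metaData": [{"name": "meta_data_1", "type": "STRING"}],
  "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
  "payloadData": [
    {"name": "payload_data_1", "type": "BOOL"},
    {"name": "payload_data_2", "type": "LONG"}
  ]
}
""")

# Assumed mapping from stream attribute types to Python types (illustrative only).
PY_TYPES = {"STRING": str, "BOOL": bool, "INT": int, "LONG": int,
            "FLOAT": float, "DOUBLE": float}


def conforms(value, attr_type):
    """Return True if value matches the declared attribute type."""
    expected = PY_TYPES[attr_type]
    if expected is int and isinstance(value, bool):
        return False  # bool subclasses int in Python, so reject it for INT/LONG
    return isinstance(value, expected)


def validate_event(stream_def, meta, correlation, payload):
    """Check each section of an event, in order, against the typed definition."""
    sections = (("metaData", meta), ("correlationData", correlation),
                ("payloadData", payload))
    for key, values in sections:
        attrs = stream_def.get(key, [])
        if len(values) != len(attrs):
            return False  # wrong number of attributes in this section
        for attr, value in zip(attrs, values):
            if not conforms(value, attr["type"]):
                return False
    return True


print(validate_event(STREAM_DEF, ["host-1"], ["txn-42"], [True, 1500]))    # True
print(validate_event(STREAM_DEF, ["host-1"], ["txn-42"], [True, "1500"]))  # False
```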

Page 12

Data Persistence

• Data Abstraction Layer to enable pluggable data connectors

– RDBMS, Cassandra, HBase, custom

• Analytics Tables

– The data persistence entity in WSO2 Data Analytics Server

– Provide a backend-agnostic way of storing and retrieving data

– Allow applications to be written so that they do not depend on a specific data source API, e.g. JDBC (RDBMS), Cassandra APIs, etc.

– WSO2 DAS provides a standard REST API for accessing Analytics Tables

Page 13

Data Persistence

• Analytics Record Stores

– An Analytics Record Store stores a specific set of Analytics Tables

– Event persistence can be configured to choose which Analytics Record Store stores incoming events

– Analytics Tables share a single namespace; the target record store is given only at table creation time

– Useful when Analytics Tables need to be stored in multiple target databases

Page 14

Interactive Analytics

Page 15

Interactive Analysis

• Full-text data indexing support powered by Apache Lucene

• Drilldown search support

• Distributed data indexing

– Designed to support scalability

• Near-real-time data indexing and retrieval

– Data indexed immediately as received

Page 16

Interactive Analysis

Page 17

Activity Monitoring

• Correlate the messages collected based on the activity_id in the metadata of the event

• Trace the transaction path, whose events may reside in different tables, using Lucene queries
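The correlation step can be pictured as a group-by on `activity_id` followed by a sort on time. This is a minimal in-memory sketch under assumed event shapes (a `meta` dict, a `timestamp`, a `payload`); the table names are hypothetical, and DAS itself does this with Lucene queries over indexed Analytics Tables.

```python
from collections import defaultdict


def trace_activities(tables):
    """Group events stored in several tables by the activity_id in their metadata,
    then order each group by timestamp to recover one transaction path per activity."""
    paths = defaultdict(list)
    for table_name, events in tables.items():
        for event in events:
            activity_id = event["meta"]["activity_id"]
            paths[activity_id].append((event["timestamp"], table_name, event["payload"]))
    # Sort on the timestamp only, so payload dicts are never compared.
    return {aid: sorted(steps, key=lambda step: step[0]) for aid, steps in paths.items()}


tables = {  # hypothetical table names and events
    "ESB_MESSAGES": [
        {"meta": {"activity_id": "a1"}, "timestamp": 30, "payload": {"step": "mediate"}},
        {"meta": {"activity_id": "a2"}, "timestamp": 10, "payload": {"step": "mediate"}},
    ],
    "APP_SERVER_MESSAGES": [
        {"meta": {"activity_id": "a1"}, "timestamp": 10, "payload": {"step": "request"}},
    ],
}
paths = trace_activities(tables)
print([(ts, table) for ts, table, _ in paths["a1"]])
# [(10, 'APP_SERVER_MESSAGES'), (30, 'ESB_MESSAGES')]
```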

Page 18

Activity Explorer

Page 19

Batch Analytics

Page 20

Batch Analytics

● Powered by Apache Spark, with up to 30x higher performance than Hadoop

● Parallel and distributed, with optimized in-memory processing

● Scalable script-based analytics written in an easy-to-learn, SQL-like query language powered by Spark SQL

● Interactive built-in web interface for ad-hoc query execution

● Scheduled query script execution with high availability/failover (HA/FO) support

● Run Spark on a single node or a Spark-embedded Carbon server cluster, or connect to an external Spark cluster

Page 21

Batch Analytics

Page 22

Batch Analytics

Page 23

Communicate: Dashboards

● The idea is to give the “overall idea” at a glance (e.g. a car dashboard)

● Support for personalization: you can build your own dashboard

● Also the entry point for drill-down

● How to build?

○ Dashboard via Google Gadget and content via HTML5 + JavaScript

○ Use WSO2 User Engagement Server to build a dashboard (or JSP/PHP)

○ Use charting libraries like Vega or D3

Page 24

Gadget Generation Wizard

● Start with data in tabular format

● Map each column to a dimension in your plot, like X, Y, color, point size, etc.

● Also do drill-downs

● Create a chart with a few clicks

Page 25

Realtime Analysis

Page 26

What’s Realtime Analytics?...

Realtime Analytics in Complex Event Processor

• Gather data from multiple sources

• Correlate data streams over time

• Find interesting occurrences

• And notify

• All in real time!

Page 27

Market Recognition

• Named as a Strong Performer in The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.

• Highest possible score in the 'Acquisition and Pricing' criterion, and among the second-highest scores in the 'Ability to Execute' criterion

• The Forrester report notes:

“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one components such as application servers, message brokers, enterprise service bus, and many others.

Its streaming analytics solution follows the complex event processor architectural approach, so it provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will find a place for WSO2 on their shortlist as well.”

Page 28

What is WSO2 CEP?

Page 29

Event Flow of WSO2 CEP

Page 30

Realtime Execution

• Process in streaming fashion (one event at a time)

• Execution logic written as Execution Plans

• Execution Plan

– An isolated logical execution unit

– Includes a set of queries and relates to multiple input and output event streams

– Executed using a dedicated WSO2 Siddhi engine

Page 31

Realtime Processing Patterns

• Transformation - project, translate, enrich, split

• Filter

• Composition / Aggregation / Analytics

– basic stats, group by, moving averages

• Join multiple streams

• Detect patterns

– Coordinating events over time

– Trends – increasing, decreasing, stable, non-increasing, non-decreasing, mixed

• Integrate with historical data

Page 32

Siddhi Query Structure

define stream <event stream> (<attribute> <type>, <attribute> <type>, ...);

from <event stream>
select <attribute>, <attribute>, ...
insert into <event stream>;

Page 33

Siddhi Query ...

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

from SoftDrinkSales
select brand, quantity
insert into OutputStream;

Output streams are inferred:

define stream OutputStream (brand string, quantity int);
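For a query that only projects attributes, inferring the output stream amounts to looking up each selected attribute's type in the input definition. A small Python sketch of that idea follows; `infer_output_schema` and `to_define` are hypothetical helpers, and real inference also has to type expressions and aggregates, which this ignores.

```python
def infer_output_schema(input_schema, select_attrs):
    """Derive the output stream's attributes from a projection-only select clause.
    input_schema is an ordered list of (attribute, type) pairs."""
    types = dict(input_schema)
    return [(attr, types[attr]) for attr in select_attrs]


def to_define(stream_name, schema):
    """Render a schema as a Siddhi 'define stream' statement."""
    attrs = ", ".join(f"{name} {typ}" for name, typ in schema)
    return f"define stream {stream_name} ({attrs});"


soft_drink_sales = [("region", "string"), ("brand", "string"),
                    ("quantity", "int"), ("price", "double")]
output_schema = infer_output_schema(soft_drink_sales, ["brand", "quantity"])
print(to_define("OutputStream", output_schema))
# define stream OutputStream (brand string, quantity int);
```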

Page 34

Siddhi Query ...

Enriching streams using functions:

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

from SoftDrinkSales
select brand, avg(price * quantity) as avgCost, 'USD' as currency
insert into AvgCostStream;

from AvgCostStream
select brand, toEuro(avgCost) as avgCost, 'EURO' as currency
insert into OutputStream;
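Because neither query declares a window or a `group by`, our reading is that `avg(price*quantity)` is a running average over all events seen so far, across all brands, with `brand` taken from the latest event. The sketch below emulates that chain in Python; `toEuro` is not defined in the slide, so it is modeled here as a multiplication by an assumed fixed `usd_to_eur` rate.

```python
def enrich(events, usd_to_eur):
    """events: (brand, quantity, price) tuples in arrival order.
    Stage 1 (AvgCostStream): running avg(price * quantity) over every event so far,
    since the query has no window and no group by.
    Stage 2 (OutputStream): convert that average with a hypothetical fixed rate."""
    total = 0.0
    count = 0
    output = []
    for brand, quantity, price in events:
        total += price * quantity
        count += 1
        avg_cost_usd = total / count                # avgCost, currency 'USD'
        avg_cost_eur = avg_cost_usd * usd_to_eur    # stand-in for toEuro(avgCost)
        output.append((brand, avg_cost_eur, "EURO"))
    return output


print(enrich([("cola", 2, 1.0), ("lemonade", 4, 2.0)], usd_to_eur=0.5))
# [('cola', 1.0, 'EURO'), ('lemonade', 2.5, 'EURO')]
```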

Page 35

Siddhi Query (Filter & Window) ...

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

Filtering:

from SoftDrinkSales[region == 'USA' and quantity > 99]
select brand, price, quantity
insert into WholeSales;

Aggregation over 1 hour:

from SoftDrinkSales#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
insert into LastHourSales;

Other supported window types: timeBatch(), length(), lengthBatch(), etc.
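The hourly aggregation keeps only events from the last hour and recomputes the per-group average as events arrive and expire. Below is a minimal Python sketch of a sliding time window with `group by`, assuming millisecond timestamps; it checks for expiry only when a new event arrives, whereas Siddhi also expires events on a timer, so treat the exact boundary behavior as an assumption. The class name `LastHourSales` is illustrative.

```python
from collections import defaultdict, deque

WINDOW_MS = 60 * 60 * 1000  # 1 hour, as in #window.time(1 hour)


class LastHourSales:
    """Sliding time window with avg(quantity) grouped by (region, brand)."""

    def __init__(self, window_ms=WINDOW_MS):
        self.window_ms = window_ms
        self.events = deque()          # (timestamp_ms, region, brand, quantity)
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)

    def on_event(self, ts, region, brand, quantity):
        # Expire events that are a full window older than the arriving event.
        while self.events and self.events[0][0] <= ts - self.window_ms:
            _, old_region, old_brand, old_qty = self.events.popleft()
            self.sums[(old_region, old_brand)] -= old_qty
            self.counts[(old_region, old_brand)] -= 1
        self.events.append((ts, region, brand, quantity))
        key = (region, brand)
        self.sums[key] += quantity
        self.counts[key] += 1
        # Emit the updated average for the arriving event's group.
        return region, brand, self.sums[key] / self.counts[key]


window = LastHourSales()
print(window.on_event(0, "USA", "cola", 10))                # ('USA', 'cola', 10.0)
print(window.on_event(1_000, "USA", "cola", 20))            # ('USA', 'cola', 15.0)
print(window.on_event(WINDOW_MS + 500, "USA", "cola", 30))  # ('USA', 'cola', 25.0)
```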

Page 36

Siddhi Query (Pattern) ...

define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 10]) ->
     a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
     within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;
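In the pattern, `every` starts a new partial match for each small purchase, and `->` lets unrelated events occur in between before the large purchase on the same card arrives. A minimal Python sketch of that matching logic follows, assuming millisecond timestamps and an inclusive one-day limit; the card numbers and places are made-up sample data.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 'within 1 day'


def detect_fraud(purchases, within_ms=DAY_MS):
    """purchases: (timestamp_ms, price, card_no, place) tuples in arrival order.
    Every purchase under 10 opens a partial match (the 'every' keyword); a later
    purchase over 10000 on the same card within the limit completes it."""
    pending = []  # (timestamp_ms, card_no) of a1 events waiting for an a2
    alerts = []
    for ts, price, card_no, place in purchases:
        # Discard partial matches whose 'within' limit has passed.
        pending = [(t, c) for t, c in pending if ts - t <= within_ms]
        unmatched = []
        for t, c in pending:
            if price > 10000 and c == card_no:
                alerts.append((c, price, place))  # a1.cardNo, a2.price, a2.place
            else:
                unmatched.append((t, c))
        pending = unmatched
        if price < 10:
            pending.append((ts, card_no))  # start a new a1 instance
    return alerts


purchases = [
    (0, 5.0, 1111, "Colombo"),         # a1 candidate
    (1_000, 50.0, 1111, "Colombo"),    # ignored: neither small nor large
    (2_000, 20000.0, 2222, "London"),  # large, but a different card
    (3_000, 15000.0, 1111, "London"),  # completes the match for card 1111
]
print(detect_fraud(purchases))  # [(1111, 15000.0, 'London')]
```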

Page 37

Siddhi Query (Trends & Partition) ...

define stream StockStream (symbol string, price double, volume int);

partition with (symbol of StockStream)
begin
    from t1 = StockStream,
         t2 = StockStream[(t2[last] is null and t1.price < price) or
                          (t2[last].price < price)]+
         within 5 min
    select t1.price as initialPrice, t2[last].price as finalPrice, t1.symbol
    insert into IncreasingMyStockPriceStream;
end;
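Within each partition, the comma makes this a sequence: after `t1`, each `t2` must be the next event for that symbol and priced above the previous one. The Python sketch below approximates this by tracking one rising run per symbol and reporting `(initialPrice, finalPrice, symbol)` when the run breaks; exactly when Siddhi emits for a `+` sequence is not stated in the slide, so that emission rule is an assumption.

```python
FIVE_MIN_MS = 5 * 60 * 1000  # 'within 5 min'


def increasing_trends(ticks, within_ms=FIVE_MIN_MS):
    """ticks: (timestamp_ms, symbol, price) in arrival order.
    State is kept per symbol (the partition). A run starts at t1; each following
    tick must be priced above the previous one (t2+). When the run breaks, it is
    reported if it rose at least once and stayed within the limit measured from t1."""
    runs = {}  # symbol -> (start_ts, initial_price, last_price, rises)
    results = []

    def close(symbol):
        _, initial, last, rises = runs.pop(symbol)
        if rises > 0:
            results.append((initial, last, symbol))

    for ts, symbol, price in ticks:
        run = runs.get(symbol)
        if run and price > run[2] and ts - run[0] <= within_ms:
            runs[symbol] = (run[0], run[1], price, run[3] + 1)  # trend continues
            continue
        if run:
            close(symbol)  # trend broken (or too old): report it if it rose
        runs[symbol] = (ts, price, price, 0)  # this tick may start a new trend
    for symbol in list(runs):
        close(symbol)  # flush runs still open at the end of the input
    return results


ticks = [(0, "IBM", 10.0), (1, "WSO2", 5.0), (2, "IBM", 11.0),
         (3, "IBM", 12.0), (4, "WSO2", 4.0), (5, "IBM", 9.0)]
print(increasing_trends(ticks))  # [(10.0, 12.0, 'IBM')]
```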

Page 38

Siddhi Query (Table) ...

define table CardUserTable (name string, cardNum long);

@from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long);

Cache types supported:

• Basic: A size-based algorithm based on FIFO.

• LRU (Least Recently Used): The least recently used event is dropped when the cache is full.

• LFU (Least Frequently Used): The least frequently used event is dropped when the cache is full.

Supported table types: RDBMS, In-Memory, Analytics Table, Hazelcast
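As an illustration of the LRU policy described above, here is a generic Python sketch of a size-bounded cache in front of a table lookup; it is not the CEP's own cache. A Basic (FIFO) cache would simply skip the `move_to_end` call in `get`, and an LFU cache would track hit counts instead of recency.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded cache: when full, the least recently used entry is dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                # cache miss: the caller would query the table
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put(1111, "alice")
cache.put(2222, "bob")
cache.get(1111)           # touch 1111, so 2222 becomes least recently used
cache.put(3333, "carol")  # cache is full: evicts 2222
print(cache.get(2222))    # None
```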

Page 39

Siddhi Query (Table) ...

define stream Purchase (price double, cardNo long, place string);
define stream CardUserStream (name string, cardNo long);
define table CardUserTable (name string, cardNum long);

from Purchase#window.length(1) join CardUserTable
    on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
insert into PurchaseUserStream;

from CardUserStream
select name, cardNo as cardNum
update CardUserTable
    on CardUserTable.name == name;

Similarly, insert into and delete are also supported.
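The join and update can be emulated with a list of rows standing in for `CardUserTable`. In this Python sketch (the function names are hypothetical), each purchase is joined once on arrival, mirroring `#window.length(1)`, and the table never triggers output on its own; `update_table` changes matching rows but does not insert missing ones.

```python
def join_purchases(purchases, card_user_table):
    """purchases: (price, card_no, place). Join each purchase with table rows
    where cardNo == cardNum, projecting cardNo, name, and price."""
    out = []
    for price, card_no, place in purchases:
        for row in card_user_table:
            if row["cardNum"] == card_no:
                out.append({"cardNo": card_no, "name": row["name"], "price": price})
    return out


def update_table(card_user_table, name, card_no):
    """Emulate 'update CardUserTable on CardUserTable.name == name':
    matching rows get the new cardNum; nothing is inserted if no row matches."""
    for row in card_user_table:
        if row["name"] == name:
            row["cardNum"] = card_no


card_user_table = [{"name": "alice", "cardNum": 1111}, {"name": "bob", "cardNum": 2222}]
print(join_purchases([(50.0, 1111, "Colombo")], card_user_table))
# [{'cardNo': 1111, 'name': 'alice', 'price': 50.0}]
update_table(card_user_table, "alice", 9999)  # a CardUserStream event (alice, 9999)
print(join_purchases([(50.0, 1111, "Colombo")], card_user_table))  # []
```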

Page 40

Siddhi Query (Extension) ...

• Function extension

• Aggregator extension

• Window extension

• Stream Processor extension

define stream SalesStream (brand string, price double, currency string);

from SalesStream
select brand, custom:toUSD(price, currency) as priceInUSD
insert into OutputStream;

Extensions are referred to with namespaces (here, custom:toUSD).

Page 41

Siddhi Extensions

• geo: Geographical processing

• nlp: Natural language processing (with Stanford NLP)

• ml: Running machine learning models of WSO2 Machine Learner

• pmml: Running PMML models learned in R

• timeseries: Regression and time series

• math: Mathematical operations

• str: String operations

• regex: Regular expressions

• ...

Page 42

Event Publisher

*Supports custom event publishers via its pluggable architecture!

Page 43

Realtime Dashboard

• Dashboard

– Google Gadget

– HTML5 + JavaScript

• Supports gadget generation

– Using D3 and Vega

• Gathers data for the UI via

– WebSockets

– Polling

• Supports custom gadgets and dashboards

Page 44

Predictive Analysis

Page 45

What’s Predictive Analytics?...

Predictive Analytics in Machine Learner

• Extract, pre-process, and explore data

• Create models, tune algorithms, and make predictions

• Integrate for better intelligence

Page 46

Predictive Analytics

• Guided UI to build machine learning models via

– Apache Spark MLlib

– H2O.ai (for deep learning algorithms)

– R, exporting the models as PMML

• Run models using CEP, DAS, and ESB

• Run R scripts, regression, and anomaly detection in real time

Page 47

Terminology

• Input data must be in a tabular format

• Each row is called a data point

• Each column is called a feature

• The value you are going to predict is called the “response variable”
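The terminology maps directly onto a table in code. Here is a short Python illustration using three rows in the format of the Iris dataset that appears later in the deck; the column names are assumptions.

```python
# Each row is a data point, each column is a feature, and the "species" column
# is the response variable we want to predict.
header = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
rows = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
]

response = "species"
response_idx = header.index(response)
feature_names = [h for h in header if h != response]

# Split every data point into its feature vector (X) and its response value (y).
X = [[v for i, v in enumerate(row) if i != response_idx] for row in rows]
y = [row[response_idx] for row in rows]

print(feature_names)  # ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
print(y)              # ['setosa', 'versicolor', 'virginica']
```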

Page 48

WSO2 ML Overview

Page 49

Guided process

Page 50

An insight into data

Page 51

Data Exploration

Page 52

Supported Algorithms

Page 53

Machine Learning Pipeline

Page 54

Evaluate built models

Page 55

Prediction in Real-time

Page 56

Predicting the Big Game!

Page 57

http://wso2.com/landing/big-data-game/

Page 58

DAS Highly Available Clustered Setup

Page 59

WSO2 CEP (Realtime) Scalability

Distributed Realtime = Siddhi + Apache Storm

Advantages over Apache Storm

• No need to write Java code (supports an SQL-like query language)

• No need to start from basic principles (supports a high-level language)

• Changes can be adopted quickly

• Govern artifacts using Toolboxes

• etc.

Page 60

How do we scale?

Page 61

Siddhi QL - distributed

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query')
@dist(parallel = '3')
from StockStream[price > 75]
select *
insert into HighPriceStockStream;

@name('Window Query')
@dist(parallel = '2')
partition with (symbol of HighPriceStockStream)
begin
    from HighPriceStockStream#window.time(10 min)
    select symbol, sum(volume) as sumVolume
    insert into ResultStockStream;
end;
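Partitioning by `symbol` means every event for a given symbol must reach the same instance, so its window state stays in one place, while the stateless filter can run on any instance. The Python sketch below illustrates that with a stable CRC32 hash; the hashing scheme is an assumption rather than how WSO2 CEP or Storm actually route events, and the 10-minute window is omitted to keep the routing idea visible.

```python
import zlib


def route(symbol, parallelism):
    """Choose an instance for a partition key with a stable hash (CRC32), so every
    event carrying the same symbol is always sent to the same instance."""
    return zlib.crc32(symbol.encode("utf-8")) % parallelism


def run_distributed(events, parallelism=2):
    """events: (symbol, volume, price). Apply the stateless filter (price > 75),
    route survivors by symbol, and keep a per-symbol volume sum on each instance."""
    instances = [dict() for _ in range(parallelism)]  # per-instance partition state
    for symbol, volume, price in events:
        if price > 75:  # filter stage: any instance could evaluate this
            state = instances[route(symbol, parallelism)]
            state[symbol] = state.get(symbol, 0) + volume  # sum(volume) per symbol
    return instances


events = [("IBM", 100, 80.0), ("WSO2", 50, 90.0), ("IBM", 20, 70.0),
          ("IBM", 30, 76.0), ("ORCL", 10, 100.0)]
for i, state in enumerate(run_distributed(events)):
    print(i, state)
```

Because the routing depends only on the symbol, no two instances ever hold partial sums for the same symbol, which is what lets the partitioned query scale out without merging state.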

Page 62

Distributed Execution on Storm UI

Page 63

WSO2 ML (Predictive Analytics) Deployment

Page 64

Iris Dataset

Classes: setosa, versicolor, virginica

Page 65

Analytics for Products

Core:

• Analytics for Products distributions:

– Analytics ESB

– Analytics IoTS

– Analytics IS

– etc.

Page 66

Thank You!

#WSO2ConEU

Share your feedback for this session

wso2con.com/app