Machine Learning in the Real World

Srinath Perera, Ph.D.
VP Research, WSO2
Member, Apache Foundation
@srinath_perera


Post on 21-Apr-2017



Premise of Big Data

• If you collect data about your business and feed it to a Big Data system, you will find useful insights that will provide competitive advantage
– e.g. analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on" [Wikipedia]

Predictive Analytics

• Can you "write a program to drive a car"?

Machine Learning

• Takes in a lot of examples and builds a program that matches those examples
• We call that program a "model"
• Lots of tools
– R (statistical language)
– scikit-learn (Python)
– Apache Spark's MLBase and Apache Mahout (Java)

By Michael Shick - Own work, CC BY-SA 4.0

Predictive Analytics in DAS

• Building models
– With the WSO2 Machine Learner product via a wizard (powered by MLlib)
– Build models using R and export them as PMML
• Built models can be used with both WSO2 CEP and ESB

WSO2 Machine Learner

• Upload or select data
• Explore the data
• Train a machine learning model

WSO2 Machine Learner

• Compare results
• Understand why
• Iterate

Supported Algorithms

• Deep learning based classification (H2O's Stacked Autoencoders Classifier)
• Classification algorithms: Decision Trees, Linear Regression, Lasso Regression, SVM, Naïve Bayes
• K-means clustering for unsupervised learning on your data
• Anomaly detection using the K-means algorithm to identify fraud, network penetration and other difficult scenarios
• Recommendations engine (collaborative filtering algorithm)

Who will use your system, and how?

Challenges: Keeping the System Running

● Incorporate continuous data
● Integrate data continuously
● Get feedback about the effectiveness of decisions (e.g. accuracy of fraud detection)
● Track and update models
● Trends change
● Generate models in batch mode and update

Use Cases

Predict Wait Time in the Airport

• Predict the time to go through the airport
• Real-time updates and events to passengers
• Let the airport manage by allocating resources
• Implemented using linear regression
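The slide says only that linear regression was used; it does not list the input features. As a minimal sketch, assume a single hypothetical feature (the number of people in the queue) and made-up wait times, and fit an ordinary least squares line with the standard library. The names `fit_line` and `predict` are illustrative, not part of any WSO2 product.

```python
# Ordinary least squares for a single feature, standard library only.
# The feature (queue length) and every number below are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply the fitted line to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


queue_lengths = [10, 20, 30, 40, 50]          # people waiting (hypothetical)
wait_minutes = [7.0, 12.0, 17.0, 22.0, 27.0]  # observed waits (hypothetical)

model = fit_line(queue_lengths, wait_minutes)
estimate = predict(model, 35)                 # estimated wait for 35 people
```

A real deployment would likely use several features (time of day, flights departing, open counters) and a library implementation; the single-feature version only shows the shape of the approach.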

People Tracking via BLE

• Track people through BLE via triangulation
• Higher-level logic via Complex Event Processing
• Traffic monitoring
• Smart retail
• Airport management
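The slides do not describe how positions were computed. One common way to locate a device from three fixed beacons is trilateration: assume the distance to each beacon has already been estimated (for example from signal strength, which is not shown here), subtract the circle equations pairwise, and solve the resulting 2x2 linear system. The function name `locate` and the beacon layout are hypothetical.

```python
import math

# Trilateration sketch: position from distances to three known beacons.
# Beacon coordinates and the test position are hypothetical.

def locate(beacons, distances):
    """Return (x, y) given three beacon positions and the distance to each."""
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    # Subtracting circle 1 from circles 2 and 3 removes the squared unknowns.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons must not be collinear")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.dist(true_position, b) for b in beacons]
position = locate(beacons, distances)
```

With noisy distance estimates the three circles rarely meet in one point, so a practical system would typically use more beacons and a least-squares fit or a filter over time.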

Predict Super Bowl

• Predicted 7 of the 11 games
• Done with the Random Forest regression algorithm
• Even the ones we missed are instructive

See Yuda's post: Predicting the Super Bowl with Machine Learning https://www.youtube.com/watch?v=GirdyHxl_Yk

Predict Defect Probability

• Production line manufacturing equipment
• Expensive to test each piece of equipment extensively
• Instead, use ML to decide which equipment should be tested, using data from Automated Optical Inspection (AOI)
• Used Random Forest over 9 features
• 96% accuracy detecting faulty equipment and 95.9% accuracy identifying the root cause

Image credit: YouTube channel (epSos.de)

Predict Promising Customers

• A typical website can get millions of users
• Only a very small fraction converts
• For each user, we know what they access, where they work, country, browser, OS, etc.
• The problem is to predict which users will convert
• Used logistic regression, Random Forest, survival modeling, etc.

Anomaly Detection: Markov Models

• Can model the probability of a sequence
• Given a sequence, can predict its likelihood and use that to detect anomalies
• Implemented with WSO2 CEP
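The idea can be sketched outside WSO2 CEP with a first-order Markov chain: count transitions in normal sequences, turn them into probabilities, and flag a new sequence when its average log-likelihood per step is low. The event names, the smoothing constant, and the threshold of -1.0 are all hypothetical choices for illustration, not values from the talk.

```python
import math
from collections import Counter, defaultdict

# First-order Markov chain sketch for sequence anomaly detection.
# Event names, smoothing and threshold are hypothetical.

def train_transitions(sequences, smoothing=1.0):
    """Estimate P(next | previous) with add-one style smoothing."""
    states = sorted({s for seq in sequences for s in seq})
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in states:
        total = sum(counts[prev].values()) + smoothing * len(states)
        probs[prev] = {nxt: (counts[prev][nxt] + smoothing) / total for nxt in states}
    return probs


def avg_log_likelihood(probs, seq, floor=1e-6):
    """Mean log-probability per transition; states never seen in training use `floor`."""
    steps = list(zip(seq, seq[1:]))
    if not steps:
        raise ValueError("sequence needs at least two events")
    total = sum(math.log(probs.get(p, {}).get(n, floor)) for p, n in steps)
    return total / len(steps)


normal_sessions = (
    [["login", "browse", "pay", "logout"]] * 20
    + [["login", "browse", "browse", "pay", "logout"]] * 5
)
model = train_transitions(normal_sessions)

THRESHOLD = -1.0  # hypothetical cut-off; tune on held-out normal data
typical = avg_log_likelihood(model, ["login", "browse", "pay", "logout"])
odd = avg_log_likelihood(model, ["login", "pay", "pay", "pay"])
odd_is_anomaly = odd < THRESHOLD
```

Averaging per transition keeps long and short sequences comparable, which a raw product of probabilities would not.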

Anomaly Detection: Clustering

• Use clustering to identify normal behavior as clusters
• Consider points away from all clusters as anomalies
• A point is considered away from a cluster if it is outside the 99th percentile line for that cluster
• Included in WSO2 ML
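A minimal sketch of the rule above, not the WSO2 ML implementation: run a plain k-means over normal data, record for each cluster the 99th percentile of member distances to the centroid, and flag a point only if it lies beyond that distance for every cluster. The synthetic data, starting centroids, and the nearest-rank percentile are assumptions made for the example.

```python
import math
import random

# K-means plus a per-cluster 99th percentile distance threshold.
# Data, starting centroids and percentile method are hypothetical.

def assign(points, centroids):
    """Group points by nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations from the given starting centroids."""
    for _ in range(iterations):
        groups = assign(points, centroids)
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for c, g in zip(centroids, groups)
        ]
    return centroids, assign(points, centroids)


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def fit_thresholds(centroids, groups, q=99):
    """Per-cluster distance limit; assumes every cluster has at least one point."""
    return [percentile([math.dist(p, c) for p in g], q) for c, g in zip(centroids, groups)]


def is_anomaly(point, centroids, thresholds):
    """A point is anomalous only if it is outside the limit of every cluster."""
    return all(math.dist(point, c) > t for c, t in zip(centroids, thresholds))


rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
normal += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]

centroids, groups = kmeans(normal, [(1.0, 1.0), (9.0, 9.0)])
thresholds = fit_thresholds(centroids, groups)
far_point_flagged = is_anomaly((5.0, 5.0), centroids, thresholds)
near_point_flagged = is_anomaly((0.2, -0.1), centroids, thresholds)
```

By construction roughly 1% of the training points fall outside their own cluster's limit, so the percentile also sets the expected false-alarm rate on normal data.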

A Few Things We Are Working On

Value Proposition

IoT Analytics: Time Series Forecasts

• All data from the Internet of Things (IoT) are time series data
• With the explosion of IoT, we will have many time series use cases
• Time series data are different from other data due to autocorrelation
• Several approaches
– Careful feature engineering (e.g. using window operations like moving averages)
– ARIMA
– Rolling window based regression
– Recurrent Neural Networks (RNN)
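Two of the approaches listed above are easy to sketch with the standard library. `window_features` turns a series into rows of lagged values plus a moving average, which any regular regressor could then consume; `rolling_forecast` fits a least-squares line to the most recent window and extrapolates one step. Both function names, the window sizes, and the sensor readings are hypothetical.

```python
# Window-based feature engineering and a rolling-window regression forecast.
# Function names, window sizes and readings are hypothetical.

def window_features(series, window=3):
    """One row per time step: previous `window` values, their mean, and the target."""
    rows = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        rows.append({
            "lags": list(past),
            "moving_avg": sum(past) / window,
            "target": series[t],
        })
    return rows


def rolling_forecast(series, window=4):
    """Fit a line to the last `window` points and extrapolate one step ahead."""
    ys = series[-window:]
    xs = range(window)
    mean_x = sum(xs) / window
    mean_y = sum(ys) / window
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (window - mean_x)


readings = [20.0, 21.0, 23.0, 22.0, 24.0, 26.0, 28.0, 30.0]  # e.g. sensor values
rows = window_features(readings, window=3)
forecast = rolling_forecast(readings, window=4)
```

Because each row only looks backwards in time, the features can be computed on a live stream without leaking future values into training.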

IoT Analytics: Spatiotemporal Forecasts

• Most IoT use cases have location data as well
• That is spatiotemporal data, which is different due to spatiotemporal correlation
• Approaches
– Careful feature engineering (e.g. using window operations like moving averages)
– Timebox methods (sliding spatial box)
– RNN

Predictive Maintenance

• Fix the problem before it happens, avoiding expensive downtime
– Airplanes, turbines, windmills
– Construction equipment
– Cars, golf carts
• How
– Predict the useful lifetime
– Build a model for normal operation and compare deviation
– Match against known error patterns
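The "model for normal operation" idea can be illustrated in its simplest form: summarise healthy readings by their mean and standard deviation, then raise a flag when several recent readings deviate strongly. This is only a sketch under assumed numbers; the vibration values, the 3-sigma limit, and the three-hit rule are hypothetical, and a real system would model many signals jointly.

```python
import statistics

# Baseline-and-deviation sketch for predictive maintenance.
# Readings, the 3-sigma limit and the hit count are hypothetical.

def fit_baseline(normal_readings):
    """Summarise healthy operation as (mean, sample standard deviation)."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)


def deviation_score(baseline, reading):
    """Distance from the healthy mean, in standard deviations."""
    mean, std = baseline
    return abs(reading - mean) / std


def needs_inspection(baseline, recent, limit=3.0, min_hits=3):
    """Flag the machine when at least `min_hits` recent readings exceed `limit`."""
    hits = sum(1 for r in recent if deviation_score(baseline, r) > limit)
    return hits >= min_hits


healthy_vibration = [0.50, 0.52, 0.49, 0.51, 0.48, 0.50, 0.53, 0.47, 0.50, 0.51]
baseline = fit_baseline(healthy_vibration)

steady = needs_inspection(baseline, [0.50, 0.52, 0.49, 0.51])
drifting = needs_inspection(baseline, [0.55, 0.60, 0.62, 0.65])
```

Requiring several hits rather than one trades a little detection delay for fewer false alarms from single noisy readings.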

Summary

• WSO2 Machine Learner
• Use cases
– Predict wait time in the airport
– People tracking via BLE
– Predict Super Bowl
– Predict defect probability
– Predict promising customers
– Anomaly detection: Markov models
– Anomaly detection: clustering
• Things we are working on
– IoT analytics: time series forecasts
– IoT analytics: spatiotemporal forecasts
– Predictive maintenance
• Collaborations are welcome