Machine Learning in the Real World
Srinath Perera, Ph.D.
VP Research, WSO2
Member, Apache Foundation
@srinath_perera
Premise of Big Data
• If you collect data about your business and feed it to a Big Data system, you will find useful insights that will provide competitive advantage
– (e.g. analysis of data sets can find new correlations to “spot business trends, prevent diseases, combat crime and so on”. [Wikipedia])
Predictive Analytics
• Can you “write a program to drive a car?”
Machine learning
• Takes in a lot of examples and builds a program that matches those examples
• We call that program a “model”
• Lots of tools
– R (statistical language)
– scikit-learn (Python)
– Apache Spark’s MLBase and Apache Mahout (Java)
By Michael Shick - Own work, CC BY-SA 4.0
Predictive Analytics in DAS
• Building models
– With the WSO2 Machine Learner product via a wizard (powered by MLlib)
– Build models using R and export them as PMML
• Built models can be used with both WSO2 CEP and ESB
Supported Algorithms
• Deep Learning based classification (H2O’s Stacked Autoencoders Classifier)
• Classification and regression algorithms: Decision Trees, Linear Regression, Lasso Regression, SVM, Naïve Bayes
• K-Means clustering for unsupervised learning on your data
• Anomaly detection using the K-Means algorithm to identify fraud, network penetration and other difficult scenarios
• Recommendations engine (Collaborative Filtering algorithm)
Challenges: Keeping the System Running
• Incorporate continuous data
– Integrate data continuously
– Get feedback about effectiveness of decisions (e.g. accuracy of fraud detection)
• Track and update models
– Trends change
– Generate models in batch mode and update
Predict wait time in the Airport
• Predicting the time to go through the airport
• Real-time updates and events to passengers
• Lets the airport manage wait times by allocating resources
• Implemented using linear regression
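As a rough illustration of the linear regression idea, the sketch below fits a one-variable least-squares line in plain Python. The feature (people in the queue), the numbers, and the function names are hypothetical and are not taken from the airport deployment.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_wait(model, queue_length):
    """Predicted wait in minutes for a given queue length."""
    slope, intercept = model
    return slope * queue_length + intercept


# Hypothetical observations: people in the queue vs. measured wait (minutes).
queue_lengths = [10, 20, 30, 40, 50]
wait_minutes = [7, 12, 17, 22, 27]

model = fit_line(queue_lengths, wait_minutes)
print(predict_wait(model, 60))
```

A real model would likely use several features (time of day, flights departing, open counters) and would be retrained as new measurements arrive.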
People Tracking via BLE
• Track people through BLE via triangulation
• Higher level logic via Complex Event Processing
• Traffic monitoring
• Smart retail
• Airport management
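One common way to turn beacon readings into a position is trilateration from estimated distances. The sketch below shows that technique as an illustration only. It is an assumption, not necessarily the method used in the talk, and the beacon coordinates and distances are made up. Converting BLE signal strength into distances is not shown.

```python
def trilaterate(beacons, distances):
    """Estimate (x, y) from three beacon positions and measured distances."""
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    # Subtracting the first circle equation from the other two removes the
    # quadratic terms and leaves a 2x2 linear system A * [x, y] = b.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("beacons must not lie on one line")
    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical beacon layout (metres) and distances to a person at (3, 4).
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, 65 ** 0.5, 45 ** 0.5]
print(trilaterate(beacons, distances))
```

With noisy real-world distances, a least-squares fit over more than three beacons would be the more robust choice.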
Predict Super Bowl
• Predicted 7 of the 11 games
• Done with the Random Forest Regression algorithm
• Even what we missed is instructive
See Yuda’s post: Predicting the Super Bowl with Machine Learning https://www.youtube.com/watch?v=GirdyHxl_Yk
Predict Defect Probability
• Production line manufacturing equipment
• Expensive to extensively test each piece of equipment
• Instead, use ML to decide which equipment should be tested, using data from Automated Optical Inspection (AOI)
• Used Random Forest over 9 features
• 96% accuracy detecting faulty equipment and 95.9% accuracy identifying the root cause
Image credit Youtube channel (epSos.de)
Predict Promising Customers
• A typical website can get millions of users
• Only a very small fraction converts
• For each user, we know what they access, where they work, country, browser, OS, etc.
• The problem is to predict which users will convert
• Used Logistic Regression, Random Forest, Survival Modeling, etc.
Anomaly Detection: Markov Models
• Can model the probability of a sequence
• Given a sequence, can predict its likelihood and use that to detect anomalies
• Implemented with WSO2 CEP
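The idea can be sketched in plain Python as a first-order Markov chain: estimate transition probabilities from normal sequences, then score a new sequence by its average log-likelihood. This is only an illustration and not the WSO2 CEP implementation. The event names, the smoothing constant, and the threshold are assumptions.

```python
import math
from collections import defaultdict


def train_markov(sequences, smoothing=1.0):
    """Estimate smoothed transition probabilities P(next | current)."""
    states = sorted({s for seq in sequences for s in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    probs = {}
    for prev in states:
        total = sum(counts[prev].values()) + smoothing * len(states)
        probs[prev] = {
            cur: (counts[prev][cur] + smoothing) / total for cur in states
        }
    return probs


def avg_log_likelihood(probs, seq, floor=1e-6):
    """Average log-probability per transition; unseen states use `floor`."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    total = sum(math.log(probs.get(p, {}).get(c, floor)) for p, c in pairs)
    return total / len(pairs)


# Hypothetical "normal" user sessions.
normal_sessions = (
    [["login", "browse", "buy", "logout"]] * 20
    + [["login", "browse", "logout"]] * 20
)
probs = train_markov(normal_sessions)

THRESHOLD = -2.0  # assumed cut-off; in practice tuned on held-out data
typical = ["login", "browse", "buy", "logout"]
unusual = ["login", "buy", "buy", "buy"]
for session in (typical, unusual):
    score = avg_log_likelihood(probs, session)
    print(session, "anomaly" if score < THRESHOLD else "normal")
```

Averaging per transition keeps scores comparable across sequences of different lengths.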
Anomaly Detection: Clustering
• Use clustering to identify normal behavior as clusters
• Consider points away from all clusters as anomalies
• A point is considered away from a cluster if it is outside the 99th percentile line for that cluster
• Included in WSO2 ML
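A minimal sketch of the clustering approach, assuming plain k-means with hand-picked starting centroids and a nearest-rank percentile. It is not the WSO2 ML implementation, and the data points are synthetic.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        groups[idx].append(p)
    return groups


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` are the initial guesses."""
    for _ in range(iterations):
        groups = assign(points, centroids)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, assign(points, centroids)


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def fit_boundaries(centroids, groups, q=0.99):
    """Per-cluster radius covering a fraction q of that cluster's members."""
    return [
        percentile([math.dist(p, c) for p in g], q) if g else 0.0
        for c, g in zip(centroids, groups)
    ]


def is_anomaly(point, centroids, radii):
    """A point is anomalous if it lies outside every cluster boundary."""
    return all(math.dist(point, c) > r for c, r in zip(centroids, radii))


# Synthetic "normal" behaviour: two tight grids around (0, 0) and (5, 5).
grid = [(i * 0.1, j * 0.1) for i in range(-2, 3) for j in range(-2, 3)]
points = grid + [(x + 5.0, y + 5.0) for x, y in grid]

centroids, groups = kmeans(points, [(1.0, 1.0), (4.0, 4.0)])
radii = fit_boundaries(centroids, groups)
print(is_anomaly((2.5, 2.5), centroids, radii))
print(is_anomaly((0.1, 0.1), centroids, radii))
```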
IoT Analytics: Time Series Forecasts
• All data from Internet of Things (IoT) are time series data
• With the explosion of IoT, we will have many time series use cases
• Time series data are different from other data due to autocorrelation
• Several approaches
– Careful feature engineering (e.g. using window operations like moving averages)
– ARIMA
– Rolling window based regression
– Recurrent Neural Networks (RNN)
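The window-based feature engineering step can be sketched as follows: each row pairs the previous `window` readings and their moving average with the next value as the target, so any regression algorithm can then be trained on the rows. The function name, the row layout, and the sample series are hypothetical.

```python
def window_features(series, window):
    """Turn a series into rows of (lag values, moving average, next value)."""
    rows = []
    for end in range(window, len(series)):
        lags = series[end - window:end]
        rows.append({
            "lags": list(lags),
            "moving_avg": sum(lags) / window,
            "target": series[end],
        })
    return rows


# Hypothetical sensor readings.
readings = [1, 2, 3, 4, 5]
for row in window_features(readings, window=3):
    print(row)
```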
IoT Analytics: Spatiotemporal Forecasts
• Most IoT use cases have location data as well
• That is spatiotemporal data, which is different due to spatiotemporal correlation
• Approaches
– Careful feature engineering (e.g. using window operations like moving averages)
– Timebox methods (sliding spatial box)
– RNN
Predictive Maintenance
• Fix the problem before it happens, avoiding expensive downtime
– Airplanes, turbines, windmills
– Construction equipment
– Cars, golf carts
• How
– Predicting useful lifetime
– Build a model for normal operation and compare deviation
– Match against known error patterns
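The "model for normal operation" idea can be illustrated with a very simple profile: the mean and standard deviation of a sensor under normal conditions, with new readings flagged when they deviate by more than a chosen number of standard deviations. This is a sketch only. The sensor values, the function names, and the 3-sigma limit are assumptions.

```python
import statistics


def fit_normal_profile(readings):
    """Summarise normal operation as (mean, sample standard deviation)."""
    return statistics.mean(readings), statistics.stdev(readings)


def deviates(profile, reading, max_sigma=3.0):
    """True if the reading is more than max_sigma deviations from normal."""
    mean, std = profile
    return abs(reading - mean) / std > max_sigma


# Hypothetical temperature readings (degrees C) from a healthy machine.
normal_readings = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 70.0, 69.7]
profile = fit_normal_profile(normal_readings)

print(deviates(profile, 70.1))
print(deviates(profile, 71.5))
```

Real equipment usually needs a richer model of normal behaviour (multiple sensors, operating modes, seasonality), but the comparison step keeps the same shape.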
Summary
• WSO2 Machine Learner
• Use cases
– Predict wait time in the Airport
– People Tracking via BLE
– Predict Super Bowl
– Predict Defect Probability
– Predict Promising Customers
– Anomaly Detection: Markov Models
– Anomaly Detection: Clustering
• Things we are working on
– IoT Analytics: Time Series Forecasts
– IoT Analytics: Spatiotemporal Forecasts
– Predictive Maintenance
• Collaborations are welcome