wise.io meetup at hacker dojo
DESCRIPTION
Slides by http://wise.io from the Silicon Valley Hands On Programming Events meetup group on July 13, 2013.
TRANSCRIPT
Automate Your Workflow
wise.io Machine Learning Tools
Joseph W. Richards, PhD
Co-founder & Chief Scientist, wise.io
@wiseio
Wednesday, July 31, 13
Machine Learning - Set of data-driven statistical models aimed at solving problems with real-world data (large, high-dimensional, messy)
• Focus on prediction problems (e.g., spam vs. not spam) instead of parameter inference (e.g., is the effect significant)
• Computational requirements / constraints are important
• Variety of input data from simple (structured) to complicated text, image, time series & video (unstructured)
• Flexible, non-parametric models are employed instead of simple, parametric models
Drew Conway
APPROACHES TO DATA ANALYSIS
Manual Analysis & Simple Business Rules
• Slow & labor intensive
• Non-optimal
• No statistical guarantees
Basic Analytics
• Automated
• Simple models
• Inaccurate & limited
Machine Learning
• Automated
• Highly accurate
• Adaptable & flexible
• Learn from new data
TYPES OF MACHINE LEARNING
information retrieval - search, indexing, document retrieval
classification - spam/fraud detection, sentiment analysis, ad targeting
regression - stock market prediction, sales prediction, cost forecasting
imputation - data cleaning, inference of missing information
recommendation - product recommendation, recruiting, ‘Netflix prize’
clustering - customer segmentation, product categorization
dimensionality reduction - visualization, manual insight, prediction
outlier detection - anomaly identification, process control
ML USE CASES
fraud detection
ad targeting
sentiment analysis
intelligent sensors
healthcare & genomics
PERVASIVE ML PAIN POINTS
• Long lead time from inception to deployment
• Existing toolkits are buckling under data gravity
• Data scientist scarcity
• Results of prototyping ... large-scale production environments
MACHINE INTELLIGENCE ENGINE
Actionable Insight
Fast, Scalable Machine Learning
Feature Marketplace
Your Data Sources
Beautiful UI
API & Embeddable Models
100x faster
patent-pending
text, images, video, time series, graph
CLOUD-BASED AND ON-PREMISE SOLUTIONS AVAILABLE
Democratizing Machine Intelligence
EXAMPLE WORKFLOW: FRAUD DETECTION
1. Connect historical data (CSV, SQL, MongoDB, S3, Dropbox, etc.)
2. Ask the appropriate question: “how can I predict fraud (yes/no) for new transactions?”
3. Perform feature engineering (e.g., if I have text data, then do NLP)
4. Train an optimized classification model from the historical data to predict fraud as a function of input features
5. Use the optimized model in your production workflow: RESTful APIs (Python, Ruby, Java) or embedded model file
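The five steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions, not the wise.io product or its API: the inline CSV, the `SUSPICIOUS_WORDS` list, and the helper names `featurize`, `train_stump` and `predict` are all hypothetical, and a one-rule decision stump stands in for the optimized classification model.

```python
import csv
import io

# Step 1: connect historical data. A hypothetical inline CSV stands in for a
# real source such as SQL or S3.
HISTORY = """amount,memo,fraud
12.50,coffee shop,no
980.00,wire transfer urgent,yes
43.10,grocery store,no
1500.00,urgent gift cards,yes
27.80,book store,no
760.00,wire transfer overseas,yes
65.00,urgent pharmacy refill,no
"""

SUSPICIOUS_WORDS = {"urgent", "wire", "gift"}  # hypothetical keyword list


def featurize(row):
    """Step 3: turn a raw record into numeric features (toy NLP on the memo)."""
    words = row["memo"].lower().split()
    hits = sum(word in SUSPICIOUS_WORDS for word in words)
    return [float(row["amount"]), float(hits)]


def train_stump(X, y):
    """Step 4: pick the single (feature, threshold) rule with the best accuracy."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            predictions = [x[j] >= threshold for x in X]
            accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
            if best is None or accuracy > best[0]:
                best = (accuracy, j, threshold)
    return best  # (training accuracy, feature index, threshold)


def predict(model, features):
    """Step 5: apply the trained rule to a new transaction."""
    _, j, threshold = model
    return features[j] >= threshold


rows = list(csv.DictReader(io.StringIO(HISTORY)))
X = [featurize(r) for r in rows]
y = [r["fraud"] == "yes" for r in rows]  # Step 2: the yes/no target
model = train_stump(X, y)

new_transaction = {"amount": "1200.00", "memo": "urgent wire transfer"}
is_fraud = predict(model, featurize(new_transaction))
```

In a real deployment the stump would be replaced by a stronger model, and `predict` would sit behind a service endpoint or be shipped as an embedded model file, as step 5 describes.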
WiseRF™ THE ML ENGINE
RANDOM FORESTS
‣ accurate nonlinear algorithm
‣ heterogeneous data: categories, numbers, integers, boolean
‣ versatile
‣ little tuning required
‣ no normalization needed
‣ handle missing data
‣ robust to outliers
WiseRF™
Our fast, memory-efficient and scalable implementation of Random Forest
‣ Faster training: better optimization of models
‣ Faster predictions: smaller time between data collection and decision making
‣ Scalable learning: no need to subsample or approximate
‣ Memory efficient: embedded devices!
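For readers new to the algorithm, the general random forest idea (bootstrap samples, a random subset of features per split, majority vote) can be sketched in a few dozen lines of plain Python. This is a teaching sketch only, not WiseRF™ and not optimized in any way; it assumes numeric features with no missing values, and every function name and the toy data are made up for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, n_candidates, rng):
    """Search a random subset of features for the lowest-impurity split."""
    best = None
    for j in rng.sample(range(len(X[0])), n_candidates):
        # Skip the smallest value so both sides of the split are non-empty.
        for t in sorted({x[j] for x in X})[1:]:
            left = [label for x, label in zip(X, y) if x[j] < t]
            right = [label for x, label in zip(X, y) if x[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, n_candidates, rng, depth=0, max_depth=5):
    """Grow one tree; a leaf stores the majority label of its samples."""
    split = None
    if depth < max_depth and len(set(y)) > 1:
        split = best_split(X, y, n_candidates, rng)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, j, t = split
    left = [i for i, x in enumerate(X) if x[j] < t]
    right = [i for i, x in enumerate(X) if x[j] >= t]
    return (
        j,
        t,
        grow_tree([X[i] for i in left], [y[i] for i in left],
                  n_candidates, rng, depth + 1, max_depth),
        grow_tree([X[i] for i in right], [y[i] for i in right],
                  n_candidates, rng, depth + 1, max_depth),
    )


def predict_tree(tree, x):
    """Walk from the root to a leaf (internal nodes are tuples)."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] < t else right
    return tree


def train_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_candidates = max(1, int(len(X[0]) ** 0.5))  # common sqrt(d) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_candidates, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# Toy, well-separated data (hypothetical): class 0 is small, class 1 is large.
X = [[0, 1], [1, 3], [2, 0], [3, 2], [1, 1], [2, 3],
     [10, 12], [11, 10], [12, 13], [13, 11], [11, 12], [12, 10]]
y = [0] * 6 + [1] * 6
forest = train_forest(X, y)
low_label = predict_forest(forest, [0, 0])
high_label = predict_forest(forest, [20, 20])
```

No feature scaling is applied anywhere in the sketch because splits compare raw values against thresholds, which is the property behind the "no normalization needed" bullet.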
WiseRF™ BENCHMARKS
Digit recognition (MNIST): 45,000 training images, 784 dimensions
Learning on Large Data: 8GB dataset in 99 sec (SVM takes ~1 week)
Prediction on Fast Data: 20 M predictions per second
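As a back-of-the-envelope reading of the figures quoted above, the arithmetic below converts them into per-second and per-prediction rates. It assumes "8GB" means 8 × 10^9 bytes and treats "~1 week" as exactly 7 days; both are assumptions made for the calculation, not statements from the slides.

```python
# Figures quoted on the slide.
dataset_bytes = 8 * 10**9            # assumption: 1 GB = 10^9 bytes
training_seconds = 99
predictions_per_second = 20 * 10**6
svm_seconds = 7 * 24 * 3600          # assumption: "~1 week" taken as 7 days

# Derived rates.
training_mb_per_second = dataset_bytes / training_seconds / 10**6
nanoseconds_per_prediction = 10**9 / predictions_per_second
speedup_over_svm = svm_seconds / training_seconds

print(f"training throughput: {training_mb_per_second:.1f} MB/s")
print(f"time per prediction: {nanoseconds_per_prediction:.0f} ns")
print(f"speedup over SVM:    {speedup_over_svm:.0f}x")
```

Under these assumptions the quoted numbers work out to roughly 81 MB of training data processed per second, 50 ns per prediction, and a training speedup on the order of 6,000x over the SVM figure.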