real-time machine learning with node.js - philipp burckhardt, carnegie mellon university
TRANSCRIPT
11/14/2016 Machine Learning
http://localhost:3000/#/?export&_k=lv9fld 1/23
REAL-TIME MACHINE LEARNING WITH NODE.JS
PHILIPP BURCKHARDT, Carnegie Mellon University
LEARNING PATTERNS FROM DATA (image: iStock)
REAL-TIME MACHINE LEARNING WITH NODE.JS
TRAINING ALGORITHMS
BATCH: Build model using a batch of available data
INCREMENTAL: Update model as new data comes in
    // NOTE: `randu` (used below) is required on slide lines that are not shown here.
    var incrmean = require( '@stdlib/math/generics/statistics/incrmean' );

    var accumulator = incrmean();

    // For each simulated datum, update the mean...
    for ( var i = 0; i < 100; i++ ) {
        var v = randu() * 100.0;
        accumulator( v );
    }
    // Calling the accumulator without arguments returns the current mean
    var mean = accumulator();

Update estimator as new data comes in...
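To make the difference between batch and incremental training concrete, here is a minimal plain-JavaScript sketch of how such an accumulator might work internally. It is not stdlib's implementation; `createIncrMean` is a hypothetical name, and the sketch only mimics the call pattern used on the slide. The update `mean += ( x - mean ) / n` needs no stored history, and on the same data it agrees with the batch estimate.

```javascript
'use strict';

// Hypothetical helper (assumption, not part of stdlib): returns an
// accumulator that mimics the call pattern shown on the slide.
function createIncrMean() {
    var mean = 0.0;
    var n = 0;
    return function accumulator( x ) {
        if ( arguments.length === 0 ) {
            // No argument: return the current mean (null if no data yet)
            return ( n === 0 ) ? null : mean;
        }
        n += 1;
        // Incremental update: shift the mean toward the new value
        mean += ( x - mean ) / n;
        return mean;
    };
}

var data = [ 2.0, 4.0, 6.0, 8.0 ];

// Incremental: one datum at a time
var acc = createIncrMean();
for ( var i = 0; i < data.length; i++ ) {
    acc( data[ i ] );
}
var incremental = acc();

// Batch: all data at once
var batch = data.reduce( function add( a, b ) {
    return a + b;
}, 0.0 ) / data.length;

console.log( incremental, batch );
```

The incremental version only ever holds two numbers in memory, which is what makes it suitable for streams of unknown length.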
Prediction is very difficult, especially if it's about the future.
- Niels Bohr
DATA ASSUMED TO BE INDEPENDENTLY AND IDENTICALLY DISTRIBUTED (I.I.D.)
Might not hold: e.g., time series are mostly non-stationary
    'use strict';

    var incrmmean = require( '@stdlib/math/generics/statistics/incrmmean' );
    var os = require( 'os' );

    // Moving mean accumulator with a window size of 5
    var accumulator = incrmmean( 5 );

    // Once per second, record the fraction of free memory and update the moving mean
    setInterval( function() {
        var mem = os.freemem() / os.totalmem();
        accumulator( mem );
        var mean = accumulator();
    }, 1000 );

Update moving mean as data comes in...
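A moving mean can be sketched in plain JavaScript with a fixed-size circular buffer. This is an illustration under stated assumptions, not stdlib's implementation: `createMovingMean` is a hypothetical name, and the toy stream with a level shift is made up to show why a window helps when the data are not stationary.

```javascript
'use strict';

// Hypothetical helper (assumption, not stdlib's code): mean of the
// W most recent values, stored in a circular buffer.
function createMovingMean( W ) {
    var buf = [];
    var sum = 0.0;
    var idx = 0; // position of the oldest value once the buffer is full
    return function accumulator( x ) {
        if ( arguments.length === 0 ) {
            return ( buf.length === 0 ) ? null : sum / buf.length;
        }
        if ( buf.length < W ) {
            buf.push( x );
        } else {
            // Drop the oldest value and overwrite it with the new one
            sum -= buf[ idx ];
            buf[ idx ] = x;
            idx = ( idx + 1 ) % W;
        }
        sum += x;
        return sum / buf.length;
    };
}

// Made-up stream whose level shifts from 1 to 10
var stream = [ 1, 1, 1, 10, 10, 10 ];
var moving = createMovingMean( 3 );
var total = 0.0;
for ( var i = 0; i < stream.length; i++ ) {
    moving( stream[ i ] );
    total += stream[ i ];
}
var movingMean = moving();
var overallMean = total / stream.length;

console.log( movingMean, overallMean );
```

After the shift, the windowed estimate has fully adapted to the new level, while the mean over all data is still pulled toward the old one. A smaller window adapts faster but is noisier.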
Moving Means
window size
TYPES OF PROBLEMS
Regression: e.g., house prices [scatter plot]
Classification: e.g., character recognition (OCR) [scatter plot]
Clustering: e.g., movie tastes [scatter plot]
REGRESSION
Model relationship between a numeric dependent variable y and one or more explanatory variables X.
    // ... (the first lines of the model setup are not shown on the slide)
        'loss': 'huber',
        'intercept': true
    });

    registry
        .on( 'package', function onPkg( pkg ) {
            var nVersions = pkg.versions ?
                pkg.versions.length : 0;
            if ( pkg.created ) {
                var current = new Date().getTime();
                var created = new Date( pkg.created );
                // Package age in years
                var age = ( current - created ) /
                    ( 1000 * 60 * 60 * 24 * 365 );
                model.update( [ age ], nVersions );
            }
        })

Use creation date to predict # of versions
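What a call such as `model.update( [ age ], nVersions )` might do can be sketched as one stochastic gradient descent step on the Huber loss. The following is a minimal sketch under assumptions, not the library's implementation: `createOnlineRegression`, the constant learning rate `eta`, and the threshold `delta` are hypothetical, and the noise-free toy stream is made up.

```javascript
'use strict';

// Hypothetical sketch (assumed names; not a library API): online linear
// regression trained by stochastic gradient descent on the Huber loss.
function createOnlineRegression( nFeatures, opts ) {
    var eta = opts.eta;     // learning rate (assumed constant)
    var delta = opts.delta; // residual size at which the loss turns linear
    var w = [];
    var b = 0.0;
    for ( var i = 0; i < nFeatures; i++ ) {
        w.push( 0.0 );
    }

    function predict( x ) {
        var yhat = b;
        for ( var j = 0; j < nFeatures; j++ ) {
            yhat += w[ j ] * x[ j ];
        }
        return yhat;
    }

    function update( x, y ) {
        var r = y - predict( x );
        // Clip the residual so that outliers contribute a bounded gradient
        var g = Math.max( -delta, Math.min( delta, r ) );
        for ( var j = 0; j < nFeatures; j++ ) {
            w[ j ] += eta * g * x[ j ];
        }
        b += eta * g;
    }

    return {
        'predict': predict,
        'update': update,
        'coefs': function coefs() {
            return { 'weights': w.slice(), 'intercept': b };
        }
    };
}

// Made-up, noise-free stream following y = 1 + 2x
var model = createOnlineRegression( 1, { 'eta': 0.1, 'delta': 1.0 } );
var xs = [ 0.0, 0.25, 0.5, 0.75, 1.0 ];
for ( var pass = 0; pass < 1000; pass++ ) {
    for ( var k = 0; k < xs.length; k++ ) {
        model.update( [ xs[ k ] ], 1.0 + ( 2.0 * xs[ k ] ) );
    }
}
var fit = model.coefs();
console.log( fit );
```

Each observation is used once and then discarded, so the model can follow a live stream such as the package registry feed. The clipping is what distinguishes the Huber loss from squared error: a single package with an extreme version count cannot move the line arbitrarily far.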
Number of package versions is positively correlated with age:
[Interactive demo with a Start button; regression line as exported: y = 0.000 + 0.000x]
CLASSIFICATION
Model relationship between a dependent categorical variable y and one or more explanatory variables X.
Predicting a binary outcome
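    // ... (the first lines, including the require statements, are not shown on the slide)
    var model = onlineClassification({
        'lambda': 1e-6,
        'intercept': true,
        'loss': 'log'
    });

    registry.on( 'package', function onPkg( pkg ) {
        // Outcome: +1 if the package mentions React, -1 otherwise
        var usesReact = pkg.mentions( 'react' ) ?
            1 :
            -1;

        // Features: which of these tools the package lists as a dev dependency
        var features = [
            'webpack', 'browserify', 'jest',
            'tape', 'mocha'
        ].map(
            d => pkg.devDependsOn( d ) );

        // Predict before seeing the true label
        var phat = model.predict( features, 'probability' );
        var yhat = phat > 0.5 ? +1 : -1;
        // ... (the remainder of the handler is not shown on the slide)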
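How a classifier with log loss can learn from one observation at a time can be sketched as online logistic regression with labels in {-1, +1}. This is a minimal sketch under assumptions, not the implementation used in the talk: `createOnlineClassifier`, the learning rate `eta`, and the toy data are hypothetical, and features are encoded as 0/1 here.

```javascript
'use strict';

function sigmoid( z ) {
    return 1.0 / ( 1.0 + Math.exp( -z ) );
}

// Hypothetical sketch (assumed names; not a library API): logistic
// regression trained by stochastic gradient descent with an L2 penalty.
function createOnlineClassifier( nFeatures, opts ) {
    var eta = opts.eta;       // learning rate (assumed constant)
    var lambda = opts.lambda; // L2 regularization strength
    var w = [];
    var b = 0.0;
    for ( var i = 0; i < nFeatures; i++ ) {
        w.push( 0.0 );
    }

    function score( x ) {
        var s = b;
        for ( var j = 0; j < nFeatures; j++ ) {
            s += w[ j ] * x[ j ];
        }
        return s;
    }

    // Estimated probability that the label is +1
    function predictProbability( x ) {
        return sigmoid( score( x ) );
    }

    // y must be -1 or +1
    function update( x, y ) {
        // Negative gradient of log( 1 + exp( -y * score ) ) w.r.t. the score
        var g = y * sigmoid( -y * score( x ) );
        for ( var j = 0; j < nFeatures; j++ ) {
            w[ j ] += eta * ( ( g * x[ j ] ) - ( lambda * w[ j ] ) );
        }
        b += eta * g;
    }

    return { 'predictProbability': predictProbability, 'update': update };
}

// Made-up data: the label is +1 exactly when the first feature is 1
var observations = [
    { 'x': [ 1, 0 ], 'y': 1 },
    { 'x': [ 0, 1 ], 'y': -1 },
    { 'x': [ 1, 1 ], 'y': 1 },
    { 'x': [ 0, 0 ], 'y': -1 }
];
var clf = createOnlineClassifier( 2, { 'eta': 0.5, 'lambda': 1e-6 } );
var untrained = clf.predictProbability( [ 1, 0 ] );
for ( var pass = 0; pass < 200; pass++ ) {
    for ( var k = 0; k < observations.length; k++ ) {
        clf.update( observations[ k ].x, observations[ k ].y );
    }
}
var pYes = clf.predictProbability( [ 1, 0 ] );
var pNo = clf.predictProbability( [ 0, 1 ] );
console.log( untrained, pYes, pNo );
```

Before any update the model is maximally uncertain and returns a probability of 0.5. Thresholding the probability at 0.5, as on the slide, turns it into a +1 or -1 prediction.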
React Usage
Packages: paradigm-tags, marketeer, couchcache, datamodel-to-openapi, eaze-request, paradigm-categories, joeljparks-hubot-cosmicjr, paradigm-taxonomies, gulp-controlled-merge-json, ember-cli-addon-tests

         Predicted Yes    Predicted No
Yes      0                4
No       0                34

[Bar chart with one bar per feature: Webpack, Browserify, Jest, Tape, Mocha]
Evaluating regression and classification models
[Plot: x-axis ticks from 500 to 2,500; y-axis ticks from 0.10 to 0.50; a 15% level is marked]
Look at generalization error (performance on data not used for model training).
Our toy model does not do so well: a misclassification rate of 13% might sound great, but always predicting -1 yields 15%!
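The comparison against a trivial baseline can be sketched as follows. The labels and predictions are made up and chosen only so that the two rates match the 13% and 15% quoted above; `misclassificationRate` is a hypothetical helper.

```javascript
'use strict';

// Fraction of held-out observations whose predicted label is wrong
function misclassificationRate( yTrue, yPred ) {
    var errors = 0;
    for ( var i = 0; i < yTrue.length; i++ ) {
        if ( yTrue[ i ] !== yPred[ i ] ) {
            errors += 1;
        }
    }
    return errors / yTrue.length;
}

// Made-up held-out labels: 85 negatives followed by 15 positives
var yTest = [];
var yModel = [];
var yBaseline = [];
for ( var i = 0; i < 100; i++ ) {
    yTest.push( ( i < 85 ) ? -1 : 1 );
    // Hypothetical model: finds only the last 2 of the 15 positives
    yModel.push( ( i < 98 ) ? -1 : 1 );
    // Baseline: always predict the majority class
    yBaseline.push( -1 );
}

var modelRate = misclassificationRate( yTest, yModel );
var baselineRate = misclassificationRate( yTest, yBaseline );
console.log( modelRate, baselineRate );
```

With imbalanced classes, an error rate only means something relative to what a constant prediction achieves, which is why the baseline belongs next to every reported number.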
CLUSTERING
Group observations into meaningful clusters such that objects within a cluster are similar to each other and different from objects assigned to the other clusters.
POPULAR ALGORITHMS
kmeans
dbscan
Hierarchical Clustering
Mixture of Gaussians
[Two 3-D scatter plots with axes Dim. 1, Dim. 2, Dim. 3. Left: "Iris Species" (Iris setosa, Iris versicolor, Iris virginica). Right: "kMeans Clusters" (Cluster 1, Cluster 2, Cluster 3).]
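Of the algorithms listed, k-means is the simplest to sketch: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The version below is a minimal batch sketch with hypothetical names, fixed starting centroids, and made-up two-dimensional data; it is not the code behind the plots.

```javascript
'use strict';

function squaredDistance( a, b ) {
    var d = 0.0;
    for ( var j = 0; j < a.length; j++ ) {
        d += ( a[ j ] - b[ j ] ) * ( a[ j ] - b[ j ] );
    }
    return d;
}

// Index of the centroid closest to the point (ties go to the lower index)
function nearest( point, centroids ) {
    var best = 0;
    for ( var k = 1; k < centroids.length; k++ ) {
        if ( squaredDistance( point, centroids[ k ] ) <
            squaredDistance( point, centroids[ best ] ) ) {
            best = k;
        }
    }
    return best;
}

// Hypothetical helper: k-means with caller-supplied initial centroids
function kmeans( points, initialCentroids, maxIter ) {
    var centroids = initialCentroids.map( function copy( c ) {
        return c.slice();
    });
    var labels = points.map( function init() {
        return -1;
    });
    for ( var iter = 0; iter < maxIter; iter++ ) {
        // Assignment step
        var changed = false;
        for ( var i = 0; i < points.length; i++ ) {
            var k = nearest( points[ i ], centroids );
            if ( k !== labels[ i ] ) {
                labels[ i ] = k;
                changed = true;
            }
        }
        if ( !changed ) {
            break; // converged: no point switched clusters
        }
        // Update step: move each centroid to the mean of its points
        for ( var c = 0; c < centroids.length; c++ ) {
            var sum = centroids[ c ].map( function zero() {
                return 0.0;
            });
            var count = 0;
            for ( var m = 0; m < points.length; m++ ) {
                if ( labels[ m ] === c ) {
                    count += 1;
                    for ( var j = 0; j < sum.length; j++ ) {
                        sum[ j ] += points[ m ][ j ];
                    }
                }
            }
            if ( count > 0 ) {
                for ( var q = 0; q < sum.length; q++ ) {
                    centroids[ c ][ q ] = sum[ q ] / count;
                }
            }
        }
    }
    return { 'centroids': centroids, 'labels': labels };
}

// Made-up data with two well-separated groups
var points = [ [ 1, 1 ], [ 1, 2 ], [ 2, 1 ], [ 8, 8 ], [ 9, 8 ], [ 8, 9 ] ];
var result = kmeans( points, [ [ 0, 0 ], [ 10, 10 ] ], 100 );
console.log( result.labels, result.centroids );
```

In practice the result depends on the starting centroids, so implementations usually try several random initializations; the fixed start here just keeps the sketch deterministic.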
FURTHER RESOURCES
Free Textbooks:
"An Introduction to Statistical Learning" by James, Witten, Hastie & Tibshirani (plus accompanying video lectures)
"The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Hastie, Tibshirani & Friedman
stdlib GitHub repository: https://github.com/stdlib-js/stdlib/tree/develop
THANK YOU!