Workshop on Machine Learning
![Page 1: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/1.jpg)
Machine Learning
what, why and how
![Page 2: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/2.jpg)
me ...
Harshad
➔ Senior Data Scientist @ Sokrati
➔ Spent the last 4 years trying to understand and apply machine learning
Sokrati is a digital advertising startup based out of Pune
![Page 3: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/3.jpg)
what are we going to do ?
![Page 4: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/4.jpg)
get a 10,000-foot view ...
![Page 5: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/5.jpg)
then go to specifics ...
![Page 6: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/6.jpg)
10,000-foot view
![Page 7: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/7.jpg)
what is ML ?
➔ Too many definitions!
➔ Too much debate
➔ Analytics vs ML vs data mining vs Big Data vs Statistics vs next buzzword in the market
![Page 8: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/8.jpg)
what is ML ?
Teaching machines to take decisions with the help of data
practical man’s definition of machine learning!
![Page 9: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/9.jpg)
bit of history ...
![Page 10: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/10.jpg)
That’s too ancient!
That’s not ML...
![Page 11: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/11.jpg)
bit of (relevant) history ...
➔ Insurance, Banking industry
  ◆ Credit scoring
  ◆ Mathematical models in finance
➔ Artificial Intelligence and other fancy ideas
  ◆ Deep Blue, Samuel’s checkers machine
  ◆ IBM Watson computer
![Page 12: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/12.jpg)
Find ‘f’ => endeavour to understand the world!
g( y ) = f( X )
![Page 13: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/13.jpg)
the two cultures in ML
Stats Culture
Vs
AI / ML Culture
![Page 14: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/14.jpg)
the two cultures in ML
Stats Culture
➔ Focus on ‘why this model’
➔ Goodness of fit, hypothesis tests, residuals
➔ or MCMC methods, Bayesian modeling
➔ regression models, survival analysis
AI / ML Culture
➔ Focus on ‘good predictions’
➔ Cross validations, ensemble of models
➔ Focus on underfit vs overfit analysis
➔ neural nets, tree based models (random forests et al.)
![Page 15: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/15.jpg)
but let’s build bridges
Stats
➔ Focus on basics, sound theory
➔ Exploration, summaries
➔ Models
ML / AI
➔ Focus on predictions
➔ Model evaluation
➔ Feature selection
Computations
➔ Focus on application
➔ Achieving scale and usability
➔ Hadoop, Storm and friends..
Business Knowledge
➔ Focus on interpretation
➔ Visualizations
➔ Creating stories out of data
![Page 16: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/16.jpg)
typical ML process
➔ Objective
➔ Source data
➔ Explore
➔ Model
➔ Evaluate
➔ Apply
➔ Validate
![Page 17: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/17.jpg)
objective
in brief!
![Page 18: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/18.jpg)
bottom line: state the objective in terms of outcome, not algorithm
probability of customer churn
group set of emails by topic
predict rainfall
recommend item to a consumer to increase likelihood of click
![Page 19: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/19.jpg)
sourcing data
to be covered at end!
![Page 20: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/20.jpg)
explore
R you ready ?
![Page 21: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/21.jpg)
introduction to R
➔ Starting R
➔ Data Structures
  ◆ Atomic vectors, matrix
  ◆ Lists
  ◆ Data frames
➔ Data types
  ◆ usual suspects (integer, double, character)
  ◆ Factors
  ◆ logical
![Page 22: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/22.jpg)
data frame, the workhorse
➔ Load sample data frame
➔ Explore data frame (head, tail)
➔ Access elements by index
  ◆ access rows
  ◆ access columns
    ● single
    ● multiple (by name, by index, by -ve index)
➔ Find metadata
  ◆ names
  ◆ dimensions
➔ Explore using plot (pairs)
![Page 23: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/23.jpg)
broadcasting / vectorization
➔ Very important concept
➔ Subsetting
  ◆ vectors
  ◆ data frames
➔ Applying operations
  ◆ operations on entire column
![Page 24: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/24.jpg)
data exploration
➔ summarizing using mean
➔ quantiles, when mean is not enough
  ◆ outlier detection
➔ functional roots of R : sapply summary
➔ summary function
  ◆ on numerical
  ◆ on factors
➔ plots (basic)
➔ histograms
➔ correlations
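The workshop does this exploration in R. As a rough stand-in, the sketch below uses only Python's standard `statistics` module to show why quantiles matter when the mean is not enough. The click counts are made up, and the 1.5 × IQR rule is a common rule of thumb assumed here, not necessarily the method used in the session.

```python
import statistics

# Hypothetical sample: daily ad clicks, with one suspiciously large value.
clicks = [12, 15, 11, 14, 13, 16, 12, 95]

# The mean is pulled upward by the single large value; the median is not.
mean_clicks = statistics.mean(clicks)
median_clicks = statistics.median(clicks)

# quantiles(..., n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(clicks, n=4)
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR outside the quartiles.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in clicks if x < low or x > high]

print(f"mean={mean_clicks} median={median_clicks}")
print(f"quartiles={q1}, {q2}, {q3}")
print(f"outliers={outliers}")
```

Comparing the mean with the median is often the quickest hint that a column deserves a closer look.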
![Page 25: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/25.jpg)
models
simple & real world
![Page 26: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/26.jpg)
basic types of models
➔ Supervised learning
➔ Unsupervised learning
➔ Semi-supervised learning
➔ Reinforcement learning
![Page 27: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/27.jpg)
linear regression
➔ load sample dataset (cars)
➔ build linear regression model
➔ understand the output
  ◆ summary
  ◆ plots
➔ understanding train vs predict cycle
  ◆ most important idea!
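The train vs predict cycle can be sketched without any library: fit the model once on known data, then reuse the fitted parameters on inputs the model has never seen. This is a minimal illustration in plain Python, assuming a single predictor and ordinary least squares. The numbers are made up and are not the `cars` dataset, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Train step: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, xs):
    """Predict step: apply the fitted parameters to new inputs."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


# Made-up training data (roughly y = 2x + 1 with a little noise).
train_x = [1, 2, 3, 4]
train_y = [3.1, 4.9, 7.1, 8.9]

model = fit_line(train_x, train_y)      # train once
new_x = [5, 6]                          # unseen inputs
predictions = predict(model, new_x)     # predict many times

print("intercept, slope:", model)
print("predictions:", predictions)
```

Keeping the fitted parameters separate from the prediction function is what lets a model trained in the lab be applied later to fresh data.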
![Page 28: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/28.jpg)
demo on real world dataset
data exploration, classification
![Page 29: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/29.jpg)
hierarchical clustering
➔ basic idea of clustering
  ◆ distance as a proxy for similarity, group by distance
  ◆ group anything as long as distance can be calculated
➔ load and explore eurodist data
➔ fit hierarchical cluster
➔ plot dendrogram
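The idea of grouping by distance can be sketched in a few lines of plain Python. This is a toy agglomerative clustering with single linkage, one of several possible linkage rules, and it stands in for the R demo rather than reproducing it. The items `A` to `D` and their distances are invented, not taken from `eurodist`, and `single_linkage` is a hypothetical helper name. The returned merge list is the information a dendrogram draws.

```python
from itertools import combinations


def single_linkage(names, dist):
    """Repeatedly merge the two closest clusters; return merges in order.

    dist maps frozenset({a, b}) to the distance between items a and b.
    """
    clusters = [frozenset([name]) for name in names]
    merges = []

    def gap(pair):
        # Single linkage: cluster distance is the smallest member distance.
        a, b = pair
        return min(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Invented pairwise distances between four items.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 1,
    frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 6,
    frozenset(("A", "D")): 7,
    frozenset(("B", "C")): 5,
    frozenset(("B", "D")): 8,
}

merges = single_linkage(names, dist)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

Only pairwise distances are needed, which is why anything with a computable distance can be clustered this way.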
![Page 30: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/30.jpg)
demo on real world data
and the most important idea!
![Page 31: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/31.jpg)
concept of vector space model
➔ Words as axes
➔ Bag of words defines vector space
➔ Document as a point in space
➔ We can
  ◆ define distance
  ◆ measure similarity (cosine similarity)
  ◆ group documents
➔ what can be a document ?
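A minimal sketch of the vector space model, assuming whitespace tokenization and raw word counts as coordinates (real pipelines usually add weighting such as TF-IDF). The three short texts are invented examples, and `bag_of_words` and `cosine_similarity` are hypothetical helper names.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a document into a sparse vector: word -> count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented documents; anything made of tokens could play this role.
docs = {
    "ad1": "cheap flights to pune",
    "ad2": "cheap hotels in pune",
    "ad3": "learn machine learning online",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}

sim_12 = cosine_similarity(vectors["ad1"], vectors["ad2"])
sim_13 = cosine_similarity(vectors["ad1"], vectors["ad3"])
print("ad1 vs ad2:", sim_12)
print("ad1 vs ad3:", sim_13)
```

Because similarity is just a number between vectors, `1 - similarity` can serve as the distance for the grouping step.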
![Page 32: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/32.jpg)
evaluation
![Page 33: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/33.jpg)
evaluation metrics
➔ depends on type of model
  ◆ regression : MAPE, MSSE
  ◆ classification : accuracy, precision, recall, F score
  ◆ clustering : within vs between variance
➔ ML world (ref : two cultures) has a much better story
➔ Not enough to perform well on training set
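The classification metrics and MAPE listed above are short enough to compute by hand. The sketch below does so in plain Python on made-up labels and values; it assumes binary labels with `1` as the positive class, and the function names are hypothetical.

```python
def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Made-up labels for a classifier and made-up values for a regressor.
metrics = classification_metrics(
    actual=[1, 1, 1, 0, 0, 0, 0, 1],
    predicted=[1, 0, 1, 0, 1, 0, 0, 1],
)
error_pct = mape(actual=[100, 200, 400], predicted=[110, 180, 400])

print(metrics)
print("MAPE (%):", error_pct)
```

These numbers only mean something when computed on data the model was not trained on.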
![Page 34: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/34.jpg)
brief intro to regularization
➔ Bias vs Variance problem
➔ We want to be ‘just right’
➔ Concept of regularization
➔ Intro to cross validation
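One way to look for ‘just right’ is to try several regularization strengths and score each on held-out folds. The sketch below assumes a one-predictor ridge regression (the penalty `lam` shrinks the slope) and a simple deterministic k-fold split. The data are made up, and `fit_ridge` and `k_fold_mse` are hypothetical helpers, not the scikit-learn API shown in the demo.

```python
def fit_ridge(xs, ys, lam):
    """One-predictor ridge regression; lam = 0 is ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)          # the penalty shrinks the slope
    intercept = mean_y - slope * mean_x
    return intercept, slope


def k_fold_mse(xs, ys, lam, k=3):
    """Average squared error on held-out folds (fold = index mod k)."""
    errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        held_out = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        intercept, slope = fit_ridge([x for x, _ in train],
                                     [y for _, y in train], lam)
        errors.extend((y - (intercept + slope * x)) ** 2 for x, y in held_out)
    return sum(errors) / len(errors)


# Made-up noisy data, roughly y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.2, 4.7, 7.4, 8.8, 11.3, 12.9]

# Score each candidate penalty on data it was not fitted to.
scores = {lam: k_fold_mse(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
best_lam = min(scores, key=scores.get)

print(scores)
print("best penalty:", best_lam)
```

The key point is that every score comes from points left out of the fit, so a model cannot look good merely by memorizing the training set.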
![Page 35: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/35.jpg)
demo of evaluation
and fantastic Scikits API
![Page 36: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/36.jpg)
sourcing and applying
and the great ML divide
![Page 37: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/37.jpg)
the great ML divide
Lab Culture
➔ Theory
➔ Small Datasets
➔ In memory
➔ Not live
➔ R, Octave, Python..
Source and Apply
➔ The practice
➔ Huge datasets
➔ Live in production
➔ Hadoop and friends, Python ? , R ?
![Page 38: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/38.jpg)
processing data at scale
➔ Data is not available in final form
➔ Non standard data
  ◆ click streams
  ◆ event logs
  ◆ free form text
➔ Process at scale
➔ Transform, clean, group in final form
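The transform, clean, group shape is the same whether it runs on a laptop or a cluster. The sketch below shows it in memory with plain Python; at scale the same steps would run on Hadoop or similar tooling. The log format, field names, and `parse` helper are all invented for illustration.

```python
from collections import defaultdict

# Invented raw event log lines, including one malformed record.
raw_events = [
    "2014-01-01T10:00:00 user=u1 action=click ad=a7",
    "2014-01-01T10:00:05 user=u2 action=view ad=a7",
    "garbage line",
    "2014-01-01T10:01:00 user=u1 action=click ad=a9",
    "2014-01-01T10:02:00 user=u2 action=click ad=a7",
]


def parse(line):
    """Transform one raw line into a dict, or None if it is malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    if not {"user", "action", "ad"} <= fields.keys():
        return None
    fields["timestamp"] = parts[0]
    return fields


# Clean: drop lines that failed to parse.
events = [e for e in map(parse, raw_events) if e is not None]

# Group: count clicks per ad, the 'final form' a model could consume.
clicks_per_ad = defaultdict(int)
for event in events:
    if event["action"] == "click":
        clicks_per_ad[event["ad"]] += 1

print(dict(clicks_per_ad))
```

Each step works on one record or one key at a time, which is what makes this pattern easy to distribute.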
![Page 39: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/39.jpg)
5 min intro to Hadoop Ecosystem
➔ Write in assembly
  ◆ Java
➔ DSLs
  ◆ Pig
  ◆ Hive
  ◆ Impala
➔ Functional languages to the rescue!
![Page 40: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/40.jpg)
Introduction to Cascalog
processing data at scale
![Page 41: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/41.jpg)
Recap
➔ Pragmatic ML
➔ Key phases
➔ Supervised learning
➔ Unsupervised learning
➔ Evaluation
➔ Application issues
➔ Processing data @ scale
![Page 42: Workshop on Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022042713/547e46e85806b5ea5e8b46a9/html5/thumbnails/42.jpg)
Let’s discuss...