![Page 1: Machine Learning in Python - gradquant.ucr.edugradquant.ucr.edu/.../ML_Python_Spring2018_part1.pdf · Machine Learning •Developed out of initial work in Artificial Intelligence](https://reader034.vdocuments.site/reader034/viewer/2022052309/5b8914eb7f8b9a655f8b5f8c/html5/thumbnails/1.jpg)
Machine Learning in Python
Rohith Mohan
GradQuant, Spring 2018
What is Machine Learning?
https://twitter.com/myusuf3/status/995425049170489344
http://www.hlt.utdallas.edu/~vgogate/ml/2013f/lectures.html
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
• Getting computers to program themselves
• Writing code by hand is the bottleneck; let the data dictate the program
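A toy sketch of the contrast, assuming a hidden rule we pretend not to know (Celsius to Fahrenheit). In traditional programming a person writes `fahrenheit_rule`; in machine learning we hand scikit-learn's `LinearRegression` the data plus the known outputs and let it recover the rule. The data points are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


# Traditional programming: a human writes the rule (the "program").
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: supply inputs (data) and known outputs,
# and let the learner recover the rule from them.
celsius = np.array([-40.0, -10.0, 0.0, 15.0, 30.0, 100.0]).reshape(-1, 1)
fahrenheit = fahrenheit_rule(celsius).ravel()  # the "Output" column

model = LinearRegression().fit(celsius, fahrenheit)
print(model.coef_[0], model.intercept_)  # close to 1.8 and 32.0
```

The learned slope and intercept are the "program" that the data produced.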
Formal Definitions
Andrew Ng, Machine Learning (Coursera)
• Arthur Samuel (1959)
  • “Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.”
  • At IBM, created a checkers program that played tens of thousands of games against itself and learned from them
• Tom Mitchell (1998)
  • “Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Machine Learning
• Developed out of initial work in Artificial Intelligence (AI)
• The increased availability of large datasets and advances in computing architecture have boosted its use in recent years
https://en.wikipedia.org/wiki/Timeline_of_machine_learning
Usage
Mining and clustering gene expression data to identify individuals
https://de.wikipedia.org/wiki/Genexpressionsanalyse
Natural Language Processing + Computer Vision
http://www.idownloadblog.com/2016/05/12/google-translate-offline-mode/
https://www.irishnews.com/magazine/science/2018/01/01/news/12-of-the-biggest-scientific-breakthroughs-of-2017-that-might-just-change-the-world-1222695/
Reproducing human behavior (True AI)
Recommendation algorithms
https://www.flickr.com/photos/theadamclarke/2589233355
Common steps in ML workflow
• Collect data (various sources: UCI data repository, news organizations, Kaggle)
• Prepare data (exploratory analysis, feature selection, regularization)
• Select and train a model (train and test datasets; which model?)
• Evaluate the model (accuracy, precision, ROC curves, F1 score)
• Optimize performance (change the model, the number of features, scaling)
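A minimal sketch of steps 1 to 4 with scikit-learn, assuming the built-in iris dataset stands in for collected data and a scaled logistic regression stands in for the chosen model; neither choice comes from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Collect data: a small built-in dataset stands in for UCI/Kaggle downloads.
X, y = load_iris(return_X_y=True)

# 2./3. Prepare data, select and train: hold out a test set, then chain
#       scaling and a classifier so both are fit on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 4. Evaluate on data the model has never seen.
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

Step 5 (optimizing) would compare alternatives by cross-validation on the training data, not by repeatedly checking the test set, which would leak test information into the model choice.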
scikit-learn
http://scikit-learn.org/stable/index.html
Preprocessing
• Clean data and deal with missing values, etc.
• Feature scaling – rescaling features so they are on comparable scales
  • e.g. square footage of a house (hundreds to thousands of sq ft) vs. number of rooms (1–5)
  • Standardization – subtract the mean and divide by the SD (zero mean, unit variance)
  • Min-max scaling (often called normalization) – map each feature into a fixed range (e.g. 0 to 1 or -1 to 1)
• Many others (regularization, imputation, generating polynomial features, etc.)
http://scikit-learn.org/stable/modules/preprocessing.html
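A small sketch of these steps using scikit-learn's `SimpleImputer`, `StandardScaler` and `MinMaxScaler` (available in scikit-learn 0.20 and later). The house data are hypothetical, chosen to echo the square-footage vs. rooms example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical houses: [square footage, number of rooms]; np.nan marks a missing value.
X = np.array([
    [850.0, 2.0],
    [1400.0, 3.0],
    [np.nan, 4.0],
    [2600.0, 5.0],
    [1200.0, 1.0],
])

# Clean: fill the missing square footage with that column's mean (1512.5 here).
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Standardization: each column gets mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X_filled)

# Min-max scaling: each column is mapped linearly onto [-1, 1].
X_minmax = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X_filled)

print(X_std.round(2))
print(X_minmax.round(2))
```

After either transform, square footage and room count contribute on comparable scales.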
Importance of feature scaling
http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
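The linked scikit-learn example makes this point with PCA and naive Bayes on the wine dataset. The sketch below swaps in a k-nearest-neighbors classifier (a distance-based model, so especially sensitive to feature scale) on the same dataset; the split and K = 5 are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Same distance-based model, with and without standardization.
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

print("unscaled accuracy:", unscaled.score(X_test, y_test))
print("scaled accuracy:  ", scaled.score(X_test, y_test))
```

Without scaling, the feature with the largest numeric range (proline, in the hundreds to over a thousand) dominates every distance.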
Comparison of scaling: StandardScaler
Comparison of scaling: RobustScaler
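A tiny sketch of why the two scalers differ, using an invented single feature with one outlier. `StandardScaler` centers on the mean and divides by the standard deviation, both of which the outlier drags around; `RobustScaler` centers on the median and divides by the interquartile range, which largely ignores it.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature with a single large outlier (100).
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# StandardScaler: (x - mean) / SD; mean = 22 and SD ~ 39 because of the outlier.
std = StandardScaler().fit_transform(x).ravel()

# RobustScaler: (x - median) / IQR; median = 3, IQR = 4 - 2 = 2.
robust = RobustScaler().fit_transform(x).ravel()

print(np.round(std, 2))  # roughly -0.54, -0.51, -0.49, -0.46, 2.0
print(robust)            # values -1, -0.5, 0, 0.5, 48.5
```

With the standard scaler the four ordinary values are squeezed into a narrow band; with the robust scaler they keep their spread and only the outlier lands far from zero.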
Train/Test Split (Cross-Validate?)
• Why do we need to split up our datasets?
  • Overfitting: a model can memorize its training data and still fail on new data
• Split the dataset
  • Train – for training your model
  • Test – for evaluating the model's performance on unseen data
  • A common choice is to hold out 20–40% of the data for testing
• Validation set?
  • Cross-validation
    • Split the training set into k subsets (folds); train on k-1 folds, evaluate on the remaining one, and rotate (more computationally expensive, but makes better use of limited data)
  • Hyper-parameter tuning
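A minimal sketch of a train/test split plus k-fold cross-validation in scikit-learn. The iris data, the linear SVM, the 40% test fraction and the 5 folds are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out 40% of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)

clf = SVC(kernel="linear", C=1.0)

# 5-fold cross-validation on the training portion only: each fold is
# held out once while the model trains on the other four.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("CV accuracy: %.2f +/- %.2f" % (cv_scores.mean(), cv_scores.std()))

# The untouched test set gives the final estimate.
test_acc = clf.fit(X_train, y_train).score(X_test, y_test)
print("test accuracy: %.2f" % test_acc)
```

Cross-validation scores guide choices such as hyper-parameters; the test set is consulted once at the end.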
Bias-variance tradeoff
Underfitting: high bias
Overfitting: high variance
http://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html#sphx-glr-auto-examples-model-selection-plot-underfitting-overfitting-py
http://scott.fortmann-roe.com/docs/BiasVariance.html
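A condensed sketch of the first linked scikit-learn example: polynomial models of degree 1, 4 and 15 fit to 30 noisy samples of a cosine curve, scored by 10-fold cross-validated mean squared error. Under this setup the degree-1 model should underfit (high bias), degree 15 should overfit (high variance, very large held-out error), and degree 4 should sit between them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples of a cosine curve.
np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = np.cos(1.5 * np.pi * X) + np.random.randn(n_samples) * 0.1

mse = {}
for degree in (1, 4, 15):
    # Polynomial features followed by ordinary least squares.
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                          LinearRegression())
    scores = cross_val_score(model, X[:, np.newaxis], y,
                             scoring="neg_mean_squared_error", cv=10)
    mse[degree] = -scores.mean()  # scikit-learn reports negated MSE
    print(f"degree {degree:2d}: CV MSE = {mse[degree]:.3g}")
```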
Bias-variance tradeoff
http://scott.fortmann-roe.com/docs/BiasVariance.html
How to select a model?
http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Supervised vs Unsupervised Learning
• Supervised
  • Regression, classification
  • Input variables and an output variable; learn the mapping from input to output
• Unsupervised
  • Clustering, association, etc.
  • No correct answers and no teacher
• Semi-supervised
  • e.g. a partially labeled dataset of images
  • Real-world problems often mix both techniques
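A sketch of how the three settings look in scikit-learn, assuming the iris dataset throughout. The difference shows up in what `fit` receives: inputs and labels, inputs only, or inputs with some labels replaced by -1 (scikit-learn's marker for "unlabeled" in `sklearn.semi_supervised`). The 70% masking rate and the `LabelSpreading` settings are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Supervised: fit on inputs X AND the correct outputs y.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: fit on X alone; the algorithm must find structure itself.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Semi-supervised: hide about 70% of the labels (-1 = unlabeled).
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.7
y_partial = y.copy()
y_partial[unlabeled] = -1
ls = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)

# Fraction of the hidden labels that were recovered from the labeled ones.
recovered = (ls.transduction_[unlabeled] == y[unlabeled]).mean()
print("recovered hidden labels:", round(recovered, 2))
```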
Regression
• Linear regression (OLS)
• Prediction
• Multiple variables/features?
  • Feature selection
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
https://www.xlstat.com/en/solutions/features/ordinary-least-squares-regression-ols
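A sketch along the lines of the linked `plot_ols` example: one feature (BMI) from scikit-learn's diabetes dataset, with the last 20 samples held out. The closing lines solve the same least-squares problem directly with NumPy to show that `LinearRegression` is ordinary least squares.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X, y = load_diabetes(return_X_y=True)
X_bmi = X[:, [2]]  # column 2 is body mass index; keep it 2-D

# Last 20 samples held out for testing.
X_train, X_test = X_bmi[:-20], X_bmi[-20:]
y_train, y_test = y[:-20], y[-20:]

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("test MSE:", mean_squared_error(y_test, y_pred),
      "R^2:", r2_score(y_test, y_pred))

# OLS by hand: minimize ||A b - y||^2 with design matrix A = [1, x].
A = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
print("lstsq intercept, slope:", beta)
```

With multiple features, `X_train` simply gains columns; the least-squares problem is unchanged.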
Feature Selection
https://www.coursera.org/learn/machine-learning
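The slides do not show code for this step; one automated option in scikit-learn is univariate selection with `SelectKBest`, sketched here on the diabetes dataset. Keeping k = 3 features is an arbitrary choice.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

data = load_diabetes()
X, y = data.data, data.target

# Score each feature by its univariate linear relationship with y (F-test)
# and keep the 3 highest-scoring ones.
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
kept = [name for name, keep in zip(data.feature_names, selector.get_support()) if keep]
X_small = selector.transform(X)
print("kept features:", kept)
```

Univariate scores ignore interactions between features, so this is a quick filter rather than a substitute for domain judgment.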
Regression
• Linear regression (OLS)
• Prediction
• Multiple variables/features?
  • Feature selection
    • Length, width of a house (area?)
  • Regularization
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
https://www.xlstat.com/en/solutions/features/ordinary-least-squares-regression-ols
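A sketch of the "length, width, or area?" question with hypothetical data in which price is driven by floor area. Under that assumption, a single engineered feature (length × width) fits better than the two raw measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: price depends on floor AREA, not on each side separately.
rng = np.random.default_rng(0)
length = rng.uniform(20, 60, size=200)  # feet
width = rng.uniform(20, 60, size=200)   # feet
price = 150 * length * width + rng.normal(0, 5000, size=200)

# Model A: length and width as two separate linear features.
X_sides = np.column_stack([length, width])
r2_sides = LinearRegression().fit(X_sides, price).score(X_sides, price)

# Model B: one engineered feature, area = length * width.
X_area = (length * width).reshape(-1, 1)
r2_area = LinearRegression().fit(X_area, price).score(X_area, price)

print(f"R^2 with length, width: {r2_sides:.3f}")
print(f"R^2 with area:          {r2_area:.3f}")
```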
Regularization
https://www.coursera.org/learn/machine-learning
Regularization
http://enhancedatascience.com/2017/07/04/machine-learning-explained-regularization/
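A sketch comparing ordinary least squares with the two standard penalized versions in scikit-learn, `Ridge` (L2 penalty) and `Lasso` (L1 penalty), on the diabetes dataset. `alpha` sets the penalty strength; the value 1.0 is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge

X, y = load_diabetes(return_X_y=True)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all coefficients toward 0
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can set some coefficients exactly to 0

print("OLS   coefficient norm:", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)), "of", X.shape[1])
```

Ridge's coefficient vector can never be longer than the OLS one, since the penalty only adds cost for large weights; Lasso's L1 penalty drives some coefficients exactly to zero, which doubles as a form of feature selection.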
http://scikit-learn.org/stable/auto_examples/model_selection/plot_train_error_vs_test_error.html#sphx-glr-auto-examples-model-selection-plot-train-error-vs-test-error-py
Classification – Logistic Regression
http://www.saedsayad.com/logistic_regression.htm
Classification – Logistic Regression
https://medium.com/technology-nineleaps/logistic-regression-bac1db38cb8c
https://mapr.com/blog/predicting-breast-cancer-using-apache-spark-machine-learning-logistic-regression/
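A sketch of logistic regression on scikit-learn's built-in breast cancer dataset (the topic of one linked post), with features standardized first. The last lines check the model's core idea: a linear score passed through the sigmoid gives the predicted probability.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# Linear score z = w.x + b, squashed by the sigmoid 1 / (1 + e^-z)
# into P(class 1 | x); class 1 is predicted when z > 0 (probability > 0.5).
z = model.decision_function(X_test)
p = 1.0 / (1.0 + np.exp(-z))
```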
Classification – SVM
http://scikit-learn.org/0.18/auto_examples/svm/plot_separating_hyperplane.html
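A sketch following the linked separating-hyperplane example: a linear SVM on two synthetic blobs. A large `C` approximates a hard margin; the support vectors are the points that pin down the margin, and the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two separable clusters of synthetic points.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A large C penalizes margin violations heavily (close to a hard margin).
clf = SVC(kernel="linear", C=1000).fit(X, y)

# Hyperplane w.x + b = 0; the margin lines are w.x + b = +1 and -1.
w, b = clf.coef_[0], clf.intercept_[0]
margin_width = 2 / np.linalg.norm(w)
print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)
```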
Evaluating Performance
• Accuracy – how many predictions are correct out of the entire dataset?
  • Can be a misleading metric, e.g. when one class is much rarer than the other
• Precision and Recall
https://en.wikipedia.org/wiki/Precision_and_recall
• ROC curves
• F1 score
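A sketch with hand-checkable numbers (invented for illustration). The first part shows how accuracy misleads on imbalanced data; the second computes precision, recall and F1 for a confusion of TP = 3, FN = 1, FP = 1, TN = 5.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Why accuracy can mislead: 95 negatives, 5 positives, and a "model"
# that always predicts the majority class.
y_true_imb = np.array([0] * 95 + [1] * 5)
y_pred_lazy = np.zeros(100, dtype=int)
print(accuracy_score(y_true_imb, y_pred_lazy))  # 0.95
print(recall_score(y_true_imb, y_pred_lazy))    # 0.0: misses every positive

# Hand-checkable example: TP = 3, FN = 1, FP = 1, TN = 5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two = 0.75
acc = accuracy_score(y_true, y_pred)         # 8 of 10 correct = 0.8
```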
Evaluating Performance
http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py
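A sketch of an ROC curve and its area (AUC) with invented scores for ten examples. `roc_curve` sweeps the decision threshold over the scores; the pairwise check at the end illustrates what the AUC measures.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical classifier scores (higher = more likely positive).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.15])

# Each threshold yields one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print("AUC:", auc)

# AUC = fraction of (positive, negative) pairs where the positive scores higher:
# here 22 of the 24 pairs, i.e. 11/12 (about 0.917).
pos, neg = scores[y_true == 1], scores[y_true == 0]
auc_by_pairs = np.mean([p > n for p in pos for n in neg])
```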
Classification - K-Nearest Neighbors
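• Fairly robust to noisy training data, especially with larger K
• More effective with larger datasets
• Need to choose the parameter K (number of nearest neighbors)
• What type of distance metric?
• High computational cost at prediction time (a naive implementation computes the distance to every training point)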
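A sketch of choosing K and the distance metric by cross-validated grid search, assuming the iris dataset and a small, arbitrary grid. Features are standardized inside the pipeline because distances depend on feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling is refit inside each CV fold, so no information leaks across folds.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Candidate values for K and the distance metric, scored by 5-fold CV.
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11, 15],
    "knn__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```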
Clustering
• Unsupervised learning
• Can help you understand the structure of your data
• Various types of clustering: K-means, hierarchical (e.g. Ward linkage)
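A sketch of hierarchical clustering with Ward linkage in scikit-learn, on synthetic blobs (invented data; the true group labels are generated but never passed to the algorithm).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic data with three groups; the labels y are NOT given to the algorithm.
X, y = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Agglomerative clustering: start with every point as its own cluster and
# repeatedly merge the pair whose merge least increases within-cluster
# variance (Ward's criterion), stopping at 3 clusters.
ward = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print("cluster sizes:", np.bincount(ward.labels_))
```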
K-means
• Randomly choose k initial centroids
• Form clusters by assigning each point to its nearest centroid
• Take the mean of each cluster as its new centroid
• Repeat until convergence (assignments stop changing)
https://www.youtube.com/watch?v=_aWzGGNrcic
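A from-scratch NumPy sketch of the k-means steps above (Lloyd's algorithm). Two assumptions go beyond the slide: the initial centroids are k randomly chosen data points, and the algorithm is restarted several times, keeping the run with the lowest inertia, because a single random start can get stuck in a poor local optimum. In practice scikit-learn's `KMeans` does this job.

```python
import numpy as np
from sklearn.datasets import make_blobs


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Plain k-means with random restarts.

    Returns (labels, centroids, inertia) for the restart with the lowest
    inertia (sum of squared distances from points to their centroid).
    """
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        # Randomly choose k distinct data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        labels = None
        for _it in range(max_iter):
            # Form clusters: assign every point to its nearest centroid.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            new_labels = dists.argmin(axis=1)
            # Converged: no point changed cluster since the last pass.
            if labels is not None and np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Take the mean of each cluster as its new centroid
            # (an empty cluster keeps its old centroid).
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:
                    centroids[j] = members.mean(axis=0)
        inertia = dists[np.arange(len(X)), labels].sum()
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


# Hypothetical data: 300 points around 4 centers.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
labels, centroids, inertia = kmeans(X, k=4)
print("cluster sizes:", np.bincount(labels, minlength=4))
print("inertia:", round(inertia, 2))
```

At convergence, each centroid is the mean of its cluster and each point sits in the cluster of its nearest centroid; those two conditions are exactly what the loop alternates between.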