Machine Learning using Python
http://ocl.space
What is Machine Learning?
Types of Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
Supervised Learning
y = f(X)
X is the features/inputs
y is the target/output
f(X) is the learning function
Types :
● Regression
● Classification
Unsupervised Learning
● We have input data (X) but no corresponding output variable (y).
● The goal is to model the distribution of the data in order to learn more about the data.
● Types of unsupervised learning :
--> Clustering
--> Association
Other learning methods...
● Reinforcement learning
● Semi-supervised learning
● Transfer learning
Regression
● Regression is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
● It is used for forecasting, time series modelling and estimating cause-and-effect relationships between variables.
● It indicates which independent variables have a significant relationship with the dependent variable.
● It indicates how strongly each of several independent variables affects the dependent variable.
● Types of regression : Linear, Logistic, Polynomial, Stepwise, Ridge, Lasso and ElasticNet
Classification
● A classification problem is one where the output variable is a category.
● Example : email filtering (spam / not spam)
Clustering and Association
● Clustering aims to separate data points into groups (clusters) with similar traits.
● Types of Clustering :
--> Hard Clustering: each data point either belongs to a cluster completely or does not belong to it at all.
--> Soft Clustering: instead of placing each data point in a single cluster, each point is assigned a probability or likelihood of belonging to each cluster.
● When we want to discover rules that describe portions of the input data, it is known as an association problem.
Linear Regression
● It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s).
● Here, we establish the relationship between the independent and dependent variables by fitting a best-fit line.
● This best-fit line is known as the regression line and is represented by the linear equation
Y = a * X + b
Y – Dependent variable
a – Slope
X – Independent variable
b – Intercept
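As a minimal sketch of fitting the regression line Y = a * X + b, the slope and intercept can be computed with the ordinary least-squares formulas. The function name fit_line and the house-size data are illustrative assumptions, not part of the original slides.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = a * X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(X, Y) / variance(X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The least-squares line always passes through the point of means
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: house size (X) against price (Y), lying exactly on Y = 2 * X + 1
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(sizes, prices)
print(f"slope a = {a}, intercept b = {b}")
```

Because the sample points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1; with noisy data the same formulas give the line that minimizes the squared vertical errors.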
Logistic Regression
● It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).
● It predicts the probability that an event occurs by fitting the data to a logistic (sigmoid) function, which is the inverse of the logit function.
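The idea of predicting a probability for a binary outcome can be sketched with a single input variable and plain gradient descent. This is an illustrative sketch, not a production implementation: the function names, learning rate, number of epochs and the hours-studied data are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=3000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied (X) and whether the exam was passed (y)
hours = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)


def predict_proba(x):
    """Estimated probability of passing after x hours of study."""
    return sigmoid(w * x + b)


print(predict_proba(0.5), predict_proba(4.0))
```

The output is a probability, so a threshold (commonly 0.5) is still needed to turn it into a yes/no prediction.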
Overfitting & Underfitting
● Overfitting happens when a model performs very well on training data but poorly on unseen data.
● Underfitting happens when a model performs poorly on both training data and unseen data.
Cross Validation
● A method to test how well a model performs on unseen data.
● Types of Cross Validation methods :
--> Hold out method
--> K-fold method
--> Leave-one-out cross validation
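The K-fold method can be sketched as a generator of train/test index splits. The function name k_fold_indices is an assumption for illustration, and the sketch does not shuffle the data, which a real workflow usually would do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Six samples split into three folds: every sample is used for testing exactly once
folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Setting k equal to the number of samples gives leave-one-out cross validation, while the hold-out method corresponds to using a single train/test split.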
Learning = Representation + Evaluation + Optimization
Naive Bayes
● Naive Bayes is a supervised learning algorithm based on Bayes' theorem.
● The word naive comes from the assumption that the features are independent of each other given the class.
● We can write Bayes' theorem as follows :
P(y | x) = P(x | y) * P(y) / P(x)
Where,
P(x) is the prior probability of a feature.
P(x | y) is the probability of a feature given the target. It's also known as the likelihood.
P(y) is the prior probability of a target, or class in the case of classification.
P(y | x) is the posterior probability of the target given the feature.
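A small worked example may help connect the prior, likelihood and posterior. The sketch below multiplies the class prior by the per-feature likelihoods (the naive independence assumption) and normalizes by P(x). The function name posterior and all probabilities are made-up values for illustration only.

```python
def posterior(priors, likelihoods, features):
    """Return P(y | x) for every class y, given the observed features x."""
    scores = {}
    for label, prior in priors.items():
        score = prior  # P(y)
        for feature in features:
            # Naive assumption: features are independent given the class,
            # so the likelihoods P(x_i | y) are simply multiplied
            score *= likelihoods[label][feature]
        scores[label] = score
    evidence = sum(scores.values())  # P(x), summed over all classes
    return {label: score / evidence for label, score in scores.items()}


# Hypothetical spam filter: P(y) and P(word | y) are invented numbers
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "free": 0.6},
    "ham": {"offer": 0.1, "free": 0.2},
}

result = posterior(priors, likelihoods, ["offer", "free"])
print(result)
```

With these numbers the unnormalized scores are 0.4 * 0.5 * 0.6 = 0.12 for spam and 0.6 * 0.1 * 0.2 = 0.012 for ham, so the posterior probability of spam is 0.12 / 0.132, roughly 0.91.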
Support Vector Machines (SVMs)
● SVMs are among the most widely used supervised learning algorithms.
● They are effective in high-dimensional spaces and are memory efficient as well.
● We plot each data item as a point in n-dimensional space and perform classification by finding the hyperplane that best separates the two classes.
● We can draw many candidate hyperplanes that separate the classes.
● The optimal hyperplane is the one that maximizes the margin, that is, the distance to the nearest data points.
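To make the margin idea concrete, the sketch below compares a few hand-picked candidate hyperplanes of the form w . x + b = 0 and keeps the separating one with the largest margin. It does not train an SVM; a real solver searches over all hyperplanes. The points, labels and candidates are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed value w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def separates(w, b, points, labels):
    """True if every point lies strictly on the side matching its label (+1 or -1)."""
    return all(label * decision_value(w, b, x) > 0 for x, label in zip(points, labels))


def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(w, b, x)) / norm for x in points)


# Hypothetical 2-D data with two classes
points = [(0, 1), (1, 0), (3, 1), (4, 0)]
labels = [-1, -1, 1, 1]

# Hand-picked candidate hyperplanes as (w, b) pairs
candidates = [
    ((1.0, 0.0), -2.0),
    ((1.0, 0.0), -1.5),
    ((1.0, 1.0), -2.0),
    ((0.0, 1.0), -0.5),  # does not separate the classes
]

separating = [(w, b) for w, b in candidates if separates(w, b, points, labels)]
best = max(separating, key=lambda wb: margin(wb[0], wb[1], points))
print("best hyperplane:", best, "margin:", margin(best[0], best[1], points))
```

Three of the four candidates separate the classes, but they leave different amounts of room around the nearest points; the margin criterion is what singles one of them out.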
Decision Tree
● A Decision Tree is a supervised learning algorithm that can be used for classification as well as regression problems.
● Here we split the population into homogeneous subsets by asking a series of questions.
● Example : deciding what to do on a particular day.
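One way to choose which question to ask first is to measure how homogeneous the resulting subsets are. The sketch below uses Gini impurity, which is one common measure and an assumption here, since the slides do not name a criterion. The day records, questions and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous set of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, questions):
    """Return (question, weighted impurity) for the question with the purest subsets."""
    best = None
    for name, ask in questions.items():
        yes = [label for row, label in zip(rows, labels) if ask(row)]
        no = [label for row, label in zip(rows, labels) if not ask(row)]
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or weighted < best[1]:
            best = (name, weighted)
    return best


# Hypothetical records: what was done on four different days
days = [
    {"sunny": True, "weekend": True},
    {"sunny": True, "weekend": False},
    {"sunny": False, "weekend": True},
    {"sunny": False, "weekend": False},
]
activities = ["go out", "go out", "stay in", "stay in"]

questions = {
    "Is it sunny?": lambda day: day["sunny"],
    "Is it the weekend?": lambda day: day["weekend"],
}

choice = best_split(days, activities, questions)
print(choice)
```

A full decision tree repeats this selection recursively on each subset until the subsets are pure or some stopping rule is reached.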
Random Forest
● Random Forest is one of the most common types of ensemble learning.
● It is a collection of decision trees.
● To classify a new object based on its attributes, each tree gives a classification and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
● Random forests have several advantages: they are fast to train and require little input preparation.
● One disadvantage of random forests is that the model may become very large.
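The voting step can be sketched on its own. Here each "tree" is a hand-written stand-in function rather than a trained decision tree, and the feature names and thresholds are invented; a real random forest would also train each tree on a random sample of the data and features.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Let every tree vote and return the class with the most votes."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each looks at one attribute of an email
trees = [
    lambda s: "spam" if s["links"] > 3 else "ham",
    lambda s: "spam" if s["caps_ratio"] > 0.5 else "ham",
    lambda s: "spam" if s["length"] < 20 else "ham",
]

email = {"links": 5, "caps_ratio": 0.2, "length": 10}
prediction = forest_predict(trees, email)
print(prediction)
```

Two of the three stand-in trees vote "spam" for this sample, so the forest as a whole predicts "spam" even though one tree disagrees.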
K-nearest Neighbors (KNN)
● KNN can be used for both classification and regression problems.
● It stores all available cases and classifies a new case by a majority vote of its k nearest neighbors.
● KNN is computationally expensive at prediction time, because each new case must be compared with the stored cases.
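A minimal sketch of KNN classification follows, assuming Euclidean distance (the slides do not specify a distance measure) and Python 3.8 or later for math.dist. The training points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify query by a majority vote of its k nearest training points."""
    # Distance from the query to every stored case: this is the expensive step
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two classes
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(train_points, train_labels, (2, 2)))
print(knn_classify(train_points, train_labels, (6, 5)))
```

There is no training phase beyond storing the data, which is why all of the cost is paid when a new case is classified.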
K-means clustering
● K-means is one of the simplest unsupervised learning algorithms used for clustering problems.
● Our goal is to group objects based on the similarity of their features.
● The basic idea behind K-means is to define k centroids, one for each cluster.
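The centroid idea can be sketched as a loop that alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. This is a simplified sketch: the initial centroids are given by hand, the number of iterations is fixed, and Euclidean distance via math.dist (Python 3.8+) is assumed. The data is hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Run a fixed number of K-means iterations; return (centroids, clusters)."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D data forming two visible groups, with k = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
initial_centroids = [(0, 0), (10, 10)]

centroids, clusters = k_means(points, initial_centroids)
print(centroids)
print(clusters)
```

In practice the result depends on the initial centroids, so implementations typically try several random starts and stop early once the assignments no longer change.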
Neural Networks
● A Neural Network is an information processing system: we pass some input to the network, some processing happens, and we get some output.
● Neural Networks are inspired by the biological connections between neurons and by how information processing happens in the brain.
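The input, processing, output flow can be sketched as a forward pass through a tiny network with one hidden layer. The weights below are arbitrary illustrative numbers, not learned values, and the sigmoid activation is an assumption; training the weights is outside the scope of this sketch.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs, then an activation."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def forward(inputs, hidden_layer, output_neuron):
    """Pass the inputs through a hidden layer and a single output neuron."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias)


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output
hidden_layer = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output_neuron = ([1.2, -0.7], 0.05)

output = forward([1.0, 2.0], hidden_layer, output_neuron)
print(output)
```

Each layer's outputs become the next layer's inputs, which is the sense in which processing happens between the input and the output.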
Let’s get started...