
Machine Learning: Supervised Algorithms

Prepared by:

AKHIAT Yassine AKACHAR El Yazid

Faculté des Sciences Dhar El Mahraz-Fès

Academic Year: 2014/2015

Master SIRM

Outline

1. Introduction
2. Supervised Algorithms
3. Some Real-life Applications
4. Naïve Bayes Classifier
5. Implementation
6. Conclusion

Introduction

Machine Learning

From dictionary.com:

"The ability of a machine to improve its performance based on previous results."

Arthur Samuel (1959): the field of study that gives computers the ability to learn without being explicitly programmed.

Introduction

Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm. Common algorithm types include:

Supervised Algorithms
Unsupervised Algorithms
Reinforcement Algorithms
etc.

Algorithm Types

Supervised Algorithms

Definition

Supervised learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances.

In other words:

The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features.


Supervised Algorithms

Motivation

Why did supervised learning appear?

Every domain generates enormous amounts of information every second. Why not exploit that information and accumulated experience to make better decisions in the future?

Supervised Algorithms

Data: a set of records (also called examples, instances, or cases), each described by k attributes: A1, A2, ..., Ak.

Class: each example is labelled with a predefined class.

Goal: learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.

Approach

Supervised Algorithms

Supervised Algorithms Process

Learning (training): Learn a model using the training data

Testing: Test the model using unseen test data to assess the model accuracy

Accuracy = Number of correct classifications / Total number of test cases
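As a sketch of this two-phase process, here is a minimal Python example; scikit-learn, its bundled Iris dataset, and the Gaussian Naïve Bayes model are our own illustrative choices, not from the slides:

```python
# A minimal sketch of the learning/testing process, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Learning (training): learn a model using the training data.
model = GaussianNB().fit(X_train, y_train)

# Testing: test the model using unseen test data.
predictions = model.predict(X_test)

# Accuracy = number of correct classifications / total number of test cases
accuracy = (predictions == y_test).mean()
print(f"Accuracy: {accuracy:.2f}")
```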

Supervised Algorithms

Example: Regression

Regression predicts a continuous-valued output, e.g. age prediction (the output is an age).

Supervised Algorithms

Example: Classification

Classification predicts a discrete-valued output (0 or 1), e.g. the Boolean AND function.

Supervised Algorithms

Classification Algorithms

Neural Networks
Decision Trees
K-Nearest Neighbors
Naïve Bayes
etc.

Supervised Algorithms

Neural Networks

Find the best separating plane between two classes.

Supervised Algorithms

Decision Tree

Leaves represent classifications; branches represent tests on features that lead to those classifications (see the sketch below the figure).

[Figure: a decision tree over features x1 and x2 that tests X1 > 1 and then X2 > 2, with YES/NO branches leading to the leaf classifications.]
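The tree in the figure translates directly into nested tests; a minimal sketch, where the class labels at the leaves are assumed for illustration:

```python
# Hand-coded version of the decision tree in the figure: branches are
# tests on features, leaves are classifications. Leaf labels are assumed.
def classify(x1: float, x2: float) -> str:
    if x1 > 1:          # branch: test on feature x1
        if x2 > 2:      # branch: test on feature x2
            return "class A"   # leaf
        return "class B"       # leaf
    return "class B"           # leaf

print(classify(1.5, 2.5))  # follows the YES/YES path -> "class A"
```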

Supervised Algorithms

K-Nearest Neighbors

Find the k nearest neighbors of the test example, and infer its class from their known classes, e.g. k = 3 (see the sketch below the figure).

[Figure: a scatter plot of labelled points in the (x1, x2) plane with an unlabelled query point.]
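A minimal sketch of the k-nearest-neighbors rule with k = 3; the toy training points are our own illustration:

```python
# k-nearest neighbors: find the k closest training points to the query
# and predict the majority class among them.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    # train: list of ((x1, x2), label) pairs
    by_distance = sorted(train, key=lambda point: math.dist(point[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]  # majority class of the k neighbors

points = [((1, 1), "+"), ((1, 2), "+"), ((4, 4), "-"), ((5, 4), "-"), ((4, 5), "-")]
print(knn_predict(points, (1.5, 1.5)))  # "+"
```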

Supervised Algorithms

Comparison

[Comparison table from the slide: **** represents the best performance and * the worst.]

Some Real life applications

Systems biology: gene expression microarray data

Face detection; signature recognition

Medicine: predict whether a patient has heart ischemia by spectral analysis of his/her ECG

Recommender systems

Text categorization: spam filtering

Some Real life applications

Microarray data

Separate malignant from healthy tissues based on the mRNA expression profile of the tissue.


Some Real life applications

Face Detection

Some Real life applications

Text categorization

Categorize text documents into predefined categories; for example, classify e-mail as "Spam" or "Not Spam".

Naïve Bayes

Named after Thomas Bayes (1702-1761), who proposed Bayes' Theorem.

Definition

Naïve Bayesian Classification

Bayesian Classification

What is it?

The Bayesian classifier is based on Bayes' Theorem with independence assumptions between predictors. It is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.

Bayesian Classification

Bayes Theorem

Bayes' Theorem provides a way of calculating the posterior probability P(C|X) from P(C), P(X), and P(X|C):

P(C|X) = P(X|C) · P(C) / P(X)

P(C|X) is the posterior probability of the class given the predictor (attribute).

P(X|C) is the likelihood: the probability of the predictor given the class.

P(C) is the prior probability of the class.

P(X) is the prior probability of the predictor.
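In code, the theorem is a one-line computation; the probability values below are illustrative assumptions:

```python
# Bayes' Theorem: posterior = likelihood * prior / evidence.
p_c = 0.3          # P(C): prior probability of the class
p_x_given_c = 0.8  # P(X|C): likelihood of the predictor given the class
p_x = 0.5          # P(X): prior probability of the predictor

p_c_given_x = p_x_given_c * p_c / p_x  # P(C|X): posterior
print(p_c_given_x)  # 0.48
```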

Bayesian Classification

Naïve Bayes Algorithm

Example

Bayesian Classification

Classify a new Instance

(Outlook=sunny, Temp=cool, Humidity=high, Wind=strong)

How do we classify this new instance?

Bayesian Classifier

Frequency Table

Outlook     Play=Yes  Play=No
Sunny       2/9       3/5
Overcast    4/9       0/5
Rain        3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity  Play=Yes  Play=No
High      3/9       4/5
Normal    6/9       1/5

Wind    Play=Yes  Play=No
Strong  3/9       3/5
Weak    6/9       2/5

P(Play=Yes) = 9/14, P(Play=No) = 5/14
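A sketch of how such frequency tables can be built from raw records; for space, only four of the standard fourteen Play Tennis rows are shown:

```python
# Count attribute-value / class co-occurrences to build frequency tables.
from collections import Counter, defaultdict

records = [
    # (Outlook, Temperature, Humidity, Wind, Play) -- a subset of the data
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
]

class_counts = Counter(r[-1] for r in records)
freq = defaultdict(Counter)  # freq[(attribute_index, value)][class] = count
for r in records:
    for i, value in enumerate(r[:-1]):
        freq[(i, value)][r[-1]] += 1

# P(Outlook=Sunny | Play=No) from the counts:
print(freq[(0, "Sunny")]["No"] / class_counts["No"])  # 1.0 on this subset
```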

Bayesian Classification

Example: so let's classify this new instance.

Likelihood of Yes:
L = P(Outlook=Sunny|Yes) · P(Temp=Cool|Yes) · P(Hum=High|Yes) · P(Wind=Strong|Yes) · P(Yes)
L = 2/9 · 3/9 · 3/9 · 3/9 · 9/14 = 0.0053

Likelihood of No:
L = P(Outlook=Sunny|No) · P(Temp=Cool|No) · P(Hum=High|No) · P(Wind=Strong|No) · P(No)
L = 3/5 · 1/5 · 4/5 · 3/5 · 5/14 = 0.0206

Outlook  Temperature  Humidity  Wind    Play Tennis
Sunny    Cool         High      Strong  ??

Example

Bayesian Classification

Now we normalize:

P(Yes) = 0.0053 / (0.0053 + 0.0206)
P(No) = 0.0206 / (0.0053 + 0.0206)

Then:

P(Yes) = 0.20, P(No) = 0.80

So the predicted class is No:

Outlook  Temperature  Humidity  Wind    Play Tennis
Sunny    Cool         High      Strong  No
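As a quick check, a minimal Python sketch (ours, not from the slides) reproducing the computation:

```python
# Conditional probabilities read off the frequency tables above.
like_yes = 2/9 * 3/9 * 3/9 * 3/9 * 9/14  # Sunny, Cool, High, Strong | Yes
like_no = 3/5 * 1/5 * 4/5 * 3/5 * 5/14   # Sunny, Cool, High, Strong | No

# Normalize the two likelihoods into posterior probabilities.
total = like_yes + like_no
print(f"P(Yes)={like_yes / total:.2f}  P(No)={like_no / total:.2f}")
# P(Yes)=0.20  P(No)=0.80
```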

Bayesian Classification

The Zero-Frequency Problem

When an attribute value (e.g. Outlook=Overcast) never occurs together with some class value (Play Tennis=No), its conditional probability is zero, which zeroes out the whole product. The fix: add 1 to all the counts (Laplace smoothing), as in the sketch below.
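A minimal sketch of this add-one (Laplace) smoothing, applied to the Outlook=Overcast, Play=No case:

```python
# Add 1 to every count so that no conditional probability is exactly zero.
def smoothed_prob(count, class_total, num_values):
    # num_values: number of distinct values the attribute can take
    return (count + 1) / (class_total + num_values)

# Outlook=Overcast never occurs with Play=No (0 of 5 cases), and Outlook
# has 3 values (Sunny, Overcast, Rain):
print(smoothed_prob(0, 5, 3))  # 0.125 instead of 0
```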

Bayesian Classification

Numerical Attributes

Numerical variables need to be transformed into categorical counterparts (e.g. by binning) before constructing their frequency tables.

The other option is to use the distribution of the numerical variable to estimate the likelihood directly.

For example, one common practice is to assume a normal distribution for numerical variables.

Bayesian Classification

Normal distribution

The probability density function of the normal distribution is defined by two parameters, the mean μ and the standard deviation σ:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

Bayesian Classification

Example of Numerical Attributes

Humidity   Values                              Mean  StDev
Play=Yes   86, 96, 80, 65, 70, 80, 70, 90, 75  79.1  10.2
Play=No    85, 90, 70, 95, 91                  86.2  9.7
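A sketch of the resulting Gaussian likelihood estimate for the Play=Yes humidity values above; the query value 74 is an illustrative assumption:

```python
# Fit a normal distribution to the Play=Yes humidity values and use its
# density as the likelihood P(Humidity=x | Play=Yes).
import math
import statistics

humidity_yes = [86, 96, 80, 65, 70, 80, 70, 90, 75]
mu = statistics.mean(humidity_yes)      # 79.1
sigma = statistics.stdev(humidity_yes)  # 10.2 (sample standard deviation)

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(gaussian_pdf(74, mu, sigma))  # P(Humidity=74 | Play=Yes) ≈ 0.034
```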

Bayesian Classification

Uses Of Bayes Classification

Text Classification

Spam Filtering

Hybrid Recommender System

Online Application

Bayesian Classification

Advantages

Easy to implement

Requires a small amount of training data to estimate the parameters

Good results obtained in most of the cases

Bayesian Classification

Disadvantages

Assumption: class-conditional independence, which leads to a loss of accuracy because, in practice, dependencies exist among variables.

E.g., in hospital patient data: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.).

Such dependencies cannot be modelled by the Naïve Bayes classifier.

Application

Spam filtering

Spam filtering is the best-known use of naive Bayesian text classification. It makes use of a naive Bayes classifier to identify spam e-mail.

Bayesian spam filtering has become a popular mechanism to distinguish illegitimate spam email from legitimate email

Many modern mail clients implement Bayesian spam filtering. Users can also install separate email filtering programs.

Examples include DSPAM, SpamAssassin, SpamBayes, and ASSP.

Recap

Naïve Bayes

The Bayesian classifier is based on Bayes' Theorem with independence assumptions between predictors. It is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.

Recap

Naïve Bayes Algorithm

Example

Naïve Bayes algorithms

          doc  words                                 class
training  D1   SIRM master FSDM                      A
          D2   SIRM master                           A
          D3   master SIRM                           A
          D4   SIRM recherche FSDM                   B
test      D5   SIRM SIRM SIRM master recherche FSDM  ???

P(A) = Nc/Nd = 3/4, P(B) = Nc/Nd = 1/4

P(SIRM|A) = (3+1)/(7+4) = 4/11
P(master|A) = (3+1)/(7+4) = 4/11
P(recherche|A) = (0+1)/(7+4) = 1/11
P(FSDM|A) = (1+1)/(7+4) = 2/11

Example

Naïve Bayes algorithms

Using the same training table:

P(A) = Nc/Nd = 3/4, P(B) = Nc/Nd = 1/4

P(SIRM|B) = (1+1)/(3+4) = 2/7
P(master|B) = (0+1)/(3+4) = 1/7
P(recherche|B) = (1+1)/(3+4) = 2/7
P(FSDM|B) = (1+1)/(3+4) = 2/7

Example

P(A|D5) = 3/4 · (4/11)^4 · 1/11 · 2/11 = 0.00022
P(B|D5) = 1/4 · (2/7)^5 · 1/7 = 0.000068

Now we normalize:

P(A|D5) = 0.00022 / (0.000068 + 0.00022)
P(B|D5) = 0.000068 / (0.000068 + 0.00022)

Then:

P(A|D5) = 0.76, P(B|D5) = 0.24

So the predicted class is A:

      doc  words                                 class
test  D5   SIRM SIRM SIRM master recherche FSDM  A
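A self-contained Python sketch (ours) of the multinomial Naïve Bayes classifier used in this example, reproducing the numbers above:

```python
# Multinomial Naive Bayes with add-one smoothing over the 4-word vocabulary.
from collections import Counter

training = [
    ("SIRM master FSDM", "A"),
    ("SIRM master", "A"),
    ("master SIRM", "A"),
    ("SIRM recherche FSDM", "B"),
]
vocab = {w for doc, _ in training for w in doc.split()}

priors, word_counts, totals = Counter(), {}, Counter()
for doc, c in training:
    priors[c] += 1                  # document count per class: P(c) = Nc/Nd
    word_counts.setdefault(c, Counter()).update(doc.split())
    totals[c] += len(doc.split())   # total words per class

def score(doc, c):
    p = priors[c] / len(training)
    for w in doc.split():
        p *= (word_counts[c][w] + 1) / (totals[c] + len(vocab))  # smoothed
    return p

d5 = "SIRM SIRM SIRM master recherche FSDM"
scores = {c: score(d5, c) for c in priors}
total = sum(scores.values())
print({c: round(s / total, 2) for c, s in scores.items()})  # {'A': 0.76, 'B': 0.24}
```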

Conclusion

The naive Bayes model is tremendously appealing because of its simplicity, elegance, and robustness.

It is one of the oldest formal classification algorithms, and yet even in its simplest form it is often surprisingly effective.

A large number of modifications have been introduced by the statistical, data mining, machine learning, and pattern recognition communities in an attempt to make it more flexible.

Thank You