Post on 27-Mar-2021
KNN Classifier Naive Bayesian Classifier
Classification: KNN Classifier, Naive Bayesian Classifier
CS479/508, Slide 1/27
Outline
1 KNN Classifier
2 Naive Bayesian Classifier
CS479/508, Slide 2/27
k-Nearest-Neighbor classifiers
First introduced in early 1950s
Distance metrics
Euclidean distance is generally used when the values are continuous
Hamming distance is used for categorical/nominal values
CS479/508, Slide 3/27
Algorithm idea
Let k be the number of nearest neighbors and D be the set of training examples
for each test example z = (x′, y′) do
    Compute d(x, x′), the distance between z and every example (x, y) ∈ D
    Select Dz ⊆ D, the set of k closest training examples to z
    y′ = argmax_v Σ_{(xi, yi) ∈ Dz} I(v == yi)   // majority voting
end for
v: a class label
yi: the class label of one of the nearest neighbors
I: indicator function that returns 1 if its argument is true and 0 otherwise
CS479/508, Slide 4/27
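The majority-voting procedure above can be sketched in plain Python (a minimal illustration, not the R `class::knn` implementation referenced later in these slides; the function and variable names are my own):

```python
import math
from collections import Counter

def knn_classify(D, z, k):
    """Classify test point z by majority vote among its k nearest
    neighbors in the training set D (a list of (x, y) pairs)."""
    # d(x, x'): Euclidean distance from z to a training example's attributes
    dist = lambda x: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    # Select Dz: the k closest training examples to z
    Dz = sorted(D, key=lambda xy: dist(xy[0]))[:k]
    # Majority voting: argmax_v sum of I(v == yi)
    votes = Counter(y for _, y in Dz)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
D = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
     ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(D, (2, 2), 3))  # -> A
print(knn_classify(D, (8, 7), 3))  # -> B
```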
Discussions
Computation can be costly if the number of training examples is large.
Efficient indexing techniques are available to find the nearest neighbors of a test example.
Sensitivity to k: to reduce the impact of the choice of k, weight the influence of each nearest neighbor xi: wi = 1/d(x′, xi)²
Distance-weighted voting:
y′ = argmax_v Σ_{(xi, yi) ∈ Dz} wi × I(v == yi)
CS479/508, Slide 5/27
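Distance-weighted voting replaces each unit vote with the weight wi = 1/d(x′, xi)². A plain-Python sketch (again with hypothetical names, not the course's R code):

```python
import math
from collections import defaultdict

def knn_weighted(D, z, k):
    """Distance-weighted kNN: each of the k nearest neighbors
    contributes w_i = 1 / d(z, x_i)^2 to its class's score."""
    dist = lambda x: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    Dz = sorted(D, key=lambda xy: dist(xy[0]))[:k]
    scores = defaultdict(float)
    for x, y in Dz:
        d = dist(x)
        # Guard against zero distance (test point coincides with a training point)
        scores[y] += float("inf") if d == 0 else 1.0 / d ** 2
    return max(scores, key=scores.get)

D = [((0, 0), "A"), ((1, 0), "A"),
     ((5, 0), "B"), ((6, 0), "B"), ((7, 0), "B")]
# With k=5 a plain majority vote would pick "B" (3 of 5 neighbors),
# but the two very close "A" points dominate the weighted vote.
print(knn_weighted(D, (0.5, 0), 5))  # -> A
```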
Characteristics
Instance-based learning: (1) requires a proximity measure; (2) makes predictions without having to maintain a model.
Lazy learner: does not require model building.
Predictions are made based on local information; thus, kNN is quite susceptible to noise.
Can produce arbitrarily shaped decision boundaries, which is more flexible than the decision tree approach.
CS479/508, Slide 6/27
References
Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, Eamonn J. Keogh: Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2):1542-1552 (2008)
Abdullah Mueen, Eamonn J. Keogh, Neal Young: Logical-shapelets: an expressive primitive for time series classification. KDD 2011:1154-1162
Minh Nhut Nguyen, Xiao-Li Li, See-Kiong Ng: Positive Unlabeled Learning for Time Series Classification. IJCAI 2011:1421-1426
CS479/508, Slide 7/27
kNN classifier – R
> install.packages('class')
> library(class)
> help(knn)
https://cran.r-project.org/web/packages/class/class.pdf
CS479/508, Slide 8/27
Naive Bayesian Classifier
Non-deterministic relationship between the attribute set and the class label.
Reason 1: noisy data
Reason 2: complex factors
Probabilistic relationships between the attribute set and the class label.
Bayesian classifiers: a probabilistic framework for solving classification problems.
CS479/508, Slide 9/27
Bayes Theorem
Conditional probability:
P(C|A) = P(A, C) / P(A)
P(A|C) = P(A, C) / P(C)
Bayes theorem:
P(C|A) = P(A|C) P(C) / P(A)
CS479/508, Slide 10/27
Example of Bayes Theorem
Given:
A doctor knows that meningitis causes stiff neck 50% of the time
Prior probability of any patient having meningitis is 1/50,000
Prior probability of any patient having stiff neck is 1/20
If a patient has stiff neck, what's the probability he/she has meningitis?
Solve this problem:
M: a patient has meningitis
S: a patient has stiff neck
P(S|M) = 50%, P(M) = 1/50,000, P(S) = 1/20, P(M|S) = ?
P(M|S) = P(S|M) P(M) / P(S) = (1/2) · (1/50,000) / (1/20) = 20/100,000 = 0.0002
CS479/508, Slide 11/27
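The arithmetic above is easy to verify numerically (a quick check; the variable names are my own):

```python
# Bayes theorem: P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5     # meningitis causes stiff neck 50% of the time
p_m = 1 / 50_000      # prior: patient has meningitis
p_s = 1 / 20          # prior: patient has stiff neck
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))  # -> 0.0002
```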
Bayesian Classifiers
Consider each attribute and class label as random variables
Given a record with attributes (A1,A2, · · · ,An)
Goal is to predict class C
Specifically, we want to find the value of C that maximizes P(C|A1, A2, · · · , An)
Can we estimate P(C |A1,A2, · · · ,An) directly from data?
CS479/508, Slide 12/27
Bayesian Classifiers – approach
Compute the posterior probability P(C|A1, A2, · · · , An) for all values of C using the Bayes theorem
P(C|A1, A2, · · · , An) = P(A1, A2, · · · , An|C) P(C) / P(A1, A2, · · · , An)
Choose the value of C that maximizes P(C|A1, A2, · · · , An)
≡ maximize P(A1, A2, · · · , An|C) P(C)
How to estimate P(A1, A2, · · · , An|C)?
CS479/508, Slide 13/27
Naive Bayes Classifier
Assume independence among attributes Ai when class is given
P(A1,A2, · · · ,An|Cj) = P(A1|Cj)P(A2|Cj) · · ·P(An|Cj)
Can estimate P(Ai |Cj) for all Ai and Cj .
New point is classified to Cj if P(Cj)ΠP(Ai |Cj) is maximal.
CS479/508, Slide 14/27
Estimate Probabilities from Data
Tid Refund Marital Status Taxable Income Cheat
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Class: P(C = Ci) = Ni / N
Discrete attributes: P(Ai = ai|C = Cj) = Nij / Nj
Nij: # of instances having attribute Ai = ai and belonging to class Cj
Nj: # of instances belonging to class Cj
P(No) = 0.7, P(Yes) = 0.3
P(Status = Married|No) = 4/7
P(Refund = Yes|Yes) = 0
CS479/508, Slide 15/27
Estimate Probabilities from Data – continuous attributes
Discretize the range into bins
    one ordinal attribute per bin
Two-way split: (A < v) or (A ≥ v)
    choose only one of the two splits as the new attribute
Probability density estimation
    Assume the attribute follows a normal distribution
    Use data to estimate the parameters of the distribution (e.g., mean µ and standard deviation σ)
    Once the probability distribution is known, use it to estimate the conditional probability
P(Ai|cj) = 1/(√(2π) σij) · exp(−(Ai − µij)² / (2σij²))
µij: estimated as the sample mean of Ai over all training records with class cj
σij²: estimated as the sample variance s² of Ai over all training records with class cj
CS479/508, Slide 16/27
Estimate Probabilities from Data – Example
Tid Refund Marital Status Taxable Income Cheat
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
P(Ai|cj) = 1/(√(2π) σij) · exp(−(Ai − µij)² / (2σij²))
For (Income, Class = No):
Sample mean: (125 + 100 + 70 + 120 + 60 + 220 + 75) / 7 = 110
Sample variance: 2975
P(Income = 120|No) = 1/(√(2π) · 54.54) · exp(−(120 − 110)² / (2 · 2975)) = 0.0072
CS479/508, Slide 17/27
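The Gaussian estimate above can be reproduced numerically (a quick Python check; the helper name is my own):

```python
import math

def gaussian_likelihood(x, mean, var):
    """Normal density N(x; mean, var), used as the conditional
    likelihood P(Ai = x | cj) for a continuous attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / (math.sqrt(2 * math.pi) * math.sqrt(var))

incomes_no = [125, 100, 70, 120, 60, 220, 75]  # Income values for Class = No
n = len(incomes_no)
mean = sum(incomes_no) / n                                # -> 110.0
var = sum((x - mean) ** 2 for x in incomes_no) / (n - 1)  # sample variance -> 2975.0
print(round(gaussian_likelihood(120, mean, var), 4))      # -> 0.0072
```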
Example of Naive Bayes Classifier – Example 1
Given a test record: X = (Refund = No, Married, Income = 120K)
P(Refund = Yes|No) = 3/7
P(Refund = No|No) = 4/7
P(Refund = Yes|Yes) = 0
P(Marital Status = Single|No) = 2/7
P(Marital Status = Divorced|No) = 1/7
P(Marital Status = Married|No) = 4/7
P(Marital Status = Single|Yes) = 2/3
P(Marital Status = Divorced|Yes) = 1/3
P(Marital Status = Married|Yes) = 0
P(Class = Yes) = 0.3
P(Class = No) = 0.7
For taxable income:
If class = No: sample mean = 110, variance = 2975
If class = Yes: sample mean = 90, variance = 25
CS479/508, Slide 18/27
Example 1 (cont.)
P(X|Class = No) = P(Refund = No|No) × P(Married|No) × P(Income = 120K|No)
                = 4/7 × 4/7 × 0.0072
                = 0.0024
P(X|Class = Yes) = P(Refund = No|Yes) × P(Married|Yes) × P(Income = 120K|Yes)
                 = 1 × 0 × 1.2 × 10⁻⁹
                 = 0
Since P(X|No) × P(No) > P(X|Yes) × P(Yes), P(No|X) > P(Yes|X) → class = No
P(Married|Yes) = 0. If one of the conditional probabilities is zero, then the entire expression becomes zero.
CS479/508, Slide 19/27
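Example 1 can be reproduced end to end with a short from-scratch sketch (plain Python, not the e1071 implementation shown later; the helper names are my own):

```python
import math

# Training records from the slides: (Refund, Marital Status, Income, Cheat)
data = [("Yes", "Single", 125, "No"),   ("No", "Married", 100, "No"),
        ("No", "Single", 70, "No"),     ("Yes", "Married", 120, "No"),
        ("No", "Divorced", 95, "Yes"),  ("No", "Married", 60, "No"),
        ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
        ("No", "Married", 75, "No"),    ("No", "Single", 90, "Yes")]

def cond_prob(attr_idx, value, cls):
    """P(Ai = value | Class = cls) by simple counting (no smoothing)."""
    in_class = [r for r in data if r[3] == cls]
    return sum(r[attr_idx] == value for r in in_class) / len(in_class)

def gaussian(x, values):
    """Gaussian likelihood using the sample mean/variance of `values`."""
    n, mean = len(values), sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def score(cls, refund, status, income):
    """P(X | Class = cls) * P(Class = cls) under the naive assumption."""
    prior = sum(r[3] == cls for r in data) / len(data)
    incomes = [r[2] for r in data if r[3] == cls]
    return (prior * cond_prob(0, refund, cls) * cond_prob(1, status, cls)
            * gaussian(income, incomes))

# Test record X = (Refund = No, Married, Income = 120K)
s_no = score("No", "No", "Married", 120)
s_yes = score("Yes", "No", "Married", 120)
print(s_no > s_yes)  # -> True: predicted class is No
print(s_yes)         # -> 0.0 (P(Married|Yes) = 0 zeroes out the product)
```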
Naive Bayes Classifier – Probability estimation for zero conditional probability
Original: P(Ai = ai|C = Cj) = Nij / Nj
Laplace: P(Ai = ai|C = Cj) = (Nij + 1) / (Nj + c)
m-estimate: P(Ai = ai|C = Cj) = (Nij + m·p) / (Nj + m)
Nij: the number of instances with attribute value ai and belonging to class Cj
Nj: the number of instances belonging to class Cj
c: total number of classes
p: prior probability for the positive class Cj
m: parameter
E.g., using the m-estimate with m = 3 and p = 0.3 (prior for the positive class Yes):
P(Marital Status = Married|Yes) = (0 + 3 × 0.3) / (3 + 3) = 0.9/6 = 0.15
CS479/508, Slide 20/27
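The three estimators can be sketched side by side (plain Python; the function names are my own):

```python
def original(n_ij, n_j):
    """Unsmoothed estimate Nij / Nj -- can be zero."""
    return n_ij / n_j

def laplace(n_ij, n_j, c):
    """Laplace smoothing: (Nij + 1) / (Nj + c), c = number of classes."""
    return (n_ij + 1) / (n_j + c)

def m_estimate(n_ij, n_j, m, p):
    """m-estimate: (Nij + m*p) / (Nj + m), p = prior, m = parameter."""
    return (n_ij + m * p) / (n_j + m)

# P(Marital Status = Married | Yes): Nij = 0 out of Nj = 3 "Yes" records
print(original(0, 3))                     # -> 0.0 (zeroes out the whole product)
print(round(m_estimate(0, 3, 3, 0.3), 2))  # -> 0.15 (matches the slide)
```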
Probability estimation – Zero probability in R package e1071
R package e1071, method naiveBayes assumes
independence of the predictor variables, and
Gaussian distribution (given the target class) of metric predictors.
For attributes with missing values, the corresponding table entries are
omitted for prediction.
The laplace and threshold parameters are used to adjust the attribute likelihood P(Ai = ai|C = Cj).
laplace is a positive number controlling Laplace smoothing, which adjusts the attribute likelihood P(Ai = ai|C = Cj) to be (Nij + laplace) / (Nj + |Ai| × laplace), where |Ai| is the number of different values for attribute Ai.
This parameter is only used to process attributes with categorical values.
The default laplace value is zero.
The threshold parameter is used to replace a zero attribute likelihood P(Ai = ai|C = Cj).
CS479/508, Slide 21/27
Example of Naive Bayes Classifier – Example 2
Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals
CS479/508, Slide 22/27
Example 2 (cont.)
Given test case:
Name Give Birth Can Fly Live in Water Have Legs Class
yes no yes no ?
A: attributes
M: mammals
N: non-mammals
CS479/508, Slide 23/27
Example 2 (cont.)
P(M) = 7/20, P(N) = 13/20
P(GB = Y|M) = 6/7, P(GB = N|M) = 1/7
P(Can Fly = Y|M) = 1/7, P(Can Fly = N|M) = 6/7
P(Live in Water = Y|M) = 2/7, P(Live in Water = N|M) = 5/7
P(Have Legs = Y|M) = 5/7, P(Have Legs = N|M) = 2/7
P(X|M) = P(GB = Y|M) × P(Can Fly = N|M) × P(Live in Water = Y|M) × P(Have Legs = N|M)
       = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(X|N) = P(GB = Y|N) × P(Can Fly = N|N) × P(Live in Water = Y|N) × P(Have Legs = N|N)
       = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(X|M) × P(M) = 0.06 × 7/20 = 0.021
P(X|N) × P(N) = 0.0042 × 13/20 = 0.0027
Since P(X|M) × P(M) > P(X|N) × P(N), class = Mammals
CS479/508, Slide 24/27
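The counts and products in Example 2 can be checked programmatically (a plain-Python sketch over the slide's table; names are my own, with "sometimes" abbreviated "s"):

```python
# (Give Birth, Can Fly, Live in Water, Have Legs, Class), one row per animal
animals = [("y","n","n","y","M"), ("n","n","n","n","N"), ("n","n","y","n","N"),
           ("y","n","y","n","M"), ("n","n","s","y","N"), ("n","n","n","y","N"),
           ("y","y","n","y","M"), ("n","y","n","y","N"), ("y","n","n","y","M"),
           ("y","n","y","n","N"), ("n","n","s","y","N"), ("n","n","s","y","N"),
           ("y","n","n","y","M"), ("n","n","y","n","N"), ("n","n","s","y","N"),
           ("n","n","n","y","N"), ("n","n","n","y","M"), ("n","y","n","y","N"),
           ("y","n","y","n","M"), ("n","y","n","y","N")]

def posterior_score(cls, x):
    """P(X|class) * P(class) under the naive independence assumption."""
    rows = [a for a in animals if a[4] == cls]
    score = len(rows) / len(animals)  # class prior
    for i, v in enumerate(x):
        score *= sum(r[i] == v for r in rows) / len(rows)
    return score

# Test case: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no
x = ("y", "n", "y", "n")
m, n = posterior_score("M", x), posterior_score("N", x)
print(round(m, 3), round(n, 4))               # -> 0.021 0.0027
print("Mammals" if m > n else "Non-mammals")  # -> Mammals
```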
Naive Bayesian classifier – R
> install.packages('e1071', dependencies = TRUE)
> library(e1071)
> help(naiveBayes)
https://cran.r-project.org/web/packages/e1071/e1071.pdf
CS479/508, Slide 25/27
Naive Bayesian classifier – R
Sometimes, we may run into the following mistake:
model_nb <- naiveBayes(trn.classlabel ~ ., data = trn.data)
predict(model_nb, tst.data, type=c("class"))
#factor(0)
#Levels:
One possible reason is that your trn.classlabel values are treated as numerical values.
They should be categorical class labels, as stated in the naiveBayes method description:
"Computes the conditional a-posterior probabilities of a categorical class variable given independent predictor variables using the Bayes rule."
You may want to run this to see whether the class labels are categorical values.
is.factor(trn.classlabel)
If they are not categorical class labels, the following change should work
model_nb <- naiveBayes(as.factor(trn.classlabel) ~ ., data = trn.data)
predict(model_nb, tst.data, type=c("class"))
CS479/508, Slide 26/27
Naive Bayes Classifier (Summary)
Robust to isolated noise points
Handle missing values by ignoring the instance during probability estimate calculations
Robust to irrelevant attributes
Independence assumption may not hold for some attributes
Use other techniques such as Bayesian Belief Networks (BBN)
CS479/508, Slide 27/27