Tuesday, October 27, 2015
TRANSCRIPT
April 20, 2023 · Data Mining: Concepts and Techniques · 1
Classification and Prediction
(Data Mining: Concepts and Techniques)
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by backpropagation
Other classification methods
Prediction
Performance evaluation
Summary
An Example of Classification (Fruit Classifier)

Input features (shape: round or oval; color: red, orange, or yellow) go into the classifier; the classifier output is a class label:

shape = round, color = red    → Classifier → Apple
shape = round, color = orange → Classifier → Orange
shape = oval,  color = yellow → Classifier → Mango
A Graphical Model for a Classifier

Input features x1, x2, …, xn → Classifier → output y (class label)
Model Representation for a Classifier

The model has the form y = f (x1, x2, …, xn):

Input features x1, x2, …, xn → Classifier → output y (class label)

Main problems in classifier model construction:
- What are x1, …, xn for constructing an effective f ?
- How do we get the model f given x1, …, xn ?
- How do we collect training data with class label y for creating the model f ?
Main Data Mining Techniques: Supervised Learning

Use a training set to construct a model for forecasting the outcome of future events. There are two main types.

Classification
- constructs models that distinguish classes for future forecasts
- Applications: loan approval, customer classification, fingerprint recognition
- Model representation: decision tree, neural network

Prediction
- constructs models that predict unknown numerical values
- Applications: price prediction of various securities and assets
- Model representation: neural network, linear regression
Classification vs. Prediction

Use a training set to construct a model for forecasting the outcome of future events.

Classification
- predicts categorical class labels
- constructs a classification model to classify new data

Prediction
- predicts numerical values
- constructs a continuous-valued function to predict unknown or missing values

Typical applications: credit card approval, medical diagnosis and treatment, pattern recognition
Classification vs. Prediction
Input features x1, …, xn → Classifier → output: class label (category/nominal value)
Input features x1, …, xn → Predictor → output: predicted value (continuous value)
Classification—A Two-Step Process
1. Model construction
   - Training set: the data set used for model construction
   - Class label: each tuple/sample is assumed to belong to a predefined class (determined by the class label attribute)
   - Model representation: classification rules, decision trees, or mathematical formulae
2. Model usage: classifying future unknown objects
   - Test set: a data set independent of the training set
   - Performance evaluation: how good is the model?
     - The known class label y of each test sample is compared with the classified result y' from the classification model
     - Accuracy rate: the percentage of test samples correctly classified by the model (the ratio of samples with y = y')
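As a small illustration of the accuracy rate (the labels below are hypothetical, not from the slides), we simply count how often the known label y matches the classified result y':

```python
def accuracy_rate(y_true, y_pred):
    """Fraction of test samples whose known label y equals the classified result y'."""
    matches = sum(1 for y, y_hat in zip(y_true, y_pred) if y == y_hat)
    return matches / len(y_true)

# hypothetical test-set labels vs. classifier outputs: 3 of 4 agree
print(accuracy_rate(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))  # 0.75
```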
Classification: A Two-Step Process

1. Model construction: Training data (I, O) → Classification learning algorithm → Classifier model
2. Model usage: input features → Classifier model → output class label
Classification Process (1): Example of Model Construction

Training data:

| name | rank           | years | tenured |
|------|----------------|-------|---------|
| Mike | Assistant Prof | 3     | no      |
| Mary | Assistant Prof | 7     | yes     |
| Bill | Professor      | 2     | yes     |
| Jim  | Associate Prof | 7     | yes     |
| Dave | Assistant Prof | 6     | no      |
| Anne | Associate Prof | 3     | no      |

The input features are rank and years; the class label is tenured. A classification learning algorithm produces the classifier (model) tenured = f (rank, years), for example:

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
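The learned rule can be written directly as a function; a minimal sketch (the rule and data are from the slide, the function name is ours):

```python
def tenured(rank, years):
    """Classifier model learned from the training data:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# the rule reproduces every class label in the training data
training = [("Mike", "Assistant Prof", 3, "no"), ("Mary", "Assistant Prof", 7, "yes"),
            ("Bill", "Professor", 2, "yes"), ("Jim", "Associate Prof", 7, "yes"),
            ("Dave", "Assistant Prof", 6, "no"), ("Anne", "Associate Prof", 3, "no")]
print(all(tenured(rank, years) == label for _, rank, years, label in training))  # True
```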
Classification Process (2): Example of Model Use

Testing data (for performance evaluation of the classifier):

| name    | rank           | years | tenured |
|---------|----------------|-------|---------|
| Tom     | Assistant Prof | 2     | no      |
| Merlisa | Associate Prof | 7     | no      |
| George  | Professor      | 5     | yes     |
| Joseph  | Assistant Prof | 7     | yes     |

Unseen data: (Jeff, Professor, 4). Predict tenured?
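Continuing the sketch, applying the rule from the slide above (IF rank = 'professor' OR years > 6 THEN tenured = 'yes') to the test set gives the accuracy rate, and the model can then classify the unseen sample:

```python
def tenured(rank, years):
    """The classifier model learned on the training slide."""
    return "yes" if rank == "Professor" or years > 6 else "no"

test_set = [("Tom", "Assistant Prof", 2, "no"), ("Merlisa", "Associate Prof", 7, "no"),
            ("George", "Professor", 5, "yes"), ("Joseph", "Assistant Prof", 7, "yes")]

correct = sum(1 for _, rank, years, label in test_set if tenured(rank, years) == label)
print(correct / len(test_set))   # 0.75 (Merlisa is misclassified by the rule)
print(tenured("Professor", 4))   # prediction for the unseen sample (Jeff, Professor, 4)
```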
Supervised vs. Unsupervised Learning
Supervised learning (classification)
- Aim: establish a classifier model
- Supervision: the training data (observations, measurements, etc.) are accompanied by class labels indicating the class of each observation

Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Aim: establish the classes or clusters in the data
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by backpropagation
Classification based on concepts from association rule mining
Other classification methods
Prediction
Estimating classification accuracy
Summary
Issues regarding classification & prediction
1) Data preparation (data preprocessing)
- Data cleaning: preprocess data to reduce noise and handle missing values
- Feature relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data
Issues regarding classification and prediction
2) Performance evaluation of classification methods
- Predictive accuracy
- Speed scalability (for big data analysis): time to construct the model; time to use the model
- Space scalability (for big data analysis): memory/disk required to construct/use the model
- Robustness: handling noise and missing values
- Interpretability: understanding and insight provided by the model
- Goodness of rules: size and compactness of the classification rules
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by backpropagation
Classification based on concepts from association rule mining
Other classification methods
Prediction
Estimating classification accuracy
Summary
Decision Tree Induction Algorithm (A Learning Algorithm for Classification Models)

Decision tree induction
- Given: a set of training data (<I, O> = <x1, …, xn, y>)
- Aim: find f, where y = f (x1, x2, …, xn) and f is in decision-tree form; that is, construct a minimal decision tree that effectively classifies future unknown samples

Decision tree representation: a flow-chart-like tree structure
- An internal node denotes a test on an attribute of a sample
- A branch represents an outcome of the test
- A leaf node represents a class label or a class distribution
Classification in Decision Tree Induction
1. Generation of the decision tree, consisting of two phases
   - Tree construction: the tree is built node by node in a top-down manner using the training examples
   - Tree pruning: identify and remove branching subtrees that reflect noise or outliers
2. Use of the decision tree: classifying an unknown sample
   - Test the attribute values of a sample (with unknown class label) against the decision tree
An Example of a Training Dataset (for buys_PC)

| age   | income | student | credit_rating | buys_PC |
|-------|--------|---------|---------------|---------|
| <=30  | high   | no      | fair          | no      |
| <=30  | high   | no      | excellent     | no      |
| 31…40 | high   | no      | fair          | yes     |
| >40   | medium | no      | fair          | yes     |
| >40   | low    | yes     | fair          | yes     |
| >40   | low    | yes     | excellent     | no      |
| 31…40 | low    | yes     | excellent     | yes     |
| <=30  | medium | no      | fair          | no      |
| <=30  | low    | yes     | fair          | yes     |
| >40   | medium | yes     | fair          | yes     |
| <=30  | medium | yes     | excellent     | yes     |
| 31…40 | medium | no      | excellent     | yes     |
| 31…40 | high   | yes     | fair          | yes     |
| >40   | medium | no      | excellent     | no      |

The input features are age, income, student, and credit_rating; the class label is buys_PC. This follows an example from Quinlan's ID3.
A Decision Tree for Predicting buys_PC

buys_PC = f (age, student, credit rating): input features (age, student, credit rating) → f → buys_PC = yes/no

age?
  <=30  → student?
            no  → no
            yes → yes
  31…40 → yes
  >40   → credit rating?
            excellent → no
            fair      → yes

(age?, student?, and credit rating? are tests on input attributes; branch labels are attribute values; leaves are class labels for buys_PC.)
Extracting Classification Rules from Trees
Rules are easier for humans to understand
- Represent the knowledge in the form of IF-THEN rules: one rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a condition; the leaf node holds the class prediction

Examples of extracted rules (matching the decision tree and training data above):
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
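The one-rule-per-path idea translates directly into code; a sketch (function name is ours, and for age > 40 we follow the tree and training data, which give "excellent → no" and "fair → yes"):

```python
def buys_computer(age, student, credit_rating):
    """One IF-THEN rule per root-to-leaf path of the buys_PC decision tree."""
    if age == "<=30" and student == "no":
        return "no"
    if age == "<=30" and student == "yes":
        return "yes"
    if age == "31...40":
        return "yes"
    if age == ">40" and credit_rating == "excellent":
        return "no"
    if age == ">40" and credit_rating == "fair":
        return "yes"

print(buys_computer("31...40", "no", "fair"))   # yes
print(buys_computer(">40", "no", "excellent"))  # no
```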
ID3 Algorithm for Decision Tree Induction
Assumption: attributes are categorical (continuous-valued attributes are discretized in advance)

Idea of the decision tree induction algorithm:
- The tree is constructed node by node in a top-down, recursive, divide-and-conquer manner
- Key: find the most discriminating attribute at each node
- At the start, all training samples are located at the root node
- At each node, the training samples at that node are used to select the most discriminating attribute, on the basis of a heuristic or statistical measure (the attribute selection measure)
- The training samples are then partitioned into branches according to their values of the most discriminating attribute
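The top-down, divide-and-conquer procedure can be sketched compactly. This is a minimal illustration (helper names are ours) that selects the most discriminating attribute by lowest post-split entropy, i.e., highest information gain, the measure defined on the following slides:

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information needed to classify a set with the given class labels."""
    total = len(labels)
    return -sum(c/total * math.log2(c/total) for c in Counter(labels).values())

def most_discriminating(rows, labels, attrs):
    """Attribute with the highest information gain = lowest post-split entropy."""
    def split_entropy(a):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[a], []).append(y)
        return sum(len(g)/len(labels) * entropy(g) for g in groups.values())
    return min(attrs, key=split_entropy)

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:              # all samples in the same class
        return labels[0]
    if not attrs:                          # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    a = most_discriminating(rows, labels, attrs)
    branches = {}
    for value in {row[a] for row in rows}: # partition samples by a's values
        idx = [i for i, row in enumerate(rows) if row[a] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [x for x in attrs if x != a])
    return (a, branches)

# the buys_PC training data: (age, income, student, credit_rating) -> buys_PC
data = [("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
        ("31...40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
        (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
        ("31...40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
        ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
        ("<=30","medium","yes","excellent","yes"), ("31...40","medium","no","excellent","yes"),
        ("31...40","high","yes","fair","yes"), (">40","medium","no","excellent","no")]
attrs = ["age", "income", "student", "credit_rating"]
rows = [dict(zip(attrs, d[:4])) for d in data]
labels = [d[4] for d in data]

tree = id3(rows, labels, attrs)
print(tree[0])             # age: the most discriminating attribute at the root
print(tree[1]["31...40"])  # yes: all 31...40 samples fall in the same class
```

Running this on the buys_PC data reproduces the tree shown earlier: the root splits on age, the <=30 branch splits on student, and the >40 branch splits on credit_rating.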
Partitioning of Training Data at a Node of a Decision Tree

At the root node, all 14 training samples (the buys_PC table above) are present. age, income, student, and credit_rating are tested; age is the most discriminating attribute, so the samples are partitioned by its values:

age <=30:
| age  | income | student | credit_rating | buys_PC |
|------|--------|---------|---------------|---------|
| <=30 | high   | no      | fair          | no      |
| <=30 | high   | no      | excellent     | no      |
| <=30 | medium | no      | fair          | no      |
| <=30 | low    | yes     | fair          | yes     |
| <=30 | medium | yes     | excellent     | yes     |

age 31…40:
| age   | income | student | credit_rating | buys_PC |
|-------|--------|---------|---------------|---------|
| 31…40 | high   | no      | fair          | yes     |
| 31…40 | low    | yes     | excellent     | yes     |
| 31…40 | medium | no      | excellent     | yes     |
| 31…40 | high   | yes     | fair          | yes     |

age >40:
| age | income | student | credit_rating | buys_PC |
|-----|--------|---------|---------------|---------|
| >40 | medium | no      | fair          | yes     |
| >40 | low    | yes     | fair          | yes     |
| >40 | low    | yes     | excellent     | no      |
| >40 | medium | yes     | fair          | yes     |
| >40 | medium | no      | excellent     | no      |
Stopping Conditions (for Decision Tree Induction)
Conditions for stopping the recursive data partitioning:
- All samples at a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed to label the leaf)
- There are no samples left
Attribute Selection Measures (finding the most discriminating attribute at each node)

Information gain (ID3/C4.5)
- All attributes are assumed to be categorical
- Can be modified for continuous-valued attributes

Gini index (IBM IntelligentMiner)
- All attributes are assumed to be continuous-valued
- Assumes several possible split values exist for each attribute
- May need other tools, such as clustering, to get the possible split values
- Can be modified for categorical attributes
Information Gain (ID3/C4.5): A Measure for Attribute Selection

How do we find the most discriminating attribute for each node? Idea: choose the attribute with the highest information gain at each node.

Assume there are two classes, P and N, in the training examples, and the set of training examples S contains p elements of class P and n elements of class N. The information amount (entropy) needed to decide whether an arbitrary example in S belongs to P or N is defined as

I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
Information Gain (in Decision Tree Induction)

Assume that, using attribute Ak, the set S is partitioned into subsets {S1, S2, …, Sv} (i.e., Ak has v distinct values). If Si contains pi examples of P and ni examples of N, the entropy E(Ak), the expected information amount needed to classify objects in all subtrees of S, is

E(Ak) = Σ (i = 1 to v) ((pi + ni) / (p + n)) · I(pi, ni)

The information amount gained by branching on Ak is

Gain(Ak) = I(p, n) - E(Ak)

At each node, select the attribute Ai with the maximal gain, i.e., Gain(Ai) ≥ Gain(Aj) for 1 ≤ j ≤ m, j ≠ i.
An Example (Attribute Selection by Information Gain)

Data: the buys_PC training set, at the root node.
- Class P: buys_computer = "yes" (p = 9)
- Class N: buys_computer = "no" (n = 5)
- I(p, n) = I(9, 5) = 0.940

Compute the entropies for the attributes age, income, student, and credit_rating. Splitting the data using age gives:

| age   | pi | ni | I(pi, ni) |
|-------|----|----|-----------|
| <=30  | 2  | 3  | 0.971     |
| 31…40 | 4  | 0  | 0         |
| >40   | 3  | 2  | 0.971     |

Total split entropy for age:

E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.69

Information gain for age:

Gain(age) = I(p, n) - E(age) = 0.25

Similarly, Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048. Therefore attribute age is selected at the root node.
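The numbers above can be checked mechanically; a small sketch of I(p, n), the split entropy, and the resulting gain:

```python
import math

def I(p, n):
    """I(p, n): expected information for p positives and n negatives (0·log2 0 = 0)."""
    total = p + n
    return -sum(c/total * math.log2(c/total) for c in (p, n) if c)

def E(partitions, p, n):
    """Split entropy: sum over branches of ((pi + ni)/(p + n)) · I(pi, ni)."""
    return sum((pi + ni)/(p + n) * I(pi, ni) for pi, ni in partitions)

p, n = 9, 5
age = [(2, 3), (4, 0), (3, 2)]            # (pi, ni) for <=30, 31…40, >40
print(round(I(p, n), 3))                  # 0.94  (≈ 0.940)
print(round(E(age, p, n), 2))             # 0.69
print(round(I(p, n) - E(age, p, n), 2))   # 0.25
```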
Gini Index (IBM IntelligentMiner)

If a data set T contains examples from n classes, the gini index gini(T) is defined as

gini(T) = 1 - Σ (j = 1 to n) pj²

where pj is the data percentage of class j in T.

If a data set T is split into two subsets T1 and T2, with sizes N1 and N2 respectively, by using attribute Ak, the gini index after splitting, ginik(T), is defined as

ginik(T) = (N1/N) gini(T1) + (N2/N) gini(T2)

The attribute providing the smallest ginik(T) is chosen to split the node (this requires enumerating all possible splitting points for each attribute Ak).
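The two formulas are short enough to sketch directly; here the class counts 9/5 are the buys_PC data, while the binary split below is a hypothetical illustration, not one from the slides:

```python
def gini(class_counts):
    """gini(T) = 1 - sum_j pj^2, where pj is the fraction of class j in T."""
    total = sum(class_counts)
    return 1 - sum((c/total)**2 for c in class_counts)

def gini_split(part1, part2):
    """gini_k(T) = (N1/N)·gini(T1) + (N2/N)·gini(T2) for a binary split."""
    n1, n2 = sum(part1), sum(part2)
    return n1/(n1+n2) * gini(part1) + n2/(n1+n2) * gini(part2)

print(round(gini([9, 5]), 3))                 # 0.459: gini of the full buys_PC set
print(round(gini_split([6, 1], [3, 4]), 3))   # 0.367: a hypothetical 7/7 split
```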
Avoiding Overfitting in ID3/C4.5

The generated tree may overfit the training data:
- resulting in poor accuracy for unseen samples
- if there are too many branches, some may reflect anomalies due to noise or outliers

Two pruning approaches avoid overfitting:
- Prepruning: halt tree construction early, i.e., do not split a node if this would result in the goodness measure falling below a threshold (it is difficult to choose an appropriate threshold)
- Postpruning: obtain a sequence of progressively pruned trees from a "fully grown" tree, then use a data set different from the training data to decide which is the "best pruned tree"
Enhancements to Basic Decision Tree Induction

Allow continuous-valued attributes:
- Define new discrete-valued attributes by dynamically partitioning the continuous attribute values into a set of discrete intervals

Handle missing attribute values by:
- Assigning the most common value of the attribute
- Assigning a probability to each of the possible values
Why Decision Tree Induction?

Decision tree induction is a classification learning algorithm, and classification is a typical problem extensively studied by statisticians and machine learning researchers. Why use decision tree induction for classification?
- Convertible to simple, easy-to-understand classification rules (if-then rules)
- SQL queries can be used to access databases for each rule, to find its associated data and rule coverage rate
- Classification accuracy comparable with other methods
- Relatively fast learning speed (compared with other classification methods)
Presentation of Classification Results
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by backpropagation
Classification based on concepts from association rule mining
Other classification methods
Prediction
Estimating classification accuracy
Summary
Bayesian Classification: Why?
Probabilistic prediction and learning:
- Predicts multiple hypotheses, calculating a probability for each

Incremental learning:
- Each training example incrementally increases or decreases the probability that a hypothesis is correct
- Prior knowledge can be combined with observed data

Standard:
- Even when Bayesian methods are computationally intractable, they can provide a benchmark standard against which other methods are compared
Bayesian Theorem

Given training data D, the posterior probability of a hypothesis h, denoted P(h|D), follows Bayes' theorem. From the definition of conditional probability,

P(h|D) = P(h ∧ D) / P(D),  P(D|h) = P(h ∧ D) / P(h)

and therefore

P(h|D) = P(D|h) · P(h) / P(D)

The value of P(h|D) can be obtained from the values of P(h), P(D), and P(D|h).
Naive Bayes Classification for m Classes

Definitions:
- X = <x1, x2, …, xn>: an n-dimensional sample
- Ci|X: the hypothesis "sample X is of class Ci"
- P(Ci|X): the probability that sample X is of class Ci; there are m such probabilities: P(C1|X), P(C2|X), …, P(Cm|X)

Goal: find f for D, where y = f (x1, x2, …, xn) and f is in naive-Bayes-classifier form. f assigns sample X to class Ci if P(Ci|X) is maximal among P(C1|X), P(C2|X), …, P(Cm|X), i.e., the naive Bayesian classifier f assigns an unknown sample X to class Ci if and only if

P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

(X is most likely of class Ci according to the probabilities.)
Estimating A-Posteriori Probabilities

How do we find the maximal one among P(C1|X), P(C2|X), …, P(Cm|X)? According to Bayes' theorem:

P(Ci|X) = P(X|Ci) · P(Ci) / P(X)  for 1 ≤ i ≤ m

P(X) is difficult to obtain, but it is a constant when computing P(Ci|X) for all m classes, so only the values of P(X|Ci) · P(Ci) are required for 1 ≤ i ≤ m.

P(Ci) is the relative frequency of samples of class Ci, which can be computed from the training data. The remaining problem: how to compute P(X|Ci)?
Naïve Bayesian Classification
Naive assumption: attribute independence

P(X|Ci) = P(<x1, …, xn>|Ci) = P(x1|Ci) · … · P(xn|Ci)

Why this assumption? It makes P(X|Ci) computable: X itself may not occur among the Ci samples, in which case P(X|Ci) could not be estimated directly, and it requires a minimal amount of training data.

- If the k-th attribute of X is categorical: P(xk|Ci) is estimated as the relative frequency of samples having value xk for the k-th attribute (Ak = xk) in class Ci, 1 ≤ k ≤ n
- If the k-th attribute is continuous: P(xk|Ci) is estimated via a Gaussian density function (normal distribution, a function modeled by mean and variance) fitted to the k-th attribute using the data in class Ci
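For the categorical case, the relative-frequency estimate is a one-line count; a sketch (the two-attribute samples and the function name are hypothetical):

```python
def p_value_given_class(samples, labels, k, value, cls):
    """P(xk|Ci): relative frequency of value xk for attribute k among class-Ci samples."""
    in_class = [s for s, y in zip(samples, labels) if y == cls]
    return sum(1 for s in in_class if s[k] == value) / len(in_class)

# hypothetical samples: (color, shape) with class labels
samples = [("red", "round"), ("red", "oval"), ("green", "round"), ("red", "round")]
labels  = ["A", "A", "B", "B"]
print(p_value_given_class(samples, labels, 0, "red", "A"))  # 1.0: both A samples are red
print(p_value_given_class(samples, labels, 0, "red", "B"))  # 0.5: one of two B samples is red
```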
Play-Tennis Example (Predict Playing Tennis or Not on a Given Day)

Given the following training data, the problem is to predict whether to play tennis on a particular day.

| Outlook  | Temperature | Humidity | Windy | Class |
|----------|-------------|----------|-------|-------|
| sunny    | hot         | high     | false | N     |
| sunny    | hot         | high     | true  | N     |
| overcast | hot         | high     | false | P     |
| rain     | mild        | high     | false | P     |
| rain     | cool        | normal   | false | P     |
| rain     | cool        | normal   | true  | N     |
| overcast | cool        | normal   | true  | P     |
| sunny    | mild        | high     | false | N     |
| sunny    | cool        | normal   | false | P     |
| rain     | mild        | normal   | false | P     |
| sunny    | mild        | normal   | true  | P     |
| overcast | mild        | high     | true  | P     |
| overcast | hot         | normal   | false | P     |
| rain     | mild        | high     | true  | N     |

P: play, 9 records; N: do not play, 5 records.

Given an unseen input sample <rain, hot, high, false>, will we play tennis?
Play-Tennis Example: Classifying X

An unseen sample X = <rain, hot, high, false>; we want to predict "play tennis or not?"

1. The problem is to test whether P(p|X) > P(n|X), i.e., whether P(X|p)·P(p) / P(X) > P(X|n)·P(n) / P(X) (by the Bayesian theorem).
2. According to naive Bayesian classification, this reduces to testing whether P(X|p)·P(p) > P(X|n)·P(n), where
   P(X|p)·P(p) = P(<rain, hot, high, false>|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
   P(X|n)·P(n) = P(<rain, hot, high, false>|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
3. We need to know the values of the above probabilities.
Play-tennis example: estimating P(p), P(n), and P(xᵢ|C)
outlook:
  P(sunny|p) = 2/9      P(sunny|n) = 3/5
  P(overcast|p) = 4/9   P(overcast|n) = 0
  P(rain|p) = 3/9       P(rain|n) = 2/5
temperature:
  P(hot|p) = 2/9        P(hot|n) = 2/5
  P(mild|p) = 4/9       P(mild|n) = 2/5
  P(cool|p) = 3/9       P(cool|n) = 1/5
humidity:
  P(high|p) = 3/9       P(high|n) = 4/5
  P(normal|p) = 6/9     P(normal|n) = 2/5
windy:
  P(true|p) = 3/9       P(true|n) = 3/5
  P(false|p) = 6/9      P(false|n) = 2/5
P(p) = 9/14
P(n) = 5/14
Step 1: estimate each P(xₖ|Cᵢ) and P(Cᵢ) from the training data:

| Outlook | Temperature | Humidity | Windy | Class |
|----------|-------------|----------|-------|-------|
| sunny | hot | high | false | N |
| sunny | hot | high | true | N |
| overcast | hot | high | false | P |
| rain | mild | high | false | P |
| rain | cool | normal | false | P |
| rain | cool | normal | true | N |
| overcast | cool | normal | true | P |
| sunny | mild | high | false | N |
| sunny | cool | normal | false | P |
| rain | mild | normal | false | P |
| sunny | mild | normal | true | P |
| overcast | mild | high | true | P |
| overcast | hot | normal | false | P |
| rain | mild | high | true | N |

Step 2: for X = <rain, hot, high, false>, test P(X|p)·P(p) > P(X|n)·P(n)?, where P(X|Cᵢ) = ∏ₖ P(xₖ|Cᵢ)
![Page 44: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/44.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 44
Play-tennis example: classifying X
Given an unseen sample X = <rain, hot, high, false>, we want to predict: "play tennis or not?"
Then, the problem is to test "if P(p|X) > P(n|X)?", i.e., "if P(X|p)·P(p) / P(X) > P(X|n)·P(n) / P(X)?" (Bayes' theorem)
According to Naïve Bayesian Classification, test if P(X|p)·P(p) > P(X|n)·P(n)?
P(X|p)·P(p) = P(<rain, hot, high, false>|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
P(X|n)·P(n) = P(<rain, hot, high, false>|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
Sample X is classified in class n (P(X|p) ·P(p) < P(X|n) ·P(n) )
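The computation above can be sketched in Python; a minimal sketch that recomputes the priors and conditional probabilities directly from the 14-record training table:

```python
# Training data from the play-tennis table:
# (outlook, temperature, humidity, windy) -> class
data = [
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]

def nb_score(x, cls):
    """P(X|cls) * P(cls) under the naive (conditional independence) assumption."""
    rows = [r for r in data if r[-1] == cls]
    score = len(rows) / len(data)                      # prior P(cls)
    for i, value in enumerate(x):                      # product of P(x_i | cls)
        score *= sum(1 for r in rows if r[i] == value) / len(rows)
    return score

x = ("rain", "hot", "high", "false")
p_score = nb_score(x, "P")   # 3/9 * 2/9 * 3/9 * 6/9 * 9/14 ~ 0.010582
n_score = nb_score(x, "N")   # 2/5 * 2/5 * 4/5 * 2/5 * 5/14 ~ 0.018286
print("P" if p_score > n_score else "N")   # prints N: classified as "not play"
```

Since the class-conditional counts come straight from the table, the two scores reproduce the slide's 0.010582 and 0.018286.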
![Page 45: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/45.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 45
The Independence Assumption
Makes computation possible
Yields optimal classifiers when the assumption is satisfied
But is seldom satisfied in practice, as attributes (variables) are often correlated
Can attempt to overcome this limitation by:
  Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  (these require a larger training sample than NBC)
![Page 46: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/46.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 46
Bayesian Belief Networks (I)

(figure: a belief network with nodes FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea)

Assumption: the variables Family History and Smoker are correlated.

The conditional probability table for the variable LungCancer:

|     | (FH, S) | (FH, ~S) | (~FH, S) | (~FH, ~S) |
|-----|---------|----------|----------|-----------|
| LC  | 0.8     | 0.5      | 0.7      | 0.1       |
| ~LC | 0.2     | 0.5      | 0.3      | 0.9       |
![Page 47: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/47.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 47
Bayesian Belief Networks (II)
A Bayesian belief network allows class conditional independencies between subsets of the variables
It is a graphical model of causal relationships
Several classes of problems in learning Bayesian belief networks:
  Given the network structure and all related variables => easy
  Given the network structure and only some related variables => hard
  When the network structure is unknown => harder
![Page 48: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/48.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 48
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Discriminative Classification
Other Classification Methods
Prediction
Estimating classification accuracy
Summary
![Page 49: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/49.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 49
Classification: A Mathematical Mapping

Binary classification as a mathematical mapping:
  computes 2-class categorical labels
  is a binary function f: 𝒳 → 𝒴; mathematically, y = f(X), X ∈ 𝒳 ≡ ℝⁿ, y ∈ 𝒴 = {+1, −1} (or {0, 1})
  X: input, y: output
Example: classification of personal homepages, SPAM mail (an application of automatic document classification)
  y = +1 or −1 (1/0; yes/no; true/false; positive/negative)
  X = <x₁, x₂, …, xₙ> (a keyword frequency vector for a Web page)
    x₁: # of keyword 1, e.g., "homepage"
    x₂: # of keyword 2, e.g., "welcome"
    …
![Page 50: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/50.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 50
Linear Classification Problems
Example: a 2-D binary classification problem
  A sample is a 2-D point
  The data above the red line belongs to class 'x'
  The data below the red line belongs to class 'o'
  The data can be linearly classified by the red line
Classifier examples: SVM, Perceptron (an ANN)

(figure: 'x' points above and 'o' points below the red separating line)

In linear classification problems, the classification is accomplished by a linear hyperplane:
  ax + by + c = 0, i.e., w₀ + w₁x₁ + w₂x₂ = 0
![Page 51: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/51.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 51
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Classification by Backpropagation
Other Classification Methods
Prediction
Estimating classification accuracy
Summary
![Page 52: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/52.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 52
Artificial Neural Networks(A Network of Artificial Neurons)
Advantages
  prediction accuracy is generally high
  robust: works when training examples contain errors
  output may be discrete, real-valued, or a vector of discrete or real-valued attributes
  fast evaluation of the learned target function
Criticism
  (somewhat) long training time for an optimal model
  difficult to understand the learned function (weights)
  not easy to incorporate prior domain knowledge
![Page 53: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/53.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 53
Architecture of a Typical Artificial Neural Network(Multi-layer Perceptron)
(figure: input signals enter the Input Layer, pass through the Middle Layer, and leave the Output Layer as output signals)
![Page 54: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/54.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 54
A Neuron as a Simple Computing Element
Y = f( Σᵢ₌₁ⁿ xᵢwᵢ − θ ),  where f is the activation function and θ is the threshold (bias)

(figure: input signals x₁, x₂, …, xₙ reach neuron Y through connection weights w₁, w₂, …, wₙ; the neuron emits the output signal Y)
![Page 55: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/55.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 55
Steps of a neuron’s computation
1. Compute the weighted sum of the input signals
2. Compare the result with the threshold value θ
3. Produce an output based on a transfer or activation function as follows:
A Neuron as a Simple Computing Element
Y = f( Σᵢ₌₁ⁿ xᵢwᵢ − θ )
![Page 56: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/56.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 56
Various Activation Functions f of a Neuron
(figure: plots of the step, sign, sigmoid, and linear activation functions)

  Y_step = 1 if X ≥ 0; 0 if X < 0
  Y_sign = +1 if X ≥ 0; −1 if X < 0
  Y_sigmoid = 1 / (1 + e^(−X))          (hidden neuron / output neuron)
  Y_linear = X                          (output neuron, for function approximation)

where Y = f(X) and X = Σᵢ₌₁ⁿ xᵢwᵢ
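The four activation functions translate directly into a minimal Python sketch:

```python
import math

def step(x):
    # step function: 1 if X >= 0, 0 otherwise
    return 1 if x >= 0 else 0

def sign(x):
    # sign function: +1 if X >= 0, -1 otherwise
    return 1 if x >= 0 else -1

def sigmoid(x):
    # sigmoid function: 1 / (1 + e^(-X))
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    # linear function: Y = X
    return x
```

The sigmoid is the usual choice for hidden neurons because it is differentiable, which the backpropagation algorithm on slide 58 requires.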
![Page 57: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/57.jpg)
Construction of Classification Modelvia Network Training
The objective of network training: obtain a set of connection weights that makes almost all the training tuples classified correctly

Steps
1. Initialize the weights with random values
2. Feed one of the training samples into the network
3. Do the following for each neuron, layer by layer (loop):
   1. Compute the net input to the neuron as a weighted summation of all the inputs to the neuron
   2. Compute the output value using the activation function
   3. Compute the error by the backpropagation algorithm
   4. Adjust the weights and the bias according to the error
4. Go to Step 2 until convergence

(figure: the multi-layer network of slide 53, with the training loop indicated)
![Page 58: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/58.jpg)
Backpropagation Training Algorithm
(figure: a feed-forward network with input nodes, hidden nodes, and output nodes; the input vector enters at the bottom and the output vector leaves at the top; the j-th hidden neuron has net input I_j^H and output O_j^H)

I). The input is propagated forward (I: input, O: output; superscript H: hidden layer, superscript O: output layer):

  I_j^H = Σ_i w_ji^H · O_i^I        O_j^H = 1 / (1 + e^(−I_j^H))
  I_k^O = Σ_j w_kj^O · O_j^H        O_k^O = 1 / (1 + e^(−I_k^O))

II). The weights are updated according to the backward-propagated errors (η: learning rate, T_k: target output):

  Err_k^O = O_k^O · (1 − O_k^O) · (T_k − O_k^O)
  Err_j^H = O_j^H · (1 − O_j^H) · Σ_k w_kj^O · Err_k^O

  w_kj^O,new = w_kj^O + η · Err_k^O · O_j^H
  w_ji^H,new = w_ji^H + η · Err_j^H · O_i^I
  θ_j^H,new = θ_j^H + η · Err_j^H
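A minimal Python sketch of one forward/backward pass on a tiny 2-2-1 network; the initial weights, learning rate, and training pair are hypothetical, and bias terms are omitted for brevity:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, target, W_h, W_o, eta=0.5):
    """One forward + backward pass following the slide's update rules."""
    # I. forward propagation: hidden outputs, then output-layer outputs
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W_o]
    # II. backward error propagation
    err_o = [ok * (1 - ok) * (t - ok) for ok, t in zip(o, target)]
    err_h = [hj * (1 - hj) * sum(W_o[k][j] * err_o[k] for k in range(len(o)))
             for j, hj in enumerate(h)]
    # weight updates: w_new = w + eta * Err * O
    for k, row in enumerate(W_o):
        for j in range(len(row)):
            row[j] += eta * err_o[k] * h[j]
    for j, row in enumerate(W_h):
        for i in range(len(row)):
            row[i] += eta * err_h[j] * x[i]
    return o

W_h = [[0.2, -0.3], [0.4, 0.1]]   # hypothetical initial hidden-layer weights
W_o = [[-0.5, 0.2]]               # hypothetical output-layer weights
x, target = [1.0, 0.0], [1.0]
for _ in range(1000):
    out = train_step(x, target, W_h, W_o)
print(out)   # the output moves close to the target after repeated updates
```

Repeating the step drives the network output toward the target, which is the convergence condition in Step 4 of the training procedure on the previous slide.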
![Page 59: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/59.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 59
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by Backpropagation
Other classification methods
Prediction
Estimating classification accuracy
Summary
![Page 60: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/60.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 60
Other Classification Methods
SVM—Support Vector Machines k-nearest neighbor classifier Case-based reasoning Rough set approach Fuzzy set approaches
![Page 61: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/61.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 61
SVM—Support Vector Machines
A classification method for both linear and nonlinear data
For nonlinear data, a nonlinear mapping is used to transform the training data into a higher dimension
With the new dimension, it searches for the optimal linear separating hyperplane (i.e., “decision boundary”)
With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a linear hyperplane
SVM finds this separating hyperplane using support vectors ("essential" training tuples) and margins (margin widths, defined by the support vectors)
![Page 62: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/62.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 62
SVM—History and Applications
Vapnik and colleagues (1992): groundwork from Vapnik & Chervonenkis' statistical learning theory in the 1960s
Features: training can be slow, but accuracy is high owing to their ability to model complex nonlinear decision boundaries (margin maximization)
Used both for classification and prediction
Applications: handwritten digit recognition, object recognition, speaker identification
![Page 63: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/63.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 63
SVM — General Concept(Find a decision boundary with a maximal margin)
(figure: two candidate decision boundaries; decision boundary 2, which has the larger margin, is better than decision boundary 1)
![Page 64: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/64.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 64
SVM— Margins and Support Vectors
(figure: two separating hyperplanes: the left with a small margin and worse support vectors, the right with a large margin and better support vectors)
![Page 65: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/65.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 65
SVM — Case 1When Data Is Linearly Separable
(figure: two classes separated by a hyperplane; m denotes the margin)

Let the data D = {(X₁, y₁), …, (X|D|, y|D|)} be the set of training tuples, where each Xᵢ is associated with the class label yᵢ
There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
SVM searches for the separating hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH)
![Page 66: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/66.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 66
SVM – Case 1 : Linearly Separable
A separating hyperplane can be written as
W ● X + b = 0
where W={w1, w2, …, wn} is a weight vector and b a scalar (bias)
For a 2-D space, a line L ax + by +c=0 can be written as
w0 + w1 x1 + w2 x2 = 0
The hyperplanes defining the two sides of the margin:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ – 1 for yi = –1
Support vectors: any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin)
This becomes a constrained (convex) quadratic optimization problem: linear constraints with a quadratic objective function → Quadratic Programming (QP) → Lagrangian multipliers
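The decision rule and the margin constraints H1/H2 can be sketched directly; the weight vector and bias below are illustrative placeholders, not the solution of the QP:

```python
def hyperplane_side(w, b, x):
    """Returns +1 or -1 depending on which side of W . X + b = 0 the point lies."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

def satisfies_margin(w, b, x, y):
    """Margin constraint combining H1 and H2: y * (W . X + b) >= 1."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1

w, b = [1.0, 1.0], -3.0                 # hypothetical hyperplane x1 + x2 - 3 = 0
print(hyperplane_side(w, b, (4, 4)))    # +1
print(hyperplane_side(w, b, (0, 0)))    # -1
```

`satisfies_margin` is exactly the constraint the QP enforces for every training tuple; the support vectors are the tuples for which it holds with equality.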
![Page 67: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/67.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 67
Why Is SVM Effective on High Dimensional Data?
The complexity of an SVM classifier is characterized by the # of support vectors rather than by the dimensionality or the # of data tuples
The support vectors are the essential or critical training examples: they lie closest to the decision boundary (MMH)
If all other training examples were removed and the SVM training repeated, the same separating hyperplane would be found
The set of support vectors can be used to compute an (upper) bound on the expected error rate of an SVM classifier, which is independent of the data dimensionality
Thus, an SVM with a small number of support vectors can still have good generalization, even when the dimensionality of the data is high
![Page 68: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/68.jpg)
Transform the original input data into a higher-dimensional space
  A 3-D input vector is mapped into a new 6-D space: z₁ = x₁, z₂ = x₂, z₃ = x₃, …
  Search for a linear separating hyperplane in the 6-D space
二〇二三年四月二十日 Data Mining: Concepts and Techniques 68
SVM — Case 2: Linearly Inseparable

(figure: classes A1 and A2, linearly inseparable in the original space; the mapping is <x₁, x₂, x₃> → <z₁, z₂, z₃, z₄, z₅, z₆>)
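The mapping can be sketched as below; the slide lists only z₁–z₃, so the choice of z₄–z₆ (a square and cross-products of the original coordinates, as in the usual textbook example) is an assumption here:

```python
def phi(x1, x2, x3):
    # z1..z3 copy the original coordinates (as on the slide);
    # z4..z6 are an assumed nonlinear completion: x1^2, x1*x2, x1*x3
    return (x1, x2, x3, x1 * x1, x1 * x2, x1 * x3)

print(phi(2.0, 3.0, 4.0))   # (2.0, 3.0, 4.0, 4.0, 6.0, 8.0)
```

A hyperplane that is linear in the z-coordinates corresponds to a nonlinear (quadratic) surface in the original x-coordinates, which is what makes the transformed problem linearly separable.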
![Page 69: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/69.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 69
k-NN (k-Nearest Neighbor) Algorithm
A sample X is represented as X = <x₁, x₂, …, xₙ>
A training sample corresponds to a point in an n-D space
The nearest neighbors are defined in terms of Euclidean distance:
  D(X, Y) = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )
When given an unknown sample x_q, a k-NN classifier searches the sample space for the k training samples nearest to x_q, then decides the class of the unknown sample by majority vote
The value of k is decided heuristically

(figure: the query point x_q surrounded by '+' and '−' training points)
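The whole algorithm fits in a few lines of Python; the 2-D training points below are hypothetical:

```python
import math
from collections import Counter

def euclidean(x, y):
    # D(X, Y) = sqrt(sum over i of (x_i - y_i)^2)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(training, xq, k=3):
    """Majority vote among the k training samples nearest to xq."""
    nearest = sorted(training, key=lambda s: euclidean(s[0], xq))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# hypothetical 2-D training points with two classes
training = [((1, 1), "+"), ((1, 2), "+"), ((2, 1), "+"),
            ((6, 6), "-"), ((6, 7), "-"), ((7, 6), "-")]
print(knn_classify(training, (2, 2)))   # the 3 nearest neighbors are all "+"
```

Note that there is no training phase at all: classification defers all work to query time, which is why k-NN is called a lazy learner.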
![Page 70: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/70.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 70
Discussion on the k-NN Algorithm
The k-NN algorithm works only for numeric-valued data
Enhancement: distance-weighted k-NN algorithm
  Weight the contribution of the k neighbors according to their distance to the query point (sample) x_q, giving greater weight to closer neighbors:
    w = 1 / D(x_q, xᵢ)²
  Similarly, works only for numeric-valued data
Robust to noisy data by averaging the k nearest neighbors
Curse of dimensionality: the distance between neighbors could be dominated by many irrelevant attributes
  To overcome it, stretch the axes or eliminate the least relevant attributes
![Page 71: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/71.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 71
Rough Set Approach
Rough sets are used to approximately or "roughly" define equivalence classes
A rough set for a given class C is approximated by two sets:
  a lower approximation (certain to be in C), and
  an upper approximation (cannot be described as not belonging to C)
Rough sets can also be used for feature reduction: a discernibility matrix is used to detect redundant attributes
![Page 72: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/72.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 72
Fuzzy Set Approaches
Example application: credit approval
Credit approval rule:
  IF (years_employed >= 2) AND (income >= 50K), THEN Credit = "approved"
Problem: What if a customer has had a job for at least two years and her income is $49K? Should she be approved or not?
Solution: fuzzy set approaches
  IF (years_employed is medium) AND (income is high), THEN Credit is approved
![Page 73: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/73.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 73
Fuzzy Set Approaches
Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership
Attribute values are converted to fuzzy values
  e.g., income x is mapped into the discrete categories {low, medium, high} with fuzzy values
    x → ⟨μ_low, μ_medium, μ_high⟩,  μ_low, μ_medium, μ_high ∈ [0, 1]   (see the fuzzy membership graph)
  For income, $49K is transformed into ⟨0, 0.1, 0.9⟩
Each applicable rule of the rule set contributes a vote for membership in the categories
Typically, the truth values for each predicted category are summed up with weights for making the decision
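A membership function reproducing the slide's numbers can be sketched as follows; the breakpoints ($30K, $40K, $50K) are hypothetical choices made so that $49K maps to roughly ⟨0, 0.1, 0.9⟩:

```python
def fuzzify_income(income_k):
    """Map income (in $K) to fuzzy memberships <low, medium, high>.
    Piecewise-linear ramps with hypothetical breakpoints 30K, 40K, 50K."""
    low = max(0.0, min(1.0, (30 - income_k) / 10))     # fades out above 20K-30K
    medium = max(0.0, min(1.0, (50 - income_k) / 10))  # fades out toward 50K
    high = max(0.0, min(1.0, (income_k - 40) / 10))    # fades in above 40K
    return low, medium, high

print(fuzzify_income(49))   # approximately (0, 0.1, 0.9)
```

So a $49K applicant is mostly "high" income with a small "medium" membership, and both the medium and high rules would contribute weighted votes to the decision.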
![Page 74: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/74.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 74
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Classification by backpropagation
Other Classification Methods
Prediction
Estimating classification accuracy
Summary
![Page 75: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/75.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 75
What Is Prediction?
Prediction is similar to classification
  Step 1: construct a model
  Step 2: use the model to predict unknown values
The major method for prediction is regression
  Linear multiple regression, e.g., y = β₁x₁ + β₂x₂
  Non-linear regression, e.g., y = β₁x₁ + β₂x₂² + β₃x₃³
Other method: artificial neural networks
Main difference between prediction and classification
  Classification predicts categorical class labels
  Prediction models continuous-valued functions
![Page 76: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/76.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 76
Linear regression: Y = α + βX
  Two parameters, α and β, specify the line
  They are estimated using the least squares criterion on the training samples (X₁, Y₁), (X₂, Y₂), …, (Xₙ, Yₙ)
Multiple regression: Y = b₀ + b₁X₁ + b₂X₂ + … + bₙXₙ
  Many nonlinear functions can be transformed into the above
Log-linear models (example: estimating a probability):
  p(a, b, c, d) = α_abc · β_abd · γ_acd · δ_bcd
  log p(a, b, c, d) = log α_abc + log β_abd + log γ_acd + log δ_bcd
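The least-squares fit of Y = α + βX can be sketched in a few lines of Python; the data points below are hypothetical, roughly following y = 1 + 2x:

```python
def fit_line(points):
    """Least-squares estimates of alpha and beta for Y = alpha + beta * X."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # beta = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in points)
            / sum((x - mean_x) ** 2 for x, _ in points))
    alpha = mean_y - beta * mean_x
    return alpha, beta

points = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8)]   # hypothetical samples
alpha, beta = fit_line(points)
```

For these four points the fit comes out near α ≈ 1.15 and β ≈ 1.94, close to the underlying line.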
Regression Analysis and Log-Linear Models in Prediction
![Page 77: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/77.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 77
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Classification by backpropagation
Classification based on concepts from association rule mining
Other Classification Methods
Prediction
Estimating accuracy
Summary
![Page 78: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/78.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 78
Classifier Accuracy Measures
Accuracy of a classifier M, acc(M): the percentage of test samples that are correctly classified by the classifier M (created from the training set)
  Error rate (misclassification rate) of M = 1 − acc(M)
Given m classes, CMᵢ,ⱼ, an entry in a confusion matrix, indicates the # of samples in class i that are labeled by the classifier as class j
Alternative performance measures (e.g., for cancer diagnosis):
  sensitivity = TP/P            /* true positive recognition rate */
  specificity = TN/N            /* true negative recognition rate */
  precision = TP/(TP + FP)
  accuracy = (TP + TN)/(P + N) = sensitivity · P/(P + N) + specificity · N/(P + N)
This model can also be used for cost-benefit analysis
| classes | yes (computed) | no (computed) | total | recognition (%) |
|---|---|---|---|---|
| buy_computer = yes | 6954 | 46 | 7000 | 99.34 |
| buy_computer = no | 412 | 2588 | 3000 | 86.27 |
| total | 7366 | 2634 | 10000 | 95.42 |

|  | P′ (computed) | N′ (computed) |
|---|---|---|
| P (actual) | true positive | false negative |
| N (actual) | false positive | true negative |
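The measures above can be checked directly against the buy_computer confusion matrix on this slide:

```python
# Counts from the buy_computer confusion matrix
TP, FN = 6954, 46     # actual yes: correctly / incorrectly labeled
FP, TN = 412, 2588    # actual no: incorrectly / correctly labeled
P, N = TP + FN, FP + TN

accuracy = (TP + TN) / (P + N)
sensitivity = TP / P            # true positive recognition rate
specificity = TN / N            # true negative recognition rate
precision = TP / (TP + FP)

print(round(accuracy * 100, 2))   # 95.42
```

The computed sensitivity (99.34%) and specificity (86.27%) match the per-class recognition rates in the table, and accuracy is their P/(P+N)- and N/(P+N)-weighted sum.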
![Page 79: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/79.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 79
Error Measures for Prediction
Measure how far off the predicted value is from the actual known value
Loss function: measures the error between yᵢ and yᵢ′ (the predicted value)
  Absolute error: |yᵢ − yᵢ′|
  Squared error: (yᵢ − yᵢ′)²
Test error: the average loss over the test set
  Mean absolute error (MAE): Σᵢ₌₁ᵈ |yᵢ − yᵢ′| / d
  Mean squared error (MSE): Σᵢ₌₁ᵈ (yᵢ − yᵢ′)² / d
  Relative absolute error (RAE): Σᵢ₌₁ᵈ |yᵢ − yᵢ′| / Σᵢ₌₁ᵈ |yᵢ − ȳ|
  Relative squared error (RSE): Σᵢ₌₁ᵈ (yᵢ − yᵢ′)² / Σᵢ₌₁ᵈ (yᵢ − ȳ)²
The mean squared error exaggerates the presence of outliers
  Popularly used: the (square) root mean squared error and, similarly, the root relative squared error
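These four measures translate directly into Python; the actual/predicted values below are hypothetical:

```python
def mae(actual, pred):
    # mean absolute error: sum |y_i - y_i'| / d
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mse(actual, pred):
    # mean squared error: sum (y_i - y_i')^2 / d
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

def rae(actual, pred):
    # relative absolute error: errors normalized by deviation from the mean
    mean = sum(actual) / len(actual)
    return (sum(abs(a - p) for a, p in zip(actual, pred))
            / sum(abs(a - mean) for a in actual))

def rse(actual, pred):
    # relative squared error: squared errors normalized likewise
    mean = sum(actual) / len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, pred))
            / sum((a - mean) ** 2 for a in actual))

actual, pred = [1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5]   # hypothetical
```

The relative measures compare the predictor against the trivial model that always outputs the mean, so values below 1 mean the predictor beats that baseline.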
![Page 80: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/80.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 80
Performance Evaluation of Classification(Methods for Estimating Average Classification Accuracy)
Partition: training-and-testing
  Use two independent data sets: training set (2/3), test set (1/3)
  Used for data sets with a large number of samples
k-fold cross-validation
  Randomly divide the data set into k subsets: S₁, S₂, …, Sₖ
  At iteration i, subset Sᵢ is used as the test set and the remaining k−1 subsets are used as training data
  A total of k iterations for computing the average accuracy
  Used for data sets of moderate size
Bootstrapping (leave-one-out)
  Similar to k-fold cross-validation with k set to s, where s is the number of initial samples
  Used for small data sets
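The k-fold partitioning step can be sketched as follows; a minimal sketch (real implementations usually also stratify the folds by class):

```python
import random

def k_fold_splits(samples, k=10, seed=0):
    """Yield (training_set, test_set) pairs; fold i is the test set at iteration i."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)          # random division of the data
    folds = [shuffled[i::k] for i in range(k)]     # the k subsets S1, ..., Sk
    for i in range(k):
        # all folds except fold i form the training set
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]

# hypothetical data set of 20 samples
splits = list(k_fold_splits(list(range(20)), k=10))
```

Each sample appears in exactly one test fold across the k iterations, so averaging the k accuracies uses every sample for testing exactly once.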
![Page 81: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/81.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 81
10-fold Cross-Validation
The data set is randomly divided into 10 subsets (1, 2, …, 10)
At each iteration, nine of the ten subsets are used for training the classifier and the tenth subset is used as the test set
The iterations are repeated 10 times, so each subset serves as the test set exactly once

(figure: at iteration 10, subsets 1–9 form the training set and subset 10 is the test set)
![Page 82: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/82.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 82
Model Comparison by ROC Curves
ROC (Receiver Operating Characteristic) curves: for visual comparison of the performance of classification models
  The vertical axis represents the TP (true positive) rate
  The horizontal axis represents the FP (false positive) rate
  The plot also shows a diagonal line (the coin-flip model)
  Model 1 is better than model 2

(figure: ROC curves for model 1 and model 2, with model 1 lying above model 2 and both above the diagonal line)
![Page 83: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/83.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 83
Model Comparison by ROC Curves
Originated from signal detection theory
Shows the trade-off between the true positive rate and the false positive rate
The area under the ROC curve (AUC) is a measure of the performance of the model
  A model with perfect accuracy will have an area of 1.0
  The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model
![Page 84: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/84.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 84
Chapter 6 Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Classification by backpropagation
Classification based on concepts from association rule mining
Other Classification Methods
Prediction
Estimating classification accuracy
Summary
![Page 85: 2015年10月27日星期二 2015年10月27日星期二 2015年10月27日星期二 Data Mining: Concepts and Techniques1 Classification and Prediction (Data Mining: Concepts and Techniques)](https://reader033.vdocuments.site/reader033/viewer/2022061501/56649f045503460f94c18147/html5/thumbnails/85.jpg)
二〇二三年四月二十日 Data Mining: Concepts and Techniques 85
Summary
Classification is an extensively studied problem (mainly in statistics, machine learning & AI)
Classification issue: how to create a classifier, i.e., find f, where y = f(x₁, x₂, …, xₙ) and f takes a DT, NBC, ANN, … form
Classification is probably one of the most widely used data mining techniques, with a lot of extensions
Scalability is an important issue for applications related to Big Data and Clouds