Post on 21-Dec-2015
![Page 1: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/1.jpg)
Ethem Alpaydın
Department of Computer Engineering
Boğaziçi University
alpaydin@boun.edu.tr
Intelligent Data Mining
![Page 2: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/2.jpg)
What is Data Mining ?
• Search for very strong patterns (correlations, dependencies) in big data that can generalise to accurate future decisions.
• Aka Knowledge discovery in databases, Business Intelligence
![Page 3: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/3.jpg)
Example Applications
• Association: “30% of customers who buy diapers also buy beer.” (Basket Analysis)
• Classification: “Young women buy small inexpensive cars.” “Older wealthy men buy big cars.”
• Regression: Credit Scoring
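A statement such as the diapers-and-beer association is a conditional frequency estimated from baskets. The sketch below shows one way it could be computed; the baskets and the helper names `support` and `confidence` are illustrative assumptions, not taken from the slides.

```python
# Hypothetical toy baskets; the item names are illustrative only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "bread"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) from basket counts."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Share of diaper buyers who also buy beer in the toy data.
conf = confidence({"diapers"}, {"beer"}, baskets)
```

On real data the same ratio would be computed over millions of transactions, usually with a minimum-support cutoff so that rare coincidences are not reported as rules.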
![Page 4: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/4.jpg)
Example Applications
• Sequential Patterns: “Customers who pay late on two or more of the first three installments have a 60% probability of defaulting.”
• Similar Time Sequences: “The value of the stocks of company X has been similar to that of company Y.”
![Page 5: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/5.jpg)
Example Applications
• Exceptions (Deviation Detection): “Are any of my customers behaving differently than usual?”
• Text mining (Web mining): “Which documents on the internet are similar to this document?”
![Page 6: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/6.jpg)
IDIS – US Forest Service
• Identifies forest stands (areas similar in age, structure and species composition)
• Predicts how different stands would react to fire and what preventive measures should be taken
![Page 7: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/7.jpg)
GTE Labs
• KEFIR (Key Findings Reporter)
• Evaluates health-care utilization costs
• Isolates groups whose costs are likely to increase in the next year
• Finds medical conditions for which there is a known procedure that improves health condition and decreases costs
![Page 8: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/8.jpg)
Lockheed
• RECON: stock portfolio selection
• Creates a portfolio of 150-200 securities from an analysis of a database of the performance of 1,500 securities over a 7-year period
![Page 9: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/9.jpg)
VISA
• Credit card fraud detection
• CRIS: neural network software which learns to recognize spending patterns of card holders and scores transactions by risk.
“If a card holder normally buys gas and groceries and the account suddenly shows a purchase of stereo equipment in Hong Kong, CRIS sends a notice to the bank, which in turn can contact the card holder.”
![Page 10: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/10.jpg)
ISL Ltd (Clementine) - BBC
• Audience prediction
• Program schedulers must be able to predict the likely audience for a program and the optimum time to show it.
• Type of program, time, competing programs, other events affect audience figures.
![Page 11: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/11.jpg)
Data Mining is NOT Magic!
Data mining draws on the concepts and methods of databases, statistics, and machine learning.
![Page 12: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/12.jpg)
From the Warehouse to the Mine
[Diagram: Transactional databases → (extract, transform, cleanse data) → Data warehouse → (define goals, data transformations) → Standard form]
![Page 13: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/13.jpg)
How to mine?
• Verification: computer-assisted, user-directed, top-down. Query and report, OLAP (Online Analytical Processing) tools.
• Discovery: automated, data-driven, bottom-up.
![Page 14: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/14.jpg)
Steps: 1. Define Goal
• Associations between products?
• New market segments or potential customers?
• Buying patterns over time or product sales trends?
• Discriminating among classes of customers?
![Page 15: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/15.jpg)
Steps: 2. Prepare Data
• Integrate, select and preprocess existing data (already done if there is a warehouse)
• Any other data relevant to the objective which might supplement existing data
![Page 16: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/16.jpg)
Steps: 2. Prepare Data (Cont’d)
• Select the data: identify relevant variables
• Data cleaning: errors, inconsistencies, duplicates, missing data
• Data scrubbing: mappings, data conversions, new attributes
• Visual inspection: data distribution, structure, outliers, correlations between attributes
• Feature analysis: clustering, discretization
![Page 17: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/17.jpg)
Steps: 3. Select Tool
• Identify task class: clustering/segmentation, association, classification, pattern detection/prediction in time series
• Identify solution class: explanation (decision trees, rules) vs black box (neural network)
• Model assessment, validation and comparison: k-fold cross-validation, statistical tests
• Combination of models
![Page 18: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/18.jpg)
Steps: 4. Interpretation
• Are the results (explanations/predictions) correct, significant?
• Consultation with a domain expert
![Page 19: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/19.jpg)
Example
• Data as a table of attributes

| Name | Income | Owns a house? | Marital status | Default |
|------|--------|---------------|----------------|---------|
| Ali  | 25,000 $ | Yes | Married | No  |
| Veli | 18,000 $ | No  | Married | Yes |

We would like to be able to explain the value of one attribute in terms of the values of other attributes that are relevant.
![Page 20: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/20.jpg)
Modelling Data
Attributes x are observable
y = f(x), where f is unknown and probabilistic
[Diagram: x → f → y]
![Page 21: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/21.jpg)
Building a Model for Data
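[Diagram: the input x is fed both to the unknown f, which produces y, and to the model f*; the two outputs are subtracted to give the error.]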
![Page 22: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/22.jpg)
Learning from Data
Given a sample $X = \{x^t, y^t\}_t$, we build $f^*(x^t)$, a predictor for $f(x^t)$, that minimizes the difference between our prediction and the actual value:

$$E = \sum_t \left(y^t - f^*(x^t)\right)^2$$
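To make the error criterion concrete, the following sketch evaluates the sum of squared differences for one candidate predictor. The sample and the predictor `f_star` are hypothetical and serve only as an illustration.

```python
def squared_error(predict, sample):
    """Sum over the sample of (y - predict(x))**2."""
    return sum((y - predict(x)) ** 2 for x, y in sample)

# Hypothetical sample of (x, y) pairs and a hypothetical candidate predictor.
sample = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
f_star = lambda x: 2.0 * x

error = squared_error(f_star, sample)   # small, because f_star fits the sample well
```

Learning then amounts to searching over candidate predictors for the one with the smallest error of this kind.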
![Page 23: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/23.jpg)
Types of Applications
• Classification: y in {C1, C2, …, CK}
• Regression: y in ℝ (real-valued)
• Time-Series Prediction: x temporally dependent
• Clustering: Group x according to similarity
![Page 24: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/24.jpg)
Example
[Figure: savings plotted against yearly income, with customers marked as OK or DEFAULT.]
![Page 25: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/25.jpg)
Example Solution
RULE: IF yearly-income > θ1 AND savings > θ2 THEN OK ELSE DEFAULT
[Figure: savings (x2) plotted against yearly income (x1); the thresholds θ1 and θ2 separate the OK region from the DEFAULT region.]
![Page 26: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/26.jpg)
Decision Trees
x1: yearly income, x2: savings
y = 0: DEFAULT, y = 1: OK
• Root: is x1 > θ1? If no, y = 0.
• If yes: is x2 > θ2? If yes, y = 1; if no, y = 0.
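The tree above is equivalent to two nested threshold tests. A minimal sketch follows; the function name `classify` and the numeric thresholds are assumptions made for illustration, since the slides give no threshold values.

```python
def classify(yearly_income, savings, theta1, theta2):
    """Return 1 (OK) or 0 (DEFAULT), following the two-level tree."""
    if yearly_income > theta1:      # root node: x1 above its threshold?
        if savings > theta2:        # second node: x2 above its threshold?
            return 1                # OK
        return 0                    # DEFAULT
    return 0                        # DEFAULT

# Hypothetical thresholds, chosen only to make the example concrete.
THETA1, THETA2 = 20_000, 5_000

label = classify(25_000, 8_000, THETA1, THETA2)
```

Reading the tree from root to leaf gives back the IF-THEN rule, which is why trees are listed under the "explanation" solution class.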
![Page 27: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/27.jpg)
Clustering
[Figure: savings plotted against yearly income; the OK and DEFAULT customers fall into three groups labelled Type 1, Type 2 and Type 3.]
![Page 28: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/28.jpg)
Time-Series Prediction
[Figure: a time axis running from Jan through Dec to the following Jan, divided into past, present and future; the future value, marked "?", is to be predicted.]
Discovery of frequent episodes
![Page 29: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/29.jpg)
Methodology
1. Data reduction: value and feature reductions applied to the initial standard form
2. Split the data into a train set and a test set
3. Train alternative predictors (Predictor 1, Predictor 2, ..., Predictor L) on the train set
4. Test the trained predictors on the test data and choose the best
5. Accept the best predictor if it is good enough
![Page 30: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/30.jpg)
Data Visualisation
• Plot data in fewer dimensions (typically 2) to allow visual analysis
• Visualisation of structure, groups and outliers
![Page 31: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/31.jpg)
Data Visualisation
[Figure: savings plotted against yearly income, showing the rule and the exceptions to it.]
![Page 32: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/32.jpg)
Techniques for Training Predictors
• Parametric multivariate statistics
• Memory-based (case-based) models
• Decision trees
• Artificial neural networks
![Page 33: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/33.jpg)
Classification
• x : d-dimensional vector of attributes
• C1 , C2 ,... , CK : K classes
• Reject or doubt
• Compute P(Ci|x) from data and choose k such that P(Ck|x) = maxj P(Cj|x)
![Page 34: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/34.jpg)
Bayes’ Rule

$$P(C_j \mid x) = \frac{p(x \mid C_j)\, P(C_j)}{p(x)}$$

• p(x|Cj): likelihood that an object of class j has features x
• P(Cj): prior probability of class j
• p(x): probability of an object (of any class) with features x
• P(Cj|x): posterior probability that an object with features x is of class j
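As a small numeric illustration of Bayes' rule, the sketch below turns class likelihoods and priors into posteriors and picks the most probable class. The numbers and the function name `posteriors` are hypothetical.

```python
def posteriors(likelihoods, priors):
    """Return P(C_j | x) for every class j via Bayes' rule."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)           # p(x) = sum_j p(x | C_j) P(C_j)
    return [j / evidence for j in joint]

# Hypothetical values of p(x | C_j) and P(C_j) for two classes.
post = posteriors(likelihoods=[0.2, 0.6], priors=[0.7, 0.3])

# Index of the class with the largest posterior.
best = max(range(len(post)), key=post.__getitem__)
```

Here the second class wins even though its prior is smaller, because the observed features are three times as likely under it.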
![Page 35: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/35.jpg)
Statistical Methods
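• Parametric, e.g., Gaussian, model for class densities p(x|Cj)

Univariate, $x \in \mathbb{R}$:

$$p(x \mid C_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right]$$

Multivariate, $\mathbf{x} \in \mathbb{R}^d$:

$$p(\mathbf{x} \mid C_j) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_j|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j)\right]$$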
![Page 36: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/36.jpg)
Training a Classifier
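• Given data $\{x^t\}_t$ of class $C_j$, where $n_j$ is the number of samples in $C_j$ and $n$ is the total number of samples:

$$\hat{P}(C_j) = \frac{n_j}{n}$$

Univariate: $p(x \mid C_j)$ is $\mathcal{N}(\mu_j, \sigma_j^2)$

$$\hat{\mu}_j = \frac{\sum_{x^t \in C_j} x^t}{n_j}, \qquad \hat{\sigma}_j^2 = \frac{\sum_{x^t \in C_j} (x^t - \hat{\mu}_j)^2}{n_j}$$

Multivariate: $p(\mathbf{x} \mid C_j)$ is $\mathcal{N}_d(\boldsymbol{\mu}_j, \Sigma_j)$

$$\hat{\boldsymbol{\mu}}_j = \frac{\sum_{\mathbf{x}^t \in C_j} \mathbf{x}^t}{n_j}, \qquad \hat{\Sigma}_j = \frac{\sum_{\mathbf{x}^t \in C_j} (\mathbf{x}^t - \hat{\boldsymbol{\mu}}_j)(\mathbf{x}^t - \hat{\boldsymbol{\mu}}_j)^T}{n_j}$$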
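The univariate estimators can be sketched in a few lines: estimate a mean, a variance and a prior per class, then assign a new value to the class with the largest product of density and prior. The data and the helper names (`fit_class`, `gaussian`, `classify`) are assumptions for illustration only.

```python
import math

def fit_class(values, n_total):
    """Estimate mean, variance and prior for one class (divide by n_j)."""
    n_j = len(values)
    mean = sum(values) / n_j
    var = sum((x - mean) ** 2 for x in values) / n_j
    return mean, var, n_j / n_total

def gaussian(x, mean, var):
    """Univariate normal density with the given mean and variance, at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x, params):
    """Pick the class with the largest p(x | C_j) * P(C_j)."""
    return max(params, key=lambda c: gaussian(x, *params[c][:2]) * params[c][2])

# Hypothetical one-dimensional training data for two classes.
data = {"C1": [1.0, 2.0, 3.0], "C2": [6.0, 7.0, 8.0, 9.0]}
n = sum(len(v) for v in data.values())
params = {c: fit_class(v, n) for c, v in data.items()}
```

Dividing by p(x) is unnecessary for the decision because it is the same for every class, so comparing the numerators of Bayes' rule is enough.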
![Page 37: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/37.jpg)
Example: 1D Case
![Page 38: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/38.jpg)
Example: Different Variances
![Page 39: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/39.jpg)
Example: Many Classes
![Page 40: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/40.jpg)
2D Case: Equal Spheric Classes
![Page 41: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/41.jpg)
Shared Covariances
![Page 42: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/42.jpg)
Different Covariances
![Page 43: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/43.jpg)
Actions and Risks
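• $\alpha_i$: action i
• $\lambda(\alpha_i \mid C_j)$: loss of taking action $\alpha_i$ when the situation is $C_j$

$$R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid C_j)\, P(C_j \mid x)$$

Choose $\alpha_k$ such that

$$R(\alpha_k \mid x) = \min_i R(\alpha_i \mid x)$$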
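The following sketch computes the expected risk of each action and picks the minimum. The loss matrix, the posteriors and the credit interpretation of the actions are hypothetical, chosen to show that the minimum-risk action need not correspond to the most probable class.

```python
def expected_risks(loss, posteriors):
    """Risk of action i = sum over classes j of loss[i][j] * P(C_j | x)."""
    return [sum(l * p for l, p in zip(row, posteriors)) for row in loss]

# Hypothetical loss matrix: rows are actions, columns are true classes.
# Action 0 = grant credit, action 1 = refuse credit.
# Class 0 = customer repays, class 1 = customer defaults.
loss = [[0.0, 10.0],
        [1.0, 0.0]]
posteriors = [0.8, 0.2]          # hypothetical P(C_j | x)

risks = expected_risks(loss, posteriors)
best_action = min(range(len(risks)), key=risks.__getitem__)
```

Although repayment is the more probable outcome, refusing credit has the lower expected loss here because a default is assumed to cost ten times as much as a lost customer.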
![Page 44: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/44.jpg)
Function Approximation (Scoring)
![Page 45: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/45.jpg)
Regression
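$$y^t = f(x^t \mid \theta) + \epsilon$$

where $\epsilon$ is noise and $\theta$ denotes the parameters of f. In linear regression,

$$f(x^t \mid w, w_0) = w x^t + w_0$$

Find $w, w_0$ such that the error

$$E(w, w_0) = \sum_t \left(y^t - w x^t - w_0\right)^2$$

is minimized, i.e.

$$\frac{\partial E}{\partial w} = 0, \qquad \frac{\partial E}{\partial w_0} = 0$$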
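Setting both partial derivatives to zero gives a closed-form solution for the slope and intercept. The sketch below implements that standard least-squares solution; the data and the function name `fit_line` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope w and intercept w0 for y = w * x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0

# Hypothetical noise-free data lying on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, w0 = fit_line(xs, ys)
```

With noisy data the same formulas return the line that minimizes the summed squared error rather than one passing through every point.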
![Page 46: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/46.jpg)
Linear Regression
![Page 47: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/47.jpg)
Polynomial Regression
• E.g., quadratic

$$f(x^t \mid w_2, w_1, w_0) = w_2 (x^t)^2 + w_1 x^t + w_0$$

$$E(w_2, w_1, w_0) = \sum_t \left(y^t - w_2 (x^t)^2 - w_1 x^t - w_0\right)^2$$
![Page 48: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/48.jpg)
Polynomial Regression
![Page 49: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/49.jpg)
Multiple Linear Regression
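• d inputs:

$$f(x_1^t, x_2^t, \ldots, x_d^t \mid w_0, w_1, w_2, \ldots, w_d) = w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + w_0 = \mathbf{w}^T \mathbf{x}^t$$

$$E(w_0, w_1, w_2, \ldots, w_d) = \sum_t \left(y^t - f(x_1^t, x_2^t, \ldots, x_d^t \mid w_0, w_1, w_2, \ldots, w_d)\right)^2$$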
![Page 50: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/50.jpg)
Feature Selection
• Subset selection: forward and backward methods
• Linear projection: Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA)
![Page 51: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/51.jpg)
Sequential Feature Selection
Forward Selection
(x1) (x2) (x3) (x4)
(x1 x3) (x2 x3) (x3 x4)
(x1 x2 x3) (x2 x3 x4)

Backward Selection
(x1 x2 x3 x4)
(x1 x2 x3) (x1 x2 x4) (x1 x3 x4) (x2 x3 x4)
(x2 x4) (x1 x4) (x1 x2)
![Page 52: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/52.jpg)
Principal Components Analysis (PCA)
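[Figure: data in the original axes x1, x2 with the principal directions z1, z2; a whitening transform maps the data to the z1, z2 axes.]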
![Page 53: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/53.jpg)
Linear Discriminant Analysis (LDA)
[Figure: data in the axes x1, x2 projected onto the single direction z1.]
![Page 54: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/54.jpg)
Memory-based Methods
• Case-based reasoning
• Nearest-neighbor algorithms
• Keep a list of known instances and interpolate the response from those
![Page 55: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/55.jpg)
Nearest Neighbor
[Figure: instances plotted in the axes x1, x2.]
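A memory-based classifier stores the known instances and answers a query from the closest ones. The sketch below is a minimal k-nearest-neighbor vote; the stored instances, the labels and the choice of k = 3 are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(query, instances, k=3):
    """Majority label among the k stored instances closest to `query`."""
    nearest = sorted(instances, key=lambda inst: math.dist(query, inst[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored instances: ((x1, x2), label).
instances = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
             ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]

label = knn_predict((1.2, 1.4), instances)
```

There is no training step: all the work happens at query time, which is why these methods are called memory-based.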
![Page 56: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/56.jpg)
Local Regression
[Figure: y plotted against x, fitted by local regression.]
Mixture of Experts
![Page 57: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/57.jpg)
Missing Data
• Ignore cases with missing data
• Mean imputation
• Imputation by regression
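Of the three options, mean imputation is the simplest to sketch: each missing entry is replaced by the mean of the observed entries of the same attribute. The column below and the use of `None` to mark missing values are illustrative assumptions.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

# Hypothetical income column with two missing values.
incomes = [25_000, None, 18_000, 32_000, None]
filled = impute_mean(incomes)
```

Imputation by regression follows the same pattern but predicts each missing value from the other attributes of the same case instead of using a single constant.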
![Page 58: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/58.jpg)
Training Decision Trees
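• Root: is x1 > θ1? If no, y = 0.
• If yes: is x2 > θ2? If yes, y = 1; if no, y = 0.
[Figure: the x1, x2 plane partitioned by the threshold θ1 on x1 and the threshold θ2 on x2.]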
![Page 59: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/59.jpg)
Measuring Disorder
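[Figure: two candidate splits of the same data, one on x1 and one on x2. The class counts in the two resulting branches are (7, 0) and (1, 9) for one split, and (8, 5) and (0, 4) for the other.]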
![Page 60: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/60.jpg)
Entropy
$$e = -\frac{n_{left}}{n} \log \frac{n_{left}}{n} - \frac{n_{right}}{n} \log \frac{n_{right}}{n}$$
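The entropy of a two-way division can be sketched as a short function over counts. The base-2 logarithm and the example counts are assumptions, since the slide does not state a base; zero counts are skipped because their contribution is taken as 0.

```python
import math

def entropy(counts):
    """Minus the sum of (c/n) * log2(c/n) over the non-zero counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

even = entropy([5, 5])     # evenly divided: maximum disorder
pure = entropy([10, 0])    # everything on one side: no disorder
skew = entropy([1, 9])     # in between
```

When growing a tree, the split whose branches have the lowest disorder is preferred.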
![Page 61: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/61.jpg)
Artificial Neural Networks
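[Figure: a single unit with inputs x1, x2, ..., xd and bias input x0 = +1, connected through weights w1, w2, ..., wd and w0 to the output y, computed through the function g.]

$$y = g(w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + w_0) = g(\mathbf{w}^T \mathbf{x})$$

Choice of g:
• Regression: identity
• Classification: sigmoid (0/1)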
![Page 62: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/62.jpg)
Training a Neural Network
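• d inputs:

$$o = g(\mathbf{w}^T \mathbf{x}) = g\left(\sum_{i=0}^{d} w_i x_i\right)$$

Training set: $X = \{\mathbf{x}^t, y^t\}$

$$E(\mathbf{w} \mid X) = \sum_{t \in X} \left(y^t - o^t\right)^2 = \sum_{t \in X} \left(y^t - g\left(\sum_i w_i x_i^t\right)\right)^2$$

Find w that minimizes E on X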
![Page 63: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/63.jpg)
Nonlinear Optimization
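[Figure: the error E plotted against a weight $w_i$.]

Gradient descent: iterative learning, starting from random w:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$

where $\eta$ is the learning factor.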
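The update rule can be sketched as a short loop that repeatedly moves each weight against its partial derivative. The error surface, the step size and the fixed starting point (used here instead of a random one so the run is reproducible) are assumptions for illustration.

```python
def gradient_descent(grad, w, eta=0.1, steps=100):
    """Repeat w_i <- w_i - eta * dE/dw_i for every coordinate i."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical error surface E(w) = (w0 - 3)^2 + (w1 + 1)^2, minimum at (3, -1).
grad_E = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]

w_final = gradient_descent(grad_E, w=[0.0, 0.0])
```

A learning factor that is too large makes the iterates overshoot and diverge, while one that is too small makes convergence slow; on this surface any value below 1 converges.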
![Page 64: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/64.jpg)
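Neural Networks for Classification
K outputs $o_j$, $j = 1, \ldots, K$
Each $o_j$ estimates $P(C_j \mid \mathbf{x})$

$$o_j = \mathrm{sigmoid}(\mathbf{w}_j^T \mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}_j^T \mathbf{x})}$$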
![Page 65: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/65.jpg)
Multiple Outputs
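[Figure: inputs x1, x2, ..., xd and bias input x0 = +1, fully connected to outputs o1, o2, ..., oK; for example, weight $w_{Kd}$ connects $x_d$ to $o_K$.]

$$o_j^t = g(\mathbf{w}_j^T \mathbf{x}^t) = g\left(\sum_{i=0}^{d} w_{ji} x_i^t\right)$$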
![Page 66: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/66.jpg)
Iterative Training
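Training set: $X = \{\mathbf{x}^t, \mathbf{y}^t\}$

$$E(\mathbf{w} \mid X) = \sum_t \sum_j \left(y_j^t - o_j^t\right)^2, \qquad o_j^t = g(\mathbf{w}_j^T \mathbf{x}^t)$$

Gradient-descent update (constant factors absorbed into $\eta$):

$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = -\eta \sum_t \frac{\partial E}{\partial o_j^t} \frac{\partial o_j^t}{\partial w_{ji}} = \eta \sum_t \left(y_j^t - o_j^t\right) g'(\mathbf{w}_j^T \mathbf{x}^t)\, x_i^t$$

Linear (g is the identity):

$$\Delta w_{ji} = \eta \sum_t \left(y_j^t - o_j^t\right) x_i^t$$

Nonlinear (g is the sigmoid):

$$\Delta w_{ji} = \eta \sum_t \left(y_j^t - o_j^t\right) o_j^t \left(1 - o_j^t\right) x_i^t$$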
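The nonlinear update can be tried on a single sigmoid unit. This sketch applies the update one example at a time (online rather than summed over the set) to logical AND, a hypothetical linearly separable problem; the zero starting weights, learning factor and epoch count are assumptions chosen for reproducibility.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_unit(samples, eta=0.5, epochs=5000):
    """Online updates: w_i += eta * (y - o) * o * (1 - o) * x_i."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            step = eta * (y - o) * o * (1.0 - o)
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w

# Logical AND as a toy problem; the first input component is the bias x0 = +1.
samples = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = train_unit(samples)
outputs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in samples]
```

After training, the outputs fall below 0.5 for the three negative examples and above 0.5 for the positive one, so thresholding at 0.5 reproduces the targets.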
![Page 67: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/67.jpg)
Nonlinear classification
[Figure: two cases. Linearly separable vs. NOT linearly separable; the latter requires a nonlinear discriminant.]
![Page 68: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/68.jpg)
Multi-Layer Networks
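[Figure: inputs x1, x2, ..., xd and bias input x0 = +1 feed hidden units h1, h2, ..., hH; the hidden units and the bias unit h0 = +1 feed outputs o1, o2, ..., oK. First-layer weights are denoted w, second-layer weights t.]

$$h_p^t = \mathrm{sigmoid}\left(\sum_{i=0}^{d} w_{pi} x_i^t\right)$$

$$o_j^t = g\left(\sum_{p=0}^{H} t_{jp} h_p^t\right)$$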
![Page 69: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/69.jpg)
Probabilistic Networks
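[Figure: a probabilistic network annotated with example probabilities, p(·) = 0.1, p(·|·) = 0.05, p(·|·) = 0.1, ...; the variable names are not recoverable from the transcript.]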
![Page 70: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/70.jpg)
Evaluating Learners
1. Given a model M, how can we assess its performance on real (future) data?
2. Given M1, M2, ..., ML which one is the best?
![Page 71: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/71.jpg)
Cross-validation
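[Figure: the data are divided into k parts, 1, 2, 3, ..., k-1, k; in each round a different part is held out for testing and the remaining parts are used for training.]
Repeat k times and average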
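The procedure can be sketched as follows: split the indices into k folds, train on k-1 of them, test on the remaining one, and average the k test errors. The fold assignment (every k-th item), the trivial mean predictor and the data are all assumptions for illustration.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validate(fit, error, data, k):
    """Average test error over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(error(model, [data[i] for i in test_idx]))
    return sum(scores) / k

# Hypothetical learner: predict the mean of the training targets.
fit_mean = lambda rows: sum(rows) / len(rows)
mse = lambda m, rows: sum((y - m) ** 2 for y in rows) / len(rows)

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
cv_error = cross_validate(fit_mean, mse, data, k=3)
```

Because every item is used for testing exactly once, the averaged error is a less optimistic estimate of performance on future data than the error on the training set.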
![Page 72: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/72.jpg)
Combining Learners: Why?
[Diagram: the initial standard form is split into a train set and a validation set; Predictor 1, Predictor 2, ..., Predictor L are trained, and the best one is chosen as the best predictor.]
![Page 73: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/73.jpg)
Combining Learners: How?
[Diagram: the initial standard form is split into a train set and a validation set; Predictor 1, Predictor 2, ..., Predictor L are trained, and their outputs are combined by voting.]
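Voting can be sketched as a simple majority over the labels returned by the individual predictors. The three threshold predictors below are hypothetical stand-ins for trained models.

```python
from collections import Counter

def vote(predictors, x):
    """Return the label predicted by the largest number of predictors."""
    ballots = Counter(p(x) for p in predictors)
    return ballots.most_common(1)[0][0]

# Three hypothetical predictors that threshold yearly income differently.
predictors = [
    lambda income: "OK" if income > 20_000 else "DEFAULT",
    lambda income: "OK" if income > 24_000 else "DEFAULT",
    lambda income: "OK" if income > 30_000 else "DEFAULT",
]

decision = vote(predictors, 25_000)   # two of the three say "OK"
```

Combining helps most when the predictors make different errors, since a mistake by one can be outvoted by the others.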
![Page 74: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/74.jpg)
Conclusions:The Importance of Data
• Extract valuable information from large amounts of raw data
• Large amount of reliable data is a must. The quality of the solution depends highly on the quality of the data
• Data mining is not alchemy; we cannot turn stone into gold
![Page 75: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/75.jpg)
Conclusions: The Importance of the Domain Expert
• Joint effort of human experts and computers
• Any information (symmetries, constraints, etc) regarding the application should be made use of to help the learning system
• Results should be checked for consistency by domain experts
![Page 76: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/76.jpg)
Conclusions: The Importance of Being Patient
• Data mining is not straightforward; repeated trials are needed before the system is fine-tuned.
• Mining may be lengthy and costly. Large expectations lead to large disappointments!
![Page 77: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/77.jpg)
Once again: Important Requirements for Mining
• Large amount of high-quality data
• Devoted and knowledgeable experts on:
1. Application domain
2. Databases (data warehouse)
3. Statistics and machine learning
• Time and patience
![Page 78: Ethem Alpaydın Department of Computer Engineering Boğaziçi University alpaydin@boun.edu.tr Intelligent Data Mining](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d695503460f94a47655/html5/thumbnails/78.jpg)
That’s all folks!