Page 1: Teaching an Introductory Course in Data Mining

Teaching an Introductory Course in Data Mining

Richard J. Roiger

Computer and Information Sciences Dept.

Minnesota State University, Mankato USA

Email: [email protected]

Web site: krypton.mnsu.edu/~roiger

Page 2: Teaching an Introductory Course in Data Mining

Teaching an Introductory Course in Data Mining

• Designed for university instructors teaching in information science or computer science departments who wish to introduce a data mining course or unit into their curriculum.

• Appropriate for anyone interested in a detailed overview of data mining as a problem-solving tool.

• Will emphasize material found in the text “Data Mining: A Tutorial-Based Primer,” published by Addison-Wesley in 2003.

• Additional materials covering the most recent trends in data mining will also be presented.

• Participants will have the opportunity to experience the data mining process.

• Each participant will receive a complimentary copy of the aforementioned text together with a CD containing PowerPoint slides and a student version of iDA.

Page 3: Teaching an Introductory Course in Data Mining

Questions to Answer

• What constitutes data mining?

• Where does data mining fit in a CS or IS curriculum?

• Can I use data mining to solve my problem?

• How do I use data mining to solve my problem?

Page 4: Teaching an Introductory Course in Data Mining

What Constitutes Data Mining?

• Finding interesting patterns in data

• Model building

• Inductive learning

• Generalization

Page 5: Teaching an Introductory Course in Data Mining

What Constitutes Data Mining?

• Business applications: beer and diapers; valid vs. invalid credit purchases; churn analysis

• Web applications: crawler vs. human being; user browsing habits

Page 6: Teaching an Introductory Course in Data Mining

What Constitutes Data Mining?

• Medical applications: microarray data mining; disease diagnosis

• Scientific applications: earthquake detection; gamma-ray bursts

Page 7: Teaching an Introductory Course in Data Mining

Where does data mining fit in a CS or IS curriculum?

Intelligent Systems Minimum Maximum

Computer Science 1 5

Information Systems 1 1

Page 8: Teaching an Introductory Course in Data Mining

Where does data mining fit in a CS or IS curriculum?

Decision Theory Minimum Maximum

Computer Science 0 0

Information Systems 3 3

Page 9: Teaching an Introductory Course in Data Mining

Can I use data mining to solve my problem?

• Do I have access to the data?

• Is the data easily obtainable?

• Do I have access to the right attributes?

Page 10: Teaching an Introductory Course in Data Mining

How do I use data mining to solve my problem?

• What strategies should I apply?

• What data mining techniques should I use?

• How do I evaluate results?

• How do I apply what has been learned?

• Have I adhered to all data privacy requirements?

Page 11: Teaching an Introductory Course in Data Mining

Data Mining: A First View

Chapter 1

Page 12: Teaching an Introductory Course in Data Mining

Data Mining

The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Page 13: Teaching an Introductory Course in Data Mining

Knowledge Discovery in Databases (KDD)

The application of the scientific method to data mining. Data mining is one step of the KDD process.

Page 14: Teaching an Introductory Course in Data Mining

Computers & Learning

Computers are good at learning concepts. Concepts are the output of a data mining session.

Page 15: Teaching an Introductory Course in Data Mining

Supervised Learning

• Build a learner model using data instances of known origin.

• Use the model to determine the outcome of new instances of unknown origin.

Page 16: Teaching an Introductory Course in Data Mining

Supervised Learning:

A Decision Tree Example

Page 17: Teaching an Introductory Course in Data Mining

Decision Tree

A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.

Page 18: Teaching an Introductory Course in Data Mining

Table 1.1 • Hypothetical Training Data for Disease Diagnosis

Patient ID#   Sore Throat   Fever   Swollen Glands   Congestion   Headache   Diagnosis
1             Yes           Yes     Yes              Yes          Yes        Strep throat
2             No            No      No               Yes          Yes        Allergy
3             Yes           Yes     No               Yes          No         Cold
4             Yes           No      Yes              No           No         Strep throat
5             No            Yes     No               Yes          No         Cold
6             No            No      No               Yes          No         Allergy
7             No            No      Yes              No           No         Strep throat
8             Yes           No      No               Yes          Yes        Allergy
9             No            Yes     No               Yes          Yes        Cold
10            Yes           Yes     No               Yes          Yes        Cold

Page 19: Teaching an Introductory Course in Data Mining

Figure 1.1 A decision tree for the data in Table 1.1

Root node: Swollen Glands. If Swollen Glands = Yes, then Diagnosis = Strep Throat. If Swollen Glands = No, test Fever: Fever = Yes gives Diagnosis = Cold; Fever = No gives Diagnosis = Allergy.

Page 20: Teaching an Introductory Course in Data Mining

Table 1.2 • Data Instances with an Unknown Classification

Patient ID#   Sore Throat   Fever   Swollen Glands   Congestion   Headache   Diagnosis
11            No            No      Yes              Yes          Yes        ?
12            Yes           Yes     No               No           Yes        ?
13            No            No      No               No           Yes        ?

Page 21: Teaching an Introductory Course in Data Mining

Production Rules

IF Swollen Glands = Yes

THEN Diagnosis = Strep Throat

IF Swollen Glands = No & Fever = Yes

THEN Diagnosis = Cold

IF Swollen Glands = No & Fever = No

THEN Diagnosis = Allergy

Page 22: Teaching an Introductory Course in Data Mining

Unsupervised Clustering

A data mining method that builds models from data without predefined classes.

Page 23: Teaching an Introductory Course in Data Mining

The Acme Investors Dataset

Table 1.3 • Acme Investors Incorporated

Customer ID   Account Type   Margin Account   Transaction Method   Trades/Month   Sex   Age     Favorite Recreation   Annual Income
1005          Joint          No               Online               12.5           F     30–39   Tennis                40–59K
1013          Custodial      No               Broker               0.5            F     50–59   Skiing                80–99K
1245          Joint          No               Online               3.6            M     20–29   Golf                  20–39K
2110          Individual     Yes              Broker               22.3           M     30–39   Fishing               40–59K
1001          Individual     Yes              Online               5.0            M     40–49   Golf                  60–79K

Page 24: Teaching an Introductory Course in Data Mining

The Acme Investors Dataset & Supervised Learning

1. Can I develop a general profile of an online investor?
2. Can I determine if a new customer is likely to open a margin account?
3. Can I build a model to predict the average number of trades per month for a new investor?
4. What characteristics differentiate female and male investors?

Page 25: Teaching an Introductory Course in Data Mining

The Acme Investors Dataset & Unsupervised Clustering

1. What attribute similarities group customers of Acme Investors together?

2. What differences in attribute values segment the customer database?

Page 26: Teaching an Introductory Course in Data Mining

1.3 Is Data Mining Appropriate for My Problem?

Page 27: Teaching an Introductory Course in Data Mining

Data Mining or Data Query?

• Shallow Knowledge

• Multidimensional Knowledge

• Hidden Knowledge

• Deep Knowledge

Page 28: Teaching an Introductory Course in Data Mining

Shallow Knowledge

Shallow knowledge is factual. It can be easily stored and manipulated in a database.

Page 29: Teaching an Introductory Course in Data Mining

Multidimensional Knowledge

Multidimensional knowledge is also factual. On-line Analytical Processing (OLAP) tools are used to manipulate multidimensional knowledge.

Page 30: Teaching an Introductory Course in Data Mining

Hidden Knowledge

Hidden knowledge represents patterns or regularities in data that cannot be easily found using database query. However, data mining algorithms can find such patterns with ease.

Page 31: Teaching an Introductory Course in Data Mining

Data Mining vs. Data Query: An Example

• Use data query when you already know, at least approximately, what you are looking for.

• Use data mining to find regularities in data that are not obvious.

Page 32: Teaching an Introductory Course in Data Mining

1.4 Expert Systems or Data Mining?

Page 33: Teaching an Introductory Course in Data Mining

Expert System

A computer program that emulates the problem-solving skills of one or more human experts.

Page 34: Teaching an Introductory Course in Data Mining

Knowledge Engineer

A person trained to interact with an expert in order to capture their knowledge.

Page 35: Teaching an Introductory Course in Data Mining

Figure 1.2 Data mining vs. expert systems

A data mining tool derives the rule "IF Swollen Glands = Yes THEN Diagnosis = Strep Throat" directly from data. An expert system building tool produces the same rule, but only after a knowledge engineer captures it from a human expert.

Page 36: Teaching an Introductory Course in Data Mining

1.5 A Simple Data Mining Process Model

Page 37: Teaching an Introductory Course in Data Mining

Figure 1.3 A simple data mining process model

SQL queries extract data from the operational database into a data warehouse. Data mining is applied to the warehouse data, the results are interpreted and evaluated, and the outcome is put to use (result application).

Page 38: Teaching an Introductory Course in Data Mining

1.6 Why Not Simple Search?

• Nearest Neighbor Classifier

• K-nearest Neighbor Classifier

Page 39: Teaching an Introductory Course in Data Mining

Nearest Neighbor Classifier

Classification is performed by searching the training data for the instance closest in distance to the unknown instance.
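The search can be sketched in a few lines of Python. This is a minimal illustration rather than the text's implementation; the training instances and class labels below are made up for the example.

```python
import math

def nearest_neighbor(unknown, training):
    """Return the class of the training instance closest to `unknown`.

    `training` is a list of (feature_vector, class_label) pairs.
    Distance is ordinary Euclidean distance over numeric features.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best_features, best_label = min(training, key=lambda pair: distance(unknown, pair[0]))
    return best_label

# Hypothetical two-attribute instances and class labels
training_data = [((1.0, 1.5), "keep"), ((5.0, 6.0), "target"), ((2.0, 1.5), "keep")]
print(nearest_neighbor((4.5, 5.5), training_data))   # -> "target"
```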

Page 40: Teaching an Introductory Course in Data Mining

Customer Intrinsic Value

Page 41: Teaching an Introductory Course in Data Mining

Figure 1.4 Intrinsic vs. actual customer value

A scatterplot comparing intrinsic (predicted) customer value with actual customer value; two groups of customers, marked X and –, are plotted against the two axes.

Page 42: Teaching an Introductory Course in Data Mining

Data Mining: A Closer Look

Chapter 2

Page 43: Teaching an Introductory Course in Data Mining

2.1 Data Mining Strategies

Page 44: Teaching an Introductory Course in Data Mining

Figure 2.1 A hierarchy of data mining strategies

Data mining strategies divide into supervised learning, unsupervised clustering, and market basket analysis. Supervised learning further divides into classification, estimation, and prediction.

Page 45: Teaching an Introductory Course in Data Mining

Data Mining Strategies: Classification

• Learning is supervised.

• The dependent variable is categorical.

• Well-defined classes.

• Current rather than future behavior.

Page 46: Teaching an Introductory Course in Data Mining

Data Mining Strategies: Estimation

• Learning is supervised.

• The dependent variable is numeric.

• Well-defined classes.

• Current rather than future behavior.

Page 47: Teaching an Introductory Course in Data Mining

Data Mining Strategies:Prediction

• The emphasis is on predicting future rather than current outcomes.

• The output attribute may be categorical or numeric.

Page 48: Teaching an Introductory Course in Data Mining

Classification, Estimation or Prediction?

The nature of the data determines whether a model is suitable for classification, estimation, or prediction.

Page 49: Teaching an Introductory Course in Data Mining

The Cardiology Patient Dataset

This dataset contains 303 instances. Each instance holds information about a patient who either has or does not have a heart condition.

Page 50: Teaching an Introductory Course in Data Mining

The Cardiology Patient Dataset

• 138 instances represent patients with heart disease.
• 165 instances contain information about patients free of heart disease.

Page 51: Teaching an Introductory Course in Data Mining

Table 2.1 • Cardiology Patient Data

Attribute Name              Mixed Values                                     Numeric Values   Comments
Age                         Numeric                                          Numeric          Age in years
Sex                         Male, Female                                     1, 0             Patient gender
Chest Pain Type             Angina, Abnormal Angina, NoTang, Asymptomatic    1–4              NoTang = Nonanginal pain
Blood Pressure              Numeric                                          Numeric          Resting blood pressure upon hospital admission
Cholesterol                 Numeric                                          Numeric          Serum cholesterol
Fasting Blood Sugar < 120   True, False                                      1, 0             Is fasting blood sugar less than 120?
Resting ECG                 Normal, Abnormal, Hyp                            0, 1, 2          Hyp = Left ventricular hypertrophy
Maximum Heart Rate          Numeric                                          Numeric          Maximum heart rate achieved
Induced Angina?             True, False                                      1, 0             Does the patient experience angina as a result of exercise?
Old Peak                    Numeric                                          Numeric          ST depression induced by exercise relative to rest
Slope                       Up, Flat, Down                                   1–3              Slope of the peak exercise ST segment
Number Colored Vessels      0, 1, 2, 3                                       0, 1, 2, 3       Number of major vessels colored by fluoroscopy
Thal                        Normal, Fix, Rev                                 3, 6, 7          Normal, fixed defect, reversible defect
Concept Class               Healthy, Sick                                    1, 0             Angiographic disease status

Page 52: Teaching an Introductory Course in Data Mining

Table 2.2 • Most and Least Typical Instances from the Cardiology Domain

Attribute Name              Most Typical Healthy   Least Typical Healthy   Most Typical Sick   Least Typical Sick
Age                         52                     63                      60                  62
Sex                         Male                   Male                    Male                Female
Chest Pain Type             NoTang                 Angina                  Asymptomatic        Asymptomatic
Blood Pressure              138                    145                     125                 160
Cholesterol                 223                    233                     258                 164
Fasting Blood Sugar < 120   False                  True                    False               False
Resting ECG                 Normal                 Hyp                     Hyp                 Hyp
Maximum Heart Rate          169                    150                     141                 145
Induced Angina?             False                  False                   True                False
Old Peak                    0                      2.3                     2.8                 6.2
Slope                       Up                     Down                    Flat                Down
Number of Colored Vessels   0                      0                       1                   3
Thal                        Normal                 Fix                     Rev                 Rev

Page 53: Teaching an Introductory Course in Data Mining

Classification, Estimation or Prediction?

The next two slides each contain a rule generated from this dataset. Is either of these rules predictive?

Page 54: Teaching an Introductory Course in Data Mining

A Healthy Class Rule for the Cardiology Patient Dataset

IF 169 <= Maximum Heart Rate <= 202

THEN Concept Class = Healthy

Rule accuracy: 85.07%

Rule coverage: 34.55%

Page 55: Teaching an Introductory Course in Data Mining

A Sick Class Rule for the Cardiology Patient Dataset

IF Thal = Rev & Chest Pain Type = Asymptomatic

THEN Concept Class = Sick

Rule accuracy: 91.14%

Rule coverage: 52.17%

Page 56: Teaching an Introductory Course in Data Mining

Data Mining Strategies: Unsupervised Clustering

Page 57: Teaching an Introductory Course in Data Mining

Unsupervised Clustering can be used to:

• determine if relationships can be found in the data.

• evaluate the likely performance of a supervised model.
• find a best set of input attributes for supervised learning.
• detect outliers.

Page 58: Teaching an Introductory Course in Data Mining

Data Mining Strategies: Market Basket Analysis

• Find interesting relationships among retail products.

• Uses association rule algorithms.

Page 59: Teaching an Introductory Course in Data Mining

2.2 Supervised Data Mining Techniques

Page 60: Teaching an Introductory Course in Data Mining

The Credit Card Promotion Database

Page 61: Teaching an Introductory Course in Data Mining

Table 2.3 • The Credit Card Promotion Database

Income Range ($)   Magazine Promotion   Watch Promotion   Life Insurance Promotion   Credit Card Insurance   Sex      Age
40–50K             Yes                  No                No                         No                      Male     45
30–40K             Yes                  Yes               Yes                        No                      Female   40
40–50K             No                   No                No                         No                      Male     42
30–40K             Yes                  Yes               Yes                        Yes                     Male     43
50–60K             Yes                  No                Yes                        No                      Female   38
20–30K             No                   No                No                         No                      Female   55
30–40K             Yes                  No                Yes                        Yes                     Male     35
20–30K             No                   Yes               No                         No                      Male     27
30–40K             Yes                  No                No                         No                      Male     43
30–40K             Yes                  Yes               Yes                        No                      Female   41
40–50K             No                   Yes               Yes                        No                      Female   43
20–30K             No                   Yes               Yes                        No                      Male     29
50–60K             Yes                  Yes               Yes                        No                      Female   39
40–50K             No                   Yes               No                         No                      Male     55
20–30K             No                   No                Yes                        Yes                     Female   19

Page 62: Teaching an Introductory Course in Data Mining

A Hypothesis for the Credit Card Promotion Database

A combination of one or more of the dataset attributes differentiates Acme Credit Card Company card holders who have taken advantage of the life insurance promotion from those card holders who have chosen not to participate in the promotional offer.

Page 63: Teaching an Introductory Course in Data Mining

Supervised Data Mining Techniques:Production Rules

Page 64: Teaching an Introductory Course in Data Mining

A Production Rule for theCredit Card Promotion Database

IF Sex = Female & 19 <= Age <= 43

THEN Life Insurance Promotion = Yes

Rule Accuracy: 100.00%

Rule Coverage: 66.67%

Page 65: Teaching an Introductory Course in Data Mining

Production Rule Accuracy & Coverage

• Rule accuracy is a between-class measure.

• Rule coverage is a within-class measure.

Page 66: Teaching an Introductory Course in Data Mining

Supervised Data Mining Techniques:Neural Networks

Page 67: Teaching an Introductory Course in Data Mining

Figure 2.2 A multilayer fully connected neural network

The network consists of an input layer, a hidden layer, and an output layer, with every node in one layer connected to every node in the next.

Page 68: Teaching an Introductory Course in Data Mining

Table 2.4 • Neural Network Training: Actual and Computed Output

Instance Number Life Insurance Promotion Computed Output

1 0 0.024

2 1 0.998

3 0 0.023

4 1 0.986

5 1 0.999

6 0 0.050

7 1 0.999

8 0 0.262

9 0 0.060

10 1 0.997

11 1 0.999

12 1 0.776

13 1 0.999

14 0 0.023

15 1 0.999

Page 69: Teaching an Introductory Course in Data Mining

Supervised Data Mining Techniques:Statistical Regression

Life insurance promotion = 0.5909 (credit card insurance) - 0.5455 (sex) + 0.7727

Page 70: Teaching an Introductory Course in Data Mining

2.3 Association Rules

Page 71: Teaching an Introductory Course in Data Mining

Comparing Association Rules & Production Rules

• Association rules can have one or several output attributes. Production rules are limited to one output attribute.

• With association rules, an output attribute for one rule can be an input attribute for another rule.

Page 72: Teaching an Introductory Course in Data Mining

Two Association Rules for the Credit Card Promotion Database

IF Sex = Female & Age = over40 & Credit Card Insurance = No
THEN Life Insurance Promotion = Yes

IF Sex = Female & Age = over40
THEN Credit Card Insurance = No & Life Insurance Promotion = Yes

Page 73: Teaching an Introductory Course in Data Mining

2.4 Clustering Techniques

Page 74: Teaching an Introductory Course in Data Mining

Figure 2.3 An unsupervised clustering of the credit card database

Cluster 1: # Instances: 5; Sex: Male => 3, Female => 2; Age: 37.0; Credit Card Insurance: Yes => 1, No => 4; Life Insurance Promotion: Yes => 2, No => 3

Cluster 2: # Instances: 3; Sex: Male => 3, Female => 0; Age: 43.3; Credit Card Insurance: Yes => 0, No => 3; Life Insurance Promotion: Yes => 0, No => 3

Cluster 3: # Instances: 7; Sex: Male => 2, Female => 5; Age: 39.9; Credit Card Insurance: Yes => 2, No => 5; Life Insurance Promotion: Yes => 7, No => 0

Page 75: Teaching an Introductory Course in Data Mining

2.5 Evaluating Performance

Page 76: Teaching an Introductory Course in Data Mining

Evaluating Supervised Learner Models

Page 77: Teaching an Introductory Course in Data Mining

Confusion Matrix

• A matrix used to summarize the results of a supervised classification.

• Entries along the main diagonal are correct classifications.

• Entries other than those on the main diagonal are classification errors.

Page 78: Teaching an Introductory Course in Data Mining

Table 2.5 • A Three-Class Confusion Matrix

      Computed Decision
      C1    C2    C3
C1    C11   C12   C13
C2    C21   C22   C23
C3    C31   C32   C33

Page 79: Teaching an Introductory Course in Data Mining

Two-Class Error Analysis

Page 80: Teaching an Introductory Course in Data Mining

Table 2.6 • A Simple Confusion Matrix

         Computed Accept   Computed Reject
Accept   True Accept       False Reject
Reject   False Accept      True Reject

Page 81: Teaching an Introductory Course in Data Mining

Table 2.7 • Two Confusion Matrices Each Showing a 10% Error Rate

Model A   Computed Accept   Computed Reject      Model B   Computed Accept   Computed Reject
Accept    600               25                   Accept    600               75
Reject    75                300                  Reject    25                300
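A small sketch of how the 10% error rate in Table 2.7 is obtained from a two-class confusion matrix; the matrices below are the Model A and Model B counts shown above.

```python
def error_rate(matrix):
    """Compute the error rate from a 2x2 confusion matrix.

    matrix[i][j] = number of class-i instances computed as class j,
    with row/column order (Accept, Reject).
    """
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return 1 - correct / total

model_a = [[600, 25], [75, 300]]   # Table 2.7, Model A
model_b = [[600, 75], [25, 300]]   # Table 2.7, Model B
print(error_rate(model_a), error_rate(model_b))   # both 0.10
```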

Page 82: Teaching an Introductory Course in Data Mining

Evaluating Numeric Output

• Mean absolute error

• Mean squared error

• Root mean squared error

Page 83: Teaching an Introductory Course in Data Mining

Mean Absolute Error

The average absolute difference between classifier predicted output and actual output.

Page 84: Teaching an Introductory Course in Data Mining

Mean Squared Error

The average of the sum of squared differences between classifier predicted output and actual output.

Page 85: Teaching an Introductory Course in Data Mining

Root Mean Squared Error

The square root of the mean squared error.
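A minimal sketch of the three numeric-output measures, applied to the first five actual/computed pairs from Table 2.4.

```python
import math

def mean_absolute_error(actual, computed):
    return sum(abs(a - c) for a, c in zip(actual, computed)) / len(actual)

def mean_squared_error(actual, computed):
    return sum((a - c) ** 2 for a, c in zip(actual, computed)) / len(actual)

def root_mean_squared_error(actual, computed):
    return math.sqrt(mean_squared_error(actual, computed))

# First five instances of Table 2.4: actual vs. computed network output
actual   = [0, 1, 0, 1, 1]
computed = [0.024, 0.998, 0.023, 0.986, 0.999]
print(mean_absolute_error(actual, computed))
print(root_mean_squared_error(actual, computed))
```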

Page 86: Teaching an Introductory Course in Data Mining

Comparing Models by Measuring Lift

Page 87: Teaching an Introductory Course in Data Mining

Figure 2.4 Targeted vs. mass mailing

A response curve comparing a targeted mailing with a mass mailing: the vertical axis shows the number responding (0 to 1,200) and the horizontal axis the percentage of the population sampled (0 to 100%).

Page 88: Teaching an Introductory Course in Data Mining

Computing Lift

Lift = P(Ci | Sample) / P(Ci | Population)

where Ci is the class of interest, P(Ci | Sample) is the proportion of class Ci instances in the sample, and P(Ci | Population) is the proportion of class Ci instances in the population.

Page 89: Teaching an Introductory Course in Data Mining

Table 2.8 • Two Confusion Matrices: No Model and an Ideal Model

No Model   Computed Accept   Computed Reject      Ideal Model   Computed Accept   Computed Reject
Accept     1,000             0                    Accept        1,000             0
Reject     99,000            0                    Reject        0                 99,000

Page 90: Teaching an Introductory Course in Data Mining

Table 2.9 • Two Confusion Matrices for Alternative Models with Lift Equal to 2.25

Model X   Computed Accept   Computed Reject      Model Y   Computed Accept   Computed Reject
Accept    540               460                  Accept    450               550
Reject    23,460            75,540               Reject    19,550            79,450
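The lift of 2.25 quoted in Table 2.9 can be reproduced from the formula two slides back: the "computed accept" column defines the sample, and Table 2.8 gives the population of 1,000 accepts among 100,000 customers. A minimal sketch:

```python
def lift(sample_accepts, sample_size, population_accepts, population_size):
    """Lift = P(accept | sample) / P(accept | population)."""
    return (sample_accepts / sample_size) / (population_accepts / population_size)

# Model X from Table 2.9: 540 true accepts out of 540 + 23,460 = 24,000 mailed;
# the population (Table 2.8) holds 1,000 accepts among 100,000 customers.
print(lift(540, 24_000, 1_000, 100_000))     # 2.25
# Model Y: 450 true accepts out of 450 + 19,550 = 20,000 mailed.
print(lift(450, 20_000, 1_000, 100_000))     # 2.25
```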

Page 91: Teaching an Introductory Course in Data Mining

Unsupervised Model Evaluation

Page 92: Teaching an Introductory Course in Data Mining

Unsupervised Model Evaluation(cluster quality)

• All clustering techniques compute some measure of cluster quality.

• One evaluation method is to calculate the sum of squared error differences between the instances of each cluster and their cluster center.

• Smaller values indicate clusters of higher quality.

Page 93: Teaching an Introductory Course in Data Mining

Supervised Learning for Unsupervised Model Evaluation

• Designate each formed cluster as a class and assign each class an arbitrary name.

• Choose a random sample of instances from each class for supervised learning.

• Build a supervised model from the chosen instances. Employ the remaining instances to test the correctness of the model.

Page 94: Teaching an Introductory Course in Data Mining

Basic Data Mining Techniques

Chapter 3

Page 95: Teaching an Introductory Course in Data Mining

3.1 Decision Trees

Page 96: Teaching an Introductory Course in Data Mining

An Algorithm for Building Decision Trees

1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute.
   - Create child links from this node where each link represents a unique value for the chosen attribute.
   - Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   - If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path is null, specify the classification for new instances following this decision path.
   - If the subclass does not satisfy the criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
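Step 2 of the algorithm needs a way to score how well an attribute differentiates the instances. The sketch below uses a simplified goodness measure, the fraction of instances covered by the majority class of each branch, which is an assumption made for illustration rather than the measure used in the text; the data is the first six rows of Table 3.1.

```python
from collections import Counter, defaultdict

def split_counts(instances, attribute, target):
    """Count target-class values under each value of `attribute`
    (the per-branch class distributions shown in Figures 3.1-3.3)."""
    branches = defaultdict(Counter)
    for row in instances:
        branches[row[attribute]][row[target]] += 1
    return branches

def simple_goodness(branches):
    """Fraction of instances covered by the majority class of each branch."""
    total = sum(sum(c.values()) for c in branches.values())
    return sum(max(c.values()) for c in branches.values()) / total

# First six rows of Table 3.1 (Sex, Credit Card Insurance, Life Insurance Promotion)
data = [
    {"Sex": "Male",   "Credit Card Insurance": "No",  "Life Insurance Promotion": "No"},
    {"Sex": "Female", "Credit Card Insurance": "No",  "Life Insurance Promotion": "Yes"},
    {"Sex": "Male",   "Credit Card Insurance": "No",  "Life Insurance Promotion": "No"},
    {"Sex": "Male",   "Credit Card Insurance": "Yes", "Life Insurance Promotion": "Yes"},
    {"Sex": "Female", "Credit Card Insurance": "No",  "Life Insurance Promotion": "Yes"},
    {"Sex": "Female", "Credit Card Insurance": "No",  "Life Insurance Promotion": "No"},
]
for attr in ("Sex", "Credit Card Insurance"):
    print(attr, simple_goodness(split_counts(data, attr, "Life Insurance Promotion")))
```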

Page 97: Teaching an Introductory Course in Data Mining

Table 3.1 • The Credit Card Promotion Database

Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
40–50K         No                         No                      Male     45
30–40K         Yes                        No                      Female   40
40–50K         No                         No                      Male     42
30–40K         Yes                        Yes                     Male     43
50–60K         Yes                        No                      Female   38
20–30K         No                         No                      Female   55
30–40K         Yes                        Yes                     Male     35
20–30K         No                         No                      Male     27
30–40K         No                         No                      Male     43
30–40K         Yes                        No                      Female   41
40–50K         Yes                        No                      Female   43
20–30K         Yes                        No                      Male     29
50–60K         Yes                        No                      Female   39
40–50K         No                         No                      Male     55
20–30K         Yes                        Yes                     Female   19

Page 98: Teaching an Introductory Course in Data Mining

Figure 3.1 A partial decision tree with root node = income range

Root node: Income Range. Branch counts for Life Insurance Promotion: 20–30K: 2 Yes, 2 No; 30–40K: 4 Yes, 1 No; 40–50K: 1 Yes, 3 No; 50–60K: 2 Yes.

Page 99: Teaching an Introductory Course in Data Mining

Figure 3.2 A partial decision tree with root node = credit card insurance

Root node: Credit Card Insurance. Branch counts for Life Insurance Promotion: No: 6 Yes, 6 No; Yes: 3 Yes, 0 No.

Page 100: Teaching an Introductory Course in Data Mining

Figure 3.3 A partial decision tree with root node = age

Root node: Age. Branch counts for Life Insurance Promotion: Age <= 43: 9 Yes, 3 No; Age > 43: 0 Yes, 3 No.

Page 101: Teaching an Introductory Course in Data Mining

Decision Trees for the Credit Card Promotion Database

Page 102: Teaching an Introductory Course in Data Mining

Figure 3.4 A three-node decision tree for the credit card database

Root node: Age. Age > 43 leads to the leaf No (3/0). Age <= 43 leads to a Sex node: Sex = Female gives the leaf Yes (6/0); Sex = Male leads to a Credit Card Insurance node, where Credit Card Insurance = Yes gives Yes (2/0) and Credit Card Insurance = No gives No (4/1).

Page 103: Teaching an Introductory Course in Data Mining

Figure 3.5 A two-node decision tree for the credit card database

Root node: Credit Card Insurance. Credit Card Insurance = Yes gives the leaf Yes (3/0). Credit Card Insurance = No leads to a Sex node: Sex = Female gives Yes (6/1); Sex = Male gives No (6/1).

Page 104: Teaching an Introductory Course in Data Mining

Table 3.2 • Training Data Instances Following the Path in Figure 3.4 to Credit Card Insurance = No

Income Range   Life Insurance Promotion   Credit Card Insurance   Sex    Age
40–50K         No                         No                      Male   42
20–30K         No                         No                      Male   27
30–40K         No                         No                      Male   43
20–30K         Yes                        No                      Male   29

Page 105: Teaching an Introductory Course in Data Mining

Decision Tree Rules

Page 106: Teaching an Introductory Course in Data Mining

A Rule for the Tree in Figure 3.4

IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

Page 107: Teaching an Introductory Course in Data Mining

A Simplified Rule Obtained by Removing Attribute Age

IF Sex = Male & Credit Card Insurance = No THEN Life Insurance Promotion = No

Page 108: Teaching an Introductory Course in Data Mining

Other Methods for Building Decision Trees

• CART

• CHAID

Page 109: Teaching an Introductory Course in Data Mining

Advantages of Decision Trees

• Easy to understand.

• Map nicely to a set of production rules.
• Have been successfully applied to real problems.
• Make no prior assumptions about the data.
• Able to process both numerical and categorical data.

Page 110: Teaching an Introductory Course in Data Mining

Disadvantages of Decision Trees

• Output attribute must be categorical.

• Limited to one output attribute.
• Decision tree algorithms are unstable.
• Trees created from numeric datasets can be complex.

Page 111: Teaching an Introductory Course in Data Mining

3.2 Generating Association Rules

Page 112: Teaching an Introductory Course in Data Mining

Confidence and Support

Page 113: Teaching an Introductory Course in Data Mining

Rule Confidence

Given a rule of the form “If A then B”, rule confidence is the conditional probability that B is true when A is known to be true.

Page 114: Teaching an Introductory Course in Data Mining

Rule Support

The minimum percentage of instances in the database that contain all items listed in a given association rule.

Page 115: Teaching an Introductory Course in Data Mining

Mining Association Rules: An Example

Page 116: Teaching an Introductory Course in Data Mining

Table 3.3 • A Subset of the Credit Card Promotion Database

Magazine Promotion   Watch Promotion   Life Insurance Promotion   Credit Card Insurance   Sex
Yes                  No                No                         No                      Male
Yes                  Yes               Yes                        No                      Female
No                   No                No                         No                      Male
Yes                  Yes               Yes                        Yes                     Male
Yes                  No                Yes                        No                      Female
No                   No                No                         No                      Female
Yes                  No                Yes                        Yes                     Male
No                   Yes               No                         No                      Male
Yes                  No                No                         No                      Male
Yes                  Yes               Yes                        No                      Female

Page 117: Teaching an Introductory Course in Data Mining

Table 3.4 • Single-Item Sets

Single-Item Set                  Number of Items
Magazine Promotion = Yes         7
Watch Promotion = Yes            4
Watch Promotion = No             6
Life Insurance Promotion = Yes   5
Life Insurance Promotion = No    5
Credit Card Insurance = No       8
Sex = Male                       6
Sex = Female                     4

Page 118: Teaching an Introductory Course in Data Mining

Table 3.5 • Two-Item Sets

Two-Item Set                                                  Number of Items
Magazine Promotion = Yes & Watch Promotion = No               4
Magazine Promotion = Yes & Life Insurance Promotion = Yes     5
Magazine Promotion = Yes & Credit Card Insurance = No         5
Magazine Promotion = Yes & Sex = Male                         4
Watch Promotion = No & Life Insurance Promotion = No          4
Watch Promotion = No & Credit Card Insurance = No             5
Watch Promotion = No & Sex = Male                             4
Life Insurance Promotion = No & Credit Card Insurance = No    5
Life Insurance Promotion = No & Sex = Male                    4
Credit Card Insurance = No & Sex = Male                       4
Credit Card Insurance = No & Sex = Female                     4

Page 119: Teaching an Introductory Course in Data Mining

Two Possible Two-Item Set Rules

IF Magazine Promotion = Yes
THEN Life Insurance Promotion = Yes (5/7)

IF Life Insurance Promotion = Yes
THEN Magazine Promotion = Yes (5/5)

Page 120: Teaching an Introductory Course in Data Mining

Three-Item Set Rules

IF Watch Promotion = No & Life Insurance Promotion = No
THEN Credit Card Insurance = No (4/4)

IF Watch Promotion = No
THEN Life Insurance Promotion = No & Credit Card Insurance = No (4/6)
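A minimal sketch of how rule confidence and support are computed from the Table 3.3 data. The shortened attribute names are for the example only, and support is measured here as the fraction of instances containing every item in the rule.

```python
def confidence_and_support(instances, antecedent, consequent):
    """Confidence and support for the rule IF antecedent THEN consequent.

    `antecedent` and `consequent` are dicts of attribute -> value.
    Confidence = P(consequent | antecedent).
    """
    matches_a = [r for r in instances if all(r[k] == v for k, v in antecedent.items())]
    matches_both = [r for r in matches_a if all(r[k] == v for k, v in consequent.items())]
    confidence = len(matches_both) / len(matches_a)
    support = len(matches_both) / len(instances)
    return confidence, support

# Table 3.3 rows as (Magazine, Watch, Life Insurance, Credit Card Insurance, Sex)
rows = [("Yes", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "No", "Female"),
        ("No", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "Yes", "Male"),
        ("Yes", "No", "Yes", "No", "Female"), ("No", "No", "No", "No", "Female"),
        ("Yes", "No", "Yes", "Yes", "Male"), ("No", "Yes", "No", "No", "Male"),
        ("Yes", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "No", "Female")]
keys = ("Magazine", "Watch", "LifeIns", "CCIns", "Sex")
data = [dict(zip(keys, r)) for r in rows]
print(confidence_and_support(data, {"Magazine": "Yes"}, {"LifeIns": "Yes"}))  # (5/7, 5/10)
```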

Page 121: Teaching an Introductory Course in Data Mining

General Considerations

• We are interested in association rules that show a lift in product sales where the lift is the result of the product’s association with one or more other products.

• We are also interested in association rules that show a lower than expected confidence for a particular association.

Page 122: Teaching an Introductory Course in Data Mining

3.3 The K-Means Algorithm

1. Choose a value for K, the total number of clusters.

2. Randomly choose K points as cluster centers.

3. Assign the remaining instances to their closest cluster center.

4. Calculate a new cluster center for each cluster.

5. Repeat steps 3-5 until the cluster centers do not change.
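A minimal K-means sketch following the steps above, run on the six instances of Table 3.6. The fixed iteration count and random seed are choices made for the example.

```python
import math, random

def kmeans(points, k, iterations=20, seed=0):
    """Random initial centers, then alternate assignment (step 3)
    and center recomputation (step 4)."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                                    # step 3: assign to closest center
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        centers = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]         # step 4: recompute centers
    return centers, clusters

# The six instances of Table 3.6
points = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]
centers, clusters = kmeans(points, k=2)
print(centers)
print(clusters)
```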

Page 123: Teaching an Introductory Course in Data Mining

An Example Using K-Means

Page 124: Teaching an Introductory Course in Data Mining

Table 3.6 • K-Means Input Values

Instance   X     Y
1          1.0   1.5
2          1.0   4.5
3          2.0   1.5
4          2.0   3.5
5          3.0   2.5
6          5.0   6.0

Page 125: Teaching an Introductory Course in Data Mining

Figure 3.6 A coordinate mapping of the data in Table 3.6

The six instances of Table 3.6 plotted as points on an x / f(x) coordinate grid.

Page 126: Teaching an Introductory Course in Data Mining

Table 3.7 • Several Applications of the K-Means Algorithm (K = 2)

Outcome   Cluster Centers              Cluster Points            Squared Error
1         (2.67, 4.67), (2.00, 1.83)   {2, 4, 6}, {1, 3, 5}      14.50
2         (1.5, 1.5), (2.75, 4.125)    {1, 3}, {2, 4, 5, 6}      15.94
3         (1.8, 2.7), (5, 6)           {1, 2, 3, 4, 5}, {6}       9.60

Page 127: Teaching an Introductory Course in Data Mining

Figure 3.7 A K-Means clustering of the data in Table 3.6 (K = 2)

The two clusters produced by K-means (K = 2) for the Table 3.6 data, plotted on the same x / f(x) coordinate grid.

Page 128: Teaching an Introductory Course in Data Mining

General Considerations

• Requires real-valued data.

• We must select the number of clusters present in the data.

• Works best when the clusters in the data are of approximately equal size.
• Attribute significance cannot be determined.
• Lacks explanation capabilities.

Page 129: Teaching an Introductory Course in Data Mining

3.4 Genetic Learning

Page 130: Teaching an Introductory Course in Data Mining

Genetic Learning Operators

• Crossover

• Mutation

• Selection

Page 131: Teaching an Introductory Course in Data Mining

Genetic Algorithms and Supervised Learning

Page 132: Teaching an Introductory Course in Data Mining

Figure 3.8 Supervised genetic learning

Population elements are scored by a fitness function against the training data; elements are either kept or thrown out, and the surviving candidates undergo crossover and mutation to form the next population.

Page 133: Teaching an Introductory Course in Data Mining

Table 3.8 • An Initial Population for Supervised Genetic Learning

Population Element   Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
1                    20–30K         No                         Yes                     Male     30–39
2                    30–40K         Yes                        No                      Female   50–59
3                    ?              No                         No                      Male     40–49
4                    30–40K         Yes                        Yes                     Male     40–49

Page 134: Teaching an Introductory Course in Data Mining

Table 3.9 • Training Data for Genetic Learning

Training Instance   Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
1                   30–40K         Yes                        Yes                     Male     30–39
2                   30–40K         Yes                        No                      Female   40–49
3                   50–60K         Yes                        No                      Female   30–39
4                   20–30K         No                         No                      Female   50–59
5                   20–30K         No                         No                      Male     20–29
6                   30–40K         No                         No                      Male     40–49

Page 135: Teaching an Introductory Course in Data Mining

Figure 3.9 A crossover operation

Before crossover, population element #1 is (Income Range = 20–30K, Life Insurance Promotion = No, Credit Card Insurance = Yes, Sex = Male, Age = 30–39) and element #2 is (30–40K, Yes, No, Female, 50–59). The crossover exchanges attribute segments between the two elements, producing new element #2 = (30–40K, Yes, Yes, Male, 30–39) and new element #1 = (20–30K, No, No, Female, 50–59).
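A minimal sketch of the crossover operation in Figure 3.9 as a single-point crossover on attribute lists; the crossover point of 2 is chosen to reproduce the figure's result.

```python
def crossover(parent1, parent2, point):
    """Single-point crossover: swap everything after `point` between parents."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# Population elements #1 and #2 from Table 3.8
# (Income Range, Life Insurance Promotion, Credit Card Insurance, Sex, Age)
e1 = ["20-30K", "No", "Yes", "Male", "30-39"]
e2 = ["30-40K", "Yes", "No", "Female", "50-59"]
new1, new2 = crossover(e1, e2, point=2)
print(new1)   # ['20-30K', 'No', 'No', 'Female', '50-59']
print(new2)   # ['30-40K', 'Yes', 'Yes', 'Male', '30-39']
```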

Page 136: Teaching an Introductory Course in Data Mining

Table 3.10 • A Second-Generation Population

Population Element   Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
1                    20–30K         No                         No                      Female   50–59
2                    30–40K         Yes                        Yes                     Male     30–39
3                    ?              No                         No                      Male     40–49
4                    30–40K         Yes                        Yes                     Male     40–49

Page 137: Teaching an Introductory Course in Data Mining

Genetic Algorithms and Unsupervised Clustering

Page 138: Teaching an Introductory Course in Data Mining

Figure 3.10 Unsupervised genetic clustering

P instances I1, I2, ..., Ip, each described by attributes a1, a2, ..., an, are presented to K candidate solutions S1, S2, ..., SK. Each solution holds a set of elements (E11, E12, ..., Ek1, Ek2) representing proposed cluster centers.

Page 139: Teaching an Introductory Course in Data Mining

Table 3.11 • A First-Generation Population for Unsupervised Clustering

                                         S1                     S2                     S3
Solution elements (initial population)   (1.0,1.0), (5.0,5.0)   (3.0,2.0), (3.0,5.0)   (4.0,3.0), (5.0,1.0)
Fitness score                            11.31                  9.78                   15.55

Solution elements (second generation)    (5.0,1.0), (5.0,5.0)   (3.0,2.0), (3.0,5.0)   (4.0,3.0), (1.0,1.0)
Fitness score                            17.96                  9.78                   11.34

Solution elements (third generation)     (5.0,5.0), (1.0,5.0)   (3.0,2.0), (3.0,5.0)   (4.0,3.0), (1.0,1.0)
Fitness score                            13.64                  9.78                   11.34

Page 140: Teaching an Introductory Course in Data Mining

General Considerations

• Global optimization is not a guarantee.

• The fitness function determines the complexity of the algorithm.
• Genetic algorithms can explain their results provided the fitness function is understandable.
• Transforming the data to a form suitable for genetic learning can be a challenge.

Page 141: Teaching an Introductory Course in Data Mining

3.5 Choosing a Data Mining Technique

Page 142: Teaching an Introductory Course in Data Mining

Initial Considerations

• Is learning supervised or unsupervised?

• Is explanation required?
• What is the interaction between input and output attributes?
• What are the data types of the input and output attributes?

Page 143: Teaching an Introductory Course in Data Mining

Further Considerations

• Do We Know the Distribution of the Data?

• Do We Know Which Attributes Best Define the Data?
• Does the Data Contain Missing Values?
• Is Time an Issue?
• Which Technique Is Most Likely to Give a Best Test Set Accuracy?

Page 144: Teaching an Introductory Course in Data Mining

An Excel-based Data Mining Tool

Chapter 4

Page 145: Teaching an Introductory Course in Data Mining

Figure 4.1 The iDA system architecture

Data flows through a preprocessor and an interface to a heuristic agent. Depending on whether the dataset is large, the agent selects a mining technique (ESX or neural networks). If rules are to be generated, RuleMaker produces them; depending on whether an explanation is requested, the report generator writes the results to Excel sheets.

Page 146: Teaching an Introductory Course in Data Mining

4.2 ESX: A Multipurpose Tool for Data Mining

Page 147: Teaching an Introductory Course in Data Mining

Figure 4.3 An ESX concept hierarchy

The hierarchy has a root level, a concept level containing classes C1, C2, ..., Cn, and an instance level where each class holds its instances (I11 ... I1j under C1, I21 ... I2k under C2, ..., In1 ... Inl under Cn).

Page 148: Teaching an Introductory Course in Data Mining

Table 4.1 • Credit Card Promotion Database: iDAV Format

Income Range   Magazine Promotion   Watch Promotion   Life Insurance Promotion   Credit Card Insurance   Sex      Age
C              C                    C                 C                          C                       C        R
I              I                    I                 I                          I                       I        I
40–50K         Yes                  No                No                         No                      Male     45
30–40K         Yes                  Yes               Yes                        No                      Female   40
40–50K         No                   No                No                         No                      Male     42
30–40K         Yes                  Yes               Yes                        Yes                     Male     43
50–60K         Yes                  No                Yes                        No                      Female   38
20–30K         No                   No                No                         No                      Female   55
30–40K         Yes                  No                Yes                        Yes                     Male     35
20–30K         No                   Yes               No                         No                      Male     27
30–40K         Yes                  No                No                         No                      Male     43
30–40K         Yes                  Yes               Yes                        No                      Female   41
40–50K         No                   Yes               Yes                        No                      Female   43
20–30K         No                   Yes               Yes                        No                      Male     29
50–60K         Yes                  Yes               Yes                        No                      Female   39
40–50K         No                   Yes               No                         No                      Male     55
20–30K         No                   No                Yes                        Yes                     Female   19

Page 149: Teaching an Introductory Course in Data Mining

Figure 4.10 Class 3 summary results

Page 150: Teaching an Introductory Course in Data Mining

Knowledge Discovery in Databases

Chapter 5

Page 151: Teaching an Introductory Course in Data Mining

5.1 A KDD Process Model

Page 152: Teaching an Introductory Course in Data Mining

Figure 5.1 A seven-step KDD process model

Step 1: Goal Identification produces defined goals. Step 2: Create Target Data draws on the data warehouse, a transactional database, or a flat file to produce the target data. Step 3: Data Preprocessing produces cleansed data. Step 4: Data Transformation produces transformed data. Step 5: Data Mining produces a data model. Step 6: Interpretation & Evaluation. Step 7: Taking Action.

Page 153: Teaching an Introductory Course in Data Mining

Figure 5.2 Applying the scientific method to data mining

The figure aligns the steps of the scientific method (Define the Problem, Formulate a Hypothesis, Perform an Experiment, Draw Conclusions, Verify Conclusions) with the KDD process model (Identify the Goal; Create Target Data, Data Preprocessing, Data Transformation, and Data Mining; Interpretation / Evaluation; Take Action).

Page 154: Teaching an Introductory Course in Data Mining

Step 1: Goal Identification

• Define the Problem.

• Choose a Data Mining Tool.

• Estimate Project Cost.

• Estimate Project Completion Time.

• Address Legal Issues.

• Develop a Maintenance Plan.

Page 155: Teaching an Introductory Course in Data Mining

Step 2: Creating a Target Dataset

Page 156: Teaching an Introductory Course in Data Mining

Figure 5.3 The Acme credit card database

Page 157: Teaching an Introductory Course in Data Mining

Step 3: Data Preprocessing

• Noisy Data

• Missing Data

Page 158: Teaching an Introductory Course in Data Mining

Noisy Data

• Locate Duplicate Records.

• Locate Incorrect Attribute Values.

• Smooth Data.

Page 159: Teaching an Introductory Course in Data Mining

Preprocessing Missing Data

• Discard Records With Missing Values.

• Replace Missing Real-valued Items With the Class Mean.

• Replace Missing Values With Values Found Within Highly Similar Instances.

Page 160: Teaching an Introductory Course in Data Mining

Processing Missing Data While Learning

• Ignore Missing Values.

• Treat Missing Values As Equal Compares.

• Treat Missing values As Unequal Compares.

Page 161: Teaching an Introductory Course in Data Mining

Step 4: Data Transformation

• Data Normalization

• Data Type Conversion

• Attribute and Instance Selection

Page 162: Teaching an Introductory Course in Data Mining

Data Normalization

• Decimal Scaling

• Min-Max Normalization

• Normalization using Z-scores

• Logarithmic Normalization
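A minimal sketch of two of the normalizations listed above, min-max and z-score, applied to the Age column of Table 2.3.

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale values to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

ages = [45, 40, 42, 43, 38, 55, 35, 27, 43, 41, 43, 29, 39, 55, 19]   # Age column, Table 2.3
print(min_max(ages))
print(z_score(ages))
```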

Page 163: Teaching an Introductory Course in Data Mining

Attribute and Instance Selection

• Eliminating Attributes

• Creating Attributes

• Instance Selection

Page 164: Teaching an Introductory Course in Data Mining

Table 5.1 • An Initial Population for Genetic Attribute Selection

Population Element   Income Range   Magazine Promotion   Watch Promotion   Credit Card Insurance   Sex   Age
1                    1              0                    0                 1                       1     1
2                    0              0                    0                 1                       0     1
3                    0              0                    0                 0                       1     1

Page 165: Teaching an Introductory Course in Data Mining

Step 5: Data Mining

1. Choose training and test data.

2. Designate a set of input attributes.

3. If learning is supervised, choose one or more output attributes.

4. Select learning parameter values.

5. Invoke the data mining tool.

Page 166: Teaching an Introductory Course in Data Mining

Step 6: Interpretation and Evaluation

• Statistical analysis.

• Heuristic analysis.

• Experimental analysis.

• Human analysis.

Page 167: Teaching an Introductory Course in Data Mining

Step 7: Taking Action

• Create a report.

• Relocate retail items.

• Mail promotional information.

• Detect fraud.

• Fund new research.

Page 168: Teaching an Introductory Course in Data Mining

5.9 The CRISP-DM Process Model

1. Business understanding

2. Data understanding

3. Data preparation

4. Modeling

5. Evaluation

6. Deployment

Page 169: Teaching an Introductory Course in Data Mining

The Data Warehouse

Chapter 6

Page 170: Teaching an Introductory Course in Data Mining

6.1 Operational Databases

Page 171: Teaching an Introductory Course in Data Mining

Data Modeling and Normalization

• One-to-One Relationships

• One-to-Many Relationships

• Many-to-Many Relationships

Page 172: Teaching an Introductory Course in Data Mining

Data Modeling and Normalization

• First Normal Form

• Second Normal Form

• Third Normal Form

Page 173: Teaching an Introductory Course in Data Mining

Figure 6.1 A simple entity-relationship diagram

Two entities: Vehicle-Type, with attributes Type ID, Make, and Year, and Customer, with attributes Customer ID and Income Range, connected by a relationship.

Page 174: Teaching an Introductory Course in Data Mining

The Relational Model

Page 175: Teaching an Introductory Course in Data Mining

Table 6.1a • Relational Table for Vehicle-Type

Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997

Page 176: Teaching an Introductory Course in Data Mining

Table 6.1b • Relational Table for Customer

Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390

Page 177: Teaching an Introductory Course in Data Mining

Table 6.2 • Join of Tables 6.1a and 6.1b

Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997
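Table 6.2 is the relational join of Tables 6.1a and 6.1b on Type ID. In practice the join would be issued in SQL; the sketch below reproduces it with plain Python dictionaries for illustration.

```python
# Table 6.1a: Type ID -> (Make, Year)
vehicle_type = {4371: ("Chevrolet", 1995), 6940: ("Cadillac", 2000),
                4595: ("Chevrolet", 2001), 2390: ("Cadillac", 1997)}
# Table 6.1b: (Customer ID, Income Range, Type ID)
customer = [("0001", "70-90K", 2390), ("0002", "30-50K", 4371),
            ("0003", "70-90K", 6940), ("0004", "30-50K", 4595),
            ("0005", "70-90K", 2390)]

# Join each customer row with its Vehicle-Type row on Type ID (Table 6.2).
joined = [(cid, income, tid, *vehicle_type[tid]) for cid, income, tid in customer]
for row in joined:
    print(row)
```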

Page 178: Teaching an Introductory Course in Data Mining

6.2 Data Warehouse Design

Page 179: Teaching an Introductory Course in Data Mining

The Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process (W.H. Inmon).”

Page 180: Teaching an Introductory Course in Data Mining

Granularity

Granularity is a term used to describe the level of detail of stored information.

Page 181: Teaching an Introductory Course in Data Mining

Figure 6.2 A data warehouse process model

Operational database(s) and external data are loaded into the data warehouse by an ETL routine (extract/transform/load); independent data marts may also be built directly from these sources. From the warehouse, data is extracted and summarized into dependent data marts and reports that feed the decision support system.

Page 182: Teaching an Introductory Course in Data Mining

Entering Data into the Warehouse

• Independent Data Mart

• ETL (Extract, Transform, Load Routine)

• Metadata

Page 183: Teaching an Introductory Course in Data Mining

Structuring the Data Warehouse: Two Methods

• Structure the warehouse model using the star schema

• Structure the warehouse model as a multidimensional array

Page 184: Teaching an Introductory Course in Data Mining

The Star Schema

• Fact Table

• Dimension Tables

• Slowly Changing Dimensions

Page 185: Teaching an Introductory Course in Data Mining

Figure 6.3 A star schema for credit card purchases

The fact table holds the dimension keys (Cardholder Key, Purchase Key, Location Key, Time Key) and the measure Amount. It is surrounded by four dimension tables: Cardholder (Cardholder Key, Name, Gender, Income Range), Purchase (Purchase Key, Category: Supermarket, Travel & Entertainment, Auto & Vehicle, Retail, Restaurant, Miscellaneous), Location (Location Key, Street, City, State, Region), and Time (Time Key, Day, Month, Quarter, Year).

Page 186: Teaching an Introductory Course in Data Mining

The Multidimensionality of the Star Schema

Page 187: Teaching an Introductory Course in Data Mining

Figure 6.4 Dimensions of the fact table shown in Figure 6.3

The fact table can be viewed as a multidimensional array indexed by cardholder, purchase, location, and time; for example, A(Ci, 1, 2, 10) addresses the cell for cardholder Ci, purchase key 1, location key 2, and time key 10.

Page 188: Teaching an Introductory Course in Data Mining

Additional Relational Schemas

• Snowflake Schema

• Constellation Schema

Page 189: Teaching an Introductory Course in Data Mining

Figure 6.5 A constellation schema for credit card purchases and promotions

The constellation schema contains two fact tables that share dimensions. The Purchase fact table holds Cardholder Key, Purchase Key, Location Key, Time Key, and Amount; the Promotion fact table holds Cardholder Key, Promotion Key, Time Key, and Response. The dimension tables are Cardholder (Cardholder Key, Name, Gender, Income Range), Purchase (Purchase Key, Category), Location (Location Key, Street, City, State, Region), Time (Time Key, Day, Month, Quarter, Year), and Promotion (Promotion Key, Description, Cost).

Page 190: Teaching an Introductory Course in Data Mining

Decision Support: Analyzing the Warehouse Data

• Reporting Data

• Analyzing Data

• Knowledge Discovery

Page 191: Teaching an Introductory Course in Data Mining

6.3 On-line Analytical Processing

Page 192: Teaching an Introductory Course in Data Mining

OLAP Operations

• Slice – A single dimension operation

• Dice – A multidimensional operation

• Roll-up – A higher level of generalization

• Drill-down – A greater level of detail

• Rotation – View data from a new perspective
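A minimal sketch of roll-up and slice on a tiny cube; the (month, category, region, amount) facts are made up for the example.

```python
from collections import defaultdict

QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2", "May": "Q2", "Jun": "Q2",
           "Jul": "Q3", "Aug": "Q3", "Sep": "Q3", "Oct": "Q4", "Nov": "Q4", "Dec": "Q4"}

# Hypothetical (month, category, region, amount) facts
facts = [("Oct", "Supermarket", "One", 120.00), ("Nov", "Supermarket", "One", 80.50),
         ("Dec", "Vehicle", "Two", 310.00), ("Feb", "Retail", "One", 45.25)]

# Roll-up: replace the month dimension with its quarter and re-aggregate.
rolled_up = defaultdict(float)
for month, category, region, amount in facts:
    rolled_up[(QUARTER[month], category, region)] += amount

# Slice: restrict the cube to a single value of one dimension (here, region "One").
slice_region_one = {k: v for k, v in rolled_up.items() if k[2] == "One"}
print(dict(rolled_up))
print(slice_region_one)
```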

Page 193: Teaching an Introductory Course in Data Mining

Figure 6.6 A multidimensional cube for credit card purchases

The cube has three dimensions: Month (Jan. through Dec.), Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle), and Region (One through Four). One highlighted cell shows Month = Dec., Region = Two, Category = Vehicle with Count = 110 and Amount = 6,720.

Page 194: Teaching an Introductory Course in Data Mining

Concept Hierarchy

A mapping that allows attributes to be viewed from varying levels of detail.

Page 195: Teaching an Introductory Course in Data Mining

Figure 6.7 A concept hierarchy for location

Street Address → City → State → Region, from the most specific level of detail to the most general.

Page 196: Teaching an Introductory Course in Data Mining

Figure 6.8 Rolling up from months to quarters

The month dimension of the cube in Figure 6.6 is rolled up into quarters Q1 through Q4; the category and region dimensions are unchanged. The highlighted cell now covers Month = Oct./Nov./Dec. (Q4), Region = One, Category = Supermarket.

Page 197: Teaching an Introductory Course in Data Mining

Formal Evaluation Techniques

Chapter 7

Page 198: Teaching an Introductory Course in Data Mining

7.1 What Should Be Evaluated?

1. Supervised Model

2. Training Data

3. Attributes

4. Model Builder

5. Parameters

6. Test Set Evaluation

Page 199: Teaching an Introductory Course in Data Mining

Figure 7.1 Components for supervised learning

Instances and attributes form the data, which is split into training data and test data. The training data and the chosen parameters feed the model builder, which produces a supervised model; the model is then evaluated on the test data.

Page 200: Teaching an Introductory Course in Data Mining

Single-Valued Summary Statistics

• Mean

• Variance

• Standard deviation

Page 201: Teaching an Introductory Course in Data Mining

The Normal Distribution

Page 202: Teaching an Introductory Course in Data Mining

Figure 7.2 A normal distribution

The bell curve marked off in standard deviations from the mean: 34.13% of the values fall between the mean and one standard deviation on either side, 13.54% between one and two standard deviations, 2.14% between two and three, and 0.13% beyond three.

Page 203: Teaching an Introductory Course in Data Mining

Normal Distributions & Sample Means

• A distribution of means taken from random sets of independent samples of equal size is distributed normally.

• Any sample mean will vary less than two standard errors from the population mean 95% of the time.

Page 204: Teaching an Introductory Course in Data Mining

Equation 7.2

A Classical Model for Hypothesis Testing

P = |X1 - X2| / sqrt(v1/n1 + v2/n2)

where
P is the significance score;
X1 and X2 are the sample means for the two independent samples;
v1 and v2 are the variance scores for the respective means; and
n1 and n2 are the corresponding sample sizes.

Page 205: Teaching an Introductory Course in Data Mining

Table 7.1 • A Confusion Matrix for the Null Hypothesis

                         Computed Accept   Computed Reject
Accept Null Hypothesis   True Accept       Type 1 Error
Reject Null Hypothesis   Type 2 Error      True Reject

Page 206: Teaching an Introductory Course in Data Mining

Equation 7.3

7.3 Computing Test Set Confidence Intervals

E (Classifier Error Rate) = (# of test set errors) / (# of test set instances)

Page 207: Teaching an Introductory Course in Data Mining

Computing 95% Confidence Intervals

1. Given a test set sample S of size n and error rate E

2. Compute sample variance as V= E(1-E)

3. Compute the standard error (SE) as the square root of V divided by n.

4. Calculate an upper bound error as E + 2(SE)

5. Calculate a lower bound error as E - 2(SE)
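A minimal sketch of the five steps above; the 10% error rate and 1,000-instance test set are hypothetical values chosen for illustration.

```python
import math

def error_bounds(error_rate, n):
    """95% confidence bounds on a test-set error rate E with n test instances:
    variance V = E(1 - E), standard error SE = sqrt(V / n), bounds = E +/- 2 * SE."""
    variance = error_rate * (1 - error_rate)
    se = math.sqrt(variance / n)
    return error_rate - 2 * se, error_rate + 2 * se

# Example: 10% error rate measured on a 1,000-instance test set
print(error_bounds(0.10, 1000))   # roughly (0.081, 0.119)
```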

Page 208: Teaching an Introductory Course in Data Mining

Cross Validation

• Used when ample test data is not available.
• Partition the dataset into n fixed-size units. n - 1 units are used for training and the nth unit is used as a test set.

• Repeat this process until each of the fixed-size units has been used as test data.

• Model correctness is taken as the average of all training-test trials.

Page 209: Teaching an Introductory Course in Data Mining

Bootstrapping

• Used when ample training and test data is not available.

• Bootstrapping allows instances to appear more than once in the training data.

Page 210: Teaching an Introductory Course in Data Mining

7.4 Comparing Supervised Learner Models

Page 211: Teaching an Introductory Course in Data Mining

Equation 7.4

Comparing Models with Independent Test Data

P = |E1 - E2| / sqrt(q(1 - q)(1/n1 + 1/n2))

where

E1 = the error rate for model M1
E2 = the error rate for model M2
q = (E1 + E2)/2
n1 = the number of instances in test set A
n2 = the number of instances in test set B

Page 212: Teaching an Introductory Course in Data Mining

Equation 7.5

Comparing Models with a Single Test Dataset

P = |E1 - E2| / sqrt(q(1 - q)(2/n))

where

E1 = the error rate for model M1
E2 = the error rate for model M2
q = (E1 + E2)/2
n = the number of test set instances
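A minimal sketch of the single-test-set comparison in Equation 7.5; the two error rates and the test set size are hypothetical values chosen for illustration.

```python
import math

def compare_models(e1, e2, n):
    """Significance score for two models tested on one test set of size n
    (Equation 7.5). Scores of roughly 2 or more are usually taken as a
    significant difference at the 95% level."""
    q = (e1 + e2) / 2
    return abs(e1 - e2) / math.sqrt(q * (1 - q) * (2 / n))

# Hypothetical error rates of 0.22 and 0.28 measured on 100 test instances
print(compare_models(0.22, 0.28, 100))   # about 0.98 -> not significant
```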

Page 213: Teaching an Introductory Course in Data Mining

7.5 Attribute Evaluation

Page 214: Teaching an Introductory Course in Data Mining

Locating Redundant Attributes with Excel

• Correlation Coefficient

• Positive Correlation

• Negative Correlation

• Curvilinear Relationship

Page 215: Teaching an Introductory Course in Data Mining

Creating a Scatterplot Diagram with MS Excel

Page 216: Teaching an Introductory Course in Data Mining

Equation 7.6

Hypothesis Testing for Numerical Attribute Significance

Pij = |Xi - Xj| / sqrt(vi/ni + vj/nj)

where
Xi is the class Ci mean and Xj is the class Cj mean for attribute A;
vi and vj are the class Ci and class Cj variances for attribute A; and
ni is the number of instances in class Ci and nj is the number of instances in class Cj.

Page 217: Teaching an Introductory Course in Data Mining

7.6 Unsupervised Evaluation Techniques

• Unsupervised Clustering for Supervised Evaluation

• Supervised Evaluation for Unsupervised Clustering

• Additional Methods

Page 218: Teaching an Introductory Course in Data Mining

7.7 Evaluating Supervised Models with Numeric Output

Page 219: Teaching an Introductory Course in Data Mining

Equation 7.7

Mean Squared Error

mse = [(a1 - c1)^2 + (a2 - c2)^2 + ... + (ai - ci)^2 + ... + (an - cn)^2] / n

where for the ith instance,

ai = actual output value
ci = computed output value

Page 220: Teaching an Introductory Course in Data Mining

Equation 7.8

Mean Absolute Error

mae = [|a1 - c1| + |a2 - c2| + ... + |an - cn|] / n

where for the ith instance,

ai = actual output value
ci = computed output value

Page 221: Teaching an Introductory Course in Data Mining

Neural Networks

Chapter 8

Page 222: Teaching an Introductory Course in Data Mining

8.1 Feed-Forward Neural Networks

Page 223: Teaching an Introductory Course in Data Mining

Figure 8.1 A fully connected feed-forward neural network

The network has three input-layer nodes (Node 1, Node 2, Node 3) receiving the input values 1.0, 0.7, and 0.4, two hidden-layer nodes (Node i, Node j), and one output-layer node (Node k). Every input node connects to every hidden node (weights W1i, W1j, W2i, W2j, W3i, W3j), and both hidden nodes connect to the output node (weights Wik, Wjk).

Page 224: Teaching an Introductory Course in Data Mining

Equation 8.2

The Sigmoid Function

f(x) = 1 / (1 + e^(-x))

where e is the base of natural logarithms, approximated by 2.718282.
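A minimal sketch of a single feed-forward evaluation using the sigmoid function; the input values match Figure 8.1, but the connection weights are made up for the example.

```python
import math

def sigmoid(x):
    """The sigmoid evaluation function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights):
    """Output of one network node: sigmoid of the weighted input sum."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)))

inputs = [1.0, 0.7, 0.4]                                     # input values from Figure 8.1
hidden_i = node_output(inputs, [0.2, -0.1, 0.4])             # W1i, W2i, W3i (made up)
hidden_j = node_output(inputs, [0.1, 0.2, -0.3])             # W1j, W2j, W3j (made up)
output_k = node_output([hidden_i, hidden_j], [0.5, -0.4])    # Wik, Wjk (made up)
print(hidden_i, hidden_j, output_k)
```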

Page 225: Teaching an Introductory Course in Data Mining

Figure 8.2 The sigmoid function

The S-shaped curve of f(x) rises from values near 0 for x below about -4 to values near 1 for x above about 4, passing through 0.5 at x = 0.

Page 226: Teaching an Introductory Course in Data Mining

Supervised Learning with Feed-Forward Networks

• Backpropagation Learning

• Genetic Learning

Page 227: Teaching an Introductory Course in Data Mining

Unsupervised Clustering with Self-Organizing Maps

Page 228: Teaching an Introductory Course in Data Mining

Figure 8.3 A 3x3 Kohonen network with two input layer nodes

A 3 x 3 grid of output-layer nodes, each fully connected to the two input-layer nodes (Node 1 and Node 2).

Page 229: Teaching an Introductory Course in Data Mining

8.3 Neural Network Explanation

• Sensitivity Analysis

• Average Member Technique

Page 230: Teaching an Introductory Course in Data Mining

8.4 General Considerations

• What input attributes will be used to build the network?
• How will the network output be represented?
• How many hidden layers should the network contain?
• How many nodes should there be in each hidden layer?
• What condition will terminate network training?

Page 231: Teaching an Introductory Course in Data Mining

Neural Network Strengths

• Work well with noisy data.
• Can process numeric and categorical data.
• Appropriate for applications requiring a time element.
• Have performed well in several domains.
• Appropriate for supervised learning and unsupervised clustering.

Page 232: Teaching an Introductory Course in Data Mining

Weaknesses

• Lack explanation capabilities.
• May not provide optimal solutions to problems.
• Overtraining can be a problem.

Page 233: Teaching an Introductory Course in Data Mining

Statistical Techniques

Chapter 10

Page 234: Teaching an Introductory Course in Data Mining

Equation 10.1

10.1 Linear Regression Analysis

f(x1, x2, x3, ..., xn) = a1x1 + a2x2 + a3x3 + ... + anxn + c
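A minimal sketch of fitting the single-predictor case of Equation 10.1 by ordinary least squares; the (x, y) pairs are made up for the example.

```python
def least_squares_line(xs, ys):
    """Fit f(x) = a*x + c by ordinary least squares (single-predictor case)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    c = mean_y - a * mean_x
    return a, c

# Hypothetical (x, y) pairs for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
print(least_squares_line(xs, ys))   # slope close to 2, intercept close to 0
```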

Page 235: Teaching an Introductory Course in Data Mining

Multiple Linear Regression with Excel

Page 236: Teaching an Introductory Course in Data Mining

Regression Trees

Page 237: Teaching an Introductory Course in Data Mining

Figure 10.2 A generic model tree

Internal nodes apply numeric tests (Test 1 through Test 4), branching on < and >=; each leaf holds its own linear regression model (LRM1 through LRM5).

Page 238: Teaching an Introductory Course in Data Mining

10.2 Logistic Regression

Page 239: Teaching an Introductory Course in Data Mining

Transforming the Linear Regression Model

Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance.

Page 240: Teaching an Introductory Course in Data Mining

Equation 10.7

The Logistic Regression Model

p(y = 1 \mid x) = \frac{e^{ax + c}}{1 + e^{ax + c}}

where e is the base of natural logarithms, often denoted as exp.
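The model can be demonstrated in a few lines; the coefficient a and constant c below are hypothetical values chosen only to show how the output behaves as a conditional probability:

    import math

    def logistic(x, a=1.5, c=-3.0):
        """p(y = 1 | x) = e^(ax + c) / (1 + e^(ax + c)); a and c are hypothetical."""
        z = a * x + c
        return math.exp(z) / (1.0 + math.exp(z))

    for x in (0, 1, 2, 3, 4):
        print(f"p(y = 1 | x = {x}) = {logistic(x):.3f}")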

Page 241: Teaching an Introductory Course in Data Mining

Equation 10.9

10.3 Bayes Classifier

P(H | E) = P(E | H) P(H) / P(E)

where H is the hypothesis to be tested and E is the evidence associated with H.

Page 242: Teaching an Introductory Course in Data Mining

Bayes Classifier: An Example

Page 243: Teaching an Introductory Course in Data Mining

Table 10.4 • Data for Bayes Classifier

Magazine    Watch       Life Insurance   Credit Card
Promotion   Promotion   Promotion        Insurance     Sex
------------------------------------------------------------
Yes         No          No               No            Male
Yes         Yes         Yes              Yes           Female
No          No          No               No            Male
Yes         Yes         Yes              Yes           Male
Yes         No          Yes              No            Female
No          No          No               No            Female
Yes         Yes         Yes              Yes           Male
No          No          No               No            Male
Yes         No          No               No            Male
Yes         Yes         Yes              No            Female

Page 244: Teaching an Introductory Course in Data Mining

The Instance to be Classified

Magazine Promotion = Yes

Watch Promotion = Yes

Life Insurance Promotion = No

Credit Card Insurance = No

Sex = ?

Page 245: Teaching an Introductory Course in Data Mining

Table 10.5 • Counts and Probabilities for Attribute Sex

                    Magazine        Watch           Life Insurance   Credit Card
                    Promotion       Promotion       Promotion        Insurance
Sex                 Male   Female   Male   Female   Male   Female    Male   Female
-----------------------------------------------------------------------------------
Yes                 4      3        2      2        2      3         2      1
No                  2      1        4      2        4      1         4      3
Ratio: yes/total    4/6    3/4      2/6    2/4      2/6    3/4       2/6    1/4
Ratio: no/total     2/6    1/4      4/6    2/4      4/6    1/4       4/6    3/4

Page 246: Teaching an Introductory Course in Data Mining

Equation 10.10

Computing The Probability For Sex = Male

P(sex = male | E) = P(E | sex = male) P(sex = male) / P(E)

Page 247: Teaching an Introductory Course in Data Mining

Conditional Probabilities for Sex = Male

P(magazine promotion = yes | sex = male) = 4/6

P(watch promotion = yes | sex = male) = 2/6

P(life insurance promotion = no | sex = male) = 4/6

P(credit card insurance = no | sex = male) = 4/6

P(E | sex = male) = (4/6) (2/6) (4/6) (4/6) = 8/81

Page 248: Teaching an Introductory Course in Data Mining

The Probability for Sex=Male Given Evidence E

P(sex = male | E) = (8/81) (6/10) / P(E) ≈ 0.0593 / P(E)

Page 249: Teaching an Introductory Course in Data Mining

The Probability for Sex=Female Given Evidence E

P(sex = female | E) ≈ 0.0281 / P(E)
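Both values can be reproduced directly from Table 10.4; the sketch below applies the naive Bayes product of conditional probabilities for each hypothesis and multiplies by the prior, omitting P(E) since it is the same for both:

    # Table 10.4: (magazine, watch, life insurance, credit card insurance, sex)
    data = [
        ("Yes", "No",  "No",  "No",  "Male"),
        ("Yes", "Yes", "Yes", "Yes", "Female"),
        ("No",  "No",  "No",  "No",  "Male"),
        ("Yes", "Yes", "Yes", "Yes", "Male"),
        ("Yes", "No",  "Yes", "No",  "Female"),
        ("No",  "No",  "No",  "No",  "Female"),
        ("Yes", "Yes", "Yes", "Yes", "Male"),
        ("No",  "No",  "No",  "No",  "Male"),
        ("Yes", "No",  "No",  "No",  "Male"),
        ("Yes", "Yes", "Yes", "No",  "Female"),
    ]

    # The instance to be classified: magazine = Yes, watch = Yes,
    # life insurance = No, credit card insurance = No
    evidence = ("Yes", "Yes", "No", "No")

    def score(sex):
        rows = [r for r in data if r[4] == sex]
        likelihood = 1.0
        for attribute, value in enumerate(evidence):
            matches = sum(1 for r in rows if r[attribute] == value)
            likelihood *= matches / len(rows)        # P(attribute value | sex)
        return likelihood * len(rows) / len(data)    # multiply by the prior P(sex)

    for sex in ("Male", "Female"):
        print(f"P(sex = {sex} | E) is proportional to {score(sex):.4f}")
    # Prints 0.0593 for Male and 0.0281 for Female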

Page 250: Teaching an Introductory Course in Data Mining

Equation 10.12

Zero-Valued Attribute Counts

\frac{n + (k)(p)}{d + k}

where k is a value between 0 and 1 (usually 1), and p is an equal fractional part of the total number of possible values for the attribute. The adjusted ratio replaces the simple ratio n/d obtained from the counts table when n is zero.
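A short illustration of the adjustment; the counts below are hypothetical (a conditional count of zero out of six class instances for an attribute with two possible values), and the reading of n and d as count and class total is an assumption:

    # Hypothetical raw counts: the attribute value never occurs with this class
    n, d = 0, 6        # assumed: n = matching count, d = class instance total
    k = 1.0            # a value between 0 and 1 (usually 1)
    p = 1.0 / 2        # equal fractional part of the 2 possible attribute values

    raw_ratio      = n / d                  # 0.0 -- would zero out the whole product
    adjusted_ratio = (n + k * p) / (d + k)  # small nonzero probability instead

    print(raw_ratio, round(adjusted_ratio, 3))   # 0.0 0.071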

Page 251: Teaching an Introductory Course in Data Mining

Missing Data

With the Bayes classifier, missing data items are ignored.

Page 252: Teaching an Introductory Course in Data Mining

Equation 10.13

Numeric Data

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}

where

e = the exponential function

\mu = the class mean for the given numerical attribute

\sigma = the class standard deviation for the attribute

x = the attribute value
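A direct implementation of the density function, used to estimate the likelihood of a numeric attribute value for a given class; the mean and standard deviation below are hypothetical class statistics:

    import math

    def normal_pdf(x, mu, sigma):
        """f(x) = e^(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)"""
        coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
        return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

    # Hypothetical: an age attribute whose class mean is 37 with std. dev. 5
    print(f"{normal_pdf(42.0, mu=37.0, sigma=5.0):.4f}")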

Page 253: Teaching an Introductory Course in Data Mining

10.4 Clustering Algorithms

Page 254: Teaching an Introductory Course in Data Mining

Agglomerative Clustering

1. Place each instance into a separate partition.

2. Until all instances are part of a single cluster:

a. Determine the two most similar clusters.

b. Merge the clusters chosen into a single cluster.

3. Choose a clustering formed by one of the step 2 iterations as a final result.
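A minimal sketch of the procedure for a handful of one-dimensional numeric instances, using the distance between cluster means as the similarity measure; both the measure and the sample values are illustrative assumptions rather than the book's prescribed choices:

    # Step 1: place each instance into its own cluster
    instances = [1.0, 1.5, 5.0, 5.5, 10.0]
    clusters = [[x] for x in instances]
    history = [list(map(list, clusters))]

    def mean(cluster):
        return sum(cluster) / len(cluster)

    # Step 2: repeatedly merge the two most similar (closest) clusters
    while len(clusters) > 1:
        pairs = [(abs(mean(a) - mean(b)), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        history.append(list(map(list, clusters)))

    # Step 3: any level of the merge history can be chosen as the final result
    for level in history:
        print(level)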

Page 255: Teaching an Introductory Course in Data Mining

Conceptual Clustering

1. Create a cluster with the first instance as its only member.

2. For each remaining instance, take one of two actions at each tree level.

a. Place the new instance into an existing cluster.

b. Create a new concept cluster having the new instance as its only member.

Page 256: Teaching an Introductory Course in Data Mining

Expectation Maximization

The EM (expectation-maximization) algorithm is a statistical technique that makes use of the finite Gaussian mixtures model.

Page 257: Teaching an Introductory Course in Data Mining

Expectation Maximization

• A mixture is a set of n probability distributions where each distribution represents a cluster.

• The mixtures model assigns each data instance a probability that it would have a certain set of attribute values given it was a member of a specified cluster.

Page 258: Teaching an Introductory Course in Data Mining

Expectation Maximization

• The EM algorithm is similar to the K-Means procedure in that a set of parameters is recomputed until a desired convergence is achieved.

• In the simplest case, there are two clusters, a single real-valued attribute, and the probability distributions are normal.

Page 259: Teaching an Introductory Course in Data Mining

EM Algorithm (two-class, one attribute scenario)

1. Guess initial values for the five parameters.

2. Until a termination criterion is achieved:

a. Use the probability density function for normal distributions to compute the cluster probability for each instance.

b. Use the probability scores assigned to each instance in step 2(a) to re-estimate the parameters.
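A compact sketch of this two-cluster, one-attribute case; the five parameters are here taken to be the two cluster means, the two standard deviations, and the probability of the first cluster, and the sample data and initial guesses are hypothetical:

    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    data = [1.1, 1.4, 0.9, 1.2, 5.0, 5.3, 4.8, 5.1]    # hypothetical attribute values

    # Step 1: guess initial values for the five parameters
    mu1, mu2, sigma1, sigma2, p1 = 1.0, 4.0, 1.0, 1.0, 0.5

    for _ in range(25):                                # step 2: iterate toward convergence
        # 2(a): cluster probability for each instance (expectation step)
        w = []
        for x in data:
            a = p1 * normal_pdf(x, mu1, sigma1)
            b = (1 - p1) * normal_pdf(x, mu2, sigma2)
            w.append(a / (a + b))                      # probability x belongs to cluster 1

        # 2(b): re-estimate the parameters from the scores (maximization step)
        s1, s2 = sum(w), sum(1 - wi for wi in w)
        mu1 = sum(wi * x for wi, x in zip(w, data)) / s1
        mu2 = sum((1 - wi) * x for wi, x in zip(w, data)) / s2
        sigma1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, data)) / s1) or 1e-6
        sigma2 = math.sqrt(sum((1 - wi) * (x - mu2) ** 2 for wi, x in zip(w, data)) / s2) or 1e-6
        p1 = s1 / len(data)

    print(f"cluster 1: mean {mu1:.2f}, std {sigma1:.2f}")
    print(f"cluster 2: mean {mu2:.2f}, std {sigma2:.2f}")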

Page 260: Teaching an Introductory Course in Data Mining

Specialized Techniques

Chapter 11

Page 261: Teaching an Introductory Course in Data Mining

11.1 Time-Series Analysis

Time-series Problems: Prediction applications with one or more time-dependent attributes.

Page 262: Teaching an Introductory Course in Data Mining

Table 11.1 • Weekly Average Closing Prices for the Nasdaq and Dow Jones Industrial Average

Week     Nasdaq    Dow       Nasdaq-1  Dow-1     Nasdaq-2  Dow-2
         Average   Average   Average   Average   Average   Average
-----------------------------------------------------------------
200003   4176.75   11413.28  3968.47   11587.96  3847.25   11224.10
200004   4052.01   10967.60  4176.75   11413.28  3968.47   11587.96
200005   4104.28   10992.38  4052.01   10967.60  4176.75   11413.28
200006   4398.72   10726.28  4104.28   10992.38  4052.01   10967.60
200007   4445.53   10506.68  4398.72   10726.28  4104.28   10992.38
200008   4535.15   10121.31  4445.53   10506.68  4398.72   10726.28
200009   4745.58   10167.38  4535.15   10121.31  4445.53   10506.68
200010   4949.09   9952.52   4745.58   10167.38  4535.15   10121.31
200011   4742.40   10223.11  4949.09   9952.52   4745.58   10167.38
200012   4818.01   10937.36  4742.40   10223.11  4949.09   9952.52
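Lagged attributes such as Nasdaq-1 and Nasdaq-2 can be generated mechanically from the raw weekly series; a small sketch using the first few Nasdaq values underlying the table:

    # Weekly Nasdaq averages in time order (values taken from Table 11.1)
    nasdaq = [3847.25, 3968.47, 4176.75, 4052.01, 4104.28, 4398.72]

    # Build (current, one-week lag, two-week lag) training instances
    rows = [(nasdaq[i], nasdaq[i - 1], nasdaq[i - 2]) for i in range(2, len(nasdaq))]

    for current, lag1, lag2 in rows:
        print(f"Nasdaq = {current:8.2f}   Nasdaq-1 = {lag1:8.2f}   Nasdaq-2 = {lag2:8.2f}")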

Page 263: Teaching an Introductory Course in Data Mining

11.2 Mining the Web

Page 264: Teaching an Introductory Course in Data Mining

Web-Based Mining (identifying the goal)

– Decrease the average number of pages visited by a customer before a purchase transaction.

– Increase the average number of pages viewed per user session.

– Increase Web server efficiency
– Personalize Web pages for customers
– Determine those products that tend to be purchased or viewed together
– Decrease the total number of item returns
– Increase visitor retention rates

Page 265: Teaching an Introductory Course in Data Mining

Web-Based Mining (preparing the data)

• Data is stored in Web server log files, typically in the form of clickstream sequences

• Server log files provide information in extended common log file format

Page 266: Teaching an Introductory Course in Data Mining

Extended Common Log File Format

• Host Address

• Date/Time

• Request

• Status

• Bytes

• Referring Page

• Browser Type

Page 267: Teaching an Introductory Course in Data Mining

Extended Common Log File Format

80.202.8.93 - - [16/Apr/2002:22:43:28 -0600] "GET /grbts/images/msu-new-color.gif HTTP/1.1" 200 5006 "http://grb.mnsu.edu/doc/index.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.01 [nb]"

134.29.41.219 - - [17/Apr/2002:19:23:30 -0600] "GET /resin-doc/images/resin_powered.gif HTTP/1.1" 200 571 "http://grb.mnsu.edu/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Q312461)"
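A short sketch of pulling the extended common log file fields out of entries like those above with a regular expression; the grouping of fields follows the format listed on the previous slide and is an assumption about any particular server's configuration:

    import re

    # host, date/time, request, status, bytes, referring page, browser type
    LOG_PATTERN = re.compile(
        r'(?P<host>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
        r'"(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
    )

    line = ('134.29.41.219 - - [17/Apr/2002:19:23:30 -0600] '
            '"GET /resin-doc/images/resin_powered.gif HTTP/1.1" 200 571 '
            '"http://grb.mnsu.edu/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Q312461)"')

    match = LOG_PATTERN.match(line)
    if match:
        for field, value in match.groupdict().items():
            print(f"{field:10} {value}")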

Page 268: Teaching an Introductory Course in Data Mining

Preparing the Data (the session file)

• A session file is a file created by the data preparation process.

• Each instance of a session file represents a single user session.

Page 269: Teaching an Introductory Course in Data Mining

Preparing the Data (the session file)

• A user session is a set of pageviews requested by a single user from a single Web server.
• A pageview contains one or more page files, each forming a display window in a Web browser.
• Each pageview is tagged with a unique uniform resource identifier (URI).

Page 270: Teaching an Introductory Course in Data Mining

Figure 11.1 A generic Web usage model

[Figure: Web server logs are passed through a data preparation step to produce a session file; the session file is input to one or more data mining algorithms, which output a learner model.]

Page 271: Teaching an Introductory Course in Data Mining

Preparing the Data (the session file)

• Creating the session file is difficult:
– Identify individual users in a log file
– Host addresses are of limited help
– Host address combined with referring page is beneficial
– One user page request may generate multiple log file entries from several types of servers
– Easiest when sites are allowed to use cookies

Page 272: Teaching an Introductory Course in Data Mining

Web-Based Mining (mining the data)

• Traditional techniques such as association rule generators or clustering methods can be applied.

• Sequence miners, which are special data mining algorithms used to discover frequently accessed Web pages that occur in the same order, are often used.

Page 273: Teaching an Introductory Course in Data Mining

Web-Based Mining (evaluating results)

• Consider four hypothetical pageview instances

P5 P4 P10 P3 P15 P2 P1

P2 P4 P10 P8 P15 P4 P15 P1

P4 P3 P7 P11 P14 P8 P2 P10

P1 P3 P10 P11 P4 P15 P9

Page 274: Teaching an Introductory Course in Data Mining

Evaluating Results (association rules)

• An association rule generator outputs the following rule from our session data.

IF P4 & P10

THEN P15 {3/4}

• This rule states that P4, P10, and P15 appear together in three session instances. Also, all four instances have P4 and P10 appearing in the same session instance.
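Both counts can be checked directly against the four session instances; a brief sketch, treating the {3/4} annotation as the rule's confidence and collapsing repeated pageviews within a session:

    sessions = [
        {"P5", "P4", "P10", "P3", "P15", "P2", "P1"},
        {"P2", "P4", "P10", "P8", "P15", "P1"},          # repeated pageviews collapse
        {"P4", "P3", "P7", "P11", "P14", "P8", "P2", "P10"},
        {"P1", "P3", "P10", "P11", "P4", "P15", "P9"},
    ]

    antecedent = {"P4", "P10"}
    consequent = {"P15"}

    with_antecedent = [s for s in sessions if antecedent <= s]
    with_both       = [s for s in with_antecedent if consequent <= s]

    print(f"P4 and P10 together: {len(with_antecedent)} of {len(sessions)} sessions")
    print(f"P4, P10, and P15 together: {len(with_both)} sessions")
    print(f"rule confidence = {len(with_both)}/{len(with_antecedent)}")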

Page 275: Teaching an Introductory Course in Data Mining

Evaluating Results (unsupervised clustering)

• Use agglomerative clustering to place session instances into clusters.

• Instance similarity is computed by dividing the total number of pageviews each pair of instances share by the total number of pageviews contained within the instances.

Page 276: Teaching an Introductory Course in Data Mining

Evaluating Results (unsupervised clustering)

• Consider the following session instances:

P5 P4 P10 P3 P15 P2 P1

P2 P4 P10 P8 P15 P4 P15 P1

• The computed similarity is 5/8 = 0.625
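The 5/8 value comes from the five pageviews the two sessions share (P1, P2, P4, P10, P15) divided by the eight distinct pageviews they contain in total; a few lines verify it:

    s1 = {"P5", "P4", "P10", "P3", "P15", "P2", "P1"}
    s2 = {"P2", "P4", "P10", "P8", "P15", "P1"}      # repeated pageviews collapse

    shared   = s1 & s2      # {P1, P2, P4, P10, P15}
    distinct = s1 | s2      # eight distinct pageviews across both sessions

    print(f"similarity = {len(shared)}/{len(distinct)} = {len(shared) / len(distinct):.3f}")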

Page 277: Teaching an Introductory Course in Data Mining

Evaluating Results (summary statistics)

• Summary statistics about the activities taking place at a Web site can be obtained using a Web server log analyzer.

• The output of the analyzer is an aggregation of log file data displayed in graphical format.

Page 278: Teaching an Introductory Course in Data Mining

Web-Based Mining (Taking Action)

• Implement a strategy based on created user profiles to personalize the Web pages viewed by site visitors.

• Adapt the indexing structure of a Web site to better reflect the paths followed by typical users.

• Set up online advertising promotions for registered Web site customers.

• Send e-mail to promote products of likely interest to a select group of registered customers.

• Modify the content of a Web site by grouping products likely to be purchased together, removing products of little interest, and expanding the offerings of high-demand products.

Page 279: Teaching an Introductory Course in Data Mining

Data Mining for Web Site Evaluation

Web site evaluation is concerned with determining whether the actual use of a site matches the intentions of its designer.

Page 280: Teaching an Introductory Course in Data Mining

Data Mining for Web Site Evaluation

• Data mining can help with site evaluation by determining the frequent patterns and routes traveled by the user population.

• Sequential ordering of pageviews is of primary interest.

• Sequence miners are used to determine pageview order sequencing.

Page 281: Teaching an Introductory Course in Data Mining

Data Mining for Personalization

• The goal of personalization is to present Web users with what interests them without requiring them to ask for it directly.

• Manual techniques force users to register at a Web site and to fill in questionnaires.

• Data mining can be used to automate personalization.

Page 282: Teaching an Introductory Course in Data Mining

Data Mining for Personalization

Automatic personalization is accomplished by creating usage profiles from stored session data.

Page 283: Teaching an Introductory Course in Data Mining

Data Mining for Web Site Adaptation

The index synthesis problem: Given a Web site and a visitor access log, create new index pages containing collections of links to related but currently unlinked pages.

Page 284: Teaching an Introductory Course in Data Mining

11.3 Mining Textual Data

• Train: Create an attribute dictionary.

• Filter: Remove common words.

• Classify: Classify new documents.
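A toy sketch of the three steps with a hypothetical two-document training set; a real system would use a much larger dictionary and a proper term-weighting scheme:

    # Hypothetical labeled training documents
    train_docs = [
        ("buy cheap stocks and bonds today", "finance"),
        ("the team won the final game today", "sports"),
    ]
    common_words = {"the", "and", "a", "of", "today"}   # filter: remove common words

    # Train: build an attribute dictionary of remaining words for each class
    dictionary = {}
    for text, label in train_docs:
        for word in text.split():
            if word not in common_words:
                dictionary.setdefault(label, set()).add(word)

    # Classify: assign a new document to the class whose dictionary it overlaps most
    def classify(text):
        words = {w for w in text.split() if w not in common_words}
        return max(dictionary, key=lambda label: len(words & dictionary[label]))

    print(classify("cheap bonds"))       # finance
    print(classify("the final game"))    # sports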

Page 285: Teaching an Introductory Course in Data Mining

11.4 Improving Performance

• Bagging

• Boosting

• Instance Typicality

Page 286: Teaching an Introductory Course in Data Mining

Data Mining Standards

Grossman, R.L., Hornick, M.F., Meyer, G., Data Mining Standards Initiatives, Communications of the ACM, August 2002, Vol. 45, No. 8.

Page 287: Teaching an Introductory Course in Data Mining

Privacy & Data Mining

• Inference is the process of users posing queries and deducing unauthorized information from the legitimate responses that they receive.

• Data mining offers sophisticated tools to deduce sensitive patterns from data.

Page 288: Teaching an Introductory Course in Data Mining

Privacy & Data Mining (an example)

Unnamed health records are public information.

People's names are public.

People associated with their individual health records is private information.

Page 289: Teaching an Introductory Course in Data Mining

Privacy & Data Mining (an example)

Former employees have their employment records stored in a data warehouse. An employer uses data mining to build a classification model to differentiate employees relative to their termination:

• They quit
• They were fired
• They were laid off
• They retired

The employer now uses the model to classify current employees. He fires employees likely to quit and lays off employees likely to retire. Is this ethical?

Page 290: Teaching an Introductory Course in Data Mining

Privacy & Data Mining (handling the inference problem)

• Given a database and a data mining tool, apply the tool to determine if sensitive information can be deduced.

• Use an inference controller to detect the motives of the user.

• Give only samples of the data to the user thereby preventing the user from building a data mining model.

Page 291: Teaching an Introductory Course in Data Mining

Privacy & Data Mining

Thuraisingham, B., Web Data Mining and Applications in Business Intelligence and Counter-Terrorism, CRC Press, 2003.

Page 293: Teaching an Introductory Course in Data Mining

Data Mining Textbooks

• Berry, M.J., Linoff, G., Data Mining Techniques: For Marketing, Sales, and Customer Support, Wiley, 1997.

• Han, J., Kamber, M., Data Mining: Concepts and Techniques, Academic Press, 2001.

• Roiger, R.J., Geatz, M.W., Data Mining: A Tutorial-Based Primer, Addison-Wesley, 2003.

• Tan, P., Steinbach, M., Kumar, V., Introduction to Data Mining, Addison-Wesley, 2005.

• Witten, I.H., Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Academic Press, 2000.

Page 294: Teaching an Introductory Course in Data Mining

Data Mining Resources

• AI magazine

• Communications of the ACM

• SIGKDD Explorations

• Computer Magazine

• PC AI

• IEEE Transactions on Data and Knowledge Engineering

Page 295: Teaching an Introductory Course in Data Mining

Data Mining: A Tutorial-Based Primer

• Part I: Data Mining Fundamentals

• Part II: Tools for Knowledge Discovery

• Part III: Advanced Data Mining Techniques

• Part IV: Intelligent Systems

Page 296: Teaching an Introductory Course in Data Mining

Part I: Data Mining Fundamentals

• Chapter 1: Data Mining: A First View
• Chapter 2: Data Mining: A Closer Look
• Chapter 3: Basic Data Mining Techniques
• Chapter 4: An Excel-Based Data Mining Tool

Page 297: Teaching an Introductory Course in Data Mining

Part II: Tools for Knowledge Discovery

• Chapter 5: Knowledge Discovery in Databases
• Chapter 6: The Data Warehouse
• Chapter 7: Formal Evaluation Techniques

Page 298: Teaching an Introductory Course in Data Mining

Part III: Advanced Data Mining Techniques

• Chapter 8: Neural Networks
• Chapter 9: Building Neural Networks with IDA
• Chapter 10: Statistical Techniques
• Chapter 11: Specialized Techniques

Page 299: Teaching an Introductory Course in Data Mining

Part IV: Intelligent Systems

• Chapter 12: Rule-Based Systems
• Chapter 13: Managing Uncertainty in Rule-Based Systems
• Chapter 14: Intelligent Agents