Part II: Tools for Knowledge Discovery. Knowledge Discovery in Databases, Chapter 5
Post on 20-Dec-2015
![Page 1: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/1.jpg)
Part II
Tools for
Knowledge Discovery
![Page 2: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/2.jpg)
Knowledge Discovery in Databases
Chapter 5
![Page 3: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/3.jpg)
5.1 A KDD Process Model
![Page 4: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/4.jpg)
Figure 5.1 A seven-step KDD process model
[Figure: source data (data warehouse, transactional database, flat file) feeds a seven-step flow. Step 1: Goal Identification (defined goals); Step 2: Create Target Data (target data); Step 3: Data Preprocessing (cleansed data); Step 4: Data Transformation (transformed data); Step 5: Data Mining (data model); Step 6: Interpretation & Evaluation; Step 7: Taking Action.]
![Page 5: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/5.jpg)
Figure 5.2 Applying the scientific method to data mining
[Figure: two parallel columns. The Scientific Method: Define the Problem; Formulate a Hypothesis; Perform an Experiment; Draw Conclusions; Verify Conclusions. A KDD Process Model: Identify the Goal; Create Target Data, Data Preprocessing, Data Transformation, Data Mining (grouped by a brace); Interpretation / Evaluation; Take Action.]
![Page 6: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/6.jpg)
Step 1: Goal Identification
• Define the Problem.
• Choose a Data Mining Tool.
• Estimate Project Cost.
• Estimate Project Completion Time.
• Address Legal Issues.
• Develop a Maintenance Plan.
![Page 7: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/7.jpg)
Step 2: Creating a Target Dataset
![Page 8: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/8.jpg)
Figure 5.3 The Acme credit card database
![Page 9: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/9.jpg)
Step 3: Data Preprocessing
• Noisy Data
• Missing Data
![Page 10: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/10.jpg)
Noisy Data
• Locate Duplicate Records.
• Locate Incorrect Attribute Values.
• Smooth Data.
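The three noise-handling tasks above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own procedure: the function names, the tuple record layout, the plausible-range check, and the moving-average smoother are all assumptions chosen for brevity.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each record (records must be hashable, e.g. tuples)."""
    return list(dict.fromkeys(records))


def out_of_range(values, low, high):
    """Return the indices of values outside the plausible range [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def moving_average(values, window=3):
    """Smooth a numeric sequence with a centered moving average.

    Near the ends the window is truncated to the values that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A range check only flags candidates for review; deciding whether a flagged value is truly incorrect usually needs domain knowledge.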
![Page 11: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/11.jpg)
Preprocessing Missing Data
• Discard Records With Missing Values.
• Replace Missing Real-valued Items With the Class Mean.
• Replace Missing Values With Values Found Within Highly Similar Instances.
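As one illustration of the second option, the sketch below replaces a missing real-valued item with the mean of that attribute within the same class. The dictionary-per-record layout, the use of `None` to mark a missing value, and the name `impute_class_mean` are assumptions made for this example.

```python
from collections import defaultdict


def impute_class_mean(rows, attr, class_attr):
    """Return copies of rows with None in `attr` replaced by the class mean."""
    # Accumulate [sum, count] of the known values for each class.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        if row[attr] is not None:
            totals[row[class_attr]][0] += row[attr]
            totals[row[class_attr]][1] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # do not modify the caller's data
        if new_row[attr] is None:
            total, count = totals[new_row[class_attr]]
            if count:  # stay None when the class has no known values
                new_row[attr] = total / count
        filled.append(new_row)
    return filled
```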
![Page 12: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/12.jpg)
Processing Missing Data While Learning
• Ignore Missing Values.
• Treat Missing Values As Equal Compares.
• Treat Missing Values As Unequal Compares.
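The difference between the three policies is easiest to see in a similarity computation. The sketch below scores two instances by the fraction of attributes on which they agree; the function `match_score`, its `missing` parameter, and the use of `None` for a missing value are hypothetical and not taken from any particular tool.

```python
def match_score(a, b, missing="ignore"):
    """Fraction of attributes on which instances a and b agree.

    missing: "ignore"  -> skip attributes where either value is None
             "equal"   -> count a missing value as a match
             "unequal" -> count a missing value as a mismatch
    """
    if missing not in ("ignore", "equal", "unequal"):
        raise ValueError("missing must be 'ignore', 'equal' or 'unequal'")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            if missing == "ignore":
                continue
            compared += 1
            if missing == "equal":
                matches += 1
        else:
            compared += 1
            if x == y:
                matches += 1
    return matches / compared if compared else 0.0
```

For the instances `("high", "yes", None, "male")` and `("high", "no", "yes", "male")`, the score is 2/3 when missing values are ignored, 3/4 when they count as equal, and 2/4 when they count as unequal.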
![Page 13: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/13.jpg)
Step 4: Data Transformation
• Data Normalization
• Data Type Conversion
• Attribute and Instance Selection
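Data type conversion often means turning a categorical attribute into numeric inputs for a tool that accepts only numbers. The sketch below shows one common scheme, one-hot encoding; the choice of scheme and the name `one_hot` are assumptions for illustration, since the slide does not name a specific conversion.

```python
def one_hot(values):
    """Encode categorical values as 0/1 indicator rows.

    Returns (categories, rows), where categories is the sorted list of
    distinct values and each row has a single 1 in the matching position.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows
```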
![Page 14: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/14.jpg)
Data Normalization
• Decimal Scaling
• Min-Max Normalization
• Normalization using Z-scores
• Logarithmic Normalization
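The four normalization methods can be sketched as follows. These are minimal textbook-style versions written for this note: the function names are assumptions, the z-score uses the population standard deviation, and the logarithmic version defaults to base 2 and expects strictly positive values.

```python
import math


def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / (10 ** k) >= 1:
        k += 1
    return [v / (10 ** k) for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the interval [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_scores(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def log_normalize(values, base=2):
    """Replace each strictly positive value with its logarithm."""
    return [math.log(v, base) for v in values]
```

Min-max and decimal scaling keep the shape of the distribution, z-scores express each value in standard deviations from the mean, and the logarithm compresses large values.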
![Page 15: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/15.jpg)
Attribute and Instance Selection
• Eliminating Attributes
• Creating Attributes
• Instance Selection
![Page 16: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/16.jpg)
Table 5.1 • An Initial Population for Genetic Attribute Selection
| Population Element | Income Range | Magazine Promotion | Watch Promotion | Credit Card Insurance | Sex | Age |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| 2 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 1 | 1 |
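Each population element in Table 5.1 can be read as a bit string over the six attributes. The sketch below assumes that a 1 means the attribute is included in the candidate subset and a 0 means it is excluded; it also shows single-point crossover as one typical genetic operator. The names `selected_attributes` and `crossover` are hypothetical, and no fitness function is shown.

```python
ATTRIBUTES = ["Income Range", "Magazine Promotion", "Watch Promotion",
              "Credit Card Insurance", "Sex", "Age"]

# Initial population copied from Table 5.1.
POPULATION = {
    1: [1, 0, 0, 1, 1, 1],
    2: [0, 0, 0, 1, 0, 1],
    3: [0, 0, 0, 0, 1, 1],
}


def selected_attributes(bits):
    """Decode a bit string into attribute names (assumes 1 = included)."""
    return [name for name, bit in zip(ATTRIBUTES, bits) if bit == 1]


def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two parents after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

Under this reading, element 2 stands for a model built from Credit Card Insurance and Age only.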
![Page 17: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/17.jpg)
Step 5: Data Mining
1. Choose training and test data.
2. Designate a set of input attributes.
3. If learning is supervised, choose one or more output attributes.
4. Select learning parameter values.
5. Invoke the data mining tool.
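The first three steps amount to preparing the inputs a mining tool expects. The sketch below is one possible way to do that in plain Python; the helper names, the 30% test fraction, and the dictionary-per-instance layout are assumptions, and the call to an actual data mining tool (steps 4 and 5) is left out.

```python
import random


def split_train_test(instances, test_fraction=0.3, seed=0):
    """Step 1: shuffle reproducibly, then split into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def project(instances, input_attrs, output_attr=None):
    """Steps 2 and 3: keep the chosen input attributes and, if learning is
    supervised, the output attribute. Outputs are None when unsupervised."""
    inputs = [[row[a] for a in input_attrs] for row in instances]
    outputs = None
    if output_attr is not None:
        outputs = [row[output_attr] for row in instances]
    return inputs, outputs
```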
![Page 18: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/18.jpg)
Step 6: Interpretation and Evaluation
• Statistical analysis.
• Heuristic analysis.
• Experimental analysis.
• Human analysis.
![Page 19: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/19.jpg)
Step 7: Taking Action
• Create a report.
• Relocate retail items.
• Mail promotional information.
• Detect fraud.
• Fund new research.
![Page 20: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/20.jpg)
5.9 The CRISP-DM Process Model
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
![Page 21: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/21.jpg)
5.10 Experimenting with ESX
![Page 22: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/22.jpg)
A Four-Step Model for Knowledge Discovery
1. Identify the goal.
2. Prepare the data.
3. Apply data mining.
4. Interpret and evaluate the results.
![Page 23: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/23.jpg)
Experiment 1: Attribute Evaluation
*Applying the Four-Step Process Model to the Credit Screening
Dataset*
![Page 24: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/24.jpg)
Table 5.2 • A Confusion Matrix for Credit Card Screening
| | Computed Accept | Computed Reject |
|---|---|---|
| Accept | 115 | 38 |
| Reject | 35 | 152 |
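Overall accuracy follows directly from a confusion matrix: correct classifications lie on the main diagonal. The sketch below applies this to the counts in Table 5.2, assuming rows are actual classes and columns are computed classes; the function name `accuracy` is chosen for this example.

```python
def accuracy(matrix):
    """Correct classifications (main diagonal) divided by all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Counts from Table 5.2 (assumed layout: rows = actual, columns = computed).
table_5_2 = [[115, 38],
             [35, 152]]

print(f"{accuracy(table_5_2):.3f}")  # 267 correct out of 340 instances
```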
![Page 25: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/25.jpg)
Table 5.3 • Test Set Results for a Most Typical Training Model
| | Computed Accept | Computed Reject |
|---|---|---|
| Accept | 98 | 55 |
| Reject | 25 | 162 |
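A per-class view shows where a model's errors fall. The sketch below computes, for each class in Table 5.3, the fraction of its instances that were classified correctly, again assuming rows are actual classes and columns are computed classes. The name `per_class_rate` is hypothetical.

```python
def per_class_rate(matrix, labels):
    """For each actual class, the fraction of its instances classified correctly."""
    return {label: matrix[i][i] / sum(matrix[i])
            for i, label in enumerate(labels)}


# Counts from Table 5.3 (assumed layout: rows = actual, columns = computed).
table_5_3 = [[98, 55],
             [25, 162]]

rates = per_class_rate(table_5_3, ["Accept", "Reject"])
for label, rate in rates.items():
    print(f"{label}: {rate:.3f}")
```

Under the assumed layout, 98 of 153 actual accepts and 162 of 187 actual rejects are classified correctly, so this model errs more often on the accept class.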
![Page 26: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/26.jpg)
Experiment 2: Parameter Evaluation
*Applying the Four-Step Process Model to the Satellite Image
Dataset*
![Page 27: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5](https://reader030.vdocuments.site/reader030/viewer/2022032704/56649d4e5503460f94a2d05d/html5/thumbnails/27.jpg)
Figure 5.4 Satellite image data