The CRISP Data Mining Process
August 28, 2004 Data Mining 2
The Data Mining Process
Business understanding
Data evaluation
Data preparation
Modeling
Evaluation
Deployment
Data
Business Understanding
Project objectives
Project requirements
Data mining problem formulation
Preliminary plan
Case Study
Data mining project done for a large insurance company
Consider the use of data mining to improve understanding of customer databases
Led by the data warehousing team, which also wanted to improve their expertise
Business Objectives
Understand what coverage packages are of interest to a customer group:
  Targeting of new customers
  Cross-selling opportunities to existing customers
Understand why a customer group terminates coverage:
  Know in advance what groups are likely to terminate
  Understand what factors influence termination
What are the Goals?
The business goals:
  Improve customer retention
  Increase cross-selling
Success criteria:
  Customer turnover rate
  Amount of cross-selling
Data Mining Problems
Classify new and existing customers as either interested or not interested in a particular coverage
Classify existing customers as either likely or unlikely to terminate coverage
Data Evaluation
Initial data collection
Data quality
Initial insights
Interesting subsets
Data warehousing team
Case Study: Data Evaluation
Data was extracted from select customer databases by company personnel
Coverage programs with few customers selected for pilot project
Five separate files extracted for five coverage programs
Data Preparation
From raw data to finished data set
Technical tasks:
  Data selection
  Attribute selection
  Data cleaning
Case Study: Data Preparation
Some initial formatting of data in MS Excel
Cleaning of data file
Combine headers/instances
Add a new attribute: interest (yes/no)
Must create the no interest cases
End up with a CSV formatted file
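The labeling step above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`customer_id`, `state`, `coverages`), and the rule that a customer without the target coverage counts as a "no interest" case are all assumptions, since the slides do not say how the "no" cases were created.

```python
import csv
import io

# Hypothetical customer records; the field names are illustrative only.
customers = [
    {"customer_id": "C1", "state": "IN", "coverages": {"Coverage_1", "Coverage_3"}},
    {"customer_id": "C2", "state": "OH", "coverages": {"Coverage_1"}},
    {"customer_id": "C3", "state": "IN", "coverages": {"Coverage_3"}},
]

def label_interest(records, target_coverage):
    """Return one row per customer with an added yes/no 'interest' attribute."""
    rows = []
    for rec in records:
        rows.append({
            "customer_id": rec["customer_id"],
            "state": rec["state"],
            # Assumption: customers without the target coverage are the "no" cases.
            "interest": "yes" if target_coverage in rec["coverages"] else "no",
        })
    return rows

def to_csv(rows):
    """Serialize the rows to CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = label_interest(customers, "Coverage_3")
csv_text = to_csv(rows)
print(csv_text)
```

A file written this way has one header line followed by one instance per line, which is the shape a CSV loader expects.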
Weka Data Mining Software
Data in CSV format loaded into Weka:
  Data preprocessing
  Attribute selection
  Modeling: classification, clustering, association rule mining
  Visualization
Data Preprocessing in Weka
Initial data inspection:
  Missing values
  Useless attributes
  Numeric attributes as nominal
Some helpful Weka filters:
  RemoveUseless
  ReplaceMissingValues
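To make the two filters concrete, here is a plain-Python sketch of the same ideas. It is not Weka code: the function names, the `None` marker for missing values, and the `max_distinct_fraction` threshold are assumptions chosen for illustration. It drops attributes that never vary (or text attributes that are nearly unique per row, such as IDs) and fills missing values with the mean for numeric attributes and the most frequent value otherwise.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumed marker for a missing value

def remove_useless(rows, max_distinct_fraction=0.99):
    """Drop attributes that are constant or, for text attributes, almost unique per row."""
    keep = []
    for name in rows[0]:
        values = [r[name] for r in rows if r[name] is not MISSING]
        distinct = set(values)
        if len(distinct) <= 1:
            continue  # a constant attribute carries no information
        is_nominal = all(isinstance(v, str) for v in values)
        if is_nominal and len(distinct) / len(values) > max_distinct_fraction:
            continue  # e.g. an ID column with a different value on every row
        keep.append(name)
    return [{name: r[name] for name in keep} for r in rows]

def replace_missing(rows):
    """Fill missing values with the column mean (numeric) or mode (nominal)."""
    filled = [dict(r) for r in rows]
    for name in rows[0]:
        present = [r[name] for r in rows if r[name] is not MISSING]
        if not present:
            continue
        if all(isinstance(v, (int, float)) for v in present):
            fill = mean(present)
        else:
            fill = Counter(present).most_common(1)[0][0]
        for r in filled:
            if r[name] is MISSING:
                r[name] = fill
    return filled

# Hypothetical data: "id" is unique per row and "country" never varies.
raw = [
    {"id": "a1", "country": "US", "size": 10, "state": "IN"},
    {"id": "a2", "country": "US", "size": None, "state": "IN"},
    {"id": "a3", "country": "US", "size": 30, "state": None},
    {"id": "a4", "country": "US", "size": 20, "state": "OH"},
]
cleaned = replace_missing(remove_useless(raw))
print(cleaned)
```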
Data Preprocessing in Weka
Data reduction:
  Instance dimension: RemovePercentage and Resample filters
  Attribute dimension:
    Remove redundant attributes
    Remove irrelevant attributes
    Identify most important attributes
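Reduction along the instance dimension can be illustrated without Weka. The sketch below mimics the two ideas named above: dropping a fixed percentage of the instances and drawing a random subsample. The function names and parameters are hypothetical, and details such as which end of the data the percentage is removed from are assumptions of this sketch rather than statements about the Weka filters.

```python
import random

def remove_percentage(rows, percentage):
    """Drop the first `percentage` percent of the rows and keep the rest in order."""
    cut = round(len(rows) * percentage / 100)
    return rows[cut:]

def resample(rows, sample_percent, seed=1, replacement=True):
    """Draw a random subsample whose size is `sample_percent` percent of the input."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n = round(len(rows) * sample_percent / 100)
    if replacement:
        return [rng.choice(rows) for _ in range(n)]
    return rng.sample(rows, n)

data = list(range(10))  # stand-in for ten instances
kept = remove_percentage(data, 30)
sample_with = resample(data, 50)
sample_without = resample(data, 50, replacement=False)
print(kept, sample_with, sample_without)
```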
Attribute Selection Methods
Three main methods used:
  InfoGain
  ChiSquared
  Relief
Combined results from complementary methods
Final pruning of attribute list to twenty attributes
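As a rough illustration of how two of these scores work and how their results might be combined, the sketch below computes information gain and the chi-squared statistic for nominal attributes against a yes/no class, then averages the rank positions. The averaging scheme is an assumption (the slides do not say how the results were combined), Relief is left out for brevity, and the toy columns are invented.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(values, labels):
    """Class entropy minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def chi_squared(values, labels):
    """Chi-squared statistic of the attribute-by-class contingency table."""
    total = len(labels)
    value_counts = Counter(values)
    label_counts = Counter(labels)
    observed = Counter(zip(values, labels))
    stat = 0.0
    for v, nv in value_counts.items():
        for l, nl in label_counts.items():
            expected = nv * nl / total
            stat += (observed[(v, l)] - expected) ** 2 / expected
    return stat

def combined_ranking(columns, labels, scorers):
    """Order attributes by their summed rank position across scorers (best first)."""
    rank_sum = Counter()
    for scorer in scorers:
        ordered = sorted(columns, key=lambda name: scorer(columns[name], labels), reverse=True)
        for position, name in enumerate(ordered):
            rank_sum[name] += position
    return sorted(columns, key=lambda name: rank_sum[name])

# Invented toy data: the flag separates the classes, the state does not.
labels = ["yes", "yes", "no", "no"]
columns = {
    "state": ["IN", "OH", "IN", "OH"],
    "small_group_flag": ["Y", "Y", "N", "N"],
}
ranking = combined_ranking(columns, labels, [info_gain, chi_squared])
print(ranking)
```

On this toy data both scores agree, so the combined ranking simply puts the perfectly separating attribute first; with real data the two methods can disagree, which is where combining them becomes useful.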
Selected Attributes
Location:
  Tax State
  Contract State
  State Code
  Zip Code
Size:
  Case Size Range
Industry:
  Industry Classification
  Industry Classification Name
  SIC Code
Timing:
  New Sale Flag
  Decision Maker Effective Month
  Decision Maker Effective Year
  Next Renewal Month
  Next Renewal Year
Internal:
  Agency Number
  Office Name
  Pricing Category Code
  Product Line Name
  Small Group Flag
Relevance of Attribute Selection
Improved modeling:
  Faster model induction
  Higher accuracy
  Easier to interpret models
Structural knowledge gained from the selection of attributes
Most Important Attributes
What attributes affect the purchasing decision of a customer group?
E.g., the five most important factors that determine whether a customer group purchases a particular insurance coverage:
  Agency Number
  Small Group Flag
  Zip Code
  Decision Maker Effective Year
  Next Renewal Month
Customer Segmentation
Unique groups of customers:
  Similar characteristics
  Similar behavior in terms of interest in coverage
For example, separate predictive models for customer segments for a particular type of insurance
Customer Segments Used for Modeling
Results:
  Three segments for one database
  Two segments for two databases
  One segment for two databases
Continue modeling for each segment independently
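Modeling each segment independently amounts to splitting the instances by segment and fitting one model per group. The sketch below shows that structure only. The `segment` field, the helper names, and the trivial majority-class "model" are placeholders; the slides do not describe how the segments were derived or stored.

```python
from collections import Counter, defaultdict

def split_by_segment(rows, segment_key):
    """Group the instances by the value of their segment attribute."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[segment_key]].append(row)
    return dict(segments)

def fit_majority_model(rows, target):
    """Placeholder model: always predict the most common class in the segment."""
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    return lambda row: majority

def fit_per_segment(rows, segment_key, target):
    """Fit one independent model per customer segment."""
    grouped = split_by_segment(rows, segment_key)
    return {seg: fit_majority_model(seg_rows, target) for seg, seg_rows in grouped.items()}

# Invented instances with a hypothetical segment label.
instances = [
    {"segment": "A", "interest": "yes"},
    {"segment": "A", "interest": "yes"},
    {"segment": "A", "interest": "no"},
    {"segment": "B", "interest": "no"},
    {"segment": "B", "interest": "no"},
]
models = fit_per_segment(instances, "segment", "interest")
print({seg: model({}) for seg, model in models.items()})
```

In practice `fit_majority_model` would be replaced by whatever learner is used for the segment, for example a decision tree.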
Modeling
Select modeling technique(s)
Calibrate modeling techniques
Make adjustments to data
Modeling
Mathematical models for predicting if a customer is interested in a coverage
Understand why a customer is interested
For example: if a customer’s state is Indiana and the office is Indianapolis_Office1, then the customer is interested in Coverage_3
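The example rule can be written down as data plus a small matching function, which shows why rule-style models are easy to inspect. The dictionary layout and the attribute keys `state` and `office` are assumptions made for this sketch.

```python
# The example rule from the slide, expressed as conditions and a conclusion.
RULE = {
    "if": {"state": "Indiana", "office": "Indianapolis_Office1"},
    "then": ("interested_in", "Coverage_3"),
}

def rule_fires(rule, customer):
    """Return True when every condition of the rule matches the customer."""
    return all(customer.get(attr) == value for attr, value in rule["if"].items())

# Hypothetical customers used only to exercise the rule.
customer_a = {"state": "Indiana", "office": "Indianapolis_Office1"}
customer_b = {"state": "Indiana", "office": "Some_Other_Office"}

print(rule_fires(RULE, customer_a), rule_fires(RULE, customer_b))
```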
Modeling Techniques
Three modeling techniques tried for predicting customer interest:
  Decision trees
  Artificial neural networks (ANN)
  Support vector machines (SVM)
Decision trees have the advantage of transparency
ANN and SVM did not have significantly better prediction accuracy
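The transparency of decision trees is easiest to see in a small example. The following is a minimal ID3-style inducer for nominal attributes, written as a sketch. It is not the algorithm used in the case study (the slides do not name one), it has no pruning or numeric splits, and the training rows are invented.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the lowest weighted class entropy."""
    def remainder(attr):
        result = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            result += len(subset) / len(rows) * entropy(subset)
        return result
    return min(attributes, key=remainder)

def build_tree(rows, attributes, target):
    """Recursively build a tree; leaves are class labels, inner nodes are dicts."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # pure node or nothing left to split on
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": attr, "branches": branches, "default": majority}

def predict(tree, row):
    """Follow branches until a leaf; fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree

# Invented training rows; attribute values are placeholders.
training = [
    {"state": "Indiana", "office": "Office_A", "interest": "yes"},
    {"state": "Indiana", "office": "Office_B", "interest": "no"},
    {"state": "Ohio", "office": "Office_A", "interest": "no"},
    {"state": "Ohio", "office": "Office_B", "interest": "no"},
]
tree = build_tree(training, ["state", "office"], "interest")
predictions = [predict(tree, row) for row in training]
print(tree)
```

Printing the tree shows every test and outcome directly, which is the kind of inspection a neural network or SVM does not offer as readily.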
Insurance Coverage Interest (Type 6)
[Decision tree figure. Split attributes: Small Group Flag (Y, N) and Product Line Name (Group_1, Group_2). Leaves: Yes or No.]
Insurance Coverage Interest (Type 7)
[Decision tree figure. Split attributes: Pricing Category Code (A2, A4), Industry Classification Name (Legal_Services, Transportation_and_Public_Utilities), Agency Number (<= 430, > 430), and Next Renewal Year (<= 2000, > 2000; <= 2002, > 2002). Branch labels Group_1 and Group_2 also appear. Leaves: Yes or No. Other branches omitted.]
Accuracy of Predicting Customer Interest
Coverage Accuracy
Type 1 84.0%
Type 2 97.2%
Type 3 98.3%
Type 4 99.5%
Type 5 88.4%
Type 6 100%
Type 7 76.3%
Type 8 85.0%
Type 9 94.8%
Modeling
Mathematical models for predicting if a customer will terminate coverage
Why do customers terminate a specific type of coverage?
What are the important factors in a customer’s decision to terminate coverage?
Who Terminates Type 3 Coverage?
[Decision tree figure. Split attributes: Customer Effective Year, Next Renewal Month, and Coverage Effective Year, with branch values 1999, 2000, 2001, 2002 and 7. Leaves: Terminated or Active.]
Correct for 95% of customers
Who Terminates Type 1 Coverage?
Decision tree based on:
  Distribution number
  Underwriting department number
  Price category
  Rate type
  Rate Plan Year
Predicts 96.3% of terminations correctly
Accuracy of Predicting Termination
Model Accuracy
Type 1 96.3%
Type 2 96.5%
Type 3 95.3%
Type 4 88.9%
Type 5 88.3%
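The slides do not say how these figures were measured, so it helps to keep two common measures apart. The sketch below, on invented labels, computes overall accuracy (the share of all customers classified correctly) and recall on the terminated class (the share of actual terminations that the model catches). The function names and the data are illustrative only.

```python
def accuracy(actual, predicted):
    """Fraction of all instances whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(actual, predicted, positive):
    """Fraction of the actual `positive` instances that were predicted as such."""
    relevant = [p for a, p in zip(actual, predicted) if a == positive]
    return sum(p == positive for p in relevant) / len(relevant)

# Invented outcomes for five customers.
actual = ["terminated", "terminated", "active", "active", "active"]
predicted = ["terminated", "active", "active", "active", "terminated"]

overall = accuracy(actual, predicted)
caught = recall(actual, predicted, "terminated")
print(overall, caught)
```

When terminations are rare, a model can score high on accuracy while missing most of them, so the two numbers are worth reporting side by side.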
Evaluation
Data analysis results in a good model
Are business objectives being achieved?
Is there an important business issue that has not been considered?
Should the results be used?
Deployment
Incorporate the results in the organization’s decision making process:
  Report
  Decision support system
  Personalization of web pages
  Repeatable data mining process