Lecture 8: Overview of DM Techniques (Modified)
DESCRIPTION
This lecture is used for the Data Warehousing and Data Mining course.
TRANSCRIPT
SCCS 453 DW and DM
Semester 2, Year 2006
Songsri Tangsripairoj, Ph.D.
SCCS 453 Data Warehousing and Data Mining
Songsri Tangsripairoj, [email protected]
Department of Computer Science
Faculty of Science, Mahidol University
Lecture 8
Overview of Data Mining Techniques
Topics
Data Mining Tasks
Data Mining Techniques
Data Mining Models
Data Mining Functions
Demonstration Data Sets
Data Mining Tools
  Shopping Cart Analyzer (SCA)
  Weka by the University of Waikato
Data Mining Tasks
Descriptive
  Characterize general properties of the data in databases
  Clustering and Summarization
Predictive
  Perform inference on the current data in order to make predictions
  Classification and Estimation
Data Mining Techniques
Statistical techniques
  Have strong diagnostic tools
  Can be used for the development of confidence intervals on parameter estimates and for hypothesis testing
Artificial Intelligence techniques
  Require fewer assumptions about the data
  Are generally more automatic
Data Mining Techniques
Statistical
  Market-Basket Analysis - find groups of items
  Memory-Based Reasoning - case based
  Cluster Detection - undirected (quantitative MBA)
Artificial Intelligence
  Link Analysis - MCI’s Friends & Family
  Decision Trees, Rule Induction - production rule
  Neural Networks - automatic pattern detection
  Genetic Algorithms - keep best parameters
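As a rough illustration of what "find groups of items" can mean in market-basket analysis, the sketch below counts how often item pairs occur together (support) and how often one item accompanies another (confidence). The baskets, item names, and the helper functions `pair_support` and `confidence` are hypothetical and are not taken from the lecture or from any tool it mentions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; item names are illustrative only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def pair_support(transactions):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in b for b in with_antecedent)
    return hits / len(with_antecedent)

support = pair_support(baskets)
bread_to_milk = confidence(baskets, "bread", "milk")
```

Pairs with high support and high confidence are the candidate "groups of items"; real tools add minimum-support pruning so that they scale to large item sets.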
Comparison of Features

| Feature | Rules | Neural Net | Case Base | Genetic |
|---|---|---|---|---|
| Noisy data | Good | Very good | Good | Very good |
| Missing data | Good | Good | Very good | Good |
| Large sets | Very good | Poor | Good | Good |
| Different types | Good | Numerical | Very good | Transform |
| Accuracy | High | Very high | High | High |
| Explanation | Very good | Poor | Very good | Good |
| Integration | Good | Good | Good | Very good |
| Ease | Easy | Difficult | Easy | Difficult |
Data Mining Models
Regression: Y = a + bX
Classification: assign new record to class
Predictive: assign value to new record
Clustering: groups for data
Time-series: assign future value
Links: patterns in data
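The regression model Y = a + bX can be fitted by ordinary least squares. The sketch below is one minimal way to do that in plain Python; the function name `fit_line` and the sample data are assumptions made for illustration, not part of the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and of cross-deviations of X and Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)

# "Assign value to new record": predict Y for an unseen X.
predicted = a + b * 4.0
```

Once a and b are estimated, the predictive use listed above is simply evaluating a + bX for a new record.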
Data Mining Modeling Tools
Radding Algorithms Peacock Functions Basis Task
Cluster detection Cluster analysis Statistics Classification
Regression models Statistics Estimation
Logistic regression Statistics Classification
Discriminant analysis Statistics Classification
Neural networks Neural networks AI Classification
Kohonen nets AI Cluster
Decision trees Association rules AI Classification
Rule induction Association rules AI Description
Link analysis Description
Query tools Description
Descriptive statistics Statistics Description
Visualization tools Statistics Description
Data Mining Functions
Classification
  Identify categories in data
Prediction
  Formula to predict future observations
Association
  Rules using relationships among entities
Detection
  Anomalies & irregularities (fraud detection)
Financial Applications

| Technique | Application | Problem Type |
|---|---|---|
| Neural net | Forecast stock price | Prediction |
| NN, Rule | Forecast bankruptcy | Prediction |
| NN, Rule | Fraud detection | Detection |
| NN, Case | Forecast interest rate | Prediction |
| NN, visual | Late loan detection | Detection |
| Rule | Credit assessment | Prediction |
| Rule | Risk classification | Classification |
| Rule, Case | Corporate bond rate | Prediction |
Telecom Applications

| Technique | Application | Problem Type |
|---|---|---|
| Neural net, Rule induction | Forecast network behavior | Prediction |
| Rule induction | Churn | Classification |
| Rule induction | Fraud detection | Detection |
| Case based | Call tracking | Classification |
Marketing Applications

| Technique | Application | Problem Type |
|---|---|---|
| Rule induction | Market segment | Classification |
| Rule induction | Cross-selling | Association |
| Rule induction, visual | Lifestyle analysis | Classification |
| Rule induction, visual | Performance analysis | Association |
| Rule induction, genetic, visual | Reaction to promotion | Prediction |
| Case based | Online sales support | Classification |
Web Applications

| Technique | Application | Problem Type |
|---|---|---|
| Rule induction, Visualization | User browsing similarity analysis | Classification, Association |
| Rule-based heuristics | Web page content similarity | Association |
Other Applications

| Technique | Application | Problem Type |
|---|---|---|
| Neural net | Software cost | Detection |
| Neural net, rule induction | Litigation assessment | Prediction |
| Rule induction | Insurance fraud | Detection |
| Rule induction | Healthcare except. | Detection |
| Case based | Insurance claim | Prediction |
| Case based | Software quality | Classification |
| Genetic algorithm | Budget spending | Classification |
Demonstration Data Sets
Loan Application Data
  classification
Job Application Data
  classification
Insurance Fraud Data
  detection
Expenditure Data
  prediction
Loan Data
650 observations
OUTCOMES (binary):
  On-time: cost of error $300
  Late (default): cost of error $2,000
Variables: Age, Income, Assets, Debts, Want, Credit
  Credit ordinal
  Transform: Assets, Debts, & Want → Risk
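Because the two kinds of error carry different costs, a classifier on this data is more usefully scored by total error cost than by raw accuracy. The sketch below assumes that the $300 cost applies when an on-time loan is predicted late and the $2,000 cost applies when a late loan is predicted on-time; the slide does not spell out that mapping, and the labels used here are hypothetical rather than drawn from the 650 observations.

```python
# Error costs from the slide; which error each cost attaches to is an assumption.
COST_ON_TIME_ERROR = 300   # actual on-time, predicted late
COST_LATE_ERROR = 2000     # actual late (default), predicted on-time

def total_error_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    cost = 0
    for a, p in zip(actual, predicted):
        if a == "on-time" and p == "late":
            cost += COST_ON_TIME_ERROR
        elif a == "late" and p == "on-time":
            cost += COST_LATE_ERROR
    return cost

# Hypothetical labels for five loans.
actual = ["on-time", "on-time", "late", "late", "on-time"]
predicted = ["on-time", "late", "on-time", "late", "on-time"]
cost = total_error_cost(actual, predicted)
```

Two models with the same accuracy can differ widely on this measure if one of them makes more of the expensive error.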
Example Loan Data
Job Application Data
500 observations
OUTCOMES (ordinal):
  Unacceptable
  Minimal
  Acceptable
  Excellent
Variables: Age, State, Degree, Major, Experience
  State nominal; degree & major ordinal
  State is superfluous
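The nominal versus ordinal distinction matters when variables are encoded for a model: ordinal values can be mapped to ranks, while nominal values should not imply any order. The sketch below assumes the outcome levels run from worst to best in the order listed on the slide; the state codes and the helper names `encode_outcomes` and `one_hot` are hypothetical.

```python
# Outcome levels in slide order, assumed to run from worst to best.
OUTCOME_LEVELS = ["Unacceptable", "Minimal", "Acceptable", "Excellent"]
OUTCOME_RANK = {level: rank for rank, level in enumerate(OUTCOME_LEVELS)}

def encode_outcomes(labels):
    """Map ordinal outcome labels to integer ranks (0 = lowest)."""
    return [OUTCOME_RANK[label] for label in labels]

def one_hot(value, categories):
    """Encode a nominal value as a 0/1 indicator list with no implied order."""
    return [1 if value == c else 0 for c in categories]

ranks = encode_outcomes(["Minimal", "Excellent", "Unacceptable"])

# Hypothetical state codes; State is nominal, so it gets indicator columns.
state_vector = one_hot("CA", ["CA", "NY", "TX"])
```

Rank encoding preserves the ordering that the outcome carries, whereas the indicator encoding keeps a nominal variable such as State from being treated as a quantity.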
Example Job App. Data
Insurance Claim Data
5,000 observations
OUTCOMES (binary):
  OK: cost of error $500
  Fraudulent: cost of error $2,500
Variables: Age, Gender, Claim, Tickets, Prior claims, Attorney
  Gender & attorney nominal, tickets & prior claims categorical
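Unequal error costs also shift the point at which a claim should be flagged. If a model outputs a fraud probability p, flagging has expected cost (1 - p) × 500 and not flagging has expected cost p × 2,500, so flagging is cheaper once p exceeds 500 / (500 + 2,500). The sketch below assumes that reading of the two costs (a false alarm on an OK claim costs $500, a missed fraud costs $2,500); the function names are hypothetical.

```python
COST_FALSE_ALARM = 500     # an OK claim treated as fraudulent (assumed mapping)
COST_MISSED_FRAUD = 2500   # a fraudulent claim treated as OK (assumed mapping)

def fraud_threshold(cost_false_alarm, cost_missed_fraud):
    """Probability above which flagging has lower expected cost than not flagging."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_fraud)

def should_flag(p_fraud, threshold):
    """Flag the claim when its estimated fraud probability exceeds the threshold."""
    return p_fraud > threshold

threshold = fraud_threshold(COST_FALSE_ALARM, COST_MISSED_FRAUD)
flag_likely = should_flag(0.20, threshold)
flag_unlikely = should_flag(0.10, threshold)
```

Under these assumptions the threshold is one in six, well below the 0.5 cut-off that equal costs would imply.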
Example Insurance Claim Data
Expenditure Data
10,000 observations
OUTCOMES:
  Could predict response in a number of categories
  Others
Variables: Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Drivers license, Own home, Number of credit cards
  Churn, proportion of income spent on seven categories
Example Expenditure Data