Data Mining Introduction
TYNE SYSTEM
Chun-hung, Chou
2003.12.09
Outline
1. Data Mining Overview
2. Functionalities
3. Software
4. R function
5. Example
6. Q & A
Data Mining Overview
Knowledge Discovery Process
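1. Data cleaning - remove noise and inconsistent data
2. Data integration - combine multiple data sources
3. Data selection - retrieve the data relevant to the analysis task
4. Data transformation - transform or consolidate the data into forms appropriate for mining
5. Data mining - apply intelligent methods to extract data patterns
6. Pattern evaluation - identify the truly interesting patterns
7. Knowledge presentation - present the mined knowledge to the user, e.g. through visualization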
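The seven steps above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not the presenter's pipeline: the two data sources (`readings`, `tools`), the sentinel value `-999.0`, the min-max scaling, and the threshold "pattern" are all hypothetical placeholders chosen only to show where each step sits.

```python
# Hypothetical toy sources: sensor readings and tool metadata keyed by tool id.
readings = [
    {"tool": "T1", "temp": 71.0},
    {"tool": "T2", "temp": None},    # missing value: removed by cleaning
    {"tool": "T3", "temp": 95.5},
    {"tool": "T1", "temp": -999.0},  # sentinel/noise value: removed by cleaning
]
tools = {"T1": {"line": "A"}, "T3": {"line": "B"}}

def clean(rows):
    """Step 1: drop records with missing or out-of-range values."""
    return [r for r in rows if r["temp"] is not None and r["temp"] > -100]

def integrate(rows, meta):
    """Step 2: combine the two sources on the shared 'tool' key."""
    return [{**r, **meta[r["tool"]]} for r in rows if r["tool"] in meta]

def select(rows, fields):
    """Step 3: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows, field):
    """Step 4: min-max scale one numeric field into [0, 1]."""
    vals = [r[field] for r in rows]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in rows]

def mine(rows, field, cut):
    """Step 5: a deliberately trivial 'pattern': records above a cut-off."""
    return [r for r in rows if r[field] > cut]

def evaluate(patterns, min_count):
    """Step 6: keep the result only if enough records support it."""
    return patterns if len(patterns) >= min_count else []

data = transform(select(integrate(clean(readings), tools), ["tool", "line", "temp"]), "temp")
found = evaluate(mine(data, "temp", 0.5), min_count=1)
for p in found:  # Step 7: present the result
    print(p)
```

In a real project each function would be replaced by a proper tool (database joins, a mining algorithm, interestingness measures), but the ordering of the steps stays the same.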
What is Data Mining?
• Data mining is viewed as one step of the knowledge discovery process.
• Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data.
• It draws on tools from computer science and artificial intelligence as well as statistics.
Why do we need data mining?
– Large number of records (cases): 10^8 to 10^12 bytes
– High-dimensional data (variables): 10 to 10^4 attributes
– Only a small portion of the collected data, typically 5% to 10%, is ever analyzed.
– Data that may never be explored continues to be collected out of fear that something that may prove important in the future would otherwise be missing.
– The magnitude of the data precludes most traditional analyses (ANOVA/PC/…).
Potential Applications
– Fraud Detection
– Manufacturing Processes
– Targeting Markets
– Scientific Data Analysis
– Risk Management
– Web Intelligence
– Bioinformatics
– …
Data Mining Myths
• Data mining tools need no guidance.
• Data mining models explain behavior.
• Data mining requires no data analysis skill.
• Data mining tools are “different” from statistics.
• Data mining eliminates the need to understand your business and your data.
Data Mining Functionalities
• Concept/Class Description
• Association Analysis
• Classification Analysis
• Cluster Analysis
• Outlier Analysis
• Evolution Analysis
Concept Description
Generate descriptions for the characterization and
comparison of data.
characterization:
summarizes and describes a collection of data
e.g. mean, distribution, percentile, …
comparison:
summarizes and distinguishes one collection of data from other
collection(s) of data
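A minimal sketch of characterization versus comparison, using only Python's standard `statistics` module. The machine names and cycle-time values are hypothetical, and the choice of mean, standard deviation and quartiles as the summary is an assumption; any descriptive statistics would do.

```python
import statistics

def characterize(values):
    """Characterization: summarize and describe one collection of data."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
    }

def compare(target, contrast):
    """Comparison: distinguish a target collection from a contrasting one."""
    a, b = characterize(target), characterize(contrast)
    return {k: a[k] - b[k] for k in ("mean", "median")}

# Hypothetical cycle times (minutes) for two machines.
machine_a = [10, 12, 14, 16, 18]
machine_b = [9, 10, 11, 12, 13]
print(characterize(machine_a))
print(compare(machine_a, machine_b))
```

Characterization looks at one collection on its own; comparison only becomes meaningful once both collections are summarized the same way, which is why `compare` reuses `characterize`.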
Concept Description
Method:
visualization:
e.g. boxplot, bar chart, histogram, …
statistics/tabulate:
e.g. mean, std, proportion, contingency table, …
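The tabulation method can be illustrated with a contingency table and a conditional proportion. This sketch assumes hypothetical `(shift, outcome)` records; it uses `collections.Counter` to cross-tabulate two categorical attributes and prints a small text table.

```python
from collections import Counter

# Hypothetical records: (shift, outcome) pairs.
records = [("day", "pass"), ("day", "pass"), ("day", "fail"),
           ("night", "pass"), ("night", "fail"), ("night", "fail")]

def contingency(pairs):
    """Cross-tabulate two categorical attributes: counts[(row, col)]."""
    return Counter(pairs)

def proportion(pairs, row, col):
    """Share of records with row value `row` whose column value is `col`."""
    total = sum(1 for r, _ in pairs if r == row)
    return contingency(pairs)[(row, col)] / total if total else 0.0

table = contingency(records)
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})
print("".join(f"{h:>8}" for h in ["", *cols]))  # header line
for r in rows:
    print(f"{r:>8}" + "".join(f"{table[(r, c)]:>8}" for c in cols))
print(proportion(records, "night", "fail"))
```

A `Counter` returns 0 for missing keys, so empty cells in the table print as 0 without special handling.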
Association Analysis
Goal: find interesting relationships among items in a given data set
Association Analysis
Example:
• Market Basket Analysis - an example of rule-based machine learning
• Customer Analysis
– Market Basket Analysis uses information about what customers purchase to give us insight into who they are and why they make certain purchases.
• Product Analysis
– Market Basket Analysis gives us insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to purchase.
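Market basket analysis is usually expressed through rules A → B scored by support (how often A and B appear together) and confidence (how often B appears given A). The sketch below is a brute-force enumeration of single-item rules, not the Apriori algorithm; the baskets and the thresholds `min_support=0.4`, `min_confidence=0.7` are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions (baskets); item names are illustrative only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Enumerate rules lhs -> rhs between single items, filtered by
    support (of the pair) and confidence = support(pair) / support(lhs)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules

for lhs, rhs, s, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={conf:.2f}")
```

Brute force is fine for a handful of items; for realistic catalogues, Apriori or FP-growth prune the candidate itemsets instead of testing every pair.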
Classification Analysis
Goal:
Build a model that describes a predetermined set of data
classes or concepts, and use the model for prediction.
Classification Analysis
Method:
- Decision tree
- Bayesian network
- Bayesian belief network
- Neural network
- k-nearest neighbor
- Case-based reasoning
- Genetic algorithm
- Rough sets
- Fuzzy logic
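Of the methods listed, k-nearest neighbor is the shortest to show end to end: the "model" is the labeled training data itself, and a new object gets the majority label of its k closest neighbors. This sketch assumes Euclidean distance, `k=3`, and hypothetical 2-D points labeled "normal" and "abnormal".

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two classes.
train = [((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
         ((5.0, 5.0), "abnormal"), ((5.5, 4.8), "abnormal"), ((4.9, 5.2), "abnormal")]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.1)))
```

An odd k avoids ties between two classes; with more classes, or features on very different scales, the attributes should be normalized first.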
Cluster Analysis
Goal:
Group a set of physical or abstract objects into classes
of similar objects.
Cluster
• Method:
Partitioning methods: k-means
Hierarchical methods: top-down (divisive), bottom-up (agglomerative)
Density-based methods: grow clusters of arbitrary shape
Grid-based methods: quantize the object space into cells
Model-based methods: find the best fit of the data to a given model
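The partitioning method k-means can be sketched with Lloyd's algorithm: assign each object to its nearest centroid, move each centroid to the mean of its cluster, and repeat until nothing changes. The random initialization from input points, the fixed `seed`, and the sample points are assumptions made for a small, reproducible illustration.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

k-means needs k in advance and favors compact, roughly spherical clusters; that limitation is what the density-based methods above address with clusters of arbitrary shape.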
Outlier Analysis
Outlier: a data object that is inconsistent with the
rest of a given data set
Goal: find an efficient method to mine the
outliers
Outlier Analysis
Method:
- Statistical-Based Outlier Detection
- Distance-Based Outlier Detection
- Deviation-Based Outlier Detection
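The first two approaches can be sketched in a few lines each (deviation-based detection is not shown). The statistical test here is a simple z-score rule with an assumed threshold of 2 population standard deviations; the distance-based test follows the DB(fraction, radius) idea: a point is an outlier if at least `fraction` of the other points lie farther than `radius` from it. All data and parameter values are hypothetical.

```python
import math
import statistics

def zscore_outliers(values, threshold=2.0):
    """Statistical-based: flag values more than `threshold` population
    standard deviations away from the mean."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sd > threshold]

def distance_outliers(points, radius, fraction):
    """Distance-based DB(fraction, radius): flag a point when at least
    `fraction` of the other points lie farther than `radius` from it."""
    flagged = []
    for i, p in enumerate(points):
        far = sum(1 for j, q in enumerate(points)
                  if j != i and math.dist(p, q) > radius)
        if far >= fraction * (len(points) - 1):
            flagged.append(p)
    return flagged

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 40]
print(zscore_outliers(readings))
points = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
print(distance_outliers(points, radius=3.0, fraction=0.9))
```

The z-score rule assumes a roughly unimodal distribution and is inflated by the outlier itself (with 10 values no z-score can exceed 3, hence the lower threshold here); the distance-based rule makes no distributional assumption but needs a sensible radius.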
Evolution Analysis
• Goal:
Describe and model regularities or trends for
objects whose behavior changes over time
Evolution Analysis
• Method:
Statistical Method
Trend Analysis
Similarity Search in Time-Series Analysis
Sequential Pattern Mining
Periodicity Analysis
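Trend analysis is often started with two simple tools: a moving average to smooth short-term fluctuation, and a least-squares line to estimate the overall direction. This sketch assumes equally spaced observations indexed 0..n-1 and a hypothetical monthly series; it does not cover similarity search, sequential patterns, or periodicity.

```python
def moving_average(series, window):
    """Smooth a series with a simple trailing moving average."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def linear_trend(series):
    """Least-squares slope and intercept of the values against time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean

# Hypothetical monthly measurements.
monthly = [12, 14, 13, 16, 18, 17, 20]
print(moving_average(monthly, 3))
print(linear_trend(monthly))
```

A positive slope indicates an upward trend per time step; the moving-average window should be chosen relative to any known period (e.g. 12 for monthly data with yearly seasonality).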
Commercial Software
• Full Suite
Product            Company          Price (US$)
EnterpriseMiner    SAS              >75000
Clementine         SPSS             ~50000
Intelligent Miner  IBM              ??
Data Miner         STATISTICA       ~50000
IndexMiner         Index Software   ??
Method in R
Method    R function or library
Tree      tree
Cluster   clara
Cluster   diana
Cluster   fanny
Cluster   mona
Cluster   hclust
Cluster   kmeans
Cluster   cluster
Example: Decision Tree
• Decision tree for abnormal tool detection
[Figure: decision tree; node labels AWD080, AWD030, AWD050]
Example: Decision Tree
Example: Cluster
Question & Suggestion
Thanks!