data mining part i iiit allahabad
DESCRIPTION
DATA MINING Part I IIIT Allahabad. Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275, USA [email protected] http://lyle.smu.edu/~ mhd/dmiiit.html - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/1.jpg)
1
DATA MINING Part I
IIIT Allahabad
Margaret H. DunhamDepartment of Computer Science and Engineering
Southern Methodist UniversityDallas, Texas 75275, USA
[email protected]://lyle.smu.edu/~mhd/dmiiit.html
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
Support provided by Fulbright Grant and IIIT AllahabadIIIT Allahabad
![Page 2: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/2.jpg)
2IIIT Allahabad
![Page 3: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/3.jpg)
3
Data Mining Outline
Part I: Introduction (19/1 – 20/1) Part II: Classification (24/1 – 27/1) Part III: Clustering (31/1 – 3/2) Part IV: Association Rules (7/2 – 10/2) Part V: Applications (14/2 – 17/2)
IIIT Allahabad
![Page 4: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/4.jpg)
Class Structure
Each class is two hours Tuesday/Wednesday presentation Thursday/Friday Lab
4IIIT Allahabad
![Page 5: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/5.jpg)
5
Data Mining Part I Introduction Outline
Lecture– Define data mining– Data mining vs. databases– Basic data mining tasks– Data mining development– Data mining issues
Lab– Download XLMiner and Weka– Analyze simple dataset
IIIT Allahabad
Goal: Provide an overview of data mining.
![Page 6: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/6.jpg)
6
Introduction
Data is growing at a phenomenal rate Users expect more sophisticated
information How?
IIIT Allahabad
UNCOVER HIDDEN INFORMATIONDATA MINING
![Page 7: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/7.jpg)
7
Data Mining Definition
Finding hidden information in a database
Fit data to a model Similar terms
– Exploratory data analysis– Data driven discovery– Deductive learning
IIIT Allahabad
![Page 8: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/8.jpg)
8
Data Mining Algorithm
Objective: Fit Data to a Model– Descriptive– Predictive
Preference – Technique to choose the best model
Search – Technique to search the data– “Query”
IIIT Allahabad
![Page 9: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/9.jpg)
9
Database Processing vs. Data Mining Processing
Query– Well defined– SQL
Query– Poorly defined– No precise query language
IIIT Allahabad
Data– Operational data
Output– Precise– Subset of database
Data– Not operational data
Output– Fuzzy– Not a subset of database
![Page 10: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/10.jpg)
10
Query Examples Database
Data Mining
IIIT Allahabad
– Find all customers who have purchased milk
– Find all items which are frequently purchased with milk. (association rules)
– Find all credit applicants with last name of Smith.– Identify customers who have purchased more
than $10,000 in the last month.
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (Clustering)
![Page 11: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/11.jpg)
11
Basic Data Mining Tasks Classification maps data into predefined
groups or classes– Supervised learning– Prediction– Regression
Clustering groups similar data together into clusters.– Unsupervised learning– Segmentation– Partitioning
IIIT Allahabad
![Page 12: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/12.jpg)
12
Basic Data Mining Tasks (cont’d)
Link Analysis uncovers relationships among data.– Affinity Analysis– Association Rules– Sequential Analysis determines sequential
patterns.
IIIT Allahabad
![Page 13: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/13.jpg)
13
CLASSIFICATION
Assign data into predefined groups or classes.
IIIT Allahabad
![Page 14: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/14.jpg)
14
But it isn’t Magic
You must know what you are looking for You must know how to look for you
IIIT Allahabad
Suppose you knew that a specific cave had gold:
What would you look for?
How would you look for it?
Might need an expert miner
![Page 15: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/15.jpg)
15
“If it looks like a duck, walks like a duck, and quacks like a duck, then
it’s a duck.”
IIIT Allahabad
Description Behavior Associations
Classification Clustering Link Analysis (Profiling) (Similarity)
“If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then
it’s a terrorist.”
![Page 16: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/16.jpg)
16
Classification Ex: Grading
IIIT Allahabad
>=90<90x
>=80<80
x
>=70<70
x
F
B
A
>=60<50
x C
D
![Page 17: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/17.jpg)
17IIIT Allahabad
Grasshoppers
KatydidsGiven a collection of annotated data. (in this case 5 instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is.
(c) Eamonn Keogh, [email protected]
![Page 18: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/18.jpg)
18IIIT Allahabad
Insect ID Abdomen Length
Antennae Length
Insect Class
1 2.7 5.5 Grasshopper
2 8.0 9.1 Katydid
3 0.9 4.7 Grasshopper
4 1.1 3.1 Grasshopper
5 5.4 8.5 Katydid
6 2.9 1.9 Grasshopper
7 6.1 6.6 Katydid
8 0.5 1.0 Grasshopper
9 8.3 6.6 Katydid
10 8.1 4.7 Katydid
11 5.1 7.0 ???????
The classification problem can now be expressed as:
Given a training database predict the class label of a previously unseen instance
previously unseen instance = (c) Eamonn Keogh, [email protected]
![Page 19: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/19.jpg)
19IIIT Allahabad
Ant
enna
Len
gth
10
1 2 3 4 5 6 7 8 9 10
123456789
Grasshoppers KatydidsAbdomen Length
(c) Eamonn Keogh, [email protected]
![Page 21: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/21.jpg)
21IIIT Allahabad
Handwriting
Recognition
George Washington Manuscript
0 50 100 150 200 250 300 350 400 4500
0.5
1
(c) Eamonn Keogh, [email protected]
![Page 22: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/22.jpg)
22
Anomaly Detection
IIIT Allahabad
![Page 23: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/23.jpg)
23IIIT Allahabad
![Page 24: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/24.jpg)
24
CLUSTERING
Partition data into previously undefined groups.
IIIT Allahabad
![Page 25: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/25.jpg)
25IIIT Allahabad
http://149.170.199.144/multivar/ca.htm
![Page 27: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/27.jpg)
27
Two Types of Clustering
IIIT Allahabad
Hierarchical Partitional
(c) Eamonn Keogh, [email protected]
![Page 28: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/28.jpg)
28
Hierarchical Clustering Example
Iris Data Set
IIIT Allahabad
Setosa
Versicolor
VirginicaThe data originally appeared in Fisher, R. A. (1936). "The Use of Multiple Measurements in Axonomic Problems," Annals of Eugenics 7, 179-188.Hierarchical Clustering Explorer Version 3.0, Human-Computer Interaction Lab, University of Maryland, http://www.cs.umd.edu/hcil/multi-cluster .
![Page 29: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/29.jpg)
29
http://www.time.com/time/magazine/article/0,9171,1541283,00.html
IIIT Allahabad
![Page 30: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/30.jpg)
30
Microarray Data Analysis Each probe location associated with gene Color indicates degree of gene expression Compare different samples (normal/disease) Track same sample over time Questions
– Which genes are related to this disease?– Which genes behave in a similar manner?– What is the function of a gene?
Clustering– Hierarchical– K-means
IIIT Allahabad
![Page 31: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/31.jpg)
31
Microarray Data - Clustering
IIIT Allahabad
"Gene expression profiling identifies clinically relevant subtypes of prostate cancer"Proc. Natl. Acad. Sci. USA, Vol. 101, Issue 3, 811-816, January 20, 2004
![Page 32: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/32.jpg)
32
ASSOCIATION RULES/ LINK ANALYSIS
Find relationships between data
IIIT Allahabad
![Page 33: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/33.jpg)
33
ASSOCIATION RULES EXAMPLES
People who buy diapers also buy beer If gene A is highly expressed in this
disease then gene A is also expressed Relationships between people Book Stores Department Stores Advertising Product PlacementIIIT Allahabad
![Page 34: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/34.jpg)
34IIIT Allahabad
Data Mining Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.DILBERT reprinted by permission of United Feature Syndicate, Inc.
![Page 35: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/35.jpg)
35IIIT Allahabad
Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts:, Dallas Morning News, June 4, 2007.
![Page 36: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/36.jpg)
36
No/Little Cheating
IIIT Allahabad
Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts:, Dallas Morning News, June 4, 2007.
![Page 37: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/37.jpg)
37
Rampant Cheating
IIIT Allahabad
Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts:, Dallas Morning News, June 4, 2007.
![Page 38: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/38.jpg)
38IIIT Allahabad
Jialun Qin, Jennifer J. Xu, Daning Hu, Marc Sageman and Hsinchun Chen, “Analyzing Terrorist Networks: A Case Study of the Global Salafi Jihad Network” Lecture Notes in Computer Science, Publisher: Springer-Verlag GmbH, Volume 3495 / 2005 , p. 287.
![Page 39: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/39.jpg)
39
Ex: Stock Market Analysis
Example: Stock Market Predict future values Determine similar patterns over time Classify behavior
IIIT Allahabad
![Page 40: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/40.jpg)
40
Ex: Stock Market Analysis
IIIT Allahabad
![Page 41: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/41.jpg)
41
Data Mining vs. KDD
Knowledge Discovery in Databases (KDD): process of finding useful information and patterns in data.
Data Mining: Use of algorithms to extract the information and patterns derived by the KDD process.
IIIT Allahabad
![Page 42: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/42.jpg)
42
KDD Process
Selection: Obtain data from various sources. Preprocessing: Cleanse data. Transformation: Convert to common format.
Transform to new format. Data Mining: Obtain desired results. Interpretation/Evaluation: Present results to
user in meaningful manner.IIIT Allahabad
Modified from [FPSS96C]
![Page 43: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/43.jpg)
43
KDD Process Ex: Web Log Selection:
– Select log data (dates and locations) to use Preprocessing:
– Remove identifying URLs; Remove error logs Transformation:
– Sessionize (sort and group) Data Mining:
– Identify and count patterns; Construct data structure Interpretation/Evaluation:
– Identify and display frequently accessed sequences. Potential User Applications:
– Cache prediction– Personalization
IIIT Allahabad
![Page 44: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/44.jpg)
Related Topics
Databases OLTP OLAP Information Retrieval
44IIIT Allahabad
![Page 45: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/45.jpg)
45
DB & OLTP Systems Schema
– (ID,Name,Address,Salary,JobNo) Data Model
– ER– Relational
Transaction Query:
SELECT NameFROM TWHERE Salary > 100000
DM: Only imprecise queriesIIIT Allahabad
![Page 46: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/46.jpg)
46
Classification/Prediction is Fuzzy
Loan
Amnt
Simple Fuzzy
Accept Accept
RejectReject
IIIT Allahabad
![Page 47: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/47.jpg)
47
Information Retrieval Information Retrieval (IR): retrieving desired
information from textual data. Library Science Digital Libraries Web Search Engines Traditionally keyword based Sample query:
Find all documents about “data mining”.
DM: Similarity measures; Mine text/Web data.
IIIT Allahabad
![Page 48: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/48.jpg)
48
Information Retrieval (cont’d) Similarity: measure of how close a
query is to a document. Documents which are “close enough”
are retrieved. Metrics:
–Precision = |Relevant and Retrieved| |Retrieved|
–Recall = |Relevant and Retrieved| |Relevant|
IIIT Allahabad
![Page 49: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/49.jpg)
49
IR Query Result Measures and Classification
IR ClassificationIIIT Allahabad
![Page 50: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/50.jpg)
50
OLAP Online Analytic Processing (OLAP): provides more
complex queries than OLTP. OnLine Transaction Processing (OLTP): traditional
database/transaction processing. Dimensional data; cube view Visualization of operations:
– Slice: examine sub-cube.– Dice: rotate cube to look at another dimension.– Roll Up/Drill Down
DM: May use OLAP queries.
IIIT Allahabad
![Page 51: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/51.jpg)
51
DM vs. Related TopicsArea Query Data Results Output
DB/OLTP Precise Database Precise DB Objects or Aggregation
IR Precise Documents Vague Documents OLAP Analysis Multidimensional Precise DB Objects
or Aggregation
DM Vague Preprocessed Vague KDD Objects
IIIT Allahabad
![Page 52: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/52.jpg)
52
Data Mining Development
IIIT Allahabad
• Similarity Measures• Hierarchical Clustering• IR Systems• Imprecise Queries• Textual Data• Web Search Engines
• Bayes Theorem• Regression Analysis• EM Algorithm• K-Means Clustering• Time Series Analysis
• Neural Networks• Decision Tree Algorithms
• Algorithm Design Techniques• Algorithm Analysis• Data Structures
• Relational Data Model• SQL• Association Rule
Algorithms• Data Warehousing• Scalability Techniques
![Page 53: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/53.jpg)
53
KDD Issues
Human Interaction Overfitting Outliers Interpretation Visualization Large Datasets High Dimensionality
IIIT Allahabad
![Page 54: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/54.jpg)
Overfitting Suppose we want to predict whether an individual is
short, medium, or tall in height. What is wrong with this data?
54IIIT Allahabad
Name Gender Height OutputMary F 1.6 ShortMaggie F 1.9 MediumMartha F 1.88 MediumStephanie F 1.7 ShortBob M 1.85 MediumKathy F 1.6 ShortGeorge M 1.7 ShortDebbie F 1.8 MediumTodd M 1.95 MediumKim F 1.9 MediumAmy F 1.8 MediumWynette F 1.75 Medium
![Page 55: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/55.jpg)
55
KDD Issues (cont’d)
Multimedia Data Missing Data Irrelevant Data Noisy Data Changing Data Integration Application
IIIT Allahabad
![Page 56: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/56.jpg)
56
WARNING With data mining you don’t always know
what you are looking for. There is not one right answer. The data you are using is noisy Data Mining is a very applied discipline. A data mining course provides you tools
to use to analyze data. Experience provides you knowledge of
how to use these tools.IIIT Allahabad
![Page 57: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/57.jpg)
57IIIT Allahabadhttp://ieeexplore.ieee.org/iel5/6/32236/01502526.pdf?tp=&arnumber=1502526&isnumber=32236
![Page 58: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/58.jpg)
58IIIT Allahabad
![Page 59: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/59.jpg)
59
Social Implications of DM
Privacy Profiling Unauthorized use Invalid results and claims
IIIT Allahabad
![Page 60: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/60.jpg)
60
Data Mining Metrics
Usefulness Return on Investment (ROI) Accuracy … Space/Time
IIIT Allahabad
![Page 61: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/61.jpg)
61
Visualization Techniques
Graphical Geometric Icon-based Pixel-based Hierarchical Hybrid
IIIT Allahabad
![Page 62: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/62.jpg)
62
Models Based on Summarization
Visualization: Frequency distribution, mean, variance, median, mode, etc.
Box Plot:
IIIT Allahabad
![Page 63: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/63.jpg)
63
Scatter Diagram
IIIT Allahabad
![Page 64: DATA MINING Part I IIIT Allahabad](https://reader036.vdocuments.site/reader036/viewer/2022081505/56815c29550346895dc9fedb/html5/thumbnails/64.jpg)
DM Tools XLMiner – Easy addin to Excelhttp://www.solver.com/xlminer/index.html Weka – Open Source; Visualization,
Functionality, Interfacehttp://www.cs.waikato.ac.nz/ml/weka/ SAS (JMP) – Commercial Product SPSS – Commercial Product MATLAB – Statistical/Math Applications R – Programming
64IIIT Allahabad