data mining versus semantic web
DESCRIPTION
Data Mining Versus Semantic Web. Veljko Milutinovic, [email protected] http://galeb.etf.bg.ac.yu/vm. This material was developed with financial help of the WUSA fund of Austria. Two different avenues leading to the same goal!
Data Mining Versus Semantic Web
Veljko Milutinovic, [email protected]
http://home.etf.rs/~vm
This material was developed with financial help of the WUSA fund of Austria.
Data Mining versus Semantic Web
• Two different avenues leading to the same goal!
• The goal: Efficient retrieval of knowledge, from large compact or distributed databases, or the Internet
• What is the knowledge: Synergistic interaction of information (data) and its relationships (correlations).
• The major difference: Placement of complexity!
Essence of Data Mining
• Data and knowledge represented with simple mechanisms (typically, HTML) and without metadata (data about data).
• Consequently, relatively complex algorithms have to be used (complexity migrated into the retrieval request time).
• In return, low complexity at system design time!
Essence of Semantic Web
• Data and knowledge represented with complex mechanisms (typically XML) and with plenty of metadata (a byte of data may be accompanied by a megabyte of metadata).
• Consequently, relatively simple algorithms can be used (low complexity at the retrieval request time).
• However, large metadata design and maintenance complexity at system design time.
Major Knowledge Retrieval Algorithms (for Data Mining)
• Neural Networks
• Decision Trees
• Rule Induction
• Memory Based Reasoning, etc…
Consequently, the stress is on algorithms!
Major Metadata Handling Tools (for Semantic Web)
• XML
• RDF
• Ontology Languages
• Verification (Logic + Trust) Efforts in Progress
Consequently, the stress is on tools!
Issues in Data Mining Infrastructure
Authors:
Nemanja Jovanovic, [email protected]
Milenkovic, [email protected]
Veljko Milutinovic, [email protected]
http://home.etf.rs/~vm
Ivana Vujovic ([email protected])
Erich Neuhold ([email protected])
Peter Fankhauser ([email protected])
Claudia Niederée ([email protected])
Veljko Milutinovic ([email protected])
http://home.etf.rs/~vm
Data Mining in a Nutshell
Uncovering the hidden knowledge
Huge NP-complete search space
Multidimensional interface
A Problem …
You are a marketing manager for a cellular phone company
Problem: Churn is too high
Bringing back a customer after quitting is both difficult and expensive
Giving a new telephone to everyone whose contract is expiring is very expensive (as well as wasteful)
You pay a sales commission of $250 per contract
Customers receive a free phone (cost $125) with the contract
Turnover (after contract expires) is 40%
… A Solution
Three months before a contract expires, predict which customers will leave
If you want to keep a customer that is predicted to churn, offer them a new phone
The ones that are not predicted to churn need no attention
If you don’t want to keep the customer, do nothing
How can you predict future behavior?
Tarot Cards?
Magic Ball?
Data Mining?
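A back-of-the-envelope sketch of the economics behind this plan, using the figures from the problem statement ($250 commission, $125 phone, 40% turnover). The 1,000 expiring contracts and the predictor's hit rates are hypothetical assumptions made for illustration, not figures from the presentation, and the model of customer behavior is deliberately simplified.

```python
# Figures from the problem statement
COMMISSION = 250      # sales commission per contract, in dollars
PHONE_COST = 125      # cost of the free phone, in dollars
TURNOVER = 0.40       # share of customers who leave when the contract expires

# Hypothetical assumptions (not from the presentation)
EXPIRING = 1000       # contracts expiring in the period
RECALL = 0.80         # share of true churners the predictor flags
PRECISION = 0.50      # share of flagged customers who really are churners


def replacement_cost(lost_customers):
    """Cost of signing new customers to replace the ones who left."""
    return lost_customers * (COMMISSION + PHONE_COST)


def phone_for_everyone_cost():
    """Cost of a new phone for every expiring contract (assumes everyone then stays)."""
    return EXPIRING * PHONE_COST


def targeted_offer_cost():
    """Phones for flagged customers plus replacement of the churners that were missed.

    Simplification: every flagged churner who gets a phone is assumed to stay.
    """
    churners = EXPIRING * TURNOVER
    caught = churners * RECALL
    flagged = caught / PRECISION
    missed = churners - caught
    return flagged * PHONE_COST + replacement_cost(missed)


do_nothing = replacement_cost(EXPIRING * TURNOVER)
print(f"do nothing:         ${do_nothing:,.0f}")
print(f"phone for everyone: ${phone_for_everyone_cost():,.0f}")
print(f"targeted offer:     ${targeted_offer_cost():,.0f}")
```

Under these assumed rates the targeted offer is the cheapest of the three options; with a weaker predictor the ordering can change, which is why prediction quality matters.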
Still Skeptical?
The Definition
The automated extraction of predictive information from (large) databases
Key terms: Automated, Extraction, Predictive, Databases
History of Data Mining
Repetition in Solar Activity
1613 – Galileo Galilei
1859 – Heinrich Schwabe
The Return of the Halley Comet
Edmund Halley (1656 - 1742)
239 BC, 1531, 1607, 1682, 1910, 1986, 2061 ???
Data Mining is Not
Data warehousing
Ad-hoc query/reporting
Online Analytical Processing (OLAP)
Data visualization
Data Mining is
Automated extraction of predictive information from various data sources
Powerful technology with great potential to help users focus on the most important information stored in data warehouses or streamed through communication lines
Focus of this Presentation
Data Mining problem types
Data Mining models and algorithms
Efficient Data Mining
Available software
Data Mining Problem Types
6 types
Often a combination solves the problem
Data Description and Summarization
Aims at concise description of data characteristics
Lower end of scale of problem types
Provides the user an overview of the data structure
Typically a sub goal
Segmentation
Separates the data into interesting and meaningful subgroups or classes
Manual or (semi)automatic
A problem in itself or just a step in solving a problem
Classification
Assumption: existence of objects with characteristics that belong to different classes
Building classification models which assign correct labels in advance
Arises in a wide range of applications
Segmentation can provide labels or restrict data sets
Concept Description
Understandable description of concepts or classes
Close connection to both segmentation and classification
Similarities to and differences from classification
Prediction (Regression)
Similar to classification; the difference: the discrete target becomes continuous
Finds the numerical value of the target attribute for unseen objects
Dependency Analysis
Finding the model that describes significant dependencies between data items or events
Prediction of value of a data item
Special case: associations
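As a small illustration of the association special case, the sketch below computes the two usual measures for a rule A → B, support and confidence, over a toy list of market baskets. The basket contents and the function names are made up for this example.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = support(baskets, set(antecedent) | set(consequent))
    return both / support(baskets, antecedent)


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]
print(support(baskets, {"bread", "butter"}))       # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 2 of the 3 bread baskets
```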
Data Mining Models
7A AntiPasti
Analytics
Animation
Anecdotic
Advantages
Applications
AfterTea
Neural Networks
Characterizes processed data with a single numeric value
Efficient modeling of large and complex problems
Based on biological structures: neurons
Network consists of neurons grouped into layers
Neuron Functionality
[Diagram: inputs I1, I2, I3, …, In, weighted by W1, W2, W3, …, Wn, feed a function f that produces the output]
Output = f (W1*I1, W2*I2, …, Wn*In)
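A minimal sketch of a single neuron, assuming the common choice in which f sums the weighted inputs and passes the sum through a sigmoid. The slide does not specify f, so the summation, the sigmoid, and the sample numbers are all assumptions.

```python
import math


def neuron_output(inputs, weights):
    """Compute f(W1*I1, ..., Wn*In), with f assumed to be a sigmoid of the sum."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted = [w * i for w, i in zip(weights, inputs)]
    total = sum(weighted)
    return 1.0 / (1.0 + math.exp(-total))


# Three inputs with three weights; the result always lies between 0 and 1
print(neuron_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.1]))
```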
Training Neural Networks
Advantages? Applications? Anecdotes (Dark+Dark)?
Decision Trees
A way of representing a series of rules that lead to a class or value
Iterative splitting of data into discrete groups maximizing distance between them at each split
CHAID, CART, QUEST, C5.0
Classification trees and regression trees
Unlimited growth and stopping rules
Univariate splits and multivariate splits
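To make "maximizing distance between them at each split" concrete, here is a sketch that picks one univariate split threshold by minimizing weighted Gini impurity, the criterion used by CART-style trees. The toy balances and labels are invented, and other algorithms use different criteria (CHAID, for example, relies on chi-squared tests).

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, impurity) for the split `value <= threshold` with the lowest weighted Gini."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right side empty
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


balance = [5, 8, 10, 12, 20, 30]
churned = ["yes", "yes", "yes", "no", "no", "no"]
print(best_split(balance, churned))   # (10, 0.0): balance <= 10 separates the classes
```

A full tree builder would apply `best_split` recursively to each resulting group until a stopping rule fires.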
Decision Trees
Balance>10 Balance<=10
Age<=32 Age>32
Married=NO Married=YES
Decision Trees
Advantages? Applications? Anecdotes (CRO+Past)?
Rule Induction
Method of deriving a set of rules to classify cases
Creates independent rules that are unlikely to form a tree
Rules may not cover all possible situations
Rules may sometimes conflict in a prediction
Rule Induction
If balance>100.000 then confidence=HIGH & weight=1.7
If balance>25.000 and status=married then confidence=HIGH & weight=2.3
If balance<40.000 then confidence=LOW & weight=1.9
Advantages? Applications? Anecdotes (Teeth+Devor)
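The three example rules can be encoded directly. The sketch below assumes one plausible way to resolve conflicts: sum the weights per confidence label and take the larger total. The presentation does not say how conflicts are resolved, so that policy, the function names, and the reading of "100.000" as one hundred thousand are assumptions.

```python
# Each rule: (condition, confidence label, weight), transcribed from the examples above.
# Balances such as 100.000 are read as 100,000 (an assumption about the notation).
RULES = [
    (lambda c: c["balance"] > 100_000, "HIGH", 1.7),
    (lambda c: c["balance"] > 25_000 and c["status"] == "married", "HIGH", 2.3),
    (lambda c: c["balance"] < 40_000, "LOW", 1.9),
]


def classify(case):
    """Return the label with the largest total weight, or None if no rule fires."""
    totals = {}
    for condition, label, weight in RULES:
        if condition(case):
            totals[label] = totals.get(label, 0.0) + weight
    if not totals:
        return None   # rules need not cover every situation
    return max(totals, key=totals.get)


print(classify({"balance": 30_000, "status": "married"}))  # HIGH (2.3) outweighs LOW (1.9)
print(classify({"balance": 10_000, "status": "single"}))   # LOW
print(classify({"balance": 50_000, "status": "single"}))   # None: no rule applies
```

The first case shows two rules conflicting in a prediction, and the third shows a case that the rule set does not cover.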
K-nearest Neighbor and Memory-Based Reasoning (MBR)
Usage of knowledge of previously solved similar problems in solving the new problem
Assigning the new case to the class to which most of its k "neighbors" belong
First step – finding the suitable measure for distance between attributes in the data
How far is black from green?
+ Easy handling of non-standard data types
- Huge models
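A minimal k-nearest-neighbor sketch. The distance function is passed in as a parameter because choosing it is the first step; Euclidean distance and the toy data are assumptions made for the example, and a non-numeric attribute such as color would need its own distance function.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(memory, query, k=3, distance=euclidean):
    """Return the majority class among the k stored cases closest to `query`.

    `memory` is a list of (features, label) pairs. The whole training set is
    kept, which is why such models can grow very large.
    """
    nearest = sorted(memory, key=lambda case: distance(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


memory = [
    ((1.0, 1.0), "stays"),
    ((1.5, 2.0), "stays"),
    ((2.0, 1.5), "stays"),
    ((6.0, 6.5), "churns"),
    ((7.0, 6.0), "churns"),
]
print(knn_classify(memory, (1.2, 1.4)))   # stays
print(knn_classify(memory, (6.5, 6.0)))   # churns
```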
K-nearest Neighbor and Memory-Based Reasoning (MBR)
Advantages? Applications? Anecdotes (Predictions Based on Similarities)? E: Nd+1s, so(loMOTH), prof(badA>off), prof(abuA>fam)
Data Mining Models and Algorithms
Logistic regression
Discriminant analysis
Generalized Additive Models (GAM)
Genetic algorithms
Etc…
Many other available models and algorithms
Many application-specific variations of known models
Final implementation usually involves several techniques
Adding selected metadata (e.g., time for prediction)
E:27+sports(5+6+7+11+…+2+1)
Efficient Data Mining
[Flowchart: a humorous troubleshooting chart. Decision nodes: "Is It Working?", "Did You Mess With It?", "Anyone Else Knows?", "Will it Explode In Your Hands?", "Can You Blame Someone Else?". Outcomes: "Don't Mess With It!", "You Shouldn't Have!", "Look The Other Way", "Hide It", "You're in TROUBLE!", "NO PROBLEM!"]
DM Process Model
CRISP–DM – tends to become a standard
5A – used by SPSS Clementine (Assess, Access, Analyze, Act and Automate)
SEMMA – used by SAS Enterprise Miner (Sample, Explore, Modify, Model and Assess)
CRISP - DM
CRoss-Industry Standard Process for Data Mining
Conceived in 1996 by three companies:
CRISP – DM methodology
Four-level breakdown of the CRISP-DM methodology:
Phases
Generic Tasks
Specialized Tasks
Process Instances
Mapping generic models to specialized models
Analyze the specific context
Remove any details not applicable to the context
Add any details specific to the context
Specialize generic context according to the concrete characteristics of the context
Possibly rename generic contents to provide more explicit meanings
Generalized and Specialized Cooking
Generalized: Preparing food on your own
Find out what you want to eat
Find the recipe for that meal
Gather the ingredients
Prepare the meal
Enjoy your food
Clean up everything (or leave it for later)
Specialized:
Raw steak with vegetables?
Check the cookbook or call mom
Defrost the meat (if you had it in the fridge)
Buy missing ingredients or borrow them from the neighbors
Cook the vegetables and fry the meat
Enjoy your food or even more
You were cooking so convince someone else to do the dishes
CRISP – DM model
Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment
Business Understanding
Determine business objectives
Assess situation
Determine data mining goals
Produce project plan
Data Understanding
Collect initial data
Describe data
Explore data
Verify data quality
Data Preparation
Select data
Clean data
Construct data
Integrate data
Format data
Modeling
Select modeling technique
Generate test design
Build model
Assess model
Evaluation
Evaluate results
Review process
Determine next steps
results = models + findings
Deployment
Plan deployment
Plan monitoring and maintenance
Produce final report
Review project
At Last…
Available Software
14
Comparison of fourteen DM tools
• The Decision Tree products were
- CART
- Scenario
- See5
- S-Plus
• The Rule Induction tools were
- WizWhy
- DataMind
- DMSK
• Neural Networks were built from three programs
- NeuroShell2
- PcOLPARS
- PRW
• The Polynomial Network tools were
- ModelQuest Expert
- Gnosis
- a module of NeuroShell2
- KnowledgeMiner
Criteria for evaluating DM tools
A list of 20 criteria for evaluating DM tools, put into 4 categories:
• Capability measures what a desktop tool can do, and how well it does it
- Handles missing data
- Considers misclassification costs
- Allows data transformations
- Quality of testing options
- Has programming language
- Provides useful output reports
- Visualisation
Criteria for evaluating DM tools
• Learnability/Usability shows how easy a tool is to learn and use:
- Tutorials
- Wizards
- Easy to learn
- User’s manual
- Online help
- Interface
Criteria for evaluating DM tools
• Interoperability shows a tool’s ability to interface with other computer applications
- Importing data
- Exporting data
- Links to other applications
• Flexibility
- Model adjustment flexibility
- Customizable work environment
- Ability to write or change code
A classification of data sets
• Pima Indians Diabetes data set
– 768 cases of Native American women from the Pima tribe, some of whom are diabetic, most of whom are not
– 8 attributes plus the binary class variable for diabetes per instance
• Wisconsin Breast Cancer data set
– 699 instances of breast tumors, some of which are malignant, most of which are benign
– 10 attributes plus the binary malignancy variable per case
• The Forensic Glass Identification data set
– 214 instances of glass collected during crime investigations
– 10 attributes plus the multi-class output variable per instance
• Moon Cannon data set
– 300 solutions to the equation: x = 2v² sin(θ)cos(θ)/g
– the data were generated without adding noise
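A sketch of how such a noise-free data set could be generated, reading the formula as the projectile range with muzzle speed v, launch angle θ, and gravitational acceleration g. That reading, the lunar gravity constant, the sampling ranges, and the random seed are assumptions; the slide gives only the formula and the count of 300.

```python
import math
import random

LUNAR_GRAVITY = 1.62   # m/s^2, approximate value; an assumption, not from the slide


def cannon_range(v, theta, g=LUNAR_GRAVITY):
    """Horizontal distance for muzzle speed v (m/s) and launch angle theta (radians)."""
    return 2 * v ** 2 * math.sin(theta) * math.cos(theta) / g


def generate(n=300, seed=0):
    """Return n noise-free (v, theta, x) rows; the sampling ranges are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = rng.uniform(10.0, 100.0)
        theta = rng.uniform(0.0, math.pi / 2)
        rows.append((v, theta, cannon_range(v, theta)))
    return rows


data = generate()
print(len(data))   # 300
```

Because no noise is added, a tool that can represent the formula exactly should be able to fit these rows with essentially zero error, which makes the set a clean regression benchmark.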
Evaluation of fourteen DM tools
Conclusions
WWW.NBA.COM
Se7en