intelligent data analysis (ida) by josipa kern, phd andrija stampar school of public health medical...
Intelligent Data Analysis
(IDA)
by
Josipa Kern, PhD
Andrija Stampar School of Public Health
Medical School University of Zagreb
Zagreb, Croatia
Interest and Excitement for Intelligent Data Analysis
Decision making requires information and knowledge
Data processing can provide them
The multidimensionality of problems calls for methods of adequate, in-depth data processing and analysis
Learning Objectives
To understand the concept of IDA
To become familiar with web sites and literature on IDA
To become familiar with some tools for IDA
To learn how to use IDA tools and to validate the IDA results
Performance Objectives
Recognize problems that call for IDA
Prepare data and perform the analysis
Validate and interpret the results of IDA
IDA is…
… an interdisciplinary study concerned with the effective analysis of data;
… used for extracting useful information from large quantities of online data; extracting desirable knowledge or interesting patterns from existing databases;
IDA or …
Data mining
Knowledge acquisition from data
Genetic algorithm-based rule discovery
Knowledge discovery
Learning classifier system
Machine learning
etc.
IDA gives knowledge …
Knowledge is …
the distillation of information that has been collected, classified, organized, integrated, abstracted and value-added;
at a level of abstraction higher than the data and information on which it is based, and can be used to deduce new information and new knowledge;
usually in the context of human expertise used in solving problems.
Knowledge acquisition …
The process of eliciting, analyzing, transforming, classifying, organizing and integrating knowledge and representing that knowledge in a form that can be used in a computer system.
Knowledge in a domain can be expressed as a number of rules
Rule is …
A formal way of specifying a recommendation, directive, or strategy, expressed as "IF premise THEN conclusion" or "IF condition THEN action".
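The "IF premise THEN conclusion" form can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the IDA tools store rules internally: the `Rule` class and the `fires` method are hypothetical names, the premises are borrowed from Rule 1 in the See5 example later in this deck, and the case values are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, object]  # attribute name -> value


@dataclass
class Rule:
    """IF every premise holds for a case THEN the conclusion applies."""
    premises: List[Callable[[Case], bool]]
    conclusion: str

    def fires(self, case: Case) -> bool:
        # A rule fires only when all of its premises are satisfied.
        return all(premise(case) for premise in self.premises)


# Premises taken from Rule 1 of the See5 example in this deck.
rule = Rule(
    premises=[
        lambda c: c["gender"] == "M",
        lambda c: c["SBP"] > 111,
        lambda c: c["oil_fat"] > 2.9,
    ],
    conclusion="class 1",
)

# Hypothetical case, invented for illustration only.
case = {"gender": "M", "SBP": 130, "oil_fat": 3.1}
print(rule.conclusion if rule.fires(case) else "rule does not apply")
```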
How to discover rules hidden in the data?
Some tools for IDA …
See5 - program for analyzing data and generating classifiers in the form of decision trees and/or rule sets.
http://www.rulequest.com
Some tools for IDA …
Cubist - analyzes data and generates rule-based piecewise linear models – collections of rules, each with an associated linear expression for computing a target value.
http://www.rulequest.com
Some tools for IDA …
ILLM - the tool constructs classification models in the form of rules which represent knowledge about relations hidden in data.
http://dms.irb.hr
Some tools for IDA …
Magnum Opus - finds association rules providing competitive advantage by revealing underlying interactions between factors within the data.
http://www.rulequest.com
Evaluation of IDA results
Absolute & relative accuracy
Sensitivity & specificity
False positive & false negative
Error rate
Reliability of rules
Etc.
Example of IDA
Illustration of IDA by using See5
See5…application…
application.names - lists the classes to which cases may belong and the attributes used to describe each case.
Attributes are of two types: discrete attributes have a value drawn from a set of possibilities, and continuous attributes have numeric values.
See5…application…
application.data - provides information on the training cases from which See5 will extract patterns.
The entry for each case consists of one or more lines that give the values for all attributes.
See5…application…
application.test - provides information on the test cases (used for evaluation of results).
The entry for each case consists of one or more lines that give the values for all attributes.
See5…application…example…
Epidemiological study (1970-1990)
Sample of examinees who died from cardiovascular diseases during the period
Question: Did they know they were ill?
1 – they were healthy
2 – they were ill (drug treatment, positive clinical and laboratory findings)
See5…application…example…
application.names – example
Goal.
gender: M,F
activity: 1,2,3
age: continuous
smoking: No,Yes
…
Goal: 1,2
…
See5…application…example…
application.data – example
M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,15979,?,?,?,1,73,2.5
M,1,66,Yes,0,0,0,0,132,81,183,239,?,783,14403,27221,19153,23187,1,73,2.6
M,1,61,No,0,0,0,0,130,79,148,86,209,115,21719,12324,10593,11458,1,74,2.5
… …
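A minimal sketch of reading such lines in Python, assuming that values are separated by commas and that "?" marks a missing value, as in the lines above. The helper `parse_case` is a hypothetical name, not part of See5, and the sketch ignores anything else a real data file might contain (comments, cases spanning several lines).

```python
from typing import List, Optional


def parse_case(line: str) -> List[Optional[str]]:
    """Split one comma-separated case; '?' becomes None (missing value)."""
    values = [v.strip() for v in line.split(",")]
    return [None if v == "?" else v for v in values]


# The three example cases shown on the slide.
lines = [
    "M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,15979,?,?,?,1,73,2.5",
    "M,1,66,Yes,0,0,0,0,132,81,183,239,?,783,14403,27221,19153,23187,1,73,2.6",
    "M,1,61,No,0,0,0,0,130,79,148,86,209,115,21719,12324,10593,11458,1,74,2.5",
]

cases = [parse_case(line) for line in lines]
for case in cases:
    missing = sum(value is None for value in case)
    print(f"{len(case)} values, {missing} missing")
```

Each of the three cases has 21 values; counting the missing ones per case is a quick check before handing the file to an analysis tool.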
See5…application…example…
Results – example
Rule 1: (cover 26)
gender = M
SBP > 111
oil_fat > 2.9
-> class 1 [0.929]
See5…application…example…
Results – example
Rule 4: (cover 14)
smoking = Yes
SBP > 131
glucose > 93
glucose <= 118
oil_fat <= 2.9
-> class 2 [0.938]
See5…application…example…
Results – example
Rule 15: (cover 2)
SBP <= 111
oil_fat > 2.9
-> class 2 [0.750]
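The number in square brackets is the rule's confidence. A sketch of how it relates to the cover, assuming the Laplace ratio (n - m + 1) / (n + 2) described in the See5 documentation, where n is the number of training cases covered by the rule and m is the number of those that do not belong to the predicted class. The slides do not show m, so the error counts below are assumptions chosen because they reproduce the printed values.

```python
def laplace_confidence(cover: int, errors: int) -> float:
    """Laplace ratio (n - m + 1) / (n + 2) for a rule covering n cases, m of them wrongly."""
    return (cover - errors + 1) / (cover + 2)


# (rule, cover from the slides, ASSUMED number of covered cases of another class)
rules = [("Rule 1", 26, 1), ("Rule 4", 14, 0), ("Rule 15", 2, 0)]

for name, cover, errors in rules:
    print(f"{name}: cover {cover}, confidence {laplace_confidence(cover, errors):.3f}")
```

Under these assumptions the ratios are 26/28, 15/16 and 3/4, which agree with the bracketed 0.929, 0.938 and 0.750. Note how Rule 15, with a cover of only 2, gets a modest confidence even if it makes no errors.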
See5…application…example…
Results – example
Evaluation on training data (199 cases):
 (a)   (b)    <-classified as
----  ----
 107     3    (a): class 1
  17    72    (b): class 2
See5…application…example…
Results – example (training set)
Sensitivity=0.97
Specificity=0.81
See5…application…example…
Results – example
Evaluation on test data (73 cases):
 (a)   (b)    <-classified as
----  ----
  43     1    (a): class 1
   3    26    (b): class 2
See5…application…example…
Results – example (test set)
Sensitivity=0.98
Specificity=0.90
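The reported values can be recomputed from the two See5 tables. The sketch below assumes that class 1 is treated as the "positive" class, which is the reading that matches the sensitivity and specificity printed on the slides; the function name `evaluate` is hypothetical.

```python
from typing import Dict


def evaluate(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Basic measures from a 2x2 table (rows: true class, columns: predicted class)."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # true class 1 classified as class 1
        "specificity": tn / (tn + fp),   # true class 2 classified as class 2
        "accuracy": (tp + tn) / total,
        "error_rate": (fn + fp) / total,
    }


# Counts copied from the See5 output; class 1 is ASSUMED to be the positive class.
training = evaluate(tp=107, fn=3, fp=17, tn=72)   # 199 cases
test = evaluate(tp=43, fn=1, fp=3, tn=26)         # 73 cases

for name, result in [("training", training), ("test", test)]:
    summary = ", ".join(f"{key}={value:.2f}" for key, value in result.items())
    print(f"{name}: {summary}")
```

Comparing the training and test figures side by side is the point of keeping a separate test set: measures computed only on training data tend to be optimistic.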
All the suggested IDA tools are available at the mentioned URLs, at least as demo versions
Try your own IDA…
Thank you!