Post on 25-Dec-2015

Intelligent Data Analysis

(IDA)

by

Josipa Kern, PhD

Andrija Stampar School of Public Health

Medical School University of Zagreb

Zagreb, Croatia

Interest and Excitement for Intelligent Data Analysis

Decision making requires information and knowledge

Data processing can provide them

Multidimensional problems call for methods of adequate, in-depth data processing and analysis

Learning Objectives

To understand the concept of IDA

To become acquainted with websites and literature on IDA

To become acquainted with some tools for IDA

To learn how to use IDA tools and to validate the IDA results

Performance Objectives

Recognize problems that call for IDA

Prepare data and carry out the analysis

Validate and interpret the results of IDA

IDA is…

… an interdisciplinary study concerned with the effective analysis of data;

… used for extracting useful information from large quantities of online data, and for extracting desirable knowledge or interesting patterns from existing databases.

IDA or …

Data mining

Knowledge acquisition from data

Genetic algorithm-based rule discovery

Knowledge discovery

Learning classifier system

Machine learning

etc.

IDA gives knowledge …

Knowledge is …

the distillation of information that has been collected, classified, organized, integrated, abstracted and value-added;

at a level of abstraction higher than the data and information on which it is based, and usable to deduce new information and new knowledge;

usually in the context of human expertise used in solving problems.

Knowledge acquisition …

The process of eliciting, analyzing, transforming, classifying, organizing and integrating knowledge and representing that knowledge in a form that can be used in a computer system.

Knowledge in a domain can be expressed as a number of rules

Rule is …

A formal way of specifying a recommendation, directive, or strategy, expressed as "IF premise THEN conclusion" or "IF condition THEN action".
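
As a rough illustration of the IF-THEN form, a rule can be held as a premise (a test on one case) paired with a conclusion. The sketch below is only an assumption for illustration: the `Rule` class, the `fever_rule` example and its attribute names are hypothetical and do not come from any particular IDA tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Case = Dict[str, Any]  # one case: attribute name -> value


@dataclass
class Rule:
    """IF premise THEN conclusion (hypothetical helper, for illustration)."""
    premise: Callable[[Case], bool]
    conclusion: str

    def apply(self, case: Case) -> Optional[str]:
        # Return the conclusion when the premise holds, otherwise None.
        return self.conclusion if self.premise(case) else None


# Hypothetical rule: IF temperature > 38.0 AND cough = Yes THEN "see a doctor"
fever_rule = Rule(
    premise=lambda c: c["temperature"] > 38.0 and c["cough"] == "Yes",
    conclusion="see a doctor",
)

fired = fever_rule.apply({"temperature": 39.1, "cough": "Yes"})
not_fired = fever_rule.apply({"temperature": 36.8, "cough": "Yes"})
print(fired, not_fired)
```

A rule set is then just a list of such objects, tried in turn on each case.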

How to discover rules hidden in the data?

Some tools for IDA …

See5 - program for analyzing data and generating classifiers in the form of decision trees and/or rule sets.

http://www.rulequest.com

Some tools for IDA …

Cubist - analyzes data and generates rule-based piecewise linear models – collections of rules, each with an associated linear expression for computing a target value.

http://www.rulequest.com

Some tools for IDA …

ILLM - a tool that constructs classification models in the form of rules that represent knowledge about relations hidden in data.

http://dms.irb.hr

Some tools for IDA …

Magnum Opus - finds association rules, providing a competitive advantage by revealing underlying interactions between factors within the data.

http://www.rulequest.com

Evaluation of IDA results

Absolute & relative accuracy

Sensitivity & specificity

False positive & false negative

Error rate

Reliability of rules

Etc.
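
For a two-class problem, several of these measures are usually computed from the counts of true positives (TP), false negatives (FN), false positives (FP) and true negatives (TN). The definitions below are the standard ones, given here as a reminder; which class counts as "positive" is a convention that has to be stated together with the results.

```latex
\begin{align*}
\text{accuracy}    &= \frac{TP + TN}{TP + FN + FP + TN} \\
\text{error rate}  &= 1 - \text{accuracy} \\
\text{sensitivity} &= \frac{TP}{TP + FN} \\
\text{specificity} &= \frac{TN}{TN + FP}
\end{align*}
```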

Example of IDA

Illustration of IDA by using See5

See5…application…

application.names - lists the classes to which cases may belong and the attributes used to describe each case.

Attributes are of two types: discrete attributes have a value drawn from a set of possibilities, and continuous attributes have numeric values.

See5…application…

application.data - provides information on the training cases from which See5 will extract patterns.

The entry for each case consists of one or more lines that give the values for all attributes.

See5…application…

application.test - provides information on the test cases (used for evaluation of results).

The entry for each case consists of one or more lines that give the values for all attributes.
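
One simple way to obtain the two files is a random hold-out split of the available cases. The sketch below assumes one case per line and a fixed random seed. The function name `split_cases`, the seed and the placeholder cases are hypothetical, and the talk does not say how its own training and test sets were formed; the 272 placeholder cases merely match the 199 + 73 cases of the example later in the talk.

```python
import random
from typing import List, Tuple


def split_cases(cases: List[str], n_test: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly set aside n_test cases for testing; the rest are for training."""
    if not 0 <= n_test <= len(cases):
        raise ValueError("n_test must be between 0 and the number of cases")
    shuffled = cases[:]                     # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder cases standing in for real comma-separated case lines.
cases = [f"case-{i}" for i in range(272)]
train, test = split_cases(cases, n_test=73)

# The two lists would then be written to application.data and application.test.
print(len(train), len(test))
```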

See5…application…example…

Epidemiological study (1970-1990)

Sample of examinees who died from cardiovascular diseases during the period

Question: Did they know they were ill?

1 – they were healthy

2 – they were ill (drug treatment, positive clinical and laboratory findings)

See5…application…example…

application.names – example

Goal.

gender: M,F

activity: 1,2,3

age: continuous

smoking: No,Yes

…

Goal: 1,2

…
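
To make the layout concrete, the attribute lines of this example, laid out one per line, can be read into a small dictionary. This is a minimal sketch under stated assumptions: `parse_names` is a hypothetical helper that handles only what the slide shows (a target line, discrete value lists and `continuous`), and the real See5 file format has more features than it covers. The elided lines (`…`) are skipped, not filled in.

```python
from typing import Dict, List, Tuple, Union

AttributeType = Union[str, List[str]]  # "continuous" or a list of discrete values


def parse_names(text: str) -> Tuple[str, Dict[str, AttributeType]]:
    """Parse a simplified .names layout: target line first, then 'name: values'."""
    target = ""
    attributes: Dict[str, AttributeType] = {}
    for raw in text.splitlines():
        line = raw.strip().rstrip(".")       # the sample ends the target line with '.'
        if not line or line == "…":          # skip blanks and elided lines
            continue
        if ":" not in line:
            target = line                    # e.g. "Goal"
            continue
        name, values = (part.strip() for part in line.split(":", 1))
        if values == "continuous":
            attributes[name] = "continuous"
        else:
            attributes[name] = [v.strip() for v in values.split(",")]
    return target, attributes


sample = """Goal.
gender: M,F
activity: 1,2,3
age: continuous
smoking: No,Yes
…
Goal: 1,2
…"""

target, attributes = parse_names(sample)
print(target, attributes)
```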

See5…application…example…

application.data – example

M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,15979,?,?,?,1,73,2.5

M,1,66,Yes,0,0,0,0,132,81,183,239,?,783,14403,27221,19153,23187,1,73,2.6

M,1,61,No,0,0,0,0,130,79,148,86,209,115,21719,12324,10593,11458,1,74,2.5

… …
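
The three cases shown above can be read with a few lines of code. The sketch assumes that fields are comma-separated and that `?` marks an unknown value, as it appears to in the sample; `read_cases` is a hypothetical helper, and because the attribute list in the example is elided, the columns are left unnamed.

```python
import csv
import io
from typing import List, Optional

MISSING = "?"  # marker used for unknown values in the sample cases


def read_cases(text: str) -> List[List[Optional[str]]]:
    """Read comma-separated cases, mapping '?' to None."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue                          # skip blank lines
        rows.append([None if f.strip() == MISSING else f.strip() for f in fields])
    return rows


sample = """M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,15979,?,?,?,1,73,2.5
M,1,66,Yes,0,0,0,0,132,81,183,239,?,783,14403,27221,19153,23187,1,73,2.6
M,1,61,No,0,0,0,0,130,79,148,86,209,115,21719,12324,10593,11458,1,74,2.5
"""

cases = read_cases(sample)
missing_per_case = [row.count(None) for row in cases]
print(len(cases), [len(row) for row in cases], missing_per_case)
```

Counting the missing values per case is a quick check worth doing before analysis, since a case with many unknowns contributes little to rule discovery.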

See5…application…example…

Results – example

Rule 1: (cover 26)
    gender = M
    SBP > 111
    oil_fat > 2.9
    -> class 1 [0.929]

See5…application…example…

Results – example

Rule 4: (cover 14)
    smoking = Yes
    SBP > 131
    glucose > 93
    glucose <= 118
    oil_fat <= 2.9
    -> class 2 [0.938]

See5…application…example…

Results – example

Rule 15: (cover 2)
    SBP <= 111
    oil_fat > 2.9
    -> class 2 [0.750]
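
The bracketed number after each class is the rule's confidence. RuleQuest's See5 documentation describes it as the Laplace ratio (n - m + 1)/(n + 2), where n is the number of training cases the rule covers and m is the number of those that do not belong to the predicted class. The output shown here gives only n (the cover), so the values of m in the sketch below are inferred from the printed confidences and should be read as assumptions.

```python
def laplace_confidence(n: int, m: int) -> float:
    """Laplace ratio (n - m + 1) / (n + 2) for a rule covering n cases, m of them wrongly."""
    if n < 0 or not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    return (n - m + 1) / (n + 2)


# (cover n, assumed m, confidence printed in the results above)
rules = {
    "Rule 1": (26, 1, 0.929),
    "Rule 4": (14, 0, 0.938),
    "Rule 15": (2, 0, 0.750),
}

for name, (n, m, printed) in rules.items():
    print(f"{name}: computed {laplace_confidence(n, m):.4f}, printed {printed:.3f}")
```

Under these assumptions Rule 15 illustrates why the ratio is used: a rule covering only two cases gets a confidence of 0.750 even when both are classified correctly.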

See5…application…example…

Results – example

Evaluation on training data (199 cases):

     (a)   (b)    <-classified as
    ----  ----
     107     3    (a): class 1
      17    72    (b): class 2

See5…application…example…

Results – example (training set) 

Sensitivity=0.97

Specificity=0.81

See5…application…example…

Results – example

Evaluation on test data (73 cases):

     (a)   (b)    <-classified as
    ----  ----
      43     1    (a): class 1
       3    26    (b): class 2

See5…application…example…

Results – example (test set)

Sensitivity=0.98

Specificity=0.90
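
The reported sensitivity and specificity can be recomputed from the two confusion matrices shown earlier. The sketch below assumes that class 1, row (a), is treated as the positive class; this is inferred from the fact that it reproduces the reported figures, and the `evaluate` helper is hypothetical. Accuracy and error rate are not given in the talk and are derived here from the same counts.

```python
from typing import Dict, List


def evaluate(matrix: List[List[int]]) -> Dict[str, float]:
    """Metrics from a 2x2 matrix [[a->a, a->b], [b->a, b->b]] with class (a) as positive."""
    (tp, fn), (fp, tn) = matrix
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


training = evaluate([[107, 3], [17, 72]])   # 199 training cases
test = evaluate([[43, 1], [3, 26]])         # 73 test cases

for label, result in (("training", training), ("test", test)):
    print(label, {name: round(value, 2) for name, value in result.items()})
```

From the same counts, 20 of the 199 training cases and 4 of the 73 test cases are misclassified, giving error rates of about 10% and 5.5%.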

All the suggested IDA tools are available at the mentioned URLs, at least as a demo version

Try your own IDA…

Thank you!
