
HASAR: Mining Sequential Association Rules for Atherosclerosis Risk Factor Analysis

Laurent Brisson, Nicolas Pasquier, Céline Hebert, Martine Collard

I3S Laboratory, University of Nice-Sophia Antipolis
GREYC Laboratory, University of Caen

Contents

1. Analytic question & Objectives

2. Model & Data Preparation

3. Algorithms

4. Results

Analytic Question

Are there any differences in the development of risk factors and other characteristics between men of the risk group who came down with the observed cardiovascular diseases and those who stayed healthy?

Objectives

Evolution of risk factors according to behavioural changes

• Groups RG versus PG and NG

• Healthy patients (NCVD) versus those with cardiovascular diseases (CVD)

• Groups based on patient education level and job

Sequential Rules

IDE_itemset BEH_time_itemset RF_time_item

• IDE_itemset : static identification attributes

  - Age of the patient
  - Educational level of the patient
  - Alcohol consumption at the beginning of the study

Sequential Rules

IDE_itemset BEH_time_itemset RF_time_item

• BEH_time_itemset : behavioural change attributes

  - Cigarette consumption per day
  - Physical activity after work
  - Physical activity at work
  - Different kinds of diet
  - Medication for cholesterol
  - Medication for blood pressure

Sequential Rules

IDE_itemset BEH_time_itemset RF_time_item

• RF_time_item : risk factor change attribute

  - Cholesterol level
  - HDL cholesterol level
  - LDL cholesterol level
  - Triglyceride level
  - Obesity
  - …
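The three parts can be pictured as one rule record. The Python sketch below is only an illustration under assumptions the slides do not state: the rule is read as "IDE_itemset and BEH_time_itemset imply RF_time_item", items are (attribute, value) pairs, and time is encoded in the attribute names of a one-row-per-patient table. All attribute names (`AGE`, `BEH_SMOKING_control2`, `RF_OBESITY_control3`) and values are hypothetical.

```python
from dataclasses import dataclass

# An item is an (attribute, value) pair; attribute names used below are hypothetical.
Item = tuple[str, str]


@dataclass(frozen=True)
class SequentialRule:
    """Sketch of IDE_itemset + BEH_time_itemset => RF_time_item (assumed reading)."""
    ide_itemset: frozenset[Item]       # static identification attributes
    beh_time_itemset: frozenset[Item]  # behavioural changes, time encoded in the name
    rf_time_item: Item                 # a single risk-factor change

    def antecedent_holds(self, patient: dict[str, str]) -> bool:
        # Every IDE and BEH item must match the patient's record.
        items = self.ide_itemset | self.beh_time_itemset
        return all(patient.get(attr) == value for attr, value in items)

    def consequent_holds(self, patient: dict[str, str]) -> bool:
        attr, value = self.rf_time_item
        return patient.get(attr) == value


rule = SequentialRule(
    ide_itemset=frozenset({("AGE", "45-50")}),
    beh_time_itemset=frozenset({("BEH_SMOKING_control2", "decreased")}),
    rf_time_item=("RF_OBESITY_control3", "decreased"),
)
patient = {
    "AGE": "45-50",
    "BEH_SMOKING_control2": "decreased",
    "RF_OBESITY_control3": "stay_high",
}
print(rule.antecedent_holds(patient), rule.consequent_holds(patient))  # True False
```

Keeping the risk factor as a single item, rather than an itemset, mirrors the slide's naming (RF_time_item versus the two itemsets).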

Model

IDE_itemset BEH_time_itemset RF_time_item

• Action period: a period containing at least one control

• Latency period: a waiting time before effects are observed

• Observation period: a period containing exactly one control
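The constraints on the three periods can be checked mechanically. This Python sketch is a hypothetical reading of the model: periods are inclusive bounds expressed in years since study entry, and the observation period is assumed to open only once the latency period has elapsed after the action period. The class name `TimeModel` and the control dates are illustrative, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeModel:
    """Hypothetical encoding of the three periods, in years since study entry."""
    action: tuple[int, int]       # inclusive bounds of the action period
    latency: int                  # waiting time before effects are observed
    observation: tuple[int, int]  # inclusive bounds of the observation period


def controls_in(window: tuple[int, int], control_times: list[int]) -> list[int]:
    start, end = window
    return [t for t in control_times if start <= t <= end]


def is_valid(model: TimeModel, control_times: list[int]) -> bool:
    a_start, a_end = model.action
    o_start, o_end = model.observation
    if a_start > a_end or o_start > o_end or model.latency < 0:
        return False
    # Assumption: the observation period opens only once the latency has elapsed.
    if o_start < a_end + model.latency:
        return False
    # Action period: at least one control; observation period: exactly one.
    return (len(controls_in(model.action, control_times)) >= 1
            and len(controls_in(model.observation, control_times)) == 1)


controls = [0, 1, 3, 6]  # hypothetical control dates for one patient
print(is_valid(TimeModel((0, 1), 2, (3, 4)), controls))  # True
print(is_valid(TimeModel((0, 1), 2, (3, 6)), controls))  # False: two controls observed
```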

Data Preparation: creation of change variables

At each control, h = Entry.height and w = Control.weight; w / h² is the body-mass index, and * stands for any value.

BEH_OBESITY   id   Control N          Control N+1
Unknown       -2   h or w = *         h or w = Unknown
Unknown       -2   h or w = Unknown   h or w = *
Stay_normal    1   w / h² <= 25       w / h² <= 25
Decreased      2   w / h² > 25        w / h² <= 25
Increased      3   w / h² <= 25       w / h² > 25
Stay_high      4   w / h² > 25        w / h² > 25
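The BEH_OBESITY coding is a small decision table and can be sketched directly. In this Python sketch the codes and the threshold of 25 come from the table; the units (metres and kilograms), the use of `None` for a missing value, and the function names are assumptions made for illustration.

```python
from typing import Optional

# Codes taken from the BEH_OBESITY table.
UNKNOWN, STAY_NORMAL, DECREASED, INCREASED, STAY_HIGH = -2, 1, 2, 3, 4
THRESHOLD = 25.0  # w / h² threshold from the table


def is_high(height: float, weight: float) -> bool:
    # Assumed units: height in metres, weight in kilograms.
    return weight / height ** 2 > THRESHOLD


def obesity_change(height: Optional[float],
                   weight_n: Optional[float],
                   weight_n1: Optional[float]) -> int:
    """Change code between control N and control N+1 (height from Entry.height)."""
    if height is None or weight_n is None or weight_n1 is None:
        return UNKNOWN  # h or w unknown at either control
    high_n = is_high(height, weight_n)
    high_n1 = is_high(height, weight_n1)
    if not high_n and not high_n1:
        return STAY_NORMAL
    if high_n and not high_n1:
        return DECREASED
    if not high_n and high_n1:
        return INCREASED
    return STAY_HIGH


print(obesity_change(1.80, 70.0, 90.0))  # 3 (Increased)
print(obesity_change(1.80, None, 90.0))  # -2 (Unknown)
```

Note that a value of exactly 25 counts as normal, matching the "<= 25" cells of the table.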

Data Preparation: flattening operation

Initial table: one row per control

Columns: ID Control | ID Patient | BEH 1 … BEH k | RF 1 … RF m

Data Preparation: flattening operation

Flattened table: one row per patient

Columns: ID Patient | IDE 1 … IDE j (static attributes) | BEH 1 … BEH k, RF 1 … RF m (control 1) | … | BEH 1 … BEH k, RF 1 … RF m (control n)
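The flattening operation is essentially a pivot from one row per control to one row per patient. The Python sketch below shows the idea with plain dictionaries; the input layout (an `id_patient` key, a `control` rank, static attributes supplied separately) and the generated column names such as `BEH_SMOKING_control1` are assumptions for illustration, not the authors' data format.

```python
from collections import defaultdict
from typing import Any


def flatten(control_rows: list[dict[str, Any]],
            static_attrs: dict[str, dict[str, Any]],
            change_attrs: list[str]) -> dict[str, dict[str, Any]]:
    """Turn one-row-per-control data into one row per patient.

    control_rows: rows with 'id_patient', 'control' (rank of the control)
    and the BEH/RF change attributes.
    static_attrs: IDE attributes keyed by patient id.
    """
    flat: dict[str, dict[str, Any]] = defaultdict(dict)
    for pid, ide in static_attrs.items():
        flat[pid].update(ide)  # static attributes appear once per patient
    for row in control_rows:
        pid, n = row["id_patient"], row["control"]
        for attr in change_attrs:
            # One column per (attribute, control) pair.
            flat[pid][f"{attr}_control{n}"] = row.get(attr)
    return dict(flat)


rows = [
    {"id_patient": "p1", "control": 1, "BEH_SMOKING": 1, "RF_OBESITY": 3},
    {"id_patient": "p1", "control": 2, "BEH_SMOKING": 2, "RF_OBESITY": 4},
]
table = flatten(rows, {"p1": {"AGE": "45-50"}}, ["BEH_SMOKING", "RF_OBESITY"])
print(table["p1"])
```

Patients with fewer controls simply lack the later columns in this sketch; a real pipeline would have to decide how to represent those gaps.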

Evolutionary Approach

A Genetic Algorithm searching for temporal rules

Fixed-length chromosome

Identification | Behaviours | Risk factor

Evolutionary Approach

A gene for each static identification attribute

IDE 1 | … | IDE j | Behaviours | Risk factor

Evolutionary Approach

A gene for each kind of behavioural change

Identification | BEH 1 | … | BEH k (action period) | Risk factor

Evolutionary Approach

One gene to describe a risk factor

Identification | Behaviours (action period) | RF i (observation period)

Evolutionary Approach

Identification | Behaviours (action period) | latency period | RF i (observation period)
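A fixed-length chromosome can be sketched as one gene per IDE attribute, one gene per BEH attribute, a single risk-factor gene and the period bounds. In this hypothetical Python encoding a `None` gene means "attribute not used in the rule", which is one common way to keep the length fixed while letting rules vary in size; the slides do not say how HASAR marks absent attributes, and all names and values here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chromosome:
    """Hypothetical fixed-length genome: Identification | Behaviours | Risk factor."""
    ide: list[Optional[str]]      # one gene per IDE attribute; None = not in the rule
    beh: list[Optional[str]]      # one gene per BEH attribute (action period)
    rf: tuple[int, str]           # (index of RF i, change value), observation period
    action: tuple[int, int]       # action period bounds
    latency: int                  # latency period length
    observation: tuple[int, int]  # observation period bounds

    def decode(self, ide_names: list[str], beh_names: list[str],
               rf_names: list[str]) -> str:
        if len(self.ide) != len(ide_names) or len(self.beh) != len(beh_names):
            raise ValueError("gene count must match the attribute lists")
        a0, a1 = self.action
        o0, o1 = self.observation
        antecedent = [f"{n}={v}" for n, v in zip(ide_names, self.ide) if v is not None]
        antecedent += [f"{n}={v}@[{a0},{a1}]"
                       for n, v in zip(beh_names, self.beh) if v is not None]
        rf_index, rf_value = self.rf
        return (" & ".join(antecedent)
                + f" => {rf_names[rf_index]}={rf_value}@[{o0},{o1}]"
                + f" (latency {self.latency})")


c = Chromosome(ide=["45-50", None], beh=["decreased", None], rf=(0, "decreased"),
               action=(0, 2), latency=1, observation=(3, 4))
print(c.decode(["AGE", "EDUCATION"], ["BEH_SMOKING", "BEH_DIET"], ["RF_OBESITY"]))
```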

Fitness function: support × confidence × lift
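The fitness is the product of three standard rule-quality measures. The Python sketch below assumes the usual association-rule definitions (support = P(A and C), confidence = P(C | A), lift = confidence / P(C), with A the antecedent and C the consequent) computed from patient counts; the slides only name the product, so these definitions and the zero-count convention are assumptions.

```python
def rule_fitness(n_patients: int, n_antecedent: int,
                 n_consequent: int, n_both: int) -> float:
    """support * confidence * lift, using the usual association-rule definitions."""
    if n_patients == 0 or n_antecedent == 0 or n_consequent == 0:
        return 0.0  # measures undefined: treat the rule as worthless (assumption)
    support = n_both / n_patients                     # P(A and C)
    confidence = n_both / n_antecedent                # P(C | A)
    lift = confidence / (n_consequent / n_patients)   # P(C | A) / P(C)
    return support * confidence * lift


print(rule_fitness(100, 20, 25, 10))  # 0.1
```

Multiplying the three measures rewards rules that are at the same time frequent, reliable and more informative than the consequent's base rate.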

Genetic Algorithm Optimization

• A CLOSE-based approach for initialization

• The CLOSE algorithm improves:

  - extraction efficiency, by reducing the search space (use of generators and frequent closed itemsets)
  - result relevance, by suppressing redundant rules (generation of rule bases)
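The two notions named above, generators and frequent closed itemsets, can be illustrated with a simplified levelwise search. In this Python sketch an itemset is kept as a generator only if it is frequent and its support is strictly lower than that of each immediate subset (so it is not contained in a subset's closure), and the frequent closed itemsets are obtained as the closures of the frequent generators. It is a naive in-memory illustration that rescans the data for every support count, not the authors' CLOSE implementation, and it does not show how the results are used to seed the genetic algorithm.

```python
from typing import Iterable

Itemset = frozenset[str]


def support(itemset: Itemset, transactions: list[Itemset]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset: Itemset, transactions: list[Itemset]) -> Itemset:
    """Items shared by every transaction containing the itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return itemset
    return frozenset.intersection(*covering)


def frequent_closed_itemsets(data: Iterable[Iterable[str]],
                             minsup: int) -> dict[Itemset, int]:
    transactions = [frozenset(t) for t in data]
    items = sorted(set().union(*transactions))
    empty: Itemset = frozenset()
    generators = {empty: len(transactions)}
    level = [empty]
    while level:
        # Candidates: extend each generator by one item; every immediate
        # subset must itself be a frequent generator (Apriori-style pruning).
        candidates = {g | {i} for g in level for i in items if i not in g}
        candidates = {c for c in candidates
                      if all(c - {x} in generators for x in c)}
        level = []
        for c in candidates:
            s = support(c, transactions)
            # Keep c only if frequent and not in the closure of any subset,
            # i.e. its support is strictly lower than every subset's support.
            if s >= minsup and all(s < generators[c - {x}] for x in c):
                generators[c] = s
                level.append(c)
    # Each frequent closed itemset is the closure of a frequent generator.
    return {closure(g, transactions): s
            for g, s in generators.items() if s >= minsup}


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
print(frequent_closed_itemsets(data, minsup=2))
```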

Results: patient classes comparison

• Best rules on PG versus NG and RG

Results: patient classes comparison

• Best rules on CVD versus NCVD

Results: initialization methods

• Comparison on the RG group

Conclusion

• Different tendencies among groups

• Confirmation of prior medical knowledge

• Contradictions with some "assumptions"

• Further investigations with the assistance of medical experts

Future Research

• To analyse relationships between time windows and various risk factors

• To develop new evaluation criteria

• To integrate physicians' prior knowledge

• To apply the HASAR approach to other temporal datasets