
HASAR: Mining Sequential Association Rules for Atherosclerosis Risk Factor Analysis

Laurent Brisson, Nicolas Pasquier, Céline Hebert, Martine Collard

I3S Laboratory, University of Nice-Sophia Antipolis

GREYC Laboratory, University of Caen

Contents

1. Analytic question & Objectives

2. Model & Data Preparation

3. Algorithms

4. Results

Analytic Question

Are there any differences in the development of risk factors and other characteristics between men of the risk group who came down with the observed cardiovascular diseases and those who stayed healthy?

Objectives

Evolution of risk factors according to behavioural changes

• Groups RG versus PG and NG

• Healthy patients (NCVD) versus those with cardiovascular diseases (CVD)

• Groups based on patient education level and job

Sequential Rules

IDE_itemset ∧ BEH_time_itemset → RF_time_item

• IDE_itemset: static identification attributes
    - Age of the patient
    - Educational level of the patient
    - Alcohol consumption at the beginning of the study

• BEH_time_itemset: behavioural change attributes
    - Consumption of cigarettes per day
    - Physical activity after job
    - Physical activity in a job
    - Different kinds of diet
    - Medicine for cholesterol
    - Medicine for blood pressure

• RF_time_item: risk factor change attribute
    - Cholesterol level
    - HDL cholesterol level
    - LDL cholesterol level
    - Triglycerides level
    - Obesity
    - …
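As a rough illustration of the rule shape, the three parts could be held in a small Python structure. This is only a sketch: the class name `SequentialRule`, its fields and the attribute names and values in the example are hypothetical, and it assumes that the identification and behaviour itemsets form the condition while the single risk factor item is what the rule predicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequentialRule:
    """Hypothetical container for one rule; items are (attribute, value) pairs."""
    ide_itemset: frozenset  # static identification items
    beh_itemset: frozenset  # behavioural change items
    rf_item: tuple          # the single risk factor change item

    def condition_holds(self, patient_items):
        # The condition holds when every identification and behaviour item is present.
        return (self.ide_itemset | self.beh_itemset) <= patient_items

    def matches(self, patient_items):
        # The whole rule matches when the condition holds and the risk factor item is present.
        return self.condition_holds(patient_items) and self.rf_item in patient_items


# Illustrative values only; they are not taken from the study data.
rule = SequentialRule(
    ide_itemset=frozenset({("EDUCATION", "university")}),
    beh_itemset=frozenset({("BEH_SMOKING", "Decreased")}),
    rf_item=("RF_CHOLESTEROL", "Decreased"),
)
patient = {
    ("EDUCATION", "university"),
    ("BEH_SMOKING", "Decreased"),
    ("RF_CHOLESTEROL", "Stay_high"),
}
print(rule.condition_holds(patient), rule.matches(patient))
```

Keeping `condition_holds` separate from `matches` makes it straightforward to count how often the condition occurs with and without the predicted risk factor change.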

Model

IDE_itemset ∧ BEH_time_itemset → RF_time_item

• Action period: a period in which at least one control occurs

• Latency period: a waiting time before effects are observed

• Observation period: a period in which exactly one control occurs
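The three periods can be pictured as consecutive windows over a patient's ordered controls. The sketch below assumes, purely for illustration, that period lengths are counted in numbers of controls; the slides do not state the unit, and the function name `split_periods` and its parameters are hypothetical.

```python
def split_periods(controls, start, action_len, latency_len):
    """Split an ordered list of controls into (action, latency, observation).

    Assumption: lengths are expressed as numbers of controls.
    Returns None when the windows do not fit inside the list.
    """
    if action_len < 1:
        raise ValueError("the action period needs at least one control")
    if latency_len < 0:
        raise ValueError("the latency period cannot be negative")
    obs_index = start + action_len + latency_len
    if start < 0 or obs_index >= len(controls):
        return None
    action = controls[start:start + action_len]
    latency = controls[start + action_len:obs_index]
    observation = controls[obs_index]  # the observation period holds exactly one control
    return action, latency, observation


controls = ["c1", "c2", "c3", "c4", "c5"]
periods = split_periods(controls, start=0, action_len=2, latency_len=1)
print(periods)
```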

Data Preparation: creation of change variables

BEH_OBESITY (h = Entry.height, w = Control.weight)

Value         id    Control N           Control N+1
Unknown       -2    h or w = *          h or w = Unknown
Unknown       -2    h or w = Unknown    h or w = *
Stay_normal    1    w / h² <= 25        w / h² <= 25
Decreased      2    w / h² > 25         w / h² <= 25
Increased      3    w / h² <= 25        w / h² > 25
Stay_high      4    w / h² > 25         w / h² > 25
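The coding of BEH_OBESITY between two consecutive controls can be sketched as a small function. The assumptions here are that height is given in metres and weight in kilograms, that a missing measurement is represented by `None`, and that the function names are hypothetical; only the threshold of 25 and the labels and codes come from the table.

```python
THRESHOLD = 25.0  # w / h² limit used in the table


def body_mass_index(height_m, weight_kg):
    """Return w / h², or None when either measurement is missing."""
    if height_m is None or weight_kg is None:
        return None
    return weight_kg / height_m ** 2


def beh_obesity(height_m, weight_n, weight_n1):
    """Code the change between control N and control N+1 as (label, id)."""
    bmi_n = body_mass_index(height_m, weight_n)
    bmi_n1 = body_mass_index(height_m, weight_n1)
    if bmi_n is None or bmi_n1 is None:
        return ("Unknown", -2)
    high_n = bmi_n > THRESHOLD
    high_n1 = bmi_n1 > THRESHOLD
    if not high_n and not high_n1:
        return ("Stay_normal", 1)
    if high_n and not high_n1:
        return ("Decreased", 2)
    if not high_n and high_n1:
        return ("Increased", 3)
    return ("Stay_high", 4)


print(beh_obesity(1.80, 90, 78))
```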

Data Preparation: Flattening operation

Initial table: 1 row = 1 control

ID Control | ID Patient | BEH 1 | … | BEH j | RF 1 | … | RF k

Flattened table: 1 row = 1 patient

ID Patient | IDE 1 … IDE j     | BEH 1 … BEH k, RF 1 … RF m | … | BEH 1 … BEH k, RF 1 … RF m
           | static attributes | control 1                  | … | control n
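A minimal sketch of the flattening step, using plain dictionaries, is given below. The input layout, the key names `patient` and `control`, the `_c1`, `_c2` column suffixes and the sample values are all assumptions made for illustration; the slides only specify that one row per control becomes one row per patient.

```python
from collections import defaultdict


def flatten(static_rows, control_rows):
    """Turn one-row-per-control data into one record per patient.

    static_rows:  {patient_id: {IDE attribute: value}}
    control_rows: list of dicts with 'patient', 'control' and BEH/RF columns
    """
    by_patient = defaultdict(list)
    for row in control_rows:
        by_patient[row["patient"]].append(row)

    flat = {}
    for patient_id, rows in by_patient.items():
        record = dict(static_rows.get(patient_id, {}))  # static attributes first
        ordered = sorted(rows, key=lambda r: r["control"])
        for position, row in enumerate(ordered, start=1):
            for name, value in row.items():
                if name not in ("patient", "control"):
                    record[f"{name}_c{position}"] = value  # one column block per control
        flat[patient_id] = record
    return flat


# Illustrative data only.
static = {1: {"IDE_AGE": 45}}
controls = [
    {"patient": 1, "control": 2, "BEH_SMOKING": "Decreased", "RF_CHOL": "Stay_high"},
    {"patient": 1, "control": 1, "BEH_SMOKING": "Stay_high", "RF_CHOL": "Stay_high"},
]
flat = flatten(static, controls)
print(flat[1])
```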

Evolutionary Approach

A genetic algorithm searching for temporal rules

Fixed-length chromosome: Identification | Behaviours | Risk factor

• One gene for each static identification attribute: IDE 1 … IDE j

• One gene for each kind of behavioural change, over the action period: BEH 1 … BEH k

• One gene to describe a risk factor, RF i, over the observation period

• A latency period between the action period and the observation period

Fitness function: support * confidence * lift
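The fitness product can be computed directly from per-patient match flags. The slides give only the product; the definitions of support, confidence and lift used below are the usual association-rule ones and are an assumption about what the authors used, and the function name and inputs are hypothetical.

```python
def fitness(antecedent_hits, consequent_hits):
    """Return support * confidence * lift for one rule.

    antecedent_hits[i] is True when patient i satisfies the rule's condition,
    consequent_hits[i] is True when patient i shows the predicted risk factor change.
    """
    n = len(antecedent_hits)
    if n == 0 or n != len(consequent_hits):
        raise ValueError("both lists must be non-empty and of equal length")
    n_a = sum(antecedent_hits)
    n_c = sum(consequent_hits)
    n_ac = sum(a and c for a, c in zip(antecedent_hits, consequent_hits))
    if n_a == 0 or n_c == 0:
        return 0.0  # the rule cannot be evaluated on this data
    support = n_ac / n              # P(condition and consequence)
    confidence = n_ac / n_a         # P(consequence | condition)
    lift = confidence / (n_c / n)   # confidence relative to P(consequence)
    return support * confidence * lift


score = fitness([True, True, True, False], [True, True, False, False])
print(round(score, 4))
```

In this toy case support is 1/2, confidence is 2/3 and lift is 4/3, so the product is 4/9.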

Genetic Algorithm Optimization

• A CLOSE-based approach for initialization

• The CLOSE algorithm improves:
    - extraction efficiency, by reducing the search space (use of generators and frequent closed itemsets)
    - result relevance, by suppressing redundant rules (bases generation)
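The notion of a closed itemset that CLOSE relies on can be illustrated with the closure operator alone: the closure of an itemset is the intersection of all transactions that contain it. The sketch below is not the CLOSE algorithm itself, only that operator on a toy dataset; the function names and the data are hypothetical.

```python
def closure(itemset, transactions):
    """Return the intersection of all transactions containing itemset, or None if none do."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None  # the itemset is not supported by any transaction
    return frozenset.intersection(*covering)


def is_closed(itemset, transactions):
    """An itemset is closed when it equals its own closure."""
    return closure(itemset, transactions) == itemset


# Toy transactions; each letter stands for one item.
transactions = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
    frozenset("ABCE"),
]
print(sorted(closure(frozenset("A"), transactions)))
```

Here {A} is not closed because every transaction containing A also contains C; {A} acts as a generator of the closed itemset {A, C}, which is why working with generators and closed itemsets shrinks the search space.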

Results: Patient classes comparison

• Best rules on PG versus NG and RG

• Best rules on CVD versus NCVD

Results: Initialization Methods

• Comparison on the RG group

Conclusion

• Different tendencies among groups

• Confirmation of prior medical knowledge

• Contradictions with some "assumptions"

• Further investigations with assistance of medical experts

Future Research

• To analyse relationships between time windows and various risk factors

• To develop new evaluation criteria

• To integrate physicians' prior knowledge

• To apply the HASAR approach to other temporal datasets