HASAR: Mining Sequential Association Rules for Atherosclerosis Risk Factor Analysis
Laurent Brisson, Nicolas Pasquier, Céline Hebert, Martine Collard
I3S Laboratory, University of Nice-Sophia Antipolis
GREYC Laboratory, University of Caen
Contents
1. Analytic question & Objectives
2. Model & Data Preparation
3. Algorithms
4. Results
Analytic Question
Are there any differences in the development of risk factors and other characteristics between men of the risk group who came down with the observed cardiovascular diseases and those who stayed healthy?
Objectives
Evolution of risk factors according to behavioural changes
• Groups RG versus PG and NG
• Healthy patients (NCVD) versus those with cardiovascular diseases (CVD)
• Groups based on patient education level and job
Sequential Rules
IDE_itemset BEH_time_itemset RF_time_item
• IDE_itemset : static identification attributes
  Age of the patient
  Educational level of the patient
  Alcohol consumption at the beginning of the study
• BEH_time_itemset : behavioural change attributes
  Consumption of cigarettes per day
  Physical activity after work
  Physical activity at work
  Different kinds of diet
  Medicine for cholesterol
  Medicine for blood pressure
• RF_time_item : risk factor change attribute
  Cholesterol level
  HDL cholesterol level
  LDL cholesterol level
  Triglycerides level
  Obesity
  …
Model
IDE_itemset BEH_time_itemset RF_time_item
• Action period: a period in which at least one control occurs
• Latency period: a waiting time before effects are observed
• Observation period: a period in which only one control occurs
Data Preparation: creation of change variables
Example: BEH_OBESITY, with Entry.height = h and Control.weight = w (* = any value)
BEH_OBESITY    id    Control N            Control N+1
Unknown        -2    h or w = *           h or w = Unknown
Unknown        -2    h or w = Unknown     h or w = *
Stay_normal     1    w / h² <= 25         w / h² <= 25
Decreased       2    w / h² > 25          w / h² <= 25
Increased       3    w / h² <= 25         w / h² > 25
Stay_high       4    w / h² > 25          w / h² > 25
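The coding rule in the table can be sketched in a few lines of Python. This is only an illustration: the function name beh_obesity, the use of None for an unknown value, and the units (height in metres, weight in kilograms, so that w / h² is the body mass index) are assumptions, not taken from the slides.

```python
from typing import Optional

BMI_THRESHOLD = 25.0  # threshold on w / h² used in the table


def beh_obesity(height: Optional[float],
                weight_n: Optional[float],
                weight_n1: Optional[float]) -> int:
    """Return the BEH_OBESITY code for two consecutive controls.

    Assumed conventions (not from the slides): None means Unknown,
    height is in metres and weights are in kilograms.
    """
    # Any unknown measurement at either control gives the Unknown code.
    if height is None or weight_n is None or weight_n1 is None:
        return -2
    high_before = weight_n / height ** 2 > BMI_THRESHOLD
    high_after = weight_n1 / height ** 2 > BMI_THRESHOLD
    if not high_before and not high_after:
        return 1  # Stay_normal
    if high_before and not high_after:
        return 2  # Decreased
    if not high_before and high_after:
        return 3  # Increased
    return 4      # Stay_high
```

The same pattern would apply to the other change variables, each with its own threshold or category mapping.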
Data Preparation: flattening operation
Initial table: one row per control
ID Control | ID Patient | BEH 1 | … | BEH j | RF 1 | … | RF k
Flattened table: one row per patient
ID Patient | static attributes: IDE 1 … IDE j | control 1: BEH 1 … BEH k, RF 1 … RF m | … | control n: BEH 1 … BEH k, RF 1 … RF m
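A minimal sketch of the flattening step, in plain Python, may help make the two layouts concrete. The function name flatten, the dictionary-based row format, the keys "patient_id" and "control", and the "_<control number>" column suffix are all hypothetical choices made for this illustration.

```python
from collections import defaultdict


def flatten(static_rows, control_rows):
    """Turn one-row-per-control data into one row per patient.

    static_rows:  {patient_id: {static attribute name: value}}
    control_rows: list of dicts holding "patient_id", "control"
                  (the control number) and the BEH/RF attributes.
    Both layouts are assumptions made for this sketch.
    """
    by_patient = defaultdict(list)
    for row in control_rows:
        by_patient[row["patient_id"]].append(row)

    flat = {}
    for pid, rows in by_patient.items():
        # Start from the static identification attributes (IDE).
        record = dict(static_rows.get(pid, {}))
        # Append the BEH/RF attributes of each control, in control order,
        # suffixing each column name with the control number.
        for row in sorted(rows, key=lambda r: r["control"]):
            for name, value in row.items():
                if name not in ("patient_id", "control"):
                    record[f"{name}_{row['control']}"] = value
        flat[pid] = record
    return flat
```

For a patient with two controls, the result holds the static attributes once, followed by one group of BEH and RF columns per control.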
Evolutionary Approach
A genetic algorithm searching for temporal rules
Fixed-length chromosome: Identification | Behaviours | Risk factor
A gene for each static identification attribute
Chromosome: IDE 1 … IDE j | Behaviours | Risk factor
A gene for each kind of behavioural change
Chromosome: Identification | BEH 1 … BEH k (action period) | Risk factor
One gene to describe a risk factor
Chromosome: Identification | Behaviours (action period) | RF i (observation period)
Chromosome: Identification | Behaviours (action period) | RF i (observation period), with a latency period between the action and observation periods
Fitness function: support * confidence * lift
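The fitness function can be sketched as follows. The slides do not define the three measures, so the usual association-rule definitions are assumed here: support = P(A and C), confidence = P(A and C) / P(A), lift = confidence / P(C), where A is the rule antecedent (identification and behaviour items) and C the consequent (risk factor item). The function name fitness and the predicate-based rule encoding are hypothetical.

```python
def fitness(rows, antecedent, consequent):
    """Score a rule as support * confidence * lift.

    rows:       list of flattened patient records (any objects)
    antecedent: predicate returning True if a record matches the
                IDE/BEH part of the rule
    consequent: predicate returning True if a record matches the
                RF part of the rule
    Standard definitions of the three measures are assumed.
    """
    n = len(rows)
    n_a = sum(1 for r in rows if antecedent(r))
    n_c = sum(1 for r in rows if consequent(r))
    n_ac = sum(1 for r in rows if antecedent(r) and consequent(r))
    # A rule that never applies, or a consequent that never occurs,
    # gets the lowest possible score.
    if n == 0 or n_a == 0 or n_c == 0:
        return 0.0
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support * confidence * lift
```

Multiplying the three measures favours rules that are at once frequent, reliable, and more informative than the base rate of the consequent.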
Genetic Algorithm Optimization
• A CLOSE-based approach for initialization
• The CLOSE algorithm improves:
  extraction efficiency, by reducing the search space (use of generators and frequent closed itemsets)
  result relevance, by suppressing redundant rules (generation of bases)
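To illustrate what a closed itemset is, the sketch below computes the closure of an itemset, that is, the set of items shared by every transaction containing it. This is not the CLOSE algorithm itself, only the closure operator that closed-itemset mining relies on; the function names and the toy transactions are assumptions made for the example.

```python
def closure(itemset, transactions):
    """Return the items common to all transactions containing itemset."""
    items = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if items <= frozenset(t)]
    if not covering:
        # Unsupported itemset: returned unchanged here, an illustrative
        # choice rather than part of any published definition.
        return items
    return frozenset.intersection(*covering)


def is_closed(itemset, transactions):
    """An itemset is closed when it equals its own closure."""
    return closure(itemset, transactions) == frozenset(itemset)


# Toy data: every transaction containing "a" also contains "b".
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
closure_of_a = closure({"a"}, transactions)
```

In this toy data the itemset {a} has the same supporting transactions as its closure {a, b}, so rules built from both would carry the same information; keeping only closed itemsets and their generators avoids that redundancy.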
Results: Patient classes comparison
• Best rules on PG versus NG and RG
• Best rules on CVD versus NCVD
Results: Initialization Methods
• Comparison on RG group
Conclusion
• Different tendencies among groups
• Confirmation of prior medical knowledge
• Contradictions with some "assumptions"
• Further investigations with assistance of medical experts
Future Research
• To analyse relationships between time windows and various risk factors
• To develop new evaluation criteria
• To integrate physicians' prior knowledge
• To apply the HASAR approach to other temporal datasets