Mining Event or State Sequences: A Social Science Perspective
Gilbert Ritschard
Department of Econometrics, University of Geneva
http://mephisto.unige.ch
IIS 2008, Zakopane, Poland, June 16-18
13/7/2008gr 1/86
My talk is about life courses
Example of a scientific life course, to help you understand what a social scientist does at IIS

  date        event
  1970-1979   Studies in econometrics
  1980-1992   Mathematical Economics
  1985-...    Work with social scientists (family studies)
              Interest in statistics for social sciences
  1990-1995   Interest in Neural Networks
  2000-...    KDD and data mining (clustering, supervised learning)
  2003-...    Work with historians, demographers, psychologists (longitudinal data)
  2005-...    KDD and data mining approaches for analysing life course data
Outline
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Visualizing and clustering sequence data
4 Mining Frequent Episodes
Sequence Analysis in Social Sciences
Motivation
Individual life course paradigm.
  Following macro quantities (e.g. #divorces, fertility rate, mean education level, ...) over time is insufficient for understanding social behavior.
  Need to follow individual life courses.
Data availability
  Large panel surveys in many countries (SHP, CHER, SILC, GGP, ...).
  Biographical retrospective surveys (FFS, ...).
  Statistical matching of censuses, population registers and other administrative data.
Need for suitable methods for discovering interesting knowledge from these individual longitudinal data.
Social scientists use
  essentially survival analysis (Event History Analysis);
  more rarely sequential data analysis (Optimal Matching, Markov Chain Models).
Could social scientists benefit from data-mining approaches?
  Which methods?
  Are there specific issues with those methods for social scientists?
Motivation: KD in Social sciences
In KDD and data mining, the focus is on prediction and classification.
  Aim: reduce prediction and classification errors.
In social science, the aim is understanding/explaining (social) behaviors.
  Hence the focus is on process rather than output.
What kind of data
What kind of data are we dealing with?
  Mainly categorical longitudinal data describing life courses.
  An ontology of longitudinal data (Aristotelean tree).
Alternative views of Individual Longitudinal Data
Table: Time stamped events, record for Sandra
  ending secondary school in 1970 | first job in 1971 | marriage in 1973

Table: State sequence view, Sandra
  year             1969     1970       1971       1972       1973
  civil status     single   single     single     single     married
  education level  primary  secondary  secondary  secondary  secondary
  job              no       no         first      first      first
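The two views carry the same information: the state sequence can be rebuilt by replaying the events year by year. Here is a minimal sketch in Python, assuming each event sets one variable to a new value from its year onward. The function name `events_to_states` and the tuple layout are illustrative choices, not part of the talk.

```python
def events_to_states(events, years, initial):
    """Turn time-stamped events into one state record per year.

    events:  list of (year, variable, new_value) tuples.
    years:   iterable of years to cover, in increasing order.
    initial: dict with each variable's value before any event.
    """
    current = dict(initial)
    by_year = {}
    for year, variable, value in events:
        by_year.setdefault(year, []).append((variable, value))
    rows = []
    for year in years:
        # Apply the events of this year, then record the resulting state.
        for variable, value in by_year.get(year, []):
            current[variable] = value
        rows.append((year, dict(current)))
    return rows


# Sandra's record from the first table (values taken from the second table).
sandra_events = [
    (1970, "education level", "secondary"),
    (1971, "job", "first"),
    (1973, "civil status", "married"),
]
initial = {"civil status": "single", "education level": "primary", "job": "no"}
sandra_states = events_to_states(sandra_events, range(1969, 1974), initial)
```

Events dated before the first covered year are ignored by this sketch, so `initial` has to describe the state at the start of the observation window.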
Issues with life course data
Incomplete sequences
  Censored and truncated data: cases falling out of observation before experiencing an event of interest.
  Sequences of varying length.
Time varying predictors.
  Example: When analysing time to divorce, presence of children is a time varying predictor.
Data collected by clusters
  Example: Household panel surveys.
  Multi-level analysis to account for unobserved shared characteristics of members of the same cluster.
Multi-level: Simple linear regression example
[Figure: number of children (vertical axis, 0 to 9) against education (horizontal axis, 1 to 15), with fitted lines y = 3.2 + 0.2 x, y = 6.2 - 0.8 x, y = 15.6 - 0.8 x and y = 12.5 - 0.8 x.]
Methods for Longitudinal Data
Classical statistical approaches: Survival Approaches
Survival or Event history analysis (Blossfeld and Rohwer, 2002)
  Focuses on one event.
  Concerned with duration until event occurs or with hazard of experiencing event.
Survival curves: Distribution of duration until event occurs
  $S(t) = p(T \ge t)$.
Hazard models: Regression-like models for $S(t, x)$ or hazard $h(t) = p(T = t \mid T \ge t)$
  $h(t, x) = g\big(t, \beta_0 + \beta_1 x_1 + \beta_2 x_2(t) + \cdots\big)$.
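The two definitions are linked in discrete time: since S(t) = p(T ≥ t) and h(t) = p(T = t | T ≥ t), the survival at the next time point is S(t)(1 - h(t)). The sketch below estimates both quantities from durations with right censoring. It is a plain life-table style estimate written for illustration, and it assumes that a case censored at time t is still at risk at t.

```python
def discrete_hazard_and_survival(durations, observed):
    """Estimate h(t) = p(T = t | T >= t) and S(t) = p(T >= t).

    durations: time at which each case had the event or was censored.
    observed:  True if the event occurred, False if the case was censored.
    Returns two dicts keyed by the distinct observed times.
    """
    hazard, survival = {}, {}
    s = 1.0  # S at the first time point: nobody has had the event yet
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival[t] = s               # probability of reaching time t
        hazard[t] = events / at_risk  # share of those at risk with the event at t
        s *= 1.0 - hazard[t]          # carried forward to the next time point
    return hazard, survival


# Hypothetical toy data: five cases, two of them censored.
h, S = discrete_hazard_and_survival(
    [2, 3, 3, 5, 5], [True, True, False, True, False]
)
```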
Survival curves (Switzerland, SHP 2002 biographical survey)
[Figure: survival curves for women; survival probability (0 to 1) against age in years (0 to 80). Events: Leaving home, Marriage, 1st Childbirth, Parents' death, Last child left, Divorce, Widowing.]
Analysis of sequences
Frequencies of given subsequences
  Essentially event sequences.
  Subsequences considered as categories ⇒ methods for categorical data apply (frequencies, cross tables, log-linear models, logistic regression, ...).
Markov chain models
  State sequences.
  Focus on transition rates between states.
  Does the rate also depend on previous states?
  How many previous states are significant?
Optimal Matching (Abbott and Forrest, 1986)
  State sequences.
  Edit distance (Levenshtein, 1966; Needleman and Wunsch, 1970) between pairs of sequences.
  Clustering of sequences.
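For the Markov chain view, the basic quantity is the transition rate between states. A minimal first-order sketch follows, assuming sequences are given as lists of state labels with one entry per time unit. The state labels in the example are hypothetical.

```python
from collections import Counter, defaultdict


def transition_rates(sequences):
    """Estimate first-order transition rates p(next state | current state)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every pair of consecutive states (origin, destination).
        for origin, destination in zip(seq, seq[1:]):
            counts[origin][destination] += 1
    rates = {}
    for origin, destinations in counts.items():
        total = sum(destinations.values())
        rates[origin] = {d: n / total for d, n in destinations.items()}
    return rates


# Hypothetical toy sequences (S = single, M = married, D = divorced).
rates = transition_rates([["S", "S", "M", "M"], ["S", "M", "M", "D"]])
```

Testing whether the rate also depends on earlier states amounts to conditioning on longer histories, for example by counting triples instead of pairs.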
Typology of methods for life course data
Questions     Issues: duration/hazard                           Issues: state/event sequencing
descriptive   • Survival curves: parametric                     • Optimal matching clustering
                (Weibull, Gompertz, ...) and non parametric     • Frequencies of given patterns
                (Kaplan-Meier, Nelson-Aalen) estimators.        • Discovering typical episodes
causality     • Hazard regression models (Cox, ...)             • Markov models
              • Survival trees                                  • Mobility trees
                                                                • Association rules among episodes
Survival Trees
The biographical SHP dataset
SHP biographical retrospective survey
http://www.swisspanel.ch
  SHP retrospective survey: 2001 (860 cases) and 2002 (4700 cases).
  We consider only data collected in 2002.
  Data completed with variables from the 2002 wave (language).

Characteristics of retained data for divorce (individuals who got married at least once)
                            men     women   Total
  Total                     1414    1656    3070
  1st marriage dissolution  231     308     539
                            16.3%   18.6%   17.6%
Distribution by birth cohort
[Figure: histogram of birth year (1910 to 1960); vertical axis: frequency (0 to 500).]
Marriage duration until divorce: Survival curves
[Figure: survival probability (0.5 to 1) against marriage duration (0 to 40), one panel for women and one for men, with one curve per birth cohort: 1942 and before, 1943-1952, 1953 and after.]
Marriage duration until divorce: Hazard model
Discrete time model (logistic regression on person-year data).
exp(B) gives the odds ratio, i.e. the change in the odds h/(1 − h) when the covariate is increased by 1 unit.

                      exp(B)         Sig.
  birthyr             1.0088         0.002
  university          1.22           0.043
  child               0.73           0.000
  language  unknown   1.47           0.000
            French    1.26           0.007
            German    1              ref
            Italian   0.89           0.537
  Constant            0.0000000004   0.000
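Two steps behind this slide can be sketched in code: building the person-year file on which the logistic regression is run, and reading an odds ratio as a change in hazard. Both functions below are illustrative helpers with hypothetical names. The data layout (one tuple per marriage with its duration in years and a divorce flag) is an assumption.

```python
def person_years(cases):
    """Expand (id, duration, divorced) records into one row per year at risk.

    The event indicator is 1 only in the last observed year of a marriage
    that ended in divorce. Censored cases contribute only zeros.
    """
    rows = []
    for case_id, duration, divorced in cases:
        for year in range(1, duration + 1):
            event = 1 if (divorced and year == duration) else 0
            rows.append({"id": case_id, "year": year, "event": event})
    return rows


def apply_odds_ratio(hazard, odds_ratio):
    """Return the hazard obtained after multiplying the odds h/(1 - h) by exp(B)."""
    odds = hazard / (1.0 - hazard) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical cases: "a" divorces in year 3, "b" is censored after 2 years.
rows = person_years([("a", 3, True), ("b", 2, False)])
```

For small hazards the odds and the hazard are nearly equal, so an odds ratio such as 0.73 for `child` can be read, approximately, as a 27% lower yearly divorce hazard.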
Survival trees: Principle
Target is survival curve or some other survival characteristic.
Aim: Partition data set into groups that
  differ as much as possible (max between-class variability)
    Example: Segal (1988) maximizes the difference in KM survival curves by selecting the split with the smallest p-value of the Tarone-Ware chi-square statistic
    $TW = \sum_i \frac{w_i \big(d_{i1} - E(D_i)\big)}{\big(w_i^2 \,\mathrm{var}(D_i)\big)^{1/2}}$
  are as homogeneous as possible (min within-class variability)
    Example: Leblanc and Crowley (1992) maximize the gain in deviance (-log-likelihood) of relative risk estimates.
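To make the splitting criterion concrete, here is a sketch of a two-sample Tarone-Ware statistic for one candidate split. It assumes the usual form of the test: at each event time i, d_i1 is the number of events in group 1, E(D_i) and var(D_i) are the hypergeometric mean and variance, the weight is w_i = sqrt(n_i) with n_i the number at risk, and the weighted differences and weighted variances are each summed over event times before taking the ratio. The function name and data layout are illustrative.

```python
import math


def tarone_ware(group1, group2):
    """Two-sample Tarone-Ware statistic; groups are lists of (time, event) pairs."""
    data = [(t, e, 1) for t, e in group1] + [(t, e, 0) for t, e in group2]
    numerator, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e)         # events at t
        d1 = sum(1 for u, e, g in data if u == t and e and g == 1)
        w = math.sqrt(n)                                       # Tarone-Ware weight
        numerator += w * (d1 - d * n1 / n)
        if n > 1:
            variance += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return numerator / math.sqrt(variance)


# Hypothetical toy groups: group a experiences the event earlier than group b.
a = [(1, True), (2, True), (3, True)]
b = [(4, True), (5, True), (6, False)]
tw = tarone_ware(a, b)
```

The square of this statistic is compared with a chi-square distribution with one degree of freedom. A tree-growing procedure would evaluate it for every candidate split and keep the one with the smallest p-value.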
Example
Divorce, Switzerland, Differences in KM Survival Curves I
[Figure: tree of differences in KM survival curves; node contents not recoverable.]
Divorce, Switzerland, Differences in KM Survival Curves II
[Figure: KM survival curves for the groups listed below; vertical axis 0.5 to 1.0, horizontal axis 0 to 40.]
Cohort <=1940 & Non French Speaking & University
Cohort <=1940 & Non French Speaking & < University
Cohort <=1940 & French Speaking
Cohort > 1940 & No Child & University
Cohort > 1940 & No Child & < University
Cohort > 1940 & Child & German or Italian Speaking
Cohort > 1940 & Child & French or Unknown Speaking
Divorce, Switzerland, Relative risk
[Figure: relative risk tree; node contents not recoverable.]
Hazard model with interaction
Adding interaction effects detected with the tree approach significantly improves the fit (sig. $\Delta\chi^2$ = 0.004).

                        exp(B)   Sig.
  born after 1940       1.78     0.000
  university            1.22     0.049
  child                 0.94     0.619
  language  unknown     1.50     0.000
            French      1.12     0.282
            German      1        ref
            Italian     0.92     0.677
  b_before_40*French    1.46     0.028
  b_after_40*child      0.68     0.010
  Constant              0.008    0.000
Social Science Issues
Issues with survival trees in social sciences
1 Dealing with time varying predictors
  Segal (1992) discusses a few possibilities, none being really satisfactory.
  Huang et al. (1998) propose a piecewise constant approach suitable for discrete variables and a limited number of changes.
  Room for development ...
2 Multi-level analysis
  How can we account for multi-level effects in survival trees, and more generally in trees?
  Conjecture: It should be possible to include an unobserved shared effect in deviance-based splitting criteria.
Visualizing and clustering sequence data
Life trajectories
Sequence analysis
Survival approaches are not useful in a unitary (holistic) perspective of the whole life course.
Sequence analysis of the whole collection of life events is better suited for such a holistic approach (Billari, 2005).
Rendering sequences
  Colorize your life courses
Results from the analysis of the retrospective Swiss Household Panel (SHP) survey.
Focus on visualization of life course data.
Evolution tendencies in familial life course trajectories
Sequence analysis techniques make it possible to test hypotheses about the evolution of these familial life trajectories (Elzinga and Liefbroer, 2007):
  De-standardization: Some states and events of familial life are shared by decreasing proportions of the population, occur at more dispersed ages, and their duration is also more scattered.
  De-institutionalization: Social and temporal organization of life courses becomes less driven by normative, legal or institutional rules.
  Differentiation: The number of distinct steps lived by an individual increases.
Example: the BioFam sequential data set
Presentation of the “BioFam” data
Data from the retrospective survey conducted in 2002 by the Swiss Household Panel (SHP) (with support of Federal Statistical Office, Swiss National Fund for Scientific Research, University of Neuchatel).
Retrospective survey: 5560 individuals.
Retained familial life events: Leaving Home, First childbirth, First marriage and First divorce.
Age 15 to 45 → 2601 remaining individuals, born between 1909 and 1957.
Distribution by birth cohort
[Figure: histogram of birth year (1910 to 1960); vertical axis: frequency (0 to 500).]
Creating state sequences
Example of time stamped data:
  individual  LHome  marriage  childbirth  divorce
  1           1989   1990      1992        NA
Deriving the states
Need one state for each combination of events:

  state  LHome   marriage  childbirth  divorce
  0      no      no        no          no
  1      yes     no        no          no
  2      no      yes       yes/no      no
  3      yes     yes       no          no
  4      no      no        yes         no
  5      yes     no        yes         no
  6      yes     yes       yes         no
  7      yes/no  yes       yes/no      yes
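The state table can be read as a small decision rule. The sketch below encodes it and derives a yearly state sequence from time stamped events, assuming that an event changes the state from its year onward and that a missing event (NA) is represented by `None`. The function names are illustrative.

```python
def biofam_state(left_home, married, child, divorced):
    """Map the four event indicators to the state codes 0 to 7 of the table."""
    if divorced:
        return 7                      # any combination once divorced
    if married:
        if not left_home:
            return 2                  # married, still at home, with or without child
        return 6 if child else 3
    if child:
        return 5 if left_home else 4
    return 1 if left_home else 0


def state_sequence(event_years, years):
    """event_years = (LHome, marriage, childbirth, divorce); None means no event."""
    sequence = []
    for year in years:
        flags = [y is not None and y <= year for y in event_years]
        sequence.append(biofam_state(*flags))
    return sequence


# Individual 1 from the time stamped example: 1989, 1990, 1992, NA.
seq = state_sequence((1989, 1990, 1992, None), range(1988, 1994))
```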
Characteristics of sequences
Definition
Entropy: measure of uncertainty regarding sequence predictability.
  $p_i$: proportion of cases (or time points) in state $i$.
  Shannon: $h(p) = \sum_i -p_i \log_2(p_i)$
  Other types of entropy: Quadratic (Gini), Daroczy, ...
Two ways of using entropies:
  Entropy of the state at each time (age) point: entropy increases with the diversity of states observed at each time point (age).
  Entropy of each individual sequence: entropy increases with the diversity of states during the observed life course and varies with the time spent in each state.
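Both uses rely on the same Shannon formula and differ only in which states are pooled. A minimal sketch, assuming sequences of equal length stored as lists of state codes. The toy data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy (base 2) of the distribution of the given states."""
    total = len(states)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(states).values()
    )


# Hypothetical toy data: three individuals observed at four ages.
sequences = [
    [0, 0, 1, 3],
    [0, 1, 1, 3],
    [0, 0, 0, 0],
]

# Entropy of each individual sequence (one value per row).
by_sequence = [shannon_entropy(seq) for seq in sequences]

# Entropy of the state distribution at each age (one value per column).
by_age = [shannon_entropy(column) for column in zip(*sequences)]
```

A sequence that stays in one state has entropy 0. The maximum is reached when the time is spread evenly over all visited states.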
Entropy of the state at each time (age) point
[Figure: entropy of the biofam state distribution by age; horizontal axis: age (a15 to a29), vertical axis: entropy (0.2 to 0.8).]
Entropy: Minimum/maximum
[Figure: sequences with minimum, median and maximum entropy; horizontal axis: time (A15 to A45), vertical axis: sequences 1-15, sorted by entropy. State legend: N/N/N/N, Y/N/N/N, N/Y/*/N, Y/Y/N/N, N/N/Y/N, Y/N/Y/N, Y/Y/Y/N, */*/*/Y.]
Entropy - histogram
[Figure: histogram of entropy for the sequences in the biofam data set; horizontal axis: entropy (0.0 to 1.5), vertical axis: frequency (0 to 500).]
Hypothesis
The evolution of familial life trajectories gives rise to an increase in the entropy of individual sequences, because they become less predictable and more diversified.
Entropy by birth cohorts
[Figure: distribution of sequence entropy (0.0 to 1.5) by birth cohort: 1909-18, 1919-28, 1929-38, 1939-48, 1949-58.]
Entropy by sex
[Figure: distribution of sequence entropy (0.0 to 1.5) by sex: men, women.]
Definition
Turbulence (Elzinga and Liefbroer, 2007): somewhat similar to entropy.
Turbulence accounts for state sequencing (which entropy does not).
Turbulence accounts for the following two elements:
  number of subsequences:
    x = S,U,M,MC (16 subsequences) is more turbulent than y = S,U,S,C (15 subsequences)
  variance of duration in each state:
    S/10 U/2 M/132 is less turbulent than S/48 U/48 M/48
Turbulence - Minimum/maximum
[Figure: sequences with minimum, median and maximum turbulence; horizontal axis: time (A15 to A45), vertical axis: sequences 1-15, sorted by turbulence. State legend: N/N/N/N, Y/N/N/N, N/Y/*/N, Y/Y/N/N, N/N/Y/N, Y/N/Y/N, Y/Y/Y/N, */*/*/Y.]
Turbulence - histogram
[Figure: histogram of turbulence for the sequences in the biofam data set; horizontal axis: turbulence (2 to 10), vertical axis: frequency (0 to 600).]
Turbulence by cohorts
[Figure: distribution of sequence turbulence (2 to 10) by birth cohort: 1909-18, 1919-28, 1929-38, 1939-48, 1949-58.]
Distances between sequences: Clustering
Clustering, Multidimensional scaling and more
Once you are able to compute pairwise distances between sequences you can, among others:
  Cluster sequences.
  Make scatter plot representations of sets of sequences using multidimensional scaling.
Distances between sequences
Edit distance (known as Optimal Matching in social sciences) (Levenshtein, 1966; Needleman and Wunsch, 1970; Abbott and Forrest, 1986)
  d(x, y): total cost of the insertion, deletion and substitution changes required to transform sequence x into y.
  Different solutions depending on indel and substitution costs.
Other metrics proposed by Elzinga (2008)
  LCP: Longest common prefix (also longest common postfix)
  LCS: Longest common subsequence (same as OM with indel cost = 1 and substitution cost = 2).
  NMS: Number of matching subsequences
  ...
Elzinga (2008) proposes a nice formalization of these metrics.
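The edit distance can be computed with the classical dynamic programming recursion. The sketch below uses constant indel and substitution costs, which is a simplifying assumption (substitution costs may also vary by pair of states), and checks the stated link with the longest common subsequence. Function names and the example sequences are illustrative.

```python
def om_distance(x, y, indel=1.0, sub=2.0):
    """Minimal total cost of indels and substitutions turning x into y."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel            # delete the first i elements of x
    for j in range(1, m + 1):
        d[0][j] = j * indel            # insert the first j elements of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if x[i - 1] == y[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,       # deletion
                d[i][j - 1] + indel,       # insertion
                d[i - 1][j - 1] + cost,    # match or substitution
            )
    return d[n][m]


def lcs_length(x, y):
    """Length of the longest common subsequence of x and y."""
    n, m = len(x), len(y)
    l = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                l[i][j] = l[i - 1][j - 1] + 1
            else:
                l[i][j] = max(l[i - 1][j], l[i][j - 1])
    return l[n][m]


# Hypothetical state sequences.
x = ["S", "S", "M", "M", "C"]
y = ["S", "M", "M", "C", "C"]
d_cheap_indel = om_distance(x, y, indel=1.0, sub=2.0)
d_costly_indel = om_distance(x, y, indel=3.0, sub=2.0)
```

With indel cost 1 and substitution cost 2, the distance equals len(x) + len(y) - 2·LCS(x, y). Raising the indel cost makes the alignment rely on substitutions instead, which is why different cost settings can produce different clusterings.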
Dendrogram, OM1 versus OM3different indel costs (1 vs 3)
117
334
784
910
8111
0011
9214
8817
5217
8322
0522
5923
8225
89 121
155
285
563
790
796
929
992
1019
1419
1468
2023
2125 13
0 5525
853
423
113
3218
5921
51 535
1387
1519 73
724
6711
4923
091
913
6810
2120
88 834
2305
1050
1444 2 13
314
218
424
863
765
381
888
991
211
9312
4312
5416
1516
7819
9321
6322
6125
55 26 59 104
159
172
428
663
860
1014
1452
1485
1559
1620
1663
2267
2525
2554
2584
1116 37 163
195
234
358
362
598
784
813
965
1020
1032
1042
1059
1065
1088
1249
1252
1343
1795
1825
1892
1899
1925
1964
2002
2258
2358
2535
2546
2597 16
229
753
665
285
612
3812
4415
1015
5215
5416
0917
2717
3817
8719
4520
4822
5723
3523
7324
5724
9624
97 110
918
1373
1978 36
020
3610
7315
0612
0422
23 15 82 129
131
312
660
677
833
905
913
1089
1138
1239
1329
1378
1512
1584
1680
1874
1884
2343
2448
2552 83 91 11
215
015
226
026
728
229
954
959
976
410
5313
7917
5320
7521
4525
47 132
2478
1673
1581
1873
1653 30 87 137
235
256
345
364
403
594
907
1092
1284
1476
1489
1526
2047
2207
2272
2349
2361
2396
2596 31 10
022
024
327
728
135
445
546
048
371
081
485
097
016
8919
0020
5222
1523
3923
4824
6525
7425
9520
68 35 102
259
266
309
311
429
729
819
837
999
1187
1250
1264
1760
1768
1806
1886
1920
2001
2067
2325
2340
2359
2556 42
015
7317
2810
9116
6925
22 340
552
820
9823
6523
6624
77 642
777
1072
1677 38
471
192
512
3413
1613
4117
9318
7519
27 842
1377
1915
2454
2097 29
1385 38
541
764
183
510
0812
3320
3824
6625
60 242
810
841
982
1156
1297
1436
1672
1683
1763
1798
1862
1990
2483 62
717
73 189
310
530
2327 78
894
324
38 649
1144
2144 94
512
1114
0611
7114
9116
94 224
379
503
940
1314
1645
1076
2437
240
2389 95
220
7417
9722
3723
26 395
473
502
561
880
1463
1812
1813 94
422
0622
1823
0613
3924
1498
117
0316
5618
9521
5722
43 61 143
465
472
593
646
755
792
876
1003
1384
1671
1907
2120
2245
2269
2503 16
925
332
287
710
0410
5610
7113
9717
1321
2822
2022
8124
8225
68 996
1870
1279
1866 41
341
446
876
389
611
5011
5514
0121
1021
5522
4122
2673
616
5213
3814
90 86 559
2381 441
1936
1442 168
2236
442
1356
1755
1937 92 33
346
481
710
8211
8212
1613
5016
9017
1217
6723
93 537
802
1126
1289
1702
1754
1803
1894
1950
2193
2421 13
932
334
813
8616
9211
1213
5514
2314
6719
5723
72 787
1740 90
825
1028
328
481
692
411
5420
5521
2724
7613
6166
213
3320
0517
6418
43 42 60 586
487
910
2026
2006
1562
319
1317
1051
488
585
1829 88
119
682
159
687
220
32 241
493
911
665
1952
887
1017
2105 23
891
430
016
8436
570
721
4225
9311
4520
5721
15 74 289
167
888
499
584
2531 19
779
355
516
6632
493
720
7112
0914
3518
3023
7051
316
97 780
1147
1085
1958 79
425
2119
4325
90 759
803
1599 873
1326
2371 76
146
750
1224
1535
1654
316
1143
2394 64
818
85 716
1443
2487
1125
1679
1691 75
2316 64
063
514
0016
3321
90 879
1879
1949 8
170
156
811
160
212
0814
3360
116
6717
22 960
2143
2016 34
497
958
019
77 401
1982
830
1976
1398
1242
1313 10 38 99 113
164
171
187
212
213
226
228
229
302
304
386
404
427
432
454
484
494
521
523
715
760
767
773
774
775
781
797
857
980
991
1011
1023
1031
1033
1040
1087
1130
1134
1263
1276
1344
1351
1390
1437
1509
1542
1591
1661
1662
1733
1784
1785
1944
1970
2009
2012
2034
2092
2094
2100
2154
2212
2233
2353
2367
2446
2459
2475
2543 5
423
2146
312
0621
24 77
2027
2417
1229 78 828
2077
1527 622
815
1575 29
117
4115
4125
33 519
740
853
1394
1457
1807 60
067
611
3996
711
8822
9967
525
87 678
1123
1214
1469
2024
2282 45
211
9613
8310
2511
46 746
2392 86
191
518
6711
1314
6616
60 2495
40 45 221
533
571
829
890
1024
1210
1246
1380
1473
1642
1832
1887
2060
2134
2203
2211
2320
2435 43
944
057
082
612
6814
6423
0923
1125
69 443
577
836
843
1028
1265
1664
2099
2135
2322 61
919
3325
82 239
518
2579
423
2391 27
473
925
2918
2827
214
3821
71 321
973
1364 41
950
758
810
3710
9417
7117
9421
3222
2722
88 566
1604
1938
2250 49
1179
1537
1804 36
744
451
754
882
016
1025
53 307
738
758
971
1119
1266
1305
1412
1572
2411 90
910
1210
2960
678
914
5622
52 273
449
516
868
900
927
1157
1354
1588
1608
1638
2021
2090
2300
2329
2410
2441 36
155
476
610
4713
5714
1414
1517
9218
5220
5122
3122
4422
9623
6823
8724
9425
28 46
1290
390
1454
2278 27
814
1117
2325
00 402
1054
1472
2263
2264
124
558
1142
2089 42
410
6212
5118
83 422
1550
2514
2515 60
822
9316
3623
2824
80 567
799
2138
1151
1353
346
1701 49
613
08 692
1699
604
895
1910
1253
2430
2199 66
821
6637
737
893
117
1017
1120
1016
11 855
2490
508
1064
624
2172
1462 99
322
0893
620
0823
2319
1921
3319
39 1815
2820
54 2199
521
3615
5523
6415
38 3222
48 263
1240
2189
1831 32
612
41 5725
04 8420
66 998
1007
270
987
2170
2037
1470
1948
2505
296
2222
1381
2112
477
1772
1841
2517 85 34
120
3135
214
3223
4517
5620
83 9317
2917
6910
0119
74 25
1579
2548
2559
1228
1959
2436 33
210
3814
9622
51 626
1061
1877
1322
1924
2507 51
1474 59
718
4222
1324
4525
13 476
1774
2369
177
1133
2279 60
720
5335
357
910
1657
810
18 827
994 66
1601
613
1605 75
420
0422
56 436
1010 49
211
8012
5525
6715
9519
03 200
338
2280 30
396
821
5020
114
2719
7324
89 997
1439
1367
1665
1996
1668
2527 39
689
797
614
4921
11 906
2287
1876 27
1002
1594 4
323
1912
4787
113
99 903
1009
2020 14
125
37 743
1446
2418 18
219
9717
3423
7525
424
2718
3910
0619
23 7312
9523
5221
9153
214
2414
3110
719
8916
81 612
1500
1896 40
679
110
8612
2319
6020
4421
97 106
1863
2192 250
339
2351
1227
1942
2290 16
659
528 17
023
9914
6510
4615
46 294
475
391
2332
1531
1574
1918
2471
127
1871
628
2481
2076 87
814
0819
94 700
1030
2061 52
213
7122
10 661
1487
2470 4
112
0238
113
4034
022
00 651
1330
1455 10
316
7024
4213
3123
5014
1323
57 798
1352
2512
1428 4
671
2338 80 57
511
7411
7516
1216
4822
6825
38 538
1311
2235 54
015
9054
423
8825
80 412
2073 86
419
7911
0317
614
8017
6224
7323
1424
626
444
812
8515
9823
1824
26 689
1520
1393
1968
741
865
1800
2059 22
596
114
7120
78 286
2303
1743
631
2575
1494 647
2079
2230
2341 95
824
1222
9724
86 882
2310
1098
1865
1657
2028
2029 5 59
210
48 512
515
1628 39
917
7919
35 418
2432 68
810
1315
4016
1311
7211
7322
6024
05 47
950
1327
2354
618
2195 67
413
7222
14 6725
3455
776
156
412
2015
3214
9515
8320
85 64 854
2247
2460 53
913
2510
4529
312
8013
2414
40 611
1757
582
1901
2070
2506 61
414
7714
7819
88 731
1334
1342
2524
2084
292
697
812
867
1258 35
178
388
423
0711
2918
4024
0611
0112
6718
5623
08 525
620
866
885
1036
1497
2194
2216 63
392
819
0211
2218
0120
6363
465
465
516
2224
34 805
1621
2123
1277
1396
2464
2511
818
58 2232
078
270
313
0711
8612
9318
3721
8224
74 44 749
901
2225 60
911
6319
5424
92 65
2413
2198
233
974
1104
1300
2013 26
539
893
015
0718
22 8918
9723
0125
7825
276
214
070
511
1725
73 329
2130
1309
1904
1108
1851
1651
1846
1111
1159
1160 1
215
6011
28 545
708
2035
1022
2363
1549
2180
2594 36
310
7920
58 610
1127
1067
1310 35
055
220
14 831
1066
1075
1571
2137
2443
2539
2156
659
2160 95
914
726
865
610
5810
7011
6817
8625
0217
58 451
615
546
1102 20 73
275
715
8220
8020
5021
74 630
387
1674
2140
1530
1961 34 109
694
734
823
990
1170
1213
1655
2119 14
919
428
737
165
715
9616
2616
5824
4010
8019
2219
322
729
039
374
210
6811
9111
9517
0618
1124
0924
29 126
2139
1941
2131
1518
2007
2469 13
576
598
925
40 288
832
1278
1448
2541 13
618
120
733
571
975
690
211
9914
2215
4515
5816
1616
2516
9818
1018
2720
1921
2221
7724
7925
71 942
2532
1323 53 148
262
370
392
636
870
947
953
966
1294
1299
1345
1430
1441
1659
1715
1747
1847
2018
2049
2176
2179
2408
2458
2536 62 10
815
119
936
643
072
812
4512
8312
8614
1716
3017
3017
9119
1419
8421
7322
7323
8624
2524
44 63 687
693
949
1027
1118
1370
1523
1525
1603
2104
2402
2509
2545 20
647
160
567
294
694
895
412
1713
1513
6614
2914
8215
8016
3916
4620
4521
7521
9623
3623
4724
2025
7025
92 276
1782 17
532
780
618
2612
1226
173
515
3922
55 245
629
1183 33
764
557
410
7719
08 772
1534 85
221
1822
6221
2915
3621
02 640
889
116
49 556
1389
1553 71
740
715
0117
5923
13 722
1274
1403
1688
1921
1987
2302
2404 12
213
424
915
6916
19 222
1687
2462
1359
1834
2431
101
485
551
825
917
1165
1524
1600
1631
1644 24
410
5241
616
2917
9918
5718
9323
7923
8525
3011
2024
6825
8811
5815
6115
5125
525
770
687
420
0031
816
43 510
768
1717 68
015
1422
09 298
851
972
1635
1817
2232
1891
2025 46
970
922
40 572
9716
17 128
204
2238 17
920
1522
5343
422
8421
8543
812
72 616
1618
1218
1291
1421
1815
1868
1363
1929
2337
117
1719 16
512
1525
72 482
1951 589
1878 984
2201
1460
1814 48
913
6511
8118
53 421
542
1911 77
914
9818
8820
9314
5915
0411
913
2017
51 343
590
380
1461
686
2407 61
725
5793
513
4821
1434
219
3024
9815
2183
918
6421
13 453
520
1909 547
587
690
359
1407
1055
2294
2344
644
941
1298 957
1083
2239
2518
2107
2108 54
111
15 667
1963
1854
1962 37
525
5121
1725
2645
745
815
4725
6212
6214
4537
671
350
625
19 527
714
280
2086
1222
1781
2039
2040
2224
720
2087
1981
1256
1231
1261
963
1221
1362
2081
2082
1848
1849 7
214
317
704
804
975
1232
1321
1404
1861
2452
2453 30
135
517
31 565
1275 65
092
010
3913
0214
2524
85 625
808
1499
1586
1724
2265
2463 16 88 41
043
169
569
811
7614
4714
8615
7016
5019
1322
2922
4624
1924
72 50 6818
3519
16 52 374
394
809
1492
1889
2046 36
2415
1869
2285 18
543
510
9712
9213
1917
04 685
1563
2095
2126
2161
2162
2378 15
619
3422
04 190
1682
356
1166
1185
1273
1566
1627
1824
2298
2451 29
551
163
871
882
211
3712
0312
0515
2915
9318
4520
56 526
1543 62
374
410
9017
6122
2825
98 232
400
486
576
988
1304
1349
1517
1685
1726
1744
2184
2274
2493 85
915
5610
0523
4620
6221
52 305
306
1235
2558 11
1178
2283 34
977
612
69 824
1225
1074 962
2169
1451
1906
154
529
218
1946 75
219
5613
6014
75 237
271
553
1395 79 397
1675
2183
2356
560
2330
1614
1965 96
1161
2583
1405
2249 93
811
6212
8819
8635
713
12 922
893
1721
2109
2433 93
912
7118
0823
33 174
334
664
1548
1479
1966
1737
461
462
581
1358
2072
2520 84
713
0370
223
9015
0819
47 325
1237
1493
2011
2331
1533
2549 9
202
279
308
328
470
745
786
801
1136
1141
1382
1410
1481
1585
1819
1953
2116
2242
2289
2488
2601 11
410
7822
91 33 314
415
459
478
480
573
858
1026
1135
1167
1257
1282
1568
1597
1693
1718
1746
Figure: Dendrogram of agnes(x = dist.om1, diss = TRUE, method = "ward"). Agglomerative Coefficient = 1. Vertical axis: Height.
Figure: Dendrogram of agnes(x = dist.om3, diss = TRUE, method = "ward"). Agglomerative Coefficient = 1. Vertical axis: Height.
Panel labels: OM1, OM3.
Distances between sequences: Clustering
State distribution by age, within cluster
Figure: State distribution by age (A15 to A45) within each cluster, one panel per cluster (Group 1 to Group 6). Horizontal axis: Age. Vertical axis: Frequency (0.0 to 1.0).
Most frequent sequences by cluster
Figure: The 10 most frequent sequences in each cluster, one panel per cluster (Group 1 to Group 6). Horizontal axis: Age (A15 to A43). Frequencies of the plotted sequences, in decreasing order:
Group 1: 11.3 %, 9.1 %, 8.4 %, 8.4 %, 8 %, 8 %, 6.9 %, 6.5 %, 6.5 %, 5.1 %
Group 2: 5 %, 4.1 %, 4.1 %, 3.5 %, 3.2 %, 2.9 %, 2.6 %, 2.6 %, 2.3 %, 2.3 %
Group 3: 2.3 %, 1.9 %, 1.8 %, 1.7 %, 1.6 %, 1.6 %, 1.5 %, 1.5 %, 1.3 %, 1.2 %
Group 4: 57.5 %, 3.9 %, 1.6 %, 1.6 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %
Group 5: 1.3 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %, 0.8 %
Group 6: 10.2 %, 8.2 %, 7.8 %, 4.8 %, 4.8 %, 4.8 %, 4.4 %, 3.4 %, 1.9 %, 1.9 %
I-plot by cluster
Distribution by birth cohort within each cluster
Figure: Histograms of year of birth, one panel per cluster (Group 1 to Group 6). Horizontal axis: year (1910 to 1960). Vertical axis: Frequency.
Multidimensional Scaling representation of sequences
Multidimensional Scaling: Principle
Let D be a distance matrix between sequences.
D is computed using OM, LPS, LCS, ... metrics.
Multidimensional Scaling consists in
  Finding a set of real-valued variables (f1, f2) such that the Euclidean distances
  $\delta_{ij} = \sqrt{(f_{i1} - f_{j1})^2 + (f_{i2} - f_{j2})^2}$
  best approximate the distances $d_{ij}$ between sequences.
  Plotting the points in the (f1, f2) space.
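The principle above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses classical (Torgerson) scaling, which is one common way to obtain (f1, f2), and it extracts the two leading eigenvectors by plain power iteration so that no external library is needed. The function name `classical_mds` and the toy rectangle data are hypothetical and do not come from the talk.

```python
import math

def classical_mds(d, dims=2, iters=500):
    """Embed n objects in `dims` dimensions from a symmetric distance matrix d."""
    n = len(d)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [float(i + 1) for i in range(n)]  # arbitrary start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (v has unit length or is zero).
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Toy check (hypothetical data): four points forming a 3 x 4 rectangle.
pts = [(0, 0), (3, 0), (0, 4), (3, 4)]
d = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]
f = classical_mds(d)
# delta[i][j] plays the role of the fitted distance between points i and j.
delta = [[math.hypot(a[0] - c[0], a[1] - c[1]) for c in f] for a in f]
```

For distances that are exactly Euclidean in two dimensions, as in the toy check, the fitted distances reproduce d. Sequence dissimilarities such as OM are generally not Euclidean, so the fit is then only approximate, and a production implementation would use a proper eigen-solver rather than power iteration.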
Multidimensional Scaling
Figure: Multidimensional scaling scatter plot of the sequences. Horizontal axis: dist.om.mds$points[,1]. Vertical axis: dist.om.mds$points[,2]. Legend: Group 1 to Group 6.
Mining Frequent Episodes
What can we expect from frequent episodes mining?
  GSP (Srikant and Agrawal, 1996)
  MINEPI, WINEPI (Mannila et al., 1997)
  TCG, TAG (Bettini et al., 1996)
  SPADE (Zaki, 2001)
Are there specific issues when applying these methods in social sciences?
What Is It About?
Frequent episodes. What is it?
Episode: Collection of events occurring frequently together.
Mining typical episodes:
  Specialized case of mining frequent itemsets.
  Time dimension ⇒ Partially ordered events.
More complex than unordered itemsets: User must
  specify time constraints (and episode structure constraints).
  select a counting method.
Episode structure constraints
For people who leave home within 2 years of turning 17, what are typical events occurring until they get married and have a first child?
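Figure: Episode structure graph. Node labels: LH,17; ??; C1; M. Window annotations: w = 2, w = 1. Edge labels: (0, 4), (0, 3), (0, 1, 10). Annotations: elastic, parallel, event constraints, node constraint, edge constraints.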
Counting methods (Joshi et al., 2001)
Time: 20 21 22 23 24
Events: U UUC C C
Searching (U,C) with min gap = 1, max gap = 2, win size = 2:
  individuals with episode: COBJ = 1
  windows with episode: CWIN = 3
  minimal windows with episode: CminWIN = 2
  distinct occurrences: CDIS_o = 5
  distinct occurrences without overlap: CDIS = 3
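The five counts can be reproduced with a short sketch. Two things here are assumptions rather than facts from the slide: the event layout (U at times 20, 21, 22 and C at times 22, 23, 24) is one reading of the flattened timeline, and the counting rules coded below are a plain interpretation of the five labels, not a transcription of the definitions in Joshi et al. (2001). The function name `count_episode` is hypothetical.

```python
def count_episode(us, cs, min_gap, max_gap, win):
    """Count the episode (U, C) in one individual's sequence of integer event times."""
    # Every (U time, C time) pair that satisfies the gap constraints.
    occ = [(tu, tc) for tu in us for tc in cs if min_gap <= tc - tu <= max_gap]
    # Non-overlapping occurrences: greedily pair each C with the earliest free U.
    used_u, used_c = set(), set()
    for tu, tc in sorted(occ, key=lambda p: (p[1], p[0])):
        if tu not in used_u and tc not in used_c:
            used_u.add(tu)
            used_c.add(tc)
    # Sliding windows [s, s + win] that contain at least one occurrence.
    times = list(us) + list(cs)
    starts = range(min(times) - win, max(times) + 1)
    n_win = sum(any(s <= tu and tc <= s + win for tu, tc in occ) for s in starts)
    # Minimal windows: occurrence spans that contain no shorter occurrence span.
    spans = {p for p in occ if p[1] - p[0] <= win}
    minimal = [a for a in spans
               if not any(b != a and a[0] <= b[0] and b[1] <= a[1] for b in spans)]
    return {"COBJ": int(bool(occ)), "CWIN": n_win, "CminWIN": len(minimal),
            "CDIS_o": len(occ), "CDIS": len(used_c)}

# Assumed layout of the example: U at 20, 21, 22 and C at 22, 23, 24.
counts = count_episode(us=[20, 21, 22], cs=[22, 23, 24],
                       min_gap=1, max_gap=2, win=2)
print(counts)
```

With this assumed layout the sketch returns COBJ = 1, CWIN = 3, CminWIN = 2, CDIS_o = 5 and CDIS = 3, the values listed above. The same sequence thus yields supports ranging from 1 to 5 depending on the counting method, which is why the user has to choose one explicitly.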
Example: Counting Alternate Episode Structures
Example: Counting alternate structures (COBJ, no max gap)
Figure: Bar chart (vertical axis 0% to 30%) with one bar per ordering of each pair of events:
  Child < Marriage, Marriage < Child, Child = Marriage
  Child < Job, Job < Child, Child = Job
  Child < Educ end, Educ end < Child, Child = Educ end
  Marriage < Job, Job < Marriage, Marriage = Job
  Marriage < Educ end, Educ end < Marriage, Marriage = Educ end
  Job < Educ end, Educ end < Job, Job = Educ end
Switzerland, SHP 2002 biographical survey (n = 5560).
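Counting alternate structures with COBJ and no maximum gap amounts to asking, for each individual, which of two events came first. A minimal sketch follows, assuming each individual is stored as a mapping from event name to the age at which the event occurred. The function `order_shares` and the four toy individuals are hypothetical and are not SHP data.

```python
def order_shares(people, a, b):
    """Share of all individuals for whom a precedes, follows or coincides with b."""
    counts = {f"{a} < {b}": 0, f"{b} < {a}": 0, f"{a} = {b}": 0}
    for events in people:
        if a in events and b in events:  # both events must be observed
            if events[a] < events[b]:
                counts[f"{a} < {b}"] += 1
            elif events[a] > events[b]:
                counts[f"{b} < {a}"] += 1
            else:  # same time unit (here: same age in years)
                counts[f"{a} = {b}"] += 1
    return {k: v / len(people) for k, v in counts.items()}

# Hypothetical toy data: age at which each event occurred.
people = [
    {"Marriage": 24, "Child": 26, "Job": 20},
    {"Marriage": 25, "Child": 25},
    {"Child": 22, "Marriage": 27},
    {"Job": 19},
]
shares = order_shares(people, "Child", "Marriage")
print(shares)
```

Shares are taken over all individuals, including those who experienced only one or neither of the two events, so the three bars for a pair need not sum to 100%.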
Issues Regarding Episode Rules
Rules between episodes
Social scientists like causal explanations.
Empirically assessed rules are valuable material in that respect.
Little attention paid to this aspect in the literature on frequent subsequences.
Mined episodes are already structured: if (U,C) is a frequent episode, then we know that C often follows U.
Deriving association rules from frequent ordered patterns is similar to what is done with unordered itemsets.
Rule relevance criteria: confidence, surprisingness, implication strength, ...
Their value depends on the selected counting method.
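As an illustration of one relevance criterion, the confidence of the rule "U is followed by C" can be computed from individual-level (COBJ-style) counts as the number of individuals with the episode (U,C) divided by the number of individuals with U. The sketch below assumes each sequence is a list of (time, event) pairs; the helper names and the toy sequences are hypothetical.

```python
def has_episode(seq, first, second):
    """True if some `second` event occurs strictly after some `first` event."""
    t_first = [t for t, e in seq if e == first]
    t_second = [t for t, e in seq if e == second]
    return bool(t_first) and bool(t_second) and min(t_first) < max(t_second)

def rule_confidence(seqs, first, second):
    """Confidence of `first` => `second`, counting each individual at most once."""
    n_first = sum(any(e == first for _, e in s) for s in seqs)
    n_both = sum(has_episode(s, first, second) for s in seqs)
    return n_both / n_first if n_first else float("nan")

# Hypothetical toy sequences of (time, event) pairs.
seqs = [
    [(20, "U"), (22, "C")],             # C follows U
    [(21, "U")],                        # U only
    [(19, "C"), (23, "U")],             # C precedes U
    [(20, "U"), (21, "U"), (24, "C")],  # C follows U
]
conf = rule_confidence(seqs, "U", "C")
print(conf)
```

Replacing the individual-level counts by window-based or occurrence-based counts would change both numerator and denominator, and hence the confidence value.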
Issues with episode rules in social sciences
Parallel life courses:
  Family events and professional life course.
  Life courses of each partner of a couple.
Mining associations between frequent episodes of a sequence and those of its parallel sequence. Two strategies:
  Mine frequent episodes from the mix of the 2 sequences, then restrict the search for rules to candidates whose premise and consequence belong to different sequences.
  Mine frequent episodes from each sequence separately, then search for rules among candidates obtained by combining frequent episodes from each sequence.
Accounting for multi-level effects when validating rules.
  Is the rule relevant among groups, or within groups?
Summary
Data mining approaches (survival trees, clustering sequences, frequent episodes) have a promising future in life course analysis.
They complement classical statistical outcomes with new insights.
Their use within social sciences raises specific issues:
  Accounting for multi-level effects when growing survival trees or mining association rules.
  Handling time varying predictors in survival trees.
  Selecting relevant counting methods (event dependent)?
  Suitable criteria for measuring association strength between frequent episodes.
  ...
Our TraMineR R-package
Let me finish with an ad ...
TraMineR, a free life trajectory mining tool for the free open source R statistical environment.
Downloadable from http://mephisto.unige.ch/biomining and soon from CRAN.
Thank You!
Appendix
Zoomed tree
Divorce, Switzerland, Differences in KM Survival Curves I
Figure: Zoomed survival tree.
Sub-sequences
Clusters and subsequences
Figure: Bar plots of subsequences within each cluster (vertical axis 0.0 to 1.0). Bar labels by panel:
Group 1: m1, e1, m_e10, m_e5, m_e1, em1, s1, c1
Group 2: m1, d1, d_m10, dm1, c1, d_m5, c_m10, c_m5
Group 3: m1, d1, e1, m_e10, d_e10, m_e5, dm1, d_e5
Group 4: m1, d1, e1, dm1, m_e10, d_e10, m_e5, c1
Group 5: m1, s1, d1, e1, m_s10, dm1, d_e10, d_m10
Group 6: d1, c1, cd1, e1, d_c10, m1, d_e10, ce1
Biofam data: Legend
Figure: State distribution by age (A15 to A45) for Group 1. Horizontal axis: Age. Vertical axis: Frequency (0.0 to 1.0).
Legend (states):
  no event
  left home
  married with/without child
  left home, married
  with child
  left home, with child
  left home, married, child
  divorced
For Further Reading
For Further Reading I
Abbott, A. and J. Forrest (1986). Optimal matching methods for historical sequences. Journal of Interdisciplinary History 16, 471–494.
Bettini, C., X. S. Wang, and S. Jajodia (1996). Testing complex temporal relationships involving multiple granularities and its application to data mining (extended abstract). In PODS ’96: Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, New York, pp. 68–78. ACM Press.
For Further Reading II
Billari, F. C. (2005). Life course analysis: Two (complementary) cultures? Some reflections with examples from the analysis of transition to adulthood. In P. Ghisletta, J.-M. Le Goff, R. Levy, D. Spini, and E. Widmer (Eds.), Towards an Interdisciplinary Perspective on the Life Course, Advancements in Life Course Research, Vol. 10, pp. 267–288. Amsterdam: Elsevier.
Blossfeld, H.-P. and G. Rohwer (2002). Techniques of Event History Modeling, New Approaches to Causal Analysis (2nd ed.). Mahwah NJ: Lawrence Erlbaum.
Elzinga, C. H. (2008). Sequence analysis: Metric representations of categorical time series. Sociological Methods and Research. Forthcoming.
For Further Reading III
Elzinga, C. H. and A. C. Liefbroer (2007). De-standardization of family-life trajectories of young adults: A cross-national comparison using sequence analysis. European Journal of Population 23, 225–250.
Huang, X., S. Chen, and S. Soong (1998). Piecewise exponential survival trees with time-dependent covariates. Biometrics 54, 1420–1433.
Joshi, M. V., G. Karypis, and V. Kumar (2001). A universal formulation of sequential patterns. In Proceedings of the KDD’2001 workshop on Temporal Data Mining, San Francisco, August 2001.
Leblanc, M. and J. Crowley (1992). Relative risk trees for censored survival data. Biometrics 48, 411–425.
For Further Reading IV
Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707–710.
Mannila, H., H. Toivonen, and A. I. Verkamo (1997). Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1(3), 259–289.
Needleman, S. and C. Wunsch (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453.
Segal, M. R. (1988). Regression trees for censored data. Biometrics 44, 35–47.
For Further Reading V
Segal, M. R. (1992). Tree-structured methods for longitudinal data. Journal of the American Statistical Association 87 (418), 407–418.
Srikant, R. and R. Agrawal (1996). Mining sequential patterns: Generalizations and performance improvements. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin (Eds.), Advances in Database Technologies – 5th International Conference on Extending Database Technology (EDBT’96), Avignon, France, Volume 1057, pp. 3–17. Springer-Verlag.
Zaki, M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences. Machine Learning 42(1/2), 31–60.