clustering random walk time series

25
Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Clustering Random Walk Time Series GSI 2015 - Geometric Science of Information Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat 29 October 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Upload: hellebore-capital-limited

Post on 15-Apr-2017

371 views

Category:

Science


0 download

TRANSCRIPT

Page 1: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Clustering Random Walk Time SeriesGSI 2015 - Geometric Science of Information

Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat

29 October 2015

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 2: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

1 Introduction

2 Geometry of Random Walk Time Series

3 The Hierarchical Block Model

4 Conclusion

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 3: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Context (data from www.datagrapple.com)

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 4: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

What is a clustering program?

Definition

Clustering is the task of grouping a set of objects in such a waythat objects in the same group (cluster) are more similar to eachother than those in different groups.

Example of a clustering program

We aim at finding k groups by positioning k group centers{c1, . . . , ck} such that data points {x1, . . . , xn} minimizeminc1,...,ck

∑ni=1 mink

j=1 d(xi , cj)2

But, what is the distance d between two random walk time series?

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 5: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

What are clusters of Random Walk Time Series?

French banks and building materialsCDS over 2006-2015

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 6: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

What are clusters of Random Walk Time Series?

French banks and building materialsCDS over 2006-2015

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 7: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

1 Introduction

2 Geometry of Random Walk Time Series

3 The Hierarchical Block Model

4 Conclusion

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 8: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Geometry of RW TS ≡ Geometry of Random Variables

i.i.d. observations:

X1 : X 11 , X 2

1 , . . . , XT1

X2 : X 12 , X 2

2 , . . . , XT2

. . . , . . . , . . . , . . . , . . .XN : X 1

N , X 2N , . . . , XT

N

Which distances d(Xi ,Xj) between dependent random variables?

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 9: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Pitfalls of a basic distance

Let (X ,Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σ2X ),

Y ∼ N (µY , σ2Y ) and whose correlation is ρ(X ,Y ) ∈ [−1, 1].

E[(X − Y )2] = (µX − µY )2 + (σX − σY )2 + 2σXσY (1− ρ(X ,Y ))

Now, consider the following values for correlation:

ρ(X ,Y ) = 0, so E[(X − Y )2] = (µX − µY )2 + σ2X + σ2

Y .Assume µX = µY and σX = σY . For σX = σY � 1, weobtain E[(X − Y )2]� 1 instead of the distance 0, expectedfrom comparing two equal Gaussians.

ρ(X ,Y ) = 1, so E[(X − Y )2] = (µX − µY )2 + (σX − σY )2.

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 10: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Pitfalls of a basic distanceLet (X , Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σ

2X ), Y ∼ N (µY , σ

2Y ) and whose correlation is

ρ(X , Y ) ∈ [−1, 1].

E[(X − Y )2] = (µX − µY )2 + (σX − σY )2 + 2σXσY (1− ρ(X , Y ))

Now, consider the following values for correlation:

ρ(X , Y ) = 0, so E[(X − Y )2] = (µX − µY )2 + σ2X + σ2

Y . Assume µX = µY and σX = σY . For

σX = σY � 1, we obtain E[(X − Y )2]� 1 instead of the distance 0, expected from comparing twoequal Gaussians.

ρ(X , Y ) = 1, so E[(X − Y )2] = (µX − µY )2 + (σX − σY )2.

30 20 10 0 10 20 300.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40 Probability density functions of Gaus-sians N (−5, 1) and N (5, 1), Gaus-sians N (−5, 3) and N (5, 3), andGaussians N (−5, 10) and N (5, 10).Green, red and blue Gaussians areequidistant using L2 geometry on theparameter space (µ, σ).

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 11: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Sklar’s Theorem

Theorem (Sklar’s Theorem (1959))

For any random vector X = (X1, . . . ,XN) having continuousmarginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P isuniquely expressed as

P(X1, . . . ,XN) = C (P1(X1), . . . ,PN(XN)),

where C, the multivariate distribution of uniform marginals, isknown as the copula of X .

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 12: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Sklar’s Theorem

Theorem (Sklar’s Theorem (1959))

For any random vector X = (X1, . . . , XN ) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulativedistribution P is uniquely expressed as P(X1, . . . , XN ) = C(P1(X1), . . . , PN (XN )), where C, the multivariatedistribution of uniform marginals, is known as the copula of X .

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 13: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

The Copula Transform

Definition (The Copula Transform)

Let X = (X1, . . . ,XN) be a random vector with continuousmarginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N.The random vector

U = (U1, . . . ,UN) := P(X ) = (P1(X1), . . . ,PN(XN))

is known as the copula transform.

Ui , 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probabilityintegral transform): for Pi the cdf of Xi , we havex = Pi (Pi

−1(x)) = Pr(Xi ≤ Pi−1(x)) = Pr(Pi (Xi ) ≤ x), thus

Pi (Xi ) ∼ U [0, 1].

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 14: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

The Copula Transform

Definition (The Copula Transform)

Let X = (X1, . . . , XN ) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi ,1 ≤ i ≤ N. The random vector U = (U1, . . . ,UN ) := P(X ) = (P1(X1), . . . , PN (XN )) is known as the copulatransform.

0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2

X∼U[0,1]10

8

6

4

2

0

2

Y∼

ln(X

)

ρ≈0.84

0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2

PX (X)0.2

0.0

0.2

0.4

0.6

0.8

1.0

1.2

PY(Y

)

ρ=1

The Copula Transform invariance to strictly increasing transformation

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 15: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Deheuvels’ Empirical Copula Transform

Let (X t1 , . . . , X

tN ), 1 ≤ t ≤ T , be T observations from a random vector (X1, . . . , XN ) with continuous margins.

Since one cannot directly obtain the corresponding copula observations (Ut1, . . . ,U

tN ) = (P1(X t

1 ), . . . , PN (X tN )),

where t = 1, . . . ,T , without knowing a priori (P1, . . . , PN ), one can instead

Definition (The Empirical Copula Transform)

estimate the N empirical margins PTi (x) = 1

T

∑Tt=1 1(X t

i ≤ x),1 ≤ i ≤ N, to obtain the T empirical observations

(Ut1, . . . , U

tN) = (PT

1 (X t1 ), . . . ,PT

N (X tN)).

Equivalently, since Uti = R t

i /T , R ti being the rank of observation

X ti , the empirical copula transform can be considered as the

normalized rank transform.

In practice

x_transform = rankdata(x)/len(x)

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 16: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Generic Non-Parametric Distance

d2θ (Xi ,Xj) = θ3E

[|Pi (Xi )− Pj(Xj)|2

]+ (1− θ)

1

2

∫R

(√dPi

dλ−√

dPj

)2

(i) 0 ≤ dθ ≤ 1, (ii) 0 < θ < 1, dθ metric,(iii) dθ is invariant under diffeomorphism

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 17: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Generic Non-Parametric Distance

d20 : 1

2

∫R

(√dPidλ −

√dPj

)2

dλ = Hellinger2

d21 : 3E

[|Pi (Xi )− Pj(Xj)|2

]=

1− ρS2

= 2−6

∫ 1

0

∫ 1

0C (u, v)dudv

Remark:If f (x , θ) = cΦ(u1, . . . , uN ; Σ)

∏Ni=1 fi (xi ; νi ) then

ds2 = ds2GaussCopula +

N∑i=1

ds2margins

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 18: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

1 Introduction

2 Geometry of Random Walk Time Series

3 The Hierarchical Block Model

4 Conclusion

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 19: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

The Hierarchical Block Model

A model of nested partitions

The nested partitions defined by themodel can be seen on the distancematrix for a proper distance and theright permutation of the data points

In practice, one observe and workwith the above distance matrixwhich is identitical to the left oneup to a permutation of the data

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 20: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Results: Data from Hierarchical Block Model

Adjusted Rand IndexAlgo. Distance Distrib Correl Correl+Distrib

HC-AL

(1− ρ)/2 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01

E[(X − Y )2] 0.00 ±0.00 0.09 ±0.12 0.55 ±0.05

GPR θ = 0 0.34 ±0.01 0.01 ±0.01 0.06 ±0.02

GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01

GPR θ = .5 0.34 ±0.01 0.59 ±0.12 0.57 ±0.01

GNPR θ = 0 1 0.00 ±0.00 0.17 ±0.00

GNPR θ = 1 0.00 ±0.00 1 0.57 ±0.00

GNPR θ = .5 0.99 ±0.01 0.25 ±0.20 0.95 ±0.08

AP

(1− ρ)/2 0.00 ±0.00 0.99 ±0.07 0.48 ±0.02

E[(X − Y )2] 0.14 ±0.03 0.94 ±0.02 0.59 ±0.00

GPR θ = 0 0.25 ±0.08 0.01 ±0.01 0.05 ±0.02

GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.48 ±0.02

GPR θ = .5 0.06 ±0.00 0.80 ±0.10 0.52 ±0.02

GNPR θ = 0 1 0.00 ±0.00 0.18 ±0.01

GNPR θ = 1 0.00 ±0.01 1 0.59 ±0.00

GNPR θ = .5 0.39 ±0.02 0.39 ±0.11 1

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 21: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Results: Application to Credit Default Swap Time Series

Distance matricescomputed on CDStime series exhibit ahierarchical blockstructure

Marti, Very, Donnat,

Nielsen IEEE ICMLA 2015

(un)Stability ofclusters with L2

distance

Stability of clusterswith the proposeddistance

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 22: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Consistency

Definition (Consistency of a clustering algorithm)

A clustering algorithm A is consistent with respect to the HierarchicalBlock Model defining a set of nested partitions P if the probability thatthe algorithm A recovers all the partitions in P converges to 1 whenT →∞.

Definition (Space-conserving algorithm)

A space-conserving algorithm does not distort the space, i.e. the distanceDij between two clusters Ci and Cj is such that

Dij ∈[

minx∈Ci ,y∈Cj

d(x , y), maxx∈Ci ,y∈Cj

d(x , y)

].

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 23: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Consistency

Theorem (Consistency of space-conserving algorithms (Andler,Marti, Nielsen, Donnat, 2015))

Space-conserving algorithms (e.g., Single, Average, CompleteLinkage) are consistent with respect to the Hierarchical BlockModel.

T = 100 T = 1000 T = 10000

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 24: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

1 Introduction

2 Geometry of Random Walk Time Series

3 The Hierarchical Block Model

4 Conclusion

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Page 25: Clustering Random Walk Time Series

IntroductionGeometry of Random Walk Time Series

The Hierarchical Block ModelConclusion

Discussion and questions?

Avenue for research:

distances on (copula,margins)clustering using multivariate dependence informationclustering using multi-wise dependence information

Optimal Copula Transport for Clustering Multivariate Time Series,Marti, Nielsen, Donnat, 2015

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series