influence function of the error rate of generalized k-means - joint ...€¦ · influence function...

78
Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence function of the error rate Bias of the error rate Simulation study Future research Influence function of the error rate of generalized k-means Joint work with G. Haesbroeck Ch. Ruwet UNIVERSITY OF LIÈGE 30 March 2009

Upload: others

Post on 18-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Influence function of the error rate ofgeneralized k-meansJoint work with G. Haesbroeck

Ch. Ruwet

UNIVERSITY OF LIÈGE

30 March 2009

Page 2: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Introduction

Page 3: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

The k -means clustering method

Aim of clustering : Group similar observations in kclusters C1, . . . , Ck .

The k -means algorithm constructs clusters in order tominimize the within cluster sum of squared distances.

Let us focus on k = 2 groups.

Page 4: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Classification based on clustering

In case of natural groups in the data, clustering may beused to find these groups via a classification rule :

x ∈ Cj ⇔ ‖x − xj‖ = min1≤i≤2

‖x − xi‖

Page 5: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 1D

Height of students in 1BM

Height (in cm)

160 170 180 190

Page 6: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 1D

Height of students in 1BM

Height (in cm)

160 170 180 190

Page 7: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 2D

40 50 60 70 80 90

160

170

180

190

Students of 1BM

Weight (in kg)

Hei

ght (

in c

m)

Page 8: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 2D

40 50 60 70 80 90

160

170

180

190

Students of 1BM

Weight (in kg)

Hei

ght (

in c

m)

Page 9: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D

Height of students in 1BM

Height (in cm)

120 140 160 180

Page 10: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D

Height of students in 1BM

Height (in cm)

120 140 160 180

Page 11: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

The generalized 2-means clustering method

The clusters’ centers (T1, T2) are solution of

mint1,t2⊂R2

n∑

i=1

Ω

(

inf1≤j≤2

‖xi − tj‖)

for a suitable nondecreasing penalty function Ω.

Classical penalty functions :

Ω(x) = x2 → 2-means method

Ω(x) = x → 2-medoids method

Page 12: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example with the 2-medoidsmethod

Height of students in 1BM

Height (in cm)

120 130 140 150 160 170 180 190

Page 13: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

The generalized 2-means clustering method

The classification rule :

x ∈ Cj ⇔ Ω(‖x − Tj‖) = min1≤i≤2

Ω(‖x − Ti‖)

In one dimension, the estimated clusters are simply:

C1 =] −∞, C[

C2 =]C, +∞[

where C =T1 + T2

2is the cut-off point.

Page 14: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Error rate

Page 15: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 1D

Height of students in 1BM

Height (in cm)

160 170 180 190

GirlsBoys

Page 16: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 1D with the 2-means

Height of students in 1BM

Height (in cm)

160 170 180 190

GirlsBoys

Page 17: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 1D with the 2-means

Height of students in 1BM

Height (in cm)

160 170 180 190

GirlsBoys

Page 18: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Example in 1D with the 2-means

Height of students in 1BM

Height (in cm)

160 170 180 190

GirlsBoys

ER=0.326

Page 19: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D

Height of students in 1BM

Height (in cm)

120 140 160 180

GirlsBoys

Page 20: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D with the 2-means

Height of students in 1BM

Height (in cm)

120 140 160 180

GirlsBoys

Page 21: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D with the 2-means

Height of students in 1BM

Height (in cm)

120 140 160 180

GirlsBoys

Page 22: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D with the 2-means

Height of students in 1BM

Height (in cm)

120 140 160 180

GirlsBoys

ER=0.609

Page 23: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example in 1D with the2-medoids

Height of students in 1BM

Height (in cm)

120 140 160 180

GirlsBoys

ER=0.370

Page 24: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Classification setting

Suppose

X arises from 2 groups G1 and G2 with πi(F ) = IPF [X ∈ Gi ]

then

F is a mixture of two distributions

F = π1(F )F1 + π2(F )F2

with densities f1 and f2.

Additional assumption : one dimension !

Page 25: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

The generalized 2-means as statisticalfunctionals

The clusters’ centers (T1(F ), T2(F )) are solution of

mint1,t2⊂R2

Ω

(

inf1≤j≤2

‖x − tj‖)

dF (x)

for a suitable nondecreasing penalty function Ω.

The classification rule is

RF (x) = Cj(F ) ⇔ Ω(‖x−Tj(F )‖) = min1≤i≤2

Ω(‖x−Ti(F )‖)

Page 26: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Optimality in classification

A classification rule is optimal if the corresponding errorrate is minimal

The optimal classification rule is the Bayes rule :

x ∈ C1 ⇔ π1(F )f1(x) > π2(F )f2(x)

(Anderson, 1958)

The 2-means procedure is optimal under the model

FN = 0.5 N(µ1, σ2) + 0.5 N(µ2, σ

2) with µ1 < µ2

(Qiu and Tamhane, 2007)

Page 27: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate

Theoretical error rate :• Training sample according to F : estimation of the rule• Test sample according to Fm : evaluation of the rule• In ideal circumstances : F = Fm

ER(F , Fm) =2

j=1

πj(Fm)IPFm

[

RF (X ) 6= Cj(F )∣

∣ Gj]

Empirical error rate :• Training sample according to F : estimation and

evaluation of the rule

ER(F , F ) =2

j=1

πj(F )IPF[

RF (X ) 6= Cj(F )∣

∣ Gj]

Page 28: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate

Theoretical error rate :• Training sample according to F : estimation of the rule• Test sample according to Fm : evaluation of the rule• In ideal circumstances : F = Fm

ER(F , Fm) =2

j=1

πj(Fm)IPFm

[

RF (X ) 6= Cj(F )∣

∣ Gj]

Empirical error rate :• Training sample according to F : estimation and

evaluation of the rule

ER(F , F ) =2

j=1

πj(F )IPF[

RF (X ) 6= Cj(F )∣

∣ Gj]

Page 29: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated distribution

A contaminated distribution is defined by

""D

DD

DD

DD

D

zzuuuu

uuuu

uu

1 − ε : F ε : G

where G is a arbitrary distribution function.

Page 30: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated distribution

A contaminated distribution is defined by

""D

DD

DD

DD

D

zzuuuu

uuuu

uu

1 − ε : F ε : G

where G is a arbitrary distribution function.

To see the influence of one singular point x , G = ∆x leadingto

Fε = (1 − ε)F + ε∆x

Page 31: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated mixture

−4 −2 0 2 4

0.00

0.05

0.10

0.15

0.20

Mixture

x

0.4

0.6

−4 −2 0 2 4

0.00

0.05

0.10

0.15

0.20

Contaminated mixture

x

(1−ε)0.4

(1−ε)0.6

Page 32: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate undercontamination

Now, the training sample is distributed as Fε which is acontaminated mixture.

Theoretical error rate :

ER(Fε, Fm) =2

j=1

πj(Fm)IPFm

[

RFε(X ) 6= Cj(Fε)

∣ Gj]

Empirical error rate :

ER(Fε, Fε) =2

j=1

πj(Fε)IPFε

[

RFε(X ) 6= Cj(Fε)

∣ Gj]

Page 33: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate undercontamination

Now, the training sample is distributed as Fε which is acontaminated mixture.

Theoretical error rate :

ER(Fε, Fm) =2

j=1

πj(Fm)IPFm

[

RFε(X ) 6= Cj(Fε)

∣ Gj]

Empirical error rate :

ER(Fε, Fε) =2

j=1

πj(Fε)IPFε

[

RFε(X ) 6= Cj(Fε)

∣ Gj]

Page 34: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate undercontamination

Graphically :

Fm = FN ≡ 0.5 N(−1, 1) + 0.5 N(1, 1) an optimal model

C(FN) = −1+12 = 0 (Qiu and Tamhane, 2007)

Fε = (1 − ε)Fm + ε∆x

ε varying and x = −0.5

x ∈ G1 varying and ε = 0.1

Page 35: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate undercontamination

Theoretical error rate (with the 2-means) :

0.0 0.2 0.4 0.6 0.8

0.16

00.

165

0.17

00.

175

0.18

0

Err

or R

ate

ε−2 −1 0 1 2

0.15

90.

160

0.16

10.

162

0.16

3

x

Err

or R

ate

Page 36: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate undercontamination

Empirical error rate (with the 2-means) :

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Err

or R

ate

If x is well placedIf not

ε−2 −1 0 1 2

0.14

0.16

0.18

0.20

0.22

0.24

x

Err

or R

ate

Page 37: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Influence function of theerror rate

Page 38: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Influence functions

Hampel et al (1986) : For any statistical functional T andany distribution F ,

IF(x ; T, F ) = limε→0

T(Fε) − T(F )

ε=

∂εT(Fε)

ε=0where

Fε = (1 − ε)F + ε∆x ;

EF [IF(X ; T, F )] = 0;

T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough(First-order von Mises expansion of T at F ).

Page 39: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Influence functions

Hampel et al (1986) : For any statistical functional T andany distribution F ,

IF(x ; T, F ) = limε→0

T(Fε) − T(F )

ε=

∂εT(Fε)

ε=0where

Fε = (1 − ε)F + ε∆x ;

EF [IF(X ; T, F )] = 0;

T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough(First-order von Mises expansion of T at F ).

Page 40: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate

ER(Fε, F ) ≈ ER(F , F ) + εIF(x ; ER, F )

ER(Fε, Fε) ≈ ER(F , F ) + εIF(x ; ER, F )

Theoretical error rate :

ER(Fε, FN) ≥ ER(FN , FN) ⇒ IF(x ; ER, FN) ≡ 0

Empirical error rate : The IF does not vanish!From now on, ER(F ) = ER(F , F ).

Page 41: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate

ER(Fε, F ) ≈ ER(F , F ) + εIF(x ; ER, F )

ER(Fε, Fε) ≈ ER(F , F ) + εIF(x ; ER, F )

Theoretical error rate :

ER(Fε, FN) ≥ ER(FN , FN) ⇒ IF(x ; ER, FN) ≡ 0

Empirical error rate : The IF does not vanish!From now on, ER(F ) = ER(F , F ).

Page 42: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Theoretical vs empirical error rate

ER(Fε, F ) ≈ ER(F , F ) + εIF(x ; ER, F )

ER(Fε, Fε) ≈ ER(F , F ) + εIF(x ; ER, F )

Theoretical error rate :

ER(Fε, FN) ≥ ER(FN , FN) ⇒ IF(x ; ER, FN) ≡ 0

Empirical error rate : The IF does not vanish!From now on, ER(F ) = ER(F , F ).

Page 43: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

ER(Fε) =?

ER(F ) =2

j=1

πj(F )IPF[

RF (X ) 6= Cj(F )∣

∣ Gj]

= π1(F )1 − F1(C(F )) + π2(F )F2(C(F ))

Page 44: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

ER(Fε) =?

ER(F ) =2

j=1

πj(F )IPF[

RF (X ) 6= Cj(F )∣

∣ Gj]

= π1(F )1 − F1(C(F )) + π2(F )F2(C(F ))

Under Fε = (1 − ε)F + ε∆x , one has

ER(Fε) = π1(Fε)

1 − F1,ε (C(Fε))

+ π2(Fε)F2,ε (C(Fε))

Page 45: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

ER(Fε) =?

ER(F ) =2

j=1

πj(F )IPF[

RF (X ) 6= Cj(F )∣

∣ Gj]

= π1(F )1 − F1(C(F )) + π2(F )F2(C(F ))

Under Fε = (1 − ε)F + ε∆x , one has

ER(Fε) = π1(Fε)

1 − F1,ε (C(Fε))

+ π2(Fε)F2,ε (C(Fε))

Page 46: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

πi(Fε) =? and Fi ,ε =?

Fε = (1 − ε)F + ε∆x

πi(Fε) = (1 − ε)πi(F ) + εIx ∈ Gi

Fi,ε =

(

1 −εIx ∈ Gi

πi(Fε)

)

Fi +εIx ∈ Gi

πi(Fε)∆x

⇒ Fε = π1(Fε)F1,ε + π2(Fε)F2,ε

Page 47: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

πi(Fε) =? and Fi ,ε =?

Fε = (1 − ε)F + ε∆x

πi(Fε) = (1 − ε)πi(F ) + εIx ∈ Gi

Fi,ε =

(

1 −εIx ∈ Gi

πi(Fε)

)

Fi +εIx ∈ Gi

πi(Fε)∆x

⇒ Fε = π1(Fε)F1,ε + π2(Fε)F2,ε

Page 48: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

πi(Fε) =? and Fi ,ε =?

Fε = (1 − ε)F + ε∆x

πi(Fε) = (1 − ε)πi(F ) + εIx ∈ Gi

Fi,ε =

(

1 −εIx ∈ Gi

πi(Fε)

)

Fi +εIx ∈ Gi

πi(Fε)∆x

⇒ Fε = π1(Fε)F1,ε + π2(Fε)F2,ε

Page 49: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Influence function of the empirical error rate

Proposition

For all x 6= C(F ),

IF(x ; ER, F ) = −ER(F )

+ Ix ≤ C(F )(1 − 2 Ix ∈ G1) + Ix ∈ G1

+12(IF(x ; T1, F ) + IF(x ; T2, F ))

π2(F )f2(C(F )) − π1(F )f1(C(F )).

Expressions of IF(x ; T1, F ) and IF(x ; T2, F ) have beencomputed by García-Escudero and Gordaliza (1999).

Page 50: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Graphics of IF(x ; ER, F )

−4 −2 0 2 4

0.0

0.5

1.0

x

IF E

R

x in G1x in G2

−4 −2 0 2 4

0.0

0.5

1.0

xIF

ER

x in G1x in G2

Page 51: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Graphics of IF(x ; ER, F )

−4 −2 0 2 4

−2

−1

01

2

x

IF E

R

pi1=0.2pi1=0.4pi1=0.6pi1=0.8

−4 −2 0 2 4

−2

−1

01

2x

IF E

R

pi1=0.2pi1=0.4pi1=0.6pi1=0.8

Page 52: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Graphics of IF(x ; ER, F )

∆ = µ2 − µ1

−4 −2 0 2 4

−1.

0−

0.5

0.0

0.5

1.0

1.5

x

IF E

R

Delta=1Delta=3Delta=5

−4 −2 0 2 4

−1.

0−

0.5

0.0

0.5

1.0

1.5

x

IF E

R

Delta=1Delta=3Delta=5

Page 53: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Graphics of IF(x ; ER, F )

−4 −2 0 2 4

−0.

50.

00.

51.

0

x

IF E

R

sigma=0.5sigma=1sigma=1.5sigma=2

−4 −2 0 2 4

−0.

50.

00.

51.

0x

IF E

R

sigma=0.5sigma=1sigma=1.5sigma=2

Page 54: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Bias of the error rate

Page 55: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Definition

In practice, F is replace by Fn, the empirical cdf

The bias is defined by Bn(ER) = EF [ER(Fn) − ER(F )]

Fernholz (2001) :

Bn(ER) =1

2nEF [IF2(X ; ER, F )] + o(n−1)

where

IF2(x ; ER, F ) =∂2

∂ε2 ER((1 − ε)F + ε∆x)

ε=0.

This result is true if ER(.) is Hadamard or Fréchetdifferentiable.

Page 56: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Computation of the bias

Proposition

Under asymptotic normality and consistency of T1 and T2

(Pollard, 1981 and 1982) :

Bn(ER) ≈1

4nπ2(F )f2(C(F )) − π1(F )f1(C(F ))

(EF [IF2(X ; T1, F )] + EF [IF2(X ; T2, F )])

+1

8nπ2(F )f ′2(C(F )) − π1(F )f ′1(C(F ))

(ASV(T1) + ASV(T2) + 2 ASC(T1, T2)).

Expressions of IF2(X ; T1, F ) and IF2(X ; T2, F ) have beencomputed.

Page 57: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Asymptotic difference

How much difference in error rate is to be expected byestimating a clustering rule from a finite sample ?

A-Diff(ER) = limn→∞

n Bn(ER)

⇒ Graphical comparisons of the 2-means and 2-medoidsmethods :

F = π1 N(−∆/2, 1) + (1 − π1) N(∆/2, 1)

• π1 varying and ∆ = 3• ∆ varying and π1 = 0.4

FN = 0.5 N(−∆/2, 1) + 0.5 N(∆/2, 1)

Page 58: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Graphics of A-Diff(ER) under F

0.2 0.4 0.6 0.8

−50

050 2−means

2−medoids

π1

A-D

iff

0 2 4 6 8 10

−0.

10.

00.

10.

20.

3

2−means2−medoids

∆A

-Diff

Page 59: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Graphic of A-Diff(ER) under FN

0 2 4 6 8 10

−0.

06−

0.04

−0.

020.

00

2−means2−medoids

A-D

iff

Page 60: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation study

Page 61: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation settings

F = 0.4 N(−1.5, 1) + 0.6 N(1.5, 1)

n = 100Fε = (1 − ε)F + ε∆x with

• ε = 0.01 and |x | = 5• ε = 0.05 and |x | = 5• ε = 0.01 and |x | = 50• ε = 0.05 and |x | = 50

N = 1000

Page 62: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

from G1

from G2

Page 63: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083from G1

0.078 0.072from G2

Page 64: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1

0.078 0.072 0.081 0.073from G2

Page 65: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1 0.064 0.139

0.078 0.072 0.081 0.073from G2 0.119 0.083

Page 66: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1 0.064 0.139 0.066 0.127

0.078 0.072 0.081 0.073from G2 0.119 0.083 0.116 0.072

Page 67: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1 0.064 0.139 0.066 0.127

0.39 0.61

0.078 0.072 0.081 0.073from G2 0.119 0.083 0.116 0.072

0.41 0.59

Page 68: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1 0.064 0.139 0.066 0.127

0.39 0.61 0.072 0.083

0.078 0.072 0.081 0.073from G2 0.119 0.083 0.116 0.072

0.41 0.59 0.081 0.073

Page 69: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1 0.064 0.139 0.066 0.127

0.39 0.61 0.072 0.0830.35 0.650.078 0.072 0.081 0.073

from G2 0.119 0.083 0.116 0.0720.41 0.59 0.081 0.0730.45 0.55

Page 70: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Simulation results

2-means 2-medoids

(0.071) (0.072)

in C1 in C2 in C1 in C2

0.068 0.083 0.071 0.083from G1 0.064 0.139 0.066 0.127

0.39 0.61 0.072 0.0830.35 0.65 0.35 0.650.078 0.072 0.081 0.073

from G2 0.119 0.083 0.116 0.0720.41 0.59 0.081 0.0730.45 0.55 0.45 0.55

Page 71: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Contaminated example with the 2-medoids

Height of students in 1BM

Height (in cm)

120 130 140 150 160 170 180

GirlsBoys

Page 72: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Conclusion of these simulations

A well-placed contamination in the smallest groupmakes the error rate decrease

Too much contamination makes the error rate of the2-means break down

Unfortunately, the error rate of the 2-medoids alsobreaks down when there are too much and too faroutliers !

Page 73: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Future research

Page 74: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Future research

Generalized trimmed 2-means : for α ∈ [0, 1],(T1(F ), T2(F )) are solution of

minA:F (A)=1−α

mint1,t2⊂R

(

inf1≤j≤2

‖x − tj‖)

dF (x).

Theoretical error rate

ER(Fε, Fm) =2

j=1

πj(Fm)IPFm

[

RFε(X ) 6= Cj(Fε)

∣ Gj]

More than 1 dimension or more than 2 groups

Page 75: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Thank you for yourattention!

Page 76: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Bibliography

CROUX C., FILZMOSER P. and JOOSSENS K. (2008),Classification efficiences for robust linear discriminantanalysis, Statistica Sinica 18, pp. 581-599.

CROUX C., HAESBROECK G. and JOOSSENS K. (2008),Logistic discrimination using robust estimators : aninfluence function approach, The Canadian Journal ofStatistics, 36, pp. 157-174.

FERNHOLZ L. T., On multivariate higher order von Misesexpansions in Metrika 2001, vol. 53, pp. 123-140.

Page 77: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Bibliography

GARCÍA-ESCUDERO L. A. and GORDALIZA A.,Robustness Properties of k Means and Trimmed kMeans, Journal of the American Statistical Association,September 1999, Vol. 94, nr 447, pp. 956-969.

HAMPEL F.R., RONCHETTI E.M., ROUSSEEUW P.J.,STAHEL W.A., Robust Statistics : The Approach Basedon Influence Functions, John Wiley and Sons,New-York, 1986.

ANDERSON T.W., An Introduction to MultivariateStatistical Analysis, Wiley, New-York, 1958, pp.126-133.

Page 78: Influence function of the error rate of generalized k-means - Joint ...€¦ · Influence function of the error rate of generalized k-means Ch. Ruwet Introduction Error rate Influence

Influencefunction of

the error rateof

generalizedk-means

Ch. Ruwet

Introduction

Error rate

Influencefunction of theerror rate

Bias of theerror rate

Simulationstudy

Futureresearch

Bibliography

POLLARD D., Strong Consistency of k-MeansClustering, The Annals of Probability, 1981, Vol.9, nr4,pp.919-926.

POLLARD D., A Central Limit Theorem for k-MeansClustering, The Annals of Probability, 1982, Vol.10, nr1,pp.135-140.

QIU D. and TAMHANE A. C. (2007), A comparative studyof the k -means algorithm and the normal mixturemodel for clustering : Univariate case, Journal ofStatistical Planning and Inference, 137, pp. 3722-3740.