Development of robust scatter estimators under independent contamination model

C. Agostinelli1, A. Leung2, V.J. Yohai3 and R.H. Zamar2

1 Università Ca' Foscari di Venezia, 2 University of British Columbia, and 3 Universidad de Buenos Aires and CONICET

Mar 16, 2013

Some declarations

- To math geeks: I am sorry, but I will keep math equations and theorems to a minimum in my talk today (come on, it is 9 am!)

Objective of the day

Objective: robust estimation of the (location and) scatter matrix for a data set of n observations on p continuous variables.

What is contamination?

Perhaps the most classical contamination model is the Huber-Tukey contamination model (HTCM) (Tukey, 1960; Huber, 1964), which was originally formulated for 1-D data...

Contamination is row-wise, e.g.

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]  0.9 -2.8 -2.1 -0.8 -2.4  1.3  2.7  3.4  0.9  -0.1
[2,] -2.4  2.3 -1.8 -3.0  1.9  1.0 -0.5  0.4 -2.8  -1.5
[3,]  0.7 -2.3 -0.6  2.9 -1.5 -0.8  2.9  0.0 -2.6   1.8
[4,]  1.0  1.9  1.6  1.1  0.0 -2.2  1.0 -4.1  2.2  -0.9
[5,]  0.1 -1.0  1.8  2.2 -0.1  2.1 -1.3  3.1  1.2   1.0
[6,]  1.7  3.0  0.6  0.9 -1.4  1.9 -0.3 -0.4 -0.4   1.7
[7,] -0.8  1.0  2.5  3.9 -2.8  2.5 -0.3 -0.9  2.6   2.4

What is contamination?

HTCM in math notation:

\[ x^{*} = (1 - u)\,x + u\,c \]

where
- $x = (x_1, \dots, x_p) \sim N(\mu, \Sigma)$
- $c \sim$ "something"
- $u \sim \mathrm{Bin}(1, \varepsilon)$, $0 \le \varepsilon < 1/2$
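
The row-wise model above can be simulated in a few lines. This is only a sketch under assumptions that are not in the slides: NumPy is used, the clean data are standard normal, and the outlier c is a hypothetical point mass at (10, ..., 10), since the slides leave its distribution as "something".

```python
import numpy as np

def contaminate_rowwise(X, eps, c, rng):
    """Apply x* = (1 - u) x + u c with one Bernoulli(eps) draw u per row."""
    n = X.shape[0]
    u = rng.binomial(1, eps, size=n)      # one indicator per observation
    X_star = X.copy()
    X_star[u == 1, :] = c                 # a flagged row is replaced entirely by c
    return X_star, u

rng = np.random.default_rng(0)
n, p, eps = 1000, 3, 0.20
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
c = np.full(p, 10.0)                      # hypothetical point-mass outlier
X_star, u = contaminate_rowwise(X, eps, c, rng)
print("fraction of contaminated rows:", u.mean())
```

A row is either fully clean or fully replaced, so the fraction of affected observations stays close to ε whatever the value of p.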

New contamination model

HTCM may not be realistic...
- outliers are more likely to happen in certain variables, independent of others
- what if p is large but n is of moderate to small size?
- what if every single observation has one contaminated component?

Alqallaf, Van Aelst, Yohai and Zamar (2006) proposed a new contamination model...

Cell-wise contamination model

New contamination model

Contamination is cell-wise, e.g.

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,]  2.69  2.10  4.59  2.13 -1.09  2.72 -0.72  0.47 -1.42 -1.90
[2,]  2.92  2.20 -1.70 -1.83 -1.05  4.89  0.32 -1.93 -2.59 -2.48
[3,] -0.75  0.53 -3.22  3.07  4.04 -1.39 -0.26  0.44  0.05  2.14
[4,] -2.35  4.46 -0.99 -0.41  0.68 -2.79  1.37  1.74  1.35  1.78
[5,] -1.09 -2.77  4.59 -2.78 -0.97  1.35  4.10 -0.56  3.79 -0.11
[6,] -1.94 -0.33 -0.40 -3.22  1.32  0.24 -1.89  1.02  2.60  4.54

which in math notation is

\[ x^{*} = (I - U)\,x + U\,c \]

where $x = (x_1, \dots, x_p)$ and $c$ are the same as before, $I$ is the $p \times p$ identity matrix, and

\[ U = \mathrm{diag}(u_1, \dots, u_p), \quad u_i \sim \mathrm{Bin}(1, \varepsilon) \text{ independently}, \quad 0 \le \varepsilon < 1/2 \]
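
For comparison with the row-wise case, here is a minimal simulation sketch of the cell-wise model. It assumes NumPy, standard normal clean data, and a hypothetical outlier value of 10 in every contaminated cell; none of these choices come from the slides.

```python
import numpy as np

def contaminate_cellwise(X, eps, c, rng):
    """Apply x* = (I - U) x + U c with an independent Bernoulli(eps) draw per cell."""
    U = rng.binomial(1, eps, size=X.shape)   # row i holds the diagonal of U for observation i
    X_star = np.where(U == 1, c, X)          # only the flagged cells are replaced
    return X_star, U

rng = np.random.default_rng(1)
n, p, eps = 1000, 10, 0.10
X = rng.standard_normal((n, p))
c = np.full(p, 10.0)                         # hypothetical outlier value per variable
X_star, U = contaminate_cellwise(X, eps, c, rng)
rows_hit = (U.sum(axis=1) > 0).mean()
print("fraction of contaminated cells:", U.mean())
print("fraction of rows with at least one contaminated cell:", rows_hit)
```

Only about a fraction ε of the cells are touched, yet far more than a fraction ε of the rows contain at least one bad cell, which is what makes this setting hard for row-oriented estimators.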

Existing robust scatter estimators

Under HTCM, we have...
- Minimum Volume Ellipsoid (MVE) (Rousseeuw, 1985)
- Minimum Covariance Determinant (MCD) (Rousseeuw, 1985)
- S-estimator (Davies, 1987)
- MM-estimator (Yohai, 1987; Tatsuoka and Tyler, 2000)
- modified GK estimator (Maronna and Zamar, 2002)
- ...

Let’s look at how these existing robust scatter estimators (e.g. MVE, S-est., MM-est.) perform under HTCM and cell-wise contamination.

HTCM

Let’s first illustrate through mini examples and diagrams:
- p = 3, n = 30, ε = 0.20, random covariance matrix, center at the origin, normal data
- 95% confidence ellipsoids: MLE-clean (blue), MLE (yellow), MVE (green), S-est. (red), MM-est. (gray)

Davies’ S-estimator

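Definition (Davies, 1987): For $\mu \in \mathbb{R}^p$ and positive definite $\Sigma$, the S-estimator is

\[ (\hat{\mu}, \tilde{\Sigma}) = \arg\min s(\mu, \Sigma), \qquad \hat{\Sigma} = s^{*}\,\tilde{\Sigma} \]

where $s(\mu, \Sigma)$ is the solution $s$ to

\[ \frac{1}{n} \sum_{i=1}^{n} \rho\!\left( \frac{(x_i - \mu)^{T} \Sigma^{-1} (x_i - \mu)\, |\Sigma|^{1/p}}{s} \right) = \frac{1}{2}, \]

with $\rho(\cdot)$ some bounded monotone loss function that must satisfy

\[ E\!\left( \rho\!\left( \frac{\|X\|^{2}}{c} \right) \right) = \frac{1}{2} \]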
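
To make the inner scale s(µ, Σ) concrete, here is a rough sketch that solves the defining equation for s by bisection at a fixed (µ, Σ). Everything specific in it is an assumption for illustration: the bisquare-type loss `rho_bisquare`, the helper names, the omission of the tuning constant c, and the use of NumPy. It does not perform the outer minimization over (µ, Σ) and is not the authors' algorithm.

```python
import numpy as np

def rho_bisquare(t):
    """Bounded, monotone loss on t >= 0: rises from 0 to 1 on [0, 1], then stays at 1."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, 1.0)
    return 1.0 - (1.0 - t) ** 3

def scaled_distances(X, mu, Sigma):
    """(x_i - mu)^T Sigma^{-1} (x_i - mu) |Sigma|^{1/p} for every row x_i of X."""
    p = X.shape[1]
    diff = X - mu
    d = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
    return d * np.linalg.det(Sigma) ** (1.0 / p)

def m_scale(d, b=0.5, n_iter=200):
    """Solve mean(rho(d / s)) = b for s by bisection (the left side decreases in s)."""
    d = np.asarray(d, dtype=float)
    excess = lambda s: np.mean(rho_bisquare(d / s)) - b
    lo = hi = float(np.median(d))            # assumes the median distance is positive
    while excess(lo) < 0:
        lo /= 2.0
    while excess(hi) > 0:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
mu, Sigma = np.zeros(3), np.eye(3)
s_a = m_scale(scaled_distances(X, mu, Sigma))
s_b = m_scale(scaled_distances(X, mu, 7.0 * Sigma))   # rescaling Sigma leaves s unchanged
```

Because of the factor |Σ|^{1/p}, the scale depends only on the shape of Σ, which is why the definition rescales the minimizer at the end.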

MM-estimator (a two-stage estimator)

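Definition: For $\mu \in \mathbb{R}^p$ and positive definite $\Sigma$, the MM-estimator is

\[ (\hat{\mu}, \hat{\Sigma}) = \arg\min J(\mu, \Sigma) \]

where

\[ J(\mu, \Sigma) = \frac{1}{n} \sum_{i=1}^{n} \rho_2\!\left( \frac{(x_i - \mu)^{T} \Sigma^{-1} (x_i - \mu)\, |\Sigma|^{1/p}}{s_n} \right), \]

with $\rho_2(\cdot)$ a different loss function satisfying $\rho_2(\cdot) \le \rho_1(\cdot)$, where $\rho_1$ is the loss used in the first-stage S-estimator, and $s_n$ the scale from the S-estimate.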

Cell-wise contamination

- p = 3, n = 30, ε = 0.20, random covariance matrix, center at the origin, normal data
- 95% confidence ellipsoids: MLE-clean (blue), MLE (yellow), MVE (green), S-est. (red), MM-est. (gray)

Composite S-estimator

MVE, S- and MM-estimators perform very badly under cell-wise contamination...

Note that in our cell-wise contamination example, P(≥ 1 variable is contaminated) = 1 − (1 − ε)^p = 0.488.

In fact, all affine equivariant estimators of covariance collapse under cell-wise contamination (Alqallaf et al., 2009)!

We need to develop a new estimator...

Composite S-estimator (CSE)

...but this estimator is not affine equivariant, which saves it from falling under HTCM!
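
The probability quoted on this slide follows from the independence of the cell indicators: with p = 3 and ε = 0.20, 1 − 0.8³ = 1 − 0.512 = 0.488. A small sketch (the function name is made up for illustration) shows how quickly this grows with p:

```python
def prob_row_contaminated(eps, p):
    """P(at least one of p cells is contaminated), cells hit independently with prob. eps."""
    return 1.0 - (1.0 - eps) ** p

for p in (3, 10, 20):
    print(p, round(prob_row_contaminated(0.20, p), 3))
```

Once this probability exceeds 1/2, more than half of the rows are expected to be contaminated, which is beyond what a row-wise estimator with breakdown point at most 1/2 can handle.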

Composite S-estimator

In short, CSE attempts to minimize the size of the covariance (e.g. “ellipses”) for each pair of variables simultaneously, instead of for all variables jointly.

It tries to downweight large bivariate Mahalanobis distances, instead of full p-variate ones, when constructing the covariance matrix.
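
The bivariate distances mentioned above can be sketched as follows. This only illustrates the pairwise distances, not the CSE itself; the function name, the use of NumPy, and the toy data with a single planted outlying cell are assumptions.

```python
import numpy as np
from itertools import combinations

def bivariate_mahalanobis(X, mu, Sigma):
    """Squared Mahalanobis distance of each row for every pair of variables (j, k), j < k."""
    p = X.shape[1]
    out = {}
    for j, k in combinations(range(p), 2):
        idx = [j, k]
        diff = X[:, idx] - mu[idx]                    # n x 2 centred sub-vectors
        S_jk = Sigma[np.ix_(idx, idx)]                # 2 x 2 submatrix of Sigma
        out[(j, k)] = np.einsum("ij,ij->i", diff, np.linalg.solve(S_jk, diff.T).T)
    return out

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 4))
X[0, 3] = 10.0                                        # one contaminated cell in row 0
d = bivariate_mahalanobis(X, np.zeros(4), np.eye(4))
full_d0 = float(X[0] @ X[0])                          # full 4-variate squared distance of row 0
```

The single bad cell inflates the full distance of row 0 and the pairs that include the fourth variable, while the pairs among the first three variables are unaffected and can still contribute to the estimate.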

Now let’s look at an example; we will get back to the definition later...

Composite S-estimator

Example: p = 5, n = 100, ε = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.

95% confidence region based on Davies’ S-estimator vs. the true covariance:

[Figure: scatter plot matrix of V1 to V5. Legend: true, S-est]

Composite S-estimator

Example: p = 5, n = 100, ε = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.

95% confidence region based on CSE:

[Figure: scatter plot matrix of V1 to V5. Legend: true, CSE]

Composite S-estimator

Example: p = 5, n = 100, ε = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.

95% confidence region based on CSE versus S-est. based on each pair:

[Figure: scatter plot matrix of V1 to V5. Legend: true, CSE, Pairwise-S]

Composite S-estimator

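Definition (CSE): For a given robust initial estimator $\Omega_0$,

\[ (\hat{\mu}, \tilde{\Sigma}) = \arg\min s(\mu, \Sigma, \Omega_0), \qquad \hat{\Sigma} = s^{*}\,\tilde{\Sigma} \]

where $s(\mu, \Sigma, \Omega_0)$ is the solution $s$ to

\[ \frac{2}{p(p-1)n} \sum_{i=1}^{n} \sum_{k=1}^{p-1} \sum_{j=k+1}^{p} \rho\!\left( \frac{d_i^{jk}(\mu, \Sigma)}{s\, c_0} \cdot \frac{|\Sigma^{jk}|^{1/2}}{|\Omega_0^{jk}|^{1/2}} \right) = \frac{1}{2}. \]

Here $d_i^{jk}(\mu, \Sigma) = (x_i^{jk} - \mu^{jk})^{T} (\Sigma^{jk})^{-1} (x_i^{jk} - \mu^{jk})$ is the bivariate Mahalanobis distance for the pair of variables $(j, k)$, and the constant $c_0$ must satisfy the same criterion as in Davies’ S-estimator, but in the bivariate case.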

Composite MM-estimator

CSE in general is robust under cell-wise contamination but not efficient.

Efficiency measures the variability of the estimate relative to some gold standard, such as the MLE, under no contamination.

We use the corresponding MM-version (Tatsuoka and Tyler, 2000) of CSE to achieve efficiency.

Composite S- and MM-estimator

Both have very nice but complex estimation procedures that are closely linked to the S-estimator with missing data (Danilov et al., 2012), but we will not describe them here.

Some results shown in ICORS 2012

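We performed a Monte Carlo study to assess the behavior of the proposed estimators.

Simulation setting:
- $x \sim N(0, \Sigma_0)$, for some $n$ and $p$
- $\Sigma_0$ is an exchangeable correlation matrix, i.e.

\[ \Sigma_0 = \begin{pmatrix} 1 & r & \cdots & r \\ r & 1 & \cdots & r \\ \vdots & \vdots & \ddots & \vdots \\ r & r & \cdots & 1 \end{pmatrix} \]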
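
A minimal sketch of this simulation setting, assuming NumPy and one of the (p, n, r) combinations reported later in the talk; the helper name is hypothetical.

```python
import numpy as np

def exchangeable_corr(p, r):
    """p x p correlation matrix with ones on the diagonal and r everywhere else."""
    return (1.0 - r) * np.eye(p) + r * np.ones((p, p))

p, n, r = 10, 100, 0.5
Sigma0 = exchangeable_corr(p, r)
rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)   # one clean sample
```

The matrix has eigenvalues 1 − r (with multiplicity p − 1) and 1 + (p − 1) r, so it is positive definite for the values r = 0.5 and r = 0.9 used here.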

Some results shown in ICORS 2012

Here we show some results for

- Correlations: r = 0.5 and r = 0.9
- p = 10 and n = 100
- p = 20 and n = 200

Some results shown in ICORS 2012

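Performance criteria:

1. Likelihood ratio test distance (LRT) for robustness evaluation

\[ \bar{D}(\hat{\Sigma}, \Sigma_0) = \frac{1}{N} \sum_{i=1}^{N} D(\hat{\Sigma}_i, \Sigma_0) \]

where $\hat{\Sigma}_i$ is the estimate from the $i$-th of $N$ Monte Carlo replications and

\[ D(\Sigma, \Sigma_0) = \mathrm{trace}(\Sigma_0^{-1} \Sigma) - \log(\det(\Sigma_0^{-1} \Sigma)) - p \]

2. Relative efficiency based on LRT values for efficiency evaluation

\[ \bar{D}(\hat{\Sigma}_{\mathrm{MLE}}, \Sigma_0) \,/\, \bar{D}(\hat{\Sigma}, \Sigma_0) \]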
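
The two criteria can be sketched as below. The function names, the small Monte Carlo setup (p = 5, n = 100, N = 200, identity Σ0) and the use of NumPy are assumptions for illustration; only the MLE is computed, since the robust estimators are not implemented here.

```python
import numpy as np

def lrt_distance(Sigma_hat, Sigma0):
    """D = trace(Sigma0^{-1} Sigma_hat) - log det(Sigma0^{-1} Sigma_hat) - p."""
    p = Sigma0.shape[0]
    M = np.linalg.solve(Sigma0, Sigma_hat)             # Sigma0^{-1} Sigma_hat
    _, logdet = np.linalg.slogdet(M)
    return float(np.trace(M) - logdet - p)

def average_lrt(estimates, Sigma0):
    """Average LRT distance over a list of estimates (one per replication)."""
    return float(np.mean([lrt_distance(S, Sigma0) for S in estimates]))

rng = np.random.default_rng(4)
p, n, N = 5, 100, 200
Sigma0 = np.eye(p)
mle = []
for _ in range(N):
    X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
    mle.append(np.cov(X, rowvar=False, bias=True))     # Gaussian MLE of the covariance
D_mle = average_lrt(mle, Sigma0)
# Relative efficiency of another estimator: D_mle / average_lrt(other_estimates, Sigma0)
```

The distance is zero exactly when the estimate equals Σ0 and positive otherwise, so smaller averages mean better estimates and a relative efficiency near 1 means the estimator is nearly as good as the MLE on clean data.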

Monte Carlo results

Gaussian Efficiency Without Outliers

              p = 10, n = 100       p = 20, n = 200
ESTIMATES     r = 0.5   r = 0.9     r = 0.5   r = 0.9
S-est         0.91      0.90        0.96      0.96
Pairwise-S    0.25      0.45        0.36      0.37
CSE           0.70      0.50        0.74      0.44
CMME          0.74      0.78        0.81      0.60

Monte Carlo results

n = 100, p = 10, ε = 10%

[Figure: average LRT distance versus outlier size under 10% contamination (n = 100, p = 10). Panels: Corr. = 0.5 ICM, Corr. = 0.9 ICM, Corr. = 0.5 THCM, Corr. = 0.9 THCM. Legend: Pairwise-S, CS (QC), Classical-S, CMM (QC).]

Remarks and conclusion

- In general, CSE (and CMME) are very robust under cell-wise contamination.
- We have seen that CSE (and CMME) do not perform very well under HTCM.
- Our goal is to have an estimator that is highly robust under both HTCM and cell-wise contamination (we are ambitious!)
- ...while efficiency is our second priority

To be continued....

Acknowledgement

Special thanks to Professor R. Zamar and Professor V. Yohai!

Prof. Zamar Prof. Yohai

...AND THANK YOU FOR LISTENING!
