Alexey Pomerantsev & Oxana Rodionova - Chemometrics
![Page 1: Alexey Pomerantsev & Oxana Rodionova - Chemometrics · Alexey Pomerantsev & Oxana Rodionova Semenov Institute of Chemical Physics, Moscow Use and abuse of robust PCA. 28.06.12 CAC-2012](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/1.jpg)
28.06.12, CAC-2012
Robust SIMCA bearing on non-robust PCA
Alexey Pomerantsev & Oxana Rodionova
Semenov Institute of Chemical Physics, Moscow
Use and abuse of robust PCA
![Slide 2](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/2.jpg)
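Contaminated data

$R_\gamma = (1-\gamma)R + \gamma D$, where $R$ is the distribution of the regular data, $D$ is the contaminating distribution, and $\gamma$ is the fraction of contamination.

[Figure panels: few outliers; two groups]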
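The mixture $R_\gamma$ can be sampled by drawing each object from $D$ with probability $\gamma$ and from $R$ otherwise. A minimal sketch, assuming purely for illustration that $R$ is N(0, 1) and $D$ is N(8, 1) (a shifted second group); `sample_contaminated` is a hypothetical helper name.

```python
import random

def sample_contaminated(n, gamma, regular, contaminant, rng=random):
    """Draw n values from R_gamma = (1 - gamma) R + gamma D.

    Each value comes from the contaminating distribution D with
    probability gamma and from the regular distribution R otherwise.
    Returns the values and flags marking the draws taken from D.
    """
    values, from_d = [], []
    for _ in range(n):
        is_d = rng.random() < gamma
        values.append(contaminant() if is_d else regular())
        from_d.append(is_d)
    return values, from_d

# Hypothetical choice: R = N(0, 1), D = N(8, 1)
rng = random.Random(1)
x, flags = sample_contaminated(
    1000, 0.1,
    regular=lambda: rng.gauss(0.0, 1.0),
    contaminant=lambda: rng.gauss(8.0, 1.0),
    rng=rng,
)
print(sum(flags) / len(flags))  # observed contamination fraction, close to 0.1
```

Changing $D$ to a distribution far from $R$ with a tiny $\gamma$ mimics the "few outliers" panel; a moderate $\gamma$ gives the "two groups" picture.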
![Slide 3](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/3.jpg)
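Classical methods for contaminated data

Masking effect: outliers distort the classical estimates so much that they are no longer detected. Swamping effect: the distorted estimates cause regular objects to be flagged as outliers.

[Figure panels: masking effect; swamping effect]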
![Slide 4](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/4.jpg)
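Classical and robust statistics

Classical: $\bar{x} = \frac{1}{I}\sum_{i=1}^{I} x_i, \quad s^2 = \frac{1}{I-1}\sum_{i=1}^{I} (x_i - \bar{x})^2$

Robust: $\tilde{x} = \mathrm{median}(x), \quad \tilde{s} = 1.4826\,\mathrm{MAD} = 1.4826\,\mathrm{median}(|x - \tilde{x}|)$

[Figure: classical vs. robust estimates for a sample containing an outlier]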
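The effect of a single outlier on the two pairs of estimators can be seen on a small sample. A minimal sketch using only Python's `statistics` module; the sample values are made up for illustration.

```python
import statistics

def classical_estimates(x):
    """Mean and standard deviation with divisor I - 1."""
    return statistics.mean(x), statistics.stdev(x)

def robust_estimates(x):
    """Median and 1.4826 * MAD (consistent for the SD of normal data)."""
    med = statistics.median(x)
    mad = statistics.median(abs(xi - med) for xi in x)
    return med, 1.4826 * mad

# Hypothetical sample: six regular values and one gross outlier (25.0)
x = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]
print(classical_estimates(x))  # mean and SD are both pulled up by the outlier
print(robust_estimates(x))     # median 10.0, scale about 0.148
```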
![Slide 5](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/5.jpg)
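Extremes and Outliers

For $(x, y) \sim N(\mathbf{0}, \mathbf{I})$ we have $x^2 + y^2 \sim \chi^2(2)$. Let $\chi^{-2}(p \mid k)$ denote the $p$-quantile of $\chi^2(k)$ and $I$ the number of objects. An object is regular if

$x^2 + y^2 \le \chi^{-2}(1-\alpha \mid 2)$,

and it is not an outlier if

$x^2 + y^2 \le \chi^{-2}\left((1-\beta)^{1/I} \mid 2\right)$.

α is the extreme significance; β is the outlier significance. Here α = 0.01, β = 0.05.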
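For two degrees of freedom the quantile has a closed form, $\chi^{-2}(p \mid 2) = -2\ln(1-p)$, so both boundaries can be computed directly. A sketch, assuming 99 simulated standard normal points plus one planted point at (10, 10); `classify` is a hypothetical helper name.

```python
import math
import random

def chi2_2_quantile(p):
    """p-quantile of chi^2(2), whose CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - p)

def classify(points, alpha=0.01, beta=0.05):
    """Label 2-D points as regular, extreme or outlier."""
    I = len(points)
    c_ext = chi2_2_quantile(1.0 - alpha)                 # extreme boundary
    c_out = chi2_2_quantile((1.0 - beta) ** (1.0 / I))   # outlier boundary
    labels = []
    for x, y in points:
        d = x * x + y * y
        if d > c_out:
            labels.append("outlier")
        elif d > c_ext:
            labels.append("extreme")
        else:
            labels.append("regular")
    return labels, c_ext, c_out

rng = random.Random(0)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(99)] + [(10.0, 10.0)]
labels, c_ext, c_out = classify(pts)
print(round(c_ext, 3), round(c_out, 3), labels.count("extreme"), labels[-1])
```

The outlier boundary grows with $I$ through the exponent $1/I$, which keeps the probability of flagging at least one regular object as an outlier at β for the whole sample.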
![Slide 6](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/6.jpg)
Problem

| | Regular data | Contaminated data |
|---|---|---|
| Classical methods | OK | BAD |
| Robust methods | ? | OK |
![Slide 7](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/7.jpg)
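Principal Component Analysis

$\mathbf{X} = \mathbf{T}_A \mathbf{P}_A^{\mathrm{t}} + \mathbf{E}_A$

where $\mathbf{X}$ is the $I \times J$ data matrix, $\mathbf{T}_A$ is the $I \times A$ matrix of scores, $\mathbf{P}_A$ is the $J \times A$ matrix of loadings, and $\mathbf{E}_A$ is the $I \times J$ matrix of residuals.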
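The decomposition can be obtained from the singular value decomposition of $\mathbf{X}$. A minimal NumPy sketch, assuming $\mathbf{X}$ has already been pre-processed (here simply mean-centred); `pca` is a hypothetical function name, not the authors' code.

```python
import numpy as np

def pca(X, A):
    """Decompose X (I x J) as T_A P_A^t + E_A using the first A PCs."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:A].T          # loadings, J x A, orthonormal columns
    T = X @ P             # scores, I x A
    E = X - T @ P.T       # residuals, I x J
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X -= X.mean(axis=0)       # simple mean-centring as pre-processing
T, P, E = pca(X, A=2)
print(T.shape, P.shape, E.shape)  # (100, 2) (3, 2) (100, 3)
```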
![Slide 8](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/8.jpg)
PCA robustification
(1) Data pre-processing;
(2) Decomposition;
(3) Calculation of thresholds.
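Step (1) can be made robust by replacing the column mean and standard deviation with the median and the scaled MAD. A sketch of column-wise robust autoscaling in NumPy; median/MAD is one common choice and an assumption here, since the slide does not name the robust estimators used.

```python
import numpy as np

def robust_autoscale(X):
    """Centre each column by its median and scale it by 1.4826 * MAD.

    Assumes no column has a zero MAD.
    """
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    return (X - med) / mad, med, mad

# Hypothetical data: four regular rows and one gross outlier
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 1000.0]])
Z, med, mad = robust_autoscale(X)
print(med, mad)   # medians 3 and 30; scales about 1.4826 and 14.826
print(Z[:, 0])    # the outlier row stands out clearly after scaling
```

With classical autoscaling the same outlier would inflate the column standard deviation and shrink its own z-value, which is the masking effect in miniature.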
![Slide 9](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/9.jpg)
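Score & Orthogonal Distances

SD, the distance within the model:

$h_i = \mathbf{t}_i^{\mathrm{t}} (\mathbf{T}^{\mathrm{t}}\mathbf{T})^{-1} \mathbf{t}_i = \sum_{a=1}^{A} \frac{t_{ia}^2}{\lambda_a}$

OD, the distance to the model:

$v_i = \sum_{j=1}^{J} e_{ij}^2 = \sum_{a=A+1}^{K} t_{ia}^2$

where $t_{ia}$ are the scores, $\lambda_a = \sum_{i=1}^{I} t_{ia}^2$ are the eigenvalues, $e_{ij}$ are the residuals, and $K$ is the total number of principal components.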
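Both distances follow from one SVD of the pre-processed data. A NumPy sketch, assuming $\mathbf{X}$ is already pre-processed and $K = \min(I, J)$; `distances` is a hypothetical name. The assertions in the usage lines check that the two forms of each distance agree.

```python
import numpy as np

def distances(X, A):
    """Score distance h and orthogonal distance v for an A-component PCA.

    h_i = sum_{a<=A} t_ia^2 / lambda_a, with lambda_a = sum_i t_ia^2;
    v_i = sum_j e_ij^2 for the residuals left after A components.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U * s                                  # scores of all K components
    lam = s ** 2                               # lambda_a = t_a^t t_a
    h = np.sum(T[:, :A] ** 2 / lam[:A], axis=1)
    E = X - T[:, :A] @ Vt[:A]                  # residual matrix E_A
    v = np.sum(E ** 2, axis=1)
    return h, v, T

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)
h, v, T = distances(X, A=2)

TA = T[:, :2]
assert np.allclose(h, np.einsum("ij,jk,ik->i", TA, np.linalg.inv(TA.T @ TA), TA))
assert np.allclose(v, np.sum(T[:, 2:] ** 2, axis=1))   # sum over a = A+1..K
```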
![Slide 10](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/10.jpg)
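Data Driven SIMCA

Both distances, SD ($u = h$) and OD ($u = v$), are modelled by the same scaled chi-squared distribution:

$(u_1, \ldots, u_I) \sim \frac{u_0}{N}\chi^2(N)$

The scale $u_0$ and the number of degrees of freedom $N$ are unknown and must be estimated from the data.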
J. Chemometrics 22 (2008) 601-609
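Because $\chi^2(N)$ is a gamma distribution with shape $N/2$ and scale 2, the model can be simulated with `random.gammavariate`. The sketch also illustrates the two moments that the estimators below rely on: mean $u_0$ and variance $2u_0^2/N$. The parameter values are hypothetical.

```python
import random
import statistics

def sample_scaled_chi2(u0, N, size, rng):
    """Draw `size` values of u = (u0 / N) * chi^2(N).

    chi^2(N) is gamma with shape N/2 and scale 2, so u has
    mean u0 and variance 2 * u0**2 / N.
    """
    return [u0 / N * rng.gammavariate(N / 2.0, 2.0) for _ in range(size)]

rng = random.Random(42)
u = sample_scaled_chi2(u0=3.0, N=5, size=20000, rng=rng)
print(statistics.mean(u), statistics.variance(u))  # close to 3.0 and 3.6
```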
![Slide 11](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/11.jpg)
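Tolerance Areas

α is the extreme significance; β is the outlier significance.

The two distances are combined into

$z = N_h \frac{h}{h_0} + N_v \frac{v}{v_0} \sim \chi^2(N_h + N_v)$

Regular objects: $z \le \chi^{-2}(1-\alpha \mid N_h + N_v)$

Non-outliers: $z \le \chi^{-2}\left((1-\beta)^{1/I} \mid N_h + N_v\right)$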
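Python's standard library has no χ² quantile function, so this sketch evaluates the χ² CDF with the power series of the regularized lower incomplete gamma function and inverts it by bisection (with SciPy, `scipy.stats.chi2.ppf` would be the usual choice). `z_distance` and `tolerance_limits` are hypothetical names; in practice $h_0, N_h, v_0, N_v$ would come from the CDD or RDD estimates, and the values below are made up.

```python
import math

def chi2_cdf(x, k):
    """CDF of chi^2(k): regularized lower incomplete gamma P(k/2, x/2).

    Power series; adequate for the moderate x and k met here.
    """
    if x <= 0.0:
        return 0.0
    a, y = 0.5 * k, 0.5 * x
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= y / (a + n)
        total += term
        n += 1
    return math.exp(-y + a * math.log(y) - math.lgamma(a)) * total

def chi2_quantile(p, k):
    """p-quantile chi^{-2}(p | k), found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def z_distance(h, v, h0, Nh, v0, Nv):
    """Combined distance z = Nh*h/h0 + Nv*v/v0 ~ chi^2(Nh + Nv)."""
    return Nh * h / h0 + Nv * v / v0

def tolerance_limits(Nh, Nv, I, alpha=0.01, beta=0.05):
    """Critical values of z for extremes and for outliers."""
    k = Nh + Nv
    c_ext = chi2_quantile(1.0 - alpha, k)
    c_out = chi2_quantile((1.0 - beta) ** (1.0 / I), k)
    return c_ext, c_out

# Hypothetical parameter values, for illustration only
c_ext, c_out = tolerance_limits(Nh=2, Nv=3, I=100)
z = z_distance(h=1.2, v=0.004, h0=0.5, Nh=2, v0=0.002, Nv=3)
label = "outlier" if z > c_out else "extreme" if z > c_ext else "regular"
print(c_ext, c_out, z, label)
```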
![Slide 12](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/12.jpg)
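Classical Data Driven (CDD) SIMCA: classical method of moments

Given $(u_1, \ldots, u_I) \sim \frac{u_0}{N}\chi^2(N)$, a distribution with mean $u_0$ and variance $2u_0^2/N$,

then $\hat{u}_0 = \bar{u}, \quad \hat{N} = \mathrm{int}\left(\frac{2\bar{u}^2}{s_u^2}\right)$,

where $\bar{u} = \frac{1}{I}\sum_{i=1}^{I} u_i, \quad s_u^2 = \frac{1}{I-1}\sum_{i=1}^{I}(u_i - \bar{u})^2$.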
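The CDD estimators take a few lines with the `statistics` module. A sketch; reading `int` as truncation and forcing $N \ge 1$ are assumptions not stated on the slide.

```python
import statistics

def cdd_estimate(u):
    """Classical method of moments for u ~ (u0/N) chi^2(N).

    Matching mean u0 and variance 2*u0**2/N gives
    u0_hat = mean(u) and N_hat = int(2 * mean(u)**2 / s_u**2).
    """
    m = statistics.mean(u)
    s2 = statistics.variance(u)       # divisor I - 1
    n_hat = int(2.0 * m * m / s2)     # int() truncates toward zero
    return m, max(n_hat, 1)           # guard N >= 1 (our assumption)

print(cdd_estimate([1.0, 2.0, 3.0, 4.0, 5.0]))  # (3.0, 7)
```

Because the mean and, even more, the variance react strongly to a few large distances, a handful of outliers can drive $\hat{N}$ down to 1, which is the weakness the robust variant addresses.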
![Slide 13](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/13.jpg)
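Robust Data Driven (RDD) SIMCA: robust method of moments

Given $(u_1, \ldots, u_I) \sim \frac{u_0}{N}\chi^2(N)$, order the sample, $u_{(1)} \le u_{(2)} \le \ldots \le u_{(I-1)} \le u_{(I)}$, and compute

$M = \mathrm{median}(u), \quad R = \mathrm{interquartile}(u)$.

Then $\tilde{u}_0$ and $\tilde{N}$ are found from

$M = \frac{\tilde{u}_0}{\tilde{N}}\chi^{-2}(0.5 \mid \tilde{N})$

$R = \frac{\tilde{u}_0}{\tilde{N}}\left[\chi^{-2}(0.75 \mid \tilde{N}) - \chi^{-2}(0.25 \mid \tilde{N})\right]$

[Figure: ordered sample split into halves (½, ½) by the median and into quarters (¼, ¼) by the quartiles]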
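The ratio $R/M$ does not depend on $\tilde{u}_0$, so $\tilde{N}$ can be found first by matching $[\chi^{-2}(0.75 \mid N) - \chi^{-2}(0.25 \mid N)]/\chi^{-2}(0.5 \mid N)$ to the sample ratio; then $\tilde{u}_0 = \tilde{N} M / \chi^{-2}(0.5 \mid \tilde{N})$. A sketch, assuming an integer $N$ searched on 1..50, χ² quantiles computed by a series expansion plus bisection (a stand-in for a library function such as SciPy's `chi2.ppf`), and a hypothetical sample with 1% planted outliers.

```python
import math
import random
import statistics

def chi2_cdf(x, k):
    """CDF of chi^2(k) via the series for P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, y = 0.5 * k, 0.5 * x
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= y / (a + n)
        total += term
        n += 1
    return math.exp(-y + a * math.log(y) - math.lgamma(a)) * total

def chi2_quantile(p, k):
    """p-quantile of chi^2(k) by bisection."""
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_cdf(mid, k) < p else (lo, mid)
    return 0.5 * (lo + hi)

def rdd_estimate(u, n_max=50):
    """Robust method of moments for u ~ (u0/N) chi^2(N)."""
    M = statistics.median(u)
    q1, _, q3 = statistics.quantiles(u, n=4)
    R = q3 - q1
    best_N, best_gap = 1, math.inf
    for N in range(1, n_max + 1):   # assumed integer search range
        ratio = (chi2_quantile(0.75, N) - chi2_quantile(0.25, N)) / chi2_quantile(0.5, N)
        gap = abs(ratio - R / M)
        if gap < best_gap:
            best_N, best_gap = N, gap
    return best_N * M / chi2_quantile(0.5, best_N), best_N

rng = random.Random(7)
u = [3.0 / 5 * rng.gammavariate(2.5, 2.0) for _ in range(5000)]  # u0 = 3, N = 5
u[:50] = [50.0] * 50             # hypothetical gross outliers (1% of objects)
print(rdd_estimate(u))           # u0 near 3 and N near 5 despite the outliers
```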
![Slide 14](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/14.jpg)
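Dual Data Driven (3D) SIMCA

Given: $\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{t}} + \mathbf{E}$, $\mathbf{h} = (h_1, \ldots, h_I)$, $\mathbf{v} = (v_1, \ldots, v_I)$.

Then compute the CDD SIMCA estimates $(\hat{h}_0, \hat{N}_h)$, $(\hat{v}_0, \hat{N}_v)$ and the RDD SIMCA estimates $(\tilde{h}_0, \tilde{N}_h)$, $(\tilde{v}_0, \tilde{N}_v)$, and check whether

$\tilde{N}_h \approx \hat{N}_h \;\&\; \tilde{N}_v \approx \hat{N}_v$

Yes: use CDD SIMCA. No: use RDD SIMCA.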
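Once both sets of estimates are available, the 3D rule reduces to comparing the degrees of freedom. A sketch; the relative tolerance standing in for "≈" is hypothetical, since the slide gives no value, and `choose_simca` is an illustrative name.

```python
import math

def choose_simca(Nh_hat, Nv_hat, Nh_tilde, Nv_tilde, rel_tol=0.2):
    """3D SIMCA rule: keep CDD SIMCA if the robust (tilde) degrees of
    freedom agree with the classical (hat) ones for both SD and OD,
    otherwise switch to RDD SIMCA.

    rel_tol is a hypothetical tolerance for "approximately equal".
    """
    same_h = math.isclose(Nh_tilde, Nh_hat, rel_tol=rel_tol)
    same_v = math.isclose(Nv_tilde, Nv_hat, rel_tol=rel_tol)
    return "CDD" if (same_h and same_v) else "RDD"

print(choose_simca(Nh_hat=4, Nv_hat=6, Nh_tilde=4, Nv_tilde=5))  # CDD
print(choose_simca(Nh_hat=1, Nv_hat=6, Nh_tilde=4, Nv_tilde=6))  # RDD
```

The second call mimics contaminated data, where outliers inflate the classical variance and collapse $\hat{N}_h$ while the robust estimate stays put.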
![Slide 15](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/15.jpg)
TOMCAT Toolbox
M. Daszykowski, S. Serneels, K. Kaczmarek, P. Van Espen, C. Croux, B. Walczak, ChemoLab 85 (2007) 269-277
http://chemometria.us.edu.pl/RobustToolbox/
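Robust PCA: robust PCs, robust singular values

Robust classification rules: z-transformed robust OD and SD,

$\mathrm{ROD}_i = \frac{\mathrm{OD}_i - \mathrm{median}(\mathbf{OD})}{Q_n(\mathbf{OD})}, \quad \mathrm{RSD}_i = \frac{\mathrm{SD}_i - \mathrm{median}(\mathbf{SD})}{Q_n(\mathbf{SD})}$

where $Q_n$ is the robust scale estimator of Rousseeuw and Croux.

Robust pre-processing: robust centering & scaling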
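A sketch of the z-transform, with a naive O(n²) implementation of the Rousseeuw and Croux $Q_n$ estimator (asymptotic constant 2.2219, small-sample correction factors omitted). This is not TOMCAT's code; the function names and the sample distances are hypothetical.

```python
import statistics

def qn_scale(x):
    """Naive O(n^2) Rousseeuw-Croux Qn scale estimator.

    Qn = 2.2219 * k-th smallest |x_i - x_j| (i < j),
    with k = C(h, 2) and h = n // 2 + 1; needs n >= 2.
    """
    n = len(x)
    diffs = sorted(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return 2.2219 * diffs[k - 1]

def robust_z(d):
    """z-transform of a distance vector: (d_i - median(d)) / Qn(d)."""
    med = statistics.median(d)
    q = qn_scale(d)
    return [(di - med) / q for di in d]

# Hypothetical orthogonal distances with one clearly deviating object
od = [1.0, 2.0, 3.0, 4.0, 100.0]
print(robust_z(od))
```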
![Slide 16](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/16.jpg)
Case study I. Simulated regular data

$\mathbf{x} = \boldsymbol{\delta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\delta} \sim N(\mathbf{0}, \mathbf{V}), \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$

The number of variables, J = 3
The number of objects, I = 100
The number of principal components, A = 2
The δ properties are:
E(δ) = 0, v11 = v22 = v33 = 0.28, rank(V) = 2.
The ε component properties are:
E(ε) = 0, σ = 0.05
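The slide fixes only the diagonal and the rank of $\mathbf{V}$. One way to satisfy both, assumed here, is $\boldsymbol{\delta} = \mathbf{B}\mathbf{z}$ with $\mathbf{z} \sim N(\mathbf{0}, \mathbf{I}_2)$ and a 3 × 2 matrix $\mathbf{B}$ whose rows have squared norm 0.28; the particular $\mathbf{B}$ below is hypothetical.

```python
import numpy as np

def mixing_matrix():
    """Hypothetical 3 x 2 matrix B: every row has squared norm 0.28,
    so V = B B^t has v11 = v22 = v33 = 0.28 and rank 2."""
    r = np.sqrt(0.5)
    return np.sqrt(0.28) * np.array([[1.0, 0.0], [0.0, 1.0], [r, r]])

def simulate_regular(I=100, sigma=0.05, seed=0):
    """x = delta + eps with delta ~ N(0, V), eps ~ N(0, sigma^2 I), J = 3."""
    rng = np.random.default_rng(seed)
    B = mixing_matrix()
    delta = rng.standard_normal((I, 2)) @ B.T   # rows ~ N(0, B B^t)
    eps = sigma * rng.standard_normal((I, 3))
    return delta + eps

V = mixing_matrix() @ mixing_matrix().T
X = simulate_regular()
print(np.diag(V), np.linalg.matrix_rank(V), X.shape)
```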
![Slide 17](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/17.jpg)
SIMCA plots

[Figure: SIMCA plots for CDD SIMCA and TOMCAT with the extreme area (α = 0.05) and the outlier area (β = 0.05); panel annotations: ext=5, out=11, out=0, out=12]
![Slide 18](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/18.jpg)
Totals over 10 regular data sets

[Figure: totals over 10 regular data sets compared with the expected value]
![Slide 19](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/19.jpg)
Case study II. Simulated data with outliers

$\mathbf{x} = \boldsymbol{\delta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\delta} \sim N(\mathbf{0}, \mathbf{V}), \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$

The number of variables, J = 3
The number of objects, I = 100
The number of principal components, A = 2
The δ properties are:
E(δ) = 0, v11 = v22 = v33 = 0.28, rank(V) = 2.
The ε component properties are:
E(ε) = 0, σ = 0.05 (first 97 objects)
E(ε) = 0, σ = 0.2 (last 3 objects)
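The only change from the regular case is a larger noise level for the last three objects. A self-contained NumPy sketch; the rank-2 construction of $\mathbf{V}$ through the matrix `B` is an assumption, not given on the slide, and `simulate_with_outliers` is a hypothetical name.

```python
import numpy as np

def simulate_with_outliers(I=100, n_out=3, sigma=0.05, sigma_out=0.2, seed=0):
    """x = delta + eps; the last n_out objects get noise level sigma_out."""
    r = np.sqrt(0.5)
    # Hypothetical B: rows of squared norm 0.28, so diag(B B^t) = 0.28, rank 2
    B = np.sqrt(0.28) * np.array([[1.0, 0.0], [0.0, 1.0], [r, r]])
    rng = np.random.default_rng(seed)
    delta = rng.standard_normal((I, 2)) @ B.T
    sig = np.full(I, sigma)
    sig[I - n_out:] = sigma_out                  # noisier last objects
    eps = sig[:, None] * rng.standard_normal((I, 3))
    return delta + eps, sig

X, sig = simulate_with_outliers()
print(X.shape, sig[-4:])
```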
![Slide 20](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/20.jpg)
SIMCA plots
![Slide 21](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/21.jpg)
REFERENCE & RDD-SIMCA
![Slide 22](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/22.jpg)
Totals over 10 data sets with outliers

[Figure: totals over 10 data sets with outliers compared with the expected value]
![Slide 23](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/23.jpg)
Case study III. Real world data with 2 groups
Substance in closed PE bags; 82 drums measured by NIR.
In total: 246 spectra
Group G1: 196 objects
Group G2: 50 objects
ACA 642 (2009) 222-227
![Slide 24](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/24.jpg)
Expected/observed number of extremes

[Figure panels: clean subset G1; contaminated dataset G1+G2]

Expected number of extremes: $N = \alpha I$
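The observed number of extremes is a binomial count with mean $N = \alpha I$, so binomial quantiles give a simple plausible range for it. A sketch using `math.comb`; the range check is our illustration, not necessarily the procedure used in the talk, and α = 0.05, I = 100 are example values.

```python
import math

def expected_extremes(alpha, I):
    """Expected number of extremes, N = alpha * I."""
    return alpha * I

def binom_quantile(q, n, p):
    """Smallest k with P(K <= k) >= q for K ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        if cdf >= q:
            return k
    return n

def plausible_range(alpha, I, level=0.95):
    """Central `level` range of the extreme count under a correct model."""
    tail = (1.0 - level) / 2.0
    return binom_quantile(tail, I, alpha), binom_quantile(1.0 - tail, I, alpha)

print(expected_extremes(0.05, 100))   # 5.0
print(plausible_range(0.05, 100))     # (1, 10)
```

An observed count far outside this range suggests that the distance model, not the objects, is at fault.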
![Slide 25](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/25.jpg)
Results of separation

[Figure panels: subset G1 revealed; subset G2 revealed]
![Slide 26](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/26.jpg)
Conclusion 1
Each tool has its purpose: classical methods are for regular data, whereas robust methods should be used for contaminated data. Do not expect a single tool to yield reasonable results in both cases.
![Slide 27](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/27.jpg)
Conclusion 2
Extreme objects play an important role in data analysis and should not be confused with outliers. The observed number of extremes should be compared with the expected number, N = αI, which is set by the significance level α.
[Figure panels: clean dataset; contaminated dataset]
![Slide 28](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/28.jpg)
Conclusion 3
The proposed Dual Data Driven PCA/SIMCA approach appears to be a strong competitor to both purely classical and strictly robust methods. It performed properly in the analysis of both regular and contaminated data sets.
[Figure panels: clean dataset; contaminated dataset]
![Slide 29](https://reader030.vdocuments.site/reader030/viewer/2022013003/5f06afc87e708231d4193b4a/html5/thumbnails/29.jpg)
Thank you for your attention