Chemometric Methods for GC x GC
LCDR Gregory J. Hall, Glenn S. Frysinger
Department of Science, U.S. Coast Guard Academy, New London, Connecticut
[email protected]
LCDR Gregory J. Hall
1995 B.S. Marine Science, U.S. Coast Guard Academy
1995 – 1997 Operations Officer, USCGC SPAR
1997 – 1998 M.S. Chemistry, Tufts University
1998 – 2000 Rotating Military Faculty, USCGA
2000 Appointed to the PCTS
2002 – 2004 Ph.D. sabbatical, Tufts University
2006 Ph.D. Chemistry, Tufts University: “Chemometric Characterization and Classification of Estuarine Water through Multidimensional Fluorescence”
Permanent Commissioned Teaching Staff (PCTS)
About 23 officers ranked from LT to CAPT
Serve as the “interpreters” between the military and civilian faculty and leadership of the college
Teaching, Service, and Scholarship expected
Ph.D. required
What IS Chemometrics?
Chemometrics is the chemical discipline that uses mathematical, statistical and other methods employing formal logic to design or select optimal measurement procedures and experiments, and to provide maximum relevant chemical information by analyzing chemical data. (D. L. Massart, Chemometrics, Elsevier, NY, 1988)
Chemometrics already covered, and still to come:
1. Difference Chromatograms
2. Property Modeling
3. Clustering
4. Chromatograph Prediction
5. Mass spectral searching
6. Template Construction
7. XICs (extracted ion chromatograms)
8. Retention Indices
You are all already chemometricians!
Today
1. Data Structures – How I view GC x GC data
2. Variance - PCA
3. Classification – SIMCA, PCR-DA
4. Regression – PLS
5. Peak Resolution - PARAFAC
6. Preprocessing – Alignment
7. The way forward, humble opinions
Data – GC x GC - FID
[Figure: a single chromatogram is a matrix of intensity values indexed by first dimension and second dimension, a “two way” array that needs 3 dimensions to plot. Stacking the chromatograms of I samples gives a “three way” chromatogram stack X (I × J × K: samples, first dimension, second dimension) that needs 4 dimensions to plot; the stack is stored as a dataset data object.]
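As a minimal sketch of this data structure, the NumPy code below folds a raw detector trace into a two-way chromatogram and stacks several runs into a three-way array. It assumes a constant sampling rate and a fixed modulation period (every `points_per_modulation` points form one second-dimension separation) and the axis order samples × first dimension × second dimension; `fold_trace` and `stack_chromatograms` are hypothetical helper names.

```python
import numpy as np

def fold_trace(trace, points_per_modulation):
    """Fold a raw 1-D detector trace into a 2-D GC x GC chromatogram.

    Rows = first dimension (one row per modulation),
    columns = second dimension (time within the modulation).
    Any incomplete final modulation is dropped.
    """
    n_mod = len(trace) // points_per_modulation
    usable = np.asarray(trace[: n_mod * points_per_modulation], dtype=float)
    return usable.reshape(n_mod, points_per_modulation)

def stack_chromatograms(chromatograms):
    """Stack I two-way chromatograms (each J x K) into an I x J x K array."""
    if len({c.shape for c in chromatograms}) != 1:
        raise ValueError("all chromatograms must share the same J x K grid")
    return np.stack(chromatograms, axis=0)

# Usage: 15 simulated runs, 1000 points each, 100 points per modulation
rng = np.random.default_rng(0)
runs = [fold_trace(rng.random(1000), points_per_modulation=100) for _ in range(15)]
X = stack_chromatograms(runs)
print(X.shape)  # (15, 10, 100)
```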
Data – GC x GC -TOF
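[Figure: with TOF-MS detection each chromatogram gains an m/z axis, so a dataset X spanning sample (date?), first dimension, second dimension, and m/z is “four way” and needs 5 dimensions to plot!]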
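A short sketch of how the four-way GC x GC-TOF array relates to the three-way FID case: summing over the m/z mode gives a total ion chromatogram (TIC) stack, and picking one m/z channel gives an extracted ion chromatogram (XIC) stack. The array sizes and the channel index are arbitrary assumptions, and the data are random placeholders.

```python
import numpy as np

# Hypothetical sizes: I samples, J first-dimension modulations,
# K second-dimension points, M m/z channels
I, J, K, M = 4, 20, 50, 30
rng = np.random.default_rng(1)
X4 = rng.random((I, J, K, M))        # "four way" GC x GC-TOF dataset

# Summing over m/z collapses each sample to a TIC, giving an
# I x J x K stack with the same layout as the FID chromatogram stack
tic_stack = X4.sum(axis=3)

# A single m/z channel (index chosen arbitrarily) gives an XIC stack
mz_index = 7
xic_stack = X4[:, :, :, mz_index]

print(X4.shape, tic_stack.shape, xic_stack.shape)
# (4, 20, 50, 30) (4, 20, 50) (4, 20, 50)
```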
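Principal Components Analysis (PCA)
[Figure: samples as points x_ij in the space of variable 1, variable 2, and variable 3, with a model plane spanned by PC 1 and PC 2; T² measures distance within the plane and Q measures the residual distance from it.]
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}$$
X: data (samples × variables); T: scores; P: loadings (“components”); TPᵀ: the “model”; E: residuals
Goal: variance capture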
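The decomposition X = TPᵀ + E can be sketched with a singular value decomposition of the mean-centered data. This is a minimal NumPy illustration, not the PLS_Toolbox implementation; the function name `pca` and its return layout are assumptions. It also computes the per-sample Hotelling T² (squared scores scaled by each component's variance) and Q (sum of squared residuals) shown in the figure.

```python
import numpy as np

def pca(X, n_components):
    """PCA of a samples x variables matrix via SVD of the mean-centered data.

    Returns scores T, loadings P, residuals E (so X - mean = T @ P.T + E),
    the column means, Hotelling T^2 and Q per sample, and the fraction of
    variance captured by each component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings: variables x components
    T = Xc @ P                               # scores: samples x components
    E = Xc - T @ P.T                         # residuals
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(T ** 2 / score_var, axis=1)  # distance within the model plane
    q = np.sum(E ** 2, axis=1)               # distance off the model plane
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, E, mean, t2, q, explained
```

Summing `explained` is how a statement such as “4 PCs = 98% variance” would be checked.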
Multi-way Principal Components Analysis (MPCA)
Wise, B. M.; Gallagher, N. B.; Bro, R.; Shaver, J. M.; Windig, W.; Koch, R. S. PLS Toolbox 4.0; Eigenvector Research, Inc.: Wenatchee, WA, 2006.
Our data: 15 × 410,000
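Multi-way PCA can be sketched as “unfold, then ordinary PCA”: each I × J × K chromatogram stack becomes an I × (J·K) matrix (one whole chromatogram per row, which is how a 15-sample set becomes 15 × 410,000), and each loading vector is refolded to J × K so it can be displayed as a chromatogram. This is an illustrative NumPy sketch with a hypothetical `mpca` function, not the PLS_Toolbox routine cited above.

```python
import numpy as np

def mpca(X3, n_components):
    """Multi-way PCA of an I x J x K stack by unfolding to I x (J*K)."""
    I, J, K = X3.shape
    X = X3.reshape(I, J * K)                 # each row = one whole chromatogram
    mean = X.mean(axis=0)
    # full_matrices=False keeps Vt at (I x J*K), manageable even for J*K ~ 4e5
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]          # sample scores
    # refold each loading vector so it can be plotted like a chromatogram
    P_images = Vt[:n_components].reshape(n_components, J, K)
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P_images, explained
```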
GC × GC/MS TIC of Fire Debris
[Figure: TIC with first dimension Time (min), 0 to 40, and second dimension Time (s), 0 to 4.0.]
6 clean carpet samples
5 gasoline samples
6 “doped” carpet samples
PCA Model Specifics
1. Only two carpet classes included
2. 4 PCs = 98% variance
3. Two random samples per class, and all gasoline samples, were left out of the “training set”
4. Left-out samples were “projected” onto the model later.
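Projecting a left-out sample means reusing the training mean and loadings without refitting: its scores are (x − mean)·P, and whatever the loadings cannot reproduce becomes its residual (Q). The sketch below assumes plain mean-centering as the only preprocessing; `fit_pca` and `project` are hypothetical helper names.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Fit a PCA model and keep what is needed to project new samples later."""
    mean = X_train.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    P = Vt[:n_components].T                                  # loadings
    score_var = s[:n_components] ** 2 / (X_train.shape[0] - 1)
    return mean, P, score_var

def project(X_new, mean, P, score_var):
    """Project left-out samples onto an existing model (no refitting)."""
    Xc = X_new - mean          # center with the training mean, not the new data's
    T = Xc @ P                 # scores of the new samples
    E = Xc - T @ P.T           # part of each sample the model cannot describe
    t2 = np.sum(T ** 2 / score_var, axis=1)
    q = np.sum(E ** 2, axis=1)
    return T, t2, q
```

A gasoline sample that lies far from both carpet classes would be expected to show a large Q, a large T², or both.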
PC 1 - Loadings
[Figure: PC 1 loadings over the chromatogram plane; first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0.]
Red = positive loadings, correlated; Blue = negative loadings, anti-correlated
PC 2 - Loadings
[Figure: PC 2 loadings over the chromatogram plane; first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0.]
Chemically interpretable results!
Next step: classification
Principal Components Regression Discriminant Analysis (PCR-DA)
Y (columns: w/ accelerant, w/o accelerant):
$$\mathbf{Y} = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}$$
[Figure: three example chromatograms from X (first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0), and the PCA model of X (PC 1, PC 2; T², Q) whose scores are used to predict Y.]
Regression Vector
[Figure: PCR regression vector over the chromatogram plane; first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0.]
Red = positive loadings; Blue = negative loadings
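The regression vector can be sketched as a two-step fit: PCA compresses X to scores T, Y is regressed on T by least squares (coefficients Q), and B = PQ maps those coefficients back onto the original variables. For unfolded GC x GC data, B reshaped to J × K is the image plotted above. This is an illustrative NumPy sketch under plain mean-centering; `pcr_fit` is a hypothetical name.

```python
import numpy as np

def pcr_fit(X, Y, n_components):
    """Principal components regression.

    Returns the regression vector(s) B in the original variable space and an
    intercept, so that Y_hat = X @ B + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings
    T = Xc @ P                                    # scores
    Q, *_ = np.linalg.lstsq(T, Yc, rcond=None)    # regress Y on the scores
    B = P @ Q                                     # map back to variable space
    intercept = y_mean - x_mean @ B
    return B, intercept
```

For an unfolded chromatogram stack, `B.reshape(J, K)` (one column of B at a time) would give the chromatogram-shaped regression vector.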
Regression Vector Zoom
[Figure: zoomed view of the regression vector, about 20 to 30 min in the first dimension.]
Principal Components Regression Predictions
[Figure: sample scores on the regression vector (y axis, -0.2 to 1.8) for samples 1 to 17 (x axis), grouped as Unaltered Carpet, Arson Debris, and Gasoline.]
Discriminant Analysis: 1 = Member of Arson Class
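Turning PCR into discriminant analysis only requires a dummy Y (one 0/1 column per class, as in the Y matrix for this case) and a rule for converting continuous predictions into class calls. The sketch below assigns each test sample to the class whose predicted column is largest; that decision rule, the class names, and the helper names are assumptions for illustration, not the presenters' stated procedure.

```python
import numpy as np

def dummy_y(labels, classes):
    """One column per class: 1 where the sample belongs to that class, else 0."""
    return np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])

def pcr_da(X_train, labels, X_test, classes, n_components):
    """PCR-DA: PCR on a dummy Y, then assign each sample to the largest column."""
    Y = dummy_y(labels, classes)
    x_mean, y_mean = X_train.mean(axis=0), Y.mean(axis=0)
    Xc = X_train - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    Q, *_ = np.linalg.lstsq(Xc @ P, Y - y_mean, rcond=None)
    Y_hat = (X_test - x_mean) @ P @ Q + y_mean    # continuous predictions
    calls = [classes[i] for i in np.argmax(Y_hat, axis=1)]
    return Y_hat, calls
```

The continuous `Y_hat` values correspond to the sample scores plotted above; values near 1 in the arson column indicate membership.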
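Classification – Soft Independent Modeling of Class Analogy (SIMCA)
[Figure: each class k has its own PCA model (PC 1, PC 2) in the space of variable 1, variable 2, and variable 3; a sample x is compared with each class model through its T² and Q values.]
$$d_{x,k} = \sqrt{Q_{r}^{2} + \left(T^{2}_{r}\right)^{2}}, \qquad Q_{r} = \frac{Q_{k}}{Q_{k,0.95}}, \qquad T^{2}_{r} = \frac{T^{2}_{k}}{T^{2}_{k,0.95}}$$
where Q_{k,0.95} and T²_{k,0.95} are the 95% confidence limits of the class-k model.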
SIMCA Model Specifics
1. A PCA model was built for each of 2 classes: arson and not arson
2. Each model had 2 PCs capturing 99% of the variance
3. One random sample per class, and all gasoline samples, were left out of the “training set”
4. Left-out samples were “projected” onto each model later.
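The SIMCA steps can be sketched as one PCA model per class plus the reduced distance d = √(Q_r² + (T²_r)²) to each model. Two simplifications are assumptions here and are flagged in the code: the 95% limits are taken as quantiles of the training-set statistics (software normally uses F-based T² and Jackson-Mudholkar Q limits), and a sample counts as “in” a class when d < 1, a hypothetical cutoff. `ClassModel` and `simca_classify` are illustrative names. The outputs mirror the four result panels below: in-class flags, nearest class, and “not in any class”.

```python
import numpy as np

class ClassModel:
    """PCA model of one class, as used in SIMCA (hypothetical helper)."""

    def __init__(self, X, n_components, level=0.95):
        self.mean = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T
        self.var = s[:n_components] ** 2 / (X.shape[0] - 1)
        t2, q = self.fit_stats(X)
        # Assumption: 95% limits from training-set quantiles, not the
        # F-based T^2 and Jackson-Mudholkar Q limits usually used.
        self.t2_lim = np.quantile(t2, level)
        self.q_lim = np.quantile(q, level)

    def fit_stats(self, X):
        """Hotelling T^2 and Q of each sample relative to this class model."""
        Xc = X - self.mean
        T = Xc @ self.P
        E = Xc - T @ self.P.T
        return np.sum(T ** 2 / self.var, axis=1), np.sum(E ** 2, axis=1)

    def distance(self, X):
        """Reduced distance d = sqrt(Q_r^2 + (T^2_r)^2) to this class."""
        t2, q = self.fit_stats(X)
        return np.sqrt((q / self.q_lim) ** 2 + (t2 / self.t2_lim) ** 2)

def simca_classify(models, X, cutoff=1.0):
    """In-class flags, nearest class, and 'not in any class' for each sample.

    The cutoff of 1.0 on the reduced distance is a hypothetical choice.
    """
    names = list(models)
    D = np.column_stack([models[n].distance(X) for n in names])
    member = D < cutoff
    nearest = [names[i] for i in D.argmin(axis=1)]
    return member, nearest, ~member.any(axis=1)
```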
Arson “Case” SIMCA Results
[Figure: four panels, each plotting the Carpet, Doped, and Gasoline samples: “In Carpet Class” (0/1), “In Doped Class” (0/1), “Nearest Class” (1/2), and “Not in any Class” (0/1). Legend: Carpet Samples, Carpet Test, Doped Samples, Doped Test, Gasoline Test.]
Arson “Case” SIMCA Fit Statistics
[Figure: Q residuals plotted against T² for each class model, in the panels “Fit Statistics for Doped Carpet Class” and “Fit Statistics for Carpet Class”. Legend: Carpet Samples, Carpet Test, Doped Samples, Doped Test, Gasoline Test.]
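Parallel Factor Analysis (PARAFAC)
$$x_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} + e_{ijk}$$
$$\mathbf{X} = \sum_{r=1}^{R} \mathbf{a}_{r} \circ \mathbf{b}_{r} \circ \mathbf{c}_{r} + \mathbf{E}$$
[Figure: the I × J × K array X decomposed into loading matrices A (I × R), B (J × R), and C (K × R) with core G, drawn as the sum of triads (a1, b1, c1) + (a2, b2, c2) + (a3, b3, c3) plus the residual array E.]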
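The trilinear model above is commonly fitted by alternating least squares (ALS): holding two loading matrices fixed, the third is an ordinary least-squares solution against the matching unfolding of X. The NumPy sketch below is a bare-bones ALS with random initialization and a simple stopping rule; it omits constraints (such as non-negativity), multiple restarts, and the core-consistency checks a real analysis would use, and `parafac_als` is a hypothetical name.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row j*K + k holds B[j, r] * C[k, r]."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def parafac_als(X, R, n_iter=1000, tol=1e-10, seed=0):
    """Fit x_ijk = sum_r a_ir b_jr c_kr + e_ijk by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    # Unfoldings chosen so that X1 = A @ khatri_rao(B, C).T, and so on
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    prev_sse = np.inf
    for _ in range(n_iter):
        # each update is a least-squares solve with the other two modes fixed
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        sse = np.sum((X - np.einsum("ir,jr,kr->ijk", A, B, C)) ** 2)
        if abs(prev_sse - sse) <= tol * (1.0 + sse):   # stop when the fit stalls
            break
        prev_sse = sse
    return A, B, C
```

For a chromatogram stack ordered samples × first dimension × second dimension, A would hold the sample scores and B and C the first- and second-dimension profiles; each factor is determined only up to scaling and order.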
Parallel Factor Analysis (PARAFAC)
[Figure: PARAFAC of a GC x GC - FID chromatogram stack X (I × J × K). Each factor (Factor 1: a1, b1, c1; Factor 2: a2, b2, c2) has a sample score, a first dimension loading, and a second dimension loading.]
Parallel Factor Analysis (PARAFAC)
GC x GC - TOF
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
Parallel Factor Analysis (PARAFAC)
[Figure: PARAFAC of a single GC x GC - TOF sample X (I × J × K). Each factor (Factor 1: a1, b1, c1; Factor 2: a2, b2, c2) has an m/z score (the factor's mass spectrum), a first dimension loading, and a second dimension loading.]
Parallel Factor Analysis (PARAFAC)
GC x GC - TOF
“Complex Environmental Sample”
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
PARAFAC Results
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
GC × GC/MS Peak Deconvolution: PARAFAC?
[Figure: GCImage screen capture of NIJ0221, 100 µg 75% Wx gasoline / nylon carpet matrix.]
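Partial Least Squares (PLS)
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}, \qquad \mathbf{Y} = \mathbf{T}\mathbf{Q}^{\mathsf{T}} + \mathbf{F}$$
X: data (samples × variables); Y: properties (samples × property variables); T: “latent variables” shared by both blocks; TPᵀ: the “model”; E, F: residuals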
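For a single property (PLS1), the latent variables can be extracted one at a time with the NIPALS algorithm: each weight vector is the direction in X most covariant with y, the scores t = Xw are the latent variable, and both blocks are deflated before the next component. The final regression vector is b = W(PᵀW)⁻¹q. This is a textbook sketch with plain mean-centering, not the method used in the cited jet-fuel study; `pls1_nipals` is a hypothetical name.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS1 (one property y) by NIPALS on mean-centered data.

    Returns the regression vector b and intercept: y_hat = X @ b + b0.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)          # weight vector: max covariance with y
        t = Xa @ w                      # scores (latent variable)
        tt = t @ t
        p = Xa.T @ t / tt               # X loadings
        c = ya @ t / tt                 # y loading
        Xa = Xa - np.outer(t, p)        # deflate X (X = T P^T + E)
        ya = ya - c * t                 # deflate y (y = T q + f)
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return b, y_mean - x_mean @ b
```

With as many components as X has independent columns, PLS1 reproduces ordinary least squares; in practice far fewer components are chosen by cross-validation.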
PLS Results: Naphthalenes in Jet Fuel
Johnson, K. J.; Prazen, B. J.; Young, D. C.; Synovec, R. E. Journal of Separation Science 2004, 27, 410-416.
Alignment Strategy 1: Experimental Design
Alignment Strategy 2: Templates / Peak Tables
Alignment Strategy 3: Retention Index
Alignment Strategy 4: Piecewise Correlation Maximization
Pierce, K. M.; Wood, L. F.; Wright, B. W.; Synovec, R. E. Analytical Chemistry 2005, 77, 7735-7743.
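The core idea of piecewise correlation maximization can be sketched in one dimension: split the target into windows, and for each window find the integer shift of the sample that gives the highest Pearson correlation with the target window. This simplified sketch is not a reproduction of the Pierce et al. algorithm; `piecewise_align`, the window size, and the shift range are illustrative assumptions.

```python
import numpy as np

def piecewise_align(sample, target, window, max_shift):
    """Align `sample` to `target` one window at a time (1-D sketch).

    Each window is filled with the sample segment, shifted by an integer in
    [-max_shift, max_shift], that best correlates with the target window.
    Returns the aligned signal and the chosen shift for each window.
    """
    sample = np.asarray(sample, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target)
    aligned = np.zeros(n)
    shifts = []
    for start in range(0, n, window):
        stop = min(start + window, n)
        ref = target[start:stop]
        best_shift, best_r = 0, -np.inf
        for s in range(-max_shift, max_shift + 1):
            lo, hi = start + s, stop + s
            if lo < 0 or hi > n:
                continue                  # shifted window would fall off the ends
            seg = sample[lo:hi]
            if seg.std() == 0 or ref.std() == 0:
                continue                  # correlation undefined for flat segments
            r = np.corrcoef(seg, ref)[0, 1]
            if r > best_r:
                best_r, best_shift = r, s
        aligned[start:stop] = sample[start + best_shift:stop + best_shift]
        shifts.append(best_shift)
    return aligned, shifts
```

For GC x GC data, the same search would be applied along each retention dimension; windows near the ends have a restricted range of allowed shifts.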
Alignment Strategy 5: “Warping”
Kaczmarek, K.; Walczak, B.; de Jong, S.; Vandeginste, B. G. M. Journal of Chemical Information and Computer Sciences 2003, 43, 978-986.
Alignment Strategy Proposal #1: Anchor Warping
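The slides do not spell out how anchor warping works, so the following is only one plausible reading, labeled as an assumption: the analyst picks the same peaks (anchors) in the sample and in a target, retention times between anchors are stretched linearly, and the sample is resampled onto the target's time grid. `anchor_warp` is a hypothetical name, and this sketch covers a single retention dimension.

```python
import numpy as np

def anchor_warp(times, signal, sample_anchors, target_anchors, target_times):
    """Hypothetical piecewise-linear anchor warping of one retention axis.

    sample_anchors / target_anchors: retention times of the same user-picked
    peaks in the sample and in the target, in increasing order. Include the
    start and end of the run as anchors, because np.interp holds the end
    values constant outside the anchor range.
    """
    # for every target time, find the corresponding sample time
    sample_time_for_target = np.interp(target_times, target_anchors, sample_anchors)
    # read the sample signal at those times (linear interpolation)
    return np.interp(sample_time_for_target, times, signal)
```

Because the mapping is pinned at each anchor, peaks the analyst trusts stay fixed, and only the stretch between them is inferred.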
Alignment Strategy Proposal #2: DTW – Piecewise Hybrid
1st Dimension: DTW (alkanes?)
2nd Dimension: Piecewise
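For the first-dimension step of this proposal, classic dynamic time warping (DTW) finds the monotone index-to-index matching between two profiles that minimizes the accumulated squared difference. The sketch below is textbook DTW in NumPy; which first-dimension profiles would be matched (for example, alkane peaks) is left open, as on the slide, and `dtw_path` is a hypothetical name.

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping between 1-D profiles a and b.

    Returns the accumulated cost and the optimal warping path as a list of
    (i, j) index pairs matching a[i] to b[j].
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of: match, step in a only, step in b only
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # backtrack from the end to recover the path
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```

Usage: `dtw_path([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0])` matches the two shifted peaks with zero accumulated cost.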
Humble Opinions
1. GC x GC is tremendously interesting data
2. A tremendous amount of work is possible, even with data that already exist. Good alignment will open up even more possibilities
3. Include the Chemist in the analysis
4. Include the Chemometrician in the experimental design
Future?
1. More PCA, PCR, PLS, PARAFAC
2. Regression certainty calculations
3. NPLS, NPLS-DA
4. Holistic, automatic alignment strategies: 2D COW or DTW? PARAFAC2?
5. User-driven alignment strategies: anchor warping
6. Inclusion of the m/z axis: purity, CODA?
Acknowledgements
U.S. Coast Guard Academy Alexander Trust
You all!