Qualitative and quantitative aspects of the application of genetic algorithm-based variable selection in
polarography and stripping voltammetry
Ana Herrero*, M. Cruz Ortiz
Dpto. Química, Facultad de Ciencias, Universidad de Burgos, Pza. Misael Bañuelos s/n, 09001 Burgos, Spain
Received 9 June 1998; received in revised form 14 August 1998; accepted 25 August 1998
Abstract
A genetic algorithm (GA) is successfully applied as a variable selection method in the multivariate analysis, by partial least squares (PLS) regression, of several polarographic and stripping voltammetric data sets in which different interferences are present (coupled reactions, formation of intermetallic compounds, overlapping signals and matrix effect). In most cases, the results obtained with this variable selection method are better than those obtained when all the variables are considered. Such is the case in the determination of benzaldehyde, where a dimerization reaction occurs simultaneously with the electrochemical reactions. In general, using the GA improves the precision for the test samples. In addition, the GA provides valuable qualitative information that, in every case, is a significant tool for detecting and understanding the chemical phenomena involved in each analysis. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Feature selection; Genetic algorithm; Partial least squares regression; Polarography; Stripping voltammetry; Coupled reactions;
Overlapping signals; Intermetallic compounds; Matrix effect
1. Introduction
The determination of the variables that are really relevant for building a multivariate regression model is of great importance. At first, it could be thought that the more variables the model contains, the better it is, because all the variables related to the response are taken into account. This reasoning is not adequate, however, since variables not related to the response can increase the background noise and reduce the prediction ability of the model. The aim of a variable selection procedure is a significant reduction in the number of variables in order to obtain simpler and more stable relationships.
On the other hand, variable selection in a multivariate calibration can reveal the need to use some variables apparently unrelated to a specific analyte or, on the contrary, to discard others that could seem essential. This can indicate the presence of unexpected phenomena in the electrochemical signal, such as interferences from coupled reactions, formation of intermetallic compounds, overlapping signals, matrix effect, etc. In short, a guided variable selection procedure can provide very useful qualitative information about the analysed chemical system. Several methods have been proposed to carry out the selection of variables in regression analysis [1–5], stepwise procedures being the most commonly used.
Analytica Chimica Acta 378 (1999) 245–259
*Corresponding author. Fax: +34-947-258831; e-mail:
0003-2670/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0003-2670(98)00619-9
In general, stepwise procedures for the selection of variables have the drawback that they do not sufficiently explore the possible combinations of variables. An efficient alternative is provided by some optimization procedures, such as genetic algorithms (GAs) [6–9], which is the approach used in this paper.
GAs [10–13] have been used to solve difficult problems with objective functions that do not possess "nice" properties such as continuity, differentiability, etc. [14–17]. These algorithms maintain and manipulate a family, or population, of solutions and implement a "survival of the fittest" strategy in their search for better solutions. Whereas traditional search techniques use characteristics of the problem (e.g., gradients, Hessians, linearity and continuity) to determine the next sampling point, GAs make no such assumptions. Instead, the next sampled point is determined by stochastic sampling/decision rules rather than by a set of deterministic decision rules. GAs are therefore very useful for the variable selection problem, because the relationship between the presence/absence of variables in a calibration model and the prediction ability of the model, especially for PLS models, is very complex, and the mathematical properties cited above are unknown.
GAs search the solution space of a function through simulated evolution, i.e. the survival-of-the-fittest strategy. GAs have been shown to solve optimization problems by exploring all regions of the space of potential solutions and exponentially exploiting promising areas through mutation, crossover and selection operations applied to individuals in the population. A complete discussion of genetic algorithms can be found in [14–17].
The use of a GA requires six fundamental issues to be determined; in this paper they were the following:
1. Each subset of variables is represented by a vector of binary coordinates (genes), called a chromosome, in which the code 1 means that the variable has been chosen to build a PLS calibration model and 0 means that it has not been chosen.
2. Selection function. Chromosomes are chosen with a probability inversely proportional to the PRESS of their model, so that the two chromosomes that will reproduce are effectively selected.
3. Genetic operators. Uniform and simple crossover between chromosomes, and mutation of each gene, with preset probabilities.
4. Evaluation function. The PRESS of the PLS model built with only those variables coded 1 in the chromosome.
5. The GA used in this work, known as 'steady-state without duplicates' [6,7], does not replace the whole population: an offspring chromosome (a potential solution) is added to the population only if its PRESS is lower than that of the worst chromosome in the population. In that case, the worst chromosome is removed to keep the population size constant. Duplicate chromosomes are not allowed.
6. The population evolves by keeping the best responses and including those chromosomes that improve the response. Usually, the evolution stops when a sufficiently large number of offspring chromosomes has been generated.
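The six issues listed above can be put together in a short loop. What follows is a minimal sketch, not the implementation used by the authors. Several choices are assumptions: parents are drawn by a roulette wheel with weights 1/PRESS; "simple crossover" is read as single-point crossover; each offspring uses uniform crossover with a hypothetical probability `p_uniform` and single-point crossover otherwise; and the run stops after a fixed number of offspring. The function name, parameter names and defaults are all hypothetical. `evaluate(mask)` stands for any function that returns the PRESS of a PLS model built on the variables coded 1.

```python
import random

def steady_state_ga(n_vars, evaluate, pop_size=20, n_offspring=500,
                    p_uniform=0.5, p_mutation=0.01, seed=0):
    """Sketch of a steady-state GA without duplicates (hypothetical parameters).

    evaluate(mask) -> float is minimised (e.g. PRESS); mask is a tuple of 0/1
    genes, one per variable. Requires n_vars >= 2 for single-point crossover.
    """
    rng = random.Random(seed)
    population = {}  # chromosome -> fitness; dict keys forbid duplicates
    while len(population) < pop_size:
        mask = tuple(rng.randint(0, 1) for _ in range(n_vars))
        if any(mask) and mask not in population:
            population[mask] = evaluate(mask)

    def pick_parent():
        # Roulette wheel: probability inversely proportional to PRESS.
        chroms = list(population)
        weights = [1.0 / (population[c] + 1e-12) for c in chroms]
        return rng.choices(chroms, weights=weights, k=1)[0]

    for _ in range(n_offspring):
        a, b = pick_parent(), pick_parent()
        if rng.random() < p_uniform:
            # Uniform crossover: each gene taken from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        else:
            # Single-point ("simple") crossover.
            cut = rng.randrange(1, n_vars)
            child = list(a[:cut] + b[cut:])
        # Flip each gene with a small probability.
        child = tuple(1 - g if rng.random() < p_mutation else g for g in child)
        if not any(child) or child in population:
            continue  # empty subsets and duplicates are discarded
        fitness = evaluate(child)
        worst = max(population, key=population.get)
        if fitness < population[worst]:
            del population[worst]  # steady state: population size stays constant
            population[child] = fitness
    best = min(population, key=population.get)
    return best, population[best]
```

Storing the population in a dictionary keyed by chromosome gives the "without duplicates" rule directly. An offspring enters only by displacing the current worst member, which is the steady-state replacement described in item 5.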
This specific GA has been applied in the literature to multivariate analysis in multicomponent spectrophotometric [18,19] and electrochemical [20] determinations, in quality control [21], etc. It also seems adequate for analysing a wide range of electrochemical signals in which interference problems, such as overlapping signals or matrix effect, require multivariate regression models [22–24]. To study the suitability of this feature selection method for electroanalytical data, both qualitative and quantitative aspects of the application of this GA to several polarographic and stripping voltammetric examples are analysed in this paper.
Several electrochemical examples in which interference problems such as coupled reactions, formation of intermetallic compounds, overlapping signals and matrix effect are present have been studied. In most cases, the results showed that the GA improved, or at least maintained, the prediction ability of the partial least squares (PLS) regression models built with the selected variables, while the precision for the test samples was improved.
In addition, the qualitative information obtained about the chemical systems has always been highly significant, helping to understand the chemical and electrochemical processes taking place. This shows that the GA can be a useful tool in qualitative electroanalysis.
2. Experimental
2.1. Materials and equipment
Analytical-reagent grade chemicals were used without further purification; benzaldehyde was obtained from Merck (>99%). All the solutions were prepared with deionized water obtained from a Barnstead NANO Pure II system. Table 1 summarizes the supporting electrolytes and the instrumental and experimental conditions used in each case. Nitrogen (99.997%) was used to remove dissolved oxygen.
Most of the polarographic and voltammetric measurements were carried out using a Metrohm 646 VA processor with a 647 VA stand in conjunction with a Metrohm multimode electrode (MME) used in the static (SMDE) and hanging (HMDE) mercury drop electrode modes, respectively (stirring rate 1290 rev min⁻¹). The three-electrode system was completed with a platinum auxiliary electrode and an Ag/AgCl/KCl (3 mol dm⁻³) reference electrode. Some voltammetric measurements were carried out with a μAUTOLAB system from Eco Chemie in conjunction with a Metrohm 663 VA stand equipped with a Metrohm multimode electrode (MME) used in the hanging mercury drop electrode (HMDE) mode and connected to the IME 663 interface for mercury electrodes (stirring rate 1500 rev min⁻¹). The three-electrode system was completed with an Ag/AgCl/KCl (3 mol dm⁻³) reference electrode and a glassy carbon auxiliary electrode.
Data analysis was carried out with PARVUS [25], STATGRAPHICS [26] and MATLAB [27].
2.2. Polarographic procedures
The solution was placed in the polarographic cell and purged with nitrogen for 10 min. Once the solution had been deoxygenated, polarograms were recorded from the initial (Ei) to the final (Ef) potential. All the results presented were obtained using the differential-pulse mode with drop time 0.6 s, drop area 0.40 mm² and scan rate ≈10 mV s⁻¹. After each addition, the solution was stirred and deoxygenated for 15 s before the polarographic procedure was applied again.
2.3. Voltammetric procedures
The solution was placed in the cell and purged with nitrogen for 10 min. Once the solution had been deoxygenated, a deposition potential, Edep, was applied to the working electrode for a deposition time, tdep. At the end of this time the stirrer was switched off and, after 30 s, an anodic potential scan was initiated from Ei to Ef in the differential-pulse mode. Other instrumental parameters were: modulation amplitude for the Metrohm system, 50 mV; modulation amplitude for the μAUTOLAB system, 49.95 mV; pulse repetition time, 0.6 s; nominal area, 0.40 mm²; scan rate, 10 mV s⁻¹. The solution was stirred and deoxygenated for 15 s after each addition.
3. Results and discussion
Among multivariate regression methods, PLS is designed as a biased model with high stability in its predictions. The prediction ability of the PLS models is evaluated by means of the cross-validated variance, which is obtained from the prediction error sum of squares (PRESS). Therefore, a GA applied to the selection of the variables that will constitute the PLS model should improve, or at least maintain, the cross-validated variance of the PLS model. The GA used in this paper rejects any chromosome whose cross-validated variance is lower than a predictive threshold value.
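The evaluation function (items 2 and 4 of the list in the Introduction) relies on the PRESS of a PLS model. Below is a minimal pure-Python sketch. It assumes PLS1 fitted by NIPALS on mean-centred data, leave-one-out cross-validation, and the common definition of cross-validated variance as 100(1 − PRESS/SSy), where SSy is the sum of squared deviations of y from its mean. The paper does not state its cross-validation scheme or formula, so all three are assumptions, and the function names are hypothetical.

```python
def _mean(v):
    return sum(v) / len(v)

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns the means and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    xm = [_mean([row[j] for row in X]) for j in range(m)]
    ym = _mean(y)
    E = [[row[j] - xm[j] for j in range(m)] for row in X]  # X residuals
    f = [yi - ym for yi in y]                              # y residuals
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm == 0:
            break  # nothing left to explain
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        comps.append((w, p, q))
    return xm, ym, comps

def pls1_predict(model, x):
    """Predict one sample by applying the same deflation sequence."""
    xm, ym, comps = model
    e = [xj - mj for xj, mj in zip(x, xm)]
    yhat = ym
    for w, p, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        yhat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return yhat

def press_loo(X, y, n_comp, mask=None):
    """Leave-one-out PRESS using only the columns whose gene in mask is 1."""
    if mask is not None:
        X = [[v for v, g in zip(row, mask) if g] for row in X]
    press = 0.0
    for k in range(len(X)):
        model = pls1_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], n_comp)
        press += (y[k] - pls1_predict(model, X[k])) ** 2
    return press

def cv_variance(X, y, n_comp, mask=None):
    """Cross-validated variance in %, assumed here to be 100 * (1 - PRESS / SS_y)."""
    ym = _mean(y)
    ss = sum((yi - ym) ** 2 for yi in y)
    return 100.0 * (1.0 - press_loo(X, y, n_comp, mask) / ss)
```

A function such as `lambda mask: press_loo(X, y, n_comp, mask)` could serve as the evaluation function of a GA. A threshold on `cv_variance` would mirror the rejection rule described above.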
This GA has been used to perform variable selection in determinations carried out by differential-pulse polarography (DPP) and differential-pulse anodic stripping voltammetry (DPASV), where PLS regression was needed to model different interference problems, which are specified individually below. Whether, through a guided experimental design, the variables chosen by the GA could reveal the nature of these interferences has been studied together with the quantitative analysis.
Table 1
Experimental conditions and parameters corresponding to each analysed case
Case | Analytes | Technique | Supporting electrolyte | Electrode | Edep/V | tdep/s | Ei/V | Ef/V | System
A | Benzaldehyde | DPP | McIlvaine buffers from pH 2.2 to 7.6 containing 20% of alcohol | SMDE | – | – | −0.800 | −1.502 | Metrohm
B | CuII, PbII, CdII and ZnII | DPP | Acetic acid (2 mol dm⁻³) and ammonium hydroxide (1 mol dm⁻³) | SMDE | – | – | 0.095 | −1.117 | Metrohm
C | CuII, PbII, CdII and ZnII | DPASV | Acetic acid (2 mol dm⁻³) and ammonium hydroxide (1 mol dm⁻³) | HMDE | −1.110 | 30 | −1.110 | 0.072 | Metrohm
D | TlI and PbII | DPASV | Oxalic acid (0.1 mol dm⁻³) and hydrochloric acid (0.1 mol dm⁻³) | HMDE | −0.600 | 30 | −0.558 | −0.282 | Metrohm
E | CuII and FeIII | DPASV | Citrate–citric acid (pH 4.7) | HMDE | −1.300 | 60 | −0.154 | 0.050 | μAUTOLAB
Edep, tdep, Ei and Ef being the deposition potential, deposition time, initial potential and final potential, respectively.
Fig. 1. Polarograms (once the blank signal has been subtracted) and variables selected by the GA (vertical lines) in the determination of benzaldehyde for pH: (a) 2.2, (b) 4.6 and (c) 7.6. Benzaldehyde concentrations from 0.49 to 3.34 mmol dm⁻³.
3.1. Determination of benzaldehyde by DPP
In the polarographic determination of benzaldehyde carried out at different pH values, coupled reactions take place (dimerization reactions in particular). These lead to polarograms of low analytical quality with highly overlapping and shifted peaks, so multivariate regression techniques (PLS regression) are required [23]. A PLS regression model was therefore built independently for each pH. Seven additions of benzaldehyde (7 objects) constituted each training set, the current being recorded at 119 potentials (119 predictor variables). The mean of the absolute relative errors of these PLS models was 2.10%.
Fig. 1 shows the polarograms for the three pH values (2.2, 4.6, 7.6) that are most representative of the different kinds of signals recorded. When benzaldehyde is reduced in acid medium, Fig. 1(a), two successive polarographic peaks are observed. The first (≈ −0.95 V) corresponds to the formation of a free radical. This radical can simultaneously dimerize (depending on the benzaldehyde concentration) or be reduced to benzyl alcohol, giving the second peak (≈ −1.30 V), which is obscured by the discharge of protons (< −1.38 V). At higher pH, Fig. 1(c), the two peaks are gradually replaced by a single peak (≈ −1.32 V) associated with a two-electron electrode reaction, corresponding to the formation of benzyl alcohol through an ECE process. Accordingly, at intermediate pH, Fig. 1(b), this third peak (≈ −1.4 V) appears together with the other two (≈ −1.08 and −1.23 V).
When the GA was applied to the data set for each pH, a different set of selected variables was obtained for each one. Fig. 1 shows the variables selected (vertical lines) by the GA for the three pH values indicated above. In Fig. 1(a), some of the selected variables lie in the zone of the first peak, and the rest are related to the potentials where protons are reduced. This indicates that these potentials are also related to the benzaldehyde concentration, because their inclusion in the PLS models improves the cross-validated variance. In fact, a shift of the polarographic signal due to proton discharge was observed as the benzaldehyde concentration increased. In Fig. 1(b), at intermediate pH, the GA has selected potentials over the three polarographic peaks. In Fig. 1(c), at high pH, the selected variables correspond to the only peak of the polarogram, mainly to its tails. In both cases the GA avoids the zones where the shift of the peak is clear.
Next, the PLS models built with all the variables and those built with the variables selected by the GA were compared. Their prediction ability was evaluated by paired-sample tests comparing the absolute value of the relative error of the benzaldehyde concentration calculated by the PLS models built without and with variable selection. Since the differences between these absolute errors could not be considered normally distributed, non-parametric tests had to be used. For the two tests applied, the Signs test and the Wilcoxon signed ranks test [28], the null hypothesis (H0) was that the median of the differences is zero, i.e. the variable selection procedure has no effect. The alternative hypothesis (Ha) was that the median is different from zero, i.e. the variable selection has an effect. H0 is rejected when the actual significance probability (p) is lower than the significance level α = 0.05. The results of these non-parametric tests are shown in Table 2, Case A. Both tests conclude that the variable selection procedure has an effect (p < 0.05), i.e. the concentrations calculated with and without variable selection are statistically different. Furthermore, since the median of the differences is −0.255, the variable selection procedure reduces the relative error, at that significance level, by 0.255%.
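The two paired tests can be sketched as follows. They are applied to the list of differences between the absolute relative errors obtained with and without variable selection. This is a minimal sketch with hypothetical function names. It assumes the exact binomial form of the sign test and the large-sample normal approximation, without continuity or tie-variance correction, for the Wilcoxon signed ranks test. Statistical packages such as STATGRAPHICS may use exact tables or corrections for small samples, so p-values from this sketch need not match Table 2.

```python
import math

def sign_test(diffs):
    """Two-sided exact sign test of H0: median difference = 0 (zero differences dropped)."""
    d = [x for x in diffs if x != 0]
    n, k = len(d), sum(x > 0 for x in d)
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def wilcoxon_signed_rank(diffs):
    """Two-sided Wilcoxon signed ranks test, normal approximation; returns (W+, p)."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # tied |d| values share their average rank
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z > |z|)
```

For example, seven differences that are all negative give a sign-test p of 2/2⁷ ≈ 0.016, and W+ = 0. Both results point towards rejecting H0 at α = 0.05.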
3.2. Determination of CuII, PbII, CdII and ZnII by DPP
Partial least squares regression has been successfully applied to the polarographic determination of copper, lead, cadmium and zinc [22]. This multivariate model allows the four metals to be determined simultaneously when an adequate experimental design is used (approximately a central composite design [29], but with equal-volume additions of each metal giving five different concentration levels). In this case, 28 samples constituted the training set and 8 the test set (the current was recorded at 100 potentials for each sample). The mean absolute relative error was 2.37%, as opposed to errors of up to 40% obtained when classical univariate methods were used with these "apparently" non-problematic voltammetric signals [22].
Table 2
Medians and actual significance probabilities (p) for the two non-parametric tests applied to all the analysed cases
Case | A | B | C | D | E
Median | −0.255 | −0.155 | 1.440 | 2.815 | 0.000
Wilcoxon signed ranks test | 0.004* | 0.164 | 0.031* | 0.021* | 0.584
Signs test | 0.004* | 0.112 | 0.052 | 0.077 | 0.617
H0: median is zero; Ha: median is different from zero. * p < 0.05 implies that H0 is rejected.
The polarograms corresponding to this analysis are shown in Fig. 2. They show noise due to irregularities of the signals, mainly at the extremes of the polarogram. For this reason, when the GA is applied independently for each metal, the selected potentials correspond both to the peak of the analysed metal and to other zones of the polarograms, so that these fluctuations can be modelled by the multivariate regression. The variables selected by the GA for CuII (first peak) and PbII (second peak) are shown in Fig. 2(a) and (b), respectively. The variable selection carried out by the GA seems to indicate that there is no interference other than the variability of the baseline. This variability is modelled by including potentials that are not especially relevant (valley points).
The above non-parametric tests have been applied again to compare the results obtained by the PLS models with and without the variable selection procedure; the results are shown in Table 2, Case B. Both tests conclude that there are no statistically significant differences between the two procedures at a significance level α = 0.05. Thus, in this case the GA maintains the prediction ability of the PLS models (see the cross-validated variances in Table 3) and gives indicative qualitative information.
3.3. Determination of CuII, PbII, CdII and ZnII by DPASV
The determination of these four metals has also been carried out at lower concentrations, so a more sensitive electroanalytical technique involving an electrodeposition step has been used. However, when several metals are deposited simultaneously on a mercury electrode, intermetallic compounds are usually formed between the amalgamated metals (intermetallic compounds of Au–Zn, Cu–Cd, Cu–Zn, Cu–Sn, Co–Zn, Ni–Zn, etc. have been reported) [30]. This phenomenon can seriously interfere with the analytical response, causing severe depression or shift of the signal, which can generate large errors in determinations made by stripping analysis [31].
The same experimental design used in the previous polarographic determination has been followed for this simultaneous analysis (so there are 28 samples in the training set and 8 in the test set, the current being recorded at 99 potentials). All the recorded voltammograms are shown in Fig. 3, where some interferences can be foreseen at the higher concentration levels. As in the polarographic analysis, a PLS model has been built independently for each metal, giving a mean absolute relative error of 3.7%. These models were able to account for the interferences, which could be caused by the formation of intermetallic compounds.
The GA has also been applied to this data set and, contrary to the previous case, the selected variables are related to the voltammetric peaks of the metals that form intermetallic compounds, as can be seen in Fig. 3(a) and (b) for CuII (fourth peak) and PbII (third peak), respectively. Thus, the variable selection made by the GA confirms the existence of a new interference. The potentials involved are known, so this interference can be related to the formation of intermetallic compounds.
With reference to the prediction ability of the PLS models built with the variables selected by the GA, the results of the two non-parametric tests in Table 2 (Case C) do not agree. The Signs test concludes that the determinations made without and with variable selection are statistically equal, whereas the Wilcoxon signed ranks test concludes the contrary. Since the actual significance probability of the latter test is very close to the critical value of 0.05, it could be concluded that both procedures give statistically equal results.
3.4. Determination of TlI and PbII by DPASV
The simultaneous determination of several metals by voltammetric techniques can also be made difficult by overlapping signals, as in the case of TlI and PbII when both metals are determined simultaneously by stripping voltammetry. Several experimental approaches [32] (searching for more specific signals by using an adequate supporting electrolyte, a separation technique, etc.) and statistical approaches (PLS [33] and continuum [34] regression, etc.) have been proposed to solve this problem. In this paper, the GA has been used to select variables in the simultaneous determination of TlI and PbII when, together with the overlapping signals of both metals, another signal corresponding to the experimental blank appears in the same potential window [24].
Fig. 2. Polarograms recorded in the determination of CuII, PbII, CdII and ZnII by DPP (concentration ranges: 1.92–9.47 μmol dm⁻³ for CuII, PbII and ZnII, and 3.02–14.85 μmol dm⁻³ for CdII). The polarographic peaks correspond, from left to right, to CuII, PbII, CdII and ZnII, respectively. The vertical lines indicate the variables selected by the GA for (a) CuII and (b) PbII.
Table 3
True (ctrue) and calculated (ccalc ± precision) concentrations for the test set samples, and explained and cross-validated (in brackets) variances of the PLS models built without and with variable selection by means of the genetic algorithm
Case | Metal | ctrue (μmol dm⁻³) | Without GA: variance (%) | Without GA: ccalc (μmol dm⁻³) | With GA: variance (%) | With GA: ccalc (μmol dm⁻³)
B | CuII | 3.831 | 99.661 | 3.854 ± 0.158 | 99.855 | 4.003 ± 0.092
B | CuII | 7.605 | (99.547) | 7.483 ± 0.095 | (99.823) | 7.527 ± 0.043
B | CuII | 5.747 | | 5.669 ± 0.312 | | 5.662 ± 0.039
B | CuII | 5.703 | | 5.691 ± 0.192 | | 5.703 ± 0.090
B | CuII | 5.747 | | 5.684 ± 0.059 | | 5.783 ± 0.041
B | CuII | 5.703 | | 5.667 ± 0.071 | | 5.699 ± 0.025
B | CuII | 5.747 | | 5.488 ± 0.211 | | 5.657 ± 0.096
B | CuII | 5.703 | | 5.737 ± 0.073 | | 5.694 ± 0.047
B | CdII | 9.010 | 99.792 | 9.208 ± 0.115 | 99.85 | 9.208 ± 0.058
B | CdII | 8.942 | (99.727) | 8.907 ± 0.163 | (99.77) | 8.948 ± 0.101
B | CdII | 9.010 | | 9.207 ± 0.194 | | 9.140 ± 0.124
B | CdII | 8.942 | | 9.142 ± 0.138 | | 9.209 ± 0.106
B | CdII | 6.007 | | 5.931 ± 0.060 | | 5.959 ± 0.063
B | CdII | 11.922 | | 11.893 ± 0.104 | | 11.932 ± 0.054
B | CdII | 9.010 | | 9.285 ± 0.141 | | 8.980 ± 0.119
B | CdII | 8.942 | | 9.051 ± 0.083 | | 9.053 ± 0.051
B | PbII | 5.747 | 99.826 | 5.800 ± 0.067 | 99.850 | 5.789 ± 0.059
B | PbII | 5.703 | (99.772) | 5.707 ± 0.092 | (99.702) | 5.711 ± 0.060
B | PbII | 3.831 | | 3.381 ± 0.113 | | 3.707 ± 0.358
B | PbII | 7.605 | | 7.592 ± 0.079 | | 7.542 ± 0.046
B | PbII | 5.747 | | 5.773 ± 0.036 | | 5.776 ± 0.027
B | PbII | 5.703 | | 5.765 ± 0.059 | | 5.765 ± 0.038
B | PbII | 5.747 | | 6.199 ± 0.086 | | 5.981 ± 0.253
B | PbII | 5.703 | | 5.826 ± 0.054 | | 5.769 ± 0.040
B | ZnII | 5.747 | 99.143 | 5.620 ± 0.222 | 99.207 | 5.607 ± 0.209
B | ZnII | 5.703 | (99.105) | 5.613 ± 0.133 | (99.199) | 5.570 ± 0.090
B | ZnII | 5.747 | | 6.489 ± 0.437 | | 6.728 ± 0.555
B | ZnII | 5.703 | | 5.734 ± 0.270 | | 5.918 ± 0.144
B | ZnII | 5.747 | | 5.708 ± 0.083 | | 5.852 ± 0.077
B | ZnII | 5.703 | | 5.859 ± 0.099 | | 5.767 ± 0.065
B | ZnII | 3.831 | | 3.631 ± 0.296 | | 3.789 ± 0.056
B | ZnII | 7.605 | | 7.760 ± 0.102 | | 7.859 ± 0.055
C | CuII | 0.136 | 99.210 | 0.138 ± 0.011 | 97.827 | 0.131 ± 0.006
C | CuII | 0.269 | (95.263) | 0.277 ± 0.013 | (96.633) | 0.277 ± 0.004
C | CuII | 0.204 | | 0.195 ± 0.012 | | 0.193 ± 0.007
C | CuII | 0.202 | | 0.205 ± 0.012 | | 0.202 ± 0.006
C | CuII | 0.204 | | 0.217 ± 0.015 | | 0.232 ± 0.006
C | CuII | 0.202 | | 0.225 ± 0.010 | | 0.217 ± 0.005
C | CuII | 0.204 | | 0.201 ± 0.015 | | 0.191 ± 0.006
C | CuII | 0.202 | | 0.208 ± 0.011 | | 0.208 ± 0.006
C | CdII | 0.052 | 97.898 | 0.053 ± 0.002 | 96.485 | 0.054 ± 0.001
C | CdII | 0.051 | (94.476) | 0.052 ± 0.003 | (95.614) | 0.052 ± 0.002
C | CdII | 0.052 | | 0.048 ± 0.003 | | 0.047 ± 0.002
C | CdII | 0.051 | | 0.050 ± 0.004 | | 0.047 ± 0.002
C | CdII | 0.035 | | 0.032 ± 0.004 | | 0.035 ± 0.001
C | CdII | 0.069 | | 0.074 ± 0.003 | | 0.075 ± 0.001
C | CdII | 0.052 | | 0.050 ± 0.004 | | 0.048 ± 0.002
C | CdII | 0.051 | | 0.051 ± 0.003 | | 0.049 ± 0.003
C | PbII | 0.075 | 98.903 | 0.080 ± 0.003 | 97.650 | 0.081 ± 0.003
C | PbII | 0.074 | (95.945) | 0.077 ± 0.004 | (95.717) | 0.079 ± 0.003
The original voltammograms (without subtraction of the blank signal) have been used in the analysis. The analysis followed a complete design (5 levels for each metal and all possible combinations of levels [24]), so the training set is formed by 25 samples and the test set by 4, the current being recorded at 47 potentials. Fig. 4 shows the recorded voltammograms. The dotted line corresponds to the signal of the experimental blank and the solid lines to the voltammograms of the calibration samples.
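The complete design described above is the Cartesian product of the concentration levels of each metal, and so is the 6 × 6 design used in Section 3.5. A minimal sketch of it follows. The level values are hypothetical placeholder indices, since the actual concentrations of each level are given in [24] and not reproduced here; `complete_design` is likewise a hypothetical name.

```python
from itertools import product

def complete_design(*levels):
    """Every combination of the concentration levels of each analyte (full factorial)."""
    return list(product(*levels))

# Hypothetical level indices, not the actual concentrations used in the paper.
tl_pb = complete_design(range(1, 6), range(1, 6))  # 5 levels each for TlI and PbII
cu_fe = complete_design(range(1, 7), range(1, 7))  # 6 levels each for CuII and FeIII
print(len(tl_pb), len(cu_fe))  # 25 36
```

These design sizes match the 25 and 36 training samples stated in the text.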
As in the previous examples, the vertical lines in Fig. 4(a) and (b) indicate the variables selected by the GA for TlI and PbII, respectively. In the determination of TlI, the GA has clearly avoided the blank peak potentials and has only considered potentials in the tails of the thallium peak, where the blank signal does not interfere. In the same way, no potential related to the blank signal has been taken into account in the variable selection made for the determination of PbII. This behaviour of the GA reveals the existence of a 'foreign' signal which, if a complex matrix were analysed, could not be detected in a simple way. Therefore, this GA-based variable selection procedure could be a useful tool for detecting and analysing unexpected anomalies in voltammetric and polarographic determinations.
Table 3 (Continued)
Case | Metal | ctrue (μmol dm⁻³) | Without GA: variance (%) | Without GA: ccalc (μmol dm⁻³) | With GA: variance (%) | With GA: ccalc (μmol dm⁻³)
C | PbII | 0.050 | | 0.046 ± 0.004 | | 0.048 ± 0.004
C | PbII | 0.099 | | 0.101 ± 0.004 | | 0.100 ± 0.006
C | PbII | 0.075 | | 0.078 ± 0.004 | | 0.080 ± 0.003
C | PbII | 0.074 | | 0.080 ± 0.003 | | 0.080 ± 0.003
C | PbII | 0.075 | | 0.075 ± 0.005 | | 0.073 ± 0.004
C | PbII | 0.074 | | 0.077 ± 0.004 | | 0.079 ± 0.002
C | ZnII | 0.474 | 98.590 | 0.482 ± 0.018 | 98.591 | 0.474 ± 0.009
C | ZnII | 0.468 | (97.486) | 0.471 ± 0.020 | (97.727) | 0.459 ± 0.009
C | ZnII | 0.474 | | 0.471 ± 0.019 | | 0.474 ± 0.008
C | ZnII | 0.468 | | 0.460 ± 0.021 | | 0.432 ± 0.021
C | ZnII | 0.474 | | 0.517 ± 0.028 | | 0.507 ± 0.011
C | ZnII | 0.468 | | 0.494 ± 0.017 | | 0.480 ± 0.013
C | ZnII | 0.317 | | 0.317 ± 0.024 | | 0.307 ± 0.011
C | ZnII | 0.625 | | 0.626 ± 0.022 | | 0.638 ± 0.015
D | TlI | 0.326 | 99.710 | 0.326 ± 0.013 | 98.244 | 0.360 ± 0.040
D | TlI | 1.283 | (99.671) | 1.340 ± 0.020 | (98.181) | 1.390 ± 0.024
D | TlI | 0.802 | | 0.807 ± 0.115 | | 0.555 ± 0.049
D | TlI | 1.427 | | 1.410 ± 0.027 | | 1.320 ± 0.038
D | PbII | 0.343 | 99.577 | 0.351 ± 0.011 | 99.603 | 0.353 ± 0.013
D | PbII | 0.563 | (99.570) | 0.578 ± 0.017 | (99.608) | 0.581 ± 0.007
D | PbII | 0.900 | | 0.949 ± 0.077 | | 0.944 ± 0.076
D | PbII | 1.112 | | 1.100 ± 0.031 | | 1.090 ± 0.019
E | CuII | 0.308 | 99.887 | 0.312 ± 0.006 | 99.896 | 0.314 ± 0.004
E | CuII | 0.500 | (99.893) | 0.500 ± 0.007 | (99.900) | 0.500 ± 0.005
E | CuII | 0.800 | | 0.787 ± 0.009 | | 0.786 ± 0.007
E | CuII | 0.982 | | 0.939 ± 0.030 | | 0.949 ± 0.011
E | FeIII | 20.537 | 99.997 | 20.285 ± 0.161 | 99.995 | 20.114 ± 0.195
E | FeIII | 80.051 | (99.993) | 79.997 ± 0.207 | (99.994) | 80.072 ± 0.084
E | FeIII | 50.032 | | 49.923 ± 0.195 | | 50.000 ± 0.169
E | FeIII | 88.364 | | 88.712 ± 0.484 | | 88.557 ± 0.203
Fig. 3. Voltammograms recorded in the determination of CuII, PbII, CdII and ZnII by DPASV (concentration ranges: 68.56–334.42 nmol dm⁻³ for CuII, 25.26–123.20 nmol dm⁻³ for PbII, 17.48–85.24 nmol dm⁻³ for CdII and 159.25–776.83 nmol dm⁻³ for ZnII). The voltammetric peaks correspond, from left to right, to ZnII, CdII, PbII and CuII, respectively. The vertical lines indicate the variables selected by the GA for (a) CuII and (b) PbII.
Fig. 4. Voltammograms recorded in the determination of TlI and PbII by DPASV (concentration ranges: 0.32–1.61 μmol dm⁻³ for TlI and 0.23–1.13 μmol dm⁻³ for PbII). The dotted line corresponds to the voltammogram of the blank. The voltammetric peaks correspond, from left to right, to TlI and PbII, respectively. The vertical lines indicate the variables selected by the GA for (a) TlI and (b) PbII.
The predictions made by the PLS models after variable selection have been compared, by means of the above non-parametric tests, with those of the PLS models built without variable selection. The corresponding results are shown in Table 2, Case D. As in the previous case, the two tests do not agree, although the actual significance probability of the Wilcoxon signed ranks test is again very close to the critical value, as is that of the Signs test. In this case, the use of the GA does not improve the numerical results, but it provides significant qualitative information.
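The mean absolute relative error is the figure of merit quoted in Sections 3.1–3.3. The paper does not spell out its formula, so this sketch assumes the usual definition, 100/n Σ|ccalc − ctrue|/ctrue. As an illustration of the remark that the GA does not improve the numerical results here, it is evaluated on the TlI test samples of Table 3, Case D. The two percentages are computed from the tabulated test-set values and are not figures reported by the authors; `mean_abs_rel_error` is a hypothetical name.

```python
def mean_abs_rel_error(c_true, c_calc):
    """Mean absolute relative error in % (assumed definition)."""
    return 100.0 * sum(abs(cc - ct) / ct for ct, cc in zip(c_true, c_calc)) / len(c_true)

# TlI test samples, Table 3, Case D (central values, precision omitted).
c_true = [0.326, 1.283, 0.802, 1.427]
without_ga = [0.326, 1.340, 0.807, 1.410]
with_ga = [0.360, 1.390, 0.555, 1.320]
print(round(mean_abs_rel_error(c_true, without_ga), 2))  # 1.56
print(round(mean_abs_rel_error(c_true, with_ga), 2))     # 14.27
```

Most of the difference comes from the third sample (0.802 vs 0.555 with GA). With only four test samples, a single poorly predicted point dominates this statistic.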
3.5. Determination of CuII and FeIII by DPASV
Another common interference in sample analysis is the matrix effect. In voltammetric determinations this can be due to electroactive species in the sample solution that produce a response at the same potential as the analyte of interest. High FeIII concentrations have this effect in the stripping voltammetric determination of CuII, because the voltammetric peaks of both metals are close together in most supporting electrolytes and give a single wide peak resulting from two highly overlapping peaks. Most of the univariate methods proposed in the literature to avoid the interference of iron in the determination of copper involve extraction [35] or selective complexation [36], medium exchange procedures [37], subtractive approaches [38], etc. These methods do not always achieve the intended goal, since the separate determination of each metal is in some cases accompanied by relatively high errors.
PLS regression allows both metals, copper and the interferent, to be determined simultaneously through an adequate experimental design, with the same experimental effort that would be needed to determine copper alone in the presence of a high level of iron. A complete design with 6 levels for each metal was therefore used, i.e. 36 training set samples, together with 4 test set samples, the current being recorded at 35 potentials. All the voltammograms are shown together in Fig. 5, where the strong overlap of the two peaks is obvious.
Next, the GA was applied to the training data set to
select the most informative variables for the PLS models;
the variables selected for Cu(II) and Fe(III) are shown in
Fig. 5(a) and (b), respectively. Fig. 5(a) shows that, in
the determination of copper, the GA has selected variables
right in the zone of the copper peak potential, as
expected, and others in the right tail of the peak. In the
case of iron, however, the GA clearly avoids the peak
potential zone (only potentials in the tails of the peak
have been selected), probably because the iron peak shifts
as the concentration of this metal increases. Again, the
GA avoids the zones of conflict, which provides a
qualitative way of detecting this kind of phenomenon.
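GA-based variable selection for PLS, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (reference [12] is a Matlab GA toolbox, and the GA settings used in the study are not restated in this section). Each chromosome is a boolean mask over the recorded potentials. Its fitness is the cross-validated RMSE (RMSECV) of a PLS1 model fitted by NIPALS on the selected columns. Tournament selection, single-point crossover, bit-flip mutation and elitism drive the search. The function names, the fixed 6-fold split, the 3 PLS components and all GA parameters are hypothetical choices made for illustration.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """PLS1 via NIPALS; returns (coef, x_mean, y_mean) so that
    y_hat = (X - x_mean) @ coef + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # no covariance left to model
            break
        w = w / norm
        t = Xr @ w                        # scores
        tt = t @ t
        p = Xr.T @ t / tt                 # X loadings
        q = (yr @ t) / tt                 # y loading
        Xr = Xr - np.outer(t, p)          # deflate X and y
        yr = yr - q * t
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q), x_mean, y_mean  # B = W (P'W)^-1 q


def rmsecv(X, y, mask, n_comp=3, n_folds=6):
    """Cross-validated RMSE of a PLS1 model on the columns selected by mask.
    ASSUMPTION: fixed, interleaved folds so that the fitness is deterministic."""
    if not mask.any():
        return np.inf
    Xs = X[:, mask]
    k = min(n_comp, Xs.shape[1])
    folds = np.arange(len(y)) % n_folds
    sq_err = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        coef, xm, ym = pls1_coef(Xs[train], y[train], k)
        sq_err += np.sum((y[test] - ((Xs[test] - xm) @ coef + ym)) ** 2)
    return np.sqrt(sq_err / len(y))


def ga_select(X, y, pop_size=20, n_gen=30, p_init=0.3, p_mut=0.02, seed=0):
    """Binary GA minimising RMSECV; returns (best_mask, best_rmsecv).
    All settings are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < p_init  # random boolean masks
    pop[0] = True                                 # include the all-variables model
    fit = np.array([rmsecv(X, y, m) for m in pop])

    def tournament():
        # Binary tournament: the mask with the lower RMSECV wins.
        i, j = rng.choice(pop_size, size=2, replace=False)
        return i if fit[i] <= fit[j] else j

    for _ in range(n_gen):
        children = [pop[np.argmin(fit)].copy()]   # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            cut = rng.integers(1, n_var)          # single-point crossover
            child = np.concatenate([pop[a][:cut], pop[b][cut:]])
            child ^= rng.random(n_var) < p_mut    # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        fit = np.array([rmsecv(X, y, m) for m in pop])
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

Because the initial population contains the all-variables mask and the best chromosome is always carried over, the returned RMSECV is never worse than that of the model with all variables. Since RMSECV is also the quantity being optimised, it is an optimistic estimate, which is why an independent test set is still needed to judge prediction ability.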
Regarding the prediction ability of the PLS models built
after the variable selection procedure, the non-parametric
tests applied show no statistically significant differences
between these predictions and those made by the PLS models
built without variable selection (Table 2, Case E). Thus,
the PLS models built with the selected variables have the
same prediction ability as those built with all the
predictor variables, although they contain considerably
fewer variables.
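The comparison of predictions with and without variable selection relies on non-parametric tests, the Sign test among them. This section does not spell out which paired quantities enter the tests, so the following is only a minimal sketch, assuming the paired values are the absolute prediction errors of the two models for the same samples. It computes an exact two-sided sign test from the binomial distribution with p = 0.5; the function name `sign_test` is hypothetical.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Two-sided exact sign test on paired absolute prediction errors.

    Pairs with equal absolute errors (ties) are discarded, as is usual
    for the sign test. Returns (n_a_better, n_untied, p_value).
    """
    diffs = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    n_better = sum(d < 0 for d in diffs)  # model A has the smaller error
    n = sum(d != 0 for d in diffs)        # untied pairs only
    if n == 0:
        return 0, 0, 1.0
    k = min(n_better, n - n_better)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_better, n, min(1.0, 2 * tail)


# Model A better on all 4 untied pairs: p = 2 * (1/16) = 0.125.
print(sign_test([0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]))  # (4, 4, 0.125)
```

With very few test samples the exact test has little power, so a non-significant result indicates comparable, not necessarily identical, prediction ability.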
3.6. Precision for the test set samples
The precision for the test set samples was calculated
with and without variable selection in order to compare
both procedures. The confidence intervals were evaluated
with an empirical formula suggested in [39,40]. Table 3
shows the results: the calculated concentration of each
test set sample is given with its precision, together with
the explained and cross-validated variances of the PLS
models built with and without prior variable selection for
each metal. In general, the precision obtained for the test
samples with the PLS models built from the variables
selected by the GA is better (smaller) than that obtained
when all the predictor variables were used (Table 3). These
results confirm that the GA selects the variables most
closely related to the response and avoids those that could
disturb the predictor-response relationship, so the
predicted values are more precise.
A. Herrero, M. Cruz Ortiz / Analytica Chimica Acta 378 (1999) 245–259 257
Fig. 5. Voltammograms recorded in the determination of Cu(II) and Fe(III) by DPASV (concentration ranges: 0–1.01 mmol dm⁻³ for Cu(II) and 0–101.03 mmol dm⁻³ for Fe(III)). The strongly overlapping voltammetric peaks correspond, from left to right, to Cu(II) and Fe(III), respectively. The vertical lines indicate the variables selected by the GA for (a) Cu(II) and (b) Fe(III).
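Table 3 also reports the explained and cross-validated variances of each PLS model. The exact definitions used by the software [39] are not restated in this section; a common convention, assumed here, expresses both as percentages of the total y-variance of the training set. The fitted residuals give the explained variance and the cross-validated residuals (PRESS) give the cross-validated one. The sketch below takes the predictions as inputs so that it works with any PLS implementation; the function name is hypothetical.

```python
import numpy as np


def explained_variances(y, y_fit, y_cv):
    """Percent of y-variance explained in calibration and in cross-validation.

    y     : reference concentrations of the training samples
    y_fit : values fitted by the model built on all training samples
    y_cv  : cross-validated predictions (each sample predicted by a model
            built without it)
    ASSUMPTION: 100 * (1 - SS_residual / SS_total) for both quantities.
    """
    y, y_fit, y_cv = (np.asarray(v, dtype=float) for v in (y, y_fit, y_cv))
    ss_tot = np.sum((y - y.mean()) ** 2)
    explained = 100.0 * (1.0 - np.sum((y - y_fit) ** 2) / ss_tot)
    cross_validated = 100.0 * (1.0 - np.sum((y - y_cv) ** 2) / ss_tot)
    return explained, cross_validated


expl, cv = explained_variances([1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 5])
```

Here the fit is perfect (explained variance 100%), while the single cross-validated error of 1 against a total sum of squares of 5 gives a cross-validated variance of about 80%. A large gap between the two values is the usual sign of overfitting.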
References
[1] N.R. Draper, H. Smith, Applied Regression Analysis, 2nd ed.,
Wiley, New York, 1981.
[2] P.J. Brown, Measurement, Regression and Calibration,
Clarendon Press, Oxford, 1993.
[3] J.H. Kalivas, P.M. Lang, Mathematical Analysis of Spectral
Orthogonality, Marcel Dekker, New York, 1994.
[4] A.J. Miller, Subset Selection in Regression, Chapman and
Hall, London, 1990.
[5] A. Garrido Frenich, D. Jouan-Rimbaud, D.L. Massart, S.
Kuttatharmmakul, M. Martínez Galera, J.L. Martínez Vidal,
Analyst 120 (1995) 2787.
[6] R. Leardi, R. Boggia, M. Terrile, J. Chemom. 6 (1992) 267.
[7] R. Leardi, J. Chemom. 8 (1994) 65.
[8] H. Kubinyi, J. Chemom. 10 (1996) 119.
[9] D. Broadhurst, R. Goodacre, A. Jones, J.J. Rowland, D.B.
Kell, Anal. Chim. Acta 348 (1997) 71.
[10] C.B. Lucasius, G. Kateman, Chemom. Intell. Lab. Syst. 19
(1993) 1.
[11] C.B. Lucasius, G. Kateman, Chemom. Intell. Lab. Syst. 25
(1994) 99.
[12] C.R. Houck, J.A. Joines, M.G. Kay, A Genetic Algorithm for
Function Optimization: A Matlab Implementation, NCSU-IE
TR 95-09, 1995 (anonymous ftp from: ftp://ftp.eos.ncsu.edu/
pub/simul/GAOT).
[13] D.B. Hibbert, Chemom. Intell. Lab. Syst. 19 (1993) 277.
[14] Z. Michalewicz, Genetic Algorithms + Data Structures =
Evolution Programs, 3rd ed., Springer, Berlin, 1996.
[15] L. Davis, The Handbook of Genetic Algorithms, Van
Nostrand Reinhold, New York, 1991.
[16] J. Holland, Adaptation in Natural and Artificial Systems, The
University of Michigan Press, Ann Arbor, MI, 1975.
[17] D.E. Goldberg, Genetic Algorithms in Search, Optimization,
and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[18] D. Jouan-Rimbaud, D.L. Massart, R. Leardi, Anal. Chem. 67
(1995) 4295.
[19] M.J. Arcos, M.C. Ortiz, B. Villahoz, L.A. Sarabia, Anal.
Chim. Acta 339 (1997) 63.
[20] M.J. Arcos, C. Alonso, M.C. Ortiz, Electrochimica Acta 43
(1998) 479.
[21] M.C. Ortiz, A. Herrero, M.S. Sánchez, L.A. Sarabia, M.
Íñiguez, Analyst 120 (1995) 2793.
[22] A. Herrero, M.C. Ortiz, Anal. Chim. Acta 348 (1997) 51.
[23] A. Herrero, M.C. Ortiz, J. Electroanal. Chem. 432 (1997) 223.
[24] A. Herrero, M.C. Ortiz, Talanta 46 (1998) 129.
[25] M. Forina, R. Leardi, C. Armanino, S. Lanteri, PARVUS: An
Extendable Package of Programs for Data Exploration,
Classification and Correlation, Ver. 1.1, Elsevier Scientific
Software, 1990.
[26] STATGRAPHICS, Ver. 5, STSC, Rockville, MD, 1991.
[27] Matlab, The MathWorks, Natick, MA, 1992.
[28] M. Hollander, D.A. Wolfe, Nonparametric Statistical Methods,
Wiley, Chichester, 1973.
[29] G.E.P. Box, W.G. Hunter, J.S. Hunter, Estadística para
investigadores: introducción al diseño de experimentos,
análisis de datos y construcción de modelos, Reverté,
Barcelona, 1989.
[30] F. Vydra, K. Štulík, E. Juláková, Electrochemical Stripping
Analysis, Ellis Horwood, Chichester, 1976.
[31] T.R. Copeland, R.A. Osteryoung, R.K. Skogerboe, Anal.
Chem. 46 (1974) 2093.
[32] Z. Lukaszewski, W. Zembrzuski, A. Piela, Anal. Chim. Acta
318 (1996) 159.
[33] A. Henrion, R. Henrion, G. Henrion, F. Scholz, Electroanalysis
2 (1990) 309.
[34] M.C. Ortiz, M.J. Arcos, L.A. Sarabia, Chemom. Intell. Lab.
Syst. 34 (1996) 245.
[35] V. Meenakumari, Analyst 120 (1995) 2849.
[36] M.E. Vázquez Díaz, J.C. Jímenez Sánchez, M. Callejón
Mochón, A. Guiraum Pérez, Analyst 119 (1994) 1571.
[37] S. Gottesfeld, M. Ariel, J. Electroanal. Chem. 9 (1965) 112.
[38] J.E. Bonelli, H.E. Taylor, R.K. Skogerboe, Anal. Chim. Acta
118 (1980) 243.
[39] Unscrambler II (v 4.0), User's Guide, Camo A/S, Norway,
1992.
[40] S. De Vries, C.J.F. Ter Braak, Chemom. Intell. Lab. Syst. 30
(1995) 239.