A liquid handling system for the automated acquisition of data for training, validating and testing calibration models
Edward Richards, Conrad Bessant*, Selwayan Saini
Cranfield Centre for Analytical Science, Institute of BioScience & Technology, Cranfield University, Silsoe, Bedfordshire MK45 4DT, UK
Received 17 May 2002; received in revised form 9 September 2002; accepted 24 September 2002
Abstract
Multivariate calibration of a single sensor for multiple mixed analytes is useful, but generating, validating and testing the calibration models requires large data sets, which can be time consuming to collect, particularly when the number of analytes is large. In this paper, a solution to this problem is presented in the form of an automated liquid handling system capable of producing mixtures and triggering an external analytical instrument to analyse them. As a case study, the electrochemical technique of dual pulse staircase voltammetry (DPSV) was used to collect data for three analytes (glucose, fructose and ethanol) mixed in varying concentrations. Artificial neural networks (ANNs), optimised using genetic algorithms, were used to create the best possible multivariate calibration model. The liquid handling system performed the 1668 experiments used in the study in approximately 60 h, compared with the more than 2 weeks that would be required to perform the experiments manually. For all the analytes concerned, the best calibration models produced from these data bettered those produced from previously collected manual data [1,2].
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Liquid handling; Automation; Multivariate calibration; Neural networks; Multi-analyte; Electrochemistry
1. Introduction
In a previous paper [2], we showed how an elitist genetic algorithm could be used to optimise the parameters of a neural network to train the best possible multivariate calibration models for voltammetric data acquired from mixtures of ethanol, fructose and glucose. Those data, comprising 1064 experiments, were produced by hand in a laborious pipetting procedure that was susceptible to variability and human error; the 1064 experiments took a suitably trained person 2 weeks to perform. To reduce both the time taken and the potential for human-induced error, a computer-controlled automated liquid handling system has been designed and built. The system is capable of performing four-factor, nine-level fully factorial experimental designs, producing the individual samples in a random order to negate systematic errors caused by temperature variation or electrode drift.
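The randomised run order described above can be illustrated with a short Python sketch (the authors' control program was written in C++ Builder and is not reproduced here). It enumerates every combination of pump-stroke levels for a fully factorial design and shuffles the list. The function name `randomised_full_factorial` and the use of a seeded `random.Random` generator are illustrative assumptions.

```python
import itertools
import random


def randomised_full_factorial(n_factors, n_levels, seed=None):
    """Return every level combination of a full factorial design, in shuffled order.

    Each run is a tuple of level indices (0 .. n_levels - 1), one per factor,
    e.g. the number of pump strokes for each analyte. Shuffling the run order
    spreads slow drifts (temperature, electrode fouling) across the design
    instead of confounding them with the level of any one factor.
    """
    runs = list(itertools.product(range(n_levels), repeat=n_factors))
    rng = random.Random(seed)  # dedicated generator, so a given order is reproducible
    rng.shuffle(runs)
    return runs


design = randomised_full_factorial(n_factors=4, n_levels=9, seed=1)
print(len(design))  # 6561 runs for four factors at nine levels
```

Seeding the generator makes a particular run order reproducible, which helps when a data set has to be traced back to the order in which it was collected.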
In the case study presented here, the system was used to produce aqueous mixtures of ethanol, fructose and glucose, which it then passed through a flow cell. After generating and mixing each sample, the system triggered a potentiostat to perform an electrochemical analysis using dual pulse staircase voltammetry (DPSV) [3]. The system developed here differs from a standard flow injection system in its ability to automatically mix different concentration permutations of multiple analytes, buffer and carrier in discrete sample volumes, which are then passed through the detector. No concentration gradient is detected, because detection is performed under stopped-flow conditions after half of the sample has passed through the detection cell.
2. Materials and methods
2.1. The automated liquid handling system
The automated liquid handling system (Fig. 1) uses a series of calibrated diaphragm pumps (Bio-Chem Valve Inc., USA) to create a 1-ml sample containing up to four different analytes, buffer and water. Each of the analyte and buffer pumps (Figs. 1 and 2) has a set volume of 20 µl; the mix and cell pumps each have a set volume of 50 µl. Varying the number of strokes of each analyte pump from 0 to 8 per sample allows up to 9⁴ = 6561 different mixture permutations of the four analytes to be produced. The pump switch interface supplies a buffered power supply to the pumps via transistors and also powers the mixing motor using a bi-stable latch. The pump switch interface was controlled directly via the PC parallel port by a program written in C++ Builder (Borland, USA), running under Windows 98 (Microsoft, USA) on an Athlon 1 GHz processor (Advanced Micro Devices, USA).
Sensors and Actuators B 88 (2003) 149–154
* Corresponding author. Tel.: +1-525-863-358; fax: +1-525-863-540.
E-mail address: [email protected] (C. Bessant).
0925-4005/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0925-4005(02)00318-0
The RO water tanks were two 10 l glass vessels linked via a siphon tube. Their combined surface area of 0.067 cm² meant that there was very little change in pressure in the water line over a large number of experiments. The surface of the RO water in the tanks was maintained approximately 3 cm above the inlet of the mixing pump, so as to provide a small positive pressure at the pump inlet for the analyte and buffer pumps to work against without changing their performance characteristics. This arrangement ensures that the water line is always flooded and that the injected sample is held close to the mixing pump rather than flowing out into the RO water tanks.
2.1.1. Sample production
To produce a sample, the buffer pump first injects 10 aliquots (0.2 ml) of concentrated buffer into the water pipe (see Fig. 2), displacing the water back towards the water tank. Aliquots of the concentrated analytes are then injected into the water line in the same way, according to the mixture to be produced; they displace the water, and now the buffer, back towards the water tank (see Fig. 2). The maximum volume that could be displaced back towards the tank corresponds to the maximum concentrations of all the analytes (8 aliquots each) plus the buffer (0.84 ml). The water pipe has a 1.5 mm i.d., so a 1-ml sample occupies 56.6 cm of the pipe; to ensure that the sample cannot contaminate the water tank, the water pipe is 1 m long. Having loaded the water line with buffer and analytes, the mix pump pumps a 1-ml volume (20 aliquots) containing the buffer, analytes and water into the mixing cell. The bi-stable latch starts the motor-driven stirrer, which mixes the sample as it enters the mixing cell. When the 1-ml volume has been pumped into the mixing cell and mixed for 3 s, the stirrer is stopped and the cell pump starts to draw half of the homogenised sample through the thin-layer flow cell. After half of the sample has passed through the flow cell, pumping stops, the potentiostat is triggered, and a voltammetric measurement is taken and stored for the sample. The rest of the sample is then drawn through the flow cell. Two wash cycles are then performed in succession; each consists of the sample production process described above, but without the injection of analytes or the voltammetric measurement. Washing is performed to prevent carryover from one measurement to the next.
Fig. 1. Photograph of the automated liquid handling system.
Fig. 2. Schematic of the automated liquid handling system, showing the major components.
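The volumes in the sample production procedure can be checked with a little arithmetic. The Python sketch below assumes nominal (error-free) pump volumes. It recomputes the worst-case volume displaced towards the tank and the length of 1.5 mm i.d. pipe that holds 1 ml, and lists the order of operations for one measurement plus two washes. The step labels (`buffer_pump`, `trigger_potentiostat`, etc.) are hypothetical, not commands from the authors' software. Splitting the 1 ml into 10 + 10 cell-pump strokes is our reading of "half the sample".

```python
import math

ALIQUOT_UL = 20.0       # set volume of each analyte and buffer pump
MIX_ALIQUOT_UL = 50.0   # set volume of the mix and cell pumps
BUFFER_STROKES = 10     # 10 x 20 ul = 0.2 ml of concentrated buffer
MAX_STROKES = 8         # maximum aliquots per analyte pump
N_ANALYTES = 4          # the system supports up to four analytes
PIPE_ID_CM = 0.15       # 1.5 mm internal diameter water pipe
PIPE_LENGTH_CM = 100.0  # 1 m water pipe


def max_displaced_volume_ul():
    """Worst case pushed back towards the tank: every analyte at 8 strokes plus buffer."""
    return (N_ANALYTES * MAX_STROKES + BUFFER_STROKES) * ALIQUOT_UL


def pipe_length_cm(volume_ul):
    """Length of water pipe occupied by volume_ul (1 ul = 1e-3 cm^3)."""
    area_cm2 = math.pi * (PIPE_ID_CM / 2) ** 2
    return volume_ul * 1e-3 / area_cm2


def cycle(analyte_strokes, measure=True):
    """Hypothetical step list for one pass: load, mix, draw half, (measure), draw rest."""
    half = int(1000.0 / 2 / MIX_ALIQUOT_UL)  # 10 cell-pump strokes = half of 1 ml
    steps = [("buffer_pump", BUFFER_STROKES)]
    steps += [(f"analyte_pump_{i}", n) for i, n in enumerate(analyte_strokes) if n]
    steps += [("stirrer_on", None), ("mix_pump", 20), ("wait_s", 3),
              ("stirrer_off", None), ("cell_pump", half)]
    if measure:
        steps.append(("trigger_potentiostat", None))
    steps.append(("cell_pump", half))
    return steps


def measurement_with_washes(analyte_strokes):
    """One measured sample followed by two washes (no analytes, no measurement)."""
    return cycle(analyte_strokes) + cycle([], measure=False) + cycle([], measure=False)


print(max_displaced_volume_ul())       # 840.0 ul, i.e. 0.84 ml
print(round(pipe_length_cm(1000), 1))  # 56.6 cm of pipe holds 1 ml
```

Since 0.84 ml occupies under 50 cm of pipe, the 1 m pipe leaves a margin of roughly half its length between the displaced sample and the tank.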
2.2. Sample production error
The pump manufacturer's data sheet specifies an aliquot repeatability of ±4% of the set volume for each pump. A simulation of a nine-level, three-factor, fully factorial experimental design (729 experiments) was run in Matlab 6.1, under the assumption that the ±4% repeatability corresponds to 3 S.D. of a normally distributed aliquot population. For each simulated sample, the number of individual 20 µl aliquots that the analyte and buffer pumps would inject into the system manifold (Fig. 2) was determined from the required analyte and buffer concentrations. A normally distributed random volume error with zero mean and standard deviation σ = 0.2666 µl (1.333% of 20 µl) was added to each simulated aliquot, and the aliquot volumes were summed. Having ascertained the simulated volumes injected into the manifold, 20 aliquots of the 50 µl mix pump were simulated (a 1-ml volume in total), each with a normally distributed volume error of σ = 0.666 µl (1.333% of 50 µl). The simulated volumes of analytes and buffer were subtracted from the simulated volume pumped by the mix pump to determine the volume of additional water added to the sample, from which the simulated concentrations of analytes and buffer in the final solution could be calculated. The simulated RMS error for each analyte and for the buffer was then calculated as the square root of the mean of the squared differences between the simulated and required concentrations. The simulated RMS error for each analyte, per sample produced, was ±0.38% of the required maximum analyte concentration; for the buffer it was ±0.1% of the required sample buffer concentration.
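The Monte Carlo procedure described above can be sketched in Python with NumPy (the original simulation was run in Matlab 6.1). This is a minimal reconstruction under stated assumptions. The ±4% repeatability is taken as 3σ for both pump sizes. Each of three analytes receives 0 to 8 strokes in a nine-level full factorial design. All injected analyte is assumed to end up in the 1-ml volume delivered by 20 mix-pump strokes. Concentrations are expressed as fractions of the stock. The buffer is not simulated, and the function name and seed are illustrative.

```python
import numpy as np

ALIQUOT_UL = 20.0   # analyte pump set volume
MIX_UL = 50.0       # mix pump set volume
REL_SD = 0.04 / 3   # +/-4% repeatability taken as 3 standard deviations
MIX_STROKES = 20    # 20 x 50 ul = 1 ml sample


def simulate_rms_error(n_levels=9, n_analytes=3, seed=0):
    """RMS concentration error per analyte, as % of the maximum required concentration."""
    rng = np.random.default_rng(seed)
    levels = np.arange(n_levels)  # 0 .. 8 strokes
    grids = np.meshgrid(*([levels] * n_analytes), indexing="ij")
    strokes = np.stack([g.ravel() for g in grids], axis=1)  # (729, 3) full factorial
    nominal_sample_ul = MIX_STROKES * MIX_UL

    errors = np.empty(strokes.shape, dtype=float)
    for k, row in enumerate(strokes):
        # Volume actually delivered to the mixing cell by 20 noisy mix-pump strokes.
        mix_volume = rng.normal(MIX_UL, REL_SD * MIX_UL, MIX_STROKES).sum()
        for a, s in enumerate(row):
            injected = rng.normal(ALIQUOT_UL, REL_SD * ALIQUOT_UL, s).sum()  # 0.0 if s == 0
            simulated = injected / mix_volume            # fraction of stock concentration
            required = s * ALIQUOT_UL / nominal_sample_ul
            errors[k, a] = simulated - required
    max_required = (n_levels - 1) * ALIQUOT_UL / nominal_sample_ul
    rms = np.sqrt(np.mean(errors ** 2, axis=0))
    return 100 * rms / max_required


print(simulate_rms_error())  # one value per analyte, in % of maximum concentration
```

Both the analyte aliquots and the mix-pump strokes carry independent errors, so the error grows with the number of strokes. Averaging over the nine levels is what brings the RMS figure well below the error of the 8-stroke samples alone. Under these assumptions a run should land close to the reported analyte figure.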
2.3. Materials
The analyte solutions and buffer were prepared in concentrated form so that, when injected into the water pipe, they would be diluted to the concentrations of interest. Ethanol (HPLC grade, 99.97%) (Merck, USA), D-fructose (AnalaR) (BDH, UK), D-glucose (General Purpose Reagent) (BDH, UK) and sodium hydroxide pearls (min. 98%) (Sigma–Aldrich) were used. All reagents were dissolved in RO water. As the volume of each sample is limited to 1 ml, the quantity of each analyte required for all the experiments is small.
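The paper does not list the stock concentrations. The design concentrations in Section 2.5 are, however, all consistent with the maximum concentration being reached at 8 strokes (160 µl) diluted into the 1-ml sample. Under that assumption, the stocks follow from a simple dilution. The Python sketch below derives them; the figures it produces are inferred, not reported, and the helper names are hypothetical.

```python
ALIQUOT_UL = 20.0   # analyte pump set volume
SAMPLE_UL = 1000.0  # final sample volume (20 mix-pump strokes of 50 ul)
MAX_STROKES = 8     # assumed to give the maximum design concentration

# Maximum sample concentrations (mM) of the training design (Section 2.5).
MAX_CONC_MM = {"ethanol": 12.0, "fructose": 0.68, "glucose": 0.72}


def stock_for_max(max_conc_mm, max_strokes=MAX_STROKES):
    """Stock concentration whose max_strokes aliquots, diluted to 1 ml, give max_conc_mm."""
    dilution = max_strokes * ALIQUOT_UL / SAMPLE_UL  # 160 ul in 1000 ul = 0.16
    return max_conc_mm / dilution


def sample_conc(stock_mm, strokes):
    """Nominal sample concentration after injecting `strokes` aliquots of stock."""
    return stock_mm * strokes * ALIQUOT_UL / SAMPLE_UL


stocks = {name: stock_for_max(c) for name, c in MAX_CONC_MM.items()}
for name, stock in stocks.items():
    # One stroke gives the smallest concentration step available for this analyte.
    print(f"{name}: stock {stock:.3g} mM, one stroke -> {sample_conc(stock, 1):.3g} mM")
```

This gives stocks of 75, 4.25 and 4.5 mM. The single-stroke steps (1.5, 0.085 and 0.09 mM) match the lowest concentrations of the additional validation design, which is what supports the 8-stroke assumption.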
2.4. Dual pulse staircase voltammetry
The applied potential waveform is the same as that described by Bessant and Saini [1], except that the durations of the electrode rejuvenation steps for both oxidation (+0.7 V) and reduction (−0.9 V) were set to 1 s. The measurement was taken in a thin-layer flow cell (20 µl) (BAS, USA) at the surface of a 3 mm diameter platinum working electrode (BAS, USA). The counter electrode was identical to, and adjacent to, the working electrode (BAS, USA). The reference electrode was Ag/AgCl (BAS, USA). The measurement was taken under stopped-flow conditions using an Autolab PSTAT10 (Eco Chemie B.V., Netherlands).
2.5. Data sets
The training data sets followed a three-factor, five-level fully factorial design, as described in Richards et al. [2]. The maximum and minimum concentrations were 12 mM and 0 mM for ethanol, 0.68 mM and 0 mM for fructose, and 0.72 mM and 0 mM for glucose. Three repeat sets of training data were collected on either side of the validation set (6 × 125 experiments in total). The validation data set comprised a repeat of the training data design (125 experiments) together with an additional three-factor, four-level fully factorial design (64 experiments) at concentrations between those used for the training data. For this additional part of the validation data set, the maximum and minimum concentrations were 10.5 mM and 1.5 mM for ethanol, 0.595 mM and 0.085 mM for fructose, and 0.63 mM and 0.09 mM for glucose, respectively. Finally, the test set was a three-factor, nine-level design (729 experiments), with concentrations at equal steps between the maximum and minimum concentrations specified for the training data.
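The three designs can be generated and counted in a few lines of Python with NumPy. The sketch assumes that each analyte's concentration is its training maximum multiplied by strokes/8. That mapping is our reading, not a statement from the paper, but it reproduces every maximum and minimum listed above. The function name `factorial_concentrations` is illustrative.

```python
import itertools

import numpy as np

# Maximum training concentrations (mM); the minimum is 0 mM for every analyte.
MAX_CONC_MM = np.array([12.0, 0.68, 0.72])  # ethanol, fructose, glucose
MAX_STROKES = 8


def factorial_concentrations(strokes):
    """Full factorial design over the three analytes for the given stroke levels (mM)."""
    per_stroke = MAX_CONC_MM / MAX_STROKES
    runs = np.array(list(itertools.product(strokes, repeat=3)), dtype=float)
    return runs * per_stroke  # broadcast: one concentration column per analyte


training = factorial_concentrations(range(0, 9, 2))       # 5 levels: 0, 2, 4, 6, 8 strokes
interpolation = factorial_concentrations(range(1, 9, 2))  # 4 levels: 1, 3, 5, 7 strokes
test_set = factorial_concentrations(range(9))             # 9 levels: 0 .. 8 strokes

n_training = 6 * len(training)                     # six repeat training sets
n_validation = len(training) + len(interpolation)  # replicate + interpolation parts
n_total = n_training + n_validation + len(test_set)
print(n_training, n_validation, len(test_set), n_total)  # 750 189 729 1668
```

The total matches the 1668 experiments quoted in the abstract.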
2.6. Data modelling and pre-treatment
Calibration modelling was performed on a PC running the Windows 2000 (Microsoft, USA) operating system with an Athlon 1600XP processor (Advanced Micro Devices, USA) and 1 GB RAM. Calibration models were produced using in-house programs written in Matlab 6.1 (The MathWorks, USA) with the Neural Networks Toolbox 4.0.1 (The MathWorks, USA). Non-linear neural network models were used rather than linear modelling methods such as PCR or PLS because the voltammogram of a mixture of the analytes is not a linear combination of the individual analyte voltammograms. This is due to electrochemical deposition of the analytes onto the working electrode surface at different potentials: analytes deposited earlier in the potential scan partially interfere with the detection of subsequent analytes. In the previous paper [2], two methods for determining the optimal neural network settings were compared. The first was a full factorial parameter search, in which the number of training epochs (in steps of 100), the number of principal components input to the network and the number of hidden neurons were varied to find the network parameters that produced the most accurate calibration model. The second employed a genetic algorithm to search the same three parameters and evolve the best neural network. In this paper, only the genetic algorithm optimisation method was applied, as it had previously proved to be the more efficient of the two. Data pre-treatment (range scaling and PCA) was performed prior to modelling as described in Richards et al. [2]. Although seven principal components were found to explain almost 100% of the training data variance, 22 principal components were retained, so that non-linearities in the data set that the PCA algorithm had relegated to later principal components could be included when training the networks [2]. Network performance was quantified using the RMS distance error, which gives an indication of the network's overall calibration capability. The RMS distance (RMSD) is calculated by combining the RMS error for the replicate training data sets (RMSR) and the RMS error for the interpolation sets (RMSI) as in Eq. (1).
$$\mathrm{RMSD} = \sqrt{\mathrm{RMSR}^2 + \mathrm{RMSI}^2} \qquad (1)$$
This distance equation applies only to the validation and test sets, as these comprised both replicates of the training data sets and interpolation data sets collected at concentrations not used for training.
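Eq. (1) is simply the Euclidean norm of the two RMS errors, so it maps directly onto `math.hypot`. The Python sketch below pairs it with a plain RMS error function; the function names are illustrative. Recomputing RMSD from the rounded RMSR and RMSI values in Tables 1 and 2 reproduces the tabulated RMSD to within rounding (about ±0.0001).

```python
import math


def rms_error(predicted, actual):
    """Root-mean-square difference between predicted and reference concentrations."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def rms_distance(rmsr, rmsi):
    """Eq. (1): RMSD = sqrt(RMSR^2 + RMSI^2)."""
    return math.hypot(rmsr, rmsi)


# RMSR and RMSI for fructose from Table 1 (validation data).
print(round(rms_distance(0.0257, 0.0288), 4))  # 0.0386, as tabulated
```

In use, RMSR would be `rms_error` over the replicate (training-design) part of the validation or test set and RMSI over the interpolation part.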
3. Results and discussion
3.1. The automated liquid handling system
The automated liquid handling system proved to have good repeatability and reliability. Fig. 3 shows six repetitions of blank-subtracted voltammograms at five concentrations of ethanol. The data shown were produced in random order within the six repetitions of the three-factor, five-level fully factorial experimental design that formed the training data.
3.2. Neural network calibration model
The results presented in this paper were obtained with the same genetic algorithm parameter search technique as previously employed [2]. The results for ethanol are used as a case study; the best models for all three analytes, including fructose and glucose, are summarised in Table 1.
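The genetic algorithm search itself is detailed in [2], not here. As a rough illustration only, the Python sketch below applies an elitist genetic algorithm to the three parameters named in Section 2.6: training epochs, number of principal components (up to the 22 retained) and number of hidden neurons. The epoch and neuron bounds, truncation selection, uniform crossover, mutation rate and the `toy_rmsd` fitness function are all hypothetical. In practice the fitness would be the validation RMSD of a network trained with the candidate settings.

```python
import random

# Hypothetical search ranges: the paper names the three parameters, not these bounds.
BOUNDS = {"epochs": (10, 1000), "pcs": (1, 22), "hidden": (2, 12)}


def random_genome(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from either parent.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in BOUNDS}


def mutate(genome, rng, rate=0.2):
    out = dict(genome)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)  # redraw this parameter within its bounds
    return out


def elitist_ga(fitness, pop_size=20, generations=30, n_elite=2, seed=0):
    """Minimise `fitness`; the n_elite best genomes survive each generation unchanged.

    With real ANN training, fitness values should be cached, since `sorted`
    and `min` below re-evaluate every genome.
    """
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        elite = ranked[:n_elite]
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - n_elite)]
        pop = elite + children
    return min(pop, key=fitness)


def toy_rmsd(g):
    # Stand-in for "train an ANN with these settings and return validation RMSD",
    # with an arbitrary optimum.
    return abs(g["epochs"] - 470) / 1000 + abs(g["pcs"] - 21) / 22 + abs(g["hidden"] - 9) / 12


best = elitist_ga(toy_rmsd)
print(best, round(toy_rmsd(best), 4))
```

Elitism guarantees that the best fitness never gets worse from one generation to the next. Because each network starts from random initial weights, however, a real fitness evaluation is noisy, which is one reason why "good networks" can appear across many parameter settings.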
Fig. 3. Blank-subtracted voltammograms for five different concentrations of ethanol. Data are taken from the training data sets.
Table 1
Best neural network calibration models obtained using validation data
| Analyte | RMS replicate error | RMS interpolation error | RMS distance error | Number of epochs | Number of PCs | Number of hidden neurons |
|---|---|---|---|---|---|---|
| Ethanol | 0.0248 | 0.0255 | 0.0355 | 472 | 21 | 9 |
| Fructose | 0.0257 | 0.0288 | 0.0386 | 553 | 21 | 8 |
| Glucose | 0.0231 | 0.0247 | 0.0338 | 357 | 14 | 8 |
3.2.1. The number of training epochs
The relationship between network accuracy and the number of training epochs follows the same trend as previously seen in Richards et al. [2]. The effect of training for a given number of epochs was obscured by the random initial weights. Signs of under-training appeared only for short training periods (i.e. fewer than 50 epochs). Good networks were trained within all epoch ranges, with the best network using 472 epochs.
3.2.2. Number of PCs
In general, the RMS distance error decreased as the number of principal components presented to the neural networks for training increased, suggesting that non-linear information relegated to the higher principal components was salient for modelling [2,4]. The best general network was trained using 21 principal components.
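The pre-treatment that feeds these principal components to the networks can be sketched in Python with NumPy. The paper defers the details of range scaling and PCA to [2], so column-wise scaling to [0, 1] and SVD-based PCA are assumptions here, and the random matrix merely stands in for 125 voltammograms. The point the sketch makes is that the number of retained components (22 here) is a free choice, independent of how much variance the first few components explain.

```python
import numpy as np


def range_scale(X):
    """Scale each variable (column) of X to [0, 1]; returns scaled X plus the min and span used."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # leave constant columns at 0 instead of dividing by 0
    return (X - lo) / span, lo, span


def fit_pca(X, n_components):
    """PCA via SVD of mean-centred X; returns scores, loadings, mean and explained-variance ratios."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return Xc @ loadings.T, loadings, mean, explained


def transform(X_new, lo, span, mean, loadings):
    """Apply the training scaling and PCA projection to validation or test data."""
    return ((X_new - lo) / span - mean) @ loadings.T


# Hypothetical stand-in for 125 voltammograms of 200 current samples each.
X = np.random.default_rng(0).normal(size=(125, 200))
X_scaled, lo, span = range_scale(X)
scores, loadings, mean, explained = fit_pca(X_scaled, n_components=22)
print(scores.shape, explained[:22].sum())
```

Validation and test data should be projected with `transform`, using the training `lo`, `span`, `mean` and `loadings`, rather than by refitting the PCA on each set. Otherwise the networks would see inputs in a different coordinate system from the one they were trained on.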
3.2.3. Number of hidden neurons
Fig. 4a shows that as the number of hidden neurons is increased from 2 to 5 there is a general improvement in the RMS distance error. Increasing the number of hidden neurons further generally brings no improvement, although some individual networks trained with more neurons generalise very well. The best network was trained using nine neurons, suggesting that there were many intricate relationships that required modelling.
Fig. 4b shows how often the genetic algorithm chose each number of hidden neurons when modelling ethanol. It concentrated mainly on 8 and 9 hidden neurons to produce the best networks. There may have been many complex relationships to model, or there may have been local areas of underfitting that were compensated for by the extra degrees of freedom added by having more neurons [5].
Fig. 4. (a) RMS distance errors for ethanol, plotted as a function of the number of hidden neurons. The best (*), the worst (*) and average RMS distance errors (–) are shown. (b) Frequency with which the genetic algorithm chose each number of hidden neurons for ethanol calibration.
Fig. 5. (a) Box and whisker plot showing the results obtained from the best network when presented with the ethanol validation data. (b) Box and whisker plot showing the predicted values for the best network when presented with the test data set. In both panels, the whiskers denote the maximum and minimum values, the boxes denote the upper and lower quartiles and the line in the box denotes the median value.
3.3. Testing the best network
In the previous paper [2], the neural network models were optimised using a validation data set but were not subsequently presented with a test data set. For this piece of work, the best model found using the validation data in the optimisation process underwent a final test using a comprehensive three-factor, nine-level (729 experiments) test set. The predictions for the ethanol test data are displayed as a box and whisker plot in Fig. 5b, alongside the results for the validation data used to obtain the best network in Fig. 5a. At each concentration shown in Fig. 5b there are 81 predictions (the 729 test samples spread over nine ethanol levels). The results for the test data set are summarised in Table 2. The tails of the whiskers denote the maximum and minimum predictions, the top and bottom of the box denote the upper and lower quartiles of the predictions, and the bar across the middle of the box denotes the median value.
4. Conclusions
The automated liquid handling system provides a fast, reliable and accurate method of performing large data collection activities in a random order. It improved accuracy and reproducibility relative to similar experiments done by hand. This was shown by applying a previously reported calibration method [2] to data collected by the automated liquid handling system: in general, the neural network models produced from the automatically collected data were better than those produced from manually collected data [2]. Using the automated liquid handling system also reduces waste and reagent usage, because measurements are taken on small (1 ml) samples. The best model trained also provided good predictions for a large test set incorporating many samples at concentration permutations not previously used in the modelling process.
In future work, we plan to produce data for the calibration of four-analyte systems. We will also interface the automated liquid handling system to alternative analytical methods, such as optical spectrometers and gas sensors, to provide multi-analyte calibration models for different types of analytes.
References
[1] C. Bessant, S. Saini, Simultaneous determination of ethanol, fructose and glucose at an unmodified platinum electrode using artificial neural networks, Anal. Chem. 71 (1999) 2806–2813.
[2] E. Richards, C. Bessant, S. Saini, Optimisation of a neural network model for calibration of voltammetric data, Chemom. Intell. Lab. Syst. 61 (2002) 35–49.
[3] Y. Fung, S. Mo, Application of dual pulse staircase voltammetry for simultaneous determination of glucose and fructose, Electroanalysis 7 (1995) 160–165.
[4] F. Despagne, D.L. Massart, Neural networks in multivariate calibration, Analyst 123 (1998) 157R–178R.
[5] S. Lawrence, C. Giles, Overfitting and neural networks: conjugate gradient and backpropagation, in: Proceedings of the IEEE International Joint Conference on Neural Networks, Como, Italy, 24–27 July 2000, pp. 114–119.
Table 2
Test data set results for the best neural network calibration models obtained using validation data
| Analyte | RMS replicate error | RMS interpolation error | RMS distance error |
|---|---|---|---|
| Ethanol | 0.0459 | 0.0253 | 0.0524 |
| Fructose | 0.0304 | 0.0273 | 0.0409 |
| Glucose | 0.0306 | 0.0386 | 0.0492 |