
Blank Augmentation Protocol for Improving the Robustness of Multivariate Calibrations

KIRSTEN E. KRAMER* and GARY W. SMALL†

Optical Science and Technology Center and Department of Chemistry, University of Iowa, Iowa City, Iowa 52242

An updating procedure is described for improving the robustness of multivariate calibration models based on near-infrared spectroscopy. Employing a single blank sample containing no analyte, repeated spectra are acquired during the instrumental warm-up period. These spectra are used to capture the instrumental profile on the analysis day in a way that can be used to update a previously computed calibration model. By augmenting the original spectra of the calibration samples with a group of spectra collected from the blank sample, an updated model can be computed that incorporates any instrumental drift that has occurred. This protocol is evaluated in the context of an analysis of physiological levels of glucose in a simulated biological matrix designed to mimic blood plasma. Employing data of calibration and prediction samples acquired over approximately six months, procedures are studied for implementing the algorithm in conjunction with calibration models based on partial least squares (PLS) regression. Over the range of 1–20 mM glucose, the final algorithm achieves a standard error of prediction (SEP) of 0.79 mM when the augmented PLS model is applied to data collected 176 days after the collection of the calibration spectra. Without updating, the original PLS model produces a seriously degraded SEP of 13.4 mM.

Index Headings: Multivariate calibration; Near-infrared spectroscopy; NIR spectroscopy; Partial least squares; PLS; Calibration transfer.

INTRODUCTION

A key to the growth of near-infrared (NIR) spectroscopy as a quantitative analysis technique has been the development of multivariate calibration methods that allow the extraction of weak analyte signals from the overlapping signatures of other constituents in the sample matrix. A complicating factor in the use of these methods is the effect on model performance of changes in instrumental response over time. The calibration model may fail to characterize future data accurately if the data contain new features not present in the original set of calibration spectra. This motivates the use of calibration transfer or updating techniques [1,2] in cases in which the collection of a completely new set of calibration spectra is not practical.

Transfer calibration techniques can be based on three general strategies. First, data preprocessing methods such as digital filtering can be used to remove variance from the data that is interfering with the performance of the calibration model. No additional data collection is required and no modification of the calibration model is made [3–6]. These methods are most successful when the spectral changes introduced into the prediction data are relatively small.

A second strategy involves the development of a transformation matrix to cause spectral data with new, unmodeled features to conform to the original calibration model. Direct standardization (DS) and piecewise direct standardization (PDS) [7,8] require the re-measurement of a subset of the calibration samples (termed the transfer set) on the prediction day for use in deriving the transformation matrix. The disadvantages to this approach are the reliance on the fidelity of the transfer set (with the possibility of sample degradation or alterations during re-preparation), the difficulty in selecting an appropriate transfer set that will encompass all future data, and the obvious inconvenience of collecting the subset itself.

The third approach is to use calibration updating procedures to incorporate new information into the original model to reflect changes in the data to be predicted. Often, this is accomplished through the application of a simple slope and bias correction to the predicted concentrations, or in other instances through more elaborate techniques involving higher-order models [8].

Haaland [9] and Saiz-Abajo et al. [10] have developed methods to modify the set of calibration spectra synthetically to incorporate new features or sources of variance. The calibration model is then recomputed with the modified calibration set. This approach has the advantage that no additional data collection is needed, although the principal disadvantage is that the variance sources affecting the calibration must be known and well characterized if a successful synthetic modification to the calibration set is to be made.

Much of the work related to calibration updating has involved the collection on the prediction day of new data specifically for use in updating the calibration model. Analogous to the DS and PDS methods referenced above, a small set of calibration samples can be measured and the acquired data can be used to augment the original set of calibration spectra [7,11,12]. The model is then recomputed with the augmented calibration set, with the added samples possibly weighted to control their influence [11,12]. Zhang has used a similar sample set collected on the prediction day to guide the re-optimization of the calibration model using the original calibration spectra [13,14]. Analogous procedures are used in many process monitoring applications where updating or replacement of calibration points allows the model to change dynamically to track instrumental drift [15–18].

One of the keys to making the calibration updating procedure practical is to minimize the amount of new data that must be acquired. In this regard, the Haaland group has made significant advancements. They have developed a series of related methods designed around the idea of incorporating data collected on the prediction day into the calibration model [19–24]. These studies have demonstrated that repeat measurements of a single calibration standard selected from the center of the concentration space can be successfully used to incorporate information about instrumental drift into the calibration model. In their work, the calibration standard was typically measured at the beginning and end of the day, as well as after every two prediction samples [23].

Received 2 February 2006; accepted 16 February 2007.

* Present address: Naval Research Laboratory, 4555 Overlook Ave. SW, Washington, DC 20375.

† Author to whom correspondence should be sent. E-mail: [email protected].

Volume 61, Number 5, 2007, APPLIED SPECTROSCOPY, 497. 0003-7028/07/6105-0497$2.00/0. © 2007 Society for Applied Spectroscopy.

Given this success with a single calibration standard, the question arises as to whether a blank sample containing no analyte could be similarly used. In the work reported here, this idea is studied through the evaluation of an updating strategy that is implemented with minimal sample preparation and data collection requirements. The basic approach is to use the instrumental warm-up period to acquire repeated spectra of one blank sample. These spectra are then incorporated into the calculation of an updated calibration model. The method is illustrated here in the context of a near-infrared analysis of physiological levels of glucose in a synthetic biological matrix. The updating procedure is used in conjunction with calibration models based on partial least squares (PLS) regression. Issues studied in this work include the composition of the blank sample, procedures for real-time implementation of the method, and the long-term performance of the approach over approximately six months.

EXPERIMENTAL

Solution Preparation. The data set used in this study consisted of buffered aqueous samples containing physiological levels of the analyte, glucose, in a variable matrix of triacetin and bovine serum albumin (BSA). These matrix constituents were models for triglycerides and total protein, respectively, and the overall data set was a mimic for a set of clinical samples such as blood plasma.

A uniform design [25] was used to provide a distribution of concentrations for each species under study that would span a range relevant for diabetic patients. Separate designs were used to produce 70 calibration and 21 prediction samples, with each set spanning the same concentration space (1.4–19.3 mM glucose, 51.0–99.0 g/L BSA, and 1.1–3.9 g/L triacetin). The maximum correlation coefficients between component concentrations were 0.01 and 0.25 for the calibration and prediction sets, respectively. The solvent was a 0.1 M, pH 7.4 phosphate buffer (NaH2PO4, ACS reagent, Fisher Scientific, Fair Lawn, NJ, + 50% w/w NaOH, Fisher Scientific) containing sodium benzoate as a preservative (5 g/L, Fisher Scientific). All solution preparation used reagent-grade water obtained from a Milli-Q Plus water purification system (Millipore, Inc., Bedford, MA). Stock solutions of 71.4 mM anhydrous glucose (ACS reagent, Fisher Scientific), 94.9 g/L BSA (Cohn fraction V powder, product no. A 4503, minimum 96% by electrophoresis, Sigma Chemical Co., St. Louis, MO) and 29.7 g/L triacetin (99%, Sigma Chemical Co.) were prepared in the buffer, and mixture samples were obtained by volumetric dilutions of the stock solutions with the buffer.

Apparatus. Interferograms were collected with a Digilab FTS-60A Fourier transform (FT) spectrometer (Varian, Inc., Randolph, MA) equipped with a 100 W tungsten-halogen lamp, CaF2 beam splitter, and liquid-nitrogen-cooled InSb detector. A K-band optical interference filter (Barr Associates, Westford, MA) was placed before the sample cell to isolate the 5000–4000 cm⁻¹ region. A liquid transmission cell with a 20 mm diameter circular aperture (model 118–3, Wilmad Glass, Buena, NJ), sapphire windows (Meller Optics, Providence, RI), and a 2 mm path length was used to contain the samples. With this cell, the source beam was attenuated through the use of a 63% transmittance thin-film neutral density filter (Rolyn Optics, Covina, CA) and a 2 cm⁻¹ instrumental aperture. The light intensity was restricted to ensure detector linearity.

Temperature control was provided by an integrated water jacket in the sample cell. Sample temperatures were regulated by use of a refrigerated circulating bath (Model 9100, Fisher Scientific), and the cell temperature was monitored with a copper-constantan thermocouple probe (Omega Engineering, Inc., Stamford, CT) inserted into a port in the cell water jacket. Temperatures were read with an Omega Model 670 digital meter to a precision of ±0.1 °C. The mean and standard deviation of the cell temperature readings were 37.0 ± 0.1 °C. Glucose concentrations were verified each day with a YSI Model 2300 Stat Plus glucose analyzer (YSI, Inc., Yellow Springs, OH).

Data Processing. Interferograms of glucose samples were collected in triplicate as 256 coadded, single-sided scans consisting of 2048 points, sampled at every zero-crossing of the HeNe reference laser for a maximum spectral frequency of 15 801 cm⁻¹. Interferograms of two blank samples were also collected each day and will be described below. All interferograms were downloaded to a Silicon Graphics Origin 200 computer (Silicon Graphics, Inc., Mountain View, CA) running under Irix (Version 6.5, Silicon Graphics, Inc.). Computations were performed on this system with programs written in Fortran 77. Some calculations used subroutines from the IMSL software package (IMSL, Inc., Houston, TX). Fourier processing included Mertz phase correction and triangular apodization. Computed spectra had a point spacing of 15.43 cm⁻¹. On the basis of previous work with this sample matrix [26,27], the portion from 4700–4300 cm⁻¹ was used for analysis, resulting in spectra composed of 27 points. After Fourier processing, all other computations were performed using either Matlab (Version 6.5, The Mathworks, Inc., Natick, MA) or in-house Fortran 77 code. Some calculations performed with Matlab employed the functions crossval, normaliz, pca, pls, preprocess, and snv from the PLS Toolbox (Version 2.0, Eigenvector Research, Inc., Manson, WA).

Sampling and Data Partitioning. The 70 samples used for calibration were collected at two different times, each spanning two days. The first collection occurred on days 1 and 2 (calibration set A), after which the samples were frozen and then thawed for re-analysis on days 100–101 (calibration set B). The prediction set, made up of 21 samples, was first collected on day 3, then re-sampled on future prediction days 30, 37, 44, 51, 58, 65, 72, 79, 102, 124, 136, 143, 150, 157, 164, 171, and 178. The solutions were kept frozen between data collection days.

For the calibration and prediction sets, a run order was randomized whereby all concentrations of the three-component system had low correlations with time. Scans were also collected each day of two spectroscopic blanks: (1) the same phosphate buffer used in solution preparation (buf), and (2) the mean of the sample matrix (mat), which contained 75 g/L BSA and 2.5 g/L triacetin in buffer. On days 1–3, spectra of the blank samples were collected as 256 coadded scans, and 20 spectra were collected for each type. For all other days, 50 spectra were collected for both buf and mat solutions as 64 coadded scans.

Collection of the data for the blank samples began after the sample temperature rose to 37.0 °C, which was usually only several minutes after the detector was filled with liquid nitrogen and the source lamp powered. The order of collection of the two blank samples was switched from day to day to see if the initial spectra collected were affected by not allowing the instrument any equilibration time. The entire collection of the blank samples took approximately one hour per day. After collecting the blanks, the glucose samples (from the calibration or prediction set) were measured.

RESULTS AND DISCUSSION

Preprocessing of Single-Beam Spectra. Quantitative models derived from transmission experiments are typically based on spectra in absorbance units (AU) because of the linear relationship between absorbance and concentration expressed by the Beer–Lambert law. For the glucose monitoring application that is the focus of this research, we have been exploring the direct use of single-beam spectra because of the lack of a good glucose-free background in the ultimate application of this work to noninvasive measurements of tissue glucose [28–30]. While the calibration updating algorithm described here could be applied to models based on either single-beam spectra or data in AU, results from the procedure will only be presented for single-beam data.

When single-beam data are used directly in a quantitative model, preprocessing is typically required to reduce spectral intensity variation that occurs from day to day [28,30–32]. Several preprocessing strategies were evaluated for their suitability for use with the calibration updating method: (1) normalization of the single-beam spectrum to unit length in the vector sense, (2) multiplicative signal correction (MSC) [28,33], (3) performing a log10 transform of the reciprocal of the spectral intensities (termed log/I) [28], (4) application of Savitzky–Golay first-derivative (five-point, quadratic) and second-derivative (five-point, quadratic, cubic) calculations [34,35], and (5) standard normal variate (SNV) scaling [36–38]. The preprocessing methods were applied identically to all of the sample or blank single-beam spectra. For the MSC calculation, in cases in which the calibration matrix was augmented with blank spectra collected on the prediction day, the mean of all the spectra in the augmented matrix was used as the reference in computing the additive and multiplicative correction terms.

For the studies performed here, the log/I scaling produced the best results, followed by roughly equivalent performance for the vector normalization and SNV calculations. Only the results produced with log/I scaling will be reported. After preprocessing, the data were mean-centered prior to the PLS computations. When an augmented calibration matrix was employed, the mean was computed from all the spectra (i.e., blank and sample spectra).
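The preprocessing options compared above reduce to a few array operations. The following is a minimal NumPy sketch, not the authors' Fortran or Matlab code. The function names are hypothetical, spectra are assumed to be stored one per row, and the use of the sample standard deviation in the SNV step is an assumption.

```python
import numpy as np

def log_inverse_intensity(single_beam):
    """log10 of the reciprocal of the single-beam intensities (the log/I scaling)."""
    return np.log10(1.0 / np.asarray(single_beam, dtype=float))

def unit_length(spectra):
    """Scale each spectrum (row) to unit Euclidean length."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / np.linalg.norm(spectra, axis=1, keepdims=True)

def snv(spectra):
    """Standard normal variate: center and scale each spectrum by its own statistics."""
    spectra = np.asarray(spectra, dtype=float)
    row_mean = spectra.mean(axis=1, keepdims=True)
    row_std = spectra.std(axis=1, ddof=1, keepdims=True)  # ddof=1 is an assumption
    return (spectra - row_mean) / row_std

def mean_center(calibration, new_spectra):
    """Subtract the calibration mean spectrum from both data sets.

    For an augmented calibration matrix, pass the stacked sample and blank
    spectra as `calibration` so the mean is computed from all of them.
    """
    calibration = np.asarray(calibration, dtype=float)
    mean_spectrum = calibration.mean(axis=0)
    return calibration - mean_spectrum, np.asarray(new_spectra, dtype=float) - mean_spectrum
```

In this sketch the same function would be applied to sample and blank spectra alike, mirroring the statement that preprocessing was applied identically to all single-beam spectra.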

Overview of Spectral Information. Figure 1 shows spectra in AU of each pure component in phosphate buffer in approximately its mean concentration within the calibration set. The region of 4700–4300 cm⁻¹ is plotted to represent the spectral data used in forming the calibration models. Glucose (Fig. 1A) exhibits combination bands near 4700 (O–H), 4400 (C–H), and 4300 (C–H) cm⁻¹, but these are much less intense than those of the interfering species, BSA (Fig. 1B), which shows absorption bands near 4600 (N–H) and 4370 (C–H) cm⁻¹, and triacetin (Fig. 1C), which exhibits one distinct C–H absorption near 4450 cm⁻¹. The high amount of spectral overlap presents a challenge for modeling glucose in this matrix. More information on the NIR spectral features of this three-component system can be found in previous publications [13,26,27,39].

Assessment of Data Quality. The goal in this experiment was to develop a simple and convenient way of updating a spectroscopic calibration model that is subject to deterioration over time. To judge the success of the approach, the long-term data were characterized on a daily basis in order to confirm that the data quality over time had not deteriorated.

Short-term noise was calculated for the 21 samples taken in triplicate on each prediction day (or the 35 on each calibration day). The triplicates were taken as pairs, ratios were computed, and the resulting transmittance spectra were converted to AU. To correct for baseline curvature in the noise spectrum caused by temperature variation, the root mean square (rms) noise was computed in μAU about a second-order polynomial fit to the region of 4500–4300 cm⁻¹. This region was chosen for the noise analysis because it surrounds the important C–H combination band of glucose at 4400 cm⁻¹.
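The noise calculation described above can be sketched as follows for one pair of replicate single-beam spectra. This is an illustrative NumPy sketch under stated assumptions: the function name is hypothetical, both spectra are assumed to share one wavenumber axis, and the rms is taken as the square root of the mean squared residual, since the text does not state whether a degrees-of-freedom correction was applied.

```python
import numpy as np

def rms_noise_micro_au(single_beam_a, single_beam_b, wavenumbers, low=4300.0, high=4500.0):
    """rms noise (in micro-AU) of the ratio of two replicate single-beam spectra.

    The ratio is converted to absorbance, and the rms is computed about a
    second-order polynomial fitted over the window [low, high] in cm^-1.
    """
    a = np.asarray(single_beam_a, dtype=float)
    b = np.asarray(single_beam_b, dtype=float)
    nu = np.asarray(wavenumbers, dtype=float)

    absorbance = -np.log10(a / b)          # transmittance of the pair -> AU
    window = (nu >= low) & (nu <= high)
    x = nu[window] - nu[window].mean()     # centering improves the fit conditioning
    coefficients = np.polyfit(x, absorbance[window], 2)
    residual = absorbance[window] - np.polyval(coefficients, x)
    return 1e6 * np.sqrt(np.mean(residual ** 2))
```

A daily value could then be obtained by averaging this quantity over all replicate pairs collected on that day; how the pairs were pooled is not specified in the text.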

In Figs. 2A and 2B, noise values are plotted for the set of glucose and blank data, respectively, collected on each analysis day. For the glucose samples, all days look relatively stable except day 124. The source of the problem was never identified, but the results for the updating procedure will not be given for this day.

In Fig. 2B, rms noise values are plotted separately for the buf (black) and mat (gray) blanks for each day in the chronological order in which they were collected. For example, if the order of the bars is gray followed by black, this indicates that the mat spectra were collected first. Lower noise values for days 1–3 reflect the greater amount of signal averaging performed with the spectra collected on these days. Four batches of blank data, days 102, 109, 150, and 171, gave higher than expected noise values, and all these batches happened to correspond to the mat sample. The source of the higher noise was never identified, but the inclusion of these days in the prediction was a useful tool in helping to characterize the updating algorithm.

FIG. 1. Absorbance spectra of (A) glucose (10.7 mM), (B) BSA (75.0 g/L), and (C) triacetin (2.5 g/L). Each spectrum reflects the approximate mean concentration of the component within the calibration set. Each species was dissolved in phosphate buffer and a buffer background was used to compute the absorbance. Note the low glucose signal strength and the significant spectral overlap among the three components.

To assess the integrity of the collected data in terms of predicting glucose concentrations, internal cross-validations were performed separately with data collected each day, leaving one sample (three spectra) out at a time and computing a PLS model from the region of 4700–4300 cm⁻¹ in the remaining spectra. Models ranging from 5–20 latent variables were computed. The PLS model size was selected by performing an F-test for significance and assessing when the standard error of cross-validation (SECV) did not significantly decrease (P = 0.70). The SECV was defined as

$$\mathrm{SECV} = \sqrt{\frac{\sum_{i=1}^{n} \left(C_i - \hat{C}_i\right)^2}{n}} \qquad (1)$$

where $C_i$ and $\hat{C}_i$ reflect the prepared and predicted concentrations, respectively, corresponding to spectrum i, and n represents the total number of spectra. For days 1, 2, 100, and 101, SECV values were calculated for the 35 samples of the first or second half of the calibration data. For the remaining days, the calculation was based on the 21 samples of the prediction set.
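As an illustration of Eq. 1 and of the leave-one-sample-out partitioning, the following NumPy sketch computes the standard error and generates folds in which all replicate spectra of one sample are withheld together. The function names and the `sample_ids` labeling are hypothetical and are not taken from the original software.

```python
import numpy as np

def standard_error(prepared, predicted):
    """Eq. 1: root of the mean squared difference between prepared and predicted values."""
    prepared = np.asarray(prepared, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((prepared - predicted) ** 2)))

def leave_one_sample_out(sample_ids):
    """Yield (train_indices, test_indices), withholding every spectrum of one sample at a time."""
    sample_ids = np.asarray(sample_ids)
    for sample in np.unique(sample_ids):
        held_out = sample_ids == sample
        yield np.flatnonzero(~held_out), np.flatnonzero(held_out)
```

With 21 samples measured in triplicate, `sample_ids` would hold 63 labels and the generator would produce 21 folds of three spectra each. The SECV is then `standard_error` applied to the pooled held-out predictions.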

Figure 2C shows the internal SECV values for each calibration and prediction day for models based on the single-beam log/I spectra collected on that day. These models required 7–13 factors. While the model sizes obtained in this internal cross-validation exceeded the recommended standard of a 6:1 ratio of samples to latent variables [40], the SECV results are only used here as a check on data quality. Similar to the rms noise values for the glucose samples, day 124 showed a higher than expected SECV. When all SECV values were pooled, day 124 exceeded the 95% confidence level, confirming our decision to treat this day as an outlier.

Model Performance Without Updating. Figure 3 shows a plot of prediction results for glucose over several analysis days using calibration set A (collected over days 1 and 2) to build the PLS model. Values of the standard error of prediction (SEP) are plotted for log/I, single-beam data (black), and absorbance data using the mean of a triplicate buffer background collected just prior to analysis of the prediction samples (gray). The calculation of SEP is analogous to Eq. 1 except that the model built from the calibration data is used to predict all spectra in the prediction set.

The PLS model was optimized using leave-10%-out cross-validation with the calibration set and both single-beam and absorbance data model sizes were selected to be 10 factors using the F-test (P = 0.70). The region of 4700–4300 cm⁻¹ was used. Both types of models show serious instability over time, and neither approach is acceptable in terms of performance. The single-beam models are especially prone to instrumental drift, producing SEP values in excess of 5 mM as time progresses. Degradation of the absorbance model is pronounced but less severe, indicating that the buffer background is removing the instrumental contribution of the signal to a certain extent, but may not be capturing the instrumental profile the same way that a perfectly matched background would. The results presented in Fig. 3 clearly motivate the need for a calibration updating procedure to account for the effects of instrumental drift.

Overview of Model Updating Procedures. The strategy used in the updating method was to append the single-beam spectra of a blank sample (mat or buf) collected on a given prediction day to the previously collected spectra of the calibration samples. The corresponding concentration values of 0 mM glucose were added to the vector of sample concentrations. Then, PLS regression was applied to the region of 4700–4300 cm⁻¹ in the augmented calibration data to obtain an updated model to be applied to the prediction samples taken on the same analysis day. After the number of blank spectra was selected, a new leave-10%-out cross-validation was performed for the augmented calibration set using 1–20 PLS factors. As described previously, an F-test (P = 0.70) was employed to select the final model size.
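The augmentation step can be sketched in a few lines. The code below is an illustrative NumPy sketch, not the authors' implementation: the function names are hypothetical, the spectra are assumed to be already preprocessed (for example with the log/I scaling), and a basic single-response NIPALS PLS routine stands in for the PLS Toolbox and Fortran routines used in the study.

```python
import numpy as np

def augment_with_blanks(X_cal, y_cal, X_blank):
    """Append blank spectra to the calibration spectra with 0 mM analyte values."""
    X_aug = np.vstack([X_cal, X_blank])
    y_aug = np.concatenate([np.asarray(y_cal, dtype=float), np.zeros(len(X_blank))])
    return X_aug, y_aug

def pls1_fit(X, y, n_factors):
    """Single-response PLS (NIPALS with deflation) on mean-centered data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores for this factor
        tt = t @ t
        p = Xr.T @ t / tt               # spectral loading
        c = (yr @ t) / tt               # concentration loading
        Xr = Xr - np.outer(t, p)        # deflate spectra
        yr = yr - c * t                 # deflate concentrations
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector
    return b, x_mean, y_mean

def pls1_predict(model, X_new):
    """Apply a fitted model to new spectra."""
    b, x_mean, y_mean = model
    return (np.asarray(X_new, dtype=float) - x_mean) @ b + y_mean
```

A typical call would be `model = pls1_fit(*augment_with_blanks(X_cal, y_cal, X_blank), n_factors)`, followed by `pls1_predict(model, X_pred)` for the prediction spectra collected on the same day. Because the mean is computed inside `pls1_fit`, it is taken over both sample and blank spectra, as described in the preprocessing discussion.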

Two approaches were evaluated for use in selecting the blank spectra to add to the calibration set. The first method attempted to model an on-line, real-time situation where it was desirable to collect just enough spectra for updating, then proceed immediately with the analysis of the prediction samples. With this approach, blank spectra would be collected, diagnostics performed, and then a judgment would be made about whether the updating was sufficient to allow the data collection to stop. For this approach, the blank spectra were treated chronologically and the diagnostic used was Hotelling's T² statistic.

FIG. 2. Data quality analysis of each collection day showing (A) rms noise values for the glucose samples, (B) rms noise values for the blank data, and (C) SECV values for the glucose samples. In plot B, the data sets of blank spectra are plotted in collection order (left to right) for the buf (black) and mat (gray) samples. Plot C gives internal cross-validation results for single-beam, log/I data. The performance of the instrument seems to be relatively stable for the long-term sampling with the exception of day 124, which showed high noise values and a larger than expected SECV value.

FIG. 3. Prediction results for single-beam log/I spectra (black) and absorbance spectra (gray) using a PLS model computed from calibration set A (collection days 1–2). Results outside of day 3 give high errors and indicate the need for updating the model with information about any new features in the data that could have surfaced over time.

For the case of a PLS model with orthogonal scores, the T² value for a spectrum is the sum of the normalized squared scores for that spectrum across each PLS factor in the model. The normalization factor for each score is the inverse variance of the scores for the corresponding PLS factor, computed over the spectra in the calibration set. The T² statistic is a common diagnostic used in multivariate statistical process control [15,41]. A 95% confidence limit for the T² values was used to identify unusual observations.
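The T² definition given above translates directly into code once the PLS scores are available. The following NumPy sketch uses a hypothetical function name, assumes that the scores come from mean-centered data, and assumes the sample variance (n − 1 denominator). The formula for the 95% confidence limit is not given in the text and is therefore not reproduced here.

```python
import numpy as np

def hotelling_t2(calibration_scores, scores=None):
    """T² values: squared scores weighted by the inverse calibration score variance.

    calibration_scores: (n_cal, n_factors) PLS scores of the calibration spectra.
    scores: optional scores of other spectra (e.g., blanks); if omitted, the
            T² values of the calibration spectra themselves are returned.
    """
    calibration_scores = np.asarray(calibration_scores, dtype=float)
    if scores is None:
        scores = calibration_scores
    scores = np.asarray(scores, dtype=float)
    variances = calibration_scores.var(axis=0, ddof=1)  # ddof=1 is an assumption
    return np.sum(scores ** 2 / variances, axis=1)
```

Spectra whose T² value exceeds the chosen limit would be flagged as unusual relative to the calibration set.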

To select the number of spectra, the calibration data were incrementally augmented with the collected spectra of the blank. After each addition, the T² values were computed for the blank spectra. For this calculation, the number of PLS factors was held constant at the value previously determined with the cross-validation procedure. While this value is not necessarily optimal considering that new information has been added to the calibration set, it was revealed in testing that the optimal number of factors for the new calibration set rarely differed by more than one from the original estimate. Using the original number of factors as an initial setting was therefore judged to be an acceptable procedure.

Initial blank spectra added to the existing data usually gave high T² values, which decreased to near or below the 95% threshold after enough were added such that the model no longer considered them to be unusual observations. In the algorithm used in this study, it was not a requirement for all blank spectra to be below the T² limit. The augmentation was stopped when the number above the threshold remained the same for 10 consecutive additions. This allowed for the possibility that one or more blank spectra might never drop below the threshold. This procedure also helped to mitigate the arbitrariness of using a fixed cutoff value. Usually fewer than 50 spectra were selected, but in a few cases more than 50 were chosen, and the additional spectra were taken beginning with the first spectrum and repeating the list chronologically as necessary.
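One possible reading of this stopping rule is sketched below in plain Python. The function names, the `n_above_limit` callback, and the `max_blanks` safety cap are all hypothetical; the sketch interprets "remained the same for 10 consecutive additions" as ten successive additions that leave the count of blank spectra above the T² limit unchanged.

```python
def choose_blank_count(n_above_limit, patience=10, max_blanks=500):
    """Return the number of blank spectra q at which the augmentation stops.

    n_above_limit(q) must return how many blank spectra exceed the T² limit
    after q blank spectra have been added to the calibration set.
    """
    previous = None
    unchanged = 0
    for q in range(1, max_blanks + 1):
        current = n_above_limit(q)
        unchanged = unchanged + 1 if current == previous else 0
        previous = current
        if unchanged >= patience:
            return q
    return max_blanks  # safety cap; not part of the described procedure

def blank_indices(q, n_collected):
    """Chronological indices of the q blanks, wrapping to the first spectrum if q > n_collected."""
    return [i % n_collected for i in range(q)]
```

In practice, `n_above_limit` would refit the augmented model with the fixed number of PLS factors and count the blank T² values above the 95% limit.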

The second approach to updating was tested in the off-line sense, assuming batches of data could be stored and analyzed at a later time. This allowed us to investigate the possibility of selecting a small number of blank spectra from the data pool that would be most suitable for updating. In this experiment, subsets of blank spectra were tested on the basis of variability. Two extremes were chosen to assess whether variability within the spectra was beneficial. If little variability were necessary, then the updating would presumably be successful after collecting only a small number of spectra and weighting, if necessary, by adding multiple copies of the blank spectra into the calibration set.

The first extreme selected the 10 blank spectra with the largest Euclidean distances relative to the mean. The second selection was based on choosing spectra with the 10 highest raw single-beam spectral maxima, which would most likely be of the same temperature and at the highest source output during the warm-up collection. After selection of the 10 spectra by either method, the number of blank spectra was selected as described previously and the data were allowed to repeat if necessary. The subsets from the first procedure, based on distance, are denoted highvar, signifying a high amount of variability, while the second subset, based on spectral maxima, is termed lowvar.
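The two subset selections can be sketched as follows. This is an illustrative NumPy sketch with hypothetical function names. It assumes one blank spectrum per row and that the mean in the distance calculation is the mean of the blank spectra collected on that day, which the text does not state explicitly.

```python
import numpy as np

def select_highvar(blank_spectra, n_select=10):
    """Indices of the spectra farthest (Euclidean distance) from the mean blank spectrum."""
    X = np.asarray(blank_spectra, dtype=float)
    distances = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return np.sort(np.argsort(distances)[-n_select:])

def select_lowvar(raw_single_beam, n_select=10):
    """Indices of the spectra with the highest raw single-beam maxima."""
    X = np.asarray(raw_single_beam, dtype=float)
    maxima = X.max(axis=1)
    return np.sort(np.argsort(maxima)[-n_select:])
```

Both functions return indices in chronological order, so the selected spectra can be repeated in that order if more than ten are required by the stopping rule.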

A final subset selection method, tsqelim, was employed to eliminate corrupted or aberrant spectra that may not be suitable for the updating procedure. This method was simply an outlier elimination method that excluded spectra on the basis of abnormally high T² values. For all spectra, the T² values were computed from a single-beam model using the original calibration set plus the mat spectra taken on the calibration days. Addition of blank spectra from the calibration days was not found to be helpful in the final updating procedure, but was helpful for the computation of T² values to flag outlying spectra. In order to justify using T² values to detect outlying spectra, the model needed to be provided with "normal" (calibration day) blanks. For the selection method using T² values, the prediction day blanks were treated as unknown samples, and the spectrum with the lowest T² value ($T^2_{\mathrm{low}}$) was selected. A threshold, Z, was established as

$$Z = 1.5 \times T^2_{\mathrm{low}} \qquad (2)$$

All blank spectra with T² values less than or equal to Z were selected in the subset used for updating. This value was shown to exclude aberrant data from the four prediction days containing some blank spectra with higher than expected noise levels, yet be inclusive of the majority of data on a typical day. The threshold in Eq. 2 was selected on the basis of investigating our specific data and might need to be adjusted when applying the updating procedure to other spectra. Use of the tsqelim procedure also has a disadvantage that is common to all data analysis methods that involve removal of observations: the possibility exists that the outlying blank spectrum may indicate a systematic change in the spectrometer and that this information may in fact be necessary for the updating procedure to work most effectively.
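The tsqelim selection amounts to a single threshold comparison. A minimal NumPy sketch is given below. The function name and the `factor` parameter are hypothetical, with the default of 1.5 taken from Eq. 2.

```python
import numpy as np

def tsqelim(t2_values, factor=1.5):
    """Keep blank spectra whose T² value does not exceed Z = factor * min(T²)."""
    t2 = np.asarray(t2_values, dtype=float)
    z = factor * t2.min()                 # Eq. 2
    return np.flatnonzero(t2 <= z), z
```

Exposing the multiplier as a parameter reflects the remark that the threshold may need adjustment for other data sets.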

Evaluation of Type of Blank. Two types of blank spectra were investigated for use in updating the calibration model. Figure 4 shows examples of the raw, single-beam spectra of the buf (solid line) and mat (dashed line) samples. Figure 5 illustrates changes in the spectral profiles of the buf samples over time. For each set of blank spectra collected on a particular day, the mean spectrum was computed, and the mean of day 1 (the first calibration day) was subtracted. Numbers next to the spectra indicate the analysis day. Days 2 and 3 (bold lines) are similar to day 1 in terms of output intensity (the difference spectra are close to zero), and the derivative-like shape indicates a spectral shift indicative of a slight temperature mismatch in the solution relative to day 1. Outside of day 3 (beginning with day 30), the spectra show variations in both intensity and profile. These changes must be handled properly by the calibration model if it is to perform well in prediction.

For days 44–79, Fig. 6A shows the prediction results obtained when the two types of blanks were used with the updating algorithm. Calibration set A was employed, and the updating procedure described previously based on the incremental (chronological) addition of blank spectra to the calibration set was used. The SEP values for models that included the buf spectra are plotted with black bars, followed by similar results for the mat (gray) spectra. The results from these prediction days are presented as typical.

Results obtained with the buf spectra were inconsistent,giving good results for some days, but fairly poor resultsoverall. This indicated that it was better to include the matrixconstituents in the blank rather than trying to capture the newfeatures with solvent alone. Possibly, the single-beam profilesbetween the two are different enough that the instrumental driftfeatures may manifest themselves differently when superim-posed on the two types of spectral backgrounds. This resultmay be data-dependent.

A second test was performed in which the blank spectra were treated as unknown samples and analyzed with models constructed from the prediction samples taken on the corresponding analysis day. As outlined previously, an internal cross-validation was performed with the prediction set and the PLS model size was chosen according to the F-test of the SECV values. These model sizes are reported in Table I. Figure 6B shows the results of estimating the blank concentrations (0 mM glucose) from the prediction data for days 44–79. Bar colors are the same as indicated above.

Figure 6B indicates that the models based on the prediction samples predicted a zero glucose concentration most reliably for the mat spectra. In other words, the existing data recognized the mat spectra as samples containing zero analyte, while the buf spectra produced poorer, less consistent predictions.

It was hypothesized that the results obtained with the buf spectra would perhaps improve if the model were also provided with buffer spectra collected on the calibration day. This modified procedure was evaluated, but the final prediction results were still not as good as those obtained with the mat spectra. The remaining results presented will thus focus on updating procedures that employ the mat spectra.

FIG. 4. Single-beam profiles of phosphate buffer (solid) and the mean matrix (dashed). Each trace reflects the mean of the batch of samples collected on day 44. The change in spectral shapes arises from the presence of BSA in the sample matrix.

FIG. 5. Single-beam spectra of the mean of all buf spectra collected during the instrumental warm-up sessions on various analysis days, with the mean from day 1 subtracted. Numbers point to analysis days or groups of days. Days 2 and 3 are close to day 1 in terms of instrumental output intensity, and the resulting difference spectra have low intensities. The derivative-like shapes are indicative of temperature variation from the mean spectrum of day 1. Other days show not only drift in output intensity, but also asymmetric, irregular profiles when compared to those from day 1.

FIG. 6. (A) Prediction results (SEP values in mM) for model updating from days 44 through 79 using calibration set A (collected on days 1–2) with log/I spectra. Black and gray bars denote updating using the buf and mat spectra, respectively. Results indicate that the mat spectra were the most successful at updating the calibration. (B) SEP values obtained when the blank spectra were used as prediction samples. An internal PLS model was computed within each prediction day using the prediction day glucose samples to formulate the model (model sizes were selected via cross-validation). Colors assigned to the bars are the same as in panel A. Results indicate that the glucose samples recognized the mat spectra as being close to zero analyte concentration (low SEP values), while the data had a more difficult time assigning a zero concentration to the buf spectra.

Selection of Number of Blank Spectra Using the T² Limit. To judge the effectiveness of the T² cutoff, an exhaustive grid search of prediction results was performed for various days, varying the number of blank spectra added and the PLS model size. The number of spectra was tested from 0–50 in steps of 5, then from 50–210 (the number of spectra in the calibration set itself) in steps of 20. For each number of spectra added, q, PLS model sizes of 1–20 factors were tested. The prediction results were sorted according to the best SEP for each number of blank spectra added, and for nearly all data sets predictions did not deteriorate through q = 210 as long as the model size was appropriate. This generally meant an increase of 1–2 factors from the original PLS calibration, and the results showed that the optimal number of blank spectra increased as the prediction data grew further away from the time frame of the calibration. For the data used here, there seemed to be flexibility in selecting the number of blank spectra, and it was better to err on the side of too many spectra added. The danger in adding too many blanks would be the diminishing of calibration information that is relevant as the data matrix becomes overwhelmed with the blank spectra, but this did not seem to be a problem for our data through the numbers searched. When confirmed with the exhaustive prediction results, the weighting procedure using the T² limit was a good method for selecting the number of blank spectra to add.
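The grid search over the number of blank spectra and the PLS model size can be sketched as follows. The callback `evaluate_sep(q, n_factors)` is a hypothetical stand-in for building the augmented PLS model and computing its SEP; the toy callback used here is made up and only exercises the search logic.

```python
def blank_count_grid(n_cal=210):
    """Values of q: 0-50 in steps of 5, then up to n_cal in steps of 20."""
    return list(range(0, 51, 5)) + list(range(70, n_cal + 1, 20))


def grid_search(evaluate_sep, q_values, max_factors=20):
    """For each q, return (best number of factors, lowest SEP) over 1..max_factors."""
    best = {}
    for q in q_values:
        results = [(evaluate_sep(q, h), h) for h in range(1, max_factors + 1)]
        sep, h = min(results)  # lowest SEP; ties resolved toward fewer factors
        best[q] = (h, sep)
    return best


def toy_sep(q, n_factors):
    """Made-up SEP surface with a minimum at 11 factors; NOT a real PLS evaluation."""
    return abs(n_factors - 11) * 0.1 + 1.0 / (1 + q)


q_values = blank_count_grid()
best = grid_search(toy_sep, q_values)
```

Sorting or plotting `best` by q would then show whether the lowest achievable SEP deteriorates as more blank spectra are added.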

Blank-Augmented Model Sizes. Selection of the PLS model size had a greater impact on the prediction results than did the number of blank spectra selected. Generally the model size increased by one or more factors after augmentation, but in some instances remained the same. The SECV values for the updated models reflected the original calibration samples (containing glucose) and new blank spectra (containing no glucose). Errors were always low for the new blanks added because the model was able to characterize these samples fairly well. Therefore, the SECV values of the updated models were not dramatically different from the original calibration, and model sizes remained similar. When checked with the results of the exhaustive grid search, the SECV was a good selection method, and P = 0.70 was more successful than P = 0.95 for the F-test. Using a less strict probability level resulted in slightly higher model sizes, which improved prediction results for most days.
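One common way to apply an F-test to SECV values is to pick the smallest model whose squared SECV ratio to the minimum SECV falls below a critical F value. The sketch below assumes that form of the test; the exact statistic and degrees of freedom used in the study are defined in an earlier section and may differ. The critical values and SECV numbers are hypothetical, and in practice the critical value would come from an F table or a statistics library for the chosen probability level.

```python
def select_model_size(secv, f_crit):
    """Return the smallest number of factors whose SECV is not significantly
    larger than the minimum SECV. secv[i] is the SECV for i + 1 factors.

    Assumed test: (SECV_h / SECV_min)**2 < f_crit.
    """
    min_secv = min(secv)
    for i, value in enumerate(secv):
        if (value / min_secv) ** 2 < f_crit:
            return i + 1
    # Only reached if f_crit <= 1: fall back to the minimum-SECV model.
    return secv.index(min_secv) + 1


# Made-up SECV curve (mM) for 1-7 factors.
secv = [2.0, 1.2, 0.80, 0.70, 0.66, 0.65, 0.66]

size_strict = select_model_size(secv, f_crit=1.30)  # hypothetical value for a higher P
size_loose = select_model_size(secv, f_crit=1.05)   # hypothetical value for a lower P
```

A lower probability level gives a smaller critical value, so fewer small models pass the test and the selected size tends to be larger. This matches the behavior reported above for P = 0.70 versus P = 0.95.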

Results from Updated Models. Figures 7A and 7B show results for the updating of calibration sets A and B, respectively, using the full sets of mat data added in chronological order (no subset selection). Model sizes were determined by the cross-validation procedure described previously. The gray bars show SEP values without updating the calibration and the black bars show the results using calibration updating. Additional information about the models built from calibration sets A and B is presented in Tables I and II, respectively. The first row of each table corresponds to calibration data (A or B) without updating, while the rest of the rows are the diagnostics for the updated models using the calibration set in row 1.

TABLE I. Summary of updating results and diagnostics for calibration set A.

Day      BA-SEP(a)  rms noise (μAU)(b)  PLS(c)  SECV(d)  SEC(d)  q(e)  NAT(f)  TUCL(g)
1–2      —          33                  10      0.63     0.58    0     n/a     —
3        0.73       32                  11      0.62     0.56    23    0       0.21
30       1.57       52                  10      0.71     0.65    41    0       0.95
37       0.84       53                  11      0.66     0.59    33    1       0.92
44       0.96       48                  10      0.68     0.63    34    1       0.63
51       1.02       54                  10      0.70     0.65    38    1       0.65
58       1.02       54                  11      0.66     0.60    26    0       0.98
65       0.74       56                  10      0.71     0.67    28    0       0.98
72       0.86       60                  12      0.63     0.57    35    2       1.00
79       1.09       53                  11      0.68     0.62    26    0       1.00
102      0.86       120                 11      1.08     0.68    34    1       1.00
109      0.87       93                  11      1.13     0.70    28    1       1.00
136      1.50       62                  11      0.69     0.63    35    2       1.00
143      0.96       59                  12      0.66     0.59    36    0       1.00
150      1.14       245                 11      0.71     0.65    61    1       1.00
157      0.69       49                  10      0.70     0.65    50    1       1.00
164      1.23       52                  10      0.66     0.61    40    2       1.00
171      4.01       348                 13      0.81     0.74    63    24      1.00
178      0.79       46                  11      0.65     0.59    34    1       1.00

(a) Values of SEP obtained by application of the background augmentation procedure to the spectral range of 4700–4300 cm⁻¹.
(b) Noise computed over 4500–4300 cm⁻¹ as described in the text and averaged over the day's set of blank spectra.
(c) Number of PLS latent variables chosen after F-test of SECV values using blank-augmented calibration set A.
(d) Result from original calibration set A (days 1–2) or the augmented calibration set (days 3–178).
(e) Number of blank spectra selected by the chronological updating algorithm.
(f) Number of blank spectra still above the T² threshold after the selection procedure.
(g) Fraction of spectra in the prediction set (63 total) exceeding the 99% UCL for the T² values.

FIG. 7. Prediction results with model updating for single-beam log/I models based on (A) calibration set A and (B) calibration set B. Gray bars show PLS results with no updating, while black bars indicate updating results using the mat spectra. The number of blank spectra added to the calibration set was chosen by adding the mat spectra in chronological order, terminating according to the T² cutoff described in the text. The PLS model sizes were chosen by the cross-validation and F-test procedures (P = 0.70) described in the text. The plots are clipped at SEP = 5 mM for clarity in viewing the region below 2 mM.

Updating the calibration brings the prediction results near or below 1 mM in most cases. These were good results when compared to the internal SECV values (Fig. 2C) and the SECV and standard error of calibration (SEC) values listed in Table I, which generally lie between 0.6 mM and 1 mM. The calculation of SEC is analogous to Eq. 1, but employs the calibration samples and adjusts the degrees of freedom to account for the number of PLS model terms. The SEC without updating was 0.58 mM for calibration set A (model size = 10 PLS factors) and also 0.58 mM for calibration set B (12 PLS factors). Therefore, SEP values approaching 1 mM were considered successful for this experiment.
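The relationship between SEP and SEC can be illustrated with a short sketch. Eq. 1 is not reproduced in this section, so the code assumes the usual root-mean-square form for SEP and a denominator of n - h - 1 for SEC (n calibration samples, h PLS factors); the exact degrees-of-freedom adjustment used in the study may differ. The concentrations are made up.

```python
import math


def sep(reference, predicted):
    """Standard error of prediction, assumed here to be sqrt(sum(residual^2) / n)."""
    n = len(reference)
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(reference, predicted)) / n)


def sec(reference, fitted, n_factors):
    """Standard error of calibration with an assumed n - h - 1 degrees of freedom."""
    dof = len(reference) - n_factors - 1
    if dof <= 0:
        raise ValueError("too few samples for this number of factors")
    return math.sqrt(sum((c - f) ** 2 for c, f in zip(reference, fitted)) / dof)


# Made-up concentrations (mM); every residual has magnitude 0.5.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
estimated = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]

sep_value = sep(reference, estimated)               # 0.5
sec_value = sec(reference, estimated, n_factors=2)  # sqrt(1.5 / 3)
```

With the same residuals, the SEC exceeds the SEP-style value because the model terms reduce the degrees of freedom.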

One aberrant prediction value was that of day 171 in Fig. 7A (calibration set A), which gave poor results due to most of the data being severely corrupted in some way. Upon examining the raw spectra, it was apparent that the blank spectra from day 171 suffered from something more than simple drift, as there appeared to be light reaching the detector in regions that we would not expect, possibly from bubble formation within the sample cell. Other days that suffered abnormal spectra were days 102, 109, and 150, although the severity and number of spectra that were corrupted were not as great as day 171. The updating algorithm still improved results for these days, indicating the robustness of the method when using blank spectra with poor data quality.

These four analysis days gave higher than average noise values, but an even more dramatic indicator was the T² values of these data points within the PLS models. The easiest way of spotting the corrupt spectra was to use one blank spectrum as a background for an absorbance computation with the glucose samples and other blank spectra collected on the analysis day. Absorbance data with abnormal backgrounds yielded very high T² values (100–1000) for the four analysis days in which corrupt spectra were obtained.
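The absorbance computation mentioned above can be sketched with the standard relation A = -log10(I / I0), where one blank single-beam spectrum serves as the background I0. The function name and intensities are illustrative assumptions.

```python
import math


def absorbance(sample, background):
    """Point-by-point absorbance, A = -log10(I_sample / I_background)."""
    return [-math.log10(s / b) for s, b in zip(sample, background)]


# Toy single-beam intensities (made-up numbers).
background = [100.0, 100.0, 100.0]   # one blank spectrum used as I0
other_blank = [100.0, 100.0, 100.0]  # a well-behaved blank
corrupted = [10.0, 100.0, 1000.0]    # exaggerated intensity anomalies

a_blank = absorbance(other_blank, background)
a_corrupted = absorbance(corrupted, background)
```

A well-behaved blank gives absorbance values near zero against another blank from the same day, whereas a corrupted spectrum produces large departures that would in turn inflate its T² value.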

Diagnostics for Assessing Performance. The above discussion motivates the need for a general diagnostic procedure to identify when the model updating method has worked well enough to allow the user to accept the predicted concentrations. The standard practice in multivariate calibration is to apply outlier testing to the prediction spectra to assess their compatibility with the calibration data and hence, with the calibration model. To evaluate this approach in the context of the model updating method, the T² values of the spectra in the prediction set were examined when compared to the augmented calibration sets.

To identify outliers among the prediction spectra, an upper control limit (UCL) for T² was set at the 99% level of the theoretical distribution.42 Tables I and II list the fraction of the 63 spectra in each prediction set that exceeded the UCL. On the basis of the T² UCL, most or all of the prediction spectra are judged to be outliers, even after augmenting the calibration set with spectra of the blank.
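As an illustration of the TUCL column in Tables I and II, the sketch below computes a T² value for each prediction spectrum from its PLS scores and reports the fraction above a control limit. It assumes the common definition of T² as the sum of squared scores scaled by the variance of the calibration scores along each factor (with mean-centered scores). The control limit is passed in as a number because the theoretical distribution of Ref. 42 is not reproduced here; the scores and the limit of 5.0 are made up.

```python
def score_variances(cal_scores):
    """Variance of the (mean-centered) calibration scores along each factor."""
    n = len(cal_scores)
    return [sum(t * t for t in column) / (n - 1) for column in zip(*cal_scores)]


def t_squared(scores, variances):
    """Assumed T^2 form: sum over factors of score^2 / score variance."""
    return sum(t * t / v for t, v in zip(scores, variances))


def fraction_above(t2_values, ucl):
    """Fraction of T^2 values exceeding the upper control limit."""
    return sum(1 for t2 in t2_values if t2 > ucl) / len(t2_values)


# Toy two-factor scores (made-up numbers).
cal_scores = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
pred_scores = [[2.0, 0.0], [0.0, 1.0]]

variances = score_variances(cal_scores)
t2_pred = [t_squared(s, variances) for s in pred_scores]
tucl = fraction_above(t2_pred, ucl=5.0)  # hypothetical control limit
```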

Figure 8 illustrates the reason for this result. Scores along the first and second PLS factors are plotted for the calibration (circles) and prediction (triangles) spectra corresponding to days 157 (Fig. 8A) and 178 (Fig. 8B). The blank spectra added to the calibration set are enclosed by an ellipse in each plot. For days 157 and 178, the spectra of the blank sample clearly match the spectra in the prediction set and both groups are distinct relative to the spectra of the calibration samples. From the perspective of outlier detection, the blank spectra do not contribute enough overall weight in the calibration data set to prevent the prediction spectra from being judged to be unusual data points. However, by adding the blank spectra to the calibration set, the recomputed PLS models are able to perform very well with the prediction samples, producing SEP values of 0.69 and 0.79 mM for days 157 and 178, respectively.

TABLE II. Summary of updating results and diagnostics for calibration set B.

Day      BA-SEP(a)  rms noise (μAU)(b)  PLS(c)  SECV(d)  SEC(d)  q(e)  NAT(f)  TUCL(g)
100–101  —          57                  12      0.65     0.58    0     n/a     —
3        0.78       32                  12      0.69     0.62    30    0       0.97
30       0.96       52                  12      0.77     0.68    79    0       1.00
37       0.78       53                  12      0.76     0.68    33    3       0.73
44       1.02       48                  11      0.73     0.66    40    2       0.71
51       1.14       54                  12      0.70     0.63    61    0       0.56
58       0.85       54                  12      0.79     0.70    29    1       0.59
65       1.03       56                  11      0.80     0.72    60    1       0.84
72       1.76       60                  12      0.70     0.62    27    3       1.00
79       1.05       53                  11      0.80     0.72    45    1       1.00
102      1.79       120                 11      0.85     0.77    47    4       0.33
109      1.77       93                  12      0.87     0.74    29    1       0.76
136      1.33       62                  12      0.85     0.77    33    4       1.00
143      1.19       59                  11      0.84     0.77    36    1       1.00
150      1.15       245                 11      0.79     0.71    59    2       0.97
157      1.09       49                  11      0.74     0.68    36    2       1.00
164      1.22       52                  11      0.76     0.70    47    2       0.94
171      1.43       348                 14      0.86     0.77    60    25      1.00
178      1.14       46                  12      0.74     0.66    31    1       1.00

(a) Values of SEP obtained by application of the background augmentation procedure to the spectral range of 4700–4300 cm⁻¹.
(b) Noise computed over 4500–4300 cm⁻¹ as described in the text and averaged over the day's set of blank spectra.
(c) Number of PLS latent variables chosen after F-test of SECV values using blank-augmented calibration set B.
(d) Result from original calibration set B (days 100–101) or the augmented calibration set (days 3–178).
(e) Number of blank spectra selected by the chronological updating algorithm.
(f) Number of blank spectra still above the T² threshold after the selection procedure.
(g) Fraction of spectra in the prediction set (63 total) exceeding the 99% UCL for the T² values.


Unfortunately, however, use of the theoretical T² 99% UCL is apparently not helpful as a diagnostic in forecasting prediction performance.

An internal reference distribution for T² was also evaluated as an outlier detection method. With this approach, the replicate spectra of each sample (including the blank) were withheld in turn from the augmented calibration set, the T² values of the withheld spectra were computed, and a reference distribution of the T² values was accumulated. The T² values of the prediction spectra were then compared to the corresponding reference distribution with a 99% criterion used for assessing outliers. With this test, all prediction spectra on each day were judged not to be outliers. Thus, while this confirms that the blank spectra are effective in helping to account for the variation in the prediction spectra on a given day, this procedure was not helpful in forecasting prediction performance (i.e., no correlation was found between error in predicted concentration and the probability level for the prediction spectra in the reference distribution).
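The comparison against an internal reference distribution can be sketched as an empirical percentile test. The code assumes the reference T² values have already been accumulated by the leave-one-sample-out procedure described above, and it uses a simple order-statistic rule for the 99% limit; the interpolation rule used in the study is not stated, and all numbers are made up.

```python
import math


def empirical_limit(reference_t2, level=0.99):
    """Order-statistic estimate of the `level` quantile of the reference values."""
    ordered = sorted(reference_t2)
    k = math.ceil(level * len(ordered)) - 1
    return ordered[min(max(k, 0), len(ordered) - 1)]


def flag_outliers(prediction_t2, reference_t2, level=0.99):
    """True for each prediction T^2 value that exceeds the empirical limit."""
    limit = empirical_limit(reference_t2, level)
    return [t2 > limit for t2 in prediction_t2]


# Toy reference distribution: 100 withheld-spectrum T^2 values (made-up numbers).
reference_t2 = [float(i) for i in range(1, 101)]
prediction_t2 = [50.0, 150.0]

limit = empirical_limit(reference_t2)
flags = flag_outliers(prediction_t2, reference_t2)
```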

An inspection of other values reported in Tables I and II can also be used for diagnostic purposes. For example, the values for day 171 have aberrant values for the PLS model size and the number above the T² threshold after selection of the blanks to add. High short-term noise did not necessarily result in a poor updating performance, but it is logical to assume that poor data quality might be problematic. Noisy or corrupted data could be diagnosed and monitored on-line as the data are being collected (not done in this experiment). Unfortunately, there was not a single diagnostic that could be directly correlated to the success or failure of the updating procedure. Therefore, several diagnostics should be evaluated. In a real setting, thresholds could be set for specific diagnostics to indicate when the updating procedure may no longer be successful. If a higher level of confidence were needed, the re-sampling of one or more calibration samples would need to take place to test the validity of the updating.
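A threshold-based check on several diagnostics might look like the following sketch. The rows are copied from Table I for a few analysis days, but the limits are hypothetical values chosen only for illustration; the study does not propose specific thresholds.

```python
# (day, rms noise in microAU, SECV in mM, NAT) copied from Table I for selected days.
TABLE_I_ROWS = [
    (102, 120, 1.08, 1),
    (150, 245, 0.71, 1),
    (157, 49, 0.70, 1),
    (171, 348, 0.81, 24),
    (178, 46, 0.65, 1),
]

# Hypothetical limits for illustration only; not taken from the study.
LIMITS = {"noise": 200, "secv": 1.0, "nat": 5}


def failed_checks(noise, secv, nat, limits=LIMITS):
    """Return the names of the diagnostics that exceed their limits."""
    values = {"noise": noise, "secv": secv, "nat": nat}
    return [name for name, value in values.items() if value > limits[name]]


flags = {day: failed_checks(noise, secv, nat) for day, noise, secv, nat in TABLE_I_ROWS}
```

With these illustrative limits, days 102, 150, and 171 are each flagged by at least one diagnostic while days 157 and 178 pass, which reflects the point that no single diagnostic is sufficient on its own.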

Variability in Blank Spectra. Figures 9A and 9B illustrate the effects of variability in the blank spectra for calibration sets A and B. The gray, white, and black bars denote the SEP values corresponding to the highvar, lowvar, and tsqelim selection procedures, respectively. For the highvar subset selection, days 109 and 171 (two of the four days with low-quality blank spectra) gave poor results with calibration A, but better results using calibration B, possibly because this calibration was closer in time to these prediction sets. The highvar SECV and SEC diagnostics for calibration set A would have flagged the poor performance for day 109 (2.44 mM and 2.44 mM, respectively, nearly a three-fold increase from the normal range of values in Table I). The lowvar subset selection improved the results of day 171 for calibration set A, indicating that subset selection improved the results obtained with noisy or corrupted data. Similarly, the selection method based on T² values improved day 171, while also maintaining good results for the rest of the data. These results suggest that, in general, no subset selection is necessary unless the data are corrupted. Using the full data sets of blank spectra in chronological order gave the best results on a typical analysis day. A check on the integrity of the spectra could be easily incorporated into the procedure to identify the isolated occurrences of corrupted spectra.

CONCLUSION

The results of this study indicate that the proposed algorithm was a successful model updating technique for the analysis of glucose in a simulated biological matrix. A simple protocol was devised whereby blank spectra were collected during the instrumental warm-up period, prior to data collection. The final protocol employed a blank sample containing only matrix components without glucose analyte, using a batch of data that was collected in roughly 20 minutes.

To facilitate rapid implementation of the updating protocol, the data quality of the blank spectra was compromised in terms of the number of coadded interferograms, as well as being collected during the warm-up period. Despite these limitations, the updated models performed well in prediction. For example, with calibration set A, the SEC was 0.58 mM glucose and the SEP for day 178 was 13.4 mM when the original model was used. After updating the calibration using the blank spectra, the SEP was 0.79 mM. For the few cases in which problems were encountered with the collected blank spectra, the results presented here indicate that corrupted data were easily spotted and could be handled in real time during the collection period.

FIG. 8. Scores along the first and second PLS factors are plotted for calibration (circles) and prediction (triangles) spectra corresponding to (A) day 157 and (B) day 178. The blank spectra added to the calibration set are enclosed by an ellipse in each plot.

FIG. 9. Prediction results with model updating for (A) calibration set A and (B) calibration set B according to variability within the set of blank spectra added to the calibration set. The SEP values are clipped at 5 mM for clarity in viewing the region below 2 mM. The gray, white, and black bars correspond to the highvar, lowvar, and tsqelim selection methods, respectively. With the exception of days where the blank data were corrupted or suffered from high noise, none of these subset selection methods offered an improvement in results compared to using the full set of data in chronological order.

ACKNOWLEDGMENTS

This research was supported by the National Institutes of Health under grants DK67445 and DK60657.

1. E. Bouveresse and D. L. Massart, Vib. Spectrosc. 11, 3 (1996).
2. R. N. Feudale, N. A. Woody, H. Tan, A. J. Myles, S. D. Brown, and J. Ferre, Chemom. Intell. Lab. Syst. 64, 181 (2002).
3. H. Leion, S. Folestad, M. Josefson, and A. Sparen, J. Pharm. Biomed. Anal. 37, 47 (2005).
4. H. Swierenga, P. J. de Groot, A. P. de Weijer, M. W. J. Derksen, and L. M. C. Buydens, Chemom. Intell. Lab. Syst. 41, 237 (1998).
5. H. Martens and E. Stark, J. Pharm. Biomed. Anal. 9, 625 (1991).
6. H. Martens, J. P. Nielsen, and S. B. Engelsen, Anal. Chem. 75, 394 (2003).
7. Y. Wang, D. J. Veltkamp, and B. R. Kowalski, Anal. Chem. 63, 2750 (1991).
8. O. E. de Noord, Chemom. Intell. Lab. Syst. 25, 85 (1994).
9. D. M. Haaland, Appl. Spectrosc. 54, 246 (2000).
10. M. J. Saiz-Abajo, B.-H. Mevik, V. H. Segtnan, and T. Næs, Anal. Chim. Acta 533, 147 (2005).
11. P. Tillmann, T.-C. Reinhardt, and C. Paul, J. Near Infrared Spectrosc. 8, 101 (2000).
12. M. Westerhaus, “Improving repeatability of NIR calibrations across instruments”, in Proc. Third Intl. Conf. Near Infrared Spectrosc., R. Biston and N. Bartiaux-Thill, Eds. (Agricultural Research Centre Publishing, Gembloux, Belgium, 1990), p. 671.
13. L. Zhang, G. W. Small, and M. A. Arnold, Anal. Chem. 74, 4097 (2002).
14. L. Zhang, G. W. Small, and M. A. Arnold, Anal. Chem. 75, 5905 (2003).
15. C. L. Stork and B. R. Kowalski, Chemom. Intell. Lab. Syst. 48, 151 (1999).
16. T. Kourti and J. F. MacGregor, Chemom. Intell. Lab. Syst. 28, 3 (1995).
17. C. E. Miller, Chemom. Intell. Lab. Syst. 30, 11 (1995).
18. B. S. Dayal and J. F. MacGregor, J. Proc. Cont. 7, 169 (1997).
19. D. M. Haaland and D. K. Melgaard, Appl. Spectrosc. 54, 1303 (2000).
20. D. M. Haaland and D. K. Melgaard, Appl. Spectrosc. 55, 1 (2001).
21. D. M. Haaland and D. K. Melgaard, Vib. Spectrosc. 29, 171 (2002).
22. C. M. Wehlburg, D. M. Haaland, and D. K. Melgaard, Appl. Spectrosc. 56, 877 (2002).
23. C. M. Wehlburg, D. M. Haaland, D. K. Melgaard, and L. E. Martin, Appl. Spectrosc. 56, 605 (2002).
24. D. K. Melgaard, D. M. Haaland, and C. M. Wehlburg, Appl. Spectrosc. 56, 615 (2002).
25. Y. Liang, K. Fang, and Q. Xu, Chemom. Intell. Lab. Syst. 58, 43 (2001).
26. M. J. Mattu, G. W. Small, and M. A. Arnold, Anal. Chem. 69, 4695 (1997).
27. N. A. Cingo, G. W. Small, and M. A. Arnold, Vib. Spectrosc. 23, 103 (2000).
28. Q. Ding, G. W. Small, and M. A. Arnold, Appl. Spectrosc. 53, 402 (1999).
29. G. Lu, X. Zhou, M. A. Arnold, and G. A. Small, Appl. Spectrosc. 51, 1330 (1997).
30. M. J. Wabomba, G. W. Small, and M. A. Arnold, Anal. Chim. Acta 490, 325 (2003).
31. C. J. Petty, G. M. Warnes, P. J. Hendra, and M. Judkins, Spectrochim. Acta, Part A 47, 1179 (1991).
32. H. M. Heise and A. Bittner, J. Molecular Structure 348, 127 (1995).
33. P. Geladi, D. McDougall, and H. Martens, Appl. Spectrosc. 39, 491 (1985).
34. A. Savitzky and M. J. E. Golay, Anal. Chem. 36, 1627 (1964).
35. M. Zeaiter, J.-M. Roger, V. Bellon-Maurel, and D. N. Rutledge, Trends Anal. Chem. 23, 157 (2004).
36. R. Bro and A. K. Smilde, J. Chemom. 17, 16 (2003).
37. J. Luyparet, S. Heuerding, Y. V. Heyden, and D. L. Massart, J. Pharm. Biomed. Anal. 36, 495 (2004).
38. Q. Guo, W. Wu, and D. L. Massart, Anal. Chim. Acta 382, 87 (1999).
39. S. T. Pan, H. Chung, M. A. Arnold, and G. W. Small, Anal. Chem. 68, 1124 (1996).
40. Standard Practices for Infrared Multivariate Quantitative Analysis (American Society for Testing and Materials, West Conshohocken, PA, 2000).
41. X. Capron, B. Walczak, O. E. de Noord, and D. L. Massart, Chemom. Intell. Lab. Syst. 76, 205 (2005).
42. R. D. Maesschalck, D. Jouan-Rimbaud, and D. L. Massart, Chemom. Intell. Lab. Syst. 50, 1 (2000).

