Determination of Organic Contaminants in Aqueous Samples by Near-Infrared Spectroscopy


Volume 54, Number 7, 2000 APPLIED SPECTROSCOPY 1047
0003-7028/00/5407-1047$2.00/0
© 2000 Society for Applied Spectroscopy

Determination of Organic Contaminants in Aqueous Samples by Near-Infrared Spectroscopy

QING DING, BRIAN L. BOYD, and GARY W. SMALL*
Center for Intelligent Chemical Instrumentation, Department of Chemistry and Biochemistry, Clippinger Laboratories, Ohio University, Athens, Ohio 45701

The feasibility of determining low levels of organic solvents in water by near-infrared (near-IR) spectroscopy is investigated. Mixture samples of tributyl phosphate (TBP) and methyl iso-butyl ketone (MIBK) are determined in aqueous solutions over the concentration range of 1–160 ppm. Through the use of C–H combination bands in the region of 5000–4000 cm⁻¹, sufficient selectivity is obtained to determine each compound in the presence of the other. Separate multivariate calibration models are computed for each compound by use of a combination of bandpass Fourier digital filtering and partial least-squares (PLS) regression, with analysis of both absorbance and single-beam spectra. A genetic algorithm is used to implement a joint optimization of the parameters governing the filtering and PLS calculations. Through the use of this procedure, a calibration model based on absorbance spectra is computed for MIBK with a standard error of prediction (SEP) of 3.82 ppm over the 1–160 ppm range. This five-term model utilizes the spectral range of 4495–4335 cm⁻¹. A similar nine-term model based on absorbance spectra is computed for TBP over the 4620–4320 cm⁻¹ range. For the range of 1–100 ppm, an SEP of 4.84 ppm is achieved. The results obtained from the analysis of single-beam spectra are comparable with those obtained in the analysis of absorbance data. Calibration models computed with samples prepared in natural water are also found to have a similar level of performance. These results establish the feasibility of using near-IR spectroscopy to screen water samples for solvent contamination.

Index Headings: Near-infrared spectroscopy; Multivariate calibration; Genetic algorithms; Water analysis.

INTRODUCTION

The need for cleanup of hazardous waste sites is an issue of national and international concern. In any remediation effort, characterization of the site is one of the most important and costly tasks.¹⁻³ One of the challenges of waste characterization is the ability to analyze static and flowing water sources for contamination by organic solvents. Gas chromatography/mass spectrometry (GC-MS) is currently the most widely used technique for determination of volatile and semivolatile organic compounds in water. However, GC-MS is difficult to apply for in situ on-line or field analysis, which is an important component of many site characterization efforts.⁴,⁵ To expedite site characterization and keep the analytical costs as low as possible, a major research effort has been launched to develop field-deployable analytical sensors.²,⁶

As a potential technique for in situ monitoring, near-infrared (near-IR) spectroscopy is investigated in this paper for its feasibility in screening water samples for contamination by organic solvents over the concentration range of 1–160 ppm. Tributyl phosphate (TBP) and methyl iso-butyl ketone (MIBK), common solvents employed in nuclear fuel reprocessing, are used as target compounds in this investigation.

Received 28 September 1999; Accepted 15 March 2000.
* Author to whom correspondence should be sent.

The use of near-IR spectroscopy for in situ determinations of organic solvents in water is complicated by (1) the strong absorbance of water and the corresponding weak absorbance of many target analytes (especially at low concentrations); (2) the problem of interferences caused by overlapping spectral bands of the sample constituents; (3) the potential for large spectral baseline variations arising from the temperature sensitivity of the near-IR spectrum of water; and (4) the difficulty in obtaining representative background measurements for use in spectral processing. A key component of the work described here is an evaluation of whether these impediments can be sufficiently overcome through suitable data handling strategies.

EXPERIMENTAL

Instrumentation. The near-IR spectra used in this work were collected with a Digilab FTS-60A Fourier transform spectrometer (Bio-Rad, Cambridge, MA). The spectrometer was configured with a 100 W tungsten–halogen source, CaF₂ beamsplitter, and liquid nitrogen-cooled InSb detector. The near-IR range of 5000–4000 cm⁻¹ served as the focus of the analysis. A K-band optical interference filter (Barr Associates, Westford, MA) was used to isolate this spectral region. For the samples prepared in reagent water, transmission measurements were performed with samples held in a rectangular Infrasil quartz cell with 2 mm pathlength (International Crystal Laboratories, Garfield, NJ). Samples prepared in natural water were placed in a demountable liquid transmission cell with a 20 mm diameter circular aperture (Model 118-3, Wilmad Glass, Buena, NJ), sapphire windows (Meller Optics, Providence, RI), and a 1.5 mm pathlength. For samples measured in the quartz cell, sample temperatures were controlled to the range of 24.5–25.5 °C through the use of a water-jacketed cell holder and refrigerated temperature bath. The demountable cell had an integrated water jacket that allowed temperature control to the range of 25.3–25.8 °C. Temperatures of the sample solutions were monitored during the data collection by use of a type-T thermocouple and digital thermocouple meter (Omega Engineering, Stamford, CT).

Reagents. Reagent-grade MIBK and TBP were obtained from common suppliers and used without further purification. For samples prepared in reagent water, the water used was obtained by passing house-distilled water through a Milli-Q Plus water purification system (Millipore Corp., Bedford, MA). Water was purified immediately before use. Stock solutions were prepared by directly weighing chilled MIBK or TBP into a volumetric flask containing water. Individual samples were then prepared by dilution of the stock solution. All solutions were prepared immediately before the collection of the spectral data.

FIG. 1. TBP vs. MIBK concentrations for the three data sets. Calibration and prediction samples are denoted by open circles (○) and closed triangles (▲), respectively. (A) MIBK/TBP data set. The correlation coefficients between the MIBK and TBP concentrations for the calibration and prediction subsets are −0.128 and 0.021, respectively. (B) TBP/MIBK data set. The correlation coefficients between the MIBK and TBP concentrations for the calibration and prediction subsets are 0.097 and −0.385, respectively. (C) Samples prepared in natural water. The correlation coefficients between the MIBK and TBP concentrations for the calibration and prediction subsets are 0.196 and −0.658, respectively.

Samples of pure MIBK, pure TBP, and mixtures of the two in reagent water were analyzed. Seventeen pure MIBK samples were prepared over the range of 1–160 ppm, and 20 pure TBP samples were made over the range of 1–100 ppm. Two mixture data sets were then assembled. The MIBK/TBP (MIBK as analyte, TBP as interference) mixture set contained 67 samples. The 17 pure MIBK samples were included along with 50 samples of MIBK/TBP mixtures. These 50 samples had MIBK levels of 1–100 ppm and TBP levels of either 25 ppm (4 samples), 50 ppm (20 samples), 75 ppm (3 samples), 100 ppm (10 samples), 125 ppm (3 samples), or 150 ppm (10 samples). Figure 1A plots TBP vs. MIBK concentrations for this data set.

A similar design was used for the total of 69 TBP/MIBK (TBP as analyte, MIBK as interference) mixture samples. This data set consisted of the 20 pure TBP samples (1–100 ppm), plus 49 samples with TBP concentrations in the range of 1–100 ppm and MIBK concentrations of 25 (4 samples), 50 (19 samples), 75 (3 samples), 100 (10 samples), 125 (3 samples), and 150 (10 samples) ppm. Figure 1B plots TBP vs. MIBK concentrations for this data set.

A set of MIBK and TBP mixture samples was also prepared in natural water. The uniform experimental design of Fang and Wang⁷ was used to construct 55 samples. MIBK and TBP had 17 and 34 levels of concentration, respectively, both ranging from 1 to 100 ppm. Each MIBK level was present in two to four samples, while either one or two samples had each TBP level. Figure 1C plots TBP vs. MIBK concentrations for this data set. Stock solutions were prepared by directly weighing MIBK or TBP into a 250 mL volumetric flask containing water collected from the Hocking River in Athens, OH. Stock solution concentrations were 300 and 150 ppm for MIBK and TBP, respectively. Individual samples were prepared by dilution of the stock solutions with river water. Three lots of river water were retrieved from one site over a time period from May 1998 to March 1999. Five, 30, and 20 samples, respectively, were prepared with the first, second, and third lots of water. For removal of particulates, the river water was vacuum filtered with Whatman #5 filter paper (Whatman International, Ltd., Maidstone, England) before use. All stock solutions and samples were prepared immediately before the collection of spectral data.
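
The correlation between analyte and interferent concentrations in a mixture design can be checked with a few lines of code. The following is a minimal sketch of the Pearson correlation coefficient; the function name `pearson_r` and the example concentrations are hypothetical and are not the data of this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up concentrations in ppm, for illustration only.
mibk_ppm = [10.0, 40.0, 70.0, 100.0, 25.0, 85.0]
tbp_ppm = [50.0, 100.0, 25.0, 75.0, 150.0, 50.0]
r = pearson_r(mibk_ppm, tbp_ppm)
```

A value of `r` near zero indicates that the analyte level cannot be inferred from the interferent level, which is the property a mixture design of this kind aims for.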

Procedures. For the samples prepared with reagent water, the data acquisition was performed in batches, immediately after the preparation of the approximately 10 samples in each batch. This procedure was utilized to prevent sample degradation. Single-sided interferograms containing 16 384 points were collected, and 256 coadded scans were used. Samples were run in a random order with respect to concentration, and three replicate interferograms were collected for each sample. At the beginning of each data collection session and after each five samples, interferograms of pure reagent-grade water were acquired for subsequent use in the calculation of spectra in absorbance units. The collected interferograms were Fourier processed to single-beam spectra by use of software resident on the Bio-Rad SPC-3200 computer controlling the spectrometer. Triangular apodization and Mertz phase correction were employed. This approach produced single-beam spectra with a point spacing of 1.9 cm⁻¹.

For the samples prepared in river water, 5 to 10 samples were prepared in each batch, and interferograms of pure reagent-grade water were taken before and after each sample batch. For these samples, double-sided interferograms consisting of 8192 points were collected. One level of zero-filling was used during Fourier processing to produce spectra with point spacing of 1.9 cm⁻¹. All other procedures used with these samples were the same as those described above.

The single-beam spectra were transferred to a Silicon Graphics 4D/460 computer (Silicon Graphics, Mountain View, CA) where the remainder of the data analysis was performed. This computer operated under Irix (version 5.2). All computer software used in processing the spectral data was implemented in FORTRAN 77. Fourier filtering and multiple linear regression computations were performed with subroutines from the IMSL software package (IMSL, Houston, TX).

FIG. 2. Near-IR absorbance spectra for water (solid line), pure MIBK (dashed line), and pure TBP (dash-dot line), plotted over the combination spectral region of 5000–4000 cm⁻¹.

FIG. 3. Replicate near-IR absorbance spectra of MIBK (A) and TBP (B) in water over the region of 4500–4300 cm⁻¹. The MIBK and TBP concentrations are 103 and 98 ppm, respectively.

RESULTS AND DISCUSSION

Absorbance Spectra of Analytes. The successful determination of organic species in water requires the identification of the spectral bands of the target analytes that can be isolated from the background absorbance of water. As indicated by the solid line in Fig. 2, within the range of 5000–4000 cm⁻¹, there is a window between two strong absorption bands of water. Within this window, the region of 4500–4200 cm⁻¹ contains several C–H combination bands for both pure MIBK (dashed line) and TBP (dash-dot line), although they are significantly overlapped. In Figs. 3A and 3B, respectively, three replicate near-IR spectra of MIBK (103 ppm) and TBP (98 ppm) in water are displayed over the range of 4500–4300 cm⁻¹. A background single-beam spectrum of pure water was used in computing the absorbance values. Two absorption bands for both MIBK and TBP can be seen at the concentrations of approximately 100 ppm. At concentrations below 30 ppm, however, no bands for MIBK and TBP are visibly apparent. The spectra in Fig. 3 also illustrate that, at these low concentrations, both baseline artifacts and noise must be removed in order to extract the MIBK and TBP signals reliably.
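
As a minimal sketch of the absorbance calculation, a sample single-beam spectrum can be ratioed point by point to a water background spectrum. The text states only that a background single-beam spectrum of pure water was used; the standard definition A = −log₁₀(I/I₀) is assumed here, and the function name is hypothetical.

```python
import math


def absorbance(sample_single_beam, background_single_beam):
    """Point-by-point absorbance, A = -log10(I_sample / I_background).

    Both inputs are sequences of positive intensities on the same
    wavenumber axis (an assumption of this sketch).
    """
    if len(sample_single_beam) != len(background_single_beam):
        raise ValueError("spectra must have the same number of points")
    return [
        -math.log10(s / b)
        for s, b in zip(sample_single_beam, background_single_beam)
    ]


# Illustrative intensities only: a point where the sample transmits
# half as much light as the water background has A of about 0.301.
example = absorbance([50.0, 100.0], [100.0, 100.0])
```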

Initial feasibility studies were begun with the pure MIBK and TBP aqueous samples. Multivariate calibration models for both MIBK and TBP were built by employing partial least-squares (PLS) regression⁸ based on a selected window of the absorbance spectrum. The optimal PLS calibration models built with a leave-one-sample-out cross-validation procedure produced a standard error of prediction (SEP) of 3.16 ppm and 2.46 ppm, respectively, for pure MIBK and TBP samples. These results for the pure samples demonstrated the feasibility of determining organic solvents (MIBK and TBP) below 100 ppm in water by near-IR spectroscopy. Given these initial favorable results, subsequent work focused on the MIBK and TBP mixture samples.

Analysis of Absorbance Spectra of Mixture Samples. Determination of target compounds in the presence of other interfering species is required for any on-line field analysis. PLS regression was also employed in the analysis of absorbance spectra of the MIBK/TBP and TBP/MIBK mixture samples. The key parameters that must be optimized for the successful use of PLS regression with near-IR spectra are the spectral range used and the number of calibration model terms or PLS factors employed. Therefore, to obtain the calibration model, it is necessary to develop a procedure for optimizing these parameters.
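
To make the role of the number of PLS factors concrete, the following is a minimal sketch of a single-response PLS (PLS1) calibration written with NumPy. It is not the FORTRAN 77 code used in this work; the function names, the NIPALS-style deflation, and the synthetic data are assumptions made for illustration. The spectral range would be chosen by slicing the columns of `X` before fitting.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit a PLS1 model; returns (x_mean, y_mean, regression_vector)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centered residuals
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_factors))        # weight vectors
    P = np.zeros((n_vars, n_factors))        # spectral loadings
    q = np.zeros(n_factors)                  # concentration loadings
    for a in range(n_factors):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                           # scores for this factor
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])       # deflate spectra
        yr = yr - q[a] * t                   # deflate concentrations
    b = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, b


def pls1_predict(model, X):
    """Predict concentrations for the rows of X."""
    x_mean, y_mean, b = model
    return (np.asarray(X, dtype=float) - x_mean) @ b + y_mean


# Synthetic example: 20 "spectra" with 3 points each and an exactly
# linear response, so a 3-factor model reproduces y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0]) + 5.0
model_demo = pls1_fit(X_demo, y_demo, n_factors=3)
```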

Due to the availability of a larger data set for the mixture samples, the cross-validation procedure was not employed. Instead, the data set was randomly partitioned into a calibration set (80% of the total samples) and a prediction set (20% of the total samples). The calibration and prediction samples are differentiated in Figs. 1A and 1B through the use of open circles and closed triangles, respectively. The calibration set was randomly partitioned further into a calibration subset (80% of the calibration samples) and a monitoring set (20% of the calibration samples). The three replicate spectra for each sample were carried together into the assigned calibration, monitoring, or prediction set. Three monitoring sets were used to evaluate the candidate calibration models. The same calibration set was randomly partitioned three times to generate three calibration subsets and three monitoring sets.
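
The partitioning scheme described above can be sketched as follows. Splitting is done on sample identifiers so that the three replicate spectra of a sample always travel together. The function names, the rounding rule, and the fixed seed are assumptions of this sketch, so the resulting set sizes are illustrative rather than those of the study.

```python
import random


def split_samples(sample_ids, held_out_fraction, rng):
    """Randomly split sample IDs; returns (retained, held_out)."""
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_held_out = round(len(ids) * held_out_fraction)
    return ids[n_held_out:], ids[:n_held_out]


def expand_replicates(sample_ids, n_replicates=3):
    """Map each sample ID to its (sample, replicate) spectrum keys."""
    return [(s, r) for s in sample_ids for r in range(n_replicates)]


rng = random.Random(0)                     # fixed seed for repeatability
samples = list(range(67))                  # e.g., a 67-sample data set
calibration, prediction = split_samples(samples, 0.20, rng)

# Three independent partitions of the SAME calibration set into a
# calibration subset and a monitoring set.
monitoring_splits = [split_samples(calibration, 0.20, rng) for _ in range(3)]
prediction_spectra = expand_replicates(prediction)
```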

FIG. 4. Pooled SEC and SEM vs. the number of PLS factors. The open circles denote the values of the pooled SEM, and the open triangles denote the values of the pooled SEC.

FIG. 5. First eight spectral loadings for the PLS model based on the spectral range of 4700–4300 cm⁻¹ with the MIBK/TBP absorbance data set.

To obtain the optimal combination of spectral range and the number of PLS factors for building the calibration models, we conducted a detailed study of the effect of changing the spectral range between 5000 and 4000 cm⁻¹ by varying both the size and location of the range in a systematic manner. For each selected range, the number of PLS factors was varied from 1 to 15. PLS calibration models were built on the basis of the selected spectral range and number of PLS factors with the three calibration subsets and evaluated by use of the three corresponding monitoring sets. The optimal calibration models were determined on the basis of the minimum pooled standard error of monitoring (SEM) for the three monitoring sets. The SEM value for an individual monitoring set was calculated by

\[
\mathrm{SEM} = \sqrt{\frac{\sum_{i=1}^{n_m} \left( c_{im} - \hat{c}_{im} \right)^2}{n_m}} \qquad (1)
\]

where $n_m$ is the number of spectra in the monitoring set, $c_{im}$ is the actual analyte concentration for spectrum $i$ in the monitoring set, and $\hat{c}_{im}$ is the analyte concentration for that spectrum predicted by the model. Pooling the SEM values from the three monitoring sets avoided the potential bias from the use of a single monitoring set and helped to make the calibration models more robust.
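
Equation 1 translates directly into code. In the sketch below, `sem` follows Eq. 1, while `pooled_sem` rests on an assumption: the text does not give the pooling formula, so the squared residuals of all monitoring sets are summed and divided by the total number of spectra. Function names are hypothetical.

```python
import math


def sem(actual, predicted):
    """Standard error of monitoring for one monitoring set (Eq. 1)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    ss = sum((c - c_hat) ** 2 for c, c_hat in zip(actual, predicted))
    return math.sqrt(ss / len(actual))


def pooled_sem(monitoring_sets):
    """Pooled SEM over several (actual, predicted) pairs.

    Assumed pooling rule: total squared residual over total spectra.
    """
    total_ss = 0.0
    total_n = 0
    for actual, predicted in monitoring_sets:
        total_ss += sum((c - c_hat) ** 2 for c, c_hat in zip(actual, predicted))
        total_n += len(actual)
    return math.sqrt(total_ss / total_n)


# Illustrative values in ppm: residuals of 3 and 4 give sqrt(12.5).
single = sem([10.0, 20.0], [13.0, 16.0])
pooled = pooled_sem([([10.0], [13.0]), ([20.0], [16.0])])
```

The same function can be applied to a calibration or prediction set, since the SEP is computed analogously.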

Once the optimal model parameters were defined, the full calibration set (calibration subset + monitoring set) was used to compute a final model. This model was then applied to the prediction set as a final test. The prediction set was thus an independent validation data set that was not used at any stage during the model optimization.

To avoid the danger of overmodeling, we used a significance test to determine the optimal number of PLS factors. In the selection of the significant PLS factors, the minimum SEM was found first; then the SEM of a model constructed with one fewer PLS factor was compared to the minimum SEM by use of an F-test at a confidence level of 95%. This process was repeated with progressively smaller models until a significant difference was found. The smallest model that produced an SEM not significantly different from the minimum SEM was selected. As shown in Fig. 4, for the spectral range of 4575–4325 cm⁻¹, as the number of PLS factors employed in the calibration models with three calibration subsets and monitoring sets increases, the pooled standard error of calibration (SEC) decreases. The pooled SEM, however, decreases until it reaches a minimum with nine PLS factors, then slowly increases as the number of PLS factors increases. For this specific example, the F-test procedure found the pooled SEM of the calibration model with six PLS factors to be significantly different from the minimum SEM produced by the nine-term PLS model. Therefore, the seven-term model was selected as optimal for the spectral range of 4575–4325 cm⁻¹.
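
The model-size selection rule can be sketched as a short loop. The F statistic is taken here as the ratio of squared SEM values, and the critical value is supplied by the caller, since the degrees of freedom used in the study are not stated in this section; both choices are assumptions. The SEM values in the example are made up and are not read from Fig. 4.

```python
def select_num_factors(sem_by_factors, f_critical):
    """Pick the smallest model not significantly worse than the best.

    sem_by_factors: dict mapping number of PLS factors -> pooled SEM.
    f_critical: critical F value at the chosen confidence level
                (hypothetical input; depends on degrees of freedom).
    """
    k_min = min(sem_by_factors, key=sem_by_factors.get)
    min_variance = sem_by_factors[k_min] ** 2
    selected = k_min
    # Step down one factor at a time until a significant difference.
    for k in range(k_min - 1, 0, -1):
        if k not in sem_by_factors:
            break
        f_ratio = sem_by_factors[k] ** 2 / min_variance
        if f_ratio > f_critical:
            break
        selected = k
    return selected


# Made-up pooled SEM values (ppm) with a minimum at nine factors.
example_sems = {1: 12.0, 2: 9.0, 3: 7.0, 4: 6.0, 5: 5.2,
                6: 4.6, 7: 4.2, 8: 4.1, 9: 4.0, 10: 4.1}
chosen = select_num_factors(example_sems, f_critical=1.25)
```

With these illustrative numbers the six-factor model is the first to exceed the critical ratio, so seven factors are retained.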

These optimized model sizes raised concerns because the organic mixture samples consisted of only three components (including water). Therefore, one might surmise that the use of three PLS factors should be sufficient to explain the variation in the data set. On the basis of this assumption, one may argue that the optimized model sizes (seven for the MIBK/TBP data set and nine for the TBP/MIBK data set) are too large. Although the F-test procedure has been used to avoid the overmodeling problem, and Fig. 4 suggests that the model size selected was appropriate, it was considered important to understand the reason why a relatively large number of PLS factors are needed in this application. One consideration is the fact that baseline variations are significant in near-IR spectra of aqueous samples, especially for low concentration samples where the organic spectral features are hardly visible. This case was illustrated previously in Fig. 3. Therefore, we anticipated that the first several PLS factors may be used to explain the relatively large baseline variations. To evaluate this assertion, Fig. 5 displays the PLS spectral loadings for the first eight factors computed from the spectral range of 4700–4300 cm⁻¹ for the MIBK/TBP data set. The spectral loadings describe the components extracted from the spectral data by the PLS procedure. There are no organic spectral features in the spectral loadings until the fourth PLS factor. The first three PLS factors are apparently used to explain the various baseline variations in the near-IR absorbance spectra.


TABLE I. Results from analysis of absorbance spectra for MIBK/TBP and TBP/MIBK mixtures.

Data set    Spectral range (cm⁻¹)    PLS factors    R² (%)    SECᵃ (ppm)    SEPᵇ (ppm)
MIBK/TBP    4575–4325                7              99.01     3.60          4.25
TBP/MIBK    4675–4375                9              98.84     3.22          5.09

ᵃ Standard error of calibration.
ᵇ Standard error of prediction.

FIG. 6. Concentration correlation plots for absorbance spectra of mixture samples. (A) Results for the MIBK/TBP data set. (B) Results for the TBP/MIBK data set. Spectra in the calibration and prediction sets are indicated by open circles and closed triangles, respectively. Models are described in Table I.

This observation also explains the inclusion in the optimized spectral ranges of regions that do not contain TBP or MIBK absorption bands. Baseline regions must be included to allow the PLS model to extract the TBP and MIBK information.

As the number of PLS factors increases, the spectral loadings become increasingly noisy because of the extraction of noise from the data along with the spectral variation. Beyond seven factors, the noise becomes dominant, and these latent variables are no longer useful in the calibration model.

For the MIBK/TBP data set, the top five spectral ranges were 4575–4325, 4625–4325, 4675–4275, 4750–4275, and 4650–4325 cm⁻¹. For the TBP/MIBK data set, the top five spectral ranges were 4625–4375, 4675–4375, 4650–4400, 4675–4350, and 4650–4350 cm⁻¹. For both analytes, the most useful spectral information appears to be in the region between 4700 and 4300 cm⁻¹. Even though good chemical selectivity is still observed in Fig. 2 below 4300 cm⁻¹, this region of the spectrum becomes increasingly noisy because of the lower optical throughput caused by increased water absorption.

When applied to the separate prediction set, the optimal calibration model based on the spectral range of 4575–4325 cm⁻¹ and seven PLS factors provided a standard error of prediction (SEP) of 4.25 ppm for the MIBK/TBP mixtures. The calculation of SEP is analogous to that of SEM in Eq. 1. With the TBP/MIBK data set, the optimal calibration model based on the spectral range of 4675–4375 cm⁻¹ and nine PLS factors provided an SEP of 5.09 ppm. The prediction errors with mixture samples were slightly larger than those obtained for the pure MIBK and TBP samples. This outcome is reasonable because of the interference provided by the overlapping bands. These results confirmed that the combination of PLS regression with near-IR spectroscopy provides sufficient selectivity to determine organic solvents in water at low concentrations and in the presence of species with overlapping spectral bands.

Table I lists the statistics describing the best calibration models for the MIBK/TBP and TBP/MIBK absorbance spectral data sets. Listed in the table are the spectral range, the number of PLS factors in the optimal model, the value of R² (%), SEC, and SEP. Concentration correlation plots are also provided in Fig. 6. Figure 6A is the correlation plot for the MIBK/TBP data set, and Fig. 6B is the corresponding plot for the TBP/MIBK data set. The open circles denote spectra in the calibration set, while the closed triangles indicate spectra in the prediction set. Good correlations between predicted and actual concentrations are noted in both plots.

Digital Fourier filtering has proven useful as a preprocessing technique to remove noise and baseline variation from near-IR spectral data. Improvements in overall model performance and/or a reduction of the number of required PLS factors have been observed in previous studies.⁹⁻¹¹ This approach allows the design of filters that can implement a range of preprocessing tasks that include derivative calculations and smoothing. Therefore, the procedure of coupling a Gaussian-shaped digital bandpass filter with PLS regression was also applied to the MIBK and TBP determinations in the MIBK/TBP and TBP/MIBK data sets. The use of digital Fourier filters is based on the assumption that if a spectrum is decomposed into its underlying harmonic components, the slowly varying baseline variations are expected to dominate the low-frequency range while the rapidly varying noise dominates the high-frequency region. The analyte information should be concentrated at frequencies between the baseline variation and noise. Therefore, digital filters may be used to help isolate the analyte information from noise and baseline variations.¹²
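
A Gaussian-shaped bandpass filter of this kind can be sketched with NumPy's real FFT routines. The spectrum is transformed, multiplied by a Gaussian frequency response centered at a chosen digital frequency, and transformed back. Treating the filter width as the standard deviation of the Gaussian is an assumption of this sketch, as are the function name and the synthetic test signal.

```python
import numpy as np


def gaussian_bandpass(spectrum, position, width):
    """Apply a Gaussian bandpass filter in the Fourier domain.

    position, width: in digital frequency units (0 to 0.5); width is
    treated as the Gaussian standard deviation (an assumption).
    """
    x = np.asarray(spectrum, dtype=float)
    freqs = np.fft.rfftfreq(x.size)            # 0 ... 0.5 cycles per point
    response = np.exp(-0.5 * ((freqs - position) / width) ** 2)
    return np.fft.irfft(np.fft.rfft(x) * response, n=x.size)


# Synthetic "spectrum": constant baseline + mid-frequency band signal
# + high-frequency "noise" component.
points = np.arange(256)
band = np.sin(2 * np.pi * 0.0625 * points)     # analyte-like component
raw = 5.0 + band + 0.5 * np.sin(2 * np.pi * 0.4375 * points)
filtered = gaussian_bandpass(raw, position=0.0625, width=0.01)
```

In this synthetic case the filter passes the mid-frequency component while suppressing both the offset and the high-frequency term, which mirrors the rationale given above.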

The combination of filtering with PLS regression requires two additional parameters to be optimized: the position and the width of the Gaussian-shaped filter frequency response function. These values are typically expressed in digital frequency units (f), a linear scale of 0 to 0.5. To perform a concerted optimization of the five parameters (filter position, filter width, starting and ending points of the spectral range, and number of PLS factors), we used a numerical optimization procedure based on genetic algorithms (GAs). GAs are a class of numerical optimization algorithms based on the concepts of genetics and natural selection.13–15 Our laboratory has developed an automated GA-based protocol for selection of the optimal combination of digital filtering and PLS regression parameters for use in building calibration models with near-IR spectra.16 The procedures described in this previous work were applied here without modification. Configuration parameters for the GA were a population size of 50, mutation probability of 0.1, and recombination probability of 0.9. The single-point crossover method of recombination was used, and optimizations were run for 50 generations. The initialization step size was 0.1, and the weighting factor used in the fitness function was 1.65.16
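To make the GA configuration concrete, the following is a minimal sketch of a generic GA loop that uses the stated population size, mutation probability, recombination probability, single-point crossover, and number of generations. Everything else is an assumption made for illustration: the real-valued encoding, the search bounds, tournament selection, elitism, and in particular the placeholder fitness function, which stands in for the fitness function of Ref. 16 (not reproduced here). The initialization step size and the weighting factor of 1.65 are not modeled.

```python
import random

POP_SIZE = 50          # population size (from the text)
P_MUTATION = 0.1       # mutation probability (from the text)
P_RECOMBINATION = 0.9  # recombination probability (from the text)
N_GENERATIONS = 50     # number of generations (from the text)

# Hypothetical search bounds for the five parameters (not from the source).
BOUNDS = [
    (0.0, 0.5),        # filter position (f)
    (0.001, 0.1),      # filter width (f)
    (4500.0, 4700.0),  # starting spectral point (cm^-1)
    (4200.0, 4400.0),  # ending spectral point (cm^-1)
    (1.0, 15.0),       # number of PLS factors (would be rounded in practice)
]


def toy_fitness(chrom):
    """Placeholder fitness: larger is better, peak at an arbitrary target.

    A real run would filter the spectra, build a PLS model, and score its
    calibration and monitoring errors instead.
    """
    target = [0.07, 0.05, 4600.0, 4300.0, 8.0]
    return -sum(((g - t) / (hi - lo)) ** 2
                for g, t, (lo, hi) in zip(chrom, target, BOUNDS))


def random_chromosome(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def tournament(pop, scores, rng):
    """Binary tournament selection (an assumption, not from the source)."""
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if scores[a] >= scores[b] else pop[b]


def single_point_crossover(p1, p2, rng):
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chrom, rng):
    """Redraw each gene within its bounds with probability P_MUTATION."""
    return [rng.uniform(lo, hi) if rng.random() < P_MUTATION else g
            for g, (lo, hi) in zip(chrom, BOUNDS)]


def run_ga(fitness, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(POP_SIZE)]
    for _ in range(N_GENERATIONS):
        scores = [fitness(c) for c in pop]
        best = pop[scores.index(max(scores))]
        children = [best[:]]  # elitism: keep the current best (assumption)
        while len(children) < POP_SIZE:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            if rng.random() < P_RECOMBINATION:
                c1, c2 = single_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            children.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = children[:POP_SIZE]
    scores = [fitness(c) for c in pop]
    i = scores.index(max(scores))
    return pop[i], scores[i]


best_params, best_score = run_ga(toy_fitness, seed=1)
```

Repeating `run_ga` with different seeds and keeping the result with the highest fitness mirrors the idea of replicate optimization runs, although the seeds here are only a stand-in for different sets of initial values.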


TABLE II. Results from GA-based optimization with absorbance spectra.

Data set    Filter position (f)   Filter width (f)   Starting spectral point (cm⁻¹)   Ending spectral point (cm⁻¹)   Number of PLS factors   SECᵃ (ppm)   SEPᵇ (ppm)
MIBK/TBP    0.067                 0.052              4495                             4335                           5                       3.29         3.82
TBP/MIBK    0.071                 0.037              4620                             4320                           9                       4.04         4.84

ᵃ Standard error of calibration.
ᵇ Standard error of prediction.
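For readers who want to reproduce figures of merit like those tabulated above, the sketch below computes a standard error of calibration and a standard error of prediction from actual and predicted concentrations. It assumes the commonly used definitions, in which the SEP divides the residual sum of squares by the number of prediction samples and the SEC divides it by the number of calibration samples minus the number of PLS factors minus one. These definitions are not restated in this part of the text, so they should be checked against the paper's own equations. The function names and concentrations are hypothetical.

```python
import math


def standard_error(actual, predicted, dof_lost=0):
    """Root of the residual sum of squares over (n - dof_lost)."""
    n = len(actual)
    ss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(ss / (n - dof_lost))


def sec(actual, predicted, n_factors):
    """Standard error of calibration (assumed form: n - h - 1 degrees of freedom)."""
    return standard_error(actual, predicted, dof_lost=n_factors + 1)


def sep(actual, predicted):
    """Standard error of prediction (assumed form: n degrees of freedom)."""
    return standard_error(actual, predicted)


# Hypothetical concentrations in ppm, for illustration only.
actual_ppm = [10.0, 20.0, 30.0]
predicted_ppm = [12.0, 18.0, 30.0]
example_sep = sep(actual_ppm, predicted_ppm)
```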

FIG. 7. Concentration correlation plots for absorbance spectra obtained from GA-based optimization of model parameters. (A) Results for the MIBK/TBP data set. (B) Results for the TBP/MIBK data set. Spectra in the calibration and prediction sets are indicated by open circles (○) and closed triangles (▲), respectively. Models are described in Table II.

Replication of the GA optimization runs was performed by using three different sets of initial values for the five variables to be optimized. For each data set, the single result producing the highest fitness score in the optimization was selected, and a calibration model was constructed with the selected parameters. As before, the full calibration set of spectra (calibration subset + monitoring set) was used to compute these final models. The resulting models were then applied to the spectra in the prediction set.

Table II lists the prediction results obtained with these models for both the MIBK/TBP and TBP/MIBK data sets. Also listed in the table are the optimal combinations of filter position, filter width, number of PLS factors, starting point, and ending point of the spectral range selected by the GA. The optimal filters are relatively broad in terms of their frequency responses. This finding indicates that they are primarily serving as low-pass filters to suppress spectral noise.

The prediction results for both data sets with digital filtering based on the GA optimization procedure were slightly better than the corresponding prediction results obtained without filtering (Table I). A reduction in the number of PLS factors required is also observed for the MIBK/TBP data set. The corresponding concentration correlation plots are shown in Fig. 7. Good correlations are again noted between predicted and actual concentrations.

Analysis of Single-Beam Spectra. The analysis of absorbance spectra is normally used because of the assumed linear relationship between absorbance and concentration. However, in the ultimate application of near-IR spectroscopy to the field analysis of water samples, collection of a matching background spectrum for use in the absorbance calculation may not be possible. The approach of using single-beam spectra directly in building calibration models has been successfully implemented in other applications to eliminate the need for collecting background spectra.17–19 In this research, the analysis of single-beam spectra of MIBK/TBP and TBP/MIBK mixture samples was also attempted. The GA-based optimization procedure used to select the optimal combination of digital filters, spectral ranges, and the number of PLS factors for the absorbance spectral analysis was adopted for the single-beam analysis. The first two rows in Table III report the results obtained through the direct analysis of single-beam spectra of both the MIBK/TBP and TBP/MIBK data sets. Interestingly, the TBP/MIBK data set provided results as good as those obtained for the corresponding analysis of absorbance spectra. For the MIBK/TBP data set, however, the results of the single-beam spectral analysis were significantly worse than those obtained with the corresponding absorbance spectra. The most likely reason for this occurrence is the presence of samples with MIBK concentrations greater than 100 ppm in the MIBK/TBP data set. As the concentration increases, a more nonlinear relationship exists between the concentration and single-beam spectral intensity. The corresponding concentration correlation plots are shown in Fig. 8. In Fig. 8A, for the MIBK/TBP data set, nonlinearity apparently exists at MIBK concentrations higher than 100 ppm. Since no samples in the TBP/MIBK data set had TBP concentrations higher than 100 ppm, Fig. 8B shows good concentration correlation between the actual TBP and predicted TBP concentrations.

The single-beam spectral analysis was further investigated by transforming the single-beam spectra as log(1/I_i), where I_i is the single-beam intensity at resolution element i. The log transform was used to help reduce the nonlinear relationship between the single-beam spectral intensity and analyte concentration. Again, the same GA-based optimization procedure was used with the log-transformed single-beam spectra. The last two rows of Table III list the results obtained from the analysis of log-transformed single-beam spectra. These results indicate that with the log transformation, the analysis of the single-beam spectra of the MIBK/TBP data set can be improved to be as good as the analysis of absorbance spectra. When the log transform is used with the TBP/MIBK data set, the SEP degrades slightly relative to the models previously discussed. However, a paired t-test of the prediction residuals produced by this model vs. the models built with raw single-beam spectra and digitally filtered absorbance spectra (Table II) reveals that the differences among any of the sets of residuals are significant at no


TABLE III. Results from GA-based optimization with single-beam spectra.

Data set                   Filter position (f)   Filter width (f)   Starting spectral point (cm⁻¹)   Ending spectral point (cm⁻¹)   Number of PLS factors   SECᵃ (ppm)   SEPᵇ (ppm)
MIBK/TBP (no log 1/I)      0.073                 0.043              4590                             4220                           8                       7.66         10.23
TBP/MIBK (no log 1/I)      0.025                 0.056              4570                             4275                           9                       4.22         4.76
MIBK/TBP (with log 1/I)    0.065                 0.043              4645                             4320                           9                       4.05         3.66
TBP/MIBK (with log 1/I)    0.043                 0.066              4595                             4280                           8                       4.54         5.31

ᵃ Standard error of calibration.
ᵇ Standard error of prediction.

FIG. 8. Concentration correlation plots for single-beam spectra obtained from the GA-based optimization of model parameters. (A) Results for the MIBK/TBP data set. (B) Results for the TBP/MIBK data set. Spectra in the calibration and prediction sets are indicated by open circles (○) and closed triangles (▲), respectively. Models are described in Table III.

FIG. 9. Concentration correlation plots for log-transformed single-beam spectra obtained from the GA-based optimization of model parameters. (A) Results for the MIBK/TBP data set. (B) Results for the TBP/MIBK data set. Spectra in the calibration and prediction sets are indicated by open circles (○) and closed triangles (▲), respectively. Models are described in Table III.

more than the 69% level. Figure 9 shows the concentration correlation plots for the models based on the log-transformed single-beam spectra. The nonlinearity observed in Fig. 8A has been removed in Fig. 9A. The similarity in the results between the models constructed with the single-beam and absorbance spectra also helps to justify the use of the same GA parameters during the optimization of the two sets of models.
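The effect of the log transform on linearity can be illustrated with a small numerical sketch. It assumes an idealized Beer-Lambert response at a single resolution element, I = I0 · 10^(−k·c), and a base-10 logarithm; the source intensity `I0`, the coefficient `k` (deliberately exaggerated), and the concentration grid are hypothetical values chosen only to make the curvature visible.

```python
import math


def log_transform(single_beam):
    """Return log10(1/I_i) for each single-beam intensity I_i (base 10 assumed)."""
    return [math.log10(1.0 / intensity) for intensity in single_beam]


def max_residual_from_line(x, y):
    """Largest absolute residual of y about its least-squares line in x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return max(abs(b - (slope * a + intercept)) for a, b in zip(x, y))


# Hypothetical single-wavenumber response following the Beer-Lambert law.
I0 = 2.0   # source intensity, arbitrary units (hypothetical)
k = 0.002  # absorbance per ppm, exaggerated for illustration (hypothetical)
conc_ppm = [1.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0, 140.0, 160.0]
intensity = [I0 * 10 ** (-k * c) for c in conc_ppm]

raw_deviation = max_residual_from_line(conc_ppm, intensity)
log_deviation = max_residual_from_line(conc_ppm, log_transform(intensity))
```

Under this idealized model, log10(1/I) equals k·c − log10(I0), so the transformed values fall on a straight line up to rounding error, whereas the raw intensities curve away from their best-fit line as the concentration range widens. The constant term from I0 remains in the transformed data and would still have to be handled by the filtering and calibration steps.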

Analysis of Samples Prepared in Natural Water. As a final test of the methodology, calibration models were constructed with samples prepared in three batches of natural water. Models based on digitally filtered single-beam spectra were again employed, and the log transform was used. Of the 55 samples, 44 were selected randomly and placed in the calibration set, and 11 were withheld for use as prediction data. Figure 1C indicates the calibration and prediction samples with open circles and closed triangles, respectively. Replicate spectra were again placed together in either the calibration or prediction subsets. Each of the three lots of river water was represented in the calibration set. The GA procedure described previously was used to optimize digital filtering parameters, the spectral range supplied to the PLS calculation, and the number of PLS factors used in the calibration model. Models for both MIBK and TBP were computed. For each analyte, the model that produced the highest fitness score during the three replicate runs of the GA optimization was applied to the spectra in the prediction set. Table IV summarizes the calibration and prediction performance of these models, and Fig. 10 provides concentration correlation plots for both MIBK and TBP. For both MIBK and TBP, calibration and prediction results for the samples prepared in natural water are as good as or better than the corresponding results obtained with the samples prepared in reagent water. These results suggest that the additional constituents found in natural water samples do not interfere significantly with the determination of MIBK and TBP over the 1–100 ppm concentration range.
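The sample-level partitioning described in this section, in which 44 of 55 samples go to calibration, 11 to prediction, and replicate spectra of a sample stay together, can be sketched as follows. The number of replicate spectra per sample (three here), the function name, and the seed are assumptions for illustration. The sketch does not enforce that every water lot appears in the calibration set, which would need an extra check.

```python
import random


def split_by_sample(sample_ids, n_prediction, seed=0):
    """Split spectrum indices so that all replicates of a sample stay together.

    `sample_ids[i]` is the sample from which spectrum i was collected.
    Returns (calibration_indices, prediction_indices).
    """
    rng = random.Random(seed)
    unique_samples = sorted(set(sample_ids))
    prediction_samples = set(rng.sample(unique_samples, n_prediction))
    cal_idx = [i for i, s in enumerate(sample_ids) if s not in prediction_samples]
    pred_idx = [i for i, s in enumerate(sample_ids) if s in prediction_samples]
    return cal_idx, pred_idx


N_SAMPLES = 55      # from the text
N_PREDICTION = 11   # from the text
N_REPLICATES = 3    # hypothetical number of replicate spectra per sample

# One sample label per spectrum: 0, 0, 0, 1, 1, 1, ...
sample_ids = [s for s in range(N_SAMPLES) for _ in range(N_REPLICATES)]
cal_idx, pred_idx = split_by_sample(sample_ids, N_PREDICTION, seed=42)
```

Splitting on sample labels rather than on individual spectra avoids the optimistic bias that would arise if replicates of one sample appeared in both sets.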

CONCLUSION

The results presented here demonstrate the feasibility of determining MIBK and TBP in aqueous mixture samples over the range of 1–100 ppm by near-IR spectroscopy. Baseline variation is a major factor in processing near-IR spectra of aqueous samples containing low levels of organic solvents. The relatively large PLS models needed to extract MIBK or TBP information from their respective mixture samples are due to the requirement to correct this baseline variation. The results obtained with the analysis of single-beam spectra are comparable to


TABLE IV. Results from analysis of samples prepared in natural water.

Analyte   Filter position (f)   Filter width (f)   Starting spectral point (cm⁻¹)   Ending spectral point (cm⁻¹)   Number of PLS factors   SECᵃ (ppm)   SEPᵇ (ppm)
MIBK      0.034                 0.061              4610                             4310                           8                       3.32         3.57
TBP       0.045                 0.013              4545                             4240                           9                       4.08         3.84

ᵃ Standard error of calibration.
ᵇ Standard error of prediction.

FIG. 10. Concentration correlation plots for log-transformed single-beam spectra of samples prepared in natural water. Model parameters for (A) MIBK and (B) TBP were determined by the GA-based optimization procedure. Spectra in the calibration and prediction sets are indicated by open circles (○) and closed triangles (▲), respectively. Models are described in Table IV.

those obtained from the analysis of absorbance spectra. The successful analysis of single-beam spectra is very encouraging, because this approach has the advantage of eliminating the requirement for a background spectral measurement. The similarity in performance with samples prepared in reagent water and natural water is also encouraging in establishing the feasibility of this approach for use in screening natural water samples.

The GA-based method for combining digital filtering with PLS regression is judged to be desirable due to its ability to perform an automated joint optimization of the various filtering and calibration model parameters. The ability of the filtering/PLS procedure to reduce the model size is also judged to be an advantage in producing a more robust model. On the basis of the successful results obtained for MIBK and TBP, the determination of other organic solvents in water should also be possible over the 1–100 ppm range.

ACKNOWLEDGMENTS

This research was supported by the Department of Energy and Argonne National Laboratory under Contract 940702401. The Department of the Army is acknowledged for providing the Silicon Graphics 4D/460 computer system.

1. J. A. Campbell, R. W. Stromatt, M. R. Smith, D. W. Koppenaal, R. M. Bean, T. E. Jones, D. M. Strachan, and H. Babad, Anal. Chem. 66, 1208A (1994).
2. R. A. Greenwell, S. Saggese, and J. Hatch, Proc. Int. Instrum. Symp. 41, 61 (1995).
3. D. S. Sklarew, R. M. Ozanich, R. N. Lee, J. E. Amonette, B. W. Wright, and R. G. Rilley, J. Chromatogr. Sci. 33, 622 (1995).
4. M. H. Hiatt, D. R. Youngman, and J. R. Donnelly, Anal. Chem. 66, 905 (1994).
5. R. B. Lucke, J. A. Campell, G. A. Ross, S. C. Goheen, and E. W. Hoppe, Anal. Chem. 65, 2229 (1993).
6. K. Cammann, U. Karst, J. Sander, and M. Wortberg, Proc. SPIE-Int. Soc. Opt. Eng. 1716, 324 (1992).
7. K.-T. Fang and Y. Wang, Number-theoretic Methods in Statistics (Chapman and Hall, London, 1994), Chap. 5.
8. H. Martens and T. Næs, Multivariate Calibration (Wiley, New York, 1989), Chap. 3.
9. L. A. Marquardt, M. A. Arnold, and G. W. Small, Anal. Chem. 65, 3271 (1993).
10. G. W. Small, M. A. Arnold, and L. A. Marquardt, Anal. Chem. 65, 3279 (1993).
11. M. A. Arnold and G. W. Small, Anal. Chem. 62, 1457 (1990).
12. G. Horlick, Anal. Chem. 44, 943 (1972).
13. D. B. Hibbert, Chemom. Intell. Lab. Syst. 19, 277 (1993).
14. C. B. Lucasius and G. Kateman, Chemom. Intell. Lab. Syst. 19, 1 (1993).
15. C. B. Lucasius and G. Kateman, Chemom. Intell. Lab. Syst. 25, 99 (1994).
16. R. E. Shaffer, G. W. Small, and M. A. Arnold, Anal. Chem. 68, 2663 (1996).
17. G. Lu, X. Zhou, M. A. Arnold, and G. W. Small, Appl. Spectrosc. 51, 1330 (1997).
18. H. M. Heise and A. Bittner, J. Mol. Struct. 348, 127 (1995).
19. A. Bittner, R. Marbach, and H. M. Heise, J. Mol. Struct. 349, 341 (1993).