
Volume 51, Number 5, 1997 APPLIED SPECTROSCOPY 625
0003-7028/97/5105-0625$2.00/0 © 1997 Society for Applied Spectroscopy

submitted papers

Near-IR Versus Mid-IR: Separability of Three Classes of Organic Compounds

BRIAN R. STALLARD
Sandia National Laboratories, Albuquerque, New Mexico 87185-0967

Recently there has been a surge of interest in spectroscopic sensors operating in the near-IR, although it is recognized that the mid-IR contains more spectral information. The general question addressed in this paper is: how much specificity is lost in choosing the near-IR over the mid-IR for sensor applications? The example considered is the separability among three classes of organic compounds: alkanes, alcohols, and ketones/aldehydes. We use spectra from two sources: the Hummel polymer library (mid-IR) and the library of Buback and Vögele (near-IR). This is the first paper on class separability to make use of this new near-IR library, available in digital form only since July 1995. Five spectral regions are considered: region 5, 10,500 to 6300 cm⁻¹; region 4, 7200 to 5200 cm⁻¹; region 3, 5500 to 3800 cm⁻¹; region 2, 3900 to 2500 cm⁻¹; and region 1, 2500 to 500 cm⁻¹. Class separability is explored both qualitatively and quantitatively with principal component scatter plots, linear discriminant analysis, Bhattacharyya distances, and other methods. We find that separability is greatest in region 1 and least in region 2, with the three near-IR regions intermediate. Furthermore, we find that the near-IR offers sufficient class separability for organic compounds of one class to be determined in the midst of interference from the other classes.

Index Headings: Near-IR spectroscopy; IR spectroscopy; Chemometrics; Class separation.

INTRODUCTION

Recently there has been a surge of interest in spectroscopic sensors operating in the near-IR.¹⁻⁶ Historically, however, the mid-IR has received much more attention from applied spectroscopists. This paper discusses the relative specificity available in these two spectral regions. The general problem considered is the detection or determination of organic compounds in the presence of spectral interference from related classes of compounds.

Fabrication techniques that have revolutionized the design and manufacture of electronic components are now being applied to optical sensors.⁷,⁸ This potential for cheap and small devices appears much greater in the near-IR than in the mid-IR, and we expect it to be a major driving force behind increasing interest in near-IR sensors. The near-IR spectral region has additional significant advantages.⁵,⁶,⁹ Detectors are more sensitive, and cooling is less of an issue. Sources are brighter and more efficient. Pathlengths for liquid samples are more convenient (about 1 to 5 mm vs. 10 to 50 µm). Also, diffuse reflectance, which is very popular in a number of industrial applications, is more efficient at the shorter wavelengths.

Received 10 July 1996; accepted 11 September 1996.

On the other hand, a well-known rule of thumb states that the absorption cross sections for molecular vibrations decrease from the fundamental (in the mid-IR) by about one decade (a factor of ten) for each order in the overtone spectrum. This consideration is ignored in the present work, since it can readily be added to the model when the need arises. As further justification, we note that, for a similar cost, near-IR sensors generally have a higher signal-to-noise ratio than mid-IR sensors.
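Written as a formula, the rule of thumb above says that the cross section σₙ of the nth overtone order (n = 0 for the fundamental) falls by roughly a factor of ten per order. This is only the approximate scaling quoted in the text, not a band-by-band law:

```latex
% Approximate overtone scaling (rule of thumb quoted in the text)
\sigma_n \approx 10^{-n}\,\sigma_0, \qquad n = 0\ (\text{fundamental}),\ 1,\ 2,\ \ldots
% Example: second overtone (n = 2)
\frac{\sigma_2}{\sigma_0} \approx 10^{-2} = 1\%
```

Because absorbance is proportional to the product of cross section, concentration, and pathlength (Beer–Lambert law), holding the absorbance fixed under this scaling calls for roughly a tenfold longer pathlength per overtone order, which is consistent in direction with the longer near-IR pathlengths noted above.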

A second problem in working in the near-IR is that there is not as much unique spectral information. Classes of compounds may be less distinguishable, and spectral backgrounds may be more likely to interfere. The importance of this consideration has been diminished by modern multivariate spectral analysis. Nevertheless, we intuitively feel (and correctly so) that there is more information in the well-known fingerprint region (1800 to 400 cm⁻¹) than in the near-IR. The purpose of this paper is to determine how much specificity is lost in choosing the near-IR over the mid-IR for sensor applications. This information is needed by the design engineer, who may find it cheaper to build a near-IR sensor but must assess the loss in specificity that this choice will entail.

This paper considers only broadband spectroscopic sensors that might be employed to detect large molecules in the gas phase or molecules of any size in the condensed phase. Gas-phase detection of small molecules with narrow spectral features is not discussed, since the critical considerations are distinctly different.

Recently the first large digital library of near-IR spectra covering the range 10,500 to 3800 cm⁻¹ has become available.¹⁰ This paper is the first to use the new spectral library to answer questions regarding class separability.

MATERIALS AND METHODS

Spectra were drawn from two libraries. The Hummel polymer library, acquired from Nicolet Analytical Instruments, covers the spectral region 3800 to 500 cm⁻¹, while the library of Buback and Vögele, acquired from Chemical Concepts, covers the spectral region 10,500 to 3800 cm⁻¹. The increment between data points was set to a constant 16 cm⁻¹ for the combined regions. A simple average of neighboring points was used to reduce the resolution when required.

TABLE I. The compounds whose spectra were used in this study.

Alkanes: 2,2,4-Trimethylpentane; Cyclohexane; Hexane; Decane; Dodecane; Heptane; Nonane; Undecane; Cyclooctane; Methylcyclohexane; Cyclopentane; 1,2-Dimethylcyclohexane; Hexadecane.
Alcohols: 2-Methyl-1-propanol; 2-Butanol; 2,4-Dimethyl-3-pentanol; 1-Butanol; Methanol; 2-Methyl-2-propanol; 3-Methyl-1-butanol; 1-Octanol; 2-Methyl-1-butanol; 2-Octanol; 1-Heptanol; Cycloheptanol; 2-Methylcyclohexanol; 3,3-Dimethyl-1-butanol; 1-Hexanol.
Ketones/aldehydes: 3-Heptanone; 2-Heptanone; Octanal; Hexanal; Pentanal; 2-Methylbutyraldehyde; 3-Pentanone; Cycloheptanone; 1,4-Dimethyl-[…]; Cyclohexanone; 4-Methyl-2-pentanone.

FIG. 1. Plots of the mean spectrum for each class in each spectral region. Spectra have been scaled separately in each region so that the absorbance values range from 0 to 1. Spectra have been offset for clarity.
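The resolution-reduction step described in Materials and Methods (averaging neighboring points onto a constant 16 cm⁻¹ grid) can be sketched as follows. This is a minimal illustration, not the software used in the paper: the function name `average_to_increment` is hypothetical, and it assumes the input spectrum is evenly spaced with a spacing that divides 16 cm⁻¹ exactly (for example 4 or 8 cm⁻¹), so that non-overlapping blocks of neighbors can be averaged.

```python
def average_to_increment(wavenumbers, absorbances, target_step=16.0):
    """Reduce resolution by averaging non-overlapping groups of neighbors.

    Hypothetical helper. Assumes evenly spaced input and that target_step
    is an integer multiple of the input spacing (e.g. 4 or 8 cm^-1 -> 16).
    """
    if len(wavenumbers) != len(absorbances) or len(wavenumbers) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    step = abs(wavenumbers[1] - wavenumbers[0])
    factor = round(target_step / step)
    if factor < 1 or abs(factor * step - target_step) > 1e-9 * target_step:
        raise ValueError("target_step must be an integer multiple of the spacing")
    new_x, new_y = [], []
    # Walk through blocks of `factor` points; an incomplete tail is dropped.
    for start in range(0, len(wavenumbers) - factor + 1, factor):
        block_x = wavenumbers[start:start + factor]
        block_y = absorbances[start:start + factor]
        new_x.append(sum(block_x) / factor)  # block-center wavenumber
        new_y.append(sum(block_y) / factor)  # simple average absorbance
    return new_x, new_y
```

For input spaced at 4 cm⁻¹, each output point is the mean of four neighbors and the output spacing is 16 cm⁻¹. Dropping leftover points at the end of the range is one of several reasonable edge conventions.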

Data analysis was accomplished principally with the Windows version of the statistical program S-Plus, from MathSoft. Distributed routines were used where possible, although a certain amount of programming was required. The partial least-squares (PLS) calculations were performed with software developed by David M. Haaland and written in the Array Basic language to run in the environment of GRAMS/386 from Galactic Industries.

RESULTS AND DISCUSSION

The Classes and Spectral Regions. Three classes of compounds were drawn from the two spectral libraries: alkanes, alcohols, and ketones/aldehydes. To account fully for the within-class variance, there should ideally be a large number of spectra to represent each class. In this case, the number was limited, since only compounds found in both libraries could be considered. Table I contains a list of the compounds whose spectra were used.

The complete spectrum of each compound was partitioned into the five spectral regions: region 5, 10,500 to 6300 cm⁻¹; region 4, 7200 to 5200 cm⁻¹; region 3, 5500 to 3800 cm⁻¹; region 2, 3900 to 2500 cm⁻¹; and region 1, 2500 to 500 cm⁻¹ (adjacent regions overlap slightly). Within each region the spectra were scaled to give an absorbance range from 0 to 1. Figure 1 contains the mean spectrum for each of the three classes in the five spectral regions. Studying this figure gives an intuitive feel for the separability among the classes. The rest of the paper is concerned with methods to explore and quantify this separability.
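The partitioning and per-region scaling can be sketched as below. The region limits are taken from the text; the helper name `extract_region` is hypothetical, and the sketch assumes that "scaled to give an absorbance range from 0 to 1" means min–max scaling of each spectrum within each region. Because the limits overlap, a point near a boundary can belong to two regions.

```python
# Region number -> (low, high) limits in cm^-1, as listed in the text.
REGIONS = {
    1: (500.0, 2500.0),
    2: (2500.0, 3900.0),
    3: (3800.0, 5500.0),
    4: (5200.0, 7200.0),
    5: (6300.0, 10500.0),
}


def extract_region(wavenumbers, absorbances, region):
    """Return the points of one region, min-max scaled to the range 0..1.

    Hypothetical helper; the scaling convention is an assumption.
    """
    low, high = REGIONS[region]
    pts = [(x, a) for x, a in zip(wavenumbers, absorbances) if low <= x <= high]
    if not pts:
        raise ValueError(f"no data points in region {region}")
    ys = [a for _, a in pts]
    lo, hi = min(ys), max(ys)
    if hi == lo:
        raise ValueError("flat spectrum in this region; cannot scale to 0-1")
    return [x for x, _ in pts], [(a - lo) / (hi - lo) for a in ys]
```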

Dimension Reduction and Visualization. One way to visualize the class separation is in a scatter plot, after the dimension of the spectra has been reduced from several hundred to only three. A popular method of reducing the dimension of a data set is principal component analysis (PCA).¹¹⁻¹³ The vector corresponding to the direction of maximal variance is determined for the full-dimensional data set. Subject to the constraint of orthonormality, the procedure is repeated for the direction of next-largest variance as many times as suits the problem, usually until an acceptable fraction of the total variance is captured. The spectra are then represented by their expansion coefficients (known as scores) in the new set of basis vectors (known as factors, or specifically PC1, PC2, etc.). Figure 2 contains scatter plots of the scores of the first three PCs for all the classes and regions. As expected, region 1 shows the best class separability. Region 2 shows perhaps the least, with the other regions being more or less intermediate cases. The lack of concreteness in these deductions motivates us to consider more quantitative approaches.
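A minimal PCA sketch in Python with NumPy is shown below. It assumes the spectra are mean-centered before the decomposition (the text does not state the preprocessing) and obtains the orthonormal factors from a singular value decomposition, which yields the same directions of successive maximal variance as the procedure described above. The function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(spectra, n_components=3):
    """Return PC scores, factors, and explained-variance fractions.

    spectra: array of shape (n_samples, n_points), one spectrum per row.
    Mean-centering is an assumption of this sketch.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)  # center each wavenumber channel
    # Rows of Vt are orthonormal directions of decreasing variance (PC1, PC2, ...)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    factors = Vt[:n_components]
    scores = Xc @ factors.T  # expansion coefficients ("scores")
    explained = s**2 / np.sum(s**2)  # fraction of total variance per PC
    return scores, factors, explained[:n_components]
```

The three columns of `scores` are what a three-dimensional scatter plot such as Fig. 2 would display, one point per spectrum.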

FIG. 2. Three-dimensional scatter plots of the scores of the first three PCs in each of the five spectral regions. The regions are specified in the text and have the same position in this figure as in Fig. 1. Each point represents the spectrum of an alkane, an alcohol, or a ketone/aldehyde, with a different plot symbol for each class.

FIG. 3. Two examples of the one-dimensional distribution of scores produced through LDA. Complete LDA results are contained in Fig. 4 and Table IIA.

Linear Discriminant Analysis. One way to quantify separability between two classes is linear discriminant analysis (LDA).¹³ In this approach the class identities of the spectra must be specified at the outset of the calculation. A single axis is determined that maximizes the separation between any two classes. A limitation of LDA is that the covariance matrix encountered in the calculation must be nonsingular. This means that the computation requires at least as many spectra as the dimension of the spectral space. But statisticians warn that a high-dimensional covariance matrix (a few hundred dimensions if we use the full spectra) is not truly valid unless there are an impractically large number of samples. Since the number of samples is limited, we chose the alternative of reducing the dimension of the data set. PCA is appropriate for accomplishing this necessary first step (Ref. 14 offers another approach). Figure 3 shows two examples of LDA applied to a five-dimensional data set. The result is essentially identical when a three-dimensional data set is used. LDA has further reduced the representation of the data to a single dimension. We see in Fig. 3 that alkanes and alcohols are much more separated in region 1 than in region 2. This degree of separation may be quantified as a separation metric by taking the ratio of the absolute value of the difference of the means to the square root of the pooled variance. This approach, of course, implies an assumption of normality, which is not likely to be strictly valid. Nevertheless, the assumption is convenient and probably has little deleterious effect on the validity of our deductions. Figure 4 plots the LDA-based separation metric for the five spectral regions and the three pairwise class comparisons (#1 alkanes vs. alcohols; #2 alkanes vs. ketones/aldehydes; #3 alcohols vs. ketones/aldehydes). The numerical data contained in Fig. 4 are also presented in Table IIA. Again region 1 is shown to have the best class separation. We see clearly that regions 3, 4, and 5 have sufficient separation to be considered for sensor applications. Region 2 is the least useful. Even so, region 2 is very popular (perhaps inappropriately so) for the detection of organic compounds.
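The two-class LDA step and the separation metric described above can be sketched as follows. The sketch assumes the inputs are PCA scores (for example three to five dimensions) and uses the Fisher discriminant direction computed from the pooled within-class covariance; the function name `lda_separation` is hypothetical, and this illustrates the technique rather than the S-Plus routines used in the paper.

```python
import numpy as np


def lda_separation(class_a, class_b):
    """Fisher LDA for two classes, then |mean difference| / sqrt(pooled variance).

    class_a, class_b: 2-D arrays of shape (n_i, d) holding PCA scores.
    """
    A = np.asarray(class_a, dtype=float)
    B = np.asarray(class_b, dtype=float)
    na, nb = len(A), len(B)
    cov_a = np.atleast_2d(np.cov(A, rowvar=False))
    cov_b = np.atleast_2d(np.cov(B, rowvar=False))
    # Pooled within-class covariance; must be nonsingular (hence the PCA step)
    S_pooled = ((na - 1) * cov_a + (nb - 1) * cov_b) / (na + nb - 2)
    # Fisher discriminant axis
    w = np.linalg.solve(S_pooled, A.mean(axis=0) - B.mean(axis=0))
    pa, pb = A @ w, B @ w  # one-dimensional projections (cf. Fig. 3)
    pooled_var = ((na - 1) * pa.var(ddof=1) +
                  (nb - 1) * pb.var(ddof=1)) / (na + nb - 2)
    return float(abs(pa.mean() - pb.mean()) / np.sqrt(pooled_var))
```

With the Fisher axis and a pooled covariance S_p as used here, the metric reduces algebraically to the Mahalanobis distance between the class means, sqrt(Δᵀ S_p⁻¹ Δ) with Δ the difference of the means, which gives a convenient check on an implementation.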

Other Measurements of Class Separation. Table II includes data for three additional separation metrics, without an accompanying figure comparable to Fig. 4. All are calculated after the data set has been reduced to three dimensions via PCA. We find that all metrics show trends similar to those in Fig. 4 and Table IIA.


FIG. 4. A three-dimensional plot showing the class separation metric derived from LDA (as explained in the text) for all spectral regions and pairwise class comparisons. The class pair labels have the following meaning: (1) alkanes vs. alcohols, (2) alkanes vs. ketones/aldehydes, and (3) alcohols vs. ketones/aldehydes.

TABLE II. Summary of results for the various class separation metrics defined in the text. Larger numbers always mean more separation, but the precise nature of the scale may be poorly defined (see text). The classes are compared pairwise, where a = alkanes, b = alcohols, and c = ketones/aldehydes.

Spectral region (cm⁻¹)        a–b     a–c     b–c

A. Separation metrics based on LDA.
1. 500–2500                   13      13      15
2. 2500–3900                  4.6     2.3     4.5
3. 3800–5500                  6.5     4.3     6.5
4. 5200–7200                  3.3     4.5     6.1
5. 6300–10,500                17      2.8     9.0

B. Separation metrics based on the first term of the Bhattacharyya distance, Eq. 1.
1. 500–2500                   11      20      7.2
2. 2500–3900                  2.1     0.5     2.2
3. 3800–5500                  2.7     1.4     1.8
4. 5200–7200                  1.3     2.3     3.8
5. 6300–10,500                14      0.8     9.3

C. Separation metrics based on scatter matrices in Eq. 2.
1. 500–2500                   3.3     4.0     2.8
2. 2500–3900                  1.8     0.9     1.7
3. 3800–5500                  2.0     1.5     1.6
4. 5200–7200                  1.4     2.0     2.3
5. 6300–10,500                3.5     1.2     3.0

D. Separation metrics based on scatter matrices in Eq. 3.
1. 500–2500                   27      50      17
2. 2500–3900                  7.5     4.4     7.1
3. 3800–5500                  8.9     6.3     6.8
4. 5200–7200                  5.8     8.6     11
5. 6300–10,500                32      5.1     22

Table IIB contains results for the first term of the Bhattacharyya distance.¹⁵ We are familiar in one dimension with the approach of normalizing the difference between the class means to the within-class standard deviation or variance (see the discussion of Fig. 3, above). The Bhattacharyya distance is the rigorous generalization of this calculation, assuming normality, to higher dimensions. Like LDA, it makes use of covariance matrices that must be nonsingular.

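$$ B = \frac{1}{8}\,(M_2 - M_1)^{T}\left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1}(M_2 - M_1) + \frac{1}{2}\ln\frac{\left|\frac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}} \qquad (1) $$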

where the subscripts 1 and 2 designate the classes, superscript T indicates the transpose, |···| indicates the determinant, M_i is the mean vector (i.e., the mean spectrum) of class i, and Σ_i is its covariance matrix. The estimated covariance matrix is calculated according to

$$ \Sigma_i = \frac{1}{N_i - 1}\sum_{k=1}^{N_i}(X_k - M_i)(X_k - M_i)^{T} $$

where X_k is the vector representing the kth spectrum of class i and N_i is the number of spectra in that class. The first term in Eq. 1 is the distance between the two class means normalized to their variance, while the second term captures the component of class separation related to unequal within-class variances. The second term is interesting because it reminds us that classes may have the same mean but still be distinguishable to a certain degree by a difference in their distributions. However, in the present application there is a problem with the second term: the samples are too few to provide an estimate of it that differs from zero in the sense of statistical significance. It is better to set the second term to zero (i.e., assume equal variances) than to assume that the separability it indicates is real. Hence, we include only the first term of the Bhattacharyya distance in Table IIB. Note that this term is similar to, but not the same as, the Mahalanobis distance.¹²

TABLE III. The class separation metric based on SEPs obtained from PLS analysis of the simulated data set described in the text. The analyte is hexane in a random background of both alcohols and ketones/aldehydes. In all but the first column, noise has been added.

                              Percent added noise
Spectral region (cm⁻¹)        0       1       2       4
1. 500–2500                   20      20      14      8.3
2. 2500–3900                  6.3     5.5     5.3     5.3
3. 3800–5500                  13      7.7     7.7     4.5
4. 5200–7200                  10      6.3     4.8     4.3
5. 6300–10,500                6.3     6.3     4.8     4.3
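The first term of Eq. 1, together with the covariance estimate defined with it, can be sketched with NumPy as follows. The function names are hypothetical; the second term is computed only for completeness, since the text sets it to zero. Log-determinants are used instead of determinants to avoid numerical underflow or overflow, and the covariance matrices are assumed to be positive definite (nonsingular), as the text requires.

```python
import numpy as np


def class_covariance(X):
    """Sigma_i = (1 / (N_i - 1)) * sum_k (X_k - M_i)(X_k - M_i)^T."""
    X = np.asarray(X, dtype=float)
    D = X - X.mean(axis=0)  # deviations from the class mean M_i
    return D.T @ D / (len(X) - 1)


def bhattacharyya_terms(X1, X2):
    """Return (first_term, second_term) of Eq. 1 for two classes of scores."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    dM = X2.mean(axis=0) - X1.mean(axis=0)
    S1, S2 = class_covariance(X1), class_covariance(X2)
    S = (S1 + S2) / 2.0
    first = dM @ np.linalg.solve(S, dM) / 8.0
    # slogdet returns (sign, log|det|); signs are +1 for positive-definite input
    _, logdet_s = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    second = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return float(first), float(second)
```

When two classes have identical covariance matrices the second term vanishes, which matches the equal-variance assumption adopted for Table IIB.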

Table IIC contains results for one type of scatter-matrix calculation,¹⁵ which is defined by the following equation:

$$ SM_1 = -\ln\left|S_M^{-1} S_W\right| = \ln\frac{|S_M|}{|S_W|} \qquad (2) $$

where S_W is the within-class scatter matrix, which is simply the mean of the class covariance matrices weighted by the number of samples in each class, and S_M is the mixture scatter matrix, which is simply the covariance matrix of all the samples regardless of their class assignments.

Table IID contains results for a second type of scatter-matrix calculation,¹⁵ which is defined by the following equation:

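$$ SM_2 = \operatorname{tr}\left[\left(S_M^{-1} S_W\right)^{-1}\right] = \operatorname{tr}\left(S_W^{-1} S_M\right) \qquad (3) $$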

The approach in Eqs. 2 and 3 is to take the ratio of the within-class variance to the total variance and then reduce the resulting matrix to a single number by employing a measure of the size of the matrix. Both the determinant and the trace can serve this function. The general trends of Tables IIA and IIB are repeated when Eq. 2 or 3 is used as the separation metric.
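A sketch of the two scatter-matrix metrics follows. It assumes the readings SM_1 = ln(|S_M|/|S_W|) and SM_2 = tr(S_W⁻¹S_M), under which larger values mean more separation as stated for Table II; S_W is the sample-size-weighted mean of the class covariance matrices and S_M is the covariance matrix of all samples pooled together, as defined in the text. The function name `scatter_metrics` is hypothetical.

```python
import numpy as np


def scatter_metrics(classes):
    """Return (SM_1, SM_2) for a list of (n_i, d) score arrays, one per class.

    Assumes SM_1 = ln(|S_M| / |S_W|) and SM_2 = trace(S_W^-1 S_M).
    """
    classes = [np.asarray(c, dtype=float) for c in classes]
    counts = np.array([len(c) for c in classes], dtype=float)
    covs = [np.atleast_2d(np.cov(c, rowvar=False)) for c in classes]
    # Within-class scatter: class covariances weighted by class size
    S_w = sum(n * C for n, C in zip(counts, covs)) / counts.sum()
    # Mixture scatter: covariance of all samples, ignoring class labels
    S_m = np.atleast_2d(np.cov(np.vstack(classes), rowvar=False))
    _, logdet_m = np.linalg.slogdet(S_m)
    _, logdet_w = np.linalg.slogdet(S_w)
    sm1 = float(logdet_m - logdet_w)  # determinant-based size measure
    sm2 = float(np.trace(np.linalg.solve(S_w, S_m)))  # trace-based size measure
    return sm1, sm2
```

For two classes in d dimensions the between-class scatter has rank one, so both metrics are driven by a single eigenvalue λ of S_W⁻¹ times the between-class scatter, roughly ln(1 + λ) for SM_1 and d + λ for SM_2; this is why the two metrics track each other while growing at very different rates.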

PLS Prediction as an Alternative Approach. The method of PLS¹¹ is primarily aimed at prediction rather than class separability. However, PLS results may be folded into the present work by focusing on the precision of determining a component from one class when it is present in a large and varying background of the other classes. As the separation metric, we use the inverse of the cross-validated standard error of prediction (SEP), which is an estimate of the 1σ precision of the determination. The cross-validation procedure consists of constructing a series of predictive models, each with a single calibration point excluded. Each model is then used to predict the concentration of its excluded point as if it were unknown. Finally, the predicted values of the excluded points are compared with their known values. The SEP is the root mean square of these differences.
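The leave-one-out procedure described above can be sketched with a small PLS1 (single-analyte) model fitted by the NIPALS algorithm. This is not Haaland's GRAMS/386 software; the function names `pls1_fit` and `loo_sep` are hypothetical, and mean-centering of the spectra and concentrations is an assumption.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit PLS1 with NIPALS; return a function that predicts y from spectra."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centered spectra and concentrations
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)  # weight vector
        t = E @ w  # scores for this factor
        tt = t @ t
        p = E.T @ t / tt  # spectral loading
        qa = f @ t / tt  # concentration loading
        E = E - np.outer(t, p)  # deflate the spectra
        f = f - qa * t  # deflate the concentrations
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector
    return lambda X_new: y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ b


def loo_sep(X, y, n_factors):
    """Leave-one-out SEP: RMS of (predicted - known) for the excluded samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # exclude sample i from calibration
        predict = pls1_fit(X[keep], y[keep], n_factors)
        residuals.append(predict(X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(residuals))))
```

The quantity returned by `loo_sep` is the root mean square of the leave-one-out prediction errors, matching the SEP definition in the text; its inverse can then serve as the separation metric.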

We chose the problem of determining the concentration of hexane in a background of alcohols and ketones/aldehydes. Twenty-five randomized backgrounds were created by averaging the spectra of these two classes with random weights between 0 and 1. For each simulated sample, the spectrum of hexane was scaled by a known (i.e., not randomized) factor between 0.01 and 0.1 and added to one of the randomized background spectra. A cross-validated PLS calibration was done with these 25 simulated samples. The SEPs for the five spectral regions were as follows: region 1 = 0.005, region 2 = 0.016, region 3 = 0.008, region 4 = 0.010, and region 5 = 0.016, where the units are fractions of the reference hexane spectrum, which is scaled from 0 to 1 absorbance units. Lower SEP values indicate better performance. To compare these results with Table II, it is convenient to define the separation metric as (10 × SEP)⁻¹. The PLS results, in terms of this separation metric, are listed in the first column of Table III. The trends are consistent with those in Table II. The PLS approach has the desirable feature that the degree to which one region is superior to another is clearly quantified. For example, the expected precision of the determination in region 3 is twice as good as in region 2 (SEP of 0.008 vs. 0.016).
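The simulated calibration set and the conversion of SEP to the separation metric can be sketched as follows. Several details are assumptions: the backgrounds are formed as weighted averages normalized by the sum of the random weights, the known hexane scale factors are taken as evenly spaced between 0.01 and 0.1, and the names `simulate_samples` and `separation_metric_from_sep` are hypothetical.

```python
import numpy as np


def simulate_samples(analyte, interferents, n_samples=25, seed=0):
    """Analyte spectrum at known levels added to random interferent backgrounds.

    analyte: (n_points,) reference spectrum scaled 0..1 (e.g. hexane).
    interferents: (n_spectra, n_points) spectra of the other classes.
    """
    rng = np.random.default_rng(seed)
    analyte = np.asarray(analyte, dtype=float)
    interferents = np.asarray(interferents, dtype=float)
    weights = rng.uniform(0.0, 1.0, size=(n_samples, len(interferents)))
    # Weighted average of the interferent spectra (one background per sample)
    backgrounds = weights @ interferents / weights.sum(axis=1, keepdims=True)
    conc = np.linspace(0.01, 0.1, n_samples)  # known, non-random levels
    return backgrounds + np.outer(conc, analyte), conc


def separation_metric_from_sep(sep):
    """Separation metric used for Table III: (10 * SEP)^-1."""
    return 1.0 / (10.0 * sep)
```

For example, the region 1 SEP of 0.005 gives (10 × 0.005)⁻¹ = 20, and the region 2 SEP of 0.016 gives 6.25, which rounds to the 6.3 listed in Table III.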

Also included in Table III are the results when noise is added to the set of 25 modeled spectra. We see that the separation metric is diminished by noise, but not in the same proportion in each spectral region. This is an important observation for practical engineering decisions. As a matter of related interest, we have found that adding a reasonable amount of spectral noise to the calculations of Table II has essentially no effect on the numerical results. This is because the noise-induced variance is small in comparison with the other variances considered in the calculation.

There is a subtlety buried in Table III that deserves comment. The SEPs calculated by the PLS calibration runs depend on the number of factors retained in the model. In Table III the number of factors decreases from 5 to 3 with increasing noise, according to the following rule: the number of factors is lowered until the SEP for the full calibration (i.e., no samples left out for cross-validation) divided by the SEP for the cross-validated calibration is greater than 0.60. This condition avoids overfitting the data. The specific number 0.60 is, in our experience, appropriate for about 25 samples. The rigorous justification for this procedure is still a research topic.
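The factor-selection rule above can be written as a short loop. The SEP values are assumed to be precomputed for each candidate number of factors (from a calibration on all samples and from a cross-validated calibration); the function name and the fallback of returning one factor when the rule is never satisfied are assumptions, not part of the text.

```python
def choose_n_factors(sep_full, sep_cv, max_factors, threshold=0.60):
    """Lower the factor count until SEP_full / SEP_cv exceeds the threshold.

    sep_full, sep_cv: mappings from number of factors to SEP (hypothetical
    inputs). Returns 1 if the rule is never satisfied (an assumed fallback).
    """
    for k in range(max_factors, 0, -1):  # start high and work downward
        if sep_full[k] / sep_cv[k] > threshold:
            return k
    return 1
```

The intuition is that a full-calibration SEP much smaller than the cross-validated SEP means the model fits its own calibration samples far better than it predicts excluded ones, which is the symptom of overfitting the rule guards against.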

Additional Comments. Considering Tables II and III, we see that the various types of separation metrics yield the same trends. Some metrics seem to have a nonlinear scale and tend to exaggerate the changes from region to region. The general finding is that the separability of classes is greatest in region 1 and least in region 2, with the others intermediate. One exception to this general rule is that the specificity for alcohols is greatest in region 5.

Table IIA has one of the more convenient quantitative interpretations: the class means differ by the tabulated number of standard deviations. A separation of three standard deviations already gives very little overlap for normally distributed classes. Hence, even the most overlapped classes in Table IIA are actually well separated. This result is a bit counterintuitive since, in Fig. 1, the classes have quite similar spectra in some regions. Nevertheless, multivariate analysis is capable of making distinctions that are not clearly evident to the eye. Hence, we anticipate that near-IR spectroscopic sensors will, for most applications, have the specificity required to determine organic compounds in the midst of interfering backgrounds. This expectation assumes that the sensor can be constructed to take advantage of multivariate data analysis and is not limited to one or two wavelengths. With this multivariate approach, building a sensor to function in the fingerprint region (region 1) may not be a good engineering choice. The instrumentation is expensive, and the added specificity may not be required. In addition, the CH region (region 2) is usually a poor choice, since the specificity is low but the cost is high. Our analysis shows that the near-IR spectral region offers reasonable specificity for the type of problem considered. Assuming that the promise of low-cost sensors is realized, applications in the near-IR should continue to increase.

ACKNOWLEDGMENTS

This work was supported by the U.S. Department of Energy under contract number DE-AC04-94AL85000. Discussions with Edward V. Thomas, a statistician at Sandia National Laboratories, were beneficial. Also, Jason Harper, a high school intern, contributed by producing a list of compounds that are common to both spectral libraries.

1. A. Fong and G. M. Hieftje, Appl. Spectrosc. 49, 1261 (1995).
2. I. Schneider, G. N. Trude, V. V. King, and I. D. Aggarwal, IEEE Photonics Technol. Lett. 7, 1041 (1995).
3. A. S. Bonanno and P. R. Griffiths, Appl. Spectrosc. 49, 1598 (1995).
4. B. R. Stallard, M. J. Garcia, and S. Kaushik, Appl. Spectrosc. 50, 334 (1996).
5. Making Light Work: Advances in Near Infrared Spectroscopy, I. Murray and I. A. Cowe, Eds. (VCH, Weinheim, 1992).
6. Near-Infrared Technology in the Agricultural and Food Industries, P. Williams and K. Norris, Eds. (American Association of Cereal Chemists, St. Paul, Minnesota, 1987).
7. P. V. Lambeck, Sensors and Actuators B 8, 103 (1992).
8. D. P. Saini and S. L. Coulter, Photonics Spectra 30, 91 (March 1996).
9. E. Stark, SPIE Proc. 1575, 70 (1991).
10. M. Buback and H. P. Vögele, FT-NIR Atlas (VCH, Weinheim, 1993). Available in digital form since July 1995 from Chemical Concepts, Weinheim, Germany.
11. H. Martens and T. Næs, Multivariate Calibration (John Wiley and Sons, Chichester, 1989).
12. D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte, and L. Kaufman, Chemometrics: A Textbook (Elsevier, Amsterdam, 1988).
13. R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis (Prentice-Hall, Englewood Cliffs, New Jersey, 1992).
14. P. Jonathan, W. V. McCarthy, and A. M. I. Roberts, J. Chemomet. 10, 189 (1996).
15. K. Fukunaga, Introduction to Statistical Pattern Recognition (Academic Press, San Diego, California, 1990).