dynamiclocalizedsnv,peaksnv,andpartialpeaksnv...
TRANSCRIPT
Research ArticleDynamic Localized SNV Peak SNV and Partial Peak SNVNovel Standardization Methods for Preprocessing ofSpectroscopic Data Used in Predictive Modeling
Emily Grisanti 12 Maria Totska2 Stefan Huber2 Christina Krick Calderon2
Monika Hohmann2 Dominic Lingenfelser2 and Matthias Otto1
1Institute of Analytical Chemistry TU Bergakademie Freiberg Leipziger Str 29 09599 Freiberg Germany2Robert Bosch GmbH Renningen 70465 Stuttgart Germany
Correspondence should be addressed to Emily Grisanti emilygrisantideboschcom
Received 15 March 2018 Accepted 10 July 2018 Published 28 October 2018
Academic Editor K S V Krishna Rao
Copyright copy 2018 Emily Grisanti et al(is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
An essential part of multivariate analysis in spectroscopic context is preprocessing (e aim of preprocessing is to removescattering phenomena or disturbances in the spectra due to measurement geometry in order to improve subsequent predictivemodels Especially in vibrational spectroscopy the Standard Normal Variate (SNV) transformation has become very popular andis widely used in many practical applications but standardization is not always ideal when performed across the full spectrumHerein three different new standardization techniques are presented that apply SNV to defined regions rather than to the fullspectrum Dynamic Localized SNV (DLSNV) Peak SNV (PSNV) and Partial Peak SNV (PPSNV) DLSNV is an extension of theLocalized SNV (LSNV) which allows a dynamic starting point of the localized windows on which the SNV is executed in-dividually Peak and Partial Peak SNV are based on picking regions from the spectra with a high correlation to the target value andperform SNV on these essential regions to ensure optimal scatter correction All proposed methods are able to significantlyimprove the model performance in cross validation and robustness tests compared to SNV (e prediction errors could bereduced by up to 16 and 29 compared with LSNV for two regression models
1 Introduction
Chemometric approaches are becoming increasingly pop-ular as they enable more comprehensive extraction of rel-evant information out of complex data provided by moderninstrumental analytics At the same time advances in dataanalysis make it possible to reduce the size of the instrumenthardware by compensating for the missing measurementquality of miniaturized instruments In combination withmultivariate calibration the development of models basedon low-cost analytics such as vibrational spectroscopy al-lows the development of models that predict parametersusually determined with cost-intensive measuring in-struments or complex methods Monitoring the alcoholicfermentation [1] and determining the viscosity of engine oil[2 3] or proteins in milk [4] by spectroscopic means becomethus feasible It has also been possible to determine specific
viscosity modifiers and pour point depressant additivecompounds in engine oils [5] by FTIR which is due to thefact that the concentration of a component followsaccording to the LambertndashBeer Law a linear dependency onthe light absorbance of the medium [6 7]
Preprocessing methods play a decisive role for the per-formance of these models as spectra can be influenced byvarious disturbing factors that interfere with the significance ofthe measurement [8ndash11] (e main influence comes from themeasuring geometry which includes the sample thickness thedistance from the detector to sample the contact pressure andthe angle from the light source to sample [12 13] (eelimination of scattering effects by particles of different sizeand distribution also plays a major role in preprocessing
Different spectroscopic measurement techniques sufferfrom different major disturbing factors In near-infraredspectroscopy it is usually a constant or linear baseline
HindawiJournal of SpectroscopyVolume 2018 Article ID 5037572 14 pageshttpsdoiorg10115520185037572
offset due to scattering light Raman spectra often showpolynomial fluorescence background and for mid-infraredspectra the sample thickness and thus the spectroscopicresponse plays a crucial role [14 15] (e information aboutthe sample is present in the shape of the spectrum andindependent of the offset (additive effect) and the scaling ofthe absolute signal intensity (multiplicative effect) (e taskof preprocessing is to remove these interfering factors fromthe informative part of the spectrum and there are differentapproaches for this
A method for eliminating constant offset terms is tocalculate the first derivative [9] (is procedure can be ex-tended to higher-order derivatives also eliminating offsetterms with linear or quadratic baseline curves (e disad-vantage of calculating the deviation of a spectrum is thatnoise effects are amplified
Multiplicative signal correction (MSC) is another toolwhich can deal with the two major effects A referencespectrum in most cases represented by the mean spectrumof the calibration data set is defined and the spectra arecorrected for the baseline and the multiplicative amplifi-cation effects [16 17] (e approach is associated with theKubelkandashMunk theory which takes optical phenomenacaused by light scattering into account [18 19] For eachspectrum the two correction parameters are estimated viaa least squares regression calculation
Standard normal variate (SNV) removes a constant offsetterm by subtracting the mean value of the full spectrum andbrings all spectra to the same scale by subsequent division bythe standard deviation of the full spectrum [20] Due to itssimplicity SNV is a popular preprocessing method [21] SNVand MSC usually yield similar results and are often regardedas exchangeable [22] Since no extra regression step is neededfor the SNV transformation to estimate the correction pa-rameters in the following the focus lies on SNV as themodelsshould be kept as simple as possible
Some efforts have been made to optimize standardiza-tion techniques A piecewise MSC (PMSC) method has beenproposed by Isaksson and Kowalski [23] which significantlyimproved the predictive power of several regression modelsbased on near-infrared transmittance spectra A LocalizedSNV (LSNV) approach has been introduced by Bi et alperforming the SNV not on the full spectrum but on sub-sequent sequences [24] (is strategy also yielded verypromising results in several regression cases based onbenchmark NIR data sets In the following a dynamicversion of the LSNV algorithm called DLSNV is presentedBy allowing for a dynamic starting point of the first andsubsequent SNV windows it is more flexible to align theSNV to important vibrational bands in the spectra PSNVand PPSNV are based on the idea that the standardizationcan be optimized when performed on distinct wavenumberwindows across highly specific regions of the spectrum
2 Experimental
As a sample set data originated from an investigation aboutaging and interaction phenomena in Automatic Trans-mission Fluids (ATF) were used Many ATF samples have
been stored for different periods at several temperatures toproduce artificially aged samples
(e aim of the presented study was to transfer in-formation coming from a highly specific costly and com-plex measurement method (High-Performance LiquidChromatography coupled with Quadrupole Time-of-Flight-Mass Spectrometry (HPLC-QToF-MS)) to datameasured with a low-cost flexible tabletop instrument(Fourier-Transform Infrared (FTIR) spectrometer)(is wasachieved by analyzing each sample coming from the storageexperiment and determining the additive response signals inthese samples by HPLC-QToF-MS By using these additiveresponses as reference values a calibration model wascreated in order to be able to predict the concentration of theadditive compounds in the samples by evaluating the FTIRspectra (e new standardization techniques proposed hereare being tested for the regression models
21 Additive Compounds Two additive compounds fromtwo different ATF oils were analyzed
Within ATF A an unsaturated ethoxylated amineknown as friction modifierWithin ATF B a bis-tert-butyl-hydroxytoluene (BHT)derivate known as phenolic antioxidant
22 Samples and Experiments For the investigation ofdegradation phenomena in ATFs a comprehensive storageexperiment had been set up (e effects of different materialson ATFs and the impact of temperature on oil aging should beanalyzed (erefore the ATFs were stored under variousconditions in an oven (ree parameters had been varied thestorage temperature the storage time and added materials(e storage times had been adjusted to the temperatures sothat a comparable load according to Arrhenius Law could beexpected (e parameters are listed in Table 1
For all timetemperature combinations three interactionexperiments have been conducted
(i) storage with pure oil(ii) storage with oil plus copper alloy chips(iii) storage with oil plus chips from copper alloy iron
and PA66
(e samples were prepared by storing 100ml fresh oil ina glass jar with a screw cap (e lid had been manipulatedwith a central hole that allowed air exchange
23 Sample Measurements
231 FTIR (e FTIR spectra were collected in transmis-sion with a Bruker Alpha instrument in combination withthe QuickSnapTM transmission sample compartment inthe wavenumber region ranging from 4000 to 600 cmminus1 witha spectral resolution of 4 cmminus1
(e samples were measured without any special samplepreparation with two different setups (1) a droplet of ATFbetween two potassium bromide (KBr) discs separated by
2 Journal of Spectroscopy
a teflon spacer with the thickness of about 50 μm and (2)fixed KBr cuvette of 100 μm thickness filled with ATF
After each sample measurement the KBr discs and thecuvette were rinsed several times with petroleum ether inorder to prevent cross contamination (e cuvette was driedwith N2 gas after rinsing and the KBr discs were dried un-der ambient air For the measurement type (1) 4 spectra persample were recorded and for type (2) one spectrum persample was recorded
Due to the sample layer thickness the hydrocarbon bandsare saturated and therefore the spectra had to be cut in thewavenumber regions between 3000 and 2815 cmminus1 (C-Hstretching mode) and between 1491 and 1424 cmminus1 (C-Hbending and rocking mode) Additionally the CO2 bandswere eliminated by cutting out the region from 2387 to2285 cmminus1 as well(e spectra of ATFA are shown in Figure 1in transmission without any preprocessing as measured inFigure 1(b) after truncation and SNV transformation and inFigure 1(c) SNV transformed after calculating the absorbancespectra by using A minuslog(T) In Figure 2 the same diagramsare shown for ATF B In both cases two series of curves can bediscriminated from the raw spectra by the eye (e blue seriescomes from measurement type (1) and the red set comesfrom the cuvette measurements (2) To combine the two datasets from the measurement setups (1) and (2) are challengingtasks for a predictive model as the main variance is due to thethickness variation(e data set demonstrates the importanceof suitable and sophisticated preprocessing methods in orderto eliminate the difference in the spectra induced by thevarying sample thickness (e standardization techniquespresented here are able to meet this need
232 Liquid Chromatography Coupled with Mass Spectrometry(e measurements for the determination of the additivecompound signals were performed with an Agilent liquidchromatograph 1260 coupled with a high-resolution QToF6540 mass spectrometer with methanolwaterammoniumacetate and isopropanol as an eluent Ionization was carriedout bymeans of electrospray (ESI)(e final compound peakarea data set was created using the Agilent MassHunterQualitative Analysis B 0600 analysis software
(e response signals of the additive compounds arestandardized by subtracting mean and dividing by standarddeviation in order to bring all signal values on the same scale(e standardized signals are depicted in Figure 3
3 Methods
31 Implementation (e proposed novel standardizationmethods and respective optimization processes were imple-mented via Python scripts
32 Regression AlgorithmmdashRidge For the prediction theridge regression estimator implemented in the Python scikit-learn framework for machine learning applications was used[25] It is a linear model which solves a regression task via theleast squares loss function J(w) with L2 regularization [26]Regularization is an approach to minimize the issue ofoverfitting which is particularly important for high-dimensional data such as FTIR spectra by controlling thequadratic sum of the model coefficient w (is is done byadding the penalizing term L2 weighted by the hyper pa-rameter λ
λw2 λ 1113944m
j1w
2j (1)
(us the loss function is defined as
J(w) 1113944n
i1yi minusyipred1113872 1113873
2+ λw2 (2)
where yi stands for the reference value of the ith sample andyipred for the prediction of this sample Since the perfor-mance of the preprocessing methods has to be assessedindependently from the actually used predictive regressionmodel the same regression model with identical hyper-parameter λ was applied to the various preprocessed datasets For the regression of the friction modifier compound ofATF A λ 5 and for the antioxidant of ATF B λ 3 wasused (ese parameters turned out to be the best choicesregarding cross validation and robustness for the SNVtransformed data set in a previously conducted internalstudy
33 Model Performance Evaluation To assess the perfor-mance of our models two different approaches were chosennamely the predictive power under cross validation andnoise addition
331 Cross Validation For cross validation the mean fromthe different measurements of one sample was calculated(e sample set was randomly divided 50 times into a cali-bration and validation set by taking 70 of the data astraining samples and 30 as test samples in each validationiteration with different combinations Each separation runwas provided with a unique random seed to ensure that thedata set was split into the same training and test sets for eachmodel enabling better comparability of results between thedifferent models
332 Robustness against Noise In order to assess the modelperformance under noisy input spectra the model wascalibrated by the full original data set Random Gaussian-distributed white noise was added to each data point (eseperturbed samples were predicted by the model and theprediction error was monitored (is was done for differ-ent noise levels (e random numbers added to each datapoint were generated by a standard normal distributed(mean μ 0 and standard deviation σ 1) random
TABLE 1
Temperature (degC) Storage time (h)120 500 1000 2000 3000140 105 210 415 625160 25 50 105 165
Journal of Spectroscopy 3
number generator e noise levels were de13ned by thefactors (005 010 015 020 025 030 035 040 and 045)which were multiplied with the output of the randomnumber generator For each noise level 50 simulated noisydata sets were generated and predicted by the pretrainedmodel in order to be able to make well-founded statementsabout the model performance under noise perturbation
e noise robustness workow is a very helpful tool toinvestigate whether a good calibration error is a real advantageor if the model ran into over13tting Using the same regressionalgorithm twice with dishyerent regularization parameters λ thelower regularized model will generate a lower initial calibration
error than the more stringent regularized model But if themodels are tested for robustness the latter tends to have a lowererror slope when the noise level increases
34 Evaluation Metrics e built-in functions R2 score andmean squared error (MSE) of the scikit-learn frameworkwere used as performance metrics
341 Mean Squared Error (MSE) e mean squared error(MSE) of a prediction is calculated by the squared dishyerencesbetween the predicted value yipred and the reference value yi
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
12
10
08
06
04
02
(a)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
) 2
0ndash1ndash2ndash3
1
(b)
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
Figure 1 FTIR spectra of ATF (a) Raw full transmission spectra without any preprocessing e two data sets with dishyerent measurementsetups can be discriminated by eye e blue spectra originate from measurement type (1) with two KBr discs separated by a Teon spacerand the red set of curves originates from the cuvette measurement (2) (b) SNV-transformed transmission spectra after truncation of thesaturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
14
12
10
08
06
02
(a)
04
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)0
ndash1
ndash2
ndash3
1
(b)
Figure 2 FTIR spectra of ATF B (a) Raw full transmission spectra without any preprocessing (b) SNV-transformed transmission spectraafter truncation of the saturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
4 Journal of Spectroscopy
of the ith sample For a given data set with n samples theMSEis the average value over all samples It follows the followingformula [27]
MSE y ypred( ) 1nsumn
i1yi minusyipred( )
2 (3)
e best possible MSE value is 0 and small values aredesirable as the deviation from the correct prediction is lowFrom MSE the root-mean-squared error (RMSE) was cal-culated by taking the square root e RMSE value has thesame dimension as the original reference target values
342 R2 Coecient of Determination R2 describes theportion of the variance in the target values (dependentvariables) that can be predicted from the spectra (in-dependent variables) by the model [28] e best possiblescore for R2 is 10 R2 gets 00 for a constant model whichpredicts a constant value disregarding of the input featuresFor linear regression modeling with intercept R2 is equal tothe square of Pearson correlation coecient between pre-dicted and reference target values [29] For a data setcomprising n samples the R2 score is given as
R2 y ypred( ) 1minussumni1 yi minusyipred( )
2
sumni1 yi minusypred( )2 (4)
where yipred is the model prediction of the ith sample whichhas a reference value yi and ypred is the mean value of allpredictions
ypred 1nsumn
i1yipred (5)
35 Standard Normal Variate Each spectrumx (x1 x2 xk) with k measured data points istransformed to the standardized form z (z1 z2 zk)by bringing the spectra to zero mean and unit varianceFor this purpose the mean spectrum x is subtracted fromeach data point xi and divided by the standard deviation
zi xi minus x
sumkj xi minusx( )2kradic (6)
with
x 1ksumk
j
xj (7)
351 Dynamic Localized SNV (DLSNV) e DLSNVworkow is based on the SNV-transformed spectra data set(Figure 4(a)) To calculate the DLSNV data the spectra aredivided into multiple regions On each of these regionsstandardization is performed To adjust the windows toimportant areas in the spectrum a starting point can bede13ned In Figure 4(b) the DLSNV spectra are shown witha starting point of 100 and a window size of 300 pixels
DLSNV algorithm
(i) Perform SNV on a window of the spectrum rangingfrom 13rst data point to the sth one
(ii) Subdivide spectra from sth data point into windowsof all the same size ws
To optimize the two parameters window size ws andstarting point s a three-step approach is performed In eachstep the predictive power of the model is assessed via the
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
2
1
0
ndash1
ndash2
ndash3
ndash4
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(a)
2
1
0
ndash1
ndash2
ndash3
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(b)
Figure 3 Standardized additive responses used as target value for the FTIR regressionmodel for (a) the friction modi13er compound and (b)the antioxidant plotted against time for storage temperature 140degC for all three storage experiments measured by HPLC-QToF
Journal of Spectroscopy 5
coecient of determination R2 e prediction performanceof the chosen window size in combination with the re-gression model is benchmarked by 13tting the same model tothe single SNV data indicated by a red line in Figure 4(c)e optimization steps can be summarized as follows
(1) Perform LSNV with window sizes from 50 to 500pixels and determine R2 for all window sizes Findthe optimal window size wsopt1
(2) Perform LSNV with optimal window size of step 1wsopt 1 vary the starting point from 0 to 2middotwsopt 1 andselect the optimal starting point sopt
(3) Perform LSNV with optimal starting point sopt withwindow sizes from 50 to 2middotwsopt 1 in order to 13nd thebest combination of window size wsopt 2 and startingpoint sopt
In Figure 4(d) the 13nal DLSNV spectra after optimi-zation are shown Note that jumps can occur between theindividual standardization windows since the mean value ofthis current window is subtracted for each window How-ever this does not ashyect the regression model
In Figure 5(a) the ATF A samples are shown withSNVperformed on the entire spectral region and in Figure 5(b)the same spectra are depicted after DLSNV optimizationFigure 5(c) shows a zoom-in view of the highlighted region ofFigure 5(a) and in Figure 5(d) the same region is depicted afterDLSNV optimization e baseline is removed for the exactspectra sequence and thus peaks are aligned in a way that thedishyerent aging levels of the samples can already be recognizedby eyee shown snipped spectrum is the phenolic antioxidantregionus the decrease of this band can be associatedwith theaging level Magenta indicates (relatively) fresh sampleswhereas red indicates a strong degradation level
352 Peak SNV e idea behind the Peak SNV method isto standardize the important areas of the spectrum in-dependently of each other e optimization workow forPSNV is shown in Figure 6 starting from the single SNVtransformed data set Data points with a high correlationwith the target values (points of interest POI) are selected
(Figure 6(a)) and the SNV transformation is performed onwindows around the centroids Once the POIs are identi13edthe PSNV transformation is conducted as follows
PSNV algorithm
(i) Subdivide spectra into sequences ranging from halfthe distance from the previous POI to half thedistance to the next one (Figure 6(b)) SNV isperformed across these windows
To 13nd the POI an initial regression model is 13tted tothe data In order to identify important regions of thespectra the model coecients are assessed e normalizedabsolute values of the coecient vector are fed into a peak-picking algorithm Since it may occur that POIs are in closeproximity an agglomeration of the POIs is conducted inorder to prevent from very narrow standardization windowsPeak centroids are calculated via the mean value of thecombined POIs e task for the optimization process is to13nd the best window for POI agglomeration aggopt whichis done by analyzing the calibration R2 for each agglom-eration window and picking the window size with maximalcorrelation between the predicted and reference target(Figure 6(c)) e steps are summarized as follows
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01(3) Combine peaks which are within a certain window
agg and calculate the centroid of the agglomeratedPOIs
(4) Perform PSNV across the centroid of the POIs(5) Evaluate performance via R2 for agg between 10 and
50 data points and choose aggopt according tomaximal R2
After optimization each window has an individualwindow size and range over the peak centroid of im-portant signals in the spectrum On these windows SNVtransformation provides an optimal baseline and scattereshyect removal e optimized spectrum is shown inFigure 6(d)
SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Window SNV
Wavenumbers (cmndash1)2500
Find optimalwindow size and
start point
(b)(a) (c)
Final window SNV
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)
095
095
095
097
097
097
R2 Optimization process
R2
R2
Window size 1st run
Starting point
Window size 2nd run
100 200 300 400 500
0 20 40 60 80 100 120
50 60 70 80 90 100 110 120
Figure 4 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Single SNV (b) Dynamic LocalizedSNV with starting point 100 and window 300 for visualization (c) three-stage optimization process for window starting point and 13nalwindow optimization and (d) optimized DLSNV
6 Journal of Spectroscopy
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
offset due to scattering light Raman spectra often showpolynomial fluorescence background and for mid-infraredspectra the sample thickness and thus the spectroscopicresponse plays a crucial role [14 15] (e information aboutthe sample is present in the shape of the spectrum andindependent of the offset (additive effect) and the scaling ofthe absolute signal intensity (multiplicative effect) (e taskof preprocessing is to remove these interfering factors fromthe informative part of the spectrum and there are differentapproaches for this
A method for eliminating constant offset terms is tocalculate the first derivative [9] (is procedure can be ex-tended to higher-order derivatives also eliminating offsetterms with linear or quadratic baseline curves (e disad-vantage of calculating the deviation of a spectrum is thatnoise effects are amplified
Multiplicative signal correction (MSC) is another toolwhich can deal with the two major effects A referencespectrum in most cases represented by the mean spectrumof the calibration data set is defined and the spectra arecorrected for the baseline and the multiplicative amplifi-cation effects [16 17] (e approach is associated with theKubelkandashMunk theory which takes optical phenomenacaused by light scattering into account [18 19] For eachspectrum the two correction parameters are estimated viaa least squares regression calculation
Standard normal variate (SNV) removes a constant offsetterm by subtracting the mean value of the full spectrum andbrings all spectra to the same scale by subsequent division bythe standard deviation of the full spectrum [20] Due to itssimplicity SNV is a popular preprocessing method [21] SNVand MSC usually yield similar results and are often regardedas exchangeable [22] Since no extra regression step is neededfor the SNV transformation to estimate the correction pa-rameters in the following the focus lies on SNV as themodelsshould be kept as simple as possible
Some efforts have been made to optimize standardiza-tion techniques A piecewise MSC (PMSC) method has beenproposed by Isaksson and Kowalski [23] which significantlyimproved the predictive power of several regression modelsbased on near-infrared transmittance spectra A LocalizedSNV (LSNV) approach has been introduced by Bi et alperforming the SNV not on the full spectrum but on sub-sequent sequences [24] (is strategy also yielded verypromising results in several regression cases based onbenchmark NIR data sets In the following a dynamicversion of the LSNV algorithm called DLSNV is presentedBy allowing for a dynamic starting point of the first andsubsequent SNV windows it is more flexible to align theSNV to important vibrational bands in the spectra PSNVand PPSNV are based on the idea that the standardizationcan be optimized when performed on distinct wavenumberwindows across highly specific regions of the spectrum
2 Experimental
As a sample set data originated from an investigation aboutaging and interaction phenomena in Automatic Trans-mission Fluids (ATF) were used Many ATF samples have
been stored for different periods at several temperatures toproduce artificially aged samples
(e aim of the presented study was to transfer in-formation coming from a highly specific costly and com-plex measurement method (High-Performance LiquidChromatography coupled with Quadrupole Time-of-Flight-Mass Spectrometry (HPLC-QToF-MS)) to datameasured with a low-cost flexible tabletop instrument(Fourier-Transform Infrared (FTIR) spectrometer)(is wasachieved by analyzing each sample coming from the storageexperiment and determining the additive response signals inthese samples by HPLC-QToF-MS By using these additiveresponses as reference values a calibration model wascreated in order to be able to predict the concentration of theadditive compounds in the samples by evaluating the FTIRspectra (e new standardization techniques proposed hereare being tested for the regression models
21 Additive Compounds Two additive compounds fromtwo different ATF oils were analyzed
Within ATF A an unsaturated ethoxylated amineknown as friction modifierWithin ATF B a bis-tert-butyl-hydroxytoluene (BHT)derivate known as phenolic antioxidant
22 Samples and Experiments For the investigation ofdegradation phenomena in ATFs a comprehensive storageexperiment had been set up (e effects of different materialson ATFs and the impact of temperature on oil aging should beanalyzed (erefore the ATFs were stored under variousconditions in an oven (ree parameters had been varied thestorage temperature the storage time and added materials(e storage times had been adjusted to the temperatures sothat a comparable load according to Arrhenius Law could beexpected (e parameters are listed in Table 1
For all timetemperature combinations three interactionexperiments have been conducted
(i) storage with pure oil(ii) storage with oil plus copper alloy chips(iii) storage with oil plus chips from copper alloy iron
and PA66
(e samples were prepared by storing 100ml fresh oil ina glass jar with a screw cap (e lid had been manipulatedwith a central hole that allowed air exchange
23 Sample Measurements
231 FTIR (e FTIR spectra were collected in transmis-sion with a Bruker Alpha instrument in combination withthe QuickSnapTM transmission sample compartment inthe wavenumber region ranging from 4000 to 600 cmminus1 witha spectral resolution of 4 cmminus1
(e samples were measured without any special samplepreparation with two different setups (1) a droplet of ATFbetween two potassium bromide (KBr) discs separated by
2 Journal of Spectroscopy
a teflon spacer with the thickness of about 50 μm and (2)fixed KBr cuvette of 100 μm thickness filled with ATF
After each sample measurement the KBr discs and thecuvette were rinsed several times with petroleum ether inorder to prevent cross contamination (e cuvette was driedwith N2 gas after rinsing and the KBr discs were dried un-der ambient air For the measurement type (1) 4 spectra persample were recorded and for type (2) one spectrum persample was recorded
Due to the sample layer thickness the hydrocarbon bandsare saturated and therefore the spectra had to be cut in thewavenumber regions between 3000 and 2815 cmminus1 (C-Hstretching mode) and between 1491 and 1424 cmminus1 (C-Hbending and rocking mode) Additionally the CO2 bandswere eliminated by cutting out the region from 2387 to2285 cmminus1 as well(e spectra of ATFA are shown in Figure 1in transmission without any preprocessing as measured inFigure 1(b) after truncation and SNV transformation and inFigure 1(c) SNV transformed after calculating the absorbancespectra by using A minuslog(T) In Figure 2 the same diagramsare shown for ATF B In both cases two series of curves can bediscriminated from the raw spectra by the eye (e blue seriescomes from measurement type (1) and the red set comesfrom the cuvette measurements (2) To combine the two datasets from the measurement setups (1) and (2) are challengingtasks for a predictive model as the main variance is due to thethickness variation(e data set demonstrates the importanceof suitable and sophisticated preprocessing methods in orderto eliminate the difference in the spectra induced by thevarying sample thickness (e standardization techniquespresented here are able to meet this need
232 Liquid Chromatography Coupled with Mass Spectrometry(e measurements for the determination of the additivecompound signals were performed with an Agilent liquidchromatograph 1260 coupled with a high-resolution QToF6540 mass spectrometer with methanolwaterammoniumacetate and isopropanol as an eluent Ionization was carriedout bymeans of electrospray (ESI)(e final compound peakarea data set was created using the Agilent MassHunterQualitative Analysis B 0600 analysis software
(e response signals of the additive compounds arestandardized by subtracting mean and dividing by standarddeviation in order to bring all signal values on the same scale(e standardized signals are depicted in Figure 3
3 Methods
31 Implementation (e proposed novel standardizationmethods and respective optimization processes were imple-mented via Python scripts
32 Regression AlgorithmmdashRidge For the prediction theridge regression estimator implemented in the Python scikit-learn framework for machine learning applications was used[25] It is a linear model which solves a regression task via theleast squares loss function J(w) with L2 regularization [26]Regularization is an approach to minimize the issue ofoverfitting which is particularly important for high-dimensional data such as FTIR spectra by controlling thequadratic sum of the model coefficient w (is is done byadding the penalizing term L2 weighted by the hyper pa-rameter λ
λw2 λ 1113944m
j1w
2j (1)
(us the loss function is defined as
J(w) 1113944n
i1yi minusyipred1113872 1113873
2+ λw2 (2)
where yi stands for the reference value of the ith sample andyipred for the prediction of this sample Since the perfor-mance of the preprocessing methods has to be assessedindependently from the actually used predictive regressionmodel the same regression model with identical hyper-parameter λ was applied to the various preprocessed datasets For the regression of the friction modifier compound ofATF A λ 5 and for the antioxidant of ATF B λ 3 wasused (ese parameters turned out to be the best choicesregarding cross validation and robustness for the SNVtransformed data set in a previously conducted internalstudy
33 Model Performance Evaluation To assess the perfor-mance of our models two different approaches were chosennamely the predictive power under cross validation andnoise addition
331 Cross Validation For cross validation the mean fromthe different measurements of one sample was calculated(e sample set was randomly divided 50 times into a cali-bration and validation set by taking 70 of the data astraining samples and 30 as test samples in each validationiteration with different combinations Each separation runwas provided with a unique random seed to ensure that thedata set was split into the same training and test sets for eachmodel enabling better comparability of results between thedifferent models
332 Robustness against Noise In order to assess the modelperformance under noisy input spectra the model wascalibrated by the full original data set Random Gaussian-distributed white noise was added to each data point (eseperturbed samples were predicted by the model and theprediction error was monitored (is was done for differ-ent noise levels (e random numbers added to each datapoint were generated by a standard normal distributed(mean μ 0 and standard deviation σ 1) random
TABLE 1
Temperature (degC) Storage time (h)120 500 1000 2000 3000140 105 210 415 625160 25 50 105 165
Journal of Spectroscopy 3
number generator e noise levels were de13ned by thefactors (005 010 015 020 025 030 035 040 and 045)which were multiplied with the output of the randomnumber generator For each noise level 50 simulated noisydata sets were generated and predicted by the pretrainedmodel in order to be able to make well-founded statementsabout the model performance under noise perturbation
e noise robustness workow is a very helpful tool toinvestigate whether a good calibration error is a real advantageor if the model ran into over13tting Using the same regressionalgorithm twice with dishyerent regularization parameters λ thelower regularized model will generate a lower initial calibration
error than the more stringent regularized model But if themodels are tested for robustness the latter tends to have a lowererror slope when the noise level increases
34 Evaluation Metrics e built-in functions R2 score andmean squared error (MSE) of the scikit-learn frameworkwere used as performance metrics
341 Mean Squared Error (MSE) e mean squared error(MSE) of a prediction is calculated by the squared dishyerencesbetween the predicted value yipred and the reference value yi
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
12
10
08
06
04
02
(a)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
) 2
0ndash1ndash2ndash3
1
(b)
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
Figure 1 FTIR spectra of ATF (a) Raw full transmission spectra without any preprocessing e two data sets with dishyerent measurementsetups can be discriminated by eye e blue spectra originate from measurement type (1) with two KBr discs separated by a Teon spacerand the red set of curves originates from the cuvette measurement (2) (b) SNV-transformed transmission spectra after truncation of thesaturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
14
12
10
08
06
02
(a)
04
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)0
ndash1
ndash2
ndash3
1
(b)
Figure 2 FTIR spectra of ATF B (a) Raw full transmission spectra without any preprocessing (b) SNV-transformed transmission spectraafter truncation of the saturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
4 Journal of Spectroscopy
of the ith sample For a given data set with n samples theMSEis the average value over all samples It follows the followingformula [27]
MSE y ypred( ) 1nsumn
i1yi minusyipred( )
2 (3)
e best possible MSE value is 0 and small values aredesirable as the deviation from the correct prediction is lowFrom MSE the root-mean-squared error (RMSE) was cal-culated by taking the square root e RMSE value has thesame dimension as the original reference target values
342 R2 Coecient of Determination R2 describes theportion of the variance in the target values (dependentvariables) that can be predicted from the spectra (in-dependent variables) by the model [28] e best possiblescore for R2 is 10 R2 gets 00 for a constant model whichpredicts a constant value disregarding of the input featuresFor linear regression modeling with intercept R2 is equal tothe square of Pearson correlation coecient between pre-dicted and reference target values [29] For a data setcomprising n samples the R2 score is given as
R2 y ypred( ) 1minussumni1 yi minusyipred( )
2
sumni1 yi minusypred( )2 (4)
where yipred is the model prediction of the ith sample whichhas a reference value yi and ypred is the mean value of allpredictions
ypred 1nsumn
i1yipred (5)
35 Standard Normal Variate Each spectrumx (x1 x2 xk) with k measured data points istransformed to the standardized form z (z1 z2 zk)by bringing the spectra to zero mean and unit varianceFor this purpose the mean spectrum x is subtracted fromeach data point xi and divided by the standard deviation
zi xi minus x
sumkj xi minusx( )2kradic (6)
with
x 1ksumk
j
xj (7)
351 Dynamic Localized SNV (DLSNV) e DLSNVworkow is based on the SNV-transformed spectra data set(Figure 4(a)) To calculate the DLSNV data the spectra aredivided into multiple regions On each of these regionsstandardization is performed To adjust the windows toimportant areas in the spectrum a starting point can bede13ned In Figure 4(b) the DLSNV spectra are shown witha starting point of 100 and a window size of 300 pixels
DLSNV algorithm
(i) Perform SNV on a window of the spectrum rangingfrom 13rst data point to the sth one
(ii) Subdivide spectra from sth data point into windowsof all the same size ws
To optimize the two parameters window size ws andstarting point s a three-step approach is performed In eachstep the predictive power of the model is assessed via the
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
2
1
0
ndash1
ndash2
ndash3
ndash4
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(a)
2
1
0
ndash1
ndash2
ndash3
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(b)
Figure 3 Standardized additive responses used as target value for the FTIR regressionmodel for (a) the friction modi13er compound and (b)the antioxidant plotted against time for storage temperature 140degC for all three storage experiments measured by HPLC-QToF
Journal of Spectroscopy 5
coecient of determination R2 e prediction performanceof the chosen window size in combination with the re-gression model is benchmarked by 13tting the same model tothe single SNV data indicated by a red line in Figure 4(c)e optimization steps can be summarized as follows
(1) Perform LSNV with window sizes from 50 to 500pixels and determine R2 for all window sizes Findthe optimal window size wsopt1
(2) Perform LSNV with optimal window size of step 1wsopt 1 vary the starting point from 0 to 2middotwsopt 1 andselect the optimal starting point sopt
(3) Perform LSNV with optimal starting point sopt withwindow sizes from 50 to 2middotwsopt 1 in order to 13nd thebest combination of window size wsopt 2 and startingpoint sopt
In Figure 4(d) the 13nal DLSNV spectra after optimi-zation are shown Note that jumps can occur between theindividual standardization windows since the mean value ofthis current window is subtracted for each window How-ever this does not ashyect the regression model
In Figure 5(a) the ATF A samples are shown withSNVperformed on the entire spectral region and in Figure 5(b)the same spectra are depicted after DLSNV optimizationFigure 5(c) shows a zoom-in view of the highlighted region ofFigure 5(a) and in Figure 5(d) the same region is depicted afterDLSNV optimization e baseline is removed for the exactspectra sequence and thus peaks are aligned in a way that thedishyerent aging levels of the samples can already be recognizedby eyee shown snipped spectrum is the phenolic antioxidantregionus the decrease of this band can be associatedwith theaging level Magenta indicates (relatively) fresh sampleswhereas red indicates a strong degradation level
352 Peak SNV e idea behind the Peak SNV method isto standardize the important areas of the spectrum in-dependently of each other e optimization workow forPSNV is shown in Figure 6 starting from the single SNVtransformed data set Data points with a high correlationwith the target values (points of interest POI) are selected
(Figure 6(a)) and the SNV transformation is performed onwindows around the centroids Once the POIs are identi13edthe PSNV transformation is conducted as follows
PSNV algorithm
(i) Subdivide spectra into sequences ranging from halfthe distance from the previous POI to half thedistance to the next one (Figure 6(b)) SNV isperformed across these windows
To 13nd the POI an initial regression model is 13tted tothe data In order to identify important regions of thespectra the model coecients are assessed e normalizedabsolute values of the coecient vector are fed into a peak-picking algorithm Since it may occur that POIs are in closeproximity an agglomeration of the POIs is conducted inorder to prevent from very narrow standardization windowsPeak centroids are calculated via the mean value of thecombined POIs e task for the optimization process is to13nd the best window for POI agglomeration aggopt whichis done by analyzing the calibration R2 for each agglom-eration window and picking the window size with maximalcorrelation between the predicted and reference target(Figure 6(c)) e steps are summarized as follows
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01(3) Combine peaks which are within a certain window
agg and calculate the centroid of the agglomeratedPOIs
(4) Perform PSNV across the centroid of the POIs(5) Evaluate performance via R2 for agg between 10 and
50 data points and choose aggopt according tomaximal R2
After optimization each window has an individualwindow size and range over the peak centroid of im-portant signals in the spectrum On these windows SNVtransformation provides an optimal baseline and scattereshyect removal e optimized spectrum is shown inFigure 6(d)
SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Window SNV
Wavenumbers (cmndash1)2500
Find optimalwindow size and
start point
(b)(a) (c)
Final window SNV
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)
095
095
095
097
097
097
R2 Optimization process
R2
R2
Window size 1st run
Starting point
Window size 2nd run
100 200 300 400 500
0 20 40 60 80 100 120
50 60 70 80 90 100 110 120
Figure 4 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Single SNV (b) Dynamic LocalizedSNV with starting point 100 and window 300 for visualization (c) three-stage optimization process for window starting point and 13nalwindow optimization and (d) optimized DLSNV
6 Journal of Spectroscopy
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
a teflon spacer with the thickness of about 50 μm and (2)fixed KBr cuvette of 100 μm thickness filled with ATF
After each sample measurement the KBr discs and thecuvette were rinsed several times with petroleum ether inorder to prevent cross contamination (e cuvette was driedwith N2 gas after rinsing and the KBr discs were dried un-der ambient air For the measurement type (1) 4 spectra persample were recorded and for type (2) one spectrum persample was recorded
Due to the sample layer thickness the hydrocarbon bandsare saturated and therefore the spectra had to be cut in thewavenumber regions between 3000 and 2815 cmminus1 (C-Hstretching mode) and between 1491 and 1424 cmminus1 (C-Hbending and rocking mode) Additionally the CO2 bandswere eliminated by cutting out the region from 2387 to2285 cmminus1 as well(e spectra of ATFA are shown in Figure 1in transmission without any preprocessing as measured inFigure 1(b) after truncation and SNV transformation and inFigure 1(c) SNV transformed after calculating the absorbancespectra by using A minuslog(T) In Figure 2 the same diagramsare shown for ATF B In both cases two series of curves can bediscriminated from the raw spectra by the eye (e blue seriescomes from measurement type (1) and the red set comesfrom the cuvette measurements (2) To combine the two datasets from the measurement setups (1) and (2) are challengingtasks for a predictive model as the main variance is due to thethickness variation(e data set demonstrates the importanceof suitable and sophisticated preprocessing methods in orderto eliminate the difference in the spectra induced by thevarying sample thickness (e standardization techniquespresented here are able to meet this need
232 Liquid Chromatography Coupled with Mass Spectrometry(e measurements for the determination of the additivecompound signals were performed with an Agilent liquidchromatograph 1260 coupled with a high-resolution QToF6540 mass spectrometer with methanolwaterammoniumacetate and isopropanol as an eluent Ionization was carriedout bymeans of electrospray (ESI)(e final compound peakarea data set was created using the Agilent MassHunterQualitative Analysis B 0600 analysis software
(e response signals of the additive compounds arestandardized by subtracting mean and dividing by standarddeviation in order to bring all signal values on the same scale(e standardized signals are depicted in Figure 3
3 Methods
31 Implementation (e proposed novel standardizationmethods and respective optimization processes were imple-mented via Python scripts
32 Regression AlgorithmmdashRidge For the prediction theridge regression estimator implemented in the Python scikit-learn framework for machine learning applications was used[25] It is a linear model which solves a regression task via theleast squares loss function J(w) with L2 regularization [26]Regularization is an approach to minimize the issue ofoverfitting which is particularly important for high-dimensional data such as FTIR spectra by controlling thequadratic sum of the model coefficient w (is is done byadding the penalizing term L2 weighted by the hyper pa-rameter λ
λw2 λ 1113944m
j1w
2j (1)
(us the loss function is defined as
J(w) 1113944n
i1yi minusyipred1113872 1113873
2+ λw2 (2)
where yi stands for the reference value of the ith sample andyipred for the prediction of this sample Since the perfor-mance of the preprocessing methods has to be assessedindependently from the actually used predictive regressionmodel the same regression model with identical hyper-parameter λ was applied to the various preprocessed datasets For the regression of the friction modifier compound ofATF A λ 5 and for the antioxidant of ATF B λ 3 wasused (ese parameters turned out to be the best choicesregarding cross validation and robustness for the SNVtransformed data set in a previously conducted internalstudy
33 Model Performance Evaluation To assess the perfor-mance of our models two different approaches were chosennamely the predictive power under cross validation andnoise addition
331 Cross Validation For cross validation the mean fromthe different measurements of one sample was calculated(e sample set was randomly divided 50 times into a cali-bration and validation set by taking 70 of the data astraining samples and 30 as test samples in each validationiteration with different combinations Each separation runwas provided with a unique random seed to ensure that thedata set was split into the same training and test sets for eachmodel enabling better comparability of results between thedifferent models
332 Robustness against Noise In order to assess the modelperformance under noisy input spectra the model wascalibrated by the full original data set Random Gaussian-distributed white noise was added to each data point (eseperturbed samples were predicted by the model and theprediction error was monitored (is was done for differ-ent noise levels (e random numbers added to each datapoint were generated by a standard normal distributed(mean μ 0 and standard deviation σ 1) random
TABLE 1
Temperature (degC) Storage time (h)120 500 1000 2000 3000140 105 210 415 625160 25 50 105 165
Journal of Spectroscopy 3
number generator e noise levels were de13ned by thefactors (005 010 015 020 025 030 035 040 and 045)which were multiplied with the output of the randomnumber generator For each noise level 50 simulated noisydata sets were generated and predicted by the pretrainedmodel in order to be able to make well-founded statementsabout the model performance under noise perturbation
e noise robustness workow is a very helpful tool toinvestigate whether a good calibration error is a real advantageor if the model ran into over13tting Using the same regressionalgorithm twice with dishyerent regularization parameters λ thelower regularized model will generate a lower initial calibration
error than the more stringent regularized model But if themodels are tested for robustness the latter tends to have a lowererror slope when the noise level increases
34 Evaluation Metrics e built-in functions R2 score andmean squared error (MSE) of the scikit-learn frameworkwere used as performance metrics
341 Mean Squared Error (MSE) e mean squared error(MSE) of a prediction is calculated by the squared dishyerencesbetween the predicted value yipred and the reference value yi
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
12
10
08
06
04
02
(a)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
) 2
0ndash1ndash2ndash3
1
(b)
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
Figure 1 FTIR spectra of ATF (a) Raw full transmission spectra without any preprocessing e two data sets with dishyerent measurementsetups can be discriminated by eye e blue spectra originate from measurement type (1) with two KBr discs separated by a Teon spacerand the red set of curves originates from the cuvette measurement (2) (b) SNV-transformed transmission spectra after truncation of thesaturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
14
12
10
08
06
02
(a)
04
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)0
ndash1
ndash2
ndash3
1
(b)
Figure 2 FTIR spectra of ATF B (a) Raw full transmission spectra without any preprocessing (b) SNV-transformed transmission spectraafter truncation of the saturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
4 Journal of Spectroscopy
of the ith sample For a given data set with n samples theMSEis the average value over all samples It follows the followingformula [27]
MSE y ypred( ) 1nsumn
i1yi minusyipred( )
2 (3)
e best possible MSE value is 0 and small values aredesirable as the deviation from the correct prediction is lowFrom MSE the root-mean-squared error (RMSE) was cal-culated by taking the square root e RMSE value has thesame dimension as the original reference target values
342 R2 Coecient of Determination R2 describes theportion of the variance in the target values (dependentvariables) that can be predicted from the spectra (in-dependent variables) by the model [28] e best possiblescore for R2 is 10 R2 gets 00 for a constant model whichpredicts a constant value disregarding of the input featuresFor linear regression modeling with intercept R2 is equal tothe square of Pearson correlation coecient between pre-dicted and reference target values [29] For a data setcomprising n samples the R2 score is given as
R2 y ypred( ) 1minussumni1 yi minusyipred( )
2
sumni1 yi minusypred( )2 (4)
where yipred is the model prediction of the ith sample whichhas a reference value yi and ypred is the mean value of allpredictions
ypred 1nsumn
i1yipred (5)
35 Standard Normal Variate Each spectrumx (x1 x2 xk) with k measured data points istransformed to the standardized form z (z1 z2 zk)by bringing the spectra to zero mean and unit varianceFor this purpose the mean spectrum x is subtracted fromeach data point xi and divided by the standard deviation
zi xi minus x
sumkj xi minusx( )2kradic (6)
with
x 1ksumk
j
xj (7)
351 Dynamic Localized SNV (DLSNV) e DLSNVworkow is based on the SNV-transformed spectra data set(Figure 4(a)) To calculate the DLSNV data the spectra aredivided into multiple regions On each of these regionsstandardization is performed To adjust the windows toimportant areas in the spectrum a starting point can bede13ned In Figure 4(b) the DLSNV spectra are shown witha starting point of 100 and a window size of 300 pixels
DLSNV algorithm
(i) Perform SNV on a window of the spectrum rangingfrom 13rst data point to the sth one
(ii) Subdivide spectra from sth data point into windowsof all the same size ws
To optimize the two parameters window size ws andstarting point s a three-step approach is performed In eachstep the predictive power of the model is assessed via the
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
2
1
0
ndash1
ndash2
ndash3
ndash4
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(a)
2
1
0
ndash1
ndash2
ndash3
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(b)
Figure 3 Standardized additive responses used as target value for the FTIR regressionmodel for (a) the friction modi13er compound and (b)the antioxidant plotted against time for storage temperature 140degC for all three storage experiments measured by HPLC-QToF
Journal of Spectroscopy 5
coecient of determination R2 e prediction performanceof the chosen window size in combination with the re-gression model is benchmarked by 13tting the same model tothe single SNV data indicated by a red line in Figure 4(c)e optimization steps can be summarized as follows
(1) Perform LSNV with window sizes from 50 to 500pixels and determine R2 for all window sizes Findthe optimal window size wsopt1
(2) Perform LSNV with optimal window size of step 1wsopt 1 vary the starting point from 0 to 2middotwsopt 1 andselect the optimal starting point sopt
(3) Perform LSNV with optimal starting point sopt withwindow sizes from 50 to 2middotwsopt 1 in order to 13nd thebest combination of window size wsopt 2 and startingpoint sopt
In Figure 4(d) the 13nal DLSNV spectra after optimi-zation are shown Note that jumps can occur between theindividual standardization windows since the mean value ofthis current window is subtracted for each window How-ever this does not ashyect the regression model
In Figure 5(a) the ATF A samples are shown withSNVperformed on the entire spectral region and in Figure 5(b)the same spectra are depicted after DLSNV optimizationFigure 5(c) shows a zoom-in view of the highlighted region ofFigure 5(a) and in Figure 5(d) the same region is depicted afterDLSNV optimization e baseline is removed for the exactspectra sequence and thus peaks are aligned in a way that thedishyerent aging levels of the samples can already be recognizedby eyee shown snipped spectrum is the phenolic antioxidantregionus the decrease of this band can be associatedwith theaging level Magenta indicates (relatively) fresh sampleswhereas red indicates a strong degradation level
352 Peak SNV e idea behind the Peak SNV method isto standardize the important areas of the spectrum in-dependently of each other e optimization workow forPSNV is shown in Figure 6 starting from the single SNVtransformed data set Data points with a high correlationwith the target values (points of interest POI) are selected
(Figure 6(a)) and the SNV transformation is performed onwindows around the centroids Once the POIs are identi13edthe PSNV transformation is conducted as follows
PSNV algorithm
(i) Subdivide spectra into sequences ranging from halfthe distance from the previous POI to half thedistance to the next one (Figure 6(b)) SNV isperformed across these windows
To 13nd the POI an initial regression model is 13tted tothe data In order to identify important regions of thespectra the model coecients are assessed e normalizedabsolute values of the coecient vector are fed into a peak-picking algorithm Since it may occur that POIs are in closeproximity an agglomeration of the POIs is conducted inorder to prevent from very narrow standardization windowsPeak centroids are calculated via the mean value of thecombined POIs e task for the optimization process is to13nd the best window for POI agglomeration aggopt whichis done by analyzing the calibration R2 for each agglom-eration window and picking the window size with maximalcorrelation between the predicted and reference target(Figure 6(c)) e steps are summarized as follows
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01(3) Combine peaks which are within a certain window
agg and calculate the centroid of the agglomeratedPOIs
(4) Perform PSNV across the centroid of the POIs(5) Evaluate performance via R2 for agg between 10 and
50 data points and choose aggopt according tomaximal R2
After optimization each window has an individualwindow size and range over the peak centroid of im-portant signals in the spectrum On these windows SNVtransformation provides an optimal baseline and scattereshyect removal e optimized spectrum is shown inFigure 6(d)
SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Window SNV
Wavenumbers (cmndash1)2500
Find optimalwindow size and
start point
(b)(a) (c)
Final window SNV
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)
095
095
095
097
097
097
R2 Optimization process
R2
R2
Window size 1st run
Starting point
Window size 2nd run
100 200 300 400 500
0 20 40 60 80 100 120
50 60 70 80 90 100 110 120
Figure 4 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Single SNV (b) Dynamic LocalizedSNV with starting point 100 and window 300 for visualization (c) three-stage optimization process for window starting point and 13nalwindow optimization and (d) optimized DLSNV
6 Journal of Spectroscopy
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
number generator e noise levels were de13ned by thefactors (005 010 015 020 025 030 035 040 and 045)which were multiplied with the output of the randomnumber generator For each noise level 50 simulated noisydata sets were generated and predicted by the pretrainedmodel in order to be able to make well-founded statementsabout the model performance under noise perturbation
e noise robustness workow is a very helpful tool toinvestigate whether a good calibration error is a real advantageor if the model ran into over13tting Using the same regressionalgorithm twice with dishyerent regularization parameters λ thelower regularized model will generate a lower initial calibration
error than the more stringent regularized model But if themodels are tested for robustness the latter tends to have a lowererror slope when the noise level increases
34 Evaluation Metrics e built-in functions R2 score andmean squared error (MSE) of the scikit-learn frameworkwere used as performance metrics
341 Mean Squared Error (MSE) e mean squared error(MSE) of a prediction is calculated by the squared dishyerencesbetween the predicted value yipred and the reference value yi
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
12
10
08
06
04
02
(a)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
) 2
0ndash1ndash2ndash3
1
(b)
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
Figure 1 FTIR spectra of ATF (a) Raw full transmission spectra without any preprocessing e two data sets with dishyerent measurementsetups can be discriminated by eye e blue spectra originate from measurement type (1) with two KBr discs separated by a Teon spacerand the red set of curves originates from the cuvette measurement (2) (b) SNV-transformed transmission spectra after truncation of thesaturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
3500 3000 2500 2000 1500 1000Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)
14
12
10
08
06
02
(a)
04
3500 2500 1500
Abs
orba
nce (
au)
Wavenumbers (cmndash1)
86420
(c)
3500 2500 1500Wavenumbers (cmndash1)
Tran
smiss
ion
(au
)0
ndash1
ndash2
ndash3
1
(b)
Figure 2 FTIR spectra of ATF B (a) Raw full transmission spectra without any preprocessing (b) SNV-transformed transmission spectraafter truncation of the saturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation
4 Journal of Spectroscopy
of the ith sample For a given data set with n samples theMSEis the average value over all samples It follows the followingformula [27]
MSE y ypred( ) 1nsumn
i1yi minusyipred( )
2 (3)
e best possible MSE value is 0 and small values aredesirable as the deviation from the correct prediction is lowFrom MSE the root-mean-squared error (RMSE) was cal-culated by taking the square root e RMSE value has thesame dimension as the original reference target values
342 R2 Coecient of Determination R2 describes theportion of the variance in the target values (dependentvariables) that can be predicted from the spectra (in-dependent variables) by the model [28] e best possiblescore for R2 is 10 R2 gets 00 for a constant model whichpredicts a constant value disregarding of the input featuresFor linear regression modeling with intercept R2 is equal tothe square of Pearson correlation coecient between pre-dicted and reference target values [29] For a data setcomprising n samples the R2 score is given as
R2 y ypred( ) 1minussumni1 yi minusyipred( )
2
sumni1 yi minusypred( )2 (4)
where yipred is the model prediction of the ith sample whichhas a reference value yi and ypred is the mean value of allpredictions
ypred 1nsumn
i1yipred (5)
35 Standard Normal Variate Each spectrumx (x1 x2 xk) with k measured data points istransformed to the standardized form z (z1 z2 zk)by bringing the spectra to zero mean and unit varianceFor this purpose the mean spectrum x is subtracted fromeach data point xi and divided by the standard deviation
zi xi minus x
sumkj xi minusx( )2kradic (6)
with
x 1ksumk
j
xj (7)
351 Dynamic Localized SNV (DLSNV) e DLSNVworkow is based on the SNV-transformed spectra data set(Figure 4(a)) To calculate the DLSNV data the spectra aredivided into multiple regions On each of these regionsstandardization is performed To adjust the windows toimportant areas in the spectrum a starting point can bede13ned In Figure 4(b) the DLSNV spectra are shown witha starting point of 100 and a window size of 300 pixels
DLSNV algorithm
(i) Perform SNV on a window of the spectrum rangingfrom 13rst data point to the sth one
(ii) Subdivide spectra from sth data point into windowsof all the same size ws
To optimize the two parameters window size ws andstarting point s a three-step approach is performed In eachstep the predictive power of the model is assessed via the
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
2
1
0
ndash1
ndash2
ndash3
ndash4
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(a)
2
1
0
ndash1
ndash2
ndash3
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(b)
Figure 3 Standardized additive responses used as target value for the FTIR regressionmodel for (a) the friction modi13er compound and (b)the antioxidant plotted against time for storage temperature 140degC for all three storage experiments measured by HPLC-QToF
Journal of Spectroscopy 5
coecient of determination R2 e prediction performanceof the chosen window size in combination with the re-gression model is benchmarked by 13tting the same model tothe single SNV data indicated by a red line in Figure 4(c)e optimization steps can be summarized as follows
(1) Perform LSNV with window sizes from 50 to 500pixels and determine R2 for all window sizes Findthe optimal window size wsopt1
(2) Perform LSNV with optimal window size of step 1wsopt 1 vary the starting point from 0 to 2middotwsopt 1 andselect the optimal starting point sopt
(3) Perform LSNV with optimal starting point sopt withwindow sizes from 50 to 2middotwsopt 1 in order to 13nd thebest combination of window size wsopt 2 and startingpoint sopt
In Figure 4(d) the 13nal DLSNV spectra after optimi-zation are shown Note that jumps can occur between theindividual standardization windows since the mean value ofthis current window is subtracted for each window How-ever this does not ashyect the regression model
In Figure 5(a) the ATF A samples are shown withSNVperformed on the entire spectral region and in Figure 5(b)the same spectra are depicted after DLSNV optimizationFigure 5(c) shows a zoom-in view of the highlighted region ofFigure 5(a) and in Figure 5(d) the same region is depicted afterDLSNV optimization e baseline is removed for the exactspectra sequence and thus peaks are aligned in a way that thedishyerent aging levels of the samples can already be recognizedby eyee shown snipped spectrum is the phenolic antioxidantregionus the decrease of this band can be associatedwith theaging level Magenta indicates (relatively) fresh sampleswhereas red indicates a strong degradation level
352 Peak SNV e idea behind the Peak SNV method isto standardize the important areas of the spectrum in-dependently of each other e optimization workow forPSNV is shown in Figure 6 starting from the single SNVtransformed data set Data points with a high correlationwith the target values (points of interest POI) are selected
(Figure 6(a)) and the SNV transformation is performed onwindows around the centroids Once the POIs are identi13edthe PSNV transformation is conducted as follows
PSNV algorithm
(i) Subdivide spectra into sequences ranging from halfthe distance from the previous POI to half thedistance to the next one (Figure 6(b)) SNV isperformed across these windows
To 13nd the POI an initial regression model is 13tted tothe data In order to identify important regions of thespectra the model coecients are assessed e normalizedabsolute values of the coecient vector are fed into a peak-picking algorithm Since it may occur that POIs are in closeproximity an agglomeration of the POIs is conducted inorder to prevent from very narrow standardization windowsPeak centroids are calculated via the mean value of thecombined POIs e task for the optimization process is to13nd the best window for POI agglomeration aggopt whichis done by analyzing the calibration R2 for each agglom-eration window and picking the window size with maximalcorrelation between the predicted and reference target(Figure 6(c)) e steps are summarized as follows
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01(3) Combine peaks which are within a certain window
agg and calculate the centroid of the agglomeratedPOIs
(4) Perform PSNV across the centroid of the POIs(5) Evaluate performance via R2 for agg between 10 and
50 data points and choose aggopt according tomaximal R2
After optimization each window has an individualwindow size and range over the peak centroid of im-portant signals in the spectrum On these windows SNVtransformation provides an optimal baseline and scattereshyect removal e optimized spectrum is shown inFigure 6(d)
SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Window SNV
Wavenumbers (cmndash1)2500
Find optimalwindow size and
start point
(b)(a) (c)
Final window SNV
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)
095
095
095
097
097
097
R2 Optimization process
R2
R2
Window size 1st run
Starting point
Window size 2nd run
100 200 300 400 500
0 20 40 60 80 100 120
50 60 70 80 90 100 110 120
Figure 4 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Single SNV (b) Dynamic LocalizedSNV with starting point 100 and window 300 for visualization (c) three-stage optimization process for window starting point and 13nalwindow optimization and (d) optimized DLSNV
6 Journal of Spectroscopy
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
of the ith sample For a given data set with n samples theMSEis the average value over all samples It follows the followingformula [27]
MSE y ypred( ) 1nsumn
i1yi minusyipred( )
2 (3)
e best possible MSE value is 0 and small values aredesirable as the deviation from the correct prediction is lowFrom MSE the root-mean-squared error (RMSE) was cal-culated by taking the square root e RMSE value has thesame dimension as the original reference target values
342 R2 Coecient of Determination R2 describes theportion of the variance in the target values (dependentvariables) that can be predicted from the spectra (in-dependent variables) by the model [28] e best possiblescore for R2 is 10 R2 gets 00 for a constant model whichpredicts a constant value disregarding of the input featuresFor linear regression modeling with intercept R2 is equal tothe square of Pearson correlation coecient between pre-dicted and reference target values [29] For a data setcomprising n samples the R2 score is given as
R2 y ypred( ) 1minussumni1 yi minusyipred( )
2
sumni1 yi minusypred( )2 (4)
where yipred is the model prediction of the ith sample whichhas a reference value yi and ypred is the mean value of allpredictions
ypred 1nsumn
i1yipred (5)
35 Standard Normal Variate Each spectrumx (x1 x2 xk) with k measured data points istransformed to the standardized form z (z1 z2 zk)by bringing the spectra to zero mean and unit varianceFor this purpose the mean spectrum x is subtracted fromeach data point xi and divided by the standard deviation
zi xi minus x
sumkj xi minusx( )2kradic (6)
with
x 1ksumk
j
xj (7)
351 Dynamic Localized SNV (DLSNV) e DLSNVworkow is based on the SNV-transformed spectra data set(Figure 4(a)) To calculate the DLSNV data the spectra aredivided into multiple regions On each of these regionsstandardization is performed To adjust the windows toimportant areas in the spectrum a starting point can bede13ned In Figure 4(b) the DLSNV spectra are shown witha starting point of 100 and a window size of 300 pixels
DLSNV algorithm
(i) Perform SNV on a window of the spectrum rangingfrom 13rst data point to the sth one
(ii) Subdivide spectra from sth data point into windowsof all the same size ws
To optimize the two parameters window size ws andstarting point s a three-step approach is performed In eachstep the predictive power of the model is assessed via the
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
2
1
0
ndash1
ndash2
ndash3
ndash4
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(a)
2
1
0
ndash1
ndash2
ndash3
Storage time (h)
Stan
dard
ized
mol
ecul
e sig
nals
(au
)
Pure oilCopper chipsMaterial mix
100 200 300 400 500 600 700
(b)
Figure 3 Standardized additive responses used as target value for the FTIR regressionmodel for (a) the friction modi13er compound and (b)the antioxidant plotted against time for storage temperature 140degC for all three storage experiments measured by HPLC-QToF
Journal of Spectroscopy 5
coecient of determination R2 e prediction performanceof the chosen window size in combination with the re-gression model is benchmarked by 13tting the same model tothe single SNV data indicated by a red line in Figure 4(c)e optimization steps can be summarized as follows
(1) Perform LSNV with window sizes from 50 to 500pixels and determine R2 for all window sizes Findthe optimal window size wsopt1
(2) Perform LSNV with optimal window size of step 1wsopt 1 vary the starting point from 0 to 2middotwsopt 1 andselect the optimal starting point sopt
(3) Perform LSNV with optimal starting point sopt withwindow sizes from 50 to 2middotwsopt 1 in order to 13nd thebest combination of window size wsopt 2 and startingpoint sopt
In Figure 4(d) the 13nal DLSNV spectra after optimi-zation are shown Note that jumps can occur between theindividual standardization windows since the mean value ofthis current window is subtracted for each window How-ever this does not ashyect the regression model
In Figure 5(a) the ATF A samples are shown withSNVperformed on the entire spectral region and in Figure 5(b)the same spectra are depicted after DLSNV optimizationFigure 5(c) shows a zoom-in view of the highlighted region ofFigure 5(a) and in Figure 5(d) the same region is depicted afterDLSNV optimization e baseline is removed for the exactspectra sequence and thus peaks are aligned in a way that thedishyerent aging levels of the samples can already be recognizedby eyee shown snipped spectrum is the phenolic antioxidantregionus the decrease of this band can be associatedwith theaging level Magenta indicates (relatively) fresh sampleswhereas red indicates a strong degradation level
352 Peak SNV e idea behind the Peak SNV method isto standardize the important areas of the spectrum in-dependently of each other e optimization workow forPSNV is shown in Figure 6 starting from the single SNVtransformed data set Data points with a high correlationwith the target values (points of interest POI) are selected
(Figure 6(a)) and the SNV transformation is performed onwindows around the centroids Once the POIs are identi13edthe PSNV transformation is conducted as follows
PSNV algorithm
(i) Subdivide spectra into sequences ranging from halfthe distance from the previous POI to half thedistance to the next one (Figure 6(b)) SNV isperformed across these windows
To 13nd the POI an initial regression model is 13tted tothe data In order to identify important regions of thespectra the model coecients are assessed e normalizedabsolute values of the coecient vector are fed into a peak-picking algorithm Since it may occur that POIs are in closeproximity an agglomeration of the POIs is conducted inorder to prevent from very narrow standardization windowsPeak centroids are calculated via the mean value of thecombined POIs e task for the optimization process is to13nd the best window for POI agglomeration aggopt whichis done by analyzing the calibration R2 for each agglom-eration window and picking the window size with maximalcorrelation between the predicted and reference target(Figure 6(c)) e steps are summarized as follows
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01(3) Combine peaks which are within a certain window
agg and calculate the centroid of the agglomeratedPOIs
(4) Perform PSNV across the centroid of the POIs(5) Evaluate performance via R2 for agg between 10 and
50 data points and choose aggopt according tomaximal R2
After optimization each window has an individualwindow size and range over the peak centroid of im-portant signals in the spectrum On these windows SNVtransformation provides an optimal baseline and scattereshyect removal e optimized spectrum is shown inFigure 6(d)
SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Window SNV
Wavenumbers (cmndash1)2500
Find optimalwindow size and
start point
(b)(a) (c)
Final window SNV
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)
095
095
095
097
097
097
R2 Optimization process
R2
R2
Window size 1st run
Starting point
Window size 2nd run
100 200 300 400 500
0 20 40 60 80 100 120
50 60 70 80 90 100 110 120
Figure 4 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Single SNV (b) Dynamic LocalizedSNV with starting point 100 and window 300 for visualization (c) three-stage optimization process for window starting point and 13nalwindow optimization and (d) optimized DLSNV
6 Journal of Spectroscopy
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
coecient of determination R2 e prediction performanceof the chosen window size in combination with the re-gression model is benchmarked by 13tting the same model tothe single SNV data indicated by a red line in Figure 4(c)e optimization steps can be summarized as follows
(1) Perform LSNV with window sizes from 50 to 500pixels and determine R2 for all window sizes Findthe optimal window size wsopt1
(2) Perform LSNV with optimal window size of step 1wsopt 1 vary the starting point from 0 to 2middotwsopt 1 andselect the optimal starting point sopt
(3) Perform LSNV with optimal starting point sopt withwindow sizes from 50 to 2middotwsopt 1 in order to 13nd thebest combination of window size wsopt 2 and startingpoint sopt
In Figure 4(d) the 13nal DLSNV spectra after optimi-zation are shown Note that jumps can occur between theindividual standardization windows since the mean value ofthis current window is subtracted for each window How-ever this does not ashyect the regression model
In Figure 5(a) the ATF A samples are shown withSNVperformed on the entire spectral region and in Figure 5(b)the same spectra are depicted after DLSNV optimizationFigure 5(c) shows a zoom-in view of the highlighted region ofFigure 5(a) and in Figure 5(d) the same region is depicted afterDLSNV optimization e baseline is removed for the exactspectra sequence and thus peaks are aligned in a way that thedishyerent aging levels of the samples can already be recognizedby eyee shown snipped spectrum is the phenolic antioxidantregionus the decrease of this band can be associatedwith theaging level Magenta indicates (relatively) fresh sampleswhereas red indicates a strong degradation level
352 Peak SNV e idea behind the Peak SNV method isto standardize the important areas of the spectrum in-dependently of each other e optimization workow forPSNV is shown in Figure 6 starting from the single SNVtransformed data set Data points with a high correlationwith the target values (points of interest POI) are selected
(Figure 6(a)) and the SNV transformation is performed onwindows around the centroids Once the POIs are identi13edthe PSNV transformation is conducted as follows
PSNV algorithm
(i) Subdivide spectra into sequences ranging from halfthe distance from the previous POI to half thedistance to the next one (Figure 6(b)) SNV isperformed across these windows
To 13nd the POI an initial regression model is 13tted tothe data In order to identify important regions of thespectra the model coecients are assessed e normalizedabsolute values of the coecient vector are fed into a peak-picking algorithm Since it may occur that POIs are in closeproximity an agglomeration of the POIs is conducted inorder to prevent from very narrow standardization windowsPeak centroids are calculated via the mean value of thecombined POIs e task for the optimization process is to13nd the best window for POI agglomeration aggopt whichis done by analyzing the calibration R2 for each agglom-eration window and picking the window size with maximalcorrelation between the predicted and reference target(Figure 6(c)) e steps are summarized as follows
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01(3) Combine peaks which are within a certain window
agg and calculate the centroid of the agglomeratedPOIs
(4) Perform PSNV across the centroid of the POIs(5) Evaluate performance via R2 for agg between 10 and
50 data points and choose aggopt according tomaximal R2
After optimization each window has an individualwindow size and range over the peak centroid of im-portant signals in the spectrum On these windows SNVtransformation provides an optimal baseline and scattereshyect removal e optimized spectrum is shown inFigure 6(d)
SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Window SNV
Wavenumbers (cmndash1)2500
Find optimalwindow size and
start point
(b)(a) (c)
Final window SNV
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)
095
095
095
097
097
097
R2 Optimization process
R2
R2
Window size 1st run
Starting point
Window size 2nd run
100 200 300 400 500
0 20 40 60 80 100 120
50 60 70 80 90 100 110 120
Figure 4 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Single SNV (b) Dynamic LocalizedSNV with starting point 100 and window 300 for visualization (c) three-stage optimization process for window starting point and 13nalwindow optimization and (d) optimized DLSNV
6 Journal of Spectroscopy
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
353 Partial Peak SNV e idea behind Partial Peak SNV issimilar to PSNV picking the regions of the spectrum whichshow a high correlation with the target values agglomeratingPOIs in close proximity and standardizing these importantspectral features (Figure 7(a)) But unlike for PSNV not onlythe whole spectrum is 13nally taken into account but alsoa small window around the POI It may occur that the samedata point appears several times in dishyerent standardizations(see overlapping regions in Figures 7(b) and 7(d)) Due to thisworkow the PPSNV spectrum may have more data points(due to overlapping) or less (because not the entire spectrum istaken into account) than those of the original spectrum APPSNV spectrum is calculated as follows
PPSNV algorithm
(i) Perform SNV across the POIs with a left and rightmargin of pw
e optimization focuses on the adjustment of thewindow size pw around the POIs in which the SNV isapplied for maximal predictive power in calibration (Fig-ure 7(c)) e optimization process is divided into thefollowing steps
(1) Fit the data set to the target values (only calibration)(2) Pick peaks from the normalized model coecient
vector (|w|max(|w|)) threshold for peaks 01
Wavenumbers (cmndash1) Wavenumbers (cmndash1)
(b)(a)Wavenumbers (cmndash1)
2500
3500 3450 3400 3350 3500 3450 3400 3350
3500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
(d)(c)
Figure 5 Demonstration of the improvement of peak alignment for DLSNV (a) SNV-transformed transmission spectra with marked zoomlevel of (c) (b) Optimized DLSV spectra with marked zoom area of (d) Magenta indicates (relatively) fresh samples whereas red indicatesa strong degradation level
R2
Agglomeration window size
Optimization
(c)Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform
peak SNV
Wavenumbers (cmndash1)2500
Model coefficientsselected peaks
(b)(a)
10 20 30 40 503500 3000 2000 10003500 3000 2000 1000 3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Final peak SNV
(d)
Find optimalagglomerationwindow size 095
096
Figure 6 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal agglomeration window size and (d) optimized PSNV
Journal of Spectroscopy 7
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
Reference friction modifier peak volume
Nonlinearity
ndash3
ndash2
ndash1
0
1
2
Calibration samplesValidation samples
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Decreasein additive concentration
ndash3 ndash2 ndash1 0 1 2
(a)
Calibration samplesValidation samples
Reference friction modifier peak volume
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
ndash3 ndash2 ndash1 0 1 2
(b)
Calibration samplesValidation samples
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volumendash3 ndash2 ndash1 0 1 2
(c)
Calibration samplesValidation samples
ndash3 ndash2 ndash1 0 1 2
ndash3
ndash2
ndash1
0
1
2
Pred
icte
d fri
ctio
n m
odifi
er p
eak
volu
me
Reference friction modifier peak volume
(d)
Figure 8 Cross validation results for (a) SNV (b) Dynamic Localized SNV (c) Peak SNV and (d) Partial Peak SNV preprocessed spectra inthe regression case of the friction modi13er additive e reference values were measured by HPLC-QToF Red dots indicate prediction ofcalibration samples and blue dots represent prediction of the hold-out validation samples e dashed arrow in (a) indicates the increasingdegradation of the samples with increasing time as the friction modi13er additive concentration gets lower
R2
Window size0 50 100 150 200
Optimization
(c)
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Divide spectrum in peak windows
Agglomeratepeaks andperform PartPeak SNV
3500 3000 2000 1000Wavenumbers (cmndash1)
2500
Model coefficientsselected peaks
(b)(a)
150010000Data points
500
Final part peak SNV
(d)
Find optimalwindow size
092
097
Figure 7 Demonstration of the workow and optimization process for Dynamic Localized SNV (a) Picked peaks of normalized absolutecoecient vector and indication of the POIs in one spectrum (b) spectrum separation according to agglomerated peaks (c) optimizationprocess in order to 13nd optimal window size around the POIs and (d) optimized PPSNV
8 Journal of Spectroscopy
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
(3) Perform PPSNV across the peaks with the windowsize pw
(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2
4 Results and Discussion
41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye
In Figure 8 also the cross validation recovery functionfor predictions after Dynamic Localized SNV (Figure 8(b))Peak SNV (Figure 8(c)) and Partial Peak SNV (Figure 8(d))optimization are shown e saturation eshyect in the lowintensity area of the compound response is almost com-pletely removed in the latter three cases It is also notablethat the scattering around the green bisecting line is sig-ni13cantly reduced us the con13dence interval for thepredictions is improved
e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV
It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra
For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP
RMSE
P Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)
+ ++
+ +
+++
++++
00
02
04
06
08
10
12
14
16
(a) (b)
Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV
Journal of Spectroscopy 9
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation
411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or
(a) (b)
RMSE
P
Proposedmethods
Proposedmethods
(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00
02
04
06
08
10
12
14
+
+
++
++
+
++
+
Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV
Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed
MethodFriction modi13er ATF A Antioxidant ATF B
Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)
10 Journal of Spectroscopy
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332
In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum
42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement
As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative
improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)
(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017
To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV
Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks
For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the
Noise levelfactor
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
14
12
10
08
RMSE
06
04
02
0001 02 03 04
(a)
RMSE
Noise levelfactor
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
045
040
035
030
025
020
01501 02 03 04
(b)
Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Journal of Spectroscopy 11
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions
(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window
(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values
To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also
drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction
5 Conclusion
(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP
Noise level factor
RMSE
Wo standardizationSNVLSNVDLSNV
PSNVPPSNVTransmissionAbsorbance
20
15
10
05
0001 02 03 04
(a)
LSNVDLSNVPSNV
PPSNVTransmissionAbsorbance
RMSE
Noise level factor
07
06
05
04
03
02
0101 02 03 04
(b)
Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown
Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown
MethodATF A ATF B
Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51
sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38
12 Journal of Spectroscopy
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere
Data Availability
(e data used to support the findings of this study are in-cluded within the article
Conflicts of Interest
(e authors declare that they have no conflicts of interest
References
[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006
[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008
[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004
[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002
[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009
[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004
[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993
[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007
[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009
[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for
evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018
[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018
[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016
[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018
[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018
[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013
[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983
[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985
[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931
[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999
[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989
[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005
[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009
[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993
[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016
[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011
[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015
[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003
[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York
Journal of Spectroscopy 13
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962
[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895
14 Journal of Spectroscopy
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom
TribologyAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
International Journal ofInternational Journal ofPhotoenergy
Hindawiwwwhindawicom Volume 2018
Journal of
Chemistry
Hindawiwwwhindawicom Volume 2018
Advances inPhysical Chemistry
Hindawiwwwhindawicom
Analytical Methods in Chemistry
Journal of
Volume 2018
Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018
SpectroscopyInternational Journal of
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Medicinal ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
NanotechnologyHindawiwwwhindawicom Volume 2018
Journal of
Applied ChemistryJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Biochemistry Research International
Hindawiwwwhindawicom Volume 2018
Enzyme Research
Hindawiwwwhindawicom Volume 2018
Journal of
SpectroscopyAnalytical ChemistryInternational Journal of
Hindawiwwwhindawicom Volume 2018
MaterialsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
BioMed Research International Electrochemistry
International Journal of
Hindawiwwwhindawicom Volume 2018
Na
nom
ate
ria
ls
Hindawiwwwhindawicom Volume 2018
Journal ofNanomaterials
Submit your manuscripts atwwwhindawicom