preprocessing: smoothing and derivatives -...

18
Preprocessing: Smoothing and Derivatives Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Upload: doanduong

Post on 18-Mar-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

Preprocessing:Smoothing and

Derivatives

Vladimir Bochko Jarmo Alander

University of Vaasa

November 1, 2010

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Contents

• Preprocessing

• Baseline correction.

• Autoscaling.

• Savitzky-Golay Smoothing Filters

• Savitzky-Golay Filters for Derivatives

• Smoothing Splines

• Splines for Derivatives

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Preprocessing

• Normalizing. It is done by dividing each feature realted toa sample vector by a constant. The constant value may bedefined as the 1-norm or 2-norm of the vector.

• Weighting. For weighting any values are used. Thepurpose to give importance only to some samples.

• Smoothing. Reduce noise and keep useful variation.Smoothing filters.

• Baseline correction. Reduce unwilling slow variation.Estimation of baseline and subtruction from spectrum,Derivatives, Multiplicative Scatter Correction MSC.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Baseline correction

• We consider the measured spectrum s. The baseline isapproximated using polynomial [3], for example up to linearterm.

s = s̃ + α + βx

where x is a wavelength number, α defines a shift (offset)in spectrum and β desribes a linear change in spectrum.

• We are interested in s̃ and the rest terms describeunwilling variation.

• If the baseline model has only shift then after estimatingthe shift we may subtruct it from each spectrum.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Derivatives

• If we have only shift s = s̃ + α we can use derivative withrespect x to remove the shift as follows: s

′ = (s̃ + α)′ = s̃′.

• Computing the successive derivaties we may remove thebaseline defined by the higher order model.

• Numerically the first derivative for s = (s1, s2, . . . , sk) canbe computed using a moving window where si is replacedby si − si−1.

• For the second derivate we use the moving window:(si − si−1) − (si−1 − si−2) = si − 2si−1 + si−2

• We can also use the derivative using the moving window:(si+si−1)

2 − ( si−1+si−2)2 . This computing is robust to noise

because we average spectrum values.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Autoscaling

• The autoscaling procedure uses both mean centering andvariance scaling and helps for removing unwilling variation.

• Mean centering. For each feature we compute mean andsubtruct it from the other feature values.

• Variance scaling is implemented by divison each featurevalue by standard deviation. Standard deviation is definedusing values of this feature.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters

• The mesured spectra are usually slowly varying andcorrupted by random noise. The beginning and the end ofthe spectra contain mostly noise. The purpose of smootingis to reduce the level of noise keeping spectrum details.

• Savitzky-Golay filters were designed to analyze theabsorption peaks in noisy spectra. The absorption peakssuggest which chemical elements are presented in thetested materials.

• Filters have several names Savitzky-Golay, least squaresand Digital Smoothing Polynomial (DISPO) filters.

• The filters operate in the time domain and relate tolow-pass filters.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters

Suppose we have a spectrum with equally spaced valuessi = s0 + i∆, where the spectral values are obtained at intervals∆ anf i = . . . ,−2,−1, 0, 1, 2, . . .. The finit impulse responsefilter replaces si by a linear combination of its neighbor values

fi =

nright∑

n=nleft

cnsi+n

where nleft and nright are the number of points to the left andright of a current point, respectively. cn are weight coefficients.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters

• For an averaging filter all coefficients are constantcn = 1/(nleft + nright + 1). The coefficients are computedin a moving window.

• The Savitzky-Golay smoothing filter uses a polynomial ofhigher order. Least-squares fitting is used to fit thepolynomial to the data inside the moving window. The useof the least square fit in this way may be computationallydemanding. However it is possible to find the coefficientscn for which least square fitting inside a moving window isautomatically implemented.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters

• Next the Savitzky-Golay smoothing filter coefficients aregiven for the polynomial order M [1].

M cnleft cn0 cnright

2 -0.086 0.343 0.486 0.343 -0.0864 0.035 -0.128 0.070 0.315 0.417 0.315 0.070 -0.128 0.035

• In practice, however, the larger values nleft and nright areused, e.g. nleft = nright = 16, 32.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters

• a) The generated synthetic signal a(x) which is used as anunderlying function. b) The same signal with additiveGausian random noise.

a b

0 1 2 3 4 5 60.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0 1 2 3 4 5 60

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters

• a) Moving average, nleft = nright = 15. b) Savitzky-Golayfilter, nleft = nright = 15, polynomial order 2. Thesmoothed result is shown by green dots.

a b

0 1 2 3 4 5

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2a vs. xa vs. x (smooth)

0 1 2 3 4 5

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2a vs. xa vs. x (smooth)

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Smoothing Filters• a) Savitzky-Golay filter, nleft = nright = 15, polynomial

order 2. a) Savitzky-Golay filter, nleft = nright = 30,polynomial order 2. The smoothed result is shown bygreen dots.

a b

0 1 2 3 4 5

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2a vs. xa vs. x (smooth)

0 1 2 3 4 5

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2a vs. xa vs. x (smooth)

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Savitzky-Golay Filters for Derivatives

• This technique may be used with noisy data.

• The polynomial and least square fitting are used at eachpoint of spectra.

• The measured derivative is the derivative of the fittedpolynomial.

• The Savitzky-Golay filters from libraries can be used in thederivative mode.

• Computation is made in the moving window.

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Smoothing Splines• Splines and wavelets are the other popular techniques

used for smoothing. For example, the smoothing splinemay fit spectral data. For the next illustration the smoothingsplines were used from the functional data analysis library[2]. a) The original spectrum of the gastrointestinal tract ofthe dog. b) The smoothed spectrum.

a b

400 500 600 700 800 900 10000

50

100

150

200

250

300

Wavelength, nm

Val

ue

400 500 600 700 800 900 1000

50

100

150

200

250

300

Wavelength, nm

Val

ue

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Splines for derivatives• The computation of derivatives may be ustable numerically.

To achieve robust computation a B-spline is used. Afterobtaining the analytical spline expansion of spectrum thefirst and second derivatives can be exactly computed. a)The smoothed spectrum of the gastrointestinal tract of thedog. b) The first derivative.

a b

400 500 600 700 800 900 1000

50

100

150

200

250

300

Wavelength, nm

Val

ue

400 500 600 700 800 900 1000−20

−15

−10

−5

0

5

10

15

20

Wavelength, nm

Der

ivat

ive

valu

e

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

References

Press W. H. , S.A. Teukolsky S.A., W. T. Vetterling W. T., FlanneryB. P.. Numerical Recipes in C++, Cambridge University Press,2002.

Ramsay J. O. and Silverman B. W., Functional Data Analysis.Springer-Verlag, New York, 1997

Beebe K., Pell R. J., Seasholtz M. B., Chemometrics: A Practicalguide. John Wiley & Sons, Inc., 1998.

SOLO toolbox:http://wiki.eigenvector.com/index.php?title=Main Page

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives

Questions

Vladimir Bochko Chemometrics. Preprocessing: smoothing and derivatives