validation of the performance of process stream analyzer systems



PRACTITIONER’S REPORT

Validation of the performance of process stream analyzer systems with nonparametric behavior

Aerenton Ferreira Bueno • Deborah Aparecida Flores Ozaki • Eduardo Barbosa • Evandro Evangelista dos Santos • Maura Moreira Gomes • Patrícia Hiromi Iida • Soraia Cristina Almeida dos Santos • Fagner Geovani de Sá • Elcio Cruz de Oliveira

Received: 3 January 2014 / Accepted: 8 April 2014

© Springer-Verlag Berlin Heidelberg 2014

Abstract The literature describes several procedures, based on statistical principles, for validating whether the degree of agreement between the results produced by a stream analyzer system (a system for determining chemical and physical characteristics by continual sampling, automatic analysis, and recording or otherwise signaling of output data) and those produced by an independent test measurement procedure that purports to measure the same property meets user-specified requirements. However, these documents always assume that the data gathered during the validation procedure are normally distributed, and this assumption is not always true. The aim of this manuscript is to develop four different procedures applicable to data with nonparametric behavior. The most representative validation procedure compares measures of position and dispersion using typical process samples, while the least representative one compares only measures of position using a reference sample. These nonparametric data are also treated using parametric statistics for comparison. The results show that different conclusions can be reached when nonparametric data are treated as parametric. A new Brazilian standard based on this subject is being prepared in order to complement the international standards.

Keywords Validation · Performance · Stream analyzer systems · Nonparametric data

Introduction

With increasing industrial automation, off-line analyzers (bench or laboratory analyzers) have been replaced by online analyzers, or stream analyzer systems, because of the following advantages: high speed; continuous monitoring suitable for closed-loop control; better accuracy owing to the absence of sample manipulation and human error; and less exposure to risks and hazardous samples.

In this work, a total analyzer system design addresses the chemical and physical properties of the process or product stream to be measured, provides a representative sample, and handles it without adversely affecting the value of the specific property of interest. Moreover, it includes a sample loop, piping, hardware, a sampling port, sample conditioning devices, analyzer unit instrumentation, any data analysis computer hardware and software, and a readout display [1].

However, when a new stream analyzer is initially installed, or after major maintenance has been performed, a set of tests must be performed to demonstrate that the analyzer meets the manufacturer's specifications, the historical performance standards, and the specification limits set by the process being analyzed. These diagnostic tests may require that the analyzer be adjusted so as to provide predetermined output levels for certain certified reference materials (CRMs), or that it be compared with other reliable analyzers [1]. This practice is referred to in this manuscript as "validation."

A. F. Bueno · D. A. F. Ozaki · E. Barbosa · E. E. dos Santos · M. M. Gomes · P. H. Iida · S. C. A. dos Santos · F. G. de Sá
Petróleo Brasileiro S.A., Rio de Janeiro, Brazil

F. G. de Sá · E. C. de Oliveira
Post-Graduation Metrology Programme, Metrology for Quality and Innovation, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ 22453-900, Brazil

E. C. de Oliveira (✉)
Petrobras Transporte S.A., Av. Presidente Vargas, 328, Rio de Janeiro, RJ 20091-060, Brazil

e-mail: [email protected]


Accred Qual Assur

DOI 10.1007/s00769-014-1053-8


Validation can be defined as a statistically quantified judgment that the analyzer system or subsystem being assessed can produce predicted primary test method results with acceptable precision and bias performance when compared with actual results from a primary test method measurement system for common materials [2].

The literature cites several international standards for validating the performance of stream analyzer systems [1, 3–5]. ASTM D6708 [6] and ASTM D7235 [7] are also cited for data treatment. However, all of them deal specifically with Gaussian data.

In 2004, a study of nonparametric statistics focused on sampling in atmospheric chemistry was carried out [8].

The aim of this manuscript is to develop procedures to validate the performance of stream analyzer systems with nonparametric behavior, because this is a common situation in real-life data; for example, it occurs in more than 60 % of oil and gas online analyzers in Brazil.

Procedures

Some definitions are required:

• Reference Test (RT): measurement procedure used in the laboratory or in a portable analyzer to produce reliable results under control, which can be compared with those obtained by the process analyzer (PA);

• Process Typical Sample (PTS): sample obtained from the process stream so as to represent the typical characteristics of the product;

• Reference Sample (RS): primary standard, mixture or dilution of standards, CRM, or sample from interlaboratory comparisons;

• Validation Sample (VS): sample used in the validation procedure. It may be a PTS or an RS whose parameter of interest does not change significantly during the validation procedure.

Validation is divided into two steps: initial validation and continual validation. Continual validation is necessary to guarantee that, after initial validation, the stream analyzer system remains in conformance with CRMs or other reliable analyzers; it is carried out by statistical process control (SPC) chart monitoring with new production samples. However, continual validation is not treated in this manuscript.

Initial validation

Initial validation is defined as a relationship between the analyzer results and the RT, RS, or PTS. Depending on the approach, the initial validation can be carried out using a combination of the RT, RS, and PTS.

The validation procedure should be selected according to Fig. 1.

Review of the nonparametric statistical analysis

Median absolute deviation (MAD)

This test applies to data sets whose behavior departs from normality, and the MAD is an important robust univariate measure of spread [9]. In these situations, Grubbs' test is not recommended.

The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation, which is mostly used in parametric statistics. In the standard deviation calculation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can strongly influence it. In the MAD, the deviations of a small number of outliers are irrelevant (Fig. 2).

Fig. 1 Selection of the validation procedure

Fig. 2 Flow diagram for each procedure: Procedure A (a), Procedure B (b), Procedure C (c), and Procedure D (d). a This procedure can be considered the best situation, because the validation is the most representative. The results from the analyzer for a set of typical process samples are obtained, and these same samples are mandatorily tested by a reference test. Results are evaluated by specific statistical tests in order to compare measures of position (paired data) and dispersion. b This procedure uses samples synthesized in the laboratory or reference samples when it is not possible to obtain typical process samples. The validation process compares position (unpaired data) and dispersion measures, necessarily using a reference test. c In this procedure, it is not possible to introduce a validation sample into the analyzer, so dispersion measures cannot be taken. Position measures (paired data) are compared, necessarily using a reference test or typical process samples. d When there is no reference test available, this procedure is recommended. The confidence interval from a reference sample is compared with the confidence interval derived from the process analyzer, only for position measures

The median is the value that separates the higher half of a data set, a population, or a probability distribution from the lower half. It is recommended when the data are not normally distributed.

A more logical approach to a robust analysis of data sets with non-normal behavior can be based on the concept of a distance function. A robust estimate of the variance [10] of the $x_i$ values can be obtained from the MAD, Eq. (1):

$$\mathrm{MAD} = \operatorname{median}\left(\left| x_i - \operatorname{median}(x_i) \right|\right) \qquad (1)$$

A suspect result $x_0$ (the lowest or highest value of a data set) is rejected as an outlier if $\left| x_0 - \operatorname{median}(x_i) \right| / \mathrm{MAD}$ is greater than 5.

For instance, for the data set (0.380, 0.400, 0.401, 0.403, 0.410, 0.411, 0.413), the median is 0.403. The absolute deviations from the median, in ascending order, are 0.000, 0.002, 0.003, 0.007, 0.008, 0.010, and 0.023. The MAD is the median of these seven numbers, that is, 0.007, Eq. (1). For the lowest value of the set, one finds |0.380 − 0.403|/0.007 = 3.3, which is lower than the limit of 5, and therefore this value is not considered an outlier [10].
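The MAD screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mad` and `mad_ratio` and the constant `THRESHOLD` are hypothetical, and the original calculations were carried out in a spreadsheet, not in code.

```python
from statistics import median


def mad(values):
    """Median absolute deviation about the median, Eq. (1)."""
    center = median(values)
    return median(abs(x - center) for x in values)


def mad_ratio(suspect, values):
    """|x0 - median(x)| / MAD for a suspect value x0."""
    return abs(suspect - median(values)) / mad(values)


# Data set used in the worked example of the text
data = [0.380, 0.400, 0.401, 0.403, 0.410, 0.411, 0.413]
THRESHOLD = 5  # rejection limit quoted in the text

ratio = mad_ratio(min(data), data)
is_outlier = ratio > THRESHOLD
print(f"MAD = {mad(data):.3f}, ratio = {ratio:.1f}, outlier = {is_outlier}")
```

For the lowest value, 0.380, the ratio stays below the limit of 5, in agreement with the example.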

Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test used as an alternative to the paired Student's t test (also called the t test for matched pairs or for dependent sets of data) when the population cannot be assumed to be normally distributed [11].

In each pair, the value from the first set of data is clearly associated with the respective value from the second set. Considering a set of samples for which the same property is measured using two different procedures, the difference between the measurement results for each sample is evaluated, and the differences are sorted by increasing absolute value and assigned order numbers (excluding zero differences). These numbers denote the rank $R_i$, unless the absolute differences for two or more samples are equal, in which case each of them receives the average of the ranks they would have received had they differed. This is exemplified in Tables 1 and 2 with mass concentration values of zinc, determined by two different measurement procedures, for each of eight samples of health food [10].

The ranking of these results presents an additional concern, that of tied ranks, as for 0.2, 0.4, and 0.7. Each tied value gets the average of the ranks involved; for example, the two differences of 0.2 share ranks 1 and 2, so each gets 1.5. In such cases, it is worth verifying that the ranking has been done correctly by calculating the sum of all the ranks without regard to sign, which must equal n(n + 1)/2, that is, 36 for n = 8.

Table 1 Measurement results of zinc mass concentration in eight samples of health food ($x_{2,i}$ using EDTA titration, $x_{1,i}$ using atomic spectroscopy) and their difference $d_i$

| i | $x_{2,i}$ (mg L⁻¹) | $x_{1,i}$ (mg L⁻¹) | sgn $d_i$ | $\lvert d_i \rvert$ (mg L⁻¹) |
|---|---|---|---|---|
| 1 | 7.2 | 7.6 | −1 | 0.4 |
| 2 | 6.1 | 6.8 | −1 | 0.7 |
| 3 | 5.2 | 4.6 | 1 | 0.6 |
| 4 | 5.9 | 5.7 | 1 | 0.2 |
| 5 | 9.0 | 9.7 | −1 | 0.7 |
| 6 | 8.5 | 8.7 | −1 | 0.2 |
| 7 | 6.6 | 7.0 | −1 | 0.4 |
| 8 | 4.4 | 4.7 | −1 | 0.3 |

sgn is the sign function

Table 2 Example of Wilcoxon signed-rank test applied to the data presented in Table 1: ordering by absolute difference $\lvert d_i \rvert$

| i | $x_{2,i}$ (mg L⁻¹) | $x_{1,i}$ (mg L⁻¹) | sgn $d_i$ | $\lvert d_i \rvert$ (mg L⁻¹) | $R_i$ | $R_i\,\mathrm{sgn}\,d_i$ |
|---|---|---|---|---|---|---|
| 6 | 8.5 | 8.7 | −1 | 0.2 | 1.5 | −1.5 |
| 4 | 5.9 | 5.7 | 1 | 0.2 | 1.5 | 1.5 |
| 8 | 4.4 | 4.7 | −1 | 0.3 | 3 | −3 |
| 1 | 7.2 | 7.6 | −1 | 0.4 | 4.5 | −4.5 |
| 7 | 6.6 | 7.0 | −1 | 0.4 | 4.5 | −4.5 |
| 3 | 5.2 | 4.6 | 1 | 0.6 | 6 | 6 |
| 2 | 6.1 | 6.8 | −1 | 0.7 | 7.5 | −7.5 |
| 5 | 9.0 | 9.7 | −1 | 0.7 | 7.5 | −7.5 |

Positive differences $d_i$ are those with sgn $d_i$ = 1

Table 3 Example of Wilcoxon rank-sum test

| Measurement procedure | Silver concentration (µg mL⁻¹) | Rank |
|---|---|---|
| 2 | 7.7 | 1 |
| 2 | 8.0 | 2 |
| 2 | 9.0 | 3 |
| 1 | 9.5 | 4 |
| 2 | 9.7 | 5 |
| 1 | 9.8 | 6 |
| 2 | 9.9 | 7 |
| 1 | 10.2 | 8 |
| 1 | 10.5 | 9 |
| 1 | 10.7 | 10 |

The sum of the ranks of the positive differences is 7.5, whereas the sum of the ranks of the negative differences amounts to 28.5. A systematic difference between the data sets $x_{2,i}$ and $x_{1,i}$ must be assumed if the smaller sum is below a critical value, which for 8 pairs and a probability of 0.05 is 3 [10]. Since 7.5 exceeds this value, the data of the example do not provide evidence for a systematic difference between the two measurement procedures.
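The signed-rank calculation for the zinc data of Table 1 can be sketched as follows. The helper names (`average_ranks`, `signed_rank_sums`) are hypothetical, and the rounding of the differences is an implementation detail added here so that floating-point noise does not break ties; neither comes from the original work.

```python
def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def signed_rank_sums(first, second, decimals=6):
    """Return (rank sum of positive differences, rank sum of negative ones)."""
    # Rounding keeps equal differences (e.g. 0.4 and 0.4) exactly equal as floats.
    diffs = [round(a - b, decimals) for a, b in zip(first, second)]
    diffs = [d for d in diffs if d != 0]  # zero differences are excluded
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


x2 = [7.2, 6.1, 5.2, 5.9, 9.0, 8.5, 6.6, 4.4]  # EDTA titration (Table 1)
x1 = [7.6, 6.8, 4.6, 5.7, 9.7, 8.7, 7.0, 4.7]  # atomic spectroscopy (Table 1)

w_plus, w_minus = signed_rank_sums(x2, x1)
CRITICAL = 3  # value quoted in the text for 8 pairs and P = 0.05
systematic_difference = min(w_plus, w_minus) < CRITICAL
print(w_plus, w_minus, systematic_difference)
```

The two rank sums add up to n(n + 1)/2 = 36, which is the consistency check recommended in the text.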

Wilcoxon rank-sum test

The Wilcoxon rank-sum test is a nonparametric alternative to the two-sample t test and is based solely on the order in which the observations from the two sets of data fall [12].

This statistical test is intended to compare the mean values of two groups (of sizes $n_1$ and $n_2$) that are not correlated. Whether the variances of the groups are statistically identical or different should also be evaluated.

Let $X_{11}, X_{12}, \ldots$ and $X_{21}, X_{22}, \ldots$ be two independent random sets of data of sizes $n_1 \le n_2$ from the continuous populations $X_1$ and $X_2$, that is, populations in which the random variable measures a continuous characteristic.

The test procedure is as follows. Arrange all $n_1 + n_2$ observations in ascending order of magnitude and assign ranks to them. If two or more observations are tied (identical), use the mean of the ranks that would have been assigned if the observations differed.

Considering $W_1$ as the sum of the ranks in the smaller set of data, and $W_2$ as the sum of the ranks in the other set of data [13], then

$$W_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - W_1 \qquad (2)$$

As an example [11], a sample of photographic waste was analyzed for silver by atomic absorption spectrometry without chemical treatment (measurement Procedure 1), five successive measurements giving values of 9.8, 10.2, 10.7, 9.5, and 10.5 µg mL⁻¹. After chemical treatment, the waste was analyzed again by the same analytical technique (measurement Procedure 2), five successive measurements giving values of 7.7, 9.7, 8.0, 9.9, and 9.0 µg mL⁻¹.

These data are arranged in ascending order and ranked as shown in Table 3.

The sum of the ranks for measurement Procedure 1 is $W_1 = 4 + 6 + 8 + 9 + 10 = 37$, and for measurement Procedure 2 it is $W_2 = (5 + 5)(5 + 5 + 1)/2 - 37 = 18$.

Since neither $W_1$ nor $W_2$ is less than or equal to $W_{0.05} = 17$ [13], there is no statistically significant difference between the two measurement procedures at a confidence level of 95 %.
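As a minimal sketch, the rank sums of the silver example can be obtained as shown below. The function names are hypothetical, and the critical value is simply the tabulated figure quoted in the text, not something the code derives.

```python
def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def rank_sums(sample_1, sample_2):
    """Return (W1, W2), the rank sums of both samples in the pooled ranking."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    w1 = sum(ranks[:n1])
    w2 = (n1 + n2) * (n1 + n2 + 1) / 2 - w1  # Eq. (2)
    return w1, w2


procedure_1 = [9.8, 10.2, 10.7, 9.5, 10.5]  # without chemical treatment
procedure_2 = [7.7, 9.7, 8.0, 9.9, 9.0]     # after chemical treatment

w1, w2 = rank_sums(procedure_1, procedure_2)
W_CRITICAL = 17  # W0.05 quoted in the text for this example
significant = min(w1, w2) <= W_CRITICAL
print(w1, w2, significant)
```

Computing $W_2$ from Eq. (2) rather than by summing the second set of ranks directly gives the same number and doubles as a check on the ranking.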

Test of Moses

This statistical test compares the variability (dispersion) of two independent populations when at least one of them presents a non-normal distribution [14].

The observations of each group are randomly distributed among its subsamples, $m_1$ subsamples for Group 1 and $m_2$ subsamples for Group 2.

For each subsample, the sum of the squares of the differences from the subsample mean is calculated,

$$\sum D_{ji}^2 = \sum \left( X_i - \bar{X} \right)^2 ,$$

where $\bar{X}$ is the average of all $X_i$ values in the subsample. The notation $\sum D_{ji}^2$ represents the sum of the squared difference scores of the ith subsample in Group j. Here, a subsample is a set of scores derived from a sample. These sums are sorted in increasing order and ranked. If two or more sums are identical, each receives the average of the ranks they would have received had they differed.

The ranks are then separated into their respective groups, with sums $\sum R_1$ and $\sum R_2$. Each group has m subsamples ($m_1$ and $m_2$, respectively) of k scores each, so that $n = m \cdot k$.

Table 4 gives an example based on a small set of data, with each subsample comprising k = 2 scores, $m_1 = 3$ subsamples for Group 1, and $m_2 = 3$ subsamples for Group 2. Since $n_1 = n_2 = 3 \cdot 2 = 6$, 12 subjects are employed in the analysis. In this instance, a table of random numbers is used to select the scores for each of the subsamples.

Table 4 Numerical example of the Test of Moses, two groups with three subsamples each

| Group | Subsample | $\bar{X}$ | $X - \bar{X}$ | $(X - \bar{X})^2$ | $\sum (X - \bar{X})^2 = \sum D_{ji}^2$ | Rank |
|---|---|---|---|---|---|---|
| 1 | 1, 10 | 5.5 | −4.5, 4.5 | 20.25, 20.25 | 40.5 | 4.5 |
| 1 | 10, 0 | 5.0 | 5.0, −5.0 | 25.00, 25.00 | 50.0 | 6 |
| 1 | 9, 0 | 4.5 | 4.5, −4.5 | 20.25, 20.25 | 40.5 | 4.5 |
| 2 | 4, 4 | 4.0 | 0.0, 0.0 | 0.0, 0.0 | 0.0 | 1 |
| 2 | 5, 6 | 5.5 | −0.5, 0.5 | 0.25, 0.25 | 0.5 | 2.5 |
| 2 | 5, 6 | 5.5 | −0.5, 0.5 | 0.25, 0.25 | 0.5 | 2.5 |

The rank sums are $\sum R_1 = 4.5 + 6 + 4.5 = 15$ and $\sum R_2 = 1 + 2.5 + 2.5 = 6$.

$U_1$, $U_2$, and U are calculated as:

$$U_1 = m_1 m_2 + \frac{m_1 (m_1 + 1)}{2} - \sum R_1 = 3 \cdot 3 + \frac{3 \cdot (3 + 1)}{2} - 15 = 0 \qquad (3)$$

$$U_2 = m_1 m_2 + \frac{m_2 (m_2 + 1)}{2} - \sum R_2 = 3 \cdot 3 + \frac{3 \cdot (3 + 1)}{2} - 6 = 9 \qquad (4)$$

$U_{\text{calculated}}$ is the minimum value between $U_1$ and $U_2$:

$$U_{\text{calculated}} = \min(U_1, U_2) = 0$$

As $U_{\text{calculated}} = 0 \le U_{\text{critical}} = 0$, there is a statistically significant difference between the variances of Groups 1 and 2 (the difference would not be significant only if $U_{\text{calculated}}$ exceeded $U_{\text{critical}}$, as in Tables 7 and 8).
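The arithmetic of Table 4 and Eqs. (3) and (4) can be sketched as below. The subsamples are taken as already formed (the random allocation step is not reproduced), and the function names are hypothetical.

```python
from statistics import mean


def sum_of_squares(subsample):
    """Sum of squared deviations of a subsample about its own mean."""
    center = mean(subsample)
    return sum((x - center) ** 2 for x in subsample)


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def moses_u(subsamples_1, subsamples_2):
    """Return (U1, U2) from the ranked subsample sums of squares."""
    m1, m2 = len(subsamples_1), len(subsamples_2)
    d2 = [sum_of_squares(s) for s in subsamples_1 + subsamples_2]
    ranks = average_ranks(d2)
    r1, r2 = sum(ranks[:m1]), sum(ranks[m1:])
    u1 = m1 * m2 + m1 * (m1 + 1) / 2 - r1  # Eq. (3)
    u2 = m1 * m2 + m2 * (m2 + 1) / 2 - r2  # Eq. (4)
    return u1, u2


group_1 = [[1, 10], [10, 0], [9, 0]]  # subsamples of Group 1 (Table 4)
group_2 = [[4, 4], [5, 6], [5, 6]]    # subsamples of Group 2 (Table 4)

u1, u2 = moses_u(group_1, group_2)
u_calculated = min(u1, u2)
print(u1, u2, u_calculated)
```

Because the result depends on how the scores are allocated to subsamples, a different random allocation of the same 12 scores could give a different U.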

Test of Bonett-Price [15]

This test is applied to data sets with nonparametric distributions. In this situation, the confidence interval is based on a linear function of medians of the population.

The median (Md) of the data set is calculated, and its variance, $\mathrm{var}_{Md}$, is estimated by

$$\mathrm{var}_{Md} = \left[ \frac{Y_{(n-a+1)} - Y_{(a)}}{2z} \right]^2$$

where $Y_{(i)}$ represents the quantitative score of the response variable in position i of the ordered data, and a and z are tabulated as a function of the number of data and the level of significance [15].

The confidence interval, CI, can be calculated by:

$$\mathrm{CI} = Md \pm z_{\alpha/2} \left( \mathrm{var}_{Md} \right)^{1/2}$$

A simple example is related to the salt concentration in waste water, expressed in µg mL⁻¹: 13.53, 28.42, 48.11, 48.64, 51.40, 59.91, 67.98, 79.13, and 103.5. The median, Md, is 51.4. Referring to the values of the parameters for the Price–Bonett variance estimate [15]: n = 9, a = 2, and z = 2.064. Hence $Y_{(9-2+1)} = Y_{(8)} = 79.13$ and $Y_{(2)} = 28.42$.

Table 6 Comparison of mean and median of ammoniacal nitrogen results from ten different samples by parametric and nonparametric tests

| Process analyzer (PA) [mg L⁻¹] | Reference test (RT) [mg L⁻¹] |
|---|---|
| 8.6 | 10.4 |
| 3.7 | 4.2 |
| 2.1 | 2.5 |
| 2.3 | 2.1 |
| 0.9 | 1.1 |
| 1.3 | 1.4 |
| 0.8 | 1.2 |
| 0.8 | 1.0 |
| 0.9 | 1.2 |
| 0.8 | 0.9 |

| Parametric statistics | Nonparametric statistics |
|---|---|
| Paired t test | Wilcoxon signed-rank test |
| $t_{\text{calculated}}$ = 2.237 < $t_{\text{critical}}$ = 2.262 | $W_{\text{calculated}}$ = 4.00 < $W_{\text{critical}}$ = 8.00 |
| There is not a statistically significant difference between the means | There is a statistically significant difference between the medians |

Table 7 Precision evaluation of results from ten replicate measurements of ammoniacal nitrogen concentration in the same sample by parametric and nonparametric tests

| Process analyzer (PA) [mg L⁻¹] | Reference test (RT) [mg L⁻¹] |
|---|---|
| 8.6 | 8.2 |
| 8.6 | 8.6 |
| 8.5 | 8.5 |
| 8.6 | 8.6 |
| 8.5 | 8.5 |
| 8.6 | 8.6 |
| 8.5 | 8.5 |
| 8.5 | 8.5 |
| 8.6 | 8.6 |
| 8.7 | 8.4 |
| $SW_{\text{calculated}}$ = 0.8021 < $SW_{\text{critical}}$ = 0.8420 | $SW_{\text{calculated}}$ = 0.7733 < $SW_{\text{critical}}$ = 0.8420 |
| Non-normal distribution | Non-normal distribution |

| Parametric statistics | Nonparametric statistics |
|---|---|
| Fisher–Snedecor F test | Moses test |
| $F_{\text{calculated}}$ = 3.417 > $F_{\text{critical}}$ = 3.179 | $U_{\text{calculated}}$ = 11.00 > $U_{\text{critical}}$ = 2.00 |
| There is a statistically significant difference between the variances | There is not a statistically significant difference between the variances |

Table 5 Comparison of parametric and nonparametric tests

| Aim / statistical test | Normal distribution | Non-normal distribution |
|---|---|---|
| Outliers | Grubbs' test | Median absolute deviation (MAD) |
| Comparing means/medians (paired samples) | Paired t test | Wilcoxon signed-rank test |
| Comparing means/medians (unpaired samples) | Unpaired t test | Wilcoxon rank-sum test |
| Comparing dispersion measures | Fisher–Snedecor F test | Moses test |
| Confidence interval | t test | Bonett-Price |

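The variance estimator, $\mathrm{var}_{Md}$, is

$$\mathrm{var}_{Md} = \left[ \frac{Y_{(n-a+1)} - Y_{(a)}}{2z} \right]^2 = \left[ \frac{79.13 - 28.42}{2 \times 2.064} \right]^2 = 150.91 .$$

The 95 % confidence interval is $\mathrm{CI} = Md \pm z_{\alpha/2} \left( \mathrm{var}_{Md} \right)^{1/2} = 51.4 \pm 1.96\,(150.91)^{1/2} = (27.33, 75.47)$.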
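The Bonett-Price interval for the salt concentration example can be sketched as follows. The function name and argument names are hypothetical; the constants a = 2 and z = 2.064 are the tabulated values quoted in the text and are passed in as inputs, not computed.

```python
from math import sqrt
from statistics import median


def bonett_price_interval(values, a, z_table, z_conf=1.96):
    """Return (median, variance estimate, (lower, upper)) for the median.

    a and z_table are the tabulated constants for the sample size;
    z_conf is the normal quantile for the chosen confidence level.
    """
    y = sorted(values)
    n = len(y)
    # Y(n-a+1) and Y(a) are 1-based order statistics, hence the index shifts.
    var_md = ((y[n - a] - y[a - 1]) / (2 * z_table)) ** 2
    half_width = z_conf * sqrt(var_md)
    md = median(y)
    return md, var_md, (md - half_width, md + half_width)


salt = [13.53, 28.42, 48.11, 48.64, 51.40, 59.91, 67.98, 79.13, 103.5]
md, var_md, (lower, upper) = bonett_price_interval(salt, a=2, z_table=2.064)
print(f"Md = {md}, var = {var_md:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

Evaluated without intermediate rounding, the limits come out at about 27.32 and 75.48, which agrees with the interval quoted in the text to within rounding of the half-width.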

Experimental

Four case studies, one for each procedure, are discussed in this work.

Data were collected at the Alberto Pasqualini Refinery, Canoas, Brazil.

In the first study, the ammoniacal nitrogen concentration in effluent samples is determined by RT and PA. Validation is performed using Procedure A. The process analyzer is a GLI International (IL, USA) model 750-500 ion-selective electrode with a working range of 0–20 mg L⁻¹. The measurement procedure is based on Standard Methods 4500.

In the second study, the ethane amount-of-substance fraction in a single sample of hydrogen recycle is determined by RT and PA. Validation follows Procedure B. The process analyzer is a Yokogawa (Tokyo, Japan) chromatograph, model GC 1000 Mark II, with a working range of 0–1 mmol/mol. The measurement procedure is correlated with UOP 539.

The third study involves the flash point of diesel oil (Procedure C). The process analyzer is an integrated flash point analyzer from Precision Scientific Petroleum Instruments (IL, USA), model 45624, with a working range of 10–121 °C. The measurement procedure is correlated with ASTM D93.

In the last study, the propane amount-of-substance fraction in a reference sample of hydrogen recycle is determined by PA, validated by Procedure D, and compared with the certified result. The process analyzer is a Yokogawa chromatograph, model GC 1000 Mark II, with a working range of 0–1 mmol/mol. The measurement procedure is correlated with UOP 539.

Results and discussion

The nonparametric data from the case studies are evaluated by both parametric and nonparametric tests (Table 5). Before choosing one of these tests, it is necessary to assess whether or not the data follow a normal distribution. In this work, the Shapiro–Wilk test, abbreviated SW, is used to check for deviation from a normal distribution [16].

All calculations are carried out in Microsoft Excel, and critical values are based on a confidence level of 95 %. No value is considered an outlier by either the Grubbs or the MAD test.

Case study: Procedure A

Table 8 Precision and comparison of mean and median of results from the same sample for ethane determination by parametric and nonparametric tests

| PA [mmol/mol] | PA [mmol/mol] | RT [mmol/mol] | RT [mmol/mol] |
|---|---|---|---|
| 0.102 | 0.102 | 0.103 | 0.102 |
| 0.102 | 0.102 | 0.102 | 0.102 |
| 0.103 | 0.102 | 0.102 | 0.102 |
| 0.103 | 0.102 | 0.102 | 0.102 |
| 0.103 | 0.102 | 0.102 | 0.102 |
| 0.103 | 0.102 | 0.102 | |
| 0.103 | 0.102 | 0.102 | |
| 0.102 | 0.102 | 0.102 | |

| Process analyzer (PA) | Reference test (RT) |
|---|---|
| $SW_{\text{calculated}}$ = 0.591 < $SW_{\text{critical}}$ = 0.887 | $SW_{\text{calculated}}$ = 0.311 < $SW_{\text{critical}}$ = 0.866 |
| Non-normal distribution | Non-normal distribution |

| Parametric statistics | Nonparametric statistics |
|---|---|
| Fisher–Snedecor F test | Moses test |
| $F_{\text{calculated}}$ = 2.98 > $F_{\text{critical}}$ = 2.62 | $U_{\text{calculated}}$ = 4.00 > $U_{\text{critical}}$ = 1.00 |
| There is a statistically significant difference between the variances | There is not a statistically significant difference between the variances |
| Unpaired t test | Wilcoxon rank-sum test |
| $t_{\text{calculated}}$ = 1.656 < $t_{\text{critical}}$ = 2.056 | $W_{\text{calculated}}$ = 3.05 > $W_{\text{critical}}$ = 1.96 |
| There is not a statistically significant difference between the means | There is a statistically significant difference between the medians |

The comparison of mean and median is evaluated using ten different samples analyzed by both PA and RT. The mass concentrations are shown in Table 6. The Shapiro–Wilk test applied to the differences of each pair, between PA and RT, gives $SW_{\text{calculated}}$ = 0.7253 < $SW_{\text{critical}}$ = 0.8420; the distribution is non-normal.

Precision is evaluated by comparing ten results for the same sample analyzed by both PA and RT. The mass concentrations and the results of the parametric and nonparametric tests are shown in Table 7.
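The position comparison reported in Table 6 can be re-derived with a short script, shown here only as a sketch (the original calculations were done in Microsoft Excel). The helper name is hypothetical, the critical values are those listed in Table 6, and the rounding of the differences is added so that floating-point noise does not split ties.

```python
from math import sqrt
from statistics import mean, stdev


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


# Ammoniacal nitrogen results, mg/L (Table 6)
pa = [8.6, 3.7, 2.1, 2.3, 0.9, 1.3, 0.8, 0.8, 0.9, 0.8]
rt = [10.4, 4.2, 2.5, 2.1, 1.1, 1.4, 1.2, 1.0, 1.2, 0.9]
diffs = [round(p - r, 6) for p, r in zip(pa, rt)]

# Parametric route: paired t statistic on the differences
t_calculated = abs(mean(diffs)) / (stdev(diffs) / sqrt(len(diffs)))

# Nonparametric route: Wilcoxon signed-rank statistic (smaller rank sum)
nonzero = [d for d in diffs if d != 0]
ranks = average_ranks([abs(d) for d in nonzero])
w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
w_calculated = min(w_plus, w_minus)

T_CRITICAL, W_CRITICAL = 2.262, 8.0  # critical values listed in Table 6
print(f"t = {t_calculated:.3f}, significant: {t_calculated > T_CRITICAL}")
print(f"W = {w_calculated}, significant: {w_calculated < W_CRITICAL}")
```

The same ten pairs therefore lead to opposite decisions, depending on whether the differences are treated as normally distributed.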

Case study: Procedure B

Precision and mean/median are evaluated using the same sample. PA and RT use sixteen and thirteen replicates, respectively. The amount-of-substance fractions, together with the precision (variance) and mean/median evaluations by parametric and nonparametric tests, are shown in Table 8.

Case study: Procedure C

Only the comparison of mean and median is evaluated in this procedure. Ten different samples analyzed by PA and RT are compared. The flash points and the comparison of mean and median by parametric and nonparametric tests are shown in Table 9. The Shapiro–Wilk test applied to the differences of each pair, between PA and RT, gives $SW_{\text{calculated}}$ = 0.8294 < $SW_{\text{critical}}$ = 0.8420; the distribution is non-normal.

Case study: Procedure D

Only the comparison of mean and median by parametric and nonparametric tests is evaluated in this procedure. A confidence interval derived from twenty-three replicates of a reference sample analyzed by PA is compared with the certified value. The amount-of-substance fractions are shown in Table 10.
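The parametric side of Procedure D can be sketched as a t-based confidence interval for the PA replicates of Table 10, followed by an overlap check against the reference sample interval. This is an illustration under stated assumptions: the critical value 2.074 (two-sided 95 %, 22 degrees of freedom) is a standard tabulated value that is not given in the original text, the overlap criterion is inferred from how Table 10 is laid out, and the Bonett-Price limits are copied from Table 10, not recomputed.

```python
from math import sqrt
from statistics import mean, stdev

# Propane amount-of-substance fraction, mmol/mol (23 PA replicates, Table 10)
pa = [
    0.3011, 0.3030, 0.3041, 0.3020, 0.3030, 0.3040, 0.3020, 0.3030,
    0.3052, 0.3030, 0.3030, 0.3050, 0.3029, 0.3030, 0.3050, 0.3030,
    0.3030, 0.3060, 0.3030, 0.3030, 0.3061, 0.3030, 0.3030,
]

T_CRITICAL = 2.074  # assumed tabulated Student t, 95 %, 22 degrees of freedom


def overlaps(interval_a, interval_b):
    """True when two closed intervals share at least one point."""
    return interval_a[0] <= interval_b[1] and interval_b[0] <= interval_a[1]


half_width = T_CRITICAL * stdev(pa) / sqrt(len(pa))
t_interval = (mean(pa) - half_width, mean(pa) + half_width)

rs_interval = (0.3037, 0.3043)            # reference sample, from Table 10
bonett_price_interval = (0.3025, 0.3035)  # PA median interval, from Table 10

print(f"t interval: {t_interval[0]:.4f} to {t_interval[1]:.4f}")
print("t interval overlaps RS:", overlaps(t_interval, rs_interval))
print("Bonett-Price interval overlaps RS:", overlaps(bonett_price_interval, rs_interval))
```

Under this reading, overlapping intervals correspond to "no statistically significant difference" and disjoint intervals to a significant one, which matches the two conclusions reported in Table 10.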

Table 9 Comparison of mean and median of flash point results from ten different samples by parametric and nonparametric tests

| Process analyzer (PA) [°C] | Reference test (RT) [°C] |
|---|---|
| 40.0 | 39.0 |
| 40.0 | 41.0 |
| 41.0 | 41.0 |
| 41.0 | 41.0 |
| 40.5 | 40.0 |
| 39.5 | 39.5 |
| 43.0 | 43.0 |
| 43.0 | 43.0 |
| 41.5 | 41.5 |
| 42.0 | 41.0 |

| Parametric statistics | Nonparametric statistics |
|---|---|
| Paired t test | Wilcoxon signed-rank test |
| $t_{\text{calculated}}$ = 0.818 < $t_{\text{critical}}$ = 2.262 | $W_{\text{calculated}}$ = 4.00 < $W_{\text{critical}}$ = 8.00 |
| There is not a statistically significant difference between the means | There is a statistically significant difference between the medians |

Table 10 Comparison of mean and median of replicates of a reference sample for propane determination by parametric and nonparametric tests

| Process analyzer (PA) [mmol/mol] | | |
|---|---|---|
| 0.3011 | 0.3030 | 0.3041 |
| 0.3020 | 0.3030 | 0.3040 |
| 0.3020 | 0.3030 | 0.3052 |
| 0.3030 | 0.3030 | 0.3050 |
| 0.3029 | 0.3030 | 0.3050 |
| 0.3030 | 0.3030 | 0.3060 |
| 0.3030 | 0.3030 | 0.3061 |
| 0.3030 | 0.3030 | |

$SW_{\text{calculated}}$ = 0.8531 < $SW_{\text{critical}}$ = 0.9140: non-normal distribution

| | Parametric statistics (t test) | Nonparametric statistics (Bonett-Price) |
|---|---|---|
| Process analyzer (PA) | 0.3029–0.3040 | 0.3025–0.3035 |
| Reference sample (RS) | 0.3037–0.3043 | 0.3037–0.3043 |
| Conclusion | There is not a statistically significant difference between the means | There is a statistically significant difference between the medians |

Conclusions

New procedures for the performance validation of process stream analyzer systems have been developed to deal with nonparametric data. The procedures can be applied to all the common situations faced in that activity, such as the non-availability of a reference test, the impossibility of introducing the validation sample into the analyzer, and the non-availability of a typical process sample. From the examples studied, it can be observed that when parametric statistics are applied to nonparametric data, different results can be obtained, leading to wrong conclusions, which highlights the importance of using the right statistics in each case. This manuscript can help readers to understand this subject better, given that international standards do not foresee or deal with it. In 2014, this study will become part of a new Brazilian standard in order to cover this deficiency.

Acknowledgments This work was supported by Petrobras, and the

original idea and the meetings to elaborate it were sponsored by

Marcia Beatriz Ruiz Del Frari, manager from Products Engineering

and Oil Stockpiling of the Downstream-Refinery.

References

1. ASTM D3764 (2009) Standard practice for validation of the performance of process stream analyzer systems. American Society for Testing and Materials, West Conshohocken

2. ASTM D4175 (2009) Standard terminology relating to petroleum, petroleum products, and lubricants. American Society for Testing and Materials, West Conshohocken

3. ASTM D3864 (2012) Standard guide for on-line monitoring systems for water analysis. American Society for Testing and Materials, West Conshohocken

4. ASTM E1655-05 (2012) Standard practices for infrared multivariate quantitative analysis. American Society for Testing and Materials, West Conshohocken

5. ASTM D6299 (2010) Standard practice for applying statistical quality assurance and control charting techniques to evaluate analytical measurement system performance. American Society for Testing and Materials, West Conshohocken

6. ASTM D6708 (2008) Standard practice for statistical assessment and improvement of expected agreement between two test methods that purport to measure the same property of a material. American Society for Testing and Materials, West Conshohocken

7. ASTM D7235 (2010) Standard guide for establishing a linear correlation relationship between analyzer and primary test method results using relevant ASTM standard practices. American Society for Testing and Materials, West Conshohocken

8. Nunes MJ, Camões MF, McGovern F, Raes F (2004) Parametric or non-parametric statistical tools applied to marine aerosol sampling. Accred Qual Assur 9:355–360

9. Serfling R, Mazumder S (2009) Exponential probability inequality and convergence results for the median absolute deviation and its modifications. Stat Probab Lett 79(16):1767–1773

10. Miller JN, Miller JC (2010) Statistics and chemometrics for analytical chemistry, 6th edn. Pearson Education Limited, Harlow

11. Kasuya E (2010) Wilcoxon signed-ranks test: symmetry should be confirmed before the test. Anim Behav 79:765–767

12. Graham MA, Chakraborti S, Human SW (2011) A non-parametric exponentially weighted moving average signed-rank chart for monitoring location. Comput Stat Data An 55(8):2490–2503

13. Montgomery DC, Runger GC (2003) Applied statistics and probability for engineers, 3rd edn. Wiley, New York

14. Sheskin DJ (2003) Handbook of parametric and non-parametric statistical procedures, 3rd edn. Chapman and Hall, Boca Raton

15. Bonett DG, Price RM (2002) Statistical inference for a linear function of medians: confidence intervals, hypothesis testing, and sample size requirements. Psychol Methods 7:370–383

16. Massart DL et al (1997) Handbook of chemometrics and qualimetrics, Part A. Elsevier, Amsterdam
