
Graduate Lectures and Problems in Quality Control and Engineering Statistics:
Theory and Methods

To Accompany

Statistical Quality Assurance Methods for Engineers

by

Vardeman and Jobe

Stephen B. Vardeman

V2.0: January 2001

© Stephen Vardeman 2001. Permission to copy for educational purposes granted by the author, subject to the requirement that this title page be affixed to each copy (full or partial) produced.


Contents

1 Measurement and Statistics
  1.1 Theory for Range-Based Estimation of Variances
  1.2 Theory for Sample-Variance-Based Estimation of Variances
  1.3 Sample Variances and Gage R&R
  1.4 ANOVA and Gage R&R
  1.5 Confidence Intervals for Gage R&R Studies
  1.6 Calibration and Regression Analysis
  1.7 Crude Gaging and Statistics
      1.7.1 Distributions of Sample Means and Ranges from Integer Observations
      1.7.2 Estimation Based on Integer-Rounded Normal Data

2 Process Monitoring
  2.1 Some Theory for Stationary Discrete Time Finite State Markov Chains With a Single Absorbing State
  2.2 Some Applications of Markov Chains to the Analysis of Process Monitoring Schemes
  2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes

3 An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control
  3.1 General Exposition
  3.2 An Example

4 Process Characterization and Capability Analysis
  4.1 General Comments on Assessing and Dissecting Overall Variation
  4.2 More on Analysis Under the Hierarchical Random Effects Model
  4.3 Finite Population Sampling and Balanced Hierarchical Structures

5 Sampling Inspection
  5.1 More on Fraction Nonconforming Acceptance Sampling
  5.2 Imperfect Inspection and Acceptance Sampling
  5.3 Some Details Concerning the Economic Analysis of Sampling Inspection

6 Problems
  1 Measurement and Statistics
  2 Process Monitoring
  3 Engineering Control and Stochastic Control Theory
  4 Process Characterization
  5 Sampling Inspection

A Useful Probabilistic Approximation

Chapter 1

Measurement and Statistics

V&J §2.2 presents an introduction to the topic of measurement and the relevance of the subject of statistics to the measurement enterprise. This chapter expands somewhat on the topics presented in V&J and raises some additional issues.

Note that V&J equation (2.1) and the discussion on page 19 of V&J are central to the role of statistics in describing measurements in engineering and quality assurance. Much of Stat 531 concerns process variation. The discussion on and around page 19 points out that variation in measurements from a process will include components of both real process variation and measurement variation.

1.1 Theory for Range-Based Estimation of Variances

Suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and let

$$R = \max X_i - \min X_i = \max(X_i - \mu) - \min(X_i - \mu) = \sigma\left(\max Z_i - \min Z_i\right)$$

where $Z_i = (X_i - \mu)/\sigma$. Then $Z_1, Z_2, \ldots, Z_n$ are iid standard normal random variables. So for purposes of studying the distribution of the range of iid normal variables, it suffices to study the standard normal case. (One can derive general facts from the $\sigma = 1$ facts by multiplying by $\sigma$.)

Consider first the matter of finding the mean of the range of $n$ iid standard normal variables, $Z_1, \ldots, Z_n$. Let

$$U = \min Z_i, \quad V = \max Z_i \quad\text{and}\quad W = V - U.$$


Then

$$EW = EV - EU$$

and

$$EU = E\min Z_i = -E(-\min Z_i) = -E\max(-Z_i),$$

where the $n$ variables $-Z_1, -Z_2, \ldots, -Z_n$ are iid standard normal. Thus

$$EW = EV - EU = 2EV.$$

Then (as is standard in the theory of order statistics) note that

$$V \le t \iff \text{all } n \text{ values } Z_i \text{ are} \le t.$$

So with $\Phi$ the standard normal cdf,

$$P[V \le t] = \Phi^n(t)$$

and thus a pdf for $V$ is

$$f(v) = n\phi(v)\Phi^{n-1}(v).$$

So

$$EV = \int_{-\infty}^{\infty} v\, n\phi(v)\Phi^{n-1}(v)\, dv,$$

and the evaluation of this integral becomes a (very small) problem in numerical analysis. The value of this integral clearly depends upon $n$. It is standard to invent a constant (whose dependence upon $n$ we will display explicitly)

$$d_2(n) := EW = 2EV$$

that is tabled in Table A.1 of V&J. With this notation, clearly

$$ER = d_2(n)\sigma$$

(and the range-based formulas in Section 2.2 of V&J are based on this simple fact).

To find more properties of $W$ (and hence $R$) requires appeal to a well-known order statistics result giving the joint density of two order statistics. The joint density of $U$ and $V$ is

$$f(u, v) = \begin{cases} n(n-1)\phi(u)\phi(v)\left(\Phi(v) - \Phi(u)\right)^{n-2} & \text{for } v > u \\ 0 & \text{otherwise.} \end{cases}$$

A transformation then easily shows that the joint density of $U$ and $W = V - U$ is

$$g(u, w) = \begin{cases} n(n-1)\phi(u)\phi(u+w)\left(\Phi(u+w) - \Phi(u)\right)^{n-2} & \text{for } w > 0 \\ 0 & \text{otherwise.} \end{cases}$$


Then, for example, the cdf of $W$ is

$$P[W \le t] = \int_0^t \int_{-\infty}^{\infty} g(u, w)\, du\, dw,$$

and the mean of $W^2$ is

$$EW^2 = \int_0^{\infty} \int_{-\infty}^{\infty} w^2 g(u, w)\, du\, dw.$$

Note that upon computing $EW$ and $EW^2$, one can compute both the variance of $W$,

$$\text{Var}\, W = EW^2 - (EW)^2,$$

and the standard deviation of $W$, $\sqrt{\text{Var}\, W}$. It is common to give this standard deviation the name $d_3(n)$ (where we continue to make the dependence on $n$ explicit, and again this constant is tabled in Table A.1 of V&J). Clearly, having computed $d_3(n) := \sqrt{\text{Var}\, W}$, one then has

$$\sqrt{\text{Var}\, R} = d_3(n)\sigma.$$
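The "(very small) problem in numerical analysis" is easy to reproduce. Below is a minimal sketch in Python (using numpy/scipy; the code is an added illustration, not part of V&J or these notes) that evaluates $EV$ by one-dimensional quadrature and $EW^2$ by two-dimensional quadrature over $g(u, w)$. For $n = 5$ it returns values close to the tabled $d_2(5) \approx 2.326$ and $d_3(5) \approx 0.864$.

    import numpy as np
    from scipy import integrate
    from scipy.stats import norm

    def d2(n):
        # d2(n) = EW = 2 EV, with EV the integral of v * n phi(v) Phi(v)^(n-1)
        ev, _ = integrate.quad(
            lambda v: v * n * norm.pdf(v) * norm.cdf(v) ** (n - 1),
            -np.inf, np.inf)
        return 2.0 * ev

    def d3(n):
        # EW^2 by double integration of w^2 g(u, w); then d3(n) = sqrt(Var W)
        def g(u, w):
            return (n * (n - 1) * norm.pdf(u) * norm.pdf(u + w)
                    * (norm.cdf(u + w) - norm.cdf(u)) ** (n - 2))
        ew2, _ = integrate.dblquad(lambda u, w: w ** 2 * g(u, w),
                                   0.0, np.inf,                      # outer variable w
                                   lambda w: -np.inf, lambda w: np.inf)  # inner variable u
        return np.sqrt(ew2 - d2(n) ** 2)

    print(d2(5), d3(5))   # approximately 2.326 and 0.864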

1.2 Theory for Sample-Variance-Based Estimation of Variances

Continue to suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and take

$$s^2 := \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

Standard probability theory says that

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Now if $U \sim \chi^2_\nu$ it is the case that $EU = \nu$ and $\text{Var}\, U = 2\nu$. It is thus immediate that

$$Es^2 = E\left[\frac{\sigma^2}{n-1}\cdot\frac{(n-1)s^2}{\sigma^2}\right] = \frac{\sigma^2}{n-1}\, E\left[\frac{(n-1)s^2}{\sigma^2}\right] = \sigma^2$$

and

$$\text{Var}\, s^2 = \text{Var}\left[\frac{\sigma^2}{n-1}\cdot\frac{(n-1)s^2}{\sigma^2}\right] = \left(\frac{\sigma^2}{n-1}\right)^2 \text{Var}\left[\frac{(n-1)s^2}{\sigma^2}\right] = \frac{2\sigma^4}{n-1},$$

so that

$$\sqrt{\text{Var}\, s^2} = \sigma^2\sqrt{\frac{2}{n-1}}.$$


Knowing that $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$ also makes it easy enough to develop properties of $s = \sqrt{s^2}$. For example, if

$$f(x) = \begin{cases} \dfrac{1}{\Gamma\left(\frac{n-1}{2}\right)2^{(n-1)/2}}\, x^{\frac{n-1}{2}-1} e^{-x/2} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

is the $\chi^2_{n-1}$ probability density, then

$$Es = E\left[\sqrt{\frac{\sigma^2}{n-1}}\sqrt{\frac{(n-1)s^2}{\sigma^2}}\right] = \frac{\sigma}{\sqrt{n-1}}\int_0^{\infty}\sqrt{x}\, f(x)\, dx = c_4(n)\sigma,$$

for

$$c_4(n) := \frac{\int_0^{\infty}\sqrt{x}\, f(x)\, dx}{\sqrt{n-1}}$$

another constant (depending upon $n$) tabled in Table A.1 of V&J. Further, the standard deviation of $s$ is

$$\sqrt{\text{Var}\, s} = \sqrt{Es^2 - (Es)^2} = \sqrt{\sigma^2 - \sigma^2(c_4(n))^2} = \sigma\sqrt{1 - c_4^2(n)} = c_5(n)\sigma$$

for

$$c_5(n) := \sqrt{1 - c_4^2(n)},$$

yet another constant tabled in Table A.1.

The fact that sums of independent $\chi^2$ random variables are again $\chi^2$ (with degrees of freedom equal to the sum of the component degrees of freedom) and the kinds of relationships in this section provide means of combining various kinds of sample variances to get pooled estimators of variances (and variance components) and finding the means and variances of these estimators. For example, if one pools in the usual way the sample variances from $r$ normal samples of size $m$ to get a single pooled sample variance, $s^2_{\text{pooled}}$, then $r(m-1)s^2_{\text{pooled}}/\sigma^2$ is $\chi^2$ with degrees of freedom $\nu = r(m-1)$. That is, all of the above can be applied by thinking of $s^2_{\text{pooled}}$ as a sample variance based on a sample of size $n = r(m-1) + 1$.
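The constants $c_4(n)$ and $c_5(n)$ are also easy to reproduce: the integral above has the known closed form $E\sqrt{U} = \sqrt{2}\,\Gamma((\nu+1)/2)/\Gamma(\nu/2)$ for $U \sim \chi^2_\nu$, so that $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. A minimal sketch (Python/scipy, an added illustration rather than part of the original notes):

    import numpy as np
    from scipy.special import gammaln

    def c4(n):
        # c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2),
        # computed on the log scale to avoid overflow for larger n
        return np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))

    def c5(n):
        return np.sqrt(1.0 - c4(n) ** 2)

    print(c4(5), c5(5))   # approximately 0.9400 and 0.3412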

1.3 Sample Variances and Gage R&R

The methods of gage R&R analysis presented in V&J §2.2.2 are based on ranges (and the facts in §1.1 above). They are presented in V&J not because of their efficiency, but because of their computational simplicity. Better (and analogous) methods can be based on the facts in §1.2 above. For example, under the two-way random effects model (2.4) of V&J, if one pools the $I \cdot J$ cell sample variances $s^2_{ij}$ to get $s^2_{\text{pooled}}$, all of the previous paragraph applies and gives methods of estimating the repeatability variance component $\sigma^2$ (or the repeatability standard deviation $\sigma$) and calculating means and variances of estimators based on $s^2_{\text{pooled}}$.


Or, consider the problem of estimating $\sigma_{\text{reproducibility}}$ defined in display (2.5) of V&J. With $\bar{y}_{ij}$ as defined on page 24 of V&J, note that for fixed $i$, the $J$ random variables $\bar{y}_{ij} - \alpha_i$ have the same sample variance as the $J$ random variables $\bar{y}_{ij}$, namely

$$s_i^2 := \frac{1}{J-1}\sum_j \left(\bar{y}_{ij} - \bar{y}_{i\cdot}\right)^2.$$

But for fixed $i$ the $J$ random variables $\bar{y}_{ij} - \alpha_i$ are iid normal with mean $\mu$ and variance $\sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2/m$, so that

$$Es_i^2 = \sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2/m.$$

So

$$\frac{1}{I}\sum_i s_i^2$$

is a plausible estimator of $\sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2/m$. Hence

$$\frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m},$$

or better yet

$$\max\left(0,\ \frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m}\right) \tag{1.1}$$

is a plausible estimator of $\sigma^2_{\text{reproducibility}}$.

1.4 ANOVA and Gage R&R

Under the two-way random effects model (2.4) of V&J, with balanced data, it is well known that the ANOVA mean squares

$$MSE = \frac{1}{IJ(m-1)}\sum_{i,j,k}\left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2,$$

$$MSAB = \frac{m}{(I-1)(J-1)}\sum_{i,j}\left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}\right)^2,$$

$$MSA = \frac{mJ}{I-1}\sum_i \left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2, \quad\text{and}$$

$$MSB = \frac{mI}{J-1}\sum_j \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2$$

are independent random variables, that

$$EMSE = \sigma^2,$$

$$EMSAB = \sigma^2 + m\sigma_{\alpha\beta}^2,$$

$$EMSA = \sigma^2 + m\sigma_{\alpha\beta}^2 + mJ\sigma_\alpha^2, \quad\text{and}$$

$$EMSB = \sigma^2 + m\sigma_{\alpha\beta}^2 + mI\sigma_\beta^2,$$


Table 1.1: Two-Way Balanced Data Random Effects Analysis ANOVA Table

Source            SS      df            MS      EMS
Parts             SSA     $I-1$         MSA     $\sigma^2 + m\sigma_{\alpha\beta}^2 + mJ\sigma_\alpha^2$
Operators         SSB     $J-1$         MSB     $\sigma^2 + m\sigma_{\alpha\beta}^2 + mI\sigma_\beta^2$
Parts×Operators   SSAB    $(I-1)(J-1)$  MSAB    $\sigma^2 + m\sigma_{\alpha\beta}^2$
Error             SSE     $(m-1)IJ$     MSE     $\sigma^2$
Total             SSTot   $mIJ-1$

and that the quantities

$$\frac{(m-1)IJ\, MSE}{EMSE}, \quad \frac{(I-1)(J-1)\, MSAB}{EMSAB}, \quad \frac{(I-1)\, MSA}{EMSA} \quad\text{and}\quad \frac{(J-1)\, MSB}{EMSB}$$

are $\chi^2$ random variables with respective degrees of freedom

$$(m-1)IJ, \quad (I-1)(J-1), \quad (I-1) \quad\text{and}\quad (J-1).$$

These facts about sums of squares and mean squares for the two-way random effects model are often summarized in the usual (two-way random effects model) ANOVA table, Table 1.1. (The sums of squares are simply the mean squares multiplied by the degrees of freedom. More on the interpretation of such tables can be found in places like §8-4 of V.)

As a matter of fact, the ANOVA error mean square is exactly $s^2_{\text{pooled}}$ from §1.3 above. Further, the expected mean squares suggest ways of producing sensible estimators of other parametric functions of interest in gage R&R contexts (see V&J page 27 in this regard). For example, note that

$$\sigma^2_{\text{reproducibility}} = \frac{1}{mI}\, EMSB + \frac{1}{m}\left(1 - \frac{1}{I}\right) EMSAB - \frac{1}{m}\, EMSE,$$

which suggests the ANOVA-based estimator

$$\hat{\sigma}^2_{\text{reproducibility}} = \max\left(0,\ \frac{1}{mI}\, MSB + \frac{1}{m}\left(1 - \frac{1}{I}\right) MSAB - \frac{1}{m}\, MSE\right). \tag{1.2}$$

What may or may not be well known is that this estimator (1.2) is exactly the estimator of $\sigma^2_{\text{reproducibility}}$ in display (1.1).

Since many common estimators of quantities of interest in gage R&R studies are functions of mean squares, it is useful to have at least some crude standard errors for them. These can be derived from the delta method/propagation of error/Taylor series argument provided in the appendix to these notes. For example, if $MS_i$, $i = 1, \ldots, k$, are independent random variables with $\nu_i MS_i/EMS_i$ having a $\chi^2_{\nu_i}$ distribution, consider a function of $k$ real variables $f(x_1, \ldots, x_k)$ and the random variable

$$U = f(MS_1, MS_2, \ldots, MS_k).$$


Propagation of error arguments produce the approximation

$$\text{Var}\, U \approx \sum_{i=1}^{k}\left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \text{Var}\, MS_i = \sum_{i=1}^{k}\left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \frac{2(EMS_i)^2}{\nu_i},$$

and upon substituting mean squares for their expected values, one has a standard error for $U$, namely

$$\sqrt{\widehat{\text{Var}}\, U} = \sqrt{2\sum_{i=1}^{k}\left(\left.\frac{\partial f}{\partial x_i}\right|_{MS_1, MS_2, \ldots, MS_k}\right)^2 \frac{(MS_i)^2}{\nu_i}}. \tag{1.3}$$

In the special case where the function of the mean squares of interest is linear in them, say

$$U = \sum_{i=1}^{k} c_i MS_i,$$

the standard error specializes to

$$\sqrt{\widehat{\text{Var}}\, U} = \sqrt{2\sum_{i=1}^{k} c_i^2 \frac{(MS_i)^2}{\nu_i}},$$

which provides at least a crude method of producing standard errors for $\hat{\sigma}^2_{\text{reproducibility}}$ and $\hat{\sigma}^2_{\text{overall}}$. Such standard errors are useful in giving some indication of the precision with which the quantities of interest in a gage R&R study have been estimated.
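To make the mean squares, the estimator (1.2), and its linear-combination standard error concrete, here is a minimal sketch (Python/numpy; the balanced array y and its dimensions are hypothetical illustrative inputs, not data from these notes):

    import numpy as np

    def gage_rr(y):
        """y has shape (I, J, m): I parts, J operators, m repeats per cell."""
        I, J, m = y.shape
        ybar_ij = y.mean(axis=2)          # cell means
        ybar_i = ybar_ij.mean(axis=1)     # part means
        ybar_j = ybar_ij.mean(axis=0)     # operator means
        ybar = y.mean()                   # grand mean

        MSE = ((y - ybar_ij[:, :, None]) ** 2).sum() / (I * J * (m - 1))
        MSAB = m * ((ybar_ij - ybar_i[:, None] - ybar_j[None, :] + ybar) ** 2).sum() \
               / ((I - 1) * (J - 1))
        MSB = m * I * ((ybar_j - ybar) ** 2).sum() / (J - 1)

        # Estimator (1.2) is linear in (MSB, MSAB, MSE) with these coefficients,
        # so the standard error (1.3) takes the simple linear-combination form
        c = np.array([1 / (m * I), (1 - 1 / I) / m, -1 / m])
        MS = np.array([MSB, MSAB, MSE])
        df = np.array([J - 1, (I - 1) * (J - 1), I * J * (m - 1)])
        est = max(0.0, c @ MS)
        se = np.sqrt(2 * np.sum(c ** 2 * MS ** 2 / df))
        return est, se

    rng = np.random.default_rng(0)
    y = 10 + rng.normal(size=(5, 3, 2))   # hypothetical 5 parts x 3 operators x 2 repeats
    print(gage_rr(y))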

1.5 Confidence Intervals for Gage R&R Studies

The parametric functions of interest in gage R&R studies (indeed in all random effects analyses) are functions of variance components, or equivalently, functions of expected mean squares. It is thus possible to apply theory for estimating such quantities to the problem of assessing precision of estimation in a gage study. As a first (and very crude) example of this, note that taking the point of view of §1.4 above, where $U = f(MS_1, MS_2, \ldots, MS_k)$ is a sensible point estimator of an interesting function of the variance components and $\sqrt{\widehat{\text{Var}}\, U}$ is the standard error (1.3), simple approximate two-sided 95% confidence limits can be made as

$$U \pm 1.96\sqrt{\widehat{\text{Var}}\, U}.$$

These limits have the virtue of being amenable to hand calculation from the ANOVA sums of squares, but they are not likely to be reliable (in terms of holding their nominal/asymptotic coverage probability) for $I$, $J$ or $m$ small.

Linear models experts have done substantial research aimed at finding reliable confidence interval formulas for important functions of expected mean squares. For example, the book Confidence Intervals on Variance Components by Burdick and Graybill gives results (on the so-called "modified large sample" method) that can be used to make confidence intervals on various important functions of variance components. The following is some material taken from Sections 3.2 and 3.3 of the Burdick and Graybill book.

Suppose that $MS_1, MS_2, \ldots, MS_k$ are $k$ independent mean squares. (The $MS_i$ are of the form $SS_i/\nu_i$, where $SS_i/EMS_i = \nu_i MS_i/EMS_i$ has a $\chi^2_{\nu_i}$ distribution.) For $1 \le p < k$ and positive constants $c_1, c_2, \ldots, c_k$ suppose that the quantity

$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p - c_{p+1} EMS_{p+1} - \cdots - c_k EMS_k \tag{1.4}$$

is of interest. Let

$$\hat{\theta} = c_1 MS_1 + \cdots + c_p MS_p - c_{p+1} MS_{p+1} - \cdots - c_k MS_k.$$

Approximate confidence limits on $\theta$ in display (1.4) are of the form

$$L = \hat{\theta} - \sqrt{V_L} \quad\text{and/or}\quad U = \hat{\theta} + \sqrt{V_U},$$

for $V_L$ and $V_U$ defined below.

Let $F_{\alpha:df_1,df_2}$ be the upper $\alpha$ point of the $F$ distribution with $df_1$ and $df_2$ degrees of freedom. (It is then the case that $F_{\alpha:df_1,df_2} = (F_{1-\alpha:df_2,df_1})^{-1}$.) Also, let $\chi^2_{\alpha:df}$ be the upper $\alpha$ point of the $\chi^2_{df}$ distribution. With this notation

$$V_L = \sum_{i=1}^{p} c_i^2 MS_i^2 G_i^2 + \sum_{i=p+1}^{k} c_i^2 MS_i^2 H_i^2 + \sum_{i=1}^{p}\sum_{j=p+1}^{k} c_i c_j MS_i MS_j G_{ij} + \sum_{i=1}^{p-1}\sum_{j>i}^{p} c_i c_j MS_i MS_j G^*_{ij},$$

for

$$G_i = 1 - \frac{\nu_i}{\chi^2_{\alpha:\nu_i}},$$

$$H_i = \frac{\nu_i}{\chi^2_{1-\alpha:\nu_i}} - 1,$$

$$G_{ij} = \frac{\left(F_{\alpha:\nu_i,\nu_j} - 1\right)^2 - G_i^2 F^2_{\alpha:\nu_i,\nu_j} - H_j^2}{F_{\alpha:\nu_i,\nu_j}},$$

and

$$G^*_{ij} = \begin{cases} 0 & \text{if } p = 1 \\ \dfrac{1}{p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2 \dfrac{(\nu_i + \nu_j)^2}{\nu_i \nu_j} - \dfrac{G_i^2\,\nu_i}{\nu_j} - \dfrac{G_j^2\,\nu_j}{\nu_i}\right) & \text{otherwise.} \end{cases}$$

On the other hand,

$$V_U = \sum_{i=1}^{p} c_i^2 MS_i^2 H_i^2 + \sum_{i=p+1}^{k} c_i^2 MS_i^2 G_i^2 + \sum_{i=1}^{p}\sum_{j=p+1}^{k} c_i c_j MS_i MS_j H_{ij} + \sum_{i=p+1}^{k-1}\sum_{j>i}^{k} c_i c_j MS_i MS_j H^*_{ij},$$

for $G_i$ and $H_i$ as defined above, and

$$H_{ij} = \frac{\left(1 - F_{1-\alpha:\nu_i,\nu_j}\right)^2 - H_i^2 F^2_{1-\alpha:\nu_i,\nu_j} - G_j^2}{F_{1-\alpha:\nu_i,\nu_j}},$$

and

$$H^*_{ij} = \begin{cases} 0 & \text{if } k = p + 1 \\ \dfrac{1}{k-p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2 \dfrac{(\nu_i + \nu_j)^2}{\nu_i \nu_j} - \dfrac{G_i^2\,\nu_i}{\nu_j} - \dfrac{G_j^2\,\nu_j}{\nu_i}\right) & \text{otherwise.} \end{cases}$$

One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1 - \alpha)$ and the interval $(L, U)$ for confidence level $(1 - 2\alpha)$. (Using these formulas for hand calculation is (obviously) no picnic. The C program written by Brandon Paris (available off the Stat 531 Web page) makes these calculations painless.)

A problem similar to the estimation of the quantity (1.4) is that of estimating

$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p \tag{1.5}$$

for $p \ge 1$ and positive constants $c_1, c_2, \ldots, c_p$. In this case let

$$\hat{\theta} = c_1 MS_1 + \cdots + c_p MS_p,$$

and continue the $G_i$ and $H_i$ notation from above. Then approximate confidence limits on $\theta$ given in display (1.5) are of the form

$$L = \hat{\theta} - \sqrt{\sum_{i=1}^{p} c_i^2 MS_i^2 G_i^2} \quad\text{and/or}\quad U = \hat{\theta} + \sqrt{\sum_{i=1}^{p} c_i^2 MS_i^2 H_i^2}.$$

One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1 - \alpha)$ and the interval $(L, U)$ for confidence level $(1 - 2\alpha)$.
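For the simpler case (1.5) the limits are easy to code directly. A minimal sketch (Python/scipy, added for illustration; note that in the notation above $\chi^2_{\alpha:\nu}$ is the upper $\alpha$ point, i.e. chi2.ppf(1 - alpha, nu)):

    import numpy as np
    from scipy.stats import chi2

    def mls_limits(ms, df, c, alpha=0.05):
        """One-sided limits for theta = sum_i c_i * EMS_i (all c_i > 0).
        ms, df, c are equal-length arrays of mean squares, degrees of
        freedom, and positive constants."""
        ms, df, c = map(np.asarray, (ms, df, c))
        theta_hat = np.sum(c * ms)
        G = 1.0 - df / chi2.ppf(1.0 - alpha, df)   # uses upper alpha points
        H = df / chi2.ppf(alpha, df) - 1.0         # uses lower alpha points
        L = theta_hat - np.sqrt(np.sum(c ** 2 * ms ** 2 * G ** 2))
        U = theta_hat + np.sqrt(np.sum(c ** 2 * ms ** 2 * H ** 2))
        return L, U   # one limit at level 1-alpha, or both at level 1-2*alpha

    # hypothetical mean squares with 4 and 30 degrees of freedom
    print(mls_limits(ms=[1.8, 0.6], df=[4, 30], c=[0.5, 0.5]))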

The Fortran program written by Andy Chiang (available off the Stat 531 Web page) applies the Burdick and Graybill-like material and the standard errors (1.3) to the estimation of many parametric functions of relevance in gage R&R studies.

Chiang's 2000 Ph.D. dissertation work (to appear in Technometrics in August 2001) has provided an entirely different method of interval estimation of functions of variance components that is a uniform improvement over the modified large sample methods presented by Burdick and Graybill. His approach is related to improper Bayes methods with so-called "Jeffreys priors." Andy has provided software for implementing his methods that, as time permits, will be posted on the Stat 531 Web page. He can be contacted (for preprints of his work) at [email protected] at the National University of Singapore.


1.6 Calibration and Regression Analysis

The estimation of standard deviations and variance components is a contribution of the subject of statistics to the quantification of measurement system precision. The subject also has contributions to make in the matter of improving measurement accuracy. Calibration is the business of bringing a "local" measurement system in line with a standard measurement system. One takes measurements $y$ with a gage or system of interest on test items with "known" values $x$ (available because they were previously measured using a "gold standard" measurement device). The data collected are then used to create a conversion scheme for translating local measurements to approximate gold standard measurements, thereby hopefully improving local accuracy. In this short section we note that usual regression methodology has implications in this kind of enterprise.

The usual polynomial regression model says that $n$ observed random values $y_i$ are related to fixed values $x_i$ via

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i \tag{1.6}$$

for iid Normal$(0, \sigma^2)$ random variables $\varepsilon_i$. The parameters $\beta$ and $\sigma$ are the usual objects of inference in this model. In the calibration context with $x$ a gold standard value, $\sigma$ quantifies precision for the local measurement system. Often (at least over a limited range of $x$) 1) a low order polynomial does a good job of describing the observed $x$-$y$ relationship between local and gold standard measurements and 2) the usual (least squares) fitted relationship

$$\hat{y} = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$$

has an inverse $g^{-1}(y)$. When such is the case, given a measurement $y_{n+1}$ from the local measurement system, it is plausible to estimate that a corresponding measurement from the gold standard system would be $\hat{x}_{n+1} = g^{-1}(y_{n+1})$. A reasonable question is then "How good is this estimate?" That is, the matter of confidence interval estimation of $x_{n+1}$ is important.

One general method for producing such confidence sets for $x_{n+1}$ is based on the usual prediction interval methodology associated with the model (1.6). That is, for a given $x$, it is standard (see, e.g., §9-2 of V or §9.2.4 of V&J#2) to produce a prediction interval of the form

$$\hat{y} \pm t\sqrt{s^2 + \left(\text{std error}(\hat{y})\right)^2}$$

for an additional corresponding $y$. And those intervals have the property that for all choices of $x, \sigma, \beta_0, \beta_1, \beta_2, \ldots, \beta_k$,

$$P_{x,\sigma,\beta_0,\beta_1,\beta_2,\ldots,\beta_k}\left[y \text{ is in the prediction interval at } x\right] = \text{desired confidence level} = 1 - P\left[\text{a } t_{n-k-1} \text{ random variable exceeds } |t|\right].$$


But rewording only slightly, the event

"$y$ is in the prediction interval at $x$"

is the same as the event

"$x$ produces a prediction interval including $y$."

So a confidence set for $x_{n+1}$ based on the observed value $y_{n+1}$ is

$$\{x \mid \text{the prediction interval corresponding to } x \text{ includes } y_{n+1}\}. \tag{1.7}$$

Conceptually, one simply makes prediction limits around the fitted relationship $\hat{y} = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$ and then upon observing a new $y$ sees what $x$'s are consistent with that observation. This produces a confidence set with the desired confidence level.

The only real difficulties with the above general prescription are 1) the lack of simple explicit formulas and 2) the fact that when $\sigma$ is large (so that the regression $\sqrt{MSE}$ tends to be large) or the fitted relationship is very nonlinear, the method can produce (completely rational but) unpleasant-looking confidence sets. The first problem is really of limited consequence in a time when standard statistical software will automatically produce plots of prediction limits associated with low order regressions. And the second matter is really inherent in the problem.

For the (simplest) linear version of this inverse prediction problem, there is an approximate confidence method in common use that doesn't have the deficiencies of the method (1.7). It is derived from a Taylor series argument and has its own problems, but is nevertheless worth recording here for completeness' sake. That is, under the $k = 1$ version of the model (1.6), commonly used approximate confidence limits for $x_{n+1}$ are (for $\hat{x}_{n+1} = (y_{n+1} - b_0)/b_1$ and $\bar{x}$ the sample mean of the gold standard measurements from the calibration experiment)

$$\hat{x}_{n+1} \pm \frac{t\sqrt{MSE}}{|b_1|}\sqrt{1 + \frac{1}{n} + \frac{(\hat{x}_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$$
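A minimal sketch of these $k = 1$ limits (Python/numpy/scipy; the calibration data below are hypothetical, added only to illustrate the formula):

    import numpy as np
    from scipy.stats import t as t_dist

    def calibration_limits(x, y, y_new, alpha=0.05):
        """Approximate limits for the gold-standard value corresponding to a
        new local measurement y_new, under the simple linear model (k = 1)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        b1, b0 = np.polyfit(x, y, 1)                 # least squares slope, intercept
        mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
        x_hat = (y_new - b0) / b1
        tval = t_dist.ppf(1 - alpha / 2, n - 2)
        half = (tval * np.sqrt(mse) / abs(b1)) * np.sqrt(
            1 + 1 / n + (x_hat - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
        return x_hat - half, x_hat + half

    x = np.arange(1.0, 11.0)                         # hypothetical gold-standard values
    y = 0.2 + 0.98 * x + np.random.default_rng(1).normal(0, 0.05, 10)
    print(calibration_limits(x, y, y_new=5.1))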

1.7 Crude Gaging and Statistics

All real-world measurement is "to the nearest something." Often one may ignore this fact, treat measured values as if they were exact, and experience no real difficulty when using standard statistical methods (that are really based on an assumption that data are exact). However, sometimes in industrial applications gaging is crude enough that standard (e.g. normal theory) formulas give nonsensical results. This section briefly considers what can be done to appropriately model and draw inferences from crudely gaged data. The assumption throughout is that what are available are integer data, obtained by coding raw observations via

$$\text{integer observation} = \frac{\text{raw observation} - \text{some reference value}}{\text{smallest unit of measurement}}$$

(the "smallest unit of measurement" is the "nearest something" above).

1.7.1 Distributions of Sample Means and Ranges from Integer Observations

To begin with something simple, note first that in situations where only a few different coded values are ever observed, rather than trying to model observations with some continuous distribution (like a normal one) it may well make sense to simply employ a discrete pmf, say $f$, to describe any single measurement. In fact, suppose that a single (crudely gaged) observation $Y$ has a pmf $f(y)$ such that

$$f(y) = 0 \text{ unless } y = 1, 2, \ldots, M.$$

Then if $Y_1, Y_2, \ldots, Y_n$ are iid with this marginal discrete distribution, one can easily approximate the distribution of a function of these variables via simulation (using common statistical packages). And for two of the most common statistics used in QC settings (the sample mean and range) one can even work out exact probability distributions using computationally feasible and very elementary methods.

To find the probability distribution of $\bar{Y}$ in this context, one can build up the probability distributions of sums of iid $Y_i$'s recursively by "adding probabilities on diagonals" in two-way joint probability tables. For example, the $n = 2$ distribution of $\bar{Y}$ can be obtained by making out a two-way table of joint probabilities for $Y_1$ and $Y_2$ and adding on diagonals to get probabilities for $Y_1 + Y_2$. Then making a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $Y_3$, one can add on diagonals and find a joint distribution for $Y_1 + Y_2 + Y_3$. Or, noting that the distribution of $Y_3 + Y_4$ is the same as that of $Y_1 + Y_2$, it is possible to make a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $(Y_3 + Y_4)$, add on diagonals and find the distribution of $Y_1 + Y_2 + Y_3 + Y_4$. And so on. (Clearly, after finding the distribution for a sum, one simply divides possible values by $n$ to get the corresponding distribution of $\bar{Y}$.)

To find the probability distribution of $R = \max Y_i - \min Y_i$ (for $Y_i$'s as above) a feasible computational scheme is as follows. Let

$$S_{kj} = \begin{cases} \sum_{y=k}^{j} f(y) = P[k \le Y \le j] & \text{if } k \le j \\ 0 & \text{otherwise} \end{cases}$$

and compute and store these for $1 \le k, j \le M$. Then define

$$M_{kj} = P[\min Y_i = k \text{ and } \max Y_i = j].$$

Now the event $\{\min Y_i = k \text{ and } \max Y_i = j\}$ is the event $\{$all observations are between $k$ and $j$ inclusive$\}$ less the event $\{$the minimum is greater than $k$ or the maximum is less than $j\}$. Thus, it is straightforward to see that

$$M_{kj} = \left(S_{kj}\right)^n - \left(S_{k+1,j}\right)^n - \left(S_{k,j-1}\right)^n + \left(S_{k+1,j-1}\right)^n$$

and one may compute and store these values. Finally, note that

$$P[R = r] = \sum_{k=1}^{M-r} M_{k,k+r}.$$

These algorithms are good for any distribution $f$ on the integers $1, 2, \ldots, M$. Karen (Jensen) Hulting's DIST program (available off the Stat 531 Web page) automates the calculations of the distributions of $\bar{Y}$ and $R$ for certain $f$'s related to integer rounding of normal observations. (More on this rounding idea directly.)
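Both algorithms are only a few lines of code. The following is a minimal sketch (Python/numpy, not Hulting's DIST program; the pmf at the end is a hypothetical example). Note that "adding probabilities on diagonals" for sums is exactly discrete convolution:

    import numpy as np

    def dist_mean_and_range(f, n):
        """f[y-1] = P[Y = y] for y = 1..M; returns the pmf of Y1+...+Yn
        (index s corresponds to sum s + n) and a dict of P[R = r]."""
        M = len(f)
        # distribution of the sum by repeated convolution; divide the
        # support by n to get the corresponding distribution of Ybar
        pmf_sum = np.array(f)
        for _ in range(n - 1):
            pmf_sum = np.convolve(pmf_sum, f)

        # S[k, j] = P[k <= Y <= j] for k <= j, 0 otherwise (1-indexed as in text)
        F = np.concatenate([[0.0], np.cumsum(f)])
        S = np.zeros((M + 2, M + 2))
        for k in range(1, M + 1):
            for j in range(k, M + 1):
                S[k, j] = F[j] - F[k - 1]
        Mkj = lambda k, j: (S[k, j] ** n - S[k + 1, j] ** n
                            - S[k, j - 1] ** n + S[k + 1, j - 1] ** n)
        p_range = {r: sum(Mkj(k, k + r) for k in range(1, M - r + 1))
                   for r in range(0, M)}
        return pmf_sum, p_range

    f = [0.2, 0.5, 0.3]          # a hypothetical pmf on y = 1, 2, 3
    pmf_sum, p_range = dist_mean_and_range(f, n=4)
    print(p_range)               # P[R = 0], P[R = 1], P[R = 2], summing to 1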

1.7.2 Estimation Based on Integer-Rounded Normal Data

The problem of drawing inferences from crudely gaged data is one that has a history of at least 100 years (if one takes the view that crude gaging essentially rounds exact values). Sheppard in the late 1800s noted that if one rounds a continuous variable to integers, the variability in the distribution is typically increased. He thus suggested not using the sample standard deviation ($s$) of rounded values but instead employing what is known as "Sheppard's correction" to arrive at

$$\sqrt{\frac{(n-1)s^2}{n} - \frac{1}{12}} \tag{1.8}$$

as a suitable estimate of standard deviation for integer-rounded data.

The notion of interval-censoring of fundamentally continuous observations provides a natural framework for the application of modern statistical theory to the analysis of crudely gaged data. For univariate $X$ with continuous cdf $F(x|\theta)$ depending upon some (possibly vector) parameter $\theta$, consider $X^*$ derived from $X$ by rounding to the nearest integer. Then the pmf of $X^*$ is, say,

$$g(x|\theta) := \begin{cases} F(x + .5|\theta) - F(x - .5|\theta) & \text{for } x \text{ an integer} \\ 0 & \text{otherwise.} \end{cases}$$

Rather than doing inference based on the unobservable variables $X_1, X_2, \ldots, X_n$ that are iid $F(x|\theta)$, one might consider inference based on $X_1^*, X_2^*, \ldots, X_n^*$ that are iid with pmf $g(x|\theta)$.

The normal version of this scenario (the "integer-rounded normal data model") makes use of

$$g(x|\mu, \sigma) := \begin{cases} \Phi\left(\dfrac{x + .5 - \mu}{\sigma}\right) - \Phi\left(\dfrac{x - .5 - \mu}{\sigma}\right) & \text{for } x \text{ an integer} \\ 0 & \text{otherwise.} \end{cases}$$

Observed values $x_1^*, x_2^*, \ldots, x_n^*$ then provide a likelihood function that treats the (joint) probability assigned to the vector $(x_1^*, x_2^*, \ldots, x_n^*)$ as a function of the parameters,

$$L(\mu, \sigma) := \prod_i g(x_i^*|\mu, \sigma) = \prod_i \left(\Phi\left(\frac{x_i^* + .5 - \mu}{\sigma}\right) - \Phi\left(\frac{x_i^* - .5 - \mu}{\sigma}\right)\right).$$

The log of this function of $\mu$ and $\sigma$ is (naturally enough) called the loglikelihood and will be denoted as

$$\mathcal{L}(\mu, \sigma) := \ln L(\mu, \sigma).$$

A sensible estimator of the parameter vector $(\mu, \sigma)$ is the point $(\hat{\mu}, \hat{\sigma})$ maximizing the loglikelihood. This prescription for estimation is only partially complete, depending upon the nature of the sample $x_1^*, x_2^*, \ldots, x_n^*$. There are three cases to consider, namely:

1. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is at least 2, $\mathcal{L}(\mu, \sigma)$ is well-behaved (nice and mound-shaped) and numerical maximization or just looking at contour plots will quickly allow one to maximize the loglikelihood. (It is worth noting that in this circumstance, usually $\hat{\sigma}$ is close to the Sheppard corrected value in display (1.8).)

2. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is 1, strictly speaking $\mathcal{L}(\mu, \sigma)$ fails to achieve a maximum. However, with

$$m := \#[x_i^* = \min x_i^*],$$

$(\mu, \sigma)$ pairs with small $\sigma$ and

$$\Phi\left(\frac{\min x_i^* + .5 - \mu}{\sigma}\right) \approx \frac{m}{n}$$

will have

$$\mathcal{L}(\mu, \sigma) \approx \sup_{\mu,\sigma} \mathcal{L}(\mu, \sigma) = m\ln m + (n-m)\ln(n-m) - n\ln n.$$

That is, in this case one ought to estimate that $\sigma$ is small and the relationship between $\mu$ and $\sigma$ is such that a fraction $m/n$ of the underlying normal distribution is to the left of $\min x_i^* + .5$, while a fraction $1 - m/n$ is to the right.

3. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is 0, strictly speaking $\mathcal{L}(\mu, \sigma)$ fails to achieve a maximum. However,

$$\sup_{\mu,\sigma} \mathcal{L}(\mu, \sigma) = 0$$

and for any $\mu \in (x_1^* - .5, x_1^* + .5)$, $\mathcal{L}(\mu, \sigma) \to 0$ as $\sigma \to 0$. That is, in this case one ought to estimate that $\sigma$ is small and $\mu \in (x_1^* - .5, x_1^* + .5)$.
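In the well-behaved case 1, the maximization is routine with general purpose optimizers. A minimal sketch (Python/scipy, not Hulting's CONEST program; the data vector is a hypothetical example with sample range 2):

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize

    def loglik(params, xstar):
        """Loglikelihood of integer-rounded normal data xstar at (mu, sigma)."""
        mu, sigma = params
        if sigma <= 0:
            return -np.inf
        p = (norm.cdf((xstar + 0.5 - mu) / sigma)
             - norm.cdf((xstar - 0.5 - mu) / sigma))
        return np.sum(np.log(np.maximum(p, 1e-300)))

    xstar = np.array([4, 5, 5, 6, 5, 4, 6, 5])     # hypothetical rounded data
    res = minimize(lambda th: -loglik(th, xstar),
                   x0=[xstar.mean(), xstar.std()], method="Nelder-Mead")
    mu_hat, sigma_hat = res.x
    # compare sigma_hat with the Sheppard-corrected value in display (1.8)
    n, s2 = len(xstar), xstar.var(ddof=1)
    print(mu_hat, sigma_hat, np.sqrt((n - 1) * s2 / n - 1 / 12))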


Beyond the making of point estimates, the loglikelihood function can provide approximate confidence sets for the parameters $\mu$ and/or $\sigma$. Standard large sample statistical theory says that (for large $n$ and $\chi^2_{\alpha:\nu}$ the upper $\alpha$ point of the $\chi^2_\nu$ distribution):

1. An approximate $(1-\alpha)$ level confidence set for the parameter vector $(\mu, \sigma)$ is

$$\left\{(\mu, \sigma) \,\middle|\, \mathcal{L}(\mu, \sigma) > \sup_{\mu,\sigma} \mathcal{L}(\mu, \sigma) - \tfrac{1}{2}\chi^2_{\alpha:2}\right\}. \tag{1.9}$$

2. An approximate $(1-\alpha)$ level confidence set for the parameter $\mu$ is

$$\left\{\mu \,\middle|\, \sup_{\sigma} \mathcal{L}(\mu, \sigma) > \sup_{\mu,\sigma} \mathcal{L}(\mu, \sigma) - \tfrac{1}{2}\chi^2_{\alpha:1}\right\}. \tag{1.10}$$

3. An approximate $(1-\alpha)$ level confidence set for the parameter $\sigma$ is

$$\left\{\sigma \,\middle|\, \sup_{\mu} \mathcal{L}(\mu, \sigma) > \sup_{\mu,\sigma} \mathcal{L}(\mu, \sigma) - \tfrac{1}{2}\chi^2_{\alpha:1}\right\}. \tag{1.11}$$

Several comments and a fuller discussion are in order regarding these confidence sets. In the first place, Karen (Jensen) Hulting's CONEST program (available off the Stat 531 Web page) is useful in finding $\sup_{\mu,\sigma} \mathcal{L}(\mu, \sigma)$ and producing rough contour plots of the (joint) sets for $(\mu, \sigma)$ in display (1.9). Second, it is common to call the function of $\mu$ defined by

$$\mathcal{L}_1(\mu) = \sup_{\sigma} \mathcal{L}(\mu, \sigma)$$

the "profile loglikelihood function" for $\mu$, and the function of $\sigma$

$$\mathcal{L}_2(\sigma) = \sup_{\mu} \mathcal{L}(\mu, \sigma)$$

the "profile loglikelihood function" for $\sigma$. Note that display (1.10) then says that the confidence set should consist of those $\mu$'s for which the profile loglikelihood is not too much smaller than the maximum achievable. And something entirely analogous holds for the sets in (1.11). Johnson Lee (in 2001 Ph.D. dissertation work) has carefully studied these confidence interval estimation problems and determined that some modification of methods (1.10) and (1.11) is necessary in order to provide guaranteed coverage probabilities for small sample sizes. (It is also very important to realize that, contrary to naive expectations, not even a large sample size will make the usual $t$-intervals for $\mu$ and $\chi^2$-intervals for $\sigma$ hold their nominal confidence levels in the event that $\sigma$ is small, i.e. that the rounding or crudeness of the gaging is important. Ignoring the rounding when it is important can produce actual confidence levels near 0 for methods with large nominal confidence levels.)
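A brute-force version of the set (1.11) is also easy to sketch (Python, continuing the loglik function and xstar data of the previous sketch; the grids and cutoff are illustrative, and per Lee's work discussed above and detailed below, the cutoff $\chi^2_{\alpha:1}$ needs to be enlarged for small $n$):

    import numpy as np
    from scipy.stats import chi2

    def profile_sigma_interval(xstar, cutoff, sig_grid, mu_grid):
        # display (1.11): keep those sigmas whose profile loglikelihood is
        # within cutoff/2 of the overall maximum
        ll = np.array([[loglik((mu, sig), xstar) for mu in mu_grid]
                       for sig in sig_grid])
        profile = ll.max(axis=1)                # sup over mu, for each sigma
        keep = profile > ll.max() - cutoff / 2.0
        return sig_grid[keep].min(), sig_grid[keep].max()

    sig_grid = np.linspace(0.05, 3.0, 300)
    mu_grid = np.linspace(3.0, 7.0, 400)
    print(profile_sigma_interval(xstar, chi2.ppf(0.95, 1), sig_grid, mu_grid))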


Intervals for a Normal Mean Based on Integer-Rounded Data

Specifically regarding the sets for $\mu$ in display (1.10), Lee (in work to appear in the Journal of Quality Technology) has shown that one must replace the value $\chi^2_{\alpha:1}$ with something larger in order to get small $n$ actual confidence levels not too far from nominal for most $(\mu, \sigma)$. In fact, the choice

$$c(n, \alpha) = n\ln\left(\frac{t^2_{\frac{\alpha}{2}:(n-1)}}{n-1} + 1\right)$$

(for $t_{\frac{\alpha}{2}:(n-1)}$ the upper $\frac{\alpha}{2}$ point of the $t$ distribution with $\nu = n-1$ degrees of freedom) is appropriate.

After replacing $\chi^2_{\alpha:1}$ with $c(n, \alpha)$ in display (1.10) there remains the numerical analysis problem of actually finding the interval prescribed by the display. The nature of the numerical analysis required depends upon the sample range encountered in the crudely gaged data. Provided the range is at least 2, $\mathcal{L}_1(\mu)$ is well-behaved (continuous and mound-shaped) and even simple trial and error with Karen (Jensen) Hulting's CONEST program will quickly produce the necessary interval. When the range is 0 or 1, $\mathcal{L}_1(\mu)$ has respectively 2 or 1 discontinuities and the numerical analysis is a bit trickier. Lee has recorded the results of the numerical analysis for small sample sizes and $\alpha = .05, .10$ and $.20$ (confidence levels respectively 95%, 90% and 80%).

When a sample of size $n$ produces range 0 with, say, all observations equal to $x$, the intuition that one ought to estimate $\mu \in (x - .5, x + .5)$ is sound unless $n$ is very small. If $n$ and $\alpha$ are as recorded in Table 1.2 then display (1.10) (modified by the use of $c(n, \alpha)$ in place of $\chi^2_{\alpha:1}$) leads to the interval $(x - \Delta, x + \Delta)$. (Otherwise it leads to $(x - .5, x + .5)$ for these $\alpha$.)

Table 1.2: $\Delta$ for 0-Range Samples Based on Very Small $n$

n   α=.05   α=.10   α=.20
2   3.084   1.547   .785
3           .776    .562
4                   .517

In the case that a sample of size $n$ produces range 1 with, say, all observations $x$ or $x + 1$, the interval prescribed by display (1.10) (with $c(n, \alpha)$ used in place of $\chi^2_{\alpha:1}$) can be thought of as having the form $(x + .5 - \Delta_L, x + .5 + \Delta_U)$, where $\Delta_L$ and $\Delta_U$ depend upon

$$n_x = \#[\text{observations} = x] \quad\text{and}\quad n_{x+1} = \#[\text{observations} = x + 1]. \tag{1.12}$$

When $n_x \ge n_{x+1}$, it is the case that $\Delta_L \ge \Delta_U$. And when $n_x \le n_{x+1}$, correspondingly $\Delta_L \le \Delta_U$. Let

$$m = \max\{n_x, n_{x+1}\} \tag{1.13}$$


and correspondingly take

$$\Delta_1 = \max\{\Delta_L, \Delta_U\} \quad\text{and}\quad \Delta_2 = \min\{\Delta_L, \Delta_U\}.$$

Table 1.3 then gives values for $\Delta_1$ and $\Delta_2$ for $n \le 10$ and $\alpha = .05, .10$ and $.2$.

Table 1.3: $(\Delta_1, \Delta_2)$ for Range 1 Samples Based on Small $n$

n   m   α=.05            α=.10            α=.20
2   1   (6.147, 6.147)   (3.053, 3.053)   (1.485, 1.485)
3   2   (1.552, 1.219)   (1.104, 0.771)   (0.765, 0.433)
4   3   (1.025, 0.526)   (0.082, 0.323)   (0.639, 0.149)
    2   (0.880, 0.880)   (0.646, 0.646)   (0.441, 0.441)
5   4   (0.853, 0.257)   (0.721, 0.132)   (0.592, 0.024)
    3   (0.748, 0.548)   (0.592, 0.339)   (0.443, 0.248)
6   5   (0.772, 0.116)   (0.673, 0.032)   (0.569, 0.000)
    4   (0.680, 0.349)   (0.562, 0.235)   (0.444, 0.126)
    3   (0.543, 0.543)   (0.420, 0.420)   (0.299, 0.299)
7   6   (0.726, 0.035)   (0.645, 0.000)   (0.556, 0.000)
    5   (0.640, 0.218)   (0.545, 0.130)   (0.446, 0.046)
    4   (0.534, 0.393)   (0.432, 0.293)   (0.329, 0.193)
8   7   (0.698, 0.000)   (0.626, 0.000)   (0.547, 0.000)
    6   (0.616, 0.129)   (0.534, 0.058)   (0.446, 0.000)
    5   (0.527, 0.281)   (0.439, 0.197)   (0.347, 0.113)
    4   (0.416, 0.416)   (0.327, 0.327)   (0.236, 0.236)
9   8   (0.677, 0.000)   (0.613, 0.000)   (0.541, 0.000)
    7   (0.599, 0.065)   (0.526, 0.010)   (0.448, 0.000)
    6   (0.521, 0.196)   (0.443, 0.124)   (0.361, 0.054)
    5   (0.429, 0.321)   (0.350, 0.242)   (0.267, 0.163)
10  9   (0.662, 0.000)   (0.604, 0.000)   (0.537, 0.000)
    8   (0.587, 0.020)   (0.521, 0.000)   (0.450, 0.000)
    7   (0.515, 0.129)   (0.446, 0.069)   (0.371, 0.012)
    6   (0.437, 0.242)   (0.365, 0.174)   (0.289, 0.105)
    5   (0.346, 0.346)   (0.275, 0.275)   (0.200, 0.200)

Intervals for a Normal Standard Deviation Based on Integer-Rounded Data

Specifically regarding the sets for $\sigma$ in display (1.11), Lee found that in order to get small $n$ actual confidence levels not too far from nominal, one must not only replace the value $\chi^2_{\alpha:1}$ with something larger, but must make an additional adjustment for samples with ranges 0 and 1.

Consider first replacing $\chi^2_{\alpha:1}$ in display (1.11) with a (larger) value $d(n, \alpha)$ given in Table 1.4. Lee found that for those $(\mu, \sigma)$ with moderate to large $\sigma$, making this "$d(n, \alpha)$ for $\chi^2_{\alpha:1}$" substitution is enough to produce an actual confidence level approximating the nominal one. However, even this modification is not adequate to produce an acceptable coverage probability for $(\mu, \sigma)$ with small $\sigma$.

Table 1.4: $d(n, \alpha)$ for Use in Estimating $\sigma$

n    α=.05   α=.10
2    10.47   7.71
3    7.26    5.23
4    6.15    4.39
5    5.58    3.97
6    5.24    3.71
7    5.01    3.54
8    4.84    3.42
9    4.72    3.33
10   4.62    3.26
15   4.34    3.06
20   4.21    2.97
30   4.08    2.88
∞    3.84    2.71

For samples with range 0 or 1, formula (1.11) prescribes intervals of the form $(0, U)$. And reasoning that when $\sigma$ is small, samples will typically have range 0 or 1, Lee was able to find (larger) replacements for the limit $U$ prescribed by (1.11) so that the resulting estimation method has actual confidence level not much below the nominal level for any $(\mu, \sigma)$ (with $\sigma$ large or small).

That is, if a 0-range sample is observed, estimate $\sigma$ by

$$(0, \Delta_0)$$

where $\Delta_0$ is taken from Table 1.5. If a range 1 sample is observed consisting, say, of values $x$ and $x + 1$, and $n_x$, $n_{x+1}$ and $m$ are as in displays (1.12) and (1.13), estimate $\sigma$ using

$$(0, \Delta_{1,m})$$

where $\Delta_{1,m}$ is taken from Table 1.6.

The use of these values $\Delta_0$ for range 0 samples, and $\Delta_{1,m}$ for range 1 samples, and the values $d(n, \alpha)$ in place of $\chi^2_{\alpha:1}$ in display (1.11) finally produces a reliable method of confidence interval estimation for $\sigma$ when normal data are integer-rounded.


Table 1.5: $\Delta_0$ for Use in Estimating $\sigma$

n    α=.05   α=.10
2    5.635   2.807
3    1.325   0.916
4    0.822   0.653
5    0.666   0.558
6    0.586   0.502
7    0.533   0.464
8    0.495   0.435
9    0.466   0.413
10   0.443   0.396
11   0.425   0.381
12   0.409   0.369
13   0.396   0.358
14   0.384   0.349
15   0.374   0.341


Table 1.6: $\Delta_{1,m}$ for Use in Estimating $\sigma$ ($m$ in Parentheses)

n    α=.05                                                               α=.10
2    16.914(1)                                                           8.439(1)
3    3.535(2)                                                            2.462(2)
4    1.699(3) 2.034(2)                                                   1.303(3) 1.571(2)
5    1.143(4) 1.516(3)                                                   0.921(4) 1.231(3)
6    0.897(5) 1.153(4) 1.285(3)                                          0.752(5) 0.960(4) 1.054(3)
7    0.768(6) 0.944(5) 1.106(4)                                          0.660(6) 0.800(5) 0.949(4)
8    0.687(7) 0.819(6) 0.952(5) 1.009(4)                                 0.599(7) 0.707(6) 0.825(5) 0.880(4)
9    0.629(8) 0.736(7) 0.837(6) 0.941(5)                                 0.555(8) 0.644(7) 0.726(6) 0.831(5)
10   0.585(9) 0.677(8) 0.747(7) 0.851(6) 0.890(5)                        0.520(9) 0.597(8) 0.654(7) 0.753(6) 0.793(5)
11   0.550(10) 0.630(9) 0.690(8) 0.775(7) 0.851(6)                       0.493(10) 0.560(9) 0.609(8) 0.685(7) 0.763(6)
12   0.522(11) 0.593(10) 0.646(9) 0.708(8) 0.789(7) 0.818(6)             0.470(11) 0.531(10) 0.573(9) 0.626(8) 0.707(7) 0.738(6)
13   0.499(12) 0.563(11) 0.610(10) 0.658(9) 0.733(8) 0.791(7)            0.452(12) 0.506(11) 0.544(10) 0.587(9) 0.655(8) 0.716(7)
14   0.479(13) 0.537(12) 0.580(11) 0.622(10) 0.681(9) 0.745(8) 0.768(7)  0.436(13) 0.485(12) 0.520(11) 0.558(10) 0.607(9) 0.674(8) 0.698(7)
15   0.463(14) 0.515(13) 0.555(12) 0.593(11) 0.639(10) 0.701(9) 0.748(8) 0.422(14) 0.468(13) 0.499(12) 0.534(11) 0.574(10) 0.632(9) 0.682(8)

Chapter 2

Process Monitoring

Chapters 3 and 4 of V&J discuss methods for process monitoring. The key concept there regarding the probabilistic description of monitoring schemes is the run length idea introduced on page 91 and specifically in display (3.44). Theory for describing run lengths is given in V&J only for the very simplest case of geometrically distributed $T$. This chapter presents some more general tools for the analysis/comparison of run length distributions of monitoring schemes, namely discrete time finite state Markov chains and recursions expressed in terms of integral (and difference) equations.

2.1 Some Theory for Stationary Discrete Time Finite State Markov Chains With a Single Absorbing State

These are probability models for random systems that at times $t = 1, 2, 3, \ldots$ can be in one of a finite number of states

$$S_1, S_2, \ldots, S_m, S_{m+1}.$$

The Markov assumption is that the conditional distribution of where the system is at time $t + 1$, given the entire history of where it has been up through time $t$, depends only upon where it is at time $t$. (In colloquial terms: "The conditional distribution of where I'll be tomorrow, given where I am and how I got there, depends only on where I am, not on how I got here.") So-called stationary Markov Chain (MC) models employ the assumption that movement between states from any time $t$ to time $t + 1$ is governed by a (single) matrix of (one-step) transition probabilities (that is independent of $t$)

$$\underset{(m+1)\times(m+1)}{P} = (p_{ij})$$

where

$$p_{ij} = P[\text{system is in } S_j \text{ at time } t+1 \mid \text{system is in } S_i \text{ at time } t].$$



As a simple example of this, consider the transition matrix

$$\underset{3\times 3}{P} := \begin{pmatrix} .8 & .1 & .1 \\ .9 & .05 & .05 \\ 0 & 0 & 1 \end{pmatrix}. \tag{2.1}$$

Figure 2.1 is a useful schematic representation of this model.

[Figure 2.1: Schematic for a MC with Transition Matrix (2.1), showing the three states and the one-step transition probabilities of display (2.1) as arrows between them.]

The Markov Chain represented by Figure 2.1 has an interesting property. That is, while it is possible to move back and forth between states 1 and 2, once the system enters state 3, it is stuck there. The standard jargon for this property is to say that $S_3$ is an absorbing state. (In general, if $p_{ii} = 1$, $S_i$ is called an absorbing state.)

Of particular interest in applications of MCs to the description of process monitoring schemes are chains with a single absorbing state, say $S_{m+1}$, where it is possible to move (at least eventually) from any other state to the absorbing state. One thing that makes these chains so useful is that it is very easy to write down a matrix formula for a vector giving the mean number of transitions required to reach $S_{m+1}$ from any of the other states. That is, with

$$L_i = \text{the mean number of transitions required to move from } S_i \text{ to } S_{m+1},$$

$$\underset{m\times 1}{L} = \begin{pmatrix} L_1 \\ L_2 \\ \vdots \\ L_m \end{pmatrix}, \quad \underset{(m+1)\times(m+1)}{P} = \begin{pmatrix} \underset{m\times m}{R} & \underset{m\times 1}{r} \\ \underset{1\times m}{0'} & \underset{1\times 1}{1} \end{pmatrix}, \quad\text{and}\quad \underset{m\times 1}{\mathbf{1}} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix},$$

it is the case that

$$L = (I - R)^{-1}\mathbf{1}. \tag{2.2}$$


To argue that display (2.2) is correct, note that the following system of $m$ equations clearly holds:

$$\begin{aligned}
L_1 &= (1 + L_1)p_{11} + (1 + L_2)p_{12} + \cdots + (1 + L_m)p_{1m} + 1\cdot p_{1,m+1} \\
L_2 &= (1 + L_1)p_{21} + (1 + L_2)p_{22} + \cdots + (1 + L_m)p_{2m} + 1\cdot p_{2,m+1} \\
&\ \ \vdots \\
L_m &= (1 + L_1)p_{m1} + (1 + L_2)p_{m2} + \cdots + (1 + L_m)p_{mm} + 1\cdot p_{m,m+1}.
\end{aligned}$$

But this set is equivalent to the set

$$\begin{aligned}
L_1 &= 1 + p_{11}L_1 + p_{12}L_2 + \cdots + p_{1m}L_m \\
L_2 &= 1 + p_{21}L_1 + p_{22}L_2 + \cdots + p_{2m}L_m \\
&\ \ \vdots \\
L_m &= 1 + p_{m1}L_1 + p_{m2}L_2 + \cdots + p_{mm}L_m
\end{aligned}$$

and in matrix notation, this second set of equations is

$$L = \mathbf{1} + RL. \tag{2.3}$$

So

$$L - RL = \mathbf{1},$$

i.e.

$$(I - R)L = \mathbf{1}.$$

Under the conditions of the present discussion it is the case that $(I - R)$ is guaranteed to be nonsingular, so that multiplying both sides of this matrix equation by the inverse of $(I - R)$ one finally has equation (2.2).

For the simple 3-state example with transition matrix (2.1) it is easy enough to verify that with

$$R = \begin{pmatrix} .8 & .1 \\ .9 & .05 \end{pmatrix}$$

one has

$$(I - R)^{-1}\mathbf{1} = \begin{pmatrix} 10.5 \\ 11 \end{pmatrix}.$$

That is, the mean number of transitions required for absorption (into $S_3$) from $S_1$ is 10.5, while the mean number required from $S_2$ is 11.0.
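Display (2.2) is one line with numerical linear algebra software. A minimal check of this example (Python/numpy, added for illustration):

    import numpy as np

    R = np.array([[0.8, 0.1],
                  [0.9, 0.05]])       # transient part of the matrix (2.1)
    L = np.linalg.solve(np.eye(2) - R, np.ones(2))
    print(L)                          # [10.5, 11.0]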

When one is working with numerical values in $P$ and thus wants numerical values in $L$, the matrix formula (2.2) is most convenient for use with numerical analysis software. When, on the other hand, one has some algebraic expressions for the $p_{ij}$ and wants algebraic expressions for the $L_i$, it is usually most effective to write out the system of equations represented by display (2.3) and to try and see some slick way of solving for an $L_i$ of interest.

It is also worth noting that while the discussion in this section has centered on the computation of mean times to absorption, other properties of time to absorption variables can be derived and expressed in matrix notation. For example, Problem 2.22 shows that it is fairly easy to find the variance (or standard deviation) of time to absorption variables.


2.2 Some Applications of Markov Chains to the Analysis of Process Monitoring Schemes

When the current "condition" of a process monitoring scheme can be thought of as a discrete random variable (with a finite number of possible values), because

1. the variables $Q_1, Q_2, \ldots$ fed into it are intrinsically discrete (for example representing counts) and are therefore naturally modeled using a discrete probability distribution (and the calculations prescribed by the scheme produce only a fixed number of possible outcomes),

2. discretization of the $Q$'s has taken place as a part of the development of the monitoring scheme (as, for example, in the "zone test" schemes outlined in Tables 3.5 through 3.7 of V&J), or

3. one approximates continuous distributions for $Q$'s and/or states of the scheme with a finely-discretized version in order to approximate exact ("continuous") run length properties,

one can often apply the material of the previous section to the prediction of scheme behavior. (This is possible when the evolution of the monitoring scheme can be thought of in terms of movement between states where the conditional distribution of the next state depends only on a distribution for the next $Q$, which itself depends only on the current state of the scheme.) This section contains four examples of what can be done in this direction.

As an initial simple example, consider the simple monitoring scheme (suggested in the book Sampling Inspection and Quality Control by Wetherill) that signals an alarm the first time

1. a single point $Q$ plots outside 3 sigma limits, or

2. two consecutive $Q$'s plot between 2 and 3 sigma limits.

(This is a simple competitor to the sets of alarm rules specified in Tables 3.5 through 3.7 of V&J.) Suppose that one assumes that $Q_1, Q_2, \ldots$ are iid and

$$q_1 = P[Q_1 \text{ plots outside 3 sigma limits}]$$

and

$$q_2 = P[Q_1 \text{ plots between 2 and 3 sigma limits}].$$

Then one might think of describing the evolution of the monitoring scheme with a 3-state MC with states

$S_1$ = "all is OK,"
$S_2$ = "no alarm yet and the current $Q$ is between 2 and 3 sigma limits," and
$S_3$ = "alarm."


For this representation, an appropriate transition matrix is

$$P = \begin{pmatrix} 1 - q_1 - q_2 & q_2 & q_1 \\ 1 - q_1 - q_2 & 0 & q_1 + q_2 \\ 0 & 0 & 1 \end{pmatrix} \tag{2.4}$$

and the ARL of the scheme (under the iid model for the $Q$ sequence) is $L_1$, the mean time to absorption into the alarm state from the "all-OK" state. Figure 2.2 is a schematic representation of this scenario.

[Figure 2.2: Schematic for a MC with Transition Matrix (2.4), showing states $S_1$, $S_2$ and $S_3$ and the transition probabilities of display (2.4) as arrows between them.]

It is worth noting that a system of equations for $L_1$ and $L_2$ is

$$\begin{aligned}
L_1 &= 1\cdot q_1 + (1 + L_2)q_2 + (1 + L_1)(1 - q_1 - q_2) \\
L_2 &= 1\cdot(q_1 + q_2) + (1 + L_1)(1 - q_1 - q_2),
\end{aligned}$$

which is equivalent to

$$\begin{aligned}
L_1 &= 1 + L_1(1 - q_1 - q_2) + L_2 q_2 \\
L_2 &= 1 + L_1(1 - q_1 - q_2),
\end{aligned}$$

which is the non-matrix version of the system (2.3) for this example. It is easy enough to verify that this system of two linear equations in the unknowns $L_1$ and $L_2$ has a (simultaneous) solution with

$$L_1 = \frac{1 + q_2}{1 - (1 - q_1 - q_2) - q_2(1 - q_1 - q_2)}.$$
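For a concrete number, suppose (as an assumed illustration, not a case treated in the notes) that $Q$ is standard normal and plotted against correctly calibrated sigma limits, so that $q_1$ and $q_2$ are normal tail areas. A minimal sketch (Python/scipy) evaluating the closed form and checking it against the matrix formula (2.2):

    import numpy as np
    from scipy.stats import norm

    q1 = 2 * norm.sf(3.0)                   # outside 3 sigma limits
    q2 = 2 * (norm.sf(2.0) - norm.sf(3.0))  # between 2 and 3 sigma limits

    a = 1 - q1 - q2
    arl = (1 + q2) / (1 - a - q2 * a)       # closed form above

    R = np.array([[a, q2],
                  [a, 0.0]])                # transient part of matrix (2.4)
    L = np.linalg.solve(np.eye(2) - R, np.ones(2))
    print(arl, L[0])                        # both approximately 224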

As a second application of MC technology to the analysis of a process monitoring scheme, we will consider a so-called Run-Sum scheme. To define such a scheme, one begins with zones for the variable $Q$ as indicated in Figure 3.9 of V&J. Then scores are defined for the various possible values of $Q$. For $j = 0, 1, 2$ a score of $+j$ is assigned to the eventuality that $Q$ is in the positive $j$-sigma to $(j+1)$-sigma zone, while a score of $-j$ is assigned to the eventuality that $Q$ is in the negative $j$-sigma to $(j+1)$-sigma zone. A score of $+3$ is assigned to any $Q$ above the upper 3-sigma limit, while a score of $-3$ is assigned to any $Q$ below the lower 3-sigma limit. Then, for the variables $Q_1, Q_2, \ldots$ one defines corresponding scores $Q_1^*, Q_2^*, \ldots$ and run sums $R_1, R_2, \ldots$ where

$R_i$ = the sum of scores $Q^*$ through time $i$, under the provision that a new sum is begun whenever a score is observed with a sign different from the existing Run-Sum.

(Note, for example, that a new score of $Q^* = +0$ will reset a current Run-Sum of $R = -2$ to $+0$.) The Run-Sum scheme then signals at the first $i$ for which $|Q_i^*| = 3$ or $|R_i| \ge 4$.

Then define states for a Run-Sum process monitoring scheme

$S_1$ = no alarm yet and $R = -0$,
$S_2$ = no alarm yet and $R = -1$,
$S_3$ = no alarm yet and $R = -2$,
$S_4$ = no alarm yet and $R = -3$,
$S_5$ = no alarm yet and $R = +0$,
$S_6$ = no alarm yet and $R = +1$,
$S_7$ = no alarm yet and $R = +2$,
$S_8$ = no alarm yet and $R = +3$, and
$S_9$ = alarm.

If one assumes that the observations $Q_1, Q_2, \ldots$ are iid and for $j = -3, -2, -1, -0, +0, +1, +2, +3$ lets

$$q_j = P[Q_1^* = j],$$

an appropriate transition matrix for describing the evolution of the scheme is

$$P = \begin{pmatrix}
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{+3} \\
0 & q_{-0} & q_{-1} & q_{-2} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{+3} \\
0 & 0 & q_{-0} & q_{-1} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{-2} + q_{+3} \\
0 & 0 & 0 & q_{-0} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{-2} + q_{-1} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & q_{+0} & q_{+1} & q_{+2} & q_{-3} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & q_{+0} & q_{+1} & q_{-3} + q_{+2} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & 0 & q_{+0} & q_{-3} + q_{+1} + q_{+2} + q_{+3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

and the ARL for the scheme is $L_1 = L_5$. (The fact that the 1st and 5th rows of $P$ are identical makes it clear that the mean times to absorption from $S_1$ and $S_5$


must be the same.) It turns out that clever manipulation with the non-matrix version of display (2.3) in this example even produces a fairly simple expression for the scheme's ARL. (See Problem 2.24 and Reynolds (1971 JQT) and the references therein in this final regard.)

To turn to a different type of application of the MC technology, consider the analysis of a high side decision interval CUSUM scheme as described in §4.2 of V&J. Suppose that the variables $Q_1, Q_2, \ldots$ are iid with a continuous distribution specified by the probability density $f(y)$. Then the variables $Q_1 - k_1, Q_2 - k_1, Q_3 - k_1, \ldots$ are iid with probability density $f^*(y) = f(y + k_1)$. For a positive integer $m$, we will think of replacing the variables $Q_i - k_1$ with versions of them rounded to the nearest multiple of $h/m$ before CUSUMing. Then the CUSUM scheme can be thought of in terms of a MC with states

$$S_i = \text{no alarm yet and the current CUSUM is } (i-1)\frac{h}{m}$$

for $i = 1, 2, \ldots, m$ and

$$S_{m+1} = \text{alarm.}$$

Then let

$$q_{-m} = \int_{-\infty}^{-h + \frac{1}{2}\left(\frac{h}{m}\right)} f^*(y)\, dy = P\left[Q_1 - k_1 \le -h + \frac{1}{2}\left(\frac{h}{m}\right)\right],$$

$$q_m = \int_{h - \frac{1}{2}\left(\frac{h}{m}\right)}^{\infty} f^*(y)\, dy = P\left[h - \frac{1}{2}\left(\frac{h}{m}\right) < Q_1 - k_1\right],$$

and for $-m < j < m$ take

$$q_j = \int_{j\left(\frac{h}{m}\right) - \frac{1}{2}\left(\frac{h}{m}\right)}^{j\left(\frac{h}{m}\right) + \frac{1}{2}\left(\frac{h}{m}\right)} f^*(y)\, dy. \tag{2.5}$$

These notational conventions for the probabilities $q_{-m}, \ldots, q_m$ are illustrated in Figure 2.3.

[Figure 2.3: Notational Conventions for Probabilities from Rounding $Q - k_1$ Values; the density $f^*(y)$ over $(-h, h)$ is cut into intervals of length $h/m$, with areas $q_{-m}, \ldots, q_{-1}, q_0, q_1, \ldots, q_{m-1}, q_m$.]

In this notation, the evolution of the high side decision interval CUSUM scheme can then be described in approximate terms by a MC with transition matrix

$$\underset{(m+1)\times(m+1)}{P} = \begin{pmatrix}
\sum_{j=-m}^{0} q_j & q_1 & q_2 & \cdots & q_{m-1} & q_m \\
\sum_{j=-m}^{-1} q_j & q_0 & q_1 & \cdots & q_{m-2} & q_{m-1} + q_m \\
\sum_{j=-m}^{-2} q_j & q_{-1} & q_0 & \cdots & q_{m-3} & q_{m-2} + q_{m-1} + q_m \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
q_{-m} + q_{-m+1} & q_{-m+2} & q_{-m+3} & \cdots & q_0 & \sum_{j=1}^{m} q_j \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}.$$

For $i = 1, \ldots, m$ the mean time to absorption from state $S_i$ ($L_i$) is approximately the ARL of the scheme with head start $(i-1)\frac{h}{m}$. (That is, the entries of the vector $L$ specified in display (2.2) are approximate ARL values for the CUSUM scheme using various possible head starts.) In practice, in order to find ARLs for the original scheme with non-rounded iid observations $Q$, one would find approximate ARL values for an increasing sequence of $m$'s until those appear to converge for the head start of interest.
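The transition matrix is easy to assemble for a concrete $f$. Below is a minimal sketch (Python/numpy/scipy, an added illustration assuming standard normal $Q$); with $k_1 = 0.5$ and $h = 4$ the zero-head-start values should converge, as $m$ grows, to the frequently tabled in-control ARL of roughly 168:

    import numpy as np
    from scipy.stats import norm

    def cusum_arl(k1, h, m):
        """Zero-head-start ARL of a high side CUSUM via the MC approximation,
        with Q standard normal and Q - k1 rounded to multiples of h/m."""
        d = h / m
        j = np.arange(-m + 1, m)                   # -m < j < m, display (2.5)
        q = norm.cdf(j * d + d / 2 + k1) - norm.cdf(j * d - d / 2 + k1)
        qdict = dict(zip(j, q))
        qdict[-m] = norm.cdf(-h + d / 2 + k1)      # q_{-m}
        R = np.zeros((m, m))
        for i in range(1, m + 1):
            # first column: CUSUM zeroes out (all j <= -(i-1))
            R[i - 1, 0] = sum(qdict[l] for l in range(-m, -(i - 1) + 1))
            for jj in range(2, m + 1):
                R[i - 1, jj - 1] = qdict.get(jj - i, 0.0)
        L = np.linalg.solve(np.eye(m) - R, np.ones(m))
        return L[0]                                # head start (1-1)h/m = 0

    for m in (20, 50, 100, 200):
        print(m, cusum_arl(k1=0.5, h=4.0, m=m))    # approaches roughly 168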

As a final example of the use of MC techniques in the probability modeling of process monitoring scheme behavior, consider discrete approximation of the EWMA schemes of §4.1 of V&J, where the variables $Q_1, Q_2, \ldots$ are again iid with continuous distribution specified by a pdf $f(y)$. In this case, in order to provide a tractable discrete approximation, it will not typically suffice to simply discretize the variables $Q$ (as the EWMA calculations will then typically produce a number of possible/"exact" EWMA values that grows as time goes on). Instead, it is necessary to think directly in terms of rounded/discretized EWMAs. So for an odd positive integer $m$, let $\Delta = (UCL_{EWMA} - LCL_{EWMA})/m$ and think of replacing an (exact) EWMA sequence with a rounded EWMA sequence taking on values $a_i$ defined by

$$a_i := LCL_{EWMA} + \frac{\Delta}{2} + (i-1)\Delta$$

for $i = 1, 2, \ldots, m$. For $i = 1, 2, \ldots, m$ let

$$S_i = \text{no alarm yet and the rounded EWMA is } a_i$$

and

$$S_{m+1} = \text{alarm.}$$


And for $1 \le i, j \le m$, let

$$\begin{aligned}
q_{ij} &= P[\text{moving from } S_i \text{ to } S_j] \\
&= P\left[a_j - \frac{\Delta}{2} < (1-\lambda)a_i + \lambda Q \le a_j + \frac{\Delta}{2}\right] \\
&= P\left[\frac{a_j - (1-\lambda)a_i - \frac{\Delta}{2}}{\lambda} < Q \le \frac{a_j - (1-\lambda)a_i + \frac{\Delta}{2}}{\lambda}\right] \\
&= P\left[a_i + \frac{(j-i)\Delta - \frac{\Delta}{2}}{\lambda} < Q \le a_i + \frac{(j-i)\Delta + \frac{\Delta}{2}}{\lambda}\right] \\
&= \int_{a_i + \frac{(j-i)\Delta - \Delta/2}{\lambda}}^{a_i + \frac{(j-i)\Delta + \Delta/2}{\lambda}} f(y)\, dy. \tag{2.6}
\end{aligned}$$

Then with

$$P = \begin{pmatrix}
q_{11} & q_{12} & \cdots & q_{1m} & 1 - \sum_{j=1}^{m} q_{1j} \\
q_{21} & q_{22} & \cdots & q_{2m} & 1 - \sum_{j=1}^{m} q_{2j} \\
\vdots & \vdots & & \vdots & \vdots \\
q_{m1} & q_{m2} & \cdots & q_{mm} & 1 - \sum_{j=1}^{m} q_{mj} \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix},$$

the mean time to absorption from the state $S_{(m+1)/2}$ (the value $L_{(m+1)/2}$) of a MC with this transition matrix is an approximation for the EWMA scheme ARL with $EWMA_0 = (UCL_{EWMA} + LCL_{EWMA})/2$. In practice, in order to find the ARL for the original scheme, one would find approximate ARL values for an increasing sequence of $m$'s until those appear to converge.
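A parallel sketch for the EWMA case (Python/scipy; standard normal $Q$ is assumed, and the values of $\lambda$ and the control limits below are illustrative inputs rather than examples from the notes):

    import numpy as np
    from scipy.stats import norm

    def ewma_arl(lam, lcl, ucl, m):
        """Centered-start ARL of an EWMA chart via the rounded-EWMA MC
        approximation; m should be odd, Q standard normal."""
        delta = (ucl - lcl) / m
        a = lcl + delta / 2 + delta * np.arange(m)        # grid values a_1..a_m
        i, j = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
        lo = a[i] + ((j - i) * delta - delta / 2) / lam   # display (2.6)
        hi = a[i] + ((j - i) * delta + delta / 2) / lam
        Q = norm.cdf(hi) - norm.cdf(lo)                   # q_ij matrix
        L = np.linalg.solve(np.eye(m) - Q, np.ones(m))
        return L[(m + 1) // 2 - 1]                        # state S_(m+1)/2

    for m in (51, 101, 201):                              # values should stabilize
        print(m, ewma_arl(lam=0.2, lcl=-0.894, ucl=0.894, m=m))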

The four examples in this section have illustrated the use of MC calculations in the second and third of the circumstances listed at the beginning of this section. The first circumstance is conceptually the simplest of the three, and is for example illustrated by Problems 2.25, 2.28 and 2.37. The examples have also all dealt with iid models for the $Q_1, Q_2, \ldots$ sequence. Problem 2.26 shows that the methodology can also easily accommodate some kinds of dependencies in the $Q$ sequence. (The discrete model in Problem 2.26 is itself perhaps less than completely appealing, but the reader should consider the possibility of discrete approximation of the kind of dependency structure employed in Problem 2.27 before dismissing the basic concept illustrated in Problem 2.26 as useless.)

2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes

There is a second (and at first appearance quite different) standard method of approaching the analysis of the run length behavior of some process monitoring schemes where continuous variables $Q$ are involved. That is through the use of integral equations, and this section introduces the use of these. (As it turns out, by the time one is forced to find numerical solutions of the integral equations, there is not a whole lot of difference between the methods of this section and those of the previous one. But it is important to introduce this second point of view and note the correspondence between approaches.)

Before going to the details of specific schemes and integral equations, a small piece of calculus/numerical analysis needs to be reviewed and notation set for use in these notes. That concerns the approximation of definite integrals on the interval $[a, a+h]$. Specification of a set of points

$$a \le a_1 \le a_2 \le \cdots \le a_m \le a + h$$

and weights

$$w_i \ge 0 \quad\text{with}\quad \sum_{i=1}^{m} w_i = h$$

so that

$$\int_a^{a+h} f(y)\, dy \quad\text{may be approximated as}\quad \sum_{i=1}^{m} w_i f(a_i)$$

for reasonable functions $f(y)$, is the specification of a so-called quadrature rule for approximating integrals on the interval $[a, a+h]$. The simplest of such rules is probably the choice

$$a_i := a + \left(\frac{i - \frac{1}{2}}{m}\right)h \quad\text{with}\quad w_i := \frac{h}{m}. \tag{2.7}$$

(This choice amounts to approximating an integral of $f$ by a sum of signed areas of rectangles with bases $h/m$ and (signed) heights chosen as the values of $f$ at midpoints of intervals of length $h/m$ beginning at $a$.)

Now consider a high side CUSUM scheme as in §4.2 of V&J, where $Q_1, Q_2, \ldots$ are iid with continuous marginal distribution specified by the probability density $f(y)$. Define the function

$$L_1(u) := \text{the ARL of the high side CUSUM scheme using a head start of } u.$$

If one begins CUSUMing at $u$, there are three possibilities for where one will be after a single observation, $Q_1$. If $Q_1$ is large ($Q_1 - k_1 \ge h - u$) then there will be an immediate signal and the run length will be 1. If $Q_1$ is small ($Q_1 - k_1 \le -u$) the CUSUM will zero out, one observation will have been spent, and on average $L_1(0)$ more observations are to be faced in order to produce a signal. Finally, if $Q_1$ is moderate ($-u < Q_1 - k_1 < h - u$) then one observation will have been spent and the CUSUM will continue from $u + (Q_1 - k_1)$, requiring on average an additional $L_1(u + (Q_1 - k_1))$ observations to produce a signal. This reasoning leads to the equation for $L_1$,

$$L_1(u) = 1\cdot P[Q_1 - k_1 \ge h - u] + (1 + L_1(0))\, P[Q_1 - k_1 \le -u] + \int_{k_1 - u}^{k_1 + h - u} \left(1 + L_1(u + y - k_1)\right) f(y)\, dy.$$


    Writing F (y) for the cdf of Q1 and simplifying slightly, this is

    L1(u) = 1 + L1(0)F (k1 u) +Z h

    0L1(y)f(y + k1 u)dy : (2.8)

The argument leading to equation (2.8) has a "twin" that produces an integral equation for
$$L_2(v) := \text{the ARL of a low side CUSUM scheme using a head start of } v .$$
That equation is
$$L_2(v) = 1 + L_2(0)\,\left(1 - F(k_2 - v)\right) + \int_{-h}^{0} L_2(y)\,f(y + k_2 - v)\,dy . \quad (2.9)$$

And as indicated in display (4.20) of V&J, could one solve equations (2.8) and (2.9) (and thus obtain $L_1(0)$ and $L_2(0)$) one would have not only separate high and low side CUSUM ARLs, but ARLs for some combined schemes as well. (Actually, more than what is stated in V&J can be proved. Yashchin, in a Journal of Applied Probability paper of about 1985, showed that with iid $Q$'s, high side decision interval $h_1$, low side decision interval $-h_2$ for nonnegative $h_2$, and head starts $u$ and $v$, if $k_1 \ge k_2$ and a side condition involving $k_1 - k_2$, $|h_1 - h_2|$, $u$, $v$ and $\max(h_1, h_2)$ holds (see Yashchin's paper for its precise form), then for the simultaneous use of high and low side schemes
$$\mathrm{ARL}_{\text{combined}} = \frac{L_1(0)L_2(v) + L_1(u)L_2(0) - L_1(0)L_2(0)}{L_1(0) + L_2(0)} .$$
It is easily verified that what is stated on page 151 of V&J is a special case of this result: setting $u = v = 0$ reduces the display to $\mathrm{ARL}_{\text{combined}} = L_1(0)L_2(0)/(L_1(0) + L_2(0))$, i.e. $1/\mathrm{ARL}_{\text{combined}} = 1/L_1(0) + 1/L_2(0)$.) So "in theory," to find ARLs for CUSUM schemes one need only solve the integral equations (2.8) and (2.9). This is easier said than done. The one case where fairly explicit solutions are known is that where observations are exponentially distributed (see Problem 2.30). In other cases one must resort to numerical solution of the integral equations.

So consider the problem of approximate solution of equation (2.8). For a particular quadrature rule for integrals on $[0, h]$, for each $a_i$ one has from equation (2.8) the approximation
$$L_1(a_i) \approx 1 + L_1(a_1)\,F(k_1 - a_i) + \sum_{j=1}^{m} w_j L_1(a_j)\,f(a_j + k_1 - a_i)$$
(where, since $a_1 \approx 0$ under rule (2.7) for large $m$, $L_1(a_1)$ is standing in for $L_1(0)$).


That is, at least approximately one has the system of $m$ linear equations
$$L_1(a_1) = 1 + L_1(a_1)\,[F(k_1 - a_1) + w_1 f(k_1)] + \sum_{j=2}^{m} L_1(a_j)\,w_j f(a_j + k_1 - a_1) ,$$
$$L_1(a_2) = 1 + L_1(a_1)\,[F(k_1 - a_2) + w_1 f(a_1 + k_1 - a_2)] + \sum_{j=2}^{m} L_1(a_j)\,w_j f(a_j + k_1 - a_2) ,$$
$$\vdots$$
$$L_1(a_m) = 1 + L_1(a_1)\,[F(k_1 - a_m) + w_1 f(a_1 + k_1 - a_m)] + \sum_{j=2}^{m} L_1(a_j)\,w_j f(a_j + k_1 - a_m)$$
in the $m$ unknowns $L_1(a_1), \ldots, L_1(a_m)$. Again in light of equation (2.8) and the notion of numerical approximation of definite integrals, upon solving this set of equations (for approximate values of $L_1(a_1), \ldots, L_1(a_m)$) one may approximate the function $L_1(u)$ as
$$L_1(u) \approx 1 + L_1(a_1)\,F(k_1 - u) + \sum_{j=1}^{m} w_j L_1(a_j)\,f(a_j + k_1 - u) .$$
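For concreteness, here is a minimal Python sketch that builds and solves this system, under the assumption of iid normal observations (the helper name cusum_high_arl and the parameter values below are ours, not V&J's):

```python
import numpy as np
from scipy.stats import norm

def cusum_high_arl(k1, h, m=200, mu=0.0, sigma=1.0):
    """Approximate L1(a_1), ..., L1(a_m) for a high side CUSUM with
    reference value k1 and decision interval h, assuming iid normal
    observations, via the midpoint quadrature rule (2.7) on [0, h]."""
    a = (np.arange(1, m + 1) - 0.5) * h / m        # nodes a_i
    w = h / m                                      # common weight w_j
    # R[i, j] = w_j f(a_j + k1 - a_i); the first column also carries the
    # "zero out" probability F(k1 - a_i), with L1(0) replaced by L1(a_1)
    R = w * norm.pdf(a[None, :] + k1 - a[:, None], loc=mu, scale=sigma)
    R[:, 0] += norm.cdf(k1 - a, loc=mu, scale=sigma)
    # solve the linear system L = 1 + R L, i.e. (I - R) L = 1
    return a, np.linalg.solve(np.eye(m) - R, np.ones(m))

a, L1 = cusum_high_arl(k1=0.5, h=4.0)
print(L1[0])   # approximately the zero-head-start ARL L1(0)
```

For $k_1 = 0.5$ and $h = 4$ with standard normal $Q$'s, this should return a value close to the commonly quoted all-OK ARL of about 168.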

It is a revealing point that the system of equations above is of the form (2.3) that was so useful in the MC approach to the determination of ARLs. That is, let
$$\boldsymbol{L} = \begin{pmatrix} L_1(a_1) \\ L_1(a_2) \\ \vdots \\ L_1(a_m) \end{pmatrix}$$
and
$$\boldsymbol{R} = \begin{pmatrix}
F(k_1 - a_1) + w_1 f(k_1) & w_2 f(a_2 + k_1 - a_1) & \cdots & w_m f(a_m + k_1 - a_1) \\
F(k_1 - a_2) + w_1 f(a_1 + k_1 - a_2) & w_2 f(k_1) & \cdots & w_m f(a_m + k_1 - a_2) \\
\vdots & \vdots & & \vdots \\
F(k_1 - a_m) + w_1 f(a_1 + k_1 - a_m) & w_2 f(a_2 + k_1 - a_m) & \cdots & w_m f(k_1)
\end{pmatrix}$$
and note that the set of equations for the $a_i$ head start approximate ARLs is exactly of the form (2.3). With the simple quadrature rule in display (2.7), note that a generic entry of $\boldsymbol{R}$, $r_{ij}$, for $j \ge 2$ is
$$r_{ij} = w_j f(a_j + k_1 - a_i) = \frac{h}{m}\,f\!\left((j-i)\frac{h}{m} + k_1\right) .$$

But using again the notation $f^*(y) = f(y + k_1)$ employed in the CUSUM example of §2.2, this means
$$r_{ij} = \frac{h}{m}\,f^*\!\left((j-i)\frac{h}{m}\right) \approx \int_{(j-i)\left(\frac{h}{m}\right) - \frac{1}{2}\left(\frac{h}{m}\right)}^{(j-i)\left(\frac{h}{m}\right) + \frac{1}{2}\left(\frac{h}{m}\right)} f^*(y)\,dy = q_{j-i}$$


(in terms of the notation (2.5) from the CUSUM example). The point is that whether one begins from a "discretize the $Q - k_1$ distribution and employ the MC material" point of view or from a "do numerical solution of an integral equation" point of view is largely immaterial. Very similar large systems of linear equations must be solved in order to find approximate ARLs.

As a second application of integral equation ideas to the analysis of process monitoring schemes, consider the EWMA schemes of §4.1 of V&J where $Q_1, Q_2, \ldots$ are iid with a continuous distribution specified by the probability density $f(y)$. Let
$$L(u) = \text{the ARL of an EWMA scheme with } EWMA_0 = u .$$
When one begins an EWMA sequence at $u$, there are 2 possibilities of where he/she will be after a single observation, $Q_1$. If $Q_1$ is extreme ($\lambda Q_1 + (1-\lambda)u > UCL_{EWMA}$ or $\lambda Q_1 + (1-\lambda)u < LCL_{EWMA}$) then there will be an immediate signal and the run length will be 1. If $Q_1$ is moderate ($LCL_{EWMA} \le \lambda Q_1 + (1-\lambda)u \le UCL_{EWMA}$) one observation will have been spent and on average $L(\lambda Q_1 + (1-\lambda)u)$ more observations are to be faced in order to produce a signal. Now the event
$$LCL_{EWMA} \le \lambda Q_1 + (1-\lambda)u \le UCL_{EWMA}$$
is the event
$$\frac{LCL_{EWMA} - (1-\lambda)u}{\lambda} \le Q_1 \le \frac{UCL_{EWMA} - (1-\lambda)u}{\lambda} ,$$
so this reasoning produces the equation
$$L(u) = 1 \cdot \left(1 - P\!\left[\frac{LCL_{EWMA} - (1-\lambda)u}{\lambda} \le Q_1 \le \frac{UCL_{EWMA} - (1-\lambda)u}{\lambda}\right]\right) + \int_{(LCL_{EWMA} - (1-\lambda)u)/\lambda}^{(UCL_{EWMA} - (1-\lambda)u)/\lambda} \left(1 + L(\lambda y + (1-\lambda)u)\right) f(y)\,dy ,$$
or
$$L(u) = 1 + \int_{(LCL_{EWMA} - (1-\lambda)u)/\lambda}^{(UCL_{EWMA} - (1-\lambda)u)/\lambda} L(\lambda y + (1-\lambda)u)\,f(y)\,dy ,$$
or finally
$$L(u) = 1 + \frac{1}{\lambda} \int_{LCL_{EWMA}}^{UCL_{EWMA}} L(y)\,f\!\left(\frac{y - (1-\lambda)u}{\lambda}\right) dy . \quad (2.10)$$

As in the previous (CUSUM) case, one must usually resort to numerical methods in order to approximate the solution to equation (2.10). For a particular quadrature rule for integrals on $[LCL_{EWMA}, UCL_{EWMA}]$, for each $a_i$ one has from equation (2.10) the approximation
$$L(a_i) \approx 1 + \frac{1}{\lambda} \sum_{j=1}^{m} w_j L(a_j)\,f\!\left(\frac{a_j - (1-\lambda)a_i}{\lambda}\right) . \quad (2.11)$$


Now expression (2.11) is standing for a set of $m$ equations in the $m$ unknowns $L(a_1), \ldots, L(a_m)$ that (as in the CUSUM case) can be thought of in terms of the matrix expression (2.3) if one takes
$$\boldsymbol{L} = \begin{pmatrix} L(a_1) \\ \vdots \\ L(a_m) \end{pmatrix} \quad \text{and} \quad \boldsymbol{R}_{m \times m} = \left( \frac{w_j}{\lambda}\,f\!\left(\frac{a_j - (1-\lambda)a_i}{\lambda}\right) \right) . \quad (2.12)$$
Solution of the system represented by equation (2.11) or the matrix expression (2.3) with definitions (2.12) produces approximate values for $L(a_1), \ldots, L(a_m)$ and therefore an approximation for the function $L(u)$ as
$$L(u) \approx 1 + \frac{1}{\lambda} \sum_{j=1}^{m} w_j L(a_j)\,f\!\left(\frac{a_j - (1-\lambda)u}{\lambda}\right) .$$
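The same "build $\boldsymbol{R}$, solve $(\boldsymbol{I} - \boldsymbol{R})\boldsymbol{L} = \boldsymbol{1}$" pattern applies. A sketch for the EWMA equations (2.11)/(2.12), again assuming iid normal observations and with purely illustrative values of $\lambda$ and the control limits (the function name ewma_arl is ours):

```python
import numpy as np
from scipy.stats import norm

def ewma_arl(lam, lcl, ucl, m=201, mu=0.0, sigma=1.0):
    """Approximate L(a_1), ..., L(a_m) for an EWMA scheme with weight lam
    and control limits (lcl, ucl), assuming iid normal observations,
    via the midpoint rule on [lcl, ucl]."""
    a = lcl + (np.arange(1, m + 1) - 0.5) * (ucl - lcl) / m
    w = (ucl - lcl) / m
    # R[i, j] = (w_j / lam) f((a_j - (1 - lam) a_i) / lam), as in (2.12)
    R = (w / lam) * norm.pdf((a[None, :] - (1 - lam) * a[:, None]) / lam,
                             loc=mu, scale=sigma)
    return a, np.linalg.solve(np.eye(m) - R, np.ones(m))

a, L = ewma_arl(lam=0.2, lcl=-1.0, ucl=1.0)
print(np.interp(0.0, a, L))   # ARL of a scheme started at EWMA_0 = 0
```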

Again as in the CUSUM case, it is worth noting the similarity between the set of equations used to find MC ARL approximations and the set of equations used to find integral equation ARL approximations. With the quadrature rule (2.7) and an odd integer $m$, using the notation $\Delta = (UCL_{EWMA} - LCL_{EWMA})/m$ employed in §2.2 in the EWMA example, note that a generic entry of $\boldsymbol{R}$ defined in (2.12) is
$$r_{ij} = \frac{w_j}{\lambda}\,f\!\left(\frac{a_j - (1-\lambda)a_i}{\lambda}\right) = \frac{\Delta}{\lambda}\,f\!\left(a_i + \frac{(j-i)\Delta}{\lambda}\right) \approx \int_{a_i + \frac{(j-i)\Delta}{\lambda} - \frac{\Delta}{2\lambda}}^{a_i + \frac{(j-i)\Delta}{\lambda} + \frac{\Delta}{2\lambda}} f(y)\,dy = q_{ij}$$
(in terms of the notation (2.6) from the EWMA example of §2.2). That is, as in the CUSUM case, the sets of equations used in the MC and integral equation approximations for the $EWMA_0 = a_i$ ARLs of the scheme are very similar.

As a final example of the use of integral equations in the analysis of process monitoring schemes, consider the $X/MR$ schemes of §4.4 of V&J. Suppose that observations $x_1, x_2, \ldots$ are iid with continuous marginal distribution specified by the probability density $f(y)$. Define the function
$$L(y) = \text{the mean number of additional observations to alarm, given that there has been no alarm to date and the current observation is } y .$$
Then note that as one begins $X/MR$ monitoring, there are two possibilities of where he/she will be after observing the first individual, $x_1$. If $x_1$ is extreme ($x_1 < LCL_x$ or $x_1 > UCL_x$) there will be an immediate signal and the run length will be 1. If $x_1$ is not extreme ($LCL_x \le x_1 \le UCL_x$) one observation will have been spent and on average another $L(x_1)$ observations will be required in order to produce a signal. So it is reasonable that the ARL for the $X/MR$ scheme is
$$\mathrm{ARL} = 1 \cdot \left(1 - P[LCL_x \le x_1 \le UCL_x]\right) + \int_{LCL_x}^{UCL_x} (1 + L(y))\,f(y)\,dy ,$$


that is,
$$\mathrm{ARL} = 1 + \int_{LCL_x}^{UCL_x} L(y)\,f(y)\,dy , \quad (2.13)$$
where it remains to find a way of computing the function $L(y)$ in order to feed it into expression (2.13).

In order to derive an integral equation for $L(y)$, consider the situation if there has been no alarm and the current individual observation is $y$. There are two possibilities of where one will be after observing one more individual, $x$. If $x$ is extreme or too far from $y$ ($x < LCL_x$ or $x > UCL_x$ or $|x - y| > UCL_R$) only one additional observation is required to produce a signal. On the other hand, if $x$ is not extreme and not too far from $y$ ($LCL_x \le x \le UCL_x$ and $|x - y| \le UCL_R$) one more observation will have been spent and on average another $L(x)$ will be required to produce a signal. That is,
$$L(y) = 1 \cdot P[x < LCL_x \text{ or } x > UCL_x \text{ or } |x - y| > UCL_R] + \int_{\max(LCL_x,\, y - UCL_R)}^{\min(UCL_x,\, y + UCL_R)} (1 + L(x))\,f(x)\,dx ,$$
that is,
$$L(y) = 1 + \int_{\max(LCL_x,\, y - UCL_R)}^{\min(UCL_x,\, y + UCL_R)} L(x)\,f(x)\,dx = 1 + \int_{LCL_x}^{UCL_x} I[|x - y| \le UCL_R]\,L(x)\,f(x)\,dx . \quad (2.14)$$
(The notation $I[A]$ is indicator function notation, meaning that $I[A] = 1$ when $A$ holds and $I[A] = 0$ otherwise.) As in the earlier CUSUM and EWMA examples, once one specifies a quadrature rule for definite integrals on the interval $[LCL_x, UCL_x]$, this expression (2.14) provides a set of $m$ linear equations for approximate values of the $L(a_i)$'s. When this system is solved, the resulting values can be fed into a discretized version of equation (2.13) and an approximate ARL produced. It is worth noting that the potential discontinuities of the integrand in equation (2.14) (produced by the indicator function) have the effect of making numerical solutions of this equation much less well-behaved than those for the other integral equations developed in this section.
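A sketch for the $X/MR$ case follows the same pattern, with the indicator carried directly into the kernel; because of the resulting discontinuities a relatively large $m$ is advisable. The limits used below ($\pm 3\sigma$ for individuals and $3.686\sigma$ for moving ranges) are the conventional standards-given choices, here purely illustrative, and again iid normal observations are assumed:

```python
import numpy as np
from scipy.stats import norm

def xmr_arl(lcl_x, ucl_x, ucl_r, m=500, mu=0.0, sigma=1.0):
    """Approximate the ARL (2.13) of a combined X/MR scheme for iid normal
    observations, solving a discretized version of (2.14).  The indicator
    I[|x - y| <= UCL_R] makes the kernel discontinuous, so m is kept large."""
    a = lcl_x + (np.arange(1, m + 1) - 0.5) * (ucl_x - lcl_x) / m
    w = (ucl_x - lcl_x) / m
    fa = norm.pdf(a, loc=mu, scale=sigma)
    # kernel of (2.14): w * I[|a_j - a_i| <= UCL_R] * f(a_j)
    R = w * (np.abs(a[None, :] - a[:, None]) <= ucl_r) * fa[None, :]
    L = np.linalg.solve(np.eye(m) - R, np.ones(m))
    return 1.0 + np.sum(w * L * fa)        # discretized version of (2.13)

print(xmr_arl(lcl_x=-3.0, ucl_x=3.0, ucl_r=3.686))
```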

The examples of this section have dealt only with ARLs for schemes based on (continuous) iid observations. It therefore should be said that:

1. The iid assumption can in some cases be relaxed to give tractable integral equations for situations where correlated sequences $Q_1, Q_2, \ldots$ are involved (see for example Problem 2.27),

2. Other descriptors of the run length distribution (beyond the ARL) can often be shown to solve simple integral equations (see for example the integral equations for the CUSUM run length second moment and run length probability function in Problem 2.31), and

3. In some cases, with discrete variables $Q$, there are difference equation analogues of the integral equations presented here (that ultimately correspond to the kind of MC calculations illustrated in the previous section).

Chapter 3

An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control

Section 3.6 of V&J provides an elementary introduction to the topic of Engineering Control and contrasts this adjustment methodology with (the process monitoring methodology of) control charting. The last item under the "Engineering Control" heading of Table 3.10 of V&J makes reference to "optimal stochastic control" theory. The object of this theory is to model system behavior using probability tools and let the consequences of the model assumptions help guide one in the choice of effective control/adjustment algorithms. This chapter provides a very brief introduction to this theory.

3.1 General Exposition

Let
$$\{\ldots, Z(-1), Z(0), Z(1), Z(2), \ldots\}$$
stand for observations on a process assuming that no control actions are taken. One first needs a stochastic/probabilistic model for the sequence $\{Z(t)\}$, and we will let
$$\mathcal{F}$$
stand for such a model. $\mathcal{F}$ is a joint distribution for the $Z$'s and might, for example, be:

1. a simple random walk model specified by the equation $Z(t) = Z(t-1) + \epsilon(t)$, where the $\epsilon$'s are iid normal $(0, \sigma^2)$ random variables,



2. a random walk model with drift specified by the equation $Z(t) = Z(t-1) + d + \epsilon(t)$, where $d$ is a constant and the $\epsilon$'s are iid normal $(0, \sigma^2)$ random variables, or

3. some Box-Jenkins ARIMA model for the $\{Z(t)\}$ sequence.

Then let
$$a(t)$$
stand for a control action taken at time $t$, after observing the process. One needs notation for the current impact of control actions taken in past periods, so we will further let
$$A(a, s)$$
stand for the current impact on the process of a control action $a$ taken $s$ periods ago. In many systems, the control actions, $a$, are numerical, and $A(a, s) = a\,h(s)$ where $h(s)$ is the so-called impulse response function giving the impact of a unit control action taken $s$ periods previous. $A(a, s)$ might, for example, be:

1. given by $A(a, s) = a$ for $s \ge 1$ in a machine tool control problem where $a$ means "move the cutting tool out $a$ units" (and the controlled variable is a measured dimension of a work piece),

2. given by $A(a, s) = 0$ for $s \le u$ and by $A(a, s) = a$ for $s > u$ in a machine tool control problem where $a$ means "move the cutting tool out $a$ units" and there are $u$ periods of dead time, or

3. given by $A(a, s) = \left(1 - \exp\left(-\frac{sh}{\tau}\right)\right) a$ for $s \ge 1$ in a chemical process control problem with time constant $\tau$ and control period $h$ seconds.

We will then assume that what one actually observes for (controlled) process behavior at time $t \ge 1$ is
$$Y(t) = Z(t) + \sum_{s=0}^{t-1} A(a(s), t-s) ,$$
which is the sum of what would have been observed with no control and all of the current effects of previous control actions. For $t \ge 0$, $a(t)$ will be chosen based on
$$\{\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)\} .$$
A common objective in this context is to choose the actions so as to minimize
$$E_{\mathcal F}\left(Y(t) - T(t)\right)^2$$
or
$$\sum_{s=1}^{t} E_{\mathcal F}\left(Y(s) - T(s)\right)^2$$


for some (possibly time-dependent) target value $T(s)$. The problem of choosing control actions to accomplish this goal is called the minimum variance (MV) control problem, and it has a solution that can be described in fairly (deceptively, perhaps) simple terms.

Note first that given $\{\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)\}$ one can recover $\{\ldots, Z(-1), Z(0), Z(1), Z(2), \ldots, Z(t)\}$. This is because
$$Z(s) = Y(s) - \sum_{r=0}^{s-1} A(a(r), s-r) ,$$
i.e., to get $Z(s)$, one simply subtracts the (known) effects of previous control actions from $Y(s)$.

Then the model $\mathcal F$ (at least in theory) provides one a conditional distribution for $Z(t+1), Z(t+2), Z(t+3), \ldots$ given the observed $Z$'s through time $t$. The conditional distribution for $Z(t+1), Z(t+2), Z(t+3), \ldots$ given what one can observe through time $t$, namely $\{\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)\}$, is then the conditional distribution one gets for $Z(t+1), Z(t+2), Z(t+3), \ldots$ from the model $\mathcal F$ after recovering $Z(1), Z(2), \ldots, Z(t)$ from the corresponding $Y$'s. Then for $s \ge t+1$, let
$$E_{\mathcal F}[Z(s) \mid \ldots, Z(-1), Z(0), Z(1), Z(2), \ldots, Z(t)] \quad \text{or just} \quad E_{\mathcal F}[Z(s) \mid Z^t]$$
stand for the mean of this conditional distribution of $Z(s)$ available at time $t$.

Suppose that there are $u \ge 0$ periods of dead time ($u$ could be 0). Then the earliest $Y$ that one can hope to influence by choice of $a(t)$ is $Y(t+u+1)$. Notice then that if one takes control action $a(t)$ at time $t$, one's most natural projection of $Y(t+u+1)$ at time $t$ is
$$\widehat{Y}(t+u+1 \mid t) := E_{\mathcal F}[Z(t+u+1) \mid Z^t] + \sum_{s=0}^{t-1} A(a(s), t+u+1-s) + A(a(t), u+1) .$$
It is then natural (and in fact turns out to give the MV control strategy) to try to choose $a(t)$ so that
$$\widehat{Y}(t+u+1 \mid t) = T(t+u+1) .$$
That is, the MV strategy is to try to choose $a(t)$ so that
$$A(a(t), u+1) = T(t+u+1) - \left( E_{\mathcal F}[Z(t+u+1) \mid Z^t] + \sum_{s=0}^{t-1} A(a(s), t+u+1-s) \right) .$$

A caveat here is that in practice MV control tends to be "ragged." That is, in order to exactly optimize the mean squared error, constant tweaking (often with fairly large adjustments) is required. By changing one's control objective somewhat it is possible to produce smoother optimal control policies that are nearly as effective as MV algorithms in terms of keeping a process on target. That is, instead of trying to optimize
$$E_{\mathcal F} \sum_{s=1}^{t} (Y(s) - T(s))^2 ,$$
in a situation where the $a$'s are numerical ($a = 0$ indicating no adjustment and the size of adjustments increasing with $|a|$) one might, for a constant $\kappa > 0$, set out to minimize the alternative criterion
$$E_{\mathcal F} \left( \sum_{s=1}^{t} (Y(s) - T(s))^2 + \kappa \sum_{s=0}^{t-1} (a(s))^2 \right) .$$
Doing so will "smooth" the MV algorithm.

3.2 An Example

To illustrate the meaning of the preceding formalism, consider the model ($\mathcal F$) specified by
$$Z(t) = W(t) + \epsilon(t) \text{ for } t \ge 0 \quad \text{and} \quad W(t) = W(t-1) + d + \nu(t) \text{ for } t \ge 1 \quad (3.1)$$
for $d$ a (known) constant, the $\epsilon$'s normal $(0, \sigma_\epsilon^2)$, the $\nu$'s normal $(0, \sigma_\nu^2)$, and all the $\epsilon$'s and $\nu$'s independent. ($Z(t)$ is a random walk with drift, observed with error.) Under this model and an appropriate 0 mean normal initializing distribution for $W(0)$, it is the case that each
$$\widehat{Z}(t+1 \mid t) := E_{\mathcal F}[Z(t+1) \mid Z(0), \ldots, Z(t)]$$
may be computed recursively as
$$\widehat{Z}(t+1 \mid t) = \lambda Z(t) + (1-\lambda)\,\widehat{Z}(t \mid t-1) + d$$
for some constant $\lambda$ (that depends upon the known variances $\sigma_\epsilon^2$ and $\sigma_\nu^2$).

We will find MV control policies under model (3.1) with two different functions $A(a, s)$. Consider first the possibility
$$A(a, s) = a \quad \forall s \ge 1 \quad (3.2)$$
(an adjustment $a$ at a given time period takes its full and permanent effect at the next time period).

Consider the situation at time $t = 0$. Available are $Z(0)$ and $\widehat{Z}(0 \mid -1)$ (the prior mean of $W(0)$) and from these one may compute the prediction
$$\widehat{Z}(1 \mid 0) := \lambda Z(0) + (1-\lambda)\,\widehat{Z}(0 \mid -1) + d .$$


That means that taking control action $a(0)$, one should predict a value of
$$\widehat{Y}(1 \mid 0) := \widehat{Z}(1 \mid 0) + a(0)$$
for the controlled process at time $t = 1$, and upon setting this equal to the target $T(1)$ and solving for $a(0)$ one should thus choose
$$a(0) = T(1) - \widehat{Z}(1 \mid 0) .$$
At time $t = 1$ one has observed $Y(1)$ and may recover $Z(1)$ by noting that
$$Y(1) = Z(1) + A(a(0), 1) = Z(1) + a(0) ,$$
so that
$$Z(1) = Y(1) - a(0) .$$
Then a prediction (of the uncontrolled process) one step ahead is
$$\widehat{Z}(2 \mid 1) := \lambda Z(1) + (1-\lambda)\,\widehat{Z}(1 \mid 0) + d .$$
That means that with a target of $T(2)$ one should predict a value of the controlled process at time $t = 2$ of
$$\widehat{Y}(2 \mid 1) := \widehat{Z}(2 \mid 1) + a(0) + a(1) .$$
Upon setting this value equal to $T(2)$ and solving, it is clear that one should choose
$$a(1) = T(2) - \left( \widehat{Z}(2 \mid 1) + a(0) \right) .$$

So in general under (3.2), at time $t$ one may note that
$$Z(t) = Y(t) - \sum_{s=0}^{t-1} a(s)$$
and (recursively) compute
$$\widehat{Z}(t+1 \mid t) := \lambda Z(t) + (1-\lambda)\,\widehat{Z}(t \mid t-1) + d .$$
Then setting the predicted value of the controlled process equal to $T(t+1)$ and solving for $a(t)$, find the MV control action
$$a(t) = T(t+1) - \left( \widehat{Z}(t+1 \mid t) + \sum_{s=0}^{t-1} a(s) \right) .$$
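The recursions just derived are easy to put in code. Here is a minimal Python simulation of model (3.1) under the impact function (3.2); the numerical values of $d$, the standard deviations, the target and the smoothing constant $\lambda$ are illustrative assumptions only ($\lambda$ would in practice be computed from the two known variances):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter choices (not from V&J); lam would really be
# determined by sig_eps and sig_nu through the filtering recursions.
d, sig_eps, sig_nu, lam, T, n = 0.1, 1.0, 0.5, 0.5, 0.0, 200

# The uncontrolled process Z(0), ..., Z(n) of model (3.1)
W = np.concatenate(([0.0], np.cumsum(d + sig_nu * rng.normal(size=n))))
Z = W + sig_eps * rng.normal(size=n + 1)

Zhat = 0.0       # Zhat(0|-1), the prior mean of W(0)
cum_a = 0.0      # sum of all adjustments made so far
Y = np.empty(n + 1)
Y[0] = Z[0]      # before any adjustment, the process is observed as is
for t in range(n):
    Z_t = Y[t] - cum_a                       # recover Z(t) from Y(t)
    Zhat = lam * Z_t + (1 - lam) * Zhat + d  # Zhat(t+1|t), recursively
    a_t = T - (Zhat + cum_a)                 # the MV action under (3.2)
    cum_a += a_t
    Y[t + 1] = Z[t + 1] + cum_a              # next controlled observation

print(np.mean((Y[1:] - T) ** 2))  # achieved mean squared deviation from target
```

Under the MV policy the controlled deviations $Y(t+1) - T$ are exactly the one-step-ahead prediction errors $Z(t+1) - \widehat{Z}(t+1 \mid t)$, so the printed mean square estimates the (minimal) prediction error variance.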

Finally, consider the problem of MV control under the same model (3.1), but now using
$$A(a, s) = \begin{cases} 0 & \text{if } s = 1 \\ a & \text{for } s = 2, 3, \ldots \end{cases} \quad (3.3)$$
(a description of response to process adjustment involving one period of delay, after which the full effect of an adjustment is immediately and permanently felt).

Consider the situation at time $t = 0$. In hand are $Z(0)$ and the prior mean of $W(0)$, $\widehat{Z}(0 \mid -1)$, and the first $Y$ that one can affect by choice of $a(0)$ is $Y(2)$. Now
$$Z(2) = W(2) + \epsilon(2) = W(1) + d + \nu(2) + \epsilon(2) = Z(1) - \epsilon(1) + d + \nu(2) + \epsilon(2)$$
so that
$$\widehat{Z}(2 \mid 0) := E_{\mathcal F}[Z(2) \mid Z(0)] = E_{\mathcal F}[Z(1) - \epsilon(1) + d + \nu(2) + \epsilon(2) \mid Z(0)] = \widehat{Z}(1 \mid 0) + d = \lambda Z(0) + (1-\lambda)\,\widehat{Z}(0 \mid -1) + 2d$$
is a prediction of where the uncontrolled process will be at time $t = 2$. Then a prediction for the controlled process at time $t = 2$ is
$$\widehat{Y}(2 \mid 0) := \widehat{Z}(2 \mid 0) + A(a(0), 2) = \widehat{Z}(2 \mid 0) + a(0)$$
and upon setting this equal to the time $t = 2$ target, $T(2)$, and solving, one has the MV control action
$$a(0) = T(2) - \widehat{Z}(2 \mid 0) .$$
At time $t = 1$ one has in hand $Y(1) = Z(1)$ and $\widehat{Z}(1 \mid 0)$, and the first $Y$ that can be affected by the choice of $a(1)$ is $Y(3)$. Now
$$Z(3) = W(3) + \epsilon(3) = W(2) + d + \nu(3) + \epsilon(3) = Z(2) - \epsilon(2) + d + \nu(3) + \epsilon(3)$$
so that
$$\widehat{Z}(3 \mid 1) := E_{\mathcal F}[Z(3) \mid Z(0), Z(1)] = E_{\mathcal F}[Z(2) - \epsilon(2) + d + \nu(3) + \epsilon(3) \mid Z(0), Z(1)] = \widehat{Z}(2 \mid 1) + d = \lambda Z(1) + (1-\lambda)\,\widehat{Z}(1 \mid 0) + 2d$$

is a prediction of where the uncontrolled process will be at time $t = 3$. Then a prediction for the controlled process at time $t = 3$ is
$$\widehat{Y}(3 \mid 1) := \widehat{Z}(3 \mid 1) + A(a(0), 3) + A(a(1), 2) = \widehat{Z}(3 \mid 1) + a(0) + a(1)$$
and upon setting this equal to the time $t = 3$ target, $T(3)$, and solving, one has the MV control action
$$a(1) = T(3) - \left( \widehat{Z}(3 \mid 1) + a(0) \right) .$$

Finally, in general under (3.3), one may at time $t$ note that
$$Z(t) = Y(t) - \sum_{s=0}^{t-2} a(s)$$
and (recursively) compute
$$\widehat{Z}(t+2 \mid t) := \lambda Z(t) + (1-\lambda)\,\widehat{Z}(t \mid t-1) + 2d .$$
Then setting the time $t+2$ predicted value of the controlled process equal to $T(t+2)$ and solving for $a(t)$, we find the MV control action
$$a(t) = T(t+2) - \left( \widehat{Z}(t+2 \mid t) + \sum_{s=0}^{t-1} a(s) \right) .$$
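Under (3.3) only two things change relative to the earlier simulation sketch: the recovery of $Z(t)$ stops the adjustment sum at $a(t-2)$ (the most recent action has not yet been felt), and the prediction and target move two steps ahead. A minimal variant, with the same illustrative (assumed) parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sig_eps, sig_nu, lam, T, n = 0.1, 1.0, 0.5, 0.5, 0.0, 200
W = np.concatenate(([0.0], np.cumsum(d + sig_nu * rng.normal(size=n))))
Z = W + sig_eps * rng.normal(size=n + 1)

Zhat = 0.0                  # Zhat(0|-1)
a = np.zeros(n + 1)         # adjustments; a[t] is chosen at time t
Y = np.empty(n + 1)
Y[0] = Z[0]
for t in range(n):
    Z_t = Y[t] - a[:max(t - 1, 0)].sum()           # sum only through a(t-2)
    Zhat2 = lam * Z_t + (1 - lam) * Zhat + 2 * d   # Zhat(t+2|t)
    a[t] = T - (Zhat2 + a[:t].sum())               # MV action under (3.3)
    Zhat = lam * Z_t + (1 - lam) * Zhat + d        # carry Zhat(t+1|t) forward
    Y[t + 1] = Z[t + 1] + a[:t].sum()              # a(t) is not yet felt at t+1

print(np.mean((Y[2:] - T) ** 2))   # achieved mean squared deviation from target
```

As one would expect, the dead time makes the achievable mean squared deviation larger than in the no-delay case, since the controller must rely on two-step-ahead rather than one-step-ahead predictions.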


Chapter 4

Process Characterization and Capability Analysis

Sections 5.1 through 5.3 of V&J discuss the problem of summarizing the behavior of a stable process. The bottom line of that discussion is that one-sample statistical methods can be used in a straightforward manner to characterize a process/population/universe standing behind data collected under stable process conditions. Section 5.5 of V&J opens a discussion of summarizing process behavior when it is not sensible to model all data in hand as random draws from a single/fixed universe. The notes in this chapter carry the theme of §5.5 of V&J slightly further and add some theoretical detail missing in the book.

4.1 General Comments on Assessing and Dissecting Overall Variation

The questions "How much variation is there overall?" and "Where is the variation coming from?" are fundamental to process characterization/understanding and the guidance of improvement efforts. To provide a framework for discussion here, suppose that in hand one has $r$ samples of data, sample $i$ of size $n_i$ ($i = 1, \ldots, r$). Depending upon the specific application, these $r$ samples can have many different logical structures. For example, §5.5 of V&J considers the case where the $n_i$ are all the same and the $r$ samples are naturally thought of as having a balanced hierarchical/"tree" structure. But many others (both regular and completely irregular) are possible. For example, Figure 4.1 is a schematic parallel to Figure 5.16 of V&J for a "staggered" nested data structure.

When data in hand represent the entire universe of interest, methods of probability and statistical inference have no relevance to the basic questions "How much variation is there overall?" and "Where is the variation coming from?" The problem is one of descriptive statistics only, and various creative combinations of methods of statistical graphics and basic numerical measures (like sample variances and ranges) can be assembled to address these issues. And most simply, a grand sample variance is one sensible characterization of overall variation.



[Figure 4.1: Schematic of a staggered nested data set, showing levels of A, B(A), C(B(A)) and D(C(B(A))).]

The tools of probability and statistical inference only become relevant when one sees data in hand as representing something more than themselves. And there are basically two standard routes to take in this enterprise. The first posits some statistical model for the process standing behind the data (like the hierarchical random effects model (5.28) of V&J). One may then use the data in hand in the estimation of parameters (and functions of parameters) of that model in order to characterize process behavior, assess overall variability and dissect that variation into interpretable pieces.

The second standard way in which probabilistic and statistical methods become relevant (to the problems of assessing overall variation and analysis of its components) is through the adoption of a finite population sampling perspective. That is, there are times where there is conceptually some (possibly highly structured) concrete data set of interest and the data in hand arise through the application (possibly in various complicated ways) of random selection of some of the elements of that data set. (As one possible example, think of a warehouse that contains 100 crates, each of which contains 4 trays, each of which in turn holds 50 individual machine parts. The 20,000 parts in the warehouse could constitute a concrete population of interest. If one were to sample 3 crates at random, select at random 2 trays from each and then select 5 parts from each tray at random, one has a classical finite population sampling problem. Probability/randomness has entered through the sampling that is necessitated because one is unwilling to collect data on all 20,000 parts.)

Section 5.5 of V&J introduces the first of these two approaches to assessing and dissecting overall variation for balanced hierarchical data. But it does not treat the finite population sampling ideas at all. The present chapter of these notes thus extends slightly the random effects analysis ideas discussed in §5.5 and then presents some simple material from the theory of finite population sampling.

4.2 More on Analysis Under the Hierarchical Random Effects Model

Consider the hierarchical random effects model with 2 levels of nesting discussed in §5.5.2 of V&J. We will continue the notations $y_{ijk}$, $\bar y_{ij}$, $\bar y_{i\cdot}$ and $\bar y_{\cdot\cdot}$ used in that section and also adopt some additional notation. For one thing, it will be useful to define some ranges. Let
$$R_{ij} = \max_k y_{ijk} - \min_k y_{ijk} = \text{the range of the } j\text{th sample within the } i\text{th level of A} ,$$
$$\Delta_i = \max_j \bar y_{ij} - \min_j \bar y_{ij} = \text{the range of the } J \text{ sample means within the } i\text{th level of A} ,$$
and
$$\Delta = \max_i \bar y_{i\cdot} - \min_i \bar y_{i\cdot} = \text{the range of the means for the } I \text{ levels of A} .$$

It will also be useful to consider the ANOVA sums of squares and mean squares alluded to briefly in §5.5.3. So let
$$SSTot = \sum_{i,j,k} (y_{ijk} - \bar y_{\cdot\cdot})^2 = (IJK - 1) \cdot (\text{the grand sample variance of all } IJK \text{ observations}) ,$$
$$SSC(B(A)) = \sum_{i,j,k} (y_{ijk} - \bar y_{ij})^2 = (K-1) \cdot (\text{the sum of all } IJ \text{ level C sample variances}) ,$$
$$SSB(A) = K \sum_{i,j} (\bar y_{ij} - \bar y_{i\cdot})^2 = K(J-1) \cdot (\text{the sum of all } I \text{ sample variances of } J \text{ means } \bar y_{ij})$$
and
$$SSA = KJ \sum_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = KJ(I-1) \cdot (\text{the sample variance of the } I \text{ means } \bar y_{i\cdot}) .$$
Note that in the notation of §5.5.2, $SSA = KJ(I-1)s_A^2$, $SSB(A) = K(J-1)\sum_{i=1}^{I} s_{Bi}^2$ and $SSC(B(A)) = (K-1)\sum_{i,j} s_{ij}^2 = IJ(K-1)\hat\sigma^2$. And it is an algebraic fact that $SSTot = SSA + SSB(A) + SSC(B(A))$.

Mean squares are derived from these sums of squares by dividing by appropriate degrees of freedom. That is, define
$$MSA := \frac{SSA}{I-1} , \quad MSB(A) := \frac{SSB(A)}{I(J-1)} \quad \text{and} \quad MSC(B(A)) := \frac{SSC(B(A))}{IJ(K-1)} .$$

Now these ranges, sums of squares and mean squares are interesting measures of variation in their own right, but are especially helpful when used to produce estimates of variance components and functions of variance components. For example, it is straightforward to verify that under the hierarchical random effects model (5.28) of V&J
$$E R_{ij} = d_2(K)\,\sigma ,$$
$$E \Delta_i = d_2(J)\,\sqrt{\sigma_\beta^2 + \sigma^2/K}$$
and
$$E \Delta = d_2(I)\,\sqrt{\sigma_\alpha^2 + \sigma_\beta^2/J + \sigma^2/JK} .$$
So, reasoning as in §2.2.2 of V&J (there in the context of two-way random effects models and gage R&R), reasonable range-based point estimates of the variance components are (with $\bar R$ the mean of the $R_{ij}$ and $\bar\Delta$ the mean of the $\Delta_i$)
$$\hat\sigma^2 = \left(\frac{\bar R}{d_2(K)}\right)^2 ,$$
$$\hat\sigma_\beta^2 = \max\left(0,\; \left(\frac{\bar\Delta}{d_2(J)}\right)^2 - \frac{\hat\sigma^2}{K}\right)$$
and
$$\hat\sigma_\alpha^2 = \max\left(0,\; \left(\frac{\Delta}{d_2(I)}\right)^2 - \frac{1}{J}\left(\frac{\bar\Delta}{d_2(J)}\right)^2\right) .$$
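These range-based estimates are easy to compute. The following Python sketch (our own helper names; the $d_2$ constants are the standard control chart values) implements them for balanced data in an $I \times J \times K$ array and checks them on simulated data with known variance components:

```python
import numpy as np

# d2 constants for small sample sizes (standard control chart tables)
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def range_based_components(y):
    """Range-based point estimates of (sigma_alpha^2, sigma_beta^2, sigma^2)
    for balanced hierarchical data y of shape (I, J, K), following the
    formulas above."""
    I, J, K = y.shape
    R_bar = np.mean(y.max(axis=2) - y.min(axis=2))                   # mean R_ij
    ybar_ij = y.mean(axis=2)
    D_bar = np.mean(ybar_ij.max(axis=1) - ybar_ij.min(axis=1))       # mean Delta_i
    ybar_i = ybar_ij.mean(axis=1)
    D = ybar_i.max() - ybar_i.min()                                  # Delta
    s2 = (R_bar / D2[K]) ** 2
    s2_beta = max(0.0, (D_bar / D2[J]) ** 2 - s2 / K)
    s2_alpha = max(0.0, (D / D2[I]) ** 2 - ((D_bar / D2[J]) ** 2) / J)
    return s2_alpha, s2_beta, s2

# quick check on simulated data with known components
rng = np.random.default_rng(1)
y = (2.0 * rng.normal(size=(5, 1, 1)) + 1.0 * rng.normal(size=(5, 4, 1))
     + 0.5 * rng.normal(size=(5, 4, 3)))
print(range_based_components(y))  # true values are (4, 1, 0.25); noisy for small I
```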

Now by applying linear model theory, or reasoning from V&J displays (5.30) and (5.32) and the fact that $E s_{ij}^2 = \sigma^2$, one can find expected values for the mean squares above. These are
$$EMSA = KJ\sigma_\alpha^2 + K\sigma_\beta^2 + \sigma^2 ,$$
$$EMSB(A) = K\sigma_\beta^2 + \sigma^2$$
and
$$EMSC(B(A)) = \sigma^2 .$$
And in a fashion completely parallel to the exposition in §1.4 of these notes, standard linear model theory implies that the quantities
$$\frac{IJ(K-1)\,MSC(B(A))}{EMSC(B(A))} , \quad \frac{I(J-1)\,MSB(A)}{EMSB(A)} \quad \text{and} \quad \frac{(I-1)\,MSA}{EMSA}$$
are independent $\chi^2$ random variables with respective degrees of freedom
$$IJ(K-1) , \quad I(J-1) \quad \text{and} \quad (I-1) .$$


Table 4.1: Balanced Data Hierarchical Random Effects Analysis ANOVA Table (2 Levels of Nesting)

Source     SS           df           MS            EMS
A          SSA          $I-1$        MSA           $KJ\sigma_\alpha^2 + K\sigma_\beta^2 + \sigma^2$
B(A)       SSB(A)       $I(J-1)$     MSB(A)        $K\sigma_\beta^2 + \sigma^2$
C(B(A))    SSC(B(A))    $IJ(K-1)$    MSC(B(A))     $\sigma^2$
Total      SSTot        $IJK-1$

These facts about sums of squares and mean squares for the hierarchical random effects model are conveniently summarized in the usual (hierarchical random effects model) ANOVA table (for two levels of nesting), Table 4.1. Further, the fact that the expected mean squares are simple linear combinations of the variance components $\sigma^2$, $\sigma_\beta^2$ and $\sigma_\alpha^2$ motivates the use of linear combinations of mean squares in the estimation of the variance components (as in §5.5.3 of V&J). In fact (as indicated in §5.5.3 of V&J) the standard ANOVA-based estimators
$$\hat\sigma^2 = \frac{SSC(B(A))}{IJ(K-1)} ,$$
$$\hat\sigma_\beta^2 = \frac{1}{K} \max\left(0,\; \frac{SSB(A)}{I(J-1)} - \hat\sigma^2\right)$$
and
$$\hat\sigma_\alpha^2 = \frac{1}{JK} \max\left(0,\; \frac{SSA}{I-1} - \frac{SSB(A)}{I(J-1)}\right)$$
are exactly the estimators (described without using ANOVA notation) in displays (5.29), (5.31) and (5.33) of V&J. The virtue of describing them in the present terms is to suggest/emphasize that all that was said in §1.4 and §1.5 (in the gage R&R context) about making standard errors for functions of mean squares and ANOVA-based confidence intervals for functions of variance components is equally true in the present context.
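The ANOVA-based estimates are equally easy to compute directly from a balanced data array. A sketch (our own helper name, applied to the same kind of simulated data as before):

```python
import numpy as np

def anova_components(y):
    """ANOVA-based estimates of (sigma_alpha^2, sigma_beta^2, sigma^2) for
    balanced hierarchical data y of shape (I, J, K), per the standard
    estimators above."""
    I, J, K = y.shape
    ybar_ij = y.mean(axis=2)
    ybar_i = ybar_ij.mean(axis=1)
    ybar = y.mean()
    SSC = np.sum((y - ybar_ij[:, :, None]) ** 2)
    SSB = K * np.sum((ybar_ij - ybar_i[:, None]) ** 2)
    SSA = K * J * np.sum((ybar_i - ybar) ** 2)
    MSC, MSB, MSA = SSC / (I * J * (K - 1)), SSB / (I * (J - 1)), SSA / (I - 1)
    s2 = MSC
    s2_beta = max(0.0, MSB - MSC) / K
    s2_alpha = max(0.0, MSA - MSB) / (J * K)
    return s2_alpha, s2_beta, s2

rng = np.random.default_rng(1)
y = (2.0 * rng.normal(size=(5, 1, 1)) + 1.0 * rng.normal(size=(5, 4, 1))
     + 0.5 * rng.normal(size=(5, 4, 3)))
print(anova_components(y))   # estimates of the true values (4, 1, 0.25)
```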

For example, the formula (1.3) of these notes can be applied to derive standard errors for $\hat\sigma_\beta^2$ and $\hat\sigma_\alpha^2$ immediately above. Or, since
$$\sigma_\beta^2 = \frac{1}{K}\,EMSB(A) - \frac{1}{K}\,EMSC(B(A))$$
and
$$\sigma_\alpha^2 = \frac{1}{JK}\,EMSA - \frac{1}{JK}\,EMSB(A)$$
are both of the form (1.4), the material of §1.5 can be used to set confidence limits for these quantities.

As a final note in this discussion of what is possible under the hierarchical random effects model, it is worth noting that while the present discussion has been confined to a balanced data framework, Problem 4.8 shows that at least point estimation of variance components can be done in a fairly elementary fashion even in unbalanced data contexts.

4.3 Finite Population Sampling and Balanced Hierarchical Structures

This brief subsection is meant to illustrate the kinds of things that can be done with finite population sampling theory in terms of estimating overall variability in a (balanced) hierarchical concrete population of items and dissecting that variability.

Consider first a finite population consisting of $NM$ items arranged into $N$ levels of A, with $M$ levels of B within each level of A. (For example, there might be $N$ boxes, each containing $M$ widgets. Or there might be $N$ days, on each of which $M$ items are manufactured.) Let
$$y_{ij} = \text{a measurement on the item at level } i \text{ of A and level } j \text{ of B within}$$