Supplemental Material to Introduction to Statistical Quality Control, 6th Edition



    About the Supplemental Text Material

I have prepared supplemental text material to accompany the 6th edition of Introduction to Statistical Quality Control. This material consists of (1) additional background reading on some aspects of statistics and statistical quality control and improvement, (2) extensions of and elaboration on some textbook topics, and (3) some new topics that I could not easily find a home for in the text without making the book much too long. Much of this material has been prepared in at least partial response to the many excellent and very helpful suggestions that have been made over the years by textbook users. However, sometimes there just was no way to easily accommodate their suggestions directly in the book. Some of the supplemental material is also in response to FAQs, or frequently asked questions, from students. I have also provided a list of references for this supplemental material that are not cited in the textbook.

Feedback from my colleagues indicates that this book is used in a variety of ways. Most often, it is used as the textbook in an upper-division undergraduate course on statistical quality control and improvement. However, there are a significant number of instructors that use the book as the basis of a graduate-level course, or offer a course taken by a mixture of advanced undergraduates and graduate students. Obviously the topical content and depth of coverage varies widely in these courses. Consequently, I have included some supplemental material on topics that might be of interest in a more advanced undergraduate or graduate-level course.

There is considerable personal bias in my selection of topics for the supplemental material. The coverage is far from comprehensive.

I have not felt as constrained about the mathematical level or statistical background of the readers in the supplemental material as I have tried to be in writing the textbook. There are sections of the supplemental material that will require more background in statistics than is required to read the text material. However, I think that many instructors will be able to use selected portions of this supplemental material in their courses quite effectively, depending on the maturity and background of the students.

    Supplemental Text Material Contents

    Chapter 3

    S3-1. Independent Random Variables

    S3-2. Development of the Poisson Distribution

    S3-3. The Mean and Variance of the Normal Distribution

    S3-4. More about the Lognormal Distribution

    S3-5. More about the Gamma Distribution

    S3-6. The Failure Rate for the Exponential Distribution

    S3-7. The Failure Rate for the Weibull Distribution

    Chapter 4

    S4-1. Random Samples

    S4-2. Expected Value and Variance Operators


S4-3. Proof That $E(\bar{x}) = \mu$ and $E(s^2) = \sigma^2$

    S4-4. More about Parameter Estimation

S4-5. Proof That $E(s) \neq \sigma$

S4-6. More about Checking Assumptions in the t-Test

S4-7. Expected Mean Squares in the Single-Factor Analysis of Variance

    Chapter 5

S5-1. A Simple Alternative to Runs Rules on the $\bar{x}$ Chart

    Chapter 6

S6-1. $s^2$ Is Not Always an Unbiased Estimator of $\sigma^2$

S6-2. Should We Use $d_2$ or $d_2^*$ in Estimating $\sigma$ via the Range Method?

    S6-3. Determining When the Process has Shifted

    S6-4. More about Monitoring Variability with Individual Observations

    S6-5. Detecting Drifts versus Shifts in the Process Mean

S6-6. The Mean Square Successive Difference as an Estimator of $\sigma^2$

    Chapter 7

    S7-1. Probability Limits on Control Charts

    Chapter 8

    S8-1. Fixed Versus Random Factors in the Analysis of Variance

    S8-2. More about Analysis of Variance Methods for Measurement Systems Capability Studies

Chapter 9

S9-1. The Markov Chain Approach for Finding the ARLs for Cusum and EWMA Control Charts

    S9-2. Integral Equations versus Markov Chains for Finding the ARL

    Chapter 10

    S10-1. Difference Control Charts

    S10-2. Control Charts for Contrasts

    S10-3. Run Sum and Zone Control Charts

    S10-4. More about Adaptive Control Charts


    Chapter 11

S11-1. Multivariate Cusum Control Charts

    Chapter 13

    S13-1. Guidelines for Planning Experiments

S13-2. Using a t-Test for Detecting Curvature

    S13-3. Blocking in Designed Experiments

    S13-4. More about Expected Mean Squares in the Analysis of Variance

    Chapter 14

    S14-1. Response Surface Designs

    S14-2. Fitting Regression Models by Least Squares

    S14-3. More about Robust Design and Process Robustness Studies

    Chapter 15

    S15-1. A Lot Sensitive Compliance (LTPD) Sampling Plan

    S15-2. Consideration of Inspection Errors


    Supplemental Material for Chapter 3

    S3.1. Independent Random Variables

    Preliminary Remarks

Readers encounter random variables throughout the textbook. An informal definition of and notation for random variables is used. A random variable may be thought of informally as any variable for which the measured or observed value depends on a random or chance mechanism. That is, the value of a random variable cannot be known in advance of actual observation of the phenomena. Formally, of course, a random variable is a function that assigns a real number to each outcome in the sample space of the observed phenomena. Furthermore, it is customary to distinguish between the random variable and its observed value or realization by using an upper-case letter to denote the random variable (say X) and the lower-case letter x to denote the actual numerical value that results from an observation or a measurement. This formal notation is not used in the book because (1) it is not widely employed in the statistical quality control field and (2) it is usually quite clear from the context whether we are discussing the random variable or its realization.

Independent Random Variables

In the textbook, we make frequent use of the concept of independent random variables. Most readers have been exposed to this in a basic statistics course, but here a brief review of the concept is given. For convenience, we consider only the case of continuous random variables. For the case of discrete random variables, refer to Montgomery and Runger (2007).

Often there will be two or more random variables that jointly define some physical phenomenon of interest. For example, suppose we consider injection-molded components used to assemble a connector for an automotive application. To adequately describe the connector, we might need to study both the hole interior diameter and the wall thickness of the component. Let $x_1$ represent the hole interior diameter and $x_2$ represent the wall thickness. The joint probability distribution (or density function) of these two continuous random variables can be specified by providing a method for calculating the probability that $x_1$ and $x_2$ assume a value in any region R of two-dimensional space, where the region R is often called the range space of the random variable. This is analogous to the probability density function for a single random variable. Let this joint probability density function be denoted by $f(x_1, x_2)$. Now the double integral of this joint probability density function over a specified region R provides the probability that $x_1$ and $x_2$ assume values in the range space R.

    A joint probability density function has the following properties:

a. $f(x_1, x_2) \ge 0$ for all $x_1, x_2$

b. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = 1$

c. For any region R of two-dimensional space, $P\{(x_1, x_2) \in R\} = \iint_R f(x_1, x_2)\,dx_1\,dx_2$

The two random variables $x_1$ and $x_2$ are independent if $f(x_1, x_2) = f_1(x_1)\,f_2(x_2)$, where $f_1(x_1)$ and $f_2(x_2)$ are the marginal probability distributions of $x_1$ and $x_2$, respectively, defined as

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \quad\text{and}\quad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1$$

In general, if there are p random variables $x_1, x_2, \ldots, x_p$, then the joint probability density function is $f(x_1, x_2, \ldots, x_p)$, with the properties:

a. $f(x_1, x_2, \ldots, x_p) \ge 0$ for all $x_1, x_2, \ldots, x_p$


b. $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_p)\,dx_1\,dx_2\cdots dx_p = 1$

c. For any region R of p-dimensional space, $P\{(x_1, x_2, \ldots, x_p) \in R\} = \int\cdots\int_R f(x_1, x_2, \ldots, x_p)\,dx_1\,dx_2\cdots dx_p$

The random variables $x_1, x_2, \ldots, x_p$ are independent if

$$f(x_1, x_2, \ldots, x_p) = f_1(x_1)\,f_2(x_2)\cdots f_p(x_p)$$

where $f_i(x_i)$ are the marginal probability distributions of $x_1, x_2, \ldots, x_p$, respectively, defined as

$$f_i(x_i) = \int\cdots\int f(x_1, x_2, \ldots, x_p)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_p$$
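As a quick numerical illustration of this definition, the following sketch (an addition, not part of the original text; it assumes Python with numpy and scipy available) recovers the marginals of a product-form joint density by integration and confirms that their product reproduces the joint density. The bivariate exponential density used here is a hypothetical example chosen only for illustration.

```python
# A minimal sketch (not from the text; assumes scipy/numpy): recover the
# marginals of a product-form joint density by integration and confirm that
# f(x1, x2) = f1(x1) * f2(x2). The density is a hypothetical example.
import numpy as np
from scipy import integrate

def f_joint(x1, x2):
    # independent exponential(1) and exponential(2) components, x1, x2 >= 0
    return np.exp(-x1) * 2.0 * np.exp(-2.0 * x2)

def f1(x1):
    # marginal of x1: integrate the joint density over x2
    return integrate.quad(lambda x2: f_joint(x1, x2), 0, np.inf)[0]

def f2(x2):
    # marginal of x2: integrate the joint density over x1
    return integrate.quad(lambda x1: f_joint(x1, x2), 0, np.inf)[0]

x1, x2 = 0.7, 1.3
print(f_joint(x1, x2), f1(x1) * f2(x2))   # the two values agree
```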

    S3.2. Development of the Poisson Distribution

The Poisson distribution is widely used in statistical quality control and improvement, frequently as the underlying probability model for count data. As noted in Section 3.2.3 of the text, the Poisson distribution can be derived as a limiting form of the binomial distribution, and it can also be developed from a probability argument based on the birth and death process. We now give a summary of both developments.

    The Poisson Distribution as a Limiting Form of the Binomial Distribution

Consider the binomial distribution

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

Let $\lambda = np$, so that $p = \lambda/n$. We may now write the binomial distribution as

$$p(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\left(\frac{\lambda}{n}\right)^{x}\left(1-\frac{\lambda}{n}\right)^{n-x}$$

$$= \frac{\lambda^x}{x!}\,(1)\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{x-1}{n}\right)\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}$$

Let $n \to \infty$ and $p \to 0$ so that $\lambda = np$ remains constant. The terms

$$\left(1-\frac{1}{n}\right), \left(1-\frac{2}{n}\right), \ldots, \left(1-\frac{x-1}{n}\right) \quad\text{and}\quad \left(1-\frac{\lambda}{n}\right)^{-x}$$

all approach unity. Furthermore,

$$\left(1-\frac{\lambda}{n}\right)^{n} \to e^{-\lambda} \quad\text{as}\quad n \to \infty$$

Thus, upon substitution we see that the limiting form of the binomial distribution is

$$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$


    which is the Poisson distribution.
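The limiting argument is easy to check numerically. The following sketch (an addition; assumes scipy) holds $\lambda = np$ fixed while n grows and compares the binomial and Poisson probabilities of the same count.

```python
# Sketch (an addition; assumes scipy): hold lambda = n*p fixed and let n grow.
# The binomial probability of any fixed count approaches the Poisson value.
from scipy.stats import binom, poisson

lam = 2.0
for n in (10, 100, 10_000):
    p = lam / n
    print(n, binom.pmf(3, n, p), poisson.pmf(3, lam))  # converges to Poisson
```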

    Development of the Poisson Distribution from the Poisson Process

Consider a collection of time-oriented events, arbitrarily called arrivals or births. Let $x_t$ be the number of these arrivals or births that occur in the interval $[0, t)$. Note that the range space of $x_t$ is $R = \{0, 1, \ldots\}$. Assume that the number of births during non-overlapping time intervals are independent random variables, and that there is a positive constant $\lambda$ such that for any small time interval $\Delta t$, the following statements are true:

1. The probability that exactly one birth will occur in an interval of length $\Delta t$ is $\lambda \Delta t$.

2. The probability that zero births will occur in the interval is $1 - \lambda \Delta t$.

3. The probability that more than one birth will occur in the interval is zero.

The parameter $\lambda$ is often called the mean arrival rate or the mean birth rate. This type of process, in which the probability of observing exactly one event in a small interval of time is constant (or the probability of occurrence of an event is directly proportional to the length of the time interval), and the occurrence of events in non-overlapping time intervals is independent, is called a Poisson process.

In the following, let

$$P\{x_t = x\} = p_x(t), \quad x = 0, 1, 2, \ldots$$

Suppose that there have been no births up to time t. The probability that there are no births at the end of time $t + \Delta t$ is

$$p_0(t + \Delta t) = (1 - \lambda \Delta t)\, p_0(t)$$

Note that

$$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda\, p_0(t)$$

so consequently

$$\lim_{\Delta t \to 0} \frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = p_0'(t) = -\lambda\, p_0(t)$$

For $x > 0$ births at the end of time $t + \Delta t$ we have

$$p_x(t + \Delta t) = \lambda \Delta t\, p_{x-1}(t) + (1 - \lambda \Delta t)\, p_x(t)$$

and

$$\lim_{\Delta t \to 0} \frac{p_x(t + \Delta t) - p_x(t)}{\Delta t} = p_x'(t) = \lambda\, p_{x-1}(t) - \lambda\, p_x(t)$$

Thus we have a system of differential equations that describe the arrivals or births:

$$p_0'(t) = -\lambda\, p_0(t) \quad\text{for } x = 0$$

$$p_x'(t) = \lambda\, p_{x-1}(t) - \lambda\, p_x(t) \quad\text{for } x = 1, 2, \ldots$$


The solution to this set of equations is

$$p_x(t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \quad x = 0, 1, 2, \ldots$$

    Obviously for a fixed value of t this is the Poisson distribution.
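A minimal simulation sketch of this result (an addition, not from the original text; it assumes numpy): a Poisson process with rate $\lambda$ can be generated from independent exponential($\lambda$) interarrival times, and the counts in $[0, t)$ then behave like Poisson($\lambda t$) observations, with mean and variance both near $\lambda t$.

```python
# Simulation sketch (an addition; assumes numpy): generate a Poisson process
# from independent exponential(lambda) interarrival times and count the
# arrivals in [0, t). The counts should have mean and variance lambda * t.
import numpy as np

rng = np.random.default_rng(1)
lam, t, reps = 3.0, 2.0, 100_000

inter = rng.exponential(1.0 / lam, size=(reps, 50))  # 50 interarrivals suffice
arrivals = np.cumsum(inter, axis=1)                  # arrival epochs
counts = (arrivals < t).sum(axis=1)                  # number of births in [0, t)

print(counts.mean(), lam * t)   # sample mean of the counts vs. lambda*t = 6
print(counts.var(), lam * t)    # for a Poisson count, variance = mean
```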

    S3.3. The Mean and Variance of the Normal Distribution

In Section 3.3.1 we introduce the normal distribution, with probability density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$

and we stated that $\mu$ and $\sigma^2$ are the mean and variance, respectively, of the distribution. We now show that this claim is correct.

Note that $f(x) \ge 0$. We first evaluate the integral $I = \int_{-\infty}^{\infty} f(x)\,dx$, showing that it is equal to 1. In the integral, change the variable of integration to $z = (x - \mu)/\sigma$. Then

$$I = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\,dz$$

Since $I > 0$, if $I^2 = 1$, then $I = 1$. Now we may write

$$I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \frac{1}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy$$

If we switch to polar coordinates, then $x = r\cos\theta$, $y = r\sin\theta$ and

$$I^2 = \frac{1}{2\pi} \int_0^{2\pi}\!\!\int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} d\theta = \frac{1}{2\pi}(2\pi) = 1$$

So we have shown that $f(x)$ has the properties of a probability density function.

The integrand obtained by the substitution $z = (x - \mu)/\sigma$ is, of course, the standard normal distribution, an important special case of the more general normal distribution. The standard normal probability density function has a special notation, namely

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$

and the cumulative standard normal distribution is

$$\Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt$$

Several useful properties of the standard normal distribution can be found by basic calculus:

1. $\phi(-z) = \phi(z)$ for all real z, so $\phi(z)$ is an even function (symmetric about 0) of z

2. $\phi'(z) = -z\,\phi(z)$


3. $\phi''(z) = (z^2 - 1)\,\phi(z)$

Consequently, $\phi(z)$ has a unique maximum at $z = 0$, inflection points at $z = \pm 1$, and both $\phi(z) \to 0$ and $\phi'(z) \to 0$ as $z \to \pm\infty$.

The mean and variance of the standard normal distribution are found as follows:

$$E(z) = \int_{-\infty}^{\infty} z\,\phi(z)\,dz = -\int_{-\infty}^{\infty} \phi'(z)\,dz = -\phi(z)\Big|_{-\infty}^{\infty} = 0$$

and, using $z^2\phi(z) = \phi(z) + \phi''(z)$ from property 3,

$$E(z^2) = \int_{-\infty}^{\infty} z^2\,\phi(z)\,dz = \int_{-\infty}^{\infty} \left[\phi(z) + \phi''(z)\right]dz = 1 + \phi'(z)\Big|_{-\infty}^{\infty} = 1 + 0 = 1$$

Because the variance of a random variable can be expressed in terms of expectation as $\sigma^2 = E(z^2) - [E(z)]^2$, we have shown that the mean and variance of the standard normal distribution are 0 and 1, respectively.

Now consider the case where x follows the more general normal distribution. Based on the substitution $z = (x - \mu)/\sigma$, we have

$$E(x) = \int_{-\infty}^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = \int_{-\infty}^{\infty} (\mu + \sigma z)\,\phi(z)\,dz = \mu\int_{-\infty}^{\infty} \phi(z)\,dz + \sigma\int_{-\infty}^{\infty} z\,\phi(z)\,dz = \mu(1) + \sigma(0) = \mu$$

and

$$E(x^2) = \int_{-\infty}^{\infty} x^2\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = \int_{-\infty}^{\infty} (\mu + \sigma z)^2\,\phi(z)\,dz = \mu^2\int_{-\infty}^{\infty} \phi(z)\,dz + 2\mu\sigma\int_{-\infty}^{\infty} z\,\phi(z)\,dz + \sigma^2\int_{-\infty}^{\infty} z^2\,\phi(z)\,dz = \mu^2 + \sigma^2$$

Therefore, it follows that $V(x) = E(x^2) - [E(x)]^2 = \mu^2 + \sigma^2 - \mu^2 = \sigma^2$.
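The claim is also easy to confirm numerically. The sketch below (an addition; assumes scipy, and the values $\mu = 10$, $\sigma = 2$ are hypothetical) integrates the normal density and its first two moments.

```python
# Numerical sketch (an addition; assumes scipy): integrate the normal density
# and its first two moments for hypothetical values mu = 10, sigma = 2.
import numpy as np
from scipy.integrate import quad

mu, sigma = 10.0, 2.0
f = lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

total = quad(f, -np.inf, np.inf)[0]                       # integrates to 1
mean  = quad(lambda x: x * f(x), -np.inf, np.inf)[0]      # E(x) = mu
ex2   = quad(lambda x: x ** 2 * f(x), -np.inf, np.inf)[0] # E(x^2)
print(total, mean, ex2 - mean ** 2)                       # 1.0, 10.0, 4.0
```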


    S3.4. More about the Lognormal Distribution

The lognormal distribution is a general distribution of wide applicability. The lognormal distribution is defined only for positive values of the random variable x, and the probability density function is

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0$$

The parameters of the lognormal distribution are $\mu$ ($-\infty < \mu < \infty$) and $\sigma^2 > 0$. The lognormal random variable is related to the normal random variable in that $y = \ln x$ is normally distributed with mean $\mu$ and variance $\sigma^2$.

The mean and variance of the lognormal distribution are

$$E(x) = e^{\mu + \sigma^2/2}$$

$$V(x) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$

The median and mode of the lognormal distribution are

$$\tilde{x} = e^{\mu} \quad\text{and}\quad \mathrm{mode} = e^{\mu - \sigma^2}$$

In general, the k-th origin moment of the lognormal random variable is

$$E(x^k) = e^{k\mu + k^2\sigma^2/2}$$

Like the gamma and Weibull distributions, the lognormal finds application in reliability engineering, often as a model for survival time of components or systems. Some important properties of the lognormal distribution are:

1. If $x_1$ and $x_2$ are independent lognormal random variables with parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, respectively, then $y = x_1 x_2$ is a lognormal random variable with parameters $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$.

2. If $x_1, x_2, \ldots, x_k$ are independently and identically distributed lognormal random variables with parameters $\mu$ and $\sigma^2$, then the geometric mean of the $x_i$, or $\left(\prod_{i=1}^{k} x_i\right)^{1/k}$, has a lognormal distribution with parameters $\mu$ and $\sigma^2/k$.

3. If x is a lognormal random variable with parameters $\mu$ and $\sigma^2$, and if a, b, and c are constants such that $b = e^c$, then the random variable $y = b x^a$ has a lognormal distribution with parameters $c + a\mu$ and $a^2\sigma^2$.
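The moment formulas above are easy to spot-check by simulation. The following sketch (an addition; plain numpy, with hypothetical parameter values) exponentiates normal variates and compares the sample mean and variance to $E(x)$ and $V(x)$.

```python
# Monte Carlo sketch (an addition; assumes numpy; parameter values are
# hypothetical): x = exp(y) with y ~ N(mu, sigma^2) should have
# E(x) = exp(mu + sigma^2/2) and V(x) = exp(2*mu + sigma^2)*(exp(sigma^2) - 1).
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 1.0, 0.5
x = np.exp(rng.normal(mu, sigma, size=1_000_000))

print(x.mean(), np.exp(mu + sigma**2 / 2))                           # ~3.08
print(x.var(), np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1))   # ~2.69
```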

    S3.5. More about the Gamma Distribution

The gamma distribution is introduced in Section 3.3.4. The gamma probability density function is

$$f(x) = \frac{\lambda}{\Gamma(r)}\,(\lambda x)^{r-1} e^{-\lambda x}, \quad x \ge 0$$

where $r > 0$ is a shape parameter and $\lambda > 0$ is a scale parameter. The parameter r is called a shape parameter because it determines the basic shape of the graph of the density function. For example, if r = 1, the gamma


distribution reduces to an exponential distribution. There are actually three basic shapes: $r < 1$, or hyperexponential; $r = 1$, or exponential; and $r > 1$, or unimodal with right skew.

The cumulative distribution function of the gamma is

$$F(x; r, \lambda) = \int_0^x \frac{\lambda}{\Gamma(r)}\,(\lambda t)^{r-1} e^{-\lambda t}\,dt$$

The substitution $u = \lambda t$ in this integral results in $F(x; r, \lambda) = F(\lambda x; r, 1)$, which depends on $\lambda$ only through the product $\lambda x$. We typically call such a parameter a scale parameter. It can be important to have a scale parameter in a probability distribution so that the results do not depend on the scale of measurement actually used. For example, suppose that we are measuring time in months, and $\lambda = 1/6$. The probability that x is less than or equal to 12 months is $F(12/6; r, 1) = F(2; r, 1)$. If we wish to consider measuring time in weeks, then $\lambda = 1/24$, and the probability that x is less than or equal to 48 weeks is just $F(48/24; r, 1) = F(2; r, 1)$. Therefore, different scales of measurement can be accommodated by changing the scale parameter without having to change to a more general form of the distribution.

When r is an integer, the gamma distribution is sometimes called the Erlang distribution. Another special case of the gamma distribution arises when we let $r = 1/2, 1, 3/2, 2, \ldots$ and $\lambda = 1/2$; this is the chi-square distribution with $\nu = 2r = 1, 2, 3, \ldots$ degrees of freedom. The chi-square distribution is very important in statistical inference.
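The months-versus-weeks example can be reproduced directly. The sketch below (an addition; note that scipy.stats.gamma is parameterized by the shape r and a scale equal to $1/\lambda$, and the shape value used is hypothetical) shows the three probabilities agreeing.

```python
# Sketch (an addition): scipy.stats.gamma takes the shape r and a scale equal
# to 1/lambda, so the months/weeks example can be reproduced directly.
from scipy.stats import gamma

r = 2.5                                   # any shape value works here
print(gamma.cdf(12, r, scale=6.0))        # time in months, lambda = 1/6
print(gamma.cdf(48, r, scale=24.0))       # time in weeks, lambda = 1/24
print(gamma.cdf(2, r, scale=1.0))         # both equal F(2; r, 1)
```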

    S3.6. The Failure Rate for the Exponential Distribution

The exponential distribution

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

was introduced in Section 3.3.3 of the text. The exponential distribution is frequently used in reliability engineering as a model for the lifetime or time to failure of a component or system. Generally, we define the reliability function of the unit as

$$R(t) = P\{x > t\} = 1 - \int_0^t f(x)\,dx = 1 - F(t)$$

where, of course, $F(t)$ is the cumulative distribution function. In biomedical applications, the reliability function is usually called the survival function. For the exponential distribution, the reliability function is

$$R(t) = e^{-\lambda t}$$

    The Hazard Function

The mean and variance of a distribution are quite important in reliability applications, but an additional property called the hazard function or the instantaneous failure rate is also useful. The hazard function is the conditional density function of failure at time t, given that the unit has survived until time t. Therefore, letting X denote the random variable and x denote the realization,


$$h(x) = \lim_{\Delta x \to 0} \frac{P\{x < X \le x + \Delta x \mid X > x\}}{\Delta x} = \lim_{\Delta x \to 0} \frac{P\{x < X \le x + \Delta x\}}{\Delta x\, P\{X > x\}} = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x\,[1 - F(x)]} = \frac{f(x)}{1 - F(x)}$$

It turns out that specifying a hazard function completely determines the cumulative distribution function (and vice versa).

The Hazard Function for the Exponential Distribution

For the exponential distribution, the hazard function is

$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda$$

That is, the hazard function for the exponential distribution is constant, or the failure rate is just the reciprocal of the mean time to failure.

A constant failure rate implies that the reliability of the unit at time t does not depend on its age. This may be a reasonable assumption for some types of units, such as electrical components, but it is probably unreasonable for mechanical components. It is probably not a good assumption for many types of system-level products that are made up of many components (such as an automobile). Generally, an increasing hazard function indicates that the unit is more likely to fail in the next increment of time than it would have been in an earlier increment of time of the same length. This is likely due to aging or wear.

Despite the apparent simplicity of its hazard function, the exponential distribution has been an important distribution in reliability engineering. This is partly because the constant failure rate assumption is probably not unreasonable over some region of the unit's life.

    S3.7. The Failure Rate for the Weibull Distribution

The instantaneous failure rate or the hazard function was defined in Section S3.6 of the Supplemental Text Material. For the Weibull distribution, the hazard function is


$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\dfrac{\beta}{\theta}\left(\dfrac{x}{\theta}\right)^{\beta-1} e^{-(x/\theta)^{\beta}}}{e^{-(x/\theta)^{\beta}}} = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta-1}$$

Note that if $\beta = 1$ the Weibull hazard function is constant. This should be no surprise, since for $\beta = 1$ the Weibull distribution reduces to the exponential. When $\beta > 1$, the Weibull hazard function increases, approaching $\infty$ as $x \to \infty$. Consequently, the Weibull is a fairly common choice as a model for components or systems that experience deterioration due to wear-out or fatigue. For the case where $\beta < 1$, the Weibull hazard function decreases, approaching 0 as $x \to \infty$.

For comparison purposes, note that the hazard function for the gamma distribution with parameters r and $\lambda$ is also constant for the case r = 1 (the gamma also reduces to the exponential when r = 1). Also, when r > 1 the hazard function increases, and when r < 1 the hazard function decreases. However, when r > 1 the hazard function approaches $\lambda$ from below, while if r < 1 the hazard function approaches $\lambda$ from above. Therefore, even though the graphs of the gamma and Weibull distributions can look very similar, and they can both produce reasonable fits to the same sample of data, they clearly have very different characteristics in terms of describing survival or reliability data.
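The contrast is easy to see numerically. The following sketch (an addition; assumes scipy, and the parameter values $\beta = 2$, r = 2, $\lambda = 1$ are hypothetical) evaluates $h(x) = f(x)/[1 - F(x)]$ for both distributions at a few points.

```python
# Sketch (an addition; assumes scipy; beta = 2 and r = 2 are hypothetical):
# evaluate h(x) = f(x) / [1 - F(x)] for the Weibull and the gamma. Both
# increase, but the Weibull hazard is unbounded while the gamma hazard
# approaches lambda = 1 from below.
import numpy as np
from scipy.stats import weibull_min, gamma

x = np.array([0.5, 1.0, 2.0, 5.0, 20.0])
beta, r = 2.0, 2.0

print(weibull_min.pdf(x, beta) / weibull_min.sf(x, beta))  # h(x) = 2x here
print(gamma.pdf(x, r) / gamma.sf(x, r))                    # x/(1+x) -> 1
```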


    Supplemental Material for Chapter 4

    S4.1. Random Samples

To properly apply many statistical techniques, the sample drawn from the population of interest must be a random sample. To properly define a random sample, let x be a random variable that represents the result of selecting one observation from the population of interest. Let $f(x)$ be the probability distribution of x.

Now suppose that n observations (a sample) are obtained independently from the population under unchanging conditions. That is, we do not let the outcome from one observation influence the outcome from another observation. Let $x_i$ be the random variable that represents the observation obtained on the i-th trial. Then the observations $x_1, x_2, \ldots, x_n$ are a random sample.

In a random sample the marginal probability distributions $f(x_1), f(x_2), \ldots, f(x_n)$ are all identical, the observations in the sample are independent, and by definition, the joint probability distribution of the random sample is $f(x_1, x_2, \ldots, x_n) = f(x_1)\,f(x_2)\cdots f(x_n)$.

    S4.2. Expected Value and Variance Operators

Readers should have prior exposure to mathematical expectation from a basic statistics course. Here some of the basic properties of expectation are reviewed.

The expected value of a random variable x is denoted by $E(x)$ and is given by

$$E(x) = \begin{cases} \displaystyle\sum_{\text{all } x_i} x_i\, p(x_i), & x \text{ is a discrete random variable} \\[6pt] \displaystyle\int_{-\infty}^{\infty} x f(x)\,dx, & x \text{ is a continuous random variable} \end{cases}$$

The expectation of a random variable is very useful in that it provides a straightforward characterization of the distribution, and it has a simple practical interpretation as the center of mass, centroid, or mean of the distribution.

Now suppose that y is a function of the random variable x, say $y = h(x)$. Note that y is also a random variable. The expectation of $h(x)$ is defined as

$$E[h(x)] = \begin{cases} \displaystyle\sum_{\text{all } x_i} h(x_i)\, p(x_i), & x \text{ is a discrete random variable} \\[6pt] \displaystyle\int_{-\infty}^{\infty} h(x) f(x)\,dx, & x \text{ is a continuous random variable} \end{cases}$$

An interesting result, sometimes called the theorem of the unconscious statistician, states that if x is a continuous random variable with probability density function $f(x)$ and $y = h(x)$ is a function of x having probability density function $g(y)$, then the expectation of y can be found either by using the definition of expectation with $g(y)$ or in terms of its definition as the expectation of a function of x with respect to the probability density function of x. That is, we may write either

$$E(y) = \int_{-\infty}^{\infty} y\, g(y)\,dy$$

or

$$E(y) = E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$


The name for this theorem comes from the fact that we often apply it without consciously thinking about whether the theorem is true in our particular case.
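The theorem can be illustrated numerically. In the sketch below (an addition; assumes scipy), $y = h(x) = x^2$ with x standard normal, so y has a chi-square distribution with 1 degree of freedom; both routes to $E(y)$ give the same value.

```python
# Numerical sketch (an addition; assumes scipy): with y = h(x) = x^2 and x
# standard normal, y is chi-square with 1 df, and both routes to E(y) agree.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, chi2

e1 = quad(lambda x: x**2 * norm.pdf(x), -np.inf, np.inf)[0]  # E[h(x)] via f(x)
e2 = quad(lambda y: y * chi2.pdf(y, df=1), 0, np.inf)[0]     # E(y) via g(y)
print(e1, e2)   # both equal 1
```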

    Useful Properties of Expectation I:

Let x be a random variable with mean $\mu$, and let c be a constant. Then

1. $E(c) = c$

2. $E(x) = \mu$

3. $E(cx) = cE(x) = c\mu$

4. $E[c\,h(x)] = cE[h(x)]$

5. If $c_1$ and $c_2$ are constants and $h_1$ and $h_2$ are functions, then $E[c_1 h_1(x) + c_2 h_2(x)] = c_1 E[h_1(x)] + c_2 E[h_2(x)]$

Because of property 5, expectation is called a linear (or distributive) operator.

Now consider the function $h(x) = (x - c)^2$ where c is a constant, and suppose that $E[(x - c)^2]$ exists. To find the value of c for which $E[(x - c)^2]$ is a minimum, write

$$E[(x - c)^2] = E[x^2 - 2xc + c^2] = E(x^2) - 2cE(x) + c^2$$

Now the derivative of $E[(x - c)^2]$ with respect to c is $-2E(x) + 2c$, and this derivative is zero when $c = E(x)$. Therefore, $E[(x - c)^2]$ is a minimum when $c = E(x)$.

The variance of the random variable x is defined as

$$V(x) = E[(x - \mu)^2] = \sigma^2$$

and we usually call $V(\cdot) = E[(\cdot - \mu)^2]$ the variance operator. It is straightforward to show that if c is a constant, then $V(cx) = c^2\sigma^2$. The variance is analogous to the moment of inertia in mechanics.

    Useful Properties of Expectation II:

Let $x_1$ and $x_2$ be random variables with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and let $c_1$ and $c_2$ be constants. Then

1. $E(x_1 + x_2) = \mu_1 + \mu_2$

2. It is possible to show that $V(x_1 + x_2) = \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{Cov}(x_1, x_2)$, where

$$\mathrm{Cov}(x_1, x_2) = E[(x_1 - \mu_1)(x_2 - \mu_2)]$$


is the covariance of the random variables $x_1$ and $x_2$. The covariance is a measure of the linear association between $x_1$ and $x_2$. More specifically, we may show that if $x_1$ and $x_2$ are independent, then $\mathrm{Cov}(x_1, x_2) = 0$.

3. $V(x_1 - x_2) = \sigma_1^2 + \sigma_2^2 - 2\,\mathrm{Cov}(x_1, x_2)$

4. If the random variables $x_1$ and $x_2$ are independent, $V(x_1 \pm x_2) = \sigma_1^2 + \sigma_2^2$

5. If the random variables $x_1$ and $x_2$ are independent, $E(x_1 x_2) = E(x_1)E(x_2) = \mu_1\mu_2$

6. Regardless of whether $x_1$ and $x_2$ are independent, in general

$$E\left(\frac{x_1}{x_2}\right) \ne \frac{E(x_1)}{E(x_2)}$$

7. For the single random variable x, $V(x + x) = V(2x) = 4\sigma^2$, because $\mathrm{Cov}(x, x) = \sigma^2$.

    Moments

Although we do not make much use of the notion of the moments of a random variable in the book, for completeness we give the definition. Let the function of the random variable x be

$$h(x) = x^k$$

where k is a positive integer. Then the expectation of $h(x) = x^k$ is called the k-th moment about the origin of the random variable x and is given by

$$\mu_k' = E(x^k) = \begin{cases} \displaystyle\sum_{\text{all } x_i} x_i^k\, p(x_i), & x \text{ is a discrete random variable} \\[6pt] \displaystyle\int_{-\infty}^{\infty} x^k f(x)\,dx, & x \text{ is a continuous random variable} \end{cases}$$

Note that the first origin moment is just the mean $\mu$ of the random variable x. The second origin moment is

$$\mu_2' = E(x^2) = \sigma^2 + \mu^2$$

Moments about the mean are defined as

$$\mu_k = E[(x - \mu)^k] = \begin{cases} \displaystyle\sum_{\text{all } x_i} (x_i - \mu)^k\, p(x_i), & x \text{ is a discrete random variable} \\[6pt] \displaystyle\int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx, & x \text{ is a continuous random variable} \end{cases}$$

The second moment about the mean is the variance $\sigma^2$ of the random variable x.

S4.3. Proof That $E(\bar{x}) = \mu$ and $E(s^2) = \sigma^2$

It is easy to show that the sample average $\bar{x}$ and the sample variance $s^2$ are unbiased estimators of the corresponding population parameters $\mu$ and $\sigma^2$, respectively. Suppose that the random variable x


has mean $\mu$ and variance $\sigma^2$, and that $x_1, x_2, \ldots, x_n$ is a random sample of size n from the population. Then

$$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}(n\mu) = \mu$$

because the expected value of each observation in the sample is $E(x_i) = \mu$. Now consider

$$E(s^2) = E\left[\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}\right] = \frac{1}{n-1}\, E\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$$

It is convenient to write $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$, and so

$$E\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \sum_{i=1}^{n} E(x_i^2) - n\,E(\bar{x}^2)$$

Now $E(x_i^2) = \mu^2 + \sigma^2$ and $E(\bar{x}^2) = \mu^2 + \sigma^2/n$. Therefore

$$E(s^2) = \frac{1}{n-1}\left[n(\mu^2 + \sigma^2) - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right] = \frac{1}{n-1}\left(n\sigma^2 - \sigma^2\right) = \frac{(n-1)\sigma^2}{n-1} = \sigma^2$$

    Note that:

a. These results do not depend on the form of the distribution for the random variable x. Many people think that an assumption of normality is required, but this is unnecessary.

b. Even though $E(s^2) = \sigma^2$, the sample standard deviation is not an unbiased estimator of the population standard deviation. This is discussed more fully in Section S4.5.
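A short simulation sketch (an addition; plain numpy) illustrates both notes at once: sampling from an exponential population, which is decidedly non-normal, the average of $s^2$ is close to $\sigma^2$ while the average of s falls below $\sigma$.

```python
# Simulation sketch (an addition; assumes numpy): samples of size n = 5 from
# an exponential population (decidedly non-normal, sigma = 1). The average of
# s^2 is close to sigma^2 = 1, while the average of s falls below sigma.
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(1.0, size=(200_000, 5))

s2 = x.var(axis=1, ddof=1)      # sample variances with divisor n - 1
print(s2.mean())                # approximately 1: s^2 is unbiased
print(np.sqrt(s2).mean())       # noticeably below 1: s is biased low
```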


    S4.4. More About Parameter Estimation

Throughout the book, estimators of various population or process parameters are given without much discussion concerning how these estimators were generated. Often they are simply logical or intuitive estimators, such as using the sample average $\bar{x}$ as an estimator of the population mean $\mu$.

There are methods for developing point estimators of population parameters. These methods are typically discussed in detail in courses in mathematical statistics. We now give a brief overview of some of these methods.

    The Method of Maximum Likelihood

One of the best methods for obtaining a point estimator of a population parameter is the method of maximum likelihood. Suppose that x is a random variable with probability distribution $f(x; \theta)$, where $\theta$ is a single unknown parameter. Let $x_1, x_2, \ldots, x_n$ be the observations in a random sample of size n. Then the likelihood function of the sample is

$$L(\theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta)$$

The maximum likelihood estimator of $\theta$ is the value of $\theta$ that maximizes the likelihood function $L(\theta)$.

    Example 1 The Exponential Distribution

To illustrate the maximum likelihood estimation procedure, let x be exponentially distributed with parameter $\lambda$. The likelihood function of a random sample of size n, say $x_1, x_2, \ldots, x_n$, is

$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$

Now it turns out that, in general, if the maximum likelihood estimator maximizes $L(\lambda)$, it will also maximize the log likelihood, $\ln L(\lambda)$. For the exponential distribution, the log likelihood is

$$\ln L(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$

Now

$$\frac{d \ln L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$$

Equating the derivative to zero and solving for the estimator of $\lambda$, we obtain

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$$

Thus the maximum likelihood estimator (or the MLE) of $\lambda$ is the reciprocal of the sample average.
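The closed form can be cross-checked by maximizing the log likelihood numerically. The sketch below (an addition; assumes scipy, and the data are simulated purely for illustration) applies a bounded scalar minimizer to the negative log likelihood.

```python
# Sketch (an addition; assumes scipy; the data are simulated for illustration):
# maximize the exponential log likelihood numerically and compare with the
# closed-form MLE 1/xbar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / 2.5, size=200)   # true lambda = 2.5

neg_log_lik = lambda lam: -(len(x) * np.log(lam) - lam * x.sum())
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100), method="bounded")

print(res.x, 1 / x.mean())   # numerical and closed-form MLEs agree
```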

Maximum likelihood estimation can be used in situations where there are several unknown parameters, say $\theta_1, \theta_2, \ldots, \theta_p$, to be estimated. The maximum likelihood estimators would be found simply by equating the p first partial derivatives $\partial L(\theta_1, \theta_2, \ldots, \theta_p)/\partial \theta_i$, $i = 1, 2, \ldots, p$, of the likelihood (or the log likelihood) to zero and solving the resulting system of equations.

    Example 2 The Normal Distribution

Let x be normally distributed with the parameters $\mu$ and $\sigma^2$ unknown. The likelihood function of a random sample of size n is

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$$

The log-likelihood function is

$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

Now

$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$$

$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$$

The solution to these equations yields the MLEs

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Generally, we like the method of maximum likelihood because when n is large, (1) it results in estimators that are approximately unbiased, (2) the variance of an MLE is as small as or nearly as small as the variance that could be obtained with any other estimation technique, and (3) MLEs are approximately normally distributed. Furthermore, the MLE has an invariance property; that is, if $\hat{\theta}$ is the MLE of $\theta$, then the MLE of a function of $\theta$, say $h(\theta)$, is the same function $h(\hat{\theta})$ of the MLE. There are also some other nice statistical properties that MLEs enjoy; see a book on mathematical statistics, such as Hogg and Craig (1978) or Bain and Engelhardt (1987).

The unbiased property of the MLE is a large-sample or asymptotic property. To illustrate, consider the MLE for $\sigma^2$ in the normal distribution of Example 2 above. We can easily show that

$$E(\hat{\sigma}^2) = \frac{n-1}{n}\,\sigma^2$$

Now the bias in estimation of $\sigma^2$ is

$$E(\hat{\sigma}^2) - \sigma^2 = \frac{n-1}{n}\,\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$$

Notice that the bias in estimating $\sigma^2$ goes to zero as the sample size $n \to \infty$. Therefore, the MLE is an asymptotically unbiased estimator.


    The Method of Moments

Estimation by the method of moments involves equating the origin moments of the probability distribution (which are functions of the unknown parameters) to the sample moments, and solving for the unknown parameters. We can define the first p sample moments as

$$M_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k, \quad k = 1, 2, \ldots, p$$

and the first p moments around the origin of the random variable x are just

$$\mu_k' = E(x^k), \quad k = 1, 2, \ldots, p$$

    Example 3 The Normal Distribution

For the normal distribution the first two origin moments are

$$\mu_1' = \mu \qquad \mu_2' = \mu^2 + \sigma^2$$

and the first two sample moments are

$$M_1 = \bar{x} \qquad M_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$

Equating the sample and origin moments results in

$$\mu = \bar{x} \qquad \mu^2 + \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$

The solution gives the moment estimators of $\mu$ and $\sigma^2$:

$$\hat{\mu} = \bar{x} \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The method of moments often yields estimators that are reasonably good. For example, in the above example the moment estimators are identical to the MLEs. However, generally moment estimators are not as good as MLEs because they don't have statistical properties that are as nice. For example, moment estimators usually have larger variances than MLEs.

Least Squares Estimation

The method of least squares is one of the oldest and most widely used methods of parameter estimation. Section 4.6 gives an introduction to least squares for fitting regression models. Unlike the method of maximum likelihood and the method of moments, least squares can be employed when the distribution of the random variable is unknown.

To illustrate, suppose that the simple location model can describe the random variable x:

$$x_i = \mu + \epsilon_i, \quad i = 1, 2, \ldots, n$$

where the parameter $\mu$ is unknown and the $\epsilon_i$ are random errors. We don't know the distribution of the errors, but we can assume that they have mean zero and constant variance. The least squares estimator of $\mu$ is chosen so that the sum of the squares of the model errors $\epsilon_i$ is minimized. The least squares function for a sample of n observations $x_1, x_2, \ldots, x_n$ is

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n}(x_i - \mu)^2$$

Differentiating L and equating the derivative to zero results in the least squares estimator of $\mu$:

$$\hat{\mu} = \bar{x}$$

In general, the least squares function will contain p unknown parameters and L will be minimized by solving the equations that result when the first partial derivatives of L with respect to the unknown parameters are equated to zero. These equations are called the least squares normal equations. See Section 4.6 in the textbook.
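A minimal sketch of this calculation (an addition; assumes scipy, and the five data values are hypothetical) minimizes the least squares function numerically and confirms that the answer is the sample average.

```python
# Minimal sketch (an addition; assumes scipy; the data values are
# hypothetical): minimize L(mu) numerically and confirm mu_hat = xbar.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([9.8, 10.4, 10.1, 9.6, 10.3])
L = lambda mu: np.sum((x - mu) ** 2)     # least squares function

print(minimize_scalar(L).x, x.mean())    # identical values
```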

The method of least squares dates from work by Carl Friedrich Gauss in the early 1800s. It has a very well-developed and indeed quite elegant theory. For a discussion of the use of least squares in estimating the parameters in regression models and many illustrative examples, see Section 4.6 and Montgomery, Peck and Vining (2007), and for a very readable and concise presentation of the theory, see Myers and Milton (1991).

S4.5. Proof That $E(s) \neq \sigma$

In Section S4.3 of the Supplemental Text Material we showed that the sample variance is an unbiased estimator of the population variance; that is, $E(s^2) = \sigma^2$, and that this result does not depend on the form of the distribution. However, the sample standard deviation is not an unbiased estimator of the population standard deviation. This is easy to demonstrate for the case where the random variable x follows a normal distribution.

Let x have a normal distribution with mean $\mu$ and variance $\sigma^2$, and let $x_1, x_2, \ldots, x_n$ be a random sample of size n from the population. Now the distribution of

$$\frac{(n-1)s^2}{\sigma^2}$$

is chi-square with $n - 1$ degrees of freedom, denoted $\chi^2_{n-1}$. Therefore the distribution of $s^2$ is $\sigma^2/(n-1)$ times a $\chi^2_{n-1}$ random variable. So when sampling from a normal distribution, the expected value of $s^2$ is

$$E(s^2) = E\left(\frac{\sigma^2}{n-1}\,\chi^2_{n-1}\right) = \frac{\sigma^2}{n-1}\, E(\chi^2_{n-1}) = \frac{\sigma^2}{n-1}\,(n-1) = \sigma^2$$

because the mean of a chi-square random variable with $n - 1$ degrees of freedom is $n - 1$. Now it follows that the distribution of

$$\frac{\sqrt{n-1}\; s}{\sigma}$$

is a chi distribution with $n - 1$ degrees of freedom, denoted $\chi_{n-1}$. The expected value of s can be written as

$$E(s) = E\left(\frac{\sigma}{\sqrt{n-1}}\,\chi_{n-1}\right) = \frac{\sigma}{\sqrt{n-1}}\, E(\chi_{n-1})$$

The mean of the chi distribution with $n - 1$ degrees of freedom is

$$E(\chi_{n-1}) = \sqrt{2}\;\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}$$

where the gamma function is $\Gamma(r) = \int_0^{\infty} y^{r-1} e^{-y}\,dy$. Then

$$E(s) = \sqrt{\frac{2}{n-1}}\;\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\;\sigma = c_4\,\sigma$$

The constant $c_4$ is given in Appendix Table VI.

While s is a biased estimator of $\sigma$, the bias gets small fairly quickly as the sample size n increases. From Appendix Table VI, note that $c_4 = 0.9400$ for a sample of n = 5, $c_4 = 0.9727$ for a sample of n = 10, and $c_4 = 0.9896$, or very nearly unity, for a sample of n = 25.
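Since $c_4$ has the closed form derived above, the tabled values are easy to reproduce. The sketch below (an addition; assumes scipy for the gamma function) evaluates the formula at the sample sizes just quoted.

```python
# Sketch (an addition; assumes scipy): evaluate
# c4 = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2) at the sample sizes above.
import numpy as np
from scipy.special import gamma as G

def c4(n):
    return np.sqrt(2.0 / (n - 1)) * G(n / 2) / G((n - 1) / 2)

for n in (5, 10, 25):
    print(n, c4(n))   # 0.9400, 0.9727, 0.9896
```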

S4.6. More about Checking Assumptions in the t-Test

The two-sample t-test can be presented from the viewpoint of a simple linear regression model. This is a very instructive way to think about the t-test, as it fits in nicely with the general notion of a factorial experiment with factors at two levels. This type of experiment is very important in process development and improvement, and is discussed extensively in Chapter 13. This also leads to another way to check assumptions in the t-test. This method is equivalent to the normal probability plotting of the original data discussed in Chapter 4.

We will use the data on the two catalysts in Example 4.9 to illustrate. In the two-sample t-test scenario, we have a factor x with two levels, which we can arbitrarily call low and high. We will use x = -1 to denote the low level of this factor (Catalyst 1) and x = +1 to denote the high level of this factor (Catalyst 2). The figure below is a scatter plot (from Minitab) of the yield data resulting from using the two catalysts shown in Table 4.2 of the textbook.


[Figure: Scatterplot of Yield vs Catalyst. Yield (roughly 89 to 98) is plotted against the coded catalyst variable, x = -1.0 to +1.0.]

We will fit a simple linear regression model to these data, say

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \epsilon_{ij}$$

where $\beta_0$ and $\beta_1$ are the intercept and slope, respectively, of the regression line and the regressor or predictor variable is $x_{1j} = -1$ and $x_{2j} = +1$. The method of least squares can be used to estimate the slope and intercept in this model. Assuming that we have equal sample sizes n for each factor level, the least squares normal equations are:

$$2n\,\hat{\beta}_0 = \sum_{j=1}^{n} y_{1j} + \sum_{j=1}^{n} y_{2j}$$

$$2n\,\hat{\beta}_1 = \sum_{j=1}^{n} y_{2j} - \sum_{j=1}^{n} y_{1j}$$

The solution to these equations is

$$\hat{\beta}_0 = \bar{y}$$

$$\hat{\beta}_1 = \frac{1}{2}(\bar{y}_2 - \bar{y}_1)$$

Note that the least squares estimator of the intercept is the average of all the observations from both samples, while the estimator of the slope is one-half of the difference between the sample averages at the high and low levels of the factor x. Below is the output from the linear regression procedure in Minitab for the catalyst data.


    Regression Analysis: Yield versus Catalyst

    The regression equation is

    Yield = 92.5 + 0.239 Catalyst

    Predictor Coef SE Coef T P

    Constant 92.4938 0.6752 136.98 0.000

    Catalyst 0.2387 0.6752 0.35 0.729

    S = 2.70086 R-Sq = 0.9% R-Sq(adj) = 0.0%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 0.912 0.912 0.13 0.729

    Residual Error 14 102.125 7.295

    Total 15 103.037

Notice that the estimate of the slope (given in the column labeled Coef and the row labeled Catalyst) is $\hat{\beta}_1 = \frac{1}{2}(\bar{y}_2 - \bar{y}_1) = \frac{1}{2}(92.7325 - 92.2550) = 0.2387$, and the estimate of the intercept is $\hat{\beta}_0 = \frac{1}{2}(\bar{y}_2 + \bar{y}_1) = \frac{1}{2}(92.7325 + 92.2550) = 92.4938$. Furthermore, notice that the t-statistic associated with the slope is

equal to 0.35, exactly the same value (apart from sign, because we subtracted the averages in the reverse order) we gave in the text. Now in simple linear regression, the t-test on the slope is actually testing the hypotheses

$$H_0\colon \beta_1 = 0$$

$$H_1\colon \beta_1 \neq 0$$

and this is equivalent to testing $H_0\colon \mu_1 = \mu_2$.

It is easy to show that the t-test statistic used for testing that the slope equals zero in simple linear regression is identical to the usual two-sample t-test. Recall that to test the above hypotheses in simple linear regression the t-statistic is

$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\dfrac{\hat{\sigma}^2}{S_{xx}}}}$$

where $S_{xx} = \sum_{i=1}^{2}\sum_{j=1}^{n}(x_{ij} - \bar{x})^2$ is the corrected sum of squares of the x's. Now in our specific problem, $\bar{x} = 0$, $x_{1j} = -1$ and $x_{2j} = +1$, so $S_{xx} = 2n$. Therefore, since we have already observed that the estimate of $\sigma$ is just $s_p$,


$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\dfrac{s_p^2}{2n}}} = \frac{\frac{1}{2}(\bar{y}_2 - \bar{y}_1)}{\sqrt{\dfrac{s_p^2}{2n}}} = \frac{\bar{y}_2 - \bar{y}_1}{s_p\sqrt{\dfrac{2}{n}}}$$

This is the usual two-sample t-test statistic for the case of equal sample sizes.
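The equivalence can be verified numerically with the catalyst yields listed in the residual table below. The following sketch (an addition; assumes scipy and numpy) computes the pooled two-sample t statistic and the regression slope t statistic and shows that they agree.

```python
# Sketch (an addition; assumes scipy/numpy) using the catalyst yields from the
# residual table below: the pooled two-sample t statistic equals the
# regression t statistic for the slope.
import numpy as np
from scipy import stats

y1 = np.array([91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21])  # catalyst 1
y2 = np.array([89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75])  # catalyst 2
n = len(y1)

t_two_sample = stats.ttest_ind(y2, y1, equal_var=True).statistic

b1 = (y2.mean() - y1.mean()) / 2                # least squares slope
sp2 = (y1.var(ddof=1) + y2.var(ddof=1)) / 2     # pooled variance (equal n)
t_slope = b1 / np.sqrt(sp2 / (2 * n))           # Sxx = 2n

print(t_two_sample, t_slope)                    # both approximately 0.35
```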

Most regression software packages will also compute a table or listing of the residuals from the model. The residuals from the Minitab regression model fit obtained above are as follows:

    Obs Catalyst Yield Fit SE Fit Residual St Resid

    1 -1.00 91.500 92.255 0.955 -0.755 -0.30

    2 -1.00 94.180 92.255 0.955 1.925 0.76

    3 -1.00 92.180 92.255 0.955 -0.075 -0.03

    4 -1.00 95.390 92.255 0.955 3.135 1.24

    5 -1.00 91.790 92.255 0.955 -0.465 -0.18

    6 -1.00 89.070 92.255 0.955 -3.185 -1.26

    7 -1.00 94.720 92.255 0.955 2.465 0.98

    8 -1.00 89.210 92.255 0.955 -3.045 -1.21

    9 1.00 89.190 92.733 0.955 -3.543 -1.40

    10 1.00 90.950 92.733 0.955 -1.783 -0.71

    11 1.00 90.460 92.733 0.955 -2.273 -0.90

    12 1.00 93.210 92.733 0.955 0.477 0.19

    13 1.00 97.190 92.733 0.955 4.457 1.76

    14 1.00 97.040 92.733 0.955 4.307 1.70

    15 1.00 91.070 92.733 0.955 -1.663 -0.66

    16 1.00 92.750 92.733 0.955 0.017 0.01

The column labeled Fit contains the predicted values of yield from the regression model, which just happen to be the averages of the two samples. The residuals are in the sixth column of this table. They are just the differences between the observed values of yield and the corresponding predicted values. A normal probability plot of the residuals follows.


[Figure: Normal Probability Plot of the Residuals (response is Yield); percent plotted against residual.]

Notice that the residuals plot approximately along a straight line, indicating that there is no serious problem with the normality assumption in these data. This is equivalent to plotting the original yield data on separate probability plots as we did in Chapter 3.

    S4.7. Expected Mean Squares in the Single-Factor Analysis of Variance

In Section 4.5.2 we give the expected values of the mean squares for treatments and error in the single-factor analysis of variance (ANOVA). These quantities may be derived by straightforward application of the expectation operator.

Consider first the mean square for treatments:

$$E(MS_{\text{Treatments}}) = E\left(\frac{SS_{\text{Treatments}}}{a - 1}\right)$$

Now for a balanced design (equal number of observations in each treatment)

$$SS_{\text{Treatments}} = \frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2 - \frac{1}{an}\, y_{\cdot\cdot}^2$$

where $y_{i\cdot}$ is the total of the observations in the i-th treatment and $y_{\cdot\cdot}$ is the grand total, and the single-factor ANOVA model is

$$y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$

In addition, we will find the following useful:

$$E(\epsilon_{ij}) = E(\epsilon_{i\cdot}) = E(\epsilon_{\cdot\cdot}) = 0, \quad E(\epsilon_{ij}^2) = \sigma^2, \quad E(\epsilon_{i\cdot}^2) = n\sigma^2, \quad E(\epsilon_{\cdot\cdot}^2) = an\sigma^2$$

    Now


$$E(SS_{\text{Treatments}}) = E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right) - E\left(\frac{1}{an}\, y_{\cdot\cdot}^2\right)$$

Consider the first term on the right hand side of the above expression:

$$E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right) = \frac{1}{n}\sum_{i=1}^{a} E(n\mu + n\tau_i + \epsilon_{i\cdot})^2$$

Squaring the expression in parentheses and taking expectation results in

$$E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right) = \frac{1}{n}\left[a(n\mu)^2 + n^2\sum_{i=1}^{a}\tau_i^2 + an\sigma^2\right] = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2$$

because the three cross-product terms are all zero. Now consider the second term on the right hand side of $E(SS_{\text{Treatments}})$:

$$E\left(\frac{1}{an}\, y_{\cdot\cdot}^2\right) = \frac{1}{an}\, E\left(an\mu + n\sum_{i=1}^{a}\tau_i + \epsilon_{\cdot\cdot}\right)^2 = \frac{1}{an}\, E(an\mu + \epsilon_{\cdot\cdot})^2$$

since $\sum_{i=1}^{a}\tau_i = 0$. Upon squaring the term in parentheses and taking expectation, we obtain

$$E\left(\frac{1}{an}\, y_{\cdot\cdot}^2\right) = \frac{1}{an}\left[(an\mu)^2 + an\sigma^2\right] = an\mu^2 + \sigma^2$$

since the expected value of the cross-product is zero. Therefore,

$$E(SS_{\text{Treatments}}) = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2 - an\mu^2 - \sigma^2 = n\sum_{i=1}^{a}\tau_i^2 + (a-1)\sigma^2$$

Consequently, the expected value of the mean square for treatments is


$$E(MS_{\text{Treatments}}) = E\left(\frac{SS_{\text{Treatments}}}{a - 1}\right) = \frac{1}{a-1}\left[n\sum_{i=1}^{a}\tau_i^2 + (a-1)\sigma^2\right] = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a - 1}$$

    This is the result given in the textbook.

For the error mean square, we obtain

$$E(MS_E) = E\left(\frac{SS_E}{N - a}\right) = \frac{1}{N-a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i\cdot})^2\right] = \frac{1}{N-a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right]$$

where $N = an$. Substituting the model into this last expression, we obtain

$$E(MS_E) = \frac{1}{N-a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n}(\mu + \tau_i + \epsilon_{ij})^2 - \frac{1}{n}\sum_{i=1}^{a}(n\mu + n\tau_i + \epsilon_{i\cdot})^2\right]$$

After squaring and taking expectation, this last equation becomes

$$E(MS_E) = \frac{1}{N-a}\left[N\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + N\sigma^2 - N\mu^2 - n\sum_{i=1}^{a}\tau_i^2 - a\sigma^2\right] = \frac{(N-a)\sigma^2}{N-a} = \sigma^2$$
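A simulation sketch (an addition; plain numpy, with hypothetical values of a, n, $\sigma$, and the $\tau_i$) illustrates both results: averaging the treatment and error mean squares over many simulated balanced experiments reproduces $\sigma^2 + n\sum\tau_i^2/(a-1)$ and $\sigma^2$.

```python
# Simulation sketch (an addition; assumes numpy; a, n, sigma, and tau are
# hypothetical): average MS_Treatments and MS_E over many balanced one-factor
# experiments and compare with the expected mean squares derived above.
import numpy as np

rng = np.random.default_rng(11)
a, n, sigma = 3, 5, 1.0
tau = np.array([-1.0, 0.0, 1.0])          # treatment effects, sum to zero
reps = 20_000

ms_t, ms_e = np.empty(reps), np.empty(reps)
for k in range(reps):
    y = tau[:, None] + rng.normal(0.0, sigma, size=(a, n))   # mu = 0
    ybar_i = y.mean(axis=1)
    ms_t[k] = n * np.sum((ybar_i - y.mean()) ** 2) / (a - 1)
    ms_e[k] = np.sum((y - ybar_i[:, None]) ** 2) / (a * n - a)

print(ms_t.mean(), sigma**2 + n * np.sum(tau**2) / (a - 1))  # both ~6.0
print(ms_e.mean(), sigma**2)                                 # both ~1.0
```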


    Supplemental Material for Chapter 5

S5.1. A Simple Alternative to Runs Rules on the $\bar{x}$ Chart

It is well-known that while Shewhart control charts detect large shifts quickly, they are relatively insensitive to small or moderately-sized process shifts. Various sensitizing rules (sometimes called runs rules) have been proposed to enhance the effectiveness of the chart in detecting small shifts. Of these rules, the Western Electric rules are among the most popular. The Western Electric rules are of the "r out of m" form; that is, if r out of the last m consecutive points exceed some limit, an out-of-control signal is generated.

In a very fundamental paper, Champ and Woodall (1987) point out that the use of these sensitizing rules does indeed increase chart sensitivity, but at the expense of (sometimes greatly) increasing the rate of false alarms, hence decreasing the in-control ARL. Generally, I do not think that the sensitizing rules should be used routinely on a control chart, particularly once the process has been brought into a state of control. They do have some application in the establishment of control limits (Phase 1 of control chart usage) and in trying to bring an unruly process into control, but even then they need to be used carefully to avoid false alarms.

Obviously, Cusum and EWMA control charts provide an effective alternative to Shewhart control charts for the problem of small shifts. However, Klein (2000) has proposed another solution. His solution is simple but elegant: use an r out of m consecutive point rule, but apply the rule to a single control limit rather than to a set of interior warning-type limits. He analyzes the following two rules:

1. If two consecutive points exceed a control limit, the process is out of control. The width of the control limits should be $\pm 1.78\sigma$.

2. If two out of three consecutive points exceed a control limit, the process is out of control. The width of the control limits should be $\pm 1.93\sigma$.

These rules would be applied to one side of the chart at a time, just as we do with the Western Electric rules.

Klein (2000) presents the ARL performance of these rules for the $\bar{x}$ chart, using actual control limit widths of $\pm 1.7814\sigma$ and $\pm 1.9307\sigma$, as these choices make the in-control ARL exactly equal to 370, the value associated with the usual three-sigma limits on the Shewhart chart. The table shown below is adapted from his results. Notice that Klein's procedure greatly improves the ability of the Shewhart $\bar{x}$ chart to detect small shifts. The improvement is not as much as can be obtained with an EWMA or a Cusum, but it is substantial, and considering the simplicity of Klein's procedure, it should be more widely used in practice.

| Shift in process mean (standard deviation units) | ARL for the Shewhart $\bar{x}$ chart with three-sigma control limits | ARL for the Shewhart $\bar{x}$ chart with $\pm 1.7814\sigma$ control limits | ARL for the Shewhart $\bar{x}$ chart with $\pm 1.9307\sigma$ control limits |
|---|---|---|---|
| 0   | 370 | 350 | 370 |
| 0.2 | 308 | 277 | 271 |
| 0.4 | 200 | 150 | 142 |
| 0.6 | 120 | 79  | 73  |
| 0.8 | 72  | 44  | 40  |
| 1   | 44  | 26  | 23  |
| 2   | 6.3 | 4.6 | 4.3 |
| 3   | 2   | 2.4 | 2.4 |
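Klein's 2-of-2 rule is simple enough to evaluate by direct simulation. The sketch below (an addition; plain numpy, with run lengths averaged over a modest number of replicates) estimates the ARL of the rule at the $1.7814\sigma$ limits for an in-control process and for a one-sigma shift; the estimates should land near the corresponding table entries.

```python
# Simulation sketch (an addition; assumes numpy): estimate the ARL of Klein's
# 2-of-2 rule with 1.7814-sigma limits by averaging simulated run lengths.
import numpy as np

rng = np.random.default_rng(5)

def run_length(shift, limit=1.7814, max_n=100_000):
    above = below = 0
    for i in range(1, max_n + 1):
        z = rng.normal() + shift
        above = above + 1 if z > limit else 0     # consecutive points above UCL
        below = below + 1 if z < -limit else 0    # consecutive points below LCL
        if above == 2 or below == 2:
            return i
    return max_n

for shift in (0.0, 1.0):
    arls = [run_length(shift) for _ in range(2_000)]
    print(shift, np.mean(arls))   # near the tabled in-control and 1-sigma ARLs
```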


    Supplemental Material for Chapter 6

S6.1. $s^2$ Is Not Always an Unbiased Estimator of $\sigma^2$

An important property of the sample variance is that it is an unbiased estimator of the population variance, as demonstrated in Section S4.3 of the Supplemental Text Material. However, this unbiased property depends on the assumption that the sample data have been drawn from a stable process; that is, a process that is in statistical control. In statistical quality control work we sometimes make this assumption, but if it is incorrect, it can have serious consequences on the estimates of the process parameters we obtain.

To illustrate, suppose that in the sequence of individual observations

$$x_1, x_2, \ldots, x_t, x_{t+1}, \ldots, x_m$$

the process is in control with mean $\mu_0$ and standard deviation $\sigma$ for the first t observations, but between $x_t$ and $x_{t+1}$ an assignable cause occurs that results in a sustained shift in the process mean to a new level $\mu = \mu_0 + \delta\sigma$, and the mean remains at this new level for the remaining sample observations $x_{t+1}, \ldots, x_m$. Under these conditions, Woodall and Montgomery (2000-01) show that

$$E(s^2) = \sigma^2 + \frac{t(m-t)}{m(m-1)}\,(\delta\sigma)^2 \tag{S6.1}$$

In fact, this result holds for any case in which the mean of t of the observations is $\mu_0$ and the mean of the remaining observations is $\mu_0 + \delta\sigma$, since the order of the observations is not relevant in computing $s^2$. Note that $s^2$ is biased upwards; that is, $s^2$ tends to overestimate $\sigma^2$. Furthermore, the extent of the bias depends on the magnitude of the shift in the mean ($\delta\sigma$), the time period following which the shift occurs (t), and the number of available observations (m). For example, if there are m = 25 observations and the process mean shifts from $\mu_0$ to $\mu_0 + \sigma$ (that is, $\delta = 1$) between the 20th and the 21st observations (t = 20), then $s^2$ will overestimate $\sigma^2$ by 16.7% on average. If the shift in the mean occurs earlier, say between the 10th and 11th observations, then $s^2$ will overestimate $\sigma^2$ by 25% on average.
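Equation (S6.1) is easy to verify by simulation. The sketch below (an addition; plain numpy) uses the m = 25, t = 20, $\delta = 1$ case discussed above, for which the expected inflation factor is $1 + (20)(5)/[(25)(24)] \approx 1.167$.

```python
# Simulation sketch (an addition; assumes numpy) of Equation (S6.1) for the
# m = 25, t = 20, delta = 1 case: the expected inflation is
# 1 + (20)(5)/[(25)(24)] = 1.167, i.e., about 16.7 percent.
import numpy as np

rng = np.random.default_rng(9)
m, t, delta, sigma = 25, 20, 1.0, 1.0

x = rng.normal(0.0, sigma, size=(200_000, m))
x[:, t:] += delta * sigma                 # sustained shift after observation t
s2 = x.var(axis=1, ddof=1)

expected = sigma**2 * (1 + t * (m - t) * delta**2 / (m * (m - 1)))
print(s2.mean(), expected)                # both approximately 1.167
```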

The proof of Equation (S6.1) is straightforward. Since we can write

$$s^2 = \frac{1}{m-1}\left[\sum_{i=1}^{m} x_i^2 - m\bar{x}^2\right]$$

then

$$E(s^2) = \frac{1}{m-1}\, E\left[\sum_{i=1}^{m} x_i^2 - m\bar{x}^2\right] = \frac{1}{m-1}\left[\sum_{i=1}^{m} E(x_i^2) - m\,E(\bar{x}^2)\right]$$

Now

$$\sum_{i=1}^{m} E(x_i^2) = \sum_{i=1}^{t} E(x_i^2) + \sum_{i=t+1}^{m} E(x_i^2) = t(\mu_0^2 + \sigma^2) + (m-t)\left[(\mu_0 + \delta\sigma)^2 + \sigma^2\right] = m\sigma^2 + t\mu_0^2 + (m-t)(\mu_0 + \delta\sigma)^2$$

and, since $\bar{x}$ has mean $\mu_0 + \frac{m-t}{m}\,\delta\sigma$ and variance $\sigma^2/m$,

$$m\,E(\bar{x}^2) = m\left[\frac{\sigma^2}{m} + \left(\mu_0 + \frac{m-t}{m}\,\delta\sigma\right)^2\right] = \sigma^2 + m\left(\mu_0 + \frac{m-t}{m}\,\delta\sigma\right)^2$$

Therefore

$$E(s^2) = \frac{1}{m-1}\left[m\sigma^2 + t\mu_0^2 + (m-t)(\mu_0 + \delta\sigma)^2 - \sigma^2 - m\left(\mu_0 + \frac{m-t}{m}\,\delta\sigma\right)^2\right]$$

Expanding the squares, the terms involving $\mu_0^2$ and $\mu_0\delta\sigma$ cancel, leaving

$$E(s^2) = \frac{1}{m-1}\left[(m-1)\sigma^2 + (m-t)(\delta\sigma)^2 - \frac{(m-t)^2}{m}(\delta\sigma)^2\right] = \sigma^2 + \frac{t(m-t)}{m(m-1)}\,(\delta\sigma)^2$$

S6.2. Should We Use $d_2$ or $d_2^*$ in Estimating $\sigma$ via the Range Method?

    In the textbook, we make use of the range method for estimation of the process standard deviation, particularlyin constructing variables control charts (for example, see the and x R charts of Chapter 5). We use the

    estimator 2/ R d . Sometimes an alternative estimator, *2/ R d , is encountered. In this section we discuss thenature and potential uses of these two estimators. Much of this discussion is adapted from Woodall andMontgomery (2000-01). The original work on using ranges to estimate the standard deviation of a normaldistribution is due to Tippett (1925). See also the paper by Duncan (1955).

    Suppose one has m independent samples, each of size n, from one or more populations assumed to be normallydistributed with standard deviation . We denote the sample ranges of the m samples or subgroups as R R R

    m1 2, , , . Note that this type of data arises frequently in statistical process control applications and gaugerepeatability and reproducibility (R & R) studies (refer to Chapter 8). It is well-known that E ( Ri) = d2 andVar ( Ri)=d 32 2 for i m1 2, , , where d 2 and d 3 are constants that depend on the sample size n. Values ofthese constants are tabled in virtually all textbooks and training materials on statistical process control. See,for example Appendix table VI for values of d 2 and d 3 for n = 2 to 25.

There are two estimators of the process standard deviation based on the average sample range

R̄ = (Σ_{i=1}^{m} R_i)/m,   (S6.2)

that are commonly encountered in practice. The estimator

σ̂₁ = R̄/d₂   (S6.3)

is widely used after the application of control charts to estimate process variability and to assess process capability. In Chapter 4 we report the relative efficiency of the range estimator given in Equation (S6.3) relative to the sample standard deviation for various sample sizes. For example, if n = 5, the relative efficiency of the range estimator compared to the sample standard deviation is 0.955. Consequently, there is little practical difference between the two estimators. Equation (S6.3) is also frequently used to determine the usual 3-sigma limits on the Shewhart x̄ chart in statistical process control. The estimator

σ̂₂ = R̄/d₂*   (S6.4)

is more often used in gauge R & R studies and in variables acceptance sampling. Here d₂* represents a constant whose value depends on both m and n. See Chrysler, Ford, GM (1995), Military Standard 414 (1957), and Duncan (1986).

Patnaik (1950) showed that R̄/σ is distributed approximately as a multiple of a χ-distribution. In particular, R̄/σ is distributed approximately as d₂* χ_ν/√ν, where ν represents the fractional degrees of freedom for the χ-distribution. Patnaik (1950) used the approximation

d₂* ≈ d₂ [1 + 1/(4ν) + 1/(32ν²) − 5/(128ν³)].   (S6.5)

It has been pointed out by Duncan (1986), Wheeler (1995), and Luko (1996), among others, that σ̂₁ is an unbiased estimator of σ and that σ̂₂² is an unbiased estimator of σ². For σ̂₂² to be an unbiased estimator of σ², however, David (1951) showed that no approximation for d₂* is required. He showed that

d₂* = d₂ [1 + V_n/(m d₂²)]^{1/2},   (S6.6)

where V_n is the variance of the sample range for samples of size n from a normal population with unit variance. It is important to note that V_n = d₃², so Equation (S6.6) can easily be used to determine values of d₂* from the widely available tables of d₂ and d₃. Thus, a table of d₂* values, such as the ones given by Duncan (1986), Wheeler (1995), and many others, is not required so long as values of d₂ and d₃ are tabled, as they usually are (once again, see Appendix Table VI). Also, use of the approximation

d₂* ≈ d₂ [1 + 1/(4ν)]


    given by Duncan (1986) and Wheeler (1995) becomes unnecessary.

The table of d₂* values given by Duncan (1986) is the most frequently recommended. If a table is required, the ones by Nelson (1975) and Luko (1996) provide values of d₂* that are slightly more accurate, since their values are based on Equation (S6.6).

It has been noted that as m increases, d₂* approaches d₂. This has frequently been argued by noting that ν increases as m increases. The fact that d₂* approaches d₂ as m increases is more easily seen, however, from Equation (S6.6), as pointed out by Luko (1996).
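The following short Python sketch (an illustration, not part of the original text) computes d₂* exactly from Equation (S6.6), using the standard tabled constants d₂ and d₃ for a few subgroup sizes:

```python
# d2* computed from Equation (S6.6): d2* = sqrt(d2^2 + d3^2/m),
# using standard tabled values of d2 and d3 (see Appendix Table VI).
import math

d2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
d3 = {2: 0.853, 3: 0.888, 4: 0.880, 5: 0.864}

def d2_star(m, n):
    return math.sqrt(d2[n] ** 2 + d3[n] ** 2 / m)

for n in (2, 5):
    print(n, [round(d2_star(m, n), 4) for m in (1, 5, 20, 100)])
```

As the output makes clear, d₂*(m, n) decreases toward d₂ as the number of subgroups m grows, exactly as noted above.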

Sometimes use of Equation (S6.4) is recommended without any explanation. See, for example, the AIAG measurement systems capability guidelines [Chrysler, Ford, and GM (1995)]. The choice between σ̂₁ and σ̂₂ has often not been explained clearly in the literature. It is frequently stated that the use of Equation (S6.3) requires that R̄ be obtained from a fairly large number of individual ranges. See, for example, Bissell (1994, p. 289). Grant and Leavenworth (1996, p. 128) state that "Strictly speaking, the validity of the exact value of the d₂ factor assumes that the ranges have been averaged for a fair number of subgroups, say, 20 or more. When only a few subgroups are available, a better estimate of σ is obtained using a factor that writers on statistics have designated as d₂*." Nelson (1975) writes that if fewer than a large number of subgroups are used, Equation (S6.3) gives an estimate of σ that does not have the same expected value as the standard deviation estimator. In fact, Equation (S6.3) produces an unbiased estimator of σ regardless of the number of samples m, whereas the pooled standard deviation does not (refer to Section S4.5 of the Supplemental Text Material). The choice between σ̂₁ and σ̂₂ depends upon whether one is interested in obtaining an unbiased estimator of σ or of σ². As m increases, the two estimators (S6.3) and (S6.4) become equivalent, since each is a consistent estimator of σ.

It is interesting to note that among all estimators of the form cR̄ (c > 0), the one minimizing the mean squared error in estimating σ has

c = d₂/(d₂*)².

The derivation of this result is given in the proofs at the end of this section. If we let

σ̂₃ = [d₂/(d₂*)²] R̄

then it is shown in the proofs below that

MSE(σ̂₃) = σ² [1 − d₂²/(d₂*)²].

Luko (1996) compared the mean squared error of σ̂₂ in estimating σ to that of σ̂₁ and recommended σ̂₂ on the basis of uniformly lower MSE values. By definition, σ̂₃ leads to a further reduction in MSE.

It is shown in the proofs at the end of this section that the percentage reduction in MSE from using σ̂₃ instead of σ̂₂ is

50 [(d₂* − d₂)/d₂*].

Values of the percentage reduction are given in Table S6.1. Notice that when both the number of subgroups and the subgroup size are small, a moderate reduction in mean squared error can be obtained by using σ̂₃.

Table S6.1. Percentage Reduction in Mean Squared Error from using σ̂₃ instead of σ̂₂

Subgroup                            Number of Subgroups, m
Size, n        1        2        3        4        5        7       10       15       20
   2      10.1191   5.9077   4.1769   3.2314   2.6352   1.9251   1.3711   0.9267   0.6998
   3       5.7269   3.1238   2.1485   1.6374   1.3228   0.9556   0.6747   0.4528   0.3408
   4       4.0231   2.1379   1.4560   1.1040   0.8890   0.6399   0.4505   0.3017   0.2268
   5       3.1291   1.6403   1.1116   0.8407   0.6759   0.4856   0.3414   0.2284   0.1716
   6       2.5846   1.3437   0.9079   0.6856   0.5507   0.3952   0.2776   0.1856   0.1394
   7       2.2160   1.1457   0.7726   0.5828   0.4679   0.3355   0.2356   0.1574   0.1182
   8       1.9532   1.0058   0.6773   0.5106   0.4097   0.2937   0.2061   0.1377   0.1034
   9       1.7536   0.9003   0.6056   0.4563   0.3660   0.2623   0.1840   0.1229   0.0923
  10       1.5963   0.8176   0.5495   0.4138   0.3319   0.2377   0.1668   0.1114   0.0836
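The entries in Table S6.1 can be reproduced from the two formulas above. A minimal sketch (assuming the rounded three-decimal tabled values of d₂ and d₃, so the results agree with the table to about two decimal places):

```python
# Percentage reduction in MSE, 50*(d2* - d2)/d2*, with d2* from Equation (S6.6).
import math

d2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
d3 = {2: 0.853, 3: 0.888, 4: 0.880, 5: 0.864}

def pct_reduction(m, n):
    d2s = math.sqrt(d2[n] ** 2 + d3[n] ** 2 / m)
    return 50.0 * (d2s - d2[n]) / d2s

print(round(pct_reduction(1, 2), 4))   # approx 10.12 (Table S6.1 gives 10.1191)
print(round(pct_reduction(5, 5), 4))   # approx 0.68  (Table S6.1 gives 0.6759)
```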

Proofs

Result 1: Let σ̂ = cR̄. Then MSE(σ̂) = σ² [c²(d₂*)² − 2cd₂ + 1].

Proof:

MSE(σ̂) = E[(cR̄ − σ)²]
        = E[c²R̄² − 2cσR̄ + σ²]
        = c² E(R̄²) − 2cσ E(R̄) + σ²

Now E(R̄²) = Var(R̄) + [E(R̄)]² = d₃²σ²/m + d₂²σ² = (d₂*)²σ². Thus


MSE(σ̂) = c²(d₃²σ²/m + d₂²σ²) − 2cd₂σ² + σ²
        = σ² [c²(d₃²/m + d₂²) − 2cd₂ + 1]
        = σ² [c²(d₂*)² − 2cd₂ + 1]

Result 2: The value of c that minimizes the mean squared error of estimators of the form cR̄ in estimating σ is c = d₂/(d₂*)².

Proof:

MSE(σ̂) = σ² [c²(d₂*)² − 2cd₂ + 1]

dMSE(σ̂)/dc = σ² [2c(d₂*)² − 2d₂] = 0

so that c = d₂/(d₂*)².

Result 3: The mean square error of σ̂₃ = [d₂/(d₂*)²] R̄ is σ² [1 − d₂²/(d₂*)²].

Proof:

MSE(σ̂₃) = σ² {[d₂/(d₂*)²]² (d₂*)² − 2[d₂/(d₂*)²] d₂ + 1}   (from Result 1)
         = σ² [d₂²/(d₂*)² − 2d₂²/(d₂*)² + 1]
         = σ² [1 − d₂²/(d₂*)²]

Note that MSE(σ̂₃) → 0 as n → ∞ and as m → ∞.

Result 4: Let σ̂₂ = R̄/d₂* and σ̂₃ = [d₂/(d₂*)²] R̄. Then

100 [MSE(σ̂₂) − MSE(σ̂₃)]/MSE(σ̂₂),

the percent reduction in mean square error using the minimum mean square error estimator instead of R̄/d₂* [as recommended by Luko (1996)], is

50 [(d₂* − d₂)/d₂*].

    Proof:


Luko (1996) shows that MSE(σ̂₂) = 2σ²(d₂* − d₂)/d₂*. Therefore

MSE(σ̂₂) − MSE(σ̂₃) = 2σ²(d₂* − d₂)/d₂* − σ² [1 − d₂²/(d₂*)²]
                    = σ² {2(d₂* − d₂)/d₂* − [(d₂*)² − d₂²]/(d₂*)²}
                    = σ² [2d₂*(d₂* − d₂) − (d₂* − d₂)(d₂* + d₂)]/(d₂*)²
                    = σ² (d₂* − d₂)(2d₂* − d₂* − d₂)/(d₂*)²
                    = σ² (d₂* − d₂)²/(d₂*)²

Consequently

100 [MSE(σ̂₂) − MSE(σ̂₃)]/MSE(σ̂₂) = 100 [σ²(d₂* − d₂)²/(d₂*)²] / [2σ²(d₂* − d₂)/d₂*]
                                  = 100 (d₂* − d₂)/(2d₂*)
                                  = 50 [(d₂* − d₂)/d₂*].
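These results are also easy to check by simulation. The following sketch (an illustration only) estimates the MSEs of σ̂₂ and σ̂₃ for n = 2 and m = 1, where Table S6.1 predicts a reduction of about 10.1%:

```python
# Monte Carlo comparison of MSE(sigma2_hat) and MSE(sigma3_hat) for n = 2, m = 1.
import math
import numpy as np

rng = np.random.default_rng(7)
n, m, sigma, reps = 2, 1, 1.0, 400_000
d2, d3 = 1.128, 0.853
d2s = math.sqrt(d2 ** 2 + d3 ** 2 / m)                # d2* from Equation (S6.6)

x = rng.normal(0.0, sigma, size=(reps, m, n))
rbar = (x.max(axis=2) - x.min(axis=2)).mean(axis=1)   # average range, Rbar

mse2 = np.mean((rbar / d2s - sigma) ** 2)             # sigma2_hat = Rbar/d2*
mse3 = np.mean((d2 / d2s ** 2 * rbar - sigma) ** 2)   # sigma3_hat = (d2/d2*^2)Rbar
print(100 * (mse2 - mse3) / mse2)                     # approximately 10.1
```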

S6.3. Determining When the Process has Shifted

Control charts monitor a process to determine whether an assignable cause has occurred. Knowing when the assignable cause occurred would be very helpful in its identification and eventual removal. Unfortunately, the time of occurrence of the assignable cause does not always coincide with the control chart signal. In fact, given what is known about the average run length performance of control charts, it is actually very unlikely that the assignable cause occurs at the time of the signal. Therefore, when a signal occurs, the control chart analyst should look earlier in the process history to determine the assignable cause.

But where should we start? The Cusum control chart provides some guidance: simply search backwards on the Cusum status chart to find the point in time where the Cusum last crossed zero (refer to Chapter 9). However, the Shewhart x̄ control chart provides no such simple guidance. Samuel, Pignatiello, and Calvin (1998) use some theoretical results by Hinkley (1970) on change-point problems to suggest a procedure for determining the time of a shift in the process mean following a signal on the Shewhart x̄ control chart. They assume the standard x̄ control chart with in-control value of the process mean μ₀. Suppose that the chart signals at subgroup T, with subgroup average x̄_T. Now the in-control subgroup averages are x̄₁, x̄₂, …, x̄_t, and the out-of-control subgroup averages are x̄_{t+1}, x̄_{t+2}, …, x̄_T, where obviously t < T. Their procedure consists of finding the value of t in the range 0 ≤ t < T that maximizes

C_t = (T − t)(x̄_{T,t} − μ₀)²

where x̄_{T,t} denotes the average of the T − t most recent subgroup averages x̄_{t+1}, …, x̄_T.
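A minimal sketch of this change-point estimator (the function and variable names are hypothetical; only the maximization of C_t comes from Samuel, Pignatiello, and Calvin (1998)):

```python
# After a signal at subgroup T, estimate the last in-control subgroup t by
# maximizing C_t = (T - t)*(mean of the last T - t subgroup averages - mu0)^2.
import numpy as np

def estimate_change_point(xbars, mu0):
    """xbars: the subgroup averages x1bar, ..., xTbar up to the signal."""
    T = len(xbars)
    xbars = np.asarray(xbars, dtype=float)
    C = [(T - t) * (xbars[t:].mean() - mu0) ** 2 for t in range(T)]
    return int(np.argmax(C))     # estimated number of in-control subgroups, t

# Example: the mean shifts from 0 to 1 after subgroup 20; the chart signals at T = 28.
rng = np.random.default_rng(3)
xbars = np.r_[rng.normal(0.0, 0.45, 20), rng.normal(1.0, 0.45, 8)]
print(estimate_change_point(xbars, mu0=0.0))   # typically close to 20
```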


S6.6. The Mean Square Successive Difference as an Estimator of σ²

An alternative to the moving range estimator of the process standard deviation is the mean square successive difference as an estimator of σ². The mean square successive difference is defined as

MSSD = (1/(2(n − 1))) Σ_{i=1}^{n−1} (x_{i+1} − x_i)²

It is easy to show that the MSSD is an unbiased estimator of σ². Let x₁, x₂, …, x_n be a random sample of size n from a population with mean μ and variance σ². Without any loss of generality, we may take the mean to be zero. Then

E(MSSD) = (1/(2(n − 1))) E[Σ_{i=1}^{n−1} (x_{i+1} − x_i)²]
        = (1/(2(n − 1))) Σ_{i=1}^{n−1} E(x_{i+1}² − 2 x_{i+1} x_i + x_i²)
        = (1/(2(n − 1))) [(n − 1)σ² + (n − 1)σ²]
        = 2(n − 1)σ²/(2(n − 1))
        = σ²

since the cross-product terms have expectation zero (the observations are independent with mean zero).

    Therefore, the mean square successive difference is an unbiased estimator of the population variance.
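A brief numerical illustration (not part of the original text): for in-control data, the MSSD and the usual sample variance estimate the same quantity.

```python
# The MSSD estimator of sigma^2 next to the usual sample variance.
import numpy as np

def mssd(x):
    """Mean square successive difference: sum of (x[i+1]-x[i])^2 over 2(n-1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.diff(x) ** 2) / (2.0 * (len(x) - 1))

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=100_000)
print(mssd(x), x.var(ddof=1))    # both approximately 4.0
```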


    Supplemental Material for Chapter 8

    S8.1. Fixed Versus Random Factors in the Analysis of Variance

In Chapter 4, we present the standard analysis of variance (ANOVA) for a single-factor experiment, assuming that the factor is a fixed factor. By a fixed factor, we mean that all levels of the factor of interest were studied in the experiment. Sometimes the levels of a factor are selected at random from a large (theoretically infinite) population of factor levels. This leads to a random effects ANOVA model.

In the single-factor case, there are only modest differences between the fixed and random models. The model for a random effects experiment is still written as

y_ij = μ + τ_i + ε_ij

but now the treatment effects τ_i are random variables, because the treatment levels actually used in the experiment have been chosen at random. The population of treatments is assumed to be normally and independently distributed with mean zero and variance σ_τ². Note that the variance of an observation is

V(y_ij) = V(μ + τ_i + ε_ij) = σ_τ² + σ²

We often call σ_τ² and σ² variance components, and the random model is sometimes called the components of variance model. All of the computations in the random model are the same as in the fixed effects model, but since we are studying an entire population of treatments, it doesn't make much sense to formulate hypotheses about the individual factor levels selected in the experiment. Instead, we test the following hypotheses about the variance of the treatment effects:

H₀: σ_τ² = 0
H₁: σ_τ² > 0

The test statistic for these hypotheses is the usual F-ratio, F = MS_Treatments/MS_E. If the null hypothesis is not rejected, there is no evidence of variability in the population of treatments, while if the null hypothesis is rejected, there is significant variability among the treatments in the entire population that was sampled. Notice that the conclusions of the ANOVA extend to the entire population of treatments.

The expected mean squares in the random model are different from their fixed effects model counterparts. It can be shown that

E(MS_Treatments) = σ² + n σ_τ²
E(MS_E) = σ²

Frequently, the objective of an experiment involving random factors is to estimate the variance components. A logical way to do this is to equate the expected values of the mean squares to their observed values and solve the resulting equations. This leads to

σ̂_τ² = (MS_Treatments − MS_E)/n
σ̂² = MS_E

A typical application of experiments where some of the factors are random is in a measurement systems capability study, as discussed in Chapter 8. The model used there is a factorial model, so the analysis and the expected mean squares are somewhat more complicated than in the single-factor model considered here.
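A minimal sketch of these calculations for a balanced single-factor random effects experiment (an illustration; the data and parameter values are hypothetical):

```python
# Variance component estimates from a balanced one-way random effects ANOVA:
# sigma_tau^2_hat = (MS_Treatments - MS_E)/n and sigma^2_hat = MS_E.
import numpy as np

def variance_components(y):
    """y: an a x n array; rows are the randomly selected factor levels."""
    a, n = y.shape
    ms_treat = n * y.mean(axis=1).var(ddof=1)     # MS_Treatments
    ms_error = y.var(axis=1, ddof=1).mean()       # MS_E (pooled within-level)
    return (ms_treat - ms_error) / n, ms_error

rng = np.random.default_rng(11)
tau = rng.normal(0.0, 2.0, size=30)                       # sigma_tau^2 = 4
y = tau[:, None] + rng.normal(0.0, 1.0, size=(30, 8))     # sigma^2 = 1
print(variance_components(y))                             # approx (4.0, 1.0)
```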


    S8.2. Analysis of Variance Methods for Measurement Systems Capability Studies

In Chapter 8 an analysis of variance model approach to measurement systems studies is presented. This method replaces the tabular approach that was presented along with the ANOVA method in earlier editions of the book. The tabular approach is a relatively simple method, but it is not the most general or efficient approach to conducting gauge studies. Gauge and measurement systems studies are designed experiments, and often we find that the gauge study must be conducted using an experimental design that does not fit nicely into the tabular analysis scheme. For example, suppose that the operators used with each instrument (or gauge) are different because the instruments are in different physical locations. Then operators are nested within instruments, and the experiment has been conducted as a nested design.

As another example, suppose that the operators are not selected at random, because the specific operators used in the study are the only ones that actually perform the measurements. This is a mixed model experiment, and the random effects approach that the tabular method is based on is inappropriate. The random effects model analysis of variance approach in the text is also inappropriate for this situation. Dolezal, Burdick, and Birch (1998), Montgomery (2001), and Burdick, Borror, and Montgomery (2003) discuss the mixed model analysis of variance for gauge R & R studies.

The tabular approach does not lend itself to constructing confidence intervals on the variance components or on functions of the variance components of interest. For that reason we do not recommend the tabular approach for general use. There are three general approaches to constructing these confidence intervals: (1) the Satterthwaite method, (2) the maximum likelihood large-sample method, and (3) the modified large sample method. Montgomery (2001) gives an overview of these different methods. Of the three approaches, there is good evidence that the modified large sample approach is the best in the sense that it produces confidence intervals that are closest to the stated level of confidence.

Hamada and Weerahandi (2000) show how generalized inference can be applied to the problem of determining confidence intervals in measurement systems capability studies. The technique is somewhat more involved than the three methods referenced above. Either numerical integration or simulation must be used to find the desired confidence intervals. Burdick, Borror, and Montgomery (2003) discuss this technique.

While the tabular method should be abandoned, the control charting aspect of measurement systems capability studies should be used more consistently. All too often a measurement study is conducted and analyzed via some computer program without adequate graphical analysis of the data. Furthermore, some of the advice in various quality standards and reference sources regarding these studies is just not very good and can produce results of questionable validity. The most reliable measure of gauge capability is the probability that parts are misclassified.


    Supplemental Material for Chapter 9

S9.1. The Markov Chain Approach to Finding the ARLs for Cusum and EWMA Control Charts

When the observations drawn from the process are independent, average run lengths or ARLs are easy to determine for Shewhart control charts because the points plotted on the chart are independent. The distribution of run length is geometric, so the ARL of the chart is just the mean of the geometric distribution, or 1/p, where p is the probability that a single point plots outside the control limits.

The sequence of plotted points on Cusum and EWMA charts is not independent, so another approach must be used to find the ARLs. The Markov chain approach developed by Brook and Evans (1972) is very widely used. We give a brief discussion of this procedure for a one-sided Cusum.

The Cusum control chart statistics C⁺ (or C⁻) form a Markov process with a continuous state space. By discretizing the continuous random variable C⁺ (or C⁻) with a finite set of values, approximate ARLs can be obtained from Markov chain theory. For the upper one-sided Cusum with upper decision interval H, the intervals are defined as follows:

(−∞, w/2], [w/2, 3w/2], …, [(j − 1/2)w, (j + 1/2)w], …, [(m − 3/2)w, H], [H, ∞)

where m + 1 is the number of states and w = 2H/(2m − 1). The elements of the transition probability matrix P = [p_ij] of the Markov chain are

p_i0 = ∫_{−∞}^{w/2} f(x − iw + k) dx,   i = 0, 1, …, m − 1

p_ij = ∫_{(j−1/2)w}^{(j+1/2)w} f(x − iw + k) dx,   i = 0, 1, …, m − 1;  j = 1, 2, …, m − 1

p_im = ∫_{H}^{∞} f(x − iw + k) dx,   i = 0, 1, …, m − 1

p_mj = 0,   j = 0, 1, …, m − 1

p_mm = 1

The absorbing state is m, and f denotes the probability density function of the variable that is being monitored with the Cusum.

From the theory of Markov chains, the expected first passage times from state i to the absorbing state satisfy

μ_i = 1 + Σ_{j=0}^{m−1} p_ij μ_j,   i = 0, 1, …, m − 1

Thus, μ_i is the ARL given that the process started in state i. Let Q be the matrix of transition probabilities obtained by deleting the last row and column of P. Then the vector of ARLs μ = (μ₀, μ₁, …, μ_{m−1})′ is found by computing

μ = (I − Q)⁻¹ 1

where 1 is an m × 1 vector of 1s and I is the m × m identity matrix.

When the process is out of control, this procedure gives a vector of initial-state (or zero-state) ARLs. That is, the process shifts out of control at the initial start-up of the control chart. It is also possible to calculate steady-state ARLs that describe performance assuming that the process shifts out of control after the control chart has been operating for a long period of time. There is typically very little difference between initial-state and steady-state ARLs.
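The following sketch implements the Brook and Evans procedure for an upper one-sided Cusum with normal observations (an illustration; the state count m = 100 and the test values of k and h are choices, not prescriptions):

```python
# Brook-Evans Markov chain approximation to the zero-state ARL of an upper
# one-sided Cusum with reference value k and decision interval h (sigma units).
import numpy as np
from scipy.stats import norm

def cusum_arl(k, h, mu=0.0, m=100):
    w = 2.0 * h / (2 * m - 1)              # interval width; states at 0, w, 2w, ...
    centers = w * np.arange(m)
    Q = np.zeros((m, m))                   # transient transition probabilities
    for i, c in enumerate(centers):
        # next value is max(0, c + x - k); the first interval (-inf, w/2]
        # collects the resets to zero
        upper = centers + w / 2 - c + k    # upper limits for x in each interval
        Q[i, :] = np.diff(np.r_[0.0, norm.cdf(upper, loc=mu)])
    return np.linalg.solve(np.eye(m) - Q, np.ones(m))[0]   # start in state 0

print(cusum_arl(k=0.5, h=4.0, mu=0.0))   # approximately 336 (in control)
print(cusum_arl(k=0.5, h=4.0, mu=1.0))   # approximately 8.4 (1-sigma shift)
```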


Let P(n, i) be the probability that the run length takes on the value n, given that the chart started in state i. Collect these quantities into a vector, say

p_n = [P(n, 0), P(n, 1), …, P(n, m − 1)]′

for n = 1, 2, …. These probabilities can be calculated by solving the following equations:

p₁ = (I − Q)1

p_n = Q p_{n−1},   n = 2, 3, …

This technique can be used to calculate the probability distribution of the run length, given that the control chart started in state i. Some authors believe that the distribution of run length or its percentiles is more useful than the ARL, since the distribution of run length is usually highly skewed, and so the ARL may not be a typical value in any meaningful sense.
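Continuing the previous sketch, the run-length distribution follows from the same transient matrix Q (an illustration; Q is assumed to have been built as above):

```python
# Run-length probabilities from p_1 = (I - Q)1 and p_n = Q p_{n-1}.
import numpy as np

def run_length_pmf(Q, n_max):
    m = Q.shape[0]
    p = (np.eye(m) - Q) @ np.ones(m)    # p_1: P(run length = 1 | start state i)
    pmf = [p]
    for _ in range(n_max - 1):
        p = Q @ p                       # p_n = Q p_{n-1}
        pmf.append(p)
    return np.array(pmf)                # pmf[n-1, i] = P(run length = n | state i)

# e.g. pmf = run_length_pmf(Q, 500); pmf[:, 0] is the zero-state run-length
# distribution, and pmf[:, 0] @ np.arange(1, 501) approximates the ARL.
```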

    S9.2. Integral Equations Versus Markov Chains for Finding the ARL

Two methods are commonly used to find the ARLs of control charts: the Markov chain method and an approach that uses integral equations. The Markov chain method is described in Section S9.1 of the Supplemental Text Material. This section gives an overview of the integral equation approach for the Cusum control chart. Some of the notation defined in Section S9.1 will be used here.

Let P(n, u) and R(u) be, respectively, the probability that the run length takes on the value n and the ARL for the Cusum when the procedure begins with initial value u. For the one-sided upper Cusum,

P(1, u) = 1 − ∫_{−∞}^{H} f(x − u + k) dx
        = 1 − ∫_{−∞}^{w/2} f(x − u + k) dx − Σ_{j=1}^{m−1} ∫_{(j−1/2)w}^{(j+1/2)w} f(x − u + k) dx

and

P(n, u) = P(n − 1, 0) ∫_{−∞}^{0} f(x − u + k) dx + ∫_{0}^{H} P(n − 1, y) f(y − u + k) dy
        = P(n − 1, 0) ∫_{−∞}^{0} f(x − u + k) dx + P(n − 1, ξ₀) ∫_{0}^{w/2} f(x − u + k) dx
          + Σ_{j=1}^{m−1} P(n − 1, ξ_j) ∫_{(j−1/2)w}^{(j+1/2)w} f(x − u + k) dx

for n = 2, 3, … and for some ξ₀ ∈ (0, w/2) and ξ_j ∈ [(j − 1/2)w, (j + 1/2)w), j = 1, 2, …, m − 1. If w is small, then ξ_j is approximately the midpoint jw of the jth interval for j = 1, 2, …, m − 1 and P(n − 1, ξ₀) ≈ P(n − 1, 0); considering only the values of P(n, u) for which u = iw results in


P(1, iw) = 1 − Σ_{j=0}^{m−1} p_ij

P(n, iw) = Σ_{j=0}^{m−1} P(n − 1, jw) p_ij,   n = 2, 3, …

But these last equations are just the equations used for calculating the probabilities of first-passage times in a Markov chain. Therefore, the solution to the integral equation approach involves solving equations identical to those used in the Markov chain procedure.

Champ and Rigdon (1991) give an excellent discussion of the Markov chain and integral equation techniques for finding ARLs for both the Cusum and the EWMA control charts. They observe that the Markov chain approach involves obtaining an exact solution to an approximate formulation of the ARL problem, while the integral equation approach involves finding an approximate solution to the exact formulation of the ARL problem. They point out that more accurate solutions can likely be found via the integral equation approach. However, there are problems for which only the Markov chain method will work, such as the case of a drifting mean.


    Supplemental Material for Chapter 10

    S10.1. Difference Control Charts

The difference control chart is briefly mentioned in Chapter 10, and a reference is given to a paper by Grubbs (1946). There are actually two types of difference control charts in the literature. Grubbs compared samples from a current production process to a reference sample. His application was in the context of testing ordnance. The plotted quantity was the difference between the current sample average and the reference sample average. This quantity would be plotted on a control chart with center line at zero and control limits at

±A₂ (R̄₁² + R̄₂²)^{1/2}

where R̄₁ and R̄₂ are the average ranges for the reference samples (1) and the current production samples (2) used to establish the control limits.

The second type of difference control chart was suggested by Ott (1947), who considered the situation where differences are observed between paired measurements within each subgroup (much as in a paired t-test), and the average difference for each subgroup is plotted on the chart. The center line for this chart is zero, and the control limits are at ±A₂R̄, where R̄ is the average of the ranges of the differences. This chart would be useful in instrument calibration, where one measurement on each unit is from a standard instrument (say, in a laboratory) and the other is from an instrument used in different conditions (such as in production).

S10.2. Control Charts for Contrasts

There are many manufacturing processes where process monitoring is important but traditional statistical control charts cannot be effectively used because of rational subgrouping considerations. Examples occur frequently in the chemical and processing industries, stamping, casting and molding operations, and electronics and