Chapter 5: Quality Control

8/9/2019 Chapter 5: Quality Control

1/64

Chapter 5

Basic Probability and Statistics


2/64

Basic Probability and Statistics

List of Sections

• Introduction
• Probability Defined
• Types of Data
• Characteristics of Data
• Visually Describing Data
• Numerically Describing Data
• Take-Away Knowledge


3/64

Chapter 5:

Basic Probability and Statistics

List of Sections

• Summary
• Key Terms
• Exercises
• References
• Appendix A5.1: Using Windows
• Appendix A5.2: Introduction to Minitab
• Appendix A5.3: Using Minitab for charts, descriptive statistics and normal probabilities


4/64

Chapter 5: Basic Probability and Statistics

• You will be able to:
  • Define probability
  • Define attribute and measurement (variables) data
  • Discuss visual displays of data
  • Discuss numerical methods of describing data
  • Interpret the standard deviation as a measure of variation
  • Calculate probabilities under the normal distribution


5/64

Introduction

• In this chapter we look at:
  • the issues of quantifying probabilities and examining characteristics of data taken from a population or process.
  • the types of data that may be encountered and how they are classified.
  • the methods for visually displaying data.
  • the calculation of numerical measures to describe data.


6/64

Probability Defined

• There are three popular definitions of probability:
  • the classical definition,
  • the relative frequency definition, and
  • the Bayesian definition (not discussed here).


7/64

The Classical Definition of Probability

• The classical approach to probability states that if an experiment has N equally likely and mutually exclusive outcomes, and if n of those outcomes correspond to the occurrence of event A, then the probability of event A is

P(A) = n/N

• where:
  • n = the number of experimental outcomes that correspond to the occurrence of event A, and
  • N = the total number of experimental outcomes.
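As a minimal sketch of the classical definition, the snippet below counts favorable outcomes for one toss of a fair six-sided die and forms n/N. The helper name `classical_probability` and the choice of the event "even face" are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

def classical_probability(outcomes, event):
    """Return P(A) = n/N for equally likely, mutually exclusive outcomes."""
    favorable = [o for o in outcomes if event(o)]   # the n outcomes in event A
    return Fraction(len(favorable), len(outcomes))  # n / N

die_faces = [1, 2, 3, 4, 5, 6]  # N = 6 equally likely faces (assumed fair die)
p_even = classical_probability(die_faces, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```

Using `Fraction` keeps the result as an exact ratio, which mirrors the counting argument behind the definition.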


8/64


9/64

The Relative Frequency Definition of Probability

• The relative frequency approach to probability states that if an experiment is conducted a large number of times (say M times), then the probability of event A occurring is

P(A) = k/M

• where:
  • k = the number of times A occurred during these experiments, and
  • M = the maximum number of times that event A could have occurred during these experiments.


10/64

For example, suppose in the die-tossing example that a fair die was tossed 100,000 times and that a 2, 4, or 6 appeared in 50,097 throws. Consequently, the relative frequency probability of tossing an even number on a fair die is

P(Even) = 50,097/100,000 = 0.50097
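The experiment above can be imitated with a short simulation. This is only a sketch: the pseudo-random generator, the seed, and the function name `relative_frequency_even` are assumptions, and the simulated count will not reproduce the 50,097 reported above.

```python
import random

def relative_frequency_even(num_tosses, seed=0):
    """Estimate P(Even) as k/M from simulated tosses of a fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    even_count = sum(1 for _ in range(num_tosses) if rng.randint(1, 6) % 2 == 0)
    return even_count / num_tosses  # k / M

estimate = relative_frequency_even(100_000)
print(estimate)  # close to, but generally not exactly, 0.5
```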


11/64

The classical probability of an even toss (3/6 = 0.5) and the relative frequency probability of an even toss (0.50097) differ as a result of the methods used in their respective calculations.


12/64

Analytic Studies and Relative Frequency Probabilities

• Analytic studies are conducted to determine process characteristics. But process characteristics have a past and present and will have a future; hence there is no frame from which classical probabilities can be calculated.
• Probabilities concerning process characteristics must be obtained empirically, through experimentation, and must therefore be relative frequency probabilities.


13/64

Types of Data

• Data is information collected about a product, service, process, person, or machine.
• We classify data into two types:
  • attribute data
  • variables (measurement) data.


14/64

• Attribute Data
• Attribute data arise:
  • From the classification of items, such as products or services, into categories (e.g., conforming or non-conforming).
  • From counts of the number of items in a given category or the proportion in a given category (e.g., the proportion of defective units in a sample of units).
  • From counts of the number of occurrences per unit (e.g., the number of defects per unit).


15/64

Grievance Number    Grievance Level
 1                  1
 2                  1
 3                  1
 4                  2
 5                  4
 6                  1
 7                  1
 8                  1
 9                  3
10                  1
11                  1

Classification of 2003 Union Grievances into Four Categories


16/64

Category        Number of Grievances    Proportion of Grievances
First Level     8                       8/11 = 0.73
Second Level    1                       1/11 = 0.09
Third Level     1                       1/11 = 0.09
Fourth Level    1                       1/11 = 0.09
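The counts and proportions in this table can be tallied directly from the grievance levels. The sketch below assumes the eleven levels as read from the grievance table (grievance 4 at level 2, grievance 5 at level 4); the variable names are illustrative.

```python
from collections import Counter

# Grievance levels for grievances 1 through 11, transcribed from the table.
grievance_levels = [1, 1, 1, 2, 4, 1, 1, 1, 3, 1, 1]

counts = Counter(grievance_levels)  # number of grievances per level
total = len(grievance_levels)
proportions = {level: round(counts[level] / total, 2) for level in sorted(counts)}
print(proportions)  # {1: 0.73, 2: 0.09, 3: 0.09, 4: 0.09}
```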


17/64

Variables (Measurement) Data

• Variables data arise:
  • From the measurement of a characteristic of a product, service, or process, and
  • From the computation of a numerical value from two or more measurements of variables data.


18/64


19/64

Characterizing Data

• Enumerative Studies
  • A complete census of the frame in an enumerative study provides all the information needed to take action on the frame.
  • If the information used as the basis for action constitutes a random sample from the frame, the errors introduced by sampling can be quantified, and valid statistical inferences can be made on the frame in question.


20/64

• Analytic Studies
  • A complete census of the frame is impossible in an analytic study, for a frame consists of all past, present, and future observations, and future observations cannot be measured.
  • As we are dealing with an ongoing process, we wish to characterize the data to take action on that process for the future.


21/64

• Unlike the enumerative study, since the frame is unknown (the future cannot be measured), it is not possible to quantify the errors that arise from sampling.
• Any inferences we make in an analytic study are conditional on the environmental state when the sample was selected; that environmental state will never again exist.
• Information on such a problem can never be complete.
• If, however, a knowledge of the process and the environment and an analysis of the data indicate that the process is stable and predictable, and will remain so in the near future, the visual and numerical characterizations discussed in this chapter can be used to make inferences and take action in the near future.


22/64

Visually Describing Data

Tabular Displays

• Frequency Distributions.
  • A frequency distribution shows us, in tabular form, the number of times, or the frequency with which, a given value or group of values occurs.
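A frequency distribution for grouped values can be sketched as follows. The weights, the class width of 2.0, and the helper `class_interval` are all assumptions chosen for illustration; they are not taken from the slides' tables of tabular displays.

```python
from collections import Counter

# Illustrative weights in grams (assumed data, for demonstration only).
weights = [5.0, 5.3, 7.6, 7.8, 7.9, 8.0, 8.0, 8.1, 8.8, 9.2, 10.0, 10.5]
class_width = 2.0  # assumed width of each class interval

def class_interval(value, width):
    """Return the (lower, upper) class interval that contains value."""
    lower = (value // width) * width
    return (lower, lower + width)

frequency = Counter(class_interval(w, class_width) for w in weights)
for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower:4.1f} to under {upper:4.1f}: {count}")
```

Each value falls in exactly one interval because the lower bound is included and the upper bound is excluded.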


23/64


24/64

Tabular Displays of Attribute Data


25/64

Tabular Displays of Measurement Data (top panel) and Attribute Data (bottom panel)


26/64

Limitations of Frequency Displays

• It is important to note that the frequency displays we have discussed do not include information on the time-ordering of data.
• In an analytic study, where we examine the sample to take action on the process, a frequency display would fail to show trends that may be occurring over time.
• This loss of information can be critical.
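To make the loss of time-ordering concrete, the sketch below builds two series with identical frequency distributions, one of which drifts upward over time. Both series are invented for illustration.

```python
from collections import Counter

# Hypothetical measurements listed in the order they were observed.
trending = [1, 1, 2, 2, 3, 3, 4, 4]  # steadily drifting upward
no_trend = [3, 1, 4, 2, 2, 4, 1, 3]  # same values, no drift

same_frequencies = Counter(trending) == Counter(no_trend)
same_time_order = trending == no_trend
print(same_frequencies, same_time_order)  # True False
```

A frequency display built from either series would look the same, so the upward drift in the first series would go unnoticed.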


27/64

Graphical Displays

• Data are often represented in graphical form.
  • Frequency distributions of measurement (variables) data are commonly presented in frequency polygons or histograms.
  • Frequency distributions of attribute data are commonly presented in bar charts.
  • In all these displays, the class intervals are drawn along the horizontal axis, and the absolute or relative frequencies along the vertical axis.


28/64


29/64


30/64

Run Chart: Importance of Time-Ordering in Analytic Studies

• In analytic studies we want to be able to detect trends or other patterns over time to take action on a process in the near future.
• In a run chart (also called a tier chart), this information is preserved by plotting the observed values on the vertical axis and the times they were observed on the horizontal axis.


31/64

Graphical Displays of Measurement Data with Time


32/64


33/64

• Measures of Central Tendency
  • Mean
  • Median
  • Mode


34/64

• The Mean
  • In trying to convey the underlying character of variables data by somehow representing the typical value of the data, the most common numerical representation is the arithmetic average or mean: the sum of the numerical values of the measurements divided by the number of items examined.


35/64

In an enumerative study, if the items constitute a frame, the average is called the population mean and is usually denoted by the Greek letter $\mu$ (pronounced "mew"). When the items constitute a sample drawn from a frame, we call the average a sample mean and denote it as $\bar{x}$ ("x bar"). Thus, in an enumerative study, we might make reference to either $\mu$ or $\bar{x}$.


36/64

• In an analytic study, there is no population (future output does not yet exist), and hence we cannot describe a population mean. The mean of a sampled subgroup (e.g., four items from a day's production) is denoted as $\bar{x}$, and when we average the $\bar{x}$ values to calculate a process mean, we denote that value as $\bar{\bar{x}}$ ("x bar-bar" or x double-bar).


37/64

• The sample mean, $\bar{x}$, can be calculated as

$$\bar{x} = \frac{\sum x}{n}$$

• where n is the size of the sample or the number of items included in the determination of the sample mean.


38/64

• If we calculate sample means for subgroups of size n in the past and present, we can calculate the process mean as the average of these subgroup means:

$$\bar{\bar{x}} = \frac{\sum \bar{x}}{\text{Number of subgroups}}$$


39/64

$$\bar{\bar{x}} = \frac{\sum \bar{x}}{\text{Number of subgroups}} = \frac{168.68}{28} = 6.02$$
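The two-step calculation (a mean per subgroup, then the mean of those means) can be sketched as below. The three subgroups of size n = 4 are hypothetical values invented for illustration; they are not the 28 subgroups behind the figure on the slide.

```python
from statistics import mean

# Hypothetical subgroups of size n = 4 (assumed data, for illustration only).
subgroups = [
    [6.1, 5.9, 6.0, 6.4],
    [5.8, 6.0, 5.9, 5.9],
    [6.0, 6.2, 5.9, 5.9],
]

subgroup_means = [mean(group) for group in subgroups]      # one x-bar per subgroup
process_mean = sum(subgroup_means) / len(subgroup_means)   # x double-bar
print([round(m, 2) for m in subgroup_means], round(process_mean, 2))
```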


40/64

• The Median
  • The median is the middle value when the data are arranged in ascending order. When there is an even number of observations, the median value is the arithmetic average of the middle two values.
  • Because the median is the middle value, about 50 percent of the data points have values less than the median.
  • The median is not as influenced by the magnitude of the extreme items as is the mean.
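The following sketch illustrates the median for an odd and an even number of observations, and shows how one extreme value moves the mean but not the median. The data values are illustrative assumptions.

```python
from statistics import mean, median

odd_count = [5.0, 7.8, 7.9, 8.0, 8.8]          # middle (third) value is 7.9
even_count = [5.0, 5.3, 8.0, 9.2, 10.0, 10.5]  # middle two values are 8.0 and 9.2

median_odd = median(odd_count)
median_even = median(even_count)  # average of 8.0 and 9.2

# Replace the largest value with an extreme one (assumed for illustration).
with_extreme = even_count[:-1] + [105.0]
print(median_odd, median_even)
print(median(with_extreme), mean(with_extreme))  # median unchanged, mean pulled up
```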


41/64

• The interpretation of the median in an analytic study, however, can be misleading.
• Where extreme data points are observed in a sample, we have seen that they do not affect the computation of the median.
• In an analytic study, where we are concerned with the process itself, the existence of extreme data points is a critical factor in our analysis. That is, extreme data points may indicate process disturbances and instability, and the need for corrective action on the process.
• A measure like the median, which is insensitive to extreme data points, must be used with caution in an analytic study.


42/64

• The Mode
  • The mode of a distribution is the value that occurs most frequently, or the value corresponding to the highest point on a frequency polygon or histogram.
  • Like the median, and unlike the mean, it is not affected by extreme data points.
  • A frequency distribution with one such high point is called unimodal.
  • Distributions with two high points of concentration are called bimodal.
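A small sketch of finding modes with Python's standard library follows. The two data sets are invented for illustration. Note that `multimode` only reports values tied for the highest count, whereas a bimodal distribution in the sense above may have two peaks of unequal height.

```python
from statistics import multimode

# Hypothetical data sets (assumed values, for illustration only).
unimodal = [7.8, 8.0, 8.0, 8.0, 8.1, 8.4]
bimodal = [5.0, 5.0, 5.0, 8.0, 10.5, 10.5, 10.5]

print(multimode(unimodal))  # [8.0]
print(multimode(bimodal))   # [5.0, 10.5]
```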


43/64

• As with the median, an important characteristic of the mode is that it is not affected by extreme data points. In analytic studies, where extreme data points may reveal a great deal about the process under investigation, the mode should therefore be used with caution.


44/64

• The Proportion
  • When attribute data are classified into one of two conditions (e.g., defective or not defective), the proportion or fraction of the data possessing one of the two conditions is a meaningful measure of central tendency.
  • p = x/n
  • where x is the number of defective items and n is the total number of items in the sample.
  • Thus, with x = 8 and n = 38, p = 8/38 = 0.21


45/64

Measures of Variability

• All populations and processes have some degree of variability, given appropriate sensitivity of the measuring instrument; not all items in a population or process are identical. Thus we must be able to quantify not only the central tendency but also the degree of variability in a set of data.
• The two commonly used quantitative measures of such variability are the range and the standard deviation.


46/64

• The Range
  • The range is the simplest measure of dispersion; for raw data from an enumerative or an analytic study, it is defined as the difference between the largest data point and the smallest data point in a set of data:

$$R = x_{\max} - x_{\min}$$


47/64

Item No.     Process A    Process B    Process C
1             5.0          5.0          7.6
2             5.3          7.8          7.8
3             8.0          7.9          8.0
4             9.2          8.0          8.1
5            10.0          8.8          8.1
6            10.5         10.5          8.4
$\bar{x}$     8.0          8.0          8.0
Range         5.5          5.5          0.8

Weights (in grams) from three manufacturing processes
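The ranges in this table can be checked with a few lines of code. The sketch uses the weights listed for Processes A, B, and C; the function name `data_range` is an illustrative choice.

```python
# Weights (in grams) for the three processes, as listed in the table.
processes = {
    "A": [5.0, 5.3, 8.0, 9.2, 10.0, 10.5],
    "B": [5.0, 7.8, 7.9, 8.0, 8.8, 10.5],
    "C": [7.6, 7.8, 8.0, 8.1, 8.1, 8.4],
}

def data_range(values):
    """Return R = largest data point minus smallest data point."""
    return max(values) - min(values)

ranges = {name: round(data_range(values), 1) for name, values in processes.items()}
print(ranges)  # {'A': 5.5, 'B': 5.5, 'C': 0.8}
```

Processes A and B have the same range even though their interior values are spread quite differently, because the range uses only the two extreme points.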


48/64

• The Standard Deviation
  • The standard deviation as a measure of dispersion takes into account each of the data points and their distances from the mean.
  • The more dispersed the data points, the larger the standard deviation will be; the closer the data points to the mean, the smaller the standard deviation will be.


49/64

• The Standard Deviation
  • In an enumerative study, the population standard deviation is computed as:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$


50/64

The Standard Deviation

• For both enumerative and analytic studies, we calculate the standard deviation of a sample (or subgroup) of n observations, called the sample standard deviation, s:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$


51/64

Item No.     Process A    Process B    Process C
1             5.0          5.0          7.6
2             5.3          7.8          7.8
3             8.0          7.9          8.0
4             9.2          8.0          8.1
5            10.0          8.8          8.1
6            10.5         10.5          8.4
$\bar{x}$     8.0          8.0          8.0
s             2.37         1.79         0.28

Weights (in grams) from three manufacturing processes
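The sample standard deviations in this table can be reproduced with the standard library, whose `statistics.stdev` divides by n - 1 as in the sample formula. The sketch uses the weights listed for the three processes.

```python
from statistics import stdev

# Weights (in grams) for the three processes, as listed in the table.
processes = {
    "A": [5.0, 5.3, 8.0, 9.2, 10.0, 10.5],
    "B": [5.0, 7.8, 7.9, 8.0, 8.8, 10.5],
    "C": [7.6, 7.8, 8.0, 8.1, 8.1, 8.4],
}

# stdev() is the sample standard deviation (divisor n - 1).
sample_sd = {name: round(stdev(values), 2) for name, values in processes.items()}
print(sample_sd)  # {'A': 2.37, 'B': 1.79, 'C': 0.28}
```

All three processes share a mean of 8.0 grams, so the differences in s reflect spread alone.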


52/64

• Measures of Shape
  • Skewness
  • Kurtosis


53/64

• Measures of Shape
  • Skewness
    • Skewness is the lack of symmetry of a set of data.
    • A numerical measure of skewness, Pearson's coefficient of skewness, is defined as:

$$\text{Skewness}_p = \frac{3(\bar{x} - M_e)}{s}$$

where $M_e$ is the median.

[Figure: three distributions, with Skewness = 0, Skewness > 0, and Skewness < 0]


54/64

• Measures of Shape
  • Skewness
    • Another way to view skewness is as a measure of the relative sizes of the tails of the distribution.
    • In symmetric distributions (where the two tails are the same), the coefficient of skewness will be zero.
    • In skewed distributions (where the difference between the frequencies in the two tails is large), the magnitude of the coefficient of skewness will be large.
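As a rough sketch, Pearson's coefficient of skewness (three times the difference between the mean and the median, divided by the sample standard deviation) can be computed as below. The three data sets are invented to show a zero, a positive, and a negative coefficient; the function name is an illustrative choice.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's coefficient: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

# Hypothetical data sets (assumed values, for illustration only).
symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 2, 2, 3, 10]
long_left_tail = [-10, -3, -2, -2, -1]

for name, data in [("symmetric", symmetric),
                   ("long right tail", long_right_tail),
                   ("long left tail", long_left_tail)]:
    print(name, round(pearson_skewness(data), 3))
```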


55/64

• Measures of Shape
  • Kurtosis
    • Kurtosis measures peakedness. A distribution with a relatively high concentration of data in the middle and at the tails, but low concentration in the shoulders, has a large kurtosis; one that is relatively flat in the middle, with fat shoulders and thin tails, has little kurtosis.
    • A numerical measure of kurtosis is given by:

$$\text{Kurtosis} = \frac{\sum (x - \bar{x})^4 / n}{s^4} - 3$$


56/64

Kurtosis


57/64

• Interpretation of the Standard Deviation
  • The standard deviation is interpreted by determining the proportion of data that lies within k standard deviations from the mean for a distribution.
  • This proportion is directly a function of the shape of the distribution.
  • In an analytic study, the distribution must be stable (only common causes of variation) to interpret the standard deviation.
  • There are four classic scenarios for distributions of data:
    • normal (bell-shaped) distribution
    • skewed to the right distribution
    • skewed to the left distribution
    • unknown distribution.


58/64

• Normal Distribution.
  • In the case of data that is normally distributed, the probability of obtaining a random data point within k standard deviations of the mean is:
    • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) = 0.6826$
    • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9544$
    • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$
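These probabilities can be checked numerically with the error function, since for any normal distribution the probability of falling within k standard deviations of the mean equals erf(k/√2). The function name below is illustrative. The computed values for k = 1 and k = 2 come out as 0.6827 and 0.9545, one digit off the figures above in the fourth decimal place, which is consistent with the slide values having been built from rounded table entries (an assumption about their origin).

```python
import math

def normal_within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

within = {k: round(normal_within_k_sigma(k), 4) for k in (1, 2, 3)}
print(within)
```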


59/64

• Skewed Distribution (Right or Left).
  • In the case of data that is unimodal (and skewed to the right or the left), the probability of obtaining a random data point within k standard deviations from the mean is described by the Camp-Meidell inequality as follows:

$$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{2.25k^2}$$
60/64

• To summarize the equation, we can say that:
  • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) \geq 1 - 1/(2.25 \cdot 1^2) = 0.5556$
  • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) \geq 1 - 1/(2.25 \cdot 2^2) = 0.8889$
  • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) \geq 1 - 1/(2.25 \cdot 3^2) = 0.9506$


61/64

• Unknown Distribution
  • In the case of data for which the distribution is unknown, the probability of obtaining a random data point within k standard deviations from the mean, assuming that $k \geq 1$, is described by Chebyshev's inequality as follows:

$$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}$$

  • To summarize the equation, we can say that:
    • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) \geq 1 - 1/1^2 = 0.0000$
    • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) \geq 1 - 1/2^2 = 0.7500$
    • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) \geq 1 - 1/3^2 = 0.8889$
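The lower bounds from the Camp-Meidell inequality (1 - 1/(2.25k²), for unimodal distributions) and from Chebyshev's inequality (1 - 1/k², for any distribution) can be tabulated side by side. This is a small sketch; the function names are illustrative.

```python
def camp_meidell_bound(k):
    """Lower bound on P(|X - mu| < k*sigma) for a unimodal distribution."""
    return 1 - 1 / (2.25 * k ** 2)

def chebyshev_bound(k):
    """Lower bound on P(|X - mu| < k*sigma) for any distribution, k >= 1."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(camp_meidell_bound(k), 4), round(chebyshev_bound(k), 4))
```

For every k the Camp-Meidell bound is the larger of the two, reflecting the extra information that the distribution is unimodal.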


62/64

• If the standard deviation of a distribution is small, we do not have to go very far on either side of the mean to include a large portion of the data in the distribution, even if we do not know anything about the shape of the distribution.


63/64

• More Details on the Normal Distribution
  • We often wish to calculate the probabilities under the normal distribution between two values of the random variable X, or the probabilities under the normal distribution above and/or below one value of the random variable X.


64/64

• For example, the figure below shows the probability of selecting a value of X between $x_1$ and $x_2$, given a stable normal distribution with a mean of $\mu$ and a standard deviation of $\sigma$.
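One way to compute such a probability is as the difference of two cumulative probabilities. The sketch below uses Python's `statistics.NormalDist`; the mean of 8.0 grams, the standard deviation of 0.5 grams, and the limits 7.5 and 8.5 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def normal_probability_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for a normal distribution with mean mu and std dev sigma."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(x2) - dist.cdf(x1)

# Hypothetical stable process: mean 8.0 grams, standard deviation 0.5 grams.
p_between = normal_probability_between(7.5, 8.5, mu=8.0, sigma=0.5)
p_below = NormalDist(mu=8.0, sigma=0.5).cdf(7.5)       # P(X < 7.5)
p_above = 1 - NormalDist(mu=8.0, sigma=0.5).cdf(8.5)   # P(X > 8.5)
print(round(p_between, 4), round(p_below, 4), round(p_above, 4))
```

Because the limits here sit exactly one standard deviation on either side of the mean, the three probabilities sum to 1 and the middle one matches the k = 1 case for the normal distribution.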
