Chapter 5 Quality Control
8/9/2019 Chapter 5 Quality Control
Chapter 5
Basic Probability and Statistics
Basic Probability and Statistics
List of Sections
• Introduction
• Probability Defined
• Types of Data
• Characteristics of Data
• Visually Describing Data
• Numerically Describing Data
• Take-Away Knowledge
• Summary
• Key Terms
• Exercises
• References
• Appendix A5.1: Using Windows
• Appendix A5.2: Introduction to Minitab
• Appendix A5.3: Using Minitab for Charts, Descriptive Statistics, and Normal Probabilities
Chapter 5: Basic Probability and Statistics
You will be able to:
• Define probability
• Define attribute and measurement variable data
• Discuss visual displays of data
• Discuss numerical methods of describing data
• Interpret the standard deviation as a measure of variation
• Calculate probabilities under the normal distribution
Introduction
In this chapter we look at:
• the issues of quantifying probabilities and examining characteristics of data taken from a population or process;
• the types of data that may be encountered and how they are classified;
• the methods for visually displaying data; and
• the calculation of numerical measures to describe data.
Probability Defined
There are three popular definitions of probability:
• the classical definition,
• the relative frequency definition, and
• the Bayesian definition (not discussed here).
The Classical Definition of Probability
The classical approach to probability states that if an experiment has N equally likely and mutually exclusive outcomes, and if n of those outcomes correspond to the occurrence of event A, then the probability of event A is

P(A) = n/N

where:
• n = the number of experimental outcomes that correspond to the occurrence of event A, and
• N = the total number of experimental outcomes.
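As a minimal sketch, the classical definition can be applied to the die-tossing example by listing the six equally likely faces of a fair die and counting the faces in the event "toss an even number". The helper name `classical_probability` is illustrative only, and `Fraction` is used so that n/N stays exact.

```python
from fractions import Fraction

def classical_probability(outcomes, event):
    """Illustrative helper: P(A) = n/N over equally likely, mutually exclusive outcomes."""
    outcomes = list(outcomes)
    n = sum(1 for outcome in outcomes if event(outcome))  # outcomes in which A occurs
    N = len(outcomes)                                     # all possible outcomes
    return Fraction(n, N)

die_faces = range(1, 7)
p_even = classical_probability(die_faces, lambda face: face % 2 == 0)
print(p_even, float(p_even))  # 1/2 0.5
```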
The Relative Frequency Definition of Probability
The relative frequency approach to probability states that if an experiment is conducted a large number of times (say M times), then the probability of event A occurring is

P(A) = k/M

where:
• k = the number of times A occurred during these experiments, and
• M = the maximum number of times that event A could have occurred during these experiments.
For example, suppose in the die-tossing example that a fair die was tossed 100,000 times and that a 2, 4, or 6 appeared in 50,097 throws. Consequently, the relative frequency probability of tossing an even number on a fair die is

P(Even) = 50,097/100,000 = 0.50097
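The relative frequency definition can be illustrated by simulating the experiment rather than enumerating outcomes. This is a sketch under the assumption that Python's pseudo-random generator is an acceptable stand-in for a fair die; the seed is arbitrary and only makes the run reproducible, and `relative_frequency` is an illustrative helper name.

```python
import random

def relative_frequency(trials, event, rng):
    """Illustrative helper: estimate P(A) as k/M, with k = times A occurred in M = trials tosses."""
    k = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return k / trials

rng = random.Random(2019)  # arbitrary seed so the run is reproducible
p_even = relative_frequency(100_000, lambda face: face % 2 == 0, rng)
print(p_even)  # close to 0.5, but generally not exactly 0.5
```

As with the 50,097 even throws in the text, the simulated value lands near, but usually not exactly on, the classical value of 0.5.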
The classical probability of an even toss (3/6 = 0.5) and the relative frequency probability of an even toss (0.50097) differ as a result of the methods used in their respective calculations.
Analytic Studies and Relative Frequency Probabilities
• Analytic studies are conducted to determine process characteristics. But process characteristics have a past and present and will have a future; hence there is no frame from which classical probabilities can be calculated.
• Probabilities concerning process characteristics must be obtained empirically, through experimentation, and must therefore be relative frequency probabilities.
Types of Data
• Data is information collected about a product, service, process, person, or machine.
• We classify data into two types:
  • attribute data
  • variables (measurement) data.
Attribute Data
Attribute data arise:
• from the classification of items, such as products or services, into categories (e.g., conforming or nonconforming);
• from counts of the number of items in a given category or the proportion in a given category (e.g., the proportion of defective units in a sample of units);
• from counts of the number of occurrences per unit (e.g., the number of defects per unit).
Grievance Number   Grievance Level
1                  1
2                  1
3                  1
4                  2
5                  4
6                  1
7                  1
8                  1
9                  3
10                 1
11                 1
Classification of 2003 Union Grievances into Four Categories
Category       Number of Grievances   Proportion of Grievances
First Level    8                      8/11 = 0.73
Second Level   1                      1/11 = 0.09
Third Level    1                      1/11 = 0.09
Fourth Level   1                      1/11 = 0.09
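The summary table is a frequency count of the grievance levels followed by a division by the total. A short sketch using `collections.Counter`, with the eleven levels copied from the classification table:

```python
from collections import Counter

# Grievance level of grievances 1 through 11, as listed in the classification table
levels = [1, 1, 1, 2, 4, 1, 1, 1, 3, 1, 1]

counts = Counter(levels)  # frequency of each grievance level
n = len(levels)           # total number of grievances
for level in sorted(counts):
    print(f"Level {level}: {counts[level]} grievance(s), proportion {counts[level] / n:.2f}")
```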
Variables (Measurement) Data
Variables data arise:
• from the measurement of a characteristic of a product, service, or process; and
• from the computation of a numerical value from two or more measurements of variables data.
Characterizing Data
Enumerative Studies
• A complete census of the frame in an enumerative study provides all the information needed to take action on the frame.
• When action is instead based on a sample, the information is subject to sampling error. If the sample is a random sample from the frame, these errors can be quantified, and valid statistical inferences can be made on the frame in question.
Analytic Studies
• A complete census of the frame is impossible in an analytic study, because the frame consists of all past, present, and future observations, and future observations cannot be measured.
• As we are dealing with an ongoing process, we wish to characterize the data to take action on that process in the future.
• Unlike in an enumerative study, the frame is unknown (the future cannot be measured), so it is not possible to quantify sampling errors.
• Any inferences we make in an analytic study are conditional on the environmental state when the sample was selected; that environmental state will never again exist.
• Information on such a problem can never be complete.
• If, however, knowledge of the process and the environment and an analysis of the data indicate that the process is stable and predictable, and will remain so in the near future, the visual and numerical characterizations discussed in this chapter can be used to make inferences and take action in the near future.
Visually Describing Data
Tabular Displays
• Frequency Distributions
  • A frequency distribution shows, in tabular form, the number of times, or the frequency with which, a given value or group of values occurs.
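A frequency distribution for measurement data groups the values into class intervals and counts each group. The sketch below assumes equal-width intervals of the form "lower to under upper" and uses hypothetical weights; `frequency_distribution` is an illustrative helper, not a procedure from the chapter.

```python
import math
from collections import Counter

def frequency_distribution(data, start, width):
    """Illustrative helper: (lower, upper, frequency) for classes [lower, upper) of equal width."""
    # Index of the class interval each observation falls into
    counts = Counter(math.floor((x - start) / width) for x in data)
    first, last = min(counts), max(counts)
    # Include empty classes between the first and last occupied ones
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(first, last + 1)]

# Hypothetical weights (grams), for illustration only
data = [5.0, 5.3, 7.6, 7.8, 7.8, 7.9, 8.0, 8.0, 8.0, 8.1, 8.1, 8.4, 8.8, 9.2, 10.0, 10.5]
for lower, upper, freq in frequency_distribution(data, start=5.0, width=1.0):
    print(f"{lower:4.1f} to under {upper:4.1f}: {freq}")
```

Because the class index comes from floating-point division, a value lying exactly on a class boundary could in principle be rounded into the neighbouring class; choosing boundaries that fall between recorded values avoids that.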
Tabular Displays of Attribute Data
Tabular Displays of Measurement Data (top panel) and Attribute Data (bottom panel)
Limitations of Frequency Displays
• It is important to note that the frequency displays we have discussed do not include information on the time-ordering of data.
• In an analytic study, where we examine the sample to take action on the process, a frequency display would fail to show trends that may be occurring over time.
• This loss of information can be critical.
Graphical Displays
• Data are often represented in graphical form.
  • Frequency distributions of measurement (variables) data are commonly presented in frequency polygons or histograms.
  • Frequency distributions of attribute data are commonly presented in bar charts.
• In all these displays, the class intervals (or, for attribute data, the categories) are drawn along the horizontal axis, and the absolute or relative frequencies along the vertical axis.
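Without a plotting library, a bar chart of attribute data can still be sketched as text. The example uses the grievance frequencies from the category table; note that the bars run horizontally here (categories down the side, frequencies across), which is rotated relative to the convention described above purely to suit text output. `text_bar_chart` is an illustrative helper.

```python
def text_bar_chart(frequencies, symbol="#"):
    """Illustrative helper: one line per category, with a bar whose length equals the frequency."""
    width = max(len(label) for label in frequencies)  # pad labels so the bars line up
    return [f"{label:<{width}} | {symbol * count} ({count})"
            for label, count in frequencies.items()]

grievances = {"First Level": 8, "Second Level": 1, "Third Level": 1, "Fourth Level": 1}
for line in text_bar_chart(grievances):
    print(line)
```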
Run Chart: Importance of Time-Ordering in Analytic Studies
• In analytic studies we want to be able to detect trends or other patterns over time to take action on a process in the near future.
• In a run chart (also called a tier chart), this information is preserved by plotting the observed values on the vertical axis and the times they were observed on the horizontal axis.
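The point about time-ordering can be made concrete: two series with identical frequency distributions can behave very differently over time. In this sketch the data are hypothetical, `run_chart` is an illustrative helper, and, to fit text output, time runs down the page instead of along the horizontal axis.

```python
from collections import Counter

def run_chart(values, low, high, width=30):
    """Illustrative helper: one row per time period, with '*' placed according to the value."""
    rows = []
    for t, x in enumerate(values, start=1):
        pos = round((x - low) / (high - low) * (width - 1))  # column for this value
        rows.append(f"t={t:2d} {x:5.1f} |" + " " * pos + "*")
    return rows

# Hypothetical measurements: the same six values observed in two different time orders
order_a = [8.0, 7.9, 8.1, 7.8, 8.2, 8.0]
order_b = sorted(order_a)  # steadily increasing over time

# A frequency display cannot tell the two series apart...
print(Counter(order_a) == Counter(order_b))  # True
# ...but the run chart of order_b shows the upward drift
for row in run_chart(order_b, low=7.5, high=8.5):
    print(row)
```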
Graphical Displays of Measurement Data with Time
Measures of Central Tendency
• Mean
• Median
• Mode
The Mean
• In trying to convey the underlying character of variables data by representing the typical value of the data, the most common numerical representation is the arithmetic average or mean: the sum of the numerical values of the measurements divided by the number of items examined.
In an enumerative study, if the items constitute a frame, the average is called the population mean and is usually denoted by the Greek letter $\mu$ (pronounced "mew"). When the items constitute a sample drawn from a frame, we call the average a sample mean and denote it as $\bar{x}$ ("x bar"). Thus, in an enumerative study, we might make reference to either $\mu$ or $\bar{x}$.
• In an analytic study, there is no population (future output does not yet exist), and hence we cannot describe a population mean. The mean of a sampled subgroup (e.g., four items from a day's production) is denoted as $\bar{x}$, and when we average the subgroup means to calculate a process mean, we denote that value as $\bar{\bar{x}}$ ("x bar-bar" or "x double-bar").
• The sample mean, $\bar{x}$, can be calculated as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $n$ is the size of the sample, or the number of items included in the determination of the sample mean.
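The sample-mean formula translates directly into code. This sketch applies it to the Process A weights from the three-process weight table in this chapter; `sample_mean` is an illustrative helper (the standard library's `statistics.mean` gives the same result).

```python
def sample_mean(xs):
    """Illustrative helper: x-bar = (sum of the n observations) / n."""
    return sum(xs) / len(xs)

# Process A weights (grams) from the weights table in this chapter
process_a = [5.0, 5.3, 8.0, 9.2, 10.0, 10.5]
print(round(sample_mean(process_a), 2))  # 8.0
```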
• If we calculate sample means for subgroups of size $n$ in the past and present, we can calculate the process mean as the average of these subgroup means:

$$\bar{\bar{x}} = \frac{\sum \bar{x}}{\text{number of subgroups}}$$
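For example, with 28 subgroup means summing to 168.68:

$$\bar{\bar{x}} = \frac{\sum \bar{x}}{\text{number of subgroups}} = \frac{168.68}{28} = 6.02$$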
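The process mean is a two-step average: first each subgroup mean, then the mean of those means. The 28 subgroups behind the 168.68 total are not listed, so the subgroups below are hypothetical; `process_mean` is an illustrative helper. The last line only checks the arithmetic of the worked example.

```python
def process_mean(subgroups):
    """Illustrative helper: x-double-bar, the average of the subgroup means x-bar."""
    subgroup_means = [sum(g) / len(g) for g in subgroups]
    return sum(subgroup_means) / len(subgroup_means)

# Hypothetical subgroups of size n = 4 (e.g., four items from each day's production)
subgroups = [
    [6.0, 6.1, 5.9, 6.0],
    [6.2, 6.0, 6.1, 5.9],
    [5.8, 6.0, 6.1, 6.1],
]
x_double_bar = process_mean(subgroups)
print(round(x_double_bar, 3))  # 6.017

# Arithmetic of the worked example: 28 subgroup means summing to 168.68
print(round(168.68 / 28, 2))   # 6.02
```

When all subgroups have the same size, as here, the average of the subgroup means equals the average of all the individual observations.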
The Median
• The median is the middle value when the data are arranged in ascending order. When there is an even number of observations, the median is the arithmetic average of the two middle values.
• Because the median is the middle value, at least 50 percent of the data points have values less than or equal to the median, and at least 50 percent have values greater than or equal to it.
• The median is not as influenced by the magnitude of extreme items as is the mean.
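The contrast between the median and the mean shows up as soon as one value becomes extreme. This sketch uses the Process B weights from the weight table, then a hypothetical variant in which the largest weight is recorded as 50.5 instead of 10.5; `statistics.median` averages the two middle values when n is even, as described above.

```python
from statistics import mean, median

# Process B weights (grams) from the weights table in this chapter
process_b = [5.0, 7.8, 7.9, 8.0, 8.8, 10.5]
print(round(median(process_b), 2), round(mean(process_b), 2))        # 7.95 8.0

# Hypothetical: the largest weight recorded as 50.5 instead of 10.5
with_extreme = process_b[:-1] + [50.5]
print(round(median(with_extreme), 2), round(mean(with_extreme), 2))  # 7.95 14.67
```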
• The interpretation of the median in an analytic study, however, can be misleading.
• Where extreme data points are observed in a sample, we have seen that their magnitude does not affect the computation of the median.
• In an analytic study, where we are concerned with the process itself, the existence of extreme data points is a critical factor in our analysis. That is, extreme data points may indicate process disturbances and instability, and the need for corrective action on the process.
• A measure like the median, which is insensitive to extreme data points, must be used with caution in an analytic study.
The Mode
• The mode of a distribution is the value that occurs most frequently, or the value corresponding to the highest point on a frequency polygon or histogram.
• Like the median, and unlike the mean, it is not affected by extreme data points.
• A frequency distribution with one such high point is called unimodal.
• Distributions with two high points of concentration are called bimodal.
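For raw data, the mode can be found with `statistics.multimode`, which returns every value tied for the highest frequency. This is a simplification of the chapter's idea: a bimodal distribution has two peaks of concentration, which need not be exact ties, whereas `multimode` only reports exact ties. The first list is the Process C weights from the weight table; the second is hypothetical.

```python
from statistics import multimode

# Process C weights (grams) from the weights table in this chapter
process_c = [7.6, 7.8, 8.0, 8.1, 8.1, 8.4]
print(multimode(process_c))              # [8.1]  (one most frequent value)

# Hypothetical data with two values tied for the highest frequency
print(multimode([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]  (two high points)
```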
• Like the median, the mode has the important characteristic that it is not affected by extreme data points. In analytic studies, where extreme data points may reveal a great deal about the process under investigation, the mode should therefore be used with caution.
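The Proportion
• When each item in a set of attribute data possesses one of two conditions (e.g., defective or nondefective), the proportion or fraction of the data possessing one of those conditions is a meaningful measure of central tendency:
  p = x/n
  where x is the number of defective items and n is the total number of items in the sample.
• For example, for a sample of 38 items containing 8 defective items, p = 8/38 = 0.21.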
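Computing p amounts to counting the items with the condition and dividing by the sample size. The inspection results below are hypothetical, constructed only to match the 8-out-of-38 example; `proportion` is an illustrative helper.

```python
def proportion(items, has_condition):
    """Illustrative helper: p = x/n, x items with the condition out of n items in the sample."""
    x = sum(1 for item in items if has_condition(item))
    return x / len(items)

# Hypothetical inspection results for a sample of 38 items: 8 defective, 30 conforming
sample = ["defective"] * 8 + ["conforming"] * 30
p = proportion(sample, lambda item: item == "defective")
print(round(p, 2))  # 0.21
```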
Measures of Variability
• All populations and processes have some degree of variability, given appropriate sensitivity of the measuring instrument; not all items in a population or process are identical. Thus we must be able to quantify not only the central tendency but also the degree of variability in a set of data.
• The two commonly used quantitative measures of such variability are the range and the standard deviation.
The Range
• The range is the simplest measure of dispersion; for raw data from an enumerative or an analytic study, it is defined as the difference between the largest data point and the smallest data point in a set of data:

$$R = x_{\max} - x_{\min}$$
Item No.   Process A   Process B   Process C
1          5.0         5.0         7.6
2          5.3         7.8         7.8
3          8.0         7.9         8.0
4          9.2         8.0         8.1
5          10.0        8.8         8.1
6          10.5        10.5        8.4
Mean       8.0         8.0         8.0
Range      5.5         5.5         0.8
Weights (in grams) from three manufacturing processes
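The range row of the weight table can be reproduced directly from the formula. The data are copied from the table; `data_range` is an illustrative helper. Processes A and B come out with the same range even though their values are spread differently between the extremes, since the range looks only at the two extreme points.

```python
weights = {
    "A": [5.0, 5.3, 8.0, 9.2, 10.0, 10.5],
    "B": [5.0, 7.8, 7.9, 8.0, 8.8, 10.5],
    "C": [7.6, 7.8, 8.0, 8.1, 8.1, 8.4],
}

def data_range(xs):
    """Illustrative helper: R = x_max - x_min."""
    return max(xs) - min(xs)

for process, xs in weights.items():
    print(process, round(data_range(xs), 1))  # A 5.5, B 5.5, C 0.8
```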
The Standard Deviation
• The standard deviation, as a measure of dispersion, takes into account each of the data points and its distance from the mean.
• The more dispersed the data points, the larger the standard deviation; the closer the data points are to the mean, the smaller the standard deviation.
• In an enumerative study, the population standard deviation is computed as:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
• For both enumerative and analytic studies, we calculate the standard deviation of a sample (or subgroup) of $n$ observations, called the sample standard deviation, $s$:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
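The two standard deviation formulas differ only in their divisor: N for a complete frame and n − 1 for a sample. This sketch implements both by hand (the helper names are illustrative; the standard library's `statistics.pstdev` and `statistics.stdev` compute the same quantities). Treating the six Process A weights as a complete frame is done only to show the effect of the divisor.

```python
import math

def population_sd(xs):
    """Illustrative helper: sigma = sqrt( sum((x - mu)^2) / N ) for a complete frame."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def sample_sd(xs):
    """Illustrative helper: s = sqrt( sum((x - x_bar)^2) / (n - 1) ) for a sample or subgroup."""
    x_bar = sum(xs) / len(xs)
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1))

process_a = [5.0, 5.3, 8.0, 9.2, 10.0, 10.5]
print(round(sample_sd(process_a), 2))      # 2.37
print(round(population_sd(process_a), 2))  # 2.16
```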
Item No.   Process A   Process B   Process C
1          5.0         5.0         7.6
2          5.3         7.8         7.8
3          8.0         7.9         8.0
4          9.2         8.0         8.1
5          10.0        8.8         8.1
6          10.5        10.5        8.4
Mean       8.0         8.0         8.0
s          2.37        1.79        0.28
Weights (in grams) from three manufacturing processes
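The s row of this table can be checked with `statistics.stdev`, which uses the n − 1 divisor of the sample standard deviation. Unlike the range, s separates Process A from Process B, because it uses every data point rather than only the two extremes.

```python
from statistics import stdev

weights = {
    "A": [5.0, 5.3, 8.0, 9.2, 10.0, 10.5],
    "B": [5.0, 7.8, 7.9, 8.0, 8.8, 10.5],
    "C": [7.6, 7.8, 8.0, 8.1, 8.1, 8.4],
}

for process, xs in weights.items():
    print(f"Process {process}: s = {stdev(xs):.2f}")  # 2.37, 1.79, 0.28
```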
Measures of Shape
• Skewness
• Kurtosis
Measures of Shape: Skewness
• Skewness is the lack of symmetry of a set of data.
• A numerical measure of skewness, Pearson's coefficient of skewness, is defined as:

$$\text{Skewness} = \frac{3(\bar{x} - M_e)}{s}$$

where $M_e$ is the median.
[Figure: distributions with Skewness = 0, Skewness > 0, and Skewness < 0]
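Pearson's coefficient of skewness, 3 times (mean minus median) divided by s, needs only the three statistics already introduced. A sketch on the Process A weights from the weight table; `pearson_skewness` is an illustrative helper.

```python
from statistics import mean, median, stdev

def pearson_skewness(xs):
    """Illustrative helper: Pearson's coefficient of skewness, 3 * (x_bar - median) / s."""
    return 3 * (mean(xs) - median(xs)) / stdev(xs)

process_a = [5.0, 5.3, 8.0, 9.2, 10.0, 10.5]
print(round(pearson_skewness(process_a), 2))  # -0.76: mean below the median
```

For Process A the mean (8.0) lies below the median (8.6), so the coefficient is negative, indicating skew to the left.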
• Another way to view skewness is as a measure of the relative sizes of the tails of the distribution.
• In symmetric distributions (where the two tails are the same), the coefficient of skewness will be zero.
• In skewed distributions (where the difference between the frequencies in the two tails is large), the magnitude of the coefficient of skewness will be large.
Measures of Shape: Kurtosis
• Kurtosis is peakedness. A distribution with a relatively high concentration of data in the middle and at the tails, but low concentration in the shoulders, has a large kurtosis; one that is relatively flat in the middle, with fat shoulders and thin tails, has little kurtosis.
• A numerical measure of kurtosis is given by:

$$\text{Kurtosis} = \frac{\sum (x - \bar{x})^4 / n}{s^4} - 3$$
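A sketch of a kurtosis calculation, assuming the form (sum of (x minus x-bar) to the fourth power, divided by n) divided by s to the fourth power, minus 3. Subtracting 3 makes large samples from a normal distribution score near zero. The data are the Process A weights; `kurtosis` is an illustrative helper, and other texts and packages use slightly different divisors, so their values may differ somewhat.

```python
from statistics import mean, stdev

def kurtosis(xs):
    """Illustrative helper: (sum((x - x_bar)^4) / n) / s^4 - 3."""
    x_bar, s, n = mean(xs), stdev(xs), len(xs)
    fourth_moment = sum((x - x_bar) ** 4 for x in xs) / n
    return fourth_moment / s ** 4 - 3

process_a = [5.0, 5.3, 8.0, 9.2, 10.0, 10.5]
print(round(kurtosis(process_a), 2))  # -1.98: flat, little kurtosis
```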
Kurtosis
Interpretation of the Standard Deviation
• The standard deviation is interpreted by determining the proportion of data that lies within k standard deviations of the mean of a distribution.
• This proportion is directly a function of the shape of the distribution.
• In an analytic study, the distribution must be stable (only common causes of variation present) before the standard deviation can be interpreted.
• There are four classic scenarios for distributions of data:
  • normal (bell-shaped) distribution
  • distribution skewed to the right
  • distribution skewed to the left
  • unknown distribution.
Normal Distribution
• In the case of data that are normally distributed, the probability of obtaining a random data point within k standard deviations of the mean is:
  • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) = 0.6826$
  • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9544$
  • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$
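These normal-curve probabilities can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Computed directly, the first two come out as 0.6827 and 0.9545; the figures 0.6826 and 0.9544 are presumably obtained by doubling rounded table areas (2 × 0.3413 and 2 × 0.4772).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mu = 0, sigma = 1
for k in (1, 2, 3):
    p = z.cdf(k) - z.cdf(-k)  # P(mu - k*sigma < X < mu + k*sigma)
    print(k, round(p, 4))     # 1 0.6827, 2 0.9545, 3 0.9973
```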
Skewed Distribution (Right or Left)
• In the case of data that are unimodal (and skewed to the right or the left), the probability of obtaining a random data point within k standard deviations of the mean is described by the Camp-Meidel inequality as follows:

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{2.25k^2}$$
• To summarize the equation, we can say that:
  • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) \ge 1 - 1/(2.25 \cdot 1^2) = 0.5556$
  • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - 1/(2.25 \cdot 2^2) = 0.8889$
  • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) \ge 1 - 1/(2.25 \cdot 3^2) = 0.9506$
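The Camp-Meidel lower bounds are simple arithmetic, 1 minus 1/(2.25k²), so the summary values can be reproduced in a few lines; `camp_meidel_bound` is an illustrative helper.

```python
def camp_meidel_bound(k):
    """Illustrative helper: lower bound 1 - 1/(2.25 k^2) on P(mu - k*sigma < X < mu + k*sigma)."""
    return 1 - 1 / (2.25 * k ** 2)

for k in (1, 2, 3):
    print(k, round(camp_meidel_bound(k), 4))  # 1 0.5556, 2 0.8889, 3 0.9506
```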
Unknown Distribution
• In the case of data for which the distribution is unknown, the probability of obtaining a random data point within k standard deviations of the mean, assuming that $k \ge 1$, is described by Chebyshev's inequality as follows:

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

• To summarize the equation, we can say that:
  • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) \ge 1 - 1/1^2 = 0.0000$
  • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - 1/2^2 = 0.7500$
  • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) \ge 1 - 1/3^2 = 0.8889$
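Because Chebyshev's inequality holds for any distribution, it also holds for a data set treated as its own distribution, that is, using the mean and the standard deviation with divisor n (`statistics.pstdev`). The sketch checks the bound against the Process B weights from the weight table; `chebyshev_bound` and `share_within` are illustrative helpers.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Illustrative helper: lower bound 1 - 1/k^2 for any distribution, k >= 1."""
    return 1 - 1 / k ** 2

def share_within(xs, k):
    """Illustrative helper: fraction of the data strictly within k standard deviations of the mean."""
    mu, sigma = mean(xs), pstdev(xs)  # divisor n: the data set treated as its own distribution
    return sum(1 for x in xs if abs(x - mu) < k * sigma) / len(xs)

process_b = [5.0, 7.8, 7.9, 8.0, 8.8, 10.5]
for k in (1, 2, 3):
    print(k, round(share_within(process_b, k), 4), ">=", round(chebyshev_bound(k), 4))
```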
• If the standard deviation of a distribution is small, we do not have to go very far on either side of the mean to include a large portion of the data in the distribution, even if we do not know anything about the shape of the distribution.
More Details on the Normal Distribution
• We often wish to calculate the probabilities under the normal distribution between two values of the random variable X, or the probabilities under the normal distribution above and/or below one value of the random variable X.
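The three kinds of normal probability (between two values, below a value, above a value) can be sketched with `statistics.NormalDist`. The between-values helper standardizes with z = (x − μ)/σ, mirroring the table-based approach; the other two pass μ and σ to `NormalDist` directly. The process mean of 8.0 g and standard deviation of 0.3 g are hypothetical, and the helper names are illustrative.

```python
from statistics import NormalDist

def prob_between(x1, x2, mu, sigma):
    """Illustrative helper: P(x1 < X < x2) for normal X, via z = (x - mu) / sigma."""
    z = NormalDist()  # standard normal distribution
    return z.cdf((x2 - mu) / sigma) - z.cdf((x1 - mu) / sigma)

def prob_below(x, mu, sigma):
    """Illustrative helper: P(X < x)."""
    return NormalDist(mu, sigma).cdf(x)

def prob_above(x, mu, sigma):
    """Illustrative helper: P(X > x)."""
    return 1 - NormalDist(mu, sigma).cdf(x)

# Hypothetical stable process: mean 8.0 g, standard deviation 0.3 g
mu, sigma = 8.0, 0.3
print(round(prob_between(7.7, 8.3, mu, sigma), 4))  # within one sigma: 0.6827
print(round(prob_below(7.4, mu, sigma), 4))         # more than two sigma below: 0.0228
print(round(prob_above(8.6, mu, sigma), 4))         # more than two sigma above: 0.0228
```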
• For example, the figure below shows the probability of selecting a value of X between $x_1$ and $x_2$, given a stable normal distribution with a mean of $\mu$ and a standard deviation of $\sigma$.