Chapter 5 Quality Control

Upload: engineermq

Post on 30-May-2018


TRANSCRIPT

  • 8/9/2019 Chapter 5 Quality Control


    Chapter 5

    Basic Probability and Statistics


    Basic Probability and Statistics

    List of Sections

    • Introduction
    • Probability Defined
    • Types of Data
    • Characteristics of Data
    • Visually Describing Data
    • Numerically Describing Data
    • Take-Away Knowledge


    Chapter 5:

    Basic Probability and Statistics

    List of Sections

    • Summary
    • Key Terms
    • Exercises
    • References
    • Appendix A5.1: Using Windows
    • Appendix A5.2: Introduction to Minitab
    • Appendix A5.3: Using Minitab for charts, descriptive statistics and normal probabilities


    Chapter 5: Basic Probability and Statistics

    You will be able to:

    • Define probability
    • Define attribute and measurement variable data
    • Discuss visual displays of data
    • Discuss numerical methods of describing data
    • Interpret the standard deviation as a measure of variation
    • Calculate probabilities under the normal distribution


    Introduction

    In this chapter we look at:

    • the issues of quantifying probabilities and examining characteristics of data taken from a population or process;
    • the types of data that may be encountered and how they are classified;
    • the methods for visually displaying data;
    • the calculation of numerical measures to describe data.


    Probability Defined

    There are three popular definitions of probability:

    • the classical definition,
    • the relative frequency definition, and
    • the Bayesian definition (not discussed here).


    The Classical Definition of Probability

    • The classical approach to probability states that if an experiment has N equally likely and mutually exclusive outcomes, and if n of those outcomes correspond to the occurrence of event A, then the probability of event A is

          P(A) = n/N

      where:
      • n = the number of experimental outcomes that correspond to the occurrence of event A, and
      • N = the total number of experimental outcomes.


    The Relative Frequency Definition of Probability

    • The relative frequency approach to probability states that if an experiment is conducted a large number of times (say M times), then the probability of event A occurring is

          P(A) = k/M

      where:
      • k = the number of times A occurred during these experiments, and
      • M = the maximum number of times that event A could have occurred during these experiments.


    For example, suppose in the die-tossing example that a fair die was tossed 100,000 times and that a 2, 4, or 6 appeared in 50,097 throws. Consequently, the relative frequency probability of tossing an even number on a fair die is

          P(Even) = 50,097/100,000 = 0.50097


    The classical probability of an even toss (3/6 = 0.5) and the relative frequency probability of an even toss (0.50097) differ as a result of the methods used in their respective calculations.
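    The contrast between the two definitions can be sketched in a few lines of Python. This is only an illustration: the simulated die, the seed, and the variable names are assumptions, not part of the original slides, and a simulation will not reproduce the 50,097 count quoted above.

```python
import random
from fractions import Fraction

# Classical definition: n favourable outcomes out of N equally likely outcomes.
faces = [1, 2, 3, 4, 5, 6]
even_faces = [f for f in faces if f % 2 == 0]
p_classical = Fraction(len(even_faces), len(faces))  # n/N = 3/6

# Relative frequency definition: k occurrences of the event in M trials.
rng = random.Random(2018)  # hypothetical seed, used only for repeatability
M = 100_000
k = sum(1 for _ in range(M) if rng.choice(faces) % 2 == 0)
p_relative = k / M

print(f"classical P(Even)          = {float(p_classical):.5f}")
print(f"relative frequency P(Even) = {p_relative:.5f} ({k} of {M} tosses)")
```

    The classical value is fixed by counting outcomes, whereas the relative frequency value changes from one set of tosses to the next and only approximates 0.5.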


    Analytic Studies and Relative Frequency Probabilities

    • Analytic studies are conducted to determine process characteristics. But process characteristics have a past and present and will have a future; hence there is no frame from which classical probabilities can be calculated.
    • Probabilities concerning process characteristics must be obtained empirically, through experimentation, and must therefore be relative frequency probabilities.


    Types of Data

    • Data is information collected about a product, service, process, person, or machine.
    • We classify data into two types:
      • attribute data
      • variables (measurement) data.


    Attribute Data

    Attribute data arise:

    • From the classification of items, such as products or services, into categories (e.g., conforming or nonconforming).
    • From counts of the number of items in a given category or the proportion in a given category (e.g., the proportion of defective units in a sample of units).
    • From counts of the number of occurrences per unit (e.g., the number of defects per unit).


    Classification of 2003 Union Grievances into Four Categories

    Grievance Number    Grievance Level
    1                   1
    2                   1
    3                   1
    4                   2
    5                   4
    6                   1
    7                   1
    8                   1
    9                   3
    10                  1
    11                  1


    Category        Number of Grievances    Proportion of Grievances
    First Level     8                       8/11 = 0.73
    Second Level    1                       1/11 = 0.09
    Third Level     1                       1/11 = 0.09
    Fourth Level    1                       1/11 = 0.09
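    As a check on the tally, the counts and proportions can be reproduced from the eleven grievance levels listed earlier. The sketch below is illustrative only; the variable names are assumptions.

```python
from collections import Counter

# Grievance levels for grievances 1 through 11, in the order listed.
levels = [1, 1, 1, 2, 4, 1, 1, 1, 3, 1, 1]

counts = Counter(levels)   # number of grievances in each category
total = len(levels)        # 11 grievances in all
proportions = {level: counts[level] / total for level in sorted(counts)}

for level in sorted(counts):
    print(f"Level {level}: {counts[level]} grievances, "
          f"{counts[level]}/{total} = {proportions[level]:.2f}")
```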


    Variables (Measurement) Data

    Variables (measurement) data arise:

    • From the measurement of a characteristic of a product, service, or process, and
    • From the computation of a numerical value from two or more measurements of variables data.


    Characterizing Data

    • Enumerative Studies
      • A complete census of the frame in an enumerative study provides all the information needed to take action on the frame.
      • If the information used as the basis for action is instead a random sample from the frame, the resulting sampling errors can be quantified, and valid statistical inferences can be made about the frame in question.


    • Analytic Studies
      • A complete census of the frame is impossible in an analytic study, for a frame consists of all past, present, and future observations, and future observations cannot be measured.
      • As we are dealing with an ongoing process, we wish to characterize the data to take action on that process for the future.


    • Unlike in the enumerative study, the frame is unknown (the future cannot be measured), so it is not possible to quantify the errors made in drawing inferences.
    • Any inferences we make in an analytic study are conditional on the environmental state when the sample was selected; that environmental state will never again exist.
    • Information on such a problem can never be complete.
    • If, however, knowledge of the process and the environment and an analysis of the data indicate that the process is stable and predictable, and will remain so in the near future, the visual and numerical characterizations discussed in this chapter can be used to make inferences and take action in the near future.


    Visually Describing Data

    Tabular Displays

    • Frequency Distributions
      • A frequency distribution shows us, in tabular form, the number of times, or the frequency with which, a given value or group of values occurs.


    Tabular Displays of Attribute Data


    Tabular Displays of Measurement Data (top panel) and Attribute Data (bottom panel)


    Limitations of Frequency Displays

    • It is important to note that the frequency displays we have discussed do not include information on the time-ordering of data.
    • In an analytic study, where we examine the sample to take action on the process, a frequency display would fail to show trends that may be occurring over time.
    • This loss of information can be critical.
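    A small sketch can make this loss of information concrete. The two series below are hypothetical and were chosen only so that they contain exactly the same values in a different time order.

```python
from collections import Counter

# Hypothetical measurements, listed in the order they were observed.
drifting = [4, 4, 5, 5, 6, 6, 7, 7]   # steady upward trend over time
no_trend = [6, 4, 7, 5, 4, 7, 5, 6]   # same values, no trend

freq_drifting = Counter(drifting)
freq_no_trend = Counter(no_trend)

# Both series produce the same frequency distribution ...
print(sorted(freq_drifting.items()))
print(freq_drifting == freq_no_trend)

# ... but only the time-ordered data reveal the upward drift.
drifting_never_falls = all(a <= b for a, b in zip(drifting, drifting[1:]))
no_trend_never_falls = all(a <= b for a, b in zip(no_trend, no_trend[1:]))
print(drifting_never_falls, no_trend_never_falls)
```

    A frequency table built from either series is identical, so the drift in the first series is visible only when the order of observation is kept.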


    Graphical Displays

    • Data are often represented in graphical form.
      • Frequency distributions of measurement (variables) data are commonly presented in frequency polygons or histograms.
      • Frequency distributions of attribute data are commonly presented in bar charts.
      • In all these displays, the class intervals are drawn along the horizontal axis, and the absolute or relative frequencies along the vertical axis.


    Run Chart: Importance of Time-Ordering in Analytic Studies

    • In analytic studies we want to be able to detect trends or other patterns over time to take action on a process in the near future.
    • In a run chart (also called a tier chart), this information is preserved by plotting the observed values on the vertical axis and the times they were observed on the horizontal axis.


    Graphical Displays of Measurement Data with Time


    • Measures of Central Tendency
      • Mean
      • Median
      • Mode


    • The Mean
      • To convey the underlying character of variables data by a typical value, the most common numerical representation is the arithmetic average, or mean: the sum of the numerical values of the measurements divided by the number of items examined.


    In an enumerative study, if the items constitute a frame, the average is called the population mean and is usually denoted by the Greek letter $\mu$ (pronounced "mew"). When the items constitute a sample drawn from a frame, we call the average a sample mean and denote it as $\bar{x}$ ("x bar"). Thus, in an enumerative study, we might make reference to either $\mu$ or $\bar{x}$.


    • In an analytic study, there is no population (future output does not yet exist), and hence we cannot describe a population mean. The mean of a sampled subgroup (e.g., four items from a day's production) is denoted as $\bar{x}$, and when we average the $\bar{x}$ values to calculate a process mean, we denote that value as $\bar{\bar{x}}$ ("x bar-bar" or x double-bar).


    • The sample mean, $\bar{x}$, can be calculated as

          $$\bar{x} = \frac{\sum x}{n}$$

      where n is the size of the sample, that is, the number of items included in the determination of the sample mean.


    • If we calculate sample means for subgroups of size n in the past and present, we can calculate the process mean as the average of these subgroup means:

          $$\bar{\bar{x}} = \frac{\sum \bar{x}}{\text{Number of subgroups}}$$

          $$\bar{\bar{x}} = \frac{\sum \bar{x}}{\text{Number of subgroups}} = \frac{168.68}{28} = 6.02$$
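    The two-step calculation (subgroup means first, then their average) can be sketched as follows. The subgroup values are hypothetical and the variable names are assumptions made for illustration.

```python
# Hypothetical subgroups of size n = 4 (e.g., four items from each day's production).
subgroups = [
    [6.1, 5.9, 6.0, 6.2],
    [5.8, 6.0, 6.1, 6.1],
    [6.3, 6.0, 5.9, 6.2],
]

# x-bar for each subgroup: the sum of its values divided by n.
subgroup_means = [sum(values) / len(values) for values in subgroups]

# x-double-bar: the sum of the subgroup means divided by the number of subgroups.
process_mean = sum(subgroup_means) / len(subgroup_means)

print([round(m, 3) for m in subgroup_means])
print(round(process_mean, 3))
```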


    • The Median
      • The median is the middle value when the data are arranged in ascending order. When there is an even number of observations, the median is the arithmetic average of the middle two values.
      • Because the median is the middle value, 50 percent of the data points have values less than the median.
      • The median is not as influenced by the magnitude of the extreme items as is the mean.
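    The different sensitivity of the mean and the median to an extreme item can be seen in a short sketch. The data are hypothetical; the second list differs from the first only in its largest value.

```python
from statistics import mean, median

# Hypothetical measurements; the second list replaces the largest value with an extreme one.
typical = [12, 14, 15, 17, 18, 20]
with_extreme = [12, 14, 15, 17, 18, 60]

for label, data in (("typical", typical), ("with extreme point", with_extreme)):
    # With an even number of observations, median() averages the two middle values.
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

    The median is 16 in both cases, while the mean moves from 16 to about 22.67.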


    • The interpretation of the median in an analytic study, however, can be misleading.
      • Where extreme data points are observed in a sample, we have seen that they do not affect the computation of the median.
      • In an analytic study, where we are concerned with the process itself, the existence of extreme data points is a critical factor in our analysis. That is, extreme data points may indicate process disturbances and instability, and the need for corrective action on the process.
      • A measure like the median, which is insensitive to extreme data points, must be used with caution in an analytic study.


    • The Mode
      • The mode of a distribution is the value that occurs most frequently, or the value corresponding to the highest point on a frequency polygon or histogram.
      • Like the median, and unlike the mean, it is not affected by extreme data points.
      • A frequency distribution with one such high point is called unimodal.
      • Distributions with two high points of concentration are called bimodal.


    • Like the median, an important characteristic of the mode is that it is not affected by extreme data points. In analytic studies, where extreme data points may reveal a great deal about the process under investigation, the mode should therefore be used with caution.


    • The Proportion
      • When each item is classified into one of two conditions (e.g., defective or not defective), the proportion or fraction of the data possessing one of the two conditions is a meaningful measure of central tendency:

          p = x/n

        where x is the number of defective items and n is the total number of items in the sample.
      • Thus, with x = 8 and n = 38, p = 8/38 = 0.21.


    Measures of Variability

    • All populations and processes have some degree of variability, given appropriate sensitivity of the measuring instrument; not all items in a population or process are identical. Thus we must be able to quantify not only the central tendency but also the degree of variability in a set of data.
    • The two commonly used quantitative measures of such variability are the range and the standard deviation.


    • The Range
      • The range is the simplest measure of dispersion; for raw data from an enumerative or an analytic study, it is defined as the difference between the largest data point and the smallest data point in a set of data:

          $$R = x_{\max} - x_{\min}$$


    Weights (in grams) from three manufacturing processes

    Item No.          Process A    Process B    Process C
    1                 5.0          5.0          7.6
    2                 5.3          7.8          7.8
    3                 8.0          7.9          8.0
    4                 9.2          8.0          8.1
    5                 10.0         8.8          8.1
    6                 10.5         10.5         8.4
    Mean ($\bar{x}$)  8.0          8.0          8.0
    Range             5.5          5.5          0.8
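    The mean and range rows of this table can be recomputed directly from the weights. The sketch below is illustrative; the dictionary layout and the helper name `data_range` are assumptions.

```python
# Weights (in grams) for the three manufacturing processes.
processes = {
    "A": [5.0, 5.3, 8.0, 9.2, 10.0, 10.5],
    "B": [5.0, 7.8, 7.9, 8.0, 8.8, 10.5],
    "C": [7.6, 7.8, 8.0, 8.1, 8.1, 8.4],
}

def data_range(values):
    """Range R = largest data point minus smallest data point."""
    return max(values) - min(values)

# Map each process to its (mean, range) pair.
summary = {
    name: (sum(values) / len(values), data_range(values))
    for name, values in processes.items()
}

for name, (avg, r) in summary.items():
    print(f"Process {name}: mean = {avg:.1f}, range = {r:.1f}")
```

    Processes A and B share the same mean and the same range even though the middle four weights of B lie much closer together, because the range uses only the two most extreme points.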


    • The Standard Deviation
      • The standard deviation as a measure of dispersion takes into account each of the data points and their distances from the mean.
      • The more dispersed the data points, the larger the standard deviation will be; the closer the data points are to the mean, the smaller the standard deviation will be.


    • The Standard Deviation
      • In an enumerative study, the population standard deviation is computed as:

          $$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$


    The Standard Deviation

    • For both enumerative and analytic studies, we calculate the standard deviation of a sample (or subgroup) of n observations, called the sample standard deviation, s:

          $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$


    Weights (in grams) from three manufacturing processes

    Item No.          Process A    Process B    Process C
    1                 5.0          5.0          7.6
    2                 5.3          7.8          7.8
    3                 8.0          7.9          8.0
    4                 9.2          8.0          8.1
    5                 10.0         8.8          8.1
    6                 10.5         10.5         8.4
    Mean ($\bar{x}$)  8.0          8.0          8.0
    s                 2.37         1.79         0.28
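    The s row can be reproduced with the sample standard deviation (divisor n - 1). The following is a minimal sketch; the function name `sample_std` is an assumption, and the cross-check against `statistics.stdev` is included only as a sanity test.

```python
import math
from statistics import stdev

# Weights (in grams) for the three manufacturing processes.
processes = {
    "A": [5.0, 5.3, 8.0, 9.2, 10.0, 10.5],
    "B": [5.0, 7.8, 7.9, 8.0, 8.8, 10.5],
    "C": [7.6, 7.8, 8.0, 8.1, 8.1, 8.4],
}

def sample_std(values):
    """Sample standard deviation: sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

for name, values in processes.items():
    s = sample_std(values)
    # statistics.stdev also divides by n - 1, so the two results should agree.
    assert math.isclose(s, stdev(values))
    print(f"Process {name}: s = {s:.2f}")
```

    Unlike the range, s separates Process A (2.37) from Process B (1.79), because every data point contributes to it.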


    • Measures of Shape
      • Skewness
      • Kurtosis


    • Measures of Shape
      • Skewness
        • Skewness is the lack of symmetry of a set of data.
        • A numerical measure of skewness, Pearson's coefficient of skewness, is defined as:

          $$\text{Skewness}_p = \frac{3(\bar{x} - M_e)}{s}$$

          where $M_e$ is the median.

    [Figure: three distributions, labeled Skewness = 0, Skewness > 0, and Skewness < 0]
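    Pearson's coefficient, three times the difference between the mean and the median divided by the sample standard deviation, can be sketched as below. The three data sets are hypothetical and were chosen only to show the sign of the coefficient; the function name is an assumption.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)

# Hypothetical data sets, one for each case.
symmetric = [2, 4, 6, 8, 10]       # mean equals median
right_skewed = [2, 3, 4, 5, 16]    # long right tail pulls the mean above the median
left_skewed = [0, 11, 12, 13, 14]  # long left tail pulls the mean below the median

for label, data in (("symmetric", symmetric),
                    ("right-skewed", right_skewed),
                    ("left-skewed", left_skewed)):
    print(f"{label}: skewness = {pearson_skewness(data):.2f}")
```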


    • Measures of Shape
      • Skewness
        • Another way to view skewness is as a measure of the relative sizes of the tails of the distribution.
        • In symmetric distributions (where the two tails are the same), the coefficient of skewness will be zero.
        • In skewed distributions (where the difference between the frequencies in the two tails is large), the magnitude of the coefficient of skewness will be large.


    • Measures of Shape
      • Kurtosis
        • Kurtosis is a measure of peakedness. A distribution with a relatively high concentration of data in the middle and at the tails, but low concentration in the shoulders, has a large kurtosis; one that is relatively flat in the middle, with fat shoulders and thin tails, has little kurtosis.
        • A numerical measure of kurtosis is given by:

          $$\text{Kurtosis} = \frac{\sum (x - \bar{x})^4 / n}{s^4} - 3$$


    Kurtosis


    • Interpretation of the Standard Deviation
      • The standard deviation is interpreted by determining the proportion of data that lies within k standard deviations from the mean for a distribution.
      • This proportion is directly a function of the shape of the distribution.
      • In an analytic study, the distribution must be stable (only common causes of variation) to interpret the standard deviation.
      • There are four classic scenarios for distributions of data:
        • normal (bell-shaped) distribution
        • skewed to the right distribution
        • skewed to the left distribution
        • unknown distribution.


    • Normal Distribution
      • In the case of data that is normally distributed, the probability of obtaining a random data point within k standard deviations of the mean is:
        • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) = 0.6826$
        • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9544$
        • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$
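    These three probabilities can be recomputed without a table, using the fact that for a normal distribution the probability of falling within k standard deviations of the mean equals erf(k/√2). This is a sketch for checking the figures; the function name is an assumption.

```python
import math

def prob_within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"k = {k}: {prob_within_k_sigma(k):.4f}")
```

    The computed values for k = 1 and k = 2 round to 0.6827 and 0.9545; the figures quoted above differ in the fourth decimal place, which is consistent with doubling four-decimal table entries.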


    • Skewed Distribution (Right or Left)
      • In the case of data that is unimodal (and skewed to the right or the left), the probability of obtaining a random data point within k standard deviations from the mean is described by the Camp-Meidel inequality as follows:

          $$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{2.25k^2}$$


    • To summarize the Camp-Meidel inequality, we can say that:
      • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) \ge 1 - \frac{1}{2.25(1)^2} = 0.5556$
      • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - \frac{1}{2.25(2)^2} = 0.8889$
      • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) \ge 1 - \frac{1}{2.25(3)^2} = 0.9506$


    • Unknown Distribution
      • In the case of data for which the distribution is unknown, the probability of obtaining a random data point within k standard deviations from the mean, assuming that $k \ge 1$, is described by Chebychev's inequality as follows:

          $$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

      • To summarize Chebychev's inequality, we can say that:
        • k = 1: $P(\mu - 1\sigma < X < \mu + 1\sigma) \ge 1 - \frac{1}{1^2} = 0.0000$
        • k = 2: $P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - \frac{1}{2^2} = 0.7500$
        • k = 3: $P(\mu - 3\sigma < X < \mu + 3\sigma) \ge 1 - \frac{1}{3^2} = 0.8889$
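    The Camp-Meidel and Chebychev lower bounds can be tabulated side by side for the same values of k. This is a minimal sketch; the function names are assumptions made for illustration.

```python
def camp_meidel_bound(k):
    """Lower bound 1 - 1/(2.25 * k^2), for unimodal distributions."""
    return 1 - 1 / (2.25 * k ** 2)

def chebychev_bound(k):
    """Lower bound 1 - 1/k^2, for any distribution, with k >= 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: Camp-Meidel >= {camp_meidel_bound(k):.4f}, "
          f"Chebychev >= {chebychev_bound(k):.4f}")
```

    For every k the Chebychev bound is the weaker of the two, which reflects how little it assumes about the shape of the distribution.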


    • If the standard deviation of a distribution is small, we do not have to go very far either side of the mean to include a large portion of the data in the distribution, even if we do not know anything about the shape of the distribution.


    • More Details on the Normal Distribution
      • We often wish to calculate the probabilities under the normal distribution between two values of the random variable X, or the probabilities under the normal distribution above and/or below one value of the random variable X.


    • For example, the figure below shows the probability of selecting a value of X between $x_1$ and $x_2$, given a stable normal distribution with a mean of $\mu$ and a standard deviation of $\sigma$.
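    Such a probability can be computed as the difference of two cumulative probabilities. The sketch below uses Python's `statistics.NormalDist` (Python 3.8 or later); the values of the mean, standard deviation, x1, and x2 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical process values, for illustration only.
mu, sigma = 8.0, 0.5
x1, x2 = 7.5, 9.0   # one standard deviation below and two above the mean

process = NormalDist(mu=mu, sigma=sigma)

p_between = process.cdf(x2) - process.cdf(x1)   # P(x1 < X < x2)
p_below_x1 = process.cdf(x1)                    # P(X < x1)
p_above_x2 = 1 - process.cdf(x2)                # P(X > x2)

print(f"P({x1} < X < {x2}) = {p_between:.4f}")
print(f"P(X < {x1}) = {p_below_x1:.4f}")
print(f"P(X > {x2}) = {p_above_x2:.4f}")
```

    The three probabilities sum to 1, since together they cover every possible value of X.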