Discrete and continuous random variables
Description: quality control (QC) assignment
INTRODUCTION
In statistics, numerical random variables represent counts and measurements. They come in
two different flavors: discrete and continuous, depending on the type of outcomes that are
possible:
Discrete random variables. If the possible outcomes of a random variable can be listed out
using a finite (or countably infinite) set of single numbers (for example, {0, 1, 2, . . . , 10}; or
{-3, -2.75, 0, 1.5}; or {10, 20, 30, 40, 50, . . .}), then the random variable is discrete.
Continuous random variables. If the possible outcomes of a random variable can only be
described using an interval of real numbers (for example, all real numbers from zero to
ten), then the random variable is continuous.
Discrete random variables typically represent counts: for example, the number of people who
voted yes for a smoking ban out of a random sample of 100 people (possible values are 0, 1, 2,
. . . , 100); or the number of accidents at a certain intersection over one year's time (possible
values are 0, 1, 2, . . .).
Discrete random variables have two classes: finite and countably infinite. A discrete random
variable is finite if its list of possible values has a fixed (finite) number of elements in it (for
example, the number of smoking ban supporters in a random sample of 100 voters has to be
between 0 and 100). One very common finite random variable is obtained from the binomial
distribution.
A discrete random variable is countably infinite if its possible values can be specifically listed out
but they have no specific end. For example, the number of accidents occurring at a certain
intersection over a 10-year period can take on possible values: 0, 1, 2, . . . (in theory, the
number of accidents can take on infinitely many values).
Continuous random variables typically represent measurements, such as time to complete a
task (for example 1 minute 10 seconds, 1 minute 20 seconds, and so on) or the weight of a
newborn. What separates continuous random variables from discrete ones is that they
are uncountably infinite; they have too many possible values to list out or to count and/or they
can be measured to a high level of precision (such as the level of smog in the air in Los Angeles
on a given day, measured in parts per million).
CONTROL CHARTS FOR ATTRIBUTE VARIABLES
Like the continuous variable control charts, the control chart for an attribute variable
also takes the form of a two-sided hypothesis test turned on its side. The
difference between the continuous and attribute control charts lies in the underlying
distributions. Many continuous variables are well represented by the Normal Distribution, while
attribute variables are typically modeled by the Binomial or Poisson Distribution.
Both the Binomial and Poisson are discrete distributions; for these two, the
variable takes on non-negative integer values (such as defect counts). Continuous
distributions, on the other hand, allow the variable to take on any real value in an interval, such as
length measurements. For a discrete distribution, the probability of observing a particular
outcome is given by the probability mass function; for a continuous distribution, the probability
density function (the curve) describes how probability is spread over the possible values.
With continuous distributions, the probability of observing a range of values is
defined by the area under the curve. This area is computed by integration (or by looking up the
value in a table of integral values). For a discrete distribution, the probability of observing a
range of outcomes is found by summing the probabilities of the individual outcomes in the
range.
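The two procedures can be illustrated with a short Python sketch: a range probability for a discrete variable is a sum of probability mass values, while for a continuous variable it is an area under the density, found here by numerical integration (trapezoidal rule) and compared with the closed-form Normal CDF that a table of integral values would supply. The Poisson mean of 3, the Normal parameters, and the function names are illustrative assumptions.

```python
import math


def poisson_pmf(x, mean):
    """P{X = x} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**x / math.factorial(x)


def discrete_range_prob(lo, hi, mean):
    """Discrete case: add up the probability of each outcome lo, lo + 1, ..., hi."""
    return sum(poisson_pmf(x, mean) for x in range(lo, hi + 1))


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def continuous_range_prob(a, b, mu, sigma, steps=10_000):
    """Continuous case: area under the density between a and b (trapezoidal rule)."""
    h = (b - a) / steps
    area = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    area += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return area * h


def normal_cdf(x, mu, sigma):
    """Closed-form area to the left of x (the value a Normal table tabulates)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# P{2 <= X <= 4} for defect counts with an assumed mean of 3
print(discrete_range_prob(2, 4, 3.0))
# P{-1 < Z < 1} for a standard Normal, by integration and by the CDF
print(continuous_range_prob(-1, 1, 0, 1), normal_cdf(1, 0, 1) - normal_cdf(-1, 0, 1))
```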
Example:
Assume that you have two six-sided dice. The possible outcomes for the sum of the two
dice are the discrete values from 2 through 12. If you tabulated the different ways (rolls
of each die) in which you could reach each of the totals, you would have a histogram
that describes the probability distribution (the probability mass function) of the sum for your
dice (assuming that they are “fair”).
What is the most frequently occurring sum that you could roll?
What is the probability of obtaining the most likely sum in a single roll of the dice?
What is the probability of obtaining a sum greater than 2 and less than 11?
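The questions above can be answered by enumerating all 36 equally likely rolls of two fair dice. This sketch (the helper name `dice_sum_pmf` is hypothetical) tabulates the histogram as exact fractions and then reads off the mode, its probability, and a range probability by summation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_pmf(sides=6):
    """Tabulate every (die 1, die 2) roll and convert the counts to exact probabilities."""
    counts = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    total = sides * sides
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


pmf = dice_sum_pmf()

# Most frequently occurring sum (the mode of the distribution)
mode = max(pmf, key=pmf.get)

# Probability of rolling that sum in a single roll
p_mode = pmf[mode]

# P{2 < sum < 11}: add the probabilities of the sums 3 through 10
p_range = sum(pmf[s] for s in range(3, 11))

print(mode, p_mode, p_range)  # 7 1/6 8/9
```

The enumeration gives 7 as the most frequent sum, 6/36 = 1/6 as its probability, and 1 - (1 + 2 + 1)/36 = 32/36 = 8/9 for a sum strictly between 2 and 11.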
There are four types of control charts commonly used with attribute data. The decision on
which to use depends on: (a) whether or not a unit is to be classified defective (having one or
more defects), or if the number of defects in a unit (or per unit) is of interest; and (b) if the size
of the rational sampling group is fixed or variable.
A unit can be classified as defective (or non-conforming) if it contains one or more defects. If
the number of defective units in a sample of units is of interest, and if the number of units in
the sample is constant, the np-Chart is used to track the production process. However, if the
number of units in the sample varies, and the interest is in the fraction (or percentage) of
defective units in the sample, then the p-Chart is used.
Sometimes the number of defects per inspection unit is a better measure of performance.
If the defects are counted in a fixed-size inspection unit (for example, defects per 100
solder joints), then the c-Chart is used. But if the size of the inspection unit varies (perhaps
the current sample has 350 solder joints), then the u-Chart lets us track the
number of defects on a per-unit basis (with 100-joint inspection units, 350 joints count as 3.5 units).
Figure 1 (below) depicts the decision process for choosing the most appropriate control chart.
The following sections describe how the control limits for these control charts are computed,
and how these charts are interpreted.
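The decision process described above (and drawn in Figure 1) can be expressed as a small function. This is a hypothetical helper, not a library API; its name and string arguments are assumptions made for the illustration.

```python
def choose_control_chart(data_type, *, counting="", size_fixed=True, spread="range"):
    """Return the control chart suggested by the Figure 1 decision process.

    data_type:  "attribute" (discrete) or "variable" (continuous)
    counting:   for attribute data, "defective_units" (Binomial) or "defects" (Poisson)
    size_fixed: whether the inspection sample (or inspection unit) size is constant
    spread:     for variable data, "range" or "std_dev"
    """
    if data_type == "attribute":
        if counting == "defective_units":      # Binomial distribution
            return "np-Chart" if size_fixed else "p-Chart"
        if counting == "defects":              # Poisson distribution
            return "c-Chart" if size_fixed else "u-Chart"
        raise ValueError("counting must be 'defective_units' or 'defects'")
    if data_type == "variable":
        if spread == "range":
            return "X-bar and R-Chart"
        if spread == "std_dev":
            return "X-bar and S-Chart"
        raise ValueError("spread must be 'range' or 'std_dev'")
    raise ValueError("data_type must be 'attribute' or 'variable'")


print(choose_control_chart("attribute", counting="defects", size_fixed=False))  # u-Chart
```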
P-Charts
P-Charts are derived from the Binomial Distribution and are used to track the proportion (p)
of defective units in samples whose size may vary. If D is the number of defective units in a
random sample of size n, then our sample proportion defective will be:
$$\hat{p} = \frac{D}{n}$$
Since these samples come from a binomial distribution, and assuming that the true
proportion defective in all the product is p, the probability that the number of
defectives (D) in a sample of size n is exactly x units is given by:
$$P\{D = x\} = \binom{n}{x} p^{x} (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$
If we took a large enough number of samples, we would find that the mean of the sample
proportions defective would be very close to p, and that the variance of $\hat{p}$
would be given by:
$$\sigma_{\hat{p}}^{2} = \frac{p(1-p)}{n}$$
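Both statements can be checked exactly by summing over the binomial probabilities of D. The sketch below uses illustrative values n = 50 and p = 0.04 (assumptions, not data from the text) and computes the mean and variance of p̂ = D/n directly from the probability formula.

```python
import math


def binom_pmf(x, n, p):
    """P{D = x} for D ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def phat_mean_and_variance(n, p):
    """Exact mean and variance of the sample proportion p_hat = D / n."""
    mean = sum((x / n) * binom_pmf(x, n, p) for x in range(n + 1))
    var = sum((x / n - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
    return mean, var


n, p = 50, 0.04
mean, var = phat_mean_and_variance(n, p)
print(mean, p)                # agree up to floating-point rounding
print(var, p * (1 - p) / n)   # agree up to floating-point rounding
```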
If we wanted to do a two-sided hypothesis test to see if the proportion defective from one
sample was different from the proportion defective found in another sample, we could use the
normal approximation and the test statistic:
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{where} \quad \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$
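A minimal sketch of the pooled two-proportion statistic is shown below. The defective counts are invented, and comparing |z0| with 1.96 (a two-sided test at about the 5% significance level) is an assumed choice.

```python
import math


def two_proportion_z(d1, n1, d2, n2):
    """Pooled z statistic for H0: p1 = p2, from defective counts d_i in samples of size n_i."""
    p1, p2 = d1 / n1, d2 / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)   # same as (d1 + d2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z0 = two_proportion_z(d1=10, n1=100, d2=5, n2=100)
# Reject H0 at roughly the 5% level only if |z0| exceeds 1.96
print(round(z0, 3), abs(z0) > 1.96)
```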
This hypothesis test lends itself to the creation of a control chart for the proportion defective if
we are taking random samples from an industrial process. As we did with the continuous
variable control charts, we'll turn the hypothesis test on its side and estimate a centerline and
the upper and lower control limits.
Figure 1. Decision process for choosing a control chart. Attribute (discrete) data: for defective units, possibly with multiple defects (Binomial Distribution), use the np-Chart if the inspection sample size is fixed and the p-Chart if it varies; for individual defects (Poisson Distribution), use the c-Chart if the inspection unit size is fixed and the u-Chart if it varies. Variable (continuous) data: use the X-bar and R-Chart if the range is the preferred spread measure, or the X-bar and S-Chart if the standard deviation is.
The best guess for the unknown population proportion defective would be to find the mean
proportion defective over a large number of samples, and let this become our centerline:
$$\bar{p} = \frac{\sum_{i=1}^{m} \hat{p}_i}{m} = \frac{\sum_{i=1}^{m} D_i}{mn}$$
where m is the number of samples, each of size n.
As before, when we do not yet have an idea of the process’ performance, we would estimate
the control limits from a large number (20-25) of independent and random samples. Using plus
and minus three standard deviations from the centerline (Shewhart style), the trial control
limits for the proportion defective in any particular sample are:
$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
$$CL = \bar{p}$$
$$LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
These trial control limits would be plotted along with the individual, time-ordered sample data,
and then checked to be sure that all samples were within the control limits. If not, we would
investigate the out-of-control points, remove the special cause (if found) and recalculate the
trial limits without any out-of-control samples in the data. Then we would use those control
limits for production monitoring purposes.
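The trial-limit procedure can be sketched as a loop. The sketch below assumes a fixed sample size n and, as a simplification, treats every out-of-control sample as having a special cause that was found, so it is simply dropped before the limits are recomputed; in practice a sample is removed only after its cause has been identified. It also clips a negative LCL to zero, a common convention that the text states explicitly only for the c-Chart. The function names and data are invented for illustration.

```python
import math


def p_chart_limits(defectives, n, k=3.0):
    """(LCL, CL, UCL) for a p-chart with a fixed sample size n."""
    p_bar = sum(defectives) / (len(defectives) * n)          # centerline
    half_width = k * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), p_bar, p_bar + half_width  # negative LCL clipped to 0


def revise_trial_limits(defectives, n, k=3.0):
    """Drop out-of-control samples and recompute until every remaining sample is inside.

    Simplification: each out-of-control sample is assumed to have an assignable
    (special) cause that was found and removed.
    """
    kept = list(defectives)
    while True:
        lcl, cl, ucl = p_chart_limits(kept, n, k)
        inside = [d for d in kept if lcl <= d / n <= ucl]
        if len(inside) == len(kept):
            return (lcl, cl, ucl), kept
        if not inside:
            raise ValueError("no in-control samples left; collect new data")
        kept = inside


# Invented data: defectives in 20 samples of n = 100 units, with one unusual sample
defectives = [4, 6, 3, 5, 7, 4, 2, 5, 6, 3, 4, 5, 18, 4, 6, 3, 5, 4, 6, 5]
(lcl, cl, ucl), kept = revise_trial_limits(defectives, n=100)
print(f"LCL={lcl:.4f}  CL={cl:.4f}  UCL={ucl:.4f}  samples kept={len(kept)}")
```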
If the control limits were to be calculated from a standard value (prior history) for the
proportion defective, the formulation is similar (we replace the sample parameters with the
standard population values):
$$UCL = p + 3\sqrt{\frac{p(1-p)}{n}}$$
$$CL = p$$
$$LCL = p - 3\sqrt{\frac{p(1-p)}{n}}$$
If we desired control limits at a different distance (either further out from, or closer in to, the mean),
we could replace the constant 3 with a different value (2 for 2σ limits, 6 for 6σ limits, and so on). Note
also that we should pick our sample size so that there is a high probability of finding at least one
defective unit in a sample; otherwise, we would effectively accept a zero-defects sample (rendering
our lower control limit useless in detecting important shifts in our process).
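The sample-size advice can be made concrete with two short calculations under the binomial model, sketched below: the smallest n giving a chosen probability (an assumed target such as 0.95) of at least one defective unit, from 1 - (1 - p)^n >= target, and the smallest n that makes the LCL strictly positive, which from p - k√(p(1 - p)/n) > 0 requires n > k²(1 - p)/p, where k is the sigma multiplier (3 for Shewhart limits). The function names and the values of p are hypothetical.

```python
import math


def n_for_at_least_one_defective(p, target=0.95):
    """Smallest n with P{D >= 1} = 1 - (1 - p)**n >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


def n_for_positive_lcl(p, k=3.0):
    """Smallest n for which the LCL p - k*sqrt(p(1 - p)/n) is strictly positive."""
    n = 1
    while p - k * math.sqrt(p * (1 - p) / n) <= 0:
        n += 1
    return n


print(n_for_at_least_one_defective(0.01))  # samples needed for a 95% chance of a defective
print(n_for_positive_lcl(0.07))            # samples needed for a non-zero lower limit
```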
In practice, however, the sample size for a p-chart does not have to be held constant. Usually,
we would estimate the mean sample size ($\bar{n}$) and substitute it for the fixed sample size (n) in
the above equations. The mean sample size from m samples of differing
sizes is found by:
$$\bar{n} = \frac{\sum_{i=1}^{m} n_i}{m}$$
A more exact alternative would be to compute variable-width control limits that change with
the individual sample size. If we started with m samples (20 ≤ m ≤ 25) of individual size $n_i$, then
we would estimate the centerline $\bar{p}$ (once) from:
$$\bar{p} = \frac{\sum_{i=1}^{m} D_i}{\sum_{i=1}^{m} n_i}$$
Then we could have the control limits vary in width about this centerline as the sample size
changes, using the limits:
$$UCL_i = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$
$$CL = \bar{p}$$
$$LCL_i = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$
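Both options for unequal sample sizes are sketched below: a single set of limits based on the mean sample size n̄, and variable-width limits computed from each n_i around one centerline p̄ = ΣD_i/Σn_i. Because m·n̄ = Σn_i, the two options share the same centerline. Clipping a negative LCL to zero is an added convention, the function names are hypothetical, and the short invented data set has far fewer than the 20 to 25 samples the text recommends.

```python
import math


def p_chart_variable_limits(defectives, sizes, k=3.0):
    """Per-sample (LCL, CL, UCL) for a p-chart whose sample sizes n_i differ."""
    p_bar = sum(defectives) / sum(sizes)          # centerline, estimated once
    limits = []
    for n_i in sizes:
        half_width = k * math.sqrt(p_bar * (1 - p_bar) / n_i)
        limits.append((max(0.0, p_bar - half_width), p_bar, p_bar + half_width))
    return limits


def p_chart_mean_size_limits(defectives, sizes, k=3.0):
    """Single (LCL, CL, UCL) that substitutes the mean sample size n_bar for n."""
    p_bar = sum(defectives) / sum(sizes)          # equals sum(D) / (m * n_bar)
    n_bar = sum(sizes) / len(sizes)
    half_width = k * math.sqrt(p_bar * (1 - p_bar) / n_bar)
    return max(0.0, p_bar - half_width), p_bar, p_bar + half_width


defectives = [12, 8, 15, 10, 9]
sizes = [200, 150, 250, 180, 160]
for (lcl, cl, ucl), d, n_i in zip(p_chart_variable_limits(defectives, sizes), defectives, sizes):
    print(f"n={n_i:3d}  p_hat={d / n_i:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}")
print(p_chart_mean_size_limits(defectives, sizes))
```

Larger samples get narrower limits, which is exactly the extra precision the variable-width chart buys over the mean-sample-size shortcut.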
NOTE: When using variable-width control limits, it is not possible to utilize rules for detecting
runs. In general, run rules are not used with p-charts. The lack of a strong statistical basis for
these run rules is one of the reasons that continuous variable control charts are preferred to
attribute charts: there is simply more information available from the continuous variable than
from the discrete variable.
NP-Charts
The np-Chart is used to track the number of defective units in a sample of units (rather than the
proportion of defective units). Like the p-chart, this chart is derived from the Binomial
Distribution. However, the np-Chart always requires a fixed sample size. Calculating the
control limits from sample data leads to:
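$$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})}$$
$$CL = n\bar{p}$$
$$LCL = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})}$$
And if there were a historical standard for estimating np, the control limits would become:
$$UCL = np + 3\sqrt{np(1-p)}$$
$$CL = np$$
$$LCL = np - 3\sqrt{np(1-p)}$$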
For an np-Chart, the control limits are constant (until we improve the process and recalculate
tighter control limits). In this case, as long as we have a sample of inspection units that has a
high probability of having at least one defective unit in each sample, we can utilize the run rules
without violating assumptions too much. This gives us a slightly more powerful control chart
than the p-chart, at the cost of inspecting a slightly larger sample of the units.
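A minimal np-Chart sketch follows, assuming a fixed sample size and invented counts. It estimates p̄ from the data unless a standard value of p is supplied, mirroring the two sets of limits given above; clipping a negative LCL to zero is an added convention, and the function name is hypothetical.

```python
import math


def np_chart_limits(defectives, n, p_standard=None, k=3.0):
    """(LCL, CL, UCL) for the number of defective units in samples of fixed size n."""
    if p_standard is None:
        p = sum(defectives) / (len(defectives) * n)   # p-bar from sample data
    else:
        p = p_standard                                # historical standard value
    center = n * p
    half_width = k * math.sqrt(n * p * (1 - p))
    return max(0.0, center - half_width), center, center + half_width


# Invented counts of defective units in 10 samples of 100 units each
counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]
lcl, cl, ucl = np_chart_limits(counts, n=100)
flagged = [i for i, d in enumerate(counts) if not lcl <= d <= ucl]
print(lcl, cl, ucl, flagged)
```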
C-Charts
Sometimes the presence of a defect does not “ruin” the product, even if defects are
undesirable. For example, a farmer might still buy a tractor even with a few scratches in the
paint on one fender, or a computer programmer might still accept an LCD monitor with one or
two defective pixels. However, a good manufacturer would still wish to track the number of
defects occurring in each product in order to improve and continue to compete. C- and u-
Charts work to track the number of defects that occur as a product is created.
The c-chart is derived from the Poisson Distribution, which assumes that the opportunities for
defects to occur are essentially infinite (for example, small defects occurring within a large area). If x is a
given number of defects, then the probability of observing x defects in an inspection unit is:
$$p(x) = \frac{e^{-c} c^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$
where c is the true mean count of the number of occurrences per unit.
For the Poisson distribution, the mean and the variance are the same, and both are equal to c.
This information can be used to set up an approximate Normal hypothesis test (but it is quicker
to go straight to the derivation of the control chart limits).
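A quick numerical check of the mean-equals-variance property sums the Poisson probabilities over a range wide enough that the truncated tail is negligible. The cutoff of 100 terms and the value c = 4.2 are arbitrary choices for the example.

```python
import math


def poisson_pmf(x, c):
    """p(x) = e^(-c) c^x / x! for a Poisson count with mean c."""
    return math.exp(-c) * c**x / math.factorial(x)


def poisson_mean_and_variance(c, terms=100):
    """Mean and variance computed from the pmf, truncated after `terms` values of x."""
    xs = range(terms)
    mean = sum(x * poisson_pmf(x, c) for x in xs)
    var = sum((x - mean) ** 2 * poisson_pmf(x, c) for x in xs)
    return mean, var


mean, var = poisson_mean_and_variance(4.2)
print(mean, var)  # both agree with c = 4.2 up to floating-point rounding
```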
The mean count of defects occurring per inspection unit is best estimated by counting the total
number of defects occurring over a large number of inspection units:
$$\bar{c} = \frac{\text{total number of defects}}{\text{total inspection units}}$$
This parameter will represent our center line, but we will also need upper and lower control
limits. If we are working to establish control limits from sample data, the formulation would
be:
$$UCL = \bar{c} + 3\sqrt{\bar{c}}$$
$$CL = \bar{c}$$
$$LCL = \bar{c} - 3\sqrt{\bar{c}} \quad \text{(or 0 if the LCL is negative)}$$
Alternatively, if we are continuing to use an existing and stable process, the “standard” value of
c could be used for the control limits by:
$$UCL = c + 3\sqrt{c}$$
$$CL = c$$
$$LCL = c - 3\sqrt{c} \quad \text{(or 0 if the LCL is negative)}$$
In all cases of the c-Chart, the inspection unit is a constant size. Provided that the LCL is greater
than zero, we will have constant control limits and can apply the rules for detecting
runs in addition to the out-of-control point criteria to determine whether our process is stable
and in control.
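The c-Chart limits can be sketched as follows; the counts are invented and the function name is hypothetical. It uses c̄ from the data unless a standard value of c is passed, and it applies the "or 0 if LCL is negative" rule from the formulas above.

```python
import math


def c_chart_limits(defect_counts, c_standard=None, k=3.0):
    """(LCL, CL, UCL) for defect counts from constant-size inspection units."""
    if c_standard is None:
        c = sum(defect_counts) / len(defect_counts)   # c-bar from sample data
    else:
        c = c_standard                                # historical "standard" value
    half_width = k * math.sqrt(c)
    return max(0.0, c - half_width), c, c + half_width  # 0 if the LCL is negative


# Invented counts: defects found on each of 12 equally sized inspection units
defects = [7, 5, 9, 6, 4, 8, 11, 6, 5, 22, 7, 6]
lcl, cl, ucl = c_chart_limits(defects)
out_of_control = [i for i, x in enumerate(defects) if not lcl <= x <= ucl]
print(f"LCL={lcl:.2f}  CL={cl:.2f}  UCL={ucl:.2f}  out-of-control samples: {out_of_control}")
```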
U-Charts
These charts are used when the size of the inspection unit may vary. (In fact, the size might not
even be an integer multiple of the inspection units!) Assuming that we have to generate the
control limits from a pool of 20-25 samples, our best estimate for the center line for the u-chart
is:
$$\bar{u} = \frac{\text{total number of defects}}{\text{total units inspected}}$$
One option for the u-chart is to use the mean sample size in computing the upper and lower
control limits. The mean sample size and the control limits are computed from:
$$\bar{n} = \frac{\sum_{i=1}^{m} n_i}{m}$$
$$UCL = \bar{u} + 3\sqrt{\frac{\bar{u}}{\bar{n}}}$$
$$CL = \bar{u}$$
$$LCL = \bar{u} - 3\sqrt{\frac{\bar{u}}{\bar{n}}}$$
Another, more exact alternative is to use variable control limits. Similar to the variable-limit
p-chart, we would compute our centerline once from our sample data and then use it to change
the limits with each sample. From this point, we can compute our control limits for each
individual sample size ($n_i$) by:
$$UCL_i = \bar{u} + 3\sqrt{\frac{\bar{u}}{n_i}}$$
$$CL = \bar{u}$$
$$LCL_i = \bar{u} - 3\sqrt{\frac{\bar{u}}{n_i}}$$
As with the other variable-limit control chart, the ability to use run tests is forfeited.
Additionally, if the defects occur in clusters (i.e., the presence of one defect makes it more likely
for another defect to occur), then the defects do not follow a Poisson Distribution and the
control limits will not be very precise. Mixtures of defect types can sometimes cause clustering.
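Both u-chart options can be sketched with one function, reusing the solder-joint example with an assumed inspection unit of 100 joints (so 350 joints count as 3.5 units). The defect counts are invented, the function name is hypothetical, and clipping a negative LCL to zero is an added convention not stated in the text.

```python
import math


def u_chart_limits(defects, units, k=3.0, variable=True):
    """Per-sample (LCL, CL, UCL) for defects per inspection unit.

    units[i] is the number of inspection units in sample i (it need not be an integer).
    variable=True uses each n_i; variable=False uses the mean sample size n_bar.
    """
    u_bar = sum(defects) / sum(units)            # centerline, computed once
    n_bar = sum(units) / len(units)
    limits = []
    for n_i in units:
        n = n_i if variable else n_bar
        half_width = k * math.sqrt(u_bar / n)
        limits.append((max(0.0, u_bar - half_width), u_bar, u_bar + half_width))
    return limits


joints = [300, 350, 250, 400, 300]
defects = [9, 14, 6, 11, 10]
units = [j / 100 for j in joints]                # 100 joints = 1 inspection unit

for d, n_i, (lcl, cl, ucl) in zip(defects, units, u_chart_limits(defects, units)):
    u_i = d / n_i
    status = "in control" if lcl <= u_i <= ucl else "OUT"
    print(f"units={n_i:.1f}  u={u_i:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}  {status}")
```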
In some cases, when the defect rates are in the low parts-per-million range, the size of the
inspection unit will grow very large. U-Charts can also be used if the plotted variable is changed
to be the time-between-successive-defects, with much lower inspection frequency/cost.
CONTROL CHARTS FOR VARIABLES
Variables are the measurable characteristics of a product or service. Measurement data
is taken and arrayed on charts. The types of charts are often classified according to the type of
quality characteristic that they are supposed to monitor: there are quality control charts
for variables and control charts for attributes. Specifically, the following charts are commonly
constructed for controlling variables:
1) X-bar chart
In this chart the sample means are plotted in order to control the mean value of a
variable (e.g., size of piston rings, strength of materials, etc.). The charts' x-axes are time based,
so that the charts show a history of the process. For this reason, data should be time-ordered;
that is, entered in the sequence from which it was generated. If this is not the case, then trends
or shifts in the process may not be detected, but instead attributed to random (common cause)
variation. For subgroup sizes greater than ten, use X-bar / Sigma (S) charts, since the range
statistic is a poor estimator of process sigma for large subgroups.
2) R chart
In this chart, the sample ranges are plotted in order to control the variability of a variable.
3) S chart
In this chart, the sample standard deviations are plotted in order to control the variability of
a variable. For larger samples (n > 10), the S chart is more efficient than the R chart, so when the
sample size exceeds 10, the X-bar chart and the S chart should be used.
4) S2 chart
In this chart, the sample variances are plotted in order to control the variability of a variable.
5) X-bar and R charts
An Xbar-R chart plots the process mean (Xbar chart) and process range (R chart) over
time for variables data in subgroups. This combination control chart is widely used to examine
the stability of processes in many industries.
For example, Xbar-R can be used to monitor the process mean and variation for
subgroups of part lengths, call times, or hospital patients' blood pressure over time.
The Xbar chart and the R chart are displayed together because both are needed to
determine whether your process is stable. Examine the R chart first because the process
variation must be in control to correctly interpret the Xbar chart. The control limits of the Xbar
chart are calculated considering both process spread and center. If the R chart is out of control,
then the control limits on the Xbar chart may be inaccurate and may falsely indicate an out-of-
control condition or fail to detect one.
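The text does not list the Xbar-R control limit formulas. The sketch below assumes the standard Shewhart forms, grand mean ± A2 × R̄ for the Xbar chart and D3 × R̄ to D4 × R̄ for the R chart, with the usual tabulated constants for subgroup sizes 2 to 5 only. Following the advice above, it checks the R chart before interpreting the Xbar chart. The function name and measurements are invented, and four subgroups are far fewer than would be used in practice.

```python
# Standard tabulated Shewhart constants for subgroup sizes n = 2..5
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}


def xbar_r_limits(subgroups):
    """Return (R-chart limits, Xbar-chart limits), each as (LCL, CL, UCL)."""
    n = len(subgroups[0])
    if n not in A2 or any(len(s) != n for s in subgroups):
        raise ValueError("this sketch only supports equal subgroup sizes from 2 to 5")
    means = [sum(s) / n for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    x_dbar = sum(means) / len(means)      # grand mean: Xbar chart centerline
    r_bar = sum(ranges) / len(ranges)     # mean range: R chart centerline
    r_limits = (D3[n] * r_bar, r_bar, D4[n] * r_bar)
    x_limits = (x_dbar - A2[n] * r_bar, x_dbar, x_dbar + A2[n] * r_bar)
    return r_limits, x_limits


# Invented part-length measurements: 4 subgroups of 5 parts each
subgroups = [
    [10.2, 9.9, 10.1, 10.0, 9.8],
    [10.1, 10.0, 9.9, 10.3, 10.2],
    [9.7, 10.0, 9.9, 10.1, 9.8],
    [10.0, 10.2, 9.9, 10.1, 10.3],
]
r_lim, x_lim = xbar_r_limits(subgroups)

# Examine the R chart first: the Xbar limits are only trustworthy if variation is in control
r_in_control = all(r_lim[0] <= max(s) - min(s) <= r_lim[2] for s in subgroups)
if r_in_control:
    x_in_control = all(x_lim[0] <= sum(s) / len(s) <= x_lim[2] for s in subgroups)
    print("R chart in control; Xbar chart in control:", x_in_control)
else:
    print("R chart out of control; investigate variation before reading the Xbar chart")
```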
6) X-bar and S charts
An Xbar-S chart plots the process mean (Xbar chart) and process standard deviation
(S chart) over time for variables data in subgroups. This combination control chart is widely
used to examine the stability of processes in many industries.
The Xbar-S chart is very similar to the Xbar-R chart. The major difference is that the
subgroup standard deviation is plotted when using the Xbar-S chart, while the subgroup
range is plotted when using the Xbar-R chart. One advantage of using the standard deviation
instead of the range is that the standard deviation takes into account all the data, not just
the maximum and the minimum.
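Likewise, the text does not list the Xbar-S formulas. The sketch below assumes the standard ones, grand mean ± A3 × s̄ for the Xbar chart and B3 × s̄ to B4 × s̄ for the S chart, but computes A3, B3 and B4 from the constant c4 = √(2/(n - 1)) Γ(n/2)/Γ((n - 1)/2) instead of a table, so it also covers subgroups larger than 10, where the S chart is recommended. The function names and data are invented for illustration.

```python
import math
import statistics


def c4(n):
    """Unbiasing constant for the sample standard deviation: E[s] = c4 * sigma."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def xbar_s_limits(subgroups):
    """Return (S-chart limits, Xbar-chart limits), each as (LCL, CL, UCL)."""
    n = len(subgroups[0])
    if n < 2 or any(len(s) != n for s in subgroups):
        raise ValueError("subgroups must all have the same size n >= 2")
    cn = c4(n)
    a3 = 3 / (cn * math.sqrt(n))                         # A3
    spread = 3 * math.sqrt(1 - cn**2) / cn
    b3, b4 = max(0.0, 1 - spread), 1 + spread            # B3, B4
    x_dbar = statistics.fmean(statistics.fmean(s) for s in subgroups)
    s_bar = statistics.fmean(statistics.stdev(s) for s in subgroups)  # sample std devs
    s_limits = (b3 * s_bar, s_bar, b4 * s_bar)
    x_limits = (x_dbar - a3 * s_bar, x_dbar, x_dbar + a3 * s_bar)
    return s_limits, x_limits


# Invented part-length measurements: 4 subgroups of 5 parts each
subgroups = [
    [10.2, 9.9, 10.1, 10.0, 9.8],
    [10.1, 10.0, 9.9, 10.3, 10.2],
    [9.7, 10.0, 9.9, 10.1, 9.8],
    [10.0, 10.2, 9.9, 10.1, 10.3],
]
s_lim, x_lim = xbar_s_limits(subgroups)
print("S chart (LCL, CL, UCL):", s_lim)
print("Xbar chart (LCL, CL, UCL):", x_lim)
```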
The figure below is the X-bar chart. The subgroup averages (X-bar values) are plotted on this chart,
along with three lines. The middle line is the overall process average; the upper line is the
upper control limit; and the lower line is the lower control limit.
CONCLUSIONS
In general, continuous variable control charts will detect smaller changes earlier than attribute
control charts can. The Central Limit Theorem can be used to justify an approximation
of attribute data with control charts based on the Normal Distribution. Finally, continuous
variable control charts normally require much smaller sample sizes as well.
However, attribute control charts can cover several defect types on one chart, whereas two
charts (X-bar and R- or S-Charts) are required for each single characteristic to be measured. And
continuous variables generally require more refined equipment and time to complete the
measurement, leading to a higher inspection cost.