7 quality control tools
The Seven Quality Control Tools (7 qct)
comprise a number of graphical tools to
be used in the analysis of numerical data.
By means of imagination and some knowledge
of the process at hand, we can use the
graphical aids to point out and illuminate
both known and unknown problems and
sources of variation. The analysis is often
strengthened if we utilise advanced methods
of statistical analysis.
7 Quality control tools overview
7 qct, Seven Quality Control Tools, are tools that support the analysis of numerical
data. Examples of these tools are histograms, Pareto diagrams, scatter diagrams, etc.
They can be combined with verbal methods such as affinity diagrams, tree
diagrams, etc.
The systematic application of 7 qct is a useful way of treating measurements.
However, the 7 qct tools are only a part of a statistical analysis of the data at hand.
It is easy to come across data that cannot be handled by the 7 qct alone.
Statistical methods
It is not possible to avoid the use of statistical methods when drawing meaningful
conclusions from complex numerical data sets. In other words, it is not a question
of “either/or” in the choice between graphical and statistical methods. Both
methods are necessary to investigate and convey the message hidden in complex
measurement data.
Data collection
The data that is to be collected for analysis must be of good quality. No analysis
can salvage an inferior set of data. Therefore, the time spent on planning the data
collection is a good investment if used with care. Before any collection it should
be fairly well known what questions are to be answered, what diagrams are to
be drawn, what numerical analyses are to be performed, etc. However, one must
be prepared to enlarge or change some ideas as new directions and hypotheses are
born. Maybe the investigator is looking for silver and suddenly he finds gold...
One requires factual information in order to solve a problem and this information
often takes the form of measurement data. On occasions, one may be tempted to use
measurement data that has already been collected for some other purpose, but this
may be risky.
Assume, for example, that one is interested in comparing the results of two
machines. Assume further that one has access to measurements that were made
earlier on the two machines. Unfortunately, one does not know whether the
machines were run by different operators, the state of the raw materials involved
in the process, etc. The measurements then provide little possibility of determining
whether or not there is a real difference between the machines, simply because
the results may have been affected by a number of other factors.
It is therefore essential that one first has a clear definition of one’s purpose
in performing a measurement, and then plans the execution of the measurement
based on this purpose.
Some questions
• What is the purpose of the measurement?
• What is to be measured? The purpose of the measurement process guides
the choice of measurement variables.
• Where in the process should the measurement be made? Is a flow chart of
the process required?
• How are the measurements to be made? What measuring instruments are to
be used? How are the measurements to be documented (measured values, date,
measurement method, measuring equipment, name of the person performing
the measurement, etc.)? Are any special measurement routines or instructions
necessary?
• Who or which persons will perform the measurements? Do they require
special training?
• When are the measurements to be performed? Are they to be taken during
a short interval or over a longer period of time?
• Are there any surrounding variables that should be checked? Is there a risk that
the results may be affected by hidden variables that are unknown at this time?
• How is one to report, describe and analyse the measurement material?
One usually differentiates between two different types of measurement data –
variable data and attribute data. Variable data refers to data that is the result of
measuring a length, a weight, a time, etc. They are continuous by nature and are
measured using some form of measuring equipment. Attribute data, on the other
hand, is based on some form of number counting and is, therefore, discrete (i.e., not
continuous). Examples of attribute data are the number of faulty units, percentage
defective entities, number of defects, number of faults of various types, etc.
Of the methods included in 7 qct, frequency matrices, bar charts, and Pareto
diagrams are applied in the measurement of attribute data, whereas histograms,
scatter diagrams, run charts, and stratification may be applied to the measurement
of both attribute and variable data.
Diagram – general
Most diagrams use perpendicular co-ordinate systems. Such systems have two axes
that form a 90° angle where they cross. The horizontal axis is called the x-axis and
the vertical axis is called the y-axis. Not all diagrams have arrows on the axes.
The most basic requirements that a diagram must meet are that it should be
clear and easily understood. The diagram must not contain unnecessary ornamentation,
but as much information as is required to make it easily understandable.
The axes must be clearly indicated and there must be no doubt as to which units
are being used. Does the axis show percentage, cost per tonne or what? The
diagram must be fully understandable and there must be no need to go back to
the text to find out what the diagram is supposed to show. In case the diagram is
to be interpreted together with a table of data, this table must be easily accessible.
Sometimes there is a need to compare several diagrams at the same time. To
enable this to be done efficiently, the scales and the sizes of the diagrams must agree
and be carefully selected. Most diagrams have linear, i.e., proportional scales, but
some become clearer if another scale is used, e.g., a logarithmic scale. However,
such scales are less familiar to most readers, so they should be used with care. The above may
be summarised in the following requirements for a diagram:
• the diagram must be clear and easily understandable
• the axes must be clearly indicated
• the diagram must have an adequate amount (not too many) of well-defined
figures and units
• the diagram should have a short but clear explanation with comments and
references.
Impossible graphs
All graphical presentations should be lucid and crisp to make it easy for the onlooker
to grasp the essence of the data. Unfortunately, many people are tempted to use
the unlimited possibilities that modern computer programs offer. In a few seconds,
it is possible to turn a simple histogram into a three-dimensional diagram cluttered
by different colours and fonts, with numbers on top of the bars, and with more or
less irrelevant text or figures filling up the diagram area.
This misuse of graphs is not uncommon; it can be seen in newspapers, technical
journals, and other printed matter. In more serious contexts, however, this is
to be avoided. Instead, one should strive for simplicity. This means including an
adequate amount of text explaining the graph, including the axes and the scales,
so that the message of the graph is clearly conveyed. If measurement data is to
be attached, they are best shown in a separate table which is associated with the
graph.
The frequency matrix is used to summarise
measurement data that can be
divided into classes in accordance
with two or more grouping methods.
Frequency matrix
Report number A B C D E
1 //// // /
2 /// /
3 / /// //
Frequency matrix
The frequency matrix is a simple way of summarising measurement data that can
be subdivided into classes in accordance with two or more grouping methods. The
frequency matrix is based on first subdividing data into classes and then counting
the number of measurement data items found in each class. In most cases, slashes
are used when this is done manually. One slash (/) is drawn for each observation.
This provides a suitable visual picture of how data is distributed between the
different classes, and the frequency matrix can, therefore, be helpful in providing
clues to where one should look for the cause of a specific problem.
Frequency matrix – step by step
1. Establish the purpose of the data collection to be performed.
2. Plan the measurements.
3. Collect data and document the conditions (date, time, machine, operator, etc.)
4. In the event that classification is to be performed based on a number of criteria,
e.g., classification as a function of day of the week and machine, shift or
machine and type of error, a frequency matrix should be used to summarise
the measurement material that has been collected.
5. Review the data material and enter each value into its class in the frequency
matrix.
6. Assign a title to the frequency matrix and add any other explanatory
information that may be necessary to simplify the reading and understanding
of the matrix.
7. Analyse the frequency matrix: Is there any data aggregation in any specific
class or classes? What could have caused this?
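The counting in step 5 can be sketched in a few lines of Python. This is only an illustration: the machine names, the fault types and the observations are hypothetical, and the helper names `frequency_matrix` and `render` are our own.

```python
from collections import Counter

# Hypothetical observations: one (machine, fault type) pair per inspected fault.
observations = [
    ("M1", "A"), ("M1", "A"), ("M1", "B"),
    ("M2", "A"), ("M2", "C"), ("M2", "C"), ("M2", "C"),
    ("M3", "B"),
]

def frequency_matrix(pairs):
    """Count how many observations fall into each (row, column) class."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return rows, cols, counts

def render(rows, cols, counts):
    """Return the matrix as text, with one slash per observation."""
    lines = ["     " + "".join(f"{col:<6}" for col in cols)]
    for row in rows:
        cells = "".join(f"{'/' * counts[(row, col)]:<6}" for col in cols)
        lines.append(f"{row:<5}" + cells)
    return "\n".join(lines)

rows, cols, counts = frequency_matrix(observations)
print(render(rows, cols, counts))
```

A class with noticeably more slashes than the others (here machine M2 with fault type C) is where step 7 suggests one should start asking questions.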
Bar charts are used to show how one
or several variables change between
different categories or classes. The
x-axis of the diagram usually shows
the categories or classes and the height
of the bar corresponds to the measured
value of the category in question. In
many cases, there are several groups
of data within one category.
Bar chart
Bar chart
A bar chart is used when one wants to show how data can be divided into one or
several categories. For example, it makes it possible to compare data from different
time intervals, e.g. how consumed time is distributed amongst different tasks or
projects or how costs are distributed over different years.
Figure 1 A–C can be three different products that are divided into categories, e.g. three different years.
Figure 2 The diagram to the left is exactly the same as Figure 1 but drawn in another way. However, Figure 1 is usually easier to understand and the differences are easier to spot.
Analysis using a Pareto diagram is an
effective way to find possible opportunities
for improvements. If “the vital few”
are found then one has a good ground
for further actions. The Pareto diagram
is used when data can be divided into
groups of e.g. fault types, processes,
subsystems, etc.
Pareto diagram
Pareto diagram
One must first determine where the problems and the opportunities lie before
attempting to change or improve a process. This is made easier if one utilises
some form of systematic approach. A good approach is to use the various
graphical methods that are available. Through their use, one can easily gain
an overview of the collected data material and thereby determine which faults
or problems are the most serious ones.
We shall now illustrate the design, drawing and interpretation of a so-called
Pareto Diagram. Pareto diagrams are used in cases where one can subdivide data
into categories. Assume that we are manufacturing printed circuit boards and that
we find, during final inspection, that we have a 10% incidence of rework. This
rework often consists of changes that are due to a number of fault-types (categories)
e.g., missing components, incorrectly positioned components, faulty soldering,
faulty labelling, and other faults.
Another example may be the study of the causes of delays in the delivery of
finished products to customers. There are probably a number of different faults that
can explain why deliveries are late. Each of the above fault-types can naturally be
further subdivided. Incorrectly positioned components may perhaps be subdivided
into type of component or size or supplier. The objective is to determine whether
a particular cause is more prominent than others are and thereby take the necessary
steps to correct the situation.
It is vital that one subdivides one’s data material using a number of approaches.
Even if a particular fault occurs most frequently, it is not necessarily so that it is the
most expensive one. If possible, the Pareto diagram should therefore be based on
costs instead of quantity. An example: assume that we have a number of customer
complaints which we subdivide into transportation, installation, delivery, bill of
delivery, administration, miscellaneous and draw the Pareto diagram shown in
Figure 1. If instead, we attempt to illustrate the costs resulting from the different
complaints, we get Figure 2 where the causes of the complaints are ranged
in another order.
Figure 1 Division of the number of complaints according to different causes.
Figure 2 The effect of dividing with respect to costs instead of quantity in a Pareto diagram.
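The difference between ranking by quantity and ranking by cost can be sketched as follows. The complaint categories are those of the example; all counts and costs are hypothetical numbers chosen for illustration, and `pareto` is our own helper name.

```python
# Hypothetical number of complaints per cause.
complaints = {
    "transportation": 40, "installation": 25, "delivery": 15,
    "bill of delivery": 10, "administration": 6, "miscellaneous": 4,
}
# Hypothetical total cost per cause (arbitrary currency units).
costs = {
    "transportation": 400, "installation": 2500, "delivery": 350,
    "bill of delivery": 50, "administration": 900, "miscellaneous": 40,
}

def pareto(values):
    """Rank categories from largest to smallest and add the cumulative percentage."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    table, running = [], 0
    for name, value in ranked:
        running += value
        table.append((name, value, 100 * running / total))
    return table

for title, data in (("By quantity", complaints), ("By cost", costs)):
    print(title)
    for name, value, cumulative in pareto(data):
        print(f"  {name:<18}{value:>6}{cumulative:>8.1f} %")
```

With these made-up numbers, transportation heads the list when complaints are counted, whereas installation heads it when costs are used. This is the kind of reordering the two figures illustrate.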
Further comments
The Pareto diagram is commonly used in connection with improvement work.
One way is to check whether corrective actions have been successful or not. A new, later
Pareto diagram has to be compiled in the same way using the same categories
and the same scales. The basic idea is that we use the human ability to compare
pictures, and if the pictures are differently scaled, this ability cannot be utilised.
If data stems from several different time intervals, one must make sure
that the classification criteria have not been changed. Any such change makes it more difficult
or even impossible to compare the diagrams.
If we have numerical data that expresses
number of faults per batch, time, length,
thickness, etc., we can use a bar diagram
or a histogram. These types of diagrams
divide the data into a number of classes
or intervals. We then get an idea of, e.g.,
the average, variation, how the data is
distributed over the interval, if there are
extremely deviating data points, etc.
Bar diagram and histogram
Bar diagram
The bar diagram and the histogram are simple tools that present a
set of data graphically in an effective way. These tools are used when the x-axis is quantitative
(i.e. numbers), instead of qualitative (e.g. categories).
Figure 1 The diagram to the left is often used when the x-axis consists of integers (whole numbers). The fictitious data is the number of faulty units in a batch of eight units. The smallest possible outcome is of course 0 and the largest is 8. The y-axis shows how many batches had 0, 1, ... or 8 faulty units.
Figure 2 The diagram to the left is called a histogram and is often used when the data on the x-axis is conveniently divided into intervals.
Histogram
In the example of the bar diagram, we only had nine different options (0 through
8 faults per batch). On the other hand, if 0 had been the smallest and 43
the largest possible value, we would have needed 44 bars and, possibly, some bars
would have zero length (maybe there were no batches with, e.g., 4, 17, 22, 31, or
39 faults). Such a diagram would not give a good overview of the data. Instead, the
data would be grouped according to Table 2.
1st bar consists of 0–4 faults per batch
2nd 5–9
3rd 10–14
4th 15–19
5th 20–24
6th 25–29
7th 30–34
8th 35–39
9th 40–44
In this way, we get a well-composed diagram with nine bars. The groups or
intervals in Table 2 are usually called cells or bins, and the diagram is called
a histogram.
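The grouping into nine cells of width five can be sketched as below. The interval limits follow the table; the fault counts in `faults` are hypothetical and `bin_counts` is our own helper name.

```python
def bin_counts(values, width=5, low=0, high=44):
    """Count how many values fall in each cell [low, low+width-1], [low+width, ...], ..."""
    n_bins = (high - low) // width + 1
    counts = [0] * n_bins
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"value {value} lies outside {low}..{high}")
        counts[(value - low) // width] += 1
    return counts

# Hypothetical number of faults found in 15 batches.
faults = [0, 3, 4, 5, 9, 12, 12, 13, 18, 21, 23, 27, 30, 36, 43]

counts = bin_counts(faults)
for index, count in enumerate(counts):
    lower = index * 5
    print(f"{lower:>2}-{lower + 4:<2} {'#' * count}")
```

Each printed row corresponds to one bar of the histogram, turned on its side.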
Interpretation of a histogram
A histogram compiles a set of data into an overview. It is possible to get at least
an approximate idea of its average value and how the values are spread around the
average. If there are limits or tolerances for the process, these can be drawn as limits
vertically in the histogram. Then it is possible to check whether the data is inside
or outside the limits, or if the whole data set is displaced in any direction.
It is also important to evaluate the shape of the histogram (see Figure 3):
• Are the values symmetrically distributed?
• Is the distribution of values skewed in any direction?
• Are there any outliers (i.e., very extreme values)?
• Is there more than one obvious peak in the distribution of values?
Using such questions, one can get ideas for continued investigation of the data.
Different processes give different histograms. It might be completely natural and
expected that a histogram shows a markedly skewed picture because that is what
the process generates. In another case, one may expect a symmetrical histogram
but it shows two obvious peaks. In that case, further investigation is needed to find
the reasons. Figure 3 shows different histograms together with some comments.
Figure 3 Five histograms with different appearances.
Figure 3a Symmetrical. This histogram shows data from a process that generates data
that is symmetrical around its average.
Figure 3b Positively skewed. This histogram is positively skewed and may arise from
many situations. For example, data showing the number of incorrect units per batch
shows such a histogram, especially when the fault rate is low.
Figure 3c Negatively skewed. A negatively skewed histogram may be the result
of data showing the number of correct units per batch, especially when the fault
rate is low.
Figure 3d Outliers. This histogram shows a set of data with some outliers (very
extreme values). There is no general advice on how to handle outliers. They
might be due to typing errors, incorrect measurements, etc. They may also come
from some other machine or process, or from some process conditions that generate
such extreme values. However, under no circumstances should the extreme values
be discarded without some analysis or comments. On the contrary, deviating
values could sometimes provide valuable leads to improvement.
Figure 3e Two peaks. This histogram shows a data set having two peaks. There may
be many things causing this appearance. Suppose that the data set comes from
two processes with different mean values. This would then manifest itself as two
peaks if the two mean values are significantly different.
However, a smaller difference will not stand out as much, although it could still
be of practical significance. In that case, the histogram will probably show
one peak only. One has to use several different graphical approaches to get all
the information out of the data.
Further comments
The more measurements the histogram contains the more obvious its real shape
becomes. If the histogram contains only a few values, there is the risk that the
random fluctuations will induce a pattern that does not exist. Suppose that
a histogram splits into two more or less different parts. Is this then a clear
indication that the process that generated the values consists of two parts?
Or is it just randomness? Surely, if one feels that there is a real difference
behind the data, it should be investigated using different plots and numerical
analysis.
Sometimes people confuse a histogram with an ordinary diagram showing
time, e.g., weeks or months on the x-axis. The reason for this is probably that
most diagrams that people see in papers, on TV, at work, etc., show time series
of some kind. The real strength of the histogram does not emerge until it is
used in conjunction with statistical theory.
Sometimes each data point consists of two
or more values, e.g., length and weight,
lead-time and batch size. It is then a good
idea to show the data as a graphical picture
in order to reveal any relation between the
variables. If we make several smaller diagrams
closer together we get an efficient
overview of the data. Note that sometimes
a lack of relation between two variables is
also valuable information.
Scatter diagram
Scatter diagram
A major task in the improvement work is to find the basic sources for variation and
fault generation. A scatter diagram can be used to investigate the correlation
between two variables.
The diagram to the left is a scatter diagram. Every dot consists of two values: one x-value and one y-value. The x-value could be the size of a batch of items or a block of programming code. The y-value could be the fault rate found in the particular batch or programming block. The interpretation of this particular diagram would be that the larger the batch or block size, the larger the fault rate. This might be a previously unknown idea worth a deeper investigation. Of course, a diagram that shows no correlation at all is also knowledge. There might have been a strong idea that an increase in x generates an increase in y. But suppose that the diagram does not show this: what is then wrong, the original idea or the process? More investigation is needed!
Analysis of some different scatter diagrams
In Figure 1, there are four typical scatter diagrams involving two variables.
The relation between two variables is sometimes referred to as the correlation:
Figure 1 Diagrams for data with different kinds of correlation (panels a–d).
Figure 1a Positive correlation. The diagram shows a so-called positive correlation,
i.e. a trend upwards to the right, between the x- and y-results. Higher x-values
correspond to higher y-values in general.
Figure 1b No correlation. The diagram shows no obvious correlation between the
x- and y-results. The y-values do not change for changes in the x-values (except
for small random changes).
Figure 1c Negative correlation. The diagram shows a negative correlation, i.e. a trend
downwards to the right for the x- and y-results. Higher x-values correspond to
lower y-values in general.
Figure 1d The diagram shows a curved relation between the x- and y-results. For
small x-values, the y-values decrease but for large x-values the y-values increase.
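A numerical companion to these diagrams is the correlation coefficient. The sketch below uses Pearson's r, which is our choice of measure and is not prescribed by the text; the data sets are hypothetical and `pearson_r` is our own helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6, 7]
upward = [2, 3, 5, 4, 6, 8, 7]          # trend upwards to the right
downward = [7, 8, 6, 4, 5, 3, 2]        # trend downwards to the right
curved = [(x - 4) ** 2 for x in xs]     # first decreasing, then increasing

for label, ys in (("upward", upward), ("downward", downward), ("curved", curved)):
    print(f"{label:<9} r = {pearson_r(xs, ys):+.2f}")
```

Note that the curved data set gives a coefficient of zero although the relation between x and y is obvious in a plot. A number of this kind only measures linear association, which is one reason why the diagram should always be drawn as well.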
Sometimes, the outcome may depend on two or more variables. Assume that the
outcome depends on the temperature and pressure of the process. Then, it is
difficult to illustrate the relation in an ordinary two-dimensional scatter diagram.
A three-dimensional diagram or a numerical method may be needed for further
analysis.
Further comments
Very often there are many variables that one wants to plot against each other.
However, the interpretation becomes clumsy if one has to generate too many
diagrams on paper. There are nowadays computer programs that can generate
what is known as a plot matrix, i.e. on the screen one can plot several variables
against each other in pairs. In this way one gets a good overview. If the program
also supports what is called the brushing technique, then it is possible to
point with the cursor at one or several values in one plot and then the same
value or values are emphasised in all plots on the screen. In this way it is
possible to look at some values from several different angles and thus more easily
understand the information in the data.
A run chart or line diagram is usually
used to illustrate the development of a
course of events over time. The diagram
allows one to determine whether the
measured variable indicates a tendency
to increase or whether there exist periodic
variations, up and down.
Run chart
Run chart
Run charts are very often used to show how a variable is changing over time or
in space. Are there trends or periodic behaviours in the material?
The diagram has two axes. The horizontal axis often represents time, whereas
the vertical axis specifies a quantity. Measurement results are plotted as points
in the diagram. The points are ordinarily connected using straight lines. In a
run chart, a variable can be studied as a function of time.
Batch number Fault rate
1 0.014
2 0.023
3 0.000
4 0.006
5 0.030
6 0.012
7 0.005
8 0.005
9 0.010
10 0.003
Figure 1 A line diagram illustrating measured percentage defects in 10 successive batches of goods.
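As a rough sketch, the tabulated fault rates can be turned into a simple text run chart. The layout is our own: time runs downwards instead of to the right, the scale starts at zero, and `text_run_chart` is a hypothetical helper name.

```python
# Fault rates for batches 1 to 10, copied from the table above.
fault_rates = [0.014, 0.023, 0.000, 0.006, 0.030,
               0.012, 0.005, 0.005, 0.010, 0.003]

def text_run_chart(values, width=30):
    """Return one text line per measurement; the position of '*' encodes the value."""
    top = max(values)
    lines = []
    for number, value in enumerate(values, start=1):
        offset = round(value / top * width) if top > 0 else 0
        lines.append(f"{number:>2} | " + " " * offset + "*" + f"  {value:.3f}")
    return lines

mean_rate = sum(fault_rates) / len(fault_rates)
chart = text_run_chart(fault_rates)
print("\n".join(chart))
print(f"mean fault rate: {mean_rate:.4f}")
```

Reading the stars from top to bottom corresponds to following the line in the diagram from left to right.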
Some notes on the use of different scaling
The choice of the scales and of the position of the origin of the axes in a diagram
means a lot for the appearance of the diagram. This can be seen in Figure 2 where
the left-hand diagram includes the origin whereas the right-hand diagram, with
exactly the same data, has a different scale that puts the origin far below the edge
of the paper.
Figure 2 Two diagrams with the same data. However, the y-axis of the left diagram includes the origin while the y-axis of the right diagram, with a different scale, puts the origin far outside the paper.
The two diagrams in Figure 2 show the same data but convey different impressions.
The left-hand diagram seems to indicate that the level is fairly stable whereas the
right-hand diagram seems to show a dramatic increase from, say, x-value 5. Note
that neither diagram is wrong, they only present the data differently.
Even if the y-axis includes the origin, one can create different impressions by using
different ratios between the two axes. Figure 3 shows this.
Whether to use a scatter diagram or a run chart is perhaps a matter of taste. The
scatter diagram might be considered more true as it shows only those measurements
that are really recorded.
Most run charts consist of data from discrete times only, but still the chart
shows a line between the recorded points. One argument in favour of using the
run chart in this case is that the line between the points makes it easier to detect
regular patterns in the data.
One can argue that many times we do not know the values between the points.
Continuous recording of temperatures etc. can of course be shown as a run chart.
All comments that are applicable to scatter diagrams are of course applicable
to run charts as well.
Figure 3 Four different diagrams with different ratios between the two axes of the diagram.
By dividing the data based on different
criteria, we can investigate the data further
using some graphical tool. If we, e.g., have
a large set of data from different machines,
it might be wise to account for data per
machine, shift, etc. This is called stratification
of the data and we can use most of
the graphical tools.
Stratification
Stratification
Measurements are performed for a number of reasons. One may be interested in
investigating the cause of a specific problem or one may be interested in collecting
general information that may, eventually, be helpful in improving one’s process.
Collecting different categories of data (from different machines, different shifts,
different companies, etc.) and subdividing this data into subsets may lead to
the discovery of differences or peculiarities that support the continuation of an
improvement process.
The subdivision of the total data set into subsets is called stratification (the
subsets are sometimes referred to as strata, which is the plural form of stratum).
After having performed the stratification process, one can continue by using a
suitable diagram to describe the different subsets in an effort to determine the
existence of any differences in position, spread, and distribution of fault types
or the like.
During the performance of the actual measurements, one should note all
imaginable stratification factors that vary during the performance of the measurements.
Measurements should be conducted during a sufficiently long period of
time so that stratification factors of interest (shifts, material consignments, etc.)
have sufficient opportunity to vary. Stratification factors
may be composed of other categories than those mentioned
above. They may very well be numerical variables
that are subdivided into two or more classes (categories):
large, medium, or small program blocks, large or small
customers, etc.
Example 1: Stratification of a histogram
In this first example we have data that is pictured using
a histogram. It is known that the data comes from three
different sources and it is important to investigate
whether there is a difference between the averages of
the three data sets. (Such a question should be answered
using both numerical and graphical methods). By
making several histograms, one for each source,
it might be possible to see such differences.
Note that the top histogram contains all the
data; its y-axis, therefore, has a different scale. It is
impossible to see that the top histogram actually
consists of several histograms. However, if the data
is stratified, it is quite easy to see that each histogram
contains data with different averages.
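The numerical side of this comparison can be sketched by splitting the data by source and summarising each stratum. The measurements and source names below are hypothetical, and `stratify` is our own helper name.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical measurements, each recorded together with its source.
measurements = [
    ("source 1", 10.1), ("source 1", 10.3), ("source 1", 9.9),
    ("source 2", 11.0), ("source 2", 11.2), ("source 2", 10.8),
    ("source 3", 10.0), ("source 3", 10.2), ("source 3", 9.8),
]

def stratify(records):
    """Split (stratum, value) records into one list of values per stratum."""
    strata = defaultdict(list)
    for stratum, value in records:
        strata[stratum].append(value)
    return dict(strata)

strata = stratify(measurements)
summary = {name: (mean(values), stdev(values)) for name, values in strata.items()}

all_values = [value for _, value in measurements]
print(f"all data  mean = {mean(all_values):.2f}")
for name, (average, spread) in summary.items():
    print(f"{name}  mean = {average:.2f}  standard deviation = {spread:.2f}")
```

In this made-up data set, the second source lies at a higher level than the other two, which the overall mean alone does not reveal. Whether such a difference is real or just random still has to be judged with a proper statistical analysis.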
Example 2: Stratification of a scatter diagram
The second example shows how the data in a scatter diagram appear after
stratification. The first diagram contains all the data using only one symbol.
The second diagram contains one symbol per stratum (there are two strata in
the diagram).
It is obvious that the data originates from (at least) two processes. However, without
a numerical analysis, it is difficult to determine whether the data indicated using a
‘+’-sign has a greater slope than the ‘•’-data or whether it has the same slope but at
a higher level.
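One possible form of such a numerical analysis is to fit a straight line to each stratum and compare slopes and levels. The sketch below uses ordinary least squares, which is our choice of method; the points are hypothetical and `fit_line` is our own helper name.

```python
def fit_line(points):
    """Least-squares straight line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: (x, y, symbol), where the symbol marks the stratum.
data = [
    (1, 1.5, "•"), (2, 2.0, "•"), (3, 2.5, "•"), (4, 3.0, "•"),
    (1, 3.6, "+"), (2, 4.0, "+"), (3, 4.6, "+"), (4, 5.0, "+"),
]

fits = {}
for symbol in ("•", "+"):
    stratum = [(x, y) for x, y, s in data if s == symbol]
    fits[symbol] = fit_line(stratum)
    slope, intercept = fits[symbol]
    print(f"{symbol}  slope = {slope:.2f}  intercept = {intercept:.2f}")
```

In this made-up case the two slopes are nearly equal while the intercepts differ, i.e. the '+' data lies at a higher level rather than rising more steeply. With real data, the uncertainty of the fitted slopes would also have to be taken into account before drawing such a conclusion.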
Stratification – step by step
1. When planning the measurements, establish which imaginable stratification
factors (machine, shift, company, block size, etc.) are to be registered together
with the measurement results.
2. Perform the measurements and register the measured data together with the
stratification factors to enable stratification.
3. Stratify the measurement data in different ways and describe the various subsets
using suitable diagrams to enable their being compared. The availability of
software for statistical analysis, e.g. Minitab or any other statistical package,
simplifies this task considerably.
Figure 1 A scatter diagram of all data. The right-hand diagram shows the same data as the left-hand diagram but using two different symbols.