7 quality control tools

The Seven Quality Control Tools (7 qct)

comprise a number of graphical tools to

be used in the analysis of numerical data.

By means of imagination and some knowledge

of the process at hand, we can use the

graphical aids to point out and illuminate

both known and unknown problems and

sources of variation. The analysis is often

strengthened if we utilise advanced methods

of statistical analysis.

7 Quality control tools – overview

7 qct, Seven Quality Control Tools, are tools that support the analysis of numerical

data. Examples of these tools are histograms, Pareto diagrams, scatter diagrams, etc.

They can be combined with verbal methods such as affinity diagrams, tree

diagrams, etc.

The systematic application of 7 qct is a useful way of treating measurements.

However, the 7 qct tools are only a part of a statistical analysis of the data at hand.

It is easy to come across data that cannot be handled by the 7 qct alone.

Statistical methods

It is not possible to avoid the use of statistical methods when drawing meaningful

conclusions from complex numerical data sets. In other words, it is not a question

of “either/or” in the choice between graphical and statistical methods. Both

methods are necessary to investigate and convey the message hidden in complex

measurement data.

Data collections

The data that is to be collected for analysis must be of good quality. No analysis

can salvage an inferior set of data. Therefore, the time spent on planning the data

collection is a good investment if used with care. Before any collection it should

be fairly well known what questions are to be answered, what diagrams are to

be drawn, what numerical analyses are to be performed, etc. However, one must

be prepared to enlarge or change some ideas as new directions and hypotheses are

born. Maybe the investigator is looking for silver and suddenly he finds gold...

One requires factual information in order to solve a problem and this information

often takes the form of measurement data. On occasions, one may be tempted to use

measurement data that has already been collected for some other purpose, but this

may be risky.

Assume, for example, that one is interested in comparing the results of two

machines. Assume further that one has access to measurements that were made

earlier on the two machines. Unfortunately, one does not know whether the

machines were run by different operators, the state of the raw materials involved

in the process, etc. The measurements then offer little possibility of determining

whether or not there is a real difference between the machines, simply because

the results may have been affected by a number of other factors.

It is therefore essential that one first has a clear definition of one’s purpose

in performing a measurement, and then plans the execution of the measurement

based on this purpose.

Some questions

• What is the purpose of the measurement?

• What is to be measured? The purpose of the measurement process guides

the choice of measurement variables.

• Where in the process should the measurement be made? Is a flow chart of

the process required?

• How are the measurements to be made? What measuring instruments are to

be used? How are the measurements to be documented (measured values, date,

measurement method, measuring equipment, name of the person performing

the measurement, etc.)? Are any special measurement routines or instructions

necessary?

• Who or which persons will perform the measurements? Do they require

special training?

• When are the measurements to be performed? Are they to be taken during

a short interval or over a longer period of time?

• Are there any surrounding variables that should be checked? Is there a risk that

the results may be affected by hidden variables that are unknown at this time?

• How is one to report, describe and analyse the measurement material?

One usually differentiates between two different types of measurement data –

variable data and attribute data. Variable data refers to data that is the result of

measuring a length, a weight, a time, etc. They are continuous by nature and are

measured using some form of measuring equipment. Attribute data, on the other

hand, is based on some form of number counting and is, therefore, discrete (i.e., not

continuous). Examples of attribute data are the number of faulty units, percentage

defective entities, number of defects, number of faults of various types, etc.

Of the methods included in 7 qct, frequency matrices, bar charts, and Pareto

diagrams are applied in the measurement of attribute data, whereas histograms,

scatter diagrams, run charts, and stratification may be applied to the measurement

of both attribute and variable data.

Diagram – general

Most diagrams use perpendicular co-ordinate systems. Such systems have two axes

that form a 90° angle where they cross. The horizontal axis is called the x-axis and

the vertical axis is called the y-axis. Not all diagrams have arrows on the axes.

The most basic requirements that a diagram must meet are that it should be

clear and easily understood. The diagram must not contain unnecessary ornamentation,

but it should contain as much information as is required to make it easily understandable.

The axes must be clearly indicated and there must be no doubt as to which units

are being used. Does the axis show percentage, cost per tonne or what? The

diagram must be fully understandable and there must be no need to go back to

the text to find out what the diagram is supposed to show. In case the diagram is

to be interpreted together with a table of data, this table must be easily accessible.

Sometimes several diagrams need to be compared at the same time. To

enable this to be done efficiently, the scales and the sizes of the diagrams must agree

and be carefully selected. Most diagrams have linear, i.e., proportional scales, but

some become clearer if another scale is used, e.g., a logarithmic scale. However, the

use of such scales is less common, so they should be used with care. The above may

be summarised in the following requirements for a diagram:

• the diagram must be clear and easily understandable

• the axes must be clearly indicated

• the diagram must have an adequate amount (not too many) of well-defined

figures and units

• the diagram should have a short but clear explanation with comments and

references.

Impossible graphs

All graphical presentations should be lucid and crisp to make it easy for the onlooker

to grasp the essence of the data. Unfortunately, many people are tempted to use

the unlimited possibilities that modern computer programs offer. In a few seconds,

it is possible to turn a simple histogram into a three-dimensional diagram cluttered

by different colours and fonts, with numbers on top of the bars, and with more or

less irrelevant text or figures filling up the diagram area.

This misuse of graphs is not uncommon; it can be seen in newspapers, technical

journals, and other printed matter. In more serious contexts, however, this is

to be avoided. Instead, one should strive for simplicity. This means including an

adequate amount of text explaining the graph, including the axes and the scales,

so that the message of the graph is clearly conveyed. If measurement data is to

be attached, it is best shown in a separate table that is associated with the

graph.

The frequency matrix is used to summarise

measurement data that can be

divided into classes in accordance

with two or more grouping methods.

Frequency matrix

Report number A B C D E

1 //// // /

2 /// /

3 / /// //

Frequency matrix

The frequency matrix is a simple way of summarising measurement data that can

be subdivided into classes in accordance with two or more grouping methods. The

frequency matrix is based on first subdividing data into classes and then counting

the number of measurement data items found in each class. In most cases, slashes

are used when this is done manually. One slash (/) is drawn for each observation.

This provides a suitable visual picture of how data is distributed between the

different classes, and the frequency matrix can, therefore, be helpful in providing

clues to where one should look for the cause of a specific problem.

Frequency matrix – step by step

1. Establish the purpose of the data collection to be performed.

2. Plan the measurements.

3. Collect data and document the conditions (date, time, machine, operator, etc.)

4. In the event that classification is to be performed based on a number of criteria,

e.g., classification as a function of day of the week and machine, shift or

machine and type of error, a frequency matrix should be used to summarise

the measurement material that has been collected.

5. Review the data material and enter each value into its class in the frequency

matrix.

6. Assign a title to the frequency matrix and add any other explanatory

information that may be necessary to simplify the reading and understanding

of the matrix.

7. Analyse the frequency matrix: Is there any data aggregation in any specific

class or classes? What could have caused this?
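
The tallying in steps 4 to 7 can be sketched in a few lines of Python. This is only an illustration: the machine names, fault types, and observations are invented for the example, and the helper names `frequency_matrix` and `render` are hypothetical.

```python
from collections import Counter

# Hypothetical observations: one (machine, fault type) pair per faulty unit.
observations = [
    ("M1", "scratch"), ("M1", "scratch"), ("M1", "dent"),
    ("M2", "scratch"), ("M2", "crack"), ("M2", "crack"), ("M2", "crack"),
    ("M3", "dent"),
]

def frequency_matrix(pairs):
    """Count how many observations fall into each (row, column) class."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, counts

def render(rows, cols, counts):
    """Return the matrix as text lines, with one slash per observation."""
    width = max(max(len(c) for c in cols), max(counts.values())) + 2
    lines = ["".ljust(6) + "".join(c.ljust(width) for c in cols)]
    for r in rows:
        # A Counter returns 0 for classes that were never observed.
        cells = "".join(("/" * counts[(r, c)]).ljust(width) for c in cols)
        lines.append(r.ljust(6) + cells)
    return lines

rows, cols, counts = frequency_matrix(observations)
for line in render(rows, cols, counts):
    print(line)
```

A cell with noticeably more slashes than its neighbours (here machine M2 and fault type "crack") is the kind of aggregation that step 7 asks about.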

Bar charts are used to show how one

or several variables change between

different categories or classes. The

x-axis of the diagram usually shows

the categories or classes and the height

of the bar corresponds to the measured

value of the category in question. In

many cases, there are several groups

of data within one category.

Bar chart

A bar chart is used when one wants to show how data can be divided into one or

several categories. For example it makes it possible to compare data from different

time intervals, e.g. how consumed time is distributed amongst different tasks or

projects or how costs are distributed over different years.

Figure 1 A–C can be three different products that are divided into categories, e.g. three different years.

Figure 2 This diagram shows exactly the same data as Figure 1 but is drawn in another way. However, Figure 1 is usually easier to understand and the differences are easier to spot.

Analysis using a Pareto diagram is an

effective way to find possible opportunities

for improvements. If “the vital few”

are found then one has a good ground

for further actions. The Pareto diagram

is used when data can be divided into

groups of e.g. fault types, processes,

subsystems, etc.

Pareto diagram

One must first determine where the problems and the opportunities lie before

attempting to change or improve a process. This is made easier if one utilises

some form of systematic approach. A good approach is to use the various

graphical methods that are available. Through their use, one can easily gain

an overview of the collected data material and thereby determine which faults

or problems are the most serious ones.

We shall now illustrate the design, drawing and interpretation of a so-called

Pareto Diagram. Pareto diagrams are used in cases where one can subdivide data

into categories. Assume that we are manufacturing printed circuit boards and that

we find, during final inspection, that we have a 10% incidence of rework. This

rework often consists of changes that are due to a number of fault-types (categories)

e.g., missing components, incorrectly positioned components, faulty soldering,

faulty labelling, and other faults.

Another example may be the study of the causes of delays in the delivery of

finished products to customers. There are probably a number of different faults that

can explain why deliveries are late. Each of the above fault-types can naturally be

further subdivided. Incorrectly positioned components may perhaps be subdivided

into type of component or size or supplier. The objective is to determine whether

a particular cause is more prominent than others are and thereby take the necessary

steps to correct the situation.

It is vital that one subdivides one’s data material using a number of approaches.

Even if a particular fault occurs most frequently, it is not necessarily so that it is the

most expensive one. If possible, the Pareto diagram should therefore be based on

costs instead of quantity. An example: assume that we have a number of customer

complaints which we subdivide into transportation, installation, delivery, bill of

delivery, administration, miscellaneous and draw the Pareto diagram shown in

Figure 1. If instead, we attempt to illustrate the costs resulting from the different

complaints, we get Figure 2 where the causes of the complaints are ranked

in another order.

Figure 1 Division of the number of complaints according to different causes.

Figure 2 The effect of dividing with respect to costs instead of quantity in a Pareto diagram.
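
The difference between ranking by quantity and ranking by cost can be sketched as follows. All numbers are hypothetical (the source gives no figures for the complaint example), and `pareto` is an illustrative helper, not a standard function.

```python
def pareto(values):
    """Sort categories by descending value and add the cumulative percentage."""
    total = sum(values.values())
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    table, running = [], 0
    for name, value in ordered:
        running += value
        table.append((name, value, 100.0 * running / total))
    return table

# Hypothetical number of complaints per cause.
complaints = {
    "transportation": 40, "installation": 25, "delivery": 15,
    "bill of delivery": 10, "administration": 6, "miscellaneous": 4,
}
# Hypothetical average cost of one complaint of each kind.
cost_per_complaint = {
    "transportation": 50, "installation": 400, "delivery": 100,
    "bill of delivery": 20, "administration": 30, "miscellaneous": 25,
}
costs = {name: n * cost_per_complaint[name] for name, n in complaints.items()}

for title, data in (("By quantity", complaints), ("By cost", costs)):
    print(title)
    for name, value, cumulative in pareto(data):
        print(f"  {name:<18}{value:>7}{cumulative:>8.1f} %")
```

With these assumed figures, transportation leads when complaints are counted, whereas installation leads when costs are compared, which is the kind of reordering the two figures illustrate.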

Further comments

The Pareto diagram is commonly used in connection with improvement work.

One way is to check whether corrective actions have been successful. A new, later

Pareto diagram has to be compiled in the same way using the same categories

and the same scales. The basic idea is that we use the human ability to compare

pictures, and if the pictures are differently scaled, this ability cannot be utilised.

If data stems from several different time intervals, it must be ensured

that the criteria have not been changed. Any such change makes it more difficult

or even impossible to compare the diagrams.

If we have numerical data that expresses

number of faults per batch, time, length,

thickness, etc., we can use a bar diagram

or a histogram. These types of diagrams

divide the data into a number of classes

or intervals. We then get an idea of, e.g.,

the average, variation, how the data is

distributed over the interval, if there are

extremely deviating data points, etc.

Bar diagram and histogram

Bar diagram

The bar diagram and the histogram are simple tools that present a set of data

graphically in an effective way. These tools are used when the x-axis is quantitative

(i.e. numbers) instead of qualitative (e.g. categories).

Figure 1 This type of diagram is often used when the x-axis consists of integers (whole numbers). The fictitious data is the number of faulty units in a batch of eight units. The smallest possible outcome is of course 0 and the largest is 8. The y-axis shows how many batches had 0, 1, ... or 8 faulty units.

Figure 2 This diagram is called a histogram and is often used when the data on the x-axis is conveniently divided into intervals.

Histogram

In the example of the bar diagram, we only had nine different options (0 through

8 faults per batch). On the other hand, if 0 had been the smallest and 43

the largest possible value, we would have needed 44 bars and, possibly, some bars

would have zero length (maybe there were no batches with, e.g., 4, 17, 22, 31, or

39 faults). Such a diagram would not give a good overview of the data. Instead, the

data would be grouped according to Table 2.

Table 2

Bar    Faults per batch
1st    0–4
2nd    5–9
3rd    10–14
4th    15–19
5th    20–24
6th    25–29
7th    30–34
8th    35–39
9th    40–44

In this way, we get a well-composed diagram with nine bars. The groups or

intervals in Table 2 are usually called cells or bins, and the diagram is called

a histogram.
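
The grouping into nine cells of width five can be sketched like this. The batch data are invented for the illustration, and `bin_counts` and `bin_label` are hypothetical helper names.

```python
def bin_counts(values, width=5, n_bins=9):
    """Count how many values fall into each of n_bins equal-width cells starting at 0."""
    counts = [0] * n_bins
    for v in values:
        if not 0 <= v < width * n_bins:
            raise ValueError(f"value {v} lies outside the binned range")
        counts[v // width] += 1  # integer division selects the cell
    return counts

def bin_label(i, width=5):
    """Label of cell i, e.g. '0-4' for the first cell."""
    low = i * width
    return f"{low}-{low + width - 1}"

# Hypothetical number of faults found in each of 15 batches.
faults_per_batch = [3, 7, 12, 12, 18, 21, 23, 8, 14, 43, 0, 9, 16, 11, 27]

counts = bin_counts(faults_per_batch)
for i, count in enumerate(counts):
    print(f"{bin_label(i):>5} | {'#' * count}")
```

Each printed row corresponds to one bar of the histogram; empty cells are kept so that gaps in the data stay visible.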

Interpretation of a histogram

A histogram compiles a set of data into an overview. It is possible to get at least an

approximate idea of its average value and how the values are spread around the

average. If there are limits or tolerances for the process, these can be drawn as vertical

lines in the histogram. Then it is possible to check whether the data is inside

or outside the limits, or if the whole data set is displaced in any direction.

It is also important to evaluate the shape of the histogram (see Figure 3):

• Are the values symmetrically distributed?

• Is the distribution of values skewed in any direction?

• Are there any outliers (i.e., very extreme values)?

• Is there more than one obvious peak in the distribution of values?

Using such questions, one can get ideas for continued investigation of the data.

Different processes give different histograms. It might be completely natural and

expected that a histogram shows a markedly skewed picture because that is what

the process generates. In another case, one may expect a symmetrical histogram

but it shows two obvious peaks. In that case, further investigation is needed to find

the reasons. Figure 3 shows different histograms together with some comments.

Figure 3 Five histograms with different appearances.

Figure 3a Symmetrical. This histogram shows data from a process that generates data

that is symmetrical around its average.

Figure 3b Positively skewed. This histogram is positively skewed and may arise from

many situations. For example, data showing the number of incorrect units per batch

shows such a histogram, especially when the fault rate is low.

Figure 3c Negatively skewed. A negatively skewed histogram may be the result

of data showing the number of correct units per batch, especially when the fault

rate is low.

Figure 3d Outliers. This histogram shows a set of data with some outliers (very

extreme values). There is no general advice on how to handle outliers. They

might be due to typing errors, incorrect measurements, etc. They may also come

from another machine or process, or from process conditions that generate

such extreme values. However, under no circumstances should the extreme values

be discarded without some analysis or comments. On the contrary, deviating

values could sometimes provide valuable leads to improvement.

Figure 3e Two peaks. This histogram shows a data set having two peaks. There may

be many things causing this appearance. Suppose that the data set comes from

two processes with different mean values. This would then manifest itself as two

peaks if the two mean values are significantly different.

A smaller difference, however, will not stand out as much but could still

be of practical significance. In that case, the histogram will probably show

one peak only. One has to use several different graphical approaches to get all

the information out of the data.
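
The visual judgement of symmetry and skewness can be complemented by a number. The sketch below uses the moment coefficient of skewness, which is one common measure and not one prescribed by the text; the batch data are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness: positive for a long right tail, negative for a long left tail."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5

# Hypothetical batches of eight units with a low fault rate.
faulty_units = [0, 0, 0, 1, 1, 2, 5]
correct_units = [8 - f for f in faulty_units]

print(skewness(faulty_units))     # positive, as described for Figure 3b
print(skewness(correct_units))    # negative, as described for Figure 3c
print(skewness([1, 2, 3, 4, 5]))  # zero for perfectly symmetrical data
```

A value near zero does not by itself prove symmetry, and with few observations the number fluctuates considerably, so it should be read together with the histogram.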

Further comments

The more measurements the histogram contains the more obvious its real shape

becomes. If the histogram contains only a few values, there is the risk that the

random fluctuations will induce a pattern that does not exist. Suppose that

a histogram splits into two more or less different parts. Is this then a clear

indication that the process that generated the values consists of two parts?

Or is it just randomness? Surely, if one feels that there is a real difference

behind the data, it should be investigated using different plots and numerical

analysis.

Sometimes people confuse a histogram with an ordinary diagram showing

time, e.g., weeks or months on the x-axis. The reason for this is probably that

most diagrams that people see in papers, on TV, at work, etc., show time series

of some kind. The real strength of the histogram does not emerge until it is

used in conjunction with statistical theory.

Sometimes each data point consists of two

or more values, e.g., length and weight,

lead-time and batch size. It is then a good

idea to show the data as a graphical picture

in order to reveal any relation between the

variables. If we place several smaller diagrams

close together, we get an efficient

overview of the data. Note that sometimes

a lack of relation between two variables is

also valuable information.

Scatter diagram

A major task in improvement work is to find the basic sources of variation and

fault generation. A scatter diagram can be used to investigate the correlation

between two variables.

The diagram is a scatter diagram. Every dot consists of two values: one x-value and one y-value. The x-value could be the size of a batch of items or a block of programming code. The y-value could be the fault rate found in the particular batch or programming block. The interpretation of this particular diagram would be that the larger the batch or block size, the larger the fault rate. This might be a previously unknown relation worth a deeper investigation. Of course, a diagram that shows no correlation at all is also knowledge. There might have been a strong idea that an increase in x generates an increase in y. But suppose that the diagram does not show this: what is then wrong, the original idea or the process? More investigation is needed!

Analysis of some different scatter diagrams

In Figure 1, there are four typical scatter diagrams involving two variables.

The relation between two variables is sometimes referred to as the correlation:

Figure 1 Diagram for data with different kinds of correlation.

Figure 1a Positive correlation The diagram shows a so-called positive correlation,

i.e. a trend upwards to the right, between the x- and y-results. Higher x-values

correspond to higher y-values in general.

Figure 1b No correlation The diagram shows no obvious correlation between the

x- and y-results. The y-values do not change for changes in the x-values (except

for small random changes).

Figure 1c Negative correlation The diagram shows a negative correlation, i.e. a trend

downwards to the right for the x- and y-results. Higher x-values correspond to

lower y-values in general.

Figure 1d The diagram shows a curved relation between the x- and y-results. For

small x-values, the y-values decrease but for large x-values the y-values increase.
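
A numerical companion to the four diagrams is the correlation coefficient. The sketch below computes Pearson's r, a standard measure of linear relation that the text itself does not name; the data sets are invented to mimic panels a, c, and d.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables does not vary")
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * x + 1 for x in xs]  # like Figure 1a
falling = [-x for x in xs]        # like Figure 1c
curved = [x * x for x in xs]      # like Figure 1d

print(pearson_r(xs, rising))   # close to +1
print(pearson_r(xs, falling))  # close to -1
print(pearson_r(xs, curved))   # close to 0 despite the clear curved relation
```

The last case shows why the diagram is still needed: a coefficient near zero rules out a linear trend only, not a relation in general.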

Sometimes, the outcome may depend on two or more variables. Assume that the

outcome depends on the temperature and pressure of the process. Then, it is

difficult to illustrate the relation in an ordinary two-dimensional scatter diagram.

A three-dimensional diagram or a numerical method may be needed for further

analysis.

Further comments

Very often there are many variables that one wants to plot against each other.

However, the interpretation becomes clumsy if one has to generate too many

diagrams on paper. There are nowadays computer programs that can generate

what is known as a plot matrix, i.e. on the screen one can plot several variables

against each other in pairs. In this way one gets a good overview. If the program

also supports what is called brushing, then it is possible to

point with the cursor at one or several values in one plot and then the same

value or values are emphasised in all plots on the screen. In this way it is

possible to look at some values from several different angles and thus more easily

understand the information in the data.

A run chart or line diagram is usually

used to illustrate the development of a

course of events over time. The diagram

allows one to determine whether the

measured variable indicates a tendency

to increase or whether there exist

periodic variations, up and down.

Run chart

Run charts are very often used to show how a variable is changing over time or

in space. Are there trends or periodic behaviours in the material?

The diagram has two axes. The horizontal axis often represents time, whereas

the vertical axis specifies a quantity. Measurement results are plotted as points

in the diagram. The points are ordinarily connected using straight lines. In a

run chart, a variable can be studied as a function of time.

Batch number   Fault rate
 1             0.014
 2             0.023
 3             0.000
 4             0.006
 5             0.030
 6             0.012
 7             0.005
 8             0.005
 9             0.010
10             0.003

Figure 1 A line diagram illustrating measured percentage defects in 10 successive batches of goods.
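
The table can be turned into a rough run chart without any plotting library. The fault rates below are the ones listed in the table; the text rendering and the helper name `text_run_chart` are only an illustrative sketch.

```python
# Fault rates for batches 1 to 10, taken from the table above.
fault_rates = [0.014, 0.023, 0.000, 0.006, 0.030, 0.012, 0.005, 0.005, 0.010, 0.003]

mean_rate = sum(fault_rates) / len(fault_rates)
above_mean = [batch for batch, rate in enumerate(fault_rates, start=1) if rate > mean_rate]

def text_run_chart(values, width=30):
    """One text line per batch; the star's horizontal position is proportional to the value."""
    top = max(values)
    lines = []
    for batch, rate in enumerate(values, start=1):
        position = round(rate / top * width) if top else 0
        lines.append(f"{batch:>2} {rate:.3f} |" + " " * position + "*")
    return lines

for line in text_run_chart(fault_rates):
    print(line)
print(f"mean fault rate: {mean_rate:.4f}, batches above the mean: {above_mean}")
```

Here time runs downwards instead of to the right, but the reading is the same: one looks for trends, periodic behaviour, or isolated peaks such as batch 5.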

Some notes on the use of different scaling

The choice of the scales and of the position of the origin of the axes in a diagram

means a lot for the appearance of the diagram. This can be seen in Figure 2 where

the left-hand diagram includes the origin whereas the right-hand diagram, with

exactly the same data, has a different scale that puts the origin far below the edge

of the paper.

Figure 2 Two diagrams with the same data. However, the y-axis of the left diagram includes the origin, while the y-axis of the right diagram, with a different scale, puts the origin far outside the paper.

The two diagrams in Figure 2 show the same data but convey different impressions.

The left-hand diagram seems to indicate that the level is fairly stable whereas the

right-hand diagram seems to show a dramatic increase from, say, x-value 5. Note

that neither diagram is wrong; they only present the data differently.
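
The effect of the axis choice can be expressed as the share of the axis height that the variation in the data occupies. The values and axis limits below are hypothetical, and `visual_span` is an illustrative helper.

```python
def visual_span(values, axis_min, axis_max):
    """Fraction of the y-axis height covered by the spread of the data."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be larger than axis_min")
    return (max(values) - min(values)) / (axis_max - axis_min)

# Hypothetical measurements that drift upwards slightly.
levels = [100, 101, 100, 102, 101, 104, 105, 106]

with_origin = visual_span(levels, 0, 110)  # y-axis starts at the origin
truncated = visual_span(levels, 99, 107)   # y-axis starts just below the data

print(f"axis from 0:  the data spans {with_origin:.0%} of the axis height")
print(f"axis from 99: the data spans {truncated:.0%} of the axis height")
```

The same six-unit drift looks almost flat in the first case and dramatic in the second, which is why diagrams that are to be compared should share their scales.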

Even if the y-axis includes the origin, one can create different impressions by using

different ratios between the two axes. Figure 3 shows this.

Whether to use a scatter diagram or a run chart is perhaps a matter of taste. The

scatter diagram might be considered more truthful as it shows only those measurements

that were actually recorded.

Most run charts consist of data from discrete times only, but still the chart

shows a line between the recorded points. One argument in favour of using the

run chart in this case is that the lines between the points make it easier to detect

regular patterns in the data.

On the other hand, one can argue that we often do not know the values between the points.

Continuous recording of temperatures etc. can of course be shown as a run chart.

All comments that are applicable to scatter diagrams are of course applicable

to run charts as well.

Figure 3 Four different diagrams with different ratios between the two axes of the diagram.

By dividing the data based on different

criteria, we can investigate the data further

using some graphical tool. If we, e.g., have

a large set of data from different machines,

it might be wise to account for data per

machine, shift, etc. This is called stratification

of the data and we can use most of

the graphical tools.

Stratification

Measurements are performed for a number of reasons. One may be interested in

investigating the cause of a specific problem or one may be interested in collecting

general information that may, eventually, be helpful in improving one’s process.

Collecting different categories of data (from different machines, different shifts,

different companies, etc.) and subdividing this data into subsets may lead to

the discovery of differences or peculiarities that support the continuation of an

improvement process.

The subdivision of the total data set into subsets is called stratification (the

subsets are sometimes referred to as strata, which is the plural form of stratum).

After having performed the stratification process, one can continue by using a

suitable diagram to describe the different subsets in an effort to determine the

existence of any differences in position, spread, and distribution of fault types

or the like.

During the performance of the actual measurements, one should note all

imaginable stratification factors that vary during the performance of the

measurements. Measurements should be conducted during a sufficiently long period of

time so that stratification factors of interest (shifts, material consignments, etc.)

have sufficient opportunity to vary. Stratification factors

may be composed of other categories than those

mentioned above. They may very well be numerical variables

that are subdivided into two or more classes (categories):

large, medium, or small program blocks, large or small

customers, etc.

Example 1: Stratification of a histogram

In this first example we have data that is pictured using

a histogram. It is known that the data comes from three

different sources and it is important to investigate

whether there is a difference between the averages of

the three data sets. (Such a question should be answered

using both numerical and graphical methods). By

making several histograms, one for each source,

it might be possible to see such differences.

Note that the top histogram contains all the

data; its y-axis, therefore, has a different scale. It is

impossible to see that the top histogram actually

consists of several histograms. However, if the data

is stratified, it is quite easy to see that each histogram

contains data with different averages.

Example 2: Stratification of a scatter diagram

The second example shows how the data in a scatter diagram appear after

stratification. The first diagram contains all the data using only one symbol.

The second diagram contains one symbol per stratum (there are two strata in

the diagram).

It is obvious that the data originates from (at least) two processes. However, without

a numerical analysis, it is difficult to determine whether the data indicated using a

‘+’-sign has a greater slope than the ‘•’-data or whether it has the same slope but at

a higher level.

Stratification – step by step

1. When planning the measurements, establish which imaginable stratification

factors (machine, shift, company, block size, etc.) are to be registered together

with the measurement results.

2. Perform the measurements and register the measured data together with the

stratification factors to enable stratification.

3. Stratify the measurement data in different ways and describe the various subsets

using suitable diagrams so that they can be compared. The availability of

software for statistical analysis, e.g. Minitab or any other statistical package,

simplifies this task considerably.
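
Step 3 can also be sketched without a statistical package. The example below stratifies hypothetical (stratum, x, y) records and fits a least-squares line to each stratum, which is one way of addressing the question in Example 2 of whether two strata differ in slope or only in level. The data and the helper names `stratify` and `fit_line` are assumptions made for the illustration.

```python
from collections import defaultdict

def stratify(records):
    """Group (stratum, x, y) records into one list of (x, y) points per stratum."""
    strata = defaultdict(list)
    for stratum, x, y in records:
        strata[stratum].append((x, y))
    return dict(strata)

def fit_line(points):
    """Ordinary least-squares slope and intercept for a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("x does not vary within this stratum")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical measurements registered together with their stratification factor.
records = [
    ("dot", 1, 1.5), ("dot", 2, 2.0), ("dot", 3, 2.5), ("dot", 4, 3.0),
    ("plus", 1, 3.5), ("plus", 2, 4.0), ("plus", 3, 4.5), ("plus", 4, 5.0),
]

fits = {name: fit_line(points) for name, points in stratify(records).items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: slope {slope:.2f}, intercept {intercept:.2f}")
```

In this invented data set both strata have the same slope but different levels. With real measurements the fitted values carry uncertainty, so a proper statistical comparison is still needed before drawing conclusions.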

Figure 1 A scatter diagram of all data. The right-hand diagram shows the same data as the left-hand diagram but using two different symbols.
