Measure Phase: Six Sigma Statistics

© OpenSourceSixSigma, LLC. OSSS LSS Green Belt v9.1 - Measure Phase

Six Sigma Statistics

• Welcome to Measure
• Process Discovery
• Six Sigma Statistics
  – Basic Statistics
  – Descriptive Statistics
  – Normal Distribution
  – Assessing Normality
  – Special Cause / Common Cause
  – Graphing Techniques
• Measurement System Analysis
• Process Capability
• Wrap Up & Action Items


Purpose of Basic Statistics

The purpose of Basic Statistics is to:
• Provide a numerical summary of the data being analyzed.
  – Data (n):
    • Factual information organized for analysis.
    • Numerical or other information represented in a form suitable for processing by computer.
    • Values from scientific experiments.
• Provide the basis for making inferences about the future.
• Provide the foundation for assessing process capability.
• Provide a common language to be used throughout an organization to describe processes.

Relax….it won’t be that bad!


Statistical Notation – Cheat Sheet

Symbol: Meaning
X: An individual value, an observation
X_1: A particular (1st) individual value
X_i: For each, all, individual values
X-bar: The mean, average of sample data
X-double-bar: The grand mean, grand average
μ: The mean of population data
p: A proportion of sample data
P: A proportion of population data
n: Sample size
N: Population size
Σ: Summation
s: The Standard Deviation of sample data
σ: The Standard Deviation of population data
s²: The variance of sample data
σ²: The variance of population data
R: The range of data
R-bar: The average range of data
k: Multi-purpose notation, e.g. # of subgroups, # of classes
|x|: The absolute value of some term
>, <: Greater than, less than
≥, ≤: Greater than or equal to, less than or equal to


Parameters vs. Statistics

Population Parameters:
– Arithmetic descriptions of a population
– μ, σ, P, σ², N

Sample Statistics:
– Arithmetic descriptions of a sample
– X-bar, s, p, s², n

[Figure: a population from which several samples are drawn]

Population: All the items that have the “property of interest” under study.

Frame: An identifiable subset of the population.

Sample: A significantly smaller subset of the population used to make an inference.
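
The distinction between parameters and statistics can be illustrated with a short Python sketch. This is not part of the original course material: the population below is simulated with made-up values (Mean 5.0, Standard Deviation 0.01), and the sample size of 200 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: N = 10,000 simulated measurements (made-up values).
population = [random.gauss(5.0, 0.01) for _ in range(10_000)]
mu = statistics.fmean(population)      # population mean (a parameter)
sigma = statistics.pstdev(population)  # population Standard Deviation (a parameter)

# Sample: a much smaller subset used to make an inference about the population.
sample = random.sample(population, 200)
x_bar = statistics.fmean(sample)       # sample mean (a statistic)
s = statistics.stdev(sample)           # sample Standard Deviation (a statistic)

print(f"N = {len(population)}, mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"n = {len(sample)}, X-bar = {x_bar:.4f}, s = {s:.4f}")
```

The sample statistics should land close to, but not exactly on, the population parameters; that gap is what inference has to account for.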


Types of Data

Attribute Data (Qualitative)
– Is always binary; there are only two possible values (0, 1)
  • Yes, No
  • Go, No go
  • Pass/Fail

Variable Data (Quantitative)
– Discrete (Count) Data
  • Can be categorized in a classification and is based on counts.
    – Number of defects
    – Number of defective units
    – Number of customer returns
– Continuous Data
  • Can be measured on a continuum; it has decimal subdivisions that are meaningful.
    – Time
    – Money
    – Pressure
    – Conveyor Speed
    – Material feed rate


Discrete Variables

Discrete Variable: Possible values for the variable

• The number of defective needles in boxes of 100 diabetic syringes: 0, 1, 2, …, 100
• The number of individuals in groups of 30 with a Type A personality: 0, 1, 2, …, 30
• The number of surveys returned out of 300 mailed in a customer satisfaction study: 0, 1, 2, …, 300
• The number of employees in 100 having finished high school or obtained a GED: 0, 1, 2, …, 100
• The number of times you need to flip a coin before a head appears for the first time: 1, 2, 3, … (note, there is no upper limit because you might need to flip forever before the first head appears)


Continuous Variables

Continuous Variable: Possible values for the variable

• The length of prison time served for individuals convicted of first degree murder: all the real numbers between a and b, where a is the smallest amount of time served and b is the largest
• The household income for households with incomes less than or equal to $30,000: all the real numbers between a and $30,000, where a is the smallest household income in the population
• The blood glucose reading for those individuals having glucose readings equal to or greater than 200: all real numbers between 200 and b, where b is the largest glucose reading in all such individuals


Definitions of Scaled Data

• Understanding the nature of data and how to represent it can affect the types of statistical tests possible.

• Nominal Scale – data consists of names, labels, or categories. Cannot be arranged in an ordering scheme. No arithmetic operations are performed for nominal data.

• Ordinal Scale – data is arranged in some order, but differences between data values either cannot be determined or are meaningless.

• Interval Scale – data can be arranged in some order, and differences between data values are meaningful and can be interpreted. There is no absolute zero, so ratios of values are not meaningful.

• Ratio Scale – data that can be ranked and for which all arithmetic operations, including division, can be performed (division by zero is of course excluded). Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.


Nominal Scale

Qualitative Variable: Possible nominal level data values for the variable

• Blood Types: A, B, AB, O
• State of Residence: Alabama, …, Wyoming
• Country of Birth: United States, China, other

Time to weigh in!


Ordinal Scale

Qualitative Variable: Possible ordinal level data values

• Automobile Sizes: Subcompact, compact, intermediate, full size, luxury
• Product rating: Poor, good, excellent
• Baseball team classification: Class A, Class AA, Class AAA, Major League


Interval Scale

Interval Variable: Possible scores

• IQ scores of students in Black Belt Training: 100… (the difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent)


Ratio Scale

Ratio Variable: Possible scores

• Grams of fat consumed per adult in the United States: 0… (If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)


Converting Attribute Data to Continuous Data

• Continuous Data is always more desirable

• In many cases Attribute Data can be converted to Continuous

• Which is more useful?
  – 15 scratches or total scratch length of 9.25”
  – 22 foreign materials or 2.5 fm/square inch
  – 200 defects or 25 defects/hour


Descriptive Statistics

Measures of Location (central tendency)
– Mean
– Median
– Mode

Measures of Variation (dispersion)
– Range
– Interquartile Range
– Standard Deviation
– Variance


Descriptive Statistics

Open the MINITAB™ Project “Measure Data Sets.mpj” and select the worksheet “basicstatistics.mtw”


Measures of Location

Mean is:
• Commonly referred to as the average.
• The arithmetic balance point of a distribution of data.

Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Descriptive Statistics: Data

Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
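
As a small worked check that is not part of the original deck: the SE Mean column is the standard error of the mean, which is the sample Standard Deviation divided by the square root of N. The sketch below assumes the StDev of 0.01007 that Minitab reports for this data set; the variable names are illustrative.

```python
import math

# Values taken from the Minitab summary for the "Data" column.
n = 200          # N, number of observations
stdev = 0.01007  # StDev to five decimal places

# Standard error of the mean: StDev / sqrt(N)
se_mean = stdev / math.sqrt(n)

print(f"SE Mean = {se_mean:.6f}")
```

Rounded to six decimal places, this agrees with the SE Mean of 0.000712 in the output.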

Stat>Basic Statistics>Display Descriptive Statistics…>Graphs…>Histogram of data, with normal curve

[Figure: Histogram (with Normal Curve) of Data. X-axis: Data, 4.97 to 5.02. Y-axis: Frequency, 0 to 80. Legend: Mean 5.000, StDev 0.01007, N 200.]




Median is:
• The mid-point, or 50th percentile, of a distribution of data.
• Found by arranging the data from low to high, or high to low:
  – It is the single middle value in the ordered list if there is an odd number of observations.
  – It is the average of the two middle values in the ordered list if there is an even number of observations.
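
The odd and even cases can be sketched in a few lines of Python. This is an illustrative helper, not taken from the course material; the function name `median` and the example values are assumptions.

```python
def median(values):
    """Return the middle value of the ordered data (50th percentile)."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)  # arrange the data from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # odd count
print(median([4, 1, 3, 2]))  # even count
```

Python's standard library offers `statistics.median`, which follows the same rule.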


Trimmed Mean is a:
Compromise between the Mean and Median.

• The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.

• Useful for data with potential extreme values.
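
A minimal sketch of the calculation in Python follows. It is an illustration under stated assumptions, not Minitab's implementation: the function name `trimmed_mean`, the 5 % default, and the choice to truncate the number of trimmed observations to a whole number are all assumptions, and statistical packages may round differently.

```python
def trimmed_mean(values, percent=5.0):
    """Mean after dropping `percent` % of the observations from each end."""
    if not values:
        raise ValueError("trimmed_mean requires at least one observation")
    if not 0 <= percent < 50:
        raise ValueError("percent must be in the range [0, 50)")
    ordered = sorted(values)
    # Number of observations removed from each tail (truncated; an assumption).
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Made-up data: 1..19 plus one extreme value.
data = list(range(1, 20)) + [1000]
print(sum(data) / len(data))  # ordinary Mean, pulled up by the extreme value
print(trimmed_mean(data, 5))  # Trimmed Mean, with 1 and 1000 removed
```

With one extreme value in twenty observations, the ordinary Mean is dominated by that value while the Trimmed Mean stays near the bulk of the data.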

Stat>Basic Statistics>Display Descriptive Statistics…>Statistics…> Trimmed Mean


Variable  N    N*  Mean    SE Mean   TrMean  StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  4.9999  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200



Mode is:
The most frequently occurring value in a distribution of data.


Mode = 5
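
For readers working outside Minitab, the mode can be found with Python's standard library. The values below are made up for illustration and are not the course data set.

```python
import statistics

# Made-up measurements; 5.00 occurs most often.
data = [4.99, 5.00, 5.00, 5.01, 5.00, 4.98, 5.01]

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(data)
print(modes)

# A distribution can have more than one mode.
print(statistics.multimode([1, 1, 2, 2, 3]))
```

`statistics.multimode` is used here rather than `statistics.mode` so that ties are reported instead of hidden.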


Measures of Variation

Range is the:
Difference between the largest observation and the smallest observation in the data set.
• A small range would indicate a small amount of variability and a large range a large amount of variability.

Interquartile Range is the:
Difference between the 75th percentile and the 25th percentile.




Use Range or Interquartile Range when the data distribution is Skewed.
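
Both measures can be sketched in Python as follows. The helper names and data are illustrative assumptions. Quartile conventions differ between software packages, so the quartiles computed here with `statistics.quantiles` (using its `"exclusive"` method) may not match another tool digit for digit.

```python
import statistics


def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def interquartile_range(values):
    """75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


# Made-up data, with and without one extreme value.
typical = [1, 2, 3, 4, 5, 6, 7]
skewed = [1, 2, 3, 4, 5, 6, 100]

print(data_range(typical), interquartile_range(typical))
print(data_range(skewed), interquartile_range(skewed))
```

In this example the extreme value inflates the Range but leaves the Interquartile Range unchanged, which is why the Interquartile Range is often preferred for skewed data.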



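Standard Deviation is:
Equivalent of the average deviation of values from the Mean for a distribution of data.
A “unit of measure” for distances from the Mean.

Use when data are symmetrical.

Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$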




Cannot calculate population Standard Deviation because this is sample data.



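Variance is the:
Average squared deviation of each individual data point from the Mean.

Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$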
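
The difference between the sample and population forms is only the divisor: n - 1 for a sample, N for a population. A minimal Python sketch, with illustrative function names and made-up data, assuming the standard textbook definitions:

```python
import math


def population_variance(values):
    """Average squared deviation from the Mean, dividing by N."""
    n = len(values)
    if n < 1:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the Mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Made-up data with Mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var, math.sqrt(pop_var))    # population Variance and Standard Deviation
print(samp_var, math.sqrt(samp_var))  # sample Variance and Standard Deviation
```

The standard library functions `statistics.pvariance`, `statistics.variance`, `statistics.pstdev` and `statistics.stdev` compute the same quantities.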


Normal Distribution

The Normal Distribution is the most recognized distribution in statistics.

What are the characteristics of a Normal Distribution?

– Only random error is present
– Process free of assignable cause
– Process free of drifts and shifts

So what is present when the data is Non-normal?


The Normal Curve

The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function.

It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately Normally Distributed.
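
For reference, the density function of a Normal Distribution with Mean μ and Standard Deviation σ has the standard textbook form shown here (the slide's own rendering of the formula did not survive extraction):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),
\qquad -\infty < x < \infty
```

The curve is symmetrical about μ, and σ controls how widely it spreads.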


Normal Distribution

Each combination of Mean and Standard Deviation generates a unique normal curve:

“Standard” Normal Distribution

– Has a μ = 0, and σ = 1

– Data from any Normal Distribution can be made to fit the standard Normal by converting raw scores to standard scores.

– Z-scores measure how many Standard Deviations from the mean a particular data-value lies.


Normal Distribution

The area under the curve between any 2 points represents the proportion of the distribution between those points.

Convert any raw score to a Z-score using the formula:

$Z = \frac{x - \mu}{\sigma}$

Refer to a set of Standard Normal Tables to find the proportion between μ and x.

The area between the Mean and any other point depends upon the Standard Deviation.
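
As a hedged illustration of the table lookup, the sketch below converts a raw score to a Z-score and uses Python's `statistics.NormalDist` in place of printed Standard Normal Tables. The values of μ, σ and x are made up for the example.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of Standard Deviations that x lies from the Mean."""
    return (x - mu) / sigma


# Illustrative values only (not from the course data set).
mu, sigma = 5.000, 0.010
x = 5.015

z = z_score(x, mu, sigma)

# Proportion of the distribution between the Mean and x:
# cumulative area up to z, minus the 0.5 that lies below the Mean.
area_mean_to_x = NormalDist().cdf(z) - 0.5

print(f"Z = {z:.2f}")
print(f"Proportion between mu and x = {area_mean_to_x:.4f}")
```

The same proportion can be read from a Standard Normal Table by looking up the row and column for the Z value.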


The Empirical Rule

[Figure: normal curve with the horizontal axis marked from -6 to +6 standard deviations]

• 68.27 % of the data will fall within +/- 1 standard deviation
• 95.45 % of the data will fall within +/- 2 standard deviations
• 99.73 % of the data will fall within +/- 3 standard deviations
• 99.9937 % of the data will fall within +/- 4 standard deviations
• 99.999943 % of the data will fall within +/- 5 standard deviations
• 99.9999998 % of the data will fall within +/- 6 standard deviations
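
These percentages can be reproduced from the Normal Distribution itself. The sketch below is an addition for illustration: it relies on the standard identity that the proportion within +/- k standard deviations equals erf(k / √2), and the function name is an assumption.

```python
import math


def proportion_within(k):
    """Proportion of a Normal Distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in range(1, 7):
    print(f"+/- {k} standard deviations: {100 * proportion_within(k):.7f} %")
```

Rounding the printed values gives the figures quoted in the Empirical Rule.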
