Data & Statistics 101

Presented by

Stu Nagourney, NJDEP, OQA

Precision, Accuracy and Bias

Precision: Degree of agreement between a series of measured values under the same conditions

Accuracy: Degree of agreement between the measured and the true value

Bias: Error caused by some aspect of the measurement system

Sources of Error

Systematic Errors: Bias always in the same direction, and constant no matter how many measurements are made

Random Errors: Vary in sign and are unpredictable. Average to 0 if enough measurements are made

Blunders: The occasional mistake that produces erroneous results; can be minimized but never eliminated
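The difference between systematic and random error can be illustrated with a small simulation. This is only a sketch: the true value, the bias, and the noise level are hypothetical numbers chosen for illustration, and the random error is assumed to be normally distributed.

```python
import random
import statistics

def simulate_measurements(true_value, bias, noise_sd, n, seed=0):
    """Return n simulated readings: true value + constant bias + random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

true_value = 10.0  # hypothetical true value
bias = 0.5         # hypothetical systematic error (same sign every time)
noise_sd = 0.2     # hypothetical spread of the random error

for n in (5, 50, 5000):
    readings = simulate_measurements(true_value, bias, noise_sd, n)
    mean_error = statistics.mean(readings) - true_value
    print(f"n={n:5d}  mean error = {mean_error:+.3f}")
```

As n grows, the random part of the error averages toward 0, while the mean error settles near the bias: more measurements do not remove a systematic error.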

Applying Statistics

One cannot sample every entity of an entire system or population. Statistics provides estimates of the behavior of an entire system or population, provided that:
– Measurement system is stable
– Individual measurements are all independent
– Individual measurements are random representatives of the system or population

Distributions

Data generated by a measurement process generally have the following properties:
– Results spread symmetrically around a central value
– Small deviations from the central value occur more often than large deviations
– The frequency distribution of a large amount of data approximates a bell-shaped curve
– The mean of even small sets of data represents the overall population better than individual values
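The last property can be checked with a short simulation. The sketch below assumes normally distributed readings; the central value, spread, and set size of 4 are hypothetical choices for illustration only.

```python
import random
import statistics

rng = random.Random(1)         # fixed seed so the run is repeatable
central, spread = 100.0, 5.0   # hypothetical central value and standard deviation

individuals = [rng.gauss(central, spread) for _ in range(4000)]

# Group the same readings into small sets of 4 and take the mean of each set.
set_size = 4
set_means = [statistics.mean(individuals[i:i + set_size])
             for i in range(0, len(individuals), set_size)]

sd_individuals = statistics.stdev(individuals)
sd_set_means = statistics.stdev(set_means)
print(f"spread of individual values: {sd_individuals:.2f}")
print(f"spread of means of {set_size}:       {sd_set_means:.2f}")
```

The means of the small sets scatter less around the central value than the individual readings do; for sets of 4 the spread is roughly halved, since it scales as 1/sqrt(set size).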

“Normal” Distribution

Other Distributions

Issues with Distributions

For large amounts of data, distributions are easy to define. For smaller data sets, it is harder to define a distribution.

Deviations from “normal” distributions:
– Outliers that are not representative of the population
– Shifts in operational characteristics that skew the distribution
– Large point-to-point variations that cause broadening

Estimation of Standard Deviation

The basic parameters that characterize a population are
– Mean ($\mu$)
– Standard Deviation ($\sigma$)

Unless the entire population is examined, $\mu$ and $\sigma$ cannot be known. They can only be estimated from a representative sample by
– Sample Mean ($\bar{X}$)
– Estimate of Standard Deviation ($s$)

Measures of Central Tendency & Variability

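Central Tendency: the value about which the individual results tend to “cluster”

Mean: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$

Median: Middle value of an odd number of results when listed in order; for an even number of results, the average of the two middle values

Standard Deviation: $s = \left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}$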

Statistics

If you make several sets of measurements from a normal distribution, you will get different means and standard deviations

Even the best scientist and/or laboratory will have measurement differences when examining the same sample (system)

What needs to be defined is the confidence in measurement data and the significance of any differences

Estimation of Standard Deviation

$X_i$     $X_i - \bar{X}$     $(X_i - \bar{X})^2$
15.2       0.143               0.0204
14.7      -0.357               0.1276
15.1       0.043               0.0018
15.0      -0.057               0.0033
15.3       0.243               0.0590
15.2       0.143               0.0204
14.9      -0.157               0.0247

$\bar{X} = 15.057$; $\sum (X_i - \bar{X})^2 = 0.2572$; $s = (0.2572 / 6)^{1/2} = 0.207$

If we take 10X the measurements, with all the values the same as above:

$\bar{X} = 15.057$; $\sum (X_i - \bar{X})^2 = 2.572$; $s = (2.572 / 69)^{1/2} = 0.193$
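The arithmetic in this example can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the same n - 1 denominator. This is a sketch for checking the numbers, not part of the original slides; the variable names are illustrative.

```python
import statistics

values = [15.2, 14.7, 15.1, 15.0, 15.3, 15.2, 14.9]

mean = statistics.mean(values)
sum_sq = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
s = statistics.stdev(values)                   # divides by n - 1 = 6

# Repeat every value 10 times: same mean, 10x the sum of squares, n - 1 = 69.
values_10x = values * 10
s_10x = statistics.stdev(values_10x)

print(f"mean = {mean:.3f}, sum of squares = {sum_sq:.4f}, s = {s:.3f}")
print(f"10x: n = {len(values_10x)}, s = {s_10x:.3f}")
```

Computed from unrounded deviations, the sum of squares comes out as 0.2571 rather than the 0.2572 obtained by adding the rounded table entries; s is 0.207 either way, and 0.193 for the tenfold data set.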

Does a Measured Value Differ from an Expected Value?

Confidence Interval of the Mean (CI): The range around the sample mean within which the population mean is expected to lie, at a stated level of confidence

$CI = \bar{X} \pm t \, s / \sqrt{n}$, where the value of $t$ depends upon the level of confidence desired and the number of degrees of freedom ($n - 1$)

Does a Measured Value Differ from an Expected Value?

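NIST SRM 2682 (Subbituminous Coal) was analyzed in triplicate for S. Certified value = 0.47%

$\bar{X} = 0.485\%$, $s = 0.0090\%$, $n = 3$

Desired CI = 95%; does the measured mean agree with the certified value for S of 0.47%?

$\bar{X} = 0.485 \pm (4.303)(0.0090) / (3)^{1/2}$

$\bar{X} = 0.485 \pm 0.0224$. Values 0.463 to 0.507 are OK, so the certified value of 0.47 is consistent with the measured mean

What if s = 0.0090, but 21 measurements were made? t goes down (20 degrees of freedom) and n goes up:

$\bar{X} = 0.485 \pm (2.086)(0.0090) / (21)^{1/2}$

$\bar{X} = 0.485 \pm 0.0041$. Values 0.481 to 0.489 are OK, so the certified value of 0.47 now falls outside the interval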
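The confidence-interval formula CI = X ± t·s/√n can be wrapped in a small helper. This is a minimal sketch: the function name `confidence_interval` is illustrative, and t is passed in from a table (4.303 for 95% confidence and 2 degrees of freedom, as in the triplicate example) rather than computed.

```python
import math

def confidence_interval(mean, s, n, t):
    """Return (low, high) for mean ± t * s / sqrt(n).

    t must be looked up for the desired confidence level and n - 1
    degrees of freedom; it is supplied by the caller, not computed here.
    """
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

certified = 0.47  # certified S value for the reference material, in %
low, high = confidence_interval(mean=0.485, s=0.0090, n=3, t=4.303)
agrees = low <= certified <= high
print(f"95% CI: {low:.3f} to {high:.3f}; contains certified value: {agrees}")
```

For the triplicate data this reproduces the 0.463 to 0.507 interval, which contains the certified value. Because n enters through the square root, the same function can be reused for other sample sizes by passing the matching n and t.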

Criteria for Rejecting an Observation

One can always reject a data point if there is an assignable cause

If not, evaluate using statistical techniques

Common Outlier Tests
– Dixon (Q) Test
– Grubbs Test
– Youden Test
– Student t Test

Criteria for Rejecting an Observation: Dixon (Q) Test

1. Calculate the range of results
2. Find the difference between the suspected result and its nearest neighbor
3. Q = Step 2 / Step 1
4. Consult a Table; if the computed Q > the value in the Table, the result in question can be rejected with 90% confidence.

Results: 0.1014, 0.1012, 0.1019 (?), 0.1016

$Q = (0.1019 - 0.1016) / (0.1019 - 0.1012)$
$Q = 0.0003 / 0.0007$
$Q = 0.43$

Since the measured Q (0.43) is less than the reference value (0.76), the value of 0.1019 cannot be rejected
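The Q calculation can be sketched in a few lines of Python. The function name `dixon_q` is illustrative, and the critical value is not computed: it is the 0.76 table value quoted in the example for four results at 90% confidence.

```python
def dixon_q(values, suspect):
    """Q = |suspect - nearest neighbor| / (largest - smallest)."""
    data_range = max(values) - min(values)        # Step 1: range of results
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    others = list(values)
    others.remove(suspect)                        # drop one copy of the suspect
    gap = min(abs(suspect - x) for x in others)   # Step 2: gap to nearest neighbor
    return gap / data_range                       # Step 3: Q = Step 2 / Step 1

results = [0.1014, 0.1012, 0.1019, 0.1016]
q = dixon_q(results, suspect=0.1019)
q_table = 0.76  # Step 4: table value quoted in the example (4 results, 90%)

print(f"Q = {q:.2f}; reject 0.1019: {q > q_table}")
```

With these four results Q rounds to 0.43, below the table value, so the suspect result is retained.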

Control Charts
