data & statistics 101
DATA & STATISTICS 101
Presented by
Stu Nagourney NJDEP, OQA
Precision, Accuracy and Bias
Precision: Degree of agreement between a series of measured values under the same conditions
Accuracy: Degree of agreement between the measured and the true value
Bias: Error caused by some aspect of the measurement system
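A minimal sketch of how these terms become numbers, assuming replicate measurements of a reference material whose true value is known. The replicate values and the true value are hypothetical and only for illustration. Bias is estimated as the mean minus the true value, and precision as the sample standard deviation of the replicates.

```python
import statistics

# Hypothetical replicate results for a reference material (illustration only)
replicates = [10.2, 10.4, 10.3, 10.5, 10.1]
true_value = 10.0  # hypothetical true (certified) value

mean = statistics.mean(replicates)
bias = mean - true_value                    # systematic offset from the true value (accuracy)
precision_s = statistics.stdev(replicates)  # spread of the replicates (precision), n - 1 denominator

print(f"mean = {mean:.3f}, bias = {bias:+.3f}, s = {precision_s:.3f}")
# prints: mean = 10.300, bias = +0.300, s = 0.158
```

A small s with a large bias means the results are precise but not accurate; both numbers are needed to describe a measurement system.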
Precision, Accuracy and Bias
Sources of Error
Systematic Errors: Bias that is always in the same direction and stays constant no matter how many measurements are made
Random Errors: Vary in sign and are unpredictable; they average to 0 if enough measurements are made
Blunders: The occasional mistake that produces erroneous results; can be minimized but never eliminated
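The difference between systematic and random error can be seen in a small simulation. This is a sketch under an assumed measurement model (result = true value + constant bias + normally distributed noise); the function name and all parameter values are hypothetical. Blunders are not modeled.

```python
import random
import statistics

def simulate_measurements(true_value, systematic_error, random_sd, n, seed=0):
    """Hypothetical model: each result = true value + constant bias + random noise."""
    rng = random.Random(seed)
    return [true_value + systematic_error + rng.gauss(0.0, random_sd) for _ in range(n)]

TRUE_VALUE = 10.0
SYSTEMATIC_ERROR = 0.5  # assumed constant bias, same sign on every measurement
RANDOM_SD = 1.0         # assumed spread of the random error

for n in (5, 100, 10_000):
    data = simulate_measurements(TRUE_VALUE, SYSTEMATIC_ERROR, RANDOM_SD, n)
    mean_error = statistics.mean(data) - TRUE_VALUE
    print(f"n = {n:>6}: mean error = {mean_error:+.3f}")
```

As n grows, the random part of the mean error shrinks toward 0, but the mean error settles near the assumed bias of 0.5: averaging more measurements does not remove systematic error.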
Applying Statistics
One cannot sample every entity of an entire system or population. Statistics provides estimates of the behavior of an entire system or population, provided that:
– The measurement system is stable
– Individual measurements are all independent
– Individual measurements are random representatives of the system or population
Distributions
Data generated by a measurement process generally have the following properties:
– Results spread symmetrically around a central value
– Small deviations from the central value occur more often than large deviations
– The frequency distribution of a large amount of data approximates a bell-shaped curve
– The mean of even a small set of data represents the whole population better than individual values do
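The last property reflects the fact that the mean of n independent values scatters less than single values, with a standard deviation of about σ/√n. A sketch, assuming normally distributed results with a hypothetical population mean and standard deviation:

```python
import random
import statistics

rng = random.Random(1)
MU, SIGMA = 15.0, 0.2   # hypothetical population mean and standard deviation
SET_SIZE, N_SETS = 4, 20_000

# Spread of single measurements
singles = [rng.gauss(MU, SIGMA) for _ in range(N_SETS)]
# Spread of the means of small sets of 4 measurements each
set_means = [
    statistics.mean([rng.gauss(MU, SIGMA) for _ in range(SET_SIZE)])
    for _ in range(N_SETS)
]

print(f"sd of single values: {statistics.stdev(singles):.3f}")    # close to SIGMA
print(f"sd of 4-value means: {statistics.stdev(set_means):.3f}")  # close to SIGMA / sqrt(4)
```

With sets of 4, the means scatter about half as much as single values, so even a small set's mean is a better estimate of the population mean.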
“Normal” Distribution
Other Distributions
Issues with Distributions
For large amounts of data, distributions are easy to define. For smaller data sets, it is harder to define a distribution.
Deviations from “normal” distributions:
– Outliers that are not representative of the population
– Shifts in operational characteristics that skew the distribution
– Large point-to-point variations that cause broadening
Estimation of Standard Deviation
The basic parameters that characterize a population are:
– Mean ($\mu$)
– Standard Deviation ($\sigma$)
Unless the entire population is examined, $\mu$ and $\sigma$ cannot be known. They can only be estimated from a representative sample by:
– Sample Mean ($\bar{X}$)
– Estimate of Standard Deviation ($s$)
Measures of Central Tendency & Variability
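Central Tendency: the value about which the individual results tend to “cluster”
Mean: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$
Median: Middle value of an odd number of results when listed in order (for an even number of results, the average of the two middle values)
Standard Deviation (variability): $s = \left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}$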
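The three formulas translate directly into code. A minimal sketch written out by hand to mirror the formulas; the function names are illustrative, and in practice Python's `statistics.mean`, `statistics.median` and `statistics.stdev` compute the same quantities.

```python
import math

def mean(values):
    """Arithmetic mean: (X1 + X2 + ... + Xn) / n."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered results; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def sample_sd(values):
    """Estimate s = sqrt(sum((Xi - mean)^2) / (n - 1)); needs at least two values."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate s")
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), median(data), round(sample_sd(data), 3))  # 5.0 4.5 2.138
```

The n − 1 denominator is used because the deviations are measured from the sample mean rather than the unknown population mean $\mu$.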
Measures of Central Tendency & Variability
Statistics
If you make several sets of measurements from a normal distribution, you will get different means and standard deviations
Even the best scientist and/or laboratory will have measurement differences when examining the same sample (system)
What needs to be defined is the confidence in measurement data and the significance of any differences
Estimation of Standard Deviation
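| $X_i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$ |
|---|---|---|
| 15.2 | 0.143 | 0.0204 |
| 14.7 | -0.357 | 0.1276 |
| 15.1 | 0.043 | 0.0018 |
| 15.0 | -0.057 | 0.0033 |
| 15.3 | 0.243 | 0.0590 |
| 15.2 | 0.143 | 0.0204 |
| 14.9 | -0.157 | 0.0247 |

$\bar{X} = 15.057$, $\sum (X_i - \bar{X})^2 = 0.2572$, $s = (0.2572/6)^{1/2} = 0.207$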
If we take 10 times as many measurements (n = 70), with each of the values above occurring 10 times:
$\bar{X} = 15.057$, $\sum (X_i - \bar{X})^2 = 2.572$, $s = (2.572/69)^{1/2} = 0.193$
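The arithmetic of this example can be checked with Python's standard `statistics` module; this is a sketch, and `statistics.stdev` uses the n − 1 denominator, matching the formula for $s$.

```python
import statistics

# The seven results from the worked example
results = [15.2, 14.7, 15.1, 15.0, 15.3, 15.2, 14.9]

x_bar = statistics.mean(results)
ss = sum((x - x_bar) ** 2 for x in results)  # sum of squared deviations
s = statistics.stdev(results)                 # sample standard deviation, n - 1 denominator
print(f"n = {len(results)}: mean = {x_bar:.3f}, sum of squares = {ss:.4f}, s = {s:.3f}")

# The same values, each repeated 10 times (n = 70)
results_x10 = results * 10
print(f"n = {len(results_x10)}: s = {statistics.stdev(results_x10):.3f}")
```

The unrounded sum of squares is 0.2571; the 0.2572 in the table comes from adding the rounded squares. Repeating the data only lowers $s$ from 0.207 to 0.193, because the spread of the values is unchanged and only the effect of the n − 1 denominator shrinks.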
Does a Measured Value Differ from an Expected Value?
Confidence Interval of the Mean (CI): the range around the sample mean within which the population mean is expected to lie, at a stated level of confidence
$CI = \bar{X} \pm t\,s/\sqrt{n}$, where the value of $t$ depends on the desired confidence level and the number of degrees of freedom ($n - 1$)
Does a Measured Value Differ from an Expected Value?
NIST SRM 2682 (Subbituminous Coal) was analyzed in triplicate for sulfur (S). Certified value = 0.47%
$\bar{X} = 0.485\%$, $s = 0.0090\%$, $n = 3$
Desired confidence level = 95%; does the measured mean agree with the certified value for S of 0.47%?
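$CI = 0.485 \pm (4.303)(0.0090)/\sqrt{3}$
$CI = 0.485 \pm 0.0224$; the interval runs from 0.463 to 0.507. The certified value of 0.47% lies inside it, so the measured mean agrees with the certified value at 95% confidence.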
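What if $s = 0.0090$, but 21 measurements were made? With 20 degrees of freedom, $t$ goes down, and $\sqrt{n}$ goes up:
$CI = 0.485 \pm (2.086)(0.0090)/\sqrt{21}$
$CI = 0.485 \pm 0.0041$; the interval runs from 0.481 to 0.489. The certified value of 0.47% now falls outside it, so with this many measurements the difference would be significant at 95% confidence.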
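The confidence-interval calculation for both cases can be sketched as follows. The only t values assumed are the two two-sided 95% values quoted in this example (4.303 for 2 degrees of freedom, 2.086 for 20); for other sample sizes, look $t$ up in a table or compute it, for example with `scipy.stats.t.ppf(0.975, df)`. The function name is illustrative.

```python
import math

# Two-sided 95% Student t values quoted in this example, keyed by degrees of freedom (n - 1)
T_95 = {2: 4.303, 20: 2.086}

def confidence_interval(mean, s, n, t_table=T_95):
    """Return (low, high) for mean ± t * s / sqrt(n); KeyError if n - 1 is not in t_table."""
    t = t_table[n - 1]
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

certified = 0.47
for n in (3, 21):
    low, high = confidence_interval(0.485, 0.0090, n)
    inside = low <= certified <= high
    print(f"n = {n}: {low:.3f} to {high:.3f}, certified value inside: {inside}")
```

Both $t$ and $1/\sqrt{n}$ shrink as n grows, which is why the interval narrows so much between 3 and 21 measurements.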
Criteria for Rejecting an Observation
One can always reject a data point if there is an assignable cause
If not, evaluate using statistical techniques
Common Outlier Tests
– Dixon (Q) Test
– Grubbs Test
– Youden Test
– Student t Test
Criteria for Rejecting an Observation: Dixon (Q) Test
1. Calculate the range of results
2. Find the difference between the suspected result and its nearest neighbor
3. Q = Step 2 / Step 1
4. Consult a table; if the computed Q > the value in the table, the result in question can be rejected with 90% confidence.
Results: 0.1014, 0.1012, 0.1019 (?), 0.1016
$Q = (0.1019 - 0.1016) / (0.1019 - 0.1012)$
$Q = 0.0003 / 0.0007 = 0.43$
Since the measured Q (0.43) is less than the table value for 4 results at 90% confidence (0.76), the value of 0.1019 cannot be rejected.
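The four Dixon steps can be sketched in a few lines. The critical values below are the commonly tabulated 90% Q values for 3 to 7 results (the example rounds the 4-result value of 0.765 to 0.76); check them against your own reference table before relying on them. The function name is illustrative.

```python
# Commonly tabulated Dixon Q critical values at 90% confidence, keyed by number of results
Q_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507}

def dixon_q(values, suspect):
    """Q = gap between the suspect result and its nearest neighbor / range of all results."""
    ordered = sorted(values)
    value_range = ordered[-1] - ordered[0]           # step 1: range
    if value_range == 0:
        raise ValueError("all results are identical; Q is undefined")
    if suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]              # step 2: suspect is the highest result
    elif suspect == ordered[0]:
        gap = ordered[1] - ordered[0]                # step 2: suspect is the lowest result
    else:
        raise ValueError("the suspect value must be the highest or lowest result")
    return gap / value_range                         # step 3

results = [0.1014, 0.1012, 0.1019, 0.1016]
q = dixon_q(results, 0.1019)
critical = Q_90[len(results)]                        # step 4: compare with the table
print(f"Q = {q:.2f}, Q_crit = {critical}, reject: {q > critical}")
```

The test only applies to the highest or lowest result, which is why the function refuses a suspect value from the middle of the data.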
Control Charts