![Page 1: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/1.jpg)
Data Analysis I
Anthony E. Butterfield
CH EN 4903-1
"When a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason."
~ Thucydides (460 – 395 BC)
![Page 2: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/2.jpg)
Data Analysis
• Reasons for data analysis using our π data.
• Basics of data analysis.
– Statistics.
– Probability distributions.
• Confidence Intervals.
• Error Propagation.
• Rejecting data.
• Hypothesis Testing.
• Fitting data.
http://www.che.utah.edu/~geoff/writing/index.html
![Page 3: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/3.jpg)
Analysis of Our π Experiment
• Hypothesis: Stuff that looks like a circle is a circle.
• We have our data… What now? Is the hypothesis true?

| Object Name | Width | Perimeter |
|---|---|---|
| Battery | 4.4 ± 0.1 | 14.0 ± 0.1 |
| Scotch Tape | 2.6 ± 0.0 | 8.2 ± 0.0 |
| Duct Tape | 5.3 ± 0.1 | 16.8 ± 0.3 |
| Floppy | 6.3 ± 0.1 | 19.0 ± 1.0 |
| Fitting | 8.8 ± 0.0 | 27.7 ± 0.2 |
| Gold Doubloon | 3.5 ± 0.0 | 10.7 ± 0.2 |
| Red Cap | 4.1 ± 0.5 | 12.9 ± 1.0 |
| White Cap | 4.0 ± 0.0 | 12.5 ± 0.0 |
| Black Cap | 7.8 ± 0.0 | 24.6 ± 0.0 |
| Soup Can | 6.7 ± 0.5 | 21.3 ± 0.1 |
| Frisbee | 8.8 ± 0.1 | 27.8 ± 0.5 |
| Poker Chip | 27.0 ± 0.5 | 85.0 ± 1.0 |
| Toy Wheel | 5.6 ± 0.1 | 17.1 ± 0.2 |
| Spool of Wire | 25.9 ± 0.1 | 81.5 ± 0.1 |
| Plastic Cup | 9.8 ± 0.0 | 31.4 ± 0.0 |
| Paper Cup | 2.9 ± 0.0 | 9.3 ± 0.0 |
![Page 4: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/4.jpg)
Results from Our π Experiment
• Good news: The average “π” we found is pretty close to π.
• But is it close enough?
• Other issues: Precision, accuracy, types of error?
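A minimal sketch of how the average “π” could be computed from the measurement table, assuming each estimate is simply perimeter divided by width. The names `measurements`, `estimates`, and `mean_pi` are illustrative, and the table values are rounded, so the result may differ slightly from the figure quoted on the slides.

```python
import math

# (width, perimeter) pairs copied from the measurement table
measurements = {
    "Battery": (4.4, 14.0), "Scotch Tape": (2.6, 8.2),
    "Duct Tape": (5.3, 16.8), "Floppy": (6.3, 19.0),
    "Fitting": (8.8, 27.7), "Gold Doubloon": (3.5, 10.7),
    "Red Cap": (4.1, 12.9), "White Cap": (4.0, 12.5),
    "Black Cap": (7.8, 24.6), "Soup Can": (6.7, 21.3),
    "Frisbee": (8.8, 27.8), "Poker Chip": (27.0, 85.0),
    "Toy Wheel": (5.6, 17.1), "Spool of Wire": (25.9, 81.5),
    "Plastic Cup": (9.8, 31.4), "Paper Cup": (2.9, 9.3),
}

# One "π" estimate per object: perimeter / width
estimates = {name: p / w for name, (w, p) in measurements.items()}

# Average estimate and its distance from the true value
mean_pi = sum(estimates.values()) / len(estimates)
error = abs(mean_pi - math.pi)

print(f"average estimate = {mean_pi:.4f}, off by {error:.4f}")
```

Whether that difference is "close enough" is the question the rest of the deck addresses.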
![Page 5: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/5.jpg)
For or Against
“π” Does Not ≈ π
• Confidence in our hypothesis is diminished.
• Going against robust “theory”: Check methods, calculations, take more data…
• Good luck publishing…
“π” ≈ π
• Confidence in our hypothesis is increased.
• Nothing is “proven”.
• Publish results: A.E. Butterfield, et al., “The Circularity of Circular Looking Stuff”, Nature, 2009.
![Page 6: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/6.jpg)
Data Analysis, Big Picture
• We need an objective means to avoid Thucydides' criticism and to decide impartially whether our data support or undermine our favored hypothesis.
• "The method of science, as stodgy and grumpy as it may seem, is far more important than the findings of science." ~ Carl Sagan, The Demon-Haunted World
![Page 7: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/7.jpg)
Types of Data Analysis
• Quality vs Quantity
– Quantitative
  • “The temperature is 45.2 ± 0.1 °C (95% CL).”
– Semi-Quantitative
  • “The temperature is above 0 °C.”
– Qualitative
  • “It’s hot.”
• Structural Analysis – What is its structure?
• Content Analysis – What is in it?
• Distribution Analysis – Where is it?
• Process Analysis – When does it occur?
![Page 8: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/8.jpg)
Basics of Statistics
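• Mean:
$$\mu = \bar{x} = E[x] = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Deviation:
$$d_i = x_i - \mu$$
• Standard Deviation:
$$\sigma = \sqrt{E\left[(x-\mu)^2\right]} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$$
• Variance:
$$\mathrm{Var}(x) = E\left[(x-\mu)^2\right] = \sigma^2$$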
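As a rough illustration, the four quantities on this slide can be computed directly from a list of numbers. The sketch below assumes the 1/n (population) form of the variance; the function names and the sample list `widths` (the first four widths from the π experiment) are chosen only for illustration.

```python
import math

def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def deviations(xs):
    """Deviation of each value from the mean: d_i = x_i - mean."""
    m = mean(xs)
    return [x - m for x in xs]

def variance(xs):
    """Mean of the squared deviations (1/n form)."""
    return mean([d * d for d in deviations(xs)])

def std_dev(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))

widths = [4.4, 2.6, 5.3, 6.3]
print(mean(widths), variance(widths), std_dev(widths))
```

For small samples the n − 1 (sample) form is often preferred; Python's `statistics.stdev` uses that form, while `statistics.pstdev` matches the 1/n form used here.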
![Page 9: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/9.jpg)
Discrete Probability Distributions
• Random variable x can take on n different values, x1, x2, …, xn, with probabilities of P1, P2, …, Pn, respectively.
$$\sum_{i=1}^{n} P_i = 100\%$$
[Figure: example discrete probability distributions]
![Page 10: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/10.jpg)
Continuous Probability Distributions
• A probability density function describes the probability that a continuous variable will fall within a particular range.
$$\int_{-\infty}^{\infty} P\,dx = 100\%$$
[Figure: example continuous probability distributions]
![Page 11: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/11.jpg)
Central Limit Theorem
• The sum of a sufficiently large number of independent and identically distributed random variables (with finite variance) is approximately normally distributed, regardless of the original distribution.
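One way to see the theorem at work is a small simulation. The sketch below sums uniform random numbers, which are far from normal individually, and checks how many sums fall within one standard deviation of the mean; for a normal distribution that fraction is about 68%. The number of terms, the number of trials, and the seed are arbitrary choices.

```python
import random
import statistics

def sum_of_uniforms(n_terms, rng):
    """Sum n_terms independent uniform(0, 1) random numbers."""
    return sum(rng.random() for _ in range(n_terms))

rng = random.Random(0)   # fixed seed so the run is repeatable
n_terms = 30             # terms in each sum
n_trials = 20000         # number of sums to generate

sums = [sum_of_uniforms(n_terms, rng) for _ in range(n_trials)]
mu = statistics.mean(sums)
sigma = statistics.pstdev(sums)

# Fraction of sums within one standard deviation of the mean
within_one_sigma = sum(abs(s - mu) <= sigma for s in sums) / n_trials
print(f"mean={mu:.3f}, sigma={sigma:.3f}, within 1 sigma={within_one_sigma:.3f}")
```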
![Page 12: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/12.jpg)
Normal Distribution
• AKA: Gaussian distribution, bell curve.
• One of the most common distributions in nature and, therefore, data analysis.
• Probability density function (PDF):
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
![Page 13: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/13.jpg)
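Normal Cumulative Distribution Function
• Integrate the PDF from −∞ to x.
• The probability that a value will be below x.
$$\mathrm{CDF}(x) = \int_{-\infty}^{x} \mathrm{PDF}\,dx = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$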
![Page 14: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/14.jpg)
Normal Examples
a) What is the probability, with μ = 0 and σ = 1, of the measurement being exactly 0? 0%
b) What is the probability of measuring a value between −0.5 and 1.5, with μ = 0 and σ = 1? 62%
c) Between −1 and 3, with μ = 0 and σ = 2? 62%
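These three answers can be checked with the error-function form of the normal CDF. The sketch below uses only Python's standard `math.erf`; the helper names `normal_cdf` and `prob_between` are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability that a normal variable falls below x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(lo, hi, mu=0.0, sigma=1.0):
    """Probability of a value between lo and hi: CDF(hi) - CDF(lo)."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

p_a = prob_between(0.0, 0.0, mu=0.0, sigma=1.0)    # a single exact value
p_b = prob_between(-0.5, 1.5, mu=0.0, sigma=1.0)
p_c = prob_between(-1.0, 3.0, mu=0.0, sigma=2.0)

print(f"a) {p_a:.0%}  b) {p_b:.0%}  c) {p_c:.0%}")
```

Cases b) and c) agree because both ranges run from 0.5σ below the mean to 1.5σ above it.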
![Page 15: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/15.jpg)
An Abnormal Distribution
• Log-normal Distribution
– Used when random variables multiply.
– Particle sizes often follow this distribution.
$$p(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)$$
![Page 16: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/16.jpg)
Normal Confidence Intervals
• A range within which a parameter lies with a given probability, the confidence level (C.L.).
• Confidence intervals for normal distributions:
$$\text{C.I.} = \mu \pm \sigma\sqrt{2}\,\operatorname{erf}^{-1}(\text{C.L.})$$
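A minimal sketch of the half-width calculation. Python's standard library has no inverse error function, so this assumes the equivalent identity √2·erf⁻¹(C.L.) = Φ⁻¹((1 + C.L.)/2), where Φ⁻¹ is the inverse of the standard normal CDF, available as `statistics.NormalDist().inv_cdf`. The function name `confidence_half_width` is illustrative.

```python
from statistics import NormalDist

def confidence_half_width(sigma, confidence_level):
    """Half-width of a two-sided normal confidence interval.

    Equivalent to sigma * sqrt(2) * erfinv(confidence_level).
    """
    z = NormalDist().inv_cdf((1.0 + confidence_level) / 2.0)
    return z * sigma

# Interval for mu = 45.2, sigma = 0.05 at a 95% confidence level
mu, sigma = 45.2, 0.05
half_width = confidence_half_width(sigma, 0.95)
print(f"{mu} ± {half_width:.2f} (95% CL)")
```

With σ = 1, a 95% confidence level gives a half-width of about 1.96, and about 68.3% gives 1.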
![Page 17: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/17.jpg)
C.I. for Single Measurements
• Gauges / Rulers.
– Estimated by the distinguishable increments.
– In our π experiment?
• Digital readouts.
– Often ± the smallest digital precision available.
• Fluctuating values.
– Use the range of fluctuation over an appropriate amount of time.
1.27±0.02 (1.6%)
3.11±0.01 (0.3%)
2.41±0.01 (0.4%)
![Page 18: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/18.jpg)
Error Propagation
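• For addition or subtraction, intuition may suggest:
$$\Delta(x \pm y) = \Delta x + \Delta y$$
• But it is unlikely that the extremes of error will occur twice:
$$\Delta(x \pm y) = \left(\Delta x^2 + \Delta y^2\right)^{1/2}$$
• Multiplication or division
– Our π data:
$$\Delta(xy) = xy\left[\left(\frac{\Delta x}{x}\right)^2 + \left(\frac{\Delta y}{y}\right)^2\right]^{1/2}$$
• In general:
$$G = f(x_1, x_2, x_3, \ldots, x_n), \qquad \Delta G^2 = \sum_{i=1}^{n}\left(\frac{\partial G}{\partial x_i}\,\Delta x_i\right)^2$$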
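The quadrature rules above are short enough to sketch directly. The example applies the multiplication/division rule to one row of the π data (the battery: width 4.4 ± 0.1, perimeter 14.0 ± 0.1); the function names are illustrative, and the inputs are assumed to have independent errors.

```python
import math

def add_sub_error(dx, dy):
    """Uncertainty of x + y or x - y: errors added in quadrature."""
    return math.hypot(dx, dy)

def mul_div_error(value, x, dx, y, dy):
    """Uncertainty of value = x*y or x/y: relative errors in quadrature."""
    return abs(value) * math.hypot(dx / x, dy / y)

# Battery row from the measurement table
width, d_width = 4.4, 0.1
perimeter, d_perimeter = 14.0, 0.1

pi_est = perimeter / width
d_pi = mul_div_error(pi_est, perimeter, d_perimeter, width, d_width)
print(f"pi estimate = {pi_est:.2f} ± {d_pi:.2f}")
```

Here the width's relative error (about 2.3%) dominates the perimeter's (about 0.7%), so the estimate comes out near 3.18 ± 0.08.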
![Page 19: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/19.jpg)
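Some Examples
• Calculate interfacial tension between a liquid and a solid:
$$\gamma_{SL} = \gamma_{SV} - \gamma_{LV}\cos\theta$$
$$\Delta\gamma_{SL} = \left[\Delta\gamma_{SV}^2 + \left(\cos\theta\,\Delta\gamma_{LV}\right)^2 + \left(\gamma_{LV}\sin\theta\,\Delta\theta\right)^2\right]^{1/2}$$
• If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v?
$$v = \frac{RT}{P}, \qquad \Delta v^2 = \left(\frac{\partial v}{\partial T}\Delta T\right)^2 + \left(\frac{\partial v}{\partial P}\Delta P\right)^2 = \left(\frac{R}{P}\Delta T\right)^2 + \left(\frac{RT}{P^2}\Delta P\right)^2$$
$$\Delta v^2 = \left(\frac{0.008314 \cdot 1}{101}\right)^2 + \left(\frac{0.008314 \cdot 298.15}{101^2}\cdot 2\right)^2 = 6.78\times10^{-9} + 2.36\times10^{-7}$$
$$v = 2.45\times10^{-2} \pm 4.9\times10^{-4}\ \mathrm{m^3/mol}$$
T’s contribution: 6.78e-9
P’s contribution: 2.36e-7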
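The ideal-gas arithmetic can be reproduced in a few lines. This sketch assumes R = 0.008314 kPa·m³/(mol·K), T in kelvin, and P in kPa, as in the numbers on the slide; the variable names are illustrative.

```python
import math

R = 0.008314          # gas constant, kPa·m^3/(mol·K)
T, dT = 298.15, 1.0   # temperature and its uncertainty, K
P, dP = 101.0, 2.0    # pressure and its uncertainty, kPa

v = R * T / P                       # molar volume, m^3/mol

# Squared contributions from each partial derivative
t_term = (R / P * dT) ** 2          # (dv/dT * dT)^2
p_term = (R * T / P**2 * dP) ** 2   # (dv/dP * dP)^2

dv = math.sqrt(t_term + p_term)
print(f"v = {v:.4f} ± {dv:.1e} m^3/mol")
print(f"T term = {t_term:.2e}, P term = {p_term:.2e}")
```

Comparing the two squared terms shows directly which measurement limits the result.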
![Page 20: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/20.jpg)
A Better, Numerical Method
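$$f_0 = f(x_1, x_2, x_3, \ldots, x_n)$$
$$f_i = f(x_1, \ldots, x_{i-1},\, x_i + \Delta x_i,\, x_{i+1}, \ldots, x_n)$$
$$\Delta f = \sqrt{\sum_{i=1}^{n}\left(f_i - f_0\right)^2}$$
• Can be used for problems which are solved numerically.
• May add or subtract Δx_i and get different results.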
![Page 21: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/21.jpg)
An Example
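• If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v of an ideal gas?
$$f_0 = \frac{0.008314 \cdot 298.15}{101} = 0.0245$$
$$f_1 = \frac{0.008314 \cdot (298.15 + 1)}{101} = 0.0246$$
$$f_2 = \frac{0.008314 \cdot 298.15}{101 + 2} = 0.0241$$
$$f_1 - f_0 = 8.2317\times10^{-5}, \qquad f_2 - f_0 = -4.7656\times10^{-4}, \qquad \Delta v = 4.84\times10^{-4}\ \mathrm{m^3/mol}$$
• P is the biggest source of error.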
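A sketch of the numerical method as a reusable function: evaluate the function once at the nominal values, once more with each input shifted by its uncertainty, and combine the shifts in quadrature. The names `propagate_numerically` and `molar_volume` are illustrative, and each input is shifted upward only (shifting downward gives slightly different numbers).

```python
import math

def propagate_numerically(f, values, uncertainties):
    """Return (f0, shifts, total) where shifts[i] = f_i - f0."""
    f0 = f(*values)
    shifts = []
    for i, dx in enumerate(uncertainties):
        perturbed = list(values)
        perturbed[i] += dx            # shift only the i-th input
        shifts.append(f(*perturbed) - f0)
    total = math.sqrt(sum(s * s for s in shifts))
    return f0, shifts, total

def molar_volume(T, P, R=0.008314):
    """Ideal-gas molar volume in m^3/mol (T in K, P in kPa)."""
    return R * T / P

v0, shifts, dv = propagate_numerically(
    molar_volume, [298.15, 101.0], [1.0, 2.0]
)
print(f"v = {v0:.4f} ± {dv:.2e} m^3/mol, shifts = {shifts}")
```

Because the function is only ever called, never differentiated, the same helper works for models that have no closed-form derivative.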
![Page 22: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/22.jpg)
Chauvenet’s Criterion
• A statistically justifiable means of rejecting outlying data may be desired (illegitimate error).
• A datum may be rejected if the probability of a measurement deviating that far from the mean on a normal distribution, times the number of measurements n, is less than 50%:
$$\frac{|x - \mu|}{\sigma} > \sqrt{2}\,\operatorname{erf}^{-1}\left(1 - \frac{1}{2n}\right)$$
• Tossing data out is suspect, though; avoid it.
![Page 23: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/23.jpg)
Example of Chauvenet’s Criterion
• Data from our circle experiment (n = 16):
$$\sqrt{2}\,\operatorname{erf}^{-1}\left(1 - \frac{1}{2 \cdot 16}\right) = 2.15$$
• We could toss the “floppy” datum.
• It would make our average “π” = 3.1465, versus 3.138.
• Further from π.
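A sketch of the criterion applied to the perimeter/width ratios from the measurement table. It assumes the 1/n standard deviation (`statistics.pstdev`) and uses the identity √2·erf⁻¹(1 − 1/(2n)) = Φ⁻¹(1 − 1/(4n)) so that only the standard library is needed. The names are illustrative, and because the table values are rounded, the averages may not match the slide's figures exactly.

```python
from statistics import NormalDist, mean, pstdev

def chauvenet_threshold(n):
    """Largest allowed |x - mean| / sigma for n measurements."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))

# (width, perimeter) pairs copied from the measurement table
data = {
    "Battery": (4.4, 14.0), "Scotch Tape": (2.6, 8.2),
    "Duct Tape": (5.3, 16.8), "Floppy": (6.3, 19.0),
    "Fitting": (8.8, 27.7), "Gold Doubloon": (3.5, 10.7),
    "Red Cap": (4.1, 12.9), "White Cap": (4.0, 12.5),
    "Black Cap": (7.8, 24.6), "Soup Can": (6.7, 21.3),
    "Frisbee": (8.8, 27.8), "Poker Chip": (27.0, 85.0),
    "Toy Wheel": (5.6, 17.1), "Spool of Wire": (25.9, 81.5),
    "Plastic Cup": (9.8, 31.4), "Paper Cup": (2.9, 9.3),
}
ratios = {name: p / w for name, (w, p) in data.items()}

mu = mean(ratios.values())
sigma = pstdev(ratios.values())
limit = chauvenet_threshold(len(ratios))

# Names whose ratio lies further from the mean than the limit allows
rejected = [n for n, r in ratios.items() if abs(r - mu) / sigma > limit]
print(f"limit = {limit:.2f}, candidates for rejection: {rejected}")
```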
![Page 24: Data Analysis I](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816649550346895dd9c2a1/html5/thumbnails/24.jpg)