Data Analysis I Anthony E. Butterfield CH EN 4903-1 "When a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason." ~ Thucydides (460 – 395 BC)

Upload: decima

Post on 24-Feb-2016

Page 1: Data Analysis I

Data Analysis I

Anthony E. Butterfield
CH EN 4903-1

"When a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason."

~ Thucydides (460 – 395 BC)

Page 2: Data Analysis I

Data Analysis

• Reasons for data analysis using our π data.
• Basics of data analysis.
  – Statistics.
  – Probability distributions.
• Confidence Intervals.
• Error Propagation.
• Rejecting data.
• Hypothesis Testing.
• Fitting data.

http://www.che.utah.edu/~geoff/writing/index.html

Page 3: Data Analysis I

Analysis of Our π Experiment

• Hypothesis: Stuff that looks like a circle is a circle.

• We have our data… What now? Is the hypothesis true?

Object Name      Width         Perimeter
Battery           4.4 ± 0.1    14.0 ± 0.1
Scotch Tape       2.6 ± 0.0     8.2 ± 0.0
Duct Tape         5.3 ± 0.1    16.8 ± 0.3
Floppy            6.3 ± 0.1    19.0 ± 1.0
Fitting           8.8 ± 0.0    27.7 ± 0.2
Gold Doubloon     3.5 ± 0.0    10.7 ± 0.2
Red Cap           4.1 ± 0.5    12.9 ± 1.0
White Cap         4.0 ± 0.0    12.5 ± 0.0
Black Cap         7.8 ± 0.0    24.6 ± 0.0
Soup Can          6.7 ± 0.5    21.3 ± 0.1
Frisbee           8.8 ± 0.1    27.8 ± 0.5
Poker Chip       27.0 ± 0.5    85.0 ± 1.0
Toy Wheel         5.6 ± 0.1    17.1 ± 0.2
Spool of Wire    25.9 ± 0.1    81.5 ± 0.1
Plastic Cup       9.8 ± 0.0    31.4 ± 0.0
Paper Cup         2.9 ± 0.0     9.3 ± 0.0

Page 4: Data Analysis I

Results from Our π Experiment

• Good news: The average “π” we found is pretty close to π.

• But is it close enough?

• Other issues: Precision, accuracy, types of error?
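As a rough sketch of how the average “π” can be obtained from the measurements, the Python below divides each perimeter by its width and averages the ratios. The width and perimeter values are copied from the data table; the names `measurements`, `pi_estimates`, and `mean_pi` are illustrative and not from the slides. Because the table values are rounded, the result may differ slightly from the average quoted later in the deck.

```python
import math

# (width, perimeter) pairs copied from the table of measured objects.
measurements = {
    "Battery": (4.4, 14.0),
    "Scotch Tape": (2.6, 8.2),
    "Duct Tape": (5.3, 16.8),
    "Floppy": (6.3, 19.0),
    "Fitting": (8.8, 27.7),
    "Gold Doubloon": (3.5, 10.7),
    "Red Cap": (4.1, 12.9),
    "White Cap": (4.0, 12.5),
    "Black Cap": (7.8, 24.6),
    "Soup Can": (6.7, 21.3),
    "Frisbee": (8.8, 27.8),
    "Poker Chip": (27.0, 85.0),
    "Toy Wheel": (5.6, 17.1),
    "Spool of Wire": (25.9, 81.5),
    "Plastic Cup": (9.8, 31.4),
    "Paper Cup": (2.9, 9.3),
}


def pi_estimates(data):
    """Return perimeter / width for every object (one estimate of pi each)."""
    return {name: perimeter / width for name, (width, perimeter) in data.items()}


estimates = pi_estimates(measurements)
mean_pi = sum(estimates.values()) / len(estimates)
print(f"mean estimate = {mean_pi:.4f}, math.pi = {math.pi:.4f}")
```

Whether the gap between the two printed numbers is "close enough" is exactly the question the rest of the deck addresses.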

Page 5: Data Analysis I

For or Against

“π” Does Not ≈ π

• Confidence in our hypothesis is diminished.

• Going against robust “theory”: Check methods, calculations, take more data…

• Good luck publishing…

“π” ≈ π

• Confidence in our hypothesis is increased.

• Nothing is “proven”.

• Publish results: A. E. Butterfield, et al., “The Circularity of Circular Looking Stuff”, Nature, 2009.

Page 6: Data Analysis I

Data Analysis, Big Picture

• We need an objective means to avoid Thucydides’ criticism and to decide impartially whether our data support or undermine our favored hypothesis.

• "The method of science, as stodgy and grumpy as it may seem, is far more important than the findings of science." ~ Carl Sagan, The Demon Haunted World

Page 7: Data Analysis I

Types of Data Analysis

• Quality vs Quantity
  – Quantitative
    • “The temperature is 45.2 ± 0.1 °C (95% CL).”
  – Semi-Quantitative
    • “The temperature is above 0 °C.”
  – Qualitative
    • “It’s hot.”

• Structural Analysis – What is its structure?
• Content Analysis – What is in it?
• Distribution Analysis – Where is it?
• Process Analysis – When does it occur?

Page 8: Data Analysis I

Basics of Statistics

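• Mean: $\mu = \bar{x} = E(x) = \frac{1}{n}\sum_{i=1}^{n} x_i$

• Deviation: $d_i = x_i - \mu$

• Standard Deviation: $\sigma = \sqrt{E\left[(x-\mu)^2\right]} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$

• Variance: $\mathrm{Var}(x) = \sigma^2 = E\left[(x-\mu)^2\right]$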
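The four definitions on this slide translate almost directly into code. The sketch below assumes the 1/n (population) form of the standard deviation; the function names and the eight-value `sample` list are made up for illustration and are not data from the experiment.

```python
import math


def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def deviations(xs):
    """Deviation of every value from the mean: d_i = x_i - mean."""
    mu = mean(xs)
    return [x - mu for x in xs]


def variance(xs):
    """Variance in the 1/n form: the mean of the squared deviations."""
    return sum(d * d for d in deviations(xs)) / len(xs)


def std_dev(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs))


# Hypothetical sample, chosen so the results are round numbers.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(sample), variance(sample), std_dev(sample))
```

For small samples, many texts divide by n - 1 instead of n; Python's `statistics.pstdev` and `statistics.stdev` implement the two conventions respectively.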

Page 9: Data Analysis I

Discrete Probability Distributions

• Random variable x can take on n different values, x1, x2, …, xn, with probabilities of P1, P2, …, Pn, respectively, where $\sum_{i=1}^{n} P_i = 100\%$.

• Examples:

Page 10: Data Analysis I

Continuous Probability Distributions

• A probability density function that describes the probability that a continuous variable will fall within a particular range, where $\int_{-\infty}^{\infty} P\,dx = 100\%$.

• Examples:

Page 11: Data Analysis I

Central Limit Theorem

• The sum of a sufficiently large number of independent and identically distributed random variables (with finite variance) is approximately normally distributed, regardless of the original distribution.
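A quick simulation can make the theorem concrete. The sketch below sums uniform random numbers (clearly not normal on their own) and measures the fraction of sums that fall within one standard deviation of the mean. A single uniform variable gives about 58%, while a normal distribution gives about 68%. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random
import statistics


def fraction_within_one_sigma(n_terms, n_trials=20000, seed=0):
    """Sum n_terms uniform(0, 1) draws, n_trials times, and return the
    fraction of those sums lying within one standard deviation of their mean."""
    rng = random.Random(seed)
    sums = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_trials)]
    mu = statistics.fmean(sums)
    sigma = statistics.pstdev(sums)
    inside = sum(1 for s in sums if abs(s - mu) <= sigma)
    return inside / n_trials


single = fraction_within_one_sigma(1)   # one uniform variable
many = fraction_within_one_sigma(30)    # sum of 30 uniform variables
print(f"1 term: {single:.3f}   30 terms: {many:.3f}")
```

The one-sigma fraction is only one convenient fingerprint of normality; a histogram of `sums` would show the bell shape emerging directly.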

Page 12: Data Analysis I

Normal Distribution

• AKA: Gaussian distribution, bell curve.
• One of the most common distributions in nature and, therefore, data analysis.
• Probability density function (PDF): $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

Page 13: Data Analysis I

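Normal Cumulative Distribution Function

• Integrate PDF from -∞ to x.

• The probability that a value will be below x.

$\mathrm{CDF}(x) = \int_{-\infty}^{x} \mathrm{PDF}\,dx = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$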

Page 14: Data Analysis I

Normal Examples

a) What is the probability, with μ = 0 and σ = 1, of the measurement being exactly 0? Answer: 0%

b) What is the probability of measuring a value between -0.5 and 1.5, with μ = 0 and σ = 1? Answer: 62%

c) Between -1 and 3, with μ = 0 and σ = 2? Answer: 62%
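These three answers can be checked with the error-function form of the normal CDF. The sketch below uses only `math.erf` from the Python standard library; `normal_cdf` and `prob_between` are illustrative helper names, not part of the slides.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability that a normal variable falls below x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_between(lo, hi, mu=0.0, sigma=1.0):
    """Probability of a value between lo and hi: CDF(hi) - CDF(lo)."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


p_a = prob_between(0.0, 0.0)                       # exactly 0: zero-width range
p_b = prob_between(-0.5, 1.5, mu=0.0, sigma=1.0)
p_c = prob_between(-1.0, 3.0, mu=0.0, sigma=2.0)
print(f"a) {p_a:.0%}  b) {p_b:.0%}  c) {p_c:.0%}")
```

Cases b) and c) agree because both ranges run from 0.5 standard deviations below the mean to 1.5 above it.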

Page 15: Data Analysis I

An Abnormal Distribution

• Log-normal Distribution
  – Used when random variables multiply.
  – Particles often take this distribution.

$p(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)$
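A small sketch may help connect the log-normal density to the normal one: if ln x is normally distributed, the density of x is the normal density evaluated at ln x, divided by x. The code below writes both densities out and checks that the log-normal one integrates to roughly 1. The parameter values (μ = 0, σ = 0.5) and the crude midpoint integration are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal probability density."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density for x > 0 (mu, sigma describe ln x)."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma**2)) / (
        x * sigma * math.sqrt(2.0 * math.pi)
    )


def midpoint_area(pdf, lo, hi, steps):
    """Crude midpoint-rule integral of pdf from lo to hi."""
    h = (hi - lo) / steps
    return sum(pdf(lo + (k + 0.5) * h) for k in range(steps)) * h


# Hypothetical parameters; the upper limit 50 is far out in the tail.
area = midpoint_area(lambda x: lognormal_pdf(x, 0.0, 0.5), 0.0, 50.0, 50_000)
print(f"area under the log-normal PDF = {area:.4f}")
```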

Page 16: Data Analysis I

Normal Confidence Intervals

• A range within which a parameter lies with a given probability (the confidence level, C.L.).
• Confidence intervals for normal distributions: $\mathrm{C.I.} = \sigma\sqrt{2}\,\operatorname{erf}^{-1}(\mathrm{C.L.})$
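The Python standard library has no inverse error function, so the sketch below uses an equivalent route: √2·erf⁻¹(C.L.) equals the standard-normal quantile at (1 + C.L.)/2, which `statistics.NormalDist().inv_cdf` provides (Python 3.8+). The function name `ci_half_width` is illustrative.

```python
from statistics import NormalDist


def ci_half_width(sigma, confidence_level):
    """Half-width of a normal confidence interval.

    Equivalent to sigma * sqrt(2) * erfinv(confidence_level), computed via the
    standard-normal quantile at (1 + confidence_level) / 2.
    """
    z = NormalDist().inv_cdf((1.0 + confidence_level) / 2.0)
    return sigma * z


for cl in (0.6827, 0.95, 0.99):
    print(f"C.L. = {cl:.2%}: C.I. = ±{ci_half_width(1.0, cl):.3f} sigma")
```

Where SciPy is available, `scipy.special.erfinv` would allow the slide's formula to be typed in directly.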

Page 17: Data Analysis I

C.I. for Single Measurements

• Gauges / Rulers.
  – Estimated by the distinguishable increments.
  – In our π experiment?
• Digital readouts.
  – Often ± the smallest digital precision available.
• Fluctuating values.
  – Use the range of fluctuation over an appropriate amount of time.

1.27±0.02 (1.6%)

3.11±0.01 (0.3%)

2.41±0.01 (0.4%)

Page 18: Data Analysis I

Error Propagation

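• For addition or subtraction, intuition may be: $\sigma_{x \pm y} = \sigma_x + \sigma_y$

• But it is unlikely the extremes of error will occur twice: $\sigma_{x \pm y} = \left(\sigma_x^2 + \sigma_y^2\right)^{1/2}$

• Multiplication or division
  – Our π data: $\frac{\sigma_{xy}}{xy} = \left[\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2\right]^{1/2}$

• In general: $G = f(x_1, x_2, x_3, \ldots, x_n)$, $\sigma_G^2 = \sum_{i=1}^{n}\left(\frac{\partial G}{\partial x_i}\right)^2 \sigma_{x_i}^2$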
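As a sketch of the quadrature rules, the code below propagates the width and perimeter uncertainties of one object (the battery, 4.4 ± 0.1 and 14.0 ± 0.1 in the data table) into the ratio perimeter/width. The helper names are illustrative; `math.hypot(a, b)` is simply sqrt(a² + b²).

```python
import math


def add_sub_sigma(sigma_x, sigma_y):
    """Uncertainty of x + y or x - y: absolute errors added in quadrature."""
    return math.hypot(sigma_x, sigma_y)


def mul_div_sigma(value, x, sigma_x, y, sigma_y):
    """Uncertainty of value = x * y or x / y: relative errors added in quadrature."""
    return abs(value) * math.hypot(sigma_x / x, sigma_y / y)


# Battery row of the data table.
width, sigma_w = 4.4, 0.1
perimeter, sigma_p = 14.0, 0.1

ratio = perimeter / width
sigma_ratio = mul_div_sigma(ratio, perimeter, sigma_p, width, sigma_w)
print(f"perimeter / width = {ratio:.2f} ± {sigma_ratio:.2f}")
```

Here the width dominates the result: its relative error (0.1/4.4) is roughly three times that of the perimeter (0.1/14.0).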

Page 19: Data Analysis I

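Some Examples

• Calculate interfacial tension between a liquid and a solid:

$\gamma_{SL} = \gamma_{SV} - \gamma_{LV}\cos\theta$

$\sigma_{SL} = \left(\sigma_{SV}^2 + \cos^2\theta\,\sigma_{LV}^2 + \gamma_{LV}^2\sin^2\theta\,\sigma_\theta^2\right)^{1/2}$

• If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v?

$v = \frac{RT}{P}$

$\sigma_v^2 = \left(\frac{\partial v}{\partial T}\right)^2\sigma_T^2 + \left(\frac{\partial v}{\partial P}\right)^2\sigma_P^2 = \left(\frac{R}{P}\right)^2\sigma_T^2 + \left(\frac{RT}{P^2}\right)^2\sigma_P^2$

$\sigma_v^2 = \left(\frac{0.008314 \cdot 1}{101}\right)^2 + \left(\frac{0.008314 \cdot 298.15 \cdot 2}{101^2}\right)^2 = 6.78\times10^{-9} + 2.36\times10^{-7}$

The first term is T’s contribution; the second is P’s contribution.

$v = 2.45\times10^{-2} \pm 4.9\times10^{-4}\ \mathrm{m^3/mol}$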

Page 20: Data Analysis I

A Better, Numerical Method

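$f_0 = f(x_1, x_2, x_3, \ldots, x_n)$

$f_i = f(x_1, x_2, x_3, \ldots, x_{i-1}, x_i + \sigma_i, x_{i+1}, \ldots, x_n)$

$\sigma_f^2 = \sum_{i=1}^{n}\left(f_i - f_0\right)^2$

• Can be used for problems which are solved numerically.

• May add or subtract σ_i and get different results.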

Page 21: Data Analysis I

An Example

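• If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v of an ideal gas?

$f_0 = \frac{0.008314 \cdot 298.15}{101} = 0.0245$

$f_1 = \frac{0.008314 \cdot (298.15 + 1)}{101} = 0.0246$

$f_2 = \frac{0.008314 \cdot 298.15}{101 + 2} = 0.0241$

$\sigma_v^2 = (f_1 - f_0)^2 + (f_2 - f_0)^2 = \left(8.2317\times10^{-5}\right)^2 + \left(-4.7656\times10^{-4}\right)^2, \quad \sigma_v = 4.8\times10^{-4}\ \mathrm{m^3/mol}$

• P is the biggest source of error.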
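The numerical method is short enough to sketch in full. The code below shifts each input by its uncertainty in turn, records the change in the result, and adds the changes in quadrature. `propagate_numerically` and `molar_volume` are illustrative names, and the units of R (m³·kPa/(mol·K)) are an assumption inferred from the m³/mol result on the slide.

```python
import math


def propagate_numerically(f, values, sigmas):
    """Return (f0, sigma_f, contributions), where contributions[i] = f_i - f0
    and f_i is f evaluated with only input i shifted by +sigma_i."""
    f0 = f(*values)
    contributions = []
    for i, sigma in enumerate(sigmas):
        shifted = list(values)
        shifted[i] += sigma            # perturb one input, leave the rest alone
        contributions.append(f(*shifted) - f0)
    sigma_f = math.sqrt(sum(c * c for c in contributions))
    return f0, sigma_f, contributions


R = 0.008314  # gas constant; units assumed to be m^3 kPa / (mol K)


def molar_volume(T, P):
    """Ideal-gas molar volume, v = R T / P."""
    return R * T / P


# T = 25 ± 1 °C (298.15 K), P = 101 ± 2 kPa
v, sigma_v, (dv_T, dv_P) = propagate_numerically(molar_volume, [298.15, 101.0], [1.0, 2.0])
print(f"v = {v:.4f} ± {sigma_v:.1e} m^3/mol  (T: {dv_T:.4e}, P: {dv_P:.4e})")
```

Passing negative values in `sigmas` corresponds to subtracting rather than adding each σ_i, which for a nonlinear f gives a slightly different answer.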

Page 22: Data Analysis I

Chauvenet’s Criterion

• A statistically justifiable means of rejecting outlying data may be desired (illegitimate error).

• To reject a datum, the probability of obtaining a measurement that far from the mean on a normal distribution, multiplied by the number of measurements, must be less than 50%:

$\frac{|x - \mu|}{\sigma} > \sqrt{2}\,\operatorname{erf}^{-1}\left(1 - \frac{1}{2n}\right)$

• Tossing data out is suspect, though; avoid it.

Page 23: Data Analysis I

Example of Chauvenet’s Criterion

• Data from our circle experiment (n = 16): $\sqrt{2}\,\operatorname{erf}^{-1}\left(1 - \frac{1}{2 \cdot 16}\right) = 2.15$

• We could toss the “floppy” datum.

• Doing so would make our average π = 3.1465, versus 3.138.

• Further from π.
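A sketch of the criterion applied to the circle data follows. The threshold √2·erf⁻¹(1 − 1/(2n)) is computed as the standard-normal quantile at 1 − 1/(4n), which is the same number. The ratios are recomputed from the rounded widths and perimeters in the data table, and the 1/n form of the standard deviation (`pstdev`) is assumed; the function names are illustrative.

```python
from statistics import NormalDist, fmean, pstdev


def chauvenet_threshold(n):
    """Largest allowed |x - mean| / sigma for n measurements.

    Equals sqrt(2) * erfinv(1 - 1/(2n)), i.e. the standard-normal
    quantile at 1 - 1/(4n).
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))


def chauvenet_outliers(named_values):
    """Names of the data whose deviation exceeds the Chauvenet threshold."""
    values = list(named_values.values())
    mu, sigma = fmean(values), pstdev(values)
    limit = chauvenet_threshold(len(values))
    return [name for name, x in named_values.items() if abs(x - mu) / sigma > limit]


# perimeter / width for each object, from the rounded table values.
pi_ratios = {
    "Battery": 14.0 / 4.4, "Scotch Tape": 8.2 / 2.6, "Duct Tape": 16.8 / 5.3,
    "Floppy": 19.0 / 6.3, "Fitting": 27.7 / 8.8, "Gold Doubloon": 10.7 / 3.5,
    "Red Cap": 12.9 / 4.1, "White Cap": 12.5 / 4.0, "Black Cap": 24.6 / 7.8,
    "Soup Can": 21.3 / 6.7, "Frisbee": 27.8 / 8.8, "Poker Chip": 85.0 / 27.0,
    "Toy Wheel": 17.1 / 5.6, "Spool of Wire": 81.5 / 25.9,
    "Plastic Cup": 31.4 / 9.8, "Paper Cup": 9.3 / 2.9,
}

print(f"threshold for n = 16: {chauvenet_threshold(16):.2f} sigma")
print("candidates for rejection:", chauvenet_outliers(pi_ratios))
```

The function only flags candidates; whether to discard them remains a judgment call, as the slide cautions.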

Page 24: Data Analysis I