Stat 155, Section 2, Last Time • Reviewed Excel Computation of: – Time Plots (i.e. Time Series) – Histograms • Modelling Distributions: Densities (Areas) • Normal Density Curve (very useful model) • Fitting Normal Densities (using mean and s.d.)

Upload: clinton-stephens

Post on 01-Jan-2016



Stat 155, Section 2, Last Time

• Reviewed Excel Computation of:

– Time Plots (i.e. Time Series)

– Histograms

• Modelling Distributions: Densities (Areas)

• Normal Density Curve (very useful model)

• Fitting Normal Densities

(using mean and s.d.)


Reading In Textbook

Approximate Reading for Today’s Material:

Pages 71-83, 102-112

Approximate Reading for Next Class:

Pages 123-127, 132-145


2 Views of Normal Fitting

1. “Fit Model to Data”

Choose $\mu = \bar{x}$ & $\sigma = s$.

2. “Fit Data to Model”

First Standardize Data

Then use Normal $\mu = 0$, $\sigma = 1$.

Note: same thing, just different rescalings

(choose scale depending on need)


Normal Distribution Notation

The “normal distribution,

with mean $\mu$ & standard deviation $\sigma$”

is abbreviated as:

$N(\mu, \sigma)$


Interpretation of Z-scores

Recall Z-score Idea:

• Transform data $X_1, \dots, X_n$

• By subtracting mean & dividing by s.d.

• To get $Z_i = (X_i - \mu)/\sigma$ (mean 0, s.d. 1)

• Interpret as $X_i = \mu + Z_i \sigma$

• I.e. “$X_i$ is $Z_i$ sd’s above the mean”
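A minimal Python sketch of the standardization step, offered as an illustration rather than as part of the course materials. The scores are made-up data, and the sample mean and sample s.d. stand in for the mean and s.d. (an assumption; the course itself uses Excel).

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize data: subtract the mean, then divide by the s.d."""
    m = mean(data)
    s = stdev(data)  # sample s.d.; an assumption, swap in pstdev if preferred
    return [(x - m) / s for x in data]

scores = [72.0, 85.0, 90.0, 64.0, 99.0]  # hypothetical data, not from the course
z = z_scores(scores)
for x, zi in zip(scores, z):
    # A negative Z-score means the value lies below the mean.
    print(f"{x:5.1f} is {zi:+.2f} sd's from the mean")
```

After the transformation the Z-scores have mean 0 and s.d. 1, whatever the original scale was.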


Interpretation of Z-scores

Same idea for Normal Curves:

Z-scores are on $N(0,1)$ scale,

so use areas to interpret them

Important Areas:

1. Within 1 sd of mean: 68%

“the majority”


Interpretation of Z-scores

2. Within 2 sd of mean: 95%

“really most”

3. Within 3 sd of mean: 99.7%

“almost all”


Interpretation of Z-scores

Interactive Version (used for above pics)

From Publisher’s Website:

http://bcs.whfreeman.com/ips5e/

• Statistical Applets

• Normal Curve


Interpretation of Z-scores

Summary:

These relations are called the

“68 - 95 - 99.7 % Rule”

HW: 1.86 (a: 234-298, b: 234, 298),

1.87


Computation of Normal Areas

Classical Approach: Tables

• See inside covers of text

• Summarizes area computations

• Because can’t use calculus

• Constructed by “computers”

(a job description in the early 1900’s!)


Computation of Normal Areas

EXCEL Computation:

works in terms of “lower areas”

E.g. for $N(1, 0.5)$,

Area < 1.3 is 0.7257


Computation of Normal Areas

Interactive Version (used for above pic)

From Same Publisher’s Website:

http://bcs.whfreeman.com/ips5e/

• Statistical Applets

• Normal Curve


Computation of Normal Areas

EXCEL Computation:

(of above e.g.)

• Use NORMDIST

• Enter parameters

• x is “cutoff point”

• Return is Area below x
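For readers without Excel, the same lower-area computation can be sketched in Python. This is an illustrative equivalent, not the course's method: it assumes the standard-library `statistics.NormalDist` class plays the role of NORMDIST with the cumulative option switched on.

```python
from statistics import NormalDist

# N(1, 0.5): mean 1, standard deviation 0.5 (the example used in class)
model = NormalDist(mu=1, sigma=0.5)

# cdf(x) returns the area below the cutoff point x
area_below = model.cdf(1.3)
print(round(area_below, 4))  # 0.7257
```

The cutoff 1.3 is 0.6 s.d. above the mean, so this is the same area as the standard normal area below 0.6.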


Computation of Normal Areas

Computation of areas over intervals:

(use subtraction)

Area between a and b = (Area below b) - (Area below a)


Computation of Normal Areas

Computation of areas over intervals:

(use subtraction for EXCEL too)

E.g. Use Excel to check 68 - 95 - 99.7% Rule

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg9.xls
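The same check can be sketched outside Excel. The Python below is an illustration under the assumption that `statistics.NormalDist` supplies the lower areas; the helper name `area_between` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)

def area_between(dist, a, b):
    """Area over the interval (a, b): lower area at b minus lower area at a."""
    return dist.cdf(b) - dist.cdf(a)

for k in (1, 2, 3):
    pct = 100 * area_between(std_normal, -k, k)
    print(f"within {k} sd of mean: {pct:.1f}%")
```

The three printed percentages should agree with the rule to the precision it is usually quoted.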


Normal Area HW

HW (use Excel):

1.94

1.97 (Hint: the % above 130 =

100% - % below 130)

1.99 (see discussion above)

1.113

Caution: Don’t just “twiddle EXCEL until answer appears”. Understand it!!!


And Now for Something Completely Different

A mind blowing video clip:

8 year old Skateboarding Twins:

http://www.youtube.com/watch?v=8X2_zsnPkq8&mode=related&search=

• Do they ever miss?

• You can explore farther…

Thanks to Devin Coley for the link


Inverse of Area Function

Inverse of Frequencies: “Quantiles”

Idea: Given area, find “cutoff” x

I.e. for Area = 80%,

this x is the “quantile”


Inverse of Area Function

EXCEL Computation of Quantiles:

Use NORMINV

Continue Class Example:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg9.xls

• “Probability” is “Area”

• Enter mean and SD parameters
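As a rough Python counterpart to NORMINV, the sketch below goes from an area to its cutoff. It assumes a standard normal model for the 80% example (the class example's parameters are not given in the slide text) and uses `statistics.NormalDist.inv_cdf` as the inverse of the area function.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # assumed model for illustration

area = 0.80                      # "Probability" is "Area"
cutoff = std_normal.inv_cdf(area)  # the 80% quantile
print(round(cutoff, 4))

# Going back through the area function recovers the starting area.
print(round(std_normal.cdf(cutoff), 4))
```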


Inverse Area Example

When a machine works normally, it fills bottles with mean = 25 oz, and SD = 0.2 oz.

The machine is “out of control” when it overfills. Choose an “alarm level”, which will give only 1 % false alarms.

Want: cutoff, x, so that Area above = 1%

Note: Area below = 100% - Area above = 99%

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg9.xls
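The alarm-level calculation can be sketched in Python as follows. This is an illustration, not the class spreadsheet: it assumes fill amounts follow the normal model with the stated mean and SD, and the variable names are made up.

```python
from statistics import NormalDist

fill = NormalDist(mu=25, sigma=0.2)  # ounces, when the machine works normally

false_alarm_rate = 0.01               # want Area above the cutoff = 1%
area_below = 1 - false_alarm_rate     # so Area below the cutoff = 99%

alarm_level = fill.inv_cdf(area_below)
print(f"alarm level: {alarm_level:.3f} oz")
```

The cutoff lands a little under half an ounce above the mean, i.e. roughly 2.33 SDs out.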


Inverse Area HW

1.95, 1.101, 1.107, 1.109

1.116 a (-0.674, 0.674)

1.117

1.118 (4.3%)


Normal Diagnostic

When is the Normal Model “good”?

Useful Graphical Device:

Q-Q plot = Normal Quantile Plot

Idea: look at plot which is approximately linear for data from Normal Model


Normal Quantile Plot

Approach, for data $X_1, \dots, X_n$:

1. Sort data

2. Compute “Theoretical Proportions”: $i/(n+1), \; i = 1, \dots, n$

3. Compute “Theoretical Z-scores”: $\mathrm{NORMINV}(i/(n+1)), \; i = 1, \dots, n$

4. Plot Sorted Data (Y-axis) vs.

Theoretical Z-scores (X-axis)
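The four steps can be sketched in Python. This is only an illustration of the recipe, with made-up data and a hypothetical helper name; it assumes the theoretical Z-scores are standard normal quantiles, and it stops at producing the (x, y) points rather than drawing the plot.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (theoretical Z-score, sorted data value) pairs for a Q-Q plot."""
    n = len(data)
    sorted_data = sorted(data)                        # step 1
    proportions = [i / (n + 1) for i in range(1, n + 1)]  # step 2
    std_normal = NormalDist(0, 1)
    theoretical_z = [std_normal.inv_cdf(p) for p in proportions]  # step 3
    return list(zip(theoretical_z, sorted_data))      # step 4: x = Z, y = data

sample = [12.1, 9.8, 11.4, 10.3, 10.9, 9.2, 11.0]  # hypothetical data
points = normal_quantile_points(sample)
for z, x in points:
    print(f"{z:+.3f}  {x:.1f}")
```

Dividing by n + 1 rather than n keeps every proportion strictly between 0 and 1, so the inverse area function is always defined.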


Normal Quantile Plot

Several Examples:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg12.xls

• Show how to compute in Excel

• Steps as above


Normal Quantile Plot

Main Lessons:

• Melbourne Winter Temperature Data

– Gaussian is good, so looks ~ linear

– So OK to use normal model for these data

– Adding trendline helps in assessing linearity


Normal Quantile Plot

Main Lessons:

• Intro Stat Course Exam Scores Data

– Skewed distributions ⇒ nonlinearity

– Outliers show up clearly

– Normal model unreliable here

• Combined plot highlights

– Mean = Y-intercept

– Standard Deviation = Slope
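To see why the intercept and slope of the trendline track the mean and standard deviation, here is a hedged Python sketch. It uses simulated normal data (mean 10, SD 2, chosen arbitrarily) and a plain least-squares line standing in for Excel's trendline; none of the names or numbers come from the class examples.

```python
import random
from statistics import NormalDist, mean

def qq_line(data):
    """Least-squares line through the normal quantile plot of data."""
    n = len(data)
    ys = sorted(data)
    std_normal = NormalDist(0, 1)
    zs = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    z_bar, y_bar = mean(zs), mean(ys)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    sxx = sum((z - z_bar) ** 2 for z in zs)
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    return intercept, slope

random.seed(0)  # fixed seed so the run is repeatable
simulated = [random.gauss(10, 2) for _ in range(500)]
intercept, slope = qq_line(simulated)
print(f"intercept (approx. mean): {intercept:.2f}")
print(f"slope (approx. SD):       {slope:.2f}")
```

Because the theoretical Z-scores are symmetric about 0, the intercept is essentially the sample mean; the slope is only approximately the SD.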


Normal Quantile Plot

Main Lessons:

• Simulated Bimodal Data

– Curve is flat near modes

– Roughly linear near peaks

– Corresponds to two normal subpopulations

– Goes up fast at valley


Normal Quantile Plot

Homework:

1.122

1.123

1.125


And now for something completely different

Recall Distribution of majors of students in this course:

[Bar chart: “Stat 155, Section 2, Majors”; vertical axis: Frequency (0 to 0.4); categories: Business / Man., Biology, Public Policy / Health, Pharm / Nursing, Journalism / Comm., Env. Sci., Other, Undecided]


And now for something completely different

How about a biology joke?

A seventh grade Biology teacher arranged a demonstration for his class. He took two earthworms and, in front of the class, did the following: He dropped the first worm into a beaker of water, where it dropped to the bottom and wriggled about. He dropped the second worm into a beaker of ethyl alcohol and it immediately shriveled up and died. He asked the class if anyone knew what this demonstration was intended to show them.


And now for something completely different

He asked the class if anyone knew what this demonstration was intended to show them.

A boy in the second row immediately shot his arm up and, when called on, said: "You're showing us that if you drink alcohol, you won't have worms."


Variable Relationships

Chapter 2 in Text

Idea: Look beyond single quantities, to how quantities relate to each other.

E.g. How do HW scores “relate”

to Exam scores?

Section 2.1: Useful graphical device:

Scatterplot


Plotting Bivariate Data

Toy Example:

(1,2)

(3,1)

(-1,0)

(2,-1)

[Figure: “Toy Scatterplot, Separate Points”; horizontal axis: x, vertical axis: y]


Plotting Bivariate Data

Sometimes:

Can see more insightful patterns by connecting points

[Figure: “Toy Scatterplot, Connected points”; horizontal axis: x, vertical axis: y]


Plotting Bivariate Data

Sometimes:

Useful to switch off points, and only look at lines/curves

[Figure: “Toy Scatterplot, Lines Only”; horizontal axis: x, vertical axis: y]


Plotting Bivariate Data

Common Name: “Scatterplot”

A look under the hood:

EXCEL: Chart Wizard (colored bar icon)

• Chart Type: XY (scatter)

• Subtype controls points only, or lines

• Later steps similar to above

(can massage the pic!)


Scatterplot E.g.

Data from related Intro. Stat. Class

(actual scores)

A. How does HW score predict Final Exam?

$x_i$ = HW, $y_i$ = Final Exam

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

i. In top half of HW scores:

Better HW ⇒ Better Final

ii. For lower HW:

Final is much more “random”


Scatterplots

Common Terminology:

When thinking about “X causes Y”,

Call X the “Explanatory Var.” or “Indep. Var.”

Call Y the “Response Var.” or “Dep. Var.”

(think of “Y as function of X”)

(although not always sensible)


Scatterplots

Note: Sometimes think about causation,

Other times: “Explore Relationship”

HW: 2.1


Class Scores Scatterplots

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

B. How does HW predict Midterm 1?

$x_i$ = HW, $y_i$ = MT1

i. Still better HW ⇒ better Exam

ii. But for each HW, wider range of MT1 scores

iii. I.e. HW doesn’t predict MT1 as well as Final

iv. “Outliers” in scatterplot may not be outliers in either individual variable

e.g. HW = 72, MT1 = 94

(bad HW, but good MT1?, fluke???)


Class Scores Scatterplots

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

C. How does MT1 predict MT2?

$x_i$ = MT1, $y_i$ = MT2

i. Idea: less “causation”, more “exploration”

ii. Still higher MT1 associated with higher MT2

iii. For each MT1, wider range of MT2

i.e. “not good predictor”

iv. Interesting Outliers:

MT1 = 100, MT2 = 56 (oops!)

MT1 = 23, MT2 = 74 (woke up!)


Important Aspects of Relations

I. Form of Relationship

II. Direction of Relationship

III. Strength of Relationship


I. Form of Relationship

• Linear: Data approximately follow a line

Previous Class Scores Example

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

Final vs. High values of HW is “best”

• Nonlinear: Data follow different pattern

Nice Example: Bralower’s Fossil Data

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg11.xls


Bralower’s Fossil Data

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg11.xls

From T. Bralower, formerly of Geological Sci.

Studies Global Climate, millions of years ago:

• Ratios of Isotopes of Strontium

• Reflects Ice Ages, via Sea Level

(50 meter difference!)

• As function of time

• Clearly nonlinear relationship


II. Direction of Relationship

• Positive Association

X bigger ⇒ Y bigger

• Negative Association

X bigger ⇒ Y smaller

E.g. X = alcohol consumption, Y = Driving Ability

Clear negative association


III. Strength of Relationship

Idea: How close are points to lying on a line?

Revisit Class Scores Example:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

• Final Exam is “closely related to HW”

• Midterm 1 less closely related to HW

• Midterm 2 even less closely related to Midterm 1