Last Time:

[Figure: histogram of Type A measurements, bins 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94]

2/3 of all Type A respondents had measurements between 55 and 69

[Figure: two panels labeled DataWorld and TheoryWorld, both over the bins 50-54 through 90-94]

Comment: A density function is like a “smoothed out”, very fine-grained histogram

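Examples of Density Functions

Symmetric: Median = Mean

[Figure: symmetric density with the 25th percentile, the median, and the 75th percentile marked; the IQR spans the 25th to the 75th percentile]

Area p below pth percentile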

Examples of Density Functions

Positively Skewed (Skewed to the right)

[Figure: right-skewed density with Median and Mean marked]

THE NORMAL DISTRIBUTION

Properties of X ~ N(μ, σ)

The proportion of a normally distributed X within:

• one standard deviation from its mean is .6826: P(μ - σ < X < μ + σ) = .6826

• two standard deviations from its mean is .9544: P(μ - 2σ < X < μ + 2σ) = .9544

• three standard deviations from its mean is .9974: P(μ - 3σ < X < μ + 3σ) = .9974

True for any value of μ and σ
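The three proportions can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the helper names `normal_cdf` and `prob_within` are hypothetical, the normal CDF is written in terms of the standard error function `math.erf`, and the (μ, σ) pairs in the loop are arbitrary illustrations. The exact values differ from the tabulated ones in the fourth decimal place because the table rounds P(Z<z) to four places.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for X ~ N(mu, sigma), where sigma is the standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu, sigma):
    """P(mu - k*sigma < X < mu + k*sigma)."""
    upper = normal_cdf(mu + k * sigma, mu, sigma)
    lower = normal_cdf(mu - k * sigma, mu, sigma)
    return upper - lower

if __name__ == "__main__":
    # Arbitrary (mu, sigma) pairs: the proportions do not depend on them.
    for mu, sigma in [(0.0, 1.0), (1.1, 0.08), (100.0, 15.0)]:
        proportions = [round(prob_within(k, mu, sigma), 4) for k in (1, 2, 3)]
        print(f"mu={mu}, sigma={sigma}: {proportions}")
```

Each row of output should show the same three proportions, which illustrates that the rule holds for any μ and σ.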

STANDARD NORMAL DISTRIBUTION

Z ~ N(0, 1)

[Figure: standard normal density, horizontal axis from -4 to 4]

Know everything about Z ~ N(0,1):

Table in your book (inside cover) tabulates values P(Z<z)

(note the table goes over two pages)

Note: you can think of values z of Z ~ N(0,1) as “z many standard deviations from the mean”

AMAZING PROPERTY OF NORMAL DISTRIBUTIONS

If X is normally distributed, then a+bX (b>0) is also normally distributed.

More precisely: X ~ N(μ, σ) ⇒ (a+bX) ~ N(a+bμ, bσ)

Note: This type of relationship is not necessarily true for other distributions
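One consequence worth spelling out, as a worked step that is not on the original slide: choosing a = -μ/σ and b = 1/σ (so b > 0) turns any normal X into the standard normal Z, which is why a single table of P(Z<z) is enough.

```latex
Z = \frac{X - \mu}{\sigma} = a + bX
\quad\text{with}\quad a = -\frac{\mu}{\sigma},\; b = \frac{1}{\sigma}
\;\Longrightarrow\;
Z \sim N\!\left(-\frac{\mu}{\sigma} + \frac{\mu}{\sigma},\; \frac{\sigma}{\sigma}\right) = N(0, 1),
\qquad
P(X < x) = P\!\left(Z < \frac{x - \mu}{\sigma}\right).
```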

Example: The population distribution of psychometric test X is a normal distribution with mean 1.1 and standard deviation .08. Thus, X ~ N(1.1, .08).

a) P(1.1 < X) = ?

b) P(1.02 < X < 1.18) = ?

c) How to calculate P(1.1 < X < 1.25) ?

d) How to calculate P(X > 1) ?

e) How to find x such that P(X < x) = .75 ?

μ = 1.1, σ = .08, σ² = .0064
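Parts a) through e) can be worked with software instead of the table. This is a hedged sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist` class (which takes the standard deviation, matching the N(mean, standard deviation) notation of the slide); the variable names are hypothetical.

```python
from statistics import NormalDist

# X ~ N(1.1, .08): mean 1.1, standard deviation .08
X = NormalDist(mu=1.1, sigma=0.08)

p_a = 1.0 - X.cdf(1.1)            # a) P(1.1 < X): the mean splits the area in half
p_b = X.cdf(1.18) - X.cdf(1.02)   # b) P(1.02 < X < 1.18): within one sd of the mean
p_c = X.cdf(1.25) - X.cdf(1.1)    # c) P(1.1 < X < 1.25)
p_d = 1.0 - X.cdf(1.0)            # d) P(X > 1)
x_e = X.inv_cdf(0.75)             # e) x with P(X < x) = .75 (the 75th percentile)

# Same answer for c) via standardization, as one would do with the Z table:
z_c = (1.25 - 1.1) / 0.08
p_c_via_z = NormalDist().cdf(z_c) - 0.5

if __name__ == "__main__":
    print(p_a, p_b, p_c, p_d, x_e, p_c_via_z)
```

With the printed table, the same steps are: convert each bound to z = (x - 1.1)/.08, look up P(Z<z), and subtract.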

Today:

Rehearse the Normal Distribution

Start Chapter 2:

Relationships among Variables

Relationships among Variables

So far:

Mostly interested in a single variable at a time. Exception: Type A, Type B data, where we recorded the type and the blood pressure

Mode, Median, Mean, IQR, Variance, Standard Deviation, etc. all applied to a single variable

Single variable statistics are common in daily life:

Government / Mass Media provide tons of Socio-Economic Statistics, Sports Statistics

Relationships among Variables: The crucial feature of almost all scientific research

How does the perception of a stimulus vary with the physical intensity of that stimulus?

How does the attitude towards the President vary with the socio-economic properties of the survey respondent?

How does the performance on a mental task vary with age?

How does depression vary with the number of traumatic experiences?

How does undergraduate student alcohol abuse vary with performance in quantitative courses?

How does memory performance vary with attention span?

How does the behavior of respondents in an experiment vary with the experimental group that the respondents belong to?

… and on … and on … and on …

Relationships among Variables: Interpretations

Stimulus → Response

Experimental Group → Observed Behavior

SAT Verbal ? ? SAT Quantitative

One variable is used to “explain” another variable

Both variables depend on a third (“lurking”) variable

Relationships among Variables: Interpretations

One variable is used to “explain” another variable

X Variable: Independent Variable, Explaining Variable, Exogenous Variable, Predictor Variable

Y Variable: Dependent Variable, Response Variable, Endogenous Variable, Criterion Variable

Scatter Plots

[Figure: scatter plot of Y against X]

Questions to ask about Scatter Plots

• Is there a systematic trend?

• Can the relationship be described by a linear function Y = a +bX?

• If so, is there a lot of scatter around the line?

• Is there a strong linear relationship?

• Are there lurking variables?

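Scatter Plots

[Figure: scatter plot of Y against X]

Weak Positive Association? A lot of Scatter! Lurking Variables?

Scatter Plots

[Figure: scatter plot of Y against X with the points labeled Venus and Mars]

Scatter Plots

[Figure: scatter plot of Y against X, Venus points only]

Venus: Negative Association, Not a lot of Scatter

Scatter Plots

[Figure: scatter plot of Y against X, Mars points only]

Mars: Positive Association, Not a lot of Scatter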

Example: Performance in Experiment

           PRACTICE   TRIAL
CASE 1       86.0      82.6
CASE 2      109.3     112.6
CASE 3       73.3      70.0
CASE 4       80.6      76.6
CASE 5       86.6      84.0
CASE 6       85.3      86.0
CASE 7       83.3      82.6
CASE 8       78.6      81.3
CASE 9       92.0      86.6
CASE 10      76.0      75.3

PRACTICE: Performance Score in a Practice Session

TRIAL: Performance Score in a Trial Session

Suppose these scores are Interval Scale

Case i = Respondent i

Sample Size: 10 Respondents


Stem and Leaf Plots

Stem and Leaf Plot of variable: PRACTICE, N = 10
Minimum:       73.300
Lower hinge:   78.600
Median:        84.300
Upper hinge:   86.600
Maximum:      109.300

   7    3
   7 H  68
   8 M  03
   8 H  566
   9    2
  * * * Outside Values * * *
  10    9

Stem and Leaf Plot of variable: TRIAL, N = 10
Minimum:       70.000
Lower hinge:   76.600
Median:        82.600
Upper hinge:   86.000
Maximum:      112.600

   7    0
   7 H  56
   8 M  1224
   8 H  66
  * * * Outside Values * * *
  11    2

Some Descriptive Statistics:

                PRACTICE     TRIAL
N of cases         10          10
Minimum          73.300      70.000
Maximum         109.300     112.600
Mean             85.100      83.760
Standard Dev     10.133      11.381
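The descriptive statistics can be reproduced from the ten cases. This is a minimal sketch using Python's standard `statistics` module; the function name `describe` is hypothetical. It assumes the reported standard deviation is the sample standard deviation (N - 1 in the denominator), which is what `statistics.stdev` computes and which matches the tabulated values.

```python
import statistics

# Scores for CASE 1 .. CASE 10, copied from the example data.
practice = [86.0, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92.0, 76.0]
trial = [82.6, 112.6, 70.0, 76.6, 84.0, 86.0, 82.6, 81.3, 86.6, 75.3]

def describe(values):
    """Return the single-variable summaries shown on the slide."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample sd, N - 1 denominator
    }

if __name__ == "__main__":
    for name, values in [("PRACTICE", practice), ("TRIAL", trial)]:
        summary = describe(values)
        print(name, {key: round(val, 3) for key, val in summary.items()})
```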

Histograms

Box and Whisker Plots

The 1970 Vietnam War Draft Lottery


http://www.sss.gov/lotter1.htm

http://lib.stat.cmu.edu/DASL/Stories/DraftLottery.html



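Reminder: (Simple) Linear Function Y = a + bX

[Figure: the line Y = a + bX plotted against X; the intercept is the point (0, a) and the slope is b]

We are now interested in this, not for data transformation purposes, but rather to model the relationship between an independent variable X and a dependent variable Y

[Figure: line in the (X, Y) plane with intercept a and slope b, where b is the change in Y for a step of 1 in X]

If we had errorless predictions: Y = a + bX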

Simple Least-Squares Regression

[Figures: the same scatter plot of Y against X, each with a candidate regression line]

• A guess at the location of the regression line

• Another guess at the location of the regression line (same slope, different intercept)

• Initial guess at the location of the regression line

• Another guess at the location of the regression line (same intercept, different slope)

• Initial guess at the location of the regression line

• Another guess at the location of the regression line (different intercept and slope, same “center”)

We will end up being reasonably confident that the true regression line is somewhere in the indicated region.

[Figure: scatter plot of Y against X with the Estimated Regression Line]

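errors/residuals

[Figure: scatter plot with the estimated regression line and the errors/residuals drawn in]

Wrong Picture! Error Terms have to be drawn vertically

[Figure: the residual at x_i drawn vertically between the observed y_i and the predicted ŷ_i on the line]

$e_i = y_i - \hat{y}_i$

Equation of the Regression Line: $\hat{Y} = a + bX$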

How do we find a and b?

In Least-Squares Regression:

Find $a, b$ to minimize the sum of squared errors/residuals:

$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \left(y_i - b x_i - a\right)^2$$
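To make the criterion concrete, the sum of squared vertical residuals can be computed for any guessed line. The sketch below is illustrative only: the function name `sum_squared_residuals` and the two guessed lines are hypothetical, and the data are the PRACTICE (X) and TRIAL (Y) scores from the example.

```python
# PRACTICE (X) and TRIAL (Y) scores for CASE 1 .. CASE 10.
practice = [86.0, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92.0, 76.0]
trial = [82.6, 112.6, 70.0, 76.6, 84.0, 86.0, 82.6, 81.3, 86.6, 75.3]

def sum_squared_residuals(a, b, xs, ys):
    """Sum over i of e_i^2, with e_i = y_i - (a + b * x_i) measured vertically."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

if __name__ == "__main__":
    # Two arbitrary guesses at the line; the smaller sum is the better fit.
    for a, b in [(0.0, 1.0), (10.0, 0.9)]:
        print(f"a={a}, b={b}: SSE={sum_squared_residuals(a, b, practice, trial):.3f}")
```

Least-squares regression picks the one pair (a, b) for which this sum is as small as possible.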

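$$b = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$

Computational Formula:

$$b = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}$$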
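Both versions of the slope formula can be applied to the PRACTICE and TRIAL scores. This is a minimal sketch in plain Python; the function names `least_squares` and `slope_computational` are hypothetical, and treating PRACTICE as X and TRIAL as Y is an assumption made for illustration.

```python
# PRACTICE (X) and TRIAL (Y) scores for CASE 1 .. CASE 10.
practice = [86.0, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92.0, 76.0]
trial = [82.6, 112.6, 70.0, 76.6, 84.0, 86.0, 82.6, 81.3, 86.6, 75.3]

def least_squares(xs, ys):
    """Return (a, b) from the deviation-from-the-mean formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def slope_computational(xs, ys):
    """Return b from the computational formula (raw sums only)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

if __name__ == "__main__":
    a, b = least_squares(practice, trial)
    print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
    print(f"slope from computational formula = {slope_computational(practice, trial):.4f}")
```

The two slope formulas are algebraically identical, so they should agree up to floating-point rounding; the fitted line always passes through the point of means.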

Outliers? Influential Data Points?
