Last time: 2/3 of all Type A respondents had measurements between 55 and 69
TRANSCRIPT
Last Time:
Type A
[Histogram of Type A measurements; horizontal axis in bins 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94]
2/3 of all Type A respondents had measurements between 55 and 69
Data World
Theory World
Comment: A density function is like a “smoothed out” very fine-tuned histogram
Examples of Density Functions
Symmetric
Median = Mean
25th percentile, 75th percentile
IQR
Area p below the pth percentile
Examples of Density Functions
Positively Skewed (Skewed to the right)
[Figure: density with Median and Mean marked]
THE NORMAL DISTRIBUTION
Properties of X ~ N(μ, σ)
The proportion of a normally distributed X within:
• one standard deviation of its mean is .6826: P(μ - σ < X < μ + σ) = .6826
• two standard deviations of its mean is .9544: P(μ - 2σ < X < μ + 2σ) = .9544
• three standard deviations of its mean is .9974: P(μ - 3σ < X < μ + 3σ) = .9974
True for any value of μ and σ
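As a quick check of these three proportions, here is a minimal sketch using Python's standard library. The helper name `within_k_sd` is hypothetical, and the two (μ, σ) pairs are arbitrary illustrations. To more decimals the values are about .6827, .9545 and .9973. The slide's figures look like doubled four-decimal table entries, which would explain the difference in the last digit (an assumption, not stated on the slide).

```python
from statistics import NormalDist

def within_k_sd(mu, sigma, k):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)  # second argument is the standard deviation
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Two arbitrary (mu, sigma) pairs: the proportions should not depend on them.
for mu, sigma in [(0.0, 1.0), (1.1, 0.08)]:
    proportions = [round(within_k_sd(mu, sigma, k), 4) for k in (1, 2, 3)]
    print(f"mu={mu}, sigma={sigma}: {proportions}")
```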
STANDARD NORMAL DISTRIBUTION
Z ~ N(0, 1)
[Figure: density of Z, horizontal axis from -4 to 4]
Know everything about Z ~ N(0,1):
Table in your book (inside cover) tabulates values P(Z<z)
(note the table goes over two pages)
Note: you can think of a value z of Z ~ N(0,1) as
“z standard deviations from the mean”
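The table lookup P(Z < z) can also be mimicked in code. The sketch below uses the standard identity Φ(z) = ½(1 + erf(z/√2)). The function name `phi` and the sample z values are illustrative choices, not taken from the book's table.

```python
import math

def phi(z):
    """P(Z < z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few sample lookups; negative z follows from symmetry: P(Z < -z) = 1 - P(Z < z).
for z in (-1.25, 0.0, 1.0, 1.96):
    print(f"P(Z < {z:5.2f}) = {phi(z):.4f}")
```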
AMAZING PROPERTY OF NORMAL DISTRIBUTIONS
If X is normally distributed, then a+bX (b>0) is also normally distributed.
More precisely: X ~ N(μ, σ) ⇒ (a+bX) ~ N(a+bμ, bσ)
Note: This type of relationship is not necessarily true for other distributions
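A standard consequence, spelled out here because it is what makes a single table for Z sufficient (this step is not on the slide): taking a = -μ/σ and b = 1/σ > 0 in the property above gives

$$
Z = \frac{X - \mu}{\sigma} = a + bX \sim N\left(a + b\mu,\; b\sigma\right)
  = N\left(-\frac{\mu}{\sigma} + \frac{\mu}{\sigma},\; \frac{\sigma}{\sigma}\right) = N(0, 1),
$$

so that P(X < x) = P(Z < (x - μ)/σ) for any normally distributed X.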
Example: The population distribution of psychometric test X is a normal distribution with mean 1.1 and standard deviation .08. Thus, X ~ N(1.1, .08).
μ = 1.1, σ = .08, σ² = .0064
a) P(1.1 < X) = ?
b) P(1.02 < X < 1.18) = ?
c) How to calculate P(1.1 < X < 1.25) ?
d) How to calculate P(X > 1) ?
e) How to find x such that P(X < x) = .75 ?
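One way to work through a) to e) is sketched below, assuming Python's `statistics.NormalDist` stands in for the printed table. Parts c) to e) go through the standardization z = (x - 1.1)/.08 to mirror the table-based route. The variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=1.1, sigma=0.08)  # X ~ N(1.1, .08); second parameter is the SD
Z = NormalDist()                    # Z ~ N(0, 1)

p_a = 1 - X.cdf(1.1)                # a) half of the area lies above the mean
p_b = X.cdf(1.18) - X.cdf(1.02)     # b) within one SD of the mean

z_c = (1.25 - 1.1) / 0.08           # c) standardize the upper limit (about 1.875)
p_c = Z.cdf(z_c) - Z.cdf(0)

z_d = (1 - 1.1) / 0.08              # d) standardize (about -1.25)
p_d = 1 - Z.cdf(z_d)

z_e = Z.inv_cdf(0.75)               # e) 75th percentile of Z ...
x_e = 1.1 + 0.08 * z_e              # ... transformed back to the scale of X

for label, value in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d), ("e", x_e)]:
    print(f"{label}) {value:.4f}")
```

With the printed table, parts c) and d) would be read off at the nearest tabulated z, so the last digit may differ slightly from the computed values.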
Today:
Rehearse the Normal Distribution
Start Chapter 2: Relationships among Variables
Relationships among Variables
So far:
Mostly interested in a single variable at a time. Exception: Type A, Type B data, where we recorded the type and the blood pressure
Mode, Median, Mean, IQR, Variance, Standard Deviation, etc. all applied to a single variable
Single variable statistics are common in daily life:
Government / Mass Media provide tons of Socio-Economic Statistics, Sports Statistics
Relationships among Variables: The crucial feature of almost all scientific research
How does the perception of a stimulus vary with the physical intensity of that stimulus?
How does the attitude towards the President vary with the socio-economic properties of the survey respondent?
How does the performance on a mental task vary with age?
How does depression vary with the number of traumatic experiences?
How does undergraduate student alcohol abuse vary with performance in quantitative courses?
How does memory performance vary with attention span?
How does the behavior of respondents in an experiment vary with the experimental group that the respondents belong to?
… and on … and on …
Relationships among Variables: Interpretations
One variable is used to “explain” another variable:
Stimulus → Response
Experimental Group → Observed Behavior
Both variables depend on a third (“lurking”) variable:
SAT Verbal ? SAT Quantitative
X Variable: Independent Variable, Explaining Variable, Exogenous Variable, Predictor Variable
Y Variable: Dependent Variable, Response Variable, Endogenous Variable, Criterion Variable
Scatter Plots
[Figure: scatter plot of Y against X]
Questions to ask about Scatter Plots
• Is there a systematic trend?
• Can the relationship be described by a linear function Y = a + bX?
• If so, is there a lot of scatter around the line?
• Is there a strong linear relationship?
• Are there lurking variables?
Scatter Plots
[Figure: scatter plot of Y against X] Weak Positive Association? A lot of Scatter! Lurking Variables?
[Figure: the same scatter plot with the points labeled Venus and Mars]
[Figure: Venus points only] Negative Association, Not a lot of Scatter
[Figure: Mars points only] Positive Association, Not a lot of Scatter
Example: Performance in Experiment
            PRACTICE    TRIAL
CASE 1        86         82.6
CASE 2       109.3      112.6
CASE 3        73.3       70
CASE 4        80.6       76.6
CASE 5        86.6       84
CASE 6        85.3       86
CASE 7        83.3       82.6
CASE 8        78.6       81.3
CASE 9        92         86.6
CASE 10       76         75.3
PRACTICE: Performance Score in a Practice Session
TRIAL: Performance Score in a Trial Session
Suppose these scores are Interval Scale
Case i = Respondent i
Sample Size: 10 Respondents
Stem and Leaf Plots
Stem and Leaf Plot of variable: PRACTICE, N = 10
Minimum: 73.300
Lower hinge: 78.600
Median: 84.300
Upper hinge: 86.600
Maximum: 109.300
    7    3
    7 H  68
    8 M  03
    8 H  566
    9    2
  * * * Outside Values * * *
   10    9

Stem and Leaf Plot of variable: TRIAL, N = 10
Minimum: 70.000
Lower hinge: 76.600
Median: 82.600
Upper hinge: 86.000
Maximum: 112.600
    7    0
    7 H  56
    8 M  1224
    8 H  66
  * * * Outside Values * * *
   11    2
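The hinges and outside values in these plots can be reproduced with a short sketch. Two assumptions are made here that the slides do not state: hinges are taken as the medians of the lower and upper halves of the sorted data, and a value counts as "outside" when it lies more than 1.5 hinge-spreads beyond a hinge. The function names are hypothetical.

```python
from statistics import median

practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]

def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # for odd n the median belongs to both halves
    return xs[0], median(xs[:half]), median(xs), median(xs[n - half:]), xs[-1]

def outside_values(values, k=1.5):
    """Values beyond k hinge-spreads from the hinges (k = 1.5 is assumed)."""
    _, lower_hinge, _, upper_hinge, _ = five_number_summary(values)
    h_spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - k * h_spread
    high_fence = upper_hinge + k * h_spread
    return [x for x in values if x < low_fence or x > high_fence]

for name, scores in [("PRACTICE", practice), ("TRIAL", trial)]:
    print(name, five_number_summary(scores), outside_values(scores))
```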
Some Descriptive Statistics:
                 PRACTICE     TRIAL
N of cases          10          10
Minimum           73.300      70.000
Maximum          109.300     112.600
Mean              85.100      83.760
Standard Dev      10.133      11.381
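These descriptive statistics can be recomputed from the ten cases. The sketch below uses Python's `statistics` module, whose `stdev` divides by N - 1. That divisor reproduces the tabulated standard deviations, so the table appears to use the sample formula.

```python
from statistics import mean, stdev

practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]

for name, scores in [("PRACTICE", practice), ("TRIAL", trial)]:
    print(
        f"{name:9s} N={len(scores)} min={min(scores):.3f} max={max(scores):.3f} "
        f"mean={mean(scores):.3f} sd={stdev(scores):.3f}"  # stdev uses N - 1
    )
```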
Histograms
Box and Whisker Plots
The 1970 Vietnam War Draft Lottery
http://www.sss.gov/lotter1.htm
http://lib.stat.cmu.edu/DASL/Stories/DraftLottery.html
Reminder: (Simple) Linear Function Y = a + bX
[Figure: the line a + bX plotted against X; intercept a at the point (0, a); slope b, the change in Y per unit increase in X]
We are now interested in this, not for data transformation purposes, but rather to model the relationship between an independent variable X and a dependent variable Y
If we had errorless predictions: Y = a + bX
Simple Least-Squares Regression
[Figure sequence: scatter plots of Y against X with candidate regression lines]
• A guess at the location of the regression line
• Another guess at the location of the regression line (same slope, different intercept)
• Another guess at the location of the regression line (same intercept, different slope)
• Another guess at the location of the regression line (different intercept and slope, same “center”)
We will end up being reasonably confident that the true regression line is somewhere in the indicated region.
[Figure: Estimated Regression Line with errors/residuals]
Wrong Picture! Error Terms have to be drawn vertically
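[Figure: Estimated Regression Line, showing the observed $y_i$ and the predicted $\hat{y}_i$ at $x_i$]
Residual: $e_i = y_i - \hat{y}_i$
Equation of the Regression Line: $\hat{Y} = a + bX$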
How do we find a and b?
Find a, b to minimize the sum of squared errors/residuals:
$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - a - b x_i)^2$$
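To make the criterion concrete, the sketch below evaluates the sum of squared residuals for two guessed lines on the PRACTICE/TRIAL data. Treating PRACTICE as X and TRIAL as Y is an assumption made for illustration. The function name `sse` and the two guesses are hypothetical.

```python
practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]      # X (assumed)
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]        # Y (assumed)

def sse(a, b, xs, ys):
    """Sum of squared vertical residuals e_i = y_i - (a + b*x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Two guesses at the line: Y = X, and a flat line at the mean of Y.
guesses = {"a=0, b=1": (0.0, 1.0), "a=83.76, b=0": (83.76, 0.0)}
for label, (a, b) in guesses.items():
    print(f"{label}: SSE = {sse(a, b, practice, trial):.3f}")
```

The guess with the smaller sum fits better under this criterion. Least squares picks the pair (a, b) for which no other line gives a smaller sum.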
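In Least-Squares Regression:
$$b = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$
Computational Formula:
$$b = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}$$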
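Both versions of the slope formula can be checked against each other on the PRACTICE/TRIAL data. This is a minimal sketch that again assumes PRACTICE plays the role of X and TRIAL the role of Y. The function names are hypothetical.

```python
practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]      # X (assumed)
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]        # Y (assumed)

def least_squares(xs, ys):
    """Intercept a and slope b from the deviation-score formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def slope_computational(xs, ys):
    """Slope b from the computational formula (raw sums only)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

a, b = least_squares(practice, trial)
residuals = [y - (a + b * x) for x, y in zip(practice, trial)]
print(f"a = {a:.4f}, b = {b:.4f}, b (computational) = {slope_computational(practice, trial):.4f}")
print(f"sum of residuals = {sum(residuals):.2e}")  # essentially zero for a least-squares fit
```

The two slope formulas are algebraically identical. The computational form avoids computing the means first, at the price of working with larger intermediate sums.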
Outliers? Influential Data Points?