Chapter 3 Correlation and Regression
Post on 20-Oct-2019
Chapter 3
TOPIC
Correlation Defined
Range of the Correlation Coefficient
Scatter Plots
Null and Alternative Hypotheses
Statistical Significance
Example 1
Example 2
Coefficient of Determination
Tutorials
• Obtaining the Correlation Coefficient in Excel 2007
CORRELATION AND REGRESSION
CORRELATION
➊ Indicates how well the ranking of scores on one
variable matches the ranking of scores on a
second variable
➋ As the ranking of scores on the first variable
increasingly matches the ranking of scores on the
second variable, the correlation becomes stronger
• The fewer matched rankings, the weaker the
correlation
CORRELATION
➌ The ranking of scores may match in the same
direction (i.e., the score ranked first on variable 1
is also ranked first on variable 2) or opposite
direction (i.e., the score ranked first on variable 1
is ranked last on variable 2)
➍ There is no correlation when the ranking of scores
on one variable fails to match the ranking of scores
on the second variable in either direction
CORRELATION
➊ EXAMPLE: Five soccer players were ranked
according to their soccer ability and their grade
point average (GPA)
Perfect Positive r              Perfect Negative r
        Soccer                          Soccer
Player  Ability  GPA            Player  Ability  GPA
A       1        1              A       1        5
B       2        2              B       2        4
C       3        3              C       3        3
D       4        4              D       4        2
E       5        5              E       5        1
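As a rough check of the two tables above, the sketch below computes the Pearson correlation on the rank columns. The helper name `pearson_r` is an illustrative assumption and not part of the slides, which obtain r in Excel.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

soccer_rank = [1, 2, 3, 4, 5]        # players A to E
gpa_rank_same = [1, 2, 3, 4, 5]      # rankings match in the same direction
gpa_rank_opposite = [5, 4, 3, 2, 1]  # rankings match in the opposite direction

r_positive = pearson_r(soccer_rank, gpa_rank_same)
r_negative = pearson_r(soccer_rank, gpa_rank_opposite)
print(r_positive, r_negative)
```

When the rankings agree exactly, the coefficient sits at the top of its range. When they are exactly reversed, it sits at the bottom.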
CORRELATION
➊ The numeric value of the correlation coefficient has a
range of +1.00 to -1.00, where zero indicates no
correlation
• The closer the correlation coefficient is to +1.00 or
-1.00, the stronger the correlation between two
variables
• The closer the correlation coefficient is to 0, the weaker
the correlation between two variables
• A correlation coefficient equal to 0 means there is
no correlation between two variables
➋ Which value represents a stronger correlation?
• +.65 or -.85
CORRELATION
➊ The correlation coefficient describes two
characteristics:
• The sign of the correlation (positive or
negative) indicates the direction of the
relationship between the two variables
• The absolute value of the correlation indicates how
strong the correlation is between two variables
➋ The symbol for the correlation between two
variables for a sample is a lower case, italicized r
CORRELATION
➊ Here is a rough guideline for defining the strength of a
correlation coefficient:
• r = ±.80 to ±1.00 Strong Correlation
• r = ±.60 to ±.80 Moderate Correlation
• r = ±.40 to ±.60 Weak to Moderate
• r = .00 to ±.40 Weak Correlation
➋ The guideline above assumes a sample size of N ≥ 30
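The guideline can be sketched as a small lookup on the absolute value of r. The function names are hypothetical, and because the listed ranges share their endpoints, assigning a boundary value such as .80 to the stronger category is an assumption made here, not a rule from the slides.

```python
def strength_label(r):
    """Label a correlation using the rough guideline (intended for N >= 30)."""
    size = abs(r)  # strength ignores the sign of the coefficient
    if size > 1:
        raise ValueError("r must lie between -1.00 and +1.00")
    if size >= 0.80:   # assumption: boundary values go to the stronger label
        return "Strong"
    if size >= 0.60:
        return "Moderate"
    if size >= 0.40:
        return "Weak to Moderate"
    return "Weak"

def stronger(r1, r2):
    """Return whichever coefficient lies farther from zero."""
    return r1 if abs(r1) >= abs(r2) else r2

print(strength_label(0.65), strength_label(-0.85), stronger(0.65, -0.85))
```

Comparing absolute values is what makes a negative coefficient such as -.85 count as stronger than a positive one such as +.65.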
CORRELATION
➊ A scatter plot is a graph that describes the direction and
strength of the correlation between two variables
➋ The closer the points in the graph are to forming a straight
line, the stronger the correlation between the two variables
• When the points in the graph form a circular pattern,
the correlation will be close to or equal to zero
• When the pattern of points leans from lower right to
upper left, the scatter plot indicates the correlation is
negative
• When the pattern of points leans from lower left to
upper right, the scatter plot indicates the correlation is
positive
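The three patterns described above can be illustrated with made-up points. The data below are invented for illustration only, and `pearson_r` is a hypothetical helper, not something defined in the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]
ys_up = [2, 1, 4, 3, 6, 5]    # pattern leans from lower left to upper right
ys_down = [5, 6, 3, 4, 1, 2]  # pattern leans from lower right to upper left

# Eight points evenly spaced around a circle (a circular pattern)
angles = [2 * math.pi * k / 8 for k in range(8)]
circle_x = [math.cos(a) for a in angles]
circle_y = [math.sin(a) for a in angles]

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
r_circle = pearson_r(circle_x, circle_y)
print(round(r_up, 2), round(r_down, 2), round(r_circle, 2))
```

The upward-leaning points give a positive r, the downward-leaning points give a negative r of the same size, and the circular pattern gives an r of essentially zero.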
SCATTER PLOTS
➊ When the pattern is lower right to upper left, the correlation
is negative:
➋ When the pattern is lower left to upper right, the correlation
is positive:
SCATTER PLOTS
[Figure: two scatter plots, each with X on the horizontal axis and Y on the vertical axis]
Scatter Plots
➊ A non-zero sample correlation does not necessarily mean two
variables are related to each other in the population
➋ There are two competing hypotheses:
• The alternative hypothesis (HA) contends there is a true
correlation between the two variables for the population
and the sample correlation observed is not solely due
to random error
• The null hypothesis (H0) states that there is no
correlation between the two variables for the population
and that any sample correlation observed is solely due
to random error
NULL HYPOTHESIS
➊ When a correlation coefficient is sufficiently large, we can
make the inference that it reflects not just random error
alone, but also a measure of how much two variables have
in common
• Remember random error is present in everything we
measure – you can’t get rid of it and all statistics
contain some amount of random error
• Smaller samples have more random error and larger
samples have less
NULL HYPOTHESIS
➊ A statistical conclusion is a statement that rejects or fails to
reject the null hypothesis
• When we reject the null hypothesis, we are saying the
sample correlation obtained is NOT solely due to
random error but indicates a real correlation between
the two variables for the population
• When we fail to reject the null hypothesis, we are
acknowledging the observed sample correlation may
be only due to random error and that there may not be
any true correlation between the two variables for the
population
NULL HYPOTHESIS
➊ The stronger the correlation, the more likely there is a real
correlation between two variables for the population
➋ Whether a sample correlation between two variables is
real or not is a function of how big the sample size is and
the strength of the correlation between two variables
• As a general rule, the larger the sample size, the
weaker the sample correlation needs to be in order to
declare it statistically significant (meaning the null
hypothesis is rejected)
• In other words, the smaller the sample size, the
stronger the correlation coefficient needs to be
to reach statistical significance
NULL HYPOTHESIS
➊ To determine if a sample correlation is significant, we need
to first work from the assumption that the null hypothesis is
true
• We assume the null hypothesis is true because we
haven’t analyzed the data yet (there’s no evidence of a
correlation without analyzing the data)
➋ We only analyze the data from one sample, but to
determine if a sample correlation is statistically significant
we have to remember there are an infinite number of
samples that could have been selected
STATISTICAL SIGNIFICANCE
➊ Assuming the null hypothesis is true, we expect the
correlation for the sample obtained to be zero; if the value
is not zero, we assume the difference is solely due to
random error
➋ If we imagine obtaining the correlations for all possible
samples (where each sample is the same size), we would
find that the average of all sample correlations is equal to
the population correlation
• Again, if the null hypothesis is true, the correlation
between two variables for the population will be zero
STATISTICAL SIGNIFICANCE
➊ If we imagine obtaining the correlations for all possible
samples (where each sample is the same size), we could
build a histogram using the sample correlation coefficients
• Since the histogram consists of all possible sample
correlations, it is called a sampling distribution of
sample correlations
• This histogram (or sampling distribution) will be flatter
and wider when the sample correlations are based on
smaller sample sizes and taller and narrower when the
sample correlations are based on larger sample sizes
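A simulation can approximate this idea. The sketch below assumes the null hypothesis is true by generating two unrelated, normally distributed variables, and it stands in for "all possible samples" with 2,000 random samples per sample size. The seed, sample sizes, and function names are illustrative choices, not values from the slides.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_null_correlations(n, num_samples, rng):
    """Draw many samples of size n from two unrelated variables; return each r."""
    results = []
    for _ in range(num_samples):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        results.append(pearson_r(xs, ys))
    return results

def mean_and_sd(values):
    """Mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

rng = random.Random(145)  # fixed seed so the run is repeatable
mean_small, sd_small = mean_and_sd(simulate_null_correlations(10, 2000, rng))
mean_large, sd_large = mean_and_sd(simulate_null_correlations(100, 2000, rng))
print(f"N = 10:  mean r = {mean_small:.3f}, spread = {sd_small:.3f}")
print(f"N = 100: mean r = {mean_large:.3f}, spread = {sd_large:.3f}")
```

Both sets of sample correlations average close to zero, but the spread is noticeably wider for N = 10 than for N = 100, which corresponds to the flatter, wider histogram described above.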
STATISTICAL SIGNIFICANCE
➊ The null hypothesis is rejected and the sample correlation
is statistically significant when the obtained correlation
value (from Excel) falls in the outer 5% of the histogram (or
sampling distribution)
STATISTICAL SIGNIFICANCE
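[Figure: sampling distribution of r centered at 0. Sample correlations between the two critical values (rcrit) are Not Significant (Fail to Reject H0). Sample correlations in either outer tail, 2.5% of the distribution on each side, are Significant (Reject H0).]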
➊ The correlation values that identify the outer 5% of the
sampling distribution are called the critical values
➋ The critical values are found by using the r table found on
the class website
➌ To look up the critical value, you’ll need to know the
sample size or N
• Locate the sample size under the first column
• Then, for the selected sample size, locate the critical
value under the third column (.05 under ‘2-tailed
testing’)
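The r table itself is not reproduced in this transcript. As a hedged sketch, two-tailed critical values of this kind can be approximated numerically, assuming both variables are normally distributed so that under the null hypothesis the density of r is proportional to (1 - r²) raised to the power (N - 4) / 2. The function names, the Simpson's-rule integration, and the bisection search are illustrative choices, and the sketch assumes N is at least 4.

```python
def null_density(r, n):
    """Unnormalized density of r under H0 for sample size n (normal data assumed)."""
    return (1.0 - r * r) ** ((n - 4) / 2.0)

def central_area(c, n, steps=2000):
    """Simpson's rule estimate of the area under the density from -c to +c."""
    h = 2.0 * c / steps
    total = null_density(-c, n) + null_density(c, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * null_density(-c + i * h, n)
    return total * h / 3.0

def critical_r(n, alpha=0.05):
    """Find c such that the outer alpha of the distribution lies beyond -c and +c."""
    whole = central_area(1.0, n)
    low, high = 0.0, 1.0
    for _ in range(60):  # bisection on the central area
        mid = (low + high) / 2.0
        if central_area(mid, n) / whole < 1.0 - alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

for n in (16, 25, 100):
    print(n, round(critical_r(n), 3))
```

The critical value shrinks as N grows, so a weaker sample correlation can reach significance in a larger sample.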
NULL HYPOTHESIS
➊ A researcher recruited 25 adults ranging in age from 35 to
65 years old to find out if there is a relationship between
number of television hours watched and blood pressure.
The sample correlation obtained was +.65.
➋ State the null hypothesis for this problem
• The null hypothesis expects there to be no correlation
between number of television hours watched and blood
pressure for adults ranging in age from 35 to 65 years
old. Any non-zero sample correlation observed is
assumed to be solely due to random error.
CORRELATION
Conduct a test of the null hypothesis at the 5% level. Be
sure to properly state the statistical conclusion
• The sample correlation obtained in Excel is +.65
• The sample size is 25
• The critical values from the r table are ±.396
• The statistical conclusion is:
• Since r (25) = +.65, p < .05; Reject H0
CORRELATION
Provide an interpretation of the statistical conclusion using
the variables from the description of the problem
• Based on the 25 adults surveyed, ranging in age from
35 to 65 years old, it appears that as the amount of
television watched per day increases, there is an
increase in blood pressure. The obtained sample
correlation does not seem to be solely due to random
error, but rather indicates a real correlation between
amount of television watched per day and blood
pressure.
CORRELATION
➊ A marriage counselor believes that couples who spend
more time making meals together are more satisfied with
their relationship. Sixteen couples are recruited for the
study and asked to keep track of how much time (in
minutes) they spend preparing meals together each day
for one month. At the end of the month, couples are asked
to complete a survey on how satisfied they are with their
current relationship. The sample correlation obtained was
+.45.
CORRELATION
➋ State the null hypothesis for this problem
• The null hypothesis expects there to be no correlation
between amount of time couples spend together
preparing meals and their satisfaction with their current
relationship. Any non-zero sample correlation observed
is assumed to be solely due to random error.
CORRELATION
Conduct a test of the null hypothesis at the 5% level. Be
sure to properly state the statistical conclusion
• The sample correlation obtained in Excel is +.45
• The sample size is 16
• The critical values from the r table are ±.497
• The statistical conclusion is:
• Since r (16) = +.45, p > .05; Fail to reject H0
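The decision rule used in both examples can be sketched as a comparison between the obtained correlation and the critical value from the r table. The function name is hypothetical, and treating a value exactly equal to the critical value as significant is an assumption made here.

```python
def statistical_conclusion(r, r_crit):
    """Compare the obtained r with the two-tailed critical value at the 5% level."""
    # Significance depends on the size of r, not its sign
    if abs(r) >= abs(r_crit):
        return "Reject H0 (p < .05)"
    return "Fail to reject H0 (p > .05)"

example_1 = statistical_conclusion(0.65, 0.396)  # TV hours and blood pressure, N = 25
example_2 = statistical_conclusion(0.45, 0.497)  # meal preparation and satisfaction, N = 16
print(example_1)
print(example_2)
```

The first example exceeds its critical value and the second does not, even though +.45 would be significant in a larger sample.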
CORRELATION
Provide an interpretation of the statistical conclusion using
the variables from the description of the problem
• Based on the 16 couples recruited for the study, there
is not enough evidence that satisfaction with current
relationship is related to how much time couples spend
making meals together. The obtained sample correlation
may be due to random error alone.
CORRELATION
➊ The coefficient of determination or r² provides an estimate
of the percentage of variance that is common to two
variables (also known as covariance)
• Variance refers to all the things that cause scores on a
given variable to be different
• What causes people to be different heights?
• Genes, nutrition, disease, age, race, and gender
to name a few
• Differences on these traits cause variance in
heights across the population
COEFFICIENT OF DETERMINATION
➊ If two variables are correlated, they must share some
amount of variance
• There is a significant correlation between height and
weight for the population
• What is the variance shared between these two
variables?
• Both height and weight are influenced by genes,
nutrition, disease, age, race, and gender
• These variables likely explain why height and
weight are correlated
• The variance shared by two variables is known as
covariance
COEFFICIENT OF DETERMINATION
➊ What is the coefficient of determination or r² for the
problem examining the relationship between amount of TV
watched and blood pressure?
• To get the coefficient of determination, square the
sample correlation obtained in Excel
• r² = .65 × .65 = .42 or 42%
• Interpretation: It is estimated that 42% of the
variance in amount of TV watched per day is
common to blood pressure. This estimate of
covariance is based on a sample size of 25.
COEFFICIENT OF DETERMINATION
➊ What is the coefficient of determination or r² for the problem
examining the relationship between amount of time couples
spend making meals together and level of satisfaction with their
current relationship?
• r² = .45 × .45 = .20 or 20%
• Interpretation: It is estimated that 20% of the variance in
amount of time couples spend making meals together is
common to the level of satisfaction with their current
relationship. This estimate of covariance is based on a
sample size of 16.
• NOTE: The coefficient of determination was computed for the example
above for demonstration only. The coefficient of determination is
not interpretable for non-significant correlations.
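The arithmetic for both examples can be sketched in a few lines. The function name is hypothetical; the correlations are the values given in the two problems.

```python
def coefficient_of_determination(r):
    """Square the sample correlation to estimate the shared variance."""
    return r * r

tv_r2 = coefficient_of_determination(0.65)     # TV watched and blood pressure
meals_r2 = coefficient_of_determination(0.45)  # meal preparation and satisfaction

print(f"TV and blood pressure: {tv_r2:.2f}")
print(f"Meals and satisfaction: {meals_r2:.2f}")
```

Because r is squared, the sign of the correlation drops out: a correlation of -.65 would give the same estimate of shared variance as +.65.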
COEFFICIENT OF DETERMINATION
End of Chapter 3 – Part 1