statistics and research methods


Upload: morrison

Post on 22-Mar-2016


Page 1: Statistics and Research methods

Statistics and Research methods

Mathematics for HMI, Session 2

Page 2: Statistics and Research methods

Correlation

Association between scores on two variables

– e.g., age and coordination skills in children, price and quality

Page 3: Statistics and Research methods

Scatter Diagram

A Scatter Diagram (or scatterplot) is a visual display of the relationship between two variables

Example: A company is interested in whether there is a relationship between the number of employees supervised by a manager and the amount of stress reported by that manager

Page 4: Statistics and Research methods

Stress and Employees Supervised

[Figure: scatter diagram. x-axis: # of Employees Supervised (0 to 12); y-axis: Stress Level (0 to 10)]

Page 5: Statistics and Research methods

Cause and Effect

An important type of relationship between two variables: cause and effect

Independent variable = cause

Dependent variable = effect

Page 6: Statistics and Research methods

Correlation and Causality

Three possible directions of causality:

1. X → Y (X causes Y)

2. X ← Y (Y causes X)

3. Z → X and Z → Y (a third variable Z causes both X and Y)

Page 7: Statistics and Research methods

Correlation and Causality

In situations where variables cannot be manipulated experimentally, it is difficult to know whether one is actually causing the other

Example in newspaper: “drinking coffee causes cancer”

– However, a third variable may cause both high coffee consumption and cancer

– Such third variables are called ‘confounds’

Page 8: Statistics and Research methods

However, we can still try to predict one variable on the basis of a second variable, even if the causal relationship has not been determined

Predictor variable

Criterion variable

Page 9: Statistics and Research methods

Scatter Diagrams

The independent (or predictor) variable goes on the horizontal (x) axis; the dependent (or criterion) variable on the vertical (y) axis.

Page 10: Statistics and Research methods

Hours of Overtime Worked and Spouse’s Marital Satisfaction

[Figure: scatter diagram. x-axis: Hours of Overtime (0 to 25); y-axis: Marital Satisfaction (0 to 10)]

Page 11: Statistics and Research methods

Patterns of Correlation

Linear correlation

Curvilinear correlation

No correlation

Positive correlation

Negative correlation

Page 12: Statistics and Research methods
Page 13: Statistics and Research methods

Degree of Linear Correlation: The Correlation Coefficient

Figure correlation using Z scores

Cross-product of Z scores

– Multiply the Z score on one variable by the Z score on the other variable

Correlation coefficient

– Average of the cross-products of Z scores
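
The computation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `z_scores` and `correlation` and the five data points are made up for the example. It assumes the population standard deviation (dividing by N), which is what makes the plain average of the cross-products equal to r.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to Z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Correlation coefficient as the average of the cross-products of Z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    cross_products = [a * b for a, b in zip(zx, zy)]
    return sum(cross_products) / len(cross_products)

# Hypothetical data: employees supervised (X) and reported stress level (Y)
employees = [6, 8, 3, 10, 8]
stress = [7, 8, 1, 8, 6]

r = correlation(employees, stress)
print(round(r, 2))  # 0.88
```

A value this close to +1 indicates a strong positive linear correlation in the made-up data.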

Page 14: Statistics and Research methods

Degree of Linear Correlation: The Correlation Coefficient

Formula for the correlation coefficient:

$r = \frac{\sum Z_X Z_Y}{N}$

Positive perfect correlation: r = +1

No correlation: r = 0

Negative perfect correlation: r = –1

Page 15: Statistics and Research methods

Correlation and Causality

Correlational research design

– Correlation as a statistical procedure

– Correlation as a kind of research design

Page 16: Statistics and Research methods

Issues in Interpreting the Correlation Coefficient

Statistical significance, e.g., p < .05

Proportionate reduction in error = proportion of variance accounted for

– $r^2$

– Used to compare correlations
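
To see why squared values are used when comparing correlations, consider this small sketch. The two coefficients are hypothetical values chosen only for illustration.

```python
# Hypothetical correlation coefficients to compare
r_a = 0.60
r_b = 0.30

# Proportion of variance accounted for is the square of r
variance_a = r_a ** 2
variance_b = r_b ** 2

# r_a is twice as large as r_b, but accounts for about four times the variance
ratio = variance_a / variance_b
print(round(variance_a, 2), round(variance_b, 2), round(ratio, 1))
```

So a correlation of .60 is not "twice as strong" as one of .30 in terms of variance accounted for; it is roughly four times as strong.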

Page 17: Statistics and Research methods

Issues in Interpreting the Correlation Coefficient (continued)

Restriction in range

Unreliability of measurement

Page 18: Statistics and Research methods

Correlation in Research Articles

Scatter diagrams occasionally shown

Correlation matrix

Page 19: Statistics and Research methods

Regression

Making predictions

– Does knowing a person’s score on one variable allow us to say what their score on a second variable is likely to be?

The method we use to make predictions is called regression

When scores on one variable are used to predict scores on another variable, it is called bivariate regression (two variables)

When scores on two or more variables are used to predict scores on another variable, it is called multiple regression

Page 20: Statistics and Research methods

Naming (two variables)

                   Variable Predicted From              Variable Predicted To

Name               Independent Variable                 Dependent Variable

Alternative Name   Predictor Variable                   Criterion Variable

Symbol             X                                    Y

Example            Number of hours slept night before   Happy mood that day

Page 21: Statistics and Research methods

• These two variables correlate positively

• People who drink a lot of coffee tend to be happy, and people who drink little coffee tend to be unhappy

• Preview: The line is called a regression line, and represents the estimated linear relationship between the two variables. Notice that the slope of the line is positive in this example.

Page 22: Statistics and Research methods

The Regression Line

Relation between predictor variable and predicted values of the criterion variable

Formula: $\hat{Y} = a + (b)(X)$

Slope of the regression line

– Equals b, the raw-score regression coefficient

Intercept of the regression line

– Equals a, the regression constant

Method of least squares to derive a and b

Page 23: Statistics and Research methods

Method of least squares

a and b derived by:

– least squares method (drawing)

– line through $M_X$ and $M_Y$

Page 24: Statistics and Research methods

The Regression Line

$\hat{Y} = a + (b)(X)$

where $b = (\beta)(SD_Y / SD_X) = (r)(SD_Y / SD_X)$ and $a = M_Y - (b)(M_X)$
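
As a hedged sketch of how the slope and intercept follow from r, the standard deviations, and the means, the Python below computes both for a small data set. The helper names (`mean`, `sd`, `pearson_r`, `regression_line`) and the six data points are assumptions made for illustration; population standard deviations (dividing by N) are used throughout.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population standard deviation (divide by N)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(xs, ys):
    """Correlation coefficient from deviation cross-products."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (len(xs) * sd(xs) * sd(ys))

def regression_line(xs, ys):
    """Return (a, b) with b = r * SD_Y / SD_X and a = M_Y - b * M_X."""
    b = pearson_r(xs, ys) * sd(ys) / sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical data: hours slept the night before (X), happy mood that day (Y)
hours_slept = [5, 7, 8, 6, 6, 10]
happy_mood = [2, 4, 7, 2, 3, 6]

a, b = regression_line(hours_slept, happy_mood)
print(round(a, 2), round(b, 2))  # -3.0 1.0
```

Because a is defined as the mean of Y minus b times the mean of X, the resulting line always passes through the point given by the two means.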

Page 25: Statistics and Research methods

Bivariate Raw Score Prediction

Direct raw-score prediction model

– Predicted raw score (on criterion variable) = regression constant plus the result of multiplying a raw-score regression coefficient by the raw score on the predictor variable

– Formula: $\hat{Y} = a + (b)(X)$

– The “hat” over Y means “predicted”
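
A short worked example of the raw-score prediction model, written as Python. The regression constant and coefficient below are hypothetical values picked for illustration, and `predict_raw` is a name introduced here, not something from the slides.

```python
def predict_raw(x, a, b):
    """Predicted raw score on the criterion variable: Y-hat = a + b * X."""
    return a + b * x

a = -3.0  # hypothetical regression constant (intercept)
b = 1.0   # hypothetical raw-score regression coefficient (slope)

# Predicted mood for a person who slept 8 hours, under the assumed a and b
predicted_mood = predict_raw(8, a, b)
print(predicted_mood)  # 5.0
```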

Page 26: Statistics and Research methods

Bivariate prediction with Z scores

Given the Z score for X, what is the Z score for Y?

We use the prediction model:

$\hat{Z}_Y = (\beta)(Z_X)$

where $\beta$ (beta) is the “standardized regression coefficient”

It’s also called “beta weight”, because it tells us how much “weight” to give to $Z_X$ when making a prediction for $Z_Y$.

The “hat” over $Z_Y$ means “predicted”.

Page 27: Statistics and Research methods

What is $\beta$?

It turns out that the best value to use for $\beta$ in the prediction model is r, the (Pearson) correlation coefficient

Thus, the bivariate regression model is $\hat{Z}_Y = (r)(Z_X)$

When r = 1, $\hat{Z}_Y = Z_X$; when r = -1, $\hat{Z}_Y = -Z_X$

When r = 0, there is no relation: $\hat{Z}_Y = 0$, so the “best guess” for Y is the mean score
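
The special cases on this slide can be checked with a small sketch. The function name `predict_z`, the Z score of 1.5, and the intermediate value r = 0.5 are assumptions chosen for illustration.

```python
def predict_z(z_x, r):
    """Predicted Z score on Y: Z-hat_Y = r * Z_X (beta equals r in the bivariate case)."""
    return r * z_x

z_x = 1.5  # hypothetical Z score on the predictor variable

# Try a perfect positive, perfect negative, zero, and moderate correlation
for r in (1.0, -1.0, 0.0, 0.5):
    print(r, predict_z(z_x, r))
```

With r = 0 the prediction is a Z score of 0, which is the mean of Y; with a moderate r the prediction sits between the mean and the person's Z score on X.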

Page 28: Statistics and Research methods

Proportionate Reduction in Error

We want a measure of how accurately our regression model (the raw-score prediction formula) predicts the data

We can compare the error we make when predicting with our regression model, $SS_{Error}$, to the error we would make if we didn’t have the model, $SS_{Total}$

Page 29: Statistics and Research methods

Proportionate Reduction in Error

Error

– Actual score minus the predicted score: $Error = Y - \hat{Y}$, so $Error^2 = (Y - \hat{Y})^2$

$SS_{Error}$ = Sum of squared error using prediction model = $\sum (Y - \hat{Y})^2$

$SS_{Total}$ = Sum of squared error when predicting from the mean = $\sum (Y - M_Y)^2$

Page 30: Statistics and Research methods

Error and Proportionate Reduction in Error

Formula for proportionate reduction in error:

$\text{Proportionate reduction in error} = \frac{SS_{Total} - SS_{Error}}{SS_{Total}}$

Proportionate reduction in error = $r^2$

Proportion of variance accounted for
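
The following sketch checks numerically that the proportionate reduction in error equals r squared. The data are hypothetical, the variable names are introduced here for illustration, and population standard deviations (dividing by N) are assumed.

```python
from math import sqrt

# Hypothetical data: predictor (X) and criterion (Y) scores
xs = [5, 7, 8, 6, 6, 10]
ys = [2, 4, 7, 2, 3, 6]
n = len(xs)

m_x, m_y = sum(xs) / n, sum(ys) / n
sd_x = sqrt(sum((x - m_x) ** 2 for x in xs) / n)
sd_y = sqrt(sum((y - m_y) ** 2 for y in ys) / n)

# Correlation coefficient and the raw-score regression line
r = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
b = r * sd_y / sd_x
a = m_y - b * m_x

# Squared error when predicting with the model vs. predicting from the mean
predicted = [a + b * x for x in xs]
ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
ss_total = sum((y - m_y) ** 2 for y in ys)

pre = (ss_total - ss_error) / ss_total
print(round(pre, 4), round(r ** 2, 4))  # 0.7273 0.7273
```

In this made-up data set the model removes about 73% of the squared error that would result from always guessing the mean.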