Exploring Relationships: Correlations & Multiple Linear Regression Dr James Betts Developing Study Skills and Research Methods (HL20107)

Upload: zoe-kelley

Post on 28-Mar-2015


Page 1: Exploring Relationships: Correlations & Multiple Linear Regression Dr James Betts Developing Study Skills and Research Methods (HL20107)

Exploring Relationships:

Correlations & Multiple Linear Regression

Dr James Betts

Developing Study Skills and Research Methods (HL20107)

Page 2:

Lecture Outline:

• Correlation Coefficients
• Coefficients of Determination
• Prediction & Regression
• Multiple Linear Regression
• Assessment Details

Page 3:

Statistics

• Descriptive: organising, summarising & describing data
• Correlational: relationships
• Inferential: generalising

Significance

Page 4:

Correlation

• A measure of the relationship (correlation) between variables at the interval/ratio level of measurement (LOM), taken from the same set of subjects
• A ratio which indicates the amount of concomitant variation between two sets of scores
• This ratio is expressed as a correlation coefficient (r):

r       Relationship
-1      Perfect negative
-0.7    Strong negative
-0.3    Moderate negative
-0.1    Weak negative
 0      No relationship
+0.1    Weak positive
+0.3    Moderate positive
+0.7    Strong positive
+1      Perfect positive
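The scale can be read as a lookup from r to a verbal label. The sketch below assumes the markers 0.1, 0.3 and 0.7 act as lower bounds of the weak, moderate and strong bands; the function name `describe_r` and the "negligible" label for values below 0.1 are illustrative assumptions, not part of the lecture.

```python
def describe_r(r: float) -> str:
    """Return a verbal label for a correlation coefficient r in [-1, +1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        return f"perfect {direction}"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        strength = "negligible"  # assumed label; the slide does not name this band
    return f"{strength} {direction}"


print(describe_r(-0.86))  # strong negative
print(describe_r(0.25))   # weak positive
```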

Page 5:

Correlation Coefficient & Scatterplots: Direction

[Figure: two scatterplots. Variable Y (e.g. 10 km run time) against Variable X (e.g. VO2max); Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max).]

Page 6:

Correlation Coefficient & Scatterplots: Form

[Figure: two scatterplots. Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max); Variable Y (e.g. Strength) against Variable X (e.g. Age).]

Page 7:

Correlation Coefficient & Scatterplots: Significance

[Figure: two scatterplots. Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max); Variable Y (e.g. 100 m sprint time) against Variable X (e.g. VO2max).]

Page 8:

Correlation Coefficient & Scatterplots: Significance

[Figure: two scatterplots. Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max); Variable Y (e.g. 100 m sprint time) against Variable X (e.g. VO2max).]

Page 9:

Methods of Calculating r

• Any method of calculating r requires:
  – Homoscedasticity (i.e. equal scattering)
  – Linear data (curvilinear data requires eta, η)
• Parametric data (i.e. raw data above ordinal LOM and either a normal distribution or a large sample) permit the use of ‘Pearson’s Product-Moment Correlation’
• If the raw data violate these assumptions then use ‘Spearman’s Rank Order Correlation’ instead.

Page 10:

Pearson’s Product-Moment Correlation

X = Alcohol Units   Y = Skill Score   X²     Y²     XY
15                  4                 225    16     60
14                  6                 196    36     84
10                  4                 100    16     40
9                   8                 81     64     72
8                   7                 64     49     56
8                   8                 64     64     64
7                   10                49     100    70
6                   9                 36     81     54
4                   14                16     196    56
2                   12                4      144    24
Totals =

Page 11:

Pearson’s Product-Moment Correlation

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
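As a worked check, the raw-score formula can be applied to the ten alcohol/skill pairs from the data table. This is a minimal sketch in plain Python; the function name `pearson_r` is an illustrative choice. The result agrees with the Pearson value of -.860 in the SPSS output later in the lecture.

```python
from math import sqrt

# Data from the lecture table: X = alcohol units, Y = skill score
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]


def pearson_r(x, y):
    """Pearson's r via the raw-score (computational) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(sum(x), sum(y))                # column totals: 83 82
print(f"{pearson_r(x, y):.3f}")      # -0.860
```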

Page 12:

Spearman’s Rank-Order Correlation

X = Alcohol Units   Y = Skill Score   Rank X   Rank Y   D      D²
15                  4                 10       1.5      8.5    72.25
14                  6                 9        3        6      36
10                  4                 8        1.5      6.5    42.25
9                   8                 7        5.5      1.5    2.25
8                   7                 5.5      4        1.5    2.25
8                   8                 5.5      5.5      0      0
7                   10                4        8        4      16
6                   9                 3        7        4      16
4                   14                2        10       8      64
2                   12                1        9        8      64
Total =

Page 13:

Spearman’s Rank-Order Correlation

$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
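The ranking step and the shortcut formula can be sketched in plain Python as below; `average_ranks`, `spearman_shortcut` and `pearson_r` are illustrative names. Tied scores receive the mean of the ranks they occupy, as in the table. On these data the shortcut gives about -0.909, whereas the SPSS output later in the lecture reports -.927; that gap is consistent with SPSS applying Pearson's r to the ranks, which handles the tied scores differently.

```python
from math import sqrt

x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]    # alcohol units
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]    # skill score


def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share their mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # 1-based position of the first tie
        count = ordered.count(v)               # how many values are tied
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks


def spearman_shortcut(x, y):
    """Spearman's r from 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(x, y):
    """Pearson's r via the raw-score formula (used here on ranks)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


print(f"{spearman_shortcut(x, y):.3f}")                          # -0.909
print(f"{pearson_r(average_ranks(x), average_ranks(y)):.3f}")    # -0.927
```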

Page 15:

SPSS Correlation Outputs

Correlations

                                     VAR00001   VAR00002
VAR00001   Pearson Correlation       1          -.860**
           Sig. (2-tailed)                      .001
           N                         10         10
VAR00002   Pearson Correlation       -.860**    1
           Sig. (2-tailed)           .001
           N                         10         10

**. Correlation is significant at the 0.01 level (2-tailed).

Correlations (Spearman's rho)

                                     VAR00001   VAR00002
VAR00001   Correlation Coefficient   1.000      -.927**
           Sig. (2-tailed)           .          .000
           N                         10         10
VAR00002   Correlation Coefficient   -.927**    1.000
           Sig. (2-tailed)           .000       .
           N                         10         10

**. Correlation is significant at the 0.01 level (2-tailed).

Page 16:

Coefficient of Determination (r² × 100)

• AKA ‘variance explained’, this figure denotes how much of the variance in Y can be explained/predicted by X

e.g. to predict long jump distance (Y) from maximum sprint speed (X):

r = 0.8
r² = 64%

[Figure: diagram relating Y and X]
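The arithmetic is simply r squared, expressed as a percentage. A minimal sketch follows; `variance_explained` is an illustrative name. The second call uses the Pearson coefficient reported for the alcohol/skill data (-.860).

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination as a percentage: r squared times 100."""
    return r ** 2 * 100


print(f"{variance_explained(0.8):.0f}%")     # 64%
print(f"{variance_explained(-0.860):.0f}%")  # 74%
```

Because r is squared, the sign of the relationship is lost: a strong negative correlation explains as much variance as a positive one of the same size.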

Page 17:

Correlation versus Regression

• By attempting to predict one variable using another, we are now moving away from simple correlation and moving into the concept of regression

Correlation =

Regression =

Page 18:

Linear Regression

• The equation for a linear relationship can be expressed as: Y = a + bX, where a = the y intercept and b = the gradient

[Figure: scatterplot of Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max)]

Page 21:

SPSS Regression Output

[Figure: scatterplot with fitted regression line, SkillScore (Y) against AlcoholUnits (X)]

SkillScore = 13.92 + -0.69 * AlcoholUnits
R-Square = 0.74
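The reported equation can be reproduced from the ten alcohol/skill pairs with the standard least-squares formulas for the gradient b and intercept a. This is a sketch in plain Python, not the SPSS procedure itself; `fit_line` is an illustrative name.

```python
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]    # AlcoholUnits
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]    # SkillScore


def fit_line(x, y):
    """Least-squares intercept a and gradient b for Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n   # the line passes through (mean X, mean Y)
    return a, b


a, b = fit_line(x, y)
print(f"SkillScore = {a:.2f} + {b:.2f} * AlcoholUnits")
# SkillScore = 13.92 + -0.69 * AlcoholUnits
```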

Page 22:

Extrapolation versus Interpolation

[Figure: scatterplot of Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max)]

Remember that the accuracy of your equation depends upon the linear relationship you observed.

Interpolation =

Extrapolation =
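In the usual sense of the terms, interpolation is prediction for an X value inside the range that was observed, and extrapolation is prediction outside it. The sketch below uses the SkillScore equation from the SPSS regression output and takes the observed range to be 2 to 15 alcohol units (the smallest and largest X in the data table); `predict_skill` and the constant names are illustrative.

```python
A, B = 13.92, -0.69      # intercept and gradient from the SPSS regression output
X_MIN, X_MAX = 2, 15     # smallest and largest alcohol units in the data table


def predict_skill(units: float):
    """Return (predicted skill score, kind of prediction)."""
    kind = "interpolation" if X_MIN <= units <= X_MAX else "extrapolation"
    return A + B * units, kind


for units in (5, 25):
    score, kind = predict_skill(units)
    print(f"{units} units -> {score:.2f} ({kind})")
# 5 units -> 10.47 (interpolation)
# 25 units -> -3.33 (extrapolation)
```

The second prediction is a negative skill score, which illustrates why an equation should not be trusted far beyond the data it was fitted to.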

Page 23:

Multiple Linear Regression

• We saw earlier how maximum sprint speed (X) can predict/explain 64% of variance in long jump distance (Y)

[Figure: diagram relating Y and X, r² = 64%]

…but can Y be predicted any more effectively using more than one independent variable (i.e. X1, X2, X3, etc.)?
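Extending the single-predictor equation Y = a + bX from earlier in the lecture, a model with several independent variables is conventionally written as below. The subscripted gradients and the count k are notation introduced here for illustration, not taken from the slides.

```latex
Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k
```

Here a is still the intercept, and each b_i is the gradient for X_i with the other predictors held constant.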

Page 24:

Multiple Linear Regression

• However, we can often predict Y effectively just using a specific subset of X variables (i.e. a reduced model)

[Figure: diagram relating Y to X1, X2 and Event Experience]

Page 25:

Multiple Linear Regression

• ‘Best Subset Selection Methods’ involve calculation of r for every possible combination of IVs
• Stepwise regression methods involve gradually either adding or removing variables and monitoring the impact of each action on r.
  – Standard methods add and remove variables
  – Forward selection methods begin with 1 IV and add more
  – Backwards elimination methods begin with all IVs and remove
• The order in which IVs are added/removed is critical as the variance explained solely by any one will be entirely dependent upon the presence of others.

Page 27:

SPSS Multiple Linear Regression Output

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .860a   .740       .708                1.74391

a. Predictors: (Constant), AlcoholUnits

Excluded Variables(b)

Model 1    Beta In   t       Sig.   Partial Correlation   Collinearity Statistics: Tolerance
BodyMass   .072a     .374    .720   .140                  .994
Age        .208a     1.150   .288   .399                  .950

a. Predictors in the Model: (Constant), AlcoholUnits
b. Dependent Variable: SkillScore
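The Model Summary figures can be reproduced from the ten alcohol/skill pairs given earlier in the lecture. The sketch below uses the standard textbook formulas for adjusted R² and the standard error of the estimate with one predictor (k = 1); it is a check on the arithmetic, not a description of SPSS internals, and the variable names are illustrative.

```python
from math import sqrt

x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]    # AlcoholUnits
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]    # SkillScore
n, k = len(x), 1                         # 10 cases, 1 predictor in the model

mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((v - mean_x) ** 2 for v in x)
s_yy = sum((v - mean_y) ** 2 for v in y)
s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))

r_square = s_xy ** 2 / (s_xx * s_yy)
# Adjusted R Square penalises R Square for the number of predictors used
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
# Residual (unexplained) sum of squares, then its root mean square
sse = s_yy * (1 - r_square)
std_error = sqrt(sse / (n - k - 1))

print(f"R Square          = {r_square:.3f}")      # 0.740
print(f"Adjusted R Square = {adj_r_square:.3f}")  # 0.708
print(f"Std. Error        = {std_error:.4f}")     # 1.7439
```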

Page 28:

Summary: Exploring Relationships

• The relationship between two variables can be expressed as a correlation coefficient (r)
• The coefficient of determination (r²) denotes the % of the variance in one variable that is explained by another
• Linear regression can provide an equation with which to predict one variable from another
• Multiple linear regression can potentially improve this prediction using multiple predictor variables.

Page 29:

Coursework Project (40% overall grade)

• Your coursework will require you to address 2 out of 3 research scenarios that are available on the unit webpage
• For each of the 2 scenarios you will need to:
  – Perform a literature search in order to provide a comprehensive introduction to the research area
  – Identify the variables of interest and evaluate the research design which was adopted
  – Formulate and state appropriate hypotheses…

Page 30:

• Cont’d…
  – Summarise descriptive statistics in an appropriate and well presented manner
  – Select the most appropriate statistical test with justification for your decision
  – Transfer the output of your inferential statistics into your word document
  – Interpret your results and discuss the validity and reliability of the study
  – Draw a meaningful conclusion (state whether hypotheses are accepted or rejected).

Page 31:

Coursework Details (see unit outline)

• 2000 words maximum (i.e. 1000 for each)
• Any supporting SPSS data/outputs to be appended
• To be submitted on Thursday 6th May

Assessment Weighting

• Evaluation & Analysis (30%)
• Reading & Research (20%)
• Communication & Presentation (20%)
• Knowledge (30%)

Page 32:

Coursework Details

• All information relating to your coursework (including the relevant data files) is accessible via the module web page:

http://people.bath.ac.uk/jb335/Y2%20Research%20Skills%20(FH200107).html

Web address also referenced on shared area

Electronic copy to be included with submission.

Any further questions/problems can be raised in the CW revision lecture/labs after Easter

Page 33:

Timed Practical Computing Exercise (20% overall grade)

• This test will involve analysis/interpretation of the resultant data assessed via short answer questions
• Practice session Wednesday 14th April
• Duration = 80 min (2 groups)
• I will email specific details after Easter.

Page 34:

Start Here

• Looking for differences between categories/frequencies? (i.e. nominal LOM)
  – 1 observed frequency: Goodness of Fit χ2
  – >1 observed frequency: Contingency χ2
• Looking for differences within the same group of subjects? (i.e. paired data)
  – 2 observations: Paired t-test (non-parametric: Wilcoxon test)
  – >2 observations: 1-way paired ANOVA (non-parametric: Friedman’s test)
• Looking for differences between 2 separate groups of subjects? (i.e. unpaired data)
  – 2 groups: Independent t-test (non-parametric: Mann-Whitney test)
  – >2 groups: 1-way unpaired ANOVA (non-parametric: Kruskal-Wallis test)
• Looking for relationships?
  – 2 variables: Pearson’s r (non-parametric: Spearman’s r)
  – >2 variables: Multiple Linear Regression
• Looking for differences with >1 independent variable?
  – Both IVs paired: 2-way paired ANOVA
  – Both IVs unpaired: 2-way unpaired ANOVA
  – 1 IV paired, 1 IV unpaired: 2-way mixed model ANOVA

Post-Hoc Tests

If multiple DVs are involved then use MANOVA