5. testing differences

Steve Saffhill

Research Methods in Sport & Exercise

Difference Testing

Aims and Objectives

Introduce how to test for significant differences between two groups of data

Detail basic principles of difference testing

Introduce parametric and non-parametric tests

How to interpret SPSS outputs

Parametric Assumptions - Reminder

1. The data must be randomly sampled

2. The data must be high level data (interval/ratio not nominal or ordinal)

3. The data must be normally distributed (normal curve on histograms; z scores between -1.96 and +1.96)

4. The data must be of equal variance

These assumptions are of progressive importance.

If you do not meet #1 then use Non-parametric inferential tests

Some can be violated but you must justify doing so with supporting evidence! (Vincent, 2005)

# 4 is the least important

Basic Principles – Different Tests

• Statistical tests enable us to evaluate the effect of an independent variable (IV) on a dependent variable (DV).

• IV = the presumed cause of the effect being researched. The researcher controls the IV (e.g., differing levels of athletic ability).

• DV = the variable that can be explained by the effects of the IV. It is what is actually being measured (e.g., performance changes that the researcher cannot control)!

PARAMETRIC = t-test

• Paired t-test: for repeated measurements (i.e., measurements carried out on the same subjects)

• Independent t-test: for independent measurements (i.e., measurements carried out on two different groups of subjects)

NON-PARAMETRIC =

• Wilcoxon: repeated measures/paired t-test equivalent

• Mann-Whitney U: independent t-test equivalent

• The statistical test is used to determine if the two levels of treatment differ significantly (p < .05) so that their difference would not be attributable to a chance occurrence more than 5 times in 100.

• The statistical test is always of the null hypothesis.

• All that statistics can do is reject or fail to reject the null hypothesis.

• Statistics cannot accept the research hypothesis... you do!

• Only logical reasoning, good experimental design, and appropriate theorising can do so.

• Statistics can determine only whether the groups are different, not why they are different.

• You the researcher say why!

Experimental Research Designs

• Within-Participants Design = repeated measures test on the same group

• Allows you to control for inter-individual confounding variables

• If you use different groups there is a chance that some variable other than your IV distinguishes between your groups!

• In this design you have the same people in all conditions so there is less extraneous variation between conditions!

• Fewer participants too!!

• Between-Participants Design = different groups in each condition of the IV

• Each group is less likely to get bored, tired etc.

• Less susceptible to practice effects/results bias

• Needs more participants

• Needs different participants in each group

• So you lose some control over confounding influences!

Independent t-test

• The most frequently used t test determines whether two sample means differ reliably from each other.

• In this test the samples are independent of one another – also referred to as a between comparison.

• E.g., male v female scores in anxiety (IV/DV?)

• E.g., football v boxing VO2max (IV/DV?)

Types of t test

Example

• In an experiment of training intensity and distance run in 12 minutes the results are as follows:

Mean distance run in 12 min:

after 70% training intensity = 3004 m

after 40% training intensity = 2456 m

Can you identify the IV and DV??


IV = training intensity (70% vs 40%)

DV = distance run

• The question that statistics has to answer is:

“Is the difference in the two mean scores significant or is it one that could have occurred by chance given the inherent variability of groups produced by random sampling?”


Using SPSS to carry out the analysis gives the following result: t (2.8) = 13.81, p <.03

• The t is basically a ratio between a measure of the between group variance and within group variance

• The larger the variance between the groups compared with the variance within the groups = larger t value
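The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure itself: the two lists of distances are hypothetical values invented for the example (they are not the data behind the slide), and the function name `independent_t` is an assumption. It uses the standard pooled-variance form, where the difference between the group means is divided by the standard error built from the within-group variances.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Independent-samples t (equal variances assumed) and its df."""
    n1, n2 = len(a), len(b)
    # Within-group variability: pooled sample variance of the two groups
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Between-group difference divided by within-group variability
    t = (mean(a) - mean(b)) / se_diff
    return t, n1 + n2 - 2


# Hypothetical distances (m) run in 12 minutes, for illustration only
high_intensity = [3000, 3100, 2900, 3050, 2950]
low_intensity = [2400, 2500, 2450, 2550, 2350]

t_value, df = independent_t(high_intensity, low_intensity)
print(f"t({df}) = {t_value:.2f}")
```

A large gap between the means with little spread inside each group gives a large t. Widening the spread inside the groups while keeping the same means shrinks it.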

DV goes into the dependent list in SPSS and IV goes into the FACTOR list!

Then you have to define your groups (i.e., tell SPSS who is in which group)

What is the probability of obtaining that t value by chance?

• The larger t is, then the more likely there is a TRUE difference between the groups that is theoretically caused by our independent variable

• Each t value comes with its own associated probability level and this is where the p value comes from

• p = .03

• Yes – there is a significant difference in distance between the two groups

• 70% intensity group ran reliably further than the 40% intensity of training group.

• There is a significant difference between the two mean scores!

Caution!!!

• All between-groups comparison techniques assume that the variances of the groups are equivalent. Although mild violations of this assumption do not present major problems, serious violations are more likely if group sizes are not approximately equal.

• Most computer programs allow unequal group sizes. However, the homogeneity assumption should be checked if group sizes are very different or when variances are very different. SPSS covers this automatically with Levene's test for equality of variances.

Independent T-test on SPSS

Group Statistics

       LEVEL     N     Mean     Std. Deviation   Std. Error Mean
CS1    senior    94    4.3616   .88860           .09165
       junior    101   4.1074   1.16502          .11592

Independent Samples Test (CS1)

Levene's Test for Equality of Variances

                               F       Sig.
Equal variances assumed        8.311   .004

t-test for Equality of Means

                               t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed        1.704   193       .090              .2542             .14919                  -.04009        .54843
Equal variances not assumed    1.720   185.961   .087              .2542             .14778                  -.03737        .54571

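If the Sig. value for Levene's test is less than 0.05 we can say that there is a significant difference in the variance of the two sets of scores!!!

If this is the case we say equal variances are not assumed and use the bottom row of values (here Levene's Sig. = .004, so we report t = 1.720, df = 185.961, p = .087)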
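The two rows of the SPSS output can be reproduced from the Group Statistics values alone (senior: N = 94, mean 4.3616, SD .88860; junior: N = 101, mean 4.1074, SD 1.16502). The sketch below is an illustration under assumptions: the function name `t_from_summary` is made up for the example, the "not assumed" row is taken to be the usual Welch form with Welch-Satterthwaite degrees of freedom, and the p values are left to SPSS because the standard library has no t distribution.

```python
from math import sqrt


def t_from_summary(m1, s1, n1, m2, s2, n2, equal_var):
    """Return (t, df) for two independent groups from summary statistics."""
    v1, v2 = s1 ** 2, s2 ** 2
    if equal_var:
        # Top row: pooled variance, df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Bottom row: separate variances, Welch-Satterthwaite df
        a, b = v1 / n1, v2 / n2
        se = sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


senior = (4.3616, 0.88860, 94)    # mean, SD, N from Group Statistics
junior = (4.1074, 1.16502, 101)
levene_sig = 0.004                # Sig. column of Levene's test

# Decision rule from the slide: Sig. < .05 means equal variances not assumed
equal_var = levene_sig >= 0.05
t, df = t_from_summary(*senior, *junior, equal_var=equal_var)
print(f"equal variances assumed: {equal_var}, t({df:.3f}) = {t:.3f}")

t_top, df_top = t_from_summary(*senior, *junior, equal_var=True)
print(f"top row for comparison: t({df_top}) = {t_top:.3f}")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats` accepts the same summary values and also returns the p value.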

Dependent t Test – also called a repeated measures or Paired t Test

• This means that the two groups of scores are related in some manner.

• one group of subjects is tested twice on the same variable, and the experimenter is interested in the change between the two tests

• Hence repeated measures design

There is no IV as such and hence both variables go into the dependent list in SPSS (i.e., nothing goes into the factor list)

Example: Effects of visualisation on pain

– Condition 1 = imagine performing an exciting t-test whilst plunging hands into ice cold water

– Condition 2 = imagine being on a beach drinking beer whilst plunging hands into ice cold water

IV = Condition

DV = time hand immersed

• Similar formula to independent t-tests, however it is a bit more sensitive as it takes into consideration that we are using the same participants in both conditions

Counter-Balancing

• We couldn’t have all do C1 first as they would never return for C2

• It might also lead to order effects! Learning, practice etc

• ½ do C1 and ½ do C2 first and then swap
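The half-and-half split can be sketched as a random assignment of condition order. This is only an illustration: the participant labels, the function name `counterbalance` and the fixed seed are assumptions made for the example.

```python
import random


def counterbalance(participants, seed=0):
    """Randomly assign half the participants to do C1 first, half C2 first."""
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = participants[:]      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("C1", "C2")
    for person in shuffled[half:]:
        orders[person] = ("C2", "C1")
    return orders


# Hypothetical participant IDs
participants = [f"P{i:02d}" for i in range(1, 11)]
orders = counterbalance(participants)
for person in participants:
    print(person, "->", " then ".join(orders[person]))
```

Every participant still completes both conditions. Only the order differs, so practice or learning effects are spread evenly across C1 and C2.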

SPSS Output

Paired Samples Statistics

               Mean     N     Std. Deviation   Std. Error Mean
Pair 1   CS1   4.2219   269   1.11603          .06805
         CS2   4.3379   269   .98975           .06035

Paired Samples Correlations

                     N     Correlation   Sig.
Pair 1   CS1 & CS2   269   .523          .000

Paired Samples Test

                     Paired Differences
                     Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1   CS1 - CS2   -.1161   1.03399          .06304            -.2402         .0081          -1.841   268   .067

Mean (Paired Differences) = the difference between the means of C1 and C2
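The paired t value in this output follows directly from the paired-differences figures (mean difference -.1161, SD of the differences 1.03399, N = 269). A minimal check, using only those reported values, is sketched below. The variable names are assumptions, and small discrepancies in the last decimal place come from SPSS working with unrounded numbers.

```python
from math import sqrt

# Values read from the Paired Samples Test output
mean_diff = -0.1161   # mean of (CS1 - CS2)
sd_diff = 1.03399     # standard deviation of the differences
n = 269               # number of pairs

se_diff = sd_diff / sqrt(n)   # standard error of the mean difference
t = mean_diff / se_diff       # paired t statistic
df = n - 1

print(f"SE = {se_diff:.5f}, t({df}) = {t:.3f}")
```

Because the same people appear in both conditions, only the variability of the differences enters the standard error. This is why the paired test is more sensitive than the independent test.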

Issues of Significance

• Differences in pain between the two conditions were not statistically significant (p = 0.067)

• Remember p must be < 0.05 to be statistically significant

– There is no significant difference (p = 0.067)

– This only reflects a tendency

– Power issue??

• Tendency accepted as p < 0.1

Non-Parametric Difference Tests

• Wilcoxon - 2 groups – within groups/repeated measures – Paired t-test

• Mann Whitney U- 2 groups – between groups – independent t-test

Mann Whitney U

• Do males and females differ on their emphasis on importance of body image?

• Hypothesis = males and females will differ on their emphasis of importance of body image

• Imagine the data were not randomly sampled/high level, and/or not Normally distributed

2 output boxes appear:

Irrelevant

p value

>0.05 = no sig diff

Similar function to t in t-test

High = better
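To show what the Mann-Whitney test is doing with ranks, the U statistic can be worked out by hand. The sketch below uses hypothetical body-image importance ratings invented for the example, and the function name `mann_whitney_u` is an assumption. It counts, for every male-female pair, which score is higher (ties count as half). The p value would still come from SPSS or from tables.

```python
def mann_whitney_u(x, y):
    """Return (U for x, U for y) by direct pairwise comparison."""
    u_x = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    u_y = len(x) * len(y) - u_x   # the two U values always sum to n1 * n2
    return u_x, u_y


# Hypothetical ratings of the importance of body image
males = [3, 4, 2, 5, 4]
females = [6, 5, 7, 6, 4]

u_males, u_females = mann_whitney_u(males, females)
print(f"U (males) = {u_males}, U (females) = {u_females}")
print(f"Reported U = {min(u_males, u_females)}")
```

When the two groups overlap completely both U values are close to half of n1 × n2. The more one group outranks the other, the closer the smaller U gets to zero.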

Wilcoxon

• Differences between imagery rating scores from memory and after watching video playback

• Hypothesis = there will be a significant difference between imagery ratings from memory and after watching the video of the skill

• IV = presence/absence of video (operationalised by asking subjects to rate likeness to actual performance)

• DV = 1-7 scale

2 tables appear:

This is the p value!

>0.05 = NOT significant

Similar function to t in t-test

High = better
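The ranking behind the Wilcoxon test can also be sketched by hand. The ratings below are hypothetical 1-7 imagery scores invented for the example, and the function name `wilcoxon_signed_rank` is an assumption. Pairs with no change are dropped, the absolute differences are ranked (tied values share the average rank), and the ranks are summed separately for increases and decreases.

```python
def wilcoxon_signed_rank(before, after):
    """Return (sum of positive ranks, sum of negative ranks)."""
    # Differences for each participant, ignoring pairs with no change
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ordered = sorted(abs(d) for d in diffs)
    # Average rank for each absolute difference (handles ties)
    rank_of = {}
    for value in set(ordered):
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


# Hypothetical imagery ratings (1-7 scale) for the same six participants
from_memory = [4, 5, 3, 6, 4, 5]
after_video = [5, 7, 3, 5, 6, 6]

w_plus, w_minus = wilcoxon_signed_rank(from_memory, after_video)
print(f"positive ranks = {w_plus}, negative ranks = {w_minus}")
```

If the video made no difference the two rank sums would be similar. A large imbalance, as here, is what produces a small p value in the SPSS output.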

• Generally speaking parametric tests are preferred

• However, they are not always possible! It depends on YOUR data

Meaningful versus statistical significance

• Meaningful significance cannot be determined by statistics.

• It is a decision made by the researcher.

• Statistical significance is not the same as meaningful significance.

• A small effect of altering a surgical method may emerge as statistically significant but may be unimportant when we measure surgical survival

One tailed or two tailed tests

• This topic is concerned with directional or non-directional hypotheses.

• If the hypothesis is non-directional, e.g. performance of Group A will be different to Group B following an intervention, then we must choose a two-tailed statistical test.

• If a hypothesis is directional, e.g. Group A will score significantly higher than Group B following the intervention, then we must choose a one-tailed statistical test.

• A directional hypothesis gives a more powerful test.


• In a directional hypothesis, not only do you say there will be a difference, but also what that difference will be

• E.g. women have better ultra endurance than men

One or two tailed

Paired Samples Test

                     Paired Differences
                     Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1   CS1 - CS2   -.1161   1.03399          .06304            -.2402         .0081          -1.841   268   .067

• SPSS USUALLY assumes we conduct two tailed research so the p value it produces is for a two tailed test! So we do nothing!!!

• However, if our hypothesis is one-tailed we must change the p value SPSS has given us into a one tailed value..OR click on 1-tailed on SPSS if it has it!

• We simply halve it!!! This p value (.067) becomes 0.0335!!!

Notice there was no significant difference and now there is!!!!
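The halving rule can be written as a small helper. This is a sketch under assumptions: the function name `one_tailed_p` is made up for the example, and the rule only applies to symmetric test statistics such as t. It also makes explicit the condition that the observed difference must lie in the predicted direction. If it does not, the one-tailed p is large, not small.

```python
def one_tailed_p(two_tailed_p, effect_in_predicted_direction):
    """Convert a two-tailed p value (symmetric statistic) to one-tailed."""
    half = two_tailed_p / 2
    # Half the two-tailed p only when the result goes the predicted way
    return half if effect_in_predicted_direction else 1 - half


p_two = 0.067   # Sig. (2-tailed) from the Paired Samples Test output

p_predicted = one_tailed_p(p_two, effect_in_predicted_direction=True)
p_opposite = one_tailed_p(p_two, effect_in_predicted_direction=False)

print(f"one-tailed p, predicted direction: {p_predicted:.4f}")
print(f"one-tailed p, opposite direction:  {p_opposite:.4f}")
```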

What are the implications of one and two tailed hypotheses?

‘Why not always conduct one-tailed tests if it is more likely to demonstrate significance?’

• The answer lies in the fact that you have to declare what you are going to do before conducting the study (hypothesis).

(Remember - your study should be rooted in theory: you must have an idea of what should happen)

• If we have conducted a one-tailed test and the result goes in the opposite direction to that predicted, no matter how extreme, then you cannot claim this as significant.

Why not always carry out two-tailed tests?

• For example, if our theory concerns the effects of stimulants on motor performance, then stimulants generally speed up motor reactions.

• In which case it makes no sense to predict that tasks performed with a stimulant will be performed significantly faster or slower than those performed without it.

• In this case the theory dictates a directional test and hence a one-tailed test of the hypothesis.

Think about your research first and select a statistical test that fits your design/the literature.

What if more than 2 groups?

• Often you will want to conduct a test to see if there are differences between more than two groups/conditions

• VO2 max in football, rugby and hockey?

• Motivation in Year 1, Year 2, Year 3?

• ANOVA or MANOVA!

• More advanced stats - we will cover this next week!

Summary

Check parametric assumptions to use the correct test

Difference tests allow you to test the significance of the effect of the IV on the DV

T-tests are used to test data from TWO groups

Can run either a paired or an independent samples t-test

For non-parametric data use either:

Wilcoxon: paired groups

Mann-Whitney U: independent groups

Can have 1- or 2-tailed significance, it depends on your hypothesis!
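The choice between the four tests covered in this session can be summarised as a simple lookup. The function name `choose_test` and its two yes/no arguments are assumptions made for this sketch. Deciding whether the parametric assumptions are met is still a judgement the researcher has to make and justify.

```python
def choose_test(parametric, paired):
    """Pick a two-group difference test from the design and the data."""
    if parametric:
        return "paired t-test" if paired else "independent t-test"
    return "Wilcoxon" if paired else "Mann-Whitney U"


# The four combinations covered in the session
for parametric in (True, False):
    for paired in (True, False):
        design = "within-participants" if paired else "between-participants"
        data = "parametric" if parametric else "non-parametric"
        print(f"{data}, {design}: {choose_test(parametric, paired)}")
```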
