
LEARNING PROGRAMME

Hypothesis testing

Intermediate Training in Quantitative Analysis

Bangkok, 19-23 November 2007

LEARNING PROGRAMME - 2

Hypothesis testing

Hypothesis testing involves:

1. defining research questions, and

2. assessing whether changes in an independent variable are associated with changes in the dependent variable by conducting a statistical test

Dependent and independent variables

Dependent variables are the outcome variables

Independent variables are the predictive/explanatory variables

LEARNING PROGRAMME - 3

Example…

Research question: Is educational level of the mother related to birthweight?

What is the dependent and independent variable?

Research question: Is access to roads related to educational level of mothers?

Now?

LEARNING PROGRAMME - 4

Test statistics

To test hypotheses, we rely on test statistics…

Test statistics are simply the result of a particular statistical test

The most common include:

1. T-tests calculate t-statistics

2. ANOVAs calculate F-statistics

3. Correlations calculate the Pearson correlation coefficient

LEARNING PROGRAMME - 5

Significant test statistics

Is the relationship observed by chance, or because there actually is a relationship between the variables?

This probability is referred to as a p-value and is expressed as a decimal (e.g. p = 0.05)

If the probability of obtaining the value of our test statistic by chance is less than 5%, then we generally accept the experimental hypothesis: there is an effect in the population

Ex: if p = 0.1, what does this mean? Do we accept the experimental hypothesis?

This probability is also referred to as the significance level (Sig.)
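The deck computes p-values through SPSS dialogs; as a side sketch only (assuming Python with SciPy is available, which the training does not require), this is how a test statistic maps to a two-tailed p-value. The statistic and degrees of freedom below are made-up illustrative numbers.

```python
# Side sketch (not part of the SPSS workflow): converting a t statistic
# into a two-tailed p-value. The numbers are hypothetical.
from scipy import stats

t_stat = 2.5   # hypothetical test statistic
df = 100       # hypothetical degrees of freedom

# Two-tailed p-value: probability of a |t| this large or larger by chance
p_value = 2 * stats.t.sf(abs(t_stat), df)

if p_value < 0.05:
    print("significant at the 5% level")
```

If p_value came out at 0.1 instead, as in the question above, we would not accept the experimental hypothesis at the 5% level.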

LEARNING PROGRAMME

Hypothesis testing
Part 1: Continuous variables

Intermediate Training in Quantitative Analysis

Bangkok, 19-23 November 2007

LEARNING PROGRAMME - 7

Topics to be covered in this presentation

T-test
One-way analysis of variance (ANOVA)
Correlation
Simple linear regression

LEARNING PROGRAMME - 8

Learning objectives

By the end of this session, the participant should be able to:
Conduct t-tests
Conduct ANOVAs
Conduct correlations
Conduct linear regressions

LEARNING PROGRAMME - 9

Hypothesis testing… WFP tests a variety of hypotheses…

Some of the most common include:

1. Looking at differences between groups of people (comparisons of means)

Ex. Are different livelihood groups likely to have different levels of food consumption?

2. Looking at the relationship between two variables…

Ex. Is asset wealth associated with food consumption?

LEARNING PROGRAMME - 10

How to assess differences in two means statistically

T-tests

LEARNING PROGRAMME - 11

T-test

A test using the t-statistic that establishes whether two means differ significantly.

Independent means t-test:
It is used in situations in which there are two experimental conditions and different participants have been used in each condition.

Dependent or paired means t-test:
This test is used when there are two experimental conditions and the same participants took part in both conditions of the experiment.

LEARNING PROGRAMME - 12

T-test assumptions

In order to conduct a t-test, the data must meet four assumptions:

1. Normally distributed
2. Measured at the interval level
3. Estimates are independent
4. Homogeneity of variance

Independent and dependent t-tests

Independent t-tests

LEARNING PROGRAMME - 13

The independent t-test

The independent t-test compares two means, when those means have come from different groups of people;

This test is the most useful for our purposes

LEARNING PROGRAMME - 14

T-tests formulas

Quite simply, the t-test formula is a ratio of:

the difference between the two means or averages / the variability or dispersion of the scores

Statistically, this formula is:

t = (mean1 − mean2) / √(variance1/n1 + variance2/n2)

LEARNING PROGRAMME - 15

Example t-test
Difference in weight-for-age z-scores between males and females in Kenya

Report: WAZNEW
Gender of child   Mean      N      Std. Deviation   Variance
Male              -1.0441   2673   1.25354          1.571
Female            -.8505    2675   1.28895          1.661
Total             -.9473    5348   1.27494          1.625

t = |(-1.0441) − (-0.8505)| / √(1.571/2673 + 1.661/2675)

t = 5.56
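The Kenya example can also be reproduced from the summary statistics alone; a side sketch assuming Python with SciPy is available (the training itself uses SPSS):

```python
# Side sketch: reproducing the slide's t-test from the Report table's
# summary statistics (means, standard deviations, group sizes).
from scipy import stats

t, p = stats.ttest_ind_from_stats(
    mean1=-1.0441, std1=1.25354, nobs1=2673,   # males
    mean2=-0.8505, std2=1.28895, nobs2=2675,   # females
    equal_var=False,                           # Welch's t-test
)

print(round(abs(t), 2))   # close to the slide's value of 5.56
print(p < 0.05)           # the difference is significant
```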

LEARNING PROGRAMME - 16

To conduct an independent t-test

In SPSS, t-tests are best run using the following steps:

1. Click on “Analyze” drop down menu
2. Click on “Compare Means”
3. Click on “Independent-Samples T Test…”
4. Move the independent and dependent variables into the proper boxes
5. Click “OK”

LEARNING PROGRAMME - 17

One note of caution about independent t-tests

It is important to ensure that the assumption of homogeneity of variance (sometimes referred to as homoscedasticity) is met

To do so:

Look at the column labelled Levene’s Test for Equality of Variance.

If the Sig. value is less than .05 then the assumption of homogeneity of variance has been broken and you should look at the row in the table labelled Equal variances not assumed.

If the Sig. value of Levene’s test is bigger than .05 then you should look at the row in the table labelled Equal variances assumed.
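Levene's test can also be run outside SPSS; a side sketch assuming SciPy is available, with made-up illustrative data:

```python
# Side sketch of Levene's test for equality of variances.
# The two groups are made up: group_b is a shifted copy of group_a,
# so their spreads are identical by construction.
from scipy import stats

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.5, 3.5, 4.5, 5.5, 6.5]

stat, p = stats.levene(group_a, group_b)

# Sig. > .05: the homogeneity-of-variance assumption holds, so we would
# read the "Equal variances assumed" row of the SPSS output.
print(p > 0.05)
```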

LEARNING PROGRAMME - 18

Testing for homogeneity of variance

Look at the column labelled Sig.: if the value is less than .05, then the means of the two groups are significantly different. Look at the values of the means to tell you how the groups differ.

Independent Samples Test: Weight for age z-score (underweight)

                             Levene's Test       t-test for Equality of Means
                             F       Sig.    t       df        Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower  95% CI Upper
Equal variances assumed      1.333   .248    -1.087  2096      .277             -.09170     .08438            -.25718       .07377
Equal variances not assumed                  -1.085  2070.504  .278             -.09170     .08450            -.25743       .07402

LEARNING PROGRAMME - 19

What to do if we want to statistically compare differences in three means?

Analysis of variance

(ANOVA)

LEARNING PROGRAMME - 20

Analysis of Variance (ANOVA)

ANOVAs are similar to t-tests; in fact, an ANOVA conducted to compare two means will give the same answer as a t-test.

ANOVAs, however, produce an F-statistic, which is an omnibus test, i.e. it tells us whether there are any differences among the means but not how (or which) means differ.

LEARNING PROGRAMME - 21

Calculating an ANOVA

ANOVA formulas: calculating an ANOVA by hand is complicated, and knowing the formulas is not necessary…

Instead, we will rely on SPSS to calculate ANOVAs…
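For readers outside SPSS, a one-way ANOVA can be sketched in Python (assuming SciPy; the groups below are made up, standing in for e.g. livelihood groups):

```python
# Side sketch of a one-way ANOVA with three made-up groups.
from scipy import stats

group1 = [4.1, 5.0, 4.8, 5.2, 4.6]
group2 = [6.9, 7.4, 7.1, 6.8, 7.3]
group3 = [9.8, 10.2, 9.9, 10.4, 10.1]

f_stat, p = stats.f_oneway(group1, group2, group3)

# A large F with a small Sig. (p) value: at least one group mean differs,
# but the F-statistic alone does not say which one (it is an omnibus test).
print(p < 0.05)
```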

LEARNING PROGRAMME - 22

Example of a one-way ANOVA

Research question: Do mean child malnutrition (GAM) rates differ according to mother’s educational level (none, primary, or secondary/higher)?

Report: WAZNEW
Mother's education level   Mean      N     Std. Deviation
No education               -1.3147   736   1.32604
Primary                    -1.0176   3247  1.21521
Secondary                  -.5525    907   1.25238
Higher                     -.1921    172   1.33764
Total                      -.9494    5062  1.27035

ANOVA: WAZNEW
                 Sum of Squares   df     Mean Square   F        Sig.
Between Groups   354.567          3      118.189       76.507   .000
Within Groups    7812.148         5057   1.545
Total            8166.715         5060

LEARNING PROGRAMME - 23

To calculate one-way ANOVAs in SPSS

In SPSS, one-way ANOVAs are run using the following steps:

1. Click on “Analyze” drop down menu

2. Click on “Compare Means”

3. Click on “One-Way ANOVA…”

4. Move the independent (factor) and dependent variable into the proper boxes

5. Click “OK”

LEARNING PROGRAMME - 24

Determining where differences exist

In addition to determining that differences exist among the means, you may want to know which means differ.

For this, one type of test is used:

Post hoc tests, which are run after the experiment has been conducted (when you don’t have a specific hypothesis about which groups differ).

LEARNING PROGRAMME - 25

ANOVA post hoc tests

Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ.

Tukey’s post hoc test is amongst the most popular and is adequate for our purposes… so we will focus on this test…

LEARNING PROGRAMME - 26

To calculate Tukey’s test in SPSS

In SPSS, Tukey’s post hoc tests are run using the following steps:

1. Click on “Analyze” drop down menu
2. Click on “Compare Means”
3. Click on “One-Way ANOVA…”
4. Move the independent and dependent variable into the proper boxes
5. Click on “Post Hoc…”
6. Check box beside “Tukey”
7. Click “Continue”
8. Click “OK”
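Outside SPSS, Tukey's HSD can be sketched with Python's statsmodels package (an assumption: the training itself only uses the SPSS dialogs; the data and group labels below are made up):

```python
# Side sketch of Tukey's HSD pairwise comparisons on made-up data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = np.array([4.1, 5.0, 4.8, 5.2,      # "none"
                   6.9, 7.4, 7.1, 6.8,      # "primary"
                   9.8, 10.2, 9.9, 10.4])   # "secondary"
groups = np.array(["none"] * 4 + ["primary"] * 4 + ["secondary"] * 4)

result = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)

# result.reject flags each pairwise comparison significant at alpha
print(result.reject)
```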

LEARNING PROGRAMME - 27

Tukey’s post hoc test

Multiple Comparisons
Dependent Variable: WAZNEW
Tukey HSD

(I) Mother's education   (J) Mother's education   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
No education             Primary                  -.2971*                 .05074       .000   -.4275         -.1667
No education             Secondary                -.7621*                 .06166       .000   -.9206         -.6037
No education             Higher                   -1.1226*                .10537       .000   -1.3933        -.8518
Primary                  No education             .2971*                  .05074       .000   .1667          .4275
Primary                  Secondary                -.4650*                 .04667       .000   -.5850         -.3451
Primary                  Higher                   -.8255*                 .09737       .000   -1.0757        -.5752
Secondary                No education             .7621*                  .06166       .000   .6037          .9206
Secondary                Primary                  .4650*                  .04667       .000   .3451          .5850
Secondary                Higher                   -.3604*                 .10348       .003   -.6263         -.0945
Higher                   No education             1.1226*                 .10537       .000   .8518          1.3933
Higher                   Primary                  .8255*                  .09737       .000   .5752          1.0757
Higher                   Secondary                .3604*                  .10348       .003   .0945          .6263

*. The mean difference is significant at the .05 level.

LEARNING PROGRAMME - 28

Other types of Post Hoc tests

There are many different post hoc tests, characterized by different adjustments/settings of the error rate for each test and for multiple comparisons.

If interested, please feel free to investigate more and to try different tests – the SPSS help might provide you some good hints!

LEARNING PROGRAMME - 29

Now what if we would like to measure how well two variables are associated with one another?

Correlations

LEARNING PROGRAMME - 30

Correlations

T-tests and ANOVAs measure differences between means

Correlations explain the strength of the linear relationship between two variables…

Pearson correlation coefficients (r) are the test statistics used to statistically measure correlations

LEARNING PROGRAMME - 31

Types of correlations

Positive correlation: two variables are positively correlated if increases (or decreases) in one variable result in increases (or decreases) in the other variable.

Negative correlation: two variables are negatively correlated if one increases (or decreases) and the other decreases (or increases).

No correlation: two variables are not correlated if there is no linear relationship between them.

Strong negative correlation        No correlation        Strong positive correlation
-1--------------------------0---------------------------1

LEARNING PROGRAMME - 32

Illustrating types of correlations

Perfect positive correlation: test statistic = 1

Positive correlation: test statistic > 0 and < 1

Perfect negative correlation: test statistic = -1

Negative correlation: test statistic < 0 and > -1

LEARNING PROGRAMME - 33

Example for the Kenya data

Correlation between children’s weight and height…

[Scatter plot: Height of child (y-axis) against Weight of child (x-axis), cases weighted by CHWEIGHT]

Is this a positive or negative correlation?

In what range would the test statistic fall?

LEARNING PROGRAMME - 34

Measuring the strength of a correlation: Pearson’s correlation coefficient

The Pearson correlation coefficient (r) is the name of the test statistic

It is measured using the following formula:

r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² · Σ(y − ȳ)² ]

It looks complicated, so we will rely on SPSS to calculate it…
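As a side sketch (assuming SciPy; the x/y values below are made up), Pearson's r can also be computed directly:

```python
# Side sketch of Pearson's correlation coefficient on made-up data.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x: strong positive relationship

r, p = stats.pearsonr(x, y)

print(r > 0.99)   # close to a perfect positive correlation
print(p < 0.05)   # and statistically significant
```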

LEARNING PROGRAMME - 35

To calculate a Pearson’s correlation coefficient in SPSS

In SPSS, correlations are run using the following steps:

1. Click on “Analyze” drop down menu

2. Click on “Correlate”

3. Click on “Bivariate…”

4. Move the variables that you are interested in assessing the correlation between into the box on the right

5. Click “OK”

LEARNING PROGRAMME - 36

Example in SPSS…

Correlations
                                 wealth    FCS
wealth   Pearson Correlation     1         .932**
         Sig. (2-tailed)                   .000
         N                       10        10
FCS      Pearson Correlation     .932**    1
         Sig. (2-tailed)         .000
         N                       10        10

**. Correlation is significant at the 0.01 level (2-tailed).

Using SPSS we get Pearson’s correlation (0.932)

LEARNING PROGRAMME - 37

1. Let’s refresh briefly: what does a correlation of 0.932 mean?

2. What does ** mean?

LEARNING PROGRAMME - 38

What if we are interested in defining this relationship further by assessing how change in one variable specifically impacts the other variable?

Linear regression

LEARNING PROGRAMME - 39

Linear regression

Linear regression allows us to statistically model the relationship between variables…

allowing us to determine how a one-unit change in an independent variable specifically impacts the dependent variable

LEARNING PROGRAMME - 40

Types of linear regression

There are two types of linear regression:

1. Simple linear regression

2. Multiple linear regression

Simple linear regression compares two variables, assessing how the independent variable affects the dependent variable (as discussed)

Multiple linear regression is more complicated – this involves assessing the relationship of two variables while taking account of the impact of other variables.

We will focus only on simple linear regression…

LEARNING PROGRAMME - 41

The mechanics of simple linear regression… put simply

Linear regression allows us to linearly model the relationship between two variables (in this case x and y), allowing us to predict how one variable would respond given changes in another

Linear regression actually fits the line that best shows the relationship between x and y, and provides the equation for this line:

Y = a + bx
  Y = dependent variable
  a = constant coefficient
  b = independent variable coefficient

Using this equation we can predict changes in the dependent variable, given changes in the independent variable

LEARNING PROGRAMME - 42

Simple linear regression

To illustrate, let’s return to the previous example of wealth index and FCS

Here, the correlation coefficient (0.932) indicates that increases in wealth index are associated with increases in FCS.

Conducting a linear regression would allow us to estimate specifically how FCS increases given increases in units of wealth index

Correlations
                                 wealth    FCS
wealth   Pearson Correlation     1         .932**
         Sig. (2-tailed)                   .000
         N                       10        10
FCS      Pearson Correlation     .932**    1
         Sig. (2-tailed)         .000
         N                       10        10

**. Correlation is significant at the 0.01 level (2-tailed).

LEARNING PROGRAMME - 43

Simple linear regression

Regressing FCS by wealth index gives the following output:

Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   38.482             2.737                            14.058   .000
wealth       14.101             1.932        .932                7.297    .000

a. Dependent Variable: FCS

Using this output, we can build the regression equation: Y = a + bx
  Y = FCS
  a = 38.482
  b = 14.101
  x = wealth index

LEARNING PROGRAMME - 44

Compiling the equation… FCS = 38.482 + 14.101 × (wealth index)

What if we wanted to predict the FCS of a household in this population that had a wealth index of 0.569?

FCS = 38.482 + 14.101 × (0.569)
FCS = 46.50

What would the predicted FCS of a household be if the wealth index is: 2.256? -1.256?
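The prediction arithmetic, and a simple linear fit, can be sketched outside SPSS (assuming SciPy; the x/y fitting data below are made up, while the 38.482 and 14.101 coefficients come from the slide's output):

```python
# Side sketch: prediction using the slide's fitted coefficients,
# FCS = 38.482 + 14.101 * (wealth index).
from scipy import stats

a, b = 38.482, 14.101
fcs = a + b * 0.569
print(round(fcs, 1))   # 46.5, matching the slide's FCS = 46.50

# Fitting Y = a + b*x on made-up data that lie exactly on y = 38 + 14x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [38.0, 52.0, 66.0, 80.0, 94.0]
fit = stats.linregress(x, y)
print(round(fit.slope, 1), round(fit.intercept, 1))   # 14.0 38.0
```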

LEARNING PROGRAMME - 45

To calculate a linear regression in SPSS…

In SPSS, linear regressions are run using the following steps:

1. Click on “Analyze” drop down menu

2. Click on “Regression”

3. Click on “Linear…”

4. Move the independent and dependent variables into the proper boxes

5. Click “OK”

LEARNING PROGRAMME - 46

Now… practical exercise!