TRANSCRIPT
LEARNING PROGRAMME
Hypothesis testing
Intermediate Training in Quantitative Analysis
Bangkok, 19-23 November 2007
Hypothesis testing
Hypothesis testing involves:
1. defining research questions, and
2. assessing whether changes in an independent variable are associated with changes in the dependent variable by conducting a statistical test.

Dependent and independent variables
Dependent variables are the outcome variables; independent variables are the predictive/explanatory variables.
Example…
Research question: Is educational level of the mother related to birthweight?
What is the dependent and independent variable?
Research question: Is access to roads related to educational level of mothers?
Now?
Test statistics
To test hypotheses, we rely on test statistics…
Test statistics are simply the result of a particular statistical test.
The most common include:
1. T-tests calculate t-statistics
2. ANOVAs calculate F-statistics
3. Correlations calculate the Pearson correlation coefficient
Significant test statistics
Is the relationship observed by chance, or because there actually is a relationship between the variables?
This probability is referred to as a p-value and is expressed as a decimal (e.g. p=0.05).
If the probability of obtaining the value of our test statistic by chance is less than 5%, then we generally accept the experimental hypothesis as true: there is an effect on the population.
Ex: if p=0.1, what does this mean? Do we accept the experimental hypothesis?
This probability is also referred to as the significance level (sig.).
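The decision rule above can be sketched in Python with SciPy. This is only an illustration: the sample values are made up, not WFP data, and the test shown (an independent-samples t-test) is one of several that produce a p-value.

```python
from scipy import stats

# Hypothetical samples from two groups (illustrative numbers only)
group_a = [2.1, 2.5, 1.9, 2.8, 2.3, 2.6, 2.2, 2.4]
group_b = [1.4, 1.7, 1.2, 1.9, 1.5, 1.6, 1.3, 1.8]

# The test returns both the test statistic and its p-value
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Conventional decision rule: treat the result as significant if p < 0.05
significant = p_value < 0.05
```

The same rule applies whatever the test statistic is: compare the p-value against the chosen significance level.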
LEARNING PROGRAMME
Hypothesis testing
Part 1: Continuous variables
Topics to be covered in this presentation
T-test
One-way analysis of variance (ANOVA)
Correlation
Simple linear regression
Learning objectives
By the end of this session, the participant should be able to:
Conduct t-tests
Conduct ANOVAs
Conduct correlations
Conduct linear regressions
Hypothesis testing…
WFP tests a variety of hypotheses…
Some of the most common include:
1. Looking at differences between groups of people (comparisons of means)
Ex. Are different livelihood groups more likely to have different levels of food consumption?
2. Looking at the relationship between two variables…
Ex. Is asset wealth associated with food consumption?
T-test
A test using the t-statistic that establishes whether two means differ significantly.
Independent means t-test: used in situations in which there are two experimental conditions and different participants have been used in each condition.
Dependent or paired means t-test: used when there are two experimental conditions and the same participants took part in both conditions of the experiment.
T-test assumptions
In order to conduct a t-test, data must be:
Normally distributed
Interval
Estimates are independent
Homogeneity of variance
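The normality assumption can be checked before running a t-test. As a sketch (hypothetical numbers, using SciPy's Shapiro-Wilk test, which is not covered in the slides), note that here the null hypothesis is that the data ARE normal, so a large p-value is reassuring:

```python
from scipy import stats

# Hypothetical sample (illustrative values only)
sample = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 1.6]

# Shapiro-Wilk test of normality: p > 0.05 means no evidence
# against the normality assumption
w_stat, p_value = stats.shapiro(sample)
normality_ok = p_value > 0.05
```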
The independent t-test
The independent t-test compares two means, when those means have come from different groups of people;
This test is the most useful for our purposes
T-tests formulas
Quite simply, the t-test formula is a ratio of the difference between the two means (averages) to the variability, or dispersion, of the scores.
Statistically, this formula is:

t = (mean₁ − mean₂) / √(s₁²/n₁ + s₂²/n₂)

where s² is each group's variance and n is each group's sample size.
Example t-test
Difference in weight-for-age z-scores between males and females in Kenya

Report (WAZNEW):
Gender of child   Mean      N     Std. Deviation   Variance
Male              -1.0441   2673  1.25354          1.571
Female            -.8505    2675  1.28895          1.661
Total             -.9473    5348  1.27494          1.625

t = (−1.0441 − (−0.8505)) / √(1.571/2673 + 1.661/2675)

T-test = 5.56 (in absolute value)
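The slide's arithmetic can be reproduced from the summary table alone, without the raw data. A sketch in Python with SciPy; `equal_var=False` matches the variance/n formula shown above, and the small sign and rounding differences from the slide's 5.56 come from rounding the summary statistics:

```python
from scipy import stats

# Summary statistics from the slide's Kenya table (weight-for-age z-scores)
mean_m, sd_m, n_m = -1.0441, 1.25354, 2673  # males
mean_f, sd_f, n_f = -0.8505, 1.28895, 2675  # females

# t-test computed directly from summary statistics
t_stat, p_value = stats.ttest_ind_from_stats(
    mean_m, sd_m, n_m, mean_f, sd_f, n_f, equal_var=False
)
```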
To conduct an independent t-test
In SPSS, t-tests are best run using the following steps:
1. Click on the "Analyze" drop-down menu
2. Click on "Compare Means"
3. Click on "Independent-Samples T Test…"
4. Move the independent and dependent variable into the proper boxes
5. Click "OK"
One note of caution about independent t-tests
It is important to ensure that the assumption of homogeneity of variance (sometimes referred to as homoscedasticity) is met.
To do so:
Look at the column labelled Levene’s Test for Equality of Variance.
If the Sig. value is less than .05 then the assumption of homogeneity of variance has been broken and you should look at the row in the table labelled Equal variances not assumed.
If the Sig. value of Levene’s test is bigger than .05 then you should look at the row in the table labelled Equal variances assumed.
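The same Levene-first logic can be mirrored outside SPSS. A sketch with hypothetical numbers, using SciPy (the variable names are my own, not SPSS output):

```python
from scipy import stats

# Two hypothetical groups (illustrative values only)
group_1 = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3]
group_2 = [2.0, 6.5, 1.2, 7.8, 3.1, 5.9]

# Levene's test: the null hypothesis is that the variances are equal
levene_stat, levene_p = stats.levene(group_1, group_2)

# Mirror SPSS's rule: if Levene's Sig. >= .05, use the
# "equal variances assumed" form of the t-test, otherwise not
equal_var = levene_p >= 0.05
t_stat, p_value = stats.ttest_ind(group_1, group_2, equal_var=equal_var)
```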
Testing for homogeneity of variance
Look at the column labelled Sig.: if the value is less than .05, then the means of the two groups are significantly different. Look at the values of the means to tell you how the groups differ.

Independent Samples Test: Weight-for-age z-score (underweight)

                             Levene's Test    t-test for Equality of Means
                             F      Sig.      t       df        Sig. (2-tailed)  Mean Diff.  Std. Error  95% CI Lower  95% CI Upper
Equal variances assumed      1.333  .248      -1.087  2096      .277             -.09170     .08438      -.25718       .07377
Equal variances not assumed                   -1.085  2070.504  .278             -.09170     .08450      -.25743       .07402
What to do if we want to statistically compare differences in three means?
Analysis of variance
(ANOVA)
Analysis of Variance (ANOVA)
ANOVAs are similar to t-tests; in fact, an ANOVA conducted to compare two means will give the same answer as a t-test.
ANOVAs, however, produce an F-statistic, which is an omnibus test, i.e. it tells us if there are any differences among the different means, but not how (or which) means differ.
Calculating an ANOVA
ANOVA formulas: calculating an ANOVA by hand is complicated, and knowing the formulas is not necessary…
Instead, we will rely on SPSS to calculate ANOVAs…
Example of a one-way ANOVA
Research question: Do mean child malnutrition (GAM) rates differ according to mother's educational level (none, primary, or secondary/higher)?

Report (WAZNEW):
Mother's education level   Mean      N     Std. Deviation
No education               -1.3147   736   1.32604
Primary                    -1.0176   3247  1.21521
Secondary                  -.5525    907   1.25238
Higher                     -.1921    172   1.33764
Total                      -.9494    5062  1.27035

ANOVA (WAZNEW):
                 Sum of Squares  df    Mean Square  F       Sig.
Between Groups   354.567         3     118.189      76.507  .000
Within Groups    7812.148        5057  1.545
Total            8166.715        5060
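A one-way ANOVA of this kind can be sketched in Python with SciPy. The groups below are hypothetical z-scores I made up for illustration, not the Kenya data:

```python
from scipy import stats

# Hypothetical weight-for-age z-scores for three education groups
none      = [-1.8, -1.2, -1.5, -1.0, -1.6]
primary   = [-1.1, -0.9, -1.3, -0.8, -1.0]
secondary = [-0.4, -0.6, -0.2, -0.5, -0.3]

# One-way ANOVA: the F-statistic tells us whether ANY of the
# group means differ, but not which ones (an omnibus test)
f_stat, p_value = stats.f_oneway(none, primary, secondary)
```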
To calculate one-way ANOVAs in SPSS
In SPSS, one-way ANOVAs are run using the following steps:
1. Click on the "Analyze" drop-down menu
2. Click on "Compare Means"
3. Click on "One-Way ANOVA…"
4. Move the independent (factor) and dependent variable into the proper boxes
5. Click "OK"
Determining where differences exist
In addition to determining that differences exist among the means, you may want to know which means differ.
One type of test for comparing means is the post hoc test: post hoc tests are run after the experiment has been conducted (when you don't have a specific hypothesis).
ANOVA post hoc tests
Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ.
Tukey's post hoc test is among the most popular and is adequate for our purposes, so we will focus on this test…
To calculate Tukey's test in SPSS
In SPSS, Tukey's post hoc tests are run using the following steps:
1. Click on the "Analyze" drop-down menu
2. Click on "Compare Means"
3. Click on "One-Way ANOVA…"
4. Move the independent and dependent variable into the proper boxes
5. Click on "Post Hoc…"
6. Check the box beside "Tukey"
7. Click "Continue"
8. Click "OK"
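Outside SPSS, the same pairwise comparisons can be sketched with SciPy's `tukey_hsd` (available in SciPy 1.8+). The groups are the same hypothetical values as in the ANOVA sketch, not real survey data:

```python
from scipy import stats

# Hypothetical z-scores for three education groups (illustrative only)
none      = [-1.8, -1.2, -1.5, -1.0, -1.6]
primary   = [-1.1, -0.9, -1.3, -0.8, -1.0]
secondary = [-0.4, -0.6, -0.2, -0.5, -0.3]

# Tukey's HSD: all pairwise comparisons, with the family-wise
# error rate controlled across the set of comparisons
result = stats.tukey_hsd(none, primary, secondary)

# result.pvalue[i][j] is the adjusted p-value comparing group i to group j
none_vs_secondary_p = result.pvalue[0][2]
```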
Tukey's post hoc test
Multiple Comparisons. Dependent Variable: WAZNEW. Tukey HSD.

(I) Mother's     (J) Mother's     Mean Difference  Std. Error  Sig.   95% CI Lower  95% CI Upper
education level  education level  (I-J)
No education     Primary          -.2971*          .05074      .000   -.4275        -.1667
No education     Secondary        -.7621*          .06166      .000   -.9206        -.6037
No education     Higher           -1.1226*         .10537      .000   -1.3933       -.8518
Primary          No education     .2971*           .05074      .000   .1667         .4275
Primary          Secondary        -.4650*          .04667      .000   -.5850        -.3451
Primary          Higher           -.8255*          .09737      .000   -1.0757       -.5752
Secondary        No education     .7621*           .06166      .000   .6037         .9206
Secondary        Primary          .4650*           .04667      .000   .3451         .5850
Secondary        Higher           -.3604*          .10348      .003   -.6263        -.0945
Higher           No education     1.1226*          .10537      .000   .8518         1.3933
Higher           Primary          .8255*           .09737      .000   .5752         1.0757
Higher           Secondary        .3604*           .10348      .003   .0945         .6263

*. The mean difference is significant at the .05 level.
Other types of post hoc tests
There are many different post hoc tests, characterized by different adjustments to the error rate for each test and for multiple comparisons.
If interested, please feel free to investigate further and to try different tests; SPSS help might provide you some good hints!
Now what if we would like to measure how well two variables are associated with one another?
Correlations
Correlations
T-tests and ANOVAs measure differences between means.
Correlations explain the strength of the linear relationship between two variables…
Pearson correlation coefficients (r) are the test statistics used to statistically measure correlations.
Types of correlations
Positive correlations: two variables are positively correlated if increases (or decreases) in one variable result in increases (or decreases) in the other variable.
Negative correlations: two variables are negatively correlated if one increases (or decreases) and the other decreases (or increases).
No correlation: two variables are not correlated if there is no linear relationship between them.

Strong negative correlation        No correlation        Strong positive correlation
-1--------------------------0---------------------------1
Illustrating types of correlations
Perfect positive correlation: test statistic = 1
Positive correlation: test statistic > 0 and < 1
Perfect negative correlation: test statistic = -1
Negative correlation: test statistic < 0 and > -1
Example for the Kenya data
Correlation between children's weight and height…
[Scatterplot: height of child (200-1400) against weight of child (0-300); cases weighted by CHWEIGHT]
Is this a positive or negative correlation?
In what range would the test statistic fall?
Measuring the strength of a correlation: Pearson's correlation coefficient
The Pearson correlation coefficient (r) is the name of the test statistic.
It is measured using the following formula:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )

This looks complicated, so we will rely on SPSS to calculate it…
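The formula is less intimidating in code. A sketch with hypothetical paired observations (not the WFP data), computing r both by the formula and with SciPy to show they agree:

```python
import math
from scipy import stats

# Hypothetical paired observations (illustrative only)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

# Pearson's r by the formula: summed cross-products of deviations
# over the square root of the product of summed squared deviations
mx, my = sum(x) / len(x), sum(y) / len(y)
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                sum((yi - my) ** 2 for yi in y))
r_manual = num / den

# The same coefficient from SciPy, with its p-value
r_scipy, p_value = stats.pearsonr(x, y)
```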
To calculate a Pearson’s correlation coefficient in SPSS
In SPSS, correlations are run using the following steps:
1. Click on “Analyze” drop down menu
2. Click on “Correlate”
3. Click on “Bivariate…”
4. Move the variables that you are interested in assessing the correlation between into the box on the right
5. Click “OK”
Example in SPSS…

Correlations
                               wealth   FCS
wealth  Pearson Correlation    1        .932**
        Sig. (2-tailed)                 .000
        N                      10       10
FCS     Pearson Correlation    .932**   1
        Sig. (2-tailed)        .000
        N                      10       10
**. Correlation is significant at the 0.01 level (2-tailed).

Using SPSS we get Pearson's correlation (0.932).
1. Let's refresh briefly: what does a correlation of 0.932 mean?
2. What does ** mean?
What if we are interested in defining this relationship further by assessing how change in one variable specifically impacts the other variable?
Linear regression
Linear regression
Allows us to statistically model the relationship between variables…
allowing us to determine how a one-unit change in an independent variable specifically impacts the dependent variable.
Types of linear regression
There are two types of linear regression:
1. Simple linear regression
2. Multiple linear regression

Simple linear regression compares two variables, assessing how the independent variable affects the dependent variable (as discussed).
Multiple linear regression is more complicated: it involves assessing the relationship between two variables while taking account of the impact of other variables.
We will focus only on simple linear regression…
The mechanics of simple linear regression… put simply
Linear regression allows us to linearly model the relationship between two variables (in this case x and y), allowing us to predict how one variable would respond given changes in another.
Linear regression actually fits the line that best shows the relationship between x and y and provides the equation for this line:

Y = a + bx
Y = dependent variable
a = constant coefficient (intercept)
b = independent variable coefficient (slope)
x = independent variable

Using this equation we can predict changes in the dependent variable, given changes in the independent variable.
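Fitting the line Y = a + bx can be sketched in Python with SciPy's `linregress`. The (x, y) pairs below are made-up numbers for illustration, not the wealth/FCS data:

```python
from scipy import stats

# Hypothetical (x, y) pairs with a roughly linear relationship
x = [0.2, 0.5, 0.9, 1.1, 1.5, 1.8, 2.1, 2.4]
y = [41.0, 45.5, 51.2, 53.9, 60.1, 63.8, 68.2, 72.5]

# Ordinary least squares fit of y = a + b*x
fit = stats.linregress(x, y)
a, b = fit.intercept, fit.slope

# Predict y for a new x using the fitted equation
y_hat = a + b * 1.0
```

`fit.rvalue` is the Pearson correlation between x and y, which ties the regression back to the correlation slides.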
Simple linear regression
To illustrate, let's return to the previous example of wealth index and FCS.
Here, the correlation coefficient (0.932) indicates that increases in wealth index are associated with increases in FCS.
Conducting a linear regression would allow us to estimate specifically how FCS increases given increases in units of wealth index.
Simple linear regression
Regressing FCS by wealth index gives the following output:

Coefficients (Dependent Variable: FCS)
             Unstandardized B  Std. Error  Standardized Beta  t       Sig.
(Constant)   38.482            2.737                          14.058  .000
wealth       14.101            1.932       .932               7.297   .000

Using this output, we can build the regression equation: Y = a + bx
Y = FCS
a = 38.482
b = 14.101
x = wealth index
Compiling the equation…
FCS = 38.482 + 14.101 × (wealth index)
What if we wanted to predict the FCS of a household in this population that had a wealth index of 0.569?
FCS = 38.482 + 14.101 × (0.569)
FCS = 46.50
What would the predicted FCS of a household be if the wealth index is 2.256? -1.256?
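The prediction step above is plain arithmetic; a short sketch using the slide's own coefficients (the function name is mine, for illustration):

```python
# Coefficients taken from the slide's regression output
a = 38.482  # constant (intercept)
b = 14.101  # wealth-index coefficient (slope)

def predict_fcs(wealth_index):
    """Predicted FCS for a given wealth index: FCS = a + b * wealth_index."""
    return a + b * wealth_index

fcs_example = predict_fcs(0.569)   # the slide's worked example, about 46.50
fcs_high = predict_fcs(2.256)      # exercise value
fcs_low = predict_fcs(-1.256)      # exercise value
```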
To calculate a linear regression in SPSS…
In SPSS, linear regressions are run using the following steps:
1. Click on “Analyze” drop down menu
2. Click on “Regression”
3. Click on “Linear…”
4. Move the independent and dependent variables into the proper boxes
5. Click “OK”