Analyzing Experimental Research Data: The T-test, ANOVA, and Chi-square
Atula Ahuja, Changyan Shi
Experimental Design and Statistics
The experimental design determines the statistical test to be used to analyze the data.
There are several tools and procedures for analyzing quantitative data obtained from different types of experimental designs. Different designs call for different methods of analysis. This presentation focuses on:
1. T-test
2. Analysis of variance (F-test)
3. Chi-square test
The Logic of Significance Testing
The result of an inferential statistical test tells us whether the outcome of an experiment would occur frequently or rarely by chance.
Small values of the test statistic occur frequently by chance (the null hypothesis is retained), whereas large values occur rarely by chance (the null hypothesis is rejected). A large test statistic corresponds to a small p value.
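The p value is the probability of obtaining a result at least as extreme as the one observed when the null hypothesis is true. The level of significance is the probability of rejecting the null hypothesis when it is true.
Traditionally, statisticians regard any event that occurs by chance 5 times or fewer in 100 occasions as a rare event (i.e., the .05 level of significance).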
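The idea of a result occurring "rarely by chance" can be made concrete with a small simulation. The sketch below is a permutation check, which is not one of the three tests covered in this presentation. The scores and the function name `chance_proportion` are invented for illustration. It shuffles the scores between two groups many times and counts how often chance alone produces a mean difference at least as large as the observed one.

```python
import random
from statistics import mean

def chance_proportion(group_a, group_b, n_shuffles=5000, seed=0):
    """Proportion of random regroupings whose absolute mean difference
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical scores, invented for illustration only.
scores_a = [72, 75, 78, 80, 83, 85]
scores_b = [60, 62, 65, 68, 70, 71]

p_value = chance_proportion(scores_a, scores_b)
alpha = 0.05  # conventional .05 level of significance
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"estimated p = {p_value:.4f} -> {decision}")
```

A proportion at or below .05 marks the observed difference as a rare event under the null hypothesis.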
T-Test
When the means of two independent groups are to be compared, we can use the t-test. This test helps determine how confident we can be that the difference between the two groups after the treatment is not due to chance. The researcher calculates a t-value from the sample means and standard deviations and compares it against a tabulated value. If the calculated value exceeds the tabulated value, the null hypothesis is rejected and we can say that the difference between the two groups is significant.
Example for t-test
A researcher compares the performance of two randomly selected groups learning French. Both groups follow up their frontal lessons with practice sessions.
The experimental group has practice sessions with the aid of a computer.
The control group has practice sessions with a teacher.
The researcher investigates the effect of the computer practice sessions on students’ achievement in French.
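Hypothesis testing
$H_0: \mu_1 = \mu_2$ (the mean scores of the two groups are equal)
$H_1: \mu_1 \neq \mu_2$
$$t_{cal} = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\sqrt{\hat{\sigma}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $\hat{\mu}_1$ and $\hat{\mu}_2$ are the sample means, $\hat{\sigma}^2$ is the pooled variance estimate, and $n_1$ and $n_2$ are the group sizes.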
The researcher can use the t-test to test the hypothesis.
Suppose that, upon calculation, the researcher finds $t_{cal} = 1.99$.
Upon comparison with the tabulated value of t at the 0.05 level of significance, it is found that $t_{cal} > t_{tab}$.
This means the null hypothesis is rejected: the difference between the two groups is significant.
The result is reported as t = 1.99, p < .05.
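The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual pooled-variance form of the t statistic. The scores are invented (the presentation does not list the raw French scores), and the function name `pooled_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Independent-samples t statistic with a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their d.f.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical French test scores, invented for illustration only.
experimental = [78, 85, 82, 90, 74, 88, 81, 79, 86, 83]  # computer practice
control = [72, 80, 75, 84, 70, 79, 77, 73, 81, 76]       # teacher practice

t_cal, df = pooled_t(experimental, control)
t_tab = 2.101  # tabulated two-tailed value for df = 18 at the .05 level
decision = "reject H0" if abs(t_cal) > t_tab else "fail to reject H0"
print(f"t_cal = {t_cal:.2f}, df = {df}, t_tab = {t_tab} -> {decision}")
```

The tabulated value is hard-coded from a standard t table. With other group sizes it has to be looked up again for the new degrees of freedom.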
One-Way ANOVA
When there are more than two groups, the appropriate procedure is analysis of variance (ANOVA), in which we analyze the variability within groups and the variability between groups. The test used in ANOVA is the F-test.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all of the population means are the same
The “F-test”
$$F = \frac{\text{Variability between groups}}{\text{Variability within groups}}$$
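Written out under the standard one-way ANOVA definitions, the two sources of variability and their ratio take the form below. The symbols $k$ (number of groups), $N$ (total number of observations), $n_j$ (size of group $j$), $\bar{x}_j$ (group mean) and $\bar{x}$ (grand mean) are notation introduced here for illustration, not taken from the slides.

```latex
\[
SS_B = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2, \qquad
SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2, \qquad
F = \frac{SS_B / (k - 1)}{SS_W / (N - k)}
\]
```

Each sum of squares is divided by its degrees of freedom before the ratio is taken, so F compares two mean squares.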
Example
TM1   TM2   TM3   TM4
60    50    40    57
67    52    45    67
42    43    43    54
67    67    55    67
56    67    46    69
62    59    61    69
64    67    45    68
59    64    52    65
72    63    53    70
71    65    63    68
Step 1) Calculate the sum of squares between groups:
Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 50.3
Mean for group 4 = 65.4
Grand mean = 59.35
SSB = [(62 - 59.35)² + (59.7 - 59.35)² + (50.3 - 59.35)² + (65.4 - 59.35)²] x n per group = 125.65 x 10 = 1256.5, where n = number of observations in each group
Step 2) Calculate the sum of squares within groups:
SSW = (60 - 62)² + (67 - 62)² + (42 - 62)² + (67 - 62)² + (56 - 62)² + (62 - 62)² + (64 - 62)² + (59 - 62)² + (72 - 62)² + (71 - 62)² + (50 - 59.7)² + (52 - 59.7)² + (43 - 59.7)² + (67 - 59.7)² + (67 - 59.7)² + (59 - 59.7)² + … = 2142.6
Step 3) Fill in the ANOVA table
Source of variation   d.f.   Sum of squares   Mean sum of squares   F-statistic   p-value
Between               3      1256.5           418.8                 7.04          0.001
Within                36     2142.6           59.5
Total                 39     3399.1
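The three steps can be checked by recomputing the sums of squares directly from the 40 listed scores. The sketch below uses only the Python standard library, and the variable names are illustrative. It works from the raw scores rather than from rounded means, so it serves as an independent check of the hand calculation. It does not compute the p-value, which would need an F table or a statistics package.

```python
from statistics import mean

# Scores for the four teaching methods, copied from the example table.
groups = {
    "TM1": [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],
    "TM2": [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],
    "TM3": [40, 45, 43, 55, 46, 61, 45, 52, 53, 63],
    "TM4": [57, 67, 54, 67, 69, 69, 68, 65, 70, 68],
}

group_means = {name: mean(xs) for name, xs in groups.items()}
all_scores = [x for xs in groups.values() for x in xs]
grand_mean = mean(all_scores)

# Step 1: sum of squares between groups.
ssb = sum(len(xs) * (group_means[name] - grand_mean) ** 2
          for name, xs in groups.items())
# Step 2: sum of squares within groups.
ssw = sum((x - group_means[name]) ** 2
          for name, xs in groups.items() for x in xs)

# Step 3: degrees of freedom, mean squares and the F statistic.
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
msb, msw = ssb / df_between, ssw / df_within
f_stat = msb / msw

print(f"Between: df={df_between}, SS={ssb:.1f}, MS={msb:.1f}, F={f_stat:.2f}")
print(f"Within:  df={df_within}, SS={ssw:.1f}, MS={msw:.1f}")
print(f"Total:   df={df_between + df_within}, SS={ssb + ssw:.1f}")
```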
Factorial ANOVA (Two way)
Used when two or more factors are involved (e.g., age, gender, level of competence).
The aim is to test whether there is also an interaction between teaching method and gender, age, etc.
Gender   Teaching method 1   Teaching method 2   Teaching method 3   Teaching method 4
1        60                  50                  48                  47
1        67                  52                  49                  67
1        42                  43                  50                  54
1        67                  67                  55                  67
1        56                  67                  56                  68
2        62                  59                  61                  65
2        64                  67                  61                  65
2        59                  64                  60                  56
2        72                  63                  59                  60
2        71                  65                  64                  65
          TM 1              TM 2              TM 3              TM 4              Average
Male      Mean=58.4, N=5    Mean=55.8, N=5    Mean=51.6, N=5    Mean=60.6, N=5    Mean=56.6, N=20
Female    Mean=65.6, N=5    Mean=63.6, N=5    Mean=61, N=5      Mean=62.2, N=5    Mean=63.1, N=20
Average   62, N=10          59.7, N=10        56.3, N=10        61.4, N=10        59.9, N=40
Using the between, within and interaction sums of squares, we create the ANOVA table and calculate the F-statistics associated with the main effects and the interaction effect.
The hypothesis test procedure is the same as before.
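The sums of squares for this design can be sketched as follows. This is a minimal illustration for a balanced design with five scores per cell. It assumes that gender code 1 is male and code 2 is female, which is the coding that reproduces the cell means in the summary table. The variable names are illustrative.

```python
from statistics import mean

methods = ["TM1", "TM2", "TM3", "TM4"]

# Scores copied from the data table: one tuple per student, one column per
# teaching method. Assumption: gender code 1 = male, code 2 = female.
scores = {
    "male": [(60, 50, 48, 47), (67, 52, 49, 67), (42, 43, 50, 54),
             (67, 67, 55, 67), (56, 67, 56, 68)],
    "female": [(62, 59, 61, 65), (64, 67, 61, 65), (59, 64, 60, 56),
               (72, 63, 59, 60), (71, 65, 64, 65)],
}

# cells[(gender, method)] holds the five scores in that cell.
cells = {(g, m): [row[j] for row in rows]
         for g, rows in scores.items() for j, m in enumerate(methods)}
n_cell = 5

grand = mean([x for xs in cells.values() for x in xs])
gender_mean = {g: mean([x for (gg, _), xs in cells.items() if gg == g for x in xs])
               for g in scores}
method_mean = {m: mean([x for (_, mm), xs in cells.items() if mm == m for x in xs])
               for m in methods}
cell_mean = {key: mean(xs) for key, xs in cells.items()}

# Sums of squares for a balanced two-way design.
ss_gender = len(methods) * n_cell * sum((v - grand) ** 2 for v in gender_mean.values())
ss_method = len(scores) * n_cell * sum((v - grand) ** 2 for v in method_mean.values())
ss_inter = n_cell * sum(
    (cell_mean[g, m] - gender_mean[g] - method_mean[m] + grand) ** 2
    for g, m in cells)
ss_within = sum((x - cell_mean[key]) ** 2 for key, xs in cells.items() for x in xs)

df_gender, df_method = len(scores) - 1, len(methods) - 1
df_inter = df_gender * df_method
df_within = len(cells) * (n_cell - 1)
ms_within = ss_within / df_within

f_gender = (ss_gender / df_gender) / ms_within
f_method = (ss_method / df_method) / ms_within
f_inter = (ss_inter / df_inter) / ms_within

for label, ss, df, f in [("Gender", ss_gender, df_gender, f_gender),
                         ("Method", ss_method, df_method, f_method),
                         ("Interaction", ss_inter, df_inter, f_inter)]:
    print(f"{label}: SS={ss:.1f}, df={df}, F={f:.2f}")
print(f"Within: SS={ss_within:.1f}, df={df_within}")
```

Each F value is then compared with the tabulated F for its own degrees of freedom, in the same way as in the one-way case.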
Chi Square (χ²)
The most obvious difference between the chi-square tests and the other hypothesis tests we have considered (t and ANOVA) is the nature of the data.
For chi-square, the data are frequencies rather than numerical scores.
Chi-square is used to measure the difference between what we actually observe and what we would expect to find if the null hypothesis were true.
The chi-square statistic is calculated as follows:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
where o = observed frequencies and e = expected frequencies.
Case Study: Using chi-squared to analyse questionnaire responses
Data were collected on citizens’ viewpoints about the building of the 2012 Olympic venue at Stratford, to find out whether viewpoints differed between groups.
Through a survey/questionnaire, 20 responses were collected from each category of local person about the usefulness of the new Olympic developments. The statement posed was: ‘The 2012 Olympic Games development will be of benefit to the whole community of Stratford, east London.’
Response scale: 1 = Strongly agree, 2 = Agree, 3 = Disagree, 4 = Strongly disagree
Results of the survey
Category (type)          Frequency of negative responses (observed values: o)
Business owner           4
School student           6
Adult male resident      14
Adult female resident    10
Senior citizen           16
Twenty people responded from each category. Only the frequency of negative responses was recorded, i.e. the number of people who either disagreed or strongly disagreed with the statement.
The expected frequency (e) is the mean frequency of negative responses, calculated by adding up the observed frequencies (o) and dividing by the number of categories, i.e. 50 / 5. This gives an expected frequency of 10 for each category.
              Business owner   School student   Adult male resident   Adult female resident   Senior citizen   Total
o             4                6                14                    10                      16               50
e             10               10               10                    10                      10               50
o - e         -6               -4               4                     0                       6                --
(o - e)²      36               16               16                    0                       36               --
(o - e)²/e    3.6              1.6              1.6                   0                       3.6              --
χ²            3.6              1.6              1.6                   0                       3.6              10.4
Interpreting the Chi-Squared Value
Calculated chi-square value = 10.4, with 4 degrees of freedom. Critical values for 4 df are:
Significance level   0.10    0.05    0.01    0.005
Confidence level     90%     95%     99%     99.5%
Critical value       7.78    9.49    13.28   14.86
To reject the null hypothesis (H₀), the chi-squared value must be greater than the critical value at the 0.05 level of significance. Since 10.4 is greater than that critical value, 9.49, we can reject the null hypothesis (H₀).
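The case-study calculation can be reproduced with a short sketch. The observed frequencies and the critical value are taken from the tables in this case study. The variable names are illustrative, and the expected frequency is assumed equal across categories, as in the worked table.

```python
# Observed negative responses per category, from the survey results.
observed = {
    "Business owner": 4,
    "School student": 6,
    "Adult male resident": 14,
    "Adult female resident": 10,
    "Senior citizen": 16,
}

# Expected frequency: the mean of the observed frequencies (50 / 5 = 10).
expected = sum(observed.values()) / len(observed)

# Chi-square: sum of (o - e)^2 / e over all categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

df = len(observed) - 1    # degrees of freedom = categories - 1
critical_value = 9.49     # 4 df, 0.05 level, from the table of critical values
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.1f}, df = {df}, reject H0: {reject_h0}")
```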
Summary of different statistical procedures
Different types of data analysis are appropriate for different types of research problems.
Qualitative: for data collected through procedures of a low level of explicitness.
Descriptive: uses different types of descriptive statistics (frequencies, central tendencies, and variabilities).
Correlational analysis: examines the relationships between variables.
Multivariate procedures: examine more complex relationships, dealing with a number of variables at a time.
Experimental research
Procedures for analyzing data from experimental research:
T-test: examines whether the difference between two samples is statistically significant.
One-way analysis of variance: examines differences between more than two groups.
Factorial analysis of variance: analyzes the effect of a treatment under more complex conditions.
Chi-square: compares frequencies observed in a sample with some theoretically expected frequencies.
Using Computer for Data Analysis
The most popular package is SPSS and its updated versions. (Researchers are advised to find out which packages are available when preparing the research proposal.)
Different phases in performing computer data analysis
Phase 1: prepare the data collection tools with a coding system integrated into the procedures.
Phase 2: the data are transferred to coding sheets. (For an example, see the next slide.)
Phase 3: the data are transferred to the computer database (with professional help).
Phase 4: choose an appropriate program for the analysis (expert advice is encouraged).
Phase 5: get results.
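The phases can be illustrated with a small sketch. It assumes the four-point response scale from the chi-square case study as the coding system. The raw responses are invented for illustration, and the layout of the coding sheet (one coded row per respondent) is an assumption, not the layout shown in the presentation.

```python
from collections import Counter

# Phase 1: coding system built into the questionnaire (scale 1 to 4).
CODES = {"Strongly agree": 1, "Agree": 2, "Disagree": 3, "Strongly disagree": 4}

# Hypothetical raw responses (category, answer), invented for illustration.
raw_responses = [
    ("Business owner", "Agree"),
    ("Business owner", "Disagree"),
    ("School student", "Strongly agree"),
    ("Senior citizen", "Strongly disagree"),
    ("Senior citizen", "Disagree"),
]

# Phases 2-3: transfer the answers to a coding sheet, one coded row each.
coding_sheet = [{"category": category, "code": CODES[answer]}
                for category, answer in raw_responses]

# Phases 4-5: a simple analysis, counting negative responses (codes 3 and 4).
negative_counts = Counter(row["category"] for row in coding_sheet
                          if row["code"] >= 3)
print(dict(negative_counts))
```

Counts of this kind are the observed frequencies that a chi-square analysis takes as input.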
Coding sheet
Caution for the researcher during the data analysis
Have a “feel” for the results and use intuition to spot false results or errors.
Keep a close watch on the results to check that they are sensible.
Understand the statistics used for the data analysis.
Become acquainted with the specific statistical procedures.