analyzing experimental research data

Analyzing Experimental Research Data: The T-test, ANOVA, and Chi-square

Atula Ahuja, Changyan Shi

The experimental design determines the statistical test to be used to analyze the data.

There are several tools and procedures for analyzing quantitative data obtained from different types of experimental designs. Different designs call for different methods of analysis. This presentation focuses on:

1. T-test
2. Analysis of variance (F-test), and
3. Chi-square test

Experimental Design and Statistics

The Logic of Significance Testing

The result of an inferential statistical test indicates whether the results of an experiment would occur frequently or rarely by chance.

Results with large p-values occur frequently by chance (we fail to reject the null hypothesis), whereas results with small p-values occur rarely by chance (we reject the null hypothesis).


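The p-value is the probability of obtaining a result at least as extreme as the one observed when the null hypothesis is true. The significance level is the probability of rejecting the null hypothesis when it is true.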

Traditionally, statisticians say that any event that occurs by chance 5 times or fewer in 100 occasions is a rare event (i.e., the .05 level of significance).

T-Test

When the means of two independent groups are to be compared, we can use the t-test. This test helps determine how confident we can be that the difference between the two groups as a result of the treatment is not due to chance. The researcher calculates a t-value using the sample means and standard deviations and compares the calculated t-value against a tabulated value. If the null hypothesis is rejected, we can say that the difference between the two groups is significant.

Example for t-test

A researcher compares the performance of two randomly selected groups learning French. The two groups follow up their frontal lessons with practice sessions. The Experimental Group gets practice sessions with the aid of the computer. The Control Group has practice sessions with a teacher. The researcher investigates the effects of the computer practice sessions on students' achievement in French.

Hypotheses testing

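$$H_0: \mu_1 = \mu_2 \quad \text{(mean scores of the two groups are equal)}$$

$$H_1: \mu_1 \neq \mu_2$$

$$t_{cal} = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\sqrt{\hat{\sigma}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$

where $\hat{\mu}_1$ and $\hat{\mu}_2$ are the sample means, $\hat{\sigma}^2$ is the estimated (pooled) variance, and $n_1$ and $n_2$ are the group sizes.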

The researcher can use a t-test to test the hypothesis.

Suppose, upon calculation, the researcher finds that $t_{cal}$ = 1.99.


Upon comparison with the tabulated value of t at the 0.05 level of significance, it is found that $t_{cal} > t_{tab}$.

This means the null hypothesis is rejected and the difference between the two groups is statistically significant.

The result is reported as t = 1.99, p < .05.
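The calculation of the t-value can be sketched in Python. This is a minimal illustration that assumes the pooled-variance form of the two-sample t statistic; the scores are hypothetical (the slides give no raw data for the French example), and the function name `pooled_t` is invented for this sketch.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Return the two-sample t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, n1 + n2 - 2


# Hypothetical French test scores, invented for illustration only.
experimental = [78, 85, 69, 91, 74, 88, 80, 77]
control = [72, 70, 65, 84, 68, 79, 71, 66]

t_cal, df = pooled_t(experimental, control)
T_TAB = 2.145  # tabulated two-tailed t for df = 14 at the .05 level

print(f"t_cal = {t_cal:.2f}, df = {df}")
print("reject H0" if abs(t_cal) > T_TAB else "fail to reject H0")
```

The tabulated value has to be looked up for the actual degrees of freedom of the study; 2.145 applies only to the 14 degrees of freedom of this made-up sample.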

One-Way ANOVA

When there are more than two groups, the appropriate procedure is ANOVA, in which we analyze the variability within groups and the variability between groups. The test used in ANOVA is the F-test.


$$H_0: \mu_1 = \mu_2 = \mu_3$$

$$H_1: \text{Not all of the population means are the same}$$

The "F-test"

$$F = \frac{\text{Variability between groups}}{\text{Variability within groups}}$$

Example

TM1   TM2   TM3   TM4
60    50    40    57
67    52    45    67
42    43    43    54
67    67    55    67
56    67    46    69
62    59    61    69
64    67    45    68
59    64    52    65
72    63    53    70
71    65    63    68

Step 1) Calculate the sum of squares between groups:

Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 50.3
Mean for group 4 = 65.4
Grand mean = 59.35

SSB = [(62 - 59.35)² + (59.7 - 59.35)² + (50.3 - 59.35)² + (65.4 - 59.35)²] x n per group = 125.65 x 10 = 1256.5, where n = number of observations in each group.

Step 2) Calculate the sum of squares within groups:

SSW = (60 - 62)² + (67 - 62)² + (42 - 62)² + (67 - 62)² + (56 - 62)² + (62 - 62)² + (64 - 62)² + (59 - 62)² + (72 - 62)² + (71 - 62)² + (50 - 59.7)² + (52 - 59.7)² + (43 - 59.7)² + (67 - 59.7)² + (67 - 59.7)² + (59 - 59.7)² + … = 2142.6

Each score is compared with the mean of its own group (62, 59.7, 50.3 and 65.4).

Step 3) Fill in the ANOVA table:

Source of variation   d.f.   Sum of squares   Mean Sum of Squares   F-statistic   p-value
Between               3      1256.5           418.8                 7.04          ≈ 0.001
Within                36     2142.6           59.5
Total                 39     3399.1
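The three steps of the one-way ANOVA can be checked with a short Python sketch. It uses only the standard library and computes every quantity directly from the four columns of scores in the example table; the variable names are chosen for this illustration.

```python
from statistics import mean

# Scores for the four teaching methods, as listed in the example table.
groups = {
    "TM1": [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],
    "TM2": [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],
    "TM3": [40, 45, 43, 55, 46, 61, 45, 52, 53, 63],
    "TM4": [57, 67, 54, 67, 69, 69, 68, 65, 70, 68],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)
group_means = {name: mean(scores) for name, scores in groups.items()}

# Step 1: between-groups sum of squares, n * (group mean - grand mean)^2 per group.
ssb = sum(len(groups[g]) * (group_means[g] - grand_mean) ** 2 for g in groups)

# Step 2: within-groups sum of squares, each score against its own group mean.
ssw = sum((x - group_means[g]) ** 2 for g in groups for x in groups[g])

# Step 3: degrees of freedom, mean squares and the F statistic.
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ssb / df_between) / (ssw / df_within)

print(f"grand mean = {grand_mean:.2f}")
print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, total = {ssb + ssw:.1f}")
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The p-value is then read from an F table (or a statistics package) for 3 and 36 degrees of freedom.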

Factorial ANOVA (Two way)

Used when two or more factors are involved (e.g., age, gender, level of competence).

The aim is to test if there is also an interaction between teaching method and gender/age/etc.


Gender (1 = male, 2 = female)   Teaching method 1   Teaching method 2   Teaching method 3   Teaching method 4
1                               60                  50                  48                  47
1                               67                  52                  49                  67
1                               42                  43                  50                  54
1                               67                  67                  55                  67
1                               56                  67                  56                  68
2                               62                  59                  61                  65
2                               64                  67                  61                  65
2                               59                  64                  60                  56
2                               72                  63                  59                  60
2                               71                  65                  64                  65


          TM 1             TM 2             TM 3             TM 4             Average
Male      Mean=58.4 N=5    Mean=55.8 N=5    Mean=51.6 N=5    Mean=60.6 N=5    Mean=56.6 N=20
Female    Mean=65.6 N=5    Mean=63.6 N=5    Mean=61 N=5      Mean=62.2 N=5    Mean=63.1 N=20
Average   62 N=10          59.7 N=10        56.3 N=10        61.4 N=10        59.9 N=40

Using the between, within, and interaction sums of squares, we create the ANOVA table and calculate the F-statistics associated with the main effects and the interaction effect.

The hypothesis-testing procedure is the same as before.
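The sums of squares for the two-way example can be sketched in Python as follows. This is an illustration under two assumptions: gender code 1 is read as male and 2 as female (which matches the cell means in the table), and the interaction sum of squares is obtained by subtraction, which is valid here because every cell holds the same number of students. The helper names `pool` and `ss_between` are invented for this sketch.

```python
from statistics import mean

# Scores by gender and teaching method, five students per cell.
# Assumption: gender code 1 = male, 2 = female, as the cell means suggest.
data = {
    ("male", "TM1"): [60, 67, 42, 67, 56],
    ("male", "TM2"): [50, 52, 43, 67, 67],
    ("male", "TM3"): [48, 49, 50, 55, 56],
    ("male", "TM4"): [47, 67, 54, 67, 68],
    ("female", "TM1"): [62, 64, 59, 72, 71],
    ("female", "TM2"): [59, 67, 64, 63, 65],
    ("female", "TM3"): [61, 61, 60, 59, 64],
    ("female", "TM4"): [65, 65, 56, 60, 65],
}
genders = ["male", "female"]
methods = ["TM1", "TM2", "TM3", "TM4"]
grand_mean = mean([x for cell in data.values() for x in cell])


def pool(gender=None, method=None):
    """Pool the scores of every cell that matches the given factor levels."""
    return [x for (g, m), cell in data.items()
            if gender in (None, g) and method in (None, m)
            for x in cell]


def ss_between(subsets):
    """Sum over subsets of size * (subset mean - grand mean)^2."""
    return sum(len(s) * (mean(s) - grand_mean) ** 2 for s in subsets)


ss_gender = ss_between([pool(gender=g) for g in genders])
ss_method = ss_between([pool(method=m) for m in methods])
ss_cells = ss_between(list(data.values()))
ss_interaction = ss_cells - ss_gender - ss_method  # balanced design only
ss_within = sum((x - mean(cell)) ** 2 for cell in data.values() for x in cell)

df = {"gender": 1, "method": 3, "interaction": 3}
df_within = sum(len(cell) - 1 for cell in data.values())
ms_within = ss_within / df_within

effects = {"gender": ss_gender, "method": ss_method, "interaction": ss_interaction}
for name, ss in effects.items():
    f_stat = (ss / df[name]) / ms_within
    print(f"{name}: SS = {ss:.1f}, df = {df[name]}, F = {f_stat:.2f}")
print(f"within: SS = {ss_within:.1f}, df = {df_within}")
```

Each F statistic is compared with the tabulated F value for its own degrees of freedom and the within-groups degrees of freedom.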

Chi Square (χ²)

The most obvious difference between the chi-square test and the other hypothesis tests we have considered (t and ANOVA) is the nature of the data.

For chi-square, the data are frequencies rather than numerical scores.

Chi-square is used to assess the difference between the frequencies we actually observe and the frequencies we would expect to find if the null hypothesis were true.

The chi-square statistic is calculated as follows:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where o = observed frequencies, and e = expected frequencies.

Case Study: Using chi-squared to analyse questionnaire responses

Data were collected on citizens' viewpoints about the building of the 2012 Olympic venue at Stratford, to find out whether viewpoints differed between groups.

Through a survey/questionnaire, 20 responses from each category of local person were collected about the usefulness of the new Olympic developments. The statement posed was: 'The 2012 Olympic Games development will be of benefit to the whole community of Stratford, east London.'

Response scale: 1 = Strongly agree, 2 = Agree, 3 = Disagree, 4 = Strongly disagree

Results of the survey

Category (type)          Frequency of negative responses (observed values: o)
Business owner           4
School student           6
Adult male resident      14
Adult female resident    10
Senior citizen           16

Twenty people responded from each category. Only the frequency of negative responses is recorded, i.e. those who either disagreed or strongly disagreed with the statement.

The expected frequency (e) is the mean frequency of negative responses, calculated by adding up the observed frequencies (o) and then dividing by the number of categories, i.e. 5. This gives an expected frequency of 10 for each category.

              Business owner   School student   Adult male resident   Adult female resident   Senior citizen   Total
o             4                6                14                    10                      16               50
e             10               10               10                    10                      10               50
o - e         -6               -4               4                     0                       6                --------
(o - e)²      36               16               16                    0                       36               --------
(o - e)² / e  3.6              1.6              1.6                   0                       3.6              --------
χ²            3.6              1.6              1.6                   0                       3.6              10.4

Interpreting the Chi-Squared Value

Calculated chi-square value = 10.4, with 4 degrees of freedom. Critical values for 4 df are:

Significance level   0.10   0.05   0.01    0.005
Confidence level     90%    95%    99%     99.5%
Critical value       7.78   9.49   13.28   14.86

To reject the null hypothesis (H₀), the chi-squared score must be greater than the critical value at the 0.05 level of significance. Since 10.4 is higher than the critical value at the 0.05 level, which is 9.49, we can reject the null hypothesis (H₀).
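The case-study calculation can be reproduced with a few lines of Python. This sketch uses the observed frequencies and the 0.05 critical value for 4 degrees of freedom given above; the variable names are chosen for this illustration.

```python
# Observed frequencies of negative responses per category (from the survey table).
observed = {
    "Business owner": 4,
    "School student": 6,
    "Adult male resident": 14,
    "Adult female resident": 10,
    "Senior citizen": 16,
}

# Expected frequency: the same for every category under the null hypothesis.
expected = sum(observed.values()) / len(observed)

# Chi-square: sum of (o - e)^2 / e over all categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

CRITICAL_05 = 9.49  # critical value for 4 df at the 0.05 level, from the table

print(f"expected = {expected:.0f}, chi-square = {chi_square:.1f}, df = {df}")
print("reject H0" if chi_square > CRITICAL_05 else "fail to reject H0")
```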

Summary of different statistical procedures

Different types of data analysis are appropriate for different types of research problems.

Qualitative: data collected through procedures with a low level of explicitness.

Descriptive: use different types of descriptive statistics (frequencies, central tendencies, and variabilities).

Correlational analysis: examination of the relationships between variables.

Multivariate procedures: more complex relationships, dealing with a number of variables at a time.

Experimental research

Procedures for analyzing data from experimental research:

T-test: examines whether the difference between two samples is statistically significant.

One-way analysis of variance: examines differences between more than two groups.

Factorial analysis of variance: analyzes the effect of a treatment under more complex conditions.

Chi-square: compares frequencies observed in a sample with some theoretically expected frequencies.

Using Computers for Data Analysis

The most popular package is SPSS and its updated versions. (Researchers are advised to find out which packages are available when preparing the research proposal.)

Different phases in performing computer data analysis

Phase 1: prepare the data collection tools with a coding system integrated into the procedures.

Phase 2: the data are transferred to coding sheets. (For an example, see next slide.)

Phase 3: the data are transferred to the computer database (with professional help).

Phase 4: choose an appropriate program for the analysis (expert advice is encouraged).

Phase 5: get results.
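The phases above can be sketched in Python for a small questionnaire. This is only an illustration: the coding scheme follows the four-point response scale of the chi-square case study, while the respondents, the column names, and the in-memory CSV "coding sheet" are hypothetical.

```python
import csv
import io
from collections import Counter

# Phase 1: a coding system for the four response options.
CODES = {"Strongly agree": 1, "Agree": 2, "Disagree": 3, "Strongly disagree": 4}

# Hypothetical raw responses (respondent id, category, answer), invented for illustration.
raw_responses = [
    (1, "Business owner", "Agree"),
    (2, "Business owner", "Disagree"),
    (3, "Senior citizen", "Strongly disagree"),
    (4, "School student", "Strongly agree"),
]

# Phase 2: transfer the data to a coding sheet (here, CSV text held in memory).
coding_sheet = io.StringIO()
writer = csv.writer(coding_sheet)
writer.writerow(["id", "category", "response_code"])
for respondent_id, category, answer in raw_responses:
    writer.writerow([respondent_id, category, CODES[answer]])

# Phases 3 to 5: read the coded data back and tally negative responses (codes 3 and 4).
coding_sheet.seek(0)
negative = Counter(
    row["category"]
    for row in csv.DictReader(coding_sheet)
    if int(row["response_code"]) >= 3
)
print(dict(negative))
```

In a real study the coding sheet would be saved to a file and loaded into the chosen statistics package rather than kept in memory.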

Coding sheet

Caution for the researcher during the data analysis

Should have a "feel" for the results and use intuition to spot false results or errors.

Keep a close watch on whether the results are sensible.

Understand the statistics used for the data analysis.

Acquaint themselves with the specific statistical procedures.
