Cross‐tabulation and Chi‐square test
Business Research Methodology
Dr. Gunjan Malhotra
Assistant Professor
mailforgunjan@gmail
Simple Tabulation for Ranking Type Questions – Bivariate variables
• Suppose ‐ ordinal scale questions
• Q. Rank the 5 brands of refrigerators shown below on a scale of 1 to 5 (1=Best and 5=Worst), according to your opinion.
BRAND         RANK
Whirlpool     ___
Kelvinator    ___
Godrej        ___
Samsung       ___
Videocon      ___
Output table formulation
Table 1
BRAND         RANK 1   RANK 2   RANK 3   RANK 4   RANK 5
Whirlpool     x        x        x        x        x
Kelvinator    x        x        x        x        x
Godrej        x        x        x        x        x
Samsung       x        x        x        x        x
Videocon      x        x        x        x        x
Univariate tables
• For constructing univariate tables ‐ take up one column at a time and do separate frequency tables or charts. E.g.
BRAND         No. of People who Ranked it No. 1
Whirlpool     90
Kelvinator    60
Godrej        70
Samsung       32
Videocon      45
TOTAL         297
• We can calculate the percentage of the total for each brand. E.g. 90/297 works out to .303, so 30.3% of respondents ranked Whirlpool as No. 1, and so on.
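The percentage calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `percentage_table` is hypothetical, and the counts are copied from the univariate table.

```python
# Number of respondents who ranked each brand No. 1 (from the table above).
rank1_counts = {
    "Whirlpool": 90,
    "Kelvinator": 60,
    "Godrej": 70,
    "Samsung": 32,
    "Videocon": 45,
}


def percentage_table(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentage_table(rank1_counts)
for brand, pct in shares.items():
    print(f"{brand:<12}{rank1_counts[brand]:>5}{pct:>8.1f}%")
```

The same function can be applied to any single column of Table 1, for example the counts of people who ranked each brand No. 2.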
Simple Tabulation for Rating Type Questions
Q. Rate the following attributes of LIRIL soap on a scale of 1 to 5 (1=Very Unsatisfactory to 5=Very Satisfactory).
Lather __________________________________
1 2 3 4 5
Fragrance __________________________________
1 2 3 4 5
• For each attribute, the number of people who rated it as 1, 2, 3, 4 or 5 can be tabulated in separate tables like:
RATING Lather
1 30
2 25
3 50
4 76
5 22
TOTAL 203
Alternatively, we can tabulate ratings for all attributes as follows ‐
RATING   LATHER   FRAGRANCE   ATR.3   ATR.4   ATR.5
1        x        x           x       x       x
2        x        x           x       x       x
3        x        x           x       x       x
4        x        x           x       x       x
5        x        x           x       x       x
Second Stage Analysis – Cross Tabulation
• A cross‐tabulation can be done by combining any two of the questions and tabulating the data together. This is a 2‐variable cross tabulation.
• E.g. a cross‐tabulation between Brand Preference for brands of tea and Region to which Respondent belongs.
BRAND          Regionwise Buyers (No.)
               North        South   East   West         Total
Brooke Bond    25 (50%)     20      20     15 (30%)     80 (40%)
Lipton         10 (20%)     15      20     5 (10%)      50 (25%)
Tata           15 (30%)     15      10     30 (60%)     70 (35%)
Total          50 (100%)    50      50     50 (100%)    200 (100%)
– An extension of this could be adding percentages.
Calculating Percentages in a Cross Tabulation
• In the above example, we can compute percentages
  • row‐wise,
  • column‐wise or
  • on the total sample of 200.
• The general rule is to calculate percentages across the categories of the dependent variable (across Brand categories), so that they add up to 100% within each category of the independent variable.
• Assume that brand preference depends on the region to which respondents belong, i.e. “Brand” ‐ dependent variable, and “Region” ‐ independent variable.
• The interpretation is – “Out of 50 respondents from the Northern Region, 50% buy Brooke Bond, 20% buy Lipton, and 30% buy Tata Tea”.
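As a rough sketch of the column‐wise calculation, the snippet below divides each cell by its Region total, so the percentages within each region sum to 100. The name `column_percentages` and the dictionary layout are assumptions made for illustration; the counts come from the tea cross‐tabulation.

```python
regions = ["North", "South", "East", "West"]

# Buyers per brand (rows) and region (columns), from the cross-tabulation.
buyers = {
    "Brooke Bond": [25, 20, 20, 15],
    "Lipton": [10, 15, 20, 5],
    "Tata": [15, 15, 10, 30],
}


def column_percentages(table):
    """Share of each column (region) total held by each row (brand), in percent."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        brand: [round(100 * n / total, 1) for n, total in zip(row, col_totals)]
        for brand, row in table.items()
    }


col_pct = column_percentages(buyers)
print(f"{'BRAND':<12}" + "".join(f"{r:>8}" for r in regions))
for brand, row in col_pct.items():
    print(f"{brand:<12}" + "".join(f"{p:>7.1f}%" for p in row))
```

Row‐wise percentages would instead divide by the brand totals (80, 50 and 70), which answers a different question: where each brand's buyers come from.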
Chi‐square test
1. Univariate ‐ Chi‐square test for goodness of fit
• Test for significance in the analysis of frequency distributions.
• Each question represents a variable under study.
• Compare observed frequencies with expected frequencies.
2. Bivariate ‐ Chi‐square test for relatedness or independence
– Chi‐Square allows testing for significant differences between groups.
[Two different questions in a questionnaire may represent two variables.]
Chi‐square test for Goodness of Fit
• It is used to analyze probabilities of multinomial distribution trials along a single dimension.
• The Chi‐square goodness‐of‐fit test compares the expected (theoretical) frequencies of categories from a population distribution to the observed (actual) frequencies from a distribution to determine whether there is a difference between what was expected and what was observed.
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
Example 1: Chi Square test for goodness of fit ‐ Equal expected frequency
• The table outlines the attitudes of 60 people towards US military bases in Australia. A chi‐square test for goodness of fit will allow us to determine if differences in frequency exist across response categories.
• Ho: There is no significant difference in the frequency of attitudes towards US military bases in Australia.
Attitude towards US Military     Frequency of Response
bases in Australia               (Observed frequencies)
In favour                        8
Against                          20
Undecided                        32
Output 1: Chi‐Square test – equal expected frequencies
Interpretation 1: Chi‐square test – equal expected frequencies
• The output shows that the chi‐square value is significant (p < .05), so Ho is rejected.
• Therefore, it can be concluded that there are significant differences in the frequency of attitudes towards US military bases in Australia.
• The results show that people are largely undecided on this issue, chi‐square (2, N=60) = 14.4, p < .05.
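The reported statistic can be reproduced by hand. The sketch below is not the SPSS procedure behind Output 1; it assumes equal expected frequencies (60/3 = 20 per category) and uses the fact that, for 2 degrees of freedom only, the upper‐tail probability of the chi‐square distribution equals exp(−x/2). The function name `chi_square_gof` is hypothetical.

```python
import math


def chi_square_gof(observed, expected=None):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories.

    If `expected` is omitted, equal expected frequencies are assumed.
    """
    if expected is None:
        expected = [sum(observed) / len(observed)] * len(observed)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


observed = [8, 20, 32]  # In favour, Against, Undecided
stat, df = chi_square_gof(observed)

# Closed-form upper-tail probability; this shortcut is valid for df = 2 only.
p_value = math.exp(-stat / 2)

print(f"chi-square({df}, N={sum(observed)}) = {stat:.1f}, p = {p_value:.5f}")
```

The two non‐zero contributions are (8 − 20)²/20 = 7.2 and (32 − 20)²/20 = 7.2, which add up to the 14.4 quoted above.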
Example 2: Chi‐square test for goodness of fit – Unequal expected frequencies
• Sometimes the expected frequencies are not evenly balanced across categories.
• E.g. the expected frequencies for the three categories were 15, 15 and 30.
Attitude towards US Military     Frequency of Response      Expected Frequency
bases in Australia               (Observed frequencies)     of responses
In favour                        8                          15
Against                          20                         15
Undecided                        32                         30
Output 2: Chi‐square test – unequal expected frequencies
Interpretation 2: Chi‐square test – unequal expected frequencies
• The output shows that the chi‐square value is not significant (p = .079 > .05), so Ho is not rejected.
• Therefore, it can be concluded that there are no significant differences between the observed and expected frequencies of attitudes towards US military bases in Australia.
• The observed responses do not depart significantly from the expected distribution, chi‐square (2, N=60) = 5.067, p > .05.
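A hand calculation for the unequal case follows the same formula, with the hypothesised frequencies 15, 15 and 30 in place of equal ones. This is a minimal sketch rather than the SPSS output; the closed‐form p‐value again relies on df = 2.

```python
import math

observed = [8, 20, 32]   # In favour, Against, Undecided
expected = [15, 15, 30]  # hypothesised frequencies from the table

# Observed and expected frequencies must describe the same sample size.
if sum(observed) != sum(expected):
    raise ValueError("observed and expected totals differ")

# Contribution of each category: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(contributions)
df = len(observed) - 1

# Closed-form upper-tail probability; valid for df = 2 only.
p_value = math.exp(-stat / 2)

for label, c in zip(["In favour", "Against", "Undecided"], contributions):
    print(f"{label:<10} contributes {c:.3f}")
print(f"chi-square({df}, N={sum(observed)}) = {stat:.3f}, p = {p_value:.3f}")
```

Most of the statistic comes from the "In favour" category (49/15, about 3.27); the "Undecided" count of 32 is close to its expected 30 and contributes very little.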
Chi‐square test of Independence
• Qualitative Variables ‐ Nominal data
• Used to test whether two variables are significantly associated with each other.
• Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.
• It is possible to do a cross‐tabulation (and a chi‐squared test – with given table value, df, confidence level) for any two nominal variables in the survey.
Example 1: Chi‐square test for cross‐tab
• Let us assume that we have conducted a consumer survey for a brand of detergent. One of the questions dealt with the income category of the respondent. Another asked the respondent to rate his purchase intentions.
• Ho: There is no significant association between Respondent Income and Purchase Intention.
S. No.   INCOME           CODE   INTENT      INTCODE
1        Less Than 5000   1      NONE        1
2        Less Than 5000   1      LOW         2
3        Less Than 5000   1      LOW         2
4        Less Than 5000   1      NONE        1
5        Less Than 5000   1      HIGH        3
6        5001-10000       2      LOW         2
7        5001-10000       2      HIGH        3
8        5001-10000       2      VERY HIGH   4
9        5001-10000       2      HIGH        3
10       5001-10000       2      LOW         2
11       10001-20000      3      HIGH        3
12       10001-20000      3      VERY HIGH   4
13       10001-20000      3      CERTAIN     5
14       10001-20000      3      HIGH        3
15       10001-20000      3      VERY HIGH   4
16       Above 20000      4      HIGH        3
17       Above 20000      4      CERTAIN     5
18       Above 20000      4      VERY HIGH   4
19       Above 20000      4      CERTAIN     5
20       Above 20000      4      CERTAIN     5
Both variables are coded.
Income codes and their equivalent incomes are –
Code   Income in Rs. per Month
1      Less than 5000
2      5001 to 10,000
3      10,001 to 20,000
4      Above 20,000
Purchase Intention codes are as follows –
Code   Explanation (Value Labels for the Variable)
1      None – No intention to buy
2      Low – Low intention to buy
3      High – High intention
4      Very High – Very high intention
5      Certain – Certain to buy
INCOME Per Month by PURCHASE INTENTION
                          Income per Month in Rs.
Purchase Intent   Code    Less than 5000   5001-10000   10001-20000   Above 20000   TOTAL
None              1       2                0            0             0             2
Low               2       2                2            0             0             4
High              3       1                2            2             1             6
V. High           4       0                1            2             1             4
Certain           5       0                0            1             3             4
TOTAL                     5                5            5             5             20
Cross‐tabulation of code (column‐income per month) and Intcode (row – purchase intent).
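The cross‐tabulation is simply a count of respondents for each (income code, intent code) pair. A minimal sketch using only the Python standard library is shown below; the variable names are illustrative, and the 20 coded pairs are transcribed from the input data.

```python
from collections import Counter

# (income code, intent code) for respondents 1-20, in serial-number order.
responses = [
    (1, 1), (1, 2), (1, 2), (1, 1), (1, 3),
    (2, 2), (2, 3), (2, 4), (2, 3), (2, 2),
    (3, 3), (3, 4), (3, 5), (3, 3), (3, 4),
    (4, 3), (4, 5), (4, 4), (4, 5), (4, 5),
]

intent_labels = {1: "None", 2: "Low", 3: "High", 4: "V. High", 5: "Certain"}
income_codes = [1, 2, 3, 4]

# Counter returns 0 for combinations that never occur.
cell_counts = Counter(responses)

# One row per intent code, one column per income code.
crosstab = {
    intent: [cell_counts[(income, intent)] for income in income_codes]
    for intent in intent_labels
}

for intent, row in crosstab.items():
    print(f"{intent_labels[intent]:<8}", *row, sum(row))
```

Each printed row lists the four income columns followed by the row total, which can be checked against the table above.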
Result 1: Chi‐Square test for cross‐tab
Interpretation 1: Chi‐square test for cross‐tab
• The cross‐tabulation shows the number of respondents falling into each cell (a cell is the combination of one INCOME category with one PURCHASE INTENTION category).
• The first line of the chi‐squared test reads a significance level of 0.097. This means the chi‐squared test is showing a significant association between these two variables at a 90 percent confidence level (equivalent to a 0.10 significance level).
• Thus, we conclude that at the 90 percent confidence level, PURCHASE INTENTION and INCOME are associated significantly with each other. This may lead us to conclude that the price of the detergent is important in its purchase.
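The significance level of 0.097 can be checked without a statistics package. The sketch below computes the Pearson chi‐square from the cross‐tabulated counts, taking each expected count as (row total × column total) / N. The p‐value uses a closed form that holds only when the degrees of freedom are even, which is the case here (df = 12). The helper names are hypothetical, and this is not the SPSS routine itself.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x); this closed form is valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


# Rows: None, Low, High, V. High, Certain; columns: income codes 1-4.
table = [
    [2, 0, 0, 0],
    [2, 2, 0, 0],
    [1, 2, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 1, 3],
]

stat, df = chi_square_independence(table)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

Under these assumptions the statistic is about 18.67 on 12 degrees of freedom, with p close to 0.097. Every expected count in this table is below 5 (they range from 0.5 to 1.5), so with only 20 respondents the chi‐square approximation is rough and the result should be read with caution.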
Example 2: Chi‐square test for Cross‐tabs
• Suppose the researcher wants to find the association between the educational background (independent variable) of PGDM students and their performance in terms of grade secured (dependent variable).
• A bivariate cross‐tabulation has been done by combining the above two variables and tabulating the data together.
• Here, the data are assumed by our group, based on information extracted from the (performance) database of B‐schools.
• We want to test, at the 90% and 95% confidence levels, whether the association between the EDUCATIONAL BACKGROUND of PGDM students and their PERFORMANCE in terms of GRADE is significant.
• Further, the variables are coded.
• Educational backgrounds and their equivalent codes are:
Educational background   Code
B.Com                    1
B.E.                     2
B.Sc.                    3
B.B.A.                   4
B.A.                     5
• Grade codes are as follows:
Grade Obtained   Grade Code
A                1
B                2
C                3
• These two variables were cross‐tabulated for twenty‐five observations.
• A cross‐tabulation with a Chi‐squared test was performed using the SPSS package.
Input data table
S.No.   Roll No.   Background   Code   Grade   Grdcode
1       1          B.Com        1      B       2
2       2          B.Com        1      C       3
3       3          B.Com        1      A       1
4       4          B.Com        1      C       3
5       5          B.Com        1      B       2
6       6          B.E.         2      A       1
7       7          B.E.         2      A       1
8       8          B.E.         2      A       1
9       9          B.E.         2      B       2
10      10         B.E.         2      A       1
11      11         B.Sc.        3      B       2
12      12         B.Sc.        3      B       2
13      13         B.Sc.        3      C       3
14      14         B.Sc.        3      C       3
15      15         B.Sc.        3      C       3
16      16         BBA          4      A       1
17      17         BBA          4      B       2
18      18         BBA          4      C       3
19      19         BBA          4      C       3
20      20         BBA          4      B       2
21      21         B.A.         5      C       3
22      22         B.A.         5      C       3
23      23         B.A.         5      C       3
24      24         B.A.         5      C       3
25      25         B.A.         5      B       2
Output table 2: Grades Vs Entry Qualification
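The counts behind this output table can be tallied directly from the 25 input records. The sketch below is an illustration in plain Python, not the SPSS output; the grade strings list each background's five grades in roll‐number order.

```python
from collections import Counter

# (background, grade) for roll numbers 1-25, in the order of the input data.
records = (
    [("B.Com", g) for g in "BCACB"]
    + [("B.E.", g) for g in "AAABA"]
    + [("B.Sc.", g) for g in "BBCCC"]
    + [("BBA", g) for g in "ABCCB"]
    + [("B.A.", g) for g in "CCCCB"]
)

backgrounds = ["B.Com", "B.E.", "B.Sc.", "BBA", "B.A."]
grades = ["A", "B", "C"]

# Counter returns 0 for combinations that never occur.
counts = Counter(records)

# One row per background, one column per grade.
crosstab = {bg: [counts[(bg, g)] for g in grades] for bg in backgrounds}
col_totals = [sum(col) for col in zip(*crosstab.values())]

print(f"{'Background':<12}" + "".join(f"{g:>4}" for g in grades) + "  Total")
for bg, row in crosstab.items():
    print(f"{bg:<12}" + "".join(f"{n:>4}" for n in row) + f"{sum(row):>7}")
print(f"{'Total':<12}" + "".join(f"{n:>4}" for n in col_totals) + f"{sum(col_totals):>7}")
```

In this tally, four of the five B.E. students hold grade A, while four of the five B.A. students hold grade C, which is the pattern the chi‐square test picks up.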
Result 2: Chi‐Square test for cross‐tab
Interpretation 2: Chi‐Square test for cross‐tab
• The Chi‐square test revealed a significant association between the educational background of the students and their performance in terms of grade.
• A significance level of 0.089 (Pearson’s) has been achieved. This means the Chi‐square test is showing a significant association between the above two variables at the 91.1% confidence level (100 – 8.9).
• Thus, we conclude that at the 90% confidence level, the educational background of PGDM students and their performance in terms of grade are associated significantly with each other, whereas the association is not significant at the 95% confidence level.
• From the obtained contingency coefficient (C) of 0.596, it can be inferred that the association between the dependent and independent variables is fairly strong, as the value 0.596 is closer to 1 than to 0.
• From the Lambda asymmetric value (with grade code dependent) of 0.286, we conclude that there is a moderate level of association between the above two variables. This lambda value tells us that there is a 28.6% reduction in error in predicting the grade of a student when we know his educational background.
• This leads us to conclude that educational background plays a vital role in the performance of the students of the PGDM course.
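The three figures quoted in this interpretation (p = 0.089, C = 0.596, lambda = 0.286) can be reproduced from the 5 × 3 table of counts. The sketch below makes several assumptions: the counts are tallied from the input data; the p‐value uses a closed form valid only for even degrees of freedom (df = 8 here); C is computed as sqrt(χ² / (χ² + N)); and lambda is taken to be Goodman and Kruskal's lambda with grade as the dependent variable.

```python
import math

# Rows: B.Com, B.E., B.Sc., BBA, B.A.; columns: grades A, B, C.
table = [
    [1, 2, 2],
    [4, 1, 0],
    [0, 2, 3],
    [1, 2, 2],
    [0, 1, 4],
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: expected count = row total * column total / N.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form holds for even df only.
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Contingency coefficient C.
c_coeff = math.sqrt(chi2 / (chi2 + n))

# Lambda (grade dependent): proportional reduction in prediction errors.
errors_without = n - max(col_totals)           # always guess the modal grade
errors_with = n - sum(max(r) for r in table)   # guess the modal grade per background
lambda_grade = (errors_without - errors_with) / errors_without

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
print(f"C = {c_coeff:.3f}, lambda (grade dependent) = {lambda_grade:.3f}")
```

Without knowing the background, guessing the modal grade C is wrong for 14 of 25 students; guessing each background's modal grade is wrong for 10, so lambda = (14 − 10) / 14 ≈ 0.286. As in Example 1, all expected counts are below 5, so the chi‐square p‐value is only approximate.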
Example 3: Chi‐square test for cross tab ‐ 3
• A manufacturer was interested in assessing how children aged four, five and six play with one of the manufacturer’s toys. Each child was asked 15 questions. Following the child’s completed interview, the parent was asked the same 15 questions to validate the child’s answers. The following table lists the number of responses to selected items from the survey. One hundred interviews were conducted with both the parent and the child. Notice that item response rates varied from question to question. For each question, state at least one method that could be used to attempt to correct for this item nonresponse bias.
Question                            # Children Responding   # Parents Responding
Age of child                        95                      100
Location of Play                    80                      85
How much the child liked the toy    30                      50
Result 3: Chi‐square test for cross‐tab
• Thank you…