Crosstabs & Measures of Association POL242 October 9 and 11, 2012 Jennifer Hove


Page 1: Crosstabs &  Measures of Association

Crosstabs & Measures of Association

POL242, October 9 and 11, 2012

Jennifer Hove

Page 2: Crosstabs &  Measures of Association

Questions of Causality

Recall:

Most causal thinking in the social sciences is probabilistic, not deterministic: as X increases, the probability of Y increases; X does not invariably produce Y

We can observe only association (per Hume)
We must therefore infer causation
There is not one cause, but many possible causes

Page 3: Crosstabs &  Measures of Association

Inferring Causal Relations

1. There must be association
   X → Y; ~X → ~Y
2. Time order must be considered
   The presumed cause should precede the presumed effect
3. Possible rival explanations must be ruled out
   Sometimes what appears to be a strong relationship between two variables is due to the influence of others
4. We must be able to identify the process by which one factor brings about change in another
   Causal linkage

Page 4: Crosstabs &  Measures of Association

Establishing Association

With nominal or ordinal data, relationships are usually presented in tabular form
Why? Hypotheses rest on the core idea of comparison
  Ex: if we compare respondents on the basis of their value on the IV, say party identification, they should also differ on the DV, say support for gay rights
Crosstabs are a wonderful means of making comparisons

“God speaks to you through crosstabs!”

Page 5: Crosstabs &  Measures of Association

Using/Interpreting Crosstabs

Data are arranged in side-by-side frequency distributions
IV (X) is presented across the top of the table, in columns
  If ordinal, arrange from low scores (on left) to high scores (on right)
DV (Y) is presented down the left-hand side of the table, in rows
  Again, if ordinal, arrange from low (at top) to high (at bottom)

Table 1: Support for the Afghan Mission by Perceived Impact of Taliban Resurgence, 2007

                              Fear of Taliban Resurgence
Support for Afghan Mission    Low            High           All Respondents
Low                           86.1% (173)    52.7% (355)    60.4% (528)
High                          13.9  (28)     47.3  (318)    39.6  (346)
Total (N)                     100   (201)    100   (673)    100   (874)

Tau-b = .29
Source: Strategic Counsel, CTV/Globe and Mail Survey, July 2007

Page 6: Crosstabs &  Measures of Association

Using/Interpreting Crosstabs

Data are presented so that each category of the IV adds to 100%
  Percentage within categories of the IV (down the columns of the table)
Comparisons are made across categories of the IV
  From left to right
  To see the effect of the IV on the DV
  Ex: in Table 1, high support for the mission rises from 13.9% among respondents with low fear to 47.3% among those with high fear

Page 7: Crosstabs &  Measures of Association

Rules (!) of Crosstabs

1. Make the IV define the columns and the DV define the rows of the table
2. Always percentage down within categories of the IV
3. Interpret the relationship by comparing across columns, within rows of the table
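
The three rules can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: it takes the cell counts from Table 1, percentages down within each category of the IV, and then compares across columns within one row. The names `counts` and `column_percentages` are made up for this sketch.

```python
# Cell counts from Table 1: rows are the DV (support), columns are the IV (fear).
# `counts` and `column_percentages` are illustrative names, not from the slides.
counts = {
    "Low support":  {"Low fear": 173, "High fear": 355},
    "High support": {"Low fear": 28,  "High fear": 318},
}

def column_percentages(table):
    """Rule 2: express each cell as a percentage of its column (IV category) total."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / totals[c], 1) for c in columns}
        for r, row in table.items()
    }

pct = column_percentages(counts)

# Rule 3: compare across columns, within a single row of the table.
high_row = pct["High support"]
difference = round(high_row["High fear"] - high_row["Low fear"], 1)

print(pct)
print("Percentage-point difference in high support:", difference)
```

The percentages reproduce the ones printed in Table 1, and the difference across columns in the "High support" row is the comparison the third rule asks for.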

Page 8: Crosstabs &  Measures of Association

Example: 2 x 2 Crosstab

Support for Y Variable by Support for X Variable

                         Score on X Variable
Score on Y Variable      Low        High       Total
Low                      A          B          A + B
High                     C          D          C + D
Total                    A + C      B + D

Page 9: Crosstabs &  Measures of Association

Diagonals

Main diagonal: running to the right and down (cells A and D in the 2 x 2 table)
When a larger proportion of cases falls on the main diagonal, the relationship is said to be direct or positive
  Low values on X associated with low values on Y; high values on X associated with high values on Y

Page 10: Crosstabs &  Measures of Association

Diagonals

Off diagonal: running to the right and up (cells C and B in the 2 x 2 table)
When a larger proportion of cases falls on the off diagonal, the relationship is said to be inverse or negative
  Low values on X associated with high values on Y; high values on X associated with low values on Y

Page 11: Crosstabs &  Measures of Association

Explaining Variation in Y

Relationships between variables in the social sciences are rarely, if ever, perfectly predictable
You are unlikely to see something like this:

Support for Y Variable by Support for X Variable

                         Score on X Variable
Score on Y Variable      Low        High
Low                      100%       0
High                     0          100%
Total                    100        100

Page 12: Crosstabs &  Measures of Association

Explaining Variation in Y

There is likely to be more than one explanation or “cause” behind the variation in Y
So we will generally be looking at:
  X1 → Y
  X2 → Y
To compare, we want to know the relative strength of each relationship
A variety of summary terms called measures of association are used

Page 13: Crosstabs &  Measures of Association

Measures of Association

Compress the information that appears in a crosstab into a single number by summarizing:
  Magnitude (strength) of the relationship
  Direction of the relationship

Magnitude: ranges from 0 (completely unpredictable) to 1 (perfectly predictable)
Direction: positive (+) = cases primarily on main diagonal; negative (-) = cases primarily on off diagonal

Page 14: Crosstabs &  Measures of Association

Two Cautionary Notes

Direction is not useful with nominal-level variables, since they are not ordered/ranked from low to high
Even with ordinal measurement, interpretation of direction depends entirely on how your variables are coded
  You should always code your variables so that high scores indicate “more” of what you want to explain

Page 15: Crosstabs &  Measures of Association

Direction & Strength

Combining direction & strength, we get a range of possibilities

-1.0 -.8 -.6 -.4 -.2 0 +.2 +.4 +.6 +.8 +1.0

All intermediary values can also occur, e.g. -.2367
Note that equivalent positive and negative scores are equal in strength
  Ex: +.4 and -.4 are equal in strength; they differ only in direction

Page 16: Crosstabs &  Measures of Association

Choosing among Measures

We use different measures of association for 2 main reasons:
1. There are different levels of measurement
   Ordinal measurement offers ranking information used to calculate association, which isn’t available with nominal data
2. Some measures are specific to tables of certain sizes and shapes
   Specific measures for 2 x 2 tables; others for larger square tables; still others for rectangular tables

Page 17: Crosstabs &  Measures of Association

Phi (Φ)

Use with dichotomous variables, 2 x 2 tables
Applies to nominal and ordinal data
Measures the strength of a relationship by taking the product of the cases on the main diagonal (A × D) minus the product of the cases on the off diagonal (B × C), adjusting for the marginal distribution of cases, i.e. the sums of the columns and rows

$$\Phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$
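
As a rough check on the formula, the sketch below applies it to the cell counts in Table 1, using the A/B/C/D layout from the 2 x 2 example slide. The function name `phi` and the variable `phi_table1` are chosen for this illustration only. For a 2 x 2 table, Phi and Tau-b coincide in magnitude, so the result should land near the Tau-b of .29 reported under Table 1.

```python
from math import sqrt

def phi(a, b, c, d):
    """Phi for a 2 x 2 table laid out as on the slides:

        A  B   (low Y)
        C  D   (high Y)
    """
    numerator = a * d - b * c
    # Adjust for the marginals: row totals (A+B, C+D) and column totals (A+C, B+D).
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Counts from Table 1: A=173, B=355, C=28, D=318.
phi_table1 = phi(173, 355, 28, 318)
print(round(phi_table1, 2))
```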

Page 18: Crosstabs &  Measures of Association

2 Examples: Phi (Φ)

Example 1: Φ = .6

                         Score on X Variable
Score on Y Variable      Low        High
Low                      75%        10%
High                     25%        90%
Total                    100        100

Example 2: Φ = .2

                         Score on X Variable
Score on Y Variable      Low        High
Low                      50%        20%
High                     50%        80%
Total                    100        100

Page 19: Crosstabs &  Measures of Association

Cramer’s V

An extension of Phi
Logic of Cramer’s V is based on percentage differences across the columns, not on the logic of diagonals
Use with nominal data, when tables are larger than 2 x 2
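
The slides do not give a formula for Cramer’s V. The sketch below assumes the usual chi-square-based definition, V = sqrt(χ² / (N × (k − 1))), where k is the smaller of the number of rows and columns; treat that choice, and the function names `chi_square` and `cramers_v`, as assumptions of this illustration. Running it on the 2 x 2 counts from Table 1 shows in what sense V extends Phi: on a 2 x 2 table it returns the absolute value of Phi.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V (assumed definition): sqrt(chi2 / (N * (k - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

# Table 1 counts (2 x 2), rows = support (low, high), columns = fear (low, high).
table1 = [[173, 355], [28, 318]]
v_table1 = cramers_v(table1)
print(round(v_table1, 2))
```

Unlike Phi, V carries no sign, which fits its use with nominal variables where direction has no meaning.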

Page 20: Crosstabs &  Measures of Association

Lambda

Lambda (λ) is another measure of association for nominal data
Its rationale of “percentage of improvement” or “proportion reduction in error” is relatively easy to explain
Not recommended in this course
  When the modal category of each column is in the same row, λ=0, even if the percentages differ across columns
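
The "proportion reduction in error" idea, and the reason Lambda can mislead, can be sketched as follows. The function name `lambda_dv` is invented for this illustration, and the version shown is the asymmetric one that predicts the row variable (DV) from the column variable (IV), which is an assumption about the variant intended. The counts come from Table 1 and from Table 2 (Approval of President Chavez by Opinion of the United States).

```python
def lambda_dv(table):
    """Lambda predicting the row variable (DV) from the column variable (IV).

    Errors without the IV: guess the overall modal row for everyone.
    Errors with the IV: guess the modal row within each column.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    errors_without_iv = n - max(row_totals)
    errors_with_iv = sum(sum(col) - max(col) for col in zip(*table))
    if errors_without_iv == 0:
        return 0.0
    return (errors_without_iv - errors_with_iv) / errors_without_iv

# Table 1: the modal category of both columns is the "Low" support row.
table1 = [[173, 355], [28, 318]]
# Table 2: the modal row switches from "Approve" to "Disapprove" in the last column.
table2 = [[26, 64, 171, 110], [178, 217, 223, 52]]

lambda_table1 = lambda_dv(table1)
lambda_table2 = lambda_dv(table2)
print(lambda_table1, round(lambda_table2, 3))
```

For Table 1 the result is exactly 0, even though the column percentages differ by more than 30 points and the reported Tau-b is .29.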

Page 21: Crosstabs &  Measures of Association

Measures of Association: Ordinal Data

Measures include Tau-b, Tau-c and Gamma
Rely on analysis of diagonals

                   Support for X
Support for Y      Low      Med      High
Low                a        b        c
Med                d        e        f
High               g        h        i

Page 24: Crosstabs &  Measures of Association

Mind your Ps and Qs

The letter P indicates the # of pairs of cases on the main diagonals (from left to right)
The letter Q indicates the # of pairs of cases on the off diagonals (from right to left)
If P > Q, we have a positive association
If P < Q, we have a negative association
The core calculation = P - Q
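
One way to make P and Q concrete is to count them directly from a table. In the sketch below, each cell is paired with every cell below and to its right (these pairs add to P) and with every cell below and to its left (these add to Q). The 3 x 3 counts are hypothetical, and the function name `concordant_discordant` is chosen for this illustration.

```python
def concordant_discordant(table):
    """Return (P, Q) for a table of counts ordered low to high on both variables."""
    p = q = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # rows below the current cell
                for l in range(n_cols):
                    if l > j:                    # below and to the right: main diagonals
                        p += table[i][j] * table[k][l]
                    elif l < j:                  # below and to the left: off diagonals
                        q += table[i][j] * table[k][l]
    return p, q

# Hypothetical 3 x 3 table (rows = Y low/med/high, columns = X low/med/high).
example = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]
P, Q = concordant_discordant(example)
print(P, Q, P - Q)
```

Here P exceeds Q, so the hypothetical table shows a positive association.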

Page 25: Crosstabs &  Measures of Association

Gamma

The information in P and Q can be used to calculate Gamma (γ)

$$\gamma = \frac{P}{P+Q} - \frac{Q}{P+Q} = \frac{P-Q}{P+Q}$$

Problems:
  A vacant cell can produce a score of 1.0 (in a 2 x 2 table, any vacant cell does)
  Tends to overstate the strength of a relationship
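
Both problems can be seen in a small sketch. The 2 x 2 table with a vacant cell is hypothetical, and the names `concordant_discordant`, `gamma` and `phi` are chosen for this illustration. Gamma reaches 1.0 on that table although Phi, computed on the same counts, is well below 1.

```python
from math import sqrt

def concordant_discordant(table):
    """Return (P, Q) for a table of counts ordered low to high on both variables."""
    p = q = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        p += table[i][j] * table[k][l]
                    elif l < j:
                        q += table[i][j] * table[k][l]
    return p, q

def gamma(table):
    """Gamma = (P - Q) / (P + Q); tied pairs are ignored entirely."""
    p, q = concordant_discordant(table)
    return (p - q) / (p + q)

def phi(a, b, c, d):
    """Phi for a 2 x 2 table with cells A, B (top row) and C, D (bottom row)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical 2 x 2 table with one vacant cell (B = 0).
vacant = [[50, 0], [25, 25]]
gamma_vacant = gamma(vacant)
phi_vacant = phi(50, 0, 25, 25)
print(gamma_vacant, round(phi_vacant, 2))
```

Because Gamma drops every tied pair from the denominator, it is never smaller in magnitude than Tau-b on the same table.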

Page 26: Crosstabs &  Measures of Association

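Tau-b and Tau-c

Preferable to Gamma, though built on the same logic of diagonals
Tend to produce results similar to phi (using nominal data) or the most important interval measure (r), to be discussed later in the year

$$\text{Tau-b} = \frac{P - Q}{\sqrt{(P + Q + X)(P + Q + Y)}}$$

where, in this formula, X is the # of pairs tied on X only and Y is the # of pairs tied on Y only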

Page 27: Crosstabs &  Measures of Association

Tau-b and Tau-c

Tau-b never quite reaches 1.0 in non-square tables
So Tau-c was developed for use with rectangular tables
In practice, the difference between Tau-b and Tau-c when applied to the same table is not great, but keep the distinction above in mind

Page 28: Crosstabs &  Measures of Association

Example

Table 2: Approval of President Chavez by Opinion of the United States, 2007

                      Opinion of the United States
Approval of Chavez    Very Bad       Bad            Good           Very Good      All Respondents
Disapprove            12.7% (26)     22.8% (64)     43.4% (171)    67.9% (110)    35.6% (371)
Approve               87.3  (178)    77.2  (217)    56.6  (223)    32.1  (52)     64.4  (670)
Total (N)             100   (204)    100   (281)    100   (394)    100   (162)    100   (1041)

Tau-c: -.39   Tau-b: -.35
Source: Latinobarometer, 2007 (Venezuelan respondents only)
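
As a rough check, the sketch below recomputes both statistics from the cell counts in Table 2. It assumes the common definitions: Tau-b divides P − Q by the square root of (P + Q + pairs tied on X only) × (P + Q + pairs tied on Y only), and Tau-c is 2m(P − Q) / (N²(m − 1)), with m the smaller of the number of rows and columns. The Tau-c formula does not appear on the slides, so treat it, and the names `pair_counts`, `tau_b` and `tau_c`, as assumptions of this illustration.

```python
from math import sqrt

def pair_counts(table):
    """Return (P, Q, tied_x, tied_y) for rows = Y (DV) and columns = X (IV).

    tied_x: pairs in the same column but different rows (tied on X only).
    tied_y: pairs in the same row but different columns (tied on Y only).
    """
    n_rows, n_cols = len(table), len(table[0])
    p = q = tied_x = tied_y = 0
    for i in range(n_rows):
        for j in range(n_cols):
            cell = table[i][j]
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        p += cell * table[k][l]
                    elif l < j:
                        q += cell * table[k][l]
                    else:
                        tied_x += cell * table[k][l]
            for l in range(j + 1, n_cols):
                tied_y += cell * table[i][l]
    return p, q, tied_x, tied_y

def tau_b(table):
    p, q, tied_x, tied_y = pair_counts(table)
    return (p - q) / sqrt((p + q + tied_x) * (p + q + tied_y))

def tau_c(table):
    p, q, _, _ = pair_counts(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (p - q) / (n * n * (m - 1))

# Table 2 counts: rows = Disapprove, Approve; columns = Very Bad ... Very Good.
table2 = [[26, 64, 171, 110], [178, 217, 223, 52]]
tb = tau_b(table2)
tc = tau_c(table2)
print(round(tb, 3), round(tc, 3))
```

Under these assumptions Tau-b rounds to the reported -.35, while Tau-c comes out close to, though not identical with, the reported -.39. Both are negative: approval of Chavez falls as opinion of the United States becomes more favourable.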

Page 29: Crosstabs &  Measures of Association

Summing Up

With nominal data, use Phi or Cramer’s V
  Phi used for 2 x 2 tables
  Cramer’s V used for any other crosstab involving nominal data
  Avoid Lambda

With ordinal data, use Tau-c or Tau-b
  Tau-b used for square tables: 3 x 3, 4 x 4, etc.
  Tau-c used for rectangular tables
  Avoid Gamma
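
The summary above amounts to a small decision rule, which can be written down as a sketch. The function name `choose_measure` and its arguments are invented for this illustration; it encodes only the recommendations listed on this slide.

```python
def choose_measure(level, n_rows, n_cols):
    """Suggest a measure of association following the summary slide.

    level: "nominal" or "ordinal" (the level of measurement of the variables).
    n_rows, n_cols: the shape of the crosstab.
    """
    if level == "nominal":
        return "Phi" if (n_rows, n_cols) == (2, 2) else "Cramer's V"
    if level == "ordinal":
        # Square tables take Tau-b; rectangular tables take Tau-c.
        return "Tau-b" if n_rows == n_cols else "Tau-c"
    raise ValueError("level must be 'nominal' or 'ordinal'")

print(choose_measure("nominal", 2, 2))
print(choose_measure("nominal", 3, 4))
print(choose_measure("ordinal", 3, 3))
print(choose_measure("ordinal", 2, 4))
```

The last call matches Table 2, a 2 x 4 table of ordinal variables, for which Tau-c is reported first.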