
Post on 20-Dec-2015


Section 2.6 Relations in Categorical Variables

So far in chapter two we have dealt with data that is quantitative.

In this section we consider categorical data.

Suppose we measure two variables on each individual, and both variables are categorical. How can we display their association, if there is any?


Consider a situation in which 400 individuals are classified by whether they received a vaccine and by whether they were attacked by the illness the vaccine was intended to prevent.

One way to display this information is in a table, writing the appropriate count in each cell. When two variables have been measured, this table is known as a two-way table.


Two-Way Table (rows: medical condition; columns: treatment)

                Vaccinated   Not Vaccinated   Total
Attacked            60             85          145
Not Attacked       190             65          255
Total              250            150          400

The row and the column that contain the totals are called the margins.
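As a minimal sketch of how the margins come from the cell counts, the following Python snippet sums each row and each column of the vaccine table. The dictionary layout and the function name `margins` are illustrative choices, not part of the original slides.

```python
# Counts from the vaccine table: rows are medical condition, columns are treatment.
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}


def margins(table):
    """Return (row_totals, column_totals, grand_total) for a two-way table of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    column_totals = {}
    for cells in table.values():
        for column, count in cells.items():
            column_totals[column] = column_totals.get(column, 0) + count
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


row_totals, column_totals, grand_total = margins(counts)
print(row_totals)     # {'Attacked': 145, 'Not Attacked': 255}
print(column_totals)  # {'Vaccinated': 250, 'Not Vaccinated': 150}
print(grand_total)    # 400
```

Both margins add up to the same grand total, which is a quick check that the table was entered correctly.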


A good way to create a picture from this table is to use bar graphs. However, a single bar graph cannot capture all of the information shown in the table, so we must choose what we want to see.


Marginal Distributions


A bar graph of the totals in a margin shows a marginal distribution. Since we have two variables, we need two bar graphs to show both marginal distributions.


[Bar graph: Marginal Distribution for Treatment. Horizontal axis: Treatment (Vaccinated, Not Vaccinated). Vertical axis: Frequency, 0 to 300.]

[Bar graph: Marginal Distribution for Medical Condition. Horizontal axis: Medical Condition (Attacked, Not Attacked). Vertical axis: Frequency, 0 to 300.]


Conditional Distributions

If we consider only a particular row or column, the resulting graph shows a conditional distribution.


Suppose that we wish to consider only the people who were attacked, and find out whether being vaccinated resulted in fewer cases of attack than not being vaccinated.


[Bar graph: People Who Have Been Attacked. Horizontal axis: Treatment (Vaccinated, Not Vaccinated). Vertical axis: Frequency, 0 to 90.]

This bar graph is based on the condition that we are considering only people who have been attacked.


[Bar graph: Medical Condition Based on Treatment: No Vaccination. Horizontal axis: Medical Condition (Attacked, Not Attacked). Vertical axis: Frequency, 0 to 100.]

This bar graph of the medical condition is based on the condition that the person is not vaccinated.


Percentages often add to the understanding of a distribution. We could create a table of percentages of the grand total, whose margins give the marginal distributions, or a table of percentages of a row or column total, which gives conditional distributions.
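The two kinds of percentage table can be sketched in Python as follows. This is only an illustration using the vaccine counts; the variable names `of_total` and `within_treatment` are made up for this example.

```python
# Counts from the vaccine table: rows are medical condition, columns are treatment.
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}

grand_total = sum(n for cells in counts.values() for n in cells.values())

# Column totals (one per treatment), needed for the conditional table.
column_totals = {}
for cells in counts.values():
    for treatment, n in cells.items():
        column_totals[treatment] = column_totals.get(treatment, 0) + n

# Every cell divided by the grand total of 400.
of_total = {
    condition: {t: n / grand_total for t, n in cells.items()}
    for condition, cells in counts.items()
}

# Every cell divided by its own column total (conditional on treatment).
within_treatment = {
    condition: {t: n / column_totals[t] for t, n in cells.items()}
    for condition, cells in counts.items()
}

print(round(of_total["Attacked"]["Not Vaccinated"], 4))          # 0.2125
print(round(within_treatment["Attacked"]["Not Vaccinated"], 2))  # 0.57
```

The same cell (85 unvaccinated people who were attacked) gives a different proportion depending on the denominator, which is why the choice of table matters.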


Proportions of the grand total (rows: medical condition; columns: treatment)

                Vaccinated        Not Vaccinated     Total
Attacked        60/400 = .15      85/400 = .2125     .3625
Not Attacked    190/400 = .475    65/400 = .1625     .6375
Total           250/400 = .625    150/400 = .375     1


[Bar graph: Marginal Distribution for Medical Condition. Horizontal axis: Medical Condition (Attacked, Not Attacked). Vertical axis: Relative Frequency, 0 to 0.8.]


Proportions within each treatment (rows: medical condition; columns: treatment)

                Vaccinated        Not Vaccinated     Total
Attacked        60/250 = .24      85/150 = .57       .3625
Not Attacked    190/250 = .76     65/150 = .43       .6375
Total           250/250 = 1       150/150 = 1        1

[Bar graph: Conditional Distribution Based on a Person Being Vaccinated. Horizontal axis: Medical Condition (Attacked, Not Attacked). Vertical axis: Relative Frequency, 0 to 0.8.]


What percentage of the people who were attacked are vaccinated?

60/145 ≈ .414 (rounded), or about 41.4%


What percentage of the people are vaccinated?

250/400 = .625


What percentage of the people who are not vaccinated were not attacked?

65/150 ≈ .433 (rounded), or about 43.3%
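The three questions differ only in which total is used as the denominator. A short Python sketch, with illustrative variable names and the counts from the vaccine table, makes the choice of denominator explicit:

```python
# Counts from the vaccine table: rows are medical condition, columns are treatment.
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}

attacked_total = sum(counts["Attacked"].values())                          # row total
vaccinated_total = sum(row["Vaccinated"] for row in counts.values())       # column total
unvaccinated_total = sum(row["Not Vaccinated"] for row in counts.values()) # column total
grand_total = vaccinated_total + unvaccinated_total

# Conditional on the row "Attacked": denominator is the row total.
vaccinated_among_attacked = counts["Attacked"]["Vaccinated"] / attacked_total

# Marginal: denominator is the grand total.
vaccinated_overall = vaccinated_total / grand_total

# Conditional on the column "Not Vaccinated": denominator is the column total.
not_attacked_among_unvaccinated = (
    counts["Not Attacked"]["Not Vaccinated"] / unvaccinated_total
)

print(f"{vaccinated_among_attacked:.1%}")        # 41.4%
print(f"{vaccinated_overall:.1%}")               # 62.5%
print(f"{not_attacked_among_unvaccinated:.1%}")  # 43.3%
```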


Simpson’s Paradox

An association or comparison that holds for several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox (page 200).


Page 207 problem 96.

White Defendant
                 Death Penalty
                 Yes     No
White Victim      19    132
Black Victim       0      9

Black Defendant
                 Death Penalty
                 Yes     No
White Victim      11     52
Black Victim       6     97

This is a three-way table because there are three categorical variables: race of defendant, race of victim, and death penalty verdict. To show all three variables, two two-way tables are needed, one for each race of defendant.


White Defendant
                 Death Penalty
                 Yes     No    Total
White Victim      19    132     151
Black Victim       0      9       9
Total             19    141

Black Defendant
                 Death Penalty
                 Yes     No    Total
White Victim      11     52      63
Black Victim       6     97     103
Total             17    149

Proportion receiving the death penalty (Yes / Total)
                 White Defendant       Black Defendant
White Victim     19/151 = 0.125828     11/63 = 0.174603
Black Victim     0/9 = 0               6/103 = 0.058252

Let us look at how often the death penalty is given, depending on the race of the defendant, within each race of victim.

Notice that for each race of victim, black defendants receive the death penalty at a higher rate than white defendants: about 17.5% versus 12.6% when the victim is white, and about 5.8% versus 0% when the victim is black.
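The within-group rates can be reproduced with a few lines of Python. This is a sketch using the counts from the two tables; the dictionary `verdicts` and the function `death_penalty_rate` are names introduced here for illustration.

```python
# (yes, no) death penalty counts keyed by (defendant race, victim race).
verdicts = {
    ("White", "White"): (19, 132),
    ("White", "Black"): (0, 9),
    ("Black", "White"): (11, 52),
    ("Black", "Black"): (6, 97),
}


def death_penalty_rate(yes, no):
    """Proportion of cases in a cell group that ended in a death penalty."""
    return yes / (yes + no)


# Rate for each (defendant, victim) combination.
rates = {key: death_penalty_rate(*cell) for key, cell in verdicts.items()}

for victim in ("White", "Black"):
    white = rates[("White", victim)]
    black = rates[("Black", victim)]
    print(f"{victim} victim: white defendant {white:.4f}, black defendant {black:.4f}")
# White victim: white defendant 0.1258, black defendant 0.1746
# Black victim: white defendant 0.0000, black defendant 0.0583
```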


Death penalty counts by race of defendant (victim race combined)
              Yes     No    Total
White def      19    141     160
Black def      17    149     166
Total          36    290

Death penalty proportions by race of defendant
              Yes       No        Total
White def     0.1188    0.8813    160
Black def     0.1024    0.8976    166


When we remove the variable "victim's race" by combining the tables, the comparison reverses: white defendants receive the death penalty more often (11.88%) than black defendants (10.24%). This reversal is Simpson's paradox.


THE END