stac51: categorical data analysisfisher.utstat.utoronto.ca/~mahinda/stac51/slidesc51_1p.pdfmahinda...

Introduction

STAC51: Categorical data Analysis

Mahinda Samarakoon

January 21, 2016

Table of contents

1 Introduction


Basic Concepts

Categorical data analysis is concerned with the statistical methods for analysis of categorical response (dependent) variables. Explanatory variables may be categorical or continuous or both. For example, the explanatory variables can be income, education, gender, race, etc. There are two types of categorical variables:


Types of variables

Nominal - unordered categories. Examples:

Major: Mathematics, Statistics, or Computer Science
Favorite music: rock, classical, jazz, country, folk, pop
Criminal offense convictions: murder, robbery, assault

Ordinal - ordered categories, but the exact distances between categories are unknown. Examples:

Patient condition: excellent, good, fair, poor
Government spending: too high, about right, too low
Highest attained education level: HS, BS, MS, PhD


Binary variables

A binary variable is a special case of a categorical variable, taking only two values (categories), such as success and failure or true and false.

For binary variables the nominal-ordinal distinction is not important.


Interval variables

An interval variable is one that has meaningful distances between any two values.

Examples: annual income, height, weight, systolic blood pressure level.


Probability Distributions for Categorical Data

In categorical data analysis, the binomial distribution (and its generalization, the multinomial distribution) plays the role that the Normal distribution does for continuous responses.

Recall that for a Bin(n, π) random variable Y,

$$P(Y = y) = p_Y(y) = \binom{n}{y}\, \pi^{y} (1 - \pi)^{n - y} \quad \text{for } y = 0, 1, \ldots, n,$$

and zero otherwise.

$$E(Y) = n\pi$$

$$\mathrm{Var}(Y) = n\pi(1 - \pi)$$

If $X_1, \ldots, X_n$ are i.i.d. Bernoulli random variables, i.e. $P(X_i = 1) = \pi$ and $P(X_i = 0) = 1 - \pi$, then $Y = X_1 + \cdots + X_n \sim \mathrm{Bin}(n, \pi)$. In other words, Y is the number of successes (i.e. 1's) in n independent Bernoulli trials.
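The binomial formulas above can be checked numerically. The following is a minimal sketch in R; the values n = 10 and π = 0.3 are assumptions chosen only for illustration and do not come from the slides. It evaluates the pmf from the formula, compares it with R's built-in `dbinom`, and recovers the mean and variance by direct summation over y = 0, 1, ..., n.

```r
# Hypothetical illustration values (not from the slides)
n <- 10
p <- 0.3      # plays the role of pi; named p to avoid masking R's constant pi
y <- 0:n      # the support of Bin(n, p) starts at 0

# pmf evaluated directly from the formula
pmf_formula <- choose(n, y) * p^y * (1 - p)^(n - y)

# the same probabilities from R's built-in binomial pmf
pmf_builtin <- dbinom(y, size = n, prob = p)

max_diff <- max(abs(pmf_formula - pmf_builtin))   # should be essentially 0
total    <- sum(pmf_formula)                      # should be 1
mean_y   <- sum(y * pmf_formula)                  # compare with n * p
var_y    <- sum((y - mean_y)^2 * pmf_formula)     # compare with n * p * (1 - p)

print(c(total = total, mean = mean_y, variance = var_y))
```

The probabilities sum to 1 only when the term for y = 0 is included, which is why the support is written as 0, 1, ..., n.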


Binomial Distribution

Example: According to published statistics, 8% of people ages 14-24 are school dropouts, i.e. persons who are not in regular school and who have not completed the 12th grade or any higher degree. Suppose you pick five people at random from this age group. What is the probability that exactly two of them will be school dropouts?

Solution: Let Y denote the number of school dropouts in this sample of 5 people; then $Y \sim \mathrm{Bin}(n = 5, \pi = 0.08)$. The question asks for $P(Y = 2)$. Using the binomial formula,

$$P(Y = 2) = \binom{5}{2} (0.08)^{2} (1 - 0.08)^{5 - 2} = 10 \times (0.08)^{2} \times (0.92)^{3} = 0.049836032.$$
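The same probability can be obtained in R. This short sketch uses only the numbers from the example (n = 5, π = 0.08); the variable names are illustrative choices. It computes P(Y = 2) once from the formula and once with `dbinom`.

```r
n_people  <- 5      # sample size from the example
p_dropout <- 0.08   # dropout probability from the example

# direct evaluation of choose(5, 2) * 0.08^2 * 0.92^3
by_formula <- choose(n_people, 2) * p_dropout^2 * (1 - p_dropout)^(n_people - 2)

# the same quantity from the built-in binomial pmf
by_dbinom <- dbinom(2, size = n_people, prob = p_dropout)

print(c(formula = by_formula, dbinom = by_dbinom))
```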


Multinomial Distribution

In some trials more than two outcomes are possible. Suppose each of n independent trials can result in any one of c categories. Let $y_{ij} = 1$ if the i-th trial results in category j and zero otherwise.

Let $n_j = \sum_{i=1}^{n} y_{ij}$; then $(n_1, n_2, \ldots, n_c)$ is an observed value (vector) from a multinomial distribution. The probability mass function of the multinomial distribution is given by

$$p(n_1, n_2, \ldots, n_c) = \left( \frac{n!}{n_1!\, n_2! \cdots n_c!} \right) \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}, \tag{1}$$

where $\pi_j$ is the probability of an outcome in category j (for any trial).


Multinomial Distribution: Example

Suppose we have a bowl with 10 marbles: 2 red marbles, 3 green marbles, and 5 blue marbles. We randomly select 4 marbles from the bowl, with replacement. What is the probability of selecting 2 green marbles and 2 blue marbles?

Solution: Let $Y_1$, $Y_2$ and $Y_3$ denote the numbers of red, green and blue marbles selected, respectively. Then $(Y_1, Y_2, Y_3)$ has a multinomial distribution with $n = 4$, $\pi_1 = 0.2$, $\pi_2 = 0.3$ and $\pi_3 = 0.5$, and

$$P(Y_1 = 0, Y_2 = 2, Y_3 = 2) = \left( \frac{4!}{0!\, 2!\, 2!} \right) 0.2^{0} \times 0.3^{2} \times 0.5^{2} = 6 \times 0.0225 = 0.135.$$

R commands

> dmultinom(x = c(0, 2, 2), size = 4, prob = c(0.2, 0.3, 0.5))

[1] 0.135



Multinomial Distribution

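Some properties of the Multinomial Distribution

If $(Y_1, Y_2, \ldots, Y_c)$ has a multinomial $(n, \pi_1, \pi_2, \ldots, \pi_c)$ distribution, then

$Y_j \sim \mathrm{Bin}(n, \pi_j)$

$\mu_j = E(Y_j) = n\pi_j$

$\mathrm{Var}(Y_j) = n\pi_j(1 - \pi_j)$

$\mathrm{Cov}(Y_j, Y_k) = E((Y_j - \mu_j)(Y_k - \mu_k)) = -n\pi_j\pi_k$ for $j \neq k$.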
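These moment formulas can be illustrated by simulation. The sketch below reuses the marble probabilities (n = 4 and π = 0.2, 0.3, 0.5); the seed and the number of replications are arbitrary assumptions. It draws multinomial vectors with `rmultinom` and compares the sample means and the sample covariance matrix with nπ_j, nπ_j(1 − π_j) and −nπ_jπ_k.

```r
set.seed(51)                        # arbitrary seed, only for reproducibility
n_trials <- 4
probs    <- c(red = 0.2, green = 0.3, blue = 0.5)
reps     <- 100000                  # number of simulated multinomial vectors (assumption)

# each column of draws is one multinomial vector (Y1, Y2, Y3)
draws <- rmultinom(reps, size = n_trials, prob = probs)

sim_mean <- rowMeans(draws)         # estimates E(Y_j)
sim_cov  <- cov(t(draws))           # estimates Var(Y_j) and Cov(Y_j, Y_k)

# theoretical values: off-diagonal -n*pi_j*pi_k, diagonal n*pi_j*(1 - pi_j)
theory_mean <- n_trials * probs
theory_cov  <- -n_trials * outer(probs, probs)
diag(theory_cov) <- n_trials * probs * (1 - probs)

print(round(sim_mean - theory_mean, 3))
print(round(sim_cov - theory_cov, 3))
```

The off-diagonal entries are negative because the counts are constrained to sum to n: a larger count in one category leaves fewer trials for the others.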


Poisson Distribution

Sometimes, count data do not result from a fixed number of trials; for example, the number of accidents during a particular period in a particular city. Random variables of this type often have a Poisson distribution. The probability mass function of the Poisson distribution is given by

$$p(y) = \frac{e^{-\mu} \mu^{y}}{y!}, \quad y = 0, 1, \ldots \tag{2}$$

The parameter µ represents the mean of the distribution. That is, if $Y \sim \mathrm{Po}(\mu)$, then $E(Y) = \mu$. It can also be shown that $\mathrm{Var}(Y) = \mu$.


Poisson Distribution: Example

Births in a hospital occur randomly at an average rate of 1.8 births per hour. It is reasonable to assume that the number of births in any particular hour has a Poisson distribution with mean 1.8.

What is the probability of observing 4 births in a given hour at the hospital?

Solution: Let Y be the number of births in this interval. Then $Y \sim \mathrm{Po}(1.8)$ and so

$$P(Y = 4) = \frac{e^{-1.8}\, 1.8^{4}}{4!} = 0.0723.$$
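This calculation can be reproduced in R. The sketch below uses the rate 1.8 from the example; the variable names are illustrative. It evaluates the Poisson pmf from the formula and with the built-in `dpois`.

```r
rate <- 1.8   # mean number of births per hour, from the example

# direct evaluation of exp(-mu) * mu^y / y! at y = 4
by_formula <- exp(-rate) * rate^4 / factorial(4)

# the same probability from the built-in Poisson pmf
by_dpois <- dpois(4, lambda = rate)

print(round(c(formula = by_formula, dpois = by_dpois), 4))
```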


Poisson Approximation to the Binomial distribution

If n is large (n ≥ 100), π is small (usually π ≤ 0.01), and nπ ≤ 20, then we can use Poisson(µ = nπ) to approximate the binomial probabilities.


Poisson Approximation to the Binomial distribution: Example

Suppose that 1 in 5000 light bulbs is defective. Let Y denote the number of defective bulbs in a batch of 10000 bulbs. What is the chance that at most three bulbs will be defective?

Solution: $Y \sim \mathrm{Bin}(n = 10000, \pi = 1/5000 = 0.0002)$.

$$\begin{aligned}
P(Y \le 3) &= P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) \\
&= \binom{10000}{0} 0.0002^{0} (1 - 0.0002)^{10000 - 0} + \binom{10000}{1} 0.0002^{1} (1 - 0.0002)^{10000 - 1} \\
&\quad + \binom{10000}{2} 0.0002^{2} (1 - 0.0002)^{10000 - 2} + \binom{10000}{3} 0.0002^{3} (1 - 0.0002)^{10000 - 3} = \; ?
\end{aligned}$$

Or we can use the Poisson approximation: $Y \overset{\text{approx}}{\sim} \mathrm{Po}(\mu = n\pi = 10000 \times 0.0002 = 2)$.

$$P(Y \le 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) \approx e^{-2}\frac{2^{0}}{0!} + e^{-2}\frac{2^{1}}{1!} + e^{-2}\frac{2^{2}}{2!} + e^{-2}\frac{2^{3}}{3!} = 0.8571235$$


Poisson Approximation to the Binomial distribution: Example

Here are the R commands calculating P(Y ≤ 3) using the two distributions:

> pbinom(3, 10000, 0.0002)

[1] 0.8571415

> ppois(3, 2)

[1] 0.8571235
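To see where the small difference between the two answers comes from, the individual terms can be compared. This is a sketch using the light bulb values (n = 10000, π = 0.0002); the data frame layout and column names are illustrative choices.

```r
n <- 10000
p <- 0.0002   # plays the role of pi in the example
y <- 0:3

# exact binomial terms next to their Poisson(mu = n * p) approximations
comparison <- data.frame(
  y        = y,
  binomial = dbinom(y, size = n, prob = p),
  poisson  = dpois(y, lambda = n * p)
)
comparison$abs_diff <- abs(comparison$binomial - comparison$poisson)
print(comparison)

# gap between the two cumulative probabilities P(Y <= 3)
cdf_gap <- abs(pbinom(3, size = n, prob = p) - ppois(3, lambda = n * p))
print(cdf_gap)
```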


The Chi-squared Distribution

Another distribution that we often come across in categorical data analysis is the chi-squared distribution.

Definition: Let $Z_1, Z_2, \ldots, Z_\nu$ be ν iid random variables, each having a N(0, 1) distribution. Then the distribution of the random variable $Y = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2$ is called a chi-squared distribution with ν degrees of freedom.
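The definition can be illustrated by simulation. In the sketch below, ν = 3, the seed, the number of replications and the cut-off 2 are all assumptions made for illustration. It sums squared standard normal draws and compares the result with R's built-in chi-squared distribution function `pchisq`.

```r
set.seed(51)     # arbitrary seed, only for reproducibility
nu   <- 3        # degrees of freedom (assumption)
reps <- 100000   # number of simulated values of Y (assumption)

# each row holds nu independent N(0, 1) draws; Y is the row sum of squares
z <- matrix(rnorm(reps * nu), ncol = nu)
y <- rowSums(z^2)

sim_mean  <- mean(y)              # compare with nu
sim_prob  <- mean(y <= 2)         # empirical P(Y <= 2)
true_prob <- pchisq(2, df = nu)   # chi-squared cdf at 2

print(c(sim_mean = sim_mean, sim_prob = sim_prob, true_prob = true_prob))
```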


Some properties of the Chi-squared distribution

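1 If $Z \sim N(0, 1)$, then $E(Z^2) = \mathrm{Var}(Z) + (E(Z))^2 = 1 + 0^2 = 1$.

2 If $X \sim N(\mu, \sigma^2)$, then it can be shown that for any integer $p \ge 0$,

$$E(X - \mu)^{2p} = \frac{(2p)!}{p!\, 2^{p}}\, \sigma^{2p} \quad \text{and} \quad E(X - \mu)^{2p+1} = 0.$$

3 $\mathrm{Var}(Z^2) = E(Z^4) - (E(Z^2))^2 = \frac{(2 \times 2)!}{2!\, 2^{2}} \times 1^{2 \times 2} - 1^{2} = 3 - 1 = 2$.

4 If $Y = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2$, where $Z_1, Z_2, \ldots, Z_\nu$ are iid N(0, 1), then $E(Y) = E(Z_1^2) + E(Z_2^2) + \cdots + E(Z_\nu^2) = \nu$.

5 If $Y = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2$, where $Z_1, Z_2, \ldots, Z_\nu$ are iid N(0, 1), then by independence $\mathrm{Var}(Y) = \mathrm{Var}(Z_1^2) + \mathrm{Var}(Z_2^2) + \cdots + \mathrm{Var}(Z_\nu^2) = 2\nu$.

6 If $Y_1 \sim \chi^2_{\nu_1}$ and $Y_2 \sim \chi^2_{\nu_2}$, and if $Y_1$ and $Y_2$ are independent, then $Y_1 + Y_2 \sim \chi^2_{\nu_1 + \nu_2}$.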
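Properties 1 to 5 can be checked by numerical integration. The following sketch makes some illustrative assumptions: the helper `normal_moment` is a hypothetical name, and ν = 5 is an arbitrary choice. It integrates against the standard normal density `dnorm` and the chi-squared density `dchisq`.

```r
# k-th moment of Z ~ N(0, 1) by numerical integration (hypothetical helper)
normal_moment <- function(k) {
  integrate(function(z) z^k * dnorm(z), lower = -Inf, upper = Inf)$value
}

ez2    <- normal_moment(2)   # property 1: E(Z^2) = 1
ez4    <- normal_moment(4)   # property 2 with p = 2, sigma = 1: E(Z^4) = 3
var_z2 <- ez4 - ez2^2        # property 3: Var(Z^2) = 2

# closed form (2p)! / (p! 2^p) * sigma^(2p) with p = 2, sigma = 1
closed_form <- factorial(4) / (factorial(2) * 2^2)

# properties 4 and 5: mean and variance of a chi-squared with nu df
nu <- 5   # arbitrary choice
chisq_mean <- integrate(function(y) y * dchisq(y, df = nu), 0, Inf)$value
chisq_var  <- integrate(function(y) (y - nu)^2 * dchisq(y, df = nu), 0, Inf)$value

print(c(ez2 = ez2, ez4 = ez4, var_z2 = var_z2,
        chisq_mean = chisq_mean, chisq_var = chisq_var))
```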


Inference for Proportions

Let Y be the number of successes (i.e. 1's) in n independent Bernoulli trials with success probability π. The probability of a success, π, is usually an unknown parameter, and we estimate it by the sample proportion of successes:

$$\hat{\pi} = \frac{Y}{n}. \tag{3}$$


Some properties of π̂

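1 $\hat{\pi}$ is an unbiased estimator of π (i.e. $E(\hat{\pi}) = \pi$).

2 $\mathrm{Var}(\hat{\pi}) = \frac{\pi(1 - \pi)}{n}$

3 $\hat{\pi} \xrightarrow{\Pr} \pi$ by the WLLN

4 $\hat{\pi} \overset{\text{approx}}{\sim} N\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$ for large n, by the CLT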
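Properties 1, 2 and 4 can be checked without simulation, because the exact distribution of the sample proportion follows from the binomial pmf. In the sketch below, n = 50, π = 0.3 and the cut-off 0.35 are assumptions chosen for illustration.

```r
n <- 50    # hypothetical sample size
p <- 0.3   # hypothetical true proportion (plays the role of pi)

y     <- 0:n
p_hat <- y / n                          # possible values of the sample proportion
w     <- dbinom(y, size = n, prob = p)  # their exact probabilities

mean_p_hat <- sum(p_hat * w)                      # property 1: equals p
var_p_hat  <- sum((p_hat - mean_p_hat)^2 * w)     # property 2: equals p * (1 - p) / n

# property 4: exact P(p_hat <= 0.35) against the normal approximation
exact_prob  <- pbinom(floor(0.35 * n), size = n, prob = p)
approx_prob <- pnorm(0.35, mean = p, sd = sqrt(p * (1 - p) / n))

print(c(mean = mean_p_hat, variance = var_p_hat,
        exact = exact_prob, normal_approx = approx_prob))
```

Property 3 (consistency) corresponds to the variance in property 2 shrinking to zero as n grows.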


Definition (Likelihood function)
The likelihood function is the probability of the observed data, expressed as a function of the parameter value.

Definition (Maximum Likelihood Estimate)
The maximum likelihood estimate (MLE) is the parameter value at which the likelihood function takes its maximum.
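As an illustration of both definitions, consider a binomial observation. In the sketch below the data (y = 7 successes in n = 20 trials) are hypothetical, and `optimize` is just one of several ways to maximize a one-parameter function numerically. For binomial data the maximizer is known to be the sample proportion y/n, so the numerical answer can be compared with it.

```r
y <- 7    # hypothetical number of successes
n <- 20   # hypothetical number of trials

# likelihood: probability of the observed y, viewed as a function of p
likelihood <- function(p) dbinom(y, size = n, prob = p)

# numerical maximization over the parameter space (0, 1)
fit <- optimize(likelihood, interval = c(0, 1), maximum = TRUE)

mle_numeric <- fit$maximum   # value of p at which the likelihood peaks
mle_closed  <- y / n         # sample proportion

print(c(numeric = mle_numeric, closed_form = mle_closed))
```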

