basic probability with an emphasis on contingency tables

Post on 24-Dec-2015


Basic Probability

With an Emphasis on Contingency Tables

Students in PSYC 2101

• Skip to Slide # 7.

Random Variable

• A random variable is a real-valued function defined on a sample space.
– The sample space is the set of all distinct outcomes possible for an experiment.
– Function: the members of two sets (well-defined collections of objects) are paired so that each member of the one set (domain) is paired with one and only one member of the other set (range).

• The domain is the sample space, the range is a set of real numbers.

• A random variable is the set of pairs created by pairing each possible experimental outcome with one and only one real number.

Examples

The outcome of rolling a die: one spot = 1, two spots = 2, three spots = 3, etc. (Each outcome has only one number, and vice versa.)

Odd versus even: one spot = 1, two spots = 2, three spots = 1, etc. (Each outcome has only one number, but not vice versa.)

The weight of each student in my statistics class.

Probability Distribution

• Each value of the random variable is paired with one and only one probability.

• More on this later.

Probability Experiments

• A probability experiment is a well-defined act or process that leads to a single well-defined outcome.
– Flip a coin, heads or tails.
– Roll a die, how many spots up.
– Stand on a digital scale, what number is displayed.

Probability

• The probability of an event A, P(A), is the fraction of times that the event will occur in an indefinitely long series of trials of the experiment.

• It cannot be known exactly, but it can be estimated.

Estimating Probability

• Empirically – perform experiment many times, compute relative frequencies.

• Rationally – make assumptions and then apply logic.

• Subjectively – strength of individual’s belief regarding whether an event will or will not happen – often expressed in terms of odds.
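The empirical and rational approaches can be compared with a small simulation. This is only a sketch: the die-rolling trial, the function name `estimate_probability`, the fixed seed, and the 60,000 trials are all illustrative assumptions, not part of the original slides.

```python
import random

def estimate_probability(event, trial, n_trials, seed=0):
    """Relative frequency of `event` over `n_trials` runs of `trial`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials

def roll_die(rng):
    """One trial of the experiment: roll a six-sided die."""
    return rng.randint(1, 6)

# Empirical estimate: roll many times, compute the relative frequency of a six.
empirical = estimate_probability(lambda spots: spots == 6, roll_die, 60_000)

# Rational estimate: assume six equally likely faces and apply logic.
rational = 1 / 6

print(f"empirical {empirical:.3f} vs rational {rational:.3f}")
```

With more trials the relative frequency should settle closer to the rational value, which is the sense in which a probability can be estimated but not known exactly.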

Odds of Occurrence of Event A

• If the experiment were performed (a + b) times, we would expect A to occur a times and B (not A) to occur b times; the odds of A are then a to b.

• There are 20 students in a class, 14 of whom are women. If we randomly select one, what are the odds it will be a woman?

• 14 to 6 = 7 to 3.

Convert Odds to Probability

• Probability = a/(a + b).
• 14 women, 6 men.
• Odds = 7 to 3.
• Probability = 7 out of 10.

Convert Probability to Odds

• Odds = P(A)/P(not A).
• Probability = .70.
• Odds = .70/(1 - .70) = 7 to 3.
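The two conversions can be sketched in a few lines of Python. The function names are hypothetical, and exact fractions are used here only so that the class example (14 women, 6 men) comes out as clean ratios.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Odds of a to b in favor of A -> P(A) = a / (a + b)."""
    return Fraction(a, a + b)

def probability_to_odds(p):
    """P(A) -> odds in favor of A, P(A) / P(not A), as a reduced fraction."""
    p = Fraction(p)
    return p / (1 - p)

p_woman = odds_to_probability(14, 6)       # 14 women, 6 men
odds_woman = probability_to_odds(p_woman)

print(p_woman)      # 7/10
print(odds_woman)   # 7/3, i.e. 7 to 3
```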

Independence

• Two events are independent iff (if and only if) the occurrence or non-occurrence of the one has no effect on the occurrence or non-occurrence of the other.
– I roll a die twice. The outcome on the first roll has no influence on the outcome on the second roll.

Mutual Exclusion

• Two events are mutually exclusive iff the occurrence of the one precludes occurrence of the other (both cannot occur simultaneously on any one trial).
– You could earn a final grade of A in this class.
– You could earn a B.
– You can’t earn both.

Mutual Exhaustion

• Two (or more) events are mutually exhaustive iff they include all possible outcomes.
– You could earn a final grade of A, B, C, D, or F.
– These are mutually exhaustive since there are no other possibilities.

Marginal Probability

• The marginal probability of event A, P(A), is the probability of A ignoring whether or not any other event has also occurred.
– P(randomly selected student is female) = .70

Conditional Probability of A

• The probability that A will occur given that B has occurred.

• P(A|B), the probability of A given B.
– Given that the selected student is wearing a skirt, the probability that the student is female is .9999.
– Unless you are in Scotland.

• If P(A|B) = P(A), then A and B are independent of each other.

Joint Probability

• The probability that both A and B will occur.

• $P(A \cap B) = P(A) \times P(B \mid A) = P(B) \times P(A \mid B)$

• If A and B are independent, this simplifies to $P(A \cap B) = P(A) \times P(B)$

• This is known as the Multiplication Rule.

The Addition Rule

• If A and B are mutually exclusive, the probability that one or the other will occur is the sum of their separate probabilities.

$P(A \cup B) = P(A) + P(B) = .2 + .3 = .5$

Grade         A     B     C     D     F
Probability   .2    .3    .3    .15   .05

• If A and B are not mutually exclusive, things get a little more complicated.

• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Two-Way Contingency Table

• A matrix where rows represent values of one categorical variable and columns represent values of a second categorical variable.

• Can be used to illustrate the relationship between two categorical variables.

Survey Questions

• We have asked each of 150 female college students two questions:

1. Do you smoke (yes/no)?

2. Do you have sleep disturbances (yes/no)?

• Suppose that we obtain the following data (these are totally contrived, not real):

Marginal Probabilities

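Smoke?    Sleep: No   Sleep: Yes   Total
No        20          30           50
Yes       40          60           100
Total     60          90           150

$P(\text{Sleep}) = \frac{90}{150} = \frac{9}{15} = \frac{3}{5} = .60$

$P(\text{Smoke}) = \frac{100}{150} = \frac{10}{15} = \frac{2}{3} \approx .67$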

Conditional Probabilities Show Absolute Independence

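Smoke?    Sleep: No   Sleep: Yes   Total
No        20          30           50
Yes       40          60           100
Total     60          90           150

$P(\text{Sleep} \mid \text{Nosmoke}) = \frac{30}{50} = \frac{3}{5} = .60$

$P(\text{Sleep} \mid \text{Smoke}) = \frac{60}{100} = \frac{3}{5} = .60$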
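The marginal and conditional probabilities for this table can be checked with a short sketch. The nested-dictionary layout (`counts[smoke][sleep]`) and the function names are assumptions made for illustration; only the cell counts come from the contrived data above.

```python
from fractions import Fraction

# Cell counts from the contrived table: counts[smoke][sleep]
counts = {
    "no":  {"no": 20, "yes": 30},
    "yes": {"no": 40, "yes": 60},
}
total = sum(sum(row.values()) for row in counts.values())  # 150

def p_sleep():
    """Marginal P(Sleep): ignore smoking status."""
    return Fraction(sum(row["yes"] for row in counts.values()), total)

def p_sleep_given_smoke(smoke):
    """Conditional P(Sleep | Smoke = smoke): restrict to one row of the table."""
    row = counts[smoke]
    return Fraction(row["yes"], sum(row.values()))

print(p_sleep())                    # 3/5
print(p_sleep_given_smoke("no"))    # 3/5
print(p_sleep_given_smoke("yes"))   # 3/5

# Independence: conditioning on smoking status does not change P(Sleep).
independent = p_sleep_given_smoke("yes") == p_sleep() == p_sleep_given_smoke("no")
print(independent)                  # True
```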

Multiplication Rule Given Independence

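• Sixty of 150 have sleep disturbance and smoke, so $P(\text{Sleep} \cap \text{Smoke}) = 60/150 = .40$

• $P(A \cap B) = P(A) \times P(B)$

$P(\text{Sleep} \cap \text{Smoke}) = P(\text{Sleep}) \times P(\text{Smoke}) = \frac{3}{5} \times \frac{2}{3} = \frac{6}{15} = .40$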

“Sleep” = Sexually Active

• Preacher claims those who smoke will go to Hell.

• And those who fornicate will go to Hell.

• What is the probability that a randomly selected coed from this sample will go to Hell?

Addition Rule

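$P(\text{Smoke}) = \frac{100}{150} = \frac{10}{15} \approx .67$

$P(\text{Sleep}) = \frac{90}{150} = \frac{9}{15} = .60$

$P(\text{Sleep}) + P(\text{Smoke}) = \frac{9}{15} + \frac{10}{15} = \frac{19}{15} \approx 1.27$

A probability cannot exceed one. Something is wrong here!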

Welcome to Hell

• The events (sleeping and smoking) are not mutually exclusive.

• We have counted the overlap between sleeping and smoking (the 60 women who do both) twice.

• 30 + 40 + 60 = 130 of the women sleep and/or smoke.

• The probability we seek = 130/150 = 13/15 = .87

Addition Rule For Events That Are NOT Mutually Exclusive

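$P(\text{Sleep} \cup \text{Smoke}) = P(\text{Sleep}) + P(\text{Smoke}) - P(\text{Sleep} \cap \text{Smoke}) = \frac{9}{15} + \frac{10}{15} - \frac{6}{15} = \frac{13}{15} \approx .87$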
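As a quick check of the arithmetic, the sketch below contrasts the naive sum with the general addition rule for the first table. The variable names are illustrative assumptions; the counts (30 sleep only, 40 smoke only, 60 both, 150 in all) are the ones given above.

```python
from fractions import Fraction

n = 150
sleep_only, smoke_only, both = 30, 40, 60   # cells of the first table

p_sleep = Fraction(sleep_only + both, n)    # 90/150
p_smoke = Fraction(smoke_only + both, n)    # 100/150
p_both = Fraction(both, n)                  # 60/150

naive = p_sleep + p_smoke                   # counts the overlap twice
p_either = p_sleep + p_smoke - p_both       # general addition rule

print(naive)      # 19/15, which exceeds one
print(p_either)   # 13/15
```

Subtracting the joint probability once removes the double-counted overlap, which matches counting the 130 women directly.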

Sleep = Sexually Active, Smoke = Use Cannabis

Smoke?    Sleep: No   Sleep: Yes   Total
No        30          20           50
Yes       40          60           100
Total     70          80           150

Marginal Probabilities

Smoke?    Sleep: No   Sleep: Yes   Total
No        30          20           50
Yes       40          60           100
Total     70          80           150

$P(\text{Sleep}) = \frac{80}{150} = \frac{8}{15} \approx .53$

$P(\text{Smoke}) = \frac{100}{150} = \frac{2}{3} \approx .67$

Conditional Probabilities Indicate Nonindependence

Smoke?    Sleep: No   Sleep: Yes   Total
No        30          20           50
Yes       40          60           100
Total     70          80           150

$P(\text{Sleep} \mid \text{Nosmoke}) = \frac{20}{50} = .40$

$P(\text{Sleep} \mid \text{Smoke}) = \frac{60}{100} = .60$

Joint Probability

• What is the probability that a randomly selected coed is both sexually active and a cannabis user?

• There are 60 such coeds, so the probability is 60/150 = .40.

• Now let us see if the multiplication rule works with these data.

Multiplication Rule

$P(\text{Sleep} \cap \text{Smoke}) = P(\text{Sleep}) \times P(\text{Smoke}) = \frac{8}{15} \times \frac{2}{3} = \frac{16}{45} \approx .36$

• Oops, this is wrong. The joint probability is .40. We need to use the more general form of the multiplication rule.

Multiplication Rule NOT Assuming Independence

$P(\text{Smoke} \cap \text{Sleep}) = P(\text{Smoke}) \times P(\text{Sleep} \mid \text{Smoke}) = \frac{2}{3} \times \frac{3}{5} = \frac{6}{15} = .40$

• Now that looks much better.
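The contrast between the two forms of the multiplication rule can be reproduced for the second table. This is a sketch only; the dictionary layout (`counts[smoke][sleep]`) and variable names are assumptions chosen for illustration.

```python
from fractions import Fraction

# Second table: counts[smoke][sleep]
counts = {
    "no":  {"no": 30, "yes": 20},
    "yes": {"no": 40, "yes": 60},
}
n = 150
smokers = sum(counts["yes"].values())                             # 100

p_smoke = Fraction(smokers, n)                                    # 100/150
p_sleep = Fraction(counts["no"]["yes"] + counts["yes"]["yes"], n) # 80/150
p_sleep_given_smoke = Fraction(counts["yes"]["yes"], smokers)     # 60/100
p_joint = Fraction(counts["yes"]["yes"], n)                       # 60/150, counted directly

print(p_sleep * p_smoke)               # 16/45, wrong: assumes independence
print(p_smoke * p_sleep_given_smoke)   # 2/5, matches the direct count
```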

Actual Data From Jury Research

• Castellow, Wuensch, and Moore (1990), Journal of Social Behavior and Personality, 5, 547-562.

• Male employer sued for sexual harassment by female employee.

• Experimentally manipulated physical attractiveness of both litigants

Effect of Plaintiff Attractiveness

• P(Guilty | Attractive) = 56/73 = 77%.
• P(Guilty | Not Attractive) = 39/72 = 54%.
• Defendant found guilty more often if plaintiff was attractive.

Plaintiff Attractive?   Guilty: No   Guilty: Yes   Total
No                      33           39            72
Yes                     17           56            73
Total                   50           95            145

Odds and Odds Ratios

• Odds(Guilty | Attractive) = 56/17
• Odds(Guilty | Not Attractive) = 39/33
• Odds Ratio = (56/17) / (39/33) = 2.79.
• Odds of guilty verdict 2.79 times higher when plaintiff is attractive.
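The odds ratio for plaintiff attractiveness can be recomputed from the four cell counts. The helper name `odds_ratio` and its argument order are hypothetical; the counts are those in the table above.

```python
def odds_ratio(a_yes, a_no, b_yes, b_no):
    """Odds of 'yes' in group A divided by odds of 'yes' in group B."""
    return (a_yes / a_no) / (b_yes / b_no)

# Plaintiff attractive: 56 guilty, 17 not guilty.
# Plaintiff not attractive: 39 guilty, 33 not guilty.
plaintiff_or = odds_ratio(56, 17, 39, 33)
print(round(plaintiff_or, 2))   # 2.79
```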

Plaintiff Attractive?   Guilty: No   Guilty: Yes   Total
No                      33           39            72
Yes                     17           56            73
Total                   50           95            145

Effect of Defendant Attractiveness

• P(Guilty | Not Attractive) = 53/70 = 76%.
• P(Guilty | Attractive) = 42/75 = 56%.
• The defendant was more likely to be found guilty when he was unattractive.

Defendant Attractive?   Guilty: No   Guilty: Yes   Total
No                      17           53            70
Yes                     33           42            75
Total                   50           95            145

Odds and Odds Ratio

• Odds(Guilty | Not Attractive) = 53/17.
• Odds(Guilty | Attractive) = 42/33.
• Odds Ratio = (53/17) / (42/33) = 2.45.
• Odds of guilty verdict about 2.5 times higher when defendant is unattractive.

Defendant Attractive?   Guilty: No   Guilty: Yes   Total
No                      17           53            70
Yes                     33           42            75
Total                   50           95            145

Combined Effects of Plaintiff and Defendant Attractiveness

• Plaintiff attractive, Defendant not = 83% guilty.

• Defendant attractive, Plaintiff not = 41% guilty.

• Odds ratio = (83/17) / (41/59) = 7.03.

• When your attorney tells you to wear your Sunday best to trial, listen.

Odds Ratios and Probability Ratios

• Odds of Success
– 90/10 = 9 for Antibiotic Group
– 40/60 = 2/3 for Homeopathy Group
– Odds Ratio = 9/(2/3) = 13.5

Odds Ratios and Probability Ratios

• Odds of Failure
– 10/90 = 1/9 for Antibiotic Group
– 60/40 = 1.5 for Homeopathy Group
– Odds Ratio = 1.5/(1/9) = 13.5

Notice that the odds ratio comes out the same with both perspectives.

Odds Ratios and Probability Ratios

• Probability of Success
– 90/100 = .9 for Antibiotic Group
– 40/100 = .4 for Homeopathy Group
– Probability Ratio = .9/(.4) = 2.25

Odds Ratios and Probability Ratios

• Probability of Failure
– 10/100 = .1 for Antibiotic Group
– 60/100 = .6 for Homeopathy Group
– Probability Ratio = .6/(.1) = 6

Notice that the probability ratio differs across perspectives.
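The contrast between the two ratios can be verified with exact fractions. This is a minimal sketch assuming 100 patients per group (90 and 40 successes), as the figures above imply; the names are illustrative.

```python
from fractions import Fraction

def odds(p):
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1 - p)

p_success = {"antibiotic": Fraction(90, 100), "homeopathy": Fraction(40, 100)}
p_failure = {group: 1 - p for group, p in p_success.items()}

# Odds ratios: larger odds over smaller odds, from each perspective.
or_success = odds(p_success["antibiotic"]) / odds(p_success["homeopathy"])
or_failure = odds(p_failure["homeopathy"]) / odds(p_failure["antibiotic"])

# Probability ratios: larger probability over smaller, from each perspective.
pr_success = p_success["antibiotic"] / p_success["homeopathy"]
pr_failure = p_failure["homeopathy"] / p_failure["antibiotic"]

print(or_success, or_failure)   # 27/2 27/2  (13.5 either way)
print(pr_success, pr_failure)   # 9/4 6      (2.25 versus 6)
```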

Another Example

• According to Medscape, 0.5% of the general population has narcissistic personality disorder (NPD)

• The rate is 20% among members of the US Military.

Odds Ratios

• Odds of NPD
– Military: .2/.8 = .25
– General: .005/.995 = .005025
– Ratio: .25/.005025 = 49.75

• Odds of NOT NPD
– Military: .8/.2 = 4
– General: .995/.005 = 199
– Ratio: 199/4 = 49.75

Probability Ratios

• Probability of NPD
– Military: 20%
– General: 0.5%
– Ratio: 20/0.5 = 40

• Probability of NOT NPD
– Military: 80%
– General: 99.5%
– Ratio: .995/.8 = 1.24

Probability Distributions

• For a discrete variable, pair each value with the probability of obtaining that value.

• For example, I flip a fair coin five times. What is the probability for each of the six possible outcomes?

• May be a table, a chart, or a formula.

Probability Table

Number of Heads   Percent
0                 3.1
1                 15.6
2                 31.2
3                 31.2
4                 15.6
5                 3.1

Probability Chart

Probability Formula

• y is number of heads, n is number of tosses, p is probability of heads, q is probability of tails

$P(Y = y) = \frac{n!}{y!\,(n - y)!}\, p^{y} q^{n - y}$
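The binomial formula can be evaluated directly for the five-flip example. This is a sketch; the function name `binomial_pmf` is an assumption, and `math.comb` supplies the n!/(y!(n - y)!) term.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for n independent trials, each with success probability p."""
    q = 1 - p
    return comb(n, y) * p**y * q**(n - y)

# Five flips of a fair coin: probability (as a percent) of 0 through 5 heads.
for heads in range(6):
    print(heads, f"{100 * binomial_pmf(heads, 5, 0.5):.1f}")
```

The six percentages sum to 100, as they must for a complete probability distribution.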

Continuous Variable

• There is an infinite number of values, so a table relating each value to a probability would be infinitely large.

• The probability of any exact value is vanishingly small.

• We can find the probability that a randomly selected case has a value between a and b.

Evolution of a Continuous Variable

• I’ll start with a histogram for a discrete variable.

• In each step I’ll double the number of values (and number of bars).

• All the way up to an infinite number of values with each bar infinitely narrow.

• Now one final step, to an uncountably large number of bars, each infinitely narrow, yielding a continuous, uniform distribution ranging from A to B.

• Now I do the same but I start with a binomial distribution with p = .5 and three bars.

• Note that the bars are not all of equal height.

• Each time I split one, I lower the height of the tail-wards one more than the center-wards one.

• Now one final leap to a continuous (normal) distribution with an uncountably large number of infinitely narrow bars.

Random Sampling

• Sampling N data points from a population is random if every possible different sample of size N was equally likely to be selected.

• Random samples most often will be representative of the population.

• Our stats assume random sampling.

Y Random, X Not

Sample   Probability (X)   Probability (Y)
AB       1/2               1/6
AC       0                 1/6
AD       0                 1/6
BC       0                 1/6
BD       0                 1/6
CD       1/2               1/6

Counting Rules

• PSYC 2101 students can skip the material in the rest of this slide show.

Arranging Y Things

• There are Y! ways to arrange Y different things.

• I am getting a four-scoop ice cream cone.
• Chocolate, Vanilla, Coconut, and Mint.
• How many different ways can I arrange these four flavors?
• 4! = 4(3)(2)(1) = 24.

Permutations

• If I have 10 different flavors, how many different ways can I select and arrange 4 different flavors from these 10?

$\frac{N!}{(N - Y)!} = \frac{10!}{(10 - 4)!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6!} = 5040$

Combinations

• Same problem, but order of the flavors does not count.

• There are Y! ways to arrange Y things, so just divide the number of permutations by Y!

$\frac{N!}{(N - Y)!\,Y!} = \frac{10!}{6!\,4!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2} = 210$
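The three counting rules map onto functions in Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The variable names below are illustrative; the numbers are the ice cream examples from the slides.

```python
from math import comb, factorial, perm

arrangements = factorial(4)   # 4 scoops in every possible order
permutations = perm(10, 4)    # select and arrange 4 of 10 flavors
combinations = comb(10, 4)    # select 4 of 10 flavors, order ignored

print(arrangements, permutations, combinations)   # 24 5040 210

# Combinations are permutations divided by the Y! orderings of each selection.
assert combinations == permutations // factorial(4)
```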

Number of Different Strings

• C^L = number of different strings
• C is the number of different characters available
• L is the length of the string.
• Ten different characters (0 – 9) and two-character strings
• 10^2 = 100 different strings

• Use letters instead (A through Z)
• 26^2 = 676 different strings
• Use letters and numbers
• 36^2 = 1,296 different strings
• Use strings of length 1 or 2.
• 36 + 1,296 = 1,332 different strings

• Use strings of length up to 3.
• 36^3 = 46,656 three-character strings
• + 1,332 one- and two-character strings
• 47,988 different strings.
• Use lengths up to 4
• 1,679,616 + 47,988 = 1,727,604
• Use lengths up to 5
• 60,466,176 + 1,727,604 = 62,193,780

• Use strings of length up to 6
• 2,176,782,336 + 62,193,780 = 2,238,976,116 different strings
• That is over 2 BILLION different strings.
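The running totals above follow from summing C^L over every allowed length. A minimal sketch, with `count_strings` as a hypothetical helper name:

```python
def count_strings(n_chars, max_len):
    """Number of strings of length 1..max_len drawn from n_chars characters."""
    return sum(n_chars**length for length in range(1, max_len + 1))

# 36 characters (A through Z plus 0 through 9), lengths up to 1, 2, ..., 6.
for max_len in range(1, 7):
    print(max_len, f"{count_strings(36, max_len):,}")
```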
