Post on 12-Jan-2016
C4, L1, S1
Chapter 2 Probability
C4, L1, S2
I am offered two lotto cards:
– Card 1: has numbers
– Card 2: has numbers
Which card should I take so that I have the greatest chance of winning lotto?
Lotto
C4, L1, S3
In the casino I wait at the roulette wheel until I see a run of at least five reds in a row.
I then bet heavily on a black.
I am now more likely to win.
Roulette
C4, L1, S4
Coin Tossing
I am about to toss a coin 20 times.
What do you expect to happen?
Suppose that the first four tosses have been heads and there are no tails so far. What do you expect will have happened by the end of the 20 tosses?
C4, L1, S5
Coin Tossing
• Option A
– Still expect to get 10 heads and 10 tails. Since there are already 4 heads, now expect to get 6 heads from the remaining 16 tosses. In the next few tosses, expect to get more tails than heads.
• Option B
– There are 16 tosses to go. For these 16 tosses I expect 8 heads and 8 tails. Now expect to get 12 heads and 8 tails for the 20 throws.
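The two options can be checked with a short exact calculation. This is a minimal sketch that assumes a fair coin and independent tosses (the slide implies this but does not state it); the function name `expected_heads` is made up for illustration. Under those assumptions the earlier heads do not change what the remaining 16 tosses are expected to do, which matches the reasoning in Option B.

```python
from fractions import Fraction
from math import comb


def expected_heads(n_tosses: int) -> Fraction:
    """Exact expected number of heads in n_tosses independent fair-coin tosses."""
    total_sequences = 2 ** n_tosses
    # Sum k * P(exactly k heads) over all possible head counts k.
    return sum(
        Fraction(k * comb(n_tosses, k), total_sequences)
        for k in range(n_tosses + 1)
    )


heads_so_far = 4          # the first four tosses were all heads
remaining_tosses = 16     # tosses still to come

expected_total = heads_so_far + expected_heads(remaining_tosses)
print(expected_heads(remaining_tosses))  # 8
print(expected_total)                    # 12
```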
C4, L1, S6
TV Game Show
• In a TV game show, a car will be given away.
– 3 keys are put on the table, with only one of them being the right key. The 3 finalists are given a chance to choose one key and the one who chooses the right key will take the car.
– If you were one of the finalists, would you prefer to be the 1st, 2nd or last to choose a key?
C4, L1, S7
Let’s Make a Deal Game Show
• You pick one of three doors
– two have booby prizes behind them
– one has lots of money behind it
• The game show host then shows you a booby prize behind one of the other doors
• Then he asks you “Do you want to change doors?”
– Should you??! (Does it matter??!)
• See the following website:
• http://www.stat.sc.edu/~west/javahtml/LetsMakeaDeal.html
C4, L1, S8
Game Show Dilemma
Suppose you choose door A. In which case
Monty Hall will show you either door B or C
depending upon what is behind each.
No Switch Strategy ~ here is what happens
Result A B C
Win Car Goat Goat
Lose Goat Car Goat
Lose Goat Goat Car
P(WIN) = 1/3
C4, L1, S9
Game Show Dilemma
Suppose you choose door A, but ultimately
switch. Again Monty Hall will show you either
door B or C depending upon what is behind each.
Switch Strategy ~ here is what happens
Result A B C
Lose Car Goat Goat (Monty will show either B or C. You switch to the one not shown and lose.)
Win Goat Car Goat (Monty will show door C, you switch to B and win.)
Win Goat Goat Car (Monty will show door B, you switch to C and win.)
P(WIN) = 2/3 !!!!
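The two tables above can be reproduced by enumerating the three equally likely car positions. The sketch below assumes, as on the slides, that you first pick door A and that Monty always opens a goat door other than your pick; the function name `win_probability` is made up for illustration.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")


def win_probability(switch: bool, first_pick: str = "A") -> Fraction:
    """Exact chance of winning, enumerating each equally likely car position."""
    wins = 0
    for car in DOORS:
        # Monty may only open a door that is neither your pick nor the car.
        openable = [d for d in DOORS if d != first_pick and d != car]
        shown = openable[0]  # which goat door he opens does not change the result
        final_pick = first_pick
        if switch:
            # Move to the one door that is neither the first pick nor the shown door.
            final_pick = next(d for d in DOORS if d not in (first_pick, shown))
        wins += final_pick == car
    return Fraction(wins, len(DOORS))


print(win_probability(switch=False))  # 1/3
print(win_probability(switch=True))   # 2/3
```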
C4, L1, S10
Matching Birthdays
• In a room with 23 people what is the probability that at least two of them will have the same birthday?
• Answer: .5073 or 50.73% chance!!!!!
• How about 30?
• .7063 or 71% chance!
• How about 40?
• .8912 or 89% chance!
• How about 50?
• .9704 or 97% chance!
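These figures can be reproduced by computing the chance that all birthdays are different and subtracting from 1. The sketch assumes 365 equally likely birthdays and independent people (no leap years, no twins), which the slide does not state; `p_shared_birthday` is a made-up name.

```python
def p_shared_birthday(n_people: int, days: int = 365) -> float:
    """P(at least two of n_people share a birthday), assuming equally likely days."""
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for n in (23, 30, 40, 50):
    print(n, round(p_shared_birthday(n), 4))
# 23 0.5073
# 30 0.7063
# 40 0.8912
# 50 0.9704
```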
C4, L1, S11
Probability
What is Chapter 6 trying to do?
– Introduce us to basic ideas about probabilities:
• what they are and where they come from
• simple probability models (genetics)
• conditional probabilities
• independent events
• Bayes’ Rule
– Teach us how to calculate probabilities:
• tables of counts and using properties of probabilities such as independence.
C4, L1, S12
Probability
I toss a fair coin (where fair means ‘equally likely outcomes’)
What are the possible outcomes?
Head and tail ~ This is called a “dichotomous experiment” because it has only two possible outcomes. S = {H,T}.
What is the probability it will turn up heads?
1/2
I choose a patient at random and observe whether they are successfully treated.
What are the possible outcomes?
“Success” and “Failure”
What is the probability of successful treatment?
?????
What factors influence this probability? ?????
C4, L1, S13
What are Probabilities?
• A probability is a number between 0 & 1 that quantifies uncertainty.
• A probability of 0 identifies impossibility
• A probability of 1 identifies certainty
C4, L1, S14
Where do probabilities come from?
• Probabilities from models:
The probability of getting a four when a fair die is rolled is 1/6 (0.1667 or 16.7% chance)
C4, L1, S15
Where do probabilities come from?
• Probabilities from data, or empirical probabilities
What is the probability that a randomly selected patient is successfully treated?
– In a clinical trial n = 67 patients are “randomly” selected.
– 40 of these patients are successfully treated.
– The estimated probability that a randomly chosen patient will have a successful outcome is 40/67 (0.597 or 59.7% chance)
C4, L1, S16
• Subjective Probabilities– The probability that there will be another
outbreak of ebola in Africa within the next year is 0.1.
– The probability of rain in the next 24 hours is very high. Perhaps the weather forecaster might say there is a 70% chance of rain.
– A doctor may state your chance of successful treatment.
Where do probabilities come from?
C4, L1, S17
Simple Probability Models
“The probability that an event A occurs” is written in shorthand as P(A).
For equally likely outcomes, and a given event A:
P(A) = (Number of outcomes in A) / (Total number of outcomes)
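The counting rule can be written as a few lines of code. This is only an illustrative sketch: the helper name `probability` and the way events are passed as functions are assumptions, and the rule applies only when all outcomes are equally likely.

```python
from fractions import Fraction


def probability(event, sample_space) -> Fraction:
    """P(A) = (number of outcomes in A) / (total number of outcomes)."""
    outcomes = list(sample_space)
    in_event = [outcome for outcome in outcomes if event(outcome)]
    return Fraction(len(in_event), len(outcomes))


die = range(1, 7)     # faces of a fair die
coin = ["H", "T"]     # S = {H, T}

print(probability(lambda face: face == 4, die))    # 1/6
print(probability(lambda side: side == "H", coin)) # 1/2
```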
C4, L1, S18
1. Heart Disease
In 1996, 6631 Minnesotans died from coronary heart disease. The numbers of deaths classified by age and gender are:
Sex
Age Male Female Total
< 45 79 13 92
45 - 64 772 216 988
65 - 74 1081 499 1580
> 74 1795 2176 3971
Total 3727 2904 6631
C4, L1, S19
Let
A be the event of being under 45
B be the event of being male
C be the event of being over 64
1. Heart Disease
C4, L1, S20
Find the probability that a randomly chosen member of this population at the time of death was:
a) under 45 P(A) = 92/6631 = 0.0139
1. Heart Disease
C4, L1, S21
Conditional Probability
• We wish to find the probability of an event occurring given information about the occurrence of another event. For example, what is the probability of developing lung cancer given that we know the person smoked a pack of cigarettes a day for the past 30 years?
• Key words that indicate conditional probability are: “given that”, “of those”, “if …”, “assuming that”
C4, L1, S22
“The probability of event A occurring given that event B has already occurred”
is written in shorthand as P(A|B)
Conditional Probability
C4, L1, S23
Conditional Probability and Independence
P(A|B) = P(A and B) / P(B), P(B) > 0
Two events A and B are said to be independent if
P(A|B) = P(A) and P(B|A) = P(B)
i.e. knowing the occurrence of one of the events tells you nothing about the occurrence of the other.
C4, L1, S24
1. Heart Disease
Find the probability that a randomly chosen member of this population at the time of death was:
b) male assuming that the person was younger than 45.
C4, L1, S25
Find the probability that a randomly chosen member of this population at the time of death was:
b) male given that the person was younger than 45. P(B|A) = 79/92 = 0.8587
1. Heart Disease
P(B|A) = P(A and B)/P(A) = (79/6631)/(92/6631) = 79/92
C4, L1, S26
Find the probability that a randomly chosen member of this population at the time of death was:
c) male and was over 64.
P(B and C) = (1081 + 1795)/6631 = 2876/6631 = .434
1. Heart Disease
C4, L1, S27
Find the probability that a randomly chosen member of this population at the time of death was:
d) over 64 given they were female (not B).
1. Heart Disease
C4, L1, S28
1. Heart Disease
Find the probability that a randomly chosen member of this population at the time of death was:
d) over 64 given they were female (not B).
P(C|not B) = (499 + 2176)/2904 = 2675/2904 = .9211
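The four heart disease answers can be recomputed directly from the table of counts. The sketch below is illustrative only: the dictionary layout and variable names are assumptions, while the counts are the ones in the table. The final comparison of P(B|A) with P(B) is an added check of the independence definition, not a calculation from the slides.

```python
# Deaths from coronary heart disease, Minnesota 1996, by age group and sex.
deaths = {
    "<45":   {"male": 79,   "female": 13},
    "45-64": {"male": 772,  "female": 216},
    "65-74": {"male": 1081, "female": 499},
    ">74":   {"male": 1795, "female": 2176},
}
over_64 = ("65-74", ">74")

total = sum(sum(row.values()) for row in deaths.values())   # 6631
males = sum(row["male"] for row in deaths.values())         # 3727
females = sum(row["female"] for row in deaths.values())     # 2904
under_45 = sum(deaths["<45"].values())                      # 92

p_a = under_45 / total                                      # a) P(A)
p_b_given_a = deaths["<45"]["male"] / under_45              # b) P(B|A)
p_b_and_c = sum(deaths[g]["male"] for g in over_64) / total # c) P(B and C)
p_c_given_not_b = sum(deaths[g]["female"] for g in over_64) / females  # d)
p_b = males / total                                         # unconditional P(B)

print(round(p_a, 4), round(p_b_given_a, 4))          # 0.0139 0.8587
print(round(p_b_and_c, 3), round(p_c_given_not_b, 4))  # 0.434 0.9211
# P(B|A) differs from P(B), so A and B are not independent in this table.
print(round(p_b, 3))                                 # 0.562
```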
C4, L1, S29
2. Hodgkin’s Disease
                 Response to Treatment
Type             None   Partial   Positive   Row Totals
LD               44     10        18         72
LP               12     18        74         104
MC               58     54        154        266
NS               12     16        68         96
Column Totals    126    98        314        n = 538
C4, L1, S30
2. Hodgkin’s Disease
C4, L1, S31
2. Hodgkin’s Disease
a) Had positive response to treatment
P(pos) = 314/538 = .584 or 58.4% chance
C4, L1, S32
2. Hodgkin’s Disease
b) Had at least some response to treatment
P(par or pos) = (98 + 314)/538 = 412/538 = .766 or 76.6% chance
C4, L1, S33
2. Hodgkin’s Disease
c) Had positive response to treatment given they have LP
P(pos|LP) = 74/104 = .7115 or 71.15%
C4, L1, S34
2. Hodgkin’s Disease
d) Had positive response to treatment given they have LD.
P(pos|LD) = 18/72= .25 or 25.0% chance
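Parts a) to d) all come from the same table of counts, so they can be computed with one small helper. This is a sketch under the assumption that a patient is drawn at random from the 538 in the table; `p_response` and the data layout are made-up names for illustration.

```python
RESPONSES = ("none", "partial", "positive")

# Counts by histological type, in the column order of RESPONSES.
counts = {
    "LD": (44, 10, 18),
    "LP": (12, 18, 74),
    "MC": (58, 54, 154),
    "NS": (12, 16, 68),
}


def p_response(wanted, given_type=None) -> float:
    """P(response in wanted), optionally conditional on the disease type."""
    columns = [RESPONSES.index(r) for r in wanted]
    # Conditioning on a type restricts attention to that single row.
    rows = [counts[given_type]] if given_type else list(counts.values())
    favourable = sum(row[c] for row in rows for c in columns)
    return favourable / sum(sum(row) for row in rows)


print(round(p_response(["positive"]), 3))                   # 0.584
print(round(p_response(["partial", "positive"]), 3))        # 0.766
print(round(p_response(["positive"], given_type="LP"), 4))  # 0.7115
print(round(p_response(["positive"], given_type="LD"), 2))  # 0.25
```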
C4, L1, S35
Mosaic Plot with Conditional Probs.
C4, L1, S36
Adding Conditional Probs. in JMP
C4, L1, S37
3. Helmet Use and Head Injuries in Motorcycle Accidents (Wisconsin, 1991)
                 Brain Injury   No Brain Injury   Row Totals
No Helmet        97             1918              2015
Helmet Worn      17             977               994
Column Totals    114            2895              3009
BI = the event the motorcyclist sustains brain injury
NBI = no brain injury
H = the event the motorcyclist was wearing a helmet
NH = no helmet worn
What is the probability that a motorcyclist involved in an accident sustains brain injury?
P(BI) = 114 / 3009 = .0379
C4, L1, S38
3. Helmet Use and Head Injuries in Motorcycle Accidents (Wisconsin, 1991)
What is the probability that a motorcyclist involved in an accident was wearing a helmet?
P(H) = 994 / 3009 = .3303
C4, L1, S39
3. Helmet Use and Head Injuries in Motorcycle Accidents (Wisconsin, 1991)
What is the probability that the cyclist sustained brain injury given they were wearing a helmet?
P(BI|H) = 17 / 994 = .0171
C4, L1, S40
3. Helmet Use and Head Injuries in Motorcycle Accidents (Wisconsin, 1991)
What is the probability that the cyclist sustained brain injury given they were not wearing a helmet?
P(BI|NH) = 97 / 2015 = .0481
C4, L1, S41
3. Helmet Use and Head Injuries in Motorcycle Accidents (Wisconsin, 1991)
How many times more likely is a non-helmet wearer to sustain brain injury?
.0481 / .0171 = 2.81 times more likely. This is called the relative risk or risk ratio (denoted RR).
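The relative risk is just the ratio of the two conditional probabilities, so it is easy to recompute from the counts. A minimal sketch follows; the function name `relative_risk` is an assumption, and the counts are those in the helmet table.

```python
def relative_risk(events_exposed: int, n_exposed: int,
                  events_unexposed: int, n_unexposed: int) -> float:
    """Ratio of the risk in the exposed group to the risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


p_bi_given_nh = 97 / 2015   # brain injury among riders with no helmet
p_bi_given_h = 17 / 994     # brain injury among riders with a helmet

rr = relative_risk(97, 2015, 17, 994)
print(round(p_bi_given_nh, 4), round(p_bi_given_h, 4))  # 0.0481 0.0171
print(round(rr, 2))                                     # 2.81
```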
C4, L1, S42
Example 3: Helmet Use and Head Injuries in Motorcycle Accidents (Wisconsin, 1991)
The shading for Brain Injury for the No Helmet group is roughly three times higher than the shading for Brain Injury for the Helmet Worn group. (recall RR = 2.81)
Motorcyclists not wearing a helmet are at roughly three times the risk of suffering brain injury.
C4, L1, S43
Building a Contingency Table from a Story
4. HIV Example
A European study on the transmission of the HIV virus involved 470 heterosexual couples. Originally only one of the partners in each couple was infected with the virus. There were 293 couples that always used condoms. From this group, 3 of the non-infected partners became infected with the virus. Of the 177 couples who did not always use a condom, 20 of the non-infected partners became infected with the virus.
C4, L1, S44
4. HIV Example
Let C be the event that the couple always used condoms. (NC is the complement.)
Let I be the event that the non-infected partner became infected. (NI is the complement.)
                    Condom Usage
Infection Status    C      NC     Total
I
NI
Total
C4, L1, S45
4. HIV Example
A European study on the transmission of the HIV virus involved 470 heterosexual couples. Originally only one of the partners in each couple was infected with the virus. There were 293 couples that always used condoms. From this group, 3 of the non-infected partners became infected with the virus.
                    Condom Usage
Infection Status    C      NC     Total
I                   3
NI
Total               293           470
C4, L1, S46
4. HIV Example
Of the 177 couples who did not always use a condom, 20 of the non-infected partners became infected with the virus.
                    Condom Usage
Infection Status    C      NC     Total
I                   3      20     23
NI                  290    157    447
Total               293    177    470
C4, L1, S47
a) What proportion of the couples in this study always used condoms?
4. HIV Example
P(C)
C4, L1, S48
a) What proportion of the couples in this study always used condoms?
4. HIV Example
P(C) = 293/470 = 0.623
C4, L1, S49
b) If a non-infected partner became infected, what is the probability that he/she was one of a couple that always used condoms?
4. HIV Example
P(C|I ) = 3/23 = 0.130
C4, L1, S50
4. HIV Example
c) In what percentage of couples did the non-infected partner become infected amongst those that did not always use condoms?
P(I|NC) = 20/177 = .113 or 11.3%
• Amongst those that did always use condoms?
P(I|C) = 3/293 = .0102 or 1.02%
• What is the relative risk of infection associated with not always using a condom?
RR = P(I|NC) / P(I|C) = .113/.0102 = 11.08 times more likely to become infected (11.04 if the unrounded fractions are used).
C4, L1, S51
4. HIV Example
The percentage of couples where the non-infected partner became infected in the non-condom user group is about 11 times higher than that for the condom group.
The risk of infection is about 11 times higher in the no-condom group.
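Building the contingency table from the story amounts to filling in the stated counts and obtaining the rest by subtraction. The sketch below follows that process; the variable names and dictionary layout are assumptions for illustration, and only the five numbers from the story are entered by hand.

```python
# Counts stated in the story.
total_couples = 470
always_condoms = 293
infected_with_condoms = 3
not_always_condoms = 177
infected_without_condoms = 20

# Remaining cells follow by subtraction.
table = {
    "C":  {"I": infected_with_condoms,
           "NI": always_condoms - infected_with_condoms},
    "NC": {"I": infected_without_condoms,
           "NI": not_always_condoms - infected_without_condoms},
}
infected_total = table["C"]["I"] + table["NC"]["I"]
assert always_condoms + not_always_condoms == total_couples

p_c = always_condoms / total_couples              # a) P(C)
p_c_given_i = table["C"]["I"] / infected_total    # b) P(C|I)
p_i_given_nc = table["NC"]["I"] / not_always_condoms
p_i_given_c = table["C"]["I"] / always_condoms
rr = p_i_given_nc / p_i_given_c                   # relative risk, unrounded inputs

print(table["C"]["NI"], table["NC"]["NI"], infected_total)  # 290 157 23
print(round(p_c, 3), round(p_c_given_i, 3))                 # 0.623 0.13
print(round(p_i_given_nc, 3), round(p_i_given_c, 4))        # 0.113 0.0102
print(round(rr, 2))                                         # 11.04
```

The slide's 11.08 comes from dividing the rounded values .113 and .0102; either way the risk is about 11 times higher.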
C4, L1, S52
Relative Risk (RR) and Odds Ratio (OR)
Example: Age at First Pregnancy and Cervical Cancer
A case-control study was conducted to determine whether there was increased risk of cervical cancer amongst women who had their first child before age 25. A sample of 49 women with cervical cancer was taken of which 42 had their first child before the age of 25. From a sample of 317 “similar” women without cervical cancer it was found that 203 of them had their first child before age 25.
Q: Do these data suggest that having a child at or before age 25 increases risk of cervical cancer?
C4, L1, S53
Relative Risk (RR) and Odds Ratio (OR)
The ODDS for an event A are defined as
Odds for A = P(A) / (1 – P(A))
For example, suppose we roll a single die. The odds for a 3 are:
Odds for 3 = P(3)/(1 – P(3)) = (1/6)/(1 – (1/6)) = 1/5
i.e. 1 three for every 5 rolls that don’t result in a three.
(Odds for a 3 are 1:5 and odds against are 5:1)
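The odds definition translates directly into a one-line function. This sketch uses exact fractions so the 1:5 result is visible without rounding; the name `odds` is chosen for illustration.

```python
from fractions import Fraction


def odds(p):
    """Odds for an event with probability p, i.e. p / (1 - p). Requires p < 1."""
    return p / (1 - p)


p_three = Fraction(1, 6)        # fair die: P(3) = 1/6
print(odds(p_three))            # 1/5, i.e. odds of 1:5 for a three
print(odds(Fraction(1, 2)))     # 1, i.e. even odds for a fair coin
```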
C4, L1, S54
Relative Risk (RR) and Odds Ratio (OR)
The Odds Ratio (OR) for a disease associated with a risk factor is the ratio of the odds for disease for those with the risk factor to the odds for disease for those without the risk factor.
OR = [P(Disease|Risk Factor) / (1 – P(Disease|Risk Factor))] / [P(Disease|No Risk Factor) / (1 – P(Disease|No Risk Factor))]
= (Odds for disease amongst those with risk factor present) / (Odds for disease amongst those without the risk factor)
The Odds Ratio gives us the multiplicative increase in odds associated with having the “risk factor”.
C4, L1, S55
Relative Risk (RR) and Odds Ratio (OR)
                        Cervical Cancer
Age at 1st Pregnancy    Case   Control   Row Totals
Age < 25                42     203       245
Age > 25                7      114       121
Column Totals           49     317       n = 366
a) Why can’t we calculate P(Cervical Cancer | Age < 25)?
Because the number of women with disease was fixed in advance and therefore NOT RANDOM!
C4, L1, S56
Relative Risk (RR) and Odds Ratio (OR)
b) What is P(risk factor|disease status) for each group?
P(Age < 25|Case) = 42/49 = .857 or 85.7%
P(Age < 25|Control) = 203/317 = .640 or 64.0%
C4, L1, S57
Relative Risk (RR) and Odds Ratio (OR)
c) What are the odds for the risk factor amongst the cases? Amongst the controls?
Odds for risk factor, cases = .857/(1 - .857) = 5.99
Odds for risk factor, controls = .64/(1 - .64) = 1.78
C4, L1, S58
Relative Risk (RR) and Odds Ratio (OR)
d) What is the odds ratio for the risk factor associated with being a case?
Odds Ratio (OR) = 5.99/1.78 = 3.37. The odds for having the 1st child on or before age 25 are 3.37 times higher for women who currently have cervical cancer than for those who do not have cervical cancer.
C4, L1, S59
Relative Risk (RR) and Odds Ratio (OR)
Odds Ratio
The ratio of dark to light shading is 3.37 times larger for the cervical cancer group than it is for the control group.
C4, L1, S60
Relative Risk (RR) and Odds Ratio (OR)
e) Even though it is inappropriate to do so, calculate P(disease|risk status).
P(case|Age<25) = 42/245 = .171 or 17.1%
P(case|Age>25) = 7/121 = .058 or 5.8%
Now calculate the odds for disease given the risk factor status.
Odds for Disease for 1st Preg. Age < 25 = .171/(1 - .171) = .207
Odds for Disease for 1st Preg. Age > 25 = .058/(1 - .058) = .061
C4, L1, S61
Relative Risk (RR) and Odds Ratio (OR)
f) Finally calculate the odds ratio for disease associated with 1st pregnancy age < 25 years of age.
Odds Ratio = .207/.061 = 3.37 (using the unrounded odds 42/203 and 7/114)
This is exactly the same as the odds ratio for having the risk factor (Age < 25) associated with being in the cervical cancer group!!!!
Final Conclusion: Women who have their first child at or before age 25 have 3.37 times the odds of developing cervical cancer when compared to women who had their first child after the age of 25.
C4, L1, S62
Relative Risk (RR) and Odds Ratio (OR)
                       Disease Status
Risk Factor Status     Case   Control
Risk Factor Present    a      b
Risk Factor Absent     c      d
OR = (a X d)/(b X c)
Much easier computational formula!!!
C4, L1, S63
Relative Risk (RR) and Odds Ratio (OR)
When the disease is fairly rare, i.e. P(disease) < .10 or 10%, then one can show that the odds ratio and relative risk are similar.
OR is approximately equal to RR when
P(disease) < .10 or 10% chance.
In these cases we can use the phrase:
“… times more likely” when interpreting the OR.
C4, L1, S64
Relative Risk (RR) and Odds Ratio (OR)
Age at 1st Pregnancy    Case     Control   Row Totals
Age < 25                a = 42   b = 203   245
Age > 25                c = 7    d = 114   121
Column Totals           49       317       n = 366
OR = (42 X 114)/(7 X 203) = 3.37
Because less than 10% of the population of women develop cervical cancer, we can say that women who have their first child at or before age 25 are 3.37 times more likely to develop cervical cancer than women who have their first child after age 25.
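The claim that the odds ratio is the same whichever way the table is read can be checked with exact fractions. The sketch below uses the a, b, c, d cell labels from the slides; the variable names for the three versions of the odds ratio are made up for illustration.

```python
from fractions import Fraction

# 2x2 table cells: a, b = risk factor present (case, control);
#                  c, d = risk factor absent  (case, control).
a, b, c, d = 42, 203, 7, 114

# Cross-product shortcut: OR = (a x d) / (b x c).
or_cross_product = Fraction(a * d, b * c)

# Odds of having the risk factor among cases vs. among controls.
or_risk_factor = Fraction(a, c) / Fraction(b, d)

# Odds of being a case among those with vs. without the risk factor.
or_disease = Fraction(a, b) / Fraction(c, d)

# All three are the same number, as the slides point out.
assert or_cross_product == or_risk_factor == or_disease
print(round(float(or_cross_product), 2))  # 3.37
```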
C4, L1, S65
More About RR and OR
• The most commonly cited advantage of the RR over the OR is that the former has the more natural interpretation. The relative risk comes closer to what most people think of when they compare the relative likelihood of events.
e.g. suppose there are two groups, one with a 25% chance of mortality and the other with a 50% chance of mortality. Most people would say that the latter group has it twice as bad. But the odds ratio is 3, which seems too big.
RR = .50/.25 = 2.00
OR = [P(death|high mortality)/P(survive|high mortality)] / [P(death|low mortality)/P(survive|low mortality)]
= [.50/(1 - .50)] / [.25/(1 - .25)] = 3.00
C4, L1, S66
More About RR and OR
Even more extreme examples are possible. A change from 25% to 75% mortality represents a relative risk of 3, but an odds ratio of 9. A change from 10% to 90% mortality represents a relative risk of 9 but an odds ratio of 81.
RR = .90/.10 = 9.00
OR = [P(death|high mortality)/P(survive|high mortality)] / [P(death|low mortality)/P(survive|low mortality)]
= [.90/(1 - .90)] / [.10/(1 - .10)] = 81.00
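The three mortality comparisons can be tabulated side by side to show how the OR pulls away from the RR as the outcome becomes common. This is a small sketch using exact fractions; the function names are chosen for illustration.

```python
from fractions import Fraction


def relative_risk(p_high: Fraction, p_low: Fraction) -> Fraction:
    """Ratio of the two mortality probabilities."""
    return p_high / p_low


def odds_ratio(p_high: Fraction, p_low: Fraction) -> Fraction:
    """Ratio of the odds of death in the two groups."""
    return (p_high / (1 - p_high)) / (p_low / (1 - p_low))


# (low mortality %, high mortality %) pairs taken from the slides.
for low, high in [(25, 50), (25, 75), (10, 90)]:
    p_low, p_high = Fraction(low, 100), Fraction(high, 100)
    print(low, high, relative_risk(p_high, p_low), odds_ratio(p_high, p_low))
# 25 50 2 3
# 25 75 3 9
# 10 90 9 81
```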
C4, L1, S67
More About RR and OR
• ORs arise as part of logistic regression, which we will study later in the course.
• Despite their pitfalls, ORs are really the only option when case-control studies are used.
• Any study of risk needs to adjust for potential confounding factors which is typically done using logistic regression.