
Binomial Distribution

Definition 4.1. A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one and only one numerical value is assigned to each sample point. Random variables that can assume a finite or countable number of values are called discrete. Random variables that can assume values corresponding to all of the points contained in one or more intervals are called continuous.


Example 4.1. Identify the following variables as discrete or continuous.

(a) The reaction time difference to the stimulus before and after training. Answer: continuous

(b) The number of violent crimes committed per month in your community. Answer: discrete

(c) The number of commercial aircraft near-misses per month. Answer: discrete

(d) The number of winners each week in a state lottery. Answer: discrete

(e) The number of free throws made per game by a basketball team. Answer: discrete

(f) The distance traveled by a school bus each day. Answer: continuous


To completely describe a discrete random variable one must specify the possible values that the random variable can assume and the probability associated with each value.

The probability distribution of a discrete random variable must satisfy the following two rules.

(1) $p(x) \ge 0$ for all x

(2) $\sum p(x) = 1$, where the summation of p(x) is over all possible values of x.


Example 4.2. In each case determine whether the given values can serve as the probabilities for a random variable that can take on the values 1, 2, 3, 4.

(a) p(1) = .2, p(2) = .8, p(3) = .2, p(4) = -.2

(b) p(1) = .25, p(2) = .17, p(3) = .39, p(4) = .19


Sometimes it helps to tabulate the discrete probability distribution(s), here with one column for each of the cases (a) and (b):

x    (a) p(x)    (b) p(x)
1    .2          .25
2    .8          .17
3    .2          .39
4    -.2         .19


Solution. In both (a) and (b), p(1) + p(2) + p(3) + p(4) = 1. However, in (a) p(4) = -.2. Recall that one of the rules is that probabilities are non-negative. So only the set in (b) can serve as the probabilities of a random variable with possible values 1, 2, 3, 4.
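The two rules can also be checked mechanically. The following is a minimal sketch, not part of the original notes; the function name `is_valid_pmf` and the rounding tolerance `tol` are assumptions made for illustration.

```python
def is_valid_pmf(probs, tol=1e-9):
    """Return True if every probability is non-negative and they sum to 1.

    `probs` maps each possible value x to p(x). `tol` is an assumed
    tolerance that absorbs floating-point rounding in the sum.
    """
    nonnegative = all(p >= 0 for p in probs.values())    # rule (1)
    sums_to_one = abs(sum(probs.values()) - 1.0) <= tol  # rule (2)
    return nonnegative and sums_to_one

# The two candidate assignments from Example 4.2.
set_a = {1: 0.2, 2: 0.8, 3: 0.2, 4: -0.2}
set_b = {1: 0.25, 2: 0.17, 3: 0.39, 4: 0.19}

print(is_valid_pmf(set_a))  # False, because p(4) is negative
print(is_valid_pmf(set_b))  # True
```

Both sets pass rule (2), so the check on rule (1) is what separates them.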


Definition 4.2. The mean, or expected value, of a discrete random variable x is $\mu = E(x) = \sum x\,p(x)$. The variance of a discrete random variable is $\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x)$. The standard deviation of a discrete random variable is equal to the square root of the variance, i.e., $\sigma = \sqrt{\sigma^2}$.


Example 4.3. Suppose that the probabilities are .4, .3, .2, and .1 that 1, 2, 3, or 4 new anti-inflammatory drugs respectively will be approved by the FDA in the year 2003.

(a) Find the mean of this distribution.

Solution. $\mu = \sum x\,p(x) = (1)(.4) + (2)(.3) + (3)(.2) + (4)(.1) = 2$

(b) Find the variance of this distribution.

Solution. $\sigma^2 = \sum (x - \mu)^2 p(x) = (1 - 2)^2(.4) + (2 - 2)^2(.3) + (3 - 2)^2(.2) + (4 - 2)^2(.1) = 1$
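The same arithmetic can be reproduced in a few lines. This is an illustrative sketch only; the names `mean_and_variance` and `drug_pmf` are hypothetical and do not come from the notes.

```python
import math

def mean_and_variance(pmf):
    """Apply Definition 4.2: mu = sum of x*p(x), var = sum of (x - mu)^2 * p(x)."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

# Distribution from Example 4.3: number of drugs approved -> probability.
drug_pmf = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

mu, var = mean_and_variance(drug_pmf)
sigma = math.sqrt(var)  # standard deviation is the square root of the variance
print(round(mu, 3), round(var, 3), round(sigma, 3))
```

Up to floating-point rounding this gives a mean of 2, a variance of 1, and therefore a standard deviation of 1.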


Let x be a discrete random variable with probability distribution p(x), mean $\mu$, and standard deviation $\sigma$. Then, depending on the shape of p(x), the following probability statements can be made:

Probability                                  Chebyshev’s Rule    Empirical Rule
$P(\mu - \sigma < x < \mu + \sigma)$         $\ge 0$             $\approx .68$
$P(\mu - 2\sigma < x < \mu + 2\sigma)$       $\ge 3/4$           $\approx .95$
$P(\mu - 3\sigma < x < \mu + 3\sigma)$       $\ge 8/9$           $\approx 1.00$

Chebyshev’s rule applies to any probability distribution. The empirical rule applies to distributions that are mound-shaped and symmetric.
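As a rough check of Chebyshev's rule, the sketch below evaluates the three interval probabilities for the distribution of Example 4.3, using the mean 2 and standard deviation 1 found there. The helper name `prob_within` is hypothetical. The bound is written as 1 - 1/k^2, the general form that yields the values 0, 3/4, and 8/9 for k = 1, 2, 3.

```python
def prob_within(pmf, mu, sigma, k):
    """P(mu - k*sigma < x < mu + k*sigma) for a discrete distribution."""
    lower, upper = mu - k * sigma, mu + k * sigma
    return sum(p for x, p in pmf.items() if lower < x < upper)

pmf = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}  # distribution from Example 4.3
mu, sigma = 2, 1                         # mean and standard deviation from Example 4.3

for k in (1, 2, 3):
    chebyshev_bound = 1 - 1 / k**2       # 0, 3/4, 8/9
    actual = prob_within(pmf, mu, sigma, k)
    print(k, round(actual, 3), round(chebyshev_bound, 3))
```

Each probability is at least as large as its Chebyshev bound. This distribution is skewed rather than mound-shaped and symmetric, so the empirical-rule figures should not be expected to match it.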


Recall that if events A and B are independent, then $p(A \cap B) = p(A)\,p(B)$. One can also show that if four events A, B, C, D are mutually independent (independence of each pair alone is not enough), then $p(A \cap B \cap C \cap D) = p(A)\,p(B)\,p(C)\,p(D)$.


Example 4.4. Consider a population of voters that is .4 Democrats and .6 Republicans. Suppose we choose a voter at random from the population, put the voter back in the population, and repeat the process a total of four times.

Each time we take a voter there are 2 possible outcomes: D and R. We take a voter 4 times. Therefore there are 2 x 2 x 2 x 2 = 16 outcomes.


DDDD RRRR

DDDR RRRD

DDRR RRDD

DDRD RRDR

DRDD RDRR

DRRD RDDR

DRDR RDRD

DRRR RDDD


Since we put a voter back after polling her or him, P(D) = .4 and P(R) =.6 on each of the four draws. So the four draws are independent, i.e., the probability of a D (or R) on a given draw does not depend on what the outcomes of the previous draws were. So we get the probability of the intersection of any set of four results on the four draws by multiplying.


Outcome   P                    Outcome   P
DDDD      .4 x .4 x .4 x .4    RRRR      .6 x .6 x .6 x .6
DDDR      .4 x .4 x .4 x .6    RRRD      .6 x .6 x .6 x .4
DDRR      .4 x .4 x .6 x .6    RRDD      .6 x .6 x .4 x .4
DDRD      .4 x .4 x .6 x .4    RRDR      .6 x .6 x .4 x .6
DRDD      .4 x .6 x .4 x .4    RDRR      .6 x .4 x .6 x .6
DRRD      .4 x .6 x .6 x .4    RDDR      .6 x .4 x .4 x .6
DRDR      .4 x .6 x .4 x .6    RDRD      .6 x .4 x .6 x .4
DRRR      .4 x .6 x .6 x .6    RDDD      .6 x .4 x .4 x .4


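Collecting equal factors, the same probabilities can be written as powers of .4 and .6:

Outcome   P                  Outcome   P
DDDD      (.4)^4 x (.6)^0    RRRR      (.4)^0 x (.6)^4
DDDR      (.4)^3 x (.6)^1    RRRD      (.4)^1 x (.6)^3
DDRR      (.4)^2 x (.6)^2    RRDD      (.4)^2 x (.6)^2
DDRD      (.4)^3 x (.6)^1    RRDR      (.4)^1 x (.6)^3
DRDD      (.4)^3 x (.6)^1    RDRR      (.4)^1 x (.6)^3
DRRD      (.4)^2 x (.6)^2    RDDR      (.4)^2 x (.6)^2
DRDR      (.4)^2 x (.6)^2    RDRD      (.4)^2 x (.6)^2
DRRR      (.4)^1 x (.6)^3    RDDD      (.4)^3 x (.6)^1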


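Now the random variable we are interested in is X = the number of Democrats we get in four draws from the voter pool. X has the following set of possible values: {0, 1, 2, 3, 4}. What are the probabilities of each value? Each value of X is produced by several of the 16 outcomes, all with the same probability, so we multiply that probability by the number of such outcomes:

X    P(X)
0    $1 \cdot (.4)^0 (.6)^4$
1    $4 \cdot (.4)^1 (.6)^3$
2    $6 \cdot (.4)^2 (.6)^2$
3    $4 \cdot (.4)^3 (.6)^1$
4    $1 \cdot (.4)^4 (.6)^0$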
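The counting argument can be verified by brute force: list all 16 outcomes, multiply the per-draw probabilities, and group by the number of D's. This is a sketch for illustration; the variable names are not from the notes.

```python
from collections import Counter
from itertools import product

P = {"D": 0.4, "R": 0.6}  # per-draw probabilities from Example 4.4
n_draws = 4

counts = Counter()  # number of outcomes that contain x Democrats
probs = Counter()   # total probability of getting x Democrats

for outcome in product("DR", repeat=n_draws):  # all 2^4 = 16 sequences
    x = outcome.count("D")
    prob = 1.0
    for voter in outcome:  # draws are independent, so multiply
        prob *= P[voter]
    counts[x] += 1
    probs[x] += prob

for x in range(n_draws + 1):
    print(x, counts[x], round(probs[x], 4))
```

The outcome counts come out as 1, 4, 6, 4, 1, which are the multipliers that appear in the table.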


But $1 = \frac{4!}{0!\,4!} = \binom{4}{0} = \binom{4}{4}$, $4 = \frac{4!}{1!\,3!} = \binom{4}{1} = \binom{4}{3}$, and $6 = \frac{4!}{2!\,2!} = \binom{4}{2}$. So we can write the probability distribution for x as:

X    P(X)
0    $\binom{4}{0} (.4)^0 (.6)^4$
1    $\binom{4}{1} (.4)^1 (.6)^3$
2    $\binom{4}{2} (.4)^2 (.6)^2$
3    $\binom{4}{3} (.4)^3 (.6)^1$
4    $\binom{4}{4} (.4)^4 (.6)^0$

So we can say that in choosing four voters one at a time with replacement the probability that the number of Democrats is x is

$p(X = x) = \binom{4}{x} (.4)^x (.6)^{4 - x}$ for x = 0, 1, 2, 3, 4.


Now we chose 4 voters, but suppose instead that we were interested in n. Also, we said that p(D) and p(R) in a single draw were .4 and .6 respectively. Suppose instead that they were p and (1 - p) = q respectively, where p is any number such that 0 < p < 1.

Then we can say that in choosing n voters one at a time with replacement, where P(D) = p and P(R) = 1 - p = q, the probability that the number of Democrats is x is

$P(X = x) = \binom{n}{x} p^x q^{n - x}$ for x = 0, 1, 2, ..., n.

We call this the binomial probability distribution of x successes in n trials, or b(n, p).
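The formula translates directly into code. The sketch below assumes Python 3.8 or later for `math.comb`; the function name `binomial_pmf` is hypothetical. It is evaluated for the voter example, n = 4 and p = .4.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p): C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

# Voter example: n = 4 draws, P(D) = .4 on each draw.
table = [round(binomial_pmf(x, 4, 0.4), 3) for x in range(5)]
print(table)
```

Because the five values cover every possible x, they should add up to 1, which is a quick sanity check on any implementation of the formula.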


Notice there were four key assumptions in developing this distribution:

1. There is a fixed number, n, of identical repetitions or trials. In our case a trial was drawing a voter.

2. There are only two possible outcomes on each trial. We will denote one outcome by S (for Success) and the other by F (for Failure).

3. The probability of S, p, is the same for each trial, as is the probability of F, 1 - p = q. In our case we made this true by saying that we replace the voter we drew on one trial before the next one. If we are polling voter populations with very large numbers of Democrats and Republicans this assumption will be approximately true even without replacement.

4. The trials are all independent. Again, this was true in our case by virtue of replacement, but it will generally be approximately true when polling voter populations with very large numbers of Democrats and Republicans.
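The claim that assumptions 3 and 4 hold approximately for large populations can be illustrated numerically. The sketch below compares the binomial probability of 2 Democrats in 4 draws with the exact probability when drawing without replacement (the hypergeometric distribution, which the notes do not name). The population sizes 10, 100, and 10,000 and the function names are assumptions chosen purely for illustration.

```python
from math import comb

def with_replacement(x, n, p):
    """Binomial probability of x Democrats in n draws with replacement."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def without_replacement(x, n, democrats, republicans):
    """Probability of x Democrats in n draws without replacement
    (hypergeometric): choose x of the Democrats and n - x of the Republicans."""
    total = democrats + republicans
    return comb(democrats, x) * comb(republicans, n - x) / comb(total, n)

n, x = 4, 2
binomial = with_replacement(x, n, 0.4)

for population in (10, 100, 10_000):   # hypothetical population sizes
    democrats = 4 * population // 10   # 40% Democrats
    republicans = population - democrats
    exact = without_replacement(x, n, democrats, republicans)
    print(population, round(exact, 4), round(binomial, 4))
```

As the population grows, the without-replacement probability moves toward the binomial value, which is why the binomial model is commonly used for polls of large populations.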


The binomial random variable x is the number of S’s in n trials. We call the probability distribution of x b(n, p) to indicate that it depends on n and p.

The picture of a binomial distribution b(n, p) depends on n and p. For our example n was 4 and p was .4. The probabilities work out as:

X    P(X)
0    $1 \cdot (.4)^0 (.6)^4 = .130$
1    $4 \cdot (.4)^1 (.6)^3 = .346$
2    $6 \cdot (.4)^2 (.6)^2 = .346$
3    $4 \cdot (.4)^3 (.6)^1 = .154$
4    $1 \cdot (.4)^4 (.6)^0 = .026$


[Figure: bar chart of the b(4, .4) distribution. Horizontal axis: x = 0, 1, 2, 3, 4. Vertical axis: P(X), from 0 to 0.35.]

X P(X)

0 .130

1 .346

2 .346

3 .154

4 .026


Since calculations with the formula can become tedious, tables of cumulative binomial probabilities for n = 5-10, 15, 20, and 25 have been constructed and are available as Table II on pp. 798-801 of the book.

In the cumulative tables you first find the value of n in which you are interested. Then you find the value of p in the top row of the n table you chose. Then the vertical k values give the cumulative binomial probability from x = 0 up to k.


Example 4.6. Looking at n = 6, p = .7, and k = 3 we find the entry .256. What this means is that for the b(n, p) = b(6, .7) distribution, p(0) + p(1) + p(2) + p(3) = .256.
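A cumulative table entry is just a running sum of binomial probabilities, so it can be reproduced directly. This is a minimal sketch; `binomial_cdf` is a hypothetical helper name, not something defined in the book.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Cumulative probability p(0) + p(1) + ... + p(k) for b(n, p)."""
    q = 1.0 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k + 1))

# Entry for n = 6, p = .7, k = 3, as in Example 4.6.
entry = round(binomial_cdf(3, 6, 0.7), 3)
print(entry)  # 0.256
```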


One can also use the cumulative table to

find the probability of a single value of x.


Example 4.7. In Example 4.5 we used the binomial formula to calculate p(4) for b(n, p) = b(6, .3) and found it to be .06. But in this case, p(4) = [p(0) + p(1) + p(2) + p(3) + p(4)] - [p(0) + p(1) + p(2) + p(3)], which from the cumulative table is .989 - .930 = .059 $\approx$ .06.
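The subtraction trick can be checked against the binomial formula itself. The sketch below uses hypothetical helper names `pmf` and `cdf` and computes the cumulative values exactly rather than reading them from Table II.

```python
from math import comb

def pmf(x, n, p):
    """Binomial probability P(X = x) for b(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(k, n, p):
    """Cumulative probability P(X <= k) for b(n, p)."""
    return sum(pmf(x, n, p) for x in range(k + 1))

n, p = 6, 0.3
by_subtraction = cdf(4, n, p) - cdf(3, n, p)  # P(X <= 4) - P(X <= 3)
direct = pmf(4, n, p)                          # formula for P(X = 4)

print(round(cdf(4, n, p), 3), round(cdf(3, n, p), 3))
print(round(by_subtraction, 4), round(direct, 4))
```

The two routes agree apart from floating-point rounding. The small gap between .059 and .06 in the example comes only from the three-decimal rounding of the table entries.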


Translating word problems into questions about the binomial distribution, and then using the cumulative binomial table to answer them, is an art that requires practice.


Example 4.8. Experience has shown that 30% of the rocket launchings at a NASA base have to be delayed due to weather conditions. Use Table II to determine the probabilities that among ten rocket launchings at that base

(a) At most three will have to be delayed due to weather conditions.

(b) At least six will have to be delayed due to weather conditions.


Solution. p = .3 and n = 10.

(a) “At most three” means 0, 1, 2, or 3. From Table II with n = 10 and p = .3 we find that p(0) + p(1) + p(2) + p(3) = .650.

(b) “At least six” means 6, 7, 8, 9, or 10. From the table with n = 10 and p = .3 we seek p(6) + p(7) + p(8) + p(9) + p(10) = 1 - [p(0) + p(1) + p(2) + p(3) + p(4) + p(5)] = 1 - .953 = .047.
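Both answers follow from cumulative probabilities, with "at least six" handled through the complement. A sketch under the same b(10, .3) assumptions, with a hypothetical helper `binomial_cdf` standing in for Table II:

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for b(n, p), the quantity tabulated in a cumulative table."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p = 10, 0.3  # ten launchings, each delayed with probability .3

at_most_three = binomial_cdf(3, n, p)     # P(X <= 3)
at_least_six = 1 - binomial_cdf(5, n, p)  # P(X >= 6) = 1 - P(X <= 5)

print(round(at_most_three, 3), round(at_least_six, 3))
```

Rounded to three decimals, these agree with the table-based answers .650 and .047.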


In the case of a b(n, p) distribution a theorem gives us some special results:

$\mu = \sum x\,p(x) = np$

$\sigma^2 = \sum (x - \mu)^2 p(x) = npq$

$\sigma = \sqrt{npq}$
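The theorem can be spot-checked by computing the mean and variance from their definitions and comparing them with np and npq. The sketch below does this for the b(4, .4) voter example. It is a numerical illustration, not a proof, and the function name `binomial_moments` is hypothetical.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean and variance of b(n, p), computed directly from the definitions."""
    q = 1.0 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mu = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mu) ** 2 * px for x, px in enumerate(pmf))
    return mu, var

n, p = 4, 0.4
mu, var = binomial_moments(n, p)

print(round(mu, 4), round(n * p, 4))             # definition vs. np
print(round(var, 4), round(n * p * (1 - p), 4))  # definition vs. npq
print(round(sqrt(var), 4))                       # standard deviation
```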


Example 4.9. If 80% of certain videocassette recorders will function successfully through the 90-day warranty period, find the mean and standard deviation of the number of these videocassette recorders, among 10 randomly selected, which will function successfully through the 90-day warranty period, using:


(a) Table II, the formula that defines $\mu$, and the formula that defines $\sigma^2$.

Solution. p = .8 and n = 10.

$\mu = \sum x\,p(x)$ = (0)p(0) + (1)p(1) + (2)p(2) + (3)p(3) + (4)p(4) + (5)p(5) + (6)p(6) + (7)p(7) + (8)p(8) + (9)p(9) + (10)p(10) =, by Table II, (0)(0) + (1)(0) + (2)(0) + (3)(.001) + (4)(.005) + (5)(.027) + (6)(.088) + (7)(.201) + (8)(.302) + (9)(.269) + (10)(.107) = 8

$\sigma^2 = \sum (x - \mu)^2 p(x)$ =, by Table II, $(-8)^2(0) + (-7)^2(0) + (-6)^2(0) + (-5)^2(.001) + (-4)^2(.005) + (-3)^2(.027) + (-2)^2(.088) + (-1)^2(.201) + (0)^2(.302) + (1)^2(.269) + (2)^2(.107) = 1.598$, so $\sigma = \sqrt{1.598} = 1.264$.


(b) The special formulas for the mean and the standard deviation of the binomial distribution.

Solution. p = .8 and n = 10.

$\mu = np = 10 \cdot .8 = 8$

$\sigma^2 = npq = 10 \cdot .8 \cdot .2 = 1.6$, so $\sigma = \sqrt{npq} = \sqrt{1.6} = 1.265$.
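The two methods differ slightly (1.598 against 1.6) only because the table probabilities are rounded to three decimals. The sketch below repeats both calculations. The list `table_probs` copies the values used in part (a), and the variable names are hypothetical.

```python
from math import sqrt

n, p = 10, 0.8
q = 1 - p

# Probabilities p(0), ..., p(10) as used in part (a), rounded to 3 decimals.
table_probs = [0, 0, 0, .001, .005, .027, .088, .201, .302, .269, .107]

# Part (a): apply the defining formulas to the rounded table values.
mu_table = sum(x * px for x, px in enumerate(table_probs))
var_table = sum((x - mu_table) ** 2 * px for x, px in enumerate(table_probs))

# Part (b): the special binomial formulas.
mu_formula = n * p
var_formula = n * p * q

print(round(mu_table, 3), round(var_table, 3), round(sqrt(var_table), 3))
print(round(mu_formula, 3), round(var_formula, 3), round(sqrt(var_formula), 3))
```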
