Engineering Statistics Chapter 2 Special Variables 2D Approximation of Variables

Upload: ashlie-phillips

Post on 29-Dec-2015

Engineering Statistics Chapter 2

Special Variables

2D Approximation of Variables

Poisson Distribution as Approximation to Binomial Distribution

• As n increases, the probability table for Bin(n, p) becomes longer. When n > 30, it is not practical to tabulate the probabilities for the distribution. The UTM tables stop at n = 30. Some other statistical tables go up to n = 40, but all such tables have to stop at some point.

• One reason we do not need binomial tables for n > 30 is that, for small values of p, the Poisson distribution with the same mean is approximately equivalent to the binomial distribution.

Comparing means and variance

• We recall that the mean for X~Bin(n, p) is np, and its variance is npq.

• In the case of P(λ), the mean and variance are both λ.

• Note that when p is nearly 0, q (= 1 – p) is close to 1, so np ≈ npq: the mean and variance are nearly equal. This is the same property the Poisson distribution has.

• In such a case, the Poisson distribution P(np) is approximately equivalent to Bin(n, p).

Example 1

• Given X~Bin(40, 0.05), calculate P(X ≤ 6). Compare it to P(Y ≤ 6) for Y~P(2).

Solution: For the binomial distribution Bin(40, 0.05),

P(X ≤ 6) = P(X=0) + … + P(X=6) = 0.128512 + 0.270552 + 0.277672 + 0.185114 + 0.090122 + 0.034151 + 0.010485 = 0.9966.

For the Poisson distribution Y~P(2), P(Y ≤ 6) = 0.995 (value from the table).

The probability from the Poisson distribution is very near that from the original binomial distribution, and it is much easier to obtain than by summing multiple binomial terms.
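The comparison in Example 1 can also be checked numerically. The following is a minimal Python sketch, not part of the original slides; the helper names binom_cdf and poisson_cdf are chosen here for illustration, and only the standard math module is assumed.

```python
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ P(lam), summed term by term."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

exact = binom_cdf(6, 40, 0.05)   # about 0.9966
approx = poisson_cdf(6, 2.0)     # about 0.9955
print(f"Bin(40, 0.05): {exact:.4f}   P(2): {approx:.4f}")
```

The two cumulative probabilities agree to about two decimal places, which is the sense in which the approximation is "very near".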

Points to note

• The Poisson approximation to the binomial is generally good when n is large and p is near 0 (the working rule used in this chapter is n > 30 and p < 0.1).

• For X~Bin(n, p) and Y~P(np), individual events of the type P(X = r) and P(Y = r) may differ significantly, even for n > 30 and p nearly 0.

• However, for compound events of the type P(X ≤ r) and P(Y ≤ r), the values are usually very close.

Example 2

• The probability that a man in a village contracts TB is 0.005. The authority is checking 500 people. What is the probability that up to 4 people have TB?

Solution: In this case, the direct calculation using the binomial distribution can be done on a calculator, giving 0.891681. However, it is easier to use the Poisson approximation.

• Let X represent the number of TB patients. X~Bin(500, 0.005). Since n > 30 and p < 0.1, we approximate X by the Poisson distribution P(2.5), as np = 500 × 0.005 = 2.5. From the table, we have P(X ≤ 4) = 0.8912. Again, the answer is not much different from the direct calculation using the binomial distribution.

Example 3

• 2% of motorcyclists do not have a valid license. In an operation, the JPJ stops 60 motorcyclists to check their licenses. What is the probability that 3 to 6 of them have invalid licenses?

• Solution: Let I represent the number of invalid licenses. I~Bin(60, 0.02). As n > 30 and p < 0.1, use the Poisson distribution P(1.2) to represent I.

• P(3 ≤ I ≤ 6) = P(I ≤ 6) – P(I ≤ 2) = 0.9997 – 0.8795 = 0.1202.
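An interval probability of this kind is a difference of two cumulative probabilities, which is easy to script. The sketch below is an illustration only (the function names are assumptions made here, not from the slides); it evaluates the Poisson answer and, for comparison, the same interval under the exact binomial distribution.

```python
from math import comb, exp, factorial

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ P(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

n, p = 60, 0.02
lam = n * p  # Poisson mean, 1.2

# P(3 <= I <= 6) = P(I <= 6) - P(I <= 2)
poisson_range = poisson_cdf(6, lam) - poisson_cdf(2, lam)
binom_range = binom_cdf(6, n, p) - binom_cdf(2, n, p)
print(f"Poisson: {poisson_range:.4f}   exact binomial: {binom_range:.4f}")
```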

When p>0.9

• If X~Bin(n, p) such that p > 0.9, then we define X’ = n – X, so that X’~Bin(n, q), where q = 1 – p. We then interpret the events in X in terms of X’, as discussed earlier in the chapter on the binomial distribution.

• Hence if n > 30 and p > 0.9, events of X may also be approximated using the Poisson distribution through X’.

Example 4

• 97% of octogenarians suffer from cataracts. In a health screening, 50 octogenarians have their eyes checked. What is the probability that fewer than 47 have cataracts?

Solution: Let C represent the number of people with cataracts. Then C~Bin(50, 0.97). Using C’ to represent the number with no cataracts, we have C’~Bin(50, 0.03). We note that C’ can be approximated using the Poisson distribution P(1.5).

Now C < 47 means C’ = 50 – C > 3, so P(C < 47) = P(C’ ≥ 4) = 1 – P(C’ ≤ 3) = 1 – 0.9344 = 0.0656.
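The switch from C to its complement can be verified directly. In this rough sketch (helper names are assumptions for illustration), the event is computed three ways: from C itself, from the complement count, and from the Poisson approximation of the complement count.

```python
from math import comb, exp, factorial

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ P(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

n, p = 50, 0.97

# P(C < 47) computed directly from C ~ Bin(50, 0.97), c = 0..46
direct = sum(binom_pmf(c, n, p) for c in range(47))

# Same event through the complement count: 1 - P(C' <= 3), C' ~ Bin(50, 0.03)
via_complement = 1 - sum(binom_pmf(r, n, 1 - p) for r in range(4))

# Poisson approximation of the complement count, mean n(1 - p) = 1.5
poisson_approx = 1 - poisson_cdf(3, n * (1 - p))

print(f"direct: {direct:.4f}  complement: {via_complement:.4f}  Poisson: {poisson_approx:.4f}")
```

The first two values coincide up to rounding, confirming that the two descriptions are the same event; the third is the table-based answer.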

Example 5

• At least 95% of visitors to an exhibition end up buying goods on exhibition. During a period under observation, 220 to 240 visitors enter the exhibition hall. What is the probability that at least 95% of them buy some exhibits?

Solution: Let B represent the number who make purchases. Taking the lower value 220 first, B~Bin(220, 0.95). Using B’ to represent the number who only visit, B’~Bin(220, 0.05).

Example 5 (contd)

• As n > 30, we approximate B’ using the Poisson distribution, i.e. B’~P(11). Now 95% of 220 is 209. P(B ≥ 209) = P(B’ ≤ 11) = 0.5793.

• Next, if n = 240, then B~Bin(240, 0.95) and B’~Bin(240, 0.05). The Poisson approximation in this case is P(12). This time, 95% of visitors equals 228. The probability is P(B ≥ 228) = P(B’ ≤ 12) = 0.5760.

Approximation

• Note that X~Bin(n, p) is approximated by P(np) only when both the conditions n>30 and p<0.1 are true.

• In general, when n is large, say > 50, and p is small, say < 0.05, we have good approximations for probabilities of events. Indeed, the Poisson distribution is the limiting distribution of the binomial distribution as n → ∞ and p → 0 with np held fixed.

• When n is rather small, < 40, and p is very near 0.1, the approximations are not very good.
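The limiting statement can be sketched as follows, writing λ = np and holding λ fixed as n grows. This is the standard textbook argument, added here for illustration rather than taken from the slides:

```latex
\begin{aligned}
P(X = r) &= \binom{n}{r} p^{r} (1-p)^{n-r}, \qquad p = \frac{\lambda}{n} \\
&= \frac{\lambda^{r}}{r!} \cdot \frac{n(n-1)\cdots(n-r+1)}{n^{r}}
   \cdot \left(1 - \frac{\lambda}{n}\right)^{n}
   \cdot \left(1 - \frac{\lambda}{n}\right)^{-r} \\
&\longrightarrow \frac{\lambda^{r}}{r!} \cdot 1 \cdot e^{-\lambda} \cdot 1
   = \frac{e^{-\lambda}\lambda^{r}}{r!} \quad \text{as } n \to \infty .
\end{aligned}
```

For fixed r, the second factor tends to 1 because it is a product of r terms each tending to 1, and the third factor tends to e^{-λ}.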

When p>0.1 and p<0.9

• Unfortunately, when p > 0.1 and p < 0.9, the Poisson distribution P(np) may no longer be close to the binomial distribution Bin(n, p).

• For example, if X~Bin(50, 0.4), then the mean of X is 20.

• Let us examine the probabilities of the event X = 10 for Bin(50, 0.4) and P(20):

Bin(50, 0.4): P(X=10) = 0.0014398

P(20): P(X=10) = 0.0058163

• As we can see, the values are very different.
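The two point probabilities quoted for X = 10 can be reproduced with a few lines of Python. This is an illustrative sketch only; the function names are made up here.

```python
from math import comb, exp, factorial

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """P(Y = r) for Y ~ P(lam)."""
    return exp(-lam) * lam**r / factorial(r)

b = binom_pmf(10, 50, 0.4)   # about 0.00144
q = poisson_pmf(10, 20)      # about 0.00582
print(f"Bin(50, 0.4): {b:.7f}   P(20): {q:.7f}   ratio: {q / b:.1f}")
```

The Poisson value is roughly four times the binomial value, so the approximation is unusable here.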

Normal distribution as Approximation to Binomial Distribution

• For n>30, and when p lies between 0.1 and 0.9, we can approximate the binomial distribution Bin(n, p) as N(np, npq).

• This approximation gives good probabilities when n is sufficiently large, and it works best when p is close to 0.5.

• Unfortunately, because the binomial distribution is discrete while the normal distribution is continuous, we need to make some adjustments to the original events.

Continuity adjustment

• For a discrete event X > 5, we do not want to include 5. But if we start from 6, there is a big gap left between 5 and 6.

• The proposal is to treat the discrete event X > 5 as the continuous event X > 5.5.

• For the discrete event X ≥ 5, we treat it as X ≥ 4.5, and so on.

• The table below lists the corrections needed when we approximate a discrete event using a continuous variable.

Corrections

Discrete event    Continuous event

X < a             X < a – 0.5

X > a             X > a + 0.5

X ≤ a             X ≤ a + 0.5

X ≥ a             X ≥ a – 0.5
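The four rules in the table amount to shifting the bound by half a unit, towards the inside of the event for strict inequalities and towards the outside for non-strict ones. A small helper makes this explicit; the function name and the string encoding of the operators are assumptions made for this sketch.

```python
def continuity_correct(op, a):
    """Return the bound to use for the continuous event.

    op is one of "<", ">", "<=", ">=" and a is the integer bound
    of the discrete event X op a. The inequality itself is unchanged.
    """
    shift = {"<": -0.5, ">": 0.5, "<=": 0.5, ">=": -0.5}
    return a + shift[op]

print(continuity_correct(">", 5))    # 5.5
print(continuity_correct(">=", 5))   # 4.5
```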

Example 6

• 24% of all smokers will die of lung cancer within 2 years after detection. A record is kept on 100 such cases. What is the probability that at least 30 will die within 2 years?

Solution: L = number of lung cancer patients who will die within 2 years. n = 100, p = 0.24. L~Bin(100, 0.24).

Since n > 30 and 0.1 < p < 0.9, we can use the normal distribution as an approximation for L. μ = np = 100 × 0.24 = 24, variance σ² = npq = 24 × 0.76 = 18.24.

Hence L~N(24, 18.24).

The event we want is L ≥ 30; this we convert to L ≥ 29.5 as the continuity correction.

P(L ≥ 29.5) = P(z ≥ [29.5 – 24]/√18.24) = P(z ≥ 1.29) = 0.5 – 0.4015 = 0.0985.
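The normal-approximation steps of Example 6 can be sketched in Python using the error function from the standard library in place of the z table. The helper normal_cdf is an assumption of this sketch; the exact binomial tail is computed alongside for comparison.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var); var is the variance, not the s.d."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

n, p = 100, 0.24
mean = n * p              # 24
var = n * p * (1 - p)     # 18.24

# Discrete event L >= 30 becomes L >= 29.5 after the continuity correction
approx = 1 - normal_cdf(29.5, mean, var)

# Exact binomial tail P(L >= 30), for comparison
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(30, n + 1))

print(f"normal approximation: {approx:.4f}   exact binomial: {exact:.4f}")
```

The approximation differs slightly from the table-based 0.0985 because the table rounds z to two decimal places.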

Example 7

• In a new agricultural project, 200 dragon fruit cactus plants are planted. Based on experience, only 77% of the plants will bear fruit. What is the probability that 160 plants or fewer will be successful?

Solution: F = number of cactus plants bearing fruit. n = 200, p = 0.77. F~Bin(200, 0.77). Mean = 200 × 0.77 = 154; variance = 200 × 0.77 × 0.23 = 35.42.

Hence we approximate F as N(154, 35.42). With the continuity correction, F ≤ 160 becomes F ≤ 160.5, so P(F ≤ 160.5) = P(z ≤ [160.5 – 154]/√35.42) = P(z ≤ 1.09) = 0.5 + 0.3621 = 0.8621.

Example 8

• The probability that a new car has more than 2 defects within 2 years is 0.25 for car A and 0.32 for car B. A company sells 50 of car A and 40 of car B. What is the probability that more A cars than B cars will have defects within the two-year guarantee period?

Solution: A~Bin(50, 0.25); B~Bin(40, 0.32)

We note that the binomial tables give no direct way to find the probability of the event A > B. This is where the normal distribution as an approximation is needed.

Normal distribution for A-B

Using the normal distribution as approximation:

A~N(12.5, 9.375);

B~N(12.8, 8.704)

A–B~N(12.5 – 12.8, 9.375 + 8.704) = N(–0.3, 18.079)

P(A > B) = P(A–B > 0)

≈ P(A–B > 0.5) (continuity adjustment)

= P(z > [0.5 – (–0.3)]/√18.079)

= P(z > 0.19) = 0.5 – 0.0753 = 0.4247.
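Although tables cannot give P(A > B), a short program can, by summing the joint probabilities over all pairs with a > b. The sketch below does this under the assumption that A and B are independent (the same assumption that adding the variances relies on), and evaluates the normal approximation next to it. All names are illustrative.

```python
from math import comb, erf, sqrt

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var); var is the variance."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

nA, pA = 50, 0.25
nB, pB = 40, 0.32

# Exact P(A > B), ASSUMING A and B are independent
pmf_A = [binom_pmf(a, nA, pA) for a in range(nA + 1)]
pmf_B = [binom_pmf(b, nB, pB) for b in range(nB + 1)]
exact = sum(pmf_A[a] * pmf_B[b]
            for a in range(nA + 1)
            for b in range(min(a, nB + 1)))   # b = 0 .. min(a - 1, nB)

# Normal approximation: A - B ~ N(mean_A - mean_B, var_A + var_B)
mean = nA * pA - nB * pB
var = nA * pA * (1 - pA) + nB * pB * (1 - pB)
approx = 1 - normal_cdf(0.5, mean, var)       # A - B > 0 becomes A - B > 0.5

print(f"exact: {exact:.4f}   normal approximation: {approx:.4f}")
```

Note that the standardisation divides by the standard deviation, the square root of the combined variance.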

Example 9

• 15% of men and 8% of women are colour-blind. A check is done on 80 men and 80 women. What is the probability that more than 20 are colour-blind?

Solution:

M~Bin(80, 0.15) ⇒ M~N(12, 10.2)

W~Bin(80, 0.08) ⇒ W~N(6.4, 5.888)

M+W~N(12 + 6.4, 10.2 + 5.888)

~N(18.4, 16.088)

We want the event M+W > 20, which on adjustment becomes M+W > 20.5.

Now P(M+W > 20.5) = P(z > [20.5 – 18.4]/√16.088) = P(z > 0.52) = 0.5 – 0.1985 = 0.3015.

Technically speaking, since p < 0.1 in the case of W, we should not use the normal distribution as an approximation for W. However, there is no direct way of combining two binomial distributions with different p. Using the normal distribution, we can obtain an answer, even though it may not be very accurate.

Using normal distribution to approximate Poisson distribution

• Even the Poisson distribution table is limited. In the UTM table, λ stops at 40. So we still need a way to handle the case when λ exceeds 40.

• In general, we find that, when λ > 40, the normal distribution N(λ, λ), in which both the mean and the variance are λ, gives a good approximation.

Example 10

• On average, 80 accidents occur in a day during the festive season. What is the probability that more than 90 accidents occur on such a day?

Solution: A~P(80). Since λ > 40, we use the normal distribution N(80, 80) as the approximation.

As the Poisson distribution is discrete, we also need to make the continuity adjustment. Thus P(A > 90) is adjusted to P(A > 90.5) = P(z > [90.5 – 80]/√80)

= P(z > 1.17) = 0.5 – 0.379 = 0.121.
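A computer does not share the table's limit on the Poisson mean, so the quality of this approximation can be checked directly. The following sketch (helper names are assumptions) builds the Poisson cumulative probability term by term, which avoids large factorials, and compares the exact tail with the normal approximation.

```python
from math import erf, exp, sqrt

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ P(lam), using the recurrence P(r) = P(r-1) * lam / r."""
    term = exp(-lam)      # P(X = 0)
    total = term
    for r in range(1, k + 1):
        term *= lam / r
        total += term
    return total

def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var); var is the variance."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

lam = 80
exact = 1 - poisson_cdf(90, lam)            # P(A > 90) = 1 - P(A <= 90)
approx = 1 - normal_cdf(90.5, lam, lam)     # continuity-adjusted normal tail

print(f"exact Poisson: {exact:.4f}   normal approximation: {approx:.4f}")
```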

Example 11

• Hospital A receives on average 30 male and 25 female patients each day. What is the probability that the total number of patients on a certain day is between 40 and 60?

Solution: M~P(30), W~P(25) ⇒ M+W~P(55)

Using the normal distribution as approximation, we have M+W~N(55, 55)

• Unfortunately, the word between here has two interpretations: either it means more than 40 and less than 60 (40 < X < 60), or it means from 40 to 60 (40 ≤ X ≤ 60). Since the question is not clear on this, we have to decide on our own.

Case I: P(40 < M+W < 60) ≈ P(40.5 < M+W < 59.5) = P([40.5 – 55]/√55 < z < [59.5 – 55]/√55) = P(–1.96 < z < 0.61) = 0.4750 + 0.2291 = 0.7041.

Case II: P(40 ≤ M+W ≤ 60) ≈ P(39.5 ≤ M+W ≤ 60.5) = P([39.5 – 55]/√55 ≤ z ≤ [60.5 – 55]/√55) = P(–2.09 ≤ z ≤ 0.74) = 0.4817 + 0.2704 = 0.7521.

We usually avoid using the word between without qualification because of the possible misunderstanding.
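The effect of the two readings of "between" is easy to see in code, since only the continuity-adjusted bounds change. A minimal sketch, with normal_cdf as an assumed helper built on the standard error function:

```python
from math import erf, sqrt

def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var); var is the variance."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

lam = 30 + 25   # the sum of the two Poisson means, so M + W ~ N(55, 55) approximately

# Case I: strictly between, 40 < X < 60 becomes 40.5 < X < 59.5
case_1 = normal_cdf(59.5, lam, lam) - normal_cdf(40.5, lam, lam)

# Case II: inclusive, 40 <= X <= 60 becomes 39.5 <= X <= 60.5
case_2 = normal_cdf(60.5, lam, lam) - normal_cdf(39.5, lam, lam)

print(f"Case I: {case_1:.4f}   Case II: {case_2:.4f}")
```

The two answers differ by about 0.05, which is why the wording of the question matters.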

Using Poisson distributions to combine Binomial Distributions

• As has been shown, there is no simple way to solve problems of the type P(X1+X2=k) if X1~Bin(n1, p1) and X2~Bin(n2, p2), unless p1 = p2.

• If, however, the n’s are sufficiently big and the p’s are relatively small, we can achieve good approximations for P(X1+X2=k) by using the corresponding Poisson distributions as approximations for the binomial distributions.

• Let’s examine an example.

Combining Binomials

Example 12. 12% of city dwellers and 8% of kampung folks are known to suffer from breathing ailments. A check is made on 25 people living in cities and 40 people from kampungs. What is the probability that up to 6 of them show signs of breathing sickness?

Solution: We use C and K to represent the numbers of city and kampung dwellers who are suffering from breathing sicknesses. Then C~Bin(25, 0.12) and K~Bin(40, 0.08).

Example 12 (contd)

Obviously, K is a good candidate for using the Poisson distribution as an approximation. Unfortunately, for C, the n is a little small and the p is a bit too big.

However, in this case, we shall resort to the Poisson distribution as an approximation for C as well. Hence C~P(3) and K~P(3.2) ⇒ C+K~P(6.2). From the table, P(C+K ≤ 6) = 0.5742.

The approximation of C appears to violate the conditions needed. However, the amount of time and work saved compensates for the loss in accuracy.

Using normal distribution instead

Another way out is to use normal distributions as approximations for both C and K, as we did in Ex 9. As K is rather skewed, it may not be so desirable. Also, this will involve more calculations as we need to carry out the continuity adjustment. We show the work briefly here.

C~N(3, 2.64), K~N(3.2, 2.944) ⇒ C+K~N(6.2, 5.584).

P(C+K ≤ 6) ≈ P(C+K ≤ 6.5) = P(Z ≤ 0.13) = 0.5517.

This is a little different from 0.5742, which we obtain using the Poisson distribution.
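To judge the two approximations for Example 12, the exact probability can be obtained by summing the joint binomial probabilities over all pairs with c + k ≤ 6. The sketch below assumes C and K are independent, as the combined Poisson and normal models already do; the helper names are illustrative only.

```python
from math import comb, erf, exp, sqrt

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ P(lam), using the recurrence P(r) = P(r-1) * lam / r."""
    term = exp(-lam)
    total = term
    for r in range(1, k + 1):
        term *= lam / r
        total += term
    return total

def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var); var is the variance."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

nC, pC = 25, 0.12
nK, pK = 40, 0.08

# Exact P(C + K <= 6), ASSUMING C and K are independent
exact = sum(binom_pmf(c, nC, pC) * binom_pmf(k, nK, pK)
            for c in range(7) for k in range(7 - c))

# Poisson approximation with mean np summed over both groups
mean = nC * pC + nK * pK
poisson = poisson_cdf(6, mean)

# Normal approximation with summed npq variances and continuity adjustment
var = nC * pC * (1 - pC) + nK * pK * (1 - pK)
normal = normal_cdf(6.5, mean, var)

print(f"exact: {exact:.4f}   Poisson: {poisson:.4f}   normal: {normal:.4f}")
```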