Interesting Probability Problems

Jonathan Mostovoy - 1002142665
University of Toronto

August 19, 2016

Contents

1 Chapter 1 Questions
  1a) 1.8.17
  1b) 1.12.12

2 Chapter 2 Questions
  2a) 2.1.14
  2b) 2.2.20

3 Chapter 3 Questions
  3a) 3.2.13
  3b) 3.3.17
  3c) 3.4.4
  3d) 3.5.8
  3e) 3.9.18
  3f) 3.11.26

4 Chapter 4 Questions
  4a) 4.4.9
  4b) 4.7.12
  4c) 4.9.15

5 Chapter 5 Questions
  5a) 5.4.16
  5b) 5.7.24

6 Non-Textbook Problems
  6a) A
  6b) B
  6c) C
  6d) D


1 Chapter 1 Questions

1a) 1.8.17

A deck of 52 cards contains four aces. If the cards are shuffled and distributed in a random manner to four players so that each player receives 13 cards, what is the probability that all four aces will be received by the same player?

Answer: For each of the agents i = 1, . . . , 4, there are (13 choose 4) ways to place the four aces among that agent's 13 cards. Since there are 4 agents, there are 4·(13 choose 4) scenarios in which a single agent receives all 4 aces. And since there are (52 choose 4) equally likely ways of placing the four aces among the 52 positions in the deck, the probability that all four aces will be received by the same player is:

Pr(4 aces received by same agent) = 4·(13 choose 4) / (52 choose 4) = 4·(48 choose 13,13,13,9) / (52 choose 13,13,13,13) = 44/4165 ≈ 0.0105642
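As an added sanity check (not part of the original solution), the answer can be approximated by repeatedly shuffling a deck; the snippet below is a minimal sketch using only Python's standard library.

    import random

    def all_aces_one_player(trials=200_000):
        """Estimate Pr(one player receives all four aces) by shuffling a 52-card deck."""
        deck = [1] * 4 + [0] * 48          # 1 marks an ace
        hits = 0
        for _ in range(trials):
            random.shuffle(deck)
            # players receive cards 0-12, 13-25, 26-38, 39-51
            if any(sum(deck[13 * p:13 * (p + 1)]) == 4 for p in range(4)):
                hits += 1
        return hits / trials

    print(all_aces_one_player())   # typically prints a value near 44/4165 ≈ 0.0106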

1b) 1.12.12

Let A1, . . . , An be n arbitrary events. Show that the probability that exactly one of these n events will occur is:

∑_{i=1}^{n} Pr(Ai) − 2 ∑_{i<j} Pr(Ai ∩ Aj) + 3 ∑_{i<j<k} Pr(Ai ∩ Aj ∩ Ak) − · · · + (−1)^{n+1} n Pr(A1 ∩ A2 ∩ · · · ∩ An)

Proof. Let

P(n) = Pr( ⋃_{i=1}^{n} ( Ai \ (A1 ∪ · · · ∪ A_{i−1} ∪ A_{i+1} ∪ · · · ∪ An) ) )

denote the probability that exactly one of the n events occurs. We shall prove that

P(n) = ∑_{i=1}^{n} Pr(Ai) − 2 ∑_{i<j} Pr(Ai ∩ Aj) + 3 ∑_{i<j<k} Pr(Ai ∩ Aj ∩ Ak) − · · · + (−1)^{n+1} n Pr(A1 ∩ A2 ∩ · · · ∩ An)

Our plan is induction on n, beginning at n = 2. Write A = (A\B) ∪ (A ∩ B) and B = (B\A) ∪ (A ∩ B); in each decomposition the two pieces are disjoint, since (A\B) ∩ (A ∩ B) = ∅. Thus Pr(A) = Pr(A\B) + Pr(A ∩ B), and likewise for B, so Pr(A) + Pr(B) = Pr(A\B) + Pr(B\A) + 2 Pr(A ∩ B). Since A\B and B\A are also disjoint, their probabilities add across the union, which yields:

P(2) = Pr( (A\B) ∪ (B\A) ) = Pr(A\B) + Pr(B\A) = Pr(A) + Pr(B) − 2 Pr(A ∩ B)

We now take as inductive hypothesis that the formula holds for P(n), and consider n + 1:

P(n + 1) = Pr( ⋃_{i=1}^{n+1} ( Ai \ (A1 ∪ · · · ∪ A_{i−1} ∪ A_{i+1} ∪ · · · ∪ A_{n+1}) ) )

By the inductive hypothesis we know P(n), so all we need to do is add on Pr(A_{n+1}) and then remove the intersections that A_{n+1} has with every term appearing in the P(n) formula (in a sense, take away the overlaps that A_{n+1} could have with everything else), i.e.:

P(n + 1) = ∑_{i=1}^{n} Pr(Ai) + Pr(A_{n+1}) − 2 ∑_{i<j≤n} Pr(Ai ∩ Aj) − 2 ∑_{i=1}^{n} Pr(Ai ∩ A_{n+1}) + · · ·

+ (−1)^{n+1} n Pr(A1 ∩ A2 ∩ · · · ∩ An) + (−1)^{n+1} n Pr(A2 ∩ A3 ∩ · · · ∩ A_{n+1}) + · · · + (−1)^{n+2} (n + 1) Pr(A1 ∩ · · · ∩ A_{n+1})

= ∑_{i=1}^{n+1} Pr(Ai) − 2 ∑_{i<j} Pr(Ai ∩ Aj) + 3 ∑_{i<j<k} Pr(Ai ∩ Aj ∩ Ak) − · · · + (−1)^{n+2} (n + 1) Pr(A1 ∩ A2 ∩ · · · ∩ A_{n+1})

2 Chapter 2 Questions

2a) 2.1.14

A machine produces defective parts with three different probabilities depending on its state of repair. If the machine is in good working order, it produces defective parts with probability 0.02. If it is wearing down, it produces defective parts with probability 0.1. If it needs maintenance, it produces defective parts with probability 0.3. The probability that the machine is in good working order is 0.8, the probability that it is wearing down is 0.1, and the probability that it needs maintenance is 0.1. Compute the probability that a randomly selected part will be defective.

Answer: We first define the notation g, w, b, d for "good working order", "wearing down", "needs maintenance (bad)" and "production of a defective part" respectively. We summarize the given information as: Pr(d|g) = 0.02, Pr(d|w) = 0.1, Pr(d|b) = 0.3, Pr(g) = 0.8, Pr(w) = 0.1 = Pr(b). We now recall the Law of Total Probability, which states: given events B1, . . . , Bk that form a partition of the space S with Pr(Bj) > 0 for j = 1, . . . , k, then for every event A in S:

Pr(A) = ∑_{j=1}^{k} Pr(Bj) Pr(A|Bj)

Since g, w and b are mutually exclusive and their probabilities sum to 1, they form a partition of the space, so our answer is a direct application of the Law of Total Probability:

Pr(d) = ∑_{j} Pr(j) Pr(d|j), (j = g, w, b) = (0.8)(0.02) + (0.1)(0.1) + (0.1)(0.3) = 0.056
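A quick numerical check of this answer (an addition, not in the original text), simulating the machine's state and then the part's quality:

    import random

    def defective_rate(trials=500_000):
        """Estimate Pr(defective) under the stated state and defect probabilities."""
        states = [(0.8, 0.02), (0.1, 0.1), (0.1, 0.3)]   # (Pr(state), Pr(defective | state))
        defects = 0
        for _ in range(trials):
            u, cum = random.random(), 0.0
            for p_state, p_def in states:
                cum += p_state
                if u < cum:                               # this is the machine's state
                    defects += random.random() < p_def
                    break
        return defects / trials

    print(defective_rate())   # ≈ 0.056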

2b) 2.2.20

Suppose that A1, . . . , Ak form a sequence of k independent events. Let B1, . . . , Bk be another sequence of k events such that for each value of j (j = 1, . . . , k), either Bj = Aj or Bj = Aj^c. Prove that B1, . . . , Bk are also independent events. Hint: Use an induction argument based on the number of events Bj for which Bj = Aj^c.

Proof. Let n (n ≤ k) denote the number of Bj's for which Bj = Aj^c, so that there are k − n indices with Bj = Aj. For n = 0 the desired relation holds trivially, because A1, . . . , Ak are assumed independent. For the higher-order cases we recall the identity Pr(A ∩ B) = Pr(A) − Pr(A ∩ B^c) and the fact that if A1, . . . , Al are independent, then so is any subcollection of them.

Thus, through induction, assume the result holds when n of the Bj's satisfy Bj = Aj^c, so that B1, . . . , Bk are independent. For n + 1, we want to show that Pr(B1 ∩ · · · ∩ B_{i−1} ∩ Bi^c ∩ B_{i+1} ∩ · · · ∩ Bk) still factors, where Bi is the event that switches to its complement in passing from n to n + 1. We see this from:

Pr(B1 ∩ · · · ∩ B_{i−1} ∩ B_{i+1} ∩ · · · ∩ Bk ∩ Bi^c)
= Pr(B1 ∩ · · · ∩ B_{i−1} ∩ B_{i+1} ∩ · · · ∩ Bk) − Pr(B1 ∩ · · · ∩ B_{i−1} ∩ B_{i+1} ∩ · · · ∩ Bk ∩ Bi)
= Pr(B1) × · · · × Pr(B_{i−1}) × Pr(B_{i+1}) × · · · × Pr(Bk) − Pr(B1 ∩ · · · ∩ B_{i−1} ∩ B_{i+1} ∩ · · · ∩ Bk ∩ Bi)
= Pr(B1) × · · · × Pr(B_{i−1}) × Pr(B_{i+1}) × · · · × Pr(Bk) − Pr(B1) × · · · × Pr(B_{i−1}) × Pr(B_{i+1}) × · · · × Pr(Bk) × Pr(Bi)
= Pr(B1) × · · · × Pr(B_{i−1}) × Pr(B_{i+1}) × · · · × Pr(Bk) × (1 − Pr(Bi))
= Pr(B1) × · · · × Pr(B_{i−1}) × Pr(Bi^c) × Pr(B_{i+1}) × · · · × Pr(Bk)

The same factorization argument applies to every subcollection of B1, . . . , Bk, which is exactly what the definition of mutual independence requires.

3 Chapter 3 Questions

3a) 3.2.13

An ice cream seller takes 20 gallons of ice cream in her truck each day. Let X stand for the number of gallons that she sells. The probability is 0.1 that X = 20. If she doesn't sell all 20 gallons, the distribution of X follows a continuous distribution with a p.d.f. of the form:

f(x) = cx for 0 < x < 20, and f(x) = 0 otherwise

where c is a constant that makes Pr(X < 20) = 0.9. Find the constant c so that Pr(X < 20) = 0.9 as described above.

Answer: All we need to remember is that the integral of the p.d.f. over the continuous part of the sample space must equal the total probability assigned to it, which in this case is 0.9 rather than the usual 1. Thus:

∫_0^{20} cx dx = 0.9 =⇒ 200c = 0.9 =⇒ c = 9/2000 = 0.0045
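As an added check (assuming sympy is available), the constant can be confirmed symbolically:

    import sympy as sp

    x, c = sp.symbols("x c", positive=True)
    # Solve ∫_0^20 c*x dx = 0.9 for c
    sol = sp.solve(sp.Eq(sp.integrate(c * x, (x, 0, 20)), sp.Rational(9, 10)), c)
    print(sol)   # [9/2000], i.e. c = 0.0045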

3b) 3.3.17

Prove that the quantile function F^{−1} of a general random variable X has the following three properties that are analogous to properties of the c.d.f.:

1. F^{−1} is a non-decreasing function of p for 0 < p < 1.

2. Let x0 = lim_{p→0, p>0} F^{−1}(p) and x1 = lim_{p→1, p<1} F^{−1}(p). Then x0 equals the greatest lower bound on the set of numbers c such that Pr(X ≤ c) > 0, and x1 equals the least upper bound on the set of numbers d such that Pr(X ≥ d) > 0.

3. F^{−1} is continuous from the left; that is, F^{−1}(p) = F^{−1}(p−) for all 0 < p < 1.

Proofs:


1. Proof. Assume p, q ∈ (0, 1) with p ≤ q. By definition of the quantile function, F^{−1}(p) = min{x ∈ R : F(x) ≥ p} and F^{−1}(q) = min{x ∈ R : F(x) ≥ q}. Since F is non-decreasing, {x ∈ R : F(x) ≥ q} ⊆ {x ∈ R : F(x) ≥ p}, and the minimum over a smaller set can only be larger, so F^{−1}(p) ≤ F^{−1}(q). Hence F^{−1} is non-decreasing.

2. Proof. x0: Let z1 > z2 > z3 > · · · be a decreasing sequence of numbers such that lim_{n→∞} zn = 0, and let C = {c ∈ R : Pr(X ≤ c) > 0} = {c ∈ R : F(c) > 0}. Since F^{−1} is non-decreasing, the values F^{−1}(zn) = min{x ∈ R : F(x) ≥ zn} decrease to

x0 = lim_{p→0, p>0} F^{−1}(p) = lim_{n→∞} F^{−1}(zn)

Each F^{−1}(zn) lies in C, because F(F^{−1}(zn)) ≥ zn > 0; hence no number larger than x0 can be a lower bound on C. On the other hand, if c ∈ C then F(c) > 0, so F(c) ≥ zn for all sufficiently large n, which gives c ≥ F^{−1}(zn) ≥ x0. Thus x0 is a lower bound on C, and therefore x0 equals the greatest lower bound on the set of numbers c such that Pr(X ≤ c) > 0.

x1: Let y1 < y2 < y3 < · · · be an increasing sequence of numbers such that lim_{n→∞} yn = 1, and let D = {d ∈ R : Pr(X ≥ d) > 0}. The values F^{−1}(yn) increase to

x1 = lim_{p→1, p<1} F^{−1}(p) = lim_{n→∞} F^{−1}(yn)

Each F^{−1}(yn) lies in D, since F(x) < yn for every x < F^{−1}(yn) and hence Pr(X ≥ F^{−1}(yn)) ≥ 1 − yn > 0; therefore no number smaller than x1 can be an upper bound on D. On the other hand, if d ∈ D then Pr(X < d) < 1, and choosing n with yn > Pr(X < d) gives F(x) ≤ Pr(X < d) < yn for every x < d, so F^{−1}(yn) ≥ d and hence x1 ≥ d. Thus x1 is an upper bound on D, and therefore x1 equals the least upper bound on the set of numbers d such that Pr(X ≥ d) > 0.

3. Proof. Let y1 < y2 < y3 < · · · be an increasing sequence of numbers such that lim_{n→∞} yn = p, and set x* = lim_{n→∞} F^{−1}(yn) = F^{−1}(p−), which exists because F^{−1} is non-decreasing. Since yn < p, we have F^{−1}(yn) ≤ F^{−1}(p) for every n, so x* ≤ F^{−1}(p). Conversely, F(x*) ≥ F(F^{−1}(yn)) ≥ yn for every n, so F(x*) ≥ p and therefore x* ≥ min{x ∈ R : F(x) ≥ p} = F^{−1}(p). Hence

F^{−1}(p) = lim_{n→∞} F^{−1}(yn) = F^{−1}(p−)

3c) 3.4.4

Suppose that X and Y have a continuous joint distribution for which the joint p.d.f. is defined as follows:

f(x, y) = cy^2 for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 1, and f(x, y) = 0 otherwise

Determine:

1. the value of the constant c

2. Pr(X + Y > 2)

3. Pr(Y < 1/2)


4. Pr(X ≤ 1)

5. Pr(X = 3Y )

Answers:

1. 1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} cy^2 dx dy = ∫_0^1 ∫_0^2 cy^2 dx dy = ∫_0^1 2cy^2 dy = 2c/3 =⇒ c = 3/2

2. Pr(X + Y > 2) = ∫_1^2 ∫_{2−x}^1 (3y^2/2) dy dx = 3/8

3. Pr(Y < 1/2) = ∫_0^{1/2} ∫_0^2 (3y^2/2) dx dy = 1/8

4. Pr(X ≤ 1) = ∫_0^1 ∫_0^1 (3y^2/2) dx dy = 1/2

5. Pr(X = 3Y) = 0, since in an n-dimensional continuous probability space the probability that an (n − 1)-dimensional event occurs is 0; i.e., since X = 3Y is a line in a 2-dimensional continuous probability space, the probability of this event must be 0.
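The numerical answers above can be spot-checked by simulation (an added sketch, not part of the original solution). Note that the joint density factors as (1/2)·(3y^2), so X ~ Uniform(0, 2) and Y = U^{1/3} with U ~ Uniform(0, 1) are independent draws from it.

    import random

    n = 500_000
    xy = [(2 * random.random(), random.random() ** (1 / 3)) for _ in range(n)]

    frac = lambda event: sum(event(a, b) for a, b in xy) / n
    print(frac(lambda a, b: a + b > 2))   # ≈ 3/8
    print(frac(lambda a, b: b < 0.5))     # ≈ 1/8
    print(frac(lambda a, b: a <= 1))      # ≈ 1/2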

3d) 3.5.8

Suppose that the joint p.d.f. of X and Y is as follows:

f(x, y) = 24xy for x ≥ 0, y ≥ 0 and x + y ≤ 1, and f(x, y) = 0 otherwise

Are X and Y independent?

Answer: We recall Theorem 3.5.5, which states: for a joint p.d.f. f(x, y), the random variables X and Y are independent ⇐⇒ f(x, y) = h1(x)h2(y), where hi(z) is a non-negative function depending only on z. For every point inside the triangle on which f is positive, we can define two functions that work, e.g.

h1(x) = kx for x ∈ [0, 1], and 0 otherwise, where k > 0, and

h2(y) = (24/k)y for y ∈ [0, 1], and 0 otherwise.

However, if we choose a point inside the unit square but outside our triangle, we need f(x, y) = 0, while h1(x) > 0 and h2(y) > 0, which leads to a contradiction and =⇒ X and Y are NOT independent. Thus, we can conclude the following generalization: if there is a variable xi in f(x1, x2, . . .) whose domain of positivity depends on at least one other variable xj with j ≠ i, then there is some dependence amongst the variables x1, x2, . . . .
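A concrete numerical illustration of the failure of factorization (added here; the marginal formula f1(t) = 12t(1 − t)^2 follows by integrating the joint density over the triangle): at the point (0.6, 0.6) the joint density is 0 because x + y > 1, yet the product of the marginals is positive.

    def f_joint(x, y):
        return 24 * x * y if x >= 0 and y >= 0 and x + y <= 1 else 0.0

    def f_marginal(t):
        # ∫_0^{1-t} 24*t*y dy = 12*t*(1-t)^2 for t in [0, 1]
        return 12 * t * (1 - t) ** 2 if 0 <= t <= 1 else 0.0

    print(f_joint(0.6, 0.6))                    # 0.0
    print(f_marginal(0.6) * f_marginal(0.6))    # ≈ 1.33 > 0, so X and Y are not independent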


3e) 3.9.18

Let the conditional p.d.f. of X given Y be g1(x|y) = 3x^2/y^3 for 0 < x < y, and 0 otherwise. Let the marginal p.d.f. of Y be f2(y), where f2(y) = 0 for y ≤ 0 but is otherwise unspecified. Let Z = X/Y. Prove that Z and Y are independent and find the marginal p.d.f. of Z.

Proof. By the definition of conditional probability, the joint p.d.f. of (X, Y) is

f(x, y) = g1(x|y) f2(y) = 3x^2 f2(y) / y^3 for 0 < x < y, and 0 otherwise

We recall that with Z = X/Y we can define the dummy second random variable W = Y, so that the inverse transformation is x = zw and y = w, with Jacobian

J = det [ ∂x/∂z  ∂x/∂w ; ∂y/∂z  ∂y/∂w ] = det [ w  z ; 0  1 ] = w

Thus,

g(z, w) = f(zw, w)|J| = 3(zw)^2 f2(w) w / w^3 = 3z^2 f2(w), for 0 < z < 1 and w > 0

where the bounds on the variables follow from the bounds established in the question (0 < x < y). Since g(z, w) = f1(z) f2(w) factors into a function of z alone times a function of w alone, we may conclude independence. Further, the marginal p.d.f. of Z is

f1(z) = 3z^2 for z ∈ (0, 1), and 0 otherwise

since ∫_{−∞}^{∞} f2(w) dw = 1, f2 being a proper probability density.
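An added simulation sketch of this result (the choice f2 = Exponential(1) is an assumption made purely for the demonstration; the result holds for any density f2 supported on y > 0): draw Y, then X given Y = y via the inverse c.d.f. X = y·U^{1/3}, and check that Z = X/Y behaves the same way whether Y is small or large.

    import random

    n = 200_000
    pairs = []
    for _ in range(n):
        y = random.expovariate(1.0)           # assumed f2: Exponential(1)
        x = y * random.random() ** (1 / 3)    # inverse c.d.f. of g1(x|y) = 3x^2/y^3 on (0, y)
        pairs.append((x / y, y))

    z_small_y = [z for z, y in pairs if y < 1.0]
    z_large_y = [z for z, y in pairs if y >= 1.0]
    # E(Z) = ∫_0^1 z·3z^2 dz = 3/4 in both groups, consistent with Z independent of Y
    print(sum(z_small_y) / len(z_small_y), sum(z_large_y) / len(z_large_y))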

3f) 3.11.26

Let X1, X2 be two independent random variables, each with p.d.f. f1(x) = e^{−x} for x > 0 and f1(x) = 0 for x ≤ 0. Let Z = X1 − X2 and W = X1/X2.

1. Find the joint p.d.f. of X1 and Z.

2. Prove that the conditional p.d.f. of X1 given Z = 0 is:

h1(x1|0) = 2e^{−2x1} for x1 > 0, and 0 otherwise

3. Find the joint p.d.f. of X1 and W.

4. Prove that the conditional p.d.f. of X1 given W = 1 is:

h1(x1|1) = 4x1 e^{−2x1} for x1 > 0, and 0 otherwise

5. Notice that {Z = 0} = {W = 1}, but the conditional distribution of X1 given Z = 0 is not the same as the conditional distribution of X1 given W = 1. This discrepancy is known as the Borel paradox. In light of the discussion that begins on page 146 about how conditional p.d.f.'s are not like conditioning on events of probability 0, show how "Z very close to 0" is not the same as "W very close to 1." Hint: Draw a set of axes for x1 and x2, and draw the two sets {(x1, x2) : |x1 − x2| < ε} and {(x1, x2) : |x1/x2 − 1| < ε} and see how different they are.

1. Answer: First, define the transformation V = X1, Z = X1 − X2, so that x1 = v and x2 = v − z. The Jacobian is

J = det [ ∂x1/∂v  ∂x1/∂z ; ∂x2/∂v  ∂x2/∂z ] = det [ 1  0 ; 1  −1 ] = −1

=⇒ g(v, z) = f(x1, x2)|J| = |−1| e^{−v} e^{−(v−z)} = e^{−2v} e^{z} ≡ e^{−2x1} e^{z}, for v > max(0, z)

2. Proof. By definition, h1(x1|0) = g(x1, 0)/g1(0). Thus, we first find g1(z):

g1(z) = ∫_{max(0,z)}^{∞} g(x1, z) dx1 = [ −(1/2) e^{−2x1} e^{z} ]_{x1=max(0,z)}^{x1=∞} = (1/2) e^{−z} if z ≥ 0, and (1/2) e^{z} if z < 0

=⇒ h1(x1|0) = e^{−2x1} e^{0} / ( (1/2) e^{0} ) = 2 e^{−2x1} for x1 > 0, and 0 if x1 ≤ 0

3. Answer: First, define the transformation V = X1, W = X1/X2, so that x1 = v and x2 = v/w. The Jacobian is

J = det [ ∂x1/∂v  ∂x1/∂w ; ∂x2/∂v  ∂x2/∂w ] = det [ 1  0 ; 1/w  −v/w^2 ] = −v/w^2

=⇒ g(v, w) = f(x1, x2)|J| = |−v/w^2| e^{−v} e^{−v/w} = (v/w^2) e^{−v(1 + 1/w)} ≡ (x1/w^2) e^{−x1(1 + 1/w)}, for v (= x1) > 0 and w > 0

4. Proof. By definition, h1(x1|1) = g(x1, 1)/g1(1). Thus, we first find g1(w):

g1(w) = ∫_0^{∞} g(x1, w) dx1 = (1/w^2) [ −( x1/(1 + 1/w) + 1/(1 + 1/w)^2 ) e^{−x1(1 + 1/w)} ]_{x1=0}^{x1=∞} = 1 / ( w^2 (1 + 1/w)^2 )

=⇒ h1(x1|1) = [ (x1/w^2) e^{−x1(1 + 1/w)} / ( 1 / ( w^2 (1 + 1/w)^2 ) ) ]_{w=1} = 4 x1 e^{−2x1} for x1 > 0, and 0 if x1 ≤ 0

5. The difference between the two sets {(x1, x2) : |x1 − x2| < ε} and {(x1, x2) : |x1/x2 − 1| < ε} explains the discrepancy: the first is a diagonal band of constant width around the line x1 = x2, while the second is equivalent to |x1 − x2| < εx2, a wedge whose width grows in proportion to x2. Conditioning on "W very close to 1" therefore gives proportionally more weight to large values of x1 and x2 than conditioning on "Z very close to 0", which =⇒ h1(x1|W = 1) ≠ h1(x1|Z = 0).
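An added simulation makes the contrast concrete: conditioning on |X1 − X2| < ε should give a sample mean of X1 near 1/2 (the mean of 2e^{−2x1}), while conditioning on |X1/X2 − 1| < ε should give a sample mean near 1 (the mean of 4x1 e^{−2x1}).

    import random

    n, eps = 1_000_000, 0.01
    pairs = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(n)]

    near_z = [x1 for x1, x2 in pairs if abs(x1 - x2) < eps]        # "Z very close to 0"
    near_w = [x1 for x1, x2 in pairs if abs(x1 / x2 - 1) < eps]    # "W very close to 1"

    print(sum(near_z) / len(near_z))   # ≈ 0.5, matching h1(x1|0) = 2e^{-2x1}
    print(sum(near_w) / len(near_w))   # ≈ 1.0, matching h1(x1|1) = 4x1·e^{-2x1}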


4 Chapter 4 Questions

4a) 4.4.9

Let X be a random variable with mean µ and variance σ^2, and let ψ1(t) denote the m.g.f. of X for −∞ < t < ∞. Let c be a given positive constant, and let Y be a random variable for which the m.g.f. is:

ψ2(t) = e^{c(ψ1(t) − 1)} for −∞ < t < ∞

Find expressions for the mean and the variance of Y in terms of the mean and the variance of X.

Answer: We first summarize: ψ1(0) = 1, ψ1′(0) = µ and ψ1′′(0) = σ^2 + µ^2. We compute:

dψ2(t)/dt |_{t=0} = c ψ1′(0) e^{c(ψ1(0) − 1)} = cµ

d^2ψ2(t)/dt^2 |_{t=0} = c ψ1′′(0) e^{c(ψ1(0) − 1)} + c^2 (ψ1′(0))^2 e^{c(ψ1(0) − 1)} = c(σ^2 + µ^2) + c^2 µ^2

Therefore, Mean = cµ and Variance = c(σ^2 + µ^2) + c^2 µ^2 − (cµ)^2 = c(σ^2 + µ^2).
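A quick symbolic check of these expressions (an addition; taking X normal, so that ψ1(t) = e^{µt + σ^2 t^2/2}, is simply an assumption that makes the m.g.f. concrete):

    import sympy as sp

    t, mu, sigma, c = sp.symbols("t mu sigma c", positive=True)
    psi1 = sp.exp(mu * t + sigma**2 * t**2 / 2)   # assumed m.g.f. of X (normal case)
    psi2 = sp.exp(c * (psi1 - 1))

    mean = sp.diff(psi2, t).subs(t, 0)
    second = sp.diff(psi2, t, 2).subs(t, 0)
    print(sp.simplify(mean))                # c*mu
    print(sp.simplify(second - mean**2))    # equivalent to c*(mu**2 + sigma**2)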

4b) 4.7.12

Suppose that X and Y are random variables such that E(Y|X) = aX + b. Assuming that Cov(X, Y) exists and that 0 < Var(X) < ∞, determine expressions for a and b in terms of E(X), E(Y), Var(X), and Cov(X, Y).

Answer: We recall E(E(X1|X2)) = E(X1), so E(E(Y|X)) = E(Y) = E(aX + b) = aE(X) + b. Thus we have our first equation: E(Y) = aE(X) + b.

Next, we multiply both sides of E(Y|X) = aX + b by X and take expectations. On the left, E(XE(Y|X)) = E(E(XY|X)) = E(XY) by the same tower rule noted above; on the right, E(X(aX + b)) = E(aX^2 + bX) = aE(X^2) + bE(X). Therefore we have our second equation: E(XY) = aE(X^2) + bE(X). So we must solve the following linear equations:


1. E(XY) = aE(X^2) + bE(X)

2. E(Y) = aE(X) + b

Equating the two expressions for b, namely b = E(Y) − aE(X) from (2) and b = ( E(XY) − aE(X^2) ) / E(X) from (1), yields:

E(Y) − aE(X) = ( E(XY) − aE(X^2) ) / E(X)

=⇒ E(X)E(Y) − E(XY) = a( (E(X))^2 − E(X^2) )

=⇒ a = ( E(XY) − E(X)E(Y) ) / ( E(X^2) − (E(X))^2 ) = Cov(X, Y) / Var(X)

=⇒ b = E(Y) − aE(X) = E(Y) − E(X) Cov(X, Y) / Var(X)
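An added numerical illustration (the linear model Y = 2X + 1 + noise is an arbitrary choice that satisfies E(Y|X) = aX + b):

    import random

    n = 200_000
    xs = [random.gauss(3.0, 2.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]   # E(Y|X) = 2X + 1

    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = sum((x - mx) ** 2 for x in xs) / n

    a_hat = cov / var
    b_hat = my - a_hat * mx
    print(a_hat, b_hat)   # ≈ 2 and ≈ 1, i.e. a = Cov(X,Y)/Var(X) and b = E(Y) - aE(X)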

4c) 4.9.15

Suppose that X1, . . . , Xn are random variables for which Var(Xi) has the same value σ^2 for i = 1, . . . , n, and ρ(Xi, Xj) has the same value ρ for every pair of values i and j such that i ≠ j. Prove that ρ ≥ −1/(n − 1).

Proof. Let us first note that the variance of the sum Z = X1 + · · · + Xn is:

Var(Z) = ∑_{i=1}^{n} Var(Xi) + 2 ∑_{i<j} Cov(Xi, Xj)

In this scenario we have Var(Xi) = σ^2 ∀i and Cov(Xi, Xj) = ρσ^2 ∀ i ≠ j, which =⇒

Var(Z) = nσ^2 + 2 (n choose 2) ρσ^2 = nσ^2 + n(n − 1)ρσ^2

We next recall that Var(Y) ≥ 0 for every random variable Y. Therefore, and recalling σ^2, n > 0:

0 ≤ nσ^2 + n(n − 1)ρσ^2 =⇒ −nσ^2 ≤ n(n − 1)ρσ^2 =⇒ −1 ≤ (n − 1)ρ =⇒ ρ ≥ −1/(n − 1)
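An added check on the bound: the covariance matrix σ^2[(1 − ρ)I + ρ·11ᵀ] must be positive semi-definite, and its smallest eigenvalue crosses zero exactly at ρ = −1/(n − 1).

    import numpy as np

    def min_eig(n, rho):
        """Smallest eigenvalue of the n x n equicorrelation matrix (unit variances)."""
        m = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
        return np.linalg.eigvalsh(m)[0]

    n = 5
    print(min_eig(n, -1 / (n - 1)))          # ≈ 0: the boundary value is still a valid correlation
    print(min_eig(n, -1 / (n - 1) - 0.01))   # < 0: no such covariance matrix exists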

5 Chapter 5 Questions

5a) 5.4.16

In this exercise, we shall prove that the three assumptions underlying the Poisson process model do indeed imply that occurrences happen according to a Poisson process. What we need to show is that, for each t, the number of occurrences during a time interval of length t has the Poisson distribution with mean λt. Let X stand for the number of occurrences during a particular time interval of length t. Feel free to use the following extension of Eq. (5.4.7): for all real a,

lim_{u→0} (1 + au + o(u))^{1/u} = e^a


1. For each positive integer n, divide the time interval into n disjoint subintervals of length t/n each. For i = 1, . . . , n, let Yi = 1 if exactly one arrival occurs in the i'th subinterval, and let Ai be the event that two or more occurrences occur during the i'th subinterval. Let Wn = ∑_{i=1}^{n} Yi. For each non-negative integer k, show that we can write Pr(X = k) = Pr(Wn = k) + Pr(B), where B ⊆ ∪_{i=1}^{n} Ai.

2. Show that lim_{n→∞} Pr(∪_{i=1}^{n} Ai) = 0. Hint: Show that Pr(∩_{i=1}^{n} Ai^c) = (1 + o(u))^{1/u}, where u = 1/n.

3. Show that lim_{n→∞} Pr(Wn = k) = e^{−λt}(λt)^k / k!. Hint: lim_{n→∞} n! / ( n^k (n − k)! ) = 1.

4. Show that X has the Poisson distribution with mean λt.

1. Proof. Trivially from Chapter 1, we know {X = k} = ({X = k} ∩ A) ∪ ({X = k} ∩ A^c) for every set A, and the two pieces are disjoint. Let A = ∪_{i=1}^{n} Ai. On A^c every subinterval contains at most one arrival, so X and Wn count the same arrivals and {X = k} ∩ (∪_{i=1}^{n} Ai)^c = {Wn = k}. We also note ({X = k} ∩ (∪_{i=1}^{n} Ai)) ⊆ ∪_{i=1}^{n} Ai, so Pr(X = k) = Pr(Wn = k) + Pr(B), where B = {X = k} ∩ (∪_{i=1}^{n} Ai) ⊆ ∪_{i=1}^{n} Ai.

2. Proof. First, we note, as stated in part 1, that there are n disjoint subintervals of length t/n, so A1, . . . , An are independent and Pr(Ai) = Pr(Aj) ∀ i, j. Therefore,

Pr(∩_{i=1}^{n} Ai^c) = ∏_{i=1}^{n} Pr(Ai^c) = [Pr(A1^c)]^n = [1 − Pr(A1)]^n

It was our assumption (the third Poisson-process assumption) that Pr(Ai) = o(1/n) = o(u), so with u = 1/n,

lim_{n→∞} Pr(∪_{i=1}^{n} Ai) = 1 − lim_{n→∞} (1 + 0/n − o(1/n))^n = 1 − e^0 = 0

3. Proof. We recall that Y1, . . . , Yn are n Bernoulli random variables with parameter p = λt/n + o(u), so with Wn = ∑_{i=1}^{n} Yi:

Pr(Wn = k) = (n choose k) ( λt/n + o(u) )^k ( 1 − λt/n − o(u) )^{n−k}

Next, we note lim_{n→∞} n^k ( λt/n + o(u) )^k = (λt)^k and lim_{n→∞} ( 1 − λt/n − o(u) )^{n−k} = e^{−λt}. Therefore,

lim_{n→∞} Pr(Wn = k) = ( (λt)^k e^{−λt} / k! ) · lim_{n→∞} n! / ( n^k (n − k)! ) = e^{−λt} (λt)^k / k!

4. Proof. From part 1 we have Pr(X = k) = Pr(Wn = k) + Pr(B) for every n. Since Pr(X = k) does not depend on n, we may pass to the limit:

Pr(X = k) = lim_{n→∞} Pr(Wn = k) + lim_{n→∞} Pr(B) = e^{−λt} (λt)^k / k! + 0 = e^{−λt} (λt)^k / k!

where lim_{n→∞} Pr(B) = 0 because 0 ≤ Pr(B) ≤ Pr(∪_{i=1}^{n} Ai) → 0 by part 2. Hence, by parts 1-3, X has the Poisson distribution with mean λt.
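An added numerical illustration of the limit in part 3, comparing the binomial probability of {Wn = k} (with the o(u) terms dropped) against the Poisson limit:

    from math import comb, exp, factorial

    lam, t, k = 2.0, 1.5, 4
    for n in (10, 100, 1000, 10_000):
        p = lam * t / n                             # per-subinterval arrival probability, o(u) dropped
        print(n, comb(n, k) * p**k * (1 - p) ** (n - k))

    print("limit", exp(-lam * t) * (lam * t) ** k / factorial(k))   # Poisson(λt) mass at k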


5b) 5.7.24

Review the derivation of the Black-Scholes formula (5.6.18). For this exercise, assume that our stock price at time u in the future is S0 e^{µu + Wu}, where Wu has the gamma distribution with parameters αu and β, with β > 1. Let r be the risk-free interest rate.

1. Prove that e^{−ru} E(Su) = S0 ⇐⇒ µ = r − α log( β/(β − 1) ).

2. Assume that µ = r − α log( β/(β − 1) ). Let R be 1 minus the c.d.f. of the gamma distribution with parameters αu and 1. Prove that the risk-neutral price for the option to buy one share of the stock for the price q at time u is S0 R(c[β − 1]) − q e^{−ru} R(cβ), where:

c = log(q/S0) + αu log( β/(β − 1) ) − ru

3. Find the price for the option being considered when u = 1, q = S0, r = 0.06, α = 1, and β = 10.

1. Proof. We recall that the m.g.f. of the gamma distribution with parameters αu and β is ψ(t) = ( β/(β − t) )^{αu}, so E(e^{Wu}) = ψ(1) = ( β/(β − 1) )^{αu} (here β > 1 is needed). Therefore,

E(Su) = E(S0 e^{µu + Wu}) = S0 e^{µu} E(e^{Wu}) = S0 e^{µu} ( β/(β − 1) )^{αu}

Thus, S0 = e^{−ru} E(Su) ⇐⇒

S0 = e^{−ru} S0 e^{µu} ( β/(β − 1) )^{αu} ⇐⇒ 0 = −ru + µu + αu log( β/(β − 1) ) ⇐⇒ µ = r − α log( β/(β − 1) )

2. Proof. We recall that the value of the option at time u is h(Su), where h(s) = s − q if s > q and 0 otherwise. Therefore, h(Su) > 0 ⇐⇒ Su > q ⇐⇒

Wu > log(q/S0) − µu = log(q/S0) + αu log( β/(β − 1) ) − ru = c

The risk-neutral price of the option is the present value of E(h(Su)), which equals:

e^{−ru} E(h(Su)) = e^{−ru} ∫_c^{∞} [ S0 e^{µu + w} − q ] ( β^{αu} / Γ(αu) ) w^{αu − 1} e^{−βw} dw

We split the integrand into two parts at the −q. The second part is just a constant times the upper tail of a gamma p.d.f.; rescaling w by β shows that

−q e^{−ru} ∫_c^{∞} ( β^{αu} / Γ(αu) ) w^{αu − 1} e^{−βw} dw = −q e^{−ru} Pr( Gamma(αu, β) > c ) = −q e^{−ru} R(cβ)

The first part is:

e^{−ru} S0 ∫_c^{∞} e^{µu + w} ( β^{αu} / Γ(αu) ) w^{αu − 1} e^{−βw} dw = e^{−ru} S0 e^{µu} ( β/(β − 1) )^{αu} ∫_c^{∞} ( (β − 1)^{αu} / Γ(αu) ) w^{αu − 1} e^{−(β − 1)w} dw = S0 R(c(β − 1))

since e^{−ru} e^{µu} ( β/(β − 1) )^{αu} = 1 by part 1, and the remaining integral is Pr( Gamma(αu, β − 1) > c ) = R(c(β − 1)).

Combining these two parts yields:

S0 R(c[β − 1]) − q e^{−ru} R(cβ)

where

c = log(q/S0) + αu log( β/(β − 1) ) − ru

3. Since q = S0, we have

c = log(q/S0) + αu log( β/(β − 1) ) − ru = log(S0/S0) + log(10/9) − 0.06 ≈ 0.0453605

With αu = 1, the gamma distribution with parameters 1 and 1 is the exponential distribution, so R(x) = e^{−x}. Substituting c into the pricing formula:

S0 R(c[β − 1]) − q e^{−ru} R(cβ) = S0 [ R(9c) − e^{−0.06} R(10c) ] = S0 [ e^{−9c} − e^{−0.06} e^{−10c} ] ≈ 0.0665 S0
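An added numerical check of part 3, evaluating the pricing formula with scipy's gamma tail function and comparing it with a Monte Carlo estimate of the discounted expected payoff:

    import numpy as np
    from scipy import stats

    S0, q, r, alpha, beta, u = 1.0, 1.0, 0.06, 1.0, 10.0, 1.0
    mu = r - alpha * np.log(beta / (beta - 1))
    c = np.log(q / S0) + alpha * u * np.log(beta / (beta - 1)) - r * u

    R = lambda x: stats.gamma.sf(x, a=alpha * u)     # 1 minus the Gamma(αu, 1) c.d.f.
    price = S0 * R(c * (beta - 1)) - q * np.exp(-r * u) * R(c * beta)

    W = stats.gamma.rvs(a=alpha * u, scale=1.0 / beta, size=2_000_000, random_state=0)
    mc = np.exp(-r * u) * np.maximum(S0 * np.exp(mu * u + W) - q, 0.0).mean()
    print(price, mc)   # both ≈ 0.066·S0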

6 Non-Textbook Problems

6a) A

For an event B with P(B) > 0, define Q(A) = P(A|B) for any event A. Show that Q satisfies Axioms 1-3 of probability and conclude that Q is a probability.

Let us first recall the first 3 Axioms of Probability:

1.) For every event A, Pr(A) ≥ 0

2.) Pr(S) = 1

3.) If A1, A2, . . . is a countably infinite sequence of disjoint events, then Pr(∪_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} Pr(Ai).

Proof. Since Pr is a probability function, Pr(A ∩ B) ≥ 0 for every event A, and Pr(B) > 0 by assumption; recalling the formal definition of conditional probability, Q(A) = Pr(A|B) = Pr(A ∩ B)/Pr(B) ≥ 0, so Axiom 1 holds. For Axiom 2, Q(S) = Pr(S ∩ B)/Pr(B) = Pr(B)/Pr(B) = 1. (We also note that ∀x ∈ (A ∩ B), x ∈ B, so Pr(A ∩ B) ≤ Pr(B) and hence Q(A) ≤ 1 for every A.) Thus the first 2 Axioms hold.

For the 3rd Axiom, if A1, A2, . . . is a countably infinite sequence of disjoint events, then ∀ i ≠ j, (Ai ∩ B) ∩ (Aj ∩ B) = (Ai ∩ Aj) ∩ B = ∅ ∩ B = ∅, so the events (Ai ∩ B) and (Aj ∩ B) are disjoint, and therefore Pr(∪_{i=1}^{∞} (Ai ∩ B)) = ∑_{i=1}^{∞} Pr(Ai ∩ B) (steps made using the distributive law and countable additivity for disjoint events respectively). Let us call these findings "Rule Z". Next,

Q(∪_{i=1}^{∞} Ai) = Pr(∪_{i=1}^{∞} Ai | B)

= Pr( (∪_{i=1}^{∞} Ai) ∩ B ) / Pr(B)

= Pr( (A1 ∩ B) ∪ (A2 ∩ B) ∪ · · · ) / Pr(B)   by the distributive law

= ∑_{i=1}^{∞} Pr(Ai ∩ B) / Pr(B)   by Rule Z

= ∑_{i=1}^{∞} Pr(Ai|B) = ∑_{i=1}^{∞} Q(Ai), so Q satisfies Axiom 3, which completes our proof.

6b) B

Assume that X1, X2, . . . are i.i.d. random variables having E(|Xj|^p) < ∞ for 1 < p ≤ 2. Let µ = E(Xj).

1. Show that X̄n = (X1 + · · · + Xn)/n → µ in L^p as n → ∞.

2. Show that X̄n → µ almost surely.

1. First, we recognize that the function f(x) = |x|^p is convex for p ∈ (1, 2]. Therefore we can establish a lower bound from Jensen's inequality, E(|X̄n|^p) ≥ |E(X̄n)|^p, and by the theorems of Sec. 4.2, E(X̄n) = µ for every n. We establish the matching upper bound from convexity of |·|^p applied to the average: E(|X̄n|^p) ≤ (1/n) ∑_{i=1}^{n} E(|Xi|^p) = E(|X1|^p) < ∞, so the p'th moments of X̄n remain bounded. Combining the two bounds with the limit theorems of Sec. 4.2, we get X̄n → µ in L^p as n → ∞.

2. By the previous findings, the same upper and lower bounds that deliver convergence in probability also support almost sure convergence; therefore P( lim_{n→∞} X̄n = µ ) = 1.

6c) C

Two random variables X and Y have a bivariate normal distribution if the joint density is:

pdf_{X,Y}(x, y) = ( 1 / ( 2π σx σy √(1 − ρ^2) ) ) exp{ −(1 / (2(1 − ρ^2))) [ ((x − µx)/σx)^2 − 2ρ ((x − µx)/σx)((y − µy)/σy) + ((y − µy)/σy)^2 ] }

1. Compute marginal probability density function of X.

2. Show conditional distribution of Y given X = x is N(ν, τ2) and find ν and τ .

3. Show that X and Y are independent if and only if Cov(X,Y ) = 0.

1. Proof. Let Q(x, y) = −(1 / (2(1 − ρ^2))) [ ((x − µx)/σx)^2 − 2ρ ((x − µx)/σx)((y − µy)/σy) + ((y − µy)/σy)^2 ]. Completing the square in y, we can rewrite Q(x, y) as:

Q(x, y) = −(1 / (2(1 − ρ^2))) [ ( (y − µy)/σy − ρ (x − µx)/σx )^2 + (1 − ρ^2) ((x − µx)/σx)^2 ]

= −(1/2) [ ((x − µx)/σx)^2 + ( (y − α(x)) / (σy √(1 − ρ^2)) )^2 ], where α(x) = µy + ρ (σy/σx)(x − µx)


Thus, since fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy, we now have:

fX(x) = ( e^{−(1/2)((x − µx)/σx)^2} / ( 2π σx σy √(1 − ρ^2) ) ) ∫_{−∞}^{∞} e^{−(1/2)( (y − α(x)) / (σy √(1 − ρ^2)) )^2} dy

We next recognize that

( 1 / ( √(2π) σy √(1 − ρ^2) ) ) e^{−(1/2)( (y − α(x)) / (σy √(1 − ρ^2)) )^2}

is the p.d.f. of the N(α(x), σy^2 (1 − ρ^2)) distribution, so it integrates to 1 over y and the integral above equals √(2π) σy √(1 − ρ^2). Thus,

fX(x) = ( 1 / ( √(2π) σx ) ) e^{−(1/2)((x − µx)/σx)^2}

which is the N(µx, σx^2) density, i.e. the marginal distribution of X is N(µx, σx^2).

2. Proof. We recall fY|X(y|x) = fX,Y(x, y) / fX(x). Thus, substituting in fX(x) from part 1 and fX,Y(x, y):

fY|X(y|x) = [ 2π σx σy √(1 − ρ^2) ]^{−1} e^{−(1/2)[ ((x − µx)/σx)^2 + ( (y − α(x))/(σy √(1 − ρ^2)) )^2 ]} / ( [ √(2π) σx ]^{−1} e^{−(1/2)((x − µx)/σx)^2} )

= ( 1 / ( √(2π) σy √(1 − ρ^2) ) ) e^{−(1/2)( (y − α(x))/(σy √(1 − ρ^2)) )^2}

which is the N(α(x), σy^2 (1 − ρ^2)) density. Thus the conditional distribution of Y given X = x is N(ν, τ^2) with ν = α(x) = µy + ρ (σy/σx)(x − µx) and τ^2 = σy^2 (1 − ρ^2).

3. Proof. We recall Corollary 3.5.1, which states that two variables are independent ⇐⇒ fX,Y(x, y) = fX(x) fY(y). Using this, and by the symmetry of x and y in the bivariate normal distribution, fY(y) is the N(µy, σy^2) density. Therefore,

fX(x) fY(y) = ( 1 / (2π σx σy) ) e^{−(1/2)[ ((x − µx)/σx)^2 + ((y − µy)/σy)^2 ]}

If ρ = 0, then fX,Y(x, y) reduces to exactly this product, so X and Y are independent. For the converse, since ρ^2 ∈ [0, 1), let

M(g(x, y, ρ), ρ) = e^{ g(x, y, ρ)/(1 − ρ^2) − log(√(1 − ρ^2)) }, where

g(x, y, ρ) = −(1 − ρ^2) log(2π σx σy) − (1/2) [ ((x − µx)/σx)^2 − 2ρ ((x − µx)/σx)((y − µy)/σy) + ((y − µy)/σy)^2 ]

One can check that M(g(x, y, ρ), ρ) = fX,Y(x, y). Furthermore, by construction it is impossible for M(g(x, y, ρ), ρ) to equal fX(x) fY(y) unless ρ = 0, since otherwise the factor 1/(1 − ρ^2) and the term −log(√(1 − ρ^2)) are ≠ 1 and ≠ 0 respectively, shifting the density in a way that the now non-zero cross term −2ρ ((x − µx)/σx)((y − µy)/σy) cannot undo. Finally, since ρ = 0 ⇐⇒ Cov(X, Y) = 0, we conclude that X and Y are independent ⇐⇒ Cov(X, Y) = 0.


6d) D

Suppose Yi ∼ ind. Exponential(µi), where µi = 1/(βxi) with β > 0 and xi > 0.

1. Find the β maximizing ∏_{i=1}^{n} pdf_{Yi}(yi).

2. Show that the β̂ found in (1) converges to β in probability.

3. Show that Var(β̂)→ 0.

1. Proof. We know fYi(yi) = ( 1/(βxi) ) e^{−yi/(βxi)}. Therefore, by standard MLE practice, the likelihood is

L = ∏_{i=1}^{n} ( 1/(βxi) ) e^{−yi/(βxi)} =⇒ log(L) = −n log(β) + log( ∏_{i=1}^{n} 1/xi ) − (1/β) ∑_{i=1}^{n} yi/xi

Maximizing:

∂ log(L) / ∂β = −n/β + (1/β^2) ∑_{i=1}^{n} yi/xi = 0 =⇒ β̂ = (1/n) ∑_{i=1}^{n} yi/xi

2. First, we know E(β̂) = E( (1/n) ∑_{i=1}^{n} Yi/xi ) = (1/n) ∑_{i=1}^{n} E(Yi/xi) = (1/n)(nβ) = β, since E(Yi) = 1/µi = βxi =⇒ E(Yi/xi) = βxi/xi = β. By Chebyshev's inequality, Pr(|β̂ − β| ≥ ε) ≤ Var(β̂)/ε^2, and Var(β̂) = β^2/n → 0 by part 3 below, so β̂ → β in probability, consistent with the theorems of Sec. 4.2.

3. Since the variance of an exponential distribution equals the square of its mean (σ^2 = µ^2 in that sense), Var(Yi) = (βxi)^2 and hence Var(Yi/xi) = β^2. The Yi are independent, so

Var(β̂) = Var( (1/n) ∑_{i=1}^{n} Yi/xi ) = (1/n^2) ∑_{i=1}^{n} Var(Yi/xi) = (1/n^2)(nβ^2) = β^2/n → 0 as n → ∞
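An added simulation sketch (the covariates xi and the true β below are arbitrary choices): β̂ = (1/n) ∑ yi/xi concentrates around the true β, with variance shrinking like β^2/n.

    import random

    beta_true = 2.5
    for n in (10, 100, 1000, 10_000):
        xs = [0.5 + i / n for i in range(n)]        # arbitrary positive covariates
        estimates = []
        for _ in range(200):
            ys = [random.expovariate(1 / (beta_true * x)) for x in xs]   # E(Yi) = beta*x_i
            estimates.append(sum(y / x for x, y in zip(xs, ys)) / n)
        m = sum(estimates) / len(estimates)
        v = sum((e - m) ** 2 for e in estimates) / len(estimates)
        print(n, m, v, beta_true**2 / n)            # v tracks beta^2/n, m stays near beta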
