
Profiting from the Lottery

Brian Bumpas

December 5, 2011

Contents

1 Introduction

2 Lotteries in General, Abrams and Garibaldi's Model, and Some Definitions
    2.1 The Lottery
    2.2 Definitions
    2.3 The Model

3 Model Analysis
    3.1 Propositions
    3.2 eRoR Analysis
    3.3 Graphical Analysis

4 Model Application
    4.1 California's MEGA Millions Lottery
    4.2 Maximum Likelihood Estimations for N
    4.3 Obtaining the Other Necessary Data

5 Conclusion

1 Introduction

While the lottery is generally a "bad bet" and the chances of winning are extremely slim, there may be circumstances in which it actually turns out to be statistically worthwhile. For instance, the largest jackpot won in California was $315 million, a huge sum split between seven people in 2005 [2]. With such a huge payoff, was it actually a smart idea to buy into the lottery? The enormous possible payout greatly overwhelms the cost of the $1 ticket necessary to enter into the lottery, and this skews the expected rate of return. We will find that there have indeed been some cases where the jackpot was enough to produce a positive rate of return. This paper will use and explain a model developed by Aaron Abrams and Skip Garibaldi to analyze just how big a jackpot must be for these circumstances to arise and for the lottery to be a "good bet."

To accomplish this and to get a better grasp on the feasibility of a worthwhile wager, we will apply Abrams and Garibaldi's model to California's version of the MEGA Millions lottery. Using the California State Lottery's published statistics on the number of winners per drawing and the probability of winning each prize, we will estimate the total number of people in each drawing. Then, with the statistics on the number of people who win each prize, we can also estimate the expected rate of return for each prize. Once these results are obtained, we will compare each lottery we analyze to those in Abrams and Garibaldi's article, "Finding Good Bets in the Lottery, and Why You Shouldn't Take Them." From this analysis, we produce a strategy to decide when the lottery is a "good bet."

2 Lotteries in General, Abrams and Garibaldi's Model, and Some Definitions

Before we analyze the model, we will explain the general setup of a lottery, define a few terms used throughout the paper, and then introduce the model itself.

2.1 The Lottery

While there are many different types of lotteries, we will only deal with one in this paper: games with large jackpots. In these games, a player purchases a lottery ticket before a particular drawing takes place. Drawings are often conducted weekly or twice weekly. For each drawing, the lottery operator randomly selects numbers that match certain criteria. In effect, the player guesses which numbers will be drawn. If a player's numbers match enough of those drawn by the lottery operator, the player wins a prize. Note that a winner can claim only one prize per ticket: the best prize for which the ticket qualifies. There can be many different prizes (usually between five and ten), but we will divide these into three overarching categories.

We will refer to the first category as a fixed prize: if a player meets the criteria for winning this kind of award, he or she receives a "fixed" amount of money. We say this is fixed because the amount awarded does not vary from drawing to drawing, and it does not depend on how many people play during a drawing.

We call the second category a pari-mutuel prize. "Pari-mutuel" literally means "mutual bet." Applied to the lottery, a pari-mutuel prize is one that is split between its winners. Furthermore, for any particular drawing, the amount paid out for a pari-mutuel prize is a set proportion of the total amount collected. From here on, we will use the index \(i\) to distinguish the pari-mutuel prizes: for a lottery with \(d\) pari-mutuel prizes, we have \(1 \le i \le d\). If the lottery operators dedicate a proportion \(r_i\) of sales to the \(i\)-th pari-mutuel prize, then for a drawing that collects \(N\) dollars, that prize pool is worth a total of \(r_i N\) dollars. Once this amount is determined, the pool is split equally between the winners of that prize: if \(n_i\) players win it, each of those players receives \(r_i N / n_i\) dollars. It is important to note that the pari-mutuel prize amount varies per drawing: if more people play the lottery during one particular drawing, more money is collected by the lottery, so the pari-mutuel prize is worth more. On the other hand, if more people win that prize during a particular drawing, it is split between more people, so it is worth less to an individual.

The final type of prize is the jackpot. For each drawing where nobody wins the jackpot, the amount awarded for this prize grows. Strictly speaking, this is most frequently (but not necessarily always) a pari-mutuel prize because it is usually split between its winners. MEGA Millions and California SuperLOTTO Plus both offer pari-mutuel jackpots: the amount by which the jackpot increases per drawing is proportional to the number of people who play during that drawing, and if multiple people win the jackpot, it is split between them. Although the jackpot is also a pari-mutuel prize, we place it in its own category because it is usually far more valuable than any other pari-mutuel prize. One can think of the jackpot as the "First Prize" because everybody wants to win this amount.

This parlance can be rather muddled, so let us talk specifics. Lotto Texas is a game in which a player chooses six numbers, each between 1 and 54. We will henceforth refer to a player's choice of numbers as a ticket. In the Lotto Texas game, each ticket costs $1. A new drawing is held every Wednesday and Saturday evening. For the drawing itself, a machine randomly pulls out six balls. If the player's ticket has three matching numbers, then he or she can claim $3. Because there are \(\binom{6}{3}\) ways to choose three of the six winning numbers and \(\binom{48}{3}\) ways to choose three of the remaining numbers, the probability of winning this fixed prize is

\[
\binom{6}{3}\binom{48}{3} \times \frac{1}{25{,}827{,}165} = \frac{345{,}920}{25{,}827{,}165} \approx 0.013394.
\]

Thus, the odds of winning this prize are approximately 1 in 75. Since the prize value does not change per drawing, it is a fixed prize. Considering another winning scenario, if the player has four matching numbers, then he or she can claim a pari-mutuel prize that depends on how many tickets are purchased in total. This prize category is usually valued between $40 and $50, and the odds of winning are about 1 in 1,526. If the player has five matching numbers, then he or she can claim a different pari-mutuel prize, usually valued between $1,000 and $4,000. The odds of winning are about 1 in 89,678. Lastly, if the player has six matching numbers, he or she can claim the jackpot. This prize starts at $5 million, and increases by $1 million for each drawing where nobody wins the prize. The odds of winning are 1 in 25,827,165. Since the jackpot is pari-mutuel, if two people win the jackpot, it is split evenly between them. These statistics can be seen in [3].
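The counting argument above can be checked with a short script. This is only a sketch: the names `NUMBERS`, `PICKS`, and `ways_to_match` are ours, and the only inputs are the Lotto Texas rules described in the text (six numbers chosen from 1 to 54).

```python
from math import comb

NUMBERS = 54   # balls in the machine
PICKS = 6      # numbers on a ticket (and balls drawn)

def ways_to_match(k: int) -> int:
    """Count the tickets that match exactly k of the six drawn numbers."""
    return comb(PICKS, k) * comb(NUMBERS - PICKS, PICKS - k)

total_tickets = comb(NUMBERS, PICKS)

for k in (3, 4, 5, 6):
    ways = ways_to_match(k)
    print(f"match {k}: {ways:>7,} tickets, about 1 in {total_tickets / ways:,.0f}")
```

The printed odds should agree with the figures quoted from [3]: roughly 1 in 75, 1 in 1,526, 1 in 89,678, and 1 in 25,827,165.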

For example, say that John buys a ticket on Sunday for $1. He had to pick six numbers, each between 1 and 54, without repetition. But John is predictable: his favorite numbers are 3, 8, 20, 37, 42, and 47, so his ticket consists of these numbers. The following Wednesday, the Lotto Texas drawing is published. It turns out that the winning ticket was 3, 8, 15, 20, 42, 47. Unfortunately, John did not win the jackpot because his choice of 37 does not match the randomly selected 15. However, the remaining five of his numbers match those on the winning ticket, so he wins the five-matching-numbers category. For illustration, let us assume that during this drawing, the total N collected by the lottery was $2,000,000 (since each ticket is $1, this also means that 2,000,000 tickets were purchased), and that for the five-matching-numbers category, Lotto Texas pays out a total pool of 0.005N. Then for this particular drawing, the prize pool is worth 0.005 × $2,000,000 = $10,000. But let us say that four other players also won this prize, so John doesn't win the whole pool; rather, it is split equally between the five winners. Then each winner receives $10,000/5 = $2,000. So even though John did not win the jackpot, he is extremely happy because he came out nearly $2,000 richer!¹

¹ Note: For ease, we did not consider taxes in this hypothetical circumstance.

2.2 Definitions

Definition: Suppose a random variable \(X\) can take value \(X_1\) with probability \(P_1\), value \(X_2\) with probability \(P_2\), and so on, up to value \(X_k\) with probability \(P_k\). The Expected Value of \(X\) is

\[
E(X) = \sum_{i=1}^{k} X_i P_i.
\]

Intuitively, the expected value of \(X\) represents the value \(X\) will take on average. We will define the expected rate of return analogously.

Definition: For any particular asset or investment, its Expected Rate of Return (eRoR) is defined as

\[
(\mathrm{eRoR}) = \sum_{i=1}^{n} P_i R_i, \tag{2.1}
\]

where each \(P_i\) is the probability that the return \(R_i\) (in dollars) is attained for the asset, and \(n\) is the total number of possibilities.

This is a formula that we will modify numerous times. It is the sum of the products of all possible returns and their corresponding probabilities. Observe that \(\sum_{i=1}^{n} P_i = 1\). Also, the expected rate of return of an asset can be either positive or negative. For our purposes, let a "bad bet" be an asset that has a negative expected rate of return. Similarly, let a "good bet" be an asset with a positive expected rate of return.

As a quick example, say a financial analyst determines that purchasing a $1 stock in the company Oilfields Inc. will provide a return \(R_1 = \$0.20\) with probability \(P_1 = 0.05\). The positive return indicates that for every dollar invested in Oilfields Inc., the investor will get $1.20 back with a probability of 0.05. Let us also say that the other returns and their respective probabilities are \(R_2 = \$0.50\), \(P_2 = 0.50\) and \(R_3 = -\$0.35\), \(P_3 = 0.45\). Here, \(R_3 = -\$0.35\) indicates that a $1 investment will give back only sixty-five cents with a probability of 0.45; if this is the outcome, the investor loses thirty-five cents for every dollar he or she puts into the stock. To find the expected rate of return, we sum each of these products to get:

\[
(\mathrm{eRoR}) = \$0.20 \times 0.05 + \$0.50 \times 0.50 - \$0.35 \times 0.45 = \$0.1025 \approx \$0.10.
\]

So for this example, every dollar invested should give a net gain of about ten cents on average. Putting money into this stock is therefore a "good bet": on average, the investor should not lose money from it.
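Equation (2.1) translates directly into code. The sketch below reproduces the Oilfields Inc. arithmetic; the function name `expected_rate_of_return` and the list-of-pairs layout are our own choices, not part of the model.

```python
import math

def expected_rate_of_return(outcomes):
    """Sum P_i * R_i over (probability, return) pairs, as in Equation (2.1)."""
    total_probability = sum(p for p, _ in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * r for p, r in outcomes)

# (probability, return in dollars) for the Oilfields Inc. example
oilfields = [(0.05, 0.20), (0.50, 0.50), (0.45, -0.35)]
eror = expected_rate_of_return(oilfields)
print(f"eRoR = ${eror:.4f}")
```

The check that the probabilities sum to 1 mirrors the observation made right after the definition.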

We can now observe that for the lottery,

\[
\begin{aligned}
(\mathrm{eRoR}) = {} & -(\text{cost of ticket}) + (\text{expected winnings from fixed prizes}) \\
& + (\text{expected winnings from pari-mutuel prizes}) + (\text{expected winnings from jackpot}).
\end{aligned} \tag{2.2}
\]

2.3 The Model

Now we will reproduce Abrams and Garibaldi's model. To begin, we define some terms for individual drawings. To stay consistent with Section 2.1, let \(N\) be the total ticket sales for a drawing, in dollars. Any particular lottery has many different possible tickets; let \(t\) be the total number of those possibilities, that is, the number of distinct tickets. For example, in Lotto Texas, since each player chooses six numbers between 1 and 54, there are \(t = \binom{54}{6} = 25{,}827{,}165\) distinct tickets. So if Alice buys a ticket to this lottery, she can choose from 25,827,165 distinct tickets. Now, we want some way to represent each possible prize. We will say:

• There are \(t^{\mathrm{fix}}_i\) distinct ways to win a fixed prize worth \(a_i\) dollars. In the Lotto Texas example, \(a_1 = \$3\), and there is only one fixed prize. To win this prize, the player must have three matching numbers. As we discussed in Section 2.1, there are \(t^{\mathrm{fix}}_1 = \binom{6}{3}\binom{48}{3} = 345{,}920\) ways to win this prize because there are \(\binom{6}{3}\) ways to choose three of the six winning numbers and \(\binom{48}{3}\) ways to choose three of the remaining numbers. Thus, the probability of winning this fixed prize is \(345{,}920/25{,}827{,}165 \approx 0.013394\). Generally, with a total of \(c\) fixed prizes, we will say that \(t^{\mathrm{fix}}_1, t^{\mathrm{fix}}_2, t^{\mathrm{fix}}_3, \ldots, t^{\mathrm{fix}}_c\) tickets win fixed prizes of (positive) dollar amounts \(a_1, a_2, a_3, \ldots, a_c\), respectively. Then of the total possible tickets \(t\), any player that purchases one of the \(t^{\mathrm{fix}}_i\) tickets will receive a prize worth \(a_i\) dollars. Note that multiple people can buy the same ticket and win these prizes without changing the value of the prize.

• There are \(t^{\mathrm{pari}}_i\) ways to split a pot of \(r_i N\) dollars. In the hypothetical Lotto Texas example, the category of five matching numbers splits a pot of \(r_1 N = \$10{,}000\), where \(r_1 = 0.005\) and \(N = \$2{,}000{,}000\). Since there are \(\binom{6}{5}\) ways to choose five of the six winning numbers and \(\binom{48}{1}\) ways to choose the remaining losing number, there are \(t^{\mathrm{pari}}_1 = \binom{6}{5}\binom{48}{1} = 288\) ways to win this prize. Thus, the probability of winning the prize is \(288/25{,}827{,}165 \approx 0.0000112\), which matches the odds of about 1 in 89,678 quoted in Section 2.1. Generally speaking, there are \(t^{\mathrm{pari}}_1, t^{\mathrm{pari}}_2, t^{\mathrm{pari}}_3, \ldots, t^{\mathrm{pari}}_d\) distinct tickets that can split pari-mutuel pots of (positive) dollar amounts \(r_1 N, r_2 N, r_3 N, \ldots, r_d N\), respectively. Here, each \(r_i\) is the proportion of total sales \(N\) given to the prize won by the \(t^{\mathrm{pari}}_i\) tickets, and \(d\) is the total number of pari-mutuel prizes. Any player that purchases one of these \(t^{\mathrm{pari}}_i\) tickets will receive a prize that is inversely proportional to the number of players who win that prize: each individual wins \(r_i N / n_i\) dollars, where \(n_i\) is the number of people who win the \(i\)-th pari-mutuel prize.

• One of the possible distinct tickets wins a share of the (positive) jackpot \(J\). If \(w\) copies of this one ticket are sold, then each ticket holder receives \(J/w\) dollars.

To keep the final estimates of the rates of return accurate, from now on we will assume that the values above are after taxes. All values from here on are consequently adjusted for taxes.

For our model, we will also create some other variables to represent different statistics. These numbers depend on the structure of each lottery, not on any particular drawing of that lottery. As a result, they do not vary with the value of the jackpot or the number of people who play during any particular drawing. First, we are interested in the cost of a ticket minus the expected winnings from fixed prizes. To get this statistic, we start with the cost of a ticket. To find the average winnings that a player will receive from fixed prizes, we add together the value of each prize (once for every winning ticket), then divide by the total number of tickets. Finally, we subtract this quotient from the cost of each ticket. We denote the result by \(f\). Then with a ticket that costs $1,

\[
f := 1 - \frac{\sum_{i=1}^{c} a_i t^{\mathrm{fix}}_i}{t}.
\]

Thus, \(f\) represents the loss that a player will incur per ticket on average, after taking fixed prizes into account. This does not account for pari-mutuel prizes. Observe that the value \(\sum_{i=1}^{c} a_i t^{\mathrm{fix}}_i / t\) gives the amount that a player can expect to receive from fixed prizes per ticket purchased; subtracting it from the $1 ticket cost gives \(f\). We can also view \(f\) from the lottery operator's point of view: \(f\) represents the proportion of each ticket that goes to the jackpot, the pari-mutuel prizes, and the amount that the operator takes in as profit or to cover costs. Note that since each \(a_i\) is non-negative and since the lottery operator must stay profitable, \(0 < f \le 1\).

Furthermore, we will define \(F\) to be:

\[
F := 1 - \frac{\sum_{i=1}^{c} a_i t^{\mathrm{fix}}_i}{t} - \sum_{i=1}^{d} r_i = f - \sum_{i=1}^{d} r_i. \tag{2.3}
\]

Unfortunately, there is no direct interpretation of \(F\) from the player's perspective. However, remember that \(f\) can be used to represent the proportion of every ticket that goes to the jackpot, pari-mutuel prizes, and lottery overhead. Subtracting each rate \(r_i\) from \(f\) removes the pari-mutuel prizes from the equation. Thus, from the lottery operator's perspective, \(F\) represents the proportion of every ticket that goes to the jackpot and overhead. Furthermore, the lottery pays out some money in pari-mutuel prizes (so \(F < f\)), and the operator must profit from the lottery (so \(F > 0\)); hence \(0 < F < f \le 1\).
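As a rough numerical illustration of \(f\) and \(F\), the sketch below uses the Lotto Texas fixed prize from Section 2.1. The pari-mutuel rates are hypothetical placeholders chosen only to show the arithmetic, taxes are ignored, and the function names are ours.

```python
def fixed_prize_loss(fixed_prizes, total_tickets, ticket_cost=1.0):
    """f: ticket cost minus expected fixed-prize winnings per ticket.

    fixed_prizes is a list of (prize_value, winning_ticket_count) pairs.
    """
    expected_fixed = sum(a * t_fix for a, t_fix in fixed_prizes) / total_tickets
    return ticket_cost - expected_fixed

def jackpot_and_overhead_share(f, pari_mutuel_rates):
    """F: f minus the total proportion r_1 + ... + r_d paid to pari-mutuel prizes."""
    return f - sum(pari_mutuel_rates)

# Lotto Texas: one fixed prize of $3 with 345,920 winning tickets out of 25,827,165.
f = fixed_prize_loss([(3.0, 345_920)], 25_827_165)
# Hypothetical pari-mutuel rates, for illustration only (not Lotto Texas's actual rates).
F = jackpot_and_overhead_share(f, [0.005, 0.02])
print(f"f = {f:.4f}, F = {F:.4f}")
```

With these assumed rates, the result respects the ordering \(0 < F < f \le 1\) noted above.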

Also note that the probability of winning the jackpot along with \(w - 1\) other people is given by:

\[
\binom{N-1}{w-1}\left(\frac{1}{t}\right)^{w}\left(1 - \frac{1}{t}\right)^{N-w}. \tag{2.4}
\]

We justify this by choosing the \(w - 1\) other winners from the remaining \(N - 1\) players (that is, everyone except you). This gives the binomial coefficient. Each of the \(w\) winners has a chance of \(p = 1/t\) to win, so \(1/t\) multiplies the binomial coefficient \(w\) times. Each losing ticket has probability \(1 - 1/t\) of missing the jackpot, so we multiply by this \(N - w\) times (since there are \(N - w\) losers).

To simplify the equation for the (eRoR) and reduce the space it takes up, let us define the function \(s\) as:

\[
s(p, N) := \sum_{w \ge 1} \frac{1}{w}\binom{N-1}{w-1} p^{w} (1-p)^{N-w}.
\]

With this definition, the expected return from the jackpot is \(J s(1/t, N)\). We multiply by \(1/w\) because the value of the player's share of the jackpot, after a split with \(w - 1\) other people, is \(J/w\). Similarly, we can find the amount that one can expect to win from pari-mutuel prizes: the \(i\)-th pari-mutuel pool is worth \(r_i N\) and is split in the same way, so we weight it by the probability of winning that prize alongside \(w - 1\) others. For the \(i\)-th pari-mutuel prize, the expected return for that prize alone is therefore \(r_i N s(p_i, N)\). We do this for each pari-mutuel prize, so the expected winnings from pari-mutuel prizes are \(\sum_{i=1}^{d} r_i N s(p_i, N)\). Also remember that \(f\) is the average loss a player will incur after accounting for fixed prizes. Then \(-f\) gives us the negative of the cost of a ticket plus the expected winnings from fixed prizes. Combining these into Equation (2.2) transforms it into:

\[
\begin{aligned}
(\mathrm{eRoR}) &= -(\text{cost of ticket}) + (\text{expected winnings from fixed prizes}) \\
&\quad + (\text{expected winnings from pari-mutuel prizes}) + (\text{expected winnings from jackpot}) \\
&= -f + \sum_{i=1}^{d} r_i N s(p_i, N) + J s\!\left(\frac{1}{t}, N\right),
\end{aligned} \tag{2.5}
\]

where \(p_i\) is the probability of winning the \(i\)-th pari-mutuel prize and \(d\) is the number of pari-mutuel prizes. Note that it is neither certain that a person will win any given prize in the lottery nor certain that they will lose, so \(0 < p_i < 1\).

3 Model Analysis

Here we will introduce some mathematical insights and then proceed with analyzing the model itself.


3.1 Propositions

Let us now take some propositions into consideration. In the interest of space, we will only prove one of them, but the rest are justified in Abrams and Garibaldi's paper.

Proposition 3.1: For \(0 < p < 1\) and \(N > 0\), we have:

\[
s(p, N) = \frac{1 - (1-p)^{N}}{N}.
\]
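Although we do not prove Proposition 3.1 here, it is easy to spot-check numerically for integer \(N\). The sketch below compares the defining sum of \(s(p, N)\) with the closed form; the function names and the test values of \(p\) and \(N\) are arbitrary choices of ours.

```python
from math import comb

def s_sum(p: float, n: int) -> float:
    """s(p, N) computed directly from its defining sum over w = 1..N."""
    return sum(comb(n - 1, w - 1) * p**w * (1 - p)**(n - w) / w
               for w in range(1, n + 1))

def s_closed(p: float, n: float) -> float:
    """Closed form from Proposition 3.1: (1 - (1 - p)^N) / N."""
    return (1 - (1 - p)**n) / n

for p, n in [(0.5, 4), (0.1, 25), (0.013394, 200)]:
    print(p, n, s_sum(p, n), s_closed(p, n))
```

The sum stops at \(w = N\) because the binomial coefficient vanishes for larger \(w\).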

Plugging this into Equation (2.5) results in:

\[
(\mathrm{eRoR}) = -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{N}\right) + J s\!\left(\frac{1}{t}, N\right). \tag{3.1}
\]
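Equation (3.1) is simple to evaluate once the lottery's parameters are known. The following is a minimal sketch, assuming a $1 ticket; every number in the sample call is a made-up illustration, not data from a real drawing, and the function names are ours.

```python
def s(p: float, n: float) -> float:
    """Closed form of s(p, N) from Proposition 3.1."""
    return (1 - (1 - p)**n) / n

def eror(f, pari_mutuel, jackpot, total_tickets, n_sales):
    """Expected rate of return per $1 ticket, following Equation (3.1).

    pari_mutuel is a list of (r_i, p_i) pairs.
    """
    pari_part = sum(r * (1 - (1 - p)**n_sales) for r, p in pari_mutuel)
    jackpot_part = jackpot * s(1 / total_tickets, n_sales)
    return -f + pari_part + jackpot_part

# Hypothetical drawing: all values below are illustrative assumptions.
value = eror(f=0.96, pari_mutuel=[(0.005, 288 / 25_827_165)],
             jackpot=20_000_000, total_tickets=25_827_165, n_sales=2_000_000)
print(f"eRoR = {value:.4f}")
```

For these assumed inputs the result is negative, so this hypothetical drawing would be a "bad bet."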

Lemma 3.2: If \(0 < c < 1\), then \(1 - \frac{1}{c} - \ln c < 0\).

Lemma 3.3: If \(b \in (0, 1)\), then:

1. The function

\[
h(x, y) := \frac{1 - b^{xy}}{x}
\]

satisfies \(h_x := \partial h / \partial x < 0\) and \(h_y := \partial h / \partial y > 0\) for \(x, y > 0\).

2. For every \(z > 0\), the level set \(\{(x, y) \mid h(x, y) = z\}\) intersects the first quadrant in the graph of a smooth, positive, increasing, concave up function defined on the interval \((0, 1/z)\).

With these lemmas, we can prove the following corollary.

Corollary 3.4 (Abrams & Garibaldi in [1]): Suppose that \(0 < p < 1\). Then for \(x \ge 0\),

1. The function \(x \mapsto s(p, x)\) decreases from \(-\ln(1-p)\) to 0.

2. The function \(x \mapsto x\,s(p, x)\) increases from 0 to 1.

Proof:

1. We will first show that \(x \mapsto s(p, x)\) is a decreasing function, then show that its limit as \(x \to \infty\) is 0, and finally show that \(s(p, x) \to -\ln(1-p)\) as \(x \to 0^+\). Together, these facts show that \(x \mapsto s(p, x)\) decreases from \(-\ln(1-p)\) to 0.

(a) To show that \(x \mapsto s(p, x)\) is decreasing, we set \(y = 1\) and \(b = 1 - p\). Then \(h(x, y)\) becomes \(h(x, 1) = \frac{1 - (1-p)^{x}}{x} = s(p, x)\). We have \(0 < 1 - p < 1\), so by Lemma 3.3, \(h_x < 0\) and \(s(p, x)\) is decreasing.

(b) Next we show that \(s(p, x)\) tends to 0 as \(x \to \infty\). We can do this by inspection: since \(0 < p < 1\), we know \(0 < 1 - p < 1\), so \(\lim_{x \to \infty} (1-p)^{x} = 0\). Also, as \(x \to \infty\), we know that \(1/x \to 0\). It follows that \(\lim_{x \to \infty} s(p, x) = 0\).

(c) For \(0 < a < 1\), \(\ln(a) < 0\), so \(-\ln(a) > 0\) on the same interval. Because \(0 < 1 - p < 1\), it follows that \(-\ln(1-p) > 0\). It remains to find the limit of \(s(p, x)\) as \(x \to 0^+\). Both the numerator and the denominator of \(s(p, x)\) tend to 0, so we use l'Hôpital's Rule. Differentiating the numerator and denominator with respect to \(x\), we get:

\[
\begin{aligned}
\lim_{x \to 0^+} s(p, x) &= \lim_{x \to 0^+} \frac{\frac{d}{dx}\left[1 - (1-p)^{x}\right]}{\frac{d}{dx}\,x} \\
&= \lim_{x \to 0^+} \frac{-\ln(1-p)\,(1-p)^{x}}{1} \\
&= -\ln(1-p)\,(1-p)^{0} \\
&= -\ln(1-p).
\end{aligned}
\]

2. We have \(x\,s(p, x) = x \cdot \frac{1 - (1-p)^{x}}{x} = 1 - (1-p)^{x}\). Its derivative with respect to \(x\), namely \(-\ln(1-p)\,(1-p)^{x}\), is positive, so the function is increasing. As \(x \to \infty\), \(x\,s(p, x) \to 1\), and as \(x \to 0\), \(x\,s(p, x) \to 0\). Hence, on \(x \ge 0\), \(x\,s(p, x)\) increases from 0 to 1.

3.2 eRoR Analysis

In this section, we will begin a deeper analysis of the (eRoR). With the above results, we find that there is effectively a "jackpot cutoff" \(J_0\): with a jackpot value that is less than \(J_0\), a lottery drawing will certainly be a bad bet. The cutoff comes from an upper bound on the (eRoR) in Equation (3.1). We will find that \(J_0\) is so high that a lottery will rarely exhibit positive returns.

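Note that since \(0 \le p_i \le 1\), each term \(1 - (1-p_i)^{N} \le 1\). Also, \(-f + \sum_{i=1}^{d} r_i = -F\). Then Equation (3.1) becomes:

\[
\begin{aligned}
(\mathrm{eRoR}) &= -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{N}\right) + J s\!\left(\frac{1}{t}, N\right) \\
&\le -f + \sum_{i=1}^{d} r_i + J s\!\left(\frac{1}{t}, N\right) = -F + J s\!\left(\frac{1}{t}, N\right).
\end{aligned}
\]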

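But by Corollary 3.4, since \(N > 1\) and \(s(p, x)\) is decreasing in \(x\), we have \(s\!\left(\frac{1}{t}, N\right) < s\!\left(\frac{1}{t}, 1\right) = \frac{1}{t}\). Then:

\[
(\mathrm{eRoR}) < -F + \frac{J}{t}. \tag{3.2}
\]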

With this insight, we will define the value \(J_0\) as the jackpot cutoff. We call this the jackpot cutoff because for a drawing with a lottery jackpot \(J\) such that \(J < J_0\), (eRoR) will certainly be negative. We let:

\[
J_0 := Ft.
\]

Then in order for expected returns to be positive, we must have the lottery's jackpot \(J > Ft = J_0\). In other words, the jackpot must be greater than the proportion of lottery income that goes into the jackpot and overhead multiplied by the total number of distinct tickets. Any lottery with jackpot \(J < J_0\) will then be a "bad bet." Equivalently, the lottery can be a "good bet" only if \(J\) is large enough that \(J/J_0 > 1\). This normalization of \(J\) helps provide useful criteria that will assist us in graphing our model. Similarly, we can normalize \(N\) and deal with the ratio \(N/J\). This will also help us in graphical analysis: we can think of the jackpot as large or small in comparison with the number of people who participate in a particular drawing. With these ratios, we define two new variables:

\[
x := \frac{N}{J} \ \text{ so that } N = xJ; \qquad \text{and} \qquad y := \frac{J}{J_0} \ \text{ so that } J = yJ_0.
\]
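To make the normalization concrete, here is a small sketch that computes \(J_0\), \(x\), and \(y\) for one drawing. The value \(F = 0.9\), the sales figure, and the jackpot are assumptions chosen for illustration; only the ticket count \(t = 25{,}827{,}165\) comes from the Lotto Texas example, and the function names are ours.

```python
def jackpot_cutoff(F: float, total_tickets: int) -> float:
    """J0 = F * t: below this jackpot the drawing is certainly a bad bet."""
    return F * total_tickets

def normalized_coordinates(n_sales: float, jackpot: float, j0: float):
    """Return (x, y) = (N / J, J / J0) for one drawing."""
    return n_sales / jackpot, jackpot / j0

# Illustrative assumptions: F = 0.9 with the Lotto Texas ticket count.
j0 = jackpot_cutoff(0.9, 25_827_165)
x, y = normalized_coordinates(n_sales=2_000_000, jackpot=30_000_000, j0=j0)
print(f"J0 = ${j0:,.0f}, x = {x:.4f}, y = {y:.4f}")
```

Note that \(xyJ_0\) recovers \(N\), which is the substitution used in the exponents of the normalized (eRoR).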

Here we assume that the jackpot cutoff \(J_0\) is greater than 0. Similarly, we assume that an individual drawing's jackpot \(J\) is greater than 0. Lastly, since \(N\) and \(J\) both vary per drawing (but \(J_0\) does not), \(x\) and \(y\) also vary per drawing. These variables will simplify our graphical analysis and give us relatively easy criteria that we can use to determine when a lottery is a good bet. If we substitute our new expressions for \(N\) and \(J\), as well as the closed form of \(s(p, N)\) from Proposition 3.1, into Equation (3.1), we obtain:

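\[
\begin{aligned}
(\mathrm{eRoR}) &= -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{N}\right) + J s\!\left(\frac{1}{t}, N\right) \\
&= -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xJ}\right) + (yJ_0)\, s\!\left(\frac{1}{t}, xJ\right) \\
&= -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xyJ_0}\right) + (yJ_0)\, s\!\left(\frac{1}{t}, xyJ_0\right) \\
&= -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xyJ_0}\right) + (yJ_0) \cdot \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{xyJ_0}.
\end{aligned}
\]

Then:

\[
(\mathrm{eRoR}) = -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x}. \tag{3.3}
\]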

This equation gives us our most powerful graphical insight: when we set it equal to 0, we obtain the break-even curve. In the interest of our graphical analysis, we do not simplify the exponents. For any lottery, we can substitute the values of \(f\), \(r_i\), \(p_i\), and \(1/t\), then graph the results as level curves in the two-dimensional \((x, y)\) plane. We will apply this equation and graph it in Section 4.3. Any particular ordered pair \((x_0, y_0)\) that lies on this break-even curve will have an (eRoR) of 0, while any ordered pair that lies above it will have positive returns.
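One way to trace the break-even curve numerically is to fix \(x\) and search for the \(y\) at which Equation (3.3) vanishes. The sketch below uses plain bisection, which is our choice of method rather than anything prescribed by the model; it relies on (eRoR) being increasing in \(y\). The lottery parameters in the sample call are hypothetical, and the function names are ours.

```python
def eror_xy(x, y, f, pari_mutuel, total_tickets, j0):
    """Equation (3.3): expected rate of return in the normalized variables x, y."""
    n_sales = x * y * j0
    pari_part = sum(r * (1 - (1 - p)**n_sales) for r, p in pari_mutuel)
    jackpot_part = (1 - (1 - 1 / total_tickets)**n_sales) / x
    return -f + pari_part + jackpot_part

def break_even_y(x, f, pari_mutuel, total_tickets, j0, y_max=1e3, tol=1e-9):
    """Bisect on y for eRoR = 0; returns None if eRoR is not positive at y_max."""
    lo, hi = 1e-12, y_max
    if eror_xy(x, hi, f, pari_mutuel, total_tickets, j0) <= 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eror_xy(x, mid, f, pari_mutuel, total_tickets, j0) < 0:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical lottery, for illustration only.
t = 25_827_165
f, pari = 0.96, [(0.005, 288 / t)]
j0 = (f - 0.005) * t  # J0 = F * t with F = f - r_1
print(break_even_y(0.1, f, pari, t, j0))
```

Repeating the call over a grid of \(x\) values gives points on the break-even curve; for large \(x\) the function returns None because no jackpot makes the drawing break even.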

Proposition 3.5 (Abrams and Garibaldi in [1]): For every lottery, the break-even curve is the graph of a smooth, positive function \(g(x)\) with domain \(\left(0, \frac{1}{F}\right)\).

Proof:

With any fixed value of \(x\), we can think of (eRoR) as a smooth function of one variable, namely \(y\). This variable can never be 0 mathematically because that would imply \(J = 0\), which in turn would make \(x = N/J\) undefined. Practically speaking, we will never have a jackpot \(J = 0\) because each lottery that has a jackpot will put some money towards it.

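Then as \(y \to 0^+\), we have:

\[
\begin{aligned}
\lim_{y \to 0^+} (\mathrm{eRoR}) &= \lim_{y \to 0^+} \left[-f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x}\right] \\
&= -f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{0}}{x} \\
&= -f + \sum_{i=1}^{d} r_i\,(1 - 1) + \frac{1 - 1}{x} \\
&= -f.
\end{aligned}
\]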

Note that \(-f < 0\), so there is a value of \(y\) close to 0 (call it \(y_0\)) at which our (eRoR) is negative. Intuitively, this makes sense: a lottery has to stay profitable, so it cannot hand out more in fixed prizes than it takes in via ticket sales. Then the only way we can have (eRoR) > 0 is if nobody has won the jackpot in such a long time that it has increased to a huge sum.

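Next, let us consider the partial derivative of our (eRoR) function with respect to \(y\). Keep in mind that \(f\) is not a function of \(J\) or \(J_0\), so \(\partial f / \partial y = 0\). This gives us:

\[
\begin{aligned}
\frac{\partial}{\partial y}(\mathrm{eRoR}) &= \frac{\partial}{\partial y}\left[-f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x}\right] \\
&= -\sum_{i=1}^{d} r_i\, xJ_0 \ln(1-p_i)\,(1-p_i)^{xyJ_0} - \frac{xJ_0 \ln\left(1 - \frac{1}{t}\right)\left(1 - \frac{1}{t}\right)^{xyJ_0}}{x} \\
&= -\sum_{i=1}^{d} r_i\, xJ_0 \ln(1-p_i)\,(1-p_i)^{xyJ_0} - J_0 \ln\left(1 - \frac{1}{t}\right)\left(1 - \frac{1}{t}\right)^{xyJ_0}.
\end{aligned}
\]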

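We know that \(0 < 1 - p_i < 1\) for each \(p_i\), so \(\ln(1-p_i)\) is defined and is negative. Similarly, \(\ln\left(1 - \frac{1}{t}\right) < 0\). Then the partial derivative of the (eRoR) with respect to \(y\) is positive. Thus, (eRoR) is a continuous, strictly increasing function of \(y\) on our domain. We also have:

\[
\begin{aligned}
\lim_{y \to \infty} (\mathrm{eRoR}) &= \lim_{y \to \infty} \left[-f + \sum_{i=1}^{d} r_i\left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x}\right] \\
&= -f + \sum_{i=1}^{d} r_i + \frac{1}{x} \\
&= -F + \frac{1}{x}.
\end{aligned}
\]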

Thus, the limit can be positive only when \(x \in \left(0, \frac{1}{F}\right)\). For any \(x\) in this interval, there is a value of \(y\) (call it \(y_k\)) such that (eRoR) > 0. Hence, since (eRoR) is negative at \(y_0\), positive at \(y_k\), and continuous in \(y\), the Intermediate Value Theorem gives a \(y \in (y_0, y_k)\) such that (eRoR) = 0. Because (eRoR) is strictly increasing in \(y\), this \(y\) is unique.

Differentiating the (eRoR) with respect to \(x\) leads to a more complicated result, and it is unclear whether (eRoR) increases or decreases in \(x\). But we can use the Implicit Function Theorem to show that the graph of the break-even curve is still smooth. In conclusion, since (eRoR) is a smooth function of two variables, we can plot smooth level curves of (eRoR) as functions of \(x\) on the domain. The level curve for (eRoR) = 0 is the break-even curve.

3.3 Graphical Analysis

Now that we can graph the break-even curve and have found some of its general properties, we will establish upper and lower bounds for it. This brings us to another of Abrams and Garibaldi's most crucial results, which we will eventually prove. Let the curves \(U\) and \(L\) be defined by:

\[
U := \left\{ (x, y) \;\middle|\; -1 + \frac{1 - 0.45^{xy}}{x} = 0 \right\}
\]

and

\[
L := \left\{ (x, y) \;\middle|\; -0.8 + \frac{1 - 0.36^{xy}}{x} = 0 \right\}.
\]
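Both curves can be solved explicitly for \(y\), which makes them easy to tabulate or plot. Rearranging the defining equations gives \(y = \ln(1-x) / (x \ln 0.45)\) on \(U\) for \(0 < x < 1\), and \(y = \ln(1-0.8x) / (x \ln 0.36)\) on \(L\) for \(0 < x < 1.25\). The rearrangement and the function names below are ours; the sketch simply evaluates these expressions.

```python
import math

def upper_curve(x: float) -> float:
    """y on U: solves -1 + (1 - 0.45**(x*y)) / x = 0, valid for 0 < x < 1."""
    return math.log(1 - x) / (x * math.log(0.45))

def lower_curve(x: float) -> float:
    """y on L: solves -0.8 + (1 - 0.36**(x*y)) / x = 0, valid for 0 < x < 1.25."""
    return math.log(1 - 0.8 * x) / (x * math.log(0.36))

for x in (0.1, 0.3, 0.5):
    print(f"x = {x}: L gives y = {lower_curve(x):.3f}, U gives y = {upper_curve(x):.3f}")
```

At each sampled \(x\), the value on \(L\) sits below the value on \(U\), which is the band in which the break-even curve is claimed to lie.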

Definition: A major lottery is one with at least 500 distinct tickets.

Note: This is a fairly arbitrary definition, and almost all state- or federal-run lotteries will meet this condition. In particular, the MEGA Millions lottery has over 150 million distinct tickets.

Lemma 3.6: The function \(g(t) = \left(1 - \frac{1}{t}\right)^{t}\) is increasing for \(t > 1\), and its limit as \(t \to \infty\) is \(\frac{1}{e}\).

Note: We will not prove this, but it is justified in Abrams and Garibaldi's paper.

Theorem 3.7 (Abrams and Garibaldi in [1]): For any major lottery with \(F \ge 0.8\), so that the lottery pays out less than 20% of revenue in prizes other than the jackpot:

1. The break-even curve lies in the region between the curves \(U\) and \(L\) and to the right of the \(y\)-axis.

2. Any drawing with \((x, y)\) above the break-even curve has a positive (eRoR).

3. Any drawing with \((x, y)\) below the break-even curve has a negative (eRoR).

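Proof:

For Part (1), we first claim that for \(c > 0\),

\[
1 - 0.45^{c} < cJ_0\, s\!\left(\frac{1}{t}, cJ_0\right) < 1 - 0.36^{c}. \tag{3.4}
\]

To show this, recall that \(J_0 = Ft\) and \(s(p, N) = \frac{1 - (1-p)^{N}}{N}\), so

\[
cJ_0\, s\!\left(\frac{1}{t}, cJ_0\right) = cJ_0 \times \frac{1 - \left(1 - \frac{1}{t}\right)^{cJ_0}}{cJ_0} = 1 - \left(1 - \frac{1}{t}\right)^{cJ_0}.
\]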

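By Lemma 3.6, \(\left(1 - \frac{1}{t}\right)^{t}\) is increasing. Since we assumed \(t \ge 500\) and \(\left(1 - \frac{1}{500}\right)^{500} = 0.998^{500} \approx 0.3675\), we have:

\[
\left(1 - \frac{1}{t}\right)^{t} \ge 0.998^{500} > 0.36.
\]

For the lower bound in (3.4), Lemma 3.6 implies that \(\left(1 - \frac{1}{t}\right)^{t}\) is less than \(\frac{1}{e}\). Since \(J_0 = Ft\), we can write \(\left(1 - \frac{1}{t}\right)^{cJ_0} = \left[\left(1 - \frac{1}{t}\right)^{t}\right]^{cF}\). Combining these, we get:

\[
1 - e^{-cF} < cJ_0\, s\!\left(\frac{1}{t}, cJ_0\right) < 1 - 0.36^{cF}. \tag{3.5}
\]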

Since our hypothesis guarantees \(F \ge 0.8\), we obtain:

\[
1 - e^{-cF} \ge 1 - e^{-0.8c} \approx 1 - 0.4493^{c} > 1 - 0.45^{c}. \tag{3.6}
\]

Similarly, since \(F \le 1\),

\[
1 - 0.36^{cF} \le 1 - 0.36^{c}. \tag{3.7}
\]

Combining Equation (3.6) and Equation (3.7) with Equation (3.5) gives us the claim in Equation (3.4).
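The constants in this argument are easy to confirm numerically. The sketch below prints the two key values and spot-checks inequality (3.4) on a handful of parameter combinations; the specific values of \(c\), \(F\), and \(t\) are arbitrary test points of ours, and a spot check is of course no substitute for the proof.

```python
import math

def c_j0_s(c: float, F: float, t: int) -> float:
    """c*J0*s(1/t, c*J0) with J0 = F*t, simplified to 1 - (1 - 1/t)**(c*F*t)."""
    return 1 - (1 - 1 / t)**(c * F * t)

print(f"0.998**500 = {0.998**500:.4f}")      # the constant compared with 0.36
print(f"exp(-0.8)  = {math.exp(-0.8):.4f}")  # the constant compared with 0.45

# Spot-check inequality (3.4) on a few assumed (c, F, t) combinations.
for t in (500, 25_827_165):
    for F in (0.8, 0.9, 0.95):
        for c in (0.05, 0.5, 2.0):
            middle = c_j0_s(c, F, t)
            assert 1 - 0.45**c < middle < 1 - 0.36**c
```

The smallest allowed values, \(t = 500\) and \(F = 0.8\), are the cases where the bounds are tightest.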

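If we return to Equation (3.1), we can note that each \(r_i\) is non-negative, so every term of the sum is non-negative. To obtain a lower bound, we can therefore drop the sum (equivalently, take \(p_i = 0\) for all \(i\)), leaving us with \(-f + J s(1/t, N) \le (\mathrm{eRoR})\). Combining this with the upper bound \(-F + J s(1/t, N)\) derived in Section 3.2, we obtain:

\[
-f + J s\!\left(\frac{1}{t}, N\right) \le (\mathrm{eRoR}) \le -F + J s\!\left(\frac{1}{t}, N\right).
\]

Then we can replace \(N\) with \(xyJ_0\) and \(J\) with \(\frac{xyJ_0}{x} = yJ_0\), use \(f \le 1\) and \(F \ge 0.8\), and apply Equation (3.4) with \(c = xy\). Recall that \(x, y > 0\). Then the inequality transforms into:

\[
\begin{aligned}
-f + J s\!\left(\frac{1}{t}, N\right) &\le (\mathrm{eRoR}) \le -F + J s\!\left(\frac{1}{t}, N\right) \\
-1 + yJ_0\, s\!\left(\frac{1}{t}, xyJ_0\right) &\le (\mathrm{eRoR}) \le -0.8 + yJ_0\, s\!\left(\frac{1}{t}, xyJ_0\right) \\
-1 + \frac{1 - 0.45^{xy}}{x} &< (\mathrm{eRoR}) < -0.8 + \frac{1 - 0.36^{xy}}{x}.
\end{aligned} \tag{3.8}
\]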

Analogously to Lemma 3.3, the partial derivatives of the upper and lower bounds in Equation (3.8) are negative with respect to \(x\) and positive with respect to \(y\). This implies part (1) of the theorem.

Parts (2) and (3) follow directly from the proof of Proposition 3.5: for any fixed value \(x\), (eRoR) is strictly increasing in \(y\) and vanishes on the break-even curve, so any \(y\) below the curve (such as \(y_0\)) gives a negative (eRoR), and any \(y\) above it (such as \(y_k\)) gives a positive (eRoR).

4 Model Application

Now we will begin to acquire the data needed to graph the break-even curves of California's lotteries. In order to do this, we will need to compile data to represent each lottery. Specifically, the California Lottery provides statistics on the probability of winning each prize (we denote this probability \(p_i\)), the value of each prize per drawing (denoted by \(v_i\)), and the number of people winning each prize per drawing (denoted by \(n_i\)). We must use this data to find the value of the jackpot cutoff (denoted by \(J_0\)), the rates at which the lottery funds pari-mutuel prizes (denoted by \(r_i\)), and the total number of people playing in each drawing (denoted by \(N\)). However, in some situations, the best way to get this data may not be immediately obvious.

4.1 California's MEGA Millions Lottery

Before we analyze the MEGA Millions lottery, let us first briefly discuss its setup. Each ticket costs $1,
and a player chooses five numbers between 1 and 56 as well as one number between 1 and 46 (this is the MEGA
number). Drawings are conducted twice weekly, on Tuesdays and Fridays. A player's winnings are based on how
many numbers he or she matches. The jackpot is given out to players who match all five base numbers as well
as the MEGA number, while the second highest prize is given to players who match just the five base numbers.
The third highest prize is given to players who match four base numbers as well as the MEGA number, while
the fourth highest prize is given to players who match just four base numbers, etc. Players also win a prize
for matching only the MEGA number. Each prize is pari-mutuel, so compensation depends on how many people won
that prize. There are no fixed prizes. The last piece of pertinent information is that there are no
California state taxes, while federal withholding taxes are levied at a rate of 30% on non-U.S. citizens on
prizes valued at over $600. We will not assume that the players are American citizens, because citizenship
is not necessary to play the MEGA Millions lottery in California. The curious reader can adjust the model
for U.S. citizens alone by noting that the corresponding federal tax rate is 25%.

4.2 Maximum Likelihood Estimations for N

When dealing with situations similar to the lottery, there is a large amount of variance, which we can think
of as uncertainty in estimation. For example, consider the number of people per drawing and the equation
N pi = ni. For any total number of participants N, this equation gives an estimate of how many people ni win
prize i based on the probability pi of winning that prize. Then if we know ni and pi and want to estimate
the value of N, one way to do so is to divide the number of winners ni by the probability pi of winning that
prize, so that N ≈ ni/pi. But this estimate can vary widely: looking at Table 4.1 under the prize category
for 5 matching numbers, we would estimate that N ≈ 3 · 3,904,701 = 11,714,103. That would mean roughly
one-third of California's population bought a ticket in the four days between 10/21/2011 (the date of the
previous drawing) and 10/25/2011, assuming that each person buys only one ticket. This does not seem very
plausible for such a low jackpot. But if we estimate the same N using the prize category for 3 matching
numbers, we get N ≈ 12,465 · 306 = 3,814,290. Since the jackpot for this particular drawing was not very
high, it seems far more likely that a relatively small number of people participated in the drawing. As we
have just seen, estimates that arise from simply dividing ni by pi can produce a wide range of approximate
values for N, giving us a wide range of error. We will therefore expand on Abrams and Garibaldi's model by
turning to maximum likelihood estimates.
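The spread of the single-category estimates can be tabulated directly. The following is a minimal sketch using the winner counts and odds from Table 4.1; the names `TABLE_4_1` and `naive_estimates` are hypothetical. Because each probability is published as "1 in X", dividing ni by pi is the same as multiplying ni by X.

```python
# (prize category, winners n_i, odds X where p_i = 1/X), copied from Table 4.1.
TABLE_4_1 = [
    ("5 + Mega", 0, 175_711_536),
    ("5", 3, 3_904_701),
    ("4 + Mega", 8, 689_065),
    ("4", 226, 15_313),
    ("3 + Mega", 284, 13_781),
    ("3", 12_465, 306),
    ("2 + Mega", 4_945, 844),
    ("1 + Mega", 29_321, 141),
    ("Mega", 58_662, 75),
]


def naive_estimates(rows):
    """Return {category: n_i / p_i}, the single-category estimate of N."""
    return {category: winners * odds for category, winners, odds in rows}


if __name__ == "__main__":
    estimates = naive_estimates(TABLE_4_1)
    for category, estimate in estimates.items():
        print(f"{category:>9}: N is about {estimate:>12,}")
    print(f"range: {min(estimates.values()):,} to {max(estimates.values()):,}")
```

The estimates range from zero (no jackpot winners) to more than eleven million, which is the variability that motivates a pooled estimator.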

Recall from Probability Theory: if we want the probability of events x1, ..., xn dependent on a model
parameter θ, we would have a statement like Pr(x1, ..., xn | θ). Here we take the model parameter θ as a
fixed value and the values xj, where 0 < j ≤ n, as unknowns. But what if we didn't know the model
parameters, and instead just observed the data x1, ..., xn? In this case, we would want to find the values
of θ that would maximize the probability of the events x1, ..., xn actually occurring. Then we want to
maximize the statement L(θ | x1, ..., xn). This is the concept of maximum likelihood estimation. Restated
differently, while we may not know the actual values of certain parameters, we can take the parameter values
that will most likely give us our observed data.

Definition: The maximum likelihood estimation of a parameter θ is given by:

\[
\hat{\theta}_{MLE} = \operatorname*{arg\,max}_{\theta} \; L(\theta \mid x_1, \ldots, x_n)
\]

where argmaxθ gives the value of θ that maximizes the argument L(θ | x1, ..., xn).

Table 4.1

Prize Category   ni = # of Winners in CA   pi = Probability to Win, 1 in:   vi = Prize Amount
5 + Mega                               0                      175,711,536    $ 39,900,000.00
5                                      3                        3,904,701    $     96,699.40
4 + Mega                               8                          689,065    $      5,000.10
4                                    226                           15,313    $        168.00
3 + Mega                             284                           13,781    $        152.00
3                                 12,465                              306    $          7.00
2 + Mega                           4,945                              844    $          9.00
1 + Mega                          29,321                              141    $          3.00
Mega                              58,662                               75    $          1.00
Overall                          105,914                            39.89    n/a

Source: [4], MEGA Millions Draw #662, 10/25/2011. Prize values are after federal tax. Each vi = ri N / ni.

Example: Let us consider a coin that has been flipped 70 times. We know the outcome of each flip, but we
are unsure if the coin is fair. Given that the coin landed "heads" 32 times, we want to find the value of θ
that maximizes the probability that a coin tossed 70 times shows "heads" 32 times. For any particular flip,
"heads" shows with probability θ, and "tails" shows with probability 1 − θ. Then we want to maximize

\[
L(\theta \mid H = 32, T = 38) = \Pr(H = 32, T = 38 \mid \theta) = \binom{70}{32}\,\theta^{32}(1-\theta)^{38},
\]

where 0 ≤ θ ≤ 1. The binomial coefficient counts the ways to choose which 32 of the 70 flips land "heads".
To maximize this, we take the derivative with respect to θ and set it equal to zero:

\[
\begin{aligned}
\frac{d}{d\theta} L(\theta \mid H = 32, T = 38)
  &= \frac{d}{d\theta}\left[\binom{70}{32}\,\theta^{32}(1-\theta)^{38}\right] \\
  &= 32\binom{70}{32}\,\theta^{31}(1-\theta)^{38} - 38\binom{70}{32}\,\theta^{32}(1-\theta)^{37} \\
  &= \binom{70}{32}\,\theta^{31}(1-\theta)^{37}\left[32(1-\theta) - 38\theta\right] \\
0 &= \binom{70}{32}\,\theta^{31}(1-\theta)^{37}\left[32 - 70\theta\right].
\end{aligned}
\]

Then L(θ) has critical points when θ = 0, θ = 1, and θ = 32/70. However, when θ = 0 or θ = 1, we have
L(θ) = 0; when θ = 32/70, we have L(θ) > 0. Then the likelihood of the outcome H = 32 is maximized for
θ = 32/70, so θ̂MLE = 32/70. Of course, this makes sense: the likelihood that a coin will come out heads
32 out of 70 times is maximized when we have an unfair coin that lands heads with probability θ = 32/70.
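The same maximum can be located numerically, which is a useful sanity check when no closed form is available. The following is a minimal sketch, assuming a simple grid search over θ in (0, 1); the function names and the grid resolution are hypothetical choices, and the log-likelihood is used in place of the likelihood because both are maximized at the same θ.

```python
import math


def log_likelihood(theta: float, heads: int = 32, tails: int = 38) -> float:
    """Log of C(n, heads) * theta^heads * (1 - theta)^tails, for 0 < theta < 1."""
    n = heads + tails
    return (math.log(math.comb(n, heads))
            + heads * math.log(theta)
            + tails * math.log(1.0 - theta))


def grid_mle(heads: int = 32, tails: int = 38, steps: int = 10_000) -> float:
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    candidates = (k / steps for k in range(1, steps))
    return max(candidates, key=lambda theta: log_likelihood(theta, heads, tails))


if __name__ == "__main__":
    theta_hat = grid_mle()
    print(f"grid search: {theta_hat:.4f}, closed form 32/70: {32 / 70:.4f}")
```

The grid search should agree with 32/70 up to the grid spacing.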

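With this explanation of maximum likelihood estimation, we can now try to find N̂MLE for our model of
the lottery. In order to do so, we must maximize the function L(N | n1, ..., nk), where k is the number of
prize categories:

\[
L(N \mid n_1, \ldots, n_k) = \Pr(n_1, \ldots, n_k \mid N)
  = \binom{N}{n_1, \ldots, n_k,\; N - \sum_{i=1}^{k} n_i}\;
    p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
    \left(1 - \sum_{i=1}^{k} p_i\right)^{N - \sum_{i=1}^{k} n_i}.
\]

Note that we have a multinomial coefficient because multiple people win the same prizes, and we must choose
who wins. Now, let us represent Pr(n1, ..., nk | N) as P_N. Then the function L(N | n1, ..., nk) is
maximized when we have found an N such that P_{N+1}/P_N ≈ 1. Then plugging this into Mathematica gives us:

\[
1 \approx \frac{P_{N+1}}{P_N}
  = \frac{\binom{N+1}{n_1, \ldots, n_k,\; (N+1) - \sum_{i=1}^{k} n_i}\;
          p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
          \left(1 - \sum_{i=1}^{k} p_i\right)^{(N+1) - \sum_{i=1}^{k} n_i}}
         {\binom{N}{n_1, \ldots, n_k,\; N - \sum_{i=1}^{k} n_i}\;
          p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
          \left(1 - \sum_{i=1}^{k} p_i\right)^{N - \sum_{i=1}^{k} n_i}}
  = \frac{(N+1)\left(1 - \sum_{i=1}^{k} p_i\right)}{(N+1) - \sum_{i=1}^{k} n_i},
\]

and solving for N yields

\[
\hat{N}_{MLE} \approx \frac{\sum_{i=1}^{k} n_i - \sum_{i=1}^{k} p_i}{\sum_{i=1}^{k} p_i}
  = \frac{\sum_{i=1}^{k} n_i}{\sum_{i=1}^{k} p_i} - 1.
\]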

So for any given drawing, we can use the data produced by the California Lottery to plug ni and pi in to
obtain our value for N̂MLE. Basically, rather than getting a common denominator and then adding up each
term ni/pi, we simply add the numerators ni and divide that sum by the sum of the denominators pi. For
Table 4.1, this is given by:

\[
\hat{N}_{MLE} \approx
  \frac{0 + 3 + 8 + 226 + 284 + 12{,}464 + 4{,}945 + 29{,}321 + 58{,}662}
       {\frac{1}{175{,}711{,}536} + \frac{1}{3{,}904{,}701} + \frac{1}{689{,}065} + \frac{1}{15{,}313}
        + \frac{1}{13{,}781} + \frac{1}{306} + \frac{1}{844} + \frac{1}{141} + \frac{1}{75}} - 1
  \approx 4{,}233{,}484.3317
  \approx 4{,}233{,}484.
\]

Then for the MEGA Millions drawing on 10/25/2011, there were approximately 4,233,484 tickets purchased
in California. This is how we will compile data on population per drawing.
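The closed form N̂MLE ≈ (Σ ni)/(Σ pi) − 1 is straightforward to evaluate. The following is a minimal sketch; the names `ROWS`, `n_mle`, and `likelihood_ratio` are hypothetical. The winner counts are copied from Table 4.1, which lists 12,465 winners in the 3-match category, whereas the displayed sum uses 12,464, so the sketch lands roughly 40 tickets above the quoted 4,233,484. Exact rational arithmetic is used for the sum of probabilities to avoid rounding in the tiny terms.

```python
from fractions import Fraction

# (winners n_i, odds X where p_i = 1/X) for the nine categories in Table 4.1.
ROWS = [
    (0, 175_711_536),
    (3, 3_904_701),
    (8, 689_065),
    (226, 15_313),
    (284, 13_781),
    (12_465, 306),
    (4_945, 844),
    (29_321, 141),
    (58_662, 75),
]


def n_mle(rows) -> float:
    """Evaluate (sum of n_i) / (sum of p_i) - 1 with exact fractions."""
    total_winners = sum(winners for winners, _ in rows)
    total_prob = sum(Fraction(1, odds) for _, odds in rows)
    return float(total_winners / total_prob - 1)


def likelihood_ratio(n: float, rows) -> float:
    """P_{N+1} / P_N = (N + 1)(1 - sum p_i) / (N + 1 - sum n_i)."""
    total_winners = sum(winners for winners, _ in rows)
    total_prob = sum(1.0 / odds for _, odds in rows)
    return (n + 1.0) * (1.0 - total_prob) / (n + 1.0 - total_winners)


if __name__ == "__main__":
    estimate = n_mle(ROWS)
    print(f"N_MLE is about {estimate:,.1f}")
    print(f"P(N+1)/P(N) at the estimate: {likelihood_ratio(estimate, ROWS):.12f}")
```

At the estimate the likelihood ratio should be 1 up to floating-point error, which is the condition the derivation imposes.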

Note that the number obtained by maximum likelihood estimation is markedly lower than the average of
dividing the number of people who won each prize by the probability of winning that prize. If we were to
go with this method of averaging, we would have:

\[
N_{avg} \approx \frac{1}{9}\left[
  \frac{0}{1/175{,}711{,}536} + \frac{3}{1/3{,}904{,}701} + \frac{8}{1/689{,}065}
  + \frac{226}{1/15{,}313} + \frac{284}{1/13{,}781} + \frac{12{,}464}{1/306}
  + \frac{4{,}945}{1/844} + \frac{29{,}321}{1/141} + \frac{58{,}662}{1/75}
\right]
\approx 4{,}569{,}216.2
\approx 4{,}569{,}216.
\]

But this difference would be even more pronounced if somebody had won the jackpot. In order to check the
validity of this statement, we will keep all other numbers constant, but change n1 from n1 = 0 to n1 = 1.
With this small change, we can easily find that Navg ≈ 23,981,609 tickets, whereas N̂MLE ≈ 4,233,524
tickets. Then for this small change in observed data, Navg increases wildly, while N̂MLE has gone up by a
relatively minuscule amount. So while it is possible that 23,981,609 tickets were purchased and the average
is accurate, it is far more likely that 4,233,524 tickets were purchased. Since maximum likelihood estimations
typically minimize variance, we rely on this method to obtain estimations for N.
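The sensitivity of the two estimators to a single jackpot winner can be reproduced as follows. This is a minimal sketch using the winner counts and odds exactly as printed in Table 4.1, so individual figures may differ slightly from those quoted in the text; the names `BASE`, `ONE_JACKPOT_WINNER`, `n_avg`, and `n_mle` are hypothetical.

```python
# (winners n_i, odds X where p_i = 1/X) for the nine categories in Table 4.1.
BASE = [
    (0, 175_711_536),
    (3, 3_904_701),
    (8, 689_065),
    (226, 15_313),
    (284, 13_781),
    (12_465, 306),
    (4_945, 844),
    (29_321, 141),
    (58_662, 75),
]

# Same drawing, but with a single hypothetical jackpot winner (n_1 = 1).
ONE_JACKPOT_WINNER = [(1, 175_711_536)] + BASE[1:]


def n_avg(rows) -> float:
    """Average of the per-category estimates n_i / p_i = n_i * X."""
    return sum(winners * odds for winners, odds in rows) / len(rows)


def n_mle(rows) -> float:
    """Pooled estimate (sum of n_i) / (sum of p_i) - 1."""
    total_winners = sum(winners for winners, _ in rows)
    total_prob = sum(1.0 / odds for _, odds in rows)
    return total_winners / total_prob - 1.0


if __name__ == "__main__":
    for label, rows in (("n1 = 0", BASE), ("n1 = 1", ONE_JACKPOT_WINNER)):
        print(f"{label}: N_avg = {n_avg(rows):,.0f}, N_MLE = {n_mle(rows):,.0f}")
```

One extra jackpot winner adds one ninth of the jackpot odds (about 19.5 million) to the average, while the pooled estimate moves by only about 1/Σ pi, roughly 40 tickets.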

4.3 Obtaining the Other Necessary Data

Now that we have estimations for N, we can use these to estimate the rates ri that are paid out for each
pari-mutuel prize. Recall that all prizes in the MEGA Millions lottery are pari-mutuel: due to state law,
there are no fixed prizes within California. Furthermore, because California is the only such state, we
should only consider California's data on drawing participation N and rates of compensation ri. We will
ignore national participation data and compensation rates. Due to the fact that there are no fixed prizes,
we must also modify Abrams and Garibaldi's model by observing that:

\begin{align}
f &= 1 - \sum_{i=1}^{c} \frac{a_i\, t_i^{\mathrm{fix}}}{t} \tag{4.1} \\
  &= 1. \tag{4.2}
\end{align}

Now, we should mention that since each ticket in the MEGA Millions lottery costs $1, the number of tickets
purchased N also represents the total amount of money spent by individuals per drawing. So we have
something of the form ri N = ni vi, where vi is the value of the i-th pari-mutuel prize, not including the
jackpot. Then we will approximate these rates by saying

\[
r_i \approx \frac{v_i\, n_i}{\hat{N}_{MLE}}.
\]

With this insight, for MEGA Millions Draw #662, we have the data in Table 4.2. Then for this drawing,
the lottery operator paid out a total of \(\sum_{i=1}^{8} r_i \approx 0.16\) (approximately 16%) of the
income N to pari-mutuel prizes excluding the jackpot.

Table 4.2:

vi ni                    ri ≈
$96,699.40 * 3           0.0685246950
$5,000.10 * 8            0.0094486716
$168 * 226               0.0089684997
$152 * 284               0.0101968024
$7 * 12,465              0.0206106838
$9 * 4,945               0.0105126180
$3 * 29,321              0.0207779219
$1 * 58,662              0.0138566721

Source: [4], MEGA Millions Draw #662, 10/25/2011. Estimated with N̂MLE ≈ $4,233,484. Prize values are after federal taxes.
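The entries of Table 4.2 follow from ri ≈ vi ni / N̂MLE. The following is a minimal sketch that recomputes them from the prize amounts and winner counts in Table 4.1 and the quoted estimate N̂MLE ≈ 4,233,484; the names `PRIZES` and `payout_rates` are hypothetical.

```python
N_MLE = 4_233_484  # estimated tickets sold in California for Draw #662

# (after-tax prize value v_i in dollars, winners n_i), non-jackpot prizes only.
PRIZES = [
    (96_699.40, 3),
    (5_000.10, 8),
    (168.00, 226),
    (152.00, 284),
    (7.00, 12_465),
    (9.00, 4_945),
    (3.00, 29_321),
    (1.00, 58_662),
]


def payout_rates(prizes, n_tickets):
    """Return r_i = v_i * n_i / N for each pari-mutuel prize."""
    return [value * winners / n_tickets for value, winners in prizes]


if __name__ == "__main__":
    rates = payout_rates(PRIZES, N_MLE)
    for (value, winners), rate in zip(PRIZES, rates):
        print(f"${value:>10,.2f} * {winners:>6,} -> r_i = {rate:.10f}")
    print(f"sum of r_i = {sum(rates):.10f}")
```

The rates should sum to about 0.163, the share of ticket revenue paid out on non-jackpot prizes.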

Now we must find the jackpot cutoff. Recall that J0 = Ft. Also remember that the MEGA Millions lottery
is set up so that a player chooses five numbers, each between 1 and 56, and one MEGA number, which can be
between 1 and 46. We can choose five numbers between 1 and 56 in \(\binom{56}{5}\) = 3,819,816 different
ways. Then for each of those 3.8 million choices, we choose one number between 1 and 46, which can be done
in \(\binom{46}{1}\) = 46 different ways. Now for the total number of distinct tickets, we multiply these
together. Consequently, there are 3,819,816 × 46 = 175,711,536 possible distinct tickets. We can substitute
this and Equation 4.1 into the formula for the jackpot cutoff to get:

\[
\begin{aligned}
J_0 &= Ft \\
    &= \left(f - \sum_{i=1}^{8} r_i\right) \times 175{,}711{,}536 \\
    &\approx (1 - 0.1628965645) \times 175{,}711{,}536 \\
    &\approx 147{,}088{,}730.
\end{aligned}
\]

Then for the MEGA Millions lottery, the jackpot cutoff is approximately $147,088,730. This assumes that
our estimates for the rates ri are both correct and constant. However, we have attempted to minimize error
by obtaining our N estimate through maximum likelihood estimation. And as for the constancy of the rates,
the lottery should keep them the same because any alterations would be a fundamental restructuring of the
game.
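The ticket count and the cutoff can be checked in a few lines. The following is a minimal sketch, assuming f = 1 and the sum of rates 0.1628965645 quoted above; the names `T`, `SUM_RATES`, and `jackpot_cutoff` are hypothetical.

```python
import math

BASE_CHOICES = math.comb(56, 5)   # ways to pick five base numbers from 1..56
MEGA_CHOICES = math.comb(46, 1)   # ways to pick one MEGA number from 1..46
T = BASE_CHOICES * MEGA_CHOICES   # total number of distinct tickets t

SUM_RATES = 0.1628965645          # sum of the eight r_i for Draw #662


def jackpot_cutoff(f: float, sum_rates: float, t: int) -> float:
    """Return J0 = F * t with F = f - (sum of pari-mutuel rates)."""
    return (f - sum_rates) * t


if __name__ == "__main__":
    print(f"distinct tickets t = {T:,}")
    print(f"jackpot cutoff J0 is about ${jackpot_cutoff(1.0, SUM_RATES, T):,.0f}")
```

Because J0 scales linearly with F, any error in the estimated rates carries over to the cutoff in proportion.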

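Now for Draw #662, the (eRoR) given by Equation 3.1 is equivalent to:

\[
\begin{aligned}
(\mathrm{eRoR}) &= -f + \sum_{i=1}^{d} r_i\left(1 - (1 - p_i)^{xyJ_0}\right)
                   + \frac{1 - \left(1 - \frac{1}{t}\right)^{N}}{x} \\
  &\approx -1 + \sum_{i=1}^{d} r_i\left(1 - (1 - p_i)^{N}\right)
           + \frac{J}{N}\left[1 - 0.999999994^{4{,}233{,}484}\right] \\
  &\approx -1 + (0.139703179) + \frac{39{,}900{,}000}{4{,}233{,}484}\,(0.0250810156) \\
  &\approx -1 + (0.139703179) + (0.236385096) \\
  &= -0.623911725 \\
  &\approx -0.62.
\end{aligned}
\]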

Then with 4,233,484 purchased tickets, we have an expected rate of return of about -62%. So for every dollar

a person spent on this drawing, they received an average return of 38 cents, and lost a net total of 62 cents.
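The calculation for Draw #662 can be sketched in code. This is a minimal sketch, assuming the rates from Table 4.2, the odds from Table 4.1, f = 1, and N ≈ 4,233,484; the function names are hypothetical. One caveat: the sketch keeps 1 − 1/t at full floating-point precision (through `log1p` and `expm1`), whereas the displayed base 0.999999994 is a rounded value, and with an exponent in the millions that rounding shifts the jackpot term. With full precision the result comes out near −0.64 rather than −0.62.

```python
import math

T = 175_711_536          # number of distinct tickets t
N_TICKETS = 4_233_484    # estimated tickets sold, N_MLE for Draw #662
JACKPOT = 39_900_000.0   # after-tax jackpot J for Draw #662

# (rate r_i from Table 4.2, odds X where p_i = 1/X from Table 4.1)
RATES_AND_ODDS = [
    (0.0685246950, 3_904_701),
    (0.0094486716, 689_065),
    (0.0089684997, 15_313),
    (0.0101968024, 13_781),
    (0.0206106838, 306),
    (0.0105126180, 844),
    (0.0207779219, 141),
    (0.0138566721, 75),
]


def one_minus_power(p: float, n: float) -> float:
    """Compute 1 - (1 - p)^n accurately, even when p is tiny."""
    return -math.expm1(n * math.log1p(-p))


def expected_rate_of_return(jackpot, n_tickets, rates_and_odds, t, f=1.0):
    """-f + sum_i r_i (1 - (1 - p_i)^N) + (J / N)(1 - (1 - 1/t)^N)."""
    pari_mutuel = sum(rate * one_minus_power(1.0 / odds, n_tickets)
                      for rate, odds in rates_and_odds)
    jackpot_term = (jackpot / n_tickets) * one_minus_power(1.0 / t, n_tickets)
    return -f + pari_mutuel + jackpot_term


if __name__ == "__main__":
    eror = expected_rate_of_return(JACKPOT, N_TICKETS, RATES_AND_ODDS, T)
    print(f"(eRoR) for Draw #662 is about {eror:+.4f}")
```

Either way the expected return is strongly negative, as one would expect for a jackpot far below the cutoff J0.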

Table 4.3 compiles every time since 2009 where the jackpot value was above the jackpot cutoff of
$147,088,730 after 30% federal taxes. Take note that we only deal with the final jackpot values: since
the jackpot increases for every drawing in which nobody wins the jackpot, for each date below, the jackpot
was higher than J0 for several drawings prior to the one listed. This shows there have been numerous times
where the (eRoR) has been positive.

Table 4.3:

Date          J                N̂MLE         (eRoR)   J/J0   N/J
03/25/2011    $218,400,000     26,263,932    +32%     1.48   0.12
01/04/2011    $248,500,000*    35,706,523    +45%     1.69   0.14
05/04/2010    $186,200,000     24,072,463    +16%     1.27   0.13
08/28/2009    $235,200,000*    22,609,912    +42%     1.60   0.10
05/01/2009    $157,500,000**    9,790,701    +3%      1.07   0.06
03/03/2009    $148,400,000     11,742,182    -2%      1.01   0.08

Sources: [5, 4]. Jackpot values are after federal taxes. The three columns on the right are estimated based on N̂MLE.

* Split between two winners.

** Split between three winners.
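The two ratio columns of Table 4.3 are the coordinates used for the break-even plot, x = N/J and y = J/J0. The following is a minimal sketch that recomputes them from the listed jackpots and ticket estimates, assuming J0 ≈ $147,088,730; the names `DRAWINGS` and `coordinates` are hypothetical.

```python
J0 = 147_088_730  # jackpot cutoff estimated for California

# (date, after-tax jackpot J, estimated tickets N_MLE), copied from Table 4.3.
DRAWINGS = [
    ("03/25/2011", 218_400_000, 26_263_932),
    ("01/04/2011", 248_500_000, 35_706_523),
    ("05/04/2010", 186_200_000, 24_072_463),
    ("08/28/2009", 235_200_000, 22_609_912),
    ("05/01/2009", 157_500_000, 9_790_701),
    ("03/03/2009", 148_400_000, 11_742_182),
]


def coordinates(jackpot: float, n_tickets: float, j0: float = J0):
    """Return (x, y) = (N / J, J / J0) for one drawing."""
    return n_tickets / jackpot, jackpot / j0


if __name__ == "__main__":
    for date, jackpot, n_tickets in DRAWINGS:
        x, y = coordinates(jackpot, n_tickets)
        print(f"{date}: J/J0 = {y:.2f}, N/J = {x:.2f}")
```

Every listed drawing has y just above 1 or higher and x well below 0.2, so all six points sit in the upper-left corner of the (x, y) plane.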

At this point, we can also produce our break-even curve. This is given by the graph in Figure 4.1. Note

that the yellow graph of the (eRoR) falls between our upper and lower bounds, which are blue and red

(respectively). Thus, our predictions are consistent with actual results. Furthermore, if we increased the

window, we would see that our functions never appear to touch. Since the yellow curve shows the break-even

curve (or equivalently, it shows where (eRoR) = 0 for the MEGA Millions lottery), by Theorem 3.7, any

point above the curve will give a positive (eRoR), and any point below or to the left will give a negative

(eRoR). The area between 0 ≤ x ≤ 0.3 where 1 ≤ y ≤ 1.1 is ambiguous, and must be handled on a case-by-

case basis. Consistent with our predictions, each point in Table 4.3 that lies above the yellow curve gives a

positive (eRoR), while the one point below, (0.08, 1.01), gives a negative (eRoR).

Figure 4.1:


The graph of U is given by the blue curve, the graph of (eRoR) is given by the yellow curve, and the

graph of L is given by the red curve. Abrams and Garibaldi proposed a different graph for the (eRoR),

given by the green curve. The blue dots are the plotted points from Table 4.3. Note also that x = N/J

and y = J/J0.
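The zero level sets of the two bounds in Equation (3.8) have closed forms, which gives a quick way to decide on which side of them a drawing falls. The following is a minimal sketch, assuming the bounds −1 + (1 − 0.45^{xy})/x and −0.8 + (1 − 0.36^{xy})/x; setting each to zero and solving for y gives y = ln(1 − x)/(x ln 0.45) and y = ln(1 − 0.8x)/(x ln 0.36). The function names are hypothetical, and whether these correspond exactly to the blue and red curves in Figure 4.1 is an assumption. They are not the yellow (eRoR) curve, which needs the actual rates ri.

```python
import math


def y_where_lower_bound_is_zero(x: float) -> float:
    """Above this y the lower bound is positive, so (eRoR) > 0 (needs x > 0)."""
    if not 0.0 < x < 1.0:
        return math.inf  # for x >= 1 the lower bound is negative for every y
    return math.log(1.0 - x) / (x * math.log(0.45))


def y_where_upper_bound_is_zero(x: float) -> float:
    """Below this y the upper bound is negative, so (eRoR) < 0 (needs x > 0)."""
    if not 0.0 < x < 1.25:
        return math.inf  # for x >= 1.25 the upper bound is negative for every y
    return math.log(1.0 - 0.8 * x) / (x * math.log(0.36))


def classify(x: float, y: float) -> str:
    """Sign of (eRoR) implied by the bounds alone at (x, y) = (N/J, J/J0)."""
    if y > y_where_lower_bound_is_zero(x):
        return "positive"
    if y < y_where_upper_bound_is_zero(x):
        return "negative"
    return "undetermined"


if __name__ == "__main__":
    for x, y in [(0.12, 1.48), (0.08, 1.01), (1.0, 1.0)]:
        print(f"(x, y) = ({x}, {y}): (eRoR) is {classify(x, y)}")
```

Points that fall between the two curves, such as (0.08, 1.01), cannot be decided from the bounds and need the full (eRoR) formula.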

As it turns out, since the California lottery has its own setup for pari-mutuel prizes (and thus has its
own number of participants N), we have a slightly different curve than the one proposed in Abrams and
Garibaldi's paper. Our yellow break-even curve starts off above theirs, but then quickly converges to it.
Because the graph of our (eRoR) break-even curve is initially higher than Abrams and Garibaldi's, California
should meet the conditions for positive (eRoR) less frequently. However, California seems to exhibit an
advantage because the number of participants within the state is markedly lower than the total number of
national participants. It seems that California's lack of fixed prizes tends to be an advantage for the
player in the end. As noted in their paper, since the point (1, 2) is below our curve, for any drawing with
N > J and J < 2J0, we have a negative (eRoR). Nationally speaking, N will very frequently exceed J. But
locally speaking, this will happen relatively rarely because California players are a small subset of the
national set, and MEGA Millions jackpots start at $12,000,000, so players in our state have an advantage.


5 Conclusion

While we have isolated numerous scenarios in which the lottery has a positive rate of return, even for

California, these tend to be very few and far between. And when those circumstances do arise, there are

numerous other worthy investments that can allow for equal or similar returns on investment with much less

risk. Essentially, large jackpots skew the expected rate of return and can make an investment in the lottery

seem desirable even when in all likelihood, any particular person will not win the lottery. Overall, less risky

assets constitute better investments. Abrams and Garibaldi consider this at a deeper level through

Portfolio Theory analysis, but suffice it to say that while the lottery can be a "good bet", it is rarely a good

investment.


References

[1] Aaron Abrams; Skip Garibaldi. Finding good bets in the lottery, and why you shouldn't take them.

American Mathematical Monthly, 117:3–26, January 2010.

[2] Unknown. Amazing stats: The stats don't lie! Technical report, California Lottery, 2011.

http://www.calottery.com/WinnersGallery/AmazingStats/. Accessed on October 10, 2011.

[3] Unknown. How to play lotto texas. Technical report, Texas Lottery, 2011.

[4] Unknown. Past winning numbers. Technical report, California Lottery, 2011.

http://www.calottery.com/games/megamillions/winningnumbers/pastwinningnumbers.htm. Accessed

on 11/2/2011.

[5] Unknown. Previous results. Technical report, USAMEGA.com, 2011.
http://www.usamega.com/mega-millions-history.asp. Accessed on 11/3/2011.
