
Lecture 1: Introduction

MATH 667-01 Statistical Inference

University of Louisville

August 20, 2019
Last corrected: 8/23/2019


Introduction

This lecture provides an introduction to important concepts on estimates and estimators through a basic example.

First, an example of a question from probability is presented, and then a couple of questions from statistics are discussed.

A Probability Question

Example L1.1: Suppose that there are ten marbles in a box, 6 of which are red and 4 of which are blue. Four marbles are selected at random without replacement from the box. What is the probability that exactly one of the selected marbles is red?

Answer to Example L1.1: Recall that if S is a finite sample space with equally likely outcomes, and we are interested in computing the probability of an event A, then
$$P(A) = \frac{\text{number of elements in } A}{\text{number of elements in } S}.$$


Answer to Example L1.1 continued: Since we are selecting a sample of 4 objects without replacement from a population of 10 objects, the sample space is the set of all possible subsets of size 4 from the population, and the number of elements of S (that is, the number of possible samples) is
$$\binom{10}{4} = \frac{10!}{4! \cdot 6!} = \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1} = 210.$$

Next, we need to compute the number of ways of choosing 1 red marble and 3 blue marbles. The Fundamental Theorem of Counting¹ states that if a job consists of k separate tasks, the ith of which can be done in nᵢ ways, i = 1, ..., k, then the entire job can be done in n₁ × n₂ × ⋯ × nₖ ways.

¹ CB: Theorem 1.2.14 on page 13, HMC: page 10


Answer to Example L1.1 continued:

So, since there are $\binom{6}{1} = 6$ ways to choose the one red marble and $\binom{4}{3} = 4$ ways to choose the three blue marbles, there are 6 × 4 = 24 ways to choose one red and 3 blue marbles from the box.

So, the probability of the event A that exactly one of the selected marbles is red is
$$P(A) = \frac{\binom{6}{1}\binom{4}{3}}{\binom{10}{4}} = \frac{24}{210} = \frac{4}{35} \approx .114286.$$

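As a quick check, this probability can be reproduced in R with the built-in choose function; a minimal sketch, not part of the original slides:

> choose(6,1)*choose(4,3)/choose(10,4)
[1] 0.1142857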


Alternately, the answer to Example L1.1 can be obtained by letting X be the random variable representing the number of red marbles obtained when selecting 4 marbles at random without replacement from the box and computing P(X = 1).

In this experiment, X is a discrete random variable which follows a hypergeometric distribution² that has probability mass function (pmf)
$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} \quad \text{for } x \in \{\max\{0, M-N+n\}, \ldots, \min\{n, M\}\},$$
where N is the number of objects in the population, M is the number of objects of the desired type in the population, and n is the number of objects selected in the sample.

² CB: page 87, HMC: page 162
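In R, hypergeometric pmf values are computed with dhyper(x, m, n, k), where m plays the role of M (the red marbles), the second argument n plays the role of N − M (the blue marbles), and k is the sample size. A minimal check, not part of the original slides, reproducing the answer to Example L1.1:

> dhyper(1, 6, 4, 4)
[1] 0.1142857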


A Statistics Question

In statistics, we instead consider what we can infer about the unknown population based on a known sample.

For example, suppose that we know that there are 10 red and blue marbles in a box, but we do not know how many are red and how many are blue. We select 4 marbles at random without replacement and obtain 1 red marble and 3 blue marbles. What can we infer about the contents of the box? What is our best guess for the number of red marbles in the box?


There are many ways to obtain an estimate of M, the number of red marbles in the box. In this class, we will consider a few of the possible methods of estimating this number.

Now we will briefly introduce a popular estimate, the maximum likelihood estimate (MLE), which estimates a population “parameter” from observed data based on a parametric model.

Suppose that X is an n-dimensional random vector with joint pmf (or, in the continuous case, joint pdf) fθ(x), where θ is a k-dimensional parameter in a parameter space Θ.


Definition L1.1: The likelihood function L : Θ → ℝ is defined by L(θ) = L(θ; x) = fθ(x).

Note the difference between the pmf/pdf and the likelihood function. For the pmf/pdf, θ is considered to be known and the domain of f is the set of possible values of X. For the likelihood function, x is considered to be known and the domain of L is Θ.

Definition L1.2³: A value of θ in Θ which maximizes L is called a maximum likelihood estimate (MLE) of θ. Such a value is often denoted by θ̂ or θ̂(x).

The MLE finds the value(s) of the parameter which “makes the observed data most likely”.

³ CB: Definition 7.2.4 on page 316, HMC: Definition 6.1.1 on page 357


Example L1.2: Suppose that there are ten marbles in a box, M of which are red and 10 − M of which are blue. We select four marbles at random without replacement and observe 1 red marble and 3 blue marbles. What is the maximum likelihood estimate of M, the number of red marbles in the box?

Answer to Example L1.2: Let X be a hypergeometric random variable with parameters M, N = 10, and n = 4. The likelihood function for estimating M is
$$L(M) = L(M; X = 1) = \frac{\binom{M}{1}\binom{10-M}{3}}{\binom{10}{4}} \quad \text{for } M \in \{1, \ldots, 7\}.$$

Now we compute the likelihood function for each value in the domain of L on the next slide.


Answer to Example L1.2 continued:

$$\begin{aligned}
L(1) &= \binom{1}{1}\binom{9}{3}\bigg/\binom{10}{4} = .400000\\
L(2) &= \binom{2}{1}\binom{8}{3}\bigg/\binom{10}{4} \approx .533333\\
L(3) &= \binom{3}{1}\binom{7}{3}\bigg/\binom{10}{4} = .500000\\
L(4) &= \binom{4}{1}\binom{6}{3}\bigg/\binom{10}{4} \approx .380952\\
L(5) &= \binom{5}{1}\binom{5}{3}\bigg/\binom{10}{4} \approx .238095\\
L(6) &= \binom{6}{1}\binom{4}{3}\bigg/\binom{10}{4} \approx .114286\\
L(7) &= \binom{7}{1}\binom{3}{3}\bigg/\binom{10}{4} \approx .033333
\end{aligned}$$


Since L is maximized when M = 2, the MLE is M̂ = 2.

The following R command can be used to compute all of these values.

> dhyper(1,1:7,9:3,4)

[1] 0.40000000 0.53333333 0.50000000 0.38095238

[5] 0.23809524 0.11428571 0.03333333

A list of R commands for common probability distributions, for computing cdf's, inverting cdf's, computing pdf's or pmf's, and generating random values, is available at http://www.stat.umn.edu/geyer/old/5101/rlook.html.

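The maximizing value of M can also be picked out programmatically; a minimal sketch (not in the original slides) applying which.max to the same vector of likelihoods:

> L <- dhyper(1, 1:7, 9:3, 4)
> (1:7)[which.max(L)]
[1] 2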


Alternately, we can maximize L without checking all possible values of θ. Since the natural logarithm is an increasing function, maximizing L is equivalent to maximizing
$$\ell(M) = \ln L(M) = \ln M + \sum_{i=0}^{2} \ln(10 - M - i) - \ln 6 - \ln 210.$$

Even though the domain of ℓ is {1, ..., 7}, let's temporarily consider all values in [1, 7] and differentiate ℓ twice to obtain
$$\frac{d\ell}{dM} = \frac{1}{M} - \frac{1}{10-M} - \frac{1}{9-M} - \frac{1}{8-M}$$
and
$$\frac{d^2\ell}{dM^2} = -\frac{1}{M^2} - \frac{1}{(10-M)^2} - \frac{1}{(9-M)^2} - \frac{1}{(8-M)^2}.$$


Since $\frac{d^2\ell}{dM^2} < 0$, ℓ is concave downward on [1, 7], so $\frac{d\ell}{dM}$ is decreasing.

Furthermore, since
$$\left.\frac{d\ell}{dM}\right|_{M=2} = \frac{11}{168} > 0
\quad\text{and}\quad
\left.\frac{d\ell}{dM}\right|_{M=3} = -\frac{37}{210} < 0,$$
ℓ is either maximized at M = 2 or M = 3. Comparing L(2) ≈ .533333 with L(3) = .500000 again gives M̂ = 2.
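These two derivative values are easy to confirm numerically; a short check (not in the original slides) coding dℓ/dM as an R function:

> dldM <- function(M) 1/M - 1/(10-M) - 1/(9-M) - 1/(8-M)
> dldM(2)
[1] 0.06547619
> dldM(3)
[1] -0.1761905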


Sampling Distribution

In order to assess the quality of a method of estimating a parameter, we need to study the sampling distribution⁴ of the estimator of the parameter.

To do this, we consider the experiment before the data is observed and consider all possible samples and their probabilities. For each possible sample, we determine the estimate of the parameter if the sample is observed. This allows us to compute the pmf/pdf of the estimator.

In general, it is important to note that the distribution of the estimator depends on the true value of the parameter.

⁴ CB: Definition 5.2.1 on page 211


Now suppose that there are ten marbles in a box, M of which are red and 10 − M of which are blue. We select four marbles at random without replacement and observe x red marbles and 4 − x blue marbles. For each possible value of x in {0, 1, 2, 3, 4}, we find the maximum likelihood estimate of M.

Using the same method discussed in the answer to Example L1.2, we find that
M̂ = 0 if x = 0,
M̂ = 2 if x = 1,
M̂ = 5 if x = 2,
M̂ = 8 if x = 3,
and M̂ = 10 if x = 4,
as shown in the R code on the next slide.


> dhyper(0,0:6,10:4,4)
[1] 1.000000000 0.600000000 0.333333333
[4] 0.166666667 0.071428571 0.023809524
[7] 0.004761905
> dhyper(1,1:7,9:3,4)
[1] 0.40000000 0.53333333 0.50000000 0.38095238
[5] 0.23809524 0.11428571 0.03333333
> dhyper(2,2:8,8:2,4)
[1] 0.1333333 0.3000000 0.4285714 0.4761905
[5] 0.4285714 0.3000000 0.1333333
> dhyper(3,3:9,7:1,4)
[1] 0.03333333 0.11428571 0.23809524 0.38095238
[5] 0.50000000 0.53333333 0.40000000
> dhyper(4,4:10,6:0,4)
[1] 0.004761905 0.023809524 0.071428571
[4] 0.166666667 0.333333333 0.600000000
[7] 1.000000000
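Rather than scanning these five rows by eye, the MLE for each x can be extracted in one pass; a minimal sketch (not from the original slides) that, for each x, searches the candidate values M = 0, ..., 10 and returns the maximizer of the likelihood:

> sapply(0:4, function(x) { M <- 0:10; M[which.max(dhyper(x, M, 10-M, 4))] })
[1]  0  2  5  8 10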


So we can write
$$\hat{M}(x) = \begin{cases} 0 & \text{if } x = 0\\ 2 & \text{if } x = 1\\ 5 & \text{if } x = 2\\ 8 & \text{if } x = 3\\ 10 & \text{if } x = 4; \end{cases}$$
given a sample x, M̂(x) is the maximum likelihood estimate of M.

Before we observe the data, X is a hypergeometric random variable with parameters M, N = 10, and n = 4, and we want to find the distribution of the maximum likelihood estimator M̂(X).


Example L1.3: Suppose that there are ten marbles in a box, M of which are red and 10 − M of which are blue, where M is unknown. If the true value of M is 6, find the sampling distribution for the maximum likelihood estimator of M based on a sample of four marbles selected at random without replacement.


Answer to Example L1.3: When M = 6, the pmf of M̂(X) is given below.

$$\begin{aligned}
P(\hat{M}(X) = 0) &= P(X = 0) = \binom{6}{0}\binom{4}{4}\bigg/\binom{10}{4} = \frac{1}{210} \approx .005\\
P(\hat{M}(X) = 2) &= P(X = 1) = \binom{6}{1}\binom{4}{3}\bigg/\binom{10}{4} = \frac{24}{210} \approx .114\\
P(\hat{M}(X) = 5) &= P(X = 2) = \binom{6}{2}\binom{4}{2}\bigg/\binom{10}{4} = \frac{90}{210} \approx .429\\
P(\hat{M}(X) = 8) &= P(X = 3) = \binom{6}{3}\binom{4}{1}\bigg/\binom{10}{4} = \frac{80}{210} \approx .381\\
P(\hat{M}(X) = 10) &= P(X = 4) = \binom{6}{4}\binom{4}{0}\bigg/\binom{10}{4} = \frac{15}{210} \approx .071\\
P(\hat{M}(X) = m) &= 0 \text{ if } m \text{ is not in } \{0, 2, 5, 8, 10\}
\end{aligned}$$

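Since M̂(X) is just a relabeling of X, these five probabilities are the hypergeometric pmf values at x = 0, ..., 4 with M = 6, and can be verified with a single R call; a sketch, not part of the original slides:

> dhyper(0:4, 6, 4, 4)
[1] 0.004761905 0.114285714 0.428571429 0.380952381 0.071428571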