stat 231 final slides


Upload: rachel-l

Post on 13-Jul-2015


TRANSCRIPT

Page 1: Stat 231 Final Slides

5/12/2018 Stat 231 Final Slides - slidepdf.com

http://slidepdf.com/reader/full/stat-231-final-slides 1/100

 

STAT 231 Final

Page 2: Stat 231 Final Slides

Outline

• Chapter 1
 –  Data types (discrete, continuous, categorical)
 –  Problem (3 different aspects)
 –  Populations (target, study, sample)
 –  Representations of data
     • Graphical: histograms, CDFs, box plots
     • Numerical: mean, standard deviation, IQR
 –  Bivariate Data
     • Relative risk
     • Correlation coefficient

Page 3: Stat 231 Final Slides

Outline

• Chapter 2
 –  Review of probability distributions
 –  Random PPDAC examples…

 

Page 4: Stat 231 Final Slides


Outline

• Chapter 3

 –  Binomial Model

 –  Response Model

 –  Regression Model

 –  Maximum Likelihood Estimation

 

Page 5: Stat 231 Final Slides

Outline

• Chapter 4
 –  Sampling distributions for estimators
 –  Introduction to new distributions
     • Gaussian
     • Chi-squared
     • t
 –  Confidence Interval
 –  Hypothesis Testing
 –  Confidence Intervals and Hypothesis Testing with the likelihood function

 

Page 6: Stat 231 Final Slides

Outline

• Chapter 5
 –  Testing for independence with categorical variates
 –  Model checking and assessment for assumptions

 

Page 7: Stat 231 Final Slides

Outline

• Chapter 6
 –  Comparison
     • 2 sample t-tests
     • Paired t-test
 –  Causality
     • Testing for association
     • Blocking
     • Randomization and repetition
     • Matching
 –  Prediction
     • Prediction intervals for response
     • Prediction intervals for regression

 

Page 8: Stat 231 Final Slides

Confidence Intervals using the Relative Likelihood Function

Define the likelihood function:

$L(\pi) = \prod_{i=1}^{n} f(x_i)$

Define the relative likelihood function as:

$\frac{L(\pi)}{L(\hat{\pi})}$

 

Page 9: Stat 231 Final Slides

Confidence Intervals using the Relative Likelihood Function

Graph the relative likelihood function:

Draw a horizontal line at 0.1; the x-coordinates of its two intersections with the curve form an approximate 95% confidence interval
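The graphical recipe above can also be carried out numerically. The following is a minimal sketch, assuming a Binomial model with y successes in n trials (the counts y = 30 and n = 100 are hypothetical, as are all function names). It finds the two points where the relative likelihood crosses 0.1 by bisection. As a side note, under the usual chi-squared approximation with 1 degree of freedom, a cutoff of 0.1 corresponds to roughly 97% coverage, which the slide treats as an approximate 95% interval.

```python
import math

def binomial_log_likelihood(pi, y, n):
    """Log-likelihood of a Binomial(n, pi) model with y successes (constant term dropped)."""
    return y * math.log(pi) + (n - y) * math.log(1 - pi)

def relative_likelihood(pi, y, n):
    """L(pi) / L(pi_hat), where pi_hat = y / n is the maximum likelihood estimate."""
    pi_hat = y / n
    return math.exp(binomial_log_likelihood(pi, y, n)
                    - binomial_log_likelihood(pi_hat, y, n))

def likelihood_interval(y, n, cutoff=0.1, tol=1e-10):
    """Return the two values of pi where the relative likelihood equals the cutoff."""
    pi_hat = y / n

    def crossing(lo, hi):
        # Bisection: the relative likelihood minus the cutoff changes sign on [lo, hi].
        f_lo = relative_likelihood(lo, y, n) - cutoff
        while hi - lo > tol:
            mid = (lo + hi) / 2
            f_mid = relative_likelihood(mid, y, n) - cutoff
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return (lo + hi) / 2

    eps = 1e-12  # stay strictly inside (0, 1) so the logarithms are defined
    return crossing(eps, pi_hat), crossing(pi_hat, 1 - eps)

def approx_coverage(cutoff):
    """Approximate confidence level of the interval, via the chi-squared(1) approximation."""
    return math.erf(math.sqrt(-math.log(cutoff)))

lower, upper = likelihood_interval(y=30, n=100)  # hypothetical data
print(f"interval: ({lower:.3f}, {upper:.3f}), approx. coverage: {approx_coverage(0.1):.3f}")
```

The sketch requires 0 < y < n so that the estimate lies strictly between 0 and 1.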

 

Page 10: Stat 231 Final Slides

Hypothesis Testing using the Likelihood Function

1) Define the null hypothesis, define the alternate hypothesis
2) Define the test statistic, identify the distribution, calculate the observed value
3) Calculate the p-value

The test statistic:

$D = 2[l(\tilde{\theta}) - l(\theta_0)]$

Distribution of D:

$D \sim \chi^2_{n-p}$

 

Page 11: Stat 231 Final Slides

Hypothesis Testing using the Likelihood Function

Observed value of D:

$d = 2[l(\hat{\theta}) - l(\theta_0)]$

P-value:

$P(D \ge d)$, where $D \sim \chi^2_{n-p}$

 

Page 12: Stat 231 Final Slides


Example

 

Page 13: Stat 231 Final Slides

Example

The observed value of the test statistic:

$d = 2[l(\hat{\theta}) - l(\theta_0)]$

 

Page 14: Stat 231 Final Slides

Example

$l(\theta) = n\ln(\theta + 1) + \theta\sum_{i=1}^{n} \ln x_i$

 

Page 15: Stat 231 Final Slides


Example

 

Page 16: Stat 231 Final Slides

Example

$d = 2[l(\hat{\theta}) - l(\theta_0)]$

$l(\theta) = n\ln(\theta + 1) + \theta\sum_{i=1}^{n} \ln x_i$
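To make the calculation concrete, here is a minimal sketch of the observed likelihood ratio statistic for the log-likelihood $l(\theta) = n\ln(\theta+1) + \theta\sum \ln x_i$. The data values and the hypothesized value theta0 = 1 are hypothetical, the function names are illustrative, and the p-value assumes a chi-squared distribution with 1 degree of freedom (an assumption of this sketch for a single tested parameter).

```python
import math

def log_likelihood(theta, xs):
    """l(theta) = n ln(theta + 1) + theta * sum(ln x_i), for observations in (0, 1)."""
    n = len(xs)
    return n * math.log(theta + 1) + theta * sum(math.log(x) for x in xs)

def mle(xs):
    """Solve dl/dtheta = n / (theta + 1) + sum(ln x_i) = 0 for theta."""
    return -len(xs) / sum(math.log(x) for x in xs) - 1

def likelihood_ratio_statistic(theta0, xs):
    """Observed value d = 2 [l(theta_hat) - l(theta0)]."""
    return 2 * (log_likelihood(mle(xs), xs) - log_likelihood(theta0, xs))

def chi2_1df_upper_tail(d):
    """P(D >= d) when D follows a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(d / 2))

xs = [0.9, 0.6, 0.8, 0.7, 0.95, 0.5, 0.85, 0.75]  # hypothetical sample
theta0 = 1.0                                      # hypothetical null value
d = likelihood_ratio_statistic(theta0, xs)
p_value = chi2_1df_upper_tail(d)
print(f"theta_hat = {mle(xs):.3f}, d = {d:.3f}, p-value = {p_value:.3f}")
```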

 

Page 17: Stat 231 Final Slides

Model Assessment

• We've been assuming that the data we collect fit a specific model (Binomial, Response, etc.)
• With these models come many assumptions, including independence
• In this chapter, we analyze our data to see whether we are actually able to use these models to fit our data

 

Page 18: Stat 231 Final Slides

Independence with Binary Variates

• We want to see if we can assume two binary variates (represented by 2 random variables X and Y) are independent
• This is essentially another type of hypothesis test
• Since a binary variate is just a categorical variate with 2 categories, this test can be extended to two categorical variates

 

Page 19: Stat 231 Final Slides

Independence with Binary Variates

Define:
Let X represent the binary variate gender (Male = 0, Female = 1)
Let Y represent the binary variate smoker (Non-Smoker = 0, Smoker = 1)
Let n be the sample size

Let us collect our observed data and present it in the following frequency table:

                   Male (X=0)   Female (X=1)   Total
Non-Smoker (Y=0)   a            b              a + b
Smoker (Y=1)       c            d              c + d
Total              a + c        b + d          n = a + b + c + d

 

Page 20: Stat 231 Final Slides

Independence with Binary Variates

If X and Y are independent then:

Expected frequency of male smokers is $n \cdot P(X=0) \cdot P(Y=1)$
Expected frequency of male non-smokers is $n \cdot P(X=0) \cdot P(Y=0)$
Expected frequency of female smokers is $n \cdot P(X=1) \cdot P(Y=1)$
Expected frequency of female non-smokers is $n \cdot P(X=1) \cdot P(Y=0)$

 

Page 21: Stat 231 Final Slides

Independence with Binary Variates

Using the observed frequency table

                   Male (X=0)   Female (X=1)   Total
Non-Smoker (Y=0)   a            b              a + b
Smoker (Y=1)       c            d              c + d
Total              a + c        b + d          n = a + b + c + d

we estimate each probability by the corresponding marginal proportion:

$P(X=0) = \frac{a+c}{n}$
$P(X=1) = \frac{b+d}{n}$
$P(Y=0) = \frac{a+b}{n}$
$P(Y=1) = \frac{c+d}{n}$

 

Page 22: Stat 231 Final Slides

Independence with Binary Variates

Creating our expected frequency table:

                   Male (X=0)   Female (X=1)   Total
Non-Smoker (Y=0)   e1           e2             a + b
Smoker (Y=1)       e3           e4             c + d
Total              a + c        b + d          n = a + b + c + d

$e_1 = n \cdot P(X=0) \cdot P(Y=0)$
$e_2 = n \cdot P(X=1) \cdot P(Y=0)$
$e_3 = n \cdot P(X=0) \cdot P(Y=1)$
$e_4 = n \cdot P(X=1) \cdot P(Y=1)$

 

Page 23: Stat 231 Final Slides

Independence with Binary Variates

As with any other hypothesis testing question, we need to define the test statistic.

Test Statistic:

$S = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$

Distribution of the test statistic:

$S \sim \chi^2_{(r-1)(c-1)}$

Observed value:

$s = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$

 

Page 24: Stat 231 Final Slides

Independence with Binary Variates

p-value $= P(S \ge s)$

Make your conclusion:

Reject: X and Y are not independent
Accept: X and Y are independent
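The whole procedure for a 2 by 2 table can be sketched in a few lines. This is an illustrative sketch only: the cell counts are hypothetical, the function names are made up for this example, and the p-value uses the fact that a 2 by 2 table gives (2-1)(2-1) = 1 degree of freedom.

```python
import math

def expected_frequencies(a, b, c, d):
    """Expected cell counts [e1, e2, e3, e4] under independence of X and Y."""
    n = a + b + c + d
    p_x0, p_x1 = (a + c) / n, (b + d) / n  # marginal proportions for gender
    p_y0, p_y1 = (a + b) / n, (c + d) / n  # marginal proportions for smoking
    return [n * p_x0 * p_y0, n * p_x1 * p_y0, n * p_x0 * p_y1, n * p_x1 * p_y1]

def independence_test(a, b, c, d):
    """Return the observed statistic s and the p-value P(S >= s) for a 2x2 table."""
    observed = [a, b, c, d]
    expected = expected_frequencies(a, b, c, d)
    s = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(s / 2))
    return s, p_value

# Hypothetical counts: a, b = male/female non-smokers; c, d = male/female smokers.
s, p_value = independence_test(a=40, b=50, c=30, d=10)
print(f"s = {s:.3f}, p-value = {p_value:.4f}")
```

A small p-value is evidence against independence; a large one means the data are consistent with it.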

 

Page 25: Stat 231 Final Slides

Example

Page 26: Stat 231 Final Slides

Example

Page 27: Stat 231 Final Slides

Example

Observed value:

$s = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$

Page 28: Stat 231 Final Slides

Example

P-value:

 

Page 29: Stat 231 Final Slides

Model Assessment

For the regression model, we have the following assumptions when fitting our data

1) The expectation of Y is a linear function of the explanatory variate
2) The model used is Gaussian
3) Yi's are independent
4) The model has constant variance

 

Page 30: Stat 231 Final Slides

Model Assessment

The expectation of Y is a linear function of the explanatory variate

• The model assumes that E[Yi] is a linear function of xi
• If we plot Yi vs. xi we should see a linear relationship

 

Page 31: Stat 231 Final Slides

Model Assessment

The model used is Gaussian

• In the model, we assume $R \sim G(0, \sigma)$ and thus $Y \sim G(\alpha + \beta x, \sigma)$
• How do we check if this assumption is reasonable?

Residuals

• Rearranging the model, $R = Y - (\alpha + \beta x)$
• A realization of R becomes $r_i = y_i - (\alpha + \beta x_i)$
• An estimated residual is $\hat{r}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i) = y_i - \hat{y}_i$
• Graphically, $\hat{r}_i$ is the distance from the line of best fit to our observed response variate
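As an illustration of how estimated residuals are obtained in practice, the sketch below fits the line by least squares and subtracts the fitted values from the observed responses. The five (x, y) pairs and the function names are illustrative assumptions, not part of the slide.

```python
def fit_line(xs, ys):
    """Least-squares estimates (alpha_hat, beta_hat) for the model Y = alpha + beta * x + R."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def estimated_residuals(xs, ys):
    """Estimated residuals r_hat_i = y_i - (alpha_hat + beta_hat * x_i)."""
    alpha_hat, beta_hat = fit_line(xs, ys)
    fitted = [alpha_hat + beta_hat * x for x in xs]
    return [y - y_fit for y, y_fit in zip(ys, fitted)]

xs = [172, 162, 180, 170, 174]  # illustrative explanatory values
ys = [60, 54, 72, 65, 64]       # illustrative responses
residuals = estimated_residuals(xs, ys)
print([round(r, 2) for r in residuals])
```

With a least-squares fit that includes an intercept, the estimated residuals always sum to zero (up to rounding), which is a quick sanity check before plotting them.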

 

Page 32: Stat 231 Final Slides

Model Assessment

• We can check the Gaussian assumption by plotting a QQ plot
• Plot the sample quantiles of the estimated residuals against the theoretical quantiles; if the points fall on a relatively straight line, then the Gaussian assumption holds

 

Page 33: Stat 231 Final Slides

Model Assessment

Yi's are independent

• We will check these assumptions by plotting the fitted response $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ against the estimated residuals, $\hat{r}_i$
• If our assumptions are true, we should see a random pattern centered around 0

 

Page 34: Stat 231 Final Slides

Model Assessment

Page 35: Stat 231 Final Slides

Model Assessment

Yi's have Constant Variance

• If Yi's have constant variance, we should see residuals evenly distributed around zero

Non-constant variance: funnel shaped

 

Page 36: Stat 231 Final Slides

Comparison

Recall in Chapter 1 we learned there were three different aspects (types of problem)

• Descriptive
• Causative
• Predictive

Chapter 6 looks at techniques for solving each of the 3 problems

 

Page 37: Stat 231 Final Slides

Comparison

• The descriptive aspect of the problem could involve looking at and comparing two different populations
• In this section, we will learn how to conduct hypothesis tests that will allow us to conclude whether there's a difference between 2 populations
 –  The question asked is 'is there a difference between the mean values of the 2 populations?'
• Essentially, the hypothesis tested is whether the parameter for each population is equal

$H_0: \mu_1 = \mu_2$

 

Page 38: Stat 231 Final Slides

Comparison

2 sample t-tests (Response Model)

• Two populations

$Y_{1j} = \mu_1 + R_{1j}$, $Y_{2j} = \mu_2 + R_{2j}$

• The estimator for each population is

$\tilde{\mu}_1 = \frac{\sum_{j=1}^{n_1} Y_{1j}}{n_1}$, $\tilde{\mu}_2 = \frac{\sum_{j=1}^{n_2} Y_{2j}}{n_2}$

• The sampling distribution for each estimator is

$\tilde{\mu}_1 \sim G\left(\mu_1, \frac{\sigma}{\sqrt{n_1}}\right)$, $\tilde{\mu}_2 \sim G\left(\mu_2, \frac{\sigma}{\sqrt{n_2}}\right)$

 

Page 39: Stat 231 Final Slides

Comparison

• In the hypothesis tests, we want to see if the two parameters $\mu_1$ and $\mu_2$ are equal, so let's look at the r.v. $\tilde{\mu}_1 - \tilde{\mu}_2$
• What is the sampling distribution of $\tilde{\mu}_1 - \tilde{\mu}_2$ under the assumption $\mu_1 = \mu_2$?

$\tilde{\mu}_1 \sim G\left(\mu_1, \frac{\sigma}{\sqrt{n_1}}\right)$, $\tilde{\mu}_2 \sim G\left(\mu_2, \frac{\sigma}{\sqrt{n_2}}\right)$

 

Page 40: Stat 231 Final Slides

Comparison

$\tilde{\mu}_1 - \tilde{\mu}_2 \sim G\left(0, \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$

Standardize:

$\frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim G(0, 1)$

Replace $\sigma$ with estimate:

$\frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\tilde{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$

 

Page 41: Stat 231 Final Slides

Comparison

$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}}$

$T = \frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\tilde{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$

 

Page 42: Stat 231 Final Slides

Example

Page 43: Stat 231 Final Slides

Example

$\hat{\mu}_1 = 71.3$
$\hat{\mu}_2 = 68.7$
$\hat{\sigma}_1 = 10.2$
$\hat{\sigma}_2 = 11.3$
$n_1 = 47$
$n_2 = 36$

 

Page 44: Stat 231 Final Slides

Example

$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(47 - 1)10.2^2 + (36 - 1)11.3^2}{47 + 36 - 2}} = 10.6892$

$t = \frac{71.3 - 68.7}{10.6892\sqrt{\frac{1}{47} + \frac{1}{36}}} = 1.097$
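The arithmetic in this example is easy to reproduce. The sketch below recomputes the pooled standard deviation and the observed t statistic from the summary values on the slides; the function names are illustrative. Working from the rounded summaries gives a pooled value of about 10.689 and a statistic of about 1.10, in line with the slide's figures up to rounding.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of the common standard deviation from two samples."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Observed two-sample t statistic under H0: mu1 = mu2 (equal variances assumed)."""
    sp = pooled_sd(s1, n1, s2, n2)
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Summary values taken from the example slides.
sp = pooled_sd(10.2, 47, 11.3, 36)
t_obs = two_sample_t(71.3, 10.2, 47, 68.7, 11.3, 36)
print(f"pooled sd = {sp:.4f}, t = {t_obs:.3f}, df = {47 + 36 - 2}")
```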

 

Page 45: Stat 231 Final Slides

Paired T-Tests

• In the prior pages, we looked at two sample t-tests
• A stronger test is called the paired t-test
• This test only works if the two samples we collect are actually data for the same group of n units, but at different times
• The paired t-test involves simplifying the two data sets into one by finding the difference of each pair of data, and working with this single dataset
• Then we conduct a usual t-test/hypothesis test on this single dataset of differences

 

Page 46: Stat 231 Final Slides

Causation

• The causative aspect of a problem looks at the relationship between the explanatory and response variates
• Recall in chapter 1 we looked at 2 concepts that describe the relationship between X and Y
 –  Relative Risk
 –  Association
• Association involves calculating the correlation coefficient

$\rho = r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
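The correlation coefficient is straightforward to compute directly from its definition. The following sketch uses hypothetical data and an illustrative function name.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]             # hypothetical explanatory values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses
r = correlation(xs, ys)
print(f"r = {r:.4f}")
```

Values of r near 1 or -1 indicate a strong linear association; values near 0 indicate little linear association.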

 

Page 47: Stat 231 Final Slides

Causation

• In this course, we only have the skills to test for association
• This involves testing the hypothesis $H_0: \beta = 0$ in the regression model
• If $H_0: \beta = 0$ holds, then we can say there is no association between X and Y

 

Page 48: Stat 231 Final Slides

Example

Page 49: Stat 231 Final Slides

Example

$t = \frac{\hat{\beta} - 0}{SE(\tilde{\beta})}$
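A minimal sketch of this statistic follows. It assumes the standard error is estimated as $\hat{\sigma} / \sqrt{S_{XX}}$, matching the sampling distribution of the slope estimator used in these slides, with $\hat{\sigma}$ computed from the residual sum of squares on n - 2 degrees of freedom. The data are the five height and weight pairs from the regression prediction example in this deck; the function name is illustrative.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (beta_hat, sigma_hat, t) for testing H0: beta = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual sum of squares, divided by n - 2 to estimate the variance.
    ssr = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(ssr / (n - 2))
    standard_error = sigma_hat / math.sqrt(s_xx)
    return beta_hat, sigma_hat, (beta_hat - 0) / standard_error

heights = [172, 162, 180, 170, 174]
weights = [60, 54, 72, 65, 64]
beta_hat, sigma_hat, t_obs = slope_t_statistic(heights, weights)
print(f"beta_hat = {beta_hat:.4f}, sigma_hat = {sigma_hat:.2f}, t = {t_obs:.2f}")
```

The observed value would then be compared with a t distribution on n - 2 degrees of freedom to obtain the p-value.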

 

Page 50: Stat 231 Final Slides

Causation

• Association does NOT imply causation
• The course notes talk about why this is the case and how we can avoid making the wrong assumption using three techniques
 –  Blocking
 –  Repetition and Randomization
 –  Matching

 

Page 51: Stat 231 Final Slides

Causation

Confounding

• Association does not imply causation
• There could be a third hidden variate that is related to both the explanatory and response variates and produces the apparent causal relationship: this is called confounding
• The difficulty with confounding variates is identifying them in the first place; if we miss them, we will make a wrong conclusion about the relationship between the explanatory and response variates
• If we can identify the confounding variates, then there are tools we can use when designing experimental plans to account for these variates

 

Page 52: Stat 231 Final Slides

Causation

Blocking

• If we've identified the confounding variate, we neutralize its effect by collecting samples where the units have the same value for the confounding variate
• The Chicken Example:
 –  Response variate: growth rate of chickens
 –  Explanatory variate: protein in diet
 –  Confounding variate: gender of the chickens
 –  Blocking: look at samples of only male chickens and samples of only female chickens
 –  This eliminates the gender effect and the experimenter is able to look at the effects of protein in diet on the growth rate of chickens

 

Page 53: Stat 231 Final Slides

Causation

Replication and Randomization

• If we cannot identify or control the confounding variate, we can also try to neutralize its effects by randomly allocating our controlled variate in the experimental plan
• The Medicine Example:
 –  Response variate: survival rate
 –  Explanatory variate: type of treatment
 –  Confounding variates: medical history/health of the patient
 –  Using randomization and replication to assign the treatment type to each unit will result in two very balanced groups in terms of their health/medical history
 –  This will eliminate the confounding variates as much as possible

 

Page 54: Stat 231 Final Slides

Causation

Matching and Observational Plans

• In observational plans, the experimenter cannot control the variates
• The method of matching is used where the units that are being observed are compared with a control unit that has very similar characteristics to the unit in the plan (this is similar to blocking)
• Thus if there is a difference in the value observed between the sampled unit and the control unit, the difference must be legitimate

 

Page 55: Stat 231 Final Slides

Prediction

• The predictive aspect of a problem involves using our collected data to estimate the value for a unit to be randomly selected from the population
• We will look at prediction intervals for
 –  Response
 –  Regression

 

Page 56: Stat 231 Final Slides

Prediction

The Model

$Y = \mu + R$, $Y \sim G(\mu, \sigma)$

The predicted unit: $Y_0$

Since $Y_0$ follows the response model then $Y_0 \sim G(\mu, \sigma)$

 

Page 57: Stat 231 Final Slides

Prediction

What would be a logical choice to use as our predicted value?

• The average

We need the estimator for the mean parameter: $\tilde{\mu}$

From MLE: $\tilde{\mu} = \frac{\sum_{i=1}^{n} Y_i}{n}$

Sampling Distribution: $\tilde{\mu} \sim G\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

 

Page 58: Stat 231 Final Slides

Prediction

If we look at the difference between our predicted value and the population average, then we have the random variable

$Y_0 - \tilde{\mu}$

$Y_0 \sim G(\mu, \sigma)$, $\tilde{\mu} \sim G\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

 

Page 59: Stat 231 Final Slides

Prediction

$Y_0 - \tilde{\mu} \sim G\left(0, \sigma\sqrt{1 + \frac{1}{n}}\right)$

Standardizing gives

$\frac{Y_0 - \tilde{\mu}}{\sigma\sqrt{1 + \frac{1}{n}}} \sim G(0, 1)$

Replacing $\sigma$ with an estimator gives

$\frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$

 

Page 60: Stat 231 Final Slides

Prediction

Constructing a 95% Prediction Interval for $Y_0$ ($\sigma$ unknown)

Our ultimate goal: $a \le Y_0 \le b$

Since $\frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$ we can make the probability statement:

$P\left(\left|\frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}}\right| \le c\right) = 0.95$

 

Page 61: Stat 231 Final Slides

Prediction

$P\left(-c \le \frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}} \le c\right) = 0.95$

 

Page 62: Stat 231 Final Slides

Example

Let Y be the response variate representing body weight (kg). The following sample is collected:

60 54 72 65 64

Construct a 95% prediction interval for the body weight of someone we randomly select from the population.

$\hat{\mu} \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n}}$

Page 63: Stat 231 Final Slides

Example

$\hat{\mu} \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n}}$
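The interval for this example can be evaluated with a few lines of code. In the sketch below the function name is illustrative, and the critical value c = 2.776 is an assumption read from a t table (97.5th percentile, n - 1 = 4 degrees of freedom) rather than something stated on the slide.

```python
import math

def response_prediction_interval(ys, c):
    """Prediction interval mu_hat +/- c * sigma_hat * sqrt(1 + 1/n) for a new unit."""
    n = len(ys)
    mu_hat = sum(ys) / n
    # Sample standard deviation with n - 1 in the denominator.
    sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in ys) / (n - 1))
    half_width = c * sigma_hat * math.sqrt(1 + 1 / n)
    return mu_hat - half_width, mu_hat + half_width

weights = [60, 54, 72, 65, 64]  # body weights (kg) from the example
lower, upper = response_prediction_interval(weights, c=2.776)  # c from a t table, 4 df
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f}) kg")
```

With these five observations the interval runs from roughly 42.8 kg to 83.2 kg; it is wide because the sample is small and a prediction interval must cover the variability of a single new unit as well as the uncertainty in the estimated mean.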

 

Page 64: Stat 231 Final Slides

Prediction

The Model

$Y = \alpha + \beta x_i + R$

But for our purposes, we will use a shifted version of the model

$Y = \alpha + \beta(x_i - \bar{x}) + R$

 

Page 65: Stat 231 Final Slides

Prediction

The Model

$Y = \alpha + \beta(x_i - \bar{x}) + R$

The predicted unit: $Y_0$

We want to predict $Y_0$ given the subgroup $x_i = x_0$

Since $Y_0$ follows the regression model then $Y_0 \sim G(\alpha + \beta(x_0 - \bar{x}), \sigma)$

 

Page 66: Stat 231 Final Slides

Prediction

What would be a logical choice to use as our predicted value?

• The average given the subgroup $x_i = x_0$, which we will denote $\tilde{\mu}(x_0)$

Regression Model: $Y = \alpha + \beta(x_i - \bar{x}) + R$

Average of the subgroup $x_i = x_0$: $\tilde{\mu}(x_0) = E[Y \mid x_0] = \tilde{\alpha} + \tilde{\beta}(x_0 - \bar{x})$

 

Page 67: Stat 231 Final Slides

Prediction

Using Maximum Likelihood Estimation we obtain the estimators

$\tilde{\alpha} = \frac{\sum_{i=1}^{n} Y_i}{n}$, $\tilde{\beta} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{XY}}{S_{XX}}$

The sampling distributions of these two estimators are

$\tilde{\alpha} \sim G\left(\alpha, \frac{\sigma}{\sqrt{n}}\right)$, $\tilde{\beta} \sim G\left(\beta, \frac{\sigma}{\sqrt{S_{XX}}}\right)$

 

Page 68: Stat 231 Final Slides

Prediction

What is the sampling distribution of $\tilde{\mu}(x_0) = \tilde{\alpha} + \tilde{\beta}(x_0 - \bar{x})$?

$\tilde{\alpha} \sim G\left(\alpha, \frac{\sigma}{\sqrt{n}}\right)$, $\tilde{\beta} \sim G\left(\beta, \frac{\sigma}{\sqrt{S_{XX}}}\right)$

$\tilde{\mu}(x_0) \sim G\left(\alpha + \beta(x_0 - \bar{x}), \sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right)$

 

Page 69: Stat 231 Final Slides

Prediction

If we look at the difference between our predicted value and the population average, then we have the random variable

$Y_0 - \tilde{\mu}(x_0)$

The obvious next step would be to determine the sampling distribution of $Y_0 - \tilde{\mu}(x_0)$

$Y_0 \sim G(\alpha + \beta(x_0 - \bar{x}), \sigma)$

$\tilde{\mu}(x_0) \sim G\left(\alpha + \beta(x_0 - \bar{x}), \sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right)$

Page 70: Stat 231 Final Slides

Prediction

$Y_0 \sim G(\alpha + \beta(x_0 - \bar{x}), \sigma)$

$\tilde{\mu}(x_0) \sim G\left(\alpha + \beta(x_0 - \bar{x}), \sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right)$

 

Page 71: Stat 231 Final Slides

Prediction

$Y_0 - \tilde{\mu}(x_0) \sim G\left(0, \sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right)$

Standardizing gives

$\frac{Y_0 - \tilde{\mu}(x_0)}{\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim G(0, 1)$

Estimating sigma gives

$\frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}$

 

Page 72: Stat 231 Final Slides

Prediction

Constructing a 95% Prediction Interval for $Y_0$ ($\sigma$ unknown)

Our ultimate goal: $a \le Y_0 \le b$

Since $\frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}$ we can make the probability statement:

$P\left(\left|\frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}}\right| \le c\right) = 0.95$

 

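Page 73: Stat 231 Final Slides

Prediction

$P\left(\left|\frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}}\right| \le c\right) = 0.95$

$P\left(-c \le \frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \le c\right) = 0.95$

$P\left(-c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \le Y_0 - \tilde{\mu}(x_0) \le c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right) = 0.95$

$P\left(\tilde{\mu}(x_0) - c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \le Y_0 \le \tilde{\mu}(x_0) + c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right) = 0.95$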

 

Page 74: Stat 231 Final Slides

Prediction

Upper and Lower bounds of a regression prediction interval:

$\hat{\alpha} + \hat{\beta}(x_0 - \bar{x}) \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$

 

Page 75: Stat 231 Final Slides

Example

Let Y be the response variate representing body weight (kg) and X be the explanatory variate representing body height (cm). The following sample is collected:

i     1     2     3     4     5
xi    172   162   180   170   174
yi    60    54    72    65    64

Construct a 95% prediction interval for the body weight of someone we randomly select from the population whose height is 175 cm. Use $\hat{\sigma} = 2.97$

 

Page 76: Stat 231 Final Slides

Example

$\hat{\alpha} + \hat{\beta}(x_0 - \bar{x}) \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$

i     1     2     3     4     5
xi    172   162   180   170   174
yi    60    54    72    65    64

Page 77: Stat 231 Final Slides

Example

$\hat{\alpha} + \hat{\beta}(x_0 - \bar{x}) \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$
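The example can be worked through numerically as follows. This is a sketch under stated assumptions: the function name is illustrative, the critical value c = 3.182 is read from a t table (97.5th percentile, n - 2 = 3 degrees of freedom), and sigma is estimated from the residuals rather than taken as the rounded 2.97 given in the question (the two agree to two decimal places).

```python
import math

def regression_prediction_interval(xs, ys, x0, c):
    """Prediction interval for a new response at x0 under the shifted regression model."""
    n = len(xs)
    x_bar = sum(xs) / n
    alpha_hat = sum(ys) / n  # in the shifted model, alpha_hat is the mean response
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - alpha_hat) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    # Estimate sigma from the residual sum of squares on n - 2 degrees of freedom.
    ssr = sum((y - (alpha_hat + beta_hat * (x - x_bar))) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(ssr / (n - 2))
    centre = alpha_hat + beta_hat * (x0 - x_bar)
    half_width = c * sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return centre - half_width, centre + half_width

heights = [172, 162, 180, 170, 174]
weights = [60, 54, 72, 65, 64]
lower, upper = regression_prediction_interval(heights, weights, x0=175, c=3.182)
print(f"95% prediction interval at 175 cm: ({lower:.2f}, {upper:.2f}) kg")
```

For these data the predicted weight at 175 cm is about 66.2 kg, and the interval runs from roughly 55.5 kg to 76.8 kg.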

 

Page 78: Stat 231 Final Slides

Outline

• Chapter 1
 –  Data types (discrete, continuous, categorical)
 –  Problem (3 different aspects)
 –  Populations (target, study, sample)
 –  Representations of data
     • Graphical: histograms, CDFs, box plots
     • Numerical: mean, standard deviation, IQR
 –  Bivariate Data
     • Relative risk
     • Correlation coefficient
• Chapter 2
 –  Review of probability distributions
 –  Random PPDAC examples…

 

Page 79: Stat 231 Final Slides

PPDAC

Page 80: Stat 231 Final Slides

PPDAC

Draw a frequency histogram of the Flash data, with bins given by the intervals (45 – 49.9), (50 – 54.9), etc.

First make a frequency table with the bin widths

Interval       Frequency
(45 – 49.9)    1
(50 – 54.9)    1
(55 – 59.9)    2
(60 – 64.9)    5
(65 – 69.9)    5
(70 – 74.9)    1
(75 – 79.9)    1
(80 – 84.9)    1
(85 – 89.9)    2
(90 – 94.9)    1
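Building such a frequency table amounts to assigning each observation to a bin of width 5. The sketch below shows one way to do that; the raw Flash data are not reproduced in this transcript, so the values used here are hypothetical, and the function name is illustrative.

```python
def frequency_table(values, start=45.0, width=5.0):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = {}
    for value in values:
        bin_index = int((value - start) // width)
        bin_lower = start + bin_index * width
        counts[bin_lower] = counts.get(bin_lower, 0) + 1
    # Return bins ordered by their lower edge.
    return dict(sorted(counts.items()))

sample = [46.2, 51.0, 57.5, 58.1, 62.3]  # hypothetical observations
table = frequency_table(sample)
for bin_lower, count in table.items():
    print(f"({bin_lower:.0f} - {bin_lower + 4.9:.1f})  {count}")
```

The bar heights of the frequency histogram are then simply the counts in this table.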

 

Page 81: Stat 231 Final Slides

PPDAC

Page 82: Stat 231 Final Slides

Concept Review

• From the previous example:
 –  Target population, study population, sample, unit
 –  Response vs. explanatory variates
 –  Aspects
     • Descriptive
     • Causative
     • Predictive
 –  Histograms
     • Bin Width
     • Frequency histogram

 

Page 83: Stat 231 Final Slides

Outline

• Chapter 3
 –  Binomial Model
 –  Response Model
 –  Regression Model
 –  Maximum Likelihood Estimation

Page 84: Stat 231 Final Slides

MLE

$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$

Page 85: Stat 231 Final Slides

MLE

$l(\theta) = n\ln\theta + (\theta - 1)\sum_{i=1}^{n} \ln(x_i)$

 

Page 86: Stat 231 Final Slides

Concept Review

• From the previous example:
 –  Maximum Likelihood Estimation Method
     • Define likelihood function
     • Define log likelihood function
     • Differentiate with respect to the parameter
     • Set to zero
     • Solve for the parameter
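As a worked illustration of these five steps, the derivation below assumes the log-likelihood from the MLE example reads $l(\theta) = n\ln\theta + (\theta - 1)\sum \ln(x_i)$, which corresponds to the density $f(x; \theta) = \theta x^{\theta - 1}$ on $0 < x < 1$. The density is an assumption inferred from that log-likelihood, not something stated in the transcript.

```latex
\begin{align*}
L(\theta) &= \prod_{i=1}^{n} \theta\, x_i^{\theta - 1}
  && \text{(define the likelihood function)} \\
l(\theta) &= n \ln\theta + (\theta - 1) \sum_{i=1}^{n} \ln(x_i)
  && \text{(define the log likelihood function)} \\
\frac{dl}{d\theta} &= \frac{n}{\theta} + \sum_{i=1}^{n} \ln(x_i)
  && \text{(differentiate with respect to } \theta\text{)} \\
0 &= \frac{n}{\hat{\theta}} + \sum_{i=1}^{n} \ln(x_i)
  && \text{(set to zero)} \\
\hat{\theta} &= -\frac{n}{\sum_{i=1}^{n} \ln(x_i)}
  && \text{(solve for the parameter)}
\end{align*}
```

Because every $x_i$ lies between 0 and 1, each $\ln(x_i)$ is negative, so the estimate is positive as required.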

 

Page 87: Stat 231 Final Slides

Outline

• Chapter 4
 –  Sampling distributions for estimators
 –  Introduction to new distributions
     • Gaussian
     • Chi-squared
     • t
 –  Confidence Interval
 –  Hypothesis Testing
 –  Confidence Intervals and Hypothesis Testing with the likelihood function

Page 88: Stat 231 Final Slides

Confidence Intervals

Page 89: Stat 231 Final Slides

Confidence Interval

Page 90: Stat 231 Final Slides

Concepts Review

• From the previous example:
 –  Confidence Intervals for the response model, sigma unknown
 –  Structure of a symmetric confidence interval

 

Page 91: Stat 231 Final Slides

Hypothesis Testing

Page 92: Stat 231 Final Slides

Hypothesis Testing

For a paired t-test, we create a new set of data

        1      2      3      4      5      6      7      8
Diff    0.48   0.53   0.52   0.21   -0.05  0.44   0.41   0.68

        9      10     11     12     13     14     15     16
Diff    0.46   0.76   3.09   0.26   0.34   0.32   -0.07  0.33

 

Page 93: Stat 231 Final Slides

Hypothesis Testing

Test statistic:

$T = \frac{\tilde{\mu}_D - \mu_0}{\tilde{\sigma}_D / \sqrt{n}} \sim t_{n-1}$
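The observed value of this statistic can be computed directly from the 16 differences listed on the preceding slide. In the sketch below the function name is illustrative and the hypothesized mean difference of 0 is an assumption, since the transcript does not state the null value.

```python
import math

def paired_t_statistic(diffs, mu0=0.0):
    """Observed t statistic for H0: mean difference = mu0, with n - 1 degrees of freedom."""
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the differences.
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return (mean_d - mu0) / (sd_d / math.sqrt(n))

diffs = [0.48, 0.53, 0.52, 0.21, -0.05, 0.44, 0.41, 0.68,
         0.46, 0.76, 3.09, 0.26, 0.34, 0.32, -0.07, 0.33]
t_obs = paired_t_statistic(diffs, mu0=0.0)  # mu0 = 0 is an assumed null value
print(f"mean difference = {sum(diffs) / len(diffs):.4f}, t = {t_obs:.3f}, df = {len(diffs) - 1}")
```

The p-value would then come from a t distribution with 15 degrees of freedom. The eleventh difference (3.09) is far from the others and noticeably inflates the standard deviation, which is worth keeping in mind when judging the Gaussian assumption.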

 

Page 94: Stat 231 Final Slides

Hypothesis Testing

P-value

Page 95: Stat 231 Final Slides

Hypothesis Testing

Page 96: Stat 231 Final Slides

Hypothesis Testing

For a 2 sample t-test, we have two populations, with 2 sets of data

 

Page 97: Stat 231 Final Slides

Hypothesis Testing

Test statistic:

$T = \frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\tilde{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$

Page 98: Stat 231 Final Slides

Hypothesis Testing

$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(16 - 1)2.48^2 + (16 - 1)2.91^2}{16 + 16 - 2}} = 2.704$

Observed value of the test statistic:

$t = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

 

Page 99: Stat 231 Final Slides

Hypothesis Testing

P-value

Page 100: Stat 231 Final Slides

Concepts Review

• From the previous example:
 –  Hypothesis Testing
     • Define the null hypothesis
     • Define the test statistic, identify the distribution, calculate the observed value of the test statistic
     • Calculate the p-value
 –  2 sample t test
 –  Paired t test