lecture 8: introduction to estimation & hypothesis...

Business Statistics:

Lecture 8: Introduction to Estimation & Hypothesis Testing


Agenda

Introduction to Estimation

Point estimation

Interval estimation

Introduction to Hypothesis Testing

Concepts and terminology of hypothesis testing

Null and alternative hypotheses

Type I and Type II errors


Concepts of Estimation

Objective: determine the approximate value of a population parameter on the basis of a sample statistic, e.g.

the sample mean (X̄) is used to estimate the population mean (μ)

the sample proportion (P̂) is used to estimate the population proportion (p)

Estimator: a formula (statistic) that provides the guess of the population parameter (denoted by uppercase letters: X̄, P̂)

Estimate: the numerical outcome of the estimator once the sample has been drawn (denoted by lowercase letters: x̄, p̂)

Two types of estimators:

Point estimator: provides a single value

Interval estimator: provides an interval with a certain confidence
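To make the estimator/estimate distinction concrete, here is a minimal Python sketch. The function names and the two small samples are hypothetical, chosen only for illustration: each function plays the role of an estimator (X̄ or P̂, a rule that works on any sample), and the number it returns for one drawn sample is the estimate (x̄ or p̂).

```python
# Hypothetical illustration: an estimator is a rule, an estimate is its output.

def sample_mean(xs):
    """Estimator of mu (X-bar): can be applied to any sample of numbers."""
    return sum(xs) / len(xs)

def sample_proportion(successes):
    """Estimator of p (P-hat): share of True values in the sample."""
    return sum(successes) / len(successes)

# Hypothetical observed samples (not data from the lecture)
heights_cm = [178.0, 185.5, 181.0, 190.0, 176.5]
owns_phone = [True, True, False, True, True]

x_bar = sample_mean(heights_cm)        # estimate x-bar of mu
p_hat = sample_proportion(owns_phone)  # estimate p-hat of p
print(x_bar, p_hat)
```

Drawing a different sample would leave the functions unchanged but produce different numbers, which is exactly why the estimator is treated as a random variable and the estimate as a fixed value.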


Point Estimators

Point estimator: draws inferences about a population by estimating the value of an unknown parameter using a single value

Example:

Consider adult male inhabitants; we are interested in their mean height (μ) in cm; the standard deviation (σ) is assumed to be 10 cm; the population is assumed to be normal

Draw a random sample X1, X2, …, Xn from the population, so that E(Xi) = μ for i = 1, …, n

Consider two point estimators:

Sample mean: X̄

Sample median: M
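A small simulation shows how the two point estimators behave over repeated samples. This is a sketch under assumptions: μ = 180 cm, n = 25 and the random seed are hypothetical values (the slide fixes only σ = 10 cm and a normal population). For a normal population both estimators are centred on μ, but the sample mean varies less from sample to sample than the sample median.

```python
import random
import statistics

def simulate_estimators(mu=180.0, sigma=10.0, n=25, reps=2000, seed=1):
    """Draw `reps` normal samples of size n; record the sample mean and median of each."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))     # point estimate from X-bar
        medians.append(statistics.median(sample))  # point estimate from M
    return means, medians

# Hypothetical mu = 180 cm; sigma = 10 cm as on the slide
means, medians = simulate_estimators()
print("X-bar: centre", round(statistics.fmean(means), 2), "spread", round(statistics.stdev(means), 2))
print("M:     centre", round(statistics.fmean(medians), 2), "spread", round(statistics.stdev(medians), 2))
```

The spread of X̄ should come out close to σ/√n = 10/5 = 2 cm, while the median's spread is noticeably larger.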


Interval Estimators

Drawbacks of a point estimator:

It is virtually certain that the estimate is wrong: $P(\bar{X} = \mu) = 0$

We need to know how close the estimator is to the parameter of interest: $P(|\bar{X} - \mu| < \varepsilon) = ?$

Therefore we use an interval estimator: it draws inferences about a population by estimating the value of an unknown parameter using an interval

The width of the interval is related to the confidence (probability) that the interval includes the true parameter: the higher the confidence, the wider the interval
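When X̄ is normal and σ is known, the question of how close X̄ is to μ has an exact answer: X̄ − μ ~ N(0, σ/√n), so P(|X̄ − μ| < ε) = 2Φ(ε√n/σ) − 1, where Φ is the standard normal CDF. A minimal Python sketch, using σ = 10 cm from the height example and, for illustration only, n = 400 and a margin ε = 1 cm:

```python
from statistics import NormalDist

def prob_within(eps, sigma, n):
    """P(|X-bar - mu| < eps) when X-bar ~ N(mu, sigma/sqrt(n)) with sigma known."""
    se = sigma / n ** 0.5                     # standard error sigma/sqrt(n)
    return 2 * NormalDist().cdf(eps / se) - 1

# sigma = 10 cm as in the height example; n = 400 and eps = 1 cm are illustrative
print(round(prob_within(eps=1.0, sigma=10.0, n=400), 4))
```

With these values ε/(σ/√n) = 2, so the probability is about 0.9545: a larger n (smaller standard error) or a larger ε raises it.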


Interval Estimator for μ (σ known)

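Derivation:

Suppose that X ~ N(μ, σ), or that n > 30 so that the Central Limit Theorem (CLT) applies

Then it follows that

$\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}) = N(\mu, \sigma/\sqrt{n})$

$P\left(-z_{\alpha/2} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

$P\left(-z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} - \mu < z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$

$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$

The interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ is a random interval

It includes (covers) the parameter μ with probability 1 – α

Example: 1 – α = 0.95; $z_{\alpha/2} = z_{.025} = 1.96$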
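The critical value z_{α/2} used in the derivation is the (1 – α/2) quantile of the standard normal distribution, so it can be computed instead of read from a table. A minimal sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later); the function name `z_critical` is hypothetical:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} such that P(-z < Z < z) = confidence for Z ~ N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.9544, 0.99):
    print(level, round(z_critical(level), 3))
```

For 0.95 this reproduces 1.96, and for 0.9544 it gives a value very close to 2.0.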



Interpretation: ‘with repeated sampling from this population, the proportion of values of X̄ for which the interval includes the population mean μ is equal to 1 – α’

The interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ is called the (1 – α)×100% confidence interval (CI) estimator of μ

1 – α is the confidence level (the probability that the interval estimator covers μ)

$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$: lower confidence limit (LCL)

$\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$: upper confidence limit (UCL)

If we replace X̄ by its observed value x̄, we get the (1 – α)×100% confidence interval estimate for μ: $[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}]$

Alternative notation: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
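The repeated-sampling interpretation can be checked by simulation: draw many samples, build the interval from each one, and count how often it covers μ. A sketch under assumptions: μ = 180 cm, n = 25, the number of repetitions and the seed are hypothetical; σ = 10 cm as before. The observed coverage should come out close to 1 – α = 0.95, not exactly equal to it.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence=0.95, reps=10_000, seed=42):
    """Fraction of simulated (1 - alpha) z-intervals that contain the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sigma / n ** 0.5                   # z_{alpha/2} * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)                           # one observed estimate
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 180 cm, sigma = 10 cm; samples of size n = 25
print(coverage_rate(mu=180, sigma=10, n=25))
```

Each individual interval either contains μ or it does not; the 95% describes the long-run share of intervals that do.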



Example (mean height of adult male inhabitants of the Netherlands, cont.):

Suppose n = 400 and x̄ = 182 cm (σ = 10 cm)

Question: compute the 95.44% CI estimate for μ

Solution:

(1 – α)×100% CI estimate: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$

$\bar{x} = 182$; $\sigma/\sqrt{n} = 10/\sqrt{400} = 0.5$

1 – α = 0.9544 ⇒ α/2 = 0.0228 ⇒ $z_{\alpha/2} = 2.0$ (Table 4)

95.44% CI estimate for μ: 182 ± 2.0 × 0.5 = 182 ± 1 cm

LCL = 181 cm; UCL = 183 cm
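The same computation can be sketched in Python (the function name `z_interval_estimate` is hypothetical). The exact normal quantile for 95.44% confidence is about 1.999 rather than the rounded table value 2.0, so the computed limits differ from 181 and 183 cm by less than 0.001 cm.

```python
from statistics import NormalDist

def z_interval_estimate(x_bar, sigma, n, confidence):
    """(LCL, UCL) of the CI estimate x_bar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width

# Values from the height example: n = 400, x-bar = 182 cm, sigma = 10 cm
lcl, ucl = z_interval_estimate(x_bar=182, sigma=10, n=400, confidence=0.9544)
print(round(lcl, 2), round(ucl, 2))
```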


The Error of Estimation

Sampling error can be defined as the difference between an estimator (e.g. X̄) and a parameter (e.g. μ), i.e. X̄ – μ; it is also called the error of estimation


Hypothesis & Hypothesis Testing

Hypothesis

Answer to a research question, or an assumption made about a population parameter (not a sample estimate!)

population mean

population proportion

Example: The mean monthly cell phone bill in this city is μ = $42

Example: The proportion of adults in this city with cell phones is p = .68

Hypothesis Testing

Determine whether there is enough statistical evidence in favor of a certain belief or hypothesis about a parameter


Concepts of Hypothesis Testing

Overview of critical concepts in hypothesis testing:

1. There are two hypotheses

H0 (null hypothesis) & H1 (alternative hypothesis)

2. Testing procedure starts from assumption that H0 is true

3. Goal of the process is to determine whether there is enough evidence in favor of H1

4. There are two possible decisions:

‘there is enough evidence to reject H0 in favor of H1’

‘there is not enough evidence to reject H0 in favor of H1’

5. There are two possible errors:

Type I error: reject a true H0; P(Type I error) = α

Type II error: do not reject a false H0; P(Type II error) = β

The Null Hypothesis, H0

States the assumption (numerical) to be tested

Example: The average number of TV sets in U.S. homes is at least three (H0: μ ≥ 3)

Is always about a population parameter, not about a sample statistic

The Null Hypothesis, H0 (continued)

Begin with the assumption that the null hypothesis is true

Always contains the “=”, “≤” or “≥” sign

May or may not be rejected

The Alternative Hypothesis, HA (also denoted H1)

Is the opposite of the null hypothesis

e.g.: The average number of TV sets in U.S. homes is less than 3 (HA: μ < 3)

Never contains the “=”, “≤” or “≥” sign

May or may not be accepted

HA is generally the hypothesis that is believed (or needs to be supported) by the researcher

Hypothesis Testing Process

Claim: the population mean age is 50 (null hypothesis H0: μ = 50)

Now select a random sample from the population

Suppose the sample mean age is 20: x̄ = 20

Is x̄ = 20 likely if μ = 50?

If not likely, REJECT the null hypothesis

Reason for Rejecting H0

[Figure: sampling distribution of x̄, centered at μ_x̄ = μ = 50 if H0 is true, with x̄ = 20 far out in the tail]

It is unlikely that we would get a sample mean of this value (20) if in fact μ = 50 were the population mean ... then we reject the null hypothesis that μ = 50.

How far is the value of the sample statistic from the population value under H0?

We choose a critical value (cut-off value) on the sampling distribution that tells us that our sample statistic is very far from the value under the null hypothesis and thus unlikely.


Level of Significance, α

In statistics, a critical value is the value corresponding to a given significance level

The level of significance defines the unlikely values of the sample statistic if the null hypothesis is true

Defines the rejection region of the sampling distribution

Is designated by α (level of significance)

Typical values are .01, .05, or .10

http://en.wikipedia.org/wiki/Significance_level

Level of Significance and the Rejection Region

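Level of significance = α

Lower-tail test: H0: μ ≥ 3 vs. HA: μ < 3; the rejection region (area α) lies in the lower tail, below the critical value

Upper-tail test: H0: μ ≤ 3 vs. HA: μ > 3; the rejection region (area α) lies in the upper tail, above the critical value

Two-tailed test: H0: μ = 3 vs. HA: μ ≠ 3; the rejection region is split into two areas of α/2, one in each tail

[Figure: sampling distribution for each of the three tests; the rejection region is shaded]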
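The shaded regions correspond to critical z-values. A minimal sketch, assuming a z-test with σ known (the function names are hypothetical), that returns the critical value(s) for each type of test and checks whether a test statistic falls in the rejection region:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def critical_values(alpha, tail):
    """Critical z-value(s) bounding the rejection region of a z-test.

    tail = "lower" (HA: mu < mu0), "upper" (HA: mu > mu0) or "two" (HA: mu != mu0).
    """
    if tail == "lower":
        return (Z.inv_cdf(alpha),)
    if tail == "upper":
        return (Z.inv_cdf(1 - alpha),)
    if tail == "two":
        c = Z.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
        return (-c, c)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

def in_rejection_region(z_stat, alpha, tail):
    """True if the test statistic falls in the shaded rejection region."""
    crit = critical_values(alpha, tail)
    if tail == "lower":
        return z_stat < crit[0]
    if tail == "upper":
        return z_stat > crit[0]
    return z_stat < crit[0] or z_stat > crit[1]

for tail in ("lower", "upper", "two"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
```

At α = 0.05 a one-tailed test puts all of α in one tail (critical value about ±1.645), whereas the two-tailed test splits it (±1.96); the same z-statistic can therefore be significant in one test and not in the other.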

Errors in Making Decisions

Type I Error

Reject a true null hypothesis

The probability of a Type I error is α

Called the level of significance of the test

Set by the researcher in advance

Errors in Making Decisions (continued)

Type II Error

Fail to reject a false null hypothesis

The probability of a Type II error is β

Outcomes and Probabilities

Possible Hypothesis Test Outcomes

Key: Outcome (Probability)

                      State of Nature
Decision              H0 True             H0 False
Do Not Reject H0      No Error (1 – α)    Type II Error (β)
Reject H0             Type I Error (α)    No Error: Power (1 – β)
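The probabilities in this table can be explored numerically. The sketch below uses a lower-tail z-test of H0: μ ≥ 3 versus HA: μ < 3 with σ known; it estimates P(Type I error) by simulating at μ = 3 and computes β exactly for one alternative. All settings are hypothetical assumptions: σ = 1.2, n = 36, α = 0.05 and a true mean of 2.5 when HA holds.

```python
import random
from statistics import NormalDist, fmean

Z = NormalDist()  # standard normal

def rejects_h0(sample, mu0, sigma, alpha):
    """Lower-tail z-test: reject H0: mu >= mu0 when z < z_alpha (sigma known)."""
    z_stat = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return z_stat < Z.inv_cdf(alpha)

def rejection_rate(true_mu, mu0, sigma, n, alpha, reps=10_000, seed=7):
    """Share of simulated samples (drawn with mean true_mu) in which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if rejects_h0(sample, mu0, sigma, alpha):
            rejected += 1
    return rejected / reps

def beta_exact(true_mu, mu0, sigma, n, alpha):
    """P(Type II error) = P(do not reject H0) when the true mean is true_mu < mu0."""
    se = sigma / n ** 0.5
    cutoff = mu0 + Z.inv_cdf(alpha) * se  # reject H0 when x-bar < cutoff
    return 1 - NormalDist(true_mu, se).cdf(cutoff)

# Hypothetical settings: mu0 = 3, sigma = 1.2, n = 36, alpha = 0.05
type_i = rejection_rate(true_mu=3.0, mu0=3.0, sigma=1.2, n=36, alpha=0.05)
beta = beta_exact(true_mu=2.5, mu0=3.0, sigma=1.2, n=36, alpha=0.05)
power = 1 - beta
print(f"P(Type I) ~ {type_i:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

The simulated Type I rate should sit near α, while β (here about 0.20) depends on how far the true mean lies from μ0, on σ and on n. Running `rejection_rate` with `true_mu=2.5` gives a simulated estimate of the power to compare with `1 - beta`.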
