qt1 - 07 - estimation

Uploaded by prithwis-mukerjee on 16-Apr-2017

Introduction

Estimation

Why Estimation ?

[ From ] Inference

For a given population

Various statistical parameters are GIVEN or KNOWN

Mean, Standard Deviation etc

The task was to interpret them and make managerial decisions

How many shirts to be stocked in the store ?

Is the machine setup faulty? Should we fix it?

[ To ] Estimation

For the given population or sample

Various statistical parameters are NOT KNOWN

So managerial decisions cannot be taken

UNLESS we can estimate the parameters

Two kinds of Estimates

Point Estimate

A single number that is used to estimate a given population parameter

Mean Age = 22.3

Interval Estimate

A range of values used to estimate a population parameter

Mean Age is between 21.5 and 23

Difficulty with a point estimate

It is either right or wrong!

There is no way to know the size of the error in the estimate

It needs to be accompanied by an estimate of the error that could have occurred!

Estimator

Estimator

A sample statistic that is used to estimate a population parameter

Estimate

A specific value of the statistic that is observed

Criteria for a Good Estimator

Unbiased

Example : Mean of the sampling distribution of sample means taken from the population is equal to the population mean itself.

Efficient

Depends on the standard error of the statistic

Standard error = standard deviation of the sampling distribution

The smaller the standard error for a given sample size, the more efficient the estimator

Consistent

When sample size increases the value of the statistic comes closer and closer to the value of the parameter

Sufficient

Uses all information that can be extracted from sample
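To make "unbiased" and "efficient" concrete, here is a small simulation sketch. Everything in it is an assumption for illustration, not from the slides: a normal population with mean 36 and standard deviation 10, samples of size 25, 2000 repetitions, and the sample median as a competing estimator of the mean. The average of the sample means should land close to 36 (unbiased), and the spread of the sample means should be smaller than the spread of the sample medians (the mean is the more efficient of the two for a normal population).

```python
import random
import statistics

# Hypothetical population: normal with mean 36 and standard deviation 10.
MU, SIGMA, N, REPS = 36.0, 10.0, 25, 2000
rng = random.Random(42)  # fixed seed so the run is repeatable

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Unbiased: the average of the sample means sits close to MU.
print("average of sample means  :", round(statistics.mean(means), 3))
# Efficient: the standard error (spread) of the mean is smaller than the median's.
print("standard error of mean   :", round(statistics.stdev(means), 3))  # near SIGMA / sqrt(N) = 2
print("standard error of median :", round(statistics.stdev(medians), 3))
```

Because the results come from random sampling, the printed numbers vary slightly with the seed; the ordering (median spread larger than mean spread) is what the example is meant to show.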

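Point Estimate

Estimate of Mean

Sample Mean

$\bar{x} = \dfrac{\sum x}{n}$

Estimate of Standard Deviation (via the sample variance)

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

We cannot use the statistic

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n}$

because it can be shown to be biased: on average it underestimates the population variance!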
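A deterministic check of the bias claim, as our own illustration: take a tiny made-up population {1, 2, 3} and list every possible sample of size 2 drawn with replacement (all equally likely). Averaging over all samples, the divide-by-(n - 1) statistic recovers the population variance exactly, while the divide-by-n statistic comes out low by the factor (n - 1)/n. Exact fractions are used so no rounding hides the result.

```python
from fractions import Fraction
from itertools import product

def sample_mean(xs):
    return Fraction(sum(xs), len(xs))

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs)

def var_n_minus_1(xs):
    return sum_sq_dev(xs) / (len(xs) - 1)

def var_n(xs):
    return sum_sq_dev(xs) / len(xs)

population = [1, 2, 3]          # hypothetical population
sigma2 = var_n(population)      # population variance divides by N

n = 2
samples = list(product(population, repeat=n))   # all 9 equally likely samples
avg_unbiased = sum(var_n_minus_1(s) for s in samples) / len(samples)
avg_biased = sum(var_n(s) for s in samples) / len(samples)

print(sigma2, avg_unbiased, avg_biased)   # 2/3 2/3 1/3
```

The biased version averages to 1/3, which is exactly (n - 1)/n = 1/2 of the true variance 2/3.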

Where are the errors here?

Potential number of patrons at a very popular musical concert that is always sold out

Estimator: average number of tickets sold

Telephone calls are billed by whole minutes even if the duration is a fraction. What is the average length of a call?

Estimator: average billing for all calls made over a day / rate per minute
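One way to see the telephone-call error is to compute both numbers on a handful of calls. The durations and the per-minute rate below are invented for illustration. Because every started minute is charged, dividing the average bill by the rate gives the average *billed* minutes, which overstates the true average length.

```python
import math

# Hypothetical call durations in minutes (invented for illustration).
durations = [0.5, 1.2, 2.7, 3.0, 4.1]
RATE = 2.0  # hypothetical charge per billed minute

true_mean = sum(durations) / len(durations)
bills = [RATE * math.ceil(d) for d in durations]   # every started minute is charged
estimate = (sum(bills) / len(bills)) / RATE        # the slide's estimator

print(f"true mean length : {true_mean:.2f} min")   # 2.30
print(f"estimator        : {estimate:.2f} min")    # 2.80
```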

Interval Estimate

An interval estimate describes a range of values within which a population parameter is likely to lie

Consider an interval estimate for the mean

Start with a point estimate

Find the likely error of this estimate

Standard error is standard deviation of the estimator

Make an interval estimate

Defined in terms of the estimate and the standard error

Find the probability that the population mean will fall in this interval estimate

Example

Estimate the average battery life of a car in months

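From a sample size of 200 we get $\bar{x} = 36$

Standard error of sample mean

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{200}} = 0.707$, assuming $\sigma = 10$

Now we can make interval estimates like

$\bar{x} - \sigma_{\bar{x}} < \mu < \bar{x} + \sigma_{\bar{x}} \Rightarrow 35.293 < \mu < 36.707$

$\bar{x} - 2\sigma_{\bar{x}} < \mu < \bar{x} + 2\sigma_{\bar{x}} \Rightarrow 34.586 < \mu < 37.414$

$\bar{x} - 3\sigma_{\bar{x}} < \mu < \bar{x} + 3\sigma_{\bar{x}} \Rightarrow 33.879 < \mu < 38.121$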
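A short sketch reproducing the arithmetic above. The values n = 200, x̄ = 36 and the assumed σ = 10 are the slide's; the code only computes the standard error and the ±1, ±2 and ±3 standard-error intervals.

```python
import math

n, x_bar, sigma = 200, 36.0, 10.0   # battery example (sigma assumed to be 10)
se = sigma / math.sqrt(n)           # standard error of the sample mean

print(f"standard error = {se:.3f}")  # 0.707
for k in (1, 2, 3):
    lower, upper = x_bar - k * se, x_bar + k * se
    print(f"{k} SE: {lower:.3f} < mu < {upper:.3f}")
```

The three printed intervals are 35.293 to 36.707, 34.586 to 37.414 and 33.879 to 38.121.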

Back to Probability

The sampling distribution of the mean is also normal, with

Mean = $\mu$, estimated here by 36.0

Standard Deviation (Standard Error) = 0.707

So the probability that an interval built this way contains the real mean is known!

±1 standard error: 68.3%

±2 standard errors: 95.5%

±3 standard errors: 99.7%

Interval Estimate of Mean

Probabilities are as follows

68.3% => $35.293 < \mu < 36.707$

95.5% => $34.586 < \mu < 37.414$

99.7% => $33.879 < \mu < 38.121$

Note that these probabilities are odd, fractional numbers ...

So how can we have simpler probabilities like 50% or 90% probability ?

Confidence Interval

We observe that in a normal distribution

90% of the values lie within $1.64\sigma$ of the mean

99% of the values lie within $2.58\sigma$ of the mean

So we redefine our interval estimates as

90% => $34.84 < \mu < 37.16$

99% => $34.18 < \mu < 37.82$
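The link between a confidence level and its multiplier can be computed rather than memorised. This sketch uses `statistics.NormalDist` from Python's standard library (3.8 or later) with the battery example's x̄ = 36 and standard error 0.707. It first prints the coverage of ±1, ±2 and ±3 standard errors, then finds z for 90% and 99% by leaving half of the remaining probability in each tail.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1
x_bar, se = 36.0, 10 / 200 ** 0.5    # battery example

# Coverage of +/- k standard errors
for k in (1, 2, 3):
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"+/-{k} SE covers {coverage:.2%}")

# Multiplier z for a chosen confidence level: (1 - level)/2 is left in each tail
intervals = {}
for level in (0.90, 0.99):
    z = std_normal.inv_cdf(0.5 + level / 2)
    intervals[level] = (x_bar - z * se, x_bar + z * se)
    print(f"{level:.0%}: z = {z:.3f}, {intervals[level][0]:.2f} < mu < {intervals[level][1]:.2f}")
```

The unrounded multipliers are 1.645 and 2.576 (the slides round them to 1.64 and 2.58), and ±2 standard errors cover 95.45% (rounded to 95.5% on the slides); the resulting intervals agree with the slide to two decimals.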

Confidence Intervals

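Original Limits

68.3% ($\pm 1\sigma_{\bar{x}}$) => $35.293 < \mu < 36.707$

95.5% ($\pm 2\sigma_{\bar{x}}$) => $34.586 < \mu < 37.414$

99.7% ($\pm 3\sigma_{\bar{x}}$) => $33.879 < \mu < 38.121$

More convenient limits

90% ($\pm 1.64\sigma_{\bar{x}}$) => $34.84 < \mu < 37.16$

99% ($\pm 2.58\sigma_{\bar{x}}$) => $34.18 < \mu < 37.82$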

Is a higher confidence level always better?

I am 99.999% sure that the average age of this class lies between 1 year and 50 years

Does this really help you in any way?

I am 95% sure that the average age of this class lies between 23 and 26 years

This gives a far better idea of where the average age of the class lies

The second statement is more useful than the first, even though its confidence level is lower

95% confident that the mean battery life lies between 30 and 42 months

Does NOT mean that

There is 95% probability that the mean life of all our batteries falls within the interval established from this one sample

It DOES mean that

If we select many random samples of the same size and calculate a confidence interval for each of these samples then 95% of these intervals will contain the population mean
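The "many samples" interpretation can be checked by simulation. This is our own sketch under assumed values (a normal population with μ = 36 and a known σ = 10, samples of 30, 4000 repetitions, fixed seed): it builds a 95% interval from each sample and counts how many of those intervals contain the true mean. Expect a proportion near 95%, not exactly 95%.

```python
import random
from statistics import NormalDist, mean

# Hypothetical population (not from the slides): normal, mean 36, SD 10.
MU, SIGMA, N, TRIALS = 36.0, 10.0, 30, 4000
z = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
se = SIGMA / N ** 0.5             # sigma treated as known
rng = random.Random(7)            # fixed seed for a repeatable run

hits = 0
for _ in range(TRIALS):
    x_bar = mean(rng.gauss(MU, SIGMA) for _ in range(N))
    if x_bar - z * se < MU < x_bar + z * se:   # does THIS interval contain the true mean?
        hits += 1

print(f"{hits / TRIALS:.1%} of the {TRIALS} intervals contain mu")
```

Any single interval either contains μ or it does not; the 95% describes the procedure across repeated samples.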

Calculation of Confidence Interval
Example

A large automotive parts wholesaler needs an estimate of the mean life that he can expect from a windshield wiper under normal driving conditions

It is known that the standard deviation of the population life is 6 months

Observations from one simple random sample of 100 blades are as follows

Sample Size

$n = 100$

Sample Mean

$\bar{x} = 21$ months

Population standard deviation

$\sigma = 6$ months

95% Confidence Interval

Standard Error

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{6}{\sqrt{100}} = 0.6$ months

A 95% confidence level includes 47.5% on each side of the mean

The sample size is > 30, so we can assume that the sample mean follows (approximately) a normal distribution

In a normal distribution

95% of values lie within 1.96 times the standard deviation

So 95% of values of the sample mean lie within 1.96 times the standard error

Upper Confidence Limit

$\bar{x} + 1.96\,\sigma_{\bar{x}} = 21 + 1.96(0.6) = 22.18$ months

Lower Confidence Limit

$\bar{x} - 1.96\,\sigma_{\bar{x}} = 21 - 1.96(0.6) = 19.82$ months
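The wiper-blade calculation as a short sketch, using the slide's n = 100, x̄ = 21 months and σ = 6 months. Instead of the table value 1.96 it asks `statistics.NormalDist` for the point that leaves 2.5% in the upper tail (47.5% on each side of the mean), which is 1.95996...

```python
from statistics import NormalDist

n, x_bar, sigma = 100, 21.0, 6.0   # wiper-blade example
se = sigma / n ** 0.5              # 0.6 months
z = NormalDist().inv_cdf(0.975)    # 47.5% on each side of the mean -> about 1.96

lcl, ucl = x_bar - z * se, x_bar + z * se
print(f"95% CI: {lcl:.2f} to {ucl:.2f} months")   # 19.82 to 22.18
```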

Two major assumptions

The standard deviation of the population is known

In practice it is often not known

The sampling distribution follows the normal distribution

This is usually justified when the sample size is more than 30 (central limit theorem), or when the population itself is normal

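Standard deviation is not known

When Standard Deviation is Known

Standard Error of the sample mean

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

When Standard Deviation is not Known

Estimated Standard Error of the sample mean

$\hat{\sigma}_{\bar{x}} = \dfrac{\hat{\sigma}}{\sqrt{n}}$, where $\hat{\sigma} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$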
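A sketch of the estimated standard error when σ is unknown, on a made-up sample of ten blade lives (the data are hypothetical). `statistics.stdev` divides by n - 1, which matches the estimator of σ used here; `statistics.pstdev` would divide by n instead.

```python
import statistics

# Hypothetical sample of blade lives in months (invented for illustration).
sample = [18, 22, 20, 25, 19, 21, 23, 17, 24, 21]
n = len(sample)

sigma_hat = statistics.stdev(sample)   # divides by n - 1
se_hat = sigma_hat / n ** 0.5          # estimated standard error of the mean

print(f"sigma_hat = {sigma_hat:.3f}, estimated SE = {se_hat:.3f}")   # 2.582, 0.816
```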

How do we get this interval?

[Usually] we are trying to estimate the population mean $\mu$

We have an estimator E which [in most cases] is the sample mean

E follows a distribution that has mean $\mu$ and standard error $\sigma_E$

We create a statistic $Q = (E - \mu)/\sigma_E$

Q follows some known distribution (normal? t?)

We identify two positive values $Q_1, Q_2$ such that the probability of Q falling between $-Q_2$ and $Q_1$ is equal to the required confidence P

The interval is then $E - Q_1\sigma_E < \mu < E + Q_2\sigma_E$

What is our goal ?

What is known ?

$E$, $\sigma_E$, $P$

What is to be calculated

$Q_1$, $Q_2$

What is the objective

To be confident that

the probability of $\mu$

lying between $E - Q_1\sigma_E$ and $E + Q_2\sigma_E$

is equal to P

What are the steps

Identify an estimator

What distribution does the estimator follow ?

Is the standard deviation known ?

If not, what is the estimator for the standard deviation?

Is the sample size big enough ?

Get a value of the estimate

Get a value of the standard error for the estimator

Set an appropriate confidence level in terms of probability

From the graph / table of the sampling distribution, get the upper and lower limits in terms of the estimate and the standard error
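The steps above can be sketched as one generic function. This is our own helper (the name `interval` and its parameters are hypothetical): it takes the estimate E, its standard error, the confidence P and the inverse CDF of the standardized statistic Q = (E - μ)/σ_E. Assuming equal tails, Q1 is taken as the upper α/2 quantile of Q and Q2 as the magnitude of the lower α/2 quantile, which gives the interval E - Q1σ_E < μ < E + Q2σ_E. For a symmetric distribution such as the normal, Q1 = Q2.

```python
from statistics import NormalDist

def interval(estimate, se, confidence, inv_cdf):
    """Return (lower, upper) with estimate - q1*se < mu < estimate + q2*se.

    inv_cdf is the inverse CDF of the standardized statistic Q = (E - mu) / se.
    Equal tails are assumed: alpha/2 of the probability lies outside on each side.
    """
    alpha = 1 - confidence
    q1 = inv_cdf(1 - alpha / 2)   # upper quantile of Q
    q2 = -inv_cdf(alpha / 2)      # magnitude of the lower quantile of Q
    return estimate - q1 * se, estimate + q2 * se

# Battery example: E = 36, standard error 0.707, normal sampling distribution, 90% confidence.
low, high = interval(36.0, 10 / 200 ** 0.5, 0.90, NormalDist().inv_cdf)
print(f"{low:.2f} < mu < {high:.2f}")   # 34.84 < mu < 37.16
```

Passing a different inverse CDF (for example one for the t distribution) changes only the multipliers, not the construction.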

Confidence Intervals Revisited

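Original Limits

68.3% ($\pm 1\sigma_{\bar{x}}$) => $35.293 < \mu < 36.707$

95.5% ($\pm 2\sigma_{\bar{x}}$) => $34.586 < \mu < 37.414$

99.7% ($\pm 3\sigma_{\bar{x}}$) => $33.879 < \mu < 38.121$

More convenient limits

90% ($\pm 1.64\sigma_{\bar{x}}$) => $34.84 < \mu < 37.16$

99% ($\pm 2.58\sigma_{\bar{x}}$) => $34.18 < \mu < 37.82$

Point to note ...

The centre ($\bar{x} = 36$) and the standard error ($\sigma_{\bar{x}} = 0.707$) come from the estimate

How do we connect the probabilities

68.3%, 90%, 95.5%, 99%, 99.7%

with the multipliers 1, 1.64, 2, 2.58, 3?

By looking at the probability distribution function of the estimator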

Sample Size in Estimation

With a known population standard deviation, how large should the sample be so that the confidence interval is adequately narrow?
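The slide leaves this question open. One standard answer, added here as our own sketch: with σ known and a normal sampling distribution, the half-width of the interval is zσ/√n, so the smallest sample giving a half-width of at most E is n = ⌈(zσ/E)²⌉. The function name and the ±1 month target are hypothetical; σ = 6 months is borrowed from the wiper example.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin (sigma known, normal sampling distribution)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical target: wiper blades (sigma = 6 months), 95% confidence, within +/- 1 month.
print(required_sample_size(6.0, 1.0, 0.95))   # 139
```

Halving the margin roughly quadruples the required sample, because n grows with the square of zσ/E.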

Which distribution does the estimator follow ?

So far, and usually, we assume that the estimator follows the normal distribution

That is how we get

68.3% => $1.0\sigma$

90.0% => $1.64\sigma$

95.5% => $2.0\sigma$

99.0% => $2.58\sigma$

99.7% => $3.00\sigma$

The Student's t distribution

Used when

Standard deviation of the population is NOT known AND

Sample size is less than 30

When this happens we cannot use the Normal distribution but must look up the tables for the T distribution

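t-distribution instead of normal

Spreadsheet formulas used for the critical values:

z: =NORMSINV(D5+0.5), the inverse of the cumulative standard normal distribution

t: =TINV($F5;G$2), the two-tailed inverse of the t distribution for a given area outside the limits and degrees of freedom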
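Without a spreadsheet, the two-tailed t values that TINV returns can be approximated in pure Python. This is our own sketch, not the slides' method: it integrates the Student t density with Simpson's rule and bisects for the cut-off t where the area outside ±t equals α. The function names `t_density`, `central_area` and `t_critical` are hypothetical. For α = 0.05 it should reproduce the t-table values 2.776, 2.262, 2.093 and 2.045 for 4, 9, 19 and 29 degrees of freedom, and approach z = 1.96 as the degrees of freedom grow.

```python
import math
from statistics import NormalDist

def t_density(df):
    """Return the probability density function of Student's t with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, pdf, steps=2000):
    """P(-t < T < t): Simpson's rule on [0, t], doubled because the density is symmetric."""
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value: the t with P(|T| > t) = alpha."""
    pdf = t_density(df)
    lo, hi = 0.0, 100.0
    for _ in range(40):               # bisection; the central area grows with t
        mid = (lo + hi) / 2
        if central_area(mid, pdf) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (4, 9, 19, 29):
    print(df, round(t_critical(0.05, df), 3))
print("z", round(NormalDist().inv_cdf(1 - 0.05 / 2), 3))
```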

Usage of t-table

The probability used to look up a t-table

IS NOT the probability that the estimated value will fall inside the confidence interval

INSTEAD

It is the probability that the estimated value will fall OUTSIDE the confidence interval (both tails combined)

This probability is denoted $\alpha$

Confidence $= 1 - \alpha$

Degrees of Freedom

= sample size - 1
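A small-sample sketch with σ unknown, on five hypothetical battery lives (invented data). With n = 5 there are 4 degrees of freedom, and for α = 0.05 the t-table value is 2.776. The sample happens to give an estimated standard error of 0.707, the same as the battery example, so the interval is visibly wider than the normal-based ±1.96 × 0.707 = ±1.39.

```python
import statistics

# Hypothetical small sample of battery lives in months (n = 5, sigma unknown).
sample = [34, 38, 35, 37, 36]
n = len(sample)
df = n - 1        # degrees of freedom = 4
alpha = 0.05      # area OUTSIDE the interval -> 95% confidence
t = 2.776         # t-table value for alpha = 0.05, df = 4

x_bar = statistics.mean(sample)
se_hat = statistics.stdev(sample) / n ** 0.5   # estimated standard error (n - 1 divisor)
lower, upper = x_bar - t * se_hat, x_bar + t * se_hat
print(f"{1 - alpha:.0%} CI (df = {df}): {lower:.2f} < mu < {upper:.2f}")   # 34.04 < mu < 37.96
```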

Binomial Distribution / Proportions

We have a binomial distribution with p as the success probability

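20% of the student population are engineers

45% of the employee population are married

We need to have an estimate of p

The estimator is the proportion $\bar{p}$ from the sample

Assumptions

The estimator follows the normal distribution

$\mu = np$ for the number of successes, so the sample proportion $\bar{p}$ has mean $p$

Standard error of estimate

$\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}}$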

Example

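Sample size $n = 75$

Fraction graduate

$\bar{p} = 0.4$

Fraction not graduate

$\bar{q} = 0.6$

Estimate of p = 0.4

Standard Error for Estimator

$\hat{\sigma}_{\bar{p}} = \sqrt{\dfrac{\bar{p}\bar{q}}{n}} = \sqrt{\dfrac{0.4 \times 0.6}{75}} = 0.057$

99% confidence interval

$z = 2.58$

LCL

$0.4 - 2.58 \times 0.057 = 0.253$

UCL

$0.4 + 2.58 \times 0.057 = 0.547$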
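The proportion example as a short sketch, with the slide's n = 75, p̄ = 0.4 and z = 2.58. The code keeps the standard error unrounded (0.05657...), so its limits come out as 0.254 and 0.546; the slide's 0.253 and 0.547 come from rounding the standard error to 0.057 before multiplying.

```python
import math

n, p_bar = 75, 0.4                   # sample size and observed fraction of graduates
q_bar = 1 - p_bar
se = math.sqrt(p_bar * q_bar / n)    # standard error of the sample proportion
z = 2.58                             # 99% confidence (normal approximation)

lcl, ucl = p_bar - z * se, p_bar + z * se
print(f"SE = {se:.3f}; 99% CI: {lcl:.3f} to {ucl:.3f}")   # SE = 0.057; 99% CI: 0.254 to 0.546
```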

[Figure: normal probability density of the sampling distribution of the mean in the battery example (mean 36.0, standard error 0.707), plotted for values from 33.55 to 38.45]

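Critical values for two-sided confidence limits (normal z and Student's t)

Area within limits | Area outside limits (α) | Half of area within | z | t (df = 4) | t (df = 9) | t (df = 19) | t (df = 29)
------------------ | ----------------------- | ------------------- | ---- | ---------- | ---------- | ----------- | -----------
0.85 | 0.15 | 0.43 | 1.44 | 1.778 | 1.574 | 1.500 | 1.479
0.86 | 0.14 | 0.43 | 1.48 | 1.838 | 1.619 | 1.540 | 1.517
0.87 | 0.13 | 0.44 | 1.51 | 1.902 | 1.666 | 1.583 | 1.558
0.88 | 0.12 | 0.44 | 1.55 | 1.971 | 1.718 | 1.628 | 1.602
0.89 | 0.11 | 0.45 | 1.60 | 2.048 | 1.773 | 1.677 | 1.649
0.90 | 0.10 | 0.45 | 1.64 | 2.132 | 1.833 | 1.729 | 1.699
0.91 | 0.09 | 0.46 | 1.70 | 2.226 | 1.899 | 1.786 | 1.754
0.92 | 0.08 | 0.46 | 1.75 | 2.333 | 1.973 | 1.850 | 1.814
0.93 | 0.07 | 0.47 | 1.81 | 2.456 | 2.055 | 1.920 | 1.881
0.94 | 0.06 | 0.47 | 1.88 | 2.601 | 2.150 | 2.000 | 1.957
0.95 | 0.05 | 0.48 | 1.96 | 2.776 | 2.262 | 2.093 | 2.045
0.96 | 0.04 | 0.48 | 2.05 | 2.999 | 2.398 | 2.205 | 2.150
0.97 | 0.03 | 0.49 | 2.17 | 3.298 | 2.574 | 2.346 | 2.282
0.98 | 0.02 | 0.49 | 2.33 | 3.747 | 2.821 | 2.539 | 2.462
0.99 | 0.01 | 0.50 | 2.58 | 4.604 | 3.250 | 2.861 | 2.756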


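Examples of estimators and estimates

Population of interest | Parameter of interest | Sample statistic used as estimator | Estimate made
---------------------- | --------------------- | ---------------------------------- | -------------
Production at factory | Annual production | Production in one month | 2000 t/year
Candidates for employment | Average age | Average age of every tenth applicant | 26 y 3 m
Students in engineering | Proportion of women | Proportion of women in a sample of 100 students | 24.50%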
