
4/28/2014


Statistical Intervals for a Single Sample

Learning Theory

Ferdowsi University of Mashhad, Faculty of Engineering

Reza Monsefi

Lecture 25


Outline

Confidence Interval on the Mean of a Normal Distribution, Variance Known
    Development of the Confidence Interval and Its Basic Properties
    Choice of Sample Size
    One-Sided Confidence Bounds
    General Method to Derive a Confidence Interval
    A Large-Sample Confidence Interval for µ
    Bootstrap Confidence Intervals

Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
    The t Distribution
    Development of the t Distribution
    The t Confidence Interval on µ

Confidence Interval on the Variance and Standard Deviation of a Normal Population

Large-Sample Confidence Interval for a Population Proportion

Prediction Interval for a Future Observation

Tolerance Intervals for a Normal Distribution


Introduction

Recap: The previous lesson showed how a parameter can be estimated from sample data.

Note: It is important to understand how good the obtained estimate is.

Example: Suppose the estimate of the mean of a parameter is $\hat{\mu} = \bar{x} = 1000$.

Sampling variability: It is almost never the case that $\mu = \bar{x}$. The point estimate says nothing about how close $\hat{\mu}$ is to µ.

Is the process mean likely to be between 900 and 1100? Or is it likely to be between 990 and 1010?

Note: The answers to these questions affect our decisions regarding this process.

Interval estimate: Bounds that represent an interval of plausible values for a parameter are an example of an interval estimate. Surprisingly, it is easy to determine such intervals in many cases, and the same data that provided the point estimate are typically used.

Confidence interval: An interval estimate for a population parameter is called a confidence interval (CI).


We cannot be certain that the interval contains the true, unknown population parameter, because only a sample from the full population is used to compute the point estimate and the interval. However, the confidence interval is constructed so that we have high confidence that it does contain the unknown population parameter. Confidence intervals are widely used in engineering and the sciences.

Tolerance interval: Another important type of interval estimate.

Assumption: The data for the estimated parameter might be assumed to be normally distributed. We might like to calculate limits that bound 95% of the parameter values.

For a normal distribution, we know that 95% of the distribution is in the interval

$$\mu - 1.96\sigma,\ \mu + 1.96\sigma$$


This is not a useful tolerance interval, because the parameters µ and σ are unknown. Point estimates such as $\bar{x}$ for µ and s for σ can be used. However, we need to account for the potential error in each point estimate, $\bar{x}$ and s, to form a tolerance interval for the distribution.

The result is an interval of the form

$$\bar{x} - ks,\ \bar{x} + ks$$

where k is an appropriate constant (larger than 1.96, to account for the estimation error).

As with a confidence interval, it is not certain that this interval bounds 95% of the distribution, but the interval is constructed so that we have high confidence that it does.

Tolerance intervals are widely used and, as we will subsequently see, they are easy to calculate for normal distributions.


Three types of interval estimates:

1) Confidence interval: bounds population or distribution parameters (such as the mean).

2) Tolerance interval: bounds a selected proportion of a distribution.

3) Prediction interval: bounds future observations from the population or distribution.


1) Confidence intervals and 2) tolerance intervals bound unknown elements of a distribution.

3) A prediction interval provides bounds on one (or more) future observations from the population.

For example, a prediction interval could be used to bound a single new measurement of the quantity of interest.

With a large sample size, the prediction interval for normally distributed data tends to the tolerance interval, but for more modest sample sizes the prediction and tolerance intervals are different.


Confidence Interval on the Mean of a Normal Distribution, Variance Known

Simple situation: The basic ideas of a confidence interval (CI) are most easily understood by initially considering a simple situation.

Suppose that we have a normal population with unknown mean µ and known variance σ².

This is a somewhat unrealistic scenario, because typically we know the distribution mean before we know the variance.

However, confidence intervals for more general situations (based on the t distribution) are presented later.


Development of the Confidence Interval and Its Basic Properties

Suppose X1, X2, ..., Xn is a random sample from a normal distribution with unknown mean µ and known variance σ².

The sample mean $\bar{X}$ is normally distributed with mean µ and variance σ²/n. We may standardize $\bar{X}$ by 1) subtracting the mean and 2) dividing by the standard deviation, which results in the variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Z has a standard normal distribution.

A confidence interval estimate for µ is an interval of the form $l \le \mu \le u$, where the end-points l and u are computed from the sample data.

Because different samples will produce different values of l and u, these end-points are values of random variables L and U, respectively. Suppose that we can determine values of L and U such that the following probability statement is true:

$$P\{L \le \mu \le U\} = 1 - \alpha, \qquad 0 \le \alpha \le 1$$


There is a probability of $1 - \alpha$ of selecting a sample for which the CI will contain the true value of µ. Once we have selected the sample, so that X1 = x1, X2 = x2, ..., Xn = xn, and computed l and u, the resulting confidence interval for µ is

$$l \le \mu \le u$$

The end-points or bounds l and u are called the lower- and upper-confidence limits, respectively, and $1 - \alpha$ is called the confidence coefficient.

In our problem situation, because $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution, we may write

$$P\left\{-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$$

Manipulate the quantities inside the brackets by (1) multiplying through by $\sigma/\sqrt{n}$, (2) subtracting $\bar{X}$ from each term, and (3) multiplying through by -1.

This results in

$$P\left\{\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$


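From consideration of the lower and upper limits of the inequalities in this probability statement, $\bar{X} - z_{\alpha/2}\sigma/\sqrt{n}$ and $\bar{X} + z_{\alpha/2}\sigma/\sqrt{n}$ are the lower- and upper-confidence limits L and U, respectively.

This leads to the following definition.

Definition: Confidence Interval on the Mean, Variance Known. If $\bar{x}$ is the sample mean of a random sample of size n from a normal population with known variance σ², a 100(1 - α)% CI on µ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.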
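As a minimal sketch of the two-sided interval on the mean with known variance, the following Python uses only the standard library. The function name `z_confidence_interval` and the inputs (sample mean 64.46, σ = 1, n = 10) are illustrative assumptions and are not given on the slide; they were chosen because they reproduce the 95% interval quoted in the impact energy discussion.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% CI on the mean when sigma is known."""
    # z_{alpha/2}: upper alpha/2 percentage point of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs, assumed for illustration only.
lower, upper = z_confidence_interval(xbar=64.46, sigma=1.0, n=10, alpha=0.05)
print(f"{lower:.2f} <= mu <= {upper:.2f}")  # prints "63.84 <= mu <= 65.08"
```

The half-width depends on the data only through n, so for known σ every sample of the same size yields an interval of the same length, centered on its own sample mean.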


Example:


Interpreting a Confidence Interval

How does one interpret a confidence interval? In the impact energy estimation problem, the 95% CI is $63.84 \le \mu \le 65.08$, so it is tempting to conclude that µ is within this interval with probability 0.95.

However, with a little reflection, it is easy to see that this cannot be correct: the true value of µ is unknown, and the statement $63.84 \le \mu \le 65.08$ is either correct (true with probability 1) or incorrect (false with probability 1).

The correct interpretation lies in the realization that a CI is a random interval, because in the probability statement defining the end-points of the interval, $P\{L \le \mu \le U\} = 1 - \alpha$, L and U are random variables.

Consequently, the correct interpretation of a 100(1 - α)% CI depends on the relative frequency view of probability.


Specifically, if an infinite number of random samples are collected and a 100(1 - α)% confidence interval for µ is computed from each sample, 100(1 - α)% of these intervals will contain the true value of µ.

Figure: Repeated construction of a confidence interval for µ. Several 100(1 - α)% confidence intervals for the mean µ of a normal distribution are shown.

Dots at the center of the intervals indicate the point estimate of µ (that is, $\bar{x}$).

Notice that one of the intervals fails to contain the true value of µ. If this were a 95% confidence interval, in the long run only (100% - 95%) = 5% of the intervals would fail to contain µ.
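The relative-frequency interpretation can be checked with a small simulation. The sketch below is an assumption-laden illustration, not part of the lecture: the function name `coverage`, the true mean 50, σ = 2, n = 10, and the number of trials are all hypothetical choices. It repeatedly draws normal samples, builds the known-variance interval from each, and reports the fraction of intervals that contain the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # One random sample of size n and its sample mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Hypothetical settings; the result should be close to 1 - alpha = 0.95.
print(coverage(mu=50.0, sigma=2.0, n=10, alpha=0.05, trials=5000))
```

Any single simulated interval either contains µ or does not; only the long-run fraction approaches 1 - α.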


In practice, we obtain only one random sample and calculate one confidence interval.

Since this interval either will or will not contain the true value of µ, it is not reasonable to attach a probability level to this specific event. The appropriate statement is that the observed interval [l, u] brackets the true value of µ with confidence 100(1 - α)%.

This statement has a frequency interpretation; that is, we don’t know if the statement is true for this specific sample, but the method used to obtain the interval [l, u] yields correct statements 100(1 - α)% of the time.


Confidence Level and Precision of Estimation

Our choice of the 95% level of confidence was essentially arbitrary. What would have happened if we had chosen a higher level of confidence, say, 99%? In fact, doesn’t it seem reasonable that we would want the higher level of confidence?

At α = 0.01, we find $z_{\alpha/2} = z_{0.005} = 2.58$, while for α = 0.05, $z_{0.025} = 1.96$. Thus, the length of the 95% CI is

$$2\left(1.96\,\sigma/\sqrt{n}\right) = 3.92\,\sigma/\sqrt{n}$$

whereas the length of the 99% CI is

$$2\left(2.58\,\sigma/\sqrt{n}\right) = 5.16\,\sigma/\sqrt{n}$$

Thus, the 99% CI is longer than the 95% CI. This is why we have a higher level of confidence in the 99% confidence interval.

Generally, for a fixed sample size n and standard deviation σ, the higher the confidence level, the longer the resulting CI.
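The trade-off between confidence level and interval length can be tabulated directly. This is a small sketch under the known-variance assumption; the function name `ci_length` is hypothetical, and σ = 1, n = 1 are chosen only so that the printed length equals 2 z_{α/2}.

```python
from math import sqrt
from statistics import NormalDist


def ci_length(sigma, n, confidence):
    """Length 2 * z_{alpha/2} * sigma / sqrt(n) of the two-sided z interval."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sigma / sqrt(n)


# With sigma = 1 and n = 1 the length is simply 2 * z_{alpha/2}.
for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(ci_length(sigma=1.0, n=1, confidence=confidence), 2))
```

The 99% value printed here is about 5.15 rather than 5.16, because the code uses the unrounded quantile (about 2.576) instead of the rounded 2.58.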


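One-Sided Confidence Bounds

The confidence interval developed above gives both a lower confidence bound and an upper confidence bound for µ. Thus it provides a two-sided CI.

It is also possible to obtain one-sided confidence bounds for µ by setting either $l = -\infty$ or $u = \infty$ and replacing $z_{\alpha/2}$ by $z_\alpha$.

Definition: One-Sided Confidence Bounds on the Mean, Variance Known. A 100(1 - α)% upper-confidence bound for µ is

$$\mu \le u = \bar{x} + z_\alpha \sigma/\sqrt{n}$$

and a 100(1 - α)% lower-confidence bound for µ is

$$\bar{x} - z_\alpha \sigma/\sqrt{n} = l \le \mu$$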
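A minimal sketch of the one-sided bounds follows, assuming the standard form in which z_{α/2} is replaced by z_α. The function name `one_sided_bounds` and the inputs (sample mean 64.46, σ = 1, n = 10) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_bounds(xbar, sigma, n, alpha=0.05):
    """Return (lower bound l, upper bound u), each at level 100(1 - alpha)%."""
    # One-sided bounds use z_alpha, not z_{alpha/2}.
    z = NormalDist().inv_cdf(1 - alpha)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical inputs; each bound is meant to be used on its own.
l, u = one_sided_bounds(xbar=64.46, sigma=1.0, n=10, alpha=0.05)
print(f"lower bound: {l:.2f} <= mu")
print(f"upper bound: mu <= {u:.2f}")
```

The two returned values are separate statements: reporting both together as an interval would give a 100(1 - 2α)% two-sided interval, not a 100(1 - α)% one.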

Figure: Error in estimating µ with $\bar{x}$.


General Method to Derive a Confidence Interval


A Large-Sample Confidence Interval for µ

Definition: Large-Sample Confidence Interval on the Mean


Confidence Interval on the Mean of a Normal Distribution, Variance Unknown

With known σ², we constructed confidence intervals on the mean of a normal population using the procedure

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

This CI is also approximately valid (because of the central limit theorem) regardless of whether or not the underlying population is normal, so long as n is reasonably large (n ≥ 40). We can even handle the case of unknown variance in the large-sample-size situation.

However, what if the sample is small and σ² is unknown? Then we must make an assumption about the form of the underlying distribution to obtain a valid CI procedure. A reasonable assumption in many cases is that the underlying distribution is normal.


Many populations encountered in practice are well approximated by the normal distribution, so this assumption will lead to confidence interval procedures of wide applicability. In fact, moderate departure from normality will have little effect on validity.

When the assumption is unreasonable, an alternative is to use nonparametric procedures that are valid for any underlying distribution.

Suppose that the population of interest has a normal distribution with unknown mean µ and unknown variance σ².

Assume that a random sample of size n, say X1, X2, ..., Xn, is available, and let $\bar{X}$ and S² be the sample mean and variance, respectively.


We wish to construct a two-sided CI on µ. If the variance σ² is known, we know that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

has a standard normal distribution. When σ² is unknown, a logical procedure is to replace σ with the sample standard deviation S.

The random variable Z now becomes

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

A logical question is: what effect does replacing σ by S have on the distribution of the random variable T?

If n is large, the answer to this question is “very little,” and we can proceed to use the confidence interval based on the normal distribution as before.

However, n is usually small in most engineering problems, and in this situation a different distribution must be employed to construct the CI.


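The t Distribution

Definition: t Distribution. Let X1, X2, ..., Xn be a random sample from a normal distribution with unknown mean µ and unknown variance σ². The random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with n - 1 degrees of freedom. The t probability density function is

$$f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{[(x^2/k) + 1]^{(k+1)/2}}, \qquad -\infty < x < \infty$$

where k is the number of degrees of freedom. The mean and variance of the t distribution are zero and k/(k - 2) (for k > 2), respectively.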


Figure: Probability density functions of several t distributions.

Several t distributions are shown. The general appearance of the t distribution is similar to the standard normal distribution in that both distributions are symmetric and unimodal, and the maximum ordinate value is reached at the mean, µ = 0.


However, the t distribution has heavier tails than the normal; that is, it has more probability in the tails than the normal distribution. As the number of degrees of freedom k → ∞, the limiting form of the t distribution is the standard normal distribution. Generally, the number of degrees of freedom for t is the number of degrees of freedom associated with the estimated standard deviation.
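The heavier tails and the normal limit can be seen numerically. The sketch below assumes the standard t density with k degrees of freedom; the function names `t_pdf` and `normal_pdf` and the evaluation point x = 3 are illustrative choices, not part of the lecture.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    # lgamma keeps the ratio of gamma functions stable for large k.
    coef = exp(lgamma((k + 1) / 2) - lgamma(k / 2)) / sqrt(pi * k)
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)


# Compare the densities in the tail (x = 3) as k grows.
x = 3.0
for k in (1, 5, 30, 300):
    print(k, t_pdf(x, k), normal_pdf(x))
```

For small k the t density at x = 3 is several times the normal density, and the gap shrinks as k increases.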


Percentage points of the t distribution.


Development of the t Distribution (The t Confidence Interval on µ)


Definition: Confidence Interval on the Mean, Variance Unknown
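The body of this definition did not survive in the text, so the following sketch assumes the standard form built from the T statistic: the sample mean plus or minus t_{α/2, n-1} s/√n. Because the Python standard library has no t quantile function, the percentage point is found here by Simpson integration of the density plus bisection; this is an illustrative numerical shortcut, and all function names and the data values are hypothetical.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    coef = exp(lgamma((k + 1) / 2) - lgamma(k / 2)) / sqrt(pi * k)
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)


def t_cdf(x, k, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3


def t_quantile(p, k):
    """Quantile of the t distribution for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, k) < p:  # expand until the quantile is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(data, alpha=0.05):
    """Two-sided 100(1 - alpha)% CI on the mean when sigma is unknown."""
    n = len(data)
    xbar, s = mean(data), stdev(data)  # stdev uses the n - 1 divisor
    t = t_quantile(1 - alpha / 2, n - 1)
    half_width = t * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical measurements, for illustration only.
data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1]
print(t_confidence_interval(data, alpha=0.05))
```

In practice the percentage point would be read from a table of percentage points of the t distribution or taken from a statistics library; the numerical routine is here only to keep the example self-contained.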


Confidence Interval on the Variance and Standard Deviation of a Normal Population

Definition: $\chi^2$ (Chi-Square) Distribution (pronounced "kai-square")


Definition: Confidence Interval on the Variance

Definition: One-sided Confidence Bound on the Variance


Example: Detergent Filling


Large-Sample Confidence Interval for a Population Proportion

Definition: Normal Approximation for the Binomial Proportion


Definition: Approximate Confidence Interval on a Binomial Proportion
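The body of this definition is missing from the text, so the sketch below assumes the usual normal-approximation interval, the sample proportion plus or minus z_{α/2} times the square root of p̂(1 - p̂)/n. The function name and the counts (10 items out of 85) are hypothetical values used only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, alpha=0.05):
    """Approximate two-sided CI on a binomial proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 10 items of interest in a sample of 85.
lower, upper = proportion_confidence_interval(successes=10, n=85)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The approximation relies on n being large enough for the binomial to be close to normal, which is why this interval is presented as a large-sample procedure.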


Example: Crankshaft Bearing


One-Sided Confidence Bounds

Definition: Approximate One-Sided Confidence Bounds on a Binomial Proportion


Tolerance and Prediction Intervals

Prediction Interval for a Future Observation


Definition: Prediction Interval


Example: Alloy Adhesion


Tolerance Intervals for a Normal Distribution

Definition: Tolerance Interval


Example: Alloy Adhesion


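Student’s t-distribution

The conjugate prior for the precision of a Gaussian is given by a gamma distribution. If we have a univariate Gaussian $\mathcal{N}(x \mid \mu, \tau^{-1})$ together with a Gamma prior $\mathrm{Gam}(\tau \mid a, b)$ and we integrate out the precision, we obtain the marginal distribution of x in the form

$$p(x \mid \mu, a, b) = \int_0^\infty \mathcal{N}(x \mid \mu, \tau^{-1})\,\mathrm{Gam}(\tau \mid a, b)\,d\tau = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{1/2} \left[b + \frac{(x-\mu)^2}{2}\right]^{-a-1/2} \Gamma(a + 1/2)$$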


where we have made the change of variable $z = \tau[b + (x - \mu)^2/2]$.

By convention we define new parameters given by ν = 2a and λ = a/b, in terms of which the distribution p(x|µ, a, b) takes the form

$$\mathrm{St}(x \mid \mu, \lambda, \nu) = \frac{\Gamma(\nu/2 + 1/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2 - 1/2}$$

which is known as Student’s t-distribution. The parameter λ is sometimes called the precision of the t-distribution, even though it is not in general equal to the inverse of the variance. The parameter ν is called the degrees of freedom (df or DoF). For the particular case of ν = 1, the t-distribution reduces to the Cauchy distribution, while in the limit ν → ∞ the t-distribution St(x|µ, λ, ν) becomes a Gaussian N(x|µ, λ⁻¹) with mean µ and precision λ.
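The two special cases can be checked numerically. This sketch assumes the standard closed form of St(x|µ, λ, ν) in terms of the mean µ, precision parameter λ, and degrees of freedom ν; the function names and the test point are hypothetical. For ν = 1 the matching Cauchy distribution has location µ and scale 1/√λ.

```python
from math import exp, lgamma, pi, sqrt


def student_t_pdf(x, mu, lam, nu):
    """St(x | mu, lam, nu): mean mu, precision parameter lam, nu degrees of freedom."""
    log_ratio = lgamma(nu / 2 + 0.5) - lgamma(nu / 2)
    norm = exp(log_ratio) * sqrt(lam / (pi * nu))
    return norm * (1 + lam * (x - mu) ** 2 / nu) ** (-nu / 2 - 0.5)


def cauchy_pdf(x, mu, scale):
    """Cauchy density with location mu and the given scale."""
    return 1 / (pi * scale * (1 + ((x - mu) / scale) ** 2))


def gaussian_pdf(x, mu, lam):
    """N(x | mu, 1/lam), parameterized by the precision lam."""
    return sqrt(lam / (2 * pi)) * exp(-lam * (x - mu) ** 2 / 2)


x, mu, lam = 1.3, 0.5, 2.0  # arbitrary illustrative values
print(student_t_pdf(x, mu, lam, 1), cauchy_pdf(x, mu, 1 / sqrt(lam)))
print(student_t_pdf(x, mu, lam, 1e6), gaussian_pdf(x, mu, lam))
```

Each printed pair should agree closely: exactly (up to rounding) for the Cauchy case, and approximately for the large-ν Gaussian limit.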


We see that Student’s t-distribution is obtained by adding up an infinite number of Gaussian distributions having the same mean but different precisions.

This can be interpreted as an infinite mixture of Gaussians.

The result is a distribution that in general has longer ‘tails’ than a Gaussian, as was seen in the earlier figure of t densities. This gives the t-distribution an important property called robustness, which means that it is much less sensitive than the Gaussian to the presence of a few data points which are outliers. The robustness of the t-distribution is illustrated in the figure that follows, which compares the maximum likelihood solutions for a Gaussian and a t-distribution. Note that the maximum likelihood solution for the t-distribution can be found using the expectation-maximization (EM) algorithm. Here we see that the effect of a small number of outliers is much less significant for the t-distribution than for the Gaussian. Outliers can arise in practical applications either because the process that generates the data corresponds to a distribution having a heavy tail or simply through mislabelled data.
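The infinite-mixture view can be sketched by sampling: draw a precision τ from a Gamma prior, then draw x from a Gaussian with that precision. The code below is an illustration under assumed values (a = b = 5, so ν = 10 and λ = 1); the function name `sample_student_t` and the sample size are hypothetical. For ν > 2 the variance of the resulting t-distribution is ν/((ν - 2)λ), which the sample variance should approximate.

```python
import random


def sample_student_t(mu, a, b, size, seed=0):
    """Sample the Gaussian scale mixture: tau ~ Gam(a, b), x ~ N(mu, 1/tau)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        # gammavariate takes (shape, scale); the rate b becomes scale 1/b.
        tau = rng.gammavariate(a, 1.0 / b)
        # gauss takes a standard deviation, which is tau ** -0.5.
        samples.append(rng.gauss(mu, tau ** -0.5))
    return samples


a, b = 5.0, 5.0           # hypothetical Gamma prior parameters
nu, lam = 2 * a, a / b    # degrees of freedom and precision parameter
xs = sample_student_t(mu=0.0, a=a, b=b, size=100_000)
m = sum(xs) / len(xs)
sample_var = sum((x - m) ** 2 for x in xs) / len(xs)
print(sample_var, nu / ((nu - 2) * lam))  # the two values should be close
```

Occasional small values of τ produce very wide Gaussians, and those draws are what populate the heavy tails.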


Illustration of the robustness of Student’s t-distribution compared to a Gaussian. (a) Histogram distribution of 30 data points drawn from a Gaussian distribution,

together with the maximum likelihood fit obtained from a t-distribution (red curve) and a Gaussian (green curve, largely hidden by the red curve). Because the t-distribution contains the Gaussian as a special case it gives almost the same solution as the Gaussian.

(b) The same data set but with three additional outlying data points showing how the Gaussian (green curve) is strongly distorted by the outliers, whereas the t-distribution (red curve) is relatively unaffected.


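Robustness is also an important property for regression problems. Unsurprisingly, the least squares approach to regression does not exhibit robustness, because it corresponds to maximum likelihood under a (conditional) Gaussian distribution. By basing a regression model on a heavy-tailed distribution such as a t-distribution, we obtain a more robust model.

If we go back and substitute the alternative parameters ν = 2a, λ = a/b, and η = τb/a, we see that the t-distribution can be written in the form

$$\mathrm{St}(x \mid \mu, \lambda, \nu) = \int_0^\infty \mathcal{N}\left(x \mid \mu, (\eta\lambda)^{-1}\right) \mathrm{Gam}(\eta \mid \nu/2, \nu/2)\,d\eta$$

We can then generalize this to a multivariate Gaussian N(x|µ, Λ) to obtain the corresponding multivariate Student’s t-distribution in the form

$$\mathrm{St}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \int_0^\infty \mathcal{N}\left(\mathbf{x} \mid \boldsymbol{\mu}, (\eta\boldsymbol{\Lambda})^{-1}\right) \mathrm{Gam}(\eta \mid \nu/2, \nu/2)\,d\eta$$

Using the same technique as for the univariate case, we can evaluate this integral to give

$$\mathrm{St}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \frac{\Gamma(D/2 + \nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2 - \nu/2}$$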


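Here D is the dimensionality of x, and Δ² is the squared Mahalanobis distance defined by

$$\Delta^2 = (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Lambda} (\mathbf{x} - \boldsymbol{\mu})$$

This is the multivariate form of Student’s t-distribution, and it satisfies the following properties:

$$\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu} \quad (\nu > 1), \qquad \mathrm{cov}[\mathbf{x}] = \frac{\nu}{\nu - 2}\boldsymbol{\Lambda}^{-1} \quad (\nu > 2), \qquad \mathrm{mode}[\mathbf{x}] = \boldsymbol{\mu}$$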
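As a small illustration of the squared Mahalanobis distance and the multivariate density, the sketch below handles the D = 2 case with plain Python lists. It assumes the standard closed form of the multivariate Student’s t density with precision matrix Λ; the function names and the example matrix are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def mahalanobis_sq(x, mu, precision):
    """Squared Mahalanobis distance (x - mu)^T Lambda (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    size = len(d)
    return sum(d[i] * precision[i][j] * d[j]
               for i in range(size) for j in range(size))


def multivariate_t_pdf_2d(x, mu, precision, nu):
    """St(x | mu, Lambda, nu) for D = 2, with Lambda a 2x2 precision matrix."""
    D = 2
    det = precision[0][0] * precision[1][1] - precision[0][1] * precision[1][0]
    log_ratio = lgamma(D / 2 + nu / 2) - lgamma(nu / 2)
    norm = exp(log_ratio) * sqrt(det) / (pi * nu) ** (D / 2)
    delta_sq = mahalanobis_sq(x, mu, precision)
    return norm * (1 + delta_sq / nu) ** (-D / 2 - nu / 2)


# Hypothetical symmetric positive-definite precision matrix.
Lam = [[2.0, 0.5], [0.5, 1.0]]
print(mahalanobis_sq([2.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
print(multivariate_t_pdf_2d([1.0, 2.0], [1.0, 2.0], Lam, nu=3.0))
```

At the mode (x = µ) the D = 2 density equals |Λ|^{1/2}/(2π) for any ν, the same peak height as the corresponding Gaussian, which gives a quick sanity check on the normalizing constant.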
