4/28/2014
Statistical Intervals for a Single Sample
Learning Theory
Ferdowsi University of Mashhad, Faculty of Engineering
Reza Monsefi
Lecture 25
Outline
Confidence Interval on the Mean of a Normal Distribution, Variance Known
    Development of the Confidence Interval and its Basic Properties
    Choice of Sample Size
    One-Sided Confidence Bounds
    General Method to Derive a Confidence Interval
    A Large-Sample Confidence Interval for µ
    Bootstrap Confidence Intervals
Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
    The t Distribution
    Development of the t Distribution
    The t Confidence Interval on µ
Confidence Interval on the Variance and Standard Deviation of a Normal Population
Large-Sample Confidence Interval for a Population Proportion
Prediction Interval for a Future Observation
Tolerance Intervals for a Normal Distribution
Introduction
Recap: The last lesson showed how a parameter can be estimated from sample data.
Note: It is important to understand how good the resulting estimate is.
Example: Suppose the point estimate of a process mean is $\hat{\mu} = \bar{x} = 1000$.
Sampling variability: It is almost never the case that $\mu = \bar{x}$. The point estimate says nothing about how close $\hat{\mu}$ is to µ.
? Is the process mean likely to be between 900 and 1100 ($900 \le \mu \le 1100$)?
? Or is it likely to be between 990 and 1010 ($990 \le \mu \le 1010$)?
Note: The answers to these questions affect our decisions regarding this process.
Interval estimate: Bounds that represent an interval of plausible values for a parameter are an example of an interval estimate. Surprisingly, it is easy to determine such intervals in many cases, and the same data that provided the point estimate are typically used.
Confidence Interval: An interval estimate for a population parameter is called a Confidence Interval (CI).
We cannot be certain that the interval contains the true, unknown population parameter, because only a sample from the full population is used to compute the point estimate and the interval. However, the confidence interval is constructed so that we have high confidence that it does contain the unknown population parameter. Confidence intervals are widely used in engineering and the sciences.
Tolerance interval: Another important type of interval estimate.
Assumption: The measured data might be assumed to be normally distributed. We might like to calculate limits that bound 95% of the values.
For a normal distribution, we know that 95% of the distribution lies in the interval
$\mu - 1.96\sigma,\; \mu + 1.96\sigma$
The interval $\mu \pm 1.96\sigma$ is not a useful tolerance interval, however, because the parameters µ and σ are unknown.
Point estimates such as $\bar{x}$ for µ and s for σ can be used in their place. However, we need to account for the potential error in each point estimate ($\bar{x}$ and s) to form a tolerance interval for the distribution.
The result is an interval of the form
$\bar{x} - ks,\; \bar{x} + ks$
where k is an appropriate constant (larger than 1.96, to account for the estimation error).
As with a confidence interval, it is not certain that this interval bounds 95% of the distribution, but it is constructed so that we have high confidence that it does.
Tolerance intervals are widely used and, as we will subsequently see, they are easy to calculate for normal distributions.
Three types of interval estimates:
1) Confidence interval: bounds population or distribution parameters (such as the mean).
2) Tolerance interval: bounds a selected proportion of a distribution.
3) Prediction interval: bounds future observations from the population or distribution.
1) The confidence interval and 2) the tolerance interval bound unknown elements of a distribution.
3) The prediction interval provides bounds on one (or more) future observations from the population.
For example, a prediction interval could be used to bound a single new measurement of the quantity of interest.
With a large sample size, the prediction interval for normally distributed data tends to the tolerance interval, but for more modest sample sizes the prediction and tolerance intervals are different.
Confidence Interval on the Mean of a Normal Distribution, Variance Known
Simple situation: The basic ideas of a Confidence Interval (CI) are most easily understood by initially considering a simple situation.
Suppose that we have a normal population with unknown mean µ and known variance σ².
This is a somewhat unrealistic scenario, because typically we know the distribution mean before we know the variance.
However, confidence intervals for more general situations (the t distribution) are presented later.
Development of the Confidence Interval and its Basic Properties
Suppose X1, X2, ..., Xn is a random sample from a normal distribution with unknown mean µ and known variance σ².
The sample mean $\bar{X}$ is normally distributed with mean µ and variance σ²/n. We may standardize $\bar{X}$ by 1) subtracting the mean and 2) dividing by the standard deviation, which results in the variable
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Z has a standard normal distribution.
A confidence interval estimate for µ is an interval of the form $l \le \mu \le u$, where the end-points l and u are computed from the sample data.
Because different samples will produce different values of l and u, these end-points are values of random variables L and U, respectively. Suppose that we can determine values of L and U such that the following probability statement is true:
$P\{L \le \mu \le U\} = 1 - \alpha$, where $0 \le \alpha \le 1$.
There is a probability of $1 - \alpha$ of selecting a sample for which the CI will contain the true value of µ. Once we have selected the sample, so that X1 = x1, X2 = x2, ..., Xn = xn, and computed l and u, the resulting confidence interval for µ is
$l \le \mu \le u$
The end-points or bounds l and u are called the lower- and upper-confidence limits, respectively, and $1 - \alpha$ is called the confidence coefficient.
In our problem situation, because $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution, we may write
$P\left\{ -z_{\alpha/2} \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2} \right\} = 1 - \alpha$
Manipulate the quantities inside the brackets by (1) multiplying through by $\sigma/\sqrt{n}$, (2) subtracting $\bar{X}$ from each term, and (3) multiplying through by -1.
This results in
$P\left\{ \bar{X} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha$
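From consideration of the lower and upper limits of the inequalities in this probability statement, $L = \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $U = \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$ are the lower- and upper-confidence limits, respectively.
This leads to the following definition.
Definition: If $\bar{x}$ is the sample mean of a random sample of size n from a normal population with known variance σ², a 100(1 - α)% CI on µ is given by
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.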
Example:
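A minimal sketch of the known-variance interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ in Python. The function name and the inputs (sample mean 64.46, σ = 1, n = 10) are assumptions made for illustration; they are chosen only because they are consistent with the 95% interval 63.84 ≤ µ ≤ 65.08 quoted in the interpretation discussion, and are not data taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI on the mean of a normal population with known sigma."""
    alpha = 1.0 - confidence
    # Upper alpha/2 percentage point of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs (assumed for illustration, not from the slides).
lower, upper = z_confidence_interval(xbar=64.46, sigma=1.0, n=10)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Changing `confidence` to 0.99 widens the interval, since only the percentage point z changes.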
Interpreting a Confidence Interval
How does one interpret a confidence interval? In the impact energy estimation problem, the 95% CI is $63.84 \le \mu \le 65.08$, so it is tempting to conclude that µ is within this interval with probability 95%.
However, with a little reflection, it is easy to see that this cannot be correct: the true value of µ is unknown, and the statement $63.84 \le \mu \le 65.08$ is either correct (true with probability 1) or incorrect (false with probability 1).
The correct interpretation lies in the realization that a CI is a random interval, because in the probability statement defining the end-points of the interval, $P\{L \le \mu \le U\} = 1 - \alpha$, L and U are random variables.
Consequently, the correct interpretation of a 100(1 - α)% CI depends on the relative frequency view of probability.
Specifically, if an infinite number of random samples are collected and a 100(1 - α)% confidence interval for µ is computed from each sample, 100(1 - α)% of these intervals will contain the true value of µ.
Figure: Repeated construction of a confidence interval for µ. Several 100(1 - α)% confidence intervals for the mean µ of a normal distribution are shown.
Dots at the center of the intervals indicate the point estimate of µ (that is, $\bar{x}$).
Notice that one of the intervals fails to contain the true value of µ.
If this were a 95% confidence interval, in the long run only 100% - 95% = 5% of the intervals would fail to contain µ.
In practice, we obtain only one random sample and calculate one confidence interval.
Since this interval either will or will not contain the true value of µ, it is not reasonable to attach a probability level to this specific event.
The appropriate statement is that the observed interval [l, u] brackets the true value of µ with confidence 100(1 - α)%.
This statement has a frequency interpretation; that is, we do not know whether the statement is true for this specific sample, but the method used to obtain the interval [l, u] yields correct statements 100(1 - α)% of the time.
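The frequency interpretation can be checked with a small simulation. The sketch below repeatedly draws samples from a normal population and counts how often the known-variance interval contains the true mean. The population values (µ = 50, σ = 2), the sample size, the number of trials and the seed are all arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)  # fixed, because sigma is known
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Arbitrary illustrative settings (assumptions, not from the slides).
rate = coverage(mu=50.0, sigma=2.0, n=10, confidence=0.95, trials=5000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should be close to 0.95, although any single interval either contains µ or does not.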
Confidence Level and Precision of Estimation
Our choice of the 95% level of confidence was essentially arbitrary. What would have happened if we had chosen a higher level of confidence, say, 99%? In fact, doesn't it seem reasonable that we would want the higher level of confidence?
At α = 0.01, we find $z_{\alpha/2} = z_{0.005} = 2.58$, while for α = 0.05, $z_{0.025} = 1.96$. Thus, the length of the 95% CI is
$2(1.96\,\sigma/\sqrt{n}) = 3.92\,\sigma/\sqrt{n}$
whereas the length of the 99% CI is
$2(2.58\,\sigma/\sqrt{n}) = 5.16\,\sigma/\sqrt{n}$
Thus, the 99% CI is longer than the 95% CI. This is why we have a higher level of confidence in the 99% confidence interval.
Generally, for a fixed sample size n and standard deviation σ, the higher the confidence level, the longer the resulting CI.
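The trade-off between confidence level and interval length can be sketched numerically. The helper name `ci_length` and the values σ = 1, n = 10 are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def ci_length(sigma, n, confidence):
    """Length 2 * z_{alpha/2} * sigma / sqrt(n) of the two-sided z-interval."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / sqrt(n)


sigma, n = 1.0, 10  # illustrative values (assumed)
len95 = ci_length(sigma, n, 0.95)
len99 = ci_length(sigma, n, 0.99)
print(f"95% length: {len95:.3f}, 99% length: {len99:.3f}")
```

Multiplying each length by $\sqrt{n}/\sigma$ gives roughly 3.92 and 5.15; the 5.16 in the text comes from rounding $z_{0.005}$ to 2.58 before doubling.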
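One-Sided Confidence Bounds
The confidence interval $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$ gives both a lower confidence bound and an upper confidence bound for µ. Thus it provides a two-sided CI.
It is also possible to obtain one-sided confidence bounds for µ by setting either $l = -\infty$ or $u = \infty$ and replacing $z_{\alpha/2}$ by $z_{\alpha}$.
Definition: One-Sided Confidence Bounds on the Mean, Variance Known
A 100(1 - α)% upper-confidence bound for µ is
$\mu \le u = \bar{x} + z_{\alpha}\,\sigma/\sqrt{n}$
and a 100(1 - α)% lower-confidence bound for µ is
$\bar{x} - z_{\alpha}\,\sigma/\sqrt{n} = l \le \mu$
Figure: Error in estimating µ with $\bar{x}$.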
General Method to Derive a Confidence Interval
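As a hedged outline of the usual textbook approach (an assumption about what this slide presented, not a reproduction of it): choose a statistic $g(X_1, \dots, X_n; \theta)$ that depends on both the sample and the unknown parameter θ, and whose probability distribution does not depend on θ or on any other unknown parameter. The symbols $g$, $C_L$ and $C_U$ are notation introduced here for illustration.

```latex
% Step 1: find constants C_L and C_U for the pivotal quantity g
P\left[\, C_L \le g(X_1, \dots, X_n; \theta) \le C_U \,\right] = 1 - \alpha

% Step 2: rearrange the inequalities to isolate theta
P\left[\, L(X_1, \dots, X_n) \le \theta \le U(X_1, \dots, X_n) \,\right] = 1 - \alpha
```

For the known-variance case, $g = (\bar{X} - \mu)/(\sigma/\sqrt{n})$, with $C_L = -z_{\alpha/2}$ and $C_U = z_{\alpha/2}$.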
A Large-Sample Confidence Interval for µ
Definition: Large-Sample Confidence Interval on the Mean
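The body of this definition is not reproduced here. The standard textbook form, stated as an assumption about what the slide showed, replaces the unknown σ by the sample standard deviation s when n is large, so that $(\bar{X} - \mu)/(S/\sqrt{n})$ is approximately standard normal:

```latex
\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}
```

This is a large-sample confidence interval for µ with confidence level of approximately 100(1 - α)%, and it holds regardless of the shape of the population distribution.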
Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
With known σ²: Confidence intervals on the mean of a normal population were constructed using the procedure
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
This CI is also approximately valid (because of the central limit theorem) regardless of whether or not the underlying population is normal, so long as n is reasonably large (n >= 40). The large-sample procedure can even handle the case of unknown variance.
However, what should be done when the sample is small and σ² is unknown?
We must make an assumption about the form of the underlying distribution to obtain a valid CI procedure. A reasonable assumption in many cases is that the underlying distribution is normal.
Many populations encountered in practice are well approximated by the normal distribution, so this assumption will lead to confidence interval procedures of wide applicability. In fact, moderate departure from normality will have little effect on validity.
When the assumption is unreasonable, an alternative is to use nonparametric procedures that are valid for any underlying distribution.
Suppose that the population of interest has a normal distribution with unknown mean µ and unknown variance σ².
Assume that a random sample of size n, say X1, X2, ..., Xn, is available, and let $\bar{X}$ and $S^2$ be the sample mean and variance, respectively.
We wish to construct a two-sided CI on µ. If the variance σ² is known, we know that
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
has a standard normal distribution. When σ² is unknown, a logical procedure is to replace σ with the sample standard deviation S.
The random variable Z now becomes
$T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
A logical question is: what effect does replacing σ by S have on the distribution of the random variable T?
If n is large, the answer to this question is "very little," and we can proceed to use the confidence interval based on the normal distribution as before.
However, n is usually small in most engineering problems, and in this situation a different distribution must be employed to construct the CI.
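The effect of replacing σ by S can be illustrated by simulation. The sketch below keeps the normal percentage point $z_{\alpha/2}$ but plugs in the sample standard deviation, then measures how often the resulting interval contains µ. The population (µ = 0, σ = 1), the sample sizes, trial counts and seed are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def coverage_with_s(n, trials, confidence=0.95, seed=1):
    """Coverage of xbar +/- z * s / sqrt(n) when sigma is replaced by s."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    mu, sigma = 0.0, 1.0  # assumed population, used only to generate data
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = fmean(sample), stdev(sample)
        half_width = z * s / sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


small = coverage_with_s(n=5, trials=3000)
large = coverage_with_s(n=100, trials=3000)
print(f"n=5: {small:.3f}   n=100: {large:.3f}")
```

With n = 5 the nominal 95% interval covers µ noticeably less often than 95% of the time, while with n = 100 the shortfall is small. This is the gap the t distribution is designed to close.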
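The t Distribution
Definition: t Distribution
Let X1, X2, ..., Xn be a random sample from a normal distribution with unknown mean µ and unknown variance σ². The random variable
$T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
has a t distribution with n - 1 degrees of freedom. The t probability density function is
$f(x) = \dfrac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \dfrac{1}{[(x^2/k) + 1]^{(k+1)/2}}, \quad -\infty < x < \infty$
where k is the number of degrees of freedom. The mean and variance of the t distribution are zero and k/(k - 2) (for k > 2), respectively.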
Figure: Probability density functions of several t distributions.
Several t distributions are shown. The general appearance of the t distribution is similar to the standard normal distribution in that both distributions are symmetric and unimodal, and the maximum ordinate value is reached at the mean, µ = 0.
However, the t distribution has heavier tails than the normal; that is, it has more probability in the tails than the normal distribution. As the number of degrees of freedom k → ∞, the limiting form of the t distribution is the standard normal distribution. Generally, the number of degrees of freedom for t is the number of degrees of freedom associated with the estimated standard deviation.
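The heavier tails and the normal limit can be checked directly from the density. The sketch below evaluates the standard t density with k degrees of freedom, $f(x) = \Gamma[(k+1)/2] / (\sqrt{\pi k}\,\Gamma(k/2)) \cdot (1 + x^2/k)^{-(k+1)/2}$, and compares it with the standard normal density. The function name `t_pdf` and the evaluation points are choices made for illustration.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    coef = gamma((k + 1) / 2) / (sqrt(pi * k) * gamma(k / 2))
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)


std_normal = NormalDist()
for k in (1, 5, 30):
    # In the tail (x = 3) the t density exceeds the normal density.
    print(k, t_pdf(3.0, k), std_normal.pdf(3.0))
```

As k grows, the printed t density at x = 3 moves down toward the normal value, which reflects the limiting behaviour described above.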
Percentage points of the t distribution.
Development of the t Distribution (The t Confidence Interval on µ)
Definition: Confidence Interval on the Mean, Variance Unknown
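The body of this definition is not reproduced here. The standard textbook form, given as an assumption about what the slide showed, uses $\bar{x}$ and s as the mean and standard deviation of a random sample from a normal distribution with unknown variance σ²:

```latex
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```

Here $t_{\alpha/2,\,n-1}$ denotes the upper 100α/2 percentage point of the t distribution with n - 1 degrees of freedom.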
Confidence Interval on the Variance and Standard Deviation of a Normal Population
Definition: χ² (chi-square) Distribution (pronounced "kai-squared")
Definition: Confidence Interval on the Variance
Definition: One-Sided Confidence Bound on the Variance
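The bodies of these two definitions are not reproduced here. The standard textbook forms, given as an assumption about what the slides showed, take s² as the sample variance of a random sample of n observations from a normal distribution with unknown variance σ²:

```latex
% Two-sided 100(1 - alpha)% CI on the variance
\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}

% One-sided 100(1 - alpha)% lower and upper confidence bounds
\frac{(n-1)\,s^2}{\chi^2_{\alpha,\,n-1}} \;\le\; \sigma^2
\qquad \text{and} \qquad
\sigma^2 \;\le\; \frac{(n-1)\,s^2}{\chi^2_{1-\alpha,\,n-1}}
```

Here $\chi^2_{\alpha,\,n-1}$ is the upper 100α percentage point of the chi-square distribution with n - 1 degrees of freedom. Taking square roots of the limits gives the corresponding interval or bound on σ.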
Example: Detergent Filling
Large-Sample Confidence Interval for a Population Proportion
Definition: Normal Approximation for the Binomial Proportion
Definition: Approximate Confidence Interval on a Binomial Proportion
Example: Crankshaft Bearing
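A minimal sketch of the normal-approximation interval for a proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, assuming this standard form is the one the definition refers to. The counts used below (10 nonconforming items in a sample of 85) are hypothetical inputs for illustration and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Approximate two-sided CI on a binomial proportion (normal approximation)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical counts (assumed for illustration).
lower, upper = proportion_ci(x=10, n=85)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The approximation relies on n being large enough for $\hat{p}$ to be roughly normal; for small n or proportions near 0 or 1 the limits can fall outside [0, 1].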
One-Sided Confidence Bounds
Definition: Approximate One-Sided Confidence Bounds on a Binomial Proportion
Prediction Interval for a Future Observation
Tolerance and Prediction Intervals
Definition: Prediction Interval
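The body of this definition is not reproduced here. The standard textbook form, given as an assumption about what the slide showed, is the 100(1 - α)% prediction interval on a single future observation $X_{n+1}$ from a normal distribution:

```latex
\bar{x} - t_{\alpha/2,\,n-1}\, s \sqrt{1 + \frac{1}{n}} \;\le\; X_{n+1} \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\, s \sqrt{1 + \frac{1}{n}}
```

The extra 1 under the square root, compared with the CI on µ, accounts for the variability of the future observation itself, so the prediction interval is always wider than the confidence interval on the mean.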
Example: Alloy Adhesion
Tolerance Intervals for a Normal Distribution
Definition: Tolerance Interval
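The body of this definition is not reproduced here. The standard textbook form, given as an assumption about what the slide showed, is a tolerance interval that captures at least γ% of the values in a normal distribution with confidence level 100(1 - α)%:

```latex
\bar{x} - k s, \qquad \bar{x} + k s
```

Here k is a tolerance interval factor that depends on n, γ and the confidence level, and is normally read from published tables. As n → ∞, k approaches the corresponding normal percentage point (1.96 for γ = 95).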
Example: Alloy Adhesion
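Student’s t-distribution
The conjugate prior for the precision of a Gaussian is given by a gamma distribution. If we have a univariate Gaussian $\mathcal{N}(x|\mu, \tau^{-1})$ together with a Gamma prior $\mathrm{Gam}(\tau|a, b)$ and we integrate out the precision, we obtain the marginal distribution of x in the form
$p(x|\mu, a, b) = \int_0^\infty \mathcal{N}(x|\mu, \tau^{-1})\,\mathrm{Gam}(\tau|a, b)\,d\tau$
$= \int_0^\infty \dfrac{b^a e^{-b\tau}\tau^{a-1}}{\Gamma(a)} \left(\dfrac{\tau}{2\pi}\right)^{1/2} \exp\left\{-\dfrac{\tau}{2}(x-\mu)^2\right\} d\tau$
$= \dfrac{b^a}{\Gamma(a)} \left(\dfrac{1}{2\pi}\right)^{1/2} \left[b + \dfrac{(x-\mu)^2}{2}\right]^{-a-1/2} \Gamma(a + 1/2)$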
where we have made the change of variable $z = \tau\,[b + (x-\mu)^2/2]$.
By convention we define new parameters given by $\nu = 2a$ and $\lambda = a/b$, in terms of which the distribution $p(x|\mu, a, b)$ takes the form
$\mathrm{St}(x|\mu, \lambda, \nu) = \dfrac{\Gamma(\nu/2 + 1/2)}{\Gamma(\nu/2)} \left(\dfrac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \dfrac{\lambda (x-\mu)^2}{\nu}\right]^{-\nu/2 - 1/2}$
which is known as Student’s t-distribution.
The parameter λ is sometimes called the precision of the t-distribution, even though it is not in general equal to the inverse of the variance. The parameter ν is called the degrees of freedom (df or DoF).
For the particular case of ν = 1, the t-distribution reduces to the Cauchy distribution, while in the limit ν → ∞ the t-distribution $\mathrm{St}(x|\mu, \lambda, \nu)$ becomes a Gaussian $\mathcal{N}(x|\mu, \lambda^{-1})$ with mean µ and precision λ.
We see that Student’s t-distribution is obtained by adding up an infinite number of Gaussian distributions having the same
mean but different precisions.
This can be interpreted as an infinite mixture of Gaussians.
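The infinite-mixture view can be sketched by sampling: draw a precision τ from the Gamma prior, then draw x from a Gaussian with that precision. Using ν = 2a and λ = a/b, the Gamma parameters are a = ν/2 and b = ν/(2λ). The function name, the parameter values (ν = 10, λ = 1), the sample size and the seed are assumptions for illustration. For ν > 2 the variance of the resulting t-distribution is ν/((ν - 2)λ), which gives a simple check.

```python
import random
from math import sqrt
from statistics import fmean


def sample_t_via_mixture(mu, lam, nu, size, seed=0):
    """Sample St(x | mu, lam, nu) as a Gamma scale mixture of Gaussians."""
    rng = random.Random(seed)
    a, b = nu / 2.0, nu / (2.0 * lam)  # from nu = 2a and lam = a/b
    draws = []
    for _ in range(size):
        # random.gammavariate takes shape and SCALE, so pass 1/b for rate b.
        tau = rng.gammavariate(a, 1.0 / b)
        draws.append(rng.gauss(mu, 1.0 / sqrt(tau)))
    return draws


xs = sample_t_via_mixture(mu=0.0, lam=1.0, nu=10.0, size=50000)
mean = fmean(xs)
var = fmean((x - mean) ** 2 for x in xs)
print(f"sample mean {mean:.3f}, sample variance {var:.3f} (theory 1.25)")
```

The sample variance exceeds 1/λ = 1, which reflects the remark that λ is not in general the inverse of the variance.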
The result is a distribution that in general has longer 'tails' than a Gaussian, as was seen in the figure. This gives the t-distribution an important property called robustness, which means that it is much less sensitive than the Gaussian to the presence of a few data points which are outliers.
The robustness of the t-distribution is illustrated in the following figure, which compares the maximum likelihood solutions for a Gaussian and a t-distribution. Note that the maximum likelihood solution for the t-distribution can be found using the expectation maximization (EM) algorithm. Here we see that the effect of a small number of outliers is much less significant for the t-distribution than for the Gaussian. Outliers can arise in practical applications either because the process that generates the data corresponds to a distribution having a heavy tail or simply through mislabelled data.
Illustration of the robustness of Student’s t-distribution compared to a Gaussian. (a) Histogram distribution of 30 data points drawn from a Gaussian distribution,
together with the maximum likelihood fit obtained from a t-distribution (red curve) and a Gaussian (green curve, largely hidden by the red curve). Because the t-distribution contains the Gaussian as a special case it gives almost the same solution as the Gaussian.
(b) The same data set but with three additional outlying data points showing how the Gaussian (green curve) is strongly distorted by the outliers, whereas the t-distribution (red curve) is relatively unaffected.
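The robustness effect can be reproduced in a simplified sketch. The code below fits the location of a t-distribution by EM with the degrees of freedom held fixed at ν = 3, which is a simplifying assumption (a full maximum likelihood fit would also estimate ν). It then compares the result with the Gaussian maximum likelihood mean on 30 Gaussian points plus three outliers. The function name, the outlier values and the seed are hypothetical choices, not the data behind the figure.

```python
import random
from statistics import fmean


def fit_t_location(data, nu=3.0, iterations=200):
    """EM for the location and squared scale of a t-distribution, nu fixed."""
    mu = fmean(data)
    scale2 = fmean((x - mu) ** 2 for x in data)  # plays the role of 1/lambda
    for _ in range(iterations):
        # E-step: expected precision multiplier for each point; points far
        # from mu receive small weights.
        weights = [(nu + 1.0) / (nu + (x - mu) ** 2 / scale2) for x in data]
        # M-step: weighted mean and weighted squared scale.
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        scale2 = fmean(w * (x - mu) ** 2 for w, x in zip(weights, data))
    return mu, scale2


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(30)]
contaminated = clean + [8.0, 9.0, 10.0]  # three hypothetical outliers

clean_mu = fmean(clean)            # reference: mean without outliers
gaussian_mu = fmean(contaminated)  # Gaussian ML estimate of the mean
t_mu, _ = fit_t_location(contaminated)
print(f"clean {clean_mu:.3f}  gaussian {gaussian_mu:.3f}  t {t_mu:.3f}")
```

The Gaussian mean is pulled toward the outliers, while the t location stays close to the mean of the uncontaminated points, because the E-step down-weights observations that lie far from the current location.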
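Robustness is also an important property for regression problems. Unsurprisingly, the least squares approach to regression does not exhibit robustness, because it corresponds to maximum likelihood under a (conditional) Gaussian distribution. By basing a regression model on a heavy-tailed distribution such as a t-distribution, we obtain a more robust model.
If we go back and substitute the alternative parameters ν = 2a, λ = a/b, and η = τb/a, we see that the t-distribution can be written in the form
$\mathrm{St}(x|\mu, \lambda, \nu) = \int_0^\infty \mathcal{N}\left(x|\mu, (\eta\lambda)^{-1}\right) \mathrm{Gam}(\eta|\nu/2, \nu/2)\,d\eta$
We can then generalize this to a multivariate Gaussian $\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}, \boldsymbol{\Lambda})$ to obtain the corresponding multivariate Student’s t-distribution in the form
$\mathrm{St}(\mathbf{x}|\boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \int_0^\infty \mathcal{N}\left(\mathbf{x}|\boldsymbol{\mu}, (\eta\boldsymbol{\Lambda})^{-1}\right) \mathrm{Gam}(\eta|\nu/2, \nu/2)\,d\eta$
Using the same technique as for the univariate case, we can evaluate this integral to give
$\mathrm{St}(\mathbf{x}|\boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \dfrac{\Gamma(D/2 + \nu/2)}{\Gamma(\nu/2)} \dfrac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \dfrac{\Delta^2}{\nu}\right]^{-D/2 - \nu/2}$
where D is the dimensionality of x, and Δ² is the squared Mahalanobis distance defined by
$\Delta^2 = (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Lambda} (\mathbf{x} - \boldsymbol{\mu})$
This is the multivariate form of Student’s t-distribution and satisfies the following properties:
$\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}$, if ν > 1
$\mathrm{cov}[\mathbf{x}] = \dfrac{\nu}{\nu - 2}\,\boldsymbol{\Lambda}^{-1}$, if ν > 2
$\mathrm{mode}[\mathbf{x}] = \boldsymbol{\mu}$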