1
Empirical Likelihood confidence intervals under unequal probability sampling
Yves G. Berger
Omar De La Riva Torres
Design-based inference without re-sampling, linearisation and variance estimation
2
Outline
● Issues with standard Confidence Intervals.
● A new Empirical Likelihood approach:
➔ Point estimation
➔ Estimation of Confidence Intervals
● Simulations
● European Income & Living Condition Survey 2009 (NET-SILC2, EUROSTAT)
● Concluding remarks
3
Issues with standard Confidence Intervals
● Skewed variables
→ Skewed sampling distributions
→ Poor coverages of Standard CI
→ Linearised variance estimates can be poor
● Example: Income/wealth variables
Domains, Extreme values
Measures of poverty, Quantiles
4
Example: Confidence interval of a 10% quantile
● Skewed population (exponential)
● Skewed sampling distribution
95% CI based upon:               n = 80, N = 800   n = 80, N = 150
Linearisation                         98%               99.8%
Rescaled Bootstrap                    97%               99%
Direct Bootstrap                      93%               90%
Woodruff                              93%               94%
Proposed Empirical Likelihood         96%               94%
5
Example: Estimation of a mean with auxiliary variables
● Skewed data, N = 150
               n = 40   n = 80
Standard        91%      93%
Pseudo-EL1      94%      94%
Pseudo-EL2      87%      89%
Proposed EL     95%      94%
6
Example: Persistent risk of Poverty (European Income & Living Condition Survey 2009)
● Male 25 yo – 44yo
[Figure: confidence intervals, Standard vs. Emp. Likelihood]
7
New Empirical Likelihood Approach
Does not involve
● variance estimates
● Linearisation
● Re-sampling
● normality of the point estimator
● negligible sampling fractions
Remark: Pseudo-EL ≠ proposed EL
Pseudo-EL relies on variance
8
The parameter of interest
● Population parameter solution of estimating equations
● Examples: Mean, Total, Ratio, Quantiles,
M-estimator, Poverty indicators, regression,
Winsorisation ...
does not need to be differentiable!
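A minimal sketch of the estimating-equation view, assuming a sample equation of the form sum_i w_i g_i(theta) = 0. The weights `w`, the function names and the choice g_i(theta) = 1{y_i <= theta} - alpha for a quantile are illustrative assumptions, not notation from the slides. The quantile equation is a step function of theta, which is why differentiability cannot be required.

```python
import numpy as np

def weighted_mean(y, w):
    """Solve sum_i w_i * (y_i - theta) = 0 for theta (closed form)."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / np.sum(w)

def weighted_quantile(y, w, alpha):
    """Smallest observed y whose weighted ECDF reaches alpha.

    sum_i w_i * (1{y_i <= theta} - alpha) is a step function of theta,
    so we locate its first sign change instead of differentiating it.
    """
    y, w = np.asarray(y, float), np.asarray(w, float)
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    ecdf = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = min(np.searchsorted(ecdf, alpha, side="left"), len(y) - 1)
    return y_sorted[idx]

# Hypothetical data and weights, for illustration only.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
w = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
theta_mean = weighted_mean(y, w)
theta_median = weighted_quantile(y, w, 0.5)
```

The same pattern would cover a ratio or a poverty indicator by swapping in another g_i.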
9
Proposed Empirical Likelihood Approach
● Empirical likelihood function:
● = Unit mass of unit
10
Proposed Empirical Likelihood Approach
● Maximise
Under the constraint
Design + auxiliary
● Example:
auxiliary variables
strat. variables
11
Empirical log-likelihood ratio function (deviance)
"Reduced" "Full"
● Maximum under
● Maximum under
12
Maximum Empirical Likelihood Estimator
● Maximum EL Estimator of minimises
Maximum EL Estimator is the solution of
The maximise under
13
Maximum Empirical Likelihood Estimator
● Maximise
under the constraint and
● Solution:
● Consider that
always holds
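The maximisation can be sketched numerically. The formulas on these slides did not survive the transcript, so the notation is assumed: unit masses `m_i`, constraint vectors `c_i` with target `C`, first column of `c` equal to the inclusion probabilities `pi_i`, and `C[0] = n`. Under those assumptions the stationarity condition gives m_i = 1 / (eta' c_i), and `eta` can be found by a damped Newton search on the convex dual. This is an illustration, not the authors' implementation.

```python
import numpy as np

def el_weights(c, C, max_iter=50, tol=1e-8):
    """Maximise sum_i log(m_i) subject to sum_i m_i * c_i = C, m_i > 0.

    Assumes column 0 of c is pi_i and C[0] = n, so eta = (1, 0, ..., 0)
    is a feasible start. eta minimises the convex dual
    f(eta) = eta'C - sum_i log(eta'c_i).
    """
    c = np.asarray(c, float).reshape(len(c), -1)
    C = np.asarray(C, float).ravel()
    eta = np.zeros(c.shape[1])
    eta[0] = 1.0

    def dual(e):
        d = c @ e
        return np.inf if np.any(d <= 0) else e @ C - np.sum(np.log(d))

    for _ in range(max_iter):
        d = c @ eta
        grad = C - c.T @ (1.0 / d)            # gradient of the dual
        if np.max(np.abs(grad)) < tol:
            break
        hess = (c / d[:, None] ** 2).T @ c    # sum_i c_i c_i' / d_i^2
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until it stays feasible and does not increase f.
        while dual(eta - t * step) > dual(eta) and t > 1e-12:
            t *= 0.5
        eta = eta - t * step
    return 1.0 / (c @ eta)

# Hypothetical sample of n = 5 units.
pi = np.array([0.1, 0.2, 0.25, 0.4, 0.5])
x = np.array([2.0, 3.0, 6.0, 7.0, 12.0])      # hypothetical auxiliary variable
m_design = el_weights(pi, [5.0])              # design constraint only
c_aux = np.column_stack([pi, x])
m_aux = el_weights(c_aux, [5.0, 110.0])       # plus a hypothetical known total of x
```

With the design constraint alone the masses reduce to 1 / pi_i. Adding the auxiliary constraint shifts them so that the weighted total of `x` matches the assumed known total.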
14
Examples of Maximum Empirical likelihood estimators
● Example 1: "model" with just an intercept
Hájek Estimator
Greg if contains auxiliary variables
15
Examples of Maximum Empirical likelihood estimators
● Example 2: Ratio "model"
Horvitz-Thompson estimator
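A small numerical check of Examples 1 and 2, assuming the unit masses equal 1 / pi_i and assuming the estimating functions g_i(theta) = y_i - theta (intercept only) and g_i(theta) = y_i - theta * pi_i / n (ratio). Both functions are illustrative guesses at the lost formulas, and the data are made up.

```python
import numpy as np

def hajek_mean(y, pi):
    """Solution of sum_i (1/pi_i) * (y_i - theta) = 0."""
    w = 1.0 / pi
    return np.sum(w * y) / np.sum(w)

def horvitz_thompson_total(y, pi):
    """Solution of sum_i (1/pi_i) * (y_i - theta * pi_i / n) = 0."""
    return np.sum(y / pi)

# Hypothetical sample values and inclusion probabilities.
y = np.array([3.0, 5.0, 8.0, 12.0])
pi = np.array([0.1, 0.2, 0.25, 0.5])
n = len(y)

theta_hajek = hajek_mean(y, pi)
theta_ht = horvitz_thompson_total(y, pi)

# Residuals of the two assumed estimating equations at their solutions.
res_hajek = np.sum((y - theta_hajek) / pi)
res_ht = np.sum((y - theta_ht * pi / n) / pi)
```

Both residuals vanish up to rounding, so under these assumptions the two classical estimators are the roots of the corresponding estimating equations.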
16
Maximum Empirical Likelihood Estimator
● Example 3: Auxiliary variables within
Optimal GREG
● Example 4:
Kim (2009) EL
17
Pps sampling (with replacement)
● Under regularity conditions
under pps sampling
18
Empirical Likelihood Confidence Intervals (pps sampling)
● Confidence intervals (Wilks' type)
19
Empirical Likelihood Confidence Intervals
20
EL relies on normality of the estimating equation when !
Remark: with auxiliary variables, GREG instead of HT
● The point estimator does not have to be normal or unbiased
stronger & harder to justify
21
Without auxiliary variables
"Reduced" "Full"
22
With auxiliary variables
23
With auxiliary variables + Stratification
24
πps sampling (without replacement)
25
πps sampling (without replacement)
under Hájek (1964) asymptotic framework "High entropy"
26
πps sampling (without replacement)
● reduce the effect on the CI of units with large (finite population corrections)
● not needed
● not adjusted by parameters that need to
be estimated
● Can be extended with auxiliary variables
27
Simulations
● Population data (skewed) Rao & Wu (2006)
● and ~ exponential●
●
●
●
● Value to control correlation( , )
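The model formulas on this slide are lost, so the following is only a sketch of how such a skewed population and a pps with-replacement sample could be generated. The linear model, its coefficients and `sigma` are hypothetical stand-ins, not the Rao & Wu (2006) specification. Only N = 800 and n = 80 come from the slides.

```python
import numpy as np

rng = np.random.default_rng(2012)
N, n = 800, 80                                  # sizes quoted on the slides

x = rng.exponential(scale=1.0, size=N) + 0.5    # hypothetical size variable
e = rng.exponential(scale=1.0, size=N)          # skewed error term
sigma = 0.5                                     # hypothetical value controlling corr(y, x)
y = 3.0 + x + sigma * e                         # hypothetical linear model

p = x / x.sum()                                 # draw probabilities proportional to x
sample = rng.choice(N, size=n, replace=True, p=p)   # pps sampling with replacement
pi = n * p[sample]                              # expected number of selections of each drawn unit
y_s = y[sample]

hajek = np.sum(y_s / pi) / np.sum(1.0 / pi)     # Hajek estimate of the population mean
corr = np.corrcoef(x, y)[0, 1]                  # correlation implied by the chosen sigma
```

Repeating the draw many times and recording whether each interval covers `y.mean()` would give an observed coverage of the kind reported on the following slides.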
28
Coverage Prob. Mean. N=800. No Auxil. Var.
29
Coverage Prob. Mean. N=150. No Auxil. Var.
30
Coverage Prob. Mean. N=150. With Auxil. Var.
31
Variance Length CI. Mean. N=150. With Auxil.
32
Coverage Prob. 1st Quartile. N=800. No Auxil.
33
Variance Length CI. 1st Quartile. N=800. No Auxil.
34
Coverage Prob. 1st Quartile. N=150. No Auxil.
35
Variance Length CI. 1st Quartile. N=150. No Auxil.
36
Persistent risk of Poverty (European Income & Living Condition Survey 2009)
● Male 25 yo – 44yo (multi-stage designs)
37
New EL versus Bootstrap
● Does not need re-sampling. "Simpler than bootstrap"
● Wider class of parameters compared to bootstrap.
● More stable CI than direct bootstrap
● Better coverage than bootstrap
● EL includes design information ( , stratification, clusters).
● EL intervals take into account the bias of the point estimator
38
New EL versus Pseudo-Empirical likelihood
● New EL ≠ Pseudo-EL
● The pseudo-EL function is not a standard EL function
● CI relies on variances (design effect)
● May need N for totals and counts
● Limited range of parameters with pseudo-EL. E.g. no pseudo-EL CI for quantiles (only Woodruff)
● More stable CI than pseudo-EL
● Design information through a design effect (estimated)
● Range preserving and good coverages
39
New EL versus Calibration
● Equivalent point estimator.
● EL can be used without auxiliary information
● EL can be used for testing, CI, p-values
● EL can be used with "calibration weights" (same point estimates)
● Calibration relies on CLT & variances
● Calibration relies on a distance function disconnected from mainstream statistics
40
Extensions
● Multi-stage sampling (OK for small sampling fractions)
● Rao-Hartley-Cochran design
● Modelling: design naturally included, random effects not needed
● Conditional Estimating Equations
Example
Can't be solved with estimating equations
41
Extensions
● Re-weighting (Total nonresponse)
● Random Hot-deck imputation
● Calibration on known quantiles
or on distribution functions
42
                                                STD   Bootstrap RS   Bootstrap Direct   Emp. Lik.
Design based                                     √         √                √               √
Does not rely on normality of point estimator    ×         √                ×               √
Does not need variance estimates                 ×         √                ×               √
Does not need re-sampling                        √         ×                ×               √
Does not need linearisation                      ×         √                √               √
Range preserved                                  ×         √                ×               √
Takes the sampling distribution into account     ×         √                ×               √
Takes the design into account                    √         √                ?√              √
Suitable with large sampling fractions           √         ×                √               √
Complex parameters                               √         √                ?               √
43
Concluding Remarks
● Does not involve variance estimation & linearisation
● Design based (non-parametric)
● Flexible and general approach (complex parameters, modelling)
● Does not rely on normality of the point estimator
● Better coverage for confidence intervals (better inference)
● EL intervals take into account the bias
44
References
BERGER & DE LA RIVA TORRES (2012).
http://eprints.soton.ac.uk/337688/
BERGER & DE LA RIVA TORRES (2012).
Proceedings of the Survey Research Methods Section of the American Statistical Association, Joint Statistical Meetings, San Diego
OSIER, BERGER and GOEDEMÉ (2013)
Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat “Methodologies and Working papers” series
45
Regularity conditions●
●
●
●
●
●