Small Area Estimation: An Appraisal

Nikos Tzavidis 1

Workshop on Measuring Progress at a Local Level, Pisa, May 28-29, 2013

1 Southampton Statistical Sciences Research Institute, University of Southampton

Small Area Estimation & Measuring Progress at Local Level



A Non-technical Introduction to SAE

Motivation & Definition

Data requirements

Small Area Methods & Case Studies

Direct Vs. indirect methods

Model-based Vs. design-based methods

Methodologies for continuous outcomes

Methodologies for discrete outcomes

Case studies

Income & poverty

Unemployment

Health outcomes

Concluding remarks

From the statistician’s desk to policy


Part I

Introduction to SAE



Surveys are used to provide estimates for large domains

Estimates for smaller domains are important

Direct Estimation: Use only domain-specific data

Problems with direct estimation

1 Direct estimates may suffer from low precision

2 Not applicable with zero sample sizes
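The two problems listed above can be illustrated with a minimal sketch of a direct estimator. The function name `direct_estimate` is hypothetical, and the standard error formula assumes simple random sampling within the domain, which is a simplification and not something stated on the slide.

```python
import math

def direct_estimate(y, w):
    """Weighted direct estimate of a domain mean using only that domain's data.

    Returns (estimate, standard_error). The standard error treats the domain
    sample as a simple random sample (an assumption made for illustration).
    """
    n = len(y)
    if n == 0:
        return None, None            # zero sample size: no direct estimate
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    if n == 1:
        return mean, None            # one unit: variance cannot be estimated
    s2 = sum((yi - mean) ** 2 for yi in y) / (n - 1)
    return mean, math.sqrt(s2 / n)

# Made-up data: a domain with two sampled units and a domain with none.
small_domain = direct_estimate([2.0, 4.0], [1.0, 1.0])
empty_domain = direct_estimate([], [])
```

With only a handful of units the standard error is large relative to the estimate, and with no units the function has nothing to return, which is the situation small area methods are meant to address.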


What is Small Area Estimation?

A definition

Small area estimation is concerned with the development of statistical procedures for producing efficient (precise) estimates for domains (planned or unplanned) with small or zero sample sizes. Domains are defined by the cross-classification of geographical districts by social/economic/demographic characteristics. The target is the estimation of a parameter (average/percentile/proportion/rate) and the estimation of the corresponding prediction error.



Is SAE Relevant for Policy Makers?

Letter from the House of Commons to ONS

The House of Commons has for many years produced a monthly report for Members on unemployment by constituency. This report has been highly valued by them. Over the last few years the Office for National Statistics has been developing and improving its portfolio of labour market statistics. We recognise that what we really need is a consistent and reliable set of up to date constituency level labour market statistics covering unemployment, employment and inactivity. I am aware that research is underway within ONS to achieve this goal. The purpose of this letter is to stress the importance of bringing this work to a conclusion so that both of our organisations can provide a common authoritative set of labour market information for Members of Parliament and others.


SAE - Data Requirements

Survey Data: Available for y and for x related to y

Census/Administrative Data: Available for x but not for y

SAE in 3 Steps

1 Use survey data to estimate models that link y to x

2 Combine the estimated model parameters with x, for out-of-sample units, to form predictions

3 Use these predictions to estimate the target parameters
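The three steps can be sketched in a few lines. This is a minimal illustration, assuming a single covariate, an ordinary least-squares link model without area effects, and made-up data; the names `fit_ols`, `survey`, `census` and `area_mean` are hypothetical.

```python
def fit_ols(x, y):
    """Step 1: least-squares fit of y = a + b * x on the survey data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Survey data: (area, x, y) for sampled units (made-up values).
survey = [("A", 1.0, 3.1), ("A", 2.0, 4.9), ("B", 3.0, 7.2), ("B", 4.0, 8.8)]
# Census/administrative data: x only, for out-of-sample units.
# Area "C" has no sampled units at all.
census = {"A": [1.5, 2.5, 3.5], "B": [2.0, 4.5], "C": [1.0, 2.0]}

a, b = fit_ols([r[1] for r in survey], [r[2] for r in survey])

def area_mean(area):
    """Steps 2 and 3: predict y for out-of-sample units, then average the
    observed and predicted values over the whole area."""
    observed = [y for (k, _, y) in survey if k == area]
    predicted = [a + b * x for x in census.get(area, [])]
    values = observed + predicted
    return sum(values) / len(values)

estimates = {k: area_mean(k) for k in census}
```

Area "C" still receives an estimate because the model fitted on the other areas is applied to its census covariates.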


SAE - Data Requirements (Cont’d)

Access to good auxiliary information is crucial

Data requirements depend on the target parameter

Case 1: Averages: Domain-level means/totals of auxiliary variables

Case 2: Percentiles: Auxiliary information available for every unit in the population


Direct Vs. Indirect Methods

Direct methods use only domain-specific data

Indirect methods borrow information from all data


Design-based Vs. Model-based methods

Model-based methods

Borrow strength by using a model

Estimation using frequentist or Bayesian approaches

Inference is under the model, conditional on the selected sample

Design-based (Model-assisted) methods

Direct estimation

Can allow for use of models (model-assisted)

Inference is under the randomization distribution


SAE: A Paradigm Shift

National Statistical Institutes (NSIs) producing official statistics avoid the use of models

SAE: One area in official statistics where models are accepted

Presents a paradigm shift for NSIs

Impacts on the use of SAE methods in practice


Part II

SAE for Continuous Outcomes


Popular model-assisted estimators of domain averages

Synthetic estimator

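$\hat{\bar{y}}_k = \bar{X}_k^T \hat{\beta}_w$

$\hat{\beta}_w$ is the probability-weighted estimator of the regression coefficients; $\bar{X}_k$ is the vector of population means of the auxiliary variables in domain $k$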

Can be biased BUT

Fairly stable

Survey regression & Generalised regression estimators (GREG)

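$\hat{\bar{y}}_k = \hat{\bar{Y}}_k^{HT} + (\bar{X}_k - \hat{\bar{X}}_k^{HT})^T \hat{\beta}_w$

where the superscript HT denotes Horvitz-Thompson estimates computed from the domain sample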

Corrects the potential bias of the synthetic estimator BUT

Can be unstable
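The contrast between the two estimators can be sketched as follows. This is an illustration only, assuming one covariate plus an intercept, known domain size `N_k` and known population mean of x in the domain; all function names and data values are hypothetical.

```python
def weighted_ols(x, y, w):
    """Probability-weighted least squares for y = a + b * x (whole sample)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def synthetic_mean(a, b, xbar_pop):
    """Synthetic estimator: regression prediction at the domain mean of x."""
    return a + b * xbar_pop

def greg_mean(a, b, xbar_pop, N_k, x_k, y_k, w_k):
    """GREG: Horvitz-Thompson estimate of the domain mean plus a regression
    adjustment for the gap between known and HT-estimated covariate means
    (the intercept column is handled through one_ht)."""
    y_ht = sum(wi * yi for wi, yi in zip(w_k, y_k)) / N_k
    x_ht = sum(wi * xi for wi, xi in zip(w_k, x_k)) / N_k
    one_ht = sum(w_k) / N_k
    return y_ht + a * (1.0 - one_ht) + b * (xbar_pop - x_ht)

# Made-up sample of six units with equal weights; the domain of interest
# contains the last two sampled units, N_k = 20 and population mean x = 5.4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.2, 5.1, 6.8, 9.3, 12.5, 14.1]
w = [10.0] * 6
a, b = weighted_ols(x, y, w)
syn = synthetic_mean(a, b, 5.4)
grg = greg_mean(a, b, 5.4, 20, x[4:], y[4:], w[4:])
```

The synthetic estimate depends on the domain only through its covariate mean, which makes it stable but potentially biased. The GREG estimate also uses the domain's own y values, which removes that bias at the cost of more variability when the domain sample is small.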


Model-based Methods: Nested Error Regression Model

Key Concept (Battese, Harter & Fuller, 1988)

Include random area-specific effects to account for between area variation beyond that explained by model covariates

Notation ($k$ = domain, $i$ = individual):

$y_{ik} = x_{ik}^T \beta + u_k + \varepsilon_{ik}, \quad i = 1, \dots, n_k, \quad k = 1, \dots, d$

Estimator of the small area average

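$\hat{\bar{y}}_k = \gamma_k \left( \bar{y}_k + (\bar{X}_k - \bar{x}_k)^T \hat{\beta} \right) + (1 - \gamma_k) \bar{X}_k^T \hat{\beta}, \qquad \gamma_k = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2 / n_k}$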
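How the weight γk trades off the two components can be seen in a small sketch. It assumes one covariate plus an intercept and takes the variance components and regression coefficients as given; in practice they are estimated from the data. The function names and numbers are hypothetical.

```python
def shrinkage_weight(sigma2_u, sigma2_e, n_k):
    """gamma_k = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_k).

    Returns 0 for an unsampled area, so the estimate is purely synthetic.
    """
    if n_k == 0:
        return 0.0
    return sigma2_u / (sigma2_u + sigma2_e / n_k)

def composite_mean(gamma, ybar_k, xbar_sample, xbar_pop, a, b):
    """Weighted combination of a survey-regression part (uses the area's own
    sample mean) and a synthetic part (uses only the regression model)."""
    synthetic = a + b * xbar_pop
    survey_regression = ybar_k + b * (xbar_pop - xbar_sample)
    return gamma * survey_regression + (1.0 - gamma) * synthetic

# With between-area variance 1 and unit-level variance 4 (made-up values),
# the weight on the area's own data grows with the area sample size.
weights = {n: shrinkage_weight(1.0, 4.0, n) for n in (0, 4, 100)}
estimate = composite_mean(weights[4], 10.0, 2.0, 3.0, 1.0, 2.0)
```

Areas with large samples rely mostly on their own data, while areas with few or no sampled units are pulled towards the synthetic prediction.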


Advances in Model-based SAE - Nested Error Regression Model & Beyond

Empirical Best Prediction (Molina & Rao, 2010)

Dealing with outliers (Sinha & Rao, 2009; Chambers et al., 2013; Giusti et al., 2013)

Design consistent estimation (You & Rao, 2002)

Estimation with M-quantile models (Chambers & Tzavidis, 2006; Fabrizi et al., 2013; Marchetti et al., 2012)

Non-parametric models (Opsomer et al., 2008)

Incorporating spatial structures (Salvati & Pratesi, 2009)


Small Area Estimation with M-quantile Regression

Main idea of SAE with M-quantile regression (Chambers & Tzavidis, Biometrika, 2006)

Quantiles/M-quantiles used for describing group differences

Similar role to random effects BUT

Estimation is semiparametric

If a hierarchical structure does explain part of the variability in the data, units within the same domain will be clustered in the same part of f(y|x)


Beyond Averages: The Small Area Distribution Function (DF)

Averages offer a rather limited picture

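$F_k(z) = N_k^{-1} \left[ \sum_{i \in s_k} I(y_i < z) + \sum_{i \in r_k} I(y_i < z) \right]$

where $s_k$ and $r_k$ denote the sampled and non-sampled units in area $k$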

Use an estimator of the distribution function

Derive estimates of medians and percentiles for small areas


Estimators of the Small Area DF

Estimators of the DF (Tzavidis et al., 2010)

Empirical distribution function

$\hat{F}_k(z) = N_k^{-1} \left[ \sum_{i \in s_k} I(y_i < z) + \sum_{i \in r_k} I(\hat{y}_i < z) \right]$

The Chambers-Dunstan estimator

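$\hat{F}_k^{CD}(z) = N_k^{-1} \left[ \sum_{i \in s_k} I(y_i < z) + n_k^{-1} \sum_{l \in r_k} \sum_{i \in s_k} I(\hat{y}_l + (y_i - \hat{y}_i) < z) \right]$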

Empirical Best Predictor - Monte-Carlo method for estimating the DF
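The first two DF estimators can be sketched as below. This is a minimal illustration that takes the model predictions as given and, for the Chambers-Dunstan version, uses the residuals of the area's own sampled units; the function names and inputs are hypothetical.

```python
def edf_estimate(z, y_sample, yhat_nonsample):
    """Empirical DF: observed values for sampled units, model predictions
    for non-sampled units."""
    N_k = len(y_sample) + len(yhat_nonsample)
    below = sum(y < z for y in y_sample) + sum(yh < z for yh in yhat_nonsample)
    return below / N_k

def chambers_dunstan(z, y_sample, yhat_sample, yhat_nonsample):
    """Chambers-Dunstan type estimator: each non-sampled prediction is
    perturbed by every sample residual, and the indicators are averaged."""
    n_k = len(y_sample)
    N_k = n_k + len(yhat_nonsample)
    residuals = [y - yh for y, yh in zip(y_sample, yhat_sample)]
    sampled = sum(y < z for y in y_sample)
    nonsampled = sum(
        sum(yl + e < z for e in residuals) / n_k for yl in yhat_nonsample
    )
    return (sampled + nonsampled) / N_k

# Made-up area with two sampled and two non-sampled units.
f_edf = edf_estimate(2.5, [1.0, 3.0], [2.0, 4.0])
f_cd = chambers_dunstan(2.5, [1.0, 3.0], [2.0, 2.0], [2.0, 4.0])
```

Predictions alone are less spread out than the actual values, so the empirical version can understate the tails of the distribution. Adding the residuals back, as in the second function, is one way to restore that spread. Medians and other percentiles follow by inverting the estimated DF over a grid of z values.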


Case Study I: Estimation of Income & Poverty

Estimation of income distributions and poverty

Two case studies: Italy and the UK

UK: Target areas - Local Authority Districts (∼400)

Italy: Target areas - Provinces in Regions

Data - Italy: EU-SILC, Census micro-data

Data - UK: Family Resources Survey, Census micro-data

Target parameters: Income distributions & Poverty indicators

Model-based estimates using EBP and M-quantile approaches


Income & Poverty in Lombardia


Income & Poverty in Tuscany


Income & Poverty in Calabria


Income Distributions in LADs in North West & South East England


Head Count Ratio in LADs in North West & South East England


Part III

SAE for Discrete Outcomes


A Binomial Generalised Linear Mixed Model

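$y_{ik} \in \{0, 1\}$

$y_{ik} | u_k \sim \mathrm{Bin}(1, p_{ik})$

$u_k \sim N(0, \Sigma_u)$

$\log\left( \frac{p_{ik}}{1 - p_{ik}} \right) = x_{ik}^T \beta + u_k$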

Extensions to multinomial responses possible


Poisson Generalized Linear Mixed Model

$y_{ik}$ is a count

$y_{ik} | u_k \sim \mathrm{Poisson}(\mu_{ik})$

$u_k \sim N(0, \Sigma_u)$

with $\log(\mu_{ik}) = x_{ik}^T \beta + u_k$



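Plug-in Empirical Best Predictor of the small area mean is

$\hat{E}(y | x, k) = N_k^{-1} \left[ \sum_{i \in s_k} y_{ik} + \sum_{i \in r_k} \hat{y}_{ik} \right]$

Poisson: $\hat{y}_{ik} = \exp\{ x_{ik}^T \hat{\beta} + \hat{u}_k \}$

Binomial: $\hat{y}_{ik} = \frac{\exp( x_{ik}^T \hat{\beta} + \hat{u}_k )}{1 + \exp( x_{ik}^T \hat{\beta} + \hat{u}_k )}$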
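A minimal sketch of the plug-in predictor follows, assuming the fixed effects and the predicted area effect have already been obtained from a fitted GLMM. The fitting itself is not shown, and the names `predict_unit` and `plug_in_mean` are hypothetical.

```python
import math

def predict_unit(eta, family):
    """Inverse link applied to the linear predictor eta = x'beta + u_k."""
    if family == "poisson":
        return math.exp(eta)                    # log link
    if family == "binomial":
        return 1.0 / (1.0 + math.exp(-eta))     # equals exp(eta) / (1 + exp(eta))
    raise ValueError("family must be 'poisson' or 'binomial'")

def plug_in_mean(y_sample, x_nonsample, beta, u_hat, family):
    """Average of observed y for sampled units and predicted y for
    non-sampled units in one area."""
    predictions = [
        predict_unit(sum(b * x for b, x in zip(beta, xi)) + u_hat, family)
        for xi in x_nonsample
    ]
    N_k = len(y_sample) + len(predictions)
    return (sum(y_sample) + sum(predictions)) / N_k

# Made-up area: two sampled units, two non-sampled units with covariates
# (intercept, x), and assumed parameter values.
p_hat = plug_in_mean([1, 0], [[1.0, 0.0], [1.0, 0.0]], [0.0, 1.0], 0.0, "binomial")
mu_hat = plug_in_mean([2, 0], [[1.0]], [0.0], 0.0, "poisson")
```

For a binary outcome the result is an estimated area proportion, and for a count outcome it is an estimated area mean count.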

Notes on the use of GLMMs in SAE

Standard methods for fitting GLMMs can be sensitive to outliers

Prediction of the random effects with GLMMs is computationally complicated


Robust Estimation for GLMs

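Cantoni & Ronchetti (JASA, 2001)

$y_i$ from the Exponential Family

$E(y_i) = \mu_i$; $V(y_i) = V(\mu_i)$; $g(\mu_i) = x_i^T \beta$

$\sum_{i=1}^{n} \frac{(y_i - \mu_i)}{V(\mu_i)} \frac{\partial \mu_i}{\partial \beta} = 0$

Large deviations of $y_i$ from $\mu_i$ or leverage points -> influence

$\sum_{i=1}^{n} \left[ \psi(r_i) w(x_i) \frac{1}{V^{1/2}(\mu_i)} \frac{\partial \mu_i}{\partial \beta} - \alpha(\beta) \right] = 0$ (Huber quasi-likelihood)

$\alpha(\beta) = n^{-1} \sum_{i=1}^{n} E[\psi(r_i)] w(x_i) \frac{1}{V^{1/2}(\mu_i)} \frac{\partial \mu_i}{\partial \beta}$

$r_i = (y_i - \mu_i) / V^{1/2}(\mu_i)$ are the Pearson residuals; $w(x_i)$ controls leverage points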

Two special cases: Poisson and Logistic regression
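For the Poisson special case, the robust estimating function can be evaluated as in the sketch below. It assumes a log link, leverage weights w(x_i) = 1 and the Huber ψ with tuning constant c = 1.345 (a common default, not a value taken from the slides). The function names are hypothetical, and solving for β (for example by Fisher scoring) is not shown.

```python
import math

def huber_psi(r, c=1.345):
    """Huber psi: identity on [-c, c], clipped outside."""
    return max(-c, min(c, r))

def poisson_pmf(j, mu):
    return math.exp(j * math.log(mu) - mu - math.lgamma(j + 1))

def expected_psi(mu, c=1.345):
    """E[psi(r)] under Poisson(mu), by summing over a truncated support."""
    sd = math.sqrt(mu)
    upper = int(mu + 20 * sd + 20)
    return sum(huber_psi((j - mu) / sd, c) * poisson_pmf(j, mu)
               for j in range(upper + 1))

def robust_score(beta, X, y, c=1.345):
    """Sum over units of [psi(r_i) - E psi(r_i)] * V^{-1/2}(mu_i) * dmu_i/dbeta.

    For the Poisson log-link model V(mu) = mu and dmu/dbeta = mu * x, so the
    factor V^{-1/2} * dmu/dbeta reduces to sqrt(mu) * x.
    """
    score = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        sd = math.sqrt(mu)
        centred = huber_psi((yi - mu) / sd, c) - expected_psi(mu, c)
        for j, xij in enumerate(xi):
            score[j] += centred * sd * xij
    return score

# An extreme count (100) has a bounded effect on the robust score.
robust = robust_score([0.0], [[1.0], [1.0]], [1, 100])
# With a very large c the clipping never applies and the usual
# likelihood score sum((y - mu) * x) is recovered.
classical = robust_score([0.0], [[1.0], [1.0]], [1, 3], c=1e6)
```

Subtracting the expected value of ψ keeps the estimating function centred at zero under the model, which is the role of α(β) in the equations above.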


Robust SAE Estimation for Discrete Outcomes

Chambers et al., 2013; Tzavidis et al., 2013

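Extension of M-quantile approach for binary and count outcomes

Let $Q_y(q | x_i) = Q_{iq\psi}$. Estimate $\beta_\psi(q)$ by solving

$\sum_{i=1}^{n} \left[ \psi_q(r_{iq}) w(x_i) \frac{1}{V^{1/2}[Q_{iq\psi}]} Q'_{iq\psi} - a(\beta_\psi(q)) \right] = 0,$

where $r_{iq} = \frac{y_i - Q_{iq\psi}}{V^{1/2}[Q_{iq\psi}]}$ are the Pearson residuals and $Q'_{iq\psi} = \partial Q_{iq\psi} / \partial \beta_\psi(q)$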

Estimation: Fisher scoring algorithm


Case Study II: Estimation of Unemployment

Estimation of Unemployment

UK: Target areas - Local Authority Districts (∼400)

Data - UK: Labour Force Survey

Auxiliary info: Age by gender and unemployment benefit counts

Target parameters: LAD proportions of unemployed

Model-based estimates using Binomial-GLMM and Binomial-MQ


Case Study II: Estimation of Unemployment (Cont’d)


Case Study III: Estimating the Number of Visits to Physicians in Italy

Ageing is a great concern for Italy (65+, 20.3%)

Estimate the number of visits to physicians for the elderly

Data from the Health Conditions survey (reliable estimates at NUTS 2)

Regions (Tuscany, 23.3%; Liguria, 26.7%; Umbria, 23.1%)

60 Health Authorities - Small Areas

Model-based estimates using Poisson-GLMM and Poisson-MQ


SAE Estimates


Recent Applications

Mexico - Estimation of multidimensional poverty

Definition incorporates many dimensions: Income, lack of access to health & education

Estimation: Treat as a multinomial or count outcome

UK - Estimation of child poverty

Currently obtaining experimental estimates


Future Use of SAE in the UK - Beyond 2011

Alternative, more frequently updated, Census output

Efficient use of survey and administrative data

SAE methods evaluated for producing census outputs


Mean Squared Error (MSE) Estimation

Approaches to MSE estimation

Important part of small area estimation

Analytic and computer intensive approaches to MSE

Analytic MSE estimator for model-based averages (Prasad & Rao, 1990; Chambers, Chandra & Tzavidis, 2011)

Parametric bootstrap (Hall & Maiti, 2006)

Parametric & non-parametric bootstrap for model-based estimates of distributions (Hall & Maiti, 2006; Tzavidis et al., 2011)

Parametric and non-parametric bootstrap for model-based estimates with GLMM and robust-GLMM (Manteiga et al., 2008; Chambers et al., 2013)
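The idea behind a parametric bootstrap for the MSE can be sketched as below. This is a deliberately simplified illustration, not any of the cited procedures: it uses an intercept-only nested error model, treats the area mean as β + u_k, and keeps the model parameters fixed across replicates, whereas a full parametric bootstrap would re-estimate them in every replicate. The function name and values are hypothetical.

```python
import random

def bootstrap_mse(beta, sigma2_u, sigma2_e, n_k, n_boot=5000, seed=1):
    """Monte Carlo MSE of the shrinkage estimator of one area mean under
    y_ik = beta + u_k + e_ik, with parameters held fixed (a simplification)."""
    rng = random.Random(seed)
    gamma = sigma2_u / (sigma2_u + sigma2_e / n_k)
    total = 0.0
    for _ in range(n_boot):
        # Generate a bootstrap area effect and the corresponding sample mean.
        u_star = rng.gauss(0.0, sigma2_u ** 0.5)
        ybar_star = beta + u_star + rng.gauss(0.0, (sigma2_e / n_k) ** 0.5)
        # Re-apply the estimator and compare with the bootstrap "truth".
        estimate = gamma * ybar_star + (1.0 - gamma) * beta
        total += (estimate - (beta + u_star)) ** 2
    return total / n_boot

# Made-up parameter values: between-area variance 1, unit-level variance 4,
# area sample size 4.
mse_hat = bootstrap_mse(0.0, 1.0, 4.0, 4)
```

Because the true value is known inside each replicate, the squared errors can be averaged directly. The repeated simulation is also why MSE estimation becomes time consuming once the model has to be refitted in every replicate.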


Part IV

Small Area Estimation & Measuring Progress Locally


Producing Small Area Statistics

Need for a transparent estimation framework

Model-based estimation presents organisational paradigm shift

Properties of estimators must be clearly understood

System set up for “industrial” production of target outputs

Computing power - MSE estimation is time consuming

Start using simpler estimation procedures & adapt gradually


From the Statistician’s desk to policy - Reflections

Significant advances in model-based SAE. However,

Gap between producing the estimates and using the estimates

How are estimates used for creating policies?

How to allocate resources?

How to measure progress - Measuring impact?


