The Probability Sampling Tradition in a period of crisis

Q2010 Keynote speech

Carl-Erik Särndal

Université de Montréal

The Probability Sampling Tradition has governed surveys at National Statistical Institutes (NSI:s) for decades

Breaking a tradition : Not easy …

Background

The merits of probability sampling, also known as scientific sampling, are put in question by severe imperfections : non-sampling errors, economic pressures etc.

The problem not new – but more and more compelling

Background

The probability sampling process

• is expensive (through follow-ups);

• its theoretical merits are compromised (by nonresponse, etc.)

• “a few extra %” amount to very little

• alternative data collection methods exist

Yet probability sampling continues to be practiced. Wasteful ? Can we do without probability sampling?

My view

is a (Canadian) theoretician’s view

on (official) statistics production

To what extent guided by (statistical science) theory ?

Something we admire: Being able to predict facts about the world we live in by theoretical arguments and deduction

This is the predictive power of science

In statistics: Want precise statements, backed by convincing theory, of level of unemployment, of industrial production, and so on

Theory as a basis for science (knowledge)

Theory as a basis for science

Gérard Jorland : How is it possible that one can predict, merely by theoretical deductions, the existence of a new planet, or a new chemical element, or a new elementary particle?

Based only on a calculus, on a set of mathematical equations ... remarkable achievement of the human mind.

Famous example: Planet Neptune was “found” by mathematical prediction by Le Verrier 1846, then empirically observed by Galle, at the position given by Le Verrier

Many other examples come from physics, astronomy, chemistry

A hypothesis to test:

The sciences are predictive to the extent that they are mathematically formulated.

But that hypothesis is rejected : Today, Economics is highly mathematical and theoretical, but such arguments did not predict the current economic crisis, for example.

The contrast

Physics: Predictive power of formal theory very high

Economics: Predictive power of formal theory low

So “science formulated mathematically” does not guarantee “predictive power of theory”

Why then are Physics and Economics different? Both are theoretical (mathematical) .

Contrasts

Physics : the objects (planets, elementary particles, and so on) are inanimate ; predictive power very high

Economics : the objects and the participants (human beings) are unpredictable, relationships highly complex; predictive power very low

Theory as a guide in statistics production

Our ambition : Create knowledge (predictions) about our world through statistical surveys .

To what extent is this activity supported by theory ? To what extent scientific ?

Legitimate questions !

Some NSI:s take pride in “scientific principles”.

Sampling = Limiting attention to a small subset

To what extent scientific ?

We accept without hesitation that observing only n = 1,000 (or a few thousand) is enough - but provided the sample is “scientific”
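As a rough illustration of why n = 1,000 can be enough, the sketch below computes an approximate 95% margin of error for an estimated proportion under simple random sampling without replacement. The function name, the population sizes and the worst-case choice p = 0.5 are assumptions made for illustration and do not come from the talk. The calculation also presumes full response and error-free measurement, which is precisely what the rest of this talk puts in question.

```python
import math

def proportion_margin(n, population_size, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n drawn without replacement."""
    # Finite population correction: close to 1 when n is a small fraction of N.
    fpc = 1 - n / population_size
    return z * math.sqrt(fpc * p * (1 - p) / n)

# Two hypothetical population sizes, same sample size n = 1,000.
for N in (100_000, 10_000_000):
    margin = proportion_margin(1000, N)
    print(f"N = {N:>10,}: margin of error about {100 * margin:.1f} percentage points")
```

Under these assumptions the margin is about 3 percentage points for both population sizes: it is driven by n, hardly at all by N.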

What is a scientific sample ? The Roper Center, University of Connecticut, says :

A scientific sample is a process in which respondents are chosen randomly by one of several methods. The key component in the scientific sample is that everyone within the designated group (sample frame) has a chance of being selected.

We may add : Such a sample is also known as a probability sample. It is not necessarily a representative sample in the sense “all have the same probability”.

Scientific sample, probability sample, representative sample : around these terms, unfortunate ambiguity and confusion reign in the literature and in conversation

Ask, and you get a variety of responses

Sampling = Limiting attention to a small subset

Two contrasting examples:

Sampling trees in a forest - to predict volume

Sampling human beings in a country - to predict (assess) unemployment, or health conditions, or expenditures

Estimating volume of wood on a sample of trees

With classical probability sampling theory, we get not only a figure for the total volume of wood in the forest, but also a statement of its margin of error, free of any assumptions.

We can determine exactly the accuracy we want.
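A minimal sketch of what such a statement looks like, assuming simple random sampling without replacement and a normal approximation for the 95% margin of error. The forest data, the function name and the sample size are hypothetical and serve only to make the example runnable; actual forest inventories typically use more elaborate designs.

```python
import math
import random

def estimate_total_srs(sample, population_size, z=1.96):
    """Expansion estimator of a population total under simple random
    sampling without replacement, with an approximate margin of error."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance with the n - 1 divisor.
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    total_hat = population_size * mean
    # Design-based variance estimator, including the finite population correction.
    var_hat = population_size ** 2 * (1 - n / population_size) * s2 / n
    return total_hat, z * math.sqrt(var_hat)

# Hypothetical forest: 10,000 trees with made-up volumes in cubic metres.
rng = random.Random(2010)
forest = [rng.uniform(0.2, 1.8) for _ in range(10_000)]

# Every tree has the same known chance of selection, and every selected tree is measured.
sample = rng.sample(forest, 400)
total_hat, margin = estimate_total_srs(sample, len(forest))
print(f"estimated total volume: {total_hat:.0f} +/- {margin:.0f} cubic metres")
```

The margin of error is computed from the sample and the known design alone. Because the variance estimator has n in the denominator, the sample size can be chosen in advance to reach a desired margin.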

Estimating unemployed on a sample of people

From the Labour Force Survey (LFS) we get a figure, but we cannot quantify its margin of error. There is no objective declaration of numerical quality, because several errors go unmeasured : nonresponse error, measurement error, frame error, recording and data handling error, and so on

The contrast

Trees are inanimate objects, like planets

Human beings are precisely that : human, inconsistent, emotional, prone to error

The contrast

Trees : Predictive power of probability sampling theory very high – objects do not “cause trouble”

People : Predictive power of sampling theory very low - the survey is complex; human beings are involved

A large scale statistical investigation (survey) :

“Unpredictable people are involved at so many points of this incredibly complex process”

so we will never have a theory that will allow precise measurement of total survey error

(Stanley McCarthy 2001)

Producing numbers is (relatively) easy ; by comparison, stating their accuracy is difficult

Article by Platek and Särndal : Can a statistician deliver ?

J. Official Statistics vol. 17 (2001), pp. 1 – 127

with 16 discussions

and a rejoinder by the authors

Can a statistician fulfill the promise (to society) ?

Upon rereading : Have we advanced any, in 10 years ?

The title : Can a statistician deliver ?

“Statistician” may denote

the head of a National Statistical Institute (NSI)

or

a person expert in the subject (labour market, or health issues, or manufacturing industry, etc.)

or

a person trained in statistical science (methodologist)

As expected, feelings conveyed were of two kinds:

high ranking NSI officials: “Keep the ship sailing”, despite difficult times

academics and researchers: Regret the absence of a more solid (theoretical) base for (national) statistics production

Three themes are prominent in the 16 discussions (summarized in the authors’ rejoinder) :

The role of theory

The scientific and professional credo of the NSI

The concept of quality in regard to the NSI’s activity

The uncertain future of the NSI

I. Fellegi (Statistics Canada) on survival of the NSI. “Survival beyond quality” depends on

• Respect for respondents, and

• Credibility of information; Accuracy is an important part, but so are Relevance, Transparency & others

The uncertain future of the NSI

I. Fellegi : A life and death question for the NSI is

credibility :

Information that is not believed will not be used, and the NSI has no function any more.

Can the NSI count on future high co-operation and truthful response ? - More and more doubtful.

Believing numerical information

We have no objective measures of “margin of error”

But what about the Total Survey Error model ? (US Bureau of the Census, around 1950)

It recognizes total error as a sum of a number of components.

Can we not use these equations, this theory ?

Believing information

The Total Survey Error model

• helped us to focus on specific components of total error

• disappointed us by failing to provide routine measures for the numerical quality of published statistics.

Believing information

Discussants of Can a statistician deliver ? deliver “a death sentence” on the TSE model :

“Unattainable and unrealistic ideal”

“Utopian project”

“Unrealistic utopian dream”

Theory is there, but it does not work

Some say: We choose not to use it

In question are the notions of “probability” and “probable error”

Statistics Canada Quality Guidelines (1998)

describes Survey Methodology as : “A collection of practices, backed by some theory and empirical evaluation, among which practitioners have to make sensible choices in the context of a particular application”

A patchwork of theories, one for questionnaire design, one for motivating response, one for data handling and editing, one for imputation, one for estimation in small areas, and so on

Fragmentation …

European Statistics Code of Practice (2005)

Sound methodology must underpin quality statistics. This requires adequate tools, procedures and expertise. The overall methodological framework of the statistical authority follows European and other international standards, guidelines, and good practices ... Survey designs, sample selections, and sample weights are well based and regularly reviewed, revised or updated …

(Emphasis is mine.) A “be-good” encouragement; what about “scientific underpinnings” ?

The stark reality

“Good practice” is the guide, not theory .

Numerical quality is not assured .

Large errors probably not infrequent; most go undetected .

So what ? - Other important professions are also guided by a bunch of “good practices”

The NSI’s situation

Its work is guided by “a collection of practices supported by some theory” plus a requirement to keep response burden low

With this frail and fragmented base, the NSI must produce reliable Official Statistics, for the good of the nation, a solid basis for policy decisions

Not an enviable situation and a threat to NSI’s existence…

The Probability Sampling Tradition (born in 1930’s)

created the concept of Nonresponse Rate :

“the selected objects” (the probability sample) as compared with

“the data delivering objects” (the respondents)

We measure, steadfastly, sometimes misguidedly, the size ratio of those two sets

Our obsession with the Nonresponse Rate

When the NR (nonresponse) rate was 2%, nobody worried

Now that the NR rate is around 50%, we worry

• Intuitively, because nonresponse may be systematically related to target variable values

• Probabilistically because “making the observation” (getting the response) has an unknown probability; the theory capsizes

The believers in Probability Sampling regret that the theory cannot cope

The non-believers : Why worry about the NR rate ? Just collect some reasonably good data from a reasonably representative set of objects.

Our obsession with Nonresponse Rates

Why not (in the manner of some private survey institutes) just get data from “a reasonably representative set of co-operative objects”, and not bother with this stifling concept of the Nonresponse Rate ?

It is time that NSI:s deliver a strong endorsement of the Probability Sampling Tradition – if this is what they really believe in; otherwise, act accordingly

Our obsession with Nonresponse Rates

NR rate itself is a poor indicator of NR bias, of “accuracy of estimates”

See, for example, Groves (2006), Schouten (2009), Särndal and Lundström (2008)
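A small numerical sketch can make the point concrete. It assumes that each unit responds with some individual probability and uses the standard large-sample approximation in which the unadjusted respondent mean behaves like a mean of y weighted by those probabilities. The population, the probabilities and the function name are hypothetical; nothing here is taken from the cited papers.

```python
def respondent_mean_bias(y, response_prob):
    """Approximate bias of the unadjusted respondent mean when unit i
    responds with probability response_prob[i]."""
    pop_mean = sum(y) / len(y)
    # Large-sample expectation of the respondent mean: a weighted mean of y
    # with the response probabilities as weights.
    resp_mean = sum(p * v for p, v in zip(response_prob, y)) / sum(response_prob)
    return resp_mean - pop_mean

# Hypothetical population of 1,000 people: 1 = unemployed, 0 = employed.
y = [1] * 100 + [0] * 900

# Scenario A: everyone responds with probability 0.5, unrelated to y.
prob_a = [0.5] * 1000
# Scenario B: the unemployed respond less often; the average is still 0.5.
prob_b = [0.23] * 100 + [0.53] * 900

for label, prob in (("A", prob_a), ("B", prob_b)):
    rate = sum(prob) / len(prob)
    bias = respondent_mean_bias(y, prob)
    print(f"scenario {label}: response rate {rate:.2f}, bias {bias:+.3f}")
```

Both scenarios have the same 50% nonresponse rate, yet only scenario B is biased: there the respondent mean understates the unemployment proportion by roughly half. The rate alone cannot tell the two apart.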

Conclusions

What options remain for the NSI today to show its superior capacity to produce “serious numbers” amidst a deluge of “junk information” ?

The underpinnings may be just “a collection of practices”, but still, the NSI is the model of statistical competence in the nation - and it must demonstrate this !

Media criticism of the NSI sometimes harsh.

Conclusions

The NSI’s delicate balancing act

vis-à-vis

• The national government : fulfill the mandate

• The world of theory and learning : show “scientific credibility”

• The other (private) producers of statistics : tough competition

• The supra-agency (EuroStat) : dictates

Conclusions

A fact is that the quality component accuracy cannot be measured (probabilistically).

Yet this is what users want desperately to have measured.

When important numbers are proven wrong (by users), trust in the NSI suffers

Other numbers may be wrong, but go unnoticed - and may not matter much .

Conclusions

The Probability Sampling (Scientific Sampling) tradition is a reflection of an idyllic past - it is now 2010, not 1950

On what grounds is it still defensible, in our time ?

It is a challenge to the NSI, and to the academics (the theoreticians), to provide the answers

Conclusions

The NSI vis-à-vis the scientific world : a sometimes hesitant relationship:

Most NSI:s have a scientific (academic) advisory board

NSI:s look to the learned world for support and acceptance

NSI:s own investment in research may (understandably) be limited.

Implementing new theory into the NSI's production has met with obstacles

Conclusions

Relationship of the NSI to the world of learning; an empirical investigation, see

Risto Lehtonen and Carl-Erik Särndal : Research and Development in Official Statistics and Scientific Co-operation with Universities: A Follow-Up Study , J. Official Statistics (2010)

Conclusions

Debate article :

S. Lundström and C.E. Särndal (2010): The devastating consequences of nonresponse : Probability sampling in question at Statistics Sweden . (In Swedish; internal report).

Credit goes to Statistics Sweden for their courage to debate a sensitive issue.
