
ON SAMPLING WITH REPLACEMENT: AN AXIOMATIC APPROACH

by

RICHARD CONRAD TAEUBER

Institute of Statistics Mimeo Series No. 299, October, 1961


TABLE OF CONTENTS

1.0 INTRODUCTION

2.0 REVIEW OF LITERATURE

3.0 ON THE BASIC CRITERIA FOR A THEORY OF SAMPLING
  3.1 Components of the Sampling Problem
  3.2 On the Question of Sampling with or without Replacement
  3.3 The Applicability of Traditional Estimation Criteria
  3.4 Criteria for Estimators from Finite Populations

4.0 THE GENERAL CLASSES OF LINEAR ESTIMATORS FOR SAMPLING WITH REPLACEMENT
  4.1 Introductory Remarks
  4.2 Probability System and Notation
  4.3 Some Combinatorial Considerations
  4.4 Class One Estimators
  4.5 Class Two Estimators
  4.6 Class Three Estimators
  4.7 Class Four Estimators
  4.8 Class Five Estimators
  4.9 Class Six Estimators
  4.10 Class Seven Estimators
  4.11 Summary of Numerical Examples

5.0 SOME ADDITIONAL COMMENTS ON THE ESTIMATORS

6.0 SUMMARY
  6.1 Summary and Conclusions
  6.2 Suggestions for Future Research

7.0 LIST OF REFERENCES

8.0 APPENDIX A. THE DISTRIBUTION OF THE NUMBER OF DISTINCT UNITS IN THE SAMPLE
  8.1 Equal Selection Probability Case
  8.2 Arbitrary Selection Probability Case

9.0 APPENDIX B. A STATISTICAL THEORY OF COMMUNISM


1.0 INTRODUCTION

Sample survey procedures are among the most valuable and powerful tools at the command of a statistician. Improperly or carelessly used, they could be exceedingly dangerous. To get useful results from a sample survey, one must heed the importance of the logical planning of all steps in an investigation. Not only is there a problem of how best to select the sample, but also there are the problems of how to obtain an estimate of the desired population value and what measure of reliability to attach to that estimate. To do all this makes it inevitable that certain assumptions regarding the unknown population will be necessary. It is here that finite population sampling differs considerably from procedures for drawing samples from an infinite population. When sampling from finite populations, the only assumptions made concern such things as existence, identifiability and availability of the sampling units and the probability construct chosen for the selection of the sampling units. Sample survey theory makes no assumptions concerning the abstract distribution of the variables (characteristics) under study.

Even after the unknown number of centuries that man has been drawing samples and acting on the information that they provide him, there still has not been developed a general theory with practical applicability which will universally indicate to the sampler a "best" (in some sense) system for selecting the sampling units to be observed and at the same time indicate a "best" estimating procedure by which to glean the information provided by the sample. (The term "best" will usually connote minimum variance, but this is not the only requirement for bestness which one might impose. For certain restrictive particular cases "best" systems and estimators are known, e.g., the arithmetic mean of the observations is the minimum variance estimator when sampling without replacement and with equal selection probabilities.) However, with some reflection, it is easy to see why no such perfect theory has been developed for sample surveys, for the method that any given sampler adopts is very dependent on the nature of the material which is available or can be obtained, and the assumptions necessary to utilize that material to which he can gain access. In spite of this absence of a "general theory" for sample surveys, some progress has been made in the formulation of improved sampling systems and estimating procedures which will give "better" results.

With, no doubt, centuries of application, the study of the theory behind sample surveys (at least that which was published) dates from 1713 and the appearance of Bernoulli's Ars conjectandi. In the two centuries following the appearance of this work, little was published by anyone other than Poisson and Lexis. Beginning in 1916 many authors have published various considerations of aspects of sample survey theory, especially as applied to the drawing and evaluating of samples from finite populations.

Modern developments in the field of sampling finite populations are usually said to stem from the paper by Neyman (1934) which was the last major paper to give much consideration to purposive selection of the sample units, as contrasted with probabilistic selection, and which pointed the way to more "scientific" lines of development of the art. Nine years later Hansen and Hurwitz (1943) stimulated the consideration of drawing samples with unequal or arbitrary selection probabilities by using the idea of probability proportional to size. Midzuno (1950) formalized the general approach of arbitrary probabilities by introducing the concept of a probability field into such studies. As he said: "there is no need of equal probability for every element when we construct the probability field, isn't it?"

It was not until 1952 that the first attempt at formulating general classes of estimators for samples from finite populations was published by Horvitz and Thompson (1952). But Horvitz and Thompson did not recognize the deductive approach of their own work, and so merely stated three of the possible classes of estimators.

It remained for Koop (1957) to formalize the approach to the formation of classes of linear estimators. The formation of seven classes of linear estimators, for the case of sampling with unequal (general) selection probabilities and without replacement of the sampled units, was based on three axioms which are descriptions of physical realities. This approach to the formulation of classes of estimators, based on the way things actually happen with the associated probabilities, would seem, for finite populations, much more fundamental than one based on classical estimation criteria. In fact, the notion that there are criteria for which one can develop classes of estimators is not germane to sample surveys. In sample survey theory, one first develops classes of estimators, then applies criteria, such as unbiasedness or minimum variance, to attempt a determination of bestness within each class.


Another problem which has been under discussion recently in the literature of sample surveys is the question of whether one should sample with or without replacement. It is argued that with equal total sample size, there is no question but that one should sample without replacement, by virtue of the fact that the variance of the mean is smaller. But when cost is figured in as a consideration in the decision process, the comparison clouds, for the costs of sampling with replacement depend on the number of distinct units in the sample, rather than the total sample size.
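The trade-off can be made concrete with standard results for equal selection probabilities. The sketch below is a minimal illustration, not part of the original argument: it assumes the usual formulas for the variance of the sample mean (sigma^2/n with replacement, and that quantity times (N - n)/(N - 1) without replacement, where sigma^2 is the population variance with divisor N) and for the expected number of distinct units in n draws with replacement, N[1 - (1 - 1/N)^n]. The values of N, n and sigma^2 and the function names are hypothetical.

```python
def var_mean_with_replacement(sigma2, n):
    """Variance of the sample mean for n equal-probability draws with replacement."""
    return sigma2 / n


def var_mean_without_replacement(sigma2, n, N):
    """Variance of the sample mean for n draws without replacement from N units
    (sigma2 is the population variance with divisor N)."""
    return (sigma2 / n) * (N - n) / (N - 1)


def expected_distinct_units(n, N):
    """Expected number of distinct units among n equal-probability draws
    with replacement from a population of N units."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)


# Hypothetical population size, sample size and population variance.
N, n, sigma2 = 100, 30, 4.0

v_wr = var_mean_with_replacement(sigma2, n)
v_wor = var_mean_without_replacement(sigma2, n, N)
e_distinct = expected_distinct_units(n, N)

print(f"variance with replacement:    {v_wr:.4f}")
print(f"variance without replacement: {v_wor:.4f}")
print(f"expected distinct units in {n} draws: {e_distinct:.2f}")
```

For equal n the variance without replacement is the smaller one, but the with-replacement sample is expected to contain fewer than n distinct units, which is the source of the cost argument.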

Hidden by considerations such as the question of whether to sample with or without replacement, the development of newer and fancier estimators for specialized situations, the extension of the sampling plan to more and more stages, the more and more theoretical discussions of some of the technical problems which arise in actual samples, etc., is an almost complete lack of discussion of principles governing the choice of estimators to use on samples from finite populations. Although the basic principles of unbiasedness and minimum variance, which are directly applicable to samples from finite populations, appeared in the literature in the early nineteenth century, little has been developed since in the way of criteria specifically applicable to the problem of determining optimum estimators (in some sense) when the population under study is finite. In this, and many other aspects of sample survey theory (samples from finite populations), the tendency seems to have been to assume that the criteria developed for infinite populations will merely transfer to finite populations. In some cases they may, but for the most part they do not without adding unwarranted assumptions about the nature of the population or the sample.

What this dissertation proposes to do, then, is:

(1) To discuss, in a preliminary manner, the applicability of the classical estimation criteria to samples from finite populations, and to suggest some possible criteria which might be utilized in evaluating possible estimators for use on samples from finite populations.

(2) To examine the question of sampling with replacement versus sampling without replacement to see what conclusions might be reached, or have been reached, or to see if such a comparison can properly be made in the first place.

(3) Using an axiomatic approach, to examine the problem of formulating classes of estimators for samples drawn from finite populations, with arbitrary selection probabilities, and with replacement of each sampling unit before the next unit is drawn.

It can be noted that the first two objectives are somewhat interrelated. The third objective, the use of the axiomatic approach, does not depend on the results of the first objective. However, this approach to the formulation of classes of estimators is further justified by the results of the first objective.


2.0 REVIEW OF LITERATURE

Man from time immemorial has engaged in the use of sampling techniques to base decisions on partial knowledge of the situation. He has judged the opinions of many by talking with friends or advisors; he has condemned or praised a whole nation or race of people after but a five or ten day visit; he has pushed aside a bowl of hot soup or tepid mush after swallowing one spoonful; et cetera, et cetera and so forth. In the case of the soup or the mush, the universe (bowlful) is undoubtedly sufficiently homogeneous that such a sample would lead to valid inferences. But for the other examples cited (in fact for most of the sampling that is done by man, either unconsciously or deliberately), there is great danger that false and misleading inferences will be drawn if complete objectivity in the formulation of the goals and procedures of the inquiry and in the collection and analysis of the data is lacking.

Eventually people began to want to formalize the methodology behind obtaining some of these sample estimates. Some sort of formal procedure was needed to obtain measures of central tendency and an indication of their validity, based on a subset of an entire population. The first known formal approach to study the theory of sampling was that of Bernoulli in his monumental study Ars conjectandi, which appeared in 1713. A century later Poisson gave indications of the theory that would result from the introduction of stratification into the sampling procedure. Subsequently Lexis systematized the work of his predecessors and added the beginnings of the theory of sampling clusters of elements. Also, it can be noted that the germinal ideas of the analysis of variance techniques are to be found in Lexis' works.

Sir Arthur Bowley (1926) summarized the adaptation of the works of Bernoulli and Poisson to sampling from finite populations. Bowley was also one of the first to apply the representative method (purposive selection) in practice, and included in this paper a discussion of the theory involved. This paper undoubtedly marked the high-water mark for purposive selection (as contrasted to random selection, or attempts thereat) because the major papers subsequent to this one seemed to assume random selection, or to condemn purposive selection. Bowley himself later made the comment, when discussing the paper by Neyman (1934), that he thought his 1926 paper had "damned it (purposive selection) with very faint praise".

In the decade immediately preceding Bowley's paper,¹ the theory of sampling finite populations with equal selection probabilities and without replacement began to develop in earnest. Isserlis (1916 and 1918), Edgeworth (1918), Tschuprow (1923) and Neyman (writing under the name J. Splawa-Neyman) (1925) derived formulae for the first four moments of the sample mean. Mortara (1917) developed a formula for the standard error of the mean. Neyman, in addition to giving formulae for the first four moments of the sample mean, gave formulae for the first two moments of the sampling variance. Due to inaccessibility, formidable notational systems, or other reasons, none of these papers stimulated wide growth in the field of sample survey theory.

¹ For a very interesting discourse on part of the history of the development of sampling theories and practice in the five decades preceding Bowley's paper see the article by You Poh Seng (1951).

Tschuprow, in that same 1923 paper, developed the principles of the theory of the optimum allocation of units in stratified sampling. In fact, Zarkovic (1956), in his article on the history of sampling methods in Russia, gives the impression that had the works of Tschuprow been more accessible, and had they had a system of notation which was easier to understand, they might be the monumental works being cited in this chapter. Zarkovic refers to an earlier Russian work which mentions that Tschuprow, in 1900 in a report "On Sampling Methods", dealt only with probability samples (Western reliance was then on purposive selection) and developed the basic theory of surveys. Also several of Tschuprow's other works, especially those in connection with the Russian census circa 1913, where many of the techniques were applied, were quite suggestive of techniques and theoretical developments which were "derived" much later in the more familiar Western literature.

In fact, if Zarkovic is right, Russian statisticians were in the forefront of the development of sample survey theory and techniques up to the time of the death of Lenin. This was due, undoubtedly, to the fact that

"These Russian statisticians watched the development of statistical theory all over the world, they published translations of the most important foreign contributions and they reviewed for their readers all important results, whatever country provided them. This keen activity supplied the base from which they sought solutions to their own practical problems." (1956, p. 336)

He goes on to say, though, that in the years after the death of Lenin, political considerations became increasingly important in Russian statistical effort, and less reliance was placed on theory in the practical application of "statistical" techniques. (For an illustration of this non-reliance on theory in the Communist world, see Section 9.0.) There are indications, however, that, at least in Russia, a reliance on theory is again emerging [Yezhov (1957)].

An impetus to sample survey development, following the above mentioned papers and the fundamental statistical contributions of Pearson, Fisher, and others, resulted from the paper by Neyman (1934) entitled "On Two Different Aspects of the Representative Method". Modern developments in the field of sample survey theory can be said to have begun with this paper. Several new concepts (i.e., new to most researchers; some had been anticipated in earlier articles which had not received as much attention) were introduced and discussed, such as:

(i) optimum use of resources in sample surveys,

(ii) criteria for the choice of the sampling unit,

(iii) use of preliminary inquiry for improving the design of the survey, and

(iv) optimum allocation for assigning units to different strata subject to the restriction that the sample shall have a fixed total number of sampled units.

Neyman also discussed the advantages of random over purposive selection of units, and also the advantages of using stratified sampling, going so far as to make the statement that the only recommended method of sampling is stratified random sampling.


The next major paper to appear was that by Hansen and Hurwitz (1943). Faced with the situation where the sampling unit and the ultimate unit of analysis are not identical, they examined the question of sampling with unequal selection probabilities. In situations where the sampling units are aggregates of ultimate units (i.e., clusters), limitations on resources may prohibit the effort needed to group the ultimate units into clusters of equal size by artificial methods. These authors noticed that if one sampled units with replacement using probabilities ($p_i$) exactly proportional to the values ($y_i$) of their aggregate characteristics (i.e., $p_i = y_i/T$, $T$ = the population total), the mean of these aggregate values, each weighted by the reciprocal of its respective selection probability, has a sampling variance of zero since each $y_i/p_i \equiv T$.
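The zero-variance observation can be checked numerically. The following is a minimal sketch, assuming a small hypothetical population `y` (the values and the function name are illustrative only) and the estimator described above: the mean, over the draws, of each sampled value divided by its selection probability. With probabilities exactly proportional to the values, every replicate returns the total; with equal probabilities the replicates scatter.

```python
import random


def weighted_mean_estimate(y, p, n, rng):
    """Draw n units with replacement using selection probabilities p and
    return the mean of y_i / p_i over the draws (an estimate of the total)."""
    draws = rng.choices(range(len(y)), weights=p, k=n)
    return sum(y[i] / p[i] for i in draws) / n


# Hypothetical aggregate values for five sampling units (illustrative only).
y = [12.0, 30.0, 7.0, 45.0, 6.0]
T = sum(y)                          # population total
p_exact = [yi / T for yi in y]      # probabilities exactly proportional to y
p_equal = [1.0 / len(y)] * len(y)   # equal selection probabilities

rng = random.Random(0)
exact = [weighted_mean_estimate(y, p_exact, 3, rng) for _ in range(1000)]
equal = [weighted_mean_estimate(y, p_equal, 3, rng) for _ in range(1000)]

spread_exact = max(exact) - min(exact)   # zero apart from rounding error
spread_equal = max(equal) - min(equal)   # clearly positive

print(f"population total: {T}")
print(f"range of estimates, p proportional to y: {spread_exact:.2e}")
print(f"range of estimates, equal probabilities: {spread_equal:.2f}")
```

In practice the values are unknown before sampling, which is why a measure of size roughly proportional to them is used instead.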

These considerations led Hansen and Hurwitz to consider selecting sampling units with probabilities proportional to some measure of size so as to reduce the sampling variance of the estimator. The scheme that they proposed was essentially a stratified two-stage scheme, selecting one primary unit per stratum at the first stage with probabilities proportional to some measure of size, and at the second stage selecting the elements in each selected primary unit with equal probabilities and without replacement. An unweighted estimator was shown to be unbiased and to have a smaller variance than if the sampling plan was based on equal first-stage selection probabilities.

The appearance of this article by Hansen and Hurwitz stimulated attempts to generalize the approach, both in terms of estimators involving varying probabilities (rather than being restricted to equal selection probabilities as had most previous studies) and in terms of selecting more than one first-stage unit per stratum so that the sampling variance of the estimator could be calculated. Not all of these papers will be mentioned here as they are not of immediate relevance to this dissertation.

Sukhatme and Narain (1951) outlined a scheme where the primary sampling units (p.s.u.'s) were selected with replacement and with probabilities proportional to their sizes as measured by the number of sub-units in each primary unit. Then the second stage units were selected without replacement and with equal selection probabilities. They presented the theory, and also compared the efficiency of the following two schemes:

(A) select a random sample of $mn_i$ sub-units from the i-th primary unit, where $m$ is an integer, and $n_i$ denotes the number of times the i-th primary unit appears in the sample, $\sum n_i = n$, and

(B) select $n_i$ independent random sub-samples of $m$ sub-units from the i-th primary unit.

The variances of the sample means are respectively:

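\[
V_A = \frac{1}{n}\left[\sigma_b^2 + \sum_{i=1}^{N} \frac{M_i - m}{m M_i}\, p_i \sigma_i^2 - (n-1) \sum_{i=1}^{N} \frac{p_i^2 \sigma_i^2}{M_i}\right]
\]

\[
V_B = \frac{1}{n}\left[\sigma_b^2 + \sum_{i=1}^{N} \frac{M_i - m}{m M_i}\, p_i \sigma_i^2\right]
\]

where $N$ = number of p.s.u.'s in the stratum;

$M_i$ = number of sub-units in the i-th p.s.u.;

$\sigma_b^2$ = the between p.s.u. variance;

$\sigma_i^2$ = the within the i-th p.s.u. variance.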

Thus in their plan (A) that part of the variance attributable to sub-sampling is reduced to the order of

\[
\frac{m(n-1)}{(\bar{M} - m)N}
\]

where

\[
\bar{M} = \sum_{i=1}^{N} M_i / N,
\]

which, it may be noted, is very nearly equal to the over-all sampling fraction.
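The two variance expressions can be compared numerically. The sketch below is a minimal illustration, assuming the forms $V_B = \frac{1}{n}[\sigma_b^2 + \sum_i \frac{M_i - m}{m M_i} p_i \sigma_i^2]$ and $V_A = V_B - \frac{n-1}{n} \sum_i p_i^2 \sigma_i^2 / M_i$, with $p_i$ proportional to $M_i$. The stratum values and function names are hypothetical and chosen only for illustration.

```python
def variance_scheme_b(sigma_b2, p, sigma2, M, m, n):
    """V_B = (1/n) [ sigma_b^2 + sum_i (M_i - m)/(m M_i) p_i sigma_i^2 ]."""
    within = sum((Mi - m) / (m * Mi) * pi * s2
                 for pi, s2, Mi in zip(p, sigma2, M))
    return (sigma_b2 + within) / n


def variance_scheme_a(sigma_b2, p, sigma2, M, m, n):
    """V_A = V_B - ((n - 1)/n) sum_i p_i^2 sigma_i^2 / M_i."""
    reduction = (n - 1) * sum(pi ** 2 * s2 / Mi
                              for pi, s2, Mi in zip(p, sigma2, M))
    return variance_scheme_b(sigma_b2, p, sigma2, M, m, n) - reduction / n


# Hypothetical stratum of four primary units (illustrative values only).
M = [40, 60, 100, 200]             # sub-units per primary unit
p = [Mi / sum(M) for Mi in M]      # selection probabilities proportional to size
sigma2 = [9.0, 4.0, 6.25, 5.0]     # within-unit variances
sigma_b2 = 2.5                     # between-unit variance
m, n = 5, 3                        # sub-sample size and number of draws

v_a = variance_scheme_a(sigma_b2, p, sigma2, M, m, n)
v_b = variance_scheme_b(sigma_b2, p, sigma2, M, m, n)

print(f"V_A = {v_a:.5f}")
print(f"V_B = {v_b:.5f}")
print(f"reduction under scheme (A) = {v_b - v_a:.5f}")
```

The difference between the two is small whenever the over-all sampling fraction is small, in line with the remark above.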

The estimates of the between and within variances are as follows:

for case (A):

\[
\hat{\sigma}_w^2 = \sum_{i=1}^{v} \sum_{j=1}^{mn_i} (y_{ij} - \bar{y}_i)^2 / (nm - v)
\]

\[
\hat{\sigma}_b^2 = \sum_{i=1}^{v} \frac{n_i(\bar{y}_i - \bar{y})^2}{n-1} - \frac{\hat{\sigma}_w^2}{m}\left[\frac{E(v) - 1}{n-1}\right],
\]

where $v$ = the number of distinct p.s.u.'s in the sample;

for case (B): the estimates come straight from analysis of variance considerations, since the sub-samples are drawn independently.

Wilks (1960) raised an objection to the above scheme by noting that it is conceivable that $mn_i$ could exceed $M_i$; thus the above method could require observation of more than the total number of secondary units available.

Wilks suggested that one let $N_i = M_i$ = the number of elements in the i-th p.s.u. (and to consider a reasonable approximation where the $N_i$ are rounded to the nearest integral multiple of $m$). Then one is to draw a sample of $n$ p.s.u.'s, in a manner such that a sample of $a_i m$ sub-units is drawn from the i-th p.s.u., where the $a_i$ ($i = 1, 2, \ldots, k$) are random variables having the hypergeometric distribution. This scheme may be regarded as one in which sampling is done without replacement at both stages, whereas the scheme proposed by Sukhatme and Narain involves sampling with replacement at the first stage and without replacement at the second stage.

Wilks suggests that the estimator for the mean be

\[
\bar{y} = \frac{1}{n} \sum_{i=1}^{k} a_i \bar{y}_i = \frac{1}{mn} \sum_{i=1}^{k} m a_i \bar{y}_i,
\]

which is self-weighting. The expression for the estimate of the variance of this mean is, unfortunately, rather complicated and is given by Wilks (1960, p. 246).

In the early 1950's many articles appeared which incorporated unequal selection probabilities into the formulation of estimators. The majority of these were by Indian authors, and have not received much attention in this country. The article that is probably the best known of the ones that appeared in this interval is that by Horvitz and Thompson (1952). In their "Generalization of Sampling Without Replacement from a Finite Universe" they formulated three classes of linear estimators for the population total with coefficients for class one dependent on the presence or absence of a unit in the sample, for class two dependent on the order of draw and for class three dependent on the particular sample involved. This article was the first to incorporate the ideas of what was subsequently formalized as the axiomatic approach to the formation of classes of estimators, although they did not explore the logical consequences of this formulation. These requirements on the coefficients for the classes will be seen later to be the same as our class two, class one and class three respectively.

They determined coefficients for each class by imposing (a) the condition of unbiasedness (that the expected value of the estimator be equal to the total); and (b) that the coefficients so determined shall be independent of the properties of the population. The authors themselves were aware that they were indicating only three of the possible classes of linear estimators of the total when sampling a finite population. It was subsequently shown that there were in fact seven classes of linear estimators for sampling a finite population with unequal (general) selection probabilities and without replacement. It will be seen in Section 4.0 that these same seven classes can readily be adapted to the case where the sampling is done with replacement.


Horvitz and Thompson themselves indicate that they did not consider the general solution of determining a "best linear unbiased" estimator for the total of a finite population sampled with arbitrary probabilities and without replacement. Godambe (1955) considered this question and demonstrated that a uniformly minimum variance unbiased estimator for the total or mean of a finite population does not exist.

Godambe also put forward a "unified theory of sampling from a finite population". He developed a system of notation to indicate the element by the unit selected on a particular draw and the sequence of units preceding the individual unit selected (i.e., the particular sample involved). He also defined symbolically a system of probabilities to handle this case, and proposed a "general" estimator which can be seen to belong to class seven among the classes developed axiomatically in this dissertation.

Koop (1957) recognized the systematic approach to the development of classes of estimators implicit in the works of Horvitz and Thompson and Godambe, but not directly recognized by those authors. He posited three axioms, axioms which are descriptions of physical realities, and then, in a systematic fashion, derived seven classes of linear estimators. This approach will be discussed more fully in Section 4.0, and the axiomatic approach applied to the problem of determining classes of estimators for a system of sampling where the probabilities are arbitrary and the sampling is done with replacement of each unit before the next is drawn.


In their article, Des Raj and Khamis (1958) made a comparison between the arithmetic mean of the distinct units observed in the sample when it is drawn with replacement, and the arithmetic mean of the totality of observed units including repetitions. They assumed equal selection probabilities and made the comparison for both the case when the sample size is fixed and the number of distinct units is random, and the case when the number of distinct units is prespecified.

Basu (1958), in his article "On Sampling with and without Replacement", written independently about the same time, made the same comparison as did Des Raj and Khamis, though not by the analytic method which they used.
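The comparison between the two means can be illustrated by exact enumeration for the fixed-sample-size case. The sketch below is only an illustration under stated assumptions: a small hypothetical population `y`, equal selection probabilities, and all N**n ordered samples drawn with replacement treated as equally likely. It is not the analytic method of any of the cited authors, and the function names are invented for the example.

```python
from fractions import Fraction
from itertools import product


def exact_moments(values):
    """Mean and variance of a list of equally likely outcomes."""
    k = len(values)
    mean = sum(values, Fraction(0)) / k
    var = sum((v - mean) ** 2 for v in values) / k
    return mean, var


def compare_means(y, n):
    """Enumerate every ordered sample of n draws with replacement and return
    (mean, variance) of the mean over all draws and of the mean over the
    distinct units only."""
    mean_all, mean_distinct = [], []
    for sample in product(range(len(y)), repeat=n):
        mean_all.append(Fraction(sum(y[i] for i in sample), n))
        units = set(sample)   # distinct units, identified by index
        mean_distinct.append(Fraction(sum(y[i] for i in units), len(units)))
    return exact_moments(mean_all), exact_moments(mean_distinct)


# Hypothetical population of four units (illustrative values only).
y = [3, 8, 10, 19]
(all_mean, all_var), (dist_mean, dist_var) = compare_means(y, 3)

print("population mean:", Fraction(sum(y), len(y)))
print("mean of all draws:      expectation", all_mean, "variance", all_var)
print("mean of distinct units: expectation", dist_mean, "variance", dist_var)
```

For this population both estimators have expectation equal to the population mean, and the mean of the distinct units has the smaller variance.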

Roy and Chakravarti (1960) acknowledged the researches of Godambe (1955), Des Raj and Khamis (1958) and Basu (1958) and said they were going further, obtaining an "admissible" estimator, together with a "complete class of estimators" for a very general scheme of sampling. This very general scheme which they propose has some exotic properties; however, it appears that they have induced generality by deliberately leaving some details unspecified. Their estimator can be shown to belong to class two.

Godambe (1960) also demonstrated the "admissibility" of an estimator which is algebraically equivalent to that proposed by Roy and Chakravarti when the same restrictions are imposed. Godambe's estimator is the same as given for the class two estimator in Section 4.5. This article will also be discussed at length later, in Section 5.0.


Since this dissertation is discussing principally sampling with replacement, the following additional recent articles are of interest. Nanjamma, et al. (1959) propose a scheme of sampling with replacement which leads to an unbiased estimator. Their scheme is to select one unit with probability proportional to some auxiliary variable $x$, replace it and then select the rest of the sample units from the whole population with equal probability with replacement at each draw. For this selection procedure the ratio estimator, $\hat{R} = \bar{y}/\bar{x}$, is shown to be unbiased for estimating the population ratio $R$. The sampling variance and an unbiased estimator thereof would be different from those in the case of sampling with equal probabilities without replacement of the units. The variance estimator they give as:

\[
\hat{V}(\hat{R}) = \hat{R}^2 - \frac{\sum_{i=1}^{v} n_i(n_i - 1)\, y_i^2 + \sum_{i \neq j} n_i n_j\, y_i y_j}{n(n-1)\, \bar{x} \bar{X}}
\]

where $\bar{X}$ is the known population value, and $n_i$ is the number of times the i-th of the $v$ distinct units appears in the sample. They also mention another modification of the probability proportional to size (pps) with replacement scheme which has the first unit selected with probability proportional to the size of the x-characteristic (ppx), replaced, and then the remaining $(n-1)$ units selected with replacement with ppz, where $z$ is another indicator of size. The estimator for this case is algebraically the same as the usual biased ratio estimator in the case of complete pps with replacement sampling, but it is unbiased for the ppx-ppz scheme by virtue of the new probability system.
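The unbiasedness of the ratio estimator under the first of these schemes (first unit drawn with probability proportional to x, the remaining n - 1 units drawn with equal probability, all with replacement) can be verified by exact enumeration on a small example. The sketch below is a minimal illustration: the populations `y` and `x` and the function name are hypothetical, and exact rational arithmetic is used so that the comparison with the population ratio is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product


def expected_ratio_estimate(y, x, n):
    """Exact expectation of (sample y-total) / (sample x-total) when the first
    unit is drawn with probability proportional to x and the remaining n - 1
    units are drawn with equal probability, all draws being with replacement."""
    N, X = len(y), sum(x)
    expectation = Fraction(0)
    for sample in product(range(N), repeat=n):
        # Probability of this ordered sample under the selection scheme.
        prob = Fraction(x[sample[0]], X) * Fraction(1, N) ** (n - 1)
        ratio = Fraction(sum(y[i] for i in sample), sum(x[i] for i in sample))
        expectation += prob * ratio
    return expectation


# Hypothetical population values (illustrative only).
y = [4, 9, 15, 22]
x = [2, 5, 6, 11]
R = Fraction(sum(y), sum(x))   # population ratio

print("population ratio R:", R)
print("E[ratio estimate], n = 2:", expected_ratio_estimate(y, x, 2))
print("E[ratio estimate], n = 3:", expected_ratio_estimate(y, x, 3))
```

The expectation equals the population ratio for each sample size tried, which is what the unbiasedness claim requires.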

Stevens (1958) postulated a scheme whereby sampling with replacement could be made equivalent to sampling without replacement, thus taking advantage of the simpler probabilistic manipulations. He showed that sampling without replacement with pps can be achieved if the sampling units are grouped with reference to size. Then when the same unit is selected a second (or more) time, it is substituted by another unit of the same size chosen at random. The estimate of the population total is then formally the same as when sampling is done with replacement.

Des Raj (1958) compared the efficiency of an estimator for the case of sampling with probability proportional to size and with replacement with the efficiency of some alternative methods such as: simple average (simple random sampling), ratio, regression, proportionate allocation stratified and optimum allocation stratified sampling. Zarkovic (1960), in making essentially the same comparison, added difference estimates and dropped the optimum allocation stratified sampling estimate.

One final aspect of the literature apropos to this dissertation is the relative absence of any consideration of estimation criteria (other than unbiasedness and minimum variance) applicable directly to samples from finite populations. By this is meant the absence of criteria which do not depend on artificial devices such as letting the size of the finite population approach infinity. For instance, Madow (1948) claims that under very broad conditions the usual theorems concerning the limiting distributions of estimates hold for estimates based on samples taken from finite populations, at random without replacement. He also states that under the same conditions, the same conclusions are true for samples drawn with replacement, if the approach to infinity by the size of the "finite" universe is within the limitations imposed by "condition w". In his paper, Madow "proves" that the limiting distribution of the mean is normal "provided only that as the universe increases in size, the higher moments do not increase too rapidly as compared with the variance, and that for sufficiently large sizes of sample and population the ratio of n to N is bounded away from one."

Another frequently used conceptual device [see, for instance, Cochran (1946), Des Raj (1958)] is to make the assumption that the finite population itself is a random sample from an infinite super-population, making the sample a second-stage sample.

Using "consistency" as an illustration, this being a universally accepted desirable criterion for any estimator, very few authors use a definition applicable to samples from finite populations. For instance, in the textbooks on sample survey theory: Yates (1953) does not bother to give a definition; Cochran (1953), Hansen, Hurwitz and Madow (1953), and Sukhatme (1953) all give the "infinite" definition involving convergence in probability. Cochran (p. 13) does actually give a "finite" definition of consistency, but in the next paragraph he returns to the convergence in probability definition saying that "the idea of consistency does not play an important part in the subsequent exposition." Only Deming (1960) uses a suitable definition, although he does not state it explicitly but refers to Fisher's paper "On the Mathematical Foundations of Theoretical Statistics" (1921). He does make the statement, though, that "the bias of this estimate is inconsistent, i.e., the bias if any does not diminish to zero as $n_i$ approaches $N_i$" (p. 320).

This whole question of estimation criteria for samples from finite populations is discussed in Sections 3.3 and 3.4.


3.0 ON THE BASIC CRITERIA FOR A THEORY OF SAMPLING

3.1 Components of the Sampling Problem

It has been said that sample survey theory is easy because it deals

mainly with the estimation of means or totals and the variance of these

estimates. This statement is made in spite of the multitude of problems

which can beset a sampler in real life situations, in spite of a be-

wildering maze of formulae which can be present for a very involved

multi-stage survey, and also in spite of complex formulas and difficult

terminology which often confuse the practitioner in the field and those

trying to glean some knowledge from the report of the survey. The two

conflicting viewpoints arising in the above situations would seem to be

resolved if the first is attributed to a non-sampler who is looking at

sampling from the broad spectrum of the traditional approach to estima-

tion and the second is attributed to the practicing sampler who sees the

multitude of problems that arise when actually conducting a survey.

The resolution of these viewpoints would be very difficult, since

many aspects of the traditional (infinite) approach to estimation do not

hold when applied to the sampling of a finite population. (By the term

finite is meant a size below that which might be categorized as "indef-

initely large", for which the infinite theory would hold, at least

approximately.)

In the study of the theory behind various aspects of sampling a

conglomeration of problems may be encountered: one can select the sam-

ple systematically, purposefully or probabilistically; one can have an

unrestricted sample or one can stratify, or use clusters, chunks or

quotas; one can have equal, unequal, arbitrary or judgement probabil-

ities or probabilities proportional to certain measures of size; one

can have a single-stage or multiple-stage sample; one can use mean-per-

sampling-unit, regression, ratio or more elaborate estimators; one can

use biased, deliberately or accidentally, or unbiased estimators; one

can study the effects of response and non-response errors; and so forth.

But behind all these related or unrelated aspects of sampling there are

five components of any sampling plan, the first three of which, at least,

must be specified prior to any theoretical or empirical investigation.

First and foremost there must be a well defined UNIVERSE; a

universe which consists of the totality of ultimate units of analysis

about which information is desired and which is invariant under further

considerations of the particular sampling investigation being carried

out. For the universe one must next specify the FRAME, i.e., a descrip-

tion (e.g., by maps) and/or listing of all sampling units (each contain-

ing one or more units of analysis) which comprise the universe or a

sufficient portion thereof, if the sampling operation is planned in

several stages, to conduct the survey. For a full discussion of the

concept of the "frame" see Deming (1960, ch. 3).

This dissertation will be concerned with a single universe from

which the units (i.e., the ultimate units of interest) can be selected

in one stage. Thus the universe under consideration can be said to be

simple. Correspondingly the frame is simply a list.

Given the universe and the frame, next define a PROBABILITY SYSTEM²

for the possible selection of every unit revealed by the frame [see Koop

(1960)]. When the frame is a list, as above, the probability system

will be defined by a single sequence of non-negative numbers which sum

to unity. For more complex frames (those which show the universe in

separate portions and in which the units are in some hierarchal or

nested order) the probability system will be correspondingly complex

and will consist of a sequence of probabilities specific to each unit or

subdivision of the frame (strata, first stage units, second stage units,

etc.). For a geometrical representation of a probability system see

Feller (1957, p. 118 ff).

Then the SAMPLING PROCEDURE comes operationally from the selection

probability system and is the scheme for determining which particular

units are to be drawn for the sample.
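For the list-frame case described above, the probability system and a with-replacement sampling procedure might be sketched as follows. This is only an illustrative sketch and not part of the original development: the function names, the four-unit frame and the measures of size are all hypothetical, and Python's standard `random.choices` is assumed as the drawing mechanism.

```python
import random

def make_probability_system(measures):
    """Turn non-negative measures of size into selection probabilities summing to one."""
    if any(m < 0 for m in measures):
        raise ValueError("measures of size must be non-negative")
    total = sum(measures)
    if total <= 0:
        raise ValueError("at least one measure of size must be positive")
    return [m / total for m in measures]

def draw_with_replacement(frame, probabilities, n, seed=None):
    """Draw n units from a list frame, independently and with replacement."""
    rng = random.Random(seed)
    return rng.choices(frame, weights=probabilities, k=n)

frame = ["U1", "U2", "U3", "U4"]                # hypothetical list frame
p = make_probability_system([10, 20, 30, 40])   # hypothetical measures of size
sample = draw_with_replacement(frame, p, n=3, seed=1961)
print(p, sample)
```

The same unit may appear more than once in `sample`, which is the feature that distinguishes the with-replacement probability system from the without-replacement one.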

And also, for every logical combination of a specific frame and a

specific probability system, there is a specific problem of determining

an ESTIMATOR; the problem of selecting the arithmetical procedure of

estimation which will "optimally" (in some sense of the word) give the

information desired from the survey in the first place, i.e., the esti-

mates of the population values of the characteristics under observation.

² The use of the word "system" follows the usage of Carmichael (1937)
who states "A set of objects, with the associated rule or rules of
combination, is called a system, or, more explicitly, a mathematical
system." Thus the use of the term system is intended to connote not
only the simple P_i values but also any applicable associated rules of
combination necessary for full specification.

Schematically the directions of influence between the five compo-

nents can be represented by the following diagram:

UNIVERSE ----> FRAME ----> PROBABILITY SYSTEM
                                   |
                                   v
              SAMPLING OPERATION ----> ESTIMATOR

Given the frame and the probability system, one may be able to get

an "optimum" sampling plan and an "optimum" estimator. Vary either the

frame or the probability system, or both, and the problem of getting the

sample and estimates, or comparing various methods for obtaining the sam-

ple and estimates, changes. Problems of choice within the last two com-

ponents, i.e., the sampling operation and the estimation process, con-

stitute most sampling research, and are the source of the statements

that the study of sample survey theory is rather difficult and frequent-

ly involved in algebraic complications. But all five components must

be spelled out in detail for any individual survey. Further, the first

three components must be specified accurately and completely, for no

amount of refinement or elaborateness in the last two can overcome de-

fects in the first three, e.g., definition or delineation of the frame

or selection probabilities.

It might be noted here that the formulation of these five compo-

nents has ignored several other parts of any sampling problem, equally

as important as the five given, but which depend on the individuals

planning the survey, and not on the process itself. These non-prob-

abilistic problems are involved with the mechanical process of

accumulating the sample data, and would include requirements that the

objectives of the study are well defined, that the appropriate question-

naire is designed to obtain the desired information in a manner which

can be used, that the answers obtained are to the questions on the sur-

vey questionnaire as designed, and not as interpreted by the interview-

er, and that the units actually interviewed are the units selected by

the sampler designing the survey.

The neglect of these ideas, fundamental to the study of sample

survey theory, is a great source of confusion and difficulty in much of

the research into comparisons of sampling methodologies done thus far.

3.2 On the Question of Sampling with or without Replacement

As an example of the application of the principles discussed in the

preceding section, the statement can be made that there is no valid di-

rect comparison between sampling with replacement and sampling without

replacement. A comparison between the two is possible, but only on an

indirect, total (or multiple) basis. That is, since completely differ-

ent probability systems are involved, two complete sampling plans must

be run, with a final judgement as to which is better depending on com-

parisons of end results for items such as variances and costs involved.

With equal sample sizes, and no consideration of cost, there

is agreement that sampling without replacement is better than sampling

with replacement (using the mean of the sample units as the estimator)

by virtue of a smaller variance, i.e., $(N-n)\sigma^2/nN$ versus $\sigma^2/n$. How-

ever, when cost is considered, the conclusions are not clear-cut, for the

cost of the with-replacement sample is dependent, not on the total

sample size, but on the number of distinct units included in the sample.

The problem of making comparisons between the two then involves the dis-

tribution of v (the number of distinct units when sampling with replace-

ment) which is discussed in Section 8.0.
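The equal-sample-size comparison, and the role of the number of distinct units, can be checked by complete enumeration on a very small universe. The following is a minimal sketch under stated assumptions: the four-unit population is hypothetical, selection probabilities are equal, and the variances are obtained by listing every possible sample rather than from a formula, so that no convention for the divisor of $\sigma^2$ needs to be assumed.

```python
from fractions import Fraction
from itertools import combinations, product
from statistics import mean

def variance(values):
    """Variance over equally likely outcomes (divisor = number of outcomes)."""
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

def var_mean_without_replacement(x, n):
    # Every unordered subset of n distinct units is equally likely.
    return variance(mean(s) for s in combinations(x, n))

def var_mean_with_replacement(x, n):
    # Every ordered n-tuple of draws is equally likely (equal probabilities).
    return variance(mean(s) for s in product(x, repeat=n))

def expected_distinct_units(N, n):
    # E(v): average number of distinct units over all N**n ordered samples.
    return mean(Fraction(len(set(s))) for s in product(range(N), repeat=n))

x = [Fraction(v) for v in (1, 2, 4, 7)]    # hypothetical population, N = 4
n = 2
print(var_mean_without_replacement(x, n))  # 7/4
print(var_mean_with_replacement(x, n))     # 21/8
print(expected_distinct_units(len(x), n))  # 7/4
```

In this example the with-replacement sample of n = 2 draws contains on average only 7/4 distinct units, which is why a cost comparison based on the nominal sample size alone can mislead.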

Apropos to this discussion of with versus without replacement, two

articles already mentioned in Section 2.0 will now be discussed briefly:

that by Des Raj and Khamis (1958) and that by Basu (1958).

Des Raj and Khamis compare the arithmetic mean of the distinct units

$(\bar{y}_v = \frac{1}{v} \sum_{i=1}^{v} y_i)$ with the arithmetic mean of the totality of observed

units $(\bar{y}_n = \frac{1}{n} \sum_{i=1}^{v} k_i y_i$, where $k_i$ is the number of times the i-th unit

appears in the sample). For the two cases that they examine, the

applicability of their results is restricted by assuming equal selection

probabilities $(P_i = 1/N)$.

For their case A (n fixed, v a random variable) they then have a

neat algebraic inequality to show that

$V(\bar{y}_v) \le V(\bar{y}_n)$,

to wit:

1o - = Q 0

n

Thus for the restrictive case of sampling with replacement and with

equal selection probabilities, Des Raj and Khamis have shown that the

arithmetic mean of the distinct unit characteristic values in the sample,

has smaller variance than the arithmetic mean of the totality of ob-

served variate values. (Actually the strict inequality only holds for

n ≥ 2, but for n = 1 no estimate of the variance is possible.)
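The comparison between the mean of the distinct units and the mean of all draws can be verified by enumerating every ordered sample for a small universe. This is a sketch only, assuming equal selection probabilities and a hypothetical four-unit population; the function name `compare_estimators` is not from the source. The sketch uses n = 3 because for n = 2 the two means coincide on every possible sample.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

def variance(values):
    """Variance over equally likely outcomes."""
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

def compare_estimators(x, n):
    """Return (E, V) for the all-draws mean and for the distinct-units mean."""
    all_draws, distinct = [], []
    for s in product(range(len(x)), repeat=n):       # equally likely ordered samples
        all_draws.append(mean(x[i] for i in s))       # unit i counted k_i times
        distinct.append(mean(x[i] for i in set(s)))   # each distinct unit counted once
    return (mean(all_draws), variance(all_draws)), (mean(distinct), variance(distinct))

x = [Fraction(v) for v in (1, 2, 4, 7)]   # hypothetical population, N = 4
(e_n, v_n), (e_v, v_v) = compare_estimators(x, n=3)
print(e_n, v_n)   # 7/2 7/4
print(e_v, v_v)   # 7/2 49/32
```

Both estimators average to the population mean 7/2 over all samples, while the distinct-units mean has the smaller variance in this example.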

Basu (1958) in an article entitled "On Sampling with and Without

Replacement" attempted the same comparison that Des Raj and Khamis made,

utilizing an "indirect proof" of the inequality

$E\left[\frac{N-v}{N-1} \cdot \frac{\sigma^2}{v}\right] < \frac{\sigma^2}{n} \quad (n > 1)$.

(Note that had Basu used a definition for $\sigma^2$ which used N-1 as a divisor

rather than N, the left-hand expression would have simplified consider-

ably from the standpoint of taking expectations.) The proof of this

inequality is not apparent. For the case of equal selection probabil-

ities, the conclusion of his argument runs as follows (with notational

changes to correspond with the above):

"Since $\bar{y}_n$ is an unbiased estimator of $\bar{Y}$, it follows at once
that $\bar{y}_v$ is also unbiased. It also follows that, for any convex
(downwards) loss function, $\bar{y}_v$ has a uniformly better risk func-
tion than $\bar{y}_n$. In particular $V(\bar{y}_v) \le V(\bar{y}_n)$, the sign of
equality holding only when n = 1. Thus the inequality is proved.
We may note in passing that T (the vector of distinct observa-
tions) is a sufficient statistic here although not a complete
one. No uniformly best unbiased estimator of $\bar{Y}$ exists."
(1958, p. 290).

Basu's argument for the general case of arbitrary probabilities also

rests on the idea of sufficiency and he claims that the same inequality

holds. But the concept of sufficiency is not relevant for finite popu-

lations (see Section 3.3.3), so where does the argument rest? Whether

or not the inequality does hold in fact, merely stating an intuitive

belief does not constitute proof. It can be conceded that the vector of

distinct observations does, in a physical sense, contain all the informa-

tion in the sample, but with selection probabilities and possible obser-

vational weights necessary for estimation known in advance and independ-

ent of the characteristics under study, or determined by counting the

appearances of the units, a mere statement of sufficiency does not

constitute a proof, unless one is redefining sufficiency.

From the above arguments, should one be restricted to sampling

without replacement and forget entirely sampling with replacement? This

question has not been answered. This is not the question actually

attacked by any of the authors, or what was actually proved in the one

case. The question of whether one should sample with or without re-

placement, as does the question of the numerical structure of the se-

lection probability construct, arises in the consideration of the

probability system to be used in a given survey problem. The decision

may be made on the basis of choice, or may be dictated by external cir-

cumstances, but once specified cannot be altered without changing the

entire problem. And it is a decision which must be made before one pro-

ceeds to the steps of selecting an "optimum" sampling procedure or an

"optimum" estimator.

It is undoubtedly for these reasons that the various authors who

consider the question of sampling with or without replacement start out

saying this is the comparison they are making, but actually make a com-

parison between using an estimator based on the totality of observed

units and one based on just the distinct units drawn in the sample of n,

both for the case of sampling with replacement. As said earlier, a

comparison is possible, but only by duplicating the entire sampling plan

and then comparing end results, remembering that the costs involved are

behind every step of every comparison.

3.3 The Applicability of Traditional Estimation Criteria

From the above arguments, then, there are five components of any

sampling plan, all of which are essential to the estimate which is

finally obtained. Of these five components

Universe,

Frame,

Probability System,

Sampling Operation, and

Estimator

the first three must be completely specified before any problem involv-

ing the last two can be discussed. The problems involved in obtaining

an "optimum" sampling plan for any given situation, subject to consid-

erations such as costs, expediency, etc., will not be discussed in this

dissertation.

When one comes to the position of deciding on an estimator to be

used to arrive at an estimate of the desired mean or total (or other

population value), one can choose from within a variety of specific

estimators for a given situation. However, behind this choice of a

specific estimator lies the problem of determining which one is "best"

for the purposes at hand, or even deciding what criteria should be used

in resolving the question of bestness. Neyman (1952, p. 158) made the

following comment along these lines:

"While there is likely to be general agreement as to the
desirability of using the best, or at least a satisfactory,
method of making assertions regarding Tl, there may be diffi-
culty in explaining exactly what properties a method of
estimation should possess in order to qualify as the 'best'
or as 'satisfactory'. And without having such an exact
explanation, without knowing exactly what we are looking
for, it is obviously hopeless to expect that we shall
ever find it. If it were possible to devise a method
of using the values of the observable random variables
to predict exactly and without fail the value of the
estimated parameter, then there would be universal agree-
ment that the method in question is the best imaginable.
However, it is obvious that, barring some very artificial
examples, such a method does not exist and we have to put
up with unavoidable errors."

With this "unavoidable error" thus present in any estimate, what cri-

teria are to be used for determining the choice of estimator? This

question is particularly appropriate to the problem of estimation based

on a sample from a finite population. The traditional, or classical,

approach to this question of criteria for estimators has been based on

concepts developed for and largely applicable to infinite populations,

and samples therefrom.

Fisher's magnum opus on estimation (1925) posited that:

"Any body of numerical observations ... may be inter-
preted as a random sample of some infinite hypothetical
population of possible values. Problems of estimation
arise when we know, or are willing to assume, the form
of the frequency distribution of the population, as a
mathematical function involving one or more unknown
parameters, and wish to estimate the values of these
parameters by means of the observational record avail-
able. A statistic may be defined as a function of the
observations designed as an estimate of any such param-
eter. The primary qualifications of satisfactory sta-
tistics may most readily be seen by their behaviour
when derived from large samples." (p. 701)

From this beginning, then, the criteria for determining "bestness" in

estimators have been developed as if all estimators that might be ques-

tioned are based on samples that came from an infinite population.

But what of the problem of estimation based on samples from finite

populations? Fisher makes the statement that estimation problems arise

when one knows or is willing to assume the form of the frequency distri-

bution of the population. However, in sample survey theory little or

no attention is paid to the abstract distribution of the characteristics

under observation (abstract distribution meaning, for each characteris-

tic, a summarization by histographic methods to indicate the proportions

of units contained between arbitrarily chosen bounds for the measure of

the characteristic under consideration). For infinite populations the

abstract distribution is identified with a frequency distribution, but

the frequency distribution concept does not yield operational probabil-

ities for sampling purposes, i.e., probabilities of the form f(x)dx

are not very realistic as selection probabilities. To impute the fre-

quency distribution approach, a classicist would use randomization con-

cepts, where there is no discrimination against or preferential treat-

ment for a unit on other than probabilistic considerations.

The problem of estimation in sample surveys is to determine the

method of weighting the sample observations (this being dependent on the

method of selection of the units that comprise the sample, and the known

selection probability system) to produce the "best" estimate of the

desired population value.

What really occurs in sample surveys is this. There is a universe

of units, $U_i$ (i = 1, 2, ..., N), each of which has associated with it a

vector of characteristics, say $Y_i = (Y_{1i}, Y_{2i}, \ldots, Y_{hi})$. One must note

that "i" is not necessarily a simple index, but may be an extended index

with a number of subscripts sufficient to identify the unit in the hier-

archal structure of the frame, however complicated it may be. If one

desires to examine the j-th characteristic, $Y_{ji}$ (which will hereafter

be denoted by $X_i$), then a set of units is drawn from among the $U_i$ ac-

cording to the probabilities prescribed by the system. Then a function

of the characteristics observed for the units included in the sample is

calculated to estimate the mean or total for that particular character-

istic for the finite population under study, i.e., calculate

$f(X_i) = \hat{\bar{X}}$ or $\hat{T}$.

Also, to compare alternative estimation procedures, or to "evaluate" the

estimate that this process yields, one may compute a "variance", a func-

tion of the form $f(X_i - \bar{X})^2$, which can be used as a measure of the pre-

cision or as a "bestness" indicator.

In the traditional approach to determining the optimality of this,

or a chosen, estimator, one would like it to possess the properties of

unbiasedness, consistency, sufficiency, and efficiency (minimum variance).

Let us now examine, within the framework given earlier for sampling from

a finite population, each of these concepts in turn to see to what

extent they can be applied to estimates for population values for finite

universes.

3.3.1 Unbiasedness. This is probably the most universally recog-

nized attribute for an estimator. Unbiasedness is concerned with the

distribution of an estimate, and requires that the distribution be

"centered" on the population value (parameter), i.e., that the expecta-

tion of the estimator is equal to the population value being estimated.

It should be noted here that the concept of expectation must be

modified to be applied to finite populations. Essentially it can be re-

garded as an averaging over all possible samples, i.e., "the mean of the

distribution of the estimates X, each X being calculated by the rules

contained in the sampling procedure for all the possible samples that

one can draw by applying the procedure to a given frame" [Deming

(1960)]. One can express this as

$E(\hat{\theta}) = \sum_{s=1}^{S} \pi_s \hat{\theta}_s$,

where $\hat{\theta}_s$ denotes the value of the estimator calculated from the s-th

sample; and

$\pi_s$ denotes the probability of selecting the s-th sample.

The expectation of an individual unit would be expressed by

$E(X) = \sum_{i=1}^{N} P_i X_i$,

where $X_i$ is the measurement of the characteristic under study; and

$P_i$ is the probability of selecting the i-th unit on a given draw.
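The averaging over all possible samples can be carried out literally for a small universe. The sketch below is illustrative only: the three-unit population and its selection probabilities are hypothetical, sampling is with replacement, and the estimator of the total that weights each draw by $1/P_i$ is chosen here as a familiar example rather than taken from the text above.

```python
from fractions import Fraction
from itertools import product
from math import prod

def expectation_over_samples(x, p, n, estimator):
    """Sum of pi_s * estimate_s over all N**n ordered with-replacement samples."""
    total = Fraction(0)
    for s in product(range(len(x)), repeat=n):
        pi_s = prod(p[i] for i in s)          # probability of drawing sample s
        total += pi_s * estimator(s, x, p)
    return total

def sample_mean(s, x, p):
    return Fraction(sum(x[i] for i in s), len(s))

def weighted_total(s, x, p):
    # Each draw contributes x_i / P_i; the n contributions are averaged.
    return sum(x[i] / p[i] for i in s) / len(s)

x = [3, 5, 10]                                              # hypothetical X_i
p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]       # hypothetical P_i
print(expectation_over_samples(x, p, 2, weighted_total))   # 18
print(expectation_over_samples(x, p, 2, sample_mean))      # 71/10
```

Here the weighted estimator averages to the population total 18, whereas the unweighted sample mean averages to $\sum P_i X_i = 71/10$ rather than the population mean 6, so under unequal probabilities it is biased for the mean.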

This criterion of unbiasedness certainly can be applied to an esti-

mator based on a sample from a finite population. However it must fall

into the category of a potentially desired attribute rather than a

universally required one since:

a) if the standard error of the estimator is large, the fact that

it is unbiased is rather incidental;

b) it is possible that a biased estimator will give a more precise

estimate, i. e., have a smaller mean square error. The decision

as to whether or not to require unbiasedness in this situation

must rest on a consideration of the total error, which arises

from bias and sampling variation together. In general, how-

ever, one should not use a biased estimator unless an upper

bound can be computed for the bias from known properties of the

universe in question.

To further cloud the issue, there may be some problems in which un-

biasedness of the estimate might be more important than a smaller error,

if, say, large amounts of money, or even life, might be lost on a wrong

decision.

3.3.2 Consistency. The criterion of consistency is less stringent

than that of unbiasedness in that it requires unbiasedness "in the lim-

it". The traditional and universally accepted definition of a consist-

ent estimator can be stated as:

$f(\underline{x}) \xrightarrow{p} \theta$ or $\Pr[\,|f(\underline{x}) - \theta| > \epsilon\,] < \delta$ for $n > N(\epsilon, \delta)$.

This is the definition that is used or cited almost universally in the

books on sampling. However the concept of convergence in probability

leaves something to be desired when one thinks of a finite (rather than

indefinitely large) population. Fisher (1956, p. 145), in fact, makes

the comment in his latest book that "the asymptotic definition is satis-

fied by any statistic whatsoever applied to a finite sample".

For a definition of consistency which applies to samples from

finite populations, it would be best to use Fisher's 1921 definition:

"Consistency.--A statistic satisfies the criterion of con-
sistency if, when it is calculated from the whole population,
it is equal to the required parameter." (p. 310)

This definition is very satisfactory for sample survey theory, and with

this definition, the criterion of consistency is certainly a reasonable

one to require for any estimator.
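Fisher's finite definition lends itself to a direct check: compute the statistic on the whole universe and compare it with the population value. The following is a minimal sketch, with a hypothetical four-unit universe and helper names that are assumptions of this illustration rather than anything in the source.

```python
from fractions import Fraction
from statistics import mean

def variance_divisor_N(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def variance_divisor_N_minus_1(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def is_consistent(statistic, population_value, universe):
    """Fisher's 1921 criterion: the statistic, computed from the whole
    population, must equal the population value it is meant to estimate."""
    return statistic(universe) == population_value(universe)

universe = [Fraction(v) for v in (1, 2, 4, 7)]   # hypothetical universe
print(is_consistent(mean, mean, universe))                                      # True
print(is_consistent(variance_divisor_N_minus_1, variance_divisor_N, universe))  # False
print(is_consistent(variance_divisor_N_minus_1,
                    variance_divisor_N_minus_1, universe))                      # True
```

The second line shows that whether a statistic is consistent in this sense depends on how the population value itself is defined, here on the choice of N or N-1 as the divisor of the population variance.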

3.3.3 Sufficiency. Sufficiency, at least in the traditional sense,

requires that the whole of the relevant information (not the current

popular usage of "information") available in the sample will be con-

tained in, or utilized by, the estimator(s) which is (are) computed. It

was in this general sense that Fisher first defined sufficiency in 1921,

i.e.

"Sufficiency.--A statistic satisfies the criterion of suf-
ficiency when no other statistic which can be calculated from
the same sample provides any additional information as to
the value of the parameter to be estimated." (p. 310)

or

"... sufficiency, which latter requires that the whole of
the relevant information supplied by a sample shall be con-
tained in the statistics calculated." (p. 367).

From these first general statements a more formal definition has come

into universal usage, this definition being, as given in Fraser (1958,

p. 218):

"We have the definition:
A statistic t(x) is a sufficient statistic, if, given
the value of t(x), the conditional distribution is
independent of the parameters.

Evaluating conditional distributions can often be
tedious. Fortunately we have a criterion that avoids this:

A statistic t(x) is a sufficient statistic if and
only if the probability or density function can be
factored into two parts, one dependent only on the statistic
and the parameters, the second independent of the
parameters."

Sufficiency, then, for the infinite population case is definitely to be

aimed at, although not always obtainable. For a finite population, how-

ever, one cannot admit this concept as being relevant in view of the

considerations set forth below.

In a special sense every sample of any size whatsoever is suffi-

cient for estimating the desired population value. Firstly, surveys are

interested in estimating means, totals, ratios or other functions of the

measurable characteristics revealed by the ultimate units. These pop-

ulation values (which may, only by convention, be termed parameters) are

logically separate from their respective selection probabilities as re-

vealed by the probability system. Secondly, since probabilities enter into

sampling only in the process of selecting the units to be included in

the sample, and not with the characteristics to be measured, the condi-

tional distribution of the sampled characteristic values from any size

sample, depending only on these selection probabilities and the sampling

procedure, is independent of the population value being estimated. Thus

the concept of sufficiency is not relevant, at least in the context of

the universally accepted definition as quoted above from Fraser. (One

might note that this logical separateness of the population values and

the selection probabilities is an essential feature of sample survey

theory. Without this separateness no sampling operation would be pos-

sible and therefore there would be no meaningful theories of sampling.)

Reference might be made to the original definition given by

Fisher, but that is too general and subject to the same type of criti-

cism as just given against the more complete (complex) definition.

The one place that the traditional definition of sufficiency might

fit would be for the case when sampling from a finite population with

replacement where the vector of ni-values (with ni denoting the number

of times that the i-th unit appears in the sample) would be sufficient

for estimating the probabilities of selection of the sampling units.

However, these are known or assumed at the start of the sampling proce-

dure, and have no bearing on the characteristics carried by these units

which are being measured.

Thus one must conclude that the notion of sufficiency has little

meaningful application to estimators based on samples from finite pop-

ulations. Actually Basu (1958) is the only author to argue strongly for

sufficiency; other authors have been silent on the question since (pre-

sumably), as indicated, its logical basis is rather insecure for finite

populations.

3.3.4 Efficiency and Minimum Variance. Efficiency seems to be one

of those concepts for which every author has his own definition, the

common denominator of which seems to be a connection with the idea of

minimum variance; hence they will be discussed together. The one set of

definitions which is directly akin to the problem at hand of choosing

criteria for estimators from finite populations is that an efficient

estimator is that one from among several satisfying a set of other cri-

teria which has minimum variance. That one is taken to be the most

efficient, and the relative efficiency of the other estimators is meas-

ured by the ratio, less than unity, of the minimum variance to their

variance. This notion, as indicated, extends directly to the problem

of estimating from a sample taken from a finite population, since it is

entirely logical that if there is a choice of estimators, the obvious

selection would be the one with the smallest spread or variance.
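The relative efficiency described above is a simple ratio and can be sketched in a few lines. The estimator names and the two variances below are illustrative values assumed for this sketch, not results quoted from the source.

```python
from fractions import Fraction

def relative_efficiencies(variances):
    """Map each estimator to (minimum variance) / (its own variance).

    The most efficient estimator gets 1; every other one gets a ratio below 1.
    """
    v_min = min(variances.values())
    return {name: v_min / v for name, v in variances.items()}

# Hypothetical variances of two competing estimators of the same mean.
variances = {
    "mean of all draws": Fraction(7, 4),
    "mean of distinct units": Fraction(49, 32),
}
for name, eff in relative_efficiencies(variances).items():
    print(name, eff)
```

With these assumed values the mean of distinct units is taken as the most efficient, and the mean of all draws has relative efficiency 7/8.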

The other group of definitions centers around asymptotic ideas, and

is one place where Fisher and Neyman agree, to wit:

"Here, again, I agree unreservedly with Fisher that, when
several consistent estimates of the same parameter are
available, all tending to be normally distributed, the
one with the smallest variance is preferable to others."
--Neyman (1952, p. 188)

The dependence on asymptotic normality rules out this definition. In

the case of a single universe of N units, the most that n can tend

to, when sampling with replacement, is N, which cannot yield a nor-

mally distributed estimator.

While discussing minimum variance, a digression might be made from

the main stream of thought to set the historical record straight with the

following quote from Neyman (1952, p. 227):

"Laplace himself studied certain problems on the assumption
that the loss due to an error in estimation is directly
proportional to the absolute value of the error. On the
other hand, Gauss noticed that various results became
more elegant if the loss is assumed to be proportional to
the square of the error committed so that

Upon reflecting on the general nature of errors of measure-
ments, in particular on the possibility of systematic
errors, Gauss found it necessary to impose on the estimate
$F_n(X_n)$ another condition, that of unbiasedness, expressed
by the identity,

It will be seen that the two conditions, one of the
unbiasedness of $F_n(X_n)$ and the other of minimum ex-
pected loss measured by the square of the error, formu-
late the now familiar problem of best unbiased estimates.
All this was reported to the Königliche Societät der
Wissenschaften in Göttingen on February 15, 1821, and
subsequently published in Latin."

Of the classical criteria of estimation, these two, which predate almost

all statistical theory, are about the only ones that apply to finite

population problems.

Further, as shown by Das (1951), Godambe (1955) and others, minimum

variance estimators may not exist in an estimable form, because the co-

efficients or weights for the observations necessary to produce a mini-

mum variance estimator may be enmeshed with the other variate values.

3.4 Criteria for Estimators from Finite Populations

With all this, then, where does this subject stand? What criteria

can or should be applied to determine which estimator is optimum when

the sample is from a finite population? Judging from the frequency of

mention in the literature, there would seem to be the following:

1. Consistency: The chosen estimator should be consistent, not

in the sense of convergence in probability to the population value being

estimated, but in the sense of:

Definition: A statistic satisfies the criterion of consistency
if, when it is calculated from the whole population, it is
equal to the required population value.

The more restrictive condition of unbiasedness could well be listed as a

universal criterion, if it were not for the fact that there may be sit-

uations where an estimate with a disappearing bias will better meet

other criteria for "bestness". If there is no bias, then consistency is

assured. If a bias is to be allowed to be present, however, one should

be able to determine an upper limit for that bias in terms of some char-

acteristics of the sample or population.

2. Minimum variance or minimum mean square error: In the case of

an unbiased estimator, these two are the same, but in the general case

they are related in the following manner: $\mathrm{MSE} = V + (\mathrm{Bias})^2$. Except in

the case where there are compelling reasons for ignoring this criterion,

it is certainly evident that one would want an estimator which gave as

narrow a spread to the estimates of the population value as possible.
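The relation between mean square error, variance and bias can be confirmed by enumeration over all possible samples. This is a sketch under stated assumptions: equal-probability sampling with replacement from a hypothetical four-unit population, with a deliberately biased "shrunken" mean (the factor 9/10 is arbitrary) set beside the ordinary sample mean.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

def bias_variance_mse(x, n, estimator):
    """Bias, variance and MSE of an estimator of the population mean,
    averaged over all equally likely ordered with-replacement samples."""
    target = mean(x)
    estimates = [estimator(s) for s in product(x, repeat=n)]
    expected = mean(estimates)
    bias = expected - target
    var = mean((e - expected) ** 2 for e in estimates)
    mse = mean((e - target) ** 2 for e in estimates)
    return bias, var, mse

def shrunken_mean(sample):
    # Deliberately biased estimator, used only for illustration.
    return Fraction(9, 10) * mean(sample)

x = [Fraction(v) for v in (1, 2, 4, 7)]        # hypothetical population
bias, var, mse = bias_variance_mse(x, 2, shrunken_mean)
print(bias, var, mse)                           # -7/20 1701/800 1799/800
print(mse == var + bias ** 2)                   # True
```

For the unbiased sample mean the same function returns a zero bias, so that its mean square error and variance coincide.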

It would seem that these are the two major criteria, at least from

among those based on probabilistic considerations. They have their rel-

ative importance determined by the given particular situation at hand.

Both are to be desired, but there may be situations where one or the

other of them is an overriding consideration to the detriment of the

other, for example, where unbiasedness is of such importance that var-

iance is taken at face value rather than considered a restriction. Or,

as indicated earlier, a minimum variance estimator might not exist in an


estimable form, although one might then modify "minimum" to choose the

estimator with the smaller variance.

There are two other criteria which might be mentioned, although

they are of lesser rank than consistency and minimum mean square error,

and not based on probabilistic considerations. These are:

3. Cogredience (Independence of scale): To satisfy this criterion,

our estimate f(x) must have the property f(cx) = c f(x). For example,

if two people are estimating lengths from the same observations, one

measuring in feet and the other in meters, we would like them to get

equivalent estimates, expressed in feet and meters respectively. (This

really should be taken care of in interviewer instructions.)

4. Ease of Computation: It would seem desirable, all other things

being equal, that an estimate be easy to compute. The more complicated

the form of an estimator, the more expensive it is to produce estimates

and the more time it may take to get results which can be acted on.

With the advent of the large computers, though, this objection may be

disappearing. Also along these lines, if past history continues, the

process of adapting techniques so that they can be handled on the com-

puters may well indicate that, with further work, simplifications and

short cuts can be developed and approximations found which would serve

most purposes.

The third criterion mentioned would seem to be essential, although,

as mentioned, it should be required before the consideration of estima-

tion problems. Ease of computation should possibly be considered a

desideratum, rather than a criterion, but that is a matter of semantics

--certainly it is not as dominant a criterion as either of the first two

mentioned.

Thus there are two major criteria to apply to the problem of select-

ing an estimator from among a class of estimators for samples from a

finite population: that it be consistent and that it have a minimum

mean square error.

4.0 THE GENERAL CLASSES OF LINEAR ESTIMATORS FOR SAMPLING WITH REPLACEMENT

4.1 Introductory Remarks

The first formal approach to the problem of determining classes of

estimators for samples from finite populations was that of Horvitz and

Thompson (1952). They formulated three classes of linear estimators

for the population total for a scheme of sampling from a finite popula-

tion with arbitrary probabilities and without replacement. These classes

were formed by having coefficients dependent on the order of draw, the

presence or absence of a unit in the sample, and on the particular sam-

ple drawn, respectively. However, they did not formalize their ideas

for establishing classes of estimators, and thus did not pursue them

further.

Godambe (1955) formulated what he called a "unified theory" of

sampling from finite populations. This theory, actually a generalized

basic theory, was not axiomatic in nature, although Godambe apparently

recognized some essence of formality in his approach and that of Horvitz

and Thompson (but he too failed to formalize the deductive process).

For his theory Godambe did posit a generalized notational system which

could cover both probability systems where the units are drawn with or

without replacement; however, this system is not an operational system

for determining probabilities. It will be seen that Godambe's most

general estimator would fall into class seven under the axiomatic ap-

proach presented subsequently.

Realizing that one must have some definite set of rules for estab-

lishment of the classes when formulating groups or classes of estimators

for samples from finite populations, Koop (1957) developed an axiomatic

approach. This axiomatic approach, with axioms based on the physical

realities of sample formation, i.e., the way things actually happen,

would seem much more basic for establishing classes of estimators for

samples from finite populations than attempts at utilizing the classical

"infinite" estimation criteria such as unbiasedness, sufficiency, admis-

sibility, completeness, etc. In fact, the notion that there are crite-

ria for which one can develop classes of estimators is not germane to

sample surveys. In sample survey theory, the classes of estimators are

developed first, e.g., by axiomatic methods, and then criteria such as

unbiasedness or minimum mean square error are applied to various estima-

tors in each class to attempt a determination of bestness within that

class. The generality of the axiomatic approach is also of considerable

theoretical advantage because it provides the basis for determining the

optimum probabilities in any defined sense.

For sampling from finite populations, axioms, to be useful, must

be based on physical realities, since sample survey theory is opera-

tional in a physical sense. These axioms, as postulated by Koop, are

"three features inherent in the nature of the process of selection" of

the sample. They are stated as follows:

"(i) the order of appearance of the elements,

(ii) the presence or absence of any given element (in the sample)

which is a member of the population (or universe), and

(iii) the set of elements composing the sample considered as one

of the total number possible (in repeated sampling according

to the given probability system)." (1957, p. 25)

These three possible features, or combinations thereof, which are in-

herent in the selection process and therefore sampling procedure, supply

the basis for the deductive construction of seven general classes of

estimators. The seven result from taking the axioms singly, two at a

time, and, most generally, all three together.
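The count of seven can be checked mechanically. The following is only a minimal sketch: the feature labels are shorthand introduced here for the three axioms, and the order in which the combinations are listed is not meant to reproduce the numbering of the classes used in this chapter.

```python
from itertools import combinations

# Shorthand labels (assumed here) for the three features of the selection process.
FEATURES = (
    "order of appearance",
    "presence or absence of a unit",
    "sample drawn",
)


def feature_combinations(features=FEATURES):
    """Return every non-empty combination of the features: singly,
    two at a time, and all three together."""
    out = []
    for size in range(1, len(features) + 1):
        out.extend(combinations(features, size))
    return out


classes = feature_combinations()
for combo in classes:
    # Listing order is illustrative only, not the thesis's class numbering.
    print(" + ".join(combo))
print(len(classes), "combinations in all")
```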

Koop derived the classes of estimators for estimating the total of a

finite population when sampling with arbitrary probabilities and without

replacement. This thesis will consider the case of sampling from a fi-

nite population with arbitrary probabilities, but with replacement of

each sampling unit preceding the drawing of another unit.

These estimators of the population total (note that the choice

between discussing the total or the mean is completely arbitrary) for a

characteristic under study will be listed and discussed in Sections 4.4

through 4.10 inclusive. For each class of estimator, weights or coeffi-

cients will be determined which (a) satisfy the criterion of unbiased-

ness, (b) are independent of the properties of the population, i.e., of

the measurable characteristic(s) under observation in the sample, and

(c) are positive.

In connection with requirement (b), Koop has shown that, for the

general classes of estimators, minimum variance estimators do exist, but

the weights for such estimators are enmeshed with the variate values of

the characteristics of the sampling units. Thus, although theoretically

existent, such weights are non-estimable; hence for all practical pur-

poses, minimum variance estimators do not exist. For this reason this

study will restrict consideration to weights which are independent of

the values of the characteristics under study.

4.2 Probability System and Notation

4.2.1 The Probability System. Consider a population of N sam-

pling units, $U_1, U_2, \ldots, U_N$. Associated with each of these units is

a vector of measurable characteristics, say $Y_i = (Y_{1i}, Y_{2i}, \ldots, Y_{hi})$.

A sample of n of these N units is to be drawn in a manner which is

completely specified before the sampling procedure begins, and from ob-

serving certain of the vector characteristics of the sample units it is

desired to estimate the aggregate of these characteristics pertaining to

the universe under consideration. For drawing the sample it is given

that the probabilities of selection at any given draw are arbitrary

(arbitrary in the sense that they can assume discretionary non-negative

values, not necessarily equal) with the sole restriction that when

summed over all units in the universe they sum to one.

Also for the system under consideration, each unit is required to

be returned to the universe after it is drawn and measured, and before

the next unit is drawn. The case where the sampling is done without

replacement of the units has already been mentioned above. The most

general case where the units may or may not be replaced, depending on

some arbitrary or systematic method of determination, or where they may

be replaced in clumps after a certain number have been drawn, or some

such chaotic situation will not be discussed, for fairly obvious reasons.
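As an illustration of the probability system just described, a sample of n units can be drawn with replacement and arbitrary (unequal) selection probabilities as sketched below. The function name, the unit labels and the particular probabilities are assumptions made for this example only; the checks mirror the restrictions that each probability lies strictly between 0 and 1 and that they sum to one.

```python
import random


def draw_with_replacement(units, probs, n, seed=None):
    """Draw n units independently, each draw using the same selection
    probabilities, so a unit may appear more than once."""
    if len(units) != len(probs):
        raise ValueError("one probability is needed per unit")
    if any(p <= 0 or p >= 1 for p in probs):
        raise ValueError("each probability must lie strictly between 0 and 1")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("the probabilities must sum to one")
    rng = random.Random(seed)
    # random.choices samples with replacement according to the given weights.
    return rng.choices(units, weights=probs, k=n)


# Hypothetical four-unit universe.
sample = draw_with_replacement(["A", "B", "C", "D"], [0.5, 0.25, 0.125, 0.125], n=3, seed=1)
print(sample)
```

The returned list keeps the order of draw, so it corresponds to what is later called a particular sample.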


Within this framework, then, the following probabilities are to be

considered, with attendant notation. For an explanation of notation see

Section 4.2.2.

$p_i$ -- the probability of drawing the i-th unit on any given draw. These $p_i$ values will be constant for all draws. The only restrictions on the values of the $p_i$ are that $0 < p_i < 1$ and that $p_1 + p_2 + \cdots + p_N = 1$. Allowing either equality in the bounds on the $p_i$ effectively reduces the size of the universe and thus will not be considered.

$p_i^* = 1 - (1 - p_i)^n$ = the probability that the i-th unit will appear, any number of times, in a sample of n units drawn with replacement. $\sum_{i=1}^{N} p_i^* = E(v)$ where v is the number of distinct units among the n units drawn (see also Section 8.0).

$P_s = \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a given particular sample, $\sum_{i \in s_v} n_i = n$.

$P_{s'} = \dfrac{n!}{\prod_{i \in s_v} n_i!} \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a specific combination of units, disregarding order of draw, but with the same number of appearances of each unit in the sample (i.e., a constant $n_i$ vector, $n_i \ge 0$). This would be the sum of $n! / \prod_{i \in s_v} n_i!$ $P_s$-terms.

$P_{s_v} = \sum_{P(n|v)} \dfrac{n!}{\prod_{i \in s_v} n_i!} \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a given distinct sample, that is the set of samples with the same set of distinct units. The $n_i$ vector is disregarded other than when the elements are non-zero. This would be the sum of $\Delta^v 0^n$ $P_s$-terms.
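The relations among these probabilities can be checked by brute-force enumeration. The sketch below borrows the four selection probabilities and n = 3 from the numerical example of Section 4.5.5; the variable names are introduced here for illustration and exact fractions are used so that the comparisons are not affected by rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Selection probabilities of the four-unit example (Section 4.5.5), n = 3 draws.
p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
n = 3


def prob_particular(sample):
    """P_s: product of the selection probabilities over the n ordered draws."""
    out = Fraction(1)
    for unit in sample:
        out *= p[unit]
    return out


P_s = {s: prob_particular(s) for s in product(p, repeat=n)}

P_combo = defaultdict(Fraction)     # P_{s'}: order of draw disregarded
P_distinct = defaultdict(Fraction)  # P_{s_v}: only the set of distinct units kept
for s, pr in P_s.items():
    P_combo[tuple(sorted(s))] += pr
    P_distinct[frozenset(s)] += pr

# p_i^* = 1 - (1 - p_i)^n, whose sum should equal E(v).
p_star = {u: 1 - (1 - p[u]) ** n for u in p}
expected_v = sum(len(set(s)) * pr for s, pr in P_s.items())

print(sum(p_star.values()), expected_v)
```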

4.2.2 Notation and Definitions. Definitions:

Particular sample -- a given individual sample, i.e., the ordered array of units resulting from the n draws comprising the sample. They will be $S = N^n$ in number.

Distinct sample -- a sample containing a set of v distinct units, disregarding the number of times each unit appears. A distinct sample is the set of particular samples with the same $\lambda_i$ vector, where $\lambda_i = 1$ if the i-th unit appears in the sample any number of times, $= 0$ for nonappearance. The $n_i$ vector is disregarded other than whether its elements are non-zero. There will be $S' = \sum_{v=1}^{n} S_v = \sum_{v=1}^{n} C^N_v$ distinct samples. For each distinct sample of v distinct units there are $\Delta^v 0^n$ particular samples. [Ref. Riordan (1958, p. 91).]

Indices (Subscripts):

i -- the unit index for the universe. i = 1, 2, ..., N.

s -- refers to a particular sample, or is the index for summation over all particular samples. s = 1, 2, ..., S.

$s_v$ -- refers to a distinct sample, or is the index for summation over all distinct samples. $s_v$ = 1, 2, ..., S'.

t -- the index denoting the order of draw for the sample units. t = 1, 2, ..., n.

v -- an index for summation over the different possible numbers of distinct units. It also denotes the number of distinct units among the n units in the sample ($1 \le v \le n$).

Letters:

n -- the sample size.

$n_i$ -- the number of times the i-th unit appears in the sample; $\sum_{i \in s_v} n_i = n$.

S -- refers to particular samples, $N^n$ in number.

S' -- denotes the total number of distinct samples, $\sum_{v=1}^{n} C^N_v$ in number.

$S_v$ -- denotes the number of distinct samples of size v, i.e., the samples of n with v distinct units, $C^N_v$ in number for a given v.

z -- (with a subscript) will be used as a characteristic random variable to denote appearance of a unit according to the specific subscript assigned.

$\Delta^v 0^n$ -- the differences of zero notation. $\Delta$ is the finite difference operator: $\Delta u_n = u_{n+1} - u_n$. Thus the differences of zero would be

$$\Delta^2 0^n = (2^n - 1^n) - (1^n - 0^n) = 2^n - 2(1^n) + 0^n$$

$$\Delta^3 0^n = [(3^n - 2^n) - (2^n - 1^n)] - [(2^n - 1^n) - (1^n - 0^n)]$$

For additional discussion see Whittaker and Robinson (1944) or Riordan (1958). See also Section 4.3(3). Tables of $\Delta^v 0^n$ were given by Stevens (1937) and were reprinted by Fisher and Yates (1949, table 22).
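For readers without access to the tables, the differences of zero are easy to generate directly from the definition of the finite difference operator. The function below is an illustrative sketch (its name is chosen here): it tabulates $u_k = k^n$ for k = 0, 1, ..., v and applies the forward difference v times.

```python
from math import factorial


def differences_of_zero(v, n):
    """Return Delta^v 0^n by repeated forward differencing of u_k = k**n,
    evaluated at k = 0."""
    row = [k ** n for k in range(v + 1)]
    for _ in range(v):
        # One application of Delta: u_{k+1} - u_k for every adjacent pair.
        row = [b - a for a, b in zip(row, row[1:])]
    return row[0]


n = 5
for v in range(1, n + 2):
    print(v, differences_of_zero(v, n))
```

For n = 5 the loop runs from v = 1 through v = 6, so the last value printed is a case with v > n.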

P(n|v) -- denotes the v-part partitions of n, that is, all sets of non-zero values for $n_i$ ($i \in s_v$) such that $\sum_{i \in s_v} n_i = n$. The full (proper) partitional notation, as given by Chrystal (1900, p. 556), would be $P(n \,|\, v \,|\, \le n-v+1)$, i.e., the partitions of n into v parts no one of which exceeds n - v + 1, but the shorter form will be used.

$i \in s_v$ -- those i's (units) contained in a distinct sample.

$s \supset i$ ($s_v \supset i$) -- those samples (distinct samples) which contain the i-th unit.

4.3 Some Combinatorial Considerations

The following are some combinatorial considerations concerning a

sample of size n drawn from a finite population of size N with re-

placement.

(1) The total number of possible samples is $S = N^n$ since each

unit drawn is replaced prior to the drawing of the next unit.

(2) The total number of distinct samples, i.e., samples containing different sets of v ($1 \le v \le n$) distinct units, is

$$S' = \sum_{v=1}^{n} S_v = C^N_1 + C^N_2 + \cdots + C^N_n$$

since there are $C^N_v$ combinations of v units from a total of N.

(3) The total number of samples of size n which will contain v distinct units, i.e., the number of ways of putting n different objects into v different cells, with no cell (among the v) empty, is given by Riordan (1958, p. 91) as:

$$\Delta^v 0^n = v!\, S(n, v)$$

where S(n, v) is the Stirling number of the second kind. This could also be written:

$$\Delta^v 0^n = \sum_{j=0}^{v} (-1)^j C^v_j (v - j)^n$$

from which it follows that

$\Delta^v 0^n$ = 1 for v = 1

= n! for v = n

= 0 for v > n.

(4) From paragraphs (2) and (3) another expression for the total number of possible samples would be:

$$S = N^n = \sum_{v=1}^{n} C^N_v \Delta^v 0^n .$$

(5) Consider all those samples of size n, each containing v distinct units; the total number which contain the i-th unit is $C^{N-1}_{v-1}$.

(6) Thus the total number of particular samples containing the i-th unit is

$$\sum_{v=1}^{n} C^{N-1}_{v-1} \Delta^v 0^n .$$

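(7) If one is given that the sample of n contains v distinct units, and that one of those units is the i-th unit, there might be some interest in the number of those $\Delta^v 0^n$ samples which contain the i-th unit a given number of times. Then with the help of the respective diagrams it is easy to see that the number which contain i

once is: $C^n_1 \Delta^{v-1} 0^{n-1}$

[Diagram: i occupies one draw; the remaining n - 1 draws are filled by the v - 1 other distinct units]

twice is: $C^n_2 \Delta^{v-1} 0^{n-2}$

[Diagram: i occupies two draws; the remaining n - 2 draws are filled by the v - 1 others]

n - v + 1 times is: $C^n_{n-v+1} \Delta^{v-1} 0^{v-1}$

[Diagram: i occupies n - v + 1 draws; the remaining v - 1 draws are filled by the v - 1 others]

thus the number which contain i exactly r times is $C^n_r \Delta^{v-1} 0^{n-r}$.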

(8) It follows from (7) that the total number of times that the i-th unit will appear in the samples of size n with v distinct units is

$$\sum_{r=1}^{n-v+1} r\, C^n_r \Delta^{v-1} 0^{n-r} .$$

From this, the total number of times that the i-th unit will appear in all $N^n$ samples (i.e., v = 1, 2, ..., n) will be:

$$I = n + \sum_{v=2}^{n} C^{N-1}_{v-1} \sum_{r=1}^{n-v+1} r\, C^n_r \Delta^{v-1} 0^{n-r} .$$

I, as a total quantity, can be derived much more simply by noting that the number of appearances of a particular unit, say the i-th, among the $nN^n$ units which appear in the $N^n$ samples is symmetric in the N units, and thus

$$I = nN^n / N = nN^{n-1} .$$

This approach, however, does not provide any information concerning the

component structure of I.
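The counts in paragraphs (4), (6) and (8) can be confirmed for a small case by listing every ordered sample. The sketch below takes N = 4 and n = 3 purely for illustration; the helper uses the standard alternating-sum expression for the differences of zero, and all names are introduced here rather than taken from the text.

```python
from itertools import product
from math import comb


def differences_of_zero(v, n):
    """Delta^v 0^n via the alternating-sum (inclusion-exclusion) expression."""
    return sum((-1) ** j * comb(v, j) * (v - j) ** n for j in range(v + 1))


N, n = 4, 3  # illustrative universe size and sample size
samples = list(product(range(N), repeat=n))  # all N**n particular samples

# (4): total number of samples rebuilt from distinct samples.
total_from_distinct = sum(comb(N, v) * differences_of_zero(v, n) for v in range(1, n + 1))

# (6): number of particular samples containing a given unit (unit 0 here).
containing_formula = sum(comb(N - 1, v - 1) * differences_of_zero(v, n) for v in range(1, n + 1))
containing_count = sum(1 for s in samples if 0 in s)

# (8): total number of appearances of unit 0 over all samples.
I_formula = n + sum(
    comb(N - 1, v - 1)
    * sum(r * comb(n, r) * differences_of_zero(v - 1, n - r) for r in range(1, n - v + 2))
    for v in range(2, n + 1)
)
I_count = sum(s.count(0) for s in samples)

print(total_from_distinct, containing_formula, I_formula)
```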

4.4 Class One Estimator

4.4.1 The Estimator (of the universe total). The class one estimator, with weights dependent solely on the order of appearance, is given by

$$T_1 = \sum_{t=1}^{n} \alpha_t x_{i_t}$$ (4.4-1)

where $\alpha_t$ (t = 1, 2, ..., n) is the weight attached to the element selected at the t-th draw and $x_{i_t}$ is the value of the characteristic measured on the i-th unit observed on the t-th draw.

4.4.2 Number of Weights. The total number of weights is n, one

for each draw.

4.4.3 Determination of the Weights. The first step in the determination of weights is to determine the expectation of $T_1$, as follows:

$$E(T_1) = E\Big(\sum_{t=1}^{n} \alpha_t x_{i_t}\Big) = \sum_{t=1}^{n} \alpha_t E(x_{i_t}) = \sum_{t=1}^{n} \alpha_t \sum_{i=1}^{N} p_i x_i .$$

For $T_1$ to be unbiased $E(T_1)$ must identically equal $\sum_{i=1}^{N} x_i$, i.e.,

$$\sum_{i=1}^{N} x_i p_i \sum_{t=1}^{n} \alpha_t \equiv \sum_{i=1}^{N} x_i ,$$

which requires that

$$p_i \sum_{t=1}^{n} \alpha_t = 1 , \quad \text{for } i = 1, 2, \ldots, N.$$ (4.4-2)

This condition effectively says that for $T_1$ to exist as an estimator, all the $p_i$ must be equal, i.e.,

$$p_i = p = 1/N .$$

Hence the $\alpha_t$ exist only when the $p_i$ are equal and not in the general case. In this situation, a solution is

$$\alpha_t = N/n , \quad t = 1, 2, \ldots, n$$ (4.4-3)

so that

$$T_1 = \sum_{t=1}^{n} \frac{N}{n} x_{i_t} .$$ (4.4-4)

This is a well known estimator which is readily seen to be unbiased.

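4.4.4 Variance of $T_1$. The variance of $T_1$, for the case when probabilities are equal and when $\alpha_t = N/n$, is

$$V(T_1) = \frac{N^2}{n^2} V\Big(\sum_{t=1}^{n} x_{i_t}\Big) .$$

Because of independence of the draws, one from another,

$$V(T_1) = \frac{N^2}{n^2}\, n\sigma^2 = \frac{N^2 \sigma^2}{n}$$ (4.4-5)

which can be estimated by

$$\hat{V}(T_1) = \frac{N^2}{n(n-1)} \sum_{t=1}^{n} (x_{i_t} - \bar{x})^2 , \quad \bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_{i_t} .$$ (4.4-6)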
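A small enumeration illustrates the unbiasedness of the class one estimator and the variance $N^2\sigma^2/n$ under equal selection probabilities. The four unit values below are illustrative only, and $\sigma^2$ is taken here to be the population variance with divisor N, which is an assumption about the notation; exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product

x = [3, 4, 8, 5]  # illustrative unit values; equal probabilities p_i = 1/N
N, n = len(x), 3

weight = Fraction(N, n)    # alpha_t = N/n for every draw
P_s = Fraction(1, N ** n)  # every ordered sample is equally likely

# T_1 for every possible ordered sample of n draws with replacement.
estimates = [weight * sum(x[i] for i in s) for s in product(range(N), repeat=n)]

mean_T1 = sum(P_s * t for t in estimates)
var_T1 = sum(P_s * (t - mean_T1) ** 2 for t in estimates)

total = sum(x)
# sigma^2 assumed to be the population variance with divisor N.
sigma2 = Fraction(sum(v * v for v in x), N) - Fraction(total, N) ** 2

print(mean_T1, var_T1, Fraction(N ** 2) * sigma2 / n)
```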

4.5 Class Two Estimator

4.5.1 The Estimator. The class two estimator, with weights dependent solely on the presence or absence of a given element in the sample, is given by

$$T_2 = \sum_{i \in s_v} \beta_i x_i$$ (4.5-1)

where $\beta_i$ (i = 1, 2, ..., N) is the weight attached to the i-th element whenever it appears in the sample, and where $i \in s_v$ denotes summation over the distinct units in the sample ($v \le n$).

4.5.2 Number of Weights. The total number of weights is N, one

for each sampling unit.

4.5.3 Determination of the Weights. Since the weights, the $\beta_i$, are attached whenever the i-th element appears in the sample, and summation is over the distinct units, to determine the $\beta_i$ (4.5-1) must be rewritten as

$$T_2 = \sum_{i=1}^{N} \beta_i x_i z_i$$ (4.5-2)

where $z_i$ is a characteristic random variable for which

$z_i$ = 1 when the i-th unit appears in the sample, irrespective of the number of times it appears,

= 0 when the i-th unit does not appear in the sample, and

where, see (8-10),

$$E(z_i) = p_i^* .$$

Then:

$$E(T_2) = \sum_{i=1}^{N} \beta_i x_i p_i^* .$$

For unbiasedness

$$\sum_{i=1}^{N} \beta_i p_i^* x_i \equiv \sum_{i=1}^{N} x_i$$

which imposes the requirement that

$$\beta_i p_i^* = 1 \quad \text{for all } i.$$

Therefore, for unbiasedness, it is found that the weights are uniquely determined as

$$\beta_i = 1 / p_i^* .$$ (4.5-3)

Thus the unbiased linear estimator for class two is

$$T_2 = \sum_{i \in s_v} \frac{x_i}{p_i^*}$$ (4.5-4)

where $p_i^* = 1 - (1 - p_i)^n$ = the probability that the i-th unit appears in a sample of n units drawn with replacement.

This is the analogue of the Horvitz-Thompson estimator. It also has

been propounded, for the case of equal selection probabilities, by

Godambe (1955) and Roy and Chakravarti (1958).

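4.5.4 Variance of $T_2$. The variance of $T_2$ is given by

$$V(T_2) = V\Big(\sum_{i=1}^{N} \beta_i x_i z_i\Big) .$$

Substituting (8-12) and (8-13) into this expression, and also substituting for the $\beta_i$, yields

$$V(T_2) = \sum_{i=1}^{N} \frac{x_i^2}{p_i^{*2}}\, p_i^* q_i^* - \sum_{i \ne j}^{N} \frac{x_i x_j}{p_i^* p_j^*} (q_i^* q_j^* - q_{ij}^*)$$

where $p_i^* = 1 - (1 - p_i)^n$

$q_i^* = 1 - p_i^* = (1 - p_i)^n$

$q_{ij}^* = (1 - p_i - p_j)^n$.

This can be written more concisely as

$$V(T_2) = \sum_{i=1}^{N} \frac{(1 - p_i^*)}{p_i^*} x_i^2 - \sum_{i \ne j}^{N} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^*} x_i x_j .$$ (4.5-5)

An estimate of $V(T_2)$ can be obtained as

$$\hat{V}(T_2) = \sum_{i \in s_v} \frac{(1 - p_i^*)}{p_i^{*2}} x_i^2 - \sum_{i \ne j \in s_v} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^* p_{ij}^*} x_i x_j .$$ (4.5-6)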

The functional similarity between this and the Horvitz-Thompson variance

formula is readily apparent.

4.5.5 Numerical Example. To illustrate the procedures involved in

Section 4.5, consider the following examples, based on all possible

samples of size n = 3 from two simple four-unit populations.

Unit:                          A         B         C         D

p_i                           1/2       1/4       1/8       1/8

p_i* = 1-(1-p_i)^3          .8750     .5781     .3301     .3301

Case A:   x_i                   3         4         8         5

          x_i/p_i*         3.4286    6.9192   24.2351   15.1469

Case B:   x_i                   8         5         4         3

          x_i/p_i*         9.1429    8.6490   12.1175    9.0882

When setting up Case A, the numerical values were assigned to the units

at random. It was also deemed advisable to examine the situation where

the probabilities are somewhat proportional to the size of the units,

thus the numerical values were reassigned to the letter-units to produce

this situation as Case B.

When drawing samples of size 3 (n=3) with replacement, the follow-

ing distinct samples are possible:

A: AAA(64)

B: BBB(8)

C: CCC(1)

D: DDD(1)

AB: AAB(32), ABA(32), BAA(32), BBA(16), BAB(16), ABB(16)

AC: AAC(16), ACA(16), CAA(16), CCA(4), CAC(4), ACC(4)

AD: AAD(16), ADA(16), DAA(16), DDA(4), DAD(4), ADD(4)

BC: BBC(4), BCB(4), CBB(4), CCB(2), CBC(2), BCC(2)

BD: BBD(4), BDB(4), DBB(4), DDB(2), DBD(2), BDD(2)

CD: CCD(1), CDC(1), DCC(1), DDC(1), DCD(1), CDD(1)

ABC: ABC(8), ACB(8), BAC(8), BCA(8), CAB(8), CBA(8)

ABD: ABD(8), ADB(8), BAD(8), BDA(8), DAB(8), DBA(8)

ACD: ACD(4), ADC(4), CAD(4), CDA(4), DAC(4), DCA(4)

BCD: BCD(2), BDC(2), CBD(2), CDB(2), DBC(2), DCB(2)

The number following each sample, when divided by 512, is the probability of obtaining that particular sample ($P_s$).

This example, with N = 4, n = 3, produces the following results for the class two estimator, $T_2 = \sum_{i \in s_v} x_i / p_i^*$:

 s_v    512 P_sv       T2, Case A    T2, Case B

 A         64            3.4286        9.1429
 B          8            6.9192        8.6490
 C          1           24.2351       12.1175
 D          1           15.1469        9.0882
 AB       144           10.3478       17.7919
 AC        60           27.6637       21.2604
 AD        60           18.5755       18.2311
 BC        18           31.1543       20.7665
 BD        18           22.0661       17.7372
 CD         6           39.3820       21.2057
 ABC       48           34.5829       29.9094
 ABD       48           25.4947       26.8801
 ACD       24           42.8106       30.3486
 BCD       12           46.3012       29.8547

From this then,

for Case A (3-4-8-5): 512 E(T2) = 10239.6540 or E(T2) = 19.9993 ;

for Case B (8-5-4-3): 512 E(T2) = 10239.8865 or E(T2) = 19.9998 ;

i.e., the estimator is unbiased since T = 20.0000.

Using (4.5-5), the variances for these examples can be determined as follows:

 Unit   p_i*    q_i*    q_i*/p_i*   x_iA^2   x_iB^2   x_iA^2 q_i*/p_i*   x_iB^2 q_i*/p_i*

 A     .8750   .1250     .1429         9       64          1.2861             9.1456
 B     .5781   .4219     .7298        16       25         11.6768            18.2450
 C     .3301   .6699    2.0294        64       16        129.8816            32.4704
 D     .3301   .6699    2.0294        25        9         50.7350            18.2646

                                                 Sum:  A: 193.5795         B: 78.1256

                                      AB      AC      AD      BC      BD      CD

 q_i* q_j*                          .0527   .0837   .0837   .2826   .2826   .4488
 q_ij* = (1-p_i-p_j)^n              .0156   .0527   .0527   .2441   .2441   .4219
 q_i* q_j* - q_ij*                  .0371   .0310   .0310   .0385   .0385   .0269
 p_i* p_j*                          .5058   .2888   .2888   .1908   .1908   .1090
 (q_i* q_j* - q_ij*)/p_i* p_j*      .0733   .1073   .1073   .2018   .2018   .2468
 x_i x_j (A)                           12      24      15      32      20      40
 x_i x_j (B)                           40      32      24      20      15      12

                                                 Sum:  A: 25.4299          B: 18.9654
                                                 x 2:     50.8598              37.9308

Thus, for Case A:

V(T2)A = 193.5795 - 50.8598 = 142.7197 ,

and, for Case B:

V(T2)B = 78.1256 - 37.9308 = 40.1948.

When computed directly from the possible sample estimates, i.e., by using

$$V(T_2) = \sum_{s_v} P_{s_v} (T_{2 s_v} - T)^2 ,$$

the following results are obtained:

Case A: V(T2)A = 142.6741

Case B: V(T2)B = 40.1744

The slight discrepancies are due to rounding-off errors. Further, using (4.5-6), the following estimates of the variance are obtained for the various possible samples listed above:

 Sample      Case A           Case B

 A           not estimable
 B           not estimable
 C           not estimable
 D           not estimable
 AB           19.8801          36.0524
 AC          389.4929         101.5662
 AD          151.7662          60.3442
 BC          396.5720         119.2456
 BD          163.2139          78.8808
 CD          513.0051         143.4502
 ABC         390.8153         116.4865
 ABD         159.4966          77.9345
 ACD         505.6374         141.2127
 BCD         505.4355         156.3205
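The class two results can be re-derived without four-place rounding by enumerating all 64 ordered samples with exact fractions. The sketch below is illustrative (the function and variable names are chosen here); it computes the expectation and variance of $T_2$ by enumeration and compares the variance with the closed form in terms of $q_i^*$, $q_j^*$ and $q_{ij}^* = (1 - p_i - p_j)^n$.

```python
from fractions import Fraction
from itertools import product

p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
n = 3
CASES = {
    "A": {"A": 3, "B": 4, "C": 8, "D": 5},
    "B": {"A": 8, "B": 5, "C": 4, "D": 3},
}

p_star = {u: 1 - (1 - p[u]) ** n for u in p}  # inclusion probabilities
q_star = {u: 1 - p_star[u] for u in p}


def T2(sample, x):
    """Class two estimate: sum of x_i / p_i^* over the distinct units drawn."""
    return sum(x[u] / p_star[u] for u in set(sample))


def moments_by_enumeration(x):
    """Exact mean and variance of T2 over all ordered samples."""
    mean = Fraction(0)
    second = Fraction(0)
    for s in product(p, repeat=n):
        pr = Fraction(1)
        for u in s:
            pr *= p[u]
        t = T2(s, x)
        mean += pr * t
        second += pr * t * t
    return mean, second - mean ** 2


def variance_formula(x):
    """Closed-form variance: squared terms minus cross-product terms (i != j)."""
    units = list(p)
    first = sum(q_star[i] / p_star[i] * x[i] ** 2 for i in units)
    cross = sum(
        (q_star[i] * q_star[j] - (1 - p[i] - p[j]) ** n) / (p_star[i] * p_star[j]) * x[i] * x[j]
        for i in units for j in units if i != j
    )
    return first - cross


mean_A, var_A = moments_by_enumeration(CASES["A"])
mean_B, var_B = moments_by_enumeration(CASES["B"])
print(float(mean_A), float(var_A), float(mean_B), float(var_B))
```

Working in exact fractions makes it possible to see how much of the gap between the two tabulated variance figures is attributable to rounding.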

4.6 Class Three Estimator

4.6.1 The Estimator. The class three estimator, with weights dependent solely on the distinct sample drawn, is given by

$$T_3 = \gamma_{s_v} \sum_{i \in s_v} x_i$$ (4.6-1)

where $\gamma_{s_v}$ ($s_v$ = 1, 2, ..., S') is the weight attached to the $s_v$-th distinct sample whenever it appears. S' is the number of distinct samples and $i \in s_v$ again denotes summation over the distinct units in the sample.

4.6.2 Number of Weights. The total number of weights is $S' = \sum_{v=1}^{n} C^N_v$ with there being $C^N_v$ different sets of v distinct units in the sample of n.

4.6.3 Determination of the Weights. Imposing the criterion of unbiasedness says

$$E(T_3) = \sum_{s=1}^{S} P_s \gamma_{s_v} \sum_{i \in s_v} x_i \equiv \sum_{i=1}^{N} x_i$$ (4.6-2)

where $P_s$ denotes the probability of obtaining the s-th sample. (4.6-2) can be rewritten as

$$\sum_{i \in U} x_i \sum_{s \supset i} \gamma_{s_v} P_s \equiv \sum_{i=1}^{N} x_i$$

where $i \in U$ denotes all i in the universe and $s \supset i$ denotes those samples containing the i-th unit. For $T_3$ to be unbiased requires that

$$\sum_{s \supset i} \gamma_{s_v} P_s = 1 \quad \text{for all } i.$$

This expression can be rewritten as

$$\sum_{s \supset i} \gamma_{s_v} P_s = \sum_{v=1}^{n} \sum_{s_v \supset i} \sum_{s \in s_v} \gamma_{s_v} P_s = \sum_{v=1}^{n} \sum_{s_v \supset i} \gamma_{s_v} \sum_{s \in s_v} P_s$$

where, in the triple sum, the first summation is over the possible values of v. The second summation is over those distinct sets with v units which contain the i-th unit. The third summation (with index $s \in s_v$) groups the particular samples (sets of n, ordered by draw) into distinct sets (those with distinct sets of v units). The third sum will group $\Delta^v 0^n$ terms together, one for each particular sample within the distinct sample, and one can readily observe that

$$\sum_{s \in s_v} P_s = P_{s_v} .$$

Thus the requirement for unbiasedness becomes

$$\sum_{v=1}^{n} \sum_{s_v \supset i} \gamma_{s_v} P_{s_v} = 1 ,$$ (4.6-3)

from which a solution for the $\gamma_{s_v}$ is

$$\gamma_{s_v} = \frac{1}{n\, C^{N-1}_{v-1}} \cdot \frac{1}{P_{s_v}}$$ (4.6-4)

since, for each of the n values of v,

$$\sum_{s_v \supset i} \frac{1}{n\, C^{N-1}_{v-1}} = \frac{C^{N-1}_{v-1}}{n\, C^{N-1}_{v-1}} = \frac{1}{n} .$$

Thus, from (4.6-4),

$$T_3 = \frac{1}{n\, C^{N-1}_{v-1} P_{s_v}} \sum_{i \in s_v} x_i .$$ (4.6-5)

Note that a more general solution would be

$$\gamma_{s_v} = \frac{c_{s_v}}{P_{s_v}}$$

with the restriction that

$$\sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v} = 1 .$$

One could thus obtain additional solutions for the $\gamma$'s by suitable manipulation of the c's.

From the requirement for unbiasedness, (4.6-3), it can be seen that one can obtain the class two estimator for the restrictive case of equal selection probabilities. Assuming that the $\gamma_{s_v}$ are equal over all $s_v$, (4.6-3) becomes

$$\gamma_{(s_v)} \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} \equiv 1$$

or, since

$$\sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = p^* = \text{the inclusion probability for the i-th unit,}$$

$$\gamma_{s_v} = \frac{1}{p^*}$$

and

$$T_3 \text{ (equal)} = \frac{1}{p^*} \sum_{i \in s_v} x_i .$$

4.6.4 Variance of $T_3$. One can determine the variance of $T_3$ as follows:

$$V(T_3) = \sum_{v=1}^{n} \sum_{s_v=1}^{S_v} P_{s_v} \Big(\gamma_{s_v} \sum_{i \in s_v} x_i\Big)^2 - T^2 .$$ (4.6-6)

This can be estimated by

$$\hat{V}(T_3) = T_3^2 - \widehat{T^2} .$$

To obtain an estimator for $V(T_3)$, an unbiased estimator, $\widehat{T^2}$, must be found for $T^2 = \sum_{i=1}^{N} x_i^2 + \sum_{i \ne j}^{N} x_i x_j$. The simplest unbiased estimator of $T^2$ is given by

$$\widehat{T^2} = \frac{\sum_{i \in s_v} x_i^2}{n\, C^{N-1}_{v-1} P_{s_v}} + \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1)\, C^{N-2}_{v-2} P'_{s_v}}$$

where

$$P'_{s_v} = \frac{P_{s_v}}{1 - \sum_{i=1}^{N} p_i^n} = \frac{P_{s_v}}{\sum_{v=2}^{n} \sum_{s_v=1}^{S_v} P_{s_v}} .$$

Note that at least two different units ($v \ge 2$) are required to estimate the cross product term. The above can readily be shown to be unbiased as follows:

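$$E\left(\frac{\sum_{i \in s_v} x_i^2}{n\, C^{N-1}_{v-1} P_{s_v}}\right) = \sum_{v=1}^{n} \sum_{s_v=1}^{S_v} P_{s_v} \frac{\sum_{i \in s_v} x_i^2}{n\, C^{N-1}_{v-1} P_{s_v}} = \sum_{i \in U} x_i^2 \sum_{v=1}^{n} \sum_{s_v \supset i} \frac{1}{n\, C^{N-1}_{v-1}} = \sum_{i=1}^{N} x_i^2 ,$$

$$E\left(\frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1)\, C^{N-2}_{v-2} P'_{s_v}}\right) = \sum_{v=2}^{n} \sum_{s_v=1}^{S_v} P'_{s_v} \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1)\, C^{N-2}_{v-2} P'_{s_v}} = \sum_{i \ne j \in U} x_i x_j \sum_{v=2}^{n} \sum_{s_v \supset i,j} \frac{1}{(n-1)\, C^{N-2}_{v-2}} = \sum_{i \ne j}^{N} x_i x_j .$$

Thus:

$$\hat{V}(T_3) = \left[\frac{\sum_{i \in s_v} x_i}{n\, C^{N-1}_{v-1} P_{s_v}}\right]^2 - \frac{\sum_{i \in s_v} x_i^2}{n\, C^{N-1}_{v-1} P_{s_v}} - \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1)\, C^{N-2}_{v-2} P'_{s_v}} .$$ (4.6-7)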

It may be noted that this estimator can be negative for certain samples.

4.6.5 Numerical Example. To illustrate the techniques of Section 4.6, the examples of Section 4.5.5 can be used. Since distinct samples are again involved, the following results are readily obtainable for

$$T_3 = \frac{1}{n\, C^{N-1}_{v-1} P_{s_v}} \sum_{i \in s_v} x_i$$ (4.6-5)

          (1)          (2)               (3)               T3 = 512(3)/(1)(2)
 s_v    512 P_sv   n C(N-1,v-1)    Sum of x_i, i in s_v
                                      A       B             A            B

 A         64           3             3       8            8.0000      21.3333
 B          8           3             4       5           85.3333     106.6667
 C          1           3             8       4         1365.3333     682.6667
 D          1           3             5       3          853.3333     512.0000
 AB       144           9             7      13            2.7654       5.1358
 AC        60           9            11      12           10.4296      11.3778
 AD        60           9             8      11            7.5852      10.4296
 BC        18           9            12       9           37.9259      28.4444
 BD        18           9             9       8           28.4444      25.2840
 CD         6           9            13       7          123.2593      66.3704
 ABC       48           9            15      17           17.7777      20.1481
 ABD       48           9            12      16           14.2222      18.9630
 ACD       24           9            16      15           37.9259      35.5556
 BCD       12           9            17      12           80.5926      56.8889

From this then,

for Case A (3-4-8-5): 512 E(T3) = 10239.9878 or E(T3)A = 20.0000,

for Case B (8-5-4-3): 512 E(T3) = 10239.9983 or E(T3)B = 20.0000,

i.e., the estimator is unbiased.

Using (4.6-6) (which is the same computationally as the variance of all possible sample estimates), the variance of T3, for these examples, is:

V(T3)A = 5331.8290 ,

V(T3)B = 1601.6463 .

Further, using (4.6-7), the various samples delineated above for which v = 2 or 3 produce estimates of the variance of the total as follows:

 Sample       Case A          Case B

 AB          -38.7291       -130.4508
 AC         -135.6382       -179.9976
 AD          -84.2017       -135.6382
 BC          406.8677        192.8370
 BD          192.8370        166.8239
 CD        11429.0031       3291.9930
 ABC          99.6849       -138.2485
 ABD         -71.4258       -116.9902
 ACD         485.2026        432.7377
 BCD        4318.3894       2141.5599
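The class three figures can likewise be reproduced by grouping the ordered samples into distinct samples. In the sketch below (names chosen here for illustration) the weight attached to a distinct sample with v distinct units is taken as the reciprocal of n, times the number of combinations of v - 1 units from the other N - 1, times the probability of that distinct sample.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
n, N = 3, 4
CASES = {
    "A": {"A": 3, "B": 4, "C": 8, "D": 5},
    "B": {"A": 8, "B": 5, "C": 4, "D": 3},
}

# Probability of each distinct sample: sum over the ordered samples sharing
# the same set of distinct units.
P_distinct = defaultdict(Fraction)
for s in product(p, repeat=n):
    pr = Fraction(1)
    for u in s:
        pr *= p[u]
    P_distinct[frozenset(s)] += pr


def T3(distinct, x):
    """Class three estimate for a distinct sample (a frozenset of unit labels)."""
    v = len(distinct)
    gamma = 1 / (n * comb(N - 1, v - 1) * P_distinct[distinct])
    return gamma * sum(x[u] for u in distinct)


def moments(x):
    """Exact mean and variance of T3 over all distinct samples."""
    mean = sum(P * T3(s, x) for s, P in P_distinct.items())
    var = sum(P * (T3(s, x) - mean) ** 2 for s, P in P_distinct.items())
    return mean, var


mean_A, var_A = moments(CASES["A"])
mean_B, var_B = moments(CASES["B"])
print(float(mean_A), float(var_A), float(mean_B), float(var_B))
```

The very large estimates attached to the rare single-unit samples C and D account for most of the variance.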


4.7 Class Four Estimator

4.7.1 The Estimator. The class four estimator, with weights dependent on both the presence or absence of a unit and the order of appearance of the units, is given by

$$T_4 = \sum_{t=1}^{n} \delta_{it}\, x_{i_t}$$ (4.7-1)

where $\delta_{it}$ (i = 1, 2, ..., N; t = 1, 2, ..., n) is the weight attached to the i-th element whenever it appears at the t-th draw.

4.7.2 Number of Weights. The total number of weights is N n (N

weights at each of n draws).

4.7.3 Determination of the Weights. Since the weights, $\delta_{it}$, are attached depending on the appearance of the i-th element on a particular draw, as for the class two estimator, the estimator can be rewritten introducing the characteristic random variable $z_{it}$. Thus (4.7-1) becomes:

$$T_4 = \sum_{i=1}^{N} \sum_{t=1}^{n} \delta_{it} x_i z_{it}$$ (4.7-2)

where

$z_{it}$ = 1 if the i-th element appears at the t-th draw

= 0 if the i-th element does not appear at the t-th draw

and $E(z_{it}) = p_i$ since the individual draws are independent.

Taking expectations,

$$E(T_4) = \sum_{i=1}^{N} x_i \sum_{t=1}^{n} \delta_{it} p_i .$$

Imposing the criterion of unbiasedness, i.e., requiring that $E(T_4) = \sum_{i=1}^{N} x_i$, means that the $\delta_{it}$ can be determined by setting

$$\sum_{t=1}^{n} \delta_{it} p_i = 1 \quad \text{for } i = 1, 2, \ldots, N.$$

The obvious solution for this is to set

$$\delta_{it} = \frac{1}{n p_i}$$ (4.7-3)

which weights hold for all i, and are independent of the order of the draw (the t's). This yields the familiar

$$T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} .$$ (4.7-4)

A more general solution might be $\delta_{it} = \omega_t / p_i$ where $\sum_t \omega_t = 1$, but it is well known that the variance of a linear function, with arbitrary weights, is minimized when the weights are equal, i.e., when $\omega_t = 1/n$ for all t.

When the selection probabilities are equal it is readily seen that the class four estimator reduces to the class one estimator (4.4-3).

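4.7.4 Variance of $T_4$. To determine the variance of $T_4$, set

$$V(T_4) = V\Big(\sum_{i=1}^{N} \sum_{t=1}^{n} \frac{1}{n p_i} x_i z_{it}\Big) = \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{x_i^2}{n^2 p_i^2} V(z_{it}) + \sum_{i \ne j}^{N} \sum_{t=1}^{n} \frac{x_i x_j}{n^2 p_i p_j} \mathrm{Cov}(z_{it}, z_{jt}) .$$ (4.7-5)

Note that the terms involving $\mathrm{Cov}(z_{it}, z_{it'})$ and $\mathrm{Cov}(z_{it}, z_{jt'})$ disappear by virtue of the independence of the draws, one from another, so that the terms involving the t-th and t'-th draws have zero covariance. Now, from multinomial theory for a single draw,

$V(z_{it}) = p_i q_i$ where $q_i = 1 - p_i$

$\mathrm{Cov}(z_{it}, z_{jt}) = - p_i p_j$.

Substituting these into (4.7-5) produces

$$V(T_4) = \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{x_i^2}{n^2 p_i^2}\, p_i q_i + \sum_{i \ne j}^{N} \sum_{t=1}^{n} \frac{x_i x_j}{n^2 p_i p_j} (- p_i p_j)$$

$$= \sum_{i=1}^{N} \frac{x_i^2}{n} \Big(\frac{1 - p_i}{p_i}\Big) - \frac{1}{n} \sum_{i \ne j}^{N} x_i x_j$$

$$= \sum_{i=1}^{N} \frac{x_i^2}{n p_i} - \frac{1}{n} \Big(\sum_{i=1}^{N} x_i\Big)^2$$

$$= \sum_{i=1}^{N} \frac{x_i^2}{n p_i} - \frac{T^2}{n}$$

and

$$V(T_4) = \frac{1}{n} \sum_{i=1}^{N} p_i \Big(\frac{x_i}{p_i} - T\Big)^2 .$$ (4.7-6)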

One can estimate this variance by using

$$\hat{V}(T_4) = \frac{1}{n(n-1)} \sum_{t=1}^{n} \Big(\frac{x_{i_t}}{p_{i_t}} - \bar{T}\Big)^2$$ (4.7-7)

where $\bar{T} = \dfrac{1}{n} \sum_{t=1}^{n} \dfrac{x_{i_t}}{p_{i_t}}$.

It is to be noted that (4.7-7) always produces positive estimates of

the variance which is a definite interpretational attribute.

4.7.5 Numerical Example. The class four estimator depends on sum-

mation over the units of the sample as they appear, and not just the

distinct units observed. Thus, in using the four-unit population from

Section 4.5.5 as an example, the interest is in the groups of samples

that have the same units the same number of times. Order is not impor-

tant, so the samples and results for this case can be grouped as follows:

$$T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}}$$ (4.7-4)

 Unit                 A     B     C     D

 x_i/p_i   Case A     6    16    64    40
           Case B    16    20    32    24

 Sample   512 P_s'    Sum of x_i/p_i, i in s'          T4
                         A        B               A          B

 AAA         64         18       48             6.0000    16.0000
 AAB         96         28       52             9.3333    17.3333
 AAC         48         76       64            25.3333    21.3333
 AAD         48         52       56            17.3333    18.6667
 ABB         48         38       56            12.6667    18.6667
 ABC         48         86       68            28.6667    22.6667
 ABD         48         62       60            20.6667    20.0000
 ACC         12        134       80            44.6667    26.6667
 ACD         24        110       72            36.6667    24.0000
 ADD         12         86       64            28.6667    21.3333
 BBB          8         48       60            16.0000    20.0000
 BBC         12         96       72            32.0000    24.0000
 BBD         12         72       64            24.0000    21.3333
 BCC          6        144       84            48.0000    28.0000
 BCD         12        120       76            40.0000    25.3333
 BDD          6         96       68            32.0000    22.6667
 CCC          1        192       96            64.0000    32.0000
 CCD          3        168       88            56.0000    29.3333
 CDD          3        144       80            48.0000    26.6667
 DDD          1        120       72            40.0000    24.0000

Thus, for these examples:

for Case A (3-4-8-5): 512 E(T4) = 10240.0000 or E(T4)A = 20.0000 ,

for Case B (8-5-4-3): 512 E(T4) = 10239.9994 or E(T4)B = 20.0000 .

Using (4.7-6), the variance of T4, for these examples, is

V(T4)A = 131.3333 ,

V(T4)B = 9.3333.

When computed directly from the sample estimates, the results are:

V(T4)A = 131.2813 ,

V(T4)B = 9.3333.

Further, using (4.7-7), the following estimates of the variance are

obtained for the various possible samples listed above:


Sample      Case A       Case B
AAA         not estimable
AAB         11.1111       1.7778
AAC        373.7778      28.4444
AAD        128.4444       7.1111
ABB         11.1111       1.7778
ABC        320.4444      23.1111
ABD        101.7778       5.3333
ACC        373.7778      28.4444
ACD        283.1111      21.3333
ADD        128.4444       7.1111
BBB         not estimable
BBC        256.0000      16.0000
BBD         64.0000       1.7778
BCC        256.0000      16.0000
BCD        192.0000      12.4444
BDD         64.0000       1.7778
CCC         not estimable
CCD         64.0000       7.1111
CDD         64.0000       7.1111
DDD         not estimable
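The expectation and variance of T4 quoted in this example can be checked by brute force. The sketch below enumerates all 4^3 = 64 ordered samples; the names `cases` and `t4_moments` are hypothetical, and the use of exact fractions is an assumption made only to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Assumed inputs: the four-unit population with its two value assignments.
p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
cases = {"A": {"A": 3, "B": 4, "C": 8, "D": 5},
         "B": {"A": 8, "B": 5, "C": 4, "D": 3}}
n = 3

def t4_moments(x):
    """Exact mean and variance of T4 over all N**n ordered samples."""
    mean = second = Fraction(0)
    for draws in product(p, repeat=n):
        prob = Fraction(1)
        for u in draws:
            prob *= p[u]                      # draws are independent
        est = sum(x[u] / p[u] for u in draws) / n
        mean += prob * est
        second += prob * est * est
    return mean, second - mean ** 2

for label, x in cases.items():
    mean, var = t4_moments(x)
    print(label, float(mean), float(var))
```

In exact arithmetic the enumeration returns a mean of 20 for both cases and variances of 394/3 and 28/3, the values given by (4.7-6). The slightly different 131.2813 reported for the direct computation in Case A presumably reflects rounding in the hand calculation.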

4.8 Class Five Estimator

4.8.1 The Estimator. The class five estimator, with weights dependent on the presence or absence of a particular unit in the distinct sample drawn, is given by:

$$T_5 = \sum_{i \in s_v} \theta_{s_v i}\, x_i \qquad (4.8\text{-}1)$$

where $\theta_{s_v i}$ ($i = 1, 2, \ldots, N$; $s_v = 1, 2, \ldots, S'$) is the weight attached to the i-th element whenever it appears in the $s_v$-th sample. Summation again is over distinct units.

4.8.2 Number of Weights. The total number of weights is

$$\sum_{v=1}^{n} v \binom{N}{v} = N \sum_{v=1}^{n} \binom{N-1}{v-1},$$

with $v \binom{N}{v}$ corresponding to the situation where there are $\binom{N}{v}$ combinations of v distinct units from among the N, and a weight is attached to each of the v units in the distinct sample. Alternatively, there are $\binom{N-1}{v-1}$ samples with v distinct units which contain a given specific unit, say the j-th, and N such units.
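The two ways of counting the weights can be compared numerically. This is a minimal sketch with hypothetical function names; `math.comb` stands in for the combinatorial symbol.

```python
from math import comb

def weights_by_sample(N, n):
    """Sum over v of v * C(N, v): v weights for each distinct sample of v units."""
    return sum(v * comb(N, v) for v in range(1, n + 1))

def weights_by_unit(N, n):
    """N * sum over v of C(N-1, v-1): distinct samples containing a given unit."""
    return N * sum(comb(N - 1, v - 1) for v in range(1, n + 1))

for N, n in [(4, 3), (10, 4), (25, 6)]:
    assert weights_by_sample(N, n) == weights_by_unit(N, n)
    print(N, n, weights_by_sample(N, n))
```

For N = 4 and n = 3 both counts give 28, which matches the number of coefficients appearing in the fourteen distinct samples of the numerical example in Section 4.8.5.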

4.8.3 Determination of the Weights. As with the class three estimator, to determine weights for the class five estimator which satisfy the criterion of unbiasedness, expectation must be taken over all possible samples. This leads to equations of the following form:

$$E(T_5) = \sum_{s_v = 1}^{S'} P_{s_v} \sum_{i \in s_v} \theta_{s_v i}\, x_i$$

which, in turn, stipulate that for the estimator to be unbiased one must determine a set of weights satisfying

$$\sum_{i=1}^{N} x_i \sum_{v=1}^{n} \sum_{s_v \ni i} P_{s_v}\, \theta_{s_v i} \equiv \sum_{i=1}^{N} x_i . \qquad (4.8\text{-}2)$$

The solution of this equation, or the determination of a set of values which satisfy it, is a problem in combinatorial number theory. As a special case of the class five estimator, if the subscript "i" is suppressed, one can determine the weights directly from the identity (4.8-2).


Thus

$$\sum_{v=1}^{n} \sum_{s_v \ni i} P_{s_v}\, \theta_{s_v}(i) = 1$$

must hold in order that $T_5 = \sum_{i \in s_v} \theta_{s_v}(i)\, x_i$ be unbiased. This, of course, is the same criterion as obtained for the class three estimator (4.6-3). This yields as a general solution $\theta_{s_v}(i) = c_{s_v}(i)/P_{s_v}$ with the restriction that $\sum_{v=1}^{n} \sum_{s_v \ni i} c_{s_v}(i) = 1$, or yields a specific solution

e·v(i) = ~ p.v ~=ir ·Another solution is the estimator given by Basu (1958), which be-

longs to this class for certain special values of the c' s. Consider

$$\sum_{v=1}^{n} \sum_{s_v \ni i} c_{s_v}(i) = 1, \qquad i = 1, 2, \ldots, N. \qquad (4.8\text{-}3)$$

The c-coefficients relate to the possible samples of size n containing v = 1, 2, ..., n distinct units. Also it is only meaningful to determine them in the context of probability values relating to samples of size n. The right hand side of (4.8-3) will result in multinomial probabilities relating to samples of size n-1. Now multiply both sides of (4.8-3) by $p_i$, yielding

$$p_i \sum_{v=1}^{n} \sum_{s_v \ni i} c_{s_v}(i) = p_i\,(p_1 + \cdots + p_N)^{n-1}, \quad \text{for all } i,$$

$$= p_i \sum_{v=1}^{n-1} \sum_{s_v} \sum_{P(n-1 \mid v)} \frac{(n-1)!}{\prod_j n_j!} \prod_j p_j^{n_j} .$$


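Choose the following solutions to this equation:

$$p_i\, c_{s_1}(i) = p_i \left[ p_i^{\,n-1} \right]$$

(i = 1, 2, ..., N; v = 1, the i-th unit),

$$p_i\, c_{s_2}(i) = p_i \left[ \sum_{P(n-1 \mid 2)} \frac{(n-1)!}{n_i!\, n_j!}\, p_i^{n_i} p_j^{n_j} + p_j^{\,n-1} \right]$$

(i = 1, 2, ..., N; v = 2, say units i, j),

and similarly for v = 3 (say units i, j, k) and v = 4 (say units i, j, k, h), or, in general,

$$p_i\, c_{s_v}(i) = p_i \left[ \sum_{P(n-1 \mid v)} \frac{(n-1)!}{n_i!\, n_{j_1}! \cdots n_{j_{v-1}}!}\, p_i^{n_i} p_{j_1}^{n_{j_1}} \cdots p_{j_{v-1}}^{n_{j_{v-1}}} + \sum_{P(n-1 \mid v-1)} \frac{(n-1)!}{n_{j_1}! \cdots n_{j_{v-1}}!}\, p_{j_1}^{n_{j_1}} \cdots p_{j_{v-1}}^{n_{j_{v-1}}} \right] \qquad (4.8\text{-}4)$$

(i = 1, 2, ..., N; $2 \le v \le n-1$, say units i, $j_1, j_2, \ldots, j_{v-1}$).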

The solutions listed above hold simply because the sum of the multinomial probabilities in the square brackets for any given i, and for all sets of distinct v's, add to $\big( \sum_{i=1}^{N} p_i \big)^{n-1} = 1$. Of course, in the light of the above demonstration, $p_i$ need not be multiplied to both sides of the condition of unbiasedness, (4.8-3), but this device helps in the choice of the probability functions in relation to the sets of possible distinct samples of size n.

It will be seen that the sum of the multinomial probabilities in square brackets on the right hand side of each equation, when multiplied by $p_i$, is the probability of selecting the i-th unit to complete a given collection of v distinct units.

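Thus the coefficients can be determined as

$$\theta_{s_v}(i) = \frac{c_{s_v}(i)}{P_{s_v}} = \frac{[\ ]}{P_{s_v}} \qquad (4.8\text{-}5)$$

which yields the estimator

$$T_5' = \sum_{i \in s_v} \frac{p_i\,[\ ]}{P_{s_v}} \cdot \frac{x_i}{p_i} \qquad (4.8\text{-}6)$$

where [ ] denotes the term inside square brackets in (4.8-4). This estimator, when divided by N, is equivalent to the estimator obtained by Basu for estimating the population mean.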

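Also, the estimator given by Des Raj and Khamis (1958) belongs to this class and is a special case for equal selection probabilities, i.e., $p_i = 1/N$ for all i. In this situation

$$\frac{p_i\,[\ ]}{P_{s_v}} = \frac{(1/N)\,(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1})\, N^{-(n-1)}}{\Delta^v 0^n\, N^{-n}} = \frac{\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1}}{\Delta^v 0^n}$$

using the "differences of zero" notation, as explained in Section 4.3(3), rather than the summation notation. Further, from the definition of $\Delta^v 0^n$ given in Section 4.2, it can be readily shown by induction that

$$\Delta^v 0^n = v\,(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1})$$

so that

$$T_5' = \frac{N}{v} \sum_{i \in s_v} x_i .$$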

Another special case can be obtained by stipulating that $\theta_{s_v i} = \theta_i$ for all $s_v \ni i$. This situation produces a requirement for unbiasedness which is identical to that of the class two estimator. An alternate derivation would be:

$$\theta_i \sum_{v=1}^{n} \sum_{s_v \ni i} P_{s_v} = 1$$

since $\sum_{v=1}^{n} \sum_{s_v \ni i} P_{s_v}$ is the probability that the sample includes the i-th unit, and so equals $P_i^*$. Thus, as for (4.5-3),

$$\theta_i = \frac{1}{P_i^*} .$$

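4.8.4 Variance of T5. In very general form, the variance of T5 will be

$$V(T_5) = \sum_{v=1}^{n} \sum_{s_v} P_{s_v} \Big( \sum_{i \in s_v} \theta_{s_v i}\, x_i \Big)^2 - T^2 \qquad (4.8\text{-}7)$$

which can be estimated by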


[

i~S X~- n P v eN-i

s vv(408-8)

using the simplest unbiased estimator of $T^2$ as given in Section 4.6.4. Again negative estimates of the variance are possible.

4.8.5 Numerical Example. The coefficients for the class five estimator depend on the appearance of a particular unit in a given distinct sample. Using (4.8-6) to illustrate this class, the coefficients for the distinct units within each sample are determined from (4.8-5) after evaluating (4.8-4) to determine the [ ]-term. This term is dependent on the selection probabilities, the sample size n, and the number of distinct units v. For samples of size n = 3, this term is:

for v = 1:   $[\ ] = p_i^2$,

for v = 2:   $[\ ] = 2 p_i p_j + p_j^2$,

for v = 3:   $[\ ] = 2 p_j p_k$,

and is applied to the coefficient for the i-th unit in the distinct sample.

                                                                T5'
Sample   512 P_{s_v}   Estimator                        Case A     Case B
A             64       2A                                6.0000    16.0000
B              8       4B                               16.0000    20.0000
C              1       8C                               64.0000    32.0000
D              1       8D                               40.0000    24.0000
AB           144       (10/9)A + (16/9)B                10.4444    17.7778
AC            60       (6/5)A + (16/5)C                 29.2000    22.4000
AD            60       (6/5)A + (16/5)D                 19.6000    19.2000
BC            18       (20/9)B + (32/9)C                37.3333    25.3333
BD            18       (20/9)B + (32/9)D                26.6667    21.7778
CD             6       4C + 4D                          52.0000    28.0000
ABC           48       (2/3)A + (4/3)B + (8/3)C         28.6667    22.6667
ABD           48       (2/3)A + (4/3)B + (8/3)D         20.6667    20.0000
ACD           24       (2/3)A + (8/3)C + (8/3)D         36.6667    24.0000
BCD           12       (4/3)B + (8/3)C + (8/3)D         40.0000    25.3333

Thus, for these examples:

for Case A (3-4-8-5): 512 E(T5') = 10239.9976 or E(T5')A = 20.0000,

for Case B (8-5-4-3): 512 E(T5') = 10240.0042 or E(T5')B = 20.0000.

The variance of T5', computed directly from the distribution of estimates, is

V(T5')A = 118.5349,

V(T5')B = 8.3961.
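The coefficients and moments in this example can be reproduced with a short enumeration. The sketch below is one possible reading of (4.8-4) to (4.8-6): the bracketed term for unit i in a distinct sample is taken as the probability that n-1 draws show either the whole distinct sample or the sample without unit i. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed inputs: the four-unit population with its two value assignments.
p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
cases = {"A": {"A": 3, "B": 4, "C": 8, "D": 5},
         "B": {"A": 8, "B": 5, "C": 4, "D": 3}}
n = 3

def distinct_set_probs(size):
    """Probability that `size` draws yield exactly each set of distinct units."""
    probs = {}
    for draws in product(p, repeat=size):
        prob = Fraction(1)
        for u in draws:
            prob *= p[u]
        key = frozenset(draws)
        probs[key] = probs.get(key, Fraction(0)) + prob
    return probs

full = distinct_set_probs(n)        # P_{s_v} for samples of size n
short = distinct_set_probs(n - 1)   # distinct-set probabilities for n - 1 draws

def weight(unit, s):
    """[ ] / P_{s_v}: the n-1 draws show all of s, or s without `unit`."""
    bracket = short.get(s, Fraction(0)) + short.get(s - {unit}, Fraction(0))
    return bracket / full[s]

def t5(s, x):
    """Estimate for the distinct sample s."""
    return sum(weight(u, s) * x[u] for u in s)

def t5_moments(x):
    """Exact mean and variance over the distribution of distinct samples."""
    mean = sum(full[s] * t5(s, x) for s in full)
    second = sum(full[s] * t5(s, x) ** 2 for s in full)
    return mean, second - mean ** 2

for label, x in cases.items():
    mean, var = t5_moments(x)
    print(label, float(mean), float(var))
```

Under this reading the weight of A in the sample AB is 10/9, as tabulated, the mean is exactly 20 in both cases, and the variances agree with the reported 118.5349 and 8.3961 to about three decimals.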

4.9 Class Six Estimator

4.9.1 The Estimator. The class six estimator, with weights dependent on both the order of appearance of the units and the particular sample involved, is given as

$$T_6 = \phi_s \sum_{t=1}^{n} x_{it} \qquad (4.9\text{-}1)$$

where $\phi_s$ (s = 1, 2, ..., S) is the weight attached to the s-th sample whose elements appear in a specified order, and $x_{it}$ is the characteristic value observed on the i-th unit at the t-th draw.

4.9.2 Number of Weights. The total number of weights is $S = N^n$ since for this case where attention is paid to the ordering of the elements within the sample, there will be a separate weight for each sample.

4.9.3 Determination of the Weights. Taking the expectation of (4.9-1) over all possible samples yields

$$E(T_6) = E\Big( \phi_s \sum_{i \in s} x_i \Big)$$

where summation is over all i appearing in s, including repetitions. Thus

$$E(T_6) = \sum_{s=1}^{S} P_s\, \phi_s \sum_{i \in s} x_i = \sum_{i=1}^{N} x_i \sum_{s}{}' P_s\, \phi_s$$

with $\sum'$ denoting summation over all appearances of the i-th unit, I in number as derived in Section 4.3(8). Imposing the condition of unbiasedness, i.e., setting

$$\sum_{i=1}^{N} x_i \sum_{s \ni i}{}' P_s\, \phi_s \equiv \sum_{i=1}^{N} x_i$$

requires that

$$\sum_{s \ni i}{}' P_s\, \phi_s = 1 \quad \text{for all } i. \qquad (4.9\text{-}2)$$

A set of weights which satisfied this requirement would be

$$\phi_s = \frac{1}{P_s \sum'_{s \ni i} 1} = \frac{1}{P_s I} \qquad (4.9\text{-}3)$$

where $I = n N^{n-1}$ is the number of times that the i-th unit appears in the $N^n$ samples and is developed in Section 4.3(8). This yields, then, as an estimator of the population total


$$T_6 = \frac{1}{P_s I} \sum_{t=1}^{n} x_{it} = \frac{1}{P_s\, n N^{n-1}} \sum_{t=1}^{n} x_{it} = \frac{N}{n H_s} \sum_{t=1}^{n} x_{it} \qquad (4.9\text{-}4)$$

where

$$H_s = \prod_{i \in s_v} (N p_i)^{n_i} . \qquad (4.9\text{-}5)$$

For the case where the selection probabilities are equal, i.e., $p_i = p = 1/N$, then $H_s = 1$ and $\phi_s$ reduces to the familiar form: $\phi_s = N/n$, so that, for the equal probability case,

$$T_6 = \frac{N}{n} \sum_{t=1}^{n} x_{it} .$$

This, it will be recalled, is the estimator obtained in class one for the restrictive case, i.e., equal selection probabilities, for which the class one estimator did exist. Further, if one sets $\phi_s = \phi_{s_v}$ for all $s \in s_v$, then the class three estimator can be derived, since the requirement for unbiasedness would be

$$\sum_{s=1}^{S} P_s\, \phi_{s_v} \sum_{i \in s} x_i \equiv T$$

$$\sum_{i \in U} x_i \sum_{v=1}^{n} \sum_{s_v \ni i} \phi_{s_v} \sum_{s \in s_v} P_s \equiv T$$

$$\sum_{i=1}^{N} x_i \sum_{v=1}^{n} \sum_{s_v \ni i} \phi_{s_v} P_{s_v} \equiv T$$

or

$$\sum_{v=1}^{n} \sum_{s_v \ni i} \phi_{s_v} P_{s_v} = 1$$

which is the same as (4.6-3), the unbiasedness requirement for class three.

4.9.4 Variance of T6. The variance of T6 can be determined quite simply as follows:

$$V(T_6) = \sum_{s=1}^{S} P_s \Big( \phi_s \sum_{t=1}^{n} x_{it} \Big)^2 - T^2 . \qquad (4.9\text{-}6)$$

Further expansion of this expression would become involved, for, with summation over all units drawn including repetitions, some of the cross-product terms are, in fact, squares. An estimator for the variance of T6 would be

$$\hat{V}(T_6) = T_6^2 - \left[ \frac{\sum_{i \in s} x_i^2}{P_s I} + \frac{\sum_{i \ne j} x_i x_j}{P_s L} \right]$$

where I is the total number of appearances of the i-th unit in all possible samples and L is the total number of times the (i,j)-cross-product occurs in all possible samples. That this is unbiased follows directly from the expectation methods used in this section, and along the lines used to prove $\hat{T}^2$ is unbiased in Section 4.6.4.

4.9.5 Numerical Example. The weights for the class six estimator depend on the particular sample. For brevity in the listing, it can be noted that under the assumption that the selection probabilities are constant over all draws, the probability of obtaining a particular sample depends on the units drawn, and not on the order in which they are drawn. Thus particular samples having the same units the same number of times can be lumped together, as in the discussion of the class four estimator. Again using the four-unit population of Section 4.5.5, the estimates produced by

$$T_6 = \frac{1}{P_s I} \sum_{i \in s} x_i \qquad (4.9\text{-}4)$$

would be:

                                   P_s I =         Sum of x_i, i in s            T6
Sample   512 P_s f   512 P_s   P_s n N^(n-1)       A        B          Case A      Case B
AAA         64          64          6               9       24          1.5000      4.0000
AAB         96          32          3              10       21          3.3333      7.0000
AAC         48          16         3/2             14       20          9.3333     13.3333
AAD         48          16         3/2             11       19          7.3333     12.6667
ABB         48          16         3/2             11       18          7.3333     12.0000
ABC         48           8         3/4             15       17         20.0000     22.6667
ABD         48           8         3/4             12       16         16.0000     21.3333
ACC         12           4         3/8             19       16         50.6667     42.6667
ACD         24           4         3/8             16       15         42.6667     40.0000
ADD         12           4         3/8             13       14         34.6667     37.3333
BBB          8           8         3/4             12       15         16.0000     20.0000
BBC         12           4         3/8             16       14         42.6667     37.3333
BBD         12           4         3/8             13       13         34.6667     34.6667
BCC          6           2         3/16            20       13        106.6667     69.3333
BCD         12           2         3/16            17       12         90.6667     64.0000
BDD          6           2         3/16            14       11         74.6667     58.6667
CCC          1           1         3/32            24       12        256.0000    128.0000
CCD          3           1         3/32            21       11        224.0000    117.3333
CDD          3           1         3/32            18       10        192.0000    106.6667
DDD          1           1         3/32            15        9        160.0000     96.0000

Thus, for these examples:

for Case A (3-4-8-5): 512 E(T6) = 10239.9952 or E(T6)A = 20.0000,

for Case B (8-5-4-3): 512 E(T6) = 10240.0000 or E(T6)B = 20.0000.

Using (4.9-6), the variance of T6, for these examples, is

V(T6)A = 1009.9484,

V(T6)B = 354.6458.
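As a check on the class six figures, the following sketch enumerates every ordered sample and applies the weight 1/(P_s I) with I = n N^(n-1). The names are hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Assumed inputs: the four-unit population with its two value assignments.
p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
cases = {"A": {"A": 3, "B": 4, "C": 8, "D": 5},
         "B": {"A": 8, "B": 5, "C": 4, "D": 3}}
n = 3
N = len(p)
I = n * N ** (n - 1)   # appearances of any one unit across the N**n ordered samples

def t6_moments(x):
    """Exact mean and variance of T6 = sum(x) / (P_s * I) over ordered samples."""
    mean = second = Fraction(0)
    for draws in product(p, repeat=n):
        prob = Fraction(1)
        for u in draws:
            prob *= p[u]
        est = sum(x[u] for u in draws) / (prob * I)
        mean += prob * est
        second += prob * est ** 2
    return mean, second - mean ** 2

for label, x in cases.items():
    mean, var = t6_moments(x)
    print(label, float(mean), float(var))
```

The enumeration gives a mean of exactly 20 in both cases and variances that agree with the reported 1009.9484 and 354.6458 to about three decimals.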

4.10 Class Seven Estimator

4.10.1 The Estimator. The class seven estimator, the most general class of estimators with weights dependent on the order of draw, the presence or absence of a unit, and the particular sample involved, is given by:

$$T_7 = \sum_{t=1}^{n} \psi_{sit}\, x_{it} \qquad (4.10\text{-}1)$$

where $\psi_{sit}$ (t = 1, 2, ..., n; i = 1, 2, ..., N; s = 1, 2, ..., S) is the weight attached to the i-th unit appearing at the t-th draw in the s-th sample (whose elements, of course, appear in a specified order).

4.10.2 Number of Weights. The total number of weights is $n N^n$; n for each of the $N^n$ samples, since the $\psi$'s depend on the sample, unit and order of draw.

4.10.3 Determination of the Weights. In a manner similar to that used in class six, the restrictions for unbiasedness can be derived along the following lines:

$$E(T_7) = \sum_{s=1}^{S} P_s \sum_{i \in s} \psi_{sit}\, x_i = \sum_{i=1}^{N} x_i \sum_{s}{}' \psi_{sit}\, P_s$$

with $\sum_s'$ having meaning as in Section 4.9.3. Thus, to produce unbiasedness the weights must satisfy

$$\sum_{i=1}^{N} x_i \sum_{s}{}' \psi_{sit}\, P_s \equiv \sum_{i=1}^{N} x_i ,$$

that is

$$\sum_{s}{}' \psi_{sit}\, P_s = 1 \quad \text{for all } i. \qquad (4.10\text{-}2)$$

A general solution to (4.10-2) would be

$$\psi_{sit} = \frac{c_{sit}}{P_s} \qquad (4.10\text{-}3)$$

where the $c_{sit}$ satisfy the restriction that $\sum_s' c_{sit} = 1$ for every i. A more specific solution would again involve the use of combinatorial number theory.

It can readily be seen that the class seven estimator is the most general class, since by suitable suppression of the subscripts on the $\psi_{sit}$, one can reach any of the other classes of estimators. By requiring equality of the $\psi_{sit}$ for all i, t, the unbiasedness requirement (4.10-2) becomes

$$\sum_{s}{}' \psi_s\, P_s = 1$$

which is the same as that for class six (4.9-2). From there one can move to classes three, two and one. By suppressing t, and setting $\psi_{si} = \psi_{s_v i}$, one moves to class five, and from there to classes three and two. Finally, by suppressing s, $\psi_{sit} = \psi_{it}$ and class four is obtained; from which class one can be reached for the case of equal selection probabilities.

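4.10.4 Variance of T7. Again the general expression is the easiest to manipulate for whatever purpose might be at hand, and thus

$$V(T_7) = \sum_{s=1}^{S} P_s \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - T^2 \qquad (4.10\text{-}5)$$

assuming the $\psi_{sit}$ have been determined to produce an unbiased estimator.

An estimator for the variance of T7 would be

$$\hat{V}(T_7) = \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - \hat{T}^2 = \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - \left[ \frac{\sum_{i \in s} x_i^2}{P_s I} + \frac{\sum_{i \ne j} x_i x_j}{P_s L} \right] \qquad (4.10\text{-}6)$$

where $\hat{T}^2$ is as given in Section 4.9.4.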

4.11 Summary of Numerical Examples

In illustrating several of the various estimators derived in this

dissertation, two numerical examples were used. These two four-unit

populations used the same numerical values, but in the second example,

Case B, the numerical values were assigned to the units so as to provide

selection probabilities at least somewhat proportional to size.


The two populations were:

Unit                            A      B      C      D
Selection probability          1/2    1/4    1/8    1/8
Numerical value    Case A       3      4      8      5
                   Case B       8      5      4      3

For all classes of estimators studied, Case B provided better (in

the sense of smaller) results in terms of the variance of the estimator,

the range of the estimates of that variance, and the range of the esti-

mates of the population total. Further, for both cases used as examples,

the estimator given in Class Five as (4.8-5) had the smallest variance

among the unbiased estimators for which variances were determined.

For comparative purposes the results can be summarized as follows:

Case A (random assignment of numerical values to units):

                          Range of estimates
Class    Variance      Totals              Variances
  2        142.7       3.4 to 46.3         19.9 to 513.0
  3       5331.8       2.8 to 1365.3     -135.6 to 11429.0
  4        131.3       6.0 to 64.0         11.1 to 373.8
  5        118.5       6.0 to 64.0
  6       1009.9       1.5 to 256.0

Case B (probabilities somewhat proportional to size):

                          Range of estimates
Class    Variance      Totals              Variances
  2         40.2       8.6 to 30.3         36.0 to 156.3
  3       1601.6       5.1 to 682.7      -180.0 to 3293.0
  4          9.3      16.0 to 32.0          1.8 to 28.4
  5          8.4      16.0 to 32.0
  6        354.6       4.0 to 128.0

5.0 SOME ADDITIONAL COMMENTS ON THE ESTIMATORS

The reader of this dissertation has undoubtedly noticed that some of the weights given for the various classes of estimators are rather formidable in appearance, especially if one is thinking about the computational aspects of producing numerical results. The advent of the large high-speed computers should help negate any reluctance to use a non-self-weighting (i.e., self-weighting meaning equal simple weights) design with "complicated" weights. Another approach to this problem has been proposed by Murthy and Sethi (1961). Starting from the premise that the effort required to produce the multipliers used in the estimator may be prohibitive where a non-self-weighting design is used in a large scale survey, they propose a technique to substitute for the multipliers a very small number of multipliers called "randomized rounded-off multipliers", substituted by a suitable randomizing process, thus reducing the computational burden. They suggest a procedure for determining the values of the randomized rounded-off multipliers which minimizes the increase in the variance of the estimator.

Another item which might be a cause of concern is the possibility that some of the estimators could have negative estimates of the sample variance. In regard to this problem, Koop (1957, ch. 6) gives a very complete discussion of the possibility of and interpretation of negative estimates of the sampling variance, and these remarks will not be repeated here. Also to be noted is that among the various estimators proposed in the various classes, only those estimators proposed in classes one and four have variance estimators which always produce a positive estimate of the variance. The variance estimators in the other classes may or may not produce a positive estimate, depending on the particular sample involved.

Having formulated seven classes of estimators for the case of sampling with replacement, the question might now be raised as to whether any of the estimators can be eliminated from further consideration by virtue of an estimator from another class having a consistently smaller variance. For the general case such comparisons between the variances will involve comparisons between quadratic forms which involve both the variate values of the characteristic(s) under study and the $p_i$ vector, with the $p_i$ values arbitrary subject only to the restriction that $\sum_{i=1}^{N} p_i = 1$.

In general, to get an answer to this question, one would have to be very specific, for the direction of the inequalities, from preliminary considerations, would seem to involve the specific values of N and n under consideration, and also the specific probability vector, $(p_i)$ (or at least its structure), applicable to the problem at hand. Given all these specifications, it would seem that inequalities should exist, but imposing such restrictions does not yield a general answer to the question of "bestness" of any of the estimators posited earlier.

For the restrictive case of equal selection probabilities, the class one estimator can be eliminated from further consideration. Des Raj and Khamis (1958) have shown that this estimator, which is the arithmetic mean of the total sample for the case of equal selection probabilities, has a larger variance than the arithmetic mean of the distinct units observed when sampling with equal selection probabilities and with replacement. This arithmetic mean of the distinct units (i.e., an estimator with weights N/v) belongs to class five rather than class one.
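The comparison attributed to Des Raj and Khamis can be illustrated on a small population. The sketch below is only a numerical check for one assumed case (N = 4 units with values 3, 4, 8, 5, equal probabilities, n = 3), not a proof. The function name `compare` is hypothetical, and both estimators are scaled by N so that they estimate the population total.

```python
from fractions import Fraction
from itertools import product

def compare(x, n):
    """Exact (mean, variance) of N*(mean of all draws) and N*(mean of distinct units)."""
    N = len(x)
    prob = Fraction(1, N ** n)   # every ordered sample is equally likely
    m_all = m_dist = s_all = s_dist = Fraction(0)
    for draws in product(range(N), repeat=n):
        est_all = Fraction(N * sum(x[i] for i in draws), n)
        distinct = set(draws)
        est_dist = Fraction(N * sum(x[i] for i in distinct), len(distinct))
        m_all += prob * est_all
        m_dist += prob * est_dist
        s_all += prob * est_all ** 2
        s_dist += prob * est_dist ** 2
    return (m_all, s_all - m_all ** 2), (m_dist, s_dist - m_dist ** 2)

(all_mean, all_var), (dist_mean, dist_var) = compare([3, 4, 8, 5], n=3)
print(float(all_mean), float(all_var))
print(float(dist_mean), float(dist_var))
```

Both estimators have expectation 20 in this case, and the variance based on the distinct units comes out smaller than the variance based on all draws, in line with the result cited above.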

One additional comparison (inequality) has been "proved" in the literature and it is worthy of comment. Godambe (1960) shows that the estimator given in Section 4.5 as the class two estimator (the sum over the distinct units of the x's divided by their inclusion probabilities) has smaller variance than any member of a class corresponding to class five for some population. This follows, Godambe says, from the following argument: Define a linear estimator $e_s$ as

$$e_s = \sum_{i \in s} \beta_{si}\, x_i$$

where the summation is over the distinct units in the sample. "It is again clear that all the known linear estimates must be particular cases of $e_s$" says Godambe. If $e_s$ is to be unbiased, then $\sum_{s \ni i} \beta_{si} P_s = 1$ for all i. And if $e_s$ is unbiased, its variance is given by

$$V(e_s) = \sum_{i=1}^{N} x_i^2 \sum_{s \ni i} \beta_{si}^2 P_s + \sum_{i \ne j} x_i x_j \sum_{s \ni i,j} \beta_{si} \beta_{sj} P_s - T^2 .$$

Godambe then proves that setting $\beta_{si} = 1/p(i)$ yields an admissible estimate. Here p(i) denotes the probability that the i-th unit is included in the sample (= $P_i^*$ with replacement).

This is done by supposing that, for $i_0 \in s$,

$$x_{i_0} = 1, \qquad x_i = 0 \ \text{for } i \ne i_0 .$$

For these assumptions,

$$V(e') = V\Big( \sum_{i \in s} \beta'_{si}\, x_i \Big) = \sum_{s \ni i_0} \beta'^{\,2}_{s i_0} P_s - 1$$

so that

$$V(e') - V(e) = \sum_{s \ni i_0} \Big( \beta'_{s i_0} - \frac{1}{p(i_0)} \Big)^2 P_s . \qquad {}^{3}$$

His argument runs that since $\big( \beta'_{s i_0} - 1/p(i_0) \big)^2$ is positive with the two components inside the brackets assumed unequal, and with $P_s$ also always positive, the estimator $e = \sum_{i \in s} x_i / p(i)$ is at least as good as any other estimator in the class of unbiased estimators for some special population. The derivation of this inequality rests on the assumption that all elements in a population are zero except the $i_0$-th, which takes the value one. The logical justification for the use of this estimator solely on the basis of its merit from this peculiar restrictive case would seem to be rather shaky.

3 The article, as printed, omitted $P_s$ from this equation. This was corrected on the basis of private correspondence with Dr. Godambe.

Special attention might be called to the effect of the $H_s$-factor in class six. This factor will have the effect of helping correct for a disproportionate number of units in the sample from among those with large probabilities or those with small probabilities (assuming selection probabilities somewhat indicative of size). If a disproportionate preponderance of the smaller (probability) units are drawn, then $H_s$ will be numerically small, and being in the denominator will tend to increase the estimate of the total (or mean) and will correspondingly increase the variance. Consider the situation where a few units received special probabilistic consideration by virtue of their large size, with the bulk of the units being smaller, and having equal probabilities among themselves. Then, if the sample drawn included only the smaller units, the value of $H_s = \prod_{i \in s_v} (N p_i)^{n_i}$ [(4.9-5)] would be less than one and the estimate of the total or mean would be inflated to counteract the absence of a "representative" number of the larger units. Also the estimate of the variance of the total would be inflated to give a truer picture than that given by the essentially equal smaller units.

By the same token, if the sample as drawn included a disproportionate number of the larger units, then the $H_s$-factor would be numerically large (> 1) and the estimate of the total would be deflated, as would the estimate of the variance of the estimator. All in all, it would seem to be a useful inclusion in an estimator.
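The behaviour of the $H_s$-factor described above can be seen on the four-unit example. The sketch below evaluates (4.9-5) and the corresponding form of (4.9-4) for a few samples; the function names are hypothetical and the Case A values are assumed as input.

```python
from collections import Counter
from fractions import Fraction

# Assumed inputs: the four-unit population, Case A values.
p = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
x_case_a = {"A": 3, "B": 4, "C": 8, "D": 5}
N = len(p)

def h_factor(draws):
    """H_s = product over distinct units of (N * p_i) ** n_i, as in (4.9-5)."""
    h = Fraction(1)
    for unit, count in Counter(draws).items():
        h *= (N * p[unit]) ** count
    return h

def t6(draws, x):
    """T6 = N / (n * H_s) * sum of x over all draws, as in (4.9-4)."""
    return Fraction(N, len(draws)) / h_factor(draws) * sum(x[u] for u in draws)

for sample in ("AAA", "ABC", "DDD"):
    print(sample, h_factor(sample), float(t6(sample, x_case_a)))
```

The sample AAA, made up of the high-probability unit only, has $H_s = 8$ and a deflated estimate of 1.5, while DDD has $H_s = 1/8$ and an inflated estimate of 160; for ABC the factor is 1.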


6.0 SUMMARY

6.1 Summary and Conclusions

There is more to the estimation of unknown population values than the making and recording of observations. Nor is one helped much by merely taking a large number of observations. All too often, as a result of insufficient consideration of the basic components of a sampling plan, badly biased sample results have been put forth as reliable simply because the number of units in the sample involved was numerically large. One must note that a large sample is not necessarily a good sample, but it is nearly always an expensive sample.

Relegating cost considerations to the background, but not ignoring them, it has been seen that a sampling plan has five major components:

1. A UNIVERSE: the totality of ultimate units of analysis about which information is desired.

2. The FRAME: a delineation of the sampling units (which may consist of one or more units of analysis).

3. A PROBABILITY SYSTEM: a set of numbers, one for each sampling unit, with values restricted to the range $0 < p_i < 1$ and with their sum over all sampling units in the universe restricted to one, which are in one-to-one correspondence with the particular frame involved. These selection probabilities must be operationally realizable.

4. A SAMPLING PROCEDURE: a scheme which comes operationally from the probability system for determining which particular units constitute the sample.

5. An ESTIMATION PROCEDURE: the result of the logical combination of the observations (obtained through the frame) and the probability system, and also involved with the sampling procedure, for arriving at the desired estimates of the population values of the characteristics under observation.

The first three of these must be completely specified prior to any consideration of the last two. And for each change in the specification of the frame and the probability system, the problem of obtaining an "optimum" sampling procedure and an "optimum" estimator changes.

One can note that both the frame and the probability system can be either simple or complex. In the discussion of estimators in Section 4.0, consideration was restricted to one stage sampling, so that there was a simple frame and simple probability system; however, that does not affect the generality of the above formulation. The frame and the probability system, whether simple, different for each stage of a multi-stage plan, varying over time, etc., still must be specified before one can consider problems of selection of a sampling procedure or an estimator.

Also as a result of this formulation, the statement has been made that a direct comparison between sampling with replacement and sampling without replacement does not have any logical justification. The authors who have considered this question apparently came to the same conclusion, although they did not state it explicitly, because the comparisons actually made were between estimating on the basis of the distinct units and on the basis of the totality of units when sampling with replacement, rather than the stated with versus without comparison.


In addition to considering the non-human components of a sampling plan, consideration also has been given to criteria to be applied in helping determine which of a choice of estimators is optimum. In the literature on sample survey techniques, the criteria that have been used have been, for the most part, those developed for infinite populations and applied to samples from finite populations with the expectation that the degree of relevance is still fairly high. It was seen that the concepts of sufficiency and efficiency (defined in terms of minimum variance for an asymptotically normally distributed estimator) are usually meaningless when applied to samples from finite populations. Asymptotic normality cannot be achieved without resorting to an argument that the size of the fixed finite universe be allowed to approach infinity. Regarding sufficiency, the argument follows principally from the fact that, in the process of sampling, probabilities enter the problem only in connection with the selection of the units to be included, and not in connection with the characteristics of those units which are the objectives of the investigation.

Further, the concept of consistency, when the traditional definition based on convergence in probability is used, does not apply to finite samples from finite populations for the same reasons as given above. However, if one goes back to the first definition given for consistency, when Fisher promulgated the beginnings of estimation theory (1921), there is obtained a definition which seems to be perfectly suitable for finite populations. It is the following:

"A statistic satisfies the criterion of consistency if, when it is calculated from the whole population, it is equal to the required population value."


The two oldest estimation criteria, unbiasedness and minimum variance, which were formulated by Gauss in the early 1800's, are still applicable, both to the infinite and finite populations; however, they are possibly too restrictive to be general criteria.

Thus, in the way of major estimation criteria to be applied to the problem of selecting an optimum estimator to be based on the observations of a finite sample from a finite population, one can require:

(1) that the estimator be consistent, and,

(2) that it have a minimum mean square error

where mean square error equals the sum of the variance and the square of the bias. Many might consider the desideratum to be that the estimator be unbiased and have minimum variance, but, for generality, a better (in some sense) estimator may be obtained if consideration is given to estimators which are consistent, i.e., have a disappearing bias, and consequently might have a minimum mean square error, if such minimum is obtainable. If there is no bias present in the desired estimator, the two sets of criteria are identical.

And finally this dissertation has applied the axiomatic approach to the case of sampling with arbitrary selection probabilities and with replacement of the sampling units before another unit is drawn. It has been seen that the use of axioms in the process of formulation of classes of estimators has produced seven classes of linear unbiased estimators of the population total, with the weights independent of the unit characteristics (thus prohibiting imposition of a minimum variance criterion). Within each class, a condition following from the criterion of unbiasedness has been derived, and possible solutions to that equation have been proposed. To tie the various classes of estimators together it may be noted that from the condition of unbiasedness on the class seven estimator every other estimator can be derived by suitable suppression (assumption of equality, e.g., $r_j = r$ when j is suppressed) of the subscripts which denote conditions on the weights. From class six, one can go to classes one, three and two; and so forth. The possible directions of movement are indicated by the arrows in the following diagram:

In considering the variances of the estimators given in Sections 4.4 to 4.10, the class one estimator has been shown to be inferior to an estimator belonging to class five, to wit:

$$V\Big( \frac{N}{v} \sum_{i \in s_v} x_i \Big) < V\Big( \frac{N}{n} \sum_{t=1}^{n} x_{it} \Big) \quad \text{for } n > 2 .$$

However, as class one is so restricted, this comparison is also restricted to the case of equal selection probabilities. No such statement can be made concerning the general case of arbitrary selection probabilities.


Otherwise, the choice of which of the various estimators to use

will depend on the specific circumstances of the sample to be drawn,

including the choice of the probability system, and possible outside

considerations which will dictate the combination of restrictions to be

applied to the choice of weights for the selected units.

6.2 Recommendations for Future Research

A major objective of this dissertation has been to raise the point of view that the whole area of sample survey theory needs a theory of estimation or a set of estimation criteria derived for and applied to finite samples from finite populations. The field of sample survey theory should not have to rely on ready-made concepts derived for infinite populations, which, when applied to finite populations, have to rely on ideas such as letting both the sample and the population approach an infinite size.

If such a theory is developed, it will of necessity mean more

emphasis on combinatorial theory in the study of and development of

sample survey theory.

Another area of possible additional research would be the intermediate area between sampling with replacement, which was a subject of this dissertation, and sampling without replacement, which was the subject of the dissertation by Koop (1957). One might have a situation where

replacement of one or more units occurred simultaneously after a given

number of units has been drawn without replacement. Or one might

postulate a sampling scheme where the decision as to whether to replace

a given unit is arbitrary (e.g., the unit might die before replacement

could be effected) or is determined in a systematic or probabilistic

manner.

With the advent of bigger and faster computers, empirical sampling might be done to investigate the relative efficiencies of the estimators proposed here for the case of sampling with replacement. Another topic for investigation along these lines would be the possible dependence of the relative efficiencies on the structure of the selection probability vector.
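
As a minimal sketch of what such an empirical sampling study might look like, the following Python fragment draws repeated samples with replacement and compares the empirical variances of two estimators of the population mean: the mean over all draws and the mean over the distinct units only (an estimator of the kind discussed by Basu, 1958). The function names, the population values, the seed and the number of repetitions are illustrative assumptions; the two estimators are stand-ins and are not claimed to coincide with any particular class defined in Section 4.

```python
import random
from statistics import fmean, pvariance


def draw_with_replacement(N, n, p, rng):
    """Return n unit indices drawn with replacement using selection probabilities p."""
    return rng.choices(range(N), weights=p, k=n)


def mean_of_all_draws(y, sample):
    # Every draw counts, so a unit drawn twice is weighted twice.
    return fmean(y[i] for i in sample)


def mean_of_distinct_units(y, sample):
    # Each distinct unit counts once, however often it was drawn.
    return fmean(y[i] for i in set(sample))


def empirical_variances(y, n, reps=20000, seed=1961):
    """Monte Carlo variances of the two estimators (equal probabilities assumed)."""
    rng = random.Random(seed)
    N = len(y)
    p = [1.0 / N] * N  # hypothetical choice; replace to study unequal probabilities
    all_draws, distinct = [], []
    for _ in range(reps):
        sample = draw_with_replacement(N, n, p, rng)
        all_draws.append(mean_of_all_draws(y, sample))
        distinct.append(mean_of_distinct_units(y, sample))
    return pvariance(all_draws), pvariance(distinct)


# Illustrative population of six units and samples of four draws.
y = [2, 5, 7, 11, 13, 20]
v_all, v_distinct = empirical_variances(y, n=4)
print("relative efficiency (distinct vs. all draws):", v_all / v_distinct)
```

Changing the vector `p` inside `empirical_variances` would be one way to explore how the relative efficiencies depend on the structure of the selection probability vector, although the unweighted means used here are appropriate only for the equal probability case.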

And finally, this dissertation dealt mostly with unbiased linear

estimators. With large computers available for computation, it is

undoubtedly desirable to modify the "best linear unbiased" criterion to

include consideration of estimators that are nonlinear and consistent,

but would have a smaller mean square error than the best linear unbiased

estimators.

7.0 LIST OF REFERENCES

Basu, D. 1958. On sampling with and without replacement. Sankhya 20:287-294.

Bowley, A. L. 1926. Measurement of the precision attained in sampling. Bull. Inst. Inter. Stat. Tome XXII, 1ère Livraison: (1)-(62).

Carmichael, R. D. 1937. Introduction to the Theory of Groups of Finite Order. Dover Publications, Inc., New York (reprinted 1956).

Chrystal, G. 1900. Algebra, Part II. Dover Publications, Inc., New York (reprinted 1961).

Cochran, W. G. 1946. Relative accuracy of systematic and stratified random samples for a certain class of populations. Ann. Math. Stat. 17: 164-177.

Cochran, W. G. 1953. Sampling Techniques. John Wiley and Sons, Inc., New York.

Das, A. C. 1951. On two-phase sampling and sampling with varying probabilities. Bull. Inst. Inter. Stat. 33: 105-112.

Deming, W. E. 1960. Sample Design in Business Research. John Wiley and Sons, Inc., New York.

Edgeworth, F. Y. 1918. On the value of a mean as calculated from a sample. J. Roy. Stat. Soc. 81: 624-632.

Feller, W. 1957. An Introduction to Probability Theory and its Applications, Vol. I, 2nd edn. John Wiley and Sons, Inc., New York.

Fisher, R. A. 1921. On the mathematical foundations of theoretical statistics. Phil. Trans. Roy. Soc. London Ser. A 222: 309-368.

Fisher, R. A. 1925. Theory of statistical estimation. Proc. Cambridge Phil. Soc. 22: 700-725.

Fisher, R. A. 1956. Statistical Methods and Scientific Inference. Hafner Publishing Co., New York.

Fisher, R. A., and Yates, F. 1949. Statistical Tables, 3rd edn. Oliver and Boyd, Ltd., London.

Fraser, D. A. S. 1958. Statistics: An Introduction. John Wiley and Sons, Inc., New York.

Godambe, V. P. 1955. A unified theory of sampling from finite populations. J. Roy. Stat. Soc. Ser. B 17: 269-278.

Godambe, V. P. 1960. An admissible estimate for any sampling design. Sankhya 22: 285-288.

Hansen, M. H. and Hurwitz, W. N. 1943. On the theory of sampling from finite populations. Ann. Math. Stat. 14: 333-362.

Hansen, M. H., Hurwitz, W. N. and Madow, W. G. 1953. Sample Survey Methods and Theory, Vol. II. John Wiley and Sons, Inc., New York.

Horvitz, D. G. and Thompson, D. J. 1952. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47: 663-685.

Isserlis, L. 1916. On the conditions under which the "probable errors" of frequency distributions have a real significance. Proc. Roy. Soc. (London) Ser. A 92: 23-41.

Isserlis, L. 1918. On the value of a mean as calculated from a sample. J. Roy. Stat. Soc. 81: 75-81.

Koop, J. C. 1957. Contributions to the general theory of sampling finite populations without replacement and with unequal probabilities. Unpublished Ph.D. Thesis, North Carolina State College, Raleigh (University Microfilms, Ann Arbor).

Koop, J. C. 1960. On theoretical questions underlying the technique of replicated or interpenetrating samples. Proc. Social Stat. Sect., Am. Stat. Assoc. 1960: 196-205.

Madow, W. G. 1948. On the limiting distribution of estimates based on samples from finite universes. Ann. Math. Stat. 19: 535-545.

Midzuno, H. 1950. An outline of the theory of sampling systems. Ann. Inst. Stat. Math. 1: 149-156.

Mortara, G. 1917. Elementi di statistica. Appunti sulle lexioni di statistica methodologica dettate nel R. Instituto Superiore di studi comerciali di Roma. Rome. p. 356. As cited by Tschuprow (1923).

Murthy, M. N. and Sethi, V. K. 1961. Randomized rounded-off multipliers in sampling theory. J. Am. Stat. Assoc. 56: 328-334.

Nanjamma, N. S., Murthy, M. N. and Sethi, V. K. 1959. Some sampling systems providing unbiased ratio estimators. Sankhya 21: 299-314.

Neyman, J. 1934. On two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. Roy. Stat. Soc. 97: 558-606.

Neyman, J. 1952. Lectures and Conferences on Mathematical Statistics and Probability. Graduate School, U. S. Dept. Agr., Washington, D. C.

Raj, Des. 1958. On the relative accuracy of some sampling techniques. J. Am. Stat. Assoc. 53: 98-101.

Raj, Des and Khamis, S. H. 1958. Some remarks on sampling with replacement. Ann. Math. Stat. 29: 550-557.

Riordan, J. 1958. An Introduction to Combinatorial Analysis. John Wiley and Sons, Inc., New York.

Roy, J. and Chakravarti, I. M. 1960. Estimating the mean of a finite population. Ann. Math. Stat. 31: 392-398.

Seng, Y. P. 1951. Historical survey of the development of sampling theories and practice. J. Roy. Stat. Soc. Ser. A 114: 214-231.

Splawa-Neyman, J. 1925. Contributions to the theory of small samples drawn from a finite population. Biometrika 17: 472-479.

Stevens, W. L. 1937. Significance of grouping. Ann. Eugenics 8: 57-69.

Stevens, W. L. 1958. Sampling without replacement with probability proportional to size. J. Roy. Stat. Soc. Ser. B 20: 393-397.

Sukhatme, P. V. 1953. Sampling Theory of Surveys with Applications. The Indian Society of Agricultural Statistics, New Delhi, India, and The Iowa State College Press, Ames, Iowa.

Sukhatme, P. V. and Narain, R. D. 1952. Sampling with replacement. J. Indian Soc. Agr. Stat. 4: 42-49.

Tschuprow, A. A. 1923. On the mathematical expectation of the moments of frequency distributions in the case of correlated observations. Metron 2(3): 461-493 and 2(4): 646-683.

Whittaker, E. and Robinson, G. 1944. The Calculus of Observations, 4th edn. Blackie and Son, Ltd., London.

Wilks, S. S. 1960. A two-stage scheme for sampling without replacement. Bull. Inst. Inter. Stat. 21(2): 241-248.

Wu-min. 1958. Two ways of compiling statistics. Jenmin jih pao. Peking, China. April 29: 1, 4.

Yates, F. 1953. Sampling Methods for Censuses and Surveys, 2nd edn. Hafner Publishing Co., New York.

Yezhov, A. 1957. Soviet Statistics. Foreign Language Publishing House, Moscow.

Zarkovic, S. S. 1956. Note on the history of sampling methods in Russia. J. Roy. Stat. Soc. Ser. A 119: 336-338.

Zarkovic, S. S. 1960. On the efficiency of sampling with varying probabilities and the selection of units with replacement. Metrika 3: 53-60.

APPENDICES

8.0 APPENDIX A

THE DISTRIBUTION OF THE NUMBER OF DISTINCT UNITS IN THE SAMPLE

Let v denote the number of distinct units appearing in a sample of size n drawn from a finite population of size N with replacement of each unit drawn preceding the next draw. Then it is readily apparent that v is a random variable (1 ≤ v ≤ n) with a distribution dependent on n and N. Although not all the results of this appendix are used in the body of this dissertation, the use of generating functions in this field is of interest.

8.1 Equal Selection Probability Case

Let the probabilities of selection be equal for each of the N units (i.e., p_i = p = 1/N); then an analogy may be drawn between the distribution of v and that of the number of empty cells when r balls are randomly distributed among n cells. This classic "occupancy problem" yields the following formula, as given by Feller (1957, p. 92), for the probability of having m cells empty when placing r objects into n cells:

$$P[m] = \Pr(m \text{ cells empty}) = \binom{n}{m} \sum_{s=0}^{n-m} (-1)^{s} \binom{n-m}{s} \left(1 - \frac{m+s}{n}\right)^{r} \tag{8-1}$$

To apply this formula to the distribution of v, note that

Pr(v distinct units in n draws) = Pr(N - v units not drawn, or "empty").
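
The correspondence between empty cells and undrawn units can be checked numerically. The sketch below evaluates the classical occupancy formula in exact rational arithmetic and then maps it onto the distribution of v by treating the N population units as cells and the n draws as balls. The function names are hypothetical, and the example values N = 6 and n = 4 are chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def prob_empty_cells(m, r, n):
    """Pr(exactly m of n cells are empty) after r balls are placed at random."""
    total = Fraction(0)
    for s in range(n - m + 1):
        # (n - m - s) / n is the same quantity as 1 - (m + s) / n.
        total += (-1) ** s * comb(n - m, s) * Fraction(n - m - s, n) ** r
    return comb(n, m) * total


def prob_distinct_units(v, n_draws, N):
    """Pr(v distinct units) in n_draws equal-probability draws from N units."""
    # v distinct units drawn is the same event as N - v units left "empty".
    return prob_empty_cells(N - v, n_draws, N)


# Illustrative case: N = 6 units, n = 4 draws with replacement.
for v in range(1, 5):
    print(v, prob_distinct_units(v, 4, 6))
```

Because the arithmetic is exact, the probabilities can be summed over v to confirm that they total one.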

Setting, in (8-1),

n = N, m = N - v, r = n

and reversing the order of summation (replacing s by v - s) gives:

$$p(v) = \binom{N}{N-v} \sum_{s=0}^{v} (-1)^{v-s} \binom{v}{v-s} \left(1 - \frac{(N-v)+(v-s)}{N}\right)^{n} \tag{8-2}$$

Using the "differences of zero" notation, (8-2) may be written in a more elegant form as

$$p(v) = N^{-n} \binom{N}{v} \Delta^{v} 0^{n} \tag{8-3}$$

where Δ is the usual finite difference operator with unit increment and

$$\Delta^{v} 0^{n} = \left[\Delta^{v} x^{n}\right]_{x=0} = \sum_{s=0}^{v} (-1)^{v-s} \binom{v}{s} s^{n}.$$

Note that Δ^s 0^n = 0 for s = 0 and for s > n.

From (8-3) the probability generating function of v can be obtained as

$$P_v(t) = \sum_{v=0}^{N} p(v)\, t^{v} = N^{-n} (1 + t\Delta)^{N}\, 0^{n} \tag{8-4}$$

Further, the factorial generating function is readily obtained from the probability generating function by substituting (1 + t) for t in (8-4), to wit

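$$F_v(t) = N^{-n} (\mathcal{E} + t\Delta)^{N}\, 0^{n} \tag{8-5}$$

where $\mathcal{E} = 1 + \Delta$ is the usual increment operator with unit increments. Using F_v(t), (8-5), one can readily compute E(v) and E[v(v-1)], and from these the mean and variance of v are easily obtained as follows:

$$E(v) = \left.\frac{dF_v(t)}{dt}\right|_{t=0} = N^{1-n}\, \Delta\, \mathcal{E}^{N-1} 0^{n} = N^{1-n}\left[N^{n} - (N-1)^{n}\right] \tag{8-6}$$

Since the variance of v is

$$V(v) = E[v(v-1)] + E(v) - [E(v)]^{2} \tag{8-7}$$

first determine: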

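$$E[v(v-1)] = \left.\frac{d^{2}F_v(t)}{dt^{2}}\right|_{t=0} = N^{-n} N(N-1)\left[N^{n} - 2(N-1)^{n} + (N-2)^{n}\right] \tag{8-8}$$

and then, by substituting this in (8-7), obtain

$$\begin{aligned} V(v) &= N^{-n} N(N-1)\left[N^{n} - 2(N-1)^{n} + (N-2)^{n}\right] + N^{1-n}\left[N^{n} - (N-1)^{n}\right] \\ &\qquad - N^{2-2n}\left[N^{2n} - 2N^{n}(N-1)^{n} + (N-1)^{2n}\right] \\ &= N^{-n} N (N-1)^{n} + N^{-n} N(N-1)(N-2)^{n} - N^{2-2n}(N-1)^{2n} \\ &= N\left(\frac{N-1}{N}\right)^{n} - N^{2}\left(\frac{N-1}{N}\right)^{2n} + N(N-1)\left(\frac{N-2}{N}\right)^{n}. \end{aligned} \tag{8-9}$$

Also, E(v²) can be seen to be:

$$\begin{aligned} E(v^{2}) &= E[v(v-1)] + E(v) \\ &= N^{2-n}\left[N^{n} - 2(N-1)^{n} + (N-2)^{n}\right] + N^{1-n}\left[(N-1)^{n} - (N-2)^{n}\right]. \end{aligned} \tag{8-10}$$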
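
The closed forms for the mean and variance of v in the equal probability case can be checked against the exact distribution. The following sketch computes the differences of zero directly from their defining sum, builds p(v), and compares the resulting moments with the closed-form expressions. All function names are illustrative assumptions, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from math import comb


def difference_of_zero(v, n):
    """Delta^v 0^n, computed as sum over s of (-1)^(v-s) C(v, s) s^n."""
    return sum((-1) ** (v - s) * comb(v, s) * s ** n for s in range(v + 1))


def distribution_of_v(n, N):
    """Exact p(v) for v = 0, ..., N in n equal-probability draws from N units."""
    return [Fraction(comb(N, v) * difference_of_zero(v, n), N ** n)
            for v in range(N + 1)]


def moments_closed_form(n, N):
    """Mean and variance of v from the closed-form expressions."""
    a = Fraction(N - 1, N) ** n
    b = Fraction(N - 2, N) ** n
    mean = N * (1 - a)
    variance = N * a - N ** 2 * a ** 2 + N * (N - 1) * b
    return mean, variance


def moments_from_distribution(n, N):
    """Mean and variance of v obtained by summing over the exact distribution."""
    p = distribution_of_v(n, N)
    mean = sum(v * pv for v, pv in enumerate(p))
    second = sum(v * v * pv for v, pv in enumerate(p))
    return mean, second - mean ** 2


# Illustrative case: n = 4 draws from N = 6 units.
print(moments_closed_form(4, 6))
print(moments_from_distribution(4, 6))
```

The two printed pairs should agree exactly for any n ≥ 1 and N ≥ 2, which gives a simple independent check on the algebra leading to (8-9).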

8.2 Arbitrary Selection Probability Case

With arbitrary, or unequal, selection probabilities, the analogy with the "occupancy problem" disappears, and the distribution of v becomes rather messy. One can, however, obtain expressions for the mean and variance of v without first obtaining the distribution of v.

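Let the characteristic random variable be defined as

$$z_i = \begin{cases} 1 & \text{if the } i\text{-th unit is drawn, regardless of the number of times it appears in the sample,} \\ 0 & \text{if the } i\text{-th unit is not drawn.} \end{cases}$$

Also denote the probability of the i-th unit being drawn on any given draw as p_i, with $\sum_{i=1}^{N} p_i = 1$.

Then, on n draws with replacement, the probability that the i-th unit is not drawn, i.e., that z_i takes the value zero, is:

$$\Pr(z_i = 0) = q_i^{n}, \quad \text{where } q_i = 1 - p_i,$$

from which it follows that

$$\Pr(z_i = 1) = 1 - q_i^{n}.$$

Thus the expectation of z_i is seen to be

$$E(z_i) = (1)(1 - q_i^{n}) + (0)(q_i^{n}) = 1 - q_i^{n}. \tag{8-11}$$

Now since the number of distinct units equals the sum over all units in the population of the characteristic random variable, then

$$E(v) = \sum_{i=1}^{N} E(z_i) = \sum_{i=1}^{N} (1 - q_i^{n}) = N - \sum_{i=1}^{N} (1 - p_i)^{n}. \tag{8-12}$$

This approach also yields

$$V(v) = V\!\left(\sum_{i=1}^{N} z_i\right) = \sum_{i=1}^{N} V(z_i) + \sum_{i \ne j}^{N} \operatorname{Cov}(z_i, z_j).$$

Now:

$$V(z_i) = E(z_i^{2}) - [E(z_i)]^{2} = (1 - q_i^{n}) - (1 - q_i^{n})^{2} = q_i^{n}(1 - q_i^{n}) \tag{8-13}$$

$$\begin{aligned} \operatorname{Cov}(z_i, z_j) &= E(z_i z_j) - E(z_i)E(z_j) \\ &= (1 - q_i^{n} - q_j^{n} + q_{ij}^{n}) - (1 - q_i^{n})(1 - q_j^{n}) \\ &= -(q_i^{n} q_j^{n} - q_{ij}^{n}), \end{aligned} \tag{8-14}$$

where q_ij = 1 - p_i - p_j, so that, using (8-13) and (8-14), the variance of v is

$$V(v) = \sum_{i=1}^{N} q_i^{n} - \left(\sum_{i=1}^{N} q_i^{n}\right)^{2} + \sum_{i \ne j}^{N} q_{ij}^{n}. \tag{8-15}$$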

These results have also been derived by Basu (1958), but in a very

compressed form.
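
For a small population the mean and variance of v under arbitrary selection probabilities can be verified by listing every possible ordered sample. The sketch below does this in exact rational arithmetic; the function names and the probability vector (1/2, 3/10, 1/5) with n = 3 draws are assumptions made only for illustration. The first function uses the expressions in terms of q_i = 1 - p_i and q_ij = 1 - p_i - p_j, and the second enumerates all N^n samples.

```python
from fractions import Fraction
from itertools import product


def moments_of_v(p, n):
    """E(v) and V(v) from the q_i and q_ij expressions, for probabilities p."""
    N = len(p)
    sum_q = sum((1 - p_i) ** n for p_i in p)
    # The sum over i != j runs over ordered pairs, so each pair appears twice.
    sum_q_pairs = sum((1 - p[i] - p[j]) ** n
                      for i in range(N) for j in range(N) if i != j)
    mean = N - sum_q
    variance = sum_q - sum_q ** 2 + sum_q_pairs
    return mean, variance


def moments_by_enumeration(p, n):
    """E(v) and V(v) by enumerating every ordered sample of n draws."""
    N = len(p)
    mean = Fraction(0)
    second = Fraction(0)
    for sample in product(range(N), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= p[i]
        v = len(set(sample))  # number of distinct units in this sample
        mean += v * prob
        second += v * v * prob
    return mean, second - mean ** 2


# Hypothetical selection probabilities for N = 3 units, n = 3 draws.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
print(moments_of_v(p, 3))
print(moments_by_enumeration(p, 3))
```

Enumeration grows as N^n and is practical only for very small cases, but it gives a direct check that does not rely on the indicator-variable argument.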

9.0 APPENDIX B

A STATISTICAL THEORY OF COMMUNISM

The following is a translation of an article, Wu-min (1958), which appeared in Jenmin jih pao, the official party newspaper in Communist China, on April 29, 1958. The government was, at the time, having considerable difficulty explaining to the world the discrepancy between the actual production figures for some crops, and the stated objectives of the five-year plan then in effect.

It is reproduced as an illustration of the reaction that can occur

when scientific principles do not produce politically desired results.

The moral, however, is not that samplers should pay sole attention to statistical theory and methodology at the expense of political considerations when formulating the problem, but that the desideratum is samplers who observe considerations of the subject under study and the national goals which may be involved, and still retain complete objectivity in compiling, analysing and reporting the sample data.

Two Ways of Compiling Statistics

In reading the report on "Speeding up production by using statistics in Ho-Pei Province," we see that there are two ways of compiling and using statistics. One is static and isolated, the other is imposing and integrated. Statisticians in the past, under metaphysical philosophy, adhered too closely to regulations and forms, claiming that statistical workers should assume an extremely detached and cool attitude. But at the height of our national leap forward in agricultural and

industrial production we cannot stand still; we must march forward with the mass of the people. The State Statistical Bureau has made a thorough investigation of past policies and found the following shortcomings:

1. Too much stress on textbooks, report forms, neglecting political responsibility, observing the rules to each title and letter. Doing nothing beyond this. A new and improved system of statistics introduced in Ho-Pei Province has been used with highly effective results. Statistical workers of the old school, visiting Ho-Pei, have doubted these results because the new methods cannot be found in their textbooks.

The value of new methods and experience must be judged by their

contributions to the national welfare and socialist construction. We

must be materialistic and follow the principle of actuality. Most of

the Chinese texts on statistical methods were translated or compiled

from foreign books. No books have yet been written with creative genius

based upon actual experience in China. Therefore government agencies

dealing with statistics and colleges giving statistical courses should

accept the responsibility of accumulating experience in China and compiling our own textbooks.

2. Too much emphasis on official formulae, which disregard theories and politics; seeking only concrete figures, forgetting the spirit

of the times. Statisticians should first learn the statistical theory

of Marxism and Leninism, then respond to the demand that China be guided

by those principles and establish its own method of statistics.

3. Too much mystification and self-consciousness among statisticians, who insist that this work must be done only by specialists. Thus, they depend only upon their own workers, having no confidence in other people, and refuse guidance by the government or the party. We must cooperate with the local population and participate in their production efforts. This is the lesson demonstrated by the experiment in Ho-Pei Province.

4. Too much reliance on official rules and procedures; seeking only figures, disregarding people. Too much preoccupation with writing reports and filling forms, to the neglect of positive, creative and progressive work. Statistical workers in Ho-Pei Province have adopted entirely different methods, which can be summarized under the following three points:

1) Related their statistical records with the major activities of the party and the productive labor of the people. This makes statistics the motive power and guiding force in the national production leap forward.

2) Maintained political consciousness and guiding principles, without holding too rigidly to absolute figures, prescribed procedures and forms, which would waste time.

3) Relied on local authorities and the mass of the people to bring results and overcome obstacles. In this way Ho-Pei statistics are based upon actual conditions and the accuracy of sources can be guaranteed.

The new statistical methods in Ho-Pei have created new experience, new

trends and directives in statistics. It is worthwhile, therefore, to

recommend these methods to all statistical workers in China, and hope

that they will pay special attention to adapting concrete methods to

suit local conditions and purposes.

INSTITUTE OF STATISTICS

NORTH CAROLINA STATE COLLEGE

(Mimeo Series available for distribution at cost)

265. Eicker, Friedhelm. Consistency of parameter-estimates in a linear time-series model. October, 1960.

266. Eicker, Friedhelm. A necessary and sufficient condition for consistency of the LS estimates in linear regression. October, 1960.

267. Smith, W. L. On some general renewal theorems for nonidentically distributed variables. October, 1960.

268. Duncan, D. B. Bayes rules for a common multiple comparisons problem and related Student-t problems. November, 1960.

269. Bose, R. C. Theorems in the additive theory of numbers. November, 1960.

270. Cooper, Dale and D. D. Mason. Available soil moisture as a stochastic process. December, 1960.

271. Eicker, Friedhelm. Central limit theorem and consistency in linear regression. December, 1960.

272. Rigney, Jackson A. The cooperative organization in wildlife statistics. Presented at the 14th Annual Meeting, Southeastern Association of Game and Fish Commissioners, Biloxi, Mississippi, October 23-26, 1960. Published in Mimeo Series, January, 1961.

273. Schutzenberger, M. P. On the definition of a certain class of automata. January, 1961.

274. Roy, S. N. and J. N. Srivastava. Inference on treatment effects and design of experiments in relation to such inferences. January, 1961.

275. Ray-Chaudhuri, D. K. An algorithm for a minimum cover of an abstract complex. February, 1961.

276. Lehman, E. H., Jr. and R. L. Anderson. Estimation of the scale parameter in the Weibull distribution using samples censored by time and by number of failures. March, 1961.

277. Hotelling, Harold. The behavior of some standard statistical tests under non-standard conditions. February, 1961.

278. Foata, Dominique. On the construction of Bose-Chaudhuri matrices with help of Abelian group characters. February, 1961.

279. Eicker, Friedhelm. Central limit theorem for sums over sets of random variables. February, 1961.

280. Bland, R. P. A minimum average risk solution for the problem of choosing the largest mean. March, 1961.

281. Williams, J. S., S. N. Roy and C. C. Cockerham. An evaluation of the worth of some selected indices. May, 1961.

282. Roy, S. N. and R. Gnanadesikan. Equality of two dispersion matrices against alternatives of intermediate specificity. April, 1961.

283. Schutzenberger, M. P. On the recurrence of patterns. April, 1961.

284. Bose, R. C. and I. M. Chakravarti. A coding problem arising in the transmission of numerical data. April, 1961.

285. Patel, M. S. Investigations on factorial designs. May, 1961.

286. Bishir, J. W. Two problems in the theory of stochastic branching processes. May, 1961.

287. Konsler, T. R. A quantitative analysis of the growth and regrowth of a forage crop. May, 1961.

288. Zaki, R. M. and R. L. Anderson. Applications of linear programming techniques to some problems of production planning over time. May, 1961.

289. Schutzenberger, M. P. A remark on finite transducers. June, 1961.

290. Schutzenberger, M. P. On the equation a^{2+n} = b^{2+m} c^{2+p} in a free group. June, 1961.

291. Schutzenberger, M. P. On a special class of recurrent events. June, 1961.

292. Bhattacharya, P. K. Some properties of the least square estimator in regression analysis when the 'independent' variables are stochastic. June, 1961.

293. Murthy, V. K. On the general renewal process. June, 1961.

294. Ray-Chaudhuri, D. K. Application of geometry of quadrics of constructing PBIB designs. June, 1961.

295. Bose, R. C. Ternary error correcting codes and fractionally replicated designs. May, 1961.

296. Koop, J. C. Contributions to the general theory of sampling finite populations without replacement and with unequal probabilities. September, 1961.

297. Foradori, G. T. Some non-response sampling theory for two stage designs. Ph.D. Thesis. November, 1961.

298. Mallios, W. S. Some aspects of linear regression systems. Ph.D. Thesis. November, 1961.

299. Taeuber, R. C. On sampling with replacement: an axiomatic approach. Ph.D. Thesis. November, 1961.

300. Gross, A. J. On the construction of burst error correcting codes. August, 1961.

301. Srivastava, J. N. Contribution to the construction and analysis of designs. August, 1961.

302. Hoeffding, Wassily. The strong laws of large numbers for u-statistics. August, 1961.

303. Roy, S. N. Some recent results in normal multivariate confidence bounds. August, 1961.

304. Roy, S. N. Some remarks on normal multivariate analysis of variance. August, 1961.

305. Smith, W. L. A necessary and sufficient condition for the convergence of the renewal density. August, 1961.

306. Smith, W. L. A note on characteristic functions which vanish identically in an interval. September, 1961.

307. Fukushima, Kozo. A comparison of sequential tests for the Poisson parameter. September, 1961.

308. Hall, W. J. Some sequential analogs of Stein's two-stage test. September, 1961.

309. Bhattacharya, P. K. Use of concomitant measurements in the design and analysis of experiments. November, 1961.
