ON SAMPLING WITH REPLACEMENT: AN AXIOMATIC APPROACH
by
RICHARD CONRAD TAEUBER
Institute of Statistics
Mimeo Series No. 299
October, 1961
TABLE OF CONTENTS
1.0 INTRODUCTION
2.0 REVIEW OF LITERATURE
3.0 ON THE BASIC CRITERIA FOR A THEORY OF SAMPLING
    3.1 Components of the Sampling Problem
    3.2 On the Question of Sampling with or without Replacement
    3.3 The Applicability of Traditional Estimation Criteria
    3.4 Criteria for Estimators from Finite Populations
4.0 THE GENERAL CLASSES OF LINEAR ESTIMATORS FOR SAMPLING WITH REPLACEMENT
    4.1 Introductory Remarks
    4.2 Probability System and Notation
    4.3 Some Combinatorial Considerations
    4.4 Class One Estimators
    4.5 Class Two Estimators
    4.6 Class Three Estimators
    4.7 Class Four Estimators
    4.8 Class Five Estimators
    4.9 Class Six Estimators
    4.10 Class Seven Estimators
    4.11 Summary of Numerical Examples
5.0 SOME ADDITIONAL COMMENTS ON THE ESTIMATORS
6.0 SUMMARY
    6.1 Summary and Conclusions
    6.2 Suggestions for Future Research
7.0 LIST OF REFERENCES
8.0 APPENDIX A. THE DISTRIBUTION OF THE NUMBER OF DISTINCT UNITS IN THE SAMPLE
    8.1 Equal Selection Probability Case
    8.2 Arbitrary Selection Probability Case
9.0 APPENDIX B. A STATISTICAL THEORY OF COMMUNISM
1.0 INTRODUCTION
Sample survey procedures are among the most valuable and powerful
tools at the command of a statistician. Improperly or carelessly used,
they can be exceedingly dangerous. To get useful results from a
sample survey, one must heed the importance of the logical planning of
all steps in an investigation. Not only is there the problem of how best
to select the sample, but there are also the problems of how to obtain
an estimate of the desired population value and what measure of
reliability to attach to that estimate. All of this makes it inevitable
that certain assumptions regarding the unknown population will be
necessary. It is here that finite population sampling differs considerably
from procedures for drawing samples from an infinite population. When
sampling from finite populations, the only assumptions made concern such
things as the existence, identifiability and availability of the sampling
units and the probability construct chosen for the selection of the
sampling units. Sample survey theory makes no assumptions concerning the
abstract distribution of the variables (characteristics) under study.
Even after the unknown number of centuries that man has been
drawing samples and acting on the information that they provide him,
there still has not been developed a general theory with practical
applicability which will universally indicate to the sampler a "best"
(in some sense) system for selecting the sampling units to be observed
and at the same time indicate a "best" estimating procedure by which to
glean the information provided by the sample. (The term "best" will
usually connote minimum variance, but this is not the only requirement
for bestness which one might impose. For certain restrictive particular
cases "best" systems and estimators are known, e.g., the arithmetic mean
of the observations is the minimum variance estimator when sampling
without replacement and with equal selection probabilities.) However,
with some reflection, it is easy to see why no such perfect theory has
been developed for sample surveys, for the method that any given sampler
adopts is very dependent on the nature of the material which is
available or can be obtained, and the assumptions necessary to utilize that
material to which he can gain access. In spite of this absence of a
"general theory" for sample surveys, some progress has been made in the
formulation of improved sampling systems and estimating procedures which
will give "better" results.
Although sampling has, no doubt, been applied for centuries, the study
of the theory behind sample surveys (at least that which was published)
dates from 1713 and the appearance of Bernoulli's Ars Conjectandi. In the
two centuries following the appearance of this work, little was published
by anyone other than Poisson and Lexis. Beginning in 1916, many authors
have published various considerations of aspects of sample survey
theory, especially as applied to the drawing and evaluating of samples from
finite populations.
Modern developments in the field of sampling finite populations are
usually said to stem from the paper by Neyman (1934), which was the last
major paper to give much consideration to purposive selection of the
sample units, as contrasted with probabilistic selection, and which
pointed the way to more "scientific" lines of development of the art.
Nine years later Hansen and Hurwitz (1943) stimulated the consideration
of drawing samples with unequal or arbitrary selection probabilities by
using the idea of probability proportional to size. Midzuno (1950)
formalized the general approach of arbitrary probabilities by introducing
the concept of a probability field into such studies. As he said:
"there is no need of equal probability for every element when we
construct the probability field, isn't it?"
It was not until 1952 that the first attempt at formulating
general classes of estimators for samples from finite populations was
published by Horvitz and Thompson (1952). But Horvitz and Thompson did
not recognize the deductive approach of their own work, and so merely
stated three of the possible classes of estimators.
It remained for Koop (1957) to formalize the approach to the
formation of classes of linear estimators. The formation of seven
classes of linear estimators, for the case of sampling with unequal
(general) selection probabilities and without replacement of the sampled
units, was based on three axioms which are descriptions of physical
realities. This approach to the formulation of classes of estimators,
based on the way things actually happen with the associated
probabilities, would seem, for finite populations, much more fundamental than
one based on classical estimation criteria. In fact, the notion that
there are criteria for which one can develop classes of estimators is
not germane to sample surveys. In sample survey theory, one first
develops classes of estimators, then applies criteria, such as
unbiasedness or minimum variance, to attempt a determination of bestness
within each class.
Another problem which has been under discussion recently in the
literature of sample surveys is the question of whether one should
sample with or without replacement. It is argued that with equal total
sample size, there is no question but that one should sample without
replacement, by virtue of the fact that the variance of the mean is
smaller. But when cost is figured in as a consideration in the decision
process, the comparison becomes clouded, for the costs of sampling with
replacement depend on the number of distinct units in the sample rather
than on the total sample size.
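The two sides of this comparison can be illustrated with a short sketch. It assumes equal selection probabilities at every draw and uses the standard results for that case: the variance of the sample mean is sigma^2/n with replacement and sigma^2 (N-n)/[n(N-1)] without replacement (sigma^2 taken with divisor N), and the expected number of distinct units in n draws with replacement is N[1 - (1 - 1/N)^n]. The population values and the function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def population_variance(values):
    """Variance of the finite population, using divisor N."""
    N = len(values)
    mean = Fraction(sum(values), N)
    return sum((v - mean) ** 2 for v in values) / N


def variance_with_replacement(values, n):
    """Variance of the mean of n equal-probability draws with replacement."""
    return population_variance(values) / n


def variance_without_replacement(values, n):
    """Variance of the mean of n equal-probability draws without replacement."""
    N = len(values)
    return population_variance(values) / n * Fraction(N - n, N - 1)


def expected_distinct_units(N, n):
    """Expected number of distinct units among n draws with replacement."""
    return N * (1 - Fraction(N - 1, N) ** n)


# Hypothetical population of N = 5 units (illustrative values only).
population = [3, 7, 8, 12, 20]
n = 3
v_with = variance_with_replacement(population, n)
v_without = variance_without_replacement(population, n)
e_distinct = expected_distinct_units(len(population), n)

print("variance of mean, with replacement:   ", v_with)
print("variance of mean, without replacement:", v_without)
print("expected number of distinct units:    ", e_distinct)
```

For equal total sample size the without-replacement variance is the smaller of the two, but the with-replacement sample observes on average fewer than n distinct units, which is the quantity that drives the cost.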
Hidden by considerations such as the question of whether to sample
with or without replacement, the development of newer and fancier
estimators for specialized situations, the extension of the sampling plan to
more and more stages, the more and more theoretical discussions of some
of the technical problems which arise in actual samples, etc., is an
almost complete lack of discussion of principles governing the choice of
estimators to use on samples from finite populations. Although the
basic principles of unbiasedness and minimum variance, which are
directly applicable to samples from finite populations, appeared in the
literature in the early nineteenth century, little has been developed since
in the way of criteria specifically applicable to the problem of
determining optimum estimators (in some sense) when the population under
study is finite. In this, and many other aspects of sample survey
theory (samples from finite populations), the tendency seems to have been to
assume that the criteria developed for infinite populations will merely
transfer to finite populations. In some cases they may, but for the
most part they do not without adding unwarranted assumptions about the
nature of the population or the sample.
What this dissertation proposes to do, then, is:
(1) To discuss, in a preliminary manner, the applicability of the
classical estimation criteria to samples from finite
populations, and to suggest some possible criteria which might be
utilized in evaluating possible estimators for use on samples
from finite populations.
(2) To examine the question of sampling with replacement versus
sampling without replacement to see what conclusions might be
reached, or have been reached, or to see if such a comparison
can properly be made in the first place.
(3) Using an axiomatic approach, to examine the problem of
formulating classes of estimators for samples drawn from finite
populations, with arbitrary selection probabilities, and with
replacement of each sampling unit before the next unit is
drawn.
It can be noted that the first two objectives are somewhat interrelated.
The third objective, the use of the axiomatic approach, does not depend
on the results of the first objective. However, this approach to the
formulation of classes of estimators is further justified by the results
of the first objective.
2.0 REVIEW OF LITERATURE
Man from time immemorial has used sampling
techniques to base decisions on partial knowledge of the situation. He
has judged the opinions of many by talking with friends or advisors; he
has condemned or praised a whole nation or race of people after but a
five or ten day visit; he has pushed aside a bowl of hot soup or tepid
mush after swallowing one spoonful; and so forth.
In the case of the soup or the mush, the universe (bowlful) is
undoubtedly sufficiently homogeneous that such a sample would lead to valid
inferences. But for the other examples cited (in fact for most of the
sampling that is done by man, either unconsciously or deliberately),
there is great danger that false and misleading inferences will be
drawn if complete objectivity in the formulation of the goals and
procedures of the inquiry and in the collection and analysis of the data
is lacking.
Eventually people began to want to formalize the methodology behind
obtaining some of these sample estimates. Some sort of formal procedure
was needed to obtain measures of central tendency and an indication of
their validity, based on a subset of an entire population. The first
known formal approach to the study of the theory of sampling was that of
Bernoulli in his monumental study Ars Conjectandi, which appeared in
1713. A century later Poisson gave indications of the theory that would
result from the introduction of stratification into the sampling
procedure. Subsequently Lexis systematized the work of his predecessors and
added the beginnings of the theory of sampling clusters of elements.
Also, it can be noted that the germinal ideas of the analysis of
variance techniques are to be found in Lexis' works.
Sir Arthur Bowley (1926) summarized the adaptation of the works of
Bernoulli and Poisson to sampling from finite populations. Bowley was
also one of the first to apply the representative method (purposive
selection) in practice, and included in this paper a discussion of the
theory involved. This paper undoubtedly marked the high-water mark for
purposive selection (as contrasted to random selection, or attempts
thereat) because the major papers subsequent to this one seemed to
assume random selection, or to condemn purposive selection. Bowley
himself later made the comment, when discussing the paper by Neyman (1934),
that he thought his 1926 paper had "damned it (purposive selection) with
very faint praise".
In the decade immediately preceding Bowley's paper,¹ the theory of
sampling finite populations with equal selection probabilities and
without replacement began to develop in earnest. Isserlis (1916 and 1918),
Edgeworth (1918), Tschuprow (1923) and Neyman (writing under the name
J. Splawa-Neyman) (1925) derived formulae for the first four moments of
the sample mean. Mortara (1917) developed a formula for the standard
error of the mean. Neyman, in addition to giving formulae for the first
four moments of the sample mean, gave formulae for the first two moments
of the sampling variance. Due to inaccessibility, formidable notational
systems, or other reasons, none of these papers stimulated wide growth
in the field of sample survey theory.
¹ For a very interesting discourse on part of the history of the
development of sampling theories and practice in the five decades
preceding Bowley's paper, see the article by You Poh Seng (1951).
Tschuprow, in that same 1923 paper, developed the principles of the
theory of the optimum allocation of units in stratified sampling. In
fact, Zarkovic (1956), in his article on the history of sampling methods
in Russia, gives the impression that had the works of Tschuprow been
more accessible, and had they had a system of notation which was easier
to understand, they might be the monumental works being cited in this
chapter. Zarkovic refers to an earlier Russian work which mentions that
Tschuprow, in 1900 in a report "On Sampling Methods", dealt only with
probability samples (Western reliance was then on purposive selection)
and developed the basic theory of surveys. Also, several of Tschuprow's
other works, especially those in connection with the Russian census
circa 1913, where many of the techniques were applied, were quite
suggestive of techniques and theoretical developments which were "derived"
much later in the more familiar Western literature.
In fact, if Zarkovic is right, Russian statisticians were in the
forefront of the development of sample survey theory and techniques up
to the time of the death of Lenin. This was due, undoubtedly, to the
fact that
    "These Russian statisticians watched the development of
    statistical theory all over the world, they published
    translations of the most important foreign contributions
    and they reviewed for their readers all important results,
    whatever country provided them. This keen activity
    supplied the base from which they sought solutions to their
    own practical problems." (1956, p. 336)
He goes on to say, though, that in the years after the death of Lenin,
political considerations became increasingly important in Russian
statistical effort, and less reliance was placed on theory in the
practical application of "statistical" techniques. (For an illustration of
this non-reliance on theory in the Communist world, see Section 9.0.)
There are indications, however, that, at least in Russia, a reliance on
theory is again emerging [Yezhov (1957)].
An impetus to sample survey development, following the above
mentioned papers and the fundamental statistical contributions of
Pearson, Fisher, and others, resulted from the paper by Neyman (1934)
entitled "On Two Different Aspects of the Representative Method".
Modern developments in the field of sample survey theory can be said to
have begun with this paper. Several new concepts (i.e., new to most
researchers; some had been anticipated in earlier articles which had not
received as much attention) were introduced and discussed, such as:
(i) optimum use of resources in sample surveys,
(ii) criteria for the choice of the sampling unit,
(iii) use of preliminary inquiry for improving the design of the
survey, and
(iv) optimum allocation for assigning units to different strata
subject to the restriction that the sample shall have a
fixed total number of sampled units.
Neyman also discussed the advantages of random over purposive selection
of units, and the advantages of using stratified sampling, going
so far as to state that the only recommended method of
sampling is stratified random sampling.
The next major paper to appear was that by Hansen and Hurwitz
(1943). Faced with the situation where the sampling unit and the
ultimate unit of analysis are not identical, they examined the question of
sampling with unequal selection probabilities. In situations where the
sampling units are aggregates of ultimate units (i.e., clusters),
limitations on resources may prohibit the effort needed to group the ultimate
units into clusters of equal size by artificial methods. These authors
noticed that if one sampled units with replacement using probabilities
($p_i$) exactly proportional to the values ($y_i$) of their aggregate
characteristics (i.e., $p_i = y_i/T$, $T$ = the population total), the mean of
these aggregate values, each weighted by the reciprocal of its
respective selection probability, has a sampling variance of zero, since each
$y_i/p_i \equiv T$.
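The zero-variance property just described can be checked by direct enumeration. The sketch below assumes the estimator of the total is the mean of $y_i/p_i$ over the n draws, as described above, and computes its exact expectation and variance over every ordered with-replacement sample. The three population values and the function names are hypothetical; exact rational arithmetic is used so that the zero is exact rather than approximate.

```python
from fractions import Fraction
from itertools import product


def weighted_mean_estimate(draws, y, p):
    """Mean over the n draws (repetitions included) of y_i / p_i."""
    return sum(Fraction(y[i]) / p[i] for i in draws) / len(draws)


def exact_mean_and_variance(y, p, n):
    """Enumerate every ordered sample of n draws with replacement."""
    first = Fraction(0)
    second = Fraction(0)
    for draws in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for i in draws:
            prob *= p[i]
        estimate = weighted_mean_estimate(draws, y, p)
        first += prob * estimate
        second += prob * estimate * estimate
    return first, second - first * first


# Hypothetical aggregate values for three sampling units.
y = [2, 5, 13]
T = sum(y)

# Selection probabilities exactly proportional to y_i.
p_proportional = [Fraction(v, T) for v in y]
# Equal selection probabilities, for contrast.
p_equal = [Fraction(1, len(y))] * len(y)

mean_pps, var_pps = exact_mean_and_variance(y, p_proportional, 2)
mean_eq, var_eq = exact_mean_and_variance(y, p_equal, 2)

print("proportional to y: mean", mean_pps, "variance", var_pps)
print("equal:             mean", mean_eq, "variance", var_eq)
```

Both designs give an unbiased estimate of T, but only the exactly proportional probabilities drive the variance to zero. In practice the $y_i$ are unknown, which is why a correlated measure of size is used instead.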
These considerations led Hansen and Hurwitz to consider selecting
sampling units with probabilities proportional to some measure of size
so as to reduce the sampling variance of the estimator. The scheme that
they proposed was essentially a stratified two-stage scheme, selecting
one primary unit per stratum at the first stage with probabilities
proportional to some measure of size, and at the second stage selecting the
elements in each selected primary unit with equal probabilities and
without replacement. An unweighted estimator was shown to be unbiased and
to have a smaller variance than if the sampling plan were based on equal
first-stage selection probabilities.
The appearance of this article by Hansen and Hurwitz stimulated
attempts to generalize the approach, both in terms of estimators
involving varying probabilities (rather than being restricted to equal
selection probabilities as had most previous studies) and in terms of
selecting more than one first-stage unit per stratum so that the
sampling variance of the estimator could be calculated. Not all of these
papers will be mentioned here, as they are not of immediate relevance to
this dissertation.
Sukhatme and Narain (1951) outlined a scheme where the primary
sampling units (p.s.u.'s) were selected with replacement and with
probabilities proportional to their sizes as measured by the number of
sub-units in each primary unit. Then the second-stage units were
selected without replacement and with equal selection probabilities. They
presented the theory, and also compared the efficiency of the following
two schemes:
(A) select a random sample of $m n_i$ sub-units from the i-th
primary unit, where $m$ is an integer, and $n_i$ denotes the
number of times the i-th primary unit appears in the sample,
$\sum n_i = n$, and
(B) select $n_i$ independent random sub-samples of $m$ sub-units
from the i-th primary unit.
The variances of the sample means are respectively:
$$V_A = \frac{1}{n}\left[\sigma_b^2 + \sum_{i=1}^{N} \frac{M_i - m}{m M_i}\, p_i \sigma_i^2 - (n-1) \sum_{i=1}^{N} \frac{p_i^2 \sigma_i^2}{M_i}\right]$$
$$V_B = \frac{1}{n}\left[\sigma_b^2 + \sum_{i=1}^{N} \frac{M_i - m}{m M_i}\, p_i \sigma_i^2\right]$$
where $N$ = number of p.s.u.'s in the stratum;
$M_i$ = number of sub-units in the i-th p.s.u.;
$\sigma_b^2$ = the between-p.s.u. variance;
$\sigma_i^2$ = the variance within the i-th p.s.u.
Thus in their plan (A) that part of the variance attributable to
sub-sampling is reduced by a fraction of the order of
$$\frac{m(n-1)}{(\bar{M} - m) N}, \quad \text{where } \bar{M} = \sum_{i=1}^{N} M_i / N,$$
which, it may be noted, is very nearly equal to the over-all sampling
fraction.
The estimates of the between and within variances are as follows:
for case (A):
$$\hat{\sigma}_w^2 = \sum_{i=1}^{v} \sum_{j=1}^{m n_i} (y_{ij} - \bar{y}_i)^2 / (nm - v)$$
$$\hat{\sigma}_b^2 = \sum_{i=1}^{v} \frac{n_i (\bar{y}_i - \bar{y})^2}{n-1} - \left[\frac{E(v) - 1}{n-1}\right] \frac{\hat{\sigma}_w^2}{m}$$
where $v$ = the number of distinct p.s.u.'s in the sample;
for case (B): the estimates come straight from analysis of variance
considerations, since the sub-samples are drawn
independently.
Wilks (1960) raised an objection to the above scheme by noting that
it is conceivable that $m n_i$ could exceed $M_i$; thus the above method
could require observation of more than the total number of secondary
units available.
Wilks suggested that one let $N_i = M_i$ = the number of elements in
the i-th p.s.u. (and consider a reasonable approximation where the
$N_i$ are rounded to the nearest integral multiple of $m$). Then one is to
draw a sample of $n$ p.s.u.'s in a manner such that a sample of $a_i m$
sub-units is drawn from the i-th p.s.u., where the $a_i$ ($i = 1, 2, \ldots,
k$) are random variables having the hypergeometric distribution.
This scheme may be regarded as one in which sampling is done without
replacement at both stages, whereas the scheme proposed by Sukhatme and
Narain involves sampling with replacement at the first stage and without
replacement at the second stage.
Wilks suggests that the estimator for the mean be
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{k} a_i \bar{y}_i = \frac{1}{mn} \sum_{i=1}^{k} m a_i \bar{y}_i,$$
which is self-weighting. The expression for the estimate of the
variance of this mean is, unfortunately, rather complicated and is given by
Wilks (1960, p. 246).
In the early 1950's many articles appeared which incorporated
unequal selection probabilities into the formulation of estimators. The
majority of these were by Indian authors, and have not received much
attention in this country. The article that is probably the best known
of the ones that appeared in this interval is that by Horvitz and
Thompson (1952). In their "Generalization of Sampling Without
Replacement from a Finite Universe" they formulated three classes of linear
estimators for the population total, with coefficients for class one
dependent on the presence or absence of a unit in the sample, for class
two dependent on the order of draw, and for class three dependent on the
particular sample involved. This article was the first to incorporate
the ideas of what was subsequently formalized as the axiomatic approach
to the formation of classes of estimators, although the authors did not
explore the logical consequences of this formulation. These requirements
on the coefficients for the classes will be seen later to be the same as
our class two, class one and class three respectively.
They determined coefficients for each class by imposing (a) the
condition of unbiasedness (that the expected value of the estimator be
equal to the total); and (b) the condition that the coefficients so
determined shall be independent of the properties of the population. The
authors themselves were aware that they were indicating only three of the
possible classes of linear estimators of the total when sampling a finite
population. It was subsequently shown that there were in fact seven classes
of linear estimators for sampling a finite population with unequal
(general) selection probabilities and without replacement. It will be
seen in Section 4.0 that these same seven classes can readily be adapted
to the case where the sampling is done with replacement.
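Conditions (a) and (b) can be illustrated for the class whose coefficients depend only on whether a unit is present in the sample. The text does not give the formula; the sketch below assumes the estimator now commonly named after Horvitz and Thompson, in which each sampled value $y_i$ is divided by the probability that unit i is included in the sample. That coefficient depends on the design alone, not on the population values. The without-replacement design used here (each draw proportional to $p_i$ among the units not yet drawn), the data and the function names are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def ordered_sample_probabilities(p, n):
    """Probability of each ordered sample of n distinct units when every
    draw selects among the remaining units proportionally to p."""
    probabilities = {}
    for sample in permutations(range(len(p)), n):
        prob = Fraction(1)
        remaining = Fraction(1)
        for i in sample:
            prob *= p[i] / remaining
            remaining -= p[i]
        probabilities[sample] = prob
    return probabilities


def inclusion_probabilities(sample_probs, N):
    """Probability that each unit appears somewhere in the sample."""
    pi = [Fraction(0)] * N
    for sample, prob in sample_probs.items():
        for i in sample:
            pi[i] += prob
    return pi


def inverse_inclusion_estimate(sample, y, pi):
    """Sum over the sampled units of y_i divided by its inclusion probability."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)


# Hypothetical population of four units and first-draw probabilities.
y = [2, 5, 13, 10]
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n = 2

sample_probs = ordered_sample_probabilities(p, n)
pi = inclusion_probabilities(sample_probs, len(y))
expected_estimate = sum(
    prob * inverse_inclusion_estimate(sample, y, pi)
    for sample, prob in sample_probs.items()
)

print("inclusion probabilities:", pi)
print("expected value of estimator:", expected_estimate, "population total:", sum(y))
```

The expectation equals the population total whatever values are placed in `y`, which is the sense in which the coefficients are independent of the properties of the population.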
Horvitz and Thompson themselves indicate that they did not consider
the general solution of determining a "best linear unbiased" estimator
for the total of a finite population sampled with arbitrary
probabilities and without replacement. Godambe (1955) considered this question
and demonstrated that a uniformly minimum variance unbiased estimator
for the total or mean of a finite population does not exist.
Godambe also put forward a "unified theory of sampling from a
finite population". He developed a system of notation to indicate the
element by the unit selected on a particular draw and the sequence of
units preceding the individual unit selected (i.e., the particular
sample involved). He also defined symbolically a system of probabilities
to handle this case, and proposed a "general" estimator which can be
seen to belong to class seven among the classes developed axiomatically
in this dissertation.
Koop (1957) recognized the systematic approach to the development
of classes of estimators implicit in the works of Horvitz and Thompson
and Godambe, but not directly recognized by those authors. He posited
three axioms, axioms which are descriptions of physical realities, and
then, in a systematic fashion, derived seven classes of linear estimators.
This approach will be discussed more fully in Section 4.0, and the
axiomatic approach applied to the problem of determining classes of
estimators for a system of sampling where the probabilities are arbitrary
and the sampling is done with replacement of each unit before the next is
drawn.
In their article, Des Raj and Khamis (1958) made a comparison
between the arithmetic mean of the distinct units observed in the sample
when it is drawn with replacement, and the arithmetic mean of the
totality of observed units including repetitions. They assumed equal
selection probabilities and made the comparison both for the case when the
sample size is fixed and the number of distinct units is random, and for
the case when the number of distinct units is prespecified.
Basu (1958), in his article "On Sampling with and without
Replacement", written independently about the same time, made the same
comparison as did Des Raj and Khamis, but not by an analytic method as did
Des Raj and Khamis.
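The comparison described above, for the case of fixed sample size and equal selection probabilities, can be reproduced on a small scale by enumerating every ordered sample. The sketch below is illustrative only: the five population values and the function names are hypothetical, and the enumeration stands in for the analytic treatment of the cited authors rather than reproducing it.

```python
from fractions import Fraction
from itertools import product


def mean_of_all_draws(draws, values):
    """Arithmetic mean of the n observed units, repetitions included."""
    return Fraction(sum(values[i] for i in draws), len(draws))


def mean_of_distinct_units(draws, values):
    """Arithmetic mean of the distinct units observed in the sample."""
    distinct = set(draws)
    return Fraction(sum(values[i] for i in distinct), len(distinct))


def exact_mean_and_variance(values, n, estimator):
    """Exact expectation and variance over all N**n equally likely
    ordered samples drawn with replacement."""
    N = len(values)
    prob = Fraction(1, N ** n)
    first = Fraction(0)
    second = Fraction(0)
    for draws in product(range(N), repeat=n):
        estimate = estimator(draws, values)
        first += prob * estimate
        second += prob * estimate * estimate
    return first, second - first * first


# Hypothetical population; units are identified by index, not by value.
population = [3, 7, 8, 12, 20]
n = 3

mean_all, var_all = exact_mean_and_variance(population, n, mean_of_all_draws)
mean_distinct, var_distinct = exact_mean_and_variance(
    population, n, mean_of_distinct_units
)

print("all draws:      mean", mean_all, "variance", var_all)
print("distinct units: mean", mean_distinct, "variance", var_distinct)
```

Both estimators are unbiased for the population mean, and the mean of the distinct units has the smaller variance, since repeated observations of the same unit add no information.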
Roy and Chakravarti (1960) acknowledged the researches of Godambe
(1955), Des Raj and Khamis (1958) and Basu (1958) and said they were
going further, obtaining an "admissible" estimator, together with a
"complete class of estimators" for a very general scheme of sampling.
This very general scheme which they propose has some exotic properties;
however, it appears that they have induced generality by deliberately
leaving some details unspecified. Their estimator can be shown to
belong to class two.
Godambe (1960) also demonstrated the "admissibility" of an
estimator which is algebraically equivalent to that proposed by Roy and
Chakravarti when the same restrictions are imposed. Godambe's estimator
is the same as given for the class two estimator in Section 4.5. This
article will also be discussed at length later, in Section 5.0.
Since this dissertation is discussing principally sampling with
replacement, the following additional recent articles are of interest.
Nanjamma et al. (1959) propose a scheme of sampling with replacement
which leads to an unbiased estimator. Their scheme is to select one
unit with probability proportional to some auxiliary variable x, replace
it, and then select the rest of the sample units from the whole
population with equal probability and with replacement at each draw. For this
selection procedure the ratio estimator, $\hat{R} = \bar{y}/\bar{x}$, is shown to be
unbiased for estimating the population ratio $R$. The sampling variance
and an unbiased estimator thereof would be different from those in the
case of sampling with equal probabilities without replacement of the
units. The variance estimator they give as:
$$\hat{V}(\hat{R}) = \hat{R}^2 - \frac{\sum_{i=1}^{v} n_i (n_i - 1) y_i^2 + \sum_{i \neq j} n_i n_j y_i y_j}{n(n-1)\, \bar{x} \bar{X}}$$
where $\bar{X}$ is the known population value. They also mention another
modification of the probability proportional to size (pps) with
replacement scheme which has the first unit selected with probability
proportional to the size of the x-characteristic (ppx), replaced, and then the
remaining (n-1) units selected with replacement with ppz, where z is
another indicator of size. The estimator for this case is algebraically
the same as that of the usual biased ratio estimator in the case of
complete pps with replacement sampling, and it is unbiased for the ppx-ppz
scheme by virtue of the new probability system.
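The unbiasedness claimed for the first scheme (one draw with probability proportional to x, the remaining draws with equal probability, all with replacement) can be checked by enumeration. In the sketch below the y and x values and the function name are hypothetical, and the flag `first_draw_ppx` is an illustrative device for switching between that scheme and ordinary equal-probability sampling at every draw.

```python
from fractions import Fraction
from itertools import product


def expected_ratio_estimate(y, x, n, first_draw_ppx):
    """Exact expectation of ybar/xbar over all ordered samples of n draws
    with replacement. If first_draw_ppx is True, the first draw is made
    with probability proportional to x; all other draws are equal
    probability."""
    N = len(y)
    X = sum(x)
    expectation = Fraction(0)
    for draws in product(range(N), repeat=n):
        first = draws[0]
        p_first = Fraction(x[first], X) if first_draw_ppx else Fraction(1, N)
        prob = p_first * Fraction(1, N) ** (n - 1)
        # ybar/xbar equals (sum of y)/(sum of x) over the n draws.
        estimate = Fraction(sum(y[i] for i in draws), sum(x[i] for i in draws))
        expectation += prob * estimate
    return expectation


# Hypothetical population of four units.
y = [2, 5, 13, 10]
x = [1, 2, 6, 3]
R = Fraction(sum(y), sum(x))

under_scheme = expected_ratio_estimate(y, x, 2, first_draw_ppx=True)
under_equal = expected_ratio_estimate(y, x, 2, first_draw_ppx=False)

print("population ratio R:              ", R)
print("E[ratio], first draw ppx:        ", under_scheme)
print("E[ratio], equal prob. every draw:", under_equal)
```

With the first draw made proportional to x the expectation reproduces R exactly, whereas the same ratio estimator is biased for this population when every draw is made with equal probability.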
Stevens (1958) postulated a scheme whereby sampling with
replacement could be made equivalent to sampling without replacement, thus
taking advantage of the simpler probabilistic manipulations. He showed
that sampling without replacement with pps can be achieved if the
sampling units are grouped with reference to size. Then when the same unit
is selected a second (or later) time, it is replaced by another unit
of the same size chosen at random. The estimate of the population total
is then formally the same as when sampling is done with replacement.
Des Raj (1958) compared the efficiency of an estimator for the case
of sampling with probability proportional to size and with replacement
with the efficiency of some alternative methods such as: simple average
(simple random sampling), ratio, regression, proportionate allocation
stratified and optimum allocation stratified sampling. Zarkovic (1960),
in making essentially the same comparison, added difference estimates
and dropped the optimum allocation stratified sampling estimate.
One final aspect of the literature apropos to this dissertation is
the relative absence of any consideration of estimation criteria (other
than unbiasedness and minimum variance) applicable directly to samples
from finite populations. By this is meant the absence of criteria which
do not depend on artificial devices such as letting the size of the
finite population approach infinity. For instance, Madow (1948) claims
that under very broad conditions the usual theorems concerning the
limiting distributions of estimates hold for estimates based on samples
taken from finite populations, at random without replacement. He also
states that under the same conditions, the same conclusions are true for
samples drawn with replacement, if the approach to infinity by the size
of the "finite" universe is within the limitations imposed by "condition
w". In his paper, Madow "proves" that the limiting distribution of the
mean is normal "provided only that as the universe increases in size,
the higher moments do not increase too rapidly as compared with the
variance, and that for sufficiently large sizes of sample and population
the ratio of n to N is bounded away from one."
Another frequently used conceptual device [see, for instance,
Cochran (1946), Des Raj (1958)] is to make the assumption that the
finite population itself is a random sample from an infinite
super-population, making the sample a second-stage sample.
Using "consistency" as an illustration, this being a universally
accepted desirable criterion for any estimator, very few authors use a
definition applicable to samples from finite populations. For instance,
in the textbooks on sample survey theory: Yates (1953) does not bother
to give a definition; Cochran (1953), Hansen, Hurwitz and Madow (1953),
and Sukhatme (1953) all give the "infinite" definition involving
convergence in probability. Cochran (p. 13) does actually give a "finite"
definition of consistency, but in the next paragraph he returns to the
convergence in probability definition, saying that "the idea of
consistency does not play an important part in the subsequent exposition."
Only Deming (1960) uses a suitable definition, although he does not
state it explicitly but refers to Fisher's paper "On the Mathematical
Foundations of Theoretical Statistics" (1921). He does make the
statement, though, that "the bias of this estimate is inconsistent, i.e., the
bias if any does not diminish to zero as $n_i$ approaches $N_i$" (p. 320).
This whole question of estimation criteria for samples from finite
populations is discussed in Sections 3.3 and 3.4.
3.0 ON THE BASIC CRITERIA FOR A THEORY OF SAMPLING
3.1 Components of the Sampling Problem
It has been said that sample survey theory is easy because it deals
mainly with the estimation of means or totals and the variance of these
estimates. This statement is made in spite of the multitude of problems
which can beset a sampler in real life situations, in spite of a
bewildering maze of formulae which can be present for a very involved
multi-stage survey, and also in spite of complex formulae and difficult
terminology which often confuse the practitioner in the field and those
trying to glean some knowledge from the report of the survey. The two
conflicting viewpoints arising in the above situations would seem to be
resolved if the first is attributed to a non-sampler who is looking at
sampling from the broad spectrum of the traditional approach to
estimation, and the second to the practicing sampler who sees the
multitude of problems that arise when actually conducting a survey.
The resolution of these viewpoints would be very difficult, since
many aspects of the traditional (infinite) approach to estimation do not
hold when applied to the sampling of a finite population. (By the term
finite is meant a size below that which might be categorized as "indef-
initely large", for which the infinite theory would hold, at least
approximately.)
In the study of the theory behind various aspects of sampling a
conglomeration of problems may be encountered: one can select the sam-
ple systematically, purposefully or probabilistically; one can have an
unrestricted sample or one can stratify, or use clusters, chunks or
quotas; one can have equal, unequal, arbitrary or judgement probabil-
ities or probabilities proportional to certain measures of size; one
can have a single-stage or multiple-stage sample; one can use mean-per-
sampling-unit, regression, ratio or more elaborate estimators; one can
use biased, deliberately or accidentally, or unbiased estimators; one
can study the effects of response and non-response errors; and so forth.
But behind all these related or unrelated aspects of sampling there are
five components of any sampling plan, the first three of which, at least,
must be specified prior to any theoretical or empirical investigation.
First and foremost there must be a well defined UNIVERSE; a
universe which consists of the totality of ultimate units of analysis
about which information is desired and which is invariant under further
considerations of the particular sampling investigation being carried
out. For the universe one must next specify the FRAME, i.e., a descrip-
tion (e.g., by maps) and/or listing of all sampling units (each contain-
ing one or more units of analysis) which comprise the universe or a
sufficient portion thereof, if the sampling operation is planned in
several stages, to conduct the survey. For a full discussion of the
concept of the "frame" see Deming (1960, ch. 3).
This dissertation will be concerned with a single universe from
which the units (i.e., the ultimate units of interest) can be selected
in one stage. Thus the universe under consideration can be said to be
simple. Correspondingly the frame is simply a list.
Given the universe and the frame, next define a PROBABILITY SYSTEM²
for the possible selection of every unit revealed by the frame [see Koop
(1960)]. When the frame is a list, as above, the probability system
will be defined by a single sequence of non-negative numbers which sum
to unity. For more complex frames (those which show the universe in
separate portions and in which the units are in some hierarchal or
nested order) the probability system will be correspondingly complex
and will consist of a sequence of probabilities specific to each unit or
subdivision of the frame (strata, first stage units, second stage units,
etc.). For a geometrical representation of a probability system see
Feller (1957, p. 118 ff).
Then the SAMPLING PROCEDURE comes operationally from the selection
probability system and is the scheme for determining which particular
units are to be drawn for the sample.
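For the simple case treated here (a list frame, selection in one stage, sampling with replacement), the relation between frame, probability system and sampling procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit labels, the probabilities and the function names are hypothetical and do not come from the dissertation.

```python
import random

# Hypothetical list frame: one label per sampling unit.
frame = ["unit_1", "unit_2", "unit_3", "unit_4"]

# Hypothetical probability system for the list frame:
# a single sequence of non-negative numbers which sum to unity.
probabilities = [0.1, 0.2, 0.3, 0.4]


def check_probability_system(frame, probabilities, tol=1e-12):
    """Return True if the probabilities form a valid system for the frame."""
    if len(frame) != len(probabilities):
        return False
    if any(p < 0 for p in probabilities):
        return False
    return abs(sum(probabilities) - 1.0) <= tol


def draw_with_replacement(frame, probabilities, n, seed=None):
    """Sampling procedure: n independent draws, each using the same system."""
    rng = random.Random(seed)
    return rng.choices(frame, weights=probabilities, k=n)


sample = draw_with_replacement(frame, probabilities, n=5, seed=1961)
```

Because the draws are made with replacement, the same unit may appear more than once in `sample`; the estimator is a separate choice made after these three components are fixed.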
And also, for every logical combination of a specific frame and a
specific probability system, there is a specific problem of determining
an ESTIMATOR; the problem of selecting the arithmetical procedure of
estimation which will "optimally" (in some sense of the word) give the
information desired from the survey in the first place, i.e., the esti-
mates of the population values of the characteristics under observation.
² The use of the word "system" follows the usage of Carmichael (1937),
who states, "A set of objects, with the associated rule or rules of
combination, is called a system, or, more explicitly, a mathematical
system." Thus the use of the term system is intended to connote not
only the simple $P_i$ values but also any applicable associated rules of
combination necessary for full specification.
Schematically the directions of influence between the five compo-
nents can be represented by the following diagram:
UNIVERSE ---> FRAME ---> PROBABILITY SYSTEM --->
SAMPLING OPERATION ---> ESTIMATOR
Given the frame and the probability system, one may be able to get
an "optimum" sampling plan and an "optimum" estimator. Vary either the
frame or the probability system, or both, and the problem of getting the
sample and estimates, or comparing various methods for obtaining the sam-
ple and estimates, changes. Problems of choice within the last two com-
ponents, i.e., the sampling operation and the estimation process, con-
stitute most sampling research, and are the source of the statements
that the study of sample survey theory is rather difficult and frequent-
ly involved in algebraic complications. But all five components must
be spelled out in detail for any individual survey. Further, the first
three components must be specified accurately and completely, for no
amount of refinement or elaborateness in the last two can overcome de-
fects in the first three, e.g., definition or delineation of the frame
or selection probabilities.
It might be noted here that the formulation of these five compo-
nents has ignored several other parts of any sampling problem, equally
as important as the five given, but which depend on the individuals
planning the survey, and not on the process itself. These non-prob-
abilistic problems are involved with the mechanical process of
accumulating the sample data, and would include requirements that the
objectives of the study are well defined, that the appropriate question-
naire is designed to obtain the desired information in a manner which
can be used, that the answers obtained are to the questions on the sur-
vey questionnaire as designed, and not as interpreted by the interview-
er, and that the units actually interviewed are the units selected by
the sampler designing the survey.
The neglect of these ideas, fundamental to the study of sample
survey theory, is a great source of confusion and difficulty in much of
the research into comparisons of sampling methodologies done thus far.
3.2 On the Question of Sampling with or without Replacement
As an example of the application of the principles discussed in the
preceding section, the statement can be made that there is no valid di-
rect comparison between sampling with replacement and sampling without
replacement. A comparison between the two is possible, but only on an
indirect, total (or multiple) basis. That is, since completely differ-
ent probability systems are involved, two complete sampling plans must
be run, with a final judgement as to which is better depending on com-
parisons of end results for items such as variances and costs involved.
With equal sample sizes, and no consideration of cost, there
is agreement that sampling without replacement is better than sampling
with replacement (using the mean of the sample units as the estimator)
by virtue of a smaller variance, i.e., $(N-n)\sigma^2/nN$ versus $\sigma^2/n$. How-
ever, when cost is considered, the conclusions are not clear-cut, for the
cost of the with-replacement sample is dependent, not on the total
sample size, but on the number of distinct units included in the sample.
The problem of making comparisons between the two then involves the dis-
tribution of v (the number of distinct units when sampling with replace-
ment) which is discussed in Section 8.0.
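The two variance expressions and the role of the number of distinct units can be put side by side numerically. The sketch below takes the expressions exactly as quoted above, with the same $\sigma^2$ in both, and adds the standard result $E(v) = N[1 - (1 - 1/N)^n]$ for equal selection probabilities; the function names and the numbers used are hypothetical.

```python
from fractions import Fraction


def var_mean_without_replacement(sigma2, N, n):
    """(N - n) * sigma^2 / (n * N), the expression quoted in the text."""
    return Fraction(N - n, n * N) * sigma2


def var_mean_with_replacement(sigma2, n):
    """sigma^2 / n, the variance of the mean of all n draws."""
    return Fraction(sigma2) / n


def expected_distinct_units(N, n):
    """E(v) for n equal-probability draws with replacement from N units."""
    return N * (1 - Fraction(N - 1, N) ** n)


# Hypothetical example: N = 10 units, n = 4 draws, sigma^2 = 9.
v_without = var_mean_without_replacement(9, N=10, n=4)
v_with = var_mean_with_replacement(9, n=4)
e_distinct = expected_distinct_units(N=10, n=4)
```

With these numbers the without-replacement variance is the smaller of the two, while the with-replacement sample is expected to contain fewer than four distinct units, which is the quantity on which its cost depends.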
Apropos of this discussion of with versus without replacement, two
articles already mentioned in Section 2.0 will now be discussed briefly:
that by Des Raj and Khamis (1958) and that by Basu (1958).
Des Raj and Khamis compare the arithmetic mean of the distinct
units ($\bar{y}_v = \frac{1}{v}\sum_{i=1}^{v} y_i$) with the arithmetic mean of the
totality of observed units ($\bar{y}_n = \frac{1}{n}\sum_{i=1}^{v} k_i y_i$, where $k_i$ is the
number of times the i-th unit appears in the sample). For the two cases
that they examine, the applicability of their results is restricted by
assuming equal selection probabilities ($P_i = 1/N$).
For their case A (n fixed, v a random variable) they then have a
neat algebraic inequality to show that $V(\bar{y}_v) \le V(\bar{y}_n)$ (the
displayed inequality itself is not recoverable from this transcript).
Thus for the restrictive case of sampling with replacement and with
equal selection probabilities, Des Raj and Khamis have shown that the
arithmetic mean of the distinct unit characteristic values in the sample
has smaller variance than the arithmetic mean of the totality of ob-
served variate values. (Actually the strict inequality only holds for
$n \ge 2$, but for n = 1 no estimate of the variance is possible.)
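The comparison can be checked exactly for a small universe by enumerating every ordered with-replacement sample. The sketch below does this with exact rational arithmetic; the population values and the function name are hypothetical, and the enumeration is an illustration of the result, not the algebraic argument of Des Raj and Khamis.

```python
from fractions import Fraction
from itertools import product


def enumerate_estimators(y, n):
    """Exact mean and variance of both estimators over all N**n ordered samples.

    Key "n": mean of all n observed values (repeats included).
    Key "v": mean of the distinct units in the sample.
    """
    N = len(y)
    prob = Fraction(1, N ** n)  # equal selection probabilities, P_i = 1/N
    moments = {"n": [Fraction(0), Fraction(0)], "v": [Fraction(0), Fraction(0)]}
    for draws in product(range(N), repeat=n):
        distinct = set(draws)
        est_n = Fraction(sum(y[i] for i in draws), n)
        est_v = Fraction(sum(y[i] for i in distinct), len(distinct))
        for key, est in (("n", est_n), ("v", est_v)):
            moments[key][0] += prob * est        # first moment
            moments[key][1] += prob * est * est  # second moment
    return {k: (m1, m2 - m1 * m1) for k, (m1, m2) in moments.items()}


y = [1, 2, 4, 9]  # hypothetical population values, mean 4
result = enumerate_estimators(y, n=3)
mean_n, var_n = result["n"]
mean_v, var_v = result["v"]
```

For this universe both estimators average to the population mean over all samples, and the variance of the distinct-unit mean is the smaller of the two; with n = 1 the two estimators coincide.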
Basu (1958) in an article entitled "On Sampling with and Without
Replacement" attempted the same comparison that Des Raj and Khamis made,
utilizing an "indirect proof" of the inequality
$$E\left[\frac{N-v}{N-1}\cdot\frac{\sigma^2}{v}\right] < \frac{\sigma^2}{n} \qquad (n > 1).$$
(Note that had Basu used a definition for $\sigma^2$ which used N-1 as a divisor
rather than N, the left-hand expression would have simplified consider-
ably from the standpoint of taking expectations.) The proof of this
inequality is not apparent. For the case of equal selection probabil-
ities, the conclusion of his argument runs as follows (with notational
changes to correspond with the above):
"Since $\bar{y}_n$ is an unbiased estimator of $\bar{Y}$, it follows at once
that $\bar{y}_v$ is also unbiased. It also follows that, for any convex
(downwards) loss function, $\bar{y}_v$ has a uniformly better risk func-
tion than $\bar{y}_n$. In particular $V(\bar{y}_v) \le V(\bar{y}_n)$, the sign of
equality holding only when n = 1. Thus the inequality is proved.
We may note in passing that T (the vector of distinct observa-
tions) is a sufficient statistic here although not a complete
one. No uniformly best unbiased estimator of $\bar{Y}$ exists."
(1958, p. 290).
Basu's argument for the general case of arbitrary probabilities also
rests on the idea of sufficiency and he claims that the same inequality
holds. But the concept of sufficiency is not relevant for finite popu-
lations (see Section 3.3.3), so where does the argument rest? Whether
or not the inequality does hold in fact, merely stating an intuitive
belief does not constitute proof. It can be conceded that the vector of
distinct observations does, in a physical sense, contain all the informa-
tion in the sample, but with selection probabilities and possible obser-
vational weights necessary for estimation known in advance and independ-
ent of the characteristics under study, or determined by counting the
appearances of the units, a mere statement of sufficiency does not
constitute a proof, unless one is redefining sufficiency.
From the above arguments, should one be restricted to sampling
without replacement and forget entirely sampling with replacement? This
question has not been answered. This is not the question actually
attacked by any of the authors, or what was actually proved in the one
case. The question of whether one should sample with or without re-
placement, as does the question of the numerical structure of the se-
lection probability construct, arises in the consideration of the
probability system to be used in a given survey problem. The decision
may be made on the basis of choice, or may be dictated by external cir-
cumstances, but once specified cannot be altered without changing the
entire problem. And it is a decision which must be made before one pro-
ceeds to the steps of selecting an "optimum" sampling procedure or an
"optimum" estimator.
It is undoubtedly for these reasons that the various authors who
consider the question of sampling with or without replacement start out
saying this is the comparison they are making, but actually make a com-
parison between using an estimator based on the totality of observed
units and one based on just the distinct units drawn in the sample of n,
both for the case of sampling with replacement. As said earlier, a
comparison is possible, but only by duplicating the entire sampling plan
and then comparing end results, remembering that the costs involved are
behind every step of every comparison.
3.3 The Applicability of Traditional Estimation Criteria
From the above arguments, then, there are five components of any
sampling plan, all of which are essential to the estimate which is
finally obtained. Of these five components
Universe,
Frame,
Probability System,
Sampling Operation, and
Estimator
the first three must be completely specified before any problem involv-
ing the last two can be discussed. The problems involved in obtaining
an "optimum" sampling plan for any given situation, subject to consid-
erations such as costs, expediency, etc., will not be discussed in this
dissertation.
When one comes to the position of deciding on an estimator to be
used to arrive at an estimate of the desired mean or total (or other
population value), one can choose from within a variety of specific
estimators for a given situation. However, behind this choice of a
specific estimator lies the problem of determining which one is "best"
for the purposes at hand, or even deciding what criteria should be used
in resolving the question of bestness. Neyman (1952, p. 158) made the
following comment along these lines:
"While there is likely to be general agreement as to the
desirability of using the best, or at least a satisfactory,
method of making assertions regarding Tl, there may be diffi-
culty in explaining exactly what properties a method of
estimation should possess in order to qualify as the 'best'
or as 'satisfactory'. And without having such an exact
explanation, without knowing exactly what we are looking
for, it is obviously hopeless to expect that we shall
ever find it. If it were possible to devise a method
of using the values of the observable random variables
to predict exactly and without fail the value of the
estimated parameter, then there would be universal agree-
ment that the method in question is the best imaginable.
However, it is obvious that, barring some very artificial
examples, such a method does not exist and we have to put
up with unavoidable errors."
With this "unavoidable error" thus present in any estimate, what cri-
teria are to be used for determining the choice of estimator? This
question is particularly appropriate to the problem of estimation based
on a sample from a finite population. The traditional, or classical,
approach to this question of criteria for estimators has been based on
concepts developed for and largely applicable to infinite populations,
and samples therefrom.
Fisher's magnum opus on estimation (1925) posited that:
"Any body of numerical observations ... may be inter-
preted as a random sample of some infinite hypothetical
population of possible values. Problems of estimation
arise when we know, or are willing to assume, the form
of the frequency distribution of the population, as a
mathematical function involving one or more unknown
parameters, and wish to estimate the values of these
parameters by means of the observational record avail-
able. A statistic may be defined as a function of the
observations designed as an estimate of any such param-
eter. The primary qualifications of satisfactory sta-
tistics may most readily be seen by their behaviour
when derived from large samples." (p. 701)
From this beginning, then, the criteria for determining "bestness" in
estimators have been developed as if all estimators that might be ques-
tioned are based on samples that came from an infinite population.
But what of the problem of estimation based on samples from finite
populations? Fisher makes the statement that estimation problems arise
when one knows or is willing to assume the form of the frequency distri-
bution of the population. However, in sample survey theory little or
no attention is paid to the abstract distribution of the characteristics
under observation (abstract distribution meaning, for each characteris-
tic, a summarization by histographic methods to indicate the proportions
of units contained between arbitrarily chosen bounds for the measure of
the characteristic under consideration). For infinite populations the
abstract distribution is identified with a frequency distribution, but
the frequency distribution concept does not yield operational probabil-
ities for sampling purposes, i.e., probabilities of the form f(x)dx
are not very realistic as selection probabilities. To impute the fre-
quency distribution approach, a classicist would use randomization con-
cepts, where there is no discrimination against or preferential treat-
ment for a unit on other than probabilistic considerations.
The problem of estimation in sample surveys is to determine the
method of weighting the sample observations (this being dependent on the
method of selection of the units that comprise the sample, and the known
selection probability system) to produce the "best" estimate of the
desired population value.
What really occurs in sample surveys is this. There is a universe
of units, $U_i$ (i = 1, 2, ..., N), each of which has associated with it a
vector of characteristics, say $Y_i = (Y_{1i}, Y_{2i}, \ldots, Y_{hi})$. One must note
that "i" is not necessarily a simple index, but may be an extended index
with a number of subscripts sufficient to identify the unit in the hier-
archal structure of the frame, however complicated it may be. If one
desires to examine the j-th characteristic, $Y_{ji}$ (which will hereafter
be denoted by $X_i$), then a set of units is drawn from among the $U_i$ ac-
cording to the probabilities prescribed by the system. Then a function
of the characteristics observed for the units included in the sample is
calculated to estimate the mean or total for that particular character-
istic for the finite population under study, i.e., calculate
$$f(X_i) = \hat{X} \ \text{or} \ \hat{T}.$$
Also, to compare alternative estimation procedures, or to "evaluate" the
estimate that this process yields, one may compute a "variance", a func-
tion of the form $f(X_i - \hat{X})^2$, which can be used as a measure of the pre-
cision or as a "bestness" indicator.
In the traditional approach to determining the optimality of this,
or a chosen, estimator, one would like it to possess the properties
of unbiasedness, consistency, sufficiency, and efficiency or minimum
variance. Let us now examine, within the framework given earlier for
sampling from a finite population, each of these concepts in turn to
see to what extent they can be applied to estimates for population
values for finite universes.
3.3.1 Unbiasedness. This is probably the most universally recog-
nized attribute for an estimator. Unbiasedness is concerned with the
distribution of an estimate, and requires that the distribution be
"centered" on the population value (parameter), i.e., that the expecta-
tion of the estimator is equal to the population value being estimated.
It should be noted here that the concept of expectation must be
modified to be applied to finite populations. Essentially it can be re-
garded as an averaging over all possible samples, i.e., "the mean of the
distribution of the estimates X, each X being calculated by the rules
contained in the sampling procedure for all the possible samples that
one can draw by applying the procedure to a given frame" [Deming
(1960)]. One can express this as
$$E(\hat{\theta}) = \sum_{s=1}^{S} \pi_s \hat{\theta}_s,$$
where $\hat{\theta}_s$ denotes the value of the estimator calculated from the s-th
sample; and
$\pi_s$ denotes the probability of selecting the s-th sample.
The expectation of an individual unit would be expressed by
$$E(X) = \sum_{i=1}^{N} P_i X_i,$$
where $X_i$ is the measurement of the characteristic under study; and
$P_i$ is the probability of selecting the i-th unit on a given draw.
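The finite-population expectation described here, an average of the estimator over every possible sample weighted by the probability of that sample, can be computed directly for a small universe. The following Python sketch assumes sampling with replacement and independent draws, so that the probability of an ordered sample is the product of the $P_i$ of its units; the values, probabilities and function names are hypothetical, and the weighted estimator of the total is included only as a familiar example.

```python
from fractions import Fraction
from itertools import product

x = [3, 5, 12]                                        # hypothetical X_i
p = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]  # hypothetical P_i


def expectation_over_samples(estimator, x, p, n):
    """Sum of pi_s * estimate_s over all ordered with-replacement samples."""
    total = Fraction(0)
    for s in product(range(len(x)), repeat=n):
        pi_s = Fraction(1)
        for i in s:
            pi_s *= p[i]  # independent draws, so pi_s is a product of P_i
        total += pi_s * estimator(s, x, p)
    return total


def sample_mean(s, x, p):
    """Unweighted mean of the observed values."""
    return Fraction(sum(x[i] for i in s), len(s))


def weighted_total(s, x, p):
    """(1/n) * sum of X_i / P_i, an estimator of the population total."""
    return sum(Fraction(x[i]) / p[i] for i in s) / len(s)


expected_unit = sum(pi * xi for pi, xi in zip(p, x))  # sum of P_i * X_i
e_mean = expectation_over_samples(sample_mean, x, p, n=2)
e_total = expectation_over_samples(weighted_total, x, p, n=2)
```

Under these assumptions the unweighted sample mean has expectation $\sum P_i X_i$, which is not the population mean when the $P_i$ are unequal, while the weighted estimator has expectation equal to the population total.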
This criterion of unbiasedness certainly can be applied to an esti-
mator based on a sample from a finite population. However it must fall
into the category of a potentially desired attribute rather than a
universally required one since:
a) if the standard error of the estimator is large, the fact that
it is unbiased is rather incidental;
b) it is possible that a biased estimator will give a more precise
estimate, i. e., have a smaller mean square error. The decision
as to whether or not to require unbiasedness in this situation
must rest on a consideration of the total error, which arises
from bias and sampling variation together. In general, how-
ever, one should not use a biased estimator unless an upper
bound can be computed for the bias from known properties of the
universe in question.
To further cloud the issue, there may be some problems in which un-
biasedness of the estimate might be more important than a smaller error,
if, say, large amounts of money, or even life, might be lost on a wrong
decision.
3.3.2 Consistency. The criterion of consistency is less stringent
than that of unbiasedness in that it requires unbiasedness "in the lim-
it". The traditional and universally accepted definition of a consist-
ent estimator can be stated as:
$$f(\mathbf{x}) \xrightarrow{p} \theta \quad \text{or} \quad \Pr\left[\,|f(\mathbf{x}) - \theta| > \varepsilon\,\right] < \delta \ \text{ for } n > N(\varepsilon, \delta).$$
This is the definition that is used or cited almost universally in the
books on sampling. However the concept of convergence in probability
leaves something to be desired when one thinks of a finite (rather than
indefinitely large) population. Fisher (1956, p. 145), in fact, makes
the comment in his latest book that "the asymptotic definition is satis-
fied by any statistic whatsoever applied to a finite sample".
For a definition of consistency which applies to samples from
finite populations, it would be best to use Fisher's 1921 definition:
"Consistency.--A statistic satisfies the criterion of con-
sistency if, when it is calculated from the whole population,
it is equal to the required parameter." (p. 310)
This definition is very satisfactory for sample survey theory, and with
this definition, the criterion of consistency is certainly a reasonable
one to require for any estimator.
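Fisher's finite definition lends itself to a direct check: apply the statistic to the whole population and compare the result with the population value. The Python sketch below does this for two estimators of the mean; the population, the function names and the deliberately defective `shrunken_mean` are hypothetical illustrations.

```python
from fractions import Fraction


def is_fisher_consistent(estimator, population, parameter):
    """True if the statistic, computed from the whole population, equals the parameter."""
    return estimator(population) == parameter


def mean(values):
    return Fraction(sum(values), len(values))


def shrunken_mean(values):
    # Hypothetical estimator: divides by n + 1 instead of n.
    return Fraction(sum(values), len(values) + 1)


population = [1, 2, 4, 9]
population_mean = mean(population)

mean_ok = is_fisher_consistent(mean, population, population_mean)
shrunken_ok = is_fisher_consistent(shrunken_mean, population, population_mean)
```

The divisor n + 1 makes no difference "in the limit" for an indefinitely large population, yet the second estimator fails the finite definition for every finite N, which is the distinction the text draws.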
3.3.3 Sufficiency. Sufficiency, at least in the traditional sense,
requires that the whole of the relevant information (not the current
popular usage of "information") available in the sample will be con-
tained in, or utilized by, the estimator(s) which is (are) computed. It
was in this general sense that Fisher first defined sufficiency in 1921,
i.e.
"Sufficiency.--A statistic satisfies the criterion of suffi-
ciency when no other statistic which can be calculated from
the same sample provides any additional information as to
the value of the parameter to be estimated." (p. 310)
or
"... sufficiency, which latter requires that the whole of
the relevant information supplied by a sample shall be con-
tained in the statistics calculated." (p. 367).
From these first general statements a more formal definition has come
into universal usage, this definition being, as given in Fraser (1958,
p. 218):
"We have the definition:
A statistic t(x) is a sufficient statistic, if, given
the value of t(x), the conditional distribution is
independent of the parameters.
Evaluating conditional distributions can often be
tedious. Fortunately we have a criterion that avoids this:
A statistic t(x) is a sufficient statistic if and
only if the probability or density function can be factored
into two parts, one dependent only on the statistic
and the parameters, the second independent of the
parameters."
Sufficiency, then, for the infinite population case is definitely to be
aimed at, although not always obtainable. For a finite population, how-
ever, one cannot admit this concept as being relevant in view of the
considerations set forth below.
In a special sense every sample of any size whatsoever is suffi-
cient for estimating the desired population value. Firstly, surveys are
interested in estimating means, totals, ratios or other functions of the
measurable characteristics revealed by the ultimate units. These pop-
ulation values (which may, only by convention, be termed parameters) are
logically separate from their respective selection probabilities as re-
vealed by the probability system. Secondly, since probabilities enter into
sampling only in the process of selecting the units to be included in
the sample, and not with the characteristics to be measured, the condi-
tional distribution of the sampled characteristic values from any size
sample, depending only on these selection probabilities and the sampling
procedure, is independent of the population value being estimated. Thus
the concept of sufficiency is not relevant, at least in the context of
the universally accepted definition as quoted above from Fraser. (One
might note that this logical separateness of the population values and
the selection probabilities is an essential feature of sample survey
theory. Without this separateness no sampling operation would be pos-
sible and therefore there would be no meaningful theories of sampling.)
Reference might be made to the original definition given by
Fisher, but that is too general and subject to the same type of criti-
cism as just given against the more complete (complex) definition.
The one place that the traditional definition of sufficiency might
fit would be for the case when sampling from a finite population with
replacement where the vector of ni-values (with ni denoting the number
of times that the i-th unit appears in the sample) would be sufficient
for estimating the probabilities of selection of the sampling units.
However, these are known or assumed at the start of the sampling proce-
dure, and have no bearing on the characteristics carried by these units
which are being measured.
Thus one must conclude that the notion of sufficiency has little
meaningful application to estimators based on samples from finite pop-
ulations. Actually Basu (1958) is the only author to argue strongly for
sufficiency; other authors have been silent on the question since (pre-
sumably), as indicated, its logical basis is rather insecure for finite
populations.
3.3.4 Efficiency and Minimum Variance. Efficiency seems to be one
of those concepts for which every author has his own definition, the
common denominator of which seems to be a connection with the idea of
minimum variance; hence they will be discussed together. The one set of
definitions which is directly akin to the problem at hand of choosing
criteria for estimators from finite populations is that an efficient
estimator is that one from among several satisfying a set of other cri-
teria which has minimum variance. That one is taken to be the most
efficient, and the relative efficiency of the other estimators is meas-
ured by the ratio, less than unity, of the minimum variance to their
variance. This notion, as indicated, extends directly to the problem
of estimating from a sample taken from a finite population, since it is
entirely logical that if there is a choice of estimators, the obvious
selection would be the one with the smallest spread or variance.
The other group of definitions centers around asymptotic ideas, and
is one place where Fisher and Neyman agree, to wit:
"Here, again, I agree unreservedly with Fisher that, when
several consistent estimates of the same parameter are
available, all tending to be normally distributed, the
one with the smallest variance is preferable to others."
--Neyman (1952, p. 188)
The dependence on asymptotic normality rules out this definition. In
the case of a single universe of N units, the most that n can tend
to, when sampling with replacement, is N, which cannot yield a nor-
mally distributed estimator.
While discussing minimum variance, a digression might be made from
the main stream of thought to set the historical record straight with the
following quote from Neyman (1952, p. 227):
"Laplace himself studied certain problems on the assumption
that the loss due to an error in estimation is directly
proportional to the absolute value of the error. On the
other hand, Gauss noticed that various results became
more elegant if the loss is assumed to be proportional to
the square of the error committed so that [...]
Upon reflecting on the general nature of errors of measure-
ments, in particular on the possibility of systematic
errors, Gauss found it necessary to impose on the estimate
$F_n(X_n)$ another condition, that of unbiasedness, expressed
by the identity [...]
It will be seen that the two conditions, one of the
unbiasedness of $F_n(X_n)$ and the other of minimum ex-
pected loss measured by the square of the error, formu-
late the now familiar problem of best unbiased estimates.
All this was reported to the Königliche Societät der
Wissenschaften in Göttingen on February 15, 1821, and
subsequently published in Latin."
Of the classical criteria of estimation, these two, which predate almost
all statistical theory, are about the only ones that apply to finite
population problems.
Further, as shown by Das (1951), Godambe (1955) and others, minimum
variance estimators may not exist in an estimable form, because the co-
efficients or weights for the observations necessary to produce a mini-
mum variance estimator may be enmeshed with the other variate values.
3.4 Criteria for Estimators from Finite Populations
With all this, then, where does this subject stand? What criteria
can or should be applied to determine which estimator is optimum when
the sample is from a finite population? Judging from the frequency of
mention in the literature, there would seem to be the following:
1. Consistency: The chosen estimator should be consistent, not
in the sense of convergence in probability to the population value being
estimated, but in the sense of:
Definition: A statistic satisfies the criterion of consistency
if, when it is calculated from the whole population, it is equal
to the required population value.
The more restrictive condition of unbiasedness could well be listed as a
universal criterion, if it were not for the fact that there may be sit-
uations where an estimate with a disappearing bias will better meet
other criteria for "bestness". If there is no bias, then consistency is
assured. If a bias is to be allowed to be present, however, one should
be able to determine an upper limit for that bias in terms of some char-
acteristics of the sample or population.
2. Minimum variance or minimum mean square error: In the case of
an unbiased estimator, these two are the same, but in the general case
they are related in the following manner: $\mathrm{MSE} = V + (\mathrm{Bias})^2$. Except in
the case where there are compelling reasons for ignoring this criterion,
it is certainly evident that one would want an estimator which gave as
narrow a spread to the estimates of the population value as possible.
It would seem that these are the two major criteria, at least from
among those based on probabilistic considerations. They have their rel-
ative importance determined by the given particular situation at hand.
Both are to be desired, but there may be situations where one or the
other of them is an overriding consideration to the detriment of the
other, for example, where unbiasedness is of such importance that var-
iance is taken at face value rather than considered a restriction. Or,
as indicated earlier, a minimum variance estimator might not exist in an
estimable form, although one might then modify "minimum" to choose the
estimator with the smaller variance.
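The relation MSE = V + (Bias)^2 between the two forms of the second criterion can be verified exactly for a biased estimator by enumerating all samples. The Python sketch below assumes equal-probability sampling with replacement; the population values, the estimator and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mse_decomposition(estimator, y, n, target):
    """Exact (MSE, variance, bias) over all equal-probability ordered samples."""
    N = len(y)
    prob = Fraction(1, N ** n)
    estimates = [estimator([y[i] for i in s]) for s in product(range(N), repeat=n)]
    mean = sum(prob * e for e in estimates)
    variance = sum(prob * (e - mean) ** 2 for e in estimates)
    mse = sum(prob * (e - target) ** 2 for e in estimates)
    return mse, variance, mean - target


def shrunken_mean(values):
    # Hypothetical biased estimator of the mean: divides by n + 1.
    return Fraction(sum(values), len(values) + 1)


y = [1, 2, 4, 9]  # hypothetical population values, mean 4
target = Fraction(sum(y), len(y))
mse, variance, bias = mse_decomposition(shrunken_mean, y, n=2, target=target)
```

Here the bias is known in advance from the form of the estimator, which is the situation the text requires before a biased estimator is admitted.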
There are two other criteria which might be mentioned, although
they are of lesser rank than consistency and minimum mean square error,
and not based on probabilistic considerations. These are:
3. Cogredience (Independence of scale): To satisfy this criterion,
our estimate $f(\mathbf{x})$ must have the property $f(c\,\mathbf{x}) = c\,f(\mathbf{x})$. For example,
if two people are estimating lengths from the same observations, one
measuring in feet and the other in meters, we would like them to get
equivalent estimates, expressed in feet and meters respectively. (This
really should be taken care of in interviewer instructions.)
4. Ease of Computation: It would seem desirable, all other things
being equal, that an estimate be easy to compute. The more complicated
the form of an estimator, the more expensive it is to produce estimates
and the more time it may take to get results which can be acted on.
With the advent of the large computers, though, this objection may be
disappearing. Also along these lines, if past history continues, the
process of adapting techniques so that they can be handled on the com-
puters may well indicate that, with further work, simplifications and
short cuts can be developed and approximations found which would serve
most purposes.
The third criterion mentioned would seem to be essential, although,
as mentioned, it should be required before the consideration of estima-
tion problems. Ease of computation should possibly be considered a
desideratum, rather than a criterion, but that is a matter of semantics
--certainly it is not as dominant a criterion as either of the first two
mentioned.
Thus there are two major criteria to apply to the problem of selecting
an estimator from among a class of estimators for samples from a
finite population: that it be consistent and that it have a minimum
mean square error.
4.0 THE GENERAL CLASSES OF LINEAR ESTIMATORS FOR SAMPLING WITH REPLACEMENT
4.1 Introductory Remarks
The first formal approach to the problem of determining classes of
estimators for samples from finite populations was that of Horvitz and
Thompson (1952). They formulated three classes of linear estimators
for the population total for a scheme of sampling from a finite population
with arbitrary probabilities and without replacement. These classes
were formed by having coefficients dependent on the order of draw, the
presence or absence of a unit in the sample, and on the particular sam-
ple drawn, respectively. However, they did not formalize their ideas
for establishing classes of estimators, and thus did not pursue them
further.
Godambe (1955) formulated what he called a "unified theory" of
sampling from finite populations. This theory, actually a generalized
basic theory, was not axiomatic in nature, although Godambe apparently
recognized some essence of formality in his approach and that of Horvitz
and Thompson (but he too failed to formalize the deductive process).
For his theory Godambe did posit a generalized notational system which
could cover both probability systems where the units are drawn with or
without replacement; however, this system is not an operational system
for determining probabilities. It will be seen that Godambe's most
general estimator would fall into class seven under the axiomatic
approach presented subsequently.
Realizing that one must have some definite set of rules for establishment
of the classes when formulating groups or classes of estimators
for samples from finite populations, Koop (1957) developed an axiomatic
approach. This axiomatic approach, with axioms based on the physical
realities of sample formation, i.e., the way things actually happen,
would seem much more basic for establishing classes of estimators for
samples from finite populations than attempts at utilizing the classical
"infinite" estimation criteria such as unbiasedness, sufficiency,
admissibility, completeness, etc. In fact, the notion that there are
criteria for which one can develop classes of estimators is not germane to
sample surveys. In sample survey theory, the classes of estimators are
developed first, e.g., by axiomatic methods, and then criteria such as
unbiasedness or minimum mean square error are applied to various
estimators in each class to attempt a determination of bestness within that
class. The generality of the axiomatic approach is also of considerable
theoretical advantage because it provides the basis for determining the
optimum probabilities in any defined sense.
For sampling from finite populations, axioms, to be useful, must
be based on physical realities, since sample survey theory is
operational in a physical sense. These axioms, as postulated by Koop, are
"three features inherent in the nature of the process of selection" of
the sample. They are stated as follows:
"(i) the order of appearance of the elements,
(ii) the presence or absence of any given element (in the sample)
which is a member of the population (or universe), and
(iii) the set of elements composing the sample considered as one
of the total number possible (in repeated sampling according
to the given probability system)." (1957, p. 25)
These three possible features, or combinations thereof, which are
inherent in the selection process and therefore sampling procedure, supply
the basis for the deductive construction of seven general classes of
estimators. The seven result from taking the axioms singly, two at a
time, and, most generally, all three together.
He derived the classes of estimators for estimating the total of a
finite population when sampling with arbitrary probabilities and without
replacement. This thesis will consider the case of sampling from a
finite population with arbitrary probabilities, but with replacement of
each sampling unit preceding the drawing of another unit.
These estimators of the population total (note that the choice
between discussing the total or the mean is completely arbitrary) for a
characteristic under study will be listed and discussed in Sections 4.4
through 4.10 inclusive. For each class of estimator, weights or
coefficients will be determined which (a) satisfy the criterion of
unbiasedness, (b) are independent of the properties of the population, i.e., of
the measurable characteristic(s) under observation in the sample, and
(c) are positive.
In connection with requirement (b), Koop has shown that, for the
general classes of estimators, minimum variance estimators do exist, but
the weights for such estimators are enmeshed with the variate values of
the characteristics of the sampling units. Thus, although theoretically
existent, such weights are non-estimable; hence for all practical
purposes, minimum variance estimators do not exist. For this reason this
study will restrict consideration to weights which are independent of
the values of the characteristics under study.
4.2 Probability System and Notation
4.2.1 The Probability System. Consider a population of N sampling
units, $U_1, U_2, \ldots, U_N$. Associated with each of these units is
a vector of measurable characteristics, say $Y_i = (Y_{1i}, Y_{2i}, \ldots, Y_{hi})$.
A sample of n of these N units is to be drawn in a manner which is
completely specified before the sampling procedure begins, and from ob-
serving certain of the vector characteristics of the sample units it is
desired to estimate the aggregate of these characteristics pertaining to
the universe under consideration. For drawing the sample it is given
that the probabilities of selection at any given draw are arbitrary
(arbitrary in the sense that they can assume discretionary non-negative
values, not necessarily equal) with the sole restriction that when
summed over all units in the universe they sum to one.
Also for the system under consideration, each unit is required to
be returned to the universe after it is drawn and measured, and before
the next unit is drawn. The case where the sampling is done without
replacement of the units has already been mentioned above. The most
general case where the units may or may not be replaced, depending on
some arbitrary or systematic method of determination, or where they may
be replaced in clumps after a certain number have been drawn, or some
such chaotic situation will not be discussed, for fairly obvious reasons.
Within this framework, then, the following probabilities are to be
considered, with attendant notation. For an explanation of notation see
Section 4.2.2.
$p_i$ -- the probability of drawing the i-th unit on any given draw. These
     $p_i$ values will be constant for all draws. The only restrictions
     on the values of the $p_i$ are that $0 < p_i < 1$ and that
     $p_1 + p_2 + \cdots + p_N = 1$. Allowing either equality in the bounds
     on the $p_i$'s effectively reduces the size of the universe and thus
     will not be considered.
$p_i^* = 1 - (1 - p_i)^n$ = the probability that the i-th unit will appear,
     any number of times, in a sample of n units drawn with replacement.
     $\sum_{i=1}^{N} p_i^* = E(v)$, where v is the number of distinct units
     among the n units drawn (see also Section 8.0).
$P_s = \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a given particular
     sample, where $\sum_{i \in s_v} n_i = n$.
$P_{s'} = \dfrac{n!}{\prod_{i \in s_v} n_i!} \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a specific
     combination of units, disregarding order of draw, but with the same
     number of appearances of each unit in the sample (i.e., a constant
     $n_i$ vector, $n_i \ge 0$). This would be the sum of $n! / \prod n_i!$ $P_s$-terms.
$P_{s_v} = \sum_{P(n|v)} \dfrac{n!}{\prod_{i \in s_v} n_i!} \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a given
     distinct sample, that is, the set of samples with the same set of
     distinct units. The $n_i$ vector is disregarded other than whether the
     elements are non-zero. This would be the sum of $\Delta^v 0^n$ $P_s$-terms.
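The relation between the inclusion probabilities and the expected number of distinct units can be checked by brute force on a small population. The sketch below is illustrative only: it assumes the four-unit selection probabilities used in the numerical example of Section 4.5.5 (1/2, 1/4, 1/8, 1/8) with n = 3, and the function names are hypothetical. It enumerates all N^n ordered samples and compares E(v) with the sum of the $p_i^*$.

```python
from fractions import Fraction
from itertools import product

# Assumed illustration: selection probabilities from the Section 4.5.5 example.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
n = 3


def inclusion_probs(p, n):
    """p_i^* = 1 - (1 - p_i)^n for every unit."""
    return [1 - (1 - pi) ** n for pi in p]


def expected_distinct(p, n):
    """E(v): enumerate every ordered sample and average its number of distinct units."""
    total = Fraction(0)
    for s in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in s:
            prob *= p[i]
        total += prob * len(set(s))
    return total


p_star = inclusion_probs(p, n)
print([float(v) for v in p_star], float(sum(p_star)), float(expected_distinct(p, n)))
```

Exact fractions are used so that the two sides can be compared for equality rather than to within rounding error.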
4.2.2 Notation and Definitions. Definitions:
Particular sample -- a given individual sample, i.e., the ordered array
     of units resulting from the n draws comprising the sample. They
     will be $S = N^n$ in number.
Distinct sample -- a sample containing a set of v distinct units,
     disregarding the number of times each unit appears. A distinct sample
     is the set of particular samples with the same $\lambda_i u_i$ vector, where
     $\lambda_i = 1$ if the i-th unit appears in the sample any number of times,
     $\lambda_i = 0$ for nonappearance. The $n_i$ vector is disregarded other than
     whether its elements are non-zero. There will be
     $S' = \sum_{v=1}^{n} S_v = \sum_{v=1}^{n} C^{N}_{v}$ distinct samples.
     For each distinct sample of v distinct units there are $\Delta^v 0^n$
     particular samples. [Ref. Riordan (1958, p. 91).]
Indices (Subscripts):
i      the unit index for the universe. i = 1, 2, ..., N.
s      refers to a particular sample, or is the index for summation
       over all particular samples. s = 1, 2, ..., S.
$s_v$  refers to a distinct sample, or is the index for summation
       over all distinct samples. $s_v$ = 1, 2, ..., S'.
t      the index denoting the order of draw for the sample units.
       t = 1, 2, ..., n.
v      an index for summation over the different possible numbers
       of distinct units. It also denotes the number of distinct
       units among the n units in the sample ($1 \le v \le n$).
Letters:
n      the sample size.
$n_i$  the number of times the i-th unit appears in the sample;
       $\sum_{i \in s_v} n_i = n$.
S      refers to particular samples, $N^n$ in number.
S'     denotes the total number of distinct samples, $\sum_{v=1}^{n} C^{N}_{v}$ in
       number.
$S_v$  denotes the number of distinct samples of size v, i.e., the
       samples of n with v distinct units, $C^{N}_{v}$ in number for a
       given v.
z      (with a subscript) will be used as a characteristic random
       variable to denote appearance of a unit according to the
       specific subscript assigned.
$\Delta^v 0^n$ -- the differences of zero notation. $\Delta$ is the finite difference
       operator: $\Delta u_n = u_{n+1} - u_n$. Thus the differences of zero
       would be
       $\Delta^2 0^n = (2^n - 1^n) - (1^n - 0^n) = 2^n - 2(1^n) + 0^n$
       $\Delta^3 0^n = [(3^n - 2^n) - (2^n - 1^n)] - [(2^n - 1^n) - (1^n - 0^n)]$
       For additional discussion see Whittaker and Robinson (1944)
       or Riordan (1958). See also Section 4.3(3). Tables of $\Delta^v 0^n$
       were given by Stevens (1937) and were reprinted by Fisher and
       Yates (1949, table 22).
P(n|v) -- denotes the v-part partitions of n, that is, all sets of
       non-zero values for $n_i$ ($i \in s_v$) such that $\sum_{i \in s_v} n_i = n$. The full
       (proper) partitional notation, as given by Chrystal (1900,
       p. 556), would be $P(n \,|\, v \,|\, \le n-(v-1))$, i.e., the partitions of
       n into v parts no one of which exceeds n - (v - 1), but the
       shorter form will be used.
$i \in s_v$ -- those i's (units) contained in a distinct sample.
$s \supset i$ ($s_v \supset i$) -- those samples (distinct samples) which contain
       the i-th unit.
4.3 Some Combinatorial Considerations
The following are some combinatorial considerations concerning a
sample of size n drawn from a finite population of size N with
replacement.
(1) The total number of possible samples is $S = N^n$ since each
unit drawn is replaced prior to the drawing of the next unit.
(2) The total number of distinct samples, i.e., samples containing
different sets of v ($1 \le v \le n$) distinct units, is
\[ S' = \sum_{v=1}^{n} S_v = C^{N}_{1} + C^{N}_{2} + \cdots + C^{N}_{n} \]
since there are $C^{N}_{v}$ combinations of v units from a total of N.
(3) The total number of samples of size n which will contain v
distinct units, i.e., the number of ways of putting n different objects
into v different cells, with no cell (among the v) empty, is given by
Riordan (1958, p. 91) as:
\[ \Delta^v 0^n = v! \, S(n, v) \]
where S(n, v) is the Stirling number of the second kind. This could
also be written:
\[ \Delta^v 0^n = \sum_{k=0}^{v} (-1)^k C^{v}_{k} (v-k)^n \]
from which it follows that
\[ \Delta^v 0^n = 1 \text{ for } v = 1, \qquad = n! \text{ for } v = n, \qquad = 0 \text{ for } v > n. \]
(4) From paragraphs (2) and (3) another expression for the total
number of possible samples would be:
\[ N^n = \sum_{v=1}^{n} C^{N}_{v} \, \Delta^v 0^n . \]
(5) Consider all those samples of size n, each containing v
distinct units; the total number which contain the i-th unit is $C^{N-1}_{v-1}$.
(6) Thus the total number of particular samples containing the
i-th unit is
\[ \sum_{v=1}^{n} C^{N-1}_{v-1} \, \Delta^v 0^n . \]
(7) If one is given that the sample of n contains v distinct
units, and that one of those units is the i-th unit, there might be some
interest in the number of those $\Delta^v 0^n$ samples which contain the i-th
unit a given number of times. Then with the help of the respective
diagrams it is easy to see that the number which contain i
[Diagrams: the i-th unit occupies r of the n draws; the remaining n - r
draws are filled by the v - 1 other distinct units.]
     once is:                $C^{n}_{1} \, \Delta^{v-1} 0^{n-1}$
     twice is:               $C^{n}_{2} \, \Delta^{v-1} 0^{n-2}$
     ...
     n - (v - 1) times is:   $C^{n}_{n-(v-1)} \, \Delta^{v-1} 0^{v-1}$
thus the number which contain i exactly r times is
$C^{n}_{r} \, \Delta^{v-1} 0^{n-r}$, for r = 1, 2, ..., n - (v - 1).
(8) It follows from (7) that the total number of times that the
i-th unit will appear in the samples of size n with v distinct units
is
\[ \sum_{r=1}^{n-(v-1)} r \, C^{n}_{r} \, \Delta^{v-1} 0^{n-r} . \]
From this, the total number of times that the i-th unit will appear in
all $N^n$ samples (i.e., v = 1, 2, ..., n) will be:
\[ I = n + \sum_{v=2}^{n} C^{N-1}_{v-1} \sum_{r=1}^{n-(v-1)} r \, C^{n}_{r} \, \Delta^{v-1} 0^{n-r} . \]
I, as a total quantity, can be derived much more simply by noting that
the number of appearances of a particular unit, say the i-th, among the
$n N^n$ units which appear in the $N^n$ samples is symmetric in the N units,
and thus
\[ I = n N^n / N = n N^{n-1} . \]
This approach, however, does not provide any information concerning the
component structure of I.
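The counting statements of this section lend themselves to a brute-force check on small N and n. The following sketch is not part of the original development. It assumes the standard inclusion-exclusion form of the differences of zero, $\Delta^v 0^n = \sum_k (-1)^k C^{v}_{k} (v-k)^n$, and the names `diff_zero` and `check` are hypothetical. It enumerates all ordered samples and compares the enumerated counts with the sums over v described in paragraphs (4), (6) and (8).

```python
from itertools import product
from math import comb


def diff_zero(v, n):
    """Differences of zero, Delta^v 0^n, via inclusion-exclusion (assumed standard form)."""
    return sum((-1) ** k * comb(v, k) * (v - k) ** n for k in range(v + 1))


def check(N, n):
    """Compare enumerated counts with the sums over v; return I for unit 0."""
    samples = list(product(range(N), repeat=n))

    # Paragraphs (1) and (4): total number of particular samples.
    assert len(samples) == N ** n
    assert len(samples) == sum(comb(N, v) * diff_zero(v, n) for v in range(1, n + 1))

    # Paragraph (6): particular samples containing a given unit (unit 0 here).
    containing = sum(1 for s in samples if 0 in s)
    assert containing == sum(comb(N - 1, v - 1) * diff_zero(v, n) for v in range(1, n + 1))

    # Paragraph (8): total number of appearances of unit 0 over all samples.
    appearances = sum(s.count(0) for s in samples)
    total = n  # v = 1: the single sample consisting of unit 0 drawn n times
    for v in range(2, n + 1):
        inner = sum(r * comb(n, r) * diff_zero(v - 1, n - r) for r in range(1, n - v + 2))
        total += comb(N - 1, v - 1) * inner
    assert appearances == total == n * N ** (n - 1)
    return total


print(check(4, 3))
```

For N = 4 and n = 3 the component sum gives 3 + 27 + 18 = 48, which agrees with $n N^{n-1}$.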
4.4 Class One Estimator
4.4.1 The Estimator (of the universe total). The class one estimator,
with weights dependent solely on the order of appearance, is given
by
\[ T_1 = \sum_{t=1}^{n} \alpha_t x_{i_t} \tag{4.4-1} \]
where $\alpha_t$ (t = 1, 2, ..., n) is the weight attached to the element
selected at the t-th draw and $x_{i_t}$ is the value of the characteristic
measured on the i-th unit observed on the t-th draw.
4.4.2 Number of Weights. The total number of weights is n, one
for each draw.
4.4.3 Determination of the Weights. The first step in the determination
of weights is to determine the expectation of $T_1$, as follows:
\[ E(T_1) = E\Big( \sum_{t=1}^{n} \alpha_t x_{i_t} \Big) = \sum_{t=1}^{n} \alpha_t E(x_{i_t}) . \]
For $T_1$ to be unbiased $E(T_1)$ must identically equal $\sum_{i=1}^{N} x_i$, i.e.,
\[ \sum_{i=1}^{N} x_i p_i \sum_{t=1}^{n} \alpha_t \equiv \sum_{i=1}^{N} x_i \]
which requires that
\[ p_i \sum_{t=1}^{n} \alpha_t = 1 , \quad \text{for } i = 1, 2, \ldots, N. \tag{4.4-2} \]
This condition effectively says that for $T_1$ to exist as an estimator,
all the $p_i$ must be equal, i.e.,
\[ p_i = p = 1/N . \]
Hence the $\alpha_t$ exist only when the $p_i$ are equal and not in the general
case. In this situation, a solution is
\[ \alpha_t = N/n , \quad t = 1, 2, \ldots, n \tag{4.4-3} \]
so that
\[ T_1 = \sum_{t=1}^{n} \frac{N}{n} x_{i_t} . \tag{4.4-4} \]
This is a well known estimator which is readily seen to be unbiased.
4.4.4 Variance of $T_1$. The variance of $T_1$, for the case when
probabilities are equal and when $\alpha_t = N/n$, is
\[ V(T_1) = \frac{N^2}{n^2} V\Big( \sum_{t=1}^{n} x_{i_t} \Big) . \]
Because of independence of the draws, one from another,
\[ V(T_1) = \frac{N^2}{n^2} \, n \sigma^2 \tag{4.4-5} \]
which can be estimated by
(4.4-6)
4.5 Class Two Estimator
4.5.1 The Estimator. The class two estimator, with weights dependent
solely on the presence or absence of a given element in the sample,
is given by
\[ T_2 = \sum_{i \in s_v} \beta_i x_i \tag{4.5-1} \]
where $\beta_i$ (i = 1, 2, ..., N) is the weight attached to the i-th element
whenever it appears in the sample, and where $i \in s_v$ denotes summation
over the distinct units in the sample ($v \le n$).
4.5.2 Number of Weights. The total number of weights is N, one
for each sampling unit.
4.5.3 Determination of the Weights. Since the weights, the $\beta_i$,
are attached whenever the i-th element appears in the sample, and summation
is over the distinct units, to determine the $\beta_i$ (4.5-1) must be
rewritten as
\[ T_2 = \sum_{i=1}^{N} \beta_i x_i z_i \tag{4.5-2} \]
where $z_i$ is a characteristic random variable for which
     $z_i = 1$ when the i-th unit appears in the sample, irrespective
               of the number of times it appears,
     $z_i = 0$ when the i-th unit does not appear in the sample, and
where, see (8-10),
\[ E(z_i) = p_i^* . \]
Then:
\[ E(T_2) = \sum_{i=1}^{N} \beta_i x_i p_i^* . \]
For unbiasedness
\[ \sum_{i=1}^{N} \beta_i p_i^* x_i \equiv \sum_{i=1}^{N} x_i \]
which imposes the requirement that
\[ \beta_i p_i^* = 1 \quad \text{for all } i. \]
Therefore, for unbiasedness, it is found that the weights are uniquely
determined as
\[ \beta_i = 1 / p_i^* . \tag{4.5-3} \]
Thus the unbiased linear estimator for class two is
\[ T_2 = \sum_{i \in s_v} \frac{x_i}{p_i^*} \tag{4.5-4} \]
where $p_i^* = 1 - (1 - p_i)^n$
          = the probability that the i-th unit appears in a sample of
            n units drawn with replacement.
This is the analogue of the Horvitz-Thompson estimator. It also has
been propounded, for the case of equal selection probabilities, by
Godambe (1955) and Roy and Chakravarti (1958).
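The unbiasedness of the class two estimator can be confirmed by complete enumeration. The sketch below is an illustration, not the author's computation: it assumes the four-unit population of Section 4.5.5, Case A (values 3, 4, 8, 5 with selection probabilities 1/2, 1/4, 1/8, 1/8, n = 3), and the function names are hypothetical. Each distinct unit in a sample contributes $x_i / p_i^*$ exactly once, however many times it was drawn.

```python
from fractions import Fraction
from itertools import product

# Assumed illustration: Case A of the Section 4.5.5 example.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
x = [3, 4, 8, 5]
n = 3


def t2(sample):
    """Class two estimate: sum of x_i / p_i^* over the distinct units of the sample."""
    return sum(x[i] / (1 - (1 - p[i]) ** n) for i in set(sample))


def expectation(estimator):
    """Expected value over all N^n ordered samples drawn with replacement."""
    total = Fraction(0)
    for s in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in s:
            prob *= p[i]
        total += prob * estimator(s)
    return total


print(expectation(t2), sum(x))
```

With exact fractions the expectation equals the population total of 20 exactly, so the small departures from 20 reported in a hand computation can be attributed to rounding.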
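4.5.4 Variance of $T_2$. The variance of $T_2$ is given by
\[ V(T_2) = V\Big( \sum_{i=1}^{N} \beta_i x_i z_i \Big) . \]
Substituting (8-12) and (8-13) into this expression, and also
substituting for the $\beta_i$, yields
\[ V(T_2) = \sum_{i=1}^{N} \frac{x_i^2}{p_i^{*2}} \, p_i^* q_i^*
          - \sum_{i \ne j}^{N} \frac{x_i x_j}{p_i^* p_j^*} \, ( q_i^* q_j^* - q_{ij}^* ) \]
where $p_i^* = 1 - (1 - p_i)^n$
      $q_i^* = 1 - p_i^* = (1 - p_i)^n$
      $q_{ij}^* = (1 - p_i - p_j)^n$.
This can be written more concisely as
\[ V(T_2) = \sum_{i=1}^{N} \frac{(1 - p_i^*)}{p_i^*} x_i^2
          - \sum_{i \ne j}^{N} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^*} x_i x_j . \tag{4.5-5} \]
An estimate of $V(T_2)$ can be obtained as
\[ \hat{V}(T_2) = \sum_{i \in s_v} \frac{(1 - p_i^*)}{p_i^{*2}} x_i^2
          - \sum_{i \ne j \in s_v} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^* p_{ij}^*} x_i x_j . \tag{4.5-6} \]
The functional similarity between this and the Horvitz-Thompson variance
formula is readily apparent.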
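As a check on the variance of the class two estimator, the closed form in terms of $p_i^*$, $q_i^*$ and $q_{ij}^*$ can be compared with the variance obtained by enumerating every possible sample. The sketch below assumes the Case A population of Section 4.5.5 and uses hypothetical function names. The closed form coded here is the one-minus-inclusion-probability term less the cross-product term, with the cross-product sum running over ordered pairs i ≠ j.

```python
from fractions import Fraction
from itertools import product

# Assumed illustration: Case A of the Section 4.5.5 example.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
x = [3, 4, 8, 5]
n = 3
N = len(p)

q_star = [(1 - pi) ** n for pi in p]   # probability unit i never appears
p_star = [1 - qi for qi in q_star]     # probability unit i appears at least once


def variance_formula():
    """Closed-form variance of T2, summing the cross term over ordered pairs i != j."""
    first = sum(q_star[i] / p_star[i] * x[i] ** 2 for i in range(N))
    cross = sum(
        (q_star[i] * q_star[j] - (1 - p[i] - p[j]) ** n)
        / (p_star[i] * p_star[j]) * x[i] * x[j]
        for i in range(N) for j in range(N) if i != j
    )
    return first - cross


def variance_direct():
    """Variance of T2 over all N^n ordered samples."""
    mean = Fraction(0)
    second = Fraction(0)
    for s in product(range(N), repeat=n):
        prob = Fraction(1)
        for i in s:
            prob *= p[i]
        t2 = sum(x[i] / p_star[i] for i in set(s))
        mean += prob * t2
        second += prob * t2 ** 2
    return second - mean ** 2


print(float(variance_formula()), float(variance_direct()))
```

Because the arithmetic is exact, the two routes agree identically. Any gap between them in a four-decimal hand calculation is a rounding effect.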
4.5.5 Numerical Example. To illustrate the procedures involved in
Section 4.5, consider the following examples, based on all possible
samples of size n = 3 from two simple four-unit populations.

          Unit:                      A         B         C         D
          p_i                       1/2       1/4       1/8       1/8
          p_i* = 1 - (1 - p_i)^3   .8750     .5781     .3301     .3301
Case A:   x_i                        3         4         8         5
          x_i / p_i*               3.4286    6.9192   24.2351   15.1469
Case B:   x_i                        8         5         4         3
          x_i / p_i*               9.1429    8.6490   12.1175    9.0882

When setting up Case A, the numerical values were assigned to the units
at random. It was also deemed advisable to examine the situation where
the probabilities are somewhat proportional to the size of the units;
thus the numerical values were reassigned to the letter-units to produce
this situation as Case B.
When drawing samples of size 3 (n = 3) with replacement, the following
distinct samples are possible:
A:   AAA(64)
B:   BBB(8)
C:   CCC(1)
D:   DDD(1)
AB:  AAB(32), ABA(32), BAA(32), BBA(16), BAB(16), ABB(16)
AC:  AAC(16), ACA(16), CAA(16), CCA( 4), CAC( 4), ACC( 4)
AD:  AAD(16), ADA(16), DAA(16), DDA( 4), DAD( 4), ADD( 4)
BC:  BBC( 4), BCB( 4), CBB( 4), CCB( 2), CBC( 2), BCC( 2)
BD:  BBD( 4), BDB( 4), DBB( 4), DDB( 2), DBD( 2), BDD( 2)
CD:  CCD( 1), CDC( 1), DCC( 1), DDC( 1), DCD( 1), CDD( 1)
ABC: ABC( 8), ACB( 8), BAC( 8), BCA( 8), CAB( 8), CBA( 8)
ABD: ABD( 8), ADB( 8), BAD( 8), BDA( 8), DAB( 8), DBA( 8)
ACD: ACD( 4), ADC( 4), CAD( 4), CDA( 4), DAC( 4), DCA( 4)
BCD: BCD( 2), BDC( 2), CBD( 2), CDB( 2), DBC( 2), DCB( 2).
The number following each sample, when divided by 512, is the
probability of obtaining that particular sample ($P_s$).
This example, with N = 4, n = 3, produces the following results for
the class two estimator:

                               T_2 = sum over i in s_v of x_i / p_i*
Distinct sample   512 P_{s_v}      Case A       Case B
     A                64           3.4286       9.1429
     B                 8           6.9192       8.6490
     C                 1          24.2351      12.1175
     D                 1          15.1469       9.0882
     AB              144          10.3478      17.7919
     AC               60          27.6637      21.2604
     AD               60          18.5755      18.2311
     BC               18          31.1543      20.7665
     BD               18          22.0661      17.7372
     CD                6          39.3820      21.2057
     ABC              48          34.5829      29.9094
     ABD              48          25.4947      26.8801
     ACD              24          42.8106      30.3486
     BCD              12          46.3012      29.8547

From this then,
for Case A (3-4-8-5): 512 E(T2) = 10239.6540 or E(T2) = 19.9993 ;
for Case B (8-5-4-3): 512 E(T2) = 10239.8865 or E(T2) = 19.9998 ;
i.e., the estimator is unbiased since T = 20.0000.
Using (4.5-5), the variances for these examples can be determined
as follows:

Unit    p_i*     q_i*    q_i*/p_i*   x_iA^2   x_iB^2   x_iA^2 q_i*/p_i*   x_iB^2 q_i*/p_i*
 A     .8750    .1250     .1429         9       64          1.2861             9.1456
 B     .5781    .4219     .7298        16       25         11.6768            18.2450
 C     .3301    .6699    2.0294        64       16        129.8816            32.4704
 D     .3301    .6699    2.0294        25        9         50.7350            18.2646
                                                        A: 193.5795        B: 78.1256

                                     AB       AC       AD       BC       BD       CD
q_i* q_j*                          .0527    .0837    .0837    .2826    .2826    .4488
q_ij* = (1 - p_i - p_j)^n          .0156    .0527    .0527    .2441    .2441    .4219
q_i* q_j* - q_ij*                  .0371    .0310    .0310    .0385    .0385    .0269
p_i* p_j*                          .5058    .2888    .2888    .1908    .1908    .1090
(q_i* q_j* - q_ij*) / p_i* p_j*    .0733    .1073    .1073    .2018    .2018    .2468
x_i x_j (A)                          12       24       15       32       20       40
x_i x_j (B)                          40       32       24       20       15       12

Sum of cross products:   A: 25.4299    B: 18.9654
                x 2         50.8598       37.9308

Thus, for Case A:
     V(T2)A = 193.5795 - 50.8598
            = 142.7197 ,
and, for Case B:
     V(T2)B = 78.1256 - 37.9308
            = 40.1948.
When computed directly from the possible sample estimates, i. e., by
using
the following results are obtained:
Case A: V(T2)A = 142.6741
Case B: V(T2)B = 40.1744
The slight discrepancies are due to rounding off errors. Further, using
(4.5-6), the following estimates of the variance are obtained for the
various possible samples listed above:

Sample      Case A       Case B
  A             not estimable
  B             not estimable
  C             not estimable
  D             not estimable
  AB        19.8801      36.0524
  AC       389.4929     101.5662
  AD       151.7662      60.3442
  BC       396.5720     119.2456
  BD       163.2139      78.8808
  CD       513.0051     143.4502
  ABC      390.8153     116.4865
  ABD      159.4966      77.9345
  ACD      505.6374     141.2127
  BCD      505.4355     156.3205
4.6 Class Three Estimator
4.6.1 The Estimator. The class three estimator, with weights
dependent solely on the distinct sample drawn, is given by
\[ T_3 = \gamma_{s_v} \sum_{i \in s_v} x_i \tag{4.6-1} \]
where $\gamma_{s_v}$ ($s_v$ = 1, 2, ..., S') is the weight attached to the $s_v$-th
distinct sample whenever it appears. S' is the number of distinct
samples and $i \in s_v$ again denotes summation over the distinct units in the
sample.
4.6.2 Number of Weights. The total number of weights is
$S' = \sum_{v=1}^{n} C^{N}_{v}$, with there being $C^{N}_{v}$ different sets of v distinct units
in the sample of n.
4.6.3 Determination of the Weights. Imposing the criterion of
unbiasedness says
\[ \sum_{s=1}^{S} P_s \, \gamma_{s_v} \sum_{i \in s_v} x_i \equiv \sum_{i=1}^{N} x_i \tag{4.6-2} \]
where $P_s$ denotes the probability of obtaining the s-th sample. (4.6-2)
can be rewritten as
\[ \sum_{i \in U} x_i \sum_{s \supset i} \gamma_{s_v} P_s \equiv \sum_{i=1}^{N} x_i \]
where $i \in U$ denotes all i in the universe and $s \supset i$ denotes those
samples containing the i-th unit. For $T_3$ to be unbiased requires that
\[ \sum_{s \supset i} \gamma_{s_v} P_s = 1 \quad \text{for all } i. \]
This expression can be rewritten as
\[ \sum_{s \supset i} \gamma_{s_v} P_s
   = \sum_{v=1}^{n} \sum_{s_v \supset i} \sum_{s \in s_v} \gamma_{s_v} P_s
   = \sum_{v=1}^{n} \sum_{s_v \supset i} \gamma_{s_v} \sum_{s \in s_v} P_s \]
where, in the triple sum, the first summation is over the possible
values of v. The second summation is over those distinct sets with v
units which contain the i-th unit. The third summation (with index
$s \in s_v$) groups the particular samples (sets of n, ordered by draw) into
distinct sets (those with distinct sets of v units). The third sum will
group $\Delta^v 0^n$ terms together, one for each particular sample within the
distinct sample, and one can readily observe that
\[ \sum_{s \in s_v} P_s = P_{s_v} . \]
Thus the requirement for unbiasedness becomes
\[ \sum_{v=1}^{n} \sum_{s_v \supset i} \gamma_{s_v} P_{s_v} = 1 , \tag{4.6-3} \]
from which a solution for the $\gamma_{s_v}$ is
\[ \gamma_{s_v} = \frac{1}{n \, C^{N-1}_{v-1}} \cdot \frac{1}{P_{s_v}} \tag{4.6-4} \]
since
\[ \sum_{v=1}^{n} \sum_{s_v \supset i} \frac{1}{n \, C^{N-1}_{v-1}}
   = \sum_{v=1}^{n} \frac{C^{N-1}_{v-1}}{n \, C^{N-1}_{v-1}} = \frac{n}{n} = 1 . \]
Thus, from (4.6-4),
\[ T_3 = \frac{1}{n \, P_{s_v} \, C^{N-1}_{v-1}} \sum_{i \in s_v} x_i . \tag{4.6-5} \]
Note that a more general solution would be
\[ \gamma_{s_v} = c_{s_v} / P_{s_v} \]
with the restriction that
\[ \sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v} = 1 . \]
One could thus obtain additional solutions for the $\gamma$'s by suitable
manipulation of the c's.
From the requirement for unbiasedness, (4.6-3), it can be seen that
one can obtain the class two estimator for the restrictive case of equal
selection probabilities. Assuming that the $\gamma_{s_v}$ are equal over all $s_v$,
(4.6-3) becomes
\[ \gamma \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = 1 \]
or, since
\[ \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = p^* = \text{the inclusion probability for the i-th unit,} \]
\[ \gamma = \frac{1}{p^*} \quad \text{and} \quad T_3(\text{equal}) = \frac{1}{p^*} \sum_{i \in s_v} x_i . \]
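4.6.4 Variance of $T_3$. One can determine the variance of $T_3$ as
follows:
\[ V(T_3) = \sum_{v=1}^{n} \sum_{s_v=1}^{S_v} P_{s_v} \Big( \gamma_{s_v} \sum_{i \in s_v} x_i \Big)^2 - T^2 . \tag{4.6-6} \]
This can be estimated by
\[ \hat{V}(T_3) = T_3^2 - \widehat{T^2} . \]
To obtain an estimator for $V(T_3)$, an unbiased estimator, $\widehat{T^2}$, must be
found for $T^2 = \sum_{i=1}^{N} x_i^2 + \sum_{i \ne j}^{N} x_i x_j$. The simplest unbiased estimator of
$T^2$ is given by
\[ \widehat{T^2} = \frac{\sum_{i \in s_v} x_i^2}{n \, C^{N-1}_{v-1} \, P_{s_v}}
   + \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1) \, C^{N-2}_{v-2} \, P'_{s_v}} \]
where
\[ P'_{s_v} = \frac{P_{s_v}}{1 - \sum_{i=1}^{N} p_i^n} = \frac{P_{s_v}}{\sum_{v=2}^{n} \sum_{s_v=1}^{S_v} P_{s_v}} . \]
Note that at least two different units ($v \ge 2$) are required to
estimate the cross product term. The above can readily be shown to be
unbiased as follows:
\[ E\Big( \frac{\sum_{i \in s_v} x_i^2}{n \, C^{N-1}_{v-1} \, P_{s_v}} \Big)
   = \sum_{v=1}^{n} \sum_{s_v=1}^{S_v} P_{s_v} \frac{\sum_{i \in s_v} x_i^2}{n \, C^{N-1}_{v-1} \, P_{s_v}}
   = \sum_{i \in U} x_i^2 \sum_{v=1}^{n} \sum_{s_v \supset i} \frac{1}{n \, C^{N-1}_{v-1}}
   = \sum_{i=1}^{N} x_i^2 , \]
\[ E\Big( \frac{\sum_{i \ne j} x_i x_j}{(n-1) \, C^{N-2}_{v-2} \, P'_{s_v}} \Big)
   = \sum_{v=2}^{n} \sum_{s_v=1}^{S_v} P'_{s_v} \frac{\sum_{i \ne j} x_i x_j}{(n-1) \, C^{N-2}_{v-2} \, P'_{s_v}}
   = \sum_{i \ne j \in U} x_i x_j \sum_{v=2}^{n} \sum_{s_v \supset i,j} \frac{1}{(n-1) \, C^{N-2}_{v-2}}
   = \sum_{i \ne j}^{N} x_i x_j . \]
Thus:
\[ \hat{V}(T_3) = T_3^2 - \frac{\sum_{i \in s_v} x_i^2}{n \, C^{N-1}_{v-1} \, P_{s_v}}
   - \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1) \, C^{N-2}_{v-2} \, P'_{s_v}} . \tag{4.6-7} \]
It may be noted that this estimator can be negative for certain samples.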
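The class three weights and the accompanying variance estimator can be exercised on a small population. The sketch below is an illustration under stated assumptions: it uses the Case A population of Section 4.5.5, the names `P_distinct`, `t3` and `var_est_t3` are hypothetical, and the cross-product sum is taken over ordered pairs i ≠ j. It computes the probability of each distinct sample by enumeration, applies the weight $1 / (n \, C^{N-1}_{v-1} \, P_{s_v})$, and evaluates the variance estimate for the distinct sample AB.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Assumed illustration: Case A of the Section 4.5.5 example (units A, B, C, D).
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
x = [3, 4, 8, 5]
n, N = 3, len(p)

# P_{s_v}: total probability of all ordered samples sharing one set of distinct units.
P_distinct = {}
for s in product(range(N), repeat=n):
    prob = Fraction(1)
    for i in s:
        prob *= p[i]
    key = frozenset(s)
    P_distinct[key] = P_distinct.get(key, Fraction(0)) + prob


def t3(units):
    """Class three estimate for a distinct sample given as a frozenset of unit indices."""
    v = len(units)
    return sum(x[i] for i in units) / (n * comb(N - 1, v - 1) * P_distinct[units])


def var_est_t3(units):
    """Variance estimate T3^2 minus the estimate of T^2; needs at least two distinct units."""
    v = len(units)
    if v < 2:
        raise ValueError("not estimable: fewer than two distinct units")
    P = P_distinct[units]
    P_cond = P / (1 - sum(pi ** n for pi in p))  # conditional on v >= 2
    squares = sum(x[i] ** 2 for i in units) / (n * comb(N - 1, v - 1) * P)
    cross = sum(x[i] * x[j] for i in units for j in units if i != j) / (
        (n - 1) * comb(N - 2, v - 2) * P_cond
    )
    return t3(units) ** 2 - squares - cross


expected_t3 = sum(P * t3(units) for units, P in P_distinct.items())
AB = frozenset({0, 1})
print(expected_t3, float(t3(AB)), float(var_est_t3(AB)))
```

The negative value returned for AB illustrates the remark that the estimator can be negative for certain samples.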
4.6.5 Numerical Example. To illustrate the techniques of Section
4.6, the examples of Section 4.5.5 can be used. Since distinct samples
are again involved, the following results are readily obtainable for
\[ T_3 = \frac{1}{n \, C^{N-1}_{v-1} \, P_{s_v}} \sum_{i \in s_v} x_i . \tag{4.6-5} \]

             (1)            (2)              (3)
Distinct                                sum of x_i       T_3 = 512 (3) / (1)(2)
sample    512 P_{s_v}   n C^{N-1}_{v-1}    A     B           A            B
   A          64             3             3     8         8.0000      21.3333
   B           8             3             4     5        85.3333     106.6667
   C           1             3             8     4      1365.3333     682.6667
   D           1             3             5     3       853.3333     512.0000
   AB        144             9             7    13         2.7654       5.1358
   AC         60             9            11    12        10.4296      11.3778
   AD         60             9             8    11         7.5852      10.4296
   BC         18             9            12     9        37.9259      28.4444
   BD         18             9             9     8        28.4444      25.2840
   CD          6             9            13     7       123.2593      66.3704
   ABC        48             9            15    17        17.7777      20.1481
   ABD        48             9            12    16        14.2222      18.9630
   ACD        24             9            16    15        37.9259      35.5556
   BCD        12             9            17    12        80.5926      56.8889

From this then,
for Case A (3-4-8-5): 512 E(T3) = 10239.9878 or E(T3)A = 20.0000,
for Case B (8-5-4-3): 512 E(T3) = 10239.9983 or E(T3)B = 20.0000,
i.e., the estimator is unbiased.
Using (4.6-6) (which is the same computationally as the variance
of all possible sample estimates), the variance of T3, for these
examples, is:
     V(T3)A = 5331.8290 ,
     V(T3)B = 1601.6463 .
Further, using (4.6-7), the various samples delineated above for which
v = 2 or 3 produce estimates of the variance of the total as follows:

Sample       Case A         Case B
  AB       -  38.7291     - 130.4508
  AC       - 135.6382     - 179.9976
  AD       -  84.2017     - 135.6382
  BC         406.8677       192.8370
  BD         192.8370       166.8239
  CD       11429.0031      3291.9930
  ABC         99.6849     - 138.2485
  ABD      -  71.4258     - 116.9902
  ACD        485.2026       432.7377
  BCD       4318.3894      2141.5599
4.7 Class Four Estimator
4.7.1 The Estimator. The class four estimator, with weights
dependent on both the presence or absence of a unit and the order of
appearance of the units, is given by
\[ T_4 = \sum_{t=1}^{n} \delta_{i_t t} \, x_{i_t} \tag{4.7-1} \]
where $\delta_{it}$ (i = 1, 2, ..., N; t = 1, 2, ..., n) is the weight attached
to the i-th element whenever it appears at the t-th draw.
4.7.2 Number of Weights. The total number of weights is Nn (N
weights at each of n draws).
4.7.3 Determination of the Weights. Since the weights, $\delta_{it}$, are
attached depending on the appearance of the i-th element on a particular
draw, as for the class two estimator, the estimator can be rewritten
introducing the characteristic random variable $z_{it}$. Thus (4.7-1)
becomes:
\[ T_4 = \sum_{i=1}^{N} \sum_{t=1}^{n} \delta_{it} \, x_i \, z_{it} \tag{4.7-2} \]
where
     $z_{it} = 1$ if the i-th element appears at the t-th draw,
     $z_{it} = 0$ if the i-th element does not appear at the t-th draw,
and $E(z_{it}) = p_i$ since the individual draws are independent.
Taking expectations,
\[ E(T_4) = \sum_{i=1}^{N} x_i \sum_{t=1}^{n} \delta_{it} \, p_i . \]
Imposing the criterion of unbiasedness, i.e., requiring that
$E(T_4) = \sum_{i=1}^{N} x_i$, means that the $\delta_{it}$ can be determined by setting
\[ \sum_{t=1}^{n} \delta_{it} \, p_i = 1 \quad \text{for } i = 1, 2, \ldots, N. \]
The obvious solution for this is to set
\[ \delta_{it} = \frac{1}{n p_i} \tag{4.7-3} \]
which weights hold for all i, and are independent of the order of the
draw (the t's). This yields the familiar
\[ T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} . \tag{4.7-4} \]
A more general solution might be $\delta_{it} = \omega_t / p_i$ where $\sum_t \omega_t = 1$,
but it is well known that the variance of a linear function with
arbitrary weights is minimized when the weights are equal, i.e., when
$\omega_t = 1/n$ for all t.
When the selection probabilities are equal it is readily seen that
the class four estimator reduces to the class one estimator (4.4-3).
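The behaviour of the class four estimator with weights $1/(n p_i)$ can be checked by enumerating every ordered sample. The sketch below is illustrative: it assumes the Case A population of the Section 4.5.5 numerical example and uses hypothetical function names. It computes the exact mean and variance of the estimator and compares the variance with the closed form $\frac{1}{n} \sum_i p_i (x_i/p_i - T)^2$.

```python
from fractions import Fraction
from itertools import product

# Assumed illustration: Case A of the Section 4.5.5 example.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
x = [3, 4, 8, 5]
n, N = 3, len(p)
T = sum(x)  # population total


def t4(sample):
    """Class four estimate: mean of x_i / p_i over the n draws (repeats counted)."""
    return sum(x[i] / p[i] for i in sample) / n


def moments(estimator):
    """Exact mean and variance of an estimator over all N^n ordered samples."""
    mean = Fraction(0)
    second = Fraction(0)
    for s in product(range(N), repeat=n):
        prob = Fraction(1)
        for i in s:
            prob *= p[i]
        value = estimator(s)
        mean += prob * value
        second += prob * value ** 2
    return mean, second - mean ** 2


mean_t4, var_t4 = moments(t4)
var_closed_form = sum(p[i] * (x[i] / p[i] - T) ** 2 for i in range(N)) / n
print(mean_t4, var_t4, var_closed_form)
```

Unlike the class two estimator, every draw contributes to the sum, so a unit drawn twice is counted twice.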
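4.7.4 Variance of $T_4$. To determine the variance of $T_4$, set
\[ V(T_4) = V\Big( \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{1}{n p_i} x_i z_{it} \Big)
   = \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{x_i^2}{n^2 p_i^2} V(z_{it})
   + \sum_{i \ne j}^{N} \sum_{t=1}^{n} \frac{x_i x_j}{n^2 p_i p_j} \mathrm{Cov}(z_{it}, z_{jt}) . \tag{4.7-5} \]
Note that the terms involving $\mathrm{Cov}(x_{it}, x_{it'})$ and $\mathrm{Cov}(x_{it}, x_{jt'})$
disappear by virtue of the independence of the draws, one from another, so
that the terms involving the t-th and t'-th draws have zero covariance.
Now, from multinomial theory for a single draw,
     $V(z_{it}) = p_i q_i$ where $q_i = 1 - p_i$,
     $\mathrm{Cov}(z_{it}, z_{jt}) = - p_i p_j$.
Substituting these into (4.7-5) produces
\[ V(T_4) = \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{x_i^2}{n^2 p_i^2} p_i q_i
          + \sum_{i \ne j}^{N} \sum_{t=1}^{n} \frac{x_i x_j}{n^2 p_i p_j} (- p_i p_j) \]
\[        = \sum_{i=1}^{N} \frac{x_i^2}{n} \Big( \frac{1 - p_i}{p_i} \Big) - \frac{1}{n} \sum_{i \ne j}^{N} x_i x_j \]
\[        = \sum_{i=1}^{N} \frac{x_i^2}{n p_i} - \frac{1}{n} \Big( \sum_{i=1}^{N} x_i \Big)^2 \]
\[        = \sum_{i=1}^{N} \frac{x_i^2}{n p_i} - \frac{T^2}{n} \]
and
\[ V(T_4) = \frac{1}{n} \sum_{i=1}^{N} p_i \Big( \frac{x_i}{p_i} - T \Big)^2 . \tag{4.7-6} \]
One can estimate this variance by using
\[ \hat{V}(T_4) = \frac{1}{n(n-1)} \sum_{t=1}^{n} \Big( \frac{x_{i_t}}{p_{i_t}} - T_4 \Big)^2 \tag{4.7-7} \]
where
\[ T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} . \]
It is to be noted that (4.7-7) always produces positive estimates of
the variance which is a definite interpretational attribute.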
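The variance estimator for the class four estimator is the sample variance of the n ratios $x_i / p_i$, divided by n. The sketch below is an illustration with hypothetical names, again assuming the Case A population of the Section 4.5.5 example. It evaluates the estimator for the sample (A, A, B) and averages it over all ordered samples to confirm that it is unbiased for the variance of $T_4$.

```python
from fractions import Fraction
from itertools import product

# Assumed illustration: Case A of the Section 4.5.5 example.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
x = [3, 4, 8, 5]
n, N = 3, len(p)


def var_est_t4(sample):
    """Sum of squared deviations of the ratios x_i / p_i from their mean, over n(n-1)."""
    ratios = [x[i] / p[i] for i in sample]
    t4 = sum(ratios) / n
    return sum((r - t4) ** 2 for r in ratios) / (n * (n - 1))


def expected_var_est():
    """Average of the variance estimate over all N^n ordered samples."""
    total = Fraction(0)
    for s in product(range(N), repeat=n):
        prob = Fraction(1)
        for i in s:
            prob *= p[i]
        total += prob * var_est_t4(s)
    return total


print(var_est_t4((0, 0, 1)), expected_var_est())
```

The estimate is a sum of squares and therefore cannot be negative. It is zero when all n draws give the same ratio, for example when a single unit is drawn n times.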
4.7.5 Numerical Example. The class four estimator depends on
summation over the units of the sample as they appear, and not just the
distinct units observed. Thus, in using the four-unit population from
Section 4.5.5 as an example, the interest is in the groups of samples
that have the same units the same number of times. Order is not
important, so the samples and results for this case can be grouped as follows:
\[ T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} \tag{4.7-4} \]

Unit                  A      B      C      D
x_i / p_i   Case A    6     16     64     40
            Case B   16     20     32     24

                        sum of x_i / p_i            T_4
Sample   512 P_{s'}       A       B            A          B
 AAA        64           18      48          6.0000    16.0000
 AAB        96           28      52          9.3333    17.3333
 AAC        48           76      64         25.3333    21.3333
 AAD        48           52      56         17.3333    18.6667
 ABB        48           38      56         12.6667    18.6667
 ABC        48           86      68         28.6667    22.6667
 ABD        48           62      60         20.6667    20.0000
 ACC        12          134      80         44.6667    26.6667
 ACD        24          110      72         36.6667    24.0000
 ADD        12           86      64         28.6667    21.3333
 BBB         8           48      60         16.0000    20.0000
 BBC        12           96      72         32.0000    24.0000
 BBD        12           72      64         24.0000    21.3333
 BCC         6          144      84         48.0000    28.0000
 BCD        12          120      76         40.0000    25.3333
 BDD         6           96      68         32.0000    22.6667
 CCC         1          192      96         64.0000    32.0000
 CCD         3          168      88         56.0000    29.3333
 CDD         3          144      80         48.0000    26.6667
 DDD         1          120      72         40.0000    24.0000
Thus, for these examples:
for Case A (3-4-8-5): 512 E(T4) = 10240.0000 or E(T4)A = 20.0000 ,
for Case B (8-5-4-3): 512 E(T4) = 10239.9994 or E(T4)B = 20.0000 ,
i.e., the estimator is unbiased.
Using (4.7-6), the variance of T4, for these examples, is
     V(T4)A = 131.3333 ,
     V(T4)B = 9.3333.
When computed directly from the sample estimates, the results are:
     V(T4)A = 131.2813 ,
     V(T4)B = 9.3333.
Further, using (4.7-7), the following estimates of the variance are
obtained for the various possible samples listed above:

Sample      Case A      Case B
 AAA           not estimable
 AAB        11.1111      1.7778
 AAC       373.7778     28.4444
 AAD       128.4444      7.1111
 ABB        11.1111      1.7778
 ABC       320.4444     23.1111
 ABD       101.7778      5.3333
 ACC       373.7778     28.4444
 ACD       283.1111     21.3333
 ADD       128.4444      7.1111
 BBB           not estimable
 BBC       256.0000     16.0000
 BBD        64.0000      1.7778
 BCC       256.0000     16.0000
 BCD       192.0000     12.4444
 BDD        64.0000      1.7778
 CCC           not estimable
 CCD        64.0000      7.1111
 CDD        64.0000      7.1111
 DDD           not estimable
4.8 Class Five Estimator
4.8.1 The Estimator. The class five estimator, with weights
dependent on the presence or absence of a particular unit in the
distinct sample drawn, is given by:
\[ T_5 = \sum_{i \in s_v} \theta_{s_v i} \, x_i \tag{4.8-1} \]
where $\theta_{s_v i}$ (i = 1, 2, ..., N; $s_v$ = 1, 2, ..., S') is the weight
attached to the i-th element whenever it appears in the $s_v$-th sample.
Summation again is over distinct units.
4.8.2 Number of Weights. The total number of weights is
\[ \sum_{v=1}^{n} v \, C^{N}_{v} = N \sum_{v=1}^{n} C^{N-1}_{v-1} , \]
with $v \, C^{N}_{v}$ corresponding to the situation where
there are $C^{N}_{v}$ combinations of v distinct units from among the N, and
a weight is attached to each of the v units in the distinct sample.
Alternatively, there are $C^{N-1}_{v-1}$ samples with v distinct units which
contain a given specific unit, say the j-th, and N such units.
4.8.3 Determination of the Weights. As with the class three
estimator, to determine weights for the class five estimator which
satisfy the criterion of unbiasedness, expectation must be taken over
all possible samples. This leads to equations of the following form:
$$E(T_5) = \sum_{s_v=1}^{S'} P_{s_v} \sum_{i \in s_v} \theta_{s_v i}\, x_i$$
which, in turn, stipulate that for the estimator to be unbiased one must
determine a set of weights satisfying
$$\sum_{i=1}^{N} x_i \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v}\, \theta_{s_v i} \equiv \sum_{i=1}^{N} x_i . \tag{4.8-2}$$
The solution of this equation, or the determination of a set of values
which satisfy it, is a problem in combinatorial number theory. As a
special case of the class five estimator, if the subscript "i" is
suppressed, one can determine the weights directly from the identity
(4.8-2).
Thus
$$\sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v}\, \theta_{s_v}(i) = 1$$
must hold in order that $T_5 = \sum_{i \in s_v} \theta_{s_v}(i)\, x_i$ be unbiased. This, of
course, is the same criterion as obtained for the class three estimator
(4.6-3). This yields as a general solution $\theta_{s_v}(i) = c_{s_v}(i)/P_{s_v}$ with
the restriction that $\sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i) = 1$, or yields a specific solution
e·v(i) = ~ p.v ~=ir ·Another solution is the estimator given by Basu (1958), which be-
longs to this class for certain special values of the c' s. Consider
$$\sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i) = 1, \qquad i = 1, 2, \ldots, N. \tag{4.8-3}$$
The c-coefficients relate to the possible samples of size n containing
v = 1, 2, ..., n distinct units. Also it is only meaningful to determine
them in the context of probability values relating to samples of size n.
The right hand side of (4.8-3) will result in multinomial probabilities
relating to samples of size n-1. Now multiply both sides of (4.8-3) by
$p_i$, yielding
$$p_i \sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i) = p_i\, (p_1 + \cdots + p_N)^{n-1}, \quad \text{for all } i,$$
$$= p_i \sum_{v=1}^{n-1} \sum_{s_v} \sum_{P(n-1|v)} \frac{(n-1)!}{\prod_j n_j!} \prod_j p_j^{n_j} .$$
Choose the following solutions to this equation:
(i = 1, 2, ... , N; v = 1, the i-th
unit),
pC ()=p [I:i s2 i i P(n-1-!:2)
(n-1)!n -In Ii" j.
(i = 1,2, ••• , N; v = 2, say units
i, j),
(i =1, 2, •• 0 , N; v = 3, say units i,j,k),
or, in general
(i = 1,2, ... , N; v = 4, say units i, j,
k, h),
(n-l) !
+ E , (V- l )!P(n-llv-l) n .•n j •...n jJl 2 v-l
(i = 1,2, •• . ,Nj 2 =5 v =5 n-l, say units i,
jl' j2' .oo, jv-l)·
(4.8-4)
The solutions listed above hold simply because the sum of the
multinomial probabilities in the square brackets, for any given i and
for all sets of distinct v's, add to $\left(\sum_{i=1}^{N} p_i\right)^{n-1} = 1$. Of course, in the
light of the above demonstration, $p_i$ need not be multiplied to both
sides of the condition of unbiasedness, (4.8-3), but this device helps
in the choice of the probability functions in relation to the sets of
possible distinct samples of size n.
It will be seen that the sum of the multinomial probabilities in
square brackets on the right hand side of each equation, when multiplied
by $p_i$, is the probability of selecting the i-th unit to complete a given
collection of v distinct units.
Thus the coefficients can be determined as
$$\theta_{s_v}(i) = \frac{c_{s_v}(i)}{P_{s_v}} = \frac{p_i\,[\ \ ]}{P_{s_v}\, p_i} \tag{4.8-5}$$
which yields the estimator
$$T_5' = \sum_{i \in s_v} \frac{p_i\,[\ \ ]}{P_{s_v}}\, \frac{x_i}{p_i} \tag{4.8-6}$$
where [ ] denotes the term inside square brackets in (4.8-4).
This estimator, when divided by N, is equivalent to the estimator
obtained by Basu for estimating the population mean.
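Also, the estimator given by Des Raj and Khamis (1958) belongs to
this class and is a special case for equal selection probabilities,
i.e., $p_i = 1/N$ for all i. In this situation
$$\frac{p_i\,[\ \ ]}{P_{s_v}} = \frac{(1/N)\,(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1})\, N^{-(n-1)}}{\Delta^v 0^n\, N^{-n}} = \frac{\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1}}{\Delta^v 0^n}$$
using the "differences of zero" notation, as explained in Section 4.3(3),
rather than the summation notation. Further, from the definition of
$\Delta^v 0^n$ given in Section 4.2, it can be readily shown by induction that
$$\Delta^v 0^n = v\,(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1})$$
so that
$$T_5' = \frac{N}{v} \sum_{i \in s_v} x_i .$$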
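The recurrence for the differences of zero that drives this reduction can be checked numerically. The sketch below assumes the standard definition of $\Delta^v 0^n$ as an alternating binomial sum (the number of ways n draws can cover exactly v labelled units); the definition in Section 4.2 is not reproduced here, so this is an assumption, and `diff_zero` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def diff_zero(v, n):
    """ASSUMED standard definition: sum_k (-1)^(v-k) C(v, k) k^n."""
    return sum((-1) ** (v - k) * comb(v, k) * k ** n for k in range(v + 1))


def weight_ratio(v, n):
    """(D^v 0^(n-1) + D^(v-1) 0^(n-1)) / D^v 0^n for 1 <= v <= n."""
    return Fraction(diff_zero(v, n - 1) + diff_zero(v - 1, n - 1), diff_zero(v, n))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, [str(weight_ratio(v, n)) for v in range(1, n + 1)])
```

Under that definition the ratio equals 1/v for every admissible pair (v, n) tried, which is what turns the equal-probability weight into N/v.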
Another special case can be obtained by stipulating that
$\theta_{s_v i} = \theta_i$ for all $s_v \supset i$. This situation produces a requirement for
unbiasedness which is identical to that of the class two estimator. An
alternate derivation would be:
$$\theta_i \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = 1,$$
since $\sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v}$ is the probability that the sample includes the
i-th unit, and so equals $P_i^*$. Thus, as for (4.5-3), $\theta_i = 1/P_i^*$.
4.8.4 Variance of T5. In very general form, the variance of T5
will be
$$V(T_5) = \sum_{v=1}^{n} \sum_{s_v} P_{s_v} \Big( \sum_{i \in s_v} \theta_{s_v i}\, x_i \Big)^2 - T^2 \tag{4.8-7}$$
which can be estimated by
[
i~S X~- n P v eN-i
s vv(408-8)
using the simplest unbiased estimator of $T^2$ as given in Section 4.6.4.
Again negative estimates of the variance are possible.
4.8.5 Numerical Example. The coefficients for the class five
estimator depend on the appearance of a particular unit in a given
distinct sample. Using (4.8-6) to illustrate this class, the
coefficients for the distinct units within each sample are determined
from (4.8-5) after evaluating (4.8-4) to determine the [ ]-term. This
term is dependent on the selection probabilities, the sample size n, and
the number of distinct units v. For samples of size n = 3, this term
is:
for v = 1 (unit i):          $[\ \ ] = p_i^2$,
for v = 2 (units i, j):      $[\ \ ] = 2 p_i p_j + p_j^2$,
for v = 3 (units i, j, k):   $[\ \ ] = 2 p_j p_k$,
and is applied to the coefficient for the i-th unit in the distinct
sample.
Sample   512 P_{s_v}   Estimator T5'                   Case A     Case B
A           64         2A                               6.0000    16.0000
B            8         4B                              16.0000    20.0000
C            1         8C                              64.0000    32.0000
D            1         8D                              40.0000    24.0000
AB         144         (10/9)A + (16/9)B               10.4444    17.7778
AC          60         (6/5)A + (16/5)C                29.2000    22.4000
AD          60         (6/5)A + (16/5)D                19.6000    19.2000
BC          18         (20/9)B + (32/9)C               37.3333    25.3333
BD          18         (20/9)B + (32/9)D               26.6667    21.7778
CD           6         4C + 4D                         52.0000    28.0000
ABC         48         (2/3)A + (4/3)B + (8/3)C        28.6667    22.6667
ABD         48         (2/3)A + (4/3)B + (8/3)D        20.6667    20.0000
ACD         24         (2/3)A + (8/3)C + (8/3)D        36.6667    24.0000
BCD         12         (4/3)B + (8/3)C + (8/3)D        40.0000    25.3333
Thus, for these examples:
for Case A (3-4-8-5): 512 E(T5') = 10239.9976 or E(T5')A = 20.0000,
for Case B (8-5-4-3): 512 E(T5') = 10240.0042 or E(T5')B = 20.0000,
i.e., the estimator is unbiased.
The variance of T5', computed directly from the distribution of
estimates, is
V(T5')A = 118.5349,
V(T5')B = 8.3961.
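The class five example can be reproduced by enumeration. The sketch below rests on one reading of the text: the bracketed term times p_i is taken to be the probability that the i-th unit is the last one drawn while the distinct sample is s_v, so that the weight on x_i is that probability divided by P_{s_v} p_i. This interpretation and the names `distinct_set_estimates` and `moments` are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

P = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}
CASE_B = {"A": 8, "B": 5, "C": 4, "D": 3}


def distinct_set_estimates(x, p=P, n=3):
    """Map each distinct sample to (its probability, its estimate)."""
    prob_set = defaultdict(Fraction)       # P(distinct sample = s)
    prob_set_last = defaultdict(Fraction)  # P(distinct sample = s, last draw = i)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for u in sample:
            prob *= p[u]
        units = frozenset(sample)
        prob_set[units] += prob
        prob_set_last[(units, sample[-1])] += prob
    estimates = {}
    for units, ps in prob_set.items():
        # ASSUMED weight on x_i: P(s, last = i) / (P(s) * p_i).
        est = sum(prob_set_last[(units, u)] / (ps * p[u]) * x[u] for u in units)
        estimates[units] = (ps, est)
    return estimates


def moments(estimates):
    """Exact expectation and variance over the distinct samples."""
    mean = sum(ps * est for ps, est in estimates.values())
    var = sum(ps * est ** 2 for ps, est in estimates.values()) - mean ** 2
    return mean, var


if __name__ == "__main__":
    for name, x in (("A", CASE_A), ("B", CASE_B)):
        mean, var = moments(distinct_set_estimates(x))
        print(f"Case {name}: E = {float(mean):.4f}, V = {float(var):.4f}")
```

Grouping ordered samples by their set of distinct units keeps the code close to the table, where each row is one distinct sample with its probability and its linear combination of unit values.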
4.9 Class Six Estimator
4.9.1 The Estimator. The class six estimator, with weights
dependent on both the order of appearance of the units and the
particular sample involved, is given as
$$T_6 = \phi_s \sum_{t=1}^{n} x_{it} \tag{4.9-1}$$
where $\phi_s$ (s = 1, 2, ..., S) is the weight attached to the s-th sample
whose elements appear in a specified order, and $x_{it}$ is the
characteristic value observed on the i-th unit at the t-th draw.
4.9.2 Number of Weights. The total number of weights is $S = N^n$
since for this case where attention is paid to the ordering of the
elements within the sample, there will be a separate weight for each
sample.
4.9.3 Determination of the Weights. Taking the expectation of
(4.9-1) over all possible samples yields
$$E(T_6) = \sum_{s=1}^{S} P_s\, \phi_s \sum_{i \in s} x_i$$
where summation is over all i appearing in s, including repetitions.
Thus
$$E(T_6) = \sum_{i=1}^{N} x_i\, {\sum_{s \ni i}}' P_s\, \phi_s$$
with $\sum'$ denoting summation over all appearances of the i-th unit, I
in number as derived in Section 4.3(8). Imposing the condition of
unbiasedness, i.e., setting
$$\sum_{i=1}^{N} x_i\, {\sum_{s \ni i}}' P_s\, \phi_s \equiv \sum_{i=1}^{N} x_i ,$$
requires that
$${\sum_{s \ni i}}' P_s\, \phi_s = 1 \quad \text{for all } i. \tag{4.9-2}$$
A set of weights which satisfied this requirement would be
$$\phi_s = \frac{1}{P_s\, {\sum_{s \ni i}}' 1} = \frac{1}{P_s I} \tag{4.9-3}$$
where $I = n N^{n-1}$ is the number of times that the i-th unit appears in
the $N^n$ samples and is developed in Section 4.3(8). This yields, then,
as an estimator of the population total
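$$T_6 = \frac{1}{P_s I} \sum_{t=1}^{n} x_{it} = \frac{1}{P_s\, n N^{n-1}} \sum_{t=1}^{n} x_{it} = \frac{N}{n H_s} \sum_{t=1}^{n} x_{it} \tag{4.9-4}$$
where
$$H_s = \prod_{i \in s_v} (N p_i)^{n_i} . \tag{4.9-5}$$
For the case where the selection probabilities are equal, i.e.,
$p_i = p = 1/N$, then $H_s = 1$ and $\phi_s$ reduces to the familiar form:
$\phi_s = N/n$, so that, for the equal probability case,
$$T_6 = \frac{N}{n} \sum_{t=1}^{n} x_{it} .$$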
This, it will be recalled, is the estimator obtained in class one for
the restrictive case, i.e., equal selection probabilities, for which the
class one estimator did exist. Further, if one sets $\phi_s = \phi_{s_v}$ for all
$s \in s_v$, then the class three estimator can be derived, since the
requirement for unbiasedness would be
$$\sum_{s=1}^{S} P_s\, \phi_{s_v} \sum_{i \in s} x_i \equiv T$$
$$\sum_{i \in U} x_i \sum_{v=1}^{n} \sum_{s_v \supset i} \phi_{s_v} \sum_{s \in s_v} P_s \equiv T$$
$$\sum_{i=1}^{N} x_i \sum_{v=1}^{n} \sum_{s_v \supset i} \phi_{s_v} P_{s_v} \equiv T$$
or
$$\sum_{v=1}^{n} \sum_{s_v \supset i} \phi_{s_v} P_{s_v} \equiv 1$$
which is the same as (4.6-3), the unbiasedness requirement for class
three.
4.9.4 Variance of T6. The variance of T6 can be determined
quite simply as follows:
$$V(T_6) = \sum_{s=1}^{S} P_s \Big( \phi_s \sum_{t=1}^{n} x_{it} \Big)^2 - T^2 .$$
Further expansion of this expression would become involved, for, with
summation over all units drawn including repetitions, some of the cross-
product terms are, in fact, squares. An estimator for the variance of
T6 would be
[
E X~2 ie:s
= T6 · -. P Is s
+
where I is the total number of appearances of the i-th unit in all
possible samples and L is the total number of times the (i,j)-cross-
product occurs in all possible samples. That this is unbiased follows
directly from the expectation methods used in this section, and along
the lines used to prove $\hat{T}^2$ is unbiased in Section 4.6.4.
4.9.5 Numerical Example. The weights for the class six estimator
depend on the particular sample. For brevity in the listing, it can be
noted that under the assumption that the selection probabilities are
constant over all draws, the probability of obtaining a particular sample
depends on the units drawn, and not on the order in which they are drawn.
Thus particular samples having the same units the same number of times
can be lumped together, as in the discussion of the class four
estimator. Again using the four-unit population of Section 4.5.5, the
estimates produced by
$$T_6 = \frac{1}{P_s I} \sum_{i \in s} x_i \tag{4.9-4}$$
would be (f denotes the number of orderings of the sample, and
$P_s I = P_s\, n N^{n-1}$):

                                     Sum of x_i            T6
Sample  512 P_s f  512 P_s  P_s I     A     B      Case A     Case B
AAA        64        64      6        9    24       1.5000     4.0000
AAB        96        32      3       10    21       3.3333     7.0000
AAC        48        16      3/2     14    20       9.3333    13.3333
AAD        48        16      3/2     11    19       7.3333    12.6667
ABB        48        16      3/2     11    18       7.3333    12.0000
ABC        48         8      3/4     15    17      20.0000    22.6667
ABD        48         8      3/4     12    16      16.0000    21.3333
ACC        12         4      3/8     19    16      50.6667    42.6667
ACD        24         4      3/8     16    15      42.6667    40.0000
ADD        12         4      3/8     13    14      34.6667    37.3333
BBB         8         8      3/4     12    15      16.0000    20.0000
BBC        12         4      3/8     16    14      42.6667    37.3333
BBD        12         4      3/8     13    13      34.6667    34.6667
BCC         6         2      3/16    20    13     106.6667    69.3333
BCD        12         2      3/16    17    12      90.6667    64.0000
BDD         6         2      3/16    14    11      74.6667    58.6667
CCC         1         1      3/32    24    12     256.0000   128.0000
CCD         3         1      3/32    21    11     224.0000   117.3333
CDD         3         1      3/32    18    10     192.0000   106.6667
DDD         1         1      3/32    15     9     160.0000    96.0000
Thus, for these examples:
for Case A (3-4-8-5): 512 E(T6) = 10239.9952 or E(T6)A = 20.0000 ,
for Case B (8-5-4-3): 512 E(T6) = 10240.0000 or E(T6)B = 20.0000 ,
i.e., the estimator is unbiased.
Using (4.9-6), the variance of T6, for these examples, is
V(T6)A = 1009.9484,
V(T6)B = 354.6458.
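The class six figures follow directly from the weight 1/(P_s I) with I = n N^(n-1), so they can be recomputed by enumerating all ordered samples. The sketch below is illustrative; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}
CASE_B = {"A": 8, "B": 5, "C": 4, "D": 3}


def sample_probability(sample, p=P):
    """Probability of one ordered sample drawn with replacement."""
    prob = Fraction(1)
    for u in sample:
        prob *= p[u]
    return prob


def t6(sample, x, p=P):
    """T6 = sum of x over all draws, divided by P_s * n * N^(n-1)."""
    n, N = len(sample), len(p)
    appearances = n * N ** (n - 1)  # I: appearances of a unit over all samples
    return sum(x[u] for u in sample) / (sample_probability(sample, p) * appearances)


def exact_moments(x, p=P, n=3):
    """Exact expectation and variance of T6 over all N^n ordered samples."""
    mean = Fraction(0)
    second = Fraction(0)
    for sample in product(p, repeat=n):
        prob = sample_probability(sample, p)
        t = t6(sample, x, p)
        mean += prob * t
        second += prob * t * t
    return mean, second - mean ** 2


if __name__ == "__main__":
    for name, x in (("A", CASE_A), ("B", CASE_B)):
        mean, var = exact_moments(x)
        print(f"Case {name}: E(T6) = {float(mean):.4f}, V(T6) = {float(var):.4f}")
```

Because the weight is inversely proportional to the sample probability, rare samples such as CCC receive very large estimates, which accounts for the large variance relative to the other classes.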
4.10 Class Seven Estimator
4.10.1 The Estimator. The class seven estimator, the most general
class of estimators with weights dependent on the order of draw, the
presence or absence of a unit, and the particular sample involved, is
given by:
$$T_7 = \sum_{t=1}^{n} \psi_{sit}\, x_{it} \tag{4.10-1}$$
where $\psi_{sit}$ (t = 1, 2, ..., n; i = 1, 2, ..., N; s = 1, 2, ..., S) is
the weight attached to the i-th unit appearing at the t-th draw in the
s-th sample (whose elements, of course, appear in a specified order).
4.10.2 Number of Weights. The total number of weights is $n N^n$; n
for each of the $N^n$ samples, since the $\psi$'s depend on the sample, unit
and order of draw.
4.10.3 Determination of the Weights. In a manner similar to that
used in class six, the restrictions for unbiasedness can be derived
along the following lines:
$$E(T_7) = \sum_{s=1}^{S} P_s \sum_{i \in s} \psi_{sit}\, x_i = \sum_{i=1}^{N} x_i\, {\sum_{s}}' \psi_{sit}\, P_s$$
with $\sum'_s$ having meaning as in Section 4.9.3. Thus, to produce
unbiasedness the weights must satisfy
$$\sum_{i=1}^{N} x_i\, {\sum_{s}}' \psi_{sit}\, P_s \equiv \sum_{i=1}^{N} x_i ,$$
that is
$${\sum_{s}}' \psi_{sit}\, P_s = 1 \quad \text{for all } i. \tag{4.10-2}$$
A general solution to (4.10-2) would be
$$\psi_{sit} = \frac{c_{sit}}{P_s} \tag{4.10-3}$$
where the $c_{sit}$ satisfy the restriction that $\sum'_s c_{sit} = 1$ for every i.
A more specific solution would again involve the use of combinatorial
number theory.
It can readily be seen that the class seven estimator is the most
general class, since by suitable suppression of the subscripts on the
$\psi_{sit}$, one can reach any of the other classes of estimators. By
requiring equality of the $\psi_{sit}$ for all i, t, the unbiasedness requirement
(4.10-2) becomes
$${\sum_{s}}' \psi_{s}\, P_s = 1$$
which is the same as that for class six (4.9-2). From there one can
move to classes three, two and one. By suppressing t, and setting
$\psi_{si} = \psi_{s_v i}$, one moves to class five, and from there to classes three
and two. Finally, by suppressing s, $\psi_{sit} = \psi_{it}$ and class four is
obtained; from which class one can be reached for the case of equal
selection probabilities.
4.10.4 Variance of T7. Again the general expression is the
easiest to manipulate for whatever purpose might be at hand, and thus
$$V(T_7) = \sum_{s=1}^{S} P_s \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - T^2 \tag{4.10-5}$$
assuming the $\psi_{sit}$ have been determined to produce an unbiased
estimator.
An estimator for the variance of T7 would be
$$\hat{V}(T_7) = \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - \hat{T}^2 \tag{4.10-6}$$
where $\hat{T}^2$ is as given in Section 4.9.4.
4.11 Summary of Numerical Examples
In illustrating several of the various estimators derived in this
dissertation, two numerical examples were used. These two four-unit
populations used the same numerical values, but in the second example,
Case B, the numerical values were assigned to the units so as to provide
selection probabilities at least somewhat proportional to size.
The two populations were:
unit                         A      B      C      D
selection probability       1/2    1/4    1/8    1/8
numerical value   Case A     3      4      8      5
                  Case B     8      5      4      3
For all classes of estimators studied, Case B provided better (in
the sense of smaller) results in terms of the variance of the estimator,
the range of the estimates of that variance, and the range of the esti-
mates of the population total. Further, for both cases used as examples,
the estimator given in Class Five as (4.8-5) had the smallest variance
among the unbiased estimators for which variances were determined.
For comparative purposes the results can be summarized as follows:
Case A (random assignment of numerical values to units):
                         Range of estimates
Class   Variance    Totals              Variances
2         142.7     3.4 to 46.3         19.9 to 513.0
3        5331.8     2.8 to 1365.3       -135.6 to 11429.0
4         131.3     6.0 to 64.0         11.1 to 373.8
5         118.5     6.0 to 64.0
6        1009.9     1.5 to 256.0
Case B (probabilities somewhat proportional to size):
                         Range of estimates
Class   Variance    Totals              Variances
2          40.2     8.6 to 30.3         36.0 to 156.3
3        1601.6     5.1 to 682.7        -180.0 to 3293.0
4           9.3     16.0 to 32.0        1.8 to 28.4
5           8.4     16.0 to 32.0
6         354.6     4.0 to 128.0
5.0 SOME ADDITIONAL COMMENTS ON THE ESTIMATORS
The reader of this dissertation has undoubtedly noticed that some
of the weights given for the various classes of estimators are rather
formidable in appearance, especially if one is thinking about the
computational aspects of producing numerical results. The advent of the
large high-speed computers should help negate any reluctance to use a
non-self-weighting (i.e., self-weighting meaning equal simple weights)
design with "complicated" weights. Another approach to this problem
has been proposed by Murthy and Sethi (1961). Starting from the premise
that the effort required to produce the multipliers used in the
estimator may be prohibitive where a non-self-weighting design is used
in a large scale survey, they propose a technique to substitute for the
multipliers a very small number of multipliers called "randomized
rounded-off multipliers", substituted by a suitable randomizing process,
thus reducing the computational burden. They suggest a procedure for
determining the values of the randomized rounded-off multipliers which
minimizes the increase in the variance of the estimator.
Another item which might be a cause of concern is the possibility
that some of the estimators could have negative estimates of the sample
variance. In regard to this problem, Koop (1957, ch. 6) gives a very
complete discussion of the possibility of and interpretation of negative
estimates of the sampling variance, and these remarks will not be
repeated here. Also to be noted is that among the various estimators
proposed in the various classes, only those estimators proposed in
classes one and four have variance estimators which always produce a
positive estimate of the variance. The variance estimators in the other
classes may or may not produce a positive estimate, depending on the
particular sample involved.
Having formulated seven classes of estimators for the case of
sampling with replacement, the question might now be raised as to
whether any of the estimators can be eliminated from further
consideration by virtue of an estimator from another class having a
consistently smaller variance. For the general case such comparisons
between the variances will involve comparisons between quadratic forms
which involve both the variate values of the characteristic(s) under
study and the $p_i$ vector, with the $p_i$ values arbitrary subject only to
the restriction that $\sum_{i=1}^{N} p_i = 1$.
In general, to get an answer to this question, one would have to be
very specific, for the direction of the inequalities, from preliminary
considerations, would seem to involve the specific values of N and n
under consideration, and also the specific probability vector, $(p_i)$ (or
at least its structure), applicable to the problem at hand. Given all
these specifications, it would seem that inequalities should exist but
imposing such restrictions does not yield a general answer to the
question of "bestness" of any of the estimators posited earlier.
For the restrictive case of equal selection probabilities, the
class one estimator can be eliminated from further consideration.
Des Raj and Khamis (1958) have shown that this estimator, which is the
arithmetic mean of the total sample for the case of equal selection
probabilities, has a larger variance than the arithmetic mean of the
distinct units observed when sampling with equal selection probabilities
and with replacement. This arithmetic mean of the distinct units (i.e.,
an estimator with weights N/v) belongs to class five rather than class
one.
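The comparison can be illustrated on a small population. The sketch below enumerates all equally likely ordered samples and computes the exact variances of the two estimators of the total, N times the mean of all n draws and N times the mean of the v distinct units. The population values and the function name `compare_variances` are chosen here for illustration; this is a numerical check on one example, not a proof of the general result.

```python
from fractions import Fraction
from itertools import product


def compare_variances(x, n):
    """Return ((mean, var) for all draws, (mean, var) for distinct units)."""
    N = len(x)
    prob = Fraction(1, N ** n)  # every ordered sample is equally likely
    m_all = m_dis = s_all = s_dis = Fraction(0)
    for sample in product(range(N), repeat=n):
        t_all = Fraction(N, n) * sum(x[i] for i in sample)
        distinct = set(sample)
        t_dis = Fraction(N, len(distinct)) * sum(x[i] for i in distinct)
        m_all += prob * t_all
        m_dis += prob * t_dis
        s_all += prob * t_all ** 2
        s_dis += prob * t_dis ** 2
    return (m_all, s_all - m_all ** 2), (m_dis, s_dis - m_dis ** 2)


if __name__ == "__main__":
    values = [3, 4, 8, 5]  # the Case A values, here with equal probabilities
    for n in (2, 3, 4):
        (m1, v1), (m2, v2) = compare_variances(values, n)
        print(n, float(m1), float(v1), float(m2), float(v2))
```

For n = 2 the two estimators coincide sample by sample, so their variances are equal; the distinct-unit mean becomes strictly better in this example once n exceeds two.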
One additional comparison (inequality) has been "proved" in the
literature and it is worthy of comment. Godambe (1960) shows that the
estimator given in Section 4.5 as the class two estimator (the sum over
the distinct units of the x's divided by their inclusion probabilities)
has smaller variance than any member of a class corresponding to class
five for some population. This follows, Godambe says, from the
following argument: Define a linear estimator $e_s$ as
$$e_s = \sum_{i \in s} \beta_{si}\, x_i$$
where the summation is over the distinct units in the sample. "It is
again clear that all the known linear estimates must be particular cases
of $e_s$" says Godambe. If $e_s$ is to be unbiased, then $\sum_{s \supset i} \beta_{si} P_s = 1$
for all i. And if $e_s$ is unbiased, its variance is given by
$$V(e_s) = \sum_{i=1}^{N} x_i^2 \sum_{s \supset i} \beta_{si}^2 P_s + \sum_{i \neq j} x_i x_j \sum_{s \supset i,j} \beta_{si} \beta_{sj} P_s - T^2 .$$
Godambe then proves that setting $\beta_{si} = 1/p(i)$ yields an admissible
estimate. Here p(i) denotes the probability that the i-th unit is
included in the sample ($= P_i^*$ with replacement).
This is done, for $i_0 \in s$, by supposing that
$$x_{i_0} = 1, \qquad x_i = 0 \ \text{ for } i \neq i_0 .$$
For these assumptions,
$$V(e_s') = V\Big( \sum_{i \in s} \beta'_{si}\, x_i \Big) = \sum_{s \supset i_0} \beta'^{\,2}_{s i_0} P_s - 1$$
so that
$$V(e_s') - V(e_s) = \sum_{s \supset i_0} \Big( \beta'_{s i_0} - \frac{1}{p(i_0)} \Big)^2 P_s .\ {}^{3}$$
His argument runs that since
$$\Big( \beta'_{s i_0} - \frac{1}{p(i_0)} \Big)^2$$
is positive with the two components inside the brackets assumed unequal,
and with $P_s$ also always positive,
$$e_s = \sum_{i \in s} \frac{x_i}{p(i)}$$
is at least as good as any other estimator in the class of unbiased
estimators for some special population. The derivation of this
inequality rests on the assumption that all elements in a population are
zero except the $i_0$-th, which takes the value one. The logical
justification for the use of this estimator solely on the basis of its
merit from this peculiar restrictive case would seem to be rather shaky.

3 The article, as printed, omitted $P_s$ from this equation. This
was corrected on the basis of private correspondence with Dr. Godambe.
Special attention might be called to the effect of the $H_s$-factor in
class six. This factor will have the effect of helping correct for a
disproportionate number of units in the sample from among those with
large probabilities or those with small probabilities (assuming
selection probabilities somewhat indicative of size). If a
disproportionate preponderance of the smaller (probability) units are
drawn, then $H_s$ will be numerically small, and being in the denominator
will tend to increase the estimate of the total (or mean) and will
correspondingly increase the variance. Consider the situation where a
few units received special probabilistic consideration by virtue of
their large size, with the bulk of the units being smaller, and having
equal probabilities among themselves. Then, if the sample drawn
included only the smaller units, the value of $H_s = \prod_{i \in s_v} (N p_i)^{n_i}$
[(4.9-5)] would be less than one and the estimate of the total or mean
would be inflated to counteract the absence of a "representative" number
of the larger units. Also the estimate of the variance of the total
would be inflated to give a truer picture than that given by the
essentially equal smaller units.
By the same token, if the sample as drawn included a
disproportionate number of the larger units, then the $H_s$-factor would be
numerically large (> 1) and the estimate of the total would be deflated,
as would the estimate of the variance of the estimator. All in all, it
would seem to be a useful inclusion in an estimator.
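The behaviour of this factor can be seen on the four-unit example population. The sketch below computes H_s as the product of N p_i over the draws (equivalently, over distinct units raised to their multiplicities) and the resulting estimate N/(n H_s) times the sample sum; the function names are hypothetical.

```python
from fractions import Fraction

P = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}


def h_factor(sample, p=P):
    """H_s: product of N * p_i over every draw in the sample."""
    N = len(p)
    h = Fraction(1)
    for u in sample:
        h *= N * p[u]
    return h


def t6_via_h(sample, x, p=P):
    """Class six estimate written as N / (n * H_s) times the sample sum."""
    N, n = len(p), len(sample)
    return Fraction(N, n) / h_factor(sample, p) * sum(x[u] for u in sample)


if __name__ == "__main__":
    for sample in ("AAA", "ABC", "CCC"):
        print(sample, h_factor(sample), float(t6_via_h(sample, CASE_A)))
```

A sample made up only of the low-probability unit C gives a factor below one and a strongly inflated estimate, while a sample made up only of the high-probability unit A gives a factor above one and a deflated estimate.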
6.0 SUMMARY
6.1 Summary and Conclusions
There is more to the estimation of unknown population values than
the making and recording of observations. Nor is one helped much by
merely taking a large number of observations. All too often, as a
result of insufficient consideration of the basic components of a
sampling plan, badly biased sample results have been put forth as
reliable simply because the number of units in the sample involved was
numerically large. One must note that a large sample is not necessarily
a good sample, but it is nearly always an expensive sample.
Relegating cost considerations to the background, but not ignoring
them, it has been seen that a sampling plan has five major components:
1. A UNIVERSE: the totality of ultimate units of analysis about
which information is desired.
2. The FRAME: a delineation of the sampling units (which may consist
of one or more units of analysis).
3. A PROBABILITY SYSTEM: a set of numbers, one for each sampling
unit, with values restricted to the range $0 < p_i < 1$ and with
their sum over all sampling units in the universe restricted to
one, which are in one-to-one correspondence with the particular
frame involved. These selection probabilities must be
operationally realizable.
4. A SAMPLING PROCEDURE: a scheme which comes operationally from the
probability system for determining which particular units
constitute the sample.
5. An ESTIMATION PROCEDURE: the result of the logical combination of
the observations (obtained through the frame) and the probability
system, and also involved with the sampling procedure, for
arriving at the desired estimates of the population values of the
characteristics under observation.
The first three of these must be completely specified prior to any
consideration of the last two. And for each change in the specification
of the frame and the probability system, the problem of obtaining an
"optimum" sampling procedure and an "optimum" estimator changes.
One can note that both the frame and the probability system can be
either simple or complex. In the discussion of estimators in Section 4.0,
consideration was restricted to one stage sampling, so that there was a
simple frame and simple probability system; however, that does not
affect the generality of the above formulation. The frame and the
probability system, whether simple, different for each stage of a
multi-stage plan, varying over time, etc., still must be specified
before one can consider problems of selection of a sampling procedure or
an estimator.
Also as a result of this formulation, the statement has been made
that a direct comparison between sampling with replacement and sampling
without replacement does not have any logical justification. The
authors who have considered this question apparently came to the same
conclusion, although they did not state it explicitly, because the
comparisons actually made were between estimating on the basis of the
distinct units and on the basis of the totality of units when sampling
with replacement, rather than the stated with versus without comparison.
In addition to considering the non-human components of a sampling
plan, consideration also has been given to criteria to be applied in
helping determine which of a choice of estimators is optimum. In the
literature on sample survey techniques, the criteria that have been
used have been, for the most part, those developed for infinite
populations and applied to samples from finite populations with the
expectation that the degree of relevance is still fairly high. It was
seen that the concepts of sufficiency and efficiency (defined in terms
of minimum variance for an asymptotically normally distributed
estimator) are usually meaningless when applied to samples from finite
populations. Asymptotic normality cannot be achieved without resorting
to an argument that the size of the fixed finite universe be allowed to
approach infinity. Regarding sufficiency, the argument follows
principally from the fact that, in the process of sampling,
probabilities enter the problem only in connection with the selection of
the units to be included, and not in connection with the characteristics
of those units which are the objectives of the investigation.
Further, the concept of consistency, when the traditional
definition based on convergence in probability is used, does not apply
to finite samples from finite populations for the same reasons as given
above. However, if one goes back to the first definition given for
consistency, when Fisher promulgated the beginnings of estimation theory
(1921), there is obtained a definition which seems to be perfectly
suitable for finite populations. It is the following:
"A statistic satisfies the criterion of consistency if, when
it is calculated from the whole population, it is equal to
the required population value."
The two oldest estimation criteria, unbiasedness and minimum
variance, which were formulated by Gauss in the early 1800's, are still
applicable, both to the infinite and finite populations; however, they
are possibly too restrictive to be general criteria.
Thus, in the way of major estimation criteria to be applied to the
problem of selecting an optimum estimator to be based on the
observations of a finite sample from a finite population, one can
require:
(1) that the estimator be consistent, and,
(2) that it have a minimum mean square error
where mean square error equals the sum of the variance and the square of
the bias. Many might consider the desideratum to be that the estimator
be unbiased and have minimum variance, but, for generality, a better (in
some sense) estimator may be obtained if consideration is given to
estimators which are consistent, i.e., have a disappearing bias, and
consequently might have a minimum mean square error, if such minimum is
obtainable. If there is no bias present in the desired estimator, the
two sets of criteria are identical.
And finally this dissertation has applied the axiomatic approach to
the case of sampling with arbitrary selection probabilities and with
replacement of the sampling units before another unit is drawn. It has
been seen that the use of axioms in the process of formulation of
classes of estimators has produced seven classes of linear unbiased
estimators of the population total, with the weights independent of the
unit characteristics (thus prohibiting imposition of a minimum variance
criterion). Within each class, a condition derived from the criterion
of unbiasedness has been derived, and possible solutions to that
equation have been proposed. To tie the various classes of estimators
together it may be noted that from the condition of unbiasedness on the
class seven estimator every other estimator can be derived by suitable
suppression (assumption of equality, e.g., r_j = r when j is suppressed)
of the subscripts which denote conditions on the weights. From class
six, one can go to classes one, three and two; and so forth. The
possible directions of movement are indicated by the arrows in the
following diagram:
In considering the variances of the estimators given in Sections
4.4 to 4.10, the class one estimator has been shown to be inferior to
an estimator belonging to class five, to wit:
$$V\Big( \frac{N}{v} \sum_{i \in s_v} x_i \Big) < V\Big( \frac{N}{n} \sum_{t=1}^{n} x_{it} \Big) \quad \text{for } n > 2 .$$
However, as class one is so restricted, this comparison is also
restricted to the case of equal selection probabilities. No such
statement can be made concerning the general case of arbitrary selection
probabilities.
Otherwise, the choice of which of the various estimators to use
will depend on the specific circumstances of the sample to be drawn,
including the choice of the probability system, and possible outside
considerations which will dictate the combination of restrictions to be
applied to the choice of weights for the selected units.
6.2 Recommendations for Future Research
A major objective of this dissertation has been to raise the point
of view that the whole area of sample survey theory needs a theory of
estimation or a set of estimation criteria derived for and applied to
finite samples from finite populations. The field of sample survey
theory should not have to rely on ready-made concepts derived for
infinite populations, which, when applied to finite populations, have to
rely on ideas such as letting both the sample and the population
approach an infinite size.
If such a theory is developed, it will of necessity mean more
emphasis on combinatorial theory in the study of and development of
sample survey theory.
Another area of possible additional research would be the
intermediate area between sampling with replacement which was a subject
of this dissertation and sampling without replacement which was the
subject of the dissertation by Koop (1957). One might have a situation
where replacement of one or more units occurred simultaneously after a
given number of units has been drawn without replacement. Or one might
postulate a sampling scheme where the decision as to whether to replace
a given unit is arbitrary (e.g., the unit might die before replacement
could be effected) or is determined in a systematic or probabilistic
manner.
With the advent of bigger and faster computers, empirical sampling
might be done to investigate the relative efficiencies for the
estimators proposed here for the case of sampling with replacement.
Another topic for investigation along these lines would be the possible
dependence of the relative efficiencies on the structure of the
selection probability vector.
And finally, this dissertation dealt mostly with unbiased linear
estimators. With the large computers for use in computation, it is
undoubtedly desirable to modify the "best linear unbiased" criterion to
include consideration of estimators that are nonlinear and consistent,
but would have a smaller mean square error than the best linear unbiased
estimators.
7.0 LIST OF REFERENCES
Basu, D. 1958. On sampling with and without replacement. Sankhya 20: 287-294.
Bowley, A. L. 1926. Measurement of the precision attained in sampling. Bull. Inst. Inter. Stat. Tome XXII, I-ere Livraison: (1)-(62).
Carmichael, R. D. 1937. Introduction to the Theory of Groups of Finite Order. Dover Publications, Inc., New York (reprinted 1956).
Chrystal, G. 1900. Algebra, Part II. Dover Publications, Inc., New York (reprinted 1961).
Cochran, W. G. 1946. Relative accuracy of systematic and stratified random samples for a certain class of populations. Ann. Math. Stat. 17: 164-177.
Cochran, W. G. 1953. Sampling Techniques. John Wiley and Sons, Inc., New York.
Das, A. C. 1951. On two-phase sampling and sampling with varying probabilities. Bull. Inst. Inter. Stat. 33: 105-112.
Deming, W. E. 1960. Sample Design in Business Research. John Wiley and Sons, Inc., New York.
Edgeworth, F. Y. 1918. On the value of a mean as calculated from a sample. J. Roy. Stat. Soc. 81: 624-632.
Feller, W. 1957. An Introduction to Probability Theory and its Applications, Vol. I, 2nd edn. John Wiley and Sons, Inc., New York.
Fisher, R. A. 1921. On the mathematical foundations of theoretical statistics. Phil. Trans. Roy. Soc. London Ser. A 222: 309-368.
Fisher, R. A. 1925. Theory of statistical estimation. Proc. Cambridge Phil. Soc. 22: 700-725.
Fisher, R. A. 1956. Statistical Methods and Scientific Inference. Hafner Publishing Co., New York.
Fisher, R. A., and Yates, F. 1949. Statistical Tables, 3rd edn. Oliver and Boyd, Ltd., London.
Fraser, D. A. S. 1958. Statistics: An Introduction. John Wiley and Sons, Inc., New York.
Godambe, V. P. 1955. A unified theory of sampling from finite populations. J. Roy. Stat. Soc. Ser. B 17: 269-278.
Godambe, V. P. 1960. An admissible estimate for any sampling design. Sankhya 22: 285-288.
Hansen, M. H. and Hurwitz, W. N. 1943. On the theory of sampling from finite populations. Ann. Math. Stat. 14: 333-362.
Hansen, M. H., Hurwitz, W. N. and Madow, W. G. 1953. Sample Survey Methods and Theory, Vol. II. John Wiley and Sons, Inc., New York.
Horvitz, D. G. and Thompson, D. J. 1952. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47: 663-685.
Isserlis, L. 1916. On the conditions under which the "probable errors" of frequency distributions have a real significance. Proc. Roy. Soc. (London) Ser. A 92: 23-41.
Isserlis, L. 1918. On the value of a mean as calculated from a sample. J. Roy. Stat. Soc. 81: 75-81.
Koop, J. C. 1957. Contributions to the general theory of sampling finite populations without replacement and with unequal probabilities. Unpublished Ph.D. Thesis, North Carolina State College, Raleigh (University Microfilms, Ann Arbor).
Koop, J. C. 1960. On theoretical questions underlying the technique of replicated or interpenetrating samples. Proc. Social Stat. Sect., Am. Stat. Assoc. 1960: 196-205.
Madow, W. G. 1948. On the limiting distribution of estimates based on samples from finite universes. Ann. Math. Stat. 19: 535-545.
Midzuno, H. 1950. An outline of the theory of sampling systems. Ann. Inst. Stat. Math. 1: 149-156.
Mortara, G. 1917. Elementi di statistica. Appunti sulle lexioni di statistica methodologica dettate nel R. Instituto Superiore di studi comerciali di Roma. Rome. p. 356. As cited by Tschuprow (1923).
Murthy, M. N. and Sethi, V. K. 1961. Randomized rounded-off multipliers in sampling theory. J. Am. Stat. Assoc. 56: 328-334.
Nanjamma, N. S., Murthy, M. N. and Sethi, V. K. 1959. Some sampling systems providing unbiased ratio estimators. Sankhya 21: 299-314.
Neyman, J. 1934. On two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. Roy. Stat. Soc. 97: 558-606.
Neyman, J. 1952. Lectures and Conferences on Mathematical Statistics and Probability. Graduate School, U. S. Dept. Agr., Washington, D. C.
Raj, Des. 1958. On the relative accuracy of some sampling techniques. J. Am. Stat. Assoc. 53: 98-101.
Raj, Des and Khamis, S. H. 1958. Some remarks on sampling with replacement. Ann. Math. Stat. 29: 550-557.
Riordan, J. 1958. An Introduction to Combinatorial Analysis. John Wiley and Sons, Inc., New York.
Roy, J. and Chakravarti, I. M. 1960. Estimating the mean of a finite population. Ann. Math. Stat. 31: 392-398.
Seng, Y. P. 1951. Historical survey of the development of sampling theories and practice. J. Roy. Stat. Soc. Ser. A 114: 214-231.
Splawa-Neyman, J. 1925. Contributions to the theory of small samples drawn from a finite population. Biometrika 17: 472-479.
Stevens, W. L. 1937. Significance of grouping. Ann. Eugenics 8: 57-69.
Stevens, W. L. 1958. Sampling without replacement with probability proportional to size. J. Roy. Stat. Soc. Ser. B 20: 393-397.
Sukhatme, P. V. 1953. Sampling Theory of Surveys with Applications. The Indian Society of Agricultural Statistics, New Delhi, India, and The Iowa State College Press, Ames, Iowa.
Sukhatme, P. V. and Narain, R. D. 1952. Sampling with replacement. J. Indian Soc. Agr. Stat. 4: 42-49.
Tschuprow, A. A. 1923. On the mathematical expectation of the moments of frequency distributions in the case of correlated observations. Metron 2(3): 461-493 and 2(4): 646-683.
Whittaker, E. and Robinson, G. 1944. The Calculus of Observations, 4th edn. Blackie and Son, Ltd., London.
Wilks, S. S. 1960. A two-stage scheme for sampling without replacement. Bull. Inst. Inter. Stat. 21(2): 241-248.
Wu-min. 1958. Two ways of compiling statistics. Jenmin jih pao. Peking, China. April 29: 1, 4.
Yates, F. 1953. Sampling Methods for Censuses and Surveys, 2nd edn. Hafner Publishing Co., New York.
Yezhov, A. 1957. Soviet Statistics. Foreign Language Publishing House, Moscow.
Zarkovic, S. S. 1956. Note on the history of sampling methods in Russia. J. Roy. Stat. Soc. Ser. A 119: 336-338.
Zarkovic, S. S. 1960. On the efficiency of sampling with varying probabilities and the selection of units with replacement. Metrika 3: 53-60.
8.0 APPENDIX A
THE DISTRIBUTION OF THE NUMBER OF DISTINCT UNITS IN THE SAMPLE
Let v denote the number of distinct units appearing in a sample
of size n drawn from a finite population of size N, with replacement
of each unit drawn preceding the next draw. Then it is readily apparent
that v is a random variable (1 ≤ v ≤ n) with a distribution dependent
on n and N. Although not all the results of this appendix are used
in the body of this dissertation, the use of generating functions in
this field is of interest.
8.1 Equal Selection Probability Case
Let the probabilities of selection be equal for each of the N
units (i.e., $p_i = p = 1/N$); then an analogy may be drawn between the
distribution of v and that of the number of empty cells when r balls
are randomly distributed among n cells. This classic "occupancy problem"
yields the following formula, as given by Feller (1957, p. 92), for
the probability of having m cells empty when placing r objects into
n cells:

$$P[m] = \Pr(m \text{ cells empty}) = \binom{n}{m} \sum_{s=0}^{n-m} (-1)^{s} \binom{n-m}{s} \left(1 - \frac{m+s}{n}\right)^{r} \tag{8-1}$$
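As a numerical illustration of the occupancy formula, the following minimal Python sketch evaluates the probability of exactly m empty cells in exact rational arithmetic and confirms that the probabilities sum to one. The function name `prob_empty_cells` and the example values r = 4, n = 3 are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from math import comb


def prob_empty_cells(m: int, r: int, n: int) -> Fraction:
    """Probability that exactly m of n cells are empty after r balls
    are placed independently and uniformly at random (occupancy problem)."""
    total = Fraction(0)
    for s in range(n - m + 1):
        # (1 - (m + s)/n)^r written as ((n - m - s)/n)^r
        total += (-1) ** s * comb(n - m, s) * Fraction(n - m - s, n) ** r
    return comb(n, m) * total


# Hypothetical example: r = 4 balls into n = 3 cells.
dist = [prob_empty_cells(m, r=4, n=3) for m in range(4)]
print(dist)       # exact probabilities for m = 0, 1, 2, 3
print(sum(dist))  # the probabilities sum to 1
```

Exact fractions are used here so that the alternating sum is not affected by floating-point cancellation.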
To apply this formula to the distribution of v, note that

Pr(v distinct units in n draws) = Pr(N - v units not drawn or "empty").

Setting, in (8-1),

n = N, m = N - v, r = n

and reversing the order of summation (replacing s by v - s) gives:

$$P(v) = \binom{N}{N-v} \sum_{s=0}^{v} (-1)^{v-s} \binom{v}{v-s} \left(1 - \frac{(N-v)+(v-s)}{N}\right)^{n} \tag{8-2}$$

Using the "differences of zero" notation, (8-2) may be written in a
more elegant form as

$$P(v) = N^{-n} \binom{N}{v} \Delta^{v} 0^{n} \tag{8-3}$$
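The distribution of v in the equal-probability case can be checked numerically. The sketch below assumes the standard expansion of the differences of zero, $\Delta^{v} 0^{n} = \sum_{s=0}^{v} (-1)^{v-s} \binom{v}{s} s^{n}$, computes $P(v) = N^{-n} \binom{N}{v} \Delta^{v} 0^{n}$, and compares it with a brute-force enumeration of all $N^{n}$ ordered samples. The function names and the values N = 4, n = 3 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb


def diff_of_zero(v: int, n: int) -> int:
    """Delta^v 0^n = sum_{s=0}^{v} (-1)^(v-s) C(v, s) s^n, for n >= 1."""
    return sum((-1) ** (v - s) * comb(v, s) * s ** n for s in range(v + 1))


def prob_distinct(v: int, n: int, N: int) -> Fraction:
    """P(v distinct units) for n equal-probability draws with replacement from N units."""
    return Fraction(comb(N, v) * diff_of_zero(v, n), N ** n)


def brute_force(n: int, N: int) -> dict:
    """Enumerate all N^n equally likely ordered samples and tabulate the number of distinct units."""
    counts = {}
    for sample in product(range(N), repeat=n):
        k = len(set(sample))
        counts[k] = counts.get(k, 0) + 1
    return {k: Fraction(c, N ** n) for k, c in counts.items()}


N, n = 4, 3  # hypothetical small population and sample size
formula = {v: prob_distinct(v, n, N) for v in range(1, n + 1)}
print(formula == brute_force(n, N))
```

The enumeration is feasible only for small N and n; it serves purely as a check on the closed form.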
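where $\Delta$ is the usual finite difference operator with unit increment and
$\Delta^{s} 0^{n}$ denotes $\Delta^{s} x^{n}$ evaluated at $x = 0$.
Note that $\Delta^{s} 0^{n} = 0$ for s = 0 and for s > n.
From (8-3) the probability generating function of v can be obtained
as

$$P_v(t) = \sum_{v=0}^{N} t^{v} P(v) = N^{-n} \sum_{v=0}^{N} \binom{N}{v} t^{v} \Delta^{v} 0^{n} = N^{-n} (1 + t\Delta)^{N} 0^{n} \tag{8-4}$$

Further, the factorial generating function is readily obtained from
the probability generating function by substituting (1 + t) for t in
(8-4), to wit

$$F_v(t) = P_v(1 + t) = N^{-n} (1 + \Delta + t\Delta)^{N} 0^{n} = N^{-n} (\mathbf{E} + t\Delta)^{N} 0^{n} \tag{8-5}$$

where $\mathbf{E} = 1 + \Delta$ is the usual increment operator with unit increments.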
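Using $F_v(t)$, (8-5), one can readily compute E(v) and E[v(v-1)], and from
these the mean and variance of v are easily obtained as follows:

$$E(v) = \left.\frac{dF_v(t)}{dt}\right|_{t=0} = N^{1-n}\left[N^{n} - (N-1)^{n}\right] = N\left[1 - \left(\frac{N-1}{N}\right)^{n}\right] \tag{8-6}$$

Since the variance of v is

$$V(v) = E[v(v-1)] + E(v) - [E(v)]^{2} \tag{8-7}$$

first determine:

$$E[v(v-1)] = \left.\frac{d^{2}F_v(t)}{dt^{2}}\right|_{t=0} = N^{-n} N(N-1)\left[N^{n} - 2(N-1)^{n} + (N-2)^{n}\right] \tag{8-8}$$

and then, by substituting this in (8-7), obtain

$$\begin{aligned}
V(v) &= N^{-n} N(N-1)\left[N^{n} - 2(N-1)^{n} + (N-2)^{n}\right] + N^{1-n}\left[N^{n} - (N-1)^{n}\right] \\
&\quad - N^{2-2n}\left[N^{2n} - 2N^{n}(N-1)^{n} + (N-1)^{2n}\right] \\
&= N^{-n} N(N-1)^{n} + N^{-n} N(N-1)(N-2)^{n} - N^{2-2n}(N-1)^{2n} \\
&= N\left(\frac{N-1}{N}\right)^{n} - N^{2}\left(\frac{N-1}{N}\right)^{2n} + N(N-1)\left(\frac{N-2}{N}\right)^{n}
\end{aligned} \tag{8-9}$$

Also, $E(v^{2})$ can be seen to be:

$$\begin{aligned}
E(v^{2}) &= E[v(v-1)] + E(v) \\
&= N^{2-n}\left[N^{n} - 2(N-1)^{n} + (N-2)^{n}\right] + N^{1-n}\left[(N-1)^{n} - (N-2)^{n}\right]
\end{aligned} \tag{8-10}$$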
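The closed forms for the mean, variance and second moment of v can be verified against moments computed directly from the distribution of v. The following is a minimal sketch in exact arithmetic; the function names and the values n = 5, N = 4 are illustrative assumptions, and the distribution is computed from the standard differences-of-zero expansion $\Delta^{v} 0^{n} = \sum_{s} (-1)^{v-s} \binom{v}{s} s^{n}$.

```python
from fractions import Fraction
from math import comb


def pmf_distinct(n: int, N: int) -> dict:
    """Exact distribution of the number of distinct units v (equal probabilities)."""
    def diff_of_zero(v: int) -> int:
        return sum((-1) ** (v - s) * comb(v, s) * s ** n for s in range(v + 1))
    return {v: Fraction(comb(N, v) * diff_of_zero(v), N ** n)
            for v in range(1, min(n, N) + 1)}


def mean_closed(n: int, N: int) -> Fraction:
    """E(v) = N [1 - ((N-1)/N)^n]."""
    return N * (1 - Fraction(N - 1, N) ** n)


def var_closed(n: int, N: int) -> Fraction:
    """V(v) = N a - N^2 a^2 + N (N-1) b, with a = ((N-1)/N)^n and b = ((N-2)/N)^n."""
    a = Fraction(N - 1, N) ** n
    b = Fraction(N - 2, N) ** n
    return N * a - N * N * a * a + N * (N - 1) * b


def second_moment_closed(n: int, N: int) -> Fraction:
    """E(v^2) = N^(2-n) [N^n - 2(N-1)^n + (N-2)^n] + N^(1-n) [(N-1)^n - (N-2)^n]."""
    return (Fraction(N ** 2, N ** n) * (N ** n - 2 * (N - 1) ** n + (N - 2) ** n)
            + Fraction(N, N ** n) * ((N - 1) ** n - (N - 2) ** n))


n, N = 5, 4  # hypothetical sample and population sizes
pmf = pmf_distinct(n, N)
mean = sum(v * p for v, p in pmf.items())
second = sum(v * v * p for v, p in pmf.items())
print(mean == mean_closed(n, N))
print(second - mean ** 2 == var_closed(n, N))
print(second == second_moment_closed(n, N))
```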
8.2 Arbitrary Selection Probability Case
With arbitrary, or unequal, selection probabilities, the analogy
with the "occupancy problem" disappears, and the distribution of v becomes
rather messy. One can, however, obtain expressions for the mean
and variance of v without first obtaining the distribution of v.
Let the characteristic random variable $z_i$ be defined as

$$z_i = \begin{cases} 1 & \text{if the } i\text{-th unit is drawn, regardless of the number of times it appears in the sample,} \\ 0 & \text{if the } i\text{-th unit is not drawn.} \end{cases}$$

Also denote the probability of the i-th unit being drawn on any given
draw as $p_i$, with $\sum_{i=1}^{N} p_i = 1$.
Then, on n draws with replacement, the probability that the i-th
unit is not drawn, i.e., that $z_i$ takes the value zero, is:

$$\Pr(z_i = 0) = q_i^{n}, \quad \text{where } q_i = 1 - p_i,$$

from which it follows that

$$\Pr(z_i = 1) = 1 - q_i^{n}.$$

Thus the expectation of $z_i$ is seen to be

$$E(z_i) = (1)(1 - q_i^{n}) + (0)(q_i^{n}) = 1 - q_i^{n} \tag{8-11}$$
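Now since the number of distinct units equals the sum over all units
in the population of the characteristic random variable, then

$$E(v) = \sum_{i=1}^{N} E(z_i) = \sum_{i=1}^{N} (1 - q_i^{n}) = N - \sum_{i=1}^{N} (1 - p_i)^{n} \tag{8-12}$$

This approach also yields

$$V(v) = V\left(\sum_{i=1}^{N} z_i\right) = \sum_{i=1}^{N} V(z_i) + \sum_{i \neq j}^{N} \operatorname{Cov}(z_i, z_j).$$

Now:

$$V(z_i) = E(z_i^{2}) - [E(z_i)]^{2} = (1 - q_i^{n}) - (1 - q_i^{n})^{2} = q_i^{n}(1 - q_i^{n}) \tag{8-13}$$

$$\begin{aligned}
\operatorname{Cov}(z_i, z_j) &= E(z_i z_j) - E(z_i)E(z_j) \\
&= (1 - q_i^{n} - q_j^{n} + q_{ij}^{n}) - (1 - q_i^{n})(1 - q_j^{n}) \\
&= -(q_i^{n} q_j^{n} - q_{ij}^{n})
\end{aligned} \tag{8-14}$$

where $q_{ij} = 1 - p_i - p_j$,
so that, using (8-13) and (8-14), the variance of v is

$$V(v) = \sum_{i=1}^{N} q_i^{n} - \left(\sum_{i=1}^{N} q_i^{n}\right)^{2} + \sum_{i \neq j}^{N} q_{ij}^{n} \tag{8-15}$$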
These results have also been derived by Basu (1958), but in a very
compressed form.
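The mean and variance expressions for arbitrary selection probabilities can be checked by exhaustive enumeration on a small population. The sketch below assumes the formulas $E(v) = \sum_i (1 - q_i^{n})$ and $V(v) = \sum_i q_i^{n} - (\sum_i q_i^{n})^{2} + \sum_{i \neq j} q_{ij}^{n}$ with $q_i = 1 - p_i$ and $q_{ij} = 1 - p_i - p_j$; the function names and the probability vector (1/2, 1/3, 1/6) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def mean_distinct(p, n):
    """E(v) = sum_i (1 - (1 - p_i)^n)."""
    return sum(1 - (1 - pi) ** n for pi in p)


def var_distinct(p, n):
    """V(v) = sum_i q_i^n - (sum_i q_i^n)^2 + sum_{i != j} (1 - p_i - p_j)^n."""
    N = len(p)
    q_sum = sum((1 - pi) ** n for pi in p)
    cross = sum((1 - p[i] - p[j]) ** n
                for i in range(N) for j in range(N) if i != j)
    return q_sum - q_sum ** 2 + cross


def enumerate_moments(p, n):
    """Exact mean and variance of v by enumerating all ordered samples of size n."""
    mean = Fraction(0)
    second = Fraction(0)
    for sample in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= p[i]          # draws are independent under replacement
        k = len(set(sample))      # number of distinct units in this sample
        mean += prob * k
        second += prob * k * k
    return mean, second - mean ** 2


# Hypothetical selection probabilities for N = 3 units and n = 4 draws.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4
exact_mean, exact_var = enumerate_moments(p, n)
print(exact_mean == mean_distinct(p, n))
print(exact_var == var_distinct(p, n))
```

With equal probabilities $p_i = 1/N$ the same functions reduce to the equal-probability results of Section 8.1.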
9.0 APPENDIX B
A STATISTICAL THEORY OF COMMUNISM
The following is a translation of an article, Wu-min (1958), which
appeared in Jenmin jih pao, the official party newspaper in Communist
China, on April 29, 1958. The government was, at the time, having considerable
difficulty explaining to the world the discrepancy between
the actual production figures for some crops and the stated objectives
of the five-year plan then in effect.
It is reproduced as an illustration of the reaction that can occur
when scientific principles do not produce politically desired results.
The moral, however, is not that samplers should pay sole attention to
statistical theory and methodology at the expense of political considerations
when formulating the problem, but that the desideratum is samplers
who observe considerations of the subject under study and the
national goals which may be involved, and still retain complete objectivity
in compiling, analysing and reporting the sample data.
Two Ways of Compiling Statistics
In reading the report on "Speeding up production by using statistics
in Ho-Pei Province," we see that there are two ways of compiling
and using statistics. One is static and isolated, the other is imposing
and integrated. Statisticians in the past, under metaphysical philosophy,
adhered too closely to regulations and forms, claiming that statistical
workers should assume an extremely detached and cool attitude.
But at the height of our national leap forward in agricultural and
industrial production we cannot stand still; we must march forward with
the mass of the people. The State Statistical Bureau has made a thorough
investigation of past policies and found the following shortcomings:
1. Too much stress on textbooks, report forms, neglecting political
responsibility, observing the rules to each title and letter.
Doing nothing beyond this. A new and improved system of statistics
introduced in Ho-Pei Province has been used with highly effective results.
Statistical workers of the old school, visiting Ho-Pei, have doubted
these results because the new methods cannot be found in their
textbooks.
The value of new methods and experience must be judged by their
contributions to the national welfare and socialist construction. We
must be materialistic and follow the principle of actuality. Most of
the Chinese texts on statistical methods were translated or compiled
from foreign books. No books have yet been written with creative genius
based upon actual experience in China. Therefore government agencies
dealing with statistics and colleges giving statistical courses should
accept the responsibility of accumulating experience in China and compiling
our own textbooks.
2. Too much emphasis on official formulae, which disregard theories
and politics; seeking only concrete figures, forgetting the spirit
of the times. Statisticians should first learn the statistical theory
of Marxism and Leninism, then respond to the demand that China be guided
by those principles and establish its own method of statistics.
3. Too much mystification and self-consciousness among statisticians,
who insist that this work must be done only by specialists.
Thus, they depend only upon their own workers, having no confidence in
other people, and refuse guidance by the government or the party. We
must cooperate with the local population and participate in their production
efforts. This is the lesson demonstrated by the experiment in
Ho-Pei Province.
4. Too much reliance on official rules and procedures; seeking
only figures, disregarding people. Too much preoccupation with writing
reports and filling forms, to the neglect of positive, creative and progressive
work. Statistical workers in Ho-Pei Province have adopted
entirely different methods, which can be summarized under the following
three points:
1) Related their statistical records with the major activities
of the party and the productive labor of the people. This
makes statistics the motive power and guiding force in the
national production leap forward.
2) Maintained political consciousness and guiding principles,
without holding too rigidly to absolute figures, prescribed
procedures and forms, which would waste time.
3) Relied on local authorities and the mass of the people to
bring results and overcome obstacles. In this way Ho-Pei
statistics are based upon actual conditions and the accuracy
of sources can be guaranteed.
The new statistical methods in Ho-Pei have created new experience, new
trends and directives in statistics. It is worthwhile, therefore, to
recommend these methods to all statistical workers in China, and hope
that they will pay special attention to adapting concrete methods to
suit local conditions and purposes.
INSTITUTE OF STATISTICS
NORTH CAROLINA STATE COLLEGE
(Mimeo Series available for distribution at cost)
299. Taeuber, R. C. On sampling with replacement: an axiomatic approach. Ph.D. Thesis. November, 1961.