DEVELOPMENT OF THE NOTION OF STATISTICAL DEPENDENCE*
H.O. Lancaster
(received 21 June, 1971)
1. Early work
Naturally, the study of the distribution of single random variables
had preceded that of pairs or sets of random variables. However,
it is a surprise to find that the first steps towards a general theory
of dependence were taken as a result of Francis Galton's interest in
genetics. The explanation seems to be that the astronomers were
accustomed to assume mutual independence of the errors in their
measurements of the observables and they were not expecting the
observables, for example, the positions of the stars, to be related to
one another by any mathematical laws or for any physical reason. In
biology, the observables such as height of father and height of son
were obviously mutually dependent and yet not mutually determined - a
type of problem not met by the astronomers.
However, the astronomers had obtained the multivariate normal
distribution without carrying through all its implications. Laplace
(1811), Plana (1813), Gauss (1823) and Bravais (1846) all derived normal
correlations as the joint distribution of linear forms in independently
distributed normal variables but did not define a coefficient of
correlation. They expressed the density function as a constant times the
exponential of a general quadratic form. Bravais (1846) in his study on
the errors in the position of a point gave the normal density function
as
(1.1)    $\text{constant} \times \exp\{-(ax^2 + 2exy + by^2)\}.$
* Invited address delivered at the sixth New Zealand Mathematics
Colloquium, held at Wellington, 17-19 May, 1971.
Math. Chronicle 2 (1972), 1-16.
The ellipses of equal probability density are then given by
(1.2)    $ax^2 + 2exy + by^2 = D$
as stated on his page 272. For each ellipse, the horizontal tangents
could be drawn and each point of contact lay on a straight line through
the origin, a diameter of the ellipse conjugate to the x axis. This is
the line of the modes of the conditional distributions of X for given
Y = y, since at every other point on this horizontal line the quadratic
form, $ax^2 + 2exy + by^2$, corresponds to an ellipse with a greater value
of the D of equation (1.2). This is also the line of the conditional
means of X but Bravais did not make this point. Bravais (1846) also
determined the principal axes and gave the equation in the form,
(1.3)    [equation not recoverable from the source text]
He also determined $\varpi$, the probability inside the ellipses of equal
probability, in the form,
(1.4)    $d\varpi = e^{-D}\,dD$
and
(1.5)    $\varpi = 1 - e^{-D}.$
These equations (1.4) and (1.5) can be regarded as a statement of the
distribution of $\chi^2$ with two degrees of freedom or even as an
elementary form of the Pearson lemma. Bravais (1846) determined also the
principal axes and the value of the constant in the equation for the
density in three dimensions of random variables X, Y and Z by
successively integrating out the variables. He did not display the
bivariate density as the product of a marginal density and a conditional
density.
K. Pearson (1920) suggested that the interpretations of K. Pearson
(1895) and many later authors were incorrect; for Bravais (1846) was
not proposing a theory of the joint distribution of the observables but
only of the errors. As already mentioned, Francis Galton was the first
worker to study the mutual dependence of one variable on another as they
appear in the physical world, namely, the relations between heights and
other attributes in human parents and offspring. He had found it
difficult to obtain sufficient data in human subjects and so was
examining the produce of seeds of the sweet pea. Galton (1877) observed
the phenomenon of 'reversion' whereby the mean of the produce of large
seeds was always closer to the 'average ancestral type' than the parents
and similarly for the produce of small seeds; he gave biological
reasons, such as differential mortality, for this phenomenon. In later
papers, he used 'regression' rather than 'reversion' to the mean or
'mediocrity'. The reversion coefficient may be written as $\lambda$ and
$y = \lambda x$ is the line of reversion or regression of Y on X. If the
variance of Y, the offspring, is to be equal to the variance of the
parent, X, and if homoscedasticity, that is, a uniform conditional
variance, be assumed then the variance of Y can be partitioned into the
conditional variance and the variance of conditional means
(1.6)    $\operatorname{var} Y = (1 - r^2)\operatorname{var} Y + \operatorname{var} \hat{y} = (1 - r^2)\operatorname{var} Y + \lambda^2 \operatorname{var} X,$
and $\lambda = r$ if var X = var Y, so that the variance is to be
preserved from generation to generation. In this earlier paper, Galton
(1877) did not seem to realise that there are two regression lines.
However, Galton (1886a) gave a diagram displaying them both; it is of
interest that he obtained, by smoothing an observed distribution of
heights of offspring, the ellipses of equal density such as Bravais
(1846) had determined in the theoretical distribution. Galton (1886b)
enlisted the help of Dickson (1886), who wrote the joint distribution as
a product of the marginal distribution of Y and the conditional
distribution of X.
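The partition of var Y into a conditional variance and a variance of conditional means can be checked by simulation. The following is a minimal sketch, not part of the original address: the function names, the sample size, the seed and the value r = 0.5 are all assumptions chosen for illustration, and the pairs are drawn from a bivariate normal distribution with equal variances.

```python
import random

def simulate_pairs(r, n=20000, seed=1):
    """Draw n (parent, offspring) pairs with unit variances and correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y = r*x + sqrt(1 - r^2)*e has variance 1 and correlation r with x.
        y = r * x + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs

def variance_partition(pairs):
    """Return sample var Y, residual variance, lambda^2 var X, lambda and r."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    lam = cov_xy / var_x                        # least-squares slope of Y on X
    r_hat = cov_xy / (var_x * var_y) ** 0.5     # sample correlation
    residual = sum((y - mean_y - lam * (x - mean_x)) ** 2 for x, y in pairs) / n
    explained = lam ** 2 * var_x                # variance of the conditional means
    return var_y, residual, explained, lam, r_hat

var_y, residual, explained, lam, r_hat = variance_partition(simulate_pairs(r=0.5))
print(f"var Y = {var_y:.4f}, residual + explained = {residual + explained:.4f}")
print(f"lambda = {lam:.4f}, r = {r_hat:.4f}")
```

In the sample the identity var Y = residual + explained holds exactly for the least-squares line, while the near equality of the slope and the correlation reflects only the assumed equality of the two variances.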
Now it appears that Dickson (1886) could well have taken notice of
several other themes in the multivariate normal distribution. Thus
Bienaymé (1852) had generalized the central limit theorems of Laplace
and of Fourier to several variables. However, he had then made a linear
transformation of his variables to obtain the distribution of the
theoretical $\chi^2$ for he was not interested in the joint distributions
of the variables or errors. Todhunter (1869) also had examined errors
jointly normally distributed and in an appendix to this note, Cayley
(1869) had given the general expression for the constant multiplier of
the density, $\text{constant} \times \exp(-\tfrac{1}{2} x^T C x)$. Further,
Todhunter (1869) had obtained the characteristic function for the normal
correlation.
Herschel (1850) in an extended review of Quetelet's (1846)
Lettres à S.A.R. le duc ... had shown how the normal distribution could
be derived as the only possible distribution of errors in two dimensions
such that it was possible to obtain two distinct pairs of independent
random variables, related by an orthogonal transformation. Indeed,
Herschel (1850) sketched a proof of the proposition, that if X is
independent of Y and if also $X\cos\theta + Y\sin\theta$ is independent
of $-X\sin\theta + Y\cos\theta$, then X is normal and Y is normal. This
characterization was bitterly attacked by Ellis (1850). However, the
hypotheses were stated more exactly by Boole (1854), who made the
explicit assumptions that the variables X and Y were mutually
independent and that the density was dependent solely on the distance
from the origin; a functional equation resulted,
(1.7)    $f(x^2)\,f(y^2) = f(x^2 + y^2)\,f(0),$
which had a solution of the form,
(1.8)    $f(x^2) = (2\pi)^{-1/2}\exp(-\tfrac{1}{2}x^2),$
since $\int f(x^2)\,dx = 1$, and so the distribution was normal. Boole
(1854) also clearly recognized that there could not be a universal law
of errors.
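The role of Boole's functional equation can be illustrated numerically. The sketch below is not Boole's argument: it assumes the equation is read as f(x^2) f(y^2) = f(x^2 + y^2) f(0), and it compares the normal solution with a double-exponential density, which is introduced here purely as a hypothetical non-normal comparison. The function names and the test points are likewise assumptions.

```python
import math

def normal_f(t):
    """Normal solution written as a function of t = x^2."""
    return math.exp(-0.5 * t) / math.sqrt(2.0 * math.pi)

def laplace_f(t):
    """Double-exponential density 0.5*exp(-|x|), written as a function of t = x^2."""
    return 0.5 * math.exp(-math.sqrt(t))

def functional_gap(f, x, y):
    """Return f(x^2) f(y^2) - f(x^2 + y^2) f(0); zero when the equation holds."""
    return f(x * x) * f(y * y) - f(x * x + y * y) * f(0.0)

# Illustrative test points only.
points = [(0.3, 1.2), (1.0, 1.0), (2.0, 0.5)]
normal_gaps = [functional_gap(normal_f, x, y) for x, y in points]
laplace_gaps = [functional_gap(laplace_f, x, y) for x, y in points]

for (x, y), g_norm, g_lap in zip(points, normal_gaps, laplace_gaps):
    print(f"x={x}, y={y}: normal gap={g_norm:.2e}, double-exponential gap={g_lap:.2e}")
```

The normal density satisfies the equation to rounding error, whereas the product of two independent double-exponential densities is not a function of the distance from the origin alone, so the gap is visibly non-zero.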
Assumptions closely related to those of Herschel (1850) were used
explicitly by Maxwell (1860, 1867) in his derivation of the kinetic
theory of gases. He deduced the normal distribution of the velocities of
the gas molecules by assuming that the components of the velocities of
the molecules along any orthogonal set of axes are mutually independent.
As is noted at page 122 of Plummer (1940) the assumption of independence
of the components along three orthogonal axes was disputed and
subsequently replaced by an analysis of the dynamical conditions; Plummer
(1940) praises Herschel (1850) for realising that some assumptions
have to be made to reach a solution and for making simple and explicit
assumptions, effective in leading to a solution. Many modern writers
have overlooked the point in Maxwell's discussion and have assumed both
the mutual independence about arbitrary orthogonal axes and the normality
and so have multiplied their hypotheses unnecessarily.
The multinomial distribution is perhaps the simplest of the joint
distributions and was early studied. Bienaymé (1838) gave the asymptotic
distribution of linear forms in the cell frequencies of the multinomial
suitably standardised and also the asymptotic expression for the
probability of the set of cell frequencies. The exponent in this
expression is very close to Pearson's $\chi^2$; in fact it is the sum
$\sum (\text{observed} - \text{expected})^2 / \text{observed}$. This
contribution has been almost entirely overlooked by subsequent authors
although Meyer (1874) mentions it.
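The closeness of the two sums is easily seen on a small example. The following sketch compares the sum with the observed frequencies in the denominator against the Pearson form with the expected frequencies in the denominator; the function name and the cell frequencies are hypothetical and serve only as an illustration.

```python
def chi_square_sums(observed, expected):
    """Return (Pearson sum with expected in the denominator,
    the alternative sum with observed in the denominator)."""
    pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    observed_denominator = sum((o - e) ** 2 / o for o, e in zip(observed, expected))
    return pearson, observed_denominator

# Hypothetical multinomial sample of N = 100 over four equiprobable cells.
observed = [18, 22, 31, 29]
expected = [25.0, 25.0, 25.0, 25.0]

pearson, observed_denominator = chi_square_sums(observed, expected)
print(f"sum (O - E)^2 / E = {pearson:.4f}")
print(f"sum (O - E)^2 / O = {observed_denominator:.4f}")
```

The two statistics differ only in the denominators, so they agree closely whenever the observed and expected frequencies are themselves close, as they are in large samples under the hypothesis.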
2. The development of the notion of dependence by Karl Pearson
The best introduction to the next era of development is the statement
of K. Pearson (1920), who wrote
'It will be seen from what has gone before that in 1892 the next steps
to be taken were clearly indicated. They were, I think,
(a) The abolition of the median and quartile processes as too inexact
for accurate statistics.
(b) The replacement of the laborious processes of dividing by the
quartiles and averaging the deduced values of r, by a direct and if
possible "best" method of finding r.
(c) The determination of the probable errors of r as found by the "best"
and other methods.
(d) The expression of the multiple correlation surface in an adequate
and simple form.
These problems were solved by Dr Sheppard or myself before the end
of 1897.
Closely associated with these problems arose the question of
generalising correlation. Why should the distribution be Gaussian,
why should the regression curve be linear?
As early as 1893 I dealt with quite a number of correlation tables
for long series and was able to demonstrate
(i) by applying Galton's process of drawing contours of equal
frequency that most smooth and definite systems of contours can arise
from long series, obviously mathematical families of curves, which are
(a) ovaloid, not ellipsoid, and (b) which do not possess - like the
normal surface contours - more than one axis of symmetry.
(ii) that regression curves can be quite smooth mathematical curves
differing widely from straight lines
(iii) that in cases wherein (i) and (ii) hold, homoscedasticity is not
the rule.
I obtained differential equations to such systems, but for more
than 25 years while often returning to them, have failed to obtain their
integration.
This seems to me the desideratum of the theory of correlation at
the present time: the discovery of an appropriate system of surfaces,
which will give bi-variate skew frequency. We want to free ourselves
from the limitations of the normal surface, as we have from the normal
curve of errors.'
To Karl Pearson and W.F.R. Weldon, the importance of Galton's
work for future studies of heredity and evolution was clear. Weldon
(1890 and 1892) began to accumulate empirical distributions in one and
two dimensions in order to compare variations in biological variables in
space and time. To explain the non-normality of one of Weldon's
empirical distributions Pearson (1894) solved the problem of how to
dissect a mixture of normal distributions into its component normal
curves and so began his great series on the Contributions to the
mathematical theory of evolution. Pearson (1895b, 1901 and 1916a)
introduced his system of curves, which have since found many
applications in statistics, as the solutions to a hypergeometric
differential equation. This series of curves appears naturally in the
solution of differential equations giving the limiting or equilibrium
forms of distribution for certain diffusion processes and determining
bivariate densities which have 'diagonal' or homogenous polynomial
expansions.
K. Pearson (1895b) constructed other joint distributions by
sampling experiments with card packs and roulette wheels; he noted that
if the material 'obeys a law of skew distribution, the theory of
correlation as developed by Galton and Dickson requires considerable
modification.' Pearson (1896) gave some notes on the history of
multivariate normality, in which he assigned rather more importance, as
Pearson (1920) was later to remark, to the memoir of Bravais (1846) than
it deserved in the particular context, for Bravais (1846) had considered
neither regression nor conditional distributions and had not paid any
special attention to the coefficient of correlation. In the same article,
Pearson (1896) introduced the multivariate normal distribution by making
a linear transformation from mutually independent centred normal
variables, the elements of a vector, $\xi$, to the elements of a vector,
$\eta$,
(2.1)    $\eta = A\xi,$
where A is m × n, m < n and of rank m, a transformation which had already
been used by Laplace (1810) and Plana (1813) and also by Todhunter (1869).
By integrating out superfluous variables, Pearson (1896) obtained the
joint density function,
(2.2)    $f(\eta_1, \eta_2, \ldots, \eta_m) = \text{constant} \times \exp(-\tfrac{1}{2}\chi^2),$
where the expression $\chi^2$ appears for the first time in the
literature as a quadratic form in the variables. If m = 2, there was
linear regression between the two variables and the conditional variance
of one variable for fixed values of the other was reduced in the ratio,
$(1 - \rho^2)$, where $\rho$ was the coefficient of correlation. Pearson
(1896) gave the general formula for the bivariate density,
(2.3)    $f(x, y) = \text{constant} \times \exp[-\tfrac{1}{2}(g_1 x^2 + 2hxy + g_2 y^2)],$
and derived from it the standard form as we now know it. He then
showed how to estimate r, the coefficient of correlation in a sample,
by maximum likelihood. He extended the analysis to normal distributions
in three and higher dimensions. The reader cannot but wish that he had
separated out the purely mathematical or distributional theory from the
genetical applications, for the mixture of the two aspects is very
confusing. It appears to the modern reader of Pearson (1896), Edgeworth
(1892) and Sheppard (1898) that Pearson had a much better understanding
of the mathematics and the applications than either of the others,
although this view has not been generally held. Pearson (1896) obtained
the general formula for the multivariate normal, namely
(2.4)    $f(x) = (2\pi)^{-m/2}\,|C|^{1/2} \exp(-\tfrac{1}{2} x^T C x),$
a result already known to Todhunter (1869) and Cayley (1869).
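The construction of correlated normal variables as linear forms in independent normal variables can be sketched in a few lines. The example below is an illustration only, not Pearson's computation: the 2 × 3 matrix, the seed, the sample size and the helper names are assumptions. It forms the vector of linear forms from three independent standard normal variables and compares the sample covariance matrix with the product of the matrix and its transpose.

```python
import random

def transform(a, xi):
    """Return eta = A xi for a matrix A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, xi)) for row in a]

def a_a_transpose(a):
    """Return A A^T, the covariance matrix of eta when xi has identity covariance."""
    return [[sum(p * q for p, q in zip(row_i, row_j)) for row_j in a] for row_i in a]

def sample_covariance(samples):
    """Sample covariance matrix (divisor n) of a list of equal-length vectors."""
    n, m = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(m)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / n
             for j in range(m)] for i in range(m)]

# Hypothetical 2 x 3 matrix of rank 2 (m = 2 < n = 3); values are illustrative only.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 2.0]]

rng = random.Random(7)
etas = [transform(A, [rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(40000)]

theoretical = a_a_transpose(A)
empirical = sample_covariance(etas)
rho = theoretical[0][1] / (theoretical[0][0] * theoretical[1][1]) ** 0.5

print("A A^T            =", theoretical)
print("sample covariance =", [[round(v, 3) for v in row] for row in empirical])
print("rho =", rho, " conditional variance ratio 1 - rho^2 =", 1.0 - rho ** 2)
```

The two components share the second independent variable, and it is this common element that produces the correlation; the third variable enters only the second linear form and is, in effect, integrated out of the first.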
It should be noted that at this time, 1896, few joint
distributions, not simply products of the marginal distributions, were
known and the lack of such examples and a corresponding theory led such
capable mathematicians as H.W. Watson (1891) into serious difficulties.
There was also lacking a clear notion of multivariate measure or
distribution, free of the notion of sample. Thus we find the multiplier N
appearing in formulae, as in the first formula of Section 5 of Pearson
(1896) in, what seems to us, a rather irrelevant manner. Perhaps, only
after the axiomatizations of probability of Kolmogorov (1933), for
example, were clear statements possible. Appropriate practical examples
were also lacking. Meteorological or vital statistical examples lacked a
mathematical model to explain their form. A simple model was possible in
genetics. Galton, Weldon and Pearson all believed that correlation or
lack of independence in the distribution of attributes in relatives was
usually brought about by the possession of random elements in common,
that is, the presence or absence of genetic factors. To imitate such
a genetic model, Weldon (1906) carried out dice throwing experiments
which yielded a bivariate binomial distribution, with marginal
distributions of the form, $(\tfrac{1}{2} + \tfrac{1}{2})^{12}$, and six
random variables held in common. Pearson (1895b) too was carrying out
experiments with teetotums (roulette wheels) and card packs to obtain
empirical joint distributions.
Some novel applications of the theory were now possible. Pearson and
Filon (1898) obtained the joint distribution of the errors in estimates
with the aid of some plausible simplifying assumptions such as joint
normality, thus extending the theory in which mutual independence had
been assumed. They were also able to consider the effect of sampling
on one subset of the variables on the joint distribution of the
complementary subset. Now that Pearson (1895b) had described a number of
different frequency curves, it became for the first time practical to ask
whether empirical distributions were better fitted by one theoretical
distribution or another. Sheppard (1898b) proposed to make such
comparisons by treating the difference, observed-expected, as a binomial
variable, for example, for each of the cells of a contingency table.
As Sheppard (1929) was later to remark, he had paid insufficient
attention in 1898 to the fact that these binomial variables were
correlated. Pearson (1900a) with his knowledge of the multivariate normal
theory was able to attain the solution, the Pearson $\chi^2$, which could
be used as a criterion to test goodness of fit. Pearson (1900a) made the
assumption that variables having marginal normal distributions were
jointly normal. In the asymptotically normal distribution arising from
the multinomial, this assumption can be justified by the use of the
generalised central limit theorem of Bernstein (1926). A closely related
theme had been developed by Bienaymé (1838) who had obtained the
asymptotically normal distribution of linear forms in the cell contents
of a multinomial distribution. Later, Sheppard (1898) independently
obtained this result. The idea is useful, for Fréchet (1951) has since
defined multivariate normality by the property that every linear form in
the variables is normal.
Pearson (1904) considered multivariate distributions in qualitative
variables, for which he introduced the term, 'contingency' for any
measure of the total deviation of the classification from independent
probability. These measures might in some cases be independent of the
ordering; he was, therefore, making a new departure freeing the theory of
the necessity to give primacy to the product moment correlation. For
bivariate distributions, he defined $\phi^2$, the integral of the square
of the likelihood ratio of the bivariate density to the product of the
marginal densities, taken with respect to the product measure. He
concluded that he had generalized the work of Yule (1900, 1901 and 1903).
Pearson (1905) in effect defined correlation in a general sense
to mean that the distribution of Y was not the same for every value of
X, that is, $P(Y \in B \mid X \in A)$ was not constant for every choice of
the set A. He considered the conditional means and the variance of the
conditional mean and the correlation ratio.
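The distinction between the correlation ratio and the product moment correlation can be shown on a small example. The sketch below computes the squared correlation ratio as the variance of the conditional means of Y divided by the total variance of Y, and sets it beside the squared product moment correlation. The function names and the data, in which Y is an exact quadratic function of X, are hypothetical and chosen only to make the contrast plain.

```python
from collections import defaultdict

def correlation_ratio_squared(pairs):
    """Variance of the conditional means of Y given X, divided by var Y."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    n = len(pairs)
    grand_mean = sum(y for _, y in pairs) / n
    total_var = sum((y - grand_mean) ** 2 for _, y in pairs) / n
    between_var = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    ) / n
    return between_var / total_var

def correlation_squared(pairs):
    """Square of the product moment correlation of X and Y."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    return cov_xy ** 2 / (var_x * var_y)

# Hypothetical data with a purely non-linear regression: y = x^2.
pairs = [(x, x * x) for x in (-2, -1, 0, 1, 2)]

eta_squared = correlation_ratio_squared(pairs)
r_squared = correlation_squared(pairs)
print(f"eta^2 = {eta_squared:.4f}, r^2 = {r_squared:.4f}")
```

Here Y is completely determined by X, yet the product moment correlation vanishes by symmetry; the correlation ratio registers the dependence because it is built on the conditional means rather than on a fitted straight line.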
Pearson and Heron (1913) were concerned mainly with the estimation
of indexes of correlation rather than with the introduction of new
theory. Pearson introduced a notion here and in other articles, that in a
partitioned marginal space of 'discrete' variables, the variable could
be regarded as approximating in some way to an underlying continuous
variable. In the discrete space formed from the space of a continuous
random variable, there is indeed a linear form in the indicator
variables of the cells of the condensed space which has maximum
correlation with the continuous marginal variable and this maximum
correlation tends to unity as the partition is appropriately refined.
The introduction of a new system of distributions in one dimension
was a great step forward, but K. Pearson was much less fortunate in his
generalizations of the idea to two-dimensional distributions; indeed,
Pearson (1923a) noted that, after the introduction of his system of
curves to graduate empirical frequency distributions in one dimension, he
had attempted to describe a system of bivariate distributions as the
solutions of a differential equation but that much work had failed to
provide such a system. Certain particular results had been obtained; in
general, a bivariate distribution could not be represented as a product
distribution by a rotation of axes, for example, if the density was
referred to the 'principal inertial system of the contour system' of the
bivariate density, a set of independent random variables was not thereby
obtained, as it would be in the multivariate normal distributions.
E.C. Rhodes (1923) and L.N.G. Filon had obtained some special forms of
bivariate densities, which they had not published at the time of the
completion of their work. Pearson (1923a) was rather concerned that
the correlation between such variables appeared to be determined by
the marginal distributions. In Pearson (1923a and b) some surfaces
given by analytic expressions were examined for properties such as
linear regression. If homoscedasticity were imposed, the normal
distribution was characterized. Pearson (1925) attempted to extend to two
dimensions the Charlier method of approximating to the normal in one
dimension. He wished to obtain bivariate surfaces for arbitrary values
of the sample size and moments and mixed moments up to the 4th order,
the fifteen-constant bivariate density. However, we cannot regard
this effort as productive as there are too many parameters to be fitted
and it is possible that negative values of the density may result.
The most successful attempt to obtain bivariate densities was,
according to Pearson (1923a), made by S. Narumi (1923a, b and c).
Narumi (1923a) had begun by considering a 'regression function', by
which he meant any functions of the form $y_x = y(x)$ and $x_y = x(y)$,
such as the mode of the conditional distribution for given x, the
conditional mean, a linear function in the independent variable.
'Homoscedasticity' was reinterpreted as the same form of conditional
distribution for each value of the independent variable. He then
considered various differential equations. It is of interest that
if the centres of location of the conditional distributions are
situated on straight lines and if the conditional distributions have
the same form, then there are but few possibilities, for either the
marginal variables are mutually independent, the distribution is
jointly normal or the joint distribution is degenerate. It is clear
from Narumi (1923b) that this approach leads to very heavy algebra
and functional equations, difficult to solve except in the normal
distribution. The principal extension of the theory due to Narumi,
Rhodes and Pearson may be said to be the densities of the form,
$f(x) = (1 \pm x^T A x)^a$, the bivariate hypergeometric distribution and
the multivariate hypergeometric distribution and the multivariate
binomial distribution. Bernstein (1927) obtained the independence
distributions, the joint normal distribution and some other special
distributions as a result of his investigations in this problem.
REFERENCES
Bernstein, S.N. (1926). Sur l'extension du théorème limite du calcul
des probabilités aux sommes de quantités dépendantes, Math. Ann.
97, 1-59.
Bernstein, S. (1927). Fondements géométriques de la théorie des
corrélations, Metron 7(2), 1-27.
Bienaymé, J. (1838). Sur la probabilité des résultats moyens des
observations; démonstration directe de la règle de Laplace, Mémor.
Sav. Etrangers Acad. Sci. Paris 5, 513-558.
Bienaymé, J. (1852). Sur la probabilité des erreurs d'après la méthode
des moindres carrés, J. Math. Pures Appl. 17, 33-78.
Boole, G. (1854). On a general method in the theory of probabilities,
Phil. Mag. (4) 8, (reprinted in Studies in probability and
statistics (1952), Watts and Co., London).
Bravais, A. (1846). Analyse mathématique sur les probabilités des
erreurs de situation d'un point, Mém. de l'Instit. de France 9,
255-332.
Cayley, A. (1869). See Todhunter (1869).
Dickson, J.D.H. (1886). Appendix to Galton (1886b), Proc. Roy. Soc. 40,
63-73.
Edgeworth, F.Y. (1892). The law of error and correlated averages, Phil.
Mag. (5) 34, 429-438 and 518-526.
Ellis, R.L. (1850). Remarks on an alleged proof of the 'Method of Least
Squares' contained in a late number of the Edinburgh Review,
Phil. Mag. (3) 37, 321-8 and 462.
Fréchet, M. (1951). Généralisations de la loi de probabilité de Laplace,
Ann. Instit. Henri Poincaré 12, 1-29.
Galton, F. (1886a). Regression towards mediocrity in hereditary stature,
J. Anthrop. Inst. 15, 246-263.
Galton, F. (1886b). Family likeness in stature, Proc. Roy. Soc. 40,
42-63.
Galton, F. (1877). Typical laws of heredity, Proc. Royal Instit. Gt.
Britain 8, 282-301.
Galton, F. (1908). Memories of my life, E.P. Dutton and Co., New York.
Gauss, C.F. (1823). Anwendung der Wahrscheinlichkeitsrechnung auf eine
Aufgabe der praktischen Geometrie, Astron. Nachr. 1, 81-88.
Herschel, J. (Anonymously) (1850). A review of Lettres à S.A.R. le duc
régnant de Saxe-Cobourg et Gotha sur la théorie des probabilités
appliquée aux sciences morales et politiques, by M.A. Quetelet
(1846), Edinburgh Review 92, 1-57.
Kolmogorov, A.N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung,
Ergebnisse der Math. 2(3), 196-262.
Laplace, P.S. (1811). Mémoires sur les intégrales définies et leur
application aux probabilités, Mémoires de l'Institut Impérial de
France, 1811, 279-347.
Meyer, A. (1874). Calcul des probabilités de A. Meyer publié sur les
manuscrits de l'auteur par F. Folie, Mém. Soc. Liège, 6(2),
x+446 pp.
Maxwell, J.C. (1860). Illustrations of the dynamical theory of gases,
Phil. Mag. (4) 19, 19-32 and (4) 20, 21-37.
Maxwell, J.C. (1867). On the dynamical theory of gases, Philos. Trans.
Roy. Soc. Lond. 157, 49-88.
Narumi, S. (1923a, b and c). On the general forms of bivariate frequency
distributions which are mathematically possible when regression
and variation are subjected to limiting conditions. Parts I and II,
Biometrika 15, 77-88 and 209-221.
Pearson, K. (1894). Contribution to the theory of evolution, Philos.
Trans. Roy. Soc. Ser. A. 185, 71-110.
Pearson, K. (1895a). Note on regression and inheritance in the case of
two parents, Proc. Roy. Soc. 58, 240-241.
Pearson, K. (1895b). Contributions to the mathematical theory of
evolution. II. Skew variation in homogeneous material, Philos. Trans.
Roy. Soc. Ser. A. 186, 343-414.
Pearson, K. (1896). Mathematical contributions to the theory of evolution.
III. Regression, heredity and panmixia, Philos. Trans. Roy. Soc.
Ser. A. 187, 253-318.
Pearson, K. (1900a). On the criterion that a given system of deviations
from the probable in the case of a correlated system of variables
is such that it can be reasonably supposed to have arisen from
random sampling, Philos. Mag. 50, 157-175.
Pearson, K. (1901). Mathematical contributions to the theory of
evolution. X. Supplement to a memoir on skew variation, Philos. Trans.
Roy. Soc. Ser. A. 197, 443-459.
Pearson, K. (1904). Mathematical contributions to the theory of evolution.
XIII. On the theory of contingency and its relations to
association and normal correlation, Drapers' Company Research Memoirs,
Biometric Series 1. 35pp.
Pearson, K. (1905). Mathematical contributions to the theory of
evolution. XIV. On the general theory of skew correlation and non-linear
regression. Drapers' Company Research Memoirs, Biometric Series.
ii+54.
Pearson, K. (1916a). Mathematical contributions to the theory of
evolution. XIX. Second supplement to a memoir on skew variation, Philos.
Trans. Roy. Soc. Ser. A. 216, 429-457.
Pearson, K. (1920). Notes on the history of correlation, Biometrika 13,
25-45.
Pearson, K. (1923a). Notes on skew frequency surfaces, Biometrika 15,
222-30.
Pearson, K. (1925). The fifteen constant bivariate frequency surface,
Biometrika 17, 268-313.
Pearson, K. and Filon, L.N.G. (1898). Mathematical contributions to the
theory of evolution. IV. On the probable errors of frequency
constants and on the influence of random selection on variation
and correlation, Philos. Trans. Roy. Soc. Ser. A. 186, 343-414.
Pearson, K. and Heron, D. (1913). On theories of association,
Biometrika 9, 159-315.
Plana, G.A.A. (1813). Mémoire sur divers problèmes de probabilité,
Mém. Accad. Imper. Turin 20 (années 1811-1812), 355-408.
Plummer, H.C. (1940). Probability and frequency, Macmillan and Co.,
London, xi+277.
Quetelet, A. (1846). Lettres à S.A.R. le duc régnant de Saxe-Cobourg
et Gotha sur la théorie des probabilités appliquée aux sciences
morales et politiques, Hayez, Bruxelles, iv+450pp. (English transl.
by O.G. Downes (1849). Charles and Edward Cayton, London.)
Rhodes, E.C. (1923). On a certain skew correlation surface, Biometrika
14, 355-377.
Sheppard, W.F. (1898a). On the application of the theory of error to
cases of normal distribution and normal correlation, Philos. Trans.
Roy. Soc. Ser. A. 192, 101-167.
Sheppard, W.F. (1898b). On the geometrical treatment of the 'Normal Curve'
of statistics with special reference to correlation and to the
theory of errors, Proc. Roy. Soc. 62, 170-173.
Sheppard, W.F. (1929). The fit of formulae for discrepant observations,
Philos. Trans. Roy. Soc. Ser. A. 228, 115-150.
Todhunter, I. (1869). On the method of least squares, Trans. Camb. Philos.
Soc. 11, 219-238.
Watson, H.W. (1891). Observations on the law of facility of errors.
Proc. Birmingham Phil. Soc. 7, 289-318.
Weldon, W.F.R. (1890). The variation occurring in the Decapod Crustacea.
- 1. Crangon vulgaris, Proc. Roy. Soc. 47, 445-453.
Weldon, W.F.R. (1892). Certain correlated variations in Crangon vulgaris,
Proc. Roy. Soc. 51, 2-21.
Weldon, W.F.R. (1906). Inheritance in animals and plants, pp. 81-109 in
T.B. Strong, Lectures on the method of science, Clarendon Press,
Oxford, viii+249.
Yule, G.U. (1900). On the association of attributes in statistics,
Philos. Trans. Roy. Soc. Ser. A. 194, 257-319.
Yule, G.U. (1901). On the theory of consistence of logical class
frequencies and its geometrical representation, Philos. Trans.
Roy. Soc. Ser. A. 197, 91-134.
Yule, G.U. (1903). Notes on the theory of association of attributes in
statistics, Biometrika 2, 121-134.
University of Sydney