DEVELOPMENT OF THE NOTION OF STATISTICAL DEPENDENCE*

H.O. Lancaster

(received 21 June, 1971)

1. Early work

Naturally, the study of the distribution of single random variables had preceded that of pairs or sets of random variables. However, it is a surprise to find that the first steps towards a general theory of dependence were taken as a result of Francis Galton's interest in genetics. The explanation seems to be that the astronomers were accustomed to assume mutual independence of the errors in their measurements of the observables and they were not expecting the observables, for example, the positions of the stars, to be related to one another by any mathematical laws or for any physical reason. In biology, the observables such as height of father and height of son were obviously mutually dependent and yet not mutually determined, a type of problem not met by the astronomers.

However, the astronomers had obtained the multivariate normal distribution without carrying through all its implications. Laplace (1811), Plana (1813), Gauss (1823) and Bravais (1846) all derived normal correlations as the joint distribution of linear forms in independently distributed normal variables but did not define a coefficient of correlation. They expressed the density function as a constant times the exponential of a general quadratic form. Bravais (1846) in his study on the errors in the position of a point gave the normal density function as

$$ \frac{K}{\pi}\, e^{-(ax^2 + 2exy + by^2)}. \tag{1.1} $$

The ellipses of equal probability density are then given by

$$ ax^2 + 2exy + by^2 = D \tag{1.2} $$

as stated on his page 272. For each ellipse, the horizontal tangents could be drawn and each point of contact lay on a straight line through the origin, a diameter of the ellipse conjugate to the x axis. This is the line of the modes of the conditional distributions of X for given Y = y, since at every other point on this horizontal line the quadratic form, $ax^2 + 2exy + by^2$, corresponds to an ellipse with a greater value of the D of equation (1.2). This is also the line of the conditional means of X but Bravais did not make this point. Bravais (1846) also determined the principal axes and gave the equation in the form,

(1.3) […]

He also determined $\varpi$, the probability inside the ellipses of equal probability, in the form,

$$ \frac{d\varpi}{ds} = \frac{K}{\pi}\, e^{-Ks/\pi} \tag{1.4} $$

and

$$ \varpi = 1 - e^{-Ks/\pi}, \tag{1.5} $$

where s is the area enclosed by the ellipse. These equations (1.4) and (1.5) can be regarded as a statement of the distribution of $\chi^2$ with two degrees of freedom or even as an elementary form of the Pearson lemma. Bravais (1846) determined also the principal axes and the value of the constant in the equation for the density in three dimensions of random variables X, Y and Z by successively integrating out the variables. He did not display the bivariate density as the product of a marginal density and a conditional density.

* Invited address delivered at the sixth New Zealand Mathematics Colloquium, held at Wellington, 17-19 May, 1971. Math. Chronicle 2 (1972), 1-16.

K. Pearson (1920) suggested that the interpretations of K. Pearson (1895) and many later authors were incorrect; for Bravais (1846) was not proposing a theory of the joint distribution of the observables but only of the errors. As already mentioned, Francis Galton was the first worker to study the mutual dependence of one variable on another as they appear in the physical world, namely, the relations between heights and other attributes in human parents and offspring. He had found it difficult to obtain sufficient data in human subjects and so was examining the produce of seeds of the sweet pea. Galton (1877) observed the phenomenon of 'reversion' whereby the mean of the produce of large seeds was always closer to the 'average ancestral type' than the parents and similarly for the produce of small seeds; he gives biological reasons, such as differential mortality, for this phenomenon. In later papers, he used 'regression' rather than 'reversion' to the mean or 'mediocrity'. The reversion coefficient may be written as $\lambda$ and $y = \lambda x$ is the line of reversion or regression of Y on X. If the variance of Y, the offspring, is to be equal to the variance of the parent, X, and if homoscedasticity, that is, a uniform conditional variance, be assumed then the variance of Y can be partitioned into the conditional variance and the variance of conditional means

$$ \operatorname{var} Y = (1 - r^2)\operatorname{var} Y + \operatorname{var} \hat{y} = (1 - r^2)\operatorname{var} Y + \lambda^2 \operatorname{var} X, \tag{1.6} $$

and $\lambda = r$ if var X = var Y, so that the variance is to be preserved from generation to generation. In this earlier paper, Galton (1877) did not seem to realise that there are two regression lines. However, Galton (1886a) gave a diagram displaying them both; it is of interest that he obtained, by smoothing an observed distribution of heights of offspring, the ellipses of equal density such as Bravais (1846) had determined in the theoretical distribution. Galton (1886b) enlisted the help of Dickson (1886), who wrote the joint distribution as a product of the marginal distribution of Y and the conditional distribution of X.
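The partition of variance in (1.6) can be checked numerically. What follows is only a minimal sketch under stated assumptions, not anything taken from Galton or Dickson: parent and offspring values are simulated as standardised normal variables, and the function names `variance` and `simulate_reversion`, the seed and the reversion coefficient of 1/3 are all illustrative choices. The sketch compares var Y with the sum of the variance of the conditional means, λ² var X, and the residual (conditional) variance.

```python
import random


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_reversion(lam, n=20000, seed=1):
    """Simulate parents X and offspring Y = lam * X + noise with var Y = var X = 1.

    Returns (var Y, variance of the conditional means, residual variance).
    The model and all names are assumptions made for illustration.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Noise scale chosen so that the offspring variance equals the parental one.
    resid_sd = (1.0 - lam ** 2) ** 0.5
    ys = [lam * x + resid_sd * rng.gauss(0.0, 1.0) for x in xs]
    var_y = variance(ys)
    var_cond_means = lam ** 2 * variance(xs)  # variance of the line y = lam * x
    var_resid = variance([y - lam * x for x, y in zip(xs, ys)])
    return var_y, var_cond_means, var_resid


var_y, var_means, var_resid = simulate_reversion(1.0 / 3.0)
print(var_y, var_means + var_resid)
```

In such a run the two printed numbers should agree up to sampling error, and the variance of the offspring stays close to that of the parents, which is the preservation of variance from generation to generation described above.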

Now it appears that Dickson (1886) could well have taken notice of several other themes in the multivariate normal distribution. Thus Bienaymé (1852) had generalized the central limit theorems of Laplace and of Fourier to several variables. However, he had then made a linear transformation of his variables to obtain the distribution of the theoretical $\chi^2$ for he was not interested in the joint distributions of the variables or errors. Todhunter (1869) also had examined errors jointly normally distributed and in an appendix to this note, Cayley (1869) had given the general expression for the constant multiplier of the density, $\text{constant} \times \exp(-\tfrac{1}{2} x^{T} C x)$. Further, Todhunter (1869) had obtained the characteristic function for the normal correlation.

Herschel (1850) in an extended review of Quetelet's (1846) Lettres à S.A.R. le duc ... had shown how the normal distribution could be derived as the only possible distribution of errors in two dimensions such that it was possible to obtain two distinct pairs of independent random variables, related by an orthogonal transformation. Indeed, Herschel (1850) sketched a proof of the proposition, that if X is independent of Y and if also $X\cos\theta + Y\sin\theta$ is independent of $-X\sin\theta + Y\cos\theta$, then X is normal and Y is normal. This characterization was bitterly attacked by Ellis (1850). However, the hypotheses were stated more exactly by Boole (1854), who made the explicit assumptions that the variables X and Y were mutually independent and that the density was dependent solely on the distance from the origin; a functional equation resulted,

$$ f(x^2)\, f(y^2) = f(x^2 + y^2)\, f(0), \tag{1.7} $$

which had a solution of the form,

$$ f(x^2) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} x^2), \tag{1.8} $$

since $\int f(x^2)\, dx = 1$, and so the distribution was normal. Boole (1854) also clearly recognized that there could not be a universal law of errors.
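The functional equation (1.7), read here as f(x²) f(y²) = f(x² + y²) f(0), can be illustrated numerically. The sketch below is an illustration only, under assumptions that are not in the sources discussed: the function names are hypothetical, and the double-exponential density is chosen merely as a convenient non-normal contrast. It evaluates the difference between the two sides of the equation for the normal density (1.8) and for the contrast density.

```python
import math


def normal_f(t):
    """Standard normal density written as a function of t = x**2, as in (1.8)."""
    return math.exp(-0.5 * t) / math.sqrt(2.0 * math.pi)


def laplace_f(t):
    """Double-exponential density as a function of t = x**2 (contrast case)."""
    return 0.5 * math.exp(-math.sqrt(t))


def equation_gap(f, x, y):
    """Return f(x^2) f(y^2) - f(x^2 + y^2) f(0); this is zero when (1.7) holds."""
    return f(x * x) * f(y * y) - f(x * x + y * y) * f(0.0)


print(equation_gap(normal_f, 0.7, -1.3))   # zero up to rounding error
print(equation_gap(laplace_f, 0.7, -1.3))  # clearly different from zero
```

The product of two independent double-exponential densities is not a function of the distance from the origin alone, so the second gap does not vanish; this is the sense in which independence together with radial symmetry singles out the normal form.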

Assumptions closely related to those of Herschel (1850) were used explicitly by Maxwell (1860, 1867) in his derivation of the kinetic theory of gases. He deduced the normal distribution of the velocities of the gas molecules by assuming that the components of the velocities of the molecules along any orthogonal set of axes are mutually independent. As is noted at page 122 of Plummer (1940), the assumption of independence of the components along three orthogonal axes was disputed and subsequently replaced by an analysis of the dynamical conditions; Plummer (1940) praises Herschel (1850) for realising that some assumptions have to be made to reach a solution and for making simple and explicit assumptions, effective in leading to a solution. Many modern writers have overlooked the point in Maxwell's discussion and have assumed both the mutual independence about arbitrary orthogonal axes and the normality and so have multiplied their hypotheses unnecessarily.

The multinomial distribution is perhaps the simplest of the joint distributions and was early studied. Bienaymé (1838) gave the asymptotic distribution of linear forms in the cell frequencies of the multinomial suitably standardised and also the asymptotic expression for the probability of the set of cell frequencies. The exponent in this expression is very close to Pearson's $\chi^2$; in fact it is the sum $\sum (\text{observed} - \text{expected})^2 / (\text{observed})$. This contribution has been almost entirely overlooked by subsequent authors although Meyer (1874) mentions it.
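The closeness of the two sums is easy to see on a small example. The following sketch is illustrative only: the function names and the cell counts are hypothetical, and the first function uses the now standard form of Pearson's criterion with the expected frequency in the denominator, while the second uses the observed frequency, as in the sum quoted above.

```python
def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def bienayme_sum(observed, expected):
    """Sum of (observed - expected)^2 / observed over the cells.

    Every observed frequency must be positive for this sum to be defined.
    """
    return sum((o - e) ** 2 / o for o, e in zip(observed, expected))


# Hypothetical counts for three equiprobable cells and sixty trials.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
print(pearson_chi_square(observed, expected))
print(bienayme_sum(observed, expected))
```

When the observed frequencies are close to the expected ones, the denominators differ little and the two sums nearly coincide; they can diverge markedly when some observed frequencies are small.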

2. The development of the notion of dependence by Karl Pearson

The best introduction to the next era of development is the statement of K. Pearson (1920), who wrote

'It will be seen from what has gone before that in 1892 the next steps to be taken were clearly indicated. They were, I think,

(a) The abolition of the median and quartile processes as too inexact for accurate statistics.

(b) The replacement of the laborious processes of dividing by the quartiles and averaging the deduced values of r, by a direct and if possible "best" method of finding r.

(c) The determination of the probable errors of r as found by the "best" and other methods.

(d) The expression of the multiple correlation surface in an adequate and simple form.

These problems were solved by Dr Sheppard or myself before the end of 1897.

Closely associated with these problems arose the question of generalising correlation. Why should the distribution be Gaussian, why should the regression curve be linear?

As early as 1893 I dealt with quite a number of correlation tables for long series and was able to demonstrate

(i) by applying Galton's process of drawing contours of equal frequency that most smooth and definite systems of contours can arise from long series, obviously mathematical families of curves, which are (a) ovaloid, not ellipsoid, and (b) which do not possess - like the normal surface contours - more than one axis of symmetry.

(ii) that regression curves can be quite smooth mathematical curves differing widely from straight lines

(iii) that in cases wherein (i) and (ii) hold, homoscedasticity is not the rule.

I obtained differential equations to such systems, but for more than 25 years while often returning to them, have failed to obtain their integration.

This seems to me the desideratum of the theory of correlation at the present time: the discovery of an appropriate system of surfaces, which will give bi-variate skew frequency. We want to free ourselves from the limitations of the normal surface, as we have from the normal curve of errors.'

To Karl Pearson and W.F.R. Weldon, the importance of Galton's work for future studies of heredity and evolution was clear. Weldon (1890 and 1892) began to accumulate empirical distributions in one and two dimensions in order to compare variations in biological variables in space and time. To explain the non-normality of one of Weldon's empirical distributions Pearson (1894) solved the problem of how to dissect a mixture of normal distributions into its component normal curves and so began his great series on the Contributions to the mathematical theory of evolution. Pearson (1895b, 1901 and 1916a) introduced his system of curves, which have since found many applications in statistics, as the solutions to a hypergeometric differential equation. This series of curves appears naturally in the solution of differential equations giving the limiting or equilibrium forms of distribution for certain diffusion processes and determining bivariate densities which have 'diagonal' or homogeneous polynomial expansions.

K. Pearson (1895b) constructed other joint distributions by sampling experiments with card packs and roulette wheels; he noted that if the material 'obeys a law of skew distribution, the theory of correlation as developed by Galton and Dickson requires considerable modification.' Pearson (1896) gave some historical notes on the history of multivariate normality, in which he assigned rather more importance, as Pearson (1920) was later to remark, to the memoir of Bravais (1846) than it deserved in the particular context, for Bravais (1846) had considered neither regression nor conditional distributions and had not paid any special attention to the coefficient of correlation. In the same article, Pearson (1896) introduced the multivariate normal distribution by making a linear transformation from mutually independent centred normal variables, the elements of a vector, $\xi$, to the elements of a vector, $\eta$,

$$ \eta = A\xi, \tag{2.1} $$

where A is m × n, m < n and of rank m, a transformation which had already been used by Laplace (1811) and Plana (1813) and also by Todhunter (1869). By integrating out superfluous variables, Pearson (1896) obtained the joint density function,

$$ f(\eta_1, \eta_2, \ldots, \eta_m) = \text{constant} \times \exp(-\tfrac{1}{2}\chi^2), \tag{2.2} $$

where the expression $\chi^2$ appears for the first time in the literature as a quadratic form in the variables. If m = 2, there was linear regression between the two variables and the conditional variance of one variable for fixed values of the other was reduced in the ratio, $(1 - \rho^2)$, where $\rho$ was the coefficient of correlation. Pearson (1896) gave the general formula for the bivariate density,

$$ f(x, y) = \text{constant} \times \exp[-\tfrac{1}{2}(g_1 x^2 + 2hxy + g_2 y^2)], \tag{2.3} $$

and derived from it the standard form as we now know it. He then showed how to estimate r, the coefficient of correlation in a sample, by maximum likelihood. He extended the analysis to normal distributions in three and higher dimensions. The reader cannot but wish that he had separated out the purely mathematical or distributional theory from the genetical applications, for the mixture of the two aspects is very confusing. It appears to the modern reader of Pearson (1896), Edgeworth (1892) and Sheppard (1898) that Pearson had a much better understanding of the mathematics and the applications than either of the others, although this view has not been generally held. Pearson (1896) obtained the general formula for the multivariate normal, namely

$$ f(x) = (2\pi)^{-m/2}\, |C|^{1/2} \exp(-\tfrac{1}{2} x^{T} C x), \tag{2.4} $$

a result already known to Todhunter (1869) and Cayley (1869).
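The case m = 2 of the linear transformation η = Aξ can be made concrete with a small computation. The sketch below is an illustration under assumptions that go beyond the text: the independent variables are taken to have unit variance, the 2 × 3 matrix is a hypothetical example, and the function names are invented for the purpose. It forms the covariance matrix A Aᵀ of the two linear forms, the coefficient of correlation, and the conditional variance of the second form given the first, which is the marginal variance reduced in the ratio (1 − ρ²).

```python
def covariance_of_linear_forms(a):
    """Return A A^T, the covariance matrix of eta = A xi when the elements
    of xi are independent with zero mean and unit variance (an assumption)."""
    return [[sum(p * q for p, q in zip(row_i, row_j)) for row_j in a]
            for row_i in a]


def correlation_and_conditional_variance(a):
    """For a 2 x n matrix A, return (rho, var(eta_2 | eta_1))."""
    cov = covariance_of_linear_forms(a)
    rho = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
    return rho, cov[1][1] * (1.0 - rho ** 2)


# Hypothetical 2 x 3 matrix of rank 2, so that m = 2 and n = 3.
A = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]]
rho, cond_var = correlation_and_conditional_variance(A)
print(rho, cond_var)
```

Here the two forms share the first independent variable, the covariance matrix is [[1, 1], [1, 2]], and the conditional variance of the second form is half of its marginal variance.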

It should be noted that at this time, 1896, few joint distributions, not simply products of the marginal distributions, were known and the lack of such examples and a corresponding theory led such capable mathematicians as H.W. Watson (1891) into serious difficulties. There was also lacking a clear notion of multivariate measure or distribution, free of the notion of sample. Thus we find the multiplier N appearing in formulae, as in the first formula of Section 5 of Pearson (1896) in, what seems to us, a rather irrelevant manner. Perhaps, only after the axiomatizations of probability of Kolmogorov (1933), for example, were clear statements possible. Appropriate practical examples were also lacking. Meteorological or vital statistical examples lacked a mathematical model to explain their form. A simple model was possible in genetics. Galton, Weldon and Pearson all believed that correlation or lack of independence in the distribution of attributes in relatives was usually brought about by the possession of random elements in common, that is, the presence or absence of genetic factors. To imitate such a genetic model, Weldon (1906) carried out dice throwing experiments which yielded a bivariate binomial distribution, with marginal distributions of the form, $(\tfrac{1}{2} + \tfrac{1}{2})^{12}$, and six random variables held in common. Pearson (1895b) too was carrying out experiments with teetotums (roulette wheels) and card packs to obtain empirical joint distributions.
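The mechanism of random elements held in common can be imitated by simulation. The sketch below is not a reconstruction of Weldon's procedure; it assumes only what is stated above, namely two counts each made up of twelve trials with probability one half, six of the trials being shared, and the function names, seed and number of repetitions are illustrative choices. Under these assumptions the covariance of the two counts is the variance of the six shared trials, so the coefficient of correlation should be near 6/12.

```python
import random


def common_element_pairs(trials=20000, dice=12, common=6, seed=7):
    """Simulate pairs of counts that share `common` of their `dice` trials.

    Each trial is a success with probability 1/2 (an assumed scoring rule).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        shared = sum(rng.random() < 0.5 for _ in range(common))
        first = shared + sum(rng.random() < 0.5 for _ in range(dice - common))
        second = shared + sum(rng.random() < 0.5 for _ in range(dice - common))
        pairs.append((first, second))
    return pairs


def correlation(pairs):
    """Product moment coefficient of correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


print(correlation(common_element_pairs()))
```

Changing `common` changes the expected correlation in proportion, which is the sense in which the possession of elements in common accounts for the dependence.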

Some novel applications of the theory were now possible. Pearson and Filon (1898) obtained the joint distribution of the errors in estimates with the aid of some plausible simplifying assumptions such as joint normality, thus extending the theory in which mutual independence had been assumed. They were also able to consider the effect of sampling on one subset of the variables on the joint distribution of the complementary subset. Now that Pearson (1895b) had described a number of different frequency curves, it became for the first time practical to ask whether empirical distributions were better fitted by one theoretical distribution or another. Sheppard (1898b) proposed to make such comparisons by treating the difference, observed − expected, as a binomial variable, for example, for each of the cells of a contingency table. As Sheppard (1929) was later to remark, he had paid insufficient attention in 1898 to the fact that these binomial variables were correlated. Pearson (1900a) with his knowledge of the multivariate normal theory was able to attain the solution, the Pearson $\chi^2$, which could be used as a criterion to test goodness of fit. Pearson (1900a) made the assumption that variables having marginal normal distributions were jointly normal. In the asymptotically normal distribution arising from the multinomial, this assumption can be justified by the use of the generalised central limit theorem of Bernstein (1926). A closely related theme had been developed by Bienaymé (1838) who had obtained the asymptotically normal distribution of linear forms in the cell contents of a multinomial distribution. Later, Sheppard (1898) independently obtained this result. The idea is useful for Fréchet (1951) has since defined multivariate normality by the property that every linear form in the variables is normal.

Pearson (1904) considered multivariate distributions in qualitative variables, for which he introduced the term, 'contingency' for any measure of the total deviation of the classification from independent probability. These measures might in some cases be independent of the ordering; he was, therefore, making a new departure freeing the theory of the necessity to give primacy to the product moment correlation. For bivariate distributions, he defined $\phi^2$, the integral of the square of the likelihood ratio of the bivariate density to the product of the marginal densities, taken with respect to the product measure. He concluded that he had generalized the work of Yule (1900, 1901 and 1903).

Pearson (1905) in effect defined correlation in a general sense to mean that the distribution of Y was not the same for every value of X, that is, $P(Y \in B \mid X \in A)$ was not constant for every choice of the set A. He considered the conditional means and the variance of the conditional mean and the correlation ratio.
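The correlation ratio can be illustrated on grouped data. The sketch below uses the usual modern form, the variance of the conditional means divided by the total variance, which is assumed here rather than quoted from Pearson (1905); the function name and the data are hypothetical.

```python
def correlation_ratio_squared(groups):
    """Squared correlation ratio of Y on X.

    `groups` maps each value of X to the list of Y values observed with it.
    The result is the variance of the conditional means of Y, weighted by
    group size, divided by the total variance of Y.
    """
    all_y = [y for ys in groups.values() for y in ys]
    n = len(all_y)
    grand_mean = sum(all_y) / n
    total_var = sum((y - grand_mean) ** 2 for y in all_y) / n
    between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                  for ys in groups.values()) / n
    return between / total_var


# Hypothetical data in which Y depends on X only through X squared.
parabola = {-1: [1.0, 1.0], 0: [0.0, 0.0], 1: [1.0, 1.0]}
print(correlation_ratio_squared(parabola))
```

In the hypothetical data the regression of Y on X is a parabola: the product moment correlation is zero by symmetry, yet the squared correlation ratio is unity, since Y is completely determined by X. This is the kind of non-linear dependence that the general definition is meant to capture.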

Pearson and Heron (1913) were concerned mainly with the estimation of indexes of correlation rather than with the introduction of new theory. Pearson introduced a notion here and in other articles, that in a partitioned marginal space of 'discrete' variables, the variable could be regarded as approximating in some way to an underlying continuous variable. In the discrete space formed from the space of a continuous random variable, there is indeed a linear form in the indicator variables of the cells of the condensed space which has maximum correlation with the continuous marginal variable and this maximum correlation tends to unity as the partition is appropriately refined.

The introduction of a new system of distributions in one dimension was a great step forward, but K. Pearson was much less fortunate in his generalizations of the idea to two-dimensional distributions; indeed, Pearson (1923a) noted that, after the introduction of his system of curves to graduate empirical frequency distributions in one dimension, he had attempted to describe a system of bivariate distributions as the solutions of a differential equation but that much work had failed to provide such a system. Certain particular results had been obtained; in general, a bivariate distribution could not be represented as a product distribution by a rotation of axes, for example, if the density was referred to the 'principal inertial system of the contour system' of the bivariate density, a set of independent random variables was not thereby obtained, as it would be in the multivariate normal distributions. E.C. Rhodes (1923) and L.N.G. Filon had obtained some special forms of bivariate densities, which they had not published at the time of the completion of their work. Pearson (1923a) was rather concerned that the correlation between such variables appeared to be determined by the marginal distributions. In Pearson (1923a and b) some surfaces given by analytic expressions were examined for properties such as linear regression. If homoscedasticity were imposed, the normal distribution was characterized. Pearson (1925) attempted to extend to two dimensions the Charlier method of approximating to the normal in one dimension. He wished to obtain bivariate surfaces for arbitrary values of the sample size and moments and mixed moments up to the 4th order, the fifteen-constant bivariate density. However, we cannot regard this effort as productive as there are too many parameters to be fitted and it is possible that negative values of the density may result.

The most successful attempt to obtain bivariate densities was, according to Pearson (1923a), made by S. Narumi (1923, a, b and c). Narumi (1923a) had begun by considering a 'regression function', by which he meant any functions of the form $\hat{y}_x = y(x)$ and $\hat{x}_y = x(y)$, such as the mode of the conditional distribution for given x, the conditional mean, a linear function in the independent variable. 'Homoscedasticity' was reinterpreted as the same form of conditional distribution for each value of the independent variable. He then considered various differential equations. It is of interest that if the centres of location of the conditional distributions are situated on straight lines and if the conditional distributions have the same form, then there are but few possibilities, for either the marginal variables are mutually independent, the distribution is jointly normal or the joint distribution is degenerate. It is clear from Narumi (1923b) that this approach leads to very heavy algebra and functional equations, difficult to solve except in the normal distribution. The principal extension of the theory due to Narumi, Rhodes and Pearson may be said to be the densities of the form, $f(x) = (1 \pm x^{T} A x)^{a}$, the bivariate hypergeometric distribution, the multivariate hypergeometric distribution and the multivariate binomial distribution. Bernstein (1927) obtained the independence distributions, the joint normal distribution and some other special distributions as a result of his investigations in this problem.


REFERENCES

Bernstein, S.N. (1926). Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes, Math. Ann. 97, 1-59.

Bernstein, S. (1927). Fondements géométriques de la théorie des corrélations, Metron 7(2), 1-27.

Bienaymé, J. (1838). Sur la probabilité des résultats moyens des observations; démonstration directe de la règle de Laplace, Mémor. Sav. Étrangers Acad. Sci. Paris 5, 513-558.

Bienaymé, J. (1852). Sur la probabilité des erreurs d'après la méthode des moindres carrés, J. Math. Pures Appl. 17, 33-78.

Boole, G. (1854). On a general method in the theory of probabilities, Phil. Mag. (4) 8, (reprinted in Studies in probability and statistics (1952), Watts and Co., London).

Bravais, A. (1846). Analyse mathématique sur les probabilités des erreurs de situation d'un point, Mém. de l'Instit. de France 9, 255-332.

Cayley, A. (1869). See Todhunter (1869).

Dickson, J.D.H. (1886). Appendix to Galton (1886b), Proc. Roy. Soc. 40, 63-73.

Edgeworth, F.Y. (1892). The law of error and correlated averages, Phil. Mag. (5) 34, 429-438 and 518-526.

Ellis, R.L. (1850). Remarks on an alleged proof of the 'Method of Least Squares' contained in a late number of the Edinburgh Review, Phil. Mag. (3) 37, 321-8 and 462.

Fréchet, M. (1951). Généralisations de la loi de probabilité de Laplace, Ann. Instit. Henri Poincaré 12, 1-29.

Galton, F. (1886a). Regression towards mediocrity in hereditary stature, J. Anthrop. Inst. 15, 246-263.

Galton, F. (1886b). Family likeness in stature, Proc. Roy. Soc. 40, 42-63.

Galton, F. (1877). Typical laws of heredity, Proc. Royal Instit. Gt. Britain 8, 282-301.

Galton, F. (1908). Memories of my life, E.P. Dutton and Co., New York.

Gauss, C.F. (1823). Anwendung der Wahrscheinlichkeitsrechnung auf eine Aufgabe der praktischen Geometrie, Astron. Nachr. 1, 81-88.

Herschel, J. (Anonymously) (1850). A review of Lettres à S.A.R. le duc régnant de Saxe-Cobourg et Gotha sur la théorie des probabilités appliquée aux sciences morales et politiques, by M.A. Quetelet (1846), Edinburgh Review 92, 1-57.

Kolmogorov, A.N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Math. 2(3), 196-262.

Laplace, P.S. (1811). Mémoires sur les intégrales définies et leur application aux probabilités, Mémoires de l'Institut Impérial de France, 1811, 279-347.

Meyer, A. (1874). Calcul des probabilités de A. Meyer publié sur les manuscrits de l'auteur par F. Folie, Mém. Soc. Liège, 6(2), x+446 pp.

Maxwell, J.C. (1860). Illustrations of the dynamical theory of gases, Phil. Mag. (4) 19, 19-32 and (4) 20, 21-37.

Maxwell, J.C. (1867). On the dynamical theory of gases, Philos. Trans. Roy. Soc. Lond. 157, 49-88.

Narumi, S. (1923a, b and c). On the general forms of bivariate frequency distributions which are mathematically possible when regression and variation are subjected to limiting conditions. Parts I and II, Biometrika 15, 77-88 and 209-221.

Pearson, K. (1894). Contribution to the theory of evolution, Philos. Trans. Roy. Soc. Ser. A. 185, 71-110.

Pearson, K. (1895a). Note on regression and inheritance in the case of two parents, Proc. Roy. Soc. 58, 240-241.

Pearson, K. (1895b). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material, Philos. Trans. Roy. Soc. Ser. A. 186, 343-414.

Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia, Philos. Trans. Roy. Soc. Ser. A. 187, 253-318.

Pearson, K. (1900a). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philos. Mag. 50, 157-175.

Pearson, K. (1901). Mathematical contributions to the theory of evolution. X. Supplement to a memoir on skew variation, Philos. Trans. Roy. Soc. Ser. A. 197, 443-459.

Pearson, K. (1904). Mathematical contributions to the theory of evolution. XIII. On the theory of contingency and its relations to association and normal correlation, Drapers' Company Research Memoirs, Biometric Series 1. 35pp.

Pearson, K. (1905). Mathematical contributions to the theory of evolution. XIV. On the general theory of skew correlation and non-linear regression. Drapers' Company Research Memoirs, Biometric Series. ii+54.

Pearson, K. (1916a). Mathematical contributions to the theory of evolution. XIX. Second supplement to a memoir on skew variation, Philos. Trans. Roy. Soc. Ser. A. 216, 429-457.

Pearson, K. (1920). Notes on the history of correlation, Biometrika 13, 25-45.

Pearson, K. (1923a). Notes on skew frequency surfaces, Biometrika 15, 222-30.

Pearson, K. (1925). The fifteen constant bivariate frequency surface, Biometrika 17, 268-313.

Pearson, K. and Filon, L.N.G. (1898). Mathematical contributions to the theory of evolution. IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation, Philos. Trans. Roy. Soc. Ser. A. 191, 229-311.

Pearson, K. and Heron, D. (1913). On theories of association, Biometrika 9, 159-315.

Plana, G.A.A. (1813). Mémoire sur divers problèmes de probabilité, Mém. Accad. Imper. Turin 20 (années 1811-1812), 355-408.

Plummer, H.C. (1940). Probability and frequency, Macmillan and Co., London, xi+277.

Quetelet, A. (1846). Lettres à S.A.R. le duc régnant de Saxe-Cobourg et Gotha sur la théorie des probabilités appliquée aux sciences morales et politiques, Hayez, Bruxelles, iv+450pp. (English transl. by O.G. Downes (1849). Charles and Edward Cayton, London.)

Rhodes, E.C. (1923). On a certain skew correlation surface, Biometrika 14, 355-377.

Sheppard, W.F. (1898a). On the application of the theory of error to cases of normal distribution and normal correlation, Philos. Trans. Roy. Soc. Ser. A. 192, 101-167.

Sheppard, W.F. (1898b). On the geometrical treatment of the 'Normal Curve' of statistics with special reference to correlation and to the theory of errors, Proc. Roy. Soc. 62, 170-173.

Sheppard, W.F. (1929). The fit of formulae for discrepant observations, Philos. Trans. Roy. Soc. Ser. A. 228, 115-150.

Todhunter, I. (1869). On the method of least squares, Trans. Camb. Philos. Soc. 11, 219-238.

Watson, H.W. (1891). Observations on the law of facility of errors, Proc. Birmingham Phil. Soc. 7, 289-318.

Weldon, W.F.R. (1890). The variation occurring in the Decapod Crustacea. - 1. Crangon vulgaris, Proc. Roy. Soc. 47, 445-453.

Weldon, W.F.R. (1892). Certain correlated variations in Crangon vulgaris, Proc. Roy. Soc. 51, 2-21.

Weldon, W.F.R. (1906). Inheritance in animals and plants, pp. 81-109 in T.B. Strong, Lectures on the method of science, Clarendon Press, Oxford, viii+249.

Yule, G.U. (1900). On the association of attributes in statistics, Philos. Trans. Roy. Soc. Ser. A. 194, 257-319.

Yule, G.U. (1901). On the theory of consistence of logical class frequencies and its geometrical representation, Philos. Trans. Roy. Soc. Ser. A. 197, 91-134.

Yule, G.U. (1903). Notes on the theory of association of attributes in statistics, Biometrika 2, 121-134.

University of Sydney
