spatial variation in soil and the role of kriging

Agricultural Water Management, 6 (1983) 111--122 Elsevier Science Publishers B.V., Amsterdam -- Printed in The Netherlands


SPATIAL VARIATION IN SOIL AND THE ROLE OF KRIGING

R. Webster and T.M. Burgess

Introduction

Soil varies from place to place in such a way that any one of its properties is best considered mathematically as a random function. The semi-variogram of regionalized variable theory can describe such variation in a region. It can be estimated from data, and simple mathematical models can usually be found to fit the sample values well. In these circumstances local estimates of soil properties can be made from sample data by kriging. These estimates are optimal in the sense that they are unbiased and have minimum variance.

The theory has been applied recently in several case studies to characterize variation in soil in the lateral plane, to perform local estimation and to map soil properties from sample data. It has also been used to make regional estimates with substantial gains in precision over traditional practice. Experience suggests that the intrinsic hypothesis of stationary mean and variance of differences of terms is adequate in practice in most instances, and that block kriging is likely to be the most profitable form of kriging.

The procedures are illustrated using data from a survey of cover loam in Eastern England.

The Nature of Soil Variation

Soil varies from place to place, both laterally and vertically. Methods are needed for describing this variation in such a way that values at unrecorded places can be predicted, or estimated, from sample values elsewhere. The vertical variation, usually termed its profile, has been the concern of pedologists for many years. It has been described conventionally by recognizing layers, termed horizons, and then treating each of these separately. Lateral variation has been treated similarly. Soil surveyors recognize where the soil changes in a relatively abrupt manner, and draw boundaries there to separate the soil into classes. They then describe each class separately from sampling points within them. Average or typical values within classes are then used as predictors for those classes. Variation remains within such classes, but this is regarded either as so small that it can be ignored or else as an unavoidable nuisance that has to be tolerated within the classification scheme. At certain scales of operation and with meagre resources for survey the attitude is often reasonable. The approach takes little account, however, of gradual change, either from one soil class to another or within any one soil class. As the scale and intensity of survey are increased, and the area of interest decreased, this type of change becomes increasingly important and may be the only kind of change that has any detectable pattern. In fact, one might expect such change to be the norm for properties such as the depth of the water table and soil water potential.

0378-3774/83/$03.00 © 1983 Elsevier Science Publishers B.V.

Figures 1 and 2 illustrate something of the nature of soil variation. Both are actual examples. Figure 1 shows values of the electrical conductivity of the soil over a distance of about 100 m. Variation is somewhat irregular but smooth. Figure 2 shows how the thickness of cover loam, a wind blown deposit, varies at another site. The thickness increases somewhat from left to right, but there is a substantial anomaly at around 300 m and much point-to-point fluctuation. In our experience the situation shown in Figure 2 is the more typical. Indeed, it is so typical of soil variation that we use it as an example throughout the rest of this paper. We reported the full study earlier (Burgess and Webster, 1980a, 1980b).

[Figure 1: plot of electrical conductivity against Position / metres (0 to 100).]

Fig. 1. Values of electrical conductivity in the soil measured at 1 m intervals across an archaeological site at Bekesbourne, Kent. From Webster and Burgess (1980).

[Figure 2: plot of thickness (vertical axis, ticks 20 to 100) against Position / metres (0 to 400).]

Fig. 2. The thickness of cover loam measured at 20 m intervals at Hole Farm, Norfolk.

The soil water physicist clearly wants means of describing soil variation quantitatively, and he may wish to use such description for estimation and interpolation. That description must take into account two features of soil. The first is that the longer range trends have no simple mathematical form. There is not usually any obvious repeating pattern, and the larger the area or the more intensive the sampling the more complex does the variation appear. The second is that the point-to-point variation in a sample reflects real variation in the soil. Only a part of it, and then often a very small proportion, is measurement error.

A number of the earlier attempts to describe spatial variation in geology and geography involved fitting deterministic global equations to data, either exactly or by least squares approximation (trend surface analysis). The two features mentioned above, however, make the approach inappropriate for soil. There is no theory to suggest any particular type of equation, variation in one part of the region affects the fit everywhere, and no proper estimate of error is possible; the residual mean square is a biased estimator. Splines can be fitted and be used as interpolators (Webster, 1978). They overcome the first and second objections, but still provide no estimate of errors.

The alternative has been to treat the soil as a random function, and to describe it using the methods of Matheron's (1965, 1971) regionalized variable theory. In this view there is no underlying mathematical relation between the soil and its position on the ground. Even if there is, it is likely to remain unknown, and in any case knowledge of it is unnecessary. Instead relationships are expressed in terms of separation regardless of absolute position. The theory essentially expresses the idea that values of a soil property at near places are likely to be similar, whereas those at places far from one another are not. It does so quantitatively and in a way that can be used for interpolation. Matheron's source texts, referred to above, describe the mathematical theory. The more recent book by Journel and Huijbregts (1978) gives a good, practically oriented account at an intermediate level.

Regionalized variable theory is the basis of kriging, a technique for local estimation. It is already being used widely in mining and mineral exploration, and is now being explored for its value in soil research (e.g. Burgess and Webster, 1980a, 1980b; Burgess et al., 1981; McBratney et al., 1982) and especially for its application to irrigation problems (e.g. Hajrasuliha et al., 1980; Sisson and Wierenga, 1981; Vieira et al., 1981).

Let $\mathbf{x}$ denote a place in one, two or three dimensions. The underlying model of spatial variation can then be expressed in general as

$$z(\mathbf{x}) = m(\mathbf{x}) + \varepsilon(\mathbf{x}), \tag{1}$$

where $z(\mathbf{x})$ is the value of some property, $z$, of the soil at a place $\mathbf{x}$, $m(\mathbf{x})$ is some deterministic function of $z$ at $\mathbf{x}$ and $\varepsilon(\mathbf{x})$ is a residual from the function. This model looks at first like the trend surface model. The most important difference is that the residual, $\varepsilon(\mathbf{x})$, is not a random error but is a local component of variation that is spatially dependent. Its effect is usually so strong that $m(\mathbf{x})$ can be reduced to a very simple function, often a mean for the locality, and almost certainly nothing more complex than a quadratic in $\mathbf{x}$.

The Semi-variogram

It is often found that the expected value, denoted $E[z(\mathbf{x})]$, of a soil property within a locality is effectively constant. We say "effectively" because the sampling variation is often so large that it is difficult to distinguish a change in expectation against this noise (e.g. Figure 2). The property is then stationary in the mean for practical purposes; i.e.

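$$E[z(\mathbf{x})] = \mu. \tag{2}$$

This implies for any two places $\mathbf{x}$ and $\mathbf{x}+\mathbf{h}$ separated by the vector $\mathbf{h}$ that

$$E[z(\mathbf{x}) - z(\mathbf{x}+\mathbf{h})] = 0. \tag{3}$$

Also we can usually assume that the variance of the differences is finite and constant throughout the locality, so that

$$\operatorname{var}[z(\mathbf{x}) - z(\mathbf{x}+\mathbf{h})] = E[\{z(\mathbf{x}) - z(\mathbf{x}+\mathbf{h})\}^2] = 2\gamma(\mathbf{h}). \tag{4}$$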

These two conditions, stationary mean and variance of differences, define the intrinsic hypothesis of regionalized variable theory. They enable the spatial variation present in soil to be described quantitatively yet simply and at the same time provide the tools for spatial prediction.


The quantity $\gamma(\mathbf{h})$, which is half the variance of the differences between values at places separated by $\mathbf{h}$, is the semi-variance. It can be thought of as the dissimilarity between the two places or as half the error of estimating the value at one place from an observation at the other. The vector $\mathbf{h}$ is known as the lag. Given the intrinsic hypothesis $\gamma(\mathbf{h})$ can be estimated from data by

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2m} \sum_{i=1}^{m} \{z(\mathbf{x}_i) - z(\mathbf{x}_i + \mathbf{h})\}^2, \tag{5}$$

where there are $m$ pairs of observations at lag $\mathbf{h}$.

Provided there are sufficient data derived from a suitable sampling scheme $\gamma(\mathbf{h})$ can be estimated for lags varying in both distance and direction to provide an ordered series of values. A graph of $\hat{\gamma}(\mathbf{h})$ against $\mathbf{h}$ is a sample semi-variogram. In many soil survey applications only lateral variation is of interest. The sample semi-variogram can then be regarded as points lying above a plane defined by the lags in the two lateral dimensions. If variation is isotropic then direction is of no consequence and the sample semi-variogram can be drawn as a simple graph, with semi-variance plotted against distance. Figure 3 shows the semi-variogram of the thickness of cover loam at Hole Farm in Norfolk computed from 452 observations on a square grid with a 20 m interval by Burgess and Webster (1980a).
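As an illustration of the estimator in equation (5), the following is a minimal sketch for the simplest case: a one-dimensional transect sampled at equal intervals, with lags counted in whole sampling intervals. The function name `sample_semivariogram` and the demonstration values are assumptions made for the example; they are not data from the survey.

```python
import numpy as np

def sample_semivariogram(z, max_lag):
    """Semi-variances at lags 1..max_lag (in sampling intervals) for a regular transect."""
    z = np.asarray(z, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.full(len(lags), np.nan)
    pairs = np.zeros(len(lags), dtype=int)
    for k, h in enumerate(lags):
        d = z[:-h] - z[h:]            # differences for every pair separated by lag h
        d = d[~np.isnan(d)]           # drop pairs that involve a missing observation
        pairs[k] = d.size             # m, the number of pairs at this lag
        if d.size:
            gamma[k] = 0.5 * np.mean(d ** 2)   # equation (5)
    return lags, gamma, pairs

# Hypothetical thicknesses (cm) at equal spacing; illustrative only.
z_demo = [60.0, 64.0, 58.0, 70.0, 66.0, 72.0]
lags, gamma, pairs = sample_semivariogram(z_demo, max_lag=2)
print(lags, gamma, pairs)
```

The number of pairs is returned alongside each semi-variance because estimates at long lags rest on few pairs and are correspondingly less reliable.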

[Figure 3: sample semi-variances (points) and the fitted model (curve) plotted against Lag / m (0 to 300), with the sill marked. Fitted model printed on the figure: $\gamma(h) = 187.0 + 603.8\{3h/(2 \times 101.2) - \tfrac{1}{2}(h/101.2)^3\}$ for $0 < h \le 101.2$; $\gamma(h) = 187.0 + 603.8$ for $h > 101.2$.]

Fig. 3. Semi-variogram of cover loam thickness. From Burgess and Webster (1980a).

The sample semi-variances are not only estimates of particular true values but are also discrete estimates of a continuous function, $\gamma(\mathbf{h})$. Provided they have been calculated from enough original observations they usually follow approximately some simple curve. To be useful an equation is needed for this curve, and this can be found by fitting suitable models by least squares procedures. In Figure 3 the curve fitted is that of the spherical model (Matheron, 1965),

$$\gamma(h) = \begin{cases} c_0 + c_1 \left\{ \dfrac{3h}{2a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^3 \right\} & \text{for } 0 < h < a, \\[2ex] c_0 + c_1 & \text{for } h \ge a, \end{cases} \tag{6}$$

with $c_0 = 187.0$ cm², $c_1 = 603.8$ cm², $a = 101.2$ m, and $h = |\mathbf{h}|$ since the variation is isotropic.

This example illustrates some common features of soil variation. First there is usually a maximum variance within a region at which the semi-variogram levels out. It is known as the sill, and is equal to $c_0 + c_1$ in equation (6). Note that this is not the same as the sample variance unless the spacing is so large that there is no spatial dependence in the data. Associated with the sill is a range, $a$ in equation (6), which is the lag distance at which the sill is reached, either actually as in Figure 3 or for practical purposes where the fitted model approaches the sill asymptotically. The range is the distance within which values of the soil are related and beyond which they are independent. It limits the neighbourhood within which interpolation is profitable. A third feature is what is known as the nugget effect. The semi-variance at zero lag is itself zero. Nevertheless any smooth curve fitted to the observed semi-variance in Figure 3 will not pass through the origin but will cut the ordinate at some positive value, $c_0$. This value is the nugget variance, a term derived from gold mining. It represents variation occurring within distances much less than the shortest sampling interval. It will include any measurement error, but as mentioned earlier, such error will be only a small proportion of the nugget variance in many cases. The true form of variation within very short distances is concealed in the remainder, and can be discovered only by sampling more closely.
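The sill, range and nugget can be seen numerically by evaluating the spherical model of equation (6) with the fitted values quoted above (nugget variance 187.0 cm², spatially dependent component 603.8 cm², range 101.2 m). This is a minimal sketch; the function and argument names are illustrative assumptions.

```python
import numpy as np

def spherical(h, nugget=187.0, sill_component=603.8, a=101.2):
    """Spherical semi-variogram model; h is the lag distance in metres."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)     # capping at 1 holds the model at the sill beyond the range
    g = nugget + sill_component * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0.0, 0.0, g)   # the semi-variance at zero lag is itself zero

lag_m = np.array([0.0, 1.0, 50.6, 101.2, 300.0])
semivariance = spherical(lag_m)
sill = 187.0 + 603.8
print(semivariance, sill)
```

At a lag of 1 m the modelled semi-variance is already close to the nugget variance, which is the discontinuity at the origin described in the text, and at 101.2 m and beyond it equals the sill of 790.8 cm².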

It is worth making the point here that the form of the semi-variogram can never be determined absolutely. There is always some error, as above. The error can be diminished by increased sampling, and it is worth doing as much sampling as can be afforded for such basic information. The nugget variance is likely to be reduced by increasing the size of individual sample volumes. Each large volume will contain some of the variation that was present in the smaller sample volume, and for additive properties, such as the water content of the soil at a particular time, the nugget variance will be smaller by that amount. For intensity properties such as hydraulic conductivity, the nugget variance will also be diminished by increasing the volumes of samples, but not by any easily predicted amount.


The semi-variogram is intended to describe variation in terms of a stationary random function. The expression on the right hand side of equation (5) can be computed for almost any soil data, however, whether or not the differences are stationary. Where there are abrupt changes in mean at the scale of sampling, as at soil boundaries, equation (2) does not apply and the sample semi-variogram should be computed separately within each bounded parcel. If the pattern of spatial dependence is similar for all parcels then the results may be pooled to give an average within-parcel semi-variogram (Webster, 1982).

Where there are distinct deterministic local trends the procedure can be elaborated to accommodate them. Olea (1975) describes a method for doing this. The resulting semi-variograms are, of course, only a description and not an explanation, and there is substantial scope for research to understand the physical process or processes that give rise to any particular semi-variogram. We are only at the beginning.

Kriging

Whether we understand how a semi-variogram arises or not we can use it for interpolation. The procedure for doing so is known in earth sciences as kriging, a term coined by Matheron in honour of D.G. Krige who had seen the need for the method and applied it in the South African goldfields (see Krige, 1966, for example).

Kriging is essentially a means of local estimation in which each estimate is a weighted average of $n$ observed values. The estimates can be for points, that is, volumes of soil the same size and shape as those on which the measurements were made. Alternatively they can be for larger blocks of particular shape. In the first case we can express the estimated value of the property, $z$, at a place $\mathbf{x}_0$ as

$$\hat{z}(\mathbf{x}_0) = \sum_{i=1}^{n} \lambda_i z(\mathbf{x}_i). \tag{7}$$

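The weights, $\lambda_i$, are chosen so that the estimate, $\hat{z}(\mathbf{x}_0)$, is unbiased and that the estimation variance is less than for any other linear combination of the observed values. It is in this sense that kriging is optimal. The weights therefore sum to 1 and, subject to that constraint, the minimum variance is obtained when

$$\sum_{j=1}^{n} \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j) + \psi = \gamma(\mathbf{x}_i, \mathbf{x}_0) \quad \text{for all } i, \tag{8}$$

and is

$$\sigma_E^2 = \sum_{j=1}^{n} \lambda_j \gamma(\mathbf{x}_j, \mathbf{x}_0) + \psi. \tag{9}$$

The quantity $\gamma(\mathbf{x}_i, \mathbf{x}_j)$ is the semi-variance of $z$ between the sampling points $\mathbf{x}_i$ and $\mathbf{x}_j$, while $\gamma(\mathbf{x}_i, \mathbf{x}_0)$ is the semi-variance between the sampling point $\mathbf{x}_i$ and the point $\mathbf{x}_0$. These are obtained from the semi-variogram. The quantity $\psi$ is a Lagrange multiplier associated with the minimization.

In block kriging we estimate the average value of $z$ over the block $B$, namely

$$z(\mathbf{x}_B) = \frac{\int_B z(\mathbf{x})\,\mathrm{d}\mathbf{x}}{\text{area } B},$$

by

$$\hat{z}(\mathbf{x}_B) = \sum_{i=1}^{n} \lambda_i z(\mathbf{x}_i), \tag{10}$$

with weights summing to 1, as before. The minimum variance is now

$$\sigma_B^2 = \sum_{j=1}^{n} \lambda_j \bar{\gamma}(\mathbf{x}_j, \mathbf{x}_B) + \psi_B - \bar{\gamma}(\mathbf{x}_B, \mathbf{x}_B), \tag{11}$$

and is obtained when

$$\sum_{j=1}^{n} \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j) + \psi_B = \bar{\gamma}(\mathbf{x}_i, \mathbf{x}_B) \quad \text{for all } i. \tag{12}$$

The quantity $\bar{\gamma}(\mathbf{x}_i, \mathbf{x}_B)$ is the average semi-variance between the sampling point $\mathbf{x}_i$ and all points within $B$, and $\bar{\gamma}(\mathbf{x}_B, \mathbf{x}_B)$ is the variance within $B$.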

By solving the set of equations (8) we obtain the weights, which we can then use to find the estimated value at $\mathbf{x}_0$, equation (7). We can also determine the estimation variance as a by-product. Thus, kriging not only provides the best, in the sense of most precise, estimate from a particular set of data, it also provides estimates of that precision from which confidence limits can be gauged. This feature makes it especially attractive: it is a proper statistical estimation procedure, unlike other interpolators.
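The steps just described can be sketched in a few lines for punctual kriging. The sketch below assumes an isotropic spherical semi-variogram with the parameters fitted for the cover loam, builds equations (8) together with the constraint that the weights sum to 1, and solves them with NumPy. The names `spherical` and `krige_point`, and the four thickness values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def spherical(h, nugget=187.0, sill_component=603.8, a=101.2):
    """Spherical semi-variogram; the semi-variance at zero lag is zero."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    g = nugget + sill_component * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0.0, 0.0, g)

def krige_point(coords, values, target, gamma=spherical):
    """Solve equations (8) for one target point; return estimate, variance, weights."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    # Distances between every pair of sampling points.
    d_ss = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Semi-variances between each sampling point and the target.
    g_s0 = gamma(np.linalg.norm(coords - target, axis=1))
    # Bordered matrix: the extra row and column impose sum(weights) = 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d_ss)
    A[n, n] = 0.0
    b = np.append(g_s0, 1.0)
    solution = np.linalg.solve(A, b)
    weights, psi = solution[:n], solution[n]
    estimate = weights @ values          # equation (7)
    variance = weights @ g_s0 + psi      # equation (9)
    return estimate, variance, weights

# Hypothetical thicknesses (cm) at the corners of one 20 m grid cell.
corners = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
thickness = [60.0, 70.0, 65.0, 75.0]
est_centre, var_centre, w_centre = krige_point(corners, thickness, (10.0, 10.0))
est_node, var_node, w_node = krige_point(corners, thickness, (0.0, 0.0))
print(est_centre, var_centre, w_centre)
print(est_node, var_node, w_node)
```

With only four symmetric neighbours the weights at the cell centre are equal. At a sampling point the estimate reproduces the observation and the variance falls to zero, up to rounding. A practical program would select the nearest observations for each target rather than use a fixed set.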

The weights are interesting and have important practical consequences. Where there is appreciable spatial dependence the weights for the observations nearest to $\mathbf{x}_0$ are large while those relatively far off are so small that they are negligible. It is usually found that only the nearest 16 to 20 observations carry sufficient weight to influence $\hat{z}(\mathbf{x}_0)$ and that the remainder can be ignored. A kriged estimate is thus local. It is necessary to obtain the semi-variogram accurately only over a fairly short range; long range trends are of no consequence. Ideally this should be done in a previous reconnaissance in which semi-variances are estimated for lags considerably shorter than the sampling interval in the main survey.

The local nature of kriging also bears on the kriging equations. Equations (8) can be represented in matrix form as

$$\mathbf{A} \begin{bmatrix} \boldsymbol{\lambda} \\ \psi \end{bmatrix} = \mathbf{b},$$

where $\mathbf{A}$ is a matrix containing semi-variances between the sampling sites, $\mathbf{b}$ is a vector containing the average semi-variances between the sampling sites and $\mathbf{x}_0$, and $\boldsymbol{\lambda}$ is the vector of weights. Their solution clearly involves inverting $\mathbf{A}$. If this had to embrace all the sampling points in a survey it could be a formidable, if not impossible, task. But by restricting it to include only the nearest few sampling points it becomes small enough to invert very swiftly. For mapping, where interpolated values are needed for many points, the matrix will change, especially if the original data are irregularly scattered. Some ingenious computer programming may still be needed to minimize the number of matrix inversions.

A further feature of kriging is that the area or volume over which kriged estimates are made can be chosen to suit the purpose of the investigation. The aim will often be to interpolate by estimating values at points. In this case kriged estimates at sampling points are equal to the recorded values there. In equation (7) the weight at the sampling point $\mathbf{x}_i = \mathbf{x}_0$ equals 1 and all other weights are zero. In these circumstances kriging provides exact interpolation, and the estimation variance is zero. Elsewhere the estimation variances are necessarily larger than the nugget variance. Estimates for larger blocks are averages. Their values will usually be very similar to those obtained by point kriging at their centroids, except where the centroids coincide with sampling points. Their estimation variances, however, will be less, and in many instances substantially less.
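The exact-interpolation property can be checked directly against equations (7) to (9). The short verification below writes the weights as $\lambda_j$ and the Lagrange multiplier as $\psi$ (a notational assumption) and uses the fact, stated earlier, that the semi-variance at zero lag is zero. Suppose the target coincides with the $k$th sampling point, $\mathbf{x}_0 = \mathbf{x}_k$, and try $\lambda_k = 1$, $\lambda_j = 0$ for $j \ne k$, and $\psi = 0$:

$$
\begin{aligned}
\text{constraint:}\quad & \sum_{j=1}^{n} \lambda_j = 1, \\
\text{equation (8):}\quad & \sum_{j=1}^{n} \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j) + \psi = \gamma(\mathbf{x}_i, \mathbf{x}_k) = \gamma(\mathbf{x}_i, \mathbf{x}_0) \quad \text{for all } i, \\
\text{equation (7):}\quad & \hat{z}(\mathbf{x}_0) = z(\mathbf{x}_k), \\
\text{equation (9):}\quad & \sum_{j=1}^{n} \lambda_j \gamma(\mathbf{x}_j, \mathbf{x}_0) + \psi = \gamma(\mathbf{x}_k, \mathbf{x}_k) = 0.
\end{aligned}
$$

The trial weights satisfy every equation, so they are the kriging solution at that point: the estimate equals the observation and the estimation variance is zero.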

Fig. 4. Block diagram of the thickness of cover loam interpolated by punctual kriging.

This feature of kriging has great merit for the soil water physicist. Almost always his data derive from measurements in narrow boreholes or from instruments that sense water in small volumes of soil. Yet he wishes to know with reasonable confidence the values for larger areas or volumes: experimental plots, irrigation beds, fields and larger regions. Provided the semi-variogram is known accurately block kriging can give the desired estimates.
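Block kriging can be sketched in the same way as punctual kriging. The example below is an illustration under stated assumptions, not the procedure used for the survey: the block is approximated by a grid of discretizing points, the average semi-variances are taken as means over those points, and the nugget variance is counted for every pair in those averages, including coincident ones, to approximate integration over a continuous block. The names, the 5 x 5 sampling grid and the thickness values are hypothetical.

```python
import numpy as np

def spherical(h, nugget=187.0, sill_component=603.8, a=101.2, zero_at_origin=True):
    """Spherical semi-variogram; optionally keep the nugget at zero distance."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    g = nugget + sill_component * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0.0, 0.0, g) if zero_at_origin else g

def pairwise_distances(p, q):
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)

def krige_block(coords, values, centre, side, n_disc=8):
    """Estimate the mean over a square block; return estimate, variance, weights."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Discretize the block into n_disc x n_disc cell centres.
    offsets = ((np.arange(n_disc) + 0.5) / n_disc - 0.5) * side
    gx, gy = np.meshgrid(offsets, offsets)
    block = np.column_stack([gx.ravel(), gy.ravel()]) + np.asarray(centre, dtype=float)
    # Average semi-variances: sampling point to block, and within the block.
    g_sB = spherical(pairwise_distances(coords, block), zero_at_origin=False).mean(axis=1)
    g_BB = spherical(pairwise_distances(block, block), zero_at_origin=False).mean()
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(pairwise_distances(coords, coords))
    A[n, n] = 0.0
    b = np.append(g_sB, 1.0)
    solution = np.linalg.solve(A, b)             # equations (12) plus the constraint
    weights, psi = solution[:n], solution[n]
    estimate = weights @ values                  # equation (10)
    variance = weights @ g_sB + psi - g_BB       # equation (11)
    return estimate, variance, weights

# Hypothetical 5 x 5 grid of observations at 20 m spacing with a gentle linear trend (cm).
grid = np.array([(x, y) for y in range(0, 100, 20) for x in range(0, 100, 20)], dtype=float)
thickness = 60.0 + 0.1 * grid[:, 0] + 0.05 * grid[:, 1]
est_block, var_block, w_block = krige_block(grid, thickness, centre=(40.0, 40.0), side=40.0)
print(est_block, var_block)
```

The block variance comes out well below the nugget variance because averaging over the block removes most of the short-range component, which is the behaviour the text describes. The exact figure depends on the discretization and the neighbourhood chosen.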

Fig. 5. Block diagram of the thickness of cover loam interpolated by kriging 40 m x 40 m blocks.

These features of kriging are illustrated in Figures 4 and 5, again for the thickness of cover loam. Figure 4 shows the thickness as a statistical surface interpolated by punctual kriging. The surface goes through the observed values, and because of the nugget effect, i.e. a spatially uncorrelated component of variance, there are discontinuities in the surface at the sampling points. The surface obtained by kriging 40 m x 40 m blocks each from 25 observations is shown in Figure 5. Discontinuities are absent. Estimation variances for the nodes of a 3 x 3 grid superimposed on each square cell of the original observation grid are given below. The black discs represent the sampling points, the crosses are the other nodes on the interpolation grid.

Punctual kriging

  •   0      x 316      x 316      •   0
  x 316      x 324      x 324      x 316
  x 316      x 324      x 324      x 316
  •   0      x 316      x 316      •   0

Block kriging (40 m x 40 m)

  • 32.9     x 32.5     x 32.5     • 32.9
  x 32.5     x 32.0     x 32.0     x 32.5
  x 32.5     x 32.0     x 32.0     x 32.5
  • 32.9     x 32.5     x 32.5     • 32.9


Regional Estimation

Finally, the physicist may wish to obtain an estimate for the average of some property over a whole region. This has often been regarded as a very simple statistical exercise. The straight arithmetic mean of the data for the region has been calculated, and confidence has been based on the standard error calculated as though there were no spatial dependence. By disregarding any spatial dependence, however, estimates of precision have been unduly conservative: the confidence intervals calculated have been wider than the data warranted (Webster, 1977). Likewise, in planning sampling schemes to achieve particular degrees of precision investigators have overestimated the number of observations needed, often quite seriously. Indeed in some instances it has been so serious that the investigation has been abandoned.

Data can be used more effectively using kriging techniques as follows (McBratney and Webster, in press). We assume a region in two dimensions. The region is divided into polygons such that each contains one sampling point and that all other points in that polygon are nearer to the sampling point in it than to any other sampling point. The result is known as the Dirichlet tessellation, and the individual polygons as Dirichlet tiles or Voronoi polygons. An estimate is made for each tile by kriging, and from these an average is calculated with each kriged value weighted by the area of its tile. The estimation variance is obtained as a weighted average of the extension variances of the tiles divided by the number of tiles. (The extension variance of a tile in this case is the variance of estimating its average value from the single sample measurement contained in it.) Provided the sampling intervals are within the range of the semi-variogram, i.e. the sampling points are spatially dependent, the estimation variance obtained by kriging will be less than that derived from classical theory.
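The area weighting in this procedure can be sketched as follows. The example approximates the area of each Dirichlet tile by assigning the cells of a fine raster to their nearest sampling point, then forms the area-weighted average of per-tile estimates. It does not perform the kriging of each tile or compute the extension variances. The function names, the rectangular region and the two tile estimates are assumptions made for illustration.

```python
import numpy as np

def dirichlet_tile_areas(coords, xlim, ylim, n_grid=200):
    """Approximate Dirichlet tile areas by nearest-point assignment of raster cells."""
    coords = np.asarray(coords, dtype=float)
    dx = (xlim[1] - xlim[0]) / n_grid
    dy = (ylim[1] - ylim[0]) / n_grid
    xs = xlim[0] + (np.arange(n_grid) + 0.5) * dx      # cell-centre coordinates
    ys = ylim[0] + (np.arange(n_grid) + 0.5) * dy
    gx, gy = np.meshgrid(xs, ys)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Squared distance from every cell centre to every sampling point.
    d2 = ((cells[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(coords))
    return counts * dx * dy

def regional_mean(tile_estimates, tile_areas):
    """Average of per-tile estimates, each weighted by the area of its tile."""
    tile_areas = np.asarray(tile_areas, dtype=float)
    weights = tile_areas / tile_areas.sum()
    return float(weights @ np.asarray(tile_estimates, dtype=float))

# Two hypothetical sampling points in a 100 m x 100 m region.
points = [(10.0, 50.0), (50.0, 50.0)]
areas = dirichlet_tile_areas(points, (0.0, 100.0), (0.0, 100.0))
# Hypothetical per-tile estimates (cm); in practice these would be kriged values.
mean_thickness = regional_mean([60.0, 70.0], areas)
print(areas, mean_thickness)
```

The raster approximation is chosen here only to keep the sketch short; an exact tessellation would give the same weights in the limit of a fine raster.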

In the example of the cover loam survey, classical methods gave a mean thickness of 66.25 cm, with variance 1.7404 cm², equivalent to a standard error of 1.32 cm. The kriged estimate was 66.18 cm, with variance 0.5084 cm² or standard error of 0.713 cm. We have obtained up to nine-fold apparent gains in precision in other examples. Comparable gains in efficiency, and hence savings in cost, can be achieved if the semi-variogram is known when planning a sampling scheme.
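The figures quoted for the cover loam can be checked with a line of arithmetic. The ratio of the two variances is given here as an illustrative measure of the gain in precision; it is not a figure reported for the survey.

$$
\sqrt{1.7404} \approx 1.319\ \text{cm}, \qquad \sqrt{0.5084} \approx 0.713\ \text{cm}, \qquad \frac{1.7404}{0.5084} \approx 3.42 .
$$

On the assumption that the variance of a mean falls in proportion to the number of independent observations, the classical estimate would need roughly 3.4 times as many observations to match the kriged variance.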

Conclusion

This paper has reviewed the problem of estimating values of soil properties for small blocks of land and for larger regions given the nature of soil variation. Kriging, based on the notion of soil as a random function, appears to provide the best solution. It seems especially pertinent to physical properties associated with water in soil. It is a subject that soil physicists should master and whose application they should explore in the years to come.


REFERENCES

Burgess, T.M., and Webster, R. 1980a. Optimal interpolation and isarithmic mapping of soil properties. I. The semi-variogram and punctual kriging. Journal of Soil Science 31, 315-331.

Burgess, T.M., and Webster, R. 1980b. Optimal interpolation and isarithmic mapping of soil properties. II. Block kriging. Journal of Soil Science 31, 333-341.

Burgess, T.M., Webster, R., and McBratney, A.B. 1981. Optimal interpolation and isarithmic mapping of soil properties. IV. Sampling strategy. Journal of Soil Science 32, 643-659.

Hajrasuliha, S.N., Baniabbassi, N., Metthey, J., and Nielsen, D.R. 1980. Spatial variability of soil sampling for salinity studies in South-west Iran. Irrigation Science 1, 197-208.

Journel, A.G., and Huijbregts, C.J. 1978. Mining geostatistics. Academic Press, London.

Krige, D.G. 1966. Two dimensional weighted moving average trend surfaces for ore-evaluation. Journal of the South African Institution of Mining and Metallurgy 66, 13-38.

Matheron, G. 1965. Les variables régionalisées et leur estimation. Masson, Paris.

Matheron, G. 1971. The theory of regionalized variables and its applications. Les Cahiers du Centre de Morphologie Mathématique 5, Centre de Géostatistique, Fontainebleau.

McBratney, A.B., Webster, R., McLaren, R.G., and Spiers, R.B. 1982. Regional variation of extractable copper and cobalt in the topsoil of South-east Scotland. Agronomie 2 (in press).

McBratney, A.B., and Webster, R. In press. How many observations are needed for regional estimation of soil properties? Soil Science (in press).

Olea, R.A. 1975. Optimal mapping techniques using regionalized variable theory. Series on Spatial Analysis, No. 2. Kansas Geological Survey, Lawrence.

Sisson, J.B., and Wierenga, P.J. 1981. Spatial variability of steady-state infiltration rates as a stochastic process. Soil Science Society of America Journal 45, 699-704.

Vieira, S.R., Nielsen, D.R., and Biggar, J.W. 1981. Spatial variability of field-measured infiltration rate. Soil Science Society of America Journal 45, 1040-1048.

Webster, R. 1977. Quantitative and numerical methods in soil classification and survey. Oxford University Press.

Webster, R. 1978. Mathematical treatment of soil information. Transactions, 11th International Congress of Soil Science 3, 161-190.

Webster, R. 1982. Spatial analysis of soil and its application to soil mapping. In: Computer applications in geology I and II. Miscellaneous Paper No. 14, Geological Society of London, pp. 103-136.

Webster, R., and Burgess, T.M. 1980. Optimal interpolation and isarithmic mapping of soil properties. III. Changing drift and universal kriging. Journal of Soil Science 31, 505-524.
