Empirical Bayes estimation for archaeological stratigraphy

G. T. Allum,

University of Leeds and University of Bradford, UK

R. G. Aykroyd

University of Leeds, UK

and J. G. B. Haigh

University of Bradford, UK

[Received June 1997. Final revision April 1998]

Summary. Models and algorithms are presented for the stratigraphic analysis of earth core samples collected at archaeological sites. The aim is to separate the occupation of the site into distinct periods, by dividing the earth core into well-defined blocks of uniform magnetic susceptibility. The models describe the response of detector equipment by using both a spread function and an error process, and they incorporate prior beliefs regarding the nature of the true susceptibility values. The prior parameters are estimated by using pseudolikelihood and the susceptibilities by maximum a posteriori methods via the one-step-late algorithm. These procedures are illustrated with data from synthetic and real core specimens. The new procedures prove to be far superior to other approaches, producing reconstructions which clearly show distinct periods of uniform magnetic susceptibility separated by sharp discontinuities.

Keywords: Archaeology; Concave priors; Geophysics; Implicit discontinuity priors; Maximum a posteriori estimation; One-step-late algorithm; Pseudolikelihood; Stratigraphy

1. Introduction

A side-effect of human occupation of a site is a tendency to increase the concentration of magnetic oxides of iron in the top-soil (Le Borgne, 1955; Tite and Mullins, 1971; Mullins, 1977). When ancient pits and ditches have been filled with such top-soil, through natural processes of deposition, they have a slightly increased magnetization which can be detected through small anomalies in the overall magnetic field above them; this is the basis of an important technique of geophysical survey on archaeological sites.

The ratio of induced magnetization (per unit volume) to the inducing magnetic field is called the magnetic volume susceptibility, or more simply the susceptibility. In the case considered here the greater the concentration of magnetic oxides in a particular sample of top-soil the greater will be its susceptibility. Magnetization and magnetic field strength are measured in the same physical units (amperes per metre in Système International (SI) units) and hence susceptibility is a dimensionless quantity. It is important to note that, since magnetization and magnetic field strength are not equivalent physical quantities, the value of susceptibility depends on the system of units in use (Mullins (1977) and Parasnis (1986), Table 2.1). In modern SI units, top-soils are expected to have susceptibilities of the order of $10^{-4}$.

Address for correspondence: R. G. Aykroyd, Department of Statistics, University of Leeds, Leeds, LS2 9JT, UK. E-mail: [email protected]

© 1999 Royal Statistical Society 0035-9254/99/48001

Appl. Statist. (1999) 48, Part 1, pp. 1–14

As well as detecting variations in susceptibility through minute changes in the total magnetic field above the ground, it is useful to measure susceptibility directly. The removal of small samples of soil for laboratory analysis may lead to a better understanding of the processes of site deposition. It would clearly be advantageous to take a whole stack of samples in the form of a vertical soil core. By measuring the susceptibility of slices of the core at different depths it is possible to estimate the depth and vertical extent of the occupation layers, quantities which cannot be observed directly by area magnetometry.

A core sample of the earth is obtained by using a soil borer, a standard pedological tool, which can penetrate over 1 m deep in stone-free ground. If the earth is particularly soft and void of pebbles or large bits of masonry then a plastic cylinder can be pushed into the soil and sealed on site to prevent the sample from drying out, to obtain a volumetrically correct sample. Since water and air are almost magnetically neutral the susceptibility of a feature is largely unaffected by variations in moisture content, although drying out can change the volume. Often the strata in the core show no variation in colour or texture, but an analysis of the magnetic susceptibility can differentiate between the separate periods of the now departed landscape.

Once collected, the core is passed through a detector coil, allowing readings of the susceptibility to be made along the length of the sample. Since the detector coil is sensitive to the susceptibility across an extended section of the sample, the instantaneous reading indicates, not the value at a sharply defined point, but a weighted average of the values over an extended range.

Several techniques are available whereby the restoration of a true signal may be attempted by direct inversion. Techniques such as Fourier inversion prove to be unsatisfactory, however, since high frequency components in the signal are smoothed away by the broad spread function but almost certainly exist in the recorded data because of noise in the measuring system. Such components are grossly magnified by the inversion process, leading to a restored signal which may consist largely of random noise. The Wiener filter (Gonzalez and Wood, 1992) provides a method of suppressing the intrusive high frequency components, but it has proved to have only limited success for magnetic stratigraphy data (Allum et al., 1996). Although Fourier inversion in combination with the Wiener filter may reduce the broad spread of the peaks in the read-out from the detector, the restored signal is still very smooth, making it impossible to obtain a sharp division between the regions of different susceptibility. This is in contradiction to the archaeologists' expectation that there should be clearly defined boundaries between neighbouring blocks of soil (known as 'contexts'), indicated by sharp changes in the magnetic susceptibility. Among other methods of non-linear deconvolution, one commonly used technique is maximum entropy (Jain, 1989). This is a powerful method for signal reconstruction which can be applied when the source is known to take only non-negative values. It is not applicable in this example since the susceptibility is negative in diamagnetic materials such as quartz, marble, graphite, rock salt and gypsum (Parasnis (1986), p. 7). The aim of this paper is to present a method of restoration which takes account of the archaeologists' prior expectations, by allowing the reconstructed profile to be divided into a limited number of blocks of sharply distinguishable susceptibility.

2. Modelling the spread function

To collect data, the earth core in its protective plastic cylinder is positioned a small distance from one end of the detector coil in line with the coil's axis. The cylinder is then moved mechanically in small steps of equal size, pausing between movements for readings to be recorded automatically. In its initial position, the core is sufficiently far from the coil that its effect on the coil is negligible and hence the first reading is effectively zero. As the core is moved nearer and passes into the coil, the readings increase; as it emerges from the other end of the coil, the readings decrease. The core continues to advance beyond the coil until the reading has effectively returned to zero. The recorded output from this procedure includes a clear indication of the zero level of the apparatus and the tails of the spread function due to the distant effect of the core.

Let the output readings be denoted $y = \{y_i : i = 1, \ldots, n\}$. The first few readings are recorded well before the core enters the coil, and the last few well after it has emerged. In both cases it may be assumed that the magnetic susceptibility of the core has a negligible effect on the coil. It is unusual for the exact boundaries of the physical core to be recorded; hence this knowledge has not been incorporated in the modelling. If reliable information were available then it should be possible to improve the reconstructions, particularly at the ends of the core.

Although the true magnetic susceptibility may vary continuously along the core, for ease of estimation we shall consider the core to be partitioned into $m$ elements along its length, with susceptibilities denoted by $x = \{x_j : j = 1, \ldots, m\}$. It is always assumed that the first element $x_1$ is at the centre of the coil when the first reading $y_1$ is taken, and that the last element $x_m$ is there when the last reading $y_n$ is taken. Neither of these elements is part of the physical core; both belong to a linear extension of the core into a region of zero susceptibility. Using the methods described later, it is not necessary for the elements of the core to match the steps of the measuring apparatus, so the resolution of the estimated profile may be increased or decreased from that of the measurement. Since there is no particular reason to change the resolution for the cases described in this paper, the width of the elements is set equal to the length of each step, and hence $m$ is set equal to $n$.

The susceptibility is measured by detecting small changes in the inductance of the coil as the core is passed through it. According to standard theory of magnetism, the coil induces a magnetic dipole in each element of the core (Linington, 1964). For a single-turn coil with a point source on its axis, the resultant change in the inductance is

$$\frac{r^2}{2 (r^2 + d^2)^{3/2}}, \qquad (1)$$

where $r$ is the radius of the coil and $d$ is the distance, along the coil axis, from the centre of the coil to the point source.

The measuring device that is used in practice is unlikely to be a single turn but may contain many, and in consequence will occupy an extended region of space, both in length and in radial thickness. In addition the core is not defined in terms of point sources, but elements with cross-sectional area. It is difficult to account for all the features of an actual measuring device, including the finite dimensions of both the coil and the core, but a suitable approximation (Allum, 1997) for the change in the inductance is

$$h = \frac{1}{4w} \left[ \frac{d + w}{\sqrt{(r - a)^2 + (d + w)^2}} - \frac{d - w}{\sqrt{(r - a)^2 + (d - w)^2}} \right], \qquad (2)$$

where $a$ is the radius of the core and $r$ and $2w$ are the radius and length of the coil.

Although both spread functions, defined in equations (1) and (2), have similar tails, being nearly proportional to an inverse cube at large distances from the coil, the broader central portion defined by equation (2) makes it a better approximation to the truth. The factors $r^2/2$ in equation (1) and $1/(4w)$ in equation (2) ensure that the spread functions are correctly normalized when the unit of distance is taken as the width of a core element; the final estimates will then have the correct magnitude.
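As a rough numerical illustration of the normalization and tail behaviour just described, the two spread functions can be evaluated directly. The following Python sketch assumes NumPy; the coil and core dimensions are hypothetical values chosen only for illustration (in units of one core-element width), and the function names are not taken from the original software.

```python
import numpy as np

def spread_single_turn(d, r):
    """Equation (1): single-turn coil of radius r, point source at axial distance d."""
    return r**2 / (2.0 * (r**2 + d**2) ** 1.5)

def spread_extended(d, r, a, w):
    """Equation (2): coil of radius r and length 2w, core of radius a."""
    gap2 = (r - a) ** 2
    upper = (d + w) / np.sqrt(gap2 + (d + w) ** 2)
    lower = (d - w) / np.sqrt(gap2 + (d - w) ** 2)
    return (upper - lower) / (4.0 * w)

# Hypothetical dimensions, in units of one core-element width.
r, a, w = 4.0, 2.5, 3.0

# Riemann-sum check that each spread function integrates to (nearly) one.
d = np.linspace(-2000.0, 2000.0, 400001)
step = d[1] - d[0]
area_single = np.sum(spread_single_turn(d, r)) * step
area_extended = np.sum(spread_extended(d, r, a, w)) * step

# Inverse-cube tails: doubling the distance should divide the response by about 8.
tail_ratio_single = spread_single_turn(100.0, r) / spread_single_turn(200.0, r)
tail_ratio_extended = spread_extended(100.0, r, a, w) / spread_extended(200.0, r, a, w)

print(area_single, area_extended, tail_ratio_single, tail_ratio_extended)
```

Both areas should be close to one and both tail ratios close to eight, whatever plausible dimensions are substituted.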

With an extended core made of many elements, we replace $d$ in equation (2) by $d_{ij}$, and likewise $h$ by $h_{ij}$, where $d_{ij}$ is the distance along the coil axis, from the centre of the coil to the position of core element $j$ at step $i$, i.e. when data value $y_i$ is recorded. The expectation of the observed susceptibility at step $i$ is then given by

$$\mu_i = E(y_i) = \sum_{j=1}^{m} x_j h_{ij} \quad \text{for } i = 1, \ldots, n. \qquad (3)$$

In practice, the observed measurements are subject to error from various sources. It has been verified in calibration experiments (Allum, 1997) that a Gaussian error model, with zero mean and constant variance, accounts satisfactorily for the apparent errors.
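The forward model of equation (3) with Gaussian errors can be simulated in a few lines. This is a minimal sketch, not the calibration procedure of Allum (1997): it assumes $m = n$ with unit steps, so that $d_{ij} = j - i$ element widths, and the dimensions, block heights and noise level are all hypothetical.

```python
import numpy as np

def spread_extended(d, r, a, w):
    """Equation (2): coil of radius r and length 2w, core of radius a."""
    gap2 = (r - a) ** 2
    upper = (d + w) / np.sqrt(gap2 + (d + w) ** 2)
    lower = (d - w) / np.sqrt(gap2 + (d - w) ** 2)
    return (upper - lower) / (4.0 * w)

def spread_matrix(n, r, a, w):
    """Matrix of h_ij, assuming m = n and d_ij = j - i in element widths."""
    idx = np.arange(n, dtype=float)
    d = idx[None, :] - idx[:, None]
    return spread_extended(d, r, a, w)

rng = np.random.default_rng(0)
n = 120
H = spread_matrix(n, r=4.0, a=2.5, w=3.0)    # hypothetical coil and core dimensions

# Hypothetical blocky "truth": zero susceptibility outside two uniform blocks.
x_true = np.zeros(n)
x_true[40:60] = 3.0e-4
x_true[70:85] = 1.0e-4

mu = H @ x_true                              # expected readings, equation (3)
sigma = 2.0e-6                               # hypothetical noise standard deviation
y = mu + rng.normal(0.0, sigma, size=n)      # zero-mean, constant-variance errors
```

Plotting `x_true` against `y` shows the blurring that the later sections attempt to undo: the sharp block edges are replaced by smooth shoulders.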

3. Methods and algorithms

3.1. Estimation

Assuming an additive Gaussian error model the conditional distribution of the data given the truth is

$$Y_i \mid x \sim N(\mu_i, \sigma^2) \quad \text{for } i = 1, \ldots, n, \qquad (4)$$

where $\mu_i$ is given in equation (3).

In many problems involving inverse estimation, maximum likelihood produces unacceptable estimates. The Bayesian approach to estimation quantifies prior knowledge in terms of a probability distribution. This is combined with the data likelihood by using Bayes's theorem, and inference about the true profile is based on the resultant posterior distribution. The development of the general framework for Bayesian image analysis can be attributed to Besag (1983), Grenander (1983) and Geman and Geman (1984); Besag et al. (1995) is a good general reference for the subject.

Any beliefs about the truth can be incorporated globally or locally. The current work follows the views expressed by many researchers, including Green (1986, 1990a) and Aykroyd and Green (1991), that posterior distributions are sensitive only to the local properties of the prior and that global modelling may not be important; hence it concentrates on the local characteristics only.

By using a prior in the form of a Gibbs distribution, i.e. $\pi(x \mid \beta) \propto \exp\{-\beta U(x)\}$, where $U$ is called the energy, an analogy with physical systems can be exploited (Metropolis et al., 1953; Geman and Geman, 1984). Expected configurations of the physical system will have low energy and hence high probability in the statistical system. An energy function of the form

$$U(x) = \sum_{j' \sim j} \phi(x_{j'} - x_j)$$

considers the interaction of pairs of neighbouring elements only, where $j' \sim j$ indicates adjacent elements. The (non-negative) prior parameter $\beta$ determines the degree of correlation between neighbouring elements. The pairwise difference prior is then

$$\pi(x \mid \beta) = \frac{1}{z(\beta)} \exp\Big\{ -\beta \sum_{j' \sim j} \phi(x_{j'} - x_j) \Big\}, \qquad (5)$$

where the normalizing constant $z$ is given by

$$z(\beta) = \int \exp\Big\{ -\beta \sum_{j' \sim j} \phi(x_{j'} - x_j) \Big\} \, dx.$$

The potential function $\phi$ is chosen to satisfy the criteria

(a) $\phi(u) = \phi(-u)$,
(b) $\phi(u) \ge 0$ and $\phi(0) = 0$, and
(c) $\phi(u)$ increases with $|u|$.

Using a pairwise difference prior of the form of equation (5), earlier researchers (Geman and McClure, 1987; Besag, 1989; Green, 1990a) have made various suggestions for the choice of $\phi$. Extensive experiments on simulated core data (Allum et al., 1996) revealed that the median potential $\phi(u) = |u|$ was the best of these but was still not completely satisfactory, in that the boundary between adjacent blocks was more diffuse than would be expected archaeologically. To sharpen the boundary, it is necessary that a single large jump in susceptibility should have higher probability than the equivalent series of small jumps. This can be achieved by considering an implicit discontinuity potential, a quadratic function (concave downwards) for small values of its argument but linear for larger values:

$$\phi(u) = \begin{cases} |u| - \dfrac{\kappa - 1}{2 \kappa \delta} \, |u|^2 & \text{if } |u| \le \delta, \\[1ex] \dfrac{1}{\kappa} \, |u| + \dfrac{(\kappa - 1) \delta}{2 \kappa} & \text{if } |u| > \delta. \end{cases} \qquad (6)$$

At the crossover $|u| = \delta$ the two pieces of the function are smoothly matched in both magnitude and gradient. The parameter $\kappa$ takes a large positive value, ensuring that the linear portion of $\phi$ has a small positive gradient to satisfy criterion (c). Otherwise the precise value of $\kappa$ has little significance, and $\beta$ and $\delta$ are the only prior parameters which require to be estimated.
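The properties claimed for the implicit discontinuity potential are easy to check numerically. In the sketch below the crossover threshold is called `delta` and the large gradient parameter `kappa`; these names and the numerical values are assumptions made for illustration only.

```python
import numpy as np

def phi(u, delta, kappa):
    """Implicit discontinuity potential: concave quadratic inside delta, linear outside."""
    au = np.abs(np.asarray(u, dtype=float))
    inner = au - (kappa - 1.0) / (2.0 * kappa * delta) * au**2
    outer = au / kappa + (kappa - 1.0) * delta / (2.0 * kappa)
    return np.where(au <= delta, inner, outer)

def dphi(u, delta, kappa):
    """Derivative of phi with respect to u (taken as 0 at the kink u = 0)."""
    u = np.asarray(u, dtype=float)
    au = np.abs(u)
    inner = 1.0 - (kappa - 1.0) / (kappa * delta) * au
    return np.sign(u) * np.where(au <= delta, inner, 1.0 / kappa)

delta, kappa = 1.0e-3, 100.0          # hypothetical values for illustration

# Energy of one jump of size 4*delta against four successive jumps of size delta.
jump = 4.0e-3
one_large_jump = float(phi(jump, delta, kappa))
four_small_jumps = 4.0 * float(phi(jump / 4.0, delta, kappa))

# Under the median potential |u| the two configurations have equal energy.
median_one, median_four = abs(jump), 4.0 * abs(jump / 4.0)

print(one_large_jump, four_small_jumps, median_one, median_four)
```

With these values the single jump has roughly a quarter of the energy of the staircase, so the prior favours it, whereas the median potential is indifferent between the two.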

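Combining the prior density $\pi(x \mid \beta, \delta)$ and the likelihood $\pi(y \mid x)$ by Bayes's theorem produces the posterior density of any $x$ given the data $y$

$$\pi(x \mid y, \beta, \delta) = \frac{\pi(y \mid x) \, \pi(x \mid \beta, \delta)}{\pi(y \mid \beta, \delta)}.$$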

Inference about the truth is then based on this posterior distribution.

3.2. EM and one-step-late algorithms

To estimate the true susceptibility we must choose the value of $x$ which maximizes the likelihood or posterior. The high dimensionality of the problem means that conventional quasi-Newton or conjugate gradient procedures (e.g. the Davidon–Fletcher–Powell or Fletcher–Reeves–Polak–Ribiere procedures (Press et al., 1992)) are not practicable as convergence is achieved extremely slowly. A suitable alternative procedure is the expectation–maximization (EM) algorithm (Dempster et al., 1977), which generalized several other procedures emerging from specific applications (e.g. Richardson (1972) and Lucy (1974)). The algorithm was introduced as a

'general approach to the iterative computation of maximum-likelihood estimates when the observations can be viewed as incomplete data'.

The subsequent approach closely follows that of Green (1990a, b). The missing data $\{z_{ij} : i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m\}$, where $z_{ij}$ is the contribution from core element $j$ to the recording at step $i$, cannot be directly observed. Instead, only sums of contributions from all elements $y_i = \sum_j z_{ij}$ are recorded. The values $y$ and $z$ together constitute the complete data. Since $y_i$ is distributed according to equation (4), the distribution of the disaggregated data is

$$z_{ij} \sim N(x_j h_{ij}, \sigma^2 / m).$$

As a first step in the EM algorithm, each $z_{ij}$ is estimated by its conditional expectation given the data (E-step):

$$\hat{z}_{ij} = E(z_{ij} \mid y) = h_{ij} x_j + (y_i - \mu_i)/m.$$

If the values $z_{ij}$ were observed, with log-likelihood

$$\log\{\pi(z \mid x)\} \propto -\frac{m}{2 \sigma^2} \sum_{j=1}^{m} \sum_{i=1}^{n} (z_{ij} - h_{ij} x_j)^2,$$

the maximum likelihood estimate of $x_j$ could be found (M-step). The E- and M-steps are combined to give the iterative step for the EM algorithm

$$x_j^{\text{new}} = x_j^{\text{old}} + \frac{\sum (y_i - \mu_i) h_{ij}}{m \sum h_{ij}^2},$$

all sums being for $i = 1, \ldots, n$. The starting values are arbitrary; here all zeros were used. The iterative step is repeated until convergence is obtained; the proof of convergence is given by Dempster et al. (1977). Major drawbacks of such deterministic procedures are that there is no guarantee that the final solution will be the global maximum and typically the rate of convergence is very slow. In fact, convergence is often so slow that the maximum likelihood estimate is never achieved by the EM approach and the noisiness of the reconstruction is usually increasing when the algorithm is stopped (Green, 1990a).
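The combined EM step amounts to a single vectorized update. The sketch below is illustrative only: it assumes $m = n$ and $d_{ij} = j - i$ element widths when building the matrix of $h_{ij}$, and the dimensions, test profile and noise level are hypothetical.

```python
import numpy as np

def spread_matrix(n, r, a, w):
    """h_ij from equation (2), assuming m = n and d_ij = j - i element widths."""
    idx = np.arange(n, dtype=float)
    d = idx[None, :] - idx[:, None]
    gap2 = (r - a) ** 2
    upper = (d + w) / np.sqrt(gap2 + (d + w) ** 2)
    lower = (d - w) / np.sqrt(gap2 + (d - w) ** 2)
    return (upper - lower) / (4.0 * w)

def em_step(x, y, H):
    """One combined E- and M-step for the maximum likelihood estimate."""
    m = x.size
    resid = y - H @ x                        # y_i - mu_i at the old estimate
    return x + (H.T @ resid) / (m * np.sum(H**2, axis=0))

rng = np.random.default_rng(1)
n = 80
H = spread_matrix(n, r=4.0, a=2.5, w=3.0)    # hypothetical dimensions
x_true = np.zeros(n)
x_true[30:50] = 3.0e-4                       # hypothetical single block
y = H @ x_true + rng.normal(0.0, 2.0e-6, size=n)

x = np.zeros(n)                              # all-zero starting values
rss = [np.sum((y - H @ x) ** 2)]
for _ in range(500):
    x = em_step(x, y, H)
    rss.append(np.sum((y - H @ x) ** 2))
rss = np.array(rss)
```

The residual sum of squares in `rss` never increases from one iteration to the next, but the factor $1/m$ in the update makes progress slow, in line with the remarks above.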

When maximizing the log-posterior probability, the E-step is unchanged, but the maximum a posteriori (MAP) estimate of $x_j$ is found by solving the simultaneous equations (the M-step)

$$\frac{m}{\sigma^2} \sum_{i=1}^{n} (z_{ij} - h_{ij} x_j) h_{ij} - \beta \sum_{j' \in \partial j} \frac{\partial \phi_{j'j}}{\partial x_j} = 0, \quad \text{for } j = 1, \ldots, m,$$

where $\phi_{j'j} = \phi(x_{j'} - x_j, \delta)$ and $\partial j$ denotes the neighbouring elements of $j$; later $x_{\partial j}$ will be used to represent the set of values of the neighbours. For any potential satisfying the criteria of Section 3.1 these equations will be either coupled, or non-linear or both, and hence difficult to solve. To avoid this problem, the one-step-late (OSL) algorithm (Green, 1990b) can be used. In the OSL algorithm the terms $\partial \phi_{j'j} / \partial x_j$ are evaluated at the old estimates rather than at the new. In maximizing the log-posterior probability this leads to the M-step (for the OSL algorithm)

$$x_j^{\text{new}} = \frac{1}{\sum h_{ij}^2} \left\{ \sum_{i=1}^{n} z_{ij} h_{ij} - \frac{\beta \sigma^2}{m} \sum_{j' \in \partial j} \left. \frac{\partial \phi_{j'j}}{\partial x_j} \right|_{x^{\text{old}}} \right\}.$$

Combining the E- and M-steps for the OSL algorithm gives the iteration for the OSL algorithm

$$x_j^{\text{new}} = x_j^{\text{old}} + \frac{1}{m \sum h_{ij}^2} \left\{ \sum_{i=1}^{n} (y_i - \mu_i) h_{ij} - \beta \sigma^2 \sum_{j' \in \partial j} \left. \frac{\partial \phi_{j'j}}{\partial x_j} \right|_{x^{\text{old}}} \right\}.$$


Again the starting value is arbitrary and the rates of convergence are slow. For the OSL algorithm, however, there is no guarantee that the algorithm will converge and, if it does, then no guarantee that it is the global maximum.

An advantage of using the OSL algorithm over other algorithms to find the MAP estimate is that prior knowledge can be incorporated without greatly increasing the computational expense; the cost is only slightly greater than that of the EM algorithm for maximum likelihood.
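The small extra cost of the OSL iteration is visible when it is written out: the only addition to the EM step is the derivative of the potential over neighbouring pairs, evaluated at the old estimate. The sketch below assumes the implicit discontinuity potential with threshold `delta` and gradient parameter `kappa` (names chosen here), $m = n$ with $d_{ij} = j - i$, and hypothetical settings loosely on the scale of the phantom-core estimates; it is not the authors' code.

```python
import numpy as np

def spread_matrix(n, r, a, w):
    """h_ij from equation (2), assuming m = n and d_ij = j - i element widths."""
    idx = np.arange(n, dtype=float)
    d = idx[None, :] - idx[:, None]
    gap2 = (r - a) ** 2
    upper = (d + w) / np.sqrt(gap2 + (d + w) ** 2)
    lower = (d - w) / np.sqrt(gap2 + (d - w) ** 2)
    return (upper - lower) / (4.0 * w)

def dphi(u, delta, kappa):
    """Derivative of the implicit discontinuity potential (0 at the kink u = 0)."""
    au = np.abs(u)
    inner = 1.0 - (kappa - 1.0) / (kappa * delta) * au
    return np.sign(u) * np.where(au <= delta, inner, 1.0 / kappa)

def prior_gradient(x, delta, kappa):
    """Sum over neighbours j' of d phi(x_j' - x_j) / d x_j, for each element j."""
    g = dphi(np.diff(x), delta, kappa)       # phi'(x_{j+1} - x_j) for each pair
    grad = np.zeros_like(x)
    grad[:-1] -= g                           # contribution of the right-hand neighbour
    grad[1:] += g                            # contribution of the left-hand neighbour
    return grad

def osl_step(x, y, H, beta, sigma2, delta, kappa):
    """One OSL iteration: EM step plus a prior term evaluated at the old estimate."""
    m = x.size
    resid = y - H @ x
    penalty = beta * sigma2 * prior_gradient(x, delta, kappa)
    return x + (H.T @ resid - penalty) / (m * np.sum(H**2, axis=0))

# Hypothetical settings; kappa in particular is an arbitrary large value.
beta, sigma2, delta, kappa = 18.0e3, 15.7e-10, 1.4e-3, 100.0

rng = np.random.default_rng(3)
n = 80
H = spread_matrix(n, r=4.0, a=2.5, w=3.0)
x_true = np.zeros(n)
x_true[30:50] = 5.0e-3
y = H @ x_true + rng.normal(0.0, np.sqrt(sigma2), size=n)

x_map = np.zeros(n)                          # arbitrary starting value
for _ in range(2000):
    x_map = osl_step(x_map, y, H, beta, sigma2, delta, kappa)
```

Setting `beta = 0` recovers the EM step exactly, which is a convenient check on any implementation. Since convergence of the OSL iteration is not guaranteed, the fixed iteration count used here is a simplification rather than a recommended stopping rule.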

3.3. Pseudolikelihood parameter estimation

From equation (5) the distribution of $x$ given $\beta$ and $\delta$ is

$$\pi(x \mid \beta, \delta) = \frac{1}{z(\beta, \delta)} \exp\Big\{ -\beta \sum_{j=1}^{m} \sum_{j' \in \partial j} \phi(x_j - x_{j'}, \delta) \Big\}, \qquad (7)$$

where $\partial j$ indicates the neighbours of element $j$. Geman and McClure (1987) proposed the use of the EM algorithm for estimating a single prior parameter concurrently with the estimation of $x$; a similar approach is employed here for the estimation of two prior parameters, namely $\beta$ and $\delta$.

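On the assumptions that

(a) it is possible to observe $x$ directly and
(b) each $x_j$ given its neighbours $x_{\partial j}$ is independent of all other $x_j$s,

the conditional distribution of $x_j$ given $x_{\partial j}$, $\beta$ and $\delta$ is

$$\pi(x_j \mid x_{\partial j}, \beta, \delta) = \frac{1}{z_j(\beta, \delta)} \exp\Big\{ -\beta \sum_{j' \in \partial j} \phi(x_j - x_{j'}, \delta) \Big\}.$$

The product

$$\pi^*(x \mid \beta, \delta) = \prod_{j=1}^{m} \pi(x_j \mid x_{\partial j}, \beta, \delta) \qquad (8)$$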

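is then termed the pseudolikelihood of $x$ given $\beta$ and $\delta$ (Besag, 1975). Consequently the maximum pseudolikelihood estimates of $\beta$ and $\delta$ can be found in principle by solving

$$\sum_{j=1}^{m} \Big\{ \sum_{j' \in \partial j} \phi(x_j - x_{j'}, \delta) + \frac{1}{z_j(\beta, \delta)} \frac{\partial}{\partial \beta} z_j(\beta, \delta) \Big\} = 0,$$

$$\sum_{j=1}^{m} \Big\{ \beta \sum_{j' \in \partial j} \frac{\partial}{\partial \delta} \phi(x_j - x_{j'}, \delta) + \frac{1}{z_j(\beta, \delta)} \frac{\partial}{\partial \delta} z_j(\beta, \delta) \Big\} = 0.$$

Rather than attempt to solve these complex equations explicitly, a profile likelihood approach is used; estimates of $\beta$ are found over a range of $\delta$-values, and the resultant log-profile-pseudolikelihood $\log[\pi^*\{x \mid \hat{\beta}(\delta), \delta\}]$ is examined; even this involves considerable effort.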
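One simple way to carry out such a profile search is sketched below. It is an assumption-laden illustration rather than the procedure used for the results in this paper: each normalizing constant $z_j$ is approximated by a Riemann sum on a fixed grid, the inner maximization is a coarse grid search, the profile `x` is a synthetic blocky signal in arbitrary units, and the parameter names `beta`, `delta` and `kappa` are chosen here.

```python
import numpy as np

def phi(u, delta, kappa):
    """Implicit discontinuity potential: concave quadratic inside delta, linear outside."""
    au = np.abs(u)
    inner = au - (kappa - 1.0) / (2.0 * kappa * delta) * au**2
    outer = au / kappa + (kappa - 1.0) * delta / (2.0 * kappa)
    return np.where(au <= delta, inner, outer)

def neighbour_energy(values, x, delta, kappa):
    """Sum over neighbours j' of phi(v_j - x_j'), where v_j = values[..., j]."""
    energy = np.zeros(values.shape)
    energy[..., 1:] += phi(values[..., 1:] - x[:-1], delta, kappa)    # left neighbour
    energy[..., :-1] += phi(values[..., :-1] - x[1:], delta, kappa)   # right neighbour
    return energy

def log_pseudolikelihood(x, beta, delta, kappa, t):
    """Log-pseudolikelihood, with each z_j approximated by a Riemann sum over grid t."""
    dt = t[1] - t[0]
    e_obs = neighbour_energy(x, x, delta, kappa)
    grid = np.broadcast_to(t[:, None], (t.size, x.size))
    a = -beta * neighbour_energy(grid, x, delta, kappa)
    a_max = a.max(axis=0)                    # stabilizes the exponentials
    log_z = a_max + np.log(np.sum(np.exp(a - a_max), axis=0) * dt)
    return float(np.sum(-beta * e_obs - log_z))

# Synthetic blocky profile in arbitrary units (hypothetical, for illustration).
rng = np.random.default_rng(2)
x = np.concatenate([np.zeros(10), np.ones(10), np.full(10, 0.3)])
x = x + rng.normal(0.0, 0.02, size=x.size)

kappa = 10.0
t = np.arange(-40.0, 41.0, 0.005)            # integration grid for each z_j
betas = np.geomspace(5.0, 40.0, 10)          # coarse search grid for beta
deltas = np.array([0.05, 0.2, 0.5, 2.0, 1.0e8])

profile, beta_hat = [], []
for delta in deltas:
    values = [log_pseudolikelihood(x, b, delta, kappa, t) for b in betas]
    best = int(np.argmax(values))
    profile.append(values[best])             # profile log-pseudolikelihood at delta
    beta_hat.append(betas[best])             # maximizing beta on the grid
```

The very large final value of `delta` makes the potential indistinguishable from the median potential over the range of the data, so the last entry of `profile` plays the role of the median-potential limit. A finer search or a one-dimensional optimizer could replace the grid over `betas`.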

4. Experiments

4.1. Application to phantom cores

The techniques described in Section 3 will now be tested by applying them to data sets obtained from phantom cores. Each core was created by making a multilayered 'sandwich' from blocks of material with known susceptibility, possibly including magnetically neutral material. The distinct boundaries between adjacent blocks are intended to reproduce the sharp contrasts that are expected between different archaeological contexts. Two cores were selected as typical of the many that are available. The advantage of using such phantom cores is that the true susceptibilities are known, and so the accuracy of the statistical estimation can be assessed. The true susceptibility profiles are shown in Figs 1(a) and 2(a), with the observed data in Figs 1(b) and 2(b). The susceptibilities restored by means of the Fourier-based Wiener filter are shown in Figs 1(c) and 2(c); these may be used for comparison with the statistical methods. Maximum likelihood estimates are shown in Figs 1(d) and 2(d), and are clearly very unhelpful. It should be pointed out that, owing to the very slow convergence of the EM algorithm, the true maximum likelihood estimates are likely to be even noisier than these reconstructions. Figs 1(e) and 2(e) show reconstructions using a median potential, with the prior parameter chosen by using pseudolikelihood.

Fig. 1. Phantom core I: the true susceptibility profile (a) is made from blocks with known susceptibility; this core is passed through the detector equipment to produce the observed data (b); the Wiener filter estimate (c) shows the approximate location of the blocks; the maximum likelihood estimate (d) has wildly fluctuating noise which hides the signal; the MAP estimate using the median potential function (e) shows the main blocks clearly, but the detail of the block on the left-hand side is not resolved; the MAP estimate using the implicit discontinuity potential function (f) clearly shows all blocks with sharp jumps between blocks

Fig. 3 shows $\log[\pi^*\{x \mid \hat{\beta}(\delta), \delta\}]$ and $\hat{\beta}(\delta)$ plotted over a range of $\delta$ for phantom core I; the plots for phantom core II are almost identical and are not shown. For both data sets the log-pseudolikelihood increases with $\delta$ to a maximum and then, although it is not clearly shown in the graph, slightly decreases and settles at a constant value.


Fig. 2. Phantom core II: the true susceptibility profile (a) is made from blocks with known susceptibility, which give the observed data (b); the Wiener filter estimate (c) shows the general location of the blocks; the wild fluctuations in the maximum likelihood estimate (d) hide the signal; the MAP estimate using the median potential function (e) shows the blocks clearly, but with sloping sides; the MAP estimate using the implicit discontinuity potential function (f) clearly shows the blocks with very sharp jumps between blocks

From Fig. 3(a) it can be seen that for phantom core I the log-pseudolikelihood achieves a maximum value $\log\{\pi^*(x \mid \beta, \delta)\} = 1741$ for $\delta = 1.4 \times 10^{-3}$ and from Fig. 3(b) the corresponding estimate of $\beta$ is $\hat{\beta} = 18 \times 10^{3}$. For large values of $\delta$ the log-pseudolikelihood settles at a constant 1734 with a value $\hat{\beta} = 21 \times 10^{3}$; these correspond to the maximum pseudolikelihood values for a median potential. For phantom core II, $\log\{\pi^*(x \mid \beta, \delta)\} = 1598$ for $\delta = 2.1 \times 10^{-3}$ and $\hat{\beta} = 7.9 \times 10^{3}$; the likelihood settles at a value 1595 with $\hat{\beta} = 8.2 \times 10^{3}$. This information is summarized in Table 1, along with an estimate of $\sigma^2$ using the mean-squared error of the MAP estimate.

These prior parameter estimates are close to those found previously by extensive trial and error. This agreement is encouraging and leads us to use this automatic procedure on other data sets.

The MAP reconstructions of the phantom core by using these estimated parameters are shown in Figs 1(f) and 2(f). These reconstructions are much closer to the true susceptibility profiles than those by using the median potential. In general all contexts are more clearly defined and in particular in Fig. 1 the three-level block on the left-hand side is now clearly resolved.


Fig. 3. Profile pseudolikelihood $\pi^*\{x \mid \hat{\beta}(\delta), \delta\}$ as a function of $\delta$ in (a) for phantom core I with corresponding values of $\hat{\beta}(\delta)$ in (b) (the pseudolikelihood has a maximum at $\delta = 1.4 \times 10^{-3}$; the corresponding estimate of $\beta$ is $\hat{\beta}(\delta) = 18 \times 10^{3}$)

Table 1. Maximum pseudolikelihood estimates of $\delta$ and $\beta$

Parameter                          Estimates for the following cores:
                                   Phantom core I     Phantom core II
$\beta$ ($\times 10^{3}$)          18                 7.9
$\delta$ ($\times 10^{-3}$)        1.4                2.1
$\sigma^2$ ($\times 10^{-10}$)     15.7               4.8

4.2. Application to real cores

The methods of parameter estimation and MAP estimation have thus far been demonstrated on data from phantom cores. To establish their status as reliable methods they are applied to the estimation of the susceptibility of real soil cores.

The cores were extracted from 'the Park', a mid-Iron-Age farmstead at Guiting Power in Gloucestershire, and were part of a controlled experiment to establish whether evidence of burning is discernible in earth strata. The experiment consisted of burning to the ground a wooden funeral pyre containing the corpse of a sheep and covering the burnt area with top-soil. Five cores were removed from the pyre region of the site, four from the main area of burning and one from the periphery. The analysis for the core from the periphery and one of the four cores from the main area will be described in detail and will be referred to as cores 1 and 2 respectively.

Each of the cores was passed through the detector equipment and the observed susceptibility digitally recorded; the data sets are shown in Figs 4(a) and 4(d); the site surface is at the left-hand side of the diagrams. For a complete comparison of the different methods the susceptibility is also estimated by using the Wiener filter, Figs 4(b) and 4(e). The estimated parameter values are listed in Table 2 and the resultant MAP estimates are shown in Figs 4(c) and 4(f).


Fig. 4. Estimation for the Guiting Power cores: the observed data for core 1 (a), in the periphery of the site, show a low background reading along the whole core; the Wiener filter estimate (b) suggests that there are three peaks of slightly higher susceptibility; the MAP estimate (c) shows these blocks more clearly; for core 2 (in the main area) the observed data (d) show high readings near the middle of the core; the Wiener filter estimate (e) more clearly shows this high peak, but detail elsewhere is unclear; the MAP estimate (f) shows a well-defined block in the middle, with constant susceptibility elsewhere

The Wiener filter, a method which is often used in the analysis of this type of archaeological data, again shows the general locations of distinct contexts; it does not, however, give a clear indication of the sharp boundaries expected between neighbouring contexts. These results are obviously less satisfactory than the MAP estimates which clearly show the distinct contexts.

Pyre core 1, which was extracted from the periphery of the burnt region, is a good indicator of the contexts that were present in the site before the experiment. Core 2 clearly shows an additional stratum of enhanced susceptibility, a result of the burning, on top of the 'background' susceptibility. The changes in the susceptibility are sharply defined and the beginning and end of the core are distinguishable, unlike the Wiener filter estimates where the Gibbs phenomenon (Bracewell, 1986; Champeney, 1987) disguises its exact position.

The lack of contrast in the susceptibility of core 1 has an effect on the estimated parameters; $\beta$, $\delta$ and $\sigma^2$ are all smaller than the values estimated for core 2. The signal recorded is reduced for this core and consequently the signal-to-noise ratio used in the Wiener filter is larger.

In conclusion the MAP estimates of the pyre cores provide an excellent estimate of the archaeological contexts and clearly detect the effect of burning, a certain indication of past settlement.

5. Discussion

The reconstruction of the susceptibility profile by Fourier techniques is hindered by the Gibbs phenomenon or ringing. This is an overshoot, occurring at both sides of a discontinuity or sharp change in the estimated profile, which arises when the Fourier approximation attempts to converge to a discontinuous function. This effect cannot be eradicated by increasing the number of terms used in the Fourier expansion. The Wiener filter can be used to reduce the high frequency ringing; the locations of clearly separated blocks of susceptibility are then predicted quite well, but their shapes are smooth with no sharp edges. An alternative solution to the problem of ringing would be to consider the sum of other types of functions, e.g. wavelets (Kay, 1994).
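The persistence of the overshoot is easily demonstrated with a small numerical experiment. The sketch below uses the classical Fourier series of a unit square wave as a hypothetical stand-in for a sharp context boundary; it is not drawn from the core data.

```python
import numpy as np

def partial_sum(t, n_terms):
    """Fourier partial sum of a square wave taking the values -1 and +1."""
    k = np.arange(1, 2 * n_terms, 2)             # odd harmonics 1, 3, ..., 2n - 1
    return (4.0 / np.pi) * np.sum(np.sin(np.outer(t, k)) / k, axis=1)

t = np.linspace(0.0, np.pi, 2001)                # half a period, where the wave is +1
term_counts = (10, 50, 250)
overshoots = [partial_sum(t, n).max() - 1.0 for n in term_counts]

print(dict(zip(term_counts, overshoots)))
```

The overshoot stays near 0.18, i.e. roughly 9% of the jump of height 2, however many terms are included; adding terms only moves the peak closer to the discontinuity.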

The MAP estimates are clearly better than the Fourier-based methods, since both potentials produce reconstructions with sharper edges and flat tops. With the median potential, all the blocks have sloping sides rather than the sharp vertical edges that are expected by archaeologists, indicating that substantial changes in magnitude are made by a series of small steps rather than one large step. Consequently, closely spaced blocks are not fully resolved.

The implicit discontinuity potential was introduced to improve the sharpness of the edges and has proved to be very successful. The positions of the blocks are largely coincident with the truth, but there is still some 'filling in' between the blocks; in practice, however, a separation of the contexts is far more important than the absolute value of the susceptibility. If necessary, once the separate blocks are clearly distinguishable, the core can be dissected and the susceptibility of each block measured by other means. However, in limited simulation experiments a 'ramp-shaped' susceptibility profile was reconstructed as a step function. This means that the implicit discontinuity potential has its limitations and should not be used in situations where smoothly varying profiles are expected.

Table 2. Parameter estimates for the Guiting Power pyre cores

Parameter                          Estimates for the following cores:
                                   Pyre core 1        Pyre core 2
$\beta$ ($\times 10^{4}$)          6.8                9.6
$\delta$ ($\times 10^{-4}$)        2.2                6.5
$\sigma^2$ ($\times 10^{-11}$)     4.3                9.3

The drawbacks of using a Bayesian approach are the need to choose the prior distributionand the consequent introduction of extra prescriptive parameters. The median potential iswell de®ned, but an implicit discontinuity potential can be chosen from a wide range ofpossible forms. The essential feature is the concavity for small di�erences, which is in contrastwith the convex potentials recommended by others.

In several of the estimated pro®les, small blocks appear on the side of the main blocks.Since these rarely appear in the reconstruction of simulated data, their most likely cause isinaccurate modelling of the spread function. Non-linear e�ects are considered to beinsigni®cant at the low magnetizations that are involved in the measurements, well belowsaturation levels. Although it would be desirable to investigate the response of the measuringapparatus in more detail, particularly to establish precise knowledge of the spread function, itwould involve a great e�ort from the experimentalist in preparing phantom cores and makingthe necessary measurements.

A major limitation of EM-type algorithms is that it is not possible to estimate otherquantities of interest, such as standard errors and con®dence intervals. For the EM algorithmvarious researchers (Jamshidian and Jennrich (1997) and references therein) have exploredapproaches based on the observed information matrix by using asymptotic normalapproximations. These are often algebraically tedious, however, or apply to only specialcases. An alternative approach to overcome this limitation is Markov chain Monte Carlo(MCMC) methods. It would also be possible to include all prior parameters in the estimationprocess as well as computing con®dence intervals. Typically MCMC approaches need carefulmonitoring to ensure proper convergence to the posterior distribution. Either of theseapproaches would require more computing e�ort than could be conveniently deployed by®eld archaeologists.

The approach that has been presented here can be readily applied to other techniques in archaeological geophysics, and preliminary investigations using area magnetometer surveys of archaeological sites are reported elsewhere (Allum et al., 1995, 1996). In addition, the method developed provides a general framework for reconstruction where sharp edges are believed to be important, and it should find a wide range of applications.

Acknowledgements

We thank Arnold Aspinall and Armin Schmidt of the Department of Archaeological Sciences, University of Bradford, for alerting us to the original problem, for technical advice and for supplying the data. We also thank Alastair Marshall of Guiting Manor Amenity Trust, Cheltenham, Gloucestershire, for allowing us to use the site data. We are grateful to the referees, Associate Editor and Joint Editor for invaluable comments on an earlier version of this paper. During this work Gayle Allum was supported by an Engineering and Physical Sciences Research Council research studentship.


References

Allum, G. T. (1997) A statistical approach to inverse data problems in archaeological geophysics. PhD Thesis. Department of Statistics, University of Leeds, Leeds.

Allum, G. T., Aykroyd, R. G. and Haigh, J. G. B. (1995) A new statistical approach to reconstruction from area magnetometry data. Archaeol. Prospectn, 2, 197–205.

Allum, G. T., Aykroyd, R. G. and Haigh, J. G. B. (1996) Restoration of magnetometry data using inverse-data methods. Analec. Praehist. Leid., 28, 111–119.

Aykroyd, R. G. and Green, P. J. (1991) Global and local priors, and the location of lesions using gamma-camera imagery. Phil. Trans. R. Soc. Lond. A, 337, 323–342.

Besag, J. (1975) Statistical analysis of non-lattice data. Statistician, 24, 179–195.

Besag, J. (1983) Discussion of paper by P. Switzer. Bull. Int. Statist. Inst., 50, 422–425.

Besag, J. (1989) Towards Bayesian image analysis. J. Appl. Statist., 16, 395–407.

Besag, J., Green, P. J., Higdon, D. and Mengersen, K. (1995) Bayesian computation and stochastic systems. Statist. Sci., 91, 883–904.

Bracewell, R. N. (1986) The Fourier Transform and Its Applications, 2nd edn (revised). New York: McGraw-Hill.

Champeney, D. C. (1987) A Handbook of Fourier Theorems. Cambridge: Cambridge University Press.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc. B, 39, 1–38.

Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattn Anal. Mach. Intell., 6, 721–741.

Geman, S. and McClure, D. E. (1987) Statistical methods for tomographic image reconstruction. Bull. Int. Statist. Inst., 52, 5–21.

Gonzalez, R. C. and Wood, R. E. (1992) Digital Image Processing. Reading: Addison-Wesley.

Green, P. J. (1986) Discussion on 'On the statistical analysis of dirty pictures' (by J. Besag). J. R. Statist. Soc. B, 48, 284–285.

Green, P. J. (1990a) Bayesian reconstruction from emission tomography data using a modified EM algorithm. IEEE Trans. Med. Imgng, 9, 84–93.

Green, P. J. (1990b) On use of the EM algorithm for penalized likelihood estimation. J. R. Statist. Soc. B, 52, 443–452.

Grenander, U. (1993) General Pattern Theory. Oxford: Oxford University Press.

Jain, A. K. (1989) Fundamentals of Digital Image Processing. Englewood Cliffs: Prentice Hall.

Jamshidian, M. and Jennrich, R. I. (1997) Acceleration of the EM algorithm by quasi-Newton methods. J. R. Statist. Soc. B, 59, 569–587.

Kay, J. (1994) Wavelets. In Statistics and Images (ed. K. V. Mardia), vol. 2, pp. 209–224. Oxford: Carfax.

Le Borgne, E. (1955) Abnormal magnetic susceptibility of the topsoil. Ann. Geophys., 11, 399–419.

Linington, R. E. (1964) The use of simplified anomalies in magnetic surveying. Archaeometry, 7, 3–13.

Lucy, L. B. (1974) An iterative technique for the rectification of observed distributions. Astronom. J., 79, 745–765.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953) Equations of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1092.

Mullins, C. E. (1977) Magnetic susceptibility of the soil and its significance in soil science – a review. J. Soil Sci., 28, 223–246.

Parasnis, D. S. (1986) Principles of Applied Geophysics, 4th edn. New York: Chapman and Hall.

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1992) Numerical Recipes in FORTRAN: the Art of Scientific Computing, 2nd edn. Cambridge: Cambridge University Press.

Richardson, W. H. (1972) Bayesian-based iterative method of image restoration. J. Opt. Soc. Am., 62, 55–59.

Tite, M. S. and Mullins, C. E. (1971) Enhancement of the magnetic susceptibility of soils on archaeological sites. Archaeometry, 13, 209–219.

