Non-Counting Errors in 14C Dating



Page 1: NON-COUNTING ERRORS IN 14C DATING

NON-COUNTING ERRORS IN 14C DATING

Richard Pardi* and Leslie Marcus†

*Queens College Radiocarbon Laboratory

†Department of Biology

Queens College, CUNY

Flushing, New York 11367

INTRODUCTION

The conventional method of reporting laboratory error for 14C dates consists of the summation of “counting” (Poisson) errors in background, standard, and sample determinations. With good reason, few labs go further in attempting to incorporate into their results errors resulting from other than the random process of radioactive decay. Currie,² Hultin,³ Neustupny,⁵ and Vertes¹⁴ cautioned, but without quantitative estimates, that quoted standard errors may underestimate actual measurement errors.
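As a rough illustration of what summing the three counting errors involves, the sketch below propagates Poisson errors from background, standard, and sample counts into an activity ratio and an age. The function name, the input layout, and the example counts are hypothetical. The 8033-year Libby mean life is the conventional value, and corrections such as fractionation or standard normalization are deliberately left out, so this should be read as a simplified sketch rather than any particular lab's procedure.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; conventional value (Libby half-life 5568 / ln 2)


def counting_error(sample, standard, background):
    """Propagate Poisson counting errors into an activity ratio and an age.

    Each argument is a (counts, minutes) pair. This is a simplified,
    hypothetical layout: no fractionation or standard normalization.
    """
    def rate(counts, minutes):
        # Poisson: the standard deviation of a count N is sqrt(N).
        return counts / minutes, math.sqrt(counts) / minutes

    s, sig_s = rate(*sample)
    m, sig_m = rate(*standard)
    b, sig_b = rate(*background)

    ratio = (s - b) / (m - b)  # net sample activity / net standard activity
    # First-order propagation; the shared background enters both net rates.
    var_ratio = (sig_s**2 + (ratio * sig_m)**2
                 + ((1.0 - ratio) * sig_b)**2) / (m - b)**2
    sig_ratio = math.sqrt(var_ratio)

    age = -LIBBY_MEAN_LIFE * math.log(ratio)
    sig_age = LIBBY_MEAN_LIFE * sig_ratio / ratio
    return ratio, sig_ratio, age, sig_age


# Made-up counts: 1000-minute runs of sample, standard, and background.
ratio, sig_ratio, age, sig_age = counting_error(
    sample=(4000, 1000.0), standard=(9000, 1000.0), background=(1000, 1000.0))
print(f"ratio = {ratio:.3f} +/- {sig_ratio:.4f}; age = {age:.0f} +/- {sig_age:.0f} yr")
```

The resulting error reflects radioactive decay statistics only; none of the non-counting sources discussed in this paper enter it.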

Although it is not surprising that 14C analysts avoid including such errors as the uncertainty in half-life, sample-to-sample fractionation, contamination, etc., it is, nevertheless, worthwhile to attempt to evaluate both the source and magnitude of errors that cannot be related to counting statistics alone.

In undertaking this study, we were faced with certain limitations that may reflect on the scope of our conclusions. First, as we will discuss below, our procedure consisted of observing various categories of repeat analysis, with many labs included in each category; in fact, as many as eight labs may have analyzed one sample. However, not every lab performs (or at least reports performing) duplicate or check analyses with equal frequency. Certain labs perform a great many recounts and duplicates but participate in interlab checks less often, or vice versa. Hence, not all 14C labs are included equally across the various categories. Second, we have relied only on published dates. Some of the worst intercomparisons and interlab blunders are never published; also, some labs do not publish their results at all. Third, we compared measured differences with quoted counting errors, as has been done in previous statistical analyses of 14C dates (Clark,¹ Long and Rippeteau,⁴ Spaulding¹⁰). Not all labs calculate errors in an equivalent manner (Spaulding¹⁰). Although most labs state essentially “errors are the sum of background, standard and sample counting errors,” an examination of the labs’ published record reveals different methods of estimating such errors. In some labs, errors of background, standard, and sample are calculated on the basis of actual counts observed. In others, the error of the sample only is calculated in this way, whereas the background and standard errors are, in fact, standard deviations of comparable period counts about a mean. Still other labs consistently recount their samples, standards, and backgrounds, and their errors are all standard deviations of comparable period counts about a mean. Still further complicating the picture are those labs that report no errors less than some minimum, or that calculate a pooled empirical error for samples within a given age range.

Whereas other studies of radiocarbon error (e.g., Currie,² Neustupny,⁵ Ralph and Michael,⁶ Ralph et al.,⁷ Renfrew and Clark,⁸ Stuiver and Suess,¹² Tauber,¹³ Wendland and Donley¹⁵) have been primarily concerned with the


Pardi & Marcus: Errors in 14C Dating 175

ultimate accuracy of the method, i.e., with the relationship 14C year/calendar year, this study is intended to reveal errors of accident and omission in the collection, preparation, counting, and reporting of ages. The former studies point out sources of error that are ancillary to the method (such as the Suess effect); this study points out sources of error over which the user or analyst can exercise some degree of control.

EXPERIMENTAL PROCEDURES

Samples were assigned to one of four categories with the hope of distinguishing between various sources of error: (1) recounted gas or liquid, (2) intralab analysis of identical aliquots, (3) interlab analysis of identical aliquots, and (4) archaeologic or geologic, cultural or stratigraphic precise equivalents, both intra- and interlab. In general, the results of analysis of the first two categories should give an indication of “true” laboratory precision, whereas analysis of the last two should bear on overall laboratory accuracy. Theoretically, all the above categories reflecting sources of error are additive, with errors in category 2, for example, including errors in category 1, etc. Common to all categories is the error of improperly estimating analytical uncertainty through counting statistics. Hence, we would expect that observed error relative to quoted error would either remain constant (at 1, if counting error were the only error) or increase going from category 1 to 4.

More specifically, category 1 will reflect errors that result primarily from counter malfunctions and filling variability or accidents but may include variations in counting medium contaminants such as 222Rn; category 2, intralab errors such as changing pretreatments, sample preparation errors (such as fractionation), sample mixing or contamination; category 3, interlab errors such as inconsistent and inaccurate standards calibration, system and preparation variability, and sample inhomogeneity; and category 4, field or packaging contamination, differences in carbon-bearing phases (e.g., between charcoal and shell), as well as the more subjective errors resulting from poor estimates of archaeologic or geologic equivalence. Stuckenrath¹¹ discusses errors of the type that would be reflected in category 4.

The data presented here are compiled from published date lists and other publications and comprise 630 finite age dates in 260 series of replicates. All dates and errors were converted to δ form so that errors would be of approximately the same magnitude, to reduce numbers to manageable size, and to render errors symmetrical. Results from about 50 out of all present and defunct 14C labs are included in this report. Tree-ring replicates and finite geophysical and hydrologic measurements were excluded, since we judged that they are not representative of routine 14C analysis. An examination of the accuracy of tree-ring dates by Clark,¹ performed in a manner similar to this study, found that the variability of tree-ring dates exceeded the predicted laboratory error on the basis of counting statistics alone.
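The conversion to δ form can be sketched as follows. The paper does not spell out its formula, so the code assumes one plausible reading: δ is the per-mil deviation of sample activity from the modern standard, obtained from the conventional age with the 8033-year Libby mean life. The function name and the example ages are hypothetical.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; conventional value, assumed here


def age_to_delta(age_bp, sigma_age):
    """Convert a conventional age and its 1-sigma error to per-mil delta form.

    Assumes delta = (A/A0 - 1) * 1000 with A/A0 = exp(-age / 8033); the paper
    does not give its own formula, so this is only one plausible reading.
    """
    activity_ratio = math.exp(-age_bp / LIBBY_MEAN_LIFE)
    delta = (activity_ratio - 1.0) * 1000.0
    # First-order propagation: |d(delta)/d(age)| = 1000 * (A/A0) / 8033.
    sigma_delta = 1000.0 * activity_ratio * sigma_age / LIBBY_MEAN_LIFE
    return delta, sigma_delta


for age, err in [(1000.0, 80.0), (10000.0, 150.0), (30000.0, 1000.0)]:
    d, s = age_to_delta(age, err)
    print(f"{age:>7.0f} +/- {err:<5.0f} yr  ->  delta = {d:8.1f} +/- {s:.1f} per mil")
```

Under this assumption, age errors that differ by more than a factor of ten map to δ errors of comparable size, and an asymmetric age error on an old sample becomes a symmetric error in δ.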

Using the quoted laboratory 1σ error, χ² values were calculated for each of the 260 replicate runs as follows:

\[ \chi^2 = \sum_{i=1}^{n} \frac{(\delta_i - \bar{\delta})^2}{\sigma_i^2} \]

where $\delta_i$ is a measured age in δ form, $\bar{\delta}$ is the replicate series average of n ages, and $\sigma_i$ is the quoted lab error. From these χ² values, cumulative probabilities were


176 Annals New York Academy of Sciences

assigned for appropriate degrees of freedom. A probability of 1.0000, where measured differences between ages were much greater than quoted error, is assigned to cumulative probabilities larger than 0.9999. Samples of identical age (χ² = 0) would have probabilities of 0.0000. The sum of χ² values over a series is also a χ² under the hypothesis of homogeneous errors with degrees of freedom equal to the sum of the degrees of freedom for each series in a category. These sums were computed for each category and are presented in TABLE 1, where they are given as Σχ²/Σdf.
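The per-series calculation described above can be sketched in a few lines. The code assumes the usual form of the statistic, the sum of squared deviations from the series mean divided by the squared quoted errors, with n - 1 degrees of freedom; the function names and the example pair of values are hypothetical. The cumulative probability is computed with a plain series expansion so that the block needs only the standard library.

```python
import math


def chi2_cdf(x, df):
    """Cumulative chi-square probability via the incomplete-gamma series.

    Pure-Python sketch meant for the small df of replicate series.
    """
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    if z > 300.0:  # far tail for small df; avoids overflow in the series
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16 and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return min(1.0, math.exp(log_p))


def series_probability(deltas, sigmas):
    """Chi-square, df, and cumulative probability for one replicate series.

    Assumes chi2 = sum((delta_i - mean)^2 / sigma_i^2) with df = n - 1.
    """
    n = len(deltas)
    mean = sum(deltas) / n
    chi2 = sum(((d - mean) / s) ** 2 for d, s in zip(deltas, sigmas))
    p = chi2_cdf(chi2, n - 1)
    if p > 0.9999:  # the text assigns 1.0000 above this threshold
        p = 1.0
    return chi2, n - 1, p


# Hypothetical replicate pair in delta form with equal quoted errors.
chi2, df, p = series_probability([10.0, 14.0], [2.0, 2.0])
print(chi2, df, round(p, 4))
```

A pair of replicates that differ by exactly twice their common quoted error gives χ² = 2 on one degree of freedom, which is an unremarkable probability; much larger differences push the probability toward 1.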

TABLE 1

SUMMARY OF STATISTICAL DATA

Category   No. of Series in Category   Σ df   Σχ²/Σ df   √(Σχ²/Σ df)

1          41                           43     0.544      0.738
2          70                           80     5.585      2.363
3          73                          121    11.250      3.354
4          76                          129    21.605      4.648
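As a consistency check on TABLE 1, the short sketch below recomputes the last column as the square root of Σχ²/Σdf and back-calculates the implied Σχ² for each category. The tabulated numbers are taken from the table; the helper name `pooled_ratio` and the back-calculated sums are illustrative and do not appear in the paper.

```python
import math

# (category, number of series, sum of df, sum(chi2) / sum(df)) from TABLE 1.
TABLE_1 = [
    (1, 41, 43, 0.544),
    (2, 70, 80, 5.585),
    (3, 73, 121, 11.250),
    (4, 76, 129, 21.605),
]


def pooled_ratio(chi2_values, dfs):
    """Pool per-series chi-squares: returns sum(chi2) / sum(df)."""
    return sum(chi2_values) / sum(dfs)


roots = {}
for category, n_series, sum_df, ratio in TABLE_1:
    roots[category] = math.sqrt(ratio)
    implied_sum_chi2 = ratio * sum_df  # back-calculated, not a tabulated value
    print(f"category {category}: sum chi2 ~ {implied_sum_chi2:7.1f}, "
          f"sqrt(ratio) = {roots[category]:.3f}")
```

If quoted errors fully described the scatter, the pooled ratio and its square root would both be close to 1 in every category.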

RESULTS AND CONCLUSIONS

If we accept the accuracy of a laboratory’s quoted errors, then we would expect an even distribution of probabilities as calculated above for all replicate runs within each category, subject only to sampling error due to series size. In FIGURES 1-5, the histograms would then have an equal distribution of members in each interval. The hypothesis of a uniform distribution in overall results and in each category may be tested with the Kolmogorov-Smirnov goodness of fit test.⁹ Category 1 fits best, having a fairly even distribution of probabilities over the observed range, as can be seen in FIGURE 2. From FIGURES 1, 3, 4, and 5, it can be seen that observed cumulative probabilities for the overall data set (FIGURE 1), as well as those of categories 2, 3, and 4 (FIGURES 3-5), deviate significantly from empirical probabilities, i.e., the distributions are not uniform, with many probabilities near 1.000 (TABLE 1).

FIGURE 1. Plot of overall data.
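A test of uniformity of the kind named above can be sketched as follows. This is a generic one-sample Kolmogorov-Smirnov statistic against a uniform distribution on the interval 0 to 1, not necessarily the exact procedure the authors followed; the function names and both input lists are hypothetical, and the critical value 1.36/√n is the usual large-sample approximation at the 5 percent level.

```python
import math


def ks_uniform(probabilities):
    """One-sample Kolmogorov-Smirnov statistic D against Uniform(0, 1)."""
    xs = sorted(probabilities)
    n = len(xs)
    # Largest gap between the empirical step function and the line F(x) = x,
    # checked just after and just before each observed value.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def rejects_uniform(probabilities):
    """Compare D with the large-sample 5% critical value 1.36 / sqrt(n)."""
    n = len(probabilities)
    return ks_uniform(probabilities) > 1.36 / math.sqrt(n)


# Hypothetical inputs: an evenly spread set and one piled up near 1.
even = [(i + 0.5) / 40 for i in range(40)]
piled = [0.5] * 10 + [0.9999] * 30
print(ks_uniform(even), rejects_uniform(even))
print(ks_uniform(piled), rejects_uniform(piled))
```

A set of probabilities concentrated near 1, as described for categories 2 through 4, yields a large D and is rejected as uniform, whereas an evenly spread set is not.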


[Histogram: number of samples in each cumulative-probability interval, .00 to 1.00.]

FIGURE 2. Plot of category one data.

[Histogram: number of samples in each cumulative-probability interval, .00 to 1.00.]

FIGURE 3. Plot of category two data.


The statistic √(Σχ²/Σdf) (TABLE 1), which would be 1.000 if the probability distribution were uniform, permits an evaluation of the relative magnitude of various sources of error. First, since this statistic is less than 1 for category 1, 14C labs appear to be doing better at counting their samples than the Poisson statistics would indicate. This may reflect a sorting of published data, in that erroneous results on the same counting medium are likely to be rejected and go unreported. Note also that this category has the smallest sample size and that it was derived from dates of only three different labs. This may explain also why

[Histogram: number of samples in each cumulative-probability interval, .00 to 1.00.]

FIGURE 4. Plot of category three data.

[Histogram: number of samples in each cumulative-probability interval, .00 to 1.00.]

FIGURE 5. Plot of category four data.


both ends of the probability distribution, less than 10 percent and greater than 90 percent, are absent. Second, a repeat preparation and analysis in the same lab (category 2) results in errors that are on the order of twice as great as quoted errors. Hence, it would appear that users must be cautious in applying 1σ counting errors when comparing intralab replicates. Third, heterogeneity in categories 3 and 4 appears to be much greater, perhaps more than four times the lab statistical counting error. Hence, more care must be taken when comparing expected coeval results between labs than within labs. It must be emphasized that the statistic, √(Σχ²/Σdf), is not an estimate of the actual increased error but may be used only in a relative sense. In addition, it is sensitive to outliers, though we can see from FIGURES 1-5 that it is qualitatively meaningful.

We have not yet examined the relationship between observed error and counting medium (C, CO2, CH4, C2H2, C6H6), nor have we looked for any possible relationship between extreme outliers and particular labs, since we feel that the data base must be expanded severalfold above the present 260 series before such comparisons can be made accurately.

Subject to the limitations stated above, this study demonstrates the advisability of reporting dates including lab identifications. In addition, when combined with other studies on the ultimate accuracy of the 14C method, our results strongly support the recommendation that 14C dates should not be reported without reference to material dated or extraordinary procedures made in sample preparation or counting.

SUMMARY

A preliminary examination of 630 individual measurements in 260 series of replicate 14C analysis indicates that quoted errors do not reflect true measurement accuracy. In general, sources of error other than counting error appear to increase as samples are subjected to more potential sources of accident and omission. Some labs appear to underestimate counting errors in their quoted errors, whereas identical samples analyzed in different labs show greater heterogeneity than identical samples analyzed in the same lab. The latter, in turn, have errors significantly greater than quoted errors. Field estimates of age equivalence are also found to be less accurate than interlab analysis of identical samples.

ACKNOWLEDGMENTS

We would like to thank M. Smith for his drafting and D. Cosmatos for his assistance with the data processing.

REFERENCES

1. CLARK, R. M. 1975. A calibration curve for radiocarbon dates. Antiquity 44: 251-266.

2. CURRIE, L. A. 1972. The evaluation of radiocarbon measurements and inherent statistical limitations in age resolution. Proc. 8th Intern. Conf. Radiocarbon Dating. Lower Hutt City. 2: 598-611.

3. HULTIN, E. 1972. The accuracy of radiocarbon dating. Ethologiska Studier 32: 185-196.

4. LONG, A. & B. RIPPETEAU. 1974. Testing contemporaneity and averaging radiocarbon dates. Amer. Antiquity 39: 205-215.

5. NEUSTUPNY, E. 1970. The accuracy of radiocarbon dating. In Radiocarbon Variations and Absolute Chronology. I. U. Olsson, Ed.: 23-34. John Wiley & Sons, Inc. New York, N.Y.

6. RALPH, E. K. & H. N. MICHAEL. 1967. Problems of the radiocarbon calendar. Archaeometry 10: 3-11.

7. RALPH, E. K., H. N. MICHAEL & M. C. HAN. 1973. Radiocarbon dates and reality. MASCA Newsletter 9(1): 1-20.

8. RENFREW, C. & R. M. CLARK. 1974. Problems of the radiocarbon calendar and its calibration. Archaeometry 16: 5-18.

9. SOKAL, R. R. & F. J. ROHLF. 1969. Biometry: 1-776. W. H. Freeman and Co., Publishers. San Francisco, Calif.

10. SPAULDING, A. C. 1958. The significance of differences in carbon-14 dates. Amer. Antiquity 23: 309-311.

11. STUCKENRATH, R., Jr. 1965. Carbon-14 and the unwary archeologist. In Radiocarbon and Tritium Dating: 304-318. Univ. Washington Press. Pullman, Wash.

12. STUIVER, M. & H. E. SUESS. 1966. On the relationship between radiocarbon dates and true sample ages. Radiocarbon 8: 534-540.

13. TAUBER, H. 1958. Difficulties in the application of C-14 results in archaeology. Archaeologia Austriaca 24: 59-69.

14. VERTES, L. 1965. A comment on the C-14 date of eastern middle European Paleolithic, with suggestions for future standardization of radiocarbon age determinations. In Radiocarbon and Tritium Dating: 210-233. Pullman, Wash.

15. WENDLAND, E. H. & D. L. DONLEY. 1971. Radiocarbon-calendar age relationship. Earth Planet. Sci. Lett. 2: 135-139.