QUANTITATIVE METHODS IN PALAEOECOLOGY AND PALAEOCLIMATOLOGY
Class 3
Analysis of Stratigraphical Data
Espegrend August 2008
CONTENTS
Introduction to temporal stratigraphical data
Single sequence
Partitioning or zonation
Sequence splitting
Rate-of-change analysis
Gradient analysis and summarisation
Analogue matching
Relationships between two or more sets of variables in same sequence
Two or more sequences
Sequence comparison and correlation
Combined scaling
Difference diagrams
Mapping
Locally weighted regression (LOWESS)
INQUA Commission for the Study of the Holocene
Summary
INTRODUCTION
In ecology, analysis of quadrats, lakes, streams, etc. assumes no autocorrelation, namely that the values of a variable at some point in space cannot be predicted from known values at other sampling points.
PALAEOECOLOGY – fixed sample order in time.
strong autocorrelation – temporal autocorrelation
STRATIGRAPHICAL DATA
biostratigraphic, lithostratigraphic, geochemical, geophysical, morphometric, isotopic
multivariate
continuous or discontinuous time series
ordering very important – display, partitioning, trends, interpretation
Numerical Techniques in Palaeoecology
Range of numerical or data-analytical techniques available for the summarisation, synthesis, and interpretation of palaeoecological data
Main purposes
1. Detect major patterns in complex data
2. Summarise data in terms of fossil zones, major trends, and groups of fossil types that covary
3. Identify ‘hidden’ features of data such as statistically significant splits in individual curves, rates of change, etc
4. Interpretation of data in terms of modern analogues (vegetation types) and past environment (e.g. climate)
5. Aid comparison and correlation of sequences from 1 or more sites
6. Display fossil data as maps to explore spatial patterns
Numerical techniques are a useful part of the palaeoecologist’s ‘tool-kit’
SINGLE SEQUENCE
Zonation or Partitioning of Stratigraphical Data
Zone: “sediment body with a broadly similar composition that differs from underlying and overlying sediment bodies in the kind and/or amount of its composition”.
Useful for:
1) description
2) discussion and interpretation
3) comparisons in time and space
CONSTRAINED CLASSIFICATIONS
1) Constrained agglomerative procedures CONSLINK
CONISS
2) Constrained binary divisive procedures
Partition into g groups by placing g – 1 boundaries.
Number of possibilities: $\binom{n-1}{g-1}$, i.e. $n - 1$ for $g = 2$
Compared with $2^{n-1} - 1$ (for $g = 2$) in the non-constrained situation.
Criteria – within-group sum-of-squares or variance SPLITLSQ
– within-group information SPLITINF
$\sum_{i=1}^{n} \sum_{k=1}^{m} p_{ik} \log \left( p_{ik} / q_{ik} \right)$
3) Constrained optimal divisive analysis OPTIMAL
[Schematic: 2-group, 3-group, and 4-group partitions of the sequence, with groups labelled n1, n2, n3]
4) Variable barriers approach BARRIER
All methods in one program: ZONE
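The constrained binary divisive idea can be sketched in a few lines. This is only an illustration of the least-squares criterion on hypothetical toy data, not the SPLITLSQ program itself; the function names `within_ss` and `best_binary_split` are invented for the example. Because the samples stay in stratigraphical order, only the n – 1 possible boundary positions need to be tried.

```python
def within_ss(rows):
    """Sum of squared deviations from the column means of a block of samples."""
    n = len(rows)
    if n == 0:
        return 0.0
    m = len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    return sum((r[k] - means[k]) ** 2 for r in rows for k in range(m))


def best_binary_split(rows):
    """Try all n - 1 boundaries; return (boundary index, total within-group SS)."""
    best = None
    for b in range(1, len(rows)):
        ss = within_ss(rows[:b]) + within_ss(rows[b:])
        if best is None or ss < best[1]:
            best = (b, ss)
    return best


# Hypothetical toy sequence in stratigraphical order: two taxa (proportions),
# with an obvious compositional change after the third sample.
toy = [[0.8, 0.2], [0.7, 0.3], [0.8, 0.2], [0.2, 0.8], [0.3, 0.7]]
boundary, ss = best_binary_split(toy)
print("boundary after sample", boundary, "within-group SS", round(ss, 4))
```

Applying `best_binary_split` again to each resulting group would give a binary divisive hierarchy; an optimal (non-hierarchical) partition would instead search all placements of g – 1 boundaries at once.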
Pollen diagram and numerical zonation analyses for the complete Abernethy Forest 1974 data set.
Birks & Gordon 1985
CONISS = constrained incremental sum-of-squares (= constrained Ward's minimum variance)
What about CONISS in TILIA?
TILIAZONE
OPTIMAL SUM OF SQUARES PARTITIONS OF THE ABERNETHY FOREST 1974 DATA
Number of groups g (zones)   Percentage of total sum-of-squares   Markers
 2                           59.3                                 15
 3                           28.4                                 15 32
 4                           18.9                                 15 33 41
 5                           14.7                                 15 33 41 45
 6                           10.6                                 15 32 34 41 45
 7                            8.1                                 15 26 32 34 41 45
 8                            5.8                                 8 15 26 32 34 41 45
 9                            4.7                                 8 15 24 29 32 34 41 45
10                            3.9                                 8 15 24 29 32 33 34 41 45
ZONE
K D Bennett (1996) Determination of the number of zones in a biostratigraphical sequence. New Phytologist 132, 155-170
Broken stick model
$\Pr_k = \frac{1}{n} \sum_{i=k}^{n} \frac{1}{i}$
BSTICK
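A minimal sketch of the broken-stick expectation, assuming the usual form in which the k-th largest of n pieces has expected proportion (1/n) Σ 1/i for i = k..n. The helper `zones_above_expectation` and the observed proportions are hypothetical, added only to show how observed variance proportions might be compared with the model; this is not the BSTICK program.

```python
def broken_stick(n):
    """Expected proportions of n pieces, largest first: (1/n) * sum_{i=k}^{n} 1/i."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def zones_above_expectation(observed):
    """Count leading observed proportions that exceed the broken-stick value."""
    expected = broken_stick(len(observed))
    count = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break
        count += 1
    return count


expected = broken_stick(4)
# Hypothetical proportions of variance accounted for by successive splits.
toy_observed = [0.60, 0.28, 0.08, 0.04]
print([round(e, 4) for e in expected])
print(zones_above_expectation(toy_observed))
```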
HOW MANY ZONES?
Pollen percentage diagram plotted against depth. Lithostratigraphic column is represented; symbols are based on Troels-Smith (1955).
Tzedakis 1994
Ioannina Basin
Variance accounted for by the nth zone as a proportion of the total variance (fluctuating curve) compared with values from a broken-stick model (smooth curve):
(a) randomized data set,
(b) original data set.
Zonation method: binary divisive using the information content statistic.
Data set: Ioannina.
Original data
Broken stick model
BSTICK
Bennett 1996
Sequence Splitting
Walker & Wilson 1978 J Biogeog 5, 1–21
Walker & Pittelkow 1981 J Biogeog 8, 37–51
SPLIT, SPLIT2
BOUND2
Need statistically ‘independent’ curves
Pollen influx (grains cm–2 year–1)
PCA or CA or DCA axes CANOCO
Aitchison log-ratio transformation LOGRATIO
$Z_{ik} = \log \left( p_{ik} / \bar{p}_i \right)$ where $\log \bar{p}_i = \frac{1}{m} \sum_{k=1}^{m} \log p_{ik}$
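The log-ratio transformation can be sketched for a single sample as follows. This is an illustration only, not the LOGRATIO program; the function name is invented, and the sketch assumes all proportions are strictly positive (zero values would need some replacement rule before taking logarithms).

```python
import math


def centred_log_ratio(p):
    """Z_k = log(p_k) - (1/m) * sum_k log(p_k) for one sample of m proportions."""
    mean_log = sum(math.log(x) for x in p) / len(p)
    return [math.log(x) - mean_log for x in p]


# Hypothetical sample with three taxa.
z = centred_log_ratio([0.5, 0.25, 0.25])
print([round(v, 4) for v in z])
```

The transformed values of a sample sum to zero, which removes the closure constraint of percentage data.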
Correlograms of sequence splits with charcoal, inorganic matter and total pollen influxes for three sections of the pollen record. The vertical scales give correlations; the horizontal scales give time lag in years (assuming a sampling interval of 50 years).
Rate Of Change Analysis
Amount of palynological compositional change per unit time.
Calculate dissimilarity between pollen assemblages of two adjacent samples and standardise to constant time unit, e.g. 250 14C years.
Jacobson & Grimm 1986 Ecology 67, 958-966
Grimm & Jacobson 1992 Climate Dynamics 6, 179-184
RATEPOL
POLSTACK (TILIA)
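A minimal sketch of the calculation, assuming chord distance as the dissimilarity coefficient and proportions as input. The function names and toy data are hypothetical, and no smoothing or interpolation to equal time steps is done here, although published analyses often apply both first.

```python
import math


def chord_distance(a, b):
    """Chord distance between two samples given as proportions."""
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b)))


def rate_of_change(samples, ages, unit=250.0):
    """Dissimilarity between adjacent samples, standardised to `unit` years."""
    rates = []
    for i in range(1, len(samples)):
        d = chord_distance(samples[i - 1], samples[i])
        dt = abs(ages[i] - ages[i - 1])
        rates.append(d / dt * unit)
    return rates


# Hypothetical sequence: complete turnover in the first 500 years, then no change.
toy_samples = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_ages = [0, 500, 750]
rates = rate_of_change(toy_samples, toy_ages)
print([round(r, 4) for r in rates])
```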
Graph of distance (number of standard deviations) moved every 100 yr in the first three dimensions of the ordination vs. age. Greater distance indicates greater change in pollen spectra in 100 yr.
Jacobson & Grimm 1986
MANY PROXIES, ONE SITE
Chord distance between samples at Solsø, Skånsø, and Kragsø, calculated on smoothed data with 35 taxa and interpolated at 400 year and 1,000 year intervals.
- fertile
- poor
- poor
ONE PROXY, MANY SITES
Pollen percentages from Loch Lang, Western Isles, plotted against age (radiocarbon years BP). Data from Bennett (1990).
Pollen percentages from Hockham Mere, eastern England, plotted against age (radiocarbon years BP). Data from Bennett (1983).
Comparison of Holocene rates of change at Loch Lang and Hockham Mere, with χ² - 2 dissimilarity coefficient on unsmoothed data, with a radiocarbon timescale.
High rates of change at Hockham Mere
Rate x5 that at Loch Lang
Data Summarisation by Ordination or Gradient Analysis of Single Sequence
Ordination methods: CA/DCA (joint plot) or PCA (biplot)
Sample summary: CA/DCA/PCA
Species arrangement: CCA
CA = correspondence analysis
DCA = detrended correspondence analysis
PCA = principal components analysis
CCA = canonical correspondence analysis
CANOCO
R
Biplot of the Kirchner Marsh data; C2 = 0.746. The lengths of the Picea and Quercus vectors have been scaled down relative to the other vectors. Stratigraphically neighbouring levels are joined by a line.
PCA Biplot 74.6%
Gordon 1982
Correspondence analysis representation of the Kirchner Marsh data; C2 = 0.620. Stratigraphically neighbouring levels are joined by a line.
CA Joint Plot 62%
Gordon 1982
Stratigraphical plot of sample scores on the first correspondence analysis axis (left) and of rarefaction estimate of richness (E(Sn)) (right) for Diss Mere, England. Major pollen-stratigraphical and cultural levels are also shown. The vertical axis is depth (cm). The scale for sample scores runs from –1.0 (left) to + 1.2 (right).
DCA axes 1 and 2 for a south Finnish pollen sequence plotted (right) in relation to time.
The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd axis contrasts taxa preferring wetter sites with those preferring drier sites
Haberle & Bennett 2004
Percentage pollen and spore diagram from Abernethy Forest, Inverness-shire. The percentages are plotted against time, the age of each sample having been estimated from the deposition time. Nomenclatural conventions follow Birks (1973a) unless stated in Appendix 1. The sediment lithology is indicated on the left side, using the symbols of Troels-Smith (1955). The pollen sum, P, includes all non-aquatic taxa. Aquatic taxa, pteridophytes, and algae are calculated on the basis of P + group as indicated.
Species arrangement
Pollen types re-arranged on the basis of the weighted average for depth
CANOCO
TRAN
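Re-arranging pollen types by their weighted average for depth can be sketched as below. The taxa, depths, and percentages are hypothetical toy values and the function name is invented; the idea is simply that each taxon's depth 'optimum' is the abundance-weighted mean of the sample depths, and taxa are then sorted by that value.

```python
def weighted_average_depth(depths, percentages):
    """Abundance-weighted mean depth of one taxon."""
    total = sum(percentages)
    return sum(d * p for d, p in zip(depths, percentages)) / total


# Hypothetical data: sample depths (cm) and percentages of three taxa.
depths = [10, 20, 30, 40]
pollen = {
    "Pinus": [40, 30, 10, 5],
    "Betula": [5, 10, 30, 40],
    "Corylus": [10, 30, 30, 10],
}

# Taxa ordered from those centred near the top to those centred near the base.
order = sorted(pollen, key=lambda taxon: weighted_average_depth(depths, pollen[taxon]))
print(order)
```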
Analogue Analysis
Modern training set – similar taxonomy
– similar sedimentary environment
Compare fossil sample 1 with all modern samples, use appropriate DC, find sample in modern set ‘most like’ (i.e. lowest DC) fossil sample 1, call it ‘closest analogue’, repeat for fossil sample 2, etc.
Overpeck et al. 1985 Quat Res 23, 87–108
ANALOG
MATCH
MAT
Compare fossil sample i with modern sample j.
Calculate similarity between i and j
Sij
Find modern sample with highest similarity: the 'ANALOGUE'
Repeat for all fossil samples
Repeat for all modern samples
? Evaluation
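The matching loop described above can be sketched as follows, assuming squared chord distance as the dissimilarity coefficient (so the closest analogue is the modern sample with the lowest value). The function names and the toy data are hypothetical; this is not the ANALOG, MATCH, or MAT program.

```python
import math


def squared_chord(a, b):
    """Squared chord distance between two samples given as proportions."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))


def closest_analogues(fossil, modern):
    """For each fossil sample return (index of closest modern sample, distance)."""
    result = []
    for f in fossil:
        dists = [squared_chord(f, m) for m in modern]
        j = min(range(len(modern)), key=dists.__getitem__)
        result.append((j, dists[j]))
    return result


# Hypothetical modern training set and fossil samples (two taxa, proportions).
modern = [[0.9, 0.1], [0.1, 0.9]]
fossil = [[0.8, 0.2], [0.1, 0.9]]
matches = closest_analogues(fossil, modern)
print(matches)
```

For evaluation, the same function can be run on the modern samples against each other (excluding each sample's match with itself) to see what range of distances separates samples of the same vegetation type.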
Dissimilarity coefficients, radiocarbon dates, pollen zones, and vegetation types represented by the top ten analogues from the Lake West Okoboji site.
Maps of squared chord distance values with modern samples at selected time intervals
Plots of the minimum squared chord-distance for each fossil spectrum at each of the eight sites.
A schematic representation of how fossil diatom zones/samples in a sediment core from an acidified lake can be compared numerically with modern surface sediment samples collected from potential modern analogue lakes. In this space-for-time model the vertical axis represents sedimentary diatom zones defined by depth and time; the horizontal axis represents spatially distributed modern analogue lakes and the dotted lines indicate good floristic matches (dij = <0.65), as defined by the mean squared Chi-squared estimate of dissimilarity (SCD, see text).
Flower et al. 1997
Comparison and Correlation Between Time Series
Two or more stratigraphical sets of variables from same sequence.
Are the temporal patterns similar?
(1) Separate ordinations
Oscillation log-likelihood G-test or χ² test
(2) Constrained ordinations
Pollen data - 3 or 4 ordination axes or major patterns of variation (Y)
Chemical data - 3 or 4 ordination axes (X)
Depth as a covariable
Does 'chemistry' explain or predict 'pollen'? i.e. is variance in Y well explained by X?
Lotter et al., 1992 J. Quat. Sci. Pollen 16O/18O (depth)
34% 16% 12%
79% 12% 4% 1%
Pollen, oxygen-isotope stratigraphy, and sediment composition of Aegelsee core AE-1 (after Wegmüller and Lotter 1990)
Pollen and oxygen-isotope stratigraphy of Gerzensee core G-III (after Eicher and Siegenthaler 1976)
Is there a statistically significant relationship between the pollen stratigraphy and the stable-isotope record?
Summary of the results from detrended correspondence analysis (DCA) of late-glacial pollen spectra from five sequences. The percentage variance represented by each DCA axis is listed.
Reduce pollen data to DCA axes. Use these then as ‘responses’
Site               No. of samples   No. of taxa   DCA axis 1   Axis 2   Axis 3   Axis 4
Aegelsee AE-1      100              26            57.2         12.0     2.3      1.4
Aegelsee AE-3       54              32            44.3          3.3     1.5      1.4
Gerzensee G-III     65              28            37.6          4.0     1.2      0.9
Faulenseemoos       62              25            44.1         18.8     5.0      3.8
Rotsee RL-250       44              23            38.2         13.3     3.1      2.3
Results of redundancy analysis and partial redundancy analysis permutation tests for the significance of axis 1 when oxygen isotopes and depth are predictor variables, when oxygen is the only predictor, and when oxygen isotopes are the predictor variable and depth is a covariable.
Pollen DCA axes
Site               Predictor variable:   Predictor variable: δ18O   Predictor variable:   Number of response
                   δ18O and depth        Covariable: depth          δ18O                  variables (DCA axes)
Aegelsee AE-1      0.01a                 0.01a                      0.02a                 2
Aegelsee AE-3      0.01a                 0.16                       0.20                  1
Gerzensee G-III    0.01a                 0.46                       0.57                  1
Faulenseemoos      0.01a                 0.01a                      0.01a                 3
Rotsee RL-250      0.01a                 0.21                       0.08                  2
a Significant at p < 0.05
Lotter et al. 1992
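The logic of testing a predictor with depth as a covariable can be sketched in a much simplified form: one response axis, one predictor, and a linear depth effect removed from both before a permutation test. This is an illustrative analogue only, not the redundancy analysis used in the study. The function names and toy data are hypothetical, the test statistic is the squared partial correlation, and the permutations are unrestricted, which is a simplifying assumption for temporally ordered data.

```python
import random


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def r_squared(u, v):
    """Squared correlation between two zero-mean residual vectors."""
    suv = sum(a * b for a, b in zip(u, v))
    return suv * suv / (sum(a * a for a in u) * sum(b * b for b in v))


def partial_permutation_test(response, predictor, covariable, n_perm=199, seed=1):
    """Return (observed partial r-squared, permutation p-value)."""
    rng = random.Random(seed)
    ry = residuals(response, covariable)
    rx = residuals(predictor, covariable)
    observed = r_squared(ry, rx)
    hits = 0
    for _ in range(n_perm):
        shuffled = rx[:]
        rng.shuffle(shuffled)
        if r_squared(ry, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: the response depends on the predictor and on depth.
depth = list(range(20))
isotope = [((7 * i) % 11) / 10 for i in range(20)]
axis1 = [3 * x + 0.5 * d for x, d in zip(isotope, depth)]
obs, p = partial_permutation_test(axis1, isotope, depth)
print(round(obs, 4), p)
```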
ANALYSIS OF TWO OR MORE SEQUENCES
Regional zones, description of common features, interpretation, detection of unique features.
Sequence comparison and correlation.
Sequence slotting SLOTSEQ, FITSEQ, CONSSLOT
Combined scaling of two or more sequences. CANOCO
Difference diagrams
Mapping procedures
Slotting of the sequences S1 (A1, A2, ..., A10) and S2 (B1, B2, ..., B7), illustrating the contributions to the measure of discordance (S1, S2) and the 'length' of the sequences, (S1, S2).
The results of sequence-slotting of the Wolf Creek and Horseshoe Lake pollen sequences (ψ = 2.095). Radiocarbon dates for the pollen zone boundaries are also given, expressed as radiocarbon years before present (BP).
SLOTSEQ
Birks & Gordon 1985
Sequence Comparison and Correlation
Comparison of oxygen-isotope records from Swiss lakes Aegelsee (AE-3), Faulenseemoos (FSM) and Gerzensee (G-III) with the Greenland Dye 3 record (Dansgaard et al. 1982). LST marks the position of the Laacher See Tephra (11,000 yr BP). Letters and numbers mark the position of synchronous events (for details see text).
AE-3 FSM G III Dye 3
Psi values for pair-wise sequence slotting of the stable-isotope stratigraphy at five Swiss late-glacial sites and the Dye 3 site in Greenland. Values above the diagonal are constrained slotting, using the three major shifts shown in previous figure; values below the diagonal are for sequence slotting in the absence of any external constraints. The mean δ18O and standard deviation for each sequence is also listed.
CONSLOXY
Lotter et al. 1992
Fugla Ness, Shetland
Combined Scaling or Ordinations
Pollen diagram from Sel Ayre showing the frequencies of all determinable and indeterminable pollen and spores expressed as percentages of total pollen and spores (P).
Abbreviations: undiff. = undifferentiated, indet = indeterminable.
The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd axis contrasts taxa preferring wetter sites with those preferring drier sites
Haberle & Bennett, 2004
Comparison of Bjärsjöholmssjön and Färskesjön using principal component analysis. The mean scores of the local pollen zones and the ranges of the sample scores in each zone are plotted on the first and second principal components, and are joined up in stratigraphic order. The Blekinge regional pollen assemblage zones are also shown.
Birks & Berglund 1979
Comparison of Färskesjön and Lösensjön using principal component analysis. The mean scores of the local pollen zones and the ranges of the sample scores in each zone are plotted on the first and second principal components, and are joined up in stratigraphic order. The regional pollen assemblage zones are also shown.
Pollen percentage diagram of selected taxa plotted against depth. Lithostratigraphic symbols are based on Troels-Smith (1955). For correlations and ages see Tzedakis (1993, 1994).
Tzedakis & Bennett 1995
5e
7c
9c
11a + b + c
Pollen percentage diagrams of selected arboreal taxa of the Metsovon, Zista, Pamvotis and Dodoni I and II forest periods of Ioannina 249
5e
7c
9c
11a + b + c
Tzedakis & Bennett 1995
Solar insolation values of mid-month day for selected periods at latitude 39º40'N. Values are given for July and January extremes and July minus January for each interglacial period calculated at thousand year intervals. Values are expressed in cal cm⁻² day⁻¹. In parentheses are percentage differences from 10 ka values. Timing of extreme insolation excursions also given. Data from a computer program written by N.G. Pisias, based on Berger (1978). Chronology based on Imbrie et al. (1984) and Martinson et al. (1987).
Combined plot of sample scores on the first two principal components for Metsovon, Zista, Pamvotis, and Dodoni I forest periods. Asterisks indicate the base of the intervals considered.
Results of comparison of vegetation and climatic signatures of different interglacial periods. '+' sign means similar and '-' means different. First sign refers to climate and second to vegetation character.
Different climate, similar pollen in one comparison
Tzedakis & Bennett, 1995
In multi-proxy studies (e.g. pollen, diatoms, chironomids, etc. studied on the same core), an important question is ‘are the major stratigraphical patterns of variation (‘signal’) the same in all proxies?’
Laguna Facil, southern Chile
Massaferro et al. 2005 Quaternary Science Reviews 24: 2510-2522
Pollen and chironomids studied on the same core
Simplified each data-set to the first ordination axes of a correspondence analysis (CA) and a principal components analysis (PCA) for both data-sets
Multi-proxy studies
Massaferro et al. 2005
Chironomid stratigraphy
Massaferro et al. 2005
Pollen stratigraphy
Massaferro et al. 2005
Can detect similarities in both proxies and differences
1. Major change in both prior to 14,700 cal yr BP.
2. Changes in the chironomids tend to lag behind changes in the pollen. Perhaps a chironomid response to changes in vegetation (tree canopy and forest type) or lake chemistry, resulting from changes in catchment soils as a result of vegetational change.
3. At about 7200 cal yr BP, chironomids change before the pollen. May be a response to climate change.
4. Strong correlations between the charcoal stratigraphy and pollen and chironomid stratigraphies. Probable importance of fire and/or vulcanism in influencing both vegetational and limnological dynamics.
Charcoal
Massaferro et al. 2005
Can use ordination methods to summarise several palaeoecological proxies and to compare with other proxies
Major changes between pre-European period (A)
and European settlement (B)
Lake Euramoo, NE Queensland, last 800 years
Haberle et al. 2006
Tested how well different proxies ‘predict’ or ‘explain’ (in a statistical sense) other proxies
Only proxy that significantly predicted other proxies was pollen that predicted changes in diatoms (25.4%) and chironomids (15.4%)
Illustrates the importance of catchment and its vegetation on the lake and its biota
Difference Diagrams
Pollen percentage difference diagram for the Hockham Mere and Stow Bedon sequences for selected taxa, plotted against radiocarbon age. Note different percentage scale for each taxon.
Location of the two coring sites, Rezina Marsh and Gramousti Lake, in relation to altitude.
Pollen percentage difference diagram to compare results between the pollen percentage values of selected taxa at Rezina Marsh and Gramousti Lake. The values are plotted against an estimated time scale and have been calculated at a time interval of 250 yr. Values to the right of the axis (blue) indicate a higher recorded percentage of a taxon at Rezina Marsh, values to the left (red) indicate a higher recorded percentage of the taxon at Gramousti Lake.
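A difference diagram for one taxon can be sketched by interpolating both sequences to a common time scale and subtracting. The sketch assumes linear interpolation between dated samples and a 250 yr step; the function names and the toy ages and percentages are hypothetical. Positive values mean a higher percentage at site A, negative values a higher percentage at site B.

```python
def interpolate(ages, values, t):
    """Linear interpolation of values at time t; ages must be increasing."""
    for i in range(len(ages) - 1):
        a0, a1 = ages[i], ages[i + 1]
        if a0 <= t <= a1:
            return values[i] + (values[i + 1] - values[i]) * (t - a0) / (a1 - a0)
    raise ValueError("t lies outside the dated range")


def difference_curve(ages_a, pct_a, ages_b, pct_b, step=250):
    """Percentage at A minus percentage at B at regular time steps."""
    start = max(ages_a[0], ages_b[0])
    end = min(ages_a[-1], ages_b[-1])
    t = -(-start // step) * step  # first multiple of step at or after start
    out = []
    while t <= end:
        diff = interpolate(ages_a, pct_a, t) - interpolate(ages_b, pct_b, t)
        out.append((t, diff))
        t += step
    return out


# Hypothetical percentages of one taxon at two sites (ages in years).
curve = difference_curve([0, 1000], [10, 30], [0, 500, 1000], [20, 20, 10])
print(curve)
```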
Mapping
Distribution in northern England of maximum values for pollen of Tilia during the period 5000 to 3000 BC
Maps of pollen frequencies 5000 years BP
Pinus Betula
Maps of pollen frequencies 5000 years BP
Ulmus Corylus
Maps of pollen frequencies 5000 years BP
Quercus Tilia
Map of pollen frequencies 5000 years BP
Alnus
Map of scores of pollen spectra on 1st principal component, 5000 years BP
Map of scores of pollen spectra on 2nd principal component, 5000 years BP
Map of scores of pollen spectra on 3rd principal component, 5000 years BP
Provisional map of woodland types for the British Isles 5000 years ago.
Vegetation regions reconstructed from pollen data for 9000, 6000, 3000, and 0 yr BP
LOCALLY WEIGHTED REGRESSION
W.S. Cleveland: LOWESS or LOESS, locally weighted regression for scatterplot smoothing
May be unreasonable to expect a single functional relationship between Y and X throughout range of X.
(Running averages for time-series – smooth by average of y_{t-1}, y_t, y_{t+1} or add weights to y_{t-1}, y_t, y_{t+1})
(A) Survival rate (angularly transformed) of tadpoles in a single enclosure plotted as a function of the average body mass of the survivors in the enclosure. Data from Travis (1983). Line indicates the normal least-squares regression. (B) Residuals from the linear regression depicted in part A plotted as a function of the independent variable, average body mass.
Linear
(A) Data from previous graph A with a line depicting a least-squares quadratic model. (B) Data from previous graph A with a line depicting a LOWESS regression model with f = 0.67. (C) Data from previous graph A with a line depicting a LOWESS regression model with f = 0.33.
Quadratic LOWESS LOWESS
LOWESS - more general
1. Decide how “smooth” the fitted relationship should be.
2. Each observation given a weight depending on distance to observation x1 for all adjacent points considered.
3. Fit simple linear regression for adjacent points using weighted least squares.
4. Repeat for all observations.
5. Calculate residuals (difference between observed and fitted y).
6. Estimate robustness weights based on residuals, so that well-fitted points have high weight.
7. Repeat LOWESS procedure but with new weights based on robustness weights and distance weights.
Repeat for different degree of smoothness, to find “optimal” smoother. R
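The steps above can be sketched in plain Python. This is a minimal illustration, not Cleveland's reference implementation or the R function: it assumes tri-cube distance weights, a local straight-line fit, and bisquare robustness weights scaled by six times the median absolute residual. The function names and toy data are hypothetical.

```python
from statistics import median


def tricube(u):
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def bisquare(u):
    return (1 - u * u) ** 2 if abs(u) < 1 else 0.0


def weighted_line(xs, ys, ws, x0):
    """Weighted least-squares straight line, evaluated at x0 (step 3)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + sxy / sxx * (x0 - mx)


def lowess(xs, ys, f=0.67, robust_iters=2):
    n = len(xs)
    r = min(n, max(2, int(round(f * n))))  # step 1: span sets the smoothness
    robust = [1.0] * n
    fitted = [0.0] * n
    for _ in range(robust_iters + 1):
        for i, x0 in enumerate(xs):  # step 4: repeat for all observations
            dists = [abs(x - x0) for x in xs]
            h = sorted(dists)[r - 1] or 1.0  # window half-width
            dist_w = [tricube(d / h) for d in dists]  # step 2: distance weights
            ws = [dw * rw for dw, rw in zip(dist_w, robust)]  # step 7
            if sum(ws) == 0:
                ws = dist_w  # fall back to distance weights only
            fitted[i] = weighted_line(xs, ys, ws, x0)
        resid = [y - fy for y, fy in zip(ys, fitted)]  # step 5
        s = median(abs(e) for e in resid)
        if s == 0:
            break
        robust = [bisquare(e / (6 * s)) for e in resid]  # step 6
    return fitted


# Hypothetical noisy straight line with one outlier at x = 10.
xs = list(range(20))
ys = [2 * x + 1 + 0.3 * ((x * 7) % 5 - 2) for x in xs]
ys[10] += 50
plain_fit = lowess(xs, ys, f=0.5, robust_iters=0)
robust_fit = lowess(xs, ys, f=0.5, robust_iters=2)
print(round(plain_fit[10], 2), round(robust_fit[10], 2))
```

Running `lowess` with several values of `f` and comparing the residuals is one simple way to explore different degrees of smoothness.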
How the LOESS smoother works. The shaded region indicates the window of values around the target value (arrow). A weighted linear regression (broken line) is computed, using weights given by the 'tri-cube' function (dotted curve). Repeating this process for all target values gives the solid curve.
tri-cube function
linear regression
target value
Round Loch of Glenhead
LOWESS curve
INQUA COMMISSION FOR THE STUDY OF THE HOLOCENE
Working Group on Data-Handling Methods
To get newsletters, software, etc.
http://www.chrono.qub.ac.uk/inqua/
SUMMARY
1. A range of robust numerical techniques are now available to assist in the partitioning, summarisation, synthesis, and interpretation of palaeoecological data
2. Gradient analysis or ordination helps to detect the major patterns of variation in complex palaeoecological data
3. More specialised techniques like sequence-splitting, rate-of-change analysis, and analogue analysis can be useful in particular research studies
4. There are several techniques for summarising patterns of similarity and dissimilarity at two or more sites
5. Palaeoecological data can be mapped at a range of spatial scales
6. Locally weighted regression (LOWESS) is a useful tool for highlighting ‘signal’ in noisy stratigraphical data