
Siegmund, F. (2014). Tutorial correspondence analysis

Tutorial for archaeologists:

How to perform a correspondence analysis ‐

a practitioner’s guide to success and reliability

Frank Siegmund

( Heinrich Heine University Düsseldorf )

( draft, version 0.9; March 20th, 2014 ) *

(1) Introduction

Correspondence analysis (CA) is a widely known and well‐reasoned multivariate statistical
method for bringing data (cases as well as variables) into a sequence when the data follow
a unimodal model. The term “multivariate” refers to statistical procedures that take more
than two variables into account at once, instead of describing only one phenomenon or
looking for the relation between two variables, like stature and body weight. A well‐known
graph of a “unimodal model” is a bell‐shaped curve (fig. 1). The term unimodal model is
meant as a contrast to linear (or similar, like quadratic, exponential, ...) models.
Let us keep things simple and give examples known to everyone: the relation between
stature and body weight, or between the speed and the weight of cars and their gasoline
consumption, follows a more or less linear model. In general, taller people are heavier
and shorter people are lighter. The heavier a car is and the faster it is driven, the more
gasoline it consumes. A good example of unimodal behaviour is the usual relation between
age and body weight: newborn babies get heavier as they grow up, young adults get heavier
still, but most people lose weight when they get old. The short history of data storage
devices for computers is another example of a unimodal model, and it is already close to
the archaeological applications of CA. Mechanical solutions for storing information, like
punched tapes and punched cards, were replaced by magnetic devices in the 1960s. After
some years the huge Winchester devices were followed by 8‐inch floppy disks in the 1980s,
then by 5.25‐inch floppy disks, then by 3.5‐inch disks, then by re‐writable CDs, up to
today’s USB sticks. This is exactly the way archaeologists think about time and artefacts:
as long as a special kind of object ‐ often called a “type” by archaeologists ‐ has not yet
been invented, its frequency in the world is zero. When it is invented it occurs in low
frequencies first, and when it becomes fashionable it is produced and used in high
frequencies. After some time the frequencies become lower, because other useful
alternatives are invented and come into common use, and eventually the type is no longer
in use, which results in frequencies close to zero again. To apply a CA correctly from a
statistical point of view, the data need not follow the bell‐shaped curve ‐ statistically
called a normal distribution ‐ exactly; they are only expected to show one maximum
somewhere in the middle and minima at both ends.
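The requirement of “one maximum, with minima at both ends” can be made concrete with a small sketch in Python. The function name `is_unimodal` and the two frequency series are illustrative assumptions, not part of any program discussed in this tutorial; the check only asks whether a series of frequencies rises to a single peak and then falls.

```python
def is_unimodal(frequencies):
    """Return True if the series rises (weakly) to one peak and then falls (weakly)."""
    peak = frequencies.index(max(frequencies))
    rising = all(a <= b for a, b in zip(frequencies[:peak], frequencies[1:peak + 1]))
    falling = all(a >= b for a, b in zip(frequencies[peak:], frequencies[peak + 1:]))
    return rising and falling


# Invented frequencies of one type across successive periods.
one_fashion = [0, 1, 4, 9, 5, 2, 0]   # not yet invented, fashionable, vanished
two_fashions = [0, 3, 0, 4, 0]        # two separate maxima, not unimodal

print(is_unimodal(one_fashion))    # True
print(is_unimodal(two_fashions))   # False
```

Note that the check does not ask for a bell shape or a symmetric curve, only for a single rise followed by a single decline.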

Unimodal models are neither exotic nor restricted to archaeology and to the question of
chronology. Another example is plants and animals that prefer certain environmental
conditions. They grow best over a certain span of temperature, humidity, acidity, ... and
equally avoid deviations from this optimum in either direction. Along such environmental
gradients they show unimodal behaviour.

Another characteristic of CA is its robustness: it can be applied to various kinds of data.
CA is able to work with names (presence / absence information), with counted frequencies
(abundance information) as well as with measured observations, while many other
multivariate statistical methods handle measured observations only. This is another
reason for the frequent use of CA in archaeology. The application of CA isn’t limited to
archaeology, however. On the contrary, it is widely used e.g. in the social sciences, in
biology and in ecology. The book “Distinction: A Social Critique of the Judgment of Taste”
(1984) by the famous French sociologist Pierre Bourdieu gives an early example of its
application in sociology.

If you want to analyse a multivariate data set but are unsure whether your observations
follow a unimodal model, and suspect a linear model instead, the suitable and comparable
linear method is Principal Component Analysis (PCA), a method belonging to the wide
family of factor analyses. Applying PCA to unimodal data leads to erroneous results, as
does applying CA to data following the linear model. We will look at an example later on.
In practice, PCA is more sensitive to slight violations of its model expectations, while
CA is more robust against violations of the expected model.

(2) The theory and aims of CA

Judging by the current (March 2014) Wikipedia article “correspondence analysis”, CA seems
to be a highly complicated method. In truth, it isn’t. It is easy to understand, and easy
to calculate and to perform. However, we need not explain the theory here, because there
are good books doing it better (see chapter 4), and we need not calculate a CA by hand,
because there are computers to do so (chapter 3). We must only recognize its aim: to
re‐arrange the sequence of the rows and columns of a table in such a way that the table
is diagonalized at the end. Usually these tables consist of many empty cells or cells
with zero, and all the other cells (with frequencies, for example) should be concentrated
in a diagonal cloud in the middle of the table, starting at its upper left side and ending
at its lower right side, or vice versa (fig. 1). After achieving this aim, the
observations are arranged according to a unimodal model: each row and each column of the
table starts with a minimum (empty cells or zero cells), followed by a maximum and
followed by a minimum again. The input into a CA is unsorted information (a disordered
table); the result of a CA is a new sequence of the rows and columns of the table.

Fig. 1. Example of a well diagonalized matrix, where the rows and columns follow the unimodal

model.

In the early stages of this method it was performed by repeated mechanical re‐arranging of the
sequence of the rows and the columns of a table. First, the sequence of the rows was optimised, then
the sequence of the columns, then the sequence of the rows again, and so on, up to the final stable
solution. Therefore this method was also called sequencing, or sequence dating, and sometimes
seriation or ordination in the sense of sorting and re‐sorting. A picture of a helpful mechanical
solution can be found in Périn (1980, fig. 23). The first computer software merely automated this
repeated re‐sorting of rows and columns. Nowadays the mathematical solution is much more
sophisticated and achieved by calculation: at the end the CA proposes a new sequence of the rows and
columns, and the table has to be re‐arranged only once. In English as well as in German literature the
aspect of changing the sequence of rows and columns was called “to order” and ordination; the
term is still in use.

PÉRIN, P. (1980). La datation des tombes mérovingiennes : historique, méthodes, applications. Genève:

Droz.
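The repeated re-sorting of rows and columns described above can be sketched in a few lines of Python. This is a minimal illustration of the idea (often called reciprocal averaging), not the algorithm used by any of the programs named in this tutorial; the small table, its labels and the function name are assumptions made for the example.

```python
def reciprocal_averaging(table, iterations=500):
    """Approximate the first CA axis by repeatedly averaging row and column scores."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand_total = sum(row_totals)

    col_scores = [float(j) for j in range(n_cols)]  # arbitrary starting sequence
    for _ in range(iterations):
        # Each row gets the weighted average of the scores of the columns it contains ...
        row_scores = [sum(table[i][j] * col_scores[j] for j in range(n_cols)) / row_totals[i]
                      for i in range(n_rows)]
        # ... and each column the weighted average of the scores of its rows.
        col_scores = [sum(table[i][j] * row_scores[i] for i in range(n_rows)) / col_totals[j]
                      for j in range(n_cols)]
        # Centre and rescale (weighted by column totals) so the scores do not collapse.
        mean = sum(s * w for s, w in zip(col_scores, col_totals)) / grand_total
        col_scores = [s - mean for s in col_scores]
        spread = (sum(w * s * s for s, w in zip(col_scores, col_totals)) / grand_total) ** 0.5
        col_scores = [s / spread for s in col_scores]
    return row_scores, col_scores


# An invented, deliberately shuffled table: rows are graves, columns are types.
graves = ["g4", "g1", "g6", "g2", "g5", "g3"]
types = ["C", "A", "B", "F", "D", "E"]
table = [
    [1, 0, 0, 0, 2, 1],  # g4
    [0, 2, 1, 0, 0, 0],  # g1
    [0, 0, 0, 2, 0, 1],  # g6
    [1, 1, 2, 0, 0, 0],  # g2
    [0, 0, 0, 1, 1, 2],  # g5
    [2, 0, 1, 0, 1, 0],  # g3
]

row_scores, col_scores = reciprocal_averaging(table)
grave_order = [g for _, g in sorted(zip(row_scores, graves))]
type_order = [t for _, t in sorted(zip(col_scores, types))]
print(grave_order)
print(type_order)
```

Sorting the graves and the types by their scores should recover the hidden sequence g1 to g6 and A to F, or the same sequence reversed, since the direction of an axis carries no meaning.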


(3) Software to perform a CA

Most of the widely used programs for CA are freeware or open source software, available
free of charge. You only have to learn how they work. The following list is a personal
selection by the author, who has extensive practice with WinBASP and PAST himself. All of
these programs are distributed with documentation. It is really useful to read their
instructions.

‐ WinBASP: The Bonn Archaeological Software Package, Version 5.43 (by Scollar, Irwin

et al.)

source: http://www.uni‐koeln.de/~al001/
[ If there are any problems with this or other links, please look for them with Google ]
Look for BaspPast too, which helps to exchange data sets between WinBASP and PAST
(and other spreadsheets).
Be cautious, there is a software error in BaspPast: when transferring data from WinBASP to PAST, the last
type from the WinBASP list vanishes. Once you know this, you can prevent it with a simple trick:
before transferring the data to PAST, add a new last type (without any meaning) to the data set in
WinBASP and then use BaspPast to transfer the data set to PAST. ;‐)

‐ PAST 3.0: PAleontological STatistics Version 3.0 (by Hammer, Øyvind)

source: http://folk.uio.no/ohammer/past/index.html

Version 3.0 of PAST is under construction in the years 2013‐14. If you have problems
with it, take the older, stable version 2.7.
Hammer, Ø., Harper, D. A. T. & Ryan, P. D. (2001). PAST: Paleontological statistics software package for
education and data analysis. Palaeontologia Electronica 4(1): 9pp.
http://palaeo‐electronica.org/2001_1/past/issue1_01.htm

‐ CAPCA 2.2 (by Madsen, Torsten), an add‐in to Ms‐Excel 2003 or 2007

source: http://archaeoinfo.dk/

Other software solutions are available at:

‐ WinSERION 3.1 (by Stadler, Peter)

source: http://www.winserion.org/

‐ There are several packages for CA in “R”, see: http://www.r‐project.org/

‐ CANOCO 5 (by ter Braak, Cajo, Univ. Wageningen), a powerful package, but it has to
be paid for. More information:
http://www.wageningenur.nl/en/Expertise‐Services/Research‐Institutes/plant‐research‐international/show/Canoco‐for‐visualization‐of‐multivariate‐data.htm

This tutorial is mainly for computers running MS Windows. Most of these special programs
run only on Windows, so we are bound to this operating system. The only exception known to
me is the powerful statistical package “R”, which is distributed for Linux and Mac, too.
There are several packages within “R” to calculate a CA. “R” is free, it is really
powerful and often used by people working with statistics professionally. But all my
experience with introducing beginners to statistics shows that “R” seems highly
complicated to them.

I have been told that WinBASP can be used by emulating Windows on a Mac, but I have no
experience with this. A Mac version of PAST was announced in October 2013 as forthcoming;
a test version for Mac OS X can be downloaded (since March 2013) at:
http://folk.uio.no/ohammer/past/Past3.dmg
I don’t have any experience with it.

(4) Literature

CA was invented several times by different researchers and therefore got different names
in its early days. In Germany, for example, it was called “Seriation” when introduced
mathematically by Klaus Goldmann and Ernst Kammerer in 1972. They took the term from Sir
William Matthew Flinders Petrie (1853‐1942), who called a similar procedure “seriation”
in the late 19th century (Petrie 1899), although at that time it was done without
mathematics and computers. Nowadays the scientific community follows the name proposed by
the first to publish the current theory and statistical procedure, the French statistician
Jean‐Paul Benzécri (1973‐1976, L’analyse des données. 2 vols). The statistical literature
about CA is overwhelmingly numerous and easy to find. If you want to read as much as
necessary and as little as possible, take the book by Michael J. Greenacre (2007), which
includes much practical advice, while Greenacre (1984) is frequently cited as the current
standard introduction to the theory of CA.

GREENACRE, M. J. (1984). Theory and application of correspondence analysis. London:

Academic Press.

GREENACRE, M. J. (2007). Correspondence analysis in practice. 2nd ed. Boca Raton: Chap‐

man & Hall.

There are numerous applications of correspondence analysis in archaeology, too many to
make a short list here. They are easy to find with the usual search engines or library
catalogues. At the end of this text you will find a selected list (“Further readings”).
Everyone planning an archaeological research project with a CA should take one or a few
of them as an example. It is useful to choose examples which are close to the topic of
one’s own research project, which is another reason to avoid a fixed list of literature
here. As exceptions, two books may be mentioned: my own book (Siegmund 1998), where I have
developed a chronology for a large sample of Early Medieval graves from Western Germany,
and a book on Early Medieval chronology in English (Bayliss et al. 2013), which gives a
very detailed insight into the methodology and the real working process of an analysis.

SIEGMUND, F. (1998). Merowingerzeit am Niederrhein. Rheinische Ausgrabungen 34. Köln:

Rheinland‐Verlag.

BAYLISS, A., HINES, J., HØILUND NIELSEN, K., MCCORMAC, G. & SCULL, CHR. (2013). Anglo‐Saxon

graves and grave goods of the 6th and 7th centuries AD: a chronological framework.

Edited by J. Hines & A. Bayliss. The Society for Medieval Archaeology Monograph 33.

London: The Society for Medieval Archaeology.

In Italian: ALBERTI, G. (forthcoming). [title still unknown]. Archeologia e Calcolatori
2013 (in press). See: http://soi.cnr.it/archcalc/

(5) Starting with the practical part: the choice of software

CA means working with tables. It is not just a matter of putting data into a table,
calculating the CA and having a final result. On the contrary, it means working
intensively with such a table for some time, changing things and experimenting with
several ideas to get good and stable results at the end. If there are many data, these
tables can get really large and difficult to keep track of, and adding or changing
information in these tables is error‐prone. Therefore I like the program WinBASP,
although it looks old‐fashioned now. WinBASP has powerful tools to support the input and
the management of data, especially when these data are originally based on lists instead
of tables (like: type xyz occurs in graves a, b, c, d, ...). I don’t have practical
experience with WinSERION, but its data management seems to follow a similar concept.
Everyone planning a project with numerous graves (or features) and types should have a
look at these two programs; they could be very useful. Whenever there is only a small
table, the program PAST is my preference. It is free and really well done, and it offers
many more statistical procedures than CA alone. This short tutorial is therefore based on
PAST. If you want to follow the practical part of this tutorial, please download PAST
from its website (see chapter 3) and install it on your computer. Additionally, this
tutorial uses some sample data. You can type them into your computer yourself, or copy
them from the shared devices.

The next pages are easier to follow when you print them. You don’t have to, but it
makes it easier to follow the instructions step by step.
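Turning list-based records (“type xyz occurs in graves a, b, c, ...”) into a table is a mechanical step, which is why software support for it is helpful. A minimal sketch in Python shows the idea; the function name and the type and grave names are invented for illustration, and this is not a description of how WinBASP or WinSERION work internally.

```python
def lists_to_table(occurrences):
    """Turn {type: [graves, ...]} lists into a grave-by-type table of counts."""
    types = sorted(occurrences)
    graves = sorted({grave for found_in in occurrences.values() for grave in found_in})
    # A grave listed twice for the same type counts as two pieces of that type.
    table = [[occurrences[t].count(g) for t in types] for g in graves]
    return graves, types, table


# Invented example records.
occurrences = {
    "brooch": ["grave 1", "grave 2"],
    "bead": ["grave 2", "grave 2", "grave 3"],  # two beads in grave 2
    "knife": ["grave 3"],
}

graves, types, table = lists_to_table(occurrences)
print("\t".join(["", *types]))
for grave, row in zip(graves, table):
    print("\t".join([grave, *map(str, row)]))
```

The printed result has one row per grave and one column per type, which is the layout used by the sample tables in this tutorial.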


(6) Starting with PAST

Start PAST now by double‐clicking on its icon. Go to “File”, go to “Open” and open the
file “1a_ideal‐matrix‐unordered” (fig. 2). You should see a table now, similar to an Excel
sheet (or to a sheet made e.g. with OpenOffice or LibreOffice Calc). The table contains ten
types named from A to K and ten features named grave 1 to 10. In general, each type is
present in three different graves, and each grave contains three different types. The
types are recorded as counted frequencies, therefore you will see numbers in the cells, or
zeros where a combination of a grave and a type isn’t present. If you want, click on
“Bands” to mark it, which makes the table easier to read.

Fig. 2. Screen shot of our first practical example, the data matrix loaded into PAST.

(6.1) PAST step 1: calculating a CA and reading the scatter plot of axis 1 and axis 2

Let’s do a CA now. Click with your pointer (mouse / mouse & Control) on the uppermost
left corner of the table (grave 6, type H), and drag to the lowermost right corner (grave
5, type C) to mark and highlight the whole table. It should be in a light blue tone now.
This is an important step when using PAST: you must always select the data set to be
analysed by marking it. Now look at the uppermost command line with “File”, “Edit”,
“Transform”, ... and go to the button “Multivariate”, then “Ordination”, then
“Correspondence (CA)” and click on it (fig. 3). Immediately a new window “Correspondence
analysis” should appear on your screen. The CA is already done, let’s look at the results.
The new window shows four flags at the top: “Summary”, “Scatter plot”, “Row scores”,
“Column scores”. Click on “Scatter plot” (fig. 4). You can enlarge the window a little by
dragging it with your mouse. This scatter plot shows (by default) the first two axes
calculated by the CA and places the graves/rows (black) and the types/columns (blue)
according to the results of the CA. All the dots together should now form a kind of
parabola or horseshoe. On the right side of the window you can change the default
settings according to your needs. If you click e.g. on “Plot columns” to unmark it, the
columns/types will disappear from the scatter plot. Now the graph is easier to read.

Fig. 3. Screen shot of PAST immediately before calculating the CA.

Fig. 4. Screen shot of the new, second window of PAST, showing the scatter plot of axis 1 (hori‐

zontally) with axis 2 (vertically).

Our example tables use “graves” and “types” as categories. But CA isn’t restricted
to the analysis of grave assemblages. It has also been applied successfully to features
or layers of settlements and their finds. Another useful application of CA is the
analysis of objects themselves, with each object taken as a “grave/feature” and its
attributes as “types”.


(6.2) PAST step 2: changing the sequence of graves and types according to the results

of CA and analysing the new table

The scatter plot shows “Axis 1” as the x‐axis, that is horizontally from left to right
(fig. 4). This is the first, the dominant dimension in the data set calculated by the CA.
The second axis, “Axis 2”, is shown vertically. We should read the sequence of the graves
along the horizontal axis only. It reads grave 1, 2, 3, ... to grave 10, which is
different from the order in our original table “1a_ideal‐matrix‐unordered”. We take a
look at the types now: go to the right side, click on “Plot columns” to mark the columns
(types) and on “Plot rows” to unmark the rows (graves). Now we can see the new sequence of
the types from left to right, which is type A, B, C, ... to type K. Once again, this is
the sequence along the first, dominant axis of our material as calculated by the CA.

Now we will take a look at the other flags above (fig. 5). “Row scores” and “Column

scores” show the graves and the types respectively, “Axis 1”, 2, 3, ... and numbers in the

cells. These are the statistical results of the CA displayed in the scatter plot analysed

first. The meaning of these axes will be given later on, it needs an explanation. The flag

at the left end, “Summary”, shows Axis 1, 2, 3... and their “Eigenvalue”, “% of total” and

“Cumulative”. These terms will be explained in a moment. First, let’s try to re‐arrange

our original table. To do so, we go back to the new window “Correspondence analysis”

and there to the flag “Scatter plot” to re‐display it. Let’s start with the types, displayed

in blue.

Fig. 5. The flag “Summary” activated showing the numerical results of our CA. For the explanati‐

on see chapter 7.1.


We return to our first (main) window with the “1a_ideal‐matrix‐unordered”. There is a
field with several boxes below the uppermost command line. Look at the (second) box
“Click mode”, where “Select” is set by default. Click on “Drag rows/columns, sort”. Now
you are able to drag and move the columns and the rows of the table to change their
position (not their content!). This is what we do now: we re‐arrange the position of all
the columns according to the sequence given in the scatter plot in our window
“Correspondence analysis”. Go with the mouse pointer to the title of a column and drag it
into the position you want. The column type A should now be the first column of the
table. Then re‐arrange type B, and so on. The first step is done. Now we have to re‐order
the sequence of the graves. Go to the window “Correspondence analysis”, change from “Plot
columns” to “Plot rows” to see the graves now, and read their order from left to right.
Now we re‐arrange the sequence of the graves according to the same rule. Move grave 1 to
the top first, grave 2 one row below, and so on. At the end, the sequence of the graves
should be 1, 2, 3, ... to 10, and the sequence of the types A, B, C, ... to K (fig. 6).
Both sequences are now in exactly the order proposed by axis 1 as calculated by the CA.
Please examine the final table to see the ‐ very simple ‐ model I have proposed for this
tutorial: in the columns the types are non‐existent (0), invented (1), fashionable (2),
becoming out‐dated (1) and vanished (0) afterwards, and in the rows the graves contain
three types, two of them with one piece and one of them with two pieces. This is my
“ideal table”. Sorry for giving such simple names to graves and types, but it is easier
this way to gain our first experience with CA.

Fig. 6. Screen shot of our matrix, with rows and columns re‐arranged now according to the

results of the CA.
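Dragging rows and columns by hand amounts to sorting both by their axis 1 scores. A sketch in Python, assuming the scores have been copied by hand from the flags “Row scores” and “Column scores”; the function name, the tiny table and the score values below are invented for illustration and are not output from PAST.

```python
def reorder_table(table, row_labels, col_labels, row_scores, col_scores):
    """Sort the rows and columns of a table by ascending axis scores."""
    row_idx = sorted(range(len(row_labels)), key=lambda i: row_scores[i])
    col_idx = sorted(range(len(col_labels)), key=lambda j: col_scores[j])
    new_table = [[table[i][j] for j in col_idx] for i in row_idx]
    new_rows = [row_labels[i] for i in row_idx]
    new_cols = [col_labels[j] for j in col_idx]
    return new_table, new_rows, new_cols


# Invented unordered table: only positions change, never the cell contents.
row_labels = ["grave 3", "grave 1", "grave 2"]
col_labels = ["B", "C", "A"]
table = [
    [1, 2, 0],  # grave 3
    [1, 0, 2],  # grave 1
    [2, 1, 1],  # grave 2
]
row_scores = [0.9, -0.9, 0.0]   # hypothetical axis 1 scores of the graves
col_scores = [0.0, 1.1, -1.1]   # hypothetical axis 1 scores of the types

ordered, new_rows, new_cols = reorder_table(table, row_labels, col_labels,
                                            row_scores, col_scores)
print(new_cols)
for label, row in zip(new_rows, ordered):
    print(label, row)
```

After sorting, the non-zero cells of this small example line up along the diagonal, as in the re-arranged matrix of the tutorial.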


(7) Some further explanations of the statistical values

First of all: the most important work when performing a CA is the archaeological part.
We need to define types or attributes well, and we have to select the right ones, those
which are able to answer our question. If the question is about chronology ‐ which is
assumed here ‐ the types should be sensitive to time; if one is asking e.g. about gender,
the types should be sensitive to gender. We have to choose the graves (or features, or
objects with attributes) carefully. When asking about chronology, “closed features” like
well documented graves are necessary, while features which accumulated different material
for centuries before being buried are less useful. The focus of your work should be
archaeology; a deeper understanding of the statistics is useful, but not really
necessary. The following short explanation of some of the statistical terms and
background could be handy.

(7.1) “Axis”, “Eigenvalue”, “Inertia”

CA is multi‐dimensional. CA first tries to find the main, the dominant dimension (or
sequence) in the data, called axis 1. After calculating the first sequence, it looks for
a second dimension, which is independent of the first one, and goes on with a third
dimension, and so on. This is a purely statistical process, similar to Principal
Component Analysis (a member of the family of factor analyses), which extracts several
independent factors from a set of data. It is possible that these second, third or higher
dimensions of the material have some archaeological meaning. But in the practice of
archaeological research it is uncommon to use the second or third dimension. Here is an
example of what could theoretically be achieved when working with graves: axis 1 means
gender, axis 2 means time, axis 3 means social status. But this is purely idealistic
theory, not achieved by any study known to me. In practice we should take care to
understand axis 1, and possibly axis 2 too.

All those axes together should give an optimal “explanation” for all the variation
embedded in the whole table. This total variation within the data set is called
“inertia”. A part of this whole variation is captured by the first axis (axis 1). The
importance ‐ in a statistical sense ‐ of each axis is its “eigenvalue”. The higher the
eigenvalue of an axis, the more important it is. Axis 1 forms the first eigenvector of
the data set. In our example its eigenvalue is about 0.95 (see column “Eigenvalue” in the
flag “Summary” of the window “Correspondence analysis”; fig. 5), or 30.56 percent of the
total inertia of the whole table (column “% of total”). Axis 2 in our example shows an
eigenvalue of 0.80, which is 25.86 percent of the total inertia. The column on the right
side adds up the percentages axis by axis to the end (“Cumulative”). In general, the
first axis should have a high eigenvalue, i.e. it should account for a high share of the
total inertia of the table. But in practice these numbers should not be considered too
important. I have seen tables analysed by CA with very good looking eigenvalues but
without archaeological sense, and on the contrary, I have seen very good and valuable
archaeological analyses with poor numbers from a statistical point of view. The proof of
the success and value of a CA is not to be found in these statistical numbers, but in
archaeological arguments.
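The relation between eigenvalues, “% of total” and “Cumulative” is simple arithmetic: each percentage is the eigenvalue divided by the total inertia, and the total inertia is the sum of all eigenvalues. With the rounded values quoted above, 0.95 / 0.3056 implies a total inertia of roughly 3.1. The sketch below, in Python, uses hypothetical function names and invented eigenvalues; `total_inertia` computes the inertia directly from a table as its chi-square statistic divided by the grand total, assuming no row or column is entirely empty.

```python
def total_inertia(table):
    """Total inertia of a table: chi-square statistic divided by the grand total."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    inertia = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_mass[i] * col_mass[j]  # share expected without any structure
            inertia += (count / n - expected) ** 2 / expected
    return inertia


def shares(eigenvalues):
    """Percentage of total inertia per axis, and the running (cumulative) sum."""
    total = sum(eigenvalues)
    percent = [100 * value / total for value in eigenvalues]
    cumulative, running = [], 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


# A perfectly diagonal 2 x 2 table has exactly one axis with eigenvalue 1.
print(total_inertia([[1, 0], [0, 1]]))

# Invented eigenvalues of three axes.
percent, cumulative = shares([0.5, 0.3, 0.2])
print([round(p, 2) for p in percent])
print([round(c, 2) for c in cumulative])
```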

(7.2) Row and column scores

Now we can understand the row and column scores. Each axis reflects a dimension of
its own, and the score shows the position of a grave / feature or a type / attribute
within this dimension. The row and column scores are the values displayed in the scatter
plot. There are more than two axes, and therefore we can look at more than one
two‐dimensional scatter plot. We will see this later in another exercise.

It is important to know that each axis shows a well defined sequence and the scores
display the position of an individual type or grave within this sequence, but there is no
defined direction of any sequence. The direction of any sequence can be freely changed
into its opposite, e.g. by multiplying all values by ‐1. Changing the direction would not
change the sequence itself and would not change the distances of the individual cases
from each other, which are given by the scores. To give an easier explanation for
archaeologists: when a sequence presumably means time, the position of each case (grave,
type) within this time sequence is statistically well calculated, but there is no answer
to the question of which end of the sequence is the old one and which is the new one.
This answer can’t be given by CA; it has to be derived from external archaeological
arguments like stratigraphy or radiocarbon dates.
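A few lines of Python illustrate why flipping an axis is harmless. The scores below are invented for the example; multiplying them by -1 reverses the reading direction, while the neighbours of each grave and the distances between them stay the same.

```python
# Invented axis 1 scores of four graves.
scores = {"grave 1": -1.2, "grave 2": -0.4, "grave 3": 0.3, "grave 4": 1.3}
flipped = {grave: -score for grave, score in scores.items()}

order = sorted(scores, key=scores.get)
flipped_order = sorted(flipped, key=flipped.get)


def gaps(values):
    """Distances between neighbouring scores, read from the low end to the high end."""
    ordered = sorted(values)
    return [round(b - a, 10) for a, b in zip(ordered, ordered[1:])]


print(order)                   # grave 1 ... grave 4
print(flipped_order)           # the same sequence, read from the other end
print(gaps(scores.values()))
print(gaps(flipped.values()))  # the same distances in reverse order
```

Which of the two readings runs from old to new cannot be decided from these numbers; that needs external evidence.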

(7.3) Behind the first dimension (axis) of a CA

As we have seen above, CA usually calculates more than one axis, up to a statistical
point where a further extraction of axes isn’t justified any longer. The number of these
extracted axes depends on the data, it isn’t fixed. In most archaeological applications
of CA only the first axis or the first few axes have meaning and all the rest can’t be
interpreted. The parabola or horseshoe formed by the scatter plot of axis 1 with axis 2
(fig. 4) is a purely statistical result. When the data follow the assumed unimodal model
ideally, axes 1 and 2 show such a parabola. In these cases there is also a special curve
when displaying axis 1 with axis 3 and axis 2 with axis 3. The typical shape of these
curves should be known to everybody, and therefore we will study it now.

Open the table “1b_ideal‐matrix‐ordered” with PAST, calculate a CA and display the
scatter plot, just as we did before. The scatter plot of axis 1 with axis 2 with the
graves and types should now be visible, forming a parabola. This well‐shaped parabola
indicates that our data follow the unimodal model well. On the right side of the window
“Correspondence analysis” there are some buttons we haven’t used so far. At the top there
are two buttons with the headlines “X axis” and “Y axis”. Go to the second one (“Y
axis”), click on axis 2, go to axis 3 and click. The scatter plot immediately changes;
now it displays axis 1 horizontally (as before) and axis 3 vertically (fig. 7). The dots
now follow an S‐shaped curve lying on its side, which is typical and once again close to
the mathematically expected ideal.

Fig. 7. Scatter Plot of axis 1 (horizontally) with axis 3 (vertically).

Go to the uppermost button with the headline “X axis” now, click on “Axis 1”, go to “Axis
2” and click. Once again the graph changes, now displaying axis 2 horizontally and axis 3
vertically (fig. 8). You should now be looking at a curve like the sketch of a fish.
Follow the points from grave 1, 2, ... to 10 with your eyes to recognise the course of
this special curve, which is ‐ once again ‐ typical and close to the mathematically
expected ideal.


Fig. 8. Scatter plot of axis 2 (horizontally) with axis 3 (vertically).

These three displays ‐ axes 1 with 2, 1 with 3 and 2 with 3 ‐ are three views of a
three‐dimensional cube with a three‐dimensional cloud of points within. We looked at it
from three different sides, seeing only flat two‐dimensional views. If you are really
interested in this complicated curve, you could try building a physical model looking
like that, but this isn’t really necessary. For our purposes here the two‐dimensional
scatter plots are sufficient; we have had a chance to see these ideal curves once. You
should try to remember them as a pattern, which may also be seen in your future analyses
with less ideal data.

(7.4) CA and seriation

Now we are ready to answer the question: What is the difference between a CA and a

“seriation”? Well, as we have seen, the CA is multi‐dimensional and in many cases offers

several axes. Seriation isn’t, it gives one sequence, i.e. a one‐dimensional solution. The

sequence proposed by a seriation ‐ when properly done ‐ is an equivalent of the first axis

of a CA. Therefore older studies which analysed archaeological problems with the help

of a seriation are not incorrect, the result of a CA should be identical or very similar.

(7.5) What is relevant, the curve or the axes?

After seeing these curves and their typical course, a usual question is which sequence is
the relevant one: the position of a point along the course of one of these typical
curves, or the position of a point along axis 1? The latter is correct. The position of
the points has to be read along the axes, not along the curves. To get a better idea of
the “real” distances along axis 1 (and 2, 3, ...) statisticians have invented a variant
of CA named Detrended Correspondence Analysis (DCA). Here a usual CA is computed first,
but afterwards the curves are re‐calculated into a line. The idea is that the distances
between single points given by a DCA are more accurate than the distances along the axis
of a CA. Perhaps, but the difference is a small one and has ‐ from my point of view ‐ no
meaning for archaeological practice.

If you are interested in performing a DCA, you could easily calculate it with PAST. Go

back to our table, mark (highlight) the relevant rows and columns and go along the

uppermost line to the flags “Multivariate”, then “Ordination”, and then “Detrended

correspondence (DCA)” instead of “Correspondence (CA)”. That’s it. The new window

displays axis 1 and axis 2 as re‐calculated by the DCA. The sequence of the types and

graves along the first axis hasn’t changed, but the scale and the distances are slightly

different now.

(7.6) The “parabola test”

Sometimes in archaeological applications of CA you will read that a parabola test was
done. It may be that you don’t know this test and consult your textbook on statistics,
where topics like the chi‐square test, the Mann‐Whitney U‐test or the Kruskal‐Wallis
H‐test are introduced and explained in detail. But you won’t find any “parabola test”
there. This is not the fault of your textbook. The parabola test is a myth. The term
doesn’t denote a serious statistical test. It means that someone had a look at the
display of the results of a CA, the scatter plot of axis 1 with axis 2, and visually
compared the distribution of the points in this display with the expected shape of a
parabola, just as we have seen it above (fig. 4). That’s it, the famous parabola test.

It can really be worth checking whether the results of a CA show a parabola when
displaying axis 1 with axis 2. It shows that the data set is close to the unimodal model.
But it is never anything like a serious statistical test. So, please never talk about a
parabola test.

Horse shoe or parabola? ‐ that’s the question. As we have learnt already, the axes have
no inherent direction; they can be flipped freely. Sometimes the display of axis 1 with
axis 2 forms the shape of a parabola with two ends up, sometimes it forms a horse shoe
with two ends down. Whether the final mathematical solution has its ends up or down depends
on the sequence of the data input and on a random process; the difference carries no meaning.
In a real analysis it is useful to make the displays comparable to each other, all of them
showing a parabola, or all of them showing a horse shoe, just as you prefer. European
archaeologists often prefer a parabola, American archaeologists often prefer a horse shoe.
It is important to know that there is no real difference between them; it is only a
different convention.
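The free choice between parabola and horse shoe can be made reproducible in a script. The following Python sketch is an illustration only and not part of PAST. It assumes NumPy, computes a plain CA through a singular value decomposition, and then flips axis 2 if necessary so that the ends always point up. The function names and the small band‐shaped test table are hypothetical and were invented for this example.

```python
import numpy as np

def correspondence_analysis(table):
    """Plain CA via SVD; returns row and column scores (principal coordinates)."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()                  # correspondence matrix
    r = P.sum(axis=1)                          # row masses
    c = P.sum(axis=0)                          # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U / np.sqrt(r)[:, None]) * sv
    cols = (Vt.T / np.sqrt(c)[:, None]) * sv
    return rows, cols

def orient_as_parabola(rows, cols):
    """Flip axis 2 if needed so that the ends of the arch point up."""
    # Leading coefficient of a quadratic fit of axis 2 on axis 1.
    curvature = np.polyfit(rows[:, 0], rows[:, 1], 2)[0]
    if curvature < 0:
        rows, cols = rows.copy(), cols.copy()
        rows[:, 1] *= -1
        cols[:, 1] *= -1
    return rows, cols

# Hypothetical ideal table: 8 graves, 10 types, 3 consecutive types per grave.
ideal = np.zeros((8, 10), dtype=int)
for grave in range(8):
    ideal[grave, grave:grave + 3] = 1

rows, cols = orient_as_parabola(*correspondence_analysis(ideal))
```

Multiplying the scores of an axis by -1 changes neither the distances between the points nor the sequence, which is why the orientation can be fixed by convention.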

(8) Gaining more experience with CA

Obviously we need some introduction to the interpretation of the results of a CA, and some
ways to get a better idea of how to work with a CA. Before that, I would like to study some
further artificial tables with simulated data. Exercises of this kind will give you more
experience with these tables and pictures before you work on a real archaeological problem.

(8.1) Case study with an unspecific type

Matrix “2_ideal‐matrix_with‐one‐unsensible‐type” shows a common case in real applications:
the table is dominated by well‐defined, closed features (graves) and by time‐sensitive types
(fig. 9). But there is one type which occurs throughout the whole time span. Archaeologists
often call such a type a “long runner”. To keep things simple, this table is already put into
the ideal sequence, so you can see its structure at a glance from the very start.

Fig. 9. The model table with the additional type “unsensible” (right side).

Please start PAST, load this table and look at it, then perform a CA and look at the window
“Correspondence analysis” for the scatter plot of axis 1 with axis 2. The scatter plot of the
graves (i.e. rows) looks very similar to the plot derived from our first (ideal) table, but
the scatter plot of the types (columns) is different now (fig. 10). While the types A to K
are sorted in the same way as before, the type named “unsensible” lies in the middle of our
parabola or horse shoe. This is the typical picture. When a table shows a generally good and
stable sequence, the one or few types which are not sensitive to the underlying dimension
(time, in our assumption) are collected in the middle of the open parabola. You can use this
phenomenon to get hints as to which graves or types are not really useful for the analysis.

Fig. 10. Scatter plot of axis 1 (horizontally) with axis 2 (vertically) of the CA of the table in fig. 9

with type “unsensible”.

(8.2) Case study with mixed graves

The same can happen to features (graves), as shown in “3_ideal‐matrix_with‐unspecific‐grave”.
Please load the table into PAST and perform a CA. The new grave named “collector” contains one
piece of each type. Just like the “unsensible” type in the example before, this grave now lies
in the middle of the parabola, while the other graves and types are arranged as expected.

The next table “4_ideal‐matrix_with‐mixed‐grave” gives a more dramatic version of an
unsuitable feature (fig. 11). The new grave “mixed” is a real mixture of the types from
graves 2 and 9, with additional types in the middle. After calculating the CA we can see that
the general sequence of the graves and types is still similar to that of the first, perfect
table. But the scatter plot shows distortions now (fig. 12). It is no longer symmetric, and
the sequence is slightly changed, especially around type H and type I. While the model tables
number 2 and 3, with an insensitive type or grave, still obeyed the unimodal rule, our mixed
grave shows a bimodal collection of types, combining a very old and a very new assemblage.
This violation of the unimodal model has a bigger influence on the resulting sequence of our
table.


Fig. 11. Model table with an additional mixed grave (bottom line).


Fig. 12. Scatter plot of axis 1 with axis 2 of the CA of the table in fig. 11 with a mixed grave.

Our table is small in comparison to a real data set and therefore more sensitive to single
changes. You can easily verify this by making our model tables larger, i.e. by adding more
artificial types and graves. Real, larger tables can’t be altered so easily by a single case.
But our observation shows how and where these violations work. Types or graves which are less
sensitive to the underlying dimension (time in our model assumption) than the other types and
graves can’t be placed well in the resulting sequence, but their general influence on the
quality of the final sequence is small. Truly mixed features (graves), combining typical
assemblages of different times, or types which re‐occur after having gone out of use, are
more influential. When such disturbing cases are rare, the CA will help you to detect them,
but when they make up a larger share of the data set, the CA won’t give suitable results.

(8.3) Case study with weak connection

CA analyses assemblages and the combinations of types within them. A type existing in only one
grave, or a grave containing only one type, doesn’t show a combination and is therefore of no
value for the analytical process. Such cases should be excluded when you try to find and
establish a sequence. The minimal requirement for our table is: each grave contains at least
two types, and each of those types has to be represented in at least two graves. But even
when this rule is respected, parts of a table can be sparsely filled. To see this in practice,
load “5_ideal‐matrix_with‐weak‐connection” (fig. 13), look at the table and calculate a CA.
In comparison to the matrices analysed before, this matrix shows higher frequencies of types
in the graves at both ends of the table, but its middle part has been thinned out; look at
graves 5 and 6 and at types E and F especially. The aforementioned minimal conditions are
still met.

Fig. 13. Modified model table with graves and types more connected to each other and a mid

part with very few combinations only.

The scatter plot of the CA mirrors these changes and the resulting structure (fig. 14):
types A, B, C and D are plotted close to each other at one end, and types G, H, I and K are
distributed as usual at the other end. While the former are drawn together by type
frequencies of up to 4, the latter show type frequencies only up to 3. The middle of our
parabola is thinned out, with the biggest distance between types E and F. This is just the
structure to be seen in the table. What does this example show? Graves and types can be
combined with each other more intensively than usual, and they can lie further apart from
each other than usual. The scatter plot of a CA shows these dense and thin zones, which can
be used as an instrument for defining phases in chronologically ordered material. It is not
necessary to follow the results of a CA when there are other good reasons for a phasing.
When there are no other or better arguments, pictures like this can be used for phasing the
sequence. In our case we could use the gap between type E and type F to draw a borderline
between two phases, perhaps with grave 6 assigned to the side of graves 7 ff., because it
stands a little closer to them.


Fig. 14. Scatter plot of axis 1 with axis 2 of the CA of the table in fig. 13 with a mid part with few combinations only.

(8.4) Don’t overemphasise the scatter plot, look at the table!

As we have learnt from these experiments with our artificial data sets, the scatter plot of
axis 1 with axis 2 can give valuable insights into the characteristics and structure of your
data. But the scatter plot should not be used alone, and it should not be overrated. In the
end you have to look closely at the final table, where you can see the real combinations of
the types and the graves. Sometimes, especially at the beginning of a real project, there are
some errors in the table, often simple typos. You won’t find them by looking at the scatter
plot, but you can find them by checking the table.

Another typical “error” may occur while preparing and defining the typology. Often a large
group of objects is organized by an archaeologist into several well‐defined types. Usually,
close to the end of this process of classifying all the objects, one or a few pieces are left
over. They don’t fit well into any of these types. An archaeologist often feels a certain
need not to leave any objects unclassified, so those single pieces are added to the most
plausible category. When performing a CA later on, many of these decisions about the unusual
objects don’t come up again, because the objects were classified well. But sometimes errors
in these decisions are detected by the CA later on, as unusual combinations disturbing the
sequence. When a sorted table shows a well‐shaped, elongated diagonal cloud of frequencies in
the middle, you should read it carefully row by row and column by column. You can often
detect outliers then, i.e. single combinations lying far away from the rest of the cloud of
points. Those combinations can be genuine; outliers are a possible phenomenon! On the other
hand, sometimes you will find your problematic typological decisions here, and you should
re‐think them now.

(9) A look at two tables with real archaeological data

So far we have collected some experience with artificial tables. It’s time to look at some
real archaeological data sets. Two examples are given here: “Langweiler‐2_Stehli‐1973‐p91‐fig49”
and “beads_Koch‐U‐1977‐table‐4”. The first data set derives from the analysis of the
settlement “Langweiler 2” of the Linear Pottery culture (ca. 5500‐4900 BC) in Western Germany
(Stehli 1973, 91 fig. 49). It shows single features from this settlement (rows) and types of
the main decorations of the pottery (columns). This was an early study of this problem,
superseded by more recent studies now, but it was to my knowledge the first time that the
frequency of types in combinations was taken into account and calculated, while up to then
seriation had been based on the presence and absence of types only. In the original
publication the features were divided into three phases; our table shows these phases 1 to 3
as the first character of the label of each feature.

The data set used as input here shows the sequence proposed by Stehli (1973). When performing
a CA with this table you will see that the three phases proposed by Stehli are reproduced
very well, but the order proposed by the CA differs in some details from the one published by
Stehli (1973). The parabola isn’t formed as well as in our artificial tables. But this is
normal, especially when analysing assemblages of finds from settlements. The scatter plot of
axis 1 with axis 2 can be read as a hint that the features 1‐0485 and 2‐0821 incorporate
types from different times, and that the decoration type a12 is not very sensitive to time
but a “long runner”. These are hints only, which have to be argued in detail on archaeological
grounds. The sequence gets better, in a technical sense, when those two features and type a12
are excluded from the CA.


Our next example is taken from the book by U. Koch (1977), in which she studied the beads and
the strings of beads from the Early Medieval cemetery near Schretzheim, Southern Germany
(ca. 530‐665 AD). The decorated beads were analysed and classified into distinct types. The
table shows these types (columns) and their representation in the strings of beads (rows),
which were worn by Early Medieval women as necklaces. A copy of Koch’s printed table (Koch
1977, table 4) is enclosed here at the end (fig. 18). The original sequence of this table was
made by hand by U. Koch and obviously follows a different concept: the latest type dates the
complex. This is a common approach, e.g. among numismatists when dealing with coin hoards. We
will discuss the methodological aspect later (see chapter 10). The rows in our table show the
strings of beads, which are labelled in a special way: the leading number gives the
chronological phase of the grave according to the current chronology of the cemetery of
Schretzheim (Koch 2004), followed by a hyphen, followed by the grave number as in the
original table (Koch 1977, table 4). Undated graves are marked by two leading hyphens before
their number. With the help of this coding technique we can read the results of the CA more
easily, because one can see immediately whether and how far the sequence of the beads
proposed by the CA is in accordance with the current chronology of the cemetery.

When a CA of the table of beads from Schretzheim is computed, its order shows a good
concordance with the overall chronology of the cemetery of Schretzheim. However, the results
differ from the sequence of the originally published table. The display of axis 1 with axis 2
shows a well‐formed parabola, but there seem to be some outliers: graves 6‐258 and 7‐420 and
bead type 33,15‐16. Because we can’t go into the archaeological details here to analyse the
reasons, we take the simple solution and remove them, as an experiment, from the data set
(highlight the row or column respectively, then >> “Edit” >> “Remove”). Re‐calculate the CA
and compare the results, ...

Stehli, P. (1973). Keramik. In Farrugia, J.‐P., Kuper, R., Lüning, J. & Stehli, P. (eds.).

Der bandkeramische Siedlungsplatz Langweiler 2, Gemeinde Aldenhoven, Kreis

Düren. Rheinische Ausgrabungen 13 (pp. 57‐100). Bonn: Rheinland‐Verlag.

Koch, U. (1977). Das Reihengräberfeld bei Schretzheim. Germanische Denkmäler

der Völkerwanderungszeit A 13. Berlin: Gebr. Mann.

Koch, U. (2004). Schretzheim §2 Archäologisches. Reallexikon der Germanischen

Altertumskunde vol. 27 (pp. 294‐302). Berlin: de Gruyter.


(10) “The latest type dates the complex”? ‐ Or: how does CA date?

As mentioned above, many numismatists, especially when dealing with hoards, follow the
concept that the latest piece in an assemblage dates it. If this model is converted into an
ordered table, the table should look like the one cited above (Koch 1977, table 4): a
rectangular table with one empty triangle and another triangle filled with frequencies, most
densely accumulated along the diagonal border line between both areas. This picture differs
from the tables generated by a CA, which show a symmetrical accumulation of frequencies along
the diagonal. The table derived by a CA mirrors the unimodal model, which places each type
and each grave at the centre of its distribution. Thereby the CA estimates the most probable
mean (!) time of an assemblage and the mean time of a type, not the time of the last piece.
Grave goods are a collection: some pieces are recent, some pieces can be old. They were all
deposited in the earth when the corpse was buried, but some of them might have been acquired
by the deceased in their early years, some in the last days of their life, and some pieces
may have been produced for the occasion of this burial. The CA draws all this together into
a mean estimate for the assemblage. If you think this concept isn’t suitable for your finds,
don’t use CA.
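The statement that CA estimates a mean can be checked numerically. In the standard algebra of CA (often called the transition formula) the score of a grave on an axis equals the weighted average of the standard scores of the types found in it. The following Python sketch assumes NumPy; the small frequency table and all variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical table: 6 graves (rows) x 5 types (columns), cells are frequencies.
table = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 2],
], dtype=float)

# Plain CA via singular value decomposition of the standardised residuals.
P = table / table.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

grave_scores = (U[:, 0] / np.sqrt(r)) * sv[0]   # axis 1, principal coordinates
type_std_scores = Vt[0] / np.sqrt(c)            # axis 1, standard coordinates

# Row profiles: share of each type within each grave (every row sums to 1).
profiles = table / table.sum(axis=1, keepdims=True)

# Each grave score is the weighted mean of the scores of the types found in it.
mean_of_types = profiles @ type_std_scores
print(np.allclose(grave_scores, mean_of_types))
```

A grave containing one very late piece among many early ones is therefore placed near the early types, not at the position of its latest piece.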

Now you might ask whether there is another statistical solution instead of CA, more
suitable for the model “the last piece dates the complex”. I have to disappoint you, because
there is no suitable and statistically valid solution for this different concept. If you try,
e.g., to analyse this table following the linear model by a PCA (see chapter 11.5), you will
easily see that its results deviate more strongly from the original results of Koch (1977)
and from the chronology of the cemetery than the results derived by our CA do. From my point
of view the model “the latest type dates the complex” is not suitable for archaeological
problems ‐ but this is my personal opinion only.

(11) It’s time to start with your own projects now

Now you are ready to carry out your own projects. It would be best to use your own, real
data. The following part of this tutorial gives some practical advice for your first steps
into CA.

(11.1) Data preparation, or: What does a suitable table look like?

This question isn’t as silly as it looks at first glance. Usually the archaeological
information is prepared in a structure like this: grave 1 contains a sword type 1 and a
shield type 44; grave 2 contains a sword type 2 and a shield type 55. One could transform
this information into a table like this (fig. 15):

           sword     shield

grave 1    type 1    type 44

grave 2    type 2    type 55

Fig. 15. Simple table showing the types found in the graves.

But this isn’t a table suitable for CA. Instead, you have to transfer your data into a
table of the following kind (fig. 16):

           sword     sword     shield     shield
           type 1    type 2    type 44    type 55

grave 1    1         0         1          0

grave 2    0         1         0          1

Fig. 16. Modified table with the same information as fig. 15, but ready now to be analysed

by a CA.

Each row represents a single grave (or feature), each column represents a single type (or

attribute) now, with the numbers in the cells representing presence or absence of this

type in this grave, or the frequency. It is important to recognise the difference between

the two tables and to prepare the input correctly.
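For larger data sets this conversion is tedious to do by hand. The following Python sketch shows one possible way to derive a table like fig. 16 from a list of the types found in each grave. It uses only the standard library; the function name “build_table” and the dictionary layout are assumptions made for this example, not a feature of PAST.

```python
def build_table(inventories):
    """Turn {grave: [type, type, ...]} into a grave-by-type frequency table."""
    # One column per distinct type, in alphabetical order.
    types = sorted({t for found in inventories.values() for t in found})
    # One row per grave; each cell counts how often the type occurs in the grave.
    rows = {grave: [found.count(t) for t in types]
            for grave, found in inventories.items()}
    return types, rows

# The two graves of fig. 15.
inventories = {
    "grave 1": ["sword type 1", "shield type 44"],
    "grave 2": ["sword type 2", "shield type 55"],
}

types, rows = build_table(inventories)

# Tab-separated output, e.g. for pasting into a spreadsheet.
print("\t".join(["grave"] + types))
for grave, counts in rows.items():
    print("\t".join([grave] + [str(n) for n in counts]))
```

The columns come out in alphabetical order here, so the shields precede the swords. The order of the columns has no influence on the result of a CA.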

There is another kind of table often used in older archaeological literature, but not
suitable for being analysed with a CA. I mean square symmetric tables, where the rows as well
as the columns show types, and the cells show how often a type is combined with another type.
These tables are symmetric about the main diagonal, which shows the combination of a type
with itself, while the two triangles show, symmetrically mirrored, the number of combinations
of each type with the other types. Nowadays these tables are named “Burt table” in the
statistical literature. In our small collection of examples I have added a table named
“8_burt‐table_from‐ideal‐matrix‐1”, where I have transformed the information in our table 1
into a Burt matrix (fig. 17). As far as I know, in archaeology a Burt table was first used by
Heinz Gatermann (1942, p. 11 fig. 1) when analysing the decoration of beaker pottery in
western Germany. His study inspired David L. Clarke (1970, p. 429, 469) to use such matrices
in his book about the beaker pottery of Great Britain and Ireland.

Fig. 17. Our “ideal matrix” transformed into a “Burt table”, which is not suitable to be analysed

by a CA.

Such tables should not be analysed by a usual CA; although this is technically possible, the
results are not correct. Greenacre (2007, pp. 137‐152) explains the statistical problems of
such a procedure and sketches out a possible solution, named Joint Correspondence Analysis
(JCA). But a JCA needs a different way of calculation. From an archaeological point of view
these tables are also difficult to work with, because the original archaeological
information, the combination of types in graves, cannot be seen any more. When you are
working with such a table and have to change something, like modifying a typological
decision or deleting an unsuitable type or a mixed grave, this isn’t an easy procedure. So,
even though applying a JCA instead of a CA to a Burt table gives a valid statistical
solution, don’t use such tables.
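How the graves disappear from such a type‐by‐type table can be seen in a few lines of Python. The sketch below assumes NumPy and a presence/absence table; the small table and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical grave-by-type table (presence/absence): 4 graves x 3 types.
types = ["A", "B", "C"]
graves_by_types = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 1],
])

# Type-by-type table: cell (i, j) counts the graves containing both type i
# and type j; the diagonal counts the graves containing each type.
type_by_type = graves_by_types.T @ graves_by_types
print(type_by_type)
```

The resulting table has one row and one column per type and none per grave. Deleting a single mixed grave afterwards is only possible by going back to the original grave‐by‐type table and calculating the type‐by‐type table again.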

GATERMANN, H. (1942). Die Becherkulturen der Rheinprovinz. Würzburg: Triltsch.

CLARKE, D. L. (1970). Beaker pottery of Great Britain and Ireland. Cambridge: University

Press.

NEUFFER, E. M. (1965). Eine statistische Bearbeitung von Kollektivfunden. Bonner Jahr‐

bücher 165, pp. 28‐56.

GEBÜHR, M. (1970). Beigabenvergesellschaftung in mecklenburgischen Gräberfeldern der

älteren römischen Kaiserzeit. Neue Ausgrabungen und Forschungen in Niedersachsen 6,

pp. 93‐116.

(11.2) You need good material, good questions and a suitable benchmark

To achieve a good chronology you need a large amount of material and a well‐made typology,
as Oscar Montelius stated in the introduction of his famous book on archaeological methods in
1903. This simple truth is still valid. The typology must be suitable for your specific
questions. If you are interested in chronology, the types have to be sensitive to the
dimension of time. If you are interested in questions of social status, the types have to be
sensitive to this specific question. Therefore the one and only optimal typology for a
certain material doesn’t exist; there are several. A brooch, e.g., will be classified by its
style for chronological questions, but may be classified by its material (gold, silver,
bronze) or its weight for a social analysis, or by its position in a grave for the analysis
of costume.

Clearly it is not possible to achieve your goals with too few finds. It is difficult, but
not impossible, to answer the question: how many will be enough? CA helps to find the answer.
How? By observing the stability of your results. Whenever a real project is done, there is a
time of trial and error, when you are trying to get better results step by step. This is an
important part of the research process. After some time you will notice (I hope) that
further attempts to improve your sequence don’t change it any more. Don’t be disappointed,
but be happy instead: the stage of stability has been reached. Whenever you add a new find to
your table now, the addition should be integrated into the sequence well, but without having
much influence on the sequence of the table in general, in comparison to the results obtained
before without these particular finds (graves or types). When an analysis has reached this
point, it is stable and your material is large enough.

If you can’t add any new finds to your table to test its stability, you can try the
opposite: delete one type or one grave and look at what happens. When the effect on the order
of the whole table is small, stability has been achieved. This concept seems to be a little
improvised, but it isn’t. Statistical theory treats such procedures under the names
“jack‐knifing” and “bootstrapping” (Efron & Tibshirani 1993; Chernick 1999; Good 2013).
Jack‐knifing means deleting single cases from a data set systematically, one at a time, and
observing the results after each step: delete case 1 from the data set, perform your analysis
and save the results. Put the deleted case 1 back into your data set, delete case 2 now,
perform your analysis and save the results, and so on. Bootstrapping is the related technique
of repeatedly drawing random samples with replacement from the data set and analysing each
of them. Both are often called resampling. At the end you can analyse all these results, in
our case the scores of the types and the features. The results should be similar to each
other, and it would be interesting to identify those single cases which are responsible for
the most deviant results. Analyse them from an archaeological point of view. They could
indicate the weaknesses of your table, e.g. badly defined types or mixed graves, or they
could simply be very influential, without any error. Analysing your results type by type and
grave by grave could take weeks of your precious time! But there are ways to do this
systematically with the help of a computer. If you plan a project of this kind, have a look
at the statistical package “R”, where you could write a script to do these deletions,
additions and comparisons automatically (Good 2013).
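The deletion loop described above can be sketched in any scripting language. The following example uses Python with NumPy instead of R, purely as an illustration. It deletes each grave in turn, repeats a plain CA, and measures with a rank correlation how far the order of the remaining graves on axis 1 has changed. The function names, the choice of the rank correlation as a measure and the band‐shaped test table are assumptions of this sketch, not an established procedure.

```python
import numpy as np

def first_axis_row_scores(table):
    """Row scores on CA axis 1, via SVD of the standardised residuals."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, 0] / np.sqrt(r)) * sv[0]

def rank(values):
    """Ranks 0..n-1 (assumes that no two scores are exactly equal)."""
    return np.argsort(np.argsort(values))

def jackknife_rows(table):
    """Delete each row in turn and compare the order of the remaining rows.

    Returns one value per deleted row: the absolute rank correlation between
    the full-table order and the order found without that row (1.0 = unchanged).
    """
    counts = np.asarray(table)
    full = first_axis_row_scores(counts)
    agreement = []
    for i in range(counts.shape[0]):
        reduced = np.delete(counts, i, axis=0)
        reduced = reduced[:, reduced.sum(axis=0) > 0]  # drop types left empty
        scores = first_axis_row_scores(reduced)
        rho = np.corrcoef(rank(np.delete(full, i)), rank(scores))[0, 1]
        agreement.append(abs(rho))                     # axis direction is arbitrary
    return np.array(agreement)

# Hypothetical ideal table: 8 graves, 10 types, 3 consecutive types per grave.
ideal = np.zeros((8, 10), dtype=int)
for grave in range(8):
    ideal[grave, grave:grave + 3] = 1

print(jackknife_rows(ideal).round(3))
```

Graves whose deletion yields a clearly lower value than the others are the influential cases worth checking archaeologically. Note that this sketch does not re‐check the minimal requirements of the table after each deletion, and the same loop could be run over the columns to test the types.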

CHERNICK, M. R. (1999). Bootstrap Methods. A practitioner's guide. Wiley Series in probabil‐

ity and statistics. New York: John Wiley & Sons.

EFRON, B. & TIBSHIRANI, R. J. (1993). An Introduction to the Bootstrap. Monographs on

Statistics and Applied Probability 57. New York: Chapman & Hall.

GOOD, PH. I. (2013). Introduction to statistics through resampling methods and R. 2nd ed.

Hoboken NJ: Wiley.

MONTELIUS, O. (1903). Die typologische Methode. Stockholm: Selbstverlag des Verfassers.

Once again: a statistically validated result is nice, but the archaeological validation is
more important. When starting an analysis, a benchmark is needed, a hypothesis against which
the results of your current CA can be compared. In the case of a chronological question this
could be the archaeological standard chronology used till now, it could be stratigraphic
information, or some radiocarbon dates, or dated coins in some of the assemblages, or the
chorological / topo‐chronological analysis of a cemetery which had grown systematically. It
is not necessary to have information of that kind for all of your grave assemblages, but you
should have it for some of them. It would be best to write a short chapter for your later
publication right at the beginning of your project, before starting with the CA, in which
you explicitly describe and justify these benchmarks or test hypotheses of your study, for
yourself as well as for your readers. After this step you can clearly judge each change of
your table in comparison to your benchmark: whether the results are better than before or
not.

When working with the table and the scatter plots of the CA, it is important for your
practical work to keep your benchmark(s) visible. I propose to embed this information into
the labels of the types and features, e.g. by adding special signs to the type or grave
names, just as I have done in the examples “6_Langweiler‐2_Stehli‐1973‐p91‐fig49” and
“7_beads_Koch‐U‐1977‐table‐4”. One can immediately see the conventional dating of the graves,
and thus read and understand the results of the CA more easily.


(11.3) What is allowed, and what shouldn’t be done? Some practical advice

No table and no CA is ready right away. In most cases the final result is the outcome of a
long process of trial and error. What are your options? You can’t change assemblages
individually, e.g. delete a single “disturbing” type within a grave. But you can delete
unsuitable graves as a whole, whenever there are arguments to do so, like graves mixed up by
errors during the excavation or during storage in an unprofessional storeroom. You can delete
unsuitable types as a whole, for example when they are too unspecific in relation to your
question. Selecting useful features and types or deleting unsuitable finds is an important
part of the improvement of the table. Before starting to work, you should develop some rules
and criteria for these operations, and these rules should be explained and should be a part
of your publication.

It is often difficult and needs some time of trial and error to select the set of types you
want to use. If they are all very specific and finely graded, you might have only a few
combinations. If you integrate (too) many unspecific, roughly defined types, you increase the
number of your combinations, but you won’t get a detailed chronology then. There is no fixed
rule for solving these questions; you have to try to find a good solution, and a good
explanation for your decisions.

What can be done when parts of your table show too few combinations and too few connections
to the rest of the material? Maybe you could look for some additional material, e.g. from
comparable sites nearby. Re‐think your typology; it could be too coarse, or too detailed.
Sometimes it is useful to split some of your types into attributes. Instead of types of belt
buckles as a whole it could make sense to divide them: to put one real group of objects into
your table once as a belt buckle type defined by shape, and a second time as a belt buckle
type decorated in style xyz. Such an approach could help you to bridge or strengthen weak
zones in a table.

Sometimes it is useful to tighten the rule “each grave contains two types at least, each
type has to be represented in two graves at least”. Especially when analysing archaeological
assemblages derived from settlements it can be useful to raise this minimum from two to three
or four, which will exclude singularities more efficiently.
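Applying this rule by hand is error‐prone, because deleting a type can push a grave below the minimum, and the other way round. The following Python sketch, which assumes NumPy, therefore repeats the check until nothing changes any more. The function name “prune_table”, its parameter “minimum” and the small test table are hypothetical.

```python
import numpy as np

def prune_table(table, graves, types, minimum=2):
    """Repeatedly drop graves with fewer than `minimum` types and types
    occurring in fewer than `minimum` graves, until the table is stable."""
    table = np.asarray(table)
    graves, types = list(graves), list(types)
    while True:
        present = table > 0
        keep_rows = present.sum(axis=1) >= minimum   # types per grave
        keep_cols = present.sum(axis=0) >= minimum   # graves per type
        if keep_rows.all() and keep_cols.all():
            return table, graves, types
        table = table[keep_rows][:, keep_cols]
        graves = [g for g, keep in zip(graves, keep_rows) if keep]
        types = [t for t, keep in zip(types, keep_cols) if keep]

# Hypothetical table: type D occurs in one grave only.
demo = [[1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1]]

pruned, kept_graves, kept_types = prune_table(
    demo, ["g1", "g2", "g3", "g4"], ["A", "B", "C", "D"])
print(kept_graves, kept_types)
```

In this example type D is removed first. Grave g4 is then left with a single type and is removed in the second round. Calling the function with minimum=3 or minimum=4 applies the tightened rule.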

On the other hand, overly large assemblages can ruin the quality of a sequence as well. When
a single feature comprises many more finds than all the other features, it will dominate the
order, which is often (but not always) inappropriate. From an archaeological point of view it
is likely that this feature collected its material over a longer time than the others, which
reduces its value for chronological studies. Therefore one could try to form another rule:
exclude overly frequent types and assemblages with too many finds.

In a real research process there is a long time of working with the tables and making
decisions about graves and types to be included in or excluded from the CA. Where to start?
Should you start with all the material and successively eliminate assemblages and types
which seem to disturb the sequence? Or should you start with a core of well‐known, clearly
suitable types and graves? After having obtained a first good and stable result on their
basis, you could add further material to this core in a process of trial and error. Simple
answer: there is no single and easy way to Heaven. My professional experience shows that it
is better to start with a well‐reasoned core if you are a beginner. A very helpful technique
for finding your own way: write it down before you start with the CA. In the end, the whole
research design has to be published. You can often test a certain solution by trying to write
down your arguments for the final publication. Then you will immediately see which solution
is accompanied by weaker and which by stronger arguments. You shouldn’t start discussing
single cases and decisions here, but introduce the general rules your study is following. The
question you should answer can therefore be specified as: are there transparent arguments and
rules for deleting cases from the study when starting with all the material? And conversely:
are there transparent rules for selecting a core of material for the start, and for adding
further complexes and types successively? The answers to these questions can help you to find
your way.

(11.4) The “edge effect” and how to work with it

The sequence of a table is usually unsatisfactory at both edges. It is common that at the
beginning and at the end of a chronological sequence the archaeological information is
limited in relation to the more central parts of the table. This is a typical cause of the
unsatisfying sequence at the edges. Another reason is the lack of combinations beyond the
edges of the current table. In reality, of course, there were also combinations beyond the
edges. But you can’t see them, and the statistical process can’t calculate them, because they
are not represented in the current table. Therefore types and finds which should be situated
at the borders of your table show only combinations which pull them towards the central part
of the table, and none which pull them towards the edge. It is often wise to accept the fact
that the sequence is not optimal at the edges. But what to do when these outer parts of the
table are important for your analysis? Simple answer: enrich your table with material beyond
the edges. This is often possible by looking for some additional archaeological material
slightly older and/or slightly younger than the material under study. In this way the former
edges aren’t edges any longer but are moved into the more central parts of the table, while
the newly added finds form the edges of your table now, with some edge effect of their own,
of course.

(11.5) On de‐trending, weighting and canonical correspondence analysis

There are several variants of CA which can be applied in special situations. I will give

a short explanation here, ending with the clear advice not to apply them in most cases. The

meaning of the term de‐trending has already been explained above; the procedure is

named Detrended Correspondence Analysis (DCA). De‐trending means removing

the parabola ‐ a quadratic function ‐ from axes 1 and 2 by calculation. This can be useful when

one of these axes is to be related to another linear scale, for example to

estimate real calendar time from the scores of axis 1. The other purpose of de‐

trending is to get better results for the second axis. Whenever you want to interpret the

order of the second axis in more detail, a DCA can be useful. So, if there is a problem

which really needs these ideas, de‐trending is a sound approach, but in

our standard applications it should not be used.
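
The idea of taking the parabola out of the second axis can be illustrated with a minimal sketch in Python with NumPy. It is not the DCA procedure of any particular program: real DCA implementations typically detrend within the calculation of the axes, whereas this sketch only fits a quadratic curve to already computed scores and keeps the residuals. The function name detrend_axis2 and the scores are made up for illustration.

```python
import numpy as np

def detrend_axis2(axis1, axis2, degree=2):
    """Remove the polynomial (default: quadratic) dependence of axis 2 on axis 1."""
    axis1 = np.asarray(axis1, dtype=float)
    axis2 = np.asarray(axis2, dtype=float)
    coeffs = np.polyfit(axis1, axis2, degree)   # least-squares fit of the arch
    return axis2 - np.polyval(coeffs, axis1)    # what remains after removing it

# Made-up scores lying exactly on a parabola, i.e. a pure arch.
axis1 = np.linspace(-1.0, 1.0, 11)
axis2 = axis1 ** 2 - np.mean(axis1 ** 2)
residual = detrend_axis2(axis1, axis2)
```

With these artificial scores nothing but rounding noise is left in residual, which shows that a second axis forming a pure parabola carries no information beyond axis 1.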

Archaeologists often think about weighting in order to express their idea of more or

less important things. Some types seem to be more important than others for achieving a

suitable sequence of the table. Such weighting isn’t forbidden, and the possibility of

weighting is well implemented in the tools of WinBASP. But weighting should not be

too complicated; it should follow simple, clear and precise rules, which are explicitly

listed at the beginning of the study. This could be, for example: decorated beads or

decorated potsherds are counted double in relation to undecorated beads and

undecorated potsherds, because they seem to be more sensitive to chronology. My

personal experience with weighting is: it can be useful, it can make a table more

complicated to read, and often its effect on the final result is less intense than expected.

Therefore my advice is: keep things as simple as possible.
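
A simple weighting rule of this kind can be written down explicitly before the CA is calculated. The following Python sketch is only an illustration and does not reproduce the weighting tools of WinBASP; the type names, the counts and the function apply_type_weights are hypothetical.

```python
import numpy as np

def apply_type_weights(table, type_names, weights):
    """Multiply each type's column of counts by its weight (default weight: 1)."""
    factors = np.array([weights.get(name, 1.0) for name in type_names])
    return np.asarray(table, dtype=float) * factors

# Hypothetical types (columns) and two hypothetical graves (rows).
type_names = ["bead_plain", "bead_decorated", "sherd_plain", "sherd_decorated"]
counts = [[3, 1, 0, 2],
          [0, 2, 4, 0]]

# The rule stated at the beginning of the study: decorated pieces count double.
weights = {"bead_decorated": 2.0, "sherd_decorated": 2.0}
weighted = apply_type_weights(counts, type_names, weights)
```

Keeping the rule in one small dictionary makes it easy to list it explicitly in the publication and to repeat the analysis without weights for comparison.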

Canonical Correspondence Analysis (CCA) is a technique different from CA. Its aim is to

re‐arrange a table with information following the unimodal model into a new sequence,

but along a given “canonical” axis first. A CCA has a specified canonical variable, which

gives the first, “canonical” sequence, which is then followed by further axes (dimensions)

ordered freely, as in the usual CA. If there is a fixed first dimension for all or most

cases in your study, CCA could be a good idea. An example: grave assemblages often

show strong differences by gender. The usual approach of chronological studies is to

perform two different analyses, one for the male and one for the female graves. Theo‐

retically, you could analyse them together in one table and define gender as the canonical

axis, to get a combined chronology on the second axis, i.e. the first free axis. Well, I have

tried this several times and my results were not satisfying. CCA is not a standard process;

it should be applied only when there are good reasons. For some examples and further

details see e. g. Müller & Zimmermann (1997).

You should always be aware of whether you are assuming the unimodal or the

linear model. There is a concept similar to CCA, but for linear models only, which is called

Redundancy Analysis (RDA; see: Jongman, ter Braak & van Tongeren 1995). I have

applied it once to an archaeological problem, where findings from a short stratigraphi‐

cal sequence had to be analysed (Siegmund 1994). The reason for choosing a linear

model in this special case was the short time span embedded in the sequence. When

types and assemblages in general follow the unimodal model, but the time span repre‐

sented in the archaeological sample is very short, your sample shows ‐ or could show ‐

only one half of the bell shaped life curves of the types. In such a case a linear model is

more appropriate.

If you want to see what happens when a wrong model is applied to a data set, you can

get a visualisation by performing a PCA (instead of a CA) with our “1_ideal‐matrix‐ordered”:

Go to PAST, then “Multivariate” >> “Ordination” >> “Principal components

(PCA)”, and analyse the obtained scatter plot. Time has been “folded” now: the beginning

and the end of our table are drawn together into the middle of axis 1.
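
For readers who prefer a script to the menus of PAST, the same comparison can be sketched in Python with NumPy. This is a hedged illustration only: the band matrix built here is a hypothetical stand‐in for an ideal, perfectly ordered table and not the tutorial's own data file, and the two functions are plain textbook formulations of PCA and CA via singular value decomposition, not the routines used by PAST.

```python
import numpy as np

def band_matrix(n_rows=10, width=3):
    """Hypothetical 'ideal' table: each row holds `width` consecutive ones."""
    table = np.zeros((n_rows, n_rows + width - 1))
    for i in range(n_rows):
        table[i, i:i + width] = 1.0
    return table

def pca_row_scores(table, n_axes=2):
    """Row scores of a PCA on column-centred data (linear model)."""
    centred = table - table.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_axes] * s[:n_axes]

def ca_row_scores(table, n_axes=2):
    """Row principal coordinates of a simple CA (unimodal model)."""
    p = table / table.sum()                  # correspondence matrix
    r = p.sum(axis=1)                        # row masses
    c = p.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    resid = (p - expected) / np.sqrt(expected)   # standardised residuals
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return (u[:, :n_axes] * s[:n_axes]) / np.sqrt(r)[:, None]

table = band_matrix()
pca = pca_row_scores(table)
ca = ca_row_scores(table)
```

Plotting the two columns of pca against each other, and likewise the two columns of ca, allows a direct comparison of how each model arranges the rows of the same table. For this band matrix the CA scores on axis 1 follow the original row order (in ascending or descending direction, as the sign of an axis is arbitrary).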

MÜLLER, J. & ZIMMERMANN, A. (Eds.) (1997). Archäologie und Korrespondenzanalyse: Beispiele,

Fragen, Perspektiven. Internationale Archäologie 23. Espelkamp: Marie Leidorf.

SIEGMUND, F. (1994). Jülich. Scherben und Schichten zu den Feuersbrünsten des 15. und 16.

Jahrhunderts. Jülicher Geschichtsblätter = Jahrbuch des Jülicher Geschichtsvereins 62, pp.

131‐184.


(12) Applying the results of a given CA

Sometimes there is a well‐reasoned and established chronology based on a large amount

of material and on a CA, and you want to embed your few findings into this given order.

How to do it? There are three different solutions, all of them acceptable.

(a) Keep things simple and don’t use statistics. Read the reference study you are

using, analyse the phasing of the relevant types there, and put your material into these

phases without any statistics. This is the usual way and not bad at all.

(b) Re‐calculate the given CA with your new, additional data. This approach will (or

should) embed your material into the already established sequence. It is a fine way to

approach the problem, but it is possible (or even very likely, depending on the amount of

added material) that your material will change the order of the types and features of the

given study. If you want to avoid this, you could choose solution (c).

(c) Apply the scores of the given CA to your new data. The position of each grave

(feature) in a given CA can be calculated from the scores of the types, and vice versa;

thereby new features (and types) can be included into a given CA very accurately,

without changing the original order. In statistics, such new features and types are called

“supplementary points” (Greenacre 2007, pp. 89‐96). The calculation can be done in

the following way, assuming that we are calculating the position of a new grave

along axis 1 of a given CA: Take the score of each type along axis 1 of the given CA and

multiply it by the observed frequency of that type in the new grave you want to

integrate. Often this means multiplying by zero, which gives zero. Then

calculate the sum of these products, and divide this sum by the number of objects (not the

number of types) represented in this new feature. The result is the score of the new

grave along axis 1.
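
The calculation described under (c) is a weighted mean and can be sketched in a few lines of Python. The type names, scores and counts below are invented for illustration, and the function supplementary_score is hypothetical. Which scaling of the type scores a given program exports should be checked before applying this to a published CA.

```python
def supplementary_score(type_scores, counts):
    """Score of a new feature on one CA axis: count-weighted mean of type scores.

    type_scores: type name -> score of that type on the chosen axis of the given CA
    counts:      type name -> number of objects of that type in the new feature
    """
    # Types absent from the new feature contribute score * 0 = 0.
    weighted_sum = sum(score * counts.get(name, 0)
                       for name, score in type_scores.items())
    # Divide by the number of objects, not by the number of types.
    n_objects = sum(counts.get(name, 0) for name in type_scores)
    if n_objects == 0:
        raise ValueError("the new feature contains none of the scored types")
    return weighted_sum / n_objects

# Invented axis-1 scores of four types from a reference CA.
type_scores = {"A": -1.2, "B": -0.4, "C": 0.3, "D": 1.1}
# Invented new grave: two objects of type B, one object of type C.
new_grave = {"B": 2, "C": 1}
score = supplementary_score(type_scores, new_grave)
```

Here the sum of the products is 2 × (‐0.4) + 1 × 0.3 = ‐0.5, which is divided by the 3 objects in the grave. Types of the new grave that do not occur in the reference CA have no score and are ignored in this sketch.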

What I wanted to underline by this remark is that the scores of the relevant axes of

a CA are important pieces of information, and therefore they should be published.

(13) Final remark

At first glance, the theory of CA and the practical calculations seem to be complicated.

I wanted to show you that they are easy to understand in a general way, and that the

calculations could be practised quickly. The core of your work should be the archaeologi‐

cal part of such an analysis. It is useful to look for an example close to your specific

problem, and follow this example like following a cookbook on your first steps. It helps

to have an experienced colleague, who could be asked for assistance and discussion

from time to time. Be brave and start to gain your own experience with CA: it is a powerful

and useful method, and you will often need it.

(14) Some further readings

GOLDMANN, K. (1972). Zwei Methoden chronologischer Gruppierung. Acta Praehistorica et

Archaeologica 3, pp. 1‐34.

GOLDMANN, K. (1979). Die Seriation chronologischer Leitfunde der Bronzezeit Europas. Berliner

Beiträge zur Vor‐ und Frühgeschichte NF Bd. 1. Berlin: Spiess.

HAIR, J. F., BLACK, W. C., BABIN, B. J. & ANDERSON, R. E. (2010). Multivariate data analysis. 7th ed.

Upper Saddle River: Pearson Prentice Hall.

HAMMER, Ø., HARPER, D. A. T. & RYAN, P. D. (2001). PAST: Paleontological Statistics Software Package

for Education and Data Analysis. Palaeontologia Electronica 4(1): 9 pp.

IHM, P. (1983). Korrespondenzanalyse und Seriation. Archäologische Informationen 6, pp. 8–21.

IHM, P. & VAN GROENEWOUD, H. (1984). Correspondence Analysis and Gaussian Ordination. COMP‐

STAT Lectures 3, pp. 5‐60.

JONGMAN, R. H. G., TER BRAAK, C. J. F. & VAN TONGEREN, O. F. R. (1995). Data analysis in community

and landscape ecology. Cambridge: Cambridge Univ. Press.

KENDALL, D. G. (1963). A statistical approach to Flinders Petrie's sequence dating. Bulletin of the

International Statistical Institute 40, pp. 657‐680.

MÜLLER, J. & ZIMMERMANN, A. (Eds.) (1997). Archäologie und Korrespondenzanalyse: Beispiele,

Fragen, Perspektiven. Internationale Archäologie 23. Espelkamp: Marie Leidorf.

PETRIE, F. W. M. (1899). Sequences in prehistoric remains. Journal of the Anthropological Institute

29, pp. 295–301.

SOKAL, R. R. & ROHLF, F. J. (2012). Biometry: The principles and practice of statistics in biological

research. New York: Freeman.

TER BRAAK, C. J. F. (1987). Unimodal models to relate species to environment. Wageningen:

Agricultural Mathematics Group.

WILKINSON, E. M. (1974). Techniques of Data Analysis. Seriation Theory. Archaeo‐Physika 5. Köln:

Rheinland‐Verlag.

A proposal for an interesting test and training project: Perform a CA of the data set

published by Oscar Montelius (1885), which shows the foundation of his chronology of

the north European Bronze Age. Montelius’ book includes tables of his material which are

easy to transfer (pp. 270‐311). The book is available online now and the text (but without

these tables) is available in English, too (Montelius 1996).

MONTELIUS, O. (1885). Om tidsbestämning inom bronsåldern. Stockholm: På Akademiens

Förlag.

(https://openlibrary.org/books/OL22888482M/Om_tidsbest%C3%A4mning_inom_brons%C3%A5ldern).

‐ (Incomplete) English translation: MONTELIUS, O. (1996). Dating in the

Bronze Age. Stockholm: Kungl. Vitterhets Historie och Antikvitets akademien.

Author

Priv. Doz. Dr. phil. Frank Siegmund

mail@frank‐siegmund.de

www.frank‐siegmund.de

http://uni‐duesseldorf.academia.edu/FrankSiegmund

* (Very) Extended version of my presentation “Archaeological chronologies based on

correspondence analysis: a practitioner's guide to success and reliability”, University of

Bologna, March 31st, 2014.


Fig. 18. Copy of Koch 1977, table 4.
