
Correspondence Analysis of Identification Data.

Uwe Mortensen, Günter Meinhardt

Westfälische Wilhelms-Universität Münster, Fachbereich Psychologie und Sportwissenschaften, Institut III

D-48149 Münster, Fliednerstr. 21

Abstract: Even when the main interest is in the sensory processes, the process of identifying a visual pattern under threshold conditions is determined by the way sensory and decision processes are interlocked. Important work concerned with these questions was done by e.g. Ashby and Townsend (1986) and Kadlec and Townsend (1992), who investigated the conditions of sampling independence and perceptual and decisional separability as tests for perceptual independence. These tests are computationally involved, and the question is whether there exist simpler, more direct procedures allowing one to test for sampling and perceptual independence. Here, methods of Dual Scaling and in particular Correspondence Analysis (CA), applied to confusion matrices, are suggested. CA of confusion matrices yields scale values for (i) the stimulus patterns and (ii) the responses. It will be argued that, to the extent the sensory activity generated by showing a stimulus pattern can be represented by a random variable, the scale values of the stimuli will provide estimates of the mean values of these random variables, relative to their variances, and the scale values of the responses provide information about possible response biases. The argument carries over to data from discrimination experiments; however, in this paper only data from identification experiments will be considered.


Contents

1 Introduction
2 Models of the identification process
  2.1 The structure of the experiment
  2.2 Models, in particular the GRT
3 Representation of stimuli and responses by scale values
  3.1 Holistic decisions or single component stimuli
  3.2 The estimation of scale values
    3.2.1 Estimation I: maximising the correlation ratio
    3.2.2 Estimation II: Correspondence Analysis
4 The 1-dimensional case
  4.1 Numerical evaluations
  4.2 The analysis of empirical data
    4.2.1 Gabor patches
    4.2.2 Superimposed Gabor patches with and without flankers
5 The 2-dimensional case
  5.1 Ashby and Townsend (1986)
  5.2 Numerical evaluations
    5.2.1 The separable case; r = 0 versus r ≠ 0
    5.2.2 The case r ≠ 0, Gaussian classification
    5.2.3 Conclusions
  5.3 Empirical data: stimuli composed of circular "discs"
6 Summary and discussion
7 Appendix
  7.1 Decomposition of variance
  7.2 Proof of Theorem 3.1
  7.3 Proof of Theorem 3.2
  7.4 Proof of Theorem 3.3
  7.5 Proof of Theorem 3.4
  7.6 Proof of Theorem 3.5


1 Introduction

Data from discrimination and identification experiments may provide insight into perceptual, in particular into coding, processes. In order to explain errors in discrimination and identification judgments one usually assumes that the sensory representation of a stimulus pattern is, under the conditions of the experiment, noisy, so that for nonidentical stimuli the overlap of the corresponding representations may turn out to be sufficiently large to generate wrong identity responses ("confusions"). The subject will, in some implicit way, set up boundaries allowing the classification of sensory representations. The responses will be biased if these boundaries are chosen sub-optimally.

Discrimination and identification experiments have a long history; Townsend and Landon (1983) provided an overview of available models, in particular of the identification process. Ashby and Townsend (1986) discussed the notion of independent processing of features or stimulus dimensions, and Townsend and Ashby (1982) provided experimental tests of identification models. Nosofsky (1986) discussed the generalisation of identification processes to categorisation processes, a topic further investigated by Caelli et al. (1987), Rentschler et al. (1992, 1996) and Jüttner et al. (1997). Balakrishnan and Ratcliff (1996) and Balakrishnan (1998) suggested that comparisons are made by the subject with respect to a subjective likelihood ratio, and provided data that comply with this model (Balakrishnan, 1999); his results suggest that response bias has an effect on the encoding distributions, but no effect on the decision rule. This finding may be interpreted within the framework of models assuming processes of information integration (Busemeyer and Townsend, 1993; Diederich, 1997; Link and Heath, 1975). While these approaches are of high interest, a much simpler (but not necessarily better) ansatz will be made in this paper: it will be assumed that the subject evaluates the difference between the sensory activities, expressed as the difference between the corresponding random variables. It turns out that data from different experiments appear to be compatible with this approach.

One way to characterise the sensory activity is to specify a random vector X = (x1, . . . , xs)′, where the components xi, 1 ≤ i ≤ s, are random variables representing aspects of the neural activity that make up the sensory activity or representation. Given the xi are normally distributed, they are specified once their means and variances are known. If s = 1, a measure for the sensitivity to a pattern is d′ = (µsn − µn)/σ, where µsn is the mean when the pattern is presented, µn is the mean when no pattern is presented, and σ is the standard deviation of X, assumed to be independent of the actual presentation of the pattern (Tanner and Swets, 1954). d′ is independent of the boundaries and thus a measure of sensitivity that is free of bias (provided the equal-variance assumption is correct).
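To illustrate with hypothetical numbers (not taken from any experiment reported here): with µsn = 0.65, µn = 0.45 and σ = 0.10 one obtains d′ = (0.65 − 0.45)/0.10 = 2.0, and the same d′ results for any placement of the decision boundary, which is what is meant by a bias-free measure.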

A violation of these assumptions may lead to misinterpretations of the data (Maloney and Thomas, 1991); however, an explicit test of the assumptions is usually rather time consuming. Thus it is useful to have a method that yields measures that are, on the one hand, equivalent to d′ and that, on the other hand, are not based on particular assumptions concerning the distributions.

It will be argued that the application of Correspondence Analysis (CA) to discrimination and identification data provides such a measure, and additionally allows one to evaluate possible response biases. CA belongs to the class of Dual-Scaling methods. These methods allow one to compute scale values for the row and column categories of a contingency table. The scale values refer to at least one (latent) dimension or attribute that is common to both sets of categories and which may therefore "explain" possibly existing dependencies among row and column categories.

The calculation of the scale values does not require a specific assumption about the conditional distributions of X. The only assumption that has to be made refers to the way the subject utilises the sensory information in order to make a judgment.

2 Models of the identification process

2.1 The structure of the experiment

Let S = {Si | 1 ≤ i ≤ I} and T = {Tj | 1 ≤ j ≤ J} be two sets of stimulus patterns, with S ⊆ T, implying I ≤ J. For the special case S ⊂ T one has I < J, i.e. there exist J − I patterns Tj that are not identical to any element Si in S. All patterns Tj are defined such that, under given experimental conditions, they are confusable with at least some Si ∈ S.

Let nij be the number of times Si has been judged to be identical with Tj; the nij can be summarised in a table of the following form, which will be referred to as a confusion matrix.

Table 1: Data: form of a confusion matrix

        T1     T2     ···    TJ
  S1    n11    n12    ···    n1J    n1+
  S2    n21    n22    ···    n2J    n2+
  ...   ...    ...           ...    ...
  SI    nI1    nI2    ···    nIJ    nI+
  Σ     n+1    n+2    ···    n+J    N

In the following, I = J, so there is no classification of the stimuli.
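As a minimal sketch of how such a table can be tabulated from raw trial records (numpy is assumed; the short trial lists below are hypothetical and serve only as an illustration, not as data from the experiments reported later):

    import numpy as np

    def confusion_matrix(stimuli, responses, I, J):
        """Tabulate n_ij: how often stimulus i received response j (cf. Table 1)."""
        K = np.zeros((I, J), dtype=int)
        for s, t in zip(stimuli, responses):
            K[s, t] += 1
        return K

    # hypothetical trial records: stimulus index and response index per trial
    stimuli   = [0, 0, 1, 2, 3, 1, 2, 3, 0, 1]
    responses = [0, 1, 1, 2, 3, 0, 2, 2, 0, 1]
    K = confusion_matrix(stimuli, responses, I=4, J=4)
    print(K)               # rows: S_i, columns: T_j
    print(K.sum(axis=1))   # row sums n_i+
    print(K.sum(axis=0))   # column sums n_+j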

2.2 Models, in particular the GRT

One may distinguish different types of models of pattern identification:

1. Template matching as holistic identification: The pattern is detected as a whole. Special models focus on the notion of a matched filter and the maximisation of cross correlations, e.g. Hauske, Wolf and Lupp (1976), Burgess (1985) and Meinhardt and Mortensen (1998),

2. Component identification: Here the stimulus is identified according to the classification/identification of its components. A special case is that of stimuli defined by a single parameter. The components may again be identified via template matching.

3. The role of the stimulus environment: A general question is whether lateral, irrelevant stimulus components play a role. No explicit models will be discussed, although the effect of such components can be investigated directly.

Neural activities: Let Ni be the neural activity generated by the presentation of the stimulus Si. The following characterisation of Ni follows the notation of Ashby and Townsend (1986). It will be assumed that in a given trial, Ni can be represented by a random vector xi. The components of these vectors reflect aspects of the neural activities, in particular aspects representing the components of the stimuli that are varied among the stimuli: for instance, the stimuli may be defined by two components A and B, e.g. Si = AjBk, where Aj and Bk are the j-th and k-th level of A and B, respectively. Then xi = (xi1, xi2)′, and xi1 and xi2 represent the perceptual effects generated by Aj and Bk. fi(x1, x2) is the distribution of the perceptual effects.

It is conceivable that, even if the stimuli are defined as in what Ashby and Townsend call a Complete Identification Experiment (see, however, Kadlec and Townsend (1992)), i.e. if the stimuli are defined as A1B1, A1B2, A2B1 and A2B2, suggesting a component-wise identification to the subject, the subject may process the stimuli in a holistic manner, so that decisions are made with respect to a single random variable, xi = Xi. Template matching would be an example for this, and Xi could be some measure of overall similarity. This may be seen in the context of the separability versus integrality dichotomy (see below).

General Recognition Theory (GRT): The GRT may be taken as a general framework to discuss the questions arising from identification experiments. General notions are introduced here; special notions for 2-dimensional stimuli follow in the corresponding section below.

1. Physical dimensions: X, Y are physical measures of the components. For single component stimuli, X1, . . ., X4 are the values of the parameter with respect to which the stimuli differ (example: a spatial frequency parameter).

2. Perceptual dimensions: x1, x2, as above.

3. Decisional dimensions: r(x1, x2) is the response function for the selection of a particular response on the basis of x1, x2; Rr(x1, x2) is the global response function for performance over the course of many trials (d′, or χ²i).

4. Response ajbk to stimulus AjBk.

5. Density functions: In the GRT, the densities will be assumed to be Gaussian.


3 Representation of stimuli and responses by scale values

3.1 Holistic decisions or single component stimuli

Suppose the stimulus is evaluated with respect to a single random variable, so that xi = Xi. It will be assumed that the subject decides for the response Tj if Xi ∈ Aj, where Aj ⊂ R is a set characteristic for Sj. Let A be the union of the ranges Aj:

A = ⋃_j Aj,   Aj ∩ Aj′ = ∅ for j ≠ j′.   (1)

The aim is to represent the Si by scale values αi such that the αi reflect the average perceptual effect of the corresponding Si. Similarly, each response may be represented by a scale value βj. The general definition of the scale values will be introduced first; the interpretation, in particular of the relation between the αi and the corresponding βi, will be discussed later.

Definition 3.1 Let X be the random variable representing the activity (the perceptual effect) generated by a stimulus. Then

αi = E(X|Si) (2)

To prepare the definition of the scale values for the responses, the definition of αi will be re-written. If Si is presented, one may simply write Xi instead of X under the condition of Si, or fi(X), where fi is the density of X given Si was presented. So

αi = ∫_A X f(X|Si) dX = ∫_A X fi(X) dX.   (3)

Because of (1), one may write

αi = Σ_j ∫_{Aj} X fi(X) dX.   (4)

Writing Xi instead of X in order to indicate that X is considered given Si was presented, one may define

E(Xi ∈ Aj) = ∫_{Aj} X fi(X) dX,   (5)

so that one has

αi = Σ_{j=1}^{J} E(Xi ∈ Aj).   (6)

One may then introduce the definition

Definition 3.2 The scale value βj for the response Tj is given as the sum of expected values

βj = Σ_{i=1}^{I} E(Xi ∈ Aj).   (7)
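A minimal numerical sketch of Definitions 3.1 and 3.2, assuming Gaussian densities fi and hypothetical boundaries defining the sets Aj (the particular means, standard deviations and boundaries below are illustrative assumptions; numpy and scipy are assumed):

    import numpy as np
    from scipy import integrate, stats

    mu     = [0.22, 0.45, 0.55, 0.78]              # hypothetical means of the X_i
    sigma  = [0.15, 0.12, 0.16, 0.17]              # hypothetical standard deviations
    bounds = [-np.inf, 0.35, 0.50, 0.65, np.inf]   # hypothetical sets A_1, ..., A_4

    # partial expectations E(X_i in A_j), eq. (5)
    E = np.zeros((4, 4))
    for i in range(4):
        f = stats.norm(mu[i], sigma[i]).pdf
        for j in range(4):
            E[i, j], _ = integrate.quad(lambda x: x * f(x), bounds[j], bounds[j + 1])

    alpha = E.sum(axis=1)   # eq. (6): alpha_i, close to the corresponding mu_i
    beta  = E.sum(axis=0)   # eq. (7): beta_j
    print(alpha)
    print(beta)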


Comments:

1. Relation between definitions: Formally, the definitions of αi and βj differ with respect to the summation: αi is the sum of the expected values E(Xi ∈ Aj) over the responses Tj, while βj is the sum of the expected values E(Xi ∈ Aj) over the different stimuli Si.

Figure 1: Densities over the interval [g2, g3); Gaussian distributions (IV), see Fig. 5. [Panels (a)–(d) show densities of the decision variable; figure graphics not reproduced.]

2. Relation between scale values: Let us briefly consider the relation between the αi and βj. A sufficient condition for αi → E(Xi ∈ Ai) and consequently βi → αi is E(Xi ∈ Aj) → 0 for i ≠ j. This condition may be relaxed; one will observe αi ≈ βi if nii > nij, i ≠ j, and if the nij decrease with decreasing similarity of Si and Sj, i.e. with increasing difference of the parameter defining the difference between Si and Sj. See also the role of the decision boundaries discussed below.

3. Choice of variables: So far, no further specification of the random variables Xi has been given, other than that they represent the "perceptual effects" or the "neural activities" generated by the stimuli. For certain stimuli, e.g. ones that are composed of line elements (Townsend and Nosofsky ?), one may simply assume that X represents the activity of a line detector. For other stimuli, there is no such straightforward interpretation, for instance when stimuli are defined by the superposition of two "discs" of different radii, or by the superposition of Gabor patches defined by different spatial frequency parameters. One may assume that the subject searches for aspects of the neural activity that allow the best discrimination among the stimuli. This assumption underlies the approach adopted to estimate the αi and βj, described in the following section. It turns out that the Aj do not have to be estimated in order to estimate these values; however, since the values of βj depend on their definition, one may refer to the Aj when it comes to interpreting the estimated βj-values.


3.2 The estimation of scale values

The scale values αi are defined as expected values, and so one may try to estimate them by the arithmetic means of the Xi. Unfortunately, the Xi are not observable. However, given assumption 3 above, one may arrive at estimates of these arithmetic means. To this end, the assumption has to be cast into a form that allows such an estimation. One possibility of achieving good discrimination is to choose the Xi in such a way that the differences among the αi, i.e. the variance of the αi (the "between" variance), are maximal relative to the variance of the Xi (the "within" variance). As in discriminant analysis, one may therefore estimate the αi by (i) decomposing the total sum of squares SStot of the Xi into the components SSb, representing the "between" variance, and SSw, representing the "within" variance, and (ii) finding estimates ai of the αi by maximising either the quotient SSb/SSw or the quotient ρ² = SSb/SStot. The latter quotient, ρ², is the correlation ratio introduced by Guttman (1941). The estimates of the scale values turn out to be equivalent to those arrived at by employing Correspondence Analysis; however, in contrast to the maximisation of ρ², the derivation of Correspondence Analysis does not refer explicitly to the Xi. Therefore, the maximisation of ρ² will be briefly indicated; the approach has been exposed, in a different context, by Nishisato (1980). Correspondence Analysis (CA) will be introduced more explicitly since it arrives at scale values for the αi and βj and at the same time relates them to χ²-components, thus facilitating the interpretation of the results; CA will therefore actually be employed to analyse the data.

3.2.1 Estimation I: maximising the correlation ratio

To begin with, suppose that each of the stimulus patterns Si is presented n times. Let Iij be the set of integers indexing the trials on which the response to Si was Tj, so if k ∈ Iij, then in the k-th trial the response Tj was given to Si; the neural activity generated by the presentation of Si can then be characterised by xik, and of course k ≤ n. Let nij = |Iij|, i.e. the response Tj was given altogether nij times when Si was presented.

Let

x̄ij = (1/nij) Σ_{k∈Iij} xik,   (8)

x̄i+ = (1/ni+) Σ_{j=1}^{J} nij x̄ij,   ni+ = Σ_j nij,   (9)

x̄+j = (1/n+j) Σ_{i=1}^{I} nij x̄ij,   n+j = Σ_i nij.   (10)

The means x̄i+ and x̄+j could be taken as estimates of αi and βj if the xik were observable. Since the xik cannot be observed directly, some restrictions have to be introduced. Let

ai = (1/J) Σ_{j=1}^{J} x̄ij,   (11)

and suppose xik in (8) is replaced by ai, i.e. by their mean value as a least squares approximation, taken with respect to the response alternatives Tj. Then x̄ij is approximated by

x̂ij = (1/nij) Σ_{k∈Iij} ai = nij ai / nij = ai.   (12)

Replacing x̄ij by the approximation x̂ij in (9) and (10) gives

x̂i+ = (1/ni+) Σ_{j=1}^{J} nij x̂ij = (1/ni+) Σ_{j=1}^{J} nij ai = ai,   (13)

x̂+j = (1/n+j) Σ_{i=1}^{I} nij x̂ij = (1/n+j) Σ_{i=1}^{I} nij ai = bj,   (14)

i.e. ai is an estimate for x̄i+ and therefore for αi, and bj is an estimator for x̄+j and therefore for βj.

To arrive at an explicit expression for the estimation of the ai and bj, the substitution of ai for xik introduced above will be formulated formally in terms of a mapping s: xik ↦ ai, i.e. s is a function of xik such that s(xik) = ai for all j if k ∈ Iij. Since xik is a random variable, s(xik) will be a random variable as well, so it makes sense to define means and variances for s. Writing sik instead of s(xik) for short, one has

s̄j = (1/n+j) Σ_{i=1}^{I} Σ_{k∈Iij} sik = (1/n+j) Σ_{i=1}^{I} nij ai.   (15)

s̄j is the mean of the sik over all i for a given Tj. Because of (14),

s̄j = bj.   (16)

The overall mean of the sik is given by s̄ = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k∈Iij} sik / N. The total variance of the sik is

SStot = Σ_{j=1}^{J} Σ_{k=1}^{n+j} (sik − s̄)².   (17)

As is well known from the analysis of variance, SStot can be decomposed into a "within" component and a "between" component:

SStot = Σ_{j=1}^{J} Σ_{k=1}^{n+j} (sik − s̄j + s̄j − s̄)²
      = Σ_{j=1}^{J} Σ_{k=1}^{n+j} (sik − s̄j)² + Σ_{j=1}^{J} n+j (s̄j − s̄)²   (18)
      = SSw + SSb.   (19)

The first sum on the right of (18) is the "within" component (SSw), and the second is the "between" component SSb. SStot depends on the scale values ai, as is clear from the definition sik = ai and from (15). The ai may now be estimated by maximising SSb relative to the value of SStot, which implies that SSw will be minimised relative to SStot.


Without loss of generality one may put s̄ = 0, so that

SStot = Σ_{j=1}^{J} Σ_{k=1}^{n+j} s²ik,   SSb = Σ_{j=1}^{J} n+j s̄²j.   (20)

Let K = (nij) be the matrix of confusion frequencies (as given in Table 1). Further, let

Drs = diag(n1+, n2+, . . . , nI+), Dcs = diag(n+1, n+2, . . . , n+J ). (21)

be the diagonal matrices of the row and column sums, respectively, of K. Let a = (a1, . . . , aI)′ and b = (b1, . . . , bJ)′. In terms of these vectors, SStot can be expressed in the form

SStot = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k∈Iij} s²ik = Σ_{i=1}^{I} ni+ a²i = a′ Drs a,   (22)

and making use of (15), one gets

SSb = Σ_j (1/n+j) (Σ_i nij ai)² = a′ K Dcs^(−1) K′ a.   (23)

Guttman's (1941) correlation ratio is then given by

ρ² = SSb / SStot = (a′ K Dcs^(−1) K′ a) / (a′ Drs a).   (24)

To illustrate the meaning of ρ2, consider the following cases:

1. ρ² → 0 if SSb → 0; so a small value of ρ² means that it is difficult for the subject to discriminate between the patterns, since SSb → 0 means that the differences s̄j − s̄ are small compared to the sik − s̄j, meaning that the Si are perceived as being very similar. The similarity of the Si will therefore be reflected by scale values for the Si that are close together.

2. ρ² → 1 for SSb → SStot, meaning that the sik − s̄j are small compared to the s̄j − s̄, so the Si can be discriminated well, and the ai are well separated.

The estimated values for the ai and bj may be conceived as components of the vectors a = (a1, . . . , aI)′ and b = (b1, . . . , bJ)′. One then finds

Theorem 3.1 ρ² is maximal relative to the value of SStot if a = â, satisfying the equation

Drs^(−1) K Dcs^(−1) K′ â = µ â,   µ = ρ²max,   (25)

i.e. a is estimated by the eigenvector â of the matrix Drs^(−1) K Dcs^(−1) K′, and the corresponding eigenvalue µ equals ρ²max. b = (b1, . . . , bJ)′ is given by (16), i.e. by bj = (1/n+j) Σ_{i=1}^{I} nij ai, j = 1, . . . , J.

Proof: see the Appendix, section 7.2. □
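A minimal sketch of this estimation, assuming numpy and using the confusion frequencies of Table 3 below as data. The matrix of eq. (25) always has the trivial eigenvalue 1 with a constant eigenvector, so the relevant solution is the largest non-trivial eigenvalue; for these data it should come out close to the value λ1 = .615 reported in section 4.1:

    import numpy as np

    K = np.array([[778, 191,  29,   2],     # confusion frequencies of Table 3
                  [169, 493, 302,  37],
                  [ 90, 288, 387, 236],
                  [  4,  45, 200, 751]], dtype=float)

    Drs_inv = np.diag(1.0 / K.sum(axis=1))  # D_rs^(-1)
    Dcs_inv = np.diag(1.0 / K.sum(axis=0))  # D_cs^(-1)

    M = Drs_inv @ K @ Dcs_inv @ K.T         # matrix of eq. (25)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(evals.real)[::-1]
    # the largest eigenvalue is always 1, with a constant eigenvector (the trivial
    # solution); the next one is mu = rho^2_max, its eigenvector gives the a_i
    mu = evals.real[order[1]]
    a  = evecs.real[:, order[1]]
    b  = Dcs_inv @ K.T @ a                  # b_j = (1/n_+j) sum_i n_ij a_i, eq. (16)
    print(mu)
    print(a)
    print(b)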


Remark: There may exist more than a single eigenvector a satisfying (25). The additional eigenvectors may reflect the fact that the subject makes use of more than a single aspect of the neural activities representing the patterns. This possibility will be discussed below in the context of the solutions arrived at by employing Correspondence Analysis in order to find the estimators a and b.

Interpretation of the scale values: The ai were introduced first as estimates x̂i+ for the x̄i+, see (13), and thus represent mean values, without reference to the variances of the random variables representing the neural representation. The estimates (25), on the other hand, are defined relative to the variation as defined by SStot. The difference ai − ai′ will be "small" when the difference between the corresponding means is small relative to the variance of the distributions, and "large" when the difference between the means is large relative to their variances. It will be demonstrated in section ?? that unequal variances may lead to a lack of proportionality of the ai to the x̄i+.

3.2.2 Estimation II: Correspondence Analysis

Let again K = (nij), i, j = 1, . . . , m, be a confusion matrix; nij is the frequency with which the stimulus pattern Si is confused with stimulus Sj (indicated by giving the response Rj). Let ni+ and n+j be defined as before, and let N = Σ_{i,j} nij. ni+ n+j / N is the expected number of confusions of Si with Sj, provided the subject judges randomly. Let

xij = (nij − ni+ n+j / N) / √(ni+ n+j / N).   (26)

The xij are called residuals, or weighted residuals. Obviously, the residuals xij reflect the dependencies among stimuli and responses. Certainly,

χ² = Σ_{i=1}^{I} Σ_{j=1}^{J} x²ij.   (27)

Let pij = nij / N, ri = ni+ / N, cj = n+j / N, and

tij = (pij − ri cj) / √(ri cj) = xij / √N;   (28)

then

χ²/N = Σ_{i=1}^{I} Σ_{j=1}^{J} t²ij.   (29)

χ²/N is called the inertia of the confusion matrix, denoted by In(K), i.e. In(K) = χ²/N. In(K) (and, analogously, χ²) can be written as the sum of components characterising either the patterns Si or the Tj.

Ini(K) = χ²i/N = Σ_{j=1}^{J} t²ij,   Inj(K) = χ²j/N = Σ_{i=1}^{I} t²ij.   (30)

11

Page 12:  · Correspondence Analysis of Identification Data. Uwe Mortensen, G¨unter Meinhardt Westf¨alische Wilhelms-Universit ¨at M ¨unster Fachbereich Psychologie und Sportwissenschafte

Certainly, In(K) = Σ_j χ²j/N = Σ_i χ²i/N. The scaling provided by CA can be interpreted with respect to inertia (equivalently: χ²-) components.

In the context of CA, the inertia instead of the χ² will be considered in the following, in particular since statistical packages refer to the inertia rather than to the χ² (e.g. STATISTICA).
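As a minimal sketch (numpy assumed), equations (26)–(30) can be computed directly from a confusion matrix; using the confusion frequencies of Table 3 below as data, the total χ² should come out close to the value 3548.69 given there:

    import numpy as np

    K = np.array([[778, 191,  29,   2],     # confusion frequencies of Table 3
                  [169, 493, 302,  37],
                  [ 90, 288, 387, 236],
                  [  4,  45, 200, 751]], dtype=float)
    N = K.sum()
    P = K / N                                # relative frequencies p_ij
    r = P.sum(axis=1)                        # row masses r_i
    c = P.sum(axis=0)                        # column masses c_j

    T = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # residuals t_ij, eq. (28)
    inertia = (T ** 2).sum()                 # In(K) = chi^2 / N, eq. (29)
    chi2    = N * inertia
    In_rows = (T ** 2).sum(axis=1)           # components In_i(K), eq. (30)
    In_cols = (T ** 2).sum(axis=0)           # components In_j(K)
    print(chi2, inertia)
    print(In_rows)
    print(In_cols)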

Spatial representations and inertia decompositions: Let r = (r1, . . . , rI)′ and Dr = diag(r1, . . . , rI) the diagonal matrix having the ri in its diagonal, with ri = ni+/N, ni+ = Σ_j nij. Analogously, c = (c1, . . . , cJ)′, cj = Σ_i nij/N, and Dc = diag(c1, . . . , cJ). Let T = (tij) be the matrix of tij-values; note that according to (28),

T = Dr^(−1/2) (P − r c′) Dc^(−1/2).   (31)

As is well known from linear algebra, the column or row vectors of T can be represented as linear combinations of some orthogonal basis vectors of the I- or J-dimensional vector space, respectively. The Singular Value Decomposition (SVD) provides such basis vectors, which turn out to be related to the vectors a and b of scale values as characterised in Theorem 3.1.

Theorem 3.2 The SVD of T is given by

T = U Λ^(1/2) V′,   (32)

with U the matrix of normalised eigenvectors of TT′, V the matrix of normalised eigenvectors of T′T, and Λ^(1/2) the diagonal matrix diag(√λ1, . . . , √λs), where the λk, 1 ≤ k ≤ s ≤ min(I, J), are the nonzero eigenvalues of TT′ and T′T.

Proof: see Appendix, section 7.3.

Comments: Let uik be the i-th component of the k-th column vector of U, and let vjk be the j-th component of the k-th column vector of V. The uik or the uik√λk may be taken as coordinates of a point representing the pattern Si in an s-dimensional space R^s whose axes (possibly) represent some aspects of the neural representations Ni of Si. Analogously, the vjk or the vjk√λk may be taken as coordinates of a point representing Tj in R^s. The simultaneous presentation of Si- and Tj-points is known as a biplot, which reflects structural relations between the Si and the Tj. The interpretation of the biplot will be based upon the distances between the Si-points on the one hand and the Tj-points on the other. However, the choice of the coordinates determines a particular metric, and the interpretation will therefore be influenced by that metric; an appropriate re-scaling of the coordinates may thus lead to better interpretations. □

Definition of scale values: The coordinates of the Si and Tj are defined as follows:

F = Dr^(−1/2) U Λ^(1/2),   (33)
G = Dc^(−1/2) V Λ^(1/2)   (34)

(Greenacre, 1984, p. 89). The element fik in the i-th row and k-th column of F is the coordinate of a point representing Si on the k-th axis. Analogously, the element gjk of G is the coordinate of the point representing Tj on the same axis. The coordinates F and G turn out to be equivalent to those found by maximising the correlation ratio ρ². Note that between the fik and the uik the following relations exist:

fik = uik √λk / √ri,   uik = fik √ri / √λk,   (35)

gjk = vjk √λk / √cj,   vjk = gjk √cj / √λk.   (36)
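A minimal sketch of (32)–(34), again assuming numpy and using the confusion frequencies of Table 3 below; note that the signs of the columns of U and V, and hence of F and G, are only determined up to ±1 by the SVD:

    import numpy as np

    K = np.array([[778, 191,  29,   2],     # confusion frequencies of Table 3
                  [169, 493, 302,  37],
                  [ 90, 288, 387, 236],
                  [  4,  45, 200, 751]], dtype=float)
    N = K.sum()
    P = K / N
    r, c = P.sum(axis=1), P.sum(axis=0)

    T = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # eq. (31)
    U, sv, Vt = np.linalg.svd(T, full_matrices=False)   # eq. (32); sv_k = sqrt(lambda_k)
    lam = sv ** 2                                       # eigenvalues lambda_k
    F = U * sv / np.sqrt(r)[:, None]                    # row coordinates, eq. (33)
    G = Vt.T * sv / np.sqrt(c)[:, None]                 # column coordinates, eq. (34)

    print(F[:, 0])          # scale values f_i1 of the stimuli on the first dimension
    print(G[:, 0])          # scale values g_j1 of the responses
    print(lam / lam.sum())  # proportions of inertia per dimension, cf. eq. (42) below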

Relation to the squared correlation ratio ρ²: The following theorem shows that Correspondence Analysis is equivalent to maximising ρ², as discussed in section 3.2.1:

Theorem 3.3 The matrices F and G, as defined in (33) and (34), satisfy the following eigenvector equations:

(Dr^(−1) P Dc^(−1) P′) F = F Λ,   (37)

(Dc^(−1) P′ Dr^(−1) P) G = G Λ.   (38)

Proof: The proof may be found in section 7.4 of the Appendix. □

Remark: Starting from the SVD (32) makes it difficult to see how the scale values, given in the matrices F and G, relate to the mean values of the underlying distributions of the criterion variable η. The relations become obvious on noting that the equations (37) and (38) correspond to (25) of Theorem 3.1. It follows that the scaling provided by CA is equivalent to that obtained when the correlation ratio is maximised. Note that the equations (37) and (38) result from a rescaling of the eigenvectors in U and V, as given by the SVD (32), which is basically a principal component analysis (PCA) of the matrix T of residuals. The equivalence of these equations with (25) thus establishes a relation between PCA and Discriminant Analysis.

Relation between inertia components and spatial representation: To see how the spatial representation of the Si and the Tj relates to χ²- or inertia components, let us first look at the tij. According to (32), the element tij of T is given by tij = Σ_k uik vjk √λk. Then t²ij = Σ_k u²ik v²jk λk + Σ_{k≠k′} uik vjk uik′ vjk′ √(λk λk′), and

χ²i/N = Σ_{k=1}^{s} u²ik λk = ri Σ_{k=1}^{s} f²ik,   (39)

because of (35) and because Σ_j v²jk = 1 (the vectors in V are normalised) and Σ_j vjk vjk′ = 0 (the vectors in V are orthogonal). Analogously, one finds

χ²j/N = Σ_{k=1}^{s} v²jk λk = cj Σ_{k=1}^{s} g²jk.   (40)


Ini(K) = χ²i/N is the inertia component due to the pattern Si, and Inj(K) = χ²j/N is the inertia component due to the pattern Tj.

As an immediate consequence of (39) and (40) one has

χ²/N = Σ_{i=1}^{I} χ²i/N = Σ_{k=1}^{s} λk Σ_{i=1}^{I} u²ik = Σ_{k=1}^{s} λk,   (41)

since the eigenvectors are normalised, i.e. Σ_i u²ik = 1, so that

πk = λk / (χ²/N)   (42)

is the proportion of the total inertia due to the k-th latent dimension; this corresponds to the role of the λk in an ordinary PCA, where the eigenvalues λk reflect the proportion of variance explained by the corresponding dimension.

To see how the spatial representations of the Si and Tj in the biplot relate to the inertia components, let us consider the point of Si, say. The coordinates of this point are given by fi1, . . . , fis. The square of the Euclidean distance from the origin to this point is given by

d²i = Σ_{k=1}^{s} f²ik,   (43)

so that, from (39),

χ²i/N = ri d²i,   (44)

and, analogously,

χ²j/N = cj d²j,   (45)

where dj is the distance from the origin of the point representing Tj. The distance between two points, one for Si and the other for Si′, is given by

d²ii′ = Σ_{k=1}^{s} (fik − fi′k)²;   (46)

there is, however, no simple relation to the inertia of K. The distance between two points representing Tj and Tj′ is defined analogously.

Scale values and inertia components: Consider the vector from the origin of the coordinate system to the point representing Si. The vector has length di, and according to (44), ri d²i = χ²i/N. The projection of di on the k-th axis equals, by definition, fik, see Fig. 2, and one has

di cos ϑik = fik.   (47)

It follows that cos² ϑik = f²ik / d²i, and multiplying and dividing the right hand side by ri yields

cos² ϑik = ri f²ik / (ri d²i) = ri f²ik / Ini(K).   (48)


Figure 2: Projection of Si on the k-th axis. [Sketch showing the point Si at distance di from the origin, its projection fik onto the k-th axis, and the angle ϑik; graphics not reproduced.]

But ri f²ik is the contribution of the k-th dimension to Ini(K), the component of the inertia (χ²) due to Si. Therefore, cos² ϑik measures the contribution of the k-th dimension to Ini(K), i.e. to the inertia or χ²-component generated by Si.

The interpretation of cos² ϑjk = cj g²jk / Inj(K) is analogous: this value reflects the contribution of the k-th dimension to the inertia or χ²-component due to Tj.

χ²-distances and differences between scale values: The distance between two points representing two stimuli is meant to represent the subject's ability to discriminate between these two stimuli. It is plausible to assume that the discriminability does not only depend upon the difference between the expected values of the random variables representing the sensory activities resulting from the stimulus presentations, but also upon their variances. The variances determine the distribution of frequencies in the rows representing the stimuli. The more similar the frequency distributions of two rows, the more difficult it will be to discriminate between the corresponding stimuli. The relation between distances and variances turns out to be of importance for the interpretation of the results of a correspondence analysis. This leads to the following definition:

Definition 3.3 Consider the rows corresponding to the stimuli Si and Si′. Let

δ²ii′ = Σ_{j=1}^{J} (1/cj) (pij/ri − pi′j/ri′)² = Σ_{j=1}^{J} (1/n·j) (nij/ni· − ni′j/ni′·)².   (49)

δ²ii′ is called the χ²-distance between the stimuli Si and Si′. Analogously,

δ²jj′ = Σ_{i=1}^{I} (1/ri) (pij/cj − pij′/cj′)² = Σ_{i=1}^{I} (1/ni·) (nij/n·j − nij′/n·j′)²   (50)

is called the χ²-distance between the responses (column categories) Rj and Rj′.

δ²ii′ will be small if nij/ni· ≈ ni′j/ni′· for all j; since all stimuli occur with equal frequency, ni· = ni′·, so δ²ii′ will be small when nij ≈ ni′j for all j, i.e. when the frequency distributions of row i and row i′ are sufficiently similar. In this case the differences between the two distributions do not contribute much to the χ²-value of the confusion matrix. The following theorem establishes the relation between a χ²-distance and the corresponding Euclidean distance:


Theorem 3.4 The relation between the Euclidean distance dii′ and the corresponding χ²-distance δii′ is given by

dii′ = δii′.   (51)

Analogously, one has for the responses

djj′ = δjj′.   (52)

The proof is given in section 7.5 of the Appendix. □

So the value of dii′ reflects the extent to which the differences between the confusion frequencies for stimulus Si and those for Si′ contribute to the overall χ² of the confusion matrix. dii′ is thus a measure of the ability of the subject to discriminate between Si and Si′. If only a single dimension is relevant to explain the confusions, i.e. if the subject refers to a single decision variable, then the difference between two scale values, one for Si and the other for Si′, reflects the ability to discriminate between Si and Si′, and since the scale values are meant to be proportional to the expected values of the decision variable, given a particular stimulus was shown, dii′ is thus a measure of sensitivity equivalent to d′; see section ?? for a more explicit discussion of this point.
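A minimal numerical check of Theorem 3.4 (numpy assumed; the confusion frequencies of Table 3 below again serve as data):

    import numpy as np

    K = np.array([[778, 191,  29,   2],     # confusion frequencies of Table 3
                  [169, 493, 302,  37],
                  [ 90, 288, 387, 236],
                  [  4,  45, 200, 751]], dtype=float)
    N = K.sum()
    P = K / N
    r, c = P.sum(axis=1), P.sum(axis=0)
    T = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(T, full_matrices=False)
    F = U * sv / np.sqrt(r)[:, None]        # row coordinates, eq. (33)

    def chi2_distance(i, k):
        """Chi-square distance between the row profiles of S_i and S_k, eq. (49)."""
        return np.sqrt((((P[i] / r[i] - P[k] / r[k]) ** 2) / c).sum())

    # Theorem 3.4: the chi-square distance equals the Euclidean distance in F
    print(chi2_distance(0, 1), np.linalg.norm(F[0] - F[1]))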

Relation between the Si- and Tj-points. The relation between the points for the Si and those for the Tj is, of course, also of interest. However, this relation cannot be discussed in terms of distances, since the distance between the point representing Si and the point representing Tj is not defined. Instead, the relation between the coordinates F and G is given by the following theorem:

Theorem 3.5 The relation between the coordinates F of the Si and G of the Tj is given by the equations

F = Dr^(−1) P G Λ^(−1/2),   (53)

G = Dc^(−1) P′ F Λ^(−1/2).   (54)

Proof: The proof may be found in section 7.6 of the Appendix. □

To illustrate, let us consider the relation (54) between the scale values of T1, . . . , TJ and of S1, . . . , SI on the k-th dimension; one gets the equations

g1k = (p11/c1)(f1k/√λk) + (p21/c1)(f2k/√λk) + · · · + (pI1/c1)(fIk/√λk)
g2k = (p12/c2)(f1k/√λk) + (p22/c2)(f2k/√λk) + · · · + (pI2/c2)(fIk/√λk)
  ...
gJk = (p1J/cJ)(f1k/√λk) + (p2J/cJ)(f2k/√λk) + · · · + (pIJ/cJ)(fIk/√λk).   (55)

So the scale value gjk of Tj on the k-th dimension may be seen as a sum of the "weighted" fik. One may ask when the point for Tj is near the point for Si. This will be the case when fik ≈ gjk for all k, that is, if the fik and the gjk have similar values for all k. A sufficient condition for fik and gjk to assume similar values is that pij is large compared to the values of pij′, j ≠ j′. So if Si is predominantly identified with Tj, then the points for Si and Tj will be close together.
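The transition formulas (53) and (54) can be checked numerically along the same lines as the previous sketches (numpy assumed; only the dimensions with nonzero λk are retained, since Λ^(−1/2) is not defined for λk = 0):

    import numpy as np

    K = np.array([[778, 191,  29,   2],     # confusion frequencies of Table 3
                  [169, 493, 302,  37],
                  [ 90, 288, 387, 236],
                  [  4,  45, 200, 751]], dtype=float)
    N = K.sum()
    P = K / N
    r, c = P.sum(axis=1), P.sum(axis=0)
    T = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(T, full_matrices=False)

    keep = sv > 1e-8 * sv[0]                # drop dimensions with lambda_k = 0
    U, sv, V = U[:, keep], sv[keep], Vt[keep].T
    F = U * sv / np.sqrt(r)[:, None]
    G = V * sv / np.sqrt(c)[:, None]

    F_from_G = np.diag(1 / r) @ P @ G / sv      # eq. (53): F = D_r^(-1) P G Lambda^(-1/2)
    G_from_F = np.diag(1 / c) @ P.T @ F / sv    # eq. (54): G = D_c^(-1) P' F Lambda^(-1/2)
    print(np.allclose(F, F_from_G), np.allclose(G, G_from_F))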

4 The 1-dimensional case

4.1 Numerical evaluations

The definition of the scale values αi and βj was made without reference to a specific type of density for the random variables Xi, i = 1, . . . , I. In order to find out to which extent the CA of a confusion matrix yields estimates of the αi and βj, specific assumptions have to be made. With reference to the General Recognition Model of Ashby and Townsend (1986) it will now be assumed that the densities are Gaussian. However, the Gaussian assumption does not appear to be necessary; it will be demonstrated that densities differing considerably from the Gaussian in shape also yield, under certain mild side conditions, scale values that are just linear transformations of the expected values.

Gaussian densities: The mean and the variance of a distribution may depend upon the individual stimulus. So one may have equal or unequal spacing of expected values, and equal or unequal variances, so that there are four different classes of density configurations to be considered. The following table contains a characterisation of the considered cases with respect to the expected values and the variances. The labels A to D refer to a configuration of densities as given in Fig. 3 for equally spaced mean values (µi, (I)) and in Fig. 4 for not equally spaced mean values (µi, (II)). The entries between µi (I) and µi (II) provide the standard deviations for the corresponding configurations of densities. Figures 3 and 4 show the configurations together with the corresponding plots of the scale values on the first dimension versus the expected values µi.

Table 2: Means and variances for the density configurations

Label   µi (I)   standard deviations σ          µi (II)
A       .350     .090  .100  .025  .040  .090   .350
B       .450     .090  .075  .050  .090  .040   .375
C       .550     .090  .050  .075  .040  .090   .475
D       .650     .090  .025  .100  .090  .040   .750

The cases (D) and (E) represent two configurations where the standard deviations do not increase or decrease monotonically with the µi. The nonlinearity of the relation between scale values and expected values comes in only when a density with a (sufficiently) smaller standard deviation is followed by one with a larger one, i.e. (D), or when a density with a larger standard deviation is followed by one with a smaller standard deviation, i.e. (E). If the differences between the standard deviations are not sufficiently large, the configurations (D) and (E) may well yield linear relations between the scale values and the µi.


Figure 3: Gaussian densities, equal spacing of expected values ((a1) to (e1)), and corresponding plots of scale values for stimuli versus expected values ((a2) to (e2)). [Panels plot the Gaussian densities against the decision variable, the CA scale values against the expected values, and the scale values of the responses against the scale values of the stimuli; figure graphics not reproduced.]

The bottom part of Table 2 gives the parameters for not equally spaced expected values; the corresponding configurations are shown in Fig. 4. The case (a1) (unequally spaced expected values, equal standard deviations) is of interest because it shows that equal variances do not yet imply that the scale values are a linear function of the expected values. The difference between two scale values depends upon the difference of the corresponding expected values as well as upon the amount of overlap of the densities, i.e. upon their standard deviations. The relation between expected values and scale values is positively accelerated, meaning that the difference between two scale values is a nonlinear function of the corresponding difference of the expected values.


Figure 4: Gaussian densities, unequal spacing of expected values ((a1) to (e1)), and corresponding plots of scale values versus expected values ((a2) to (e2)). [Panels plot the Gaussian densities against the decision variable and the CA scale values against the expected values; figure graphics not reproduced.]


This is even more pronounced when the standard deviations decrease with increasing difference of the expected values, see case (b2). Conversely, if the standard deviations increase with increasing difference of the expected values, the acceleration of the scale values may be reduced to mere linearity, as demonstrated in (c2). The interpretation of (d1), (d2) and (e1), (e2) is obvious.

To summarise, a linear relationship between expected values and scale values does not necessarily imply that the variances of the underlying distributions are about equal; provided the variances do not co-vary with the expected values and do not differ too much from each other, the relation may turn out to be linear, although the values of r² as given in Fig. 6 may not turn out as high; still, the cases (d2) and (e2) in Fig. 4 yield a value of r² = .971 each. In general, lower values of r² in the case of an overall linear-appearing relationship will indicate variances that do not vary systematically with the expected values.

Apart from the means and variances of the underlying distributions, the decision boundaries have to be chosen. It has been assumed here that the subject tries to decide optimally, meaning that the boundaries are chosen appropriately, i.e. such as to minimise the number of errors. The question of how the subject proceeds to find them is not explicitly dealt with in this paper. To compute the confusion frequencies, the boundaries were assumed to be given by the points x0i at which two neighbouring densities have identical values, i.e. fi(x0i) = fi+1(x0i), i = 1, . . . , I − 1, except for the densities in Fig. 5, where the boundaries were assumed to be given by (µi + µi+1)/2. Generally, these two types of boundary definitions yield confusion frequencies that differ only by a negligible amount, and the coordinates for the responses Tj are almost identical to those of the stimuli. In other words, there is no bias in the decisions. This corresponds to the findings for simple patterns reported in section 4.2.1. For more complex patterns, a bias may exist.
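A minimal sketch of this construction (numpy and scipy assumed; the means and variances are those of Table 3 below, so the resulting frequencies should be close, up to rounding, to the ones given there):

    import numpy as np
    from scipy import stats, optimize

    mu    = np.array([0.22, 0.45, 0.55, 0.78])        # means, cf. Table 3
    sigma = np.sqrt([0.023, 0.014, 0.026, 0.029])     # standard deviations, cf. Table 3
    n_pres = 1000                                      # presentations per stimulus

    def crossing(i):
        """Boundary x_0i where neighbouring densities intersect, f_i(x) = f_{i+1}(x)."""
        g = lambda x: (stats.norm.pdf(x, mu[i], sigma[i])
                       - stats.norm.pdf(x, mu[i + 1], sigma[i + 1]))
        return optimize.brentq(g, mu[i], mu[i + 1])

    cuts = np.array([-np.inf] + [crossing(i) for i in range(3)] + [np.inf])
    # alternative boundaries: midpoints (mu_i + mu_{i+1}) / 2, as used for Fig. 5

    K = np.zeros((4, 4))
    for i in range(4):
        probs = np.diff(stats.norm.cdf(cuts, mu[i], sigma[i]))  # P(X_i in A_j)
        K[i] = np.round(n_pres * probs)
    print(K.astype(int))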

The Beta case: In the Gaussian case one has

fi(x) = (1/(σi √(2π))) exp(−(x − µi)² / (2σ²i)).   (56)

Since neural activity can vary only within finite limits, it will be of interest to consider an alternative to this assumption. Here, the Beta-distribution will be investigated, normalised such that the random variables all vary on the interval [0, 1]:

fi(x) = (1/B(ai, bi)) x^(ai−1) (1 − x)^(bi−1),   0 ≤ x ≤ 1.   (57)

The shape of these densities depends upon the values of their parameters; Beta-densities thus allow one to test the robustness of the estimates of the αi and βj. The expected value and the variance of a Beta-density are given by the following expressions:

µi = ai / (ai + bi),   (58)

σ²i = ai bi / ((ai + bi)² (ai + bi + 1)).   (59)

The parameters of the Gaussians were chosen from a range of values corresponding to that of the Beta-densities. The normalisation implies the need to choose the parameters from a certain range.


Figure 5: Distributions of the criterion variable X: (a) Gaussian distributions, (b) Beta-distributions (I). [The legends give E(X) and the variance of each of the four densities, cf. Tables 3 and 4; figure graphics not reproduced.]

The means µi were chosen not to increase in an equally spaced way, and the standard deviations σi were chosen such that they do not increase with the µi. The parameters ai and bi of the Beta-densities can be computed from (58) and (59).
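A minimal sketch of this inversion (numpy and scipy assumed; the means and variances are those listed for the Beta-densities (I) in Table 4):

    import numpy as np
    from scipy import stats

    # means and variances of the Beta-densities (I), cf. Fig. 5 and Table 4
    mu  = np.array([0.45, 0.50, 0.65, 0.78])
    var = np.array([0.038, 0.063, 0.031, 0.029])

    # inverting (58) and (59): with nu = mu (1 - mu) / var - 1,
    # one gets a = mu * nu and b = (1 - mu) * nu
    nu = mu * (1 - mu) / var - 1
    a, b = mu * nu, (1 - mu) * nu

    print(a)
    print(b)
    print(stats.beta.mean(a, b), stats.beta.var(a, b))   # reproduces mu and var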

The expected values µi increased, and the variances varied moderately but irregularly with the µi, see Fig. 5. Confusion probabilities were computed as areas between the decision boundaries under the corresponding densities; multiplied by some sufficiently large integer (the number of presentations of each stimulus) and rounded, the matrix of confusion probabilities is turned into a matrix of confusion frequencies, which was then analysed by a CA. Note that only a single latent dimension is employed, so one would expect the CA to yield a 1-dimensional solution.

Table 3: Confusion frequencies nij and scale values, Gaussian distributions (I)

        T1      T2      T3      T4      ni+     µ      σ²      fi1
S1      778     191     29      2       1000    .22    .023    −1.098
S2      169     493     302     37      1001    .45    .014    −.221
S3      90      288     387     236     1001    .55    .026    .251
S4      4       45      200     751     1000    .78    .029    1.069
n+j     1041    1017    918     1026    4002
gj1     −1.059  −.0249  .293    1.059                   χ² = 3548.69

The table also contains the scale values fi1 for the Si and gj1 for the Tj on the first dimension, resulting from a CA of the nij. The χ²- or inertia component due to the first dimension is only 69.35 %; this result will be further commented upon below. Note that the r²-values for the regression of the CA scale values on the µi amount to .99.


Table 4 shows the results when the distributions are Beta-distributions; the notation is as in Table 3. The plots of the fi1 versus the expected values are presented in Fig. 6 (b). The values r² = .998 for the Gaussian case and r² = .997 for the Beta case indicate an excellent approximation of the expected values by the scale values in terms of a linear relationship. This is remarkable insofar as the parameters of the Beta-distributions were not chosen to generate distributions that look "nice", i.e. look like Gaussians.

Table 4: Confusion frequencies nij and scale values, Beta-distributions (I)

        T1      T2      T3      T4      ni+     µ      σ²      fi1
S1      556     168     175     101     1000    .45    .038    −.617
S2      468     127     170     235     1000    .50    .063    −.333
S3      174     150     278     398     1000    .65    .031    .218
S4      65      68      169     697     999     .78    .029    .733
n+j     1263    513     792     1431    3999
gj1     −.631   −.238   .048    .616                    χ² = 1159.39

Figure 6: Regression of scale values on expected values: (a) Gaussian distributions (I), (b) Beta-distributions (I); see Fig. 5. [Axes: expected value versus CA scale values on dimension 1; r² = .998 in (a) and r² = .997 in (b); figure graphics not reproduced.]

Biplots and the number of latent dimensions

Fig. 7 (a) shows the biplot for the confusion frequencies as generated by the Gaussian distributions. The main feature of the configuration is that the Si- and the corresponding Ti-points assume very similar positions, i.e. there is a high degree of correspondence between the stimulus patterns Si and the corresponding Ti. The projections of the points on the first dimension, "explaining" 69.35 % of the total inertia (or χ²), are the scale values fi1 and gj1 given in Table 3. There are, however, two points to be discussed with respect to the biplot:

1. Although only a single dimension (random variable) was used to generate the confusion frequencies, the first dimension accounts for only 69.35 % of the inertia of the data; 29.40 % are due to a second dimension.


Figure 7: Biplots for the confusion frequencies of Tables 3 and 4, as generated by Gaussian densities (a) and Beta densities (b). [In the two panels, dimension 1 accounts for 69.35 % and 92.97 % of the inertia, and dimension 2 for 29.4 % and 6.96 %, respectively; figure graphics not reproduced.]

Indeed, the eigenvalues are λ1 = .615, λ2 = .261, λ3 = .011, and (42) yields the corresponding percentages of χ² "explained" by the corresponding dimensions.

2. The plot shows the typical horseshoe effect (Greenacre, 1984): the points representing Si and the corresponding Ti seem to lie on a U-shaped line.

The two points are related to each other. Inspection of the confusion frequencies in Table 3 shows that the numbers of correct responses (778, 493, 387 and 751) are always the largest numbers in the i-th row and the i-th column. The number of confusions falls off the farther Si′ is from a given Si, and Ti′ from the corresponding Ti. This causes the matrix T = (tij) (cf. (28)) to be of full rank, and the SVD of T will yield r = I = J eigenvalues λk ≠ 0. The smaller the number of errors made, the more the confusion matrix will resemble a diagonal matrix, the more the eigenvalues λk, k > 1, will differ from zero, and the more the biplot based on the first two "dimensions" will resemble a U-shaped or horseshoe-like configuration. The eigenvalues λk, k > 1, therefore do not reflect a genuine second feature of the patterns, but have to be considered an artifact resulting from the linear decomposition (32). A more detailed discussion of the horseshoe effect in Correspondence Analysis may be found in Greenacre (1984), p. 226. For the present purposes it is sufficient to note that the horseshoe effect is indicative of the second dimension being a negligible artifact.

Fig. 7 (b) shows the biplot for the confusion frequencies generated by the Beta-distributions. Note that here the CA solution is basically 1-dimensional: the first dimension explains about 93 % of the χ² (or the inertia χ²/N) of the table. Inspection of Table 4 shows that here the main diagonal does not always contain the largest number in a given row or column. This fact appears to suppress the horseshoe effect. In analogy to PCA approximations in factor analyses of measurements, the second dimension, accounting for about 7 % of the inertia, here just reflects random error.



4.2 The analysis of empirical data

4.2.1 Gabor patches

An identification experiment was performed employing Gabor patches as stimuli, where a Gabor patch is defined as

s(x) = exp(−x²/σ²) cos(2πfx);    (60)

f is the spatial frequency of s, and σ² defines the width of s. Four stimulus patterns were defined by the spatial frequencies f1 = 3.25, f2 = 3.75, f3 = 4.25 and f4 = 4.75 c/deg. The patterns were presented in random order and the subject had to decide which of the four patterns had been presented. There were 11 sessions with 100 trials each, i.e. each pattern was presented 25 times in a session. Individual CAs were computed¹ for each of the 11 confusion matrices, giving essentially the same picture, i.e. biplot, for each session. So the data were lumped, i.e. a single CA was computed for the 4 × 4 × 11 matrix. The results are shown in Fig. 8, together with the results for the matrix generated by averaging the confusion matrices over the 11 sessions. Table 5 shows the confusion matrix resulting from adding the data from the 11 sessions into one matrix, together with the scale values fi1 for the stimuli and gj1 for the responses. The scale values with respect to the second dimension have been omitted since the second dimension does not appear to reflect a second attribute with respect to which the patterns were judged; as the biplots show, the configuration shows a horseshoe effect (Greenacre (1984), p. 226). Since the summed confusion matrix (like all individual matrices) has its maximum frequencies in the diagonal cells, the (generalised) SVD of P − rc′ will indicate at least two, if not three, dimensions underlying the data. The fewer confusions occur, the more the matrix will resemble a diagonal matrix requiring 3 dimensions to represent the data, even if a single latent criterion was employed by the subject to identify the stimuli.
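For concreteness, the luminance profiles of eq. (60) can be generated directly; in the following sketch the width parameter σ is an assumed value, since only the spatial frequencies are specified above.

import numpy as np

def gabor(x, f, sigma=0.3):
    # eq. (60): exp(-x^2/sigma^2) * cos(2*pi*f*x); sigma is an assumption
    return np.exp(-x**2 / sigma**2) * np.cos(2 * np.pi * f * x)

x = np.linspace(-1.0, 1.0, 513)                                  # retinal coordinate
profiles = {f: gabor(x, f) for f in (3.25, 3.75, 4.25, 4.75)}    # the four patterns, c/deg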

Table 5: Confusion frequencies, summed over the 11 sessions; χ² = 922.85 equals the sum of either the χ²_i or the χ²_j; eigenvalues: λ1 = .600, λ2 = .208, λ3 = .03.

          T1      T2      T3      T4      Σ      χ²_i     f_i1
S1       193      74       8       0     275    337.48   -.999
S2        60     161      51       3     275    115.34   -.395
S3        14      92     134      31     271     98.85    .307
S4         4      15     102     154     275    371.18   1.091
Σ        271     342     295     188    1096
χ²_j  332.91  126.59  127.73  335.62                     χ² = 922.85
g_j1   -.979   -.348    .538   1.201
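The χ²-decomposition reported in Table 5 can be checked directly from the summed confusion matrix; a minimal sketch (small discrepancies are due to rounding):

import numpy as np

# Table 5: rows = stimuli S1..S4, columns = responses T1..T4
N = np.array([[193,  74,   8,   0],
              [ 60, 161,  51,   3],
              [ 14,  92, 134,  31],
              [  4,  15, 102, 154]], dtype=float)
n = N.sum()
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n     # expected counts under independence
cells = (N - E)**2 / E
print("chi2_i:", np.round(cells.sum(axis=1), 2))   # approx. 337.48, 115.34, 98.85, 371.18
print("chi2_j:", np.round(cells.sum(axis=0), 2))   # approx. 332.91, 126.59, 127.73, 335.62
print("chi2  :", round(cells.sum(), 2))            # approx. 922.85
print("chi2/n:", round(cells.sum() / n, 3))        # total inertia = sum of the eigenvalues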

The regression of the scale values fi1 on the spatial frequency parameters ϕi of the patterns is perfectly linear, see Fig. 9, suggesting (i) that the subject indeed identified the patterns with respect to a single latent dimension x representing the apparent spatial frequency characteristic of a pattern, (ii) that we may assume that the fi1-values are indeed proportional to the conditional expectations of the random variables representing the activity generated by the stimuli, and (iii), with respect to Fig. 3 (A), that the variances of these random variables are about equal, meaning that the patterns are about equally difficult to identify. The data point to the possibility that the spatial frequency parameter is the relevant parameter in the identification process. One may also think of the subject concentrating on other features of the pattern that allow an identification of a pattern, e.g. the position of the luminance maxima; if variables like that are the relevant variables, they have to be such that they are linearly related to the parameter values of the stimuli, i.e. the spatial frequencies ϕi.

¹Correspondence Analysis of the Statistica package



Figure 8: Biplots for Gabor patches; (a) different sessions, (b) averaged over sessions

[Two biplots; Dimension 1: 67.61 % and 71.31 % of inertia, Dimension 2: 25.18 % and 24.77 %, respectively; stimulus points labelled by spatial frequency (3.25–4.75 c/deg), response points R1–R4.]

Figure 9: Regression of scale values of stimuli versus corresponding conditional expectations

[Left panel: CA scale values U versus stimulus spatial frequency, r² = .995 (averaged matrix). Right panel: CA values V (responses) versus CA scale values U (stimuli), r² = .994 (averaged matrix).]


Note that the χ²_i-values are largest for S1 and S4; correspondingly, the χ²_j-values are largest for T1 and T4. So the pattern defined by the smallest value of ϕ and the one defined by the largest value of ϕ generate the largest contributions to the total χ² of the table. Moreover, if one plots the χ²_i- or the χ²_j-values against the subscript of either the Si or the Tj, one obtains practically symmetrical, U-shaped curves. This means that the χ²-contributions are symmetrical functions of the distance from a virtual mean pattern, defined by a ϕ equal to the arithmetic mean of the ϕi, i = 1, . . . , 4. The row of confusion frequencies one would observe if such a pattern had indeed been presented would correspond to the mean row of frequencies, and the χ²_i-values increase with increasing distance from this mean. A similar statement holds for the Tj and their χ²_j-values. This is characteristic for the results of a CA: the origin of the biplot corresponds to the mean row and the mean column of the contingency table, and the more a row or a column deviates from the mean row or column, the larger the corresponding χ²-component.

4.2.2 Superimposed Gabor patches with and without flankers

The stimuli are now given by superpositions of two Gabor-functions,

s1(x; f1) = cos(2πf1x) exp(−x²/2σ²),

with f1 = 2 c/deg, and

si(x; fi) = cos(2πfix) exp(−x²/2σ²)

with fi = 4 + i × 0.1 c/deg, i = 1, . . . , 4. The stimulus patterns are then Si = s1 + si. The flanking patches were defined by one of three possible spatial frequencies f0k: f01 = 3.8, f02 = 4.5, and f03 = 5.0 c/deg. Figure 10 shows the luminance distributions for the four stimulus patterns; Figure 11 shows these distributions superimposed.
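A short sketch of the stimulus construction Si = s1 + si; as before, σ is an assumed value.

import numpy as np

def component(x, f, sigma=0.3):
    # one Gabor component, cos(2*pi*f*x) * exp(-x^2 / (2*sigma^2)); sigma assumed
    return np.cos(2 * np.pi * f * x) * np.exp(-x**2 / (2 * sigma**2))

x = np.linspace(-1.0, 1.0, 513)
s1 = component(x, 2.0)                                              # fixed component, 2 c/deg
stimuli = {f: s1 + component(x, f) for f in (4.1, 4.2, 4.3, 4.4)}   # S_i = s_1 + s_i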

Figure 10: Luminance distributions of the stimuli

[Four panels: luminance profile versus retinal coordinate for the component combinations 2 + 4.1, 2 + 4.2, 2 + 4.3 and 2 + 4.4 c/deg.]

The overall shape is obviously very similar, so the subject has to respond to relatively fine differences between the patterns.

To identify a stimulus, the subject just has to identify the partial pattern si; in this case the subject can separate the two components s1 and si. However, the pattern Si may also be perceived as a conjunction of the two components, or the subject may respond to particular partial aspects of the patterns, due to specific aspects of the superposition. In the first case one may predict that the relevant random variable, with respect to whose value the patterns are identified, reflects just the spatial frequency fi of si, and, corresponding to the results of the experiment reported in section 4.2.1, the scale values for the stimuli, and also for the responses, should be linear functions of the spatial frequency parameter fi. In the second case such a linear relationship need not occur.



Figure 11: Luminance distributions of the stimuli, superimposed

[Superimposed luminance profiles of the four stimuli, luminance versus x (retinal coordinate).]

So a linear relationship is compatible in particular with the first hypothesis, although such a relationship is at least in principle also conceivable with some sort of Gestalt-identification.

Biplots (I): Fig. 12 shows the biplots corresponding to the three spatial frequency parameters of the flanking patterns. Although there are only three such parameters, six biplots are shown: the data from trials with a flanking pattern and those without a flanking pattern have been analysed separately. One may expect that the data from trials without flanking patterns are very similar; on the other hand, the spatial frequency f0k of the flanking pattern was kept constant within a session, and this may have an effect on the identification process even when no flankers were actually presented. The dominant feature of the biplots is that the patterns appear to be discriminated with respect to a single dimension: the inertia component due to the first dimension is always between ≈ 86 % and ≈ 90 %. The second dimension seems to reflect a horseshoe effect and most likely represents a numerical artifact rather than a perceptually relevant feature of the patterns. The stimuli S1, S2, S3 and S4 are always well ordered along the first dimension, corresponding to the spatial frequency parameter fi of the second pattern component. The responses R1, . . . , R4 are also well ordered along the first dimension; however, with increasing value of fi the precision of the responses seems to deteriorate when a flanking pattern was shown together with the stimulus pattern. In particular, R3 appears midway between S3 and S4, indicating that S3 and S4 are confused with each other more often than the remaining stimuli. It is of interest to have a look at the χ²-values for the individual confusion matrices, which are given in Figure 12. For all values of f0k, the χ²-value for the stimulus presentations with flanking patterns is higher than for the stimulus presentations without flanking patterns; so the flanking patterns seem to facilitate the identification of the stimulus patterns.




Figure 12: Biplots, for the different spatial frequency parameters of flanking patterns

[Six biplots, one per flanker condition (flanker spatial frequency 3.6, 4.5 and 5.0 c/deg, each with and without flankers actually presented), with points S1–S4 and R1–R4. Dimension 1 accounts for between 85.85 % and 90.11 % of the inertia, Dimension 2 for between 6.95 % and 13.90 %. The χ²-values of the individual confusion matrices range from 110.83 to 260.94.]



Figure 13: Stimulus versus response scale values (biplots), depending upon the spatial frequency parameter of the flanking pattern

[Six panels: response scale values plotted against stimulus scale values, without and with flankers, for flanker spatial frequencies 3.6, 4.5 and 5.0 c/deg. Without flankers the fits are linear, y = a + bx, with a = −.11, b = 1.14; a = −.07, b = 1.16; a = .002, b = 1.07. With flankers 2nd-order polynomials were fitted, with a = .60, b1 = 1.374, b2 = −.829; a = .688, b1 = 1.317, b2 = −.807; a = .768, b1 = 1.499, b2 = −.743.]


Biplots (II): To investigate further the influence of the flanking patterns, the results of the Correspondence Analysis (CA) are presented in a different form in Fig. 13: for each flanker frequency, the scale values of the responses are plotted against the scale values of the corresponding stimuli, separately for trials without and for trials with flanking patterns. Only the scale values on the first dimension are presented.

The main result to be taken from Figure 13 is that the relation between the scale values for the stimuli and the responses is linear when no flanking pattern was presented with the stimuli, whereas this relation appears to be nonlinear when the stimuli were presented with flanking patterns; again, from a purely statistical point of view the null hypothesis that the data are compatible with a linear relationship may still be maintained. It is the trend over the different spatial frequencies that suggests the existence of nonlinearities. This trend would mean that in particular the difference between the responses R3 and R4 is reduced under the flanking condition. Given the meaning of the scale values for the responses, this indicates that the flanking patterns induce an adjustment of the decision boundaries.
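The comparison of a linear with a 2nd-order fit that underlies these statements can be sketched as follows; the scale values used here are placeholders, not the experimental values.

import numpy as np

u = np.array([-0.70, -0.25, 0.30, 0.75])    # stimulus scale values (hypothetical)
v = np.array([-0.65, -0.20, 0.45, 1.20])    # response scale values (hypothetical)

for deg in (1, 2):                          # linear versus 2nd-order polynomial
    coef = np.polyfit(u, v, deg)
    resid = v - np.polyval(coef, u)
    print("degree", deg, "coefficients", np.round(coef, 3),
          "SSE", round(float(resid @ resid), 4))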

Another aspect of the data is revealed when the scale values of the stimuli for the no-flanker condition are plotted against the scale values of the stimuli for the flanker condition, and similarly for the scale values of the responses, see Figure 14. The plots of the scale values for the stimuli are definitely linear for the flanking parameters f01 = 3.6 and f02 = 4.5, while the plots of the scale values of the responses may contain a nonlinear component. From a purely statistical point of view, a linear function will also account for the response data, but there is a definite improvement of the fit when a nonlinear function (a polynomial of 2nd order) is fitted. This finding contrasts with that for the flanking parameter f03 = 5.0: here the plot for the stimuli seems to reflect a positively accelerated relation between the scale values of the stimuli, while the fit of the 2nd-order polynomial to the response data does not yield any improvement of the fit. If one inspects the parameters of the fitted curves, one finds that the slope parameter b for the stimulus plots increases with increasing value of f0k, from b = 1.03 for f01 = 3.6 to b = 1.16 for f02 = 4.5, and for f03 = 5.0 the positively accelerated relation fits even better than the linear one. For the response plots the parameter b1, representing the linear component of the polynomial, also increases, from b1 = 1.23 for f01 to b1 = 1.31 for f02, and at the same time the parameter b2, reflecting the negative acceleration, decreases from −.89 for f01 to −.98 for f02; for f03 the relation has turned linear. So there are systematic changes in the parameters describing the relation between scale values, depending on the value of f0k. The flanking patterns appear to pull the scale values for stimuli and responses apart. With respect to Figs. ?? and ?? one may conclude that the flanking patterns exert an influence either on the expected values of the underlying random variables or, more likely, on their variances. The neural representations appear to become more separable with increasing frequency of the flanking patterns, meaning that the variances become smaller with increasing value of f0k. This corresponds to the observed increase of the χ²-values when flanking patterns are presented.

Further aspects of the data are revealed when the scale values of stimuli and responses are plotted against the spatial frequency parameter fi of the second stimulus component, see Figure 15. The relations between the scale values and the fi appear to be rather systematic.



Figure 14: Scale values of stimuli and responses, without versus with flankers

[Six panels: scale values with flankers plotted against scale values without flankers, separately for stimuli and responses, for flanker spatial frequencies 3.6, 4.5 and 5 c/deg. Stimuli: linear fits y = a + bx with a = −.05, b = 1.03 (3.6 c/deg), a = −.07, b = 1.16 (4.5 c/deg), a = .05, b = .98 (5 c/deg). Responses: 2nd-order fits with a = .75, b1 = 1.23, b2 = −.89 (3.6 c/deg) and a = .79, b1 = 1.31, b2 = −.98 (4.5 c/deg); linear fit a = .46, b = 1.08 (5 c/deg).]

Except for the relation between the scale values of the responses and the fi under the flanker condition, the relations between the scale values and the fi appear to be nonlinear; in particular, they are positively accelerated. Only for f0k = 5.0 does a linear relation fit equally well when no flanking patterns were presented. The nonlinearity of the relations between scale values and fi-values is remarkable since the differences fi+1 − fi, i ≤ 3, are identical for all i, whereas in the experiment reported in section 4.2.1 a linear relation between the parameter fi, the spatial frequency defining a Gabor patch, and the scale values was observed, see Figures 8 and 9. Moreover, the difference fi+1 − fi equalled .25 in that experiment, while in the experiment reported in this section the difference equalled only .1.



Figure 15: Scale values versus fi-values

[Nine panels: scale values plotted against the spatial frequency fi (4.1–4.4 c/deg) of the second stimulus component, for stimuli and responses, without and with flankers, for flanker spatial frequencies 3.6, 4.5 and 5 c/deg.]

Possibly the fact that here superpositions of Gabor patches defined the stimuli plays a role. Alternatively, a change of the variances of the underlying random variables with the value of the spatial frequency parameter of the flanking patterns may cause the observed nonlinearities; further data are required to clarify these points. In any case, the fact that linear relationships are observed for the responses, provided a flanking pattern was shown, is of interest. The flanking patterns appear to have an adjusting effect on the sets Aj, which determine the range of values of the random variable, representing the sensory effects, that implies the response Rj. These adjustment effects may again be due to corresponding changes of the variances of the decision variable.




Figure 16: Scale values for f01 = 3.6 versus f03 = 5.0

[Four panels: stimulus scale values (no flankers: a = −.05, b = 1.32; with flankers: a = .07, b = 1.30) and response scale values (no flankers: a = .08, b = 1.28; with flankers: a = .02, b = 1.20, r = .99) for the 5.0 c/deg condition plotted against the 3.6 c/deg condition, with linear fits y = a + bx.]

Finally, one may consider the plots of the scale values of the stimuli for a certain value of f0k versus the scale values of the stimuli for another value of f0k, and similarly for the responses. Figure 16 shows these plots. We consider in particular the data for f0k = 3.6 and f0k = 5.0. Note first that the plots are definitely linear, and note further that for the stimuli the slope parameters assume the values b = 1.32 (no flanking patterns) and b = 1.30 (with flanking patterns). The difference between these two b-values is certainly negligible, and an equivalent statement holds for the additive constants a (a = .05 and a = .07), i.e. the difference has only a negligible effect on the prediction of the scale values for the stimuli under the f0k = 5.0 condition on the basis of the scale values from the f0k = 3.6 condition, whether the flanking patterns were actually presented or not. This is remarkable since the finding means that the flanking patterns employed in a particular session appear to determine the perception of the stimuli independently of their actual presentation. The fact that the values of b are larger than 1 implies that the scale values for the f0k = 5.0 condition are more spread out than for the f0k = 3.6 condition, corresponding to the larger χ²-values for data from trials with flanking patterns. For the responses, an analogous statement holds (b = 1.28 and b = 1.20). If only the scale values for the responses were affected by the flanking parameter, one could easily argue that this parameter affects the definition of the sets Aj; but the fact that the scale values of the stimuli are also affected would mean that the sensory representations change as well, provided the model considered in this paper is correct: an increase of the spatial frequency of the flanking patterns seems to reduce the variance of the underlying random variables.



5 The 2-dimensional case

The stimuli may vary with respect to more than a single aspect. Suppose the stimuli vary with respect to the aspects A and B. The subject may combine information on the aspects into a single percept that can be represented by a single random variable; this may be the case if, for instance, the stimuli are represented by templates in the subject's memory and the subject chooses the response according to the maximally activated template. If Yj, j = 1, . . . , I, represents the activity of the j-th template, then the decision variable would be Y = max(Y1, . . . , YI). Alternatively, the subject may try to evaluate the aspects A and B individually in order to arrive at a decision about the presented stimulus. A sufficient, but not necessary, condition for this is that the subject is able to perceive the different levels Ai of A and Bj of B independently. If the Ai and Bj are not perceived independently, in the sense that the neural activities corresponding to Ai and Bj are not independent of each other, the subject may still be able to evaluate these activities independently. Ashby and Townsend (1986) and Kadlec and Townsend (1992) have carefully disentangled these possibilities. In the following, the definitions and theorems presented by A & T will be briefly presented. It will then be shown that applying Correspondence Analysis to the confusion matrix allows, together with some tests proposed by A & T, some quick decisions about the type of process underlying the subject's performance.

5.1 Ashby and Townsend (1986)

Let x, y be random variables representing the perceptual effects of the two components defining the stimulus, denoted AiBj, i.e. the i-th level of feature A combined with the j-th level of feature B.

Definition 5.1 Let f_{AiBj}(x, y) be the joint distribution of x and y when the stimulus is defined by AiBj, and let g_{AiBj} denote the marginal distributions. The components Ai and Bj are perceptually independent if

f_{AiBj}(x, y) = g_{AiBj}(x) g_{AiBj}(y)    (61)

holds.

Remark: If f is Gaussian, independence holds if and only if the effects of Ai and Bj are uncorrelated over trials.

Definition 5.2 Let (a, b) denote the event that a stimulus (A, B) is reported. Sampling independence holds if

p(a2b2|A2B2) = p(A sampled |A2B2) p(B sampled |A2B2)    (62)

holds. For the complete identification experiment:

p(a2b2|A2B2) = [p(a2b1|A2B2) + p(a2b2|A2B2)] [p(a1b2|A2B2) + p(a2b2|A2B2)].    (63)

Remark:

p(a2b2|A2B2) = p(a2b2 ∩ A2B2) / p(A2B2), which in general is estimated by nij / ni+.    (64)
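A minimal sketch of the sampling-independence check of eq. (63), using the relative response frequencies for one stimulus, here A2B2; the counts are hypothetical.

# responses observed when stimulus A2B2 was presented (hypothetical counts)
counts = {"a1b1": 5, "a1b2": 10, "a2b1": 12, "a2b2": 73}
n = sum(counts.values())
p = {resp: k / n for resp, k in counts.items()}

p_a2 = p["a2b1"] + p["a2b2"]     # P(component A reported at level 2 | A2B2)
p_b2 = p["a1b2"] + p["a2b2"]     # P(component B reported at level 2 | A2B2)
print("p(a2b2|A2B2)     =", round(p["a2b2"], 3))
print("product, eq.(63) =", round(p_a2 * p_b2, 3))   # equal under sampling independence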



Figure 17: Configurations of stimuli; (a) the separable case, (b) optimal boundaries for the perceptually dependent case

[Two panels: second component plotted against first component, with points labelled ss, sb, bs and bb.]

Theorem 5.1 Consider the complete identification experiment. Then

1. Sampling independence holds for the stimulus AiBj if perceptual independence holds and the decision bounds are parallel to the coordinate axes.

2. Perceptual independence holds for AiBj if sampling independence holds for A, B for different decision criteria and if the decision bounds are parallel to the coordinate axes.

3. If the decision bounds are not parallel to the axes, then sampling independence and perceptual independence are logically unrelated.

Separability versus Integrality: According to Garner & Morton (1969), stimulus components are separable if they "act separately in the organism and thus can go independently of each other". Integrality holds if components "join one another such that it is extremely difficult for the subject to take note of one without at the same time taking note of the other". Separable stimuli can be separately attended to, whereas the components of integral stimuli cannot (this may happen to various degrees).

Definition 5.3 Operational definition A: If components are separable, performance on a task that demands a response based on a single component is unaffected by the level of other, irrelevant components. With integral components, varying the level of irrelevant components degrades performance ("filtering task").

Separability can be defined either at the perceptual or at the decisional level (p. 164). Let

g(x) = ∫_{−∞}^{∞} f(x, y) dy.    (65)

Components A and B are perceptually separable if the marginal distribution of A does not depend on the level of B, so that

g_{AiB1}(x) = g_{AiB2}(x), i = 1, 2.    (66)



Definition 5.4 Consider a complete identification experiment, i.e. one in which all combinations AiBj occur. A and B are perceptually separable if the perceptual effect of one component does not depend on the level of the other, that is, if

g_{AiB1}(x) = g_{AiB2}(x), i = 1, 2,    (67)
g_{A1Bj}(y) = g_{A2Bj}(y), j = 1, 2.    (68)

Definition 5.5 Consider a complete identification experiment with stimuli A1B1, A1B2, A2B1 and A2B2. The components A and B are decisionally separable if the decision about one component does not depend upon the level of the other, that is, if the decision bounds in the general recognition theory are parallel to the coordinate axes.

This relates to Theorem 5.1: one may now say that if decisional separability holds, then perceptual independence is equivalent to sampling independence (when the latter holds for different decision criteria). If decisional separability is not found, then Theorem 5.1 indicates that sampling independence is logically unrelated to perceptual independence.

Theorem 5.2 Complete identification experiment; suppose the following three conditions hold:

1. all perceptual representations are normally distributed (the general Gaussian recognition model holds),

2. no two μ⃗ij are equal, where μ⃗ij is the mean vector for stimulus AiBj,

3. the subject responds optimally, maximising the probability of a correct response.

Then perceptual separability and decisional separability together imply the perceptual independence of the components A and B within each stimulus configuration.

Theorem 5.3 Complete identification experiment: If perceptual and decisional separability hold for components A and B, then marginal response invariance holds, i.e. the probability of correctly recognising one component does not depend upon the level of the other:

p(aib1|AiB1) + p(aib2|AiB1) = p(aib1|AiB2) + p(aib2|AiB2),    (69)
p(a1bj|A1Bj) + p(a2bj|A1Bj) = p(a1bj|A2Bj) + p(a2bj|A2Bj).    (70)
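Marginal response invariance, eqs. (69) and (70), can likewise be checked directly from a 4 × 4 confusion matrix of the complete identification experiment; the counts below are hypothetical.

import numpy as np

labels = ["A1B1", "A1B2", "A2B1", "A2B2"]        # stimuli and responses, same order
conf = np.array([[70, 12, 14,  4],               # hypothetical confusion frequencies
                 [10, 68,  5, 17],
                 [15,  6, 65, 14],
                 [ 3, 16, 12, 69]], dtype=float)
P = conf / conf.sum(axis=1, keepdims=True)       # p(response | stimulus)

def p(stim, resp):
    return P[labels.index(stim), labels.index(resp)]

# eq. (69) with i = 1: P(report level 1 of A | A1B1) vs. P(report level 1 of A | A1B2)
lhs = p("A1B1", "A1B1") + p("A1B1", "A1B2")
rhs = p("A1B2", "A1B1") + p("A1B2", "A1B2")
print(round(lhs, 3), round(rhs, 3))              # equal under marginal response invariance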



Relevant conclusions: Key question is: does perceptual independence hold?

1. Suppose marginal response invariance does not hold, i.e. equations (69), (70) do not hold. Then the decision boundaries are not parallel to the coordinate axes, meaning that decisional and perceptual separability do not both hold (see Theorem 5.2), i.e. either perceptual or decisional separability fails. According to Definition 5.5, if the decision bounds are parallel to the coordinate axes, then decisional separability holds; so if decisional separability does not hold, the decision bounds are not parallel to the axes.

2. According to Theorem 5.1, if perceptual independence holds and the decision bounds are parallel to the axes, then sampling independence holds; and if sampling independence holds and the decision bounds are parallel to the axes, then perceptual independence holds.

Sampling independence holds if the conditions of eq. (63) are met. If the equations are not satisfied, sampling independence can be refuted. Then, according to Theorem 5.1, the conjunction of sampling independence and parallel decision bounds does not hold. So, if sampling independence and, from ??, parallelism of the decision bounds do not hold, we cannot conclude that perceptual independence holds; the findings support the hypothesis that perceptual independence does not hold.

Relation to Correspondence Analysis

CA provides an orthogonal system of coordinates, with each axis representing a certain proportion of the total inertia χ²/N, and therefore of the total χ². Suppose (x, y) represents the perceptual effect of a stimulus, and suppose further that f is a 2-dimensional Gaussian with zero covariance and equal variances. Then perceptual independence holds. Suppose further that decisional separability and sampling independence hold and that the subject decides optimally. The stimulus configuration (in the biplot) is then rectangular, where the scale differences on the first axis represent the differences between the two means of the feature with the larger parameter difference, and the differences of the scale values on the second dimension represent the feature with the smaller parameter difference. If the decision bounds are optimal, i.e. are defined by the corresponding means of the scale values, the points for the responses will be identical to the stimulus points (error-free case).
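This claim is easy to examine numerically. The following sketch simulates the general Gaussian recognition model with equal variances, zero covariance and decision bounds parallel to the axes, and then submits the simulated confusion matrix to a CA; the means, the variance and the bound positions are assumptions chosen for illustration only.

import numpy as np

rng = np.random.default_rng(0)
# mean vectors of the four stimuli AiBj; sigma and the bounds bx, by are assumed
means = {"A1B1": (0.0, 0.0), "A2B1": (1.0, 0.0),
         "A1B2": (0.0, 0.6), "A2B2": (1.0, 0.6)}
sigma, n_trials, bx, by = 0.5, 2000, 0.5, 0.3

labels = list(means)
conf = np.zeros((4, 4))
for i, stim in enumerate(labels):
    x = rng.normal(means[stim][0], sigma, n_trials)
    y = rng.normal(means[stim][1], sigma, n_trials)
    for xx, yy in zip(x, y):                     # axis-parallel decision bounds
        resp = "A%dB%d" % (1 + (xx > bx), 1 + (yy > by))
        conf[i, labels.index(resp)] += 1

# CA of the simulated confusion matrix (same computation as in section 4)
P = conf / conf.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
F = U / np.sqrt(r)[:, None] * sv
print(np.round(F[:, :2], 3))   # approximately rectangular, sides parallel to the axes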

The scale differences will be proportional to the corresponding d′-values. This is intuitively clear: the scale values will be proportional to the corresponding mean values, and since the variances are supposed to be identical, the weighting will be identical as well. This will be illustrated numerically. The effects of the following distortions also have to be illustrated numerically:

1. Different variances, optimal decision bounds, zero covariances,

2. Equal variances, equal correlation (positive and negative) for all stimuli,

3. The combination of unequal variances and equal correlation,

4. Shifts of the decision bounds: still parallel to the axes, but deviating from the optimal values (defined by equal a priori probabilities and equal costs of wrong decisions).



5. Decision bounds that are not parallel to the axes, but linear.

The effects should be discussed with respect to the Ashby & Townsend and Kadlec & Townsend papers. The discussion should aim at an identification of the perceptual independence case, because that is what is most interesting. In particular:

1. Definition of sampling independence, see Definition 5.2 and eq. (63),

2. Relation to Theorem 5.1,

3. and the other notions and corresponding Theorems.

The stimulus configuration in the disc-hat experiment: the labelling is counterclockwise, see Table 6.

Table 6: Stimulus configuration in the disc-hat experiment

Stim. 1   A1B1
Stim. 2   A2B1
Stim. 3   A2B2
Stim. 4   A1B2

The equivalents of the d′_{ABj} and d′_{AiB} are then

d′_{AB1} ⊜ Δ′_{AB1} = f31 − f21,    d′_{AB2} ⊜ Δ′_{AB2} = f41 − f11,    (71)
d′_{A1B} ⊜ Δ′_{A1B} = f12 − f22,    d′_{A2B} ⊜ Δ′_{A2B} = f42 − f32.    (72)

The theorems relating separability and independence on the one hand and d′_{ABj} and d′_{AiB} on the other should carry over to the equivalent statements relating independence and separability to Δ′_{ABj} and Δ′_{AiB}.

Alternative approach: suppose the stimulus is represented by some variable y = Σ_k αk φk + ξ, with φk a representation of a feature, αk a weight, and ξ some error; y is evaluated as in Thurstone's model, or as in Fisher's discriminant analysis.

5.2 Numerical evaluations

5.2.1 The separable case; r = 0 versus r ≠ 0

The case r = 0: Consider the case that A1 and A2 differ more than B1 and B2 (radii of superimposed circular discs).

1. The case σ1 = σ2: The mean values determine the configuration. The first axis separates the stimuli with respect to the greater feature difference.

Special case: all features have the same value. Then the two dimensions represent the same proportion of inertia; the stimuli form a square and are positioned on the axes.

If the features have different values, the configuration will also be rectangular, but the sides of the rectangle are parallel to the axes.



2. The case σ1 ≠ σ2: If all features have the same value, the case σ1 < σ2 will separate the stimuli S1, S2 on the one hand from S3 and S4 on the other; if σ1 > σ2, the stimuli S1, S4 on the one hand and S2, S3 on the other.

The case r ≠ 0: For equal variances:

1. The case r > 0: The configuration is turned clockwise.

2. The case r < 0: The configuration is turned counterclockwise.

For unequal variances, the reverse may hold (examples). In any case, the separability test always complies with the data, while for r ≠ 0 the independence test does not.

5.2.2 The case r ≠ 0, Gaussian classification

Let us assume that the subject is able to somehow construct the optimal decision bounds. For positive and negative correlations the configurations are turned clockwise or anticlockwise, respectively. In any case, neither the independence test nor the separability test holds. This is of importance since some configurations look like the separable case with r ≠ 0: a slightly turned configuration.

5.2.3 Conclusions

If the stimuli are identified with respect to the components, the stimulus configuration as well as the response configuration will be rectangular, regardless of the way the decision boundaries are defined. The Ashby–Townsend tests will reveal whether the boundaries are parallel to the axes defined by the components or not, i.e. whether separability holds or not. If the correlation between the feature representations is zero, the configuration will be parallel to the axes; otherwise the configuration appears rotated. It is the rectangular form of the configuration that indicates that decisions are made with respect to the individual components.

One may therefore say that if rectangularity is not observed, then the subject identifies the patterns not according to an evaluation of the component representations, but according to some function φ(A, B). φ may reflect template matching, or indeed some function of the components may be taken as a basis for the decisions, or the variance-covariance matrices are different. A look at the inertias may reveal that the decisions are indeed based on some function φ; this will be discussed with respect to some data (the disc-hat data). Indeed, if the variance-covariance matrices are unequal, this would mean that the interaction between the representations of the components depends on the particular values of the features, i.e. is configuration specific, pointing to a holistic representation of the stimuli, which is a special case of a representation by some function φ.

5.3 Empirical data: stimuli composed of circular ”discs”

Stimulus patterns were defined according to

l(r) = L0 (1 + m s(r)),   r = √(x² + y²),    (73)



L0 defines the space-average luminance of the screen, and m = (Lmax − Lmin)/2L0 is the (Michelson) contrast.

Figure 18: Example of stimulus: luminance profile

[Luminance profile of an example stimulus versus retinal coordinate (visual angle), showing the "outer" disc, the "inner" disc and the background level L0.]

s(r) is defined as the sum of two concentrically superimposed circular "discs" d1 and d2 of different radius and luminance, specified by

dk(r) = 1 for r ≤ ck, and dk(r) = 0 for r > ck,   k = 1, 2,    (74)

i.e. s(r) is defined as

s(r) = (1 + α)d1(r) − αd2(r), α = 1/2. (75)

Four patterns s(r) were defined: r1 could assume either the value r11 or r12, and r2 could assume the values r21 or r22. The total width (diameter) of the stimulus pattern is defined by the "outer" disc and was either 2c1 = .50 or 2c1 = .46. The "inner" disc had a diameter of either 2c2 = .15 or 2c2 = .14. Stimuli are characterised by a combination of two letters, b for "big" and s for "small"; responses are characterised the same way, only with capital letters. A particular pattern si is defined by a combination (r1i, r2i), so there are four possible patterns. The actual values of the rij are summarized in Table 7.

Table 7: Characterisation of stimuli and responses; the first letter specifies the "outer", the second the "inner" disc; s and S stand for "small", b and B for "big". c1 is the radius of the outer, c2 that of the inner disc.

Stimulus  Response    c1      c2     c1/c2   c2/c1
bb        BB         .25    .075     3.333    .300
bs        BS         .25    .07      3.571    .28
sb        SB         .23    .075     3.067    .320
ss        SS         .23    .07      3.285    .304

A more complete description of the experiment is given in Mortensen and Meinhardt (1999).
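The construction of the disc stimuli according to eqs. (73)–(75) can be sketched as follows; L0 and m are assumed values, and d1 is taken to be the outer, d2 the inner disc.

import numpy as np

def disc(r, c):
    # eq. (74): circular disc of radius c
    return (r <= c).astype(float)

def luminance(r, c1, c2, L0=50.0, m=0.5, alpha=0.5):
    s = (1 + alpha) * disc(r, c1) - alpha * disc(r, c2)   # eq. (75)
    return L0 * (1 + m * s)                               # eq. (73)

r = np.linspace(0.0, 0.4, 401)            # radial coordinate (visual angle)
l_bb = luminance(r, c1=0.25, c2=0.075)    # pattern bb: big outer, big inner disc
l_ss = luminance(r, c1=0.23, c2=0.07)     # pattern ss: small outer, small inner disc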

The conditions of stimulus presentation were such that it was difficult for a subject to arrive at a correct response, so there is some learning involved. One may expect that the subject will try different strategies. To minimise the effect of averaging over different strategies during a session, the number of stimulus presentations per session was kept relatively small: each stimulus was presented 25 times per session.



The first question one may ask is whether the effect of learning can be seen from a plot of correct responses for the four stimulus patterns, see Fig. 19, (a) for subject GM and (b) for subject KF.

Figure 19: Proportion of correct responses over sessions.

[Proportion of correct responses plotted against session number for the patterns bb, bs, sb and ss; (a) subject GM, (b) subject KF.]

Obviously, the course of the proportions of correct responses does not follow classical learning curves, i.e. there is no more or less monotonic increase of the proportions of correct responses. All one can say is that the proportions for the patterns sb and bs are above those for the patterns bb and ss for both subjects; it seems that the patterns sb and bs can be discriminated more easily than the patterns ss and bb. To find out more about any learning effects one may thus investigate the biplots for each session. Figure 20 shows the biplots for subject GM, who participated in only 5 sessions. In the first session (a), the responses show a strong random component; the χ² for the data from this session is not significant; a similar result holds for the second session, which is not presented here for lack of space. Still, a first systematic element already shows up in this plot, and it remains invariant in the plots for all the following sessions: the patterns bb and bs show up in the left quadrants of the plot, and the patterns ss and sb appear in the right quadrants. So the patterns with a "big" outer disc seem to form one class, and those with a small outer disc another. This means that the outer disc defines the meaning of the first dimension. Further, the patterns with a "small" inner disc appear either in the upper two quadrants ((a) and (b)) or in the lower two quadrants ((c) and (d)), so the sizes of the discs form a basis for the decisions from the first session on. The responses do not show a clear pattern, though. The points for the responses SB and BB are almost identical in the first session, and their positions deviate considerably from the positions of the respective stimulus patterns.



Figure 20: Biplots, Subject GM. (a): session 1, χ² = 5.83, p = .80; (b): session 3, χ² = 15.52, p = .19; (c): session 4, χ² = 17.27, p = .044; (d): session 5, χ² = 28.93, p = .001. See text for further explanation.

[Four biplots with points for stimuli (bb, bs, sb, ss) and responses (BB, BS, SB, SS). Dimension 1 accounts for 65.12, 80.43, 62.63 and 72.97 % of the inertia in panels (a)–(d), Dimension 2 for 34.88, 18.69, 37.33 and 25.80 %, respectively.]

For the third session, (c), the positions of the responses correspond to those of the stimulus patterns at least with respect to the size of the outer disc. The fourth session provides an improved picture, and in the final, fifth session, (d), the subject's decisions have improved such that each stimulus pattern corresponds to one quadrant, and the corresponding response point is situated in the same quadrant. The subject knows that the stimulus patterns are defined by an orthogonal variation of the sizes of the inner and the outer disc and tries to evaluate the stimulus components, i.e. the discs, independently of each other. In other words, the subject evaluates the patterns with respect to two decision variables, not with respect to a single one. Now the scale values are meant to reflect conditional means; each stimulus pattern is thus represented by two such means, one for the distribution of the representation of the inner disc, the other for that of the outer disc. While the representation of the stimuli is relatively stable over the sessions, that of the responses is not; it stabilises over the sessions, though. This may mean that the bounds of the areas for each stimulus have to be learned. In this sense, learning amounts to a reduction of response bias.

A second subject (KF) provided data from altogether 15 sessions; the number of stimulus presentations was again 25 per pattern in each session. Fig. 21 shows the biplots for a selection of the sessions. Subject KF agrees with subject GM insofar as he eventually arrived at a classification scheme that is practically identical with that of GM: the two features, inner and outer disc, are evaluated independently of each other, with the outer disc being the feature that is easier to discriminate. However, in all sessions preceding the fifteenth, KF's decisions appear to be guided by the relation among the disc sizes, and less so by an independent evaluation of their sizes.



Figure 21: Biplots, Subject KF. (a): session 1, χ² = 18.17, p = .029; (b): session 6, χ² = 19.17, p = .021; (c): session 14, χ² = 37.71, p = .000; (d): session 15, χ² = 24.52, p = .000.

[Four biplots with points for stimuli (bb, bs, sb, ss) and responses (BB, BS, SB, SS). Dimension 1 accounts for 89.09, 84.49, 82.58 and 72.94 % of the inertia in panels (a)–(d), Dimension 2 for 8.65, 14.49, 14.75 and 26.84 %, respectively.]

to define the first dimension are sb and bs, explaining about 89 % of the total inertia, i.e.of the total χ2; the patterns ss and bb appear to cluster near sb and generate a seconddimension accounting, however, for only 8.65 % of the inertia. Note that the responsesBB and SB are close to the points representing the corresponding stimuli, while theresponses BS and in particular SS are not so, indicating that the decision bounds forthe inner disc are inappropriately chosen. This may result from the fact that the featureare not evaluated independent of each other.

In the sixth session - (b) - the position of the points representing the stimuli haschanged, the first dimension seems to be defined to some extent by the inner disc. Theresponse points are not too far from the stimulus points, i.e. the subject begins to succeedin finding decision bounds relative to the dimensions that serve to distinguish among thepatterns. In the 14th session the subject’s decisions appear again strongly determinedby the relation among the features, the pattern bs versus the remaining patterns seemsto define the 1st dimension, accounting for about 83 % of the inertia. The outer discappears to define the second dimension, accounting, however, only for maximally 15 % ofthe inertia (a part of the proportions reflects just noise!). Only in the last, 15th sessionthe subject appears to abandon the strategy to evaluate the pattern in a Gestalt-likemanner and concentrate on the features independent of each other.

That the subject KF judged the stimuli according to the relation among the featuresmay be further corroborated by looking at the data from another view point. For instance,Fig. 22 shows the biplots for the sessions 9 - (a) - and 10, (b). In session 9, the first

43

Page 44:  · Correspondence Analysis of Identification Data. Uwe Mortensen, G¨unter Meinhardt Westf¨alische Wilhelms-Universit ¨at M ¨unster Fachbereich Psychologie und Sportwissenschafte

Figure 22: Biplots for the sessions 9 - (a) - and 10 - (b), and ratio of disc diameters ascriterion variable, (c) session 9, (d) session 10

(a) (b)

-0,6 -0,4 -0,2 0,0 0,2 0,4 0,6-0,6

-0,4

-0,2

0,0

0,2

0,4

0,6

SS

bsBSbb

ss

BB

SB

sb

Dim

ensi

on 2

: 2.

84 %

of

iner

tia

Dimension 1: 96.91 % of inertia

3,0 3,1 3,2 3,3 3,4 3,5 3,6

-0,6

-0,4

-0,2

0,0

0,2

0,4

0,6

0,8 bs

bb

ss

sb

CA

sca

le v

alues

U (

Dim

. 1)

Ratio of diameter outer/inner disc

2 = .82 r

(c)

3,0 3,1 3,2 3,3 3,4 3,5 3,6

-0,4

-0,2

0,0

0,2

0,4

0,6

ss

sb

bs

bb

CA

sca

le v

alues

U (

Dim

. 1)

Ratio of diameters outer/inner disc

2 = .99 r

(d)

-0,8 -0,6 -0,4 -0,2 0,0 0,2 0,4 0,6 0,8

-0,8

-0,6

-0,4

-0,2

0,0

0,2

0,4

0,6

0,8

bs

BSss

SS

SB

bb

BB

sb

Dim

ensi

on 2

: 14

.49

% o

f in

erti

a

Dimension 1: 85.49% of inertia

Figure 23: Scale values for the first dimension versus quotients of disc ratios for session 15, subject KF

[Coordinate on Dimension 1 plotted against the ratio of outer to inner disc diameter for the patterns bb, bs, sb and ss; r² = .66.]

In session 9, the first dimension, accounting for the larger part of the inertia (85.5 %), is defined by the patterns bs and sb; only the points for bb contribute a little to the second dimension. This holds to an even larger extent for the 10th session. Figure 22, (c) and (d), shows plots of the scale values of the stimuli versus the quotient of the radii of outer to inner disc, corresponding to the biplots (a) and (b), respectively. For session 10 the fit is practically perfect. So one may argue that subject KF tries to base his decisions on the value of a single variable, amounting to a Gestalt-like evaluation of the patterns. Only in session 15 do the decisions appear to be based on an independent evaluation of the features; Fig. 23 shows the plot of the scale values for the first dimension versus the ratios of the radii, corresponding to the biplot (d) in Fig. 21, and now the value of r² has decreased to .66.

6 Summary and discussion

It was argued that the scale values for the stimuli (row categories) and the responses (column categories) provided by a Correspondence Analysis are linear functions of conditional expectations of random variables representing the sensory activity generated by the stimuli. These random variables may actually be random vectors, the components of which represent different aspects or features of the stimuli with respect to which they may be distinguished. It is assumed that the subject selects the stimulus features such that the number of errors is minimised, relative to the experimental conditions. Stimuli and responses can simultaneously be represented by points in a coordinate system whose axes represent, analogous to the axes in principal component analysis, independent stimulus dimensions; these latent axes correspond to those found in a discriminant analysis of the data. However, care has to be taken with respect to numerical artifacts; for instance, a second dimension may result from the fact that correct responses outnumber the confusions, in which case the horseshoe effect will appear, as in Fig. 7, where it is clear from the start that the data, i.e. the confusion matrix, were generated by a single "latent" dimension. If one has reason to assume that the second dimension does not just reflect numerical artifacts, the Euclidean distance dii′ between points representing stimuli may be taken as a parameter-free sensitivity measure equivalent to the d′ of SDT; if the data suggest that only a single dimension represents a perceptually relevant stimulus dimension, the differences of the scale values on this dimension may be interpreted as indices of discriminability or sensitivity, analogous to the d′-measure. dii′ has the advantage of not requiring the assumption that the underlying random variables are Gaussian with equal variance; the nonlinear relations between stimulus parameters and scale values may actually indicate deviations from the equal-variance case.

The position of the points representing the responses provides information concerning possibly biased confusions of the correct stimulus with other stimuli. The Euclidean distance between stimulus and response points is not explained, though; the relation between stimulus and response points is given by the system of equations (53), or, in explicit form, by (55). For instance, for a certain response, say R1, one has

g11 = (p11/c1) f11/√λ1 + (p21/c1) f21/√λ1 + · · · + (pI1/c1) fI1/√λ1,
g12 = (p11/c1) f12/√λ2 + (p21/c1) f22/√λ2 + · · · + (pI1/c1) fI2/√λ2.

The points for S1 and R1 will be close to each other when g11 ≈ f11 and g12 ≈ f12, and a sufficient condition for this is pi1 ≈ 0 for all i ≠ 1 together with λ1 ≈ λ2. However, as the inspection of, e.g., Table 3 and the corresponding Figure 7 (a) shows, this condition is by no means necessary.
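The transition relation displayed above can be verified numerically for any confusion matrix, assuming the scale values are the CA principal coordinates; here the summed matrix of Table 5 is used.

import numpy as np

N = np.array([[193,  74,   8,   0],      # Table 5
              [ 60, 161,  51,   3],
              [ 14,  92, 134,  31],
              [  4,  15, 102, 154]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
F = U / np.sqrt(r)[:, None] * sv          # stimulus principal coordinates f_ik
G = Vt.T / np.sqrt(c)[:, None] * sv       # response principal coordinates g_jk

K = 2                                     # non-trivial dimensions checked
G_check = (P.T / c[:, None]) @ F[:, :K] / sv[:K]   # g_jk = (1/sqrt(lam_k)) sum_i (p_ij/c_j) f_ik
print(np.allclose(G[:, :K], G_check))     # True up to numerical precision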



The Correspondence Analysis of some experimental data yields results that correspond to the above interpretation of the scale values for stimuli and responses. When the stimuli are "simple" Gabor patches, as described in section 4.2.1, the scale values for the stimuli are practically perfect linear functions of the spatial frequency parameter fi defining the Gabor patch. This suggests that the expected values of the random variables representing the sensory activity are themselves linear functions of the fi.

The situation may become more complicated when the stimuli are two-dimensional, as in section 5.3. There the stimuli were constructed by an orthogonal variation of two components, an "inner" and an "outer" disc. During the first sessions, the subjects seem to try to evaluate the patterns with respect to a single dimension representing a combination of the two components, most likely the ratio between the radii of inner and outer disc. After some sessions, the subjects seem to restructure their decisions, trying to evaluate each dimension separately. The usual learning curves (percent correct responses versus number of trials or sessions) do not reveal such restructuring processes; the biplots may, however, provide some insight into these processes.

The effects of additional, i.e. flanking, patterns on the identification of stimuli may also be explored by investigating the relations between the scale values and the parameters of the stimulus pattern, as in section 4.2.2. Flanking patterns seem to pull apart the scale values of the stimuli; since the scale values are related to the $\chi^2$-values of a confusion matrix, this means that patterns are better discriminated when they are presented together with flanking patterns. For the patterns considered here, discrimination improves with increasing values of the spatial frequency parameter of the flanking Gabor patches, see in particular Figure 14. Most likely the variances of the random variables representing the neural responses are reduced with increasing flanker parameter; more data are required before further interpretations concerning the underlying neuronal processes are suggested.

So far, only data from identification experiments have been considered; the collection of data from discrimination experiments is under way.

7 Appendix

7.1 Decomposition of variance

$$SS_{tot} = \sum_{i=1}^{I}\sum_{k=1}^{n_{i+}}(s_{ik} - \bar{s})^2 = \sum_{i=1}^{I}\sum_{k=1}^{n_{i+}}(s_{ik} - \bar{s}_i + \bar{s}_i - \bar{s})^2$$
$$= \sum_{i=1}^{I}\sum_{k=1}^{n_{i+}}(s_{ik} - \bar{s}_i)^2 + \sum_{i=1}^{I} n_{i+}(\bar{s}_i - \bar{s})^2 - 2\sum_{i=1}^{I}\sum_{k=1}^{n_{i+}}(s_{ik} - \bar{s}_i)(\bar{s}_i - \bar{s})$$
$$= \sum_{i=1}^{I}\sum_{k=1}^{n_{i+}}(s_{ik} - \bar{s}_i)^2 + \sum_{i=1}^{I} n_{i+}(\bar{s}_i - \bar{s})^2 = SS_{wt} + SS_{bt},$$
and $\sum_i\sum_k (s_{ik} - \bar{s}_i)(\bar{s}_i - \bar{s}) = \sum_i(\bar{s}_i - \bar{s})\sum_k(s_{ik} - \bar{s}_i) = 0$, since $\sum_k(s_{ik} - \bar{s}_i) = 0$. □
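A minimal numerical check of this decomposition, with made-up scores $s_{ik}$ and arbitrary group sizes $n_{i+}$ (all values hypothetical), might look as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical scores s_ik: I = 4 rows with unequal counts n_i+
groups = [rng.normal(loc=m, scale=1.0, size=n)
          for m, n in zip([0.0, 1.0, 2.0, 3.0], [10, 15, 12, 8])]

s_all = np.concatenate(groups)
s_bar = s_all.mean()                                              # grand mean

ss_tot = np.sum((s_all - s_bar)**2)                               # total sum of squares
ss_wt = sum(np.sum((g - g.mean())**2) for g in groups)            # within rows
ss_bt = sum(len(g) * (g.mean() - s_bar)**2 for g in groups)       # between rows

print(np.isclose(ss_tot, ss_wt + ss_bt))                          # True: SS_tot = SS_wt + SS_bt
```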


7.2 Proof of Theorem 3.1

Let $\mu$ be a Lagrange multiplier, and
$$Q(a) = a'K'D_{rs}^{-1}Ka - \mu(a'D_{cs}a - S). \qquad (76)$$
Let $\epsilon_j = (0, \dots, 0, 1, 0, \dots, 0)'$ be a vector whose components equal zero except for the $j$-th component, which is equal to 1. Then the vector $a$ yielding a maximum value of $\rho^2$ relative to the value of $SS_{tot}$ is the solution of
$$\frac{\partial Q}{\partial a_j} = \epsilon_j'K'D_{rs}^{-1}Ka + a'K'D_{rs}^{-1}K\epsilon_j - \mu\left(\epsilon_j'D_{cs}a + a'D_{cs}\epsilon_j\right) = 0$$
for all $j$; collecting these equations in vector form yields
$$2K'D_{rs}^{-1}Ka - 2\mu D_{cs}a = 0, \quad\text{i.e.}\quad K'D_{rs}^{-1}Ka = \mu D_{cs}a. \qquad (77)$$
If this equation is multiplied from the left by $D_{cs}^{-1}$ one gets (25).
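Numerically, (77) is a generalised symmetric eigenproblem and can be solved e.g. with scipy. In the following sketch the matrices $K$, $D_{rs}$ and $D_{cs}$ are arbitrary placeholders; only their structure (a general matrix and two positive diagonal matrices) matters for the illustration:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
# Placeholder ingredients: a general matrix K and positive diagonal D_rs, D_cs
K = rng.random((5, 4))
D_rs = np.diag(rng.uniform(0.5, 1.5, size=5))
D_cs = np.diag(rng.uniform(0.5, 1.5, size=4))

M = K.T @ np.linalg.inv(D_rs) @ K       # K' D_rs^{-1} K (symmetric)

# Generalised symmetric eigenproblem  M a = mu D_cs a, cf. (77)
mu, A = eigh(M, D_cs)                   # eigenvalues ascending; columns of A are the vectors a
a = A[:, -1]                            # the vector belonging to the largest mu maximises the quotient

print(np.allclose(M @ a, mu[-1] * (D_cs @ a)))   # True: stationarity condition (77) holds
```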

7.3 Proof of Theorem 3.2

Let $X = (x_{ij})$ be an arbitrary $m \times n$ matrix, i.e. $i = 1, 2, \dots, m$, $j = 1, 2, \dots, n$, $x_{ij} \in \mathbb{R}$, with rank $r \le \min(m, n)$. Let $X_1, X_2, \dots, X_n$ denote the column vectors of $X$. The $X_j$ can be represented as linear combinations of $r$ linearly independent, in particular orthogonal, $m$-dimensional basis vectors; equivalently, the row vectors can be represented as linear combinations of $r$ linearly independent, in particular orthogonal, $n$-dimensional basis vectors. Let $L_1, \dots, L_r$ be $r$ orthogonal, $m$-dimensional vectors. There exist $r$ coefficients $v_{j1}, \dots, v_{jr}$ such that
$$X_j = v_{j1}L_1 + \cdots + v_{jr}L_r, \quad j = 1, 2, \dots, n. \qquad (78)$$
This is equivalent to writing
$$X = UV', \qquad (79)$$
where $U = [L_1, \dots, L_r]$ and $V = (v_{jk})$ is the $n \times r$ matrix of coefficients. From the postulated orthogonality of the $L_k$ one finds immediately
$$X'X = VU'UV' = V\Lambda V', \qquad U'U = \Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_r \end{pmatrix}, \qquad (80)$$
and $\lambda_k$ is the squared length of the basis vector $L_k$. The equation $X'X = V\Lambda V'$ shows that the column vectors of $V$ are the eigenvectors of $X'X$, and the $\lambda_k$ are the corresponding eigenvalues. The roots $\sqrt{\lambda_k}$, i.e. the lengths of the $L_k$, are also known as the singular values of $X$.

Multiplying the $L_k$ by $1/\sqrt{\lambda_k}$ normalises them, i.e. $Q = U\Lambda^{-1/2}$ contains the orthonormal basis vectors, and consequently $U = Q\Lambda^{1/2}$. The eigenvectors of $X'X$ are also known to be orthogonal and, without loss of generality, may be taken to be normalised. Writing $U$ for the matrix of normalised basis vectors, (79) may therefore be rewritten as
$$X = U\Lambda^{1/2}V'. \qquad (81)$$
Further, since $XX' = U\Lambda^{1/2}V'V\Lambda^{1/2}U' = U\Lambda U'$, $U$ must contain the orthogonal eigenvectors of $XX'$ corresponding to the nonzero eigenvalues of $XX'$ (equal to those of $X'X$).
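The decomposition can be checked numerically. The following sketch uses an arbitrary random matrix and numpy's SVD, so the concrete numbers carry no meaning beyond illustrating (80), (81) and the statement about the eigenvectors of $XX'$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))             # arbitrary m x n matrix (here of full column rank)

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
Lam = np.diag(sv**2)                    # Lambda: nonzero eigenvalues of X'X

print(np.allclose(X, U @ np.diag(sv) @ Vt))          # (81): X = U Lambda^{1/2} V'
print(np.allclose(X.T @ X @ Vt.T, Vt.T @ Lam))       # columns of V are eigenvectors of X'X
print(np.allclose(X @ X.T @ U, U @ Lam))             # columns of U are eigenvectors of XX'
```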

7.4 Proof of Theorem 3.3

According to (102), $(D_c^{-1}P')F = G\Lambda^{1/2}$. Multiplication from the left with $D_r^{-1}P$ yields
$$D_r^{-1}P(D_c^{-1}P')F = D_r^{-1}PG\Lambda^{1/2}.$$
But according to (101) one has $D_r^{-1}PG = F\Lambda^{1/2}$, so that
$$(D_r^{-1}PD_c^{-1}P')F = F\Lambda \qquad (82)$$
follows. Therefore the coordinates $F$ are given as the eigenvectors of the matrix $D_r^{-1}PD_c^{-1}P'$. The validity of
$$(D_c^{-1}P'D_r^{-1}P)G = G\Lambda \qquad (83)$$
is shown analogously. □
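The eigen-equations (82) and (83) are easy to verify numerically for a small, hypothetical confusion matrix; the sketch below computes $F$ and $G$ from the SVD of $T$ (as in Theorem 3.5) and checks both relations:

```python
import numpy as np

# Hypothetical confusion counts (rows: stimuli, columns: responses)
N = np.array([[50.,  8.,  2.,  1.],
              [ 7., 44.,  9.,  3.],
              [ 3., 10., 40.,  8.],
              [ 1.,  4.,  9., 47.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

T = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)
U, sv, Vt = np.linalg.svd(T, full_matrices=False)
k = 3                                          # rank of T is at most min(I, J) - 1
F = (np.diag(r**-0.5) @ U[:, :k]) * sv[:k]     # row coordinates
G = (np.diag(c**-0.5) @ Vt.T[:, :k]) * sv[:k]  # column coordinates
Lam = np.diag(sv[:k]**2)

# (82): (D_r^{-1} P D_c^{-1} P') F = F Lambda
print(np.allclose(np.diag(1/r) @ P @ np.diag(1/c) @ P.T @ F, F @ Lam))   # True
# (83): (D_c^{-1} P' D_r^{-1} P) G = G Lambda
print(np.allclose(np.diag(1/c) @ P.T @ np.diag(1/r) @ P @ G, G @ Lam))   # True
```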

7.5 Proof of Theorem 3.4

Proof: $F = D_r^{-1/2}U\Lambda^{1/2}$ implies
$$f_{is} = u_{is}\sqrt{\lambda_s}/\sqrt{r_i},$$
where $u_{is}$ is the element in the $i$-th row and the $s$-th column of $U$. It follows that
$$FV' = D_r^{-1/2}U\Lambda^{1/2}V' = D_r^{-1/2}X.$$
Then
$$\frac{x_{ij}}{\sqrt{r_i}} = \sum_{s=1}^{r} f_{is}v_{js} \quad\text{and}\quad \frac{x_{kj}}{\sqrt{r_k}} = \sum_{s=1}^{r} f_{ks}v_{js},$$
so that
$$\frac{x_{ij}}{\sqrt{r_i}} - \frac{x_{kj}}{\sqrt{r_k}} = \sum_{s=1}^{r}(f_{is} - f_{ks})v_{js}.$$
But
$$\frac{x_{ij}}{\sqrt{r_i}} - \frac{x_{kj}}{\sqrt{r_k}} = \frac{1}{\sqrt{r_i}}\,\frac{p_{ij} - r_ic_j}{\sqrt{r_ic_j}} - \frac{1}{\sqrt{r_k}}\,\frac{p_{kj} - r_kc_j}{\sqrt{r_kc_j}} = \frac{1}{\sqrt{c_j}}\left(\frac{p_{ij}}{r_i} - c_j\right) - \frac{1}{\sqrt{c_j}}\left(\frac{p_{kj}}{r_k} - c_j\right) = \frac{1}{\sqrt{c_j}}\left(\frac{p_{ij}}{r_i} - \frac{p_{kj}}{r_k}\right), \qquad (84)$$
so that
$$\sum_{j=1}^{J}\left(\frac{x_{ij}}{\sqrt{r_i}} - \frac{x_{kj}}{\sqrt{r_k}}\right)^2 = \sum_{j=1}^{J}\frac{1}{c_j}\left(\frac{p_{ij}}{r_i} - \frac{p_{kj}}{r_k}\right)^2 = \delta_{ik}^2 \qquad (85)$$
and therefore
$$\delta_{ik}^2 = \sum_{j=1}^{J}\left(\sum_{s=1}^{r}(f_{is} - f_{ks})v_{js}\right)^2 = \sum_{s=1}^{r}(f_{is} - f_{ks})^2\sum_{j=1}^{J}v_{js}^2 + 2\sum_{s<s'}(f_{is} - f_{ks})(f_{is'} - f_{ks'})\sum_{j=1}^{J}v_{js}v_{js'}. \qquad (86)$$
But
$$2\sum_{s<s'}(f_{is} - f_{ks})(f_{is'} - f_{ks'})\sum_{j=1}^{J}v_{js}v_{js'} = 0,$$
since $\sum_j v_{js}v_{js'} = 0$ because of the orthogonality of the eigenvectors $V_s$ and $V_{s'}$ of $X'X$. Further one has $\sum_j v_{js}^2 = 1$, because the eigenvectors are normalised, i.e. have length 1. Therefore one has on the left of (86) the $\chi^2$-distance between the $i$-th and the $k$-th row, and on the right the corresponding Euclidean distance defined by the coordinates in $F$. □
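The identity between the $\chi^2$-distance of two row profiles and the Euclidean distance of the corresponding points in $F$ can be illustrated with the same kind of small, hypothetical confusion matrix as before:

```python
import numpy as np

# Hypothetical confusion counts as in the previous sketch
N = np.array([[50.,  8.,  2.,  1.],
              [ 7., 44.,  9.,  3.],
              [ 3., 10., 40.,  8.],
              [ 1.,  4.,  9., 47.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

T = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)
U, sv, Vt = np.linalg.svd(T, full_matrices=False)
F = (np.diag(r**-0.5) @ U) * sv            # row coordinates, all dimensions

i, k = 0, 2
delta2 = np.sum((P[i] / r[i] - P[k] / r[k])**2 / c)   # chi-square distance, cf. (85)
d2 = np.sum((F[i] - F[k])**2)                          # squared Euclidean distance in F

print(np.isclose(delta2, d2))                          # True
```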

7.6 Proof of Theorem 3.5

We need the following relations:
$$r'F = 0 \qquad (87)$$
$$c'G = 0. \qquad (88)$$
Proof: Let $\mathbf{1} = (1, 1, \dots, 1)'$; the number of components of $\mathbf{1}$ will depend upon the equation in which it is used. Certainly, one has
$$D_r^{-1}r = \mathbf{1}, \qquad D_c^{-1}c = \mathbf{1}, \qquad (89)$$
which is easily illustrated for the example $J = 2$:
$$\begin{pmatrix} 1/r_1 & 0 \\ 0 & 1/r_2 \end{pmatrix}\begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Further,
$$r'\mathbf{1} = \sum_i r_i = 1, \qquad c'\mathbf{1} = \sum_j c_j = 1. \qquad (90)$$
These equations follow immediately from the definition of the $r_i$ and the $c_j$.

The SVD of $T$ is given by $T = D_r^{-1/2}(P - rc')D_c^{-1/2} = U\Lambda^{1/2}V'$. Therefore
$$P - rc' = D_r^{1/2}U\Lambda^{1/2}V'D_c^{1/2}. \qquad (91)$$
Let
$$A = D_r^{1/2}U, \qquad B = D_c^{1/2}V, \qquad (92)$$
so that (91) can be written in the form
$$P - rc' = A\Lambda^{1/2}B'; \qquad (93)$$
this expression is also known as the generalised SVD of the matrix $P - rc'$.

Since $U$ and $V$ are orthonormal it follows that $U'U = I$, $V'V = I$, with $I$ the identity matrix. But $U = D_r^{-1/2}A$, $V = D_c^{-1/2}B$. Then
$$U'U = A'D_r^{-1}A = I, \qquad V'V = B'D_c^{-1}B = I. \qquad (94)$$
To prove (87) and (88) it should be noted that $F$ may be written in the form $F = D_r^{-1}(P - rc')D_c^{-1}B$: if $A\Lambda^{1/2}B'$ is substituted for $P - rc'$ one gets
$$F = D_r^{-1}A\Lambda^{1/2}B'D_c^{-1}B = D_r^{-1}A\Lambda^{1/2}.$$
On the other hand one has
$$D_r^{-1}(P - rc')D_c^{-1}B = (D_r^{-1}P - D_r^{-1}rc')D_c^{-1}B = (D_r^{-1}P - \mathbf{1}c')D_c^{-1}B.$$
This yields
$$F = (D_r^{-1}P - \mathbf{1}c')D_c^{-1}B \qquad (95)$$
$$G = (D_c^{-1}P' - \mathbf{1}r')D_r^{-1}A, \qquad (96)$$
where the expression for $G$ is derived in an analogous way. Then it follows that
$$r'F = r'(D_r^{-1}P - \mathbf{1}c')D_c^{-1}B.$$
But $r'(D_r^{-1}P - \mathbf{1}c') = \mathbf{1}'P - r'\mathbf{1}c'$, and further one has $\mathbf{1}'P = c'$ and $r'\mathbf{1} = 1$, so that $\mathbf{1}'P - r'\mathbf{1}c' = 0$, according to (90). Therefore it follows that $r'F = 0$; $c'G = 0$ is shown in an analogous way.

The relation between coordinates

Proof: $P - rc' = A\Lambda^{1/2}B'$ implies
$$D_r^{-1}(P - rc')D_c^{-1} = D_r^{-1}A\Lambda^{1/2}B'D_c^{-1}, \qquad (97)$$
or, in transposed form,
$$D_c^{-1}(P' - cr')D_r^{-1} = D_c^{-1}B\Lambda^{1/2}A'D_r^{-1}. \qquad (98)$$
Because of $F = D_r^{-1}A\Lambda^{1/2}$, $G = D_c^{-1}B\Lambda^{1/2}$ this yields
$$D_r^{-1}PD_c^{-1} - D_r^{-1}rc'D_c^{-1} = FB'D_c^{-1}, \qquad (99)$$
and on the other hand
$$D_c^{-1}P'D_r^{-1} - D_c^{-1}cr'D_r^{-1} = GA'D_r^{-1}. \qquad (100)$$
If (99) is multiplied from the right with $B\Lambda^{1/2}$ and (100) with $A\Lambda^{1/2}$, one gets, because of (94),
$$D_r^{-1}PD_c^{-1}B\Lambda^{1/2} - D_r^{-1}rc'D_c^{-1}B\Lambda^{1/2} = F\Lambda^{1/2}$$
$$D_c^{-1}P'D_r^{-1}A\Lambda^{1/2} - D_c^{-1}cr'D_r^{-1}A\Lambda^{1/2} = G\Lambda^{1/2}.$$
But $D_c^{-1}B\Lambda^{1/2} = G$, $D_r^{-1}A\Lambda^{1/2} = F$, and $c'G = 0$, $r'F = 0$, so that
$$D_r^{-1}PG = F\Lambda^{1/2} \qquad (101)$$
$$D_c^{-1}P'F = G\Lambda^{1/2} \qquad (102)$$
follows. Multiplication from the right with $\Lambda^{-1/2}$ yields the equations (54) and (53). □
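The relations (87), (88), (101) and (102) can be verified numerically along the same lines; the confusion counts below are again hypothetical:

```python
import numpy as np

# Hypothetical confusion counts as in the previous sketches
N = np.array([[50.,  8.,  2.,  1.],
              [ 7., 44.,  9.,  3.],
              [ 3., 10., 40.,  8.],
              [ 1.,  4.,  9., 47.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

# T = D_r^{-1/2}(P - rc')D_c^{-1/2} and its SVD
T = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)
U, sv, Vt = np.linalg.svd(T, full_matrices=False)
k = 3
F = (np.diag(r**-0.5) @ U[:, :k]) * sv[:k]     # row coordinates
G = (np.diag(c**-0.5) @ Vt.T[:, :k]) * sv[:k]  # column coordinates

print(np.allclose(r @ F, 0))                                    # (87): r'F = 0
print(np.allclose(c @ G, 0))                                    # (88): c'G = 0
print(np.allclose(np.diag(1/r) @ P @ G, F * sv[:k]))            # (101): D_r^{-1} P G = F Lambda^{1/2}
print(np.allclose(np.diag(1/c) @ P.T @ F, G * sv[:k]))          # (102): D_c^{-1} P' F = G Lambda^{1/2}
```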
