
The Neuropsychological Basis of Perception of Biological Motion

A Dissertation Submitted for the Degree of “Philosophiae doctoris” (PhD) in Neuroscience at the International Graduate School of Neuroscience (IGSN) of the RUHR-UNIVERSITY BOCHUM

by Daniel Jokisch

Supervised by Prof. Dr. Irene Daum and Prof. Dr. Nikolaus F. Troje

October 2004

Printed with permission of the International Graduate School of Neuroscience of the RUHR-UNIVERSITY BOCHUM

First Referee: Prof. Dr. Irene Daum

Second Referee: Prof. Dr. Nikolaus F. Troje

Third Referee: Prof. Emily D. Grossman, PhD

Date of the oral examination: November 30th, 2004

Table of Contents

I General Introduction
II Study 1: Biological Motion as Cue for the Perception of Size
III Study 2: Structural Encoding and Recognition of Biological Motion: Evidence from Event-related Potentials and Source Analysis
IV Study 3: Self Recognition versus Recognition of Others by Biological Motion: Viewpoint-dependent Effects
V Study 4: Differential Involvement of the Cerebellum in Biological and Coherent Motion Detection
VI General Discussion
VII References
List of Partial Publications
Declaration
Acknowledgments
Curriculum Vitae


I General Introduction

Movement patterns of fellow human beings contain a wide variety of information that provides important cues for successful social interaction. The ability to read this information efficiently is therefore of particular relevance to every individual in everyday life. Motion patterns characteristic of living beings are termed biological motion (BM). The importance of this information is not restricted to human social interaction. Throughout the animal kingdom, the movements of conspecifics, predators and prey provide a pivotal source of information, and their correct interpretation plays a major adaptive role.

For any animal, motion is an essential part of the visual environment. The ability to detect animate motion and to react to it adequately is a basic requirement for an animal’s survival and successful reproduction. On the one hand, fast and accurate recognition of the movements of prey or predators, together with anticipation of their future movements, increases an animal’s fitness, because it can then optimally adjust its fight-or-flight reaction. On the other hand, within a given species, successful social interaction with potential partners and communication with rivals require the decoding of a variety of complex social signals mediated by biological motion. Such interaction is a prerequisite for successful reproductive behavior: a potential partner needs to be classified in terms of sex, age, social status and other attributes of biological, social and psychological relevance. Motion patterns are an important source of information in this respect, conveying not only the actions of a conspecific but also its current constitution in terms of physical fitness and emotional state.

The human species is characterized by a highly developed social structure and the most complex communicative behavior of all animate beings. Communication is based on speech as well as on non-verbal cues such as gestures and facial expressions, which are associated with biological motion. A further non-verbal communication channel is the style in which a person moves, which provides a first impression of that person’s emotions and personality traits. This information may influence whether an individual finds someone else sympathetic or even sexually attractive.

To use the rich information associated with BM efficiently and to produce appropriate behavior, the mammalian brain must be able to perceive this visual information, decode its meaning and derive correct conclusions from it. The remarkable ability to perceive BM is evolutionarily very old and is assumed to provide the basis for a number of higher cognitive functions. This special relevance motivates further efforts to elucidate its neuropsychological basis and to investigate the interplay between perception and action with respect to meaningful movements.

The following paragraphs describe the state of the art in research on the perception of biological motion. First, the work of Gunnar Johansson, who initiated research on this perceptual phenomenon with his innovative studies, is described. Next, the large body of psychophysical literature investigating the information content in the kinematics of movement patterns is summarized. The subsequent paragraphs highlight the specificity of BM by distinguishing between the categories of BM and non-BM. The next major section of this chapter links BM perception to its underlying neural machinery. It starts with a brief overview of the foundations of the human visual system and its division into the dorsal and ventral visual streams, which perform different aspects of visual analysis. Within this framework, results from electrophysiological, imaging and computational studies are discussed in detail in order to give an extensive overview of recent findings and current developments in research on the neural basis of BM perception. The following section emphasizes the functional significance of BM perception and its associated neural machinery for higher cognitive functions, among them social perception, action understanding, speech perception and theory of mind. This section shows that BM perception is not an isolated high-level visual phenomenon but rather an important perceptual ability relevant to many high-level cognitive functions. Finally, the last section of this chapter specifies the objectives of the current work and describes its relationship to previous studies.

1.1 Theoretical background

In this section, the theoretical background and the empirical evidence for the perception of BM are described.

1.1.1 The concept of biological motion and psychophysical findings

About 30 years ago, the Swedish psychologist Gunnar Johansson introduced the concept of BM to experimental psychology (Johansson, 1973). He defined BM as “motion patterns characteristic of living organisms in locomotion”. In everyday perception, the visual information from BM is coupled with other sources of information, such as the form or shape of animate objects. To isolate the contribution of animate motion under laboratory conditions, information from BM must be separated from these other sources. Johansson designed a new type of visual stimulus, the point-light display, which fulfilled this requirement. By showing the positions of the main joints of a walking person as bright dots against a dark background, he generated the vivid impression of a human figure in motion. These displays demonstrated the compelling power of perceptual organization from the motion of only a few points: observers need only 100-200 ms to organize such displays into a coherent percept (Johansson, 1976).

Since then, a large number of studies have used Johansson’s point-light displays as stimulus material. They have demonstrated that BM perception goes far beyond the ability to recognize a set of moving dots as a human walker. The rudimentary information contained in point-light displays of BM is sufficient to solve even sophisticated recognition tasks. Observers are able to recognize the gender of a walking person (Barclay, Cutting, & Kozlowski, 1978; Cutting, 1978; Kozlowski & Cutting, 1977; Mather & Murdoch, 1994; Troje, 2002a), can identify friends by their gait (Cutting & Kozlowski, 1977; Troje, Westhoff, & Lavrov, in press), and can recognize themselves from a point-light display of their own movements (Beardsworth & Buckner, 1981). From BM it is even possible to infer the emotional state of an actor (Dittrich, Troscianko, Lea, & Morgan, 1996; Pollick, Paterson, Bruderlin, & Sanford, 2001). Nor is the ability to perceive BM restricted to human movements: Mather and West (1993) extended the point-light display paradigm to animations of four-legged animals and showed that human observers can identify different animals.

BM perception is strongly orientation-dependent: recognition performance decreases if the stimuli are rotated away from their normal upright orientation. A number of studies have shown that inversion of point-light displays impairs both the detection of an actor and the recognition of actions, emotion, identity and several other attributes (Bertenthal, Proffitt, & Kramer, 1987; Dittrich, 1993; Pavlova & Sokolov, 2000; Shipley, 2003; Sumi, 1984). These findings resemble those from face perception (Thompson, 1980; Valentine, 1988). Both classes of visual stimuli have in common that they contain sophisticated information for social recognition and communication. Such orientation effects for faces and BM stimuli depend on the orientation of the stimulus relative to the observer (Troje, 2003).

Standard BM stimuli, as used in the experiments described above, consist of an animation of dots attached to the joints of a moving figure. As a result, the animation contains information both about the positions of the joints and about the motion of these points over time. Beintema and Lappe (2002) developed a BM display in which the dots were located not on the joints but at random positions between the joints. Using a limited-lifetime technique, each dot was reallocated after a short time to another position on the limbs in order to strongly degrade local motion signals. Observers were still able to recognize a human figure in these displays, indicating that, in addition to local motion, dynamic form information about body posture also contributes to BM perception.
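The limited-lifetime manipulation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the stimulus code of Beintema and Lappe (2002): the limbs are assumed to be available in every frame as straight segments between joint positions (for example, derived from motion-capture data), each dot is attached to a random limb at a random relative position, follows that limb for `lifetime` frames and is then reallocated. The function names and parameters are hypothetical.

```python
import random


def point_on_segment(segment, t):
    """Return the point at relative position t (0 = first joint, 1 = second joint)."""
    (x1, y1), (x2, y2) = segment
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def limited_lifetime_frames(limb_frames, n_dots, lifetime, seed=0):
    """Hypothetical sketch of a limited-lifetime point-light stimulus.

    limb_frames[f] is the list of limb segments ((x1, y1), (x2, y2)) in frame f,
    with the limbs in the same order in every frame. Each dot is attached to a
    random limb at a random relative position, moves with that limb for
    `lifetime` frames and is then reallocated to a new random limb and position.
    Returns a list of frames, each a list of (x, y) dot positions.
    """
    rng = random.Random(seed)
    n_limbs = len(limb_frames[0])
    # Each dot is [limb index, relative position t, age in frames]; initial ages
    # are staggered so that the dots are not all reallocated on the same frame.
    dots = [[rng.randrange(n_limbs), rng.random(), rng.randrange(lifetime)]
            for _ in range(n_dots)]
    frames = []
    for segments in limb_frames:
        frame = []
        for dot in dots:
            if dot[2] >= lifetime:
                # Lifetime expired: reallocate the dot to a new limb and position.
                dot[0], dot[1], dot[2] = rng.randrange(n_limbs), rng.random(), 0
            limb, t, _ = dot
            frame.append(point_on_segment(segments[limb], t))
            dot[2] += 1
        frames.append(frame)
    return frames
```

Because no dot stays on the same part of a limb for long, the trajectory of a single dot carries little information about any one joint, whereas the set of dots still outlines the body posture in every frame. With `lifetime=1`, every dot is reallocated on every frame after the first.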

Both top-down and bottom-up mechanisms contribute to the perceptual analysis of BM. In the early days of BM research, a bottom-up or low-level processing explanation was favored. This view goes back to Johansson’s original approach, in which he considered BM processing from the perspective of visual vector analysis (Johansson, 1973, 1976). He proposed that mathematically lawful spatio-temporal relations in early visual patterns are extracted automatically. This perspective was supported by early bottom-up computational models (Hoffman & Flinchbaugh, 1982; Webb & Aggarwal, 1982) and by experimental findings (Mather, Radford, & West, 1992). Further support for a contribution of bottom-up processing was provided by Thornton and Vuong (2004). By showing that task-irrelevant point-light displays of BM cannot be ignored and are processed to a level at which they influence behavior, their results provide direct evidence that complex dynamic patterns can be processed incidentally.
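Johansson’s vector-analysis idea is often illustrated by splitting the motion of the dots into a component shared by all of them and residual, relative components. The sketch below makes the simplifying assumption that the common component is the mean frame-to-frame displacement of all dots; this is one illustrative reading rather than Johansson’s formal model, and the function name is hypothetical. For a point-light walker crossing the screen, the common component would roughly capture the translation of the whole body, while the relative components would carry the pendular motions of the limbs.

```python
def decompose_motion(displacements):
    """Split per-dot displacement vectors (dx, dy) into a common component
    (the mean displacement of all dots) and relative components (each dot's
    displacement minus the common component)."""
    n = len(displacements)
    common_x = sum(dx for dx, _ in displacements) / n
    common_y = sum(dy for _, dy in displacements) / n
    relative = [(dx - common_x, dy - common_y) for dx, dy in displacements]
    return (common_x, common_y), relative


common, relative = decompose_motion([(2.0, 0.0), (4.0, 0.0), (3.0, 1.0), (3.0, -1.0)])
print(common)    # (3.0, 0.0)
print(relative)  # [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
```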

Later, several psychophysical studies indicated that top-down processes also play an important role in BM perception. Bertenthal and Pinto (1994) suggested that the perception of the global form specified by BM precedes the perception of the individual elements or their local relations. This conclusion was drawn from an experimental paradigm in which complex masking elements were used to render low-level constraints uninformative. Additional support for a contribution of top-down mechanisms comes from the finding that stereoscopic depth cues that conflict with the depicted point-light walkers do not affect the perception of these walkers (Bülthoff, Bülthoff, & Sinha, 1998). This result was attributed to a top-down, recognition-based influence.

Support for top-down influences on BM perception also stems from experiments that varied the temporal characteristics of BM stimuli. Apparent motion is perceived when static objects are presented sequentially at different spatial locations. When observers are shown sequential static images of an inanimate object in different positions, the object is perceived as moving along the shortest, most direct path, regardless of whether this path is physically possible. The perception of apparent motion differs when the object presented is a human figure. When observers viewed apparent-motion sequences of a human body, they perceived either a biomechanically impossible direct path or a biomechanically possible indirect path, depending on the time interval between the images (Shiffrar & Freyd, 1990, 1993). If the interval matched the time required to perform the movement along the biomechanically possible path, this realistic path was perceived. From this result it was concluded that the perception of human movement is constrained by the observer’s knowledge of, or experience with, the biomechanical properties of his or her own body.
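The timing dependence reported by Shiffrar and Freyd can be summarized as a toy decision rule. This is a deliberately simplified, hypothetical sketch rather than a model from those studies: it assumes that the biomechanically possible, indirect path has a known length and that the limb moves at a fixed speed, and it predicts the indirect path whenever the interval between the two images is at least as long as the time needed to travel that path. Observers’ reports in such experiments are probabilistic, so the sharp threshold used here is only an idealization.

```python
def predicted_path(soa_ms, indirect_path_length_cm, limb_speed_cm_per_s):
    """Toy rule (hypothetical): predict which apparent-motion path is perceived.

    Returns "indirect" (biomechanically possible) if the stimulus onset
    asynchrony leaves at least enough time to move along the indirect path at
    the assumed limb speed, and "direct" (shortest, impossible) otherwise.
    """
    required_ms = 1000.0 * indirect_path_length_cm / limb_speed_cm_per_s
    return "indirect" if soa_ms >= required_ms else "direct"


# Illustrative values only: a 30 cm indirect path at 100 cm/s needs 300 ms.
print(predicted_path(150, 30, 100))  # direct
print(predicted_path(400, 30, 100))  # indirect
```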

Given the contributions of both top-down and bottom-up processing to BM perception, the question of the role of visual attention arises. Attentional effects in BM processing were explored by Cavanagh, Labianca and Thornton (2001) and by Thornton, Rensink and Shiffrar (2002). Cavanagh and colleagues (2001) showed that discriminating specific features of point-light displays of BM appears to be a serial process, since reaction times increased with the number of items in the display. This increase was attributed to the increasing attentional demands of the task. Results from a dual-task paradigm exploring the role of attention in BM processing (Thornton et al., 2002) suggested that BM perception can, in some cases, be automatic. If, however, strategies operating in a global, top-down fashion are required, attention plays a vital role.
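The serial-processing interpretation rests on the slope of reaction time against the number of items in the display. A minimal sketch of that computation, assuming that mean reaction times per set size are already available, is an ordinary least-squares fit; the function name and the numbers in the example are hypothetical and are not data from Cavanagh, Labianca and Thornton (2001). By convention in visual search research, a slope near zero is taken to indicate parallel processing, whereas a clearly positive slope is taken to indicate serial, attention-demanding processing.

```python
def search_slope(set_sizes, mean_rts_ms):
    """Least-squares fit of mean reaction time against set size.

    Returns (slope in ms per item, intercept in ms).
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts_ms))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical reaction times that rise by 50 ms with every additional item.
print(search_slope([2, 4, 6, 8], [600, 700, 800, 900]))  # (50.0, 500.0)
```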

1.1.2 Specificity of biological motion

Taken together, the behavioral findings indicate that the motion of living beings has special features. In the absence of any other cue, motion can convey detailed and specific information about what other organisms are doing. The information contained in BM can be organized into three hierarchical levels. At the lowest level, humans can detect whether an object is animate: the movements of inanimate objects are driven by external forces, whereas animate objects are usually self-propelled. At the next level, humans can detect agency, since the movements of agents are defined by their goals. At the highest level, humans can detect intentionality, since the movements of intentional agents are determined by their beliefs and desires.

On closer inspection of the physical properties of BM, the fact that BM is self-propelled does not sufficiently explain its specificity. Cars, aircraft and trains, for instance, share this property but clearly do not belong to the category of animate objects. Moreover, it has been shown that the brain processes self-propelled, man-made objects differently from animate objects (Caramazza & Shelton, 1998). An interesting approach stresses the relevance of the direct influence of the constant force of gravity on BM (Alexander, 1989). Gravity determines one special feature of BM: the periodic dissipation of energy against gravity. This characteristic motion pattern, created by the influence of gravity, may be the crucial feature that the mammalian visual system uses to detect BM and to process it as a special category.

1.1.3 Development of a neural machinery for the perception of biological motion

The information channel for the analysis of BM is assumed to be evolutionarily very old. This assumption is based on the particular relevance of this ability in the animal kingdom and is further supported by experimental evidence that a number of different species (cats, pigeons, macaques, chicks and quail) can perceive BM in Johansson-like displays (Blake, 1993; Dittrich, Lea, Barrett, & Gurr, 1998; Oram & Perrett, 1994; Regolin, Tommasi, & Vallortigara, 1999; Yamaguchi & Fujita, 1999). Given its evolutionary importance, an interesting question is whether the ability to perceive BM and the underlying neural connections are innate or acquired through learning. Strong support for the notion that this ability is innate comes from the finding that newly hatched chicks prefer point-light displays of chicks to other visual stimuli (Regolin et al., 1999; Yamaguchi & Fujita, 1999). In humans, the ability to process point-light displays of BM has already developed within the first few months of life (Bertenthal, Proffitt, Spetner, & Thomas, 1985; Fox & McDaniel, 1982). This finding does not, however, rule out the possibility that the ability to perceive BM is innate.

1.1.4 Pathways of the human visual system

Before the evidence from neurophysiological and imaging studies on the neural substrates of BM perception is discussed, an overview of the brain mechanisms and neural structures underlying human motion and form perception is given. Because of space limitations, this section cannot describe all aspects of the visual system in detail. It does, however, provide the general framework into which the experimental findings reported in the following sections can be integrated. The main focus is on the two extrastriate visual processing streams (Ungerleider & Mishkin, 1982), the dorsal and the ventral pathway, and on their significance for the perceptual mechanisms contributing to BM processing. This brief overview is based on the description in Kandel, Schwartz and Jessell (2000).

The primary visual cortex (area V1), the first neocortical structure to analyze visual information, receives input from the magnocellular (M) and the parvocellular (P) pathways of the retina. The M pathway is more sensitive to stimuli with lower spatial and higher temporal frequencies. The P pathway, in contrast, is essential for color vision and is particularly sensitive to stimuli with higher spatial and lower temporal frequencies. The M and P pathways pass from the retina through the magnocellular and parvocellular layers, respectively, of the lateral geniculate nucleus to the input layer of the primary visual cortex. From there, they feed into two parallel extrastriate pathways extending through the cerebral cortex. The dorsal pathway extends from V1 through the middle temporal area (MT) and the medial superior temporal area (MST) to the posterior parietal cortex. The ventral pathway extends from V1 through V4 to the inferior temporal cortex. Whereas the parietal pathway appears to receive primarily magnocellular input, the inferior temporal pathway depends on both P and M input.

The idea of two separate processing streams was put forward by Ungerleider and Mishkin (1982). Having found specific visual deficits after selective lesions of the temporal and parietal cortex, they suggested that the dorsal processing stream subserves spatial perception and object localization (“where” system), whereas the ventral processing stream underlies object perception and recognition (“what” system). This view was modified by Goodale and Milner (1992), who confirmed the distinction between the dorsal and the ventral pathway in human and nonhuman primates but proposed a new interpretation of the functional roles of the two pathways. According to their view, both pathways use the same visual information in different ways: the dorsal pathway uses it to guide action (“how” system), whereas the ventral pathway uses it for conscious perception and object recognition. There is strong evidence that processing in these two cortical pathways is hierarchical. Each level sends strong projections to the next level and receives back projections from higher levels, and the type of processing changes systematically from one level to the next with increasing complexity.

1.1.4.1 Dorsal path

The dorsal pathway is critically involved in the perception of location and movement and plays an important role in the control of eye and hand movements. Its first processing step is performed in area V1. Many cells in this area are direction selective: they respond to motion in one direction, whereas motion in the opposite direction does not elicit a response. In monkeys, area MT is devoted to motion processing, since almost all of its cells are direction selective. The receptive fields of MT cells are about ten times wider than those of the V1 cells that project to MT. Neurons in area MT respond to moving bars defined by differences in luminance, texture or color. Area MST, which is adjacent to MT, also contains neurons that respond to visual motion. These neurons are assumed to process a specific type of global motion in the visual field called optic flow, that is, the motion of the visual field that results from an observer’s own movement. Optic flow therefore plays a major role in guiding a person’s movement through the environment. Neurons in area MST have receptive fields that cover a large part of the visual field and respond preferentially to large-field motion. In addition, these neurons are sensitive to shifts in the origin of full-field motion and to differences in speed between the center and the periphery of the field. Another area involved in the perception of optic flow is the superior temporal polysensory area (STP).
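The sensitivity of MST neurons to the origin of full-field motion can be made concrete with the standard pinhole-camera description of optic flow. Under the simplifying assumptions of pure forward translation (no eye rotation), a focal length of 1 and a fronto-parallel surface at a single distance, the image velocity at every point is directed radially away from a focus of expansion that marks the heading direction; shifting the heading shifts this origin. The sketch below generates such a flow field and recovers the focus of expansion by least squares. It illustrates the geometry of the stimulus only and is not a model of MST; all names are hypothetical.

```python
def translational_flow(points, heading, forward_speed, depth):
    """Image velocities (u, v) for an observer translating toward a
    fronto-parallel surface at distance `depth` (normalized image coordinates,
    focal length 1, no eye rotation). `heading` = (x0, y0) is the focus of
    expansion; every flow vector points radially away from it."""
    x0, y0 = heading
    k = forward_speed / depth
    return [((x - x0) * k, (y - y0) * k) for x, y in points]


def estimate_heading(points, flows):
    """Least-squares focus of expansion: the point that lies closest to all
    lines drawn through each image point along its flow vector (each line is
    weighted by its squared flow magnitude)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        nx, ny = -v, u           # normal to the flow direction
        c = nx * x + ny * y      # the line satisfies nx * X + ny * Y = c
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


grid = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0), (0.5, 0.0)]
flow = translational_flow(grid, heading=(0.2, -0.1), forward_speed=2.0, depth=4.0)
print(estimate_heading(grid, flow))  # approximately (0.2, -0.1)
```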

1.1.4.2 Ventral path

The ventral processing stream extends from V1 through V2 to V4 and then to the inferior temporal cortex. As in area V1, neurons in area V2 are sensitive to stimulus orientation, color and binocular disparity. These cells extend the analysis of contours initiated in area V1. V2 neurons analyze contours at a more abstract level than V1 neurons, since they respond not only to real but also to illusory contours. Neurons in area V4 respond to color and to more complex forms than neurons in earlier areas. The recognition of complex forms is related to processes in the inferior temporal cortex, whose most important visual input comes from area V4. The inferior temporal cortex consists of two major regions, areas TEO and TE. The receptive fields of neurons in area TEO are generally larger than those of V4 neurons but smaller than those of neurons in area TE. TEO neurons receive their primary input from area V4 and send their primary output to area TE. The neural coding of visual object features in area TEO is therefore more global than in area V4 but not as global as in area TE.

1.1.5 Neural machinery contributing to the perception of biological motion

The behavioral findings demonstrate the high degree of specificity of BM processing. They suggest a highly developed neural machinery dedicated to the perceptual analysis of BM, reflecting the relevance of the information it carries. This assumption has stimulated much neurophysiological, imaging and neuropsychological research on the neural basis of BM perception in recent years.

1.1.5.1 Electrophysiological studies of biological motion perception in non-humans

A few electrophysiological studies have investigated the recognition of BM in macaque monkeys (Jellema & Perrett, 2003a, 2003b; Oram & Perrett, 1994, 1996). Some neurons in the superior temporal polysensory area (STP) respond selectively to full-body or hand movements. Single-cell recordings in area STP revealed neurons that responded selectively to the sight of whole-body movements as well as to point-light displays of BM (Oram & Perrett, 1994). Some of these neurons are view-dependent: their response decreases substantially when the stimulus is presented from a viewing angle other than the neuron’s preferred view. Many cells in the area around STP have multimodal properties, integrating information about the form and the motion of animate objects (Oram & Perrett, 1996). Area STP is located in the vicinity of the superior temporal sulcus. In the following, the term STS-complex refers to the area around the superior temporal sulcus.
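View dependence of this kind is often summarized by a tuning curve over viewing angle. As a purely illustrative sketch, assuming a Gaussian fall-off with the angular distance from the preferred view (the Gaussian shape, the 45-degree width and the parameter names are assumptions, not values reported by Oram and Perrett), the response of such a cell could be modelled as follows. The angular difference is wrapped so that, for example, views of 350 and 10 degrees are treated as 20 degrees apart.

```python
import math


def view_tuned_response(view_deg, preferred_deg, r_max=1.0, width_deg=45.0, baseline=0.0):
    """Hypothetical view-tuned cell: Gaussian fall-off of the response with the
    angular distance between the presented and the preferred view."""
    # Signed angular difference wrapped into [-180, 180) degrees.
    diff = (view_deg - preferred_deg + 180.0) % 360.0 - 180.0
    return baseline + r_max * math.exp(-0.5 * (diff / width_deg) ** 2)


print(view_tuned_response(0, 0))             # 1.0 at the preferred view
print(round(view_tuned_response(90, 0), 3))  # 0.135 for a view rotated by 90 degrees
```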

It has recently been suggested that the neural representation of actual BM in the STS-complex may also extend to BM implied by articulated static postures that form the end point of an action (Jellema & Perrett, 2003a). Neural responses to face and body postures in the STS-complex can be influenced by perceptual history, that is, by the immediately preceding actions (Jellema & Perrett, 2003b). Such a mechanism could support the formation of expectations about the impending behavior of others.

Another set of action-selective neurons has been found in the monkey premotor cortex (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Rizzolatti, Fogassi, & Gallese, 2001). These neurons were termed “mirror neurons” because they respond both when the monkey performs an action and when it observes the same action. Like neurons in the STS-complex, mirror neurons often have multimodal properties (Kohler et al., 2002). The functional role of this mirror neuron system is described in a separate section.

1.1.5.2 Findings from electrophysiological studies in humans

Electrophysiological techniques such as electroencephalography (EEG) and event-related potentials (ERPs) measure brain activity with very high temporal resolution but, compared with functional imaging techniques, have the disadvantage of low spatial resolution. Two different approaches are currently used in electrophysiological studies of BM and action perception in humans. The first approach records electrical brain activity in response to body movements shown in full view: the display initially stands still, and movement onset occurs after a delay. The second approach uses point-light displays of BM as stimulus material.
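ERPs are obtained by averaging many EEG epochs that are time-locked to the same event, such as stimulus or movement onset; activity that is not time-locked to the event tends to cancel out in the average. The sketch below shows this averaging step and a simple peak-latency measure for a negative component such as the N170. It assumes artifact-free, baseline-corrected epochs of equal length sampled at a fixed interval; the function names are hypothetical and no particular EEG toolbox is implied.

```python
def erp_average(epochs):
    """Average equal-length, time-locked epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def peak_latency(erp, t0_ms, dt_ms, window_ms, polarity=-1):
    """Latency (ms) of the most negative (polarity=-1) or most positive
    (polarity=+1) sample of `erp` within window_ms = (start, end).
    Sample i is assumed to occur at t0_ms + i * dt_ms."""
    start, end = window_ms
    best_t, best_val = None, None
    for i, value in enumerate(erp):
        t = t0_ms + i * dt_ms
        if start <= t <= end and (best_val is None or polarity * value > polarity * best_val):
            best_t, best_val = t, value
    return best_t


# Two toy epochs sampled every 50 ms from stimulus onset (0 ms).
erp = erp_average([[0.0, -1.0, -3.0, -1.0], [0.0, -3.0, -5.0, -1.0]])
print(erp)                                 # [0.0, -2.0, -4.0, -1.0]
print(peak_latency(erp, 0, 50, (0, 150)))  # 100
```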

Using the first approach, Puce, Smith and Allison (2000) observed ERP responses to the onset of mouth and eye movements within 200 ms after motion onset. Facial movements on a continuously present face elicited different N170 amplitudes for mouth opening versus closing and for averted gaze versus gaze directed at the observer. Similar results were found for the observation of whole-body actions of others (Wheaton, Pipingas, Silberstein, & Puce, 2001): ERPs elicited by movement onset in movie sequences of stepping, hand closing and opening, and mouth opening and closing were selective for specific hand and body motions.

An fMRI and ERP study of the visual processing of natural and line-drawing displays of moving faces (Puce et al., 2003) supports the notion that the temporal lobe integrates facial form and motion in humans. The STS and the fusiform gyrus responded selectively to both types of face stimuli, which also evoked larger ERPs than control stimuli at around 200 ms after motion onset. Puce and Perrett (2003) recently concluded that a specialized visual mechanism exists in the STS-complex of both humans and non-human primates that produces selective neural responses to moving natural images of faces and bodies. Presenting displays with delayed motion onset, as in the first approach, makes it possible to separate stimulus onset from movement onset. However, such displays lack typical features that make up the specificity of BM. For instance, the specific style of a movement is not captured. This information is carried purely by the dynamics of the movement and cannot be conveyed by displays of apparent motion. As a consequence, the contribution of information about the smoothness or intensity of a movement is underestimated in this paradigm.

Hirai, Fukushima and Hiraki (2003) used the second approach to clarify the neural dynamics of BM perception by comparing ERPs elicited by point-light displays of BM and of scrambled motion. They report that both types of stimuli elicited peaks at around 200 and 240 ms, which were larger in the BM condition than in the scrambled-motion condition. A recently published study (Pavlova, Lutzenberger, Sokolov, & Birbaumer, 2004) analyzed gamma MEG activity in response to computer-generated BM. Recognizable upright and non-recognizable inverted walkers evoked enhanced oscillatory gamma activity (25-30 Hz) over the left occipital cortex as early as 100 ms after stimulus onset. Upright BM elicited further gamma responses over the parietal (130 ms) and right temporal (170 ms) lobes.
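Scrambled point-light displays, such as the control condition in the study by Hirai, Fukushima and Hiraki (2003), are commonly constructed by keeping each dot’s own trajectory while randomizing where that trajectory is placed, so that local motion is preserved but the global body configuration is destroyed. A minimal sketch of this construction follows; the uniform placement within a `width` by `height` field and the function name are assumptions for illustration, not the procedure of that particular study.

```python
import random


def scramble(trajectories, width, height, seed=0):
    """Scrambled control stimulus (hypothetical sketch).

    trajectories[d][f] is the (x, y) position of dot d in frame f. Each dot keeps
    its own frame-to-frame motion, but its whole trajectory is shifted so that it
    starts at a random location in a width x height field, which destroys the
    spatial configuration of the body.
    """
    rng = random.Random(seed)
    scrambled = []
    for trajectory in trajectories:
        x_start, y_start = trajectory[0]
        dx = rng.uniform(0.0, width) - x_start
        dy = rng.uniform(0.0, height) - y_start
        scrambled.append([(x + dx, y + dy) for x, y in trajectory])
    return scrambled
```

Because only the placement of each trajectory changes, the frame-to-frame motion of every dot is identical in the intact and the scrambled display, so differences between the two conditions cannot be attributed to local motion alone.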

1.1.5.3 Findings from functional neuroimaging studies in humans

Neuroimaging research has demonstrated that viewing BM engages a specific region around the superior temporal sulcus (STS), often termed the STS-complex. The first suggestion of a specialized mechanism dedicated to BM perception came from an fMRI study examining the properties of the V5-complex (Howard et al., 1996), an area specialized for visual motion perception. One of the stimulus categories used in this experiment was point-light displays of BM. These stimuli evoked activity in the V5-complex as well as in areas of the superior temporal cortex. The latter finding was unexpected, since this part of the superior temporal cortex had been assumed to belong to the auditory cortex and to be activated mainly by the perception of speech. In the following years, several functional imaging studies using fMRI or PET presented point-light displays of BM and used different types of control stimuli, such as scrambled motion, coherent motion, rotating objects or static objects (Bonda, Petrides, Ostry, & Evans, 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos, Osu, Santi, & Kawato, 2002; Vaina, Solomon, Chowdhury, Sinha, & Belliveau, 2001). These studies report selective activation of the STS in response to BM. In addition to the STS, BM-specific activation has also been found in the cerebellum (Grossman et al., 2000; Vaina et al., 2001), area VP (Servos et al., 2002), the amygdala (Bonda et al., 1996), the occipital and fusiform face areas (Grossman & Blake, 2002) and the premotor cortex (Santi, Servos, Vatikiotis-Bateson, Kuratate, & Munhall, 2003; Saygin, Wilson, Hagler, Bates, & Sereno, 2004).

BM processing appears to be lateralized: irrespective of the visual field in which the display is presented, activity is more pronounced in the right STS-complex (Grezes et al., 2001; Grossman et al., 2000). Moreover, even inverted or imagined BM is sufficient to induce activity in the STS-complex, although the activity level in these conditions is lower than during actual viewing of BM animations (Grossman & Blake, 2001). It has recently been shown that point-light displays of BM also activate the premotor cortex (Saygin et al., 2004). This finding is consistent with the mirror neuron theory (Rizzolatti, Fadiga, Gallese, & Fogassi, 1996), which is described in detail in a separate section.

Experimental evidence for the specificity of the STS-region to the processing of BM

was provided by a study comparing BM with meaningful and coordinated non-BM


such as the pendulum movements of a grandfather clock. Activity in the STS-region

was not induced by meaningful and coordinated non-BM (Pelphrey et al., 2003). A

dissociation between visual processing of moving humans and moving manipulable

objects was also supported by the findings of Beauchamp and colleagues (Beauchamp,

Lee, Haxby, & Martin, 2003). They found STS activity in response to point-light and video displays of humans, whereas point-light and video displays of tools evoked activity in the middle temporal gyrus.

1.1.5.4 Findings from lesion studies in humans

Results from imaging studies are consistent with neuropsychological findings in

neurological patients suffering from focal brain lesions (Cowey & Vaina, 2000;

McLeod, Dittrich, Driver, Perrett, & Zihl, 1996; Schenk & Zihl, 1997; Vaina, 1994;

Vaina, Lemay, Bienfang, Choi, & Nakayama, 1990). These case studies provide

evidence for a dissociation between mechanisms involved in the perception of BM on

the one hand, and mechanisms involved in inanimate visual motion tasks or static

object recognition tasks on the other hand.

Patients with bilateral lesions involving the posterior visual pathways such as patients

LM (McLeod et al., 1996) and AF (Vaina, Lemay, Bienfang, Choi, & Nakayama, 1990)

showed severe deficits in visual motion perception (seeing coherent motion in random

noise, speed discrimination) but could nevertheless recognize human action patterns

presented as point-light displays. Patients with bilateral ventral lesions involving the

posterior temporal lobes, such as patient EW (Vaina, 1994), who suffered from

prosopagnosia and object agnosia, could also identify BM in point-light animations. A

different pattern emerged in patient AL (Cowey & Vaina, 2000) who is hemianopic and

suffers from visual perceptual impairments in her seeing hemifield resulting from an

additional lesion in ventral extrastriate cortex. AL fails to recognize BM displays

despite intact static form perception and motion detection. Based on the investigation of

a sample of 39 patients with acquired brain damage, it was concluded that deficits in

perception of BM are not caused by impairments of basic visual motion or form

perception. They are a consequence of damage to structures involved in the combined

analysis of visual motion and form information (Schenk & Zihl, 1997).


Perception of BM also seems to be affected in patients suffering from lesions in the

parietal cortex (Battelli, Cavanagh, & Thornton, 2003). Patients could easily perform a

classical form-from-motion task but were severely impaired in a visual search task

using BM sequences. The authors hypothesized that attentional deficits impair the integration process that links the traces of the single, unconnected dots into a global percept of a human walker.

1.1.6 Computational simulations modeling neural mechanisms of

biological motion perception

According to a neural model based on previously reported findings from

psychophysical, neurophysiological and imaging studies, both the dorsal and ventral

processing streams contribute to the perceptual analysis of BM (Giese & Poggio,

2003). This computational model suggests a learning-based, feedforward mechanism

and provides a neurophysiologically plausible explanation for many of the key

experimental findings described in the previous paragraphs. For this reason, the model is explained in more detail in the following paragraphs. The computational model makes

four assumptions: First, it is divided into two parallel processing streams analogous to

the ventral and dorsal visual stream described previously. Second, both pathways

consist of “neural feature detectors” in a hierarchical order that extract form or optical

flow features. Third, the model assumes that the hierarchy is predominantly

feedforward. Fourth, the representation of BM is based on a set of learned patterns that

are encoded as snapshots of body postures in the form pathway and by sequences of

complex optic flow patterns in the motion pathway.

1.1.6.1 The form pathway in the model

The form pathway recognizes BM by extracting the form information contained in

individual snapshots from sequences of body postures. Consistent with the general

organization of the visual system, the position and scale invariance as well as the sizes

of the receptive fields increase along the hierarchy. The form pathway of the model is

subdivided into four levels. The first level comprises local orientation detectors, which


are tuned to one of eight preferred orientations at one of two spatial scales. This stage models simple cells in early visual cortex (areas V1 and V2). The second

level in this pathway contains position- and scale-invariant bar detectors. These also extract local orientation information, but within their receptive field their responses are independent of the spatial position and the scale of contours. Complex cells in area V2 and cells in area V4 have these properties. The next level along this hierarchy

consists of snapshot neurons that are selective for specific body postures. Such

snapshot neurons have large receptive fields and have substantial position and scale

invariance. Neurons with such features might be located in area IT, area STS and area FA. The highest level within this model hierarchy contains motion pattern

neurons. These neurons temporally smooth and summate the activity of the snapshot

neurons of the previous layer that contribute to the encoding of the same movement

pattern. Motion pattern neurons in the form pathway of this model are assumed to be

sequence selective. Therefore, their responses are restricted to a sequence of snapshots

of body postures that occur during natural movements. Neurons with these properties

might be located in area STS, the premotor cortex (area F5) and area FA.
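The first two levels of the form pathway can be illustrated with a minimal sketch. It assumes even-symmetric Gabor filters as the local orientation detectors (eight orientations and two spatial scales, as in the text) and a MAX operation over positions and scales as the invariance mechanism. The filter shapes, the wavelengths in pixels and the pooling over the whole image as a single receptive field are illustrative assumptions, not the parameters of the published model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N_ORIENTATIONS = 8          # eight preferred orientations, as in the text
WAVELENGTHS = (4.0, 8.0)    # two spatial scales (pixel values chosen for illustration)


def gabor_kernel(theta, wavelength):
    """Even-symmetric Gabor patch with zero mean and unit L2 norm.

    The carrier oscillates along direction theta, so the preferred bar runs
    perpendicular to it (theta = 0 prefers vertical bars).
    """
    half = int(1.5 * wavelength)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    sigma = 0.5 * wavelength
    kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * x_rot / wavelength)
    kernel = kernel - kernel.mean()          # no response to uniform luminance
    return kernel / np.linalg.norm(kernel)   # makes responses at both scales comparable


def orientation_detectors(image):
    """Level 1: rectified filter maps, indexed [scale][orientation]."""
    maps = []
    for wavelength in WAVELENGTHS:
        per_scale = []
        for k in range(N_ORIENTATIONS):
            kernel = gabor_kernel(k * np.pi / N_ORIENTATIONS, wavelength)
            windows = sliding_window_view(image, kernel.shape)   # 'valid' positions only
            response = np.einsum("ijkl,kl->ij", windows, kernel)
            per_scale.append(np.maximum(response, 0.0))          # half-wave rectification
        maps.append(per_scale)
    return maps


def invariant_bar_detectors(image):
    """Level 2: MAX over all positions and both scales, one value per orientation."""
    maps = orientation_detectors(image)
    return np.array([
        max(scale_maps[k].max() for scale_maps in maps)
        for k in range(N_ORIENTATIONS)
    ])


def vertical_bar(column, size=48):
    """Test stimulus: a one-pixel-wide vertical bar at the given column."""
    image = np.zeros((size, size))
    image[:, column] = 1.0
    return image


left = invariant_bar_detectors(vertical_bar(20))
right = invariant_bar_detectors(vertical_bar(28))
print(int(np.argmax(left)), np.allclose(left, right))   # 0 True
```

The printed line reports the preferred orientation of the invariant detectors (index 0, which here stands for vertical bars) and whether shifting the bar changes their output. With MAX pooling it does not, which is the position invariance described for the second level.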

1.1.6.2 The motion pathway in the model

The motion pathway recognizes BM by analyzing complex optic-flow patterns that are

specific to BM. In analogy to the form pathway, the motion pathway consists of a

hierarchy of neural detectors for optic flow features. Along the hierarchy, there are

increases in the receptive field sizes, invariance of the detectors and complexity of the

extracted features. In parallel to the form pathway of the model, the motion pathway is

subdivided into four layers.

The model assumes local motion detectors and component motion-selective neurons on

the first level. Neurons on this level compute a signal that is derived from local optic

flow vectors. Speed and direction-selective neurons in areas V1 and V2 and area MT

have such properties. The model contains detectors for four different motion directions

and two speed classes. The next level consists of neurons that are selective for

opponent motion. Neurons with this property might be located in areas MT, MST and

KO. Such opponent motion detectors are obtained by combining the responses of two


adjacent subfields with selectivity to opposite directions. Activity of each subfield is

obtained by combining the responses of the local motion detectors of the first level

with the same direction preference. The detectors of the third level are selective for

complex optic flow patterns that arise for individual movements of BM patterns. These

neurons are analogous to snapshot neurons in the form pathway and may be located in

area STS and FA. Finally, the motion pattern neurons of the motion pathway in the

fourth layer summate and smooth the output signals of optic flow pattern neurons of

layer three. Like their counterparts in the form pathway, these motion pattern neurons are sequence selective; they might be located in areas STS and F5.
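The first two levels of the motion pathway can be sketched in the same spirit. The sketch assumes rectified cosine tuning for the local motion detectors (four directions, as in the text; the two speed classes are omitted for brevity), averages each subfield over the detectors that share a direction preference, and combines the two subfields multiplicatively so that the unit responds only when both are active. The combination rule, the left/right subfield layout and all names are assumptions made for illustration.

```python
import numpy as np

# Preferred directions of the local motion detectors: right, up, left, down.
DIRECTIONS = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])


def local_motion_detectors(flow):
    """Level 1: rectified projection of each flow vector onto the four directions.

    flow has shape (H, W, 2) holding (vx, vy); the result has shape (H, W, 4).
    """
    return np.maximum(flow @ DIRECTIONS.T, 0.0)


def opponent_motion_unit(flow, direction=0):
    """Level 2: two adjacent subfields (left and right half of the field) with
    opposite direction preferences, combined by a geometric mean (an assumption)."""
    local = local_motion_detectors(flow)
    half = flow.shape[1] // 2
    opposite = (direction + 2) % 4
    left = local[:, :half, direction].mean()   # pooled detectors preferring `direction`
    right = local[:, half:, opposite].mean()   # pooled detectors preferring the opposite
    return float(np.sqrt(left * right))


converging = np.zeros((4, 8, 2))
converging[:, :4, 0] = 1.0     # left half moves right
converging[:, 4:, 0] = -1.0    # right half moves left
uniform = np.zeros((4, 8, 2))
uniform[:, :, 0] = 1.0         # whole field moves right

print(opponent_motion_unit(converging), opponent_motion_unit(uniform))   # 1.0 0.0
```

With a geometric mean, uniform motion across both subfields yields no response, whereas opposite motion in the two subfields drives the unit fully; a plain sum would instead respond partially to uniform motion.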

1.1.6.3 Model features

A realistic neural model of BM perception in humans must fulfill several criteria. On

the one hand, the model must be able to generalize BM patterns across position, scale

and identity of an actor. On the other hand, the model must be selective enough to detect the subtle details that convey the wide variety of information in BM patterns.

Consistent with psychophysical data (Beardsworth & Buckner, 1981; Cutting &

Kozlowski, 1977), the selectivity of the model is sufficient to identify individuals by

their gait. Concerning generalization, the model is invariant with respect to position

changes, scale changes and changes in the speed of the walker. Another property of

BM perception is view-dependence. When a point-light walker is rotated in the image

plane (Bertenthal et al., 1987; Dittrich, 1993; Pavlova & Sokolov, 2000; Shipley, 2003;

Sumi, 1984), recognition performance drops substantially. Simulations with the model

yielded the same result. Another key feature of BM perception is the robustness of the

phenomenon. Even under poor viewing conditions, such as dim illumination, the neural architecture of the model recognizes BM.
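At the top of both pathways, motion pattern neurons temporally smooth and summate the activity of snapshot-like detectors and respond selectively to the natural order of postures. The toy sketch below illustrates this idea. It assumes Gaussian (radial basis function) tuning of snapshot neurons to learned key postures, a leaky integrator for temporal smoothing, and a hypothetical facilitation rule in which recent activity of snapshot k-1 boosts snapshot k. This rule is a stand-in chosen for brevity and not necessarily the mechanism of Giese and Poggio (2003); postures are reduced to one-dimensional feature vectors for clarity.

```python
import numpy as np


def snapshot_responses(postures, templates, sigma=1.0):
    """Gaussian tuning of each snapshot neuron to one learned key posture.

    postures: (T, D) stimulus frames; templates: (K, D) learned postures.
    Returns a (T, K) array of snapshot activities.
    """
    d2 = ((postures[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma**2))


def motion_pattern_response(snap, tau=3.0, facilitation=2.0):
    """Leaky integration of summed snapshot activity with an assumed
    forward-facilitation rule that makes the unit sequence selective."""
    n_frames, n_snapshots = snap.shape
    trace = np.zeros(n_snapshots)     # low-pass memory of each snapshot neuron's activity
    output = np.zeros(n_frames)
    y = 0.0
    for t in range(n_frames):
        gain = np.ones(n_snapshots)
        gain[1:] += facilitation * trace[:-1]   # recent activity of k-1 facilitates k
        drive = (snap[t] * gain).sum()          # summation over facilitated snapshots
        trace += (snap[t] - trace) / tau
        y += (drive - y) / tau                  # leaky integration = temporal smoothing
        output[t] = y
    return output


templates = np.arange(6, dtype=float)[:, None] * 3.0   # six hypothetical key postures
forward = motion_pattern_response(snapshot_responses(templates, templates))
backward = motion_pattern_response(snapshot_responses(templates[::-1], templates))
print(forward.max() > backward.max())   # True
```

The same postures shown in reverse order activate the same snapshot neurons but drive the motion pattern unit less strongly, which is the sequence selectivity described in the text.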

1.1.6.4 Limitations of the model

Although the model demonstrates that a relatively simple, biologically plausible neural

architecture can account for many properties of BM recognition, it involves several

simplifications. One of these simplifications concerns the role of top-down attentional


effects. There is experimental evidence for such effects on BM recognition (Cavanagh

et al., 2001; Thornton et al., 2002). Moreover, the model does not consider the complexities of everyday vision, such as eye movements and shifts of attention. To take these into account, the neural architecture would have to include top-down mechanisms, implemented as back projections from higher to lower levels of the model.

1.1.7 Functional significance of biological motion perception in higher

cognitive functions

The extraordinary ability of the visual system to derive the compelling percept of a

human person from a few moving bright dots is only one feature of BM perception. It

has been directly related to a number of higher cognitive functions. Research on the

cognitive neuroscience of social perception (Adolphs, 1999, 2001, 2003), action

understanding (Rizzolatti et al., 2001), speech perception (Hauser, Chomsky, & Fitch,

2002) and theory of mind (Blakemore & Decety, 2001; Frith & Frith, 2003; Gallese &

Goldmann, 1998) has received much attention in recent years. BM perception plays an

important role in higher cognitive processes either by providing the interface between

neural systems of perception and action (action understanding and speech perception)

or by performing the initial stages in the processing of socially relevant information in

a larger social cognition network (social perception and theory of mind).

1.1.7.1 Social perception and biological motion

Social perception refers to the initial stages of information processing that culminate in

the accurate analysis of the dispositions and intentions of other individuals (Allison,

Puce, & McCarthy, 2000). As pointed out in one of the previous sections, area STS

plays an important role in the processing and analysis of BM. This area is

not only involved in the analysis of point-light-displays, but also contributes to the

analysis of other visual features containing social information. One of these features is

direction of gaze, which is thought to provide information in social situations, express

intimacy and exercise social control (Kleinke, 1986). Direction of gaze is also an

indicator of social attention by guiding the focus of another person’s attention.


Other visual stimuli activating area STS are mouth, hand and body movements. Mouth

movements can be broadly divided into non-speech movements and speech-related

movements. Non-speech mouth movements are an important component of facial

gestures. In non-human primates it is assumed that any mouth movement which is

meaningful to another individual will preferentially activate a population of cells in the

STS-complex. One important function of speech-related mouth movements is to

improve our comprehension of what is being said. Lip reading activates a region within

the STS-complex bilaterally (Calvert et al., 1997). These regions probably also play a

role in visual-auditory illusions such as the McGurk effect (McGurk & MacDonald,

1976). It has been suggested that lip reading involves regions of the STS-complex

which are distinct from those involved in non-speech mouth movements.

Observation of hand movements activates parts of the STS-complex. Responsiveness to

hand movements is stronger if the movements are goal-directed, such as grasping an object (Rizzolatti, Fadiga, Matelli et al., 1996). The left STS-complex seems to have an advantage in the visual analysis of meaningful hand movements (Grafton,

Arbib, Fadiga, & Rizzolatti, 1996; Rizzolatti, Fadiga, Matelli et al., 1996), although

one study reported bilateral activation (Grezes, Costes, & Decety, 1998). Activation of

the STS-complex is not restricted to full views of actors performing an action. Point-light displays of goal-directed hand movements also activate the left STS-complex (Bonda et al., 1996). Observation of American Sign Language (ASL) hand movements elicits different activation in the STS-complex in subjects who know ASL and in subjects who do not (Neville et al., 1998): STS activity was observed only in subjects who know ASL. Based on this finding, it was suggested that the STS-complex is primarily activated by meaningful or communicative hand gestures.

The STS-complex also plays an important role in the perception of implied motion. Implied BM refers to

static images of an animate being performing an action (for instance a soccer player in

the act of kicking a ball). Stronger fMRI activity in the STS-complex has been found

when viewing images containing implied motion compared to images without implied

motion (Kourtzi & Kanwisher, 2000).

Social perception is one part of a larger domain of cognitive functions subserving

social communication (Adolphs, 2001). Apart from social perception, social cognition

and social behavior complement the domain of social communication. Information


from social perception feeds into the domain of social cognition, which in turn guides

automatic and planned social behavior. With respect to the domain of social cognition,

two brain areas play an important role: the amygdala and the orbitofrontal cortex. One

of the principal functions of the amygdala seems to be the attachment of emotional

salience to sensory input (Adolphs, 1999). This can be achieved by feedforward and

feedback projections between the STS-complex and the amygdala. Such a circuit might

lead to an attentional amplification of activity of the STS-complex. Similar

mechanisms may underlie interactions between the orbitofrontal cortex, the amygdala and

the STS-complex.

The role of the orbitofrontal cortex in social cognition and regulation of behavior has

been studied since the famous case report of Phineas Gage, whose decision making

abilities were severely impaired after a large orbitofrontal lesion. In spite of normal

intellectual functioning, lesions of the orbitofrontal cortex lead to stereotyped and

inappropriate social behavior and lack of concern for other individuals (Damasio,

Grabowski, Frank, Galaburda, & Damasio, 1994). Two models of orbitofrontal cortex

function in social cognition are currently discussed. One model suggests that the

orbitofrontal cortex serves to control impulsive, aggressive and violent social behavior

(Davidson, Putnam, & Larson, 2000). The somatic marker hypothesis (Damasio, 1996)

suggests that the prefrontal cortex contributes to a mechanism which acquires,

represents and retrieves values of actions. This mechanism generates representations of

somatic states which correspond to the anticipated outcome of decisions. The somatic

markers guide decision-making on the basis of the individual’s past experience with

similar situations and favor those decisions that were advantageous for the individual.

Taken together, the domain of social communication entails several component

processes: social perception, social cognition and social behavior. The social cognition

system uses information to construct a complex mental representation of the social

environment. Such information is provided by the social perception system. Processes

in the social cognition system in turn modulate effector systems, resulting in social

behavior. A complex neural machinery subserves the social communication system.

Regions in the temporal lobe such as the STS-complex and the fusiform gyrus

subserving social perception interact with a network of structures including the

amygdala and the orbitofrontal cortex subserving social cognition. These structures in


turn provide the input to motor and premotor systems and the basal ganglia which

guide social behavior.

1.1.7.2 Action understanding and biological motion

Action understanding can be defined as the capacity to achieve the internal description

of an action and to use it to organize appropriate future behavior. Two explanatory

approaches for this ability are currently discussed: the visual hypothesis and the direct

matching hypothesis (Rizzolatti et al., 2001). The first view suggests that action

understanding is based on visual analysis of the elements constituting an action.

According to this hypothesis, associations between the action elements and inferences

about their interactions are sufficient to understand an action. In terms of neural

structures, extrastriate visual areas, the inferior temporal lobe and the STS-complex

mediate action understanding. Motor involvement is not required.

The second approach states that we understand an action when we map the visual

representation of the observed action onto our motor representation of the same action

(Rizzolatti et al., 2001). Observation of an action causes resonance in the motor system.

If this view is correct, we understand an action because the motor representation of that

action is activated in our brain. The direct matching hypothesis does not completely

rule out that other cognitive processes, such as those proposed by the visual hypothesis, could also contribute to this function. Experimental evidence supports the direct matching hypothesis: a mechanism that matches action observation onto action execution exists in monkeys and humans and has a number of implications for the understanding and imitation of actions. Evidence

comes from studies applying transcranial magnetic stimulation (Fadiga, Fogassi,

Pavesi, & Rizzolatti, 1995; Gangitano, Mottaghy, & Pascual-Leone, 2001), from MEG

studies (Hari et al., 1998) and from fMRI studies (Buccino et al., 2001; Iacoboni et al.,

1999). In contrast to the monkey mirror system the human analogue seems to be more

flexible, since it reacts not only to goal-directed actions but also shows resonance

behavior to intransitive movements, i.e. movements not directed towards an object.

Rizzolatti and colleagues (Rizzolatti et al., 2001) argued that sensory binding of

different actions reflected by activity in the STS-complex may have derived from the

development of motor synergistic actions. Efference copies of actions may activate


specific sensory targets in order to improve the control of action. As a result, such an

interaction between sensory and motor systems can be used to understand the actions of

others.

Taken together, the strong interaction between the sensory and motor systems serves as

a potential mechanism for several higher cognitive functions such as imitation learning

and action understanding. Moreover, this mechanism could also play an important role

in other fundamental functions such as speech perception and the development of a

theory of mind. The following sections address these issues in more detail.

1.1.7.3 Speech perception and biological motion

Hauser and colleagues (2002) developed a theoretical framework for the understanding

of language. They suggested a distinction between the faculty of language in the broad

and the narrow sense. According to this view, language in the broad sense consists of

three subsystems. The first subsystem is a computational system for recursion,

providing the capacity to generate an infinite range of expressions from a finite set of

elements. This system alone constitutes language in the narrow sense. The second

subsystem is a conceptual-intentional system including categorization, reference and

reasoning. This subsystem is involved in the acquisition of conceptual representations,

referential vocal signals and a voluntary control over signal production. The third

system is the motor-sensory system linking the action and perception system with

respect to modalities of language production and perception.

The way in which sounds conveying words are transformed into linguistic

representations in the brain is still under debate. Among several theories trying to

explain how speech is perceived, the “motor theory of speech perception”, which was

originally proposed by Liberman and colleagues (Liberman, Cooper, Shankweiler, &

Studdert-Kennedy, 1967; Liberman & Mattingly, 1985), has received much attention.

The main assumption of the motor theory is that the constituents of speech are not

sounds per se, but the articulatory gestures associated with these sounds that are shared

by the speaker and the listener. Accordingly, speech is perceived by matching the articulatory gestures contained in heard words onto the listener’s motor repertoire. This


theory has been supported by experimental evidence of the close relation between

neural systems underlying action and perception as previously described. Further

evidence for the motor theory of speech perception is provided by the McGurk effect

(McGurk & MacDonald, 1976), which has been described in the section about social

perception. The kinematics of the face seem to play a special role in the McGurk effect,

since a point-light talking mouth aids the perception of speech in noise (Rosenblum,

Johnson, & Saldana, 1996) and interferes with audio speech perception when the

auditory and visual streams are incongruent (Rosenblum & Saldana, 1996). The perception of speech-related BM and the perception of walking BM rely on networks that share some cortical regions but also include regions that are relatively independent of each other (Santi et al., 2003).

Santi and colleagues (2003) suggested that at the level of the STS-complex, the left

hemisphere becomes dominant in speech related BM, while the right STS maintains its

dominant role in processing whole-body BM.

Fadiga and Craighero (2003) also stress the hypothesis that the perception of meaningful hand and mouth movements as BM played a specific role in the development of a neural system for speech perception. A specific brain region is assumed to act as a

comparator between own and others’ motor representations. This region would allow

individuals to automatically understand the perceived action, because they are able to

reproduce the same action or the same sensory consequences of that action. In

Liberman’s motor theory of speech perception, it is necessary to have a motor

resonance system for movements of the vocal tract involved in the production of

speech sounds and perception of speech sounds. Such a mirror system may have

evolved from a matching system of motor and perceptual representations of hand

actions.

1.1.7.4 Theory of mind and biological motion

One of the most complex cognitive functions discussed in connection with BM

perception and its neuronal correlates is the ability to understand other people’s mental

states, i.e. their beliefs, desires and intentions. This capacity is described as having a

‘theory of mind’, or mentalizing (Blakemore & Decety, 2001; Frith & Frith, 1999; U.

Frith & Frith, 2003; Gallese & Goldmann, 1998). The neuronal machinery involved in


mentalizing is likely to have evolved from several preexisting mechanisms. Frith and

Frith (1999) argued that such mechanisms might include the ability to distinguish

animate and inanimate entities (1), the ability to share attention by following the gaze

of another agent (2), the ability to represent goal-directed actions (3), and the ability to

distinguish between actions of the self and of others (4). The neuronal structures

assumed to underlie these functions are the STS-complex for the detection of animate

objects and other people’s focus of attention, inferior frontal regions for the

representation of goal directed actions, medial prefrontal regions for the representation

of mental states of the self and the decoupling mechanism that distinguishes mental

state representations from physical state representations (Frith & Frith, 1999; Frith &

Frith, 2003). The activation of these components in concert seems to be critical to

mentalizing.

Gallese and Goldmann (1998) explicitly stressed the role of an action perception

matching system in the context of mentalizing. Their simulation theory of mind reading

suggests that other people’s mental states are represented by adopting their perspective, that is,

by matching their states with resonant states of one’s own. At least one component of

mentalizing, the understanding of intention, might have evolved from such a

mechanism (Blakemore & Decety, 2001). Their approach builds upon the suggestion

that we understand other people’s actions by mapping the observed action onto our

own motor representations of the same action. A mechanism for the recognition of

other people’s intentions could work in the following way. Sensory consequences of

our own actions are predicted by a forward model mechanism that works automatically

and stores a large number of sensory predictions resulting from different motor actions.

Efference copy signals that are generated simultaneously with the motor commands

may play a special role within this forward model. If this mechanism also operated in

the reverse direction, the process used by the forward model to predict the sensory

consequences of one’s own actions could in principle also be applied to estimate motor

commands from the observation of other people’s actions. Accordingly, observing another person’s action activates the motor commands that guide this action. On the basis of these activated motor commands, it might be possible to estimate the intention we ourselves would have if we performed the same action in the same context.
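The reverse use of a forward model described above can be made concrete with a toy example. The sketch assumes a two-joint planar arm as the forward model, a small hypothetical motor repertoire in which each own command is stored together with the goal it normally serves, and a nearest-match search as the way of running the model in reverse. All names and values are illustrative and are not taken from Blakemore and Decety (2001).

```python
import numpy as np


def forward_model(angles, upper=1.0, fore=1.0):
    """Toy forward model: predicted 2-D hand position (the 'sensory consequence')
    of a motor command given as shoulder and elbow angles of a two-link arm."""
    shoulder, elbow = angles
    return np.array([
        upper * np.cos(shoulder) + fore * np.cos(shoulder + elbow),
        upper * np.sin(shoulder) + fore * np.sin(shoulder + elbow),
    ])


# Hypothetical motor repertoire: each own command is stored with the goal it serves.
REPERTOIRE = {
    "reach_up": np.array([np.pi / 2, 0.0]),
    "reach_forward": np.array([0.0, 0.0]),
    "bring_to_mouth": np.array([np.pi / 4, 3 * np.pi / 4]),
}


def infer_intention(observed_hand_position):
    """Apply the forward model to every command in the own repertoire, pick the
    command whose predicted consequence best matches the observed one, and return
    the goal that this command would serve if we performed it ourselves."""
    errors = {
        goal: np.linalg.norm(forward_model(command) - observed_hand_position)
        for goal, command in REPERTOIRE.items()
    }
    return min(errors, key=errors.get)


print(infer_intention(np.array([-0.3, 0.7])))   # bring_to_mouth
```

A nearest-match search over a small repertoire is the simplest possible inversion; a continuous repertoire would call for an optimization or a learned inverse model instead.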


1.1.7.5 Concluding remarks on the contribution of biological motion perception to

higher cognitive functions

It is beyond the scope of this section to give a detailed description of all aspects of social communication, action understanding, speech perception and a theory of mind. The main focus was on the relation between the neural network involved in

BM perception and its critical contribution to these higher-level cognitive functions.

It is widely accepted that the evolution of neural mechanisms involved in high-level cognitive functions builds upon more basic mechanisms which we share with other animals. One of these important mechanisms is the neural machinery originally dedicated to the perception of BM. In the animal kingdom, the ability to detect BM quickly and efficiently directly increases an animal’s chance of survival. In humans, the neural machinery that has evolved from this mechanism plays a more sophisticated role, because it contributes critically to social communication, action understanding, speech perception and mentalizing, and because social interaction is itself of high adaptive value.

1.1.8 Open issues concerning perception of biological motion

As indicated by the large number of publications dealing with this topic, the knowledge

about the neuronal basis of BM perception has substantially increased in recent years.

Nevertheless, there are still several open issues. Initial research on BM perception

concentrated on the information contained in the kinematics of movement patterns. It

was shown that a wide variety of information can be retrieved by human observers

from such visual stimuli and that the visual system can process this kind of stimuli very

efficiently. However, it is as yet unclear what kind of sensory filter the visual system

uses to allow such fast and efficient processing.

Whereas substantial knowledge is available about the neocortical structures underlying

BM perception from a number of neuroimaging studies described previously, many

issues concerning the distinct processing stages and their temporal characteristics in

humans are as yet unclear. In particular, no attempts have been made so far to link

distinct processing stages to distinct neural structures. As the results of psychophysical


studies have shown, attentional resources are clearly necessary in the perceptual

analysis of BM. What constitutes the neural correlates of these attentional processes is a further open question.

The term BM refers to a visual stimulus and the visual information conveyed by the

kinematics of movement patterns, respectively. It is still an unsolved question how this

information is mentally represented. BM information might be stored in an object-centered, viewpoint-invariant format or in a viewer-centered, viewpoint-dependent format. The two kinds of representation imply different neuronal mechanisms. Furthermore, it is unclear what role the motor system plays in the representation of human movement patterns and in the visual representation of BM.

The role of subcortical regions in the perceptual analysis of BM also

remains to be clarified. Neuroimaging studies of BM perception have yielded

inconsistent results with respect to the role of the cerebellum. Clinical lesion studies,

which investigate perceptual performance in patients with distinct lesions, would

provide the opportunity to further elucidate this issue, but have as yet focused on the

role of neocortical brain structures.

1.2 Objectives of the current work

The current work aims to further investigate several issues concerning the

neuropsychological basis of BM perception. Four studies address a number of open

questions described above by using different methodological approaches. The first

study investigates specific aspects of information contained in the kinematics of

animate motion patterns using a psychophysical approach (Study 1). This study is

related to the comprehensive psychophysics literature on BM perception reviewed

previously and might give further insight into the sensory filters subserving the

detection of BM. The second study examines the temporal aspects of BM processing

and associated processing stages by the analysis of event-related potentials and source

analysis (Study 2). This study aims to further elucidate the time course of BM processing and to link distinct processing stages to distinct neural systems. The third study of this thesis

explores differential visual representations of one’s own movement patterns compared


to other familiar movement patterns by psychophysical methods (Study 3). This study

is related to research on action perception and aims to gain further insight into the

mental representation of movement patterns and a possible interaction of perceptual

and motor representations. Finally, the fourth study investigates the role of the

cerebellum in the perceptual analysis of BM (Study 4). Using a lesion approach, this

study attempts to clarify the role of the cerebellum in BM perception. The objectives of

these studies are further specified in the following paragraphs.

1.2.1 Study 1: Biological motion as cue for the perception of size

The constant force of Earth’s gravity directly influences the motion patterns of animate beings and inanimate objects in the physical world. With respect to BM, gravity governs the periodic exchange between kinetic and potential energy. Therefore, gravity

determines a fixed relation between temporal (i.e. the stride frequency) and spatial

parameters (i.e. the length of a leg) of energetically optimal gait patterns. In fact it has

been proven that animals adjust their gait patterns in order to minimize the energy

required for their locomotion and that such a relation between size and stride frequency

does exist for a number of different species. It might be possible that this typical

pattern of motion created by the influence of gravity is the critical feature used by the

mammalian visual system to detect BM and process it as special category. If this

specific motion pattern plays a crucial role as sensory filter for the detection of BM, it

can be assumed that the visual system has implicit knowledge about the relationship

between temporal and spatial parameters of BM as defined by gravitational forces.

The first study* of this thesis explored whether human observers can employ the relation between temporal and spatial parameters as defined by gravity in order to retrieve size information from point-light displays of animate motion. The rationale was to induce different size percepts by manipulating a temporal parameter, namely the stride frequency. If observers have implicit knowledge about the influence of gravity on the movement patterns of animate beings, an inverse quadratic relation between the perceived size of the animate being and its actual stride frequency is expected.

* The study “Biological motion as cue for the perception of size” comprises two experiments which explored whether human observers have implicit knowledge about the relation between temporal and spatial parameters in animate motion patterns as defined by gravitational forces. Raw data of the first experiment were part of the diploma thesis. These data were reanalyzed and an additional correlation analysis was performed. The second experiment was conducted entirely within the PhD project and extends the first one in two important respects. First, the mechanism for indicating perceived size was changed. Second, a supplementary task was introduced with the goal of obtaining a direct size estimate based on static cues. This procedure provided the opportunity to separate static and kinematic size cues and to calculate how both sources of information are integrated.

1.2.2 Study 2: Structural encoding and recognition of biological motion:

Evidence from event-related potentials and source analysis

Whereas the neural structures involved in BM perception have frequently been examined using functional neuroimaging, only a few studies have so far attempted to elucidate the time course of the processing of point-light displays of BM in humans. The second study of this PhD project investigated how different processing stages involved in the perceptual analysis of BM are reflected by modulations of event-related potentials (ERPs), in order to elucidate the time course and location of the neural processing of BM. Data analysis was carried out using conventional averaging techniques as well as source localization with low resolution brain electromagnetic tomography (LORETA).

Observers were presented with three stimulus classes: point-light displays of a walking figure in normal and in inverted orientation, and displays of scrambled motion as control stimuli, in which the dots have the same motion vectors as in the BM condition but randomized initial positions. We predicted an inversion effect for BM in the time window up to 200 ms after stimulus onset, as is usually found for faces and other stimulus categories for which the observer is an expert. Moreover, we expected an ERP source specific to BM in the STS-complex, which is concerned with the fine analysis of motion patterns that provide biologically relevant information and contribute to social perception.



1.2.3 Study 3: Self recognition versus recognition of others by biological

motion: Viewpoint-dependent effects

It is an open question whether the mental representation of one’s own movement pattern differs from representations of other familiar movement patterns. This question is addressed with a psychophysical approach that examines viewpoint-dependent recognition effects, since knowledge about such effects provides insight into the mental representation and perceptual mechanisms of BM processing. It is still under debate whether the visual representations of objects are viewpoint-dependent (Bulthoff & Edelman, 1992; Tarr & Bulthoff, 1995) or viewpoint-invariant (Biederman & Gerhardstein, 1993, 1995).

Viewpoint invariance implies that object recognition is independent of the viewpoint from which the object was previously seen. By contrast, the viewpoint-dependence hypothesis proposes that recognition is superior when the object is presented from a familiar perspective. A dissociation between the mental representation of one’s own gait pattern and that of another familiar person’s gait pattern might have its neural basis in a common coding of perception and action, as suggested by the direct matching hypothesis introduced in a previous section.

The third study of this thesis addressed three issues. The first aim was to investigate how well familiar persons can be recognized from their gait patterns presented as point-light displays. The second aim was to elucidate viewpoint-dependent effects in the representation of gait kinematics by examining the influence of viewing angle on recognition performance. The third aim was to examine a potential dissociation between the mental representation of one’s own gait pattern and the gait patterns of other familiar persons.

1.2.4 Study 4: Cerebellar contribution to the perception of biological

motion

Whereas there is general agreement concerning the role of the STS-complex in BM perception (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001), the literature on subcortical regions, and on the cerebellum in particular, is inconsistent. Two of the above-mentioned studies reported cerebellar involvement (Grossman et al., 2000; Vaina et al., 2001), while the others failed to detect cerebellar activity. Moreover, the studies disagree on the cerebellar subregion that may be involved in BM perception: Grossman and colleagues (2000) found cerebellar activity in the anterior portion near the midline, while Vaina and colleagues (2001) reported BM-specific activity in lateral parts of the cerebellum.

The fourth study of the present project aimed to explore the role of the cerebellum in BM perception by assessing BM perception in patients with selective ischemic cerebellar lesions. More specifically, the study addresses the question whether the cerebellar activity observed in previous imaging studies reflects a critical contribution to the perceptual analysis of BM, or whether it results from co-activation via a feedforward mechanism triggered by the sight of human movements. The latter hypothesis is supported by the strong interaction between neural systems for perception and action described previously, and by the finding that the cerebellum contributes to motor imagery (Decety, 1996; Decety, Sjoholm, Ryding, Stenberg, & Ingvar, 1990; Hanakawa et al., 2003; Luft, Skalej, Stefanou, Klose, & Voigt, 1998; Ryding, Decety, Sjoholm, Stenberg, & Ingvar, 1993).

1.2.5 Concluding remarks

Taken together, the series of studies aims to further elucidate the neuronal mechanisms underlying BM perception in humans and to gain new insights into the neuropsychological basis of BM perception. The studies included in this thesis should contribute to a better understanding of the sensory filters that allow very fast and efficient detection of animate motion patterns. Knowledge about the time course of BM processing allows distinct processing steps to be associated with circumscribed neuronal structures. Further insight into the mental representation of movement patterns and a possible interaction of perceptual and motor representations might have implications for general mechanisms of human brain function. Finally, knowledge about the role of subcortical structures in BM perception might help to understand the interplay between subcortical and cortical structures in higher visual functions.

Chapter II Biological Motion as Size Cue


II Study 1: Biological Motion as a Cue for

the Perception of Size

Daniel Jokisch and Nikolaus F. Troje

Summary

Animals as well as humans adjust their gait patterns in order to minimize the energy

required for their locomotion. A particularly important factor is the constant force of

earth’s gravity. In many dynamic systems, gravity defines a relation between temporal

and spatial parameters. The stride frequency of an animal that moves efficiently in

terms of energy consumption depends on its size. In two psychophysical experiments,

we investigated whether human observers can employ this relation in order to retrieve

size information from point-light displays of dogs moving with varying stride

frequencies across the screen. In Experiment 1, observers had to adjust the apparent

size of a walking point-light dog by placing it at different depths in a three-dimensional

depiction of a complex landscape. In Experiment 2, the size of the dog could be

adjusted directly. Results show that displays with high stride frequencies are perceived

to be smaller than displays with low stride frequencies and that this correlation

closely reflects the predicted inverse quadratic relation between stride frequency and

size. We conclude that biological motion can serve as a cue to retrieve the size of an

animal and, therefore, to scale the visual environment.



2.1 Introduction

The perception of motion is a fundamental property of the visual system. One of the most complex but also most familiar types of motion is the nonrigid movement of living organisms. For animals as well as for humans, animate motion patterns contain a wide variety of information, and interpreting this information correctly is an important ability. In the animal kingdom, fast and accurate recognition of the movements of prey or predators increases an animal’s fitness and, therefore, its chance of survival. For humans, the ability to identify, interpret, and predict the actions of others is of particular relevance for successful social interaction, which plays a major adaptive role.

Visualizing the position of the main joints of a walking person by bright dots is enough

to convey a vivid impression of a human figure in motion. The percept collapses into a

meaningless array of unconnected dots when the walker stands still, demonstrating that

the interpretation is carried solely by the dynamics of the display (Johansson, 1973).

Observers require only 100–200 ms to organize such point-light displays into a coherent

percept (Johansson, 1976). The rudimentary information contained in point-light

displays of biological motion (BM) is sufficient even to solve sophisticated recognition

tasks. Observers are able to recognize the gender of a walking person (Barclay, Cutting,

& Kozlowski, 1978; Cutting, 1978; Kozlowski & Cutting, 1977; Mather & Murdoch,

1994; Troje, 2002), can identify friends by their gait (Cutting & Kozlowski, 1977), and

can even recognize themselves from a recorded point-light display of their own

movements (Beardsworth & Buckner, 1981). Mather and West (1993) extended the

point-light display paradigm to animations of four-legged animals and showed that

human observers can identify different animals by their movement pattern. Inversion

effects of BM displays of animal movements were investigated by Pinto and Shiffrar

(1999). The ability to perceive BM is not restricted to humans. It has been shown that

cats are able to identify point-light displays of conspecifics (Blake, 1993), that pigeons

are capable of discriminating between categories of conspecifics’ walking and pecking

when presented as point-light displays (Dittrich, Lea, Barrett & Gurr, 1998), and that

chicks and quails also have the ability to perceive point-light displays of BM of

conspecifics (Yamaguchi & Fujita, 1999). The ability of nonhuman primates to perceive



BM was indicated by the finding of single cells responding selectively to BM displays

(Oram & Perrett, 1994).

Animals as well as humans adjust their gait patterns in order to minimize the energy

required for their locomotion. The energy costs are determined by the properties of the

physical world. A particularly important factor in this context is the constant force of

earth’s gravity. For many dynamic events occurring under constant gravity conditions, a

fixed relation between temporal and spatial parameters is maintained. This relation holds most clearly for inanimate systems, such as pendulum motion or ballistic motion, but it also seems to hold for many animate motion patterns. Therefore,

from a theoretical point of view, time can be used as an information source about spatial

scale in visually recognizable events under the influence of gravity. Several studies have

investigated the perception of scale properties in inanimate dynamic events.

Pittenger (1985, 1990) examined the perception of the scale properties in pendulum

motion. The length of a freely swinging pendulum is proportional to the square of its

period. Pittenger (1985) found that observers could estimate the length of a pendulum

when given information about its period. The estimated lengths were found to be a

linear function of actual lengths, though with wide differences in slopes among

individual observers. When viewing normal pendulums with physically correct periods

and perturbed pendulums with either shorter or longer periods, observers could rate the

naturalness of motion with a high degree of acuity (Pittenger, 1990).

The same idea has also been applied to the perception of the distance of objects in free

fall (Saxberg, 1987a, 1987b; Watson, Banks, von Hofsten, & Royden, 1992). The law

of free fall motion relates the height of a fall to the duration of the event. Analogous to

pendulum motion, the height of fall is proportional to the square of its duration. In a

simulated catching task, in which observers had to predict where a ball approaching along a parabolic trajectory would land, Saxberg (1987b) tested whether observers make use of this information. When the display contained information from both image expansion and the vertical component of free fall, observers performed this task well, but when information from image expansion was eliminated, they failed. Saxberg concluded that the latter finding demonstrated that observers did not use the information provided by the relation between height of fall and its duration. However, Watson and colleagues (1992) argued that this failure resulted from conflicting sources of information rather than purely from an inability to retrieve the relation between height and duration of the event.

Stappers and Waller (1993) tested people’s ability to use the time of free fall of objects

as a reference to spatial scale and showed that observers reliably matched gravitational

acceleration to apparent depth in a computer simulation. Hecht, Kaiser and Banks

(1996) examined whether observers could use size and distance information provided by gravitational acceleration, using displays of rising and falling objects. Observers were able to use this information to some extent but were more sensitive to average velocity than to gravitational acceleration. Another study of the perception of spatio-temporal patterns of object motion (Warren, Kim, & Husney, 1987) demonstrated that observers can accurately judge the elasticity of bouncing objects from the duration of a single bounce period, perceived visually or auditorily, in the absence of height information. McConnell, Muchisky and Bingham

(1998) tested observers’ ability to judge object size in event displays that eliminated all

information other than time and trajectory forms. Initially, judgment variability was

substantial, but after feedback on one event, observers performed better and generalized

training to other events. Observers were sensitive to the general form of the spatio-

temporal scaling relation, but required feedback to attune event-specific constants.

The general form of the relation between a spatial scale s and a temporal scale T in events governed by gravity is given by

$$ s = k\,T^2 \qquad (1) $$

where k is a constant factor specific to the event being considered.

The above findings suggest that the human visual system can use this quadratic relation to derive size information from temporal cues. The absolute quantitative relation, expressed in the constant k, however, is not as easily obtained.
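To make the shared form of Equation 1 concrete, the following Python sketch evaluates it for the two inanimate events discussed above. The event-specific constants are textbook physics rather than values from the studies cited: k = g/(4π²) for the length of an ideal pendulum as a function of its period, and k = g/2 for the height of a fall from rest as a function of its duration; g = 9.81 m/s² is assumed. Whatever the value of k, doubling the duration quadruples the spatial scale.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)

def spatial_scale(k: float, t: float) -> float:
    """Equation 1: spatial scale s = k * T**2 for an event of duration T (s)."""
    return k * t ** 2

# Event-specific constants k (textbook physics, not taken from the cited studies).
K_PENDULUM = G / (4 * math.pi ** 2)  # ideal pendulum: length from period
K_FREE_FALL = G / 2                  # free fall from rest: height from duration

for name, k in [("pendulum", K_PENDULUM), ("free fall", K_FREE_FALL)]:
    s1 = spatial_scale(k, 1.0)
    s2 = spatial_scale(k, 2.0)
    # Doubling the duration quadruples the spatial scale, independent of k.
    print(f"{name}: T=1 s -> {s1:.3f} m, T=2 s -> {s2:.3f} m, ratio {s2 / s1:.1f}")
```

Only k distinguishes the two events; the quadratic form is common to both, which is why the absolute scale (k) is the harder part for observers to recover.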

Psychophysical studies considering the relationship between temporal and spatial

parameters as visual cues for event perception have not been restricted to inanimate

dynamic systems. In the domain of animate motion, such visual cues have been proposed to play a role in action perception. Runeson and Frykholm (1981, 1983) showed that the weight of an object can readily be estimated by observing another person, represented as a point-light display, lifting and carrying it. They concluded that the crucial information is embedded in the kinematics of the action pattern, in which an object’s weight is specified by the magnitude of postural adjustments relative to the acceleration of the object. Bingham (1987, 1993c) provided further empirical evidence that the kinematic pattern carries information about an object’s weight. The studies by Runeson and Frykholm (1981, 1983) and Bingham (1987, 1993c) investigated the ability to derive additional information from point-light displays of human actions by exploiting knowledge about the effects of gravity on objects in the physical world. These studies are therefore related to our question; however, they do not directly address whether temporal parameters of BM can serve as a cue to the size of animate beings.

From a physical point of view, the relation between temporal and spatial parameters

described above is also evident in animate locomotion patterns. A simple model for a

walking biped is an inverted pendulum that reduces the total body mass to a point mass on a rigid, massless leg (Alexander, 1977). More complex models consider humans and

animals as a set of coupled, articulated pendulum segments. No mechanical energy is

needed to maintain the movements of an ideal undamped pendulum because kinetic and

gravitational potential energy fluctuations are equal in amplitude and exactly 180° out

of phase. In humans, the pendulum-like mechanism conserves about 65% of the

mechanical energy from step to step at the preferred walking speed (Cavagna, Thys, &

Zamboni, 1976). Pendulum-like energy exchange diminishes at faster walking speeds

because of a mismatch in the magnitudes and phases of the fluctuations of the two

forms of mechanical energy. Thus, at non-optimal speeds, the muscles must provide

additional mechanical power. The relation between the length l and the period T of an

ideal pendulum is

$$ l = \frac{g\,T^2}{4\pi^2} \qquad (2) $$

with g being gravitational acceleration. In order to obey this relation, smaller animals

have to move with a higher stride frequency f = 1/T than larger animals.
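A minimal sketch of Equation 2 and its inverse, assuming an ideal small-amplitude pendulum and g = 9.81 m/s². The leg lengths below are hypothetical illustration values, not measurements. A real leg is better described as a physical (compound) pendulum, which changes the constant but not the scaling of length with the square of the period.

```python
import math

G = 9.81  # m/s^2, assumed standard gravity

def pendulum_length(period_s: float) -> float:
    """Equation 2: l = g T^2 / (4 pi^2) for an ideal small-amplitude pendulum."""
    return G * period_s ** 2 / (4 * math.pi ** 2)

def natural_frequency(length_m: float) -> float:
    """Inverse of Equation 2: f = 1/T = sqrt(g / l) / (2 pi)."""
    return math.sqrt(G / length_m) / (2 * math.pi)

# Hypothetical leg lengths of a small and a large animal (illustration only).
for leg in (0.25, 1.0):
    print(f"leg {leg:.2f} m -> natural frequency {natural_frequency(leg):.3f} Hz")
```

A fourfold longer leg gives half the natural frequency, which is the sense in which smaller animals must step faster.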

The major force that determines the pendulum-like movements during walking is

gravity, which must be at least equal to the centripetal force needed to keep the center of

mass traveling along a circular arc. The required centripetal force is mv²/L, where m is body mass, L is leg length, and v is forward speed (Kram, Domingo, & Ferris, 1997). The ratio between the centripetal force and the gravitational force, (mv²/L)/(mg) = v²/(gL), is the dimensionless Froude number (Alexander, 1989). Therefore, if animals travel with equal Froude number, their speeds v are proportional to the square root of the leg length L. If they move in a dynamically similar fashion (Alexander & Jayes, 1983), the stride length l is proportional to the leg length, and hence the stride frequency f = v/l is inversely proportional to the square root of the leg length. Equivalently, leg length scales with 1/f², that is, with the square of the stride period T = 1/f.

Pennycuick (1975) measured the stride frequencies of African mammals moving spontaneously in their natural habitat and found that, to a very good approximation, they are indeed inversely proportional to the square root of the stride length. Thus, the relation between spatial and temporal scales expressed in Equation 1 is also reflected in the locomotion patterns of animals.
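The scaling argument can be checked numerically. The sketch below computes the Froude number and derives stride frequency under the two assumptions stated in the text (equal Froude number, stride length proportional to leg length). The Froude number of 0.25 and the stride-to-leg ratio of 2.0 are arbitrary placeholder values chosen for illustration; the result that a fourfold increase in leg length halves the stride frequency does not depend on them.

```python
import math

G = 9.81  # m/s^2, assumed standard gravity

def froude(v: float, leg_length: float) -> float:
    """Dimensionless Froude number v**2 / (g * L)."""
    return v ** 2 / (G * leg_length)

def stride_frequency(leg_length: float, fr: float, stride_per_leg: float) -> float:
    """Stride frequency f = v / l for an animal moving at Froude number fr.

    stride_per_leg is a hypothetical constant ratio l / L of stride length to
    leg length, which expresses dynamic similarity.
    """
    v = math.sqrt(fr * G * leg_length)           # equal Froude number: v grows with sqrt(L)
    stride_length = stride_per_leg * leg_length  # dynamic similarity: l proportional to L
    return v / stride_length                     # hence f proportional to 1 / sqrt(L)

# Placeholder values chosen for illustration only.
FR, STRIDE_PER_LEG = 0.25, 2.0
f_small = stride_frequency(0.25, FR, STRIDE_PER_LEG)
f_large = stride_frequency(1.00, FR, STRIDE_PER_LEG)
print(f_small / f_large)  # fourfold longer legs: half the stride frequency (ratio 2)
```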

In this study, we examined whether the human visual system is able to use this relation to derive the size of an animal in the absence of other cues. To this end, we presented observers with point-light displays of a dog, varied the playback speed, and asked observers to estimate the size of the dog. We predicted that animals would be perceived as larger in animations presented with low stride frequency and as smaller in animations with high stride frequency. More specifically, we assumed that the relationship between the stride frequency f of an animal and its estimated size $s_{dyn}$ is

$$ s_{dyn} = c_1 \frac{1}{f^2} \qquad (3) $$

where $c_1$ is a constant factor quantifying the spatio-temporal scaling relation. The absolute value of $c_1$ depends on gravitational acceleration and on the gait pattern (e.g., trotting or cantering).
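Because $c_1$ is unknown, Equation 3 is most informative as a statement about ratios. The sketch below uses an arbitrary placeholder value for $c_1$ and applies the equation to the lowest and highest stride frequencies used in Experiment 1 (2.54 and 5.08 cycles/s): if only kinematic information were used, halving the stride frequency should quadruple the size estimate.

```python
def s_dyn(f: float, c1: float) -> float:
    """Equation 3: size predicted from stride frequency f alone, s_dyn = c1 / f**2."""
    return c1 / f ** 2

C1 = 1.0  # arbitrary placeholder; the ratio below does not depend on c1

lowest, highest = 2.54, 5.08  # stride frequencies in cycles/s
ratio = s_dyn(lowest, C1) / s_dyn(highest, C1)
print(round(ratio, 2))  # prints 4.0
```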

However, the kinematics of the animation may not be the only source of information

about the dog’s size. Additional size cues might be contained in an animal’s posture or in the proportions of its body segments. For example, Pittenger and Todd (1983) have shown that changes in the static body proportions of line drawings of a human body affect the perception of growth and, therefore, indirectly also the perception of

size. Studies using other biological objects have also shown that the perception of size

can be influenced by form information. Bingham (1993a, 1993b) showed that properties

of tree form could be used to estimate the height of trees.



The size information embedded in body proportions is independent of the temporal scaling factor and can be described as a second constant:

$$ s_{stat} = c_2 \qquad (4) $$

Both $s_{dyn}$ and $s_{stat}$ exist simultaneously and may contribute to a size estimate. Here, we assume linear integration, and we introduce a factor λ accounting for the relative weight of the two terms:

$$ s_{all} = \lambda\, c_1 \frac{1}{f^2} + (1-\lambda)\, c_2 \qquad (5) $$
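Equation 5 can be written directly as a function. In the sketch below, the weight λ is restricted to [0, 1] as a relative weight; λ = 1 reduces the model to the purely kinematic Equation 3 and λ = 0 to the purely static Equation 4. The values of $c_1$ and $c_2$ in the usage lines are hypothetical placeholders.

```python
def s_all(f: float, c1: float, c2: float, lam: float) -> float:
    """Equation 5: weighted linear integration of kinematic and static size cues."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is a relative weight and must lie in [0, 1]")
    s_dyn = c1 / f ** 2  # Equation 3: kinematic cue
    s_stat = c2          # Equation 4: static cue
    return lam * s_dyn + (1.0 - lam) * s_stat

# Hypothetical constants, for illustration only.
C1, C2 = 90.0, 40.0
for lam in (0.0, 0.5, 1.0):
    print(lam, s_all(3.0, C1, C2, lam))
```

Since the model is linear in 1/f², size judgments alone constrain only the products λc₁ and (1 − λ)c₂, not λ, c₁ and c₂ separately.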

In order to test this hypothesized model, we conducted two experiments presenting

observers with point-light displays depicting a dog moving across the screen. We chose a dog as a model because dogs cover a wide range of sizes, which ensures that observers’ size estimates are not overly constrained by the range of plausible sizes. Our point-light dog was shown walking through a three-dimensional scene depicting a desert landscape. When observing the image of such a scene, the perceived size of an object depends both on the visual angle it subtends and on its perceived position in depth within the scene. As a consequence of this size-distance ambiguity, there are two ways to change the apparent size of an object within the scene: (1) varying its position in depth while keeping its visual angle fixed, or (2) showing the object at a fixed distance and varying the visual angle it subtends. With both methods, the size of other objects embedded within the scene provides an absolute reference.

In Experiment 1, observers were asked to adjust the apparent size of the dog by

changing its position in depth while maintaining its projected size on the screen, and,

therefore, its subtended visual angle. In Experiment 2, observers could change the size of the dog directly. Experiment 2 also included a second task: in addition to estimating the size of the dynamic point-light displays, observers were required to estimate the size of a static stick-figure display.



2.2 Experiment 1

The observers’ task was to estimate the size of the dog animations. The point-light

displays were presented in a desert landscape with varying stride frequencies.

Perspective and texture gradient created a three-dimensional percept. Reference objects

(cactuses and posts) were scattered across the scene to provide size references at

different depths. With the visual angle subtended by the dog remaining constant,

observers could place the animation at different locations in depth in order to indicate

the perceived size.

2.2.1 Methods

2.2.1.1 Participants

Sixteen students (11 females and 5 males) between the ages of 20 and 39 years from the

psychology and biology departments at the Ruhr-University participated in this

experiment. They received course credit for their participation. All participants had

normal or corrected-to-normal vision. They were naive as to the objectives of this

experiment.

2.2.1.2 Stimuli

Synthetic motion data of a dog (“Animania Dog” by Credo Interactive Inc.) were presented in sagittal view as point-light displays. The display consisted of 20 dots altogether. Three dots represented the positions of each leg’s main joints (forelegs: elbow, carpal, and phalange; hind legs: knee, tarsal, and phalange). The pelvis and the scapula were each represented by two dots. Two dots represented the position of the head and two the positions of the thoracic and coccygeal vertebrae. Each dot had a size of 4 mm² and was displayed in bright green. An additional set of 20 black dots represented the shadows of the dots depicting the dog’s body; the shadows ensure that observers perceive the animal’s feet as touching the ground. The point-light display had a size of 4 cm on the screen, corresponding to 4 deg of visual angle at a viewing distance of 58 cm. This distance was fixed by a wooden chinrest. The image size of the point-light displays was held constant across all trials.
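The correspondence between on-screen size and visual angle follows from θ = 2·arctan(s / 2d) for an object of size s at viewing distance d. The sketch below checks the values given above; it assumes the display was viewed head-on at the chinrest distance.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

print(round(visual_angle_deg(4.0, 58.0), 2))  # prints 3.95, i.e. roughly 4 deg
```

A viewing distance of about 57 cm is a common choice in psychophysics because 1 cm on the screen then subtends roughly 1 deg of visual angle.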

To determine the gait pattern of our animated dog precisely, we examined the phase relations between the feet. The differences between gait patterns are

described by the phase relations between the movements of the four legs. For instance,

the trot is a symmetrical gait in which diagonal pairs of legs move together. In cantering

animals, this symmetry is broken. Whereas one diagonal pair of legs moves in

synchrony, the other pair is out of phase, with the respective foreleg being ahead of the

contralateral hind leg. According to Alexander (1984), the phase difference of this

asynchronous pair is 140 deg. In our data, the phases of the legs with respect to the left

foreleg were 155, 205, and 0 deg for the right foreleg, the left hind leg, and the right

hind leg, respectively. This pattern clearly shows the asynchronous characteristic of the

canter, but the phase difference between foreleg and hind leg of the asynchronous leg

pair is smaller than described by Alexander (1984). We nevertheless refer to the gait pattern of our animated dog as a “canter” in the following experiments, accepting some mismatch between the phase relations in our data and those reported in the literature.

The point-light displays were presented on a background depicting a perspective

landscape (Fig. 2.1). The landscape was designed with the software Bryce 4 by Meta

Creations. It portrayed a desert scene containing objects (cactuses and posts) that served as size references. All objects belonging to the same class had the same size within the perspective scene (posts 1 m; cactuses 2 m), resulting in different image sizes on the screen according to their positions in depth. Posts were positioned at regular intervals along two parallel lines. Cactuses were arranged randomly. The camera used to render this scene was positioned 1.5 m above the ground with a tilt angle of 8°. The scene subtended a visual angle of 35.5 × 24.5 deg.



Fig. 2.1. Display of a dog on the perspective background. The lines connecting the dots were shown only in the stick-figure depictions of the second subtask of Experiment 2. They were omitted in Experiment 1 and in the first subtask of Experiment 2.

2.2.1.3 Procedure

Animated dogs moved across the scene from the left-hand side to the right-hand side.

The playback speed was varied systematically, resulting in five different stride

frequencies (2.54, 3.02, 3.59, 4.27, and 5.08 cycles/s). These frequencies corresponded

to 71, 84, 100, 119, and 141% of the original stride frequency. By pressing the arrow keys on the keyboard, participants could move the point-light display vertically on the screen, and hence change its perceived position in depth, among 21 positions. The physical size of the point-light display remained constant. Because of the perspective background, each vertical screen position corresponded to one position in depth, resulting in a different size impression. Apparent size changed by a factor of about 1.09 from one position to the adjacent one, so that across the 21 positions apparent size varied by an overall factor of 5.66.
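The stimulus levels can be reconstructed from the numbers given above. The five stride frequencies and their percentages are consistent with steps of 2^(1/4) ≈ 1.19 around the original frequency of 3.59 cycles/s; this spacing is inferred from the listed values and is not stated explicitly in the text. Likewise, 21 depth positions imply 20 steps between adjacent positions, so the per-step factor is the 20th root of the overall factor 5.66, which rounds to the reported 1.09.

```python
BASE_FREQ = 3.59  # cycles/s: the original (100%) stride frequency

# Inferred spacing (assumption): one level per factor 2**(1/4), about 1.19.
levels = [BASE_FREQ * 2 ** (k / 4) for k in range(-2, 3)]
freqs = [round(f, 2) for f in levels]
percent = [round(100 * f / BASE_FREQ) for f in levels]
print(freqs)    # [2.54, 3.02, 3.59, 4.27, 5.08]
print(percent)  # [71, 84, 100, 119, 141]

# 21 positions -> 20 steps between adjacent depth positions.
OVERALL_FACTOR = 5.66
step = OVERALL_FACTOR ** (1 / 20)
print(round(step, 2))  # 1.09
```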

The experiment took place in a separate experimental room. Animations were presented on a 19-inch monitor (90 Hz) at a frame rate of 45 Hz. Observers were told that they would be shown dogs of different sizes animated as point-light displays. They were instructed to adjust the apparent size of the dogs so that the display on the screen looked as natural as possible. In each trial, observers were allowed to try different positions as often as they wanted. Each time the observers pressed a key to change the size, the dog restarted from its initial position on the left side of the screen. A trial was completed when the observers had selected one position and confirmed their choice by pressing the space bar. Time for the task was unlimited, and no feedback was given following the size judgments. Before the experimental trials, observers were shown six demonstration trials to familiarize them with the displays and the setup. During these demonstration trials, the experimenter pointed out the perspective properties of the scene and drew attention to the various sizes of the objects (posts and cactuses) serving as a reference scale.

The experiment used a one-factor repeated-measures (within-subjects) design. The independent variable was the stride frequency of the dog animation, with the five levels described above. Each condition comprised 11 trials, which started from different initial sizes covering the whole range of possible sizes. The order of the 55 trials was randomized individually for each participant.
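One way to build such a trial list is sketched below. The text specifies 5 stride frequencies × 11 trials = 55 trials in individually randomized order, but not how the 11 initial sizes were chosen; the sketch assumes, hypothetically, that they were spread evenly over the 21 depth positions (every second position), and uses a per-participant seed so that each order is reproducible.

```python
import random
from collections import Counter

STRIDE_FREQS = [2.54, 3.02, 3.59, 4.27, 5.08]  # cycles/s, from the text
N_POSITIONS = 21  # adjustable depth positions (indices 0..20)
# Hypothetical choice: 11 initial positions spread evenly (0, 2, ..., 20).
START_POSITIONS = list(range(0, N_POSITIONS, 2))

def make_trial_order(participant_seed: int) -> list[tuple[float, int]]:
    """Return all frequency x start-position combinations in a random order."""
    trials = [(f, start) for f in STRIDE_FREQS for start in START_POSITIONS]
    random.Random(participant_seed).shuffle(trials)
    return trials

order = make_trial_order(participant_seed=1)
print(len(order), Counter(f for f, _ in order))
```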

2.2.2 Results and discussion

A repeated-measures analysis of variance (ANOVA) revealed a significant effect of stride frequency on perceived size (F(4,60) = 11.85, p < 0.001). On average across all participants, animated dogs moving with high stride frequency were perceived as smaller than dogs moving with low stride frequency (Fig. 2.2). This outcome supports the hypothesis that observers retrieve size information from the stride frequency that animals use for locomotion. Note that the instructions did not explicitly draw observers’ attention to the stride frequency of the animated animals; observers were merely asked to adjust the position so that the scene looked as natural as possible. Observers therefore seem to use implicit knowledge to make their size judgments.

Based on the assumptions formulated in Equation 5, the function

$$s_{all} = \frac{k_1}{f^2} + k_2 \qquad (6)$$

was fitted to the data. For the averages across observers, the best-fitting values are k1 = 141 cm s⁻² and k2 = 35 cm. With these values, Equation 6 fits the mean size estimates across all observers with r² = 0.96, leaving only 4% of the variance unexplained. A linear fit to the same data, by contrast, yields r² = 0.88 and therefore leaves 12% of the variance unexplained.
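Because Equation 6 is linear in k1 and k2 once x = 1/f² is used as the predictor, it can be fitted by ordinary least squares. The minimal NumPy sketch below fits Equation 6 and a straight line and compares their coefficients of determination. The observers' means are not listed in the text, so the example uses noiseless synthetic sizes generated from k1 = 141 and k2 = 35 purely to check the procedure. It does not reproduce the experimental data, and the dissertation does not state which fitting routine was actually used.

```python
import numpy as np


def r_squared(y: np.ndarray, pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_equation_6(f: np.ndarray, s: np.ndarray) -> tuple[float, float, float]:
    """Fit s = k1 / f**2 + k2 by least squares; return (k1, k2, r2)."""
    x = 1.0 / f**2
    k1, k2 = np.polyfit(x, s, deg=1)  # slope is k1, intercept is k2
    return float(k1), float(k2), r_squared(s, k1 * x + k2)


def fit_linear(f: np.ndarray, s: np.ndarray) -> float:
    """Fit s = a * f + b and return only its r2, for comparison."""
    a, b = np.polyfit(f, s, deg=1)
    return r_squared(s, a * f + b)


f = np.array([2.54, 3.02, 3.59, 4.27, 5.08])  # stride frequencies, cycles/s
s = 141.0 / f**2 + 35.0                        # synthetic sizes in cm (no noise)

k1, k2, r2_model = fit_equation_6(f, s)
r2_lin = fit_linear(f, s)
print(f"k1 = {k1:.1f}, k2 = {k2:.1f}, r2 model = {r2_model:.3f}, r2 linear = {r2_lin:.3f}")
```

With noiseless input the fit recovers k1 and k2 exactly (r² = 1), while the straight line leaves some variance unexplained because the relation is curved. With real, noisy means both r² values are lower, as reported above.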

Fig. 2.2. Means across all 16 observers in Experiment 1. The estimated size is plotted for each stride frequency. Error bars indicate SEM. The curve shows the fit of the theoretical model (Equation 6). The coefficient of determination between the fitted function and the means across all observers is r² = 0.96.

Individual response patterns, however, reveal clear interindividual differences in the extent to which observers used the spatio-temporal scaling relation, as indicated by the variability of k1 (Fig. 2.3). Substantial individual differences are also evident in the correlations between the empirical data and the model fit (Table 2.1). Of the 16 observers, 9 showed a response pattern that correlated significantly with the model fit; the responses of the remaining 7 did not. One of the observers with a significant correlation (T.B.) interpreted the temporal scaling relation in the direction opposite to the one expected: he associated high stride frequencies with large sizes and low stride frequencies with small sizes, resulting in a negative value of k1.

Fig. 2.3. Mean estimated size for each observer and stride frequency in Experiment 1 (n = 11 trials per condition). Error bars indicate SEM. The curves show the model fitted to each observer individually. *p < 0.05; **p < 0.01 indicate the level of significance of the correlation between the model and individual size estimations.

Only half of the observers (8 of 16) retrieved consistent size information in the expected direction in the setting of Experiment 1. This outcome indicates substantial interindividual differences in the ability to retrieve information from the spatio-temporal scaling relation, which may have at least two sources. First, some observers may have neglected the spatio-temporal scaling relation in their estimations and relied only on other size cues. Second, some observers may not have understood the relationship between changes in vertical position and spatial depth and, therefore, had difficulty indicating their size impression adequately within this experimental setup.

Table 2.1. Experiment 1: Parameters of the theoretical model (Equation 6) fitted to the data of individual participants. r² = coefficient of determination. *p < 0.05; **p < 0.01.

Participant   k1 (cm s⁻²)   k2 (cm)   r²
F.N.               308          27     0.38**
K.S.               358          23     0.62**
L.J.                47          56     0.00
C.O.               324          14     0.75**
Z.K.               155          23     0.45**
M.H.               286          39     0.17**
I.L.                 8          59     0.00
L.M.               341          16     0.66**
H.B.               -26          48     0.05
T.R.                93          37     0.05
T.B.              -111          60     0.12**
M.V.                76          37     0.08
J.B.               104          36     0.11*
R.R.               -11          31     0.01
A.S.               215          17     0.61**
S.B.                93          44     0.04

We inferred perceived size by requiring observers to adjust the position of the dog animation in the landscape. One objection to this task is that visual depth compression might distort the otherwise well-defined relation between the distance of an object and its projected size. However, as Sedgwick (1993) points out, depth compression would not affect the frontal-plane dimensions of a projected object. In addition, the scene provided reference objects at different depths, so observers did not have to rely on distance information from depth cues alone: the size of the dog could be judged simply in relation to the size of the cactuses and posts scattered around the scene.

Moreover, the setting of Experiment 1 does not allow a direct calculation of either the weight factor λ, which reflects the individual weights of the two sources of information (static versus dynamic), or the constants c1 and c2, because λ is confounded with the constant scaling factors c1 and c2 (Equation 5). Because k1 combines λ and c1, it only weakly reflects the extent to which the temporal scaling relation is taken into account. To address these issues, we designed a second experiment in which observers could change the size of the dog directly. While this may make it easier for observers to indicate perceived size, it also rules out any remaining concerns about depth-compression effects. Furthermore, a second subtask was added to resolve the confounding of the weight factor with the scaling factors.
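The confound can be made concrete with the relations k1 = λc1 and k2 = (1 - λ)c2 given in the note to Table 2.2. In the small Python sketch below (the function names are illustrative), several different combinations of λ, c1, and c2 all reproduce the group-level values k1 = 141 and k2 = 35 from Experiment 1. Fitting Equation 6 alone therefore cannot tell these combinations apart, whereas an independent estimate of c2 removes the ambiguity.

```python
def k_from_params(lam: float, c1: float, c2: float) -> tuple[float, float]:
    """Fitted constants of Equation 6: k1 = lam * c1, k2 = (1 - lam) * c2."""
    return lam * c1, (1.0 - lam) * c2


def params_for(lam: float, k1: float, k2: float) -> tuple[float, float]:
    """For a chosen weight lam in (0, 1), return the (c1, c2) that give k1, k2."""
    return k1 / lam, k2 / (1.0 - lam)


K1, K2 = 141.0, 35.0  # group-level fit from Experiment 1

for lam in (0.25, 0.5, 0.75):
    c1, c2 = params_for(lam, K1, K2)
    k1, k2 = k_from_params(lam, c1, c2)
    print(f"lambda = {lam:.2f}: c1 = {c1:6.1f}, c2 = {c2:5.1f} -> k1 = {k1:.1f}, k2 = {k2:.1f}")
```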

2.3 Experiment 2

In this experiment, we changed the mechanism for indicating perceived size. Observers could change the perceived size of the dog directly by changing its projected size while its position in spatial depth remained constant. In a supplementary task, designed to obtain a size estimate based only on cues independent of stride frequency, observers estimated the apparent size of a static stick-figure depiction; this provided a direct measure of c2 in Equation 5. Combined with the values of k1 and k2 obtained from the first part of Experiment 2, this measure was used to derive values for λ and c1. This procedure allows us to separate size information from static and dynamic sources and to calculate how the two sources are integrated.

2.3.1 Method

2.3.1.1 Participants

Sixteen students (8 females and 8 males) between the ages of 19 and 32 years from the

psychology department of the Ruhr-University participated in this experiment. None of

these participants had participated in Experiment 1. Participants received course credit

for their participation. All participants had normal or corrected-to-normal vision. They

were naive as to the objectives of this experiment.

2.3.1.2 Stimuli

Stimuli were identical to those used in Experiment 1, except that, rather than displaying the dog with constant projected size at 21 different positions in depth, we generated 21 differently sized dogs and displayed all of them at the same position. The range of apparent sizes covered by this mechanism was the same as in the previous experiment. The visual angle of the dog animation ranged from 2.2 deg for the smallest animation to 12.4 deg for the largest. The pixel size of the dots marking the main joints and of their shadows on the ground was adjusted accordingly. As in Experiment 1, five different stride frequencies were used: 2.54, 3.02, 3.59, 4.27, and 5.08 cycles/s. For the second part of the experiment, we generated a static stick-figure depiction of the point-light display on the same perspective background. The stick figure was positioned in the middle of the screen. Dots belonging to adjacent joints were connected, illustrating the articulation of the joints (Fig. 2.1).

2.3.1.3 Procedure

The procedure in the first subtask of Experiment 2 was similar to that of Experiment 1; the only difference was the mechanism for indicating size. Observers' instructions were similar to those of the former experiment but were adapted to the new procedure. Six demonstration trials preceded the 55 experimental trials, in which observers gave their size estimates by choosing the dog whose size looked most natural. Observers were given no feedback following their size judgments. The experiment used a one-factor within-subjects (repeated-measures) design. In each of the five stride-frequency conditions, 11 repeated trials were presented, and their initial sizes covered the whole range of possible sizes. The order of the 55 trials was randomized individually for each participant. After completing the first part of the experiment, participants were instructed about the second subtask, in which they were presented with 11 trials showing static stick-figure displays of a dog. Observers were explicitly told that all stick-figure displays were based on the same animal and varied only in their initial display size and in the state (i.e., the phase) of the stride cycle. Their task was to indicate the size of the stick-figure dogs with the arrow keys on the computer keyboard, using the same mechanism as in the first subtask.

2.3.2 Results and discussion

The results of the first part of this experiment were analyzed as in Experiment 1. As in the previous experiment, on average across all observers, dogs moving with high stride frequency were estimated to be smaller than dogs moving with low stride frequency (Fig. 2.4). A repeated-measures ANOVA showed that this effect was significant (F(4,60) = 20.67, p < 0.001).

Fig. 2.4. Means across all 16 observers in Experiment 2. The estimated size is plotted for each stride frequency. Error bars indicate SEM. The curve shows the fit of the theoretical model (Equation 6). The coefficient of determination between the fitted function and the means across all observers is r² = 0.98.

This finding again supports the spatio-temporal scale hypothesis. Fitting the theoretical model (Equation 6) to the empirical data yielded the following best-fitting function:

$$s = \frac{189}{f^2} + 38$$

The coefficient of determination between this function and the mean size estimates across all observers was r² = 0.98, whereas a linear fit to the same data yielded r² = 0.94. The proposed model thus leaves only 2% of the variance unexplained, compared with 6% for the linear fit. For each observer, the median of the size estimates for the static figure in the second subtask was taken as the value of c2, representing size information independent of any temporal scaling cue. On average across all observers, c2 was 61.47 cm. The relatively small standard deviation of 13.90 cm indicates fairly uniform behavior in this subtask. The individual values of c2 were used to determine, for each observer, the weight factor λ = 1 - k2 / c2 and the spatio-temporal scaling factor c1 = k1 * c2 / (c2 - k2), according to Equation 5 (Table 2.2).
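The two formulas can be applied observer by observer. The following Python sketch (the function name is illustrative) recovers λ and c1 from the fitted k1 and k2 and the static-figure estimate c2. With the rounded entries of Table 2.2 for observer C.N. (k1 = 184, k2 = 33, c2 = 60.00), it reproduces the tabulated λ = 0.45 and c1 = 408.89. For some other rows the recomputed c1 differs slightly from the table, presumably because k1 and k2 are reported rounded to integers.

```python
def decompose(k1: float, k2: float, c2: float) -> tuple[float, float]:
    """Return (lam, c1) from Equation 5, given k1 = lam*c1 and k2 = (1-lam)*c2."""
    if c2 == k2:
        # lam would be 0, and c1 = k1 / lam is then undefined.
        raise ValueError("c2 equals k2: weight factor is zero, c1 undefined")
    lam = 1.0 - k2 / c2
    c1 = k1 * c2 / (c2 - k2)  # algebraically identical to k1 / lam
    return lam, c1


lam, c1 = decompose(k1=184, k2=33, c2=60.0)  # observer C.N., Table 2.2
print(round(lam, 2), round(c1, 2))  # -> 0.45 408.89
```

When an observer's static estimate c2 is smaller than k2, as for A.A. and M.H., the computed λ becomes negative, which is how the negative weights in Table 2.2 arise.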



Table 2.2. Experiment 2: Characteristics of the theoretical model (Equation 5) fitted to the data of individual participants. Note: k1 = λ c1; k2 = (1-λ) c2. c2 was derived from the median of the size estimations per observer given in the static stick-figure trials. r² = coefficient of determination. *p < 0.05; **p < 0.01.

Participant   k1 (cm s⁻²)   k2 (cm)   c1 (cm s⁻²)   c2 (cm)      λ       r²
A.C.               491          20       732.84       60.00     0.67    0.71**
J.A.               186          30       413.33       55.00     0.45    0.25**
H.O.               219          38       521.43       65.43     0.42    0.29**
U.A.               204          34       340.02       84.82     0.60    0.24**
A.A.               -13          69        86.67       60.00    -0.15    0.00
S.I.                68          53       566.67       60.00     0.12    0.02
J.N.               143          49       572.01       65.43     0.25    0.08
N.K.               427          12       514.46       71.34     0.83    0.81**
C.N.               184          33       408.89       60.00     0.45    0.52**
D.M.                -5          70       -20.83       92.50     0.24    0.00
P.P.                97          33       440.91       42.41     0.22    0.18**
M.H.               -27          43       128.57       35.67    -0.21    0.06
A.G.               205          34       427.03       65.43     0.48    0.26**
C.K.               180          29       382.98       55.00     0.47    0.36**
M.K.               283          21       435.38       60.00     0.65    0.45**
J.C.               373          33      1065.71       50.43     0.35    0.18**

The individual response patterns again showed considerable interindividual differences in the use of the spatio-temporal scaling relation (Fig. 2.5). In this experiment, a very clear division into two groups became apparent. Whereas the correlation with the proposed model was highly significant (p < 0.01) for 11 of the 16 observers, it was negligible and non-significant for the remaining 5 observers (p > 0.05). With their very flat curves, these 5 observers did not seem to pay any attention to the different stride frequencies; their responses appeared entirely insensitive to the independent variable (i.e., the stride frequency). Two of them (J.N. and D.M.) also showed very large variances across repetitions of the same stimulus, which suggests that they responded inconsistently. Observers from this group also gave the largest and the smallest size estimates for the statically displayed dog. Consequently, very low (and in two cases even negative) values of λ were obtained for some of them.


Fig. 2.5. Mean estimated size for each observer and simulated stride frequency in Experiment 2 (n = 11 trials per condition). Error bars indicate SEM. The curves show the model fitted to each observer individually. *p < 0.05; **p < 0.01 indicate the level of significance of the correlation between the model and individual size estimations.

Disregarding the five participants whose responses did not vary systematically with stride frequency, the results indicate that the visual system employs the inverse quadratic relation between characteristic size and stride frequency when estimating the size of an animal in the absence of other cues.

2.4 General discussion of both experiments

As summarized above, previous experimental work has shown that observers are able to

judge object size in inanimate dynamic systems governed by gravity. The experiments

reported here provide the first empirical evidence that those findings can be extended to

the domain of animate motion as well. The human visual system uses the physically

determined relation between spatial and temporal scales to obtain the size of a moving

animal in the absence of other cues. In both experiments conducted to test the spatio-

temporal scale hypothesis, we found the predicted effect of stride frequency on

perceived size. Nevertheless, when investigating the individual size estimations in terms

of the parameters of the proposed model, substantial interindividual differences became

evident. These differences were more pronounced in Experiment 1 than in Experiment

2. The results obtained in the modified setting show that observers retrieved the motion-

mediated size information more efficiently. The data show less intersubject variability

and larger values for k1 when compared to Experiment 1, in which we had attempted to

provide a method for transforming observers’ size impression into a corresponding

response while maintaining a constant retinal size of the stimulus.

In the two experiments reported here, we presented observers with a single scaling relation between time and space and required them to judge spatial scale on the basis of temporal variations. One might argue that observers simply assign numbers to the temporal variations without actually perceiving these variations as information about scale. If this were the case, however, one would expect the direction of the mapping between time and space to be assigned arbitrarily. Yet only one of 32 observers showed a reversed correlation between perceived size and stride frequency. Moreover, we found an inverse quadratic relation rather than a simple linear one, which reflects the physical properties of the spatio-temporal relation. Simply assigning numbers to temporal variations would probably lead to a linear relation instead.



Altogether, seven observers in Experiment 1 and five observers in the optimized setting of Experiment 2 neglected the spatio-temporal scaling relation, as shown by response patterns unrelated to stride frequency. One reason for this pattern of results might be the methodological approach. We used a method similar to that of Pittenger (1985), in which participants were given only timing as information about spatial scale in pendulum motion. Like the current results, Pittenger's results were noisy, with strong individual differences. In a related study of pendulum motion (Pittenger, 1990), observers were given precise information about spatial scale, but the timing of the event was manipulated to be either consistent or inconsistent with the pendulum law. Rather than having to readjust the timing, observers only had to judge whether it was correct, and they performed this task with high accuracy. Pittenger's results thus suggest that observers are better at detecting violations of the spatio-temporal scaling relation than at transforming temporal information into judgments of spatial parameters. A similar effect may have played a role in our setup.

Given a constant stride length, a higher stride frequency entails a higher locomotion speed. One might be concerned about this confounding of stride frequency and locomotion speed, arguing that the current results could depend on simple translational speed rather than on the details of the gait itself. In a previous study (Jokisch, Midford, & Troje, 2001), we used point-light displays of dog animations from which the translational motion component had been subtracted, so that the point-light animal remained in the center of the screen. Varying the stride frequency in these displays also had a significant effect on perceived size. We are therefore confident that the crucial source of size information in the experiments reported here is the stride frequency itself. Nevertheless, we cannot entirely exclude a contribution of translational speed to the size judgments. In a natural display, stride frequency, locomotion speed, and stride length cannot be varied independently, because speed is the product of stride length and stride frequency. However, our aim was not to identify the specific perceptual cues used to derive size from BM. Instead, we wanted to test whether the human visual system is able to employ the relation between temporal and spatial scales that is physically defined through gravitational acceleration.
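As an idealized illustration of why a gravity-governed relation between temporal and spatial scales takes an inverse quadratic form, consider a simple pendulum of length l in the small-amplitude approximation. This example is chosen here for clarity; it is not the animal model of Equation 3.

$$T = 2\pi\sqrt{\frac{l}{g}} \quad\Longrightarrow\quad f = \frac{1}{T} = \frac{1}{2\pi}\sqrt{\frac{g}{l}} \quad\Longrightarrow\quad l = \frac{g}{4\pi^{2}}\,\frac{1}{f^{2}}$$

Doubling the frequency therefore reduces the corresponding length by a factor of four, and the constant g/(4π²) ≈ 24.8 cm s⁻² (with g ≈ 981 cm s⁻²) plays the role of a spatio-temporal scaling factor. Only the functional form carries over to walking animals; the scaling factor for animal locomotion is an empirical quantity.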

Human observers seem to be able to employ the general inverse quadratic relation between size and stride frequency to derive information about size from temporal parameters. In addition to this qualitative result, the measurements taken in Experiment 2 can be used for quantitative comparisons between the absolute size indicated by the observers and the size of real animals that walk with the respective stride frequencies. The relation between size and stride frequency of walking animals is expressed by the factor c1 in Equation 3. To summarize the results of Experiment 2, we computed c1 as the median across the 11 observers who responded consistently, which yields 435 cm s⁻². Unfortunately, the only data set we are aware of that can be used to derive the spatio-temporal relation factor from natural locomotion patterns is the one reported by Pennycuick (1975), who compared stride frequencies and shoulder heights of 14 African quadruped mammal species for different gait patterns. The smallest animal in this study (Thomson's gazelle) had a shoulder height of 60 cm; the largest (elephant) had a shoulder height of 310 cm. From Pennycuick's Fig. 13, we calculated c1 = 410 cm s⁻² for cantering animals. This value is very close to the one obtained from our data.
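The summary value can be checked directly against Table 2.2. The Python sketch below takes the c1 values of the 11 observers whose correlation with the model reached p < 0.01, computes their median, and compares it with the value derived from Pennycuick (1975). The median is a sensible summary here because two observers (A.C. and J.C.) produced very large c1 values that pull the mean upwards.

```python
from statistics import mean, median

# c1 in cm s^-2 from Table 2.2, observers with a significant model fit (p < 0.01).
C1_CONSISTENT = {
    "A.C.": 732.84, "J.A.": 413.33, "H.O.": 521.43, "U.A.": 340.02,
    "N.K.": 514.46, "C.N.": 408.89, "P.P.": 440.91, "A.G.": 427.03,
    "C.K.": 382.98, "M.K.": 435.38, "J.C.": 1065.71,
}
C1_PENNYCUICK = 410.0  # cm s^-2, cantering animals, as derived in the text

c1_median = median(C1_CONSISTENT.values())
c1_mean = mean(C1_CONSISTENT.values())
rel_diff = (c1_median - C1_PENNYCUICK) / C1_PENNYCUICK

print(f"median c1 = {c1_median:.2f}")             # -> median c1 = 435.38
print(f"mean c1   = {c1_mean:.2f}")               # -> mean c1   = 516.63
print(f"median vs. Pennycuick: {rel_diff:+.1%}")  # -> median vs. Pennycuick: +6.2%
```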

The close match between the empirical data for cantering animals (Pennycuick, 1975) and the data obtained in our experiments seems to imply that the human visual system not only takes into account the general inverse quadratic relation between stride frequency and size but also takes advantage of implicit knowledge about the particular gait pattern observed. We note, however, that the good quantitative agreement between Pennycuick's data and ours may well be accidental. A number of factors introduce uncertainty into the absolute value of the spatio-temporal scaling factor c1 as derived from our experiments. For instance, the perceived height of the reference objects in the scenery may deviate from their “real” height. The posts were intended to have a height of 1 m and the cactuses a height of 2 m, and these numbers were given to the observers when the experiment was introduced. However, the reference objects may still have been perceived as larger or smaller, changing the reference frame used to indicate the dog's size. Another critical point is the determination of the constant c2 in Equation 5. In the second subtask of Experiment 2, we tried to measure the perceived size as given by cues that are independent of stride frequency. We did so by asking the observers to estimate the size of a static stick-figure display. However, this procedure may not be sufficient to derive the desired information accurately: a moving dog may provide size cues that are absent from the static display but are nevertheless independent of stride frequency. A final source of uncertainty is that living animals, even if they try to minimize energy consumption during locomotion, differ from inanimate dynamic systems. In a swinging pendulum or a bouncing ball, the relation between temporal and spatial parameters is exactly defined by gravity, because no other forces affect these motions. In dynamic animate systems, by contrast, muscular forces controlled by intentional behavior play an important role. They are used not only to compensate for damping effects in the articulated pendulum system of the body but also to alter the motion pattern substantially, covering a wider range of stride frequencies within a given gait pattern.

In summary, human observers are able to employ implicit knowledge about the general inverse quadratic relation between size and stride frequency to derive information about the size of an animal from temporal parameters. The exact scaling of this relation depends on a number of parameters that are beyond the control of the current experiments. We therefore interpret the close agreement between our data and quantitative predictions based on the biomechanics of particular quadruped gaits with caution. It would be interesting, however, to test whether the perceived size of animals traveling with a given stride frequency changes predictably as a function of the gait pattern.

Chapter III Encoding and Recognition of Biological Motion


III Study 2: Structural Encoding and

Recognition of Biological Motion:

Evidence from Event-related Potentials

and Source Analysis

Daniel Jokisch, Irene Daum, Boris Suchan1 and Nikolaus F. Troje

Summary

In the present study we investigated how different processing stages involved in the perceptual analysis of biological motion (BM) are reflected by modulations in event-related potentials (ERPs), in order to elucidate the time course and location of the neural processing of BM. Data analysis was carried out using conventional averaging techniques as well as source localization with low resolution brain electromagnetic tomography (LORETA). ERPs were recorded in response to point-light displays of a walking person, an inverted walking person, and scrambled motion. Analysis revealed a negativity peaking at 180 ms after stimulus onset, which was larger for upright walkers than for inverted walkers and scrambled motion. A later negative component between 230 and 360 ms after stimulus onset had a larger amplitude for upright and inverted walkers than for scrambled motion. In this later component, negativity was more pronounced over the right hemisphere, revealing a hemispheric asymmetry in BM perception. LORETA analysis yielded evidence for sources specific to BM within the right fusiform gyrus and the right superior temporal gyrus for the second component, whereas sources for BM in the early component were located in areas associated with attentional aspects of visual processing. The early component might reflect the pop-out effect of a moving dot pattern representing the highly familiar form of a human figure, whereas the later component might be associated with the specific analysis of motion patterns providing biologically relevant information.

1 Boris Suchan contributed to this study by providing assistance with performing the LORETA analysis and discussing the results.



3.1 Introduction

The human visual system is highly sensitive to animate motion patterns. We can efficiently detect another living being in a visual scene, recognize human action patterns, and attribute many features of psychological, biological, and social relevance to other persons. One experimental approach to studying information from biological motion (BM) with reduced interference from non-dynamic cues is to represent the main joints of a person's body by bright dots against a dark background (Johansson, 1973). From such point-light displays, observers can easily recognize a human walker, determine his or her gender (Barclay et al., 1978; Cutting, 1978; Kozlowski & Cutting, 1977; Mather & Murdoch, 1994; Troje, 2002a), recognize various action patterns (Dittrich, 1993), identify individual persons (Cutting & Kozlowski, 1977), and even recognize themselves (Beardsworth & Buckner, 1981). The evolutionary importance of perceiving animate motion patterns has led to the development of specialized neural machinery, as shown by several brain imaging studies using fMRI and PET (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001). These studies report selective activation of the superior temporal sulcus (STS) by visual stimuli consisting of BM. In addition to the STS, activation specific to BM has been shown in the cerebellum (Grossman et al., 2000; Vaina et al., 2001), area VP (Servos et al., 2002), the amygdala (Bonda et al., 1996), and the occipital and fusiform face areas (Grossman & Blake, 2002). Activity in the STS region is not induced by meaningful and coordinated non-biological motion such as the pendulum movements of a grandfather clock (Pelphrey et al., 2003). Beauchamp and colleagues (2003) also supported a dissociation between the visual processing of moving humans and of moving manipulable objects: human point-light and video displays elicited STS activity, whereas tool video and point-light displays elicited activity in the middle temporal gyrus.

According to a neural model based on BM perception studies, both the dorsal and the ventral processing streams contribute to the perceptual analysis of BM (Giese & Poggio, 2003). The ventral form pathway is thought to provide information about sequences of body postures, and the dorsal motion pathway is thought to provide information about complex optic flow patterns. Data from both pathways are integrated in the STS region. This region is not only involved in the perception of whole-body movements: STS activation is also observed during the perception of movements of the eyes, hands, and mouth, and even when viewing implied motion in static images (Allison et al., 2000). With respect to face perception, the STS has been suggested to be involved in processing the dynamic aspects of faces that convey information facilitating social communication (Haxby, Hoffman, & Gobbini, 2000).

Results from imaging studies as well as from simulations (Giese & Poggio, 2003) are consistent with neuropsychological findings in patients with focal brain lesions (Cowey & Vaina, 2000; McLeod et al., 1996; Schenk & Zihl, 1997; Vaina, 1994; Vaina et al., 1990). These case studies provide evidence for a dissociation between mechanisms involved in the perception of BM on the one hand and mechanisms involved in inanimate visual motion tasks or static object recognition tasks on the other.

One approach to studying the neural dynamics of action perception is to measure activity in response to full-view displays of body movements in which the display is initially static and movement onset occurs after a delay. Using ERPs, Puce et al. (2000) observed neural responses within 200 ms after the onset of mouth and eye movements. Facial movements on a continuously present face elicited different N170 amplitudes for mouth opening versus closing and for eyes averting versus gazing at the observer. Similar results were found for the observation of whole-body actions of others (Wheaton et al., 2001): ERPs elicited by movement onset in movie sequences of body stepping, hand closing and opening, and mouth opening and closing were selective for specific hand and body motions.

Findings from ERP and functional imaging studies in humans are complemented by electrophysiological studies in monkeys. Single-cell recordings in the macaque superior temporal polysensory area (STP) revealed neurons that responded selectively to the sight of whole-body movements as well as to point-light displays of BM (Oram & Perrett, 1994). Many STS cells integrate information about the form and motion of animate objects (Oram & Perrett, 1996). Further support for the notion that the human temporal lobe also integrates facial form and motion stems from an fMRI and ERP study of the visual processing of natural and line-drawing displays of moving faces (Puce et al., 2003). The STS and the fusiform gyrus responded selectively to both types of face stimuli, and both types evoked larger ERPs than control stimuli at around 200 ms after motion onset. Puce and Perrett (2003) recently concluded that specialized visual mechanisms exist in the STS complex of both humans and non-human primates, which produce selective neural responses to moving natural images of faces and bodies. These mechanisms are also involved in the processing of point-light displays of BM. Whereas substantial knowledge is available about the neural structures underlying BM perception, many issues concerning the processing stages and their temporal characteristics in humans remain unclear. Hirai and colleagues (2003) sought to clarify the neural dynamics of BM perception by comparing ERPs elicited by point-light displays of BM and of scrambled motion. They report that both types of stimuli elicited peaks at around 200 and 240 ms, which were larger in the BM condition than in the scrambled motion condition.

The aim of the present study was to further elucidate the nature, time course, and location of neural processing involved in the perception of BM by using event-related potentials and, in addition, low resolution brain electromagnetic tomography (LORETA). Point-light displays of whole-body motion in upright and inverted orientation served as stimuli in order to focus on the dynamic aspects of body motion and to reduce form cues from body shape. In contrast to some previous ERP studies (Puce et al., 2000; Wheaton et al., 2001) and in accordance with the approach of Hirai and colleagues (2003), form information in the current study had to be derived from the motion trajectories. Furthermore, the comparison of upright and inverted BM aims to provide deeper insight into distinct processing stages associated with BM, since inverted displays convey the same structural information as upright BM, but the detection of an actor is substantially impaired, as shown in several psychophysical experiments (Pavlova & Sokolov, 2000; Sumi, 1984; Troje, 2003). Control stimuli were displays of scrambled motion, in which each dot had the same motion trajectory as in the BM condition but a randomized initial position. We predicted an inversion effect for BM in the time window up to 200 ms after stimulus onset, as is usually found for faces or other stimulus categories for which the observer is an expert. Moreover, we expected an ERP source specific to upright BM in the STS complex, concerned with the fine analysis of motion patterns that provide biologically relevant information and contribute to social perception.

3.2 Methods

3.2.1 Participants

Fifteen healthy volunteers (eight females, seven males; aged 20 to 35 years) participated in this study, which was undertaken with the understanding and written consent of each participant. The procedure was approved by the Ethics Committee of the Ruhr-University Bochum. All participants had normal or corrected-to-normal vision.

3.2.2 Stimuli

The visual stimuli used in this study were derived from the BM data of 20 men and 20 women walking on a treadmill. Data were recorded in 3D space using a motion capture system (Vicon; Oxford Metrics, Oxford, UK). A framework that allows BM data to be transformed with linear methods (Troje, 2002a) was applied. As a result of this procedure, an “average walker” was computed from our data set and animated as a point-light display. The dots representing the major joints of the body were located at the ankles, the knees, the hips, the wrists, the elbows, the shoulders, the center of the pelvis, the sternum, and the center of the head.
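Troje's (2002a) linear framework is not reproduced here. As a much simpler illustration of what averaging walkers involves, the Python sketch below assumes that each recorded gait cycle has already been cut at the same phase (for example, at the same heel strike) and stored as a frames × 15 markers × 3 array. It resamples every cycle to a common number of frames, removes each walker's mean position and averages marker positions frame by frame. The function names and the 100-frame resolution are hypothetical choices.

```python
import numpy as np

# 15 markers as listed in the text: ankles, knees, hips, wrists, elbows and
# shoulders (left and right), plus pelvis center, sternum and head center.
N_MARKERS = 15


def resample_cycle(cycle, n_frames=100):
    """Linearly resample one gait cycle (frames x markers x 3) to n_frames samples."""
    cycle = np.asarray(cycle, dtype=float)
    old_t = np.linspace(0.0, 1.0, cycle.shape[0])
    new_t = np.linspace(0.0, 1.0, n_frames)
    flat = cycle.reshape(cycle.shape[0], -1)            # frames x (markers * 3)
    out = np.column_stack([np.interp(new_t, old_t, flat[:, j])
                           for j in range(flat.shape[1])])
    return out.reshape((n_frames,) + cycle.shape[1:])


def average_walker(cycles, n_frames=100):
    """Average phase-aligned gait cycles after removing each walker's mean position."""
    resampled = [resample_cycle(c, n_frames) for c in cycles]
    centred = [r - r.mean(axis=(0, 1), keepdims=True) for r in resampled]
    return np.mean(centred, axis=0)                     # n_frames x markers x 3


# Hypothetical input: three walkers whose cycles have different lengths.
rng = np.random.default_rng(0)
walkers = [rng.normal(size=(n, N_MARKERS, 3)) for n in (118, 124, 131)]
avg = average_walker(walkers)
```

Averaging raw trajectories in this way blurs walkers whose cycles are not well phase-aligned, which is one reason a model-based linear representation is preferable for this purpose.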

The point-light displays were presented in frontal view on a black screen in upright orientation, in inverted orientation (rotated by 180 degrees in the fronto-parallel plane), or as scrambled motion. In the scrambled motion condition, the moving dots had the same local motion trajectories as in the upright BM displays, but their initial starting positions were randomized, destroying the spatial relations among the dots. The area within which the dots were scrambled was matched in size to that of the other stimulus conditions. Fig. 3.1 illustrates the three categories of visual stimuli.
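The three stimulus categories can be derived from a single point-light animation. The NumPy sketch below assumes 2D screen coordinates stored as frames × dots × 2 (a hypothetical layout). Inversion is a 180-degree rotation in the image plane about the figure's centroid, and scrambling keeps every dot's local trajectory while drawing a new mean position uniformly within the bounding box of the original dot positions. The exact area-matching rule used for the study is not reported, so the bounding-box rule is an assumption.

```python
import numpy as np


def invert(frames):
    """Rotate a 2D point-light animation by 180 degrees in the image plane.

    frames: array (n_frames, n_dots, 2) of screen x/y coordinates.
    The rotation is about the centroid of all dots over all frames.
    """
    frames = np.asarray(frames, dtype=float)
    centre = frames.reshape(-1, 2).mean(axis=0)
    return 2.0 * centre - frames          # (x, y) -> (-x, -y) relative to the centroid


def scramble(frames, seed=None):
    """Keep each dot's local trajectory but randomize its mean position.

    New mean positions are drawn uniformly from the bounding box of the
    original mean dot positions (an assumed area-matching rule).
    """
    frames = np.asarray(frames, dtype=float)
    rng = np.random.default_rng(seed)
    mean_pos = frames.mean(axis=0)                   # (n_dots, 2)
    local = frames - mean_pos                        # each dot's own motion
    lo, hi = mean_pos.min(axis=0), mean_pos.max(axis=0)
    new_pos = rng.uniform(lo, hi, size=mean_pos.shape)
    return local + new_pos


rng = np.random.default_rng(1)
frames = rng.normal(size=(120, 15, 2))   # hypothetical 120-frame, 15-dot animation
inv = invert(frames)
s = scramble(frames, seed=2)
```

Note that a 180-degree rotation is not the same as a vertical mirror flip: it also swaps left and right, so the inverted walker is an exact rotation of the upright one.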



Fig. 3.1. Categories of stimuli: BM in upright orientation (BM), BM in inverted orientation (IBM) and scrambled motion (SCR).

3.2.3 Experimental setup

Participants were seated in a dimly lit, sound-attenuated cabin with response buttons under their right hands. A computer screen was mounted at a distance of 90 cm in front of the participant’s eyes. At this distance, the stimuli subtended a visual angle of approximately 4.1 degrees in height and 1.6 degrees in width. All stimuli were presented for 800 ms at the center of the screen; successive trials were separated by intertrial intervals of 2000 ms during which a black screen was presented. Stimulus onset and motion onset occurred simultaneously. The experiment consisted of 60 trials per condition, resulting in 180 trials altogether, which were presented in randomized order. Participants were asked to maintain central fixation during the trials and to respond as quickly and accurately as possible by pressing the right button for dot patterns representing BM (in upright and inverted orientation) and the left button for scrambled motion. Observers did not receive feedback on their responses. Before the experimental trials started, observers were shown some demonstration trials of all stimulus conditions in order to familiarize them with the display and the setup.
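The reported visual angles follow from the standard relation θ = 2·arctan(s / 2d) between object size s and viewing distance d. The Python sketch below (helper names are hypothetical) inverts this relation for the 90 cm viewing distance and also builds a randomized list of 60 trials per condition, using the condition labels of Fig. 3.1.

```python
import math
import random


def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg, distance_cm):
    """Physical size (cm) needed to subtend angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def make_trial_list(n_per_condition=60, conditions=("BM", "IBM", "SCR"), seed=None):
    """Return a randomly ordered list with n_per_condition trials of each condition."""
    trials = [c for c in conditions for _ in range(n_per_condition)]
    random.Random(seed).shuffle(trials)
    return trials


height = size_for_angle_cm(4.1, 90)      # stimulus height on screen (cm)
width = size_for_angle_cm(1.6, 90)       # stimulus width on screen (cm)
trials = make_trial_list(seed=1)
duration_s = len(trials) * (0.8 + 2.0)   # 800 ms stimulus + 2000 ms intertrial interval
```

With these values, the stimuli would be about 6.4 cm high and 2.5 cm wide on the screen, and the 180 experimental trials take about 504 s (8.4 min) if each trial lasts exactly 800 ms plus the 2000 ms intertrial interval.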



3.2.4 EEG-recording

EEG was recorded with Ag-AgCl electrodes mounted in an elastic cap from 30 scalp sites (F5, FZ, F6, T7, C5, C3, CZ, C4, C6, T8, TP7, CP5, CP3, CP4, CP6, TP8, P7, P5, P3, PZ, P4, P6, P8, PO7, O1, OZ, O2, PO8, A1, A2) according to the 10-20 system, referenced to an electrode on the tip of the nose. The EOG was recorded from above and below the left eye as well as from the outer canthi of both eyes. Impedances were kept below 5 kΩ. A Neuroscan Synamps system with its accompanying software was used for recording. The EEG was sampled at 200 Hz and stored on hard disk.

The EEG data were analyzed off-line using the Brain Vision Analyzer software package. Raw data were digitally filtered with a 0.1 Hz high-pass and a 40 Hz low-pass filter and segmented into epochs ranging from 200 ms before to 2000 ms after stimulus onset. Artifacts were detected automatically (criterion ±75 µV), and the detection was subsequently checked visually; after segments containing artifacts had been removed, ocular correction was carried out. Epochs were baseline-corrected using the 200 ms interval preceding stimulus onset and averaged separately for the three experimental conditions. Grand-average ERPs were then calculated. Since the error rate was generally very low (less than 5%), all trials were included in the analysis.
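The baseline correction, amplitude-criterion rejection and condition averaging described above can be sketched in NumPy as follows. This is not the Brain Vision Analyzer implementation: filtering and ocular correction are omitted, and the ±75 µV criterion is applied here to baseline-corrected epochs, which is an assumption about the order of operations. The function names and the synthetic data are hypothetical.

```python
import numpy as np

FS_HZ = 200      # sampling rate reported in the text
PRE_MS = 200     # epochs start 200 ms before stimulus onset


def baseline_correct(epochs, fs=FS_HZ, pre_ms=PRE_MS):
    """Subtract each channel's mean over the pre-stimulus interval.

    epochs: array (n_trials, n_channels, n_samples) in microvolts; the first
    pre_ms * fs / 1000 samples are assumed to precede stimulus onset.
    """
    n_base = int(round(pre_ms * fs / 1000.0))
    return epochs - epochs[:, :, :n_base].mean(axis=2, keepdims=True)


def reject_artifacts(epochs, threshold_uv=75.0):
    """Keep only epochs whose samples all lie within +/- threshold_uv on every channel."""
    keep = np.all(np.abs(epochs) <= threshold_uv, axis=(1, 2))
    return epochs[keep], keep


def average_by_condition(epochs, labels):
    """Average the remaining epochs separately for each condition label."""
    labels = np.asarray(labels)
    return {str(c): epochs[labels == c].mean(axis=0) for c in np.unique(labels)}


# Synthetic example: 180 epochs, 30 channels, -200 to 2000 ms at 200 Hz.
rng = np.random.default_rng(0)
n_samples = int((PRE_MS + 2000) * FS_HZ / 1000)    # 440 samples per epoch
raw = rng.normal(0.0, 5.0, size=(180, 30, n_samples))
labels = np.repeat(["BM", "IBM", "SCR"], 60)
raw[0, 0, 100] = 120.0                             # one simulated artifact
epochs, keep = reject_artifacts(baseline_correct(raw))
erps = average_by_condition(epochs, labels[keep])
diff_bm = erps["BM"] - erps["SCR"]                 # BM minus scrambled, cf. Fig. 3.3B
```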

3.2.5 Data analysis

Behavioral performance was analyzed with a one-way repeated-measures ANOVA to determine effects of stimulus condition on response times. ERP effects of experimental variables were determined by conducting repeated-measures ANOVAs on ERP peak amplitude values (N170) or mean amplitude values (N300). N170 amplitude was measured automatically as the peak amplitude in the 150-200 ms time range. N300 amplitude was defined as the mean activity between 230 and 360 ms after stimulus onset and was also calculated automatically. Peak latencies were analyzed for the N170 component and for the positive component preceding the N300 component (time window 220-280 ms after stimulus onset).

The ANOVAs were conducted with the factors stimulus category (BM versus inverted BM versus scrambled motion), electrode location (O1, O2 versus PO7, PO8 versus P7, P8 versus TP7, TP8) and recording side (right versus left). Electrode selection was based on the main areas of interest suggested by previous functional imaging studies (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001). Greenhouse-Geisser adjustments to the degrees of freedom were applied when appropriate.
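The two amplitude measures and the peak latencies can be expressed as simple window operations on an averaged waveform. The sketch below uses hypothetical function names and a synthetic waveform, not recorded data. It takes the most negative sample in 150-200 ms for the N170, the most positive sample in 220-280 ms for the positive peak preceding the N300, and the mean over 230-360 ms for the N300.

```python
import numpy as np


def time_axis_ms(n_samples, fs=200, pre_ms=200):
    """Latency (ms) of each sample relative to stimulus onset."""
    return np.arange(n_samples) * 1000.0 / fs - pre_ms


def peak_in_window(erp, times, t_min, t_max, polarity=-1):
    """(amplitude, latency) of the most negative (polarity=-1) or most positive (+1) sample."""
    mask = (times >= t_min) & (times <= t_max)
    seg, seg_t = erp[mask], times[mask]
    idx = np.argmax(polarity * seg)
    return seg[idx], seg_t[idx]


def mean_in_window(erp, times, t_min, t_max):
    """Mean amplitude within [t_min, t_max] ms."""
    mask = (times >= t_min) & (times <= t_max)
    return erp[mask].mean()


# Synthetic waveform: negativity near 185 ms, positivity near 250 ms,
# broad negativity near 300 ms (values are illustrative only).
times = time_axis_ms(440)
erp = (-4.0 * np.exp(-((times - 185.0) / 15.0) ** 2)
       - 2.0 * np.exp(-((times - 300.0) / 40.0) ** 2)
       + 3.0 * np.exp(-((times - 250.0) / 10.0) ** 2))

n170_amp, n170_lat = peak_in_window(erp, times, 150, 200)             # N170 peak
pos_amp, pos_lat = peak_in_window(erp, times, 220, 280, polarity=+1)  # pre-N300 positivity
n300 = mean_in_window(erp, times, 230, 360)                           # N300 mean amplitude
```

At a sampling rate of 200 Hz, latencies obtained this way are quantized in steps of 5 ms.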

3.2.6 LORETA-Analysis

LORETA (Pascual-Marqui et al., 1999; Pascual-Marqui, Michel, & Lehmann, 1994) calculates the current density at each voxel in the gray matter and the hippocampus of a reference brain as a linear, weighted sum of the scalp electric potentials. Among all possible current density configurations throughout the brain volume, LORETA chooses the smoothest one. This procedure only assumes that neighboring voxels should have maximally similar activity; no other constraints are used. LORETA images represent the electrical activity at each voxel as the squared magnitude of the computed current density.
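The smoothness principle behind LORETA can be illustrated with a toy linear inverse. The sketch below is not the LORETA algorithm: it uses a random, hypothetical lead field, a one-dimensional second-difference operator in place of the three-dimensional Laplacian on a voxel grid, and a Tikhonov-regularized least-squares fit, and it omits the lead-field weighting of the published method. It does, however, show the two properties stated above: the estimate is a linear, weighted combination of the scalp potentials, and the penalty favors current distributions in which neighboring voxels have similar activity.

```python
import numpy as np


def second_difference(n):
    """Discrete Laplacian on a line of n voxels (toy stand-in for a 3D Laplacian)."""
    return -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)


def smooth_inverse(lead_field, lam=1.0):
    """Linear inverse operator T such that J = T @ V.

    Minimizes ||V - K J||^2 + lam * ||L J||^2, which gives
    T = (K^T K + lam L^T L)^-1 K^T.
    """
    K = np.asarray(lead_field, dtype=float)      # n_electrodes x n_voxels
    L = second_difference(K.shape[1])
    A = K.T @ K + lam * (L.T @ L)
    return np.linalg.solve(A, K.T)               # n_voxels x n_electrodes


rng = np.random.default_rng(0)
lam = 1.0
K = rng.normal(size=(30, 60))     # hypothetical lead field: 30 electrodes, 60 voxels
V = rng.normal(size=30)           # one scalp topography (e.g. N170 amplitudes)
T = smooth_inverse(K, lam)
J = T @ V                         # current density estimate per voxel
power = J ** 2                    # squared magnitude, as displayed in LORETA images
```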

Amplitudes of the N170 component and the mean activity between 230 and 360 ms (N300) were exported as ASCII data for LORETA analysis. For each subject and each condition, one LORETA image was generated. These images were converted (http://www.ihb.spb.ru/~pet_lab/L2S/L2SMain.htm) for further analysis with SPM99 (http://www.fil.ion.ucl.ac.uk/spm/). In SPM99, a PET/SPECT design with a two-sample t-test was performed. The following parameters were used: global normalisation with proportional scaling to a global mean of 50, absolute threshold masking with the analysis threshold set to 0, and global calculation as the mean voxel value (within each image). The level of significance was set to p < 0.03. For anatomical labelling, foci of significant differences were transformed into Talairach space (Talairach & Tournoux, 1988) using the algorithm suggested by Brett (http://www.mrc-cbu.cam.ac.uk/Imaging/mnispace.html).
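The SPM99 steps listed above can be sketched in NumPy as proportional scaling of each image to a global mean of 50, masking at an absolute threshold of 0, and a voxelwise pooled-variance two-sample t statistic. This is an illustration under simplifying assumptions, not SPM's implementation: the global mean is taken here as the plain mean over all voxels (SPM's own global estimate may be computed differently), each image is flattened to one row, and the data are synthetic.

```python
import numpy as np


def proportional_scale(images, target=50.0):
    """Scale each image (one row per image) so that its mean voxel value equals target."""
    images = np.asarray(images, dtype=float)           # n_images x n_voxels
    return images * (target / images.mean(axis=1, keepdims=True))


def two_sample_t(a, b):
    """Voxelwise pooled-variance two-sample t statistic (mean of a minus mean of b)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.shape[0], b.shape[0]
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))


rng = np.random.default_rng(0)
bm = rng.gamma(2.0, 1.0, size=(15, 1000))     # hypothetical images, 1000 voxels each
scr = rng.gamma(2.0, 1.0, size=(15, 1000))
bm_s, scr_s = proportional_scale(bm), proportional_scale(scr)
mask = np.all(np.vstack([bm_s, scr_s]) > 0.0, axis=0)   # absolute threshold of 0
t = two_sample_t(bm_s[:, mask], scr_s[:, mask])          # df = 15 + 15 - 2 = 28
```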



3.3 Results

3.3.1 Behavioral performance

Identification accuracy exceeded 95% in all three stimulus conditions. Response times (RTs) in the upright BM condition were significantly shorter than in the other conditions (F(2,20) = 27.66; p < 0.001). RTs in the inverted BM condition and in the scrambled motion condition did not differ significantly from each other, as tested with Bonferroni-adjusted post hoc comparisons (Fig. 3.2). The RT analysis is based on 11 of the 15 subjects because of missing data in four subjects.

Fig. 3.2. Reaction times for correct responses in the three experimental conditions. Error bars indicate SEM.
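For a single within-subject factor with k levels measured in n participants, the repeated-measures ANOVA has df = (k - 1, (k - 1)(n - 1)). With k = 3, this gives (2, 20) for the 11 participants in the RT analysis and (2, 28) for the 15 participants in the ERP analyses, matching the reported F statistics. The Python sketch below computes the uncorrected F ratio; the Greenhouse-Geisser adjustment used in the study is not included, and the toy data are invented for illustration only.

```python
import numpy as np


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    data: array (n_subjects, k_conditions), e.g. mean RT per subject and
    condition. Returns (F, df_effect, df_error).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_subj = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj      # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df_cond) / (ss_error / df_error)
    return F, df_cond, df_error


# Toy data (not the study's RTs): 3 subjects x 3 conditions.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 6]]
print(rm_anova_oneway(toy))   # F is 37 (up to floating-point rounding), df = (2, 4)
```

Removing between-subject variance (ss_subj) from the error term is what distinguishes this test from a between-groups ANOVA: adding a constant to all of one subject's scores leaves F unchanged.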

3.3.2 ERP effects

Fig. 3.3A shows grand-averaged ERPs in response to the three experimental conditions. In the latency window up to 400 ms after stimulus onset, two distinct components emerged in all experimental conditions. The first negative component peaked on average at a latency of 183 ms. In accordance with previous studies (Puce & Perrett, 2003; Puce et al., 2000; Puce et al., 2003; Wheaton et al., 2001), this component was termed N170.



The later negative component occurred in the time window between 230 and 360 ms after stimulus onset. We refer to this second negative component as N300. Fig. 3.3B shows the difference waveforms of BM minus scrambled motion and of inverted BM minus scrambled motion, illustrating the amplitude differences of BM and inverted BM relative to scrambled motion. These difference waveforms reach their largest amplitudes earlier than the N170 peak and the N300 peak, respectively.

3.3.2.1 N170 amplitude and latency

N170 amplitude was assessed individually as the peak amplitude within the 150-200 ms latency window. The ANOVA yielded significant main effects of stimulus condition (F(2,28) = 9.97; p = 0.001) and electrode location (F(2,28) = 4.42; p = 0.044) as well as a significant interaction between these factors (F(2,28) = 3.31; p = 0.018). Bonferroni-adjusted post hoc tests revealed significant differences between the BM and scrambled motion conditions (p = 0.003) and between the BM and inverted BM conditions (p = 0.029). The main effect of electrode location is due to generally smaller peak amplitudes at parieto-temporal electrodes than at the other sites. The interaction between stimulus condition and electrode location reflects larger peak amplitude differences between upright BM and the other conditions at posterior than at anterior electrodes. Analysis of N170 peak latencies (Table 3.1) did not yield significant differences between the three conditions (F(2,28) = 0.32; p = 0.969).



Fig. 3.3. A) Grand-averaged ERPs recorded at lateral posterior electrodes (left hemisphere: O1, PO7, P7 and TP7; right hemisphere: O2, PO8, P8 and TP8) in response to BM (solid lines), inverted BM (dotted lines) and scrambled motion (dashed lines). Arrows indicate mean peak latency of the N170 and the N300 component. B) Difference waveforms obtained by subtracting ERPs to scrambled motion from ERPs to BM (solid line) and by subtracting ERPs to scrambled motion from ERPs to inverted BM (dotted line).



Table 3.1. Peak latencies for the N170 component and the positive peak preceding the N300 component (means and SEM in ms).

                                        BM            IBM           SCR
Mean N170 peak latency                  184.3 (2.7)   181.6 (3.1)   182.9 (3.1)
Mean positive peak latency pre N300     246.1 (3.4)   247.7 (4.9)   257.9 (3.0)

3.3.2.2 N300 amplitude and latency

The later negative component (N300) was quantified as the mean amplitude within the 230-360 ms latency window. The N300 component had a larger amplitude for upright and inverted walkers than for scrambled motion (F(2,28) = 14.90; p < 0.001). Bonferroni-adjusted post hoc tests revealed significant differences between the BM and scrambled motion conditions (p = 0.001) and between the inverted BM and scrambled motion conditions (p = 0.003). In addition, there was a significant interaction between stimulus condition and side of recording (F(2,28) = 4.21; p < 0.027). This interaction is due to larger differences in mean amplitude between BM and scrambled motion at right temporo-parietal and parietal electrodes, suggesting a pronounced right-hemispheric advantage in the visual processing of BM, particularly at later processing stages.

In addition to the amplitude differences, the latency of the positive peak preceding the N300 component differed significantly between conditions (Table 3.1; F(2,28) = 4.70; p < 0.024). The peak was delayed in the scrambled motion condition, whereas the latencies for BM and inverted BM did not differ significantly. Moreover, stimulus condition had a significant effect on the amplitude of this positive peak (F(2,28) = 18.76; p < 0.001), with a pronounced positivity for scrambled motion. Amplitude differences between BM and inverted BM were not significant.



3.3.2.3 Source analysis

Source analysis was performed for two reasons: to take into account the data from all electrodes during the two time windows of interest, and to obtain an approximation of the sources of the ERP components. Results of the LORETA analysis are listed in Table 3.2. It should be noted that the spatial resolution of this analysis is lower than that of fMRI. Source analysis was performed separately for the N170 and N300 components.

For the N170 component, the contrasts between upright BM and scrambled motion and between upright BM and inverted BM were calculated, since the ERP amplitudes of these conditions differed significantly from each other. The contrast between upright BM and scrambled motion revealed sources in the posterior cingulate gyrus and in the left lingual gyrus. The contrast between upright BM and inverted BM yielded three distinct sources: one in the posterior cingulate gyrus, one in the area of the subcallosal gyrus/precuneus and another in the right middle occipital gyrus. Fig. 3.4 schematically illustrates the location of the N170 sources for both contrasts.



Table 3.2. Results of the LORETA analysis for the N170 and N300 contrasts, showing Talairach coordinates, probable Brodmann areas (BA) within a range of 3 mm, and levels of significance.

Region                                          BA          x     y     z       p
N170 BM vs SCR
  Lingual gyrus                                 10        -24    79    -7   0.016
  Posterior cingulate gyrus                     23, 30     -3   -44    22   0.024
N170 BM vs IBM
  Subcallosal gyrus, precuneus                            -16   -64    23   0.014
  Posterior cingulate gyrus                     30         -3   -51    16   0.014
  Middle occipital gyrus                        19         46   -78    11   0.024
N300 BM vs SCR
  Fusiform gyrus, cerebellum                    6          32   -45   -15   0.001
  Subcallosal gyrus, anterior cingulate gyrus   6          -3     9   -11   0.009
  Medial frontal gyrus, rectal gyrus            25, 11    -10    16   -19   0.015
  Superior temporal gyrus                                  52   -37    16   0.011
N300 IBM vs SCR
  Inferior frontal gyrus, middle frontal gyrus  47, 11     25    29   -12   0.004
  Medial frontal gyrus, rectal gyrus            25, 11     -3    16   -18   0.013
  Anterior cingulate gyrus                      25         -3    16    -6   0.013
  Superior temporal gyrus                                  52   -37     9   0.027
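The Talairach coordinates in Table 3.2 were obtained with Brett's MNI-to-Talairach transform. As it is usually described, this transform combines axis-specific zooms with a rotation of 0.05 rad about the left-right (x) axis, using different vertical zooms for points above (z ≥ 0) and below (z < 0) the origin. The NumPy sketch below follows that description; it is an illustration rather than the code used in the study, and it should be checked against Brett's published mni2tal routine before reuse. The example peaks are hypothetical MNI coordinates, not the values in Table 3.2 (which are already in Talairach space).

```python
import numpy as np


def _rot_x(theta):
    """Rotation about the x (left-right) axis, in the matrix convention used by SPM."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])


# Zoom followed by rotation; separate vertical zoom above and below z = 0.
UPPER = _rot_x(0.05) @ np.diag([0.99, 0.97, 0.92])   # used for MNI z >= 0
LOWER = _rot_x(0.05) @ np.diag([0.99, 0.97, 0.84])   # used for MNI z < 0


def mni2tal(xyz):
    """Approximate conversion of MNI coordinates (mm, one point per row) to Talairach (mm)."""
    xyz = np.atleast_2d(np.asarray(xyz, dtype=float))
    above = (xyz[:, 2] >= 0)[:, None]
    return np.where(above, xyz @ UPPER.T, xyz @ LOWER.T)


peaks_mni = np.array([[10.0, -50.0, 20.0], [30.0, -40.0, -20.0]])   # hypothetical peaks
peaks_tal = mni2tal(peaks_mni)
```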



Fig. 3.4: LORETA analysis: group comparison of absolute current density values between the upright BM condition and the scrambled motion condition (BM-SCR) and between the upright BM condition and the inverted BM condition (BM-IBM) for the N170 peak. Three percent p-value threshold.

For the N300 component, the contrasts between upright BM and scrambled motion and between inverted BM and scrambled motion were calculated. The largest contrast between upright BM and scrambled motion emerged in the right fusiform gyrus. In addition, a source within the right superior temporal gyrus was estimated. Further sources were observed in the orbitofrontal cortex (subcallosal gyrus and anterior cingulate gyrus; rectal gyrus and medial frontal gyrus). The contrast between inverted BM and scrambled motion yielded four distinct sources: one in the right superior temporal gyrus, two in orbitofrontal brain areas (rectal gyrus and medial frontal gyrus; anterior cingulate gyrus) and one in the inferior frontal gyrus. Locations of the N300 sources for both contrasts are illustrated in Fig. 3.5.



Fig. 3.5: LORETA analysis: group comparison of absolute current density values between the upright BM condition and the scrambled motion condition (BM-SCR) and between the inverted BM condition and the scrambled motion condition (IBM-SCR) in the time range 230-360 ms after stimulus onset (N300). Three percent p-value threshold.

LORETA analysis yielded evidence for sources specific to BM in the right fusiform gyrus and the right superior temporal gyrus for the second component, whereas sources specific to BM for the early component were located in areas associated with attentional aspects of visual processing (posterior cingulate cortex). Additional sources generating the second component were located in orbitofrontal brain areas (anterior cingulate gyrus, medial frontal gyrus).

3.4 Discussion

The current results provide clear evidence for the involvement of two distinct processing stages in the visual analysis of BM: an early negative component (N170) peaking at approximately 180 ms after stimulus onset, and a later negative component (N300) in the time window between 230 and 360 ms after stimulus onset. The N170 component is modulated differently by upright BM than by inverted BM and scrambled motion. The difference between upright and inverted BM reflects an inversion effect in BM perception. The amplitude of the N300 component did not differ significantly between inverted and upright BM stimuli, but was less pronounced for scrambled motion, indicating similar processing of upright and inverted BM at later processing stages. Whereas the sources generating the N170 component are mainly located in posterior areas near the midline, the later N300 component is generated by sources in the superior temporal gyrus and the fusiform gyrus of the right hemisphere.

In our experimental setup, stimulus onset and motion onset coincided. The ERPs are therefore evoked by both stimulus onset and motion onset, and both processes may contribute to the recorded neural responses. The neural basis of motion perception has been studied psychophysiologically using motion-onset VEPs (Bach & Ullrich, 1994, 1997; Hoffmann, Unsold, & Bach, 2001). Visual motion onset evokes VEPs at two major sites: occipital/occipito-temporal sites and the vertex. At occipital sites, motion onset per se evokes ERP components that are dominated by a pronounced negativity (N2) around 150-200 ms, preceded by a minor positivity (P1) around 100-130 ms. The negative component reflects motion-specific mechanisms, as shown by its susceptibility to motion adaptation, whereas the positive component is more likely associated with form-processing mechanisms. In the visual perception of faces or man-made objects such as houses, stimulus onset elicits a negative component at a latency of around 170 ms (Eimer, 2000b). Taking these factors into consideration, the first negative ERP component (N170) obtained in the present experiment may reflect the contribution of both processes. Since both processes are inherent features of BM, the relative contribution of each to the ERPs is difficult to estimate.

BM processing occurs with very short latencies. The visual processing needed to perform the highly demanding task of discriminating upright BM patterns from scrambled motion and inverted BM can be achieved within 180 ms. For static images of natural scenes, the processing required to decide whether a scene contains an animal can even be completed within 150 ms, as measured by ERP modulation (Thorpe, Fize, & Marlot, 1996). This processing speed cannot be improved by extensive training (Fabre-Thorpe, Delorme, Marlot, & Thorpe, 2001) and seems to be limited by the underlying neural mechanism. Unlike those studies, which used static images of animate versus inanimate stimuli, the present study is concerned with motion. Given a monitor frame rate of 60 Hz, the visual system needs at least 2/60 s ≈ 33 ms to integrate two frames, which is a prerequisite for generating a coherent percept of BM. Given the short latencies obtained in the present study, the processing of BM is likely to be based on highly automated feed-forward mechanisms.

Nevertheless, we cannot completely rule out that static cues played a minor role in the discrimination task. The three classes of stimuli could be discriminated on the basis of static versions of the displays, since the spatial arrangement of the dots differs slightly between upright, inverted and scrambled stimuli. Such a discrimination of static displays would be based purely on geometrical cues and is clearly different from a discrimination based on the percept of a walking human figure, which is only evoked by animated displays of BM. Moreover, the sources of the second ERP component were located in brain areas that have been reported to be selectively involved in BM perception in several imaging studies (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001), but have not been reported to be activated by static displays consisting of a few dots that match joint positions.

The ERP effects are in accordance with psychophysical findings showing a strong inversion effect in the perception of BM (Bertenthal et al., 1987; Dittrich, 1993; Dittrich, Troscianko, Lea, & Morgan, 1996; Mitkin & Pavlova, 1990; Pavlova & Sokolov, 2000; Sumi, 1984; Troje, 2003) as well as in face perception (e.g., Thompson, 1980; Troje, 2003; Valentine, 1988). As expected, the ERP effects are mirrored by the behavioral data, which show shorter response times for the detection of upright walkers than for inverted walkers and scrambled motion. Given the similarly long response times for inverted BM and scrambled motion, similar modulations of the early ERP component could be expected for these two conditions. The pronounced effect for upright BM therefore probably reflects a pop-out effect of BM in its familiar orientation, which leads to shorter reaction times for the detection of this type of visual information. This pop-out effect might be associated with the global recognition of a human person depicted as a point-light display, as previously described (Bertenthal & Pinto, 1994).

A pop-out phenomenon resulting from perceptual experience requires the involvement of high-level areas, since complex visual stimuli such as point-light displays of BM must be integrated by neural populations with large receptive fields. Given the short latency of the N170 component, this integration can only be achieved by a feedforward mechanism. This interpretation is in accordance with the reverse hierarchy theory of Hochstein and Ahissar (2002), which suggests that “vision at a glance” corresponds to a high-level, generalized, categorical scene interpretation achieved by neural processing along the feedforward hierarchy of areas, leading to increasingly complex representations. High-level spread attention associated with “vision at a glance” subserves the initial, crude global percept of the gist of a scene; the pop-out effect is only one aspect of this crude initial assessment. In contrast, in later “vision with scrutiny”, reverse hierarchy routines focus attention on specific units, incorporating the detailed information available there into conscious perception. Fine discrimination therefore depends on re-entry to low-level, specific receptive fields to bind features.

In the N170 component, the sources generating both the contrast between BM and inverted BM and the contrast between BM and scrambled motion are located in the posterior cingulate cortex. This area appears to have several attention-related functions. It has been suggested that the cingulate region may establish a neural interface between attention and motivation (Small et al., 2003). In addition, activity in the posterior cingulate cortex was correlated with the speed of detecting a visual target when it was preceded by a predictive cue (Mesulam, Nobre, Kim, Parrish, & Gitelman, 2001). We assume that activity in the posterior cingulate cortex might reflect high-level spread attention subserving the neural processing that leads to the global percept. The finding of sources generating the contrast between BM and the other conditions in attention-related areas also fits well with recent behavioral data suggesting that attention is required for the visual analysis of point-light displays (Cavanagh et al., 2001; Thornton et al., 2002). Cavanagh and colleagues (2001) showed that the discrimination of specific features of point-light displays of BM appears to be a serial process, since reaction times increased with the number of items. This increase was attributed to the increasing attentional demands of the task. Results from a dual-task paradigm exploring the role of attention in the processing of BM (Thornton et al., 2002) suggested that, in some cases, the perception of BM can be automatic, but that attentional demands play a vital role when strategies operating in a global, top-down fashion are required.

The N300 component might reflect processes associated with “vision with scrutiny” according to the reverse hierarchy theory (Hochstein & Ahissar, 2002) and might be responsible for the fine analysis of BM patterns that is necessary to retrieve visual information of social and psychological relevance. This view agrees with the source localization for the contrast between BM and scrambled motion in the second component: these sources are located in the right superior temporal gyrus and the right fusiform gyrus. For the contrast between inverted BM and scrambled motion, only one source was found in these temporal regions, located in the superior temporal gyrus, and no source emerged in the fusiform gyrus. In several brain imaging studies, the STS complex as well as the fusiform face area were shown to be involved in the perception of BM (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001). The present results from ERPs and source analysis are in accordance with the findings of these brain imaging studies.

In addition to the superior temporal gyrus (STG) and the fusiform gyrus (FFG), we also found sources of the second component within the anterior cingulate cortex and the orbital prefrontal cortex. Several studies have shown that the anterior cingulate cortex is involved in response selection (Bunge, Hazeltine, Scanlon, Rosen, & Gabrieli, 2002), performance monitoring and conflict evaluation (Carter et al., 1998), and the regulation of attention (Posner & DiGirolamo, 1998). The anterior cingulate source may thus be more closely related to cognitive processes associated with the task requirements (deciding between two response alternatives) than to the perceptual analysis of the various stimulus classes. A number of studies (Adolphs, 2001) have provided evidence that the orbitofrontal cortex plays a crucial role in social cognition and is strongly interconnected with the STS complex as well as with the fusiform gyrus, both of which are engaged in social perception. In the present study, the sources in the orbitofrontal cortex might also be related to these neural mechanisms.

The ERP correlate of the inversion effect in BM perception is opposite to the inversion effects obtained in face recognition paradigms, in which inverted faces elicit a larger N170 negativity than upright faces (Rossion & Gauthier, 2002). When upright faces are compared with control stimuli (such as houses), faces elicit a more pronounced N170 (Eimer, 2000a, 2000b). In the present study, we used scrambled motion as the control stimulus. There were no significant differences in the early component between scrambled and inverted BM, indicating that these two stimulus types are processed similarly during initial processing stages. In addition, the mean peak latency of the first component was approximately 183 ms. This latency is longer than the peak latencies usually reported for face perception (Bentin, Allison, Puce, Perez, & McCarthy, 1996). This difference may be a consequence of an extended integration time for the detection of form conveyed by motion, which is not required for the perception of static images.

The finding of hemispheric asymmetries in the STG and FFG during later processing of BM representing human gait patterns is also consistent with evidence from fMRI studies (Grezes et al., 2001; Grossman et al., 2000) reporting pronounced right-hemispheric activity associated with the perception of BM. This asymmetry may be related to the processing of information useful for recognizing individual human characteristics. Point-light walkers convey such information, which provides important cues for social interaction.

Our results are in accordance with the ERP findings of Hirai and colleagues (2003), who reported peaks at around 200 and 240 ms after stimulus onset that were larger in response to BM than to scrambled motion. In contrast to our study, they used only two experimental conditions (BM versus scrambled motion) and did not include a source analysis. Their displays were presented in profile view and were generated by a computer algorithm. There are, however, differences concerning the latency of the first component, which was on average shorter in our paradigm than in the study by Hirai and colleagues (2003). Our study extends this work by demonstrating inversion effects in early visual processing, reflecting a pop-out effect for upright walkers, but not in later processing stages associated with the fine analysis of BM. In addition, our findings offer some evidence regarding the brain areas associated with the two components.

A recently published study (Pavlova et al., 2004) analyzed gamma-band MEG activity in response to BM generated by a computer algorithm. Both recognizable upright and non-recognizable inverted walkers evoked enhancements in oscillatory gamma activity (25-30 Hz) over the left occipital cortex as early as 100 ms after stimulus onset. Upright BM elicited further gamma responses over the parietal (130 ms) and right temporal (170 ms) lobes. Whereas the temporal order and approximate localization of the brain areas showing synchronized activity in the gamma band are in accordance with our findings, the absolute timing of the gamma activity differs from the timing of the ERP components found in our study. This difference might be related to differences in recording techniques (MEG versus EEG) and analysis methods (frequency analysis versus ERP).

Taken together, our findings suggest two distinct components of BM processing that recruit different neuronal populations. The first processing stage (N170) reflects the generation of a global percept of the visual scene, leading to a pop-out effect for upright BM. This sensitivity to upright BM probably results from the familiarity of BM in its normal orientation. At the second processing stage (N300), brain areas such as the STG and FFG, which are known from fMRI studies to be involved in the fine and detailed perceptual analysis of BM, play an important role. The evidence for fast, efficient processing underlines the importance of BM perception and provides further evidence for a specific neural network involved in processing biologically relevant motion signals. The right-hemispheric dominance associated with BM perception shows clear parallels to asymmetries in face perception and probably reflects the social relevance of animate motion perception. Furthermore, the STG and FFG are primarily involved in the second ERP component (N300) and show clear hemispheric asymmetries in the perceptual analysis of BM only during later processing stages.

Chapter IV Viewpoint-dependent Recognition of Biological Motion


IV Study 3: Self Recognition versus

Recognition of Others by Biological

Motion: Viewpoint-dependent Effects

Daniel Jokisch, Irene Daum and Nikolaus F. Troje

Summary

In the present study, we investigated the influence of viewing angle on the recognition of walking patterns of one’s own person and of familiar individuals such as friends or colleagues. Viewpoint-dependent recognition performance was tested in two groups of twelve persons who knew each other very well. Participants’ motion data were acquired by recording their walking patterns in three-dimensional space using a motion capture system. Size-normalized point-light displays of these walking patterns, including one’s own, were presented to the same group members on a computer screen in frontal view, half profile view and profile view. Observers were asked to assign the correct name to each gait pattern presented and did not receive feedback.

Whereas recognition of one’s own walking pattern was viewpoint-independent, the recognition rate for other familiar individuals was higher for frontal and half profile views than for profile views. Viewpoint-dependent recognition of other people might be due to selective attention to approaching people, leading to preferential exposure to frontal and half profile views of gait patterns. The viewpoint-independent representation of one’s own movement patterns might be related to a cross-modal transfer from motor to visual representations.



4.1 Introduction

Animate motion patterns are among the most biologically salient events. Humans can efficiently detect another living being in a visual scene and retrieve many features of psychological, biological and social relevance. The ability to identify, interpret and predict the actions of others is essential for successful social interaction. Visualizing the positions of the main joints of a walking human body by bright dots against a dark background (Johansson, 1973) conveys information from biological motion (BM) with reduced interference from non-dynamic cues. From such point-light displays, observers can easily recognize a human walker within 200 ms (Johansson, 1976), determine his/her gender (Barclay et al., 1978; Cutting, 1978; Kozlowski & Cutting, 1977; Mather & Murdoch, 1994; Troje, 2002a) and recognize various action patterns (Dittrich, 1993). Dynamic cues from walking patterns also contain sufficient information to recognize identity if observers are familiar with the persons presented (Cutting & Kozlowski, 1977), and even to recognize oneself from a recorded point-light display of one’s own movements (Beardsworth & Buckner, 1981). Recognition performance in the latter studies was significantly above chance level, but BM failed to provide a cue to identity as reliable as facial or vocal information. When observers were presented with the gait patterns of six familiar persons, correct identification of other persons ranged between 35% and 40%, whereas the recognition rate for one’s own gait pattern was 60%.

Perceptual analysis of BM is performed by a specific neuronal network (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001) which involves both the dorsal motion pathway and the ventral form pathway (Giese & Poggio, 2003). Data from both pathways are integrated in a region around the superior temporal sulcus. Whereas there exists a large body of literature on the impressive ability of the visual system to derive a coherent percept of a human body from a small number of moving dots, the principles underlying information encoding and retrieval are not yet fully understood.

One important aspect is the viewpoint from which a walker is seen and its influence on our ability to extract information from BM. Knowledge about viewpoint-dependent recognition effects may provide insight into the mental representations and perceptual mechanisms of BM processing. There is an ongoing debate whether visual representations of objects are viewpoint-dependent (Bulthoff & Edelman, 1992; Tarr & Bulthoff, 1995) or viewpoint-invariant (Biederman & Gerhardstein, 1993, 1995). Viewpoint-invariance means that object recognition is independent of the viewpoint from which the object was previously seen, whereas viewpoint-dependence implies better recognition when the object is presented from a familiar perspective.

The Recognition-by-Components approach to viewpoint-invariance (Biederman & Gerhardstein, 1993) is restricted to inanimate objects that fulfill specific criteria (objects must be decomposable into viewpoint-invariant parts, so-called “geons”; the structural descriptions of different objects must be distinctive; structural descriptions must be identical across viewpoints). Other approaches (Bulthoff & Edelman, 1992) propose that viewpoint-dependence holds for all object classes, independent of specific features. To reconcile both theories, viewpoint-independent recognition may occur at a basic level, whereas viewpoint-dependence applies at a subordinate level. This view is supported by Foster and Gilson (2002), who showed that both image-based and structural representations can play a role, depending on the object class and the level of object specificity. Recognition processes based on localized features seem to be more viewpoint-dependent, with limited generalization (Bulthoff & Edelman, 1992). Consistent with the latter view, viewpoint-dependence has been shown for unfamiliar faces (Hill & Bruce, 1996; Hill, Schyns, & Akamatsu, 1997; Troje & Bulthoff, 1996) and for familiar faces, with slightly longer response times to profile views than to frontal views (Bruce, Valentine, & Baddeley, 1987). Observers are poorer at recognizing their own profile (an unfamiliar view of one’s own face) than their frontal view, whereas there is no difference in response time between frontal and profile views of the faces of highly familiar individuals (Troje & Kersten, 1999). Taken together, these results provide strong evidence for the viewpoint dependency of identity recognition.

In the domain of BM, there is so far only one study on viewpoint-dependent recognition of identity, which used an artificial learning paradigm (Troje et al., in press). This study yielded an overall advantage of frontal views compared to profile and half-profile views, and a change of viewpoint from training to test resulted in a performance decrease. Viewpoint dependence has also been investigated in the context of gender classification based on BM information (Mather & Murdoch, 1994; Troje, 2002a). As in the above-mentioned study, observers derived more information about the gender of a walker from the frontal view.

It is as yet unclear whether the results of Troje et al. (in press) also apply to ecological settings. In other words, little is known about the representation of the gait dynamics of familiar persons, such as colleagues and friends, with whom we interact in daily life. As we usually do not see our own gait patterns from a third-person view, it is also of interest whether the mental representation of our own gait pattern dissociates from the representation of another familiar person’s gait. This question is of special relevance, since evidence from neurophysiological and imaging studies suggests a common coding between perception and action (Blakemore & Decety, 2001; Decety & Grezes, 1999; Rizzolatti & Fadiga, in press; Rizzolatti et al., 2001), which may have distinctive implications for the neuronal representation of one’s own movement patterns.

The direct matching hypothesis (Rizzolatti et al., 2001) postulates that we understand actions by mapping the visual representation of an observed action onto the motor representation of the same action. According to this view, observation of an action induces resonance in the motor system of the observer. In the premotor cortex of monkeys, “mirror neurons” were found that discharge both when the monkey performs specific hand actions and when it observes another individual performing the same action (Gallese et al., 1996). There is evidence that a “mirror system” similar to that described in the monkey also exists in humans. In contrast to the monkey mirror system, the human analogue is more flexible, since it responds not only to goal-directed actions but also to intransitive, i.e., non-object-directed, actions. Evidence for such a flexible mirror system comes from studies applying transcranial magnetic stimulation (Fadiga et al., 1995; Gangitano et al., 2001), from MEG studies (Hari et al., 1998) and from functional brain imaging (Buccino et al., 2001; Iacoboni et al., 1999).

The present study addressed three issues. The first aim was to investigate the recognition of walking patterns of familiar individuals such as friends or colleagues, represented as point-light displays (PLD), within a larger sample of different gait patterns and with a more sophisticated presentation technique than in previous studies. The second aim was to elucidate viewpoint-dependent effects in the representation of gait dynamics by exploring the influence of viewing angle on recognition performance. The third aim was to examine differences between the representation of one’s own gait pattern and the representation of other persons’ gait patterns.

4.2 Methods

4.2.1 Stimuli

Two groups of 12 participants each served as models for acquiring the motion data. All of them were staff members at the Ruhr-University Bochum (10 females, 14 males; ages 21 to 42 years). Individuals in each group knew each other well from daily interaction in the working environment. Motion data were acquired by recording the participants’ walking patterns in three-dimensional space using a motion capture system equipped with nine CCD cameras (Oxford Metrics, Vicon 512). Participants were instructed to walk at a comfortable speed through the 7-m-long capture volume. A set of 41 retroreflective markers was attached to their bodies. The system tracks the positions of the markers with a spatial accuracy in the range of 1 mm and a temporal resolution of 120 Hz. From these 41 markers, the trajectories of 15 “virtual” markers positioned at major joints of the body were computed using commercially available software for biomechanical modeling (Bodybuilder, Oxford Metrics). Translational motion was subtracted and the data were normalized in size. Finally, the data were animated as point-light displays such that the walkers appeared to walk on a treadmill. By fitting a Fourier series to the data (Troje, 2002b), the displays could be looped continuously to allow a variable presentation time. The dots representing the major joints of the body were located at the ankles, the knees, the hips, the wrists, the elbows, the shoulders, the center of the pelvis, the sternum and the center of the head.
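The looping step can be illustrated with a short sketch. It fits a truncated Fourier series to one coordinate of one joint, sampled uniformly over exactly one gait cycle, and evaluates it at an arbitrary phase; because the series is periodic, phases beyond one cycle wrap around seamlessly. This is a minimal sketch of the general idea under stated assumptions (a pre-segmented single cycle, uniform sampling), not a reconstruction of the exact procedure of Troje (2002b); the function names are hypothetical.

```python
import math


def fit_fourier(samples, n_harmonics):
    """Fit a truncated Fourier series to one coordinate over one gait cycle.

    Assumes `samples` covers exactly one cycle at uniform time steps. Under
    that assumption, the discrete Fourier coefficients below are also the
    least-squares fit, provided n_harmonics < len(samples) / 2.
    Returns (a0, [(a1, b1), (a2, b2), ...]).
    """
    n = len(samples)
    if not 0 <= n_harmonics < n / 2:
        raise ValueError("need fewer harmonics than half the number of samples")
    a0 = sum(samples) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        b_k = 2.0 / n * sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        coeffs.append((a_k, b_k))
    return a0, coeffs


def evaluate(a0, coeffs, phase):
    """Evaluate the series at `phase`, measured in gait cycles (1.0 = one cycle).

    The series is periodic, so phase 2.3 gives the same value as phase 0.3;
    this is what allows a display to loop for an arbitrary duration.
    """
    return a0 + sum(
        a * math.cos(2 * math.pi * k * phase) + b * math.sin(2 * math.pi * k * phase)
        for k, (a, b) in enumerate(coeffs, start=1)
    )


# Usage: a synthetic joint coordinate sampled 120 times over one cycle
n = 120
samples = [0.1 + 0.3 * math.cos(2 * math.pi * i / n) + 0.05 * math.sin(4 * math.pi * i / n)
           for i in range(n)]
a0, coeffs = fit_fourier(samples, n_harmonics=2)
# At display time t (seconds) and gait frequency f (cycles/s), the phase is t * f.
value = evaluate(a0, coeffs, phase=2.3)
```

Because the fitted curve can be sampled at any phase, this kind of representation also decouples the 120 Hz capture rate from the refresh rate of the display.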

The displays were presented in frontal view (FV, 0°), half profile view (HV, 30°) and

profile view (PV, 90°) as white dots on a black computer screen (Fig. 4.1). The walkers

subtended 6.4 deg of visual angle at the viewing distance of 90 cm. They were



computed in real time on a frame-by-frame basis and synchronized with the 60 Hz

refresh rate of the 19” CRT monitor to ensure smooth, regular motion. Stimuli were

presented using Matlab with the Psychophysics Toolbox extensions (Brainard, 1997;

Pelli, 1997).
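The three viewing conditions and the stimulus size can be made concrete with a small sketch. It assumes, hypothetically, that the views were generated by rotating the 3-D joint positions about the vertical axis and projecting them orthographically, and that the 6.4 deg refers to the walker’s height; the text specifies neither detail, and the coordinate convention in the code is likewise an assumption.

```python
import math

# Azimuths of the three viewing conditions, as given in the text
VIEWS_DEG = {"FV": 0.0, "HV": 30.0, "PV": 90.0}


def project_view(points_3d, azimuth_deg):
    """Rotate joints about the vertical axis and project them orthographically.

    Assumed convention (hypothetical): x = walker's left-right axis,
    y = vertical axis, z = walking direction, pointing toward the viewer at 0 deg.
    Returns 2-D screen coordinates (horizontal, vertical).
    """
    a = math.radians(azimuth_deg)
    # Horizontal screen coordinate after rotation; the depth coordinate is discarded.
    return [(x * math.cos(a) + z * math.sin(a), y) for x, y, z in points_3d]


def screen_extent_cm(visual_angle_deg, distance_cm):
    """Linear size on the screen that subtends `visual_angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(visual_angle_deg) / 2.0)


# A walker subtending 6.4 deg at 90 cm is about 10.1 cm tall on the screen.
height_cm = screen_extent_cm(6.4, 90.0)
```

At 0 deg the horizontal screen axis shows the walker’s left-right dimension; at 90 deg it shows the walking direction, which is why the profile view reveals stride length but hides lateral sway.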

Fig. 4.1. Categories of stimuli: Point-light display in frontal view (FV), half-profile view (HV) and profile view (PV).

4.2.2 Participants

Twenty of the 24 individuals who supplied the motion data participated as observers in the experiment (9 females, 11 males; ages 21 to 42 years), which was undertaken with the understanding and written consent of each participant. All observers had worked in one of the two laboratories for at least six weeks, saw each other daily and knew each other well by name.



4.2.3 Procedure

Before the experiment started, the procedure was explained in detail to the observers, and they were shown a list of the names of all people to be presented, including their own. Three blocks of 36 trials (12 gait patterns × 3 orientations) were presented in randomized order. Consecutive blocks were separated by a short break. Stimulus presentation time was not limited.

Each display remained on the screen until the observer indicated, by pressing a response button, that he or she had recognized the gait pattern. A list containing name buttons for all persons presented then appeared on the screen, and observers were asked to indicate the name of the person by button press. Then the next trial started. Observers did not receive feedback about their responses.

4.2.4 Data analysis

Overall recognition performance was analyzed using a one-way repeated-measures ANOVA to determine the effect of viewing angle on the percentage of correct identifications. Only blocks 2 and 3 were included in the analysis; the first block served to familiarize the observers with the displays and to show them the whole range of gait patterns in the sample. Greenhouse-Geisser adjustments to the degrees of freedom were applied where appropriate. Because of the experimental design, self recognition and recognition of others are represented asymmetrically in this analysis: each observer saw one own gait pattern but eleven gait patterns of others. Recognition of one’s own gait pattern and recognition of the gait patterns of others were therefore analyzed separately in order to elucidate viewpoint-dependent effects for each. For recognition of others, a further ANOVA was conducted. By contrast, recognition of one’s own gait pattern was analyzed non-parametrically because the recognition rates were not normally distributed.
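The asymmetry between self recognition and recognition of others follows directly from the design. The sketch below counts the analyzed trials per observer and view, gives the chance level for choosing one of 12 names, and shows how a pooled score is dominated by the trials showing others. It assumes, as the 12 × 3 = 36 trials per block imply, that each walker appeared once per view in every block, and that all observers contributed equally; the constant and function names are illustrative.

```python
from fractions import Fraction

N_WALKERS = 12        # walkers per group, including the observer's own gait
N_VIEWS = 3           # FV, HV, PV
BLOCKS_ANALYZED = 2   # blocks 2 and 3; block 1 served for familiarization


def trials_per_view():
    """Analyzed trials per observer and view, as (self, others)."""
    return 1 * BLOCKS_ANALYZED, (N_WALKERS - 1) * BLOCKS_ANALYZED


def overall_rate(self_rate, other_rate):
    """Overall % correct for one view when all analyzed trials are pooled."""
    n_self, n_other = trials_per_view()
    return (n_self * self_rate + n_other * other_rate) / (n_self + n_other)


CHANCE = Fraction(1, N_WALKERS)  # probability of picking the right name by guessing

n_self, n_other = trials_per_view()        # 2 self trials vs. 22 trials of others per view
analyzed = N_VIEWS * (n_self + n_other)    # 72 of the 3 * 36 = 108 trials
```

Plugging in the per-view rates reported in the Results, for example 27.5% (self) and 28.6% (others) in frontal view, gives an overall rate of about 28.5%, and the same weighting yields about 26.9% and 19.4% for the half profile and profile views. Chance level is 1/12, i.e., about 8.3%.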

4.3 Results

Across all walkers, mean rates of correct identification were 28.5% for frontal view, 26.9% for half profile view and 19.4% for profile view. A repeated-measures ANOVA of overall recognition performance revealed a significant effect of viewing angle (F(2,38) = 7.10, p = 0.003). Bonferroni-adjusted post hoc tests showed that recognition performance was significantly better for frontal views (p = 0.012) and half profile views (p = 0.018) than for profile views (Fig. 4.2).

Fig. 4.2. Overall percentage of correct identification for frontal (FV), half-profile (HV) and profile view (PV). The dashed line indicates chance level.

Separate comparisons of recognition performance for one’s own gait and for the gait patterns of other persons indicated that only recognition of others’ gait patterns was viewpoint-dependent (F(2,38) = 7.57, p = 0.002). Rates of correct identification were 28.6% (FV), 26.6% (HV) and 18.4% (PV). Again, the percentage of correct identifications was significantly higher for frontal views (p = 0.008) and half profile views (p = 0.018) than for profile views.

By contrast, recognition performance for one’s own gait pattern was almost the same in all viewing conditions. Observers identified their own gait pattern correctly in 27.5% of the trials in the frontal view condition and in 30% of the trials in both the half profile and the profile view conditions (Fig. 4.3). Statistical analysis


revealed no significant effect of viewing angle on recognition rate (χ²(2) = 0.08, p = 0.960). The statistical power of the χ² test was analyzed post hoc with the software G-Power (Erdfelder, Faul, & Buchner, 1996), using the following parameters: effect size ω = 0.33, which indicates a medium effect size according to Cohen (1988) and corresponds to a population effect of 10%, and α = 0.05. For these parameters, the statistical power was 0.91; therefore, the null hypothesis was accepted.
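For a χ² statistic with 2 degrees of freedom, the survival function has the closed form p = exp(-χ²/2), so the reported p-value can be checked by hand: exp(-0.08/2) is about 0.961, consistent with the reported 0.960 given that the statistic itself is rounded. The sketch below also computes the power of a χ² test with df = 2 from Cohen’s effect size w and the number of observations n, using the standard noncentral χ² model with noncentrality λ = n·w². It is a minimal stdlib-only sketch restricted to df = 2; the number of observations entering the reported analysis is not stated in the text, so n is a free parameter here, not the value behind the reported power of 0.91.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a central chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi2_critical_df2(alpha):
    """Critical value for df = 2, obtained by inverting sf(x) = exp(-x/2)."""
    return -2.0 * math.log(alpha)


def power_chi2_df2(w, n, alpha=0.05, max_terms=500):
    """Power of a df = 2 chi-square test for effect size w and n observations.

    The noncentral chi-square with noncentrality lam = n * w**2 is written as a
    Poisson(lam / 2) mixture of central chi-squares with df = 2 + 2j.
    """
    crit = chi2_critical_df2(alpha)
    lam = n * w ** 2
    weight = math.exp(-lam / 2.0)   # Poisson probability of j = 0
    power = 0.0
    for j in range(max_terms):
        power += weight * chi2_sf_even(crit, 2 + 2 * j)
        weight *= (lam / 2.0) / (j + 1)
    return power


p_reported_stat = chi2_sf_even(0.08, 2)   # about 0.961
```

With w = 0 the function returns α itself, as it should; for a fixed w, power rises toward 1 as n grows.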

Fig. 4.3. Overall percentage of correct identification for frontal (FV), half-profile (HV) and profile view (PV), separated for self recognition and recognition of others. The number of trials differs between self recognition and recognition of others. The dashed line indicates chance level.

4.4 Discussion

The current results present further evidence that kinematic cues from BM provide

information about personal identity which can be transferred from real life experience

to reduced point-light displays of BM. Recognition performance in this study was

found to be three times higher than chance level. Nevertheless, information from BM


failed to provide a highly reliable cue for individual identification when the walking patterns of familiar persons had not been seen before as point-light displays. The process involved in recognizing identity from BM is clearly different from other processes that derive information about identity, such as face recognition. In comparison to earlier studies (Beardsworth & Buckner, 1981; Cutting & Kozlowski, 1977), absolute recognition rates in the current study were lower. However, these studies also used fewer walkers. If recognition performance is considered relative to chance level, the differences between the current study and the previous studies become marginal or even reverse. In the current study, recognition of others was 3.0 times higher and self recognition 3.51 times higher than chance level, as compared to a ratio of 2.27 in the study by Cutting and Kozlowski (1977), and ratios of 3.48 (self recognition) and 1.89 (recognition of others) in the study by Beardsworth and Buckner (1981). Moreover, walker size, which was not accounted for in the older studies, could not be used as a cue in the current study, since the stimuli were normalized with respect to the walker’s size.
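The comparison across studies rests on a simple normalization: with k alternatives, guessing yields 1/k correct, so the ratio to chance is the proportion correct multiplied by k. The sketch below applies this to the per-view rates reported in the Results (12 alternatives) and converts a ratio back into an absolute rate for a design with six walkers, the number mentioned in the Introduction for the earlier studies. It illustrates the arithmetic only; the helper names are hypothetical, and averaging the three views with equal weight is an assumption about how such ratios are formed.

```python
from statistics import mean


def ratio_to_chance(percent_correct, n_alternatives):
    """How many times above guessing (1 / n_alternatives) a score lies."""
    return percent_correct / 100.0 * n_alternatives


def implied_percent(ratio, n_alternatives):
    """Absolute % correct that corresponds to a given ratio to chance."""
    return 100.0 * ratio / n_alternatives


others_by_view = [28.6, 26.6, 18.4]   # FV, HV, PV: recognition of others
self_by_view = [27.5, 30.0, 30.0]     # FV, HV, PV: self recognition

others_ratio = ratio_to_chance(mean(others_by_view), 12)
self_ratio = ratio_to_chance(mean(self_by_view), 12)
# Converting a ratio of 2.27 back to an absolute rate with six walkers
six_walker_percent = implied_percent(2.27, 6)
```

Averaging the views this way gives roughly 2.9 for others and 3.5 for self, close to the ratios given above, and a ratio of 2.27 with six walkers corresponds to about 38% correct, within the 35% to 40% range cited in the Introduction.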

Concerning the role of viewpoint in recognizing the identity of other people, individual features of gait dynamics can be extracted more efficiently from frontal or half profile views. This result is in accordance with the findings of Troje and colleagues (in press). The viewpoint dependency might be due to attention being automatically drawn to approaching people, resulting in increased exposure to frontal and half profile views of gait patterns. This finding supports the hypothesis of a viewer-centered representation of BM information from other individuals. In contrast to the viewpoint-dependent recognition of familiar individuals, recognition of one’s own walking pattern was independent of the viewing angle.

As suggested by the direct matching hypothesis (Rizzolatti et al., 2001), which assumes a common coding between perception and action, a mechanism different from that involved in recognizing others might contribute to extracting information from one’s own movement pattern. This view is supported by the fact that it is quite unusual to watch one’s own movements from a third-person perspective. As a consequence, we have little experience with visual feedback from our own locomotion. Exceptions are rare situations such as walking towards a mirror while looking at one’s own mirror image, or watching video recordings of oneself walking. Nevertheless, recognition of one’s own gait in the frontal and half profile view conditions was at the same level as recognition of others. Moreover, recognition from the profile view was as good as from the other viewing angles and therefore exceeded the rate of correct identifications of other familiar persons. To recognize the movement patterns of familiar people, observers have to rely on stored representations of those persons’ gait kinematics and compare them with the kinematics provided by the point-light displays. By contrast, when individuals observe their own movement patterns, they refer to the motor representations associated with their own gait. Comparing motor representations with visual representations of movements requires the transfer of BM information from the motor or action system to the visual system, or vice versa.

Such motor representations of the metrics of one’s own movements are clearly stored in three-dimensional space. In order to compare the visual representations of a sample of different gait patterns, including one’s own, with the kinematics of one’s own movements, we assume that the three-dimensional motor representation of one’s own kinematics is aligned with any two-dimensional visual representation, independent of the viewpoint from which the gait pattern is seen. If there is an exact match between both representations, one’s own gait is successfully recognized. If there is no exact match between visual and motor representations, the identity of the walker has to be determined on the basis of the visual representations of the familiar gait patterns.
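The alignment step assumed here can be stated as a toy computation: project the 3-D self template at every candidate azimuth, keep the best-fitting projection, and accept the walker as oneself only if the remaining mismatch is small. The sketch below is a hypothetical formalization for illustration, not a model implemented in this study; the orthographic projection, the 1-degree search grid, the squared-error mismatch and the tolerance are all assumptions.

```python
import math


def project(points_3d, azimuth_deg):
    """Orthographic view after rotating about the vertical (y) axis."""
    a = math.radians(azimuth_deg)
    return [(x * math.cos(a) + z * math.sin(a), y) for x, y, z in points_3d]


def mismatch(view_2d, template_3d, azimuth_deg):
    """Sum of squared 2-D distances between observed dots and projected template.

    Points must be in corresponding order (same joints, same frames).
    """
    projected = project(template_3d, azimuth_deg)
    return sum((u - p) ** 2 + (v - q) ** 2
               for (u, v), (p, q) in zip(view_2d, projected))


def best_alignment(view_2d, template_3d, step_deg=1):
    """Azimuth (on a whole-degree grid) at which the template best explains the view."""
    best = min(range(0, 360, step_deg),
               key=lambda az: mismatch(view_2d, template_3d, az))
    return best, mismatch(view_2d, template_3d, best)


def recognized_as_self(view_2d, self_template_3d, tolerance):
    """Accept 'self' only if some viewpoint yields a near-exact match."""
    _, error = best_alignment(view_2d, self_template_3d)
    return error <= tolerance


# Toy example with three joints; the observed display is the self template seen at 30 deg.
self_template = [(0.2, 1.0, 0.1), (-0.2, 1.0, -0.1), (0.1, 0.5, 0.3)]
other_template = [(0.2, 1.2, 0.1), (-0.2, 1.2, -0.1), (0.1, 0.7, 0.3)]
observed = project(self_template, 30)
```

In this toy case the search recovers the 30 deg viewpoint with zero mismatch, so the display is accepted as "self" regardless of the angle at which it was shown, whereas the other template never fits closely because its vertical coordinates differ.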

Comparing the present findings on viewpoint dependency in BM perception with those from face perception (Troje & Kersten, 1999) reveals an important distinction. Whereas face perception shows a frontal-view advantage over the profile view for recognizing one’s own face, no such effect was observed for BM. For recognition of other familiar persons, the reverse pattern emerged: identification did not vary with viewing angle for faces, whereas there is a clear frontal-view advantage for BM. This dissociation again supports the assumption that information conveyed by the motor system contributes to the perception and recognition of one’s own movements. Nevertheless, given the substantial error rate, neither the visual information nor the information from motor representations is perfect.

In everyday life, information from BM is used for different purposes, such as estimating the smoothness and attractiveness of a potential partner’s movements or inferring a person’s emotions and personality traits from the way he or she moves. Deriving information from motor cognition in the context of self recognition, on the other hand, might depend on the precision of one’s own body schema or on the degree of experience with physical exercise.

In summary, we confirm earlier findings on person identification from BM. Even though error rates are rather high, performance is well above chance level. For the recognition of familiar persons, the viewing angle plays an important role: identity information can be extracted more reliably from frontal and half profile views. Finally, recognition of one’s own movements is independent of the viewing angle. We hypothesize that this reflects a cross-modal transfer between visual and motor representations, as proposed by the direct matching hypothesis.

Chapter V Biological Motion Perception in Cerebellar Patients


V Study 4: Differential Involvement of the Cerebellum in Biological and Coherent Motion Perception

Daniel Jokisch, Nikolaus F. Troje, Benno Koch¹, Michael Schwarz² and Irene Daum

Summary

Perception of biological motion (BM) is a fundamental property of the human visual system. It is as yet unclear what role the cerebellum plays in the perceptual analysis of BM represented as point-light displays. Imaging studies investigating BM perception have yielded inconsistent results concerning a cerebellar contribution. The present study aimed to explore the role of the cerebellum in BM perception by testing patients with circumscribed cerebellar lesions and comparing their performance with that of an age-matched control group.

Perceptual performance was investigated in an experimental task measuring the threshold for detecting BM masked by scrambled motion, and in a control task measuring the detection of the direction of coherent motion masked by random noise. The results show clear evidence for a differential contribution of the cerebellum to the perceptual analysis of coherent motion compared to BM. Whereas the ability to detect BM masked by scrambled motion was unaffected in the patient group, the patients’ ability to discriminate the direction of coherent motion in random noise was substantially impaired. We conclude that intact cerebellar function is not a prerequisite for a preserved ability to detect BM.

Since both the dorsal motion pathway and the ventral form pathway contribute to the visual perception of BM, the question remains open whether cerebellar dysfunction affecting the dorsal pathway is compensated for by the unaffected ventral pathway, or whether the perceptual analysis of BM is performed completely without cerebellar contribution.

¹ Benno Koch contributed to this study by assisting with the pre-selection of patients on the basis of clinical and MRI characteristics and by performing neurological examinations.

² Michael Schwarz contributed to the discussion of the design and of the results.



5.1 Introduction

Motion patterns characteristic of living beings are termed biological motion (BM). Detection of such motion patterns is a fundamental property of the human visual system. Humans can efficiently detect another living being in the visual environment and retrieve many features from its kinematics. One experimental approach to isolating BM information from other, non-dynamic sources of information is to represent the main joints of a person’s body by bright dots against a dark background (Johansson, 1973). Using this point-light display technique, observers can easily recognize a human walker, determine his/her gender (Barclay et al., 1978; Cutting, 1978; Kozlowski & Cutting, 1977; Mather & Murdoch, 1994; Troje, 2002a), recognize various action patterns (Dittrich, 1993), identify individual persons (Cutting & Kozlowski, 1977) and even recognize themselves (Beardsworth & Buckner, 1981).

The high adaptive value of efficient perception of animate motion patterns is reflected in a specific neural machinery that performs the perceptual analysis of this kind of visual information (Bonda et al., 1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000; Servos et al., 2002; Vaina et al., 2001). Neuroimaging studies report selective activation of the superior temporal sulcus (STS) by visual stimuli consisting of BM. In addition to the STS, BM-specific activation has also been shown in the cerebellum (Grossman et al., 2000; Vaina et al., 2001), area VP (Servos et al., 2002), the amygdala (Bonda et al., 1996), the occipital and fusiform face areas (Grossman & Blake, 2002) and the premotor cortex (Saygin et al., 2004). Results from these studies and from computational modelling (Giese & Poggio, 2003) are consistent with neuropsychological findings in patients with focal cortical brain lesions (Cowey & Vaina, 2000; McLeod et al., 1996; Schenk & Zihl, 1997; Vaina, 1994; Vaina et al., 1990).

The cerebellum has traditionally been viewed as a brain structure subserving skilled motor behavior. While recent work has suggested a much broader functional role of the cerebellum, with contributions to a wide range of cognitive and perceptual functions (for reviews, see Daum, Snitz, & Ackermann, 2001; Justus & Ivry, 2001), its role in BM perception is unclear. The neuroimaging literature on this issue is inconsistent: some studies report cerebellar involvement (Grossman et al., 2000; Vaina et al., 2001), while others failed to detect cerebellar activity associated with BM perception (e.g., Grezes et al., 2001; Grossman & Blake, 2001; Servos et al., 2002). Moreover, there are inconsistencies regarding the cerebellar substructure that may be involved in BM perception. Grossman and colleagues (2000) found cerebellar activity in the anterior portion near the midline, whereas Vaina and colleagues (2001) reported BM-specific activity in lateral parts of the cerebellum.

The current study aims to elucidate the functional role of the cerebellum in the perception of BM using a lesion approach, i.e., by examining the perceptual performance of patients with selective cerebellar lesions. Within this context, a particular issue of interest was the differential cerebellar contribution to the visual processing of BM relative to motion perception per se.

5.2 Methods

Two experimental tasks were administered in order to explore the functional role of the cerebellum in the perception of BM and to compare it with its role in non-BM motion perception. A group of patients with selective ischemic cerebellar lesions was examined on these tasks, and their perceptual performance was compared with that of an age-matched control group. Perceptual performance was assessed by determining the thresholds for detecting masked BM and masked non-BM. In the BM task, observers had to detect the presence or absence of a point-light walker masked by dots with scrambled motion. In the non-BM task, observers had to detect the motion direction of coherently moving dots masked by random noise dots.

5.2.1 Participants

Seven cerebellar patients and seven healthy control subjects participated in the study. The patients (aged 27 to 68 years, mean age 45.6 years) had suffered a cerebellar infarction in the territory of the posterior inferior cerebellar artery (PICA), the anterior inferior cerebellar artery (AICA) or the superior cerebellar artery (SupCA) and were in the post-acute stage. Main cerebellar symptoms in the acute stage included ataxia, dysmetria, dysarthria and impairments of fine motor coordination. Table 5.1 summarizes the relevant clinical information. The examination was carried out between 16 and 47 months (mean 27.4 months) after the ischemic event. At this time, the patients suffered only from residual motor impairments.

Table 5.1. Summary of relevant patient information. Asterisks indicate patients who were able to solve the control task. (AICA = anterior inferior cerebellar artery; PICA = posterior inferior cerebellar artery; SupCA = superior cerebellar artery)

Patient   Sex   Hemisphere   Lesion type   Location      Time since lesion (months)
Pat 1     M     R            AICA          medial        47
*Pat 2    M     L            PICA          medial        23
*Pat 3    F     LR           SupCA         lateral       20
Pat 4     F     LR           SupCA         lateral       28
Pat 5     F     R            AICA          medial        29
*Pat 6    M     L            PICA          medio-basal   16
Pat 7     M     L            PICA          medial        29
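The summary statistics given in the text can be checked against Table 5.1 by encoding the table as records. The sketch below does this for the time since lesion, the sex distribution and the patients marked with an asterisk; the record layout is an illustrative choice, not part of the original material.

```python
from collections import Counter
from statistics import mean

# Table 5.1: (patient, solved control task, sex, hemisphere, lesion type, location, months)
PATIENTS = [
    ("Pat 1", False, "M", "R", "AICA", "medial", 47),
    ("Pat 2", True, "M", "L", "PICA", "medial", 23),
    ("Pat 3", True, "F", "LR", "SupCA", "lateral", 20),
    ("Pat 4", False, "F", "LR", "SupCA", "lateral", 28),
    ("Pat 5", False, "F", "R", "AICA", "medial", 29),
    ("Pat 6", True, "M", "L", "PICA", "medio-basal", 16),
    ("Pat 7", False, "M", "L", "PICA", "medial", 29),
]

months = [p[6] for p in PATIENTS]
mean_months = mean(months)                  # 192 / 7
month_range = (min(months), max(months))
sex_counts = Counter(p[2] for p in PATIENTS)
solved_control = [p[0] for p in PATIENTS if p[1]]
```

This reproduces the range of 16 to 47 months and the mean of 27.4 months reported in the text, with four male and three female patients.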

Patients underwent extensive neuropsychological screening. Their current IQ was estimated with the subtests Similarities and Picture Completion of the short German version of the Wechsler Intelligence Scale (Dahl, 1972). According to these subtests, their mean IQ was 113.6, i.e., in the average to upper average range. Patients’ ability to scan the visual field was tested with the Visual Scanning subtest of a widely used German attention test battery (Zimmermann & Fimm, 1993). In this subtest, the patients’ search accuracy ranged between the 14th and the 58th percentile. This pattern of performance indicated no specific impairment of visual scanning in our patient sample.

Healthy control subjects were recruited by advertisement and matched to the patients with respect to age (range 24 to 67 years, mean age 45.1 years) and sex. All participants had normal or corrected-to-normal vision. The examination was undertaken with the understanding and written consent of each participant. The study was approved by the ethics committee of the Ruhr-University Bochum.

5.2.2 Stimuli

Stimuli in all experimental tasks were presented using Matlab with the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). In all experimental trials, stimuli were presented for 200 ms in order to preclude effects of fixation shifts.

5.2.2.1 Biological motion detection

Perception of BM was tested with point-light walkers shown in frontal view and masked by noise dots consisting of scrambled motion. Stimuli were presented as white dots on a black screen (Fig. 5.1). The mask dots had the same local motion trajectories as the dots defining the point-light walker, but their spatial relations were destroyed by randomizing their initial starting positions.

Fig. 5.1. Depiction of the stimuli used in the BM task: Dots connected by lines represent the point-light walker; the remaining dots represent scrambled motion. The connecting lines are drawn in the figure for clarity only.
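The scrambling procedure can be sketched as follows. This is a minimal illustration, not the original Matlab/Psychophysics Toolbox code: the function name `scramble_mask`, the frame format (a list of frames, each a list of (x, y) joint positions in degrees) and the uniform placement of starting positions within the 7.4 x 7.4 deg display area are assumptions. Each mask dot copies the frame-to-frame displacement of a randomly chosen walker joint, so local motion is preserved while the spatial relations among the dots are destroyed.

```python
import random

def scramble_mask(walker_frames, n_mask, area_deg=7.4, rng=None):
    """Build scrambled-motion mask dots from a point-light walker (hypothetical helper).

    walker_frames: list of frames; each frame is a list of (x, y) joint positions in deg.
    Returns a list of frames; each frame is a list of n_mask (x, y) mask-dot positions.
    """
    rng = rng or random.Random()
    n_joints = len(walker_frames[0])
    half = area_deg / 2
    # Each mask dot borrows the trajectory of one randomly chosen joint ...
    sources = [rng.randrange(n_joints) for _ in range(n_mask)]
    # ... but starts at a random position inside the display area.
    starts = [(rng.uniform(-half, half), rng.uniform(-half, half)) for _ in range(n_mask)]
    first = walker_frames[0]
    mask_frames = []
    for frame in walker_frames:
        dots = []
        for joint, (sx, sy) in zip(sources, starts):
            # Displacement of the source joint relative to the first frame.
            dx = frame[joint][0] - first[joint][0]
            dy = frame[joint][1] - first[joint][1]
            dots.append((sx + dx, sy + dy))
        mask_frames.append(dots)
    return mask_frames
```

As their trajectories unfold, dots can drift slightly beyond the display area; how the original implementation handled this is not stated.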



Three male individuals served as walking models for the construction of the point-light displays. Their walking patterns were recorded in three-dimensional space with a motion capture system equipped with nine CCD cameras (Oxford Metrics, Vicon 512). Models were instructed to walk at a comfortable speed through the 7-m-long capture volume. A set of 41 retroreflective markers was attached to their bodies. The motion capture system tracks the three-dimensional trajectories of the markers with a spatial accuracy of about 1 mm and a temporal resolution of 120 Hz. From the original 41 markers, the trajectories of 15 “virtual” markers positioned at the major joints of the body were computed using commercially available software for biomechanical modeling (Bodybuilder, Oxford Metrics). Translational motion was subtracted so that the walkers appeared to walk on a treadmill.
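The text does not specify how translation was removed. One simple possibility, sketched here as an assumption rather than the procedure actually used, is to estimate the mean walking velocity from a straight-line fit to the marker centroid and subtract that linear path, which preserves the within-stride fluctuations of the body around it. The function name and the data layout (frames of 3-D marker tuples, walking direction along one axis) are hypothetical.

```python
def remove_translation(frames, axis=0):
    """Subtract the constant-velocity drift of the marker centroid along `axis`.

    frames: list of at least two frames; each frame is a list of (x, y, z) marker positions.
    Returns new frames in which the centroid no longer drifts along `axis`.
    """
    n = len(frames)
    centroid = [sum(p[axis] for p in f) / len(f) for f in frames]
    t_mean = (n - 1) / 2
    c_mean = sum(centroid) / n
    # Least-squares slope of centroid position against frame index (mean velocity).
    slope = sum((t - t_mean) * (c - c_mean) for t, c in enumerate(centroid)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    out = []
    for t, frame in enumerate(frames):
        drift = c_mean + slope * (t - t_mean)  # fitted linear path at frame t
        out.append([tuple(v - drift if k == axis else v for k, v in enumerate(p)) for p in frame])
    return out
```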

The difficulty of the detection task was manipulated by varying the number of mask dots from 0 to 60 in steps of five, yielding thirteen levels of difficulty. In half of the trials the walker was present; in the other half it was absent and replaced by the same number of scrambled dots. All dots were displayed within an area subtending 7.4 x 7.4 deg of visual angle. Within this area, the positions of the point-light walker and of the mask dots were chosen randomly. At the viewing distance of 57 cm, the walkers subtended 5.5 x 1.5 deg of visual angle. Point-light walkers were computed in real time on a frame-by-frame basis and synchronized with the 60 Hz refresh rate of the 15-inch monitor to ensure smooth, regular motion.
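The viewing distance of 57 cm is a common choice because, at that distance, 1 cm on the screen subtends almost exactly 1 deg of visual angle. The sketch below (function names are illustrative) converts between visual angle and on-screen size and counts the screen refreshes in a 200 ms presentation at 60 Hz.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # from the text
REFRESH_HZ = 60.0           # from the text

def size_cm(angle, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent (cm) that subtends `angle` degrees at the given distance."""
    return 2 * distance_cm * math.tan(math.radians(angle) / 2)

def angle_deg(size, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (deg) subtended by an on-screen extent of `size` cm."""
    return math.degrees(2 * math.atan(size / (2 * distance_cm)))

def n_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Number of screen refreshes in a presentation lasting duration_ms."""
    return round(duration_ms * refresh_hz / 1000)
```

For example, the 7.4 deg display area corresponds to about 7.37 cm on the screen, and a 200 ms stimulus spans 12 frames.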

5.2.2.2 Coherent motion detection

Perception of non-BM was tested with displays of coherent motion in random noise, matched in size to the BM stimuli (7.4 x 7.4 deg). Displays consisted of 200 white dots (0.05 x 0.05 deg each) presented on a black screen. A specified percentage of the dots (signal dots) moved coherently at a speed of 6 deg/s either to the right or to the left (Fig. 5.2). Signal dots had a limited lifetime of five frames; at the end of their lifetime they disappeared and reappeared at a location opposite to the direction of movement. The mask dots were positioned randomly within the display area and remained stationary for a lifetime of two frames, after which they reappeared at a new location. The percentage of signal dots was varied from 65% to 5% in steps of 5%, resulting in thirteen levels of difficulty.

Fig. 5.2. Depiction of the stimuli used in the control task (coherent motion in random noise): Direction of coherently moving dots is illustrated by arrows. Remaining dots represent random noise.
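The frame-by-frame logic of this display can be sketched with the parameters given in the text. The function names are hypothetical, and so is the rule for re-plotting expired signal dots: "a location opposite to the direction of movement" is interpreted here as the display edge opposite to the motion direction, at a random height.

```python
import random

REFRESH_HZ = 60.0               # monitor refresh rate (from the text)
SPEED_DEG_S = 6.0               # signal-dot speed (from the text)
AREA_DEG = 7.4                  # display is AREA_DEG x AREA_DEG (from the text)
N_DOTS = 200                    # total dots per display (from the text)
SIGNAL_LIFE, NOISE_LIFE = 5, 2  # dot lifetimes in frames (from the text)
HALF = AREA_DEG / 2

def coherence_levels():
    """Thirteen signal percentages: 65, 60, ..., 5."""
    return list(range(65, 0, -5))

def n_signal(percent):
    """Number of coherently moving dots at a given signal percentage."""
    return round(N_DOTS * percent / 100)

def step_signal(x, y, age, direction, rng):
    """Advance one signal dot by one frame (direction +1 = right, -1 = left).

    Assumption: an expired dot reappears at the edge opposite to the motion direction.
    """
    if age + 1 >= SIGNAL_LIFE:
        return -direction * HALF, rng.uniform(-HALF, HALF), 0
    return x + direction * SPEED_DEG_S / REFRESH_HZ, y, age + 1

def step_noise(x, y, age, rng):
    """Noise dots stay still for NOISE_LIFE frames, then jump to a random location."""
    if age + 1 >= NOISE_LIFE:
        return rng.uniform(-HALF, HALF), rng.uniform(-HALF, HALF), 0
    return x, y, age + 1
```

At 60 Hz, a signal dot moving at 6 deg/s advances 0.1 deg per frame.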

5.2.3 Procedure

The experiment was carried out at the Klinikum Dortmund and at the Institute of Cognitive Neuroscience of the Ruhr-University Bochum. All participants were seated in front of a 15-inch monitor at a distance of 57 cm with response buttons under their right hand. Stimuli in both tasks were presented at the center of the screen for 200 ms to preclude eye movements. Successive trials were separated by intertrial intervals of 2000 ms, during which a black screen was presented. Before each trial, a fixation cross was presented for 2000 ms.

Both experiments comprised three blocks of 52 trials each (13 difficulty levels x 4 repetitions), resulting in 156 trials per task. Trials within each block were presented in random order. The two tasks were presented in counterbalanced order. Participants were asked to maintain central fixation and to respond as accurately as possible by pressing one of the response buttons. Instructions stressed accuracy rather than speed of responding. Observers did not receive feedback on their responses. Before the experimental trials, participants were shown demonstration trials to familiarize them with the display and the setup.
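The trial structure can be generated with a few lines of code. In the sketch below, the helper names are illustrative, and alternating the task order across participants is only one assumed way to counterbalance. The text does not say how walker-present and walker-absent trials were distributed across repetitions, so the sketch leaves that out.

```python
import random

MASK_LEVELS = list(range(0, 61, 5))        # BM task: 0-60 mask dots
COHERENCE_LEVELS = list(range(65, 0, -5))  # control task: 65%-5% signal dots

def make_blocks(levels, reps=4, n_blocks=3, rng=None):
    """Return n_blocks trial lists, each containing every level `reps` times in random order."""
    rng = rng or random.Random()
    blocks = []
    for _ in range(n_blocks):
        trials = [level for level in levels for _ in range(reps)]
        rng.shuffle(trials)  # random order within each block
        blocks.append(trials)
    return blocks

def task_order(participant_index):
    """Alternate the order of the two tasks across participants (assumed scheme)."""
    return ("BM", "CM") if participant_index % 2 == 0 else ("CM", "BM")
```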

5.2.4 Data analysis

Experimental data were analyzed in two consecutive steps, separately for each experiment. First, the detection threshold of each subject was determined for both paradigms. The probability of responding correctly by chance was 50% in both tasks. The threshold was defined as the signal-to-noise level at which 75% of the trials were answered correctly. To estimate it, a sigmoidal curve (Boltzmann function) was fitted to the proportion of correct responses at each difficulty level, with the upper asymptote fixed at 100% and the lower asymptote fixed at chance level (50%). Fig. 5.3 illustrates this procedure for a single subject.
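A sketch of the threshold estimation, assuming the common Boltzmann parameterization y = A2 + (A1 − A2) / (1 + exp((x − x0) / dx)); the exact form used in the study is not stated. With the asymptotes fixed at 1.0 and 0.5, the curve equals 0.75 at its midpoint x0, so the 75% threshold is simply the fitted x0. To stay dependency-free, the sketch fits x0 and the slope dx by brute-force least-squares grid search; in practice a library optimizer such as `scipy.optimize.curve_fit` would be the usual choice. Function names are illustrative.

```python
import math

def boltzmann(x, x0, dx, a1=1.0, a2=0.5):
    """Boltzmann sigmoid that tends to a1 as x -> -inf and to a2 as x -> +inf.

    With the asymptotes fixed at 1.0 (perfect) and 0.5 (chance), the curve
    passes through exactly 0.75 at its midpoint x = x0.
    """
    z = max(-50.0, min(50.0, (x - x0) / dx))  # clip to avoid overflow in math.exp
    return a2 + (a1 - a2) / (1.0 + math.exp(z))

def fit_threshold(levels, p_correct, n_grid=600):
    """Least-squares fit of x0 and dx by grid search; returns the 75%-correct threshold x0.

    Positive dx gives a falling curve (performance drops as mask dots are added),
    negative dx a rising one (performance improves with the percentage of signal dots).
    """
    lo, hi = min(levels), max(levels)
    span = hi - lo
    x0_grid = [lo + i * span / n_grid for i in range(n_grid + 1)]
    dx_grid = [sign * span * k / 120 for sign in (-1, 1) for k in range(1, 41)]
    best_sse, best_x0 = float("inf"), lo
    for x0 in x0_grid:
        for dx in dx_grid:
            sse = sum((boltzmann(x, x0, dx) - p) ** 2 for x, p in zip(levels, p_correct))
            if sse < best_sse:
                best_sse, best_x0 = sse, x0
    return best_x0
```

Applied to one subject's proportion correct at the thirteen difficulty levels, `fit_threshold` returns the number of mask dots (BM task) or the percentage of signal dots (control task) at which performance reaches 75%.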

Subsequently, the thresholds of the experimental (patient) and control groups were compared with a t-test for independent samples, separately for both experiments. In addition, reaction times were recorded, and the median reaction time of each subject was compared between groups with a t-test for independent samples, again separately for both experiments.
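Both group comparisons reduce to a Student's t-test for independent samples; with 7 patients and 7 controls this gives the 12 degrees of freedom reported in the Results. The sketch below (names illustrative) computes the pooled-variance t statistic and a subject's median reaction time using only the standard library. The two-tailed p-value would then be read from the t distribution, for example via `scipy.stats.ttest_ind`, whose default (`equal_var=True`) is this pooled-variance test.

```python
import math
from statistics import mean, median, variance

def subject_median_rt(rts_ms):
    """Median reaction time of one subject (robust to occasional very slow responses)."""
    return median(rts_ms)

def independent_t(a, b):
    """Student's t-test for two independent samples with pooled variance.

    Returns (t, df). The two-tailed p-value follows from the t distribution with
    df degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```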



Fig. 5.3. Illustration of the procedure to determine the detection threshold for a single subject in BM detection (A) and coherent motion detection (B).



5.3 Results

5.3.1 Biological motion detection

On average, the control group reached the threshold criterion in the BM paradigm at a noise level of 23.3 mask dots. Cerebellar patients performed similarly, reaching the criterion at a noise level of 22.2 mask dots (Fig. 5.4). A comparison between the experimental and control groups revealed no significant difference in the threshold for detecting BM (t(12) = 0.183; p = 0.858).

Fig. 5.4. BM task: Perceptual threshold as number of mask dots for the detection of a point-light walker masked by scrambled motion. Error bars indicate SEM.

5.3.2 Coherent motion detection

In the control task, which required detecting the direction of coherent motion in random noise, control subjects reached the criterion at a signal-to-noise ratio of 34.40%. By contrast, only three of the seven cerebellar patients were able to solve the task; the remaining four failed even at the highest signal-to-noise ratio presented (65%). To compare both groups statistically, a threshold value of 70% was entered into the analysis for those patients who did not solve the task. It is important to note that this conservative procedure underestimates the magnitude of the patients’ impairment in detecting the direction of coherent motion in random noise. With this procedure, the patient group needed on average a signal-to-noise ratio of 61% to reach the criterion (Fig. 5.5). A group comparison revealed a significant difference between the experimental and control groups (t(12) = 2.757; p = 0.020).

Fig. 5.5. Control task: perceptual threshold as signal to noise ratio for direction detection of coherent motion masked by random noise. Error bars indicate SEM.
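The ceiling substitution can be written out explicitly. In the sketch below (hypothetical helper names), a missing threshold is represented by None and replaced by 70%. Running the arithmetic backwards shows what the reported group mean implies: taking the mean of 61% at face value (it may be rounded), the three patients who solved the task must have had a mean threshold of about 49%, still above the control mean of 34.40%.

```python
CEILING = 70.0  # % signal dots entered for patients who never reached the criterion (from the text)

def impute_thresholds(thresholds):
    """Replace missing thresholds (None = criterion never reached) by the ceiling value."""
    return [CEILING if t is None else t for t in thresholds]

def implied_mean_of_solvers(group_mean, n_total, n_imputed, ceiling=CEILING):
    """Mean threshold of the non-imputed patients implied by a reported group mean."""
    return (group_mean * n_total - ceiling * n_imputed) / (n_total - n_imputed)
```

For example, `implied_mean_of_solvers(61, 7, 4)` returns 49.0.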

5.3.3 Relation between biological and coherent motion perception

To examine the relation between performance in BM detection and coherent motion detection, a correlation analysis was conducted. For the patient group, Pearson’s correlation coefficient between performance in the two tasks was –0.293 (p = 0.524); for the control group, it was 0.116 (p = 0.805). The correlation between BM and coherent motion detection thus failed to reach significance in both groups, although the coefficient was somewhat larger in magnitude in the patient group. These results suggest that BM perception and coherent motion perception rely on independent processes.
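The significance test behind these coefficients converts r into a t statistic with n − 2 degrees of freedom. The sketch below (function names illustrative; Python 3.10+ also offers `statistics.correlation`) shows the computation. With n = 7 patients, r = –0.293 gives t ≈ –0.69 on 5 degrees of freedom, far from significance.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
```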



5.3.4 Reaction times

In addition to accuracy, reaction times were analyzed (Fig. 5.6). For coherent motion detection, a group comparison revealed longer reaction times for patients (989 ms) than for controls (451 ms) (t(12) = 3.454; p = 0.005). Separate analyses of correct and error trials yielded a significant group difference for correct trials (p = 0.005) and a non-significant trend for incorrect trials (p = 0.107). A different pattern emerged for BM detection: reaction times of patients (864 ms) and controls (707 ms) did not differ significantly (t(12) = 1.2; p = 0.253). This was true for correct trials (p = 0.175) as well as for incorrect trials (p = 0.267).

Fig. 5.6. Reaction times in the BM detection task (BM) and the coherent motion detection task (CM) for each group. Error bars indicate SEM.

5.4 Discussion

The objective of the current study was to elucidate the differential contribution of the

cerebellum to the perceptual analysis of BM by examining perceptual performance of

patients with selective cerebellar lesions. Previous imaging studies revealed

inconsistent results with respect to cerebellar activation in BM perception (Bonda et al.,

1996; Grezes et al., 2001; Grossman & Blake, 2001, 2002; Grossman et al., 2000;


Servos et al., 2002; Vaina et al., 2001). On the basis of neuroimaging studies alone, it is difficult to determine which brain regions are critically involved in specific aspects of cognitive function, since this method usually reveals co-activation of multiple regions. Neuropsychological studies of patients with selective lesions therefore play an important role in establishing the distinct contribution of each brain region to information processing.

The present results show clear evidence for a differential contribution of the cerebellum to the perceptual analysis of coherent motion on the one hand and of BM on the other. Whereas the perception of coherent motion in random noise was substantially impaired in our patients with selective cerebellar lesions, their ability to perceive BM camouflaged by scrambled motion was unaffected. In addition, we did not observe a significant correlation between the perceptual thresholds for BM detection and coherent motion detection in either group. Moreover, the patients’ higher threshold for coherent motion detection was accompanied by longer reaction times. Compared with other studies of BM detection using scrambled motion as a mask (Bertenthal & Pinto, 1994), overall detection performance in the present task was substantially poorer. This difference is probably due to the very short presentation time in the present study (200 ms) compared to 1000 ms in the study by Bertenthal and Pinto (1994).

The impairment in detecting movement direction in the control paradigm confirms previous reports (Ivry & Diener, 1991; Nawrot & Rizzo, 1995, 1998). A study comparing perceptual judgments of the velocity of moving stimuli and of the position of static stimuli in cerebellar patients (Ivry & Diener, 1991) showed selective impairments in the discrimination of moving stimuli. Further support for cerebellar involvement in motion perception was provided by Nawrot and Rizzo (1995, 1998), who showed that midline cerebellar lesions can cause deficits in visual motion perception in tasks such as detecting the direction of dot movements in a masking paradigm. These deficits occur in the acute as well as in the chronic stage of the lesion. The primary interest of the current study was to explore whether the cerebellum contributes differently to BM perception than to motion perception per se.



The general framework, that is, presentation time and procedure, was identical in both tasks and therefore cannot explain the patients’ performance deficits in the control task. This is particularly true for the role of eye movements. To control for the influence of eye movements, a very short presentation time of 200 ms was chosen; within this period it is almost impossible to initiate eye movements. If oculomotor problems or defective fixation had played a major role, performance in both tasks would have been affected to a similar extent. The cerebellum plays a critical role in motor control, with the lateral regions mediating movement planning and programming and the medial regions contributing to movement execution (Dichgans & Diener, 1984). Accordingly, the most prominent symptoms of cerebellar dysfunction are impairments of motor control. For this reason, the instructions stressed accuracy rather than reaction time. Since significant reaction time differences between patients and controls were observed only in the coherent motion detection paradigm, these differences cannot be attributed to motor impairments, which would have slowed responses in both tasks.

To understand why BM perception was intact in patients with selective cerebellar lesions, it may help to consider the cortical network involved in BM perception. Both the dorsal motion pathway and the ventral form pathway contribute to the perceptual analysis of BM (for review, see Giese & Poggio, 2003). Findings from imaging studies are complemented by computational simulations modeling key experimental findings on BM perception (Giese & Poggio, 2003) and by neuropsychological studies of patients with selective cortical lesions (Cowey & Vaina, 2000; MacLeod, 1988; Schenk & Zihl, 1997; Vaina, 1994; Vaina et al., 1990). These case studies provide evidence for a dissociation between the mechanisms involved in the perception of BM on the one hand and those involved in inanimate visual motion tasks or static object recognition tasks on the other. Patients LM (MacLeod, 1988) and AF (Vaina et al., 1990), who have bilateral lesions involving the posterior visual pathway, showed severe deficits in visual motion perception but could nevertheless recognize human action patterns presented as point-light displays. Patients with bilateral ventral lesions involving the posterior temporal lobes, such as patient EW (Vaina, 1994), who suffered from prosopagnosia and object agnosia, could also identify BM in point-light animations. By contrast, patient AL (Cowey & Vaina, 2000), who is hemianopic, suffers from visual perceptual impairments in her intact hemifield as a consequence of an additional lesion in the ventral extrastriate cortex. AL fails to recognize BM displays despite intact static form perception and motion detection. This pattern of impairments makes sense if one assumes that the lesion affecting the intact hemifield includes the STS complex, which receives input from both the ventral and the dorsal visual streams.

Given these case studies and computational simulations, it is reasonable to assume that BM can be detected by either the ventral or the dorsal visual stream alone, provided that the STS complex is intact. From this point of view, the cerebellum might facilitate the perceptual analysis of BM in the dorsal visual stream. Nevertheless, dysfunctional cerebellar processing would not necessarily lead to a significant impairment in BM perception, since dysfunctions of the dorsal visual stream could be compensated for by intact processing in the ventral visual stream.

Alternatively, one might argue that the perceptual analysis of BM is performed entirely by neocortical structures, without any cerebellar contribution. This view is also in accordance with our present findings. Moreover, there is empirical evidence that the cerebellum is active not only during the execution of movement sequences but also during motor imagery, for example when complex movement sequences are imagined (Decety, 1996; Decety et al., 1990; Hanakawa et al., 2003; Luft et al., 1998; Ryding et al., 1993). The cerebellar activity in response to point-light displays of BM observed in some imaging studies (Grossman et al., 2000; Vaina et al., 2001) might reflect such imagery-related feedforward mechanisms.

The cerebellum’s many connections with cortical areas provide the neuroanatomical basis for cerebellar contributions to a variety of perceptual tasks. The lateral cerebellum projects via the dentate nucleus and the thalamus to several neocortical structures, among them the prefrontal cortex, the superior temporal sulcus, and the parietal cortex (Schmahmann & Pandya, 1997). These regions project back to the cerebellum via the pontine nuclei. The lateral cerebellum has been shown to be engaged during the acquisition and discrimination of somatosensory information (Gao et al., 1996), and it has been suggested that it is active during motor, perceptual, and cognitive tasks because these tasks require the processing of sensory data. In the view of Bower (1997), the cerebellum facilitates the efficiency with which other brain structures perform their own functions and is therefore useful, but not imperative, for many different kinds of brain function. A general contribution of the cerebellum to the acquisition of sensory data is inconsistent with the present findings, since the cerebellar patients’ ability to perceive BM was spared. The role of the cerebellum in sensory data acquisition must therefore be more specific.

Keele and Ivry (1990) proposed that the cerebellum functions as an internal clock that measures time intervals in the millisecond range. Such precise timing of very short intervals subserves motor as well as non-motor functions. The demonstration of a cerebellar role in visual perceptual functions that require velocity perception (Ivry & Diener, 1991; Nawrot & Rizzo, 1995, 1998) has also been interpreted as consistent with the timing hypothesis. Similarly, deficits in speech perception (Ackermann, Graber, Hertrich, & Daum, 1999) and classical conditioning (Daum et al., 1993; Topka, Valls-Sole, Massaquoi, & Hallett, 1993; Woodruff-Pak, Papka, & Ivry, 1996) have been discussed in relation to impaired timing in cerebellar patients.

Recently, the timing hypothesis of cerebellar function has been modified by distinguishing event timing from emergent timing (Ivry, Spencer, Zelaznik, & Diedrichsen, 2002; Spencer, Zelaznik, Diedrichsen, & Ivry, 2003). Event timing is defined as a form of representation in which the temporal goals are explicitly represented. In contrast, emergent timing reflects temporal consistencies that arise through the control of other parameters. Whereas the cerebellum is involved in tasks requiring explicit temporal representation (event timing), it seems to be less important for emergent timing, which depends on control parameters not associated with the cerebellum.

The timing hypothesis, especially in its modified form, best explains our present findings. Precise timing is necessary to detect coherent motion in random noise: if motion is considered as spatial displacement per unit of time, the direct link between an accurate representation of short time intervals and motion perception becomes obvious. In the case of BM, by contrast, the motion information merely serves to convey the form of a human figure, and precise timing in the millisecond range seems unnecessary for this perceptual demand. Nevertheless, the timing hypothesis may not explain all deficits seen in cerebellar patients. Thier and colleagues (Thier, Haarmeier, Treue, & Barash, 1999) tried to identify the nature of the visual impairments resulting from cerebellar dysfunction in a set of experiments. Their results support the presence of visual deficits in cerebellar disease, but, in contrast to previous studies, they provide evidence against a common, simple denominator that could explain the deficits in both motion perception and position discrimination.
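The link between timing and motion perception can be illustrated with the parameters of the control task. A signal dot moving at 6 deg/s on a 60 Hz display is displaced by 0.1 deg per frame of about 16.7 ms, and a velocity estimate is this displacement divided by the elapsed time. The sketch below assumes, purely for illustration, that the observer misjudges the frame interval by a fixed amount (the `timing_error_ms` parameter is hypothetical). An error of only 2 ms changes the estimated speed by about 11%.

```python
REFRESH_HZ = 60.0
FRAME_S = 1 / REFRESH_HZ          # duration of one frame (about 16.7 ms)
SPEED_DEG_S = 6.0                 # signal-dot speed in the control task
STEP_DEG = SPEED_DEG_S * FRAME_S  # displacement per frame (0.1 deg)

def speed_estimate(displacement_deg, interval_s):
    """Velocity as spatial displacement per unit of time."""
    return displacement_deg / interval_s

def relative_speed_error(timing_error_ms):
    """Relative error of the speed estimate if the frame interval is misjudged
    by timing_error_ms (a hypothetical, fixed timing error)."""
    estimate = speed_estimate(STEP_DEG, FRAME_S + timing_error_ms / 1000)
    return estimate / SPEED_DEG_S - 1
```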

Previous studies (Ivry & Diener, 1991; Nawrot & Rizzo, 1995, 1998) reported that visual motion perception is linked to the medial rather than the lateral cerebellum. The pattern of impairments in our sample of cerebellar patients does not clearly support this view: our patients did not show such a clear distinction between medially and laterally located lesions with respect to performance in the control task. Three of the four patients with deficits in discriminating the direction of coherently moving dots had lesions primarily affecting medial parts, whereas only one had a lesion primarily affecting lateral parts. On the other hand, the three patients who were able to solve the task had lesions in medial, lateral, and medio-basal parts, respectively. This pattern shows only a slight tendency for intact processing in the medial cerebellum to be necessary for normal motion perception.
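These counts follow directly from Table 5.1. The sketch below transcribes the relevant columns of the table (lesion location and whether the patient solved the control task, i.e. is marked with an asterisk) and tallies them; the names are illustrative.

```python
from collections import Counter

# (patient, lesion location, solved the control task?) transcribed from Table 5.1;
# an asterisk in the table marks patients who solved the control task.
PATIENTS = [
    (1, "medial", False),
    (2, "medial", True),
    (3, "lateral", True),
    (4, "lateral", False),
    (5, "medial", False),
    (6, "medio-basal", True),
    (7, "medial", False),
]

def tally(rows):
    """Count patients per (location, solved) combination."""
    return Counter((location, solved) for _, location, solved in rows)
```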

Taken together, the results indicate that functional integrity of the cerebellum is not required for BM detection. Given that both the dorsal and the ventral visual stream contribute to the perception of BM, processing in the dorsal motion pathway might be affected by cerebellar dysfunction while processing in the ventral form pathway compensates for this deficit. An impairment would then not emerge at the behavioral level as long as the ventral form pathway is intact.


VI General Discussion

The present thesis aimed to shed further light on the neuropsychological basis of BM perception. The four studies described in the previous chapters investigated several open issues in this respect. To address these questions, the studies used different methodological approaches focusing on different aspects of BM perception. Their objectives were to learn more about the information contained in the kinematics of animate motion patterns (Study 1), the temporal aspects of BM processing and the associated processing stages (Study 2), the visual representation of one’s own movement patterns compared with other familiar movement patterns (Study 3), and the role of the cerebellum in the perceptual analysis of BM (Study 4). In the following paragraphs, the experimental approaches are briefly described, and the main findings of the four studies are summarized and discussed. The last paragraph gives an overview of the contribution of this thesis to a better understanding of the neuropsychological basis of BM perception. Finally, possible future directions of research on BM perception are outlined.

The first study of this thesis investigated whether information from BM, i.e., from the kinematics of locomotion patterns, can serve as a cue to the size of animate beings. The theoretical basis for this idea is the fact that the constant force of the earth’s gravity directly influences the motion patterns of animate beings and inanimate objects in the physical world. With respect to BM, gravity determines the periodic fluctuations between kinetic and potential energy. Therefore, gravity imposes a fixed relation between temporal parameters (e.g., stride frequency) and spatial parameters (e.g., leg length) of energetically optimal gait patterns. Indeed, it has been shown that animals adjust their gait patterns to minimize the energy required for locomotion and that such a relation between size and stride frequency exists for a number of different species.
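The gravity-determined relation can be made explicit with an idealized model that treats the swinging leg as a simple pendulum of length L (an assumption introduced here for illustration). Its natural frequency f depends only on L and the gravitational acceleration g, so size scales with the inverse square of stride frequency; the same proportionality holds for a physical pendulum if legs of different size are geometrically similar:

```latex
f = \frac{1}{2\pi}\sqrt{\frac{g}{L}}
\quad\Longrightarrow\quad
L = \frac{g}{4\pi^{2} f^{2}} \;\propto\; f^{-2}
```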

In the two experiments of this study, the temporal parameters of animal locomotion, represented as point-light displays, were manipulated to examine their influence on human observers’ size estimates. The results indicated that this manipulation had a significant effect on observers’ size judgments in the expected direction: displays with a high stride frequency were perceived as smaller than displays with a low stride frequency. It was therefore concluded that human observers are able to retrieve size information from the kinematics of animate motion. Moreover, the observers’ size judgments were consistent with an inverse quadratic relation between size and stride frequency rather than a simple linear relation. This finding shows that observers did not base their judgments on a simple general rule associating a higher stride frequency with a smaller size, but that they seem to have implicit knowledge of the exact form of this relation.
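Under an idealized simple-pendulum assumption, size follows from stride frequency as L = g / (4π²f²). The sketch below (hypothetical helpers) computes the pendulum length for a given frequency and the relative size predicted by the inverse-quadratic rule when a display's stride frequency is scaled by some factor, for example by changing its playback speed.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def pendulum_length(freq_hz):
    """Length (m) of an ideal simple pendulum with natural frequency freq_hz: L = g / (4 pi^2 f^2)."""
    return G / (4 * math.pi ** 2 * freq_hz ** 2)

def relative_size(frequency_factor):
    """Size predicted by the inverse-quadratic rule when stride frequency is scaled
    by frequency_factor, relative to the unscaled display."""
    return 1.0 / frequency_factor ** 2
```

For example, an ideal pendulum swinging at 1 Hz is about 0.25 m long, and doubling a display's stride frequency predicts a quarter of the original size.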

The implications of the current findings go beyond the fact that the kinematics of BM patterns contain cues to the size of animate objects that human observers can derive. Deriving this information requires a highly developed visual system and extensive experience with this kind of visual stimulation. One might speculate that the typical pattern generated by the influence of gravity on BM is one of the crucial features the mammalian visual system uses to detect BM and to process it as a special category. As a consequence, this specific motion pattern, i.e., the optimized periodic fluctuation between kinetic and potential energy, might act as a crucial sensory filter for the detection of BM. An efficient sensory filter for BM detection is very important in the animal kingdom as well as in the evolution of the human species, since prey and predators must be detected as fast as possible in order to react optimally to a potentially life-threatening danger or a potential source of food.

The second study investigated the time course and localization of the neural processes underlying the perceptual analysis of BM. Observers were presented with point-light displays of a walking figure in normal and inverted orientation and with displays of scrambled motion while their EEG was recorded. Data were analyzed with conventional averaging techniques to obtain event-related potentials (ERPs) and with source localization by low resolution brain electromagnetic tomography (LORETA). This combined approach has the advantage of measuring brain activity with very high temporal resolution while additionally localizing the sources of the distinct ERP components with acceptable spatial resolution.
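Conventional ERP averaging can be sketched as follows. Each single-trial epoch, time-locked to stimulus onset, is baseline-corrected by subtracting its mean pre-stimulus amplitude, and the corrected epochs are then averaged sample by sample. The function names and the list-based data layout are illustrative assumptions, as is the peak-latency helper, which looks for the most negative value within a time window as is appropriate for negative components such as the N170. The LORETA source analysis is not covered.

```python
from statistics import mean

def erp_average(epochs, n_baseline):
    """Average single-trial epochs after per-trial baseline correction.

    epochs: list of equal-length lists of voltage samples (one list per trial),
            whose first n_baseline samples precede stimulus onset.
    """
    corrected = []
    for epoch in epochs:
        baseline = mean(epoch[:n_baseline])
        corrected.append([v - baseline for v in epoch])
    # zip(*corrected) groups the samples of all trials at each time point.
    return [mean(samples) for samples in zip(*corrected)]

def peak_latency(erp, times_ms, t_min, t_max, negative=True):
    """Latency of the most negative (or most positive) sample within [t_min, t_max]."""
    window = [(v, t) for v, t in zip(erp, times_ms) if t_min <= t <= t_max]
    pick = min if negative else max
    return pick(window)[1]
```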



The present findings suggest two distinct stages of BM processing that recruit different neuronal populations. The first processing stage (N170) reflects the generation of a global percept of the visual scene, leading to a pop-out effect for upright BM. This sensitivity to upright BM results from the familiarity of BM in normal orientation. Moreover, the finding of sources in the posterior cingulate cortex, which serves several attention-related functions, fits well with behavioral data suggesting that attention is required for the visual analysis of point-light displays of BM. At the second processing stage (N300), brain areas such as the superior temporal gyrus and the fusiform gyrus, which are known from fMRI studies to be involved in the fine-grained perceptual analysis of BM, play an important role.

The spatial resolution of LORETA is lower than that of fMRI, but its temporal resolution is much higher. Nevertheless, it is meaningful to compare the findings reported here directly with neuroimaging findings from previous studies using similar stimulus categories. The present findings are generally consistent with those from neuroimaging studies, which supports the validity of the localizations obtained by source analysis. Moreover, the current findings extend evidence from fMRI studies, since activity in distinct brain areas can be directly related to distinct time windows.

The evidence for fast, efficient processing underlines the importance of BM perception and provides further evidence for a specific neural network involved in processing biologically relevant motion signals. The right-hemispheric dominance associated with BM perception parallels the asymmetries found in face perception and probably reflects the social relevance of animate motion perception. Furthermore, the superior temporal gyrus and the fusiform gyrus are primarily involved in the second ERP component and show clear hemispheric asymmetries in the perceptual analysis of BM only during later processing stages.

The third study of this thesis addressed the question of whether the mental representation of one’s own movement pattern differs from the representations of the movement patterns of other familiar persons. Using a psychophysical approach, viewpoint-dependent recognition effects were examined to gain deeper insight into the mental representation and perceptual mechanisms of BM processing. Observers were presented with point-light animations of the walking patterns of familiar persons and of themselves, shown from three different viewpoints.

The current results provide further evidence that kinematic cues from BM carry information about personal identity that can be transferred from real-life experience to reduced point-light displays of BM. Whereas recognition of one’s own walking pattern was viewpoint-independent, recognition of other familiar persons was better for frontal and half-profile views than for the profile view. Viewpoint-dependent recognition of other people might be due to selective attention to approaching people, leading to preferential exposure to frontal and half-profile views. The viewpoint-independent representation of one’s own movement patterns might be related to a crossmodal transfer from motor to visual representations.

These results are therefore consistent with the hypothesis that humans understand an action by mapping the visual representation of the observed action onto their motor representation of the same action. Such a mechanism would account well for the observed pattern of results. Mapping the visual representation onto the motor representation would provide an advantage only for identifying one’s own gait pattern, since for this stimulus the action representation and the visual representation refer to exactly the same individual movement pattern and therefore match perfectly. Moreover, the motor representation is assumed to be stored in a three-dimensional format, which would explain the viewpoint-independent recognition of one’s own movement pattern. In contrast, mapping the visual representation of another person’s gait pattern would support correct recognition of the movement as walking but would not help to identify the person whose gait pattern is seen. The observer then has to rely on the stored visual representation of the familiar gait pattern and compare it with the stimulus. Viewpoint dependency is probably due to differing amounts of experience with different perspectives.

Neuroimaging findings on the role of the cerebellum in the perceptual analysis of BM have been inconsistent. The fourth study of the present thesis explored this role by assessing BM perception in patients with distinct ischemic cerebellar lesions. Perceptual performance was investigated in an experimental task testing the threshold for detecting BM masked by scrambled motion and in a control task testing detection of the direction of coherent motion masked by random noise. The results revealed clear evidence for a differential contribution of the cerebellum to the perceptual analysis of coherent motion compared with BM. Whereas the patients’ ability to detect BM masked by scrambled motion was unaffected, their ability to discriminate the direction of coherent motion in random noise was substantially impaired. Based on this finding, it was concluded that intact cerebellar function is not a prerequisite for a preserved ability to detect BM. Since both the dorsal motion pathway and the ventral form pathway contribute to the visual perception of BM, it cannot be stated definitively whether cerebellar dysfunction affecting the dorsal pathway is compensated for by the unaffected ventral pathway or whether the perceptual analysis of BM is performed entirely without cerebellar contribution.

Taken together, the results of the studies included in this thesis contribute to a better understanding of the neuropsychological basis of BM perception with respect to a number of open questions. First, it was shown that the human visual system can extract information about the size of animate beings from the kinematics of their movement patterns. The relation between temporal and spatial parameters of movement kinematics, as defined by natural laws, has been suggested as a crucial feature of a sensory filter for the detection of animate beings. Second, the time course of the visual analysis of BM was clarified, and distinct processing stages could be linked to distinct neuronal structures, providing new insights into the underlying neural mechanisms. Third, the mental representations of one’s own movement patterns and those of other people were investigated. Evidence for a common coding of perception and action was provided, which further supports the assumption of a direct matching between visual representations and action representations. Fourth, the role of the cerebellum in the perceptual analysis of BM was explored with a lesion approach, clarifying inconsistent results from neuroimaging studies: cerebellar dysfunction was found not to affect the detection of BM.

The present findings have several implications for theories of BM perception. It is widely accepted that BM is processed as a special category, i.e., that animate motion is processed differently from inanimate motion. Based on the results of Study 1, it was suggested that the crucial feature making BM unique is the periodic exchange between potential and kinetic energy under the influence of gravity that is associated with animate motion patterns. The evolution of a sensory filter sensitive to this specific style of motion might be the reason for the very fast and efficient processing of BM.
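The link between gravity, limb length and movement timing can be illustrated with a textbook idealization that is not taken from the studies themselves: if a swinging limb is treated as a simple pendulum of length L in a gravitational field g, its period depends only on L and g. Under this assumption, a walker scaled up by a factor k moves its limbs more slowly by a factor of the square root of k. This is one way in which kinematics could carry information about absolute size. The Froude number, which is used in the dynamic similarity hypothesis of Alexander and Jayes (1983), expresses the same dependence on a characteristic leg length L and on g.

```latex
% Idealization (assumption): the swinging limb as a simple pendulum of length L
T = 2\pi\sqrt{\frac{L}{g}}, \qquad f = \frac{1}{T} = \frac{1}{2\pi}\sqrt{\frac{g}{L}}

% Scaling the walker by a factor k (L \to kL) with g unchanged
f' = \frac{1}{2\pi}\sqrt{\frac{g}{kL}} = \frac{f}{\sqrt{k}}
\quad\Rightarrow\quad k = 4 \;\Rightarrow\; f' = \tfrac{1}{2}f

% Froude number: gaits are dynamically similar when Fr is equal (Alexander & Jayes, 1983)
\mathrm{Fr} = \frac{v^{2}}{gL}
```

A geometrically similar limb treated as a physical rather than a simple pendulum has a period that still scales with the square root of L/g, so this scaling does not depend on the simple-pendulum idealization. Real gait, however, also depends on muscular drive, so these relations should be read only as approximations.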

Based on the findings from Study 2, two processing stages of the perceptual analysis of BM were proposed. The STS complex and the fusiform face area were mainly involved in the second processing stage, whereas attention-related areas seem to play an important role in the first stage. One might speculate that the second stage reflects the fine-grained analysis of BM. In contrast, the first stage might reflect a pop-out effect of BM caused by activation of the neural correlate of a sensory filter for the BM features specified above.

Theories of action perception have proposed a strong interaction between visual and motor representations of movement patterns. This view was supported by the psychophysical findings from Study 3, which showed a viewpoint-invariant representation of one’s own movement pattern, in contrast to the viewpoint-dependent representation of other persons’ movement patterns. Therefore, the perception of motor actions does not seem to depend exclusively on sensory processing, but might be facilitated by the activation of premotor representations.

The analysis of the role of the cerebellum in BM perception showed that intact cerebellar function is not a prerequisite for a preserved ability to detect BM. This finding has some implications for the understanding of the neuronal correlates of BM perception. Since motion perception per se, a function of the dorsal visual stream, is affected by cerebellar dysfunction, intact BM perception in patients with distinct cerebellar lesions can be explained either by intact processing in the ventral stream compensating for impaired processing in the dorsal stream, or by a lack of cerebellar involvement in BM perception.

The findings from the studies included in this thesis might stimulate future research on BM in several directions. The hypothesis concerning the features of a sensory filter for the detection of BM has received initial empirical support. Such a sensory filter might be specified by the periodic exchange between potential and kinetic energy. It would be interesting to design different classes of stimuli for psychophysical experiments in order to isolate the crucial features of the sensory filter for BM. Subsequently, the neural activity elicited by these stimulus classes could be investigated in neuroimaging experiments in order to explore the neural correlates of such a sensory filter.

Using a psychophysical approach, this thesis has shown differences between the mental representations of one’s own movement pattern and those of other persons. It would be interesting to examine the neural correlates of this difference by means of neuroimaging. If the predictions derived from the psychophysical findings are correct, BM animations of one’s own movement patterns should elicit stronger premotor involvement than animations of other persons’ movement patterns. Information from BM can be used for different purposes, e.g., for action understanding or for social perception. It would be interesting to know to what extent BM perception for different purposes recruits the same neural structures or relies on different neural networks. Similarly, hemispheric asymmetries associated with BM perception for different purposes would be interesting to explore in future research.

VII References

Ackermann, H., Graber, S., Hertrich, I., & Daum, I. (1999). Cerebellar contributions to

the perception of temporal cues within the speech and nonspeech domain. Brain

and Language, 67(3), 228-241.

Adolphs, R. (1999). Social cognition and the human brain. Trends in Cognitive Sciences,

3(12), 469-479.

Adolphs, R. (2001). The neurobiology of social cognition. Current Opinion in

Neurobiology, 11(2), 231-239.

Adolphs, R. (2003). Investigating the cognitive neuroscience of social behavior.

Neuropsychologia, 41(2), 119-126.

Alexander, R. M. (1977). Mechanics and scaling of terrestrial locomotion. In T. J.

Pedley (Ed.), Scale Effects in Animal Locomotion (pp. 93-110). New York:

Academic Press.

Alexander, R. M. (1984). The gaits of bipedal and quadrupedal animals. The

International Journal of Robotics Research, 3, 49-59.

Alexander, R. M. (1989). Optimization and gaits in the locomotion of vertebrates.

Physiological Reviews, 69(4), 1199-1227.

Alexander, R. M., & Jayes, A. S. (1983). A dynamic similarity hypothesis for the gaits

of quadrupedal mammals. Journal of Zoology, London, 201, 135-152.

Allison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: role

of the STS region. Trends in Cognitive Sciences, 4(7), 267-278.

Bach, M., & Ullrich, D. (1994). Motion adaptation governs the shape of motion-evoked

cortical potentials. Vision Research, 34(12), 1541-1547.

Bach, M., & Ullrich, D. (1997). Contrast dependency of motion-onset and pattern-

reversal VEPs: interaction of stimulus type, recording site and response

component. Vision Research, 37(13), 1845-1849.


Barclay, C. D., Cutting, J. E., & Kozlowski, L. T. (1978). Temporal and spatial factors

in gait perception that influence gender recognition. Perception &

Psychophysics, 23, 145-152.

Battelli, L., Cavanagh, P., & Thornton, I. M. (2003). Perception of biological motion in

parietal patients. Neuropsychologia, 41(13), 1808-1816.

Beardsworth, T., & Buckner, T. (1981). The ability to recognize oneself from a video

recording of one's movements without seeing one's body. Bulletin of the

Psychonomic Society, 18, 19-22.

Beauchamp, M. S., Lee, K. E., Haxby, J. V., & Martin, A. (2003). FMRI responses to

video and point-light displays of moving humans and manipulable objects.

Journal of Cognitive Neuroscience, 15(7), 991-1001.

Beintema, J. A., & Lappe, M. (2002). Perception of biological motion without local

image motion. Proceedings of the National Academy of Sciences, 99(8), 5661-

5663.

Bentin, S., Allison, T., Puce, A., Perez, E., & McCarthy, G. (1996).

Electrophysiological studies of face perception in humans. Journal of Cognitive

Neuroscience, 8(6), 551-565.

Bertenthal, B. I., & Pinto, J. (1994). Global processing of biological motions.

Psychological Science, 5(4), 221-225.

Bertenthal, B. I., Proffitt, D. R., & Kramer, S. J. (1987). Perception of biological motion

by infants: implementation of various processing constraints. Journal of

Experimental Psychology: Human Perception and Performance, 13, 577-585.

Bertenthal, B. I., Proffitt, D. R., Spetner, N. B., & Thomas, M. A. (1985). The

development of infant sensitivity to biomechanical motions. Child Development,

56(3), 531-543.

Biederman, I., & Gerhardstein, P. C. (1993). Recognizing depth-rotated objects:

evidence and conditions for three-dimensional viewpoint invariance [published

erratum appears in Journal of Experimental Psychology: Human Perception and

Performance 1994 Feb;20(1):80]. Journal of Experimental Psychology: Human

Perception and Performance, 19(6), 1162-1182.


Biederman, I., & Gerhardstein, P. C. (1995). Viewpoint-dependent mechanisms in

visual object recognition - reply to Tarr and Bülthoff (1995). Journal of

Experimental Psychology: Human Perception and Performance, 21(6), 1506-1514.

Bingham, G. P. (1987). Kinematic form and scaling: further investigations on the visual

perception of lifted weight. Journal of Experimental Psychology: Human

Perception and Performance, 13(2), 155-177.

Bingham, G. P. (1993a). Perceiving the size of trees: biological form and the horizon

ratio. Perception & Psychophysics, 54(4), 485-495.

Bingham, G. P. (1993b). Perceiving the size of trees: Form as information about scale.

Journal of Experimental Psychology: Human Perception and Performance, 19,

1139-1161.

Bingham, G. P. (1993c). Scaling judgments of lifted weight: Lifter size and the role of

the standard. Ecological Psychology, 5, 31-64.

Blake, R. (1993). Cats perceive biological motion. Psychological Science, 4(1), 54-57.

Blakemore, S. J., & Decety, J. (2001). From the perception of action to the

understanding of intention. Nature Reviews Neuroscience, 2(8), 561-567.

Bonda, E., Petrides, M., Ostry, D., & Evans, A. (1996). Specific involvement of human

parietal systems and the amygdala in the perception of biological motion.

Journal of Neuroscience, 16(11), 3737-3744.

Bower, J. M. (1997). Control of sensory data acquisition. In J. D. Schmahmann (Ed.),

The cerebellum and cognition (Vol. 41, pp. 490-513). San Diego, London:

Academic Press.

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433-436.

Bruce, V., Valentine, T., & Baddeley, A. D. (1987). The basis of the 3/4 view advantage

in face recognition. Applied Cognitive Psychology, 1, 109-120.

Buccino, G., Binkofski, F., Fink, G. R., Fadiga, L., Fogassi, L., Gallese, V., et al.

(2001). Action observation activates premotor and parietal areas in a

somatotopic manner: an fMRI study. European Journal of Neuroscience, 13(2),

400-404.


Bülthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional

view interpolation theory of object recognition. Proceedings of the National

Academy of Sciences, 89, 60-64.

Bülthoff, I., Bülthoff, H. H., & Sinha, P. (1998). Top-down influences on stereoscopic

depth-perception. Nature Neuroscience, 1(3), 254-257.

Bunge, S. A., Hazeltine, E., Scanlon, M. D., Rosen, A. C., & Gabrieli, J. D. (2002).

Dissociable contributions of prefrontal and parietal cortices to response

selection. Neuroimage, 17(3), 1562-1571.

Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C.,

McGuire, P. K., et al. (1997). Activation of auditory cortex during silent

lipreading. Science, 276(5312), 593-596.

Caramazza, A., & Shelton, J. R. (1998). Domain-specific knowledge systems in the

brain: the animate-inanimate distinction. Journal of Cognitive Neuroscience,

10(1), 1-34.

Carter, C. S., Braver, T. S., Barch, D. M., Botvinick, M. M., Noll, D., & Cohen, J. D.

(1998). Anterior cingulate cortex, error detection, and the online monitoring of

performance. Science, 280(5364), 747-749.

Cavagna, G. A., Thys, H., & Zamboni, A. (1976). The sources of external work in level

walking and running. Journal of Physiology, 262(3), 639-657.

Cavanagh, P., Labianca, A. T., & Thornton, I. M. (2001). Attention-based visual

routines: sprites. Cognition, 80(1-2), 47-60.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:

Erlbaum.

Cowey, A., & Vaina, L. M. (2000). Blindness to form from motion despite intact static

form perception and motion detection. Neuropsychologia, 38(5), 566-578.

Cutting, J. E. (1978). Generation of synthetic male and female walkers through

manipulation of a biomechanical invariant. Perception, 7(4), 393-405.

Cutting, J. E., & Kozlowski, L. T. (1977). Recognizing friends by their walk: Gait

perception without familiarity cues. Bulletin of the Psychonomic Society, 9(5),

353-356.


Dahl, G. (1972). Reduzierter Wechsler Intelligenztest (Short Version of the Wechsler

Intelligence Test). Meisenheim: Hain.

Damasio, A. R. (1996). The somatic marker hypothesis and the possible functions of the

prefrontal cortex. Philosophical Transactions of the Royal Society of London

Series B, 351(1346), 1413-1420.

Damasio, H., Grabowski, T., Frank, R., Galaburda, A. M., & Damasio, A. R. (1994).

The return of Phineas Gage: clues about the brain from the skull of a famous

patient. Science, 264(5162), 1102-1105.

Daum, I., Schugens, M. M., Ackermann, H., Lutzenberger, W., Dichgans, J., &

Birbaumer, N. (1993). Classical conditioning after cerebellar lesions in humans.

Behavioral Neuroscience, 107(5), 748-756.

Daum, I., Snitz, B. E., & Ackermann, H. (2001). Neuropsychological deficits in

cerebellar syndromes. International Review of Psychiatry, 13, 268-275.

Davidson, R. J., Putnam, K. M., & Larson, C. L. (2000). Dysfunction in the neural

circuitry of emotion regulation--a possible prelude to violence. Science,

289(5479), 591-594.

Decety, J. (1996). Do imagined and executed actions share the same neural substrate?

Brain Research - Cognitive Brain Research, 3(2), 87-93.

Decety, J., & Grezes, J. (1999). Neural mechanisms subserving the perception of human

actions. Trends in Cognitive Sciences, 3(5), 172-178.

Decety, J., Sjoholm, H., Ryding, E., Stenberg, G., & Ingvar, D. H. (1990). The

cerebellum participates in mental activity: tomographic measurements of

regional cerebral blood flow. Brain Research, 535(2), 313-317.

Dichgans, J. E., & Diener, H. C. (1984). Clinical evidence for functional

compartmentalisation of the cerebellum. In J. Bloedel, J. Dichgans & W. Precht

(Eds.), Cerebellar functions (pp. 126-147). Berlin: Springer.

Dittrich, W. H. (1993). Action categories and the perception of biological motion.

Perception, 22(1), 15-22.


Dittrich, W. H., Lea, S. E. G., Barrett, J., & Gurr, P. R. (1998). Categorization of natural

movements by pigeons - visual concept discrimination and biological motion.

Journal of the Experimental Analysis of Behavior, 70, 281-299.

Dittrich, W. H., Troscianko, T., Lea, S. E. G., & Morgan, D. (1996). Perception of

emotion from dynamic point-light displays represented in dance. Perception, 25,

727-738.

Eimer, M. (2000a). Effects of face inversion on the structural encoding and recognition

of faces. Evidence from event-related brain potentials. Brain Research -

Cognitive Brain Research, 10(1-2), 145-158.

Eimer, M. (2000b). Event-related brain potentials distinguish processing stages

involved in face perception and recognition. Clinical Neurophysiology, 111(4),

694-705.

Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis

program. Behavior Research Methods, Instruments, & Computers, 28, 1-11.

Fabre-Thorpe, M., Delorme, A., Marlot, C., & Thorpe, S. (2001). A limit to the speed of

processing in ultra-rapid visual categorization of novel natural scenes. Journal of

Cognitive Neuroscience, 13(2), 171-180.

Fadiga, L., & Craighero, L. (2003). New insights on sensorimotor integration: from

hand action to speech perception. Brain and Cognition, 53(3), 514-524.

Fadiga, L., Fogassi, L., Pavesi, G., & Rizzolatti, G. (1995). Motor facilitation during

action observation: a magnetic stimulation study. Journal of Neurophysiology,

73(6), 2608-2611.

Foster, D. H., & Gilson, S. J. (2002). Recognizing novel three-dimensional objects by

summing signals from parts and views. Proceedings of the Royal Society of

London Series B, 269(1503), 1939-1947.

Fox, R., & McDaniel, C. (1982). The perception of biological motion by human infants.

Science, 218(4571), 486-487.


Frith, C. D., & Frith, U. (1999). Interacting minds--a biological basis. Science,

286(5445), 1692-1695.

Frith, U., & Frith, C. D. (2003). Development and neurophysiology of mentalizing.

Philosophical Transactions of the Royal Society of London Series B, 358(1431),

459-473.

Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the

premotor cortex. Brain, 119, 593-609.

Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12), 493-501.

Gangitano, M., Mottaghy, F. M., & Pascual-Leone, A. (2001). Phase-specific

modulation of cortical motor output during movement observation. Neuroreport,

12(7), 1489-1492.

Gao, J. H., Parsons, L. M., Bower, J. M., Xiong, J., Li, J., & Fox, P. T. (1996).

Cerebellum implicated in sensory acquisition and discrimination rather than

motor control. Science, 272(5261), 545-547.

Giese, M. A., & Poggio, T. (2003). Neural mechanisms for the recognition of biological

movements. Nature Reviews Neuroscience, 4(3), 179-192.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and

action. Trends in Neurosciences, 15(1), 20-25.

Grafton, S. T., Arbib, M. A., Fadiga, L., & Rizzolatti, G. (1996). Localization of grasp

representations in humans by positron emission tomography. 2. Observation

compared with imagination. Experimental Brain Research, 112(1), 103-111.

Grezes, J., Costes, N., & Decety, J. (1998). Top-down effect of strategy on the

perception of human biological motion: A PET investigation. Cognitive

Neuropsychology, 15(6-8), 553-582.

Grezes, J., Fonlupt, P., Bertenthal, B. I., Delon-Martin, C., Segebarth, C., & Decety, J.

(2001). Does perception of biological motion rely on specific brain regions?

Neuroimage, 13(5), 775-785.

Grossman, E. D., & Blake, R. (2001). Brain activity evoked by inverted and imagined

biological motion. Vision Research, 41(10-11), 1475-1482.


Grossman, E. D., & Blake, R. (2002). Brain areas active during visual perception of

biological motion. Neuron, 35(6), 1167-1175.

Grossman, E. D., Donnelly, M., Price, R., Pickens, D., Morgan, V., Neighbor, G., et al.

(2000). Brain areas involved in perception of biological motion. Journal of

Cognitive Neuroscience, 12(5), 711-720.

Hanakawa, T., Immisch, I., Toma, K., Dimyan, M. A., Van Gelderen, P., & Hallett, M.

(2003). Functional properties of brain areas associated with motor execution and

imagery. Journal of Neurophysiology, 89(2), 989-1002.

Hari, R., Forss, N., Avikainen, S., Kirveskari, E., Salenius, S., & Rizzolatti, G. (1998).

Activation of human primary motor cortex during action observation: a

neuromagnetic study. Proceedings of the National Academy of Sciences, 95(25),

15061-15065.

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: what is it,

who has it, and how did it evolve? Science, 298(5598), 1569-1579.

Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural

system for face perception. Trends in Cognitive Sciences, 4(6), 223-233.

Hecht, H., Kaiser, M. K., & Banks, M. S. (1996). Gravitational acceleration as a cue for

absolute size and distance? Perception & Psychophysics, 58(7), 1066-1075.

Hill, H., & Bruce, V. (1996). Effects of lighting on the perception of facial surfaces.

Journal of Experimental Psychology: Human Perception and Performance,

22(4), 986-1004.

Hill, H., Schyns, P. G., & Akamatsu, S. (1997). Information and viewpoint dependence

in face recognition. Cognition, 62(2), 201-222.

Hirai, M., Fukushima, H., & Hiraki, K. (2003). An event-related potentials study of

biological motion perception in humans. Neuroscience Letters, 344(1), 41-44.

Hochstein, S., & Ahissar, M. (2002). View from the top: hierarchies and reverse

hierarchies in the visual system. Neuron, 36(5), 791-804.

Hoffman, D. D., & Flinchbaugh, B. E. (1982). The interpretation of biological motion.

Biological Cybernetics, 42(3), 195-204.


Hoffmann, M. B., Unsold, A. S., & Bach, M. (2001). Directional tuning of human

motion adaptation as reflected by the motion VEP. Vision Research, 41(17),

2187-2194.

Howard, R. J., Brammer, M., Wright, I., Woodruff, P. W., Bullmore, E. T., & Zeki, S.

(1996). A direct demonstration of functional specialization within motion-

related visual and auditory cortex of the human brain. Current Biology, 6(8),

1015-1019.

Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G.

(1999). Cortical mechanisms of human imitation. Science, 286, 2526-2528.

Ivry, R. B., & Diener, H. C. (1991). Impaired velocity perception in patients with

lesions of the cerebellum. Journal of Cognitive Neuroscience, 3(4), 355-366.

Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum

and event timing. Annals of the New York Academy of Sciences, 978, 302-317.

Jellema, T., & Perrett, D. I. (2003a). Cells in monkey STS responsive to articulated

body motions and consequent static posture: a case of implied motion?

Neuropsychologia, 41(13), 1728-1737.

Jellema, T., & Perrett, D. I. (2003b). Perceptual history influences neural responses to

face and body postures. Journal of Cognitive Neuroscience, 15(7), 961-971.

Johansson, G. (1973). Visual perception of biological motion and a model for its

analysis. Perception & Psychophysics, 14(2), 201-211.

Johansson, G. (1976). Spatio-temporal differentiation and integration in visual motion

perception. Psychological Research, 38, 379-393.

Jokisch, D., Midford, P. E., & Troje, N. F. (2001). Biological motion as a cue for the

perception of absolute size. [Abstract] Journal of Vision, 1(3), 357a,

http://journalofvision.org/1/3/357/, doi:10.1167/1.3.357.

Justus, T. C., & Ivry, R. B. (2001). The cognitive neuropsychology of the cerebellum.

International Review of Psychiatry, 13, 276-282.

Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science

(4th ed.). New York: McGraw-Hill.


Keele, S. W., & Ivry, R. (1990). Does the cerebellum provide a common computation

for diverse tasks? A timing hypothesis. Annals of the New York Academy of

Sciences, 608, 179-207; discussion 207-211.

Kleinke, C. L. (1986). Gaze and eye contact: a research review. Psychological Bulletin,

100(1), 78-100.

Kohler, E., Keysers, C., Umilta, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002).

Hearing sounds, understanding actions: action representation in mirror neurons.

Science, 297(5582), 846-848.

Kourtzi, Z., & Kanwisher, N. (2000). Activation in human MT/MST by static images

with implied motion. Journal of Cognitive Neuroscience, 12(1), 48-55.

Kozlowski, L. T., & Cutting, J. E. (1977). Recognizing the sex of a walker from a

dynamic point-light display. Perception & Psychophysics, 21(6), 575-580.

Kram, R., Domingo, A., & Ferris, D. P. (1997). Effect of reduced gravity on the

preferred walk-run transition speed. Journal of Experimental Biology, 200(Pt 4),

821-826.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).

Perception of the speech code. Psychological Review, 74(6), 431-461.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception

revised. Cognition, 21(1), 1-36.

Luft, A. R., Skalej, M., Stefanou, A., Klose, U., & Voigt, K. (1998). Comparing

motion- and imagery-related activation in the human cerebellum: a functional

MRI study. Human Brain Mapping, 6(2), 105-113.

MacLeod, C. M. (1988). Forgotten but not gone: Savings for pictures and words in

long-term memory. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 14(2), 195-212.

Mather, G., & Murdoch, L. (1994). Gender discrimination in biological motion displays

based on dynamic cues. Proceedings of the Royal Society of London Series B,

258, 273-279.


Mather, G., Radford, K., & West, S. (1992). Low-level visual processing of biological

motion. Proceedings of the Royal Society of London Series B, 249(1325), 149-

155.

Mather, G., & West, S. (1993). Recognition of animal locomotion from dynamic point-

light displays. Perception, 22(7), 759-766.

McConnell, D. S., Muchisky, M. M., & Bingham, G. P. (1998). The use of time and

trajectory forms as visual information about spatial scale in events. Perception

& Psychophysics, 60(7), 1175-1187.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature,

264(5588), 746-748.

McLeod, P., Dittrich, W., Driver, J., Perrett, D., & Zihl, J. (1996). Preserved and

impaired detection of structure from motion by a "motion-blind" patient. Visual

Cognition, 3(4), 363-391.

Mesulam, M. M., Nobre, A. C., Kim, Y. H., Parrish, T. B., & Gitelman, D. R. (2001).

Heterogeneity of cingulate contributions to spatial attention. Neuroimage, 13(6

Pt 1), 1065-1072.

Mitkin, A. A., & Pavlova, M. A. (1990). Changing a natural orientation: Recognition of

biological motion pattern by children and adults. Psychologische Beiträge,

32(1-2), 28-35.

Nawrot, M., & Rizzo, M. (1995). Motion perception deficits from midline cerebellar

lesions in human. Vision Research, 35(5), 723-731.

Nawrot, M., & Rizzo, M. (1998). Chronic motion perception deficits from midline

cerebellar lesions in human. Vision Research, 38(14), 2219-2224.

Neville, H. J., Bavelier, D., Corina, D., Rauschecker, J., Karni, A., Lalwani, A., et al.

(1998). Cerebral organization for language in deaf and hearing subjects:

biological constraints and effects of experience. Proceedings of the National

Academy of Sciences, 95(3), 922-929.

Oram, M. W., & Perrett, D. I. (1994). Responses of anterior superior temporal

polysensory (STPa) neurons to "biological motion" stimuli. Journal of Cognitive Neuroscience.

Oram, M. W., & Perrett, D. I. (1996). Integration of form and motion in the anterior

superior temporal polysensory area (STPa) of the macaque monkey. Journal of

Neurophysiology, 76(1), 109-129.

Pascual-Marqui, R. D., Lehmann, D., Koenig, T., Kochi, K., Merlo, M. C., Hell, D., et

al. (1999). Low resolution brain electromagnetic tomography (LORETA)

functional imaging in acute, neuroleptic-naive, first-episode, productive

schizophrenia. Psychiatry Research, 90(3), 169-179.

Pascual-Marqui, R. D., Michel, C. M., & Lehmann, D. (1994). Low resolution

electromagnetic tomography: a new method for localizing electrical activity in

the brain. International Journal of Psychophysiology, 18(1), 49-65.

Pavlova, M., Lutzenberger, W., Sokolov, A., & Birbaumer, N. (2004). Dissociable

cortical processing of recognizable and non-recognizable biological movement:

analysing gamma MEG activity. Cerebral Cortex, 14(2), 181-188.

Pavlova, M., & Sokolov, A. (2000). Orientation specificity in biological motion

perception. Perception & Psychophysics, 62(5), 889-899.

Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: transforming

numbers into movies. Spatial Vision, 10(4), 437-442.

Pelphrey, K. A., Mitchell, T. V., McKeown, M. J., Goldstein, J., Allison, T., &

McCarthy, G. (2003). Brain activity evoked by the perception of human

walking: controlling for meaningful coherent motion. Journal of Neuroscience,

23(17), 6819-6825.

Pennycuick, C. J. (1975). On the running of the gnu (Connochaetes taurinus) and other

animals. Journal of Experimental Biology, 63, 775-799.

Pinto, J., & Shiffrar, M. (1999). Subconfigurations of the human form in the perception

of biological motion displays. Acta Psychologica, 102(2-3), 293-318.

Pittenger, J. B. (1985). Estimation of pendulum length from information in motion.

Perception, 14(3), 247-256.

Pittenger, J. B. (1990). Detection of violations of the law of pendulum motion:

Observers' sensitivity to the relation between period and length. Ecological

Psychology, 2(1), 55-81.


Pittenger, J. B., & Todd, J. T. (1983). Perception of growth from changes in body

proportions. Journal of Experimental Psychology: Human Perception and

Performance, 9(6), 945-954.

Pollick, F. E., Paterson, H. M., Bruderlin, A., & Sanford, A. J. (2001). Perceiving affect

from arm movement. Cognition, 82(2), B51-61.

Posner, M. I., & DiGirolamo, G. J. (1998). Executive attention: Conflict, target

detection, and cognitive control. In R. Parasuraman (Ed.), The attentive brain

(pp. 401-423). Cambridge, Massachusetts: MIT Press.

Puce, A., & Perrett, D. (2003). Electrophysiology and brain imaging of biological

motion. Philosophical Transactions of the Royal Society of London Series B,

358(1431), 435-445.

Puce, A., Smith, A., & Allison, T. (2000). ERPs evoked by viewing facial movements.

Cognitive Neuropsychology, 17(1-3), 221-239.

Puce, A., Syngeniotis, A., Thompson, J. C., Abbott, D. F., Wheaton, K. J., & Castiello,

U. (2003). The human temporal lobe integrates facial form and motion: evidence

from fMRI and ERP studies. Neuroimage, 19(3), 861-869.

Regolin, L., Tommasi, L., & Vallortigara, G. (1999). Discrimination of point-light

animation sequences by newborn chicks. Perception, 28 Supplement, 23.

Rizzolatti, G., & Fadiga, L. (in press). The mirror-neuron system and action recognition.

In H. J. Freund, M. Jeannerod & M. Hallett (Eds.), Higher-order motor

disorders: from Neuroanatomy and Neurobiology to Clinical Neurology. New

York: Oxford University Press.

Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the

recognition of motor actions. Cognitive Brain Research, 3(2), 131-141.

Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Paulesu, E., Perani, D., et al.

(1996). Localization of grasp representations in humans by PET: 1. Observation

versus execution. Experimental Brain Research, 111(2), 246-252.

Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms

underlying the understanding and imitation of action. Nature Reviews



130

Rosenblum, L. D., Johnson, J. A., & Saldana, H. M. (1996). Point-light facial displays enhance comprehension of speech in noise. Journal of Speech and Hearing Research, 39(6), 1159-1170.

Rosenblum, L. D., & Saldana, H. M. (1996). An audiovisual test of kinematic primitives for visual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 22(2), 318-331.

Rossion, B., & Gauthier, I. (2002). How does the brain process upright and inverted faces? Behavioral and Cognitive Neuroscience Reviews, 1(1), 62-74.

Runeson, S., & Frykholm, G. (1981). Visual perception of lifted weight. Journal of Experimental Psychology: Human Perception and Performance, 7(4), 733-740.

Runeson, S., & Frykholm, G. (1983). Kinematic specification of dynamics as an informational basis for person-and-action perception: Expectation, gender recognition, and deceptive intention. Journal of Experimental Psychology: General, 112(4), 585-615.

Ryding, E., Decety, J., Sjoholm, H., Stenberg, G., & Ingvar, D. H. (1993). Motor imagery activates the cerebellum regionally. A SPECT rCBF study with 99mTc-HMPAO. Brain Research. Cognitive Brain Research, 1(2), 94-99.

Santi, A., Servos, P., Vatikiotis-Bateson, E., Kuratate, T., & Munhall, K. (2003). Perceiving biological motion: Dissociating visible speech from walking. Journal of Cognitive Neuroscience, 15(6), 800-809.

Saxberg, B. V. (1987a). Projected free fall trajectories. I. Theory and simulations. Biological Cybernetics, 56(2-3), 159-175.

Saxberg, B. V. (1987b). Projected free fall trajectories. II. Human experiments. Biological Cybernetics, 56(2-3), 177-184.

Saygin, A. P., Wilson, S. M., Hagler, D. J., Jr., Bates, E., & Sereno, M. I. (2004). Point-light biological motion perception activates human premotor cortex. Journal of Neuroscience, 24(27), 6181-6188.

Schenk, T., & Zihl, J. (1997). Visual motion perception after brain damage: II. Deficits in form-from-motion perception. Neuropsychologia, 35(9), 1299-1310.

Schmahmann, J. D., & Pandya, D. N. (1997). The cerebrocerebellar system. In J. D. Schmahmann (Ed.), The cerebellum and cognition (Vol. 41, pp. 31-60). San Diego, London: Academic Press.

Sedgwick, H. A. (1993). The effects of viewpoint on the virtual space of pictures. In S. R. Ellis, M. K. Kaiser & A. Grunwald (Eds.), Pictorial communication in virtual and real environments. New York: Taylor & Francis.

Servos, P., Osu, R., Santi, A., & Kawato, M. (2002). The neural substrates of biological motion perception: An fMRI study. Cerebral Cortex, 12(7), 772-782.

Shiffrar, M., & Freyd, J. J. (1990). Apparent motion of the human body. Psychological Science, 1(4), 257-264.

Shiffrar, M., & Freyd, J. J. (1993). Timing and apparent motion path choice with human body photographs. Psychological Science, 4, 379-384.

Shipley, T. F. (2003). The effect of object and event orientation on perception of biological motion. Psychological Science, 14(4), 377-380.

Small, D. M., Gitelman, D. R., Gregory, M. D., Nobre, A. C., Parrish, T. B., & Mesulam, M. M. (2003). The posterior cingulate and medial prefrontal cortex mediate the anticipatory allocation of spatial attention. Neuroimage, 18(3), 633-641.

Spencer, R. M., Zelaznik, H. N., Diedrichsen, J., & Ivry, R. B. (2003). Disrupted timing of discontinuous but not continuous movements by cerebellar lesions. Science, 300(5624), 1437-1439.

Stappers, P. J., & Waller, P. E. (1993). Using the free fall of objects under gravity for visual depth estimation. Bulletin of the Psychonomic Society, 31(2), 125-127.

Sumi, S. (1984). Upside-down presentation of the Johansson moving light-spot pattern. Perception, 13(3), 283-286.

Talairach, J., & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain. Thieme Medical Publishers.

Tarr, M. J., & Bülthoff, H. H. (1995). Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21(6), 1494-1505.

Thier, P., Haarmeier, T., Treue, S., & Barash, S. (1999). Absence of a common functional denominator of visual disturbances in cerebellar disease. Brain, 122, 2133-2146.

Thompson, P. (1980). Margaret Thatcher: A new illusion. Perception, 9, 483-484.

Thornton, I. M., Rensink, R. A., & Shiffrar, M. (2002). Active versus passive processing of biological motion. Perception, 31(7), 837-853.

Thornton, I. M., & Vuong, Q. C. (2004). Incidental processing of biological motion. Current Biology, 14(12), 1084-1089.

Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381(6582), 520-522.

Topka, H., Valls-Sole, J., Massaquoi, S. G., & Hallett, M. (1993). Deficit in classical conditioning in patients with cerebellar degeneration. Brain, 116(Pt 4), 961-969.

Troje, N. F. (2002a). Decomposing biological motion: A framework for analysis and synthesis of human gait patterns. Journal of Vision, 2(5), 371-387.

Troje, N. F. (2002b). The little difference: Fourier based gender classification from biological motion. In R. P. Würtz & M. Lappe (Eds.), Dynamic Perception (pp. 115-120). Berlin: Aka Verlag.

Troje, N. F. (2003). Reference frames for orientation anisotropies in face recognition and biological-motion perception. Perception, 32(2), 201-210.

Troje, N. F., & Bülthoff, H. H. (1996). Face recognition under varying poses: The role of texture and shape. Vision Research, 36(12), 1761-1771.

Troje, N. F., & Kersten, D. (1999). Viewpoint-dependent recognition of familiar faces. Perception, 28(4), 483-487.

Troje, N. F., Westhoff, C., & Lavrov, M. (in press). Person identification from biological motion: Effects of structural and dynamic cues. Perception & Psychophysics.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. L. Ingle, J. W. Mansfield & M. A. Goodale (Eds.), Advances in the Analysis of Visual Behavior (pp. 549-596). Cambridge, MA: MIT Press.

Vaina, L. M. (1994). Functional segregation of color and motion processing in the human visual cortex: Clinical evidence. Cerebral Cortex, 4(5), 555-572.

Vaina, L. M., Lemay, M., Bienfang, D. C., Choi, A. Y., & Nakayama, K. (1990). Intact "biological motion" and "structure from motion" perception in a patient with impaired motion mechanisms: A case study. Visual Neuroscience, 5(4), 353-369.

Vaina, L. M., Solomon, J., Chowdhury, S., Sinha, P., & Belliveau, J. W. (2001). Functional neuroanatomy of biological motion perception in humans. Proceedings of the National Academy of Sciences, 11, 11.

Valentine, T. (1988). Upside-down faces: A review of the effect of inversion upon face recognition. British Journal of Psychology, 79(Pt 4), 471-491.

Warren, W. H., Jr., Kim, E. E., & Husney, R. (1987). The way the ball bounces: Visual and auditory perception of elasticity and control of the bounce pass. Perception, 16(3), 309-336.

Watson, J. S., Banks, M. S., von Hofsten, C., & Royden, C. S. (1992). Gravity as a monocular cue for perception of absolute distance and/or absolute size. Perception, 21(1), 69-76.

Webb, J. A., & Aggarwal, J. K. (1982). Structure from motion of rigid and jointed objects. Artificial Intelligence, 19(1), 107-130.

Wheaton, K. J., Pipingas, A., Silberstein, R. B., & Puce, A. (2001). Human neural responses elicited to observing the actions of others. Visual Neuroscience, 18(3), 401-406.

Woodruff-Pak, D. S., Papka, M., & Ivry, R. B. (1996). Cerebellar involvement in eyeblink classical conditioning in humans. Neuropsychology, 10, 443-458.

Yamaguchi, M. K., & Fujita, K. (1999). Perception of biological motion by newly hatched chicks and quail. Perception, 28 Supplement, 23-24.

Zimmermann, P., & Fimm, B. (1993). Testbatterie zur Aufmerksamkeitsprüfung [Test battery for attentional performance]. Würselen: PSYTEST.

List of Partial Publications

Jokisch, D., Daum, I., Suchan, B., & Troje, N. F. (in press). Structural encoding and recognition of biological motion: Evidence from event-related potentials and source analysis. Behavioural Brain Research.

Jokisch, D., Daum, I., & Troje, N. F. (submitted). Self recognition versus recognition of others by biological motion: Viewpoint-dependent effects. Perception.

Jokisch, D., & Troje, N. F. (2003). Biological motion as a cue for the perception of size. Journal of Vision, 3(4), 252-264.

Jokisch, D., Troje, N. F., Koch, B., Schwarz, M., & Daum, I. (submitted). Differential involvement of the cerebellum in biological and coherent motion perception. European Journal of Neuroscience.

Declaration

I guarantee that I have written this dissertation independently and without any illegitimate aids; the references and aids used are cited in their entirety. This dissertation has not been submitted to another faculty, and it has not yet been published, with the exception of the partial publications listed above. I guarantee that I will not publish the dissertation before completion of the doctoral procedure.

I have complied with the regulations laid down in the latest version of the “Guidelines for Good Scientific Practice and Procedural Principles for Dealing with Suspected Infringements in Academic Research Work”.

Acknowledgments

I would like to thank the many people who helped me during my doctoral work. First of all, I would like to thank my supervisors Prof. Dr. Irene Daum and Prof. Dr. Nikolaus F. Troje. Throughout my doctoral work they supported me with their great engagement, their excellent knowledge and their analytical skills.

I would like to thank the International Graduate School of Neuroscience of the Ruhr-University Bochum for providing great research and education opportunities and for funding my research and my attendance at international conferences.

I am grateful to my colleagues at the Department of Neuropsychology and the Biomotion Laboratory of the Institute of Cognitive Neuroscience for their assistance and for fruitful discussions of my work. Special thanks to my fellow graduate student Christian Bellebaum for proofreading and for helpful comments on a preliminary version of my dissertation. Thanks to Cord Westhoff for programming some of the stimuli used in my experiments.

I would like to thank my fellow graduate students at the International Graduate School of Neuroscience for their support during the entire period of our studies.

Last but not least, I am especially grateful to my girlfriend, my family and all my friends for their understanding and patience, and for helping me to keep my life in balance.

Curriculum Vitae

Personal Data

Name: Daniel Jokisch

Date of birth: 22.07.1975

Place of birth: Bottrop, Germany

Nationality: German

Address: Institute of Cognitive Neuroscience

Department of Neuropsychology

Ruhr-University Bochum

Universitätsstr. 150, 44780 Bochum

e-mail: [email protected]

Private address: Am Gartenkamp 18, 44807 Bochum

Educational background

1982-1995 Primary school and grammar school in Bottrop; degree: “Allgemeine Hochschulreife” (general university entrance qualification)

1995-1996 Alternative civilian service at the “Ambulanz Hilfe für das autistische Kind” in Bottrop, Germany

1996-1999 Psychology student at the University of Trier, Germany

October 1998 Intermediate diploma in psychology (cumulative grade “sehr gut”, i.e., “very good”)

1999-2001 Psychology student at the Ruhr-University Bochum, Germany; student assistant at the Department of Biopsychology

2000-2001 Diploma thesis “Visuelle Wahrnehmung von absoluter Größe in biologischer Bewegung” (Visual perception of absolute size in biological motion)

October 2001 Diploma in Psychology “with distinction”

2001-2004 PhD student at the “International Graduate School of Neuroscience” of the Ruhr-University Bochum

Since October 2004 Research assistant at the Department of Neuropsychology of the Institute of Cognitive Neuroscience of the Ruhr-University Bochum
