motion - preps verbs pustejovsky
8/2/2019 Motion - Preps Verbs Pustejovsky
1
Introduction
1.1 Overview
1.1.1 Motivation
Natural language abounds with descriptions of motion. This is hardly surprising,
since our environment teems with slithering, swimming, flying, and cruising crea-
tures that navigate in a world with natural elements that can spin, flow, slide, whirl,
etc. Our experience of our own motion, and our perception of motion in the world,
together have given human languages substantial means to verbally express many
different aspects of movement, including its temporal circumstances and its spatial
trajectory and its manner. In every language on earth, verbalizations of motion can
specify changes in the spatial position of an object over time. In addition to when and
where the motion takes place, languages additionally characterize how the motion
takes place: its path, its manner, how it was caused, etc. The path of motion, in
particular, involves conceptualizations of the various spatial relationships that an
object can have to other objects in the space it moves in.
Physicists and philosophers have long theorized about the nature of space and
spatial relationships. Newton (1995) believed that space has an existence independent
of physical objects, an absolute space that will remain always similar and immov-
able (Newton 1995, Scholium 3). Objects, in his account, occupy places that are part of
absolute space, which affords a universal coordinate system with objects and their
relationships being characterizable in terms of Euclidean geometry. This sort of model
of space underlies most of the classical, pre-relativistic analyses of motion in physics.
The conception of space found in natural languages is quite different. As we shall
see, it allows for positioning objects in terms of coordinate systems, but does not have
a built-in universal, absolute coordinate system that allows for precise specification of
object positions. (Of course, languages can in many cases specify relatively precise
positions by importing absolute coordinate systems.) Typically, a figure object is
expressed as being in a particular orientation (left, east, under, etc.) with respect
to another reference or ground object and possibly a third object, the viewer
(Levinson 2003). A figure object can also be positioned in terms of topological
relations (inside, separate from, etc.) along with distance from a ground object.
When objects are positioned without a reference object, the descriptions can indicate
paths in a coordinate system (to the east or seaward). Space in language, at least
in terms of the way it is revealed by the use of closed-class terms for topology and
orientation, seems to be parasitic on objects and the relations between them, and can
be broadly described as incorporating a relational view of space.1
This book articulates a new computational linguistics approach to understanding
natural language descriptions of motion. Our goals are theoretical as well as prag-
matic. From a theoretical standpoint, we aim to provide a semantic theory of motion
expressions that can be used for computation. This sort of theory involves mapping
motion descriptions in natural language to formal representations that computers
can automatically reason with. As we shall see, such reasoning uses qualitative
models of space and time, making inferences about changes in the positions of
objects over time. From an empirical standpoint, we want our theory to mesh well
with natural language data, and so we allow our computational methods to avail of
information found in text corpora.
The ability to create computer programs that can automatically process large
corpora containing descriptions of motion has an important practical consequence:
it allows us to map from texts to data representations that can be of immense value in
everyday life. For example, a system could take a set of verbal directions for getting to
a particular place, and automatically transform it into a map with trajectories marked
on it. Narratives of journeys taken today and long ago could be parsed into logs that
record where, when, and how the various segments of the journey were carried out.
Documents involving media such as pictures and videos that have associated linguis-
tic annotations can be analyzed so as to retrieve spatial, temporal, and motion-related
information from collections of such media on the Web.
In this chapter, we will first discuss the challenges in linguistic analysis and
inference that are faced by such systems. After outlining our technical approach,
we highlight two key insights that inform our work. The challenges and our approach
give rise to a set of requirements that have to be met, in our view, in order to achieve
success; this constitutes a short list of desiderata. Last but not least, all research builds
on the labor of others; to situate our work and convince the reader that we have
something interesting and plausible to say, we compare and contrast our work with
previous research in linguistics on spatial prepositions and motion verbs.
1 The natural language-derived relational view of space that we have sketched is often viewed as being in conformity with Leibniz's philosophy of space. Leibniz denies the reality of an absolute space "out there", arguing that space is a mental construct arising from an ordering of physical objects (like time, which he views as a mental construct arising from an ordering of events). Specifically, an object's physical location is determined by its relation to that of fixed (what we might call "ground") objects: "Particularly, that place is that, which is the same in different moments to different existent things, when their relations of co-existence with certain other existentes, which are supposed to continue fixed from one of those moments to the other, agree entirely together. [ . . . ] Lastly, space is that which results from places taken together." (Clarke 1717, p. 199; my elisions indicated by [ . . . ]). Leibniz's places are thus defined in terms of relations between objects, similar to the situation revealed in natural language usage. However, natural language and its analysis have nothing to say about the metaphysical question as to whether space exists or is a mental construct.
1.1.2 Challenges
In order to interpret motion expressions in natural language, each sentence has to be
first parsed along with morphological analysis, and once a syntactic structure is
arrived at and disambiguated from among alternative parses, the predicates and
their semantic arguments have to be identified, with the latter classified in terms of
their semantic roles (the agent of the event, the theme, the manner and path of
motion, etc.). To carry this out, the system must have knowledge of the morphology
and syntax of the language, as well as the mapping between the semantic arguments
of different lexical predicates on one hand, and on the other, the syntactic constitu-
ents (arguments) these predicates can combine with (i.e., subcategorize for) as well as
additional phrases (adjuncts) that co-occur with them in the sentence. This sort of
information is usually represented in a lexicon for the particular language. In
addition, the events must be anchored to the times they are purported to occur in.
For example, in the sentence "The Princess of Wales arrived at a Christmas concert
last night", the syntactic subject "The Princess of Wales" has to be identified as the
Theme of the predicate "arrive", "at a Christmas concert" as its Goal, and "last night"
as its Time. In addition, "last night" must be pegged to a time that is on the previous
night with respect to the speech time (which could of course be on the same day as
the speech time).
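The role assignment and time anchoring in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RoleFrame` class, the `anchor_last_night` helper, the speech time, and the convention that a night runs from 18:00 to 06:00 are all introduced here for the sketch and are not part of the approach developed in later chapters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass
class RoleFrame:
    """Hypothetical container for a predicate and its semantic roles."""
    predicate: str
    theme: str
    goal: str
    time: str


def anchor_last_night(speech_time: datetime) -> Tuple[datetime, datetime]:
    """Peg 'last night' to the most recent night that ended by speech time.

    Assumption (illustration only): a night runs from 18:00 to 06:00.
    """
    end = speech_time.replace(hour=6, minute=0, second=0, microsecond=0)
    if end > speech_time:
        # Speech happens before 06:00, so step back to the previous morning.
        end -= timedelta(days=1)
    start = end - timedelta(hours=12)
    return start, end


frame = RoleFrame(
    predicate="arrive",
    theme="The Princess of Wales",
    goal="at a Christmas concert",
    time="last night",
)
speech = datetime(2019, 12, 20, 10, 30)  # hypothetical speech time
start, end = anchor_last_night(speech)
print(frame.predicate, frame.theme, frame.goal, start, end)
```

With a mid-morning speech time, the anchored interval begins on the previous calendar day and ends on the same day as the speech time, which is the situation noted in the parenthetical remark above.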
Here tense has to be recognized. Some languages (like the Bantu language Chi-
Bemba) have several past and future tenses; some, like Mandarin Chinese, do not
have grammatical tense; and still others like Burmese distinguish only between
ongoing or past events and others. These apparent linguistic peculiarities (which
are in fact entirely normal for the speakers of those languages) have to be taken into
account, along with context, to situate the event with respect to the speech time.
Events also have to be ordered with respect to each other, which can be non-trivial
when events are narrated in an order different from that of their occurrence. The
results of these inferences have to be represented in terms of an inventory of temporal
relations that is drawn from some calculus that deals with orderings in time. Time
expressions must also be resolved, to calendar times where possible.
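To make the idea of an inventory of temporal relations concrete, one widely used example is Allen's interval algebra, which distinguishes thirteen relations between two intervals. The sketch below is a minimal illustration, assuming intervals are (start, end) pairs of numbers with start < end; the function name `allen_relation` and the sample intervals are ours, and nothing here is claimed to be the calculus adopted in this book.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds between intervals a and b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # The intervals overlap properly without sharing an endpoint.
    return "overlaps" if a1 < b1 else "overlapped-by"


# Events narrated in one order may have occurred in another:
# the arrival is mentioned first, but the departure precedes it.
arrival = (5.0, 6.0)
departure = (1.0, 2.0)
print(allen_relation(arrival, departure))
```

Because the relation is computed from the intervals themselves, the order in which events are narrated plays no role in the result.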
These inferential tasks can be fairly challenging for computational approaches,
because most narratives will not explicitly date each event, and when time and
date expressions are used, they may be anaphoric, i.e., relative to times introduced
earlier in the discourse (as in "arrived on Tuesday"). Further, the inventory of
temporal relations in the calculus used must be expressive enough to capture the
distinctions between temporal relations found in any natural language; and it is also
desirable to be able to carry out efficient computations using the calculus. This
reflects an important desideratum: the semantic representations need to be expressive
enough for natural languages, but also must be amenable to inference methods that
can be used in practical systems.
Turning to spatial information, spatial references in the form of place names
(toponyms) mentioned in text must be identified and, when geographic in nature,
resolved to particular entities such as countries, mountain ranges, cities, etc., and when
construed as points, resolved to geo-coordinates where possible. This resolution
process can involve considerable disambiguation, as humans naturally tend to reuse
names when naming places as well as other entities. Spatial relationships involving
topological, orientation, and distance relations between places must be recognized.
This too can be challenging, due in part to the ambiguity of prepositions and adver-
bials. The unraveling of directions, in particular, can be notoriously difficult, as any
driver navigating from others' helpful verbal directions can attest. In addition, some
languages have fairly elaborate inventories of closed-class terms for representing spatial
relations. For example, Talmy (2000) cites the (now extinct) Californian language
Atsugewi which has a set of suffixes appearing on the verb that mark some 50
distinctions of Ground geometries and the paths that relate to them. Some dozen of
these suffixes represent distinctions covered by the English preposition into, which
does not itself reflect such finer subdivisions (ibid., p. 192). As with time, these spatial
relations must be represented in terms of some calculus that characterizes orderings in
space. Such a calculus must, of course, also satisfy the desideratum above.
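The toponym resolution step mentioned at the start of this paragraph, and the ambiguity that comes from reused place names, can be illustrated with a toy example. This is a minimal sketch under loud assumptions: `GAZETTEER` is a hypothetical in-memory table with approximate coordinates, and the disambiguation heuristic (prefer a candidate whose country is mentioned in the context) is made up for illustration; real systems use far larger gazetteers and richer evidence.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, float, float]  # (country, latitude, longitude)

# Hypothetical toy gazetteer; coordinates are approximate.
GAZETTEER: Dict[str, List[Candidate]] = {
    "Paris": [("France", 48.86, 2.35), ("United States", 33.66, -95.56)],
    "Cambridge": [("United Kingdom", 52.21, 0.12), ("United States", 42.37, -71.11)],
}


def resolve_toponym(name: str, context: str) -> Optional[Candidate]:
    """Pick one gazetteer entry for a place name, using the context as a hint."""
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None
    lowered = context.lower()
    for candidate in candidates:
        if candidate[0].lower() in lowered:
            return candidate
    # No hint in the context: fall back to the first-listed candidate.
    return candidates[0]


print(resolve_toponym("Paris", "She drove to Paris in the United States."))
print(resolve_toponym("Paris", "A weekend trip to Paris."))
```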
The above inferences are just prerequisites for interpreting motion expressions.
Once the events are anchored to times, and the objects participating in the events are
located with respect to other objects in terms of spatial relations, motion events have
to be analyzed. In particular, information from the lexicon such as the class of the
motion verb must be brought to bear on the analysis; for example, run is a manner-
of-motion verb, while arrive is a path verb. This will allow the system to character-
ize motion events in terms of the event or situation involved in the change of
location, the object that is undergoing movement (the figure), the region (or path)
traversed through the motion, a distinguished point or region of the path (the
ground), the manner in which the change of location is carried out, and the medium
through which the motion takes place. Once the motion is grounded in this way by
linguistic analysis, qualitative reasoning tools must operate on the underlying repre-
sentation, allowing inferences to be made. Maps and other visualizations that track
the movements of entities may also be generated from the representation.
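The components listed in this paragraph can be pictured as fields of a simple record, with the verb class from the lexicon deciding which field the verb itself fills. The following is a rough sketch only: `VERB_CLASS`, `MotionEvent`, and `analyze` are hypothetical names, and the two lexicon entries are just the ones mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicon; the two entries follow the classes given in the text.
VERB_CLASS = {"run": "manner", "arrive": "path"}


@dataclass
class MotionEvent:
    """Illustrative record of the components of a motion event."""
    verb: str
    figure: str
    ground: Optional[str] = None
    path: Optional[str] = None
    manner: Optional[str] = None
    medium: Optional[str] = None


def analyze(verb: str, figure: str, ground: Optional[str] = None) -> MotionEvent:
    """Fill the manner or path slot according to the verb's lexical class."""
    event = MotionEvent(verb=verb, figure=figure, ground=ground)
    verb_class = VERB_CLASS.get(verb)
    if verb_class == "manner":
        event.manner = verb   # the verb encodes how the figure moves
    elif verb_class == "path":
        event.path = verb     # the verb encodes the path relative to the ground
    return event


print(analyze("run", "the biker"))
print(analyze("arrive", "The Princess of Wales", ground="a Christmas concert"))
```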
1.1.3 Approach
These requirements present a set of formidable problems for automatic interpretation
of motion expressions in language. However, writing in the second decade of the
21st century, we believe computational approaches have started to address these
challenges. The goal of our book is to flesh out a computational approach, addressing
for the first time in a systematic manner the integration of the language of motion
with qualitative reasoning. This integration is evaluated in terms of the desideratum
above, discussed in Chapters 3 and 4, highlighting gaps and outstanding problems.
We also indicate along the way, in Chapter 5, the performance accuracies of practical
systems.
Our approach integrates the linguistic conceptualizations with the formal
methods, mapping one to the other in the context of natural language processing.
Our approach is empirical, driven by instances of language use found in text collec-
tions (or corpora), especially the newsletters, travel blogs, route directions, etc., found
on the Web. In terms of methodology, these corpora are first annotated by humans
with features reflecting the kinds of linguistic distinctions and analyses mentioned
above. Computers then mine the annotated corpora to learn automatically how to
reproduce the annotations, using a variety of machine-learning tools. These annota-
tions are then mapped to the representations used by the formal models, allowing
reasoning to be carried out over motion information captured from natural language.
Throughout, the goal of satisfying the above desideratum is addressed to the extent
possible. The details of this methodology are described in Chapter 5.
The automatic systems that result from training on the annotated data offer both a
working embodiment of the theory and the modularity that it defines, as well as
practical tools that can interpret motion expressions in language and generate
visualizations including maps and sketches. From a theoretical standpoint, this
methodology allows linguistic theories to be tested empirically, both in terms of the
breadth of their applicability when faced with actual language use, as well as the
precise linguistic representation that should result for each example. This test also
involves measuring the reliability of humans in terms of the annotations that they
produce. In practical terms, the approach results in systems with a text-to-sketch
capability that can display tracks on a map of where a moving object has been at
particular times. For example, given a biker's travel blog as input, a map with tracks
could be generated as output. The resulting systems can be evaluated and compared
with each other, stimulating in turn the development of new and better methods.
In a nutshell, we offer an integrated perspective on how language structures
concepts of motion, and how the world shapes the way in which motion is linguistically
expressed. The book's approach is two-pronged: analysis of the details of
language use in different contexts (based on the exploitation of linguistic corpora),
along with theoretical modeling and formal reasoning (based on qualitative
representations).
While there has been a great deal of linguistics research on the semantics of motion
verbs as well as locative constructions, and considerable research on qualitative spatial
reasoning, there has been little interdisciplinary effort on trying to connect these two
fields in a systematic way. This is the first book, we believe, to analyze concepts of
motion in language while integrating these two fundamental points-of-view.
In the rest of this chapter, we outline two key insights that inform our approach.
After discussing our desiderata, to further situate our approach, we differentiate our
framework from other work in linguistics, as well as compare our classifications and
semantics for motion with other relevant approaches.
1.2 Key insights
1.2.1 Spatial abstractions
One of the key insights from prior research has to do with the types of conceptualization
needed to understand spatial language, e.g., Miller and Johnson-Laird (1976), Herskovits
(1986), Talmy (1983, 2000), among others. For example, research by Talmy (1983, 2000)
has characterized various primitive templates or schemas for representing motion. In a
description like (1), a complex spatial scene is abstracted as a geometric point (the
figure) moving towards another point (the ground) for a bounded temporal extent.
Likewise, a moving object may be described as a point moving along a path that is a line
(2), or as a line moving coaxially along the linear path (3).
(1) The ball rolled toward the lamp for 10 seconds.
(2) The ball rolled across the railway bed.
(3) The trickle flowed along the ledge.
The idealization is such that the speaker is able to abstract away from irrelevant
details such as the length or orientation of the path, representing each spatial scene
using a schema, and the hearer in turn is able to recreate the scenes from the schema.2
Talmy points out that these representations do not rely on Euclidean geometry and
the properties of metric spaces, emphasizing instead topological relations that remain
invariant irrespective of changes in sizes, distances, and shapes of the objects. He also
points out that while the expressions for the geometries of figure objects tend to be
limited in variety, the geometries of ground objects, by contrast, are less constrained
and vary considerably with the language, including bounded planes (e.g., the bike
sped across the field/around the track), cylindrical forms (the bike sped through
the tunnel), a wide variety of different types of enclosures (I crawled out the
window, I ran in the house), etc.
A related set of findings has to do with the differences across languages in the way
one can specify a figure object as being in a particular orientation (left, east,
under, etc.) with respect to another reference or ground object and possibly a third
object, the viewer. Studies of speakers across a wide variety of languages have
revealed a basic inventory of three types of geometric coordinate systems (frames
of reference) whose types are unevenly distributed, along with a variety of idiosyncratic
instantiations, across languages (Levinson 2003). The human ability to refer to
and pick out objects in space relies on these particular frames of reference. These are
discussed in more detail in Chapter 3.
2 The use of such intuitive geometries begs the question as to whether the points being idealized are in fact mathematical points. After all, natural language does not typically construe points in space or time as being dimensionless; instead, they are all conceived as having extent.
While understanding spatial descriptions appears to rely on interpreting such
topological and geometrical relationships, it is important to note that it does not
require precise geometries. Humans, after all, communicate successfully by and large
without specifying the relatively exact (e.g. GPS) positions of objects and their shapes.
We are able to describe and understand fairly elaborate motions, without needing to
drill down into equations that characterize the physical motions signaled by these
verbs. The use of imprecise and often incomplete qualitative geometric descriptions
(instead of quantitative ones such as specifying the coordinates and shapes of every
object) allows human communication to be highly efficient. Our communication
relies on a rich commonsense model of the world that has proved sufficient for
humans to survive and evolve until now.
In turn, this fact has hardly gone unnoticed in artificial intelligence research.
Having an artificial agent reason qualitatively allows for reasoning to be more
efficient in some situations, since abstracting away from numerical details allows
the agent to focus on more compact representations that isolate just the relevant
information needed to solve a particular problem. AI approaches to qualitative
reasoning have developed a rich set of geometric primitives for representing time,
space (including distance, orientation, and topological relations involving notions
such as contact and containment), and together with those, motion. The results of
such research have yielded a wide variety of spatial and temporal reasoning logics
and tools. Qualitative Spatial Reasoning has been successfully applied to military
sketch maps (Forbus et al. 2003), meteorology (Bailey-Kellogg and Zhao 2004), robot
navigation (Moratz and Wallgrün 2003), integration of sensor information for
environmental monitoring (Jung and Nittel 2008), etc.
In contrast, the primitives specified in the linguistic approaches above are not
expressive enough for formal computational reasoning. To address this gap, in
Chapter 3, we map the geometric and topological primitives and calculi used in
qualitative reasoning in a systematic manner to natural language. Our work thus
allows for more formal and expressive models to be constructed for linguistic
representations. Our innovations are similar in spirit to Miller and Johnson-Laird
(1976) and Johnson-Laird (1977), who argued that understanding of language involves
translating a sentence into an executable program. We are thus committed to
providing computationally expressive ways of representing motion expressed in
natural language, in particular subscribing to the idea that understanding motion
in language involves assembling and executing programs. However, the program-
ming framework we use, discussed in Chapter 4, involves precise formal logics
developed in computer science, rather than Miller and Johnson-Laird's early and
somewhat ad hoc procedural semantics.3
In section 1.3.3, we compare our approach to the semantics of motion with several other approaches.
1.2.2 Motion semantics: action- versus location-based predicates
Motion verbs, according to Talmy (1985, 1991, 2000), occur in syntactic constructions
that express several semantic components: (i) a Figure object that moves with respect
to (ii) a Ground object, along a spatial region, called (iii) the Path. There are also two
additional components (called co-events, in keeping with his view that they are
construable as distinct events): (iv) the Manner of the movement and (v) the Cause
that is responsible for the motion.
A further distinction that Talmy makes (one that is largely borne out by cross-
linguistic research) is that languages have two distinct strategies for expressing
concepts of motion. In satellite-framing, commonly used in English and other
Germanic languages, as well as Slavic languages, also called manner-type languages,
the main verb conflates (i.e., contains a morpheme that encodes) the manner or
cause of motion, while path information is expressed in satellites.4 Here a satellite is
any constituent other than a noun-phrase or prepositional-phrase complement that
is in a sister relation to the verb root (Talmy 2000, p. 102), and includes particles,
affixes, etc.5 Thus, in (4a), the language represents the motion as an action of
bouncing, with "slid/rolled/bounced" expressing the manner of the motion,
and the path being expressed by the satellite "down".6 In contrast, in verb-framing,
found in Turkish, Romance, Semitic, and other languages, also called path-type
languages, the verb conflates the path, whereas the manner is optionally expressed
by adjuncts, as in the Spanish (4b).
3 The procedural semantics of Miller and Johnson-Laird (1976) is based on primitive routines such as finding in a search domain an entity referred to by a natural language description, testing if the particular properties predicated by the description hold of it, and acting so as to make the description be true of the entity.
4 Such manner-of-motion verbs are extremely common in English, as attested by the long list of such verbs in the verb classification of Levin (1993).
5 Talmy (1991) characterized satellites in more detail: "The satellite, which can be either a bound affix or a free word, is thus intended to encompass all of the following grammatical forms, which traditionally have been largely treated independently of each other: English verb particles, German separable and inseparable verb prefixes, Latin or Russian verb prefixes, Chinese verb complements, Caddo incorporated nouns and Atsugewi polysynthetic affixes around the verb root." (Talmy 1991, p. 486).
6 Likewise, in "the napkin blew off the table", the verb conflates the Cause of the motion, with the path being expressed by the satellite "off". In addition to Manner/Cause and Path conflation, Talmy (1985) points out that verbs can also conflate Figure information, as in the Atsugewi verb root -caq-, which means "for a slimy lumpish object (e.g., a toad, a cow-dropping) to move/be located".
(4a) The rock slid/rolled/bounced down the hill.
(4b) La botella entró a la cueva (flotando)
the bottle moved-in to the cave (floating)
The bottle floated into the cave.
Here the language represents the motion as a change of location. Note that there
are exceptions; English has Romance-derived verbs like enter, arrive, ascend
etc. that encode path. As Talmy (1985) points out, the (small number of) verbs in
English that conflate Path are mostly Romance borrowings.
Now, various scholars including Talmy have recognized that this classification is
not quite disjoint. For example, in languages involving serial verb compounds, like
Lahu, Thai, and Mandarin Chinese (Slobin 2004), it is unclear which one is the main
verb; and in Native American language families such as Hokan and Penutian, path
and manner morphemes together form part of a verb complex, with neither one
being classifiable as a main verb or satellite (Delancey 1989). Also, in the Australian
language Jaminjung, motion is expressed by one of five core verbs combined with
preverbs that encode both path and manner with neither one being of subordinate
status (Schultze-Berndt 2000). All such languages have been designated by Slobin
(2004) as belonging to a third category instantiating equipollent-framing, where
both manner and path are equally salient. In response, Talmy (2009) has accepted
that cases of equipollent framing definitely exist. For example, based on a set of
linguistic criteria for what constitutes a main verb, he points out that in the case of
Mandarin serial verbs, the verb in the first position is clearly the main verb, while the
verb in second position is sometimes viewed as subordinate, and sometimes a main
verb (in the latter case, demonstrating equipollent framing). However, such instances,
he shows, are relatively rare.
Given this qualified but fundamental linguistic distinction,7 the semantic repre-
sentations for verbs can involve two classes of logical predicates: action-based
predicates (e.g., manner-of-motion verbs found in satellite-framing patterns, like
bike, drive, fly, etc.) and location-based predicates (e.g. for path verbs found in
verb-framing patterns, such as arrive, depart, etc.). Action-based predicates do
not make reference to distinguished locations, but rather to the assignment and
reassignment of locations of the object, through the action. Since the location-based
predicates focus on points on a path, we view them as making reference to a
distinguished location, and the location of the moving object is tested to check
its relation to this distinguished value.
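The contrast between the two predicate classes can be made tangible with a toy trajectory. This is a minimal sketch, assuming a trajectory is simply a list of grid locations; the function names `moves`, `arrives_at`, and `departs_from` are hypothetical and stand in for the much richer logical predicates developed later in the book.

```python
from typing import List, Tuple

Location = Tuple[int, int]


def moves(trajectory: List[Location]) -> bool:
    """Action-based test: the location is reassigned at least once.

    No distinguished location is consulted.
    """
    return any(a != b for a, b in zip(trajectory, trajectory[1:]))


def arrives_at(trajectory: List[Location], goal: Location) -> bool:
    """Location-based test: the object's location is checked against a
    distinguished location (the goal) at the start and end of the path."""
    return len(trajectory) > 1 and trajectory[0] != goal and trajectory[-1] == goal


def departs_from(trajectory: List[Location], source: Location) -> bool:
    """Location-based test against a distinguished starting location."""
    return len(trajectory) > 1 and trajectory[0] == source and trajectory[-1] != source


track = [(0, 0), (1, 0), (2, 0)]
print(moves(track), arrives_at(track, (2, 0)), departs_from(track, (0, 0)))
```

The action-based test looks only at successive reassignments of location, whereas the two location-based tests cannot even be stated without naming a distinguished location.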
The predicate semantics makes use of Dynamic Interval Temporal Logic (DITL)
from Pustejovsky and Moszkowicz (2011), which in turn blends dynamic logic (Harel
1984) with a first-order linear temporal logic (Allen, 1984; Moszkowski, 1986; Manna
and Pnueli, 1995; Kröger and Merz, 2008). DITL is a hybrid, first-order dynamic logic
where events are modeled as either dynamic processes or static situations. Here event
expressions refer to simple or complex programs, and states refer to preconditions or
post-conditions of these programs. Assignment-of-location is modeled as an atomic
program, and change-of-location is modeled as a compound program, whose
relation is determined compositionally by the relations denoted by its atomic parts.
This approach to modeling the semantics of motion is discussed in more depth in
Chapter 4.
7 For equipollent languages, our semantic representation will thus have to make use of a combination of action- and location-based predicates.
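The idea of atomic and compound programs can be pictured with ordinary functions over states. The sketch below is not DITL; it is a toy functional rendering, assuming a state is a dictionary from object names to location labels. The names `assign`, `seq`, and `at`, and the location labels, are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, str]              # object name -> location label
Program = Callable[[State], State]  # a program maps a state to a new state


def assign(obj: str, loc: str) -> Program:
    """Atomic program: (re)assign a location to an object."""
    def run(state: State) -> State:
        new_state = dict(state)     # leave the input state untouched
        new_state[obj] = loc
        return new_state
    return run


def seq(*programs: Program) -> Program:
    """Compound program: run the parts in order, composing their effects."""
    def run(state: State) -> State:
        for program in programs:
            state = program(state)
        return state
    return run


def at(obj: str, loc: str) -> Callable[[State], bool]:
    """State test usable as a precondition or post-condition."""
    return lambda state: state.get(obj) == loc


# Change-of-location as a sequence of location assignments.
float_into_cave = seq(
    assign("bottle", "river"),
    assign("bottle", "cave_mouth"),
    assign("bottle", "cave"),
)
before = {"bottle": "shore"}
after = float_into_cave(before)
print(at("bottle", "shore")(before), at("bottle", "cave")(after))
```

The effect of the compound program is fixed entirely by the effects of its atomic parts and the order in which they are combined, which is the compositional property described above.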
There are obvious subtypes of action-based predicates, due, for example, to the
type of vehicle involved in the motion (bike, drive, etc.). Just as important are
aspects of manner defined in terms of topological constraints between the objects
throughout the motion. Consider a figure object that is moving with respect to a
ground object. Here we can consider four subclasses, based on the orientation of the
figure with respect to the ground, whether the topological relation is constant
throughout the process of motion, whether it involves all of the figure or only a
part thereof, and characteristics of the medium in which the figure moves.
Similarly, location-based predicates can be differentiated according to how many
formal qualitative dimensions are involved in their definitions. For example, the
simplest path is merely an implicit line associated with a distinguished end or
start point, as in the case of the topological path verbs arrive, exit, take off,
etc. This can be further refined to make reference to orientation or direction, as
in the orientation path verbs climb and descend, metric information, as in
the topometric verbs approach, near, etc., or a combination of both, as in the
topometric orientation expressions just below or just above.
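The four subclasses just described differ in which refinements they add to the basic topological path. As a rough illustration, they can be tabulated as follows; the dictionary names and the `refinements` helper are hypothetical, and the entries are limited to the examples given in the text.

```python
from typing import Dict, FrozenSet

# Refinements that each class adds to the basic topological path.
REFINEMENTS: Dict[str, FrozenSet[str]] = {
    "topological": frozenset(),
    "orientation": frozenset({"orientation"}),
    "topometric": frozenset({"metric"}),
    "topometric orientation": frozenset({"orientation", "metric"}),
}

# Example expressions from the text, mapped to their class.
PATH_CLASS: Dict[str, str] = {
    "arrive": "topological",
    "exit": "topological",
    "take off": "topological",
    "climb": "orientation",
    "descend": "orientation",
    "approach": "topometric",
    "near": "topometric",
    "just below": "topometric orientation",
    "just above": "topometric orientation",
}


def refinements(expression: str) -> FrozenSet[str]:
    """Return the refinements involved in a location-based expression."""
    return REFINEMENTS[PATH_CLASS[expression]]


for expression in ("arrive", "climb", "approach", "just below"):
    print(expression, sorted(refinements(expression)))
```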
In this book, we will examine how these categories and subcategories of motion
predicates are expressed through qualitative spatial and temporal models. In the next
section, we critically assess, in the light of our approach, prior work on the semantics
of spatial prepositions, verb classification, and motion verb semantics.
1.3 Desiderata
The challenges we identified earlier can only be met if we constrain our approach to
meet some strict requirements. These have to be borne in mind when we assess any
technical approach, both ours as well as that of other research. We list these now,
while delving into them further throughout this chapter and book.
1. As mentioned earlier: the semantic representations need to be expressive
enough for natural languages, but also must be amenable to inference methods
that can be used in practical systems.
2. The semantic theory must be denotational, i.e. provide a mapping in terms of a
model of things in the world.
3. The semantic analysis must be compositional, i.e., the meaning of sentences
must be built up systematically from the meanings of the constituent phrases
and in turn the lexical elements in them, in tandem with the syntactic operations
that assemble them.
4. The representations used have to support qualitative reasoning.
5. The systems built must be evaluated to be accurate and efficient enough to
support practical applications.
1.4 Theoretical background
1.4.1 Spatial prepositions
1.4.1.1 Classic studies There has been considerable prior research on motion verbs
(e.g. run), spatial prepositions (across), adjectives (narrow), adverbs (far),
nouns (lake), proper names (San Francisco), and other locative constructions.
We focus here on spatial prepositions and adpositions. Two key issues emerge from
the prior research. The first issue is the nature of the spatial representations involved,
and the second issue is what exactly differentiates the different senses to produce
polysemy. Underlying them both is a third issue, the characteristics and properties of
a theory of meaning.
Prepositions are traditionally classified as either directional or locative (Miller and
Johnson-Laird 1976; Herskovits 1986; Zwarts and Winter 2000). Directional ones involve
a path and/or movement, and include across, around, from, into, onto, and
to. Locative prepositions are sub-classified into projective ones, which involve a point-
of-view (e.g. above, behind, below, beside, in front of, over, under) and
non-projective ones (e.g. at, between, in, inside, on, outside, near).
The work of Miller and Johnson-Laird (1976) represents a significant advance in the
modeling of the semantics of spatial prepositions. Consider their analysis of in as in (5):
(5a) a city in Sweden
(5b) the coffee in the cup
(5c) the spoon in the cup
(5d) the scratch in the surface
(5e) the bone in the leg
In (5a,b), the figure is entirely enclosed within the ground object, whereas in (5c)
part of the figure need not be enclosed in the ground. In (5b,c), the ground object is
conceptualized as some form of container. In (5d,e), the figure is entirely enclosed in the ground object, with (5d) dealing with two-dimensional (2D) objects and (5e)
dealing with three-dimensional (3D) objects. To handle these cases, Miller and
Johnson-Laird develop a semantic theory of parthood and topological relations, i.e.
mereotopology. In their account, in has a common meaning in the above uses: the
figure has a part that is totally inside the ground object.8 Providing a theory of
mereotopology, built, say, on primitive notions of connection and parthood, is essential, we believe, to characterizing spatial relations. Such a theory will be
discussed more in Chapter 2 and formalized in Chapter 3.
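The common meaning proposed for in can be made concrete with a small sketch. It is only an illustration under strong assumptions: regions are modeled as finite sets of grid cells, and the names has_part_inside and entirely_enclosed, as well as the toy scene, are hypothetical rather than part of Miller and Johnson-Laird's account or of the formalization given later in the book.

```python
# Regions are modeled as finite sets of grid cells. This is an illustrative
# assumption, not the representation used by Miller and Johnson-Laird.

def has_part_inside(figure, ground_interior):
    """True if some part of the figure lies wholly inside the ground."""
    # With cell sets, any shared cell is a part of the figure inside the ground.
    return bool(figure & ground_interior)


def entirely_enclosed(figure, ground_interior):
    """True if every part of the (non-empty) figure lies inside the ground."""
    return bool(figure) and figure <= ground_interior


# Hypothetical toy scene for (5b) and (5c): the cup's interior is a column of cells.
cup_interior = frozenset({(0, 0), (0, 1), (0, 2)})
coffee = frozenset({(0, 0), (0, 1)})
spoon = frozenset({(0, 1), (0, 2), (0, 3), (0, 4)})  # the handle sticks out

for name, figure in [("coffee", coffee), ("spoon", spoon)]:
    print(name, has_part_inside(figure, cup_interior),
          entirely_enclosed(figure, cup_interior))
```

Under these assumptions both the coffee and the spoon satisfy the weaker condition, while only the coffee is entirely enclosed, which mirrors the contrast between (5b) and (5c).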
Likewise, consider the uses of on in (6).
(6a) the scratch on the surface
(6b) the picture on the wall
(6c) the lamp on the table
(6d) the house on the river
(6e) the boat on the river
Miller and Johnson-Laird point out that in (6a-c), the relation is between surfaces.
In (6b), part of the figure is over a part of the ground (such as a hook), and the latter
part supports the rest of the figure. In (6c), if the table is on a rug, which is on the
floor, it is fine to say the table is on the floor, because the region of interaction with
the floor includes the table legs. But the transitivity is limited: we cannot say in (6c)
that the lamp is on the floor. Searching the region of interaction with the floor will
not reveal the lamp.
Functional notions such as support and regions of interaction (or affordances
of objects (Gibson 1977)) are part and parcel of a theory of spatial relations; in this
book, though we will take note of their presence, we will not be formally representing
functional notions, as they presuppose a great deal of commonsense knowledge that
is difficult to acquire and represent in a general way for use in practical systems. Of
course, in specific domains, it is possible to enumerate object-specific functional
properties (including shape). For example, in their natural language-driven scene
rendering system, Coyne and Sproat (2001) associate 3D regions called spatial tags
with objects, so that the object representing daisy has a stem spatial tag and
likewise test-tube a cup spatial tag. Given the input expression "the daisy is in the test tube", the graphical output has the daisy's stem inserted into the test tube's
cupped opening. A similar approach could be used to represent the meaning of (5c).
However, "his daisy is in the scrapbook" would presumably require an entirely
different spatial tag for daisy, begging the question of the enumeration of
domain-independent functional properties for each object.
Regarding (6d), it involves a path that is potentially ambiguous between being on
the edge of the ground object (the river) and being on the surface of the ground object
(where the surface is that part of the object that will reflect light to the eye or that can
be explored by touch), with a strong preference for the former (in contrast to (6e)).
Based on this and other evidence, Miller and Johnson-Laird argue that on has two
spatial meanings: either the figure is part of the region of interaction with the surface
of the ground object, with the ground supporting the figure, or else the figure object is construed as being in a path relation with the ground object.
8 In their semantic framework, the relations are between percepts of figure and ground, rather than between things in the world.
In subsequent research, Herskovits (1986) proposed underlying geometric mean-
ings for spatial prepositions in English involving geometric relations between figure
and ground objects; these relations are between objects construed as points, lines,
surfaces, volumes, and vectors. The preposition on in (7a), for example, involves
concepts of contiguity (the figure is next to and touches the ground object) and (as
we have seen) support (the ground object supports the figure). However, in (7b),
contrary to Miller and Johnson-Laird, she argues that support is not involved.
(7a) The book on the table.
(7b) The wrinkles on his forehead.
In addition, the objects related by a preposition must be modeled in terms of their
geometric properties, expressed as geometric functions that define characteristics of
the space occupied by the object. For example, a table is geometrically constrained to
be bounded and definite in shape, whereas water is not. Other geometric functions
include idealizations (approximations to a point, line, surface, or plane), parts (e.g.
edges, bases, surfaces, etc.), axes, volumes, projections, and what she calls good
form. For example, in (8a), good form provides the Gestalt closure on the tree such
that a bird can be contained in the space occupied by that form, shown in (8b), from
Pustejovsky (1989).
(8a) The bird in the tree.
(8b) Included-in (Part (Place (Bird)), Interior (Outline (VisiblePart (Place (Tree))))).
Turning to the issue of polysemy, Herskovits argues that (7a) above expresses an
ideal meaning of on, whose sense is shifted in (7b). Senses can also shift due to a
pragmatic degree of tolerance, i.e. to handle fuzzy cases of (7a) where the book is on a tablecloth which is in turn on the table. As a result, while an ideal meaning is semantic,
the actual senses in use are produced as pragmatic alterations to the ideal meaning.
From the standpoint of a theory of meaning, Herskovits' account rejects the notion
of a compositional theory. Further, although there is a sketch of a mereotopology,
there is no precise theory of how exactly the pragmatic alterations occur, resulting in
a lack of applicability to computational processes.
1.4.1.2 Cognitive linguistics Along with Herskovits' work, there has been a great
deal of activity in cognitive linguistics on the semantics of spatial prepositions. Here we will consider some of the core work from this area, while deferring a discussion of
Jackendoff's contributions to the next section.
One of the fundamental tenets of this rather diverse field is that human concepts
are embodied, i.e., the concepts we have access to and the nature of the reality we
think and talk about are a function of our embodiment (Evans et al. 2007, p. 7).
Following (Johnson 1987; Lakoff and Johnson 1980; Brugman 1981; Mandler 2004; Evans, op. cit.), basic topological concepts like contact and inclusion (in the spatial
sense of enclosure) are formed through the infant's interaction with objects. In this
account, it is the schema of the container which underlies both the enclosure or
inclusion sense of in in (9a) and its metaphorical extension in (9b).
(9a) The cat is in the house.
(9b) The cat is in trouble.
The nature of polysemy is a contentious issue in cognitive linguistics. Consider the
preposition over, which has been the subject of considerable discussion. The classic
account of Lakoff (1987) makes fine-grained sense distinctions for the preposition
based on characteristics of the figure and ground object. In (10a), the landmark (i.e.,
ground object) is an extended object, but not so in (10b) (examples from Tyler and
Evans 2001):
(10a) The helicopter hovered over the ocean.
(10b) The hummingbird hovered over the flower.
Likewise, in (11a) there is contact with the wall, whereas there is not in (11b); in (11c), there is covering and occlusion of the ground. These differences would
warrant, in the classic account, different senses for over.9
(11a) The boy climbed over the wall.
(11b) The tennis ball flew over the wall.
(11c) Joan nailed a board over the hole in the ceiling.
(11d) The heavy rains caused the river to flow over its banks.
In general, this sort of argument by appeal to arbitrary spatial distinctions proliferates
senses in a somewhat unprincipled manner. There is no underlying mereotopological theory, and hence no way of building up spatial concepts from more primitive ones.
Researchers have struggled to constrain the number of senses, using (quite sensibly)
dictionaries, lexical resources, and various theoretical criteria. For example,
Tyler and Evans (2001) take their cue from Herskovits and propose a proto-sense
(or primary sense) of every preposition that they argue is the diachronically earliest
sense;10 the proto-sense of over means above except that, unlike above, there is
potential contact with the ground. Notably, this sense does not contain path
information. The above and across interpretation in (11a) and (11b), which does
include the path, is not a different sense of over, but arises in conjunction with the
meaning of the verb and the figure and ground objects. In (11c), however, a non-primary
sense of over is differentiated, as it involves the distinct spatial notion of covering. In (11d), the sense is distinguished based on a supposedly distinct spatial
notion of excess given by a cognitive scenario of a container overflowing, with the
figure rising higher than the top of the ground object.
9 Examples in (11) from Tyler and Evans (2001, pp. 728, 732, 757).
10 Postulating the diachronically earliest sense as more basic in every case does not seem at all correct
given modern usage.
The Tyler and Evans proposal suffers from the same problems we observed with
Herskovits' account. Appealing to potential contact between figure and ground only
serves as a way of grouping together disjunctions. Further, (11d) does not seem to
warrant a different sense, given the contribution of the verb flow. In addition, as
Cuyckens (2007) points out, consider (12a) and (12b).
(12a) The cat jumped over the wall.
(12b) The cat jumped up on the wall.
The only syntactic difference is the preposition, but (12a) results in a different path
than (12b): the cat ends up on the wall in the latter, but on the other side of the wall
in the former. Thus over must involve a path meaning. Having said that, the
question arises as to the set of spatial properties that should be considered when
distinguishing spatial senses of a preposition. Unless these properties are drawn from
a structured domain, in particular geometric or topological domains that can be made mathematically precise, pretty much any set of spatial properties that sound
relevant might be used, since the theory has no way of evaluating them except by
arguments based on linguistic tests.
In general, the inability to find reliable criteria to differentiate word senses is also a
reflection of the lack of empirical, corpus-based methodology in the cognitive
linguistics approach. Corpus-level annotation of word senses is a well-established
task in computational linguistics, e.g. SENSEVAL-1 (Kilgarriff and Palmer 2000). In
these annotation efforts, fine-grained lexical resources such as WordNet (Fellbaum
1998), where different senses of words are grouped into synonym classes called synsets (with the classes being linked by conceptual relations such as hypernymy
and part-whole relations), have been used as sense inventories for annotating open-
class terms in large corpora. Certain senses will of course be more frequent than
others, and the more frequent ones may coincide with notions of central or more
salient meanings for a given word. (As it happens, WordNet provides a ranking of
different senses based on frequencies in the British National Corpus.) This sort of
project also has the practical benefit of dividing the problem of polysemy into those
word senses that are easy to agree on and those that aren't, focusing attention on the
ones that pose challenges, and perhaps suggesting revisions or limitations to the
sense inventory. In SENSEVAL-3 (Mihalcea and Edmonds 2004), annotators agreed
with each other almost two-thirds of the time.
Turning to the theory of meaning, cognitive linguistics is an inherently mentalistic
theory of meaning.11 In contrast, denotational theories12 are important for several
reasons: (i) Truth and reference are important for successful communication, as
work in discourse modeling, e.g. Kamp and Reyle (1993), indicates. (ii) Mentalistic theories tend not to tell us what role the things communicated
about play in understanding. As Putnam (1975) points out, a person may not have the conceptual
knowledge to tell the difference between a beech and an elm, even though the two
terms clearly refer to different things in the world. (iii) Using a logical representation
allows for logical inferences to be made, for formal properties of computation to be
studied systematically, etc. The latter property is of course of considerable interest to
computational approaches.
1.4.1.3 Jackendoff In our earlier linguistic analyses, we mentioned paths. In addition to Talmy, another cognitive linguist who provides a rich representation for paths is
Jackendoff (1983, 1990). In his theory of Lexical Conceptual Structure (LCS), the verbs
of location and motion are viewed as fundamentally spatial, with non-spatial senses
being an extension of the spatial senses. Jackendoff gives distinguished status to
places and paths in LCS.
Paths can be bounded, where the ground is the start- or end-point of the path.
Another type of path is a direction, as in (13a), where the ground object does not fall
on the path, but would if the path were extended some unspecified distance (ibid.,
p. 165). A third kind is a route, where the ground object is related to some point in the
interior of the path, as in (14a). Unlike Herskovits' account, Jackendoff's semantics
has an implicit mereotopology and is compositional. He relies on functions to
assemble meanings of words together to form meanings of phrases. A place-function
(e.g. IN, ON, INSIDE, UNDER, etc.) takes a Thing and returns a Place, while a path-
function (FROM, TO, TOWARD, AWAY-FROM, and VIA) takes either a Thing or
a Place and returns a Path. Examples of place- and path-functions are shown in the
prepositional phrase meanings in (13b) and (14b).
(13a) [John ran] toward the house.
(13b) [Path TOWARD ([Thing house])]
(14a) [The car passed] through the tunnel.
(14b) [Path VIA ([Place INSIDE ([Thing tunnel])])]
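The way place- and path-functions assemble phrase meanings can be sketched as typed constructors. This is a minimal illustration, not Jackendoff's own formalism: the class and function names (Thing, Place, Path, place, path) are our own choices, and the set of place-functions is only the partial list given above.

```python
from dataclasses import dataclass
from typing import Union

# Only the functions named in the text; the place-function list is partial.
PLACE_FUNCTIONS = {"IN", "ON", "INSIDE", "UNDER"}
PATH_FUNCTIONS = {"FROM", "TO", "TOWARD", "AWAY-FROM", "VIA"}


@dataclass(frozen=True)
class Thing:
    name: str

    def __str__(self):
        return f"[Thing {self.name}]"


@dataclass(frozen=True)
class Place:
    function: str
    ground: Thing

    def __str__(self):
        return f"[Place {self.function} ({self.ground})]"


@dataclass(frozen=True)
class Path:
    function: str
    reference: Union[Thing, Place]

    def __str__(self):
        return f"[Path {self.function} ({self.reference})]"


def place(function, ground):
    """A place-function takes a Thing and returns a Place."""
    if function not in PLACE_FUNCTIONS or not isinstance(ground, Thing):
        raise ValueError(f"ill-formed place: {function}")
    return Place(function, ground)


def path(function, reference):
    """A path-function takes a Thing or a Place and returns a Path."""
    if function not in PATH_FUNCTIONS or not isinstance(reference, (Thing, Place)):
        raise ValueError(f"ill-formed path: {function}")
    return Path(function, reference)


print(path("TOWARD", Thing("house")))                  # structure of (13b)
print(path("VIA", place("INSIDE", Thing("tunnel"))))   # structure of (14b)
```

The type checks capture only the compositional syntax; as in LCS itself, nothing in the sketch differentiates IN from ON semantically.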
11 Mentalistic, or representational, theories of meaning are concerned mainly with understanding the relation between linguistic expressions and things in the speaker's mind, namely, explaining what goes on in people's minds when they use language.
12 Denotational theories of meaning (i.e. as found in model-theoretic semantics) are concerned mainly with the correspondence between expressions and things in the environment, and thus this enterprise aims at a theory of truth and reference. Such theories represent the environment in terms of a formal model for the denotation of expressions.
While the semantics of LCS is obviously compositional, it is not intended to be
truth-conditional, and is thus in keeping with cognitive semantics precepts. Since it
has no basis in logic, Conceptual Structure cannot be used to make logical inferences,
and as such cannot account for entailments between sentences.13 Another drawback
is that the primitives corresponding to prepositions, such as IN, ON, TOWARD,
INSIDE, etc. are not further elaborated to support reasoning; they are functors in a
compositional syntax, but are not differentiated from each other in terms of semantics.
Finally, unlike the work of, say, Talmy (2000), the geometry used is far too
abstract to be relevant to computational modeling of spatial reference and motion.
1.4.1.4 Vector representations It must be acknowledged that Jackendoff's ontology
of paths and places and the differentiation between place- and path-functions
constitute one of the more expressive accounts of the semantics of spatial prepositions offered within an entirely compositional semantics. His basic notions of paths
have been further elaborated by others, most notably within a denotational semantics
by Zwarts (2003). In the latter's work, a spatial preposition denotes a set of paths,
where a path is defined as a continuous function from the real interval [0, 1] to points
(or regions) in space. The denotation of a prepositional phrase (PP) of the form into
the room is a set of paths whose end-point is inside the room. Zwarts associates
events with paths via a function that takes an event and returns its path. Accordingly,
the denotation of a verb phrase (VP) of the form enter the room is a set of events
such that (only) the end-point of the event's path is inside the room.
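As a rough illustration of this denotational view, a path can be coded as a function from [0, 1] to points, and membership in the denotation of "into the room" as a test on the path's end-point. The sketch below makes several assumptions of our own: space is two-dimensional, the room is an axis-aligned box, paths are straight lines, and the names straight_path, inside_box, and denotes_into are hypothetical.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
PathFn = Callable[[float], Point]          # a path maps t in [0, 1] to a point
Box = Tuple[Point, Point]                  # (lower-left corner, upper-right corner)


def straight_path(start: Point, end: Point) -> PathFn:
    """Return the straight-line path p with p(0) = start and p(1) = end."""
    def p(t: float) -> Point:
        return (start[0] + t * (end[0] - start[0]),
                start[1] + t * (end[1] - start[1]))
    return p


def inside_box(point: Point, box: Box) -> bool:
    """True if the point lies strictly inside the box."""
    (xmin, ymin), (xmax, ymax) = box
    return xmin < point[0] < xmax and ymin < point[1] < ymax


def denotes_into(path: PathFn, ground: Box) -> bool:
    """A path is in the denotation of 'into GROUND' if its end-point is inside."""
    return inside_box(path(1.0), ground)


room: Box = ((0.0, 0.0), (4.0, 4.0))
walking_in = straight_path((-3.0, 2.0), (2.0, 2.0))
walking_out = straight_path((2.0, 2.0), (-3.0, 2.0))
print(denotes_into(walking_in, room), denotes_into(walking_out, room))
```

The sketch tests only the end-point condition; checking that no earlier point of the path is inside the room would require reasoning over the whole interval, which sampling can only approximate.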
In support of this theory, relations like into, inside etc. are based on an
underlying model of vectors14 (Zwarts and Winter 2000). Here, the preposition
inside is treated as a function which maps a set of points representing the ground
object A to a set of vectors whose start-points are on the boundary of A and whose
end-points are internal to A. Since there may be multiple vectors from different
points on the boundary to the particular end-point, only the shortest vector is
considered. The set of points representing an object is treated as convex,15 in keeping
with our use of prepositions like inside to conceptualize even non-convex ground
objects as being convex. As Zwarts and Winter point out, "the ball is inside the bowl"
is compatible with a situation where the ball is sitting on the bottom of an open bowl,
where the ball actually occupies a space that is disjoint from that of the bowl.
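The mapping from a ground object to its set of inside vectors can be approximated on a grid. The following is a discretized sketch under our own assumptions, not Zwarts and Winter's continuous definition: the ground is a finite set of integer grid points, a boundary point is one with a missing four-connected neighbour, and the names boundary and inside_vectors are hypothetical.

```python
import math

NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def boundary(points):
    """Grid points of the ground that have at least one neighbour outside it."""
    return {p for p in points
            if any((p[0] + dx, p[1] + dy) not in points for dx, dy in NEIGHBOURS)}


def inside_vectors(points):
    """Map each interior point to the shortest vector reaching it from the boundary.

    A vector is represented as a (start, end) pair of grid points.
    """
    edge = sorted(boundary(points))        # sorted for deterministic tie-breaking
    vectors = {}
    for end in points - set(edge):
        start = min(edge, key=lambda s: math.dist(s, end))
        vectors[end] = (start, end)
    return vectors


# A convex ground object: a 5 x 5 square of grid points.
square = {(x, y) for x in range(5) for y in range(5)}
vectors = inside_vectors(square)
print(len(vectors), math.dist(*vectors[(2, 2)]))
```

Keeping only the shortest vector per end-point follows the restriction described above; the treatment of outside would differ only in taking end-points that do not belong to the ground.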
The preposition outside is similar, except that the externally closest vectors are
involved, i.e. the shortest vectors that start at the boundary of A and end at points
not belonging to A. As for the preposition on, its meaning is a set of vectors each of
whose end-points is outside the set of points corresponding to the figure object, but
whose length is less than some small number, so that the distance between figure and
ground is near zero.
13 However, a truth-conditional semantics for Conceptual Structure has been demonstrated by Zwarts
and Verkuyl (1994), who recast it as a many-sorted first-order logic.
14 Other researchers have also explored vectors, including Talmy (2000), Bohnemeyer (2003), O'Keefe (2003), and Carlson et al. (2003). However, they have not concerned themselves with building up a compositional semantics for spatial language based on vectors.
15 A set of points is convex if the line segment joining any pair of points in the set lies entirely in the set.
Although the theory of Zwarts and Winter (2000) does provide an elegant
compositional semantics for PPs, including those modified by measure phrases, it
can be faulted on several grounds. For one thing, though there are vectors and point
sets, there is no explicit mereotopology. The invocation of metric notions of distance
to represent topological relations is somewhat counter-intuitive. A related failing is
that the theory does not distinguish between in and inside, or between at and
on, and the case of (5c) mentioned earlier, where there is a part of the figure that is
outside the ground object, is ignored. Finally, carrying out formal reasoning using
these vector models is still an open question. In short, the theory does not provide an
adequate grounding in a spatial semantics that can be used for reasoning.
1.4.1.5 Assessment In summary, then, the prior theoretical research, while
providing insightful discussions of the semantics of spatial prepositions, has made
assumptions (such as those of cognitive linguistics) that are untenable in a computa-
tional approach, and has also largely ignored evidence from corpus-based annotation
efforts at distinguishing senses in context. While compositional treatments of prepo-
sitional meaning have flourished, the question of what underlying spatial primitives
to rely on has not thus far been tied to those available in qualitative reasoning
systems. In Chapter 3, we explore topological and geometric representations that
can be used for expressing prepositional meaning in qualitative reasoning systems.
1.4.2 Motion verbs
1.4.2.1 Langacker As with spatial prepositions, there has been a fair amount of
research on the semantics of motion verbs. We had earlier discussed the influential
work of Talmy and Jackendoff. Another key cognitive linguist who has tackled
motion is Langacker (1987). It is not possible to do justice to his overall cognitivist philosophy here; instead, let us get down to brass tacks and examine his analyses of
motion verbs. Consider the verb enter. Langacker (1987) characterizes it as a
dynamic process, whose conceptual semantics involves, in effect, a temporally indexed
sequence of relations between the trajector (i.e. moving figure object) and the
landmark (i.e. ground object, which may or may not move). The trajector changes
from a state of being spatially OUT with respect to the landmark to a state of being
IN with respect to the landmark. From his diagrams of image-schema16 (ibid.
p. 245, figures 7.1 and 7.2), it appears that this change of state occurs over a conceived
time interval, where the process involves a sequence of an indefinite number of
component states (ibid. p. 244). As for the relations IN and OUT, they are explained
informally as follows:
The relation [A IN B], based on immanence, specifies that the cognitive events constituting the conception of A (in a given domain) are included
among those comprised by B. The relation of separation, which I will give as [A OUT
B], is based on the absence of such inclusion. (ibid. p. 228).
16 An image schema is a mental pattern that recurrently provides structured understanding of various experiences, and is available for use in metaphor as a source domain to provide an understanding of yet other experiences (Johnson 1987, pp. 24).
In contrast, the verb arrive, according to Langacker (1987), presupposes an
extended path of motion on the part of its trajectory, but only the final portions of
this trajectory, those where the trajector enters the vicinity of its destination and
then reaches it, are specifically designated by this verb (ibid. p. 246).
Langacker's account does clearly capture some of our topological intuitions about
enter. However, his presentation relies on diagrams representing image-schema,
and there is no formal description of the process of entering. While one can accept
the idea of a primitive spatial relation IN standing for inclusion, characterizing it in
terms of relationships between cognitive events is somewhat vague. Further, there is
no clear distinction between enter and arrive, except by way of various diagrams
and the informal definitions above. More specifically, there is no statement that
arrive involves the trajector, at the end of the process, being merely AT the
landmark, as opposed to being IN the landmark as in the case of enter. This
problem is further borne out by his analysis of the verb leave: Langacker (1988, p. 96) indicates that the trajector is at first IN with respect to the landmark, and then
overlaps with its boundary (i.e. trajector is AT the landmark), before being OUT with
respect to the landmark. Here too, there is no difference from exit.
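The kind of explicit statement found missing here can be illustrated with a toy classifier over temporally ordered relations. This is a sketch under our own assumptions, not Langacker's analysis and not the formal semantics developed later in the book: the relation labels are restricted to IN, AT, and OUT, and the function name classify_motion is hypothetical.

```python
def classify_motion(states):
    """Name the verb pattern matched by a temporally ordered list of relations.

    Each state is the trajector's relation to the landmark: "IN", "AT", or "OUT".
    Only the initial and final states are inspected.
    """
    if len(states) < 2:
        return None                     # a single state is not a process
    first, last = states[0], states[-1]
    if first == "OUT" and last == "IN":
        return "enter"
    if first == "OUT" and last == "AT":
        return "arrive"
    if first == "IN" and last == "OUT":
        return "leave"
    return None


print(classify_motion(["OUT", "OUT", "AT", "IN"]))   # enter
print(classify_motion(["OUT", "OUT", "AT"]))         # arrive
print(classify_motion(["IN", "AT", "OUT"]))          # leave
```

Stating the final state explicitly is what separates enter from arrive in this sketch; like the account under discussion, it still cannot tell leave from exit.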
Having critiqued his account, it is worth pointing out that Langacker's intuitions
reflect a topological view of motion verbs. In Chapter 3, we will formalize notions
such as IN in terms of mereotopology, and in Chapter 4, we will provide a formal
semantics for verbs like enter and arrive that gives a specific computational
interpretation to notions similar to Langacker's.
1.4.2.2 Jackendoff Let us turn now to the interpretation of motion in Jackendoff's
LCS (Jackendoff 1983, 1990). In LCS, verbs of spatial motion, such as bike, are given a
common semantic template, which determines their syntactic behavior, shown in (15).
(15) [Event GO+LOC ([Thing]x, [Path]y)]
GO is a semantic primitive of motion, which is a function that takes as inputs a Thing
and a Path and returns as output an Event. GO+LOC involves movement specialized
to a locative semantic field17. When the above verb template is combined with a path
PP, we get examples like (16).
(16a) John biked to the store.
(16b) [Event GO ([Thing John], [Path TO ([Place AT ([Thing store])])])]
17 Analogously, verbs of temporal motion, such as delay, use GO+TEMP.
A verb like enter is treated as equivalent to go into, and has the more
instantiated semantics shown in (17).
(17) [Event GO ([Thing]x, [Path TO ([Place IN ([Thing]y)])])]
Note that LCS, in addition to bearing the disadvantages described in the previous
section, also blurs important differences, since all motion verbs are represented by
just one of GO(Thing, Path), STAY(Thing, Place) as in cling, ORIENT(Thing,
Path) as in point, BE(Thing, Place) as in lie, or GO_Ext(Thing, Path) as in
reach, along with their specialization to different semantic fields. The inability to
distinguish among verb meanings is a serious problem with such highly abstract representations of meaning.
1.4.2.3 WordNet Given the theories of verb semantics, one would expect that lexical
resources would exist that provide a rich semantics for motion verbs. Unfortunately,
this is not the case. We mentioned WordNet (Fellbaum 1998) earlier, and its
differentiation and ranking of word senses based on corpora. In WordNet, verbs
are grouped into a hierarchy, with related verbs differentiated by manner into
troponyms. For example, the troponyms of arrive are: land, reach, flood/drive/
come in, light, perch, force-land, beach, disembark, debark, set down, touch down, and crash land. However, while WordNet is widely used for its coverage of relations such
as synonymy and hypernymy, which is what it was designed for, it is impoverished
not only in terms of the syntactic representations for the verbs, but also in terms of
the absence of any semantic representation for lexical items. Consequently, research-
ers have integrated WordNet with other resources that provide the missing
information.
1.4.2.4 VerbNet VerbNet (Kipper et al. 2006) is one such key lexical resource that
provides syntactic and semantic information about verbs which are grouped into classes based on extensions of the well-known classification of Levin (1993). We first
discuss the latter's classification, where verbs are grouped into semantic classes based
on participating in common meaning-preserving syntactic constructions involving
syntactic arguments, called diathesis alternations.
For example, consider the verbs break and cut. As seen in (18) (examples from
Kipper-Schuler (2005)), break participates in the transitive (18a), the simple intransitive
(18b), and the middle construction (18c), but not the conative alternation (18d).
(18a) John broke the jar.
(18b) The jar broke.
(18c) Jars break easily.
(18d) *John broke at the loaf.
In comparison, cut participates in the transitive, middle, and conative alternations.
(19a) John cut the bread.
(19b) *The bread cut.
(19c) Bread cuts easily.
(19d) John valiantly cut at the frozen loaf, but his knife was too dull to make a dent
in it.
These differences are grounds, in Levin's account, for splitting break verbs (along
with similar-behaving verbs such as chip, crack, crash, crush, fracture, rip, shatter,
smash, snap, splinter, tear) into a separate class from cut verbs (with fellow-members
chip, clip, cut, hack, hew, saw, scrape, scratch, slash, snip). In particular, the motion
verbs (Levin class 51) are grouped into 9 subclasses.
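The classification procedure can be sketched as grouping verbs by the set of alternations they participate in. The table below records only what (18) and (19) show for break and cut; assigning the same sets to shatter and slash follows the class lists above, and the names ALTERNATIONS and group_by_alternations are our own illustrative choices, not part of Levin's or VerbNet's resources.

```python
from collections import defaultdict

# Alternation sets read off examples (18) and (19); shatter and slash are
# assumed to pattern with break and cut respectively, as the class lists suggest.
ALTERNATIONS = {
    "break": {"transitive", "intransitive", "middle"},
    "shatter": {"transitive", "intransitive", "middle"},
    "cut": {"transitive", "middle", "conative"},
    "slash": {"transitive", "middle", "conative"},
}


def group_by_alternations(table):
    """Group verbs into classes that share exactly the same alternation set."""
    classes = defaultdict(set)
    for verb, alternations in table.items():
        classes[frozenset(alternations)].add(verb)
    return dict(classes)


for alternations, verbs in group_by_alternations(ALTERNATIONS).items():
    print(sorted(verbs), sorted(alternations))
```

Grouping on exact set identity is the simplest possible criterion; it makes no claim about how overlapping or conflicting alternation sets should be handled.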
As Kipper-Schuler (ibid.) points out, this method also produces classes whose
members are far from synonymous, e.g. the braid class, which counts among its
members bob, braid, brush, clip, comb, condition, crimp, crop, curl, etc. Further, the
classes are not disjoint, and some verbs are members of multiple classes with
conflicting sets of alternations. VerbNet attempts to fix these and other problems
by refining the classes (e.g. as in Dang et al. (1998), grouping together classes which
share at least three members), adding new classes, integrating the classes with
WordNet, and most importantly, providing semantic templates for each of the
classes.
For example, consider the semantics for the path verb arrive in VerbNet (version
3.1), as in arrived in the US. The entry specifies that the entity that fills the semantic
role of Theme (the subject noun phrase (NP)) moves during the arrival event, and
that at the end of the arriving event, the location of the moving object is in the US,
i.e. the entity that fills the semantic role of the Oblique object (the PP). Thus, the
semantic information for arrive is expressed as:
(20) motion(during(E), Theme) location(end(E), Theme, Oblique)
As we shall see in Chapter 2, arrive is a verb whose meaning involves the figure object traversing a path that goes from its not being located at the ground object to its
being at the ground object. Although (20) does not make reference to paths and to
start(E), VerbNet appears to at least capture part of the meaning.
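To see what (20) does and does not say, the template can be written out as data and instantiated with role fillers. This is a hand-written sketch: the tuple encoding, the name instantiate, and the filler "she" are our own assumptions, and nothing here uses VerbNet's actual file format or any VerbNet API.

```python
# The semantic template in (20), encoded as (predicate, arguments) pairs.
ARRIVE_SEMANTICS = [
    ("motion", ("during(E)", "Theme")),
    ("location", ("end(E)", "Theme", "Oblique")),
]


def instantiate(template, bindings):
    """Replace semantic-role names in a template with their fillers."""
    return [
        f"{predicate}({', '.join(bindings.get(arg, arg) for arg in args)})"
        for predicate, args in template
    ]


# Hypothetical fillers for a sentence such as "she arrived in the US".
for formula in instantiate(ARRIVE_SEMANTICS, {"Theme": "she", "Oblique": "the US"}):
    print(formula)
```

The instantiated formulas mention during(E) and end(E) only, which makes visible the absence of any predicate over start(E) or over a path.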
However, as Zaenen et al. (2008) reveal, while some of the motion verbs in
VerbNet (such as carry) have start and/or end point information, others don't,
leaving a great deal of incompleteness. They argue that although they were able to get
around some of these glitches and extract change of location information from
VerbNet by a variety of post-processing rules, there is a more fundamental problem with the VerbNet approach: the classification is driven by syntactic considerations
separating arguments from adjuncts. As is well-known, there is no one-to-one
mapping between syntactic predications and semantic ones. The latter often include
as arguments constituents that are syntactically adjuncts. For lexical resources to be
helpful in normalizing textual information, they have to encode the distinction
between syntactic and semantic predication and be systematic about the correspondence
between the two (ibid., p. 390). Their investigation reveals, unfortunately, that VerbNet lacks such a systematic mapping.18
1.4.2.5 FrameNet Another well-known lexical resource is FrameNet (Baker et al.
2003), which has been developed based on the underlying theory of Frame Semantics,
e.g. Fillmore (1976). It involves specifying each lexical item's syntactic properties in
the context of a hierarchy of semantic structures called frames, which represent the
experiential knowledge evoked by lexical items. The semantic roles of verbs (called
frame elements) are annotated in terms of corpus examples.
For example, consider the path verb arrive, for which a FrameNet III example is
shown in (21).
(21) [The Princess of Wales THEME] arrived TARGET [smiling and laughing DEPICTIVE]
[at a Christmas concert GOAL] [last night TIME].
In FrameNet's view, the lexical entry arrive evokes the frame of arriving, which
is a subframe of (i.e. is part of) the traversal frame, which in turn is a subclass of the
motion frame and involves the Theme changing location with respect to a Path.
In the motion frame, a Theme starting out at a location expressed by the Source
role ends up at a Goal location, covering space between the two, expressed by the
Path role; or else, the Theme moves in a particular Area or Direction, or its Distance
may be expressed.19 Arriving involves a moving object (filling the semantic role of
Theme) moving in the direction of a location filling the semantic role of Goal.
According to the comments for the arrive lexical entry, the Goal is always
implied by the verb, but may or may not be explicit in the text; it indicates where
the Theme ends up, or would end up, as a result of the motion. Note that this
FrameNet representation is weaker than the one we have been advocating, in that it
doesn't commit to the figure object of the Princess of Wales in (21) being located, at
the point of arrival, at the ground object (the site of the Christmas concert). In turn,
FrameNet's representation for the preposition at, while it is associated with a
Locative_relation frame (a subclass of the Trajector-Landmark frame that is derived
from Langacker's account), does not convey any specific semantics for at.
18 In more recent work, Palmer et al. (2009) have tried to address some of these issues.
19 The motion frame is defined as "Some entity (Theme) starts out in one place (Source) and ends up
in some other place (Goal), having covered some space between the two (Path)."
Additional frames that inherit the motion frame elaborate on this definition. Goal-profiling frames account for verbs such as reach. Source-profiling frames capture verbs from the Leave class. Path-profiling frames are for verbs such as traverse or cross, and, finally, the manner of motion can be elaborated on in additional frames for verbs like run and fly.
Likewise, the verb enter, which is also associated with the arriving frame and
illustrated in (22), does not indicate that at the end of the event, the figure we is
inside the ground object the upper room, thus failing to distinguish enter from
arrive (in the latter, the figure is merely at the ground).
(22) We THEME entered TARGET [the upper room GOAL] [by a flight of stairs leading
from the north side of the yard PATH].
While FrameNet seems to do well with change of location motions, the hierarchy
can be confusing. Sometimes the motion frame is directly inherited as in the case of
the traversal frame. Conversely, the departing frame uses the motion frame (i.e. it
does not necessarily inherit or specialize the semantic roles of the motion frame) and
is a subclass of the traversal frame.
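The difference between inheriting and merely using a frame can be made concrete with a small sketch. The dictionary below is a hypothetical, hand-built fragment based only on the relations described in this section, not FrameNet's actual data or API; role sets for traversal and departing are left empty because the text does not list them.

```python
# Hypothetical fragment of the frame hierarchy described in the text.
FRAMES = {
    "Motion": {
        "roles": {"Theme", "Source", "Goal", "Path", "Area", "Direction", "Distance"},
        "inherits": [], "uses": [], "subframe_of": [],
    },
    "Traversal": {
        "roles": set(), "inherits": ["Motion"], "uses": [], "subframe_of": [],
    },
    "Arriving": {
        "roles": {"Theme", "Goal"},
        "inherits": [], "uses": [], "subframe_of": ["Traversal"],
    },
    "Departing": {
        "roles": set(), "inherits": ["Traversal"], "uses": ["Motion"], "subframe_of": [],
    },
}


def inherited_roles(frame, frames=FRAMES):
    """Collect a frame's own roles plus those reached through 'inherits' links.

    'uses' and 'subframe_of' links are deliberately not followed, since
    neither guarantees that the parent's roles carry over.
    """
    roles = set(frames[frame]["roles"])
    for parent in frames[frame]["inherits"]:
        roles |= inherited_roles(parent, frames)
    return roles


print(sorted(inherited_roles("Arriving")))     # ['Goal', 'Theme']
print("Path" in inherited_roles("Departing"))  # True
```

In this toy fragment, departing obtains the motion roles through its subclass link to traversal, so its separate uses link to motion adds nothing, which is one way of seeing why the hierarchy can be confusing.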
As another example, the manner verb drive is associated with the frame of
operate_vehicle, which has semantic roles that include those illustrated in (23),
from FrameNet III.20
(23a) [Jamie Shepherd DRIVER] drove TARGET [the bucketing old vehicle VEHICLE]
[out of the estate SOURCE] [towards the main road PATH].
(23b) [The riders DRIVER] drove TARGET [all over the place AREA].
(23c) Dhamma is [the charioteer DRIVER] [that DRIVER] drives TARGET [the chariot
VEHICLE] [along the road [to Nirvana GOAL] PATH].
The frame operate_vehicle is a subclass of the Operating_a_system frame,
inheriting or specializing all its semantic roles; it also uses the motion frame.
However, the combined information does not explicitly indicate that driving a
vehicle involves an iterated change of location. In Chapter 2, we will provide such
a semantics for manner verbs like drive.
All in all, while FrameNet's rich subclassification of motion verbs and its integration
of semantics, syntax and corpus data are both impressive and commendable,
FrameNet does not address or explicitly represent the sorts of spatial relationships involved in motion that we have been emphasizing. Further, although it has been
used for inferential tasks such as question-answering (Narayanan and Harabagiu
2004), FrameNet's representation, even when mapped to knowledge representation
languages such as OWL, is not directly amenable to spatial reasoning. And although
FrameNet, VerbNet and WordNet have been mapped to each other, e.g. (Shi and
Mihalcea 2005), such an integrated resource, given the discussion above, also does
not address our desiderata.
20 As the FrameNet III website indicates, the semantic role AREA is used for expressions which describe a general area in which motion takes place when the motion is understood to be irregular and not to consist of a single linear path. Locative setting adjuncts of motion expressions may also be assigned this frame element.
1.4.2.6 Verb classifications based on qualitative reasoning Let us now turn to other
verb classifications, inspired by work in qualitative spatial reasoning (QSR). One of
the most successful models in QSR, which has been used for static spatial relations, is
the Region Connection Calculus 8 (RCC-8), (Randell et al. 1992), a calculus grounded
in mereotopology (to be discussed in Chapter 2). It identifies the following eight
jointly exhaustive and pairwise disjoint relations between two regions A and B:
(24) a. Disconnected (DC): A and B do not touch each other.
b. Externally Connected (EC): A and B touch each other at their boundaries.
c. Partial Overlap (PO): A and B overlap each other in Euclidean space.
d. Equal (EQ): A and B occupy the exact same Euclidean space.
e. Tangential Proper Part (TPP): A is inside B and touches the boundary of B.
f. Non-tangential Proper Part (NTPP): A is inside B and does not touch the
boundary of B.
g. Tangential Proper Part Inverse (TPPi): B is inside A and touches the boundary of A.
h. Non-tangential Proper Part Inverse (NTPPi): B is inside A and does not
touch the boundary of A.
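The eight relations in (24) can be illustrated with a deliberately simplified sketch in which each region is a closed interval on a line. This one-dimensional setting and the function name rcc8 are assumptions made for the example; RCC-8 itself is defined over regions in general, not intervals.

```python
def rcc8(a, b):
    """Return the RCC-8 relation between closed intervals a=(lo, hi) and b=(lo, hi)."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("each interval must satisfy lo < hi")
    if a2 < b1 or b2 < a1:
        return "DC"      # no contact at all
    if a2 == b1 or b2 == a1:
        return "EC"      # share only a boundary point
    if (a1, a2) == (b1, b2):
        return "EQ"      # same extent
    if b1 <= a1 and a2 <= b2:
        # A is a proper part of B; tangential if it shares an endpoint with B
        return "TPP" if (a1 == b1 or a2 == b2) else "NTPP"
    if a1 <= b1 and b2 <= a2:
        # B is a proper part of A
        return "TPPi" if (a1 == b1 or a2 == b2) else "NTPPi"
    return "PO"          # interiors overlap, neither contains the other


print(rcc8((0, 1), (2, 3)))  # DC
print(rcc8((0, 1), (1, 2)))  # EC
print(rcc8((0, 2), (1, 3)))  # PO
print(rcc8((1, 2), (0, 3)))  # NTPP
```

Because the branches are tested in order and the final case catches everything else, every pair of valid intervals receives exactly one label, mirroring the jointly exhaustive and pairwise disjoint property stated above.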
As we shall see in Chapters 2 and 3, RCC-8 and other systems like it do an adequate
job of representing static information about space. However, they cannot help us deal
with motion, since that task requires a temporal component. Muller (1998) proposes
just such a system, one which merges spatial and temporal phenomena with a
qualitative theory of motion based on spatiotemporal primitives. This system has
at its base a topological system borrowed from Asher and Vieu (1995) that is similar
to RCC-8 but adds the concept of open and closed regions, as well as a set of temporal
relations that include a relation of temporal connection, along with the standard
ordering relations. The result of Muller's system is a set of six motion classes: leave,
hit, reach, external, internal, and cross.
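To give a feel for how adding time to topological relations yields motion classes, the sketch below labels a sequence of RCC-8 relations between a figure and a ground. The mapping from sequences to class names is our informal reading of the six labels, offered as an assumption for illustration; it is not Muller's formal definition, which is stated in terms of spatiotemporal primitives.

```python
INSIDE = {"TPP", "NTPP"}   # figure is a proper part of the ground
OUTSIDE = {"DC", "EC"}     # figure has no overlap with the ground


def motion_class(trace):
    """Label a time-ordered list of RCC-8 relations (figure relative to ground).

    Returns one of the six class names, or None if the trace fits none of
    the simplified patterns used here.
    """
    if not trace:
        raise ValueError("trace must contain at least one relation")
    first, last = trace[0], trace[-1]
    if all(r in INSIDE for r in trace):
        return "internal"
    if all(r in OUTSIDE for r in trace):
        # ending in contact after starting apart is read as a hit
        return "hit" if (first == "DC" and last == "EC") else "external"
    if first in OUTSIDE and last in INSIDE:
        return "reach"
    if first in INSIDE and last in OUTSIDE:
        return "leave"
    if first in OUTSIDE and last in OUTSIDE and any(r in INSIDE for r in trace):
        return "cross"
    return None


print(motion_class(["DC", "EC", "PO", "TPP", "NTPP"]))  # reach
print(motion_class(["NTPP", "TPP", "PO", "EC", "DC"]))  # leave
print(motion_class(["DC", "EC"]))                       # hit
```

A purely topological trace of this kind has no notion of distance, so under this reading a verb like approach can only come out as external.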
Asher and Sablayrolles (1995) offer a related account of motion verbs and spatial prepositional phrases in French. They propose ten groups of motion verbs as follows:
s'approcher (to approach), arriver (to arrive), entrer (to enter), se poser (to alight),
s'éloigner (to distance oneself from), partir (to leave), sortir (to go out), décoller (to
take off), passer (par) (to go through), and dévier (to deviate). This verb classification
is more fine-grained than Muller's. Asher and Sablayrolles, however, do not have any
groups that match well with Muller's internal and external. In addition, Muller does
not include a class for the inverse of hit. The most striking difference between the
accounts is that Asher and Sablayrolles include a notion of metric distance that
Muller does not. This allows the separation of verbs such as approach and reach. For Muller, approach would have to be a simple external motion, which does not
adequately capture the meaning of this verb.
How do the semantic classifications of Muller, Asher, Sablayrolles, and Vieu
among others relate to those in VerbNet and FrameNet? To answer this, Pustejovsky
and Moszkowicz (2008) mapped Asher and Sablayrolles' verbs to VerbNet classes. The mapping revealed that while many of the motion predicates we care about have
specific classes in VerbNet, it is not always clear what these classes have in common
unless we look to FrameNet to find a higher level representation. Pustejovsky and
Moszkowicz (ibid.) therefore considered a mapping to FrameNet, arriving at a more
expressive verb classification. The resulting ten classes are based largely on Muller's
classifications with some very slight modifications detailed in Table 1.1, along with
some revisions we have made. Here X means there is no mapping.
TABLE 1.1. A revised classification of motion verbs
Class           Examples                         FrameNet               Muller    Asher and Sablayrolles
MOVE            drive, fly, run                  Motion or Self-motion  X         X
MOVE_EXTERNAL   drive around, pass               Traversing             External  X
MOVE_INTERNAL   walk around the room             Motion                 Internal  X
LEAVE           desert, leave                    Departing              Internal  partir, sortir
REACH           arrive, enter, reach             Arriving               Reach     arriver/entrer
ATTACH          approach                         Attaching              X         X
DETACH          disconnect, pull away, take off  X                      X         décoller
HIT             hit, land                        Impact                 Hit       se poser
FOLLOW          chase, follow                    Co-Theme               X         X
DEVIATE         flee, run from                   Fleeing                X         dévier
STAY            remain, stay                     State continue         X         X
1.4.2.7 Compositional semantics, revisited So far, we have discussed motion verbs as well as spatial prepositions separately, but of course when they combine together in
sentences there is the question of specifying and composing together the meanings of
each constituent. Our approach, discussed in Chapter 4, leverages a richer semantics
for nouns, prepositions, and motion verbs that allows one to parcel the meaning
contributions of the various constituents appropriately, without promiscuously
proliferating preposition senses.
For example, in (5b) discussed earlier (the coffee in the cup), cup has a noun
sense as an open container made of solid material used for drinking; this comes out of its lexical entry, based on the Generative Lexicon (GL) account of Pustejovsky (1995,
2001). The preposition in has a meaning that involves an underspecified notion of
containment, specifically inside a container. Thus, in the cup involves containment
inside a drinking instrument. Coffee has a noun sense of being constituted of liquid
material. To glue the two together, to get coffee in the cup, the liquid has to be
contained in the container, and for that its convex hull21 is required to be inside the
container. This is achieved within a compositional semantics using GL (based on notions of coercion and co-composition), via an axiom of world knowledge. In (5c),
spoon is an eating instrument with a handle, and constituted of solid material, and
to be contained in a container, it is sufficient for a part of it to be inside the container.
The details of how this integration is performed compositionally are explored in
Pustejovsky (forthcoming).
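The contrast between the coffee and the spoon can be sketched geometrically. The example below is a toy model under stated assumptions: the cup's interior is an axis-aligned rectangle, each object is a finite set of sample points, and all names are hypothetical. Since a rectangle is convex, the convex hull of a point set lies inside it exactly when every point does, so the liquid case reduces to an all-points test. The solid case is only approximated, because an object could overlap the container without any sampled point falling inside.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def point_in_box(p: Point, box: Box) -> bool:
    x, y = p
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def hull_inside(points: Iterable[Point], box: Box) -> bool:
    """Convex hull of the points is inside the (convex) box iff all points are."""
    return all(point_in_box(p, box) for p in points)


def part_inside(points: Iterable[Point], box: Box) -> bool:
    """At least one sampled point of the object is inside the box."""
    return any(point_in_box(p, box) for p in points)


# Containment requirement selected by the material of the figure object.
REQUIREMENT = {"liquid": hull_inside, "solid": part_inside}


def is_in(points: Iterable[Point], material: str, container: Box) -> bool:
    return REQUIREMENT[material](list(points), container)


cup = (0.0, 0.0, 4.0, 5.0)
coffee = [(1.0, 1.0), (3.0, 1.0), (2.0, 2.0)]
spoon = [(2.0, 1.0), (2.0, 8.0)]        # handle sticks out of the cup

print(is_in(coffee, "liquid", cup))     # True
print(is_in(spoon, "solid", cup))       # True
print(is_in(spoon, "liquid", cup))      # False
```

The same preposition sense is used in both calls to is_in; only the requirement contributed by the noun's material differs, which is the point of avoiding separate senses of in.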
Likewise, consider the preposition around. In (25a), the walking is outside the
pool, whereas in (25b), the swimming is inside the pool.
(25a) He walked around the pool.
(25b) He swam around the pool.
Clearly, it is the verb which differentiates the spatial relationship between figure
and ground in each case, rather than the preposition. Here, around creates a region
that is displaced relative to the ground region, without committing to the direction of
displacement. It is the medium of the motion (a parameter of verb meaning) that has
a contrasting value in this case: swimming involves water as the medium, whereas
walking involves a solid surface, setting aside some notable (e.g. mythological)
exceptions.
This overview of approaches and resources for analysis of motion in language
establishes that while there have been a variety of linguistic theories and resources
that provide a classification of motion verbs, a substantial gap exists in terms of
actually representing the spatial semantics of motion in a manner consistent with our
desiderata. The fact that even basic sense differences such as the distinction between
the motion verbs enter and arrive are not adequately explicated by these theories
shows that they are not expressive enough for natural language. We have suggested
that our account has an improved modularity that allows verbs, nouns, and prepositions
to contribute spatial meaning in such a way that these meanings can be composed together (within a particular GL-derived compositional account) so as to provide
fine-grained meaning differences, without proliferating prepositional senses. Finally, we
have arrived at a verb classification that builds on and extends earlier ones.
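The modularity claim can be illustrated with the around contrast in (25). In the toy sketch below, the verb supplies a medium, the noun supplies the medium found in its interior, and around itself stays neutral about the direction of displacement. The tiny lexicons and the function name are hypothetical and stand in for the richer GL entries discussed in later chapters.

```python
# Hypothetical verb entries: the medium in which the motion takes place.
VERB_MEDIUM = {"walk": "solid surface", "swim": "water"}

# Hypothetical noun entries: the medium found in the interior of the ground object.
GROUND_INTERIOR_MEDIUM = {"pool": "water"}


def around_region(verb: str, ground: str) -> str:
    """Resolve where 'V around the G' places the figure relative to the ground.

    'around' only says the region is displaced relative to the ground; the
    verb's medium decides whether that region is inside or outside it.
    """
    if VERB_MEDIUM[verb] == GROUND_INTERIOR_MEDIUM[ground]:
        return "inside"
    return "outside"


print(around_region("walk", "pool"))  # outside
print(around_region("swim", "pool"))  # inside
```

Only one entry for around is needed here; the two readings fall out of the verb and noun entries rather than from two preposition senses.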
21 The convex hull of a region, treated as a set of points S, is the boundary formed by the minimal convex set containing S.
1.5 Caveats
An interdisciplinary book like this one is necessarily restricted in scope, and as a
result there are several deliberate lacunae. First and foremost, the theory being
developed here is essentially a semantic one. As such, questions of pragmatics, which
of course are key to the understanding of language in context, are not addressed. We
have already observed that the meaning of spatial prepositions, even when putting
aside metaphorical uses, can involve functional notions such as support and
affordances, i.e. the nature of interactions with the ground object. An especially
compelling argument implicating functional notions is found in the experiments of
Coventry et al. (2001). They showed subjects pictures of the kind displayed in
Figure 1.1, and asked them to rate the acceptability of sentences of the form "the Figure is preposition to the Ground", where the prepositions used were over,
above, under, and below. For example, a given sentence could be "the umbrella
is over the man". Not only were the ratings related to the degree of rotation of
the figure from the vertical plane, but ratings for functional scenes (the middle row)
were higher than those for controls (top row), which were in turn higher than for
non-functional scenes (bottom row).
FIGURE 1.1 Acceptability ratings, rotation, and functional information, from Coventry (2003, p. 60)
In addition to Coventry et al. (2001), there have been a substantial number of other
psycholinguistic investigations into the acceptability of different spatial terms given
geometric and functional relations between figure and ground, e.g. (Logan and
Sadler, 1996; Garrod et al., 1999; Carlson et al., 2003; Coventry, 2003), with the latter
two developing a psychologically-grounded computational model that integrates
these two types of relations. We will not survey these here; suffice it to say that in our
framework, as discussed in Chapters 3 and 4, we do not as yet address such functional
information or different degrees of centrality in word meaning.
Other topics that we leave out include perceptual accessibility (e.g. visibility and occlusion) of the objects to the viewer. Nor do we consider the pragmatic conditions
under which particular spatial references take place and succeed (e.g. the speaker's
choice of a reference frame and point-of-view, the details of a spatial description in
the presence of particular distractors in the environment, etc.). A good discussion of
these and other factors is found in the work of Tenbrink (2007). Finally, a book of this
limited length cannot claim to offer a thorough survey of the field; in the course of
our exposition, the best we can do is to cite other papers that introduce the reader to
the relevant literature.
1.6 Conclusion
Let us first summarize the argument so far. We launched this book with a discussion
of the substantial challenges faced by today's text-to-sketch technology in terms of
comprehending natural language. We based our approach on two key insights from
the previous literature: research on the types of spatial abstractions underlying
language use, and the distinction between satellite-framing patterns (used with
manner-of-motion verbs like bike, drive, fly, etc.) and verb-framing patterns (used in path-verbs such as arrive, depart, etc.). The former provides inspiration
for our account of qualitative spatial relations based on a theory of mereotopology, to
be explicated in Chapter 3. The latter distinction motivated our differentiating, in our
semantic theory, between action-based and path-based predicates, leading to a first-
order dynamic logic (discussed in Chapters 2 and 4) where events are modeled as
dynamic processes or static situations.
For the approach to be of practical use in computational approaches, five specific
requirements have to be met. When considered in the light of these requirements, the
prior theories of spatial prepositions turned out to be rich in fundamental insights, but made assumptions untenable for a computational approach, while also ignoring
evidence from corpus-based word-sense disambiguation. While compositional treatments
of the semantics of spatial prepositions were available, the question of what
underlying spatial primitives to rely on was not tied to those available in qualitative
reasoning systems. As for motion verbs, we found a gap in terms of a lack of
expressiveness and some specific shortcomings with respect to our desiderata. We
indicated how the compositional integration of prepositional, verb, and noun meanings
will be handled in our framework. We also proposed what we believe to be a more expressive verb classification than has been hitherto considered. Finally, we
listed some of the obvious lacunae in our approach.
In Chapter 2, we will delve more deeply into how motion is expressed in natural
languages, introducing a framework that analyzes different parameters of spatial
meaning in natural language in terms of successively more expressive representation
languages. Following that, in Chapter 3, we will examine spatial and temporal representations and inference methods that have been developed based on qualitative
reasoning, applying them to spatial phenomena in language involving topological
and orientation relations. Chapter 4 applies the methods discussed in Chapters 2 and 3
to motion, providing a grounding for the semantics of motion expressions in
language within a cognitively inspired spatiotemporal model of change. We demonstrate
how the two linguistic strategies for encoding motion (that of path constructions
and manner-of-motion constructions) can be modeled within an operational
(dynamic) interval temporal logic. We also show how prepositional, noun, and verb
meanings are integrated together